Re: Shutdown Day

2007-03-06 Thread Miles O'Neal
Ioannis Vranos said...
|
|I think this may be interesting for many in here. :-)

If all the national labs shut down, we will, too.

8^)


Re: Shutdown Day

2007-03-06 Thread Miles O'Neal
Brett Viren said...
|
|[EMAIL PROTECTED] (Miles O'Neal) writes:
|
|> Ioannis Vranos said...
|> |
|> |I think this may be interesting for many in here. :-)
|>
|> If all the national labs shut down, we will, too.
|
|BNL leads the world in cutting edge shutingdownness:
|
|http://www.bnl.gov/bnlweb/pubaf/pr/PR_display.asp?prID=06-108

I'm not sure that qualifies.  Though they
are off the grid, everyone could still play
nethack or eliza (isn't that what supercomputer
clusters are for?)


Re: updating to gtk+2

2007-03-07 Thread Miles O'Neal
Nathan Moore said...
|
|I'd like to try out the Xfce desktop manager on my SL 4.4 box.  Xfce  
|requires a newer version of glib(2.12)/gtk(2.6)/pango(1.13) than is  
|provided in the SL yum repositories.  I've been playing the download/ 
|configure/make install game, but installing two copies of glib and  
|gtk makes me uncomfortable.
|
|Is there a more stable (ie through yum) way I can update these  
|libraries?

Have you looked for RPMS built for RHEL4?
Also, either FC2 or FC3 RPMs should work (I
forget which).  Assuming you don't absolutely
have to have the latest version...

We grabbed some RH9 RPMs for Xfce on SL3 and
they worked fine.


RHN Satellite Server?

2007-03-15 Thread Miles O'Neal
I hear RH is going to open source this.
Does the SL team plan to build this as
well?  Just curious.

Thanks,
Miles


Re: RHN Satellite Server?

2007-03-15 Thread Miles O'Neal
Michael Mansour said...

|> I hear RH is going to open source this.
|> Does the SL team plan to build this as
|> well?  Just curious.
|
|Where did you hear this?
|
|RHN is their bread and butter, I'm surprised to hear this news.

Slashdot pointed to an Ars Technica article about
RHEL5, which linked to an article about this, which
linked to this at InfoWorld:

   
http://weblog.infoworld.com/openresource/archives/2007/01/red_hat_to_open.html

I was pretty surprised, too.  I suspect he's right, that
this is a reaction to Oracle.  Whether it'll stick, who
can say? 8^/

-Miles


Re: RHN Satellite Server?

2007-03-23 Thread Miles O'Neal
Stephen John Smoogen said...

|Ok as far as I know there has been no official declaration that they
|are open-sourcing the RHN server.

AFAIK you are correct.  However...

|There have been a couple of articles
|about them having to do so at some point.. but that is speculation of
|the authors not an announcement.

Not unless an InfoWorld author is flat
out lying.

   
http://weblog.infoworld.com/openresource/archives/2007/01/red_hat_to_open.html

While this is not definitive, and I can't find solid
corroboration anywhere, it's also not (as far as I can
tell) pure, authorial speculation.

-Miles


Re: NFS attribute cache problem in SL4

2007-04-12 Thread Miles O'Neal
Devin Bougie said...

|It looks like we've run into an NFS client bug in SL4.  We stumbled
|upon this while trying to checkout code from a subversion repository
|to an nfs directory.  Our NFS servers are SL3, and we only see this
|bug with SL4 clients.  Things work when mounting the directory using
|'noac', but we can't live with the performance hit.

We've run into serious NFS problems on SL3 with
our NetApp filer.  We solved the problem (or at
least greatly reduced it) with options like these:

foo:/vol/vol1/bar on /export/bar type nfs (rw,vers=3,hard,intr,bg,rsize=32768,wsize=32768,tcp,timeo=10,retrans=5,retry=5,actimeo=3,addr=www.xxx.yyy.zzz)

The low actimeo and big rsize/wsize options were key.  We
also found performance unacceptable with noac.
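
As a rough sketch (not our exact setup), the same options can be handed
to mount directly.  The server, export and mount point are the
placeholders from the listing above, nfs_opts is a made-up helper name,
and addr= is left out on the assumption that mount fills it in itself:

```shell
#!/bin/sh
# Sketch only: build the NFS option string from the listing above.
# nfs_opts is a hypothetical helper, not part of any package.
nfs_opts() {
    echo "rw,vers=3,hard,intr,bg,rsize=32768,wsize=32768,tcp,timeo=10,retrans=5,retry=5,actimeo=3"
}

# Print the command rather than running it, so this is safe to try anywhere.
echo mount -t nfs -o "$(nfs_opts)" foo:/vol/vol1/bar /export/bar
```

Dropping the echo (and running as root) would perform the mount; the
same option string should also work in the fourth field of an
/etc/fstab entry.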

-Miles


Re: Seamonkey for SL5?

2007-04-20 Thread Miles O'Neal
Troy Dawson said...

|Just tried looking for it with yum, and nope, currently, nobody has it 
|in any of their extra repositories.

And it seems that's where it ought to be, in
SL5 extras, not SL5, per se.  That's where I'd
expect to pick it up.  Clear delineation of
what's part of the base and supported vs the
extra goodies (regardless of how important,
cool or whatever) is, IMO, a good thing.


Re: [SCIENTIFIC-LINUX-USERS] reboot freezes

2007-05-07 Thread Miles O'Neal
Ken Teh said...
|
|cpuspeed was off.  If I say 'shutdown -h now', the machines powers off 
|after a clean shutdown.  If I say 'shutdown -r now', it brings the 
|machine down.  The last steps are remounting md0 readonly and syncing 
|the SATA drives.  The last line it prints on the screen is 'Restarting 
|system'.  Then, it just sits there.  I have to use the front-panel 
|switch to turn it off, then power it back on.  Which is a bummer if you 
|have to reboot it remotely. Btw, the front-panel switch is one of these 
|new types when you have to hold it down for several seconds before it 
|actually powers the machine off.

That should be changeable in the BIOS.

Does the reboot switch not work?

|I'm wondering if it's some hardware problem.  It's my understanding that 
|a restart is basically like a push button reset.  The last thing the CPU 
|does is pull the reset line (A20 or A21 ??) which causes the M/B to go 
|through its boot-up.  But, now with ACPI and these smart power supplies, 
|I was wondering if somehow this is the cause of my problem.
|
|Any other suggestions?

We have some systems here that do this when we add in
certain sound cards.  It appears to be an interrupt or
similar resource conflict.


Re: SL5 installation - "Everything" is missing

2007-05-15 Thread Miles O'Neal
Connie Sieh said...

|Sorry but that is the way TUV has it coded and I agree with them for 
|taking it out.  I actually like the idea of not allowing a "Everything" 
|install.  It installs things that exist but are not configured and can 
|lead to security issues since they are are not configured.  It is also 
|hard to support as some packages just conflict.
|
|In the past when it was hard to install packages after a install was done 
|I can see how this option could be useful.  Today with yum and the gui 
|yum front ends making it easy to install packages later I do not see the 
|real need for this.

The thing is, some of us like a one step installation
process.  Every time I have ever used anything less
than everything (with one exception, see below) it has
caused lots of problems.  Inevitably things failed
because of dependency problems someone missed along
the way, and some package we expected to be somewhere
wasn't, so it took a lot of extra effort.  These have
bitten us many times over the years; loading "Everything"
never bit us with conflicts.

Alex (the OP) also noted:

|>  By the way the "Minimal" option has gone too which we never used in
|>  practice but I imagine could be useful.

I have used this on a couple of occasions and hand
added a couple of specific packages, with good results,
for special purpose systems exposed to the internet.
It's been a while, so I have no idea if it would still
work.  But at the time, it was handy.

FWIW,
Miles


Re: SL5 installation - "Everything" is missing

2007-05-15 Thread Miles O'Neal
Connie Sieh said...

|You can still do a kickstart install with your ks config file listing all 
|of the rpms.

That's better than nothing, but it still
makes initial testing annoying.  I don't
really expect y'all to undo TUV's
philosophical changes, I'm just expressing
dissatisfaction with what they did.

Thanks,
Miles


Re: SL5 installation - "Everything" is missing

2007-05-15 Thread Miles O'Neal
Connie Sieh said...

|I am sure they had a reason for doing it.

One would hope so.  8^)

It may even be a good reason.  I just
wish they hadn't.  If it's an issue
with conflicts as others have suggested,
they could have simply added a note to
that effect, with a "don't do this unless
you know what you are doing and are
willing to accept responsibility for the
end of the world as we know it" message.

|Note that FC5, FC6 and FC7 do 
|not have "everything" installs either.

Which undoubtedly explains why EL5
doesn't have it, either.

Thanks,
Miles


YP/NIS weirdness on 4.4

2007-05-18 Thread Miles O'Neal
We're getting "do_ypcall: clnt_call: RPC: Timed out"
errors.

We're in the process of upgrading to 4.4,
starting with some new 64 bit Supermicros,
some with a single Xeon dual core and some with
a single Core 2 Duo.  Both have Intel e1000
ethernet chipsets.

We use NIS for user passwd and group entries,
as well as netgroups, services and automounts.
This has worked for us on 32 bit systems from
Redhat5.2 up through SL30{4,7} (including some
64 bit Athlons running a 32 bit OS).  We can
reproduce this on the 32 bit SL3 systems, but
they're a lot slower, and it takes some effort
to do it.

We first saw problems with torque (we've used
PBS Pro in the past), but narrowed it down to
rsh (and even a bare bones program running
rcmd()).  A single, random rsh call is fairly
safe, but if we do one every second or two,
we quickly start getting hangs and the error:

   do_ypcall: clnt_call: RPC: Timed out

So it can happen at any time, but when we fire
off lots of jobs in quick succession via torque,
it's guaranteed to happen.  We have also seen
this with less frequency in some home grown tools.

We've stripped down NIS to bare essentials (using
only netgroup for testing), we've tried adding in
a 3Com ethernet card to use instead of the built-in
cards, we've upgraded to the latest EL4 ypbind,
ypserv and glibc (which we found in a CERN repo
after looking through TUV's bug list), we've tried
adding more, faster NIS servers, and we've tried
isolating three machines on a 100Mb network (no
spare 1Gb switches).  And tried running the non-SMP
kernel.  No difference.

Bizarrely, we also get whining in the SL3 ypservers'
message logs about failed NIS host lookups.  We don't
use NIS for host lookups; nsswitch.conf has

   hosts:   files dns

.  We had only used solaris servers in the past,
and their ypserv's were not logging these errors.
Presumably they still got the requests, but we
don't know that.

We ran ypserv in debug mode for a while, and nothing
jumped out at us.

We started running nscd for passwd and group on all
the Linux systems after this started.  No change.

The switches are Cisco Gb switches and HP ProCurve
Gb switches (the isolated test network was a 3Com
100Mb switch).

Any ideas on either problem?

Thanks,
Miles

TEST SCRIPT (works every time with failure in less than
10 rsh calls on our faster boxes on the Gb network):

#!/bin/csh

# set LIST_OF_HOSTNAMES to a valid list of hosts
# to try, the more the merrier.  We use a command
# to generate these from a file of valid names.

while ( 1 )
    foreach i ( $LIST_OF_HOSTNAMES )
        rsh $i uname -a # or any command you like
    end
end


Re: YP/NIS weirdness on 4.4

2007-05-18 Thread Miles O'Neal
John Hearns said...

|Echoing what Jon Peatfield says, I have seen these errors on systems.

I've passed Jon's notes on to someone else to
try.  I forgot to mention that we have IPMI cards
but have not yet done anything to them.  So they
are in the systems, but unconfigured.  (Is that
a problem?)

|Can I ask if you have IPMI on these machines?
|If that is so, I have a suggested fix.
|
|Use sysctl to set sunrpc.min_resvport to 665
|(IPMI cards use port 664 also)
|
|To make the fix persist after a reboot, edit
|/etc/sysctl.conf and add the lines:
|
|sunrpc.max_resvport = 1023
|sunrpc.min_resvport = 650

We tried this on 10 of the systems.  When I started
the test loop, it went a little longer before failing,
then started failing again.

Also I forgot to note that this doesn't happen
with ssh.
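
For anyone trying the same thing, here is a minimal sketch for checking
whether the reserved-port range sunrpc is allowed to use still covers
port 664.  in_range is a made-up helper, and the /proc files are assumed
to exist only once the sunrpc module is loaded:

```shell
#!/bin/sh
# in_range MIN MAX PORT: succeed if PORT lies inside MIN..MAX
# (hypothetical helper, for illustration only).
in_range() {
    [ "$3" -ge "$1" ] && [ "$3" -le "$2" ]
}

# Read the live values if the sunrpc sysctls are present on this box.
if [ -r /proc/sys/sunrpc/min_resvport ]; then
    min=$(cat /proc/sys/sunrpc/min_resvport)
    max=$(cat /proc/sys/sunrpc/max_resvport)
    if in_range "$min" "$max" 664; then
        echo "port 664 is still inside the sunrpc range $min-$max"
    else
        echo "port 664 is outside the sunrpc range $min-$max"
    fi
fi
```

Note that a min_resvport of 650, as in the sysctl.conf lines quoted
above, still leaves 664 inside the range; 665 is the value that
excludes it.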

-Miles


partial answer to Jon Peatfield on NIS on SL4.4

2007-05-18 Thread Miles O'Neal
Jon,

Thanks for your insights.  We're looking at these
things.  Here are some partial answers.

|If you manage to send sufficiently many requests to the server that *it*
|can't cope then you will see these messages.  Some ypserv implementations 
|cope better with load than others...

We added more servers.  Then we tried it with just two
clients and one system running the server.  Same thing.
Even with "sleep 1" between rsh calls.

|Now we have some servers with Intel mboards with braindead BMC chipsets 
|which eat all traffic to the IPMI ports.  When anything happened to pick 
|those ports it never gets an answer so will time out.  We saw *lots* of 
|this especially doing things which caused lots of yp requests -- until we 
|tracked it down and caused things to avoid the IPMI ports.

I discussed this in the answer to John Hearns' note,
which I just sent to the list.  We have 'em, but have
yet to configure them.  We took the IPMI out of one
system as a test.  No difference.

|I assume that you also checked for firewall issues at both ends...

iptables and ipchains off completely.  No
firewalling on the switches.  SELinux is off.

|Do you also see it with ssh connections?  I ask 'cos rsh also picks a 
|privelaged (tcp) port...

No.  Only rsh.  We thought about switching to ssh for
torque, but we have other apps that throw these errors
on these systems as well (albeit far less often).

Thanks,
Miles


fix for NIS/YP timeout

2007-05-18 Thread Miles O'Neal
Thanks to all who responded, and Jon Peatfield for the fix.

The motherboards have something that watches
a pair of ports to hand things off to the IPMI card
*even when the IPMI card isn't installed*.  So if we
have xinetd watch that port with a fake service, the
problems go away.

The docs I had seen suggested all the intelligence
was on the IPMI card, which is why when this happened
without the card, we discounted IPMI involvement.
Jon and his cohorts were looking for a similar problem
to ours and stumbled across the fix.

Jon (and now we) installed four port watchers:

/etc/xinetd.d/rmcps-tcp
/etc/xinetd.d/rmcps-udp
/etc/xinetd.d/rmcp-tcp
/etc/xinetd.d/rmcp-udp

and restarted xinetd.

Each of these looks something like this:
-
# default: on
# description: Dummy entry for rmcps-tcp tcp version

service rmcps-tcp
{
        type            = UNLISTED
        id              = rmcps-tcp
        socket_type     = stream
        protocol        = tcp
        user            = root
        wait            = no
        disable         = no
        port            = 664
        server          = /bin/true
}
-

The rmcps services listen on port 664;
rmcp services listen on 623.
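
A sketch of how the four files could be generated in one go.  The tcp
entries follow the example above; the udp entries (socket_type dgram,
protocol udp) are my assumption, and "wait" is simply copied from the
tcp example and may need to differ for udp.  make_watchers is a made-up
helper name:

```shell
#!/bin/sh
# Sketch: write the four dummy xinetd services into a directory.
# make_watchers is a hypothetical helper. Only the tcp layout comes from
# the example above; the udp settings are an assumption.
make_watchers() {
    dir=$1
    for name in rmcps rmcp; do
        case $name in
            rmcps) port=664 ;;
            rmcp)  port=623 ;;
        esac
        for proto in tcp udp; do
            case $proto in
                tcp) stype=stream ;;
                udp) stype=dgram ;;
            esac
            cat > "$dir/$name-$proto" <<EOF
# default: on
# description: Dummy entry for $name-$proto $proto version

service $name-$proto
{
        type            = UNLISTED
        id              = $name-$proto
        socket_type     = $stype
        protocol        = $proto
        user            = root
        wait            = no
        disable         = no
        port            = $port
        server          = /bin/true
}
EOF
        done
    done
}

# Write to a scratch directory by default; review the files before
# copying anything into /etc/xinetd.d and restarting xinetd.
out=${1:-$(mktemp -d)}
make_watchers "$out"
echo "wrote: $(ls "$out" | tr '\n' ' ')"
```

Writing to a scratch directory first keeps the script harmless; the
files only take effect once they are in /etc/xinetd.d and xinetd has
been restarted.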

Once again, the SL community does a great job of support.
Thanks.

-Miles


Re: SL 4.4 Lack of disk space / HW problem

2007-05-24 Thread Miles O'Neal
Livio Condorelli said...

|
|" There was an error installing samba-common-3.0.10-1.4E.9 This can 
|indicate media failure
|   Lack of disk space, and/or hardware problems.This is a fatal error 
|and your install will be aborted.
|Please verify your media and try your install again"
|
|Now I will explain what I did until now
|
|1) Download the 4 iso image from the scientificlinix site, create a dvd 
|image and try to install and
| I got the error as I mention but with different package"

Does it die at a different package each time?  If
so, you may have a hardware issue.  We saw this with
some systems that had a timing problem of some sort;
backing off the RAM bus speed a notch resolved it.
I wouldn't expect this on a Dell 2950, but I guess
it's possible, especially if you used any aftermarket
parts (RAM, disk, etc).  You might try memory and other
hardware diags, too, if nobody else has a solution.

We saw this with 4.x (I forget which version) on hardware
that had previously run OK on 3.0.4.

Then there are possible APIC and ACPI issues...

If it happens on the same package every time, I have no
idea.

-Miles


Re: Permissions

2007-05-29 Thread Miles O'Neal
Claudiu Tanaselia said...
|
|I have an ext3 partition that I can't write to using a normal user. My
|line from fstab look like this:
|
|/dev/sda2   /media/storage  ext3    defaults,users  0 0
|
|I did chown and chmod -R +rw, no effect.
|
|This might be a basic linux thing, but until recently my only ext3
|partition was the root one so I never encounter this problem, now I
|decided to convert my whole harddrive into ext3, but I can write to it
|only as root.

You shouldn't need "users" since that just allows a normal user
to mount and unmount the file system.  "defaults" should be all
you need.  If you really want users to be able to mount and unmount
this FS, I would expect that to work.

What does it look like when you type

   mount | grep /media/storage

?
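
A small sketch of that check, pulling just the option list for one mount
point out of the mount output and then showing who owns the top of the
mounted file system.  opts_for is a made-up helper, and it assumes the
usual Linux "device on dir type fs (options)" layout:

```shell
#!/bin/sh
# opts_for MOUNTPOINT: read `mount` output on stdin and print the option
# list for that mount point (hypothetical helper, Linux output format).
opts_for() {
    awk -v mp="$1" '$2 == "on" && $3 == mp { gsub(/[()]/, "", $6); print $6 }'
}

dir=${1:-/}          # pass /media/storage here
mount | opts_for "$dir"
ls -ld "$dir"        # owner and mode of the top-level directory
```

If the options show rw, the next thing to look at is the ls line: the
owner and mode that count are those of the top directory of the mounted
file system, so any chown or chmod has to be done while it is mounted.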


SL4 - e1000 NIC tweaks

2007-06-13 Thread Miles O'Neal
We've been working on tweaking our new
Supermicro Core 2 Duo servers (the same
ones we had the infamous IPMI port issue
with a few weeks ago (thanks again to all
who helped, esp. Jon Peatfield!)).

Anyway, I find parameters that seem to
require ifconfig to tweak (txqueuelen)
and parameters that seem to need changing
via /etc/modprobe.conf (RxDescriptors,
TxDescriptors).  Is there a single way
to handle both sets?  I haven't tried
txqueuelen in the modprobe.conf file,
but RxDescriptors on the ifconfig line
just barfed.

I found an IBM RedPaper on RHEL tuning
that helped a lot with network tuning in
general, but I haven't found the answer
to this one anywhere.

Thanks,
Miles


fvwm RPMs for SL4?

2007-06-15 Thread Miles O'Neal
Does anyone have 64 bit RPMs for fvwm2
that will run with SL4 or FC3?  I can
find very few 64bit fvwm2 RPMs, and
nothing earlier than FC5, which means
a bunch of dependencies we don't want
to chase.

We'll build if we need to but would
rather find something prebuilt.

Thanks,
Miles


Re: fvwm RPMs for SL4?

2007-06-15 Thread Miles O'Neal
Jon Peatfield said...

|If you just want binaries then look in 
|http://www.damtp.cam.ac.uk/linux/sl/local/4x/RPMS/x86_64/

This got me really excited, but the RPM here
wants glibc2.4, and EL4.4 seems to have glibc2.3.

8^(

Thanks, tho.


Re: fvwm RPMs for SL4?

2007-06-19 Thread Miles O'Neal
We installed Jon's fvwm2 RPMs on another system, and they
installed fine.  We now have a menu/font problem (the
fvwm menus appear blank), but at least they installed.
Since both systems are clean installs from the same
dir, I have no idea why the first one failed.


Re: prelink

2007-06-22 Thread Miles O'Neal
Keith Lofstrom said...

|I plan to remove /etc/cron.daily/prelink,  revert my binaries and
|libraries with "prelink -au", then comment out all the "-l" lines
|in /etc/prelink.conf so that the loader doesn't attempt to do it.
|
|Then I will rebuild my backups, and reinitialize osiris.  
|
|Any flaws in my thinking?

Have you considered just asking the author to give
you an option to modify file times?  Or just modifying
the source?

Does your system integrity software let you add
actions?  If so, you could check the files that
get flagged with prelink's checksum feature.

But... without knowing prelink intimately, your
approach sounds like it should work.


Re: Minor problem with ATI video in SL 5.0, i386

2007-06-23 Thread Miles O'Neal
Michael Hannon said...
|
|Greetings.  We just installed Scientific Linux 5.0 on a Dell Optiplex

We're still running 3 and 4 on everything, but maybe we can help.

|745 belonging to one of the professors here.  On the whole, the system
|works just fine, but we've got one small, annoying glitch with the
|video, and I'm seeking advice about it.  Note that we've installed the 
|32-bit version of SL, on the theory that there would be better drivers, 

We just started using 64 bit on a couple of desktops, and so
far haven't seen any problems with 64 bit drivers.  We've been
running it on compute servers for a while with no problems,
but obviously graphics drivers don't come into play there.
All the other drivers have been rock solid, though.

|etc., for this version (and this is the prof's desktop system, not a 
|number cruncher).
|
|The monitor for this system is a Dell 2407WFP, with a preferred
|resolution of [EMAIL PROTECTED]  We're using the DVI interface to the
|monitor.

We have about 10 of these.  Most of them are on nVidia cards,
but a couple are still on ATI cards.

|The graphics adapter is identified as follows by "lspci":
|
|01:00.1 Display controller: ATI Technologies Inc Unknown device 71a3
| Subsystem: Dell Unknown device 0d03
| Flags: bus master, fast devsel, latency 0
| Memory at dfdf (64-bit, non-prefetchable) [size=64K]
| Capabilities: [50] Power Management version 2
| Capabilities: [58] Express Endpoint IRQ 0

I'm guessing this is one of the reasons you went to the proprietary
drivers?

Knowing the chipset and device, you can try setting that in the OSS
driver and see if that works-- if you can otherwise use that driver.
We switched to the proprietary drivers for 3D, better multi-head,
etc.

|We've downloaded what appears to be latest version of the ATI driver
|(ati-driver-installer-8.37.6-x86.x86_64.run) and have used the included
|aticonfig utility to help configure X-windows.  Note that the ATI web 
|site has two choices for linux:
|
| Linux x86
| Linux x86_64
|
|So far as I can tell, both choices lead to the same installer file, 
|marked as "...x86_64...".
|
|By the way, the ATI software identifies the graphics adapter as:
|
| Radeon X1300/X1550 Series

We have also, on occasion, had to try several versions of the
driver to find the best one.  It's *usually* the newest, but
not always.

|The difficulty is that X-windows insists on running the display at a 
|resolution of:
|
| 1600x1200
|
|This makes for a perfectly clear display, IMHO, but it does give
|characters and images the wrong aspect ratio: everything looks a little 
|too short and a little too wide.

First off, have you tried setting up the monitor section of
xorg.conf both with and without the horizontal and vertical specs?
Sometimes it works better to supply them, and sometimes it works
better to let X probe the monitor.

We found that on some combinations of cards and Dell 2007WFPs
we need to explicitly set the mode lines.  Just type in
'X11 "mode line" generator' in your favorite search engine,
fill in the blanks, and cut and paste the results.  The one time
we needed to do this with a 2407WFP it didn't work, but the card
was pretty ancient.

An example 2007WFP setup looks like this:

Section "Modes"
    # Optimal Dell 2007WFP Mode [EMAIL PROTECTED]
    Identifier  "16:10"
    ModeLine    "1680x1050" 146.2 1680 1784 1960 2240 1050 1053 1059 1089 #60Hz
EndSection

Section "Monitor"
    Identifier   "aticonfig Monitor 1"
    VendorName   "Dell"
    ModelName    "dell 2000WFP"
    UseModes     "16:10"
    Option       "dpms"
EndSection

|We opened a help ticket with ATI support and received the following
|non-response:
|
| >The Linux drivers available from ATI are provide are "as is".
| >You may be able to get further assistance from the Linux community...
|
|(All your video are belong to us.)

And yet only a few weeks ago I read an interview with someone
at AMD assuring us of their support of the Linux drivers. 8^/

You might try to dig his name out of /. and see if he has an email
address, and let him know how much you appreciate their support.
(I believe it was a he, but I won't swear to that.)

The flip side of this is that good quality graphics adapters aren't
that expensive.  You can get an nVidia 7600GS for under $100 IIRC,
and the 7600GT for less than $200.  They have plenty of power, and
the drivers Just Plain Work, IME.

-Miles
-- 
Miles O'Neal
IT Manager
Intrinsity, Inc.
[EMAIL PROTECTED]


Re: Mail from a script

2007-07-12 Thread Miles O'Neal
Keith Chadwick said...
|
|Sure!
|
|    /bin/mail -s "$message_subject" $message_mailto < $message_file

Just in case you don't know, if everything you need
is in the script, you can avoid writing a file for the
message body:

/bin/mail -s "$message_subject" $message_mailto <<__EOD__
message body here
__EOD__
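
A small sketch of how the here-document behaves, with cat standing in
for /bin/mail so it can be run without sending anything.  The variable
names and values are made up.  With an unquoted delimiter the shell
expands variables in the body; quoting the delimiter turns that off:

```shell
#!/bin/sh
job=nightly-build   # hypothetical values, for illustration only
status=failed

# Unquoted delimiter: $job and $status are expanded by the shell.
expanded() {
cat <<__EOD__
Job $job finished with status: $status
__EOD__
}

# Quoted delimiter: the body is passed through literally.
literal() {
cat <<'__EOD__'
Job $job finished with status: $status
__EOD__
}

expanded
literal
```

Everything between the two __EOD__ lines becomes the command's standard
input, which is what /bin/mail uses as the message body.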


RE: Mail from a script

2007-07-14 Thread Miles O'Neal
Manuel Mussini said...
|Could you help me a little more to understand the syntax?
|It's not clear to me what "cat <

Re: Custom PHP install

2007-07-17 Thread Miles O'Neal
Johan Mares said...
|
|It is easy to install a webserver with SL, but more and more we are 
|having trouble because the versions that come with either SL4.4 (PHP 
|4.3.9) or SL5.0 (PHP5.1.6) are not recent enough to use current versions 
|of third party software (for example Mapserver needs PHP 4.4.6 or 
|5.2.1).  I've installed LAMP manually, but that was at home, not for a 
|production environment. I would like to keep on using SL, the apache and 
|mysql that comes with it are ok too, but what is the best means to 
|install a more recent version of PHP (5.2.1 or maybe even the 5.2.3) and 
|how do you update/maintain it ? With PHP I am not only referring to the 
|PHP module, but also to the PHP-SOAP,
|PHP-PEAR, PHP-mbstring, PHP-mcrypt, ...

Along with whatever you are requesting here you might also
fuss at the people writing the packages you are trying to
install, or the people packaging it.  A lot of times there
is no reason for requiring the most up to date packages.

Also, do they *really* need these, or is it just the package
manager because of whatever they built on?  Did you try
installing on a non-production system with "--nodeps"?

This is one of the downsides to the anarchy of free software.


Re: Custom PHP install (fwd)

2007-07-17 Thread Miles O'Neal
Donald Tripp said...

|You can also try using a YUM repository for Fedora, and doing a yum  
|install php, which should solve all your package dependencies for you.
|
|http://dag.wieers.com/rpm/

Dag deserves a monument for the packages he
makes available on so many platforms.  He's
not the only one, but he's one of the main
ones.  His repo has saved my hide many times.


torque vs sockets in 4.4

2007-08-06 Thread Miles O'Neal
We recently migrated from PBS to torque, and most of our
systems are now running 4.4 .  The torque server (a Core2
Duo at 2.4GHz) is only handling about 3x the jobs our 300MHz
Sun Ultra 5 could handle before bogging down horribly.  This
seems a bit odd.

Watching the server logs, it seems there's a lot of time
spent waiting for replies on sockets, though it's not clear
whether it's on the same system between the scheduler and
batch server, or between the batch server and client node
processes (pbs_moms).

We're beginning to wonder if it's OS-related.  Torque uses
a lot of sockets, and sets them up and tears them down at a
hefty rate.  We have the number set to 16K for the scheduler
and server processes via ulimit, but we aren't getting much
above 1400 between the two processes.
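
For reference, a rough sketch of how such a count can be taken from
/proc on Linux.  fd_count and sock_count are made-up helper names, and
looking at another user's process needs root:

```shell
#!/bin/sh
# fd_count PID: number of open descriptors of a process (Linux /proc).
fd_count() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# sock_count PID: how many of those descriptors are sockets.
sock_count() {
    ls -l "/proc/$1/fd" 2>/dev/null | grep -c 'socket:'
}

pid=${1:-$$}    # defaults to this shell; pass the PID of the server or scheduler
echo "pid $pid: $(fd_count "$pid") descriptors open, $(sock_count "$pid") of them sockets"
echo "soft descriptor limit in this shell: $(ulimit -n)"
```

The ulimit shown is that of the shell running the sketch, which is only
the daemon's limit if the daemon was started from the same environment.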

Is anyone aware of an issue in 4.4 that might affect this?

Thanks,
Miles


Re: torque vs sockets in 4.4

2007-08-07 Thread Miles O'Neal
Steve Traylen said...

|How many nodes and jobs?

About 325 nodes.  With just one layer of queues,
it slows down drastically at 1500 jobs or so.  With
routing queues it can slow down at a few hundred
and gets unusable by 1200 jobs queued.

...

|Do consider changing the values as described here.
|http://www.clusterresources.com/torquedocs21/a.flargeclusters.shtml
|
|in particular for large farms you really need to have poll_jobs set  
|to true and increase the job_stat_rate.

We've played with everything there quite a bit.

Today we're going to try pulling some nodes and
setting up a separate server on SL3 to see if the
OS is involved or not.

Thanks,
Miles


Re: /etc/hosts and DNS, bug or feature?

2007-08-11 Thread Miles O'Neal
Nathan Moore said...

|[EMAIL PROTECTED] ~]# cat /etc/hosts
|# Do not remove the following line, or various programs
|# that require network functionality will fail.
|127.0.0.1   buff localhost.localdomain   localhost
|::1 localhost6.localdomain6 localhost6
|
|Can someone explain what the "localhost6" does?

I'm guessing from the format and that the line contains three
sixes (666!) that it's IPV6 stuff.

Which doesn't help with your other problem.

What is your nsswitch.conf entry for hosts?
Does NIS or LDAP factor in?


Re: /etc/hosts and DNS, bug or feature?

2007-08-12 Thread Miles O'Neal
Nathan Moore said...

|
|[EMAIL PROTECTED] ~]# cat /etc/nsswitch.conf
|#
|# /etc/nsswitch.conf
|#
|# An example Name Service Switch config file. This file should be
|# sorted with the most-used services at the beginning.
|...
|#hosts: db files nisplus nis dns
|# hosts:  files nis dns  this was the original version
|hosts:  dns files nis
|
|When I changed the order of the hosts entry, the name lookup worked.

Is honker by any chance an NIS server?
If so, it may have an /etc/hosts file
that has honker set to 127.0.0.1 .
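
A quick sketch for seeing which sources are active and which answer
wins.  hosts_order is a made-up helper; getent is the glibc tool that
walks the same nsswitch order as ordinary lookups:

```shell
#!/bin/sh
# hosts_order FILE: print the active (uncommented) hosts sources, in order.
# Hypothetical helper, for illustration only.
hosts_order() {
    sed -n 's/^hosts:[[:space:]]*//p' "$1"
}

conf=${1:-/etc/nsswitch.conf}
if [ -r "$conf" ]; then
    echo "lookup order: $(hosts_order "$conf")"
fi

# getent follows that order, so this shows the answer a program would get.
getent hosts localhost || echo "no answer for localhost"
```

Running getent hosts honker on a client before and after changing the
order should show whether the loopback answer is coming from files, NIS
or DNS.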

|I
|suppose the other route would be to have a long /etc/hosts file on each
|machine that defines the names of all nodes.

You could, but maintenance is a pain.
This is what DNS is for, and it normally
works just fine.

|I'm still confused about the contents of /etc/hosts though.  Any ideas?

Only the suspicion I mentioned before,
that it's the IPv6 version of localhost.

-Miles


/etc/hosts vs NIS vs DNS

2007-08-12 Thread Miles O'Neal
Nathan Moore said...

|Is there a way to share a hosts list via NIS?  Are there any advantages to
|this over DNS or /etc/hosts?  I'm running NIS, DNS, and home directory
|shares all off of one server (honker in the examples, 10.30.27.5)

Aha, I was right!  You are probably already sharing
the hosts file off honker.  Look and see if /etc/hosts
there or /var/yp/hosts (should be the former, but...)
contains an entry with honker as 127.0.0.1 or something
similar.  If so, break that into two lines, like this:

127.0.0.1   localhost.localdomain localhost
10.30.27.5  honker.whatever honker

Since I suspect you already are serving this file via
NIS, you could try adding a bogus host and IP to that
file and doing a make in /var/yp (or wherever) and
then trying a ping of the bogus host on an NIS client.
It should resolve, but not ping.
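
A minimal sketch for spotting the first problem, a real host name
sitting on the loopback line of a hosts file.  loopback_names is a
made-up helper; point it at whichever file the NIS maps are built from:

```shell
#!/bin/sh
# loopback_names FILE: list names other than the localhost variants that
# FILE maps to 127.0.0.1 (hypothetical helper, for illustration only).
loopback_names() {
    awk '$1 == "127.0.0.1" {
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^#/) break              # ignore trailing comments
            if ($i !~ /^localhost/) print $i
        }
    }' "$1"
}

file=${1:-/etc/hosts}
if [ -r "$file" ]; then
    loopback_names "$file"
fi
```

Any name it prints is one that clients will resolve to 127.0.0.1 if they
get their hosts data from that file.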

The problem with using NIS for host lookups is that
as far as I know, NIS doesn't cache this.  The NIS
caching daemon (nscd) doesn't cache it, either.  Maybe
the resolver caches it even if it gets it through
NIS; I don't know.  I do know the resolver is supposed
to cache things looked up through DNS.

-Miles


stupid nslookup

2007-09-18 Thread Miles O'Neal
Since forever, the Linux nslookup command hasn't supported
"help" or "ls", at least.  Has this changed in EL5?  Can
anyone recommend a better version of nslookup for linux?

Thanks,
Miles


Re: stupid nslookup

2007-09-18 Thread Miles O'Neal
Mark Stodola (and others) said (or words to this effect)...

|nslookup has been deprecated for years and shouldn't be used.  I'd 
|recommend looking at using the replacement called 'dig' for queries.  
|See if that solves your problems.

I finally found out how to do what I needed in dig.
I really don't care much for dig's interface or docs,
but since it's supported, I guess I'm stuck with it.
Thanks to all who replied.

-Miles


Re: 64-bit application

2007-09-26 Thread Miles O'Neal
Troy Dawson said...

|It really comes down to your application.  If possible, try it on two 
|comparable CPU setups, one AMD and one Intel.  I've seen some wildly lopsided 
|tests, try to at least give them the same amount of memory and the same disks.
|Then run your application on it, and see which is faster.
|
|That's how I decided I like the Opteron.  On my tests (recompiling rpm's) the 
|Opteron beat the Xeon.  But I saw other people with the exact same setup, and 
|for them the Xeon beat the Opteron.  It all came down to the application.

And that can change over time.  For years
we bought only AMD-based systems, because
most of our apps consistently ran better
on them.  Then we found a couple that were
decidedly better on Intel.  When we tested
equivalent servers for the last set of
compute farm systems, Intel won hands down.

-Miles


EL4.4 and older Logitech PS2 mouse

2007-10-02 Thread Miles O'Neal
We started a 4.4 migration a few months ago for our
compute servers but are just now moving the desktops.

My desktop system is working great except that the
mouse (an older, three button PS2 Logitech MouseMan)
has issues.

TUV or someone changed the mouse driver support
between 3.0.4 and 4.4.  There are hardly any Logitech
entries any more, so I picked the Generic 3 Button
Mouse (PS/2) option.

This mostly works, but the cursor seems to resist
moving onto window grab handles (I use GNOME at the
moment, but switch between that and KDE to support
my users), and every once in a while it just goes
nuts, flying randomly around the screen no matter
how I move it, acting like I'm holding down a button.
It was fine until I moved to 4.4.  The problem started
that day, and happens from once to thrice a day.

This mouse is the only type that really fits my hand
well, so I really don't want to change.  (I even have
a couple of spares at home.)

Any ideas?

Thanks,
Miles


Re: EL4.4 and older Logitech PS2 mouse

2007-10-02 Thread Miles O'Neal
Daniel Widyono said...

|What's your X mouse driver?  I had this issue at one point as well, and it
|turned out to be a change in IMPS/2 or PS/2.  I switched to the other and it
|worked fine from then on.

Currently IMPS/2.  I'll try the other when I get a chance.
Thanks.


Re: EL4.4 and older Logitech PS2 mouse

2007-10-04 Thread Miles O'Neal
Daniel Widyono said...

|What's your X mouse driver?  I had this issue at one point as well, and it
|turned out to be a change in IMPS/2 or PS/2.  I switched to the other and it
|worked fine from then on.

I finally got a chance to switch from IMPS/2
to PS/2 for the mouse protocol in X, and it
seems to have solved the problem.
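
For the archive: the change amounts to editing the "Protocol" option
in the mouse InputDevice section of /etc/X11/xorg.conf and then
restarting X.  A minimal sketch of that edit follows.  The sample
section (identifier, device path) is illustrative, not copied from
the affected machine, and the function name is made up.

```python
import re

# Illustrative mouse section; the real file will differ.
SAMPLE_XORG_CONF = '''Section "InputDevice"
    Identifier  "Mouse0"
    Driver      "mouse"
    Option      "Protocol" "IMPS/2"
    Option      "Device" "/dev/input/mice"
EndSection
'''


def switch_mouse_protocol(conf_text, old="IMPS/2", new="PS/2"):
    """Replace Option "Protocol" "<old>" with "<new>", leaving other lines alone."""
    pattern = re.compile(r'(Option\s+"Protocol"\s+)"%s"' % re.escape(old))
    return pattern.sub(lambda m: m.group(1) + '"' + new + '"', conf_text)
```

Only lines whose current value matches the old protocol are touched,
so a keyboard section with its own Protocol option is left as it is.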

Thanks!


radius authentication under Linux

2007-10-15 Thread Miles O'Neal
We have both VPN and ssh here for remote users.
Some users have both types of accounts so they
have a fallback when a problem occurs.  We
want to use a radius server so the ssh box and
the VPN appliance can share login information.

We have the VPN appliance working with a freeradius
server, but it's not clear what the easiest or best
way is to use radius for authentication with ssh
under linux.

Does this require a special ssh server, or ldap,
or what?  (We've never done ldap, either).

Thanks,
Miles


Re: Convert vmlinuz to vmlinux

2007-10-20 Thread Miles O'Neal
Wenji Wu said...

|I want to profile the system with oprofile, which only requires the vmlinux
|(the uncompressed kernel image). My system has only vmlinuz. Are there any
|way that I can convert vmlinuz to vmlinux?

Some details are here:

   http://en.wikipedia.org/wiki/Vmlinux

with reference to a way to do this.

Is a stripped kernel going to work for you?
If not, you need to get the source and build
one.  Or see if there is a kernel-debuginfo
package available for whichever version of
the OS you are running; this should have the
vmlinux you need, unless that has changed
with recent releases.
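
A commonly described way to do the conversion is to find the gzip
stream embedded in vmlinuz and decompress it.  The sketch below does
that, assuming a gzip-compressed image (other compressors need a
different decoder); the function name is made up.  The result is
still a stripped kernel, so the kernel-debuginfo route is the one to
use if symbols are needed.

```python
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip header: magic bytes plus "deflate" method


def extract_vmlinux(image):
    """Return the first embedded gzip stream in image that decompresses fully."""
    pos = image.find(GZIP_MAGIC)
    while pos != -1:
        # wbits=31 tells zlib to expect gzip framing.
        decoder = zlib.decompressobj(wbits=31)
        try:
            data = decoder.decompress(image[pos:])
        except zlib.error:
            data = None  # false positive: the magic bytes were not a real stream
        if data is not None and decoder.eof:
            return data
        pos = image.find(GZIP_MAGIC, pos + 1)
    raise ValueError("no complete gzip stream found")
```

Typical use would be to read the whole vmlinuz file in binary mode,
pass the bytes to extract_vmlinux(), and write the result out as
vmlinux.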


Re: seamonkey-mail and thunderbird

2007-11-14 Thread Miles O'Neal
Ken Teh said...
|
|What's the story behind seamonkey-mail and thunderbird on SL4x?  I 
|recall vaguely that there were problems with thunderbird and that it is 
|preferable to stick with mail from the UV's seamonkey packages?

I can't answer the initial question, but can note
that we have users running TBird 2.0.0.5 on SL4.4
and we're not hearing complaints.

-Miles


dvd burning software for SL4?

2007-11-19 Thread Miles O'Neal
I don't see anything in GNOME for burning DVDs in SL4.
Am I just missing it, or should I go get something else?
If something else, what?  Preferably something available
via a straight RPM download w/o having to deal with a yum
or apt repo.  A KDE app would be OK.

Thanks,
Miles


Re: NIC bonding

2008-01-08 Thread Miles O'Neal
We had to configure this by hand under EL3.

In /etc/modules.conf:
alias eth0 tg3
alias eth1 tg3
alias bond0 bonding
options bond0 miimon=100 mode=balance-alb updelay=1000

In /etc/sysconfig/network-scripts
ifcfg-bond0:
DEVICE=bond0
BOOTPROTO=none
BROADCAST=10.102.255.255
IPADDR=10.102.0.82
NETMASK=255.255.0.0
NETWORK=10.102.0.0
GATEWAY=10.102.255.254
ONBOOT=yes
USERCTL=no

ifcfg-eth0:
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

ifcfg-eth1:
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
MASTER=bond0
SLAVE=yes
USERCTL=no

I *think* that was it.

Obviously options and NIC names will vary.
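
Once the bond is up, the bonding driver reports its state in
/proc/net/bonding/bond0.  The sketch below pulls the mode and the
per-slave link status out of that text; the function name is made up
and the parsing assumes the usual "Key: value" layout of that file.

```python
def parse_bonding_status(text):
    """Return (mode, {slave: mii_status}) from /proc/net/bonding/<bond> text."""
    mode = None
    slaves = {}
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Bonding Mode":
            mode = value
        elif key == "Slave Interface":
            current = value
        elif key == "MII Status" and current is not None:
            # MII Status lines before the first slave describe the bond itself.
            slaves[current] = value
    return mode, slaves
```

On a live system the input would come from reading
/proc/net/bonding/bond0; any slave not reported as "up" is worth a
look.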

-Miles


newer ggv for SL4?

2008-01-30 Thread Miles O'Neal
We have a number of documents for which the
version of ggv in 4.4 and 4.5 does not work.
I can find newer i386 FC3 RPMs for ggv, but
not x86_64 RPMs.  Do any of you have a pointer
to newer 64-bit ggv RPMs that will work with SL4?

Thanks,
Miles


xcalc (was repos to add)

2008-02-18 Thread Miles O'Neal
Andrea said...

|I'm having the same problem I've had with the other RPM: it compiles, 
|but the resulting binary doesn't work correctly (it shows a window with 
|some misplaced squares and a zero).

You might have a bad app resources file somewhere.

Have you tried downloading the source directly
from www.x.org?


Re: eth0 and eth1 swapped on a Sony Vaio, with SL5.0

2008-02-19 Thread Miles O'Neal
Pierre Frenkiel said...
|
|On Mon, 18 Feb 2008, Beyerle Urs wrote:
|
|> try to bind eth0 and eth1 to the MAC address of the devices.
|
|   Thanks. That's a better workaround than mine, but is still a workaround.
|   I would like to understand why this is necessary since SL5.0, and
|   only for this machine...

It's not only since SL5 or only for that
machine, although it may well seem so.

In fact, it appears to be random with low
probability, but any given system seems
more or less likely with a given kernel
to exhibit this behavior.

I have no idea why, but we've seen it since
RH7.1 or RH8, I forget which[1].  A system
that worked fine on RH8 blew up on SL3 and
vice versa.  One that acted wonky on SL3
works fine on SL4 and vice versa.  We won't
move to SL5 any time soon because we're
driven by third party tool support, but it
won't surprise me if it still occurs.

Binding the ethX port to the MAC address
has generally worked for us and seems to be
the standard way to address this.
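
On EL-style systems the binding is done with an HWADDR= line in the
matching ifcfg-ethX file.  A small sketch of adding or replacing that
line follows; the function name and the MAC address in the usage note
are made up for illustration.

```python
def pin_to_mac(ifcfg_text, mac):
    """Return ifcfg text with exactly one HWADDR= line for the given MAC."""
    lines = [line for line in ifcfg_text.splitlines()
             if not line.startswith("HWADDR=")]
    lines.append("HWADDR=" + mac.upper())
    return "\n".join(lines) + "\n"
```

For example, pin_to_mac("DEVICE=eth0\nONBOOT=yes\n",
"00:13:a9:12:34:56") yields the same file with
HWADDR=00:13:A9:12:34:56 appended.  The MAC for each port can be read
from the output of ifconfig or from /sys/class/net/ethX/address.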

In my book this has always been broken
behavior.  I have no idea why it occurs,
and it's certainly annoying.

-Miles

[1] We only had one dual NIC Linux system
before that, and it simply never gets
rebooted except after prolonged power
failures.


Re: Automatic printer installation in SL5

2008-03-11 Thread Miles O'Neal
Troy Dawson said...
|
|Niels Walet wrote:
|> I hate replying to my own messages, but all indications are that redhat
|> gave up on having a CLI tool. The description found at
|> http://cyberelk.net/tim/2007/05/04/what-happened-to-printconf-tui/
|> "argues" the case for abandoning it. Not convinced I agree if these are
|> networked printers, and I only want to install a few with the correct
|> drivers. Of course we can install a print server, but why?
|> I still would like to see a high quality script that does this job though!
|> 
|> Niels
|
|I am not a printer expert (which is why I haven't replied earlier) but can't 
|you just take the configuration files in /etc/cups/ from the one machine that 
|has them all, and just copy it to the other machine?

You [potentially] need to change or copy over several files.

cups/cupsd.conf
cups/lpoptions
cups/ppd/$PRINTER.ppd # for each PRINTER you care about
cups/printers.conf

We just keep a set of cups tarballs for each network
that has different settings.  An rpm would of course
work, also.

If you want things identical, you can, indeed, just
clone the /etc/cups/ tree.
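
A rough sketch of how such a tarball could be built is below.  It
collects the files listed above for a given set of printers and skips
any that do not exist.  The function names and the archive layout
(everything under a top-level "cups" directory) are assumptions for
illustration, not a description of our actual tarballs.

```python
import os
import tarfile


def cups_files(printers):
    """Relative paths (under /etc/cups) worth carrying to another machine."""
    files = ["cupsd.conf", "lpoptions", "printers.conf"]
    files += ["ppd/%s.ppd" % name for name in printers]
    return files


def make_cups_tarball(cups_dir, printers, out_path):
    """Write a gzipped tarball of the CUPS config; return the files included."""
    added = []
    with tarfile.open(out_path, "w:gz") as tar:
        for rel in cups_files(printers):
            src = os.path.join(cups_dir, rel)
            if os.path.exists(src):
                # Store under cups/ so it unpacks cleanly inside /etc.
                tar.add(src, arcname=os.path.join("cups", rel))
                added.append(rel)
    return added
```

Run as root against /etc/cups with the printer names you care about,
unpack in /etc on the target machine, and restart cups there.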


Re: spreadsheet

2008-05-19 Thread Miles O'Neal
Sara Vanini said...

|is there a good spreadsheet software in scientific linux, like Windows 
|Excel?

OpenOffice (www.openoffice.org)

The 3.0.0 beta includes support for Office 2007
formats, but the release before that (2.4.x)
does everything else well.


another spreadsheet

2008-05-19 Thread Miles O'Neal
There's also gnumeric, which is a standalone spreadsheet.
But it's getting harder to find RPMs for newer builds.

Apologies for separate posts.


Firefox 3 vs EL4

2008-06-18 Thread Miles O'Neal
It's going to be a while before we're allowed to
move past EL4.  Firefox 3 requires a relatively
recent version of libpangocairo, which isn't
available in RPM form before FC7 that I can tell.

Does anyone have an RPM that would work with EL4
(or FC3)?  Or do I need to hunt down cairo and
pango source and build it?

Thanks,
Miles


NFS problems - EL4?

2008-06-19 Thread Miles O'Neal
Looking through things it appears this was an issue
with some EL4 kernels as well.  Does anyone know
in which EL4 release this showed up?


Thanks,
Miles


Re: NFS problems - EL4?

2008-06-19 Thread Miles O'Neal
|If you are referring to the NFS problem discussed in an earlier
|thread, no EL4 kernel is affected by this bug as far as I know.

The redhat bug referenced in a recent post in this thread
said it also existed in EL4.  They forked an EL4 bug.  It
was slated to be fixed in either 4.7 or 4.8.  There's
probably a kernel patch, but I got the impression it showed
up after the first EL4 release.  Nowhere did it say which
EL4 release first had the bug.

-Miles


can't update SL4 with yum

2008-06-22 Thread Miles O'Neal
I have a system that's been running the original SL4 for a while.  Last
night I tried to update it following the yum update instructions on the
SL docs How To page.  Twice.

Both times, at the end, it gave me a handful of missing dependencies
for mozilla or something similar.  And nothing changed, either in grub
or after the reboot.

This morning, going through old emails, I found something about a new
yum in the contrib section.  Decided to try that.

   yum --enable=sl-contrib install yum\*

This went along fine, but after downloading python, etc, I got:

warning: rpmts_HdrFromFdno: V3 DSA signature: NOKEY, key ID ff6382fa
public key not available for sqlite-3.1.2-3.0.el4.kde.i386.rpm
Retrieving GPG key from file:///etc/pki/rpm-gpg/RPM-GPG-KEY-kde-redhat
GPG key retrieval failed: [Errno 5] OSError: [Errno 2] No such file or directory: '/etc/pki/rpm-gpg/RPM-GPG-KEY-kde-redhat'

I then ran

   yum --enable=sl-contrib install yumex\*

which downloaded two packages for yumex and then whined exactly
as above.

And, of course, yumex isn't installed.

I'm using the default yum configs, including repos.

Questions:

1) Do I need to rerun the full yum update and send the dependency whines,
   or does someone recognize the problem already?  Would running
   "yum upgrade" work better?  I used update because the page said to.
2) Why am I getting the GPG whine trying to update yum, and is there a way
   around it?  Is that out of date?  Then why can't I get yumex?
3) Is there any way to avoid downloading all the stupid KDE packages for
   gazillions of languages we don't care about?  Or do we have to just
   nuke all the original versions?  It's always annoyed me to have the
   installer ask me questions about which languages I want, but then some
   packages (kde being the worst) install everything but Martian, anyway.
   Personally, I'd find Martian more useful than most of these, but I do
   understand that's not the case for everyone else. 8^)

Thanks,
Miles


Re: yppasswd on SL5

2008-07-03 Thread Miles O'Neal
Eve V. E. Kovacs said...
|
|Does anyone know the correct hole to punch in the firewall on an
|SL5.x NIS server so that yppasswd works on the clients? I find if I
|drop the firewall on the server, yppasswd works on the clients, but
|if it is in place I get a message saying that
|yppasswd: yppasswdd not running on NIS master host
|even though it is.

Normally these get assigned dynamically by
the portmapper, which makes it difficult
to know which ports to lock down.

This looks like a way around it:

   http://www.ale.org/pipermail/ale/20031030/002564.html

[I haven't tried it as our firewall to the
world is solid, and internally we just lock
servers down and run only necessary services
with reasonably high levels of security.  We
don't run iptables on anything I can think of
inside the firewall, and we don't let NIS, NFS,
etc through the firewall].
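
To see which ports the portmapper has actually handed out on the
server at a given moment, the output of "rpcinfo -p" can be parsed.
A small sketch follows; the function name is made up, and the port
numbers in any example are whatever your server reports, not fixed
values.

```python
def rpc_ports(rpcinfo_output):
    """Map service name -> sorted list of (proto, port) from 'rpcinfo -p' text."""
    ports = {}
    for line in rpcinfo_output.splitlines():
        fields = line.split()
        # Data rows look like: program vers proto port service
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        proto, port, service = fields[2], int(fields[3]), fields[4]
        ports.setdefault(service, set()).add((proto, port))
    return dict((name, sorted(pairs)) for name, pairs in ports.items())
```

The entries for yppasswdd and ypserv are the ones a firewall rule
would have to cover, and they can change on restart unless the
daemons are told to use fixed ports.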

-Miles


imap vs thunderbird

2008-07-09 Thread Miles O'Neal
We have an odd email setup.  Our primary, internal
mailserver is an old Solaris 9 system running postfix,
imap and pop.  This system is really getting bogged
down and we plan to move mail to another system soon.
But short term we have started running imap and pop
services via xinetd on two SL304 systems to offload
the main mail server AMAP.  All three servers mount
the mail directory from the same filer (NetApp) via
NFS.

For most users, changing the imap or pop server in
the client just works (as it should).  But a handful
of users, most running Thunderbird on Linux (a couple
running default clients on Mac or XP laptops) have
found that when they switch servers, they see an
empty inbox, or only a handful of the most recent
emails there, no matter how many times they try to
sync.  If they switch back to the slow, original
server, everything shows up again.

The desktops are running 4.4 and some version
of Thunderbird 1.5 or 2.0.

Have any of you seen anything like this?  Is it an
imap (and pop) problem on EL3, or a protocol bug,
or what?  I don't see it as a thunderbird issue
since we have a handful of Mac and Windows users
seeing it on clients that came with their MacOS
or XP.  Of course, weirder things than that have
happened...

Thanks,
Miles


sl5 sites

2008-07-25 Thread Miles O'Neal
The "How to Create an SL Site" page
doesn't reference SL5, just 3 & 4.
Is the process the same?  We never got
around to creating one before, but I'd
really like to play with this for 5 so
that when we switch (still a ways off),
we already have this in place.

Thanks,
Miles


Re: sl5 sites

2008-07-25 Thread Miles O'Neal
Jon Peatfield said...

|If you just want to add extra packages for the install then you can do 
|that just creating an extra 'yum repo' and pointing the sl5 installer at 
|it in addition to the standard ones.
|
|Then either with kickstart or a semi-manual install you get to see the 
|extra rpms in all the repos you have listed - and you can specify your own 
|groupings if you care to write a suitable comps xml file...
|
|Any other customisations can be done by a script (for kickstart) or extra 
|packages containing the magic (if you want to support manual installs).
|
|I guess if you want to make custom ISOs you need to arrange to either add 
|the extra repo into existing ones, ship an extra ISO of your repo or just 
|point them at a network accessible version.
|
|Keeping your own repo(s) of extra packages is handy for doing yum updates 
|from later anyway so you probably need that anyway.
|
|What else do 'sites' offer?

That's fine until you start using a different version of
a package than the vendor uses.  Maybe there's a way around
that in yum; I haven't really figured yum out yet.  Is there
a *good* doc on yum out there that explains such things?


Re: sl5 sites

2008-07-28 Thread Miles O'Neal
John Summerfield said...

|> That's fine until you start using a different version of
|> a package than the vendor uses.  Maybe there's a way around
|> that in yum; I haven't really figured yum out yet.  Is there
|> a *good* doc on yum out there that explains such things?
|> 
|
|Where are the equivalent documents for SL{3,4}?

SL Docs Howtos: https://www.scientificlinux.org/documentation/howto/create.site

|I'm not sure I understand the question, and "site" is awfully vague.

Sites are a feature of the SL distribution.  Someone there
noted they're the same for SL5.

|_I_ don't like adding different versions of packages than the vendor 
|provides as it instantly increases the maintenance burden; RH does a 
|fairly good job of maintaining the packages it offers, and the cloners 
|such as SL mostly do a good job of tracking that maintenance and of 
|maintaining their own additions.

I don't, either, but we don't always have a choice.
Since we chose to go with SL instead of RH, getting
RH to change something isn't an option.  If they
don't upgrade (and they don't unless they have to
since one of the main reasons for EL is stability)
then SL isn't likely to, either.

|As soon as one uses a different version of a package, to a greater or 
|lesser extent that support is negated.

Given my previous paragraph, it should be obvious that
the term "support" has no bearing here.

Sometimes we need a newer package than the one delivered
by TUV.  This can be driven by 3rd party software,  our
developers, obscure bugs, customer requirements, all sorts
of things.  Especially since we tend to switch to a new OS
distribution only once every 2-3 years.  For instance we
had a real need for OpenOffice 2 while we were still on a
platform that came with OO 1.x.

|Generally, and depending on budgetary and support requirements, I would 
|choose amongst RHEL, a RHEL clone and Fedora, or the equivalent other 
|distros.
|
|Where I require a wide range of prepackaged software, I tend to use 
|Debian (but it's a long time since that happened on my desktop, and with 
|the advent of support for virtualisation that has become less likely).

We have no choice.  Our tool vendors all support EL.  We don't
want to pay for hundreds of EL licenses so we use a rebuild.
So far the tool vendors have accepted that, so long as we're
on a rev they support.

Thanks,
Miles


imake?

2008-08-13 Thread Miles O'Neal
CentOS4 ships with a functional version of imake and xmkmf.
SL4 ships with imake but no xmkmf, and the imake there doesn't
seem to work, at least not without a lot of messing around.

I'm trying to build a bunch of legacy apps, and they require
imake.  I don't know autoconf, and don't have time to try to
suss out how to convert everything right now.

First, is there an RPM with imake and xmkmf that works for SL4?
Secondly, why does the xorg with CentOS ship with this while SL
doesn't?  Did someone at CentOS just fix this?  Or is it because
4.4 had it (the CentOS version I have available) but 4.6 doesn't
(the SL version I have available)?

Thanks,
Miles


Re: texlive kile kde?

2008-09-11 Thread Miles O'Neal
schoappied said...

|I'm interested in the scientific-linux distro. I have some questions 
|about some packages I like to use.
|
|Are R-project and texlive in scientific linux? Which versions?

SL is basically a rebuild of RHEL, with the concept of "sites"
added (sites provide a way to customize installations per site).
So if RH supports it, it's there, otherwise not.

There are some contributor packages uploaded to the site; you
can easily check these to see if the software you're interested
in is pre-packaged for use with SL.  But in general, RPMs built
for the equivalent version of the RH version or other rebuilds,
such as CentOS, work fine.  The usual dependency caveats apply.
Also, RPMs built for the equivalent Fedora Core will work.  For
instance, EL4 and hence SL4 are based on FC3, so FC3 packages
usually work for SL4.

|Is it possible to choose kde or gnome as desktop manager?

Yes.  Just run "switchdesk".

-Miles (not affiliated with any Linux distributor including SL)


Re: Security Breach

2008-10-01 Thread Miles O'Neal
|> Harry Enke wrote:
...
|> Is this in error?
|> "Fail2ban scans log files like /var/log/pwdfail or 
|> /var/log/apache/error_log and bans IP that makes too many password 
|> failures. It updates firewall rules to reject the IP address."
|> 
|> Examining logs after the event does not provide real-time protection.

I haven't looked at this tool, but sshblack scans the logs
*as they are written* to monitor for likely attackers and
updates the rules as the attacks begin.  You can set the
threshold in terms of number of tries over some period of
time that triggers a rule to block an apparent attacker.
You can be aggressive or easy-going as you like.  The result
is that a brute force attack must be so ponderously slow
as to be useless unless they just get ridiculously lucky
or you used a ridiculously simple to guess password.

I'd think this is what they mean.  It's real time monitoring
with real time blocking.
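
The "number of tries over some period of time" idea can be sketched
as a sliding window per source address.  This is only an illustration
of the thresholding logic, not sshblack's or fail2ban's actual code;
the class and method names are made up, and a real tool would feed it
from the log and add the firewall rule when it returns True.

```python
from collections import defaultdict, deque


class FailureTracker:
    """Flag an address once it fails too often within a time window."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)

    def record_failure(self, ip, timestamp):
        """Record one failed login; return True if ip should now be blocked."""
        recent = self._failures[ip]
        recent.append(timestamp)
        # Forget failures that have aged out of the window.
        while recent and recent[0] <= timestamp - self.window_seconds:
            recent.popleft()
        return len(recent) >= self.max_failures
```

With FailureTracker(3, 60), three failures from one address inside a
minute trip the block, while one failure every couple of minutes
never does.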


imap/pop server on EL5

2008-10-10 Thread Miles O'Neal
We're replacing our older servers, most of which
are running EL3.  The servers we want to handle
POP/IMAP with are running EL5.  EL4 and EL5
dropped support for the xinetd-based imap.  Cyrus
is way too complex, especially to tie in to the
standard NIS passwd/shadow setup (as far as I can
tell).  We don't want to set up another radius
server (we can't use the existing radius server
for reasons irrelevant to this discussion).

So I installed dovecot.  It mostly works out of
the box, but wants to force users to migrate where
they keep their files.  Most of our users like where
their files are, or are at least used to it, and
don't want to have to learn new locations.  Their
mail folder roots have a variety of names.  Dovecot
is extremely configurable, but not in ways useful
to us.

What I *really* want is to just use the old imap
server from xinetd, as it was compatible with how
our users work.  After quite a bit of searching, I
found exactly one rpm site with uw-imapd but it
had a dependency for an rpm nobody seems to have
for EL5.

SO... does anyone have an EL5 RPM of the old imapd
for xinetd, or can anyone definitively tell me that
dovecot will search and use multiple folder root
dirs and point me to docs that explain how to do this?

[NOTE: I tend to just say EL because we have a mix
of SL and CentOS.]

Thanks,
Miles


Re: imap/pop server on EL5

2008-10-13 Thread Miles O'Neal
Jan vandenBerg said...
|
|Hi, Miles. We recently switched from uw-imap to dovecot, mainly to gain 
|support for multiple read/write sessions per mailbox, and to reap the 
|performance gains of Dovecot's indexes on old-fashioned mbox files. We've 
|been very pleased with the results. In our case, we already had folks 
|using a pretty consistent folder location, and we felt like it was worth 
|the effort to help migrate the few stragglers in order to converge on One 
|True Folder Location. But if you have some knowledge of where folks store 
|their mail folders and don't want to deal with migrating everybody to one 
|location, I _think_ you can simply specify multiple folder paths in 
|dovecot's mail_location setting.

The problem is that we don't know everywhere they are.
Once upon a time, the common *nix mail clients used a
common location.  The big question was whether to use
~/mail or ~/Mail.  Over the past decade, however, the
GUI-based clients have started splattering email all
over the place.  We are understaffed and don't really
have the bandwidth to track them all down...

At some point in the future  we'll do this, but this
week we just needed to get off an overloaded server
ASAP.

[snippage]

Thanks for the info; I'll file it against the day
we can move up to dovecot.

-Miles


Re: SL4 install on Intel 82801 RAID1

2008-10-13 Thread Miles O'Neal
|I am trying to install SL4 on Dell Dimension 9200 with Intel 82801 
|(ICH8) Raid controller. Not sure how to proceed.
|
|I have configured RAID1 array using the built-in Intel config utility (I 
|have two physical drives, 320G each). When I try to run the 4.5 
|installer, it sees two separate drives rather than the array. I do not 
|want to create the software RAID, and I am not 100% sure what LVM is.

LVM, the logical volume manager, is optional.  You don't
have to use it.

I know that on newer systems, we found we had
to either use 4.7 or 5.x to get RAID support,
especially with SATA.

-Miles


nfsd woes on 5.2

2008-10-31 Thread Miles O'Neal
We have a 5.2 system we're using as a storage
server/filer running nfsd.  We have hundreds
of nodes that can hit it at one time; these
clients are configured with autofs rather
than permanent mounts (legacy from the early
days).

We use NFS3 over TCP.  Originally we configured
the system with 100 daemons.  Very quickly we
started having jobs fail on the clients, and
the server log had lots of messages that say:

   kernel: lockd: too many open TCP sockets, consider increasing the number of nfsd threads

So we bumped it to 300, rebooting because we
had a new kernel to run, anyway.  Worked great
for a few days, then we started getting failures
again.

I bumped it up to 500 daemons and tried to
restart nfsd.  nfsd refused to start, saying
the port was busy.  I couldn't find anything
that I'd expect to use that port.  I finally
rebooted.  No nfs.  In the message log we now
had:

kernel: nfsd: Could not allocate memory read-ahead cache.
nfsd[6413]: nfssvc: Cannot allocate memory

[We have 8GB of RAM on the system, and at boot
time with 300 nfsd we don't even come close
to using 8GB.]

Backed down to 300, had to reboot as nfs would
not start.  It came up fine, but we still see
those pesky failures.

It gets more interesting.  Or bizarre.

% cat /proc/net/rpc/nfsd
rc 13537 33496396 192754161
fh 28 0 0 0 0
io 3943998555 1199297042
th 300 0 1188.353 239.850 65.863 16.361 1.857 0.000 0.000 0.000 0.000 0.000
ra 600 1328847 18752 16893 13305 9929 6954 5154 4301 3170 2710 0
net 226265416 0 226264783 70942
rpc 226260856 0 0 0 0
proc2 18 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
proc3 22 4477 96527885 5693324 38850837 37663631 12004 3694379 10771245 7160052 1719510 42932 0 3152360 971863 3965505 33110 159 4197685 14857 4550 0 8837637
proc4 2 0 0
proc4ops 40 2284365 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

As I understand things, the "th" line here says that
we have never come close to using all the nfs daemons
at one time!
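
For reference, here is how I'm reading that line, as a small sketch.
It assumes the usual layout for kernels of this era: thread count,
the number of times every thread was busy, then ten buckets giving
the seconds during which 0-10%, 10-20%, ... 90-100% of the threads
were in use.  The function name is made up.

```python
def parse_th_line(line):
    """Return (threads, times_all_busy, peak_percent) from an nfsd 'th' line."""
    fields = line.split()
    if not fields or fields[0] != "th":
        raise ValueError("not a 'th' line")
    threads = int(fields[1])
    times_all_busy = int(fields[2])
    histogram = [float(value) for value in fields[3:]]
    peak_percent = 0
    for index, seconds in enumerate(histogram):
        if seconds > 0:
            # Upper edge of the highest bucket that was ever occupied.
            peak_percent = (index + 1) * 10
    return threads, times_all_busy, peak_percent
```

Fed the "th" line above, this gives 300 threads, zero occasions with
all of them busy, and a peak somewhere in the 40-50% bucket.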

So, we have two (possibly) problems.

1) Are the stats wrong, or is the problem not really
   in the number of threads?  This is a fast, dual,
   quadcore SuperMicro server, so I'm not worried
   that it can't handle the load; we have much slower
   systems handling 100 threads without a hiccup
   (the nature of the projects means this newer
   system will get a lot more traffic).

   The NIC doesn't seem to be swamped.

   Is there a kernel param I need to tweak for
   more open sockets or something?

2) If I do need more daemons, how do I determine
   how much memory I need?  What is the limit on
   the number of daemons?

Thanks,
Miles


Re: nfsd woes on 5.2

2008-10-31 Thread Miles O'Neal
Stephen John Smoogen said...

|useful info deleted for focus.
|
|> So, we have two (possibly) problems.
|>
|> 1) Are the stats wrong, or is the problem not really
|>   in the number of threads?  This is a fast, dual,
|>   quadcore SuperMicro server, so I'm not worried
|>   that it can't handle the load; we have much slower
|>   systems handling 100 threads without a hiccup
|>   (the nature of the projects means this newer
|>   system will get a lot more traffic).
|>
|>   The NIC doesn't seem to be swamped.
|>
|>   Is there a kernel param I need to tweak for
|>   more open sockets or something?
|>
|
|actually I think you need to look at the various nfs kernel proc/sys
|items first before bumping up the number of threads. You could be
|saturating various memory handlers and such and then you are just
|exacerbating the problem with more threads and such. The process may
|be running out of open files or other items.

Can you recommend a good doc for tuning these in the 2.6 kernel?
sysctl -a doesn't show me anything that looks problematic but
maybe I just don't know what to look for in this case.  We just
started using the 2.6 kernels...


|> 2) If I do need more daemons, how do I determine
|>   how much memory I need?  What is the limit on
|>   the number of daemons?
|
|Well the big issue may not be memory at that point but 32bit versus
|64bit. The box might run out of possible allocations at 4GB of ram as
|that is as much one process can map to. I am guessing that each nfsd
|is allocating potential memory it can use for readahead and is running
|out of what it can set-aside for a buffer.

It's all 64 bit hardware and the 64 bit distro.

% cat /proc/cpuinfo
[8 of these]
processor       : 7
vendor_id       : GenuineIntel
cpu family      : 6
model           : 23
model name      : Intel(R) Xeon(R) CPU   E5430  @ 2.66GHz
stepping        : 6
cpu MHz         : 2666.829
cache size      : 6144 KB
physical id     : 1
siblings        : 4
core id         : 3
cpu cores       : 4
fpu             : yes
fpu_exception   : yes
cpuid level     : 10
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx est tm2 cx16 xtpr lahf_lm
bogomips        : 5332.77
clflush size    : 64
cache_alignment : 64
address sizes   : 38 bits physical, 48 bits virtual
power management:

% uname -a
Linux elcampo 2.6.18-92.1.13.el5 #1 SMP Wed Sep 24 19:32:05 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux


Re: internet

2008-11-08 Thread Miles O'Neal
suhaiL khan said...

|i am using SL - 5.1 64 bit version. i am a new user. i
|want to know how to configure the internet connection in SL - 5.1. i am
|using cable connection. i will appreciate the help provided.

I'm not sure what you're really asking.

It should just come up and work.

If you don't have a good firewall in your router,
you definitely want to run iptables.  What did
you choose for "firewall" during the configuration
after "first boot"?

For that matter, what did you do about "network"
when you were running the install?


Re: ftp.scientificlinux.org was rebooted

2008-11-09 Thread Miles O'Neal
John Summerfield said...
|
|Troy Dawson wrote:
|> Hello,
|> ftp.scientificlinux had to be rebooted here, sometime around 9:00 am 
|> Central Standard Time (the time in Chicago).  And it took a while to 
|
|Please remember people that the rest of us have difficulty converting 
|your time to our time. Can you convert ours to yours? Try this:

Ah, yes.  Troy, please convert to each of our local
times, and list it beside each of our names!

8^)

|10:30 [EMAIL PROTECTED] ~]$ date
|Mon Nov 10 09:46:13 WST 2008
|09:46 [EMAIL PROTECTED] ~]$

John (and everyone else), just plug "time zone calculator"
into the search field of your favorite web search, and you'll
find tools galore for this.

Just plug in the time Troy gave, along with either CST or
America/Chicago or US Central, or whichever variant your
chosen calculator provides, and there you are.
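
Or, for those who would rather script it, a minimal sketch using
fixed UTC offsets.  The date below is an assumption (only "around
9:00 am Central Standard Time" was given), the function name is made
up, and fixed offsets ignore daylight saving, so use your own current
offset.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=-6), "CST")  # US Central Standard Time


def convert(local_time, utc_offset_hours):
    """Convert an aware datetime to the zone at the given fixed UTC offset."""
    target = timezone(timedelta(hours=utc_offset_hours))
    return local_time.astimezone(target)


# Assumed date; only the time of day was quoted in the announcement.
reboot = datetime(2008, 11, 9, 9, 0, tzinfo=CST)
```

convert(reboot, 0) gives the UTC time, and any other offset works the
same way.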

-Miles


Re: xpdf

2008-11-13 Thread Miles O'Neal
Gasser Marc said...

|does anybody know where I can get xpdf for SL5?

I plugged this into google:

   xpdf el5 rpm

; the top one looks good.  Just change i386 to x86_64 if you need that.
|
|Thanks
|Marc
|


-- 
Miles O'Neal

Intrinsity, Inc.   |[EMAIL PROTECTED]
11612 Bee Caves Rd.|512-421-2242 (v)
Bldg II / Suite 200|512-577-3133 (c) <- best bet
Austin, Texas 78738|512-263-0795 (f)


SL5 large file systems

2008-11-19 Thread Miles O'Neal
Our local vendor built us a Supermicro/Adaptec
system with 16x1TB SATA drives.  We have a 12TB
partition that they built as EXT2.  When I tried
to add journaling, it took forever, and then the
system locked up.  On reboot, the FS was still
EXT2, and takes hours (even empty) to fsck.  Based
on the messages flying by I am also not confident
fsck really understands a filesystem this large.

Is the XFS module stable on 5.1 and 5.2?  (The
vendor installed 5.1 because that's what they
have, but I ran "yum update").

Anyone have experience with filesystems this large
on a Linux system?  Will XFS work well for this?

If any of you have successfully used EXT3 on a
filesystem this large, are there any tuning tips
you recommend?  I was thinking of turning on
dir_index, but somewhere I saw a warning this
might not work with other OSes.  Since we do have
some Windows and Mac users accessing things via
SMB, I wasn't sure that was safe, either.

This is a 64bit system. 8^)

Thanks,
Miles


Re: sci-linux as a pseudo-embedded os,

2008-12-08 Thread Miles O'Neal
Salvador Aguinaga said...

|I've wondered if any of you have used scientific linux on a x86 hardware to
|run a single application ( like an embedded OS ) and disable updates and
|remove unneeded packages?

Sure, at least a dozen times.

|Or if you there is a better alternative to accomplish the same thing.

That depends on what you are trying to do with it.
For instance when I needed a bridging firewall a
few years back, there was no easy way to do that
with SL (I tried for 2 days), so I gave up and used
freesco.

But, just for example, we've done this with DMZ
systems, fileservers, and a handful of others.  
Very straightforward; just install what you need,
remove the extras it installed anyway, and only
turn on the minimum services necessary.

Update manually as needed.

Works fine.
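
The "only turn on the minimum services" step can be sketched roughly
like this.  It is a minimal sketch, not a tested procedure: the
function name services_to_disable and the allowlist are made up for
illustration, and it only prints the chkconfig commands rather than
running them.

```shell
# Print "chkconfig <svc> off" for every service name read from stdin
# that is NOT in the allowlist given as arguments.
# Nothing is changed on the system; the output is meant to be reviewed.
services_to_disable() {
    local keep=" $* " svc
    while read -r svc; do
        [ -z "$svc" ] && continue
        case "$keep" in
            *" $svc "*) ;;                        # allowlisted, leave it on
            *) echo "chkconfig $svc off" ;;       # candidate to disable
        esac
    done
}
```

One way to feed it, assuming the usual "chkconfig --list" column
layout, would be
chkconfig --list | awk '$5 ~ /:on/ {print $1}' | services_to_disable sshd network syslog
and then read the output before piping any of it to a shell.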


stinit.def for HP Ultrium 1840 LTO-4 tape drive?

2008-12-15 Thread Miles O'Neal
We are having a problem with a new SCSI tape drive.
Overland says we need an stinit.def file, but refuses
to give us one, saying we need one "from your OS vendor".

Anyone have such a beast laying around?

Thanks,
Miles


Re: stinit.def for HP Ultrium 1840 LTO-4 tape drive?

2008-12-15 Thread Miles O'Neal
Mark Stodola said...

|This is unknown territory for me, but the manpage for stinit seems to 
|outline what makes up the stinit.def.
|Also, Appendix C of this pdf contains an example: 
|http://h20331.www2.hp.com/ERC/downloads/4AA1-6074ENW.pdf

Thanks; it's a good start.  We *think* we have the
right values to plug in, but we want to be sure.


SL5 large file systems, part ][

2008-12-31 Thread Miles O'Neal
About a month ago, I said:

> Our local vendor built us a Supermicro/Adaptec
> system with 16x1TB SATA drives.  We have a 12TB
> partition that they built as EXT2.  When I tried
> to add journaling, it took forever, and then the
> system locked up.  On reboot, the FS was still
> EXT2, and takes hours (even empty) to fsck.  Based
> on the messages flying by I am also not confident
> fsck really understands a filesystem this large.
> 
> Is the XFS module stable on 5.1 and 5.2?  (The
> vendor installed 5.1 because that's what they
> have, but I ran "yum update"), so it's effectively
> 5.2.

I rebuilt the 12TB partition as XFS.  But after about 11GB
of data moved, the system locked up with "bus error".
After reboot, the system looks fine.  The vendor always
runs diags and burns the systems in, though it could
still be a hardware or driver issue.

Is it likely to be the OS/XFS with the large partition,
or would you just send it back for diagnostics again?
Supermicros have been very reliable for us, but between
the Adaptec and 1TB SATAs, and the large partition, I'm
not sure how reliable the current drivers are.

I'd hoped to just have one mount point, but could make
2-3 smaller partitions if that seems to be the likely
issue.

Thanks,
Miles


Re: New logo for SL6?

2009-01-08 Thread Miles O'Neal
Troy Dawson said...

|If everyone is ok with this, the contests would have the following rules.
|
|The logo must be licensed GPL v2 or applicable Creative Common's license.
|The logo must be in SVG format.  (This should allow it to scale better)
|It should be a carbon, bohr style atom.  Bohr atoms have the electrons circle 
|around the nucleus, which contains protons and neutrons.
|There should be 6 electron's, 6 neutrons, and 6 protons.  Not all neutrons and 
|protons must be seen.
|The color scheme should be somewhat the same as previous logo's.
|In the end, someone should be able to look at the logo and say that although it
|is newer and has more electron orbits, it clearly is the logo for Scientific Linux.
|
|Does this sound like a good idea? a bad idea?
|Any variations or changes to what I'm thinking?

I'd use Penguins instead of the electrons.  I wanted
to do this for the previous contest, but I'm not that
great at graphics tools.


Somewhat OT: thoughts on filers?

2009-01-14 Thread Miles O'Neal
We've been using NetApp for our tier 1 storage
and x86/x86_64 RAID systems for tier 2 (no snaps
or backups, but fast access) and tier 3 (archived
"write once read occasionally").  This has worked
pretty well, but the NetApp solution hasn't scaled
as well as we'd like.  On the other hand, it plays
really well with NFS, and we haven't had much in
the way of CIFS problems (we have some Windows and
Mac laptops using this).

Beyond the scaling issue, NetApp is also pretty
pricy.  We've been looking into other vendors, and
at the moment the leading contenders seem to be
Isilon and Pillar.

Anyone have any experience with these vendors and
any kudos or warnings?  95% of our computers and
data access are Linux (mostly EL4, a few EL3 and
EL5), but playing well with Mac and Windows systems
is also important.  Feel free to reply offlist if
you like (m...@intrinsity.com).

We run NFSv3 over TCP, NTP, DNS, NIS, all the usual
stuff.  We may use LDAP in the future, not sure yet.
We have a couple of Windows servers providing Windows
Domain services, etc.  Cisco switches (multiple GB
ethernet lines per filer head).  Fiber-attached
LTO4 robot for backups (will likely go to a SAN
switch for backups in the near future).

We will likely consolidate some of our tier 2 and
3 onto any new filers, but some will stay on the
x86_64 boxes for some time to come.

Performance, reliability, scalability- these are key.
We also need visibility into where the storage goes;
for instance to easily find all the storage used by
a user, and where it is.  We'd prefer not to have to
purchase a third party app for that (hint, NetApp).

We have all sorts of data, but it's almost all file
based (not relational databases, etc).  We need good
NFS performance whether it's writing 100GB files or
reading and writing directories with 100,000 files in
them.  It took some time, but we found the magic
combination for NetApps and the Linux boxes; I would
hope the Linux side of things would not change with
any good quality filer... any guidance?

Thanks,
Miles


data visualization tool?

2009-02-06 Thread Miles O'Neal
Howdy!

I've started looking for a data visualization tool.
Preferably one that lets me select data from a mySQL
or pSQL database and graphically determine how to
display it (it has to be usable by techies, managers,
and admins if possible).  Does such a thing exist?

Thanks,
Miles
--
Miles O'Neal
m...@intrinsity.com
30° 18' 39N, 97° 55' 1W


Re: SLF47 2.6.9-78.0.1 kernel/tg3 3.86 driver vs. BCM5704

2009-02-11 Thread Miles O'Neal
Connie Sieh said...

|We have been experiencing intermittent network failures on systems running 
|SLF47 2.6.9-78.0.1 kernel/tg3 3.86 driver .  They waiting for the tcp to 
|finish, which never happens.  The failures are load-related.
|
|This error happened from time to time over the life of these nodes
|but they were operating more or less stably under SLF45/x86_64, tg3 3.77
|driver.

All I know is that I have found the Broadcom Linux drivers to be
so unstable between releases that I try hard to avoid them.  They
will work just fine with one release, but the next release we try
may be a nightmare.

We have a variety of servers from a few years ago that came with
Broadcom on-board NICs.  Each worked with the initial install it
had been certified with.  OS upgrades were painful, and in each
case we gave up after several hours of fighting drivers and
installed NIC cards using another vendor's chip.

The only systems we've ever had that were rock solid with Broadcom
are some Dell Optiplex 330n desktops.  But we've only run EL4.4 on
them, and we're nervous about the switch to 5.x .  The NIC type was
the one thing we forgot to check on the eval before doing the bulk
order. 8^/

I know that doesn't help you but wanted to add the data point for
anyone else considering systems with these NICs.

FWIW, they seem to work fine in the handful of Windows systems we
have with them. 8^0

-Miles


Re: AFS on XFS or ext3?

2009-02-19 Thread Miles O'Neal
Brent L. Bates said...
|
| Michael Mansour, cut the CRAP/FUD out!  I would NOT depend on ext3 if I

Let's not get nasty.

|CARED about what was stored on my disks.  I ONLY use ext3 if the data stored
|is NOT of "very high importance".  I use XFS when I DO CARE, so I use it all
|the time.  XFS is the most reliable, dependable, and robust file system out
|there and independent tests have consistently shown it to be much faster than
|ext3.  It has far more YEARS and Pentabytes of service under it's belt than
|ext3, a LOT more!  I've had XFS do a much better job of surviving system
|crashes and disk failures than ext3.

We use ext3 in production on desktops, compute servers,
and crucial, fast storage, and have never had a problem
with it.  We have stuck with it because (a) it just works,
(b) we're familiar with it, and (c) it installs by default.

We use xfs on systems where we need something ext3 doesn't
provide (such as many, many millions of inodes, etc).

Both have performed flawlessly for us, ext3 since RH9
or EL3, whenever we installed it.


-- 
Miles O'Neal

Intrinsity, Inc.   |m...@intrinsity.com
11612 Bee Caves Rd.|512-421-2242 (v)
Bldg II / Suite 200|512-577-3133 (c) <- best bet
Austin, Texas 78738|512-263-0795 (f)


Re: NFS default protocol change

2009-02-26 Thread Miles O'Neal
P. Larry Nelson said...

...
|I am currently going thru and adding "udp" to all the SL4.7 clients' fstab
|entries so they will use UDP rather than TCP.
|
|My main question is, lacking any explicit protocol designation in the fstab,
|how can one tell which protocol a client is using?

You can find the tcp connections using

   netstat -a | grep nfs

or just run

   cat /etc/mtab

to see each mount.
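
If the mount options shown there include the protocol, something
like the following could pull it out per mount.  This is only a
sketch: nfs_protos is a made-up name, and it assumes the options
field carries either proto=tcp / proto=udp or a bare tcp / udp
keyword, which may not hold for every kernel or mount version.

```shell
# Print "<mount point> <protocol>" for each NFS mount in a
# mounts-format file (default: /proc/mounts).
# Assumption: the protocol appears in the options field (4th column)
# as proto=tcp / proto=udp, or as a bare tcp / udp keyword.
nfs_protos() {
    local mounts_file=${1:-/proc/mounts}
    awk '$3 == "nfs" || $3 == "nfs4" {
        proto = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] ~ /^proto=/)
                proto = substr(opts[i], 7)      # text after "proto="
            else if (opts[i] == "tcp" || opts[i] == "udp")
                proto = opts[i]
        }
        print $2, proto
    }' "$mounts_file"
}
```

Run as "nfs_protos" to read /proc/mounts, or "nfs_protos /etc/mtab"
to check the other file; a mount that reports "unknown" just means
neither form of the option was listed.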

|And lastly, why wasn't the change documented in the release notes?
|
| From what I've gleaned about the two protocols from googling, it appears
|that TCP has advantages on a lossy network but that's not our scenario.
|It also is not a stateless protocol, like UDP, so if a server crashes in
|the middle of a packet transmission, the client will hang and filesystems
|will need to be unmounted and remounted.  So it would seem UDP is better,
|at least in our case.

We found things to be much more robust, and only very slightly
slower, using tcp.  We had plenty of hangs using udp, but that
was many kernel revs and other bugs back, so who knows?

-Miles


my ongoing battle with large filesystems

2009-03-05 Thread Miles O'Neal
recap: new 64 bit Intel quadcore server with Adaptec SATA RAID
controller, 16x1TB drives.  1 drive JBOD for OS.  The rest are
setup as RAID6 with 1 spare.  We've tried EL5.1 + all yum updates,
and EL5.2 stock.  We can't get /dev/sdb1 (12TB) stable with ext2
or xfs (ext3 blows up in the journal setup).

So I decided to carve /dev/sdb up into a dozen partitions and
use LVM.  Initially I want to use one partition per LV and make
each of those one xfs FS.  Then as things grow I can add a PV
(one partition per PV) into the appropriate VG and grow the LV/FS.
Between typos and missteps, I've had to build up and tear down the
LV pieces several times.  And now I get messages such as

  Aborting - please provide new pathname for what used to be /dev/disk/by-path/pci-:01:00.0-scsi-0:0:1:0-part6
or
  Device /dev/sdb6 not found (or ignored by filtering).

I clean it all up, wipe out all the files in /etc/lvm/*/*
(including cache/.cache), and try again, still broken.

I tried rebooting.  Still broken.

How can I fix this short of a full reinstall?

The whole LVM system feels really kludgy.  I suppose there's
not a better alternative at this time?

Thanks,
Miles


Re: my ongoing battle with large filesystems

2009-03-05 Thread Miles O'Neal
John Summerfield said...

|> recap: new 64 bit Intel quadcore server with Adaptec SATA RAID
|> controller, 16x1TB drives.  1 drive JBOD for OS.  The rest are
|> setup as RAID6 with 1 spare.  We've tried EL5.1 + all yum updates,
|> and EL5.2 stock.  We can't get /dev/sdb1 (12TB) stable with ext2
|> or xfs (ext3 blows up in the journal setup).
|> 
|> So I decided to carve /dev/sdb up into a dozen partitions and
|> use LVM.  Initially I want to use one partition per LV and make
|> each of those one xfs FS.  Then as things grow I can add a PV
|> (one partition per PV) into the appropriate VG and grow the LV/FS.
|> Between typos and missteps, I've had to build up and tear down the
|> LV pieces several times.  And now I get messages such as
|> 
|>   Aborting - please provide new pathname for what used to be /dev/disk/by-path/pci-:01:00.0-scsi-0:0:1:0-part6
|> or
|>   Device /dev/sdb6 not found (or ignored by filtering).
|> 
|> I clean it all up, wipe out all the files in /etc/lvm/*/*
|> (including cache/.cache), and try again, still broken.
|> 
|> I tried rebooting.  Still broken.
|> 
|> How can I fix this short of a full reinstall?
|> 
|> The whole LVM system feels really kludgy.  I suppose there's
|> not a better alternative at this time?
|
|How large a filesystem are you trying to create?

Each partition is roughly 1TB.  So each pv will be 1TB
(12 TB / 12 partitions per above), hence each vg and lv
will initially be 1TB.  As the FS grows on an lv, we'd
add another 1TB PV.
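
For reference, the layout described above (one partition per PV,
one LV per VG, grow by adding a PV) would look roughly like the
command sequence below.  It is a sketch only: the function names and
the vg/lv/mount names passed in are hypothetical, the commands are
printed rather than run, and it assumes an LVM2 recent enough to
accept the %FREE syntax.

```shell
# Print (not run) the commands to build one XFS filesystem on a
# single-partition PV/VG/LV stack.
lvm_plan_create() {
    local part=$1 vg=$2 lv=$3 mnt=$4
    echo "pvcreate $part"
    echo "vgcreate $vg $part"
    echo "lvcreate -l 100%FREE -n $lv $vg"
    echo "mkfs.xfs /dev/$vg/$lv"
    echo "mount /dev/$vg/$lv $mnt"
}

# Print (not run) the commands to grow that filesystem later by
# adding one more partition as a new PV.
lvm_plan_grow() {
    local part=$1 vg=$2 lv=$3 mnt=$4
    echo "pvcreate $part"
    echo "vgextend $vg $part"
    echo "lvextend -l +100%FREE /dev/$vg/$lv"
    echo "xfs_growfs $mnt"
}
```

For example, "lvm_plan_create /dev/sdb1 vg_data lv_data /data"
prints the initial build, and "lvm_plan_grow /dev/sdb2 vg_data
lv_data /data" prints the later growth step.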

|What blocksize are you using?

Default - how does it matter in terms of the errors?
The first few times I built everything it worked fine.
Now when I create the partitions and try to make each
one a PV, some fail as above.  If I nuke them and start
over, I get more failures-- maybe the same ones fail the
same way, maybe not.  Somewhere a config or resource
has gotten corrupt, but for the life of me I can't find
it.

|What research have you done?

I've gone through a half dozen docs on the web to set
things up.  I googled for the error messages, but couldn't
find anything useful.


Re: my ongoing battle with large filesystems

2009-03-11 Thread Miles O'Neal
Jon Peatfield said...

[snip]

|Re-configuring your RAID controllers to export as <2TB slices isn't fun,
|but it should be possible without a re-install (if a bit fiddly).

Thanks for all of this.

I'll look again, but I didn't notice anything obvious in the Adaptec
screens at boot that would do this.  Any key phrases to look for?

Thanks,
Miles


whining about root not existing

2009-05-04 Thread Miles O'Neal
I occasionally get things like this from 5.2 systems:

   sudo: uid 0 does not exist in the passwd file!

Root cron jobs occasionally have error messages based
on uid 0 not existing (and maybe root user, per se,
I don't recall for sure).  It's random and not very
frequent.

We have root in /etc/passwd .  We use NIS and nscd for
user accounts.  nsswitch.conf checks files then network
for passwd.  Usually it works, and we don't see this on
our 4.x systems (of which we have far, far more:
call it 300+ 4.4 systems and a couple of 4.7 systems,
against maybe a dozen 5.2 systems).
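
To confirm the lookup order on a given box, a small helper like
this can print what nsswitch.conf actually says for a database.  A
sketch only; nss_sources is a made-up name and it does nothing more
than read the file.

```shell
# Print the lookup sources configured for one nsswitch database,
# e.g. "files nis" for passwd.
# Second argument is the config file (default: /etc/nsswitch.conf).
nss_sources() {
    local db=$1 conf=${2:-/etc/nsswitch.conf}
    awk -v db="$db:" '$1 == db {
        $1 = ""                 # drop the "passwd:" field
        sub(/^ +/, "")          # and the separator left behind
        print
        exit
    }' "$conf"
}
```

"nss_sources passwd" shows the order for passwd; running
"getent passwd 0" in a loop on an affected host may then help catch
the moments when the lookup fails.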

Any idea what needs to be updated to stop this?

Thanks,
Miles
-- 
Miles O'Neal

Intrinsity, Inc.   |m...@intrinsity.com
11612 Bee Caves Rd.|512-421-2242 (v)
Bldg II / Suite 200|512-577-3133 (c) <- best bet
Austin, Texas 78738|512-263-0795 (f)


changing MAC address on "device not found"

2009-10-19 Thread Miles O'Neal
I'm installing 4.7 on some new Supermicro servers (X8DTN+ boards).
(We will roll out 5.x later, but can't do that until testing against
all apps is complete.  We need these servers "last week".)

These use an Intel NIC not supported in base 4.7 .  I manually install
from a stock 4.7 DVD, then load the driver source from Intel, build for
4.7 base, then upgrade the kernel to 2.6.9-78.0.22.ELsmp and copy drivers
built against that kernel.

When booting the base 4.7 kernel, I allow kudzu to configure
both NICs with DHCP because it seems to randomly pick which one will
be eth0.  (Only NIC 1 is cabled).

Oddly enough the base kernel sees the devices as needing the igb
driver but the 2.6.9-78.0.22.ELsmp kernel sees them as e1000e.

When I boot into 2.6.9-78.0.22.ELsmp I start in single user mode, and
adjust the ifcfg-eth? files to have eth0 use the lower MAC address and
eth1 to have no MAC address and not boot.  Then I reboot (to avoid
problems with prefetched files).  On three of the five systems I installed
so far this worked fine.  On the other two, the network service says
"device not found" for eth0.  When I look at the output of "lspci -v"
both ethernets have the same MAC address.
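
The ifcfg adjustment described above amounts to something like the
following.  This is a sketch under assumptions: lower_mac and
ifcfg_stanza are made-up helper names, the MAC addresses in the
example are invented, and the stanza only covers the DEVICE, HWADDR,
BOOTPROTO and ONBOOT keys.

```shell
# Return the lexically lower of two MAC addresses, upper-cased.
# For fixed-width colon-separated MACs this is also the numerically
# lower one.
lower_mac() {
    printf '%s\n' "$1" "$2" | tr 'a-f' 'A-F' | LC_ALL=C sort | head -n 1
}

# Emit an ifcfg-style stanza that pins a device name to one MAC.
# Third argument is the ONBOOT value (default: yes).
ifcfg_stanza() {
    local dev=$1 mac=$2 onboot=${3:-yes}
    printf 'DEVICE=%s\nHWADDR=%s\nBOOTPROTO=dhcp\nONBOOT=%s\n' \
        "$dev" "$mac" "$onboot"
}
```

For instance,
ifcfg_stanza eth0 "$(lower_mac 00:25:90:0a:1b:2d 00:25:90:0a:1b:2c)"
prints a stanza for eth0 using the lower of the two (made-up)
addresses; the output would be reviewed before going into
/etc/sysconfig/network-scripts/ifcfg-eth0.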

Three questions:
1) How does this happen?
2) What do I do to get the behavior I want, avoiding the problem?
3) How can I fix the two systems that are hosed?

The standard solutions (ifconfig, ethtool, GNU mac changer) all require
use of the software device name (e.g., eth0) but eth0 has not been assigned
because the device is "not found".

Thanks,
Miles


Re: Addendum: Automounter problems on SL5.3

2009-11-02 Thread Miles O'Neal
On Mon, Nov 2, 2009 at 6:48 AM, Faye Gibbins wrote:

Addendum:

It is felt (I have no records) that the problem is more frequent under 
SL5.3. Much more frequent.


More frequent than what?  5.2?  4.x?

We have been running 4.4 for a couple of years, and 4.7 for a few weeks on some 
systems, with no problem such as this.  When we do see a freeze it's related to 
the server or network, and unless the problem goes on for quite a while, the 
clients all recover when the problem goes away.


Re: Automounter problems on SL5.3

2009-11-02 Thread Miles O'Neal
If it seems to be NIS-related, are you using nscd?  We had to turn on nscd and 
tweak the nscd configs or we had a lot of DNS issues.  Without nscd we were 
seeing gethostbyname issues, although IIRC they were in torque more than the AM.


Re: kstars

2015-09-03 Thread Miles O'Neal

What happened to the SL contrib directories?

On 09/03/2015 10:13 AM, prmari...@gmail.com wrote:

Well, it comes down to server vs desktop.
Keep in mind that TUV for SL is RHEL, which is meant to be a business server 
distro. Kstars is a great application for a desktop but has no place on a 
server.
Back before RHEL 6, Red Hat tried to push an "Enterprise Desktop" variant of 
RHEL which included a lot of desktop applications and was missing a lot of the server 
applications. RHEL AS (Advanced Server) included everything from both the server and 
desktop variants; at the time, that was what SL was built off of. Now Red Hat doesn't 
push the Enterprise Desktop version as much as they used to because it never caught on as 
well as they hoped; furthermore, they minimized what they included in it to strictly what 
developers asked for, no more, no less. As for EPEL, anything can be added, assuming someone 
is willing to maintain the packages and you can get a Fedora project shepherd (kind of 
like a project manager) to sign off on it.

   Original Message
From: Alec T. Habig
Sent: Thursday, September 3, 2015 09:20
To: Efraim Yawitz
Cc: Mailing list for Scientific Linux users worldwide
Subject: Re: kstars

Efraim Yawitz writes:

Why is kstars no longer part of Scientific Linux? I'm still using 5.4
which has this wonderful and small planetarium program. Why was it
removed from later versions?

doesn't directly answer your question, but I use xephem: but have had to
roll my own rpm for many releases now. Just compiling a new version now
as I use it for my intro astronomy course (for making current starfields
etc for my lectures).

Over time, non-core programs come and go from TUV repository, which
composes 99% of all the packages in SL, and which the SL maintainers
have no control. You can find many of the things you'd like in a
supplemental repository like EPEL (unfortunately, neither kstars nor
xephem): if there's a critical mass of people who want the thing.
But sometimes you just gotta do the old fashioned thing and compile it
yourself :(




--
Miles O'Neal
CAD Systems Engineer
Cirrus Logic | cirrus.com | 1.512.851.4659


Re: a year later - CERN move to Centos - what are we doing?

2016-01-12 Thread Miles O'Neal
Has CentOS got support yet? My employer moved to RHEL because we got 
tired of fighting third party vendors over their support on non-RHEL 
platforms, but I personally always found SL to be more consistent and 
quicker to release... and they had much better support.


On 01/12/2016 02:04 PM, lejeczek wrote:

hi,
after my first post I made a move, I should say a smaller rather, I 
did migrate a small HA cluster from SL7.1 to Centos7.2.
Instructions to do that I'm sure everybody can easily look up, just 
one tiny manual intervention was needed above what is already covered 
by a doc on Centos website.
But most importantly nothing broke, all the usual servers, web, mail, 
other net related services including HA carried on seamlessly.
Like I said earlier, and everybody knows, a lot, a lot is already 
shared, differences boil down to maybe a philosophy behind each 
organization responsible for each snip-off, some organizational and 
administrative processes, protocols.
Slight advantage seems that Centos offers, but expected as they are 
closer to the source in the lifecycle supply chain, is higher revision 
of some rpm packages, I see I get slightly newer kernel for example, etc.


If I was to voice my opinion out - and scientific devel & other 
responsible culprits are listening - then I say: go for it, get 
together, merge userbase, share devel jobs, duties, etc. Merge/share 
or even better, tell Redhat we want to use their, shared by all, bug 
reporting system.


I've decided, I'll be moving over to Centos, gradually but surely.
Note, one thing to remember if you did SL -> Centos, afterwards, is 
yum repos, make sure what you have enabled there.


cheers

On 12/01/16 09:48, lejeczek wrote:

hi everybody,

I've wondered and got curious, what do you guys, gals think about 
that move?
More importantly do you think it's a step we SL users should also 
consider?
CERN mention there were talks between them, Fermilab - what are 
Fermilab plans with regards to future releases, with regards to SL in 
general? (Not much info on the website.)
I personally am just about to trial a migration from SL7 to Centos. 
I'm thinking it's inevitable, am I wrong?


best wishes.




--
Miles O'Neal
CAD Systems Engineer
Cirrus Logic | cirrus.com | 1.512.851.4659


Re: a year later - CERN move to Centos - what are we doing?

2016-01-13 Thread Miles O'Neal
" >> /dev/tty1

echo "Failed to find a suitable system disk" >> /dev/tty1
echo "The system will be rebooted when you press Ctrl-C or Ctrl-Alt-Delete." >> /dev/tty1
echo "" >> /dev/tty1

while true; do
sleep 1
done

> fi
> fi


echo "Installing linux to $INSTALL_DISK" | tee -a /tmp/ks.log >> /dev/tty3

echo "Installing linux to $INSTALL_DISK" >> /dev/tty1
echo "" >> /dev/tty1
# Done figuring out where to install
###


#Write a file out to be included below for disk config
cat << EOF > /tmp/partitions
clearpart --drives=$INSTALL_DISK --initlabel --all
zerombr yes
part swap --recommended --ondisk=$INSTALL_DISK
part / --size=25600 --ondisk=$INSTALL_DISK
part /var --size=4096   --ondisk=$INSTALL_DISK
part /export/scratch --size=128 --grow --ondisk=$INSTALL_DISK
EOF



Graham
Good idea! Here's ours. Our constraints (swap, disk sizes, etc.) may of 
course be completely different for someone else's setups. We use this 
across everything from desktops to servers, with slightly different 
layouts for each. This has evolved as we moved from 5 to 6 (we are now 
on 6.5).


### Disk partitioning is handled in a pre script.
%include /tmp/disk-parts

%pre
### Determine physical RAM; we hope to have that much swap.
### Need to use /proc/meminfo since free was removed from install.img
SYSTYPE='DESKTOP'
MEM=$(grep ^MemTotal /proc/meminfo | awk '{print $2}')
### MemTotal is reported in kB, so divide by 1024*1024 to get GB.
GB=$(($MEM / 1048576))
if [ $GB -lt 24 ] ; then
    GB=24
fi
MSIZE=$(($GB * 1024))

### Determine the style of disk labels used. Based on that,
### determine the number and size of the physical disks. For now
### assume no more than two disks
grep 'cciss/c[0-9]d[0-9]' /proc/partitions > /dev/null 2>&1
if [ $? == 0 ] ; then
    DISKS=(`grep 'cciss/c[0-9]d[0-9]$' /proc/partitions | awk '{print $3 " " $4}'`)
else
    DISKS=(`grep '[shv]d[a-z]$' /proc/partitions | awk '{print $3 " " $4}'`)
fi
NDISKS=$((${#DISKS[@]} / 2))
SIZE=${DISKS[0]}
D1SIZE=$((SIZE / 1024))
D1NAME=${DISKS[1]}
### First disk is always system disk.
SYSDISK=$D1NAME
if [ $NDISKS -gt 1 ] ; then
    SIZE=${DISKS[2]}
    D2SIZE=$((SIZE / 1024))
    ### Allow for X4170 tiny second disk.
    if [ $D2SIZE -lt 10240 ] ; then
        NDISKS=1
    else
        D2NAME=${DISKS[3]}
    fi
fi

### Default sizes for all non-swap partitions
SLASH=4096
USR=20480
VAR=10240
TMP=10240
EXPORT=10240
if [ $SYSTYPE == 'SERVER' ] ; then
    BIGPART='/tmp'
    SMALLPART='/export'
else
    BIGPART='/export'
    SMALLPART='/tmp'
fi

### If one disk, make sure swap doesn't consume too much of it.
### If two disks, give second disk to swap and assume disks are equal in size.

### /var only grows to consume its disk if we have two disks.
MAXSWAP=$(($D1SIZE - $SLASH - $USR - $VAR - $TMP - $EXPORT - 100))
#--echo DSIZE = $D1SIZE MAXSWAP = $MAXSWAP MSIZE = $MSIZE
if [ $NDISKS == 1 ] ; then
    if [ $MSIZE -gt $MAXSWAP ] ; then
        MSIZE=$MAXSWAP  ### swap has to fit on the disk.
    fi
    SWAPDISK=$D1NAME
    GROWVAR=""
    TMPDISK=$D1NAME
else
    ### If swap would overflow disk 1, put it on disk 2.
    ### Don't use more than half of disk 2.
    if [ $MSIZE -gt  $MAXSWAP ] ; then
        SWAPDISK=$D2NAME
        if [ $MSIZE -gt $(($D2SIZE / 2)) ] ; then
            MSIZE=$(($D2SIZE / 2))
        else
            MSIZE=$D2SIZE
        fi
    else
        SWAPDISK=$D1NAME
    fi
    GROWVAR="--grow"
    TMPDISK=$D2NAME
fi

### Finally, swap can't exceed 64GB on EL6
if (( $MSIZE > 64000 )); then MSIZE=64000; fi

### Output the partition tables
cat > /tmp/disk-parts <<__EOD__
part / --bytes-per-inode=4096 --fstype="ext4" --size=$SLASH --ondisk $SYSDISK
part swap --bytes-per-inode=4096 --size=$MSIZE --ondisk $SWAPDISK
part /usr --bytes-per-inode=4096 --fstype="ext4" --size=$USR --ondisk $SYSDISK
part /var --bytes-per-inode=4096 --fstype="ext4" --size=$VAR --ondisk $SYSDISK $GROWVAR
part $BIGPART --bytes-per-inode=4096 --fstype="ext4" --size=$TMP --ondisk $TMPDISK --grow
part $SMALLPART --bytes-per-inode=4096 --fstype="ext4" --size=$EXPORT --ondisk $SYSDISK
__EOD__


--
Miles O'Neal
CAD Systems Engineer
Cirrus Logic | cirrus.com | 1.512.851.4659


Re: SL7.1 and printing to a CUPS backend - weird notification icon

2016-01-21 Thread Miles O'Neal

You may be able to google the icon.

https://support.google.com/websearch/answer/1325808?hl=en

On 01/21/2016 03:03 PM, James M. Pulver wrote:
So, we recently got the patch that fixed the printer name issue with _ 
in the printer name. I can now print successfully. However, as soon as 
I printed, in the notification area I get a printer icon with a "do 
not enter sign" on it, i.e. a red circle with a white horizontal line 
over it. It gives no hover over information, I cannot left or right 
click on it, and it doesn't seem to go away.


I can't exactly google an icon, so does anyone know anything about 
this icon, what it's trying to tell me, and how to fix the issue?



--
Miles O'Neal
CAD Systems Engineer
Cirrus Logic | cirrus.com | 1.512.851.4659


Re: RHEL 5/6/7 "rosetta stone"

2016-02-01 Thread Miles O'Neal

This is what plotters are for!

On 02/01/2016 12:51 PM, Keith Lofstrom wrote:

"W.L." provided this URL, for a poster that shows
commonly used commands for RHEL 5, 6, and 7:

https://access.redhat.com/sites/default/files/attachments/rhel_5_6_7_cheatsheet_27x36_1014_jcs_web.pdf

It is a large poster (approaching the Rosetta Stone in size),
but it is very useful for understanding what's what in RHEL7.
This, plus the man pages for the tools, is a good approximation
of what I was asking for.




--
Miles O'Neal
CAD Systems Engineer
Cirrus Logic | cirrus.com | 1.512.851.4659


Re: *RESOLVED* "Not using downloaded repomd.xml because it is older than what we have:"

2016-03-28 Thread Miles O'Neal

This has been known to help:

% yum clean all
% yum update yum\*
% yum update

On 03/28/2016 04:08 PM, Steven Haigh wrote:

On 29/03/2016 4:37 AM, Thomas Leavitt wrote:

So, I need to qualify swapping chronyd for ntpd as a "resolution"...

The problem *mostly* went away. I'm still intermittently getting it, on a 
smaller set of machines:

/etc/cron.daily/0yum-daily.cron:

Not using downloaded repomd.xml because it is older than what we have:
   Current   : Fri Mar 25 07:54:39 2016
   Downloaded: Fri Mar 25 07:54:20 2016
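
For scale, the gap between those two timestamps can be worked out
with a couple of lines of shell.  A sketch only: repomd_age_gap is a
made-up name, and it assumes GNU date (for the -d option) and that
both stamps are in the same time zone.

```shell
# Print how many seconds older the downloaded repomd.xml is than
# the cached one, given the two timestamps from the yum message.
# Assumes GNU date; both stamps are parsed as UTC so only the
# difference matters.
repomd_age_gap() {
    local current=$1 downloaded=$2
    echo $(( $(date -u -d "$current" +%s) - $(date -u -d "$downloaded" +%s) ))
}
```

Calling repomd_age_gap "Fri Mar 25 07:54:39 2016" "Fri Mar 25
07:54:20 2016" gives 19, i.e. the two copies of the metadata here
differ by only 19 seconds.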

So I haven't really been following this thread - however, once I read it
I note that I get this problem at least once an hour from at least one
of the 30 or so VMs that I have running.

On one of the systems that just threw this error:
# ntpdate 203.23.237.200
29 Mar 07:54:27 ntpdate[15136]: adjust time server 203.23.237.200 offset
0.002116 sec

The NTP server at 203.23.237.200 is a stratum 2 public NTP server that I
run. It syncs from multiple atomic sources provided by the Australian
Government. It's accurate.

RedHat seems to have something on this - but its locked behind the
subscription:
 https://access.redhat.com/solutions/29845

If its only the overnight cron you get it from, it may be good enough to
modify /etc/sysconfig/yum-autoupdate and add "yum clean all" to PRERUN.

I note that I only ever get these from 7.x systems. 6.x doesn't seem to
throw this at all.




--
Miles O'Neal
CAD Systems Engineer
Cirrus Logic | cirrus.com | 1.512.851.4659