[CentOS] USB drive very slow on CentOS 8

2020-01-04 Thread Michael Eager

I recently upgraded from CentOS 7 to CentOS 8.

I have a Mediasonic HF2-SU3S2 external drive enclosure with 4x3TB drives
configured as a software RAID 5 array (mdadm) with LVM.  It's connected 
to a USB 3.0 port.


On CentOS 7, the drive performance is reasonable.  On CentOS 8, 
performance is extremely slow, about USB 1.0 performance.  Maybe worse.


An external USB SSD connected to the CentOS 8 system appears to perform
well.


I've updated the ASUS Prime-B350-Plus motherboard with the latest BIOS, 
but it doesn't appear to make any difference.


Does anyone have any idea what might be different between CentOS 7 and 8 
that would cause this?
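One way to narrow this down is to check what link speed the enclosure actually negotiated. "About USB 1.0 performance" would be consistent with a device running at 12 Mbit/s instead of 5000 Mbit/s. Below is a minimal sketch, assuming the standard Linux sysfs layout. It reads the `speed` attribute (in Mbit/s) that the kernel exposes for each USB device under /sys/bus/usb/devices; `lsusb -t` shows the same information. The helper names are hypothetical. A correct link speed would not rule out other causes, such as driver differences between the two releases.

```python
import os

# Link speeds as written to sysfs "speed" files, in Mbit/s.
SPEED_NAMES = {
    "1.5": "USB 1.x low speed",
    "12": "USB 1.x full speed",
    "480": "USB 2.0 high speed",
    "5000": "USB 3.x SuperSpeed",
    "10000": "USB 3.x SuperSpeed+",
}


def read_attr(path):
    """Return the stripped contents of a sysfs attribute, or "" if absent."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def usb_link_speeds(root="/sys/bus/usb/devices"):
    """Return (device, product, speed) tuples for each USB device under root."""
    results = []
    for name in sorted(os.listdir(root)):
        speed = read_attr(os.path.join(root, name, "speed"))
        if not speed:
            continue  # entries without a speed file (e.g. interfaces) are skipped
        product = read_attr(os.path.join(root, name, "product"))
        results.append((name, product, speed))
    return results


if __name__ == "__main__" and os.path.isdir("/sys/bus/usb/devices"):
    for name, product, speed in usb_link_speeds():
        label = SPEED_NAMES.get(speed, "unknown")
        print(f"{name:12} {speed:>6} Mbit/s  {label:22} {product}")
```

Run this with the enclosure attached. A USB 3.0 enclosure on a USB 3.0 port would normally be expected to report 5000.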



--
Michael Eager ea...@eagerm.com
1960 Park Blvd., Palo Alto, CA 94306
___
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos


Re: [CentOS] Problems installing CentOS 8

2019-12-25 Thread Michael Eager

On 12/23/19 5:19 PM, Michael Eager wrote:
I'm having a problem installing CentOS 8 from a USB drive.  When the 
installer boots from the USB, it displays the language selection screen. 
After I select English and continue, the installer freezes.  The USB
drive flashes a couple times over the next minute or so, then stops.  The
mouse moves the cursor, but the installer is unresponsive to either 
selecting QUIT or HELP.


I resolved the problem.

I had a multi-TB external RAID box attached by USB to the system.
Apparently, the installer was trying to analyze the available space and 
either failing or taking forever.  Once I disconnected the external 
drive, the installer was able to find the local disks.




Re: [CentOS] Problems installing CentOS 8

2019-12-24 Thread Michael Eager

On 12/24/19 6:39 AM, Mauricio Tavares wrote:

On Mon, Dec 23, 2019 at 8:20 PM Michael Eager  wrote:


I'm having a problem installing CentOS 8 from a USB drive.  When the
installer boots from the USB, it displays the language selection screen.
After I select English and continue, the installer freezes.  The USB
drive flashes a couple times over the next minute or so, then stops.  The
mouse moves the cursor, but the installer is unresponsive to either
selecting QUIT or HELP.

I've tried both the default and the basic graphic install with the same
results.


Stupid (as in I am guilty of that) question: do you know if this
USB is not a bum? A bad card is why I could not get my Raspberry Pi
to boot; replacing it with a new SD card solved the issue.


CentOS completes the self-test without error before booting.  Booting 
completes without apparent problem.  My guess is that the USB is good.



If you want to be lazy and have a hypervisor, create a VM guest and
boot it using the USB.


I can try that.  It will tell me if there is a HW/BIOS issue.


With that said, it is possible that while you are having an
uncooperative GUI you can still switch screens (i.e. the keyboard is still
listening to you) to screen 1 or 2 and then take a look at the
dmesg/log output for clues about what went boink.


Good idea -- I hadn't thought of that.




Details:
CentOS-8-x86-1905-dvd1.iso (sha256 verified)
ASUS Prime B350 Plus motherboard
AMD Ryzen 5 1600 CPU
32GB DRAM
4 SATA drives in RAID/LVM configuration.
M.2 500GB Samsung SSD (not formatted)



[CentOS] Problems installing CentOS 8

2019-12-23 Thread Michael Eager
I'm having a problem installing CentOS 8 from a USB drive.  When the 
installer boots from the USB, it displays the language selection screen. 
After I select English and continue, the installer freezes.  The USB
drive flashes a couple times over the next minute or so, then stops.  The
mouse moves the cursor, but the installer is unresponsive to either 
selecting QUIT or HELP.


I've tried both the default and the basic graphic install with the same 
results.


Details:
CentOS-8-x86-1905-dvd1.iso (sha256 verified)
ASUS Prime B350 Plus motherboard
AMD Ryzen 5 1600 CPU
32GB DRAM
4 SATA drives in RAID/LVM configuration.
M.2 500GB Samsung SSD (not formatted)



[CentOS] Building for older versions

2015-11-23 Thread Michael Eager

Hi --

I'm trying to build an application on CentOS 7 which
can run on older versions of CentOS.  I'm running into
problems with versioning of memcpy in Glibc.  Executables
built on CentOS 7 require memcpy from glibc-2.14, which
causes the program not to load on systems with older
versions of glibc.

My online search suggests adding an asm() with a .symver
directive to select memcpy from glibc-2.2.5 in each of the
source files which reference memcpy().  This isn't practical
with a program with tens of thousands of source files.

Does anyone have a reasonable solution?
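One commonly suggested way to avoid editing every source file is to put the single `__asm__(".symver memcpy,memcpy@GLIBC_2.2.5");` line in one header. GCC's `-include` option then forces that header into every translation unit. Note that GLIBC_2.2.5 is the baseline symbol version only on x86-64.

Whatever approach is used, it helps to check which glibc symbol versions the finished executable still requires. Below is a minimal sketch, assuming the output format of binutils' `objdump -T`. In that format, undefined symbols are marked `*UND*`, and version tags such as `GLIBC_2.14` appear with or without parentheses depending on the binutils version. The function names are hypothetical.

```python
import re
import subprocess
import sys

# Version tags such as "GLIBC_2.14" or "(GLIBC_2.2.5)"; GLIBC_PRIVATE is ignored.
GLIBC_TAG = re.compile(r"\bGLIBC_(\d+(?:\.\d+)*)\b")


def glibc_requirements(objdump_text):
    """Map each required glibc version (as an int tuple) to the symbols needing it."""
    needs = {}
    for line in objdump_text.splitlines():
        if "*UND*" not in line:
            continue  # only undefined symbols are resolved from shared libraries
        match = GLIBC_TAG.search(line)
        if match:
            version = tuple(int(part) for part in match.group(1).split("."))
            symbol = line.split()[-1]
            needs.setdefault(version, []).append(symbol)
    return needs


def newest_glibc(objdump_text):
    """Return the highest glibc version required, or None if there is none."""
    needs = glibc_requirements(objdump_text)
    return max(needs) if needs else None


if __name__ == "__main__" and len(sys.argv) > 1:
    out = subprocess.run(["objdump", "-T", sys.argv[1]],
                         capture_output=True, text=True, check=True).stdout
    for version, symbols in sorted(glibc_requirements(out).items()):
        print("GLIBC_" + ".".join(map(str, version)), " ".join(sorted(symbols)))
```

If a fix along those lines took effect, memcpy should no longer be listed under GLIBC_2.14 for the rebuilt binary.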

--
Michael Eager ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Re: [CentOS] Building for older versions

2015-11-23 Thread Michael Eager

On 11/23/2015 08:06 AM, Nicolas Thierry-Mieg wrote:

On 11/23/2015 04:33 PM, Michael Eager wrote:

Hi --

I'm trying to build an application on CentOS 7 which
can run on older versions of CentOS.  I'm running into
problems with versioning of memcpy in Glibc.  Executables
built on CentOS 7 require memcpy from glibc-2.14, which
causes the program not to load on systems with older
versions of glibc.

My online search suggests adding an asm() with a .symver
directive to select memcpy from glibc-2.2.5 in each of the
source files which reference memcpy().  This isn't practical
with a program with tens of thousands of source files.

Does anyone have a reasonable solution?


IMO you should really be building your app on an older CentOS version (5 or 6).
Then your binary should run everywhere, though it may sometimes require
installing a -compat package.


That causes a number of other problems, when the only issue is
accessing a working version of memcpy from the installed glibc.




Re: [CentOS] Building for older versions

2015-11-23 Thread Michael Eager

On 11/23/2015 07:43 AM, Chris Adams wrote:

Once upon a time, Michael Eager <ea...@eagerm.com> said:

I'm trying to build an application on CentOS 7 which
can run on older versions of CentOS.  I'm running into
problems with versioning of memcpy in Glibc.  Executables
built on CentOS 7 require memcpy from glibc-2.14, which
causes the program not to load on systems with older
versions of glibc.


Most shared libraries are "upwards compatible" but not "backwards
compatible" - builds against an old version will run with the new
version, but not the other way around.  You've found this with glibc,
but you could also run into it with other libraries.


The situation with memcpy is a bit different.  This isn't really a
forward/backward interface compatibility issue.

There was a patch applied to memcpy to improve performance on
some architectures, but it also changed the order in which data was
moved.  Some programs were dependent on this and they broke with the
new implementation.  These programs did not conform to the non-overlapping
data requirements in memcpy's specification.  Programs which did conform
worked with both the new and old implementation.

To prevent non-conforming programs from using the new version of
memcpy, they tagged it with glibc-2.14.  Unfortunately, this also
prevents conforming programs, which work with either the old or new
implementation, from running on systems which have older versions of
glibc.
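A toy model shows why the copy order matters only for overlapping buffers. The sketch below is not glibc's code; the real implementations copy in large, architecture-specific chunks. It simply copies byte by byte over an overlapping region of one buffer, once front to back and once back to front. It then compares both results with Python slice assignment, which behaves like memmove because the right-hand slice is copied first.

```python
def copy_forward(buf, dst, src, n):
    """Copy n bytes from buf[src:] to buf[dst:], lowest address first."""
    for i in range(n):
        buf[dst + i] = buf[src + i]


def copy_backward(buf, dst, src, n):
    """Copy n bytes from buf[src:] to buf[dst:], highest address first."""
    for i in reversed(range(n)):
        buf[dst + i] = buf[src + i]


def memmove_like(buf, dst, src, n):
    """Reference result: the slice on the right is copied before assignment."""
    buf[dst:dst + n] = buf[src:src + n]


for copy in (copy_forward, copy_backward, memmove_like):
    data = bytearray(b"abcdef")
    copy(data, 2, 0, 4)  # destination overlaps the source
    print(f"{copy.__name__:14} {data.decode()}")
```

This prints ababab, ababcd and ababcd. With the destination above the source, the front-to-back copy smears the first bytes across the region, while the back-to-front copy matches memmove. For non-overlapping buffers all three give the same result. That is why programs that honour memcpy's contract are unaffected by the copy order.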




My online search suggests adding an asm() with a .symver
directive to select memcpy from glibc-2.2.5 in each of the
source files which reference memcpy().  This isn't practical
with a program with tens of thousands of source files.

Does anyone have a reasonable solution?


Would it be practical to use mock and build on the oldest version you
want to support?  This is how EPEL packages are built for example.  It
is targeted at building RPMs, but you can manually use copy-in and
copy-out to do other things.


I'll look into mock.




Re: [CentOS] Building for older versions

2015-11-23 Thread Michael Eager

On 11/23/2015 09:10 AM, Nicolas Thierry-Mieg wrote:

On 11/23/2015 06:00 PM, Michael Eager wrote:

On 11/23/2015 08:06 AM, Nicolas Thierry-Mieg wrote:

On 11/23/2015 04:33 PM, Michael Eager wrote:

Hi --

I'm trying to build an application on CentOS 7 which
can run on older versions of CentOS.  I'm running into
problems with versioning of memcpy in Glibc.  Executables
built on CentOS 7 require memcpy from glibc-2.14, which
causes the program not to load on systems with older
versions of glibc.

My online search suggests adding an asm() with a .symver
directive to select memcpy from glibc-2.2.5 in each of the
source files which reference memcpy().  This isn't practical
with a program with tens of thousands of source files.

Does anyone have a reasonable solution?


IMO you should really be building your app on an older CentOS version (5 or 6).
Then your binary should run everywhere, though it may sometimes require
installing a -compat package.


That causes a number of other problems,


can you please provide some details? I'm genuinely curious as I've been faced
with this occasionally and the only problem I've encountered is having to
install a few *-compat packages.
thanks.


Building on an older version of CentOS means using older compilers and
libraries.  Some applications require building with more current tools.

So you end up between a rock and a hard place.  You can try to build
on the older system for library compatibility, but then you have to
use development tools from newer versions.  Or you can build with the
newer tools, and you have compatibility issues running on the older system.





Re: [CentOS] Building for older versions

2015-11-23 Thread Michael Eager

On 11/23/2015 07:57 AM, Gordon Messmer wrote:

On 11/23/2015 07:33 AM, Michael Eager wrote:

Does anyone have a reasonable solution?


I'd start here:
https://wiki.linuxfoundation.org/en/Using_lsbdev


Yeah.  I know about LSB and I've worked with the
LSB committee.  Maybe it's time I tried using it.  :-)


It does seem like a sledgehammer to address what
seems to be a minor issue.





[CentOS] NFS performance on CentOS 7

2015-05-09 Thread Michael Eager

I am setting up a file server with CentOS 7.  I'm seeing
performance which is considerably slower than a similar
server running CentOS 6.6.  A 3GB directory can be copied
to/from the CentOS 6.6 server in about 50 seconds.  The
same directory takes about 270 seconds to copy to/from
the CentOS 7 system.

I see the same performance difference with NFS mounted
file systems or using scp, so it doesn't appear to be
an NFS issue.  The MTU on the NICs on both systems is
1500, and changing it to 6000 on the CentOS 7 system had
no effect.

Anyone have any ideas what might cause this problem or
how to fix it?
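Converting the timings to throughput puts them in network terms. Below is a minimal sketch, assuming the 3 GB figure means 3,000,000,000 bytes and that the copy time is dominated by the transfer itself.

```python
def throughput(size_bytes, seconds):
    """Return (megabytes per second, megabits per second), decimal units."""
    mb_per_s = size_bytes / seconds / 1e6
    return mb_per_s, mb_per_s * 8


SIZE_BYTES = 3e9  # assumption: the post's "3Gb" read as 3 decimal gigabytes

for label, seconds in (("CentOS 6.6", 50), ("CentOS 7", 270)):
    mb_s, mbit_s = throughput(SIZE_BYTES, seconds)
    print(f"{label:10} {mb_s:5.1f} MB/s  {mbit_s:4.0f} Mbit/s")
```

The CentOS 7 figure is about 89 Mbit/s, or about 95 Mbit/s if binary gigabytes were meant. Either way it is close to the ceiling of a 100 Mbit/s link. So it may be worth confirming the negotiated speed and duplex on that NIC, for example with ethtool. The arithmetic alone does not prove that this is the cause.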



Re: [CentOS] CentOS 5.8 crash/freeze running VMware

2012-07-10 Thread Michael Eager
On 07/06/2012 11:17 AM, Johnny Hughes wrote:
 On 06/29/2012 09:52 AM, Michael Eager wrote:
 On 06/28/2012 06:33 PM, Ted Miller wrote:
 On 06/28/2012 12:45 PM, Michael Eager wrote:
 Hi --

 I have a server running CentOS 5.8.  It has a 6-core AMD processor,
 16GB memory, and a RAID 5 file system.  It serves as both a file server
 and to run several VMware virtual machines.  The guest machines run
 Windows 7 and various versions of Linux.

 The system is running the latest version of VMware Workstation.
 Until recently, I started VMs using the VMware Workstation GUI.
 The system has been very stable and seldom crashes.

 Recently, I set up an init script to start several VMs at boot
 time using the vmrun command.  This appeared to work correctly,
 but the system has become unstable, freezing at various times.
 When the system freezes, there is no console response and it
 does not respond to a ping.  There is nothing in syslog to
 indicate any error.

 The script started 8 VMs.  I've cut back to now running 4 VMs
 and the system appears stable.

 Is there some relation between the number of cores and the number
 of VMs one can run?

 Is there something else which might cause the system to crash
 when running multiple VMs?

 Any suggestions to identify why the system crashed?

 Are you staggering the startups of the VMs?  The server may be choking
 trying to boot 8 machines at once.  I suggest starting a VM every 30-60
 seconds, so that you aren't trying to boot all 8 at once.  Don't know if it
 will help, but it might.
 The crashes happen long after boot time when all of the VMs are running.

 (Actually, startup goes very smoothly, with the VMs starting in parallel
 in the background while system boot completes.)

 This sounds like the issue with the machine running out of memory and
 the Out of Memory killer actually killing one of the VMware instances.

 My experience with this on a very good machine was that there was enough
 memory, but it was timing that was causing the issue.  The machine did
 not respond quickly enough to the memory request and the OOM Killer then
 acted.

 I have had issues with a VMware host server running out of memory and
 solved the problem by reserving more memory as unused.  Try setting
 this variable in sysctl.conf:

 vm.min_free_kbytes=65536

 (that will maintain 64MB of free RAM and should allow for enough time to
 prevent OOM kills)
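The setting is persisted in /etc/sysctl.conf and applied with `sysctl -p`. The live value can be read back from /proc/sys/vm/min_free_kbytes; sysctl keys map to paths under /proc/sys by replacing dots with slashes. Below is a minimal sketch for comparing configured and running values. It assumes the usual `key = value` syntax with `#` or `;` comment lines, and the helper names are hypothetical.

```python
import os


def parse_sysctl_conf(text):
    """Parse sysctl.conf-style 'key = value' lines into a dict of strings."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # blank line or comment
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def proc_path(key):
    """Map e.g. vm.min_free_kbytes to /proc/sys/vm/min_free_kbytes."""
    return "/proc/sys/" + key.replace(".", "/")


def live_value(key):
    """Return the running kernel's value for key, or None if unavailable."""
    try:
        with open(proc_path(key)) as f:
            return f.read().strip()
    except OSError:
        return None


if __name__ == "__main__" and os.path.exists("/etc/sysctl.conf"):
    with open("/etc/sysctl.conf") as f:
        wanted = parse_sysctl_conf(f.read())
    for key, value in wanted.items():
        print(f"{key}: configured {value}, running {live_value(key)}")
```

65536 KiB divided by 1024 gives the 64 MB mentioned above.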

I'll give that a try.

But the problem was not that one or more VMware instances were killed and
other processes continued, but that the system hung.  Nothing was running.



[CentOS] System crash -- no clue why

2012-07-10 Thread Michael Eager
Hi --

My CentOS 5.8 server crashed, leaving no clue why.  The
last entry in /var/log/messages is a dhcpd notice around
4:00am, followed by the restart message when I rebooted.
The only clue that I have is that the fan was running at full
speed when I restarted the system; it then slowed to normal speed.

Any ideas what I can do to find out the cause?



Re: [CentOS] CentOS 5.8 crash/freeze running VMware

2012-06-29 Thread Michael Eager
On 06/28/2012 06:33 PM, Ted Miller wrote:
 On 06/28/2012 12:45 PM, Michael Eager wrote:
 Hi --

 I have a server running CentOS 5.8.  It has a 6-core AMD processor,
 16GB memory, and a RAID 5 file system.  It serves as both a file server
 and to run several VMware virtual machines.  The guest machines run
 Windows 7 and various versions of Linux.

 The system is running the latest version of VMware Workstation.
 Until recently, I started VMs using the VMware Workstation GUI.
 The system has been very stable and seldom crashes.

 Recently, I set up an init script to start several VMs at boot
 time using the vmrun command.  This appeared to work correctly,
 but the system has become unstable, freezing at various times.
 When the system freezes, there is no console response and it
 does not respond to a ping.  There is nothing in syslog to
 indicate any error.

 The script started 8 VMs.  I've cut back to now running 4 VMs
 and the system appears stable.

 Is there some relation between the number of cores and the number
 of VMs one can run?

 Is there something else which might cause the system to crash
 when running multiple VMs?

 Any suggestions to identify why the system crashed?

 Are you staggering the startups of the VMs?  The server may be choking
 trying to boot 8 machines at once.  I suggest starting a VM every 30-60
 seconds, so that you aren't trying to boot all 8 at once.  Don't know if it
 will help, but it might.

The crashes happen long after boot time when all of the VMs are running.

(Actually, startup goes very smoothly, with the VMs starting in parallel
in the background while system boot completes.)




[CentOS] CentOS 5.8 crash/freeze running VMware

2012-06-28 Thread Michael Eager
Hi --

I have a server running CentOS 5.8.  It has a 6-core AMD processor,
16GB memory, and a RAID 5 file system.  It serves as both a file server
and to run several VMware virtual machines.  The guest machines run
Windows 7 and various versions of Linux.

The system is running the latest version of VMware Workstation.
Until recently, I started VMs using the VMware Workstation GUI.
The system has been very stable and seldom crashes.

Recently, I set up an init script to start several VMs at boot
time using the vmrun command.  This appeared to work correctly,
but the system has become unstable, freezing at various times.
When the system freezes, there is no console response and it
does not respond to a ping.  There is nothing in syslog to
indicate any error.

The script started 8 VMs.  I've cut back to now running 4 VMs
and the system appears stable.

Is there some relation between the number of cores and the number
of VMs one can run?

Is there something else which might cause the system to crash
when running multiple VMs?

Any suggestions to identify why the system crashed?



Re: [CentOS] Server hangs on CentOS 5.5

2011-03-10 Thread Michael Eager
Simon Matter wrote:

 One fan is listed as 0 rpm.   Something to look into.
 
 Hmm, much has been said now in this thread and I know how difficult it can
 be to find such an issue. However, I suggest not throwing in too many new
 tools in parallel. And be careful how you interpret any information
 gathered by tools like lm_sensors. Their readings are only as good as the
 mainboard and its sensors were designed and built, and both can be
 suboptimal. I've seen all kinds of things, like temp sensors not mounted
 where they should be. Of course, built-in sensors like those of a CPU should
 be taken very seriously.

Thanks for the suggestions.

 So, here are some more tips on how I'd try to find what is wrong:
 - Take a vacuum cleaner and *carefully* clean the whole box. Dust can
 really do bad things because it is not a perfect insulator.
 - If you feel you have to remove any device like CPU, make sure you up
 everything, have a good quality heat sink paste at hand and make sure
 everything is seated well after mounting it again.
 - For the memory part, do you have ECC? If not, that is really a
 problem; if the box is used as a server, ECC is a must. If yes, then
 most errors will be corrected by ECC, but more importantly, memory
 errors are usually logged. You should be able to find a list of those
 errors in the BIOS and see how many times errors occur and where.
 Does something like that exist?

The MB docs/website don't mention ECC support, but I presume it is as part
of the DDR2 spec.  I'll check whether the memory has ECC.  If not, this is
a reasonable upgrade.
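Whether ECC is present and active can usually be checked on the running system. `dmidecode -t memory` reports an "Error Correction Type" for the memory array. When an EDAC driver for the memory controller is loaded, the kernel also exposes corrected (ce_count) and uncorrected (ue_count) error counters under /sys/devices/system/edac/mc. Below is a minimal sketch that reads those counters, assuming that sysfs layout; the function name is hypothetical. An empty result means no EDAC memory controller is registered, not that the memory is error-free.

```python
import glob
import os


def edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {"mc0": {"ce_count": n, "ue_count": n}, ...} from EDAC sysfs."""
    counts = {}
    for mc in sorted(glob.glob(os.path.join(root, "mc[0-9]*"))):
        entry = {}
        for name in ("ce_count", "ue_count"):
            try:
                with open(os.path.join(mc, name)) as f:
                    entry[name] = int(f.read().strip())
            except (OSError, ValueError):
                entry[name] = None  # counter missing or unreadable
        counts[os.path.basename(mc)] = entry
    return counts


if __name__ == "__main__":
    found = edac_counts()
    if not found:
        print("no EDAC memory controllers registered")
    for mc, entry in found.items():
        print(mc, "corrected:", entry["ce_count"], "uncorrected:", entry["ue_count"])
```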

 - For the temperatures, 87C is not so uncommon, but yes, it looks a little
 bit high. Someone else posted 80C as the max for your CPU, which seems
 correct; at least our 12-core Opterons have Caution: 75C; Critical: 80C,
 but they usually run at 45C-55C under normal load. So if 87C is really
 correct under normal load, that may already be too much, and then
 consider what happens at peak times.

The most recent crash was overnight and not discovered until morning.
Probably not related to load.  But if it really is running over temp,
then almost anything can happen.

 - When you look at the lm_sensors values, do they correspond with what is
 shown in the BIOS (if it has this kind of diagnostics)?

Something I'll check when the system is taken down.



Re: [CentOS] Server hangs on CentOS 5.5

2011-03-10 Thread Michael Eager
Alexander Arlt wrote:
 On 03/10/2011 11:04 AM, Simon Matter wrote:
 - Take a vacuum cleaner and *carefully* clean the whole box. Dust can
 really do bad things because it is not a perfect insulator.
 
 Never ever do that, especially not inside the machine. There is a real
 risk of simply vacuuming small components like SMD resistors off the
 board. And, as already mentioned, you also risk killing
 components by electrostatic discharge. Always use compressed air, even
 if it is just the canned kind. Vacuuming is pretty bad advice.

Previous cleanings have been done with canned compressed air.
Thanks for the caution about vacuums and static.  I may
use the vacuum on the case fans from the outside.  The
case should provide an adequate static shield.



Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
Dr. Ed Morbius wrote:
 on 09:24 Tue 08 Mar, Michael Eager (ea...@eagerm.com) wrote:
 Hi --

 I'm running a server which is usually stable, but every
 once in a while it hangs.  The server is used as a file
 store using NFS and to run VMware machines.

 I don't see anything in /var/log/messages or elsewhere
 to indicate any problem or offer any clue why the system
 was hung.

 Any suggestions where I might look for a clue?
 
 I'd very strongly recommend you configure netconsole.  Though it's not entirely
 clear from the name, it's actually an in-kernel network logging module,
 which is very useful for kicking out kernel panics which otherwise
 aren't logged to disk and can't be seen on a (nonresponsive) monitor.
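netconsole sends kernel messages as UDP datagrams to a second machine, by default to port 6666. So the receiving side only needs something listening on that port; a netcat listener in UDP mode is the usual one-liner. Below is a minimal Python equivalent that timestamps each message, assuming the default port. The function names are hypothetical. The sending server still has to be configured with the netconsole module parameter described in the kernel's netconsole documentation.

```python
import socket
import sys
import time


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket on the port netconsole sends to (6666 by default)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_one(sock, bufsize=4096):
    """Return (sender_ip, text) for one datagram."""
    data, (addr, _port) = sock.recvfrom(bufsize)
    return addr, data.decode("utf-8", errors="replace")


def log_forever(sock, out=sys.stdout):
    """Write every received message with a local timestamp; never returns."""
    while True:
        addr, text = receive_one(sock)
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        out.write(f"{stamp} {addr} {text.rstrip()}\n")
        out.flush()
```

On the logging machine, `log_forever(open_listener())` runs until interrupted. Writing its output to a file there keeps the server's last kernel messages even when the server itself cannot write them to disk.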

I'll take a look at netconsole.

 Alternately, a serial console which actually retains all output sent to
 it (some remote access systems support this, some don't) may help.
 
 Barring that, I'd start looking at individual HW components, starting
 with RAM.

The problem with randomly replacing various components, other than
the downtime and nuisance, is that there's no way to know that the
change actually fixed any problem.  When the base rate is one
unknown system hang every few weeks, how many weeks should I wait
without a failure to conclude that the replaced component was the
cause?  A failure which happens infrequently isn't really amenable
to a random diagnostic approach.



Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
John Hodrien wrote:
 On Wed, 9 Mar 2011, Michael Eager wrote:
 
 The problem with randomly replacing various components, other than
 the downtime and nuisance, is that there's no way to know that the
 change actually fixed any problem.  When the base rate is one
 unknown system hang every few weeks, how many weeks should I wait
 without a failure to conclude that the replaced component was the
 cause?  A failure which happens infrequently isn't really amenable
 to a random diagnostic approach.
 
 So you pitch the whole thing over to being a test rig, and buy all new
 hardware?

I'll repeat from my original post:

I don't see anything in /var/log/messages or elsewhere
to indicate any problem or offer any clue why the system
was hung.

Any suggestions where I might look for a clue?

I'm looking for diagnostics to focus on the cause of the crash.
My thanks for the several suggestions in this area.

I'm not particularly interested in a listing of the myriad of
hypothetical causes absent observable evidence, some of
which are contradicted by evidence (such as overheating).

I've encountered my share of bad power supplies, bad RAM,
poorly seated cards, etc.  I've replaced failing capacitors
in monitors (never on a motherboard).  I've replaced video
cards, hard drives, bad cables.  And so forth.  Each of these
had characteristics which pointed to the problem: kernel oops,
POST failures, flickering screens, etc.  The problem I have is
that there is a lack of diagnostic information to focus on the
cause of the server failure.

I don't mean to appear unappreciative, but suggestions which
amount to spending many hours making a series of unfocused
modifications to the server, hoping that one of these random
alterations fixes an infrequent problem, don't strike me as
useful.  At the other extreme, the suggestions that I not look
for the cause of the system failure and instead replace the
server with one or three servers don't seem to be a
useful diagnostic approach either.

During the next server downtime, I'll re-seat RAM and
cables, check for excess dust, and do normal maintenance
as folks have suggested.  I might also run a memory diag.
I'll also look at the several excellent and appreciated
suggestions (some of which I've already installed) on how
to get a better picture on the state of the server when/if
there is a future failure.

Thanks all!





Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
m.r...@5-cent.us wrote:
 Michael Eager wrote:
 John Hodrien wrote:
 On Wed, 9 Mar 2011, Michael Eager wrote:

 The problem with randomly replacing various components, other than
 the downtime and nuisance, is that there's no way to know that the
 change actually fixed any problem.  When the base rate is one
 unknown system hang every few weeks, how many weeks should I wait
 without a failure to conclude that the replaced component was the
 cause?  A failure which happens infrequently isn't really amenable
 to a random diagnostic approach.
 So you pitch the whole thing over to being a test rig, and buy all new
 hardware?
 I'll repeat from my original post:

 I don't see anything in /var/log/messages or elsewhere
 to indicate any problem or offer any clue why the system
 was hung.

 Any suggestions where I might look for a clue?

 I'm looking for diagnostics to focus on the cause of the crash.
 My thanks for the several suggestions in this area.

 I'm not particularly interested in a listing of the myriad of
 hypothetical causes absent observable evidence, some of
 which are contradicted by evidence (such as overheating).
 snip
 Here's one more, off-the-wall thought: do the setterm --powersave off, and
 find some way to make it work, so that you can see what's on the screen
 when it dies. 

Yes, I did this.  Switched to console screen.  The correct command
is setterm -powersave off -blank off, otherwise the screen gets
blanked.  Turned the monitor off.  I hope it shows something
useful on the next fault.

 What may be very important here is I recently had a problem
 with a honkin' big server crashing... and it turned out that a user was
 running a parallel processing job that kicked off three? four? dozen
 threads, and towards the end of the job, every single thread wanted 10G...
 on a system with 256G RAM (which size still boggles my mind). The
 OOM-Killer didn't even have a chance to do its thing. Yes, he's limited
 what his job requests, and the system hasn't crashed since.

Strange.  OOM-Killer should get priority.  That's what it's for.
Although it usually seems to kill the innocent bystanders before
it gets around to killing the offenders.



Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
Les Mikesell wrote:

 Note that overheating can be localized, e.g. a badly mounted heat sink
 or a bad fan on a CPU.

I'll re-seat the CPU, heatsink, and fan on the next downtime.

Heat-related problems usually present as a system which fails
and will not reboot immediately, but will after it sits for a
while to cool down.  This system doesn't do that.

I'll install sensord to log CPU temps in case this is a problem.
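sensord logs lm_sensors readings through syslog. For a quick check, or a cron job where sensord is not set up, the kernel's hwmon interface can also be read directly. Temperature inputs are files named temp*_input under /sys/class/hwmon, in millidegrees Celsius; on older kernels they may sit one level down, under hwmon*/device. Below is a minimal sketch under those assumptions, with hypothetical function names. The 80C limit is the figure mentioned elsewhere in the thread and should be adjusted for the actual CPU. As noted in the thread, the readings are only as good as the board's sensors.

```python
import glob
import os

LIMIT_C = 80.0  # assumption: limit mentioned in the thread; adjust for the actual CPU


def read_temps(root="/sys/class/hwmon"):
    """Return {path: degrees C} for every hwmon temperature input under root."""
    temps = {}
    patterns = (os.path.join(root, "hwmon*", "temp*_input"),
                os.path.join(root, "hwmon*", "device", "temp*_input"))
    for pattern in patterns:
        for path in sorted(glob.glob(pattern)):
            try:
                with open(path) as f:
                    temps[path] = int(f.read().strip()) / 1000.0  # millidegrees
            except (OSError, ValueError):
                continue  # sensor vanished or returned garbage; skip it
    return temps


def over_limit(temps, limit=LIMIT_C):
    """Return only the readings at or above limit."""
    return {path: t for path, t in temps.items() if t >= limit}


if __name__ == "__main__":
    for path, t in sorted(read_temps().items()):
        flag = "  <-- over limit" if t >= LIMIT_C else ""
        print(f"{t:6.1f} C  {path}{flag}")
```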

 There's not really a good way to approach intermittent failures.  It may 
 only break when you aren't looking.  Major component swaps or taking it 
 offline for extended diagnostics hoping to catch a glimpse of the cause 
 when it fails is about all you can do.
 
 During the next server downtime, I'll re-seat RAM and
 cables, check for excess dust, and do normal maintenance
 as folks have suggested.  I might also run a memory diag.
 I'll also look at the several excellent and appreciated
 suggestions (some of which I've already installed) on how
 to get a better picture on the state of the server when/if
 there is a future failure.
 
 Memory diagnostics may take days to catch a problem.  Did you check for 
 a newer BIOS for your MB?  I mentioned before that it seemed strange,
 but I've seen that fix mysterious problems even after the machines had 
 previously been reliable for a long time (and even more oddly, not all the
 machines in the lot were affected).

Yes, most memory diagnostics are not very effective.

I'll have to stop the server to find out what the installed BIOS version
is and see whether there is an update.  Most BIOS updates appear to only
change supported CPUs.  Something else for the next downtime.



Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
compdoc wrote:
 I'll re-seat the CPU, heatsink, and fan on the next downtime.
 
 Is the CPU overheating? Pointless to reseat the cpu or even remove the
 heatsink, if not.

No evidence to suggest that it is.



Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
m.r...@5-cent.us wrote:
 Michael Eager wrote:
 snip
 I'll have to stop the server to find out what the installed bios version
 is and see whether there is an update.  Most bios updates appear to only
 change supported CPUs.  Something else for the next downtime.
 
 Nope: dmidecode, or lshw, is your friend.

Thanks.  Looks like there might be a newer bios available,
although the vendor identifies it as 'beta'.
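For reference, the dmidecode check can be scripted so it runs without downtime. This is a minimal sketch that assumes Python 3 is available and that the script runs as root, since dmidecode reads the DMI tables. `parse_bios_info` and `read_bios_info` are hypothetical helper names. The sketch runs `dmidecode -t bios` and picks the Vendor, Version, and Release Date fields out of the "BIOS Information" record.

```python
import subprocess


def parse_bios_info(text):
    """Pick Vendor/Version/Release Date out of the 'BIOS Information'
    record in `dmidecode -t bios` output."""
    info = {}
    in_bios = False
    for line in text.splitlines():
        if line.strip() == "BIOS Information":
            in_bios = True  # start of the DMI type 0 record
            continue
        if in_bios:
            if not line.strip():
                break  # a blank line ends the record
            key, sep, value = line.strip().partition(":")
            if sep and key in ("Vendor", "Version", "Release Date"):
                info[key] = value.strip()
    return info


def read_bios_info():
    """Run dmidecode (needs root) and parse its BIOS section."""
    out = subprocess.run(["dmidecode", "-t", "bios"],
                         capture_output=True, text=True, check=True).stdout
    return parse_bios_info(out)
```

When run as root, `read_bios_info()` returns a small dict. You can compare its Version and Release Date against the board vendor's download page.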


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
Dr. Ed Morbius wrote:

 If the issue is repeated but rare system failures on one of a set of
 similarly configured hosts, I'd RMA the box and get a replacement.  End
 of story.

I'll repeat:  this is a house-made system.  There's no vendor to RMA to.
It seems obvious to me:  RMA is not a diagnostic tool.

 If you'd post
 details of the host, more logging information, netconsole panic logs,
 etc., it might be possible to narrow down possible causes.

The problem is that there are NO DIAGNOSTICS generated when the
system hangs.  There's no panic and nothing in the logs which
indicates any problem.  This is what I said from the get-go.

 With what you've posted to date, it's not.

I could waste my time posting logs for you to tell me that they don't
point to any problem.  I'd rather skip that step.
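netconsole, suggested above, sends kernel messages over UDP to another machine. Because of that, the last kernel output can survive even when nothing reaches the local disk. This is a minimal Python 3 sketch, assuming a second host is available to receive the messages. `netconsole_param` builds the module option in the format given in the kernel's netconsole documentation, `[src-port]@[src-ip]/[dev],[tgt-port]@tgt-ip/[tgt-mac]`. `receive` is a tiny UDP listener for the log host. The function names, addresses, and interface below are placeholders, not values from this thread.

```python
import socket


def netconsole_param(tgt_ip, dev="eth0", src_port=6665, src_ip="",
                     tgt_port=6666, tgt_mac=""):
    """Build a netconsole= option string.

    An empty src_ip lets the kernel use the interface address; an empty
    tgt_mac means the broadcast MAC, per the netconsole documentation.
    """
    return "netconsole=%d@%s/%s,%d@%s/%s" % (
        src_port, src_ip, dev, tgt_port, tgt_ip, tgt_mac)


def receive(port=6666, logfile="netconsole.log"):
    """Run on the log host: append every netconsole UDP packet to a file."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    with open(logfile, "ab") as f:
        while True:
            data, _addr = sock.recvfrom(4096)
            f.write(data)
            f.flush()  # keep the file current if the listener is killed
```

On the server, the resulting string is passed as a module option, for example `modprobe netconsole netconsole=6665@/eth0,6666@10.0.0.2/`. On 10.0.0.2, run `receive()`. If the hang is a hard lockup that never prints anything, netconsole will capture nothing either.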


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
m.r...@5-cent.us wrote:
 Michael Eager wrote:
 compdoc wrote:
 I'll re-seat the CPU, heatsink, and fan on the next downtime.
 Is the CPU overheating? Pointless to reseat the cpu or even remove the
 heatsink, if not.
 No evidence to suggest that it is.
 
 Have you used ipmitool to see what the temperatures are?

No, I'm not familiar with ipmitool.  I just installed it, and
the man page will take some time to read.  It looks like it
does everything and more.

According to the man page, it needs a kernel driver
named OpenIPMI, which it claims is installed in standard
distributions.  I don't find it on my system.  Running
ipmitool sdr type Temperature results in an error message
saying that it could not open /dev/ipmi0, etc.
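ipmitool's "open" interface talks to a BMC through the OpenIPMI character device. That device exists only after the IPMI kernel modules are loaded (`ipmi_msghandler`, `ipmi_devintf`, and a system-interface driver such as `ipmi_si`). It also requires that the board actually has a BMC, and many desktop boards do not. This is a minimal Python 3 sketch for checking both conditions. It looks for the device node paths ipmitool's open interface tries and lists loaded `ipmi*` modules. `find_ipmi_device` and `loaded_ipmi_modules` are hypothetical helper names.

```python
import os

# Device nodes that ipmitool's "open" interface tries, in order.
IPMI_DEVICES = ("/dev/ipmi0", "/dev/ipmi/0", "/dev/ipmidev/0")


def find_ipmi_device(candidates=IPMI_DEVICES):
    """Return the first IPMI device node that exists, or None."""
    for path in candidates:
        if os.path.exists(path):
            return path
    return None


def loaded_ipmi_modules(proc_modules="/proc/modules"):
    """Return the names of loaded kernel modules starting with 'ipmi'."""
    with open(proc_modules) as f:
        return sorted(line.split()[0] for line in f
                      if line.startswith("ipmi"))
```

If no device node turns up, the usual next step is loading `ipmi_devintf` and `ipmi_si` as root. If `ipmi_si` finds no interface, the board most likely has no BMC, and lm_sensors is the better route.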


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
Rudi Ahlers wrote:
 On Thu, Mar 10, 2011 at 12:31 AM, Michael Eager ea...@eagerm.com wrote:
 Dr. Ed Morbius wrote:

 If the issue is repeated but rare system failures on one of a set of
 similarly configured hosts, I'd RMA the box and get a replacement.  End
 of story.
 I'll repeat:  this is a house-made system.  There's no vendor to RMA to.
 
 
 
 I don't know where you are, but in our country we can RMA anything and
 everything. Apart from CPU's. So, even a cheap desktop mobo could be
 RMA'd, as long as I can prove to the suppliers it's faulty, and it's
 within the warranty period

I responded to Dr. Morbius' suggestion that I RMA the box.
There is no vendor to RMA the box to.

If I knew that it was a motherboard problem, I could RMA it.
Or disk, or PSU, or network card, or whatever.  But, as I've mentioned,
there's no indication what causes the system to hang.  There is no
way at this point to prove that it is a defective motherboard.


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
compdoc wrote:
 According to the man page, it apparently needs a kernel driver
 named OpenIPMI, which it claims is installed in standard
 distributions.  I don't find it on my system.
 
 
 lm_sensors is another, and I think installs ready to use from the repos.

sensors says that the three temp sensors read +36C, +39C, and +87C.
These appear to be AMD K10 temp sensors, although I might be
misreading sensors-detect.  Low/highs are (+127/+127, +127/+90,
+127/+127) respectively.  (I'm not sure if these are alarm set
points or something else.)

One fan is listed as 0 rpm.   Something to look into.
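These readings can be watched from a script, for example one run from cron between reboots. This is a minimal Python 3 sketch that parses plain `sensors` output into temperatures and fan speeds keyed by (chip, label), then lists fans reading 0 RPM. The regular expressions assume the usual `label:  +36.0°C` and `label:  0 RPM` layout. Other lm_sensors versions may format lines differently. A 0 RPM reading can also simply mean nothing is plugged into that fan header. The helper names are hypothetical.

```python
import re

# e.g. "temp1:       +36.0°C  (high = +70.0°C)"  or  "temp1: +36 C"
TEMP_RE = re.compile(r"^([^:]+):\s+([+-]?\d+(?:\.\d+)?)\s*\u00b0?\s*C\b")
# e.g. "fan1:        0 RPM  (min =    0 RPM)"
FAN_RE = re.compile(r"^([^:]+):\s+(\d+)\s+RPM\b")


def parse_sensors(text):
    """Parse `sensors` output into
    ({(chip, label): degrees C}, {(chip, label): rpm})."""
    temps, fans = {}, {}
    chip = None
    for line in text.splitlines():
        # Chip headers (e.g. "k10temp-pci-00c3") are unindented, colon-free lines.
        if line and not line[0].isspace() and ":" not in line:
            chip = line.strip()
            continue
        m = TEMP_RE.match(line)
        if m:
            temps[(chip, m.group(1).strip())] = float(m.group(2))
            continue
        m = FAN_RE.match(line)
        if m:
            fans[(chip, m.group(1).strip())] = int(m.group(2))
    return temps, fans


def stopped_fans(fans):
    """Keys of fans reporting 0 RPM: a dead fan, or just an empty header."""
    return sorted(key for key, rpm in fans.items() if rpm == 0)
```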


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
Rudi Ahlers wrote:

 As far as I can see you were given a bucket load of advice, which you
 haven't even bothered to follow yet. You're the only one who could
 actually do anything about the problem.

I have followed quite a bit of the advice, which I have
appreciated and noted.  I've set up the monitor so that it
will not be blanked on a crash, installed monitoring software,
and checked a number of conditions which people have suggested.

No, I have not responded to the philosophical discussions
about vendor management, nor to the suggestions to RMA
something to somebody for unknown reasons.  No, I'm not
going to replace RAM or capacitors here and there on the off
chance that something might be bad.  (But I will look for
capacitors which show signs of bulging or leaking.)

 No amount of suggestions made on this list will fix the problem for
 you. You need to actually take apart the server and see what's going
 on.

I wasn't interested in anyone fixing the server for me.
I did ask for suggestions on how to improve the diagnostics
for the problem, which several people have responded to.
Again, I appreciate their suggestions greatly.

As I've said, I have a list of things to check when the
server is next taken down.


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
compdoc wrote:
 Err, that should read 128C
 
 -Original Message-
 From: centos-boun...@centos.org [mailto:centos-boun...@centos.org] On Behalf
 Of compdoc
 Sent: Wednesday, March 09, 2011 4:50 PM
 To: 'CentOS mailing list'
 Subject: Re: [CentOS] Server hangs on CentOS 5.5
 
 +36C and +39C are likely your cpu and motherboard temps. You have to look at
 the temps in the cmos and match them.
 
 The +87C is likely just a misreading by lm_sensors. Anything running that
 hot won't be stable.
 
 I use AMD as well, and lm_sensors tells me something is 1280C.

I'll compare the values from lm_sensors with the bios
temps to see if they are in line.

1280C would be hot enough to melt copper.  Wow!
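Readings like 128C or 1280C most likely come from an unconnected or misconfigured sensor input, not from a real temperature. This is a minimal Python 3 sketch of the two checks discussed above. The first filters out implausible values. The second compares the remaining lm_sensors values with temperatures written down from the BIOS setup screen. The 0 to 110 C window, the 5 C tolerance, and the function names are arbitrary assumptions for illustration, not limits from lm_sensors or the board vendor. `bios_temps` is keyed by whichever sensors label you have matched to each BIOS reading by hand.

```python
def suspect_readings(temps, low=0.0, high=110.0):
    """Split {label: degrees C} into (plausible, suspect) dicts.

    Values outside [low, high] are treated as sensor glitches or
    unconnected inputs rather than real temperatures.
    """
    plausible, suspect = {}, {}
    for label, value in temps.items():
        target = plausible if low <= value <= high else suspect
        target[label] = value
    return plausible, suspect


def compare_with_bios(sensor_temps, bios_temps, tolerance=5.0):
    """Label each BIOS reading 'ok', 'mismatch', or 'missing' against lm_sensors.

    The BIOS values are taken at a different time and load, so the
    tolerance is deliberately loose.
    """
    report = {}
    for label, bios_value in bios_temps.items():
        value = sensor_temps.get(label)
        if value is None:
            report[label] = "missing"
        elif abs(value - bios_value) <= tolerance:
            report[label] = "ok"
        else:
            report[label] = "mismatch"
    return report
```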


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-09 Thread Michael Eager
Dr. Ed Morbius wrote:

 
 You're NOT obliged to repeat information you've already posted (e.g.:
 home-brew system), but it's helpful to front-load data rather than have
 us tease it out of you.

No intention to have anyone tease information out of me.

The subject line says that the system is CentOS 5.5.  The other
info has been forthcoming, as much as I have been able to provide.
Sorry it wasn't all at the same time -- I didn't think that saying
the server was not a Dell or HP box was important.

 With what you've posted to date, it's not.
 I could waste my time posting logs for you to tell me that they don't
 point to any problem.  I'd rather skip that step.
 
 Krell forfend you should post relevant and useful information which
 might be useful in actually diagnosing your problem (or pointing to
 likely candidates and/or further tests).

The logs are uninformative.  No messages for hours before the crash.
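Even when the log contents say nothing, the timestamps can narrow the window. The last line written before a long silence shows roughly when the system stopped logging. This is a minimal Python 3 sketch that assumes the classic syslog format used in /var/log/messages ("Mar  8 11:02:03 host ..."), English month names, and lines that all fall in one year. `quiet_gaps` is a hypothetical helper name. A gap only shows when logging stopped, not why, and some gaps may just be normal quiet periods.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the 'Mar  8 11:02:03' prefix of a classic syslog line."""
    return datetime.strptime("%d %s" % (year, line[:15]), "%Y %b %d %H:%M:%S")


def quiet_gaps(lines, year, min_gap=timedelta(hours=1)):
    """Return (last_line_before_gap, first_line_after_gap, gap) for every
    pause in logging longer than min_gap."""
    gaps = []
    prev_line = prev_time = None
    for line in lines:
        line = line.rstrip("\n")
        try:
            when = parse_stamp(line, year)
        except ValueError:
            continue  # skip lines without a syslog timestamp
        if prev_time is not None and when - prev_time > min_gap:
            gaps.append((prev_line, line, when - prev_time))
        prev_line, prev_time = line, when
    return gaps
```

For example, `quiet_gaps(open("/var/log/messages"), 2011)` lists each long silence. The gap that ends at the boot messages brackets the hang.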

Thanks for the help.


[CentOS] Server hangs on CentOS 5.5

2011-03-08 Thread Michael Eager
Hi --

I'm running a server which is usually stable, but every
once in a while it hangs.  The server is used as a file
store using NFS and to run VMware machines.

I don't see anything in /var/log/messages or elsewhere
to indicate any problem or offer any clue why the system
was hung.

Any suggestions where I might look for a clue?


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-08 Thread Michael Eager
compdoc wrote:
 I'm running a server which is usually stable, but every
 once in a while it hangs.
 
 
 There can be many reasons for that. One thing I'm curious about - try
 looking at the reallocated sector count, and current pending sector count
 for your drives with smartctl.

Thanks for the suggestions.  All disks show zero reallocated sectors
and zero pending sectors.  smartctl reports no failures.  Also, the max
temp was 48 C or less.
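The same check can be scripted across several disks. This is a minimal Python 3 sketch that assumes smartmontools' usual `smartctl -A` attribute table, where the raw value is the tenth column. The helper names `smart_counts` and `check_disk` are hypothetical. Offline_Uncorrectable is an extra attribute added here for illustration; it is not one the thread mentions.

```python
import subprocess

WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
           "Offline_Uncorrectable")


def smart_counts(text, watched=WATCHED):
    """Return {attribute_name: raw_value} for the watched attributes
    found in `smartctl -A` output."""
    counts = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least
        # 10 columns; the raw value is the 10th column.
        if (len(fields) >= 10 and fields[0].isdigit()
                and fields[1] in watched and fields[9].isdigit()):
            counts[fields[1]] = int(fields[9])
    return counts


def check_disk(device):
    """Run smartctl (as root) on one device and return its watched counts."""
    # smartctl's exit status is a bit mask of conditions, so a non-zero
    # return code is not treated as a failure to run here.
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return smart_counts(result.stdout)
```

Usage is a simple loop over the drives, for example `for dev in ("/dev/sda", "/dev/sdb"): print(dev, check_disk(dev))`. The device names are placeholders.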


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-08 Thread Michael Eager
Brian Mathis wrote:
 On Tue, Mar 8, 2011 at 12:24 PM, Michael Eager ea...@eagerm.com wrote:
 Hi --

 I'm running a server which is usually stable, but every
 once in a while it hangs.  The server is used as a file
 store using NFS and to run VMware machines.

 I don't see anything in /var/log/messages or elsewhere
 to indicate any problem or offer any clue why the system
 was hung.

 Any suggestions where I might look for a clue?
 
 Please be more specific when you say it hangs.  Does it just pause
 for a minute and then continue working, or does it freeze completely
 until you reboot it?  Does it respond to a soft reboot like
 Ctrl-Alt-Del, or do you need to hard power it off?

System is unresponsive.  Monitor blank, no response to keyboard,
no response to remote ssh.  Hit reset to reboot.

The only indication that I had that there was a problem (other
than attached systems were not accessing files) was that the fan(s)
on the server were louder than normal.

 Since this is an NFS server I'm going to guess there might be a lot of
 IO.  Maybe there is some large IO load going on, like maybe all your
 VMs are running anti-virus scan at the same time, or something like
 that.

At the time, the NFS load should have been very low.

 To troubleshoot, I recommend installing the 'sar' utilities (yum
 install sysstat) and then reviewing the collected data using the
 'ksar' utility (http://sourceforge.net/projects/ksar/).  sar/ksar are
 good for tracking down acute problems.

Thanks for the suggestion.  I'll look into sar.
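sysstat's collector writes one binary file per day of the month, which on CentOS is /var/log/sa/saDD. By default it samples every 10 minutes from cron. sar can replay a time window from one of those files with `-f`, `-s`, and `-e`. This is a minimal Python 3 sketch that builds such a command for the hour before a hang. The function name `sar_window` and the choice of reports (CPU, load, memory, I/O) are assumptions for illustration.

```python
from datetime import datetime, timedelta


def sar_window(hang_time, before=timedelta(hours=1),
               reports=("-u", "-q", "-r", "-b"), sa_dir="/var/log/sa"):
    """Build a sar command replaying the period leading up to hang_time
    from sysstat's daily data file (saDD, DD = day of month)."""
    start = hang_time - before
    if start.date() != hang_time.date():
        # Keep it simple: the window must not cross midnight.
        start = hang_time.replace(hour=0, minute=0, second=0)
    return (["sar"] + list(reports) +
            ["-f", "%s/sa%02d" % (sa_dir, hang_time.day),
             "-s", start.strftime("%H:%M:%S"),
             "-e", hang_time.strftime("%H:%M:%S")])
```

The returned list can be passed straight to `subprocess.run`. With 10-minute samples, expect only a handful of rows before the hang.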


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-08 Thread Michael Eager
Les Mikesell wrote:
 On 3/8/2011 11:24 AM, Michael Eager wrote:
 Hi --

 I'm running a server which is usually stable, but every
 once in a while it hangs.  The server is used as a file
 store using NFS and to run VMware machines.

 I don't see anything in /var/log/messages or elsewhere
 to indicate any problem or offer any clue why the system
 was hung.

 Any suggestions where I might look for a clue?
 
 Probably something hardware related.  Bad memory, overheating, power 
 supply, etc.  I've even seen some rare cases where a bios update would 
 fix it although it didn't make much sense for a machine to run for 
 years, then need a firmware change.

The system is on a UPS and temps seem reasonable.
Locating a transient memory problem is time-consuming.
Identifying a power supply which sometimes spikes is
even more difficult.  I'd like to have a clue about the
likely problem before shutting down the server for an
extended period.

I'll set up sar and sensord to periodically log system
status and see if this gives me a clue for the next
time this happens.
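The point of logging on a schedule is that the last record written before a hang survives the reset. This is a minimal Python 3 sketch of a do-it-yourself status logger, assuming Linux /proc files. Each interval it appends one line with the load average and free memory, then calls `os.fsync` so the line is on disk before the next sleep. The log path, the interval, and the helper names are hypothetical. sar and sensord do this more thoroughly.

```python
import os
import time


def read_loadavg(path="/proc/loadavg"):
    """Return the 1, 5 and 15 minute load averages."""
    with open(path) as f:
        return tuple(float(x) for x in f.read().split()[:3])


def read_meminfo(path="/proc/meminfo", keys=("MemFree", "SwapFree")):
    """Return selected /proc/meminfo values, in kB."""
    values = {}
    with open(path) as f:
        for line in f:
            name, _, rest = line.partition(":")
            if name in keys:
                values[name] = int(rest.split()[0])
    return values


def snapshot():
    """One timestamped status line."""
    load = read_loadavg()
    mem = read_meminfo()
    return "%s load=%.2f,%.2f,%.2f MemFree=%dkB SwapFree=%dkB" % (
        time.strftime("%Y-%m-%d %H:%M:%S"), load[0], load[1], load[2],
        mem.get("MemFree", -1), mem.get("SwapFree", -1))


def log_forever(logfile="/var/log/statuslog", interval=60):
    """Append a snapshot every `interval` seconds, forcing each to disk."""
    while True:
        with open(logfile, "a") as f:
            f.write(snapshot() + "\n")
            f.flush()
            os.fsync(f.fileno())  # make sure the line survives a hard reset
        time.sleep(interval)
```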


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-08 Thread Michael Eager
compdoc wrote:
 The only indication that I had that there was a problem (other
 than attached systems were not accessing files) was that the
 fan(s) on the server were louder than normal.
 
 Are you saying the fans were running faster than normal while it was hung?
 Or are they louder than usual even while it's running?

They were louder than normal when hung, but
returned to being quiet after the reboot.

 Fans making noise can mean the fan isn't spinning as fast as it should
 because the bearing is failing. Be a good time to open the case to check to
 see that all fans are working...

Good idea.


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-08 Thread Michael Eager
m.r...@5-cent.us wrote:
 Michael Eager wrote:
 Brian Mathis wrote:
 On Tue, Mar 8, 2011 at 12:24 PM, Michael Eager ea...@eagerm.com wrote:
 Hi --

 I'm running a server which is usually stable, but every
 once in a while it hangs.  The server is used as a file
 store using NFS and to run VMware machines.

 I don't see anything in /var/log/messages or elsewhere
 to indicate any problem or offer any clue why the system
 was hung.

 Any suggestions where I might look for a clue?
 snip
 System is unresponsive.  Monitor blank, no response to keyboard,
 no response to remote ssh.  Hit reset to reboot.
 
 Suggestion 1: -from the console-, run
 setterm --powersave off
 That way, even if you connect a monitor (in our, uh, computer labs, we
 have a monitor-on-a-stick), you'll still see what's on the screen at the
 end, not the power save blanking.

I get the message "cannot (un)set powersave mode".

I'll add this to .xinitrc.

 The only indication that I had that there was a problem (other
 than attached systems were not accessing files) was that the fan(s)
 on the server were louder than normal.
 
 Um. Um. What make is the server? We had that on some new Suns, where after
 working on them, the fans would spin up and *not* spin down to normal. The
 answer to that was, after powering them down, pull all the plugs, and
 leave them out for 20 sec or so

House-built, Gigabyte MB, AMD Phenom II X6, 6GB RAM.


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-08 Thread Michael Eager
Scott Silva wrote:

 Did you try the obvious stuff for older equipment? Remove and reseat ALL cards
 and memory, several times, to clean off any oxidation from contacts.
 Blow out any dust and collected lint.
 reseat drive cables.

Not yet, but that's always a good idea.


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-08 Thread Michael Eager
Michael Eager wrote:
 m.r...@5-cent.us wrote:

 Suggestion 1: -from the console-, run
 setterm --powersave off
 That way, even if you connect a monitor (in our, uh, computer labs, we
 have a monitor-on-a-stick), you'll still see what's on the screen at the
 end, not the power save blanking.
 
 I get the message "cannot (un)set powersave mode".
 
 I'll add this to .xinitrc.

Or better, press CTRL-ALT-F1 to switch to a virtual console
and run setterm -powersave off there.
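setterm changes blanking by sending console-specific control sequences and ioctls. That is likely why it fails from an X terminal and works on a real virtual console. The Linux console_codes(4) man page documents `ESC [ 9 ; n ]`, which sets the screen blank timeout in minutes (0 disables it), and `ESC [ 14 ; n ]`, which sets the VESA powerdown interval. This is a minimal Python 3 sketch that writes those two sequences to a console device. It assumes root and that /dev/tty1 is the console being watched, and the helper names are hypothetical. It corresponds to `setterm -blank 0 -powerdown 0` rather than to `-powersave off`.

```python
ESC = "\x1b"


def blank_sequence(minutes):
    """console_codes(4): ESC [ 9 ; n ] sets the screen blank timeout (0 = never)."""
    return "%s[9;%d]" % (ESC, minutes)


def powerdown_sequence(minutes):
    """console_codes(4): ESC [ 14 ; n ] sets the VESA powerdown interval."""
    return "%s[14;%d]" % (ESC, minutes)


def disable_blanking(tty="/dev/tty1"):
    """Write both sequences to a Linux virtual console (needs root)."""
    with open(tty, "w") as con:
        con.write(blank_sequence(0) + powerdown_sequence(0))
```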


Re: [CentOS] Server hangs on CentOS 5.5

2011-03-08 Thread Michael Eager
m.r...@5-cent.us wrote:
 Michael Eager wrote:

 House-built, Gigabyte MB, AMD Phenom II X6, 6GB RAM.
 
 Any chance the problem's with the video card?

Video is on the MB.  It doesn't seem likely that it's
the video, since the system doesn't respond to network
when it crashes.

It could be anything.  That's why I'm looking for
something that would give me a bit of a hint what
to look at.  With an infrequent failure, it's not
practical to replace components piecemeal.
