Re: [gentoo-user] machine check exception errors

2010-09-25 Thread Grant
  Thanks Mick.  My host is big with multiple data centers of their own.
  They did exactly as I asked and I'm running on new RAM.  There was a
  problem bringing my system back online and the cause was purported to
  be an unseated ethernet cable.  I handed over my root password as I
  was requested to do, and then started to get paranoid.  I suppose I
  shouldn't though because with physical access to my machine they
  pretty much have full access anyway, right?

 Usually, physical access means they either have it or can get it pretty
 quick.  Boot a CD/DVD, mount the partitions, chroot in, change password
 and reboot.  Then, you don't have the password but they do.

 That's pretty obvious though. Physical access allows them to change your
 password but not read it, so you'd know pretty soon if they'd been up to
 anything.

 If they really do need the root password, you have to give it to them,
 but that doesn't stop you changing it, and running a rootkit scan, as
 soon as they've finished with it.

I've run chkrootkit, but I noticed:

The file of stored file properties (rkhunter.dat) does not exist, and
so must be created. To do this type in 'rkhunter --propupd'.

I thought the best practice with a rootkit checker like chkrootkit was
to not leave it installed on the system so you can run it as a clean
install when the time comes?
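
For what it's worth, the "stored file properties" that the message above asks for amount to a baseline: record what each important file looks like while the system is trusted, then compare later. The following is only a minimal sketch of that idea in Python, not rkhunter's actual format or logic; the function names are made up for illustration, and the choice of SHA-256, size, and mode as the recorded properties is an assumption.

```python
import hashlib
import os
import tempfile


def file_properties(path):
    """Return a small record of properties for one file."""
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    info = os.stat(path)
    return {"sha256": digest, "size": info.st_size, "mode": info.st_mode}


def build_baseline(paths):
    """Record properties for every path while the system is trusted."""
    return {path: file_properties(path) for path in paths}


def changed_files(baseline, paths):
    """List paths whose current properties differ from the baseline."""
    return sorted(p for p in paths if file_properties(p) != baseline.get(p))


if __name__ == "__main__":
    # Demo on a throwaway file rather than on real system binaries.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example-binary")
        with open(target, "w") as handle:
            handle.write("original contents\n")
        baseline = build_baseline([target])
        with open(target, "a") as handle:
            handle.write("tampered\n")
        print(changed_files(baseline, [target]))
```

A baseline taken after a suspected compromise only proves that nothing changed since then, so it is most useful when it is created on a known-clean system and kept somewhere the host cannot modify.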

Do any of these warnings sound an alarm for anyone?  I think the SSH
warnings are OK because I have a normal user specified with AllowUsers
and the config file says:

# The default requires explicit activation of protocol 1
#Protocol 2

Here are the warnings:

Warning: The command '/usr/bin/ldd' has been replaced by a script:
/usr/bin/ldd: Bourne-Again shell script text executable

Warning: The command '/usr/bin/whatis' has been replaced by a script:
/usr/bin/whatis: POSIX shell script text executable

Warning: The command '/usr/bin/lwp-request' has been replaced by a
script: /usr/bin/lwp-request: a /usr/bin/perl -w script text
executable

Warning: No output found from the lsmod command or the /proc/modules file:
/proc/modules output:
lsmod output:

Warning: The SSH configuration option 'PermitRootLogin' has not been
set. The default value may be 'yes', to allow root access.

Warning: The SSH configuration option 'Protocol' has not been set. The
default value may be '2,1', to allow the use of protocol version 1.

Warning: Hidden directory found: /dev/.udev
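
The two SSH warnings only say that the options are not set explicitly; a commented-out line such as "#Protocol 2" does not count as a setting. Here is a rough Python sketch of that check, assuming a simplified view of sshd_config (whitespace-separated keyword and value, no "keyword=value" form, no Match blocks). The function names and the user name "someuser" are placeholders, not anything from the real configuration.

```python
def explicit_sshd_options(config_text):
    """Map lower-cased option names to values, using uncommented lines only."""
    options = {}
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # Blank lines and comments set nothing.
        parts = line.split(None, 1)
        name = parts[0].lower()  # Keywords are case-insensitive.
        value = parts[1].strip() if len(parts) > 1 else ""
        # sshd uses the first value it obtains for a keyword.
        options.setdefault(name, value)
    return options


def unset_options(config_text, wanted=("PermitRootLogin", "Protocol")):
    """Return the wanted option names that are not set explicitly."""
    found = explicit_sshd_options(config_text)
    return [name for name in wanted if name.lower() not in found]


# Hypothetical config resembling the one described above.
SAMPLE = """\
# The default requires explicit activation of protocol 1
#Protocol 2
AllowUsers someuser
"""

if __name__ == "__main__":
    print(unset_options(SAMPLE))
```

For the sample text this reports both PermitRootLogin and Protocol as unset, which matches the warnings. With AllowUsers listing only a normal user, root should be refused anyway, but setting both options explicitly removes any dependence on the defaults.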

- Grant



Re: [gentoo-user] machine check exception errors

2010-09-25 Thread Volker Armin Hemmann
On Tuesday 21 September 2010, Stroller wrote:
 On 21 Sep 2010, at 18:37, Grant wrote:
  I'm getting a lot of machine check exception errors in dmesg on my
  hosted server.  Running mcelog I get:
  ...
  
  They offered to take my machine down and do a memory test which they
  said would take a number of hours.  Is a memory test likely to help?
  Did you suggest reseating or replacing RAM modules as opposed to a
  memory test because it will result in less downtime?
 
 I suspect that your hosting provider are offering you this memory test
 because they don't want to go swapping out memory modules willy-nilly.
 
 How do they know that the problem is really memory, and not your operating
 system? If they take all this RAM out and put new RAM in, what do they do
 with the old RAM? They don't know if it's good or bad, so are they
 expected to just slap it in a server belonging to another customer, and
 stitch him up?
 
 A memory test is likely to identify bad RAM, if it is bad, so you should
 proceed with this. This is likely the best route to solving the problem.
 

Sure?
This is ECC RAM. Does memtest report ECC-corrected errors? I don't think so.
The MCE errors say:
We detected an error. The error was corrected. Applications will not see the error.
Everything marches on.

The RAM is borked and must be replaced.
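
Corrected errors are counted by the kernel even though applications never see them. The thread uses mcelog; as a different route, and only as a sketch, the Linux EDAC subsystem exposes per-controller counters in sysfs when an EDAC driver for the memory controller is loaded. The Python below assumes that driver is present (nothing in this thread confirms it) and reads the ce_count and ue_count files; the function name is made up.

```python
import glob
import os


def read_edac_counts(edac_root="/sys/devices/system/edac/mc"):
    """Return {controller: (corrected, uncorrected)} from EDAC sysfs files."""
    counts = {}
    for mc_dir in sorted(glob.glob(os.path.join(edac_root, "mc*"))):
        try:
            with open(os.path.join(mc_dir, "ce_count")) as handle:
                corrected = int(handle.read().strip())
            with open(os.path.join(mc_dir, "ue_count")) as handle:
                uncorrected = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # Counter missing or unreadable; skip this controller.
        counts[os.path.basename(mc_dir)] = (corrected, uncorrected)
    return counts


if __name__ == "__main__":
    # Prints an empty dict if no EDAC driver is loaded.
    print(read_edac_counts())
```

A corrected count that keeps climbing while the uncorrected count stays at zero would fit the picture above: the ECC logic is masking a fault that a memory test may never report.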



Re: [gentoo-user] machine check exception errors

2010-09-23 Thread Neil Bothwick
On Wed, 22 Sep 2010 23:26:09 -0500, Dale wrote:

  Thanks Mick.  My host is big with multiple data centers of their own.
  They did exactly as I asked and I'm running on new RAM.  There was a
  problem bringing my system back online and the cause was purported to
  be an unseated ethernet cable.  I handed over my root password as I
  was requested to do, and then started to get paranoid.  I suppose I
  shouldn't though because with physical access to my machine they
  pretty much have full access anyway, right?

 Usually, physical access means they either have it or can get it pretty 
 quick.  Boot a CD/DVD, mount the partitions, chroot in, change password 
 and reboot.  Then, you don't have the password but they do.

That's pretty obvious though. Physical access allows them to change your
password but not read it, so you'd know pretty soon if they'd been up to
anything.

If they really do need the root password, you have to give it to them,
but that doesn't stop you changing it, and running a rootkit scan, as
soon as they've finished with it.
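
The reason they can change it but not simply read it is that /etc/shadow holds a salted one-way hash rather than the password itself. A cheap way to notice a change is to keep a copy of the hash field and compare it later. This is just a sketch in Python; the function names are invented, and the hash strings in the usage comment are made-up placeholders, not real hashes.

```python
def shadow_hash(shadow_text, user="root"):
    """Return the password-hash field for a user from shadow-format text."""
    for line in shadow_text.splitlines():
        fields = line.split(":")
        # Shadow format: name:hash:lastchg:min:max:warn:inactive:expire:reserved
        if len(fields) >= 2 and fields[0] == user:
            return fields[1]
    return None


def password_changed(saved_text, current_text, user="root"):
    """True if the user's hash field differs between two snapshots."""
    return shadow_hash(saved_text, user) != shadow_hash(current_text, user)


# Usage with placeholder data (reading the real /etc/shadow needs root):
#   saved = "root:$6$placeholdersalt$AAAA:14877:0:::::"
#   now   = "root:$6$placeholdersalt$BBBB:14878:0:::::"
#   password_changed(saved, now)  is True because the hash fields differ
```

This only detects a replaced hash. It says nothing about other changes made from a boot CD, which is what the rootkit scan is for.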


-- 
Neil Bothwick

God said, div D = rho, div B = 0, curl E = - ∂B/∂t, curl H = J + ∂D/∂t,
and there was light.




Re: [gentoo-user] machine check exception errors

2010-09-22 Thread Mick
On Wednesday 22 September 2010 02:24:39 Grant wrote:
   I'm getting a lot of machine check exception errors in dmesg on my
   hosted server.  Running mcelog I get:
   ...
   
   They offered to take my machine down and do a memory test which they
   said would take a number of hours.  Is a memory test likely to help?
   Did you suggest reseating or replacing RAM modules as opposed to a
   memory test because it will result in less downtime?
  
  I suspect that your hosting provider are offering you this memory test
  because they don't want to go swapping out memory modules willy-nilly.
  
  How do they know that the problem is really memory, and not your
  operating system? If they take all this RAM out and put new RAM in,
  what do they do with the old RAM? They don't know if it's good or bad,
  so are they expected to just slap it in a server belonging to another
  customer, and stitch him up?
  
  A memory test is likely to identify bad RAM, if it is bad, so you should
  proceed with this. This is likely the best route to solving the problem.
  
  I think that ideally, for you, they would move the system image onto a
  different known-good server with the same configuration. Then you cannot
  complain if the same problems start occurring again. If the problem is
  genuinely hardware then they won't. And the hosting provider is free to
  run diagnostics on your old machine.
  
  But realistically, the memory test is likely to show up a bad RAM
  module, you'll get it replaced and be up and running within a few
  hours. Why would you refuse? If your system needed a guaranteed uptime
  you'd perhaps have to pay for a higher level of service than the fees
  you're paying at present.
  
  I run memory tests overnight.  If a module is seriously borked then it
  will fail earlier.  Reseating/replacing takes a few minutes, instead of
  hours.
  
  If they have spare machines (for dev't or testing) they can fit the
  memory module(s) there and test them exhaustively, before they put the
  good ones back into a customer's machine.
 
 Thanks Mick and Stroller.  I'll see if they'll go for this.

You're welcome.  Bear in mind though that a lot of hosters are just glorified 
resellers with an account in a bigger data centre.  In many cases they do not 
even have physical access to the machines.  Only the data centre techies do 
and they may be less willing to oblige and break procedure or routine, just 
because one end user out of hundreds/thousands complained about some memory 
errors.

YMMV
-- 
Regards,
Mick




Re: [gentoo-user] machine check exception errors

2010-09-22 Thread Grant
   I'm getting a lot of machine check exception errors in dmesg on my
   hosted server.  Running mcelog I get:
   ...
  
   They offered to take my machine down and do a memory test which they
   said would take a number of hours.  Is a memory test likely to help?
   Did you suggest reseating or replacing RAM modules as opposed to a
   memory test because it will result in less downtime?
 
  I suspect that your hosting provider are offering you this memory test
  because they don't want to go swapping out memory modules willy-nilly.
 
  How do they know that the problem is really memory, and not your
  operating system? If they take all this RAM out and put new RAM in,
  what do they do with the old RAM? They don't know if it's good or bad,
  so are they expected to just slap it in a server belonging to another
  customer, and stitch him up?
 
  A memory test is likely to identify bad RAM, if it is bad, so you should
  proceed with this. This is likely the best route to solving the problem.
 
  I think that ideally, for you, they would move the system image onto a
  different known-good server with the same configuration. Then you cannot
  complain if the same problems start occurring again. If the problem is
  genuinely hardware then they won't. And the hosting provider is free to
  run diagnostics on your old machine.
 
  But realistically, the memory test is likely to show up a bad RAM
  module, you'll get it replaced and be up and running within a few
  hours. Why would you refuse? If your system needed a guaranteed uptime
  you'd perhaps have to pay for a higher level of service than the fees
  you're paying at present.
 
  I run memory tests overnight.  If a module is seriously borked then it
  will fail earlier.  Reseating/replacing takes a few minutes, instead of
  hours.
 
  If they have spare machines (for dev't or testing) they can fit the
  memory module(s) there and test them exhaustively, before they put the
  good ones back into a customer's machine.

 Thanks Mick and Stroller.  I'll see if they'll go for this.

 You're welcome.  Bear in mind though that a lot of hosters are just glorified
 resellers with an account in a bigger data centre.  In many cases they do not
 even have physical access to the machines.  Only the data centre techies do
 and they may be less willing to oblige and break procedure or routine, just
 because one end user out of hundreds/thousands complained about some memory
 errors.

Thanks Mick.  My host is big with multiple data centers of their own.
They did exactly as I asked and I'm running on new RAM.  There was a
problem bringing my system back online and the cause was purported to
be an unseated ethernet cable.  I handed over my root password as I
was requested to do, and then started to get paranoid.  I suppose I
shouldn't though because with physical access to my machine they
pretty much have full access anyway, right?

- Grant



Re: [gentoo-user] machine check exception errors

2010-09-22 Thread Dale

Grant wrote:


Thanks Mick.  My host is big with multiple data centers of their own.
They did exactly as I asked and I'm running on new RAM.  There was a
problem bringing my system back online and the cause was purported to
be an unseated ethernet cable.  I handed over my root password as I
was requested to do, and then started to get paranoid.  I suppose I
shouldn't though because with physical access to my machine they
pretty much have full access anyway, right?

- Grant


   


Usually, physical access means they either have it or can get it pretty 
quick.  Boot a CD/DVD, mount the partitions, chroot in, change password 
and reboot.  Then, you don't have the password but they do.


My conspiracy hat on, if you can't trust them with the password, why do 
they have your data?  Just thinking.  ;-)


This leaves out the encryption thing though.  That would change things.

Dale

:-)  :-)



Re: [gentoo-user] machine check exception errors

2010-09-21 Thread Grant
  I'm getting a lot of machine check exception errors in dmesg on my
  hosted server.  Running mcelog I get:
 
  # mcelog
  HARDWARE ERROR. This is *NOT* a software problem!

 [...]

  Should I just contact the hosting company?  Can anyone give me more
  info on what this means?  Bad memory?

 They are likely better able to help you if it's a hardware problem.

 It reads as if the error correction in one of the RAM modules is kicking in.
 Ask them to reseat or replace the bad module - which they will have to find by
 trial and error.  They could hot-swap them and see when the errors stop.
 --
 Regards,
 Mick

They offered to take my machine down and do a memory test which they
said would take a number of hours.  Is a memory test likely to help?
Did you suggest reseating or replacing RAM modules as opposed to a
memory test because it will result in less downtime?

- Grant



Re: [gentoo-user] machine check exception errors

2010-09-21 Thread Stroller

On 21 Sep 2010, at 18:37, Grant wrote:
 I'm getting a lot of machine check exception errors in dmesg on my
 hosted server.  Running mcelog I get:
 ...
 
 They offered to take my machine down and do a memory test which they
 said would take a number of hours.  Is a memory test likely to help?
 Did you suggest reseating or replacing RAM modules as opposed to a
 memory test because it will result in less downtime?

I suspect that your hosting provider are offering you this memory test because 
they don't want to go swapping out memory modules willy-nilly.

How do they know that the problem is really memory, and not your operating 
system?
If they take all this RAM out and put new RAM in, what do they do with the old 
RAM? They don't know if it's good or bad, so are they expected to just slap it 
in a server belonging to another customer, and stitch him up?

A memory test is likely to identify bad RAM, if it is bad, so you should 
proceed with this. This is likely the best route to solving the problem.

I think that ideally, for you, they would move the system image onto a 
different known-good server with the same configuration. Then you cannot 
complain if the same problems start occurring again. If the problem is 
genuinely hardware then they won't. And the hosting provider is free to run 
diagnostics on your old machine.

But realistically, the memory test is likely to show up a bad RAM module, 
you'll get it replaced and be up and running within a few hours. Why would you 
refuse? If your system needed a guaranteed uptime you'd perhaps have to pay for 
a higher level of service than the fees you're paying at present.

Stroller.




Re: [gentoo-user] machine check exception errors

2010-09-21 Thread Mick
On Tuesday 21 September 2010 20:15:05 Stroller wrote:
 On 21 Sep 2010, at 18:37, Grant wrote:
  I'm getting a lot of machine check exception errors in dmesg on my
  hosted server.  Running mcelog I get:
  ...
  
  They offered to take my machine down and do a memory test which they
  said would take a number of hours.  Is a memory test likely to help?
  Did you suggest reseating or replacing RAM modules as opposed to a
  memory test because it will result in less downtime?
 
 I suspect that your hosting provider are offering you this memory test
 because they don't want to go swapping out memory modules willy-nilly.
 
 How do they know that the problem is really memory, and not your operating
 system? If they take all this RAM out and put new RAM in, what do they do
 with the old RAM? They don't know if it's good or bad, so are they
 expected to just slap it in a server belonging to another customer, and
 stitch him up?
 
 A memory test is likely to identify bad RAM, if it is bad, so you should
 proceed with this. This is likely the best route to solving the problem.
 
 I think that ideally, for you, they would move the system image onto a
 different known-good server with the same configuration. Then you cannot
 complain if the same problems start occurring again. If the problem is
 genuinely hardware then they won't. And the hosting provider is free to
 run diagnostics on your old machine.
 
 But realistically, the memory test is likely to show up a bad RAM module,
 you'll get it replaced and be up and running within a few hours. Why would
 you refuse? If your system needed a guaranteed uptime you'd perhaps have
 to pay for a higher level of service than the fees you're paying at
 present.

I run memory tests overnight.  If a module is seriously borked then it will 
fail earlier.  Reseating/replacing takes a few minutes, instead of hours.

If they have spare machines (for dev't or testing) they can fit the memory 
module(s) there and test them exhaustively, before they put the good ones back 
into a customer's machine.
-- 
Regards,
Mick




Re: [gentoo-user] machine check exception errors

2010-09-21 Thread Grant
  I'm getting a lot of machine check exception errors in dmesg on my
  hosted server.  Running mcelog I get:
  ...
 
  They offered to take my machine down and do a memory test which they
  said would take a number of hours.  Is a memory test likely to help?
  Did you suggest reseating or replacing RAM modules as opposed to a
  memory test because it will result in less downtime?

 I suspect that your hosting provider are offering you this memory test
 because they don't want to go swapping out memory modules willy-nilly.

 How do they know that the problem is really memory, and not your operating
 system? If they take all this RAM out and put new RAM in, what do they do
 with the old RAM? They don't know if it's good or bad, so are they
 expected to just slap it in a server belonging to another customer, and
 stitch him up?

 A memory test is likely to identify bad RAM, if it is bad, so you should
 proceed with this. This is likely the best route to solving the problem.

 I think that ideally, for you, they would move the system image onto a
 different known-good server with the same configuration. Then you cannot
 complain if the same problems start occurring again. If the problem is
 genuinely hardware then they won't. And the hosting provider is free to
 run diagnostics on your old machine.

 But realistically, the memory test is likely to show up a bad RAM module,
 you'll get it replaced and be up and running within a few hours. Why would
 you refuse? If your system needed a guaranteed uptime you'd perhaps have
 to pay for a higher level of service than the fees you're paying at
 present.

 I run memory tests overnight.  If a module is seriously borked then it will
 fail earlier.  Reseating/replacing takes a few minutes, instead of hours.

 If they have spare machines (for dev't or testing) they can fit the memory
 module(s) there and test them exhaustively, before they put the good ones back
 into a customer's machine.

Thanks Mick and Stroller.  I'll see if they'll go for this.

- Grant



Re: [gentoo-user] machine check exception errors

2010-09-15 Thread Mick
On Tuesday 14 September 2010 19:16:52 Albert Hopkins wrote:
 On Tue, 2010-09-14 at 09:45 -0700, Grant wrote:
  I'm getting a lot of machine check exception errors in dmesg on my
  hosted server.  Running mcelog I get:
  
  # mcelog
  HARDWARE ERROR. This is *NOT* a software problem!
 
 [...]
 
  Should I just contact the hosting company?  Can anyone give me more
  info on what this means?  Bad memory?
 
 They are likely better able to help you if it's a hardware problem.

It reads as if the error correction in one of the RAM modules is kicking in.  
Ask them to reseat or replace the bad module - which they will have to find by 
trial and error.  They could hot-swap them and see when the errors stop.
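
The trial and error can be made a bit more systematic by testing half of the modules at a time. The Python below is a sketch under strong assumptions: exactly one module is bad, the errors show up reliably during each trial, and the machine can run with any subset of modules (real servers may restrict which slots can be populated). The function name and the errors_seen callback are hypothetical; in practice the callback stands for "run with only these modules fitted and watch mcelog".

```python
def find_bad_module(modules, errors_seen):
    """Narrow down a single faulty module by halving the candidates."""
    candidates = list(modules)
    if not candidates:
        return None
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        if errors_seen(half):
            candidates = half  # The fault is in the half just tested.
        else:
            candidates = candidates[len(half):]  # It must be in the rest.
    return candidates[0]


if __name__ == "__main__":
    # Simulated example with made-up slot names; DIMM_C plays the bad module.
    slots = ["DIMM_A", "DIMM_B", "DIMM_C", "DIMM_D"]
    print(find_bad_module(slots, lambda fitted: "DIMM_C" in fitted))
```

With four modules this takes two trials instead of up to three when swapping one module at a time, and the saving grows with the number of modules.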
-- 
Regards,
Mick




Re: [gentoo-user] machine check exception errors

2010-09-14 Thread Albert Hopkins
On Tue, 2010-09-14 at 09:45 -0700, Grant wrote:
 I'm getting a lot of machine check exception errors in dmesg on my
 hosted server.  Running mcelog I get:
 
 # mcelog
 HARDWARE ERROR. This is *NOT* a software problem!
[...]
 Should I just contact the hosting company?  Can anyone give me more
 info on what this means?  Bad memory?

They are likely better able to help you if it's a hardware problem.