Re: [gentoo-user] Question about flakey RAM

2015-01-29 Thread Daniel Frey
On 01/27/2015 03:28 PM, walt wrote:
 My question is why didn't memtest86+ find any errors?  Could it be that the
 first RAM I bought was actually okay but this machine didn't like it for some
 reason?  Both were DDR3/1333MHz, just from different manufacturers.
 

If the timing/voltage is set wrong in the BIOS this can happen. I had
bad memory sticks where the BIOS assumed certain timings and voltage,
but when I set them to the manufacturer's recommendations (manually
changing voltage and timings, and no, I was not overclocking...) they
were fine.

I ran memtest on that memory while it was still in its bad state, and it
checked out okay even after three days of continuous testing.

Weirdest thing I'd ever seen.

Dan




Re: [gentoo-user] Question about flakey RAM

2015-01-29 Thread Andrew Savchenko
On Tue, 27 Jan 2015 15:28:11 -0800 walt wrote:
 Yesterday I installed 4GB more of RAM in this machine for a total of 8GB, and
 the machine soon began random segfaulting and even a kernel crash or two, so
 obviously I suspected the new RAM was faulty.
 
 I let memtest86+ run overnight and it found zero memory errors. Today I
 exchanged the new RAM anyway and got a different brand this time, and
 that fixed the problem.
 
 My question is why didn't memtest86+ find any errors?  Could it be that the
 first RAM I bought was actually okay but this machine didn't like it for some
 reason?  Both were DDR3/1333MHz, just from different manufacturers.

As an addition to earlier posted comments:

1) memtest86+ has a bit fade test which is not enabled by default
(at least in the 4.x branch, which is the latest in the tree now), so
you have to enable and run it manually. IIRC it is enabled by
default in the 5.x branch (bug pending in bugzilla). By the way, 5.x
has some additional tests which may find faults that 4.x misses.

2) The same frequency is not enough to guarantee that memory banks
are compatible. They may require different timings or, less likely,
a different voltage. Some BIOS tuning may help here.

3) Memory may be (un)buffered, (un)registered, ECC/non-ECC. Many of
these combinations are not compatible with each other.

4) In some rare cases even banks with the same parameters from
different manufacturers are not compatible due to technological
differences (this goes down to how the logic circuits are
implemented).
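
To make point 1 a bit more concrete, here is a minimal sketch of the idea
behind a bit fade test: write a known pattern, leave the memory idle for a
while, then check that nothing has changed. This is not how memtest86+
implements it; the function names, buffer size and hold time are made up for
illustration, and a userspace Python script only sees whatever virtual memory
the OS hands it. Treat it as a picture of the technique, not a replacement
for the real test.

```python
import time


def find_mismatches(buf, fill):
    """Return the offsets in buf whose byte no longer equals fill."""
    return [i for i, b in enumerate(buf) if b != fill]


def bit_fade_check(size_bytes=1 << 20, hold_seconds=1.0):
    """Write all-zeros, then all-ones, idling after each write before verifying.

    Returns a list of (fill, offset) pairs for every byte that changed.
    size_bytes and hold_seconds are illustrative defaults, not memtest86+ values.
    """
    bad = []
    for fill in (0x00, 0xFF):
        buf = bytearray([fill]) * size_bytes   # fill the buffer with the pattern
        time.sleep(hold_seconds)               # idle period: data must survive untouched
        bad.extend((fill, off) for off in find_mismatches(buf, fill))
    return bad
```

On healthy hardware `bit_fade_check()` should return an empty list. The real
test works on physical memory directly, which a script like this cannot do.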

Best regards,
Andrew Savchenko




Re: [gentoo-user] Question about flakey RAM

2015-01-29 Thread Volker Armin Hemmann
Am 28.01.2015 um 00:28 schrieb walt:
 Yesterday I installed 4GB more of RAM in this machine for a total of 8GB, and
 the machine soon began random segfaulting and even a kernel crash or two, so
 obviously I suspected the new RAM was faulty.

 I let memtest86+ run overnight and it found zero memory errors. Today I
 exchanged the new RAM anyway and got a different brand this time, and
 that fixed the problem.

 My question is why didn't memtest86+ find any errors?  Could it be that the
 first RAM I bought was actually okay but this machine didn't like it for some
 reason?  Both were DDR3/1333MHz, just from different manufacturers.




Since this was not mentioned yet:

Maybe because the RAM was not faulty at all.

Maybe it really operated within the allowed tolerances, and those
were never exceeded under memtest, which is a very light system load.

But with an OS booted, and the CPU, graphics solution and hard disks all
sucking power like mad, your mainboard or PSU might not be able to deliver
currents as stable as the specifications demand. Some memory is more
tolerant than others.
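
As a rough illustration of testing under load, the sketch below runs a simple
write/verify pass in several worker processes at once, so that the CPU and
memory are busy at the same time. Everything here (`pattern_pass`, `stress`,
the sizes and the worker count) is hypothetical and chosen for the example.
It only touches whatever memory the OS gives each process, so it cannot
replace memtest86+ or prove anything about a particular module.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def pattern_pass(args):
    """Fill a buffer with a seeded pseudo-random pattern, rewrite it, verify it.

    args is a (size_bytes, seed) tuple. Returns the number of mismatched bytes.
    """
    size_bytes, seed = args
    rng = random.Random(seed)
    pattern = bytes(rng.getrandbits(8) for _ in range(size_bytes))
    buf = bytearray(pattern)
    # Invert every byte twice: each byte is rewritten, and the buffer
    # should end up identical to the original pattern.
    for _ in range(2):
        buf = bytearray(b ^ 0xFF for b in buf)
    return sum(1 for a, b in zip(buf, pattern) if a != b)


def stress(workers=4, size_bytes=1 << 20, passes=10):
    """Run `passes` pattern passes spread over `workers` processes."""
    jobs = [(size_bytes, seed) for seed in range(passes)]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(pattern_pass, jobs))
```

On a healthy machine `stress()` should return 0. Call it from under an
`if __name__ == "__main__":` guard so the worker processes can import the
module cleanly.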



Re: [gentoo-user] Question about flakey RAM

2015-01-29 Thread Mick
On Thursday 29 Jan 2015 22:13:28 Volker Armin Hemmann wrote:
 Am 28.01.2015 um 00:28 schrieb walt:
  Yesterday I installed 4GB more of RAM in this machine for a total of 8GB,
  and the machine soon began random segfaulting and even a kernel crash or
  two, so obviously I suspected the new RAM was faulty.
  
  I let memtest86+ run overnight and it found zero memory errors. Today I
  exchanged the new RAM anyway and got a different brand this time, and
  that fixed the problem.
  
  My question is why didn't memtest86+ find any errors?  Could it be that
  the first RAM I bought was actually okay but this machine didn't like it
  for some reason?  Both were DDR3/1333MHz, just from different
  manufacturers.
 
 Since this was not mentioned yet:
 
 Maybe because the ram was not faulty at all.
 
 Maybe it really operated in the range of allowed tolerances - and those
 were never crossed with memtest as a very light system load.
 
 But with an OS booted, the CPU, graphics solution, harddisks all sucking
 power like mad, your mainboard or PSU might not be able to deliver as
 stable currents as the specifications demand. Some memory is more
 tolerant than other.

Yes, I've witnessed this too after adding 2 new memory modules of a different
size to the originals and from a different manufacturer, in a box with a
suspect PSU.  memtest86+ was not erroring out, but the system was crashing
when put under pressure.  Typically I would get errors once more memory than
the size of the old modules was in use.  This got worse over time, as the PSU
components were ageing.  Eventually I replaced a capacitor in the PSU and the
memory problems disappeared.

It has already been mentioned, but it is worth noting that some BIOS/MoBos are
more sensitive to different brands of memory.  In those cases I found that
using modules of the same make and size resolves the problems.
-- 
Regards,
Mick




Re: [gentoo-user] Question about flakey RAM

2015-01-28 Thread Alan McKinnon
On 28/01/2015 01:28, walt wrote:
 Yesterday I installed 4GB more of RAM in this machine for a total of 8GB, and
 the machine soon began random segfaulting and even a kernel crash or two, so
 obviously I suspected the new RAM was faulty.
 
 I let memtest86+ run overnight and it found zero memory errors. Today I
 exchanged the new RAM anyway and got a different brand this time, and
 that fixed the problem.
 
 My question is why didn't memtest86+ find any errors?  Could it be that the
 first RAM I bought was actually okay but this machine didn't like it for some
 reason?  Both were DDR3/1333MHz, just from different manufacturers.


RAM, like everything else made in a factory, is built to tolerances. So
are your CPU and motherboard.

A positive result from memtest+ (something failed) is definitive - there
really is a problem and it is likely the RAM. Or maybe your RAM just
doesn't like your motherboard but this is rare.

A negative result from memtest+ (nothing failed) is not definitive - it
doesn't mean the RAM and your system are not faulty, it just means
memtest+ didn't find anything. Sometimes you have to run memtest+ for 48
hours to trip over the problem whereas your running OS does it
immediately (it's a computer, go figure).
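
To put rough numbers on that: suppose, purely as an assumption, that a
marginal module produces a detectable error on any given test pass with some
small, independent probability p. Then the chance that n passes all come back
clean is (1 - p)^n. The values below are made up and real faults are not this
well behaved, but it shows why a clean overnight run proves little.

```python
def clean_run_probability(p_error_per_pass, passes):
    """Chance that every pass is clean, assuming independent passes (an assumption)."""
    return (1.0 - p_error_per_pass) ** passes


# Hypothetical numbers: a 1% chance of tripping the fault on each pass.
for passes in (10, 50, 200):
    print(passes, round(clean_run_probability(0.01, passes), 3))
```

With these made-up numbers, ten passes would still come back clean about 90%
of the time.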

Keep in mind memtest+ is an artificial testbed - it tries its best to
find issues but it's not the same thing as your running system. And
there are lots of variables:

Have you overclocked? Over or under volted?
Is your PSU OK, could the running system stress it out?
Or maybe timing tolerances between that RAM stick and your motherboard are
close to the edge.

I think your machine just didn't like that RAM and it would work fine in
999 other machines. It happens sometimes, manufacturing and test rigs
are not 100% perfect. They are close, but never 100%.

-- 
Alan McKinnon
alan.mckin...@gmail.com