On 05/02/13 12:49, Chris Mason wrote:
On Tue, Feb 05, 2013 at 03:16:34AM -0700, Tomasz Kusmierz wrote:
On 16/01/13 09:21, Bernd Schubert wrote:
On 01/16/2013 12:32 AM, Tom Kusmierz wrote:

p.s. It's bizarre that when I "fill" an ext4 partition with test data, everything
checks out OK (CRC over all files), but with Chris's tool it gets
corrupted, with both the crappy Adaptec PCIe controller and the motherboard's
built-in one. Also, since history has proven that my testing
facilities are crap, any suggestions on how I can test RAM, CPU &
controller would be appreciated.
Similar issues had been the reason we wrote ql-fstest at q-leap. Maybe
you could try that? You can easily see the pattern of the corruption
with that. But maybe Chris' stress.sh also provides it.
Anyway, yesterday I added support for specifying minimum and maximum file
sizes, as before it only used sizes from 1 MiB to 1 GiB... It's a bit cryptic
with bits, though; I will improve that later.
https://bitbucket.org/aakef/ql-fstest/downloads


Cheers,
Bernd


PS: But see my other thread: using ql-fstest, yesterday I entirely
broke a btrfs test file system, resulting in kernel panics.
Hi,

It's been a while, but I think I should provide a "definite answer", or
simply, what was the cause of the whole problem:

It was a printer!

Long story short, I was going nuts trying to diagnose which bit of my
server was going bad, and effectively I was down to blaming an interface
card that connects hot-swappable disks to the mobo / PCIe controllers. When
I got back from my holiday, I sat in front of the server and decided to
go with ql-fstest, which very nicely reports errors with a very
low lag (~2 minutes) after they occur. At that point my printer
kicked in with a "self clean", and an error showed up ~two minutes later,
so I restarted the printer, and while it was going through its own POST
with self clean, another error showed up. The issue turned out to be
that I was using one of those fantastic PCI 4-port Ethernet cards and the
printer was connected directly to it. After moving it and everything else to a
switch, all the problems and issues went away. At the moment I've been running
the server for 2 weeks without any corruption, random kernel btrfs
crashes, etc.
Wow, I've never heard that one before.  You might want to try a
different 4 port card and/or report it to the driver maintainer.  That
shouldn't happen ;)

ql-fstest looks neat, I'll check it out (thanks Bernd).
-chris

I forgot to mention that the server sits on a UPS, while the printer is connected directly to mains. Thinking about it, this creates a ground-shift effect, since nothing on a cheap PSU has a "real" ground. Anyway, this is not the fault of this 4-port card: I tried moving the printer to a cheap NE2000 and to the motherboard's integrated one, and the effect was the same. Diagnostics were also veeery problematic, because besides the corruption on the HDD, memtest was returning corruption in RAM on very rare occasions, and a CPU test was returning corruption about once a day. I replaced nearly everything in this server, including the PSU (with the 1400 W one from my dev rig), and it made NO difference. I should also mention that this printer is a colour laser printer with 4 drums to clean, so I would assume that it produces enough static electricity to power a small cattle.

PS: It shouldn't be a driver issue, since the errors in RAM were 1 to 4 bits in size and located in the same 32-bit word. Hence I think a single transfer must have been corrupted, rather than a whole Ethernet packet being shoved into random memory.
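To illustrate the reasoning in the PS: a minimal Python sketch (not taken from memtest or ql-fstest; the function names are hypothetical) that XORs an expected buffer against what was read back and counts the flipped bits in each aligned 32-bit word. One differing word with 1 to 4 flipped bits fits a single corrupted transfer, while a misdirected packet would typically change many consecutive words.

```python
def flipped_bits_by_word(expected, observed, word_size=4):
    """Map each aligned word index that differs to its number of flipped bits."""
    if len(expected) != len(observed):
        raise ValueError("buffers must have equal length")
    result = {}
    for offset in range(0, len(expected), word_size):
        # Interpret each word as an integer so XOR exposes the flipped bits.
        a = int.from_bytes(expected[offset:offset + word_size], "little")
        b = int.from_bytes(observed[offset:offset + word_size], "little")
        diff = a ^ b
        if diff:
            result[offset // word_size] = bin(diff).count("1")
    return result


def looks_like_single_word_upset(expected, observed, max_bits=4):
    """True if exactly one word differs, by at most max_bits bits (assumed threshold)."""
    errors = flipped_bits_by_word(expected, observed)
    return len(errors) == 1 and all(n <= max_bits for n in errors.values())


# Example: flip two bits in byte 5, which lies in 32-bit word 1.
expected = bytes(16)
observed = bytearray(expected)
observed[5] ^= 0b101
print(flipped_bits_by_word(expected, bytes(observed)))  # {1: 2}
```

The 4-bit threshold simply mirrors the "1 to 4 bits" observation above; it is an assumption, not a hardware rule.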
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
