Hello everybody,

I have also been facing this problem for several months and have been trying to solve it on my own.

In my case the situation is even more complicated, as I'm using a fairly complex setup: the VMs' partitions are attached from another server via AoE (ATA over Ethernet).

According to the console logs, the problem seems to be related to networking, as the first "hang-up" messages are usually something like:
igb 0000:01:00.0 eth2: Reset adapter
igb 0000:01:00.0 eth2: Reset adapter
...

followed by a sequence of:
ata3.00: exception Emask 0x0 SAct 0x300000 SErr 0x0 action 0x6 frozen
ata3.00: failed command: WRITE FPDMA QUEUED
ata3.00: cmd 61/01:a0:08:f0:08/00:00:00:00:00/40 tag 20 ncq dma 512 out
         res 40/00:ff:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
ata3.00: failed command: READ FPDMA QUEUED
ata3.00: cmd 60/08:a8:20:f8:55/00:00:11:00:00/40 tag 21 ncq dma 4096 in
         res 40/00:00:00:4f:c2/00:00:00:00:00/40 Emask 0x4 (timeout)
...

and then further I/O errors, such as:
blk_update_request: I/O error, dev sdb, sector 254266864

and others.


IMHO something (the networking?) breaks I/O in the kernel. Nevertheless, the system somehow keeps running: it is unable to read from or write to the disks, yet no kernel panic is reported!

I've tried all of the suggested actions: newer kernels, a custom-built kernel with IPv6, QoS and other potentially problematic options disabled, disabling all suspect features in the BIOS, tuning the network interface parameters, and even disabling the interfaces altogether (and using others from a different vendor).
None of it made any difference.

However, I discovered by lucky chance that the problem is somehow related to the CPU family. With Intel Xeon E55xx CPUs (tested with at least the E5520 and E5504), the problem occurs quite frequently (roughly once per day), while with E56xx CPUs (tested with at least the E5606), it becomes very rare or disappears entirely (I have now been running one for more than a month without a problem).

I hope this information helps to solve the problem, or at least gives other affected (and desperate) users a viable option.

Regards,

Jan.
