Re: PROBLEM: I/O scheduler problem with an 8 SATA disks raid 5 under heavy load ?

Robert Hancock Mon, 07 Jan 2008 16:30:38 -0800

Laurès wrote:

Hello,
Dear kernel developers, my dmesg asked me to report this, so here I go ;)
Here is what I found in my dmesg: "anticipatory: forced dispatching isbroken (nr_sorted=1), please report this".
- First, let's talk about the machine: it's quite pushed so maybe thecause is me doing something wrong rather than a bug in the kernel.
I got this alert on a dual core amd64 xen host. It has 8 SATA drivesmaking a raid 5 array. This array makes a virtual block device for oneof the virtual machines: an Openfiler appliance. Openfiler then manageslogical volumes on this device including an XFS partition shared viaNFS. 2 MythTV hosts continuously write MPEG2 tv shows on it (1 to 4Gbeach).Still following ? Here is a summary: MPEG2 files -> NFS -> XFS -> LVM ->Xen VBD -> RAID 5 -> 8x SATA disks.
- Next, the symptoms.
This setup is only 2 weeks old. Behavior was quite good, except for someunexplained failures from the sata_nv attached disks. Not always fromthe same disk. Never from any disks attached through the sata_sil HBA.Eventually a second disk would go down before the end of the raidreconstruction (still a sata_nv attached one).Since the disks showed nothing wrong with smartmontools I re-added themeach time. So far the raid array was strong enough to be fullyrecovered, mdadm --force and xfs_check are my friends ;-)It seems to happen more often now that the XFS partition is quiteheavily fragmented, and I can't even run the defragmenter without aquick failure.I didn't payed big attention to the logs and quickly decided to buy aSATA Sil PCI card to get rid of the Nvidia SATA HBA.
- Now the problem.
Yesterday, however, the MPEG2 streams hanged for a few tens of secondsjust as usual. But there were no disk failure. The array was still ingood shape, although dmesg showed the same "ata[56]: Resetting port","SCSI errors" etc. fuss.However this was new in dmesg: "anticipatory: forced dispatching isbroken (nr_sorted=1), please report this". Got 4 identical in a row.Maybe managing 8 block devices queues under load with the anticipatoryscheduler is too much ? I immediately switched to deadline on the 8disks, and I'll see if it it happens again by stressing the whole systemmore and more.I have no clue if anticipatory is a good choice or definitely not in mycase, anyone can point some documentation or good advices ?
- How to reproduce.

Here is what I would do:
Harness a small CPU with lots of sata/scsi drives.
Do raid 5 with big block size (1-4Mb) on it.
Make a 50G XFS file system with sunit/swidth options
Trigger bonnie++ with 1G<files<4G and fill the FS to 80-95%, trying toachieve 98%+ fragmentation.
Defrag !

- Finally the usual bug report stuff is attached.


From your report:

ata5: EH in ADMA mode, notifier 0x0 notifier_error 0x0 gen_ctl 0x1501000status 0x400

ata5: CPB 0: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 1: ctl_flags 0x1f, resp_flags 0x2
ata5: CPB 2: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 3: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 4: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 5: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 6: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 7: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 8: ctl_flags 0x1f, resp_flags 0x2
ata5: CPB 9: ctl_flags 0x1f, resp_flags 0x2
ata5: CPB 10: ctl_flags 0x1f, resp_flags 0x2
ata5: CPB 11: ctl_flags 0x1f, resp_flags 0x2
ata5: CPB 12: ctl_flags 0x1f, resp_flags 0x2
ata5: CPB 13: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 14: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 15: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 16: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 17: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 18: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 19: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 20: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 21: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 22: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 23: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 24: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 25: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 26: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 27: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 28: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 29: ctl_flags 0x1f, resp_flags 0x1
ata5: CPB 30: ctl_flags 0x1f, resp_flags 0x1
ata5: Resetting port
ata5.00: exception Emask 0x0 SAct 0x1f02 SErr 0x0 action 0x2 frozen
ata5.00: cmd 60/40:08:8f:eb:67/00:00:03:00:00/40 tag 1 cdb 0x0 data 32768 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: cmd 60/08:40:17:eb:67/00:00:03:00:00/40 tag 8 cdb 0x0 data 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: cmd 60/18:48:47:eb:67/00:00:03:00:00/40 tag 9 cdb 0x0 data 12288 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: cmd 60/08:50:77:eb:67/00:00:03:00:00/40 tag 10 cdb 0x0 data 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5.00: cmd 60/08:58:87:eb:67/00:00:03:00:00/40 tag 11 cdb 0x0 data 4096 in
         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)

ata5.00: cmd 60/48:60:d7:eb:67/00:00:03:00:00/40 tag 12 cdb 0x0 data36864 in

         res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata5: soft resetting port

The CPB resp_flags 0x2 entries are ones where the drive has been sentthe request and the controller is waiting for a response. The timeout is30 seconds, so that means the drive failed to service those queuedcommands for that length of time.

It may be that your drive has a poor NCQ implementation that can starvesome of the pending commands for a long time under heavy load?


--
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: PROBLEM: I/O scheduler problem with an 8 SATA disks raid 5 under heavy load ?

Reply via email to