On 06/14/2011 05:12 PM, Massimiliano Ferrero wrote:
>
>> Next time, when you try to test/re-create the bug, capture dstat
>> output. The default dstat output is good enough to tell us what the
>> system state was during the starvation.
> Hello, yesterday and tonight I performed some other tests; these are
> the results:
>
> 1) It seems I am not able to reproduce the bug on a test system.
> The test system (san01) has the same processor (E5220) and amount of
> RAM (12 GB), but a smaller I/O subsystem: an 8-channel 3ware
> controller with an 8-disk RAID 5 array.
> The system that presents the problem (san00) has a 24-channel
> controller and a 23-disk RAID 6 array (+ 1 hot spare).
> Both systems are connected through the same gigabit switches.
>

Is the OS/kernel also the same?

> There is another hw difference between the two environments: the nodes
> connected to san00 are high-end hw, and their network card is able to
> generate nearly 1 Gb/s of iSCSI traffic.
> The nodes connected to san01 are low-end hw and their network card
> does not exceed 300 Mb/s.
> So the system that presents the problem has both a higher-performance
> I/O subsystem and a machine doing iSCSI traffic that is able to
> generate more than three times the I/O operations.

Unfortunately, the default dstat output doesn't capture VM statistics.
Do you have any idea what the VM consumption was when you saw the
problem?

Assuming it is buffered I/O, your VM will soon be consumed (I can see
from the dstat logs that the CPU was in I/O wait for a very long time)
and the box will then start paging. And this is still the scenario
where Linux tends to succumb.

On your faulty setup, can you try unbuffered direct I/O and see if that
can trigger the problem? I have low hopes that it will fail. To
generate (direct) I/O you can use the fio tool; it is already
pre-packaged in Debian.
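Something along these lines is what I have in mind. Treat it as an
untested sketch: the LV path, size and job parameters below are only
placeholders you would have to adapt, and fio will overwrite whatever
you point it at, so please use a scratch LV:

# direct (unbuffered) writes: this bypasses the page cache, so the
# dirty-page/writeback path is not involved
fio --name=direct-write --filename=/dev/vg0/scratch_lv --rw=write \
    --bs=1M --size=8G --direct=1 --ioengine=libaio --iodepth=16

# in a second terminal, capture dstat including the memory, paging and
# swap columns that the default output does not show
dstat -tcdngyms --output dstat-direct.csv 5

If the stall shows up with buffered dd but not with --direct=1, that
points further towards the writeback path rather than the controller.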
> At the moment I am not able to tell which of these aspects, or the
> sum of them, creates the conditions for the problem: I suspect it is
> a mix of all of them.
> Unfortunately at the moment I do not have hw similar to the one in
> production to perform a test in the same conditions.
>
> 2) san00 presents the problem even with the deadline scheduler active
> on all logical volumes exported through iSCSI or used by the
> heavy-load operation (dd).
>
> 3) On san00 I was able to reproduce the problem in a simpler
> condition than the one I described in the first mail: just one node
> connected through iSCSI, the other node was restarting, no virtual
> machines were running on the node, and the node was performing one
> I/O-intensive operation on one of the LVs exported by iSCSI/LVM (an
> fsck on one file system).
> During this operation I launched a dd on san00 and the iSCSI
> connection was dropped after a few seconds.

I think it is the typical Linux I/O controller problem, which I believe
is a combination of the I/O scheduler + VM subsystem.
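On the scheduler point in 2): with LVM it is worth double-checking
where the elevator actually runs. For the device-mapper nodes the
scheduler setting is, as far as I remember, effectively a no-op, and
what matters is the scheduler on the underlying physical device that
the RAID array is exposed as. A quick check (sda below is only an
assumption, substitute whatever device your RAID controller exposes):

# what is active on the backing device right now
cat /sys/block/sda/queue/scheduler

# switch it to deadline at runtime (not persistent across reboots)
echo deadline > /sys/block/sda/queue/scheduler

If deadline was only set on the dm-* nodes, the default cfq on the
backing device would still be in effect.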
> I am attaching 3 files: the dstat output during the test and extracts
> of /var/log/messages and /var/log/syslog.
> I have filtered out the information for non-relevant services
> (nagios, dhcp, snmp, postfix, etc.), both for readability and
> confidentiality.
> ietd was running with the following command line:
> /usr/sbin/ietd --debug=255
> so in the logs we have debug information.
> The problem can be seen in syslog at Jun 14 01:28:53.
> At Jun 14 01:34:06 I turned off the node for reboot, and in the log
> there are some records regarding the termination of iSCSI sessions.
> I do not see anything relevant in the ietd debug log, just a restart
> of the connections.
>
> In the dstat output the dd operation was started around line 197 and
> was terminated at line 208 (I interrupted the operation as soon as I
> saw the problem).
>
> What I see in the dstat output is the following: for some seconds
> (about 10) dd does not generate a lot of reads and writes
>
> usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
>   7   2  56  35   0   0|  12M   14M|  22k   35k|   0     0 |4415   12k
>
> 12M read and 14M write, and this could be from the dd operation or
> from the fsck performed through iSCSI.
>
> Then there is a burst of writes, I guess using the full I/O capacity
> of the controller and of the disks:
>
> usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
>  35   7  35  22   0   1|8180k  325M|  38k   25k|   0     0 |6860   11k
>   2   3  59  36   0   1|3072B  541M|  20k   26k|   0     0 |5380  2747
>   3   4  64  30   0   0|5120B  473M|  21k   30k|   0     0 |4752   16k
>
> writes of 325M, 541M and 473M, and this is exactly the moment when
> the problem arises.
>
> Could it be that the I/O operations are cached in memory and the
> problem appears when they are flushed to disk?

Yes, that is my suspicion as well. The per-bdi writeback mechanism
improves this situation to a great extent, but I'm not sure whether it
is part of the Squeeze kernel (there are also a couple of writeback
knobs you can experiment with; see the sketch at the end of this mail).
http://lwn.net/Articles/326552/

> If nothing pointing to a potential solution comes out of the logs,
> the only other test I can think of is upgrading to a newer kernel.

You can try this just to ascertain the cause.

> However, I see this as a last resort, for several reasons:
> - as I see it, putting a test kernel directly on a production system
>   is not a wise move; I could (and in the past already have) run into
>   several other unknown bugs
> - all our other systems are running on a standard lenny or squeeze
>   kernel
> - I would lose support for kernel security updates from Debian
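Since you would rather not put a test kernel on the production box,
one knob-level experiment that needs no kernel change is to shrink the
amount of dirty data the page cache may accumulate before it is
flushed, so the huge write burst you see in dstat gets spread out
instead of arriving all at once. The values below are only a starting
point for experimentation, not tuned numbers:

# current thresholds (percentages of total RAM)
sysctl vm.dirty_background_ratio vm.dirty_ratio

# try much smaller values while reproducing the problem;
# this is runtime-only and reverts on reboot
sysctl -w vm.dirty_background_ratio=1
sysctl -w vm.dirty_ratio=5

If the iSCSI session survives the dd with these settings, that would
support the theory that the starvation happens when a large pile of
cached writes is flushed to disk in one go.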
-- 
Ritesh Raj Sarraf | http://people.debian.org/~rrs
Debian - The Universal Operating System

signature.asc
Description: OpenPGP digital signature