Re: repeatable hang with loop mount and heavy IO in guest [NOT SOLVED]

2010-02-03 Thread Antoine Martin

On 01/23/2010 02:15 AM, Antoine Martin wrote:

On 01/23/2010 01:28 AM, Antoine Martin wrote:

On 01/22/2010 02:57 PM, Michael Tokarev wrote:

Antoine Martin wrote:

I've tried various guests, including the most recent Fedora 12 kernels and
custom 2.6.32.x. All of them hang around the same point (~1GB written) when
I do heavy IO writes inside the guest.

[]

Host is running: 2.6.31.4
QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88)

Please update to the latest version and repeat.  kvm-88 is ancient and
_lots_ of stuff has been fixed and changed since then; I doubt anyone
here will try to dig into kvm-88 problems.

Current kvm is qemu-kvm-0.12.2, released yesterday.

Sorry about that, I didn't realize 88 was so far behind.
Upgrading to qemu-kvm-0.12.2 did solve my IO problems.
Only for a while. Same problem just re-occurred, only this time it 
went a little further.

It is now just sitting there, with a load average of exactly 3.0 (+- 5%)

Here is a good trace of the symptom during writeback: you can see it
write the data at around 50MB/s as the CPU goes from idle to mostly sys,
but after a while it just stops writing and sits mostly in the wait state:

total-cpu-usage -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  1   0  99   0   0   0|   0     0 | 198B  614B|   0     0 |  36    17
  1   0  99   0   0   0|   0     0 | 198B  710B|   0     0 |  31    17
  1   1  98   0   0   0|   0   128k| 240B  720B|   0     0 |  39    26
  1   1  98   0   0   0|   0     0 | 132B  564B|   0     0 |  31    14
  1   0  99   0   0   0|   0     0 | 132B  468B|   0     0 |  31    14
  1   1  98   0   0   0|   0     0 |  66B  354B|   0     0 |  30    13
  0   4  11  85   0   0| 852k    0 | 444B 1194B|   0     0 | 215   477
  2   2   0  96   0   0| 500k    0 | 132B  756B|   0     0 | 169   458
  3  57   0  39   1   0| 228k   10M| 132B  692B|   0     0 | 476  5387
  6  94   0   0   0   0|  28k   23M| 132B  884B|   0     0 | 373  2142
  6  89   0   2   2   0|  40k   38M|  66B  692B|   0  8192B| 502  5651
  4  47   0  48   0   0| 140k   34M| 132B  836B|   0     0 | 605  1664
  3  64   0  30   2   0|  60k   50M| 132B  370B|   0    60k| 750   631
  4  59   0  35   2   0|  48k   45M| 132B  836B|   0    28k| 708  1293
  7  81   0  10   2   0|  68k   67M| 132B  788B|   0   124k| 928  1634
  5  74   0  20   1   0|  48k   48M| 132B  756B|   0   316k| 830  5715
  5  70   0  24   1   0| 168k   48M| 132B  676B|   0   100k| 734  5325
  4  70   0  24   1   0|  72k   49M| 132B  948B|   0    88k| 776  3784
  5  57   0  37   1   0|  36k   37M| 132B  996B|   0   480k| 602   369
  2  21   0  77   0   0|  36k   23M| 132B  724B|   0    72k| 318  1033
  4  51   0  43   2   0| 112k   43M| 132B  756B|   0   112k| 681   909
  5  55   0  40   0   0|  88k   48M| 140B  926B|  16k   12k| 698   557
total-cpu-usage -dsk/total- -net/total- ---paging-- ---system--
usr sys idl wai hiq siq| read  writ| recv  send|  in   out | int   csw
  3  45   0  51   1   0|2248k   29M| 198B 1028B|  28k   44k| 681  5468
  1  21   0  78   0   0|  92k   17M|1275B 2049B|  92k   52k| 328  1883
  3  30   0  66   1   0| 288k   28M| 498B 2116B|   0    40k| 455   679
  1   1   0  98   0   0|4096B    0 | 394B 1340B|4096B    0 |  41    19
  1   1   0  98   0   0| 148k   52k| 881B 1592B|4096B   44k|  75    61
  1   2   0  97   0   0|1408k    0 | 351B 1727B|   0     0 | 110   109
  2   1   0  97   0   0|8192B    0 |1422B 1940B|   0     0 |  53    34
  1   0   0  99   0   0|4096B   12k| 328B 1018B|   0     0 |  41    24
  1   4   0  95   0   0| 340k    0 |3075B 2152B|4096B    0 | 153   191
  4   7   0  89   0   0|1004k   44k|1526B 1906B|   0     0 | 254   244
  0   1   0  99   0   0|  76k    0 | 708B 1708B|   0     0 |  67    57
  1   1   0  98   0   0|   0     0 | 174B  702B|   0     0 |  32    14
  1   1   0  98   0   0|   0     0 | 132B  354B|   0     0 |  32    11
  1   0   0  99   0   0|   0     0 | 132B  468B|   0     0 |  32    16
  1   0   0  99   0   0|   0     0 | 132B  468B|   0     0 |  32    14
  1   1   0  98   0   0|   0    52k| 132B  678B|   0     0 |  41    27
  1   0   0  99   0   0|   0     0 | 198B  678B|   0     0 |  35    17
  1   1   0  98   0   0|   0     0 | 198B  468B|   0     0 |  34    14
  1   0   0  99   0   0|   0     0 |  66B  354B|   0     0 |  28    11
  1   0   0  99   0   0|   0     0 |  66B  354B|   0     0 |  28     9
  1   1   0  98   0   0|   0     0 | 132B  468B|   0     0 |  34    16
  1   0   0  98   0   1|   0     0 |  66B  354B|   0     0 |  30    11
  1   1   0  98   0   0|   0     0 |  66B  354B|   0     0 |  29    11
From that point onwards, nothing will happen.
The host has disk IO to spare... So what is it waiting for??
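
For anyone who wants to spot this pattern in a long capture without reading every row, here is a rough sketch that filters dstat-style lines for samples where the CPU sits almost entirely in iowait while nothing is being written. The function name flag_stalls and the WAIT_MIN threshold (default 90) are made up for illustration, and it assumes the default column layout shown in the trace (usr sys idl wai hiq siq, then read writ).

```shell
# flag_stalls: read dstat-style lines on stdin and print the samples that
# look stalled, i.e. iowait at or above WAIT_MIN percent and zero bytes written.
# WAIT_MIN is a hypothetical knob, not something from the thread.
flag_stalls() {
    awk -F'|' -v min="${WAIT_MIN:-90}" '
        NF >= 2 {
            # First group: usr sys idl wai hiq siq; second group: read writ
            n = split($1, cpu, " ")
            m = split($2, dsk, " ")
            if (n != 6 || m != 2) next          # not a data row
            if (cpu[4] !~ /^[0-9]+$/) next      # column header row
            if (cpu[4] + 0 >= min && dsk[2] == "0")
                print "stalled: " $0
        }'
}
```

Something like `dstat 1 | flag_stalls` should work if dstat is run without colors. It is only a heuristic: a sample that is waiting on reads before any write starts is flagged as well.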

Moved to an AMD64 host. No effect.
Disabled swap before running the test. No effect.
Moved the guest to a fully up-to-date FC12 server (2.6.31.6-145.fc12.x86_64). No effect.


I am still seeing traces like these in dmesg (of various lengths, but always
ending in sync_page):


[ 2401.350143] INFO: task perl:29512 blocked for more 
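
In case it helps others collect the same information, below is a small sketch for pulling the blocked task names out of dmesg and for listing processes in uninterruptible sleep. The function names hung_tasks and d_state are made up for illustration; the ps options assume the procps version of ps found on typical Linux guests.

```shell
# hung_tasks: read kernel log text on stdin and print the distinct
# "name:pid" pairs from "INFO: task ... blocked for more ..." lines.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\) blocked for more.*/\1/p' | sort -u
}

# d_state: list processes currently in uninterruptible sleep (state D),
# together with the kernel function they are waiting in (wchan).
# Assumes procps ps; the header line is kept for readability.
d_state() {
    ps -eo pid,stat,wchan:32,comm | awk 'NR == 1 || $2 ~ /^D/'
}
```

Typical use would be `dmesg | hung_tasks` followed by `d_state` inside the guest while it is stuck.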

Re: repeatable hang with loop mount and heavy IO in guest [NOT SOLVED]

2010-01-24 Thread Antoine Martin

On 01/23/2010 02:15 AM, Antoine Martin wrote:

On 01/23/2010 01:28 AM, Antoine Martin wrote:

On 01/22/2010 02:57 PM, Michael Tokarev wrote:

Antoine Martin wrote:

I've tried various guests, including the most recent Fedora 12 kernels and
custom 2.6.32.x. All of them hang around the same point (~1GB written) when
I do heavy IO writes inside the guest.

[]

Host is running: 2.6.31.4
QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88)

Please update to the latest version and repeat.  kvm-88 is ancient and
_lots_ of stuff has been fixed and changed since then; I doubt anyone
here will try to dig into kvm-88 problems.

Current kvm is qemu-kvm-0.12.2, released yesterday.

Sorry about that, I didn't realize 88 was so far behind.
Upgrading to qemu-kvm-0.12.2 did solve my IO problems.
Only for a while. Same problem just re-occurred, only this time it 
went a little further.

It is now just sitting there, with a load average of exactly 3.0 (+- 5%)

Here is a good trace of the symptom during writeback: you can see it
write the data at around 50MB/s as the CPU goes from idle to mostly sys,
but after a while it just stops writing and sits mostly in the wait state:

[snip]

From that point onwards, nothing will happen.
The host has disk IO to spare... So what is it waiting for??
Note: if I fill the disk in the guest with zeroes without going through
a loop-mounted filesystem, then everything works just fine. Something about
using the loopback makes it fall over.


Here is the simplest way to make this happen:
time dd if=/dev/zero of=./test bs=1048576 count=2048
2147483648 bytes (2.1 GB) copied, 65.1344 s, 33.0 MB/s

mkfs.ext3 ./test; mkdir tmp
mount -o loop ./test ./tmp
time dd if=/dev/zero of=./tmp/test-loop bs=1048576 count=2048
^this one will never return and you can't just kill dd, it's stuck.
The whole guest has to be killed at this point.
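
The same steps as a script, for anyone who wants to try this on their own guest. This is only a sketch: the DRY_RUN, SIZE_MB and LIMIT variables and the function names are invented here, and the -F flag to mkfs.ext3 is added so that it does not stop to ask about ./test being a regular file. The last dd runs in the background and is polled, because a plain timeout wrapper would itself block on a dd that cannot be killed.

```shell
#!/bin/sh
# Sketch of the loop-mount reproduction. Run as root inside the guest,
# in a scratch directory. DRY_RUN defaults to 1, which only prints the
# commands; set DRY_RUN=0 to really run them.

run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

loop_repro() {
    size_mb=${SIZE_MB:-2048}   # 2048 x 1 MiB blocks, as in the original dd calls
    limit=${LIMIT:-600}        # seconds to wait before reporting a hang

    run dd if=/dev/zero of=./test bs=1048576 count="$size_mb" || return 1
    run mkfs.ext3 -F ./test || return 1
    run mkdir -p tmp || return 1
    run mount -o loop ./test ./tmp || return 1

    # This is the write that was reported to hang. On a healthy system it
    # will likely end with "No space left on device", since the filesystem
    # is slightly smaller than the image file.
    run dd if=/dev/zero of=./tmp/test-loop bs=1048576 count="$size_mb" &
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            echo "dd still running after ${limit}s (possible hang)" >&2
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"
}
```

Calling `loop_repro` as-is prints the five commands; `DRY_RUN=0 loop_repro` performs the test. If the guest hangs completely, the script will stop responding too, so watching from the host is still needed.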


QEMU PC emulator version 0.12.2 (qemu-kvm-0.12.2), Copyright (c) 
2003-2008 Fabrice Bellard

Guests: various, all recent kernels.
Host: 2.6.31.4
Before anyone suggests this, I have tried with/without elevator=noop, 
with/without virtio disks.

No effect, still hangs.

Antoine


Please advise.

Thanks
Antoine



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: repeatable hang with loop mount and heavy IO in guest [NOT SOLVED]

2010-01-22 Thread Antoine Martin

On 01/23/2010 01:28 AM, Antoine Martin wrote:

On 01/22/2010 02:57 PM, Michael Tokarev wrote:

Antoine Martin wrote:

I've tried various guests, including the most recent Fedora 12 kernels and
custom 2.6.32.x. All of them hang around the same point (~1GB written) when
I do heavy IO writes inside the guest.

[]

Host is running: 2.6.31.4
QEMU PC emulator version 0.10.50 (qemu-kvm-devel-88)

Please update to the latest version and repeat.  kvm-88 is ancient and
_lots_ of stuff has been fixed and changed since then; I doubt anyone
here will try to dig into kvm-88 problems.

Current kvm is qemu-kvm-0.12.2, released yesterday.

Sorry about that, I didn't realize 88 was so far behind.
Upgrading to qemu-kvm-0.12.2 did solve my IO problems.
Only for a while. Same problem just re-occurred, only this time it went 
a little further.

It is now just sitting there, with a load average of exactly 3.0 (+- 5%)

Here is a good trace of the symptom during writeback: you can see it
write the data at around 50MB/s as the CPU goes from idle to mostly sys,
but after a while it just stops writing and sits mostly in the wait state:


[snip]

From that point onwards, nothing will happen.
The host has disk IO to spare... So what is it waiting for??

QEMU PC emulator version 0.12.2 (qemu-kvm-0.12.2), Copyright (c) 
2003-2008 Fabrice Bellard

Guests: various, all recent kernels.
Host: 2.6.31.4

Please advise.

Thanks
Antoine
