Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-13 Thread Sage Weil
On Mon, 5 Aug 2013, Mike Dawson wrote:
 Josh,
 
 Logs are uploaded to cephdrop with the file name mikedawson-rbd-qemu-deadlock.
 
 - At about 2013-08-05 19:46 or 47, we hit the issue, traffic went to 0
 - At about 2013-08-05 19:53:51, ran a 'virsh screenshot'
 
 
 Environment is:
 
 - Ceph 0.61.7 (client is co-mingled with three OSDs)
 - rbd cache = true and cache=writeback
 - qemu 1.4.0 1.4.0+dfsg-1expubuntu4
 - Ubuntu Raring with 3.8.0-25-generic
 
 This issue is reproducible in my environment, and I'm willing to run any wip
 branch you need. What else can I provide to help?

This looks like a different issue than Oliver's.  I see one anomaly in the
log, where an rbd io completion is triggered a second time for no apparent
reason.  I opened a separate bug

http://tracker.ceph.com/issues/5955

and pushed wip-5955 that will hopefully shine some light on the weird 
behavior I saw.  Can you reproduce with this branch and

 debug objectcacher = 20
 debug ms = 1
 debug rbd = 20
 debug finisher = 20
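
 (For reference, these can go in the [client] section of ceph.conf on the
 qemu host; a minimal sketch, with the log path being an assumption:)

  [client]
  debug objectcacher = 20
  debug ms = 1
  debug rbd = 20
  debug finisher = 20
  log file = /var/log/ceph/qemu-rbd.$pid.log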

Thanks!
sage


 
 Thanks,
 Mike Dawson
 
 
 On 8/5/2013 3:48 AM, Stefan Hajnoczi wrote:
  On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:
   Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:
We can un-wedge the guest by opening a NoVNC session or running a
'virsh screenshot' command. After that, the guest resumes and runs as
expected. At that point we can examine the guest. Each time we'll see:
  
  If virsh screenshot works then this confirms that QEMU itself is still
  responding.  Its main loop cannot be blocked since it was able to
  process the screendump command.
  
  This supports Josh's theory that a callback is not being invoked.  The
  virtio-blk I/O request would be left in a pending state.
  
  Now here is where the behavior varies between configurations:
  
  On a Windows guest with 1 vCPU, you may see the symptom that the guest no
  longer responds to ping.
  
  On a Linux guest with multiple vCPUs, you may see the hung task message
  from the guest kernel because other vCPUs are still making progress.
  Just the vCPU that issued the I/O request and whose task is in
  UNINTERRUPTIBLE state would really be stuck.
  
  Basically, the symptoms depend not just on how QEMU is behaving but also
  on the guest kernel and how many vCPUs you have configured.
  
  I think this can explain how both problems you are observing, Oliver and
  Mike, are a result of the same bug.  At least I hope they are :).
  
  Stefan
  
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-13 Thread Sage Weil
Hi Oliver,

(Posted this on the bug too, but:)

Your last log revealed a bug in the librados aio flush.  A fix is pushed 
to wip-librados-aio-flush (bobtail) and wip-5919 (master); can you retest 
please (with caching off again)?

Thanks!
sage


On Fri, 9 Aug 2013, Oliver Francke wrote:
 Hi Josh,
 
 just opened
 
 http://tracker.ceph.com/issues/5919
 
 with all collected information incl. debug-log.
 
 Hope it helps,
 
 Oliver.
 
 On 08/08/2013 07:01 PM, Josh Durgin wrote:
  On 08/08/2013 05:40 AM, Oliver Francke wrote:
   Hi Josh,
   
   I have a session logged with:
   
debug_ms=1:debug_rbd=20:debug_objectcacher=30
   
   as you requested from Mike, even if I think, we do have another story
   here, anyway.
   
   Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is
   3.2.0-51-amd...
   
   Do you want me to open a ticket for that stuff? I have about 5MB
   compressed logfile waiting for you ;)
  
  Yes, that'd be great. If you could include the time when you saw the guest
  hang that'd be ideal. I'm not sure if this is one or two bugs,
  but it seems likely it's a bug in rbd and not qemu.
  
  Thanks!
  Josh
  
   Thnx in advance,
   
   Oliver.
   
   On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:
On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:
 Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:
  We can un-wedge the guest by opening a NoVNC session or running a
  'virsh screenshot' command. After that, the guest resumes and runs
  as expected. At that point we can examine the guest. Each time we'll
  see:
If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the guest no
longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but also
on the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver and
Mike, are a result of the same bug.  At least I hope they are :).

Stefan
   
   
  
 
 
 -- 
 
 Oliver Francke
 
 filoo GmbH
 Moltkestraße 25a
 0 Gütersloh
 HRB4355 AG Gütersloh
 
 Geschäftsführer: J.Rehpöhler | C.Kunz
 
 Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh
 
 ___
 ceph-users mailing list
 ceph-users@lists.ceph.com
 http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
 
 
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-10 Thread Josh Durgin

On 08/09/2013 08:03 AM, Stefan Hajnoczi wrote:

On Fri, Aug 09, 2013 at 03:05:22PM +0100, Andrei Mikhailovsky wrote:

I can confirm that I am having similar issues with Ubuntu VM guests using fio
with bs=4k direct=1 numjobs=4 iodepth=16. Occasionally I see hung tasks,
occasionally the guest VM stops responding without leaving anything in the
logs, and sometimes I see a kernel panic on the console. I typically set the
fio runtime to 60 minutes, and the guest tends to stop responding after about
10-30 mins.

I am on Ubuntu 12.04 with the 3.5 backport kernel, using Ceph 0.61.7 with qemu
1.5.0 and libvirt 1.0.2.


Oliver's logs show one aio_flush() never getting completed, which
means it's an issue with aio_flush in librados when rbd caching isn't
used.

Mike's log is from a qemu without aio_flush(), and with caching turned 
on, and shows all flushes completing quickly, so it's a separate bug.



Josh,
In addition to the Ceph logs you can also use QEMU tracing with the
following events enabled:
virtio_blk_handle_write
virtio_blk_handle_read
virtio_blk_rw_complete

See docs/tracing.txt for details on usage.

Inspecting the trace output will let you observe the I/O request
submission/completion from the virtio-blk device perspective.  You'll be
able to see whether requests are never being completed in some cases.


Thanks for the info. That may be the best way to check what's happening
when caching is enabled. Mike, could you recompile qemu with tracing
enabled and get a trace of the hang you were seeing, in addition to
the ceph logs?
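
A rough sketch of what that could look like with qemu's simple trace backend
(the configure flag name varies between qemu versions and the paths here are
assumptions; docs/tracing.txt has the exact syntax):

  # build qemu with the simple trace backend
  ./configure --enable-trace-backend=simple && make

  # list the virtio-blk events to capture
  printf '%s\n' virtio_blk_handle_write virtio_blk_handle_read \
      virtio_blk_rw_complete > /tmp/events

  # start the guest with tracing enabled, then pretty-print the binary log
  qemu-system-x86_64 -trace events=/tmp/events,file=/tmp/qemu.trace ...
  scripts/simpletrace.py trace-events /tmp/qemu.trace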


This bug seems like a corner case or race condition since most requests
seem to complete just fine.  The problem is that eventually the
virtio-blk device becomes unusable when it runs out of descriptors (it
has 128).  And before that limit is reached the guest may become
unusable due to the hung I/O requests.


It seems only one request hung from an important kernel thread in
Oliver's case, but it's good to be aware of the descriptor limit.

Josh
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-09 Thread Oliver Francke

Hi Josh,

just opened

http://tracker.ceph.com/issues/5919

with all collected information incl. debug-log.

Hope it helps,

Oliver.

On 08/08/2013 07:01 PM, Josh Durgin wrote:

On 08/08/2013 05:40 AM, Oliver Francke wrote:

Hi Josh,

I have a session logged with:

 debug_ms=1:debug_rbd=20:debug_objectcacher=30

as you requested from Mike, even if I think, we do have another story
here, anyway.

Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is
3.2.0-51-amd...

Do you want me to open a ticket for that stuff? I have about 5MB
compressed logfile waiting for you ;)


Yes, that'd be great. If you could include the time when you saw the guest
hang that'd be ideal. I'm not sure if this is one or two bugs, but it seems
likely it's a bug in rbd and not qemu.

Thanks!
Josh


Thnx in advance,

Oliver.

On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:

On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:

Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:

We can un-wedge the guest by opening a NoVNC session or running a
'virsh screenshot' command. After that, the guest resumes and runs
as expected. At that point we can examine the guest. Each time we'll
see:

If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the guest no
longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but also on
the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver and
Mike, are a result of the same bug.  At least I hope they are :).

Stefan








--

Oliver Francke

filoo GmbH
Moltkestraße 25a
0 Gütersloh
HRB4355 AG Gütersloh

Geschäftsführer: J.Rehpöhler | C.Kunz

Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-09 Thread Andrei Mikhailovsky
I can confirm that I am having similar issues with Ubuntu VM guests using fio
with bs=4k direct=1 numjobs=4 iodepth=16. Occasionally I see hung tasks,
occasionally the guest VM stops responding without leaving anything in the
logs, and sometimes I see a kernel panic on the console. I typically set the
fio runtime to 60 minutes, and the guest tends to stop responding after about
10-30 mins.
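
A fio invocation along these lines reproduces that workload (the target
device, random-write pattern, and libaio engine are assumptions; adjust for
the guest in question):

  fio --name=stress --filename=/dev/vdb --rw=randwrite --ioengine=libaio \
      --bs=4k --direct=1 --numjobs=4 --iodepth=16 \
      --runtime=3600 --time_based --group_reporting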

I am on Ubuntu 12.04 with the 3.5 backport kernel, using Ceph 0.61.7 with qemu
1.5.0 and libvirt 1.0.2.

Andrei 
- Original Message -

From: Oliver Francke oliver.fran...@filoo.de 
To: Josh Durgin josh.dur...@inktank.com 
Cc: ceph-users@lists.ceph.com, Mike Dawson mike.daw...@cloudapt.com, 
Stefan Hajnoczi stefa...@redhat.com, qemu-de...@nongnu.org 
Sent: Friday, 9 August, 2013 10:22:00 AM 
Subject: Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, 
heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive 
qemu-process, [Qemu-devel] [Bug 1207686] 

Hi Josh, 

just opened 

http://tracker.ceph.com/issues/5919 

with all collected information incl. debug-log. 

Hope it helps, 

Oliver. 

On 08/08/2013 07:01 PM, Josh Durgin wrote: 
 On 08/08/2013 05:40 AM, Oliver Francke wrote: 
 Hi Josh, 
 
 I have a session logged with: 
 
 debug_ms=1:debug_rbd=20:debug_objectcacher=30 
 
 as you requested from Mike, even if I think, we do have another story 
 here, anyway. 
 
 Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is 
 3.2.0-51-amd... 
 
 Do you want me to open a ticket for that stuff? I have about 5MB 
 compressed logfile waiting for you ;) 
 
 Yes, that'd be great. If you could include the time when you saw the 
 guest hang that'd be ideal. I'm not sure if this is one or two bugs, 
 but it seems likely it's a bug in rbd and not qemu. 
 
 Thanks! 
 Josh 
 
 Thnx in advance, 
 
 Oliver. 
 
 On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote: 
 On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote: 
 Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com: 
 We can un-wedge the guest by opening a NoVNC session or running a 
 'virsh screenshot' command. After that, the guest resumes and runs 
 as expected. At that point we can examine the guest. Each time we'll 
 see: 
 If virsh screenshot works then this confirms that QEMU itself is still 
 responding. Its main loop cannot be blocked since it was able to 
 process the screendump command. 
 
 This supports Josh's theory that a callback is not being invoked. The 
 virtio-blk I/O request would be left in a pending state. 
 
 Now here is where the behavior varies between configurations: 
 
 On a Windows guest with 1 vCPU, you may see the symptom that the 
 guest no 
 longer responds to ping. 
 
 On a Linux guest with multiple vCPUs, you may see the hung task message 
 from the guest kernel because other vCPUs are still making progress. 
 Just the vCPU that issued the I/O request and whose task is in 
 UNINTERRUPTIBLE state would really be stuck. 
 
 Basically, the symptoms depend not just on how QEMU is behaving but 
 also 
 on the guest kernel and how many vCPUs you have configured. 
 
 I think this can explain how both problems you are observing, Oliver 
 and 
 Mike, are a result of the same bug. At least I hope they are :). 
 
 Stefan 
 
 
 


-- 

Oliver Francke 

filoo GmbH 
Moltkestraße 25a 
0 Gütersloh 
HRB4355 AG Gütersloh 

Geschäftsführer: J.Rehpöhler | C.Kunz 

Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh 

___ 
ceph-users mailing list 
ceph-users@lists.ceph.com 
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com 

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-09 Thread Stefan Hajnoczi
On Fri, Aug 09, 2013 at 03:05:22PM +0100, Andrei Mikhailovsky wrote:
 I can confirm that I am having similar issues with Ubuntu VM guests using fio
 with bs=4k direct=1 numjobs=4 iodepth=16. Occasionally I see hung tasks,
 occasionally the guest VM stops responding without leaving anything in the
 logs, and sometimes I see a kernel panic on the console. I typically set the
 fio runtime to 60 minutes, and the guest tends to stop responding after about
 10-30 mins.
 
 I am on Ubuntu 12.04 with the 3.5 backport kernel, using Ceph 0.61.7 with qemu
 1.5.0 and libvirt 1.0.2.

Josh,
In addition to the Ceph logs you can also use QEMU tracing with the
following events enabled:
virtio_blk_handle_write
virtio_blk_handle_read
virtio_blk_rw_complete

See docs/tracing.txt for details on usage.

Inspecting the trace output will let you observe the I/O request
submission/completion from the virtio-blk device perspective.  You'll be
able to see whether requests are never being completed in some cases.

This bug seems like a corner case or race condition since most requests
seem to complete just fine.  The problem is that eventually the
virtio-blk device becomes unusable when it runs out of descriptors (it
has 128).  And before that limit is reached the guest may become
unusable due to the hung I/O requests.

Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-08 Thread Oliver Francke

Hi Josh,

I have a session logged with:

debug_ms=1:debug_rbd=20:debug_objectcacher=30
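
(One way to pass such options is to append them to the qemu rbd drive string;
a sketch only, with pool/image and auth details as placeholders:)

  -drive file=rbd:rbd/vm-disk:id=admin:debug_ms=1:debug_rbd=20:debug_objectcacher=30,if=virtio,format=raw,cache=none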

as you requested from Mike, even though I think we have a different story
here anyway.


Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is 
3.2.0-51-amd...


Do you want me to open a ticket for that stuff? I have about 5MB 
compressed logfile waiting for you ;)


Thnx in advance,

Oliver.

On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:

On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:

Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:

We can un-wedge the guest by opening a NoVNC session or running a 'virsh 
screenshot' command. After that, the guest resumes and runs as expected. At that point we 
can examine the guest. Each time we'll see:

If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the guest no
longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but also
on the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver and
Mike, are a result of the same bug.  At least I hope they are :).

Stefan



--

Oliver Francke

filoo GmbH
Moltkestraße 25a
0 Gütersloh
HRB4355 AG Gütersloh

Geschäftsführer: J.Rehpöhler | C.Kunz

Folgen Sie uns auf Twitter: http://twitter.com/filoogmbh

___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-08 Thread Josh Durgin

On 08/08/2013 05:40 AM, Oliver Francke wrote:

Hi Josh,

I have a session logged with:

 debug_ms=1:debug_rbd=20:debug_objectcacher=30

as you requested from Mike, even if I think, we do have another story
here, anyway.

Host-kernel is: 3.10.0-rc7, qemu-client 1.6.0-rc2, client-kernel is
3.2.0-51-amd...

Do you want me to open a ticket for that stuff? I have about 5MB
compressed logfile waiting for you ;)


Yes, that'd be great. If you could include the time when you saw the guest
hang that'd be ideal. I'm not sure if this is one or two bugs, but it seems
likely it's a bug in rbd and not qemu.

Thanks!
Josh


Thnx in advance,

Oliver.

On 08/05/2013 09:48 AM, Stefan Hajnoczi wrote:

On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:

Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:

We can un-wedge the guest by opening a NoVNC session or running a
'virsh screenshot' command. After that, the guest resumes and runs
as expected. At that point we can examine the guest. Each time we'll
see:

If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the guest no
longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but also
on the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver and
Mike, are a result of the same bug.  At least I hope they are :).

Stefan





___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-05 Thread Stefan Hajnoczi
On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:
 Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:
  We can un-wedge the guest by opening a NoVNC session or running a 'virsh 
  screenshot' command. After that, the guest resumes and runs as expected. At 
  that point we can examine the guest. Each time we'll see:

If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the guest no
longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but also
on the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver and
Mike, are a result of the same bug.  At least I hope they are :).

Stefan
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-05 Thread Mike Dawson

Josh,

Logs are uploaded to cephdrop with the file name 
mikedawson-rbd-qemu-deadlock.


- At about 2013-08-05 19:46 or 47, we hit the issue, traffic went to 0
- At about 2013-08-05 19:53:51, ran a 'virsh screenshot'


Environment is:

- Ceph 0.61.7 (client is co-mingled with three OSDs)
- rbd cache = true and cache=writeback
- qemu 1.4.0 1.4.0+dfsg-1expubuntu4
- Ubuntu Raring with 3.8.0-25-generic
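
(For clarity, those two cache settings live in different places; a sketch,
with the image name and remaining drive options as placeholders:)

  # ceph.conf on the qemu host
  [client]
  rbd cache = true

  # qemu drive option (libvirt equivalent: cache='writeback' on the disk's <driver> element)
  -drive file=rbd:rbd/vm-disk:id=admin,if=virtio,format=raw,cache=writeback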

This issue is reproducible in my environment, and I'm willing to run any 
wip branch you need. What else can I provide to help?


Thanks,
Mike Dawson


On 8/5/2013 3:48 AM, Stefan Hajnoczi wrote:

On Sun, Aug 04, 2013 at 03:36:52PM +0200, Oliver Francke wrote:

Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:

We can un-wedge the guest by opening a NoVNC session or running a 'virsh 
screenshot' command. After that, the guest resumes and runs as expected. At that point we 
can examine the guest. Each time we'll see:


If virsh screenshot works then this confirms that QEMU itself is still
responding.  Its main loop cannot be blocked since it was able to
process the screendump command.

This supports Josh's theory that a callback is not being invoked.  The
virtio-blk I/O request would be left in a pending state.

Now here is where the behavior varies between configurations:

On a Windows guest with 1 vCPU, you may see the symptom that the guest no
longer responds to ping.

On a Linux guest with multiple vCPUs, you may see the hung task message
from the guest kernel because other vCPUs are still making progress.
Just the vCPU that issued the I/O request and whose task is in
UNINTERRUPTIBLE state would really be stuck.

Basically, the symptoms depend not just on how QEMU is behaving but also
on the guest kernel and how many vCPUs you have configured.

I think this can explain how both problems you are observing, Oliver and
Mike, are a result of the same bug.  At least I hope they are :).

Stefan


___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


Re: [ceph-users] qemu-1.4.0 and onwards, linux kernel 3.2.x, ceph-RBD, heavy I/O leads to kernel_hung_tasks_timout_secs message and unresponsive qemu-process, [Qemu-devel] [Bug 1207686]

2013-08-04 Thread Oliver Francke
Hi Mike,

you might be the guy StefanHa was referring to on the qemu-devel mailing-list.

I just made some more tests, so…

Am 02.08.2013 um 23:47 schrieb Mike Dawson mike.daw...@cloudapt.com:

 Oliver,
 
 We've had a similar situation occur. For about three months, we've run 
 several Windows 2008 R2 guests with virtio drivers that record video 
 surveillance. We have long suffered an issue where the guest appears to hang 
 indefinitely (or until we intervene). For the sake of this conversation, we 
 call this state wedged, because it appears something (rbd, qemu, virtio, 
 etc) gets stuck on a deadlock. When a guest gets wedged, we see the following:
 
 - the guest will not respond to pings

When the hung_task message shows up, I can still ping the guest and establish
new ssh sessions; only the session running the while loop no longer accepts
any keyboard input.

 - the qemu-system-x86_64 process drops to 0% cpu
 - graphite graphs show the interface traffic dropping to 0bps
 - the guest will stay wedged forever (or until we intervene)
 - strace of qemu-system-x86_64 shows QEMU is making progress [1][2]
 

nothing special here:

5, events=POLLIN}, {fd=7, events=POLLIN}, {fd=6, events=POLLIN}, {fd=19, 
events=POLLIN}, {fd=15, events=POLLIN}, {fd=4, events=POLLIN}], 11, -1) = 1 
([{fd=12, revents=POLLIN}])
[pid 11793] read(5, 0x7fff16b61f00, 16) = -1 EAGAIN (Resource temporarily 
unavailable)
[pid 11793] read(12, 
\2\0\0\0\0\0\0\0\0\0\0\0\0\361p\0\252\340\374\373\373!gH\10\0E\0\0Yq\374..., 
69632) = 115
[pid 11793] read(12, 0x7f0c1737fcec, 69632) = -1 EAGAIN (Resource temporarily 
unavailable)
[pid 11793] poll([{fd=27, events=POLLIN|POLLERR|POLLHUP}, {fd=26, 
events=POLLIN|POLLERR|POLLHUP}, {fd=24, events=POLLIN|POLLERR|POLLHUP}, {fd=12, 
events=POLLIN|POLLERR|POLLHUP}, {fd=3, events=POLLIN|POLLERR|POLLHUP}, {fd=

and that for many, many threads.
Inside the VM I see 75% wait, but I can restart the spew-test in a second 
session.

All of that was tested with rbd_cache=false,cache=none.

I also test every qemu version with a 2-vCPU, 2 GiB Windows 7 VM under fairly
high load and have hit no problem so far. It runs smoothly and fast.

 We can un-wedge the guest by opening a NoVNC session or running a 'virsh 
 screenshot' command. After that, the guest resumes and runs as expected. At 
 that point we can examine the guest. Each time we'll see:
 
 - No Windows error logs whatsoever while the guest is wedged
 - A time sync typically occurs right after the guest gets un-wedged
 - Scheduled tasks do not run while wedged
 - Windows error logs do not show any evidence of suspend, sleep, etc
 
 We had so many issues with guests becoming wedged that we wrote a script to
 'virsh screenshot' them via cron. Then we installed some updates and had a
 month or so of higher stability (wedging happened maybe 1/10th as often).
 Until today we couldn't figure out why.
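 
 A minimal sketch of that kind of cron-driven workaround (domain selection and
 the output path are assumptions, not the actual script):
 
   #!/bin/sh
   # poke every running guest with 'virsh screenshot'; this is enough to
   # un-wedge a guest stuck in the wedged state described above
   for dom in $(virsh list --name); do
       [ -n "$dom" ] && virsh screenshot "$dom" "/tmp/$dom.ppm" >/dev/null 2>&1
   done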
 
 Yesterday, I realized qemu was starting the instances without specifying 
 cache=writeback. We corrected that, and let them run overnight. With RBD 
 writeback re-enabled, wedging came back as often as we had seen in the past. 
 I've counted ~40 occurrences in the past 12-hour period. So I feel like 
 writeback caching in RBD certainly makes the deadlock more likely to occur.
 
 Joshd asked us to gather RBD client logs:
 
 joshd it could very well be the writeback cache not doing a callback at 
 some point - if you could gather logs of a vm getting stuck with debug rbd = 
 20, debug ms = 1, and debug objectcacher = 30 that would be great
 
 We'll do that over the weekend. If you could as well, we'd love the help!
 
 [1] http://www.gammacode.com/kvm/wedged-with-timestamps.txt
 [2] http://www.gammacode.com/kvm/not-wedged.txt
 

As I wrote above, there is no cache in use so far, so I'm omitting the verbose
debugging for the moment. But I will do it if requested.

Thnx for your report,

Oliver.

 Thanks,
 
 Mike Dawson
 Co-Founder  Director of Cloud Architecture
 Cloudapt LLC
 6330 East 75th Street, Suite 170
 Indianapolis, IN 46250
 
 On 8/2/2013 6:22 AM, Oliver Francke wrote:
 Well,
 
 I believe, I'm the winner of buzzwords-bingo for today.
 
 But seriously speaking... since I don't see this particular problem with
 qcow2 on kernel 3.2, nor with qemu-1.2.2, nor with newer kernels, I hope I'm
 not alone here?
 We have a rising number of tickets from people reinstalling from ISOs with
 the 3.2 kernel.
 
 The fast fallback is to start all VMs with qemu-1.2.2, but we then lose some
 features like the latency-free RBD cache ;)
 
 I just opened a bug for qemu per:
 
 https://bugs.launchpad.net/qemu/+bug/1207686
 
 with all dirty details.
 
 Installing a 3.9.x backport kernel or upgrading the Ubuntu kernel to 3.8.x
 fixes it. So we have a bad combination for all distros with a 3.2 kernel and
 rbd as the storage backend, I assume.
 
 Any similar findings?
 Any idea of tracing/debugging ( Josh? ;) ) very welcome,
 
 Oliver.
 

___
ceph-users mailing list
ceph-users@lists.ceph.com