Re: [Qemu-devel] [sheepdog] [PATCH v2 0/9] sheepdog: reconnect server after connection failure
On Thu, Jul 25, 2013 at 02:53:57PM +0900, MORITA Kazutaka wrote:
> At Thu, 25 Jul 2013 13:25:33 +0800, Liu Yuan wrote:
> > Hello Kazutaka,
> >
> > I have two patches fixing the problems I found on my testing and they are complementary patches. Please consider sending them on top of your patch set.
>
> Thanks a lot for your comments and patches, but I've already prepared patches, which would be probably better fixes. I'll send the v3 series soon. It'd be appreciated if you would give a review for it.

Okay, no problem. Well, in my previous patches, patch 2/2 isn't correct, I did a wrong manual rebase by hasty copy. Just FYI.

Thanks
Yuan
Re: [Qemu-devel] [sheepdog] [PATCH v2 0/9] sheepdog: reconnect server after connection failure
On Wed, Jul 24, 2013 at 04:56:24PM +0900, MORITA Kazutaka wrote:
> Currently, if a sheepdog server exits, all the connecting VMs need to be restarted. This series implements a feature to reconnect the server, and enables us to do online sheepdog upgrade and avoid restarting VMs when sheepdog servers crash unexpectedly.

It doesn't work on my test. I tried start linux-0.2.img stored in sheepdog cluster and then

1. did some buffered writes
2. restart sheep that this QEMU VM connected to.
3. $ sync

I got following error:

$ ../qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 1024 -hda sheepdog:test
qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
...repeat...

QEMU version is master tip

Thanks
Yuan
Re: [Qemu-devel] [sheepdog] [PATCH v2 0/9] sheepdog: reconnect server after connection failure
At Wed, 24 Jul 2013 16:28:30 +0800, Liu Yuan wrote:
> On Wed, Jul 24, 2013 at 04:56:24PM +0900, MORITA Kazutaka wrote:
> > Currently, if a sheepdog server exits, all the connecting VMs need to be restarted. This series implements a feature to reconnect the server, and enables us to do online sheepdog upgrade and avoid restarting VMs when sheepdog servers crash unexpectedly.
>
> It doesn't work on my test. I tried start linux-0.2.img stored in sheepdog cluster and then
>
> 1. did some buffered writes
> 2. restart sheep that this QEMU VM connected to.
> 3. $ sync
>
> I got following error:
>
> $ ../qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 1024 -hda sheepdog:test
> qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
> qemu-system-x86_64: Failed to connect to socket: Connection refused
> qemu-system-x86_64: Failed to connect to socket: Connection refused
> qemu-system-x86_64: Failed to connect to socket: Connection refused
> qemu-system-x86_64: Failed to connect to socket: Connection refused
> qemu-system-x86_64: Failed to connect to socket: Connection refused
> ...repeat...
>
> QEMU version is master tip

Your sheep daemon looks like unreachable from qemu. I tried the same procedure, but couldn't reproduce it. Is the problem reproducible? Can you make sure that you can connect to the sheep daemon from collie while the error message shows up?

Thanks,
Kazutaka
Re: [Qemu-devel] [sheepdog] [PATCH v2 0/9] sheepdog: reconnect server after connection failure
On Wed, Jul 24, 2013 at 06:07:21PM +0900, MORITA Kazutaka wrote:
> At Wed, 24 Jul 2013 16:28:30 +0800, Liu Yuan wrote:
> > On Wed, Jul 24, 2013 at 04:56:24PM +0900, MORITA Kazutaka wrote:
> > > Currently, if a sheepdog server exits, all the connecting VMs need to be restarted. This series implements a feature to reconnect the server, and enables us to do online sheepdog upgrade and avoid restarting VMs when sheepdog servers crash unexpectedly.
> >
> > It doesn't work on my test. I tried start linux-0.2.img stored in sheepdog cluster and then
> >
> > 1. did some buffered writes
> > 2. restart sheep that this QEMU VM connected to.
> > 3. $ sync
> >
> > I got following error:
> >
> > $ ../qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 1024 -hda sheepdog:test
> > qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
> > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > ...repeat...
> >
> > QEMU version is master tip
>
> Your sheep daemon looks like unreachable from qemu. I tried the same procedure, but couldn't reproduce it. Is the problem reproducible? Can you make sure that you can connect to the sheep daemon from collie while the error message shows up?

Yesh. Well I try to repeat it with following process:

1. did some buffered write
2. kill the sheep
3. $ sync # at guest, now 'sync' hang for response
4. restart sheep

After 4 'sync' still hangs until timeout with a message

hda:dma_timer_expiry: dma status == 0x21

Guest end up freeze.

QEMU output is the same:

qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused
qemu-system-x86_64: Failed to connect to socket: Connection refused

But notice, if I did restart sheep with guest doing nothing, your patch set work like a charm.

Thanks
Yuan
Re: [Qemu-devel] [sheepdog] [PATCH v2 0/9] sheepdog: reconnect server after connection failure
On Wed, Jul 24, 2013 at 11:42:49PM +0800, Liu Yuan wrote:
> On Wed, Jul 24, 2013 at 06:07:21PM +0900, MORITA Kazutaka wrote:
> > At Wed, 24 Jul 2013 16:28:30 +0800, Liu Yuan wrote:
> > > On Wed, Jul 24, 2013 at 04:56:24PM +0900, MORITA Kazutaka wrote:
> > > > Currently, if a sheepdog server exits, all the connecting VMs need to be restarted. This series implements a feature to reconnect the server, and enables us to do online sheepdog upgrade and avoid restarting VMs when sheepdog servers crash unexpectedly.
> > >
> > > It doesn't work on my test. I tried start linux-0.2.img stored in sheepdog cluster and then
> > >
> > > 1. did some buffered writes
> > > 2. restart sheep that this QEMU VM connected to.
> > > 3. $ sync
> > >
> > > I got following error:
> > >
> > > $ ../qemu/x86_64-softmmu/qemu-system-x86_64 --enable-kvm -m 1024 -hda sheepdog:test
> > > qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
> > > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > > qemu-system-x86_64: Failed to connect to socket: Connection refused
> > > ...repeat...
> > >
> > > QEMU version is master tip
> >
> > Your sheep daemon looks like unreachable from qemu. I tried the same procedure, but couldn't reproduce it. Is the problem reproducible? Can you make sure that you can connect to the sheep daemon from collie while the error message shows up?
>
> Yesh. Well I try to repeat it with following process:
>
> 1. did some buffered write
> 2. kill the sheep
> 3. $ sync # at guest, now 'sync' hang for response
> 4. restart sheep
>
> After 4 'sync' still hangs until timeout with a message
>
> hda:dma_timer_expiry: dma status == 0x21
>
> Guest end up freeze.
>
> QEMU output is the same:
>
> qemu-system-x86_64: failed to get the header, Resource temporarily unavailable
> qemu-system-x86_64: Failed to connect to socket: Connection refused
> qemu-system-x86_64: Failed to connect to socket: Connection refused
> qemu-system-x86_64: Failed to connect to socket: Connection refused
> qemu-system-x86_64: Failed to connect to socket: Connection refused
>
> But notice, if I did restart sheep with guest doing nothing, your patch set work like a charm.

I have debug it a bit. The problem is that at stage 3, 'sync' invoke add_aio_request() in the sheepdog driver and add_aio_request *succeed* with aio put on the inflight_aio_head list, *not* on the failed_aio_head list. So in the reconnect_to_sdog(), we have no way to resend the targeted aio and 'sync' wait for ever.

Thanks
Yuan
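
To illustrate the diagnosis above, here is a minimal, self-contained sketch of one possible remedy: on reconnect, move every request still on the inflight list onto the failed list so that the resend path can see it. This is only an assumption about how such a fix could look, not the code from this series or from the v3 patches. The types `struct aio_req` and `struct aio_list` and the helper `requeue_inflight()` are hypothetical stand-ins; the real driver uses its own request structure and QEMU's list macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the driver's AIO request. */
struct aio_req {
    int id;
    struct aio_req *next;
};

/* Hypothetical FIFO list standing in for inflight_aio_head / failed_aio_head. */
struct aio_list {
    struct aio_req *head;
    struct aio_req *tail;
};

/* Append a request to the tail of a list. */
static void aio_list_push(struct aio_list *l, struct aio_req *r)
{
    assert(l != NULL && r != NULL);
    r->next = NULL;
    if (l->tail) {
        l->tail->next = r;
    } else {
        l->head = r;
    }
    l->tail = r;
}

/* Remove and return the request at the head of a list, or NULL if empty. */
static struct aio_req *aio_list_pop(struct aio_list *l)
{
    struct aio_req *r = l->head;
    if (r) {
        l->head = r->next;
        if (!l->head) {
            l->tail = NULL;
        }
        r->next = NULL;
    }
    return r;
}

/*
 * Assumed fix: after the connection is lost, requests that were sent
 * successfully but never answered are still on the inflight list. Move
 * them to the failed list, keeping their order, so the reconnect path
 * resends them. Returns the number of requests moved.
 */
static int requeue_inflight(struct aio_list *inflight, struct aio_list *failed)
{
    int moved = 0;
    struct aio_req *r;

    while ((r = aio_list_pop(inflight)) != NULL) {
        aio_list_push(failed, r);
        moved++;
    }
    return moved;
}
```

In this sketch a reconnect handler would call `requeue_inflight()` once before resending everything on the failed list. Requests that had already failed stay ahead of the requeued ones, so the original submission order within each list is preserved.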
Re: [Qemu-devel] [sheepdog] [PATCH v2 0/9] sheepdog: reconnect server after connection failure
At Thu, 25 Jul 2013 13:25:33 +0800, Liu Yuan wrote:
> Hello Kazutaka,
>
> I have two patches fixing the problems I found on my testing and they are complementary patches. Please consider sending them on top of your patch set.

Thanks a lot for your comments and patches, but I've already prepared patches, which would be probably better fixes. I'll send the v3 series soon. It'd be appreciated if you would give a review for it.

Thanks,
Kazutaka