Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
On Wed, May 23, 2012 at 2:31 AM, Michael Tokarev m...@tls.msk.ru wrote: On 22.05.2012 14:03, ronnie sahlberg wrote: [] Doh, now I remember. Whenever you change the aio handlers you need to call qemu_notify_event() afterwards, if the handler may fire right away. Thanks. I just confirmed that qemu_notify_event() fixes the issue. Ill send a patch that uses qemu_notify_event() and a comment why this is needed. When I were writing network block driver I tried to not call qemu_aio_set_fd_handler() unless absolutely necessary -- that is, trying to write as much as possible till the next write returns EAGAIN, or to read till the next read returns EAGAIN. This way there's no need to call qemu_notify_event(). FWIW. Good idea. Ill create a patch that keeps track of which events we are listening on already and only calling these functions when the events change. regards ronnie sahlberg
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
Hi Stefan On Mon, May 21, 2012 at 11:20 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, May 21, 2012 at 12:00 PM, ronnie sahlberg ronniesahlb...@gmail.com wrote: Yes, I use IDE since I boot from this LUN. I just managed to track it down to the IDE changes. It looks like basically this change triggered it : commit bef0fd5958120542f126f2dedbfce65d8839a94d Author: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Date: Thu Mar 29 10:31:30 2012 +0100 ide: convert ide_sector_read() to asynchronous I/O Stefan, any ideas? Ill continue trying to understand the IDE code tomorrow and maybe I make some progress. It sounds like aio for iSCSI is broken. IDE submits bdrv_aio_readv() requests which iSCSI handles. QEMU main loop should find that the iSCSI socket file descriptor becomes readable when the SCSI response arrives. We can then report I/O completion back to the guest. However, it appears that completions are not triggering. My guess is that once the OS is fully booted you don't see the problem because other events are pumping the main loop besides a 55ms timer. Therefore this problem probably applies across the board but it's most noticable when the BIOS/bootloader/early OS is executing. I booted off a CDROM image and then had iscsi device as a disk. From the cdrom booted guest I then did 'dd if=/dev/sda of=/dev/null bs=512 iflag=direct' but did not see these 55ms delays anymore. I did however see a ~4ms delay . Have you added debugging into block/iscsi.c or libiscsi to watch when read handlers get called? Using tcpdump or strace might also be useful to figure out when the reply becomes available and why we aren't registering a read handler function with the QEMU main loop. I just did that, and I print a message and timeval to stdout for the functions in block/iscsi.c What seems to happen is that we call iscsi_aio_readv() which sets up a write event but this event is not triggered for another 55ms. Since the descriptor is writeable at this point, i would expected that the write_event_callback would trigger almost immediately. Below iscsi_aio_readv is printed when we enter this function. This function prepares a CDB and sets it up for queueing, but it does not actually write it to the socket. At the end of this function, we set up the events by calling iscsi_set_events() and we set it up for writeable. iscsi_process_write() is when we enter the fd-is-writeable callback. iscsi_aio_readv 1337676989.429822 iscsi_set_events write_event?Y 1337676989.429834 iscsi_process_write 1337676989.484424 iscsi_set_events write_event?N 1337676989.484507 iscsi_process_read1337676989.484628 iscsi_aio_read10_cb 1337676989.484647 iscsi_set_events write_event?N 1337676989.484660 iscsi_readv_writev_bh_cb 1337676989.484665 iscsi_aio_readv 1337676989.484751 So what happens seems to be that once we have set up the i/o and queued it, and we set up the writeable event, but this event is not triggered immediately! It takes another 55ms before the event is triggered. To me this looks like a bug in the eventsystem. The socket is always writeable. shouldnt the callback be invoked almost immediately here ? regards ronnie sahlberg
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
Hi, Now that I see what happens, I can easily workaround this in block/iscsi.c by the patch below, but I dont know if this is the right thing to do. It does appear that here, when I use qemu_set_fd_handler() and add a handler for writeble it takes 55ms before the event system notices this and reacts. diff --git a/block/iscsi.c b/block/iscsi.c index d37c4ee..1ebff0f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun) { struct iscsi_context *iscsi = iscsilun-iscsi; +if (iscsi_which_events(iscsi) POLLOUT) { +iscsi_process_write(iscsilun); +} + qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read, (iscsi_which_events(iscsi) POLLOUT) ? iscsi_process_write : NULL, On Tue, May 22, 2012 at 7:07 PM, ronnie sahlberg ronniesahlb...@gmail.com wrote: Hi Stefan On Mon, May 21, 2012 at 11:20 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, May 21, 2012 at 12:00 PM, ronnie sahlberg ronniesahlb...@gmail.com wrote: Yes, I use IDE since I boot from this LUN. I just managed to track it down to the IDE changes. It looks like basically this change triggered it : commit bef0fd5958120542f126f2dedbfce65d8839a94d Author: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Date: Thu Mar 29 10:31:30 2012 +0100 ide: convert ide_sector_read() to asynchronous I/O Stefan, any ideas? Ill continue trying to understand the IDE code tomorrow and maybe I make some progress. It sounds like aio for iSCSI is broken. IDE submits bdrv_aio_readv() requests which iSCSI handles. QEMU main loop should find that the iSCSI socket file descriptor becomes readable when the SCSI response arrives. We can then report I/O completion back to the guest. However, it appears that completions are not triggering. My guess is that once the OS is fully booted you don't see the problem because other events are pumping the main loop besides a 55ms timer. Therefore this problem probably applies across the board but it's most noticable when the BIOS/bootloader/early OS is executing. I booted off a CDROM image and then had iscsi device as a disk. From the cdrom booted guest I then did 'dd if=/dev/sda of=/dev/null bs=512 iflag=direct' but did not see these 55ms delays anymore. I did however see a ~4ms delay . Have you added debugging into block/iscsi.c or libiscsi to watch when read handlers get called? Using tcpdump or strace might also be useful to figure out when the reply becomes available and why we aren't registering a read handler function with the QEMU main loop. I just did that, and I print a message and timeval to stdout for the functions in block/iscsi.c What seems to happen is that we call iscsi_aio_readv() which sets up a write event but this event is not triggered for another 55ms. Since the descriptor is writeable at this point, i would expected that the write_event_callback would trigger almost immediately. Below iscsi_aio_readv is printed when we enter this function. This function prepares a CDB and sets it up for queueing, but it does not actually write it to the socket. At the end of this function, we set up the events by calling iscsi_set_events() and we set it up for writeable. iscsi_process_write() is when we enter the fd-is-writeable callback. iscsi_aio_readv 1337676989.429822 iscsi_set_events write_event?Y 1337676989.429834 iscsi_process_write 1337676989.484424 iscsi_set_events write_event?N 1337676989.484507 iscsi_process_read 1337676989.484628 iscsi_aio_read10_cb 1337676989.484647 iscsi_set_events write_event?N 1337676989.484660 iscsi_readv_writev_bh_cb 1337676989.484665 iscsi_aio_readv 1337676989.484751 So what happens seems to be that once we have set up the i/o and queued it, and we set up the writeable event, but this event is not triggered immediately! It takes another 55ms before the event is triggered. To me this looks like a bug in the eventsystem. The socket is always writeable. shouldnt the callback be invoked almost immediately here ? regards ronnie sahlberg
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
Il 22/05/2012 11:15, ronnie sahlberg ha scritto: Hi, Now that I see what happens, I can easily workaround this in block/iscsi.c by the patch below, but I dont know if this is the right thing to do. It does appear that here, when I use qemu_set_fd_handler() and add a handler for writeble it takes 55ms before the event system notices this and reacts. diff --git a/block/iscsi.c b/block/iscsi.c index d37c4ee..1ebff0f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun) { struct iscsi_context *iscsi = iscsilun-iscsi; +if (iscsi_which_events(iscsi) POLLOUT) { +iscsi_process_write(iscsilun); +} + qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read, (iscsi_which_events(iscsi) POLLOUT) ? iscsi_process_write : NULL, Doh, now I remember. Whenever you change the aio handlers you need to call qemu_notify_event() afterwards, if the handler may fire right away. The alternative is your patch, which is correct. Can you resend it with SoB? Paolo
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
On Tue, May 22, 2012 at 7:29 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 22/05/2012 11:15, ronnie sahlberg ha scritto: Hi, Now that I see what happens, I can easily workaround this in block/iscsi.c by the patch below, but I dont know if this is the right thing to do. It does appear that here, when I use qemu_set_fd_handler() and add a handler for writeble it takes 55ms before the event system notices this and reacts. diff --git a/block/iscsi.c b/block/iscsi.c index d37c4ee..1ebff0f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun) { struct iscsi_context *iscsi = iscsilun-iscsi; + if (iscsi_which_events(iscsi) POLLOUT) { + iscsi_process_write(iscsilun); + } + qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read, (iscsi_which_events(iscsi) POLLOUT) ? iscsi_process_write : NULL, Doh, now I remember. Whenever you change the aio handlers you need to call qemu_notify_event() afterwards, if the handler may fire right away. Thanks. I just confirmed that qemu_notify_event() fixes the issue. Ill send a patch that uses qemu_notify_event() and a comment why this is needed. regards ronnie sahlberg
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
None of the other drivers in block/*.c call qemu_notify_event() Do you you think those should be audited and have this call added to where required ? regards ronnie sahlberg On Tue, May 22, 2012 at 7:48 PM, ronnie sahlberg ronniesahlb...@gmail.com wrote: On Tue, May 22, 2012 at 7:29 PM, Paolo Bonzini pbonz...@redhat.com wrote: Il 22/05/2012 11:15, ronnie sahlberg ha scritto: Hi, Now that I see what happens, I can easily workaround this in block/iscsi.c by the patch below, but I dont know if this is the right thing to do. It does appear that here, when I use qemu_set_fd_handler() and add a handler for writeble it takes 55ms before the event system notices this and reacts. diff --git a/block/iscsi.c b/block/iscsi.c index d37c4ee..1ebff0f 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -105,6 +105,10 @@ iscsi_set_events(IscsiLun *iscsilun) { struct iscsi_context *iscsi = iscsilun-iscsi; + if (iscsi_which_events(iscsi) POLLOUT) { + iscsi_process_write(iscsilun); + } + qemu_aio_set_fd_handler(iscsi_get_fd(iscsi), iscsi_process_read, (iscsi_which_events(iscsi) POLLOUT) ? iscsi_process_write : NULL, Doh, now I remember. Whenever you change the aio handlers you need to call qemu_notify_event() afterwards, if the handler may fire right away. Thanks. I just confirmed that qemu_notify_event() fixes the issue. Ill send a patch that uses qemu_notify_event() and a comment why this is needed. regards ronnie sahlberg
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
On 22.05.2012 14:03, ronnie sahlberg wrote: [] Doh, now I remember. Whenever you change the aio handlers you need to call qemu_notify_event() afterwards, if the handler may fire right away. Thanks. I just confirmed that qemu_notify_event() fixes the issue. Ill send a patch that uses qemu_notify_event() and a comment why this is needed. When I were writing network block driver I tried to not call qemu_aio_set_fd_handler() unless absolutely necessary -- that is, trying to write as much as possible till the next write returns EAGAIN, or to read till the next read returns EAGAIN. This way there's no need to call qemu_notify_event(). FWIW. /mjt
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
Am 21.05.2012 08:07, schrieb ronnie sahlberg: List, Kevin, Since this merge : commit 1f8bcac09af61e58c5121aa0a932190700ad554d Merge: cb4c254 1042ec9 Author: Anthony Liguori aligu...@us.ibm.com Date: Mon Apr 23 14:27:04 2012 -0500 Merge remote-tracking branch 'kwolf/for-anthony' into staging * kwolf/for-anthony: (38 commits) qemu-iotests: Fix test 031 for qcow2 v3 support qemu-iotests: Add -o and make v3 the default for qcow2 qcow2: Zero write support qemu-iotests: Test backing file COW with zero clusters qemu-iotests: add a simple test for write_zeroes qcow2: Support for feature table header extension qcow2: Support reading zero clusters qcow2: Version 3 images qcow2: Ignore reserved bits in check_refcounts qcow2: Ignore reserved bits in refcount table entries qcow2: Simplify count_cow_clusters qcow2: Refactor qcow2_free_any_clusters qcow2: Ignore reserved bits in L1/L2 entries qcow2: Fail write_compressed when overwriting data qcow2: Ignore reserved bits in count_contiguous_clusters() qcow2: Ignore reserved bits in get_cluster_offset qcow2: Save disk size in snapshot header Specification for qcow2 version 3 qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at() iotests: Resolve test failures caused by hostname ... I am seeing some weirdness when using iscsi. I have isolated it to this particular commit, but since it is 3900 lines in sinze I have not yet found the exact change that triggers this particular behaviour. It shows up when using an iscsi device to boot from, where when during the bios boot and later grub boot almost all I/O has a pause of 55ms between them. During boot the bios and later grub will read a lot of data, primarily sequentially and one single block at a time. After these changes were applied there is now an approximate 55ms delay between all these I/O, causing the boot process to become very slow. I have not yet found the exact part of this big patch that cause this slowdown, but will continue investigating. I am posting this here in case someone has any idea or if 55ms rings any bells. The only thing in this merge that looks as if it could be related is Paolo's AIO changes between commit adfe92f6 and 9eb0bfca. Or if you use IDE, Stefan's patches to make PIO operations asynchronous could be related. Kevin
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
Il 21/05/2012 12:45, Kevin Wolf ha scritto: I have not yet found the exact part of this big patch that cause this slowdown, but will continue investigating. I am posting this here in case someone has any idea or if 55ms rings any bells. 55ms is the frequency of the PIT while the BIOS is running. I'd guess something is not waking up the iothread properly. The only thing in this merge that looks as if it could be related is Paolo's AIO changes between commit adfe92f6 and 9eb0bfca. Or if you use IDE, Stefan's patches to make PIO operations asynchronous could be related. Yeah, both seem just as likely to be the culprit. Please bisect. Paolo
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
Yes, I use IDE since I boot from this LUN. I just managed to track it down to the IDE changes. It looks like basically this change triggered it : commit bef0fd5958120542f126f2dedbfce65d8839a94d Author: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Date: Thu Mar 29 10:31:30 2012 +0100 ide: convert ide_sector_read() to asynchronous I/O Stefan, any ideas? Ill continue trying to understand the IDE code tomorrow and maybe I make some progress. regards ronnie sahlberg On Mon, May 21, 2012 at 8:45 PM, Kevin Wolf kw...@redhat.com wrote: Am 21.05.2012 08:07, schrieb ronnie sahlberg: List, Kevin, Since this merge : commit 1f8bcac09af61e58c5121aa0a932190700ad554d Merge: cb4c254 1042ec9 Author: Anthony Liguori aligu...@us.ibm.com Date: Mon Apr 23 14:27:04 2012 -0500 Merge remote-tracking branch 'kwolf/for-anthony' into staging * kwolf/for-anthony: (38 commits) qemu-iotests: Fix test 031 for qcow2 v3 support qemu-iotests: Add -o and make v3 the default for qcow2 qcow2: Zero write support qemu-iotests: Test backing file COW with zero clusters qemu-iotests: add a simple test for write_zeroes qcow2: Support for feature table header extension qcow2: Support reading zero clusters qcow2: Version 3 images qcow2: Ignore reserved bits in check_refcounts qcow2: Ignore reserved bits in refcount table entries qcow2: Simplify count_cow_clusters qcow2: Refactor qcow2_free_any_clusters qcow2: Ignore reserved bits in L1/L2 entries qcow2: Fail write_compressed when overwriting data qcow2: Ignore reserved bits in count_contiguous_clusters() qcow2: Ignore reserved bits in get_cluster_offset qcow2: Save disk size in snapshot header Specification for qcow2 version 3 qcow2: Fix refcount block allocation during qcow2_alloc_cluster_at() iotests: Resolve test failures caused by hostname ... I am seeing some weirdness when using iscsi. I have isolated it to this particular commit, but since it is 3900 lines in sinze I have not yet found the exact change that triggers this particular behaviour. It shows up when using an iscsi device to boot from, where when during the bios boot and later grub boot almost all I/O has a pause of 55ms between them. During boot the bios and later grub will read a lot of data, primarily sequentially and one single block at a time. After these changes were applied there is now an approximate 55ms delay between all these I/O, causing the boot process to become very slow. I have not yet found the exact part of this big patch that cause this slowdown, but will continue investigating. I am posting this here in case someone has any idea or if 55ms rings any bells. The only thing in this merge that looks as if it could be related is Paolo's AIO changes between commit adfe92f6 and 9eb0bfca. Or if you use IDE, Stefan's patches to make PIO operations asynchronous could be related. Kevin
Re: [Qemu-devel] Weird iscsi/fd-event issue since recent merge of event system changes
On Mon, May 21, 2012 at 12:00 PM, ronnie sahlberg ronniesahlb...@gmail.com wrote: Yes, I use IDE since I boot from this LUN. I just managed to track it down to the IDE changes. It looks like basically this change triggered it : commit bef0fd5958120542f126f2dedbfce65d8839a94d Author: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Date: Thu Mar 29 10:31:30 2012 +0100 ide: convert ide_sector_read() to asynchronous I/O Stefan, any ideas? Ill continue trying to understand the IDE code tomorrow and maybe I make some progress. It sounds like aio for iSCSI is broken. IDE submits bdrv_aio_readv() requests which iSCSI handles. QEMU main loop should find that the iSCSI socket file descriptor becomes readable when the SCSI response arrives. We can then report I/O completion back to the guest. However, it appears that completions are not triggering. My guess is that once the OS is fully booted you don't see the problem because other events are pumping the main loop besides a 55ms timer. Therefore this problem probably applies across the board but it's most noticable when the BIOS/bootloader/early OS is executing. Have you added debugging into block/iscsi.c or libiscsi to watch when read handlers get called? Using tcpdump or strace might also be useful to figure out when the reply becomes available and why we aren't registering a read handler function with the QEMU main loop. Stefan