On Thu, May 8, 2014 at 4:33 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> On Wed, May 07, 2014 at 04:09:27PM +0200, Peter Lieven wrote:
>> On 07.05.2014 12:29, Paolo Bonzini wrote:
>> >On 07/05/2014 12:07, Stefan Hajnoczi wrote:
>> >>On Fri, May 02, 2014 at 12:39:06AM +0200, Peter Lieven wrote:
>> >>>>+static void iscsi_attach_aio_context(BlockDriverState *bs,
>> >>>>+                                     AioContext *new_context)
>> >>>>+{
>> >>>>+    IscsiLun *iscsilun = bs->opaque;
>> >>>>+
>> >>>>+    iscsilun->aio_context = new_context;
>> >>>>+    iscsi_set_events(iscsilun);
>> >>>>+
>> >>>>+#if defined(LIBISCSI_FEATURE_NOP_COUNTER)
>> >>>>+    /* Set up a timer for sending out iSCSI NOPs */
>> >>>>+    iscsilun->nop_timer = aio_timer_new(iscsilun->aio_context,
>> >>>>+                                        QEMU_CLOCK_REALTIME, SCALE_MS,
>> >>>>+                                        iscsi_nop_timed_event, iscsilun);
>> >>>>+    timer_mod(iscsilun->nop_timer,
>> >>>>+              qemu_clock_get_ms(QEMU_CLOCK_REALTIME) + NOP_INTERVAL);
>> >>>>+#endif
>> >>>>+}
>> >>>
>> >>>Is it still guaranteed that iscsi_nop_timed_event for a target is not
>> >>>invoked while we are in another function/callback of the iscsi driver
>> >>>for the same target?
>> >
>> >Yes, since the timer is in the same AioContext as the iscsi driver
>> >callbacks.
>>
>> Ok. Stefan: What MUST NOT happen is that the timer gets fired while we are
>> in iscsi_service.
>> As Paolo outlined, this cannot happen, right?
>
> Okay, I think we're safe then. The timer can only be invoked during
> aio_poll() event loop iterations. It cannot be invoked while we're
> inside iscsi_service().
>
>> >>BTW, is iscsi_reconnect() the right libiscsi interface to use since it
>> >>is synchronous? It seems like this would block QEMU until the socket
>> >>has connected! The guest would be frozen.
>> >
>> >There is no asynchronous interface yet for reconnection, unfortunately.
>>
>> We initiate the reconnect after we miss a few NOP replies. So the target
>> has already been down for approx. 30 seconds.
>> Every process inside the guest is already hanging or has timed out.
>>
>> If I understand correctly, with the new patches only the communication
>> with this target hangs, right?
>> So what benefit would an asynchronous reconnect have?
>
> Asynchronous reconnect is desirable:
>
> 1. The QEMU monitor is blocked while we're waiting for the iSCSI target
>    to accept our reconnect. This means the management stack (libvirt)
>    cannot control QEMU until we time out or succeed.
>
> 2. The guest is totally frozen - cannot execute instructions - because
>    it will soon reach a point in the code that locks the QEMU global
>    mutex (which is being held while we reconnect to the iSCSI target).
>
>    This may be okayish for guests where the iSCSI LUN contains the
>    "main" data that is being processed. But what if an iSCSI LUN was
>    just attached to a guest that is also doing other, independent
>    things (e.g. serving a website, processing data from a local disk,
>    etc.)? Now the reconnect causes downtime for the entire guest.
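Paolo's and Stefan's argument is that the NOP timer and the driver's fd handlers are dispatched by the same event loop, strictly one after another, so the timer cannot fire in the middle of iscsi_service(). A minimal toy model of that property, assuming a plain poll()-based loop in place of QEMU's aio_poll(); fd_handler(), timer_cb(), loop_iteration() and run_demo() are hypothetical names, not QEMU functions:

```c
/* Toy model of a single AioContext-style event loop; NOT QEMU's API.
 * fd_handler() stands in for iscsi_service() and timer_cb() for
 * iscsi_nop_timed_event(). Both run on one thread, called one after the
 * other from loop_iteration(), so they can never overlap. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

static bool in_service; /* true only while fd_handler() is running */
static int timer_fires; /* number of times timer_cb() has run */

static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

static void fd_handler(int fd)
{
    char buf[64];
    in_service = true;
    ssize_t n = read(fd, buf, sizeof(buf)); /* consume the "PDU" bytes */
    (void)n;
    in_service = false;
}

static void timer_cb(void)
{
    assert(!in_service); /* the timer never fires inside fd_handler() */
    timer_fires++;
}

/* One loop iteration: sleep in poll() until the fd is readable or the timer
 * deadline passes, then run whichever handlers are due, in sequence. */
static void loop_iteration(int fd, long *deadline, long interval_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    long timeout = *deadline - now_ms();

    if (timeout < 0) {
        timeout = 0;
    }
    if (poll(&pfd, 1, (int)timeout) > 0 && (pfd.revents & POLLIN)) {
        fd_handler(fd);
    }
    if (now_ms() >= *deadline) {
        timer_cb();
        *deadline = now_ms() + interval_ms;
    }
}

/* Drive the loop with a pipe standing in for the iSCSI socket.
 * Returns 0 on success, -1 if the pipe cannot be created. */
static int run_demo(int iterations, long interval_ms)
{
    int fds[2];
    long deadline;

    if (pipe(fds) != 0) {
        return -1;
    }
    deadline = now_ms() + interval_ms;
    for (int i = 0; i < iterations; i++) {
        if (i % 3 == 0 && write(fds[1], "x", 1) != 1) {
            break; /* simulated incoming PDU could not be written */
        }
        loop_iteration(fds[0], &deadline, interval_ms);
    }
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```

Calling run_demo(20, 5) makes the timer expire repeatedly while fd_handler() keeps being invoked; the assert() in timer_cb() never triggers, because nothing in the loop calls one handler from inside the other. This is the same reasoning as "the timer can only be invoked during aio_poll()", reduced to a single-threaded toy.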
I will look into making the reconnect async over the next few days.
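Stefan's two points come down to not holding the global mutex while waiting for the TCP connection to the target. One common way to avoid that, sketched here with plain POSIX sockets rather than libiscsi (whose reconnect internals are not shown in this thread), is a non-blocking connect() whose completion is picked up later by the event loop. Both helper names below are hypothetical:

```c
/* Generic POSIX sketch of a non-blocking TCP (re)connect; NOT libiscsi's API.
 * reconnect_start() and reconnect_finish() are hypothetical helpers. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Step 1: create a socket in O_NONBLOCK mode and start connect().
 * Returns immediately with the fd (connection pending or done), or -1. */
static int reconnect_start(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }
    if (connect(fd, (const struct sockaddr *)addr, sizeof(*addr)) < 0 &&
        errno != EINPROGRESS) {
        close(fd);
        return -1;
    }
    /* The caller now registers fd for writability in its event loop
     * instead of waiting here. */
    return fd;
}

/* Step 2: run from the event loop once fd becomes writable.
 * Returns 0 if the connection was established, else the pending error. */
static int reconnect_finish(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    assert(fd >= 0);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0) {
        return errno;
    }
    return err;
}
```

In QEMU terms, the writability handler from step 1 would run inside aio_poll() like any other fd handler, so neither the monitor nor the guest would be stuck behind the global mutex for the duration of the TCP handshake. The iSCSI login that follows the TCP connect would also have to be driven asynchronously; this sketch does not cover that part.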