Olaf Hering <o...@aepfle.de> writes: > On Mon, Sep 21, Vitaly Kuznetsov wrote: > >> I'd like to see a trace from the hang, it is not obvious to me how it >> happened and what caused it. (or if you have such hang scenario in your >> head, can you please reveal it?) > > There is no trace. I think fcopy_respond_to_host notifies the host, > which in turn triggers an interrupt right away which is processed while > fcopy_on_msg is executing somewhere between the return from > fcopy_respond_to_host and the call into hv_fcopy_onchannelcallback. >
I think it is fcopy_transaction.fcopy_context which gets out of sync. When we're done processing some request we have the following code: fcopy_transaction.state = HVUTIL_USERSPACE_RECV; fcopy_respond_to_host(*val); fcopy_transaction.state = HVUTIL_READY; hv_poll_channel(fcopy_transaction.fcopy_context, hv_fcopy_onchannelcallback); If interrupt happens after we did fcopy_respond_to_host() fcopy_transaction.state will still be HVUTIL_USERSPACE_RECV or even its previous HVUTIL_USERSPACE_REQ but it's OK as we have the following in hv_fcopy_onchannelcallback() if (fcopy_transaction.state > HVUTIL_READY) { /* * We will defer processing this callback once * the current transaction is complete. */ fcopy_transaction.fcopy_context = context; return; } And we're supposed to process the work with hv_poll_channel(). The problem is (I guess) that fcopy_transaction.fcopy_context gets out of sync and it still has its previous value (possibly NULL). We call hv_poll_channel() with NULL and everything gets stuck as we'll never process the request. AFAICS proper locking is requred here (and probably in all three drivers), we need to protect not only .state but the whole transaction. [...] -- Vitaly -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/