On 19.06.2018 [15:35:57 -0700], Nishanth Aravamudan wrote: > On 19.06.2018 [13:14:51 -0700], Nishanth Aravamudan wrote: > > On 19.06.2018 [14:35:33 -0500], Eric Blake wrote: > > > On 06/15/2018 12:47 PM, Nishanth Aravamudan via Qemu-devel wrote: > > <snip> > > > > > } else if (s->use_linux_aio) { > > > > + int rc; > > > > + rc = aio_setup_linux_aio(bdrv_get_aio_context(bs)); > > > > + if (rc != 0) { > > > > + error_report("Unable to use native AIO, falling back > > > > to " > > > > + "thread pool."); > > > > > > In general, error_report() should not output a trailing '.'. > > > > Will fix. > > > > > > + s->use_linux_aio = 0; > > > > + return rc; > > > > > > Wait - the message claims we are falling back, but the non-zero return > > > code > > > sounds like we are returning an error instead of falling back. (My > > > preference - if the user requested something and we can't do it, it's > > > better > > > to error than to fall back to something that does not match the user's > > > request). > > > > I think that makes sense, I hadn't tested this specific case (in my > > reading of the code, it wasn't clear to me if raw_co_prw() could be > > called before raw_aio_plug() had been called, but I think returning the > > error code up should be handled correctly. What about the cases where > > there is no error handling (the other two changes in the patch)? > > While looking at doing these changes, I realized that I'm not quite sure > what the right approach is here. My original rationale for returning > non-zero was that AIO was requested but could not be completed. I > haven't fully tracked back the calling paths, but I assumed it would get > retried at the top level, and since we indicated to not use AIO on > subsequent calls, it will succeed and use threads then (note, that I do > now realize this means a mismatch between the qemu command-line and the > in-use AIO model). > > In practice, with my v2 patch, where I do return a non-zero error-code > from this function, qemu does not exit (nor is any logging other than > that I added emitted on the monitor). If I do not fallback, I imagine we > would just continuously see this error message and IO might not actually > every occur? Reworking all of the callpath to fail on non-zero returns > from raw_co_prw() seems like a fair bit of work, but if that is what is > being requested, I can try that (it will just take a while). > Alternatively, I can produce a v3 quickly that does not bubble the > actual errno all the way up (since it does seem like it is ignored > anyways?).
Sorry for the noise, but I had one more thought. Would it be appropriate to push the _setup() call up to when we parse the arguments about aio=native? E.g., we already check there if cache=directsync is specified and error out if not. We could, in theory, also call laio_init() there (via the new function) and error out to the CLI if that fails. Then the runtime paths would simply be able to use the context that was setup earlier? I would need to verify the laio_cleanup() happens correctly still. Thanks, Nish