On 6/5/19 6:15 AM, Richard W.M. Jones wrote: > For each nbd_(unlocked_)?aio_is_* function, split it into an internal > function that tests the state and a public visible API function. > > For calls within the library, use the internal function. > > This is simple refactoring. > > As a side effect this fixes a race condition identified by Eric Blake: > > Thread A Thread B > (in a call that holds h->lock) (calling nbd_aio_pread) > -------------------------------------------------------------------- > h->state is processing > checks nbd_aio_is_ready > (it's false) > h->state is moved to READY > checks nbd_aio_is_processing > (it's false) > validation check fails > > (However the state was valid so the validation check should have > succeeded). > > Fixes commit e63a11736930c381a79a8cc2d03844cfff5db3ef. > > Thanks: Eric Blake for identifying the race condition above. > ---
> @@ -2841,19 +2841,20 @@ let generate_lib_api_c () =
> pr " /* We can check the state outside the handle lock because the\n";
> pr " * the state is atomic.\n";
> pr " */\n";
> + pr " enum state state = h->state;\n";
> let tests =
> List.map (
> function
> - | Created -> "nbd_aio_is_created (h)"
> - | Connecting -> "nbd_aio_is_connecting (h)"
> - | Connected -> "nbd_aio_is_ready (h) || nbd_aio_is_processing (h)"
> - | Closed -> "nbd_aio_is_closed (h)"
> - | Dead -> "nbd_aio_is_dead (h)"
> + | Created -> "nbd_internal_is_state_created (state)"
> + | Connecting -> "nbd_internal_is_state_connecting (state)"
> + | Connected -> "nbd_internal_is_state_ready (state) ||
> nbd_internal_is_state_processing (state)"
Yes, this fixes the race: this code is executed outside the lock, so it
must read h->state exactly once before using it in multiple spots.
> +++ b/lib/connect.c
> @@ -38,16 +38,16 @@
> static int
> error_unless_ready (struct nbd_handle *h)
> {
> - if (nbd_unlocked_aio_is_ready (h))
> + if (nbd_internal_is_state_ready (h->state))
> return 0;
>
> /* Why did it fail? */
> - if (nbd_unlocked_aio_is_closed (h)) {
> + if (nbd_internal_is_state_closed (h->state)) {
> set_error (0, "connection is closed");
> return -1;
> }
>
> - if (nbd_unlocked_aio_is_dead (h))
> + if (nbd_internal_is_state_dead (h->state))
error_unless_ready() is called while lock is held, so multiple reads of
h->state are fine here.
> +++ b/lib/disconnect.c
> @@ -29,15 +29,14 @@
> int
> nbd_unlocked_shutdown (struct nbd_handle *h)
> {
> -
> - if (nbd_unlocked_aio_is_ready (h) ||
> - nbd_unlocked_aio_is_processing (h)) {
> + if (nbd_internal_is_state_ready (h->state) ||
> + nbd_internal_is_state_processing (h->state)) {
Likewise safe for multiple h->state reads.
> +/* NB: is_locked = false, may_set_error = false. */
> +int
> +nbd_unlocked_aio_is_created (struct nbd_handle *h)
> +{
> + return nbd_internal_is_state_created (h->state);
> +}
> +
> +/* NB: is_locked = false, may_set_error = false. */
> +int
> +nbd_unlocked_aio_is_connecting (struct nbd_handle *h)
> +{
> + return nbd_internal_is_state_connecting (h->state);
> +}
While we've fixed our internal uses, external callers may still have a
race when calling multiple nbd_aio_is_* functions in a row, since the
state can change between those functions. Is that a problem? Are there
any common state combinations (such as is_ready||is_processing) that
warrant their own API entry points so as to be race-free?
At any rate, this is better than what we've had, and does fix a real
race (even if it doesn't solve everything yet), so ACK.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3226
Virtualization: qemu.org | libvirt.org
signature.asc
Description: OpenPGP digital signature
_______________________________________________ Libguestfs mailing list [email protected] https://www.redhat.com/mailman/listinfo/libguestfs
