Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-14 Thread Richard W.M. Jones
On Thu, Jul 13, 2023 at 09:55:00AM -0500, Eric Blake wrote:
> On Thu, Jul 13, 2023 at 09:36:58AM -0500, Eric Blake wrote:
> > > The user needs a way to know if an error occurred. So the completion
> > > callback must be called if the asynchronous function did not fail 
> > > (returned
> > > 0). If the completion callback should be called, with the error parameter
> > > set, even when the asynchronous function immediately failed with a 
> > > non-zero
> > > return value is another question. I see two possibilities: Either the
> > > completion callback should always be called. Or it should be called iff 
> > > the
> > > asynchronous function returned 0 (did not fail).
> > 
> > Good point.
> > 
> > Our documentation currently states (docs/libnbd.pod) that the
> > completion callback is ALWAYS called, but this is inconsistent with
> > current code - you are correct that at present, the completion
> > callback is NOT called if the aio command returns non-zero (easy test:
> > call nbd_aio_block_status before nbd_connect*).  I'm game to updating
> > that part of the documentation to match existing practice (changing
> > our guarantee to the completion callback will be called once iff the
> > aio command reported success), since our testsuite wasn't covering it,
> > and it is probably an easier fix than munging the generator to call
> > completion.callback even for aio failures.  Meanwhile, the .free
> > callback MUST be called unconditionally, and I think our testsuite is
> > already covering that.
> > 
> > Life-cycle wise, I see the following sequence as being something we
> > could usefully rely on (although we don't yet have enough testsuite
> > coverage to prove that we already have it).  Note that it is not only
> > important how many times things are called, but I would like it if we
> > can guarantee the specific ordering between them (neither .free should
> > be called until all .callback opportunities are complete finished, to
> > avoid any use-after-free issues regardless of which of the two .free
> > slots the programmer actually uses):
> > 
> > - if aio command fails:
> > mid-command .callback: 0 calls
> > completion .callback: 0 calls
> 
> This is the part that is inconsistent with our docs added in commit
> 6f4dcdab, but which matches current code from all the tests I've come
> up with so far.
> 
> > mid-command .free: 1 call
> > completion .free: 1 call
> > 
> > - if aio command succeeds:
> > mid-command .callback: 0, 1, or multiple times
> > completion .callback: 1 call
> > mid-command .free: 1 call
> > completion .free: 1 call
> > 
> > What I'm not sure of yet (without more code inspection) is whether we
> > can sometimes call completion.callback after mid-command.free.
> 
> Commit 6f4dcdab documents that we guarantee this order, and I'm trying
> to preserve that guarantee.  Thus, I think the only thing up for
> debate is whether completion.callback can be skipped if the
> corresponding aio command failed early (because that is the one case
> where you KNOW there was an error without needing a completion
> callback to tell you that fact), but that it does make sense as being
> closer to what our code base already does.
> 
> Ramifications to user code: for aio commands that take both a
> mid-command and completion callback, the user can share the same
> allocated data structure between both callbacks, and use their choice
> of mid-command.free or completion.free to clean up the allocation.
> But if the user instead tries to uss completion.callback to clean up
> the allocation, then they must ALSO manually clean up the allocation
> if the aio command itself reported -1 (that is, it is better to stick
> cleanup in .free than in complete.callback).

Also this is why we have the .free method in callbacks.  It's the
place to do final cleanup which is guaranteed to be called after the
last use.  I don't recall us making any other guarantee apart from
this one.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-top is 'top' for virtual machines.  Tiny program with many
powerful monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Richard W.M. Jones
On Thu, Jul 13, 2023 at 07:07:33PM +, Tage Johansson wrote:
> Does this mean that the call to `aio_notify_read()` or
> `aio_notify_write()` which discovered the error will also return the
> error?

Yes!

> And that the completion callback was called by that call to
> `aio_notify_*`, in the same thread and before the call to
> `aio_notify_*` returned?

I believe so, although I don't think the contract requires this.  For
example it would seem to be contractually valid to not call the
completion callback until much later, eg. nbd_close.

Also, in addition to what Eric said in his reply, it is also possible
for callbacks to set the error int (through the pointer) in order to
cause an error to be returned.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Tage Johansson



On 7/13/2023 6:50 PM, Richard W.M. Jones wrote:

On Thu, Jul 13, 2023 at 05:46:56PM +0100, Richard W.M. Jones wrote:

On Thu, Jul 13, 2023 at 04:18:03PM +, Tage Johansson wrote:

On 7/13/2023 5:37 PM, Richard W.M. Jones wrote:

On Thu, Jul 13, 2023 at 03:05:30PM +, Tage Johansson wrote:

On 7/13/2023 4:36 PM, Eric Blake wrote:

On Thu, Jul 13, 2023 at 01:37:49PM +, Tage Johansson wrote:

On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:

On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:

On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:

I have replaced a call to `nbd_opt_info()` with a call to
`nbd_aio_opt_info()` and passed in a completion callback which just
calls `exit(EXIT_FAILURE)`. So if the completion callback is called
the test should fail, which it doesn't, at least not on my machine.

Isn't that OK?  Only .free is required to be called.

For the context callback (for opt_set_meta), .callback may be called
zero, one, or multiple times, but .free should be called exactly once.
But for the completion callback, I agree that the docs state that both
.callback and .free should each be called exactly once ("On the other
hand, if a completion callback is supplied (only possible with
asynchronous commands), it will always be reached exactly once, and
the completion callback must not ignore the value pointed to by
C."); we are indeed missing the call to .callback.  I'll work
on a patch.

Eww, the bug is bigger than just nbd_aio_opt* not always calling
completion.callback exactly once.  With just this diff (to be folded
into the larger patch I'm working on), I'm getting an assertion
failure that we fail to call completion.callback for
nbd_aio_pread_structured when calling nbd_close() prior to the command
running to completion, so I'll have to fix that too.

Just to be clear about this, are we really sure the completion
callback should always be called once?  I'm not clear why that should
be the case.  (The .free callback however should be.)

For example, if I call a function with bogus invalid parameters so
that it fails very early on (or when the handle is in an incorrect
state), should I expect the completion callback to be called?  I would
expect not.

Rich.

The user needs a way to know if an error occurred. So the completion
callback must be called if the asynchronous function did not fail (returned
0). If the completion callback should be called, with the error parameter
set, even when the asynchronous function immediately failed with a non-zero
return value is another question. I see two possibilities: Either the
completion callback should always be called. Or it should be called iff the
asynchronous function returned 0 (did not fail).

Good point.

Our documentation currently states (docs/libnbd.pod) that the
completion callback is ALWAYS called, but this is inconsistent with
current code - you are correct that at present, the completion
callback is NOT called if the aio command returns non-zero (easy test:
call nbd_aio_block_status before nbd_connect*).  I'm game to updating
that part of the documentation to match existing practice (changing
our guarantee to the completion callback will be called once iff the
aio command reported success), since our testsuite wasn't covering it,
and it is probably an easier fix than munging the generator to call
completion.callback even for aio failures.  Meanwhile, the .free
callback MUST be called unconditionally, and I think our testsuite is
already covering that.

Life-cycle wise, I see the following sequence as being something we
could usefully rely on (although we don't yet have enough testsuite
coverage to prove that we already have it).  Note that it is not only
important how many times things are called, but I would like it if we
can guarantee the specific ordering between them (neither .free should
be called until all .callback opportunities are complete finished, to
avoid any use-after-free issues regardless of which of the two .free
slots the programmer actually uses):

- if aio command fails:
mid-command .callback: 0 calls
completion .callback: 0 calls
mid-command .free: 1 call
completion .free: 1 call

- if aio command succeeds:
mid-command .callback: 0, 1, or multiple times
completion .callback: 1 call
mid-command .free: 1 call
completion .free: 1 call

What I'm not sure of yet (without more code inspection) is whether we
can sometimes call completion.callback after mid-command.free.


By the way, if the error parameter is set in the completion callback, how
can the user retrieve the error text? Is it possible to call
`get_nbd_error(3)` from inside the completion callback?

You mean nbd_get_error().  We currently document that it is not safe
to call any nbd_* from within a callback.  Maybe we could relax that
to carve out exceptions for nbd_get_error/errno as special cases,
since they only inspect thread-local state and can be used even when
there is no current nbd_handle.  The problem is that I'm not sure if
there 

Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Eric Blake
On Thu, Jul 13, 2023 at 04:18:03PM +, Tage Johansson wrote:
> 
> > > So is there any safe way to get some description of the error from a
> > > completion callback apart from a non-zero number? It isn't too
> > > helpful to report to the user that the read operation faild with -1.
> > As I recall, from the callback, no.
> > 
> > The error is not lost however, it will be returned by the API call
> > itself.  eg. If you're in nbd_aio_opt_list -> callback (error) then
> > nbd_aio_opt_list will return -1 and at that point you can use
> > nbd_get_error to report the error.
> 
> 
> I don't understand. If I call `nbd_aio_opt_list()` with a completion
> callback. `nbd_aio_opt_list()` will return immediately, maybe with a
> successful result. But the command is not complete until
> `nbd_aio_is_connecting()` returns `false`, so the completion callback may be
> invoked with an error after `nbd_aio_opt_list()` has returned.

When you call nbd_aio_opt_list(), here are several possible scenarios
that can[*] happen (not exhaustive, but should be a good starting
point for thinking about it):

1. pre-transmission early failure
- call nbd_aio_opt_list
  - early failure detected, nothing queued to send to server
  - call list_callback.free if provided
  - call completion_callback.free if provided
  - nbd_aio_opt_list returns -1
- nbd_aio_is_negotiating is unchanged (the early failure did not
  change the state, .callbacks were not invoked, and .free is done
  before aio)

2. command succeeds after blocking
- nbd_aio_is_negotiating is true (otherwise we were in scenario 1)
- call nbd_aio_opt_list
  - no early failure detected, command is queued via nbd_internal_run()
- state machine changes to connecting instead of negotiating
- advance through states to send the request, hit EAGAIN waiting to
  read the server's reply
- nbd_internal_run returns 0
  - nbd_aio_opt_list returns 0
- nbd_aio_is_negotiating is false, you must manually move the state
  machine along once you have data (by nbd_aio_poll,
  nbd_aio_notify_read, ...)
  - state machine reaches server responses
  - list_callback.callback called if server's reply includes a name
  - state machine gets to end of server's message
  - call completion_callback.callback if provided, with error 0
  - state set back to negotiating
  - call list_callback.free (if provided)
  - call completion_callback.free (if provided)
  - nbd_aio_poll/nbd_aio_notify_write/... returns 0
- nbd_aio_is_negotiating is true again (callbacks reached after the aio
  call completed)

3. server failure not detected until after blocking
- nbd_aio_is_negotiating is true (otherwise we were in scenario 1)
- call nbd_aio_opt_list
  - no early failure detected, command is queued via nbd_internal_run()
- state machine changes to connecting instead of negotiating
- advance through states to send the request, hit EAGAIN waiting to
  read the server's reply
- nbd_internal_run returns 0
  - nbd_aio_opt_list returns 0
- nbd_aio_is_negotiating is false, you must manually move the state
  machine along once you have data (by nbd_aio_poll,
  nbd_aio_notify_read, ...)
  - state machine reaches server responses
  - list_callback.callback called if server's reply includes a name
  - state machine gets to end of server's message
  - call completion_callback.callback if provided, with error reported
by the server
  - state set back to negotiating
  - call list_callback.free (if provided)
  - call completion_callback.free (if provided)
  - nbd_aio_poll/nbd_aio_notify_write/... returns 0
- nbd_aio_is_negotiating is true again (you had to use a completion
  callback to know about the server failure)

4. transmission failure after blocking
- nbd_aio_is_negotiating is true (otherwise we were in scenario 1)
- call nbd_aio_opt_list
  - no early failure detected, command is queued via nbd_internal_run()
- state machine changes to connecting instead of negotiating
- advance through states to send the request, hit EAGAIN waiting to
  read the server's reply
- nbd_internal_run returns 0
  - nbd_aio_opt_list returns 0
- nbd_aio_is_negotiating is false, you must manually move the state
  machine along once you have data (by nbd_aio_poll,
  nbd_aio_notify_read, ...)
  - state machine sees server is no longer connected
  - state set to DEAD
  - call completion_callback.callback if provided, with error ENOTCONN
  - call list_callback.free (if provided)
  - call completion_callback.free (if provided)
  - nbd_aio_poll/nbd_aio_notify_write/... returns -1
- nbd_aio_is_negotiating is false (the completion callback occurs
  after the aio success)

5. server failure detected without blocking (unlikely, but possible
under gdb or heavy system load)
- nbd_aio_is_negotiating is true (otherwise we were in scenario 1)
- call nbd_aio_opt_list
  - no early failure detected, command is queued via nbd_internal_run()
- state machine changes to connecting instead of negotiating
- advance 

Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Richard W.M. Jones
On Thu, Jul 13, 2023 at 05:46:56PM +0100, Richard W.M. Jones wrote:
> On Thu, Jul 13, 2023 at 04:18:03PM +, Tage Johansson wrote:
> > 
> > On 7/13/2023 5:37 PM, Richard W.M. Jones wrote:
> > >On Thu, Jul 13, 2023 at 03:05:30PM +, Tage Johansson wrote:
> > >>On 7/13/2023 4:36 PM, Eric Blake wrote:
> > >>>On Thu, Jul 13, 2023 at 01:37:49PM +, Tage Johansson wrote:
> > On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:
> > >On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:
> > >>On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:
> > >I have replaced a call to `nbd_opt_info()` with a call to
> > >`nbd_aio_opt_info()` and passed in a completion callback which just
> > >calls `exit(EXIT_FAILURE)`. So if the completion callback is called
> > >the test should fail, which it doesn't, at least not on my machine.
> > Isn't that OK?  Only .free is required to be called.
> > >>>For the context callback (for opt_set_meta), .callback may be called
> > >>>zero, one, or multiple times, but .free should be called exactly 
> > >>>once.
> > >>>But for the completion callback, I agree that the docs state that 
> > >>>both
> > >>>.callback and .free should each be called exactly once ("On the other
> > >>>hand, if a completion callback is supplied (only possible with
> > >>>asynchronous commands), it will always be reached exactly once, and
> > >>>the completion callback must not ignore the value pointed to by
> > >>>C."); we are indeed missing the call to .callback.  I'll work
> > >>>on a patch.
> > >>Eww, the bug is bigger than just nbd_aio_opt* not always calling
> > >>completion.callback exactly once.  With just this diff (to be folded
> > >>into the larger patch I'm working on), I'm getting an assertion
> > >>failure that we fail to call completion.callback for
> > >>nbd_aio_pread_structured when calling nbd_close() prior to the command
> > >>running to completion, so I'll have to fix that too.
> > >Just to be clear about this, are we really sure the completion
> > >callback should always be called once?  I'm not clear why that should
> > >be the case.  (The .free callback however should be.)
> > >
> > >For example, if I call a function with bogus invalid parameters so
> > >that it fails very early on (or when the handle is in an incorrect
> > >state), should I expect the completion callback to be called?  I would
> > >expect not.
> > >
> > >Rich.
> > The user needs a way to know if an error occurred. So the completion
> > callback must be called if the asynchronous function did not fail 
> > (returned
> > 0). If the completion callback should be called, with the error 
> > parameter
> > set, even when the asynchronous function immediately failed with a 
> > non-zero
> > return value is another question. I see two possibilities: Either the
> > completion callback should always be called. Or it should be called iff 
> > the
> > asynchronous function returned 0 (did not fail).
> > >>>Good point.
> > >>>
> > >>>Our documentation currently states (docs/libnbd.pod) that the
> > >>>completion callback is ALWAYS called, but this is inconsistent with
> > >>>current code - you are correct that at present, the completion
> > >>>callback is NOT called if the aio command returns non-zero (easy test:
> > >>>call nbd_aio_block_status before nbd_connect*).  I'm game to updating
> > >>>that part of the documentation to match existing practice (changing
> > >>>our guarantee to the completion callback will be called once iff the
> > >>>aio command reported success), since our testsuite wasn't covering it,
> > >>>and it is probably an easier fix than munging the generator to call
> > >>>completion.callback even for aio failures.  Meanwhile, the .free
> > >>>callback MUST be called unconditionally, and I think our testsuite is
> > >>>already covering that.
> > >>>
> > >>>Life-cycle wise, I see the following sequence as being something we
> > >>>could usefully rely on (although we don't yet have enough testsuite
> > >>>coverage to prove that we already have it).  Note that it is not only
> > >>>important how many times things are called, but I would like it if we
> > >>>can guarantee the specific ordering between them (neither .free should
> > >>>be called until all .callback opportunities are complete finished, to
> > >>>avoid any use-after-free issues regardless of which of the two .free
> > >>>slots the programmer actually uses):
> > >>>
> > >>>- if aio command fails:
> > >>>mid-command .callback: 0 calls
> > >>>completion .callback: 0 calls
> > >>>mid-command .free: 1 call
> > >>>completion .free: 1 call
> > >>>
> > >>>- if aio command succeeds:
> > >>>mid-command .callback: 0, 1, or multiple times
> > >>>completion .callback: 1 call
> > >>>mid-command .free: 1 call
> > >>>completion .free: 1 call
> > >>>
> > 

Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Richard W.M. Jones
On Thu, Jul 13, 2023 at 04:18:03PM +, Tage Johansson wrote:
> 
> On 7/13/2023 5:37 PM, Richard W.M. Jones wrote:
> >On Thu, Jul 13, 2023 at 03:05:30PM +, Tage Johansson wrote:
> >>On 7/13/2023 4:36 PM, Eric Blake wrote:
> >>>On Thu, Jul 13, 2023 at 01:37:49PM +, Tage Johansson wrote:
> On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:
> >On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:
> >>On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:
> >I have replaced a call to `nbd_opt_info()` with a call to
> >`nbd_aio_opt_info()` and passed in a completion callback which just
> >calls `exit(EXIT_FAILURE)`. So if the completion callback is called
> >the test should fail, which it doesn't, at least not on my machine.
> Isn't that OK?  Only .free is required to be called.
> >>>For the context callback (for opt_set_meta), .callback may be called
> >>>zero, one, or multiple times, but .free should be called exactly once.
> >>>But for the completion callback, I agree that the docs state that both
> >>>.callback and .free should each be called exactly once ("On the other
> >>>hand, if a completion callback is supplied (only possible with
> >>>asynchronous commands), it will always be reached exactly once, and
> >>>the completion callback must not ignore the value pointed to by
> >>>C."); we are indeed missing the call to .callback.  I'll work
> >>>on a patch.
> >>Eww, the bug is bigger than just nbd_aio_opt* not always calling
> >>completion.callback exactly once.  With just this diff (to be folded
> >>into the larger patch I'm working on), I'm getting an assertion
> >>failure that we fail to call completion.callback for
> >>nbd_aio_pread_structured when calling nbd_close() prior to the command
> >>running to completion, so I'll have to fix that too.
> >Just to be clear about this, are we really sure the completion
> >callback should always be called once?  I'm not clear why that should
> >be the case.  (The .free callback however should be.)
> >
> >For example, if I call a function with bogus invalid parameters so
> >that it fails very early on (or when the handle is in an incorrect
> >state), should I expect the completion callback to be called?  I would
> >expect not.
> >
> >Rich.
> The user needs a way to know if an error occurred. So the completion
> callback must be called if the asynchronous function did not fail 
> (returned
> 0). If the completion callback should be called, with the error parameter
> set, even when the asynchronous function immediately failed with a 
> non-zero
> return value is another question. I see two possibilities: Either the
> completion callback should always be called. Or it should be called iff 
> the
> asynchronous function returned 0 (did not fail).
> >>>Good point.
> >>>
> >>>Our documentation currently states (docs/libnbd.pod) that the
> >>>completion callback is ALWAYS called, but this is inconsistent with
> >>>current code - you are correct that at present, the completion
> >>>callback is NOT called if the aio command returns non-zero (easy test:
> >>>call nbd_aio_block_status before nbd_connect*).  I'm game to updating
> >>>that part of the documentation to match existing practice (changing
> >>>our guarantee to the completion callback will be called once iff the
> >>>aio command reported success), since our testsuite wasn't covering it,
> >>>and it is probably an easier fix than munging the generator to call
> >>>completion.callback even for aio failures.  Meanwhile, the .free
> >>>callback MUST be called unconditionally, and I think our testsuite is
> >>>already covering that.
> >>>
> >>>Life-cycle wise, I see the following sequence as being something we
> >>>could usefully rely on (although we don't yet have enough testsuite
> >>>coverage to prove that we already have it).  Note that it is not only
> >>>important how many times things are called, but I would like it if we
> >>>can guarantee the specific ordering between them (neither .free should
> >>>be called until all .callback opportunities are complete finished, to
> >>>avoid any use-after-free issues regardless of which of the two .free
> >>>slots the programmer actually uses):
> >>>
> >>>- if aio command fails:
> >>>mid-command .callback: 0 calls
> >>>completion .callback: 0 calls
> >>>mid-command .free: 1 call
> >>>completion .free: 1 call
> >>>
> >>>- if aio command succeeds:
> >>>mid-command .callback: 0, 1, or multiple times
> >>>completion .callback: 1 call
> >>>mid-command .free: 1 call
> >>>completion .free: 1 call
> >>>
> >>>What I'm not sure of yet (without more code inspection) is whether we
> >>>can sometimes call completion.callback after mid-command.free.
> >>>
> By the way, if the error parameter is set in the completion callback, how
> can the user retrieve the error text? Is 

Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Tage Johansson



On 7/13/2023 5:37 PM, Richard W.M. Jones wrote:

On Thu, Jul 13, 2023 at 03:05:30PM +, Tage Johansson wrote:

On 7/13/2023 4:36 PM, Eric Blake wrote:

On Thu, Jul 13, 2023 at 01:37:49PM +, Tage Johansson wrote:

On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:

On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:

On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:

I have replaced a call to `nbd_opt_info()` with a call to
`nbd_aio_opt_info()` and passed in a completion callback which just
calls `exit(EXIT_FAILURE)`. So if the completion callback is called
the test should fail, which it doesn't, at least not on my machine.

Isn't that OK?  Only .free is required to be called.

For the context callback (for opt_set_meta), .callback may be called
zero, one, or multiple times, but .free should be called exactly once.
But for the completion callback, I agree that the docs state that both
.callback and .free should each be called exactly once ("On the other
hand, if a completion callback is supplied (only possible with
asynchronous commands), it will always be reached exactly once, and
the completion callback must not ignore the value pointed to by
C."); we are indeed missing the call to .callback.  I'll work
on a patch.

Eww, the bug is bigger than just nbd_aio_opt* not always calling
completion.callback exactly once.  With just this diff (to be folded
into the larger patch I'm working on), I'm getting an assertion
failure that we fail to call completion.callback for
nbd_aio_pread_structured when calling nbd_close() prior to the command
running to completion, so I'll have to fix that too.

Just to be clear about this, are we really sure the completion
callback should always be called once?  I'm not clear why that should
be the case.  (The .free callback however should be.)

For example, if I call a function with bogus invalid parameters so
that it fails very early on (or when the handle is in an incorrect
state), should I expect the completion callback to be called?  I would
expect not.

Rich.

The user needs a way to know if an error occurred. So the completion
callback must be called if the asynchronous function did not fail (returned
0). If the completion callback should be called, with the error parameter
set, even when the asynchronous function immediately failed with a non-zero
return value is another question. I see two possibilities: Either the
completion callback should always be called. Or it should be called iff the
asynchronous function returned 0 (did not fail).

Good point.

Our documentation currently states (docs/libnbd.pod) that the
completion callback is ALWAYS called, but this is inconsistent with
current code - you are correct that at present, the completion
callback is NOT called if the aio command returns non-zero (easy test:
call nbd_aio_block_status before nbd_connect*).  I'm game to updating
that part of the documentation to match existing practice (changing
our guarantee to the completion callback will be called once iff the
aio command reported success), since our testsuite wasn't covering it,
and it is probably an easier fix than munging the generator to call
completion.callback even for aio failures.  Meanwhile, the .free
callback MUST be called unconditionally, and I think our testsuite is
already covering that.

Life-cycle wise, I see the following sequence as being something we
could usefully rely on (although we don't yet have enough testsuite
coverage to prove that we already have it).  Note that it is not only
important how many times things are called, but I would like it if we
can guarantee the specific ordering between them (neither .free should
be called until all .callback opportunities are complete finished, to
avoid any use-after-free issues regardless of which of the two .free
slots the programmer actually uses):

- if aio command fails:
mid-command .callback: 0 calls
completion .callback: 0 calls
mid-command .free: 1 call
completion .free: 1 call

- if aio command succeeds:
mid-command .callback: 0, 1, or multiple times
completion .callback: 1 call
mid-command .free: 1 call
completion .free: 1 call

What I'm not sure of yet (without more code inspection) is whether we
can sometimes call completion.callback after mid-command.free.


By the way, if the error parameter is set in the completion callback, how
can the user retrieve the error text? Is it possible to call
`get_nbd_error(3)` from inside the completion callback?

You mean nbd_get_error().  We currently document that it is not safe
to call any nbd_* from within a callback.  Maybe we could relax that
to carve out exceptions for nbd_get_error/errno as special cases,
since they only inspect thread-local state and can be used even when
there is no current nbd_handle.  The problem is that I'm not sure if
there is a reliable string at that point in time: part of the reason
the completion callback has an errnum parameter is that the point of
reporting the completion failure may be in a 

Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Richard W.M. Jones
On Thu, Jul 13, 2023 at 03:05:30PM +, Tage Johansson wrote:
> 
> On 7/13/2023 4:36 PM, Eric Blake wrote:
> >On Thu, Jul 13, 2023 at 01:37:49PM +, Tage Johansson wrote:
> >>On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:
> >>>On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:
> On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:
> >>>I have replaced a call to `nbd_opt_info()` with a call to
> >>>`nbd_aio_opt_info()` and passed in a completion callback which just
> >>>calls `exit(EXIT_FAILURE)`. So if the completion callback is called
> >>>the test should fail, which it doesn't, at least not on my machine.
> >>Isn't that OK?  Only .free is required to be called.
> >For the context callback (for opt_set_meta), .callback may be called
> >zero, one, or multiple times, but .free should be called exactly once.
> >But for the completion callback, I agree that the docs state that both
> >.callback and .free should each be called exactly once ("On the other
> >hand, if a completion callback is supplied (only possible with
> >asynchronous commands), it will always be reached exactly once, and
> >the completion callback must not ignore the value pointed to by
> >C."); we are indeed missing the call to .callback.  I'll work
> >on a patch.
> Eww, the bug is bigger than just nbd_aio_opt* not always calling
> completion.callback exactly once.  With just this diff (to be folded
> into the larger patch I'm working on), I'm getting an assertion
> failure that we fail to call completion.callback for
> nbd_aio_pread_structured when calling nbd_close() prior to the command
> running to completion, so I'll have to fix that too.
> >>>Just to be clear about this, are we really sure the completion
> >>>callback should always be called once?  I'm not clear why that should
> >>>be the case.  (The .free callback however should be.)
> >>>
> >>>For example, if I call a function with bogus invalid parameters so
> >>>that it fails very early on (or when the handle is in an incorrect
> >>>state), should I expect the completion callback to be called?  I would
> >>>expect not.
> >>>
> >>>Rich.
> >>
> >>The user needs a way to know if an error occurred. So the completion
> >>callback must be called if the asynchronous function did not fail (returned
> >>0). If the completion callback should be called, with the error parameter
> >>set, even when the asynchronous function immediately failed with a non-zero
> >>return value is another question. I see two possibilities: Either the
> >>completion callback should always be called. Or it should be called iff the
> >>asynchronous function returned 0 (did not fail).
> >Good point.
> >
> >Our documentation currently states (docs/libnbd.pod) that the
> >completion callback is ALWAYS called, but this is inconsistent with
> >current code - you are correct that at present, the completion
> >callback is NOT called if the aio command returns non-zero (easy test:
> >call nbd_aio_block_status before nbd_connect*).  I'm game to updating
> >that part of the documentation to match existing practice (changing
> >our guarantee to the completion callback will be called once iff the
> >aio command reported success), since our testsuite wasn't covering it,
> >and it is probably an easier fix than munging the generator to call
> >completion.callback even for aio failures.  Meanwhile, the .free
> >callback MUST be called unconditionally, and I think our testsuite is
> >already covering that.
> >
> >Life-cycle wise, I see the following sequence as being something we
> >could usefully rely on (although we don't yet have enough testsuite
> >coverage to prove that we already have it).  Note that it is not only
> >important how many times things are called, but I would like it if we
> >can guarantee the specific ordering between them (neither .free should
> >be called until all .callback opportunities are complete finished, to
> >avoid any use-after-free issues regardless of which of the two .free
> >slots the programmer actually uses):
> >
> >- if aio command fails:
> >mid-command .callback: 0 calls
> >completion .callback: 0 calls
> >mid-command .free: 1 call
> >completion .free: 1 call
> >
> >- if aio command succeeds:
> >mid-command .callback: 0, 1, or multiple times
> >completion .callback: 1 call
> >mid-command .free: 1 call
> >completion .free: 1 call
> >
> >What I'm not sure of yet (without more code inspection) is whether we
> >can sometimes call completion.callback after mid-command.free.
> >
> >>
> >>By the way, if the error parameter is set in the completion callback, how
> >>can the user retrieve the error text? Is it possible to call
> >>`get_nbd_error(3)` from inside the completion callback?
> >You mean nbd_get_error().  We currently document that it is not safe
> >to call any nbd_* from within a callback.  Maybe we could relax that
> >to carve out exceptions for nbd_get_error/errno as special cases,
> 

Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Tage Johansson



On 7/13/2023 4:36 PM, Eric Blake wrote:

On Thu, Jul 13, 2023 at 01:37:49PM +, Tage Johansson wrote:

On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:

On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:

On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:

I have replaced a call to `nbd_opt_info()` with a call to
`nbd_aio_opt_info()` and passed in a completion callback which just
calls `exit(EXIT_FAILURE)`. So if the completion callback is called
the test should fail, which it doesn't, at least not on my machine.

Isn't that OK?  Only .free is required to be called.

For the context callback (for opt_set_meta), .callback may be called
zero, one, or multiple times, but .free should be called exactly once.
But for the completion callback, I agree that the docs state that both
.callback and .free should each be called exactly once ("On the other
hand, if a completion callback is supplied (only possible with
asynchronous commands), it will always be reached exactly once, and
the completion callback must not ignore the value pointed to by
C."); we are indeed missing the call to .callback.  I'll work
on a patch.

Eww, the bug is bigger than just nbd_aio_opt* not always calling
completion.callback exactly once.  With just this diff (to be folded
into the larger patch I'm working on), I'm getting an assertion
failure that we fail to call completion.callback for
nbd_aio_pread_structured when calling nbd_close() prior to the command
running to completion, so I'll have to fix that too.

Just to be clear about this, are we really sure the completion
callback should always be called once?  I'm not clear why that should
be the case.  (The .free callback however should be.)

For example, if I call a function with bogus invalid parameters so
that it fails very early on (or when the handle is in an incorrect
state), should I expect the completion callback to be called?  I would
expect not.

Rich.


The user needs a way to know if an error occurred. So the completion
callback must be called if the asynchronous function did not fail (returned
0). If the completion callback should be called, with the error parameter
set, even when the asynchronous function immediately failed with a non-zero
return value is another question. I see two possibilities: Either the
completion callback should always be called. Or it should be called iff the
asynchronous function returned 0 (did not fail).

Good point.

Our documentation currently states (docs/libnbd.pod) that the
completion callback is ALWAYS called, but this is inconsistent with
current code - you are correct that at present, the completion
callback is NOT called if the aio command returns non-zero (easy test:
call nbd_aio_block_status before nbd_connect*).  I'm game to updating
that part of the documentation to match existing practice (changing
our guarantee to the completion callback will be called once iff the
aio command reported success), since our testsuite wasn't covering it,
and it is probably an easier fix than munging the generator to call
completion.callback even for aio failures.  Meanwhile, the .free
callback MUST be called unconditionally, and I think our testsuite is
already covering that.

Life-cycle wise, I see the following sequence as being something we
could usefully rely on (although we don't yet have enough testsuite
coverage to prove that we already have it).  Note that it is not only
important how many times things are called, but I would like it if we
can guarantee the specific ordering between them (neither .free should
be called until all .callback opportunities are complete finished, to
avoid any use-after-free issues regardless of which of the two .free
slots the programmer actually uses):

- if aio command fails:
mid-command .callback: 0 calls
completion .callback: 0 calls
mid-command .free: 1 call
completion .free: 1 call

- if aio command succeeds:
mid-command .callback: 0, 1, or multiple times
completion .callback: 1 call
mid-command .free: 1 call
completion .free: 1 call

What I'm not sure of yet (without more code inspection) is whether we
can sometimes call completion.callback after mid-command.free.



By the way, if the error parameter is set in the completion callback, how
can the user retrieve the error text? Is it possible to call
`get_nbd_error(3)` from inside the completion callback?

You mean nbd_get_error().  We currently document that it is not safe
to call any nbd_* from within a callback.  Maybe we could relax that
to carve out exceptions for nbd_get_error/errno as special cases,
since they only inspect thread-local state and can be used even when
there is no current nbd_handle.  The problem is that I'm not sure if
there is a reliable string at that point in time: part of the reason
the completion callback has an errnum parameter is that the point of
reporting the completion failure may be in a different thread than
when the failure was first detected, and we only preserved a numeric
value in cmd->error rather 

Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Eric Blake
On Thu, Jul 13, 2023 at 09:36:58AM -0500, Eric Blake wrote:
> > The user needs a way to know if an error occurred. So the completion
> > callback must be called if the asynchronous function did not fail (returned
> > 0). If the completion callback should be called, with the error parameter
> > set, even when the asynchronous function immediately failed with a non-zero
> > return value is another question. I see two possibilities: Either the
> > completion callback should always be called. Or it should be called iff the
> > asynchronous function returned 0 (did not fail).
> 
> Good point.
> 
> Our documentation currently states (docs/libnbd.pod) that the
> completion callback is ALWAYS called, but this is inconsistent with
> current code - you are correct that at present, the completion
> callback is NOT called if the aio command returns non-zero (easy test:
> call nbd_aio_block_status before nbd_connect*).  I'm game to updating
> that part of the documentation to match existing practice (changing
> our guarantee to the completion callback will be called once iff the
> aio command reported success), since our testsuite wasn't covering it,
> and it is probably an easier fix than munging the generator to call
> completion.callback even for aio failures.  Meanwhile, the .free
> callback MUST be called unconditionally, and I think our testsuite is
> already covering that.
> 
> Life-cycle wise, I see the following sequence as being something we
> could usefully rely on (although we don't yet have enough testsuite
> coverage to prove that we already have it).  Note that it is not only
> important how many times things are called, but I would like it if we
> can guarantee the specific ordering between them (neither .free should
> be called until all .callback opportunities are complete finished, to
> avoid any use-after-free issues regardless of which of the two .free
> slots the programmer actually uses):
> 
> - if aio command fails:
> mid-command .callback: 0 calls
> completion .callback: 0 calls

This is the part that is inconsistent with our docs added in commit
6f4dcdab, but which matches current code from all the tests I've come
up with so far.

> mid-command .free: 1 call
> completion .free: 1 call
> 
> - if aio command succeeds:
> mid-command .callback: 0, 1, or multiple times
> completion .callback: 1 call
> mid-command .free: 1 call
> completion .free: 1 call
> 
> What I'm not sure of yet (without more code inspection) is whether we
> can sometimes call completion.callback after mid-command.free.

Commit 6f4dcdab documents that we guarantee this order, and I'm trying
to preserve that guarantee.  Thus, I think the only thing up for
debate is whether completion.callback can be skipped if the
corresponding aio command failed early (because that is the one case
where you KNOW there was an error without needing a completion
callback to tell you that fact), but that it does make sense as being
closer to what our code base already does.

Ramifications to user code: for aio commands that take both a
mid-command and completion callback, the user can share the same
allocated data structure between both callbacks, and use their choice
of mid-command.free or completion.free to clean up the allocation.
But if the user instead tries to uss completion.callback to clean up
the allocation, then they must ALSO manually clean up the allocation
if the aio command itself reported -1 (that is, it is better to stick
cleanup in .free than in complete.callback).

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Eric Blake
On Thu, Jul 13, 2023 at 01:37:49PM +, Tage Johansson wrote:
> 
> On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:
> > On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:
> > > On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:
> > > > > > I have replaced a call to `nbd_opt_info()` with a call to
> > > > > > `nbd_aio_opt_info()` and passed in a completion callback which just
> > > > > > calls `exit(EXIT_FAILURE)`. So if the completion callback is called
> > > > > > the test should fail, which it doesn't, at least not on my machine.
> > > > > Isn't that OK?  Only .free is required to be called.
> > > > For the context callback (for opt_set_meta), .callback may be called
> > > > zero, one, or multiple times, but .free should be called exactly once.
> > > > But for the completion callback, I agree that the docs state that both
> > > > .callback and .free should each be called exactly once ("On the other
> > > > hand, if a completion callback is supplied (only possible with
> > > > asynchronous commands), it will always be reached exactly once, and
> > > > the completion callback must not ignore the value pointed to by
> > > > C."); we are indeed missing the call to .callback.  I'll work
> > > > on a patch.
> > > Eww, the bug is bigger than just nbd_aio_opt* not always calling
> > > completion.callback exactly once.  With just this diff (to be folded
> > > into the larger patch I'm working on), I'm getting an assertion
> > > failure that we fail to call completion.callback for
> > > nbd_aio_pread_structured when calling nbd_close() prior to the command
> > > running to completion, so I'll have to fix that too.
> > Just to be clear about this, are we really sure the completion
> > callback should always be called once?  I'm not clear why that should
> > be the case.  (The .free callback however should be.)
> > 
> > For example, if I call a function with bogus invalid parameters so
> > that it fails very early on (or when the handle is in an incorrect
> > state), should I expect the completion callback to be called?  I would
> > expect not.
> > 
> > Rich.
> 
> 
> The user needs a way to know if an error occurred. So the completion
> callback must be called if the asynchronous function did not fail (returned
> 0). If the completion callback should be called, with the error parameter
> set, even when the asynchronous function immediately failed with a non-zero
> return value is another question. I see two possibilities: Either the
> completion callback should always be called. Or it should be called iff the
> asynchronous function returned 0 (did not fail).

Good point.

Our documentation currently states (docs/libnbd.pod) that the
completion callback is ALWAYS called, but this is inconsistent with
current code - you are correct that at present, the completion
callback is NOT called if the aio command returns non-zero (easy test:
call nbd_aio_block_status before nbd_connect*).  I'm game to updating
that part of the documentation to match existing practice (changing
our guarantee to the completion callback will be called once iff the
aio command reported success), since our testsuite wasn't covering it,
and it is probably an easier fix than munging the generator to call
completion.callback even for aio failures.  Meanwhile, the .free
callback MUST be called unconditionally, and I think our testsuite is
already covering that.

Life-cycle wise, I see the following sequence as being something we
could usefully rely on (although we don't yet have enough testsuite
coverage to prove that we already have it).  Note that it is not only
important how many times things are called, but I would like it if we
can guarantee the specific ordering between them (neither .free should
be called until all .callback opportunities are complete finished, to
avoid any use-after-free issues regardless of which of the two .free
slots the programmer actually uses):

- if aio command fails:
mid-command .callback: 0 calls
completion .callback: 0 calls
mid-command .free: 1 call
completion .free: 1 call

- if aio command succeeds:
mid-command .callback: 0, 1, or multiple times
completion .callback: 1 call
mid-command .free: 1 call
completion .free: 1 call

What I'm not sure of yet (without more code inspection) is whether we
can sometimes call completion.callback after mid-command.free.

> 
> 
> By the way, if the error parameter is set in the completion callback, how
> can the user retrieve the error text? Is it possible to call
> `get_nbd_error(3)` from inside the completion callback?

You mean nbd_get_error().  We currently document that it is not safe
to call any nbd_* from within a callback.  Maybe we could relax that
to carve out exceptions for nbd_get_error/errno as special cases,
since they only inspect thread-local state and can be used even when
there is no current nbd_handle.  The problem is that I'm not sure if
there is a reliable string at that point in time: part of the reason
the completion callback has an 

Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Richard W.M. Jones
On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:
> On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:
> > > > I have replaced a call to `nbd_opt_info()` with a call to
> > > > `nbd_aio_opt_info()` and passed in a completion callback which just
> > > > calls `exit(EXIT_FAILURE)`. So if the completion callback is called
> > > > the test should fail, which it doesn't, at least not on my machine.
> > > 
> > > Isn't that OK?  Only .free is required to be called.
> > 
> > For the context callback (for opt_set_meta), .callback may be called
> > zero, one, or multiple times, but .free should be called exactly once.
> > But for the completion callback, I agree that the docs state that both
> > .callback and .free should each be called exactly once ("On the other
> > hand, if a completion callback is supplied (only possible with
> > asynchronous commands), it will always be reached exactly once, and
> > the completion callback must not ignore the value pointed to by
> > C."); we are indeed missing the call to .callback.  I'll work
> > on a patch.
> 
> Eww, the bug is bigger than just nbd_aio_opt* not always calling
> completion.callback exactly once.  With just this diff (to be folded
> into the larger patch I'm working on), I'm getting an assertion
> failure that we fail to call completion.callback for
> nbd_aio_pread_structured when calling nbd_close() prior to the command
> running to completion, so I'll have to fix that too.

Just to be clear about this, are we really sure the completion
callback should always be called once?  I'm not clear why that should
be the case.  (The .free callback however should be.)

For example, if I call a function with bogus invalid parameters so
that it fails very early on (or when the handle is in an incorrect
state), should I expect the completion callback to be called?  I would
expect not.

Rich.

> diff --git i/tests/closure-lifetimes.c w/tests/closure-lifetimes.c
> index 9bb4e120..c5ad4c10 100644
> --- i/tests/closure-lifetimes.c
> +++ w/tests/closure-lifetimes.c
> @@ -66,6 +66,7 @@ read_cb (void *opaque,
>   uint64_t offset, unsigned status, int *error)
>  {
>assert (!read_cb_freed);
> +  assert (!completion_cb_called);
>read_cb_called++;
>return 0;
>  }
> @@ -73,6 +74,7 @@ read_cb (void *opaque,
>  static void
>  read_cb_free (void *opaque)
>  {
> +  assert (completion_cb_called);
>read_cb_freed++;
>  }
> 
> @@ -81,6 +83,7 @@ block_status_cb (void *opaque, const char *meta, uint64_t 
> offset,
>   uint32_t *entries, size_t nr_entries, int *error)
>  {
>assert (!block_status_cb_freed);
> +  assert (!completion_cb_called);
>block_status_cb_called++;
>return 0;
>  }
> @@ -88,6 +91,7 @@ block_status_cb (void *opaque, const char *meta, uint64_t 
> offset,
>  static void
>  block_status_cb_free (void *opaque)
>  {
> +  assert (completion_cb_called);
>block_status_cb_freed++;
>  }
> 
> @@ -102,6 +106,7 @@ completion_cb (void *opaque, int *error)
>  static void
>  completion_cb_free (void *opaque)
>  {
> +  assert (completion_cb_called);
>completion_cb_freed++;
>  }
> 
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Tage Johansson


On 7/13/2023 2:13 PM, Eric Blake wrote:

On Thu, Jul 13, 2023 at 12:03:24PM +0100, Richard W.M. Jones wrote:

On Thu, Jul 13, 2023 at 09:53:58AM +, Tage Johansson wrote:

Apologize if resending, but I'm not sure my previous email was
actually delivered.


On 7/12/2023 10:33 PM, Eric Blake wrote:



On Wed, Jul 12, 2023 at 03:19:59PM +, Tage Johansson wrote:

Hello,


While writing some tests for the Rust bindings, I discovered a
memory leak
with Valgrind due to a completion callback not being freed. More
specificly,
the completion callback of `aio_opt_info()` doesn't seem to be if
`aio_opt_info()` failes. In this case, `aio_opt_info()` was called in the
connected state, so it should indeed fail, but it should perhaps
also call
the free function associated with the callback.

Can you paste the valgrind output showing the leak?


The Valgrind output is very Rust specific and only shows the Rust
allocations which goes into the completion callback and are lost.


But if you apply the following patch (which modifies
tests/opt-info.c) you shall see that the completion callback is not
called.

Ah. I see.  We are testing the synchronous command, but not the aio
counterpart command.  That is indeed a hole in the testsuite (even
after my patch yesterday).


I have replaced a call to `nbd_opt_info()` with a call to
`nbd_aio_opt_info()` and passed in a completion callback which just
calls `exit(EXIT_FAILURE)`. So if the completion callback is called
the test should fail, which it doesn't, at least not on my machine.

Isn't that OK?  Only .free is required to be called.

For the context callback (for opt_set_meta), .callback may be called
zero, one, or multiple times, but .free should be called exactly once.
But for the completion callback, I agree that the docs state that both
.callback and .free should each be called exactly once ("On the other
hand, if a completion callback is supplied (only possible with
asynchronous commands), it will always be reached exactly once, and
the completion callback must not ignore the value pointed to by
C."); we are indeed missing the call to .callback.  I'll work
on a patch.



Ok, so if I understand correctly the completion callback should actually 
be called exactly once, right?


Then I will not have to adjust my code. I am storing the completion 
callback as an `FnOnce` 
, so it should free 
itself when it is called. If it will be called zero or one time I could 
store it in an `Option` 
 and take 
 it 
when it is called. The free callback would then free that Option.


But if the completion callback will be called exactly once that'll be 
unnecessary.



Best regards,

Tage
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Tage Johansson



On 7/13/2023 3:26 PM, Richard W.M. Jones wrote:

On Thu, Jul 13, 2023 at 08:01:09AM -0500, Eric Blake wrote:

On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:

I have replaced a call to `nbd_opt_info()` with a call to
`nbd_aio_opt_info()` and passed in a completion callback which just
calls `exit(EXIT_FAILURE)`. So if the completion callback is called
the test should fail, which it doesn't, at least not on my machine.

Isn't that OK?  Only .free is required to be called.

For the context callback (for opt_set_meta), .callback may be called
zero, one, or multiple times, but .free should be called exactly once.
But for the completion callback, I agree that the docs state that both
.callback and .free should each be called exactly once ("On the other
hand, if a completion callback is supplied (only possible with
asynchronous commands), it will always be reached exactly once, and
the completion callback must not ignore the value pointed to by
C."); we are indeed missing the call to .callback.  I'll work
on a patch.

Eww, the bug is bigger than just nbd_aio_opt* not always calling
completion.callback exactly once.  With just this diff (to be folded
into the larger patch I'm working on), I'm getting an assertion
failure that we fail to call completion.callback for
nbd_aio_pread_structured when calling nbd_close() prior to the command
running to completion, so I'll have to fix that too.

Just to be clear about this, are we really sure the completion
callback should always be called once?  I'm not clear why that should
be the case.  (The .free callback however should be.)

For example, if I call a function with bogus invalid parameters so
that it fails very early on (or when the handle is in an incorrect
state), should I expect the completion callback to be called?  I would
expect not.

Rich.



The user needs a way to know if an error occurred. So the completion 
callback must be called if the asynchronous function did not fail 
(returned 0). If the completion callback should be called, with the 
error parameter set, even when the asynchronous function immediately 
failed with a non-zero return value is another question. I see two 
possibilities: Either the completion callback should always be called. 
Or it should be called iff the asynchronous function returned 0 (did not 
fail).



By the way, if the error parameter is set in the completion callback, 
how can the user retrieve the error text? Is it possible to call 
`get_nbd_error(3)` from inside the completion callback?



Best regards,

Tage



diff --git i/tests/closure-lifetimes.c w/tests/closure-lifetimes.c
index 9bb4e120..c5ad4c10 100644
--- i/tests/closure-lifetimes.c
+++ w/tests/closure-lifetimes.c
@@ -66,6 +66,7 @@ read_cb (void *opaque,
   uint64_t offset, unsigned status, int *error)
  {
assert (!read_cb_freed);
+  assert (!completion_cb_called);
read_cb_called++;
return 0;
  }
@@ -73,6 +74,7 @@ read_cb (void *opaque,
  static void
  read_cb_free (void *opaque)
  {
+  assert (completion_cb_called);
read_cb_freed++;
  }

@@ -81,6 +83,7 @@ block_status_cb (void *opaque, const char *meta, uint64_t 
offset,
   uint32_t *entries, size_t nr_entries, int *error)
  {
assert (!block_status_cb_freed);
+  assert (!completion_cb_called);
block_status_cb_called++;
return 0;
  }
@@ -88,6 +91,7 @@ block_status_cb (void *opaque, const char *meta, uint64_t 
offset,
  static void
  block_status_cb_free (void *opaque)
  {
+  assert (completion_cb_called);
block_status_cb_freed++;
  }

@@ -102,6 +106,7 @@ completion_cb (void *opaque, int *error)
  static void
  completion_cb_free (void *opaque)
  {
+  assert (completion_cb_called);
completion_cb_freed++;
  }


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Eric Blake
On Thu, Jul 13, 2023 at 07:13:37AM -0500, Eric Blake wrote:
> > > I have replaced a call to `nbd_opt_info()` with a call to
> > > `nbd_aio_opt_info()` and passed in a completion callback which just
> > > calls `exit(EXIT_FAILURE)`. So if the completion callback is called
> > > the test should fail, which it doesn't, at least not on my machine.
> > 
> > Isn't that OK?  Only .free is required to be called.
> 
> For the context callback (for opt_set_meta), .callback may be called
> zero, one, or multiple times, but .free should be called exactly once.
> But for the completion callback, I agree that the docs state that both
> .callback and .free should each be called exactly once ("On the other
> hand, if a completion callback is supplied (only possible with
> asynchronous commands), it will always be reached exactly once, and
> the completion callback must not ignore the value pointed to by
> C."); we are indeed missing the call to .callback.  I'll work
> on a patch.

Eww, the bug is bigger than just nbd_aio_opt* not always calling
completion.callback exactly once.  With just this diff (to be folded
into the larger patch I'm working on), I'm getting an assertion
failure that we fail to call completion.callback for
nbd_aio_pread_structured when calling nbd_close() prior to the command
running to completion, so I'll have to fix that too.

diff --git i/tests/closure-lifetimes.c w/tests/closure-lifetimes.c
index 9bb4e120..c5ad4c10 100644
--- i/tests/closure-lifetimes.c
+++ w/tests/closure-lifetimes.c
@@ -66,6 +66,7 @@ read_cb (void *opaque,
  uint64_t offset, unsigned status, int *error)
 {
   assert (!read_cb_freed);
+  assert (!completion_cb_called);
   read_cb_called++;
   return 0;
 }
@@ -73,6 +74,7 @@ read_cb (void *opaque,
 static void
 read_cb_free (void *opaque)
 {
+  assert (completion_cb_called);
   read_cb_freed++;
 }

@@ -81,6 +83,7 @@ block_status_cb (void *opaque, const char *meta, uint64_t 
offset,
  uint32_t *entries, size_t nr_entries, int *error)
 {
   assert (!block_status_cb_freed);
+  assert (!completion_cb_called);
   block_status_cb_called++;
   return 0;
 }
@@ -88,6 +91,7 @@ block_status_cb (void *opaque, const char *meta, uint64_t 
offset,
 static void
 block_status_cb_free (void *opaque)
 {
+  assert (completion_cb_called);
   block_status_cb_freed++;
 }

@@ -102,6 +106,7 @@ completion_cb (void *opaque, int *error)
 static void
 completion_cb_free (void *opaque)
 {
+  assert (completion_cb_called);
   completion_cb_freed++;
 }


-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Eric Blake
On Thu, Jul 13, 2023 at 12:03:24PM +0100, Richard W.M. Jones wrote:
> On Thu, Jul 13, 2023 at 09:53:58AM +, Tage Johansson wrote:
> > Apologize if resending, but I'm not sure my previous email was
> > actually delivered.
> > 
> > 
> > On 7/12/2023 10:33 PM, Eric Blake wrote:
> > 
> > 
> > >On Wed, Jul 12, 2023 at 03:19:59PM +, Tage Johansson wrote:
> > >>Hello,
> > >>
> > >>
> > >>While writing some tests for the Rust bindings, I discovered a
> > >>memory leak
> > >>with Valgrind due to a completion callback not being freed. More
> > >>specificly,
> > >>the completion callback of `aio_opt_info()` doesn't seem to be if
> > >>`aio_opt_info()` failes. In this case, `aio_opt_info()` was called in the
> > >>connected state, so it should indeed fail, but it should perhaps
> > >>also call
> > >>the free function associated with the callback.
> > >Can you paste the valgrind output showing the leak?
> > 
> > 
> > The Valgrind output is very Rust specific and only shows the Rust
> > allocations which goes into the completion callback and are lost.
> > 
> > 
> > But if you apply the following patch (which modifies
> > tests/opt-info.c) you shall see that the completion callback is not
> > called.

Ah. I see.  We are testing the synchronous command, but not the aio
counterpart command.  That is indeed a hole in the testsuite (even
after my patch yesterday).

> > 
> > I have replaced a call to `nbd_opt_info()` with a call to
> > `nbd_aio_opt_info()` and passed in a completion callback which just
> > calls `exit(EXIT_FAILURE)`. So if the completion callback is called
> > the test should fail, which it doesn't, at least not on my machine.
> 
> Isn't that OK?  Only .free is required to be called.

For the context callback (for opt_set_meta), .callback may be called
zero, one, or multiple times, but .free should be called exactly once.
But for the completion callback, I agree that the docs state that both
.callback and .free should each be called exactly once ("On the other
hand, if a completion callback is supplied (only possible with
asynchronous commands), it will always be reached exactly once, and
the completion callback must not ignore the value pointed to by
C."); we are indeed missing the call to .callback.  I'll work
on a patch.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Richard W.M. Jones
On Wed, Jul 12, 2023 at 03:19:59PM +, Tage Johansson wrote:
> While writing some tests for the Rust bindings, I discovered a
> memory leak with Valgrind due to a completion callback not being
> freed.

A note about the valgrind tests:

They are quite sensitive to versions of software installed.  Basically
they work on my machine, they may work on other Fedora developer's
machines.  They probably won't work in general.

Another thing is they are known to fail if you don't have full debug
symbols installed (and probably frame pointers enabled), since the
stack traces won't be accurate and won't be comparable to the symbol
names in valgrind/suppressions.

As Eric said, if you see another valgrind failure please post the
complete error on the mailing list so we can take a look at it.

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Tage Johansson



On 7/13/2023 1:03 PM, Richard W.M. Jones wrote:

On Thu, Jul 13, 2023 at 09:53:58AM +, Tage Johansson wrote:

Apologize if resending, but I'm not sure my previous email was
actually delivered.


On 7/12/2023 10:33 PM, Eric Blake wrote:



On Wed, Jul 12, 2023 at 03:19:59PM +, Tage Johansson wrote:

Hello,


While writing some tests for the Rust bindings, I discovered a
memory leak
with Valgrind due to a completion callback not being freed. More
specificly,
the completion callback of `aio_opt_info()` doesn't seem to be if
`aio_opt_info()` failes. In this case, `aio_opt_info()` was called in the
connected state, so it should indeed fail, but it should perhaps
also call
the free function associated with the callback.

Can you paste the valgrind output showing the leak?


The Valgrind output is very Rust specific and only shows the Rust
allocations which goes into the completion callback and are lost.


But if you apply the following patch (which modifies
tests/opt-info.c) you shall see that the completion callback is not
called.

I have replaced a call to `nbd_opt_info()` with a call to
`nbd_aio_opt_info()` and passed in a completion callback which just
calls `exit(EXIT_FAILURE)`. So if the completion callback is called
the test should fail, which it doesn't, at least not on my machine.

Isn't that OK?  Only .free is required to be called.



Of course, you are right. I will continue investigating why my Rust 
closure isn't freed though.



Best regards,

Tage


___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Richard W.M. Jones
On Thu, Jul 13, 2023 at 09:53:58AM +, Tage Johansson wrote:
> Apologize if resending, but I'm not sure my previous email was
> actually delivered.
> 
> 
> On 7/12/2023 10:33 PM, Eric Blake wrote:
> 
> 
> >On Wed, Jul 12, 2023 at 03:19:59PM +, Tage Johansson wrote:
> >>Hello,
> >>
> >>
> >>While writing some tests for the Rust bindings, I discovered a
> >>memory leak
> >>with Valgrind due to a completion callback not being freed. More
> >>specificly,
> >>the completion callback of `aio_opt_info()` doesn't seem to be if
> >>`aio_opt_info()` failes. In this case, `aio_opt_info()` was called in the
> >>connected state, so it should indeed fail, but it should perhaps
> >>also call
> >>the free function associated with the callback.
> >Can you paste the valgrind output showing the leak?
> 
> 
> The Valgrind output is very Rust specific and only shows the Rust
> allocations which goes into the completion callback and are lost.
> 
> 
> But if you apply the following patch (which modifies
> tests/opt-info.c) you shall see that the completion callback is not
> called.
> 
> I have replaced a call to `nbd_opt_info()` with a call to
> `nbd_aio_opt_info()` and passed in a completion callback which just
> calls `exit(EXIT_FAILURE)`. So if the completion callback is called
> the test should fail, which it doesn't, at least not on my machine.

Isn't that OK?  Only .free is required to be called.

Rich.

> 
> diff --git a/tests/opt-info.c b/tests/opt-info.c
> index 86a0608..ba5c72b 100644
> --- a/tests/opt-info.c
> +++ b/tests/opt-info.c
> @@ -30,6 +30,12 @@
> 
>  #include 
> 
> +int
> +my_completion_cb(void *_user_data, int *_err)
> +{
> +  exit(EXIT_FAILURE);
> +}
> +
>  int
>  main (int argc, char *argv[])
>  {
> @@ -201,7 +207,10 @@ main (int argc, char *argv[])
>  fprintf (stderr, "expecting export name 'good', got '%s'\n", s);
>  exit (EXIT_FAILURE);
>    }
> -  if (nbd_opt_info (nbd) != -1) {
> +  if (nbd_aio_opt_info (
> +    nbd,
> +    (nbd_completion_callback) { .callback = my_completion_cb }
> +  ) != -1) {
>  fprintf (stderr, "expecting error for opt_info\n");
>  exit (EXIT_FAILURE);
>    }
> 
> Best regards,
> 
> Tage
> 
> 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
Fedora Windows cross-compiler. Compile Windows programs, test, and
build Windows installers. Over 100 libraries supported.
http://fedoraproject.org/wiki/MinGW
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs



Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Tage Johansson

On 7/12/2023 10:33 PM, Eric Blake wrote:


On Wed, Jul 12, 2023 at 03:19:59PM +, Tage Johansson wrote:

Hello,


While writing some tests for the Rust bindings, I discovered a memory leak
with Valgrind due to a completion callback not being freed. More specificly,
the completion callback of `aio_opt_info()` doesn't seem to be if
`aio_opt_info()` failes. In this case, `aio_opt_info()` was called in the
connected state, so it should indeed fail, but it should perhaps also call
the free function associated with the callback.

Can you paste the valgrind output showing the leak?



The Valgrind output is very Rust specific and only shows the Rust 
allocations which goes into the completion callback and are lost.



But if you apply the following patch (which modifies tests/opt-info.c) 
you shall see that the completion callback is not called.


I have replaced a call to `nbd_opt_info()` with a call to 
`nbd_aio_opt_info()` and passed in a completion callback which just 
calls `exit(EXIT_FAILURE)`. So if the completion callback is called the 
test should fail, which it doesn't, at least not on my machine.



diff --git a/tests/opt-info.c b/tests/opt-info.c
index 86a0608..ba5c72b 100644
--- a/tests/opt-info.c
+++ b/tests/opt-info.c
@@ -30,6 +30,12 @@

 #include 

+int
+my_completion_cb(void *_user_data, int *_err)
+{
+  exit(EXIT_FAILURE);
+}
+
 int
 main (int argc, char *argv[])
 {
@@ -201,7 +207,10 @@ main (int argc, char *argv[])
 fprintf (stderr, "expecting export name 'good', got '%s'\n", s);
 exit (EXIT_FAILURE);
   }
-  if (nbd_opt_info (nbd) != -1) {
+  if (nbd_aio_opt_info (
+    nbd,
+    (nbd_completion_callback) { .callback = my_completion_cb }
+  ) != -1) {
 fprintf (stderr, "expecting error for opt_info\n");
 exit (EXIT_FAILURE);
   }

Best regards,

Tage


___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-13 Thread Tage Johansson
Apologize if resending, but I'm not sure my previous email was actually 
delivered.



On 7/12/2023 10:33 PM, Eric Blake wrote:



On Wed, Jul 12, 2023 at 03:19:59PM +, Tage Johansson wrote:

Hello,


While writing some tests for the Rust bindings, I discovered a memory 
leak
with Valgrind due to a completion callback not being freed. More 
specificly,

the completion callback of `aio_opt_info()` doesn't seem to be if
`aio_opt_info()` failes. In this case, `aio_opt_info()` was called in the
connected state, so it should indeed fail, but it should perhaps also 
call

the free function associated with the callback.

Can you paste the valgrind output showing the leak?



The Valgrind output is very Rust specific and only shows the Rust 
allocations which goes into the completion callback and are lost.



But if you apply the following patch (which modifies tests/opt-info.c) 
you shall see that the completion callback is not called.


I have replaced a call to `nbd_opt_info()` with a call to 
`nbd_aio_opt_info()` and passed in a completion callback which just 
calls `exit(EXIT_FAILURE)`. So if the completion callback is called the 
test should fail, which it doesn't, at least not on my machine.



diff --git a/tests/opt-info.c b/tests/opt-info.c
index 86a0608..ba5c72b 100644
--- a/tests/opt-info.c
+++ b/tests/opt-info.c
@@ -30,6 +30,12 @@

 #include 

+int
+my_completion_cb(void *_user_data, int *_err)
+{
+  exit(EXIT_FAILURE);
+}
+
 int
 main (int argc, char *argv[])
 {
@@ -201,7 +207,10 @@ main (int argc, char *argv[])
 fprintf (stderr, "expecting export name 'good', got '%s'\n", s);
 exit (EXIT_FAILURE);
   }
-  if (nbd_opt_info (nbd) != -1) {
+  if (nbd_aio_opt_info (
+    nbd,
+    (nbd_completion_callback) { .callback = my_completion_cb }
+  ) != -1) {
 fprintf (stderr, "expecting error for opt_info\n");
 exit (EXIT_FAILURE);
   }

Best regards,

Tage



___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-12 Thread Eric Blake
On Wed, Jul 12, 2023 at 03:33:28PM -0500, Eric Blake wrote:
> > > /* Issue NBD_OPT_LIST and wait for the reply. */
> > > int
> > > nbd_unlocked_opt_list (struct nbd_handle *h, nbd_list_callback *list)
> > > {
> > >   struct list_helper s = { .list = *list };
> > >   nbd_list_callback l = { .callback = list_visitor, .user_data =  };
> > >   nbd_completion_callback c = { .callback = list_complete, .user_data =
> > >  };
> > > 
> > >   if (nbd_unlocked_aio_opt_list (h, , ) == -1)

We are passing a stack-local variable that points to the user's
callback...[1]

> > Here the list callback has not been freed.
> > >     return -1;
> 
> True at this point, but it also has not been NULL'd.  But
> nbd_unlocked_aio_opt_list() can ONLY return -1 if fails prior to the
> SET_CALLBACK_TO_NULL lines, so we also know *list still contains the
> free callback, and the wrapper nbd_opt_list() in lib/api.c WILL call
> the callback one function up in the call stack, so there is no leak
> here.
> 
> > > 
> > >   SET_CALLBACK_TO_NULL (*list);
> > >   if (wait_for_option (h) == -1)
> > >     return -1;
> 
> But here, we've explicitly taken ownership of *list (that is,
> successful nbd_unlocked_aio_opt_list() transferred the callback over
> to h->opt_cb.fn.list), so if wait_for_option() does not free the
> callback, then we must do so or we indeed have a bug.  But to actually
> figure that out, I have to inspect the state machine, because
> wait_for_option() itself does not currently do anything with the
> callbacks, on success or on failure.
> 
> /me goes and digs further...
> 
> I _do_ see that nbd_internal_free_option() calls the appropriate
> callbacks.  So the question is if we can guarantee that h->opt_cb was
> set, do we always reach nbd_internal_free_option() prior to
> nbd_unlocked_poll() progressing the state machine to the point that we
> either succeeded or transitioned out of state_connecting.
> 
>

I meant to add that a grep for nbd_intenral_free_option() shows that
all of the generator/states-newstyle-opt-*.c do indeed call this as
part of their state machine.  So once h->opt_cb is set (all of our
synchronous nbd_opt calls do so), the state machine guarantees that we
will then reach go_complete(), list_complete(), or context_complete()
(whichever function we registered in our stack-local completion
callback), which in turn calls the user's free callback before
returning on the synchronous functions.

> 
> When I look over the older lib/rw.c, I remember spending quite a bit
> of time patching it to get the handling correct.  Using
> nbd_unlocked_pread_structured as the comparison: if
> nbd_unlocked_aio_pread_structured() returns -1, then either *chunk has
> not yet been set to NULL (so the api.c nbd_pread_structured will call
> the free callback), or the code ensured that the transfer semantics
> happened (and now the state machine will take care of calling the free
> callback at the right point in time on cmd->cb).  There's even code in
> nbd_internal_common_command() that frees the callback before returning
> -1 when the command could not be queued to the state machine.  I even
> remember how much time I spent writing tests/test-errors.c (which Rich
> then later subdivided into finer-grained tests) covering various
> aspects (failure when called in the wrong state, failure with
> parameter validation before contacting the server, failure returned by
> the server, ...) and that each of them didn't leak.
> 
> But I would not be surprised if my more recent additions of lib/opt.c
> missed some steps (unlike commands, you don't have parallel options in
> flight, so the code is simpler - but maybe I simplified too much),
> where we could have a leak.

With my deeper look and testsuite additions, I haven't been able to
prove a leak on either success or failure paths.  Again, if you have a
valgrind trace, please share it (the leak may be in your use of the
code as part of the Rust bindings, and not in the core libnbd code).

> 
> At any rate, I could argue that in the synchronous nbd_unlocked_opt
> functions, we should probably be able to assert that the callback was
> already NULL'd by the successful nbd_unlocked_aio_opt counterpart on
> success, rather than manually trying to re-NULL it ourselves.

[1] the aio function NULL'd our stack-local variable, not the user's
callback, but we did successfully transfer it to the state machine, so
this explicit NULL (rather than asserting that it was already nulled)
is correct after all.  Still, I'm not opposed to a patch adding
comments to lib/opt.c to make things clearer.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs


Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-12 Thread Eric Blake
On Wed, Jul 12, 2023 at 03:19:59PM +, Tage Johansson wrote:
> Hello,
> 
> 
> While writing some tests for the Rust bindings, I discovered a memory leak
> with Valgrind due to a completion callback not being freed. More specificly,
> the completion callback of `aio_opt_info()` doesn't seem to be if
> `aio_opt_info()` failes. In this case, `aio_opt_info()` was called in the
> connected state, so it should indeed fail, but it should perhaps also call
> the free function associated with the callback.

Can you paste the valgrind output showing the leak?

It is important to view the generated lib/api.c.  Note that
nbd_aio_opt_info() unconditionally calls any remaining free callback
prior to returning -1.  If you called nbd_aio_opt_info() from the
wrong state, we never even get to the nbd_unlocked_aio_opt_info() code
in lib/opt.c.  So the only thing we really have to worry about is if
we have any exit paths where we have claimed ownership of the callback
by transferring it elsewhere (that is, where we called
SET_CALLBACK_TO_NULL because we have set up h->opt_cb) but still
return -1 without our claimed ownership still having a reachable path
for still calling the callback at the right time.

> 
> 
> According to libnbd(3) :
> 
> > Callback lifetimes
> > You can associate an optional free function with callbacks. Libnbd will
> > call this function when the callback will not be called again by libnbd,
> > including in the case where the API fails.
> 
> After studying the code in lib/opt.c, I think that the situation with
> synchronous callbacks like `nbd_opt_list()` is even worse. Here the list
> callback will not be freed if the internal call to
> `nbd_unlocked_aio_opt_list()` failes, but it will be freed if the completion
> callback got an error which is returned by `nbd_opt_list()`.
> 
> 
> > /* Issue NBD_OPT_LIST and wait for the reply. */
> > int
> > nbd_unlocked_opt_list (struct nbd_handle *h, nbd_list_callback *list)
> > {
> >   struct list_helper s = { .list = *list };
> >   nbd_list_callback l = { .callback = list_visitor, .user_data =  };
> >   nbd_completion_callback c = { .callback = list_complete, .user_data =
> >  };
> > 
> >   if (nbd_unlocked_aio_opt_list (h, , ) == -1)
> Here the list callback has not been freed.
> >     return -1;

True at this point, but it also has not been NULL'd.  But
nbd_unlocked_aio_opt_list() can ONLY return -1 if fails prior to the
SET_CALLBACK_TO_NULL lines, so we also know *list still contains the
free callback, and the wrapper nbd_opt_list() in lib/api.c WILL call
the callback one function up in the call stack, so there is no leak
here.

[meta-comment: I find that when repling inline, which is common
practice in patch reviews, it helps if I include a blank line before
and after anything I inject, so my reply text doesn't get overlooked
when scanning the wall of quoted text]

> > 
> >   SET_CALLBACK_TO_NULL (*list);
> >   if (wait_for_option (h) == -1)
> >     return -1;

But here, we've explicitly taken ownership of *list (that is,
successful nbd_unlocked_aio_opt_list() transferred the callback over
to h->opt_cb.fn.list), so if wait_for_option() does not free the
callback, then we must do so or we indeed have a bug.  But to actually
figure that out, I have to inspect the state machine, because
wait_for_option() itself does not currently do anything with the
callbacks, on success or on failure.

/me goes and digs further...

I _do_ see that nbd_internal_free_option() calls the appropriate
callbacks.  So the question is if we can guarantee that h->opt_cb was
set, do we always reach nbd_internal_free_option() prior to
nbd_unlocked_poll() progressing the state machine to the point that we
either succeeded or transitioned out of state_connecting.



When I look over the older lib/rw.c, I remember spending quite a bit
of time patching it to get the handling correct.  Using
nbd_unlocked_pread_structured as the comparison: if
nbd_unlocked_aio_pread_structured() returns -1, then either *chunk has
not yet been set to NULL (so the api.c nbd_pread_structured will call
the free callback), or the code ensured that the transfer semantics
happened (and now the state machine will take care of calling the free
callback at the right point in time on cmd->cb).  There's even code in
nbd_internal_common_command() that frees the callback before returning
-1 when the command could not be queued to the state machine.  I even
remember how much time I spent writing tests/test-errors.c (which Rich
then later subdivided into finer-grained tests) covering various
aspects (failure when called in the wrong state, failure with
parameter validation before contacting the server, failure returned by
the server, ...) and that each of them didn't leak.

But I would not be surprised if my more recent additions of lib/opt.c
missed some steps (unlike commands, you don't have parallel options in
flight, so the code is simpler - but maybe I simplified too much),
where we 

Re: [Libguestfs] libnbd: When are callbacks freed

2023-07-12 Thread Richard W.M. Jones
On Wed, Jul 12, 2023 at 03:19:59PM +, Tage Johansson wrote:
> Hello,
> 
> While writing some tests for the Rust bindings, I discovered a
> memory leak with Valgrind due to a completion callback not being
> freed. More specificly, the completion callback of `aio_opt_info()`
> doesn't seem to be if `aio_opt_info()` failes. In this case,
> `aio_opt_info()` was called in the connected state, so it should
> indeed fail, but it should perhaps also call the free function
> associated with the callback.
> 
> According to libnbd(3):
> 
> Callback lifetimes
> You can associate an optional free function with callbacks. Libnbd will
> call this function when the callback will not be called again by libnbd,
> including in the case where the API fails.

I wanted to see / remember exactly how this was supposed to work, so I
picked nbd_aio_pread which has a callback.  Does this free the
callback in all basic error cases?  The answer is yes:

  int64_t
  nbd_aio_pread (
struct nbd_handle *h, void *buf, size_t count, uint64_t offset,
nbd_completion_callback completion_callback, uint32_t flags
  )
  {
  ...
p = aio_pread_in_permitted_state (h);
if (unlikely (!p)) {
  ret = -1;
  goto out;
}
  ...
  out:
  ...
   FREE_CALLBACK (completion_callback);

> After studying the code in lib/opt.c, I think that the situation with
> synchronous callbacks like `nbd_opt_list()` is even worse. Here the list
> callback will not be freed if the internal call to 
> `nbd_unlocked_aio_opt_list()
> ` failes, but it will be freed if the completion callback got an error which 
> is
> returned by `nbd_opt_list()`.
> 
> 
> /* Issue NBD_OPT_LIST and wait for the reply. */
> int
> nbd_unlocked_opt_list (struct nbd_handle *h, nbd_list_callback *list)
> {
>   struct list_helper s = { .list = *list };
>   nbd_list_callback l = { .callback = list_visitor, .user_data =  };
>   nbd_completion_callback c = { .callback = list_complete, .user_data = 
> };
> 
>   if (nbd_unlocked_aio_opt_list (h, , ) == -1)
> 
> Here the list callback has not been freed.

Yes, I believe this is just a bug in the implementation of
nbd_unlocked_opt_list which should be fixed (to use FREE_CALLBACK
along this path).

>     return -1;
> 
>   SET_CALLBACK_TO_NULL (*list);
>   if (wait_fo_option (h) == -1)
>     return -1;
>   if (s.err) {
> 
> But here I think that the callback has been freed.

It took me a while to figure it out, but it seems so.  The callbacks
are copied to h->opt_cb.fn.list and opt_cb.fn.context and then freed
in the state machine.

>     set_error (s.err, "server replied with error to list request");
>     return -1;
>   }
>   return s.count;
> }
>
> The problem is that if `nbd_opt_list()` returns an error, I as a
> user cannot know wether I should free the data associated with the
> list callback myself or if that has been done already.

It seems like it's just a missing call to FREE_CALLBACK.

> Maybe I am just misunderstanding the code or the API, but if not
> then I wonder how I should know when I need to free the user data
> associated with a callback myself and when that is done by libnbd?

No, it should work as in the documentation, else it's a bug.

Rich.

> 
> Best regards,
> 
> Tage
> 
> 

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html
___
Libguestfs mailing list
Libguestfs@redhat.com
https://listman.redhat.com/mailman/listinfo/libguestfs