Re: [libvirt] virNetClientPtr leak in remote driver

2011-08-04 Thread Daniel P. Berrange
On Thu, Aug 04, 2011 at 05:26:31PM +0200, Matthias Bolte wrote:
> 2011/8/2 Daniel P. Berrange :
> > On Mon, Aug 01, 2011 at 06:35:21PM +0200, Matthias Bolte wrote:
> >> 2011/8/1 Eric Blake :
> >> > On 07/28/2011 12:07 PM, Matthias Bolte wrote:
> >> >> 2011/7/27 Matthias Bolte :
> >> >>> doRemoteClose doesn't free the virNetClientPtr and this creates a
> >> >>> 260kb leak per remote connection. This happens because
> >> >>> virNetClientFree doesn't remove the last ref, because virNetClientNew
> >> >>> creates the virNetClientPtr with a refcount of 2.
> >> >>>
> >> >
> >> >> The memory leak I saw was due to virsh calling
> >> >> virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl.
> >> >> Because the event loop is initialized, the call to
> >> >> virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not
> >> >> driving the event loop results in not removing callbacks that were
> >> >> marked for deletion. Finally this leaks the virNetClientPtr using in
> >> >> the remote driver. I used a virsh -c qemu:///system to test.
> >> >>
> >> >> I was able fix this by calling virEventRunDefaultImpl after
> >> >> virConnectClose in virsh. But I don't think that this is the correct
> >> >> general solution.
> >> >
> >> > Where do we stand with 0.9.4?  Is this something where we need the
> >> > general fix before the release, or is just the virsh hack to call
> >> > virEventRunDefaultImpl good enough for the release, or is this problem
> >> > not severe enough and we can release as-is?
> >>
> >> virsh is a bit special here as it registers/initializes the default
> >> even loop but doesn't drive it properly. It only drives it for the
> >> console command. That's the problem in virsh. This can be avoided by
> >> calling virEventRunDefaultImpl after virConnectClose iff it's a remote
> >> connection, in other cases virEventRunDefaultImpl might block.
> >> Therefore, we shouldn't be calling it in general after a
> >> virConnectClose in virsh.
> >>
> >> If we assume that an application that registers an event loop will
> >> drive it properly then this problem is not critical, as it doesn't
> >> exist. Just virsh is a special case that leaks 260kb per closed remote
> >> connection. When we assume that a typical virsh user uses a single
> >> connection per virsh invocation then we can view this as a static
> >> leak.
> >
> > Yeah, this is a case I never thought of. The "easy" fix is to just
> > make virsh spawn a new thread to run the event loop in the background.
> 
> The problem here will probably be the console command with this lines
> 
> while (!con->quit) {
> if (virEventRunDefaultImpl() < 0)
> break;
> }
> 
> in console.c. Having two threads calling virEventRunDefaultImpl
> probably isn't a good idea.

Oh yes, we'd have to change that code too, to delegate to the
background event loop thread.


> > The "hard" fix is to make virsh I/O entirely event driven, so that
> > we don't just sit in a blocking read of stdin waiting for input,
> > but instead use the event loop to process stdin.


Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] virNetClientPtr leak in remote driver

2011-08-04 Thread Matthias Bolte
2011/8/2 Daniel P. Berrange :
> On Mon, Aug 01, 2011 at 06:35:21PM +0200, Matthias Bolte wrote:
>> 2011/8/1 Eric Blake :
>> > On 07/28/2011 12:07 PM, Matthias Bolte wrote:
>> >> 2011/7/27 Matthias Bolte :
>> >>> doRemoteClose doesn't free the virNetClientPtr and this creates a
>> >>> 260kb leak per remote connection. This happens because
>> >>> virNetClientFree doesn't remove the last ref, because virNetClientNew
>> >>> creates the virNetClientPtr with a refcount of 2.
>> >>>
>> >
>> >> The memory leak I saw was due to virsh calling
>> >> virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl.
>> >> Because the event loop is initialized, the call to
>> >> virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not
>> >> driving the event loop results in not removing callbacks that were
>> >> marked for deletion. Finally this leaks the virNetClientPtr using in
>> >> the remote driver. I used a virsh -c qemu:///system to test.
>> >>
>> >> I was able fix this by calling virEventRunDefaultImpl after
>> >> virConnectClose in virsh. But I don't think that this is the correct
>> >> general solution.
>> >
>> > Where do we stand with 0.9.4?  Is this something where we need the
>> > general fix before the release, or is just the virsh hack to call
>> > virEventRunDefaultImpl good enough for the release, or is this problem
>> > not severe enough and we can release as-is?
>>
>> virsh is a bit special here as it registers/initializes the default
>> even loop but doesn't drive it properly. It only drives it for the
>> console command. That's the problem in virsh. This can be avoided by
>> calling virEventRunDefaultImpl after virConnectClose iff it's a remote
>> connection, in other cases virEventRunDefaultImpl might block.
>> Therefore, we shouldn't be calling it in general after a
>> virConnectClose in virsh.
>>
>> If we assume that an application that registers an event loop will
>> drive it properly then this problem is not critical, as it doesn't
>> exist. Just virsh is a special case that leaks 260kb per closed remote
>> connection. When we assume that a typical virsh user uses a single
>> connection per virsh invocation then we can view this as a static
>> leak.
>
> Yeah, this is a case I never thought of. The "easy" fix is to just
> make virsh spawn a new thread to run the event loop in the background.

The problem here will probably be the console command with this lines

while (!con->quit) {
if (virEventRunDefaultImpl() < 0)
break;
}

in console.c. Having two threads calling virEventRunDefaultImpl
probably isn't a good idea.

> The "hard" fix is to make virsh I/O entirely event driven, so that
> we don't just sit in a blocking read of stdin waiting for input,
> but instead use the event loop to process stdin.
>
> Daniel

-- 
Matthias Bolte
http://photron.blogspot.com

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] virNetClientPtr leak in remote driver

2011-08-02 Thread Daniel P. Berrange
On Mon, Aug 01, 2011 at 06:35:21PM +0200, Matthias Bolte wrote:
> 2011/8/1 Eric Blake :
> > On 07/28/2011 12:07 PM, Matthias Bolte wrote:
> >> 2011/7/27 Matthias Bolte :
> >>> doRemoteClose doesn't free the virNetClientPtr and this creates a
> >>> 260kb leak per remote connection. This happens because
> >>> virNetClientFree doesn't remove the last ref, because virNetClientNew
> >>> creates the virNetClientPtr with a refcount of 2.
> >>>
> >
> >> The memory leak I saw was due to virsh calling
> >> virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl.
> >> Because the event loop is initialized, the call to
> >> virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not
> >> driving the event loop results in not removing callbacks that were
> >> marked for deletion. Finally this leaks the virNetClientPtr using in
> >> the remote driver. I used a virsh -c qemu:///system to test.
> >>
> >> I was able fix this by calling virEventRunDefaultImpl after
> >> virConnectClose in virsh. But I don't think that this is the correct
> >> general solution.
> >
> > Where do we stand with 0.9.4?  Is this something where we need the
> > general fix before the release, or is just the virsh hack to call
> > virEventRunDefaultImpl good enough for the release, or is this problem
> > not severe enough and we can release as-is?
> 
> virsh is a bit special here as it registers/initializes the default
> even loop but doesn't drive it properly. It only drives it for the
> console command. That's the problem in virsh. This can be avoided by
> calling virEventRunDefaultImpl after virConnectClose iff it's a remote
> connection, in other cases virEventRunDefaultImpl might block.
> Therefore, we shouldn't be calling it in general after a
> virConnectClose in virsh.
> 
> If we assume that an application that registers an event loop will
> drive it properly then this problem is not critical, as it doesn't
> exist. Just virsh is a special case that leaks 260kb per closed remote
> connection. When we assume that a typical virsh user uses a single
> connection per virsh invocation then we can view this as a static
> leak.

Yeah, this is a case I never thought of. The "easy" fix is to just
make virsh spawn a new thread to run the event loop in the background.
The "hard" fix is to make virsh I/O entirely event driven, so that
we don't just sit in a blocking read of stdin waiting for input,
but instead use the event loop to process stdin.

Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] virNetClientPtr leak in remote driver

2011-08-01 Thread Matthias Bolte
2011/8/1 Eric Blake :
> On 07/28/2011 12:07 PM, Matthias Bolte wrote:
>> 2011/7/27 Matthias Bolte :
>>> doRemoteClose doesn't free the virNetClientPtr and this creates a
>>> 260kb leak per remote connection. This happens because
>>> virNetClientFree doesn't remove the last ref, because virNetClientNew
>>> creates the virNetClientPtr with a refcount of 2.
>>>
>
>> The memory leak I saw was due to virsh calling
>> virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl.
>> Because the event loop is initialized, the call to
>> virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not
>> driving the event loop results in not removing callbacks that were
>> marked for deletion. Finally this leaks the virNetClientPtr using in
>> the remote driver. I used a virsh -c qemu:///system to test.
>>
>> I was able fix this by calling virEventRunDefaultImpl after
>> virConnectClose in virsh. But I don't think that this is the correct
>> general solution.
>
> Where do we stand with 0.9.4?  Is this something where we need the
> general fix before the release, or is just the virsh hack to call
> virEventRunDefaultImpl good enough for the release, or is this problem
> not severe enough and we can release as-is?

virsh is a bit special here as it registers/initializes the default
even loop but doesn't drive it properly. It only drives it for the
console command. That's the problem in virsh. This can be avoided by
calling virEventRunDefaultImpl after virConnectClose iff it's a remote
connection, in other cases virEventRunDefaultImpl might block.
Therefore, we shouldn't be calling it in general after a
virConnectClose in virsh.

If we assume that an application that registers an event loop will
drive it properly then this problem is not critical, as it doesn't
exist. Just virsh is a special case that leaks 260kb per closed remote
connection. When we assume that a typical virsh user uses a single
connection per virsh invocation then we can view this as a static
leak.

Therefore, I'd say we could just leave this as-is for 0.9.4, except
Dan would be back soon and the problem is easy to fix for him.

Another problem is the libvirtd crash that Wen discovered. That's more
serious, as a misbehaving client shouldn't be able to crash libvirtd.

https://www.redhat.com/archives/libvir-list/2011-August/msg5.html

-- 
Matthias Bolte
http://photron.blogspot.com

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] virNetClientPtr leak in remote driver

2011-08-01 Thread Eric Blake
On 07/28/2011 12:07 PM, Matthias Bolte wrote:
> 2011/7/27 Matthias Bolte :
>> doRemoteClose doesn't free the virNetClientPtr and this creates a
>> 260kb leak per remote connection. This happens because
>> virNetClientFree doesn't remove the last ref, because virNetClientNew
>> creates the virNetClientPtr with a refcount of 2.
>>

> The memory leak I saw was due to virsh calling
> virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl.
> Because the event loop is initialized, the call to
> virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not
> driving the event loop results in not removing callbacks that were
> marked for deletion. Finally this leaks the virNetClientPtr using in
> the remote driver. I used a virsh -c qemu:///system to test.
> 
> I was able fix this by calling virEventRunDefaultImpl after
> virConnectClose in virsh. But I don't think that this is the correct
> general solution.

Where do we stand with 0.9.4?  Is this something where we need the
general fix before the release, or is just the virsh hack to call
virEventRunDefaultImpl good enough for the release, or is this problem
not severe enough and we can release as-is?

> 
> To me the general problem seems to be the entanglement between the
> virNetClientPtr refcounting and the event loop. I'm not sure how to
> improve this in general. Maybe should have a thread calling
> virEventRunDefaultImpl as the comment on virEventRunDefaultImpl
> suggest.

I'm hoping danpb will be able to chime in soon.

-- 
Eric Blake   ebl...@redhat.com+1-801-349-2682
Libvirt virtualization library http://libvirt.org

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list


Re: [libvirt] virNetClientPtr leak in remote driver

2011-07-28 Thread Matthias Bolte
2011/7/27 Matthias Bolte :
> doRemoteClose doesn't free the virNetClientPtr and this creates a
> 260kb leak per remote connection. This happens because
> virNetClientFree doesn't remove the last ref, because virNetClientNew
> creates the virNetClientPtr with a refcount of 2.
>
> static virNetClientPtr virNetClientNew(virNetSocketPtr sock,
>                                       const char *hostname)
> {
> [...]
>    client->refs = 1;
> [...]
>    /* Set up a callback to listen on the socket data */
>    client->refs++;
>    if (virNetSocketAddIOCallback(client->sock,
>                                  VIR_EVENT_HANDLE_READABLE,
>                                  virNetClientIncomingEvent,
>                                  client,
>                                  virNetClientEventFree) < 0) {
>        client->refs--;
>        VIR_DEBUG("Failed to add event watch, disabling events");
>    }
> [...]
> }
>
> virNetClientNew adds a ref before calling virNetSocketAddIOCallback
> but only removes it when virNetSocketAddIOCallback fails. This seems
> wrong too me and the ref should be removed in the success case too.
>
> The same pattern was added in 0302391ee643ad91fdc6d2ecf7e4cf0fc9724516
> (Fix race in ref counting when handling RPC jobs)
>
> --- a/src/rpc/virnetserverclient.c
> +++ b/src/rpc/virnetserverclient.c
> @@ -763,10 +763,12 @@ readmore:
>
>         /* Send off to for normal dispatch to workers */
>         if (msg) {
> +            client->refs++;
>             if (!client->dispatchFunc ||
>                 client->dispatchFunc(client, msg,
> client->dispatchOpaque) < 0) {
>                 virNetMessageFree(msg);
>                 client->wantClose = true;
> +                client->refs--;
>                 return;
>             }
>         }
>
> Again, this seems wrong and the ref should be removed in the success
> case here too. Before I spent time to figure out how the refcounting
> is supposed to work here I just report it and hope that Dan can easily
> fix this.

Okay, after some discussion on IRC with Michal and Eric and further
testing I think that the ref counting is okay here.

virNetClientNew adds the second ref because the free callback
(virNetClientEventFree) passed to virNetSocketAddIOCallback will call
virNetClientFree.

Another observation is that virNetClientFree calls
virNetSocketRemoveIOCallback when the last ref was removed. Actually
that's pointless because virNetSocketRemoveIOCallback might indirectly
removed the last ref. So this looks like an ordering problem. The
ordering gets fixed by calling virNetClientClose before
virNetClientFree, because virNetClientClose calls
virNetSocketRemoveIOCallback. Another problem is that
virNetSocketRemoveIOCallback only indirectly results in a call to
virNetClientFree, because the event loop will call virNetClientFree
when removing callbacks marked for deletion --
virNetSocketRemoveIOCallback does this marking.

The memory leak I saw was due to virsh calling
virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl.
Because the event loop is initialized, the call to
virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not
driving the event loop results in not removing callbacks that were
marked for deletion. Finally this leaks the virNetClientPtr using in
the remote driver. I used a virsh -c qemu:///system to test.

I was able fix this by calling virEventRunDefaultImpl after
virConnectClose in virsh. But I don't think that this is the correct
general solution.

To me the general problem seems to be the entanglement between the
virNetClientPtr refcounting and the event loop. I'm not sure how to
improve this in general. Maybe should have a thread calling
virEventRunDefaultImpl as the comment on virEventRunDefaultImpl
suggest.

-- 
Matthias Bolte
http://photron.blogspot.com

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list