Re: [libvirt] virNetClientPtr leak in remote driver
On Thu, Aug 04, 2011 at 05:26:31PM +0200, Matthias Bolte wrote: > 2011/8/2 Daniel P. Berrange : > > On Mon, Aug 01, 2011 at 06:35:21PM +0200, Matthias Bolte wrote: > >> 2011/8/1 Eric Blake : > >> > On 07/28/2011 12:07 PM, Matthias Bolte wrote: > >> >> 2011/7/27 Matthias Bolte : > >> >>> doRemoteClose doesn't free the virNetClientPtr and this creates a > >> >>> 260kb leak per remote connection. This happens because > >> >>> virNetClientFree doesn't remove the last ref, because virNetClientNew > >> >>> creates the virNetClientPtr with a refcount of 2. > >> >>> > >> > > >> >> The memory leak I saw was due to virsh calling > >> >> virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl. > >> >> Because the event loop is initialized, the call to > >> >> virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not > >> >> driving the event loop results in not removing callbacks that were > >> >> marked for deletion. Finally this leaks the virNetClientPtr using in > >> >> the remote driver. I used a virsh -c qemu:///system to test. > >> >> > >> >> I was able fix this by calling virEventRunDefaultImpl after > >> >> virConnectClose in virsh. But I don't think that this is the correct > >> >> general solution. > >> > > >> > Where do we stand with 0.9.4? Is this something where we need the > >> > general fix before the release, or is just the virsh hack to call > >> > virEventRunDefaultImpl good enough for the release, or is this problem > >> > not severe enough and we can release as-is? > >> > >> virsh is a bit special here as it registers/initializes the default > >> even loop but doesn't drive it properly. It only drives it for the > >> console command. That's the problem in virsh. This can be avoided by > >> calling virEventRunDefaultImpl after virConnectClose iff it's a remote > >> connection, in other cases virEventRunDefaultImpl might block. > >> Therefore, we shouldn't be calling it in general after a > >> virConnectClose in virsh. > >> > >> If we assume that an application that registers an event loop will > >> drive it properly then this problem is not critical, as it doesn't > >> exist. Just virsh is a special case that leaks 260kb per closed remote > >> connection. When we assume that a typical virsh user uses a single > >> connection per virsh invocation then we can view this as a static > >> leak. > > > > Yeah, this is a case I never thought of. The "easy" fix is to just > > make virsh spawn a new thread to run the event loop in the background. > > The problem here will probably be the console command with this lines > > while (!con->quit) { > if (virEventRunDefaultImpl() < 0) > break; > } > > in console.c. Having two threads calling virEventRunDefaultImpl > probably isn't a good idea. Oh yes, we'd have to change that code too, to delegate to the background event loop thread. > > The "hard" fix is to make virsh I/O entirely event driven, so that > > we don't just sit in a blocking read of stdin waiting for input, > > but instead use the event loop to process stdin. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] virNetClientPtr leak in remote driver
2011/8/2 Daniel P. Berrange : > On Mon, Aug 01, 2011 at 06:35:21PM +0200, Matthias Bolte wrote: >> 2011/8/1 Eric Blake : >> > On 07/28/2011 12:07 PM, Matthias Bolte wrote: >> >> 2011/7/27 Matthias Bolte : >> >>> doRemoteClose doesn't free the virNetClientPtr and this creates a >> >>> 260kb leak per remote connection. This happens because >> >>> virNetClientFree doesn't remove the last ref, because virNetClientNew >> >>> creates the virNetClientPtr with a refcount of 2. >> >>> >> > >> >> The memory leak I saw was due to virsh calling >> >> virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl. >> >> Because the event loop is initialized, the call to >> >> virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not >> >> driving the event loop results in not removing callbacks that were >> >> marked for deletion. Finally this leaks the virNetClientPtr using in >> >> the remote driver. I used a virsh -c qemu:///system to test. >> >> >> >> I was able fix this by calling virEventRunDefaultImpl after >> >> virConnectClose in virsh. But I don't think that this is the correct >> >> general solution. >> > >> > Where do we stand with 0.9.4? Is this something where we need the >> > general fix before the release, or is just the virsh hack to call >> > virEventRunDefaultImpl good enough for the release, or is this problem >> > not severe enough and we can release as-is? >> >> virsh is a bit special here as it registers/initializes the default >> even loop but doesn't drive it properly. It only drives it for the >> console command. That's the problem in virsh. This can be avoided by >> calling virEventRunDefaultImpl after virConnectClose iff it's a remote >> connection, in other cases virEventRunDefaultImpl might block. >> Therefore, we shouldn't be calling it in general after a >> virConnectClose in virsh. >> >> If we assume that an application that registers an event loop will >> drive it properly then this problem is not critical, as it doesn't >> exist. Just virsh is a special case that leaks 260kb per closed remote >> connection. When we assume that a typical virsh user uses a single >> connection per virsh invocation then we can view this as a static >> leak. > > Yeah, this is a case I never thought of. The "easy" fix is to just > make virsh spawn a new thread to run the event loop in the background. The problem here will probably be the console command with this lines while (!con->quit) { if (virEventRunDefaultImpl() < 0) break; } in console.c. Having two threads calling virEventRunDefaultImpl probably isn't a good idea. > The "hard" fix is to make virsh I/O entirely event driven, so that > we don't just sit in a blocking read of stdin waiting for input, > but instead use the event loop to process stdin. > > Daniel -- Matthias Bolte http://photron.blogspot.com -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] virNetClientPtr leak in remote driver
On Mon, Aug 01, 2011 at 06:35:21PM +0200, Matthias Bolte wrote: > 2011/8/1 Eric Blake : > > On 07/28/2011 12:07 PM, Matthias Bolte wrote: > >> 2011/7/27 Matthias Bolte : > >>> doRemoteClose doesn't free the virNetClientPtr and this creates a > >>> 260kb leak per remote connection. This happens because > >>> virNetClientFree doesn't remove the last ref, because virNetClientNew > >>> creates the virNetClientPtr with a refcount of 2. > >>> > > > >> The memory leak I saw was due to virsh calling > >> virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl. > >> Because the event loop is initialized, the call to > >> virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not > >> driving the event loop results in not removing callbacks that were > >> marked for deletion. Finally this leaks the virNetClientPtr using in > >> the remote driver. I used a virsh -c qemu:///system to test. > >> > >> I was able fix this by calling virEventRunDefaultImpl after > >> virConnectClose in virsh. But I don't think that this is the correct > >> general solution. > > > > Where do we stand with 0.9.4? Is this something where we need the > > general fix before the release, or is just the virsh hack to call > > virEventRunDefaultImpl good enough for the release, or is this problem > > not severe enough and we can release as-is? > > virsh is a bit special here as it registers/initializes the default > even loop but doesn't drive it properly. It only drives it for the > console command. That's the problem in virsh. This can be avoided by > calling virEventRunDefaultImpl after virConnectClose iff it's a remote > connection, in other cases virEventRunDefaultImpl might block. > Therefore, we shouldn't be calling it in general after a > virConnectClose in virsh. > > If we assume that an application that registers an event loop will > drive it properly then this problem is not critical, as it doesn't > exist. Just virsh is a special case that leaks 260kb per closed remote > connection. When we assume that a typical virsh user uses a single > connection per virsh invocation then we can view this as a static > leak. Yeah, this is a case I never thought of. The "easy" fix is to just make virsh spawn a new thread to run the event loop in the background. The "hard" fix is to make virsh I/O entirely event driven, so that we don't just sit in a blocking read of stdin waiting for input, but instead use the event loop to process stdin. Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] virNetClientPtr leak in remote driver
2011/8/1 Eric Blake : > On 07/28/2011 12:07 PM, Matthias Bolte wrote: >> 2011/7/27 Matthias Bolte : >>> doRemoteClose doesn't free the virNetClientPtr and this creates a >>> 260kb leak per remote connection. This happens because >>> virNetClientFree doesn't remove the last ref, because virNetClientNew >>> creates the virNetClientPtr with a refcount of 2. >>> > >> The memory leak I saw was due to virsh calling >> virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl. >> Because the event loop is initialized, the call to >> virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not >> driving the event loop results in not removing callbacks that were >> marked for deletion. Finally this leaks the virNetClientPtr using in >> the remote driver. I used a virsh -c qemu:///system to test. >> >> I was able fix this by calling virEventRunDefaultImpl after >> virConnectClose in virsh. But I don't think that this is the correct >> general solution. > > Where do we stand with 0.9.4? Is this something where we need the > general fix before the release, or is just the virsh hack to call > virEventRunDefaultImpl good enough for the release, or is this problem > not severe enough and we can release as-is? virsh is a bit special here as it registers/initializes the default even loop but doesn't drive it properly. It only drives it for the console command. That's the problem in virsh. This can be avoided by calling virEventRunDefaultImpl after virConnectClose iff it's a remote connection, in other cases virEventRunDefaultImpl might block. Therefore, we shouldn't be calling it in general after a virConnectClose in virsh. If we assume that an application that registers an event loop will drive it properly then this problem is not critical, as it doesn't exist. Just virsh is a special case that leaks 260kb per closed remote connection. When we assume that a typical virsh user uses a single connection per virsh invocation then we can view this as a static leak. Therefore, I'd say we could just leave this as-is for 0.9.4, except Dan would be back soon and the problem is easy to fix for him. Another problem is the libvirtd crash that Wen discovered. That's more serious, as a misbehaving client shouldn't be able to crash libvirtd. https://www.redhat.com/archives/libvir-list/2011-August/msg5.html -- Matthias Bolte http://photron.blogspot.com -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] virNetClientPtr leak in remote driver
On 07/28/2011 12:07 PM, Matthias Bolte wrote: > 2011/7/27 Matthias Bolte : >> doRemoteClose doesn't free the virNetClientPtr and this creates a >> 260kb leak per remote connection. This happens because >> virNetClientFree doesn't remove the last ref, because virNetClientNew >> creates the virNetClientPtr with a refcount of 2. >> > The memory leak I saw was due to virsh calling > virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl. > Because the event loop is initialized, the call to > virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not > driving the event loop results in not removing callbacks that were > marked for deletion. Finally this leaks the virNetClientPtr using in > the remote driver. I used a virsh -c qemu:///system to test. > > I was able fix this by calling virEventRunDefaultImpl after > virConnectClose in virsh. But I don't think that this is the correct > general solution. Where do we stand with 0.9.4? Is this something where we need the general fix before the release, or is just the virsh hack to call virEventRunDefaultImpl good enough for the release, or is this problem not severe enough and we can release as-is? > > To me the general problem seems to be the entanglement between the > virNetClientPtr refcounting and the event loop. I'm not sure how to > improve this in general. Maybe should have a thread calling > virEventRunDefaultImpl as the comment on virEventRunDefaultImpl > suggest. I'm hoping danpb will be able to chime in soon. -- Eric Blake ebl...@redhat.com+1-801-349-2682 Libvirt virtualization library http://libvirt.org -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] virNetClientPtr leak in remote driver
2011/7/27 Matthias Bolte : > doRemoteClose doesn't free the virNetClientPtr and this creates a > 260kb leak per remote connection. This happens because > virNetClientFree doesn't remove the last ref, because virNetClientNew > creates the virNetClientPtr with a refcount of 2. > > static virNetClientPtr virNetClientNew(virNetSocketPtr sock, > const char *hostname) > { > [...] > client->refs = 1; > [...] > /* Set up a callback to listen on the socket data */ > client->refs++; > if (virNetSocketAddIOCallback(client->sock, > VIR_EVENT_HANDLE_READABLE, > virNetClientIncomingEvent, > client, > virNetClientEventFree) < 0) { > client->refs--; > VIR_DEBUG("Failed to add event watch, disabling events"); > } > [...] > } > > virNetClientNew adds a ref before calling virNetSocketAddIOCallback > but only removes it when virNetSocketAddIOCallback fails. This seems > wrong too me and the ref should be removed in the success case too. > > The same pattern was added in 0302391ee643ad91fdc6d2ecf7e4cf0fc9724516 > (Fix race in ref counting when handling RPC jobs) > > --- a/src/rpc/virnetserverclient.c > +++ b/src/rpc/virnetserverclient.c > @@ -763,10 +763,12 @@ readmore: > > /* Send off to for normal dispatch to workers */ > if (msg) { > + client->refs++; > if (!client->dispatchFunc || > client->dispatchFunc(client, msg, > client->dispatchOpaque) < 0) { > virNetMessageFree(msg); > client->wantClose = true; > + client->refs--; > return; > } > } > > Again, this seems wrong and the ref should be removed in the success > case here too. Before I spent time to figure out how the refcounting > is supposed to work here I just report it and hope that Dan can easily > fix this. Okay, after some discussion on IRC with Michal and Eric and further testing I think that the ref counting is okay here. virNetClientNew adds the second ref because the free callback (virNetClientEventFree) passed to virNetSocketAddIOCallback will call virNetClientFree. Another observation is that virNetClientFree calls virNetSocketRemoveIOCallback when the last ref was removed. Actually that's pointless because virNetSocketRemoveIOCallback might indirectly removed the last ref. So this looks like an ordering problem. The ordering gets fixed by calling virNetClientClose before virNetClientFree, because virNetClientClose calls virNetSocketRemoveIOCallback. Another problem is that virNetSocketRemoveIOCallback only indirectly results in a call to virNetClientFree, because the event loop will call virNetClientFree when removing callbacks marked for deletion -- virNetSocketRemoveIOCallback does this marking. The memory leak I saw was due to virsh calling virEventRegisterDefaultImpl, but not calling virEventRunDefaultImpl. Because the event loop is initialized, the call to virNetSocketAddIOCallback in virNetClientNew succeeds. But virsh not driving the event loop results in not removing callbacks that were marked for deletion. Finally this leaks the virNetClientPtr using in the remote driver. I used a virsh -c qemu:///system to test. I was able fix this by calling virEventRunDefaultImpl after virConnectClose in virsh. But I don't think that this is the correct general solution. To me the general problem seems to be the entanglement between the virNetClientPtr refcounting and the event loop. I'm not sure how to improve this in general. Maybe should have a thread calling virEventRunDefaultImpl as the comment on virEventRunDefaultImpl suggest. -- Matthias Bolte http://photron.blogspot.com -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list