Hi,
I can follow your argument and agree that in this case the leak can't be solved by your patch. Still, I found it useful to revise it along our discussion, as it will eventually still be a good patch to have.

I followed your suggestion and found:
- rte_vhost_driver_register callocs vserver (which implies fh=0)
- later, on init, when the vserver_new_vq_conn callback fires, fh gets set by ops->new_device(vdev_ctx);
- but, as you pointed out, that could be fh = 0 for the first device
- so I initialized vserver->fh with -1 in rte_vhost_driver_register - that will never be a real fh
- with that, get_config_ll_entry won't find a device with that fh when called via destroy_device

The revised patch currently in use (still for DPDK 2.2) can be found here:
http://paste.ubuntu.com/15961394/
Also, as you requested, I tried with no guest attached at all - that way I can still reproduce it. Here is a new stacktrace, but to me it looks the same:
http://paste.ubuntu.com/15961185/

Also, as you asked before, a log of the vswitch - but it is 895MB, since a lot of messages repeat on port add/remove. Even compressed it is still 27MB - I need to do something about verbosity there. The same goes for the system journal covering the same time. Therefore I only added links to bz2 files. The crash is at "2016-04-21T07:54:47.782Z" in the logs.
=> http://people.canonical.com/~paelzer/ovs-dpdk-vhost-add-remove-leak/mem-leak-addremove.journal.bz2
=> http://people.canonical.com/~paelzer/ovs-dpdk-vhost-add-remove-leak/ovs-vswitchd.log.bz2

Kind Regards,
Christian Ehrhardt

Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Thu, Apr 21, 2016 at 7:54 AM, Yuanhan Liu <yuanhan.liu at linux.intel.com> wrote:
> On Wed, Apr 20, 2016 at 08:18:49AM +0200, Christian Ehrhardt wrote:
> > On Wed, Apr 20, 2016 at 7:04 AM, Yuanhan Liu
> > <yuanhan.liu at linux.intel.com> wrote:
> > > On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
> > > [...]
> >
> > With that applied one (and only one) of my two guests loses connectivity
> > after removing the ports the first time.
>
> Yeah, that should be because I invoked the "->destroy_device()"
> callback.
>
> > Shouldn't that not only destroy the particular vhost_user device I
> > remove?
>
> I assume the "not" is typed wrong here, then yes. Well, it turned
> out that I accidentally destroyed the first guest (with id 0) with
> the following code:
>
>     ctx.fh = g_vhost_server.server[i]->fh;
>     vhost_destroy_device(ctx);
>
> server[i]->fh is initialized with 0 when no connection is established
> (check below for more info), and the first id starts with 0. Anyway,
> this could be fixed easily.
>
> > See below for some better details on the test to clarify that.
>
> > > BTW, I'm curious how do you do the test?
> > > I saw you added 256 ports, but
> > > with 2 guests only? So, 254 of them are idle, just for testing the
> > > memory leak bug?
> >
> > Maybe I should describe it better:
> > 1. Spawn some vhost-user ports (40 in my case)
> > 2. Spawn a pair of guests that connect via four of those ports per guest
> > 3. Guests only initialize one of those vhost_user based NICs
> > 4. Check connectivity between guests via the vhost_user based connection
> >    (working at this stage)
> > LOOP 5-7:
> > 5. Add ports 41-512
> > 6. Remove ports 41-512
> > 7. Check connectivity between guests via the vhost_user based connection
>
> Yes, it's much clearer now. Thanks.
>
> I then don't see it's a leak from DPDK vhost-user, at least not the leak
> on "struct virtio_net" I have mentioned before. "struct virtio_net" will
> not even be allocated for those ports never used (ports 41-512 in your
> case), as it will be allocated only when there is a connection
> established, aka, a guest is connected.
>
> BTW, will you be able to reproduce it without any connections? Say, all
> 512 ports are added, and then deleted.
>
> Thanks.
>
>     --yliu
>
> > So the vhost_user ports the guests are using are never deleted.
> > Only some extra (not even used) ports are added & removed in the loop
> > to search for potential leaks over a longer lifetime of an
> > openvswitch-dpdk based solution.
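For reference, the add/remove part of the loop (steps 5-6 above) can be scripted roughly like this; the bridge name "br0" and the "vhost-user-N" port naming are assumptions, and the script only prints the ovs-vsctl commands unless you pipe them through sh:

```shell
#!/bin/sh
# Sketch of the reproducer loop (steps 5-6 above): generate the ovs-vsctl
# commands that add and then remove the extra dpdkvhostuser ports.
# Bridge "br0" and the "vhost-user-N" naming are assumptions.
BRIDGE=br0
FIRST=41
LAST=512

gen_commands() {
	for i in $(seq "$FIRST" "$LAST"); do
		echo "ovs-vsctl add-port $BRIDGE vhost-user-$i -- set Interface vhost-user-$i type=dpdkvhostuser"
	done
	for i in $(seq "$FIRST" "$LAST"); do
		echo "ovs-vsctl del-port $BRIDGE vhost-user-$i"
	done
}

# Print the commands; pipe through sh to actually run them:
#   gen_commands | sh
gen_commands
```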