[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-21 Thread Christian Ehrhardt
Thanks Ilya,
yeah, we usually wait for the point releases as they undergo some extra
testing and verification.
The .1 point release shouldn't be too far in the future, I guess.
Thanks a lot for identifying it.

That said, I'd still like to go on with Yuanhan to finalize the DPDK-side
leak fix we identified, so we eventually get it committed.
So Yuanhan, what do you think of my last revised version of your patch for
upstream DPDK (there with the vhost_destroy_device then)?
I mean, it is essentially your patch plus a bit of polishing, not mine, so I
don't feel entitled to submit it as mine :-)

Kind Regards,
Christian


Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Thu, Apr 21, 2016 at 1:01 PM, Ilya Maximets 
wrote:

> Hi, Christian.
> You are likely using the tar archive of openvswitch from openvswitch.org.
> Unfortunately, it doesn't contain many of the bug fixes from git branch-2.5.
>
> The problem that you are facing has been solved in branch-2.5 by
>
> commit d9df7b9206831631ddbd90f9cbeef1b4fc5a8e89
> Author: Ilya Maximets 
> Date:   Thu Mar 3 11:30:06 2016 +0300
>
> netdev-dpdk: Fix memory leak in netdev_dpdk_vhost_destruct().
>
> Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support")
> Signed-off-by: Ilya Maximets 
> Acked-by: Flavio Leitner 
> Acked-by: Daniele Di Proietto 
>
> Best regards, Ilya Maximets.
>
> > I assume there is a leak somewhere on adding/removing vhost_user ports.
> > Although it could also be "only" a fragmentation issue.
> >
> > Reproduction is easy:
> > I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> > Then in a loop I
> >- add up to 512 more ports
> >- test connectivity between the two guests
> >- remove up to 512 ports
> >
> > Depending on memory and the amount of multiqueue/rxq I use, exactly when
> > it breaks seems to shift slightly. But for my default setup of 4 queues
> > and 5G of hugepages initialized by DPDK, it always breaks at the sixth
> > iteration.
> > Here is a link to the stack trace indicating a memory shortage (TBC):
> > https://launchpadlibrarian.net/253916410/apport-retrace.log
> >
> > Known Todos:
> > - I want to track it down more, and will try to come up with a
> > non-openvswitch-based looping testcase that might show it as well, to
> > simplify debugging.
> > - In use were Openvswitch-dpdk 2.5 and DPDK 2.2; a retest with DPDK 16.04
> > and Openvswitch master is planned.
> >
> > I will go on debugging this and let you know, but I wanted to give a
> > heads-up to everyone.
> > In case this is a known issue for some of you, please let me know.
> >
> > Kind Regards,
> > Christian Ehrhardt
> > Software Engineer, Ubuntu Server
> > Canonical Ltd
> >
> > P.S. I think it is a dpdk issue, but adding Daniele on CC to represent
> > ovs-dpdk as well.
>


[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-21 Thread Ilya Maximets
Hi, Christian.
You are likely using the tar archive of openvswitch from openvswitch.org.
Unfortunately, it doesn't contain many of the bug fixes from git branch-2.5.

The problem that you are facing has been solved in branch-2.5 by

commit d9df7b9206831631ddbd90f9cbeef1b4fc5a8e89
Author: Ilya Maximets 
Date:   Thu Mar 3 11:30:06 2016 +0300

netdev-dpdk: Fix memory leak in netdev_dpdk_vhost_destruct().

Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support")
Signed-off-by: Ilya Maximets 
Acked-by: Flavio Leitner 
Acked-by: Daniele Di Proietto 

Best regards, Ilya Maximets.

> I assume there is a leak somewhere on adding/removing vhost_user ports.
> Although it could also be "only" a fragmentation issue.
> 
> Reproduction is easy:
> I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> Then in a loop I
>- add up to 512 more ports
>- test connectivity between the two guests
>- remove up to 512 ports
> 
> Depending on memory and the amount of multiqueue/rxq I use, exactly when
> it breaks seems to shift slightly. But for my default setup of 4 queues
> and 5G of hugepages initialized by DPDK, it always breaks at the sixth
> iteration.
> Here is a link to the stack trace indicating a memory shortage (TBC):
> https://launchpadlibrarian.net/253916410/apport-retrace.log
> 
> Known Todos:
> - I want to track it down more, and will try to come up with a
> non-openvswitch-based looping testcase that might show it as well, to
> simplify debugging.
> - In use were Openvswitch-dpdk 2.5 and DPDK 2.2; a retest with DPDK 16.04
> and Openvswitch master is planned.
> 
> I will go on debugging this and let you know, but I wanted to give a
> heads-up to everyone.
> In case this is a known issue for some of you, please let me know.
> 
> Kind Regards,
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
> 
> P.S. I think it is a dpdk issue, but adding Daniele on CC to represent
> ovs-dpdk as well.


[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-21 Thread Christian Ehrhardt
Hi,
I can follow your argument - and agree that in this case the leak
can't be solved by your patch.
Still, I found it useful to revise it along our discussion, as it
will eventually still be a good patch to have.
I followed your suggestion and found:
- rte_vhost_driver_register callocs vserver (which implies fh=0)
- later, on init, when the vserver_new_vq_conn callback is invoked, the fh
gets set by ops->new_device(vdev_ctx);
- but, as you pointed out, that could be fh = 0 for the first device
- so I initialized vserver->fh with -1 in rte_vhost_driver_register - that
will never be a real fh
- with that, get_config_ll_entry won't find a device for an unconnected
server when destroy_device is called
- the revised patch currently in use (still for DPDK 2.2) can be found at
http://paste.ubuntu.com/15961394/ and is sketched below
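
A minimal sketch of the idea, for illustration only (the paste above is the
authoritative version; this is written against the DPDK 2.2
vhost-net-user.c quoted later in this thread):

--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ (in rte_vhost_driver_register, right after vserver is allocated)
 	vserver = calloc(sizeof(struct vhost_server), 1);
+	/* calloc leaves fh at 0, but 0 is also the first valid device fh;
+	 * use -1 as a "never connected" sentinel, so a destroy on an
+	 * unconnected server looks up an fh that can never exist. */
+	vserver->fh = (uint32_t)-1;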

Also, as you requested, I tried with no guest attached at all - that way I
can still reproduce it.
Here is a new stack trace, but to me it looks the same:
http://paste.ubuntu.com/15961185/
Also, as you asked before, here is a log of the vswitch, but it is 895MB
since a lot of messages repeat on port add/remove.
Even compressed it is still 27MB - I need to do something about the
verbosity there. The same goes for the system journal of the same time.
Therefore I only added links to bz2 files.
The crash is at "2016-04-21T07:54:47.782Z" in the logs.
=>
http://people.canonical.com/~paelzer/ovs-dpdk-vhost-add-remove-leak/mem-leak-addremove.journal.bz2
=>
http://people.canonical.com/~paelzer/ovs-dpdk-vhost-add-remove-leak/ovs-vswitchd.log.bz2

Kind Regards,
Christian Ehrhardt


Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Thu, Apr 21, 2016 at 7:54 AM, Yuanhan Liu 
wrote:

> On Wed, Apr 20, 2016 at 08:18:49AM +0200, Christian Ehrhardt wrote:
> > On Wed, Apr 20, 2016 at 7:04 AM, Yuanhan Liu <
> yuanhan.liu at linux.intel.com>
> > wrote:
> >
> > On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
> >
> > [...]
> >
> > > With that applied, one (and only one) of my two guests loses
> > > connectivity after removing the ports the first time.
> >
> > Yeah, that should be because I invoked the "->destroy_device()"
> > callback.
> >
> >
> > Shouldn't that not only destroy the particular vhost_user device I
> > remove?
>
> I assume the "not" is a typo here; then yes. Well, it turned
> out that I accidentally destroyed the first guest (with id 0) with the
> following code:
>
> ctx.fh = g_vhost_server.server[i]->fh;
> vhost_destroy_device(ctx);
>
> server[i]->fh is initialized to 0 when no connection is established
> (check below for more info), and the first device id is also 0. Anyway,
> this could be fixed easily.
>
> > See below for some better details on the test to clarify that.
> >
> >
> > BTW, I'm curious how you do the test? I saw you added 256 ports, but
> > with 2 guests only? So, 254 of them are idle, just for testing the
> > memory leak bug?
> >
> >
> > Maybe I should describe it better:
> > 1. Spawn some vhost-user ports (40 in my case)
> > 2. Spawn a pair of guests that connect via four of those ports per guest
> > 3. Guests only initialize one of those vhost_user based NICs
> > 4. check connectivity between guests via the vhost_user based connection
> > (working at this stage)
> > LOOP 5-7:
> >5. add ports 41-512
> >6. remove  ports 41-512
> >7. check connectivity between guests via the vhost_user based connection
>
> Yes, it's much clearer now. Thanks.
>
> I then don't see that it's a leak from DPDK vhost-user, at least not the
> leak on "struct virtio_net" I have mentioned before. "struct virtio_net"
> will not even be allocated for those ports that are never used (ports
> 41-512 in your case), as it will be allocated only when there is a
> connection established, aka, a guest is connected.
>
> BTW, will you be able to reproduce it without any connections? Say, all
> 512 ports are added, and then deleted.
>
> Thanks.
>
> --yliu
>
> >
> > So the vhost_user ports the guests are using are never deleted.
> > Only some extra (not even used) ports are added and removed in the loop
> > to search for potential leaks over a longer lifetime of an
> > openvswitch-dpdk based solution.
> >
>


[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-21 Thread Yuanhan Liu
On Thu, Apr 21, 2016 at 04:04:03PM +0200, Christian Ehrhardt wrote:
> Thanks Ilya,
> yeah, we usually wait for the point releases as they undergo some extra
> testing and verification.
> The .1 point release shouldn't be too far in the future, I guess.
> Thanks a lot for identifying it.
> 
> That said, I'd still like to go on with Yuanhan to finalize the DPDK-side
> leak fix we identified, so we eventually get it committed.
> So Yuanhan, what do you think of my last revised version of your patch for
> upstream DPDK (there with the vhost_destroy_device then)?

That's good.

> I mean, it is essentially your patch plus a bit of polishing, not mine, so I
> don't feel entitled to submit it as mine :-)

Thanks. I will make and send out a formal patch later.

--yliu


[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-21 Thread Yuanhan Liu
On Thu, Apr 21, 2016 at 02:01:26PM +0300, Ilya Maximets wrote:
> Hi, Christian.
> You are likely using the tar archive of openvswitch from openvswitch.org.
> Unfortunately, it doesn't contain many of the bug fixes from git branch-2.5.
> 
> The problem that you are facing has been solved in branch-2.5 by
> 
> commit d9df7b9206831631ddbd90f9cbeef1b4fc5a8e89
> Author: Ilya Maximets 
> Date:   Thu Mar 3 11:30:06 2016 +0300
> 
> netdev-dpdk: Fix memory leak in netdev_dpdk_vhost_destruct().
> 
> Fixes: 4573fbd38fa1 ("netdev-dpdk: Add vhost-user multiqueue support")
> Signed-off-by: Ilya Maximets 
> Acked-by: Flavio Leitner 
> Acked-by: Daniele Di Proietto 
Hi Ilya,

Thanks for the info. And I actually checked this piece of code. I was
using the new code, so I didn't find anything wrong.

--yliu


[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-20 Thread Yuanhan Liu
On Wed, Apr 20, 2016 at 08:18:49AM +0200, Christian Ehrhardt wrote:
> On Wed, Apr 20, 2016 at 7:04 AM, Yuanhan Liu 
> wrote:
> 
> On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
> 
> [...]
> 
> > With that applied, one (and only one) of my two guests loses
> > connectivity after removing the ports the first time.
> 
> Yeah, that should be because I invoked the "->destroy_device()"
> callback.
> 
> 
> Shouldn't that not only destroy the particular vhost_user device I remove?

I assume the "not" is a typo here; then yes. Well, it turned
out that I accidentally destroyed the first guest (with id 0) with the
following code:

ctx.fh = g_vhost_server.server[i]->fh;
vhost_destroy_device(ctx);

server[i]->fh is initialized to 0 when no connection is established
(check below for more info), and the first device id is also 0. Anyway,
this could be fixed easily.
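
A hypothetical sketch of that easy fix in rte_vhost_driver_unregister (not
the final patch; it assumes the fh field from the diff in this thread plus
the -1 "never connected" sentinel from Christian's revision above):

	for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
		if (!strcmp(g_vhost_server.server[i]->path, path)) {
			struct vhost_device_ctx ctx;

			/* destroy the device only if this server ever
			 * accepted a connection and got a real fh */
			if (g_vhost_server.server[i]->fh != (uint32_t)-1) {
				ctx.fh = g_vhost_server.server[i]->fh;
				vhost_destroy_device(ctx);
			}
			/* ... fdset_del() and the rest of the cleanup ... */
		}
	}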

> See below for some better details on the test to clarify that.
> 
> 
> BTW, I'm curious how you do the test? I saw you added 256 ports, but
> with 2 guests only? So, 254 of them are idle, just for testing the
> memory leak bug?
> 
> 
> Maybe I should describe it better:
> 1. Spawn some vhost-user ports (40 in my case)
> 2. Spawn a pair of guests that connect via four of those ports per guest
> 3. Guests only initialize one of those vhost_user based NICs
> 4. check connectivity between guests via the vhost_user based connection
> (working at this stage)
> LOOP 5-7:
>    5. add ports 41-512
>    6. remove ports 41-512
>    7. check connectivity between guests via the vhost_user based connection

Yes, it's much clearer now. Thanks.

I then don't see that it's a leak from DPDK vhost-user, at least not the
leak on "struct virtio_net" I have mentioned before. "struct virtio_net"
will not even be allocated for those ports that are never used (ports
41-512 in your case), as it will be allocated only when there is a
connection established, aka, a guest is connected.
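
For readers following along, a paraphrased sketch of where that allocation
happens, reconstructed from the DPDK 2.2 hunks quoted in this thread (the
accept() call and the elisions are assumptions, not verbatim source):

static void
vserver_new_vq_conn(int fd, void *dat, __rte_unused int *remove)
{
	struct vhost_server *vserver = dat;
	struct vhost_device_ctx vdev_ctx = { (pid_t)0, 0 };
	unsigned int size;
	int conn_fd, fh;

	conn_fd = accept(fd, NULL, NULL);  /* a guest just connected */
	/* ... error handling ... */

	fh = ops->new_device(vdev_ctx);    /* struct virtio_net is created
	                                    * only here, per connection */
	vdev_ctx.fh = fh;
	size = strnlen(vserver->path, PATH_MAX);
	ops->set_ifname(vdev_ctx, vserver->path, size);
	/* ... */
}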

BTW, will you be able to reproduce it without any connections? Say, all
512 ports are added, and then deleted.

Thanks.

--yliu

> 
> So the vhost_user ports the guests are using are never deleted.
> Only some extra (not even used) ports are added and removed in the loop to
> search for potential leaks over a longer lifetime of an openvswitch-dpdk
> based solution.
> 


[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-20 Thread Christian Ehrhardt
On Wed, Apr 20, 2016 at 7:04 AM, Yuanhan Liu 
wrote:

> On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
>
[...]

> > With that applied, one (and only one) of my two guests loses
> > connectivity after removing the ports the first time.
>
> Yeah, that should be because I invoked the "->destroy_device()"
> callback.
>

Shouldn't that not only destroy the particular vhost_user device I remove?
See below for some better details on the test to clarify that.

> BTW, I'm curious how you do the test? I saw you added 256 ports, but
> with 2 guests only? So, 254 of them are idle, just for testing the
> memory leak bug?
>

Maybe I should describe it better:
1. Spawn some vhost-user ports (40 in my case)
2. Spawn a pair of guests that connect via four of those ports per guest
3. Guests only initialize one of those vhost_user based NICs
4. check connectivity between guests via the vhost_user based connection
(working at this stage)
LOOP 5-7:
   5. add ports 41-512
   6. remove  ports 41-512
   7. check connectivity between guests via the vhost_user based connection

So the vhost_user ports the guests are using are never deleted.
Only some extra (not even used) ports are added and removed in the loop to
search for potential leaks over a longer lifetime of an openvswitch-dpdk
based solution.


[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-19 Thread Yuanhan Liu
On Tue, Apr 19, 2016 at 06:33:50PM +0200, Christian Ehrhardt wrote:
> Hi,
> thanks for the patch.
> I backported it this way to my DPDK 2.2 based environment for now (see below):
> 
> With that applied, one (and only one) of my two guests loses connectivity
> after removing the ports the first time.

Yeah, that should be because I invoked the "->destroy_device()"
callback.

BTW, I'm curious how you do the test? I saw you added 256 ports, but
with 2 guests only? So, 254 of them are idle, just for testing the
memory leak bug?

And then you remove all of them, without stopping the guest. How is that
going to work? I mean, the vhost-user connection would be broken, and data
flow would not work.

--yliu

> No traffic seems to pass; setting the device in the guest down/up doesn't
> get anything through.
> But it isn't totally broken - stopping/starting the guest gets it working
> again.
> So openvswitch/dpdk is still somewhat working - it just seems the guest lost
> something; after tapping on that vhost_user interface again it works.
> 
> I will check tomorrow and let you know:
> - whether I'm luckier with that patch on top of 16.04
> - whether it loses connectivity after the first or after a certain number
> of port removals
> 
> If you find issues with my backport adaptation, let me know.
> 
> 
> ---
> 
> Backport and reasoning:
> 
> The new fix relies on a lot of new code; vhost_destroy_device looks totally
> different from the former destroy_device.
> History of today's function content:
>   4796ad63 - original code moved from examples to lib
>   a90ca1a1 - this replaces ops->destroy_device with vhost_destroy_device
>   71dc571e - simple check against null pointers
>   45ca9c6f - this changed the code from linked list to arrays
> 
>   The new code cleans up with:
>       notify_ops->destroy_device (callback into the parent)
>       cleanup_device - already existed in the 2.2 code
>       free_device - likewise already existed in the 2.2 code
>   The old code cleans up with:
>       notify_ops->destroy_device - still there
>       rm_config_ll_entry -> eventually calls cleanup_device and free_device
>         (just in the more complex linked-list way)
> 
> So the "only" adaptation needed for backporting is to replace
> vhost_destroy_device with ops->destroy_device(ctx).
> 
> Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
> ===
> --- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.c
> +++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
> @@ -310,6 +310,7 @@ vserver_new_vq_conn(int fd, void *dat, _
>   }
> 
>   vdev_ctx.fh = fh;
> + vserver->fh = fh;
>   size = strnlen(vserver->path, PATH_MAX);
>   ops->set_ifname(vdev_ctx, vserver->path,
>   size);
> @@ -516,6 +517,11 @@ rte_vhost_driver_unregister(const char *
> 
>   for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
>   if (!strcmp(g_vhost_server.server[i]->path, path)) {
> + struct vhost_device_ctx ctx;
> +
> + ctx.fh = g_vhost_server.server[i]->fh;
> + ops->destroy_device(ctx);
> +
>   fdset_del(&g_vhost_server.fdset,
>   g_vhost_server.server[i]->listenfd);
> 
> Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
> ===
> --- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.h
> +++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
> @@ -43,6 +43,7 @@
>  struct vhost_server {
>   char *path; /**< The path the uds is bind to. */
>   int listenfd;     /**< The listener sockfd. */
> + uint32_t fh;
>  };
> 
>  /* refer to hw/virtio/vhost-user.c */
> 
> 
> 
> 
> Christian Ehrhardt
> Software Engineer, Ubuntu Server
> Canonical Ltd
> 
> On Mon, Apr 18, 2016 at 8:14 PM, Yuanhan Liu 
> wrote:
> 
> On Mon, Apr 18, 2016 at 10:46:50AM -0700, Yuanhan Liu wrote:
> > On Mon, Apr 18, 2016 at 07:18:05PM +0200, Christian Ehrhardt wrote:
> > > I assume there is a leak somewhere on adding/removing vhost_user 
> ports.
> > > Although it could also be "only" a fragmentation issue.
> > >
> > > Reproduction is easy:
> > > I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> > > Then in a loop I
> > >    - add up to 512 more ports
> > >    - test connectivity between the two guests
> > >    - remove up to 512 ports
> > >
> > > Depending on memory and the amount of multiqueue/rxq I use, exactly
> > > when it breaks seems to shift slightly. But for my default setup of 4
> > > queues and 5G of hugepages initialized by DPDK, it always breaks at the
> > > sixth iteration.
> > > Here is a link to the stack trace indicating a memory shortage (TBC):
> > > https://launchpadlibrarian.net/253916410/apport-retrace.log
> > >
> > > Known Todos:
> > > - I want to track it down more, and will try to come up with a
> > > non-openvswitch-based looping testcase that might show it as well, to
> > > simplify debugging.
> > > - in use were Openvswitch-dpdk 2.5 

[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-19 Thread Christian Ehrhardt
Hi,
thanks for the patch.
I backported it this way to my DPDK 2.2 based environment for now (see
below):

With that applied, one (and only one) of my two guests loses connectivity
after removing the ports the first time.
No traffic seems to pass; setting the device in the guest down/up doesn't
get anything through.
But it isn't totally broken - stopping/starting the guest gets it working
again.
So openvswitch/dpdk is still somewhat working - it just seems the guest
lost something; after tapping on that vhost_user interface again it works.

I will check tomorrow and let you know:
- whether I'm luckier with that patch on top of 16.04
- whether it loses connectivity after the first or after a certain number
of port removals

If you find issues with my backport adaptation, let me know.


---

Backport and reasoning:

The new fix relies on a lot of new code; vhost_destroy_device looks totally
different from the former destroy_device.
History of today's function content:
  4796ad63 - original code moved from examples to lib
  a90ca1a1 - this replaces ops->destroy_device with vhost_destroy_device
  71dc571e - simple check against null pointers
  45ca9c6f - this changed the code from linked list to arrays

  The new code cleans up with:
      notify_ops->destroy_device (callback into the parent)
      cleanup_device - already existed in the 2.2 code
      free_device - likewise already existed in the 2.2 code
  The old code cleans up with:
      notify_ops->destroy_device - still there
      rm_config_ll_entry -> eventually calls cleanup_device and free_device
        (just in the more complex linked-list way)

So the "only" adaptation needed for backporting is to replace
vhost_destroy_device with ops->destroy_device(ctx).

Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
===
--- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -310,6 +310,7 @@ vserver_new_vq_conn(int fd, void *dat, _
  }

  vdev_ctx.fh = fh;
+ vserver->fh = fh;
  size = strnlen(vserver->path, PATH_MAX);
  ops->set_ifname(vdev_ctx, vserver->path,
  size);
@@ -516,6 +517,11 @@ rte_vhost_driver_unregister(const char *

  for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
  if (!strcmp(g_vhost_server.server[i]->path, path)) {
+ struct vhost_device_ctx ctx;
+
+ ctx.fh = g_vhost_server.server[i]->fh;
+ ops->destroy_device(ctx);
+
  fdset_del(&g_vhost_server.fdset,
  g_vhost_server.server[i]->listenfd);

Index: dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
===
--- dpdk.orig/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ dpdk/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -43,6 +43,7 @@
 struct vhost_server {
  char *path; /**< The path the uds is bind to. */
  int listenfd; /**< The listener sockfd. */
+ uint32_t fh;
 };

 /* refer to hw/virtio/vhost-user.c */




Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

On Mon, Apr 18, 2016 at 8:14 PM, Yuanhan Liu 
wrote:

> On Mon, Apr 18, 2016 at 10:46:50AM -0700, Yuanhan Liu wrote:
> > On Mon, Apr 18, 2016 at 07:18:05PM +0200, Christian Ehrhardt wrote:
> > > I assume there is a leak somewhere on adding/removing vhost_user ports.
> > > Although it could also be "only" a fragmentation issue.
> > >
> > > Reproduction is easy:
> > > I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> > > Then in a loop I
> > >- add up to 512 more ports
> > >- test connectivity between the two guests
> > >- remove up to 512 ports
> > >
> > > Depending on memory and the amount of multiqueue/rxq I use, exactly
> > > when it breaks seems to shift slightly. But for my default setup of 4
> > > queues and 5G of hugepages initialized by DPDK, it always breaks at the
> > > sixth iteration.
> > > Here is a link to the stack trace indicating a memory shortage (TBC):
> > > https://launchpadlibrarian.net/253916410/apport-retrace.log
> > >
> > > Known Todos:
> > > - I want to track it down more, and will try to come up with a
> > > non-openvswitch-based looping testcase that might show it as well, to
> > > simplify debugging.
> > > - In use were Openvswitch-dpdk 2.5 and DPDK 2.2; a retest with DPDK
> > > 16.04 and Openvswitch master is planned.
> > >
> > > I will go on debugging this and let you know, but I wanted to give a
> > > heads-up to everyone.
> >
> > Thanks for the report.
> >
> > > In case this is a known issue for some of you, please let me know.
> >
> > Yeah, it might be. I'm wondering whether the virtio_net struct is not
> > freed. So far it will be freed only (if I'm not mistaken) when the guest
> > quits.
>
> Would you try the following diff and see if it fixes your issue?
>
> --yliu
>
> ---
>  lib/librte_vhost/vhost_user/vhost-net-user.c | 6 ++
>  lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
>  2 files changed, 7 insertions(+)
>
> diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c
> b/lib/librte_vhost/vhost_user/

[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-18 Thread Christian Ehrhardt
I assume there is a leak somewhere on adding/removing vhost_user ports.
Although it could also be "only" a fragmentation issue.

Reproduction is easy:
I set up a pair of nicely working OVS-DPDK connected KVM Guests.
Then in a loop I
   - add up to 512 more ports
   - test connectivity between the two guests
   - remove up to 512 ports

Depending on memory and the amount of multiqueue/rxq I use, exactly when
it breaks seems to shift slightly. But for my default setup of 4 queues
and 5G of hugepages initialized by DPDK, it always breaks at the sixth
iteration.
Here is a link to the stack trace indicating a memory shortage (TBC):
https://launchpadlibrarian.net/253916410/apport-retrace.log

Known Todos:
- I want to track it down more, and will try to come up with a
non-openvswitch-based looping testcase that might show it as well, to
simplify debugging (a first sketch of such a loop follows below).
- In use were Openvswitch-dpdk 2.5 and DPDK 2.2; a retest with DPDK 16.04
and Openvswitch master is planned.
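
As a starting point for that looping testcase, a hypothetical DPDK-only
reproducer might look like this (untested sketch; it assumes the DPDK 2.2
librte_vhost API, where rte_vhost_driver_register() and
rte_vhost_driver_unregister() take just the socket path):

#include <stdio.h>

#include <rte_eal.h>
#include <rte_virtio_net.h>

int
main(int argc, char **argv)
{
	char path[64];
	int i, loop;

	if (rte_eal_init(argc, argv) < 0) {
		printf("EAL init failed\n");
		return -1;
	}

	/* mimic the OVS test: register and unregister many vhost-user
	 * sockets per iteration, without any guest ever connecting */
	for (loop = 0; loop < 1000; loop++) {
		for (i = 0; i < 512; i++) {
			snprintf(path, sizeof(path), "/tmp/vhu-leak-%d", i);
			if (rte_vhost_driver_register(path) != 0)
				printf("register failed: loop %d, port %d\n",
				       loop, i);
		}
		for (i = 0; i < 512; i++) {
			snprintf(path, sizeof(path), "/tmp/vhu-leak-%d", i);
			rte_vhost_driver_unregister(path);
		}
		printf("iteration %d done\n", loop);
	}

	return 0;
}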

I will go on debugging this and let you know, but I wanted to give a
heads-up to everyone.
In case this is a known issue for some of you, please let me know.

Kind Regards,
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

P.S. I think it is a dpdk issue, but adding Daniele on CC to represent
ovs-dpdk as well.


[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-18 Thread Yuanhan Liu
On Mon, Apr 18, 2016 at 10:46:50AM -0700, Yuanhan Liu wrote:
> On Mon, Apr 18, 2016 at 07:18:05PM +0200, Christian Ehrhardt wrote:
> > I assume there is a leak somewhere on adding/removing vhost_user ports.
> > Although it could also be "only" a fragmentation issue.
> > 
> > Reproduction is easy:
> > I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> > Then in a loop I
> >- add up to 512 more ports
> >- test connectivity between the two guests
> >- remove up to 512 ports
> > 
> > Depending on memory and the amount of multiqueue/rxq I use, exactly when
> > it breaks seems to shift slightly. But for my default setup of 4 queues
> > and 5G of hugepages initialized by DPDK, it always breaks at the sixth
> > iteration.
> > Here is a link to the stack trace indicating a memory shortage (TBC):
> > https://launchpadlibrarian.net/253916410/apport-retrace.log
> > 
> > Known Todos:
> > - I want to track it down more, and will try to come up with a
> > non-openvswitch-based looping testcase that might show it as well, to
> > simplify debugging.
> > - In use were Openvswitch-dpdk 2.5 and DPDK 2.2; a retest with DPDK 16.04
> > and Openvswitch master is planned.
> > 
> > I will go on debugging this and let you know, but I wanted to give a
> > heads-up to everyone.
> 
> Thanks for the report.
> 
> > In case this is a known issue for some of you, please let me know.
> 
> Yeah, it might be. I'm wondering whether the virtio_net struct is not
> freed. So far it will be freed only (if I'm not mistaken) when the guest
> quits.

Would you try the following diff and see if it fixes your issue?

--yliu

---
 lib/librte_vhost/vhost_user/vhost-net-user.c | 6 ++
 lib/librte_vhost/vhost_user/vhost-net-user.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.c 
b/lib/librte_vhost/vhost_user/vhost-net-user.c
index df2bd64..8f7ebd7 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.c
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.c
@@ -309,6 +309,7 @@ vserver_new_vq_conn(int fd, void *dat, __rte_unused int 
*remove)
}

vdev_ctx.fh = fh;
+   vserver->fh = fh;
size = strnlen(vserver->path, PATH_MAX);
vhost_set_ifname(vdev_ctx, vserver->path,
size);
@@ -501,6 +502,11 @@ rte_vhost_driver_unregister(const char *path)

for (i = 0; i < g_vhost_server.vserver_cnt; i++) {
if (!strcmp(g_vhost_server.server[i]->path, path)) {
+   struct vhost_device_ctx ctx;
+
+   ctx.fh = g_vhost_server.server[i]->fh;
+   vhost_destroy_device(ctx);
+
fdset_del(&g_vhost_server.fdset,
g_vhost_server.server[i]->listenfd);

diff --git a/lib/librte_vhost/vhost_user/vhost-net-user.h 
b/lib/librte_vhost/vhost_user/vhost-net-user.h
index e3bb413..7cf21db 100644
--- a/lib/librte_vhost/vhost_user/vhost-net-user.h
+++ b/lib/librte_vhost/vhost_user/vhost-net-user.h
@@ -43,6 +43,7 @@
 struct vhost_server {
char *path; /**< The path the uds is bind to. */
int listenfd; /**< The listener sockfd. */
+   uint32_t fh;
 };

 /* refer to hw/virtio/vhost-user.c */
-- 
1.9.3




[dpdk-dev] Memory leak when adding/removing vhost_user ports

2016-04-18 Thread Yuanhan Liu
On Mon, Apr 18, 2016 at 07:18:05PM +0200, Christian Ehrhardt wrote:
> I assume there is a leak somewhere on adding/removing vhost_user ports.
> Although it could also be "only" a fragmentation issue.
> 
> Reproduction is easy:
> I set up a pair of nicely working OVS-DPDK connected KVM Guests.
> Then in a loop I
>- add up to 512 more ports
>- test connectivity between the two guests
>- remove up to 512 ports
> 
> Depending on memory and the amount of multiqueue/rxq I use, exactly when
> it breaks seems to shift slightly. But for my default setup of 4 queues
> and 5G of hugepages initialized by DPDK, it always breaks at the sixth
> iteration.
> Here is a link to the stack trace indicating a memory shortage (TBC):
> https://launchpadlibrarian.net/253916410/apport-retrace.log
> 
> Known Todos:
> - I want to track it down more, and will try to come up with a
> non-openvswitch-based looping testcase that might show it as well, to
> simplify debugging.
> - In use were Openvswitch-dpdk 2.5 and DPDK 2.2; a retest with DPDK 16.04
> and Openvswitch master is planned.
> 
> I will go on debugging this and let you know, but I wanted to give a
> heads-up to everyone.

Thanks for the report.

> In case this is a known issue for some of you, please let me know.

Yeah, it might be. I'm wondering whether the virtio_net struct is not
freed. So far it will be freed only (if I'm not mistaken) when the guest
quits.

BTW, could you dump the ovs-dpdk log?

--yliu