Re: [ovs-dev] [PATCH v2] netdev-dpdk: Track vhost tx contention.

2019-11-08 Thread Ilya Maximets

On 08.11.2019 9:30, David Marchand wrote:

On Tue, Nov 5, 2019 at 4:37 PM Ilya Maximets  wrote:

That's an interesting debug method, but it doesn't look very suitable
for end-user documentation.  One thing that bothers me the most
is referencing C code snippets in this kind of documentation.


Ok, can we conclude on the coverage counter wrt the v1 then?
https://patchwork.ozlabs.org/patch/1153238/

Should I submit a v3 with the doc update (but removing the parts about perf) ?


'perf' part should not be there.

I have doubts about the rest of the docs.  This part might be useful, but
it doesn't provide any solution for the problem and I really don't know
if there is anything we can suggest in that case.  "Rework your network
topology" is neither a friendly solution nor a set of concrete steps.

One more thing is that documenting coverage counters doesn't look like a
good idea to me and I'd like to not create a precedent.

One day we'll rework this to be some "PMD/netdev performance statistics"
and it'll be OK to document it.  But there is nothing more permanent
than a temporary solution.

Right now the easiest way for me is to just apply v1.


Ok for me.


OK. I'll apply v1 then.

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v2] netdev-dpdk: Track vhost tx contention.

2019-11-08 Thread David Marchand
On Tue, Nov 5, 2019 at 4:37 PM Ilya Maximets  wrote:
> >> That's an interesting debug method, but it doesn't look very suitable
> >> for end-user documentation.  One thing that bothers me the most
> >> is referencing C code snippets in this kind of documentation.
> >
> > Ok, can we conclude on the coverage counter wrt the v1 then?
> > https://patchwork.ozlabs.org/patch/1153238/
> >
> > Should I submit a v3 with the doc update (but removing the parts about perf) ?
>
> 'perf' part should not be there.
>
> I have doubts about the rest of the docs.  This part might be useful, but
> it doesn't provide any solution for the problem and I really don't know
> if there is anything we can suggest in that case.  "Rework your network
> topology" is neither a friendly solution nor a set of concrete steps.
>
> One more thing is that documenting coverage counters doesn't look like a
> good idea to me and I'd like to not create a precedent.
>
> One day we'll rework this to be some "PMD/netdev performance statistics"
> and it'll be OK to document it.  But there is nothing more permanent
> than a temporary solution.
>
> Right now the easiest way for me is to just apply v1.

Ok for me.


-- 
David Marchand


Re: [ovs-dev] [PATCH v2] netdev-dpdk: Track vhost tx contention.

2019-11-05 Thread Ilya Maximets

On 20.10.2019 14:31, David Marchand wrote:

Hello,

On Tue, Oct 15, 2019 at 2:13 PM Ilya Maximets  wrote:


On 14.10.2019 20:04, Aaron Conole wrote:

David Marchand  writes:


Add a coverage counter to help diagnose contention on the vhost txqs.
This is seen as dropped packets on the physical ports for rates that
are usually handled fine by OVS.
Document how to further debug this contention with perf.

Signed-off-by: David Marchand 
---
Changelog since v1:
- added documentation as a bonus: not sure this is the right place, or if it
really makes sense to enter into such details. But I still find it useful.
Comments?


It's useful, and I think it makes sense here.


That's an interesting debug method, but it doesn't look very suitable
for end-user documentation.  One thing that bothers me the most
is referencing C code snippets in this kind of documentation.


Ok, can we conclude on the coverage counter wrt the v1 then?
https://patchwork.ozlabs.org/patch/1153238/

Should I submit a v3 with the doc update (but removing the parts about perf) ?


'perf' part should not be there.

I have doubts about the rest of the docs.  This part might be useful, but
it doesn't provide any solution for the problem and I really don't know
if there is anything we can suggest in that case.  "Rework your network
topology" is neither a friendly solution nor a set of concrete steps.

One more thing is that documenting coverage counters doesn't look like a
good idea to me and I'd like to not create a precedent.

One day we'll rework this to be some "PMD/netdev performance statistics"
and it'll be OK to document it.  But there is nothing more permanent
than a temporary solution.

Right now the easiest way for me is to just apply v1.

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH v2] netdev-dpdk: Track vhost tx contention.

2019-10-20 Thread David Marchand
Hello,

On Tue, Oct 15, 2019 at 2:13 PM Ilya Maximets  wrote:
>
> On 14.10.2019 20:04, Aaron Conole wrote:
> > David Marchand  writes:
> >
> >> Add a coverage counter to help diagnose contention on the vhost txqs.
> >> This is seen as dropped packets on the physical ports for rates that
> >> are usually handled fine by OVS.
> >> Document how to further debug this contention with perf.
> >>
> >> Signed-off-by: David Marchand 
> >> ---
> >> Changelog since v1:
> >> - added documentation as a bonus: not sure this is the right place, or if it
> >>   really makes sense to enter into such details. But I still find it useful.
> >>   Comments?
> >
> > It's useful, and I think it makes sense here.
>
> That's an interesting debug method, but it doesn't look very suitable
> for end-user documentation.  One thing that bothers me the most
> is referencing C code snippets in this kind of documentation.

Ok, can we conclude on the coverage counter wrt the v1 then?
https://patchwork.ozlabs.org/patch/1153238/

Should I submit a v3 with the doc update (but removing the parts about perf) ?


Thanks.

-- 
David Marchand


Re: [ovs-dev] [PATCH v2] netdev-dpdk: Track vhost tx contention.

2019-10-15 Thread Ilya Maximets

On 14.10.2019 20:04, Aaron Conole wrote:

David Marchand  writes:


Add a coverage counter to help diagnose contention on the vhost txqs.
This is seen as dropped packets on the physical ports for rates that
are usually handled fine by OVS.
Document how to further debug this contention with perf.

Signed-off-by: David Marchand 
---
Changelog since v1:
- added documentation as a bonus: not sure this is the right place, or if it
   really makes sense to enter into such details. But I still find it useful.
   Comments?


It's useful, and I think it makes sense here.


That's an interesting debug method, but it doesn't look very suitable
for end-user documentation.  One thing that bothers me the most
is referencing C code snippets in this kind of documentation.

There are other ways to check for possible contention.  Maybe not as
certain, but more user-friendly, and they don't require debug symbols
or special tools.  For example, a user could find out how many PMD
threads are using the same output port by checking the installed
datapath flows, i.e. by filtering the output of 'ovs-appctl dpctl/dump-flows'
by the desired port number/name.  Comparing that number with the number
of 'configured_tx_queues' gives good candidates for tx lock contention.
To me the above method looks more natural, since we're comparing the number
of threads that use the same output port with the number of available tx
queues, rather than relying on a tracepoint at some arbitrary place in the
code that could change over time.
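
A minimal sketch of this check (for illustration only; the port name
"vhost0", the per-PMD headers in the flow dump and the exact option/key
names are assumptions that may differ between OVS releases):

  # Find the datapath port number of the vhost-user port.
  $ ovs-appctl dpctl/show | grep vhost0
    port 3: vhost0

  # The userspace datapath dumps flows per PMD thread; count the PMD
  # sections containing a flow whose actions output to that port.
  $ ovs-appctl dpctl/dump-flows | grep -E 'flow-dump from pmd|actions:.*\b3\b'

  # Number of Tx queues actually configured on the port.
  $ ovs-vsctl get Interface vhost0 status:configured_tx_queues

If more PMD threads send to the port than there are configured tx queues,
tx lock contention is likely.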




---
  Documentation/topics/dpdk/vhost-user.rst | 61 
  lib/netdev-dpdk.c|  8 -
  2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index fab87bd..c7e605e 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -623,3 +623,64 @@ Because of this limitation, this feature is considered 'experimental'.
  Further information can be found in the
  `DPDK documentation
  `__
+
+Troubleshooting vhost-user tx contention
+
+
+Depending on the number of Rx queues enabled by a guest on a virtio port and
+on the number of PMDs used on the OVS side, OVS can end up with contention
+occurring on the lock protecting the vhost Tx queue.


Maybe make the wording specific to a vhostuser port?  I think someone
might draw the wrong conclusion if they use the virtio PMD as a dpdk port
instead of using vhostuser ports.  Not sure *why* someone might do
that, but it's a possibility and this counter won't tick for those
cases.


+This problem can be hard to catch since it is noticeable as an increased cpu
+cost for handling the received packets and, usually, as drops in the
+statistics of the physical port receiving the packets.
+
+To identify such a situation, a coverage statistic is available::
+
+  $ ovs-appctl coverage/read-counter vhost_tx_contention
+  59530681
+
+If you want to further debug this contention, perf can be used if your OVS
+daemon has been compiled with debug symbols.
+
+First, identify the point in the binary sources where the contention occurs::
+
+  $ perf probe -x $(which ovs-vswitchd) -L __netdev_dpdk_vhost_send \
+ |grep -B 3 -A 3 'COVERAGE_INC(vhost_tx_contention)'
+   }
+
+   21  if (unlikely(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) {
+   22  COVERAGE_INC(vhost_tx_contention);
+   23  rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
+   }
+
+Then, place a probe at the line where the lock is taken.
+You can add additional context to catch which port and queue are concerned::
+
+  $ perf probe -x $(which ovs-vswitchd) \
+'vhost_tx_contention=__netdev_dpdk_vhost_send:23 netdev->name:string qid'
+
+Finally, gather data and generate a report::
+
+  $ perf record -e probe_ovs:vhost_tx_contention -aR sleep 10
+  [ perf record: Woken up 120 times to write data ]
+  [ perf record: Captured and wrote 30.151 MB perf.data (356278 samples) ]
+
+  $ perf report -F +pid --stdio
+  # To display the perf.data header info, please use --header/--header-only options.
+  #
+  #
+  # Total Lost Samples: 0
+  #
+  # Samples: 356K of event 'probe_ovs:vhost_tx_contention'
+  # Event count (approx.): 356278
+  #
+  # Overhead  Pid:CommandTrace output
+  #   .  
+  #
+  55.57%83332:pmd-c01/id:33  (9e9775) name="vhost0" qid=0
+  44.43%8:pmd-c15/id:34  (9e9775) name="vhost0" qid=0
+
+
+  #
+  # (Tip: Treat branches as callchains: perf report --branch-history)
+  #
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 7f709ff..3525870 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -41,6 +41,7 @@
  #include 
  
  #include "cmap.h"

+#include "coverage.h"
  #include "dirs.h"
  #include "dp-

Re: [ovs-dev] [PATCH v2] netdev-dpdk: Track vhost tx contention.

2019-10-15 Thread Eelco Chaudron




On 13 Oct 2019, at 17:55, David Marchand wrote:


Add a coverage counter to help diagnose contention on the vhost txqs.
This is seen as dropped packets on the physical ports for rates that
are usually handled fine by OVS.
Document how to further debug this contention with perf.

Signed-off-by: David Marchand 
---
Changelog since v1:
- added documentation as a bonus: not sure this is the right place, or if it
  really makes sense to enter into such details. But I still find it useful.

  Comments?



Thanks for the detailed documentation!

Acked-by: Eelco Chaudron 




Re: [ovs-dev] [PATCH v2] netdev-dpdk: Track vhost tx contention.

2019-10-14 Thread Aaron Conole
David Marchand  writes:

> Add a coverage counter to help diagnose contention on the vhost txqs.
> This is seen as dropped packets on the physical ports for rates that
> are usually handled fine by OVS.
> Document how to further debug this contention with perf.
>
> Signed-off-by: David Marchand 
> ---
> Changelog since v1:
> - added documentation as a bonus: not sure this is the right place, or if it
>   really makes sense to enter into such details. But I still find it useful.
>   Comments?

It's useful, and I think it makes sense here.

> ---
>  Documentation/topics/dpdk/vhost-user.rst | 61 
>  lib/netdev-dpdk.c|  8 -
>  2 files changed, 68 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
> index fab87bd..c7e605e 100644
> --- a/Documentation/topics/dpdk/vhost-user.rst
> +++ b/Documentation/topics/dpdk/vhost-user.rst
> @@ -623,3 +623,64 @@ Because of this limitation, this feature is considered 'experimental'.
>  Further information can be found in the
>  `DPDK documentation
>  `__
> +
> +Troubleshooting vhost-user tx contention
> +
> +
> +Depending on the number of Rx queues enabled by a guest on a virtio port and
> +on the number of PMDs used on the OVS side, OVS can end up with contention
> +occurring on the lock protecting the vhost Tx queue.

Maybe make the wording specific to a vhostuser port?  I think someone
might draw the wrong conclusion if they use the virtio PMD as a dpdk port
instead of using vhostuser ports.  Not sure *why* someone might do
that, but it's a possibility and this counter won't tick for those
cases.

> +This problem can be hard to catch since it is noticeable as an increased cpu
> +cost for handling the received packets and, usually, as drops in the
> +statistics of the physical port receiving the packets.
> +
> +To identify such a situation, a coverage statistic is available::
> +
> +  $ ovs-appctl coverage/read-counter vhost_tx_contention
> +  59530681
> +
> +If you want to further debug this contention, perf can be used if your OVS
> +daemon has been compiled with debug symbols.
> +
> +First, identify the point in the binary sources where the contention occurs::
> +
> +  $ perf probe -x $(which ovs-vswitchd) -L __netdev_dpdk_vhost_send \
> + |grep -B 3 -A 3 'COVERAGE_INC(vhost_tx_contention)'
> +   }
> +
> +   21  if (unlikely(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) {
> +   22  COVERAGE_INC(vhost_tx_contention);
> +   23  rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
> +   }
> +
> +Then, place a probe at the line where the lock is taken.
> +You can add additional context to catch which port and queue are concerned::
> +
> +  $ perf probe -x $(which ovs-vswitchd) \
> +'vhost_tx_contention=__netdev_dpdk_vhost_send:23 netdev->name:string qid'
> +
> +Finally, gather data and generate a report::
> +
> +  $ perf record -e probe_ovs:vhost_tx_contention -aR sleep 10
> +  [ perf record: Woken up 120 times to write data ]
> +  [ perf record: Captured and wrote 30.151 MB perf.data (356278 samples) ]
> +
> +  $ perf report -F +pid --stdio
> +  # To display the perf.data header info, please use --header/--header-only options.
> +  #
> +  #
> +  # Total Lost Samples: 0
> +  #
> +  # Samples: 356K of event 'probe_ovs:vhost_tx_contention'
> +  # Event count (approx.): 356278
> +  #
> +  # Overhead  Pid:CommandTrace output
> +  #   .  
> +  #
> +  55.57%83332:pmd-c01/id:33  (9e9775) name="vhost0" qid=0
> +  44.43%8:pmd-c15/id:34  (9e9775) name="vhost0" qid=0
> +
> +
> +  #
> +  # (Tip: Treat branches as callchains: perf report --branch-history)
> +  #
> diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> index 7f709ff..3525870 100644
> --- a/lib/netdev-dpdk.c
> +++ b/lib/netdev-dpdk.c
> @@ -41,6 +41,7 @@
>  #include 
>  
>  #include "cmap.h"
> +#include "coverage.h"
>  #include "dirs.h"
>  #include "dp-packet.h"
>  #include "dpdk.h"
> @@ -72,6 +73,8 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
>  VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
>  static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
>  
> +COVERAGE_DEFINE(vhost_tx_contention);
> +
>  #define DPDK_PORT_WATCHDOG_INTERVAL 5
>  
>  #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
> @@ -2353,7 +2356,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>  goto out;
>  }
>  
> -rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
> +if (unlikely(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) {
> +COVERAGE_INC(vhost_tx_contention);
> +rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
> +}
>  
>  cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
>  /* Check has QoS has been configured for the netdev */

Re: [ovs-dev] [PATCH v2] netdev-dpdk: Track vhost tx contention.

2019-10-13 Thread 0-day Robot
Bleep bloop.  Greetings David Marchand, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Line is 84 characters long (recommended limit is 79)
#68 FILE: Documentation/topics/dpdk/vhost-user.rst:669:
  # To display the perf.data header info, please use --header/--header-only options.

Lines checked: 121, Warnings: 1, Errors: 0


Please check this out.  If you feel there has been an error, please email acon...@redhat.com

Thanks,
0-day Robot


[ovs-dev] [PATCH v2] netdev-dpdk: Track vhost tx contention.

2019-10-13 Thread David Marchand
Add a coverage counter to help diagnose contention on the vhost txqs.
This is seen as dropped packets on the physical ports for rates that
are usually handled fine by OVS.
Document how to further debug this contention with perf.

Signed-off-by: David Marchand 
---
Changelog since v1:
- added documentation as a bonus: not sure this is the right place, or if it
  really makes sense to enter into such details. But I still find it useful.
  Comments?

---
 Documentation/topics/dpdk/vhost-user.rst | 61 
 lib/netdev-dpdk.c|  8 -
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index fab87bd..c7e605e 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -623,3 +623,64 @@ Because of this limitation, this feature is considered 'experimental'.
 Further information can be found in the
 `DPDK documentation
 `__
+
+Troubleshooting vhost-user tx contention
+
+
+Depending on the number of Rx queues enabled by a guest on a virtio port and
+on the number of PMDs used on the OVS side, OVS can end up with contention
+occurring on the lock protecting the vhost Tx queue.
+This problem can be hard to catch since it is noticeable as an increased cpu
+cost for handling the received packets and, usually, as drops in the
+statistics of the physical port receiving the packets.
+
+To identify such a situation, a coverage statistic is available::
+
+  $ ovs-appctl coverage/read-counter vhost_tx_contention
+  59530681
+
+If you want to further debug this contention, perf can be used if your OVS
+daemon has been compiled with debug symbols.
+
+First, identify the point in the binary sources where the contention occurs::
+
+  $ perf probe -x $(which ovs-vswitchd) -L __netdev_dpdk_vhost_send \
+ |grep -B 3 -A 3 'COVERAGE_INC(vhost_tx_contention)'
+   }
+
+   21  if (unlikely(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) {
+   22  COVERAGE_INC(vhost_tx_contention);
+   23  rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
+   }
+
+Then, place a probe at the line where the lock is taken.
+You can add additional context to catch which port and queue are concerned::
+
+  $ perf probe -x $(which ovs-vswitchd) \
+'vhost_tx_contention=__netdev_dpdk_vhost_send:23 netdev->name:string qid'
+
+Finally, gather data and generate a report::
+
+  $ perf record -e probe_ovs:vhost_tx_contention -aR sleep 10
+  [ perf record: Woken up 120 times to write data ]
+  [ perf record: Captured and wrote 30.151 MB perf.data (356278 samples) ]
+
+  $ perf report -F +pid --stdio
+  # To display the perf.data header info, please use --header/--header-only options.
+  #
+  #
+  # Total Lost Samples: 0
+  #
+  # Samples: 356K of event 'probe_ovs:vhost_tx_contention'
+  # Event count (approx.): 356278
+  #
+  # Overhead  Pid:CommandTrace output
+  #   .  
+  #
+  55.57%83332:pmd-c01/id:33  (9e9775) name="vhost0" qid=0
+  44.43%8:pmd-c15/id:34  (9e9775) name="vhost0" qid=0
+
+
+  #
+  # (Tip: Treat branches as callchains: perf report --branch-history)
+  #
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 7f709ff..3525870 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -41,6 +41,7 @@
 #include 
 
 #include "cmap.h"
+#include "coverage.h"
 #include "dirs.h"
 #include "dp-packet.h"
 #include "dpdk.h"
@@ -72,6 +73,8 @@ enum {VIRTIO_RXQ, VIRTIO_TXQ, VIRTIO_QNUM};
 VLOG_DEFINE_THIS_MODULE(netdev_dpdk);
 static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
 
+COVERAGE_DEFINE(vhost_tx_contention);
+
 #define DPDK_PORT_WATCHDOG_INTERVAL 5
 
 #define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
@@ -2353,7 +2356,10 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
 goto out;
 }
 
-rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
+if (unlikely(!rte_spinlock_trylock(&dev->tx_q[qid].tx_lock))) {
+COVERAGE_INC(vhost_tx_contention);
+rte_spinlock_lock(&dev->tx_q[qid].tx_lock);
+}
 
 cnt = netdev_dpdk_filter_packet_len(dev, cur_pkts, cnt);
 /* Check has QoS has been configured for the netdev */
-- 
1.8.3.1
