Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-20 Thread Ilya Maximets
On 9/19/23 13:57, Robin Jarry wrote:
> Ilya Maximets, Sep 19, 2023 at 13:47:
>> With flexibility of appctl comes absolutely no guarantee for API
>> stability.  But as soon as we have structured output, someone will
>> expect it.  If we can agree that users cannot rely on the structure
>> of that structured output, then it's fine.  Otherwise, OVSDB with
>> its defined schema is a much better choice, IMO.  Constructing a
>> single 'select' transaction for OVSDB isn't really much more
> difficult than constructing an appctl JSON-RPC request.
> 
> I would argue that the ovsdb schema could also be modified so I guess
> this comes down to deciding whether the API can be broken or not.

Schema can be modified, but only in major releases.  And columns can't
be removed from the schema or changed, only added.

Appctl output can change completely even in a minor release.

> However, going through ovsdb simply as an API proxy to query live stats
> from ovs-vswitchd seems complex and not resource-efficient, especially if
> the appctl socket is already available and allows reaching vswitchd
> directly.
> 
> I think that for statistics, it would make more sense to go with the
> lightweight option.

OVSDB has a few advantages.  The first is that you may actually avoid waking
up ovs-vswitchd if you're fine with a couple of seconds of delay in the stats.
And I'm not convinced that many users actually need higher precision.
Things like a Prometheus collector definitely don't need that.  In that
sense, the database solution is actually much lighter.

What kind of use case do you have in mind for these stats?  Who is the
consumer?

The second advantage is a potential ability to expose the stats over the
network, instead of only to processes that run locally and have enough
privileges to talk to ovs-vswitchd directly.

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-19 Thread Robin Jarry
Ilya Maximets, Sep 19, 2023 at 13:47:
> With flexibility of appctl comes absolutely no guarantee for API
> stability.  But as soon as we have structured output, someone will
> expect it.  If we can agree that users cannot rely on the structure
> of that structured output, then it's fine.  Otherwise, OVSDB with
> its defined schema is a much better choice, IMO.  Constructing a
> single 'select' transaction for OVSDB isn't really much more
> difficult than constructing an appctl JSON-RPC request.

I would argue that the ovsdb schema could also be modified so I guess
this comes down to deciding whether the API can be broken or not.

However, going through ovsdb simply as an API proxy to query live stats
from ovs-vswitchd seems complex and not resource-efficient, especially if
the appctl socket is already available and allows reaching vswitchd
directly.

I think that for statistics, it would make more sense to go with the
lightweight option.



Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-19 Thread Ilya Maximets
On 9/19/23 13:27, Eelco Chaudron wrote:
> 
> 
> On 19 Sep 2023, at 10:42, Robin Jarry wrote:
> 
>> Adrian Moreno, Sep 19, 2023 at 09:18:
>>>> Both OVSDB and appctl are literally JSON-RPC protocols.  There is no
>>>> need to re-invent anything.
>>>
>>> Right. Isn't appctl simpler in this case? IIUC, it would still satisfy
>>> the requirements: client decides update interval, flexible schema, etc.,
>>> with the benefits of incurring less cost (no ovs-vswitchd <-> ovsdb
>>> communication, no need to store data in both places) and probably
>>> involving less internal code change.
>>>
>>> Just to clarify: I'm referring to allowing JSON output of the (already
>>> JSON-RPC) appctl protocol.
>>
>> I agree. Using ovsdb for ephemeral stats seems weird to me. appctl and

OVSDB already contains a lot of ephemeral stats.

>> a more fluid schema/data structure would be suitable.
>>
>> What kind of API did you have in mind to structure the JSON output?
> 
> I guess we should use the appctl json API. I’m including Jakob, who is going
> to come up with a POC to see if we can add json support for the existing
> appctl command. Probably starting with the “dpif-netdev/pmd-perf-show”
> output, but I'll let him comment.

With flexibility of appctl comes absolutely no guarantee for API
stability.  But as soon as we have structured output, someone will
expect it.  If we can agree that users cannot rely on the structure
of that structured output, then it's fine.  Otherwise, OVSDB with
its defined schema is a much better choice, IMO.  Constructing a
single 'select' transaction for OVSDB isn't really much more
difficult than constructing an appctl JSON-RPC request.
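
For illustration only, the two requests might look roughly like this,
sketched as Python dicts (the Interface table and 'statistics' column are
just stand-ins; a table or columns for PMD stats would still need to be
defined):

    # OVSDB 'select' transaction (RFC 7047).
    ovsdb_request = {
        "id": 0,
        "method": "transact",
        "params": [
            "Open_vSwitch",
            {"op": "select", "table": "Interface",
             "where": [], "columns": ["name", "statistics"]},
        ],
    }

    # appctl/unixctl JSON-RPC request: the command name is the method and
    # the command arguments are the params.
    appctl_request = {
        "id": 0,
        "method": "dpif-netdev/pmd-perf-show",
        "params": [],
    }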

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-19 Thread Eelco Chaudron


On 19 Sep 2023, at 10:42, Robin Jarry wrote:

> Adrian Moreno, Sep 19, 2023 at 09:18:
>>> Both OVSDB and appctl are literally JSON-RPC protocols.  There is no
>>> need to re-invent anything.
>>
>> Right. Isn't appctl simpler in this case? IIUC, it would still satisfy
>> the requirements: client decides update interval, flexible schema, etc.,
>> with the benefits of incurring less cost (no ovs-vswitchd <-> ovsdb
>> communication, no need to store data in both places) and probably
>> involving less internal code change.
>>
>> Just to clarify: I'm referring to allowing JSON output of the (already
>> JSON-RPC) appctl protocol.
>
> I agree. Using ovsdb for ephemeral stats seems weird to me. appctl and
> a more fluid schema/data structure would be suitable.
>
> What kind of API did you have in mind to structure the JSON output?

I guess we should use the appctl json API. I’m including Jakob, who is going to
come up with a POC to see if we can add json support for the existing appctl
command. Probably starting with the “dpif-netdev/pmd-perf-show” output, but I'll
let him comment.

//Eelco



Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-19 Thread Robin Jarry
Adrian Moreno, Sep 19, 2023 at 09:18:
> > Both OVSDB and appctl are literally JSON-RPC protocols.  There is no
> > need to re-invent anything.
>
> Right. Isn't appctl simpler in this case? IIUC, it would still satisfy
> the requirements: client decides update interval, flexible schema, etc.,
> with the benefits of incurring less cost (no ovs-vswitchd <-> ovsdb
> communication, no need to store data in both places) and probably
> involving less internal code change.
>
> Just to clarify: I'm referring to allowing JSON output of the (already
> JSON-RPC) appctl protocol.

I agree. Using ovsdb for ephemeral stats seems weird to me. appctl and
a more fluid schema/data structure would be suitable.

What kind of API did you have in mind to structure the JSON output?



Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-19 Thread Adrian Moreno



On 9/18/23 20:24, Ilya Maximets wrote:

On 9/12/23 15:47, Eelco Chaudron wrote:



On 12 Sep 2023, at 15:19, Robin Jarry wrote:


Eelco Chaudron, Sep 12, 2023 at 09:17:

I feel like if we do need another way of getting (real time)
statistics out of OVS, we should use the same communication channel as
the other ovs-xxx utilities are using. But rather than returning
text-based responses, we might be able to make it JSON (which is
already used by the dbase). I know that Adrian is already
investigating machine-readable output for some existing utilities,
maybe it can be extended for the (pmd) statistics use case.

Using something like the DPDK telemetry socket, might not work for
other use cases where DPDK is not in play.


Maybe the telemetry socket code could be reused even when DPDK is not in
play.


Many distributions like Debian and Ubuntu do not even build their main
OVS packages with DPDK.  They have a separate DPDK-enabled package.  So,
this telemetry will not be available.  Also, in order to use telemetry,
you need to initialize DPDK, which is a heavy and completely unnecessary
operation if DPDK is not going to be used.


It already has all the APIs to return structured data and
serialize it to JSON. It would be nice not to have to reinvent the
wheel.


Both OVSDB and appctl are literally JSON-RPC protocols.  There is no need
to re-invent anything.



Right. Isn't appctl simpler in this case? IIUC, it would still satisfy the
requirements: client decides update interval, flexible schema, etc., with the
benefits of incurring less cost (no ovs-vswitchd <-> ovsdb communication, no
need to store data in both places) and probably involving less internal code change.


Just to clarify: I'm referring to allowing JSON output of the (already JSON-RPC) 
appctl protocol.



But this is a new way of connecting to OVS, and I feel like we should keep
the existing infrastructure and not add another connection type. This would
make it easy for existing tools to also benefit from the new format over the
existing connection methods.

Any input from others in the community?


I agree, addition of a new connection type doesn't look good to me either.


I had considered using ovsdb but it seemed to me
less suitable for a few reasons:

* I had understood that ovsdb is a configuration database, not a state
   reporting database.


OVSDB already reports port statistics and some system stats, so
it's fine to expose things like that.  They are usually ephemeral
columns that do not end up written on disk.


* To have reliable and up-to-date numbers, ovs would need to push them
   at a high rate to the database so that clients do not get outdated cpu
   usage. The DPDK telemetry socket is real-time, the current numbers are
   returned on every request.


There is a mechanism to wait for an ovs-vswitchd reply on request.
So, ovs-vswitchd may update the stats in the database the moment they
are requested by the client.  Should not be an issue.  This is
working today for port status and some other things.


* I would need to define a custom schema / table to store structured
   information in the db. The DPDK telemetry socket already has a schema
   defined for this.


It's true that we'll need some new columns and maybe a table, but
that should not be hard to do.  And it's even better because we'll
be able to define columns that make sense for OVS.


* Accessing ovsdb requires a library, making it more complex to use for
   telemetry scrapers. The DPDK telemetry socket can be accessed with
   a standalone python script with no external dependencies[1].


Accessing OVSDB doesn't require a library; it's just JSON-RPC [1].
We do provide our own implementation of the protocol, but there
is no need to use it, especially for basic "list-all" types of requests.
Most languages, like Python, have built-in JSON libraries.  Some have
JSON-RPC libraries.

[1] https://www.rfc-editor.org/rfc/rfc7047

Best regards, Ilya Maximets.




--
Adrián Moreno



Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-18 Thread Ilya Maximets
On 9/12/23 15:47, Eelco Chaudron wrote:
> 
> 
> On 12 Sep 2023, at 15:19, Robin Jarry wrote:
> 
>> Eelco Chaudron, Sep 12, 2023 at 09:17:
>>> I feel like if we do need another way of getting (real time)
>>> statistics out of OVS, we should use the same communication channel as
>>> the other ovs-xxx utilities are using. But rather than returning
>>> text-based responses, we might be able to make it JSON (which is
>>> already used by the dbase). I know that Adrian is already
>>> investigating machine-readable output for some existing utilities,
>>> maybe it can be extended for the (pmd) statistics use case.
>>>
>>> Using something like the DPDK telemetry socket, might not work for
>>> other use cases where DPDK is not in play.
>>
>> Maybe the telemetry socket code could be reused even when DPDK is not in
>> play.

Many distributions like Debian and Ubuntu do not even build their main
OVS packages with DPDK.  They have a separate DPDK-enabled package.  So,
this telemetry will not be available.  Also, in order to use telemetry,
you need to initialize DPDK, which is a heavy and completely unnecessary
operation if DPDK is not going to be used.

>> It already has all the APIs to return structured data and
>> serialize it to JSON. It would be nice not to have to reinvent the
>> wheel.

Both OVSDB and appctl are literally JSON-RPC protocols.  There is no need
to re-invent anything.

> But this is a new way of connecting to OVS, and I feel like we should keep
> the existing infrastructure and not add another connection type. This would
> make it easy for existing tools to also benefit from the new format over the
> existing connection methods.
> 
> Any input from others in the community?

I agree, addition of a new connection type doesn't look good to me either.

> I had considered using ovsdb but it seemed to me
> less suitable for a few reasons:
> 
> * I had understood that ovsdb is a configuration database, not a state
>   reporting database.

OVSDB already reports port statistics and some system stats, so
it's fine to expose things like that.  They are usually ephemeral
columns that do not end up written on disk.

> * To have reliable and up-to-date numbers, ovs would need to push them
>   at a high rate to the database so that clients do not get outdated cpu
>   usage. The DPDK telemetry socket is real-time, the current numbers are
>   returned on every request.

There is a mechanism to wait for an ovs-vswitchd reply on request.
So, ovs-vswitchd may update the stats in the database the moment they
are requested by the client.  Should not be an issue.  This is
working today for port status and some other things.

> * I would need to define a custom schema / table to store structured
>   information in the db. The DPDK telemetry socket already has a schema
>   defined for this.

It's true that we'll need some new columns and maybe a table, but
that should not be hard to do.  And it's even better because we'll
be able to define columns that make sense for OVS.

> * Accessing ovsdb requires a library, making it more complex to use for
>   telemetry scrapers. The DPDK telemetry socket can be accessed with
>   a standalone python script with no external dependencies[1].

Accessing OVSDB doesn't require a library; it's just JSON-RPC [1].
We do provide our own implementation of the protocol, but there
is no need to use it, especially for basic "list-all" types of requests.
Most languages, like Python, have built-in JSON libraries.  Some have
JSON-RPC libraries.

[1] https://www.rfc-editor.org/rfc/rfc7047
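
As a rough sketch of the kind of dependency-free scraper this allows,
using only the Python standard library (the db.sock path and the
Interface/statistics columns are assumptions, adjust as needed):

    #!/usr/bin/env python3
    import json
    import socket

    DB_SOCK = "/var/run/openvswitch/db.sock"  # assumed default location

    request = {
        "id": 0,
        "method": "transact",
        "params": ["Open_vSwitch",
                   {"op": "select", "table": "Interface",
                    "where": [], "columns": ["name", "statistics"]}],
    }

    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(DB_SOCK)
        sock.sendall(json.dumps(request).encode())
        buf = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            buf += chunk
            try:
                reply = json.loads(buf)  # stop once the reply is complete
                break
            except ValueError:
                continue

    for row in reply["result"][0]["rows"]:
        # OVSDB maps are encoded as ["map", [[key, value], ...]].
        print(row["name"], dict(row["statistics"][1]))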

Best regards, Ilya Maximets.


Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-12 Thread Eelco Chaudron



On 12 Sep 2023, at 15:19, Robin Jarry wrote:

> Eelco Chaudron, Sep 12, 2023 at 09:17:
>> I feel like if we do need another way of getting (real time)
>> statistics out of OVS, we should use the same communication channel as
>> the other ovs-xxx utilities are using. But rather than returning
>> text-based responses, we might be able to make it JSON (which is
>> already used by the dbase). I know that Adrian is already
>> investigating machine-readable output for some existing utilities,
>> maybe it can be extended for the (pmd) statistics use case.
>>
>> Using something like the DPDK telemetry socket, might not work for
>> other use cases where DPDK is not in play.
>
> Maybe the telemetry socket code could be reused even when DPDK is not in
> play. It already has all the APIs to return structured data and
> serialize it to JSON. It would be nice not to have to reinvent the
> wheel.

But this is a new way of connecting to OVS, and I feel like we should keep
the existing infrastructure and not add another connection type. This would
make it easy for existing tools to also benefit from the new format over the
existing connection methods.

Any input from others in the community? Adrian maybe you can share your 
research, ideas?

//Eelco



Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-12 Thread Robin Jarry
Eelco Chaudron, Sep 12, 2023 at 09:17:
> I feel like if we do need another way of getting (real time)
> statistics out of OVS, we should use the same communication channel as
> the other ovs-xxx utilities are using. But rather than returning
> text-based responses, we might be able to make it JSON (which is
> already used by the dbase). I know that Adrian is already
> investigating machine-readable output for some existing utilities,
> maybe it can be extended for the (pmd) statistics use case.
>
> Using something like the DPDK telemetry socket, might not work for
> other use cases where DPDK is not in play.

Maybe the telemetry socket code could be reused even when DPDK is not in
play. It already has all the APIs to return structured data and
serialize it to JSON. It would be nice not to have to reinvent the
wheel.



Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-12 Thread Eelco Chaudron


On 11 Sep 2023, at 12:41, Robin Jarry wrote:

> Hey Kevin,
>
> Kevin Traynor, Sep 07, 2023 at 15:37:
>> This came up in conversation with other maintainers as I mentioned I was
>> reviewing, and the question raised was - Why add this? If you want these
>> values exposed, wouldn't it be better to add them to ovsdb?
>
> That's a good point. I had considered using ovsdb but it seemed to me
> less suitable for a few reasons:
>
> * I had understood that ovsdb is a configuration database, not a state
>   reporting database.
>
> * To have reliable and up-to-date numbers, ovs would need to push them
>   at a high rate to the database so that clients do not get outdated cpu
>   usage. The DPDK telemetry socket is real-time, the current numbers are
>   returned on every request.
>
> * I would need to define a custom schema / table to store structured
>   information in the db. The DPDK telemetry socket already has a schema
>   defined for this.
>
> * Accessing ovsdb requires a library, making it more complex to use for
>   telemetry scrapers. The DPDK telemetry socket can be accessed with
>   a standalone python script with no external dependencies[1].
>
> [1]: 
> https://github.com/rjarry/dpdk/blob/main/usertools/prometheus-dpdk-exporter.py#L135-L143
>
> Maybe my observations are wrong, please do correct me if they are.

I feel like if we do need another way of getting (real time) statistics out of 
OVS, we should use the same communication channel as the other ovs-xxx 
utilities are using. But rather than returning text-based responses, we might 
be able to make it JSON (which is already used by the dbase). I know that 
Adrian is already investigating machine-readable output for some existing 
utilities, maybe it can be extended for the (pmd) statistics use case.

Using something like the DPDK telemetry socket, might not work for other use 
cases where DPDK is not in play.

>> Are you looking for individual lcore usage with identification of that
>> pmd? or overall aggregate usage ?
>>
>> I ask because it will report lcore id's which would need to be mapped to
>> pmd core id's for anything regarding individual pmds.
>>
>> That can be found in ovs-vswitchd.log or checked locally with
>> 'ovs-appctl dpdk/lcore-list', but presumably if those were available, then
>> the user would not be using dpdk telemetry anyway.
>
> I would assume that the important data is the aggregate usage for
> overall monitoring and resource planning. Individual pmd usage can be
> accessed for fine-tuning and debugging via appctl.
>
>> These stats are cumulative so in the absence of 'ovs-appctl
>> dpif-netdev/pmd-stats-clear'  that would need to be taken care of with
>> some post-processing by whatever is pulling these stats - otherwise
>> you'll get cumulative stats for an unknown time period and unknown
>> traffic profile (e.g. it would be counting before any traffic started).
>>
>> These might also be reset with pmd-stats-clear independently, so that
>> would need to be accounted for too.
>
> The only important data point that we need is the ratio between
> busy/(busy + idle) over a specified time delta, which any scraper can
> compute. I consider these numbers like any other counter that can
> eventually be reset.
>
> See this reply from Morten Brørup on dpdk-dev for more context:
>
> https://lore.kernel.org/dpdk-dev/98cbd80474fa8b44bf855df32c47dc35d87...@smartserver.smartshare.dk/
>
>> Another thing I noticed is that without the pmd-sleep info the stats in
>> isolation can be misleading. Example below:
>>
>> With low rate traffic and clearing stats between 10 sec runs
>>
>> 2023-09-07T13:14:56Z|00158|dpif_netdev|INFO|PMD max sleep request is 0
>> usecs.
>> 2023-09-07T13:14:56Z|00159|dpif_netdev|INFO|PMD load based sleeps are
>> disabled.
>>
>> Time: 13:15:06.842
>> Measurement duration: 10.009 s
>>
>> pmd thread numa_id 0 core_id 8:
>>
>>Iterations: 51712564  (0.19 us/it)
>>- Used TSC cycles:   26021354654  (100.0 % of total cycles)
>>- idle iterations:  51710963  ( 99.9 % of used cycles)
>>- busy iterations:  1601  (  0.1 % of used cycles)
>>- sleep iterations:0  (  0.0 % of iterations)
>> ^^^ can see here that pmd does not sleep and is 0.1% busy
>>
>>Sleep time (us):   0  (  0 us/iteration avg.)
>>Rx packets:37250  (4 Kpps, 866 cycles/pkt)
>>Datapath passes:   37250  (1.00 passes/pkt)
>>- PHWOL hits:  0  (  0.0 %)
>>- MFEX Opt hits:   0  (  0.0 %)
>>- Simple Match hits:   37250  (100.0 %)
>>- EMC hits:0  (  0.0 %)
>>- SMC hits:0  (  0.0 %)
>>- Megaflow hits:   0  (  0.0 %, 0.00 subtbl lookups/hit)
>>- Upcalls: 0  (  0.0 %, 0.0 us/upcall)
>>- Lost upcalls:0  (  0.0 %)
>>Tx packets:0
>>
>> {
>>"/eal/lcore/usage": {
>>  "lcore_ids": [
>>1
>>  ],
>>  

Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-11 Thread Robin Jarry
Hey Kevin,

Kevin Traynor, Sep 07, 2023 at 15:37:
> This came up in conversation with other maintainers as I mentioned I was
> reviewing, and the question raised was - Why add this? If you want these
> values exposed, wouldn't it be better to add them to ovsdb?

That's a good point. I had considered using ovsdb but it seemed to me
less suitable for a few reasons:

* I had understood that ovsdb is a configuration database, not a state
  reporting database.

* To have reliable and up-to-date numbers, ovs would need to push them
  at a high rate to the database so that clients do not get outdated cpu
  usage. The DPDK telemetry socket is real-time, the current numbers are
  returned on every request.

* I would need to define a custom schema / table to store structured
  information in the db. The DPDK telemetry socket already has a schema
  defined for this.

* Accessing ovsdb requires a library, making it more complex to use for
  telemetry scrapers. The DPDK telemetry socket can be accessed with
  a standalone python script with no external dependencies[1].

[1]: 
https://github.com/rjarry/dpdk/blob/main/usertools/prometheus-dpdk-exporter.py#L135-L143
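
A stripped-down sketch of that kind of client, using only the standard
library (this is not the linked script, and the socket path is an
assumption that depends on the DPDK runtime directory):

    #!/usr/bin/env python3
    import json
    import socket

    TELEMETRY_SOCK = "/var/run/dpdk/rte/dpdk_telemetry.v2"  # assumed default

    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(TELEMETRY_SOCK)
        # The server greets the client with an info message that includes
        # the maximum reply size it will ever send.
        banner = json.loads(sock.recv(1024))
        max_len = banner["max_output_len"]
        sock.sendall(b"/eal/lcore/usage")
        usage = json.loads(sock.recv(max_len))
        print(json.dumps(usage["/eal/lcore/usage"], indent=2))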

Maybe my observations are wrong, please do correct me if they are.

> Are you looking for individual lcore usage with identification of that 
> pmd? or overall aggregate usage ?
>
> I ask because it will report lcore id's which would need to be mapped to 
> pmd core id's for anything regarding individual pmds.
>
> That can be found in ovs-vswitchd.log or checked locally with
> 'ovs-appctl dpdk/lcore-list', but presumably if those were available, then
> the user would not be using dpdk telemetry anyway.

I would assume that the important data is the aggregate usage for
overall monitoring and resource planning. Individual pmd usage can be
accessed for fine-tuning and debugging via appctl.

> These stats are cumulative so in the absence of 'ovs-appctl 
> dpif-netdev/pmd-stats-clear'  that would need to be taken care of with 
> some post-processing by whatever is pulling these stats - otherwise 
> you'll get cumulative stats for an unknown time period and unknown 
> traffic profile (e.g. it would be counting before any traffic started).
>
> These might also be reset with pmd-stats-clear independently, so that 
> would need to be accounted for too.

The only important data point that we need is the ratio between
busy/(busy + idle) over a specified time delta, which any scraper can
compute. I consider these numbers like any other counter that can
eventually be reset.

See this reply from Morten Brørup on dpdk-dev for more context:

https://lore.kernel.org/dpdk-dev/98cbd80474fa8b44bf855df32c47dc35d87...@smartserver.smartshare.dk/
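
A small sketch of that delta computation, the way a scraper might do it on
the cumulative busy_cycles/total_cycles counters (total being roughly
busy + idle); the reset handling is an assumption about how a collector
could cope with pmd-stats-clear:

    def busy_ratio(prev, cur):
        """prev/cur: cumulative {'busy_cycles': ..., 'total_cycles': ...}."""
        d_busy = cur["busy_cycles"] - prev["busy_cycles"]
        d_total = cur["total_cycles"] - prev["total_cycles"]
        if d_busy < 0 or d_total <= 0:
            # Counters went backwards (e.g. pmd-stats-clear): skip interval.
            return None
        return d_busy / d_total

    # Two samples of /eal/lcore/usage taken roughly 10 seconds apart.
    prev = {"busy_cycles": 32_370_313, "total_cycles": 26_127_284_389}
    cur = {"busy_cycles": 64_740_000, "total_cycles": 52_254_000_000}
    print(busy_ratio(prev, cur))  # ~0.0012, i.e. about 0.12% busy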

> Another thing I noticed is that without the pmd-sleep info the stats in 
> isolation can be misleading. Example below:
>
> With low rate traffic and clearing stats between 10 sec runs
>
> 2023-09-07T13:14:56Z|00158|dpif_netdev|INFO|PMD max sleep request is 0 
> usecs.
> 2023-09-07T13:14:56Z|00159|dpif_netdev|INFO|PMD load based sleeps are 
> disabled.
>
> Time: 13:15:06.842
> Measurement duration: 10.009 s
>
> pmd thread numa_id 0 core_id 8:
>
>Iterations: 51712564  (0.19 us/it)
>- Used TSC cycles:   26021354654  (100.0 % of total cycles)
>- idle iterations:  51710963  ( 99.9 % of used cycles)
>- busy iterations:  1601  (  0.1 % of used cycles)
>- sleep iterations:0  (  0.0 % of iterations)
> ^^^ can see here that pmd does not sleep and is 0.1% busy
>
>Sleep time (us):   0  (  0 us/iteration avg.)
>Rx packets:37250  (4 Kpps, 866 cycles/pkt)
>Datapath passes:   37250  (1.00 passes/pkt)
>- PHWOL hits:  0  (  0.0 %)
>- MFEX Opt hits:   0  (  0.0 %)
>- Simple Match hits:   37250  (100.0 %)
>- EMC hits:0  (  0.0 %)
>- SMC hits:0  (  0.0 %)
>- Megaflow hits:   0  (  0.0 %, 0.00 subtbl lookups/hit)
>- Upcalls: 0  (  0.0 %, 0.0 us/upcall)
>- Lost upcalls:0  (  0.0 %)
>Tx packets:0
>
> {
>"/eal/lcore/usage": {
>  "lcore_ids": [
>1
>  ],
>  "total_cycles": [
>26127284389
>  ],
>  "busy_cycles": [
>32370313
>  ]
>}
> }
>
> ^^^ This in isolation implies the pmd is 32370313/26127284389 = 0.12% busy,
> which is true
>
> 2023-09-07T13:15:06Z|00160|dpif_netdev|INFO|PMD max sleep request is 500 
> usecs.
> 2023-09-07T13:15:06Z|00161|dpif_netdev|INFO|PMD load based sleeps are 
> enabled.
>
> Time: 13:15:16.908
> Measurement duration: 10.008 s
>
> pmd thread numa_id 0 core_id 8:
>
>Iterations:75197  (133.09 us/it)
>- Used TSC cycles: 237910969  (  0.9 % of total cycles)
>- idle iterations: 73782  ( 74.4 % of used cycles)
>- busy iterations:  1415  ( 25.6 % of used cycles)
>- sleep iterations:

Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-09-07 Thread Kevin Traynor



On 31/08/2023 14:11, Robin Jarry wrote:

Since DPDK 23.03, it is possible to register a callback to report lcore
TSC cycles usage. Reuse the busy/idle cycles gathering in dpif-netdev
and expose them to the DPDK telemetry socket.



Hi Robin,

This came up in conversation with other maintainers as I mentioned I was
reviewing, and the question raised was - Why add this? If you want these
values exposed, wouldn't it be better to add them to ovsdb?


Above is probably a more important starting point, but I have some 
comments on the stats from this approach below too.


thanks,
Kevin.


Upon dpdk_attach_thread, record the mapping between the DPDK lcore_id
and the dpif-netdev core_id. Reuse that mapping in the lcore usage
callback to invoke dpif_netdev_get_pmd_cycles.

Here is an example output:

   ~# ovs-appctl dpif-netdev/pmd-stats-show | grep -e ^pmd -e cycles:
   pmd thread numa_id 0 core_id 8:
 idle cycles: 2720796781680 (100.00%)
 processing cycles: 3566020 (0.00%)
   pmd thread numa_id 0 core_id 9:
 idle cycles: 2718974371440 (100.00%)
 processing cycles: 3136840 (0.00%)
   pmd thread numa_id 0 core_id 72:
   pmd thread numa_id 0 core_id 73:

   ~# echo /eal/lcore/usage | dpdk-telemetry.py | jq
   {
 "/eal/lcore/usage": {
   "lcore_ids": [
 3,
 5,
 11,
 15
   ],


Are you looking for individual lcore usage with identification of that 
pmd? or overall aggregate usage ?


I ask because it will report lcore id's which would need to be mapped to 
pmd core id's for anything regarding individual pmds.


That can be found in ovs-vswitchd.log or checked locally with
'ovs-appctl dpdk/lcore-list', but presumably if those were available, then
the user would not be using dpdk telemetry anyway.



   "total_cycles": [
 2725722342740,
 2725722347480,
 2723899464040,
 2725722354980
   ],
   "busy_cycles": [
 3566020,
 3566020,
 3136840,
 3566020
   ]
 }
   }


These stats are cumulative so in the absence of 'ovs-appctl 
dpif-netdev/pmd-stats-clear'  that would need to be taken care of with 
some post-processing by whatever is pulling these stats - otherwise 
you'll get cumulative stats for an unknown time period and unknown 
traffic profile (e.g. it would be counting before any traffic started).


These might also be reset with pmd-stats-clear independently, so that 
would need to be accounted for too.


Another thing I noticed is that without the pmd-sleep info the stats in 
isolation can be misleading. Example below:


With low rate traffic and clearing stats between 10 sec runs

2023-09-07T13:14:56Z|00158|dpif_netdev|INFO|PMD max sleep request is 0 
usecs.
2023-09-07T13:14:56Z|00159|dpif_netdev|INFO|PMD load based sleeps are 
disabled.


Time: 13:15:06.842
Measurement duration: 10.009 s

pmd thread numa_id 0 core_id 8:

  Iterations: 51712564  (0.19 us/it)
  - Used TSC cycles:   26021354654  (100.0 % of total cycles)
  - idle iterations:  51710963  ( 99.9 % of used cycles)
  - busy iterations:  1601  (  0.1 % of used cycles)
  - sleep iterations:0  (  0.0 % of iterations)
^^^ can see here that pmd does not sleep and is 0.1% busy

  Sleep time (us):   0  (  0 us/iteration avg.)
  Rx packets:37250  (4 Kpps, 866 cycles/pkt)
  Datapath passes:   37250  (1.00 passes/pkt)
  - PHWOL hits:  0  (  0.0 %)
  - MFEX Opt hits:   0  (  0.0 %)
  - Simple Match hits:   37250  (100.0 %)
  - EMC hits:0  (  0.0 %)
  - SMC hits:0  (  0.0 %)
  - Megaflow hits:   0  (  0.0 %, 0.00 subtbl lookups/hit)
  - Upcalls: 0  (  0.0 %, 0.0 us/upcall)
  - Lost upcalls:0  (  0.0 %)
  Tx packets:0

{
  "/eal/lcore/usage": {
"lcore_ids": [
  1
],
"total_cycles": [
  26127284389
],
"busy_cycles": [
  32370313
]
  }
}

^^^ This in isolation implies the pmd is 32370313/26127284389 = 0.12% busy,
which is true


2023-09-07T13:15:06Z|00160|dpif_netdev|INFO|PMD max sleep request is 500 
usecs.
2023-09-07T13:15:06Z|00161|dpif_netdev|INFO|PMD load based sleeps are 
enabled.


Time: 13:15:16.908
Measurement duration: 10.008 s

pmd thread numa_id 0 core_id 8:

  Iterations:75197  (133.09 us/it)
  - Used TSC cycles: 237910969  (  0.9 % of total cycles)
  - idle iterations: 73782  ( 74.4 % of used cycles)
  - busy iterations:  1415  ( 25.6 % of used cycles)
  - sleep iterations:74033  ( 98.5 % of iterations)
^^^ can see here that pmd spends most of the time sleeping and is 25% 
busy when it is not sleeping


  Sleep time (us): 9916314  (134 us/iteration avg.)
  Rx packets:37249  (4 Kpps, 1637 cycles/pkt)
  Datapath passes:   37249  (1.00 passes/pkt)
  - PHWOL hits:  0  (  0.0 %)
  - MFEX Opt 

Re: [ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-08-31 Thread Robin Jarry
Robin Jarry, Aug 31, 2023 at 15:11:
> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> index 70b953ae6dd3..ebf43a0f62e4 100644
> --- a/lib/dpif-netdev.c
> +++ b/lib/dpif-netdev.c
> @@ -1427,6 +1427,41 @@ dpif_netdev_pmd_rebalance(struct unixctl_conn *conn, 
> int argc,
>  ds_destroy(&reply);
>  }
>  
> +static void
> +dpif_netdev_get_pmd_cycles(unsigned int core_id,
> +uint64_t *busy_cycles, uint64_t *total_cycles)
> +{
> +struct dp_netdev_pmd_thread **pmd_list = NULL;
> +uint64_t stats[PMD_N_STATS];
> +struct dp_netdev *dp;
> +size_t num_pmds;
> +
> +ovs_mutex_lock(&dp_netdev_mutex);
> +
> +if (shash_count(&dp_netdevs) != 1) {
> +goto out;
> +}
> +
> +dp = shash_first(&dp_netdevs)->data;
> +sorted_poll_thread_list(dp, &pmd_list, &num_pmds);
> +
> +for (size_t i = 0; i < num_pmds; i++) {
> +struct dp_netdev_pmd_thread *pmd = pmd_list[i];
> +
> +if (pmd->core_id == core_id) {
> +continue;
> +}

This logic is reversed. Too bad for the last minute cleanup/rework.

Before sending a v2, I'll wait to see if there are other things to
change.

By the way, this patch is targeted for the dpdk-latest branch. I forgot
to change the subject prefix. I'll do that for v2 as well.



[ovs-dev] [PATCH] dpdk: expose cpu usage stats on telemetry socket

2023-08-31 Thread Robin Jarry
Since DPDK 23.03, it is possible to register a callback to report lcore
TSC cycles usage. Reuse the busy/idle cycles gathering in dpif-netdev
and expose them to the DPDK telemetry socket.

Upon dpdk_attach_thread, record the mapping between the DPDK lcore_id
and the dpif-netdev core_id. Reuse that mapping in the lcore usage
callback to invoke dpif_netdev_get_pmd_cycles.

Here is an example output:

  ~# ovs-appctl dpif-netdev/pmd-stats-show | grep -e ^pmd -e cycles:
  pmd thread numa_id 0 core_id 8:
idle cycles: 2720796781680 (100.00%)
processing cycles: 3566020 (0.00%)
  pmd thread numa_id 0 core_id 9:
idle cycles: 2718974371440 (100.00%)
processing cycles: 3136840 (0.00%)
  pmd thread numa_id 0 core_id 72:
  pmd thread numa_id 0 core_id 73:

  ~# echo /eal/lcore/usage | dpdk-telemetry.py | jq
  {
"/eal/lcore/usage": {
  "lcore_ids": [
3,
5,
11,
15
  ],
  "total_cycles": [
2725722342740,
2725722347480,
2723899464040,
2725722354980
  ],
  "busy_cycles": [
3566020,
3566020,
3136840,
3566020
  ]
}
  }

Link: https://git.dpdk.org/dpdk/commit/?id=9ab1804922ba583b0b16
Cc: David Marchand 
Cc: Kevin Traynor 
Signed-off-by: Robin Jarry 
---
 lib/dpdk-stub.c   |  5 +++
 lib/dpdk.c| 95 ++-
 lib/dpdk.h|  5 +++
 lib/dpif-netdev.c | 38 +++
 4 files changed, 142 insertions(+), 1 deletion(-)

diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index 58ebf6cb62cd..02fb561bea7b 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -49,6 +49,11 @@ dpdk_detach_thread(void)
 {
 }
 
+void
+dpdk_register_core_usage_callback(dpdk_core_usage_cb *cb OVS_UNUSED)
+{
+}
+
 bool
 dpdk_available(void)
 {
diff --git a/lib/dpdk.c b/lib/dpdk.c
index d76d53f8f16c..31871300f719 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -23,6 +23,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -310,6 +311,10 @@ malloc_dump_stats_wrapper(FILE *stream)
 rte_malloc_dump_stats(stream, NULL);
 }
 
+#ifdef ALLOW_EXPERIMENTAL_API
+static int dpdk_get_lcore_cycles(unsigned int, struct rte_lcore_usage *);
+#endif
+
 static bool
 dpdk_init__(const struct smap *ovs_other_config)
 {
@@ -440,6 +445,10 @@ dpdk_init__(const struct smap *ovs_other_config)
 /* We are called from the main thread here */
 RTE_PER_LCORE(_lcore_id) = NON_PMD_CORE_ID;
 
+#ifdef ALLOW_EXPERIMENTAL_API
+rte_lcore_register_usage_cb(dpdk_get_lcore_cycles);
+#endif
+
 /* Finally, register the dpdk classes */
 netdev_dpdk_register(ovs_other_config);
+netdev_register_flow_api_provider(&netdev_offload_dpdk);
@@ -490,9 +499,52 @@ dpdk_available(void)
 return initialized;
 }
 
+struct lcore_id_map {
+unsigned int lcore_id;
+unsigned int pmd_core_id;
+};
+
+/* Protects against changes to 'lcore_id_maps'. */
+struct ovs_mutex lcore_id_maps_mutex = OVS_MUTEX_INITIALIZER;
+
+/* Contains all 'struct lcore_id_map's. */
+static struct shash lcore_id_maps OVS_GUARDED_BY(lcore_id_maps_mutex)
+= SHASH_INITIALIZER(&lcore_id_maps);
+
+static void
+lcore_id_to_str(char *buf, size_t len, unsigned int lcore_id)
+{
+int n;
+
+n = snprintf(buf, len, "%u", lcore_id);
+if (n < 0) {
+VLOG_WARN("Failed to format lcore_id: %s", ovs_strerror(errno));
+n = 0;
+}
+buf[n] = '\0';
+}
+
+static void
+lcore_id_map_update(unsigned int lcore_id, unsigned int cpu, bool add)
+{
+char buf[128];
+
+lcore_id_to_str(buf, sizeof buf, lcore_id);
+
+ovs_mutex_lock(&lcore_id_maps_mutex);
+if (add) {
+shash_replace(&lcore_id_maps, buf, (void *) (uintptr_t) cpu);
+} else {
+shash_find_and_delete(&lcore_id_maps, buf);
+}
+ovs_mutex_unlock(&lcore_id_maps_mutex);
+}
+
 bool
 dpdk_attach_thread(unsigned cpu)
 {
+unsigned int lcore_id;
+
 /* NON_PMD_CORE_ID is reserved for use by non pmd threads. */
 ovs_assert(cpu != NON_PMD_CORE_ID);
 
@@ -506,7 +558,9 @@ dpdk_attach_thread(unsigned cpu)
 return false;
 }
 
-VLOG_INFO("PMD thread uses DPDK lcore %u.", rte_lcore_id());
+lcore_id = rte_lcore_id();
+lcore_id_map_update(lcore_id, cpu, true);
+VLOG_INFO("PMD thread uses DPDK lcore %u.", lcore_id);
 return true;
 }
 
@@ -516,10 +570,49 @@ dpdk_detach_thread(void)
 unsigned int lcore_id;
 
 lcore_id = rte_lcore_id();
+lcore_id_map_update(lcore_id, 0, false);
+
 rte_thread_unregister();
 VLOG_INFO("PMD thread released DPDK lcore %u.", lcore_id);
 }
 
+static dpdk_core_usage_cb_t *core_usage_cb;
+
+void
+dpdk_register_core_usage_callback(dpdk_core_usage_cb_t *cb)
+{
+core_usage_cb = cb;
+}
+
+#ifdef ALLOW_EXPERIMENTAL_API
+static int
+dpdk_get_lcore_cycles(unsigned int lcore_id, struct rte_lcore_usage *usage)
+{
+struct shash_node *node;
+unsigned int core_id;
+char buf[128];
+
+if (!core_usage_cb) {
+return -1;
+}
+
+