Adrian Moreno <amore...@redhat.com> writes:

> On 2/1/24 10:02, Eelco Chaudron wrote:
>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>> 
>>> Eelco Chaudron <echau...@redhat.com> writes:
>>>
>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>
>>>>> From: Kevin Sprague <ksprague0...@gmail.com>
>>>>>
>>>>> During normal operations, it is useful to understand when a particular 
>>>>> flow
>>>>> gets removed from the system. This can be useful when debugging 
>>>>> performance
>>>>> issues tied to ofproto flow changes, trying to determine deployed traffic
>>>>> patterns, or while debugging dynamic systems where ports come and go.
>>>>>
>>>>> Prior to this change, there was a lack of visibility around flow 
>>>>> expiration.
>>>>> The existing debugging infrastructure could tell us when a flow was added 
>>>>> to
>>>>> the datapath, but not when it was removed or why.
>>>>>
>>>>> This change introduces a USDT probe at the point where the revalidator
>>>>> determines that the flow should be removed.  Additionally, we track the
>>>>> reason for the flow eviction and provide that information as well.  With
>>>>> this change, we can track the complete flow lifecycle for the netlink
>>>>> datapath by hooking the upcall tracepoint in kernel, the flow put USDT, 
>>>>> and
>>>>> the revalidator USDT, letting us watch as flows are added and removed from
>>>>> the kernel datapath.
>>>>>
>>>>> This change only enables this information via USDT probe, so it won't be
>>>>> possible to access this information any other way (see:
>>>>> Documentation/topics/usdt-probes.rst).
>>>>>
>>>>> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>> which serves as a demonstration of how the new USDT probe might be used
>>>>> going forward.
>>>>>
>>>>> Signed-off-by: Kevin Sprague <ksprague0...@gmail.com>
>>>>> Co-authored-by: Aaron Conole <acon...@redhat.com>
>>>>> Signed-off-by: Aaron Conole <acon...@redhat.com>
>>>>
>>>> Thanks for following this up Aaron! See comments on this patch below.
>>>> I have no additional comments on patch 2.
>>>>
>>>> Cheers,
>>>>
>>>> Eelco
>>>>
>>>>
>>>>> ---
>>>>>   Documentation/topics/usdt-probes.rst         |   1 +
>>>>>   ofproto/ofproto-dpif-upcall.c                |  42 +-
>>>>>   utilities/automake.mk                        |   3 +
>>>>>   utilities/usdt-scripts/flow_reval_monitor.py | 653 +++++++++++++++++++
>>>>>   4 files changed, 693 insertions(+), 6 deletions(-)
>>>>>   create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>>>>
>>>>> diff --git a/Documentation/topics/usdt-probes.rst 
>>>>> b/Documentation/topics/usdt-probes.rst
>>>>> index e527f43bab..a8da9bb1f7 100644
>>>>> --- a/Documentation/topics/usdt-probes.rst
>>>>> +++ b/Documentation/topics/usdt-probes.rst
>>>>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>>>>   - dpif_recv:recv_upcall
>>>>>   - main:poll_block
>>>>>   - main:run_start
>>>>> +- revalidate:flow_result
>>>>>   - revalidate_ukey\_\_:entry
>>>>>   - revalidate_ukey\_\_:exit
>>>>>   - udpif_revalidator:start_dump
>>>>
>>>> You are missing the specific flow_result result section. This is from the 
>>>> previous patch:
>>>
>>> D'oh!  Thanks for catching it.  I'll re-add it.
>>>
>>>> @@ -358,6 +360,27 @@  See also the ``main:run_start`` probe above.
>>>>   - ``utilities/usdt-scripts/bridge_loop.bt``
>>>>
>>>>
>>>> +probe revalidate:flow_result
>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>> +
>>>> +**Description**:
>>>> +This probe is triggered when the revalidator decides whether or not to
>>>> +revalidate a flow. ``reason`` is an enum that denotes that either the flow
>>>> +is being kept, or the reason why the flow is being deleted. The
>>>> +``flow_reval_monitor.py`` script uses this probe to notify users when 
>>>> flows
>>>> +matching user-provided criteria are deleted.
>>>> +
>>>> +**Arguments**:
>>>> +
>>>> +- *arg0*: ``(enum flow_del_reason) reason``
>>>> +- *arg1*: ``(struct udpif *) udpif``
>>>> +- *arg2*: ``(struct udpif_key *) ukey``
>>>> +
>>>> +**Script references**:
>>>> +
>>>> +- ``utilities/usdt-scripts/flow_reval_monitor.py``
>>>> +
>>>> +
>>>>   Adding your own probes
>>>>   ----------------------
>>>>
>>>>> diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
>>>>> index b5cbeed878..97d75833f7 100644
>>>>> --- a/ofproto/ofproto-dpif-upcall.c
>>>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>>>> @@ -269,6 +269,18 @@ enum ukey_state {
>>>>>   };
>>>>>   #define N_UKEY_STATES (UKEY_DELETED + 1)
>>>>>
>>>>> +enum flow_del_reason {
>>>>> +    FDR_REVALIDATE = 0,     /* The flow was revalidated. */
>>>>
>>>> It was called FDR_FLOW_LIVE before, which might make more sense. As the
>>>> flow is just NOT deleted. It might or might not have been revalidated.
>>>> Thoughts?
>>>
>>> I think it had to have been revalidated if we emit the reason, because
>>> we only emit the reason code after revalidation.  IE: there are many
>>> places where we skip revalidation but the flow stays live - and we don't
>>> emit reasons in those cases.
>>>
>>> So at least for this patch, it MUST have been revalidated.  But maybe in
>>> the future, we would want to catch cases where the flow hasn't been.  In
>>> that case, it makes sense to add the FDR_FLOW_LIVE at that time - I
>>> think.
>>>
>>> Maybe you disagree?
>> Well, it depends on how you define revalidation, it might only have
>> updated the counters. i.e. it all depends on ‘bool need_revalidate =
>> ukey->reval_seq != reval_seq;’ in revalidate_ukey(). That was why I
>> opted for a more general name.
>> 
>>>>> +    FDR_FLOW_IDLE,          /* The flow went unused and was deleted. */
>>>>> +    FDR_TOO_EXPENSIVE,      /* The flow was too expensive to revalidate. 
>>>>> */
>>>>> +    FDR_FLOW_WILDCARDED,    /* The flow needed a narrower wildcard mask. 
>>>>> */
>>>>> +    FDR_BAD_ODP_FIT,        /* The flow had a bad ODP flow fit. */
>>>>> +    FDR_NO_OFPROTO,         /* The flow didn't have an associated 
>>>>> ofproto. */
>>>>> +    FDR_XLATION_ERROR,      /* There was an error translating the flow. 
>>>>> */
>>>>> +    FDR_AVOID_CACHING,      /* Flow deleted to avoid caching. */
>>>>> +    FDR_FLOW_LIMIT,         /* All flows being killed. */
>>>>
>>>> Looking at the comment from Han on FDR_PURGE, and this patch needing
>>>> another spin, we should probably add it.
>>>
>>> I can do that, sure.  In that case, we will need to have a new flow op
>>> added to revalidator_sweep__ so that we can catch it.  But in that case,
>>> it will be a different usdt probe, so I still don't know if we need
>>> FDR_PURGE right?  WDYT?
>> In revalidator_sweep__() you have sort of the following:
>>                  if (purge || ukey_state == UKEY_INCONSISTENT) {
>>                      result = UKEY_DELETE;
>>                  } else if (!seq_mismatch) {
>> And I’m afraid that if we use this tool to debug we miss the
>> ukey_state == UKEY_INCONSISTENT when debugging and spent a long time
>> figuring this out.
>> Maybe add something general like this (did not give it a lot of thought),
>> and only take the FDR_PURGE : FDR_UPDATE_FAIL results in the script?
>>   /* 'udpif_key's are responsible for tracking the little bit of
>> state udpif
>> @@ -2991,13 +2993,13 @@ revalidator_sweep__(struct revalidator *revalidator, 
>> bool purge)
>>           uint64_t odp_actions_stub[1024 / 8];
>>           struct ofpbuf odp_actions = 
>> OFPBUF_STUB_INITIALIZER(odp_actions_stub);
>> -        enum flow_del_reason reason = FDR_REVALIDATE;
>>           struct ukey_op ops[REVALIDATE_MAX_BATCH];
>>           struct udpif_key *ukey;
>>           struct umap *umap = &udpif->ukeys[i];
>>           size_t n_ops = 0;
>>           CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>> +            enum flow_del_reason reason = FDR_REVALIDATE;
>>               enum ukey_state ukey_state;
>>               /* Handler threads could be holding a ukey lock while
>> it installs a
>> @@ -3016,8 +3018,10 @@ revalidator_sweep__(struct revalidator *revalidator, 
>> bool purge)
>>                   if (purge || ukey_state == UKEY_INCONSISTENT) {
>>                       result = UKEY_DELETE;
>> +                    reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL;
>>                   } else if (!seq_mismatch) {
>>                       result = UKEY_KEEP;
>> +                    reason = FDR_REVALIDATE; //_KEEP
>>                   } else {
>>                       struct dpif_flow_stats stats;
>>                       COVERAGE_INC(revalidate_missed_dp_flow);
>> @@ -3030,6 +3034,8 @@ revalidator_sweep__(struct revalidator *revalidator, 
>> bool purge)
>>                       reval_op_init(&ops[n_ops++], result, udpif, ukey, 
>> &recircs,
>>                                     &odp_actions);
>>                   }
>> +                OVS_USDT_PROBE(revalidator_sweep__, flow_result, result,
>> +                               reason, udpif, ukey);
>>               }
>>               ovs_mutex_unlock(&ukey->mutex);
>> In addition in revalidator_sweep__() should the “enum
>> flow_del_reason reason = FDR_REVALIDATE;” not be moved to the
>> CMAP_FOR_EACH() loop?
>> 
>>>>> +};
>>>>> +
>>>>>   /* 'udpif_key's are responsible for tracking the little bit of state 
>>>>> udpif
>>>>>    * needs to do flow expiration which can't be pulled directly from the
>>>>>    * datapath.  They may be created by any handler or revalidator thread 
>>>>> at any
>>>>> @@ -2272,7 +2284,8 @@ populate_xcache(struct udpif *udpif, struct 
>>>>> udpif_key *ukey,
>>>>>   static enum reval_result
>>>>>   revalidate_ukey__(struct udpif *udpif, const struct udpif_key *ukey,
>>>>>                     uint16_t tcp_flags, struct ofpbuf *odp_actions,
>>>>> -                  struct recirc_refs *recircs, struct xlate_cache 
>>>>> *xcache)
>>>>> +                  struct recirc_refs *recircs, struct xlate_cache 
>>>>> *xcache,
>>>>> +                  enum flow_del_reason *reason)
>>>>>   {
>>>>>       struct xlate_out *xoutp;
>>>>>       struct netflow *netflow;
>>>>> @@ -2293,11 +2306,13 @@ revalidate_ukey__(struct udpif *udpif, const struct udpif_key *ukey,
>>>>>       netflow = NULL;
>>>>>
>>>>>       if (xlate_ukey(udpif, ukey, tcp_flags, &ctx)) {
>>>>> +        *reason = FDR_XLATION_ERROR;
>>>>>           goto exit;
>>>>>       }
>>>>>       xoutp = &ctx.xout;
>>>>>
>>>>>       if (xoutp->avoid_caching) {
>>>>> +        *reason = FDR_AVOID_CACHING;
>>>>>           goto exit;
>>>>>       }
>>>>>
>>>>> @@ -2311,6 +2326,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
>>>>> udpif_key *ukey,
>>>>>           ofpbuf_clear(odp_actions);
>>>>>
>>>>>           if (!ofproto) {
>>>>> +            *reason = FDR_NO_OFPROTO;
>>>>>               goto exit;
>>>>>           }
>>>>>
>>>>> @@ -2322,6 +2338,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
>>>>> udpif_key *ukey,
>>>>>       if (odp_flow_key_to_mask(ukey->mask, ukey->mask_len, &dp_mask, 
>>>>> &ctx.flow,
>>>>>                                NULL)
>>>>>           == ODP_FIT_ERROR) {
>>>>> +        *reason = FDR_BAD_ODP_FIT;
>>>>>           goto exit;
>>>>>       }
>>>>>
>>>>> @@ -2331,6 +2348,7 @@ revalidate_ukey__(struct udpif *udpif, const struct 
>>>>> udpif_key *ukey,
>>>>>        * down.  Note that we do not know if the datapath has ignored any 
>>>>> of the
>>>>>        * wildcarded bits, so we may be overly conservative here. */
>>>>>       if (flow_wildcards_has_extra(&dp_mask, ctx.wc)) {
>>>>> +        *reason = FDR_FLOW_WILDCARDED;
>>>>>           goto exit;
>>>>>       }
>>>>>
>>>>> @@ -2400,7 +2418,7 @@ static enum reval_result
>>>>>   revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>>>>>                   const struct dpif_flow_stats *stats,
>>>>>                   struct ofpbuf *odp_actions, uint64_t reval_seq,
>>>>> -                struct recirc_refs *recircs)
>>>>> +                struct recirc_refs *recircs, enum flow_del_reason 
>>>>> *reason)
>>>>>       OVS_REQUIRES(ukey->mutex)
>>>>>   {
>>>>>       bool need_revalidate = ukey->reval_seq != reval_seq;
>>>>> @@ -2430,8 +2448,12 @@ revalidate_ukey(struct udpif *udpif, struct 
>>>>> udpif_key *ukey,
>>>>>                   xlate_cache_clear(ukey->xcache);
>>>>>               }
>>>>>               result = revalidate_ukey__(udpif, ukey, push.tcp_flags,
>>>>> -                                       odp_actions, recircs, 
>>>>> ukey->xcache);
>>>>> -        } /* else delete; too expensive to revalidate */
>>>>> +                                       odp_actions, recircs, 
>>>>> ukey->xcache,
>>>>> +                                       reason);
>>>>> +        } else {
>>>>> +            /* delete; too expensive to revalidate */
>>>>> +            *reason = FDR_TOO_EXPENSIVE;
>>>>> +        }
>>>>>       } else if (!push.n_packets || ukey->xcache
>>>>>                  || !populate_xcache(udpif, ukey, push.tcp_flags)) {
>>>>>           result = UKEY_KEEP;
>>>>> @@ -2831,6 +2853,7 @@ revalidate(struct revalidator *revalidator)
>>>>>           for (f = flows; f < &flows[n_dumped]; f++) {
>>>>>               long long int used = f->stats.used;
>>>>>               struct recirc_refs recircs = RECIRC_REFS_EMPTY_INITIALIZER;
>>>>> +            enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>               struct dpif_flow_stats stats = f->stats;
>>>>>               enum reval_result result;
>>>>>               struct udpif_key *ukey;
>>>>> @@ -2905,9 +2928,14 @@ revalidate(struct revalidator *revalidator)
>>>>>               }
>>>>>               if (kill_them_all || (used && used < now - max_idle)) {
>>>>>                   result = UKEY_DELETE;
>>>>> +                if (kill_them_all) {
>>>>> +                    reason = FDR_FLOW_LIMIT;
>>>>> +                } else {
>>>>> +                    reason = FDR_FLOW_IDLE;
>>>>> +                }
>>>>>               } else {
>>>>>                   result = revalidate_ukey(udpif, ukey, &stats, 
>>>>> &odp_actions,
>>>>> -                                         reval_seq, &recircs);
>>>>> +                                         reval_seq, &recircs, &reason);
>>>>>               }
>>>>>               ukey->dump_seq = dump_seq;
>>>>>
>>>>> @@ -2916,6 +2944,7 @@ revalidate(struct revalidator *revalidator)
>>>>>                   udpif_update_flow_pps(udpif, ukey, f);
>>>>>               }
>>>>>
>>>>> +            OVS_USDT_PROBE(revalidate, flow_result, reason, udpif, ukey);
>
> I have been experimenting with several upcall tracking techniques that
> would make it easier to correlate upcalls with their subsequent
> related events.
> To achieve that, we need (among other things) some easy-to-compare
> unique value in the events. For revalidation events, I think a good
> candidate would be "ukey->ufid" and so does the script in this patch.
>
> However, requiring all external tools to know the layout of "struct
> udpif_key" in order to get that value makes things quite complicated
> for CORE tools (e.g: retis).
>
> With all this, would you consider adding the ufid to probe payload directly?

I don't mind making that change.
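
Roughly what I'm picturing on the consumer side, as a sketch: if the probe
carried the 128-bit ufid itself (instead of only the ukey pointer), a tool
that only sees those 16 raw bytes could format it without knowing anything
about struct udpif_key.  The helper name below is made up, but the
formatting mirrors format_ufid() in this script:

import struct


def format_raw_ufid(raw):
    # 'raw' is the 16 bytes of the ovs_u128, read as host-endian u32 words.
    ufid = struct.unpack("=IIII", raw)
    return "ufid:{:08x}-{:04x}-{:04x}-{:04x}-{:04x}{:08x}".format(
        ufid[0], ufid[1] >> 16, ufid[1] & 0xffff,
        ufid[2] >> 16, ufid[2] & 0xffff, ufid[3])


print(format_raw_ufid(bytes(range(16))))  # made-up bytes, just to show usage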

>>>>>               if (result != UKEY_KEEP) {
>>>>>                   /* Takes ownership of 'recircs'. */
>>>>>                   reval_op_init(&ops[n_ops++], result, udpif, ukey, 
>>>>> &recircs,
>>>>> @@ -2962,6 +2991,7 @@ revalidator_sweep__(struct revalidator 
>>>>> *revalidator, bool purge)
>>>>>           uint64_t odp_actions_stub[1024 / 8];
>>>>>           struct ofpbuf odp_actions = 
>>>>> OFPBUF_STUB_INITIALIZER(odp_actions_stub);
>>>>>
>>>>> +        enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>           struct ukey_op ops[REVALIDATE_MAX_BATCH];
>>>>>           struct udpif_key *ukey;
>>>>>           struct umap *umap = &udpif->ukeys[i];
>>>>> @@ -2993,7 +3023,7 @@ revalidator_sweep__(struct revalidator 
>>>>> *revalidator, bool purge)
>>>>>                       COVERAGE_INC(revalidate_missed_dp_flow);
>>>>>                       memcpy(&stats, &ukey->stats, sizeof stats);
>>>>>                       result = revalidate_ukey(udpif, ukey, &stats, 
>>>>> &odp_actions,
>>>>> -                                             reval_seq, &recircs);
>>>>> +                                             reval_seq, &recircs, 
>>>>> &reason);
>>>>>                   }
>>>>>                   if (result != UKEY_KEEP) {
>>>>>                       /* Clears 'recircs' if filled by revalidate_ukey(). 
>>>>> */
>>>>> diff --git a/utilities/automake.mk b/utilities/automake.mk
>>>>> index 9a2114df40..146b8c37fb 100644
>>>>> --- a/utilities/automake.mk
>>>>> +++ b/utilities/automake.mk
>>>>> @@ -23,6 +23,7 @@ scripts_DATA += utilities/ovs-lib
>>>>>   usdt_SCRIPTS += \
>>>>>           utilities/usdt-scripts/bridge_loop.bt \
>>>>>           utilities/usdt-scripts/dpif_nl_exec_monitor.py \
>>>>> + utilities/usdt-scripts/flow_reval_monitor.py \
>>>>>           utilities/usdt-scripts/kernel_delay.py \
>>>>>           utilities/usdt-scripts/kernel_delay.rst \
>>>>>           utilities/usdt-scripts/reval_monitor.py \
>>>>> @@ -72,6 +73,7 @@ EXTRA_DIST += \
>>>>>           utilities/docker/debian/build-kernel-modules.sh \
>>>>>           utilities/usdt-scripts/bridge_loop.bt \
>>>>>           utilities/usdt-scripts/dpif_nl_exec_monitor.py \
>>>>> + utilities/usdt-scripts/flow_reval_monitor.py \
>>>>>           utilities/usdt-scripts/kernel_delay.py \
>>>>>           utilities/usdt-scripts/kernel_delay.rst \
>>>>>           utilities/usdt-scripts/reval_monitor.py \
>>>>> @@ -146,6 +148,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \
>>>>>           utilities/ovs-tcpdump.in \
>>>>>           utilities/ovs-pipegen.py \
>>>>>           utilities/usdt-scripts/dpif_nl_exec_monitor.py \
>>>>> + utilities/usdt-scripts/flow_reval_monitor.py \
>>>>>           utilities/usdt-scripts/upcall_monitor.py \
>>>>>           utilities/usdt-scripts/upcall_cost.py
>>>>>
>>>>> diff --git a/utilities/usdt-scripts/flow_reval_monitor.py b/utilities/usdt-scripts/flow_reval_monitor.py
>>>>> new file mode 100755
>>>>> index 0000000000..e808020bb5
>>>>> --- /dev/null
>>>>> +++ b/utilities/usdt-scripts/flow_reval_monitor.py
>>>>> @@ -0,0 +1,653 @@
>>>>> +#!/usr/bin/env python3
>>>>> +#
>>>>> +# Copyright (c) 2022 Redhat, Inc.
>>>>> +#
>
> It's already 2024! Do we need an update here?

O_O  Yes.  I'll bump the dates.

>>>>> +# Licensed under the Apache License, Version 2.0 (the "License");
>>>>> +# you may not use this file except in compliance with the License.
>>>>> +# You may obtain a copy of the License at:
>>>>> +#
>>>>> +#     http://www.apache.org/licenses/LICENSE-2.0
>>>>> +#
>>>>> +# Unless required by applicable law or agreed to in writing, software
>>>>> +# distributed under the License is distributed on an "AS IS" BASIS,
>>>>> +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
>>>>> implied.
>>>>> +# See the License for the specific language governing permissions and
>>>>> +# limitations under the License.
>>>>> +#
>>>>> +# Script information:
>>>>> +# -------------------
>>>>> +# flow_reval_monitor.py uses the dpif_netlink_operate:flow_put and
>>>>> +# revalidator:flow_result USDT probes to monitor flow lifetimes and
>>>>> +# expiration events. By default, this will show all flow_put and flow
>>>>> +# expiration events, along with their reasons. This will look like so:
>>>>> +#
>>>>> +# TIME               UFID                                        
>>>>> EVENT/REASON
>>>>> +# 101536.226986736   ufid:f76fc899-376d-466b-bc74-0000b933eb97   flow_put
>>>>> +# 101536.227196214   ufid:d08472b6-110e-46cb-a9e4-00008f46838e   flow_put
>>>>> +# 101541.516610178   ufid:fc5cc4a2-39e7-4a2d-bbce-000019665b32   flow_put
>>>>> +# 101541.516967303   ufid:fddd6510-26dc-4c87-8f7a-0000fc0c2c3a   flow_put
>>>>> +# 101551.688050747   ufid:fddd6510-26dc-4c87-8f7a-0000fc0c2c3a   flow 
>>>>> timed out
>>>>> +# 101551.688077175   ufid:fc5cc4a2-39e7-4a2d-bbce-000019665b32   flow 
>>>>> timed out
>>>>> +# 101557.695391371   ufid:f76fc899-376d-466b-bc74-0000b933eb97   flow 
>>>>> timed out
>>>>> +# 101557.695408909   ufid:d08472b6-110e-46cb-a9e4-00008f46838e   flow 
>>>>> timed out/
>>>>> +#
>>>>> +# flow key data can be printed using the --flow-keys option.  This will
>>>>> +# print the equivalent datapath flow string.
>>>>> +#
>>>>> +# When filtering flows, the syntax is the same as used by
>>>>> +# `ovs-appctl dpctl/add-flow`.
>>>>> +#
>>>>> +# The following options are available:
>>>>> +#
>>>>> +# usage: flow_reval_monitor.py [-h] [--buffer-page-count NUMBER]
>>>>> +#                              [-k [FLOW_KEYS]] [-p VSWITCHD_PID]
>>>>> +#                              [-D [DEBUG]] [-f [FLOW STRING ...]]
>>>>
>>>>     # usage: flow_reval_monitor.py [-h] [--buffer-page-count NUMBER]
>>>>                                    [-f [64-2048]] [-k] [-l [FLOW_STRING 
>>>> ...]]
>>>>                                    [-p VSWITCHD_PID] [-D [DEBUG]]
>>>
>>> Oops, I'll fix it up.
>>>
>>>>> +#
>>>>> +#  optional arguments:
>>>>> +#   -h, --help            show this help message and exit
>>>>> +#   --buffer-page-count NUMBER
>>>>> +#                         Number of BPF ring buffer pages, default 1024
>>>>> +#   -f <64..2048>, --flow-key-size=<64..2048>
>>>>> +#                         Set the size of the flow key, default 64
>>>>> +#   -k, --flow-keys       Print flow keys as flow strings
>>>>> +#   -l [FLOW_STRING ...], --filter-flows [FLOW_STRING ...]
>>>>> +#                         Filter flows that match the specified ODP-like 
>>>>> flow
>>>>
>>>> We do not filter on the flow itself but on the packet content/keys
>>>> creating the flow.
>>>> We might want to clarify this as the actual DP flow might not include,
>>>> for example, the IP fields.
>>>
>>> I guess it's ambiguous.  I'll try and clean up the language.  Because we
>>> are filtering on the ODP flow key, and not an openflow string.
>>>
>>>>> +#   -p VSWITCHD_PID, --pid VSWITCHD_PID
>>>>> +#                         ovs-vswitchd's PID
>>>>> +#   -D [DEBUG], --debug [DEBUG]
>>>>> +#                         Enable eBPF debugging
>>>>> +#
>>>>> +# Examples:
>>>>> +#
>>>>> +# To use the script on a running ovs-vswitchd to see flow keys and 
>>>>> expiration
>>>>> +# events for flows with an ipv4 source of 192.168.10.10:
>>>>> +# $ ./flow_reval_monitor.py --flow-keys --filter-flows \
>>>>> +#   "ipv4(src=192.168.10.10)"
>>>>
>>>> Can we add some details on what kind of filters/format is supported?
>>>> For example no mask support.
>>>
>>> Sure, I can add it.
>>>
>>>>> +# TIME               UFID                                          
>>>>> EVENT/REASON
>>>>> +# 105082.457322742   ufid:f76fc899-376d-466b-bc74-0000b933eb97     
>>>>> flow_put
>>>>> +# ufid:f76fc899-376d-466b-bc74-0000b933eb97 has the following flow 
>>>>> information:
>>>>> +#     in_port(2),
>>>>> +#     eth(src=0e:04:47:fc:74:51, dst=da:dc:c5:69:05:d7), \
>>>>> +#     eth_type(0x800), \
>>>>> +#     ipv4(src=192.168.10.10, dst=192.168.10.30, proto=1, tos=0, 
>>>>> ttl=64,[...]),
>>>>> +#     icmp(type=8, code=0)
>>>>> +# 105092.635450202   ufid:f76fc899-376d-466b-bc74-0000b933eb97   Flow 
>>>>> timed out
>>>>> +#
>>>>> +# Notes:
>>>>> +#   1) No options are needed to attach when there is a single running 
>>>>> instance
>>>>> +#      of ovs-vswitchd.
>>>>> +#   2) If you're using the flow filtering option, it will only track 
>>>>> flows that
>>>>> +#      have been upcalled since the script began running.
>>>>> +#   3) When using the flow filtering option, the key size will likely 
>>>>> need to
>>>>> +#      be expanded to match on all the fields in the message.  The 
>>>>> default is
>>>>> +#      kept small to keep the buffer copy sizes down when displaying
>>>>> +#      flows (-k), but is hardcoded to 2048 when an actual filter (-l) is
>>>>> +#      applied
>>>>
>>>> We should add a note that the flow_put part is not included when HW
>>>> offload (TC) is used for the kernel datapath, or if DPDK is used.
>>>
>>> That makes sense.  But we will still have a revalidator output in
>>> f.e. the DPDK case, IIRC.
>> True, I just want to make sure we are clear that we will not see the
>> flow_put messages hence the filtering will not work.
>> 
>>>>>
>>>>> +try:
>>>>> +    from bcc import BPF
>>>>> +    from bcc import USDT
>>>>> +    from bcc import USDTException
>>>>> +except ModuleNotFoundError:
>>>>> +    print("ERROR: Can't find the BPF Compiler Collection Tools.")
>>>>> +    print("Please install them before running this script.")
>>>>> +    exit(1)
>>>>> +
>>>>> +import argparse
>>>>> +from ipaddress import IPv4Address, IPv6Address
>
> nit: It seems common (not only in python coding but in other usdt
> scripts) to split "import" and "from _ import _" lines.

Okay, I'll clean it up.
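
Something like this for the stdlib part (the guarded bcc import and psutil
would stay as they are), just to show the grouping I have in mind:

import argparse
import struct
import sys
import time

from ipaddress import IPv4Address, IPv6Address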

>>>>> +import psutil
>>>>> +import struct
>>>>> +import sys
>>>>> +import time
>>>>> +
>>>>> +#
>>>>> +# eBPF source code
>>>>> +#
>>>>> +bpf_src = """
>>>>> +#include <linux/sched.h>
>>>>> +#include <uapi/linux/ptrace.h>
>>>>> +
>>>>> +#define MAX_KEY      <MAX_KEY_VAL>
>>>>> +#define FLOW_FILTER  <FILTER_BOOL>
>>>>> +
>>>>> +enum probe { OP_FLOW_PUT, FLOW_RESULT };
>>>>> +
>>>>> +typedef union ovs_u128 {
>>>>> +    unsigned int ufid32[4];
>>>>> +    unsigned long long ufid64[2];
>>>>> +} ovs_u128;
>>>>> +
>>>>> +struct dpif_flow_put {
>>>>> +    int flags;
>>>>> +    void *key_ptr;
>>>>> +    size_t key_len;
>>>>> +    void *mask_ptr;
>>>>> +    size_t mask_len;
>>>>> +    u64 action_ptr;
>>>>> +    size_t action_len;
>>>>> +    void *ufid_ptr;
>>>>> +};
>>>>> +
>>>>> +struct udpif_key {
>>>>> +    void *cmap_node;
>>>>> +    void *key_ptr;
>>>>> +    size_t key_len;
>>>>> +    void *mask_ptr;
>>>>> +    size_t mask_len;
>>>>> +    ovs_u128 ufid;
>>>>> +};
>>>>> +
>>>>> +struct event_t {
>>>>> +    u64 ts;
>>>>> +    u32 reason;
>>>>> +    u32 ufid[4]; /* Can't seem to make the ovs_u128 pass to python side. 
>>>>> */
>>>>
>>>> Is this still true?
>>>
>>> I didn't try it.  Actually, I think these data structures can all be
>>> extracted with pahole or something which converts the ovs_u128.
>>>
>>> Actually I think there's some converter under the hood and it doesn't
>>> have a mapping of what 'ovs_u128' means.  So we need to basically teach
>>> it to make it work if we want that.
>> Ok, not a blocking thing, just wondered if there was a quick fix or
>> not. I thought it might be related to the BCC issue.
>>>>> +    u64 key_size;
>>>>> +    u8 key[MAX_KEY];
>>>>> +    enum probe probe;
>>>>> +};
>>>>> +
>>>>> +BPF_HASH(watchlist, ovs_u128);
>>>>> +BPF_RINGBUF_OUTPUT(events, <BUFFER_PAGE_COUNT>);
>>>>> +
>>>>> +int usdt__flow_result(struct pt_regs *ctx) {
>>>>> +    u64 *ufid_present = NULL;
>>>>> +    struct udpif_key ukey;
>>>>> +
>>>>> +    bpf_usdt_readarg_p(3, ctx, &ukey, sizeof ukey);
>>>>> +    ovs_u128 ufid = ukey.ufid;
>>>>> +    ufid_present = watchlist.lookup(&ufid);
>>>>> +    if(FLOW_FILTER && !ufid_present) {
>>>>> +        return 0;
>>>>> +    }
>>>>> +
>>>>> +    struct event_t *event = events.ringbuf_reserve(sizeof(struct 
>>>>> event_t));
>>>>> +    if(!event) {
>>>>> +        /* If we can't reserve the space in the ring buffer, return 1. */
>>>>
>>>> See comments at the end regarding __sync_fetch_and_add().
>>>>
>>>>> +        return 1;
>>>>> +    }
>>>>> +
>>>>> +    event->probe = FLOW_RESULT;
>>>>> +    event->ts = bpf_ktime_get_ns();
>>>>> +    bpf_probe_read(&event->ufid, sizeof ufid, &ufid);
>>>>> +    bpf_usdt_readarg(1, ctx, &event->reason);
>>>>> +    events.ringbuf_submit(event, 0);
>>>>> +
>>>>> +    return 0;
>>>>> +};
>>>>> +
>>>>> +
>>>>> +int usdt__op_flow_put(struct pt_regs *ctx) {
>>>>> +    struct dpif_flow_put put;
>>>>> +    ovs_u128 ufid;
>>>>> +
>>>>> +    struct event_t *event = events.ringbuf_reserve(sizeof(struct 
>>>>> event_t));
>>>>> +    if(!event) {
>>>>> +        /* If we can't reserve the space in the ring buffer, return 1. */
>>>>
>>>> See comments at the end regarding __sync_fetch_and_add().
>>>>
>>>>> +        return 1;
>>>>> +    }
>>>>> +
>>>>> +    event->probe = OP_FLOW_PUT;
>>>>> +    event->ts = bpf_ktime_get_ns();
>>>>> +    bpf_usdt_readarg_p(2, ctx, &put, sizeof put);
>>>>> +    bpf_probe_read(&event->ufid, sizeof event->ufid, put.ufid_ptr);
>>>>> +    bpf_probe_read(&ufid, sizeof ufid, &event->ufid);
>>>>> +    if (put.key_len > MAX_KEY) {
>>>>> +        put.key_len = MAX_KEY;
>>>>> +    }
>>>>> +    event->key_size = put.key_len;
>>>>> +    bpf_probe_read(&event->key, put.key_len, put.key_ptr);
>>>>> +    event->reason = 0;
>>>>> +    events.ringbuf_submit(event, 0);
>>>>> +
>>>>> +    watchlist.increment(ufid);
>>>>> +    return 0;
>>>>> +};
>>>>> +"""
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# buffer_size_type()
>>>>> +#
>>>>> +def buffer_size_type(astr, min=64, max=2048):
>>>>> +    value = int(astr)
>>>>> +    if min <= value <= max:
>>>>> +        return value
>>>>> +    else:
>>>>> +        raise argparse.ArgumentTypeError(
>>>>> +            'value not in range {}-{}'.format(min, max))
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# format_ufid()
>>>>> +#
>>>>> +def format_ufid(ufid):
>>>>> +    if ufid is None:
>>>>> +        return "ufid:none"
>>>>> +
>>>>> +    return "ufid:{:08x}-{:04x}-{:04x}-{:04x}-{:04x}{:08x}".format(
>>>>> +           ufid[0], ufid[1] >> 16, ufid[1] & 0xffff,
>>>>> +           ufid[2] >> 16, ufid[2] & 0xffff, ufid[3])
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# find_and_delete_from_watchlist()
>>>>> +#
>>>>> +def find_and_delete_from_watchlist(event):
>>>>> +    for k, _ in b["watchlist"].items():
>>>>> +        key_ufid = struct.unpack("=IIII", k)
>>>>> +        if key_ufid == tuple(event.ufid):
>>>>> +            key = (b["watchlist"].Key * 1)(k)
>>>>> +            b["watchlist"].items_delete_batch(key)
>>>>> +            break
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# handle_flow_put()
>>>>> +#
>>>>> +def handle_flow_put(event):
>>>>> +    if args.flow_keys or args.filter_flows is not None:
>>>>> +        key = decode_key(bytes(event.key)[:event.key_size])
>>>>> +        flow_dict, flow_str = parse_flow_dict(key)
>>>>> +        # For each attribute that we're watching.
>>>>> +        if args.filter_flows is not None:
>>>>> +            if not compare_flow_to_target(args.filter_flows, flow_dict):
>>>>> +                find_and_delete_from_watchlist(event)
>>>>> +                return
>>>>> +
>>>>> +    print("{:<18.9f} {:<45} {:<13}".format(event.ts / 1000000000,
>>>>> +          format_ufid(event.ufid), "Insert (put) flow to kernel."))
>>>>
>>>> Maybe change this to “Insert (put) flow to kernel module.” to avoid
>>>> confusion with the missing tc flow put?
>>>
>>> Ack.
>>>
>>>>> +
>>>>> +    if args.flow_keys:
>>>>> +        if len(flow_str) > 80:
>>>>> +            flow_str = "    " + "),\n    ".join(flow_str.split("), "))
>>>>> +        else:
>>>>> +            flow_str = "    " + flow_str
>>>>> +        print(" - It holds the following flow information:")
>>>>
>>>> This is confusing as it's not the flow information, i.e. the flow
>>>> installed, but the keys from the packet.
>>>
>>> Agreed.
>>>
>>>>> +        print(flow_str)
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# compare_flow_to_target()
>>>>> +#
>>>>> +def compare_flow_to_target(target, flow):
>>>>> +    for key in target:
>>>>> +        if key not in flow:
>>>>> +            return False
>>>>> +        elif target[key] is True:
>>>>> +            continue
>>>>> +        elif target[key] == flow[key]:
>>>>> +            continue
>>>>> +        elif isinstance(target[key], dict) and isinstance(flow[key], 
>>>>> dict):
>>>>> +            return compare_flow_to_target(target[key], flow[key])
>>>>> +        else:
>>>>> +            return False
>>>>> +    return True
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# parse_flow_str()
>>>>> +#
>>>>> +def parse_flow_str(flow_str):
>>>>> +    f_list = [i.strip(", ") for i in flow_str.split(")")]
>>>>> +    if f_list[-1] == "":
>>>>> +        f_list = f_list[:-1]
>>>>> +    flow_dict = {}
>>>>> +    for e in f_list:
>>>>> +        split_list = e.split("(")
>>>>> +        k = split_list[0]
>>>>> +        if len(split_list) == 1:
>>>>> +            flow_dict[k] = True
>>>>> +        elif split_list[1].count("=") == 0:
>>>>> +            flow_dict[k] = split_list[1]
>>>>> +        else:
>>>>> +            sub_dict = {}
>>>>> +            sublist = [i.strip() for i in split_list[1].split(",")]
>>>>> +            for subkey in sublist:
>>>>> +                brk = subkey.find("=")
>>>>> +                sub_dict[subkey[:brk]] = subkey[brk + 1:]
>>>>> +            flow_dict[k] = sub_dict
>>>>> +    return flow_dict
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# print_expiration()
>>>>> +#
>>>>> +def print_expiration(event):
>>>>> +    reasons = ["Unknown flow expiration reason!", "Flow timed out",
>>>>> +               "Flow revalidation too expensive",
>>>>> +               "Flow needs narrower wildcard mask",
>>>>> +               "Bad ODP flow fit", "Flow with associated ofproto",
>>>>> +               "Flow translation error", "Flow cache avoidance",
>>>>> +               "Kill them all signal"]
>>>>
>>>> Should we maybe define this with something like this:
>>>>
>>>> Event = IntEnum("flow_del_reason", ["FDR_FLOW_LIVE",
>>>>                                      "FDR_FLOW_TIME_OUT",
>>>>                                      ...], start=0)
>>>>
>>>> If we do this, we can also use flow_del_reason.FDR_FLOW_LIVE below.
>>>
>>> I wrote a bit below, but I was wondering if there's really a better way
>>> to do this like extracting the details from the code itself.  But for
>>> now, I can hard code something in there like is done in the other
>>> revalidator script.
>> Don't think we had scripts relying on OVS enums before. Not sure if
>> pahole can extract this also.
>> 
>>>>> +    ufid_str = format_ufid(event.ufid)
>>>>> +    reason = event.reason
>>>>> +
>>>>> +    if reason not in range(0, len(reasons)):
>>>>> +        reason = 0
>>>>> +    print("{:<18.9f} {:<45} {:<17}".
>>>>> +          format(event.ts / 1000000000, ufid_str, reasons[reason]))
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# decode_key()
>>>>> +#
>>>>> +def decode_key(msg):
>>>>> +    bytes_left = len(msg)
>>>>> +    result = {}
>>>>> +    while bytes_left:
>>>>> +        if bytes_left < 4:
>>>>> +            break
>>>>> +        nla_len, nla_type = struct.unpack("=HH", msg[:4])
>>>>> +        if nla_len < 4:
>>>>> +            break
>>>>> +        nla_data = msg[4:nla_len]
>>>>> +        trunc = False
>>>>> +        if nla_len > bytes_left:
>>>>> +            trunc = True
>>>>> +            nla_data = nla_data[:(bytes_left - 4)]
>>>>
>>>>              Can we just not break out of this right away without doing 
>>>> the two above lines?
>>>
>>> I'll double check - I think I can rewrite this section a bit.
>>>
>>>>> +        else:
>>>>> +            result[get_ovs_key_attr_str(nla_type)] = nla_data
>>>>> +        if trunc:
>>>>> +            break
>>>>> +        next_offset = (nla_len + 3) & (~3)
>>>>> +        msg = msg[next_offset:]
>>>>> +        bytes_left -= next_offset
>>>>
>>>> if bytes_left:
>>>>     “Can we report that our buffer was truncated?!”
>>>>
>>>> Not sure how to do this, but with 64 bytes being the default the -k
>>>> option only showed in_port() which took me a while to figure out.
>>>> Maybe 128 would be better when -k is configured?
>>>
>>> Good idea.  Actually, I don't know if 64 bytes would ever really make
>>> sense anyway because it doesn't allow much to include.
>> Agreed, I think 128 sounds like a good middle ground, however, it
>> will not decode ARP messages (have not tried ipv6). Maybe 64 is good
>> enough if -k/-f is not supplied (I guess we can even set it to 0
>> without -k or -f).
>> 
>>>>> +    return result
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# get_ovs_key_attr_str()
>>>>> +#
>>>>> +def get_ovs_key_attr_str(attr):
>>>>> +    ovs_key_attr = ["OVS_KEY_ATTR_UNSPEC",
>>>>> +                    "encap",
>>>>> +                    "skb_priority",
>>>>> +                    "in_port",
>>>>> +                    "eth",
>>>>> +                    "vlan",
>>>>> +                    "eth_type",
>>>>> +                    "ipv4",
>>>>> +                    "ipv6",
>>>>> +                    "tcp",
>>>>> +                    "udp",
>>>>> +                    "icmp",
>>>>> +                    "icmpv6",
>>>>> +                    "arp",
>>>>> +                    "nd",
>>>>> +                    "skb_mark",
>>>>> +                    "tunnel",
>>>>> +                    "sctp",
>>>>> +                    "tcp_flags",
>>>>> +                    "dp_hash",
>>>>> +                    "recirc_id",
>>>>> +                    "mpls",
>>>>> +                    "ct_state",
>>>>> +                    "ct_zone",
>>>>> +                    "ct_mark",
>>>>> +                    "ct_label",
>>>>> +                    "ct_tuple4",
>>>>> +                    "ct_tuple6",
>>>>> +                    "nsh"]
>>>>> +
>>>>> +    if attr < 0 or attr > len(ovs_key_attr):
>>>>> +        return "<UNKNOWN>: {}".format(attr)
>>>>> +    return ovs_key_attr[attr]
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# is_nonzero()
>>>>> +#
>>>>> +def is_nonzero(val):
>>>>> +    if isinstance(val, int):
>>>>> +        return (val != 0)
>>>>> +
>
> nit: I don't think we need these parenthesis.

Okay, I'll trim it - and in the return line below
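
So is_nonzero() would end up looking roughly like this, same logic, just
without the extra parentheses:

def is_nonzero(val):
    if isinstance(val, int):
        return val != 0

    if isinstance(val, str):
        val = bytes(val, "utf-8")

    # If it's not a string or an int, it's bytes.
    return val.count(0) < len(val)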

>>>>> +    if isinstance(val, str):
>>>>> +        val = bytes(val, "utf-8")
>>>>> +
>>>>> +    # If it's not a string or an int, it's bytes.
>>>>> +    return (val.count(0) < len(val))
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# parse_flow_dict()
>>>>> +#
>>>>> +def parse_flow_dict(key_dict, decode=True):
>>>>> +    ret_str = ""
>>>>> +    parseable = {}
>>>>> +    skip = ["nsh", "tunnel", "mpls", "vlan"]
>>>>> +    need_byte_swap = ["ct_label"]
>>>>> +    ipv4addrs = ["ct_tuple4", "tunnel", "ipv4", "arp"]
>>>>> +    ipv6addrs = ["ipv6", "nd", "ct_tuple6"]
>>>>> +    macs = {"eth": [0, 1], "arp": [3, 4], "nd": [1, 2]}
>>>>> +    fields = [("OVS_KEY_ATTR_UNSPEC"),
>>>>> +              ("encap", ),
>>>>> +              ("skb_priority", "<I"),
>>>>> +              ("in_port", "<I"),
>>>>> +              ("eth", "!6s6s", "src", "dst"),
>>>>> +              ("vlan", ),
>>>>> +              ("eth_type", "!H"),
>>>>> +              ("ipv4", "!4s4s4B", "src", "dst", "proto", "tos", "ttl", 
>>>>> "frag"),
>>>>> +              ("ipv6", "!16s16s4s4B", "src", "dst",
>>>>> +               "label", "proto", "tclass", "hlimit", "frag"),
>>>>> +              ("tcp", "!2H", "src", "dst"),
>>>>> +              ("udp", "!2H", "src", "dst"),
>>>>> +              ("icmp", "!2B", "type", "code"),
>>>>> +              ("icmpv6", "!2B", "type", "code"),
>>>>> +              ("arp", "!4s4sH6s6s", "sip", "tip", "op", "sha", "tha"),
>>>>> +              ("nd", "!16s6s6s", "target", "sll", "tll"),
>>>>> +              ("skb_mark", "<I"),
>>>>> +              ("tunnel", ),
>>>>> +              ("sctp", "!2H", "src", "dst"),
>>>>> +              ("tcp_flags", "!H"),
>>>>> +              ("dp_hash", "<I"),
>>>>> +              ("recirc_id", "<I"),
>>>>> +              ("mpls", ),
>>>>> +              ("ct_state", "<I"),
>>>>> +              ("ct_zone", "<H"),
>>>>> +              ("ct_mark", "<I"),
>>>>> +              ("ct_label", "!16s"),
>>>>> +              ("ct_tuple4",
>>>>> +               "!4s4s2HB", "src", "dst", "tp_src", "tp_dst", "proto"),
>>>>> +              ("ct_tuple6",
>>>>> +               "!16s16sB2H", "src", "dst", "proto", "tp_src", "tp_dst"),
>>>>> +              ("nsh", )]
>>>>> +    for k, v in key_dict.items():
>>>>> +        s = ""
>>>>> +        if k in skip:
>>>>> +            continue
>>>>> +        if decode and int.from_bytes(v, "big") == 0:
>>>>> +            parseable[k] = "0"
>>>>> +            continue
>>>>> +        if decode and k in need_byte_swap:
>>>>> +            v = int.from_bytes(v, "little").to_bytes(len(v), "big")
>>>>> +        attr = -1
>>>>> +        found = False
>>>>> +        for f in fields:
>>>>> +            if k == f[0]:
>>>>> +                attr = fields.index(f)
>>>>> +                found = True
>>>>> +                break
>>>>> +        if not found:
>>>>> +            raise KeyError("Invalid flow field '%s'" % k)
>>>>> +        if decode and len(fields[attr]) > 1:
>>>>> +            data = list(struct.unpack(fields[attr][1],
>>>>> +                        v[:struct.calcsize(fields[attr][1])]))
>>>>> +            if k in ipv4addrs:
>>>>> +                if data[0].count(0) < 4:
>>>>> +                    data[0] = str(IPv4Address(data[0]))
>>>>> +                else:
>>>>> +                    data[0] = b"\x00"
>>>>> +                if data[1].count(0) < 4:
>>>>> +                    data[1] = str(IPv4Address(data[1]))
>>>>> +                else:
>>>>> +                    data[1] = b"\x00"
>>>>> +            if k in ipv6addrs:
>>>>> +                if data[0].count(0) < 16:
>>>>> +                    data[0] = str(IPv6Address(data[0]))
>>>>> +                else:
>>>>> +                    data[0] = b"\x00"
>>>>> +                if data[1].count(0) < len(data[1]):
>>>>> +                    data[1] = str(IPv6Address(data[1]))
>>>>> +                else:
>>>>> +                    data[1] = b"\x00"
>>>>> +            if k in macs.keys():
>>>>> +                for e in macs[k]:
>>>>> +                    if data[e].count(0) == 6:
>>>>> +                        mac_str = b"\x00"
>>>>> +                    else:
>>>>> +                        mac_str = ":".join(["%02x" % i for i in data[e]])
>>>>> +                    data[e] = mac_str
>>>>> +        if decode and len(fields[attr]) > 2:
>>>>> +            field_dict = {field: d for field, d in zip(fields[attr][2:], 
>>>>> data)}
>
> nit: I think this can be written as:
> field_dict = dict(zip(fields[attr][2:], data))

Okay - I'll fix it.
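
For reference, a tiny standalone illustration of that simplification (the
names here are just stand-ins for fields[attr][2:] and the unpacked data):

names = ("src", "dst")                     # stands in for fields[attr][2:]
data = ["192.168.10.10", "192.168.10.30"]  # stands in for the unpacked values
field_dict = dict(zip(names, data))
assert field_dict == {"src": "192.168.10.10", "dst": "192.168.10.30"}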

>>>>> +            s = ", ".join(k + "=" + str(v) for k, v in 
>>>>> field_dict.items())
>>>>> +        elif decode and k != "eth_type":
>>>>> +            s = str(data[0])
>>>>> +            field_dict = s
>>>>> +        else:
>>>>> +            if decode:
>>>>> +                s = hex(data[0])
>>>>> +            field_dict = s
>>>>> +        ret_str += k + "(" + s + "), "
>>>>> +        parseable[k] = field_dict
>>>>> +    ret_str = ret_str[:-2]
>>>>> +    return (parseable, ret_str)
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# handle_event()
>>>>> +#
>>>>> +def handle_event(ctx, data, size):
>>>>> +    # Once we grab the event, we have three cases.
>>>>> +    # 1. It's a revalidator probe and the reason is nonzero: A flow is 
>>>>> expiring
>>>>> +    # 2. It's a revalidator probe and the reason is zero: flow 
>>>>> revalidated
>>>>> +    # 3. It's a flow_put probe.
>>>>> +    #
>>>>> +    # We will ignore case 2, and report all others.
>>>>> +    #
>>>>> +    event = b["events"].event(data)
>>>>> +    if event.probe == 0:  # OP_FLOW_PUT
>>>>
>>>> Here we should also define an enum for the probe events, see
>>>> ‘Event = IntEnum("Event"...’ and ‘<EVENT_ENUM>’ in reval_monitor.py
>>>>
>>>>> +        handle_flow_put(event)
>>>>> +    elif event.probe == 1 and event.reason > 0:  # FLOW_RESULT
>>>>
>>>> Here we could do “event.reason > flow_del_reason.FDR_FLOW_LIVE”, see 
>>>> comment above.
>>>
>>> I can do the above, but I also am wondering if it's possible to have
>>> something we can use to fill up the enum dynamically without needing to
>>> duplicate things on the python side.
>> That would be nice, maybe pahole already supports this.
>> 
>
> It does, on a private branch that I plan to send to the list soon, I did it:
> https://github.com/amorenoz/ovs/blob/862acef0f1af48574924182675f5332bba46e9e3/utilities/usdt-scripts/drop_monitor.py#L212
>
> BTW, when I send this, it'll be the third copy of the pahole code. I
> think we should start discussing where to put the common code. My first
> thought is a subpackage inside the ovs python package.

Makes sense to me.  Meanwhile, I'll yoink the technique you use and we
can trim it when the refactor time comes.
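
For the archives, the rough shape of what I'd borrow (this is my own sketch,
not your drop_monitor.py code): take the C enum body, however we end up
dumping it (pahole or otherwise), and build an IntEnum from it so the Python
side stops hard-coding the reason values:

import re
from enum import IntEnum


def enum_from_c_body(name, body):
    # Turn the body of a C enum definition into a Python IntEnum.
    members = {}
    value = -1
    for member in body.split(","):
        member = member.strip()
        if not member:
            continue
        m = re.match(r"(\w+)(?:\s*=\s*(\d+))?", member)
        value = int(m.group(2)) if m.group(2) else value + 1
        members[m.group(1)] = value
    return IntEnum(name, members)


FlowDelReason = enum_from_c_body("FlowDelReason", """
    FDR_REVALIDATE = 0, FDR_FLOW_IDLE, FDR_TOO_EXPENSIVE, FDR_FLOW_WILDCARDED,
    FDR_BAD_ODP_FIT, FDR_NO_OFPROTO, FDR_XLATION_ERROR, FDR_AVOID_CACHING,
    FDR_FLOW_LIMIT""")

assert FlowDelReason.FDR_FLOW_LIMIT == 8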

>>>>> +        print_expiration(event)
>>>>> +
>>>>> +
>>>>> +def main():
>>>>> +    #
>>>>> +    # Don't like these globals, but ctx passing does not work with the 
>>>>> existing
>>>>> +    # open_ring_buffer() API :(
>>>>> +    #
>>>>> +    global b
>>>>> +    global args
>>>>> +
>>>>> +    #
>>>>> +    # Argument parsing
>>>>> +    #
>>>>> +    parser = argparse.ArgumentParser()
>>>>> +    parser.add_argument("--buffer-page-count",
>>>>> +                        help="Number of BPF ring buffer pages, default 
>>>>> 1024",
>>>>> +                        type=int, default=1024, metavar="NUMBER")
>>>>> +    parser.add_argument("-f", "--flow-key-size",
>>>>> +                        help="Set maximum flow key size to capture, "
>>>>> +                        "default 64 - see notes", type=buffer_size_type,
>>>>> +                        default=64, metavar="[64-2048]")
>>>>> +    parser.add_argument("-k", "--flow-keys",
>>>>> +                        help="Print flow keys as flow strings",
>>>>> +                        action="store_true")
>>>>> +    parser.add_argument("-l", "--filter-flows", metavar="FLOW_STRING",
>>>>> +                        help="Filter flows that match the specified "
>>>>> +                        "ODP-like flow",
>>>>> +                        type=str, default=None, nargs="*")
>>>>> +    parser.add_argument("-p", "--pid", metavar="VSWITCHD_PID",
>>>>> +                        help="ovs-vswitchd's PID", type=int, 
>>>>> default=None)
>>>>> +    parser.add_argument("-D", "--debug", help="Enable eBPF debugging",
>>>>> +                        type=int, const=0x3f, default=0, nargs="?")
>>>>> +    args = parser.parse_args()
>>>>> +
>>>>> +    #
>>>>> +    # Find the PID of the ovs-vswitchd daemon if not specified.
>>>>> +    #
>>>>> +    if args.pid is None:
>>>>> +        for proc in psutil.process_iter():
>>>>> +            if "ovs-vswitchd" in proc.name():
>>>>> +                if args.pid is not None:
>>>>> +                    print("Error: Multiple ovs-vswitchd daemons running, 
>>>>> "
>>>>> +                          "use the -p option!")
>>>>> +                    sys.exit(-1)
>>>>> +
>>>>> +                args.pid = proc.pid
>>>>> +    #
>>>>> +    # Error checking on input parameters
>>>>> +    #
>>>>> +    if args.pid is None:
>>>>> +        print("ERROR: Failed to find ovs-vswitchd's PID!")
>>>>> +        sys.exit(-1)
>>>>> +
>>>>> +    #
>>>>> +    # Attach the USDT probes
>>>>> +    #
>>>>> +    u = USDT(pid=int(args.pid))
>>>>> +    try:
>>>>> +        u.enable_probe(probe="op_flow_put", fn_name="usdt__op_flow_put")
>>>>> +    except USDTException as e:
>>>>> +        print("Error attaching the dpif_netlink_operate__:op_flow_put 
>>>>> probe.")
>>>>> +        print(str(e))
>>>>> +        sys.exit(-1)
>>>>> +
>>>>> +    try:
>>>>> +        u.enable_probe(probe="flow_result", fn_name="usdt__flow_result")
>>>>> +    except USDTException as e:
>>>>> +        print("Error attaching the revalidate:flow_result probe.")
>>>>> +        print(str(e))
>>>>> +        sys.exit(-1)
>>>>> +
>>>>> +    #
>>>>> +    # Attach the probes to the running process
>>>>> +    #
>>>>> +    source = bpf_src.replace("<BUFFER_PAGE_COUNT>",
>>>>> +                             str(args.buffer_page_count))
>>>>> +
>>>>> +    if args.filter_flows is None:
>>>>> +        filter_bool = 0
>>>>> +
>>>>> +        # Set the key size based on what the user wanted
>>>>> +        source = source.replace("<MAX_KEY_VAL>", str(args.flow_key_size))
>>>>> +    else:
>>>>> +        filter_bool = 1
>>>>> +        args.filter_flows = parse_flow_str(args.filter_flows[0])
>>>>> +
>>>>> +        # Run through the parser to make sure we only filter on fields we
>>>>> +        # understand
>>>>> +        parse_flow_dict(args.filter_flows, False)
>>>>> +
>>>>> +        # This is hardcoded here because it doesn't make sense to shrink 
>>>>> the
>>>>> +        # size, since the flow key might be missing fields that are 
>>>>> matched in
>>>>> +        # the flow filter.
>>>>> +        source = source.replace("<MAX_KEY_VAL>", "2048")
>>>>> +
>>>>> +    source = source.replace("<FILTER_BOOL>", str(filter_bool))
>>>>> +
>>>>> +    b = BPF(text=source, usdt_contexts=[u], debug=args.debug)
>>>>> +
>>>>> +    #
>>>>> +    # Print header
>>>>> +    #
>>>>> +    print("{:<18} {:<45} {:<17}".format("TIME", "UFID", "EVENT/REASON"))
>>>>> +
>>>>> +    #
>>>>> +    # Dump out all events.
>>>>> +    #
>>>>> +    b["events"].open_ring_buffer(handle_event)
>>>>> +    while 1:
>>>>> +        try:
>>>>> +            b.ring_buffer_poll()
>>>>> +            time.sleep(0.5)
>>>>
>>>> I think we can remove this sleep.
>>>
>>> I'll try without it.  IIRC, the ring buffer polling was very aggressive
>>> on the CPU, but that is just a memory from mid-2022.
>> I got the ‘remove’ comment from Adrian also a while back and did
>> some tests and I did not see any load increase on the Python
>> application. But it might be worth it for you to do the same, you
>> never know where I screwed up ;)
>> 
>>>>> +        except KeyboardInterrupt:
>>>>> +            break
>>>>> +
>>>>> +
>>>>> +#
>>>>> +# Start main() as the default entry point
>>>>> +#
>>>>> +if __name__ == "__main__":
>>>>> +    main()
>>>>> -- 2.41.0
>>>>
>>>> Missing my previous comment on adding a check to make sure we do not lose 
>>>> events:
>>>>
>>>> “
>>>> Forgot to mention that you probably also want to add some checking to
>>>> make sure you do not lose events.
>>>>
>>>> See __sync_fetch_and_add() below:
>>>>
>>>> +BPF_TABLE("percpu_array", uint32_t, uint64_t, dropcnt, 1);
>>>> +
>>>> +static struct event_t *get_event(uint32_t id) {
>>>> +    struct event_t *event = events.ringbuf_reserve(sizeof(struct 
>>>> event_t));
>>>> +
>>>> +    if (!event) {
>>>> +        uint32_t type = 0;
>>>> +        uint64_t *value = dropcnt.lookup(&type);
>>>> +        if (value)
>>>> +            __sync_fetch_and_add(value, 1);
>>>> +
>>>> +        return NULL;
>>>> +    }
>>>> +
>>>> +    event->id = id;
>>>> +    event->ts = bpf_ktime_get_ns();
>>>> +    event->pid = bpf_get_current_pid_tgid();
>>>> +
>>>> +    return event;
>>>> +}
>>>> “
>>>>
>>>> The other missing part is to include the PID/TID in the output so we can
>>>> relate to which revalidator thread did this (or add the comm with the
>>>> name).
>>>
>>> Okay.
>>>
>>>> And finally, the part that got this patch delayed, not adding static OVS
>>>> structure definitions. Which is still the case in this version. For now,
>>>> you should probably copy the get_ovs_definitions() implementation from
>>>> reval_monitor.py.
>>>
>>> Will do.
>> 
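
On the lost-event check: once the dropcnt map from the snippet above is in
place, I'd mirror what the other usdt scripts do on the Python side, roughly
like this (assuming the percpu "dropcnt" table name from that snippet):

def check_dropped_events(b):
    # Sum the per-CPU drop counters and warn if any events were lost.
    dropcnt = b.get_table("dropcnt")
    for k in dropcnt.keys():
        count = dropcnt.sum(k).value
        if k.value == 0 and count > 0:
            print("WARNING: Not all events were captured; {} were dropped!\n"
                  "         Increase the BPF ring buffer size with the "
                  "--buffer-page-count option.".format(count))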

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
