Adrian Moreno <amore...@redhat.com> writes:

> On 2/20/24 19:06, Aaron Conole wrote:
>> Eelco Chaudron <echau...@redhat.com> writes:
>> 
>>> On 19 Feb 2024, at 19:57, Aaron Conole wrote:
>>>
>>>> Eelco Chaudron <echau...@redhat.com> writes:
>>>>
>>>>> On 12 Feb 2024, at 15:15, Aaron Conole wrote:
>>>>>
>>>>>> Aaron Conole <acon...@redhat.com> writes:
>>>>>>
>>>>>>> Eelco Chaudron <echau...@redhat.com> writes:
>>>>>>>
>>>>>>>> On 2 Feb 2024, at 11:31, Adrian Moreno wrote:
>>>>>>>>
>>>>>>>>> On 2/1/24 10:02, Eelco Chaudron wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 31 Jan 2024, at 18:03, Aaron Conole wrote:
>>>>>>>>>>
>>>>>>>>>>> Eelco Chaudron <echau...@redhat.com> writes:
>>>>>>>>>>>
>>>>>>>>>>>> On 25 Jan 2024, at 21:55, Aaron Conole wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> From: Kevin Sprague <ksprague0...@gmail.com>
>>>>>>>>>>>>>
>>>>>>>>>>>>> During normal operations, it is useful to understand when a 
>>>>>>>>>>>>> particular flow
>>>>>>>>>>>>> gets removed from the system. This can be useful when debugging 
>>>>>>>>>>>>> performance
>>>>>>>>>>>>> issues tied to ofproto flow changes, trying to determine deployed 
>>>>>>>>>>>>> traffic
>>>>>>>>>>>>> patterns, or while debugging dynamic systems where ports come and 
>>>>>>>>>>>>> go.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Prior to this change, there was a lack of visibility around flow 
>>>>>>>>>>>>> expiration.
>>>>>>>>>>>>> The existing debugging infrastructure could tell us when a flow 
>>>>>>>>>>>>> was added to
>>>>>>>>>>>>> the datapath, but not when it was removed or why.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This change introduces a USDT probe at the point where the 
>>>>>>>>>>>>> revalidator
>>>>>>>>>>>>> determines that the flow should be removed.  Additionally, we 
>>>>>>>>>>>>> track the
>>>>>>>>>>>>> reason for the flow eviction and provide that information as 
>>>>>>>>>>>>> well.  With
>>>>>>>>>>>>> this change, we can track the complete flow lifecycle for the 
>>>>>>>>>>>>> netlink
>>>>>>>>>>>>> datapath by hooking the upcall tracepoint in kernel, the flow put 
>>>>>>>>>>>>> USDT, and
>>>>>>>>>>>>> the revaldiator USDT, letting us watch as flows are added and 
>>>>>>>>>>>>> removed from
>>>>>>>>>>>>> the kernel datapath.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This change only enables this information via USDT probe, so it 
>>>>>>>>>>>>> won't be
>>>>>>>>>>>>> possible to access this information any other way (see:
>>>>>>>>>>>>> Documentation/topics/usdt-probes.rst).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also included is a script 
>>>>>>>>>>>>> (utilities/usdt-scripts/flow_reval_monitor.py)
>>>>>>>>>>>>> which serves as a demonstration of how the new USDT probe might 
>>>>>>>>>>>>> be used
>>>>>>>>>>>>> going forward.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Signed-off-by: Kevin Sprague <ksprague0...@gmail.com>
>>>>>>>>>>>>> Co-authored-by: Aaron Conole <acon...@redhat.com>
>>>>>>>>>>>>> Signed-off-by: Aaron Conole <acon...@redhat.com>
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for following this up Aaron! See comments on this patch
>>>>>>>> below. I have no additional comments on patch 2.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>
>>>>>>>>>>>> Eelco
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> ---
>>>>>>>>>>>>>    Documentation/topics/usdt-probes.rst         |   1 +
>>>>>>>>>>>>>    ofproto/ofproto-dpif-upcall.c                |  42 +-
>>>>>>>>>>>>>    utilities/automake.mk                        |   3 +
>>>>>>>>>>>>>    utilities/usdt-scripts/flow_reval_monitor.py | 653 
>>>>>>>>>>>>> +++++++++++++++++++
>>>>>>>>>>>>>    4 files changed, 693 insertions(+), 6 deletions(-)
>>>>>>>>>>>>>    create mode 100755 utilities/usdt-scripts/flow_reval_monitor.py
>>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/Documentation/topics/usdt-probes.rst
> b/Documentation/topics/usdt-probes.rst
>>>>>>>>>>>>> index e527f43bab..a8da9bb1f7 100644
>>>>>>>>>>>>> --- a/Documentation/topics/usdt-probes.rst
>>>>>>>>>>>>> +++ b/Documentation/topics/usdt-probes.rst
>>>>>>>>>>>>> @@ -214,6 +214,7 @@ Available probes in ``ovs_vswitchd``:
>>>>>>>>>>>>>    - dpif_recv:recv_upcall
>>>>>>>>>>>>>    - main:poll_block
>>>>>>>>>>>>>    - main:run_start
>>>>>>>>>>>>> +- revalidate:flow_result
>>>>>>>>>>>>>    - revalidate_ukey\_\_:entry
>>>>>>>>>>>>>    - revalidate_ukey\_\_:exit
>>>>>>>>>>>>>    - udpif_revalidator:start_dump
>>>>>>>>>>>>
>>>>>>>>>>>> You are missing the specific flow_result result
> section. This is from the previous patch:
>>>>>>>>>>>
>>>>>>>>>>> D'oh!  Thanks for catching it.  I'll re-add it.
>>>>>>>>>>>
>>>>>>>>>>>> @@ -358,6 +360,27 @@  See also the ``main:run_start`` probe above.
>>>>>>>>>>>>    - ``utilities/usdt-scripts/bridge_loop.bt``
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> +probe revalidate:flow_result
>>>>>>>>>>>> +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>>>>>>>>>>> +
>>>>>>>>>>>> +**Description**:
>>>>>>>>>>>> +This probe is triggered when the revalidator decides whether or 
>>>>>>>>>>>> not to
>>>>>>>>>>>> +revalidate a flow. ``reason`` is an enum that denotes that either 
>>>>>>>>>>>> the flow
>>>>>>>>>>>> +is being kept, or the reason why the flow is being deleted. The
>>>>>>>>>>>> +``flow_reval_monitor.py`` script uses this probe to notify users 
>>>>>>>>>>>> when flows
>>>>>>>>>>>> +matching user-provided criteria are deleted.
>>>>>>>>>>>> +
>>>>>>>>>>>> +**Arguments**:
>>>>>>>>>>>> +
>>>>>>>>>>>> +- *arg0*: ``(enum flow_del_reason) reason``
>>>>>>>>>>>> +- *arg1*: ``(struct udpif *) udpif``
>>>>>>>>>>>> +- *arg2*: ``(struct udpif_key *) ukey``
>>>>>>>>>>>> +
>>>>>>>>>>>> +**Script references**:
>>>>>>>>>>>> +
>>>>>>>>>>>> +- ``utilities/usdt-scripts/flow_reval_monitor.py``
>>>>>>>>>>>> +
>>>>>>>>>>>> +
>>>>>>>>>>>>    Adding your own probes
>>>>>>>>>>>>    ----------------------
>>>>>>>>>>>>
>>>>>>>>>>>>> diff --git a/ofproto/ofproto-dpif-upcall.c 
>>>>>>>>>>>>> b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>>>>> index b5cbeed878..97d75833f7 100644
>>>>>>>>>>>>> --- a/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>>>>> +++ b/ofproto/ofproto-dpif-upcall.c
>>>>>>>>>>>>> @@ -269,6 +269,18 @@ enum ukey_state {
>>>>>>>>>>>>>    };
>>>>>>>>>>>>>    #define N_UKEY_STATES (UKEY_DELETED + 1)
>>>>>>>>>>>>>
>>>>>>>>>>>>> +enum flow_del_reason {
>>>>>>>>>>>>> +    FDR_REVALIDATE = 0,     /* The flow was revalidated. */
>>>>>>>>>>>>
>>>>>>>>>>>> It was called FDR_FLOW_LIVE before, which might make more
>>>>>>>> sense. As the flow is just NOT deleted. It might or might not have
>>>>>>>> been revalidated. Thoughts?
>>>>>>>>>>>
>>>>>>>>>>> I think it had to have been revalidated if we emit the reason, 
>>>>>>>>>>> because
>>>>>>>>>>> we only emit the reason code after revalidation.  IE: there are many
>>>>>>>>>>> places where we skip revalidation but the flow stays live - and we 
>>>>>>>>>>> don't
>>>>>>>>>>> emit reasons in those cases.
>>>>>>>>>>>
>>>>>>>>>>> So at least for this patch, it MUST have been revalidated.  But 
>>>>>>>>>>> maybe in
>>>>>>>>>>> the future, we would want to catch cases where the flow hasn't 
>>>>>>>>>>> been.  In
>>>>>>>>>>> that case, it makes sense to add the FDR_FLOW_LIVE at that time - I
>>>>>>>>>>> think.
>>>>>>>>>>>
>>>>>>>>>>> Maybe you disagree?
>>>>>>>>>>
>>>>>>>>>> Well, it depends on how you define revalidation, it might only have
>>>>>>>> updated the counters. i.e. it all depends on ‘bool need_revalidate =
>>>>>>>> ukey->reval_seq != reval_seq;’ in revalidate_ukey(). That was why I
>>>>>>>> opted for a more general name.
>>>>>>>>>>
>>>>>>>>>>>>> +    FDR_FLOW_IDLE,          /* The flow went unused and was 
>>>>>>>>>>>>> deleted. */
>>>>>>>>>>>>> +    FDR_TOO_EXPENSIVE,      /* The flow was too expensive to 
>>>>>>>>>>>>> revalidate. */
>>>>>>>>>>>>> +    FDR_FLOW_WILDCARDED,    /* The flow needed a narrower 
>>>>>>>>>>>>> wildcard mask. */
>>>>>>>>>>>>> +    FDR_BAD_ODP_FIT,        /* The flow had a bad ODP flow fit. 
>>>>>>>>>>>>> */
>>>>>>>>>>>>> +    FDR_NO_OFPROTO,         /* The flow didn't have an 
>>>>>>>>>>>>> associated ofproto. */
>>>>>>>>>>>>> +    FDR_XLATION_ERROR,      /* There was an error translating 
>>>>>>>>>>>>> the flow. */
>>>>>>>>>>>>> +    FDR_AVOID_CACHING,      /* Flow deleted to avoid caching. */
>>>>>>>>>>>>> +    FDR_FLOW_LIMIT,         /* All flows being killed. */
>>>>>>>>>>>>
>>>>>>>>>>>> Looking at the comment from Han on FDR_PURGE, and this patch
>>>>>>>> needing another spin, we should probably add it.
>>>>>>>>>>>
>>>>>>>>>>> I can do that, sure.  In that case, we will need to have a new flow 
>>>>>>>>>>> op
>>>>>>>>>>> added to revalidator_sweep__ so that we can catch it.  But in that 
>>>>>>>>>>> case,
>>>>>>>>>>> it will be a different usdt probe, so I still don't know if we need
>>>>>>>>>>> FDR_PURGE right?  WDYT?
>>>>>>>>>>
>>>>>>>>>> In revalidator_sweep__() you have sort of the following:
>>>>>>>>>>
>>>>>>>>>>                   if (purge || ukey_state == UKEY_INCONSISTENT) {
>>>>>>>>>>                       result = UKEY_DELETE;
>>>>>>>>>>                   } else if (!seq_mismatch) {
>>>>>>>>>>
>>>>>>>>>> And I’m afraid that if we use this tool to debug we miss the
>>>>>>>> ukey_state == UKEY_INCONSISTENT when debugging and spent a long time
>>>>>>>> figuring this out.
>>>>>>>>>> Maybe add something general like this (did not give it a lot of
>>>>>>>> thought), and only take the FDR_PURGE : FDR_UPDATE_FAIL results in the
>>>>>>>> script?
>>>>>>>>>>
>>>>>>>>>>    /* 'udpif_key's are responsible for tracking the little bit of 
>>>>>>>>>> state udpif
>>>>>>>>>> @@ -2991,13 +2993,13 @@ revalidator_sweep__(struct
> revalidator *revalidator, bool purge)
>>>>>>>>>>            uint64_t odp_actions_stub[1024 / 8];
>>>>>>>>>>            struct ofpbuf odp_actions = 
>>>>>>>>>> OFPBUF_STUB_INITIALIZER(odp_actions_stub);
>>>>>>>>>>
>>>>>>>>>> -        enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>>>>>            struct ukey_op ops[REVALIDATE_MAX_BATCH];
>>>>>>>>>>            struct udpif_key *ukey;
>>>>>>>>>>            struct umap *umap = &udpif->ukeys[i];
>>>>>>>>>>            size_t n_ops = 0;
>>>>>>>>>>
>>>>>>>>>>            CMAP_FOR_EACH(ukey, cmap_node, &umap->cmap) {
>>>>>>>>>> +            enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>>>>>                enum ukey_state ukey_state;
>>>>>>>>>>
>>>>>>>>>>                /* Handler threads could be holding a ukey lock while 
>>>>>>>>>> it installs a
>>>>>>>>>> @@ -3016,8 +3018,10 @@ revalidator_sweep__(struct
> revalidator *revalidator, bool purge)
>>>>>>>>>>
>>>>>>>>>>                    if (purge || ukey_state == UKEY_INCONSISTENT) {
>>>>>>>>>>                        result = UKEY_DELETE;
>>>>>>>>>> +                    reason = purge ? FDR_PURGE : FDR_UPDATE_FAIL;
>>>>>>>>>>                    } else if (!seq_mismatch) {
>>>>>>>>>>                        result = UKEY_KEEP;
>>>>>>>>>> +                    reason = FDR_REVALIDATE; //_KEEP
>>>>>>>>>>                    } else {
>>>>>>>>>>                        struct dpif_flow_stats stats;
>>>>>>>>>>                        COVERAGE_INC(revalidate_missed_dp_flow);
>>>>>>>>>> @@ -3030,6 +3034,8 @@ revalidator_sweep__(struct revalidator 
>>>>>>>>>> *revalidator, bool purge)
>>>>>>>>>>                        reval_op_init(&ops[n_ops++], result, udpif, 
>>>>>>>>>> ukey, &recircs,
>>>>>>>>>>                                      &odp_actions);
>>>>>>>>>>                    }
>>>>>>>>>> +                OVS_USDT_PROBE(revalidator_sweep__, flow_result, 
>>>>>>>>>> result,
>>>>>>>>>> +                               reason, udpif, ukey);
>>>>>>>>>>                }
>>>>>>>>>>                ovs_mutex_unlock(&ukey->mutex);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> In addition in revalidator_sweep__() should the “enum
>>>>>>>> flow_del_reason reason = FDR_REVALIDATE;” not be moved to the
>>>>>>>> CMAP_FOR_EACH() loop?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>> +};
>>>>>>>>>>>>> +
>>>>>>>>>>>>>    /* 'udpif_key's are responsible for tracking the little bit of 
>>>>>>>>>>>>> state udpif
>>>>>>>>>>>>>     * needs to do flow expiration which can't be pulled directly 
>>>>>>>>>>>>> from the
>>>>>>>>>>>>>     * datapath.  They may be created by any handler or 
>>>>>>>>>>>>> revalidator thread at any
>>>>>>>>>>>>> @@ -2272,7 +2284,8 @@ populate_xcache(struct udpif *udpif, struct 
>>>>>>>>>>>>> udpif_key *ukey,
>>>>>>>>>>>>>    static enum reval_result
>>>>>>>>>>>>>    revalidate_ukey__(struct udpif *udpif, const struct udpif_key 
>>>>>>>>>>>>> *ukey,
>>>>>>>>>>>>>                      uint16_t tcp_flags, struct ofpbuf 
>>>>>>>>>>>>> *odp_actions,
>>>>>>>>>>>>> -                  struct recirc_refs *recircs, struct 
>>>>>>>>>>>>> xlate_cache *xcache)
>>>>>>>>>>>>> +                  struct recirc_refs *recircs, struct 
>>>>>>>>>>>>> xlate_cache *xcache,
>>>>>>>>>>>>> +                  enum flow_del_reason *reason)
>>>>>>>>>>>>>    {
>>>>>>>>>>>>>        struct xlate_out *xoutp;
>>>>>>>>>>>>>        struct netflow *netflow;
>>>>>>>>>>>>> @@ -2293,11 +2306,13 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>>        netflow = NULL;
>>>>>>>>>>>>>
>>>>>>>>>>>>>        if (xlate_ukey(udpif, ukey, tcp_flags, &ctx)) {
>>>>>>>>>>>>> +        *reason = FDR_XLATION_ERROR;
>>>>>>>>>>>>>            goto exit;
>>>>>>>>>>>>>        }
>>>>>>>>>>>>>        xoutp = &ctx.xout;
>>>>>>>>>>>>>
>>>>>>>>>>>>>        if (xoutp->avoid_caching) {
>>>>>>>>>>>>> +        *reason = FDR_AVOID_CACHING;
>>>>>>>>>>>>>            goto exit;
>>>>>>>>>>>>>        }
>>>>>>>>>>>>>
>>>>>>>>>>>>> @@ -2311,6 +2326,7 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>>            ofpbuf_clear(odp_actions);
>>>>>>>>>>>>>
>>>>>>>>>>>>>            if (!ofproto) {
>>>>>>>>>>>>> +            *reason = FDR_NO_OFPROTO;
>>>>>>>>>>>>>                goto exit;
>>>>>>>>>>>>>            }
>>>>>>>>>>>>>
>>>>>>>>>>>>> @@ -2322,6 +2338,7 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>>        if (odp_flow_key_to_mask(ukey->mask, ukey->mask_len, 
>>>>>>>>>>>>> &dp_mask, &ctx.flow,
>>>>>>>>>>>>>                                 NULL)
>>>>>>>>>>>>>            == ODP_FIT_ERROR) {
>>>>>>>>>>>>> +        *reason = FDR_BAD_ODP_FIT;
>>>>>>>>>>>>>            goto exit;
>>>>>>>>>>>>>        }
>>>>>>>>>>>>>
>>>>>>>>>>>>> @@ -2331,6 +2348,7 @@ revalidate_ukey__(struct udpif *udpif,
>>>>>>> const struct udpif_key *ukey,
>>>>>>>>>>>>>         * down.  Note that we do not know if the datapath has 
>>>>>>>>>>>>> ignored any of the
>>>>>>>>>>>>>         * wildcarded bits, so we may be overly conservative here. 
>>>>>>>>>>>>> */
>>>>>>>>>>>>>        if (flow_wildcards_has_extra(&dp_mask, ctx.wc)) {
>>>>>>>>>>>>> +        *reason = FDR_FLOW_WILDCARDED;
>>>>>>>>>>>>>            goto exit;
>>>>>>>>>>>>>        }
>>>>>>>>>>>>>
>>>>>>>>>>>>> @@ -2400,7 +2418,7 @@ static enum reval_result
>>>>>>>>>>>>>    revalidate_ukey(struct udpif *udpif, struct udpif_key *ukey,
>>>>>>>>>>>>>                    const struct dpif_flow_stats *stats,
>>>>>>>>>>>>>                    struct ofpbuf *odp_actions, uint64_t reval_seq,
>>>>>>>>>>>>> -                struct recirc_refs *recircs)
>>>>>>>>>>>>> +                struct recirc_refs *recircs, enum 
>>>>>>>>>>>>> flow_del_reason *reason)
>>>>>>>>>>>>>        OVS_REQUIRES(ukey->mutex)
>>>>>>>>>>>>>    {
>>>>>>>>>>>>>        bool need_revalidate = ukey->reval_seq != reval_seq;
>>>>>>>>>>>>> @@ -2430,8 +2448,12 @@ revalidate_ukey(struct udpif *udpif, 
>>>>>>>>>>>>> struct udpif_key *ukey,
>>>>>>>>>>>>>                    xlate_cache_clear(ukey->xcache);
>>>>>>>>>>>>>                }
>>>>>>>>>>>>>                result = revalidate_ukey__(udpif, ukey, 
>>>>>>>>>>>>> push.tcp_flags,
>>>>>>>>>>>>> -                                       odp_actions, recircs, 
>>>>>>>>>>>>> ukey->xcache);
>>>>>>>>>>>>> -        } /* else delete; too expensive to revalidate */
>>>>>>>>>>>>> +                                       odp_actions, recircs, 
>>>>>>>>>>>>> ukey->xcache,
>>>>>>>>>>>>> +                                       reason);
>>>>>>>>>>>>> +        } else {
>>>>>>>>>>>>> +            /* delete; too expensive to revalidate */
>>>>>>>>>>>>> +            *reason = FDR_TOO_EXPENSIVE;
>>>>>>>>>>>>> +        }
>>>>>>>>>>>>>        } else if (!push.n_packets || ukey->xcache
>>>>>>>>>>>>>                   || !populate_xcache(udpif, ukey, 
>>>>>>>>>>>>> push.tcp_flags)) {
>>>>>>>>>>>>>            result = UKEY_KEEP;
>>>>>>>>>>>>> @@ -2831,6 +2853,7 @@ revalidate(struct revalidator *revalidator)
>>>>>>>>>>>>>            for (f = flows; f < &flows[n_dumped]; f++) {
>>>>>>>>>>>>>                long long int used = f->stats.used;
>>>>>>>>>>>>>                struct recirc_refs recircs = 
>>>>>>>>>>>>> RECIRC_REFS_EMPTY_INITIALIZER;
>>>>>>>>>>>>> +            enum flow_del_reason reason = FDR_REVALIDATE;
>>>>>>>>>>>>>                struct dpif_flow_stats stats = f->stats;
>>>>>>>>>>>>>                enum reval_result result;
>>>>>>>>>>>>>                struct udpif_key *ukey;
>>>>>>>>>>>>> @@ -2905,9 +2928,14 @@ revalidate(struct revalidator *revalidator)
>>>>>>>>>>>>>                }
>>>>>>>>>>>>>                if (kill_them_all || (used && used < now - 
>>>>>>>>>>>>> max_idle)) {
>>>>>>>>>>>>>                    result = UKEY_DELETE;
>>>>>>>>>>>>> +                if (kill_them_all) {
>>>>>>>>>>>>> +                    reason = FDR_FLOW_LIMIT;
>>>>>>>>>>>>> +                } else {
>>>>>>>>>>>>> +                    reason = FDR_FLOW_IDLE;
>>>>>>>>>>>>> +                }
>>>>>>>>>>>>>                } else {
>>>>>>>>>>>>>                    result = revalidate_ukey(udpif, ukey, &stats, 
>>>>>>>>>>>>> &odp_actions,
>>>>>>>>>>>>> -                                         reval_seq, &recircs);
>>>>>>>>>>>>> +                                         reval_seq, &recircs, 
>>>>>>>>>>>>> &reason);
>>>>>>>>>>>>>                }
>>>>>>>>>>>>>                ukey->dump_seq = dump_seq;
>>>>>>>>>>>>>
>>>>>>>>>>>>> @@ -2916,6 +2944,7 @@ revalidate(struct revalidator *revalidator)
>>>>>>>>>>>>>                    udpif_update_flow_pps(udpif, ukey, f);
>>>>>>>>>>>>>                }
>>>>>>>>>>>>>
>>>>>>>>>>>>> +            OVS_USDT_PROBE(revalidate, flow_result, reason, 
>>>>>>>>>>>>> udpif, ukey);
>>>>>>>>>
>>>>>>>>> I have been experimenting with several upcall tracking techniques
>>>>>>>> that would make it easier to correlate upcalls with their subsequent
>>>>>>>> related events.
>>>>>>>>> To achieve that, we need (among other things) some easy-to-compare
>>>>>>>> unique value in the events. For revalidation events, I think a good
>>>>>>>> candidate would be "ukey->ufid" and so does the script in this patch.
>>>>>>>>>
>>>>>>>>> However, requiring all external tools to know the layout of "struct
>>>>>>>> udpif_key" in order to get that value makes things quite complicated
>>>>>>>> for CORE tools (e.g: retis).
>>>>>>>>>
>>>>>>>>> With all this, would you consider adding the ufid to probe payload 
>>>>>>>>> directly?
>>>>>>>>
>>>>>>>> Dont see why we can not, but if we need anything else it would quickly
>>>>>>>> explode in too much arguments. I guess CORE needs a good solution for
>>>>>>>> userspace.
>>>>>>>
>>>>>>> I think having the ufid and udpif makes sense for now.  Actually, maybe
>>>>>>> for the long term we can do something as an appctl command which will
>>>>>>> dump the relevant datastructures.  That way a tool like this, which
>>>>>>> needs the pid anyway, can generate an appctl command against the daemon
>>>>>>> and get the exact right header layout.
>>>>>>>
>>>>>>> WDYT?
>>>>>>
>>>>>> Update on this - I have it ready to go with the exception of this
>>>>>> particular change.  The USDT probe code generation seems to have a bug,
>>>>>> or maybe it just won't work using these kinds of data types.  For
>>>>>> example, the bcc tools gets confused and ends up generating code for the
>>>>>> probe like:
>>>>>>
>>>>>>      __attribute__((always_inline))
>>>>>> static __always_inline int _bpf_readarg_usdt__flow_result_3(struct
>>>>> pt_regs *ctx, void *dest, size_t len) {
>>>>>>        if (len != sizeof(d)) return -1;
>>>>>> { u64 __addr = ctx->r13 + 40; __asm__ __volatile__("": : :"memory");
>>>>> uint128_t __res = 0x0; bpf_probe_read_user(&__res, sizeof(__res),
>>>>> (void *)__addr); *((d *)dest) = __res; }
>>>>>>        return 0;
>>>>>>      }
>>>>>>
>>>>>> This code isn't valid - the (d) variable isn't properly represented
>>>>>> here.  I guess maybe there's some issue with the way they use ctypes
>>>>>> internally?  It isn't clear to me.
>>>>>
>>>>> I miss some context, but is it because you point to ufid, which is a
>>>>> union (typedef union ovs_u128)? What if you point to the u32 member of
>>>>> the union?
>>>>
>>>> I'm not sure, but that doesn't seem like the best approach to me, tbh.
>>>>
>>>>>> We could work around this by making the ufid always pass as a void *,
>>>>>> and passing a length that tells the size.  However, if the underlying
>>>>>> data type ever changes, it will be totally incompatible.  I doubt it
>>>>>> would ever change, but at least with something stronger typed, we could
>>>>>> get a better exception that bubbled up.
>>>>>
>>>>> I do not like the pointer + length, as this adds another argument. I
>>>>> can’t remember which, but some tools had problems with 5 or more
>>>>> arguments.
>>>>
>>>> I do agree.  I also want to test the latest version of bcc-tools also,
>>>> to see if it really is just related to my f38 version of bcc-tools.
>>>> That also raises a different issue, but it can be something we solve in
>>>> the interrim with a version check.
>>>>
>>>>>> I'm okay with doing it this way, but just wanted to make sure that
>>>>>> retis, et. al. would be okay with interpreting it this way.
>>>>>>
>>>>>> Alternatively, there's definitely a bug there - we could omit the
>>>>>> argument entirely and rely on pahole or even pyelftools to help extract
>>>>>> the struct layout and get the ufid from the udpif_key struct.  The data
>>>>>> is already being sent, so it seems like duplication a bit to send it,
>>>>>> and becomes difficult in light of this bcc "bug?"
>>>>>
>>>>> I agree it’s overloading arguments with data already present, but I
>>>>> guess CORE can not yet read debug data from userspace
>>>>> applications. Adrian, I thought this was being worked on with the
>>>>> Pahole guys. If this is close we could just leave out this extra
>>>>> argument for now.
>>>>
>>>> That should be something doable, I think.  For example, there are crates
>>>> like gimli that parse DWARF data (but not sure if it is parsing v5,
>>>> yet).
>>>>
>>>>> Alternatively, we could ask the BCC guys if this is a known issue,
>>>>> with maybe a potential workaround. Just post it here
>>>>> https://github.com/iovisor/bcc/issues the respond quickly.
>>>>
>>>> They will tell to run with the latest version, I think.  The version I'm
>>>> using is what ships with Fedora 38, and seems quite out of date.
>>>
>>> ACK, will wait for your conclusion and the v9.
>> After tying to get it working with latest bcc-tools, the behavior is
>> actually much worse.  Whereas before the flow_reval_monitor script would
>> fail during the bpf compilation (with a clear error message), the latest
>> BCC tools will crash the python interpreter.
>> I'm going to omit the ufid details.  I think CO-RE will have support
>> anyway for this kinds of output, so v9 I will keep the args as-is and
>> when the support gets added to decode the debuginfo packages to pull
>> those structures, they will get the support.
>> 
>
> Sorry, maybe I missed something. What support will CO-RE have?

I guess, my understanding was the pahole work would be the thing CO-RE
needs, but after reading your comments, maybe that isn't true.

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to