On 21 Oct 2022, at 18:35, Kevin Sprague wrote:

> During normal operations, it is useful to understand when a particular flow
> gets removed from the system. This can be useful when debugging performance
> issues tied to ofproto flow changes, trying to determine deployed traffic
> patterns, or while debugging dynamic systems where ports come and go.
>
> Prior to this change, there was a lack of visibility around flow expiration.
> The existing debugging infrastructure could tell us when a flow was added to
> the datapath, but not when it was removed or why.
>
> This change introduces a USDT probe at the point where the revalidator
> determines that the flow should be removed.  Additionally, we track the
> reason for the flow eviction and provide that information as well.  With
> this change, we can track the complete flow lifecycle for the netlink datapath
> by hooking the upcall tracepoint in kernel, the flow put USDT, and the
> revaldiator USDT, letting us watch as flows are added and removed from the
> kernel datapath.
>
> This change only enables this information via USDT probe, so it won't be
> possible to access this information any other way (see:
> Documentation/topics/usdt-probes.rst).
>
> Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py) that
> serves as a demonstration of how the new USDT probe might be used going
> forward.
>
> Change since v5: fixed author information.
>
> Signed-off-by: Kevin Sprague <ksprague0...@gmail.com>


Hi Kevin,

Most of the changes look fine to me, however, there are still a lot of crashes 
in the filter code.

Also for now including the OVS data structures in the script will work for now. 
If we do not get a solution before this gets merged, I’ll fix up all the 
scripts that need this later.

About the crash, it has to do with when we do not receive any uuid/key (I think 
I did not research).

But if I start the script, and do the following (RHEL8):

  ovs-vsctl del-br br-int
  ovs-vsctl add-br br-int

Now I get this:

TIME               UFID                                          EVENT/REASON
5361884.255647616  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255689699  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255712148  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255734158  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255753341  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255772079  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255805591  ufid:2876428c-567e-429c-9dc3-0000d83503f1     Insert (put) 
flow to kernel.
5361884.255832007  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255852449  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255871090  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255889960  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255909455  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255928863  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.
5361884.255948291  ufid:00000000-0000-0000-0000-000000000000     Insert (put) 
flow to kernel.

So a lot of all 0 ufid’s, did not investigate if this is true, or a script 
error.


Now if I use the script with the -k option:

$ ./flow_reval_monitor.py -k
TIME               UFID                                          EVENT/REASON
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 234, in 'calling callback function'
  File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
ringbuf_cb_
    ret = callback(ctx, data, size)
  File "./flow_reval_monitor.py", line 502, in handle_event
    handle_flow_put(event)
  File "./flow_reval_monitor.py", line 227, in handle_flow_put
    key = decode_key(bytes(event.key)[:event.key_size])
  File "./flow_reval_monitor.py", line 328, in decode_key
    result[get_ovs_key_attr_str(nla_type)] = nla_data
  File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str
    return ovs_key_attr[attr]
IndexError: list index out of range

Same thing if I try to use a filter option:

[wsfd-netdev64:~/...ilities/usdt-scripts]$ ./flow_reval_monitor.py -f ipv6
TIME               UFID                                          EVENT/REASON
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 234, in 'calling callback function'
  File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
ringbuf_cb_
    ret = callback(ctx, data, size)
  File "./flow_reval_monitor.py", line 502, in handle_event
    handle_flow_put(event)
  File "./flow_reval_monitor.py", line 227, in handle_flow_put
    key = decode_key(bytes(event.key)[:event.key_size])
  File "./flow_reval_monitor.py", line 328, in decode_key
    result[get_ovs_key_attr_str(nla_type)] = nla_data
  File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str
    return ovs_key_attr[attr]
IndexError: list index out of range


Also, an invalid filter would also crash. I think Adrian requested to report 
errors for the unknown/invalid filter option, i.e. do a sanity check of the 
supplied filter.

$ ./flow_reval_monitor.py -f eelco
TIME               UFID                                          EVENT/REASON
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 234, in 'calling callback function'
  File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
ringbuf_cb_
    ret = callback(ctx, data, size)
  File "./flow_reval_monitor.py", line 502, in handle_event
    handle_flow_put(event)
  File "./flow_reval_monitor.py", line 232, in handle_flow_put
    find_and_delete_from_watchlist(event)
  File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist
    b["watchlist"].items_delete_batch(key)

I guess you need to re-look at it, as even your example is crashing.

$ ./flow_reval_monitor.py --filter-flows "ipv4(src=192.168.10.10)"
TIME               UFID                                          EVENT/REASON
Traceback (most recent call last):
  File "_ctypes/callbacks.c", line 234, in 'calling callback function'
  File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in 
ringbuf_cb_
    ret = callback(ctx, data, size)
  File "./flow_reval_monitor.py", line 502, in handle_event
    handle_flow_put(event)
  File "./flow_reval_monitor.py", line 232, in handle_flow_put
    find_and_delete_from_watchlist(event)
  File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist
    b["watchlist"].items_delete_batch(key)
AttributeError: 'HashTable' object has no attribute 'items_delete_batch'


Maybe you should also add some details on what filter options are supported and 
the format as it gets me puzzled…

Cheers,

Eelco

_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to