On 21 Oct 2022, at 18:35, Kevin Sprague wrote:
> During normal operations, it is useful to understand when a particular flow > gets removed from the system. This can be useful when debugging performance > issues tied to ofproto flow changes, trying to determine deployed traffic > patterns, or while debugging dynamic systems where ports come and go. > > Prior to this change, there was a lack of visibility around flow expiration. > The existing debugging infrastructure could tell us when a flow was added to > the datapath, but not when it was removed or why. > > This change introduces a USDT probe at the point where the revalidator > determines that the flow should be removed. Additionally, we track the > reason for the flow eviction and provide that information as well. With > this change, we can track the complete flow lifecycle for the netlink datapath > by hooking the upcall tracepoint in kernel, the flow put USDT, and the > revaldiator USDT, letting us watch as flows are added and removed from the > kernel datapath. > > This change only enables this information via USDT probe, so it won't be > possible to access this information any other way (see: > Documentation/topics/usdt-probes.rst). > > Also included is a script (utilities/usdt-scripts/flow_reval_monitor.py) that > serves as a demonstration of how the new USDT probe might be used going > forward. > > Change since v5: fixed author information. > > Signed-off-by: Kevin Sprague <ksprague0...@gmail.com> Hi Kevin, Most of the changes look fine to me, however, there are still a lot of crashes in the filter code. Also for now including the OVS data structures in the script will work for now. If we do not get a solution before this gets merged, I’ll fix up all the scripts that need this later. About the crash, it has to do with when we do not receive any uuid/key (I think I did not research). But if I start the script, and do the following (RHEL8): ovs-vsctl del-br br-int ovs-vsctl add-br br-int Now I get this: TIME UFID EVENT/REASON 5361884.255647616 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255689699 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255712148 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255734158 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255753341 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255772079 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255805591 ufid:2876428c-567e-429c-9dc3-0000d83503f1 Insert (put) flow to kernel. 5361884.255832007 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255852449 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255871090 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255889960 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255909455 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255928863 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. 5361884.255948291 ufid:00000000-0000-0000-0000-000000000000 Insert (put) flow to kernel. So a lot of all 0 ufid’s, did not investigate if this is true, or a script error. Now if I use the script with the -k option: $ ./flow_reval_monitor.py -k TIME UFID EVENT/REASON Traceback (most recent call last): File "_ctypes/callbacks.c", line 234, in 'calling callback function' File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in ringbuf_cb_ ret = callback(ctx, data, size) File "./flow_reval_monitor.py", line 502, in handle_event handle_flow_put(event) File "./flow_reval_monitor.py", line 227, in handle_flow_put key = decode_key(bytes(event.key)[:event.key_size]) File "./flow_reval_monitor.py", line 328, in decode_key result[get_ovs_key_attr_str(nla_type)] = nla_data File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str return ovs_key_attr[attr] IndexError: list index out of range Same thing if I try to use a filter option: [wsfd-netdev64:~/...ilities/usdt-scripts]$ ./flow_reval_monitor.py -f ipv6 TIME UFID EVENT/REASON Traceback (most recent call last): File "_ctypes/callbacks.c", line 234, in 'calling callback function' File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in ringbuf_cb_ ret = callback(ctx, data, size) File "./flow_reval_monitor.py", line 502, in handle_event handle_flow_put(event) File "./flow_reval_monitor.py", line 227, in handle_flow_put key = decode_key(bytes(event.key)[:event.key_size]) File "./flow_reval_monitor.py", line 328, in decode_key result[get_ovs_key_attr_str(nla_type)] = nla_data File "./flow_reval_monitor.py", line 373, in get_ovs_key_attr_str return ovs_key_attr[attr] IndexError: list index out of range Also, an invalid filter would also crash. I think Adrian requested to report errors for the unknown/invalid filter option, i.e. do a sanity check of the supplied filter. $ ./flow_reval_monitor.py -f eelco TIME UFID EVENT/REASON Traceback (most recent call last): File "_ctypes/callbacks.c", line 234, in 'calling callback function' File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in ringbuf_cb_ ret = callback(ctx, data, size) File "./flow_reval_monitor.py", line 502, in handle_event handle_flow_put(event) File "./flow_reval_monitor.py", line 232, in handle_flow_put find_and_delete_from_watchlist(event) File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist b["watchlist"].items_delete_batch(key) I guess you need to re-look at it, as even your example is crashing. $ ./flow_reval_monitor.py --filter-flows "ipv4(src=192.168.10.10)" TIME UFID EVENT/REASON Traceback (most recent call last): File "_ctypes/callbacks.c", line 234, in 'calling callback function' File "/usr/lib/python3.6/site-packages/bcc/table.py", line 1068, in ringbuf_cb_ ret = callback(ctx, data, size) File "./flow_reval_monitor.py", line 502, in handle_event handle_flow_put(event) File "./flow_reval_monitor.py", line 232, in handle_flow_put find_and_delete_from_watchlist(event) File "./flow_reval_monitor.py", line 218, in find_and_delete_from_watchlist b["watchlist"].items_delete_batch(key) AttributeError: 'HashTable' object has no attribute 'items_delete_batch' Maybe you should also add some details on what filter options are supported and the format as it gets me puzzled… Cheers, Eelco _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev