On 5/14/24 13:27, Eelco Chaudron wrote: > > > On 14 May 2024, at 13:05, Ilya Maximets wrote: > >> On 5/14/24 12:14, Adrian Moreno wrote: >>> >>> >>> On 5/14/24 11:09 AM, Ilya Maximets wrote: >>>> On 5/14/24 09:39, Adrian Moreno wrote: >>>>> >>>>> >>>>> On 5/10/24 12:45 PM, Adrian Moreno wrote: >>>>>> >>>>>> >>>>>> On 5/10/24 12:06 PM, Eelco Chaudron wrote: >>>>>>> On 24 Apr 2024, at 21:53, Adrian Moreno wrote: >>>>>>> >>>>>>>> The new odp sample attributes allow userspace to specify a group_id and >>>>>>>> user-defined cookie to be passed down to psample. >>>>>>>> >>>>>>>> Add support for parsing and formatting such action. >>>>>>>> >>>>>>>> Signed-off-by: Adrian Moreno <amore...@redhat.com> >>>>>>> >>>>>>> Hi Adrian, >>>>>>> >>>>>>> See my comments below inline. >>>>>>> >>>>>>> //Eelco >>>>>>> >>>>>>>> --- >>>>>>>> include/linux/openvswitch.h | 49 +++++++++--- >>>>>>>> lib/odp-execute.c | 3 + >>>>>>>> lib/odp-util.c | 150 >>>>>>>> ++++++++++++++++++++++++++--------- >>>>>>>> ofproto/ofproto-dpif-ipfix.c | 2 + >>>>>>>> python/ovs/flow/odp.py | 2 + >>>>>>>> tests/odp.at | 5 ++ >>>>>>>> 6 files changed, 163 insertions(+), 48 deletions(-) >>>>>>>> >>>>>>>> diff --git a/include/linux/openvswitch.h b/include/linux/openvswitch.h >>>>>>>> index d9fb991ef..3e748405c 100644 >>>>>>>> --- a/include/linux/openvswitch.h >>>>>>>> +++ b/include/linux/openvswitch.h >>>>>>>> @@ -696,6 +696,7 @@ enum ovs_flow_attr { >>>>>>>> #define OVS_UFID_F_OMIT_MASK (1 << 1) >>>>>>>> #define OVS_UFID_F_OMIT_ACTIONS (1 << 2) >>>>>>>> >>>>>>>> +#define OVS_PSAMPLE_COOKIE_MAX_SIZE 16 >>>>>>>> /** >>>>>>>> * enum ovs_sample_attr - Attributes for %OVS_ACTION_ATTR_SAMPLE >>>>>>>> action. >>>>>>>> * @OVS_SAMPLE_ATTR_PROBABILITY: 32-bit fraction of packets to >>>>>>>> sample with >>>>>>>> @@ -703,15 +704,27 @@ enum ovs_flow_attr { >>>>>>>> * %UINT32_MAX samples all packets and intermediate values sample >>>>>>>> intermediate >>>>>>>> * fractions of packets. >>>>>>>> * @OVS_SAMPLE_ATTR_ACTIONS: Set of actions to execute in sampling >>>>>>>> event. >>>>>>>> - * Actions are passed as nested attributes. >>>>>>>> + * Actions are passed as nested attributes. Optional if >>>>>>>> + * OVS_SAMPLE_ATTR_PSAMPLE_GROUP is set. >>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_GROUP: A 32-bit number to be used as >>>>>>>> psample group. >>>>>>>> + * Optional if OVS_SAMPLE_ATTR_ACTIONS is set. >>>>>>>> + * @OVS_SAMPLE_ATTR_PSAMPLE_COOKIE: A variable-length binary cookie >>>>>>>> that, if >>>>>>>> + * provided, will be copied to the psample cookie. >>>>>>> >>>>>>> Should we mention any limit on the actual length of the cookie? >>>>>>> >>>>>>> Any reason we explicitly call this psample? From an OVS perspective, >>>>>>> this >>>>>>> could just be >>>>>>> SAMPLE_ATTR_FORWARD/OFFLOAD/DESTINATION/ENDPOINT/COLLECTOR/TARGET_COOKIE >>>>>>> and >>>>>>> _GROUP. Other datapaths, do not have PSAMPLE. >>>>>>> >>>>>> >>>>>> See my response to your comment on the cover-letter for possible name >>>>>> alternatives. >>>>>> >>>>>> >>>>>>> Thinking about it more, from an OVS perspective we should probably not >>>>>>> even >>>>>>> send down a COOKIE, but we should send down an obs_domain_id and >>>>>>> obs_point_id, >>>>>>> and then the kernel can move this into a cookie. >>>>>>> >>>>>> >>>>>> I did consider this. However, the opaque cookie fits well with the tc >>>>>> API which >>>>>> does not have two 32-bit values but an opaque 128-bit cookie. So if the >>>>>> action >>>>>> is offloaded OVS needs to "encode" the two 32-bit values into the cookie >>>>>> anyways. >>>>>> >>>>>>> The collector itself could be called system/local collector, or >>>>>>> something like >>>>>>> that. This way it can use for example psample for kernel and UDST >>>>>>> probes for >>>>>>> usespace. Both can pass the group and cookie (obs_domain_id and >>>>>>> obs_point_id). >>>>>>> >>>>>>> Also, the presence of any of this should not dictate the psample >>>>>>> action, we >>>>>>> probably need a specific OVS_SAMPLE_ATTR_SYSTEM_TARGET type nl flag. >>>>>>> >>>>>> >>>>>> It clearer and it might simplify option verification a bit, but isn't a >>>>>> bit >>>>>> redundant? There is no flag for action execution for instance, the >>>>>> presence of >>>>>> the attribute is enough. >>>>>> >>>>>> An alternative would be to have nested attributes, e.g: >>>>>> >>>>>> enum ovs_sample_attr { >>>>>> OVS_SAMPLE_ATTR_UNSPEC, >>>>>> OVS_SAMPLE_ATTR_PROBABILITY, /* u32 number */ >>>>>> OVS_SAMPLE_ATTR_ACTIONS, /* Nested OVS_ACTION_ATTR_* >>>>>> attributes. */ >>>>>> OVS_SAMPLE_ATTR_SYSTEM, /* Nested OVS_SAMPLE_SYSTEM_ATTR_* >>>>>> attributes. */ >>>>>> >>>>>> __OVS_SAMPLE_ATTR_MAX, >>>>>> }; >>>>>> >>>>>> enum ovs_sample_system_attr { >>>>>> OVS_SAMPLE_SYSTEM_ATTR_OBS_DOMAIN, /* u32 number */ >>>>>> OVS_SAMPLE_SYSTEM_ATTR_OBS_POINT, /* u32 number */ >>>>>> OVS_SAMPLE_SYSTEM_ATTR_COOKIE, /* Nested OVS_ACTION_ATTR_* >>>>>> >>>>>> __OVS_SSAMPLE_SYSTEM_ATTR_MAX, >>>>>> }; >>>>>> >>>>>> WDYT? >>>>>> >>>>> >>>>> Another benefit of nested attributes is the easier addition of other >>>>> sample-related attributes, I already have a candidate: >>>>> OVS_SAMPLE_SYSTEM_TRUNC >>>>> >>>>> This will be passed to psample (md->trunc_size) who will truncate the >>>>> packet >>>>> size to the specified value. This can be useful for large packets for >>>>> which we >>>>> are only interested in the headers. >>>>> >>>>> WDYT? >>>>> >>>>> [snip] >>>>> >>>> >>>> Hrm. This makes me think these should not be attributes of a sample >>>> action at all, >>>> but rather we should have a separate EMIT_SAMPLE action that just calls >>>> psample, so >>>> we can combine it with other actions: >>>> >>>> sample(..., actions(userspace(...),trunc(100),emit_sample(cookie=...))) >>>> >>>> or even: >>>> >>>> sample(..., >>>> actions(clone(trunc(100),emit_sample(cookie=...)),userspace(...))) >>>> >>>> This will also simplify the requires_datapath_assistance() check as we can >>>> just >>>> check the action and not iterate over the list of attributes. And it will >>>> be >>>> capable to only send this one action to the datapath and not the whole >>>> thing. >>>> >>> >>> My suggestion of adding a OVS_SAMPLE_{}_TRUNC was precisely to avoid using >>> the >>> existing trunc implementation. >>> >>> Truncation is implemented inside psample so we would not be maintaining two >>> implementations. >> >> We will maintain trunc action and a trunc attribute for sampling, so we will >> have a duplication here. >> >>> psample implementation does not require packet clone (which is >>> needed for OVS trunc) plus it can be easily optimized out if there is no >>> listener in the multicast group (I plan to add that optimization in the next >>> kernel series). >> >> 'trunc' doesn't require a clone in case there are no other actions after, >> which >> is the main use case for drop sampling. >> >> BTW, having explicit drop actions might have broken some of the assumptions >> around this optimization, but it should be fixable. >> >> And psample itself might need a clone() in case there was a trunc() before >> sampling. psample will report skb->len as original packet size, but that >> is will not be correct as we have a postponed cutlen, so the reported length >> must be the length after the cut and that can't be done without actually >> cutting the skb at this point. >> >>> >>> Having said that, I'm not against adding a new action if that simplifies >>> things. >>> I think it might simplify things in ovs-vswitchd and even in the kernel. >>> However, I have one concern. >>> >>> IIUC and based on your examples, your suggestions means we would not have >>> sampling rate inside emit_sample, right? This might be a problem because >>> psample >>> does have sampling rate which I believe to be crucial to balance performance >>> impact and accuracy. >> >> psample doesn't limit the rate, it just reports a number. Rate is part of >> the main sample() action, i.e. >> >> sample(sample=75%,actions(emit_sample(cookie=...))) >> >> Having rate also in the 'emit_sample' would be another API duplication. > > I agree, but this is the confusing part :) If you use the psample TC action, > it will have the rate-limiting as part of the action. > > So if we offload the above, we will have a nested tc psample action, the > outer one with the desired sample rate, and the inner one with a 100% sample > rate. Actually, the inner one needs to be handled by sw as it will go to > vswitchd (and than back to psample?). I guess this avoids the idea of doing > this fast… Maybe if emit_sample() is the only action we can do some > optimisation, but it needs to be translatable back and forth from TC to > OVS_ATTRs. > > I guess we need the same for OVS_ACTION_ATTR_USERSPACE actions, which by the > way Nvidia was going to work on once the sFlow part was accepted.
The actual sample() action can not be offloaded in a generic case. There is no smaple action in TC that would accept arbitrary action list. We can only offload a few corner cases. Today we offload none. With the sFlow series we can offload a single specific OVS_ACTION_ATTR_USERSPACE that contains USER_ACTION_COOKIE_SFLOW. And we can't offload more than one sampling method within a single OVS_ACTION_ATTR_SAMPLE, because in TC they'll have independent probabilities, which is not a desired behavior. So, we'll always have special case offloading for each particular action inside the sample(). > >> BTW, "sampling" column is part of IPFIX and sFlow tables and you're not >> adding a new table for this new smapling method, i.e. it will not be >> possible to set sampling rate for psample with the current database >> configuration. >> >>> This value must be present in the psample sample or else >>> users will not know how inaccurate the sample is. >> >> Users can check the sampling configuration in the database if we will >> have one. And they are configuring it in the first place. And it will >> be confusing to see two rates in the same action list. >> >> It might be possible to carry the current rate in the OVS_CB(skb) and >> pass it to the psample. >> >> In any case, the rate reported by psample will not be accurate if there >> are nested sample actions, unless we carry and properly multiply / divide >> probabilities when entering / exiting the sample action. >> >>> >>> >>>> Naming is still hard though. I like the word "emit" because it doesn't >>>> have a >>>> concept of directionality, i.e. we're not sending it to somewhere, we're >>>> just >>>> emitting it to the wild. That kind of matches with our vague definition >>>> of a >>>> "local" or "system" sampling. >>>> >>> >>> I like the word "emit". >>> >>>> Another option might be 'OUTPUT_SAMPLE/SAMPLE_OUTPUT', i.e. output(sample, >>>> cookie=...). >>>> >>>> The main concern is that we already have truncation implemented, so we can >>>> be more >>>> flexible in a way we generate actions instead of adding new attributes. >>>> >>>> Thoughts? >>> >>> See my comment above. I am not suggesting we use the current trunc >>> implementation nor to add a new one, just to use psample's. >>> > _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev