On Mon, Jun 17, 2024 at 01:26:39PM GMT, Ilya Maximets wrote:
> On 6/17/24 09:08, Adrián Moreno wrote:
> > On Fri, Jun 14, 2024 at 12:55:59PM GMT, Aaron Conole wrote:
> >> Adrian Moreno <amore...@redhat.com> writes:
> >>
> >>> The behavior of actions might not be the exact same if they are being
> >>> executed inside a nested sample action. Store the probability of the
> >>> parent sample action in the skb's cb area.
> >>
> >> What does that mean?
> >>
> >
> > Emit action, for instance, needs the probability so that psample
> > consumers know what was the sampling rate applied. Also, the way we
> > should inform about packet drops (via kfree_skb_reason) changes (see
> > patch 7/9).
> >
> >>> Use the probability in emit_sample to pass it down to psample.
> >>>
> >>> Signed-off-by: Adrian Moreno <amore...@redhat.com>
> >>> ---
> >>>  include/uapi/linux/openvswitch.h |  3 ++-
> >>>  net/openvswitch/actions.c        | 25 ++++++++++++++++++++++---
> >>>  net/openvswitch/datapath.h       |  3 +++
> >>>  net/openvswitch/vport.c          |  1 +
> >>>  4 files changed, 28 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/include/uapi/linux/openvswitch.h b/include/uapi/linux/openvswitch.h
> >>> index a0e9dde0584a..9d675725fa2b 100644
> >>> --- a/include/uapi/linux/openvswitch.h
> >>> +++ b/include/uapi/linux/openvswitch.h
> >>> @@ -649,7 +649,8 @@ enum ovs_flow_attr {
> >>>   * Actions are passed as nested attributes.
> >>>   *
> >>>   * Executes the specified actions with the given probability on a per-packet
> >>> - * basis.
> >>> + * basis. Nested actions will be able to access the probability value of the
> >>> + * parent @OVS_ACTION_ATTR_SAMPLE.
> >>>   */
> >>>  enum ovs_sample_attr {
> >>>  	OVS_SAMPLE_ATTR_UNSPEC,
> >>> diff --git a/net/openvswitch/actions.c b/net/openvswitch/actions.c
> >>> index 3b4dba0ded59..33f6d93ba5e4 100644
> >>> --- a/net/openvswitch/actions.c
> >>> +++ b/net/openvswitch/actions.c
> >>> @@ -1048,12 +1048,15 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
> >>>  	struct nlattr *sample_arg;
> >>>  	int rem = nla_len(attr);
> >>>  	const struct sample_arg *arg;
> >>> +	u32 init_probability;
> >>>  	bool clone_flow_key;
> >>> +	int err;
> >>>
> >>>  	/* The first action is always 'OVS_SAMPLE_ATTR_ARG'. */
> >>>  	sample_arg = nla_data(attr);
> >>>  	arg = nla_data(sample_arg);
> >>>  	actions = nla_next(sample_arg, &rem);
> >>> +	init_probability = OVS_CB(skb)->probability;
> >>>
> >>>  	if ((arg->probability != U32_MAX) &&
> >>>  	    (!arg->probability || get_random_u32() > arg->probability)) {
> >>> @@ -1062,9 +1065,21 @@ static int sample(struct datapath *dp, struct sk_buff *skb,
> >>>  		return 0;
> >>>  	}
> >>>
> >>> +	if (init_probability) {
> >>> +		OVS_CB(skb)->probability = ((u64)OVS_CB(skb)->probability *
> >>> +					    arg->probability / U32_MAX);
> >>> +	} else {
> >>> +		OVS_CB(skb)->probability = arg->probability;
> >>> +	}
> >>> +
> >>
> >> I'm confused by this. Eventually, integer arithmetic will practically
> >> guarantee that nested sample() calls will go to 0. So eventually, the
> >> test above will be impossible to meet mathematically.
> >>
> >> OTOH, you could argue that a 1% of 50% is low anyway, but it still would
> >> have a positive probability count, and still be possible for
> >> get_random_u32() call to match.
> >>
> >
> > Using OVS's probability semantics, we can express probabilities as low
> > as (100/U32_MAX)% which is pretty low indeed. However, just because the
> > probability of executing the action is low I don't think we should not
> > report it.
> >
> > Rethinking the integer arithmetic, it's true that we should avoid
> > hitting zero on the division, eg: nesting 6x 1% sampling rates will make
> > the result be zero which will make probability restoration fail on the
> > way back. Therefore, the new probability should be at least 1.
> >
> >
> >> I'm not sure about this particular change. Why do we need it?
> >>
> >
> > Why do we need to propagate the probability down to nested "sample"
> > actions? or why do we need to store the probability in the cb area in
> > the first place?
> >
> > The former: Just for correctness as only storing the last one would be
> > incorrect. Although I don't know of any use for nested "sample" actions.
>
> I think, we can drop this for now. All the user interfaces specify
> the probability per action. So, it should be fine to report the
> probability of the action that emitted the sample without taking into
> account the whole timeline of that packet. Besides, packet can leave
> OVS and go back losing the metadata, so it will not actually be a
> full solution anyway. Single-action metadata is easier to define.
>

Sure, I guess we can drop it, I don't think there is a use case for
nested samples anyway.

> > The latter: To pass it down to psample so that sample receivers know
> > the sampling rate that was applied (and, e.g: do throughput estimations
> > like OVS does with IPFIX).
> >
> >
> >>>  	clone_flow_key = !arg->exec;
> >>> -	return clone_execute(dp, skb, key, 0, actions, rem, last,
> >>> -			     clone_flow_key);
> >>> +	err = clone_execute(dp, skb, key, 0, actions, rem, last,
> >>> +			    clone_flow_key);
> >>> +
> >>> +	if (!last)
> >>
> >> Is this right? Don't we only want to set the probability on the last
> >> action? Should the test be 'if (last)'?
> >>
> >
> > This is restoring the parent's probability after the actions in the
> > current sample action have been executed.
> >
> > If it was the last action there is no need to restore the probability
> > back to the parent's (or zero if there's only one level) since no
> > further action will require it. And more importantly, if it's the last
> > action, the packet gets freed inside that "branch" so we must not
> > access its memory.
> >
> >
> >>> +		OVS_CB(skb)->probability = init_probability;
> >>> +
> >>> +	return err;
> >>>  }
> >>>
> >>>  /* When 'last' is true, clone() should always consume the 'skb'.
> >>> @@ -1313,6 +1328,7 @@ static int execute_emit_sample(struct datapath *dp, struct sk_buff *skb,
> >>>  	struct psample_metadata md = {};
> >>>  	struct vport *input_vport;
> >>>  	const struct nlattr *a;
> >>> +	u32 rate;
> >>>  	int rem;
> >>>
> >>>  	for (a = nla_data(attr), rem = nla_len(attr); rem > 0;
> >>> @@ -1337,8 +1353,11 @@ static int execute_emit_sample(struct datapath *dp, struct sk_buff *skb,
> >>>
> >>>  	md.in_ifindex = input_vport->dev->ifindex;
> >>>  	md.trunc_size = skb->len - OVS_CB(skb)->cutlen;
> >>> +	md.rate_as_probability = 1;
> >>> +
> >>> +	rate = OVS_CB(skb)->probability ?
> >>> +	       OVS_CB(skb)->probability : U32_MAX;
> >>>
> >>> -	psample_sample_packet(&psample_group, skb, 0, &md);
> >>> +	psample_sample_packet(&psample_group, skb, rate, &md);
> >>>  #endif
> >>>
> >>>  	return 0;
> >>> diff --git a/net/openvswitch/datapath.h b/net/openvswitch/datapath.h
> >>> index 0cd29971a907..9ca6231ea647 100644
> >>> --- a/net/openvswitch/datapath.h
> >>> +++ b/net/openvswitch/datapath.h
> >>> @@ -115,12 +115,15 @@ struct datapath {
> >>>   * fragmented.
> >>>   * @acts_origlen: The netlink size of the flow actions applied to this skb.
> >>>   * @cutlen: The number of bytes from the packet end to be removed.
> >>> + * @probability: The sampling probability that was applied to this skb; 0 means
> >>> + * no sampling has occurred; U32_MAX means 100% probability.
> >>>   */
> >>>  struct ovs_skb_cb {
> >>>  	struct vport *input_vport;
> >>>  	u16 mru;
> >>>  	u16 acts_origlen;
> >>>  	u32 cutlen;
> >>> +	u32 probability;
> >>>  };
> >>>  #define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)
> >>>
> >>> diff --git a/net/openvswitch/vport.c b/net/openvswitch/vport.c
> >>> index 972ae01a70f7..8732f6e51ae5 100644
> >>> --- a/net/openvswitch/vport.c
> >>> +++ b/net/openvswitch/vport.c
> >>> @@ -500,6 +500,7 @@ int ovs_vport_receive(struct vport *vport, struct sk_buff *skb,
> >>>  	OVS_CB(skb)->input_vport = vport;
> >>>  	OVS_CB(skb)->mru = 0;
> >>>  	OVS_CB(skb)->cutlen = 0;
> >>> +	OVS_CB(skb)->probability = 0;
> >>>  	if (unlikely(dev_net(skb->dev) != ovs_dp_get_net(vport->dp))) {
> >>>  		u32 mark;
> >>
> >
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev
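
For reference, the fixed-point arithmetic discussed in this thread can be tried out in plain userspace C. This is only a rough sketch: the helper names (nested_probability, nested_probability_clamped, psample_rate) are made up for illustration and are not in the patch, and UINT32_MAX stands in for the kernel's U32_MAX. It assumes the convention from the quoted kerneldoc, i.e. U32_MAX means 100% and 0 means no sampling has occurred.

```c
#include <stdint.h>

/* Hypothetical helper (not in the patch): combine the parent's probability
 * with a nested sample action's probability. UINT32_MAX == 100%, and a
 * parent of 0 means "not inside another sample action". */
static inline uint32_t nested_probability(uint32_t parent, uint32_t prob)
{
	if (!parent)
		return prob;

	/* Widen to 64 bits so the product cannot overflow before dividing. */
	return (uint32_t)((uint64_t)parent * prob / UINT32_MAX);
}

/* Hypothetical variant with the "at least 1" floor suggested in the thread:
 * a product of two non-zero probabilities never truncates to 0, so it
 * cannot be mistaken for "no sampling has occurred". */
static inline uint32_t nested_probability_clamped(uint32_t parent, uint32_t prob)
{
	uint32_t scaled = nested_probability(parent, prob);

	if (parent && prob && !scaled)
		return 1;
	return scaled;
}

/* Mirrors the fallback in the quoted execute_emit_sample() hunk:
 * a stored probability of 0 is reported to psample as 100%. */
static inline uint32_t psample_rate(uint32_t probability)
{
	return probability ? probability : UINT32_MAX;
}
```

With these helpers, a small parent value such as 42 combined with a 1% rate (42949672) truncates to 0 in the unclamped version and to 1 in the clamped one. Since the thread leans towards dropping nested propagation, only the psample_rate() fallback would still apply to a single-level sample action.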