On 25/02/2022 10:08, Simon Horman wrote:
> Hi Pieter,
>
> nice to hear from you :)
Hi Simon. Thank you for the insightful feedback.
>
> On Tue, Feb 22, 2022 at 12:17:40PM +0000, pieter.jansen-van-vuu...@amd.com
> wrote:
>> From: Pieter Jansen van Vuuren <pieter.jansen-van-vuu...@amd.com>
>>
>> Previously when making use of the tc datapath to achieve decrement ttl,
>> we would install a filter that matches on the ttl/hoplimit field and use
>> a pedit set action to set the ttl/hoplimit to one less than the match
>> value. This results in a filter for each unique ttl value we want to
>> decrement.
>>
>> This patch introduces true decrement ttl which makes use of the pedit add
>> action. Adding 0xff is equivalent to decrementing the ttl/hoplimit
>> field. This also improves the hardware offload case by reducing the
>> number of filters required to support decrement ttl. In order to utilise
>> this, the following config option needs to be set to true.
>
> I'd be interested to understand the specific motivation for wanting
> to reduce flows. But I agree that in the general case it seems nice.
I think main motivation is just relaxing some of the constraints on HW
offload slightly; although matching ip ttl and setting it to one less
than the match field is possible, it feels a bit wasteful if we really
only decrement the field. I concede that depending on the use case the
gain might be insignificant. Mostly I just wanted to start a discussion
on this as I think we can improve this area a bit.
>
> One thing that I am curious about is the approach you have taken,
> which seems to be to patch pedit commands. But I am wondering if you
> considered plumbing the OpenFlow DEC_TTL command more directly into
> the ODP layer - perhaps hinging on the handling of OFPACT_DEC_TTL in
> do_xlate_actions().
Yes, I think that is good point. I think this ties in with Eelco's work;
https://patchwork.ozlabs.org/project/openvswitch/patch/162134902591.762107.15543938565196336653.st...@wsfd-netdev64.ntdv.lab.eng.bos.redhat.com/
Possibly this patch needs to be rebased on top Eelco's v5 of the above.
>
>> ovs-vsctl set Open_vSwitch . other_config:tc-pedit-add=true
>
>> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuu...@amd.com>
>> Reviewed-by: Alejandro Lucero Palau <aluce...@amd.com>
>> Reviewed-by: Martin Habets <martin.hab...@amd.com>
>> ---
>> .../linux/compat/include/linux/openvswitch.h | 1 +
>> lib/dpif-netdev.c | 1 +
>> lib/dpif.c | 1 +
>> lib/netdev-offload-tc.c | 16 ++++++++-
>> lib/netdev-offload.c | 12 +++++++
>> lib/netdev-offload.h | 1 +
>> lib/odp-execute.c | 2 ++
>> lib/odp-util.c | 4 +++
>> lib/tc.c | 35 +++++++++++++++++--
>> lib/tc.h | 1 +
>> ofproto/ofproto-dpif-ipfix.c | 1 +
>> ofproto/ofproto-dpif-sflow.c | 1 +
>> vswitchd/vswitch.xml | 15 ++++++++
>> 13 files changed, 87 insertions(+), 4 deletions(-)
>>
>> diff --git a/datapath/linux/compat/include/linux/openvswitch.h
>> b/datapath/linux/compat/include/linux/openvswitch.h
>> index 8d9300091..f7daffeb0 100644
>> --- a/datapath/linux/compat/include/linux/openvswitch.h
>> +++ b/datapath/linux/compat/include/linux/openvswitch.h
>> @@ -1075,6 +1075,7 @@ enum ovs_action_attr {
>> OVS_ACTION_ATTR_DROP, /* u32 xlate_error. */
>> OVS_ACTION_ATTR_LB_OUTPUT, /* u32 bond-id. */
>> #endif
>> + OVS_ACTION_ATTR_DEC_TTL, /* No argument. */
>
> I'm not entirely sure, but I think that:
> * if this attribute is not used by the OVS kernel datapath then
> it should be inside the ifndef; otherwise
> * it should be above the ifndef
>
> Perhaps there is precedence, but it would strike me as unusual to implement
> a feature in the TC offload datapath but not in the OVS kernel datapath.
Yeah, I see. I think you are correct; this needs to be implement in the kernel
datapath as well and likely before we tackle the TC offload datapath.
I am hoping to rebase this on Eelco's patch that deals with this.
>
>> __OVS_ACTION_ATTR_MAX, /* Nothing past this will be accepted
>> * from userspace. */
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index 9f35713ef..6343da93b 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -9067,6 +9067,7 @@ dp_execute_cb(void *aux_, struct dp_packet_batch
>> *packets_,
>> case OVS_ACTION_ATTR_PUSH_ETH:
>> case OVS_ACTION_ATTR_POP_ETH:
>> case OVS_ACTION_ATTR_CLONE:
>> + case OVS_ACTION_ATTR_DEC_TTL:
>> case OVS_ACTION_ATTR_PUSH_NSH:
>> case OVS_ACTION_ATTR_POP_NSH:
>> case OVS_ACTION_ATTR_CT_CLEAR:
>> diff --git a/lib/dpif.c b/lib/dpif.c
>> index 40f5fe446..6ac5e3581 100644
>> --- a/lib/dpif.c
>> +++ b/lib/dpif.c
>> @@ -1265,6 +1265,7 @@ dpif_execute_helper_cb(void *aux_, struct
>> dp_packet_batch *packets_,
>> case OVS_ACTION_ATTR_POP_VLAN:
>> case OVS_ACTION_ATTR_PUSH_MPLS:
>> case OVS_ACTION_ATTR_POP_MPLS:
>> + case OVS_ACTION_ATTR_DEC_TTL:
>> case OVS_ACTION_ATTR_SET:
>> case OVS_ACTION_ATTR_SET_MASKED:
>> case OVS_ACTION_ATTR_SAMPLE:
>> diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
>> index 9845e8d3f..878d63d37 100644
>> --- a/lib/netdev-offload-tc.c
>> +++ b/lib/netdev-offload-tc.c
>> @@ -18,6 +18,7 @@
>>
>> #include <errno.h>
>> #include <linux/if_ether.h>
>> +#include <linux/tc_act/tc_pedit.h>
>>
>> #include "dpif.h"
>> #include "hash.h"
>> @@ -877,7 +878,20 @@ parse_tc_flower_to_match(struct tc_flower *flower,
>> }
>> break;
>> case TC_ACT_PEDIT: {
>> - parse_flower_rewrite_to_netlink_action(buf, flower);
>> + if (action->pedit_type == TCA_PEDIT_KEY_EX_CMD_ADD) {
>> + if (flower->rewrite.key.eth_type == htons(ETH_TYPE_IP)
>> &&
>> + flower->rewrite.mask.ipv4.rewrite_ttl == 0xff &&
>> + flower->rewrite.key.ipv4.rewrite_ttl == 0xff) {
>> + nl_msg_put_flag(buf, OVS_ACTION_ATTR_DEC_TTL);
>> + }
>> + if (flower->rewrite.key.eth_type ==
>> htons(ETH_TYPE_IPV6) &&
>> + flower->rewrite.mask.ipv6.rewrite_hlimit == 0xff &&
>> + flower->rewrite.key.ipv6.rewrite_hlimit == 0xff) {
>> + nl_msg_put_flag(buf, OVS_ACTION_ATTR_DEC_TTL);
>> + }
>> + } else {
>> + parse_flower_rewrite_to_netlink_action(buf, flower);
>> + }
>> }
>> break;
>> case TC_ACT_ENCAP: {
>> diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
>> index fb108c0d5..c7551dd97 100644
>> --- a/lib/netdev-offload.c
>> +++ b/lib/netdev-offload.c
>> @@ -765,6 +765,14 @@ netdev_is_offload_rebalance_policy_enabled(void)
>> return netdev_offload_rebalance_policy;
>> }
>>
>> +static bool netdev_offload_tc_pedit_add = false;
>> +
>> +bool
>> +netdev_is_tc_pedit_add_enabled(void)
>> +{
>> + return netdev_offload_tc_pedit_add;
>> +}
>> +
>> static void
>> netdev_ports_flow_init(void)
>> {
>> @@ -780,6 +788,10 @@ netdev_ports_flow_init(void)
>> void
>> netdev_set_flow_api_enabled(const struct smap *ovs_other_config)
>> {
>> + if (smap_get_bool(ovs_other_config, "tc-pedit-add", false)) {
>> + netdev_offload_tc_pedit_add = true;
>> + }
>> +
>> if (smap_get_bool(ovs_other_config, "hw-offload", false)) {
>> static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
>>
>> diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h
>> index 8237a85dd..1d38fe338 100644
>> --- a/lib/netdev-offload.h
>> +++ b/lib/netdev-offload.h
>> @@ -126,6 +126,7 @@ bool netdev_any_oor(void);
>> bool netdev_is_flow_api_enabled(void);
>> void netdev_set_flow_api_enabled(const struct smap *ovs_other_config);
>> bool netdev_is_offload_rebalance_policy_enabled(void);
>> +bool netdev_is_tc_pedit_add_enabled(void);
>> int netdev_flow_get_n_flows(struct netdev *netdev, uint64_t *n_flows);
>>
>> struct dpif_port;
>> diff --git a/lib/odp-execute.c b/lib/odp-execute.c
>> index 2f4cdd92c..026d6fb0e 100644
>> --- a/lib/odp-execute.c
>> +++ b/lib/odp-execute.c
>> @@ -821,6 +821,7 @@ requires_datapath_assistance(const struct nlattr *a)
>> case OVS_ACTION_ATTR_CHECK_PKT_LEN:
>> case OVS_ACTION_ATTR_ADD_MPLS:
>> case OVS_ACTION_ATTR_DROP:
>> + case OVS_ACTION_ATTR_DEC_TTL:
>> return false;
>>
>> case OVS_ACTION_ATTR_UNSPEC:
>> @@ -1001,6 +1002,7 @@ odp_execute_actions(void *dp, struct dp_packet_batch
>> *batch, bool steal,
>> }
>> break;
>> case OVS_ACTION_ATTR_METER:
>> + case OVS_ACTION_ATTR_DEC_TTL:
>> /* Not implemented yet. */
>> break;
>> case OVS_ACTION_ATTR_PUSH_ETH: {
>> diff --git a/lib/odp-util.c b/lib/odp-util.c
>> index 9a705cffa..abbf649c1 100644
>> --- a/lib/odp-util.c
>> +++ b/lib/odp-util.c
>> @@ -131,6 +131,7 @@ odp_action_len(uint16_t type)
>> case OVS_ACTION_ATTR_POP_MPLS: return sizeof(ovs_be16);
>> case OVS_ACTION_ATTR_RECIRC: return sizeof(uint32_t);
>> case OVS_ACTION_ATTR_HASH: return sizeof(struct ovs_action_hash);
>> + case OVS_ACTION_ATTR_DEC_TTL: return ATTR_LEN_VARIABLE;
>> case OVS_ACTION_ATTR_SET: return ATTR_LEN_VARIABLE;
>> case OVS_ACTION_ATTR_SET_MASKED: return ATTR_LEN_VARIABLE;
>> case OVS_ACTION_ATTR_SAMPLE: return ATTR_LEN_VARIABLE;
>> @@ -1162,6 +1163,9 @@ format_odp_action(struct ds *ds, const struct nlattr
>> *a,
>> case OVS_ACTION_ATTR_HASH:
>> format_odp_hash_action(ds, nl_attr_get(a));
>> break;
>> + case OVS_ACTION_ATTR_DEC_TTL:
>> + ds_put_cstr(ds, "dec_ttl");
>> + break;
>> case OVS_ACTION_ATTR_SET_MASKED:
>> a = nl_attr_get(a);
>> /* OVS_KEY_ATTR_NSH is nested attribute, so it needs special
>> process */
>> diff --git a/lib/tc.c b/lib/tc.c
>> index adb2d3182..01bb4c643 100644
>> --- a/lib/tc.c
>> +++ b/lib/tc.c
>> @@ -36,6 +36,7 @@
>> #include <unistd.h>
>>
>> #include "byte-order.h"
>> +#include "netdev-offload.h"
>> #include "netlink-socket.h"
>> #include "netlink.h"
>> #include "openvswitch/ofpbuf.h"
>> @@ -1010,12 +1011,13 @@ nl_parse_act_pedit(struct nlattr *options, struct
>> tc_flower *flower)
>> struct nlattr *pe_attrs[ARRAY_SIZE(pedit_policy)];
>> const struct tc_pedit *pe;
>> const struct tc_pedit_key *keys;
>> - const struct nlattr *nla, *keys_ex, *ex_type;
>> + const struct nlattr *nla, *keys_ex, *ex_type, *ex_cmd;
>> const void *keys_attr;
>> char *rewrite_key = (void *) &flower->rewrite.key;
>> char *rewrite_mask = (void *) &flower->rewrite.mask;
>> size_t keys_ex_size, left;
>> int type, i = 0, err;
>> + uint16_t pedit_type;
>>
>> if (!nl_parse_nested(options, pedit_policy, pe_attrs,
>> ARRAY_SIZE(pedit_policy))) {
>> @@ -1043,6 +1045,10 @@ nl_parse_act_pedit(struct nlattr *options, struct
>> tc_flower *flower)
>> ex_type = nl_attr_find_nested(nla, TCA_PEDIT_KEY_EX_HTYPE);
>> type = nl_attr_get_u16(ex_type);
>>
>> + ex_cmd = nl_attr_find_nested(nla, TCA_PEDIT_KEY_EX_CMD);
>> + pedit_type = nl_attr_get_u16(ex_cmd);
>> + flower->actions[flower->action_count].pedit_type = pedit_type;
>> +
>
> Is there a case where TCA_PEDIT_KEY_EX_CMD will not be present?
> If so, does the above deal with such a case?
I suspect that this is not possible; from pedit I believe we always
either need SET or ADD to be included in the action. I will confirm
this and if it's not the case, I agree we need a check here to deal
with it correctly.
>
>> err = csum_update_flag(flower, type);
>> if (err) {
>> return err;
>> @@ -2473,7 +2479,8 @@ csum_update_flag(struct tc_flower *flower,
>>
>> static int
>> nl_msg_put_flower_rewrite_pedits(struct ofpbuf *request,
>> - struct tc_flower *flower)
>> + struct tc_flower *flower,
>> + struct tc_action *action)
>> {
>> struct {
>> struct tc_pedit sel;
>> @@ -2484,6 +2491,7 @@ nl_msg_put_flower_rewrite_pedits(struct ofpbuf
>> *request,
>> .nkeys = 0
>> }
>> };
>> + uint8_t tr_ttl;
>
> nit: I think that this declaration can move inside the block where tr_ttl
> is used.
I will do. Thanks.
>
>> int i, j, err;
>>
>> for (i = 0; i < ARRAY_SIZE(flower_pedit_map); i++) {
>> @@ -2529,6 +2537,26 @@ nl_msg_put_flower_rewrite_pedits(struct ofpbuf
>> *request,
>> pedit_key->mask = ~mask_word;
>> pedit_key->val = data_word & mask_word;
>> sel.sel.nkeys++;
>> + tr_ttl = (flower->key.ip_ttl & flower->mask.ip_ttl) - 1;
>> + if (netdev_is_tc_pedit_add_enabled()) {
>> + /* rewrite ttl action */
>> + if (pedit_key_ex->htype == TCA_PEDIT_KEY_EX_HDR_TYPE_IP4 &&
>> + pedit_key->val == tr_ttl &&
>> + pedit_key->off == 8) {
>> + pedit_key_ex->cmd = TCA_PEDIT_KEY_EX_CMD_ADD;
>> + pedit_key->val = 0xff;
>> + flower->mask.ip_ttl = 0;
>> + action->pedit_type = TCA_PEDIT_KEY_EX_CMD_ADD;
>> + }
>> + if (pedit_key_ex->htype == TCA_PEDIT_KEY_EX_HDR_TYPE_IP6 &&
>> + pedit_key->val == tr_ttl << 24 &&
>> + pedit_key->off == 4) {
>> + pedit_key_ex->cmd = TCA_PEDIT_KEY_EX_CMD_ADD;
>> + pedit_key->val = 0xff << 24;
>> + flower->mask.ip_ttl = 0;
>> + action->pedit_type = TCA_PEDIT_KEY_EX_CMD_ADD;
>> + }
>> + }
>
> Perhaps this problem doesn't arise, but I'm somewhat confused about how
> the above differentiates between the following two cases described
> in OpenFlow:
>
> 1. match on ip & ttl==7; set ttl to 6
> 2. match on ip; dec_ttl
>
I don't think we currently do. Thanks for pointing it out. We'll need to
add some logic here to distinguish between the 2 cases.
<snip>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev