On Mon, Feb 8, 2021 at 4:02 PM Dumitru Ceara <dce...@redhat.com> wrote:
>
> On 2/8/21 11:09 AM, Numan Siddique wrote:
> > On Sat, Feb 6, 2021 at 4:00 AM Dumitru Ceara <dce...@redhat.com> wrote:
> >>
> >> The main trait of load balancer hairpin traffic is that it never leaves
> >> the local hypervisor.  Essentially this means that only hairpin
> >> openflows installed for logical switches that have at least one logical
> >> switch port bound locally can ever be hit.
> >>
> >> Until now, if a load balancer was applied on multiple logical switches
> >> that are connected through a distributed router, ovn-controller would
> >> install flows to detect hairpin replies for all logical switches. In
> >> practice this leads to a very high number of openflows out of which
> >> most will never be used.
> >>
> >> Instead we now use an additional action, learn(), on flows that match on
> >> packets that create the hairpin session.  The learn() action will then
> >> generate the necessary flows to handle hairpin replies, but only for
> >> the local datapaths which actually generate hairpin traffic.
> >>
> >> For example, simulating how ovn-k8s uses load balancer for services,
> >> in a "switch per node" scenario, the script below would generate
> >> 10K (n_nodes * n_vips * n_backends) openflows on every node in table=69
> >> (hairpin reply).  With this patch the maximum number of openflows that
> >> can be created for hairpin replies is 200 (n_vips * n_backends).
> >>
> >> In general, for deployments that leverage switch-per-node topologies,
> >> the number of openflows is reduced by a factor of N, where N is the
> >> number of nodes.
> >>
> >>    $ cat lbs.sh
> >>    NODES=50
> >>    VIPS=20
> >>    BACKENDS=10
> >>    ovn-nbctl lr-add rtr
> >>    for ((i = 1; i <= $NODES; i++)); do
> >>        ovn-nbctl \
> >>            -- ls-add ls$i \
> >>            -- lsp-add ls$i vm$i \
> >>            -- lsp-add ls$i ls$i-rtr \
> >>            -- lsp-set-type ls$i-rtr router \
> >>            -- lsp-set-options ls$i-rtr router-port=rtr-ls$i \
> >>            -- lrp-add rtr rtr-ls$i 00:00:00:00:01:00 42.42.42.$i/24
> >>    done
> >>
> >>    for ((i = 1; i <= $VIPS; i++)); do
> >>        lb=lb$i
> >>        vip=10.10.10.$i:1
> >>        bip=20.20.20.1:2
> >>        for ((j = 2; j <= $BACKENDS; j++)); do
> >>            bip="$bip,20.20.20.$j:2"
> >>        done
> >>        ovn-nbctl lb-add $lb $vip $backends
> >>    done
> >>
> >>    for ((i = 1; i <= $NODES; i++)); do
> >>        for ((j = 1; j <= $VIPS; j++)); do
> >>            ovn-nbctl ls-lb-add ls$i lb$j
> >>        done
> >>    done
> >>
> >>    ovs-vsctl add-port br-int vm1 \
> >>        -- set interface vm1 type=internal \
> >>        -- set interface vm1 external-ids:iface-id=vm1
> >>
> >> Suggested-by: Ilya Maximets <i.maxim...@ovn.org>
> >> Signed-off-by: Dumitru Ceara <dce...@redhat.com>
> >
> > Hi Dumitru,
>
> Hi Numan,
>
> >
> > Thanks for this patch. The patch LGTM. I tested it out with  huge NB
> > DB with lots
> > of load balancer and backends.  I see a significant reduction in OF flows.
>
> Thanks for the review!
>
> >
> > Please see the few comments below.
> >
> > Thanks
> > Numan
> >
> >> ---
> >>   controller/lflow.c      | 204 ++++++++++++++++---
> >>   tests/ofproto-macros.at |   5 +-
> >>   tests/ovn.at            | 516 
> >> +++++++++++++++++++++++++-----------------------
> >>   3 files changed, 455 insertions(+), 270 deletions(-)
> >>
> >> diff --git a/controller/lflow.c b/controller/lflow.c
> >> index 946c1e0..2b7d356 100644
> >> --- a/controller/lflow.c
> >> +++ b/controller/lflow.c
> >> @@ -1171,6 +1171,178 @@ add_neighbor_flows(struct ovsdb_idl_index 
> >> *sbrec_port_binding_by_name,
> >>       }
> >>   }
> >>
> >> +/* Builds the "learn()" action to be triggered by packets initiating a
> >> + * hairpin session.
> >> + *
> >> + * This will generate flows in table OFTABLE_CHK_LB_HAIRPIN_REPLY of the 
> >> form:
> >> + * - match:
> >> + *     metadata=<orig-pkt-metadata>,ip/ipv6,ip.src=<backend>,ip.dst=<vip>
> >> + *     nw_proto='lb_proto',tp_src_port=<backend-port>
> >> + * - action:
> >> + *     set MLF_LOOKUP_LB_HAIRPIN_BIT=1
> >> + */
> >> +static void
> >> +add_lb_vip_hairpin_reply_action(struct in6_addr *vip6, ovs_be32 vip,
> >> +                                uint8_t lb_proto, bool has_l4_port,
> >> +                                uint64_t cookie, struct ofpbuf *ofpacts)
> >> +{
> >> +    struct match match = MATCH_CATCHALL_INITIALIZER;
> >> +    struct ofpact_learn *ol = ofpact_put_LEARN(ofpacts);
> >> +    struct ofpact_learn_spec *ol_spec;
> >> +    unsigned int imm_bytes;
> >> +    uint8_t *src_imm;
> >> +
> >> +    /* Once learned, hairpin reply flows are permanent until the 
> >> VIP/backend
> >> +     * is removed.
> >> +     */
> >> +    ol->flags = NX_LEARN_F_DELETE_LEARNED;
> >> +    ol->idle_timeout = OFP_FLOW_PERMANENT;
> >> +    ol->hard_timeout = OFP_FLOW_PERMANENT;
> >> +    ol->priority = OFP_DEFAULT_PRIORITY;
> >> +    ol->table_id = OFTABLE_CHK_LB_HAIRPIN_REPLY;
> >> +    ol->cookie = htonll(cookie);
> >> +
> >> +    /* Match on metadata of the packet that created the hairpin session. 
> >> */
> >> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >> +
> >> +    ol_spec->dst.field = mf_from_id(MFF_METADATA);
> >> +    ol_spec->dst.ofs = 0;
> >> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> >> +    ol_spec->src.field = mf_from_id(MFF_METADATA);
> >> +
> >> +    /* Match on the same ETH type as the packet that created the hairpin
> >> +     * session.
> >> +     */
> >> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >> +    ol_spec->dst.field = mf_from_id(MFF_ETH_TYPE);
> >> +    ol_spec->dst.ofs = 0;
> >> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >> +    ol_spec->src_type = NX_LEARN_SRC_IMMEDIATE;
> >> +    union mf_value imm_eth_type = {
> >> +        .be16 = !vip6 ? htons(ETH_TYPE_IP) : htons(ETH_TYPE_IPV6)
> >> +    };
> >> +    mf_write_subfield_value(&ol_spec->dst, &imm_eth_type, &match);
> >> +
> >> +    /* Push value last, as this may reallocate 'ol_spec'. */
> >> +    imm_bytes = DIV_ROUND_UP(ol_spec->dst.n_bits, 8);
> >> +    src_imm = ofpbuf_put_zeros(ofpacts, OFPACT_ALIGN(imm_bytes));
> >> +    memcpy(src_imm, &imm_eth_type, imm_bytes);
> >> +
> >> +    /* Hairpin replies have ip.src == <backend-ip>. */
> >> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >> +    if (!vip6) {
> >> +        ol_spec->dst.field = mf_from_id(MFF_IPV4_SRC);
> >> +        ol_spec->src.field = mf_from_id(MFF_IPV4_SRC);
> >> +    } else {
> >> +        ol_spec->dst.field = mf_from_id(MFF_IPV6_SRC);
> >> +        ol_spec->src.field = mf_from_id(MFF_IPV6_SRC);
> >> +    }
> >> +    ol_spec->dst.ofs = 0;
> >> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >> +    ol_spec->src_type = NX_LEARN_SRC_FIELD;
> >> +
> >> +    /* Hairpin replies have ip.dst == <vip>. */
> >> +    union mf_value imm_ip;
> >> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >> +    if (!vip6) {
> >> +        ol_spec->dst.field = mf_from_id(MFF_IPV4_DST);
> >> +        imm_ip = (union mf_value) {
> >> +            .be32 = vip
> >> +        };
> >> +    } else {
> >> +        ol_spec->dst.field = mf_from_id(MFF_IPV6_DST);
> >> +        imm_ip = (union mf_value) {
> >> +            .ipv6 = *vip6
> >> +        };
> >> +    }
> >> +    ol_spec->dst.ofs = 0;
> >> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >> +    ol_spec->src_type = NX_LEARN_SRC_IMMEDIATE;
> >> +    mf_write_subfield_value(&ol_spec->dst, &imm_ip, &match);
> >> +
> >> +    /* Push value last, as this may reallocate 'ol_spec' */
> >> +    imm_bytes = DIV_ROUND_UP(ol_spec->dst.n_bits, 8);
> >> +    src_imm = ofpbuf_put_zeros(ofpacts, OFPACT_ALIGN(imm_bytes));
> >> +    memcpy(src_imm, &imm_ip, imm_bytes);
> >> +
> >> +    /* Hairpin replies have the same nw_proto as packets that created the
> >> +     * session.
> >> +     */
> >> +    union mf_value imm_proto = {
> >> +        .u8 = lb_proto,
> >> +    };
> >> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >> +    ol_spec->dst.field = mf_from_id(MFF_IP_PROTO);
> >> +    ol_spec->src.field = mf_from_id(MFF_IP_PROTO);
> >> +    ol_spec->dst.ofs = 0;
> >> +    ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >> +    ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >> +    ol_spec->src_type = NX_LEARN_SRC_IMMEDIATE;
> >> +    mf_write_subfield_value(&ol_spec->dst, &imm_proto, &match);
> >> +
> >> +    /* Push value last, as this may reallocate 'ol_spec' */
> >> +    imm_bytes = DIV_ROUND_UP(ol_spec->dst.n_bits, 8);
> >> +    src_imm = ofpbuf_put_zeros(ofpacts, OFPACT_ALIGN(imm_bytes));
> >> +    memcpy(src_imm, &imm_proto, imm_bytes);
> >> +
> >> +    /* Hairpin replies have source port == <backend-port>. */
> >> +    if (has_l4_port) {
> >> +        ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >> +        switch (lb_proto) {
> >> +        case IPPROTO_TCP:
> >> +            ol_spec->dst.field = mf_from_id(MFF_TCP_SRC);
> >> +            ol_spec->src.field = mf_from_id(MFF_TCP_DST);
> >> +            break;
> >> +        case IPPROTO_UDP:
> >> +            ol_spec->dst.field = mf_from_id(MFF_UDP_SRC);
> >> +            ol_spec->src.field = mf_from_id(MFF_UDP_DST);
> >> +            break;
> >> +        case IPPROTO_SCTP:
> >> +            ol_spec->dst.field = mf_from_id(MFF_SCTP_SRC);
> >> +            ol_spec->src.field = mf_from_id(MFF_SCTP_DST);
> >> +            break;
> >> +        default:
> >> +            OVS_NOT_REACHED();
> >> +            break;
> >> +        }
> >> +        ol_spec->dst.ofs = 0;
> >> +        ol_spec->dst.n_bits = ol_spec->dst.field->n_bits;
> >> +        ol_spec->n_bits = ol_spec->dst.n_bits;
> >> +        ol_spec->dst_type = NX_LEARN_DST_MATCH;
> >> +        ol_spec->src_type = NX_LEARN_SRC_FIELD;
> >> +    }
> >> +
> >> +    /* Set MLF_LOOKUP_LB_HAIRPIN_BIT for hairpin replies. */
> >> +    ol_spec = ofpbuf_put_zeros(ofpacts, sizeof *ol_spec);
> >> +    ol_spec->dst.field = mf_from_id(MFF_LOG_FLAGS);
> >> +    ol_spec->dst.ofs = MLF_LOOKUP_LB_HAIRPIN_BIT;
> >> +    ol_spec->dst.n_bits = 1;
> >> +    ol_spec->n_bits = ol_spec->dst.n_bits;
> >> +    ol_spec->dst_type = NX_LEARN_DST_LOAD;
> >> +    ol_spec->src_type = NX_LEARN_SRC_IMMEDIATE;
> >> +    union mf_value imm_reg_value = {
> >> +        .u8 = 1
> >> +    };
> >> +    mf_write_subfield_value(&ol_spec->dst, &imm_reg_value, &match);
> >> +
> >> +    /* Push value last, as this may reallocate 'ol_spec' */
> >> +    imm_bytes = DIV_ROUND_UP(ol_spec->dst.n_bits, 8);
> >> +    src_imm = ofpbuf_put_zeros(ofpacts, OFPACT_ALIGN(imm_bytes));
> >> +    memcpy(src_imm, &imm_reg_value, imm_bytes);
> >> +
> >> +    ofpact_finish_LEARN(ofpacts, &ol);
> >
> > The above code makes perfect sense. But I think we can reduce the amount of
> > code by making use of the 'learn_parse()' function present in
> > lib/learn.c of OVS.
> >
> > This would also make the code much more simpler. Otherwise the reader
> > has to understand all the intricacies of adding 'learn' ofpact.
> >
> > I tried something like below to test it out and It worked fine to me
> >
> > *****************
> > struct ds learn = DS_EMPTY_INITIALIZER;
> > ds_put_format(&learn, "table=%d,delete_learned,OXM_OF_METADATA[],"
> >
> > "eth_type=0x800,nw_proto=6,NXM_OF_IP_SRC[],ip_dst="IP_FMT","
> >
> > "NXM_OF_TCP_SRC[]=NXM_OF_TCP_DST[],load:0x1->NXM_NX_REG10[7]",
> >                          OFTABLE_CHK_LB_HAIRPIN_REPLY, IP_ARGS(vip));
> > char *error = learn_parse(ds_cstr(&learn), NULL, NULL, &ofpbuf_learn);
> > if (!error) {
> > ofpbuf_put(ofpacts, ofpbuf_learn.data, ofpbuf_learn.size);
> > }
> > ...
> > ..
> >
> > ******************
> >
> > What do you think ?
>
> To be honest, I'm not so sure about this.  It would create a dependency
> on how OVS formats openflows internally.
>
> I think we can argue the same thing about intricacies of adding other
> OVS actions, e.g.:
>
> https://github.com/ovn-org/ovn/blob/44ea2ec88136f83e7eab9790473025b6c95bdcc0/controller/lflow.c#L1267
>
> https://github.com/ovn-org/ovn/blob/44ea2ec88136f83e7eab9790473025b6c95bdcc0/lib/actions.c#L422
>
> We could change those too and use OVS parsing internal utilities to
> parse strings but we don't and, in my opinion, the proper way to go is
> to configure the actions programatically by using the data structures in
> include/openvswitch/ofp-actions.h
>
> If it helps I can try to refactor a bit the code that creates the
> learn() action for hairpin although I'm not sure at the moment how to do
> that.

Thanks for the reply. Fair enough. I applied this patch to master.

Numan

>
> >
> > Thanks
> > Numan
>
> Regards,
> Dumitru
>
> _______________________________________________
> dev mailing list
> d...@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
_______________________________________________
dev mailing list
d...@openvswitch.org
https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Reply via email to