On Wed, Jun 5, 2019 at 9:35 AM Darrell Ball <dlu...@gmail.com> wrote:
> > > On Tue, Jun 4, 2019 at 11:41 PM 姜立东 <jianglido...@jd.com> wrote: > >> *comments* *inline*. >> >> >> >> *发件人:* Darrell Ball <dlu...@gmail.com> >> *发送时间:* 2019年6月5日 13:31 >> *收件人:* 姜立东 <jianglido...@jd.com> >> *抄送:* d...@openvswitch.org; 王志克 <wangzh...@jd.com> >> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in >> userspace conntrack >> >> >> >> >> >> >> >> On Tue, Jun 4, 2019 at 7:13 PM 姜立东 <jianglido...@jd.com> wrote: >> >> Please check *comments* *inline*. >> >> >> >> BR, Lidong >> >> >> >> *发件人:* Darrell Ball <dlu...@gmail.com> >> *发送时间:* 2019年6月4日 23:33 >> *收件人:* 姜立东 <jianglido...@jd.com> >> *抄送:* d...@openvswitch.org; 王志克 <wangzh...@jd.com> >> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in >> userspace conntrack >> >> >> >> >> >> >> >> On Mon, Jun 3, 2019 at 7:59 PM 姜立东 <jianglido...@jd.com> wrote: >> >> Hi Darrell, >> >> >> >> The “tighter checking” on TCP_RST in kernel, do you mean processes under >> “TCP_CONNTRACK_CLOSE” case? >> >> There is only one situation(when th->seq is less than peer td_maxack, as >> mentioned) may leads to explicitly –NF_ACCEPT, kernel implementation >> just lets most TCP_RST packets pass when TCP liberal mode is enabled. >> >> >> >> yep, this is the tighter checking the kernel does for RSTs in general >> >> Userspace code has a tighter check for RSTs >> >> *Why not follow kernel tight checking then? **J* >> >> >> >> Meanwhile, I think the purpose of TCP liberal mode is to skip TCP SEQ >> check in conntrack, when SEQ out of sync is normal situation in some >> application scenarios. >> >> For example, TCP_RST packets can have very different SEQ than what is >> captured in conntrack in hardware-offloading scenario, because most of >> packets are forwarded by hardware. >> >> >> >> TCP resets are used in selected valid scenarios, but are more often >> associated with DOS scenarios, so they are a special case >> >> Removing most or all protection from reset attacks seems irresponsible. >> >> >> >> *TCP reset happens in few scenarios in theory, but it can happen >> frequently in big scale applicant scenario with huge connections, some >> software clients use sensible shorter wait time, they may close socket >> before server response. * >> >> *So it depends on TCP RST function to free server resource, it is >> necessary requirement just as what we can observer in our network every >> day, that is why we need to keep TCP_RST function working either in kernel >> conntrack and in user space conntrack.* >> >> >> >> That is common; however, the question is why not pass through the RST in >> offload mode in this scenario and let conntrack do cleanup on reuse, >> if/when necessary ? >> >> >> >> *If pass through TCP_RST in hardware, it leaves garbage information in >> OVS conntrack, while this connection has been cleared in both endpoints. >> When/if same port is reused, this garbage information may cause other >> problems.* >> > > 'Reuse' is a pre-existing capability that handles this kind of case and > the summary behavior in this use case will be similar. > In retrospect, the reuse capability may not meet your requirements in all cases. Anyways, your second point below is sufficient. I will modify the patch for 'no-tcp-seq-chk' > > >> *Also such solution makes OVS conntrack connection information less >> valuable, once flow is offloaded, it’s information is meaningless.* >> > > From this point in the lifecycle of the connection, there is just a little > more to do anyways, 'normally'. > > That being said, there can be added value features that may come into play > here in the non-offload path. > > >> >> >> >> >> *Meanwhile, kernel implementation of this feature has been there years, >> we don**’t see serious security issue is involved.* >> >> >> >> :-) >> >> >> >> *Why don**’t we do less limitation, leave the decision to user of this >> mode, considering TCP liberal is mostly related to special deployment >> scenario.* >> >> >> >> 'no-tcp-seq-chk' mode is a related feature and can be easily >> incrementally added or in lieu of 'liberal' mode; this will do as you >> require. >> >> >> >> >> >> >> >> >> >> Hardware offload implementations can be done well or poorly; catering to >> poor implementations at the cost to every other use case >> >> may not be the best approach. >> >> >> >> *No matter good or bad design, TCP RST is needed to let conntrack know >> that this connection is closing, and recycle this connection in short time* >> . >> >> >> >> In this scenario, TCP_RST can work normally with kernel TCP liberal mode >> because it is passed to destination, but won’t work with user space >> conntrack with current change due to being dropped. >> >> >> >> I think tighter check is good for normal situation, but may be conflict >> with the purpose of TCP liberal mode by putting more limitations. >> >> >> >> BR, Lidong >> >> >> >> *发件人:* Darrell Ball <dlu...@gmail.com> >> *发送时间:* 2019年6月3日 23:53 >> *收件人:* 姜立东 <jianglido...@jd.com> >> *抄送:* d...@openvswitch.org; 王志克 <wangzh...@jd.com> >> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in >> userspace conntrack >> >> >> >> >> >> >> >> On Mon, Jun 3, 2019 at 12:21 AM 姜立东 <jianglido...@jd.com> wrote: >> >> Hi Darrell, >> >> >> >> By checking latest patch, we notice it has inconsistent behavior on >> TCP_RST packet with >> >> kernel tcp_liberal option. >> >> In kernel, TCP_RST packet is only checked for TCP_CONNTRACK_CLOSE state, >> >> >> >> TCP_CONNTRACK_CLOSE is the 'next state' on receiving RST >> >> >> >> >> >> TCP_RST >> >> packet is dropped if its seq is less than peer acked. >> >> In other cases, TCP_RST packets are processed in tcp_in_window() as >> other packets, if >> >> tcp_liberal is enabled, they always are accepted. >> >> >> >> On receipt of a RST in any present state, tighter checking is done; this >> checking is done >> >> before *tcp_in_window* >> <https://elixir.bootlin.com/linux/v4.9.12/ident/tcp_in_window>() is >> executed. >> >> >> >> Do you mean “tighter checking” is processes under “TCP_CONNTRACK_CLOSE” >> case? There is >> >> >> >> Please refer to tcp_packet() and tcp_in_window() in kernel conntrack. >> >> >> >> Current implementation may cause TCP_RST packet is marked as INVALID by >> conntrack >> >> due to old seq info in conntrack in hardware-offloading scenario. >> >> So that we suggest to move conntrack_get_tcp_liberal(ct) up in this >> check to include >> >> TCP_RST packet as below, >> >> >> >> I think the tighter checking for RSTs I am proposing is generally >> consistent, although BSD and >> >> Linux are somewhat different. >> >> >> >> >> >> >> >> - if (SEQ_GEQ(src->seqhi, end) >> >> - /* Last octet inside other's window space */ >> >> - && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws)) >> >> - /* Retrans: not more than one window back */ >> >> - && (ackskew >= -MAXACKWINDOW) >> >> - /* Acking not more than one reassembled fragment backwards */ >> >> - && (ackskew <= (MAXACKWINDOW << sws)) >> >> - /* Acking not more than one window forward */ >> >> + if ((SEQ_GEQ(src->seqhi, end) >> >> + /* Last octet inside other's window space */ >> >> + && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws)) >> >> + /* Retrans: not more than one window back */ >> >> + && ackskew >= -MAXACKWINDOW >> >> + /* Acking not more than one reassembled fragment backwards */ >> >> + && ackskew <= (MAXACKWINDOW << sws)) >> >> + /* Acking not more than one window forward */ >> >> && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo >> >> - || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == >> src->seqlo))) { >> >> + || (orig_seq == src->seqlo + 1) >> >> + || (orig_seq + 1 == src->seqlo))) >> >> + || conntrack_get_tcp_liberal(ct)) { >> >> >> >> >> >> >> >> Thanks, >> >> Lidong >> >> >> >> *发件人:* Darrell Ball <dlu...@gmail.com> >> *发送时间:* 2019年6月1日 2:11 >> *收件人:* 姜立东 <jianglido...@jd.com> >> *抄送:* d...@openvswitch.org; 王志克 <wangzh...@jd.com> >> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in >> userspace conntrack >> >> >> >> >> >> >> >> On Fri, May 31, 2019 at 9:40 AM Darrell Ball <dlu...@gmail.com> wrote: >> >> >> >> >> >> On Fri, May 31, 2019 at 2:42 AM 姜立东 <jianglido...@jd.com> wrote: >> >> Hi Darrell, >> >> >> >> Thanks for prompt action. >> >> >> >> I think there are 2 main differences in your change, >> >> 1. Replace Open_vswitch.other_config by dpctl CLI command. >> >> 2. One more case to check tcp_liberal as below. >> >> @@ -333,10 +335,12 @@ tcp_conn_update(struct conntrack *ct, struct conn >> *conn_, >> >> } else if ((dst->state < CT_DPIF_TCPS_SYN_SENT >> >> || dst->state >= CT_DPIF_TCPS_FIN_WAIT_2 >> >> || src->state >= CT_DPIF_TCPS_FIN_WAIT_2) >> >> - && SEQ_GEQ(src->seqhi + MAXACKWINDOW, end) >> >> - /* Within a window forward of the originating packet */ >> >> - && SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW)) { >> >> - /* Within a window backward of the originating packet */ >> >> + && ((SEQ_GEQ(src->seqhi + MAXACKWINDOW, end) >> >> + /* Within a window forward of the originating packet >> */ >> >> + && SEQ_GEQ(seq, src->seqlo - MAXACKWINDOW)) >> >> + /* Within a window backward of the originating >> packet */ >> >> + || (conntrack_get_tcp_liberal(ct) >> >> + && (tcp_flags & TCP_RST) == 0))) >> >> For item 1, it is fine for us. >> >> For item 2, we didn’t actually hit into this conditional branch in our >> application, but considering conntrack is in the middle of this path, >> >> it is reasonable to add this check and leave SEQ validation to endpoints >> ’ local TCP state machine. So it looks good to us. >> >> >> >> Lidong >> >> >> >> I just noticed you were referring to the "else if" condition above >> >> This change had been dropped in v2 since it is not needed, here: >> >> >> >> https://patchwork.ozlabs.org/patch/1107761/ >> >> >> >> The change in the 'if' condition check is the main algorithm change; you >> might want to >> >> compare it to the proposed change that you had sent out. >> >> >> >> Thanks Darrell >> >> >> >> >> >> >> >> *diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c* >> >> *index 397aca1..e95d2f3 100644* >> >> *--- a/lib/conntrack-tcp.c* >> >> *+++ b/lib/conntrack-tcp.c* >> >> *@@ -272,16 +272,18 @@* tcp_conn_update(struct conntrack *ct, struct conn >> *conn_, >> >> int ackskew = check_ackskew ? dst->seqlo - ack : 0; >> >> #define MAXACKWINDOW (0xffff + 1500) /* 1500 is an arbitrary fudge >> factor */ >> >> - if (SEQ_GEQ(src->seqhi, end) >> >> - /* Last octet inside other's window space */ >> >> - && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws)) >> >> - /* Retrans: not more than one window back */ >> >> - && (ackskew >= -MAXACKWINDOW) >> >> - /* Acking not more than one reassembled fragment backwards */ >> >> - && (ackskew <= (MAXACKWINDOW << sws)) >> >> - /* Acking not more than one window forward */ >> >> + if (((SEQ_GEQ(src->seqhi, end) >> >> + /* Last octet inside other's window space */ >> >> + && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws)) >> >> + /* Retrans: not more than one window back */ >> >> + && ackskew >= -MAXACKWINDOW >> >> + /* Acking not more than one reassembled fragment backwards */ >> >> + && ackskew <= (MAXACKWINDOW << sws)) >> >> + || conntrack_get_tcp_liberal(ct)) >> >> + /* Acking not more than one window forward */ >> >> && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo >> >> - || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == >> src->seqlo))) { >> >> + || (orig_seq == src->seqlo + 1) >> >> + || (orig_seq + 1 == src->seqlo))) { >> >> /* Require an exact/+1 sequence match on resets when possible */ >> >> /* update max window */ >> >> >> >> >> >> >> >> >> >> >> >> Thanks Lidong >> >> >> >> The algorithm I used adheres to the liberal mode definition. >> >> >> >> For future reference, another important aspect is test support. >> >> >> >> Darrell >> >> >> >> >> >> BR, >> >> Lidong >> >> >> >> *发件人:* Darrell Ball <dlu...@gmail.com> >> *发送时间:* 2019年5月30日 14:51 >> *收件人:* 姜立东 <jianglido...@jd.com> >> *抄送:* d...@openvswitch.org; 王志克 <wangzh...@jd.com> >> *主题:* Re: [ovs-dev] [PATCH] conntrack: add tcp_in_liberal option in >> userspace conntrack >> >> >> >> Hi Lidong/Zhike >> >> >> >> I sent an alternative implementation: >> https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359364.html >> >> >> >> Pls take a look and check if it meets your requirements. >> >> >> >> Thanks Darrell >> >> >> >> On Fri, May 24, 2019 at 10:11 AM Darrell Ball <dlu...@gmail.com> wrote: >> >> Thanks for the patch Lidong/Zhike >> >> >> >> I took an initial look and will send a response next week. >> >> >> >> Darrell >> >> >> >> On Tue, May 21, 2019 at 8:53 PM 姜立东 <jianglido...@jd.com> wrote: >> >> From 3ce112684921bca74839e109fda91848aa024a54 Mon Sep 17 00:00:00 2001 >> From: Jiang Lidong <jianglido...@jd.com> >> Date: Wed, 22 May 2019 11:21:34 +0800 >> Subject: [PATCH] conntrack: add tcp_in_liberal option in userspace >> conntrack >> >> Adding similar cp_in_liberal option in userspace conntrack as >> kernel conntrack does to skip seq check on tcp connection. >> It prevents packet is marked as INVALID by stable seq >> info in conntrack connection. >> >> This option can help to make traffic survive in hardware >> offloading cases, especially when traffic is being moved >> back to software path from hardware forwarding engine. >> >> Signed-off-by: Lidong Jiang <jianglido...@jd.com> >> Signed-off-by: Zhike Wang <wangzh...@jd.com> >> >> --- >> lib/conntrack-private.h | 2 ++ >> lib/conntrack-tcp.c | 5 +++-- >> lib/conntrack.c | 6 ++++++ >> lib/conntrack.h | 4 +++- >> lib/dpif-netdev.c | 6 ++++++ >> 5 files changed, 20 insertions(+), 3 deletions(-) >> >> diff --git a/lib/conntrack-private.h b/lib/conntrack-private.h >> index 51b7d7f..9bc99cd 100644 >> --- a/lib/conntrack-private.h >> +++ b/lib/conntrack-private.h >> @@ -172,6 +172,8 @@ struct conntrack { >> /* Fragmentation handling context. */ >> struct ipf *ipf; >> + >> + bool tcp_be_liberal; >> }; >> /* Lock acquisition order: >> diff --git a/lib/conntrack-tcp.c b/lib/conntrack-tcp.c >> index 397aca1..61abafb 100644 >> --- a/lib/conntrack-tcp.c >> +++ b/lib/conntrack-tcp.c >> @@ -272,7 +272,7 @@ tcp_conn_update(struct conntrack *ct, struct conn >> *conn_, >> int ackskew = check_ackskew ? dst->seqlo - ack : 0; >> #define MAXACKWINDOW (0xffff + 1500) /* 1500 is an arbitrary fudge >> factor */ >> - if (SEQ_GEQ(src->seqhi, end) >> + if ((SEQ_GEQ(src->seqhi, end) >> /* Last octet inside other's window space */ >> && SEQ_GEQ(seq, src->seqlo - (dst->max_win << dws)) >> /* Retrans: not more than one window back */ >> @@ -281,7 +281,8 @@ tcp_conn_update(struct conntrack *ct, struct conn >> *conn_, >> && (ackskew <= (MAXACKWINDOW << sws)) >> /* Acking not more than one window forward */ >> && ((tcp_flags & TCP_RST) == 0 || orig_seq == src->seqlo >> - || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == >> src->seqlo))) { >> + || (orig_seq == src->seqlo + 1) || (orig_seq + 1 == >> src->seqlo))) >> + || (ct->tcp_be_liberal)) { >> /* Require an exact/+1 sequence match on resets when possible */ >> /* update max window */ >> diff --git a/lib/conntrack.c b/lib/conntrack.c >> index 6711f5e..bd92710 100644 >> --- a/lib/conntrack.c >> +++ b/lib/conntrack.c >> @@ -2282,6 +2282,12 @@ conntrack_ipf_ctx(struct conntrack *ct) >> return ct->ipf; >> } >> +void >> +conntrack_set_tcp_be_liberal(struct conntrack *ct, bool enabled) >> +{ >> + ct->tcp_be_liberal = enabled; >> +} >> + >> int >> conntrack_dump_start(struct conntrack *ct, struct conntrack_dump *dump, >> const uint16_t *pzone, int *ptot_bkts) >> diff --git a/lib/conntrack.h b/lib/conntrack.h >> index 2012150..b8d799d 100644 >> --- a/lib/conntrack.h >> +++ b/lib/conntrack.h >> @@ -119,5 +119,7 @@ int conntrack_set_maxconns(struct conntrack *ct, >> uint32_t maxconns); >> int conntrack_get_maxconns(struct conntrack *ct, uint32_t *maxconns); >> int conntrack_get_nconns(struct conntrack *ct, uint32_t *nconns); >> struct ipf *conntrack_ipf_ctx(struct conntrack *ct); >> - >> + >> +void conntrack_set_tcp_be_liberal(struct conntrack *ct, bool enabled); >> + >> #endif /* conntrack.h */ >> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >> index 5a6f2ab..ae6a18e 100644 >> --- a/lib/dpif-netdev.c >> +++ b/lib/dpif-netdev.c >> @@ -3823,6 +3823,12 @@ dpif_netdev_set_config(struct dpif *dpif, const >> struct smap *other_config) >> uint32_t tx_flush_interval, cur_tx_flush_interval; >> uint64_t rebalance_intvl; >> + bool tcp_be_liberal = smap_get_bool(other_config, >> + "conntrack_tcp_be_liberal", >> + false); >> + >> + conntrack_set_tcp_be_liberal(&dp->conntrack, tcp_be_liberal); >> + >> tx_flush_interval = smap_get_int(other_config, "tx-flush-interval", >> DEFAULT_TX_FLUSH_INTERVAL); >> atomic_read_relaxed(&dp->tx_flush_interval, &cur_tx_flush_interval); >> -- >> 1.8.3.1 >> >> _______________________________________________ >> dev mailing list >> d...@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev >> >> _______________________________________________ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev