Re: [PATCH 00/14] Modify action API for implementing lockless actions
On Fri, May 25, 2018 at 1:39 PM, Vlad Buslovwrote: > > On Thu 24 May 2018 at 23:34, Cong Wang wrote: >> On Mon, May 14, 2018 at 7:27 AM, Vlad Buslov wrote: >>> Currently, all netlink protocol handlers for updating rules, actions and >>> qdiscs are protected with single global rtnl lock which removes any >>> possibility for parallelism. This patch set is a first step to remove >>> rtnl lock dependency from TC rules update path. It updates act API to >>> use atomic operations, rcu and spinlocks for fine-grained locking. It >>> also extend API with functions that are needed to update existing >>> actions for parallel execution. >> >> Can you give a summary here for what and how it is achieved? > > Got it, will expand cover letter in V2 with summary. >> >> You said this is the first step, what do you want to achieve in this >> very first step? And how do you achieve it? Do you break the RTNL > > But aren't this questions answered in paragraph you quoted? Obviously not, you said to remove it, but never explains why it can be removed and how it is removed. This is crucial for review. "use atomic operations, rcu and spinlocks for fine-grained locking" is literately nothing, why atomic/rcu makes RTNL unnecessary? How RCU is used? What spinlocks are you talking about? What do these spinlocks protect after removing RTNL? Why are they safe with other netdevice and netns operations? You explain _nothing_ here. Really. Please don't force people to read 14 patches to understand how it works. In fact, no one wants to read the code unless there is some high-level explanation that makes basic sense. > What: Change act API to not rely on one-big-global-RTNL-lock and to use > more fine-grained synchronization methods to allow safe concurrent > execution. Sure, how fine-grained it is after your patchset? Why this fine-grained lock could safely replace RTNL? Could you stop letting us guess your puzzle words? It would save your time from exchanging emails with me, it would save my time from guessing you too. It is a win-win. > How: Refactor act API code to use atomics, rcu and spinlocks, etc. for > protecting shared data structures, add new functions required to update What shared data structures? The per-netns idr which is already protected by a spinlock? The TC hierarchy? The shared standalone actions? Hey, why do I have to guess? :-/ > specific actions implementation for parallel execution. (step 2) Claim is easy, prove is hard. I can easily claim I break RTNL down to a per-netns lock, but I can't prove it really works. :-D > > If you feel that this cover letter is too terse, I will add outline of > changes in V2. It is not my rule, it is how you have to help people to review your 14 patches. I think it is a fair game: you help people like me to review your patches, we help you to get them reviewed and merged if they all make sense. > >> lock down to, for a quick example, a per-device lock? Or perhaps you >> completely remove it because of what reason? > > I want to remove RTNL _dependency_ from act API data structures and > code. I probably should me more specific in this case: > > Florian recently made a change that allows registering netlink protocol > handlers with flag RTNL_FLAG_DOIT_UNLOCKED. Handlers registered with > this flag are called without RTNL taken. My end goal is to have rule > update handlers(RTM_NEWTFILTER, RTM_DELTFILTER, etc.) to be registered > with UNLOCKED flag to allow parallel execution. Please add this paragraph in your cover letter, it is very important for review. > > I do not intend to globally remove or break RTNL. > >> >> I go through all the descriptions of your 14 patches (but not any code), >> I still have no clue how you successfully avoid RTNL. Please don't >> let me read into your code to understand that, there must be some >> high-level justification on how it works. Without it, I don't event want >> to read into the code. > > On internal code review I've been asked not to duplicate info from > commit messages in cover letter, but I guess I can expand it with some > high level outline in V2. In cover letter, you should put a high-level overview of "why" and "how". If, in the worst case, on high-level it doesn't make sense, why should we bother to read the code? In short, you have to convince people to read your code here. In each patch description, you should explain what a single patch does. I don't see any duplication. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/14] Modify action API for implementing lockless actions
On Thu 24 May 2018 at 23:34, Cong Wangwrote: > On Mon, May 14, 2018 at 7:27 AM, Vlad Buslov wrote: >> Currently, all netlink protocol handlers for updating rules, actions and >> qdiscs are protected with single global rtnl lock which removes any >> possibility for parallelism. This patch set is a first step to remove >> rtnl lock dependency from TC rules update path. It updates act API to >> use atomic operations, rcu and spinlocks for fine-grained locking. It >> also extend API with functions that are needed to update existing >> actions for parallel execution. > > Can you give a summary here for what and how it is achieved? Got it, will expand cover letter in V2 with summary. > > You said this is the first step, what do you want to achieve in this > very first step? And how do you achieve it? Do you break the RTNL But aren't this questions answered in paragraph you quoted? What: Change act API to not rely on one-big-global-RTNL-lock and to use more fine-grained synchronization methods to allow safe concurrent execution. How: Refactor act API code to use atomics, rcu and spinlocks, etc. for protecting shared data structures, add new functions required to update specific actions implementation for parallel execution. (step 2) If you feel that this cover letter is too terse, I will add outline of changes in V2. > lock down to, for a quick example, a per-device lock? Or perhaps you > completely remove it because of what reason? I want to remove RTNL _dependency_ from act API data structures and code. I probably should me more specific in this case: Florian recently made a change that allows registering netlink protocol handlers with flag RTNL_FLAG_DOIT_UNLOCKED. Handlers registered with this flag are called without RTNL taken. My end goal is to have rule update handlers(RTM_NEWTFILTER, RTM_DELTFILTER, etc.) to be registered with UNLOCKED flag to allow parallel execution. I do not intend to globally remove or break RTNL. > > I go through all the descriptions of your 14 patches (but not any code), > I still have no clue how you successfully avoid RTNL. Please don't > let me read into your code to understand that, there must be some > high-level justification on how it works. Without it, I don't event want > to read into the code. On internal code review I've been asked not to duplicate info from commit messages in cover letter, but I guess I can expand it with some high level outline in V2. > > Thanks. Thank you for your feedback! -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2 net-next 1/2] ipvs: add full ipv6 support to nfct
Prepare NFCT to support IPv6 for FTP: - Do not restrict the expectation callback to PF_INET - Split the debug messages, so that the 160-byte limitation in IP_VS_DBG_BUF is not exceeded when printing many IPv6 addresses. This means no more than 3 addresses in one message, i.e. 1 tuple with 2 addresses or 1 connection with 3 addresses. Signed-off-by: Julian Anastasov--- net/netfilter/ipvs/ip_vs_nfct.c | 101 +++- 1 file changed, 49 insertions(+), 52 deletions(-) diff --git a/net/netfilter/ipvs/ip_vs_nfct.c b/net/netfilter/ipvs/ip_vs_nfct.c index 6cf3fd8..eb8b9c8 100644 --- a/net/netfilter/ipvs/ip_vs_nfct.c +++ b/net/netfilter/ipvs/ip_vs_nfct.c @@ -67,15 +67,20 @@ #include -#define FMT_TUPLE "%pI4:%u->%pI4:%u/%u" -#define ARG_TUPLE(T) &(T)->src.u3.ip, ntohs((T)->src.u.all), \ - &(T)->dst.u3.ip, ntohs((T)->dst.u.all), \ +#define FMT_TUPLE "%s:%u->%s:%u/%u" +#define ARG_TUPLE(T) IP_VS_DBG_ADDR((T)->src.l3num, &(T)->src.u3), \ + ntohs((T)->src.u.all), \ + IP_VS_DBG_ADDR((T)->src.l3num, &(T)->dst.u3), \ + ntohs((T)->dst.u.all), \ (T)->dst.protonum -#define FMT_CONN "%pI4:%u->%pI4:%u->%pI4:%u/%u:%u" -#define ARG_CONN(C)&((C)->caddr.ip), ntohs((C)->cport), \ - &((C)->vaddr.ip), ntohs((C)->vport), \ - &((C)->daddr.ip), ntohs((C)->dport), \ +#define FMT_CONN "%s:%u->%s:%u->%s:%u/%u:%u" +#define ARG_CONN(C)IP_VS_DBG_ADDR((C)->af, &((C)->caddr)), \ + ntohs((C)->cport), \ + IP_VS_DBG_ADDR((C)->af, &((C)->vaddr)), \ + ntohs((C)->vport), \ + IP_VS_DBG_ADDR((C)->daf, &((C)->daddr)),\ + ntohs((C)->dport), \ (C)->protocol, (C)->state void @@ -127,13 +132,17 @@ ip_vs_update_conntrack(struct sk_buff *skb, struct ip_vs_conn *cp, int outin) new_tuple.dst.protonum != IPPROTO_ICMPV6) new_tuple.dst.u.tcp.port = cp->vport; } - IP_VS_DBG(7, "%s: Updating conntrack ct=%p, status=0x%lX, " - "ctinfo=%d, old reply=" FMT_TUPLE - ", new reply=" FMT_TUPLE ", cp=" FMT_CONN "\n", - __func__, ct, ct->status, ctinfo, - ARG_TUPLE(>tuplehash[IP_CT_DIR_REPLY].tuple), - ARG_TUPLE(_tuple), ARG_CONN(cp)); + IP_VS_DBG_BUF(7, "%s: Updating conntrack ct=%p, status=0x%lX, " + "ctinfo=%d, old reply=" FMT_TUPLE "\n", + __func__, ct, ct->status, ctinfo, + ARG_TUPLE(>tuplehash[IP_CT_DIR_REPLY].tuple)); + IP_VS_DBG_BUF(7, "%s: Updating conntrack ct=%p, status=0x%lX, " + "ctinfo=%d, new reply=" FMT_TUPLE "\n", + __func__, ct, ct->status, ctinfo, + ARG_TUPLE(_tuple)); nf_conntrack_alter_reply(ct, _tuple); + IP_VS_DBG_BUF(7, "%s: Updated conntrack ct=%p for cp=" FMT_CONN "\n", + __func__, ct, ARG_CONN(cp)); } int ip_vs_confirm_conntrack(struct sk_buff *skb) @@ -152,9 +161,6 @@ static void ip_vs_nfct_expect_callback(struct nf_conn *ct, struct ip_vs_conn_param p; struct net *net = nf_ct_net(ct); - if (exp->tuple.src.l3num != PF_INET) - return; - /* * We assume that no NF locks are held before this callback. * ip_vs_conn_out_get and ip_vs_conn_in_get should match their @@ -171,19 +177,15 @@ static void ip_vs_nfct_expect_callback(struct nf_conn *ct, cp = ip_vs_conn_out_get(); if (cp) { /* Change reply CLIENT->RS to CLIENT->VS */ + IP_VS_DBG_BUF(7, "%s: for ct=%p, status=0x%lX found inout cp=" + FMT_CONN "\n", + __func__, ct, ct->status, ARG_CONN(cp)); new_reply = ct->tuplehash[IP_CT_DIR_REPLY].tuple; - IP_VS_DBG(7, "%s: ct=%p, status=0x%lX, tuples=" FMT_TUPLE ", " - FMT_TUPLE ", found inout cp=" FMT_CONN "\n", - __func__, ct, ct->status, - ARG_TUPLE(orig), ARG_TUPLE(_reply), - ARG_CONN(cp)); + IP_VS_DBG_BUF(7, "%s: ct=%p before alter: reply tuple=" + FMT_TUPLE "\n", + __func__, ct, ARG_TUPLE(_reply)); new_reply.dst.u3 = cp->vaddr; new_reply.dst.u.tcp.port = cp->vport; - IP_VS_DBG(7, "%s: ct=%p, new tuples=" FMT_TUPLE ", " FMT_TUPLE - ", inout cp=" FMT_CONN "\n", -
[PATCHv2 net-next 2/2] ipvs: add ipv6 support to ftp
Add support for FTP commands with extended format (RFC 2428): - FTP EPRT: IPv4 and IPv6, active mode, similar to PORT - FTP EPSV: IPv4 and IPv6, passive mode, similar to PASV. EPSV response usually contains only port but we allow real server to provide different address We restrict control and data connection to be from same address family. Allow the "(" and ")" to be optional in PASV response. Also, add ipvsh argument to the pkt_in/pkt_out handlers to better access the payload after transport header. Signed-off-by: Julian Anastasov--- include/net/ip_vs.h | 10 +- net/netfilter/ipvs/ip_vs_app.c| 24 +- net/netfilter/ipvs/ip_vs_ftp.c| 467 ++ net/netfilter/ipvs/ip_vs_proto_sctp.c | 4 +- net/netfilter/ipvs/ip_vs_proto_tcp.c | 4 +- net/netfilter/ipvs/ip_vs_proto_udp.c | 4 +- 6 files changed, 331 insertions(+), 182 deletions(-) diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h index ae72d90..824d7ef 100644 --- a/include/net/ip_vs.h +++ b/include/net/ip_vs.h @@ -763,14 +763,14 @@ struct ip_vs_app { * 2=Mangled but checksum was not updated */ int (*pkt_out)(struct ip_vs_app *, struct ip_vs_conn *, - struct sk_buff *, int *diff); + struct sk_buff *, int *diff, struct ip_vs_iphdr *ipvsh); /* input hook: Process packet in outin direction, diff set for TCP. * Return: 0=Error, 1=Payload Not Mangled/Mangled but checksum is ok, * 2=Mangled but checksum was not updated */ int (*pkt_in)(struct ip_vs_app *, struct ip_vs_conn *, - struct sk_buff *, int *diff); + struct sk_buff *, int *diff, struct ip_vs_iphdr *ipvsh); /* ip_vs_app initializer */ int (*init_conn)(struct ip_vs_app *, struct ip_vs_conn *); @@ -1328,8 +1328,10 @@ int register_ip_vs_app_inc(struct netns_ipvs *ipvs, struct ip_vs_app *app, __u16 int ip_vs_app_inc_get(struct ip_vs_app *inc); void ip_vs_app_inc_put(struct ip_vs_app *inc); -int ip_vs_app_pkt_out(struct ip_vs_conn *, struct sk_buff *skb); -int ip_vs_app_pkt_in(struct ip_vs_conn *, struct sk_buff *skb); +int ip_vs_app_pkt_out(struct ip_vs_conn *, struct sk_buff *skb, + struct ip_vs_iphdr *ipvsh); +int ip_vs_app_pkt_in(struct ip_vs_conn *, struct sk_buff *skb, +struct ip_vs_iphdr *ipvsh); int register_ip_vs_pe(struct ip_vs_pe *pe); int unregister_ip_vs_pe(struct ip_vs_pe *pe); diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c index 1c98c90..12d7489 100644 --- a/net/netfilter/ipvs/ip_vs_app.c +++ b/net/netfilter/ipvs/ip_vs_app.c @@ -355,7 +355,8 @@ static inline void vs_seq_update(struct ip_vs_conn *cp, struct ip_vs_seq *vseq, } static inline int app_tcp_pkt_out(struct ip_vs_conn *cp, struct sk_buff *skb, - struct ip_vs_app *app) + struct ip_vs_app *app, + struct ip_vs_iphdr *ipvsh) { int diff; const unsigned int tcp_offset = ip_hdrlen(skb); @@ -386,7 +387,7 @@ static inline int app_tcp_pkt_out(struct ip_vs_conn *cp, struct sk_buff *skb, if (app->pkt_out == NULL) return 1; - if (!app->pkt_out(app, cp, skb, )) + if (!app->pkt_out(app, cp, skb, , ipvsh)) return 0; /* @@ -404,7 +405,8 @@ static inline int app_tcp_pkt_out(struct ip_vs_conn *cp, struct sk_buff *skb, * called by ipvs packet handler, assumes previously checked cp!=NULL * returns false if it can't handle packet (oom) */ -int ip_vs_app_pkt_out(struct ip_vs_conn *cp, struct sk_buff *skb) +int ip_vs_app_pkt_out(struct ip_vs_conn *cp, struct sk_buff *skb, + struct ip_vs_iphdr *ipvsh) { struct ip_vs_app *app; @@ -417,7 +419,7 @@ int ip_vs_app_pkt_out(struct ip_vs_conn *cp, struct sk_buff *skb) /* TCP is complicated */ if (cp->protocol == IPPROTO_TCP) - return app_tcp_pkt_out(cp, skb, app); + return app_tcp_pkt_out(cp, skb, app, ipvsh); /* * Call private output hook function @@ -425,12 +427,13 @@ int ip_vs_app_pkt_out(struct ip_vs_conn *cp, struct sk_buff *skb) if (app->pkt_out == NULL) return 1; - return app->pkt_out(app, cp, skb, NULL); + return app->pkt_out(app, cp, skb, NULL, ipvsh); } static inline int app_tcp_pkt_in(struct ip_vs_conn *cp, struct sk_buff *skb, -struct ip_vs_app *app) +struct ip_vs_app *app, +struct ip_vs_iphdr *ipvsh) { int diff; const unsigned int tcp_offset = ip_hdrlen(skb); @@ -461,7 +464,7 @@ static inline int app_tcp_pkt_in(struct ip_vs_conn *cp, struct sk_buff *skb, if (app->pkt_in == NULL)
[PATCHv2 net-next 0/2] Add IPv6 support to IPVS FTP-NAT
The patchset includes two changes to support IPv6 in ip_vs_ftp. The first patch allows IPv6 addresses in ip_vs_nfct.c debugging and removes the AF_INET restriction for netfilter expectations. The second patch changes ip_vs_ftp.c to support EPRT and EPSV commands with extended format (RFC 2428) which supports both IPv4 and IPv6 addresses. v1->v2: two places were missing the (void *) cast for cp->app_data, reported by kbuild test robot Julian Anastasov (2): ipvs: add full ipv6 support to nfct ipvs: add ipv6 support to ftp include/net/ip_vs.h | 10 +- net/netfilter/ipvs/ip_vs_app.c| 24 +- net/netfilter/ipvs/ip_vs_ftp.c| 467 ++ net/netfilter/ipvs/ip_vs_nfct.c | 101 net/netfilter/ipvs/ip_vs_proto_sctp.c | 4 +- net/netfilter/ipvs/ip_vs_proto_tcp.c | 4 +- net/netfilter/ipvs/ip_vs_proto_udp.c | 4 +- 7 files changed, 380 insertions(+), 234 deletions(-) -- 2.9.5 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nft] tests: shell: Add test case for multiple sets
This test case tests the id allocation of nftable set names. Signed-off-by: Varsha Rao--- .../shell/testcases/sets/0034add_many_sets_0 | 25 +++ 1 file changed, 25 insertions(+) create mode 100755 tests/shell/testcases/sets/0034add_many_sets_0 diff --git a/tests/shell/testcases/sets/0034add_many_sets_0 b/tests/shell/testcases/sets/0034add_many_sets_0 new file mode 100755 index 000..cd23082 --- /dev/null +++ b/tests/shell/testcases/sets/0034add_many_sets_0 @@ -0,0 +1,25 @@ +#!/bin/bash + +# This testscase checks addition of multiple sets. + +COUNT=4 + +tmpfile=$(mktemp) +if [ ! -w $tmpfile ] ; then +echo "Failed to create tmp file" >&2 +exit 0 +fi + +trap "rm -rf $tmpfile" EXIT # cleanup if aborted + +generate() { +for ((i=1; i<=COUNT; i++)) ; do + echo "add set x y${i} { type ipv4_addr; }" +done +} + +echo "add table x +$(generate)" > $tmpfile + +set -e +$NFT -f $tmpfile -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nf-next v3] net: netfilter: nf_tables_api: Use id allocation.
In nf_tables_set_alloc_name function, remove get_zeroed_page find_first_zero_bit and set_bit functions. Instead use ida_get_new_above function as it simplifies the code. Signed-off-by: Varsha Rao--- Changes in v2: - Modified the upper limit of page size. Changes in v3: - Used ida_get_new_above instead of ida_simple_get due to internal locking. - Defined macro NFT_SET_IDA_SIZE. - Modified commit message. net/netfilter/nf_tables_api.c | 39 ++- 1 file changed, 25 insertions(+), 14 deletions(-) diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 87b2a77add65..4fb3c5dfe9f7 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -23,6 +23,8 @@ #include #include +#define NFT_SET_IDA_SIZE BITS_PER_BYTE * PAGE_SIZE + static LIST_HEAD(nf_tables_expressions); static LIST_HEAD(nf_tables_objects); static LIST_HEAD(nf_tables_flowtables); @@ -2718,17 +2720,14 @@ static int nf_tables_set_alloc_name(struct nft_ctx *ctx, struct nft_set *set, { const struct nft_set *i; const char *p; - unsigned long *inuse; - unsigned int n = 0, min = 0; + int n = 0, min = 0, id; + DEFINE_IDA(inuse); p = strchr(name, '%'); if (p != NULL) { if (p[1] != 'd' || strchr(p + 2, '%')) return -EINVAL; - inuse = (unsigned long *)get_zeroed_page(GFP_KERNEL); - if (inuse == NULL) - return -ENOMEM; cont: list_for_each_entry(i, >table->sets, list) { int tmp; @@ -2737,22 +2736,34 @@ static int nf_tables_set_alloc_name(struct nft_ctx *ctx, struct nft_set *set, continue; if (!sscanf(i->name, name, )) continue; - if (tmp < min || tmp >= min + BITS_PER_BYTE * PAGE_SIZE) + if (tmp < min || tmp >= min + NFT_SET_IDA_SIZE) continue; - set_bit(tmp - min, inuse); + n = ida_get_new_above(, tmp - min, ); + if (n < 0) { + if (n == -EAGAIN) + goto cont; + + return n; + } } + n = ida_get_new_above(, 0, ); - n = find_first_zero_bit(inuse, BITS_PER_BYTE * PAGE_SIZE); - if (n >= BITS_PER_BYTE * PAGE_SIZE) { - min += BITS_PER_BYTE * PAGE_SIZE; - memset(inuse, 0, PAGE_SIZE); - goto cont; + if (n < 0) { + if (n == -EAGAIN) + goto cont; + else if (n == -ENOSPC) { + min += NFT_SET_IDA_SIZE; + ida_destroy(); + goto cont; + } else + return n; } - free_page((unsigned long)inuse); + + ida_destroy(); } - set->name = kasprintf(GFP_KERNEL, name, min + n); + set->name = kasprintf(GFP_KERNEL, name, id); if (!set->name) return -ENOMEM; -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 2/2] ipvs: add ipv6 support to ftp
Hello, On Fri, 25 May 2018, kbuild test robot wrote: > Hi Julian, > > I love your patch! Perhaps something to improve: > > [auto build test WARNING on net-next/master] > > url: > https://github.com/0day-ci/linux/commits/Julian-Anastasov/Add-IPv6-support-to-IPVS-FTP-NAT/20180525-153345 > reproduce: > # apt-get install sparse > make ARCH=x86_64 allmodconfig > make C=1 CF=-D__CHECK_ENDIAN__ > > > sparse warnings: (new ones prefixed by >>) > > >> net/netfilter/ipvs/ip_vs_ftp.c:399:24: sparse: Using plain integer as NULL > >> pointer >net/netfilter/ipvs/ip_vs_ftp.c:533:24: sparse: Using plain integer as NULL > pointer I missed it, thanks! Will send v2. Regards -- Julian Anastasov <j...@ssi.bg> -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH nft] fix printing of "tcp flags syn" and "tcp flags == syn" expressions
Sabrina Dubrocawrote: > Commit 6979625686ec ("relational: Eliminate meta OPs") introduced some > bugs when printing bitmask types. > > First, during the post-processing phase of delinearization, the > expression for "tcp flags syn" (PAYLOAD & flag != 0) gets converted to > PAYLOAD == flag, which is not equivalent. This should be > PAYLOAD (IMPL) flag. Applied, thanks for finding/fixing this. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH nft] fix printing of "tcp flags syn" and "tcp flags == syn" expressions
Commit 6979625686ec ("relational: Eliminate meta OPs") introduced some bugs when printing bitmask types. First, during the post-processing phase of delinearization, the expression for "tcp flags syn" (PAYLOAD & flag != 0) gets converted to PAYLOAD == flag, which is not equivalent. This should be PAYLOAD (IMPL) flag. Then, during output, the "==" sign from "tcp flags == syn" is dropped, because the bitmask condition in must_print_eq_op() was removed. Let's restore it, so that "tcp flags == syn" doesn't get printed as "tcp flags syn". An extra check for value types is added, so that we don't start printing "==" for sets such as "tcp flags {syn,ack}" Finally, add a regression test for this particular case. Fixes: 6979625686ec ("relational: Eliminate meta OPs") Signed-off-by: Sabrina Dubroca--- src/expression.c| 5 + src/netlink_delinearize.c | 2 +- tests/py/inet/tcp.t | 1 + tests/py/inet/tcp.t.payload | 7 +++ 4 files changed, 14 insertions(+), 1 deletion(-) diff --git a/src/expression.c b/src/expression.c index 53fb1811f28c..bea0f4c8d9bc 100644 --- a/src/expression.c +++ b/src/expression.c @@ -565,6 +565,11 @@ static void binop_arg_print(const struct expr *op, const struct expr *arg, bool must_print_eq_op(const struct expr *expr) { + if (expr->right->dtype->basetype != NULL && + expr->right->dtype->basetype->type == TYPE_BITMASK && + expr->right->ops->type == EXPR_VALUE) + return true; + return expr->left->ops->type == EXPR_BINOP; } diff --git a/src/netlink_delinearize.c b/src/netlink_delinearize.c index 8f4035a291f4..7d882ebac555 100644 --- a/src/netlink_delinearize.c +++ b/src/netlink_delinearize.c @@ -1733,7 +1733,7 @@ static void relational_binop_postprocess(struct rule_pp_ctx *ctx, struct expr *e expr->left = expr_get(binop->left); expr->right = binop_tree_to_list(NULL, binop->right); - expr->op= OP_EQ; + expr->op= OP_IMPLICIT; expr_free(binop); } else if (binop->left->dtype->flags & DTYPE_F_PREFIX && diff --git a/tests/py/inet/tcp.t b/tests/py/inet/tcp.t index 5276516625cc..d66ba8438a32 100644 --- a/tests/py/inet/tcp.t +++ b/tests/py/inet/tcp.t @@ -76,6 +76,7 @@ tcp flags { fin, syn, rst, psh, ack, urg, ecn, cwr} drop;ok tcp flags != { fin, urg, ecn, cwr} drop;ok tcp flags cwr;ok tcp flags != cwr;ok +tcp flags == syn;ok tcp flags & (syn|fin) == (syn|fin);ok;tcp flags & (fin | syn) == fin | syn tcp window 2;ok diff --git a/tests/py/inet/tcp.t.payload b/tests/py/inet/tcp.t.payload index 512b42e9df08..09538aed746c 100644 --- a/tests/py/inet/tcp.t.payload +++ b/tests/py/inet/tcp.t.payload @@ -421,6 +421,13 @@ inet test-inet input [ payload load 1b @ transport header + 13 => reg 1 ] [ cmp neq reg 1 0x0080 ] +# tcp flags == syn +inet test-inet input + [ meta load l4proto => reg 1 ] + [ cmp eq reg 1 0x0006 ] + [ payload load 1b @ transport header + 13 => reg 1 ] + [ cmp eq reg 1 0x0002 ] + # tcp flags & (syn|fin) == (syn|fin) inet test-inet input [ meta load l4proto => reg 1 ] -- 2.17.0 -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next 2/2] ipvs: add ipv6 support to ftp
Hi Julian, I love your patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Julian-Anastasov/Add-IPv6-support-to-IPVS-FTP-NAT/20180525-153345 reproduce: # apt-get install sparse make ARCH=x86_64 allmodconfig make C=1 CF=-D__CHECK_ENDIAN__ sparse warnings: (new ones prefixed by >>) >> net/netfilter/ipvs/ip_vs_ftp.c:399:24: sparse: Using plain integer as NULL >> pointer net/netfilter/ipvs/ip_vs_ftp.c:533:24: sparse: Using plain integer as NULL pointer vim +399 net/netfilter/ipvs/ip_vs_ftp.c 239 240 /* Look at outgoing ftp packets to catch the response to a PASV/EPSV command 241 * from the server (inside-to-outside). 242 * When we see one, we build a connection entry with the client address, 243 * client port 0 (unknown at the moment), the server address and the 244 * server port. Mark the current connection entry as a control channel 245 * of the new entry. All this work is just to make the data connection 246 * can be scheduled to the right server later. 247 * 248 * The outgoing packet should be something like 249 * "227 Entering Passive Mode (xxx,xxx,xxx,xxx,ppp,ppp)". 250 * xxx,xxx,xxx,xxx is the server address, ppp,ppp is the server port number. 251 * The extended format for EPSV response provides usually only port: 252 * "229 Entering Extended Passive Mode (|||ppp|)" 253 */ 254 static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp, 255 struct sk_buff *skb, int *diff, 256 struct ip_vs_iphdr *ipvsh) 257 { 258 char *data, *data_limit; 259 char *start, *end; 260 union nf_inet_addr from; 261 __be16 port; 262 struct ip_vs_conn *n_cp; 263 char buf[24]; /* xxx.xxx.xxx.xxx,ppp,ppp\000 */ 264 unsigned int buf_len; 265 int ret = 0; 266 enum ip_conntrack_info ctinfo; 267 struct nf_conn *ct; 268 269 *diff = 0; 270 271 /* Only useful for established sessions */ 272 if (cp->state != IP_VS_TCP_S_ESTABLISHED) 273 return 1; 274 275 /* Linear packets are much easier to deal with. */ 276 if (!skb_make_writable(skb, skb->len)) 277 return 0; 278 279 if (cp->app_data == (void *) IP_VS_FTP_PASV) { 280 data = ip_vs_ftp_data_ptr(skb, ipvsh); 281 data_limit = skb_tail_pointer(skb); 282 283 if (!data || data >= data_limit) 284 return 1; 285 286 if (ip_vs_ftp_get_addrport(data, data_limit, 287 SERVER_STRING_PASV, 288 sizeof(SERVER_STRING_PASV)-1, 289 '(', false, IP_VS_FTP_PASV, 290 , , cp->af, 291 , ) != 1) 292 return 1; 293 294 IP_VS_DBG(7, "PASV response (%pI4:%u) -> %pI4:%u detected\n", 295, ntohs(port), >caddr.ip, 0); 296 } else if (cp->app_data == (void *) IP_VS_FTP_EPSV) { 297 data = ip_vs_ftp_data_ptr(skb, ipvsh); 298 data_limit = skb_tail_pointer(skb); 299 300 if (!data || data >= data_limit) 301 return 1; 302 303 /* Usually, data address is not specified but 304 * we support different address, so pre-set it. 305 */ 306 from = cp->daddr; 307 if (ip_vs_ftp_get_addrport(data, data_limit, 308 SERVER_STRING_EPSV, 309 sizeof(SERVER_STRING_EPSV)-1, 310 '(', true, IP_VS_FTP_EPSV, 311 , , cp->af, 312 , ) != 1) 313 return 1; 314 315 IP_VS_DBG_BUF(7, "EPSV response (%s:%u) -> %s:%u detected\n", 316IP_VS_DBG_ADDR(cp->af, ), ntohs(port), 317IP_VS_DBG_ADDR(cp->af, >caddr), 0); 318 } else { 319 return 1; 320 } 321 322 /* Now update or create a connection entry for it */ 323 { 324 str
[PATCH nft,v2] segtree: incorrect handling of comments and timeouts with mapping
Check if expression is a mapping to do the right handling. Fixes: 35fedcf540bf ("segtree: missing comments in range and prefix expressions in sets") Fixes: be90e03dd1fa ("segtree: add timeout for range and prefix expressions in sets") Reported-by: Florian WestphalSigned-off-by: Pablo Neira Ayuso --- v2: wrong reference to expr->left from non-mapping expression. src/segtree.c | 67 --- 1 file changed, 46 insertions(+), 21 deletions(-) diff --git a/src/segtree.c b/src/segtree.c index 28d45c920c3c..8a8aa71e8a6e 100644 --- a/src/segtree.c +++ b/src/segtree.c @@ -540,13 +540,19 @@ static void set_insert_interval(struct expr *set, struct seg_tree *tree, expr = set_elem_expr_alloc(_location, expr); if (ei->expr != NULL) { - if (ei->expr->comment) - expr->comment = xstrdup(ei->expr->comment); - if (ei->expr->timeout) - expr->timeout = ei->expr->timeout; - if (ei->expr->ops->type == EXPR_MAPPING) + if (ei->expr->ops->type == EXPR_MAPPING) { + if (ei->expr->left->comment) + expr->comment = xstrdup(ei->expr->left->comment); + if (ei->expr->left->timeout) + expr->timeout = ei->expr->left->timeout; expr = mapping_expr_alloc(>expr->location, expr, expr_get(ei->expr->right)); + } else { + if (ei->expr->comment) + expr->comment = xstrdup(ei->expr->comment); + if (ei->expr->timeout) + expr->timeout = ei->expr->timeout; + } } if (ei->flags & EI_F_INTERVAL_END) @@ -831,15 +837,25 @@ void interval_map_decompose(struct expr *set) tmp = range_expr_alloc(>location, expr_value(low), tmp); tmp = set_elem_expr_alloc(>location, tmp); - if (low->comment) - tmp->comment = xstrdup(low->comment); - if (low->timeout) - tmp->timeout = low->timeout; - if (low->expiration) - tmp->expiration = low->expiration; - if (low->ops->type == EXPR_MAPPING) - tmp = mapping_expr_alloc(>location, tmp, low->right); + if (low->ops->type == EXPR_MAPPING) { + if (low->left->comment) + tmp->comment = xstrdup(low->left->comment); + if (low->left->timeout) + tmp->timeout = low->left->timeout; + if (low->left->expiration) + tmp->expiration = low->left->expiration; + + tmp = mapping_expr_alloc(>location, tmp, +low->right); + } else { + if (low->comment) + tmp->comment = xstrdup(low->comment); + if (low->timeout) + tmp->timeout = low->timeout; + if (low->expiration) + tmp->expiration = low->expiration; + } compound_expr_add(set, tmp); } else { @@ -852,16 +868,25 @@ void interval_map_decompose(struct expr *set) prefix->len = expr_value(i)->len; prefix = set_elem_expr_alloc(>location, prefix); - if (low->comment) - prefix->comment = xstrdup(low->comment); - if (low->timeout) - prefix->timeout = low->timeout; - if (low->expiration) - prefix->expiration = low->expiration; - - if (low->ops->type == EXPR_MAPPING) + + if (low->ops->type == EXPR_MAPPING) { + if (low->left->comment) + prefix->comment = xstrdup(low->left->comment); + if (low->left->timeout) + prefix->timeout = low->left->timeout; + if (low->left->expiration) + prefix->expiration = low->left->expiration; + prefix = mapping_expr_alloc(>location, prefix, low->right); +
[PATCH nft] segtree: incorrect handling of comments and timeouts with mapping
Check if expression is a mapping to do the right handling. Fixes: 35fedcf540bf ("segtree: missing comments in range and prefix expressions in sets") Fixes: be90e03dd1fa ("segtree: add timeout for range and prefix expressions in sets") Reported-by: Florian WestphalSigned-off-by: Pablo Neira Ayuso --- src/segtree.c | 67 --- 1 file changed, 46 insertions(+), 21 deletions(-) diff --git a/src/segtree.c b/src/segtree.c index 28d45c920c3c..423c54bdeda6 100644 --- a/src/segtree.c +++ b/src/segtree.c @@ -540,13 +540,19 @@ static void set_insert_interval(struct expr *set, struct seg_tree *tree, expr = set_elem_expr_alloc(_location, expr); if (ei->expr != NULL) { - if (ei->expr->comment) - expr->comment = xstrdup(ei->expr->comment); - if (ei->expr->timeout) - expr->timeout = ei->expr->timeout; - if (ei->expr->ops->type == EXPR_MAPPING) + if (ei->expr->ops->type == EXPR_MAPPING) { + if (ei->expr->left->comment) + expr->comment = xstrdup(ei->expr->left->comment); + if (ei->expr->left->timeout) + expr->timeout = ei->expr->left->timeout; expr = mapping_expr_alloc(>expr->location, expr, expr_get(ei->expr->right)); + } else { + if (ei->expr->comment) + expr->comment = xstrdup(ei->expr->comment); + if (ei->expr->timeout) + expr->timeout = ei->expr->timeout; + } } if (ei->flags & EI_F_INTERVAL_END) @@ -831,15 +837,25 @@ void interval_map_decompose(struct expr *set) tmp = range_expr_alloc(>location, expr_value(low), tmp); tmp = set_elem_expr_alloc(>location, tmp); - if (low->comment) - tmp->comment = xstrdup(low->comment); - if (low->timeout) - tmp->timeout = low->timeout; - if (low->expiration) - tmp->expiration = low->expiration; - if (low->ops->type == EXPR_MAPPING) - tmp = mapping_expr_alloc(>location, tmp, low->right); + if (low->ops->type == EXPR_MAPPING) { + if (low->left->comment) + tmp->comment = xstrdup(low->left->comment); + if (low->left->timeout) + tmp->timeout = low->left->timeout; + if (low->left->expiration) + tmp->expiration = low->left->expiration; + + tmp = mapping_expr_alloc(>location, tmp, +low->right); + } else { + if (low->comment) + tmp->comment = xstrdup(low->comment); + if (low->timeout) + tmp->timeout = low->timeout; + if (low->left->expiration) + tmp->expiration = low->expiration; + } compound_expr_add(set, tmp); } else { @@ -852,16 +868,25 @@ void interval_map_decompose(struct expr *set) prefix->len = expr_value(i)->len; prefix = set_elem_expr_alloc(>location, prefix); - if (low->comment) - prefix->comment = xstrdup(low->comment); - if (low->timeout) - prefix->timeout = low->timeout; - if (low->expiration) - prefix->expiration = low->expiration; - - if (low->ops->type == EXPR_MAPPING) + + if (low->ops->type == EXPR_MAPPING) { + if (low->left->comment) + prefix->comment = xstrdup(low->left->comment); + if (low->left->timeout) + prefix->timeout = low->left->timeout; + if (low->left->expiration) + prefix->expiration = low->left->expiration; + prefix = mapping_expr_alloc(>location, prefix, low->right); + } else { + if
Re: [PATCH net] ipvs: fix buffer overflow with sync daemon and service
On Sat, 19 May 2018 18:22:35 +0300 Julian Anastasovwrote: > The same happens for sched_name when adding/editing virtual server. > > We are restricted by IP_VS_SCHEDNAME_MAXLEN and IP_VS_IFNAME_MAXLEN > being used as size in include/uapi/linux/ip_vs.h, so they > include the space for NUL. Ah, okay I see why you also need the ipvsadm change :-) -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html