Re: [PATCH 00/14] Modify action API for implementing lockless actions

2018-05-25 Thread Cong Wang
On Fri, May 25, 2018 at 1:39 PM, Vlad Buslov  wrote:
>
> On Thu 24 May 2018 at 23:34, Cong Wang  wrote:
>> On Mon, May 14, 2018 at 7:27 AM, Vlad Buslov  wrote:
>>> Currently, all netlink protocol handlers for updating rules, actions and
>>> qdiscs are protected with single global rtnl lock which removes any
>>> possibility for parallelism. This patch set is a first step to remove
>>> rtnl lock dependency from TC rules update path. It updates act API to
>>> use atomic operations, rcu and spinlocks for fine-grained locking. It
>>> also extend API with functions that are needed to update existing
>>> actions for parallel execution.
>>
>> Can you give a summary here for what and how it is achieved?
>
> Got it, will expand cover letter in V2 with summary.
>>
>> You said this is the first step, what do you want to achieve in this
>> very first step? And how do you achieve it? Do you break the RTNL
>
> But aren't this questions answered in paragraph you quoted?


Obviously not, you said to remove it, but never explains why it can
be removed and how it is removed. This is crucial for review.

"use atomic operations, rcu and spinlocks for fine-grained locking"
is literately nothing, why atomic/rcu makes RTNL unnecessary?
How RCU is used? What spinlocks are you talking about? What
do these spinlocks protect after removing RTNL? Why are they
safe with other netdevice and netns operations?

You explain _nothing_ here. Really. Please don't force people to
read 14 patches to understand how it works. In fact, no one wants
to read the code unless there is some high-level explanation that
makes basic sense.


> What: Change act API to not rely on one-big-global-RTNL-lock and to use
> more fine-grained synchronization methods to allow safe concurrent
> execution.

Sure, how fine-grained it is after your patchset? Why this fine-grained
lock could safely replace RTNL?

Could you stop letting us guess your puzzle words? It would save your
time from exchanging emails with me, it would save my time from
guessing you too. It is a win-win.


> How: Refactor act API code to use atomics, rcu and spinlocks, etc. for
> protecting shared data structures, add new functions required to update


What shared data structures? The per-netns idr which is already protected
by a spinlock? The TC hierarchy? The shared standalone actions? Hey,
why do I have to guess? :-/


> specific actions implementation for parallel execution. (step 2)


Claim is easy, prove is hard. I can easily claim I break RTNL down
to a per-netns lock, but I can't prove it really works. :-D


>
> If you feel that this cover letter is too terse, I will add outline of
> changes in V2.

It is not my rule, it is how you have to help people to review your
14 patches. I think it is a fair game: you help people like me to
review your patches, we help you to get them reviewed and merged
if they all make sense.



>
>> lock down to, for a quick example, a per-device lock? Or perhaps you
>> completely remove it because of what reason?
>
> I want to remove RTNL _dependency_ from act API data structures and
> code. I probably should me more specific in this case:
>
> Florian recently made a change that allows registering netlink protocol
> handlers with flag RTNL_FLAG_DOIT_UNLOCKED. Handlers registered with
> this flag are called without RTNL taken. My end goal is to have rule
> update handlers(RTM_NEWTFILTER, RTM_DELTFILTER, etc.) to be registered
> with UNLOCKED flag to allow parallel execution.


Please add this paragraph in your cover letter, it is very important for review.

>
> I do not intend to globally remove or break RTNL.
>
>>
>> I go through all the descriptions of your 14 patches (but not any code),
>> I still have no clue how you successfully avoid RTNL. Please don't
>> let me read into your code to understand that, there must be some
>> high-level justification on how it works. Without it, I don't event want
>> to read into the code.
>
> On internal code review I've been asked not to duplicate info from
> commit messages in cover letter, but I guess I can expand it with some
> high level outline in V2.

In cover letter, you should put a high-level overview of "why" and "how".
If, in the worst case, on high-level it doesn't make sense, why should
we bother to read the code? In short, you have to convince people to
read your code here.

In each patch description, you should explain what a single patch does.
I don't see any duplication.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 00/14] Modify action API for implementing lockless actions

2018-05-25 Thread Vlad Buslov

On Thu 24 May 2018 at 23:34, Cong Wang  wrote:
> On Mon, May 14, 2018 at 7:27 AM, Vlad Buslov  wrote:
>> Currently, all netlink protocol handlers for updating rules, actions and
>> qdiscs are protected with single global rtnl lock which removes any
>> possibility for parallelism. This patch set is a first step to remove
>> rtnl lock dependency from TC rules update path. It updates act API to
>> use atomic operations, rcu and spinlocks for fine-grained locking. It
>> also extend API with functions that are needed to update existing
>> actions for parallel execution.
>
> Can you give a summary here for what and how it is achieved?

Got it, will expand cover letter in V2 with summary.
>
> You said this is the first step, what do you want to achieve in this
> very first step? And how do you achieve it? Do you break the RTNL

But aren't this questions answered in paragraph you quoted?
What: Change act API to not rely on one-big-global-RTNL-lock and to use
more fine-grained synchronization methods to allow safe concurrent
execution.
How: Refactor act API code to use atomics, rcu and spinlocks, etc. for
protecting shared data structures, add new functions required to update
specific actions implementation for parallel execution. (step 2)

If you feel that this cover letter is too terse, I will add outline of
changes in V2.

> lock down to, for a quick example, a per-device lock? Or perhaps you
> completely remove it because of what reason?

I want to remove RTNL _dependency_ from act API data structures and
code. I probably should me more specific in this case:

Florian recently made a change that allows registering netlink protocol
handlers with flag RTNL_FLAG_DOIT_UNLOCKED. Handlers registered with
this flag are called without RTNL taken. My end goal is to have rule
update handlers(RTM_NEWTFILTER, RTM_DELTFILTER, etc.) to be registered
with UNLOCKED flag to allow parallel execution.

I do not intend to globally remove or break RTNL.

>
> I go through all the descriptions of your 14 patches (but not any code),
> I still have no clue how you successfully avoid RTNL. Please don't
> let me read into your code to understand that, there must be some
> high-level justification on how it works. Without it, I don't event want
> to read into the code.

On internal code review I've been asked not to duplicate info from
commit messages in cover letter, but I guess I can expand it with some
high level outline in V2.

>
> Thanks.

Thank you for your feedback!
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2 net-next 1/2] ipvs: add full ipv6 support to nfct

2018-05-25 Thread Julian Anastasov
Prepare NFCT to support IPv6 for FTP:

- Do not restrict the expectation callback to PF_INET

- Split the debug messages, so that the 160-byte limitation
in IP_VS_DBG_BUF is not exceeded when printing many IPv6
addresses. This means no more than 3 addresses in one message,
i.e. 1 tuple with 2 addresses or 1 connection with 3 addresses.

Signed-off-by: Julian Anastasov 
---
 net/netfilter/ipvs/ip_vs_nfct.c | 101 +++-
 1 file changed, 49 insertions(+), 52 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_nfct.c b/net/netfilter/ipvs/ip_vs_nfct.c
index 6cf3fd8..eb8b9c8 100644
--- a/net/netfilter/ipvs/ip_vs_nfct.c
+++ b/net/netfilter/ipvs/ip_vs_nfct.c
@@ -67,15 +67,20 @@
 #include 
 
 
-#define FMT_TUPLE  "%pI4:%u->%pI4:%u/%u"
-#define ARG_TUPLE(T)   &(T)->src.u3.ip, ntohs((T)->src.u.all), \
-   &(T)->dst.u3.ip, ntohs((T)->dst.u.all), \
+#define FMT_TUPLE  "%s:%u->%s:%u/%u"
+#define ARG_TUPLE(T)   IP_VS_DBG_ADDR((T)->src.l3num, &(T)->src.u3),   \
+   ntohs((T)->src.u.all),  \
+   IP_VS_DBG_ADDR((T)->src.l3num, &(T)->dst.u3),   \
+   ntohs((T)->dst.u.all),  \
(T)->dst.protonum
 
-#define FMT_CONN   "%pI4:%u->%pI4:%u->%pI4:%u/%u:%u"
-#define ARG_CONN(C)&((C)->caddr.ip), ntohs((C)->cport), \
-   &((C)->vaddr.ip), ntohs((C)->vport), \
-   &((C)->daddr.ip), ntohs((C)->dport), \
+#define FMT_CONN   "%s:%u->%s:%u->%s:%u/%u:%u"
+#define ARG_CONN(C)IP_VS_DBG_ADDR((C)->af, &((C)->caddr)), \
+   ntohs((C)->cport),  \
+   IP_VS_DBG_ADDR((C)->af, &((C)->vaddr)), \
+   ntohs((C)->vport),  \
+   IP_VS_DBG_ADDR((C)->daf, &((C)->daddr)),\
+   ntohs((C)->dport),  \
(C)->protocol, (C)->state
 
 void
@@ -127,13 +132,17 @@ ip_vs_update_conntrack(struct sk_buff *skb, struct 
ip_vs_conn *cp, int outin)
new_tuple.dst.protonum != IPPROTO_ICMPV6)
new_tuple.dst.u.tcp.port = cp->vport;
}
-   IP_VS_DBG(7, "%s: Updating conntrack ct=%p, status=0x%lX, "
- "ctinfo=%d, old reply=" FMT_TUPLE
- ", new reply=" FMT_TUPLE ", cp=" FMT_CONN "\n",
- __func__, ct, ct->status, ctinfo,
- ARG_TUPLE(>tuplehash[IP_CT_DIR_REPLY].tuple),
- ARG_TUPLE(_tuple), ARG_CONN(cp));
+   IP_VS_DBG_BUF(7, "%s: Updating conntrack ct=%p, status=0x%lX, "
+ "ctinfo=%d, old reply=" FMT_TUPLE "\n",
+ __func__, ct, ct->status, ctinfo,
+ ARG_TUPLE(>tuplehash[IP_CT_DIR_REPLY].tuple));
+   IP_VS_DBG_BUF(7, "%s: Updating conntrack ct=%p, status=0x%lX, "
+ "ctinfo=%d, new reply=" FMT_TUPLE "\n",
+ __func__, ct, ct->status, ctinfo,
+ ARG_TUPLE(_tuple));
nf_conntrack_alter_reply(ct, _tuple);
+   IP_VS_DBG_BUF(7, "%s: Updated conntrack ct=%p for cp=" FMT_CONN "\n",
+ __func__, ct, ARG_CONN(cp));
 }
 
 int ip_vs_confirm_conntrack(struct sk_buff *skb)
@@ -152,9 +161,6 @@ static void ip_vs_nfct_expect_callback(struct nf_conn *ct,
struct ip_vs_conn_param p;
struct net *net = nf_ct_net(ct);
 
-   if (exp->tuple.src.l3num != PF_INET)
-   return;
-
/*
 * We assume that no NF locks are held before this callback.
 * ip_vs_conn_out_get and ip_vs_conn_in_get should match their
@@ -171,19 +177,15 @@ static void ip_vs_nfct_expect_callback(struct nf_conn *ct,
cp = ip_vs_conn_out_get();
if (cp) {
/* Change reply CLIENT->RS to CLIENT->VS */
+   IP_VS_DBG_BUF(7, "%s: for ct=%p, status=0x%lX found inout cp="
+ FMT_CONN "\n",
+ __func__, ct, ct->status, ARG_CONN(cp));
new_reply = ct->tuplehash[IP_CT_DIR_REPLY].tuple;
-   IP_VS_DBG(7, "%s: ct=%p, status=0x%lX, tuples=" FMT_TUPLE ", "
- FMT_TUPLE ", found inout cp=" FMT_CONN "\n",
- __func__, ct, ct->status,
- ARG_TUPLE(orig), ARG_TUPLE(_reply),
- ARG_CONN(cp));
+   IP_VS_DBG_BUF(7, "%s: ct=%p before alter: reply tuple="
+ FMT_TUPLE "\n",
+ __func__, ct, ARG_TUPLE(_reply));
new_reply.dst.u3 = cp->vaddr;
new_reply.dst.u.tcp.port = cp->vport;
-   IP_VS_DBG(7, "%s: ct=%p, new tuples=" FMT_TUPLE ", " FMT_TUPLE
- ", inout cp=" FMT_CONN "\n",
- 

[PATCHv2 net-next 2/2] ipvs: add ipv6 support to ftp

2018-05-25 Thread Julian Anastasov
Add support for FTP commands with extended format (RFC 2428):

- FTP EPRT: IPv4 and IPv6, active mode, similar to PORT
- FTP EPSV: IPv4 and IPv6, passive mode, similar to PASV.
EPSV response usually contains only port but we allow real
server to provide different address

We restrict control and data connection to be from same
address family.

Allow the "(" and ")" to be optional in PASV response.

Also, add ipvsh argument to the pkt_in/pkt_out handlers to better
access the payload after transport header.

Signed-off-by: Julian Anastasov 
---
 include/net/ip_vs.h   |  10 +-
 net/netfilter/ipvs/ip_vs_app.c|  24 +-
 net/netfilter/ipvs/ip_vs_ftp.c| 467 ++
 net/netfilter/ipvs/ip_vs_proto_sctp.c |   4 +-
 net/netfilter/ipvs/ip_vs_proto_tcp.c  |   4 +-
 net/netfilter/ipvs/ip_vs_proto_udp.c  |   4 +-
 6 files changed, 331 insertions(+), 182 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index ae72d90..824d7ef 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -763,14 +763,14 @@ struct ip_vs_app {
 * 2=Mangled but checksum was not updated
 */
int (*pkt_out)(struct ip_vs_app *, struct ip_vs_conn *,
-  struct sk_buff *, int *diff);
+  struct sk_buff *, int *diff, struct ip_vs_iphdr *ipvsh);
 
/* input hook: Process packet in outin direction, diff set for TCP.
 * Return: 0=Error, 1=Payload Not Mangled/Mangled but checksum is ok,
 * 2=Mangled but checksum was not updated
 */
int (*pkt_in)(struct ip_vs_app *, struct ip_vs_conn *,
- struct sk_buff *, int *diff);
+ struct sk_buff *, int *diff, struct ip_vs_iphdr *ipvsh);
 
/* ip_vs_app initializer */
int (*init_conn)(struct ip_vs_app *, struct ip_vs_conn *);
@@ -1328,8 +1328,10 @@ int register_ip_vs_app_inc(struct netns_ipvs *ipvs, 
struct ip_vs_app *app, __u16
 int ip_vs_app_inc_get(struct ip_vs_app *inc);
 void ip_vs_app_inc_put(struct ip_vs_app *inc);
 
-int ip_vs_app_pkt_out(struct ip_vs_conn *, struct sk_buff *skb);
-int ip_vs_app_pkt_in(struct ip_vs_conn *, struct sk_buff *skb);
+int ip_vs_app_pkt_out(struct ip_vs_conn *, struct sk_buff *skb,
+ struct ip_vs_iphdr *ipvsh);
+int ip_vs_app_pkt_in(struct ip_vs_conn *, struct sk_buff *skb,
+struct ip_vs_iphdr *ipvsh);
 
 int register_ip_vs_pe(struct ip_vs_pe *pe);
 int unregister_ip_vs_pe(struct ip_vs_pe *pe);
diff --git a/net/netfilter/ipvs/ip_vs_app.c b/net/netfilter/ipvs/ip_vs_app.c
index 1c98c90..12d7489 100644
--- a/net/netfilter/ipvs/ip_vs_app.c
+++ b/net/netfilter/ipvs/ip_vs_app.c
@@ -355,7 +355,8 @@ static inline void vs_seq_update(struct ip_vs_conn *cp, 
struct ip_vs_seq *vseq,
 }
 
 static inline int app_tcp_pkt_out(struct ip_vs_conn *cp, struct sk_buff *skb,
- struct ip_vs_app *app)
+ struct ip_vs_app *app,
+ struct ip_vs_iphdr *ipvsh)
 {
int diff;
const unsigned int tcp_offset = ip_hdrlen(skb);
@@ -386,7 +387,7 @@ static inline int app_tcp_pkt_out(struct ip_vs_conn *cp, 
struct sk_buff *skb,
if (app->pkt_out == NULL)
return 1;
 
-   if (!app->pkt_out(app, cp, skb, ))
+   if (!app->pkt_out(app, cp, skb, , ipvsh))
return 0;
 
/*
@@ -404,7 +405,8 @@ static inline int app_tcp_pkt_out(struct ip_vs_conn *cp, 
struct sk_buff *skb,
  * called by ipvs packet handler, assumes previously checked cp!=NULL
  * returns false if it can't handle packet (oom)
  */
-int ip_vs_app_pkt_out(struct ip_vs_conn *cp, struct sk_buff *skb)
+int ip_vs_app_pkt_out(struct ip_vs_conn *cp, struct sk_buff *skb,
+ struct ip_vs_iphdr *ipvsh)
 {
struct ip_vs_app *app;
 
@@ -417,7 +419,7 @@ int ip_vs_app_pkt_out(struct ip_vs_conn *cp, struct sk_buff 
*skb)
 
/* TCP is complicated */
if (cp->protocol == IPPROTO_TCP)
-   return app_tcp_pkt_out(cp, skb, app);
+   return app_tcp_pkt_out(cp, skb, app, ipvsh);
 
/*
 *  Call private output hook function
@@ -425,12 +427,13 @@ int ip_vs_app_pkt_out(struct ip_vs_conn *cp, struct 
sk_buff *skb)
if (app->pkt_out == NULL)
return 1;
 
-   return app->pkt_out(app, cp, skb, NULL);
+   return app->pkt_out(app, cp, skb, NULL, ipvsh);
 }
 
 
 static inline int app_tcp_pkt_in(struct ip_vs_conn *cp, struct sk_buff *skb,
-struct ip_vs_app *app)
+struct ip_vs_app *app,
+struct ip_vs_iphdr *ipvsh)
 {
int diff;
const unsigned int tcp_offset = ip_hdrlen(skb);
@@ -461,7 +464,7 @@ static inline int app_tcp_pkt_in(struct ip_vs_conn *cp, 
struct sk_buff *skb,
if (app->pkt_in == NULL)

[PATCHv2 net-next 0/2] Add IPv6 support to IPVS FTP-NAT

2018-05-25 Thread Julian Anastasov
The patchset includes two changes to support IPv6 in ip_vs_ftp. 

The first patch allows IPv6 addresses in ip_vs_nfct.c debugging
and removes the AF_INET restriction for netfilter expectations.

The second patch changes ip_vs_ftp.c to support EPRT and EPSV
commands with extended format (RFC 2428) which supports both
IPv4 and IPv6 addresses.

v1->v2: two places were missing the (void *) cast for cp->app_data,
reported by kbuild test robot

Julian Anastasov (2):
  ipvs: add full ipv6 support to nfct
  ipvs: add ipv6 support to ftp

 include/net/ip_vs.h   |  10 +-
 net/netfilter/ipvs/ip_vs_app.c|  24 +-
 net/netfilter/ipvs/ip_vs_ftp.c| 467 ++
 net/netfilter/ipvs/ip_vs_nfct.c   | 101 
 net/netfilter/ipvs/ip_vs_proto_sctp.c |   4 +-
 net/netfilter/ipvs/ip_vs_proto_tcp.c  |   4 +-
 net/netfilter/ipvs/ip_vs_proto_udp.c  |   4 +-
 7 files changed, 380 insertions(+), 234 deletions(-)

-- 
2.9.5

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nft] tests: shell: Add test case for multiple sets

2018-05-25 Thread Varsha Rao
This test case tests the id allocation of nftable set names.

Signed-off-by: Varsha Rao 
---
 .../shell/testcases/sets/0034add_many_sets_0  | 25 +++
 1 file changed, 25 insertions(+)
 create mode 100755 tests/shell/testcases/sets/0034add_many_sets_0

diff --git a/tests/shell/testcases/sets/0034add_many_sets_0 
b/tests/shell/testcases/sets/0034add_many_sets_0
new file mode 100755
index 000..cd23082
--- /dev/null
+++ b/tests/shell/testcases/sets/0034add_many_sets_0
@@ -0,0 +1,25 @@
+#!/bin/bash
+
+# This testscase checks addition of multiple sets.
+
+COUNT=4
+
+tmpfile=$(mktemp)
+if [ ! -w $tmpfile ] ; then
+echo "Failed to create tmp file" >&2
+exit 0
+fi
+
+trap "rm -rf $tmpfile" EXIT # cleanup if aborted
+
+generate() {
+for ((i=1; i<=COUNT; i++)) ; do
+   echo "add set x y${i} { type ipv4_addr; }"
+done
+}
+
+echo "add table x
+$(generate)" > $tmpfile
+
+set -e
+$NFT -f $tmpfile
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nf-next v3] net: netfilter: nf_tables_api: Use id allocation.

2018-05-25 Thread Varsha Rao
In nf_tables_set_alloc_name function, remove get_zeroed_page
find_first_zero_bit and set_bit functions. Instead use ida_get_new_above
function as it simplifies the code.

Signed-off-by: Varsha Rao 
---
Changes in v2:
- Modified the upper limit of page size.

Changes in v3:
- Used ida_get_new_above instead of ida_simple_get due to internal
  locking.
- Defined macro NFT_SET_IDA_SIZE.
- Modified commit message.

 net/netfilter/nf_tables_api.c | 39 ++-
 1 file changed, 25 insertions(+), 14 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 87b2a77add65..4fb3c5dfe9f7 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -23,6 +23,8 @@
 #include 
 #include 
 
+#define NFT_SET_IDA_SIZE  BITS_PER_BYTE * PAGE_SIZE
+
 static LIST_HEAD(nf_tables_expressions);
 static LIST_HEAD(nf_tables_objects);
 static LIST_HEAD(nf_tables_flowtables);
@@ -2718,17 +2720,14 @@ static int nf_tables_set_alloc_name(struct nft_ctx 
*ctx, struct nft_set *set,
 {
const struct nft_set *i;
const char *p;
-   unsigned long *inuse;
-   unsigned int n = 0, min = 0;
+   int n = 0, min = 0, id;
+   DEFINE_IDA(inuse);
 
p = strchr(name, '%');
if (p != NULL) {
if (p[1] != 'd' || strchr(p + 2, '%'))
return -EINVAL;
 
-   inuse = (unsigned long *)get_zeroed_page(GFP_KERNEL);
-   if (inuse == NULL)
-   return -ENOMEM;
 cont:
list_for_each_entry(i, >table->sets, list) {
int tmp;
@@ -2737,22 +2736,34 @@ static int nf_tables_set_alloc_name(struct nft_ctx 
*ctx, struct nft_set *set,
continue;
if (!sscanf(i->name, name, ))
continue;
-   if (tmp < min || tmp >= min + BITS_PER_BYTE * PAGE_SIZE)
+   if (tmp < min || tmp >= min + NFT_SET_IDA_SIZE)
continue;
 
-   set_bit(tmp - min, inuse);
+   n = ida_get_new_above(, tmp - min, );
+   if (n < 0) {
+   if (n == -EAGAIN)
+   goto cont;
+
+   return n;
+   }
}
+   n = ida_get_new_above(, 0, );
 
-   n = find_first_zero_bit(inuse, BITS_PER_BYTE * PAGE_SIZE);
-   if (n >= BITS_PER_BYTE * PAGE_SIZE) {
-   min += BITS_PER_BYTE * PAGE_SIZE;
-   memset(inuse, 0, PAGE_SIZE);
-   goto cont;
+   if (n < 0) {
+   if (n == -EAGAIN)
+   goto cont;
+   else if (n == -ENOSPC) {
+   min += NFT_SET_IDA_SIZE;
+   ida_destroy();
+   goto cont;
+   } else
+   return n;
}
-   free_page((unsigned long)inuse);
+
+   ida_destroy();
}
 
-   set->name = kasprintf(GFP_KERNEL, name, min + n);
+   set->name = kasprintf(GFP_KERNEL, name, id);
if (!set->name)
return -ENOMEM;
 
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 2/2] ipvs: add ipv6 support to ftp

2018-05-25 Thread Julian Anastasov

Hello,

On Fri, 25 May 2018, kbuild test robot wrote:

> Hi Julian,
> 
> I love your patch! Perhaps something to improve:
> 
> [auto build test WARNING on net-next/master]
> 
> url:
> https://github.com/0day-ci/linux/commits/Julian-Anastasov/Add-IPv6-support-to-IPVS-FTP-NAT/20180525-153345
> reproduce:
> # apt-get install sparse
> make ARCH=x86_64 allmodconfig
> make C=1 CF=-D__CHECK_ENDIAN__
> 
> 
> sparse warnings: (new ones prefixed by >>)
> 
> >> net/netfilter/ipvs/ip_vs_ftp.c:399:24: sparse: Using plain integer as NULL 
> >> pointer
>net/netfilter/ipvs/ip_vs_ftp.c:533:24: sparse: Using plain integer as NULL 
> pointer

I missed it, thanks! Will send v2.

Regards

--
Julian Anastasov <j...@ssi.bg>
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH nft] fix printing of "tcp flags syn" and "tcp flags == syn" expressions

2018-05-25 Thread Florian Westphal
Sabrina Dubroca  wrote:
> Commit 6979625686ec ("relational: Eliminate meta OPs") introduced some
> bugs when printing bitmask types.
> 
> First, during the post-processing phase of delinearization, the
> expression for "tcp flags syn" (PAYLOAD & flag != 0) gets converted to
> PAYLOAD == flag, which is not equivalent. This should be
> PAYLOAD (IMPL) flag.

Applied, thanks for finding/fixing this.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH nft] fix printing of "tcp flags syn" and "tcp flags == syn" expressions

2018-05-25 Thread Sabrina Dubroca
Commit 6979625686ec ("relational: Eliminate meta OPs") introduced some
bugs when printing bitmask types.

First, during the post-processing phase of delinearization, the
expression for "tcp flags syn" (PAYLOAD & flag != 0) gets converted to
PAYLOAD == flag, which is not equivalent. This should be
PAYLOAD (IMPL) flag.

Then, during output, the "==" sign from "tcp flags == syn" is dropped,
because the bitmask condition in must_print_eq_op() was removed. Let's
restore it, so that "tcp flags == syn" doesn't get printed as
"tcp flags syn". An extra check for value types is added, so that we
don't start printing "==" for sets such as "tcp flags {syn,ack}"

Finally, add a regression test for this particular case.

Fixes: 6979625686ec ("relational: Eliminate meta OPs")
Signed-off-by: Sabrina Dubroca 
---
 src/expression.c| 5 +
 src/netlink_delinearize.c   | 2 +-
 tests/py/inet/tcp.t | 1 +
 tests/py/inet/tcp.t.payload | 7 +++
 4 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/expression.c b/src/expression.c
index 53fb1811f28c..bea0f4c8d9bc 100644
--- a/src/expression.c
+++ b/src/expression.c
@@ -565,6 +565,11 @@ static void binop_arg_print(const struct expr *op, const 
struct expr *arg,
 
 bool must_print_eq_op(const struct expr *expr)
 {
+   if (expr->right->dtype->basetype != NULL &&
+   expr->right->dtype->basetype->type == TYPE_BITMASK &&
+   expr->right->ops->type == EXPR_VALUE)
+   return true;
+
return expr->left->ops->type == EXPR_BINOP;
 }
 
diff --git a/src/netlink_delinearize.c b/src/netlink_delinearize.c
index 8f4035a291f4..7d882ebac555 100644
--- a/src/netlink_delinearize.c
+++ b/src/netlink_delinearize.c
@@ -1733,7 +1733,7 @@ static void relational_binop_postprocess(struct 
rule_pp_ctx *ctx, struct expr *e
 
expr->left  = expr_get(binop->left);
expr->right = binop_tree_to_list(NULL, binop->right);
-   expr->op= OP_EQ;
+   expr->op= OP_IMPLICIT;
 
expr_free(binop);
} else if (binop->left->dtype->flags & DTYPE_F_PREFIX &&
diff --git a/tests/py/inet/tcp.t b/tests/py/inet/tcp.t
index 5276516625cc..d66ba8438a32 100644
--- a/tests/py/inet/tcp.t
+++ b/tests/py/inet/tcp.t
@@ -76,6 +76,7 @@ tcp flags { fin, syn, rst, psh, ack, urg, ecn, cwr} drop;ok
 tcp flags != { fin, urg, ecn, cwr} drop;ok
 tcp flags cwr;ok
 tcp flags != cwr;ok
+tcp flags == syn;ok
 tcp flags & (syn|fin) == (syn|fin);ok;tcp flags & (fin | syn) == fin | syn
 
 tcp window 2;ok
diff --git a/tests/py/inet/tcp.t.payload b/tests/py/inet/tcp.t.payload
index 512b42e9df08..09538aed746c 100644
--- a/tests/py/inet/tcp.t.payload
+++ b/tests/py/inet/tcp.t.payload
@@ -421,6 +421,13 @@ inet test-inet input
   [ payload load 1b @ transport header + 13 => reg 1 ]
   [ cmp neq reg 1 0x0080 ]
 
+# tcp flags == syn
+inet test-inet input
+  [ meta load l4proto => reg 1 ]
+  [ cmp eq reg 1 0x0006 ]
+  [ payload load 1b @ transport header + 13 => reg 1 ]
+  [ cmp eq reg 1 0x0002 ]
+
 # tcp flags & (syn|fin) == (syn|fin)
 inet test-inet input
   [ meta load l4proto => reg 1 ]
-- 
2.17.0

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next 2/2] ipvs: add ipv6 support to ftp

2018-05-25 Thread kbuild test robot
Hi Julian,

I love your patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Julian-Anastasov/Add-IPv6-support-to-IPVS-FTP-NAT/20180525-153345
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__


sparse warnings: (new ones prefixed by >>)

>> net/netfilter/ipvs/ip_vs_ftp.c:399:24: sparse: Using plain integer as NULL 
>> pointer
   net/netfilter/ipvs/ip_vs_ftp.c:533:24: sparse: Using plain integer as NULL 
pointer

vim +399 net/netfilter/ipvs/ip_vs_ftp.c

   239  
   240  /* Look at outgoing ftp packets to catch the response to a PASV/EPSV 
command
   241   * from the server (inside-to-outside).
   242   * When we see one, we build a connection entry with the client address,
   243   * client port 0 (unknown at the moment), the server address and the
   244   * server port.  Mark the current connection entry as a control channel
   245   * of the new entry. All this work is just to make the data connection
   246   * can be scheduled to the right server later.
   247   *
   248   * The outgoing packet should be something like
   249   *   "227 Entering Passive Mode (xxx,xxx,xxx,xxx,ppp,ppp)".
   250   * xxx,xxx,xxx,xxx is the server address, ppp,ppp is the server port 
number.
   251   * The extended format for EPSV response provides usually only port:
   252   *   "229 Entering Extended Passive Mode (|||ppp|)"
   253   */
   254  static int ip_vs_ftp_out(struct ip_vs_app *app, struct ip_vs_conn *cp,
   255   struct sk_buff *skb, int *diff,
   256   struct ip_vs_iphdr *ipvsh)
   257  {
   258  char *data, *data_limit;
   259  char *start, *end;
   260  union nf_inet_addr from;
   261  __be16 port;
   262  struct ip_vs_conn *n_cp;
   263  char buf[24];   /* xxx.xxx.xxx.xxx,ppp,ppp\000 */
   264  unsigned int buf_len;
   265  int ret = 0;
   266  enum ip_conntrack_info ctinfo;
   267  struct nf_conn *ct;
   268  
   269  *diff = 0;
   270  
   271  /* Only useful for established sessions */
   272  if (cp->state != IP_VS_TCP_S_ESTABLISHED)
   273  return 1;
   274  
   275  /* Linear packets are much easier to deal with. */
   276  if (!skb_make_writable(skb, skb->len))
   277  return 0;
   278  
   279  if (cp->app_data == (void *) IP_VS_FTP_PASV) {
   280  data = ip_vs_ftp_data_ptr(skb, ipvsh);
   281  data_limit = skb_tail_pointer(skb);
   282  
   283  if (!data || data >= data_limit)
   284  return 1;
   285  
   286  if (ip_vs_ftp_get_addrport(data, data_limit,
   287 SERVER_STRING_PASV,
   288 sizeof(SERVER_STRING_PASV)-1,
   289 '(', false, IP_VS_FTP_PASV,
   290 , , cp->af,
   291 , ) != 1)
   292  return 1;
   293  
   294  IP_VS_DBG(7, "PASV response (%pI4:%u) -> %pI4:%u 
detected\n",
   295, ntohs(port), >caddr.ip, 0);
   296  } else if (cp->app_data == (void *) IP_VS_FTP_EPSV) {
   297  data = ip_vs_ftp_data_ptr(skb, ipvsh);
   298  data_limit = skb_tail_pointer(skb);
   299  
   300  if (!data || data >= data_limit)
   301  return 1;
   302  
   303  /* Usually, data address is not specified but
   304   * we support different address, so pre-set it.
   305   */
   306  from = cp->daddr;
   307  if (ip_vs_ftp_get_addrport(data, data_limit,
   308 SERVER_STRING_EPSV,
   309 sizeof(SERVER_STRING_EPSV)-1,
   310 '(', true, IP_VS_FTP_EPSV,
   311 , , cp->af,
   312 , ) != 1)
   313  return 1;
   314  
   315  IP_VS_DBG_BUF(7, "EPSV response (%s:%u) -> %s:%u 
detected\n",
   316IP_VS_DBG_ADDR(cp->af, ), 
ntohs(port),
   317IP_VS_DBG_ADDR(cp->af, >caddr), 0);
   318  } else {
   319  return 1;
   320  }
   321  
   322  /* Now update or create a connection entry for it */
   323  {
   324  str

[PATCH nft,v2] segtree: incorrect handling of comments and timeouts with mapping

2018-05-25 Thread Pablo Neira Ayuso
Check if expression is a mapping to do the right handling.

Fixes: 35fedcf540bf ("segtree: missing comments in range and prefix expressions 
in sets")
Fixes: be90e03dd1fa ("segtree: add timeout for range and prefix expressions in 
sets")
Reported-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
v2: wrong reference to expr->left from non-mapping expression.

 src/segtree.c | 67 ---
 1 file changed, 46 insertions(+), 21 deletions(-)

diff --git a/src/segtree.c b/src/segtree.c
index 28d45c920c3c..8a8aa71e8a6e 100644
--- a/src/segtree.c
+++ b/src/segtree.c
@@ -540,13 +540,19 @@ static void set_insert_interval(struct expr *set, struct 
seg_tree *tree,
expr = set_elem_expr_alloc(_location, expr);
 
if (ei->expr != NULL) {
-   if (ei->expr->comment)
-   expr->comment = xstrdup(ei->expr->comment);
-   if (ei->expr->timeout)
-   expr->timeout = ei->expr->timeout;
-   if (ei->expr->ops->type == EXPR_MAPPING)
+   if (ei->expr->ops->type == EXPR_MAPPING) {
+   if (ei->expr->left->comment)
+   expr->comment = 
xstrdup(ei->expr->left->comment);
+   if (ei->expr->left->timeout)
+   expr->timeout = ei->expr->left->timeout;
expr = mapping_expr_alloc(>expr->location, expr,
  expr_get(ei->expr->right));
+   } else {
+   if (ei->expr->comment)
+   expr->comment = xstrdup(ei->expr->comment);
+   if (ei->expr->timeout)
+   expr->timeout = ei->expr->timeout;
+   }
}
 
if (ei->flags & EI_F_INTERVAL_END)
@@ -831,15 +837,25 @@ void interval_map_decompose(struct expr *set)
 
tmp = range_expr_alloc(>location, expr_value(low), 
tmp);
tmp = set_elem_expr_alloc(>location, tmp);
-   if (low->comment)
-   tmp->comment = xstrdup(low->comment);
-   if (low->timeout)
-   tmp->timeout = low->timeout;
-   if (low->expiration)
-   tmp->expiration = low->expiration;
 
-   if (low->ops->type == EXPR_MAPPING)
-   tmp = mapping_expr_alloc(>location, tmp, 
low->right);
+   if (low->ops->type == EXPR_MAPPING) {
+   if (low->left->comment)
+   tmp->comment = 
xstrdup(low->left->comment);
+   if (low->left->timeout)
+   tmp->timeout = low->left->timeout;
+   if (low->left->expiration)
+   tmp->expiration = low->left->expiration;
+
+   tmp = mapping_expr_alloc(>location, tmp,
+low->right);
+   } else {
+   if (low->comment)
+   tmp->comment = xstrdup(low->comment);
+   if (low->timeout)
+   tmp->timeout = low->timeout;
+   if (low->expiration)
+   tmp->expiration = low->expiration;
+   }
 
compound_expr_add(set, tmp);
} else {
@@ -852,16 +868,25 @@ void interval_map_decompose(struct expr *set)
prefix->len = expr_value(i)->len;
 
prefix = set_elem_expr_alloc(>location, prefix);
-   if (low->comment)
-   prefix->comment = xstrdup(low->comment);
-   if (low->timeout)
-   prefix->timeout = low->timeout;
-   if (low->expiration)
-   prefix->expiration = low->expiration;
-
-   if (low->ops->type == EXPR_MAPPING)
+
+   if (low->ops->type == EXPR_MAPPING) {
+   if (low->left->comment)
+   prefix->comment = 
xstrdup(low->left->comment);
+   if (low->left->timeout)
+   prefix->timeout = low->left->timeout;
+   if (low->left->expiration)
+   prefix->expiration = 
low->left->expiration;
+
prefix = mapping_expr_alloc(>location, 
prefix,
low->right);
+   

[PATCH nft] segtree: incorrect handling of comments and timeouts with mapping

2018-05-25 Thread Pablo Neira Ayuso
Check if expression is a mapping to do the right handling.

Fixes: 35fedcf540bf ("segtree: missing comments in range and prefix expressions 
in sets")
Fixes: be90e03dd1fa ("segtree: add timeout for range and prefix expressions in 
sets")
Reported-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 src/segtree.c | 67 ---
 1 file changed, 46 insertions(+), 21 deletions(-)

diff --git a/src/segtree.c b/src/segtree.c
index 28d45c920c3c..423c54bdeda6 100644
--- a/src/segtree.c
+++ b/src/segtree.c
@@ -540,13 +540,19 @@ static void set_insert_interval(struct expr *set, struct 
seg_tree *tree,
expr = set_elem_expr_alloc(_location, expr);
 
if (ei->expr != NULL) {
-   if (ei->expr->comment)
-   expr->comment = xstrdup(ei->expr->comment);
-   if (ei->expr->timeout)
-   expr->timeout = ei->expr->timeout;
-   if (ei->expr->ops->type == EXPR_MAPPING)
+   if (ei->expr->ops->type == EXPR_MAPPING) {
+   if (ei->expr->left->comment)
+   expr->comment = 
xstrdup(ei->expr->left->comment);
+   if (ei->expr->left->timeout)
+   expr->timeout = ei->expr->left->timeout;
expr = mapping_expr_alloc(>expr->location, expr,
  expr_get(ei->expr->right));
+   } else {
+   if (ei->expr->comment)
+   expr->comment = xstrdup(ei->expr->comment);
+   if (ei->expr->timeout)
+   expr->timeout = ei->expr->timeout;
+   }
}
 
if (ei->flags & EI_F_INTERVAL_END)
@@ -831,15 +837,25 @@ void interval_map_decompose(struct expr *set)
 
tmp = range_expr_alloc(>location, expr_value(low), 
tmp);
tmp = set_elem_expr_alloc(>location, tmp);
-   if (low->comment)
-   tmp->comment = xstrdup(low->comment);
-   if (low->timeout)
-   tmp->timeout = low->timeout;
-   if (low->expiration)
-   tmp->expiration = low->expiration;
 
-   if (low->ops->type == EXPR_MAPPING)
-   tmp = mapping_expr_alloc(>location, tmp, 
low->right);
+   if (low->ops->type == EXPR_MAPPING) {
+   if (low->left->comment)
+   tmp->comment = 
xstrdup(low->left->comment);
+   if (low->left->timeout)
+   tmp->timeout = low->left->timeout;
+   if (low->left->expiration)
+   tmp->expiration = low->left->expiration;
+
+   tmp = mapping_expr_alloc(>location, tmp,
+low->right);
+   } else {
+   if (low->comment)
+   tmp->comment = xstrdup(low->comment);
+   if (low->timeout)
+   tmp->timeout = low->timeout;
+   if (low->left->expiration)
+   tmp->expiration = low->expiration;
+   }
 
compound_expr_add(set, tmp);
} else {
@@ -852,16 +868,25 @@ void interval_map_decompose(struct expr *set)
prefix->len = expr_value(i)->len;
 
prefix = set_elem_expr_alloc(>location, prefix);
-   if (low->comment)
-   prefix->comment = xstrdup(low->comment);
-   if (low->timeout)
-   prefix->timeout = low->timeout;
-   if (low->expiration)
-   prefix->expiration = low->expiration;
-
-   if (low->ops->type == EXPR_MAPPING)
+
+   if (low->ops->type == EXPR_MAPPING) {
+   if (low->left->comment)
+   prefix->comment = 
xstrdup(low->left->comment);
+   if (low->left->timeout)
+   prefix->timeout = low->left->timeout;
+   if (low->left->expiration)
+   prefix->expiration = 
low->left->expiration;
+
prefix = mapping_expr_alloc(>location, 
prefix,
low->right);
+   } else {
+   if 

Re: [PATCH net] ipvs: fix buffer overflow with sync daemon and service

2018-05-25 Thread Jesper Dangaard Brouer
On Sat, 19 May 2018 18:22:35 +0300
Julian Anastasov  wrote:

> The same happens for sched_name when adding/editing virtual server.
> 
> We are restricted by IP_VS_SCHEDNAME_MAXLEN and IP_VS_IFNAME_MAXLEN
> being used as size in include/uapi/linux/ip_vs.h, so they
> include the space for NUL.

Ah, okay I see why you also need the ipvsadm change :-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html