Re: Concurrent iptables-restore calls clobbering each other

2017-02-03 Thread Jan Engelhardt

On Friday 2017-02-03 21:37, Shaun Crampton wrote:
>
>I'm trying to diagnose an incompatibility between my application
>(Project Calico's Felix daemon) and another (Kubernetes' kube-proxy).
>Both are (ab)using iptables-restore to do high-speed bulk updates to
>iptables and they're both using --noflush so they can use
>iptables-restore to edit only some chains.  Mostly, this works great
>and it's many times faster than using individual iptables commands.
[...]
>My understanding is that each iptables-restore call actually does a
>read-modify-write of the whole table

This is by design; the RMW cycle in principle also affects the "slower"
iptables - which is why it is slower, because it does only one rule per cycle.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] audit: normalize NETFILTER_PKT

2017-02-03 Thread Paul Moore
On Tue, Jan 31, 2017 at 2:44 PM, Richard Guy Briggs  wrote:
> On 2017-01-31 17:13, Steve Grubb wrote:

...

>> I was curious about something. Auparse is trying to interpret the
>> icmptype field for every event. This is not good. Which fields are
>> valid for each kind of packet? Are there fields valid for all packets?
>> Is the icmptype/code the only ones that don't apply in all cases?
>
> Ok, this is important to know...  You sound surprised.  So if that field
> isn't valid for all cases of that event, then the event should be split
> or the "unset" value should be used as a hint to ignore it.
>
> This was the point of my earlier posting:
> https://www.redhat.com/archives/linux-audit/2017-January/msg00074.html
> There are still a number of questions from that thread that had no
> reply.  Answering those questions would help inform this discussion, so
> if you could answer some of those questions in that first thread, I'd
> have a better chance of understanding what are the limitations of the
> parser and design/work around them.
>
> There is no packet for which all fields are valid.  This is why using
> "unset" values in those fields was suggested, seemed to be accepted in
> discussion, and implemented.

...

> Swinging fields in and out makes it very handy to use one message type
> for all of them and can save precious disk bandwidth, but the point was
> to normalize these messages.  Is that still realistic and necessary?  If
> so, we're trying to find a balance between message type explosion and
> disk bandwidth.
>
> We either need to make this fine-grained, ignore fields that aren't
> valid for that type, or swing fields in and out.  Or maybe I have missed
> something fundamental, such as the presence of subsequent fields depends
> on the values of previous fields?

I'm still trying to understand what purpose this record actually
serves, and what requirements may exist.  In an earlier thread
somewhere Steve mentioned some broad requirements around data
import/export, and I really wonder if the NETFILTER_PKT record
provides anything useful here when it really isn't connecting the
traffic to the sender/receiver without a lot of additional logging and
post-processing smarts.  If you were interested in data import/export
I think auditing the socket syscalls would provide a much more useful
set of records in the audit log.

Considering that one of the primary motivations for the audit
subsystem is to enable compliance with various security
specifications, let's get the ones we know about listed in this thread
and then figure out how best to meet those requirements.

-- 
paul moore
www.paul-moore.com


Re: [PATCH 00/27] Netfilter updates for net-next

2017-02-03 Thread David Miller
From: Pablo Neira Ayuso 
Date: Fri,  3 Feb 2017 13:25:11 +0100

> The following patchset contains Netfilter updates for your net-next
> tree, they are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Pulled, thanks a lot!


Concurrent iptables-restore calls clobbering each other

2017-02-03 Thread Shaun Crampton
Hi,

I'm trying to diagnose an incompatibility between my application
(Project Calico's Felix daemon) and another (Kubernetes' kube-proxy).
Both are (ab)using iptables-restore to do high-speed bulk updates to
iptables and they're both using --noflush so they can use
iptables-restore to edit only some chains.  Mostly, this works great
and it's many times faster than using individual iptables commands.
However, sometimes when they do an iptables-restore at the same time,
I see one of the updates get lost even though the command reported
success.  I've boiled it down to a repro script[1] that starts two
threads writing to iptables and looks for missing updates.

My understanding is that each iptables-restore call actually does a
read-modify-write of the whole table so it's not too surprising that
we could get a missed update.  However, I thought that iptables has
some sort of sequence number to prevent clobbering, making it a
compare-and-swap operation.  I've certainly seen iptables-restore
calls fail on the COMMIT line when doing concurrent updates and I have
a tweaked script[2] that exhibits that behaviour.  In script [2] I
added an extra superfluous rule update to one of the writers and
suddenly the COMMIT starts failing as I was hoping.  While the toy
example in [2] seems to work, if I add more operations, it seems to go
back to failing again so it may just be a timing window.
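
To illustrate the difference between a plain read-modify-write and a compare-and-swap as described above, here is a toy Python model. It is only a sketch and does not reflect how iptables or the kernel actually replaces a table; `Table`, `replace_blind`, `replace_checked` and the rule names are invented for the example.

```python
class Table:
    """Toy stand-in for a table: a rule list plus a generation counter."""

    def __init__(self):
        self.rules = []
        self.generation = 0

    def snapshot(self):
        # "Read" step: copy the rules and remember which generation we saw.
        return list(self.rules), self.generation

    def replace_blind(self, new_rules):
        # Plain "write" step: always succeeds, whatever happened in between.
        self.rules = list(new_rules)
        self.generation += 1

    def replace_checked(self, new_rules, seen_generation):
        # Compare-and-swap: refuse the write if the table changed since the read.
        if seen_generation != self.generation:
            return False
        self.rules = list(new_rules)
        self.generation += 1
        return True


def interleaved_blind():
    """Both writers read first, then both write: the first write is lost."""
    table = Table()
    felix_view, _ = table.snapshot()
    kube_view, _ = table.snapshot()
    table.replace_blind(felix_view + ["FELIX-B"])
    table.replace_blind(kube_view + ["KUBE-A"])
    return table.rules


def interleaved_checked():
    """Same interleaving, but the second writer gets a failure instead."""
    table = Table()
    felix_view, felix_gen = table.snapshot()
    kube_view, kube_gen = table.snapshot()
    ok_felix = table.replace_checked(felix_view + ["FELIX-B"], felix_gen)
    ok_kube = table.replace_checked(kube_view + ["KUBE-A"], kube_gen)
    return table.rules, ok_felix, ok_kube
```

In the blind variant both writers report success but only the last writer's rule survives. In the checked variant the second writer is told that its commit failed and can retry from a fresh snapshot.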

Output from script [1] (it quickly fails after detecting a lost update):

$ sudo ./iptables.sh
[sudo] password for shaun:
akKkKkKkKkiptables-restore: line 4 failed
AbKkKkBKkCaKkAbKkBKkCaKkAbKkBKk
FELIX-B update was clobbered

Output from script [2] (keeps going for as long as I've let it run):

$ sudo ./iptables.sh
akKkAbKkBKkCaKkAbKkBKkKCakAbZkBZkKkCaKkAbKkBCaKkAbBZkKkKkCaKkAbKkBZkKkCaKkAbKkBKkKkCaKkKkAbKkBKkKkCaKkAbKkBZkKkCaKkAbKkBKkCaKkAbKkBZkKkCaKkAbKkBKkKkCaKkAbKkBKkKkCaAbKkKkBKkKkCaKkAbKkBKkCaKkAbBZkCaKkAbKkBKkKkCaKkAbBCa

Where a K means that the "kube" thread successfully wrote to iptables
and a Z means it got a "COMMIT failed".

It'd be great to know if this is working as designed or a bug, or if
there's a way to make sure that I get a COMMIT failure if there's been
a concurrent update.  Without that, I'm thinking we'll have to do a
regular poll to make sure that nothing got clobbered.

I'd appreciate it if you CCed me on any responses since I'm not
subscribed to the list.  Thanks,

-Shaun

[1] https://gist.github.com/fasaxc/ee443a9ef82ce2e4dab059161f095ec2
[2] https://gist.github.com/fasaxc/05a80a48211500e4f2225011a131f92e


[PATCH nft] statement: Print NAT IPv4 address in nat_stmt_print()

2017-02-03 Thread Elise Lennion
The case where "nat.addr != NULL && nat.proto != NULL && type != ipv6"
wasn't caught in nat_stmt_print(). Now all cases should be considered.

Also, the if statements were reorganized to get rid of one nested if.
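
The intended output can be sketched in a few lines of Python. This is an illustration only, not the nft code: `format_nat_target` is a hypothetical helper, it handles a single address rather than a range, and it uses Python's `ipaddress` module to tell the address families apart.

```python
import ipaddress


def format_nat_target(addr, port=None):
    """Format a NAT target; only IPv6 addresses followed by a port get brackets."""
    ip = ipaddress.ip_address(addr)
    if port is None:
        # No port: the address is printed as-is for both families.
        return str(ip)
    if ip.version == 6:
        # Brackets keep the port separator apart from the colons in the address.
        return f"[{ip}]:{port}"
    # IPv4 with a port: this is the case that was not printed before the fix.
    return f"{ip}:{port}"
```

For example, `format_nat_target("192.168.3.2", 8080)` gives `192.168.3.2:8080`, while an IPv6 address with a port is wrapped in brackets.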

Fixes(Bug 1117 - Table ipv4-nat prerouting dnat doesn't accept dest IP:PORT)

Signed-off-by: Elise Lennion 
---
 src/statement.c | 31 ---
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/src/statement.c b/src/statement.c
index 9cdabbb..0585d66 100644
--- a/src/statement.c
+++ b/src/statement.c
@@ -494,25 +494,26 @@ static void nat_stmt_print(const struct stmt *stmt)
};
 
printf("%s to ", nat_types[stmt->nat.type]);
-   if (stmt->nat.addr) {
-   if (stmt->nat.proto) {
-   if (stmt->nat.addr->ops->type == EXPR_VALUE &&
-   stmt->nat.addr->dtype->type == TYPE_IP6ADDR) {
-   printf("[");
-   expr_print(stmt->nat.addr);
-   printf("]");
-   } else if (stmt->nat.addr->ops->type == EXPR_RANGE &&
-  stmt->nat.addr->left->dtype->type == TYPE_IP6ADDR) {
-   printf("[");
-   expr_print(stmt->nat.addr->left);
-   printf("]-[");
-   expr_print(stmt->nat.addr->right);
-   printf("]");
-   }
+   if (stmt->nat.addr && stmt->nat.proto) {
+   if (stmt->nat.addr->ops->type == EXPR_RANGE &&
+   stmt->nat.addr->left->dtype->type == TYPE_IP6ADDR) {
+   printf("[");
+   expr_print(stmt->nat.addr->left);
+   printf("]-[");
+   expr_print(stmt->nat.addr->right);
+   printf("]");
+   }
+   else if (stmt->nat.addr->dtype->type == TYPE_IP6ADDR) {
+   printf("[");
+   expr_print(stmt->nat.addr);
+   printf("]");
} else {
expr_print(stmt->nat.addr);
}
}
+   else if (stmt->nat.addr) {
+   expr_print(stmt->nat.addr);
+   }
 
if (stmt->nat.proto) {
printf(":");
-- 
2.7.4



[PATCH nftables] statement: fix print of ip dnat address

2017-02-03 Thread Florian Westphal
the change causes non-ipv6 addresses to not be printed at all in case
a nfproto was given.

Also add a test case to catch this.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1117
Fixes: 5ab0e10fc6e2c22363a ("src: support for RFC2732 IPv6 address format with brackets")
Signed-off-by: Florian Westphal 
---
 src/statement.c   |  2 ++
 tests/py/ip/dnat.t|  1 +
 tests/py/ip/dnat.t.payload.ip | 12 
 3 files changed, 15 insertions(+)

diff --git a/src/statement.c b/src/statement.c
index 9cdabbb979e8..3beb86ab4263 100644
--- a/src/statement.c
+++ b/src/statement.c
@@ -508,6 +508,8 @@ static void nat_stmt_print(const struct stmt *stmt)
printf("]-[");
expr_print(stmt->nat.addr->right);
printf("]");
+   } else {
+   expr_print(stmt->nat.addr);
}
} else {
expr_print(stmt->nat.addr);
diff --git a/tests/py/ip/dnat.t b/tests/py/ip/dnat.t
index da00106edbb4..089017c84704 100644
--- a/tests/py/ip/dnat.t
+++ b/tests/py/ip/dnat.t
@@ -7,6 +7,7 @@ iifname "eth0" tcp dport != 80-90 dnat to 192.168.3.2;ok
 iifname "eth0" tcp dport {80, 90, 23} dnat to 192.168.3.2;ok
 iifname "eth0" tcp dport != {80, 90, 23} dnat to 192.168.3.2;ok
 iifname "eth0" tcp dport != 23-34 dnat to 192.168.3.2;ok
+iifname "eth0" tcp dport 81 dnat to 192.168.3.2:8080;ok
 
 dnat to ct mark map { 0x00000014 : 1.2.3.4};ok
 dnat to ct mark . ip daddr map { 0x00000014 . 1.1.1.1 : 1.2.3.4};ok
diff --git a/tests/py/ip/dnat.t.payload.ip b/tests/py/ip/dnat.t.payload.ip
index 66926990d880..7a7f5a82dd5a 100644
--- a/tests/py/ip/dnat.t.payload.ip
+++ b/tests/py/ip/dnat.t.payload.ip
@@ -60,6 +60,18 @@ ip test-ip4 prerouting
   [ immediate reg 1 0x0203a8c0 ]
   [ nat dnat ip addr_min reg 1 addr_max reg 0 ]
 
+# iifname "eth0" tcp dport 81 dnat to 192.168.3.2:8080
+ip test-ip4 prerouting
+  [ meta load iifname => reg 1 ]
+  [ cmp eq reg 1 0x30687465 0x00000000 0x00000000 0x00000000 ]
+  [ payload load 1b @ network header + 9 => reg 1 ]
+  [ cmp eq reg 1 0x00000006 ]
+  [ payload load 2b @ transport header + 2 => reg 1 ]
+  [ cmp eq reg 1 0x00005100 ]
+  [ immediate reg 1 0x0203a8c0 ]
+  [ immediate reg 2 0x0000901f ]
+  [ nat dnat ip addr_min reg 1 addr_max reg 0 proto_min reg 2 proto_max reg 0 ]
+
 # dnat to ct mark map { 0x00000014 : 1.2.3.4}
 __map%d test-ip4 b
 __map%d test-ip4 0
-- 
2.10.2



[PATCH nftables 9/9] tests: add test entries for conntrack zones

2017-02-03 Thread Florian Westphal
Signed-off-by: Florian Westphal 
---
 tests/py/any/ct.t | 13 +
 tests/py/any/ct.t.payload | 44 
 2 files changed, 57 insertions(+)

diff --git a/tests/py/any/ct.t b/tests/py/any/ct.t
index 2cfbfe13ccd2..6f32d29c0c40 100644
--- a/tests/py/any/ct.t
+++ b/tests/py/any/ct.t
@@ -100,6 +100,19 @@ ct label 127;ok
 ct label set 127;ok
 ct label 128;fail
 
+ct zone 0;ok
+ct zone 23;ok
+ct zone 65536;fail
+ct both zone 1;fail
+ct original zone 1;ok
+ct reply zone 1;ok
+
+ct zone set 1;ok
+ct original zone set 1;ok
+ct reply zone set 1;ok
+ct zone set mark map { 1 : 1,  2 : 2 };ok;ct zone set mark map { 0x00000001 : 1, 0x00000002 : 2}
+ct both zone set 1;fail
+
 ct invalid;fail
 ct invalid original;fail
 ct set invalid original 42;fail
diff --git a/tests/py/any/ct.t.payload b/tests/py/any/ct.t.payload
index 3370bcac0594..e4c7f62b69f5 100644
--- a/tests/py/any/ct.t.payload
+++ b/tests/py/any/ct.t.payload
@@ -402,6 +402,50 @@ ip test-ip4 output
   [ immediate reg 1 0x00000000 0x00000000 0x00000000 0x80000000 ]
   [ ct set label with reg 1 ]
 
+# ct zone 0
+ip test-ip4 output
+  [ ct load zone => reg 1 ]
+  [ cmp eq reg 1 0x00000000 ]
+
+# ct zone 23
+ip test-ip4 output
+  [ ct load zone => reg 1 ]
+  [ cmp eq reg 1 0x00000017 ]
+
+# ct original zone 1
+ip test-ip4 output
+  [ ct load zone => reg 1 , dir original ]
+  [ cmp eq reg 1 0x00000001 ]
+
+# ct reply zone 1
+ip test-ip4 output
+  [ ct load zone => reg 1 , dir reply ]
+  [ cmp eq reg 1 0x00000001 ]
+
+# ct zone set 1
+ip test-ip4 output
+  [ immediate reg 1 0x00000001 ]
+  [ ct set zone with reg 1 ]
+
+# ct original zone set 1
+ip test-ip4 output
+  [ immediate reg 1 0x00000001 ]
+  [ ct set zone with reg 1 , dir original ]
+
+# ct reply zone set 1
+ip test-ip4 output
+  [ immediate reg 1 0x00000001 ]
+  [ ct set zone with reg 1 , dir reply ]
+
+# ct zone set mark map { 1 : 1,  2 : 2 }
+__map%d test-ip4 b
+__map%d test-ip4 0
+element 00000001  : 00000001 0 [end]	element 00000002  : 00000002 0 [end]
+ip test-ip4 output
+  [ meta load mark => reg 1 ]
+  [ lookup reg 1 set __map%d dreg 1 ]
+  [ ct set zone with reg 1 ]
+
 # notrack
 ip test-ip4 output
   [ notrack ]
-- 
2.10.2



[PATCH nftables 7/9] ct: refactor print function so it can be re-used for ct statement

2017-02-03 Thread Florian Westphal
Once directional zone support is added we also need to print the
direction of the statement, so factor the common code to re-use
this helper from the statement print function.
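
The shape of the refactoring can be sketched in Python. This is not the nft code: the direction table is an assumption (the patch iterates `ct_dir_tbl`), and the key is passed as its token string instead of an index into `ct_templates`.

```python
# Assumed direction values and names, for illustration only.
CT_DIRECTIONS = {0: "original", 1: "reply"}


def ct_print(key_token, direction):
    """Shared helper: takes only the key and direction, not a whole expression."""
    parts = ["ct"]
    # A negative direction means "no direction given", so nothing is added.
    if direction >= 0 and direction in CT_DIRECTIONS:
        parts.append(CT_DIRECTIONS[direction])
    parts.append(key_token)
    return " ".join(parts)


def ct_expr_print(expr):
    """Expression printer, now a thin wrapper around the shared helper."""
    return ct_print(expr["key"], expr["direction"])
```

Because `ct_print` no longer depends on the expression structure, a statement printer can call it with its own key and direction.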

Signed-off-by: Florian Westphal 
---
 src/ct.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/ct.c b/src/ct.c
index dffa0e5fa44a..7e09c5b246b2 100644
--- a/src/ct.c
+++ b/src/ct.c
@@ -238,22 +238,27 @@ static const struct ct_template ct_templates[] = {
  BYTEORDER_HOST_ENDIAN, 16),
 };
 
-static void ct_expr_print(const struct expr *expr)
+static void ct_print(enum nft_ct_keys key, int8_t dir)
 {
const struct symbolic_constant *s;
 
printf("ct ");
-   if (expr->ct.direction < 0)
+   if (dir < 0)
goto done;
 
for (s = ct_dir_tbl.symbols; s->identifier != NULL; s++) {
-   if (expr->ct.direction == (int) s->value) {
+   if (dir == (int)s->value) {
printf("%s ", s->identifier);
break;
}
}
  done:
-   printf("%s", ct_templates[expr->ct.key].token);
+   printf("%s", ct_templates[key].token);
+}
+
+static void ct_expr_print(const struct expr *expr)
+{
+   ct_print(expr->ct.key, expr->ct.direction);
 }
 
 static bool ct_expr_cmp(const struct expr *e1, const struct expr *e2)
-- 
2.10.2



[PATCH nftables 6/9] src: add conntrack zone support

2017-02-03 Thread Florian Westphal
This enables zone get/set support.

As the zone can be optionally tied to a direction as well we need a new
token for this (unless we turn reply/original into tokens in which case
we could handle zone via STRING).

There was some discussion on how zone set support should be handled,
especially 'zone set 1'.

There are several issues to consider:

1. it's not possible to change a zone 'later on', any given
conntrack flow has exactly one zone for its entire lifetime.

2. to create conntracks in a given zone, the zone therefore has to be
assigned *before* the packet gets picked up by conntrack (so that lookup
finds the correct existing flow or the flow is created with the desired
zone id).  In iptables, this is enforced because zones are assigned with
CT target and this is restricted to the 'raw' table in iptables, which
runs after defragmentation but before connection tracking.

3. Thus, in nftables the 'ct zone set' rule needs to hook before
conntrack too, e.g. via

 table raw {
  chain pre {
   type filter hook prerouting priority -300;
   iif eth3 ct zone set 23
  }
  chain out {
   type filter hook output priority -300;
   oif eth3 ct zone set 23
  }
 }

... but this is not enforced.

There were two alternatives to better document this.
One was to use an explicit 'template' keyword:
  nft ... template zone set 23

... but 'connection tracking templates' are a kernel detail
that users should not and need not know about.

The other one was to use the meta keyword instead since
we're (from a practical point of view) assigning the zone to
the packet, not the conntrack:

 nft ... meta zone set 23

However, next patch also supports 'directional' zones, and

 nft ... meta original zone 23

makes no sense because 'direction' refers to a direction as understood
by the connection tracker.

Signed-off-by: Florian Westphal 
---
 doc/nft.xml | 10 +-
 include/linux/netfilter/nf_tables.h |  1 +
 src/ct.c|  2 ++
 src/parser_bison.y  | 10 ++
 src/scanner.l   |  1 +
 5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/doc/nft.xml b/doc/nft.xml
index 78e112f3974b..0a81728789bf 100644
--- a/doc/nft.xml
+++ b/doc/nft.xml
@@ -2126,7 +2126,8 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
direction before the conntrack key, others must 
be used directly because they are direction agnostic.
The packets, 
bytes and avgpkt keywords can be
used with or without a direction. If the 
direction is omitted, the sum of the original and the reply
-   direction is returned.
+   direction is returned.  The same is true for 
the zone, if a direction is given, the zone
+   is only matched if the zone id is tied to the 
given direction.



@@ -2144,6 +2145,7 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
bytes
packets
avgpkt
+   zone



@@ -2162,6 +2164,7 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
bytes
packets
avgpkt
+   zone



@@ -2260,6 +2263,11 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
average 
bytes per packet, see description for packets keyword
integer 
(64 bit)

+   
+   
zone
+   
conntrack zone
+   integer 
(16 bit)
+   



diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index b00a05d1ee56..fc0ed47d974d 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -883,6 +883,7 @@ enum nft_ct_keys {
NFT_CT_PKTS,
NFT_CT_BYTES,
NFT_CT_AVGPKT,
+   

[PATCH -next 0/9] nftables: add zone support to ct statement

2017-02-03 Thread Florian Westphal
This adds the ability to set the conntrack zone from nftables, i.e.
native replacement for -j CT --zone $number.

See individual patches for details.
This will need more documentation and exposure of the builtin
hook priorities (e.g. via defines?) so users can more easily
see what's happening.

Pablo suggested to allow something like

hook prerouting prio $raw;
or even
hook prerouting prio $conntrack - 1;

instead of the 'awkward' use of the actual numbers used by the kernel
('priority -300' to hook at same priority as raw table).

However, this series doesn't contain any of that, so users will
have to use priorities between -399 and -199 (i.e. after defrag and
before conntrack pickup) to assign zones.



[PATCH libnftnl 4/9] src: ct: add zone support

2017-02-03 Thread Florian Westphal
Signed-off-by: Florian Westphal 
---
 include/linux/netfilter/nf_tables.h | 2 ++
 src/expr/ct.c   | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index b00a05d1ee56..b972e72623c2 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -864,6 +864,7 @@ enum nft_rt_attributes {
  * @NFT_CT_PKTS: conntrack packets
  * @NFT_CT_BYTES: conntrack bytes
  * @NFT_CT_AVGPKT: conntrack average bytes per packet
+ * @NFT_CT_ZONE: conntrack zone
  */
 enum nft_ct_keys {
NFT_CT_STATE,
@@ -883,6 +884,7 @@ enum nft_ct_keys {
NFT_CT_PKTS,
NFT_CT_BYTES,
NFT_CT_AVGPKT,
+   NFT_CT_ZONE,
 };
 
 /**
diff --git a/src/expr/ct.c b/src/expr/ct.c
index d3d352e9f959..cdd08e95c86c 100644
--- a/src/expr/ct.c
+++ b/src/expr/ct.c
@@ -32,7 +32,7 @@ struct nftnl_expr_ct {
 #define IP_CT_DIR_REPLY1
 
 #ifndef NFT_CT_MAX
-#define NFT_CT_MAX (NFT_CT_AVGPKT + 1)
+#define NFT_CT_MAX (NFT_CT_ZONE + 1)
 #endif
 
 static int
@@ -170,6 +170,7 @@ static const char *ctkey2str_array[NFT_CT_MAX] = {
[NFT_CT_PKTS]   = "packets",
[NFT_CT_BYTES]  = "bytes",
[NFT_CT_AVGPKT] = "avgpkt",
+   [NFT_CT_ZONE]   = "zone",
 };
 
 static const char *ctkey2str(uint32_t ctkey)
-- 
2.10.2



[PATCH nftables 5/9] src: add host byte order integer type

2017-02-03 Thread Florian Westphal
This is needed once we add support to set a zone, as in

ct zone set 42

Using integer_type makes nft use big-endian representation of the zone id
instead of the required host byte order.

When using 'ct zone 1', things will work because the (implicit) relational
operation makes sure that the left and right sides have same byte order.

In the statement case the lack of relop means we either need to convert
ourselves (the ct template contains endianness info), or use a dedicated type
(the latter is the reason why setting a mark will 'just work' since the
 mark type takes care of it).

The dedicated type has the advantage that it also works when maps are used:

ct zone set mark map { 1 : 10, 2 : 20, 3 : 30 }

... which is not easy to do with current map/set code, its endianness
settings rely on dtype->byteorder (i.e., it will always set BIG_ENDIAN
when we'd use integer_type for the zone).

Using evaluation context seems like a nightmare because several
places during eval steps can re-set this information, and propagating
the template info means to pollute generic code with something specific
to ct.

It seems like a future removal of all .byteorder members in the templates
in favor of using appropriate types might be a good idea.
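
The byte order problem described above can be shown with a small Python sketch. It is an illustration only: the two-byte default width is an assumption based on the zone being documented as a 16 bit integer, and `encode` and `misread_on_little_endian` are hypothetical helpers, not nft functions.

```python
import sys


def encode(value, byteorder, width=2):
    """Serialize an integer in the given byte order ("big" or "little")."""
    return value.to_bytes(width, byteorder)


def misread_on_little_endian(value, width=2):
    """Value seen when big-endian bytes are interpreted as little-endian."""
    return int.from_bytes(encode(value, "big", width), "little")


# Zone id 42 in big-endian order versus the order of the machine running this.
big_endian_bytes = encode(42, "big")
host_order_bytes = encode(42, sys.byteorder)
```

On a little-endian host the two encodings differ, and a zone id of 42 sent in big-endian order would be read back as 10752 (0x2a00) instead of 42.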

Signed-off-by: Florian Westphal 
---
 include/datatype.h |  2 ++
 src/datatype.c | 10 ++
 2 files changed, 12 insertions(+)

diff --git a/include/datatype.h b/include/datatype.h
index 9f127f2954e3..8c1c827253be 100644
--- a/include/datatype.h
+++ b/include/datatype.h
@@ -82,6 +82,7 @@ enum datatypes {
TYPE_DSCP,
TYPE_ECN,
TYPE_FIB_ADDR,
+   TYPE_U32,
__TYPE_MAX
 };
 #define TYPE_MAX   (__TYPE_MAX - 1)
@@ -231,6 +232,7 @@ extern const struct datatype icmp_code_type;
 extern const struct datatype icmpv6_code_type;
 extern const struct datatype icmpx_code_type;
 extern const struct datatype time_type;
+extern const struct datatype u32_type;
 
 extern const struct datatype *concat_type_alloc(uint32_t type);
 extern void concat_type_destroy(const struct datatype *dtype);
diff --git a/src/datatype.c b/src/datatype.c
index 1518606a3f89..cab42d47f0f0 100644
--- a/src/datatype.c
+++ b/src/datatype.c
@@ -48,6 +48,7 @@ static const struct datatype *datatypes[TYPE_MAX + 1] = {
 [TYPE_ICMP_CODE]= &icmp_code_type,
 [TYPE_ICMPV6_CODE]  = &icmpv6_code_type,
 [TYPE_ICMPX_CODE]   = &icmpx_code_type,
+   [TYPE_U32]  = &u32_type,
 };
 
 void datatype_register(const struct datatype *dtype)
@@ -1057,3 +1058,12 @@ struct error_record *rate_parse(const struct location *loc, const char *str,
 
return NULL;
 }
+
+const struct datatype u32_type = {
+   .type   = TYPE_U32,
+   .name   = "u32",
+   .desc   = "32bit host endian integer",
+   .size   = 4 * BITS_PER_BYTE,
+   .byteorder  = BYTEORDER_HOST_ENDIAN,
+   .basetype   = &integer_type,
+};
-- 
2.10.2



[PATCH nf-next 2/9] netfilter: nft_ct: prepare for key-dependent error unwind

2017-02-03 Thread Florian Westphal
Next patch will add ZONE_ID set support which will need similar
error unwind (put operation) as conntrack labels.

Prepare for this: remove the 'label_got' boolean in favor
of a switch statement that can be extended in next patch.

As we already have that in the set_destroy function, place that in
a separate function and call it from the set init function.
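
The pattern can be sketched in Python. This is a toy model, not the kernel code: `Net`, the string keys and the `fail_late` flag are invented to stand in for the per-namespace label reference count and for an error that occurs after the resource has been taken.

```python
class Net:
    """Toy namespace object holding a reference count for conntrack labels."""

    def __init__(self):
        self.label_users = 0


def _release_for_key(net, key):
    """Key-dependent cleanup shared by the init error path and destroy."""
    if key == "labels":
        net.label_users -= 1
    # Other keys hold no resource yet; a new key would add a branch here.


def set_init(net, key, fail_late=False):
    """Acquire what the key needs; unwind via the shared helper on error."""
    if key == "labels":
        net.label_users += 1
    if fail_late:
        # Simulated failure after the resource was taken.
        _release_for_key(net, key)
        return False
    return True


def set_destroy(net, key):
    """Regular teardown uses the same helper as the error path."""
    _release_for_key(net, key)
```

Keeping the release logic in one place means a new key only needs one extra branch, rather than one flag in init plus one case in destroy.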

Signed-off-by: Florian Westphal 
---
 net/netfilter/nft_ct.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 5bd4cdfdcda5..2d82df2737da 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -386,12 +386,24 @@ static int nft_ct_get_init(const struct nft_ctx *ctx,
return 0;
 }
 
+static void __nft_ct_set_destroy(const struct nft_ctx *ctx, struct nft_ct *priv)
+{
+   switch (priv->key) {
+#ifdef CONFIG_NF_CONNTRACK_LABELS
+   case NFT_CT_LABELS:
+   nf_connlabels_put(ctx->net);
+   break;
+#endif
+   default:
+   break;
+   }
+}
+
 static int nft_ct_set_init(const struct nft_ctx *ctx,
   const struct nft_expr *expr,
   const struct nlattr * const tb[])
 {
struct nft_ct *priv = nft_expr_priv(expr);
-   bool label_got = false;
unsigned int len;
int err;
 
@@ -412,7 +424,6 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
err = nf_connlabels_get(ctx->net, (len * BITS_PER_BYTE) - 1);
if (err)
return err;
-   label_got = true;
break;
 #endif
default:
@@ -431,8 +442,7 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
return 0;
 
 err1:
-   if (label_got)
-   nf_connlabels_put(ctx->net);
+   __nft_ct_set_destroy(ctx, priv);
return err;
 }
 
@@ -447,16 +457,7 @@ static void nft_ct_set_destroy(const struct nft_ctx *ctx,
 {
struct nft_ct *priv = nft_expr_priv(expr);
 
-   switch (priv->key) {
-#ifdef CONFIG_NF_CONNTRACK_LABELS
-   case NFT_CT_LABELS:
-   nf_connlabels_put(ctx->net);
-   break;
-#endif
-   default:
-   break;
-   }
-
+   __nft_ct_set_destroy(ctx, priv);
nft_ct_netns_put(ctx->net, ctx->afi->family);
 }
 
-- 
2.10.2



[PATCH 09/27] iptables: use match, target and data copy_to_user helpers

2017-02-03 Thread Pablo Neira Ayuso
From: Willem de Bruijn 

Convert iptables to copying entries, matches and targets one by one,
using the xt_match_to_user and xt_target_to_user helper functions.

Signed-off-by: Willem de Bruijn 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/netfilter/ip_tables.c | 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 91656a1d8fbd..384b85713e06 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -826,10 +826,6 @@ copy_entries_to_user(unsigned int total_size,
return PTR_ERR(counters);
 
loc_cpu_entry = private->entries;
-   if (copy_to_user(userptr, loc_cpu_entry, total_size) != 0) {
-   ret = -EFAULT;
-   goto free_counters;
-   }
 
/* FIXME: use iterator macros --RR */
/* ... then go back and fix counters and names */
@@ -839,6 +835,10 @@ copy_entries_to_user(unsigned int total_size,
const struct xt_entry_target *t;
 
e = (struct ipt_entry *)(loc_cpu_entry + off);
+   if (copy_to_user(userptr + off, e, sizeof(*e))) {
+   ret = -EFAULT;
+   goto free_counters;
+   }
if (copy_to_user(userptr + off
 + offsetof(struct ipt_entry, counters),
 &counters[num],
@@ -852,23 +852,14 @@ copy_entries_to_user(unsigned int total_size,
 i += m->u.match_size) {
m = (void *)e + i;
 
-   if (copy_to_user(userptr + off + i
-+ offsetof(struct xt_entry_match,
-   u.user.name),
-m->u.kernel.match->name,
-strlen(m->u.kernel.match->name)+1)
-   != 0) {
+   if (xt_match_to_user(m, userptr + off + i)) {
ret = -EFAULT;
goto free_counters;
}
}
 
t = ipt_get_target_c(e);
-   if (copy_to_user(userptr + off + e->target_offset
-+ offsetof(struct xt_entry_target,
-   u.user.name),
-t->u.kernel.target->name,
-strlen(t->u.kernel.target->name)+1) != 0) {
+   if (xt_target_to_user(t, userptr + off + e->target_offset)) {
ret = -EFAULT;
goto free_counters;
}
-- 
2.1.4



[PATCH 08/27] xtables: add xt_match, xt_target and data copy_to_user functions

2017-02-03 Thread Pablo Neira Ayuso
From: Willem de Bruijn 

xt_entry_target, xt_entry_match and their private data may contain
kernel data.

Introduce helper functions xt_match_to_user, xt_target_to_user and
xt_data_to_user that copy only the expected fields. These replace
existing logic that calls copy_to_user on entire structs, then
overwrites select fields.

Private data is defined in xt_match and xt_target. All matches and
targets that maintain kernel data store this at the tail of their
private structure. Extend xt_match and xt_target with .usersize to
limit how many bytes of data are copied. The remainder is cleared.
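
The copy-then-clear behaviour can be modelled in user space with a short Python sketch. It is an illustration of the semantics only: `data_to_user` is a hypothetical stand-in that returns the bytes user space would see, whereas the kernel helper writes to a user pointer and returns an error code.

```python
def data_to_user(src: bytes, usersize: int, size: int) -> bytes:
    """Return what user space sees: usersize bytes copied, the rest zeroed."""
    # A usersize of 0 means "not set", in which case everything is copied.
    usersize = usersize or size
    # Bytes past usersize may hold kernel data, so they are replaced by zeros.
    return src[:usersize] + bytes(size - usersize)
```

With an 8 byte private area and a usersize of 4, the last four bytes come back as zeros; with usersize left at 0 the whole area is copied unchanged.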

If compatsize is specified, usersize can only safely be used if all
fields up to usersize use platform-independent types. Otherwise, the
compat_to_user callback must be defined.

This patch does not yet enable the support logic.

Signed-off-by: Willem de Bruijn 
Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/netfilter/x_tables.h |  9 +++
 net/netfilter/x_tables.c   | 54 ++
 2 files changed, 63 insertions(+)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 5117e4d2ddfa..be378cf47fcc 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -167,6 +167,7 @@ struct xt_match {
 
const char *table;
unsigned int matchsize;
+   unsigned int usersize;
 #ifdef CONFIG_COMPAT
unsigned int compatsize;
 #endif
@@ -207,6 +208,7 @@ struct xt_target {
 
const char *table;
unsigned int targetsize;
+   unsigned int usersize;
 #ifdef CONFIG_COMPAT
unsigned int compatsize;
 #endif
@@ -287,6 +289,13 @@ int xt_check_match(struct xt_mtchk_param *, unsigned int size, u_int8_t proto,
 int xt_check_target(struct xt_tgchk_param *, unsigned int size, u_int8_t proto,
bool inv_proto);
 
+int xt_match_to_user(const struct xt_entry_match *m,
+struct xt_entry_match __user *u);
+int xt_target_to_user(const struct xt_entry_target *t,
+ struct xt_entry_target __user *u);
+int xt_data_to_user(void __user *dst, const void *src,
+   int usersize, int size);
+
 void *xt_copy_counters_from_user(const void __user *user, unsigned int len,
 struct xt_counters_info *info, bool compat);
 
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 2ff499680cc6..feccf527abdd 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -262,6 +262,60 @@ struct xt_target *xt_request_find_target(u8 af, const char *name, u8 revision)
 }
 EXPORT_SYMBOL_GPL(xt_request_find_target);
 
+
+static int xt_obj_to_user(u16 __user *psize, u16 size,
+ void __user *pname, const char *name,
+ u8 __user *prev, u8 rev)
+{
+   if (put_user(size, psize))
+   return -EFAULT;
+   if (copy_to_user(pname, name, strlen(name) + 1))
+   return -EFAULT;
+   if (put_user(rev, prev))
+   return -EFAULT;
+
+   return 0;
+}
+
+#define XT_OBJ_TO_USER(U, K, TYPE, C_SIZE) \
+   xt_obj_to_user(&U->u.TYPE##_size, C_SIZE ? : K->u.TYPE##_size,  \
+  U->u.user.name, K->u.kernel.TYPE->name,  \
+  &U->u.user.revision, K->u.kernel.TYPE->revision)
+
+int xt_data_to_user(void __user *dst, const void *src,
+   int usersize, int size)
+{
+   usersize = usersize ? : size;
+   if (copy_to_user(dst, src, usersize))
+   return -EFAULT;
+   if (usersize != size && clear_user(dst + usersize, size - usersize))
+   return -EFAULT;
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(xt_data_to_user);
+
+#define XT_DATA_TO_USER(U, K, TYPE, C_SIZE)\
+   xt_data_to_user(U->data, K->data,   \
+   K->u.kernel.TYPE->usersize, \
+   C_SIZE ? : K->u.kernel.TYPE->TYPE##size)
+
+int xt_match_to_user(const struct xt_entry_match *m,
+struct xt_entry_match __user *u)
+{
+   return XT_OBJ_TO_USER(u, m, match, 0) ||
+  XT_DATA_TO_USER(u, m, match, 0);
+}
+EXPORT_SYMBOL_GPL(xt_match_to_user);
+
+int xt_target_to_user(const struct xt_entry_target *t,
+ struct xt_entry_target __user *u)
+{
+   return XT_OBJ_TO_USER(u, t, target, 0) ||
+  XT_DATA_TO_USER(u, t, target, 0);
+}
+EXPORT_SYMBOL_GPL(xt_target_to_user);
+
 static int match_revfn(u8 af, const char *name, u8 revision, int *bestp)
 {
const struct xt_match *m;
-- 
2.1.4



[PATCH 02/27] netfilter: nat: merge udp and udplite helpers

2017-02-03 Thread Pablo Neira Ayuso
From: Florian Westphal 

udplite nat was copied from udp nat; the two are virtually 100% identical.
Not really surprising, given that udplite is just udp with partial csum coverage.

old:
   text    data     bss     dec     hex filename
  11606    1457     210   13273    33d9 nf_nat.ko
    330       0       2     332     14c nf_nat_proto_udp.o
    276       0       2     278     116 nf_nat_proto_udplite.o
new:
   text    data     bss     dec     hex filename
  11598    1457     210   13265    33d1 nf_nat.ko
    640       0       4     644     284 nf_nat_proto_udp.o

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/Makefile   |  1 -
 net/netfilter/nf_nat_proto_udp.c | 78 ++--
 net/netfilter/nf_nat_proto_udplite.c | 73 -
 3 files changed, 66 insertions(+), 86 deletions(-)
 delete mode 100644 net/netfilter/nf_nat_proto_udplite.c

diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index bf5c577113b6..6b3034f12661 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -46,7 +46,6 @@ nf_nat-y  := nf_nat_core.o nf_nat_proto_unknown.o nf_nat_proto_common.o \
 # NAT protocols (nf_nat)
 nf_nat-$(CONFIG_NF_NAT_PROTO_DCCP) += nf_nat_proto_dccp.o
 nf_nat-$(CONFIG_NF_NAT_PROTO_SCTP) += nf_nat_proto_sctp.o
-nf_nat-$(CONFIG_NF_NAT_PROTO_UDPLITE) += nf_nat_proto_udplite.o
 
 # generic transport layer logging
 obj-$(CONFIG_NF_LOG_COMMON) += nf_log_common.o
diff --git a/net/netfilter/nf_nat_proto_udp.c b/net/netfilter/nf_nat_proto_udp.c
index b1e627227b6e..edd4a77dc09a 100644
--- a/net/netfilter/nf_nat_proto_udp.c
+++ b/net/netfilter/nf_nat_proto_udp.c
@@ -30,20 +30,15 @@ udp_unique_tuple(const struct nf_nat_l3proto *l3proto,
&udp_port_rover);
 }
 
-static bool
-udp_manip_pkt(struct sk_buff *skb,
- const struct nf_nat_l3proto *l3proto,
- unsigned int iphdroff, unsigned int hdroff,
- const struct nf_conntrack_tuple *tuple,
- enum nf_nat_manip_type maniptype)
+static void
+__udp_manip_pkt(struct sk_buff *skb,
+   const struct nf_nat_l3proto *l3proto,
+   unsigned int iphdroff, struct udphdr *hdr,
+   const struct nf_conntrack_tuple *tuple,
+   enum nf_nat_manip_type maniptype, bool do_csum)
 {
-   struct udphdr *hdr;
__be16 *portptr, newport;
 
-   if (!skb_make_writable(skb, hdroff + sizeof(*hdr)))
-   return false;
-   hdr = (struct udphdr *)(skb->data + hdroff);
-
if (maniptype == NF_NAT_MANIP_SRC) {
/* Get rid of src port */
newport = tuple->src.u.udp.port;
@@ -53,7 +48,7 @@ udp_manip_pkt(struct sk_buff *skb,
newport = tuple->dst.u.udp.port;
portptr = &hdr->dest;
}
-   if (hdr->check || skb->ip_summed == CHECKSUM_PARTIAL) {
+   if (do_csum) {
l3proto->csum_update(skb, iphdroff, &hdr->check,
 tuple, maniptype);
inet_proto_csum_replace2(&hdr->check, skb, *portptr, newport,
@@ -62,9 +57,68 @@ udp_manip_pkt(struct sk_buff *skb,
hdr->check = CSUM_MANGLED_0;
}
*portptr = newport;
+}
+
+static bool udp_manip_pkt(struct sk_buff *skb,
+ const struct nf_nat_l3proto *l3proto,
+ unsigned int iphdroff, unsigned int hdroff,
+ const struct nf_conntrack_tuple *tuple,
+ enum nf_nat_manip_type maniptype)
+{
+   struct udphdr *hdr;
+   bool do_csum;
+
+   if (!skb_make_writable(skb, hdroff + sizeof(*hdr)))
+   return false;
+
+   hdr = (struct udphdr *)(skb->data + hdroff);
+   do_csum = hdr->check || skb->ip_summed == CHECKSUM_PARTIAL;
+
+   __udp_manip_pkt(skb, l3proto, iphdroff, hdr, tuple, maniptype, do_csum);
+   return true;
+}
+
+#ifdef CONFIG_NF_NAT_PROTO_UDPLITE
+static u16 udplite_port_rover;
+
+static bool udplite_manip_pkt(struct sk_buff *skb,
+ const struct nf_nat_l3proto *l3proto,
+ unsigned int iphdroff, unsigned int hdroff,
+ const struct nf_conntrack_tuple *tuple,
+ enum nf_nat_manip_type maniptype)
+{
+   struct udphdr *hdr;
+
+   if (!skb_make_writable(skb, hdroff + sizeof(*hdr)))
+   return false;
+
+   hdr = (struct udphdr *)(skb->data + hdroff);
+   __udp_manip_pkt(skb, l3proto, iphdroff, hdr, tuple, maniptype, true);
return true;
 }
 
+static void
+udplite_unique_tuple(const struct nf_nat_l3proto *l3proto,
+struct nf_conntrack_tuple *tuple,
+const struct nf_nat_range *range,
+enum nf_nat_manip_type maniptype,
+const struct nf_conn *ct)
+{
+  

[PATCH 03/27] netfilter: nf_tables: add missing descriptions in nft_ct_keys

2017-02-03 Thread Pablo Neira Ayuso
From: Liping Zhang 

The descriptions of NFT_CT_LABELS, NFT_CT_PKTS and NFT_CT_BYTES were
missing; add them.

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 include/uapi/linux/netfilter/nf_tables.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 881d49e94569..5726f90bfc2f 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -860,6 +860,9 @@ enum nft_rt_attributes {
  * @NFT_CT_PROTOCOL: conntrack layer 4 protocol
  * @NFT_CT_PROTO_SRC: conntrack layer 4 protocol source
  * @NFT_CT_PROTO_DST: conntrack layer 4 protocol destination
+ * @NFT_CT_LABELS: conntrack labels
+ * @NFT_CT_PKTS: conntrack packets
+ * @NFT_CT_BYTES: conntrack bytes
  */
 enum nft_ct_keys {
NFT_CT_STATE,
-- 
2.1.4



[PATCH 04/27] netfilter: nft_ct: add average bytes per packet support

2017-02-03 Thread Pablo Neira Ayuso
From: Liping Zhang 

Similar to xt_connbytes, the user can match on the average number of
bytes per packet that a connection has transferred so far.
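
As a rough illustration of the quantity being matched, here is a minimal userspace sketch. The name demo_avg_bytes_per_pkt() is hypothetical, and plain 64-bit division stands in for the kernel's div64_u64(); a connection with no packets yet is reported as 0 rather than dividing by zero.

```c
#include <stdint.h>

/*
 * Average bytes per packet for one connection. Integer division
 * truncates, and a zero packet count yields 0 instead of a
 * division by zero.
 */
static uint64_t demo_avg_bytes_per_pkt(uint64_t bytes, uint64_t pkts)
{
	return pkts ? bytes / pkts : 0;
}
```

For example, 3000 bytes over 2 packets gives 1500, and 7 bytes over 2 packets gives 3 because the result is truncated.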

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 include/uapi/linux/netfilter/nf_tables.h |  2 ++
 net/netfilter/nft_ct.c   | 22 +-
 2 files changed, 23 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/netfilter/nf_tables.h b/include/uapi/linux/netfilter/nf_tables.h
index 5726f90bfc2f..b00a05d1ee56 100644
--- a/include/uapi/linux/netfilter/nf_tables.h
+++ b/include/uapi/linux/netfilter/nf_tables.h
@@ -863,6 +863,7 @@ enum nft_rt_attributes {
  * @NFT_CT_LABELS: conntrack labels
  * @NFT_CT_PKTS: conntrack packets
  * @NFT_CT_BYTES: conntrack bytes
+ * @NFT_CT_AVGPKT: conntrack average bytes per packet
  */
 enum nft_ct_keys {
NFT_CT_STATE,
@@ -881,6 +882,7 @@ enum nft_ct_keys {
NFT_CT_LABELS,
NFT_CT_PKTS,
NFT_CT_BYTES,
+   NFT_CT_AVGPKT,
 };
 
 /**
diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index e6baeaebe653..d774d7823688 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -129,6 +129,22 @@ static void nft_ct_get_eval(const struct nft_expr *expr,
memcpy(dest, &count, sizeof(count));
return;
}
+   case NFT_CT_AVGPKT: {
+   const struct nf_conn_acct *acct = nf_conn_acct_find(ct);
+   u64 avgcnt = 0, bcnt = 0, pcnt = 0;
+
+   if (acct) {
+   pcnt = nft_ct_get_eval_counter(acct->counter,
+  NFT_CT_PKTS, priv->dir);
+   bcnt = nft_ct_get_eval_counter(acct->counter,
+  NFT_CT_BYTES, priv->dir);
+   if (pcnt != 0)
+   avgcnt = div64_u64(bcnt, pcnt);
+   }
+
+   memcpy(dest, &avgcnt, sizeof(avgcnt));
+   return;
+   }
case NFT_CT_L3PROTOCOL:
*dest = nf_ct_l3num(ct);
return;
@@ -316,6 +332,7 @@ static int nft_ct_get_init(const struct nft_ctx *ctx,
break;
case NFT_CT_BYTES:
case NFT_CT_PKTS:
+   case NFT_CT_AVGPKT:
/* no direction? return sum of original + reply */
if (tb[NFTA_CT_DIRECTION] == NULL)
priv->dir = IP_CT_DIR_MAX;
@@ -346,7 +363,9 @@ static int nft_ct_get_init(const struct nft_ctx *ctx,
if (err < 0)
return err;
 
-   if (priv->key == NFT_CT_BYTES || priv->key == NFT_CT_PKTS)
+   if (priv->key == NFT_CT_BYTES ||
+   priv->key == NFT_CT_PKTS  ||
+   priv->key == NFT_CT_AVGPKT)
nf_ct_set_acct(ctx->net, true);
 
return 0;
@@ -445,6 +464,7 @@ static int nft_ct_get_dump(struct sk_buff *skb, const struct nft_expr *expr)
break;
case NFT_CT_BYTES:
case NFT_CT_PKTS:
+   case NFT_CT_AVGPKT:
if (priv->dir < IP_CT_DIR_MAX &&
nla_put_u8(skb, NFTA_CT_DIRECTION, priv->dir))
goto nla_put_failure;
-- 
2.1.4



[PATCH 07/27] netfilter: xt_connlimit: use rb_entry()

2017-02-03 Thread Pablo Neira Ayuso
From: Geliang Tang 

To make the code clearer, use rb_entry() instead of container_of() to
deal with rbtree.
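
To show why the two spellings are interchangeable, here is a small userspace sketch in which rb_entry() is nothing more than container_of() under another name. All demo_* names are hypothetical, and this offsetof-based container_of omits the type checking the kernel macro performs.

```c
#include <stddef.h>

/* Minimal stand-in for an rbtree node embedded in a larger object. */
struct demo_rb_node {
	struct demo_rb_node *left, *right;
};

/* Step back from a member pointer to the enclosing object. */
#define demo_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* rb_entry() is only a more descriptive name for the same operation. */
#define demo_rb_entry(ptr, type, member) \
	demo_container_of(ptr, type, member)

struct demo_conn {
	int count;
	struct demo_rb_node node;
};

static struct demo_conn *demo_conn_from_node(struct demo_rb_node *n)
{
	return demo_rb_entry(n, struct demo_conn, node);
}
```

Given a struct demo_conn c, demo_conn_from_node(&c.node) returns &c, so the change in the patch is purely about readability.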

Signed-off-by: Geliang Tang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/xt_connlimit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/xt_connlimit.c b/net/netfilter/xt_connlimit.c
index 2aff2b7c4689..660b61dbd776 100644
--- a/net/netfilter/xt_connlimit.c
+++ b/net/netfilter/xt_connlimit.c
@@ -218,7 +218,7 @@ count_tree(struct net *net, struct rb_root *root,
int diff;
bool addit;
 
-   rbconn = container_of(*rbnode, struct xt_connlimit_rb, node);
+   rbconn = rb_entry(*rbnode, struct xt_connlimit_rb, node);
 
parent = *rbnode;
diff = same_source_net(addr, mask, &rbconn->addr, family);
@@ -398,7 +398,7 @@ static void destroy_tree(struct rb_root *r)
struct rb_node *node;
 
while ((node = rb_first(r)) != NULL) {
-   rbconn = container_of(node, struct xt_connlimit_rb, node);
+   rbconn = rb_entry(node, struct xt_connlimit_rb, node);
 
rb_erase(node, r);
 
-- 
2.1.4



[PATCH 10/27] ip6tables: use match, target and data copy_to_user helpers

2017-02-03 Thread Pablo Neira Ayuso
From: Willem de Bruijn 

Convert ip6tables to copying entries, matches and targets one by one,
using the xt_match_to_user and xt_target_to_user helper functions.

Signed-off-by: Willem de Bruijn 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv6/netfilter/ip6_tables.c | 21 ++---
 1 file changed, 6 insertions(+), 15 deletions(-)

diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 25a022d41a70..1e15c54fd5e2 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -855,10 +855,6 @@ copy_entries_to_user(unsigned int total_size,
return PTR_ERR(counters);
 
loc_cpu_entry = private->entries;
-   if (copy_to_user(userptr, loc_cpu_entry, total_size) != 0) {
-   ret = -EFAULT;
-   goto free_counters;
-   }
 
/* FIXME: use iterator macros --RR */
/* ... then go back and fix counters and names */
@@ -868,6 +864,10 @@ copy_entries_to_user(unsigned int total_size,
const struct xt_entry_target *t;
 
e = (struct ip6t_entry *)(loc_cpu_entry + off);
+   if (copy_to_user(userptr + off, e, sizeof(*e))) {
+   ret = -EFAULT;
+   goto free_counters;
+   }
if (copy_to_user(userptr + off
 + offsetof(struct ip6t_entry, counters),
 &counters[num],
@@ -881,23 +881,14 @@ copy_entries_to_user(unsigned int total_size,
 i += m->u.match_size) {
m = (void *)e + i;
 
-   if (copy_to_user(userptr + off + i
-+ offsetof(struct xt_entry_match,
-   u.user.name),
-m->u.kernel.match->name,
-strlen(m->u.kernel.match->name)+1)
-   != 0) {
+   if (xt_match_to_user(m, userptr + off + i)) {
ret = -EFAULT;
goto free_counters;
}
}
 
t = ip6t_get_target_c(e);
-   if (copy_to_user(userptr + off + e->target_offset
-+ offsetof(struct xt_entry_target,
-   u.user.name),
-t->u.kernel.target->name,
-strlen(t->u.kernel.target->name)+1) != 0) {
+   if (xt_target_to_user(t, userptr + off + e->target_offset)) {
ret = -EFAULT;
goto free_counters;
}
-- 
2.1.4



[PATCH 11/27] arptables: use match, target and data copy_to_user helpers

2017-02-03 Thread Pablo Neira Ayuso
From: Willem de Bruijn 

Convert arptables to copying entries, matches and targets one by one,
using the xt_match_to_user and xt_target_to_user helper functions.

Signed-off-by: Willem de Bruijn 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/netfilter/arp_tables.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index a467e1236c43..6241a81fd7f5 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -677,11 +677,6 @@ static int copy_entries_to_user(unsigned int total_size,
return PTR_ERR(counters);
 
loc_cpu_entry = private->entries;
-   /* ... then copy entire thing ... */
-   if (copy_to_user(userptr, loc_cpu_entry, total_size) != 0) {
-   ret = -EFAULT;
-   goto free_counters;
-   }
 
/* FIXME: use iterator macros --RR */
/* ... then go back and fix counters and names */
@@ -689,6 +684,10 @@ static int copy_entries_to_user(unsigned int total_size,
const struct xt_entry_target *t;
 
e = (struct arpt_entry *)(loc_cpu_entry + off);
+   if (copy_to_user(userptr + off, e, sizeof(*e))) {
+   ret = -EFAULT;
+   goto free_counters;
+   }
if (copy_to_user(userptr + off
 + offsetof(struct arpt_entry, counters),
 &counters[num],
@@ -698,11 +697,7 @@ static int copy_entries_to_user(unsigned int total_size,
}
 
t = arpt_get_target_c(e);
-   if (copy_to_user(userptr + off + e->target_offset
-+ offsetof(struct xt_entry_target,
-   u.user.name),
-t->u.kernel.target->name,
-strlen(t->u.kernel.target->name)+1) != 0) {
+   if (xt_target_to_user(t, userptr + off + e->target_offset)) {
ret = -EFAULT;
goto free_counters;
}
-- 
2.1.4



[PATCH 12/27] ebtables: use match, target and data copy_to_user helpers

2017-02-03 Thread Pablo Neira Ayuso
From: Willem de Bruijn 

Convert ebtables to copying entries, matches and targets one by one.

The solution is analogous to that of generic xt_(match|target)_to_user
helpers, but is applied to different structs.

Replace the existing ebt_make_XXXname helpers, which overwrite
fields of an already copy_to_user'd struct, with ebt_XXX_to_user
helpers that copy all relevant fields of the struct from scratch.
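
One detail of the new helper in this patch is that the extension name is copied into a zero-filled, fixed-size buffer before being handed to userspace. A minimal userspace sketch of that padding step follows; DEMO_NAMELEN and demo_pad_name() are hypothetical, and the 32-byte size is assumed from the code comment in the patch.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_NAMELEN 32	/* assumed fixed name field size */

/*
 * Copy name into a fixed-size field, truncating if necessary, and
 * fill every remaining byte with zeroes so that no stale data
 * follows the terminator.
 */
static void demo_pad_name(char out[DEMO_NAMELEN], const char *name)
{
	size_t n = strlen(name);

	if (n > DEMO_NAMELEN - 1)
		n = DEMO_NAMELEN - 1;
	memset(out, 0, DEMO_NAMELEN);
	memcpy(out, name, n);
}
```

Zeroing the whole field first means the bytes after the name are always defined, whatever the buffer held before.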

Signed-off-by: Willem de Bruijn 
Signed-off-by: Pablo Neira Ayuso 
---
 net/bridge/netfilter/ebtables.c | 78 +
 1 file changed, 47 insertions(+), 31 deletions(-)

diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c
index 537e3d506fc2..79b69917f521 100644
--- a/net/bridge/netfilter/ebtables.c
+++ b/net/bridge/netfilter/ebtables.c
@@ -1346,56 +1346,72 @@ static int update_counters(struct net *net, const void __user *user,
hlp.num_counters, user, len);
 }
 
-static inline int ebt_make_matchname(const struct ebt_entry_match *m,
-const char *base, char __user *ubase)
+static inline int ebt_obj_to_user(char __user *um, const char *_name,
+ const char *data, int entrysize,
+ int usersize, int datasize)
 {
-   char __user *hlp = ubase + ((char *)m - base);
-   char name[EBT_FUNCTION_MAXNAMELEN] = {};
+   char name[EBT_FUNCTION_MAXNAMELEN] = {0};
 
/* ebtables expects 32 bytes long names but xt_match names are 29 bytes
 * long. Copy 29 bytes and fill remaining bytes with zeroes.
 */
-   strlcpy(name, m->u.match->name, sizeof(name));
-   if (copy_to_user(hlp, name, EBT_FUNCTION_MAXNAMELEN))
+   strlcpy(name, _name, sizeof(name));
+   if (copy_to_user(um, name, EBT_FUNCTION_MAXNAMELEN) ||
+   put_user(datasize, (int __user *)(um + EBT_FUNCTION_MAXNAMELEN)) ||
+   xt_data_to_user(um + entrysize, data, usersize, datasize))
return -EFAULT;
+
return 0;
 }
 
-static inline int ebt_make_watchername(const struct ebt_entry_watcher *w,
-  const char *base, char __user *ubase)
+static inline int ebt_match_to_user(const struct ebt_entry_match *m,
+   const char *base, char __user *ubase)
 {
-   char __user *hlp = ubase + ((char *)w - base);
-   char name[EBT_FUNCTION_MAXNAMELEN] = {};
+   return ebt_obj_to_user(ubase + ((char *)m - base),
+  m->u.match->name, m->data, sizeof(*m),
+  m->u.match->usersize, m->match_size);
+}
 
-   strlcpy(name, w->u.watcher->name, sizeof(name));
-   if (copy_to_user(hlp, name, EBT_FUNCTION_MAXNAMELEN))
-   return -EFAULT;
-   return 0;
+static inline int ebt_watcher_to_user(const struct ebt_entry_watcher *w,
+ const char *base, char __user *ubase)
+{
+   return ebt_obj_to_user(ubase + ((char *)w - base),
+  w->u.watcher->name, w->data, sizeof(*w),
+  w->u.watcher->usersize, w->watcher_size);
 }
 
-static inline int ebt_make_names(struct ebt_entry *e, const char *base,
-char __user *ubase)
+static inline int ebt_entry_to_user(struct ebt_entry *e, const char *base,
+   char __user *ubase)
 {
int ret;
char __user *hlp;
const struct ebt_entry_target *t;
-   char name[EBT_FUNCTION_MAXNAMELEN] = {};
 
-   if (e->bitmask == 0)
+   if (e->bitmask == 0) {
+   /* special case !EBT_ENTRY_OR_ENTRIES */
+   if (copy_to_user(ubase + ((char *)e - base), e,
+sizeof(struct ebt_entries)))
+   return -EFAULT;
return 0;
+   }
+
+   if (copy_to_user(ubase + ((char *)e - base), e, sizeof(*e)))
+   return -EFAULT;
 
hlp = ubase + (((char *)e + e->target_offset) - base);
t = (struct ebt_entry_target *)(((char *)e) + e->target_offset);
 
-   ret = EBT_MATCH_ITERATE(e, ebt_make_matchname, base, ubase);
+   ret = EBT_MATCH_ITERATE(e, ebt_match_to_user, base, ubase);
if (ret != 0)
return ret;
-   ret = EBT_WATCHER_ITERATE(e, ebt_make_watchername, base, ubase);
+   ret = EBT_WATCHER_ITERATE(e, ebt_watcher_to_user, base, ubase);
if (ret != 0)
return ret;
-   strlcpy(name, t->u.target->name, sizeof(name));
-   if (copy_to_user(hlp, name, EBT_FUNCTION_MAXNAMELEN))
-   return -EFAULT;
+   ret = ebt_obj_to_user(hlp, t->u.target->name, t->data, sizeof(*t),
+ t->u.target->usersize, t->target_size);
+   if (ret != 0)
+   return ret;
+
return 0;
 }
 
@@ -1475,13 +1491,9 @@ 

[PATCH 13/27] xtables: use match, target and data copy_to_user helpers in compat

2017-02-03 Thread Pablo Neira Ayuso
From: Willem de Bruijn 

Convert compat to copying entries, matches and targets one by one,
using the xt_match_to_user and xt_target_to_user helper functions.

Signed-off-by: Willem de Bruijn 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/x_tables.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index feccf527abdd..016db6be94b9 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -619,17 +619,14 @@ int xt_compat_match_to_user(const struct xt_entry_match *m,
int off = xt_compat_match_offset(match);
u_int16_t msize = m->u.user.match_size - off;
 
-   if (copy_to_user(cm, m, sizeof(*cm)) ||
-   put_user(msize, &cm->u.user.match_size) ||
-   copy_to_user(cm->u.user.name, m->u.kernel.match->name,
-strlen(m->u.kernel.match->name) + 1))
+   if (XT_OBJ_TO_USER(cm, m, match, msize))
return -EFAULT;
 
if (match->compat_to_user) {
if (match->compat_to_user((void __user *)cm->data, m->data))
return -EFAULT;
} else {
-   if (copy_to_user(cm->data, m->data, msize - sizeof(*cm)))
+   if (XT_DATA_TO_USER(cm, m, match, msize - sizeof(*cm)))
return -EFAULT;
}
 
@@ -977,17 +974,14 @@ int xt_compat_target_to_user(const struct xt_entry_target *t,
int off = xt_compat_target_offset(target);
u_int16_t tsize = t->u.user.target_size - off;
 
-   if (copy_to_user(ct, t, sizeof(*ct)) ||
-   put_user(tsize, &ct->u.user.target_size) ||
-   copy_to_user(ct->u.user.name, t->u.kernel.target->name,
-strlen(t->u.kernel.target->name) + 1))
+   if (XT_OBJ_TO_USER(ct, t, target, tsize))
return -EFAULT;
 
if (target->compat_to_user) {
if (target->compat_to_user((void __user *)ct->data, t->data))
return -EFAULT;
} else {
-   if (copy_to_user(ct->data, t->data, tsize - sizeof(*ct)))
+   if (XT_DATA_TO_USER(ct, t, target, tsize - sizeof(*ct)))
return -EFAULT;
}
 
-- 
2.1.4



[PATCH 16/27] netfilter: nft_meta: deal with PACKET_LOOPBACK in netdev family

2017-02-03 Thread Pablo Neira Ayuso
From: Liping Zhang 

After adding the following nft rule and then pinging 224.0.0.1:
  # nft add rule netdev t c pkttype host counter

the following warning is printed over and over:
  WARNING: CPU: 0 PID: 10182 at net/netfilter/nft_meta.c:163 \
   nft_meta_get_eval+0x3fe/0x460 [nft_meta]
  [...]
  Call Trace:
  
  dump_stack+0x85/0xc2
  __warn+0xcb/0xf0
  warn_slowpath_null+0x1d/0x20
  nft_meta_get_eval+0x3fe/0x460 [nft_meta]
  nft_do_chain+0xff/0x5e0 [nf_tables]

So we should deal with PACKET_LOOPBACK in the netdev family too. For ipv4,
convert it to PACKET_BROADCAST/MULTICAST according to the destination
address's type; for ipv6, convert it to PACKET_MULTICAST directly.
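
The ipv4 decision described above can be sketched in userspace as follows. The demo_* names are hypothetical, and for simplicity the address is taken in host byte order, whereas the kernel helper works on the network-order value from the IP header.

```c
#include <stdint.h>

enum demo_pkt_type {
	DEMO_PKT_BROADCAST,
	DEMO_PKT_MULTICAST,
};

/* 224.0.0.0/4 is the IPv4 multicast range (address in host byte order). */
static int demo_ipv4_is_multicast(uint32_t daddr)
{
	return (daddr & 0xf0000000u) == 0xe0000000u;
}

/*
 * A looped-back packet is reported as multicast if its destination
 * is a multicast address, and as broadcast otherwise.
 */
static enum demo_pkt_type demo_classify_ipv4(uint32_t daddr)
{
	return demo_ipv4_is_multicast(daddr) ? DEMO_PKT_MULTICAST
					     : DEMO_PKT_BROADCAST;
}
```

With this sketch, 224.0.0.1 (0xe0000001) is classified as multicast, while 255.255.255.255 falls through to broadcast.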

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nft_meta.c | 28 +++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 9a22b24346b8..e1f5ca9b423b 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -156,8 +156,34 @@ void nft_meta_get_eval(const struct nft_expr *expr,
case NFPROTO_IPV6:
*dest = PACKET_MULTICAST;
break;
+   case NFPROTO_NETDEV:
+   switch (skb->protocol) {
+   case htons(ETH_P_IP): {
+   int noff = skb_network_offset(skb);
+   struct iphdr *iph, _iph;
+
+   iph = skb_header_pointer(skb, noff,
+sizeof(_iph), &_iph);
+   if (!iph)
+   goto err;
+
+   if (ipv4_is_multicast(iph->daddr))
+   *dest = PACKET_MULTICAST;
+   else
+   *dest = PACKET_BROADCAST;
+
+   break;
+   }
+   case htons(ETH_P_IPV6):
+   *dest = PACKET_MULTICAST;
+   break;
+   default:
+   WARN_ON_ONCE(1);
+   goto err;
+   }
+   break;
default:
-   WARN_ON(1);
+   WARN_ON_ONCE(1);
goto err;
}
break;
-- 
2.1.4



[PATCH 21/27] netfilter: reduce direct skb->nfct usage

2017-02-03 Thread Pablo Neira Ayuso
From: Florian Westphal 

The next patch makes direct skb->nfct access illegal; reduce noise
in that patch by using the accessors we already have.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/ip_vs.h   |  9 ++---
 net/netfilter/nf_conntrack_core.c | 15 +--
 2 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index cd6018a9ee24..2a344ebd7ebe 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1554,10 +1554,13 @@ static inline void ip_vs_notrack(struct sk_buff *skb)
struct nf_conn *ct = nf_ct_get(skb, &ctinfo);
 
if (!ct || !nf_ct_is_untracked(ct)) {
-   nf_conntrack_put(skb->nfct);
-   skb->nfct = &nf_ct_untracked_get()->ct_general;
+   struct nf_conn *untracked;
+
+   nf_conntrack_put(&ct->ct_general);
+   untracked = nf_ct_untracked_get();
+   nf_conntrack_get(&untracked->ct_general);
+   skb->nfct = &untracked->ct_general;
skb->nfctinfo = IP_CT_NEW;
-   nf_conntrack_get(skb->nfct);
}
 #endif
 }
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 86186a2e2715..adb7af3a4c4c 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -686,8 +686,11 @@ static int nf_ct_resolve_clash(struct net *net, struct sk_buff *skb,
!nfct_nat(ct) &&
!nf_ct_is_dying(ct) &&
atomic_inc_not_zero(&ct->ct_general.use)) {
-   nf_ct_acct_merge(ct, ctinfo, (struct nf_conn *)skb->nfct);
-   nf_conntrack_put(skb->nfct);
+   enum ip_conntrack_info oldinfo;
+   struct nf_conn *loser_ct = nf_ct_get(skb, &oldinfo);
+
+   nf_ct_acct_merge(ct, ctinfo, loser_ct);
+   nf_conntrack_put(&loser_ct->ct_general);
/* Assign conntrack already in hashes to this skbuff. Don't
 * modify skb->nfctinfo to ensure consistent stateful filtering.
 */
@@ -1288,7 +1291,7 @@ unsigned int
 nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
struct sk_buff *skb)
 {
-   struct nf_conn *ct, *tmpl = NULL;
+   struct nf_conn *ct, *tmpl;
enum ip_conntrack_info ctinfo;
struct nf_conntrack_l3proto *l3proto;
struct nf_conntrack_l4proto *l4proto;
@@ -1298,9 +1301,9 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
int set_reply = 0;
int ret;
 
-   if (skb->nfct) {
+   tmpl = nf_ct_get(skb, &ctinfo);
+   if (tmpl) {
/* Previously seen (loopback or untracked)?  Ignore. */
-   tmpl = (struct nf_conn *)skb->nfct;
if (!nf_ct_is_template(tmpl)) {
NF_CT_STAT_INC_ATOMIC(net, ignore);
return NF_ACCEPT;
@@ -1364,7 +1367,7 @@ nf_conntrack_in(struct net *net, u_int8_t pf, unsigned int hooknum,
/* Invalid: inverse of the return code tells
 * the netfilter core what to do */
pr_debug("nf_conntrack_in: Can't track with proto module\n");
-   nf_conntrack_put(skb->nfct);
+   nf_conntrack_put(&ct->ct_general);
skb->nfct = NULL;
NF_CT_STAT_INC_ATOMIC(net, invalid);
if (ret == -NF_DROP)
-- 
2.1.4



[PATCH 14/27] xtables: extend matches and targets with .usersize

2017-02-03 Thread Pablo Neira Ayuso
From: Willem de Bruijn 

In matches and targets that define a kernel-only tail to their
xt_match and xt_target data structs, add a field .usersize that
specifies up to where data is to be shared with userspace.

Performed a search for comment "Used internally by the kernel" to find
relevant matches and targets. Manually inspected the structs to derive
a valid offsetof.
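
As an illustration of how such a boundary is derived, here is a sketch with a made-up struct. The struct demo_limit_info layout and demo_limit_usersize() are hypothetical and not taken from any real extension; the point is only that .usersize is the offset of the first kernel-only field.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extension data with a kernel-only tail. */
struct demo_limit_info {
	uint32_t avg;		/* set by userspace */
	uint32_t burst;		/* set by userspace */

	/* Used internally by the kernel */
	unsigned long prev;
	uint32_t credit;
};

/*
 * Everything before the first kernel-only field is shared with
 * userspace, so the boundary is that field's offset.
 */
static size_t demo_limit_usersize(void)
{
	return offsetof(struct demo_limit_info, prev);
}
```

Here the shared part is the two leading 32-bit fields, so the boundary sits 8 bytes into the struct and everything from prev onward would be cleared on copy-out.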

Signed-off-by: Willem de Bruijn 
Signed-off-by: Pablo Neira Ayuso 
---
 net/bridge/netfilter/ebt_limit.c   | 1 +
 net/ipv4/netfilter/ipt_CLUSTERIP.c | 1 +
 net/ipv6/netfilter/ip6t_NPT.c  | 2 ++
 net/netfilter/xt_CT.c  | 3 +++
 net/netfilter/xt_RATEEST.c | 1 +
 net/netfilter/xt_TEE.c | 2 ++
 net/netfilter/xt_bpf.c | 2 ++
 net/netfilter/xt_cgroup.c  | 1 +
 net/netfilter/xt_connlimit.c   | 1 +
 net/netfilter/xt_hashlimit.c   | 4 
 net/netfilter/xt_limit.c   | 2 ++
 net/netfilter/xt_quota.c   | 1 +
 net/netfilter/xt_rateest.c | 1 +
 net/netfilter/xt_string.c  | 1 +
 14 files changed, 23 insertions(+)

diff --git a/net/bridge/netfilter/ebt_limit.c b/net/bridge/netfilter/ebt_limit.c
index 517e78befcb2..61a9f1be1263 100644
--- a/net/bridge/netfilter/ebt_limit.c
+++ b/net/bridge/netfilter/ebt_limit.c
@@ -105,6 +105,7 @@ static struct xt_match ebt_limit_mt_reg __read_mostly = {
.match  = ebt_limit_mt,
.checkentry = ebt_limit_mt_check,
.matchsize  = sizeof(struct ebt_limit_info),
+   .usersize   = offsetof(struct ebt_limit_info, prev),
 #ifdef CONFIG_COMPAT
.compatsize = sizeof(struct ebt_compat_limit_info),
 #endif
diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
index 21db00d0362b..8a3d20ebb815 100644
--- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
+++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
@@ -468,6 +468,7 @@ static struct xt_target clusterip_tg_reg __read_mostly = {
.checkentry = clusterip_tg_check,
.destroy= clusterip_tg_destroy,
.targetsize = sizeof(struct ipt_clusterip_tgt_info),
+   .usersize   = offsetof(struct ipt_clusterip_tgt_info, config),
 #ifdef CONFIG_COMPAT
.compatsize = sizeof(struct compat_ipt_clusterip_tgt_info),
 #endif /* CONFIG_COMPAT */
diff --git a/net/ipv6/netfilter/ip6t_NPT.c b/net/ipv6/netfilter/ip6t_NPT.c
index 590f767db5d4..a379d2f79b19 100644
--- a/net/ipv6/netfilter/ip6t_NPT.c
+++ b/net/ipv6/netfilter/ip6t_NPT.c
@@ -112,6 +112,7 @@ static struct xt_target ip6t_npt_target_reg[] __read_mostly = {
.table  = "mangle",
.target = ip6t_snpt_tg,
.targetsize = sizeof(struct ip6t_npt_tginfo),
+   .usersize   = offsetof(struct ip6t_npt_tginfo, adjustment),
.checkentry = ip6t_npt_checkentry,
.family = NFPROTO_IPV6,
.hooks  = (1 << NF_INET_LOCAL_IN) |
@@ -123,6 +124,7 @@ static struct xt_target ip6t_npt_target_reg[] __read_mostly = {
.table  = "mangle",
.target = ip6t_dnpt_tg,
.targetsize = sizeof(struct ip6t_npt_tginfo),
+   .usersize   = offsetof(struct ip6t_npt_tginfo, adjustment),
.checkentry = ip6t_npt_checkentry,
.family = NFPROTO_IPV6,
.hooks  = (1 << NF_INET_PRE_ROUTING) |
diff --git a/net/netfilter/xt_CT.c b/net/netfilter/xt_CT.c
index 95c750358747..26b0bccfa0c5 100644
--- a/net/netfilter/xt_CT.c
+++ b/net/netfilter/xt_CT.c
@@ -373,6 +373,7 @@ static struct xt_target xt_ct_tg_reg[] __read_mostly = {
.name   = "CT",
.family = NFPROTO_UNSPEC,
.targetsize = sizeof(struct xt_ct_target_info),
+   .usersize   = offsetof(struct xt_ct_target_info, ct),
.checkentry = xt_ct_tg_check_v0,
.destroy= xt_ct_tg_destroy_v0,
.target = xt_ct_target_v0,
@@ -384,6 +385,7 @@ static struct xt_target xt_ct_tg_reg[] __read_mostly = {
.family = NFPROTO_UNSPEC,
.revision   = 1,
.targetsize = sizeof(struct xt_ct_target_info_v1),
+   .usersize   = offsetof(struct xt_ct_target_info, ct),
.checkentry = xt_ct_tg_check_v1,
.destroy= xt_ct_tg_destroy_v1,
.target = xt_ct_target_v1,
@@ -395,6 +397,7 @@ static struct xt_target xt_ct_tg_reg[] __read_mostly = {
.family = NFPROTO_UNSPEC,
.revision   = 2,
.targetsize = sizeof(struct xt_ct_target_info_v1),
+   .usersize   = offsetof(struct xt_ct_target_info, ct),
.checkentry = xt_ct_tg_check_v2,

[PATCH 18/27] netfilter: nf_tables: Eliminate duplicated code in nf_tables_table_enable()

2017-02-03 Thread Pablo Neira Ayuso
From: Feng 

If something fails in nf_tables_table_enable(), it unregisters the
chains. The rollback code is almost the same as
nf_tables_table_disable(), except for one counter check. Create a
wrapper function to eliminate the duplicated code.
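
The shape of the shared helper can be sketched in userspace like this. struct demo_chain and demo_table_disable() are hypothetical stand-ins, and an array replaces the kernel list; the counter check is kept as in the patch, where cnt == 0 means "all base chains" and a non-zero cnt stops after that many.

```c
#include <stddef.h>

struct demo_chain {
	int is_base;	/* stands in for the NFT_BASE_CHAIN flag */
	int registered;	/* 1 while the chain's hooks are registered */
};

/*
 * Unregister base chains in order. With cnt == 0 every base chain is
 * handled (full disable); with cnt > 0 only the first cnt base chains
 * are handled (rollback of a partially completed enable).
 * Returns the number of chains unregistered.
 */
static unsigned int demo_table_disable(struct demo_chain *chains, size_t n,
				       unsigned int cnt)
{
	unsigned int i = 0, done = 0;
	size_t k;

	for (k = 0; k < n; k++) {
		if (!chains[k].is_base)
			continue;
		if (cnt && i++ == cnt)
			break;
		chains[k].registered = 0;
		done++;
	}
	return done;
}
```

The enable path would pass the number of chains it managed to register before failing, and the disable path would pass 0.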

Signed-off-by: Feng 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_tables_api.c | 48 ++-
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 6e07c214c208..e6741ac4ccc1 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -576,6 +576,28 @@ static int nf_tables_gettable(struct net *net, struct sock *nlsk,
return err;
 }
 
+static void _nf_tables_table_disable(struct net *net,
+const struct nft_af_info *afi,
+struct nft_table *table,
+u32 cnt)
+{
+   struct nft_chain *chain;
+   u32 i = 0;
+
+   list_for_each_entry(chain, &table->chains, list) {
+   if (!nft_is_active_next(net, chain))
+   continue;
+   if (!(chain->flags & NFT_BASE_CHAIN))
+   continue;
+
+   if (cnt && i++ == cnt)
+   break;
+
+   nf_unregister_net_hooks(net, nft_base_chain(chain)->ops,
+   afi->nops);
+   }
+}
+
 static int nf_tables_table_enable(struct net *net,
  const struct nft_af_info *afi,
  struct nft_table *table)
@@ -598,18 +620,8 @@ static int nf_tables_table_enable(struct net *net,
}
return 0;
 err:
-   list_for_each_entry(chain, &table->chains, list) {
-   if (!nft_is_active_next(net, chain))
-   continue;
-   if (!(chain->flags & NFT_BASE_CHAIN))
-   continue;
-
-   if (i-- <= 0)
-   break;
-
-   nf_unregister_net_hooks(net, nft_base_chain(chain)->ops,
-   afi->nops);
-   }
+   if (i)
+   _nf_tables_table_disable(net, afi, table, i);
return err;
 }
 
@@ -617,17 +629,7 @@ static void nf_tables_table_disable(struct net *net,
const struct nft_af_info *afi,
struct nft_table *table)
 {
-   struct nft_chain *chain;
-
-   list_for_each_entry(chain, &table->chains, list) {
-   if (!nft_is_active_next(net, chain))
-   continue;
-   if (!(chain->flags & NFT_BASE_CHAIN))
-   continue;
-
-   nf_unregister_net_hooks(net, nft_base_chain(chain)->ops,
-   afi->nops);
-   }
+   _nf_tables_table_disable(net, afi, table, 0);
 }
 
 static int nf_tables_updtable(struct nft_ctx *ctx)
-- 
2.1.4



[PATCH 25/27] netfilter: merge ctinfo into nfct pointer storage area

2017-02-03 Thread Pablo Neira Ayuso
From: Florian Westphal 

After this change conntrack operations (lookup, creation, matching from
ruleset) only access one instead of two sk_buff cache lines.

This works for normal conntracks because those are allocated from a slab
that guarantees hw cacheline or 8-byte alignment (whichever is larger),
so the 3 bits needed for ctinfo won't overlap with nf_conn addresses.

Template allocation now does manual address alignment (see previous change)
on arches that don't have sufficient kmalloc min alignment.

Some spots intentionally use skb->_nfct instead of the skb_nfct() helper.
This avoids having to undo the skb_nfct() use when we remove the untracked
conntrack object in the future.
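
The packing described above can be sketched in userspace as follows. This
is only an illustration under stated assumptions: struct demo_conn and the
demo_* helpers are hypothetical, and the only detail taken from the patch
is the mask ~(7UL), which mirrors SKB_NFCT_PTRMASK. The struct is forced
to 8-byte alignment so that the low 3 bits of any object address are zero
and can carry the ctinfo-like value.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PTRMASK  (~(uintptr_t)7)  /* same idea as SKB_NFCT_PTRMASK */
#define DEMO_INFOMASK ((uintptr_t)7)   /* the 3 low bits hold the info value */

/* Hypothetical object type; 8-byte alignment keeps the low 3 bits free. */
struct demo_conn {
	_Alignas(8) int id;
};

/* Store an aligned pointer and a 3-bit value in a single word. */
static uintptr_t demo_pack(struct demo_conn *ct, unsigned int info)
{
	uintptr_t p = (uintptr_t)ct;

	assert((p & DEMO_INFOMASK) == 0);  /* pointer must be 8-byte aligned */
	assert(info <= 7);                 /* value must fit in 3 bits */
	return p | info;
}

/* Recover the pointer by masking off the low bits. */
static struct demo_conn *demo_ptr(uintptr_t word)
{
	return (struct demo_conn *)(word & DEMO_PTRMASK);
}

/* Recover the 3-bit value from the low bits. */
static unsigned int demo_info(uintptr_t word)
{
	return (unsigned int)(word & DEMO_INFOMASK);
}
```

Reading either half costs one AND on a word that is already loaded, which
is the point of the change: both values live in the same cache line.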

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/skbuff.h  | 21 +
 include/net/netfilter/nf_conntrack.h| 11 ++-
 net/ipv6/netfilter/nf_dup_ipv6.c|  2 +-
 net/netfilter/core.c|  2 +-
 net/netfilter/nf_conntrack_core.c   | 11 ++-
 net/netfilter/nf_conntrack_standalone.c |  3 +++
 net/netfilter/xt_CT.c   |  4 ++--
 7 files changed, 28 insertions(+), 26 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 276431e047af..ac0bc085b139 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -585,7 +585,6 @@ static inline bool skb_mstamp_after(const struct skb_mstamp 
*t1,
  * @cloned: Head may be cloned (check refcnt to be sure)
  * @ip_summed: Driver fed us an IP checksum
  * @nohdr: Payload reference only, must not modify header
- * @nfctinfo: Relationship of this skb to the connection
  * @pkt_type: Packet class
  * @fclone: skbuff clone status
  * @ipvs_property: skbuff is owned by ipvs
@@ -594,7 +593,7 @@ static inline bool skb_mstamp_after(const struct skb_mstamp 
*t1,
  * @nf_trace: netfilter packet trace flag
  * @protocol: Packet protocol from driver
  * @destructor: Destruct function
- * @nfct: Associated connection, if any
+ * @_nfct: Associated connection, if any (with nfctinfo bits)
  * @nf_bridge: Saved data about a bridged frame - see br_netfilter.c
  * @skb_iif: ifindex of device we arrived on
  * @tc_index: Traffic control index
@@ -668,7 +667,7 @@ struct sk_buff {
struct  sec_path*sp;
 #endif
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
-   struct nf_conntrack *nfct;
+   unsigned long_nfct;
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
struct nf_bridge_info   *nf_bridge;
@@ -721,7 +720,6 @@ struct sk_buff {
__u8pkt_type:3;
__u8pfmemalloc:1;
__u8ignore_df:1;
-   __u8nfctinfo:3;
 
__u8nf_trace:1;
__u8ip_summed:2;
@@ -836,6 +834,7 @@ static inline bool skb_pfmemalloc(const struct sk_buff *skb)
 #define SKB_DST_NOREF  1UL
 #define SKB_DST_PTRMASK~(SKB_DST_NOREF)
 
+#define SKB_NFCT_PTRMASK   ~(7UL)
 /**
  * skb_dst - returns skb dst_entry
  * @skb: buffer
@@ -3556,7 +3555,7 @@ static inline void skb_remcsum_process(struct sk_buff 
*skb, void *ptr,
 static inline struct nf_conntrack *skb_nfct(const struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
-   return skb->nfct;
+   return (void *)(skb->_nfct & SKB_NFCT_PTRMASK);
 #else
return NULL;
 #endif
@@ -3590,8 +3589,8 @@ static inline void nf_bridge_get(struct nf_bridge_info 
*nf_bridge)
 static inline void nf_reset(struct sk_buff *skb)
 {
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
-   nf_conntrack_put(skb->nfct);
-   skb->nfct = NULL;
+   nf_conntrack_put(skb_nfct(skb));
+   skb->_nfct = 0;
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
nf_bridge_put(skb->nf_bridge);
@@ -3611,10 +3610,8 @@ static inline void __nf_copy(struct sk_buff *dst, const 
struct sk_buff *src,
 bool copy)
 {
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
-   dst->nfct = src->nfct;
-   nf_conntrack_get(src->nfct);
-   if (copy)
-   dst->nfctinfo = src->nfctinfo;
+   dst->_nfct = src->_nfct;
+   nf_conntrack_get(skb_nfct(src));
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
dst->nf_bridge  = src->nf_bridge;
@@ -3629,7 +3626,7 @@ static inline void __nf_copy(struct sk_buff *dst, const 
struct sk_buff *src,
 static inline void nf_copy(struct sk_buff *dst, const struct sk_buff *src)
 {
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
-   nf_conntrack_put(dst->nfct);
+   nf_conntrack_put(skb_nfct(dst));
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
nf_bridge_put(dst->nf_bridge);
diff --git a/include/net/netfilter/nf_conntrack.h 

[PATCH 19/27] netfilter: conntrack: no need to pass ctinfo to error handler

2017-02-03 Thread Pablo Neira Ayuso
From: Florian Westphal 

It is never accessed for reading and the only places that write to it
are the icmp(6) handlers, which also set skb->nfct (and skb->nfctinfo).

The conntrack core specifically checks for attached skb->nfct after
->error() invocation and returns early in this case.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_conntrack_l4proto.h   |  2 +-
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   | 12 ++--
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c | 12 ++--
 net/netfilter/nf_conntrack_core.c  |  3 +--
 net/netfilter/nf_conntrack_proto_dccp.c|  1 -
 net/netfilter/nf_conntrack_proto_sctp.c|  2 +-
 net/netfilter/nf_conntrack_proto_tcp.c |  1 -
 net/netfilter/nf_conntrack_proto_udp.c |  3 +--
 8 files changed, 16 insertions(+), 20 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_l4proto.h 
b/include/net/netfilter/nf_conntrack_l4proto.h
index e7b836590f0b..85e993e278d5 100644
--- a/include/net/netfilter/nf_conntrack_l4proto.h
+++ b/include/net/netfilter/nf_conntrack_l4proto.h
@@ -55,7 +55,7 @@ struct nf_conntrack_l4proto {
void (*destroy)(struct nf_conn *ct);
 
int (*error)(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb,
-unsigned int dataoff, enum ip_conntrack_info *ctinfo,
+unsigned int dataoff,
 u_int8_t pf, unsigned int hooknum);
 
/* Print out the per-protocol part of the tuple. Return like seq_* */
diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c 
b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
index d075b3cf2400..566afac98a88 100644
--- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
@@ -128,13 +128,13 @@ static bool icmp_new(struct nf_conn *ct, const struct 
sk_buff *skb,
 /* Returns conntrack if it dealt with ICMP, and filled in skb fields */
 static int
 icmp_error_message(struct net *net, struct nf_conn *tmpl, struct sk_buff *skb,
-enum ip_conntrack_info *ctinfo,
 unsigned int hooknum)
 {
struct nf_conntrack_tuple innertuple, origtuple;
const struct nf_conntrack_l4proto *innerproto;
const struct nf_conntrack_tuple_hash *h;
const struct nf_conntrack_zone *zone;
+   enum ip_conntrack_info ctinfo;
struct nf_conntrack_zone tmp;
 
NF_CT_ASSERT(skb->nfct == NULL);
@@ -160,7 +160,7 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, 
struct sk_buff *skb,
return -NF_ACCEPT;
}
 
-   *ctinfo = IP_CT_RELATED;
+   ctinfo = IP_CT_RELATED;
 
h = nf_conntrack_find_get(net, zone, &innertuple);
if (!h) {
@@ -169,11 +169,11 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, 
struct sk_buff *skb,
}
 
if (NF_CT_DIRECTION(h) == IP_CT_DIR_REPLY)
-   *ctinfo += IP_CT_IS_REPLY;
+   ctinfo += IP_CT_IS_REPLY;
 
/* Update skb to refer to this connection */
skb->nfct = &nf_ct_tuplehash_to_ctrack(h)->ct_general;
-   skb->nfctinfo = *ctinfo;
+   skb->nfctinfo = ctinfo;
return NF_ACCEPT;
 }
 
@@ -181,7 +181,7 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, 
struct sk_buff *skb,
 static int
 icmp_error(struct net *net, struct nf_conn *tmpl,
   struct sk_buff *skb, unsigned int dataoff,
-  enum ip_conntrack_info *ctinfo, u_int8_t pf, unsigned int hooknum)
+  u8 pf, unsigned int hooknum)
 {
const struct icmphdr *icmph;
struct icmphdr _ih;
@@ -225,7 +225,7 @@ icmp_error(struct net *net, struct nf_conn *tmpl,
icmph->type != ICMP_REDIRECT)
return NF_ACCEPT;
 
-   return icmp_error_message(net, tmpl, skb, ctinfo, hooknum);
+   return icmp_error_message(net, tmpl, skb, hooknum);
 }
 
 #if IS_ENABLED(CONFIG_NF_CT_NETLINK)
diff --git a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c 
b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
index f5a61bc3ec2b..44b9af3f813e 100644
--- a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
+++ b/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c
@@ -145,12 +145,12 @@ static int
 icmpv6_error_message(struct net *net, struct nf_conn *tmpl,
 struct sk_buff *skb,
 unsigned int icmp6off,
-enum ip_conntrack_info *ctinfo,
 unsigned int hooknum)
 {
struct nf_conntrack_tuple intuple, origtuple;
const struct nf_conntrack_tuple_hash *h;
const struct nf_conntrack_l4proto *inproto;
+   enum ip_conntrack_info ctinfo;
struct nf_conntrack_zone tmp;
 
NF_CT_ASSERT(skb->nfct == NULL);
@@ -176,7 +176,7 @@ icmpv6_error_message(struct net *net, struct nf_conn *tmpl,
return -NF_ACCEPT;
}
 
-   *ctinfo = IP_CT_RELATED;
+   ctinfo = 

[PATCH 22/27] skbuff: add and use skb_nfct helper

2017-02-03 Thread Pablo Neira Ayuso
From: Florian Westphal 

A followup patch renames skb->nfct and changes its type, so add a helper
to avoid an intrusive rename later.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/linux/skbuff.h | 13 ++---
 include/net/netfilter/nf_conntrack_core.h  |  2 +-
 net/core/skbuff.c  |  2 +-
 net/ipv4/netfilter/ipt_SYNPROXY.c  |  8 
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |  2 +-
 net/ipv4/netfilter/nf_defrag_ipv4.c|  4 ++--
 net/ipv4/netfilter/nf_dup_ipv4.c   |  2 +-
 net/ipv6/netfilter/ip6t_SYNPROXY.c |  8 
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |  4 ++--
 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c  |  4 ++--
 net/netfilter/nf_conntrack_core.c  |  4 ++--
 net/netfilter/nf_nat_helper.c  |  2 +-
 net/netfilter/xt_CT.c  |  2 +-
 net/openvswitch/conntrack.c|  6 +++---
 net/sched/cls_flow.c   |  2 +-
 15 files changed, 36 insertions(+), 29 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index b53c0cfd417e..276431e047af 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -3553,6 +3553,15 @@ static inline void skb_remcsum_process(struct sk_buff 
*skb, void *ptr,
skb->csum = csum_add(skb->csum, delta);
 }
 
+static inline struct nf_conntrack *skb_nfct(const struct sk_buff *skb)
+{
+#if IS_ENABLED(CONFIG_NF_CONNTRACK)
+   return skb->nfct;
+#else
+   return NULL;
+#endif
+}
+
 #if defined(CONFIG_NF_CONNTRACK) || defined(CONFIG_NF_CONNTRACK_MODULE)
 void nf_conntrack_destroy(struct nf_conntrack *nfct);
 static inline void nf_conntrack_put(struct nf_conntrack *nfct)
@@ -3652,9 +3661,7 @@ static inline bool skb_irq_freeable(const struct sk_buff 
*skb)
 #if IS_ENABLED(CONFIG_XFRM)
!skb->sp &&
 #endif
-#if IS_ENABLED(CONFIG_NF_CONNTRACK)
-   !skb->nfct &&
-#endif
+   !skb_nfct(skb) &&
!skb->_skb_refdst &&
!skb_has_frag_list(skb);
 }
diff --git a/include/net/netfilter/nf_conntrack_core.h 
b/include/net/netfilter/nf_conntrack_core.h
index 62e17d1319ff..84ec7ca5f195 100644
--- a/include/net/netfilter/nf_conntrack_core.h
+++ b/include/net/netfilter/nf_conntrack_core.h
@@ -62,7 +62,7 @@ int __nf_conntrack_confirm(struct sk_buff *skb);
 /* Confirm a connection: returns NF_DROP if packet must be dropped. */
 static inline int nf_conntrack_confirm(struct sk_buff *skb)
 {
-   struct nf_conn *ct = (struct nf_conn *)skb->nfct;
+   struct nf_conn *ct = (struct nf_conn *)skb_nfct(skb);
int ret = NF_ACCEPT;
 
if (ct && !nf_ct_is_untracked(ct)) {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 5a03730fbc1a..cac3ebfb4b45 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -655,7 +655,7 @@ static void skb_release_head_state(struct sk_buff *skb)
skb->destructor(skb);
}
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
-   nf_conntrack_put(skb->nfct);
+   nf_conntrack_put(skb_nfct(skb));
 #endif
 #if IS_ENABLED(CONFIG_BRIDGE_NETFILTER)
nf_bridge_put(skb->nf_bridge);
diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c 
b/net/ipv4/netfilter/ipt_SYNPROXY.c
index 30c0de53e254..a12d4f0aa674 100644
--- a/net/ipv4/netfilter/ipt_SYNPROXY.c
+++ b/net/ipv4/netfilter/ipt_SYNPROXY.c
@@ -107,8 +107,8 @@ synproxy_send_client_synack(struct net *net,
 
synproxy_build_options(nth, opts);
 
-   synproxy_send_tcp(net, skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
- niph, nth, tcp_hdr_size);
+   synproxy_send_tcp(net, skb, nskb, skb_nfct(skb),
+ IP_CT_ESTABLISHED_REPLY, niph, nth, tcp_hdr_size);
 }
 
 static void
@@ -230,8 +230,8 @@ synproxy_send_client_ack(struct net *net,
 
synproxy_build_options(nth, opts);
 
-   synproxy_send_tcp(net, skb, nskb, skb->nfct, IP_CT_ESTABLISHED_REPLY,
- niph, nth, tcp_hdr_size);
+   synproxy_send_tcp(net, skb, nskb, skb_nfct(skb),
+ IP_CT_ESTABLISHED_REPLY, niph, nth, tcp_hdr_size);
 }
 
 static bool
diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c 
b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
index 566afac98a88..478a025909fc 100644
--- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
@@ -137,7 +137,7 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, 
struct sk_buff *skb,
enum ip_conntrack_info ctinfo;
struct nf_conntrack_zone tmp;
 
-   NF_CT_ASSERT(skb->nfct == NULL);
+   NF_CT_ASSERT(!skb_nfct(skb));
zone = nf_ct_zone_tmpl(tmpl, skb, &tmp);
 
/* Are they talking about one of our connections? */
diff --git a/net/ipv4/netfilter/nf_defrag_ipv4.c 

[PATCH 23/27] netfilter: add and use nf_ct_set helper

2017-02-03 Thread Pablo Neira Ayuso
From: Florian Westphal 

Add a helper to assign an nf_conn entry and the ctinfo bits to an sk_buff.
This avoids changing code in the followup patch that merges skb->nfct and
skb->nfctinfo into skb->_nfct.

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/ip_vs.h|  3 +--
 include/net/netfilter/nf_conntrack.h   |  8 
 net/ipv4/netfilter/ipt_SYNPROXY.c  |  3 +--
 net/ipv4/netfilter/nf_conntrack_proto_icmp.c   |  3 +--
 net/ipv4/netfilter/nf_dup_ipv4.c   |  3 +--
 net/ipv6/netfilter/ip6t_SYNPROXY.c |  3 +--
 net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c |  6 ++
 net/ipv6/netfilter/nf_dup_ipv6.c   |  3 +--
 net/netfilter/nf_conntrack_core.c  | 11 +++
 net/netfilter/nft_ct.c |  3 +--
 net/netfilter/xt_CT.c  |  6 ++
 net/openvswitch/conntrack.c|  6 ++
 12 files changed, 24 insertions(+), 34 deletions(-)

diff --git a/include/net/ip_vs.h b/include/net/ip_vs.h
index 2a344ebd7ebe..4b46c591b542 100644
--- a/include/net/ip_vs.h
+++ b/include/net/ip_vs.h
@@ -1559,8 +1559,7 @@ static inline void ip_vs_notrack(struct sk_buff *skb)
nf_conntrack_put(&ct->ct_general);
untracked = nf_ct_untracked_get();
nf_conntrack_get(&untracked->ct_general);
-   skb->nfct = &untracked->ct_general;
-   skb->nfctinfo = IP_CT_NEW;
+   nf_ct_set(skb, untracked, IP_CT_NEW);
}
 #endif
 }
diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index 5916aa9ab3f0..d704aed11684 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -34,6 +34,7 @@ union nf_conntrack_proto {
struct ip_ct_sctp sctp;
struct ip_ct_tcp tcp;
struct nf_ct_gre gre;
+   unsigned int tmpl_padto;
 };
 
 union nf_conntrack_expect_proto {
@@ -341,6 +342,13 @@ struct nf_conn *nf_ct_tmpl_alloc(struct net *net,
 gfp_t flags);
 void nf_ct_tmpl_free(struct nf_conn *tmpl);
 
+static inline void
+nf_ct_set(struct sk_buff *skb, struct nf_conn *ct, enum ip_conntrack_info info)
+{
+   skb->nfct = &ct->ct_general;
+   skb->nfctinfo = info;
+}
+
 #define NF_CT_STAT_INC(net, count)   __this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_INC_ATOMIC(net, count) this_cpu_inc((net)->ct.stat->count)
 #define NF_CT_STAT_ADD_ATOMIC(net, count, v) 
this_cpu_add((net)->ct.stat->count, (v))
diff --git a/net/ipv4/netfilter/ipt_SYNPROXY.c 
b/net/ipv4/netfilter/ipt_SYNPROXY.c
index a12d4f0aa674..3240a2614e82 100644
--- a/net/ipv4/netfilter/ipt_SYNPROXY.c
+++ b/net/ipv4/netfilter/ipt_SYNPROXY.c
@@ -57,8 +57,7 @@ synproxy_send_tcp(struct net *net,
goto free_nskb;
 
if (nfct) {
-   nskb->nfct = nfct;
-   nskb->nfctinfo = ctinfo;
+   nf_ct_set(nskb, (struct nf_conn *)nfct, ctinfo);
nf_conntrack_get(nfct);
}
 
diff --git a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c 
b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
index 478a025909fc..73c591d8a9a8 100644
--- a/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
+++ b/net/ipv4/netfilter/nf_conntrack_proto_icmp.c
@@ -172,8 +172,7 @@ icmp_error_message(struct net *net, struct nf_conn *tmpl, 
struct sk_buff *skb,
ctinfo += IP_CT_IS_REPLY;
 
/* Update skb to refer to this connection */
-   skb->nfct = &nf_ct_tuplehash_to_ctrack(h)->ct_general;
-   skb->nfctinfo = ctinfo;
+   nf_ct_set(skb, nf_ct_tuplehash_to_ctrack(h), ctinfo);
return NF_ACCEPT;
 }
 
diff --git a/net/ipv4/netfilter/nf_dup_ipv4.c b/net/ipv4/netfilter/nf_dup_ipv4.c
index 1a5e1f53ceaa..f0dbff05fc28 100644
--- a/net/ipv4/netfilter/nf_dup_ipv4.c
+++ b/net/ipv4/netfilter/nf_dup_ipv4.c
@@ -69,8 +69,7 @@ void nf_dup_ipv4(struct net *net, struct sk_buff *skb, 
unsigned int hooknum,
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
/* Avoid counting cloned packets towards the original connection. */
nf_reset(skb);
-   skb->nfct = &nf_ct_untracked_get()->ct_general;
-   skb->nfctinfo = IP_CT_NEW;
+   nf_ct_set(skb, nf_ct_untracked_get(), IP_CT_NEW);
nf_conntrack_get(skb_nfct(skb));
 #endif
/*
diff --git a/net/ipv6/netfilter/ip6t_SYNPROXY.c 
b/net/ipv6/netfilter/ip6t_SYNPROXY.c
index 2dc01d2c6ec0..4ef1ddd4bbbd 100644
--- a/net/ipv6/netfilter/ip6t_SYNPROXY.c
+++ b/net/ipv6/netfilter/ip6t_SYNPROXY.c
@@ -71,8 +71,7 @@ synproxy_send_tcp(struct net *net,
skb_dst_set(nskb, dst);
 
if (nfct) {
-   nskb->nfct = nfct;
-   nskb->nfctinfo = ctinfo;
+   nf_ct_set(nskb, (struct nf_conn *)nfct, ctinfo);
nf_conntrack_get(nfct);
}
 
diff --git a/net/ipv6/netfilter/nf_conntrack_proto_icmpv6.c 

[PATCH 15/27] netfilter: pkttype: unnecessary to check ipv6 multicast address

2017-02-03 Thread Pablo Neira Ayuso
From: Liping Zhang 

Since there is no broadcast address in IPv6, PACKET_LOOPBACK packets in
the ipv6 family must be multicast packets, so there is no need to check
the destination address again.
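
For reference, the check being removed tests whether the first byte of
the destination address is 0xFF, i.e. whether the address falls in the
IPv6 multicast prefix ff00::/8. A small userspace sketch of the same test
follows; demo_is_ipv6_multicast() is a hypothetical helper, and it relies
only on the POSIX inet_pton() call and struct in6_addr.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/*
 * Returns 1 if text is an IPv6 multicast address (ff00::/8),
 * 0 if it is some other IPv6 address, and -1 if it cannot be parsed.
 */
static int demo_is_ipv6_multicast(const char *text)
{
	struct in6_addr addr;

	if (inet_pton(AF_INET6, text, &addr) != 1)
		return -1;

	/* Same byte comparison as the removed kernel check. */
	return addr.s6_addr[0] == 0xFF;
}
```

In the kernel paths touched by this patch the comparison is redundant,
because a looped-back IPv6 packet that is not unicast can only be
multicast.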

Signed-off-by: Liping Zhang 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nft_meta.c   | 5 +
 net/netfilter/xt_pkttype.c | 3 +--
 2 files changed, 2 insertions(+), 6 deletions(-)

diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c
index 66c7f4b4c49b..9a22b24346b8 100644
--- a/net/netfilter/nft_meta.c
+++ b/net/netfilter/nft_meta.c
@@ -154,10 +154,7 @@ void nft_meta_get_eval(const struct nft_expr *expr,
*dest = PACKET_BROADCAST;
break;
case NFPROTO_IPV6:
-   if (ipv6_hdr(skb)->daddr.s6_addr[0] == 0xFF)
-   *dest = PACKET_MULTICAST;
-   else
-   *dest = PACKET_BROADCAST;
+   *dest = PACKET_MULTICAST;
break;
default:
WARN_ON(1);
diff --git a/net/netfilter/xt_pkttype.c b/net/netfilter/xt_pkttype.c
index 57efb703ff18..1ef99151b3ba 100644
--- a/net/netfilter/xt_pkttype.c
+++ b/net/netfilter/xt_pkttype.c
@@ -33,8 +33,7 @@ pkttype_mt(const struct sk_buff *skb, struct xt_action_param 
*par)
else if (xt_family(par) == NFPROTO_IPV4 &&
ipv4_is_multicast(ip_hdr(skb)->daddr))
type = PACKET_MULTICAST;
-   else if (xt_family(par) == NFPROTO_IPV6 &&
-   ipv6_hdr(skb)->daddr.s6_addr[0] == 0xFF)
+   else if (xt_family(par) == NFPROTO_IPV6)
type = PACKET_MULTICAST;
else
type = PACKET_BROADCAST;
-- 
2.1.4



[PATCH 01/27] netfilter: merge udp and udplite conntrack helpers

2017-02-03 Thread Pablo Neira Ayuso
From: Florian Westphal 

udplite was copied from udp; they are virtually 100% identical.

This adds the udplite tracker to udp instead, removes the udplite module,
and then makes the udplite tracker builtin.

udplite will then simply re-use the udp timeout settings.
It makes little sense to add separate sysctls, since nowadays we have
fine-grained timeout policy support via the CT target.

old:
   text    data     bss     dec     hex filename
   1633     672       0    2305     901 nf_conntrack_proto_udp.o
   1756     672       0    2428     97c nf_conntrack_proto_udplite.o
  69526   17937     268   87731   156b3 nf_conntrack.ko

new:
   text    data     bss     dec     hex filename
   2442    1184       0    3626     e2a nf_conntrack_proto_udp.o
  68565   17721     268   86554   1521a nf_conntrack.ko

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/ipv4/nf_conntrack_ipv4.h |   1 +
 include/net/netfilter/ipv6/nf_conntrack_ipv6.h |   1 +
 include/net/netns/conntrack.h  |  16 --
 net/netfilter/Makefile |   1 -
 net/netfilter/nf_conntrack_proto_udp.c | 123 ++
 net/netfilter/nf_conntrack_proto_udplite.c | 324 -
 6 files changed, 125 insertions(+), 341 deletions(-)
 delete mode 100644 net/netfilter/nf_conntrack_proto_udplite.c

diff --git a/include/net/netfilter/ipv4/nf_conntrack_ipv4.h 
b/include/net/netfilter/ipv4/nf_conntrack_ipv4.h
index 919e4e8af327..6ff32815641b 100644
--- a/include/net/netfilter/ipv4/nf_conntrack_ipv4.h
+++ b/include/net/netfilter/ipv4/nf_conntrack_ipv4.h
@@ -14,6 +14,7 @@ extern struct nf_conntrack_l3proto nf_conntrack_l3proto_ipv4;
 
 extern struct nf_conntrack_l4proto nf_conntrack_l4proto_tcp4;
 extern struct nf_conntrack_l4proto nf_conntrack_l4proto_udp4;
+extern struct nf_conntrack_l4proto nf_conntrack_l4proto_udplite4;
 extern struct nf_conntrack_l4proto nf_conntrack_l4proto_icmp;
 #ifdef CONFIG_NF_CT_PROTO_DCCP
 extern struct nf_conntrack_l4proto nf_conntrack_l4proto_dccp4;
diff --git a/include/net/netfilter/ipv6/nf_conntrack_ipv6.h 
b/include/net/netfilter/ipv6/nf_conntrack_ipv6.h
index eaea968f8657..c59b82456f89 100644
--- a/include/net/netfilter/ipv6/nf_conntrack_ipv6.h
+++ b/include/net/netfilter/ipv6/nf_conntrack_ipv6.h
@@ -5,6 +5,7 @@ extern struct nf_conntrack_l3proto nf_conntrack_l3proto_ipv6;
 
 extern struct nf_conntrack_l4proto nf_conntrack_l4proto_tcp6;
 extern struct nf_conntrack_l4proto nf_conntrack_l4proto_udp6;
+extern struct nf_conntrack_l4proto nf_conntrack_l4proto_udplite6;
 extern struct nf_conntrack_l4proto nf_conntrack_l4proto_icmpv6;
 #ifdef CONFIG_NF_CT_PROTO_DCCP
 extern struct nf_conntrack_l4proto nf_conntrack_l4proto_dccp6;
diff --git a/include/net/netns/conntrack.h b/include/net/netns/conntrack.h
index cf799fc3fdec..17724c62de97 100644
--- a/include/net/netns/conntrack.h
+++ b/include/net/netns/conntrack.h
@@ -69,19 +69,6 @@ struct nf_sctp_net {
 };
 #endif
 
-#ifdef CONFIG_NF_CT_PROTO_UDPLITE
-enum udplite_conntrack {
-   UDPLITE_CT_UNREPLIED,
-   UDPLITE_CT_REPLIED,
-   UDPLITE_CT_MAX
-};
-
-struct nf_udplite_net {
-   struct nf_proto_net pn;
-   unsigned int timeouts[UDPLITE_CT_MAX];
-};
-#endif
-
 struct nf_ip_net {
struct nf_generic_net   generic;
struct nf_tcp_net   tcp;
@@ -94,9 +81,6 @@ struct nf_ip_net {
 #ifdef CONFIG_NF_CT_PROTO_SCTP
struct nf_sctp_net  sctp;
 #endif
-#ifdef CONFIG_NF_CT_PROTO_UDPLITE
-   struct nf_udplite_net   udplite;
-#endif
 };
 
 struct ct_pcpu {
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index ca30d1960f1d..bf5c577113b6 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -7,7 +7,6 @@ nf_conntrack-$(CONFIG_NF_CONNTRACK_EVENTS) += 
nf_conntrack_ecache.o
 nf_conntrack-$(CONFIG_NF_CONNTRACK_LABELS) += nf_conntrack_labels.o
 nf_conntrack-$(CONFIG_NF_CT_PROTO_DCCP) += nf_conntrack_proto_dccp.o
 nf_conntrack-$(CONFIG_NF_CT_PROTO_SCTP) += nf_conntrack_proto_sctp.o
-nf_conntrack-$(CONFIG_NF_CT_PROTO_UDPLITE) += nf_conntrack_proto_udplite.o
 
 obj-$(CONFIG_NETFILTER) = netfilter.o
 
diff --git a/net/netfilter/nf_conntrack_proto_udp.c 
b/net/netfilter/nf_conntrack_proto_udp.c
index 20f35ed68030..ae63944c9dc4 100644
--- a/net/netfilter/nf_conntrack_proto_udp.c
+++ b/net/netfilter/nf_conntrack_proto_udp.c
@@ -108,6 +108,59 @@ static bool udp_new(struct nf_conn *ct, const struct 
sk_buff *skb,
return true;
 }
 
+#ifdef CONFIG_NF_CT_PROTO_UDPLITE
+static int udplite_error(struct net *net, struct nf_conn *tmpl,
+struct sk_buff *skb,
+unsigned int dataoff,
+enum ip_conntrack_info *ctinfo,
+u8 pf, unsigned int hooknum)
+{
+   unsigned int udplen = skb->len - dataoff;
+   const struct udphdr *hdr;
+   struct udphdr _hdr;
+   unsigned int cscov;
+
+   /* Header is too small? */
+ 

[PATCH 00/27] Netfilter updates for net-next

2017-02-03 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains Netfilter updates for your net-next
tree, they are:

1) Stash ctinfo 3-bit field into pointer to nf_conntrack object from
   sk_buff so we only access one single cacheline in the conntrack
   hotpath. Patchset from Florian Westphal.

2) Don't leak pointer to internal structures when exporting x_tables
   ruleset back to userspace, from Willem DeBruijn. This includes new
   helper functions to copy data to userspace such as xt_data_to_user()
   as well as conversions of our ip_tables, ip6_tables and arp_tables
   clients to use it. Not surprisingly, ebtables requires an ad-hoc
   update. There is also a new field in x_tables extensions to indicate
   the amount of bytes that we copy to userspace.

3) Add nf_log_all_netns sysctl: This new knob allows you to enable
   logging via nf_log infrastructure for all existing netnamespaces.
   Given the effort to provide pernet syslog has been discontinued,
   let's provide a way to restore logging using netfilter kernel logging
   facilities in trusted environments. Patch from Michal Kubecek.

4) Validate SCTP checksum from conntrack helper, from Davide Caratti.

5) Merge UDPlite conntrack and NAT helpers into UDP, this was mostly
   a copy from the original helper, from Florian Westphal.

6) Reset netfilter state when duplicating packets, also from Florian.

7) Remove unnecessary check for broadcast in IPv6 in pkttype match and
   nft_meta, from Liping Zhang.

8) Add missing code to deal with loopback packets from nft_meta when
   used by the netdev family, also from Liping.

9) Several cleanups on nf_tables, one to remove unnecessary check from
   the netlink control plane path to add table, set and stateful objects
   and code consolidation when unregistering chain hooks, from Gao Feng.

10) Fix harmless reference counter underflow in IPVS that, however,
results in problems with the introduction of the new refcount_t
type, from David Windsor.

11) Enable LIBCRC32C from nf_ct_sctp instead of nf_nat_sctp,
from Davide Caratti.

12) Missing documentation on nf_tables uapi header, from Liping Zhang.

13) Use rb_entry() helper in xt_connlimit, from Geliang Tang.
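
Item 2 above can be pictured with a small userspace sketch. It is an
illustration only and not the kernel's xt_data_to_user(): the struct
demo_target_info layout, DEMO_USERSIZE and demo_data_to_user() are all
hypothetical. The idea shown is that an extension declares how many
leading bytes of its data are user-visible (here via offsetof, as the
.usersize initializers in the series do), and only that prefix is copied
out while the rest of the destination is zeroed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical extension data: user-visible fields, then a private pointer. */
struct demo_target_info {
	unsigned short flags;
	unsigned short zone;
	char helper[16];
	void *ct;  /* internal pointer that must not reach userspace */
};

/* Number of leading bytes that are safe to export. */
#define DEMO_USERSIZE offsetof(struct demo_target_info, ct)

/* Copy only the first usersize bytes; everything after them is zeroed. */
static void demo_data_to_user(struct demo_target_info *dst,
			      const struct demo_target_info *src,
			      size_t usersize)
{
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, src, usersize);
}
```

With this layout the exported copy keeps flags, zone and helper, and the
pointer field reads back as zero no matter what the source held.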

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Thanks!



The following changes since commit 0a0a8d6b0e88d947d7ab3198b325e31f677bebc2:

  net: fealnx: use new api ethtool_{get|set}_link_ksettings (2017-01-02 16:59:10 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git HEAD

for you to fetch changes up to 2851940ffee313e0ff12540a8e11a8c54dea9c65:

  netfilter: allow logging from non-init namespaces (2017-02-02 14:31:58 +0100)


David Windsor (1):
  ipvs: free ip_vs_dest structs when refcnt=0

Davide Caratti (2):
  netfilter: select LIBCRC32C together with SCTP conntrack
  netfilter: conntrack: validate SCTP crc32c in PREROUTING

Feng (1):
  netfilter: nf_tables: Eliminate duplicated code in nf_tables_table_enable()

Florian Westphal (9):
  netfilter: merge udp and udplite conntrack helpers
  netfilter: nat: merge udp and udplite helpers
  netfilter: conntrack: no need to pass ctinfo to error handler
  netfilter: reset netfilter state when duplicating packet
  netfilter: reduce direct skb->nfct usage
  skbuff: add and use skb_nfct helper
  netfilter: add and use nf_ct_set helper
  netfilter: guarantee 8 byte minalign for template addresses
  netfilter: merge ctinfo into nfct pointer storage area

Gao Feng (1):
  netfilter: nf_tables: eliminate useless condition checks

Geliang Tang (1):
  netfilter: xt_connlimit: use rb_entry()

Liping Zhang (4):
  netfilter: nf_tables: add missing descriptions in nft_ct_keys
  netfilter: nft_ct: add average bytes per packet support
  netfilter: pkttype: unnecessary to check ipv6 multicast address
  netfilter: nft_meta: deal with PACKET_LOOPBACK in netdev family

Michal Kubeček (1):
  netfilter: allow logging from non-init namespaces

Willem de Bruijn (7):
  xtables: add xt_match, xt_target and data copy_to_user functions
  iptables: use match, target and data copy_to_user helpers
  ip6tables: use match, target and data copy_to_user helpers
  arptables: use match, target and data copy_to_user helpers
  ebtables: use match, target and data copy_to_user helpers
  xtables: use match, target and data copy_to_user helpers in compat
  xtables: extend matches and targets with .usersize

 Documentation/networking/netfilter-sysctl.txt  |  10 +
 include/linux/netfilter/x_tables.h |   9 +
 include/linux/skbuff.h |  32 +--
 include/net/ip_vs.h|  12 +-
 include/net/netfilter/ipv4/nf_conntrack_ipv4.h |   1 +
 

[PATCH ulogd2] adjust ulogd.logrotate to match ulogd.conf

2017-02-03 Thread Kaarle Ritvanen
Signed-off-by: Kaarle Ritvanen 
---
 ulogd.logrotate | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ulogd.logrotate b/ulogd.logrotate
index b3fb6d1..a865353 100644
--- a/ulogd.logrotate
+++ b/ulogd.logrotate
@@ -1,4 +1,4 @@
-/var/log/ulogd.log /var/log/ulogd.syslogemu /var/log/ulogd.pktlog 
/var/log/ulogd.pcap {
+/var/log/ulogd*.log {
 missingok
 sharedscripts
 postrotate
-- 
2.9.3
