date:20170203

Re: Concurrent iptables-restore calls clobberring each other

2017-02-03 Thread Jan Engelhardt


On Friday 2017-02-03 21:37, Shaun Crampton wrote:
>
>I'm trying to diagnose an incompatibility between my application
>(Project Calico's Felix daemon) and another (Kubernetes' kube-proxy).
>Both are (ab)using iptables-restore to do high-speed bulk updates to
>iptables and they're both using --noflush so they can use
>iptables-restore to edit only some chains.  Mostly, this works great
>and it's many times faster than using individual iptables commands.
[...]
>My understanding is that each iptables-restore call actually does a
>read-modify-write of the whole table

This is by design; the RMW cycle in principle also affects the "slower"
iptables - which is why it is slower, because it does only one rule per cycle.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [RFC PATCH] audit: normalize NETFILTER_PKT

2017-02-03 Thread Paul Moore

On Tue, Jan 31, 2017 at 2:44 PM, Richard Guy Briggs wrote:
> On 2017-01-31 17:13, Steve Grubb wrote:

...

>> I was curious about something. Auparse is trying to interpret the
>> icmptype field for every event. This is not good. Which fields are
>> valid for each kind of packet? Are there fields valid for all packets?
>> Is the icmptype/code the only ones that don't apply in all cases?
>
> Ok, this is important to know...  You sound surprised.  So if that field
> isn't valid for all cases of that event, then the event should be split
> or the "unset" value should be used as a hint to ignore it.
>
> This was the point of my earlier posting:
> https://www.redhat.com/archives/linux-audit/2017-January/msg00074.html
> There are still a number of questions from that thread that had no
> reply.  Answering those questions would help inform this discussion, so
> if you could answer some of those questions in that first thread, I'd
> have a better chance of understanding what are the limitations of the
> parser and design/work around them.
>
> There is no packet for which all fields are valid.  This is why using
> "unset" values in those fields was suggested, seemed to be accepted in
> discussion, and implemented.

...

> Swinging fields in and out makes it very handy to use one message type
> for all of them and can save precious disk bandwidth, but the point was
> to normalize these messages.  Is that still realistic and necessary?  If
> so, we're trying to find a balance between message type explosion and
> disk bandwidth.
>
> We either need to make this fine-grained, ignore fields that aren't
> valid for that type, or swing fields in and out.  Or maybe I have missed
> something fundamental, such as the presence of subsequent fields depends
> on the values of previous fields?

I'm still trying to understand what purpose this record actually
serves, and what requirements may exist.  In an earlier thread
somewhere Steve mentioned some broad requirements around data
import/export, and I really wonder if the NETFILTER_PKT record
provides anything useful here when it really isn't connecting the
traffic to the sender/receiver without a lot of additional logging and
post-processing smarts.  If you were interested in data import/export
I think auditing the socket syscalls would provide a much more useful
set of records in the audit log.

Considering that one of the primary motivations for the audit
subsystem is to enable compliance with various security
specifications, let's get the ones we know about listed in this thread
and then figure out how best to meet those requirements.

-- 
paul moore
www.paul-moore.com

Re: [PATCH 00/27] Netfilter updates for net-next

2017-02-03 Thread David Miller

From: Pablo Neira Ayuso 
Date: Fri,  3 Feb 2017 13:25:11 +0100

> The following patchset contains Netfilter updates for your net-next
> tree, they are:
 ...
> You can pull these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git

Pulled, thanks a lot!

Concurrent iptables-restore calls clobberring each other

2017-02-03 Thread Shaun Crampton

Hi,

I'm trying to diagnose an incompatibility between my application
(Project Calico's Felix daemon) and another (Kubernetes' kube-proxy).
Both are (ab)using iptables-restore to do high-speed bulk updates to
iptables and they're both using --noflush so they can use
iptables-restore to edit only some chains. Mostly, this works great
and it's many times faster than using individual iptables commands.
However, sometimes when they do an iptables-restore at the same time,
I see one of the updates get lost even though the command reported
success. I've boiled it down to a repro script[1] that starts two
threads writing to iptables and looks for missing updates.

My understanding is that each iptables-restore call actually does a
read-modify-write of the whole table so it's not too surprising that
we could get a missed update. However, I thought that iptables has
some sort of sequence number to prevent clobbering, making it a
compare-and-swap operation. I've certainly seen iptables-restore
calls fail on the COMMIT line when doing concurrent updates and I have
a tweaked script[2] that exhibits that behaviour. In script [2] I
added an extra superfluous rule update to one of the writers and
suddenly the COMMIT starts failing as I was hoping. While the toy
example in [2] seems to work, if I add more operations, it seems to go
back to failing again so it may just be a timing window.
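
The difference between a plain read-modify-write and a compare-and-swap
commit can be illustrated with a toy model. This is only a sketch in
Python: the Table class, its generation counter and the KUBE-A chain name
are hypothetical, and it does not claim to reflect how the kernel or
iptables-restore implement the commit.

```python
"""Toy model of two writers doing read-modify-write on one shared table."""


class Table:
    def __init__(self):
        self.chains = {}      # chain name -> list of rules
        self.generation = 0   # bumped on every successful commit

    def snapshot(self):
        # Copy the whole table, the way a restore-style tool reads it.
        copy = {name: list(rules) for name, rules in self.chains.items()}
        return copy, self.generation

    def commit_blind(self, new_chains):
        # Plain write-back: whoever commits last wins.
        self.chains = new_chains
        self.generation += 1
        return True

    def commit_checked(self, new_chains, seen_generation):
        # Compare-and-swap: refuse if someone committed since our snapshot.
        if seen_generation != self.generation:
            return False
        self.chains = new_chains
        self.generation += 1
        return True


def run(checked):
    table = Table()
    # Both writers read before either one writes.
    felix_view, felix_gen = table.snapshot()
    kube_view, kube_gen = table.snapshot()
    felix_view["FELIX-B"] = ["-j ACCEPT"]
    kube_view["KUBE-A"] = ["-j RETURN"]
    if checked:
        results = (table.commit_checked(felix_view, felix_gen),
                   table.commit_checked(kube_view, kube_gen))
    else:
        results = (table.commit_blind(felix_view),
                   table.commit_blind(kube_view))
    return results, sorted(table.chains)


if __name__ == "__main__":
    print(run(checked=False))  # both commits succeed, FELIX-B is lost
    print(run(checked=True))   # the second commit is rejected
```

In the blind case both writers report success and only the last writer's
chain survives, which matches the lost update described above. In the
checked case the second writer gets a failure and can retry from a fresh
snapshot.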

Output from script [1] (it quickly fails after detecting a lost update):

$ sudo ./iptables.sh
[sudo] password for shaun:
akKkKkKkKkiptables-restore: line 4 failed
AbKkKkBKkCaKkAbKkBKkCaKkAbKkBKk
FELIX-B update was clobbered

Output from script [2] (keeps going for as long as I've let it run):

$ sudo ./iptables.sh
akKkAbKkBKkCaKkAbKkBKkKCakAbZkBZkKkCaKkAbKkBCaKkAbBZkKkKkCaKkAbKkBZkKkCaKkAbKkBKkKkCaKkKkAbKkBKkKkCaKkAbKkBZkKkCaKkAbKkBKkCaKkAbKkBZkKkCaKkAbKkBKkKkCaKkAbKkBKkKkCaAbKkKkBKkKkCaKkAbKkBKkCaKkAbBZkCaKkAbKkBKkKkCaKkAbBCa

Where a K means that the "kube" thread successfully wrote to iptables
and a Z means it got a "COMMIT failed".

It'd be great to know if this is working as designed or a bug, or if
there's a way to make sure that I get a COMMIT failure if there's been
a concurrent update. Without that, I'm thinking we'll have to do a
regular poll to make sure that nothing got clobbered.

I'd appreciate if you CCed me on any responses since I'm not
subscribed to the list. Thanks,

-Shaun

[1] https://gist.github.com/fasaxc/ee443a9ef82ce2e4dab059161f095ec2
[2] https://gist.github.com/fasaxc/05a80a48211500e4f2225011a131f92e

[PATCH nft] statement: Print NAT IPv4 address in nat_stmt_print()

2017-02-03 Thread Elise Lennion

The case "nat.addr != NULL && nat.proto != NULL && type != ipv6"
wasn't handled in nat_stmt_print(). Now all cases should be covered.

Also, the if statements were reorganized to get rid of one nested if.
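
The intended output rule can be sketched outside of nft. The Python
below is only an illustration under assumptions: format_nat_target() is
a hypothetical helper, it handles single addresses only (no ranges), and
it is not the nat_stmt_print() code.

```python
import ipaddress


def format_nat_target(addr, port=None):
    """Render 'addr' or 'addr:port', bracketing IPv6 only when a port follows."""
    ip = ipaddress.ip_address(addr)
    if port is None:
        # No port: print the address as-is for both families.
        return str(ip)
    if ip.version == 6:
        # IPv6 with a port needs brackets to keep the ':' unambiguous.
        return f"[{ip}]:{port}"
    # IPv4 with a port: the case that was previously not printed.
    return f"{ip}:{port}"


if __name__ == "__main__":
    print(format_nat_target("192.168.3.2", 8080))
    print(format_nat_target("fd00::1", 80))
    print(format_nat_target("192.168.3.2"))
```

The third branch corresponds to the "addr and proto set, but not IPv6"
case named above.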

Fixes(Bug 1117 - Table ipv4-nat prerouting dnat doesn't accept dest IP:PORT)

Signed-off-by: Elise Lennion 
---
 src/statement.c | 31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/src/statement.c b/src/statement.c
index 9cdabbb..0585d66 100644
--- a/src/statement.c
+++ b/src/statement.c
@@ -494,25 +494,26 @@ static void nat_stmt_print(const struct stmt *stmt)
};
 
printf("%s to ", nat_types[stmt->nat.type]);
-   if (stmt->nat.addr) {
-   if (stmt->nat.proto) {
-   if (stmt->nat.addr->ops->type == EXPR_VALUE &&
-   stmt->nat.addr->dtype->type == TYPE_IP6ADDR) {
-   printf("[");
-   expr_print(stmt->nat.addr);
-   printf("]");
-   } else if (stmt->nat.addr->ops->type == EXPR_RANGE &&
-  stmt->nat.addr->left->dtype->type == TYPE_IP6ADDR) {
-   printf("[");
-   expr_print(stmt->nat.addr->left);
-   printf("]-[");
-   expr_print(stmt->nat.addr->right);
-   printf("]");
-   }
+   if (stmt->nat.addr && stmt->nat.proto) {
+   if (stmt->nat.addr->ops->type == EXPR_RANGE &&
+   stmt->nat.addr->left->dtype->type == TYPE_IP6ADDR) {
+   printf("[");
+   expr_print(stmt->nat.addr->left);
+   printf("]-[");
+   expr_print(stmt->nat.addr->right);
+   printf("]");
+   }
+   else if (stmt->nat.addr->dtype->type == TYPE_IP6ADDR) {
+   printf("[");
+   expr_print(stmt->nat.addr);
+   printf("]");
} else {
expr_print(stmt->nat.addr);
}
}
+   else if (stmt->nat.addr) {
+   expr_print(stmt->nat.addr);
+   }
 
if (stmt->nat.proto) {
printf(":");
-- 
2.7.4


[PATCH nftables] statement: fix print of ip dnat address

2017-02-03 Thread Florian Westphal

the change causes non-ipv6 addresses to not be printed at all in case
a nfproto was given.

Also add a test case to catch this.

Closes: https://bugzilla.netfilter.org/show_bug.cgi?id=1117
Fixes: 5ab0e10fc6e2c22363a ("src: support for RFC2732 IPv6 address format with brackets")
Signed-off-by: Florian Westphal 
---
 src/statement.c   |  2 ++
 tests/py/ip/dnat.t|  1 +
 tests/py/ip/dnat.t.payload.ip | 12 ++++++++++++
 3 files changed, 15 insertions(+)

diff --git a/src/statement.c b/src/statement.c
index 9cdabbb979e8..3beb86ab4263 100644
--- a/src/statement.c
+++ b/src/statement.c
@@ -508,6 +508,8 @@ static void nat_stmt_print(const struct stmt *stmt)
printf("]-[");
expr_print(stmt->nat.addr->right);
printf("]");
+   } else {
+   expr_print(stmt->nat.addr);
}
} else {
expr_print(stmt->nat.addr);
diff --git a/tests/py/ip/dnat.t b/tests/py/ip/dnat.t
index da00106edbb4..089017c84704 100644
--- a/tests/py/ip/dnat.t
+++ b/tests/py/ip/dnat.t
@@ -7,6 +7,7 @@ iifname "eth0" tcp dport != 80-90 dnat to 192.168.3.2;ok
 iifname "eth0" tcp dport {80, 90, 23} dnat to 192.168.3.2;ok
 iifname "eth0" tcp dport != {80, 90, 23} dnat to 192.168.3.2;ok
 iifname "eth0" tcp dport != 23-34 dnat to 192.168.3.2;ok
+iifname "eth0" tcp dport 81 dnat to 192.168.3.2:8080;ok
 
 dnat to ct mark map { 0x0014 : 1.2.3.4};ok
 dnat to ct mark . ip daddr map { 0x0014 . 1.1.1.1 : 1.2.3.4};ok
diff --git a/tests/py/ip/dnat.t.payload.ip b/tests/py/ip/dnat.t.payload.ip
index 66926990d880..7a7f5a82dd5a 100644
--- a/tests/py/ip/dnat.t.payload.ip
+++ b/tests/py/ip/dnat.t.payload.ip
@@ -60,6 +60,18 @@ ip test-ip4 prerouting
   [ immediate reg 1 0x0203a8c0 ]
   [ nat dnat ip addr_min reg 1 addr_max reg 0 ]
 
+# iifname "eth0" tcp dport 81 dnat to 192.168.3.2:8080
+ip test-ip4 prerouting
+  [ meta load iifname => reg 1 ]
+  [ cmp eq reg 1 0x30687465 0x 0x 0x ]
+  [ payload load 1b @ network header + 9 => reg 1 ]
+  [ cmp eq reg 1 0x0006 ]
+  [ payload load 2b @ transport header + 2 => reg 1 ]
+  [ cmp eq reg 1 0x5100 ]
+  [ immediate reg 1 0x0203a8c0 ]
+  [ immediate reg 2 0x901f ]
+  [ nat dnat ip addr_min reg 1 addr_max reg 0 proto_min reg 2 proto_max reg 0 ]
+
 # dnat to ct mark map { 0x0014 : 1.2.3.4}
 __map%d test-ip4 b
 __map%d test-ip4 0
-- 
2.10.2


[PATCH nftables 9/9] tests: add test entries for conntrack zones

2017-02-03 Thread Florian Westphal

Signed-off-by: Florian Westphal 
---
 tests/py/any/ct.t         | 13 +++++++++++++
 tests/py/any/ct.t.payload | 44 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

diff --git a/tests/py/any/ct.t b/tests/py/any/ct.t
index 2cfbfe13ccd2..6f32d29c0c40 100644
--- a/tests/py/any/ct.t
+++ b/tests/py/any/ct.t
@@ -100,6 +100,19 @@ ct label 127;ok
 ct label set 127;ok
 ct label 128;fail
 
+ct zone 0;ok
+ct zone 23;ok
+ct zone 65536;fail
+ct both zone 1;fail
+ct original zone 1;ok
+ct reply zone 1;ok
+
+ct zone set 1;ok
+ct original zone set 1;ok
+ct reply zone set 1;ok
+ct zone set mark map { 1 : 1,  2 : 2 };ok;ct zone set mark map { 0x0001 : 1, 0x0002 : 2}
+ct both zone set 1;fail
+
 ct invalid;fail
 ct invalid original;fail
 ct set invalid original 42;fail
diff --git a/tests/py/any/ct.t.payload b/tests/py/any/ct.t.payload
index 3370bcac0594..e4c7f62b69f5 100644
--- a/tests/py/any/ct.t.payload
+++ b/tests/py/any/ct.t.payload
@@ -402,6 +402,50 @@ ip test-ip4 output
   [ immediate reg 1 0x 0x 0x 0x8000 ]
   [ ct set label with reg 1 ]
 
+# ct zone 0
+ip test-ip4 output
+  [ ct load zone => reg 1 ]
+  [ cmp eq reg 1 0x ]
+
+# ct zone 23
+ip test-ip4 output
+  [ ct load zone => reg 1 ]
+  [ cmp eq reg 1 0x0017 ]
+
+# ct original zone 1
+ip test-ip4 output
+  [ ct load zone => reg 1 , dir original ]
+  [ cmp eq reg 1 0x0001 ]
+
+# ct reply zone 1
+ip test-ip4 output
+  [ ct load zone => reg 1 , dir reply ]
+  [ cmp eq reg 1 0x0001 ]
+
+# ct zone set 1
+ip test-ip4 output
+  [ immediate reg 1 0x0001 ]
+  [ ct set zone with reg 1 ]
+
+# ct original zone set 1
+ip test-ip4 output
+  [ immediate reg 1 0x0001 ]
+  [ ct set zone with reg 1 , dir original ]
+
+# ct reply zone set 1
+ip test-ip4 output
+  [ immediate reg 1 0x0001 ]
+  [ ct set zone with reg 1 , dir reply ]
+
+# ct zone set mark map { 1 : 1,  2 : 2 }
+__map%d test-ip4 b
+__map%d test-ip4 0
+element 0001  : 0001 0 [end]element 0002  : 0002 0 [end]
+ip test-ip4 output
+  [ meta load mark => reg 1 ]
+  [ lookup reg 1 set __map%d dreg 1 ]
+  [ ct set zone with reg 1 ]
+
 # notrack
 ip test-ip4 output
   [ notrack ]
-- 
2.10.2


[PATCH nftables 7/9] ct: refactor print function so it can be re-used for ct statement

2017-02-03 Thread Florian Westphal

Once directional zone support is added we also need to print the
direction of the statement, so factor the common code to re-use
this helper from the statement print function.

Signed-off-by: Florian Westphal 
---
 src/ct.c | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/src/ct.c b/src/ct.c
index dffa0e5fa44a..7e09c5b246b2 100644
--- a/src/ct.c
+++ b/src/ct.c
@@ -238,22 +238,27 @@ static const struct ct_template ct_templates[] = {
  BYTEORDER_HOST_ENDIAN, 16),
 };
 
-static void ct_expr_print(const struct expr *expr)
+static void ct_print(enum nft_ct_keys key, int8_t dir)
 {
const struct symbolic_constant *s;
 
printf("ct ");
-   if (expr->ct.direction < 0)
+   if (dir < 0)
goto done;
 
for (s = ct_dir_tbl.symbols; s->identifier != NULL; s++) {
-   if (expr->ct.direction == (int) s->value) {
+   if (dir == (int)s->value) {
printf("%s ", s->identifier);
break;
}
}
  done:
-   printf("%s", ct_templates[expr->ct.key].token);
+   printf("%s", ct_templates[key].token);
+}
+
+static void ct_expr_print(const struct expr *expr)
+{
+   ct_print(expr->ct.key, expr->ct.direction);
 }
 
 static bool ct_expr_cmp(const struct expr *e1, const struct expr *e2)
-- 
2.10.2


[PATCH nftables 6/9] src: add conntrack zone support

2017-02-03 Thread Florian Westphal

This enables zone get/set support.

As the zone can be optionally tied to a direction as well we need a new
token for this (unless we turn reply/original into tokens in which case
we could handle zone via STRING).

There was some discussion on how zone set support should be handled,
especially 'zone set 1'.

There are several issues to consider:

1. it's not possible to change a zone 'later on', any given
conntrack flow has exactly one zone for its entire lifetime.

2. to create conntracks in a given zone, the zone therefore has to be
assigned *before* the packet gets picked up by conntrack (so that lookup
finds the correct existing flow or the flow is created with the desired
zone id).  In iptables, this is enforced because zones are assigned with
CT target and this is restricted to the 'raw' table in iptables, which
runs after defragmentation but before connection tracking.

3. Thus, in nftables the 'ct zone set' rule needs to hook before
conntrack too, e.g. via

 table raw {
  chain pre {
   type filter hook prerouting priority -300;
   iif eth3 ct zone set 23
  }
  chain out {
   type filter hook output priority -300;
   oif eth3 ct zone set 23
  }
 }

... but this is not enforced.

There were two alternatives to better document this.
One was to use an explicit 'template' keyword:
  nft ... template zone set 23

... but 'connection tracking templates' are a kernel detail
that users should not and need not know about.

The other one was to use the meta keyword instead since
we're (from a practical point of view) assigning the zone to
the packet, not the conntrack:

 nft ... meta zone set 23

However, next patch also supports 'directional' zones, and

 nft ... meta original zone 23

makes no sense because 'direction' refers to a direction as understood
by the connection tracker.

Signed-off-by: Florian Westphal 
---
 doc/nft.xml | 10 +-
 include/linux/netfilter/nf_tables.h |  1 +
 src/ct.c|  2 ++
 src/parser_bison.y  | 10 ++
 src/scanner.l   |  1 +
 5 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/doc/nft.xml b/doc/nft.xml
index 78e112f3974b..0a81728789bf 100644
--- a/doc/nft.xml
+++ b/doc/nft.xml
@@ -2126,7 +2126,8 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
direction before the conntrack key, others must be used directly because they are direction agnostic.
The packets, bytes and avgpkt keywords can be
used with or without a direction. If the direction is omitted, the sum of the original and the reply
-   direction is returned.
+   direction is returned.  The same is true for the zone, if a direction is given, the zone
+   is only matched if the zone id is tied to the given direction.



@@ -2144,6 +2145,7 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
bytes
packets
avgpkt
+   zone



@@ -2162,6 +2164,7 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
bytes
packets
avgpkt
+   zone



@@ -2260,6 +2263,11 @@ inet filter meta nfproto ipv6 output rt nexthop fd00::1
average bytes per packet, see description for packets keyword
integer (64 bit)

+   
+   zone
+   conntrack zone
+   integer (16 bit)
+   



diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index b00a05d1ee56..fc0ed47d974d 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -883,6 +883,7 @@ enum nft_ct_keys {
NFT_CT_PKTS,
NFT_CT_BYTES,
NFT_CT_AVGPKT,
+

[PATCH -next 0/9] nftables: add zone support to ct statement

2017-02-03 Thread Florian Westphal

This adds the ability to set the conntrack zone from nftables, i.e.
native replacement for -j CT --zone $number.

See individual patches for details.
This will need more documentation and exposure of the builtin
hook priorities (e.g. via defines?) so users can more easily
see what's happening.

Pablo suggested to allow something like

hook prerouting prio $raw;
or even
hook prerouting prio $conntrack - 1;

instead of the 'awkward' use of the actual numbers used by the kernel
('priority -300' to hook at same priority as raw table).

However, this series doesn't contain any of that, so users will
have to use priorities between -399 and -199 (i.e. after defrag and
before conntrack pickup) to assign zones.


[PATCH libnftnl 4/9] src: ct: add zone support

2017-02-03 Thread Florian Westphal

Signed-off-by: Florian Westphal 
---
 include/linux/netfilter/nf_tables.h | 2 ++
 src/expr/ct.c   | 3 ++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/netfilter/nf_tables.h b/include/linux/netfilter/nf_tables.h
index b00a05d1ee56..b972e72623c2 100644
--- a/include/linux/netfilter/nf_tables.h
+++ b/include/linux/netfilter/nf_tables.h
@@ -864,6 +864,7 @@ enum nft_rt_attributes {
  * @NFT_CT_PKTS: conntrack packets
  * @NFT_CT_BYTES: conntrack bytes
  * @NFT_CT_AVGPKT: conntrack average bytes per packet
+ * @NFT_CT_ZONE: conntrack zone
  */
 enum nft_ct_keys {
NFT_CT_STATE,
@@ -883,6 +884,7 @@ enum nft_ct_keys {
NFT_CT_PKTS,
NFT_CT_BYTES,
NFT_CT_AVGPKT,
+   NFT_CT_ZONE,
 };
 
 /**
diff --git a/src/expr/ct.c b/src/expr/ct.c
index d3d352e9f959..cdd08e95c86c 100644
--- a/src/expr/ct.c
+++ b/src/expr/ct.c
@@ -32,7 +32,7 @@ struct nftnl_expr_ct {
 #define IP_CT_DIR_REPLY1
 
 #ifndef NFT_CT_MAX
-#define NFT_CT_MAX (NFT_CT_AVGPKT + 1)
+#define NFT_CT_MAX (NFT_CT_ZONE + 1)
 #endif
 
 static int
@@ -170,6 +170,7 @@ static const char *ctkey2str_array[NFT_CT_MAX] = {
[NFT_CT_PKTS]   = "packets",
[NFT_CT_BYTES]  = "bytes",
[NFT_CT_AVGPKT] = "avgpkt",
+   [NFT_CT_ZONE]   = "zone",
 };
 
 static const char *ctkey2str(uint32_t ctkey)
-- 
2.10.2


[PATCH nftables 5/9] src: add host byte order integer type

2017-02-03 Thread Florian Westphal

This is needed once we add support to set a zone, as in

ct zone set 42

Using integer_type makes nft use big-endian representation of the zone id
instead of the required host byte order.

When using 'ct zone 1', things will work because the (implicit) relational
operation makes sure that the left and right sides have same byte order.

In the statement case the lack of relop means we either need to convert
ourselves (the ct template contains endianness info), or use a dedicated type
(the latter is the reason why setting a mark will 'just work' since the
 mark type takes care of it).

The dedicated type has the advantage that it also works when maps are used:

ct zone set mark map { 1 : 10, 2 : 20, 3 : 30 }

... which is not easy to do with the current map/set code: its endianness
settings rely on dtype->byteorder (i.e., it will always set BIG_ENDIAN
when we'd use integer_type for the zone).

Using evaluation context seems like a nightmare because several
places during eval steps can re-set this information, and propagating
the template info means to pollute generic code with something specific
to ct.

It seems like a future removal of all .byteorder members in the templates
in favor of using appropriate types might be a good idea.
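
The effect of picking the wrong byte order can be shown with a few lines
of Python. This is a sketch only, assuming a 16-bit zone id and using
the value 42 from the example above; it does not touch nft or netlink.

```python
import sys

ZONE_ID = 42


def encode(value, byteorder, size=2):
    """Serialize an unsigned integer into 'size' bytes in the given order."""
    return value.to_bytes(size, byteorder)


# What a big-endian integer type would put on the wire.
big = encode(ZONE_ID, "big")
# What a host byte order type would put on the wire.
host = encode(ZONE_ID, sys.byteorder)
# What a little-endian reader makes of the big-endian bytes.
misread = int.from_bytes(big, "little")

if __name__ == "__main__":
    print(big.hex(), host.hex(), misread)
```

On a little-endian host the big-endian encoding of 42 is read back as
10752 (0x2a00), so the zone that gets set is not the one requested.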

Signed-off-by: Florian Westphal 
---
 include/datatype.h |  2 ++
 src/datatype.c     | 10 ++++++++++
 2 files changed, 12 insertions(+)

diff --git a/include/datatype.h b/include/datatype.h
index 9f127f2954e3..8c1c827253be 100644
--- a/include/datatype.h
+++ b/include/datatype.h
@@ -82,6 +82,7 @@ enum datatypes {
TYPE_DSCP,
TYPE_ECN,
TYPE_FIB_ADDR,
+   TYPE_U32,
__TYPE_MAX
 };
 #define TYPE_MAX   (__TYPE_MAX - 1)
@@ -231,6 +232,7 @@ extern const struct datatype icmp_code_type;
 extern const struct datatype icmpv6_code_type;
 extern const struct datatype icmpx_code_type;
 extern const struct datatype time_type;
+extern const struct datatype u32_type;
 
 extern const struct datatype *concat_type_alloc(uint32_t type);
 extern void concat_type_destroy(const struct datatype *dtype);
diff --git a/src/datatype.c b/src/datatype.c
index 1518606a3f89..cab42d47f0f0 100644
--- a/src/datatype.c
+++ b/src/datatype.c
@@ -48,6 +48,7 @@ static const struct datatype *datatypes[TYPE_MAX + 1] = {
[TYPE_ICMP_CODE]    = &icmp_code_type,
[TYPE_ICMPV6_CODE]  = &icmpv6_code_type,
[TYPE_ICMPX_CODE]   = &icmpx_code_type,
+   [TYPE_U32]  = &u32_type,
 };
 
 void datatype_register(const struct datatype *dtype)
@@ -1057,3 +1058,12 @@ struct error_record *rate_parse(const struct location *loc, const char *str,
 
return NULL;
 }
+
+const struct datatype u32_type = {
+   .type   = TYPE_U32,
+   .name   = "u32",
+   .desc   = "32bit host endian integer",
+   .size   = 4 * BITS_PER_BYTE,
+   .byteorder  = BYTEORDER_HOST_ENDIAN,
+   .basetype   = &integer_type,
+};
-- 
2.10.2


[PATCH nf-next 2/9] netfilter: nft_ct: prepare for key-dependent error unwind

2017-02-03 Thread Florian Westphal

Next patch will add ZONE_ID set support which will need similar
error unwind (put operation) as conntrack labels.

Prepare for this: remove the 'label_got' boolean in favor
of a switch statement that can be extended in next patch.

As we already have that in the set_destroy function place that in
a separate function and call it from the set init function.
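
The shape of the refactoring can be sketched in Python. Everything here
is hypothetical (the Net class, the key strings, the register_ok flag);
it only illustrates sharing one key-dependent release helper between the
init error path and the destroy path, and is not the kernel code.

```python
import errno


class Net:
    """Stand-in for per-namespace state holding a label refcount."""

    def __init__(self):
        self.label_refs = 0


def release_for_key(net, key):
    # Single place that knows which keys hold a reference.
    if key == "labels":
        net.label_refs -= 1
    # Other keys took nothing, so there is nothing to undo.


def set_init(net, key, register_ok=True):
    if key == "labels":
        net.label_refs += 1          # acquire the per-key resource
    if not register_ok:              # a later init step failed
        release_for_key(net, key)    # unwind via the shared helper
        return -errno.EINVAL
    return 0


def set_destroy(net, key):
    release_for_key(net, key)


if __name__ == "__main__":
    net = Net()
    print(set_init(net, "labels", register_ok=False), net.label_refs)
```

Adding a new key that needs cleanup then means extending
release_for_key() only, instead of adding another boolean to the init
function.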

Signed-off-by: Florian Westphal 
---
 net/netfilter/nft_ct.c | 29 +++++++++++++++--------------
 1 file changed, 15 insertions(+), 14 deletions(-)

diff --git a/net/netfilter/nft_ct.c b/net/netfilter/nft_ct.c
index 5bd4cdfdcda5..2d82df2737da 100644
--- a/net/netfilter/nft_ct.c
+++ b/net/netfilter/nft_ct.c
@@ -386,12 +386,24 @@ static int nft_ct_get_init(const struct nft_ctx *ctx,
return 0;
 }
 
+static void __nft_ct_set_destroy(const struct nft_ctx *ctx, struct nft_ct *priv)
+{
+   switch (priv->key) {
+#ifdef CONFIG_NF_CONNTRACK_LABELS
+   case NFT_CT_LABELS:
+   nf_connlabels_put(ctx->net);
+   break;
+#endif
+   default:
+   break;
+   }
+}
+
 static int nft_ct_set_init(const struct nft_ctx *ctx,
   const struct nft_expr *expr,
   const struct nlattr * const tb[])
 {
struct nft_ct *priv = nft_expr_priv(expr);
-   bool label_got = false;
unsigned int len;
int err;
 
@@ -412,7 +424,6 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
err = nf_connlabels_get(ctx->net, (len * BITS_PER_BYTE) - 1);
if (err)
return err;
-   label_got = true;
break;
 #endif
default:
@@ -431,8 +442,7 @@ static int nft_ct_set_init(const struct nft_ctx *ctx,
return 0;
 
 err1:
-   if (label_got)
-   nf_connlabels_put(ctx->net);
+   __nft_ct_set_destroy(ctx, priv);
return err;
 }
 
@@ -447,16 +457,7 @@ static void nft_ct_set_destroy(const struct nft_ctx *ctx,
 {
struct nft_ct *priv = nft_expr_priv(expr);
 
-   switch (priv->key) {
-#ifdef CONFIG_NF_CONNTRACK_LABELS
-   case NFT_CT_LABELS:
-   nf_connlabels_put(ctx->net);
-   break;
-#endif
-   default:
-   break;
-   }
-
+   __nft_ct_set_destroy(ctx, priv);
nft_ct_netns_put(ctx->net, ctx->afi->family);
 }
 
-- 
2.10.2

