[PATCH] Fix erroneous ICMP corruption with iptable_nat and NOTRACK (was Re: ICMP broken in 2.6.13-rc5)
Hi Linus!

DaveM is probably travelling back from UKUUG at the moment and therefore
wasn't able to push this fix to you.

The following trivial patch is confirmed to solve an ICMP corruption
problem if NAT and the NOTRACK target are used together.

Please apply before 2.6.13 is released.

Thanks,
	Harald
-- 
- Harald Welte <[EMAIL PROTECTED]>                 http://netfilter.org/

"Fragmentation is like classful addressing -- an interesting early
 architectural error that shows how much experimentation was going
 on while IP was being designed."                    -- Paul Vixie

[NETFILTER] don't try to do any NAT on untracked connections

With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing
NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no
longer sufficient.  The ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK
effectively prevents iteration of the 'nat' table, but doesn't prevent
nat_packet() to be executed.  Since nr_manips is gone in 'rustynat',
nat_packet() now implicitly thinks that it has to do NAT on the packet.

This patch fixes that problem by explicitly checking for
ip_conntrack_untracked in ip_nat_fn().

Signed-off-by: Harald Welte <[EMAIL PROTECTED]>

---
commit c16fd4ffed6349d0888cd97a75d04394dac42021
tree b4f0e73c7c36f3a52b23593c40f1f49353ba67e3
parent 4d08142e287f852db3f4bfd614f2d73521bd7f07
author Harald Welte <[EMAIL PROTECTED]> Sa, 06 Aug 2005 18:11:00 +0200
committer Harald Welte <[EMAIL PROTECTED]> Sa, 06 Aug 2005 18:11:00 +0200

 net/ipv4/netfilter/ip_nat_standalone.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
--- a/net/ipv4/netfilter/ip_nat_standalone.c
+++ b/net/ipv4/netfilter/ip_nat_standalone.c
@@ -100,6 +100,10 @@ ip_nat_fn(unsigned int hooknum,
 		return NF_ACCEPT;
 	}
 
+	/* Don't try to NAT if this packet is not conntracked */
+	if (ct == &ip_conntrack_untracked)
+		return NF_ACCEPT;
+
 	switch (ctinfo) {
 	case IP_CT_RELATED:
 	case IP_CT_RELATED+IP_CT_IS_REPLY:
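For readers who want to see the effect of the four added lines in isolation, here is a minimal userspace sketch. It is not kernel code: the toy_* types, the reply_dst field and the skip_untracked switch are hypothetical stand-ins invented for illustration. The only ideas taken from the thread are that a single shared, zero-filled entry marks untracked packets, and that rewriting a packet from that entry yields an all-zero address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; NOT the kernel definitions. */
enum { NF_ACCEPT = 1 };

struct toy_conntrack {
	uint32_t reply_dst;		/* address NAT would copy into the packet */
};

struct toy_packet {
	uint32_t daddr;			/* destination address of the packet */
	struct toy_conntrack *ct;	/* conntrack entry attached to the packet */
};

/* One shared, zero-filled entry that stands for "not tracked". */
static struct toy_conntrack toy_untracked;

/*
 * Toy NAT hook.  With skip_untracked == 0 it rewrites every packet from
 * its attached entry, which models the behaviour reported in the thread.
 * With skip_untracked != 0 it returns early for the sentinel entry, which
 * models the check added by the patch.
 */
static int toy_nat_fn(struct toy_packet *pkt, int skip_untracked)
{
	assert(pkt != NULL && pkt->ct != NULL);

	if (skip_untracked && pkt->ct == &toy_untracked)
		return NF_ACCEPT;		/* leave the packet untouched */

	pkt->daddr = pkt->ct->reply_dst;	/* "translate" the address */
	return NF_ACCEPT;
}
```

Calling toy_nat_fn() on a packet attached to toy_untracked with skip_untracked set to 0 leaves daddr at 0, which corresponds to the 0.0.0.0 destination seen in the bogus ICMP errors; with skip_untracked set to 1 the address is preserved.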
Re: [PATCH] Fix erroneous ICMP corruption with iptable_nat and NOTRACK (was Re: ICMP broken in 2.6.13-rc5)
[ Cleaning up Cc list ]

On Mon, Aug 08, 2005 at 12:34:00AM +0200, Patrick McHardy wrote:
> > Looking at the latest traces Vladimir sent me, there is another case,
> > too.
>
> Yes, but nat_packet checks if manips have actually been set up before
> touching the packet. This can never happen for the untracked entry
> because it is initialized with IPS_NAT_DONE_MASK in ip_nat_core.
> I guess we can remove this now:
>
>	/* Initialize fake conntrack so that NAT will skip it */
>	ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK;

Yes, that is true.  However, we shouldn't push that immediately; it
should be tested before mainline inclusion.

I'm still trying to add a testcase for the original problem to the
nfsim-testsuite, so we can verify the problem is gone.

-- 
- Harald Welte <[EMAIL PROTECTED]>                 http://gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
                                          (ETSI EN 300 175-7 Ch. A6)
Re: [PATCH] Fix erroneous ICMP corruption with iptable_nat and NOTRACK (was Re: ICMP broken in 2.6.13-rc5)
Harald Welte wrote:
> On Sun, Aug 07, 2005 at 08:42:56PM +0200, Patrick McHardy wrote:
>
>>The conntrack reference is manually attached to locally generated ICMP
>>errors and icmp_reply_translation() doesn't check if NAT mappings have
>>been set up but simply replaces IP/port by what is stored in the
>>untracked conntrack entry, which is all 0's.
>
> ah, manually attached references, I forgot about them.
>
> Looking at the latest traces Vladimir sent me, there is another case,
> too.

Yes, but nat_packet checks if manips have actually been set up before
touching the packet. This can never happen for the untracked entry
because it is initialized with IPS_NAT_DONE_MASK in ip_nat_core.
I guess we can remove this now:

	/* Initialize fake conntrack so that NAT will skip it */
	ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK;

-
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Fix erroneous ICMP corruption with iptable_nat and NOTRACK (was Re: ICMP broken in 2.6.13-rc5)
On Sun, Aug 07, 2005 at 08:42:56PM +0200, Patrick McHardy wrote:
> Harald Welte wrote:
> > On Sun, Aug 07, 2005 at 05:18:06PM +0200, Harald Welte wrote:
> >
> >>The following trivial patch was confirmed to solve the problem.  Patrick
> >>also has no objections, so please apply this to mainline.
> >
> > Please hold it back for another minute.  I'm still puzzled by this
> > problem.  I can neither reproduce it nor understand how the code could
> > end up in a state where it would try to do NAT on untracked connections.
>
> The conntrack reference is manually attached to locally generated ICMP
> errors and icmp_reply_translation() doesn't check if NAT mappings have
> been set up but simply replaces IP/port by what is stored in the
> untracked conntrack entry, which is all 0's.

ah, manually attached references, I forgot about them.

Looking at the latest traces Vladimir sent me, there is another case,
too.

Dave: Please go ahead and apply the patch (attached again for reference)

-- 
- Harald Welte <[EMAIL PROTECTED]>                 http://gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
                                          (ETSI EN 300 175-7 Ch. A6)

[NETFILTER] don't try to do any NAT on untracked connections

With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing
NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no
longer sufficient.  The ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK
effectively prevents iteration of the 'nat' table, but doesn't prevent
nat_packet() to be executed.  Since nr_manips is gone in 'rustynat',
nat_packet() now implicitly thinks that it has to do NAT on the packet.

This patch fixes that problem by explicitly checking for
ip_conntrack_untracked in ip_nat_fn().

Signed-off-by: Harald Welte <[EMAIL PROTECTED]>

---
commit c16fd4ffed6349d0888cd97a75d04394dac42021
tree b4f0e73c7c36f3a52b23593c40f1f49353ba67e3
parent 4d08142e287f852db3f4bfd614f2d73521bd7f07
author Harald Welte <[EMAIL PROTECTED]> Sa, 06 Aug 2005 18:11:00 +0200
committer Harald Welte <[EMAIL PROTECTED]> Sa, 06 Aug 2005 18:11:00 +0200

 net/ipv4/netfilter/ip_nat_standalone.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
--- a/net/ipv4/netfilter/ip_nat_standalone.c
+++ b/net/ipv4/netfilter/ip_nat_standalone.c
@@ -100,6 +100,10 @@ ip_nat_fn(unsigned int hooknum,
 		return NF_ACCEPT;
 	}
 
+	/* Don't try to NAT if this packet is not conntracked */
+	if (ct == &ip_conntrack_untracked)
+		return NF_ACCEPT;
+
 	switch (ctinfo) {
 	case IP_CT_RELATED:
 	case IP_CT_RELATED+IP_CT_IS_REPLY:
Re: [PATCH] Fix erroneous ICMP corruption with iptable_nat and NOTRACK (was Re: ICMP broken in 2.6.13-rc5)
Harald Welte wrote:
> On Sun, Aug 07, 2005 at 05:18:06PM +0200, Harald Welte wrote:
>
>>The following trivial patch was confirmed to solve the problem.  Patrick
>>also has no objections, so please apply this to mainline.
>
> Please hold it back for another minute.  I'm still puzzled by this
> problem.  I can neither reproduce it nor understand how the code could
> end up in a state where it would try to do NAT on untracked connections.

The conntrack reference is manually attached to locally generated ICMP
errors and icmp_reply_translation() doesn't check if NAT mappings have
been set up but simply replaces IP/port by what is stored in the
untracked conntrack entry, which is all 0's.
Re: [PATCH] Fix erroneous ICMP corruption with iptable_nat and NOTRACK (was Re: ICMP broken in 2.6.13-rc5)
On Sun, Aug 07, 2005 at 06:44:15PM +0200, Harald Welte wrote:
> On Sun, Aug 07, 2005 at 05:18:06PM +0200, Harald Welte wrote:
> > Hi Dave!
> >
> > The following trivial patch was confirmed to solve the problem.  Patrick
> > also has no objections, so please apply this to mainline.
>
> Please hold it back for another minute.  I'm still puzzled by this
> problem.  I can neither reproduce it nor understand how the code could
> end up in a state where it would try to do NAT on untracked connections.
>
> Vladimir: Can you please send me the output of "iptables -t raw -L -vn"

Well, that's pretty complex. See below.

>
> are you sure the locally-generated ICMP errors in OUTPUT are matched by
> your NOTRACK rules?

Yes, I am sure, here is a simple test:

/sbin/iptables -t raw -I PREROUTING -s 172.16.16.10 -d 172.16.0.12 -j NOTRACK
/sbin/iptables -t raw -I PREROUTING -d 172.16.16.10 -s 172.16.0.12 -j NOTRACK
/sbin/iptables -t raw -I OUTPUT -s 172.16.16.1 -d 172.16.16.10 -j NOTRACK

And after a tracepath test we got:

Chain PREROUTING (policy ACCEPT 3225878 packets, 3033381627 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       1        576 NOTRACK    all  --  *      *       172.16.0.12          172.16.16.10
       3       4480 NOTRACK    all  --  *      *       172.16.16.10         172.16.0.12

Chain OUTPUT (policy ACCEPT 29774 packets, 973 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       2       1152 NOTRACK    all  --  *      *       172.16.16.1          172.16.16.10

All is working as expected.
Even when I delete rule from OUTPUT chain, it continues to work:

Chain PREROUTING (policy ACCEPT 6206384 packets, 5804528324 bytes)
    pkts      bytes target     prot opt in     out     source               destination
       2       1152 NOTRACK    all  --  *      *       172.16.0.12          172.16.16.10
       6       8960 NOTRACK    all  --  *      *       172.16.16.10         172.16.0.12

Real setup is more complex. 172.16.0.0/16 and 10.0.0.0/8 are local
prefixes. 172.16.0.13 is some special address, an exception from NOTRACK
rules, to allow it to serve for NAT from some external network.

=== START
Chain PREROUTING (policy ACCEPT 8491590 packets, 7917955822 bytes)
    pkts      bytes target     prot opt in     out     source               destination
  752775  692879468 ppp_masq   all  --  *      *       0.0.0.0/0            0.0.0.0/0
  752724  692828020 notrack_localif  all  --  *      *       0.0.0.0/0            0.0.0.0/0
  752691  692801984 notrack_src  all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain OUTPUT (policy ACCEPT 74664 packets, 26857077 bytes)
    pkts      bytes target     prot opt in     out     source               destination
    5908    1903792 notrack_src  all  --  *      *       0.0.0.0/0            0.0.0.0/0

Chain notrack_dst (2 references)
    pkts      bytes target     prot opt in     out     source               destination
  421071  276130730 NOTRACK    all  --  *      *       0.0.0.0/0            172.16.0.0/16
  320827  410642431 NOTRACK    all  --  *      *       0.0.0.0/0            10.0.0.0/8

Chain notrack_localif (1 references)
    pkts      bytes target     prot opt in     out     source               destination
       0          0 NOTRACK    all  --  eth4   *       0.0.0.0/0            0.0.0.0/0
       0          0 NOTRACK    all  --  eth5   *       0.0.0.0/0            0.0.0.0/0
       0          0 NOTRACK    all  --  lh     *       0.0.0.0/0            0.0.0.0/0
       6        284 NOTRACK    all  --  eth3   *       0.0.0.0/0            0.0.0.0/0
    2147    2540914 NOTRACK    all  --  vlan0170 *     0.0.0.0/0            0.0.0.0/0
   31304   30137873 NOTRACK    all  --  eth2   *       0.0.0.0/0            0.0.0.0/0
  144676      13944 NOTRACK    all  --  vlan0172 *     0.0.0.0/0            0.0.0.0/0
  266468  363612812 NOTRACK    all  --  vlan0173 *     0.0.0.0/0            0.0.0.0/0
    8803    2774738 NOTRACK    all  --  eth1   *       0.0.0.0/0            0.0.0.0/0
       0          0 NOTRACK    all  --  vlan0181 *     0.0.0.0/0            0.0.0.0/0
   50455   26026196 NOTRACK    all  --  vlan0175 *     0.0.0.0/0            0.0.0.0/0
  232416  122431248 NOTRACK    all  --  vlan0176 *     0.0.0.0/0            0.0.0.0/0
       0          0 NOTRACK    all  --  fdsnet *       0.0.0.0/0            0.0.0.0/0
       0          0 NOTRACK    all  --  voip   *       0.0.0.0/0            0.0.0.0/0

Chain notrack_src (2 references)
    pkts      bytes target     prot opt in     out     source               destination
  509677  564485901
Re: [PATCH] Fix erroneous ICMP corruption with iptable_nat and NOTRACK (was Re: ICMP broken in 2.6.13-rc5)
On Sun, Aug 07, 2005 at 05:18:06PM +0200, Harald Welte wrote:
> Hi Dave!
>
> The following trivial patch was confirmed to solve the problem.  Patrick
> also has no objections, so please apply this to mainline.

Please hold it back for another minute.  I'm still puzzled by this
problem.  I can neither reproduce it nor understand how the code could
end up in a state where it would try to do NAT on untracked connections.

Vladimir: Can you please send me the output of "iptables -t raw -L -vn"

are you sure the locally-generated ICMP errors in OUTPUT are matched by
your NOTRACK rules?

-- 
- Harald Welte <[EMAIL PROTECTED]>                 http://gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
                                          (ETSI EN 300 175-7 Ch. A6)
[PATCH] Fix erroneous ICMP corruption with iptable_nat and NOTRACK (was Re: ICMP broken in 2.6.13-rc5)
Hi Dave!

The following trivial patch was confirmed to solve the problem.  Patrick
also has no objections, so please apply this to mainline.

I'm undecided whether it should go into 2.6.12.x, since the problem only
occurs in very rare usage cases.  OTOH, the fix is very trivial... so I
leave it up to you, whether to push it to 2.6.12.x or not ;)

Thanks,
	Harald
-- 
- Harald Welte <[EMAIL PROTECTED]>                 http://gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
                                          (ETSI EN 300 175-7 Ch. A6)

[NETFILTER] don't try to do any NAT on untracked connections

With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing
NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no
longer sufficient.  The ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK
effectively prevents iteration of the 'nat' table, but doesn't prevent
nat_packet() to be executed.  Since nr_manips is gone in 'rustynat',
nat_packet() now implicitly thinks that it has to do NAT on the packet.

This patch fixes that problem by explicitly checking for
ip_conntrack_untracked in ip_nat_fn().

Signed-off-by: Harald Welte <[EMAIL PROTECTED]>

---
commit c16fd4ffed6349d0888cd97a75d04394dac42021
tree b4f0e73c7c36f3a52b23593c40f1f49353ba67e3
parent 4d08142e287f852db3f4bfd614f2d73521bd7f07
author Harald Welte <[EMAIL PROTECTED]> Sa, 06 Aug 2005 18:11:00 +0200
committer Harald Welte <[EMAIL PROTECTED]> Sa, 06 Aug 2005 18:11:00 +0200

 net/ipv4/netfilter/ip_nat_standalone.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
--- a/net/ipv4/netfilter/ip_nat_standalone.c
+++ b/net/ipv4/netfilter/ip_nat_standalone.c
@@ -100,6 +100,10 @@ ip_nat_fn(unsigned int hooknum,
 		return NF_ACCEPT;
 	}
 
+	/* Don't try to NAT if this packet is not conntracked */
+	if (ct == &ip_conntrack_untracked)
+		return NF_ACCEPT;
+
 	switch (ctinfo) {
 	case IP_CT_RELATED:
 	case IP_CT_RELATED+IP_CT_IS_REPLY:
Re: ICMP broken in 2.6.13-rc5
On Sat, Aug 06, 2005 at 06:17:56PM +0200, Harald Welte wrote:
> Ok, I re-thought.  Given the following assumptions (combined from your
> three mails):
>
> 1) tcp/udp packets are matched by NOTRACK
> 2) icmp errors for packets in '1' are matched by NOTRACK
> 3) there are no NAT rules that affect the packets in '1' and '2'

Yes, that holds true in my setup.

>
> I see a case where packets get corrupted within iptable_nat.  Please try
> the (untested) patch attached to my mail.

Tested, works for me, thank you.

>
> Still, my initial comments about this being an invalid setup stand.
> The NAT code needs to see all packets/connections in order to learn
> about used port/ip tuples.  Otherwise, when allocating a tuple, it could
> reuse a tuple that is already used by a non-NAT'ed connection.

It depends on the rules, doesn't it? In my case, it can not.

>
> So using nat in combination with NOTRACK should be prevented.  I'll hack
> up a patch for that, too.
>
> --
> - Harald Welte <[EMAIL PROTECTED]>                http://gnumonks.org/
>
> "Privacy in residential applications is a desirable marketing option."
>                                           (ETSI EN 300 175-7 Ch. A6)

> [NETFILTER] don't try to do any NAT on untracked connections
>
> With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing
> NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no
> longer sufficient.  The ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK
> effectively prevents iteration of the 'nat' table, but doesn't prevent
> nat_packet() to be executed.  Since nr_manips is gone in 'rustynat',
> nat_packet() now implicitly thinks that it has to do NAT on the packet.
>
> This patch fixes that problem by explicitly checking for
> ip_conntrack_untracked in ip_nat_fn().
>
> Signed-off-by: Harald Welte <[EMAIL PROTECTED]>
>
> ---
> commit c16fd4ffed6349d0888cd97a75d04394dac42021
> tree b4f0e73c7c36f3a52b23593c40f1f49353ba67e3
> parent 4d08142e287f852db3f4bfd614f2d73521bd7f07
> author Harald Welte <[EMAIL PROTECTED]> Sa, 06 Aug 2005 18:11:00 +0200
> committer Harald Welte <[EMAIL PROTECTED]> Sa, 06 Aug 2005 18:11:00 +0200
>
>  net/ipv4/netfilter/ip_nat_standalone.c |    4 ++++
>  1 files changed, 4 insertions(+), 0 deletions(-)
>
> diff --git a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
> --- a/net/ipv4/netfilter/ip_nat_standalone.c
> +++ b/net/ipv4/netfilter/ip_nat_standalone.c
> @@ -100,6 +100,10 @@ ip_nat_fn(unsigned int hooknum,
>  		return NF_ACCEPT;
>  	}
>
> +	/* Don't try to NAT if this packet is not conntracked */
> +	if (ct == &ip_conntrack_untracked)
> +		return NF_ACCEPT;
> +
>  	switch (ctinfo) {
>  	case IP_CT_RELATED:
>  	case IP_CT_RELATED+IP_CT_IS_REPLY:

~
:wq
                                        With best regards,
                                           Vladimir Savkin.
Re: ICMP broken in 2.6.13-rc5
On Sat, Aug 06, 2005 at 01:25:43PM +0400, Vladimir B. Savkin wrote:
> On Sat, Aug 06, 2005 at 11:13:37AM +0200, Harald Welte wrote:
> > On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
> > > I found that it really is NOTRACK who causes bogus ICMP errors.
> >
> > Well, this means that your ICMP errors need to be NAT'ed but they
> > cannot, since the original connection causing the ICMP error did not go
> > through connection tracking.
>
> How so, when there are no NAT rules that can match either source packets
> or ICMP errors?

Ok, I re-thought.  Given the following assumptions (combined from your
three mails):

1) tcp/udp packets are matched by NOTRACK
2) icmp errors for packets in '1' are matched by NOTRACK
3) there are no NAT rules that affect the packets in '1' and '2'

I see a case where packets get corrupted within iptable_nat.  Please try
the (untested) patch attached to my mail.

Still, my initial comments about this being an invalid setup stand.
The NAT code needs to see all packets/connections in order to learn
about used port/ip tuples.  Otherwise, when allocating a tuple, it could
reuse a tuple that is already used by a non-NAT'ed connection.

So using nat in combination with NOTRACK should be prevented.  I'll hack
up a patch for that, too.

-- 
- Harald Welte <[EMAIL PROTECTED]>                 http://gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
                                          (ETSI EN 300 175-7 Ch. A6)

[NETFILTER] don't try to do any NAT on untracked connections

With the introduction of 'rustynat' in 2.6.11, the old tricks of preventing
NAT of 'untracked' connections (e.g. NOTRACK target in 'raw' table) are no
longer sufficient.  The ip_conntrack_untracked.status |= IPS_NAT_DONE_MASK
effectively prevents iteration of the 'nat' table, but doesn't prevent
nat_packet() to be executed.  Since nr_manips is gone in 'rustynat',
nat_packet() now implicitly thinks that it has to do NAT on the packet.

This patch fixes that problem by explicitly checking for
ip_conntrack_untracked in ip_nat_fn().

Signed-off-by: Harald Welte <[EMAIL PROTECTED]>

---
commit c16fd4ffed6349d0888cd97a75d04394dac42021
tree b4f0e73c7c36f3a52b23593c40f1f49353ba67e3
parent 4d08142e287f852db3f4bfd614f2d73521bd7f07
author Harald Welte <[EMAIL PROTECTED]> Sa, 06 Aug 2005 18:11:00 +0200
committer Harald Welte <[EMAIL PROTECTED]> Sa, 06 Aug 2005 18:11:00 +0200

 net/ipv4/netfilter/ip_nat_standalone.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/netfilter/ip_nat_standalone.c b/net/ipv4/netfilter/ip_nat_standalone.c
--- a/net/ipv4/netfilter/ip_nat_standalone.c
+++ b/net/ipv4/netfilter/ip_nat_standalone.c
@@ -100,6 +100,10 @@ ip_nat_fn(unsigned int hooknum,
 		return NF_ACCEPT;
 	}
 
+	/* Don't try to NAT if this packet is not conntracked */
+	if (ct == &ip_conntrack_untracked)
+		return NF_ACCEPT;
+
 	switch (ctinfo) {
 	case IP_CT_RELATED:
 	case IP_CT_RELATED+IP_CT_IS_REPLY:
Re: ICMP broken in 2.6.13-rc5
On Sat, Aug 06, 2005 at 04:58:46PM +0200, Patrick McHardy wrote:
> Harald Welte wrote:
> >On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
> >
> >>I found that it really is NOTRACK who causes bogus ICMP errors.
>
> Good work tracking this down. I've seen reports of this before, but
> never found the reason.
>
> >Well, this means that your ICMP errors need to be NAT'ed but they
> >cannot, since the original connection causing the ICMP error did not go
> >through connection tracking.
> >
> >Your not-correctly-NATed ICMP packets are the logical result of this
> >configuration.
> >
> >Use of NOTRACK in combination with NAT is _extremely_ dangerous, and
> >unless you understand its full implications, I would not recommend
> >combining the two.
> >
> >So it seems your use of NOTRACK is invalid in this setup - and thus like
> >a configuration problem.
>
> I disagree, NAT already ignores untracked connections in most places,
> just icmp_reply_translation is missing.
>
> Vladimir, can you please test the attached patch?

No success, looks that with this patch no ICMP replies are generated (*),
no matter whether there exist any NOTRACK rules.

(*) I only tested that no replies were received by the client
    (broken tracepath) and that there were no bogus packets on loopback.

> diff --git a/net/ipv4/netfilter/ip_nat_core.c b/net/ipv4/netfilter/ip_nat_core.c
> --- a/net/ipv4/netfilter/ip_nat_core.c
> +++ b/net/ipv4/netfilter/ip_nat_core.c
> @@ -430,6 +430,19 @@ int icmp_reply_translation(struct sk_buf
>  	} *inside;
>  	struct ip_conntrack_tuple inner, target;
>  	int hdrlen = (*pskb)->nh.iph->ihl * 4;
> +	unsigned long statusbit;
> +
> +	if (manip == IP_NAT_MANIP_SRC)
> +		statusbit = IPS_SRC_NAT;
> +	else
> +		statusbit = IPS_DST_NAT;
> +
> +	/* Invert if this is reply dir. */
> +	if (dir == IP_CT_DIR_REPLY)
> +		statusbit ^= IPS_NAT_MASK;
> +
> +	if (!(ct->status & statusbit))
> +		return 0;
>
>  	if (!skb_make_writable(pskb, hdrlen + sizeof(*inside)))
>  		return 0;

~
:wq
                                        With best regards,
                                           Vladimir Savkin.
Re: ICMP broken in 2.6.13-rc5
On Sat, Aug 06, 2005 at 05:12:01PM +0200, Harald Welte wrote:
> > > Well, this means that your ICMP errors need to be NAT'ed but they
> > > cannot, since the original connection causing the ICMP error did not go
> > > through connection tracking.
> >
> > How so, when there are no NAT rules that can match either source packets
> > or ICMP errors?
>
> As soon as you load NAT, _all_ connections need to be tracked, since
> those with no NAT configured need to "allocate a null binding".
>
> NAT needs to know about all connections, since otherwise it would not be
> able to learn about all already-used port/ip tuples.
>
> So independent of the specific ICMP problem you're observing, the
> configuration seems broken to me in the first place.
>
> It remains to be questioned, whether we should deal more gracefully with
> such a setup, though.

In my case, I have local network and Internet access.
Local traffic (packets which have both src and dst IP belonging to
local prefix) does not need to be NATed or statefully filtered.
So I wanted to use NOTRACK for maximum forwarding performance.
ICMP errors were matched by NOTRACK too (in OUTPUT chain of raw table),
as they also have local src and dst. IMO, this means that there should be
no NAT attempts for this ICMP packet...

I think of this as a valuable feature of Linux - using one box for
two or more applications, in my case - local router (no NAT, no stateful
filtering, maximum performance) and Internet gateway (with NAT,
more filtering, maximum control).

> But discussions like this are one of the reasons why we thought very
> hard whether we should include the NOTRACK target into mainline at all.
> It is dangerous, and a lot of people will use it in combination and end
> up with broken configuration.
>
> I think we should make NOTRACK and NAT an XOR, i.e. only allow one of
> them to be enabled at any given time.
>

Well, this would break this feature which worked very well for me with
older kernels.

~
:wq
                                        With best regards,
                                           Vladimir Savkin.
Re: ICMP broken in 2.6.13-rc5
Harald Welte wrote:
> As soon as you load NAT, _all_ connections need to be tracked, since
> those with no NAT configured need to "allocate a null binding".
>
> NAT needs to know about all connections, since otherwise it would not be
> able to learn about all already-used port/ip tuples.
>
> So independent of the specific ICMP problem you're observing, the
> configuration seems broken to me in the first place.
>
> It remains to be questioned, whether we should deal more gracefully with
> such a setup, though.
>
> But discussions like this are one of the reasons why we thought very
> hard whether we should include the NOTRACK target into mainline at all.
> It is dangerous, and a lot of people will use it in combination and end
> up with broken configuration.
>
> I think we should make NOTRACK and NAT an XOR, i.e. only allow one of
> them to be enabled at any given time.

I don't see how this can work except on a global scale, which would
mean I can't exclude loopback traffic from tracking on a box that
does NAT. I think dealing with this case correctly (don't break
the ICMP packets) and letting the user make sure he doesn't
create rules that result in collisions is a better solution.
Re: ICMP broken in 2.6.13-rc5
On Sat, Aug 06, 2005 at 01:25:43PM +0400, Vladimir B. Savkin wrote:
> On Sat, Aug 06, 2005 at 11:13:37AM +0200, Harald Welte wrote:
> > On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
> > > I found that it really is NOTRACK who causes bogus ICMP errors.
> >
> > Well, this means that your ICMP errors need to be NAT'ed but they
> > cannot, since the original connection causing the ICMP error did not go
> > through connection tracking.
>
> How so, when there are no NAT rules that can match either source packets
> or ICMP errors?

As soon as you load NAT, _all_ connections need to be tracked, since
those with no NAT configured need to "allocate a null binding".

NAT needs to know about all connections, since otherwise it would not be
able to learn about all already-used port/ip tuples.

So independent of the specific ICMP problem you're observing, the
configuration seems broken to me in the first place.

It remains to be questioned, whether we should deal more gracefully with
such a setup, though.

But discussions like this are one of the reasons why we thought very
hard whether we should include the NOTRACK target into mainline at all.
It is dangerous, and a lot of people will use it in combination and end
up with broken configuration.

I think we should make NOTRACK and NAT an XOR, i.e. only allow one of
them to be enabled at any given time.

-- 
- Harald Welte <[EMAIL PROTECTED]>                 http://gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
                                          (ETSI EN 300 175-7 Ch. A6)
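The tuple-collision concern raised above can be illustrated with a small userspace model. This is only a sketch under loud assumptions: the toy_* names, the eight-port range and the "lowest free port" policy are invented for illustration and do not reflect how the kernel NAT code actually allocates tuples. It only shows why an allocator that never hears about a connection may hand out a port that connection already uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical port range for illustration only. */
#define TOY_PORT_MIN	1024
#define TOY_PORT_MAX	1031
#define TOY_PORT_COUNT	(TOY_PORT_MAX - TOY_PORT_MIN + 1)

/* What the toy NAT knows about ports that are already taken. */
struct toy_nat_table {
	bool in_use[TOY_PORT_COUNT];
};

/*
 * Record a port as seen.  In this model this is what tracking a
 * connection buys, even when no translation is configured for it.
 */
static void toy_track(struct toy_nat_table *t, int port)
{
	assert(port >= TOY_PORT_MIN && port <= TOY_PORT_MAX);
	t->in_use[port - TOY_PORT_MIN] = true;
}

/* Hand out the lowest port not known to be in use, or -1 if none is left. */
static int toy_alloc_port(struct toy_nat_table *t)
{
	for (int i = 0; i < TOY_PORT_COUNT; i++) {
		if (!t->in_use[i]) {
			t->in_use[i] = true;
			return TOY_PORT_MIN + i;
		}
	}
	return -1;
}
```

If a connection on port 1024 has been recorded with toy_track(), the next toy_alloc_port() skips it. If the same connection bypassed tracking, the allocator returns 1024 again and the two flows collide. Whether such a collision can occur in practice depends on the rule set, which is the point made elsewhere in the thread.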
Re: ICMP broken in 2.6.13-rc5
Harald Welte wrote:
>On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
>
>>I found that it really is NOTRACK who causes bogus ICMP errors.

Good work tracking this down. I've seen reports of this before, but
never found the reason.

>Well, this means that your ICMP errors need to be NAT'ed but they
>cannot, since the original connection causing the ICMP error did not go
>through connection tracking.
>
>Your not-correctly-NATed ICMP packets are the logical result of this
>configuration.
>
>Use of NOTRACK in combination with NAT is _extremely_ dangerous, and
>unless you understand its full implications, I would not recommend
>combining the two.
>
>So it seems your use of NOTRACK is invalid in this setup - and thus like
>a configuration problem.

I disagree, NAT already ignores untracked connections in most places,
just icmp_reply_translation is missing.

Vladimir, can you please test the attached patch?

diff --git a/net/ipv4/netfilter/ip_nat_core.c b/net/ipv4/netfilter/ip_nat_core.c
--- a/net/ipv4/netfilter/ip_nat_core.c
+++ b/net/ipv4/netfilter/ip_nat_core.c
@@ -430,6 +430,19 @@ int icmp_reply_translation(struct sk_buf
 	} *inside;
 	struct ip_conntrack_tuple inner, target;
 	int hdrlen = (*pskb)->nh.iph->ihl * 4;
+	unsigned long statusbit;
+
+	if (manip == IP_NAT_MANIP_SRC)
+		statusbit = IPS_SRC_NAT;
+	else
+		statusbit = IPS_DST_NAT;
+
+	/* Invert if this is reply dir. */
+	if (dir == IP_CT_DIR_REPLY)
+		statusbit ^= IPS_NAT_MASK;
+
+	if (!(ct->status & statusbit))
+		return 0;
 
 	if (!skb_make_writable(pskb, hdrlen + sizeof(*inside)))
 		return 0;
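The status-bit selection in the patch above can be tried out in userspace with the sketch below. The TOY_* constants and the toy_* function names are hypothetical; in particular the bit positions (4 and 5) are an assumption made for illustration and should be checked against the kernel headers. The sketch shows only the XOR trick: flipping both NAT bits turns "source NAT" into "destination NAT" and back for the reply direction. It says nothing about whether the patch as posted behaves correctly, and the thread reports that it did not.

```c
#include <assert.h>

/* Assumed bit values for illustration; verify against the kernel headers. */
enum {
	TOY_IPS_SRC_NAT  = 1 << 4,
	TOY_IPS_DST_NAT  = 1 << 5,
	TOY_IPS_NAT_MASK = TOY_IPS_SRC_NAT | TOY_IPS_DST_NAT,
};

enum toy_manip { TOY_MANIP_SRC, TOY_MANIP_DST };
enum toy_dir { TOY_DIR_ORIGINAL, TOY_DIR_REPLY };

/* Pick the status bit that says "this kind of NAT was set up". */
static unsigned long toy_status_bit(enum toy_manip manip, enum toy_dir dir)
{
	unsigned long statusbit;

	assert(manip == TOY_MANIP_SRC || manip == TOY_MANIP_DST);

	if (manip == TOY_MANIP_SRC)
		statusbit = TOY_IPS_SRC_NAT;
	else
		statusbit = TOY_IPS_DST_NAT;

	/* In the reply direction source and destination swap roles. */
	if (dir == TOY_DIR_REPLY)
		statusbit ^= TOY_IPS_NAT_MASK;

	return statusbit;
}

/* Nonzero only if the entry's status has the relevant NAT bit set. */
static int toy_needs_translation(unsigned long ct_status,
				 enum toy_manip manip, enum toy_dir dir)
{
	return (ct_status & toy_status_bit(manip, dir)) != 0;
}
```

An entry whose status carries neither NAT bit makes toy_needs_translation() return 0 for every combination of manip and dir, which is the early-return case the patch was aiming for.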
Re: ICMP broken in 2.6.13-rc5
On Sat, Aug 06, 2005 at 11:13:37AM +0200, Harald Welte wrote:
> On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
> > I found that it really is NOTRACK who causes bogus ICMP errors.
>
> Well, this means that your ICMP errors need to be NAT'ed but they
> cannot, since the original connection causing the ICMP error did not go
> through connection tracking.

How so, when there are no NAT rules that can match either source packets
or ICMP errors?

~
:wq
                                        With best regards,
                                           Vladimir Savkin.
Re: ICMP broken in 2.6.13-rc5
On Sat, Aug 06, 2005 at 02:08:15AM +0400, Vladimir B. Savkin wrote:
> I found that it really is NOTRACK who causes bogus ICMP errors.

Well, this means that your ICMP errors need to be NAT'ed but they
cannot, since the original connection causing the ICMP error did not go
through connection tracking.

Your not-correctly-NATed ICMP packets are the logical result of this
configuration.

Use of NOTRACK in combination with NAT is _extremely_ dangerous, and
unless you understand its full implications, I would not recommend
combining the two.

So it seems your use of NOTRACK is invalid in this setup - and thus like
a configuration problem.

-- 
- Harald Welte <[EMAIL PROTECTED]>                 http://gnumonks.org/

"Privacy in residential applications is a desirable marketing option."
                                          (ETSI EN 300 175-7 Ch. A6)
Re: ICMP broken in 2.6.13-rc5
On Fri, Aug 05, 2005 at 07:43:25PM +0200, Harald Welte wrote:
> On Wed, Aug 03, 2005 at 03:50:15PM +0400, Vladimir B. Savkin wrote:
> > Hello!
> >
> > When trying to upgrade a gateway from old 2.6.10-rc2 to
> > new 2.6.13-rc5, I noticed a flood of messages like
> > "172.16.12.1 sent an invalid ICMP type 11, code 0 error to a broadcast:
> > 0.0.0.0"
> > Source IP is always that of this gateway, destination IP is always 0.0.0.0.
>
> could you please describe your setup in detail?  Who/what creates those
> error messages?
>
> Can you send 'tcpdump -w' captures of the original packet causing the
> error, and the corresponding icmp packet?
>

I found that it really is NOTRACK who causes bogus ICMP errors.

Here is the test setup.
vlan0173 is a vlan device (MTU=1500)
lh is an IPIP tunnel (MTU=1480)

# ip ro ls dev vlan0173
172.16.16.0/22  proto kernel  scope link  src 172.16.16.1
# ip ro ls dev lh
172.16.0.12  scope link

172.16.16.1 is an IP of gateway associated with vlan0173

I run tracepath from 172.16.16.10 to 172.16.0.12

Without any NOTRACK rules, all is fine:

$ /usr/sbin/tracepath -n 172.16.0.12
 1:  172.16.16.10        0.564ms pmtu 1500
 1:  172.16.16.1         2.795ms
 2:  172.16.16.1         asymm  1   0.595ms pmtu 1480
 3:  172.16.0.12         asymm  2   2.377ms reached
     Resume: pmtu 1480 hops 3 back 2

Then I add NOTRACK rule:

# iptables -t raw -I PREROUTING -s 172.16.16.10 -d 172.16.0.12 -j NOTRACK

and start tcpdump to capture test packets and ICMP replies.
Test packets arrive on "vlan0173", bogus ICMP errors go to "lo",
PMTU discovery breaks.
Dumps are attached.

~
:wq
                                        With best regards,
                                           Vladimir Savkin.

Attachments: DUMP-lo, DUMP-vlan0173
Re: ICMP broken in 2.6.13-rc5
On Wed, Aug 03, 2005 at 03:50:15PM +0400, Vladimir B. Savkin wrote:
> Hello!
>
> When trying to upgrade a gateway from old 2.6.10-rc2 to
> new 2.6.13-rc5, I noticed a flood of messages like
> "172.16.12.1 sent an invalid ICMP type 11, code 0 error to a broadcast:
> 0.0.0.0"
> Source IP is always that of this gateway, destination IP is always 0.0.0.0.

could you please describe your setup in detail? Who/what creates those error messages?

Can you send 'tcpdump -w' captures of the original packet causing the error, and the corresponding icmp packet?

--
- Harald Welte <[EMAIL PROTECTED]> http://gnumonks.org/
"Privacy in residential applications is a desirable marketing option."
(ETSI EN 300 175-7 Ch. A6)
ICMP broken in 2.6.13-rc5
Hello!

When trying to upgrade a gateway from old 2.6.10-rc2 to new 2.6.13-rc5, I noticed a flood of messages like
"172.16.12.1 sent an invalid ICMP type 11, code 0 error to a broadcast: 0.0.0.0"
Source IP is always that of this gateway, destination IP is always 0.0.0.0.

It looks like it tries to send _every_ ICMP error to 0.0.0.0 instead of to the origin of the packet that triggered the ICMP error.

A similar problem was discussed here:
http://marc.theaimsgroup.com/?l=linux-kernel&m=112066911128962&w=2
but without a resolution.

In my case, ICMP error generation is totally broken. I use ip_conntrack, and NOTRACK rules in the "raw" table. The packets that caused the ICMP errors in question were matched by those NOTRACK rules.

I attach my kernel config and lsmod output. The system is x86_64 (dual Opteron), using gcc 3.3.6 (Debian 1:3.3.6-6)

~
:wq
With best regards,
Vladimir Savkin.

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.13-rc5
# Wed Aug 3 05:42:32 2005
#
CONFIG_X86_64=y
CONFIG_64BIT=y
CONFIG_X86=y
CONFIG_MMU=y
CONFIG_RWSEM_GENERIC_SPINLOCK=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_X86_CMPXCHG=y
CONFIG_EARLY_PRINTK=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_IOMAP=y

#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
CONFIG_CLEAN_COMPILE=y
CONFIG_LOCK_KERNEL=y
CONFIG_INIT_ENV_ARG_LIMIT=32

#
# General setup
#
CONFIG_LOCALVERSION=""
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y
# CONFIG_AUDIT is not set
CONFIG_HOTPLUG=y
CONFIG_KOBJECT_UEVENT=y
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
# CONFIG_CPUSETS is not set
# CONFIG_EMBEDDED is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_EPOLL=y
CONFIG_SHMEM=y
CONFIG_CC_ALIGN_FUNCTIONS=0
CONFIG_CC_ALIGN_LABELS=0
CONFIG_CC_ALIGN_LOOPS=0
CONFIG_CC_ALIGN_JUMPS=0
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0

#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_OBSOLETE_MODPARM=y
# CONFIG_MODVERSIONS is not set
# CONFIG_MODULE_SRCVERSION_ALL is not set
CONFIG_KMOD=y
CONFIG_STOP_MACHINE=y

#
# Processor type and features
#
CONFIG_MK8=y
# CONFIG_MPSC is not set
# CONFIG_GENERIC_CPU is not set
CONFIG_X86_L1_CACHE_BYTES=64
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_TSC=y
CONFIG_X86_GOOD_APIC=y
# CONFIG_MICROCODE is not set
CONFIG_X86_MSR=y
CONFIG_X86_CPUID=y
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_MTRR=y
CONFIG_SMP=y
# CONFIG_SCHED_SMT is not set
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
# CONFIG_PREEMPT_BKL is not set
CONFIG_K8_NUMA=y
# CONFIG_NUMA_EMU is not set
CONFIG_ARCH_DISCONTIGMEM_ENABLE=y
CONFIG_NUMA=y
CONFIG_ARCH_DISCONTIGMEM_DEFAULT=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
CONFIG_DISCONTIGMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_DISCONTIGMEM=y
CONFIG_FLAT_NODE_MEM_MAP=y
CONFIG_NEED_MULTIPLE_NODES=y
CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID=y
CONFIG_HAVE_DEC_LOCK=y
CONFIG_NR_CPUS=2
# CONFIG_HOTPLUG_CPU is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_GART_IOMMU=y
CONFIG_SWIOTLB=y
CONFIG_X86_MCE=y
CONFIG_X86_MCE_INTEL=y
CONFIG_PHYSICAL_START=0x10
# CONFIG_KEXEC is not set
CONFIG_SECCOMP=y
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_1000=y
CONFIG_HZ=1000
CONFIG_GENERIC_HARDIRQS=y
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_ISA_DMA_API=y

#
# Power management options
#
# CONFIG_PM is not set

#
# CPU Frequency scaling
#
# CONFIG_CPU_FREQ is not set

#
# Bus options (PCI etc.)
#
CONFIG_PCI=y
CONFIG_PCI_DIRECT=y
# CONFIG_UNORDERED_IO is not set
# CONFIG_PCIEPORTBUS is not set
# CONFIG_PCI_MSI is not set
CONFIG_PCI_LEGACY_PROC=y
CONFIG_PCI_NAMES=y
CONFIG_PCI_DEBUG=y

#
# PCCARD (PCMCIA/CardBus) support
#
# CONFIG_PCCARD is not set

#
# PCI Hotplug Support
#
# CONFIG_HOTPLUG_PCI is not set

#
# Executable file formats / Emulations
#
CONFIG_BINFMT_ELF=y
# CONFIG_BINFMT_MISC is not set
CONFIG_IA32_EMULATION=y
# CONFIG_IA32_AOUT is not set
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y
CONFIG_UID16=y

#
# Networking
#
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=y
CONFIG_XFRM=y
CONFIG_XFRM_USER=y
CONFIG_NET_KEY=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_IP_ADVANCED_ROUTER=y
# CONFIG_ASK_IP_FIB_HASH is not set
CONFIG_IP_FIB_TRIE=y
# CONFIG_IP_FIB_HASH is not set
CONFIG_IP_MULTIPLE_TABLES=y
CONFIG_IP_ROUTE_FWMARK=y
CONFIG_IP_ROUTE_MULTIPATH=y
# CONFIG_IP_ROUTE_MULTIPATH_CACHED is not set
CONFIG_IP_ROUTE_VERBOSE=y
# CONFIG_IP_PNP is not set
CONFIG_NET_IPIP=y
CONFIG_NET_IPGRE=y
# CONFIG_NET_IPGRE_BROADCAST is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_SYN_COOKIES is not set
CONFIG_INET_AH=y
CONFIG_INET_ESP=y
CONFIG_INET_