Re: [bisected] Forwarded packets occasionally has loopback output interface in Netfilter

2018-02-14 Thread Anders K. Pedersen | Cohaesio
On man, 2018-01-22 at 15:54 +0100, Anders K. Pedersen | Cohaesio wrote:
> On tor, 2018-01-11 at 10:18 -0800, Wei Wang wrote:
> > On Thu, Jan 11, 2018 at 9:25 AM, Anders K. Pedersen | Cohaesio
> >  wrote:
> > > On tir, 2017-12-26 at 12:05 +0100, Anders K. Pedersen | Cohaesio
> > > wrote:
> > > > Hello,
> > > > 
> > > > On one of our border routers, Netfilter is occasionally logging
> > > > packets
> > > > with "OUT=lo" (output interface lo) even though the packet should
> > > > be
> > > > going out via a regular interface. This behavior is present on
> > > > Linux
> > > > 4.13.0 to 4.14.9, and a bisection of the problem points to
> > > > 
> > > > [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call
> > > > dst_dev_put() properly
> > > > 
> > > > as the first bad commit. This commit adds dst_dev_put() calls
> > > > before
> > > > some dst_release() calls, and dst_dev_put() does
> > > > 
> > > > dst->dev = dev_net(dst->dev)->loopback_dev;
> > > > 
> > > > (among other things), which fits the problem we're seeing.
> > > > 
> > > > The essential part of our nftables rule set that shows this
> > > > behavior is
> > > > 
> > > > chain forward {
> > > > type filter hook forward priority 0;
> > > > 
> > > > meta oif { $internal_interfaces } accept
> > > > 
> > > > meta oif lo ip daddr != 127.0.0.0/8 \
> > > > log group 0 snaplen 80 prefix "oif-lo"
> > > > counter
> > > > 
> > > > ip saddr { $our_ip_series } \
> > > > flow table acct_out \
> > > > { meta oif . rt nexthop . ip saddr
> > > > timeout 12m counter } \
> > > > accept
> > > > 
> > > > log group 0 snaplen 80 prefix "DROP" counter drop
> > > > }
> > > > 
> > > > The router only does stateless packet filtering and no
> > > > redirection or
> > > > rewriting of the packets (connection tracking, NAT, ipvs etc. are
> > > > not
> > > > even compiled for this kernel).
> > > > 
> > > > As a result of this problem we see packets that should be going
> > > > to an
> > > > internal interface (and thus accepted by the first rule above)
> > > > being
> > > > logged and dropped by the last rule. Some examples:
> > > > 
> > > > Dec 22 11:57:02 cix4 oif-lo IN=eth10 OUT=lo
> > > > MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118
> > > > DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF
> > > > PROTO=TCP SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705
> > > > WINDOW=3295 ACK URGP=0 MARK=0
> > > > Dec 22 11:57:02 cix4 DROP IN=eth10 OUT=lo
> > > > MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118
> > > > DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF
> > > > PROTO=TCP SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705
> > > > WINDOW=3295 ACK URGP=0 MARK=0
> > > > 
> > > > Dec 22 12:47:07 cix4 oif-lo IN=eth10 OUT=lo
> > > > MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18
> > > > DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF
> > > > PROTO=TCP SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103
> > > > WINDOW=65535 ACK PSH URGP=0 MARK=0
> > > > Dec 22 12:47:07 cix4 DROP IN=eth10 OUT=lo
> > > > MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18
> > > > DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF
> > > > PROTO=TCP SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103
> > > > WINDOW=65535 ACK PSH URGP=0 MARK=0
> > > > 
> > > > Dec 22 12:53:56 cix4 oif-lo IN=eth10 OUT=lo
> > > > MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34
> > > > DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF
> > > > PROTO=TCP SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612
> > > > WINDOW=65535 ACK PSH URGP=0 MARK=0
> > > > Dec 22 12:53:56 cix4 DROP IN=eth10 OUT=lo
> > > > MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34
> > > > DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF
> > > > PROTO=TCP SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612
> > > > WINDOW=65535 ACK PSH URGP=0 MARK=0
> > > > 
> > > > It also happens for outbound traffic, where the packets are
> > > > logged and
> > > > counted in the acct_out flow table with "meta oif" = "lo", but a
> > > > correct "rt nexthop" - an example:
> > > > 
> > > > Dec 22 12:29:13 cix4 oif-lo IN=team0.20 OUT=lo
> > > > MAC=3c:fd:fe:15:db:a8:00:24:a8:ff:f0:00:08:00 SRC=212.97.129.25
> > > > DST=95.166.119.129 LEN=40 TOS=00 PREC=0x00 TTL=62 ID=19481 DF
> > > > PROTO=TCP SPT=443 DPT=52560 SEQ=3034827396 ACK=2862814901
> > > > WINDOW=12618 ACK URGP=0 MARK=0
> > > > 
> > > > # nft list flow table filter acct_out|tr ',' '\n'|grep lo
> > > > flow table acct_out {
> > > >  "lo" . 94.101.208.217 . 212.97.129.25 expires 3m17s : counter
> > > > packets 1 bytes 40
> > > > 
> > > > I don't know if these packets are actually sent out on the
> > > > correct
> > > > outbound interface thanks to the proper nexthop 

Re: [bisected] Forwarded packets occasionally has loopback output interface in Netfilter

2018-01-22 Thread Anders K. Pedersen | Cohaesio
On tor, 2018-01-11 at 10:18 -0800, Wei Wang wrote:
> On Thu, Jan 11, 2018 at 9:25 AM, Anders K. Pedersen | Cohaesio
>  wrote:
> > On tir, 2017-12-26 at 12:05 +0100, Anders K. Pedersen | Cohaesio
> > wrote:
> > > Hello,
> > > 
> > > On one of our border routers, Netfilter is occasionally logging
> > > packets
> > > with "OUT=lo" (output interface lo) even though the packet should
> > > be
> > > going out via a regular interface. This behavior is present on
> > > Linux
> > > 4.13.0 to 4.14.9, and a bisection of the problem points to
> > > 
> > > [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call
> > > dst_dev_put() properly
> > > 
> > > as the first bad commit. This commit adds dst_dev_put() calls
> > > before
> > > some dst_release() calls, and dst_dev_put() does
> > > 
> > > dst->dev = dev_net(dst->dev)->loopback_dev;
> > > 
> > > (among other things), which fits the problem we're seeing.
> > > 
> > > The essential part of our nftables rule set that shows this
> > > behavior is
> > > 
> > > chain forward {
> > > type filter hook forward priority 0;
> > > 
> > > meta oif { $internal_interfaces } accept
> > > 
> > > meta oif lo ip daddr != 127.0.0.0/8 \
> > > log group 0 snaplen 80 prefix "oif-lo"
> > > counter
> > > 
> > > ip saddr { $our_ip_series } \
> > > flow table acct_out \
> > > { meta oif . rt nexthop . ip saddr
> > > timeout 12m counter } \
> > > accept
> > > 
> > > log group 0 snaplen 80 prefix "DROP" counter drop
> > > }
> > > 
> > > The router only does stateless packet filtering and no
> > > redirection or
> > > rewriting of the packets (connection tracking, NAT, ipvs etc. are
> > > not
> > > even compiled for this kernel).
> > > 
> > > As a result of this problem we see packets that should be going
> > > to an
> > > internal interface (and thus accepted by the first rule above)
> > > being
> > > logged and dropped by the last rule. Some examples:
> > > 
> > > Dec 22 11:57:02 cix4 oif-lo IN=eth10 OUT=lo
> > > MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118
> > > DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF
> > > PROTO=TCP SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705
> > > WINDOW=3295 ACK URGP=0 MARK=0
> > > Dec 22 11:57:02 cix4 DROP IN=eth10 OUT=lo
> > > MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118
> > > DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF
> > > PROTO=TCP SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705
> > > WINDOW=3295 ACK URGP=0 MARK=0
> > > 
> > > Dec 22 12:47:07 cix4 oif-lo IN=eth10 OUT=lo
> > > MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18
> > > DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF
> > > PROTO=TCP SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103
> > > WINDOW=65535 ACK PSH URGP=0 MARK=0
> > > Dec 22 12:47:07 cix4 DROP IN=eth10 OUT=lo
> > > MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18
> > > DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF
> > > PROTO=TCP SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103
> > > WINDOW=65535 ACK PSH URGP=0 MARK=0
> > > 
> > > Dec 22 12:53:56 cix4 oif-lo IN=eth10 OUT=lo
> > > MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34
> > > DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF
> > > PROTO=TCP SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612
> > > WINDOW=65535 ACK PSH URGP=0 MARK=0
> > > Dec 22 12:53:56 cix4 DROP IN=eth10 OUT=lo
> > > MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34
> > > DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF
> > > PROTO=TCP SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612
> > > WINDOW=65535 ACK PSH URGP=0 MARK=0
> > > 
> > > It also happens for outbound traffic, where the packets are
> > > logged and
> > > counted in the acct_out flow table with "meta oif" = "lo", but a
> > > correct "rt nexthop" - an example:
> > > 
> > > Dec 22 12:29:13 cix4 oif-lo IN=team0.20 OUT=lo
> > > MAC=3c:fd:fe:15:db:a8:00:24:a8:ff:f0:00:08:00 SRC=212.97.129.25
> > > DST=95.166.119.129 LEN=40 TOS=00 PREC=0x00 TTL=62 ID=19481 DF
> > > PROTO=TCP SPT=443 DPT=52560 SEQ=3034827396 ACK=2862814901
> > > WINDOW=12618 ACK URGP=0 MARK=0
> > > 
> > > # nft list flow table filter acct_out|tr ',' '\n'|grep lo
> > > flow table acct_out {
> > >  "lo" . 94.101.208.217 . 212.97.129.25 expires 3m17s : counter
> > > packets 1 bytes 40
> > > 
> > > I don't know if these packets are actually sent out on the
> > > correct
> > > outbound interface thanks to the proper nexthop (the MAC=
> > > information
> > > in the Netfilter log is from the received packet and thus not
> > > useful
> > > here).
> > > 
> > > I tried running a tcpdump on the lo interface to see if these
> > > packets
> > > would show up there, but during the three days I had it running,
> > > it
> 

Re: [bisected] Forwarded packets occasionally has loopback output interface in Netfilter

2018-01-11 Thread Wei Wang
On Thu, Jan 11, 2018 at 9:25 AM, Anders K. Pedersen | Cohaesio
 wrote:
> On tir, 2017-12-26 at 12:05 +0100, Anders K. Pedersen | Cohaesio wrote:
>> Hello,
>>
>> On one of our border routers, Netfilter is occasionally logging packets
>> with "OUT=lo" (output interface lo) even though the packet should be
>> going out via a regular interface. This behavior is present on Linux
>> 4.13.0 to 4.14.9, and a bisection of the problem points to
>>
>> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() properly
>>
>> as the first bad commit. This commit adds dst_dev_put() calls before
>> some dst_release() calls, and dst_dev_put() does
>>
>> dst->dev = dev_net(dst->dev)->loopback_dev;
>>
>> (among other things), which fits the problem we're seeing.
>>
>> The essential part of our nftables rule set that shows this behavior is
>>
>> chain forward {
>> type filter hook forward priority 0;
>>
>> meta oif { $internal_interfaces } accept
>>
>> meta oif lo ip daddr != 127.0.0.0/8 \
>> log group 0 snaplen 80 prefix "oif-lo" counter
>>
>> ip saddr { $our_ip_series } \
>> flow table acct_out \
>> { meta oif . rt nexthop . ip saddr timeout 12m 
>> counter } \
>> accept
>>
>> log group 0 snaplen 80 prefix "DROP" counter drop
>> }
>>
>> The router only does stateless packet filtering and no redirection or
>> rewriting of the packets (connection tracking, NAT, ipvs etc. are not
>> even compiled for this kernel).
>>
>> As a result of this problem we see packets that should be going to an
>> internal interface (and thus accepted by the first rule above) being
>> logged and dropped by the last rule. Some examples:
>>
>> Dec 22 11:57:02 cix4 oif-lo IN=eth10 OUT=lo 
>> MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118 
>> DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF PROTO=TCP 
>> SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705 WINDOW=3295 ACK URGP=0 MARK=0
>> Dec 22 11:57:02 cix4 DROP IN=eth10 OUT=lo 
>> MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118 
>> DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF PROTO=TCP 
>> SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705 WINDOW=3295 ACK URGP=0 MARK=0
>>
>> Dec 22 12:47:07 cix4 oif-lo IN=eth10 OUT=lo 
>> MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18 
>> DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF PROTO=TCP 
>> SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103 WINDOW=65535 ACK PSH URGP=0 
>> MARK=0
>> Dec 22 12:47:07 cix4 DROP IN=eth10 OUT=lo 
>> MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18 
>> DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF PROTO=TCP 
>> SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103 WINDOW=65535 ACK PSH URGP=0 
>> MARK=0
>>
>> Dec 22 12:53:56 cix4 oif-lo IN=eth10 OUT=lo 
>> MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34 
>> DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF PROTO=TCP 
>> SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612 WINDOW=65535 ACK PSH URGP=0 
>> MARK=0
>> Dec 22 12:53:56 cix4 DROP IN=eth10 OUT=lo 
>> MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34 
>> DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF PROTO=TCP 
>> SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612 WINDOW=65535 ACK PSH URGP=0 
>> MARK=0
>>
>> It also happens for outbound traffic, where the packets are logged and
>> counted in the acct_out flow table with "meta oif" = "lo", but a
>> correct "rt nexthop" - an example:
>>
>> Dec 22 12:29:13 cix4 oif-lo IN=team0.20 OUT=lo 
>> MAC=3c:fd:fe:15:db:a8:00:24:a8:ff:f0:00:08:00 SRC=212.97.129.25 
>> DST=95.166.119.129 LEN=40 TOS=00 PREC=0x00 TTL=62 ID=19481 DF PROTO=TCP 
>> SPT=443 DPT=52560 SEQ=3034827396 ACK=2862814901 WINDOW=12618 ACK URGP=0 
>> MARK=0
>>
>> # nft list flow table filter acct_out|tr ',' '\n'|grep lo
>> flow table acct_out {
>>  "lo" . 94.101.208.217 . 212.97.129.25 expires 3m17s : counter packets 1 
>> bytes 40
>>
>> I don't know if these packets are actually sent out on the correct
>> outbound interface thanks to the proper nexthop (the MAC= information
>> in the Netfilter log is from the received packet and thus not useful
>> here).
>>
>> I tried running a tcpdump on the lo interface to see if these packets
>> would show up there, but during the three days I had it running, it
>> only logged one such packet, while Netfilter logs 20+ outbound packets
>> every day, and the one packet logged by tcpdump was *not* logged by
>> Netfilter.
>
> Further testing of the individual parts of the first bad commit shows
> that the five first additions of the dst_dev_put() call doesn't trigger
> the problem, while the last one does (also without the first five), so
> the problematic part is:
>
> diff --git a/net/ipv4/route.c 

Re: [bisected] Forwarded packets occasionally has loopback output interface in Netfilter

2018-01-11 Thread Anders K. Pedersen | Cohaesio
On tir, 2017-12-26 at 12:05 +0100, Anders K. Pedersen | Cohaesio wrote:
> Hello,
> 
> On one of our border routers, Netfilter is occasionally logging packets
> with "OUT=lo" (output interface lo) even though the packet should be
> going out via a regular interface. This behavior is present on Linux
> 4.13.0 to 4.14.9, and a bisection of the problem points to
> 
> [95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() properly
> 
> as the first bad commit. This commit adds dst_dev_put() calls before
> some dst_release() calls, and dst_dev_put() does
> 
> dst->dev = dev_net(dst->dev)->loopback_dev;
> 
> (among other things), which fits the problem we're seeing.
> 
> The essential part of our nftables rule set that shows this behavior is
> 
> chain forward {
> type filter hook forward priority 0;
> 
> meta oif { $internal_interfaces } accept
> 
> meta oif lo ip daddr != 127.0.0.0/8 \
> log group 0 snaplen 80 prefix "oif-lo" counter
> 
> ip saddr { $our_ip_series } \
> flow table acct_out \
> { meta oif . rt nexthop . ip saddr timeout 12m 
> counter } \
> accept
> 
> log group 0 snaplen 80 prefix "DROP" counter drop
> }
> 
> The router only does stateless packet filtering and no redirection or
> rewriting of the packets (connection tracking, NAT, ipvs etc. are not
> even compiled for this kernel).
> 
> As a result of this problem we see packets that should be going to an
> internal interface (and thus accepted by the first rule above) being
> logged and dropped by the last rule. Some examples:
> 
> Dec 22 11:57:02 cix4 oif-lo IN=eth10 OUT=lo 
> MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118 
> DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF PROTO=TCP 
> SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705 WINDOW=3295 ACK URGP=0 MARK=0
> Dec 22 11:57:02 cix4 DROP IN=eth10 OUT=lo 
> MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118 
> DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF PROTO=TCP 
> SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705 WINDOW=3295 ACK URGP=0 MARK=0
> 
> Dec 22 12:47:07 cix4 oif-lo IN=eth10 OUT=lo 
> MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18 
> DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF PROTO=TCP 
> SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103 WINDOW=65535 ACK PSH URGP=0 
> MARK=0
> Dec 22 12:47:07 cix4 DROP IN=eth10 OUT=lo 
> MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18 
> DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF PROTO=TCP 
> SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103 WINDOW=65535 ACK PSH URGP=0 
> MARK=0
> 
> Dec 22 12:53:56 cix4 oif-lo IN=eth10 OUT=lo 
> MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34 
> DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF PROTO=TCP 
> SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612 WINDOW=65535 ACK PSH URGP=0 
> MARK=0
> Dec 22 12:53:56 cix4 DROP IN=eth10 OUT=lo 
> MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34 
> DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF PROTO=TCP 
> SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612 WINDOW=65535 ACK PSH URGP=0 
> MARK=0
> 
> It also happens for outbound traffic, where the packets are logged and
> counted in the acct_out flow table with "meta oif" = "lo", but a
> correct "rt nexthop" - an example:
> 
> Dec 22 12:29:13 cix4 oif-lo IN=team0.20 OUT=lo 
> MAC=3c:fd:fe:15:db:a8:00:24:a8:ff:f0:00:08:00 SRC=212.97.129.25 
> DST=95.166.119.129 LEN=40 TOS=00 PREC=0x00 TTL=62 ID=19481 DF PROTO=TCP 
> SPT=443 DPT=52560 SEQ=3034827396 ACK=2862814901 WINDOW=12618 ACK URGP=0 MARK=0
> 
> # nft list flow table filter acct_out|tr ',' '\n'|grep lo
> flow table acct_out {
>  "lo" . 94.101.208.217 . 212.97.129.25 expires 3m17s : counter packets 1 
> bytes 40
> 
> I don't know if these packets are actually sent out on the correct
> outbound interface thanks to the proper nexthop (the MAC= information
> in the Netfilter log is from the received packet and thus not useful
> here).
> 
> I tried running a tcpdump on the lo interface to see if these packets
> would show up there, but during the three days I had it running, it
> only logged one such packet, while Netfilter logs 20+ outbound packets
> every day, and the one packet logged by tcpdump was *not* logged by
> Netfilter.

Further testing of the individual parts of the first bad commit shows
that the five first additions of the dst_dev_put() call doesn't trigger
the problem, while the last one does (also without the first five), so
the problematic part is:

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 3dee004..d986d80 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1369,6 +1372,7 @@ static bool rt_cache_route(struct fib_nh *nh, struct 
rtable *rt)
prev = 

[bisected] Forwarded packets occasionally has loopback output interface in Netfilter

2017-12-26 Thread Anders K. Pedersen | Cohaesio
Hello,

On one of our border routers, Netfilter is occasionally logging packets
with "OUT=lo" (output interface lo) even though the packet should be
going out via a regular interface. This behavior is present on Linux
4.13.0 to 4.14.9, and a bisection of the problem points to

[95c47f9cf5e028d1ae77dc6c767c1edc8a18025b] ipv4: call dst_dev_put() properly

as the first bad commit. This commit adds dst_dev_put() calls before
some dst_release() calls, and dst_dev_put() does

dst->dev = dev_net(dst->dev)->loopback_dev;

(among other things), which fits the problem we're seeing.

The essential part of our nftables rule set that shows this behavior is

chain forward {
type filter hook forward priority 0;

meta oif { $internal_interfaces } accept

meta oif lo ip daddr != 127.0.0.0/8 \
log group 0 snaplen 80 prefix "oif-lo" counter

ip saddr { $our_ip_series } \
flow table acct_out \
{ meta oif . rt nexthop . ip saddr timeout 12m counter 
} \
accept

log group 0 snaplen 80 prefix "DROP" counter drop
}

The router only does stateless packet filtering and no redirection or
rewriting of the packets (connection tracking, NAT, ipvs etc. are not
even compiled for this kernel).

As a result of this problem we see packets that should be going to an
internal interface (and thus accepted by the first rule above) being
logged and dropped by the last rule. Some examples:

Dec 22 11:57:02 cix4 oif-lo IN=eth10 OUT=lo 
MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118 
DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF PROTO=TCP 
SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705 WINDOW=3295 ACK URGP=0 MARK=0
Dec 22 11:57:02 cix4 DROP IN=eth10 OUT=lo 
MAC=90:e2:ba:5c:b6:95:10:f3:11:38:06:77:08:00 SRC=81.170.163.118 
DST=212.97.158.33 LEN=1500 TOS=00 PREC=0x00 TTL=116 ID=25932 DF PROTO=TCP 
SPT=35118 DPT=8443 SEQ=604358330 ACK=1182278705 WINDOW=3295 ACK URGP=0 MARK=0

Dec 22 12:47:07 cix4 oif-lo IN=eth10 OUT=lo 
MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18 
DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF PROTO=TCP 
SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103 WINDOW=65535 ACK PSH URGP=0 
MARK=0
Dec 22 12:47:07 cix4 DROP IN=eth10 OUT=lo 
MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.30.18 
DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=118 ID=10370 DF PROTO=TCP 
SPT=443 DPT=44988 SEQ=1141545913 ACK=3844573103 WINDOW=65535 ACK PSH URGP=0 
MARK=0

Dec 22 12:53:56 cix4 oif-lo IN=eth10 OUT=lo 
MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34 
DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF PROTO=TCP 
SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612 WINDOW=65535 ACK PSH URGP=0 
MARK=0
Dec 22 12:53:56 cix4 DROP IN=eth10 OUT=lo 
MAC=90:e2:ba:5c:b6:95:0e:86:10:27:99:f3:08:00 SRC=40.101.12.34 
DST=212.97.130.32 LEN=245 TOS=00 PREC=0x00 TTL=115 ID=27728 DF PROTO=TCP 
SPT=443 DPT=39724 SEQ=3797156404 ACK=3944234612 WINDOW=65535 ACK PSH URGP=0 
MARK=0

It also happens for outbound traffic, where the packets are logged and
counted in the acct_out flow table with "meta oif" = "lo", but a
correct "rt nexthop" - an example:

Dec 22 12:29:13 cix4 oif-lo IN=team0.20 OUT=lo 
MAC=3c:fd:fe:15:db:a8:00:24:a8:ff:f0:00:08:00 SRC=212.97.129.25 
DST=95.166.119.129 LEN=40 TOS=00 PREC=0x00 TTL=62 ID=19481 DF PROTO=TCP SPT=443 
DPT=52560 SEQ=3034827396 ACK=2862814901 WINDOW=12618 ACK URGP=0 MARK=0

# nft list flow table filter acct_out|tr ',' '\n'|grep lo
flow table acct_out {
 "lo" . 94.101.208.217 . 212.97.129.25 expires 3m17s : counter packets 1 bytes 
40

I don't know if these packets are actually sent out on the correct
outbound interface thanks to the proper nexthop (the MAC= information
in the Netfilter log is from the received packet and thus not useful
here).

I tried running a tcpdump on the lo interface to see if these packets
would show up there, but during the three days I had it running, it
only logged one such packet, while Netfilter logs 20+ outbound packets
every day, and the one packet logged by tcpdump was *not* logged by
Netfilter.

If it helps, I can try adding each of the six dst_dev_put() calls from
the first bad commit individually to see which of them that has an
effect on this.

Regards,
Anders K. 
PedersenN�r��yb�X��ǧv�^�)޺{.n�+���z���׫u�ޖ)w*jg����ݢj/���z�ޖ��2�ޙ&�)ߡ�a�����G���h��j:+v���w��٥