Re: Regression for ip6-in-ip4 IPsec tunnel in 4.14.16

2018-02-07 Thread Mike Maloney
On Wed, Feb 7, 2018 at 12:23 PM, Yves-Alexis Perez <cor...@debian.org> wrote:
> On Wed, 2018-02-07 at 18:05 +0100, Yves-Alexis Perez wrote:
>> I'll try to printk the mtu before returning EINVAL to see why it's lower than
>> 1280, but maybe the IP encapsulation is not correctly handled?
>
> I did:
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 3763dc01e374..d3c651158d35 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1215,7 +1215,7 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
>  			mtu = np->frag_size;
>  	}
>  	if (mtu < IPV6_MIN_MTU)
> -		return -EINVAL;
> +		printk("mtu: %d\n", mtu);
>  	cork->base.fragsize = mtu;
>  	if (dst_allfrag(rt->dst.path))
>  		cork->base.flags |= IPCORK_ALLFRAG;
>
> and I get:
>
> févr. 07 18:19:50 scapa kernel: mtu: 1218
>
> and it doesn't depend on the original packet size (same thing happens with
> ping -s 100). It also happens with UDP (DNS) traffic, but apparently not with
> TCP.
>
> Regards,
> --
> Yves-Alexis

Hi Yves-Alexis -

I apologize for the problem.  It seems to me that tunneling with an
outer MTU that makes the inner MTU smaller than the IPv6 minimum is
potentially problematic in other ways as well.

It is also unfortunate that the code with my fix does not look at the
actual packet size: it only looks at the MTU and then fails, even if
no packet was going to be that large.  The intention of my patch was
to prevent a negative number while calculating maxfraglen in
__ip6_append_data().  An alternative fix may be to return an error
only if the mtu is less than or equal to the fragheaderlen.
Something like:

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 3763dc01e374..5d912a289b95 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1214,8 +1214,6 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
-	if (mtu < IPV6_MIN_MTU)
-		return -EINVAL;
 	cork->base.fragsize = mtu;
 	if (dst_allfrag(rt->dst.path))
 		cork->base.flags |= IPCORK_ALLFRAG;
@@ -1264,6 +1262,8 @@ static int __ip6_append_data(struct sock *sk,

 	fragheaderlen = sizeof(struct ipv6hdr) + rt->rt6i_nfheader_len +
 			(opt ? opt->opt_nflen : 0);
+	if (mtu < fragheaderlen + 8)
+		return -EINVAL;
 	maxfraglen = ((mtu - fragheaderlen) & ~7) + fragheaderlen -
 		     sizeof(struct frag_hdr);
(opt ? opt->opt_nflen : 0);

But then we also have to convince ourselves that maxfraglen can never
be <= 0.  I'd have to think about that.
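One way to check that bound is to replay the arithmetic outside the kernel. The following is only a user-space sketch, not kernel code: FRAG_HDR_LEN stands in for sizeof(struct frag_hdr) (the IPv6 fragment header is 8 bytes), plain ints are used so that a negative result is visible, and both helper names are made up for this illustration.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for sizeof(struct frag_hdr); the IPv6 fragment header is 8 bytes. */
#define FRAG_HDR_LEN 8

/* Same expression as in the diff above, evaluated on signed ints. */
static int maxfraglen_of(int mtu, int fragheaderlen)
{
	return ((mtu - fragheaderlen) & ~7) + fragheaderlen - FRAG_HDR_LEN;
}

/* Hypothetical helper mirroring the proposed "mtu < fragheaderlen + 8" check. */
static int checked_maxfraglen(int mtu, int fragheaderlen, int *maxfraglen)
{
	if (mtu < fragheaderlen + 8)
		return -EINVAL;
	*maxfraglen = maxfraglen_of(mtu, fragheaderlen);
	/* mtu - fragheaderlen >= 8 here, so the masked term is at least 8
	 * and the result cannot drop below fragheaderlen.
	 */
	assert(*maxfraglen >= fragheaderlen);
	return 0;
}
```

With fragheaderlen = 40, an mtu of 1218 yields 1208 and the smallest accepted mtu (48) yields 40, so under these assumptions the result stays positive whenever the check passes. Without the check, an mtu of 0 yields -8.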

I am not sure if others have thoughts on supporting MTUs configured
below the min in the spec.


Thanks.
--
Mike Maloney


[PATCH net] ipv6: fix udpv6 sendmsg crash caused by too small MTU

2018-01-10 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

The logic in __ip6_append_data() assumes that the MTU is at least large
enough for the headers.  A device's MTU may be adjusted after being
added while sendmsg() is processing data, resulting in
__ip6_append_data() seeing any MTU.  For an mtu smaller than the size of
the fragmentation header, the math results in a negative 'maxfraglen',
which causes problems when refragmenting any previous skb in the
skb_write_queue, leaving it possibly malformed.

Instead sendmsg returns EINVAL when the mtu is calculated to be less
than IPV6_MIN_MTU.

Found by syzkaller:
kernel BUG at ./include/linux/skbuff.h:2064!
invalid opcode:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
task: 8801d0b68580 task.stack: 8801ac6b8000
RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
RSP: 0018:8801ac6bf570 EFLAGS: 00010216
RAX: 0001 RBX: 0028 RCX: c90003cce000
RDX: 01b8 RSI: 839df06f RDI: 8801d9478ca0
RBP: 8801ac6bf780 R08: 8801cc3f1dbc R09: 
R10: 8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: 8801cc3f1dc8
R13: 8801cc3f1d40 R14: 1036 R15: dc00
FS:  7f43d740c700() GS:8801dc10() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f7834984000 CR3: 0001d79b9000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 ip6_finish_skb include/net/ipv6.h:911 [inline]
 udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
 udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x352/0x5a0 net/socket.c:1750
 SyS_sendto+0x40/0x50 net/socket.c:1718
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:7f43d740bc08 EFLAGS: 0216 ORIG_RAX: 002c
RAX: ffda RBX: 007180a8 RCX: 004512e9
RDX: 002e RSI: 20d08000 RDI: 0005
RBP: 0086 R08: 209c1000 R09: 001c
R10: 00040800 R11: 0216 R12: 004b9c69
R13:  R14: 0005 R15: 202c2000
Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 
01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 
8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: 8801ac6bf570
RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: 
8801ac6bf570

Reported-by: syzbot <syzkal...@googlegroups.com>
Signed-off-by: Mike Maloney <malo...@google.com>

---

Depends on eric.duma...@gmail.com's prior fix or leaks a dst reference.
https://patchwork.ozlabs.org/patch/858234/
(ipv6: fix possible mem leaks in ipv6_make_skb())

 net/ipv6/ip6_output.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f7dd51c42314..f309ce7120d0 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1206,14 +1206,16 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	v6_cork->tclass = ipc6->tclass;
 	if (rt->dst.flags & DST_XFRM_TUNNEL)
 		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(&rt->dst);
+		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
 	else
 		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(rt->dst.path);
+		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(rt->dst.path);
 	if (np->frag_size < mtu) {
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
+	if (mtu < IPV6_MIN_MTU)
+		return -EINVAL;
 	cork->base.fragsize = mtu;
 	if (dst_allfrag(rt->dst.path))
 		cork->base.flags |= IPCORK_ALLFRAG;
-- 
2.16.0.rc1.238.g530d649a79-goog
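
The check added by the patch above can be modelled in user space. This is a rough sketch under stated assumptions: demo_cork_mtu() and DEMO_IPV6_MIN_MTU are names invented for the illustration, path_mtu stands for whatever the device or route lookup produced, and frag_size stands for the per-socket override (0 meaning "not set").

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_IPV6_MIN_MTU 1280	/* same value as the kernel's IPV6_MIN_MTU */

/* Hypothetical model of the tail of ip6_setup_cork() after the patch. */
static int demo_cork_mtu(unsigned int path_mtu, unsigned int frag_size,
			 unsigned int *fragsize)
{
	unsigned int mtu = path_mtu;

	assert(fragsize != NULL);

	/* A non-zero per-socket frag_size may only lower the MTU. */
	if (frag_size && frag_size < mtu)
		mtu = frag_size;

	/* The check added by the patch: refuse anything below the minimum. */
	if (mtu < DEMO_IPV6_MIN_MTU)
		return -EINVAL;

	*fragsize = mtu;
	return 0;
}
```

Note that the result depends only on the computed MTU, never on the payload size. An MTU of 1218, the value printed in the 4.14.16 regression report at the top of this archive, is rejected in this model even for a tiny datagram.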



Re: [PATCH net] ipv6: fix udpv6 sendmsg crash caused by too small MTU

2018-01-10 Thread Mike Maloney
D'oh - resending with fixes.  Thanks Eric.

On Wed, Jan 10, 2018 at 12:21 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Wed, 2018-01-10 at 12:10 -0500, Mike Maloney wrote:
>> From: Mike Maloney <malo...@google.com>
>>
>> The logic in __ip6_append_data() assumes that the MTU is at least large
>> enough for the headers.  A device's MTU may be adjusted after being
>> added while sendmsg() is processing data, resulting in
>> __ip6_append_data() seeing any MTU.  For an mtu smaller than the size of
>> the fragmentation header, the math results in a negative 'maxfraglen',
>> which causes problems when refragmenting any previous skb in the
>> skb_write_queue, leaving it possibly malformed.
>>
>> Instead sendmsg returns EINVAL when the mtu is calculated to be less
>> than IPV6_MIN_MTU.
>>
>
> You forgot your SOB
>
>> Reported-by: syzbot <syzkal...@googlegroups.com>
>> ---
>
> Also please add after this '---' marker that your patch depends on my
> prior fix ( https://patchwork.ozlabs.org/patch/858234/
> ipv6: fix possible mem leaks in ipv6_make_skb() )
>
> ( Or we leak a dst reference )
>
> We probably should have sent a patch series.
>
> Thanks.
>
>>  net/ipv6/ip6_output.c | 6 ++++--
>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index f7dd51c42314..f309ce7120d0 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -1206,14 +1206,16 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
>>  	v6_cork->tclass = ipc6->tclass;
>>  	if (rt->dst.flags & DST_XFRM_TUNNEL)
>>  		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
>> -		      rt->dst.dev->mtu : dst_mtu(&rt->dst);
>> +		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
>>  	else
>>  		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
>> -		      rt->dst.dev->mtu : dst_mtu(rt->dst.path);
>> +		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(rt->dst.path);
>>  	if (np->frag_size < mtu) {
>>  		if (np->frag_size)
>>  			mtu = np->frag_size;
>>  	}
>> +	if (mtu < IPV6_MIN_MTU)
>> +		return -EINVAL;
>>  	cork->base.fragsize = mtu;
>>  	if (dst_allfrag(rt->dst.path))
>>  		cork->base.flags |= IPCORK_ALLFRAG;


[PATCH net] ipv6: fix udpv6 sendmsg crash caused by too small MTU

2018-01-10 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

The logic in __ip6_append_data() assumes that the MTU is at least large
enough for the headers.  A device's MTU may be adjusted after being
added while sendmsg() is processing data, resulting in
__ip6_append_data() seeing any MTU.  For an mtu smaller than the size of
the fragmentation header, the math results in a negative 'maxfraglen',
which causes problems when refragmenting any previous skb in the
skb_write_queue, leaving it possibly malformed.

Instead sendmsg returns EINVAL when the mtu is calculated to be less
than IPV6_MIN_MTU.

Found by syzkaller:
kernel BUG at ./include/linux/skbuff.h:2064!
invalid opcode:  [#1] SMP KASAN
Dumping ftrace buffer:
   (ftrace buffer empty)
Modules linked in:
CPU: 1 PID: 14216 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
task: 8801d0b68580 task.stack: 8801ac6b8000
RIP: 0010:__skb_pull include/linux/skbuff.h:2064 [inline]
RIP: 0010:__ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617
RSP: 0018:8801ac6bf570 EFLAGS: 00010216
RAX: 0001 RBX: 0028 RCX: c90003cce000
RDX: 01b8 RSI: 839df06f RDI: 8801d9478ca0
RBP: 8801ac6bf780 R08: 8801cc3f1dbc R09: 
R10: 8801ac6bf7a0 R11: 43cb4b7b1948a9e7 R12: 8801cc3f1dc8
R13: 8801cc3f1d40 R14: 1036 R15: dc00
FS:  7f43d740c700() GS:8801dc10() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7f7834984000 CR3: 0001d79b9000 CR4: 001406e0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 ip6_finish_skb include/net/ipv6.h:911 [inline]
 udp_v6_push_pending_frames+0x255/0x390 net/ipv6/udp.c:1093
 udpv6_sendmsg+0x280d/0x31a0 net/ipv6/udp.c:1363
 inet_sendmsg+0x11f/0x5e0 net/ipv4/af_inet.c:762
 sock_sendmsg_nosec net/socket.c:633 [inline]
 sock_sendmsg+0xca/0x110 net/socket.c:643
 SYSC_sendto+0x352/0x5a0 net/socket.c:1750
 SyS_sendto+0x40/0x50 net/socket.c:1718
 entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:7f43d740bc08 EFLAGS: 0216 ORIG_RAX: 002c
RAX: ffda RBX: 007180a8 RCX: 004512e9
RDX: 002e RSI: 20d08000 RDI: 0005
RBP: 0086 R08: 209c1000 R09: 001c
R10: 00040800 R11: 0216 R12: 004b9c69
R13:  R14: 0005 R15: 202c2000
Code: 9e 01 fe e9 c5 e8 ff ff e8 7f 9e 01 fe e9 4a ea ff ff 48 89 f7 e8 52 9e 
01 fe e9 aa eb ff ff e8 a8 b6 cf fd 0f 0b e8 a1 b6 cf fd <0f> 0b 49 8d 45 78 4d 
8d 45 7c 48 89 85 78 fe ff ff 49 8d 85 ba
RIP: __skb_pull include/linux/skbuff.h:2064 [inline] RSP: 8801ac6bf570
RIP: __ip6_make_skb+0x18cf/0x1f70 net/ipv6/ip6_output.c:1617 RSP: 
8801ac6bf570

Reported-by: syzbot <syzkal...@googlegroups.com>
---
 net/ipv6/ip6_output.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index f7dd51c42314..f309ce7120d0 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1206,14 +1206,16 @@ static int ip6_setup_cork(struct sock *sk, struct inet_cork_full *cork,
 	v6_cork->tclass = ipc6->tclass;
 	if (rt->dst.flags & DST_XFRM_TUNNEL)
 		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(&rt->dst);
+		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(&rt->dst);
 	else
 		mtu = np->pmtudisc >= IPV6_PMTUDISC_PROBE ?
-		      rt->dst.dev->mtu : dst_mtu(rt->dst.path);
+		      READ_ONCE(rt->dst.dev->mtu) : dst_mtu(rt->dst.path);
 	if (np->frag_size < mtu) {
 		if (np->frag_size)
 			mtu = np->frag_size;
 	}
+	if (mtu < IPV6_MIN_MTU)
+		return -EINVAL;
 	cork->base.fragsize = mtu;
 	if (dst_allfrag(rt->dst.path))
 		cork->base.flags |= IPCORK_ALLFRAG;
-- 
2.16.0.rc1.238.g530d649a79-goog



Re: [PATCH net] ipv6: fix possible mem leaks in ipv6_make_skb()

2018-01-10 Thread Mike Maloney
Acked-by:  Mike Maloney <malo...@google.com>

Thanks Eric!

On Wed, Jan 10, 2018 at 6:45 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> From: Eric Dumazet <eduma...@google.com>
>
> ip6_setup_cork() might return an error, while memory allocations have
> been done and must be rolled back.
>
> Fixes: 6422398c2ab0 ("ipv6: introduce ipv6_make_skb")
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> Cc: Vlad Yasevich <vyasev...@gmail.com>
> Reported-by: Mike Maloney <malo...@google.com>
> ---
>  net/ipv6/ip6_output.c |5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index f7dd51c4231415fd1321fd431194d896ea2d1689..688ba5f7516b37c87b879036dce781bdcfa01739 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1735,9 +1735,10 @@ struct sk_buff *ip6_make_skb(struct sock *sk,
>  	cork.base.opt = NULL;
>  	v6_cork.opt = NULL;
>  	err = ip6_setup_cork(sk, &cork, &v6_cork, ipc6, rt, fl6);
> -	if (err)
> +	if (err) {
> +		ip6_cork_release(&cork, &v6_cork);
>  		return ERR_PTR(err);
> -
> +	}
>  	if (ipc6->dontfrag < 0)
>  		ipc6->dontfrag = inet6_sk(sk)->dontfrag;
>


Re: [PATCH net] tcp: refresh tcp_mstamp from timers callbacks

2017-12-13 Thread Mike Maloney
Acked-by:  Mike Maloney <malo...@google.com>

Thanks for the quick fix!

On Tue, Dec 12, 2017 at 9:42 PM, Soheil Hassas Yeganeh
<soheil.k...@gmail.com> wrote:
> On Tue, Dec 12, 2017 at 9:26 PM, Neal Cardwell <ncardw...@google.com> wrote:
>> On Tue, Dec 12, 2017 at 9:22 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>>> From: Eric Dumazet <eduma...@google.com>
>>>
>>> Only the retransmit timer currently refreshes tcp_mstamp
>>>
>>> We should do the same for delayed acks and keepalives.
>>>
>>> Even if RFC 7323 does not request it, this is consistent to what linux
>>> did in the past, when TS values were based on jiffies.
>>>
>>> Fixes: 385e20706fac ("tcp: use tp->tcp_mstamp in output path")
>>> Signed-off-by: Eric Dumazet <eduma...@google.com>
>>> Cc: Soheil Hassas Yeganeh <soh...@google.com>
>>> Cc: Mike Maloney <malo...@google.com>
>>> Cc: Neal Cardwell <ncardw...@google.com>
>>> ---
>>
>> Acked-by: Neal Cardwell <ncardw...@google.com>
>>
>> Thanks, Eric!
>>
>> neal
>
> Acked-by: Soheil Hassas Yeganeh <soh...@google.com>
>
> This is a very nice catch! Thank you Eric!


[PATCH net] packet: fix crash in fanout_demux_rollover()

2017-11-28 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

syzkaller found a race condition in fanout_demux_rollover() while
removing a packet socket from a fanout group.

po->rollover is read and operated on during packet_rcv_fanout(), via
fanout_demux_rollover(), but the pointer is currently cleared before the
synchronization in packet_release().   It is safer to delay the cleanup
until after synchronize_net() has been called, ensuring all calls to
packet_rcv_fanout() for this socket have finished.

To further simplify synchronization around the rollover structure, set
po->rollover in fanout_add() only if there are no errors.  This removes
the need for rcu in the struct and in the call to
packet_getsockopt(..., PACKET_ROLLOVER_STATS, ...).

Crashing stack trace:
 fanout_demux_rollover+0xb6/0x4d0 net/packet/af_packet.c:1392
 packet_rcv_fanout+0x649/0x7c8 net/packet/af_packet.c:1487
 dev_queue_xmit_nit+0x835/0xc10 net/core/dev.c:1953
 xmit_one net/core/dev.c:2975 [inline]
 dev_hard_start_xmit+0x16b/0xac0 net/core/dev.c:2995
 __dev_queue_xmit+0x17a4/0x2050 net/core/dev.c:3476
 dev_queue_xmit+0x17/0x20 net/core/dev.c:3509
 neigh_connected_output+0x489/0x720 net/core/neighbour.c:1379
 neigh_output include/net/neighbour.h:482 [inline]
 ip6_finish_output2+0xad1/0x22a0 net/ipv6/ip6_output.c:120
 ip6_finish_output+0x2f9/0x920 net/ipv6/ip6_output.c:146
 NF_HOOK_COND include/linux/netfilter.h:239 [inline]
 ip6_output+0x1f4/0x850 net/ipv6/ip6_output.c:163
 dst_output include/net/dst.h:459 [inline]
 NF_HOOK.constprop.35+0xff/0x630 include/linux/netfilter.h:250
 mld_sendpack+0x6a8/0xcc0 net/ipv6/mcast.c:1660
 mld_send_initial_cr.part.24+0x103/0x150 net/ipv6/mcast.c:2072
 mld_send_initial_cr net/ipv6/mcast.c:2056 [inline]
 ipv6_mc_dad_complete+0x99/0x130 net/ipv6/mcast.c:2079
 addrconf_dad_completed+0x595/0x970 net/ipv6/addrconf.c:4039
 addrconf_dad_work+0xac9/0x1160 net/ipv6/addrconf.c:3971
 process_one_work+0xbf0/0x1bc0 kernel/workqueue.c:2113
 worker_thread+0x223/0x1990 kernel/workqueue.c:2247
 kthread+0x35e/0x430 kernel/kthread.c:231
 ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:432

Fixes: 0648ab70afe6 ("packet: rollover prepare: per-socket state")
Fixes: 509c7a1ecc860 ("packet: avoid panic in packet_getsockopt()")
Reported-by: syzbot <syzkal...@googlegroups.com>
Signed-off-by: Mike Maloney <malo...@google.com>
---
 net/packet/af_packet.c | 32 ++--
 net/packet/internal.h  |  1 -
 2 files changed, 10 insertions(+), 23 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 737092ca9b4e..1b7bb9d9865e 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -1687,7 +1687,6 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 		atomic_long_set(&rollover->num, 0);
 		atomic_long_set(&rollover->num_huge, 0);
 		atomic_long_set(&rollover->num_failed, 0);
-		po->rollover = rollover;
 	}
 
 	if (type_flags & PACKET_FANOUT_FLAG_UNIQUEID) {
@@ -1745,6 +1744,8 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 		if (refcount_read(&match->sk_ref) < PACKET_FANOUT_MAX) {
 			__dev_remove_pack(&po->prot_hook);
 			po->fanout = match;
+			po->rollover = rollover;
+			rollover = NULL;
 			refcount_set(&match->sk_ref, refcount_read(&match->sk_ref) + 1);
 			__fanout_link(sk, po);
 			err = 0;
@@ -1758,10 +1759,7 @@ static int fanout_add(struct sock *sk, u16 id, u16 type_flags)
 	}
 
 out:
-	if (err && rollover) {
-		kfree_rcu(rollover, rcu);
-		po->rollover = NULL;
-	}
+	kfree(rollover);
 	mutex_unlock(&fanout_mutex);
 	return err;
 }
@@ -1785,11 +1783,6 @@ static struct packet_fanout *fanout_release(struct sock *sk)
 			list_del(&f->list);
 		else
 			f = NULL;
-
-		if (po->rollover) {
-			kfree_rcu(po->rollover, rcu);
-			po->rollover = NULL;
-		}
 	}
 	mutex_unlock(&fanout_mutex);
 
@@ -3029,6 +3022,7 @@ static int packet_release(struct socket *sock)
 	synchronize_net();
 
 	if (f) {
+		kfree(po->rollover);
 		fanout_release_data(f);
 		kfree(f);
 	}
@@ -3843,7 +3837,6 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 	void *data = &val;
 	union tpacket_stats_u st;
 	struct tpacket_rollover_stats rstats;
-	struct packet_rollover *rollover;
 
 	if (level != SOL_PACKET)
 		return -ENOPROTOOPT;
@@ -3922,18 +3915,13 @@ static int packet_getsockopt(struct socket *sock, int level, int optname,
 		       0);
 		break;
 	case PACKET_ROLLOVER_STATS:
-   
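
The fanout_add() part of the patch above follows a common ownership-handoff pattern: publish the pointer only on success, clear the local variable, and free the local unconditionally on the way out. A minimal user-space sketch of that pattern is given here. All demo_* names are invented for the illustration, calloc()/free() stand in for kzalloc()/kfree(), and join_ok is a made-up switch modelling whether joining the group succeeds.

```c
#include <errno.h>
#include <stdlib.h>

struct demo_rollover { long num; };
struct demo_sock { struct demo_rollover *rollover; };

/* Hypothetical stand-in for fanout_add(). */
static int demo_fanout_add(struct demo_sock *po, int want_rollover, int join_ok)
{
	struct demo_rollover *rollover = NULL;
	int err;

	if (want_rollover) {
		rollover = calloc(1, sizeof(*rollover));
		if (!rollover)
			return -ENOMEM;
	}

	err = -EINVAL;
	if (!join_ok)
		goto out;

	po->rollover = rollover;	/* publish only once nothing can fail */
	rollover = NULL;		/* ownership moved to the socket */
	err = 0;
out:
	free(rollover);			/* no-op when NULL, like kfree(NULL) */
	return err;
}
```

Because the socket never sees the pointer on a failed call, the error path in this sketch needs no deferred free and never has to reset po->rollover.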

[PATCH v2 net-next 2/2] selftests/net: Add a test to validate behavior of rx timestamps

2017-08-22 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

Validate the behavior of the combination of various timestamp socket
options, and ensure consistency across ip, udp, and tcp.

Signed-off-by: Mike Maloney <malo...@google.com>
---
 .../selftests/networking/timestamping/.gitignore   |   1 +
 .../selftests/networking/timestamping/Makefile |   4 +-
 .../networking/timestamping/rxtimestamp.c  | 389 +
 3 files changed, 393 insertions(+), 1 deletion(-)
 create mode 100644 
tools/testing/selftests/networking/timestamping/rxtimestamp.c

diff --git a/tools/testing/selftests/networking/timestamping/.gitignore 
b/tools/testing/selftests/networking/timestamping/.gitignore
index 9e69e982fb38..d9355035e746 100644
--- a/tools/testing/selftests/networking/timestamping/.gitignore
+++ b/tools/testing/selftests/networking/timestamping/.gitignore
@@ -1,3 +1,4 @@
 timestamping
+rxtimestamp
 txtimestamp
 hwtstamp_config
diff --git a/tools/testing/selftests/networking/timestamping/Makefile 
b/tools/testing/selftests/networking/timestamping/Makefile
index ccbb9ed9..92fb8ee917c5 100644
--- a/tools/testing/selftests/networking/timestamping/Makefile
+++ b/tools/testing/selftests/networking/timestamping/Makefile
@@ -1,4 +1,6 @@
-TEST_PROGS := hwtstamp_config timestamping txtimestamp
+CFLAGS += -I../../../../../usr/include
+
+TEST_PROGS := hwtstamp_config rxtimestamp timestamping txtimestamp
 
 all: $(TEST_PROGS)
 
diff --git a/tools/testing/selftests/networking/timestamping/rxtimestamp.c 
b/tools/testing/selftests/networking/timestamping/rxtimestamp.c
new file mode 100644
index 000000000000..00f286661dcd
--- /dev/null
+++ b/tools/testing/selftests/networking/timestamping/rxtimestamp.c
@@ -0,0 +1,389 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+struct options {
+   int so_timestamp;
+   int so_timestampns;
+   int so_timestamping;
+};
+
+struct tstamps {
+   bool tstamp;
+   bool tstampns;
+   bool swtstamp;
+   bool hwtstamp;
+};
+
+struct socket_type {
+   char *friendly_name;
+   int type;
+   int protocol;
+   bool enabled;
+};
+
+struct test_case {
+   struct options sockopt;
+   struct tstamps expected;
+   bool enabled;
+};
+
+struct sof_flag {
+   int mask;
+   char *name;
+};
+
+static struct sof_flag sof_flags[] = {
+#define SOF_FLAG(f) { f, #f }
+   SOF_FLAG(SOF_TIMESTAMPING_SOFTWARE),
+   SOF_FLAG(SOF_TIMESTAMPING_RX_SOFTWARE),
+   SOF_FLAG(SOF_TIMESTAMPING_RX_HARDWARE),
+};
+
+static struct socket_type socket_types[] = {
+   { "ip", SOCK_RAW,   IPPROTO_EGP },
+   { "udp",SOCK_DGRAM, IPPROTO_UDP },
+   { "tcp",SOCK_STREAM,IPPROTO_TCP },
+};
+
+static struct test_case test_cases[] = {
+   { {}, {} },
+   {
+   { so_timestamp: 1 },
+   { tstamp: true }
+   },
+   {
+   { so_timestampns: 1 },
+   { tstampns: true }
+   },
+   {
+   { so_timestamp: 1, so_timestampns: 1 },
+   { tstampns: true }
+   },
+   {
+   { so_timestamping: SOF_TIMESTAMPING_RX_SOFTWARE },
+   {}
+   },
+   {
+   /* Loopback device does not support hw timestamps. */
+   { so_timestamping: SOF_TIMESTAMPING_RX_HARDWARE },
+   {}
+   },
+   {
+   { so_timestamping: SOF_TIMESTAMPING_SOFTWARE },
+   {}
+   },
+   {
+   { so_timestamping: SOF_TIMESTAMPING_RX_SOFTWARE
+   | SOF_TIMESTAMPING_RX_HARDWARE },
+   {}
+   },
+   {
+   { so_timestamping: SOF_TIMESTAMPING_SOFTWARE
+   | SOF_TIMESTAMPING_RX_SOFTWARE },
+   { swtstamp: true }
+   },
+   {
+   { so_timestamp: 1, so_timestamping: SOF_TIMESTAMPING_SOFTWARE
+   | SOF_TIMESTAMPING_RX_SOFTWARE },
+   { tstamp: true, swtstamp: true }
+   },
+};
+
+static struct option long_options[] = {
+   { "list_tests", no_argument, 0, 'l' },
+   { "test_num", required_argument, 0, 'n' },
+   { "op_size", required_argument, 0, 's' },
+   { "tcp", no_argument, 0, 't' },
+   { "udp", no_argument, 0, 'u' },
+   { "ip", no_argument, 0, 'i' },
+};
+
+static int next_port = 1;
+static int op_size = 10 * 1024;
+
+void print_test_case(struct test_case *t)
+{
+   int f = 0;
+
+   printf("sockopts {");
+   if (t->sockopt.so_timestamp)
+   printf(" SO_TIMESTAMP ");
+   if (t->sockopt.so_timestampns)
+   printf(" SO_TIMESTAMPNS &q
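
The expectations encoded in the test_cases table above can be summarised as a small decision function. This is a sketch derived only from the ten table entries shown in the patch, not from the kernel implementation. The DEMO_* constants are local stand-ins for the SOF_TIMESTAMPING_* flags (only their distinctness matters here), and demo_expected() is a hypothetical helper.

```c
#include <stdbool.h>

/* Local stand-ins for the SOF_TIMESTAMPING_* flags used by the test. */
enum {
	DEMO_RX_HARDWARE = 1 << 2,	/* SOF_TIMESTAMPING_RX_HARDWARE */
	DEMO_RX_SOFTWARE = 1 << 3,	/* SOF_TIMESTAMPING_RX_SOFTWARE */
	DEMO_SOFTWARE    = 1 << 4,	/* SOF_TIMESTAMPING_SOFTWARE */
};

struct demo_options { int so_timestamp, so_timestampns, so_timestamping; };
struct demo_tstamps { bool tstamp, tstampns, swtstamp, hwtstamp; };

/* Hypothetical summary of the "expected" column of the table. */
static struct demo_tstamps demo_expected(struct demo_options o)
{
	struct demo_tstamps t = { false, false, false, false };
	int sw = DEMO_SOFTWARE | DEMO_RX_SOFTWARE;

	/* SO_TIMESTAMPNS wins when both legacy options are set. */
	t.tstampns = o.so_timestampns != 0;
	t.tstamp = o.so_timestamp != 0 && !t.tstampns;

	/* Generation (RX_SOFTWARE) and reporting (SOFTWARE) are both needed. */
	t.swtstamp = (o.so_timestamping & sw) == sw;

	/* hwtstamp stays false: the test runs over loopback. */
	return t;
}
```

The table entries that expect no timestamp at all (RX_SOFTWARE alone, SOFTWARE alone, or RX_SOFTWARE with RX_HARDWARE) fall out of the combined mask test on the software flags.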

[PATCH v2 net-next 1/2] tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg

2017-08-22 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

When SOF_TIMESTAMPING_RX_SOFTWARE is enabled for tcp sockets, return the
timestamp corresponding to the highest sequence number data returned.

Previously, skb->tstamp was overwritten when a TCP packet was placed
in the out-of-order queue.  While the packet is in the ooo queue, save
the timestamp in the TCP_SKB_CB.  This space is shared with the gso_*
options, which are only used on the tx path, and a previously unused
4-byte hole.

When skbs are coalesced, either in the sk_receive_queue or the
out_of_order_queue, always choose the timestamp of the appended skb to
maintain the invariant of returning the timestamp of the last byte in
the recvmsg buffer.

Signed-off-by: Mike Maloney <malo...@google.com>
---
 include/net/tcp.h|  9 +++-
 net/ipv4/tcp.c   | 65 
 net/ipv4/tcp_input.c | 35 
 net/ipv4/tcp_ipv4.c  |  2 ++
 net/ipv6/tcp_ipv6.c  |  2 ++
 5 files changed, 108 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index afdab3781425..f26d20e9760d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -774,6 +774,12 @@ struct tcp_skb_cb {
u16 tcp_gso_segs;
u16 tcp_gso_size;
};
+
+   /* Used to stash the receive timestamp while this skb is in the
+* out of order queue, as skb->tstamp is overwritten by the
+* rbnode.
+*/
+   ktime_t swtstamp;
};
__u8tcp_flags;  /* TCP header flags. (tcp[13])  */
 
@@ -790,7 +796,8 @@ struct tcp_skb_cb {
__u8ip_dsfield; /* IPv4 tos or IPv6 dsfield */
__u8txstamp_ack:1,  /* Record TX timestamp for ack? */
eor:1,  /* Is skb MSG_EOR marked? */
-   unused:6;
+   has_rxtstamp:1, /* SKB has a RX timestamp   */
+   unused:5;
__u32   ack_seq;/* Sequence number ACK'd*/
union {
struct {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d25e3bcca66b..0cce4472b4a1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -269,6 +269,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1695,6 +1696,61 @@ int tcp_peek_len(struct socket *sock)
 }
 EXPORT_SYMBOL(tcp_peek_len);
 
+static void tcp_update_recv_tstamps(struct sk_buff *skb,
+				    struct scm_timestamping *tss)
+{
+	if (skb->tstamp)
+		tss->ts[0] = ktime_to_timespec(skb->tstamp);
+	else
+		tss->ts[0] = (struct timespec) {0};
+
+	if (skb_hwtstamps(skb)->hwtstamp)
+		tss->ts[2] = ktime_to_timespec(skb_hwtstamps(skb)->hwtstamp);
+	else
+		tss->ts[2] = (struct timespec) {0};
+}
+
+/* Similar to __sock_recv_timestamp, but does not require an skb */
+void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
+			struct scm_timestamping *tss)
+{
+	struct timeval tv;
+	bool has_timestamping = false;
+
+	if (tss->ts[0].tv_sec || tss->ts[0].tv_nsec) {
+		if (sock_flag(sk, SOCK_RCVTSTAMP)) {
+			if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
+				put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS,
+					 sizeof(tss->ts[0]), &tss->ts[0]);
+			} else {
+				tv.tv_sec = tss->ts[0].tv_sec;
+				tv.tv_usec = tss->ts[0].tv_nsec / 1000;
+
+				put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
+					 sizeof(tv), &tv);
+			}
+		}
+
+		if (sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE)
+			has_timestamping = true;
+		else
+			tss->ts[0] = (struct timespec) {0};
+	}
+
+	if (tss->ts[2].tv_sec || tss->ts[2].tv_nsec) {
+		if (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
+			has_timestamping = true;
+		else
+			tss->ts[2] = (struct timespec) {0};
+	}
+
+	if (has_timestamping) {
+		tss->ts[1] = (struct timespec) {0};
+		put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING,
+			 sizeof(*tss), tss);
+	}
+}
+
 /*
  * This routine copies from a sock struct into the user buffer.
  *
@@ -1716,6 +1772,8 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, 
size_t len, int nonblock,
long timeo;
struct sk_buff *skb, *last;
u32 urg_hole = 0;
+   struct scm_timestamping tss;
+   bool has_tss = false;
 
if (unli
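
The coalescing rule described in the commit message above ("always choose the timestamp of the appended skb") can be illustrated with a toy model. This is a sketch under stated assumptions, not the kernel's coalescing code: struct demo_seg and demo_coalesce() are invented for the example, and a segment is reduced to its length, its end sequence number, and one receive timestamp.

```c
#include <stdint.h>

struct demo_seg {
	uint32_t end_seq;	/* sequence number just past the last byte */
	uint32_t len;		/* payload length in bytes */
	int64_t  tstamp_ns;	/* receive timestamp of the last byte */
};

/* Hypothetical model of appending `from` to the tail of `to`.
 * Returns 1 if the segments were merged, 0 if they are not adjacent.
 */
static int demo_coalesce(struct demo_seg *to, const struct demo_seg *from)
{
	if (from->end_seq - from->len != to->end_seq)
		return 0;

	to->len += from->len;
	to->end_seq = from->end_seq;
	/* The appended data carries the later bytes, so its timestamp wins. */
	to->tstamp_ns = from->tstamp_ns;
	return 1;
}
```

After a merge, the surviving segment reports the timestamp of the data that arrived at its tail, which is what keeps the "timestamp of the last byte in the recvmsg buffer" invariant in this model.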

[PATCH v2 net-next 0/2] Add software rx timestamp for TCP.

2017-08-22 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

Add software rx timestamps for TCP, and a test to ensure consistency of
behavior between IP, UDP, and TCP implementation.

Changes since v1:
  -Initialize tss->ts[1] to 0 if caller requested any timestamps.
  -Fix test case to validate that tss->ts[1] is zero.
  -Fix tests to actually use a raw socket.
  -Fix --tcp flag to work on the test.

Mike Maloney (2):
  tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg
  selftests/net: Add a test to validate behavior of rx timestamps

 include/net/tcp.h  |   9 +-
 net/ipv4/tcp.c |  65 
 net/ipv4/tcp_input.c   |  35 +-
 net/ipv4/tcp_ipv4.c|   2 +
 net/ipv6/tcp_ipv6.c|   2 +
 .../selftests/networking/timestamping/.gitignore   |   1 +
 .../selftests/networking/timestamping/Makefile |   4 +-
 .../networking/timestamping/rxtimestamp.c  | 389 +
 8 files changed, 501 insertions(+), 6 deletions(-)
 create mode 100644 
tools/testing/selftests/networking/timestamping/rxtimestamp.c

-- 
2.14.1.480.gb18f417b89-goog



[PATCH net-next 2/2] selftests/net: Add a test to validate behavior of rx timestamps

2017-08-22 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

Validate the behavior of the combination of various timestamp socket
options, and ensure consistency across ip, udp, and tcp.

Signed-off-by: Mike Maloney <malo...@google.com>
---
 .../selftests/networking/timestamping/.gitignore   |   1 +
 .../selftests/networking/timestamping/Makefile |   4 +-
 .../networking/timestamping/rxtimestamp.c  | 379 +
 3 files changed, 383 insertions(+), 1 deletion(-)
 create mode 100644 
tools/testing/selftests/networking/timestamping/rxtimestamp.c

diff --git a/tools/testing/selftests/networking/timestamping/.gitignore 
b/tools/testing/selftests/networking/timestamping/.gitignore
index 9e69e982fb38..d9355035e746 100644
--- a/tools/testing/selftests/networking/timestamping/.gitignore
+++ b/tools/testing/selftests/networking/timestamping/.gitignore
@@ -1,3 +1,4 @@
 timestamping
+rxtimestamp
 txtimestamp
 hwtstamp_config
diff --git a/tools/testing/selftests/networking/timestamping/Makefile 
b/tools/testing/selftests/networking/timestamping/Makefile
index ccbb9ed9..92fb8ee917c5 100644
--- a/tools/testing/selftests/networking/timestamping/Makefile
+++ b/tools/testing/selftests/networking/timestamping/Makefile
@@ -1,4 +1,6 @@
-TEST_PROGS := hwtstamp_config timestamping txtimestamp
+CFLAGS += -I../../../../../usr/include
+
+TEST_PROGS := hwtstamp_config rxtimestamp timestamping txtimestamp
 
 all: $(TEST_PROGS)
 
diff --git a/tools/testing/selftests/networking/timestamping/rxtimestamp.c 
b/tools/testing/selftests/networking/timestamping/rxtimestamp.c
new file mode 100644
index 000000000000..6abcdf401d1a
--- /dev/null
+++ b/tools/testing/selftests/networking/timestamping/rxtimestamp.c
@@ -0,0 +1,379 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
+
+struct options {
+   int so_timestamp;
+   int so_timestampns;
+   int so_timestamping;
+};
+
+struct tstamps {
+   bool tstamp;
+   bool tstampns;
+   bool swtstamp;
+   bool hwtstamp;
+};
+
+struct socket_type {
+   char *friendly_name;
+   int type;
+   int protocol;
+   bool enabled;
+};
+
+struct test_case {
+   struct options sockopt;
+   struct tstamps expected;
+   bool enabled;
+};
+
+struct sof_flag {
+   int mask;
+   char *name;
+};
+
+static struct sof_flag sof_flags[] = {
+#define SOF_FLAG(f) { f, #f }
+   SOF_FLAG(SOF_TIMESTAMPING_SOFTWARE),
+   SOF_FLAG(SOF_TIMESTAMPING_RX_SOFTWARE),
+   SOF_FLAG(SOF_TIMESTAMPING_RX_HARDWARE),
+};
+
+static struct socket_type socket_types[] = {
+   { "ip", SOCK_DGRAM, IPPROTO_IP },
+   { "udp",SOCK_DGRAM, IPPROTO_UDP },
+   { "tcp",SOCK_STREAM,IPPROTO_TCP },
+};
+
+static struct test_case test_cases[] = {
+   { {}, {} },
+   {
+   { so_timestamp: 1 },
+   { tstamp: true }
+   },
+   {
+   { so_timestampns: 1 },
+   { tstampns: true }
+   },
+   {
+   { so_timestamp: 1, so_timestampns: 1 },
+   { tstampns: true }
+   },
+   {
+   { so_timestamping: SOF_TIMESTAMPING_RX_SOFTWARE },
+   {}
+   },
+   {
+   /* Loopback device does not support hw timestamps. */
+   { so_timestamping: SOF_TIMESTAMPING_RX_HARDWARE },
+   {}
+   },
+   {
+   { so_timestamping: SOF_TIMESTAMPING_SOFTWARE },
+   {}
+   },
+   {
+   { so_timestamping: SOF_TIMESTAMPING_RX_SOFTWARE
+   | SOF_TIMESTAMPING_RX_HARDWARE },
+   {}
+   },
+   {
+   { so_timestamping: SOF_TIMESTAMPING_SOFTWARE
+   | SOF_TIMESTAMPING_RX_SOFTWARE },
+   { swtstamp: true }
+   },
+   {
+   { so_timestamp: 1, so_timestamping: SOF_TIMESTAMPING_SOFTWARE
+   | SOF_TIMESTAMPING_RX_SOFTWARE },
+   { tstamp: true, swtstamp: true }
+   },
+};
+
+static struct option long_options[] = {
+   { "list_tests", no_argument, 0, 'l' },
+   { "test_num", required_argument, 0, 'n' },
+   { "op_size", required_argument, 0, 's' },
+   { "tcp", no_argument, 0, 't' },
+   { "udp", no_argument, 0, 'u' },
+   { "ip", no_argument, 0, 'i' },
+};
+
+static int next_port = 1;
+static int op_size = 10 * 1024;
+
+void print_test_case(struct test_case *t)
+{
+   int f = 0;
+
+   printf("sockopts {");
+   if (t->sockopt.so_timestamp)
+   printf(" SO_TIMESTAMP ");
+   if (t->sockopt.so_timestampns)
+   printf(" SO_TIMESTAMPNS &q

[PATCH net-next 1/2] tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg

2017-08-22 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

When SOF_TIMESTAMPING_RX_SOFTWARE is enabled for tcp sockets, return the
timestamp corresponding to the highest sequence number data returned.

Previously, skb->tstamp was overwritten when a TCP packet was placed
in the out-of-order queue.  While the packet is in the ooo queue, save the
timestamp in the TCP_SKB_CB.  This space is shared with the gso_*
options, which are only used on the tx path, and a previously unused
4-byte hole.

When skbs are coalesced, either in the sk_receive_queue or the
out_of_order_queue, always choose the timestamp of the appended skb to
maintain the invariant of returning the timestamp of the last byte in
the recvmsg buffer.

Signed-off-by: Mike Maloney <malo...@google.com>
---
 include/net/tcp.h|  9 +++-
 net/ipv4/tcp.c   | 63 
 net/ipv4/tcp_input.c | 35 +
 net/ipv4/tcp_ipv4.c  |  2 ++
 net/ipv6/tcp_ipv6.c  |  2 ++
 5 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index afdab3781425..f26d20e9760d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -774,6 +774,12 @@ struct tcp_skb_cb {
u16 tcp_gso_segs;
u16 tcp_gso_size;
};
+
+   /* Used to stash the receive timestamp while this skb is in the
+* out of order queue, as skb->tstamp is overwritten by the
+* rbnode.
+*/
+   ktime_t swtstamp;
};
__u8 tcp_flags; /* TCP header flags. (tcp[13])  */
 
@@ -790,7 +796,8 @@ struct tcp_skb_cb {
__u8 ip_dsfield; /* IPv4 tos or IPv6 dsfield */
__u8 txstamp_ack:1,  /* Record TX timestamp for ack? */
eor:1,  /* Is skb MSG_EOR marked? */
-   unused:6;
+   has_rxtstamp:1, /* SKB has a RX timestamp   */
+   unused:5;
__u32   ack_seq; /* Sequence number ACK'd */
union {
struct {
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index d25e3bcca66b..4c58c7b2d8ed 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -269,6 +269,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1695,6 +1696,59 @@ int tcp_peek_len(struct socket *sock)
 }
 EXPORT_SYMBOL(tcp_peek_len);
 
+static void tcp_update_recv_tstamps(struct sk_buff *skb,
+   struct scm_timestamping *tss)
+{
+   if (skb->tstamp)
+   tss->ts[0] = ktime_to_timespec(skb->tstamp);
+   else
+   tss->ts[0] = (struct timespec) {0};
+
+   if (skb_hwtstamps(skb)->hwtstamp)
+   tss->ts[2] = ktime_to_timespec(skb_hwtstamps(skb)->hwtstamp);
+   else
+   tss->ts[2] = (struct timespec) {0};
+}
+
+/* Similar to __sock_recv_timestamp, but does not require an skb */
+void tcp_recv_timestamp(struct msghdr *msg, const struct sock *sk,
+   struct scm_timestamping *tss)
+{
+   struct timeval tv;
+   bool has_timestamping = false;
+
+   if (tss->ts[0].tv_sec || tss->ts[0].tv_nsec) {
+   if (sock_flag(sk, SOCK_RCVTSTAMP)) {
+   if (sock_flag(sk, SOCK_RCVTSTAMPNS)) {
+   put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPNS,
+sizeof(tss->ts[0]), &tss->ts[0]);
+   } else {
+   tv.tv_sec = tss->ts[0].tv_sec;
+   tv.tv_usec = tss->ts[0].tv_nsec / 1000;
+
+   put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMP,
+sizeof(tv), &tv);
+   }
+   }
+
+   if (sk->sk_tsflags & SOF_TIMESTAMPING_SOFTWARE)
+   has_timestamping = true;
+   else
+   tss->ts[0] = (struct timespec) {0};
+   }
+
+   if (tss->ts[2].tv_sec || tss->ts[2].tv_nsec) {
+   if (sk->sk_tsflags & SOF_TIMESTAMPING_RAW_HARDWARE)
+   has_timestamping = true;
+   else
+   tss->ts[2] = (struct timespec) {0};
+   }
+
+   if (has_timestamping)
+   put_cmsg(msg, SOL_SOCKET, SCM_TIMESTAMPING,
+sizeof(*tss), tss);
+}
+
 /*
  * This routine copies from a sock struct into the user buffer.
  *
@@ -1716,6 +1770,8 @@ int tcp_recvmsg(struct sock *sk, struct msghdr *msg, 
size_t len, int nonblock,
long timeo;
struct sk_buff *skb, *last;
u32 urg_hole = 0;
+   struct scm_timestamping tss;
+   bool has_tss = false;
 
if (unlikely(flags & MSG_ERRQUEUE))
return inet_

[PATCH net-next 0/2] tcp: Add software rx timestamp for TCP.

2017-08-22 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

Add software rx timestamps for TCP, and a test to ensure consistency of
behavior between IP, UDP, and TCP implementations.

Mike Maloney (2):
  tcp: Extend SOF_TIMESTAMPING_RX_SOFTWARE to TCP recvmsg
  selftests/net: Add a test to validate behavior of rx timestamps

 include/net/tcp.h  |   9 +-
 net/ipv4/tcp.c |  63 
 net/ipv4/tcp_input.c   |  35 +-
 net/ipv4/tcp_ipv4.c|   2 +
 net/ipv6/tcp_ipv6.c|   2 +
 .../selftests/networking/timestamping/.gitignore   |   1 +
 .../selftests/networking/timestamping/Makefile |   4 +-
 .../networking/timestamping/rxtimestamp.c  | 379 +
 8 files changed, 489 insertions(+), 6 deletions(-)
 create mode 100644 
tools/testing/selftests/networking/timestamping/rxtimestamp.c

-- 
2.14.1.480.gb18f417b89-goog



[PATCH net-next] selftests/net: Fix broken test case in psock_fanout

2017-04-24 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

The error return value from sock_fanout_open is -1, not zero.  One test
case was checking for 0 instead of -1.
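
To illustrate why the old condition misfired, here is a minimal sketch. The
two helper names are hypothetical and only stand in for the old and new forms
of the `if` condition; `ret` plays the role of the value returned by
sock_fanout_open().

```c
#include <stdbool.h>

/* Old condition: `if (sock_fanout_open(...))`.
 * Any nonzero value is true, and that includes the failure value -1,
 * so the test printed an ERROR even when the join was correctly refused.
 */
bool error_reported_before_fix(int ret)
{
	return ret != 0;
}

/* New condition: `if (sock_fanout_open(...) != -1)`.
 * Only an actual socket (anything other than -1) counts as a wrongly
 * successful join.
 */
bool error_reported_after_fix(int ret)
{
	return ret != -1;
}
```

With ret == -1 (the expected outcome in this test case) the old form reports
an error and the new form does not; with a valid descriptor both forms report
an error.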

Tested: Built and tested in clean client.
Signed-off-by: Mike Maloney <malo...@google.com>
---
 tools/testing/selftests/net/psock_fanout.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/psock_fanout.c 
b/tools/testing/selftests/net/psock_fanout.c
index b4b1d91fcea5..989f917068d1 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -305,7 +305,7 @@ static void test_unique_fanout_group_ids(void)
exit(1);
}
 
-   if (sock_fanout_open(PACKET_FANOUT_CPU, first_group_id)) {
+   if (sock_fanout_open(PACKET_FANOUT_CPU, first_group_id) != -1) {
fprintf(stderr, "ERROR: joined group with wrong type.\n");
exit(1);
}
-- 
2.13.0.rc0.306.g87b477812d-goog



[PATCH netnext 0/3] packet: Add option to create new fanout group with unique id.

2017-04-20 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

Fanout uses a per-net global namespace. A process that intends to create a
new fanout group can accidentally join an existing group, and it is
not possible to detect this.

Add a socket option to specify on the first call to
setsockopt(..., PACKET_FANOUT, ...) to ensure that a new group is created.
Also add tests.
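
As a rough sketch of the argument layout involved: the int passed to
setsockopt(..., PACKET_FANOUT, ...) carries the fanout type and flags in its
upper 16 bits and the group id in its lower 16 bits. The helper names below
are hypothetical, and the 0xffff mask is my assumption for a 16-bit group id.

```c
#include <stdint.h>

/* Hypothetical helper: build the PACKET_FANOUT option value.
 * Upper 16 bits: fanout type ORed with flags; lower 16 bits: group id.
 */
uint32_t fanout_pack(uint16_t typeflags, uint16_t group_id)
{
	return ((uint32_t)typeflags << 16) | group_id;
}

/* Hypothetical helper: split a value read back with getsockopt() into
 * its two halves, e.g. to learn which group id was assigned.
 */
void fanout_unpack(uint32_t arg, uint16_t *typeflags, uint16_t *group_id)
{
	*typeflags = (uint16_t)(arg >> 16);
	*group_id = (uint16_t)(arg & 0xffff);
}
```

A process that requests a unique group would pass a group id of 0 together
with the new flag, then read the option back to find out which id it got.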

Mike Maloney (3):
  selftests/net: cleanup unused parameter in psock_fanout
  packet: add PACKET_FANOUT_FLAG_UNIQUEID to assign new fanout group id.
  selftests/net: add tests for PACKET_FANOUT_FLAG_UNIQUEID

 include/uapi/linux/if_packet.h |  1 +
 net/packet/af_packet.c | 44 ++
 tools/testing/selftests/net/psock_fanout.c | 93 ++
 3 files changed, 128 insertions(+), 10 deletions(-)

-- 
2.12.2.816.g281164-goog



[PATCH netnext 3/3] selftests/net: add tests for PACKET_FANOUT_FLAG_UNIQUEID

2017-04-20 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

Create two groups with PACKET_FANOUT_FLAG_UNIQUEID, add a socket to one.
Ensure that the groups can only be joined if all options are consistent
with the original except for this flag.

Tested:
ran tools/testing/selftests/net/psock_fanout 10 times, all pass.

Signed-off-by: Mike Maloney <malo...@google.com>
Acked-by: Willem de Bruijn <will...@google.com>

---
 tools/testing/selftests/net/psock_fanout.c | 95 ++
 1 file changed, 84 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/net/psock_fanout.c 
b/tools/testing/selftests/net/psock_fanout.c
index b475d87d3aa3..1c46a21a9b5e 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -71,7 +71,7 @@
 
 /* Open a socket in a given fanout mode.
  * @return -1 if mode is bad, a valid socket otherwise */
-static int sock_fanout_open(uint16_t typeflags)
+static int sock_fanout_open(uint16_t typeflags, uint16_t group_id)
 {
int fd, val;
 
@@ -81,8 +81,7 @@ static int sock_fanout_open(uint16_t typeflags)
exit(1);
}
 
-   /* fanout group ID is always 0: tests whether old groups are deleted */
-   val = ((int) typeflags) << 16;
+   val = (((int) typeflags) << 16) | group_id;
if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &val, sizeof(val))) {
if (close(fd)) {
perror("close packet");
@@ -95,6 +94,20 @@ static int sock_fanout_open(uint16_t typeflags)
return fd;
 }
 
+static void sock_fanout_getopts(int fd, uint16_t *typeflags, uint16_t 
*group_id)
+{
+   int sockopt;
+   socklen_t sockopt_len = sizeof(sockopt);
+
+   if (getsockopt(fd, SOL_PACKET, PACKET_FANOUT,
+  &sockopt, &sockopt_len)) {
+   perror("failed to getsockopt");
+   exit(1);
+   }
+   *typeflags = sockopt >> 16;
+   *group_id = sockopt & 0xf;
+}
+
 static void sock_fanout_set_ebpf(int fd)
 {
const int len_off = __builtin_offsetof(struct __sk_buff, len);
@@ -210,7 +223,7 @@ static void test_control_single(void)
fprintf(stderr, "test: control single socket\n");
 
if (sock_fanout_open(PACKET_FANOUT_ROLLOVER |
-  PACKET_FANOUT_FLAG_ROLLOVER) != -1) {
+  PACKET_FANOUT_FLAG_ROLLOVER, 0) != -1) {
fprintf(stderr, "ERROR: opened socket with dual rollover\n");
exit(1);
}
@@ -223,26 +236,26 @@ static void test_control_group(void)
 
fprintf(stderr, "test: control multiple sockets\n");
 
-   fds[0] = sock_fanout_open(PACKET_FANOUT_HASH);
+   fds[0] = sock_fanout_open(PACKET_FANOUT_HASH, 0);
if (fds[0] == -1) {
fprintf(stderr, "ERROR: failed to open HASH socket\n");
exit(1);
}
if (sock_fanout_open(PACKET_FANOUT_HASH |
-  PACKET_FANOUT_FLAG_DEFRAG) != -1) {
+  PACKET_FANOUT_FLAG_DEFRAG, 0) != -1) {
fprintf(stderr, "ERROR: joined group with wrong flag defrag\n");
exit(1);
}
if (sock_fanout_open(PACKET_FANOUT_HASH |
-  PACKET_FANOUT_FLAG_ROLLOVER) != -1) {
+  PACKET_FANOUT_FLAG_ROLLOVER, 0) != -1) {
fprintf(stderr, "ERROR: joined group with wrong flag ro\n");
exit(1);
}
-   if (sock_fanout_open(PACKET_FANOUT_CPU) != -1) {
+   if (sock_fanout_open(PACKET_FANOUT_CPU, 0) != -1) {
fprintf(stderr, "ERROR: joined group with wrong mode\n");
exit(1);
}
-   fds[1] = sock_fanout_open(PACKET_FANOUT_HASH);
+   fds[1] = sock_fanout_open(PACKET_FANOUT_HASH, 0);
if (fds[1] == -1) {
fprintf(stderr, "ERROR: failed to join group\n");
exit(1);
@@ -253,6 +266,61 @@ static void test_control_group(void)
}
 }
 
+/* Test creating unique fanout group ids */
+static void test_unique_fanout_group_ids(void)
+{
+   int fds[3];
+   uint16_t typeflags, first_group_id, second_group_id;
+
+   fprintf(stderr, "test: unique ids\n");
+
+   fds[0] = sock_fanout_open(PACKET_FANOUT_HASH |
+ PACKET_FANOUT_FLAG_UNIQUEID, 0);
+   if (fds[0] == -1) {
+   fprintf(stderr, "ERROR: failed to create a unique id group.\n");
+   exit(1);
+   }
+
+   sock_fanout_getopts(fds[0], &typeflags, &first_group_id);
+   if (typeflags != PACKET_FANOUT_HASH) {
+   fprintf(stderr, "ERROR: unexpected typeflags %x\n", typeflags);
+   exit(1);
+   }
+
+   if (sock_fanout_open(PACKET_FANOUT_CPU, first_group_id)) {
+   fprintf(stderr

[PATCH netnext 1/3] selftests/net: cleanup unused parameter in psock_fanout

2017-04-20 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

sock_fanout_open no longer sets the size of packet_socket ring, so stop
passing the parameter.

Tested:
Built and ran the test, it passed.

Signed-off-by: Mike Maloney <malo...@google.com>
Acked-by: Willem de Bruijn <will...@google.com>

---
 tools/testing/selftests/net/psock_fanout.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tools/testing/selftests/net/psock_fanout.c 
b/tools/testing/selftests/net/psock_fanout.c
index 412459369686..b475d87d3aa3 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -71,7 +71,7 @@
 
 /* Open a socket in a given fanout mode.
  * @return -1 if mode is bad, a valid socket otherwise */
-static int sock_fanout_open(uint16_t typeflags, int num_packets)
+static int sock_fanout_open(uint16_t typeflags)
 {
int fd, val;
 
@@ -210,7 +210,7 @@ static void test_control_single(void)
fprintf(stderr, "test: control single socket\n");
 
if (sock_fanout_open(PACKET_FANOUT_ROLLOVER |
-  PACKET_FANOUT_FLAG_ROLLOVER, 0) != -1) {
+  PACKET_FANOUT_FLAG_ROLLOVER) != -1) {
fprintf(stderr, "ERROR: opened socket with dual rollover\n");
exit(1);
}
@@ -223,26 +223,26 @@ static void test_control_group(void)
 
fprintf(stderr, "test: control multiple sockets\n");
 
-   fds[0] = sock_fanout_open(PACKET_FANOUT_HASH, 20);
+   fds[0] = sock_fanout_open(PACKET_FANOUT_HASH);
if (fds[0] == -1) {
fprintf(stderr, "ERROR: failed to open HASH socket\n");
exit(1);
}
if (sock_fanout_open(PACKET_FANOUT_HASH |
-  PACKET_FANOUT_FLAG_DEFRAG, 10) != -1) {
+  PACKET_FANOUT_FLAG_DEFRAG) != -1) {
fprintf(stderr, "ERROR: joined group with wrong flag defrag\n");
exit(1);
}
if (sock_fanout_open(PACKET_FANOUT_HASH |
-  PACKET_FANOUT_FLAG_ROLLOVER, 10) != -1) {
+  PACKET_FANOUT_FLAG_ROLLOVER) != -1) {
fprintf(stderr, "ERROR: joined group with wrong flag ro\n");
exit(1);
}
-   if (sock_fanout_open(PACKET_FANOUT_CPU, 10) != -1) {
+   if (sock_fanout_open(PACKET_FANOUT_CPU) != -1) {
fprintf(stderr, "ERROR: joined group with wrong mode\n");
exit(1);
}
-   fds[1] = sock_fanout_open(PACKET_FANOUT_HASH, 20);
+   fds[1] = sock_fanout_open(PACKET_FANOUT_HASH);
if (fds[1] == -1) {
fprintf(stderr, "ERROR: failed to join group\n");
exit(1);
@@ -263,8 +263,8 @@ static int test_datapath(uint16_t typeflags, int port_off,
 
fprintf(stderr, "test: datapath 0x%hx\n", typeflags);
 
-   fds[0] = sock_fanout_open(typeflags, 20);
-   fds[1] = sock_fanout_open(typeflags, 20);
+   fds[0] = sock_fanout_open(typeflags);
+   fds[1] = sock_fanout_open(typeflags);
if (fds[0] == -1 || fds[1] == -1) {
fprintf(stderr, "ERROR: failed open\n");
exit(1);
-- 
2.12.2.816.g281164-goog



Re: [PATCH net] selftests/net: Fixes psock_fanout CBPF test case

2017-04-18 Thread Mike Maloney
On Tue, Apr 18, 2017 at 11:26 AM, Sowmini Varadhan
<sowmini.varad...@oracle.com> wrote:
> On (04/18/17 11:14), Mike Maloney wrote:
>> Change 'psock_fanout' to use SOCK_RAW so that the CBPF program used with
>> SO_ATTACH_FILTER can examine the entire frame.  Create a new CBPF
>> program for use with PACKET_FANOUT_DATA which ignores the header, as it
>> cannot see the ethernet header.
>
> Fix looks good to me, but could you please also add the bpf_asm input
> as a comment to the C code, in case we want to read/extend this
> down the road?
>
> --Sowmini
>

I am not 100% sure what you are asking for, as the instructions you
can feed to bpf_asm are already commented to the right of the program.

-Mike


[PATCH net] selftests/net: Fixes psock_fanout CBPF test case

2017-04-18 Thread Mike Maloney
From: Mike Maloney <malo...@google.com>

'psock_fanout' has been failing since commit 4d7b9dc1f36a9 ("tools:
psock_lib: harden socket filter used by psock tests").  That commit
changed the CBPF filter to examine the full ethernet frame, and was
tested on 'psock_tpacket' which uses SOCK_RAW.  But 'psock_fanout' was
also using this same CBPF in two places, for filtering and fanout, on a
SOCK_DGRAM socket.

Change 'psock_fanout' to use SOCK_RAW so that the CBPF program used with
SO_ATTACH_FILTER can examine the entire frame.  Create a new CBPF
program for use with PACKET_FANOUT_DATA which ignores the header, as it
cannot see the ethernet header.
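
For illustration, here is a plain C model of what the new two-instruction
fanout program does. It is only a sketch, not kernel code. The function names
are hypothetical, and the "return value modulo number of sockets" rule is my
understanding of how the fanout demux uses the program's result.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of:
 *   ldb [80]   load the byte at offset 80 into A
 *   ret A      return A
 * A classic BPF load past the end of the packet terminates the
 * program with a return value of 0.
 */
uint32_t cbpf_ldb80_ret_a(const uint8_t *pkt, size_t len)
{
	if (len <= 80)
		return 0;
	return pkt[80];
}

/* Assumed demux rule: the program's return value, taken modulo the
 * number of sockets in the group, selects the receiving socket.
 * num_socks must be nonzero.
 */
unsigned int fanout_pick(uint32_t ret, unsigned int num_socks)
{
	return ret % num_socks;
}
```

Under this model, packets whose byte at offset 80 differs by one land on
different sockets in a two-socket group, which is what a datapath test needs
in order to steer traffic deliberately.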

Tested: Ran tools/testing/selftests/net/psock_{fanout,tpacket} 10 times,
and they all passed.

Fixes: 4d7b9dc1f36a9 ("tools: psock_lib: harden socket filter used by psock 
tests")
Signed-off-by: 'Mike Maloney <maloneyker...@gmail.com>'

---
 tools/testing/selftests/net/psock_fanout.c | 22 --
 tools/testing/selftests/net/psock_lib.h| 13 +++--
 2 files changed, 23 insertions(+), 12 deletions(-)

diff --git a/tools/testing/selftests/net/psock_fanout.c 
b/tools/testing/selftests/net/psock_fanout.c
index 412459369686..e62bb354820c 100644
--- a/tools/testing/selftests/net/psock_fanout.c
+++ b/tools/testing/selftests/net/psock_fanout.c
@@ -75,7 +75,7 @@ static int sock_fanout_open(uint16_t typeflags, int 
num_packets)
 {
int fd, val;
 
-   fd = socket(PF_PACKET, SOCK_DGRAM, htons(ETH_P_IP));
+   fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_IP));
if (fd < 0) {
perror("socket packet");
exit(1);
@@ -95,6 +95,24 @@ static int sock_fanout_open(uint16_t typeflags, int 
num_packets)
return fd;
 }
 
+static void sock_fanout_set_cbpf(int fd)
+{
+   struct sock_filter bpf_filter[] = {
+   BPF_STMT(BPF_LD+BPF_B+BPF_ABS, 80),   /* ldb [80] */
+   BPF_STMT(BPF_RET+BPF_A, 0),   /* ret A */
+   };
+   struct sock_fprog bpf_prog;
+
+   bpf_prog.filter = bpf_filter;
+   bpf_prog.len = sizeof(bpf_filter) / sizeof(struct sock_filter);
+
+   if (setsockopt(fd, SOL_PACKET, PACKET_FANOUT_DATA, &bpf_prog,
+  sizeof(bpf_prog))) {
+   perror("fanout data cbpf");
+   exit(1);
+   }
+}
+
 static void sock_fanout_set_ebpf(int fd)
 {
const int len_off = __builtin_offsetof(struct __sk_buff, len);
@@ -270,7 +288,7 @@ static int test_datapath(uint16_t typeflags, int port_off,
exit(1);
}
if (type == PACKET_FANOUT_CBPF)
-   sock_setfilter(fds[0], SOL_PACKET, PACKET_FANOUT_DATA);
+   sock_fanout_set_cbpf(fds[0]);
else if (type == PACKET_FANOUT_EBPF)
sock_fanout_set_ebpf(fds[0]);
 
diff --git a/tools/testing/selftests/net/psock_lib.h 
b/tools/testing/selftests/net/psock_lib.h
index a77da88bf946..7d990d6c861b 100644
--- a/tools/testing/selftests/net/psock_lib.h
+++ b/tools/testing/selftests/net/psock_lib.h
@@ -38,7 +38,7 @@
 # define __maybe_unused __attribute__ ((__unused__))
 #endif
 
-static __maybe_unused void sock_setfilter(int fd, int lvl, int optnum)
+static __maybe_unused void pair_udp_setfilter(int fd)
 {
/* the filter below checks for all of the following conditions that
 * are based on the contents of create_payload()
@@ -76,23 +76,16 @@ static __maybe_unused void sock_setfilter(int fd, int lvl, 
int optnum)
};
struct sock_fprog bpf_prog;
 
-   if (lvl == SOL_PACKET && optnum == PACKET_FANOUT_DATA)
-   bpf_filter[5].code = 0x16;   /* RET A */
-
bpf_prog.filter = bpf_filter;
bpf_prog.len = sizeof(bpf_filter) / sizeof(struct sock_filter);
-   if (setsockopt(fd, lvl, optnum, &bpf_prog,
+
+   if (setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &bpf_prog,
   sizeof(bpf_prog))) {
perror("setsockopt SO_ATTACH_FILTER");
exit(1);
}
 }
 
-static __maybe_unused void pair_udp_setfilter(int fd)
-{
-   sock_setfilter(fd, SOL_SOCKET, SO_ATTACH_FILTER);
-}
-
 static __maybe_unused void pair_udp_open(int fds[], uint16_t port)
 {
struct sockaddr_in saddr, daddr;
-- 
2.12.2.762.g0e3151a226-goog