Re: [PATCH ipsec-next] xfrm: Allow xfrmi if_id to be updated by UPDSA

2018-07-22 Thread Steffen Klassert
On Thu, Jul 19, 2018 at 07:07:47PM -0700, Nathan Harold wrote:
> Allow attaching an SA to an xfrm interface id after
> the creation of the SA, so that tasks such as keying
> which must be done as the SA is created, can remain
> separate from the decision on how to route traffic
> from an SA. This permits SA creation to be decomposed
> into three separate steps:
> 1) allocation of a SPI
> 2) algorithm and key negotiation
> 3) insertion into the data path
> 
> Signed-off-by: Nathan Harold 

Applied to ipsec-next, thanks Nathan!


Re: [PATCH ipsec-next] xfrm: Remove xfrmi interface ID from flowi

2018-07-22 Thread Steffen Klassert
On Thu, Jul 19, 2018 at 10:50:44AM -0700, Benedict Wong wrote:
> In order to remove the performance impact of having the extra u32 in every
> single flowi, this change removes the flowi_xfrm struct, preferring to
> take the if_id as a method parameter where needed.
> 
> In the inbound direction, if_id is only needed during the
> __xfrm_policy_check() function, and the if_id can be determined at that
> point based on the skb. As such, xfrmi_decode_session() is only called
> with the skb in __xfrm_policy_check().
> 
> In the outbound direction, the only place where if_id is needed is the
> xfrm_lookup() call in xfrmi_xmit2(). With this change, the if_id is
> directly passed into the xfrm_lookup_with_ifid() call. All existing
> callers can still call xfrm_lookup(), which uses a default if_id of 0.
> 
> This change does not change any behavior of XFRMIs except for improving
> overall system performance via flowi size reduction.
> 
> This change has been tested against the Android Kernel Networking Tests:
> 
> https://android.googlesource.com/kernel/tests/+/master/net/test
> 
> Signed-off-by: Benedict Wong 

Nice improvement. Applied to ipsec-next, thanks a lot!
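
For reference, the backwards-compatibility shim described above boils
down to this pattern (a sketch based on the commit description, not the
merged diff itself):

/* Existing callers keep working: xfrm_lookup() simply forwards a
 * default if_id of 0 to the new xfrm_lookup_with_ifid() variant. */
struct dst_entry *xfrm_lookup(struct net *net, struct dst_entry *dst_orig,
			      const struct flowi *fl, const struct sock *sk,
			      int flags)
{
	return xfrm_lookup_with_ifid(net, dst_orig, fl, sk, flags, 0);
}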


Re: [PATCH ipsec-next 1/1] xfrm: don't check offload_handle for nonzero

2018-07-22 Thread Steffen Klassert
On Tue, Jun 26, 2018 at 02:19:10PM -0700, Shannon Nelson wrote:
> The offload_handle should be an opaque data cookie for the driver
> to use, much like the data cookie for a timer or alarm callback.
> Thus, the XFRM stack should not be checking for non-zero, because
> the driver might use that to store an array reference, which could
> be zero, or some other zero but meaningful value.
> 
> We can remove the checks for non-zero because there are plenty
> of other attributes also being checked to see if there is an offload
> in place for the SA in question.
> 
> Signed-off-by: Shannon Nelson 

Applied to ipsec-next, thanks Shannon!
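
To illustrate the kind of check being removed (a schematic sketch, not
the actual hunks): tests that treat the cookie as a boolean go away in
favor of other offload state, such as the bound device pointer.

	/* before: wrongly assumes a driver cookie of 0 means "no offload" */
	if (!x->xso.offload_handle)
		return false;

	/* after: decide based on other offload attributes instead */
	if (!x->xso.dev)
		return false;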


Re: [PATCH net] ip: in cmsg IP(V6)_ORIGDSTADDR do not read beyond headlen

2018-07-22 Thread Willem de Bruijn
On Sun, Jul 22, 2018 at 9:43 PM Eric Dumazet  wrote:
>
>
>
> On 07/22/2018 05:43 PM, Willem de Bruijn wrote:
> > From: Willem de Bruijn 
> >
> > Syzbot reported a read beyond the end of the skb head when returning
> > IPV6_ORIGDSTADDR:
> >
> >   BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
> >   CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
> >   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
> >   Google 01/01/2011
> >   Call Trace:
> > __dump_stack lib/dump_stack.c:77 [inline]
> > dump_stack+0x185/0x1d0 lib/dump_stack.c:113
> > kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
> > kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
> > kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
> > copy_to_user include/linux/uaccess.h:184 [inline]
> > put_cmsg+0x5ef/0x860 net/core/scm.c:242
> > ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
> > ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
> > rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
> > [..]
> >
> > This logic and its ipv4 counterpart read the destination port from
> > the packet at skb_transport_offset(skb) + 4.
> >
> > With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
> > packet that stores headers exactly up to skb_transport_offset(skb) in
> > the head and the remainder in a frag.
> >
> > Avoid reading beyond skb->tail by testing skb_headlen(skb) instead of
> > skb->len when trying to access this field.
> >
> > Link: 
> > http://lkml.kernel.org/r/CAF=yd-lejwzj5a1-baaj2oy_hkmgygv6rsj_woraynv-fna...@mail.gmail.com
> > Reported-by: syzbot+9adb4b567003cac78...@syzkaller.appspotmail.com
> > Signed-off-by: Willem de Bruijn 
> > ---
> >  net/ipv4/ip_sockglue.c | 2 +-
> >  net/ipv6/datagram.c| 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> > index 64c76dcf7386..5f228d641d9b 100644
> > --- a/net/ipv4/ip_sockglue.c
> > +++ b/net/ipv4/ip_sockglue.c
> > @@ -152,7 +152,7 @@ static void ip_cmsg_recv_dstaddr(struct msghdr *msg, 
> > struct sk_buff *skb)
> >   const struct iphdr *iph = ip_hdr(skb);
> >   __be16 *ports = (__be16 *)skb_transport_header(skb);
> >
> > - if (skb_transport_offset(skb) + 4 > (int)skb->len)
> > + if (skb_transport_offset(skb) + 4 > (int)skb_headlen(skb))
> >   return;
> >
>
> This really should call pskb_may_pull() or skb_header_pointer().

Okay. Since in all common cases these bytes do lie in the skb head,
pskb_may_pull is pretty cheap. I'll use that. The linked discussion
already outlines a rough patch.
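
For reference, a sketch of what the pskb_may_pull() variant could look
like in the ipv4 helper (illustrative only; the actual v2 patch may
differ):

static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct sk_buff *skb)
{
	struct sockaddr_in sin;
	__be16 *ports;
	int end;

	end = skb_transport_offset(skb) + 4;
	if (end > 0 && !pskb_may_pull(skb, end))
		return;

	/* Read the ports only after the pull: pskb_may_pull() may
	 * reallocate the head and move the transport header. */
	ports = (__be16 *)skb_transport_header(skb);

	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = ip_hdr(skb)->daddr;
	sin.sin_port = ports[1];
	memset(sin.sin_zero, 0, sizeof(sin.sin_zero));

	put_cmsg(msg, SOL_IP, IP_ORIGDSTADDR, sizeof(sin), &sin);
}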


Re: [PATCH net] ip: in cmsg IP(V6)_ORIGDSTADDR do not read beyond headlen

2018-07-22 Thread Eric Dumazet



On 07/22/2018 05:43 PM, Willem de Bruijn wrote:
> From: Willem de Bruijn 
> 
> Syzbot reported a read beyond the end of the skb head when returning
> IPV6_ORIGDSTADDR:
> 
>   BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
>   CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
>   Google 01/01/2011
>   Call Trace:
> __dump_stack lib/dump_stack.c:77 [inline]
> dump_stack+0x185/0x1d0 lib/dump_stack.c:113
> kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
> kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
> kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
> copy_to_user include/linux/uaccess.h:184 [inline]
> put_cmsg+0x5ef/0x860 net/core/scm.c:242
> ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
> ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
> rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
> [..]
> 
> This logic and its ipv4 counterpart read the destination port from
> the packet at skb_transport_offset(skb) + 4.
> 
> With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
> packet that stores headers exactly up to skb_transport_offset(skb) in
> the head and the remainder in a frag.
> 
> Avoid reading beyond skb->tail by testing skb_headlen(skb) instead of
> skb->len when trying to access this field.
> 
> Link: 
> http://lkml.kernel.org/r/CAF=yd-lejwzj5a1-baaj2oy_hkmgygv6rsj_woraynv-fna...@mail.gmail.com
> Reported-by: syzbot+9adb4b567003cac78...@syzkaller.appspotmail.com
> Signed-off-by: Willem de Bruijn 
> ---
>  net/ipv4/ip_sockglue.c | 2 +-
>  net/ipv6/datagram.c| 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 64c76dcf7386..5f228d641d9b 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -152,7 +152,7 @@ static void ip_cmsg_recv_dstaddr(struct msghdr *msg, 
> struct sk_buff *skb)
>   const struct iphdr *iph = ip_hdr(skb);
>   __be16 *ports = (__be16 *)skb_transport_header(skb);
>  
> - if (skb_transport_offset(skb) + 4 > (int)skb->len)
> + if (skb_transport_offset(skb) + 4 > (int)skb_headlen(skb))
>   return;
>

This really should call pskb_may_pull() or skb_header_pointer().



[PATCH net] ip: in cmsg IP(V6)_ORIGDSTADDR do not read beyond headlen

2018-07-22 Thread Willem de Bruijn
From: Willem de Bruijn 

Syzbot reported a read beyond the end of the skb head when returning
IPV6_ORIGDSTADDR:

  BUG: KMSAN: kernel-infoleak in put_cmsg+0x5ef/0x860 net/core/scm.c:242
  CPU: 0 PID: 4501 Comm: syz-executor128 Not tainted 4.17.0+ #9
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
  Google 01/01/2011
  Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x185/0x1d0 lib/dump_stack.c:113
kmsan_report+0x188/0x2a0 mm/kmsan/kmsan.c:1125
kmsan_internal_check_memory+0x138/0x1f0 mm/kmsan/kmsan.c:1219
kmsan_copy_to_user+0x7a/0x160 mm/kmsan/kmsan.c:1261
copy_to_user include/linux/uaccess.h:184 [inline]
put_cmsg+0x5ef/0x860 net/core/scm.c:242
ip6_datagram_recv_specific_ctl+0x1cf3/0x1eb0 net/ipv6/datagram.c:719
ip6_datagram_recv_ctl+0x41c/0x450 net/ipv6/datagram.c:733
rawv6_recvmsg+0x10fb/0x1460 net/ipv6/raw.c:521
[..]

This logic and its ipv4 counterpart read the destination port from
the packet at skb_transport_offset(skb) + 4.

With MSG_MORE and a local SOCK_RAW sender, syzbot was able to cook a
packet that stores headers exactly up to skb_transport_offset(skb) in
the head and the remainder in a frag.

Avoid reading beyond skb->tail by testing skb_headlen(skb) instead of
skb->len when trying to access this field.

Link: 
http://lkml.kernel.org/r/CAF=yd-lejwzj5a1-baaj2oy_hkmgygv6rsj_woraynv-fna...@mail.gmail.com
Reported-by: syzbot+9adb4b567003cac78...@syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn 
---
 net/ipv4/ip_sockglue.c | 2 +-
 net/ipv6/datagram.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 64c76dcf7386..5f228d641d9b 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -152,7 +152,7 @@ static void ip_cmsg_recv_dstaddr(struct msghdr *msg, struct 
sk_buff *skb)
const struct iphdr *iph = ip_hdr(skb);
__be16 *ports = (__be16 *)skb_transport_header(skb);
 
-   if (skb_transport_offset(skb) + 4 > (int)skb->len)
+   if (skb_transport_offset(skb) + 4 > (int)skb_headlen(skb))
return;
 
/* All current transport protocols have the port numbers in the
diff --git a/net/ipv6/datagram.c b/net/ipv6/datagram.c
index 2ee08b6a86a4..fd43810aacd2 100644
--- a/net/ipv6/datagram.c
+++ b/net/ipv6/datagram.c
@@ -702,7 +702,7 @@ void ip6_datagram_recv_specific_ctl(struct sock *sk, struct 
msghdr *msg,
struct sockaddr_in6 sin6;
__be16 *ports = (__be16 *) skb_transport_header(skb);
 
-   if (skb_transport_offset(skb) + 4 <= (int)skb->len) {
+   if (skb_transport_offset(skb) + 4 <= (int)skb_headlen(skb)) {
/* All current transport protocols have the port 
numbers in the
 * first four bytes of the transport header and this 
function is
 * written with this assumption in mind.
-- 
2.18.0.233.g985f88cf7e-goog



Re: [PATCH] net: dsa: mv88e6xxx: fix races between lock and irq freeing

2018-07-22 Thread Uwe Kleine-König
On Sun, Jul 22, 2018 at 01:04:11PM -0700, David Miller wrote:
> From: Uwe Kleine-König 
> Date: Sun, 22 Jul 2018 21:00:35 +0200
> 
> > On Sat, Jul 21, 2018 at 10:44:09PM -0700, David Miller wrote:
> >> From: Uwe Kleine-König 
> >> Date: Fri, 20 Jul 2018 11:53:15 +0200
> >> 
> >> > free_irq() waits until all handlers for this IRQ have completed. As the
> >> > relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock
> >> > it might never return if the thread calling free_irq() holds this lock.
> >> > 
> >> > For the same reason kthread_cancel_delayed_work_sync() in the polling 
> >> > case
> >> > must not hold this lock.
> >> > 
> >> > Also first free the irq (or stop the worker respectively) such that
> >> > mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq
> >> > mappings are dropped in mv88e6xxx_g1_irq_free_common(), to prevent the
> >> > worker thread from calling handle_nested_irq(0), which results in a
> >> > NULL-pointer exception.
> >> > 
> >> > Signed-off-by: Uwe Kleine-König 
> >> 
> >> Looks good.
> >> 
> >> Note that the IRQ domain unmapping will do a synchronize_irq() which
> >> should cause the same deadlock as free_irq() will with the reg_lock
> >> held.
> > 
> > Do you think that there is still a problem? When free_irq() for the
> > external visible irq returns the muxed irqs should be all gone, too, so
> > this should not trigger, should it?
> 
> It shouldn't be a problem after your changes.
> 
> I'm just saying that I'm surprised that, in the original code, you see
> the deadlock in free_irq(), since the synchronize_irq() done by the
> IRQ domain code should have happened first.

ah, I see. This didn't happen because I added an msleep to
mv88e6xxx_g1_irq_thread_work() before the lock is taken, to widen the
race window for a different problem. So the sub-irqs were not active
when mv88e6xxx_g1_irq_free() ran, only the mux-irq was. When
irq_dispose_mapping() is called for the sub-irq there is no problem, as
this results in synchronize_irq() for the sub-irq, not the mux-irq.
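
To make the deadlock mechanism concrete, a schematic sketch (not the
driver's actual code):

	/* Buggy ordering: free_irq() waits for the threaded handler to
	 * finish, but the handler blocks on reg_lock, which the caller
	 * already holds. */
	mutex_lock(&chip->reg_lock);
	free_irq(chip->irq, chip);	/* never returns */
	mutex_unlock(&chip->reg_lock);

	/* Fixed ordering: release the IRQ before taking reg_lock, so the
	 * handler can still take and drop the lock while draining. */
	free_irq(chip->irq, chip);
	mutex_lock(&chip->reg_lock);
	/* ... teardown that actually needs register access ... */
	mutex_unlock(&chip->reg_lock);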

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH] net: dsa: mv88e6xxx: fix races between lock and irq freeing

2018-07-22 Thread David Miller
From: Uwe Kleine-König 
Date: Sun, 22 Jul 2018 21:00:35 +0200

> On Sat, Jul 21, 2018 at 10:44:09PM -0700, David Miller wrote:
>> From: Uwe Kleine-König 
>> Date: Fri, 20 Jul 2018 11:53:15 +0200
>> 
>> > free_irq() waits until all handlers for this IRQ have completed. As the
>> > relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock
>> > it might never return if the thread calling free_irq() holds this lock.
>> > 
>> > For the same reason kthread_cancel_delayed_work_sync() in the polling case
>> > must not hold this lock.
>> > 
>> > Also first free the irq (or stop the worker respectively) such that
>> > mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq
>> > mappings are dropped in mv88e6xxx_g1_irq_free_common(), to prevent the
>> > worker thread from calling handle_nested_irq(0), which results in a
>> > NULL-pointer exception.
>> > 
>> > Signed-off-by: Uwe Kleine-König 
>> 
>> Looks good.
>> 
>> Note that the IRQ domain unmapping will do a synchronize_irq() which
>> should cause the same deadlock as free_irq() will with the reg_lock
>> held.
> 
> Do you think that there is still a problem? When free_irq() for the
> external visible irq returns the muxed irqs should be all gone, too, so
> this should not trigger, should it?

It shouldn't be a problem after your changes.

I'm just saying that I'm surprised that, in the original code, you see
the deadlock in free_irq(), since the synchronize_irq() done by the
IRQ domain code should have happened first.


Re: [PATCH] net: dsa: mv88e6xxx: fix races between lock and irq freeing

2018-07-22 Thread Uwe Kleine-König
Hello,

On Sat, Jul 21, 2018 at 10:44:09PM -0700, David Miller wrote:
> From: Uwe Kleine-König 
> Date: Fri, 20 Jul 2018 11:53:15 +0200
> 
> > free_irq() waits until all handlers for this IRQ have completed. As the
> > relevant handler (mv88e6xxx_g1_irq_thread_fn()) takes the chip's reg_lock
> > it might never return if the thread calling free_irq() holds this lock.
> > 
> > For the same reason kthread_cancel_delayed_work_sync() in the polling case
> > must not hold this lock.
> > 
> > Also first free the irq (or stop the worker respectively) such that
> > mv88e6xxx_g1_irq_thread_work() isn't called any more before the irq
> > mappings are dropped in mv88e6xxx_g1_irq_free_common(), to prevent the
> > worker thread from calling handle_nested_irq(0), which results in a
> > NULL-pointer exception.
> > 
> > Signed-off-by: Uwe Kleine-König 
> 
> Looks good.
> 
> Note that the IRQ domain unmapping will do a synchronize_irq() which
> should cause the same deadlock as free_irq() will with the reg_lock
> held.

Do you think that there is still a problem? When free_irq() for the
external visible irq returns the muxed irqs should be all gone, too, so
this should not trigger, should it?

> Note also that g2 IRQ freeing gets the ordering right, and doesn't need
> a lock because it doesn't program any registers when tearing down its
> IRQ.

Yes.

> Applied and queued up for -stable, thanks.

Fine, thanks
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: [PATCH net-next 1/2] net: phy: add helper phy_polling_mode

2018-07-22 Thread David Miller


I think you can combine these two patches into one.

Thank you.


Re: [PATCH net-next 2/2] nfp: avoid buffer leak when FW communication fails

2018-07-22 Thread David Miller
From: Jakub Kicinski 
Date: Fri, 20 Jul 2018 21:14:39 -0700

> After device is stopped we reset the rings by moving all free buffers
> to positions [0, cnt - 2], and clear the position cnt - 1 in the ring.
> We then proceed to clear the read/write pointers.  This means that if
> we try to reset the ring again the code will assume that the next to
> fill buffer is at position 0 and swap it with cnt - 1.  Since we
> previously cleared position cnt - 1 it will lead to leaking the first
> buffer and leaving ring in a bad state.
> 
> This scenario can only happen if FW communication fails, in which case
> the ring will never be used again, so the fact it's in a bad state will
> not be noticed.  Buffer leak is the only problem.  Don't try to move
> buffers in the ring if the read/write pointers indicate the ring was
> never used or have already been reset.
> 
> nfp_net_clear_config_and_disable() is now fully idempotent.
> 
> Found by code inspection, FW communication failures are very rare,
> and reconfiguring a live device is not common either, so it's unlikely
> anyone has ever noticed the leak.
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Dirk van der Merwe 

Applied.

> This is arguably net material but IMHO the risk of me missing something
> this could break is higher than the error actually occurring, and a
> page leak on a FW communication error doesn't seem like it's worth
> it at -rc6 time. I'm happy to respin if I'm wrong!

Agreed, net-next is more appropriate for this.

Thanks.
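
The guard described above amounts to an early return keyed on the ring
pointers; schematically (a sketch, not the exact hunk):

	/* If the read/write pointers are still at their initial value,
	 * the ring was never used or has already been reset, so skip the
	 * buffer shuffling (and the leak of the slot at cnt - 1). */
	if (rx_ring->rd_p == 0 && rx_ring->wr_p == 0)
		return;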


Re: [PATCH net-next 1/2] nfp: bring back support for offloading shared blocks

2018-07-22 Thread David Miller
From: Jakub Kicinski 
Date: Fri, 20 Jul 2018 21:14:38 -0700

> Now that we have offload replay infrastructure added by
> commit 326367427cc0 ("net: sched: call reoffload op on block callback reg")
> and flows are guaranteed to be removed correctly, we can revert
> commit 951a8ee6def3 ("nfp: reject binding to shared blocks").
> 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: John Hurley 

Applied.


Re: [PATCH net] nfp: flower: ensure dead neighbour entries are not offloaded

2018-07-22 Thread David Miller
From: Jakub Kicinski 
Date: Fri, 20 Jul 2018 21:07:54 -0700

> From: John Hurley 
> 
> Previously only the neighbour state was checked to decide if an offloaded
> entry should be removed. However, there can be situations when the entry
> is dead but still marked as valid. This can lead to dead entries not
> being removed from fw tables or even incorrect data being added.
> 
> Check the entry dead bit before deciding if it should be added to or
> removed from fw neighbour tables.
> 
> Fixes: 8e6a9046b66a ("nfp: flower vxlan neighbour offload")
> Signed-off-by: John Hurley 
> Reviewed-by: Jakub Kicinski 

Patch applied and queued up for -stable, thanks Jakub.


Re: [PATCH net 0/4] vxlan: fix default fdb entry user-space notify ordering/race

2018-07-22 Thread David Miller
From: Roopa Prabhu 
Date: Fri, 20 Jul 2018 13:21:00 -0700

> From: Roopa Prabhu 
> 
> Problem:
> In vxlan_newlink, a default fdb entry is added before register_netdev.
> The default fdb creation function notifies user-space of the
> fdb entry on the vxlan device which user-space does not know about yet.
> (RTM_NEWNEIGH goes before RTM_NEWLINK for the same ifindex).
> 
> This series fixes the user-space netlink notification ordering issue
> with the following changes:
> - decouple fdb notify from fdb create.
> - Move fdb notify after register_netdev.
> - modify rtnl_configure_link to allow configuring a link early.
> - Call rtnl_configure_link in vxlan newlink handler to notify
> userspace about the newlink before fdb notify and
> hence avoiding the user-space race.
> 
> Fixes: afbd8bae9c79 ("vxlan: add implicit fdb entry for default destination")
> Signed-off-by: Roopa Prabhu 

Series applied and queued up for -stable, thanks Roopa.
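
In outline, the fixed vxlan_newlink() ordering looks like this (a
schematic sketch using names from the commit message; argument lists are
indicative, not copied from the series):

	err = register_netdevice(dev);
	if (err)
		goto errout;

	/* tell userspace about the link first (RTM_NEWLINK) ... */
	err = rtnl_configure_link(dev, NULL);
	if (err)
		goto unregister;

	/* ... and only then about the default fdb entry (RTM_NEWNEIGH) */
	if (f)
		vxlan_fdb_notify(vxlan, f, first_remote_rtnl(f), RTM_NEWNEIGH);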


Re: [PATCH net] atl1c: reserve min skb headroom

2018-07-22 Thread David Miller
From: Florian Westphal 
Date: Fri, 20 Jul 2018 19:30:57 +0200

> Got crash report with following backtrace:
> BUG: unable to handle kernel paging request at 8801869daffe
> RIP: 0010:[]  [] 
> ip6_finish_output2+0x394/0x4c0
> RSP: 0018:880186c83a98  EFLAGS: 00010283
> RAX: 8801869db00e ...
>   [] ip6_finish_output+0x8c/0xf0
>   [] ip6_output+0x57/0x100
>   [] ip6_forward+0x4b9/0x840
>   [] ip6_rcv_finish+0x66/0xc0
>   [] ipv6_rcv+0x319/0x530
>   [] netif_receive_skb+0x1c/0x70
>   [] atl1c_clean+0x1ec/0x310 [atl1c]
>   ...
> 
> The bad access is in neigh_hh_output(), at skb->data - 16 (HH_DATA_MOD).
> atl1c driver provided skb with no headroom, so 14 bytes (ethernet
> header) got pulled, but then 16 are copied.
> 
> Reserve NET_SKB_PAD bytes headroom, like netdev_alloc_skb().
> 
> Compile tested only; I lack hardware.
> 
> Fixes: 7b7017642199 ("atl1c: Fix misuse of netdev_alloc_skb in refilling rx 
> ring")
> Signed-off-by: Florian Westphal 

Ancient bug :-/

Applied and queued up for -stable, thanks Florian.
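
The fix pattern itself is tiny; schematically (a sketch following the
commit message, not the exact hunk), the rx refill path reserves
headroom the way netdev_alloc_skb() does:

	skb = build_skb(data, adapter->rx_frag_size);
	if (likely(skb)) {
		/* keep NET_SKB_PAD bytes before skb->data so helpers such
		 * as neigh_hh_output()'s HH_DATA_MOD copy stay in bounds */
		skb_reserve(skb, NET_SKB_PAD);
		/* ... refill bookkeeping ... */
	}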


Re: [PATCH net-next] bonding: don't cast const buf in sysfs store

2018-07-22 Thread David Miller
From: Nikolay Aleksandrov 
Date: Sun, 22 Jul 2018 11:37:31 +0300

> As was recently discussed [1], let's avoid casting the const buf in
> bonding_sysfs_store_option and use kstrndup/kfree instead.
> 
> [1] http://lists.openwall.net/netdev/2018/07/22/25
> 
> Signed-off-by: Nikolay Aleksandrov 

Applied, thanks Nikolay.


[PATCH v3 bpf-next 7/8] veth: Add XDP TX and REDIRECT

2018-07-22 Thread Toshiaki Makita
From: Toshiaki Makita 

This allows further redirection of xdp_frames like

 NIC   -> veth--veth -> veth--veth
 (XDP)  (XDP) (XDP)

The intermediate XDP, redirecting packets from NIC to the other veth,
reuses xdp_mem_info from NIC so that page recycling of the NIC works on
the destination veth's XDP.
In this way return_frame is not fully guarded by NAPI, since another
NAPI handler on another cpu may use the same xdp_mem_info concurrently.
Thus disable napi_direct via the XDP_MEM_RF_NO_DIRECT flag.

v3:
- Fix double free when veth_xdp_tx() returns a positive value.
- Convert xdp_xmit and xdp_redir variables into flags.

Signed-off-by: Toshiaki Makita 
---
 drivers/net/veth.c | 119 +
 1 file changed, 110 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 57187e955fea..0323a4ca74e2 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -32,6 +32,10 @@
 #define VETH_RING_SIZE 256
 #define VETH_XDP_HEADROOM  (XDP_PACKET_HEADROOM + NET_IP_ALIGN)
 
+/* Separating two types of XDP xmit */
+#define VETH_XDP_TX    BIT(0)
+#define VETH_XDP_REDIR BIT(1)
+
 struct pcpu_vstats {
u64 packets;
u64 bytes;
@@ -45,6 +49,7 @@ struct veth_priv {
struct bpf_prog *_xdp_prog;
struct net_device __rcu *peer;
atomic64_t  dropped;
+   struct xdp_mem_info xdp_mem;
unsignedrequested_headroom;
boolrx_notify_masked;
struct ptr_ring xdp_ring;
@@ -311,10 +316,42 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
return n - drops;
 }
 
+static void veth_xdp_flush(struct net_device *dev)
+{
+   struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
+   struct net_device *rcv;
+
+   rcu_read_lock();
+   rcv = rcu_dereference(priv->peer);
+   if (unlikely(!rcv))
+   goto out;
+
+   rcv_priv = netdev_priv(rcv);
+   /* xdp_ring is initialized on receive side? */
+   if (unlikely(!rcu_access_pointer(rcv_priv->xdp_prog)))
+   goto out;
+
+   __veth_xdp_flush(rcv_priv);
+out:
+   rcu_read_unlock();
+}
+
+static int veth_xdp_tx(struct net_device *dev, struct xdp_buff *xdp)
+{
+   struct xdp_frame *frame = convert_to_xdp_frame(xdp);
+
+   if (unlikely(!frame))
+   return -EOVERFLOW;
+
+   return veth_xdp_xmit(dev, 1, &frame, 0);
+}
+
 static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
-   struct xdp_frame *frame)
+   struct xdp_frame *frame,
+   unsigned int *xdp_xmit)
 {
int len = frame->len, delta = 0;
+   struct xdp_frame orig_frame;
struct bpf_prog *xdp_prog;
unsigned int headroom;
struct sk_buff *skb;
@@ -338,6 +375,31 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv 
*priv,
delta = frame->data - xdp.data;
len = xdp.data_end - xdp.data;
break;
+   case XDP_TX:
+   orig_frame = *frame;
+   xdp.data_hard_start = frame;
+   xdp.rxq->mem = frame->mem;
+   xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT;
+   if (unlikely(veth_xdp_tx(priv->dev, &xdp) < 0)) {
+   trace_xdp_exception(priv->dev, xdp_prog, act);
+   frame = &orig_frame;
+   goto err_xdp;
+   }
+   *xdp_xmit |= VETH_XDP_TX;
+   rcu_read_unlock();
+   goto xdp_xmit;
+   case XDP_REDIRECT:
+   orig_frame = *frame;
+   xdp.data_hard_start = frame;
+   xdp.rxq->mem = frame->mem;
+   xdp.rxq->mem.flags |= XDP_MEM_RF_NO_DIRECT;
+   if (xdp_do_redirect(priv->dev, &xdp, xdp_prog)) {
+   frame = &orig_frame;
+   goto err_xdp;
+   }
+   *xdp_xmit |= VETH_XDP_REDIR;
+   rcu_read_unlock();
+   goto xdp_xmit;
default:
bpf_warn_invalid_xdp_action(act);
case XDP_ABORTED:
@@ -362,12 +424,13 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_priv 
*priv,
 err_xdp:
rcu_read_unlock();
xdp_return_frame(frame);
-
+xdp_xmit:
return NULL;
 }
 
 static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
-   struct sk_buff *skb)
+   struct sk_buff *skb,
+   unsigned int *xdp_xmit)
 {

[PATCH v3 bpf-next 4/8] veth: Handle xdp_frames in xdp napi ring

2018-07-22 Thread Toshiaki Makita
From: Toshiaki Makita 

This is preparation for XDP TX and ndo_xdp_xmit.
This allows napi handler to handle xdp_frames through xdp ring as well
as sk_buff.

v3:
- Revert v2 change around rings and use a flag to differentiate skb and
  xdp_frame, since bulk skb xmit makes little performance difference
  for now.

v2:
- Use another ring instead of using flag to differentiate skb and
  xdp_frame. This approach makes bulk skb transmit possible in
  veth_xmit later.
- Clear xdp_frame fields in skb->head.
- Implement adjust_tail.

Signed-off-by: Toshiaki Makita 
---
 drivers/net/veth.c | 87 ++
 1 file changed, 82 insertions(+), 5 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index f5b72e937d9d..4be75c58bc6a 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -22,12 +22,12 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 
 #define DRV_NAME   "veth"
 #define DRV_VERSION"1.0"
 
+#define VETH_XDP_FLAG  BIT(0)
 #define VETH_RING_SIZE 256
 #define VETH_XDP_HEADROOM  (XDP_PACKET_HEADROOM + NET_IP_ALIGN)
 
@@ -115,6 +115,24 @@ static const struct ethtool_ops veth_ethtool_ops = {
 
 /* general routines */
 
+static bool veth_is_xdp_frame(void *ptr)
+{
+   return (unsigned long)ptr & VETH_XDP_FLAG;
+}
+
+static void *veth_ptr_to_xdp(void *ptr)
+{
+   return (void *)((unsigned long)ptr & ~VETH_XDP_FLAG);
+}
+
+static void veth_ptr_free(void *ptr)
+{
+   if (veth_is_xdp_frame(ptr))
+   xdp_return_frame(veth_ptr_to_xdp(ptr));
+   else
+   kfree_skb(ptr);
+}
+
 static void __veth_xdp_flush(struct veth_priv *priv)
 {
/* Write ptr_ring before reading rx_notify_masked */
@@ -249,6 +267,61 @@ static struct sk_buff *veth_build_skb(void *head, int 
headroom, int len,
return skb;
 }
 
+static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
+   struct xdp_frame *frame)
+{
+   int len = frame->len, delta = 0;
+   struct bpf_prog *xdp_prog;
+   unsigned int headroom;
+   struct sk_buff *skb;
+
+   rcu_read_lock();
+   xdp_prog = rcu_dereference(priv->xdp_prog);
+   if (likely(xdp_prog)) {
+   struct xdp_buff xdp;
+   u32 act;
+
+   xdp.data_hard_start = frame->data - frame->headroom;
+   xdp.data = frame->data;
+   xdp.data_end = frame->data + frame->len;
+   xdp.data_meta = frame->data - frame->metasize;
+   xdp.rxq = &priv->xdp_rxq;
+
+   act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
+   switch (act) {
+   case XDP_PASS:
+   delta = frame->data - xdp.data;
+   len = xdp.data_end - xdp.data;
+   break;
+   default:
+   bpf_warn_invalid_xdp_action(act);
+   case XDP_ABORTED:
+   trace_xdp_exception(priv->dev, xdp_prog, act);
+   case XDP_DROP:
+   goto err_xdp;
+   }
+   }
+   rcu_read_unlock();
+
+   headroom = frame->data - delta - (void *)frame;
+   skb = veth_build_skb(frame, headroom, len, 0);
+   if (!skb) {
+   xdp_return_frame(frame);
+   goto err;
+   }
+
+   memset(frame, 0, sizeof(*frame));
+   skb->protocol = eth_type_trans(skb, priv->dev);
+err:
+   return skb;
+err_xdp:
+   rcu_read_unlock();
+   xdp_return_frame(frame);
+
+   return NULL;
+}
+
 static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
struct sk_buff *skb)
 {
@@ -358,12 +431,16 @@ static int veth_xdp_rcv(struct veth_priv *priv, int 
budget)
int i, done = 0;
 
for (i = 0; i < budget; i++) {
-   struct sk_buff *skb = __ptr_ring_consume(&priv->xdp_ring);
+   void *ptr = __ptr_ring_consume(&priv->xdp_ring);
+   struct sk_buff *skb;
 
-   if (!skb)
+   if (!ptr)
break;
 
-   skb = veth_xdp_rcv_skb(priv, skb);
+   if (veth_is_xdp_frame(ptr))
+   skb = veth_xdp_rcv_one(priv, veth_ptr_to_xdp(ptr));
+   else
+   skb = veth_xdp_rcv_skb(priv, ptr);
 
if (skb)
napi_gro_receive(&priv->xdp_napi, skb);
@@ -416,7 +493,7 @@ static void veth_napi_del(struct net_device *dev)
napi_disable(&priv->xdp_napi);
netif_napi_del(&priv->xdp_napi);
priv->rx_notify_masked = false;
-   ptr_ring_cleanup(&priv->xdp_ring, __skb_array_destroy_skb);
+   ptr_ring_cleanup(&priv->xdp_ring, veth_ptr_free);
 }
 
 static int veth_enable_xdp(struct net_device *dev)
-- 
2.14.3



[PATCH v3 bpf-next 5/8] veth: Add ndo_xdp_xmit

2018-07-22 Thread Toshiaki Makita
From: Toshiaki Makita 

This allows NIC's XDP to redirect packets to veth. The destination veth
device enqueues redirected packets to the napi ring of its peer, then
they are processed by XDP on its peer veth device.
This can be thought of as one XDP program calling another via REDIRECT,
when the peer enables driver XDP.

Note that when the peer veth device does not set driver xdp, redirected
packets will be dropped because the peer is not ready for NAPI.

v2:
- Drop the part converting xdp_frame into skb when XDP is not enabled.
- Implement bulk interface of ndo_xdp_xmit.
- Implement XDP_XMIT_FLUSH bit and drop ndo_xdp_flush.

Signed-off-by: Toshiaki Makita 
---
 drivers/net/veth.c | 45 +
 1 file changed, 45 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 4be75c58bc6a..57187e955fea 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -125,6 +126,11 @@ static void *veth_ptr_to_xdp(void *ptr)
return (void *)((unsigned long)ptr & ~VETH_XDP_FLAG);
 }
 
+static void *veth_xdp_to_ptr(void *ptr)
+{
+   return (void *)((unsigned long)ptr | VETH_XDP_FLAG);
+}
+
 static void veth_ptr_free(void *ptr)
 {
if (veth_is_xdp_frame(ptr))
@@ -267,6 +273,44 @@ static struct sk_buff *veth_build_skb(void *head, int 
headroom, int len,
return skb;
 }
 
+static int veth_xdp_xmit(struct net_device *dev, int n,
+struct xdp_frame **frames, u32 flags)
+{
+   struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
+   struct net_device *rcv;
+   int i, drops = 0;
+
+   if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
+   return -EINVAL;
+
+   rcv = rcu_dereference(priv->peer);
+   if (unlikely(!rcv))
+   return -ENXIO;
+
+   rcv_priv = netdev_priv(rcv);
+   /* xdp_ring is initialized on receive side? */
+   if (!rcu_access_pointer(rcv_priv->xdp_prog))
+   return -ENXIO;
+
+   spin_lock(&rcv_priv->xdp_ring.producer_lock);
+   for (i = 0; i < n; i++) {
+   struct xdp_frame *frame = frames[i];
+   void *ptr = veth_xdp_to_ptr(frame);
+
+   if (unlikely(xdp_ok_fwd_dev(rcv, frame->len) ||
+__ptr_ring_produce(&rcv_priv->xdp_ring, ptr))) {
+   xdp_return_frame_rx_napi(frame);
+   drops++;
+   }
+   }
+   spin_unlock(&rcv_priv->xdp_ring.producer_lock);
+
+   if (flags & XDP_XMIT_FLUSH)
+   __veth_xdp_flush(rcv_priv);
+
+   return n - drops;
+}
+
 static struct sk_buff *veth_xdp_rcv_one(struct veth_priv *priv,
struct xdp_frame *frame)
 {
@@ -760,6 +804,7 @@ static const struct net_device_ops veth_netdev_ops = {
.ndo_features_check = passthru_features_check,
.ndo_set_rx_headroom= veth_set_rx_headroom,
.ndo_bpf= veth_xdp,
+   .ndo_xdp_xmit   = veth_xdp_xmit,
 };
 
 #define VETH_FEATURES (NETIF_F_SG | NETIF_F_FRAGLIST | NETIF_F_HW_CSUM | \
-- 
2.14.3



[PATCH v3 bpf-next 0/8] veth: Driver XDP

2018-07-22 Thread Toshiaki Makita
From: Toshiaki Makita 

This patch set introduces driver XDP for veth.
Basically this is used in conjunction with redirect action of another XDP
program.

  NIC ---> veth===veth
 (XDP) (redirect)(XDP)

In this case xdp_frame can be forwarded to the peer veth without
modification, so we can expect far better performance than generic XDP.

Envisioned use-cases


* Container managed XDP program
Container host redirects frames to containers by XDP redirect action, and
privileged containers can deploy their own XDP programs.

* XDP program cascading
Two or more XDP programs can be called for each packet by redirecting
xdp frames to veth.

* Internal interface for an XDP bridge
When using XDP redirection to create a virtual bridge, veth can be used
to create an internal interface for the bridge.

Implementation
--

This changeset is making use of NAPI to implement ndo_xdp_xmit and
XDP_TX/REDIRECT. This is mainly because XDP heavily relies on NAPI
context.
 - patch 1: Export a function needed for veth XDP.
 - patch 2-3: Basic implementation of veth XDP.
 - patch 4-5: Add ndo_xdp_xmit.
 - patch 6-7: Add XDP_TX and XDP_REDIRECT.
 - patch 8: Performance optimization for multi-queue env.

Tests and performance numbers
-

Tested with a simple XDP program which only redirects packets between
NIC and veth. I used an i40e 25G NIC (XXV710) as the physical NIC. The
server has 20 Xeon Silver 2.20GHz cores.

  pktgen --(wire)--> XXV710 (i40e) <--(XDP redirect)--> veth===veth (XDP)

The rightmost veth loads XDP progs and just does DROP or TX. The number
of packets is measured in the XDP progs. The leftmost pktgen sends
packets at 37.1 Mpps (almost 25G wire speed).

veth XDP action  Flows  Mpps
============================
DROP                 1  10.6
DROP                 2  21.2
DROP               100  36.0
TX                   1   5.0
TX                   2  10.0
TX                 100  31.0

I also measured netperf TCP_STREAM, but performance was not so great
due to the lack of tx/rx checksum offload, TSO, etc.

  netperf <--(wire)--> XXV710 (i40e) <--(XDP redirect)--> veth===veth (XDP PASS)

Direction        Flows  Gbps
============================
external->veth       1  20.8
external->veth       2  23.5
external->veth     100  23.6
veth->external       1   9.0
veth->external       2  17.8
veth->external     100  22.9

Also tested doing ifup/down or load/unload a XDP program repeatedly
during processing XDP packets in order to check if enabling/disabling
NAPI is working as expected, and found no problems.

Note:
This patchset depends on commit d8d7218ad842 ("xdp: XDP_REDIRECT should
check IFF_UP and MTU") which is currently in DaveM's net-next tree.

v3:
- Drop skb bulk xmit patch since it makes little performance
  difference. The hotspot in TCP skb xmit at this point is checksum
  computation in skb_segment and packet copy on XDP_REDIRECT due to
  cloned/nonlinear skb.
- Fix race on closing device.
- Add extack messages in ndo_bpf.

v2:
- Squash NAPI patch with "Add driver XDP" patch.
- Remove conversion from xdp_frame to skb when NAPI is not enabled.
- Introduce per-queue XDP ring (patch 8).
- Introduce bulk skb xmit when XDP is enabled on the peer (patch 9).

Signed-off-by: Toshiaki Makita 

Toshiaki Makita (8):
  net: Export skb_headers_offset_update
  veth: Add driver XDP
  veth: Avoid drops by oversized packets when XDP is enabled
  veth: Handle xdp_frames in xdp napi ring
  veth: Add ndo_xdp_xmit
  xdp: Add a flag for disabling napi_direct of xdp_return_frame in
xdp_mem_info
  veth: Add XDP TX and REDIRECT
  veth: Support per queue XDP ring

 drivers/net/veth.c | 735 -
 include/linux/skbuff.h |   1 +
 include/net/xdp.h  |   4 +
 net/core/skbuff.c  |   3 +-
 net/core/xdp.c |   6 +-
 5 files changed, 737 insertions(+), 12 deletions(-)

-- 
2.14.3



[PATCH v3 bpf-next 2/8] veth: Add driver XDP

2018-07-22 Thread Toshiaki Makita
From: Toshiaki Makita 

This is the basic implementation of veth driver XDP.

Incoming packets are sent from the peer veth device in the form of skb,
so this is generally doing the same thing as generic XDP.

This itself is not so useful, but a starting point to implement other
useful veth XDP features like TX and REDIRECT.

This introduces NAPI when XDP is enabled, because XDP now heavily
relies on NAPI context. Use ptr_ring to emulate the NIC ring. The Tx
function enqueues packets to the ring and the peer NAPI handler drains
the ring.

Currently only one ring is allocated for each veth device, so it does
not scale in a multiqueue env. This can be resolved by allocating rings
on a per-queue basis later.

Note that when XDP is not loaded, netif_rx is used instead of NAPI, so
the default behaviour does not change.

v3:
- Fix race on closing the device.
- Add extack messages in ndo_bpf.

v2:
- Squashed with the patch adding NAPI.
- Implement adjust_tail.
- Don't acquire consumer lock because it is guarded by NAPI.
- Make poll_controller noop since it is unnecessary.
- Register rxq_info on enabling XDP rather than on opening the device.

Signed-off-by: Toshiaki Makita 
---
 drivers/net/veth.c | 373 -
 1 file changed, 366 insertions(+), 7 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index a69ad39ee57e..78fa08cb6e24 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -19,10 +19,18 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 #define DRV_NAME   "veth"
 #define DRV_VERSION"1.0"
 
+#define VETH_RING_SIZE 256
+#define VETH_XDP_HEADROOM  (XDP_PACKET_HEADROOM + NET_IP_ALIGN)
+
 struct pcpu_vstats {
u64 packets;
u64 bytes;
@@ -30,9 +38,16 @@ struct pcpu_vstats {
 };
 
 struct veth_priv {
+   struct napi_struct  xdp_napi;
+   struct net_device   *dev;
+   struct bpf_prog __rcu   *xdp_prog;
+   struct bpf_prog *_xdp_prog;
struct net_device __rcu *peer;
atomic64_t  dropped;
unsignedrequested_headroom;
+   boolrx_notify_masked;
+   struct ptr_ring xdp_ring;
+   struct xdp_rxq_info xdp_rxq;
 };
 
 /*
@@ -98,11 +113,43 @@ static const struct ethtool_ops veth_ethtool_ops = {
.get_link_ksettings = veth_get_link_ksettings,
 };
 
-static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
+/* general routines */
+
+static void __veth_xdp_flush(struct veth_priv *priv)
+{
+   /* Write ptr_ring before reading rx_notify_masked */
+   smp_mb();
+   if (!priv->rx_notify_masked) {
+   priv->rx_notify_masked = true;
+   napi_schedule(&priv->xdp_napi);
+   }
+}
+
+static int veth_xdp_rx(struct veth_priv *priv, struct sk_buff *skb)
+{
+   if (unlikely(ptr_ring_produce(&priv->xdp_ring, skb))) {
+   dev_kfree_skb_any(skb);
+   return NET_RX_DROP;
+   }
+
+   return NET_RX_SUCCESS;
+}
+
+static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, bool 
xdp)
 {
struct veth_priv *priv = netdev_priv(dev);
+
+   return __dev_forward_skb(dev, skb) ?: xdp ?
+   veth_xdp_rx(priv, skb) :
+   netif_rx(skb);
+}
+
+static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+   struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
struct net_device *rcv;
int length = skb->len;
+   bool rcv_xdp = false;
 
rcu_read_lock();
rcv = rcu_dereference(priv->peer);
@@ -111,7 +158,10 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
net_device *dev)
goto drop;
}
 
-   if (likely(dev_forward_skb(rcv, skb) == NET_RX_SUCCESS)) {
+   rcv_priv = netdev_priv(rcv);
+   rcv_xdp = rcu_access_pointer(rcv_priv->xdp_prog);
+
+   if (likely(veth_forward_skb(rcv, skb, rcv_xdp) == NET_RX_SUCCESS)) {
struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
 
u64_stats_update_begin(&stats->syncp);
@@ -122,14 +172,15 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
net_device *dev)
 drop:
atomic64_inc(&priv->dropped);
}
+
+   if (rcv_xdp)
+   __veth_xdp_flush(rcv_priv);
+
rcu_read_unlock();
+
return NETDEV_TX_OK;
 }
 
-/*
- * general routines
- */
-
 static u64 veth_stats_one(struct pcpu_vstats *result, struct net_device *dev)
 {
struct veth_priv *priv = netdev_priv(dev);
@@ -179,18 +230,253 @@ static void veth_set_multicast_list(struct net_device 
*dev)
 {
 }
 
+static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
+ int buflen)
+{
+   struct sk_buff *skb;
+
+   if (!buflen) {
+   buflen = SKB_DATA_ALIGN(headroom + len) +
+   

[PATCH v3 bpf-next 3/8] veth: Avoid drops by oversized packets when XDP is enabled

2018-07-22 Thread Toshiaki Makita
From: Toshiaki Makita 

All oversized packets including GSO packets are dropped if XDP is
enabled on the receiver side, so don't send such packets from the peer.

Drop TSO and SCTP fragmentation features so that veth devices themselves
segment packets with XDP enabled. Also cap MTU accordingly.

Signed-off-by: Toshiaki Makita 
---
 drivers/net/veth.c | 41 +++--
 1 file changed, 39 insertions(+), 2 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 78fa08cb6e24..f5b72e937d9d 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -542,6 +542,23 @@ static int veth_get_iflink(const struct net_device *dev)
return iflink;
 }
 
+static netdev_features_t veth_fix_features(struct net_device *dev,
+  netdev_features_t features)
+{
+   struct veth_priv *priv = netdev_priv(dev);
+   struct net_device *peer;
+
+   peer = rtnl_dereference(priv->peer);
+   if (peer) {
+   struct veth_priv *peer_priv = netdev_priv(peer);
+
+   if (peer_priv->_xdp_prog)
+   features &= ~NETIF_F_GSO_SOFTWARE;
+   }
+
+   return features;
+}
+
 static void veth_set_rx_headroom(struct net_device *dev, int new_hr)
 {
struct veth_priv *peer_priv, *priv = netdev_priv(dev);
@@ -591,14 +608,33 @@ static int veth_xdp_set(struct net_device *dev, struct 
bpf_prog *prog,
goto err;
}
}
+
+   if (!old_prog) {
+   peer->hw_features &= ~NETIF_F_GSO_SOFTWARE;
+   peer->max_mtu = PAGE_SIZE - VETH_XDP_HEADROOM -
+   peer->hard_header_len -
+   SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+   if (peer->mtu > peer->max_mtu)
+   dev_set_mtu(peer, peer->max_mtu);
+   }
}
 
if (old_prog) {
-   if (!prog && dev->flags & IFF_UP)
-   veth_disable_xdp(dev);
+   if (!prog) {
+   if (dev->flags & IFF_UP)
+   veth_disable_xdp(dev);
+
+   if (peer) {
+   peer->hw_features |= NETIF_F_GSO_SOFTWARE;
+   peer->max_mtu = ETH_MAX_MTU;
+   }
+   }
bpf_prog_put(old_prog);
}
 
+   if ((!!old_prog ^ !!prog) && peer)
+   netdev_update_features(peer);
+
return 0;
 err:
priv->_xdp_prog = old_prog;
@@ -643,6 +679,7 @@ static const struct net_device_ops veth_netdev_ops = {
.ndo_poll_controller= veth_poll_controller,
 #endif
.ndo_get_iflink = veth_get_iflink,
+   .ndo_fix_features   = veth_fix_features,
.ndo_features_check = passthru_features_check,
.ndo_set_rx_headroom= veth_set_rx_headroom,
.ndo_bpf= veth_xdp,
-- 
2.14.3



[PATCH v3 bpf-next 6/8] xdp: Add a flag for disabling napi_direct of xdp_return_frame in xdp_mem_info

2018-07-22 Thread Toshiaki Makita
From: Toshiaki Makita 

We need some mechanism to disable napi_direct on calling
xdp_return_frame_rx_napi() from some context.
When veth gets support for XDP_REDIRECT, it will redirect packets which
are redirected from other devices. On redirection veth will reuse
xdp_mem_info of the redirection source device to make return_frame work.
But in this case .ndo_xdp_xmit() called from veth redirection uses
xdp_mem_info which is not guarded by NAPI, because the .ndo_xdp_xmit is
not called directly from the rxq which owns the xdp_mem_info.

This approach introduces a flag in xdp_mem_info to indicate that
napi_direct should be disabled even when _rx_napi variant is used.

Signed-off-by: Toshiaki Makita 
---
 include/net/xdp.h | 4 
 net/core/xdp.c| 6 --
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index fcb033f51d8c..1d1bc6553ff2 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -41,6 +41,9 @@ enum xdp_mem_type {
MEM_TYPE_MAX,
 };
 
+/* XDP flags for xdp_mem_info */
+#define XDP_MEM_RF_NO_DIRECT   BIT(0)  /* don't use napi_direct */
+
 /* XDP flags for ndo_xdp_xmit */
 #define XDP_XMIT_FLUSH (1U << 0)   /* doorbell signal consumer */
#define XDP_XMIT_FLAGS_MASK    XDP_XMIT_FLUSH
@@ -48,6 +51,7 @@ enum xdp_mem_type {
 struct xdp_mem_info {
u32 type; /* enum xdp_mem_type, but known size type */
u32 id;
+   u32 flags;
 };
 
 struct page_pool;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 57285383ed00..1426c608fd75 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -330,10 +330,12 @@ static void __xdp_return(void *data, struct xdp_mem_info 
*mem, bool napi_direct,
/* mem->id is valid, checked in xdp_rxq_info_reg_mem_model() */
xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
page = virt_to_head_page(data);
-   if (xa)
+   if (xa) {
+   napi_direct &= !(mem->flags & XDP_MEM_RF_NO_DIRECT);
page_pool_put_page(xa->page_pool, page, napi_direct);
-   else
+   } else {
put_page(page);
+   }
rcu_read_unlock();
break;
case MEM_TYPE_PAGE_SHARED:
-- 
2.14.3



[PATCH v3 bpf-next 1/8] net: Export skb_headers_offset_update

2018-07-22 Thread Toshiaki Makita
From: Toshiaki Makita 

This is needed for veth XDP which does skb_copy_expand()-like operation.

v2:
- Drop skb_copy_header part because it has already been exported now.

Signed-off-by: Toshiaki Makita 
---
 include/linux/skbuff.h | 1 +
 net/core/skbuff.c  | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index fd3cb1b247df..f6929688853a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1035,6 +1035,7 @@ static inline struct sk_buff *alloc_skb_fclone(unsigned 
int size,
 }
 
 struct sk_buff *skb_morph(struct sk_buff *dst, struct sk_buff *src);
+void skb_headers_offset_update(struct sk_buff *skb, int off);
 int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask);
 struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t priority);
 void skb_copy_header(struct sk_buff *new, const struct sk_buff *old);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 0c1a00672ba9..5366d1660e5b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1291,7 +1291,7 @@ struct sk_buff *skb_clone(struct sk_buff *skb, gfp_t 
gfp_mask)
 }
 EXPORT_SYMBOL(skb_clone);
 
-static void skb_headers_offset_update(struct sk_buff *skb, int off)
+void skb_headers_offset_update(struct sk_buff *skb, int off)
 {
/* Only adjust this if it actually is csum_start rather than csum */
if (skb->ip_summed == CHECKSUM_PARTIAL)
@@ -1305,6 +1305,7 @@ static void skb_headers_offset_update(struct sk_buff 
*skb, int off)
skb->inner_network_header += off;
skb->inner_mac_header += off;
 }
+EXPORT_SYMBOL(skb_headers_offset_update);
 
 void skb_copy_header(struct sk_buff *new, const struct sk_buff *old)
 {
-- 
2.14.3



[PATCH v3 bpf-next 8/8] veth: Support per queue XDP ring

2018-07-22 Thread Toshiaki Makita
From: Toshiaki Makita 

Move XDP and napi related fields in veth_priv to newly created veth_rq
structure.

When xdp_frames are enqueued from ndo_xdp_xmit and XDP_TX, rxq is
selected by current cpu.

When skbs are enqueued from the peer device, rxq is one to one mapping
of its peer txq. This way we have a restriction that the number of rxqs
must not be less than the number of peer txqs, but leave the possibility to
achieve bulk skb xmit in the future because txq lock would make it
possible to remove rxq ptr_ring lock.

v3:
- Add extack messages.
- Fix array overrun in veth_xmit.

Signed-off-by: Toshiaki Makita 
---
 drivers/net/veth.c | 278 -
 1 file changed, 188 insertions(+), 90 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 0323a4ca74e2..84482d9901ec 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -42,20 +42,24 @@ struct pcpu_vstats {
struct u64_stats_sync   syncp;
 };
 
-struct veth_priv {
+struct veth_rq {
struct napi_struct  xdp_napi;
struct net_device   *dev;
struct bpf_prog __rcu   *xdp_prog;
-   struct bpf_prog *_xdp_prog;
-   struct net_device __rcu *peer;
-   atomic64_t  dropped;
struct xdp_mem_info xdp_mem;
-   unsignedrequested_headroom;
boolrx_notify_masked;
struct ptr_ring xdp_ring;
struct xdp_rxq_info xdp_rxq;
 };
 
+struct veth_priv {
+   struct net_device __rcu *peer;
+   atomic64_t  dropped;
+   struct bpf_prog *_xdp_prog;
+   struct veth_rq  *rq;
+   unsigned intrequested_headroom;
+};
+
 /*
  * ethtool interface
  */
@@ -144,19 +148,19 @@ static void veth_ptr_free(void *ptr)
kfree_skb(ptr);
 }
 
-static void __veth_xdp_flush(struct veth_priv *priv)
+static void __veth_xdp_flush(struct veth_rq *rq)
 {
/* Write ptr_ring before reading rx_notify_masked */
smp_mb();
-   if (!priv->rx_notify_masked) {
-   priv->rx_notify_masked = true;
-   napi_schedule(&priv->xdp_napi);
+   if (!rq->rx_notify_masked) {
+   rq->rx_notify_masked = true;
+   napi_schedule(&rq->xdp_napi);
}
 }
 
-static int veth_xdp_rx(struct veth_priv *priv, struct sk_buff *skb)
+static int veth_xdp_rx(struct veth_rq *rq, struct sk_buff *skb)
 {
-   if (unlikely(ptr_ring_produce(&priv->xdp_ring, skb))) {
+   if (unlikely(ptr_ring_produce(&rq->xdp_ring, skb))) {
dev_kfree_skb_any(skb);
return NET_RX_DROP;
}
@@ -164,21 +168,22 @@ static int veth_xdp_rx(struct veth_priv *priv, struct 
sk_buff *skb)
return NET_RX_SUCCESS;
 }
 
-static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb, bool 
xdp)
+static int veth_forward_skb(struct net_device *dev, struct sk_buff *skb,
+   struct veth_rq *rq, bool xdp)
 {
-   struct veth_priv *priv = netdev_priv(dev);
-
return __dev_forward_skb(dev, skb) ?: xdp ?
-   veth_xdp_rx(priv, skb) :
+   veth_xdp_rx(rq, skb) :
netif_rx(skb);
 }
 
 static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
 {
struct veth_priv *rcv_priv, *priv = netdev_priv(dev);
+   struct veth_rq *rq = NULL;
struct net_device *rcv;
int length = skb->len;
bool rcv_xdp = false;
+   int rxq;
 
rcu_read_lock();
rcv = rcu_dereference(priv->peer);
@@ -188,9 +193,15 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
net_device *dev)
}
 
rcv_priv = netdev_priv(rcv);
-   rcv_xdp = rcu_access_pointer(rcv_priv->xdp_prog);
+   rxq = skb_get_queue_mapping(skb);
+   if (rxq < rcv->real_num_rx_queues) {
+   rq = &rcv_priv->rq[rxq];
+   rcv_xdp = rcu_access_pointer(rq->xdp_prog);
+   if (rcv_xdp)
+   skb_record_rx_queue(skb, rxq);
+   }
 
-   if (likely(veth_forward_skb(rcv, skb, rcv_xdp) == NET_RX_SUCCESS)) {
+   if (likely(veth_forward_skb(rcv, skb, rq, rcv_xdp) == NET_RX_SUCCESS)) {
struct pcpu_vstats *stats = this_cpu_ptr(dev->vstats);
 
u64_stats_update_begin(>syncp);
@@ -203,7 +214,7 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct 
net_device *dev)
}
 
if (rcv_xdp)
-   __veth_xdp_flush(rcv_priv);
+   __veth_xdp_flush(rq);
 
rcu_read_unlock();
 
@@ -278,11 +289,17 @@ static struct sk_buff *veth_build_skb(void *head, int 
headroom, int len,
return skb;
 }
 
+static int veth_select_rxq(struct net_device *dev)
+{
+   return smp_processor_id() % dev->real_num_rx_queues;
+}
+
 static int veth_xdp_xmit(struct net_device *dev, int n,
 struct xdp_frame **frames, u32 flags)
 {
struct veth_priv *rcv_priv, *priv = 

Re: [PATCH net-next 4/4] act_mirred: use ACT_MIRRED when possible

2018-07-22 Thread Paolo Abeni
On Sat, 2018-07-21 at 16:29 -0700, David Miller wrote:
> From: Paolo Abeni 
> Date: Thu, 19 Jul 2018 15:02:29 +0200
> 
> > kernel openswitch datapath.
>  ^^
> 
> "openvswitch"

Oops, typo!

Thank you for the feedback! I do hope this is your major concern ;)

I will address it v3.

Thanks,

Paolo


Security enhancement proposal for kernel TLS

2018-07-22 Thread Vakul Garg
Hi

The kernel-based TLS record layer allows user space to use a decoupled
TLS implementation. Applications need not be linked with a TLS stack;
the TLS handshake can be done by a TLS daemon on behalf of applications.

Presently, as soon as the handshake process derives keys, it pushes the
negotiated keys to kernel TLS. Thereafter the applications can directly
read and write data on their TCP socket (without having to use SSL APIs).

With the current kernel TLS implementation, there is a security problem.
Since the kernel TLS socket has no information about the state of the
handshake, it lets applications receive data from the peer TLS endpoint
even when handshake verification has not yet been completed by the SSL
daemon. Allowing applications to receive data before verification of the
handshake transcript completes (i.e. before the TLS FINISHED message is
processed) is a security problem.

My proposal (a minimal sketch follows the list):
- Kernel TLS should maintain the state of the handshake (verified or
unverified). In the unverified state, data records should not be allowed
to pass through to the applications.

- Add a new control interface through which the user space SSL stack
can tell the TLS socket that the handshake has been verified and DATA
records can flow. In the 'unverified' state, only control records should
be allowed to pass, and reception of a DATA record should pause
receive-side record decryption.

- The handshake state should fall back to 'unverified' in case a control
record is seen again by kernel TLS (e.g. in case of renegotiation,
post-handshake client auth, etc.).
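
A minimal sketch of the proposed gate (every identifier below except
TLS_RECORD_TYPE_DATA and struct tls_sw_context_rx is hypothetical,
invented for illustration; this is not an existing kernel TLS API):

enum tls_hs_state {
	TLS_HS_UNVERIFIED,	/* only control records may pass */
	TLS_HS_VERIFIED,	/* data records may flow */
};

/* Called on the receive path before decrypting a record. hs_state would
 * be set to TLS_HS_VERIFIED by a new control interface (e.g. a
 * setsockopt) once the SSL daemon has verified the transcript. */
static int tls_rx_gate_record(struct tls_sw_context_rx *ctx, u8 record_type)
{
	if (record_type != TLS_RECORD_TYPE_DATA) {
		/* control record: fall back to unverified (renegotiation,
		 * post-handshake client auth, ...) */
		ctx->hs_state = TLS_HS_UNVERIFIED;
		return 0;
	}
	if (ctx->hs_state != TLS_HS_VERIFIED)
		return -EAGAIN;	/* pause decryption until verified */
	return 0;
}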

Kindly comment.

Regards

Vakul


[PATCH iproute2-next] ip: Add violation counters to VF statistics

2018-07-22 Thread Eran Ben Elisha
Extend VF statistics with receive and transmit violation counters.

Example: "ip -s link show dev enp5s0f0"

6: enp5s0f0:  mtu 1500 qdisc mq state UP mode 
DEFAULT group default qlen 1000
link/ether 24:8a:07:a5:28:f0 brd ff:ff:ff:ff:ff:ff
RX: bytes  packets  errors  dropped overrun mcast
0          0        0       0       0       2
TX: bytes  packets  errors  dropped carrier collsns
1406   17   0   0   0   0
vf 0 MAC 00:00:ca:fe:ca:fe, vlan 5, spoof checking off, link-state auto, 
trust off, query_rss off
RX: bytes  packets  mcast   bcast   dropped
1666   29   14 32  0
TX: bytes  packets   dropped
2880   44   2412

Signed-off-by: Eran Ben Elisha 
---
 ip/ipaddress.c | 20 ++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 5009bfe6d2e3..ddb8bf876e10 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -562,6 +562,9 @@ static void print_vf_stats64(FILE *fp, struct rtattr 
*vfstats)
   rta_getattr_u64(vf[IFLA_VF_STATS_MULTICAST]));
print_u64(PRINT_JSON, "broadcast", NULL,
   rta_getattr_u64(vf[IFLA_VF_STATS_BROADCAST]));
+   if (vf[IFLA_VF_STATS_RX_DROPPED])
+   print_u64(PRINT_JSON, "dropped", NULL,
+ 
rta_getattr_u64(vf[IFLA_VF_STATS_RX_DROPPED]));
close_json_object();
 
/* TX stats */
@@ -570,26 +573,39 @@ static void print_vf_stats64(FILE *fp, struct rtattr 
*vfstats)
   rta_getattr_u64(vf[IFLA_VF_STATS_TX_BYTES]));
print_u64(PRINT_JSON, "tx_packets", NULL,
   rta_getattr_u64(vf[IFLA_VF_STATS_TX_PACKETS]));
+   if (vf[IFLA_VF_STATS_TX_DROPPED])
+   print_u64(PRINT_JSON, "dropped", NULL,
+ 
rta_getattr_u64(vf[IFLA_VF_STATS_TX_DROPPED]));
close_json_object();
close_json_object();
} else {
/* RX stats */
fprintf(fp, "%s", _SL_);
-   fprintf(fp, "RX: bytes  packets  mcast   bcast %s", _SL_);
+   fprintf(fp, "RX: bytes  packets  mcast   bcast ");
+   if (vf[IFLA_VF_STATS_RX_DROPPED])
+   fprintf(fp, "  dropped ");
+   fprintf(fp, "%s", _SL_);
fprintf(fp, "");
 
print_num(fp, 10, rta_getattr_u64(vf[IFLA_VF_STATS_RX_BYTES]));
print_num(fp, 8, rta_getattr_u64(vf[IFLA_VF_STATS_RX_PACKETS]));
print_num(fp, 7, rta_getattr_u64(vf[IFLA_VF_STATS_MULTICAST]));
print_num(fp, 7, rta_getattr_u64(vf[IFLA_VF_STATS_BROADCAST]));
+   if (vf[IFLA_VF_STATS_RX_DROPPED])
+   print_num(fp, 8, 
rta_getattr_u64(vf[IFLA_VF_STATS_RX_DROPPED]));
 
/* TX stats */
fprintf(fp, "%s", _SL_);
-   fprintf(fp, "TX: bytes  packets %s", _SL_);
+   fprintf(fp, "TX: bytes  packets ");
+   if (vf[IFLA_VF_STATS_TX_DROPPED])
+   fprintf(fp, "  dropped ");
+   fprintf(fp, "%s", _SL_);
fprintf(fp, "");
 
print_num(fp, 10, rta_getattr_u64(vf[IFLA_VF_STATS_TX_BYTES]));
print_num(fp, 8, rta_getattr_u64(vf[IFLA_VF_STATS_TX_PACKETS]));
+   if (vf[IFLA_VF_STATS_TX_DROPPED])
+   print_num(fp, 8, 
rta_getattr_u64(vf[IFLA_VF_STATS_TX_DROPPED]));
}
 }
 
-- 
1.8.3.1



Re: [PATCH 4/4] net: dsa: Add Lantiq / Intel DSA driver for vrx200

2018-07-22 Thread Hauke Mehrtens
On 07/22/2018 05:17 AM, David Miller wrote:
> From: Hauke Mehrtens 
> Date: Sat, 21 Jul 2018 21:13:58 +0200
> 
>> +// start the table access:
> 
> Please stick to C-style comments, except perhaps in the SPDX
> identifiers.
> 
> Thank you.
> 

Sorry that I missed that, it is fixed now.

Hauke


[PATCH net-next] bonding: don't cast const buf in sysfs store

2018-07-22 Thread Nikolay Aleksandrov
As was recently discussed [1], let's avoid casting the const buf in
bonding_sysfs_store_option and use kstrndup/kfree instead.

[1] http://lists.openwall.net/netdev/2018/07/22/25

Signed-off-by: Nikolay Aleksandrov 
---
 drivers/net/bonding/bond_sysfs.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/bonding/bond_sysfs.c b/drivers/net/bonding/bond_sysfs.c
index 6096440e96ea..35847250da5a 100644
--- a/drivers/net/bonding/bond_sysfs.c
+++ b/drivers/net/bonding/bond_sysfs.c
@@ -160,14 +160,19 @@ static ssize_t bonding_sysfs_store_option(struct device 
*d,
 {
struct bonding *bond = to_bond(d);
const struct bond_option *opt;
+   char *buffer_clone;
int ret;
 
opt = bond_opt_get_by_name(attr->attr.name);
if (WARN_ON(!opt))
return -ENOENT;
-   ret = bond_opt_tryset_rtnl(bond, opt->id, (char *)buffer);
+   buffer_clone = kstrndup(buffer, count, GFP_KERNEL);
+   if (!buffer_clone)
+   return -ENOMEM;
+   ret = bond_opt_tryset_rtnl(bond, opt->id, buffer_clone);
if (!ret)
ret = count;
+   kfree(buffer_clone);
 
return ret;
 }
-- 
2.11.0



Re: [PATCH net] net: rollback orig value on failure of dev_qdisc_change_tx_queue_len

2018-07-22 Thread Tariq Toukan




On 19/07/2018 8:25 PM, Cong Wang wrote:

On Thu, Jul 19, 2018 at 7:34 AM Tariq Toukan  wrote:


Fix dev_change_tx_queue_len so it rolls back original value
upon a failure in dev_qdisc_change_tx_queue_len.
This is already done for notifiers' failures; share the code.

The revert of changes in dev_qdisc_change_tx_queue_len
in such case is needed but missing (marked as TODO), yet it is
still better to not apply the new requested value.


You misunderstand the TODO, that is for reverting tx queue len
change for previous queues in the loop. I still don't have any
nice solution for this yet.


I understood this, but maybe didn't describe it clearly. I'll re-phrase.


Yeah, your change itself looks good.



Thanks.


Please update the changelog.

Thanks.



Re: [PATCH net-next] net: remove redundant input checks in SIOCSIFTXQLEN case of dev_ifsioc

2018-07-22 Thread Tariq Toukan




On 19/07/2018 8:21 PM, Cong Wang wrote:

On Thu, Jul 19, 2018 at 7:50 AM Tariq Toukan  wrote:

--- a/net/core/dev_ioctl.c
+++ b/net/core/dev_ioctl.c
@@ -282,14 +282,7 @@ static int dev_ifsioc(struct net *net, struct ifreq *ifr, 
unsigned int cmd)
 return dev_mc_del_global(dev, ifr->ifr_hwaddr.sa_data);

 case SIOCSIFTXQLEN:
-   if (ifr->ifr_qlen < 0)
-   return -EINVAL;


Are you sure we can remove this if check too?

The other one is safe to remove.



Hmm, let's see:
dev_change_tx_queue_len gets an unsigned long new_len; any negative value
passed is interpreted as a very large number. Then we test:

if (new_len != (unsigned int)new_len)

This test catches such a value only if the range of unsigned long is
larger than the range of unsigned int. AFAIK these widths are
arch-dependent and there is no guarantee this holds (a small example
below).


Right?
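
A tiny user-space analogue of the concern (the check rejects a wrapped
negative only when unsigned long is wider than unsigned int, e.g. LP64;
on a 32-bit target both are 32 bits and the test never fires):

#include <stdio.h>

int main(void)
{
	unsigned long new_len = (unsigned long)-1;	/* ifr_qlen = -1 */

	/* LP64: 0xffffffffffffffff != 0xffffffff -> rejected.
	 * ILP32: both sides are 0xffffffff -> the wrapped negative
	 * sails through, which is the worry raised above. */
	if (new_len != (unsigned int)new_len)
		printf("rejected: long is wider than int here\n");
	else
		printf("accepted: long and int have the same width here\n");
	return 0;
}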


Query about tls patch

2018-07-22 Thread Vakul Garg
Hi 

I have a query about patch https://patchwork.ozlabs.org/patch/943442/ (already
merged).

[PATCH]: tls: Fix zerocopy_from_iter iov handling

In tls_sw_sendmsg(), if zerocopy_from_iter() fails, we go to
fallback_to_reg_send. Here we first call iov_iter_revert(), but the
iov_iter_advance() didn't happen in the first place.

Similarly, in zerocopy_from_iter(), if 'revert' is passed as 'true', 
iov_iter_revert() is called even if iov_iter_advance() never happened (case of 
error return).

In the tls_sw_recvmsg() flow, iov_iter_revert() is always triggered.
With the revert happening, I do not see how multiple records can be
decrypted into the user space buffer with one invocation of recvmsg().
It seems that successively dequeued TLS records would get decrypted at
the same location in the output buffer and keep overwriting the
previous one.
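
Schematically, the send-path imbalance I am asking about looks like
this (just the shape of the flow, not the actual tls_sw code):

	ret = zerocopy_from_iter(sk, &msg->msg_iter, try_to_copy, ...);
	if (ret) {
		/* zerocopy_from_iter() can fail before any
		 * iov_iter_advance() has happened, yet the fallback
		 * path unconditionally rewinds the iterator: */
		iov_iter_revert(&msg->msg_iter, try_to_copy);
		goto fallback_to_reg_send;
	}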

Regards

Vakul


Re: [PATCH net] multicast: do not restore deleted record source filter mode to new one

2018-07-22 Thread David Miller
From: Hangbin Liu 
Date: Fri, 20 Jul 2018 14:04:27 +0800

> There are two scenarios in which we will restore deleted records. The first
> is when the device goes down and up (or unmap/remap). In this scenario the
> new filter mode is the same as the previous one, because we get it from
> in_dev->mc_list and we do not touch it during device down and up.
> 
> The other scenario is when a new socket joins a group which was just deleted
> and has not finished sending status reports. In this scenario, we should use
> the current filter mode instead of restoring the old one. Here are 4 cases
> in total.
> 
> old_socket  new_socket  before_fix  after_fix
>   IN(A)       IN(A)     ALLOW(A)    ALLOW(A)
>   IN(A)       EX( )     TO_IN( )    TO_EX( )
>   EX( )       IN(A)     TO_EX( )    ALLOW(A)
>   EX( )       EX( )     TO_EX( )    TO_EX( )
> 
> Fixes: 24803f38a5c0b (igmp: do not remove igmp souce list info when set link 
> down)
> Fixes: 1666d49e1d416 (mld: do not remove mld souce list info when set link 
> down)
> Signed-off-by: Hangbin Liu 

Applied and queued up for -stable, thank you.


Re: [linux-sunxi] [PATCH 4/5] arm64: allwinner: h6: add EMAC device nodes

2018-07-22 Thread Corentin Labbe
On Sun, Jul 22, 2018 at 01:39:54PM +0800, Icenowy Zheng wrote:
> Allwinner H6 SoC has an EMAC like the one in A64.
> 
> Add device tree nodes for the H6 DTSI file.
> 
> Signed-off-by: Icenowy Zheng 
> ---
>  arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 30 
>  1 file changed, 30 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi 
> b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
> index 3ab6cf0256ca..c65311de301a 100644
> --- a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
> +++ b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
> @@ -149,6 +149,14 @@
>   interrupt-controller;
>   #interrupt-cells = <3>;
>  
> + ext_rgmii_pins: rgmii_pins {
> + pins = "PD0", "PD1", "PD2", "PD3", "PD4",
> +"PD5", "PD7", "PD8", "PD9", "PD10",
> +"PD11", "PD12", "PD13", "PD19", "PD20";
> + function = "emac";
> + drive-strength = <40>;
> + };
> +
>   mmc0_pins: mmc0-pins {
>   pins = "PF0", "PF1", "PF2", "PF3",
>  "PF4", "PF5";
> @@ -258,6 +266,28 @@
>   status = "disabled";
>   };
>  
> + emac: ethernet@502 {
> + compatible = "allwinner,sun50i-a64-emac",
> +  "allwinner,sun50i-h6-emac";
> + syscon = <&syscon>;
> + reg = <0x0502 0x1>;
> + interrupts = ;
> + interrupt-names = "macirq";
> + resets = <&ccu RST_BUS_EMAC>;
> + reset-names = "stmmaceth";
> + clocks = <&ccu CLK_BUS_EMAC>;
> + clock-names = "stmmaceth";
> + status = "disabled";
> + #address-cells = <1>;
> + #size-cells = <0>;

#address-cells and #size-cells are unnecessary in the emac node.

Regards