Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-03 Thread Jiri Pirko
Sun, Mar 04, 2018 at 01:26:53AM CET, alexander.du...@gmail.com wrote:
>On Sat, Mar 3, 2018 at 1:25 PM, Jiri Pirko  wrote:
>> Sat, Mar 03, 2018 at 07:04:57PM CET, alexander.du...@gmail.com wrote:
>>>On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko  wrote:
>>>> Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote:
>>>>>On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:
>>>>>> >Yeah, this code essentially calls out the "shareable" code with a
>>>>>> >comment at the start and end of the section what defines the
>>>>>> >virtio_bypass functionality. It would just be a matter of mostly
>>>>>> >cutting and pasting to put it into a separate driver module.
>>>>>>
>>>>>> Please put it there and unite the use of it with netvsc.
>>>>>
>>>>>Surely, adding this to other drivers (e.g. might this be handy for xen
>>>>>too?) can be left for a separate patchset. Let's get one device merged
>>>>>first.
>>>>
>>>> Why? Let's do the generic infra alongside with the driver. I see no good
>>>> reason to rush into merging driver and only later, if ever, to convert
>>>> it to generic solution. On the contrary. That would lead into multiple
>>>> approaches and different behaviours in multiple drivers. That is plain
>>>> wrong.
>>>
>>>If nothing else it doesn't hurt to do this in one driver in a generic
>>>way, and once it has been proven to address all the needs of that one
>>>driver we can then start moving other drivers to it. The current
>>>solution is quite generic, that was my contribution to this patch set
>>>as I didn't like how invasive it was being to virtio and thought it
>>>would be best to keep this as minimally invasive as possible.
>>>
>>>My preference would be to give this a release or two in virtio to
>>>mature before we start pushing it onto other drivers. It shouldn't
>>>take much to cut/paste this into a new driver file once we decide it
>>>is time to start extending it out to other drivers.
>>
>> I'm not talking about cut/paste and in fact that is what I'm worried
>> about. I'm talking about common code in net/core/ or somewhere that
>> would take care of this in-driver bonding. Each driver, like virtio_net,
>> netvsc would just register some ops to it and the core would do all
>> logic. I believe it is essential to take this approach from the start.
>
>Sorry, I didn't mean cut/paste into another driver, I meant to make it
>a driver of its own. My thought was to eventually create a shared/core
>driver module that is then used by the other drivers.
>
>My concern right now is that Stephen has indicated he doesn't want
>this approach taken with netvsc, and most of the community doesn't

IIUC, he only does not like the extra netdev. Is there anything else?


>want the netvsc approach applied to virtio. Until that impasse can be
>resolved there isn't much value in trying to split this up so it is
>available to other drivers. In addition I would imagine it would make
>it a pain for others to back-port into distros since it would break
>legacy netvsc driver behavior. Patches are always welcome. Once this
>is in you are free to try fighting to get this made into a generic
>module and applied to both drivers, but we have already spent close to
>3 months on this and it seems like there has been significantly more

Alex, time is never a good argument for poor design and shortcuts.


>time spent arguing over the number of interfaces and/or drivers than
>spent writing/reviewing actual code.
>
>- Alex


RE: [PATCH net-next 1/5] net: mvpp2: use the same buffer pool for all ports

2018-03-03 Thread Stefan Chulski
> Hello,
> 
> On Fri,  2 Mar 2018 16:40:40 +0100, Antoine Tenart wrote:
> > +static struct {
> > +   int pkt_size;
> > +   int buf_num;
> > +} mvpp2_pools[MVPP2_BM_POOLS_NUM];
> 
> Any reason for not doing:
> 
> } mvpp2_pools[MVPP2_BM_POOLS_NUM] = {
>   [MVPP2_BM_SHORT] = {
>   .pkt_size = MVPP2_BM_SHORT_PKT_SIZE,
>   .buf_num = MVPP2_BM_SHORT_BUF_NUM
>   },
>   [MVPP2_BM_LONG] = {
>   .pkt_size = MVPP2_BM_LONG_PKT_SIZE,
>   .buf_num = MVPP2_BM_LONG_BUF_NUM,
>   },
> };
> 
> And get rid of:
> 
> > +static void mvpp2_setup_bm_pool(void)
> > +{
> > +   /* Short pool */
> > +   mvpp2_pools[MVPP2_BM_SHORT].buf_num  = MVPP2_BM_SHORT_BUF_NUM;
> > +   mvpp2_pools[MVPP2_BM_SHORT].pkt_size = MVPP2_BM_SHORT_PKT_SIZE;
> > +
> > +   /* Long pool */
> > +   mvpp2_pools[MVPP2_BM_LONG].buf_num  = MVPP2_BM_LONG_BUF_NUM;
> > +   mvpp2_pools[MVPP2_BM_LONG].pkt_size = MVPP2_BM_LONG_PKT_SIZE;
> > +}
> 
>  ?
> 

No, we can change it.

Stefan.


RE: [PATCH net-next 5/5] net: mvpp2: jumbo frames support

2018-03-03 Thread Stefan Chulski
> > netdev_err(port->dev, "Invalid pool %d\n", pool);
> > return NULL;
> > }
> > @@ -4596,11 +4604,24 @@ mvpp2_bm_pool_use(struct mvpp2_port *port, int pool, int pkt_size)
> >  static int mvpp2_swf_bm_pool_init(struct mvpp2_port *port)
> >  {
> > int rxq;
> > +   enum mvpp2_bm_pool_log_num long_log_pool, short_log_pool;
> > +
> > +   /* If port pkt_size is higher than 1518B:
> > +* HW Long pool - SW Jumbo pool, HW Short pool - SW Short pool
> 
> The comment is wrong. In this case, the HW short pool is the SW long pool.

You are right. The comment is wrong.

> > +   if (port->pool_long->id == MVPP2_BM_JUMBO && port->id != 0) {
> 
> Again, all over the place we hardcode the fact that Jumbo frames can only be
> used on port 0. I know port 0 is the only one that can do 10G, but are there
> possibly some use cases where you may want Jumbo frame on another port
> ?
> 
> This all really feels very hardcoded to me.
> 

All ports support Jumbo frames.
But only port 0 can do TX HW checksum offload with them (due to the TX FIFO size).

Packet processor 2.2 has only a 19KB TX FIFO.
So the TX FIFO config code assigns Port 0 - 10KB, Port 1 - 3KB and Port 2 - 3KB.

To perform the checksum in HW, the HW obviously has to work in store and forward
mode: store the whole frame in the TX FIFO and then calculate the checksum.
If the MTU is 1500B, everything is fine and all ports can do this.

If the MTU is 9KB and a 9KB frame is transmitted, Port 0 can still do the HW
checksum, but ports 1 and 2 don't have enough FIFO for this.
So we cannot offload this feature on those ports and SW should perform the checksum.
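
The store-and-forward constraint described above can be sketched in C. This is only an illustrative model, not the mvpp2 driver code: the `ex_` helpers and macros are hypothetical names, the FIFO sizes are the ones quoted in this thread, and the check ignores any per-frame overhead the real hardware may need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes quoted in this thread; the names are hypothetical. */
#define EX_TX_FIFO_TOTAL_BYTES  (19 * 1024)  /* whole PPv2.2 TX FIFO */
#define EX_TX_FIFO_PORT0_BYTES  (10 * 1024)  /* 10G port */
#define EX_TX_FIFO_OTHER_BYTES  (3 * 1024)   /* ports 1 and 2 */

/* TX FIFO bytes assigned to a port in this simplified model. */
static size_t ex_tx_fifo_bytes(unsigned int port_id)
{
	return port_id == 0 ? EX_TX_FIFO_PORT0_BYTES : EX_TX_FIFO_OTHER_BYTES;
}

/*
 * In store-and-forward mode the whole frame has to sit in the TX FIFO
 * before the checksum can be filled in, so offload is only possible
 * when the frame fits into the FIFO share of the port.
 */
static bool ex_can_offload_tx_csum(unsigned int port_id, size_t frame_bytes)
{
	return frame_bytes <= ex_tx_fifo_bytes(port_id);
}
```

With these numbers a 9KB frame fits only on port 0, while a 1518B frame fits on every port, which matches the behaviour described above.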

> > +   /* 9704 == 9728 - 20 and rounding to 8 */
> > +   dev->max_mtu = MVPP2_BM_JUMBO_PKT_SIZE;
> 
> Is this correct for all ports ? Shouldn't the maximum MTU be different
> between port 0 (that supports Jumbo frames) and the other ports ?

This is correct for all ports. All ports can support Jumbo frames.

Stefan.


RE: [PATCH net-next 3/5] net: mvpp2: use a data size of 10kB for Tx FIFO on port 0

2018-03-03 Thread Stefan Chulski

> On Fri,  2 Mar 2018 16:40:42 +0100, Antoine Tenart wrote:
> 
> > -/* Initialize Tx FIFO's */
> > +/* Initialize Tx FIFO's
> > + * The CP110's total tx-fifo size is 19kB.
> > + * Use large-size 10kB for fast port but 3kB for others.
> > + */
> 
> Is there a reason to hardcode 10KB for port 0, and 3KB for the other ports ?
> Would there be use cases where the user may want different configurations
> ?
> 

The design requirement is a 10KB TX FIFO for 10Gb/sec and 2.5KB for 2.5Gb/sec.
Only port 0 supports 10Gb/sec, and ports 1 and 2 support up to 2.5Gb/sec,
so I don't see any reason to change this configuration.
Also, the TX FIFO size can be set only during probe.

> It's just that it feels very "hardcoded" to enforce specifically those 
> numbers.
> 
> Also, does it make sense to mention the CP110 here ? Is this 19 KB limitation
> a limit of the PPv2.2 IP, or of the CP110 ?

The PPv2.2 IP is part of the CP110 communication processor.
The next communication processor will have a different packet processor or the
next generation of PPv2.x.
The limit is the PPv2.2 TX FIFO.

Stefan.


Re: WARNING: refcount bug in should_fail

2018-03-03 Thread Tetsuo Handa
Switching from mm to fsdevel, for this report says that put_net(net) in
rpc_kill_sb() made net->count < 0 when mount_ns() failed due to
register_shrinker() failure.

Relevant commits will be
commit 9ee332d99e4d5a97 ("sget(): handle failures of register_shrinker()") and
commit d91ee87d8d85a080 ("vfs: Pass data, ns, and ns->userns to mount_ns.").

When sget_userns() in mount_ns() fails, mount_ns() returns an error code to
the caller without calling fill_super(). That is, get_net(sb->s_fs_info) was
not called by rpc_fill_super() (via fill_super callback passed to mount_ns())
but put_net(sb->s_fs_info) is called by rpc_kill_sb() (via fs->kill_sb() from
deactivate_locked_super()).

--
static struct dentry *
rpc_mount(struct file_system_type *fs_type,
int flags, const char *dev_name, void *data)
{
struct net *net = current->nsproxy->net_ns;
return mount_ns(fs_type, flags, data, net, net->user_ns, 
rpc_fill_super);
}
--
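
The imbalance described above can be modeled in plain user-space C. This is only a sketch: every `ex_` name is hypothetical, the counter is a plain int rather than a refcount_t, and the "guarded" variant is just one possible way to restore the pairing, not the fix that was applied upstream.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for struct net and struct super_block. */
struct ex_net { int count; };
struct ex_sb { struct ex_net *s_fs_info; bool filled; };

static void ex_get_net(struct ex_net *net) { net->count++; }
static void ex_put_net(struct ex_net *net) { net->count--; }

/* Models rpc_fill_super(): the only place that takes the reference. */
static void ex_fill_super(struct ex_sb *sb)
{
	ex_get_net(sb->s_fs_info);
	sb->filled = true;
}

/* Models rpc_kill_sb() as reported: drops the reference unconditionally. */
static void ex_kill_sb(struct ex_sb *sb)
{
	ex_put_net(sb->s_fs_info);
}

/* Hypothetical variant: drop the reference only if fill_super ran. */
static void ex_kill_sb_guarded(struct ex_sb *sb)
{
	if (sb->filled)
		ex_put_net(sb->s_fs_info);
}

/*
 * Models one superblock lifetime. When sget fails (for example because
 * register_shrinker() failed), fill_super is skipped but kill_sb still
 * runs. Returns the namespace count afterwards.
 */
static int ex_sb_lifetime(struct ex_net *net, bool sget_fails, bool guarded)
{
	struct ex_sb sb = { .s_fs_info = net, .filled = false };

	if (!sget_fails)
		ex_fill_super(&sb);
	if (guarded)
		ex_kill_sb_guarded(&sb);
	else
		ex_kill_sb(&sb);
	return net->count;
}
```

Starting from a count of 1, the failing path with the unconditional put ends at 0, one reference short, while the successful path and the guarded variant both end back at 1.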

syzbot wrote:
> Hello,
> 
> syzbot hit the following crash on bpf-next commit
> 6f1b5a2b58d8470e5a8b25ab29f5fdb46168 (Tue Feb 27 04:11:23 2018 +)
> Merge branch 'bpf-kselftest-improvements'
> 
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
> 
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+84371b6062cb639d7...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for  
> details.
> If you forward the report, please keep this part and the footer.
> 
> [ cut here ]
> FAULT_INJECTION: forcing a failure.
> name failslab, interval 1, probability 0, space 0, times 0
> refcount_t: underflow; use-after-free.
> CPU: 1 PID: 4239 Comm: syzkaller149381 Not tainted 4.16.0-rc2+ #20
> WARNING: CPU: 0 PID: 4237 at lib/refcount.c:187  
> refcount_sub_and_test+0x167/0x1b0 lib/refcount.c:187
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
> Google 01/01/2011
> Call Trace:
> Kernel panic - not syncing: panic_on_warn set ...
> 
>   __dump_stack lib/dump_stack.c:17 [inline]
>   dump_stack+0x194/0x24d lib/dump_stack.c:53
>   fail_dump lib/fault-inject.c:51 [inline]
>   should_fail+0x8c0/0xa40 lib/fault-inject.c:149
>   should_failslab+0xec/0x120 mm/failslab.c:32
>   slab_pre_alloc_hook mm/slab.h:422 [inline]
>   slab_alloc mm/slab.c:3365 [inline]
>   __do_kmalloc mm/slab.c:3703 [inline]
>   __kmalloc+0x63/0x760 mm/slab.c:3714
>   kmalloc include/linux/slab.h:517 [inline]
>   kzalloc include/linux/slab.h:701 [inline]
>   register_shrinker+0x10e/0x2d0 mm/vmscan.c:268
>   sget_userns+0xbbf/0xe40 fs/super.c:520
>   mount_ns+0x6d/0x190 fs/super.c:1029
>   rpc_mount+0x9e/0xd0 net/sunrpc/rpc_pipe.c:1451
>   mount_fs+0x66/0x2d0 fs/super.c:1222
>   vfs_kern_mount.part.26+0xc6/0x4a0 fs/namespace.c:1037
>   vfs_kern_mount fs/namespace.c:2509 [inline]
>   do_new_mount fs/namespace.c:2512 [inline]
>   do_mount+0xea4/0x2bb0 fs/namespace.c:2842
>   SYSC_mount fs/namespace.c:3058 [inline]
>   SyS_mount+0xab/0x120 fs/namespace.c:3035
>   do_syscall_64+0x280/0x940 arch/x86/entry/common.c:287
>   entry_SYSCALL_64_after_hwframe+0x42/0xb7
> RIP: 0033:0x4460f9
> RSP: 002b:7fbcd769ad78 EFLAGS: 0246 ORIG_RAX: 00a5
> RAX: ffda RBX: 006dcc6c RCX: 004460f9
> RDX: 2080 RSI: 2040 RDI: 2000
> RBP: 7fbcd769ad80 R08: 20c0 R09: 3131
> R10:  R11: 0246 R12: 006dcc68
> R13:  R14: 0037 R15: 0030656c69662f2e
> CPU: 0 PID: 4237 Comm: syzkaller149381 Not tainted 4.16.0-rc2+ #20
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
> Google 01/01/2011
> Call Trace:
>   __dump_stack lib/dump_stack.c:17 [inline]
>   dump_stack+0x194/0x24d lib/dump_stack.c:53
>   panic+0x1e4/0x41c kernel/panic.c:183
>   __warn+0x1dc/0x200 kernel/panic.c:547
>   report_bug+0x211/0x2d0 lib/bug.c:184
>   fixup_bug.part.11+0x37/0x80 arch/x86/kernel/traps.c:178
>   fixup_bug arch/x86/kernel/traps.c:247 [inline]
>   do_error_trap+0x2d7/0x3e0 arch/x86/kernel/traps.c:296
>   do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:315
>   invalid_op+0x58/0x80 arch/x86/entry/entry_64.S:957
> RIP: 0010:refcount_sub_and_test+0x167/0x1b0 lib/refcount.c:187
> RSP: 0018:8801b164f6d8 EFLAGS: 00010286
> RAX: dc08 RBX:  RCX: 815ac30e
> RDX:  RSI: 1100362c9e8b RDI: 1100362c9e60
> RBP: 8801b164f768 R08:  R09: 
> R10: 8801b164f610 R11:  R12: 1100362c9edc
> R13:  R14: 0001 R15: 8801ae924044
>   refcount_dec_and_test+0x1a/0x20 lib/refcount.c:212
>   put_net include/net/net_namespace.h:220 [inline]
>   rpc_kill_sb+0x253/0x3c0 net/sunrpc/rpc_pipe.c:1473
>   

Re: [PATCH 0/7] pull request for net: batman-adv 2018-03-02

2018-03-03 Thread David Miller
From: Simon Wunderlich 
Date: Fri,  2 Mar 2018 18:51:35 +0100

> here are some bugfixes which we would like to see integrated into net.
> 
> Please pull or let me know of any problem!

Pulled, thanks Simon.


Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-03 Thread Michael S. Tsirkin
On Fri, Mar 02, 2018 at 03:56:31PM -0800, Siwei Liu wrote:
> On Fri, Mar 2, 2018 at 1:36 PM, Michael S. Tsirkin  wrote:
> > On Fri, Mar 02, 2018 at 01:11:56PM -0800, Siwei Liu wrote:
> >> On Thu, Mar 1, 2018 at 12:08 PM, Sridhar Samudrala
> >>  wrote:
> >> > This patch enables virtio_net to switch over to a VF datapath when a VF
> >> > netdev is present with the same MAC address. It allows live migration
> >> > of a VM with a direct attached VF without the need to setup a bond/team
> >> > between a VF and virtio net device in the guest.
> >> >
> >> > The hypervisor needs to enable only one datapath at any time so that
> >> > packets don't get looped back to the VM over the other datapath. When a 
> >> > VF
> >> > is plugged, the virtio datapath link state can be marked as down. The
> >> > hypervisor needs to unplug the VF device from the guest on the source 
> >> > host
> >> > and reset the MAC filter of the VF to initiate failover of datapath to
> >> > virtio before starting the migration. After the migration is completed,
> >> > the destination hypervisor sets the MAC filter on the VF and plugs it 
> >> > back
> >> > to the guest to switch over to VF datapath.
> >> >
> >> > When BACKUP feature is enabled, an additional netdev(bypass netdev) is
> >> > created that acts as a master device and tracks the state of the 2 lower
> >> > netdevs. The original virtio_net netdev is marked as 'backup' netdev and 
> >> > a
> >> > passthru device with the same MAC is registered as 'active' netdev.
> >> >
> >> > This patch is based on the discussion initiated by Jesse on this thread.
> >> > https://marc.info/?l=linux-virtualization&m=151189725224231&w=2
> >> >
> >> > Signed-off-by: Sridhar Samudrala 
> >> > Signed-off-by: Alexander Duyck 
> >> > Reviewed-by: Jesse Brandeburg 
> >> > ---
> >> >  drivers/net/virtio_net.c | 683 
> >> > ++-
> >> >  1 file changed, 682 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> > index bcd13fe906ca..f2860d86c952 100644
> >> > --- a/drivers/net/virtio_net.c
> >> > +++ b/drivers/net/virtio_net.c
> >> > @@ -30,6 +30,8 @@
> >> >  #include 
> >> >  #include 
> >> >  #include 
> >> > +#include 
> >> > +#include 
> >> >  #include 
> >> >  #include 
> >> >
> >> > @@ -206,6 +208,9 @@ struct virtnet_info {
> >> > u32 speed;
> >> >
> >> > unsigned long guest_offloads;
> >> > +
> >> > +   /* upper netdev created when BACKUP feature enabled */
> >> > +   struct net_device *bypass_netdev;
> >> >  };
> >> >
> >> >  struct padded_vnet_hdr {
> >> > @@ -2236,6 +2241,22 @@ static int virtnet_xdp(struct net_device *dev, 
> >> > struct netdev_bpf *xdp)
> >> > }
> >> >  }
> >> >
> >> > +static int virtnet_get_phys_port_name(struct net_device *dev, char *buf,
> >> > + size_t len)
> >> > +{
> >> > +   struct virtnet_info *vi = netdev_priv(dev);
> >> > +   int ret;
> >> > +
> >> > +   if (!virtio_has_feature(vi->vdev, VIRTIO_NET_F_BACKUP))
> >> > +   return -EOPNOTSUPP;
> >> > +
> >> > +   ret = snprintf(buf, len, "_bkup");
> >> > +   if (ret >= len)
> >> > +   return -EOPNOTSUPP;
> >> > +
> >> > +   return 0;
> >> > +}
> >> > +
> >>
> >> What if the systemd/udevd is not new enough to enforce the
> >> n naming? Would virtio_bypass get a different name
> >> than the original virtio_net?
> >
> > You mean people using ethX names? Any hardware config change breaks
> > these, I don't think that can be helped.
> 
> I don't like the way to rely on .ndo_get_phys_port_name - it's fragile
> and it does not completely solve the problem it tries to address.
> Imagine what can end up with if getting an old udevd, or users already
> have existing explicit udev rules around phys_port_name. It does not
> give you an ack in saying "yes, I know you're the bypass and
> you're the backup, please continue and I will give you both correct
> names", or an unacknowledgment saying "no, I don't know what these
> extra interfaces are, please go back and leave the VF device alone".
> We need new udev API for both feature negotiation and naming, or may
> even completely hide the lower interfaces.

Go ahead and try to make this happen, but I won't hold my
breath.

> >
> >> Should we detect this earlier and fall
> >> back to legacy mode without creating the bypass netdev and enslaving
> >> the VF?
> >
> > I don't think we can do this with existing kernel/userspace APIs.
> 
> That's why I ever said to make udev aware of this new type of combined
> device instead of doing hacks here and there around.
> 
> Regards,
> -Siwei

We can add new interfaces on top but the main purpose here is to
make old userspace do new tricks.

> >
> > --
> > MST


Re: [virtio-dev] Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-03 Thread Michael S. Tsirkin
On Fri, Mar 02, 2018 at 02:26:48PM -0800, Siwei Liu wrote:
> On Fri, Mar 2, 2018 at 1:31 PM, Michael S. Tsirkin  wrote:
> > On Fri, Mar 02, 2018 at 12:44:56PM -0800, Siwei Liu wrote:
> >> On Fri, Mar 2, 2018 at 12:10 PM, Michael S. Tsirkin  
> >> wrote:
> >> > On Fri, Mar 02, 2018 at 11:52:27AM -0800, Samudrala, Sridhar wrote:
> >> >>
> >> >>
> >> >> On 3/2/2018 11:41 AM, Michael S. Tsirkin wrote:
> >> >> > On Fri, Mar 02, 2018 at 07:26:25AM -0800, Alexander Duyck wrote:
> >> >> > > The design limits things to a 1:1 relationship since we just have 
> >> >> > > the
> >> >> > > child and backup pointers, but I don't think I am seeing exception
> >> >> > > handling to prevent us from overwriting the child pointers so there
> >> >> > > may be a leak there.
> >> >> > >
> >> >> > > Thanks.
> >> >> > >
> >> >> > > - Alex
> >> >> > In fact maintaining a list in that case would be nicer, and
> >> >> > just use an arbitrary one.
> >> >> > E.g. one can see how a user wanting to swap device 1 for device 2
> >> >> > might first add device 2 with same MAC then drop device 1.
> >> >>
> >> >> It should be possible to swap VF1 with VF2 by
> >> >> 1.- enabling virtio link
> >> >> 2.- unplugging VF1
> >> >> 3.- plugging VF2
> >> >> 4.- disabling virtio link
> >> >>
> >> >
> >> > True, but it isn't hard to avoid breakage if user
> >> > swapped steps 2 and 3. No need to make it more
> >> > fragile than it has to be.
> >>
> >> The migration case, VF2 is associated with another PF on another
> >> machine (destination), I wonder how it is possible.
> >
> > E.g. you want to remove the PF so you unplug the VF
> > then add another VF of the same PF.
> >
> >> Even with local plugging of VF2 on the same PF, the MAC address
> >> requirement (VF1's == VF2's) would fail the MAC address assignment on
> >> VF2.
> >>
> >> -Siwei
> >
> > Why would it fail? These are separate cards.
> 
> OK. I realized that you may talk about assigning a VF on a different
> PF (VF1 on PF1 while VF2 on PF2). And we might assign a pass-through
> device rather than a VF. Yes, it's indeed possible that may happen but
> I take it as a further step down (another patch maybe) as it would
> involve changes to notify the network with gratuitous ARP and/or
> unsolicited ND advertisement of the MAC address association with the
> new port.
> 
> -Siwei

Interesting point. I guess that's a limitation in the current patch then:
virtio and PT device must be connected to the same physical NIC.
Worth documenting.

> >
> >> >
> >> > --
> >> > MST
> >> >
> >> > -
> >> > To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> >> > For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
> >> >


[PATCH 1/3] net: core: dst_cache: Fix a typo in a comment

2018-03-03 Thread Jonathan Neuschäfer
Signed-off-by: Jonathan Neuschäfer 
---
 include/net/dst_cache.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/dst_cache.h b/include/net/dst_cache.h
index 72fd5067c353..844906fbf8c9 100644
--- a/include/net/dst_cache.h
+++ b/include/net/dst_cache.h
@@ -71,7 +71,7 @@ struct dst_entry *dst_cache_get_ip6(struct dst_cache 
*dst_cache,
  * dst_cache_reset - invalidate the cache contents
  * @dst_cache: the cache
  *
- * This do not free the cached dst to avoid races and contentions.
+ * This does not free the cached dst to avoid races and contentions.
  * the dst will be freed on later cache lookup.
  */
 static inline void dst_cache_reset(struct dst_cache *dst_cache)
-- 
2.16.1



[PATCH 2/3] net: core: dst_cache_set_ip6: Rename 'addr' parameter to 'saddr' for consistency

2018-03-03 Thread Jonathan Neuschäfer
The other dst_cache_{get,set}_ip{4,6} functions, and the doc comment for
dst_cache_set_ip6 use 'saddr' for their source address parameter. Rename
the parameter to increase consistency.

This fixes the following kernel-doc warnings:

./include/net/dst_cache.h:58: warning: Function parameter or member 'addr' not 
described in 'dst_cache_set_ip6'
./include/net/dst_cache.h:58: warning: Excess function parameter 'saddr' 
description in 'dst_cache_set_ip6'

Fixes: 911362c70df5 ("net: add dst_cache support")
Signed-off-by: Jonathan Neuschäfer 
---
 include/net/dst_cache.h | 2 +-
 net/core/dst_cache.c| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/net/dst_cache.h b/include/net/dst_cache.h
index 844906fbf8c9..67634675e919 100644
--- a/include/net/dst_cache.h
+++ b/include/net/dst_cache.h
@@ -54,7 +54,7 @@ void dst_cache_set_ip4(struct dst_cache *dst_cache, struct 
dst_entry *dst,
  * local BH must be disabled.
  */
 void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
-  const struct in6_addr *addr);
+  const struct in6_addr *saddr);
 
 /**
  * dst_cache_get_ip6 - perform cache lookup and fetch ipv6 source address
diff --git a/net/core/dst_cache.c b/net/core/dst_cache.c
index 554d36449231..64cef977484a 100644
--- a/net/core/dst_cache.c
+++ b/net/core/dst_cache.c
@@ -107,7 +107,7 @@ EXPORT_SYMBOL_GPL(dst_cache_set_ip4);
 
 #if IS_ENABLED(CONFIG_IPV6)
 void dst_cache_set_ip6(struct dst_cache *dst_cache, struct dst_entry *dst,
-  const struct in6_addr *addr)
+  const struct in6_addr *saddr)
 {
struct dst_cache_pcpu *idst;
 
@@ -117,7 +117,7 @@ void dst_cache_set_ip6(struct dst_cache *dst_cache, struct 
dst_entry *dst,
idst = this_cpu_ptr(dst_cache->cache);
dst_cache_per_cpu_dst_set(this_cpu_ptr(dst_cache->cache), dst,
  rt6_get_cookie((struct rt6_info *)dst));
-   idst->in6_saddr = *addr;
+   idst->in6_saddr = *saddr;
 }
 EXPORT_SYMBOL_GPL(dst_cache_set_ip6);
 
-- 
2.16.1



[PATCH 3/3] net: core: dst: Add kernel-doc for 'net' parameter

2018-03-03 Thread Jonathan Neuschäfer
This fixes the following kernel-doc warning:

./include/net/dst.h:366: warning: Function parameter or member 'net' not 
described in 'skb_tunnel_rx'

Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path")
Signed-off-by: Jonathan Neuschäfer 
---
 include/net/dst.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/net/dst.h b/include/net/dst.h
index c63d2c37f6e9..b3219cd8a5a1 100644
--- a/include/net/dst.h
+++ b/include/net/dst.h
@@ -356,6 +356,7 @@ static inline void __skb_tunnel_rx(struct sk_buff *skb, 
struct net_device *dev,
  * skb_tunnel_rx - prepare skb for rx reinsert
  * @skb: buffer
  * @dev: tunnel device
+ * @net: netns for packet i/o
  *
  * After decapsulation, packet is going to re-enter (netif_rx()) our stack,
  * so make some cleanups, and perform accounting.
-- 
2.16.1



Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-03 Thread Alexander Duyck
On Sat, Mar 3, 2018 at 1:25 PM, Jiri Pirko  wrote:
> Sat, Mar 03, 2018 at 07:04:57PM CET, alexander.du...@gmail.com wrote:
>>On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko  wrote:
>>> Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote:
>>>>On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:
>>>>> >Yeah, this code essentially calls out the "shareable" code with a
>>>>> >comment at the start and end of the section what defines the
>>>>> >virtio_bypass functionality. It would just be a matter of mostly
>>>>> >cutting and pasting to put it into a separate driver module.
>>>>>
>>>>> Please put it there and unite the use of it with netvsc.
>>>>
>>>>Surely, adding this to other drivers (e.g. might this be handy for xen
>>>>too?) can be left for a separate patchset. Let's get one device merged
>>>>first.
>>>
>>> Why? Let's do the generic infra alongside with the driver. I see no good
>>> reason to rush into merging driver and only later, if ever, to convert
>>> it to generic solution. On the contrary. That would lead into multiple
>>> approaches and different behaviours in multiple drivers. That is plain
>>> wrong.
>>
>>If nothing else it doesn't hurt to do this in one driver in a generic
>>way, and once it has been proven to address all the needs of that one
>>driver we can then start moving other drivers to it. The current
>>solution is quite generic, that was my contribution to this patch set
>>as I didn't like how invasive it was being to virtio and thought it
>>would be best to keep this as minimally invasive as possible.
>>
>>My preference would be to give this a release or two in virtio to
>>mature before we start pushing it onto other drivers. It shouldn't
>>take much to cut/paste this into a new driver file once we decide it
>>is time to start extending it out to other drivers.
>
> I'm not talking about cut/paste and in fact that is what I'm worried
> about. I'm talking about common code in net/core/ or somewhere that
> would take care of this in-driver bonding. Each driver, like virtio_net,
> netvsc would just register some ops to it and the core would do all
> logic. I believe it is essential to take this approach from the start.

Sorry, I didn't mean cut/paste into another driver, I meant to make it
a driver of its own. My thought was to eventually create a shared/core
driver module that is then used by the other drivers.

My concern right now is that Stephen has indicated he doesn't want
this approach taken with netvsc, and most of the community doesn't
want the netvsc approach applied to virtio. Until that impasse can be
resolved there isn't much value in trying to split this up so it is
available to other drivers. In addition I would imagine it would make
it a pain for others to back-port into distros since it would break
legacy netvsc driver behavior. Patches are always welcome. Once this
is in you are free to try fighting to get this made into a generic
module and applied to both drivers, but we have already spent close to
3 months on this and it seems like there has been significantly more
time spent arguing over the number of interfaces and/or drivers than
spent writing/reviewing actual code.

- Alex


[PATCH AUTOSEL for 4.9 136/219] MIPS: BPF: Quit clobbering callee saved registers in JIT code.

2018-03-03 Thread Sasha Levin
From: David Daney 

[ Upstream commit 1ef0910cfd681f0bd0b81f8809935b2006e9cfb9 ]

If bpf_needs_clear_a() returns true, only actually clear it if it is
ever used.  If it is not used, we don't save and restore it, so the
clearing has the nasty side effect of clobbering caller state.

Also, don't emit stack pointer adjustment instructions if the
adjustment amount is zero.

Signed-off-by: David Daney 
Cc: James Hogan 
Cc: Alexei Starovoitov 
Cc: Steven J. Hill 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15745/
Signed-off-by: Ralf Baechle 
Signed-off-by: Sasha Levin 
---
 arch/mips/net/bpf_jit.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
index 49a2e2226fee..248603739198 100644
--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -526,7 +526,8 @@ static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned 
offset)
u32 sflags, tmp_flags;
 
/* Adjust the stack pointer */
-   emit_stack_offset(-align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(-align_sp(offset), ctx);
 
tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
/* sflags is essentially a bitmap */
@@ -578,7 +579,8 @@ static void restore_bpf_jit_regs(struct jit_ctx *ctx,
emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
 
/* Restore the sp and discard the scrach memory */
-   emit_stack_offset(align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(align_sp(offset), ctx);
 }
 
 static unsigned int get_stack_depth(struct jit_ctx *ctx)
@@ -625,8 +627,14 @@ static void build_prologue(struct jit_ctx *ctx)
if (ctx->flags & SEEN_X)
emit_jit_reg_move(r_X, r_zero, ctx);
 
-   /* Do not leak kernel data to userspace */
-   if (bpf_needs_clear_a(&ctx->skf->insns[0]))
+   /*
+* Do not leak kernel data to userspace, we only need to clear
+* r_A if it is ever used.  In fact if it is never used, we
+* will not save/restore it, so clearing it in this case would
+* corrupt the state of the caller.
+*/
+   if (bpf_needs_clear_a(&ctx->skf->insns[0]) &&
+   (ctx->flags & SEEN_A))
emit_jit_reg_move(r_A, r_zero, ctx);
 }
 
-- 
2.14.1


[PATCH AUTOSEL for 4.9 137/219] MIPS: BPF: Fix multiple problems in JIT skb access helpers.

2018-03-03 Thread Sasha Levin
From: David Daney 

[ Upstream commit a81507c79f4ae9a0f9fb1054b59b62a090620dd9 ]

o Socket data is unsigned, so use unsigned accessors instructions.

 o Fix path result pointer generation arithmetic.

 o Fix half-word byte swapping code for unsigned semantics.

Signed-off-by: David Daney 
Cc: James Hogan 
Cc: Alexei Starovoitov 
Cc: Steven J. Hill 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15747/
Signed-off-by: Ralf Baechle 
Signed-off-by: Sasha Levin 
---
 arch/mips/net/bpf_jit_asm.S | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/mips/net/bpf_jit_asm.S b/arch/mips/net/bpf_jit_asm.S
index 5d2e0c8d29c0..88a2075305d1 100644
--- a/arch/mips/net/bpf_jit_asm.S
+++ b/arch/mips/net/bpf_jit_asm.S
@@ -90,18 +90,14 @@ FEXPORT(sk_load_half_positive)
is_offset_in_header(2, half)
/* Offset within header boundaries */
PTR_ADDU t1, $r_skb_data, offset
-   .set    reorder
-   lh  $r_A, 0(t1)
-   .set    noreorder
+   lhu $r_A, 0(t1)
 #ifdef CONFIG_CPU_LITTLE_ENDIAN
 # if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
-   wsbh    t0, $r_A
-   seh $r_A, t0
+   wsbh    $r_A, $r_A
 # else
-   sll t0, $r_A, 24
-   andi    t1, $r_A, 0xff00
-   sra t0, t0, 16
-   srl t1, t1, 8
+   sll t0, $r_A, 8
+   srl t1, $r_A, 8
+   andi    t0, t0, 0xff00
or  $r_A, t0, t1
 # endif
 #endif
@@ -115,7 +111,7 @@ FEXPORT(sk_load_byte_positive)
is_offset_in_header(1, byte)
/* Offset within header boundaries */
PTR_ADDU t1, $r_skb_data, offset
-   lb  $r_A, 0(t1)
+   lbu $r_A, 0(t1)
jr  $r_ra
 move   $r_ret, zero
END(sk_load_byte)
@@ -139,6 +135,11 @@ FEXPORT(sk_load_byte_positive)
  * (void *to) is returned in r_s0
  *
  */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define DS_OFFSET(SIZE) (4 * SZREG)
+#else
+#define DS_OFFSET(SIZE) ((4 * SZREG) + (4 - SIZE))
+#endif
 #define bpf_slow_path_common(SIZE) \
/* Quick check. Are we within reasonable boundaries? */ \
LONG_ADDIU  $r_s1, $r_skb_len, -SIZE;   \
@@ -150,7 +151,7 @@ FEXPORT(sk_load_byte_positive)
PTR_LA  t0, skb_copy_bits;  \
PTR_S   $r_ra, (5 * SZREG)($r_sp);  \
/* Assign low slot to a2 */ \
-   move    a2, $r_sp;  \
+   PTR_ADDIU   a2, $r_sp, DS_OFFSET(SIZE); \
jalr    t0; \
/* Reset our destination slot (DS but it's ok) */   \
 INT_S  zero, (4 * SZREG)($r_sp);   \
-- 
2.14.1


[PATCH AUTOSEL for 4.4 067/115] MIPS: BPF: Quit clobbering callee saved registers in JIT code.

2018-03-03 Thread Sasha Levin
From: David Daney 

[ Upstream commit 1ef0910cfd681f0bd0b81f8809935b2006e9cfb9 ]

If bpf_needs_clear_a() returns true, only actually clear it if it is
ever used.  If it is not used, we don't save and restore it, so the
clearing has the nasty side effect of clobbering caller state.

Also, don't emit stack pointer adjustment instructions if the
adjustment amount is zero.

Signed-off-by: David Daney 
Cc: James Hogan 
Cc: Alexei Starovoitov 
Cc: Steven J. Hill 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15745/
Signed-off-by: Ralf Baechle 
Signed-off-by: Sasha Levin 
---
 arch/mips/net/bpf_jit.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
index 1a8c96035716..c0c1e9529dbd 100644
--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -527,7 +527,8 @@ static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
u32 sflags, tmp_flags;
 
/* Adjust the stack pointer */
-   emit_stack_offset(-align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(-align_sp(offset), ctx);
 
tmp_flags = sflags = ctx->flags >> SEEN_SREG_SFT;
/* sflags is essentially a bitmap */
@@ -579,7 +580,8 @@ static void restore_bpf_jit_regs(struct jit_ctx *ctx,
emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
 
/* Restore the sp and discard the scrach memory */
-   emit_stack_offset(align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(align_sp(offset), ctx);
 }
 
 static unsigned int get_stack_depth(struct jit_ctx *ctx)
@@ -626,8 +628,14 @@ static void build_prologue(struct jit_ctx *ctx)
if (ctx->flags & SEEN_X)
emit_jit_reg_move(r_X, r_zero, ctx);
 
-   /* Do not leak kernel data to userspace */
-   if (bpf_needs_clear_a(&ctx->skf->insns[0]))
+   /*
+* Do not leak kernel data to userspace, we only need to clear
+* r_A if it is ever used.  In fact if it is never used, we
+* will not save/restore it, so clearing it in this case would
+* corrupt the state of the caller.
+*/
+   if (bpf_needs_clear_a(&ctx->skf->insns[0]) &&
+   (ctx->flags & SEEN_A))
emit_jit_reg_move(r_A, r_zero, ctx);
 }
 
-- 
2.14.1
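
As a rough illustration of the two conditions this patch adds, here is a small standalone C sketch. It is not the kernel code: the SEEN_A bit value and the helper names are made up for the example, and only the decision logic described in the commit message is modeled.

```c
#include <stdbool.h>

/* Hypothetical flag bit; the real SEEN_A value is defined in bpf_jit.c. */
#define SEEN_A (1u << 0)

/*
 * r_A only needs to be zeroed in the prologue when the filter could read
 * it before writing it AND the JIT has seen it used. If it was never
 * seen, it is not saved/restored, so zeroing it would clobber the
 * caller's value.
 */
static bool should_clear_a(unsigned int flags, bool needs_clear)
{
	return needs_clear && (flags & SEEN_A) != 0;
}

/* A zero-sized frame needs no stack pointer adjustment at all. */
static bool should_adjust_sp(unsigned int offset)
{
	return offset != 0;
}
```

The point of splitting it this way is that both checks depend only on state the JIT already tracks, so no extra pass over the program is needed.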


[PATCH AUTOSEL for 4.4 068/115] MIPS: BPF: Fix multiple problems in JIT skb access helpers.

2018-03-03 Thread Sasha Levin
From: David Daney 

[ Upstream commit a81507c79f4ae9a0f9fb1054b59b62a090620dd9 ]

o Socket data is unsigned, so use unsigned accessors instructions.

 o Fix path result pointer generation arithmetic.

 o Fix half-word byte swapping code for unsigned semantics.

Signed-off-by: David Daney 
Cc: James Hogan 
Cc: Alexei Starovoitov 
Cc: Steven J. Hill 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15747/
Signed-off-by: Ralf Baechle 
Signed-off-by: Sasha Levin 
---
 arch/mips/net/bpf_jit_asm.S | 23 ---
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/mips/net/bpf_jit_asm.S b/arch/mips/net/bpf_jit_asm.S
index 5d2e0c8d29c0..88a2075305d1 100644
--- a/arch/mips/net/bpf_jit_asm.S
+++ b/arch/mips/net/bpf_jit_asm.S
@@ -90,18 +90,14 @@ FEXPORT(sk_load_half_positive)
is_offset_in_header(2, half)
/* Offset within header boundaries */
PTR_ADDU t1, $r_skb_data, offset
-   .set    reorder
-   lh  $r_A, 0(t1)
-   .set    noreorder
+   lhu $r_A, 0(t1)
 #ifdef CONFIG_CPU_LITTLE_ENDIAN
 # if defined(__mips_isa_rev) && (__mips_isa_rev >= 2)
-   wsbh    t0, $r_A
-   seh $r_A, t0
+   wsbh    $r_A, $r_A
 # else
-   sll t0, $r_A, 24
-   andi    t1, $r_A, 0xff00
-   sra t0, t0, 16
-   srl t1, t1, 8
+   sll t0, $r_A, 8
+   srl t1, $r_A, 8
+   andi    t0, t0, 0xff00
or  $r_A, t0, t1
 # endif
 #endif
@@ -115,7 +111,7 @@ FEXPORT(sk_load_byte_positive)
is_offset_in_header(1, byte)
/* Offset within header boundaries */
PTR_ADDU t1, $r_skb_data, offset
-   lb  $r_A, 0(t1)
+   lbu $r_A, 0(t1)
jr  $r_ra
 move   $r_ret, zero
END(sk_load_byte)
@@ -139,6 +135,11 @@ FEXPORT(sk_load_byte_positive)
  * (void *to) is returned in r_s0
  *
  */
+#ifdef CONFIG_CPU_LITTLE_ENDIAN
+#define DS_OFFSET(SIZE) (4 * SZREG)
+#else
+#define DS_OFFSET(SIZE) ((4 * SZREG) + (4 - SIZE))
+#endif
 #define bpf_slow_path_common(SIZE) \
/* Quick check. Are we within reasonable boundaries? */ \
LONG_ADDIU  $r_s1, $r_skb_len, -SIZE;   \
@@ -150,7 +151,7 @@ FEXPORT(sk_load_byte_positive)
PTR_LA  t0, skb_copy_bits;  \
PTR_S   $r_ra, (5 * SZREG)($r_sp);  \
/* Assign low slot to a2 */ \
-   move    a2, $r_sp;  \
+   PTR_ADDIU   a2, $r_sp, DS_OFFSET(SIZE); \
jalr    t0; \
/* Reset our destination slot (DS but it's ok) */   \
 INT_S  zero, (4 * SZREG)($r_sp);   \
-- 
2.14.1
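
To make the half-word change in the patch above easier to follow, here is a userspace C sketch of the two little-endian, pre-R2 swap sequences. The function names are invented for the example; each statement is annotated with the MIPS instruction it mirrors, and the arithmetic shift is spelled out by hand so the sketch does not depend on how the compiler shifts signed values.

```c
#include <stdint.h>

/*
 * New sequence. The value was loaded with lhu, so bits 16..31 are
 * already zero and the result stays a 16-bit unsigned quantity.
 */
static uint32_t swap_half_new(uint32_t a)
{
	uint32_t t0 = a << 8;   /* sll  t0, $r_A, 8      */
	uint32_t t1 = a >> 8;   /* srl  t1, $r_A, 8      */

	t0 &= 0xff00;           /* andi t0, t0, 0xff00   */
	return t0 | t1;         /* or   $r_A, t0, t1     */
}

/*
 * Old sequence. The sra sign-extends the swapped high byte, so any
 * half-word whose swapped value has bit 15 set comes out negative.
 */
static uint32_t swap_half_old(uint32_t a)
{
	uint32_t t0 = a << 24;          /* sll  t0, $r_A, 24     */
	uint32_t t1 = a & 0xff00;       /* andi t1, $r_A, 0xff00 */

	/* sra t0, t0, 16: arithmetic shift, written out explicitly */
	t0 = (t0 >> 16) | ((t0 & 0x80000000u) ? 0xffff0000u : 0u);
	t1 >>= 8;                       /* srl  t1, t1, 8        */
	return t0 | t1;                 /* or   $r_A, t0, t1     */
}
```

For a loaded value of 0x1280 the new sequence yields 0x8012, while the old one yields 0xffff8012, which is the sign-extension problem the commit message refers to.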


[PATCH] ravb: remove erroneous comment

2018-03-03 Thread Niklas Söderlund
When addressing a review comment on an early version of the offending
patch, a comment was left in that should have been removed. Remove the
comment to keep it consistent with the code.

Fixes: 75efa06f457bbed3 ("ravb: add support for changing MTU")
Reported-by: Sergei Shtylyov 
Signed-off-by: Niklas Söderlund 
---
 drivers/net/ethernet/renesas/ravb_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
index 54a6265da7a06460..68f122140966d4de 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -346,7 +346,6 @@ static int ravb_ring_init(struct net_device *ndev, int q)
int ring_size;
int i;
 
-   /* +16 gets room from the status from the card. */
priv->rx_buf_sz = (ndev->mtu <= 1492 ? PKT_BUF_SZ : ndev->mtu) +
ETH_HLEN + VLAN_HLEN;
 
-- 
2.16.2



[PATCH AUTOSEL for 3.18 37/63] MIPS: BPF: Quit clobbering callee saved registers in JIT code.

2018-03-03 Thread Sasha Levin
From: David Daney 

[ Upstream commit 1ef0910cfd681f0bd0b81f8809935b2006e9cfb9 ]

If bpf_needs_clear_a() returns true, only actually clear it if it is
ever used.  If it is not used, we don't save and restore it, so the
clearing has the nasty side effect of clobbering caller state.

Also, don't emit stack pointer adjustment instructions if the
adjustment amount is zero.

Signed-off-by: David Daney 
Cc: James Hogan 
Cc: Alexei Starovoitov 
Cc: Steven J. Hill 
Cc: linux-m...@linux-mips.org
Cc: netdev@vger.kernel.org
Cc: linux-ker...@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/15745/
Signed-off-by: Ralf Baechle 
Signed-off-by: Sasha Levin 
---
 arch/mips/net/bpf_jit.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/mips/net/bpf_jit.c b/arch/mips/net/bpf_jit.c
index 9fd82f48f8ed..32d2673439ad 100644
--- a/arch/mips/net/bpf_jit.c
+++ b/arch/mips/net/bpf_jit.c
@@ -562,7 +562,8 @@ static void save_bpf_jit_regs(struct jit_ctx *ctx, unsigned offset)
u32 sflags, tmp_flags;
 
/* Adjust the stack pointer */
-   emit_stack_offset(-align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(-align_sp(offset), ctx);
 
if (ctx->flags & SEEN_CALL) {
/* Argument save area */
@@ -641,7 +642,8 @@ static void restore_bpf_jit_regs(struct jit_ctx *ctx,
emit_load_stack_reg(r_ra, r_sp, real_off, ctx);
 
/* Restore the sp and discard the scrach memory */
-   emit_stack_offset(align_sp(offset), ctx);
+   if (offset)
+   emit_stack_offset(align_sp(offset), ctx);
 }
 
 static unsigned int get_stack_depth(struct jit_ctx *ctx)
@@ -689,8 +691,14 @@ static void build_prologue(struct jit_ctx *ctx)
if (ctx->flags & SEEN_X)
emit_jit_reg_move(r_X, r_zero, ctx);
 
-   /* Do not leak kernel data to userspace */
-   if (bpf_needs_clear_a(&ctx->skf->insns[0]))
+   /*
+* Do not leak kernel data to userspace, we only need to clear
+* r_A if it is ever used.  In fact if it is never used, we
+* will not save/restore it, so clearing it in this case would
+* corrupt the state of the caller.
+*/
+   if (bpf_needs_clear_a(&ctx->skf->insns[0]) &&
+   (ctx->flags & SEEN_A))
emit_jit_reg_move(r_A, r_zero, ctx);
 }
 
-- 
2.14.1


lnstat

2018-03-03 Thread David Kaufmann
Hi!

`lnstat` segfaults (tested on Debian 9, CentOS 6+7, Fedora 27) if it is
started as `lnstat -w 1`.

According to gdb, the crash is in `build_hdr_string` at lnstat.c:212.

As a single "1" seems to be a useless value for the option anyway, it might
make sense to just handle it the same as if "0" was specified.
`-w 0,1`, `-w 1,0`, `-w 1,1` and other variations do work.

All the best,
Astra

PS: I did not find any other place to report this; if this is the wrong
place, please tell me where to post.



Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-03 Thread Jiri Pirko
Sat, Mar 03, 2018 at 07:04:57PM CET, alexander.du...@gmail.com wrote:
>On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko  wrote:
>> Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote:
>>>On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:
>>>> >Yeah, this code essentially calls out the "shareable" code with a
>>>> >comment at the start and end of the section what defines the
>>>> >virtio_bypass functionality. It would just be a matter of mostly
>>>> >cutting and pasting to put it into a separate driver module.
>>>>
>>>> Please put it there and unite the use of it with netvsc.
>>>
>>>Surely, adding this to other drivers (e.g. might this be handy for xen
>>>too?) can be left for a separate patchset. Let's get one device merged
>>>first.
>>
>> Why? Let's do the generic infra alongside with the driver. I see no good
>> reason to rush into merging driver and only later, if ever, to convert
>> it to generic solution. On contrary. That would lead into multiple
>> approaches and different behavious in multiple drivers. That is plain
>> wrong.
>
>If nothing else it doesn't hurt to do this in one driver in a generic
>way, and once it has been proven to address all the needs of that one
>driver we can then start moving other drivers to it. The current
>solution is quite generic, that was my contribution to this patch set
>as I didn't like how invasive it was being to virtio and thought it
>would be best to keep this as minimally invasive as possible.
>
>My preference would be to give this a release or two in virtio to
>mature before we start pushing it onto other drivers. It shouldn't
>take much to cut/paste this into a new driver file once we decide it
>is time to start extending it out to other drivers.

I'm not talking about cut/paste; in fact, that is what I'm worried
about. I'm talking about common code in net/core/ or somewhere that
would take care of this in-driver bonding. Each driver, like virtio_net
or netvsc, would just register some ops with it and the core would do all
the logic. I believe it is essential to take this approach from the start.
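
To make the "register some ops, core does the logic" idea concrete, here is a minimal userspace C sketch. Everything in it is hypothetical: `bypass_ops`, `bypass_instance`, `bypass_register()` and `bypass_vf_event()` are names made up for illustration and do not exist in the kernel or in the patch set under discussion. It only shows the shape of the split between per-driver callbacks and shared state handling.

```c
#include <stddef.h>

/* Hypothetical per-driver callbacks (virtio_net, netvsc, ...). */
struct bypass_ops {
	int (*vf_register)(void *drv_priv);
	int (*vf_unregister)(void *drv_priv);
};

/* Hypothetical state owned by the shared core. */
struct bypass_instance {
	const struct bypass_ops *ops;
	void *drv_priv;
	int vf_active;
};

/* A driver hands its callbacks to the core once, at probe time. */
static int bypass_register(struct bypass_instance *b,
			   const struct bypass_ops *ops, void *drv_priv)
{
	if (!b || !ops || !ops->vf_register || !ops->vf_unregister)
		return -1;
	b->ops = ops;
	b->drv_priv = drv_priv;
	b->vf_active = 0;
	return 0;
}

/* The core owns the state machine; drivers only supply the callbacks. */
static int bypass_vf_event(struct bypass_instance *b, int vf_present)
{
	int err;

	if (vf_present == b->vf_active)
		return 0;	/* nothing changed */
	err = vf_present ? b->ops->vf_register(b->drv_priv)
			 : b->ops->vf_unregister(b->drv_priv);
	if (!err)
		b->vf_active = vf_present;
	return err;
}

/* Example driver side: callbacks that just count invocations. */
static int demo_vf_register(void *priv)   { (*(int *)priv)++; return 0; }
static int demo_vf_unregister(void *priv) { (*(int *)priv)--; return 0; }

static const struct bypass_ops demo_ops = {
	.vf_register   = demo_vf_register,
	.vf_unregister = demo_vf_unregister,
};
```

With this split, the duplicate-event handling and the active/inactive bookkeeping live in one place, which is the behavioural consistency across drivers that the argument above is about.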


Re: lost interrupts when running sabrelite images (v4.15+) in qemu

2018-03-03 Thread Guenter Roeck

On 03/03/2018 12:48 PM, Guenter Roeck wrote:

On 03/03/2018 11:07 AM, Troy Kisky wrote:

On 3/3/2018 8:32 AM, Guenter Roeck wrote:

Hi,

since v4.15, I get the following runtime warning when running sabrelite images
in qemu.

irq 65: nobody cared (try booting with the "irqpoll" option)
...
handlers:
[<26292474>] fec_pps_interrupt
Disabling IRQ #65
fec 2188000.ethernet (unnamed net_device) (uninitialized): MDIO read timeout

Bisect points to commit 4ad1ceec05e491 ("net: fec: Let fec_ptp have its
own interrupt routine"). Analysis shows that platform_irq_count()
returns 2, which is reduced to 1 by fec_enet_get_irq_cnt().
If I let fec_enet_get_irq_cnt() return 2, the problem is gone.
Reverting commit 4ad1ceec05e491 also fixes the problem.

Bisect log is attached.



Sounds like you found a bug with qemu. I just booted sabrelite over nfs fine.
My interrupts look like this.


  64:  98767  0  0  0 GIC-0 150 Level 2188000.ethernet
  65:  0  0  0  0 GIC-0 151 Level 2188000.ethernet
___
Irq 65 is only for ptp interrupts now. If qemu is signaling a tx/rx frame
interrupt on 65, then qemu is wrong. Of course, I've never used qemu so feel
free to ignore me if I make no sense.



Thanks for checking with real hardware.

This is what I see (with your patch reverted):

  64:  0 GIC-0 150 Level 2188000.ethernet
  65: 64 GIC-0 151 Level 2188000.ethernet

Looking into the qemu source, I see:

#define FSL_IMX6_ENET_MAC_1588_IRQ 118
#define FSL_IMX6_ENET_MAC_IRQ 119

FSL_IMX6_ENET_MAC_IRQ is then connected to fec interrupt index 0, and
FSL_IMX6_ENET_MAC_1588_IRQ is connected to fec interrupt index 1.

This may suggest that the defines are reversed. I'll see what happens if I
swap them.



Confirmed. If I swap the above defines, everything works fine. At the same time,
the modified qemu works with older kernels.

Thanks a lot for the hint, and sorry for the noise.

Guenter
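
For reference, the swap described above can be written out as a small C sketch. The two defines carry the swapped values; the GIC_SPI_BASE constant and the gic_hwirq() helper are additions for the example, based on the assumption that shared peripheral interrupt n shows up as GIC hwirq n + 32 in /proc/interrupts.

```c
/* Swapped order, as described above: the MAC interrupt comes first. */
#define FSL_IMX6_ENET_MAC_IRQ      118
#define FSL_IMX6_ENET_MAC_1588_IRQ 119

/* Assumption: SPI n is reported as GIC hwirq n + 32. */
#define GIC_SPI_BASE 32

static int gic_hwirq(int spi)
{
	return spi + GIC_SPI_BASE;
}
```

With these values the frame interrupt maps to GIC-0 150 and the 1588 interrupt to GIC-0 151, which matches the table from real hardware quoted earlier in the thread.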


Re: [PATCH bpf] bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat to deal with gso sctp skbs

2018-03-03 Thread Alexei Starovoitov
On Sat, Mar 03, 2018 at 09:57:15PM +0100, Daniel Borkmann wrote:
> On 03/03/2018 05:02 PM, Daniel Axtens wrote:
> >> From: Daniel Axtens 
> >>
> >> SCTP GSO skbs have a gso_size of GSO_BY_FRAGS, so any sort of
> >> unconditionally mangling of that will result in nonsense value
> >> and would corrupt the skb later on.
> >>
> >> Therefore, i) add two helpers skb_increase_gso_size() and
> >> skb_decrease_gso_size() that would throw a one time warning and
> >> bail out for such skbs and ii) refuse and return early with an
> >> error in those BPF helpers that are affected. We do need to bail
> >> out as early as possible from there before any changes on the
> >> skb have been performed.
> >>
> >> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> >> Co-authored-by: Daniel Borkmann 
> >> Signed-off-by: Daniel Axtens 
> >> Cc: Marcelo Ricardo Leitner 
> >> Acked-by: Alexei Starovoitov 
> > 
> > I've looked over your changes and they all look good to me.
> > 
> >> +/* Note: Should be called only if skb_is_gso(skb) is true */
> >> +static inline bool skb_is_gso_sctp(const struct sk_buff *skb)
> >> +{
> >> +  return skb_shinfo(skb)->gso_type & SKB_GSO_SCTP;
> >> +}
> >> +
> > 
> > This helper is a fantastic idea and I will send a docs update to
> > highlight it.
> 
> Sounds good. There are in fact several places in the code that
> could make use of this right away. If you have a chance, this
> could be done in net-next along with the doc update or so.

Applied to bpf tree, Thanks everyone.
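
For anyone skimming the thread, the "warn once and bail out" behaviour described in the quoted commit message can be modeled in userspace roughly as follows. This is a sketch, not the kernel helper: `mock_shinfo` and `mock_decrease_gso_size()` are made-up names, the warning is a plain fprintf() instead of a kernel warning, and the function returns a bool purely so the example can be checked. GSO_BY_FRAGS is taken to be 0xFFFF.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define GSO_BY_FRAGS 0xFFFF

/* Stand-in for the two skb_shared_info fields that matter here. */
struct mock_shinfo {
	uint16_t gso_size;
};

/*
 * Refuse to touch gso_size when it holds the GSO_BY_FRAGS marker,
 * since the marker is not a real size and arithmetic on it would
 * produce a nonsense value.
 */
static bool mock_decrease_gso_size(struct mock_shinfo *sh, uint16_t dec)
{
	static bool warned;

	if (sh->gso_size == GSO_BY_FRAGS) {
		if (!warned) {
			fprintf(stderr, "gso_size is GSO_BY_FRAGS, not adjusting\n");
			warned = true;	/* one-time warning */
		}
		return false;
	}
	sh->gso_size -= dec;
	return true;
}
```

Callers that can see the SCTP case coming are expected to return an error before modifying the skb at all; the check inside the helper is only the last line of defence.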



Re: [PATCH bpf] bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat to deal with gso sctp skbs

2018-03-03 Thread Daniel Borkmann
On 03/03/2018 05:02 PM, Daniel Axtens wrote:
>> From: Daniel Axtens 
>>
>> SCTP GSO skbs have a gso_size of GSO_BY_FRAGS, so any sort of
>> unconditionally mangling of that will result in nonsense value
>> and would corrupt the skb later on.
>>
>> Therefore, i) add two helpers skb_increase_gso_size() and
>> skb_decrease_gso_size() that would throw a one time warning and
>> bail out for such skbs and ii) refuse and return early with an
>> error in those BPF helpers that are affected. We do need to bail
>> out as early as possible from there before any changes on the
>> skb have been performed.
>>
>> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
>> Co-authored-by: Daniel Borkmann 
>> Signed-off-by: Daniel Axtens 
>> Cc: Marcelo Ricardo Leitner 
>> Acked-by: Alexei Starovoitov 
> 
> I've looked over your changes and they all look good to me.
> 
>> +/* Note: Should be called only if skb_is_gso(skb) is true */
>> +static inline bool skb_is_gso_sctp(const struct sk_buff *skb)
>> +{
>> +return skb_shinfo(skb)->gso_type & SKB_GSO_SCTP;
>> +}
>> +
> 
> This helper is a fantastic idea and I will send a docs update to
> highlight it.

Sounds good. There are in fact several places in the code that
could make use of this right away. If you have a chance, this
could be done in net-next along with the doc update or so.

Thanks,
Daniel


Re: lost interrupts when running sabrelite images (v4.15+) in qemu

2018-03-03 Thread Guenter Roeck

On 03/03/2018 11:07 AM, Troy Kisky wrote:

On 3/3/2018 8:32 AM, Guenter Roeck wrote:

Hi,

since v4.15, I get the following runtime warning when running sabrelite images
in qemu.

irq 65: nobody cared (try booting with the "irqpoll" option)
...
handlers:
[<26292474>] fec_pps_interrupt
Disabling IRQ #65
fec 2188000.ethernet (unnamed net_device) (uninitialized): MDIO read timeout

Bisect points to commit 4ad1ceec05e491 ("net: fec: Let fec_ptp have its
own interrupt routine"). Analysis shows that platform_irq_count()
returns 2, which is reduced to 1 by fec_enet_get_irq_cnt().
If I let fec_enet_get_irq_cnt() return 2, the problem is gone.
Reverting commit 4ad1ceec05e491 also fixes the problem.

Bisect log is attached.



Sounds like you found a bug with qemu. I just booted sabrelite over nfs fine.
My interrupts look like this.


  64:  98767  0  0  0 GIC-0 150 Level 2188000.ethernet
  65:  0  0  0  0 GIC-0 151 Level 2188000.ethernet
___
Irq 65 is only for ptp interrupts now. If qemu is signaling a tx/rx frame
interrupt on 65, then qemu is wrong. Of course, I've never used qemu so feel
free to ignore me if I make no sense.



Thanks for checking with real hardware.

This is what I see (with your patch reverted):

 64:  0 GIC-0 150 Level 2188000.ethernet
 65: 64 GIC-0 151 Level 2188000.ethernet

Looking into the qemu source, I see:

#define FSL_IMX6_ENET_MAC_1588_IRQ 118
#define FSL_IMX6_ENET_MAC_IRQ 119

FSL_IMX6_ENET_MAC_IRQ is then connected to fec interrupt index 0, and
FSL_IMX6_ENET_MAC_1588_IRQ is connected to fec interrupt index 1.

This may suggest that the defines are reversed. I'll see what happens if I
swap them.

Thanks,
Guenter


Re: lost interrupts when running sabrelite images (v4.15+) in qemu

2018-03-03 Thread Troy Kisky
On 3/3/2018 8:32 AM, Guenter Roeck wrote:
> Hi,
> 
> since v4.15, I get the following runtime warning when running sabrelite images
> in qemu.
> 
> irq 65: nobody cared (try booting with the "irqpoll" option)
> ...
> handlers:
> [<26292474>] fec_pps_interrupt
> Disabling IRQ #65
> fec 2188000.ethernet (unnamed net_device) (uninitialized): MDIO read timeout
> 
> Bisect points to commit 4ad1ceec05e491 ("net: fec: Let fec_ptp have its
> own interrupt routine"). Analysis shows that platform_irq_count()
> returns 2, which is reduced to 1 by fec_enet_get_irq_cnt().
> If I let fec_enet_get_irq_cnt() return 2, the problem is gone.
> Reverting commit 4ad1ceec05e491 also fixes the problem.
> 
> Bisect log is attached.
> 

Sounds like you found a bug with qemu. I just booted sabrelite over nfs fine.
My interrupts look like this.


 64:  98767  0  0  0 GIC-0 150 Level 2188000.ethernet
 65:  0  0  0  0 GIC-0 151 Level 2188000.ethernet
___
Irq 65 is only for ptp interrupts now. If qemu is signaling a tx/rx frame
interrupt on 65, then qemu is wrong. Of course, I've never used qemu so feel
free to ignore me if I make no sense.


BR
Troy



Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-03 Thread Alexander Duyck
On Sat, Mar 3, 2018 at 3:31 AM, Jiri Pirko  wrote:
> Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote:
>>On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:
>>> >Yeah, this code essentially calls out the "shareable" code with a
>>> >comment at the start and end of the section what defines the
>>> >virtio_bypass functionality. It would just be a matter of mostly
>>> >cutting and pasting to put it into a separate driver module.
>>>
>>> Please put it there and unite the use of it with netvsc.
>>
>>Surely, adding this to other drivers (e.g. might this be handy for xen
>>too?) can be left for a separate patchset. Let's get one device merged
>>first.
>
> Why? Let's do the generic infra alongside with the driver. I see no good
> reason to rush into merging driver and only later, if ever, to convert
> it to generic solution. On contrary. That would lead into multiple
> approaches and different behavious in multiple drivers. That is plain
> wrong.

If nothing else it doesn't hurt to do this in one driver in a generic
way, and once it has been proven to address all the needs of that one
driver we can then start moving other drivers to it. The current
solution is quite generic, that was my contribution to this patch set
as I didn't like how invasive it was being to virtio and thought it
would be best to keep this as minimally invasive as possible.

My preference would be to give this a release or two in virtio to
mature before we start pushing it onto other drivers. It shouldn't
take much to cut/paste this into a new driver file once we decide it
is time to start extending it out to other drivers.

- Alex


Re: [PATCH iproute2] ss: fix NULL dereference when rendering without header

2018-03-03 Thread Stefano Brivio
On Sat,  3 Mar 2018 16:59:44 +
Jean-Philippe Brucker  wrote:

> When ss is invoked with the no-header flag, if the query doesn't return
> any result, render() is called with 'buffer' uninitialized. This
> currently leads to a segfault. Ensure that buffer is initialized before
> rendering.
> 
> The bug can be triggered with: ss -H sport = 10

Oh dear. Nice catch, thanks for fixing this.

> Signed-off-by: Jean-Philippe Brucker 

Acked-by: Stefano Brivio 

-- 
Stefano


[PATCH iproute2] ss: fix NULL dereference when rendering without header

2018-03-03 Thread Jean-Philippe Brucker
When ss is invoked with the no-header flag, if the query doesn't return
any result, render() is called with 'buffer' uninitialized. This
currently leads to a segfault. Ensure that buffer is initialized before
rendering.

The bug can be triggered with: ss -H sport = 10

Signed-off-by: Jean-Philippe Brucker 
---
 misc/ss.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/misc/ss.c b/misc/ss.c
index e047f9c0..e087bef7 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1197,10 +1197,15 @@ newline:
 /* Render buffered output with spacing and delimiters, then free up buffers */
 static void render(int screen_width)
 {
-   struct buf_token *token = (struct buf_token *)buffer.head->data;
+   struct buf_token *token;
int printed, line_started = 0;
struct column *f;
 
+   if (!buffer.head)
+   return;
+
+   token = (struct buf_token *)buffer.head->data;
+
/* Ensure end alignment of last token, it wasn't necessarily flushed */
buffer.tail->end += buffer.cur->len % 2;
 
-- 
2.16.2
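
The shape of the fix is a plain "check before dereference" guard. A stripped-down userspace sketch, with stand-in types (`chunk`, `out_buffer`) and a stand-in function (`render_sketch`) that are not taken from ss.c, might look like this:

```c
#include <stddef.h>

struct chunk { const char *data; };         /* stand-in for a buffer chunk */
struct out_buffer { struct chunk *head; };  /* stand-in for ss's 'buffer'  */

/* Returns 1 if there was something to render, 0 otherwise. */
static int render_sketch(const struct out_buffer *buf)
{
	const char *token;

	/* Nothing was ever buffered: bail out before touching head->data. */
	if (!buf->head)
		return 0;

	token = buf->head->data;
	return token != NULL;
}
```

The initializer in the original code dereferenced the head pointer at declaration time, which is why the assignment has to move below the check rather than the check simply being added above the loop.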



lost interrupts when running sabrelite images (v4.15+) in qemu

2018-03-03 Thread Guenter Roeck
Hi,

since v4.15, I get the following runtime warning when running sabrelite images
in qemu.

irq 65: nobody cared (try booting with the "irqpoll" option)
...
handlers:
[<26292474>] fec_pps_interrupt
Disabling IRQ #65
fec 2188000.ethernet (unnamed net_device) (uninitialized): MDIO read timeout

Bisect points to commit 4ad1ceec05e491 ("net: fec: Let fec_ptp have its
own interrupt routine"). Analysis shows that platform_irq_count()
returns 2, which is reduced to 1 by fec_enet_get_irq_cnt().
If I let fec_enet_get_irq_cnt() return 2, the problem is gone.
Reverting commit 4ad1ceec05e491 also fixes the problem.

Bisect log is attached.

Guenter


# bad: [d8a5b80568a9cb66810e75b182018e9edb68e8ff] Linux 4.15
# good: [bebc6082da0a9f5d47a1ea2edc099bf671058bd4] Linux 4.14
git bisect start 'v4.15' 'v4.14'
# bad: [5d352e69c60e54b5f04d6e337a1d2bf0dbf3d94a] Merge tag 'media/v4.15-1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
git bisect bad 5d352e69c60e54b5f04d6e337a1d2bf0dbf3d94a
# good: [4e4510fec4af08ead21f6934c1410af1f19a8cad] Merge tag 'sound-4.15-rc1' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
git bisect good 4e4510fec4af08ead21f6934c1410af1f19a8cad
# good: [9fb7bd77d11ab03b4a969279de9f54d8fd6fe988] mlxsw: spectrum_ipip: Split accessor functions
git bisect good 9fb7bd77d11ab03b4a969279de9f54d8fd6fe988
# bad: [22714a2ba4b55737cd7d5299db7aaf1fa8287354] Merge branch 'for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
git bisect bad 22714a2ba4b55737cd7d5299db7aaf1fa8287354
# bad: [f6b3716dcdcd1a4c3fa05ecb6ab0a1e52b6785d0] Merge branch 'net-devname_alloc_cleanups'
git bisect bad f6b3716dcdcd1a4c3fa05ecb6ab0a1e52b6785d0
# bad: [f938daeee95eb36ef6b431bf054a5cc6cdada112] net/mlx5e: CHECKSUM_COMPLETE offload for VLAN/QinQ packets
git bisect bad f938daeee95eb36ef6b431bf054a5cc6cdada112
# good: [6c49b5e26004eef86e7a47093a53be290554351c] Merge branch 'dsa-parsing-stage'
git bisect good 6c49b5e26004eef86e7a47093a53be290554351c
# bad: [ec5c91c6ca8b2d5ca6edfc968dbfeeaae4ed5572] net: dsa: lan9303: Replace msleep(1) with usleep_range()
git bisect bad ec5c91c6ca8b2d5ca6edfc968dbfeeaae4ed5572
# good: [fffcefe967a02997be7a296a4f0766b29dcd1a67] ipv6: addrconf: fix a lockdep splat
git bisect good fffcefe967a02997be7a296a4f0766b29dcd1a67
# bad: [aaf151b9e68101b03ba42d581e8a424bdd0110fe] bpf: Rename tcp_bbf.readme to tcp_bpf.readme
git bisect bad aaf151b9e68101b03ba42d581e8a424bdd0110fe
# bad: [3e29cd0e6563d5fefd59e7225750ee9922f2dad5] xdp: Sample xdp program implementing ip forward
git bisect bad 3e29cd0e6563d5fefd59e7225750ee9922f2dad5
# good: [0cf737808ae7cb25e952be619db46b9147a92f46] hv_netvsc: netvsc_teardown_gpadl() split
git bisect good 0cf737808ae7cb25e952be619db46b9147a92f46
# good: [ca1b17b7e843123f5a1e4c8bd2d7b6596ffe6e93] Merge branch 'hv_netvsc-fix-a-hang-on-channel-mtu-changes'
git bisect good ca1b17b7e843123f5a1e4c8bd2d7b6596ffe6e93
# bad: [4ad1ceec05e49175d0f967cc87628101e79176f6] net: fec: Let fec_ptp have its own interrupt routine
git bisect bad 4ad1ceec05e49175d0f967cc87628101e79176f6
# first bad commit: [4ad1ceec05e49175d0f967cc87628101e79176f6] net: fec: Let fec_ptp have its own interrupt routine


[PATCH v2] net: ethernet: Drop unnecessary continue

2018-03-03 Thread Arushi Singhal
Remove continue statements at the bottom of loops, where they have no effect.
Issue found using the drop_continue.cocci Coccinelle script.

Signed-off-by: Arushi Singhal 
---
Changes in v2:
- Braces are dropped from if statements with a single statement.

 drivers/net/ethernet/amd/ni65.c   | 4 +---
 drivers/net/ethernet/neterion/s2io.c  | 4 +---
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 4 +---
 3 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/amd/ni65.c b/drivers/net/ethernet/amd/ni65.c
index e248d1a..8931ce6 100644
--- a/drivers/net/ethernet/amd/ni65.c
+++ b/drivers/net/ethernet/amd/ni65.c
@@ -435,10 +435,8 @@ static int __init ni65_probe1(struct net_device *dev,int ioaddr)
}
if(cards[i].vendor_id) {
for(j=0;j<3;j++)
-   if(inb(ioaddr+cards[i].addr_offset+j) != cards[i].vendor_id[j]) {
+   if(inb(ioaddr+cards[i].addr_offset+j) != cards[i].vendor_id[j])
release_region(ioaddr, cards[i].total_size);
-   continue;
- }
}
break;
}
diff --git a/drivers/net/ethernet/neterion/s2io.c b/drivers/net/ethernet/neterion/s2io.c
index b8983e7..4738bc7 100644
--- a/drivers/net/ethernet/neterion/s2io.c
+++ b/drivers/net/ethernet/neterion/s2io.c
@@ -3679,11 +3679,9 @@ static void restore_xmsi_data(struct s2io_nic *nic)
writeq(nic->msix_info[i].data, &bar0->xmsi_data);
val64 = (s2BIT(7) | s2BIT(15) | vBIT(msix_index, 26, 6));
writeq(val64, &bar0->xmsi_access);
-   if (wait_for_msix_trans(nic, msix_index)) {
+   if (wait_for_msix_trans(nic, msix_index))
DBG_PRINT(ERR_DBG, "%s: index: %d failed\n",
  __func__, msix_index);
-   continue;
-   }
}
 }
 
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 15fa47f..5cd4f3f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -258,10 +258,8 @@ nfp_net_pf_alloc_vnics(struct nfp_pf *pf, void __iomem *ctrl_bar,
ctrl_bar += NFP_PF_CSR_SLICE_SIZE;
 
/* Kill the vNIC if app init marked it as invalid */
-   if (nn->port && nn->port->type == NFP_PORT_INVALID) {
+   if (nn->port && nn->port->type == NFP_PORT_INVALID)
nfp_net_pf_free_vnic(pf, nn);
-   continue;
-   }
}
 
if (list_empty(&pf->vnics))
-- 
2.7.4
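
To illustrate why such a continue is redundant, here is a small standalone C sketch with two made-up functions that differ only in the trailing continue. They are not taken from any of the three drivers touched by the patch.

```c
/* Version with a continue as the last statement of the loop body. */
static int count_failures_with_continue(const int *status, int n)
{
	int i, failures = 0;

	for (i = 0; i < n; i++) {
		if (status[i] != 0) {
			failures++;
			continue;	/* nothing follows, so this is a no-op */
		}
	}
	return failures;
}

/* Same loop after the cleanup: no continue, no braces needed. */
static int count_failures(const int *status, int n)
{
	int i, failures = 0;

	for (i = 0; i < n; i++)
		if (status[i] != 0)
			failures++;
	return failures;
}
```

Both functions return the same value for any input; the cleanup only changes what a reader has to parse, not what the loop does.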



Re: [PATCH bpf] bpf: fix bpf_skb_adjust_net/bpf_skb_proto_xlat to deal with gso sctp skbs

2018-03-03 Thread Daniel Axtens
Hi Daniel,

> From: Daniel Axtens 
>
> SCTP GSO skbs have a gso_size of GSO_BY_FRAGS, so any sort of
> unconditionally mangling of that will result in nonsense value
> and would corrupt the skb later on.
>
> Therefore, i) add two helpers skb_increase_gso_size() and
> skb_decrease_gso_size() that would throw a one time warning and
> bail out for such skbs and ii) refuse and return early with an
> error in those BPF helpers that are affected. We do need to bail
> out as early as possible from there before any changes on the
> skb have been performed.
>
> Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper")
> Co-authored-by: Daniel Borkmann 
> Signed-off-by: Daniel Axtens 
> Cc: Marcelo Ricardo Leitner 
> Acked-by: Alexei Starovoitov 

I've looked over your changes and they all look good to me.

> +/* Note: Should be called only if skb_is_gso(skb) is true */
> +static inline bool skb_is_gso_sctp(const struct sk_buff *skb)
> +{
> + return skb_shinfo(skb)->gso_type & SKB_GSO_SCTP;
> +}
> +

This helper is a fantastic idea and I will send a docs update to
highlight it.

Regards,
Daniel


[ANNOUNCE] nftables 0.8.3 release

2018-03-03 Thread Florian Westphal
Hi!

The Netfilter project proudly presents:

nftables 0.8.3

This release includes a few fixes since the last release plus the following
enhancements:

- ifname_type, so it's possible to match interface names via sets:

  table inet t {
    set s {
      type ifname
      elements = { "eth0",
                   "eth1" }
    }
    chain c {
      iifname @s accept
      oifname @s accept
    }
  }

- raw payload support to match headers that do not yet have
a more human-readable mnemonic.  This also allows matching
udp and tcp port numbers in a single rule, because the raw
payload expression doesn't enforce a protocol dependency on
the network header.  Example:

 input meta l4proto {tcp, udp} @th,16,16 { dns, http }

 matches both udp and tcp dport 53 and 80 in a single rule.
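
To spell out what @th,16,16 selects, here is a small userspace C sketch that reads 16 bits at bit offset 16 of a transport header, which is where both UDP and TCP keep the destination port. The function names are made up for the example, only byte-aligned offsets are handled, and this is not how nftables implements the expression internally.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Read a 16-bit big-endian field at a byte-aligned bit offset of the
 * transport header. Returns 0 on success, -1 if the offset is not
 * byte-aligned or the header is too short.
 */
static int raw_payload16(const uint8_t *th, size_t th_len,
			 size_t off_bits, uint16_t *out)
{
	size_t off = off_bits / 8;

	if (off_bits % 8 != 0 || off + 2 > th_len)
		return -1;
	*out = (uint16_t)((th[off] << 8) | th[off + 1]);
	return 0;
}

/* Equivalent of "@th,16,16 { dns, http }": dport 53 or 80. */
static int dport_is_dns_or_http(const uint8_t *th, size_t th_len)
{
	uint16_t dport;

	if (raw_payload16(th, th_len, 16, &dport) != 0)
		return 0;
	return dport == 53 || dport == 80;
}
```

Because the field sits at the same offset in both headers, the same check works regardless of whether the layer 4 protocol is tcp or udp, which is why the rule only needs the meta l4proto match in front.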

See ChangeLog that comes attached to this email for more details.

You can download it from:

http://www.netfilter.org/projects/nftables/downloads.html#nftables-0.8.3
ftp://ftp.netfilter.org/pub/nftables/

To build the code, libnftnl 1.0.9 and libmnl >= 1.0.2 are required:

* http://netfilter.org/projects/libnftnl/index.html
* http://netfilter.org/projects/libmnl/index.html

Visit our wikipage for user documentation at:

* http://wiki.nftables.org

For the manpage reference, check man(8) nft.

For bugs and feature requests, file them via:

* https://bugzilla.netfilter.org

Happy firewalling!
Arturo Borrero Gonzalez (4):
  nftables: rearrange files and examples
  examples: add ct helper examples
  files: add load balance example
  meta: introduce datatype ifname_type

Baruch Siach (1):
  src: fix build with older glibc

David Fabian (1):
  Added undefine/redefine keywords

Duncan Roe (1):
  doc/nft.xml: fix typo

Florian Westphal (16):
  tests: enable sets test case 27
  tests: add test case for sets updated from packet path
  payload: don't decode past last valid template
  include: fix build failure
  tests: meta.t: fix test case for anonymous set automerge
  payload: use integer_type when initializing a raw expression
  payload: don't resolve expressions using the inet pseudoheader
  src: make raw payloads work
  doc: document raw protocol expression
  tests: add raw payload test cases.
  doc: mention meta l4proto and ipv6 nexthdr issue wrt. extension headers
  doc: remove ipv6 address FIXME
  doc: add example for rule add/delete
  parser: use nf_key_proto
  src: datatype: prefer sscanf, avoid strncpy
  build: Bump version to v0.8.3

Harsha Sharma (2):
  libnftables: don't crash when no commands are specified
  src: Use snprintf() over strncpy()

Laura Garcia Liebana (1):
  parser: support of maps with timeout

Pablo Neira Ayuso (11):
  src: pass family to payload_dependency_kill()
  payload: add payload_dependency_release() helper function
  src: add payload_dependency_exists()
  src: get rid of __payload_dependency_kill()
  payload: add payload_may_dependency_kill()
  netlink_delinearize: add meta_may_dependency_kill()
  src: bail out when exporting ruleset with unsupported output
  segtree: check for overlapping elements at insertion
  tests: shell: regression test for bugzilla 1228
  configure: misc updates
  netlink: remove non-batching routines

Phil Sutter (10):
  evaluate: Enable automerge feature for anonymous sets
  Review switch statements for unmarked fall through cases
  monitor: Make trace events respect output_fp
  monitor: Make JSON/XML output respect output_fp
  cli: Drop pointless check in cli_append_multiline()
  erec: Avoid passing negative offset to fseek()
  evaluate: Fix memleak in stmt_reject_gen_dependency()
  hash: Fix potential null-pointer dereference in hash_expr_cmp()
  netlink: Complain if setting O_NONBLOCK fails
  netlink_delinearize: Fix resource leaks

Ville Skyttä (2):
  configure: Make missing docbook2man an error if man build requested
  src: Spelling fixes



Re: [Outreachy kernel] [PATCH] net: ethernet: Drop unnecessary continue

2018-03-03 Thread Julia Lawall


On Sat, 3 Mar 2018, Arushi Singhal wrote:

> Continue statements at the bottom of a loop are removed.
> Issue found using drop_continue.cocci Coccinelle script.

In each case you leave an if with a single statement in the branch.  In
that case the { } should be dropped too.
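
As a minimal sketch of the resulting shape (entry_failed() and
count_failed() are made-up names for illustration, not taken from any of
the three drivers): once the trailing continue is gone, the if body is a
single statement and is written without braces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-entry check such as the ones in the patch. */
static int entry_failed(int status)
{
	return status != 0;
}

/* Counts failed entries; the if has a single-statement body, so no braces. */
static int count_failed(const int *status, size_t n)
{
	int failed = 0;
	size_t i;

	assert(status != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (entry_failed(status[i]))
			failed++;
	}
	return failed;
}
```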

julia

>
> Signed-off-by: Arushi Singhal 
> ---
>  drivers/net/ethernet/amd/ni65.c   | 1 -
>  drivers/net/ethernet/neterion/s2io.c  | 1 -
>  drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 1 -
>  3 files changed, 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/amd/ni65.c b/drivers/net/ethernet/amd/ni65.c
> index e248d1a..5975f29 100644
> --- a/drivers/net/ethernet/amd/ni65.c
> +++ b/drivers/net/ethernet/amd/ni65.c
> @@ -437,7 +437,6 @@ static int __init ni65_probe1(struct net_device *dev,int ioaddr)
>  			for(j=0;j<3;j++)
>  				if(inb(ioaddr+cards[i].addr_offset+j) != cards[i].vendor_id[j]) {
>  					release_region(ioaddr, cards[i].total_size);
> -					continue;
>  				}
>  		}
>  		break;
> diff --git a/drivers/net/ethernet/neterion/s2io.c b/drivers/net/ethernet/neterion/s2io.c
> index b8983e7..5123abd 100644
> --- a/drivers/net/ethernet/neterion/s2io.c
> +++ b/drivers/net/ethernet/neterion/s2io.c
> @@ -3682,7 +3682,6 @@ static void restore_xmsi_data(struct s2io_nic *nic)
>  		if (wait_for_msix_trans(nic, msix_index)) {
>  			DBG_PRINT(ERR_DBG, "%s: index: %d failed\n",
>  				  __func__, msix_index);
> -			continue;
>  		}
>  	}
>  }
> diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
> index 15fa47f..77916ed 100644
> --- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
> +++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
> @@ -260,7 +260,6 @@ nfp_net_pf_alloc_vnics(struct nfp_pf *pf, void __iomem *ctrl_bar,
>  		/* Kill the vNIC if app init marked it as invalid */
>  		if (nn->port && nn->port->type == NFP_PORT_INVALID) {
>  			nfp_net_pf_free_vnic(pf, nn);
> -			continue;
>  		}
>  	}
>
> --
> 2.7.4
>


[PATCH] net: ethernet: Drop unnecessary continue

2018-03-03 Thread Arushi Singhal
Continue statements at the bottom of a loop are removed.
Issue found using drop_continue.cocci Coccinelle script.

Signed-off-by: Arushi Singhal 
---
 drivers/net/ethernet/amd/ni65.c   | 1 -
 drivers/net/ethernet/neterion/s2io.c  | 1 -
 drivers/net/ethernet/netronome/nfp/nfp_net_main.c | 1 -
 3 files changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/amd/ni65.c b/drivers/net/ethernet/amd/ni65.c
index e248d1a..5975f29 100644
--- a/drivers/net/ethernet/amd/ni65.c
+++ b/drivers/net/ethernet/amd/ni65.c
@@ -437,7 +437,6 @@ static int __init ni65_probe1(struct net_device *dev,int ioaddr)
 			for(j=0;j<3;j++)
 				if(inb(ioaddr+cards[i].addr_offset+j) != cards[i].vendor_id[j]) {
 					release_region(ioaddr, cards[i].total_size);
-					continue;
 				}
 		}
 		break;
diff --git a/drivers/net/ethernet/neterion/s2io.c b/drivers/net/ethernet/neterion/s2io.c
index b8983e7..5123abd 100644
--- a/drivers/net/ethernet/neterion/s2io.c
+++ b/drivers/net/ethernet/neterion/s2io.c
@@ -3682,7 +3682,6 @@ static void restore_xmsi_data(struct s2io_nic *nic)
 		if (wait_for_msix_trans(nic, msix_index)) {
 			DBG_PRINT(ERR_DBG, "%s: index: %d failed\n",
 				  __func__, msix_index);
-			continue;
 		}
 	}
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
index 15fa47f..77916ed 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_main.c
@@ -260,7 +260,6 @@ nfp_net_pf_alloc_vnics(struct nfp_pf *pf, void __iomem *ctrl_bar,
 		/* Kill the vNIC if app init marked it as invalid */
 		if (nn->port && nn->port->type == NFP_PORT_INVALID) {
 			nfp_net_pf_free_vnic(pf, nn);
-			continue;
 		}
 	}
 
-- 
2.7.4



Re: [PATCH net] sch_netem: fix skb leak in netem_enqueue()

2018-03-03 Thread Neil Horman
On Fri, Mar 02, 2018 at 09:16:48PM +0300, Alexey Kodanev wrote:
> When we exceed current packets limit and have more than one
> segment in the list returned by skb_gso_segment(), netem drops
> only the first one, skipping the rest, hence kmemleak reports:
> 
> unreferenced object 0x880b5d23b600 (size 1024):
>   comm "softirq", pid 0, jiffies 4384527763 (age 2770.629s)
>   hex dump (first 32 bytes):
> 00 80 23 5d 0b 88 ff ff 00 00 00 00 00 00 00 00  ..#]
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
>   backtrace:
> [] __alloc_skb+0xc9/0x520
> [<1709b32f>] skb_segment+0x8c8/0x3710
> [] tcp_gso_segment+0x331/0x1830
> [] inet_gso_segment+0x476/0x1370
> [<8b762dd4>] skb_mac_gso_segment+0x1f9/0x510
> [<2182660a>] __skb_gso_segment+0x1dd/0x620
> [<412651b9>] netem_enqueue+0x1536/0x2590 [sch_netem]
> [<05d3b2a9>] __dev_queue_xmit+0x1167/0x2120
> [] ip_finish_output2+0x998/0xf00
> [] ip_output+0x1aa/0x2c0
> [<7ecbd3a4>] tcp_transmit_skb+0x18db/0x3670
> [<42d2a45f>] tcp_write_xmit+0x4d4/0x58c0
> [<56a44199>] tcp_tasklet_func+0x3d9/0x540
> [<13d06d02>] tasklet_action+0x1ca/0x250
> [] __do_softirq+0x1b4/0x5a3
> [] irq_exit+0x1e2/0x210
> 
> Fix it by adding the rest of the segments, if any, to skb
> 'to_free' list in that case.
> 
> Fixes: 6071bd1aa13e ("netem: Segment GSO packets on enqueue")
> Signed-off-by: Alexey Kodanev 
> ---
>  net/sched/sch_netem.c | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
> index 7c179ad..a5023a2 100644
> --- a/net/sched/sch_netem.c
> +++ b/net/sched/sch_netem.c
> @@ -508,8 +508,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
>  			1<<(prandom_u32() % 8);
>  	}
>  
> -	if (unlikely(sch->q.qlen >= sch->limit))
> +	if (unlikely(sch->q.qlen >= sch->limit)) {
> +		while (segs) {
> +			skb2 = segs->next;
> +			__qdisc_drop(segs, to_free);
> +			segs = skb2;
> +		}
>  		return qdisc_drop(skb, sch, to_free);
> +	}
>  
It seems like it might be nice to wrap this drop loop up in a
qdisc_drop_all() inline function.  Then we can easily drop segments in
other locations if we should need to.
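
A rough userspace sketch of such a helper, to show the shape only.
struct pkt, pkt_drop() and pkt_drop_all() are hypothetical stand-ins
for struct sk_buff, __qdisc_drop() and the proposed helper; the real
one would also need to do the qdisc drop accounting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sk_buff: only the list link is modelled. */
struct pkt {
	struct pkt *next;
};

/* Same idea as __qdisc_drop(): push one packet onto the to_free list. */
static void pkt_drop(struct pkt *p, struct pkt **to_free)
{
	p->next = *to_free;
	*to_free = p;
}

/* Drops every segment of a chain and returns how many were dropped. */
static size_t pkt_drop_all(struct pkt *segs, struct pkt **to_free)
{
	size_t dropped = 0;

	assert(to_free != NULL);

	while (segs) {
		/* Save the link first: pkt_drop() overwrites segs->next. */
		struct pkt *next = segs->next;

		pkt_drop(segs, to_free);
		segs = next;
		dropped++;
	}
	return dropped;
}
```

Saving ->next before the drop is the same ordering the loop in the
patch uses with skb2.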

Regards
Neil
 
>   qdisc_qstats_backlog_inc(sch, skb);
>  
> -- 
> 1.8.3.1
> 
> 


Re: [PATCH v4 2/2] virtio_net: Extend virtio to use VF datapath when available

2018-03-03 Thread Jiri Pirko
Fri, Mar 02, 2018 at 08:42:47PM CET, m...@redhat.com wrote:
>On Fri, Mar 02, 2018 at 05:20:17PM +0100, Jiri Pirko wrote:
>> >Yeah, this code essentially calls out the "shareable" code with a
>> >comment at the start and end of the section what defines the
>> >virtio_bypass functionality. It would just be a matter of mostly
>> >cutting and pasting to put it into a separate driver module.
>> 
>> Please put it there and unite the use of it with netvsc.
>
>Surely, adding this to other drivers (e.g. might this be handy for xen
>too?) can be left for a separate patchset. Let's get one device merged
>first.

Why? Let's do the generic infra alongside the driver. I see no good
reason to rush into merging the driver and only later, if ever, convert
it to a generic solution. On the contrary. That would lead to multiple
approaches and different behaviours in multiple drivers. That is plain
wrong.


Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes

2018-03-03 Thread Stefano Brivio
On Fri, 2 Mar 2018 15:39:03 -0700
David Ahern  wrote:

> On 3/2/18 8:36 AM, Stefano Brivio wrote:
> > Currently, administrative MTU changes on a given netdevice are
> > not reflected on route exceptions for MTU-less routes, with a
> > set PMTU value, for that device:
> > 
> >  # ip -6 route get 3000::b
> >  3000::b from :: dev vti_a proto kernel src 3000::a metric 256 pref medium
> >  # ping6 -c 1 -q -s1 3000::b > /dev/null
> >  # ip netns exec a ip -6 route get 3000::b
> >  3000::b from :: dev vti_a src 3000::a metric 0
> >  cache expires 571sec mtu 4926 pref medium
> >  # ip link set dev vti_a mtu 3000
> >  # ip -6 route get 3000::b
> >  3000::b from :: dev vti_a src 3000::a metric 0
> >  cache expires 571sec mtu 4926 pref medium
> >  # ip link set dev vti_a mtu 9000
> >  # ip -6 route get 3000::b
> >  3000::b from :: dev vti_a src 3000::a metric 0
> >  cache expires 571sec mtu 4926 pref medium  
> 
> Addresses in the 2001:db8: range should be used for commit messages.

Thanks for pointing this out. Until now I had never connected the
"documentation purposes" of RFC 3849 with commit messages, but in the
end a commit message is nothing other than documentation. I will post
a v2 with an updated commit message.

> And please codify the above expectation as a test under
> tools/testing/selftests/net

And this, along with v2.

-- 
Stefano


Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for MTU-less routes

2018-03-03 Thread Stefano Brivio
Hi Maciej,

On Fri, 2 Mar 2018 10:54:36 -0800
Maciej Żenczykowski  wrote:

> I spend a significant fraction of my time making sure we never rely on PMTUD.

Thanks for your comments.

I see your point, but here we are not blindly relying on PMTUD. We are
reflecting an administrative MTU change on the PMTU, and making the
behaviour consistent between regular routes and exceptions, which is
nothing other than a bug fix.

This behaviour reflects RFC 8201, section 3:

The basic idea is that a source node initially assumes that
the PMTU of a path is the (known) MTU of the first hop in the path.

and the need for it is clearly explained by the existing comment in
rt6_mtu_change_route():

/* For administrative MTU increase, there is no way to discover
   IPv6 PMTU increase, so PMTU increase should be updated here.
   Since RFC 1981 doesn't include administrative MTU increase
   update PMTU increase is a MUST. (i.e. jumbo frame)
 */

Leaving that aside for a moment, a PMTU increase due to my fix is only
possible if the old local MTU (administratively set) was the lowest in
the path, no PMTUD happened in the meantime (but we have an exception
route in place, e.g. due to a tunnel calling skb_dst_update_pmtu()), and
we get a subsequent administrative change of the local MTU.

Relying on some old value set by the user is simply a bug, and breaks
the natural user assumption that increasing the MTU will have an
effect, if PMTU is not otherwise constrained.

If PMTUD is not working, we will rely on the MTU values set by the
user. This looks like the only sane thing to do.
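
To make the rule above concrete, here is a minimal sketch of the
decision as a standalone function. This is not the code from the patch:
pmtu_after_mtu_change() is a made-up name, and the IPv6 minimum MTU and
locked PMTU values are deliberately left out.

```c
#include <assert.h>

/*
 * Hypothetical helper, not kernel code: PMTU an exception route should
 * carry after the device MTU changes from old_mtu to new_mtu.
 */
static unsigned int pmtu_after_mtu_change(unsigned int pmtu,
					  unsigned int old_mtu,
					  unsigned int new_mtu)
{
	/* A path MTU can never exceed the MTU of the first hop. */
	assert(pmtu <= old_mtu);

	/* A lower device MTU always caps the path MTU. */
	if (new_mtu < pmtu)
		return new_mtu;

	/*
	 * An increase is only propagated if the old local MTU was the
	 * limiting value, i.e. nothing else on the path constrained it.
	 */
	if (pmtu == old_mtu)
		return new_mtu;

	/* PMTUD (or a tunnel) learned a lower value: keep it. */
	return pmtu;
}
```

With the values from the commit message, lowering the device MTU to 3000
turns a PMTU of 4926 into 3000, and a later increase to 9000 is
reflected because 3000 was the local limit.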

> Debugging MTU related blackholes is a constant bane of my existence.
> 
> [btw. we're considering adding a hack to always fragment UDP to
> min(1280, dev/route/path mtu)...]
> 
> Basically: lower is always better because it's more likely to work...

This is not directly related to my fix, but I wonder if we shouldn't,
in general, simply comply with RFCs, and provide ways out in case the
network is broken, instead of breaking expected behaviours by default,
or making things work "by mistake". The way out, here, is as simple as
setting 1280 as MTU for the local interface.

Somebody might say higher is better because you avoid fragmentation. So
I would just keep the implementation compliant (and, perhaps more
importantly, consistent).

-- 
Stefano


Re: [RFC 0/2] kernel: add support to collect hardware logs in panic

2018-03-03 Thread Rahul Lakkireddy
On Friday, March 03/02/18, 2018 at 18:52:45 +0530, Eric W. Biederman wrote:
> Rahul Lakkireddy  writes:
> 
> > On production servers running a variety of workloads over time, a
> > kernel panic can happen sporadically after days or even months. It is
> > important to collect as many debug logs as possible to root-cause
> > and fix a problem that may not be easy to reproduce. A snapshot of
> > the underlying hardware/firmware state (register dump, firmware
> > logs, adapter memory, etc.) at the time of the kernel panic will be
> > very helpful while debugging the culprit device driver.
> >
> > This patch series adds a new generic framework that enables device
> > drivers to collect a device-specific snapshot of the hardware/firmware
> > state of the underlying device at the time of a kernel panic. The
> > collected logs are appended to the vmcore along with details, such as
> > the start address and length of the logs, which are required for
> > extraction during post-analysis.
> >
> > Device drivers can use crash_driver_dump_register() to register their
> > callback that collects underlying device specific hardware/firmware
> > logs during kernel panic (i.e. before booting into the second kernel).
> > Drivers can unregister with crash_driver_dump_unregister().
> >
> > To extract the device specific hardware/firmware logs using crash:
> >
> > crash> help -D | grep DRIVERDUMP
> > DRIVERDUMP=(cxgb4_:02:00.4, b131090bd000, 37782968)
> >
> > crash> rd b131090bd000 37782968 -r hardware.log
> > 37782968 bytes copied from 0xb131090bd000 to hardware.log
> >
> > Patch 1 adds API to allow drivers to register callback to
> > collect the device specific hardware/firmware logs.
> >
> > Patch 2 shows a cxgb4 driver example using the API to collect
> > hardware/firmware logs during kernel panic.
> >
> > Suggestions and feedback will be much appreciated.
> 
> I strongly suggest you figure out how to run this code in the
> crash recovery kernel before your hardware is initialized.
> That will give you a known good kernel to perform your collection from.
> 
> Every line of code we add to the kexec on panic code path tends to add
> to its fragility and increase the chance you won't get any information
> at all.
> 
> When the assumption is it is something wrong with your driver/hardware
> that caused the crash, calling into your driver is a very bad idea.
> Especially running code that does callbacks and all kinds of other cute
> things.
> 
> Doing this as the crash recovery kernel boots up before much if any
> hardware is initialized seems like a fine thing to do, and just
> needs a little coordination with userspace to ensure the information
> gets saved when a vmcore is computed.
> 

Thanks for the feedback and suggestions. I will work on achieving
this from the crash recovery kernel.

Thanks,
Rahul


Re: ppp/pppoe, still panic 4.15.3 in ppp_push

2018-03-03 Thread Denys Fedoryshchenko

On 2018-03-02 19:43, Guillaume Nault wrote:

On Thu, Mar 01, 2018 at 10:07:05PM +0200, Denys Fedoryshchenko wrote:

On 2018-03-01 22:01, Guillaume Nault wrote:
> diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
> index 255a5def56e9..2acf4b0eabd1 100644
> --- a/drivers/net/ppp/ppp_generic.c
> +++ b/drivers/net/ppp/ppp_generic.c
> @@ -3161,6 +3161,15 @@ ppp_connect_channel(struct channel *pch, int unit)
>  		goto outl;
>
>  	ppp_lock(ppp);
> +	spin_lock_bh(&pch->downl);
> +	if (!pch->chan) {
> +		/* Don't connect unregistered channels */
> +		ppp_unlock(ppp);
> +		spin_unlock_bh(&pch->downl);


This is obviously wrong. It should have been
+		spin_unlock_bh(&pch->downl);
+		ppp_unlock(ppp);

Sorry, I shouldn't have hurried.
This is fixed in the official version.
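
For reference, a small userspace sketch of the ordering with pthread
mutexes: locks are released in the reverse order they were taken, on the
error path as well. struct unit, struct chan and connect_channel() are
simplified, hypothetical stand-ins for the PPP structures, and the
mutexes only approximate the spinlocks used in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ppp. */
struct unit {
	pthread_mutex_t lock;	/* plays the role of ppp_lock()/ppp_unlock() */
};

/* Hypothetical stand-in for struct channel. */
struct chan {
	pthread_mutex_t downl;	/* plays the role of pch->downl */
	void *chan;		/* NULL once the channel is unregistered */
};

/* Returns 0 on success, -ENOTCONN if the channel is already unregistered. */
static int connect_channel(struct unit *u, struct chan *c)
{
	assert(u != NULL && c != NULL);

	pthread_mutex_lock(&u->lock);		/* outer lock first */
	pthread_mutex_lock(&c->downl);		/* inner lock second */
	if (!c->chan) {
		/* Error path: inner lock released first, outer lock last. */
		pthread_mutex_unlock(&c->downl);
		pthread_mutex_unlock(&u->lock);
		return -ENOTCONN;
	}
	pthread_mutex_unlock(&c->downl);

	/* The real code would attach the channel to the unit here. */

	pthread_mutex_unlock(&u->lock);
	return 0;
}
```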


> +		ret = -ENOTCONN;
> +		goto outl;
> +	}
> +	spin_unlock_bh(&pch->downl);
>  	if (pch->file.hdrlen > ppp->file.hdrlen)
>  		ppp->file.hdrlen = pch->file.hdrlen;
>  	hdrlen = pch->file.hdrlen + 2;	/* for protocol bytes */
Ok, I will try to test that tonight.
Thanks a lot! For me the problem is also solved anyway by removing
unit-cache; I just think it's nice to have the bug fixed :)

I think this bug has been there forever; indeed it's good to have it
fixed.

Thanks a lot for your help (and patience!).

FYI, if you see accel-ppp logs like
"ioctl(PPPIOCCONNECT): Transport endpoint is not connected", then that
means the patch prevented the scenario that was leading to the original
crash.

Out of curiosity, did unit-cache really bring performance improvements
on your workload?
On old kernels it definitely did. Due to local specifics (electricity
outages), I might have a few thousand interfaces deleted and created
again in a short period of time.
And in the past, interface creation/deletion (especially with thousands
of them) was very expensive.


Re: [RFC PATCH V1 01/12] audit: add container id

2018-03-03 Thread Serge E. Hallyn
On Thu, Mar 01, 2018 at 02:41:04PM -0500, Richard Guy Briggs wrote:
...
> +static inline bool audit_containerid_set(struct task_struct *tsk)

Hi Richard,

the calls to audit_containerid_set() confused me.  Could you make it
is_audit_containerid_set() or audit_containerid_isset()?

> +{
> + return audit_get_containerid(tsk) != INVALID_CID;
> +}




Re: [PATCH net-next 1/2] mac80211_hwsim: Make hwsim_netgroup IDA

2018-03-03 Thread Benjamin Beichler


On 2 March 2018 12:37:25 CET, Kirill Tkhai wrote:
>destroy_radio() may be executed in parallel with everything you wrote
>above, can't it? There may be several network namespaces, and
>destroy_radio() queued from one net namespace may race with
>mac80211_hwsim_new_radio() or hwsim_del_radio_nl() for another net
>namespace. I don't see how netlink locking can provide synchronization
>with a work item. This is what I meant.
>
I see, you are right. Nonetheless, this value is fairly uncritical: its
only user (the netlink dump) just checks whether it changes within a
dump. Even if there were race conditions, e.g. some generations being
skipped because of parallel writes, the dump interrupted flag would
still be set, and a user space program that needs exact results knows
it has to dump again. I'm unsure about things like caching of this
variable. Maybe it needs to be volatile to always work as expected.
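
A minimal userspace sketch of the scheme, using C11 atomics instead of
volatile so that the read cannot be cached across the check. All names
here (radio_generation, struct dump_state and the three functions) are
made up for illustration. In the kernel the counterpart would presumably
be READ_ONCE()/WRITE_ONCE() or an atomic_t, with the mismatch reported
to user space via NLM_F_DUMP_INTR.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical global generation, bumped on every radio add/remove. */
static atomic_uint radio_generation;

/* Writer side: may run concurrently from several namespaces or a work item. */
static void radio_list_changed(void)
{
	atomic_fetch_add_explicit(&radio_generation, 1, memory_order_relaxed);
}

/* Per-dump state kept by the reader. */
struct dump_state {
	unsigned int start_gen;	/* generation seen when the dump started */
	bool interrupted;	/* set once the list changed under the dump */
};

static void dump_begin(struct dump_state *st)
{
	assert(st != NULL);
	st->start_gen = atomic_load_explicit(&radio_generation,
					     memory_order_relaxed);
	st->interrupted = false;
}

/* Called for each chunk of the dump; any change marks it as interrupted. */
static void dump_check(struct dump_state *st)
{
	assert(st != NULL);
	if (atomic_load_explicit(&radio_generation,
				 memory_order_relaxed) != st->start_gen)
		st->interrupted = true;
}
```

Skipped generations do not matter for this check: the reader only
compares for inequality, so any number of concurrent increments still
marks the dump as interrupted.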

Unfortunately, the code currently also triggers a dump interruption when
the interfaces of the current namespace didn't change, but I think that
is acceptable. Otherwise we would need a per-namespace generation, and I
think all this happens really rarely, so it's not worth the effort.


>Thanks,
>Kirill

-- 
M.Sc. Benjamin Beichler

Universität Rostock, Fakultät für Informatik und Elektrotechnik
Institut für Angewandte Mikroelektronik und Datentechnik

University of Rostock, Department of CS and EE
Institute of Applied Microelectronics and CE

Richard-Wagner-Straße 31
18119 Rostock
Deutschland/Germany

phone: +49 (0) 381 498 - 7278
email: benjamin.beich...@uni-rostock.de
www: http://www.imd.uni-rostock.de/