Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sun, Jan 07, 2018 at 09:23:47PM -0500, David Miller wrote:
> From: Thomas Gleixner
> Date: Sun, 7 Jan 2018 21:56:39 +0100 (CET)
>
> > I surely agree, but we have gone the way of PTI without the ability of
> > exempting individual processes exactly for one reason:
> >
> >     Lack of time
> >
> > It can be done on top of the PTI implementation and it won't take ages.
> >
> > For spectre_v1/2 we face the same problem simply because we got informed so
> > much ahead of time and we were all twiddling thumbs, enjoying our christmas
> > vacation and having a good time.
>
> I just want to point out that this should be noted in history as a
> case where all of this controlled disclosure stuff seems to have made
> things worse rather than better.

I will note that the "controlled disclosure" for this thing was a total
and complete mess, and unlike any that I have ever seen in the past.
The people involved in running it had no idea how to do it at all, and
because of that, it failed miserably, despite being warned about it
numerous times by numerous people.

> Why is there so much haste and paranoia if supposedly some group of
> people had all this extra time to think about and deal with this bug?

Because that group was so small and isolated that they did not actually
talk to anyone who could actually provide input to help deal with the
bug.

So we are stuck now with dealing with this "properly", which is fine,
but please don't think that this is an excuse to blame "controlled
disclosure". We know how to do that correctly; it did not happen in
this case at all because the people driving the problem refused to do
it.

> Think I'm nuts? Ok, then how did we fare any better by keeping this
> junk under wraps for weeks if not months? (seriously, did responsible
> people really know about this as far back as... June 2017?)
Some "people" did, just not some "responsible people" :) Oh well, time for the kernel to fix hardware bugs again, that's what we are here for, you would think we would be used to it by now... greg k-h
Re: [PATCH net-next] l2tp: adjust comments about L2TPv3 offsets
On 05/01/18 18:47, Guillaume Nault wrote:
> The "offset" option has been removed by
> commit 900631ee6a26 ("l2tp: remove configurable payload offset").
>
> Signed-off-by: Guillaume Nault
> ---
>  include/uapi/linux/l2tp.h | 2 +-
>  net/l2tp/l2tp_core.c      | 7 +++----
>  2 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/include/uapi/linux/l2tp.h b/include/uapi/linux/l2tp.h
> index f78eef4cc56a..71e62795104d 100644
> --- a/include/uapi/linux/l2tp.h
> +++ b/include/uapi/linux/l2tp.h
> @@ -65,7 +65,7 @@ struct sockaddr_l2tpip6 {
>   * TUNNEL_MODIFY   - CONN_ID, udpcsum
>   * TUNNEL_GETSTATS - CONN_ID, (stats)
>   * TUNNEL_GET      - CONN_ID, (...)
> - * SESSION_CREATE  - SESSION_ID, PW_TYPE, offset, data_seq, cookie, peer_cookie, offset, l2spec
> + * SESSION_CREATE  - SESSION_ID, PW_TYPE, data_seq, cookie, peer_cookie, l2spec
>   * SESSION_DELETE  - SESSION_ID
>   * SESSION_MODIFY  - SESSION_ID, data_seq
>   * SESSION_GET     - SESSION_ID, (...)
> diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
> index 786cd7f6a5e8..62285fc6eb59 100644
> --- a/net/l2tp/l2tp_core.c
> +++ b/net/l2tp/l2tp_core.c
> @@ -662,10 +662,9 @@ static int l2tp_recv_data_seq(struct l2tp_session *session, struct sk_buff *skb)
>   * |x|S|x|x|x|x|x|x|          Sequence Number                      |
>   * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>   *
> - * Cookie value, sublayer format and offset (pad) are negotiated with
> - * the peer when the session is set up. Unlike L2TPv2, we do not need
> - * to parse the packet header to determine if optional fields are
> - * present.
> + * Cookie value and sublayer format are negotiated with the peer when
> + * the session is set up. Unlike L2TPv2, we do not need to parse the
> + * packet header to determine if optional fields are present.
>   *
>   * Caller must already have parsed the frame and determined that it is
>   * a data (not control) frame before coming here. Fields up to the

Acked-by: James Chapman
Re: [PATCH] atm/clip: Use seq_puts() in svc_addr()
>> @@ -708,11 +708,11 @@ static void svc_addr(struct seq_file *seq, struct sockaddr_atmsvc *addr)
>>      static int e164[] = { 1, 8, 4, 6, 1, 0 };
>>
>>      if (*addr->sas_addr.pub) {
>> -            seq_printf(seq, "%s", addr->sas_addr.pub);
>> +            seq_puts(seq, addr->sas_addr.pub);
>
> Which opens a lot of security concerns.

How? - The passed string is just copied into a buffer in the end, isn't it?

> Never do this again.

Why do you dislike such a small source code transformation at the moment?

> P.S. I'm wondering what would be first,

I am curious how the communication difficulties can be resolved.

> Markus starts looking into the actual code,

I inspected the original source code to some degree.

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/tree/fs/seq_file.c?id=895c0dde398510a5b5ded60e5064c11b94bd30ca#n682
https://elixir.free-electrons.com/linux/v4.15-rc6/source/fs/seq_file.c#L660

> or most (all) of the maintainers just ban him.

The change acceptance varies for various reasons among the involved
contributors.

Regards,
Markus
[PATCH] docs-rst: networking: wire up msg_zerocopy
Fix the following 'make htmldocs' complaint:

Documentation/networking/msg_zerocopy.rst:: WARNING: document isn't included in any toctree.

Signed-off-by: Mike Rapoport
---
 Documentation/networking/index.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 66e620866245..7d4b15977d61 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -9,6 +9,7 @@ Contents:
    batman-adv
    kapi
    z8530book
+   msg_zerocopy

 .. only::  subproject

@@ -16,4 +17,3 @@ Contents:
    ===

    * :ref:`genindex`
-
-- 
2.7.4
RE: WARNING: CPU: 0 PID: 0 at ./include/linux/netfilter.h:233 arp_rcv
From: Marco Franchi
Sent: Friday, January 05, 2018 11:03 PM
>Hi,
>
>I am getting the following warning on a imx6ul-evk board running linux-next
>20180105:
>
>[9.233290] [ cut here ]
>[9.242068] WARNING: CPU: 0 PID: 0 at ./include/linux/netfilter.h:233 arp_rcv+0x1f8/0x228
>[9.250381] Modules linked in:
>[9.253633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc6-next-20180104-dirty #2
>[9.261764] Hardware name: Freescale i.MX6 Ultralite (Device Tree)
>[9.268065] Backtrace:
>[9.270719] [] (dump_backtrace) from [] (show_stack+0x18/0x1c)
>[9.278438] r7: r6:6153 r5: r4:c1079878
>[9.284258] [] (show_stack) from [] (dump_stack+0xb4/0xe8)
>[9.291641] [] (dump_stack) from [] (__warn+0xf0/0x11c)
>[9.298763] r9:d8053000 r8:00e9 r7:0009 r6:c0da3bdc r5: r4:
>[9.306662] [] (__warn) from [] (warn_slowpath_null+0x44/0x50)
>[9.314383] r8:d8053000 r7:0608 r6:c08873f8 r5:00e9 r4:c0da3bdc
>[9.321243] [] (warn_slowpath_null) from [] (arp_rcv+0x1f8/0x228)
>[9.329215] r6:c0887200 r5:c107ac58 r4:d899b240
>[9.333999] [] (arp_rcv) from [] (__netif_receive_skb_core+0x878/0xbd4)
>[9.342491] r6:c0887200 r5:c100ac8c r4:d899b240
>[9.347265] [] (__netif_receive_skb_core) from [] (__netif_receive_skb+0x2c/0x8c)
>[9.356643] r10:0080 r9:d8053000 r8:e0a26000 r7:c107ab0d r6:c1008908 r5:d899b240
>[9.364598] r4:c10099c4
>[9.367288] [] (__netif_receive_skb) from [] (netif_receive_skb_internal+0x7c/0x354)
>[9.376904] r5:d899b240 r4:c10099c4
>[9.380635] [] (netif_receive_skb_internal) from [] (napi_gro_receive+0x88/0xa4)
>[9.389918] r8:e0a26000 r7:0001 r6:d899b240 r5:d899b240 r4:0003
>[9.396784] [] (napi_gro_receive) from [] (fec_enet_rx_napi+0x3a8/0x9b8)
>[9.405357] r5:d8054000 r4:
>[9.409093] [] (fec_enet_rx_napi) from [] (net_rx_action+0x220/0x334)
>[9.417431] r10:dbbdfa00 r9:c1001d94 r8:0040 r7:012c r6:8e6b r5:0001
>[9.425384] r4:d8053710
>[9.428075] [] (net_rx_action) from [] (__do_softirq+0x128/0x2a0)
>[9.436063] r10:4003 r9:c1003080 r8:0100 r7:c100308c r6:c100 r5:0003
>[9.444018] r4:
>[9.446708] [] (__do_softirq) from [] (irq_exit+0x14c/0x1a8)
>[9.454262] r10:e080a000 r9:d8004400 r8:0001 r7: r6:c1008a98 r5:
>[9.462218] r4:e000
>[9.464911] [] (irq_exit) from [] (__handle_domain_irq+0x74/0xe8)
>[9.472879] r5: r4:c0f7cc24
>[9.476614] [] (__handle_domain_irq) from [] (gic_handle_irq+0x64/0xc4)
>[9.485121] r9:c1028344 r8:c1001e98 r7: r6:03ff r5:03eb r4:e080a00c
>[9.493016] [] (gic_handle_irq) from [] (__irq_svc+0x70/0x98)
>[9.500632] Exception stack(0xc1001e98 to 0xc1001ee0)
>[9.505822] 1e80: 0001 0001
>[9.514157] 1ea0: c100bf80 25963796 0002 0002 dbbde5c8 26545294 0002
>[9.522491] 1ec0: c1001f14 c1001eb8 c1001ee8 c01724fc c0724464 2153
>[9.530824] r10: r9:c100 r8:26545294 r7:c1001ecc r6: r5:2153
>[9.538778] r4:c0724464
>[9.541470] [] (cpuidle_enter_state) from [] (cpuidle_enter+0x1c/0x20)
>[9.549893] r10:c100f6a8 r9:dbbde5c8 r8:c0f7c5c0 r7:c1008978 r6:0001 r5:c100892c
>[9.557857] r4:c100 r3:dbbde5c8
>[9.561589] [] (cpuidle_enter) from [] (call_cpuidle+0x28/0x44)
>[9.569399] [] (call_cpuidle) from [] (do_idle+0x1bc/0x230)
>[9.576861] [] (do_idle) from [] (cpu_startup_entry+0x20/0x24)
>[9.584587] r10:c0f63a50 r9:c1008908 r8:c107b480 r7:c1008900 r6:c107b480 r5:0002
>[9.592546] r4:00c4 r3:c0f75354
>[9.596283] [] (cpu_startup_entry) from [] (rest_init+0x210/0x25c)
>[9.604372] [] (rest_init) from [] (start_kernel+0x390/0x418)
>[9.611998] r5: r4:c107b4cc
>[9.615723] [] (start_kernel) from [<>] ( (null))
>[9.622235] r10:10c5387d r9:410fc075 r8:8300 r7: r6:10c0387d r5:0051
>[9.630191] r4:c0f0032c
>[9.632851] ---[ end trace 2d5d5f79c0c8da59 ]---
>
>Does anyone know how to fix it?
>
>Thanks

Enabling the kernel config option CONFIG_NETFILTER_FAMILY_ARP fixes the
warning. It was introduced by the commit below.
commit 8de98f058360722a1a9febe3970de6dcd4d91513
Author: Florian Westphal
Date:   Thu Dec 7 16:28:26 2017 +0100

    netfilter: don't allocate space for arp/bridge hooks unless needed

    no need to define hook points if the family isn't supported.
    Because we need these hooks for either nftables, arp/ebtables
    or the 'call-iptables' hack we have in the bridge layer, add
    two new dependencies, NETFILTER_FAMILY_{ARP,BRIDGE},
    and have the users select them.

    Signed-off-by: Florian Westphal
    Signed-off-by: Pablo Neira Ayuso

diff --git
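The "have the users select them" mechanism in that commit works through Kconfig `select` of a hidden symbol. A sketch of the pattern (NETFILTER_FAMILY_ARP is real; the selecting user shown is one plausible example, based on how ARP tables depend on the ARP hooks):

```kconfig
# Hidden symbol: no prompt string, so it cannot be toggled directly;
# it only becomes 'y' when some other option selects it.
config NETFILTER_FAMILY_ARP
	bool

# Any feature that registers ARP netfilter hooks selects the family
# symbol, so the per-family hook array is compiled in only when a
# user actually needs it.
config IP_NF_ARPTABLES
	tristate "ARP tables support"
	select NETFILTER_FAMILY_ARP
```

This is why the warning above disappears once a selecting option is enabled: the hook entries that arp_rcv() checks exist again.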
Re: [PATCHv3 0/2] capability controlled user-namespaces
On Mon, Jan 08, 2018 at 11:35:26AM +1100, James Morris wrote:
> On Tue, 2 Jan 2018, Mahesh Bandewar (महेश बंडेवार) wrote:
>
> > On Sat, Dec 30, 2017 at 12:31 AM, James Morris wrote:
> > > On Wed, 27 Dec 2017, Mahesh Bandewar (महेश बंडेवार) wrote:
> > >
> > >> Hello James,
> > >>
> > >> Seems like I missed your name to be added into the review of this
> > >> patch series. Would you be willing to pull this into the security
> > >> tree? Serge Hallyn has already ACKed it.
> > >
> > > Sure!
> > >
> > Thank you James.
>
> I'd like to see what Eric Biederman thinks of this.
>
> Also, why do we need the concept of a controlled user-ns at all, if the
> default whitelist maintains existing behavior?

In past discussions two uses have been brought up:

1. If a 0-day is discovered which is exacerbated by a specific privilege
   in user namespaces, that privilege could be turned off until a reboot
   with a fixed kernel is scheduled, without fully disabling all
   containers.

2. Some systems may be specifically designed to run software which only
   requires a few capabilities in a userns. In that case all others
   could be disabled.
[PATCH] vhost: Remove the unused variable.
The patch (7235acdb1) changed the way work is flushed, so the queued
seq, done seq, and the flushing counter are not used anymore. Remove
them now.

Fixes: 7235acdb1 ("vhost: simplify work flushing")
Cc: Jason Wang
Signed-off-by: Tonghao Zhang
---
 drivers/vhost/vhost.c | 1 -
 drivers/vhost/vhost.h | 4 ----
 2 files changed, 5 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index 33ac2b186b85..9b04cad91d65 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -181,7 +181,6 @@ void vhost_work_init(struct vhost_work *work, vhost_work_fn_t fn)
 {
 	clear_bit(VHOST_WORK_QUEUED, &work->flags);
 	work->fn = fn;
-	init_waitqueue_head(&work->done);
 }
 EXPORT_SYMBOL_GPL(vhost_work_init);

diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index 79c6e7a60a5e..749fe13e061c 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -20,10 +20,6 @@ typedef void (*vhost_work_fn_t)(struct vhost_work *work);
 struct vhost_work {
 	struct llist_node node;
 	vhost_work_fn_t fn;
-	wait_queue_head_t done;
-	int flushing;
-	unsigned queue_seq;
-	unsigned done_seq;
 	unsigned long flags;
 };
-- 
2.13.6
Re: [patch iproute2 v6 0/3] tc: Add -bs option to batch mode
On 1/7/18 7:03 PM, Chris Mi wrote:
>> Did you measure the effect of increasing batch sizes?
> Yes. Even if we enlarge the batch size bigger than 10, there is no big
> improvement.

That will change over time, so the tc command should allow
auto-batching to work up to the message size limit.

> I think that's because current kernel doesn't process the requests in
> parallel. If kernel processes the requests in parallel, I believe
> specifying a bigger batch size will get a better result.
>>
>> I wonder whether specifying the batch size is necessary at all.
>> Couldn't batch mode just collect messages until either EOF or an
>> incompatible command is encountered which then triggers a commit to
>> kernel? This might simplify code quite a bit.
> That's a good suggestion.

Thanks for your time on this, Chris.
Re: [PATCH net-next V2 2/2] tun: allow to attach ebpf socket filter
On 2018年01月06日 00:21, Willem de Bruijn wrote:
> On Fri, Jan 5, 2018 at 4:54 AM, Jason Wang wrote:
>> This patch allows userspace to attach eBPF filter to tun. This will
>> allow to implement VM dataplane filtering in a more efficient way
>> compared to cBPF filter by allowing either qemu or libvirt to attach
>> eBPF filter to tun.
>>
>> Signed-off-by: Jason Wang
>> ---
>>  drivers/net/tun.c           | 39 +++
>>  include/uapi/linux/if_tun.h |  1 +
>>  2 files changed, 36 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 0853829..9fc8b70 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -238,6 +238,12 @@ struct tun_struct {
>>         struct tun_pcpu_stats __percpu *pcpu_stats;
>>         struct bpf_prog __rcu *xdp_prog;
>>         struct tun_prog __rcu *steering_prog;
>> +       struct tun_prog __rcu *filter_prog;
>> +};
>> +
>> +struct veth {
>> +       __be16 h_vlan_proto;
>> +       __be16 h_vlan_TCI;
>>  };
>>
>>  static int tun_napi_receive(struct napi_struct *napi, int budget)
>> @@ -984,12 +990,25 @@ static void tun_automq_xmit(struct tun_struct *tun, struct sk_buff *skb)
>>  #endif
>>  }
>>
>> +static unsigned int run_ebpf_filter(struct tun_struct *tun,
>> +                                    struct sk_buff *skb,
>> +                                    int len)
>> +{
>> +       struct tun_prog *prog = rcu_dereference(tun->filter_prog);
>> +
>> +       if (prog)
>> +               len = bpf_prog_run_clear_cb(prog->prog, skb);
>> +
>> +       return len;
>> +}
>> +
>>  /* Net device start xmit */
>>  static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>  {
>>         struct tun_struct *tun = netdev_priv(dev);
>>         int txq = skb->queue_mapping;
>>         struct tun_file *tfile;
>> +       int len = skb->len;
>>
>>         rcu_read_lock();
>>         tfile = rcu_dereference(tun->tfiles[txq]);
>> @@ -1015,6 +1034,16 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>>             sk_filter(tfile->socket.sk, skb))
>>                 goto drop;
>>
>> +       len = run_ebpf_filter(tun, skb, len);
>> +
>> +       /* Trim extra bytes since we may inster vlan proto & TCI

> inster -> insert

Will fix.

>> +        * in tun_put_user().
>> +        */
>> +       if (skb_vlan_tag_present(skb))
>> +               len -= skb_vlan_tag_present(skb) ? sizeof(struct veth) : 0;

> no need for testing skb_vlan_tag_present twice.

Right.
> more importantly, why trim these bytes unconditionally? only if the
> filter trims a packet to a length shorter than the minimum could this
> cause problems. sk_filter_trim_cap with a lower bound avoids that:
>
>     skb_vlan_tag_present(skb) ? sizeof(struct vlan_ethhdr) : 0;

The problem is, if the filter wants to trim the packet to 50 bytes, we
may get 54 bytes if a vlan tag is present. This seems wrong.

Thanks
Re: Subject: [RFC][PATCH 10/11] dwc-xlgmac: fix big-endian breakage
On 2018/1/6 3:32, Al Viro wrote:
> Users of XLGMAC_SET_REG_BITS_LE() expect it to take le32 and return
> le32.
>
> Signed-off-by: Al Viro
> ---
>  drivers/net/ethernet/synopsys/dwc-xlgmac.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/synopsys/dwc-xlgmac.h b/drivers/net/ethernet/synopsys/dwc-xlgmac.h
> index cab3e40a86b9..e95c4c250e16 100644
> --- a/drivers/net/ethernet/synopsys/dwc-xlgmac.h
> +++ b/drivers/net/ethernet/synopsys/dwc-xlgmac.h
> @@ -106,7 +106,7 @@
>  #define XLGMAC_GET_REG_BITS_LE(var, pos, len) ({		\
>  	typeof(pos) _pos = (pos);				\
>  	typeof(len) _len = (len);				\
> -	typeof(var) _var = le32_to_cpu((var));			\
> +	u32 _var = le32_to_cpu((var));				\
>  	((_var) & GENMASK(_pos + _len - 1, _pos)) >> (_pos);	\
>  })
>
> @@ -125,8 +125,8 @@
>  	typeof(len) _len = (len);				\
>  	typeof(val) _val = (val);				\
>  	_val = (_val << _pos) & GENMASK(_pos + _len - 1, _pos);	\
> -	_var = (_var & ~GENMASK(_pos + _len - 1, _pos)) | _val;	\
> -	cpu_to_le32(_var);					\
> +	(_var & ~cpu_to_le32(GENMASK(_pos + _len - 1, _pos))) |	\
> +	cpu_to_le32(_val);					\
>  })
>
>  struct xlgmac_pdata;

Thanks for your patch.

Acked-by: Jie Deng
[PATCH bpf-next] bpf: introduce BPF_JIT_ALWAYS_ON config
The BPF interpreter has been used as part of the spectre 2 attack
CVE-2017-5715.

A quote from the Google Project Zero blog:
"At this point, it would normally be necessary to locate gadgets in
the host kernel code that can be used to actually leak data by reading
from an attacker-controlled location, shifting and masking the result
appropriately and then using the result of that as offset to an
attacker-controlled address for a load. But piecing gadgets together
and figuring out which ones work in a speculation context seems
annoying. So instead, we decided to use the eBPF interpreter, which is
built into the host kernel - while there is no legitimate way to
invoke it from inside a VM, the presence of the code in the host
kernel's text section is sufficient to make it usable for the attack,
just like with ordinary ROP gadgets."

To make the attacker's job harder, introduce the BPF_JIT_ALWAYS_ON
config option that removes the interpreter from the kernel in favor of
JIT-only mode.

So far eBPF JIT is supported by:
x64, arm64, arm32, sparc64, s390, powerpc64, mips64

The start of the JITed program is randomized and the code page is
marked as read-only. In addition, "constant blinding" can be turned on
with net.core.bpf_jit_harden.

Signed-off-by: Alexei Starovoitov
---
 init/Kconfig               | 7 +++++++
 kernel/bpf/core.c          | 9 +++++++++
 kernel/bpf/verifier.c      | 4 ++++
 net/core/sysctl_net_core.c | 9 +++++++++
 4 files changed, 29 insertions(+)

diff --git a/init/Kconfig b/init/Kconfig
index 2934249fba46..5e2a4a391ba9 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1392,6 +1392,13 @@ config BPF_SYSCALL
 	  Enable the bpf() system call that allows to manipulate eBPF
 	  programs and maps via file descriptors.
+config BPF_JIT_ALWAYS_ON
+	bool "Permanently enable BPF JIT and remove BPF interpreter"
+	depends on BPF_SYSCALL && HAVE_EBPF_JIT && BPF_JIT
+	help
+	  Enables BPF JIT and removes BPF interpreter to avoid
+	  speculative execution of BPF instructions by the interpreter
+
 config USERFAULTFD
 	bool "Enable userfaultfd() system call"
 	select ANON_INODES
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 70a534549cd3..42756c434e0b 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -781,6 +781,7 @@ noinline u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 }
 EXPORT_SYMBOL_GPL(__bpf_call_base);

+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
 /**
  *	__bpf_prog_run - run eBPF program on a given context
  *	@ctx: is the data we are operating on
@@ -1376,6 +1377,7 @@ void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth)
 		__bpf_call_base_args;
 	insn->code = BPF_JMP | BPF_CALL_ARGS;
 }
+#endif

 bool bpf_prog_array_compatible(struct bpf_array *array,
 			       const struct bpf_prog *fp)
@@ -1427,9 +1429,11 @@ static int bpf_check_tail_call(const struct bpf_prog *fp)
  */
 struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
 {
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
 	u32 stack_depth = max_t(u32, fp->aux->stack_depth, 1);

 	fp->bpf_func = interpreters[(round_up(stack_depth, 32) / 32) - 1];
+#endif

 	/* eBPF JITs can rewrite the program in case constant
 	 * blinding is active.
 	 * However, in case of error during
@@ -1453,6 +1457,11 @@ struct bpf_prog *bpf_prog_select_runtime(struct bpf_prog *fp, int *err)
 	 */
 	*err = bpf_check_tail_call(fp);

+#ifdef CONFIG_BPF_JIT_ALWAYS_ON
+	if (!fp->jited)
+		*err = -ENOTSUPP;
+#endif
+
 	return fp;
 }
 EXPORT_SYMBOL_GPL(bpf_prog_select_runtime);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index a2b211262c25..ca80559c4ec3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5267,7 +5267,11 @@ static int fixup_call_args(struct bpf_verifier_env *env)
 		depth = get_callee_stack_depth(env, insn, i);
 		if (depth < 0)
 			return depth;
+#ifdef CONFIG_BPF_JIT_ALWAYS_ON
+		return -ENOTSUPP;
+#else
 		bpf_patch_call_args(insn, depth);
+#endif
 	}
 	return 0;
 }
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index cbc3dde4cfcc..1c8af0f4f385 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -325,7 +325,13 @@ static struct ctl_table net_core_table[] = {
 		.data		= &bpf_jit_enable,
 		.maxlen		= sizeof(int),
 		.mode		= 0644,
+#ifndef CONFIG_BPF_JIT_ALWAYS_ON
 		.proc_handler	= proc_dointvec
+#else
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= &one,
+		.extra2		= &one,
+#endif
 	},
 # ifdef CONFIG_HAVE_EBPF_JIT
 	{
@@ -524,6 +530,9 @@ static __net_initdata struct pernet_operations sysctl_core_ops = {

 static __init int sysctl_core_init(void)
 {
+#if
Re: [net] Revert "net: core: maybe return -EEXIST in __dev_alloc_name"
David Miller writes:
> From: Michael Ellerman
> Date: Fri, 22 Dec 2017 15:22:22 +1100
>
>>> On Tue, Dec 19 2017, Michael Ellerman wrote:
>>>> This revert seems to have broken networking on one of my powerpc
>>>> machines, according to git bisect.
>>>>
>>>> The symptom is DHCP fails and I don't get a link, I didn't dig any
>>>> further than that. I can if it's helpful.
>>>>
>>>> I think the problem is that 87c320e51519 ("net: core:
>>>> dev_get_valid_name is now the same as dev_alloc_name_ns") only makes
>>>> sense while d6f295e9def0 remains in the tree.
>>>
>>> I'm sorry about all of this, I really didn't think there would be such
>>> consequences of changing an errno return. Indeed, d6f29 was preparation
>>> for unifying the two functions that do the exact same thing (and how we
>>> ever got into that situation is somewhat unclear), except for
>>> their behaviour in the case the requested name already exists. So one of
>>> the two interfaces had to change its return value, and as I wrote, I
>>> thought EEXIST was the saner choice when an explicit name (no %d) had
>>> been requested.
>>
>> No worries.
>>
>>>> ie. before the entire series, dev_get_valid_name() would return
>>>> EEXIST, and that was retained when 87c320e51519 was merged, but now
>>>> that d6f295e9def0 has been reverted dev_get_valid_name() is returning
>>>> ENFILE.
>>>>
>>>> I can get the network up again if I also revert 87c320e51519 ("net:
>>>> core: dev_get_valid_name is now the same as dev_alloc_name_ns"), or
>>>> with the gross patch below.
>>>
>>> I don't think changing -ENFILE to -EEXIST would be right either, since
>>> dev_get_valid_name() used to be able to return both (-EEXIST in the case
>>> where there's no %d, -ENFILE in the case where we end up calling
>>> dev_alloc_name_ns()). If anything, we could do the check for the old
>>> -EEXIST condition first, and then call dev_alloc_name_ns(). But I'm also
>>> fine with reverting.
>>
>> Yeah I think a revert would be best, given it's nearly rc5.
>>
>> My userspace is not exotic AFAIK, just debian something, so presumably
>> this will affect other people too.
>
> I've just queued up the following revert, thanks!

Thanks. I don't see it in rc7, will it get to Linus sometime before the
release?

cheers
Re: [RFC PATCH bpf-next v2 1/4] tracing/kprobe: bpf: Check error injectable event is on function entry
On 12/29/17 12:20 AM, Masami Hiramatsu wrote:
>> Please run Josef's test in the !ftrace setup.
>
> Yes, I'll add the result of the test case.

If Josef's test passes in the !ftrace config, please resend your
patches. I think 2 and 3 were nice simplifications, and patch 1 is good
too if it passes the test. Would be great to get them in for this merge
window. Thanks!
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sun, Jan 07, 2018 at 01:59:35PM +, Alan Cox wrote:
> > which means that POC is relying on 64-bit address speculation.
> > In the places coverity found the user supplied value is 32-bit,
>
> People have 32bit computers as well as 64bit and in some cases 32bit is
> fine for an attack depending where your target is relative to the object.

Right, absolutely correct on 32-bit archs. The question is whether
truncation to 32-bit is enough to work around spectre1 on 64-bit archs.
I hope the researchers can clarify.

> lfence timing is also heavily dependent upon what work has to be done to
> retire previous live instructions.
> BPF does not normally do a lot of writing so you'd expect the cost to be low.

Right, to retire previous loads. Not sure what 'not a lot of writing'
has to do with lfence. Our XDP based DDOS mostly does reads with few
writes for stats into maps, whereas the XDP based load balancer
modifies every packet. XDP is root only, so not relevant in the spectre
context. Just clarifying reads vs writes in BPF.
Re: [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4
From: Ido Schimmel
Date: Sun, 7 Jan 2018 12:45:00 +0200

> This set tries to eliminate some differences between IPv4's and IPv6's
> treatment of nexthops. These differences are most likely a side effect
> of IPv6's data structures (specifically 'rt6_info') that incorporate
> both the route and the nexthop and the late addition of ECMP support in
> commit 51ebd3181572 ("ipv6: add support of equal cost multipath
> (ECMP)").
...
> Finally, this series also serves as a good first step towards David
> Ahern's goal of treating nexthops as standalone objects [2], as it makes
> the code more in line with IPv4 where the nexthop and the nexthop group
> are separate objects from the route itself.
...

Looks great, series applied, thanks Ido!
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sun, Jan 07, 2018 at 12:15:40PM -0800, Dan Williams wrote:
>
> I'm thinking we should provide the option to at least build the
> hot-path nospec_array_ptr() usages without an lfence.
>
>     CONFIG_SPECTRE1_PARANOIA_SAFE
>     CONFIG_SPECTRE1_PARANOIA_PERF

SAFE vs PERF naming is problematic and misleading, since users don't
have the data to make a decision; they will be forced to go with SAFE.
What is not safe about the array_access() macro with AND? How does the
lfence approach make it safer? Only because lfence was blessed by Intel
earlier when they couldn't figure out a different way?

How about:
    CONFIG_SPECTRE1_WORKAROUND_INDEX_MASK
    CONFIG_SPECTRE1_WORKAROUND_LOAD_FENCE

> ...if only for easing performance testing and let the distribution set
> its policy.
>
> Where hot-path usages can do:
>
>     nospec_relax(nospec_array_ptr())

The AND approach doesn't prevent speculation, hence nospec_ is an
incorrect prefix. Alan's "speculation management" terminology fits well
here.

Can we keep the array_access() name and change it underneath to either
mask or lfence?
Re: pull-request: bpf-next 2018-01-07
From: Daniel Borkmann
Date: Mon, 8 Jan 2018 00:56:16 +0100

> The following pull-request contains BPF updates for your *net-next* tree.
>
> The main changes are:
...
> Please consider pulling these changes from:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
>
> Thanks a lot & a happy new year!

Looks great, pulled, thanks a lot Daniel!
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
From: Thomas Gleixner
Date: Sun, 7 Jan 2018 21:56:39 +0100 (CET)

> I surely agree, but we have gone the way of PTI without the ability of
> exempting individual processes exactly for one reason:
>
>     Lack of time
>
> It can be done on top of the PTI implementation and it won't take ages.
>
> For spectre_v1/2 we face the same problem simply because we got informed so
> much ahead of time and we were all twiddling thumbs, enjoying our christmas
> vacation and having a good time.

I just want to point out that this should be noted in history as a
case where all of this controlled disclosure stuff seems to have made
things worse rather than better.

Why is there so much haste and paranoia if supposedly some group of
people had all this extra time to think about and deal with this bug?

From what I've seen, every single time, the worse a problem is, the
more important it is to expose it to as many smart folks as possible.
And to do so as fast as possible.

And to me that means full disclosure immediately for the super high
level stuff like what we are dealing with here.

Think I'm nuts? Ok, then how did we fare any better by keeping this
junk under wraps for weeks if not months? (seriously, did responsible
people really know about this as far back as... June 2017?)

Controlled disclosure for high profile bugs seems to only achieve two
things:

1) Vendors can cover their butts and come up with deflection
   strategies.

2) The "theatre" aspect of security can be maximized as much as
   possible. We even have a pretty web site and cute avatars this
   time!

None of this has anything to do with having time to come up with the
best possible implementation of a fix. You know, the technical part?

So after what appears to be as much as 6 months of deliberating, the
very wise men in the special room said:

	"KPTI and lfence"

Do I get this right?
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sun, Jan 07, 2018 at 11:08:24AM +0100, Thomas Gleixner wrote:
> On Sat, 6 Jan 2018, Alexei Starovoitov wrote:
> > which clearly states that bpf_tail_call() was used in the attack.
> > Yet none of the intel nor arm patches address speculation in
> > this bpf helper!
> > It means that:
> > - gpz didn't share neither exploit nor the detailed description
> >   of the POC with cpu vendors until now
> > - coverity rules used to find all these places in the kernel
> >   failed to find bpf_tail_call
> > - cpu vendors were speculating what variant 1 can actually do
>
> You forgot to mention that there might be other attacks than the public POC
> which are not covered by a simple AND

If the above statement is not pure speculation, please share the CVE
number. For variant[123] they've been reserved for months.

> For spectre_v1/2 we face the same problem simply because we got informed so
> much ahead of time and we were all twiddling thumbs, enjoying our christmas
> vacation and having a good time.

Right. They were informed in a way that they failed to address variant1
with pre-embargo and public patches.

> The exploits are out in the wild and they are observed already, so we

This statement contradicts everything that was publicly stated. Or are
you referring to the 'exploit' at the end of the spectre paper?

> want to discuss the right way to fix it for the next 3 month and leave all
> doors open until the last bit of performance is squeezed out.
Let's look at facts:

- Linus explains his array_access() idea
- lfence proponents quickly point out that it needs gcc to be smart
  enough to emit setbe, and go back to lfence patches
- I spent half an hour to tweak array_access() to be generic on all
  archs and compiler independent
- lfence proponents point out that AND with a variable somehow won't
  work, yet after further discussion it's actually fine due to the
  nature of the variant1 attack that needs to train the predictor with
  in-bounds accesses to mispredict with an out-of-bounds speculative
  load
- then lfence proponents claim a 'non public POC' not covered by AND
- all in the matter of 2 days
- and now the argument that it will take 3 months to discuss a solution
  without lfence, yet still no performance numbers for lfence, though
  'people in the know' were twiddling thumbs for months

That's just not cool.
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
From: Thomas Gleixner Date: Sun, 7 Jan 2018 19:31:41 +0100 (CET) > 2) Alexei's analysis is purely based on the public information of the google > zero folks. If it would be complete and the only attack vector all fine. > > If not and I doubt it is, we're going to regret this decision faster > than we made it and this is not the kind of play field where we can > afford that. Please state this more clearly. Do you know about other attack vectors and just are not allowed to talk about them? Or is this, ironically, speculation?
RE: [patch iproute2 v6 0/3] tc: Add -bs option to batch mode
> -Original Message- > From: n...@orbyte.nwl.cc [mailto:n...@orbyte.nwl.cc] On Behalf Of Phil > Sutter > Sent: Saturday, January 6, 2018 1:25 AM > To: Chris Mi> Cc: netdev@vger.kernel.org; gerlitz...@gmail.com; > step...@networkplumber.org; dsah...@gmail.com; > marcelo.leit...@gmail.com > Subject: Re: [patch iproute2 v6 0/3] tc: Add -bs option to batch mode > > Hi Chris, > > On Thu, Jan 04, 2018 at 04:34:51PM +0900, Chris Mi wrote: > > Currently in tc batch mode, only one command is read from the batch > > file and sent to kernel to process. With this patchset, we can > > accumulate several commands before sending to kernel. The batch size > > is specified using option -bs or -batchsize. > > > > To accumulate the commands in tc, client should allocate an array of > > struct iovec. If batchsize is bigger than 1, only after the client has > > accumulated enough commands, can the client call rtnl_talk_msg to send > > the message that includes the iov array. One exception is that there > > is no more command in the batch file. > > > > But please note that kernel still processes the requests one by one. > > To process the requests in parallel in kernel is another effort. > > The time we're saving in this patchset is the user mode and kernel > > mode context switch. So this patchset works on top of the current kernel. > > > > Using the following script in kernel, we can generate 1,000,000 rules. > > tools/testing/selftests/tc-testing/tdc_batch.py > > > > Without this patchset, 'tc -b $file' execution time is: > > > > real0m15.555s > > user0m7.211s > > sys 0m8.284s > > > > With this patchset, 'tc -b $file -bs 10' execution time is: > > > > real0m13.043s > > user0m6.479s > > sys 0m6.504s > > > > The insertion rate is improved more than 10%. > > Did you measure the effect of increasing batch sizes? Yes. Even if we enlarge the batch size bigger than 10, there is no big improvement. I think that's because current kernel doesn't process the requests in parallel.
If kernel processes the requests in parallel, I believe specifying a bigger batch size will get a better result. > > I wonder whether specifying the batch size is necessary at all. Couldn't batch > mode just collect messages until either EOF or an incompatible command is > encountered which then triggers a commit to kernel? This might simplify > code quite a bit. That's a good suggestion. -Chris > > Cheers, Phil
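The accumulation scheme the cover letter describes can be sketched in userspace C. All names here are mine for illustration (iproute2's actual helper is rtnl_talk_msg, whose signature I'm not reproducing): commands land in an iovec array and the flush callback, which would wrap a single sendmsg()-style kernel round trip, only runs when the batch fills or input ends.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define BATCHSIZE 10	/* the value passed via -bs in the benchmark */

struct batch {
	struct iovec iov[BATCHSIZE];
	int n;
	/* hypothetical flush hook, e.g. one sendmsg() over the rtnetlink
	 * socket carrying all accumulated messages */
	int (*flush)(struct iovec *iov, int n);
};

static int batch_add(struct batch *b, void *msg, size_t len)
{
	b->iov[b->n].iov_base = msg;
	b->iov[b->n].iov_len = len;
	if (++b->n < BATCHSIZE)
		return 0;	/* keep accumulating */
	b->n = 0;
	return b->flush(b->iov, BATCHSIZE);	/* full: one syscall for 10 */
}

static int batch_end(struct batch *b)	/* EOF: flush the partial batch */
{
	int n = b->n;

	b->n = 0;
	return n ? b->flush(b->iov, n) : 0;
}
```

The saving is exactly what Chris states: one user/kernel context switch per BATCHSIZE commands instead of per command; the kernel still handles the netlink messages serially.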
Re: Subject: [RFC][PATCH 05/11] stmmac: fix several stray endianness bugs
On 01/05/2018 11:32 AM, Al Viro wrote: > > A few places got missed by "net: ethernet: stmmac: change dma descriptors > to __le32" (having been introduced just before the merge of that patch, > AFAICS). Fix them up... I could not spot the commit id associated with that change, which one is it? -- Florian
[PATCH v3 bpf] bpf: prevent out-of-bounds speculation
Under speculation, CPUs may mis-predict branches in bounds checks. Thus, memory accesses under a bounds check may be speculated even if the bounds check fails, providing a primitive for building a side channel. To avoid leaking kernel data round up array-based maps and mask the index after bounds check, so speculated load with out of bounds index will load either valid value from the array or zero from the padded area. Unconditionally mask index for all array types even when max_entries are not rounded to power of 2 for root user. When map is created by unpriv user generate a sequence of bpf insns that includes AND operation to make sure that JITed code includes the same 'index & index_mask' operation. If prog_array map is created by unpriv user replace bpf_tail_call(ctx, map, index); with if (index >= max_entries) { index &= map->index_mask; bpf_tail_call(ctx, map, index); } (along with roundup to power 2) to prevent out-of-bounds speculation. There is secondary redundant 'if (index >= max_entries)' in the interpreter and in all JITs, but they can be optimized later if necessary. Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array) cannot be used by unpriv, so no changes there. That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on all architectures with and without JIT. v2->v3: Daniel noticed that attack potentially can be crafted via syscall commands without loading the program, so add masking to those paths as well. 
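The mechanism the commit message describes can be shown as a small standalone sketch (simplified, with invented names; the real code lives in kernel/bpf/arraymap.c): round max_entries up to a power of two, derive index_mask = size - 1, and AND every index with it after the bounds check, so even a mis-predicted branch can only speculatively load from inside the padded array.

```c
#include <assert.h>
#include <stdint.h>

/* roundup_pow_of_two(max_entries) - 1, spelled out */
static uint32_t pow2_index_mask(uint32_t max_entries)
{
	uint32_t size = 1;

	while (size < max_entries)
		size <<= 1;
	return size - 1;
}

static const int *masked_lookup(const int *array, uint32_t max_entries,
				uint32_t mask, uint32_t index)
{
	if (index >= max_entries)	/* architectural check; may mispredict */
		return 0;
	/* Under misprediction the load below still executes speculatively,
	 * but the AND pins it inside the power-of-two padded region, so it
	 * reads a valid element or the zeroed padding - never kernel data. */
	return &array[index & mask];
}
```

This is the same transformation the patch applies both to the interpreter path and, for unprivileged prog_array maps, as extra BPF instructions so the JITed code carries the identical `index & index_mask` operation.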
Signed-off-by: Alexei StarovoitovAcked-by: John Fastabend --- include/linux/bpf.h | 2 ++ kernel/bpf/arraymap.c | 47 --- kernel/bpf/verifier.c | 36 3 files changed, 74 insertions(+), 11 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index e55e4255a210..1b985ca4ffbe 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -52,6 +52,7 @@ struct bpf_map { u32 pages; u32 id; int numa_node; + bool unpriv_array; struct user_struct *user; const struct bpf_map_ops *ops; struct work_struct work; @@ -221,6 +222,7 @@ struct bpf_prog_aux { struct bpf_array { struct bpf_map map; u32 elem_size; + u32 index_mask; /* 'ownership' of prog_array is claimed by the first program that * is going to use this map or by the first program which FD is stored * in the map to make sure that all callers and callees have the same diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 7c25426d3cf5..aaa319848e7d 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -53,9 +53,10 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr) { bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY; int numa_node = bpf_map_attr_numa_node(attr); + u32 elem_size, index_mask, max_entries; + bool unpriv = !capable(CAP_SYS_ADMIN); struct bpf_array *array; u64 array_size; - u32 elem_size; /* check sanity of attributes */ if (attr->max_entries == 0 || attr->key_size != 4 || @@ -72,11 +73,20 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr) elem_size = round_up(attr->value_size, 8); + max_entries = attr->max_entries; + index_mask = roundup_pow_of_two(max_entries) - 1; + + if (unpriv) + /* round up array size to nearest power of 2, +* since cpu will speculate within index_mask limits +*/ + max_entries = index_mask + 1; + array_size = sizeof(*array); if (percpu) - array_size += (u64) attr->max_entries * sizeof(void *); + array_size += (u64) max_entries * sizeof(void *); else - array_size += (u64) attr->max_entries * elem_size; + array_size += 
(u64) max_entries * elem_size; /* make sure there is no u32 overflow later in round_up() */ if (array_size >= U32_MAX - PAGE_SIZE) @@ -86,6 +96,8 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr) array = bpf_map_area_alloc(array_size, numa_node); if (!array) return ERR_PTR(-ENOMEM); + array->index_mask = index_mask; + array->map.unpriv_array = unpriv; /* copy mandatory map attributes */ array->map.map_type = attr->map_type; @@ -121,12 +133,13 @@ static void *array_map_lookup_elem(struct bpf_map *map, void *key) if (unlikely(index >= array->map.max_entries)) return NULL; - return array->value + array->elem_size * index; + return array->value + array->elem_size * (index & array->index_mask); } /* emit BPF instructions equivalent to C code of array_map_lookup_elem() */ static u32 array_map_gen_lookup(struct bpf_map *map, struct bpf_insn *insn_buf) { + struct
Re: Subject: [RFC][PATCH 07/11] broadcom: trivial sparse annotations
On 01/05/2018 11:32 AM, Al Viro wrote: > > Signed-off-by: Al Viro> --- > drivers/net/ethernet/broadcom/bcmsysport.c | 2 +- > drivers/net/ethernet/broadcom/bgmac.h | 6 +++--- > drivers/net/ethernet/broadcom/bnx2.c | 6 +++--- > drivers/net/ethernet/broadcom/cnic_if.h| 8 > drivers/net/ethernet/broadcom/genet/bcmgenet.c | 4 ++-- > drivers/net/ethernet/broadcom/tg3.c| 2 +- > 6 files changed, 14 insertions(+), 14 deletions(-) > > diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c > b/drivers/net/ethernet/broadcom/bcmsysport.c > index f15a8fc6dfc9..c2969b260aed 100644 > --- a/drivers/net/ethernet/broadcom/bcmsysport.c > +++ b/drivers/net/ethernet/broadcom/bcmsysport.c > @@ -1156,7 +1156,7 @@ static struct sk_buff *bcm_sysport_insert_tsb(struct > sk_buff *skb, > memset(tsb, 0, sizeof(*tsb)); > > if (skb->ip_summed == CHECKSUM_PARTIAL) { > - ip_ver = htons(skb->protocol); > + ip_ver = ntohs(skb->protocol); Al can you make sure you CC drivers maintainers for these changes? You change bcmsysport.c and bcmgenet.c but you leave tg3.c with the same type of construct in tg3_start_xmit() any reason for that? I am also curious about this conversion considering the following numbers: grep 'skb->protocol == htons' {net/*,drivers/net/*} | wc -l 151 git grep 'skb->protocol == ntohs' {net/*,drivers/net/*} | wc -l 0 > switch (ip_ver) { > case ETH_P_IP: > ip_proto = ip_hdr(skb)->protocol; > diff --git a/drivers/net/ethernet/broadcom/bgmac.h > b/drivers/net/ethernet/broadcom/bgmac.h > index 4040d846da8e..40d02fec2747 100644 > --- a/drivers/net/ethernet/broadcom/bgmac.h > +++ b/drivers/net/ethernet/broadcom/bgmac.h > @@ -479,9 +479,9 @@ struct bgmac_rx_header { > struct bgmac { > union { > struct { > - void *base; > - void *idm_base; > - void *nicpm_base; > + void __iomem *base; > + void __iomem *idm_base; > + void __iomem *nicpm_base; > } plat; This part of the patch is correct. 
> struct { > struct bcma_device *core; > diff --git a/drivers/net/ethernet/broadcom/bnx2.c > b/drivers/net/ethernet/broadcom/bnx2.c > index 7919f6112ecf..154866e8517a 100644 > --- a/drivers/net/ethernet/broadcom/bnx2.c > +++ b/drivers/net/ethernet/broadcom/bnx2.c > @@ -8330,9 +8330,9 @@ bnx2_init_board(struct pci_dev *pdev, struct net_device > *dev) > if (j < 32) > bp->fw_version[j++] = ' '; > for (i = 0; i < 3 && j < 28; i++) { > - reg = bnx2_reg_rd_ind(bp, addr + i * 4); > - reg = be32_to_cpu(reg); > - memcpy(&bp->fw_version[j], &reg, 4); > + __be32 v; > + v = cpu_to_be32(bnx2_reg_rd_ind(bp, addr + i * 4)); > + memcpy(&bp->fw_version[j], &v, 4); > j += 4; > } > } > diff --git a/drivers/net/ethernet/broadcom/cnic_if.h > b/drivers/net/ethernet/broadcom/cnic_if.h > index 789e5c7e9311..8cfef07a8e3d 100644 > --- a/drivers/net/ethernet/broadcom/cnic_if.h > +++ b/drivers/net/ethernet/broadcom/cnic_if.h > @@ -260,10 +260,10 @@ struct cnic_sockaddr { > struct cnic_sock { > struct cnic_dev *dev; > void*context; > - u32 src_ip[4]; > - u32 dst_ip[4]; > - u16 src_port; > - u16 dst_port; > + __be32 src_ip[4]; > + __be32 dst_ip[4]; > + __be16 src_port; > + __be16 dst_port; > u16 vlan_id; > unsigned char old_ha[ETH_ALEN]; > unsigned char ha[ETH_ALEN]; > diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c > b/drivers/net/ethernet/broadcom/genet/bcmgenet.c > index 24b4f4ceceef..77154f1479a9 100644 > --- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c > +++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c > @@ -1321,7 +1321,7 @@ static struct sk_buff *bcmgenet_free_tx_cb(struct > device *dev, > dma_unmap_addr_set(cb, dma_addr, 0); > } > > - return 0; > + return NULL; And this part as well is correct.
> } > > /* Simple helper to free a receive control block's resources */ > @@ -1480,7 +1480,7 @@ static struct sk_buff *bcmgenet_put_tx_csum(struct > net_device *dev, > status = (struct status_64 *)skb->data; > > if (skb->ip_summed == CHECKSUM_PARTIAL) { > - ip_ver = htons(skb->protocol); > + ip_ver = ntohs(skb->protocol); > switch (ip_ver) { > case ETH_P_IP: > ip_proto = ip_hdr(skb)->protocol; > diff --git a/drivers/net/ethernet/broadcom/tg3.c > b/drivers/net/ethernet/broadcom/tg3.c > index a77ee2f8fb8d..2bd77d9990f2 100644 > --- a/drivers/net/ethernet/broadcom/tg3.c > +++
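Florian's question about the htons-to-ntohs conversion comes down to direction of conversion rather than runtime behavior. A userspace illustration (ETH_P_* redefined locally for the sketch; the kernel's live in linux/if_ether.h): skb->protocol holds the EtherType in network byte order, and comparing it against host-order constants requires ntohs(). htons() happens to perform the identical byte permutation on every common platform, which is why the old spelling never misbehaved - it is a type-direction error of the kind sparse's __be16 checking exists to catch.

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h>

#define ETH_P_IP   0x0800	/* IPv4 EtherType, host-order constant */
#define ETH_P_IPV6 0x86DD	/* IPv6 EtherType, host-order constant */

/* wire_proto arrives in network byte order, like skb->protocol */
static int wire_proto_is_ipv4(uint16_t wire_proto)
{
	return ntohs(wire_proto) == ETH_P_IP;	/* network -> host: ntohs */
}
```

This also explains Florian's grep numbers: the tree-wide idiom `skb->protocol == htons(ETH_P_IP)` converts the constant to network order instead, which is equally correct and keeps the comparison in __be16 space.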
pull-request: bpf-next 2018-01-07
Hi David, The following pull-request contains BPF updates for your *net-next* tree. The main changes are: 1) Add a start of a framework for extending struct xdp_buff without having the overhead of populating every data at runtime. Idea is to have a new per-queue struct xdp_rxq_info that holds read mostly data (currently that is, queue number and a pointer to the corresponding netdev) which is set up during rxqueue config time. When a XDP program is invoked, struct xdp_buff holds a pointer to struct xdp_rxq_info that the BPF program can then walk. The user facing BPF program that uses struct xdp_md for context can use these members directly, and the verifier rewrites context access transparently by walking the xdp_rxq_info and net_device pointers to load the data, from Jesper. 2) Redo the reporting of offload device information to user space such that it works in combination with network namespaces. The latter is reported through a device/inode tuple as similarly done in other subsystems as well (e.g. perf) in order to identify the namespace. For this to work, ns_get_path() has been generalized such that the namespace can be retrieved not only from a specific task (perf case), but also from a callback where we deduce the netns (ns_common) from a netdevice. bpftool support using the new uapi info and extensive test cases for test_offload.py in BPF selftests have been added as well, from Jakub. 3) Add two bpftool improvements: i) properly report the bpftool version such that it corresponds to the version from the kernel source tree. So pick the right linux/version.h from the source tree instead of the installed one. ii) fix bpftool and also bpf_jit_disasm build with binutils >= 2.9. The reason for the build breakage is that binutils library changed the function signature to select the disassembler. Given this is needed in multiple tools, add a proper feature detection to the tools/build/features infrastructure, from Roman.
4) Implement the BPF syscall command BPF_MAP_GET_NEXT_KEY for the stacktrace map. It is currently unimplemented, but there are use cases where user space needs to walk all stacktrace map entries e.g. for dumping or deleting map entries w/o having to close and recreate the map. Add BPF selftests along with it, from Yonghong. 5) Few follow-up cleanups for the bpftool cgroup code: i) rename the cgroup 'list' command into 'show' as we have it for other subcommands as well, ii) then alias the 'show' command such that 'list' is accepted which is also common practice in iproute2, and iii) remove couple of newlines from error messages using p_err(), from Jakub. 6) Two follow-up cleanups to sockmap code: i) remove the unused bpf_compute_data_end_sk_skb() function and ii) only build the sockmap infrastructure when CONFIG_INET is enabled since it's only aware of TCP sockets at this time, from John. Please consider pulling these changes from: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git Thanks a lot & a happy new year! 
The following changes since commit 6bb8824732f69de0f233ae6b1a8158e149627b38: Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2017-12-29 15:42:26 -0500) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git for you to fetch changes up to 9be99badee761f0b2c065ecbd8bd54a96cbd0fa0: Merge branch 'bpf-stacktrace-map-next-key-support' (2018-01-06 23:52:23 +0100) Alexei Starovoitov (1): Merge branch 'xdp_rxq_info' Daniel Borkmann (3): Merge branch 'bpf-offload-report-dev' Merge branch 'bpf-bpftool-misc-fixes' Merge branch 'bpf-stacktrace-map-next-key-support' Jakub Kicinski (12): bpf: offload: don't require rtnl for dev list manipulation bpf: offload: don't use prog->aux->offload as boolean bpf: offload: allow netdev to disappear while verifier is running bpf: offload: free prog->aux->offload when device disappears bpf: offload: free program id when device disappears nsfs: generalize ns_get_path() for path resolution with a task bpf: offload: report device information for offloaded programs tools: bpftool: report device information for offloaded programs selftests/bpf: test device info reporting for bound progs tools: bpftool: rename cgroup list -> show in the code tools: bpftool: alias show and list commands tools: bpftool: remove new lines from errors Jesper Dangaard Brouer (14): xdp: base API for new XDP rx-queue info concept xdp/mlx5: setup xdp_rxq_info i40e: setup xdp_rxq_info ixgbe: setup xdp_rxq_info xdp/qede: setup xdp_rxq_info and intro xdp_rxq_info_is_reg mlx4: setup xdp_rxq_info bnxt_en:
[PATCH] ipv6: use ARRAY_SIZE for array sizing calculation on array seg6_action_table
From: Colin Ian King Use the ARRAY_SIZE macro on array seg6_action_table to determine size of the array. Improvement suggested by coccinelle. Signed-off-by: Colin Ian King --- net/ipv6/seg6_local.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/seg6_local.c b/net/ipv6/seg6_local.c index 825b8e01f947..ba3767ef5e93 100644 --- a/net/ipv6/seg6_local.c +++ b/net/ipv6/seg6_local.c @@ -501,7 +501,7 @@ static struct seg6_action_desc *__get_action_desc(int action) struct seg6_action_desc *desc; int i, count; - count = sizeof(seg6_action_table) / sizeof(struct seg6_action_desc); + count = ARRAY_SIZE(seg6_action_table); for (i = 0; i < count; i++) { desc = &seg6_action_table[i]; if (desc->action == action) -- 2.15.1
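The same idiom recurs in the next two patches, so it is worth showing standalone. The macro below is the classic definition (the kernel's version in linux/kernel.h adds a type-safety check on top), and the struct is a cut-down stand-in for seg6_action_desc:

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a fixed-size array. Unlike the sizeof-division it
 * replaces, it never names the element type, so it cannot silently go
 * stale if that type changes. */
#define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))

struct action_desc {	/* simplified stand-in for seg6_action_desc */
	int action;
};

static struct action_desc table[] = {
	{ .action = 1 },
	{ .action = 2 },
	{ .action = 3 },
};
```

Note ARRAY_SIZE only works on true arrays; applied to a pointer parameter, the plain macro would quietly compute sizeof(pointer)/sizeof(element), which is why the kernel version adds a compile-time check.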
[PATCH] be2net: use ARRAY_SIZE for array sizing calculation on array cmd_priv_map
From: Colin Ian King Use the ARRAY_SIZE macro on array cmd_priv_map to determine size of the array. Improvement suggested by coccinelle. Signed-off-by: Colin Ian King --- drivers/net/ethernet/emulex/benet/be_cmds.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/emulex/benet/be_cmds.c b/drivers/net/ethernet/emulex/benet/be_cmds.c index 02dd5246dfae..1a49297224ed 100644 --- a/drivers/net/ethernet/emulex/benet/be_cmds.c +++ b/drivers/net/ethernet/emulex/benet/be_cmds.c @@ -103,7 +103,7 @@ static struct be_cmd_priv_map cmd_priv_map[] = { static bool be_cmd_allowed(struct be_adapter *adapter, u8 opcode, u8 subsystem) { int i; - int num_entries = sizeof(cmd_priv_map)/sizeof(struct be_cmd_priv_map); + int num_entries = ARRAY_SIZE(cmd_priv_map); u32 cmd_privileges = adapter->cmd_privileges; for (i = 0; i < num_entries; i++) -- 2.15.1
[PATCH] ixgbe: use ARRAY_SIZE for array sizing calculation on array buf
From: Colin Ian King Use the ARRAY_SIZE macro on array buf to determine size of the array. Improvement suggested by coccinelle. Signed-off-by: Colin Ian King --- drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c index cb7da5f9c4da..28c23ef2ec28 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_x550.c @@ -949,7 +949,7 @@ static s32 ixgbe_checksum_ptr_x550(struct ixgbe_hw *hw, u16 ptr, u16 length, bufsz, i, start; u16 *local_buffer; - bufsz = sizeof(buf) / sizeof(buf[0]); + bufsz = ARRAY_SIZE(buf); /* Read a chunk at the pointer location */ if (!buffer) { -- 2.15.1
Re: [PATCH] atm/clip: Use seq_puts() in svc_addr()
On Sat, Jan 6, 2018 at 11:44 PM, SF Markus Elfring wrote: > From: Markus Elfring > Date: Sat, 6 Jan 2018 22:34:12 +0100 > > Two strings should be quickly put into a sequence by two function calls. > Thus use the function "seq_puts" instead of "seq_printf". > > This issue was detected by using the Coccinelle software. > > Signed-off-by: Markus Elfring > --- > net/atm/clip.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/atm/clip.c b/net/atm/clip.c > index d4f6029d5109..62a852165b19 100644 > --- a/net/atm/clip.c > +++ b/net/atm/clip.c > @@ -708,11 +708,11 @@ static void svc_addr(struct seq_file *seq, struct > sockaddr_atmsvc *addr) > static int e164[] = { 1, 8, 4, 6, 1, 0 }; > > if (*addr->sas_addr.pub) { > - seq_printf(seq, "%s", addr->sas_addr.pub); > + seq_puts(seq, addr->sas_addr.pub); Which opens a lot of security concerns. Never do this again. > if (*addr->sas_addr.prv) > seq_putc(seq, '+'); > } else if (!*addr->sas_addr.prv) { > - seq_printf(seq, "%s", "(none)"); > + seq_puts(seq, "(none)"); ...while this one is okay per se, better to keep above pattern (same style over the piece of code / function). > return; > } > if (*addr->sas_addr.prv) { > -- > 2.15.1 > P.S. I'm wondering what would be first, Markus starts looking into the actual code, or most (all) of the maintainers just ban him. -- With Best Regards, Andy Shevchenko
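For readers wondering about the hazard class behind this exchange, here is a userspace illustration (function name is mine): a string that may contain '%' must reach printf-style functions only as a data argument, never as the format itself. puts-style functions take the string purely as data, so '%' is harmless there - the dangerous pattern is the classic printf(untrusted) bug, where the string is interpreted as a format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Safe: the format is the literal "%s", so untrusted is pure data.
 * The unsafe variant would be snprintf(out, n, untrusted), where a
 * stray "%s" or "%n" in the input drives the format machinery. */
static void fmt_as_data(char *out, size_t n, const char *untrusted)
{
	snprintf(out, n, "%s", untrusted);
}
```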
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sun, Jan 07, 2018 at 12:17:11PM -0800, Linus Torvalds wrote: > We need to fix the security problem, but we need to do it *without* > these braindead arguments that performance is somehow secondary. OK OK. At least we should have security by default and let people trade it against performance if they want and understand the risk. People never know when they're secure enough (security doesn't measure) but they know fairly well when they're not performant enough to take action (most often changing the machine). Willy
Re: [PATCH iproute2] ip fou: Pass family attribute as u8
Hi Stephen, > How is this binary compatiable with older versions. I'm not sure what you mean. Kernel expects family attribute to be passed as u8 (and it always did). But current ip passes family attribute as u16. It works now only on little endian systems because the relevant byte happens to be on the same position. It is the current version of ip which is incompatible. Filip
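Filip's endianness argument can be demonstrated concretely (helper name is mine). When userspace stores the family as a u16 but the kernel reads the FOU_ATTR_AF payload as a u8, the kernel sees only the first payload byte: on little-endian that is the low-order byte of the u16 (the family value, so it "works"), on big-endian it is the high-order byte (zero, so it breaks).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* What a u8-typed read of a netlink attribute sees: byte 0 of the payload */
static uint8_t attr_first_byte(const void *payload)
{
	uint8_t b;

	memcpy(&b, payload, 1);
	return b;
}
```

Passing the attribute as a genuine u8, as the patch does, makes the read independent of host byte order - which also answers the compatibility question: the kernel always expected a u8 here.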
Re: dvb usb issues since kernel 4.9
On Sat, Jan 6, 2018 at 11:54 AM, Mauro Carvalho Chehab wrote: > > Em Sat, 6 Jan 2018 16:04:16 +0100 > "Josef Griebichler" wrote: >> >> the causing commit has been identified. >> After reverting commit >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882 >> its working again. > > Just replying to me won't magically fix this. The ones that were involved on > this patch should also be c/c, plus USB people. Just added them. Actually, you seem to have added an odd subset of the people involved. For example, Ingo - who actually committed that patch - wasn't on the cc. I do think we need to simply revert that patch. It's very simple: it has been reported to lead to actual problems for people, and we don't fix one problem and then say "well, it fixed something else" when something breaks. When something breaks, we either unbreak it, or we revert the change that caused the breakage. It's really that simple. That's what "no regressions" means. We don't accept changes that cause regressions. This one did. Linus
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sun, 7 Jan 2018, Linus Torvalds wrote: > We need to fix the security problem, but we need to do it *without* > these braindead arguments that performance is somehow secondary. I surely agree, but we have gone the way of PTI without the ability of exempting individual processes exactly for one reason: Lack of time It can be done on top of the PTI implementation and it won't take ages. For spectre_v1/2 we face the same problem simply because we got informed so much ahead of time and we were all twiddling thumbs, enjoying our christmas vacation and having a good time. The exploits are out in the wild and they are observed already, so we really have to make a decision whether we want to fix that in the most obvious ways even if it hurts performance right now and then take a break from all that hell and sit down and sort the performance mess or whether we want to discuss the right way to fix it for the next 3 month and leave all doors open until the last bit of performance is squeezed out. I totally can understand the anger of those who learned all of this a few days ago and are now grasping straws to avoid the obviously coming performance hit, but its not our fault that we have to make a decision which cannot make everyone happy tomorrow. If the general sentiment is that we have to fix the performance issue first, then please let me know and I take 3 weeks of vacation right now. Thanks, tglx
Re: [PATCH v2 net-next] net: tracepoint: exposing sk_faimily in tracepoint inet_sock_set_state
> On Jan 6, 2018, at 10:31 PM, Yafang Shaowrote: > > As of now, there're two sk_family are traced with sock:inet_sock_set_state, > which are AF_INET and AF_INET6. > So the sk_family are exposed as well. > Then we can conveniently use it to do the filter. > > Both sk_family and sk_protocol are showed in the printk message, so we need > not expose them as tracepoint arguments. > > Suggested-by: Brendan Gregg > Suggested-by: Song Liu > Signed-off-by: Yafang Shao > --- > include/trace/events/sock.h | 15 +-- > 1 file changed, 13 insertions(+), 2 deletions(-) > > diff --git a/include/trace/events/sock.h b/include/trace/events/sock.h > index 3537c5f..3176a39 100644 > --- a/include/trace/events/sock.h > +++ b/include/trace/events/sock.h > @@ -11,7 +11,11 @@ > #include > #include > > -/* The protocol traced by sock_set_state */ > +#define family_names \ > + EM(AF_INET) \ > + EMe(AF_INET6) > + > +/* The protocol traced by inet_sock_set_state */ > #define inet_protocol_names \ > EM(IPPROTO_TCP) \ > EM(IPPROTO_DCCP)\ > @@ -37,6 +41,7 @@ > #define EM(a) TRACE_DEFINE_ENUM(a); > #define EMe(a) TRACE_DEFINE_ENUM(a); > > +family_names > inet_protocol_names > tcp_state_names > > @@ -45,6 +50,9 @@ > #define EM(a) { a, #a }, > #define EMe(a) { a, #a } > > +#define show_family_name(val)\ > + __print_symbolic(val, family_names) > + > #define show_inet_protocol_name(val)\ > __print_symbolic(val, inet_protocol_names) > > @@ -118,6 +126,7 @@ > __field(int, newstate) > __field(__u16, sport) > __field(__u16, dport) > + __field(__u16, family) > __field(__u8, protocol) > __array(__u8, saddr, 4) > __array(__u8, daddr, 4) > @@ -134,6 +143,7 @@ > __entry->oldstate = oldstate; > __entry->newstate = newstate; > > + __entry->family = sk->sk_family; > __entry->protocol = sk->sk_protocol; > __entry->sport = ntohs(inet->inet_sport); > __entry->dport = ntohs(inet->inet_dport); > @@ -160,7 +170,8 @@ > } > ), > > - TP_printk("protocol=%s sport=%hu dport=%hu saddr=%pI4 daddr=%pI4 > saddrv6=%pI6c 
daddrv6=%pI6c oldstate=%s newstate=%s", > + TP_printk("family=%s protocol=%s sport=%hu dport=%hu saddr=%pI4 > daddr=%pI4 saddrv6=%pI6c daddrv6=%pI6c oldstate=%s newstate=%s", > + show_family_name(__entry->family), > show_inet_protocol_name(__entry->protocol), > __entry->sport, __entry->dport, > __entry->saddr, __entry->daddr, > -- > 1.8.3.1 > Reviewed-by: Song Liu
Re: [PATCH iproute2] ip fou: Pass family attribute as u8
On Sun, 7 Jan 2018 15:28:13 +0100 Filip Moc wrote: > This fixes fou on big-endian systems. > > Signed-off-by: Filip Moc > --- > ip/ipfou.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/ip/ipfou.c b/ip/ipfou.c > index febc2c8c..1f392ade 100644 > --- a/ip/ipfou.c > +++ b/ip/ipfou.c > @@ -52,7 +52,7 @@ static int fou_parse_opt(int argc, char **argv, struct > nlmsghdr *n, > __u8 ipproto, type; > bool gue_set = false; > int ipproto_set = 0; > - unsigned short family = AF_INET; > + __u8 family = AF_INET; > > while (argc > 0) { > if (!matches(*argv, "port")) { > @@ -103,7 +103,7 @@ static int fou_parse_opt(int argc, char **argv, struct > nlmsghdr *n, > > addattr16(n, 1024, FOU_ATTR_PORT, port); > addattr8(n, 1024, FOU_ATTR_TYPE, type); > - addattr16(n, 1024, FOU_ATTR_AF, family); > + addattr8(n, 1024, FOU_ATTR_AF, family); > > if (ipproto_set) > addattr8(n, 1024, FOU_ATTR_IPPROTO, ipproto); How is this binary compatiable with older versions.
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sun, Jan 7, 2018 at 12:12 PM, Willy Tarreau wrote: > > Linus, no need to explain that to me, I'm precisely trying to see how > to disable PTI for a specific process because I face up to 45% loss in > certain circumstances, making it a no-go. But while a few of us have > very specific workloads emphasizing this impact, others have very > different ones and will not notice. For example my laptop did boot > pretty fine and I didn't notice anything until I fire a network > benchmark. Sure, most people have hardware where the bottleneck is entirely elsewhere (slow network, rotating disk, whatever). But this whole "normal people won't notice" is dangerous thinking. They may well notice very much, we simply don't know what they are doing. Quite honesty, it's equally correct to say "normal people won't be affected by the security issue in the first place". That laptop that you didn't have any issues with? Likely it never had an exploit running on it either! So the whole "normal people" argument is pure and utter garbage. It's wrong. It's pure shit when it comes to performance, but it's also pure shit when it comes to the security issue. Don't use it. We need to fix the security problem, but we need to do it *without* these braindead arguments that performance is somehow secondary. Linus
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sun, Jan 7, 2018 at 11:47 AM, Linus Torvalds wrote: > On Sat, Jan 6, 2018 at 10:33 PM, Willy Tarreau wrote: >> >> To be fair there's overreaction on both sides. The vast majority of >> users need to get a 100% safe system and will never notice any >> difference. > > There is no such thing as a "100% safe system". Never will be - unless > you make sure you have no users. > > Also, people definitely *are* noticing the performance issues with the > current set of patches, and they are causing real problems. Go search > for reports of Amazon AWS slowdowns. > > So this whole "security is so important that performance doesn't > matter" mindset is pure and utter garbage. > > And the whole "normal people won't even notice" is pure garbage too. > Don't spread that bullshit when you see actual normal people > complaining. > > Performance matters. A *LOT*. I'm thinking we should provide the option to at least build the hot-path nospec_array_ptr() usages without an lfence. CONFIG_SPECTRE1_PARANOIA_SAFE CONFIG_SPECTRE1_PARANOIA_PERF ...if only for easing performance testing and let the distribution set its policy. Where hot-path usages can do: nospec_relax(nospec_array_ptr()) ...to optionally ellide the lfence.
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sun, Jan 07, 2018 at 11:47:07AM -0800, Linus Torvalds wrote: > And the whole "normal people won't even notice" is pure garbage too. > Don't spread that bullshit when you see actual normal people > complaining. > > Performance matters. A *LOT*. Linus, no need to explain that to me, I'm precisely trying to see how to disable PTI for a specific process because I face up to 45% loss in certain circumstances, making it a no-go. But while a few of us have very specific workloads emphasizing this impact, others have very different ones and will not notice. For example my laptop did boot pretty fine and I didn't notice anything until I fire a network benchmark. Willy
[PATCH] hyperv/netvsc: Delete two error messages for a failed memory allocation in netvsc_init_buf()
From: Markus Elfring Date: Sun, 7 Jan 2018 21:03:26 +0100 Omit extra messages for a memory allocation failure in this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring --- drivers/net/hyperv/netvsc.c | 5 - 1 file changed, 5 deletions(-) diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 17e529af79dc..c1ec02f801f6 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -275,9 +275,6 @@ static int netvsc_init_buf(struct hv_device *device, net_device->recv_buf = vzalloc(buf_size); if (!net_device->recv_buf) { - netdev_err(ndev, - "unable to allocate receive buffer of size %u\n", - buf_size); ret = -ENOMEM; goto cleanup; } @@ -357,8 +354,6 @@ static int netvsc_init_buf(struct hv_device *device, net_device->send_buf = vzalloc(buf_size); if (!net_device->send_buf) { - netdev_err(ndev, "unable to allocate send buffer of size %u\n", - buf_size); ret = -ENOMEM; goto cleanup; } -- 2.15.1
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sat, Jan 6, 2018 at 10:33 PM, Willy Tarreau wrote: > > To be fair there's overreaction on both sides. The vast majority of > users need to get a 100% safe system and will never notice any > difference. There is no such thing as a "100% safe system". Never will be - unless you make sure you have no users. Also, people definitely *are* noticing the performance issues with the current set of patches, and they are causing real problems. Go search for reports of Amazon AWS slowdowns. So this whole "security is so important that performance doesn't matter" mindset is pure and utter garbage. And the whole "normal people won't even notice" is pure garbage too. Don't spread that bullshit when you see actual normal people complaining. Performance matters. A *LOT*. Linus
Re: [PATCH 07/18] [media] uvcvideo: prevent bounds-check bypass via speculative execution
On Sun, Jan 7, 2018 at 1:09 AM, Greg KH wrote: [..] > Sorry for the confusion, no, I don't mean the "taint tracking", I mean > the generic pattern of "speculative out of bounds access" that we are > fixing here. > > Yes, as you mentioned before, there are tons of false-positives in the > tree, as to find the real problems you have to show that userspace > controls the access index. But if we have a generic pattern that can > rewrite that type of logic into one where it does not matter at all > (i.e. like the ebpf proposed changes), then it would not be an issue if > they are false or not, we just rewrite them all to be safe. > > We need to find some way not only to fix these issues now (like you are > doing with this series), but to prevent them from ever coming back into > the codebase again. It's that second part that we need to keep in the > back of our minds here, while doing the first portion of this work. I understand the goal, but I'm not sure any of our current annotation mechanisms are suitable. We have: __attribute__((noderef, address_space(x))) ...for the '__user' annotation and other pointers that must not be de-referenced without a specific accessor. We also have: __attribute__((bitwise)) ...for values that should not be consumed directly without a specific conversion like endian swapping. The problem is that we need to see if a value derived from a userspace controlled input is used to trigger a chain of dependent reads. As far as I can see the annotation would need to be guided by taint analysis to be useful, at which point we can just "annotate" the problem spot with nospec_array_ptr(). Otherwise it seems the scope of a "__nospec_array_index" annotation would have a low signal to noise ratio. Stopping speculation past a uaccess_begin() boundary appears to handle a wide swath of potential problems, and the rest likely needs taint analysis, at least for now. All that to say, yes, we need better tooling and infrastructure going forward.
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
> everyone. I'm not saying this always happens, but it is reasonable to > let the iterative pushback see if we can get to better code in this > case rather than trying to cut it off with the "because *security*" > argument. > I'm not arguing otherwise - I'd just prefer most users' machines are secure while we have the discussion and while we see what other architectural tricks people can come up with. Alan
[PATCH net-next v3 RESEND 10/10] net: qualcomm: rmnet: Add support for GSO
Real devices may support scatter-gather (SG), so enable SG on rmnet devices to use GSO. GSO reduces CPU cycles by 20% for a rate of 146 Mbps for a single stream TCP connection. Signed-off-by: Subash Abhinov Kasiviswanathan --- drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c index f7f57ce..570a227 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c @@ -190,6 +190,7 @@ int rmnet_vnd_newlink(u8 id, struct net_device *rmnet_dev, rmnet_dev->hw_features = NETIF_F_RXCSUM; rmnet_dev->hw_features |= NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM; + rmnet_dev->hw_features |= NETIF_F_SG; rc = register_netdevice(rmnet_dev); if (!rc) { -- 1.9.1
[PATCH net-next v3 RESEND 08/10] net: qualcomm: rmnet: Handle command packets with checksum trailer
When using the MAPv4 packet format in conjunction with MAP commands, a dummy DL checksum trailer will be appended to the packet. Before this packet is sent out as an ACK, the DL checksum trailer needs to be removed. Signed-off-by: Subash Abhinov Kasiviswanathan--- drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c | 17 +++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c index 51e6049..6bc328f 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c @@ -58,11 +58,24 @@ static u8 rmnet_map_do_flow_control(struct sk_buff *skb, } static void rmnet_map_send_ack(struct sk_buff *skb, - unsigned char type) + unsigned char type, + struct rmnet_port *port) { struct rmnet_map_control_command *cmd; int xmit_status; + if (port->data_format & RMNET_INGRESS_FORMAT_MAP_CKSUMV4) { + if (skb->len < sizeof(struct rmnet_map_header) + + RMNET_MAP_GET_LENGTH(skb) + + sizeof(struct rmnet_map_dl_csum_trailer)) { + kfree_skb(skb); + return; + } + + skb_trim(skb, skb->len - +sizeof(struct rmnet_map_dl_csum_trailer)); + } + skb->protocol = htons(ETH_P_MAP); cmd = RMNET_MAP_GET_CMD_START(skb); @@ -100,5 +113,5 @@ void rmnet_map_command(struct sk_buff *skb, struct rmnet_port *port) break; } if (rc == RMNET_MAP_COMMAND_ACK) - rmnet_map_send_ack(skb, rc); + rmnet_map_send_ack(skb, rc, port); } -- 1.9.1
[PATCH net-next v3 RESEND 05/10] net: qualcomm: rmnet: Set pacing shift
The real device over which the rmnet devices are installed also aggregates multiple IP packets and sends them as a single large aggregate frame to the hardware. This causes degraded throughput for TCP TX due to bufferbloat. To overcome this problem, a pacing shift value of 8 is set using the sk_pacing_shift_update() helper. This value was determined based on experiments with a single stream TCP TX using iperf for a duration of 30s.

Pacing shift | Observed data rate (Mbps)
10 | 9
9 | 140
8 | 146 (Max link rate)

Signed-off-by: Subash Abhinov Kasiviswanathan --- drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c index 8e1f43a..8f8c4f2 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "rmnet_private.h" #include "rmnet_config.h" #include "rmnet_vnd.h" @@ -204,6 +205,8 @@ void rmnet_egress_handler(struct sk_buff *skb) struct rmnet_priv *priv; u8 mux_id; + sk_pacing_shift_update(skb->sk, 8); + orig_dev = skb->dev; priv = netdev_priv(orig_dev); skb->dev = priv->real_dev; -- 1.9.1
[PATCH net-next v3 RESEND 09/10] net: qualcomm: rmnet: Add support for TX checksum offload
TX checksum offload applies to TCP / UDP packets which are not fragmented using the MAPv4 checksum trailer. The following needs to be done to have checksum computed in hardware - 1. Set the checksum start offset and inset offset. 2. Set the csum_enabled bit 3. Compute and set 1's complement of partial checksum field in transport header. Signed-off-by: Subash Abhinov Kasiviswanathan--- .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 8 ++ drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h| 2 + .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 120 + drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c| 1 + 4 files changed, 131 insertions(+) diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c index 3409458..601edec 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c @@ -141,11 +141,19 @@ static int rmnet_map_egress_handler(struct sk_buff *skb, additional_header_len = 0; required_headroom = sizeof(struct rmnet_map_header); + if (port->data_format & RMNET_EGRESS_FORMAT_MAP_CKSUMV4) { + additional_header_len = sizeof(struct rmnet_map_ul_csum_header); + required_headroom += additional_header_len; + } + if (skb_headroom(skb) < required_headroom) { if (pskb_expand_head(skb, required_headroom, 0, GFP_KERNEL)) goto fail; } + if (port->data_format & RMNET_EGRESS_FORMAT_MAP_CKSUMV4) + rmnet_map_checksum_uplink_packet(skb, orig_dev); + map_header = rmnet_map_add_map_header(skb, additional_header_len, 0); if (!map_header) goto fail; diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h index ca9f473..6ce31e2 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h @@ -89,5 +89,7 @@ struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb, int hdrlen, int pad); void rmnet_map_command(struct sk_buff *skb, struct 
rmnet_port *port); int rmnet_map_checksum_downlink_packet(struct sk_buff *skb, u16 len); +void rmnet_map_checksum_uplink_packet(struct sk_buff *skb, + struct net_device *orig_dev); #endif /* _RMNET_MAP_H_ */ diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c index 881c1dc..c74a6c5 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c @@ -171,6 +171,86 @@ static __sum16 *rmnet_map_get_csum_field(unsigned char protocol, } #endif +static void rmnet_map_complement_ipv4_txporthdr_csum_field(void *iphdr) +{ + struct iphdr *ip4h = (struct iphdr *)iphdr; + void *txphdr; + u16 *csum; + + txphdr = iphdr + ip4h->ihl * 4; + + if (ip4h->protocol == IPPROTO_TCP || ip4h->protocol == IPPROTO_UDP) { + csum = (u16 *)rmnet_map_get_csum_field(ip4h->protocol, txphdr); + *csum = ~(*csum); + } +} + +static void +rmnet_map_ipv4_ul_csum_header(void *iphdr, + struct rmnet_map_ul_csum_header *ul_header, + struct sk_buff *skb) +{ + struct iphdr *ip4h = (struct iphdr *)iphdr; + __be16 *hdr = (__be16 *)ul_header, offset; + + offset = htons((__force u16)(skb_transport_header(skb) - +(unsigned char *)iphdr)); + ul_header->csum_start_offset = offset; + ul_header->csum_insert_offset = skb->csum_offset; + ul_header->csum_enabled = 1; + if (ip4h->protocol == IPPROTO_UDP) + ul_header->udp_ip4_ind = 1; + else + ul_header->udp_ip4_ind = 0; + + /* Changing remaining fields to network order */ + hdr++; + *hdr = htons((__force u16)*hdr); + + skb->ip_summed = CHECKSUM_NONE; + + rmnet_map_complement_ipv4_txporthdr_csum_field(iphdr); +} + +#if IS_ENABLED(CONFIG_IPV6) +static void rmnet_map_complement_ipv6_txporthdr_csum_field(void *ip6hdr) +{ + struct ipv6hdr *ip6h = (struct ipv6hdr *)ip6hdr; + void *txphdr; + u16 *csum; + + txphdr = ip6hdr + sizeof(struct ipv6hdr); + + if (ip6h->nexthdr == IPPROTO_TCP || ip6h->nexthdr == IPPROTO_UDP) { + csum = (u16 
*)rmnet_map_get_csum_field(ip6h->nexthdr, txphdr); + *csum = ~(*csum); + } +} + +static void +rmnet_map_ipv6_ul_csum_header(void *ip6hdr, + struct rmnet_map_ul_csum_header *ul_header, + struct sk_buff *skb) +{ + __be16 *hdr = (__be16 *)ul_header, offset; + + offset = htons((__force
[PATCH net-next v3 RESEND 06/10] net: qualcomm: rmnet: Define the MAPv4 packet formats
The MAPv4 packet format adds support for RX / TX checksum offload. For a bi-directional UDP stream at a rate of 570 / 146 Mbps, roughly 10% CPU cycles are saved. In the receive path, there is a checksum trailer appended to the end of the MAP packet. The valid field indicates if hardware has computed the checksum. csum_start_offset indicates the offset from the start of the IP header from which hardware has computed the checksum. csum_length is the number of bytes over which the checksum was computed and the resulting value is csum_value. In the transmit path, a header is inserted between the end of the MAP header and the start of the IP packet. csum_start_offset is the offset in bytes from which hardware will compute the checksum if the csum_enabled bit is set. udp_ip4_ind indicates if the checksum value of 0 is valid or not. csum_insert_offset is the offset from the csum_start_offset where hardware will insert the computed checksum. The use of this additional packet format for checksum offload is explained in subsequent patches.
Signed-off-by: Subash Abhinov Kasiviswanathan--- drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h | 16 drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h | 2 ++ 2 files changed, 18 insertions(+) diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h index ef0eff2..50c50cd 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h @@ -47,6 +47,22 @@ struct rmnet_map_header { u16 pkt_len; } __aligned(1); +struct rmnet_map_dl_csum_trailer { + u8 reserved1; + u8 valid:1; + u8 reserved2:7; + u16 csum_start_offset; + u16 csum_length; + __be16 csum_value; +} __aligned(1); + +struct rmnet_map_ul_csum_header { + __be16 csum_start_offset; + u16 csum_insert_offset:14; + u16 udp_ip4_ind:1; + u16 csum_enabled:1; +} __aligned(1); + #define RMNET_MAP_GET_MUX_ID(Y) (((struct rmnet_map_header *) \ (Y)->data)->mux_id) #define RMNET_MAP_GET_CD_BIT(Y) (((struct rmnet_map_header *) \ diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h index d214280..de0143e 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_private.h @@ -21,6 +21,8 @@ /* Constants */ #define RMNET_INGRESS_FORMAT_DEAGGREGATION BIT(0) #define RMNET_INGRESS_FORMAT_MAP_COMMANDS BIT(1) +#define RMNET_INGRESS_FORMAT_MAP_CKSUMV4BIT(2) +#define RMNET_EGRESS_FORMAT_MAP_CKSUMV4 BIT(3) /* Replace skb->dev to a virtual rmnet device and pass up the stack */ #define RMNET_EPMODE_VND (1) -- 1.9.1
[PATCH net-next v3 RESEND 02/10] net: qualcomm: rmnet: Remove invalid condition while stamping mux id
rmnet devices cannot have a mux id of 255. This is validated when assigning the mux id to the rmnet devices. As a result, checking for mux id 255 does not apply in egress path. Signed-off-by: Subash Abhinov Kasiviswanathan--- drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c index 0553932..b2d317e3 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c @@ -143,10 +143,7 @@ static int rmnet_map_egress_handler(struct sk_buff *skb, if (!map_header) goto fail; - if (mux_id == 0xff) - map_header->mux_id = 0; - else - map_header->mux_id = mux_id; + map_header->mux_id = mux_id; skb->protocol = htons(ETH_P_MAP); -- 1.9.1
[PATCH net-next v3 RESEND 04/10] net: qualcomm: rmnet: Rename ingress data format to data format
This is done so that we can use this field for both ingress and egress flags. Signed-off-by: Subash Abhinov Kasiviswanathan--- drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 10 +- drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h | 2 +- drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 5 ++--- 3 files changed, 8 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c index cedacdd..7e7704d 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c @@ -143,7 +143,7 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev, struct nlattr *tb[], struct nlattr *data[], struct netlink_ext_ack *extack) { - int ingress_format = RMNET_INGRESS_FORMAT_DEAGGREGATION; + u32 data_format = RMNET_INGRESS_FORMAT_DEAGGREGATION; struct net_device *real_dev; int mode = RMNET_EPMODE_VND; struct rmnet_endpoint *ep; @@ -185,11 +185,11 @@ static int rmnet_newlink(struct net *src_net, struct net_device *dev, struct ifla_vlan_flags *flags; flags = nla_data(data[IFLA_VLAN_FLAGS]); - ingress_format = flags->flags & flags->mask; + data_format = flags->flags & flags->mask; } - netdev_dbg(dev, "data format [ingress 0x%08X]\n", ingress_format); - port->ingress_data_format = ingress_format; + netdev_dbg(dev, "data format [0x%08X]\n", data_format); + port->data_format = data_format; return 0; @@ -353,7 +353,7 @@ static int rmnet_changelink(struct net_device *dev, struct nlattr *tb[], struct ifla_vlan_flags *flags; flags = nla_data(data[IFLA_VLAN_FLAGS]); - port->ingress_data_format = flags->flags & flags->mask; + port->data_format = flags->flags & flags->mask; } return 0; diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h index 2ea9fe3..00e4634 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h +++ 
b/drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h @@ -32,7 +32,7 @@ struct rmnet_endpoint { */ struct rmnet_port { struct net_device *dev; - u32 ingress_data_format; + u32 data_format; u8 nr_rmnet_devs; u8 rmnet_mode; struct hlist_head muxed_ep[RMNET_MAX_LOGICAL_EP]; diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c index b2d317e3..8e1f43a 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c @@ -69,8 +69,7 @@ static void rmnet_set_skb_proto(struct sk_buff *skb) u16 len; if (RMNET_MAP_GET_CD_BIT(skb)) { - if (port->ingress_data_format - & RMNET_INGRESS_FORMAT_MAP_COMMANDS) + if (port->data_format & RMNET_INGRESS_FORMAT_MAP_COMMANDS) return rmnet_map_command(skb, port); goto free_skb; @@ -114,7 +113,7 @@ static void rmnet_set_skb_proto(struct sk_buff *skb) skb_push(skb, ETH_HLEN); } - if (port->ingress_data_format & RMNET_INGRESS_FORMAT_DEAGGREGATION) { + if (port->data_format & RMNET_INGRESS_FORMAT_DEAGGREGATION) { while ((skbn = rmnet_map_deaggregate(skb)) != NULL) __rmnet_map_ingress_handler(skbn, port); -- 1.9.1
[PATCH net-next v3 RESEND 07/10] net: qualcomm: rmnet: Add support for RX checksum offload
When using the MAPv4 packet format, receive checksum offload can be enabled in hardware. The checksum computation over the pseudo header is not offloaded but the rest of the checksum computation over the payload is offloaded. This applies only for TCP / UDP packets which are not fragmented. rmnet validates the TCP/UDP checksum for the packet using the checksum from the checksum trailer added to the packet by hardware. The validation performed is as follows:
1. Perform 1's complement over the checksum value from the trailer.
2. Compute the 1's complement checksum over the IPv4 / IPv6 header and subtract it from the value from step 1.
3. Compute the 1's complement checksum over the IPv4 / IPv6 pseudo header and add it to the value from step 2.
4. Subtract the checksum value from the TCP / UDP header from the value from step 3.
5. Compare the value from step 4 to the checksum value from the TCP / UDP header.
6. If the comparison in step 5 succeeds, CHECKSUM_UNNECESSARY is set and the packet is passed on to the network stack. If there is a failure, then the packet is passed on as such without modifying the ip_summed field.
The checksum field is also checked for UDP checksum 0 as per RFC 768 and for an unexpected TCP checksum of 0. If checksum offload is disabled when using the MAPv4 packet format in the receive path, the packet is queued to the network stack as is, without the validations above.
Signed-off-by: Subash Abhinov Kasiviswanathan--- .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 15 +- drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h| 4 +- .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 186 - drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c| 2 + 4 files changed, 201 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c index 8f8c4f2..3409458 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_handlers.c @@ -66,8 +66,8 @@ static void rmnet_set_skb_proto(struct sk_buff *skb) struct rmnet_port *port) { struct rmnet_endpoint *ep; + u16 len, pad; u8 mux_id; - u16 len; if (RMNET_MAP_GET_CD_BIT(skb)) { if (port->data_format & RMNET_INGRESS_FORMAT_MAP_COMMANDS) @@ -77,7 +77,8 @@ static void rmnet_set_skb_proto(struct sk_buff *skb) } mux_id = RMNET_MAP_GET_MUX_ID(skb); - len = RMNET_MAP_GET_LENGTH(skb) - RMNET_MAP_GET_PAD(skb); + pad = RMNET_MAP_GET_PAD(skb); + len = RMNET_MAP_GET_LENGTH(skb) - pad; if (mux_id >= RMNET_MAX_LOGICAL_EP) goto free_skb; @@ -90,8 +91,14 @@ static void rmnet_set_skb_proto(struct sk_buff *skb) /* Subtract MAP header */ skb_pull(skb, sizeof(struct rmnet_map_header)); - skb_trim(skb, len); rmnet_set_skb_proto(skb); + + if (port->data_format & RMNET_INGRESS_FORMAT_MAP_CKSUMV4) { + if (!rmnet_map_checksum_downlink_packet(skb, len + pad)) + skb->ip_summed = CHECKSUM_UNNECESSARY; + } + + skb_trim(skb, len); rmnet_deliver_skb(skb); return; @@ -115,7 +122,7 @@ static void rmnet_set_skb_proto(struct sk_buff *skb) } if (port->data_format & RMNET_INGRESS_FORMAT_DEAGGREGATION) { - while ((skbn = rmnet_map_deaggregate(skb)) != NULL) + while ((skbn = rmnet_map_deaggregate(skb, port)) != NULL) __rmnet_map_ingress_handler(skbn, port); consume_skb(skb); diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h index 50c50cd..ca9f473 
100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h @@ -83,9 +83,11 @@ struct rmnet_map_ul_csum_header { #define RMNET_MAP_NO_PAD_BYTES0 #define RMNET_MAP_ADD_PAD_BYTES 1 -struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb); +struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb, + struct rmnet_port *port); struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb, int hdrlen, int pad); void rmnet_map_command(struct sk_buff *skb, struct rmnet_port *port); +int rmnet_map_checksum_downlink_packet(struct sk_buff *skb, u16 len); #endif /* _RMNET_MAP_H_ */ diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c index 978ce26..881c1dc 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c @@ -14,6 +14,9 @@ */ #include +#include +#include +#include #include "rmnet_config.h" #include "rmnet_map.h" #include "rmnet_private.h" @@ -21,6
[PATCH net-next v3 RESEND 03/10] net: qualcomm: rmnet: Remove unused function declaration
rmnet_map_demultiplex() is only declared but not defined anywhere, so remove it. Signed-off-by: Subash Abhinov Kasiviswanathan--- drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h index 4df359d..ef0eff2 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h @@ -67,7 +67,6 @@ struct rmnet_map_header { #define RMNET_MAP_NO_PAD_BYTES0 #define RMNET_MAP_ADD_PAD_BYTES 1 -u8 rmnet_map_demultiplex(struct sk_buff *skb); struct sk_buff *rmnet_map_deaggregate(struct sk_buff *skb); struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb, int hdrlen, int pad); -- 1.9.1
[PATCH net-next v3 RESEND 00/10] net: qualcomm: rmnet: Enable csum offloads
This series introduces the MAPv4 packet format for checksum offload plus some other minor changes. Patches 1-3 are cleanups. Patch 4 renames the ingress format to data format so that all data formats can be configured using this going forward. Patch 5 uses the pacing helper to improve TCP transmit performance. Patches 6-9 define the MAPv4 for checksum offload for RX and TX. A new header and trailer format are used as part of MAPv4. For RX checksum offload, only the 1's complement of the IP payload portion is computed by hardware. The metadata from the RX header is used to verify the checksum field in the packet. Note that the IP packet and its fields themselves are not modified by hardware. This gives metadata to help with the RX checksum. For TX, the required metadata is filled up so hardware can compute the checksum. Patch 10 enables GSO on rmnet devices. v1->v2: Fix sparse errors reported by kbuild test robot v2->v3: Update the commit message for Patch 5 based on Eric's comments Subash Abhinov Kasiviswanathan (10): net: qualcomm: rmnet: Remove redundant check when stamping map header net: qualcomm: rmnet: Remove invalid condition while stamping mux id net: qualcomm: rmnet: Remove unused function declaration net: qualcomm: rmnet: Rename ingress data format to data format net: qualcomm: rmnet: Set pacing shift net: qualcomm: rmnet: Define the MAPv4 packet formats net: qualcomm: rmnet: Add support for RX checksum offload net: qualcomm: rmnet: Handle command packets with checksum trailer net: qualcomm: rmnet: Add support for TX checksum offload net: qualcomm: rmnet: Add support for GSO drivers/net/ethernet/qualcomm/rmnet/rmnet_config.c | 10 +- drivers/net/ethernet/qualcomm/rmnet/rmnet_config.h | 2 +- .../net/ethernet/qualcomm/rmnet/rmnet_handlers.c | 36 ++- drivers/net/ethernet/qualcomm/rmnet/rmnet_map.h| 23 +- .../ethernet/qualcomm/rmnet/rmnet_map_command.c| 17 +- .../net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 309 - .../net/ethernet/qualcomm/rmnet/rmnet_private.h| 2 + 
drivers/net/ethernet/qualcomm/rmnet/rmnet_vnd.c| 4 + 8 files changed, 378 insertions(+), 25 deletions(-) -- 1.9.1
[PATCH net-next v3 RESEND 01/10] net: qualcomm: rmnet: Remove redundant check when stamping map header
We already check the headroom once in rmnet_map_egress_handler(), so this is not needed. Signed-off-by: Subash Abhinov Kasiviswanathan--- drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c index 86b8c75..978ce26 100644 --- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c +++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c @@ -32,9 +32,6 @@ struct rmnet_map_header *rmnet_map_add_map_header(struct sk_buff *skb, u32 padding, map_datalen; u8 *padbytes; - if (skb_headroom(skb) < sizeof(struct rmnet_map_header)) - return NULL; - map_datalen = skb->len - hdrlen; map_header = (struct rmnet_map_header *) skb_push(skb, sizeof(struct rmnet_map_header)); -- 1.9.1
Re: [iproute2 1/1] tc: Fix filter protocol output
Sun, Jan 07, 2018 at 03:43:28PM CET, j...@mojatatu.com wrote: >From: Jamal Hadi Salim> >Fixes: 249284ff5a44 ("tc: jsonify filter core") >Signed-off-by: Jamal Hadi Salim >--- > tc/tc_filter.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > >diff --git a/tc/tc_filter.c b/tc/tc_filter.c >index 545cc3a..546311a 100644 >--- a/tc/tc_filter.c >+++ b/tc/tc_filter.c >@@ -276,7 +276,7 @@ int print_filter(const struct sockaddr_nl *who, struct >nlmsghdr *n, void *arg) > if (!filter_protocol || filter_protocol != f_proto) { > if (f_proto) { > SPRINT_BUF(b1); >- print_string(PRINT_JSON, "protocol", >+ print_string(PRINT_ANY, "protocol", ha! Thanks! Acked-by: Jiri Pirko
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sun, 7 Jan 2018, James Bottomley wrote: > On Sat, 2018-01-06 at 20:36 -0500, David Miller wrote: > > From: Willy Tarreau> > Date: Sat, 6 Jan 2018 21:42:29 +0100 > > > > > On Sat, Jan 06, 2018 at 06:38:59PM +, Alan Cox wrote: > > >> Normally people who propose security fixes don't have to argue > > about the > > >> fact they added 30 clocks to avoid your box being 0wned. > > > > > > In fact it depends, because if a fix makes the system unusable for > > its > > > initial purpose, this fix will simply not be deployed at all, which > > is > > > the worst that can happen. > > > > +1 > > > > I completely agree with Willy and Alexei. > > > > And the scale isn't even accurate, we're talking about at least > > hundreds upon hundreds of clocks, not 30, if we add an operation > > whose side effect is to wait for all pending loads to complete. So > > yeah this is going to be heavily scrutinized. > > Plus this is the standard kernel code review MO: we've never blindly > accepted code just because *security* (otherwise we'd have grsec in by > now). We use the pushback to get better and more performant code. > What often happens is it turns out that the "either security or > performance" position was a false dichotomy and there is a way of > fixing stuff that's acceptable (although not usually perfect) for > everyone. I'm not saying this always happens, but it is reasonable to > let the iterative pushback see if we can get to better code in this > case rather than trying to cut it off with the "because *security*" > argument. In principle I agree, though there are a few things to consider: 1) We do not have the time to discuss that for the next 6 months 2) Alexei's analysis is purely based on the public information of the google zero folks. If it were complete and the only attack vector, all fine. If not, and I doubt it is, we're going to regret this decision faster than we made it, and this is not the kind of playing field where we can afford that. 
The whole 'we know better and performance is key' attitude is what led to this disaster in the first place. We should finally start to learn. Can we please stop that and live with the extra cycles for a few months, up to the point where we get more informed answers to all these questions? Thanks, tglx
Re: [PATCH net] ipv6: remove null_entry before adding default route
On Sat, Jan 6, 2018 at 10:16 PM, Martin KaFai Lauwrote: > On Sat, Jan 06, 2018 at 05:41:28PM -0800, Wei Wang wrote: >> On Fri, Jan 5, 2018 at 11:42 PM, Martin KaFai Lau wrote: >> > On Fri, Jan 05, 2018 at 05:38:35PM -0800, Wei Wang wrote: >> >> From: Wei Wang >> >> >> >> In the current code, when creating a new fib6 table, tb6_root.leaf gets >> >> initialized to net->ipv6.ip6_null_entry. >> >> If a default route is being added with rt->rt6i_metric = 0x, >> >> fib6_add() will add this route after net->ipv6.ip6_null_entry. As >> >> null_entry is shared, it could cause problem. >> >> >> >> In order to fix it, set fn->leaf to NULL before calling >> >> fib6_add_rt2node() when trying to add the first default route. >> >> And reset fn->leaf to null_entry when adding fails or when deleting the >> >> last default route. >> >> >> >> syzkaller reported the following issue which is fixed by this commit: >> >> = >> >> WARNING: suspicious RCU usage >> >> 4.15.0-rc5+ #171 Not tainted >> >> - >> >> net/ipv6/ip6_fib.c:1702 suspicious rcu_dereference_protected() usage! 
>> >> >> >> other info that might help us debug this: >> >> >> >> rcu_scheduler_active = 2, debug_locks = 1 >> >> 4 locks held by swapper/0/0: >> >> #0: ((>ipv6.ip6_fib_timer)){+.-.}, at: [ ] >> >> lockdep_copy_map include/linux/lockdep.h:178 [inline] >> >> #0: ((>ipv6.ip6_fib_timer)){+.-.}, at: [ ] >> >> call_timer_fn+0x1c6/0x820 kernel/time/timer.c:1310 >> >> #1: (&(>ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<2ff9d65c>] >> >> spin_lock_bh include/linux/spinlock.h:315 [inline] >> >> #1: (&(>ipv6.fib6_gc_lock)->rlock){+.-.}, at: [<2ff9d65c>] >> >> fib6_run_gc+0x9d/0x3c0 net/ipv6/ip6_fib.c:2007 >> >> #2: (rcu_read_lock){}, at: [<91db762d>] >> >> __fib6_clean_all+0x0/0x3a0 net/ipv6/ip6_fib.c:1560 >> >> #3: (&(>tb6_lock)->rlock){+.-.}, at: [<9e503581>] >> >> spin_lock_bh include/linux/spinlock.h:315 [inline] >> >> #3: (&(>tb6_lock)->rlock){+.-.}, at: [<9e503581>] >> >> __fib6_clean_all+0x1d0/0x3a0 net/ipv6/ip6_fib.c:1948 >> >> >> >> stack backtrace: >> >> CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.15.0-rc5+ #171 >> >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS >> >> Google 01/01/2011 >> >> Call Trace: >> >> >> >> __dump_stack lib/dump_stack.c:17 [inline] >> >> dump_stack+0x194/0x257 lib/dump_stack.c:53 >> >> lockdep_rcu_suspicious+0x123/0x170 kernel/locking/lockdep.c:4585 >> >> fib6_del+0xcaa/0x11b0 net/ipv6/ip6_fib.c:1701 >> >> fib6_clean_node+0x3aa/0x4f0 net/ipv6/ip6_fib.c:1892 >> >> fib6_walk_continue+0x46c/0x8a0 net/ipv6/ip6_fib.c:1815 >> >> fib6_walk+0x91/0xf0 net/ipv6/ip6_fib.c:1863 >> >> fib6_clean_tree+0x1e6/0x340 net/ipv6/ip6_fib.c:1933 >> >> __fib6_clean_all+0x1f4/0x3a0 net/ipv6/ip6_fib.c:1949 >> >> fib6_clean_all net/ipv6/ip6_fib.c:1960 [inline] >> >> fib6_run_gc+0x16b/0x3c0 net/ipv6/ip6_fib.c:2016 >> >> fib6_gc_timer_cb+0x20/0x30 net/ipv6/ip6_fib.c:2033 >> >> call_timer_fn+0x228/0x820 kernel/time/timer.c:1320 >> >> expire_timers kernel/time/timer.c:1357 [inline] >> >> __run_timers+0x7ee/0xb70 kernel/time/timer.c:1660 >> >> 
run_timer_softirq+0x4c/0xb0 kernel/time/timer.c:1686 >> >> __do_softirq+0x2d7/0xb85 kernel/softirq.c:285 >> >> invoke_softirq kernel/softirq.c:365 [inline] >> >> irq_exit+0x1cc/0x200 kernel/softirq.c:405 >> >> exiting_irq arch/x86/include/asm/apic.h:540 [inline] >> >> smp_apic_timer_interrupt+0x16b/0x700 arch/x86/kernel/apic/apic.c:1052 >> >> apic_timer_interrupt+0xa9/0xb0 arch/x86/entry/entry_64.S:904 >> >> >> >> >> >> Reported-by: syzbot >> >> Fixes: 66f5d6ce53e6 ("ipv6: replace rwlock with rcu and spinlock in >> >> fib6_table") >> >> Signed-off-by: Wei Wang >> >> --- >> >> net/ipv6/ip6_fib.c | 45 +++-- >> >> 1 file changed, 35 insertions(+), 10 deletions(-) >> >> >> >> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c >> >> index d11a5578e4f8..37cb4ad1ea29 100644 >> >> --- a/net/ipv6/ip6_fib.c >> >> +++ b/net/ipv6/ip6_fib.c >> >> @@ -640,6 +640,11 @@ static struct fib6_node *fib6_add_1(struct net *net, >> >> if (!(fn->fn_flags & RTN_RTINFO)) { >> >> RCU_INIT_POINTER(fn->leaf, NULL); >> >> rt6_release(leaf); >> >> + /* remove null_entry in the root node */ >> >> + } else if (fn->fn_flags & RTN_TL_ROOT && >> >> +rcu_access_pointer(fn->leaf) == >> >> +net->ipv6.ip6_null_entry) { >> >> + RCU_INIT_POINTER(fn->leaf, NULL); >> > It seems the reader side could see tb6_root.leaf == NULL after >> > this change and I think it should be fine? If it is, instead >> > of
Re: [PATCH net-next 00/18] ipv6: Align nexthop behaviour with IPv4
On 1/7/18 3:45 AM, Ido Schimmel wrote: > This set tries to eliminate some differences between IPv4's and IPv6's > treatment of nexthops. These differences are most likely a side effect > of IPv6's data structures (specifically 'rt6_info') that incorporate > both the route and the nexthop and the late addition of ECMP support in > commit 51ebd3181572 ("ipv6: add support of equal cost multipath > (ECMP)"). > > IPv4 and IPv6 do not react the same to certain netdev events. For > example, upon carrier change affected IPv4 nexthops are marked using the > RTNH_F_LINKDOWN flag and the nexthop group is rebalanced accordingly. > IPv6 on the other hand, does nothing which forces us to perform a > carrier check during route lookup and dump. This makes it difficult to > introduce features such as non-equal-cost multipath that are built on > top of this set [1]. > > In addition, when a netdev is put administratively down IPv4 nexthops > are marked using the RTNH_F_DEAD flag, whereas IPv6 simply flushes all > the routes using these nexthops. To be consistent with IPv4, multipath > routes should only be flushed when all nexthops in the group are > considered dead. > > The first 12 patches introduce non-functional changes that store the > RTNH_F_DEAD and RTNH_F_LINKDOWN flags in IPv6 routes based on netdev > events, in a similar fashion to IPv4. This allows us to remove the > carrier check performed during route lookup and dump. > > The next three patches make sure we only flush a multipath route when > all of its nexthops are dead. > > Last three patches add test cases for IPv4/IPv6 FIB. These verify that > both address families react similarly to netdev events. > > Finally, this series also serves as a good first step towards David > Ahern's goal of treating nexthops as standalone objects [2], as it makes > the code more in line with IPv4 where the nexthop and the nexthop group > are separate objects from the route itself. > > 1. 
https://github.com/idosch/linux/tree/ipv6-nexthops > 2. http://vger.kernel.org/netconf2017_files/nexthop-objects.pdf > Thanks for working on this - and creating the test cases. One of many follow on changes that would be beneficial is to remove the idev dereference in the hot path to check the ignore_routes_with_linkdown sysctl.
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
On Sat, 2018-01-06 at 20:36 -0500, David Miller wrote: > From: Willy Tarreau > Date: Sat, 6 Jan 2018 21:42:29 +0100 > > > On Sat, Jan 06, 2018 at 06:38:59PM +, Alan Cox wrote: > >> Normally people who propose security fixes don't have to argue > about the > >> fact they added 30 clocks to avoid your box being 0wned. > > > > In fact it depends, because if a fix makes the system unusable for > its > > initial purpose, this fix will simply not be deployed at all, which > is > > the worst that can happen. > > +1 > > I completely agree with Willy and Alexei. > > And the scale isn't even accurate, we're talking about at least > hundreds upon hundreds of clocks, not 30, if we add an operation > whose side effect is to wait for all pending loads to complete. So > yeah this is going to be heavily scrutinized. Plus this is the standard kernel code review MO: we've never blindly accepted code just because *security* (otherwise we'd have grsec in by now). We use the pushback to get better and more performant code. What often happens is it turns out that the "either security or performance" position was a false dichotomy and there is a way of fixing stuff that's acceptable (although not usually perfect) for everyone. I'm not saying this always happens, but it is reasonable to let the iterative pushback see if we can get to better code in this case rather than trying to cut it off with the "because *security*" argument. James
Re: [PATCH net-next 18/18] selftests: fib_tests: Add test cases for netdev carrier change
On 1/7/18 3:45 AM, Ido Schimmel wrote: > Check that IPv4 and IPv6 react the same when the carrier of a netdev is > toggled. Local routes should not be affected by this, whereas unicast > routes should. > > Signed-off-by: Ido Schimmel> --- > tools/testing/selftests/net/fib_tests.sh | 142 > +++ > 1 file changed, 142 insertions(+) Acked-by: David Ahern
Re: [PATCH net-next 17/18] selftests: fib_tests: Add test cases for netdev down
On 1/7/18 3:45 AM, Ido Schimmel wrote: > Check that IPv4 and IPv6 react the same when a netdev is being put > administratively down. > > Signed-off-by: Ido Schimmel> --- > tools/testing/selftests/net/fib_tests.sh | 141 > +++ > 1 file changed, 141 insertions(+) > Acked-by: David Ahern
Re: [PATCH net-next 16/18] selftests: fib_tests: Add test cases for IPv4/IPv6 FIB
On 1/7/18 3:45 AM, Ido Schimmel wrote: > Add test cases to check that IPv4 and IPv6 react to a netdev being > unregistered as expected. > > Signed-off-by: Ido Schimmel> --- > tools/testing/selftests/net/Makefile | 1 + > tools/testing/selftests/net/fib_tests.sh | 146 > +++ > 2 files changed, 147 insertions(+) > create mode 100755 tools/testing/selftests/net/fib_tests.sh Acked-by: David Ahern FYI: "ip -netns testns" is more efficient than "ip netns exec testns ip".
Re: [PATCH net-next 15/18] ipv6: Flush multipath routes when all siblings are dead
On 1/7/18 3:45 AM, Ido Schimmel wrote: > By default, IPv6 deletes nexthops from a multipath route when the > nexthop device is put administratively down. This differs from IPv4 > where the nexthops are kept, but marked with the RTNH_F_DEAD flag. A > multipath route is flushed when all of its nexthops become dead. > > Align IPv6 with IPv4 and have it conform to the same guidelines. > > In case the multipath route needs to be flushed, its siblings are > flushed one by one. Otherwise, the nexthops are marked with the > appropriate flags and the tree walker is instructed to skip all the > siblings. > > As explained in previous patches, care is taken to update the sernum of > the affected tree nodes, so as to prevent the use of wrong dst entries. > > Signed-off-by: Ido Schimmel> --- > net/ipv6/route.c | 83 > ++-- > 1 file changed, 75 insertions(+), 8 deletions(-) Acked-by: David Ahern
Re: [PATCH net-next 14/18] ipv6: Take table lock outside of sernum update function
On 1/7/18 3:45 AM, Ido Schimmel wrote: > The next patch is going to allow dead routes to remain in the FIB tree > in certain situations. > > When this happens we need to be sure to bump the sernum of the nodes > where these are stored so that potential copies cached in sockets are > invalidated. > > The function that performs this update assumes the table lock is not > taken when it is invoked, but that will not be the case when it is > invoked by the tree walker. > > Have the function assume the lock is taken and make the single caller > take the lock itself. > > Signed-off-by: Ido Schimmel> --- > net/ipv6/ip6_fib.c | 5 + > net/ipv6/route.c | 2 ++ > 2 files changed, 3 insertions(+), 4 deletions(-) > Acked-by: David Ahern
Re: [PATCH net-next 13/18] ipv6: Export sernum update function
On 1/7/18 3:45 AM, Ido Schimmel wrote: > We are going to allow dead routes to stay in the FIB tree (e.g., when > they are part of a multipath route, directly connected route with no > carrier) and revive them when their nexthop device gains carrier or when > it is put administratively up. > > This is equivalent to the addition of the route to the FIB tree and we > should therefore take care of updating the sernum of all the parent > nodes of the node where the route is stored. Otherwise, we risk sockets > caching and using sub-optimal dst entries. > > Export the function that performs the above, so that it could be invoked > from fib6_ifup() later on. > > Signed-off-by: Ido Schimmel> --- > include/net/ip6_fib.h | 1 + > net/ipv6/ip6_fib.c| 11 --- > 2 files changed, 9 insertions(+), 3 deletions(-) > Acked-by: David Ahern
Re: [PATCH net-next 12/18] ipv6: Teach tree walker to skip multipath routes
On 1/7/18 3:45 AM, Ido Schimmel wrote: > As explained in previous patch, fib6_ifdown() needs to consider the > state of all the sibling routes when a multipath route is traversed. > > This is done by evaluating all the siblings when the first sibling in a > multipath route is traversed. If the multipath route does not need to be > flushed (e.g., not all siblings are dead), then we should just skip the > multipath route as our work is done. > > Have the tree walker jump to the last sibling when it is determined that > the multipath route needs to be skipped. > > Signed-off-by: Ido Schimmel> --- > net/ipv6/ip6_fib.c | 11 +-- > 1 file changed, 9 insertions(+), 2 deletions(-) > Acked-by: David Ahern
Re: [PATCH net-next 11/18] ipv6: Add explicit flush indication to routes
On 1/7/18 3:45 AM, Ido Schimmel wrote: > When routes that are a part of a multipath route are evaluated by > fib6_ifdown() in response to NETDEV_DOWN and NETDEV_UNREGISTER events > the state of their sibling routes is not considered. > > This will change in subsequent patches in order to align IPv6 with > IPv4's behavior. For example, when the last sibling in a multipath route > becomes dead, the entire multipath route needs to be removed. > > To prevent the tree walker from re-evaluating all the sibling routes > each time, we can simply evaluate them once - when the first sibling is > traversed. > > If we determine the entire multipath route needs to be removed, then the > 'should_flush' bit is set in all the siblings, which will cause the > walker to flush them when it traverses them. > > Signed-off-by: Ido Schimmel> --- > include/net/ip6_fib.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) Acked-by: David Ahern
Re: [PATCH net-next 10/18] ipv6: Report dead flag during route dump
On 1/7/18 3:45 AM, Ido Schimmel wrote: > Up until now the RTNH_F_DEAD flag was only reported in route dump when > the 'ignore_routes_with_linkdown' sysctl was set. This is expected as > dead routes were flushed otherwise. > > The reliance on this sysctl is going to be removed, so we need to report > the flag regardless of the sysctl's value. > > Signed-off-by: Ido Schimmel> --- > net/ipv6/route.c | 3 +++ > 1 file changed, 3 insertions(+) Acked-by: David Ahern
Re: [PATCH net-next 09/18] ipv6: Ignore dead routes during lookup
On 1/7/18 3:45 AM, Ido Schimmel wrote: > Currently, dead routes are only present in the routing tables in case > the 'ignore_routes_with_linkdown' sysctl is set. Otherwise, they are > flushed. > > Subsequent patches are going to remove the reliance on this sysctl and > make IPv6 more consistent with IPv4. > > Before this is done, we need to make sure dead routes are skipped during > route lookup, so as to not cause packet loss. > > Signed-off-by: Ido Schimmel> --- > net/ipv6/route.c | 18 ++ > 1 file changed, 14 insertions(+), 4 deletions(-) Acked-by: David Ahern
Aw: Re: dvb usb issues since kernel 4.9
Hi, here I provide lsusb from my affected hardware (technotrend s2-4600). http://ix.io/DLY With this hardware I had errors when recording with tvheadend. Livetv was ok, only channel switching made some problems sometimes. Please see attached tvheadend service logs. I also provide dmesg (libreelec on rpi3 with kernel 4.14.10 with revert of the mentioned commit). http://ix.io/DM2 Regards Josef Sent: Sunday, 7 January 2018 at 16:41 From: "Alan Stern" To: "Mauro Carvalho Chehab" Cc: "Josef Griebichler" , "Greg Kroah-Hartman" , linux-...@vger.kernel.org, "Eric Dumazet" , "Rik van Riel" , "Paolo Abeni" , "Hannes Frederic Sowa" , "Jesper Dangaard Brouer" , linux-kernel , netdev , "Jonathan Corbet" , LMML , "Peter Zijlstra" , "David Miller" , torva...@linux-foundation.org Subject: Re: dvb usb issues since kernel 4.9 On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote: > > > It seems that the original patch were designed to solve some IRQ issues > > > with network cards with causes data losses on high traffic. However, > > > it is also causing bad effects on sustained high bandwidth demands > > > required by DVB cards, at least on some USB host drivers. > > > > > > Alan/Greg/Eric/David: > > > > > > Any ideas about how to fix it without causing regressions to > > > network? > > > > It would be good to know what hardware was involved on the x86 system > > and to have some timing data. Can we see the output from lsusb and > > usbmon, running on a vanilla kernel that gets plenty of video glitches? > > From Josef's report, and from the BZ, the affected hardware seems > to be based on Montage Technology M88DS3103/M88TS2022 chipset. What type of USB host controller does the x86_64 system use? EHCI or xHCI? > The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c, > with shares a USB implementation that is used by a lot more drivers. 
> The URB handling code is at: > > drivers/media/usb/dvb-usb-v2/usb_urb.c > > This particular driver allocates 8 buffers with 4096 bytes each > for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP. > > This become a popular USB hardware nowadays. I have one S960c > myself, so I can send you the lsusb from it. You should notice, however, > that a DVB-C/DVB-S2 channel can easily provide very high sustained bit > rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mbps > of payload after removing URB headers. You mentioned earlier that the driver uses bulk transfers. In USB-2.0, the maximum possible payload data transfer rate using bulk transfers is 53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s). And even this is possible only if almost nothing else is using the bus at the same time. > A 10 minutes record with the > entire data (with typically contains 5-10 channels) can easily go > above 4 GB, just to reproduce 1-2 glitches. So, I'm not sure if > a usbmon dump would be useful. It might not be helpful at all. However, I'm not interested in the payload data (which would be unintelligible to me anyway) but rather the timing of URB submissions and completions. A usbmon trace which didn't keep much of the payload data would only require on the order of 50 MB per minute -- and Josef said that glitches usually would show up within a minute or so. > I'm enclosing the lsusb from a S960C device, with is based on those > Montage chipsets: What I wanted to see was the output from "lsusb" on the affected system, not the output from "lsusb -v -s B:D" on your system. > > Overall, this may be a very difficult problem to solve. The > > 4cd13c21b207 commit was intended to improve throughput at the cost of > > increased latency. But then what do you do when the latency becomes > > too high for the video subsystem to handle? > > Latency can't be too high, otherwise frames will be dropped. Yes, that's the whole point. 
> Even if the Kernel itself doesn't drop, if the delay goes higher > than a certain threshold, userspace will need to drop, as it > should be presenting audio and video on real time. Yet, typically, > userspace will delay it by one or two seconds, with would mean > 1500-3500 buffers, with I suspect it is a lot more than the hardware > limits. So I suspect that the hardware starves free buffers a way > before userspace, as media hardware don't have unlimited buffers > inside them, as they assume that the Kernel/userspace will be fast > enough to sustain bit rates up to 66 Mbps of payload. The timing information would tell us how large the latency is. In any case, you might be able to attack the problem simply by using more than 8
Re: [PATCH net-next 08/18] ipv6: Check nexthop flags in route dump instead of carrier
On 1/7/18 3:45 AM, Ido Schimmel wrote: > Similar to previous patch, there is no need to check for the carrier of > the nexthop device when dumping the route and we can instead check for > the presence of the RTNH_F_LINKDOWN flag. > > Signed-off-by: Ido Schimmel> --- > net/ipv6/route.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > Acked-by: David Ahern
Re: [PATCH net-next 07/18] ipv6: Check nexthop flags during route lookup instead of carrier
On 1/7/18 3:45 AM, Ido Schimmel wrote: > Now that the RTNH_F_LINKDOWN flag is set in nexthops, we can avoid the > need to dereference the nexthop device and check its carrier and instead > check for the presence of the flag. > > Signed-off-by: Ido Schimmel> --- > net/ipv6/route.c | 7 +++ > 1 file changed, 3 insertions(+), 4 deletions(-) Acked-by: David Ahern
Re: [PATCH net-next 06/18] ipv6: Set nexthop flags during route creation
On 1/7/18 3:45 AM, Ido Schimmel wrote: > It is valid to install routes with a nexthop device that does not have a > carrier, so we need to make sure they're marked accordingly. > > As explained in the previous patch, host and anycast routes are never > marked with the 'linkdown' flag. > > Note that reject routes are unaffected, as these use the loopback device > which always has a carrier. > > Signed-off-by: Ido Schimmel> --- > net/ipv6/route.c | 3 +++ > 1 file changed, 3 insertions(+) Acked-by: David Ahern
Re: [PATCH net-next 05/18] ipv6: Set nexthop flags upon carrier change
On 1/7/18 3:45 AM, Ido Schimmel wrote: > Similar to IPv4, when the carrier of a netdev changes we should toggle > the 'linkdown' flag on all the nexthops using it as their nexthop > device. > > This will later allow us to test for the presence of this flag during > route lookup and dump. > > Up until commit 4832c30d5458 ("net: ipv6: put host and anycast routes on > device with address") host and anycast routes used the loopback netdev > as their nexthop device and thus were not marked with the 'linkdown' > flag. The patch preserves this behavior and allows one to ping the local > address even when the nexthop device does not have a carrier and the > 'ignore_routes_with_linkdown' sysctl is set. > > Signed-off-by: Ido Schimmel> --- > include/net/ip6_route.h | 1 + > net/ipv6/addrconf.c | 2 ++ > net/ipv6/route.c| 23 +-- > 3 files changed, 20 insertions(+), 6 deletions(-) Acked-by: David Ahern
Re: [PATCH net] ipv6: sr: fix TLVs not being copied using setsockopt
Just realized I messed up the justification about sr_has_hmac. The branch will be taken, but its execution will not complete since the TLV's len and type fields aren't copied, hence seg6_get_tlv_hmac will fail, and the HMAC will not be computed. 2018-01-07 17:12 GMT+00:00 Mathieu Xhonneux: > Function ipv6_push_rthdr4 allows to add an IPv6 Segment Routing Header > to a socket through setsockopt, but the current implementation doesn't > copy possible TLVs at the end of the SRH (i.e., the following branch > if (sr_has_hmac(sr_phdr)) will never be taken as no HMAC TLV is copied). > > This commit adds a memcpy in case TLVs have been appended to the SRH. > > Fixes: a149e7c7ce812561f0fdc7a86ddc42f294e5eb3e ("ipv6: sr: add > support for SRH injection through setsockopt") > Acked-by: David Lebrun > Signed-off-by: Mathieu Xhonneux > --- > net/ipv6/exthdrs.c | 9 + > 1 file changed, 9 insertions(+) > > diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c > index 83bd75713535..bc68eb661970 100644 > --- a/net/ipv6/exthdrs.c > +++ b/net/ipv6/exthdrs.c > @@ -925,6 +925,15 @@ static void ipv6_push_rthdr4(struct sk_buff *skb, u8 > *proto, > sr_phdr->segments[0] = **addr_p; > *addr_p = _ihdr->segments[sr_ihdr->segments_left]; > > + if (sr_ihdr->hdrlen > hops * 2) { > + int tlvs_offset, tlvs_length; > + > + tlvs_offset = (1 + hops * 2) << 3; > + tlvs_length = (sr_ihdr->hdrlen - hops * 2) << 3; > + memcpy((char *)sr_phdr + tlvs_offset, > + (char *)sr_ihdr + tlvs_offset, tlvs_length); > + } > + > #ifdef CONFIG_IPV6_SEG6_HMAC > if (sr_has_hmac(sr_phdr)) { > struct net *net = NULL; > -- > 2.15.1 >
Re: [PATCH net-next 04/18] ipv6: Prepare to handle multiple netdev events
On 1/7/18 3:45 AM, Ido Schimmel wrote: > To make IPv6 more in line with IPv4 we need to be able to respond > differently to different netdev events. For example, when a netdev is > unregistered all the routes using it as their nexthop device should be > flushed, whereas when the netdev's carrier changes only the 'linkdown' > flag should be toggled. > > Currently, this is not possible, as the function that traverses the > routing tables is not aware of the triggering event. > > Propagate the triggering event down, so that it could be used in later > patches. > > Signed-off-by: Ido Schimmel> --- > include/net/ip6_route.h | 2 +- > net/ipv6/addrconf.c | 4 ++-- > net/ipv6/route.c| 37 + > 3 files changed, 24 insertions(+), 19 deletions(-) Acked-by: David Ahern
Re: [PATCH net-next 03/18] ipv6: Clear nexthop flags upon netdev up
On 1/7/18 3:45 AM, Ido Schimmel wrote: > Previous patch marked nexthops with the 'dead' and 'linkdown' flags. > Clear these flags when the netdev comes back up. > > Signed-off-by: Ido Schimmel> --- > include/net/ip6_route.h | 1 + > net/ipv6/addrconf.c | 3 +++ > net/ipv6/route.c| 29 + > 3 files changed, 33 insertions(+) > Acked-by: David Ahern
Re: [PATCH net-next 02/18] ipv6: Mark dead nexthops with appropriate flags
On 1/7/18 3:45 AM, Ido Schimmel wrote: > When a netdev is put administratively down or unregistered all the > nexthops using it as their nexthop device should be marked with the > 'dead' and 'linkdown' flags. > > Currently, when a route is dumped its nexthop device is tested and the > flags are set accordingly. A similar check is performed during route > lookup. > > Instead, we can simply mark the nexthops based on netdev events and > avoid checking the netdev's state during route dump and lookup. > > Signed-off-by: Ido Schimmel> --- > net/ipv6/route.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > Acked-by: David Ahern
Re: [PATCH net-next 01/18] ipv6: Remove redundant route flushing during namespace dismantle
On 1/7/18 3:45 AM, Ido Schimmel wrote: > By the time fib6_net_exit() is executed all the netdevs in the namespace > have been either unregistered or pushed back to the default namespace. > That is because pernet subsys operations are always ordered before > pernet device operations and therefore invoked after them during > namespace dismantle. > > Thus, all the routing tables in the namespace are empty by the time > fib6_net_exit() is invoked and the call to rt6_ifdown() can be removed. > > This allows us to simplify the condition in fib6_ifdown() as it's only > ever called with an actual netdev. > > Signed-off-by: Ido Schimmel> --- > net/ipv6/ip6_fib.c | 1 - > net/ipv6/route.c | 8 +++- > 2 files changed, 3 insertions(+), 6 deletions(-) > Acked-by: David Ahern
Re: atm/clip: Use seq_puts() in svc_addr()
>> Is the function "seq_puts" a bit more efficient for the desired output >> of a single string in comparison to calling the function "seq_printf" >> for this purpose? > > Will you please be so kind and tell us? How do you think about to get the run time characteristics for these sequence output functions better documented? https://elixir.free-electrons.com/linux/v4.15-rc6/source/fs/seq_file.c#L660 Can an information like “WARNING: Prefer seq_puts to seq_printf” (from the script “checkpatch.pl”) be another incentive? >>> and "strings should be quickly put into a sequence" >>> isn't terribly helpful. >> >> Which wording would you find more appropriate for the suggested >> adjustment of these function calls? > > Whatever describes the actual issue and what you're doing about it. > Turn your rhetorical question above into a commit message, done. > > Compare that with your original commit message, on the other hand, > and you should understand what I mean. Which descriptions are you really missing for the affected data output? Regards, Markus
Re: [net-next] netfilter: add segment routing header 'srh' match
On Sun, 7 Jan 2018 00:40:03 +0100 Pablo Neira Ayusowrote: > Hi Ahmed, > > On Fri, Dec 29, 2017 at 12:07:52PM +0100, Ahmed Abdelsalam wrote: > > It allows matching packets based on Segment Routing Header > > (SRH) information. > > The implementation considers revision 7 of the SRH draft. > > https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07 > > > > Currently supported match options include: > > (1) Next Header > > (2) Hdr Ext Len > > (3) Segments Left > > (4) Last Entry > > (5) Tag value of SRH > > > > Signed-off-by: Ahmed Abdelsalam > > --- > > include/uapi/linux/netfilter_ipv6/ip6t_srh.h | 63 ++ > > net/ipv6/netfilter/Kconfig | 9 ++ > > net/ipv6/netfilter/Makefile | 1 + > > net/ipv6/netfilter/ip6t_srh.c| 165 > > +++ > > 4 files changed, 238 insertions(+) > > create mode 100644 include/uapi/linux/netfilter_ipv6/ip6t_srh.h > > create mode 100644 net/ipv6/netfilter/ip6t_srh.c > > > > diff --git a/include/uapi/linux/netfilter_ipv6/ip6t_srh.h > > b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h > > new file mode 100644 > > index 000..1b5dbd8 > > --- /dev/null > > +++ b/include/uapi/linux/netfilter_ipv6/ip6t_srh.h > > @@ -0,0 +1,63 @@ > > +/** > > + * Definitions for Segment Routing Header 'srh' match > > + * > > + * Author: > > + * Ahmed Abdelsalam > > + */ > > Please, add this in SPDX format instead. > > See include/uapi/linux/netfilter/xt_owner.h for instance. 
> Ok > > +#ifndef _IP6T_SRH_H > > +#define _IP6T_SRH_H > > + > > +#include > > +#include > > + > > +/* Values for "mt_flags" field in struct ip6t_srh */ > > +#define IP6T_SRH_NEXTHDR0x0001 > > +#define IP6T_SRH_LEN_EQ 0x0002 > > +#define IP6T_SRH_LEN_GT 0x0004 > > +#define IP6T_SRH_LEN_LT 0x0008 > > +#define IP6T_SRH_SEGS_EQ0x0010 > > +#define IP6T_SRH_SEGS_GT0x0020 > > +#define IP6T_SRH_SEGS_LT0x0040 > > +#define IP6T_SRH_LAST_EQ0x0080 > > +#define IP6T_SRH_LAST_GT0x0100 > > +#define IP6T_SRH_LAST_LT0x0200 > > +#define IP6T_SRH_TAG0x0400 > > +#define IP6T_SRH_MASK 0x07FF > > + > > +/* Values for "mt_invflags" field in struct ip6t_srh */ > > +#define IP6T_SRH_INV_NEXTHDR0x0001 > > +#define IP6T_SRH_INV_LEN_EQ 0x0002 > > +#define IP6T_SRH_INV_LEN_GT 0x0004 > > +#define IP6T_SRH_INV_LEN_LT 0x0008 > > +#define IP6T_SRH_INV_SEGS_EQ0x0010 > > +#define IP6T_SRH_INV_SEGS_GT0x0020 > > +#define IP6T_SRH_INV_SEGS_LT0x0040 > > +#define IP6T_SRH_INV_LAST_EQ0x0080 > > +#define IP6T_SRH_INV_LAST_GT0x0100 > > +#define IP6T_SRH_INV_LAST_LT0x0200 > > +#define IP6T_SRH_INV_TAG0x0400 > > +#define IP6T_SRH_INV_MASK 0x07FF > > Looking at all these EQ, GT, LT... I think this should be very easy to > implement in nf_tables with no kernel changes. > > You only need to add the protocol definition to: > > nftables/src/exthdr.c > > Would you have a look into this? This would be very much appreciated > to we keep nftables in sync with what we have in iptables. Yes, I look into it. I will send you a patch for nf_tables as well. 
> > > + > > +/** > > + * struct ip6t_srh - SRH match options > > + * @ next_hdr: Next header field of SRH > > + * @ hdr_len: Extension header length field of SRH > > + * @ segs_left: Segments left field of SRH > > + * @ last_entry: Last entry field of SRH > > + * @ tag: Tag field of SRH > > + * @ mt_flags: match options > > + * @ mt_invflags: Invert the sense of match options > > + */ > > + > > +struct ip6t_srh { > > + __u8next_hdr; > > + __u8hdr_len; > > + __u8segs_left; > > + __u8last_entry; > > + __u16 tag; > > + __u16 mt_flags; > > + __u16 mt_invflags; > > +}; > > + > > +#endif /*_IP6T_SRH_H*/ > > diff --git a/net/ipv6/netfilter/Kconfig b/net/ipv6/netfilter/Kconfig > > index 6acb2ee..e1818eb 100644 > > --- a/net/ipv6/netfilter/Kconfig > > +++ b/net/ipv6/netfilter/Kconfig > > @@ -232,6 +232,15 @@ config IP6_NF_MATCH_RT > > > > To compile it as a module, choose M here. If unsure, say N. > > > > +config IP6_NF_MATCH_SRH > > +tristate '"srh" Segment Routing header match support' > > +depends on NETFILTER_ADVANCED > > +help > > + srh matching allows you to match packets based on the segment > > + routing header of the packet. > > + > > + To compile it as a module, choose M here. If unsure, say N. > > + > > # The targets > > config IP6_NF_TARGET_HL > > tristate '"HL" hoplimit target support' > > diff --git a/net/ipv6/netfilter/Makefile b/net/ipv6/netfilter/Makefile > > index c6ee0cd..e0d51a9 100644 > > --- a/net/ipv6/netfilter/Makefile > > +++
[PATCH net] ipv6: sr: fix TLVs not being copied using setsockopt
Function ipv6_push_rthdr4 allows to add an IPv6 Segment Routing Header to a socket through setsockopt, but the current implementation doesn't copy possible TLVs at the end of the SRH (i.e., the following branch if (sr_has_hmac(sr_phdr)) will never be taken as no HMAC TLV is copied). This commit adds a memcpy in case TLVs have been appended to the SRH. Fixes: a149e7c7ce812561f0fdc7a86ddc42f294e5eb3e ("ipv6: sr: add support for SRH injection through setsockopt") Acked-by: David Lebrun Signed-off-by: Mathieu Xhonneux --- net/ipv6/exthdrs.c | 9 + 1 file changed, 9 insertions(+) diff --git a/net/ipv6/exthdrs.c b/net/ipv6/exthdrs.c index 83bd75713535..bc68eb661970 100644 --- a/net/ipv6/exthdrs.c +++ b/net/ipv6/exthdrs.c @@ -925,6 +925,15 @@ static void ipv6_push_rthdr4(struct sk_buff *skb, u8 *proto, sr_phdr->segments[0] = **addr_p; *addr_p = _ihdr->segments[sr_ihdr->segments_left]; + if (sr_ihdr->hdrlen > hops * 2) { + int tlvs_offset, tlvs_length; + + tlvs_offset = (1 + hops * 2) << 3; + tlvs_length = (sr_ihdr->hdrlen - hops * 2) << 3; + memcpy((char *)sr_phdr + tlvs_offset, + (char *)sr_ihdr + tlvs_offset, tlvs_length); + } + #ifdef CONFIG_IPV6_SEG6_HMAC if (sr_has_hmac(sr_phdr)) { struct net *net = NULL; -- 2.15.1
Re: dvb usb issues since kernel 4.9
On Sun, 7 Jan 2018, Mauro Carvalho Chehab wrote: > > > It seems that the original patch were designed to solve some IRQ issues > > > with network cards with causes data losses on high traffic. However, > > > it is also causing bad effects on sustained high bandwidth demands > > > required by DVB cards, at least on some USB host drivers. > > > > > > Alan/Greg/Eric/David: > > > > > > Any ideas about how to fix it without causing regressions to > > > network? > > > > It would be good to know what hardware was involved on the x86 system > > and to have some timing data. Can we see the output from lsusb and > > usbmon, running on a vanilla kernel that gets plenty of video glitches? > > From Josef's report, and from the BZ, the affected hardware seems > to be based on Montage Technology M88DS3103/M88TS2022 chipset. What type of USB host controller does the x86_64 system use? EHCI or xHCI? > The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c, > with shares a USB implementation that is used by a lot more drivers. > The URB handling code is at: > > drivers/media/usb/dvb-usb-v2/usb_urb.c > > This particular driver allocates 8 buffers with 4096 bytes each > for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP. > > This become a popular USB hardware nowadays. I have one S960c > myself, so I can send you the lsusb from it. You should notice, however, > that a DVB-C/DVB-S2 channel can easily provide very high sustained bit > rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mbps > of payload after removing URB headers. You mentioned earlier that the driver uses bulk transfers. In USB-2.0, the maximum possible payload data transfer rate using bulk transfers is 53248 bytes/ms, which is 53.248 MB/s (i.e., lower than 58 MB/s). And even this is possible only if almost nothing else is using the bus at the same time. 
> A 10-minute recording of the > entire data (which typically contains 5-10 channels) can easily go > above 4 GB, just to reproduce 1-2 glitches. So, I'm not sure if > a usbmon dump would be useful. It might not be helpful at all. However, I'm not interested in the payload data (which would be unintelligible to me anyway) but rather the timing of URB submissions and completions. A usbmon trace which didn't keep much of the payload data would only require on the order of 50 MB per minute -- and Josef said that glitches usually would show up within a minute or so. > I'm enclosing the lsusb from a S960C device, which is based on those > Montage chipsets: What I wanted to see was the output from "lsusb" on the affected system, not the output from "lsusb -v -s B:D" on your system. > > Overall, this may be a very difficult problem to solve. The > > 4cd13c21b207 commit was intended to improve throughput at the cost of > > increased latency. But then what do you do when the latency becomes > > too high for the video subsystem to handle? > > Latency can't be too high, otherwise frames will be dropped. Yes, that's the whole point. > Even if the Kernel itself doesn't drop, if the delay goes higher > than a certain threshold, userspace will need to drop, as it > should be presenting audio and video in real time. Yet, typically, > userspace will delay it by one or two seconds, which would mean > 1500-3500 buffers, which I suspect is a lot more than the hardware > limits. So I suspect that the hardware starves free buffers way > before userspace, as media hardware doesn't have unlimited buffers > inside it; it assumes that the Kernel/userspace will be fast > enough to sustain bit rates up to 66 Mbps of payload. The timing information would tell us how large the latency is. In any case, you might be able to attack the problem simply by using more than 8 buffers. 
With just eight 4096-byte buffers, the total pipeline capacity is only about 0.62 ms (at the maximum possible transfer rate). Increasing the number of buffers to 65 would give a capacity of 5 ms, which is probably a lot better suited for situations where completions are handled by the ksoftirqd thread. > Perhaps media drivers could pass some quirk similar to URB_ISO_ASAP, > in order to revert the kernel logic to prioritize latency instead of > throughput. It can't be done without pervasive changes to the USB subsystem, which I would greatly prefer to avoid. Besides, this wouldn't really solve the problem. Decreasing the latency for one device will cause it to be increased for others. Alan Stern
Re: atm/clip: Use seq_puts() in svc_addr()
On Sun, 7 Jan 2018 09:19:17 +0100 SF Markus Elfring wrote: > >> Two strings should be quickly put into a sequence by two function calls. > >> Thus use the function "seq_puts" instead of "seq_printf". > >> > >> This issue was detected by using the Coccinelle software. > > > > Can you please explain what the issue really is and what you're trying > > to do here? > > Is the function "seq_puts" a bit more efficient for the desired output > of a single string in comparison to calling the function "seq_printf" > for this purpose? Will you please be so kind and tell us? > > One shouldn't need to dig into Coccinelle patterns to find > > out what you mean, > > Why did an attribution for a software tool confuse you? I'm not confused. I'm saying that one shouldn't need to dig into Coccinelle patterns to find out what you mean. > > and "strings should be quickly put into a sequence" > > isn't terribly helpful. > > Which wording would you find more appropriate for the suggested > adjustment of these function calls? Whatever describes the actual issue and what you're doing about it. Turn your rhetorical question above into a commit message, done. Compare that with your original commit message, on the other hand, and you should understand what I mean. -- Stefano
Re: [PATCH net-next v3 00/10] net: qualcomm: rmnet: Enable csum offloads
From: Subash Abhinov Kasiviswanathan
Date: Sat, 06 Jan 2018 19:19:00 -0700 > I still don't see the v3 submission in patchwork if I query it. Please resubmit then.
[PATCH] ixgbevf: use ARRAY_SIZE for various array sizing calculations
From: Colin Ian King

Use the ARRAY_SIZE macro on various arrays to determine the size of the arrays. Improvement suggested by coccinelle. Signed-off-by: Colin Ian King --- drivers/net/ethernet/intel/ixgbevf/vf.c | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c index 0c25006ce9af..cd79c4a73b48 100644 --- a/drivers/net/ethernet/intel/ixgbevf/vf.c +++ b/drivers/net/ethernet/intel/ixgbevf/vf.c @@ -285,7 +285,7 @@ static s32 ixgbevf_set_uc_addr_vf(struct ixgbe_hw *hw, u32 index, u8 *addr) ether_addr_copy(msg_addr, addr); ret_val = ixgbevf_write_msg_read_ack(hw, msgbuf, msgbuf, -sizeof(msgbuf) / sizeof(u32)); +ARRAY_SIZE(msgbuf)); if (!ret_val) { msgbuf[0] &= ~IXGBE_VT_MSGTYPE_CTS; @@ -455,8 +455,7 @@ static s32 ixgbevf_set_rar_vf(struct ixgbe_hw *hw, u32 index, u8 *addr, ether_addr_copy(msg_addr, addr); ret_val = ixgbevf_write_msg_read_ack(hw, msgbuf, msgbuf, -sizeof(msgbuf) / sizeof(u32)); - +ARRAY_SIZE(msgbuf)); msgbuf[0] &= ~IXGBE_VT_MSGTYPE_CTS; /* if nacked the address was rejected, use "perm_addr" */ @@ -571,7 +570,7 @@ static s32 ixgbevf_update_xcast_mode(struct ixgbe_hw *hw, int xcast_mode) msgbuf[1] = xcast_mode; err = ixgbevf_write_msg_read_ack(hw, msgbuf, msgbuf, -sizeof(msgbuf) / sizeof(u32)); +ARRAY_SIZE(msgbuf)); if (err) return err; @@ -609,7 +608,7 @@ static s32 ixgbevf_set_vfta_vf(struct ixgbe_hw *hw, u32 vlan, u32 vind, msgbuf[0] |= vlan_on << IXGBE_VT_MSGINFO_SHIFT; err = ixgbevf_write_msg_read_ack(hw, msgbuf, msgbuf, -sizeof(msgbuf) / sizeof(u32)); +ARRAY_SIZE(msgbuf)); if (err) goto mbx_err; @@ -813,7 +812,7 @@ static s32 ixgbevf_set_rlpml_vf(struct ixgbe_hw *hw, u16 max_size) msgbuf[1] = max_size; ret_val = ixgbevf_write_msg_read_ack(hw, msgbuf, msgbuf, -sizeof(msgbuf) / sizeof(u32)); +ARRAY_SIZE(msgbuf)); if (ret_val) return ret_val; if ((msgbuf[0] & IXGBE_VF_SET_LPE) && @@ -859,8 +858,7 @@ static int ixgbevf_negotiate_api_version_vf(struct 
ixgbe_hw *hw, int api) msg[1] = api; msg[2] = 0; - err = ixgbevf_write_msg_read_ack(hw, msg, msg, -sizeof(msg) / sizeof(u32)); + err = ixgbevf_write_msg_read_ack(hw, msg, msg, ARRAY_SIZE(msg)); if (!err) { msg[0] &= ~IXGBE_VT_MSGTYPE_CTS; @@ -911,8 +909,7 @@ int ixgbevf_get_queues(struct ixgbe_hw *hw, unsigned int *num_tcs, msg[0] = IXGBE_VF_GET_QUEUE; msg[1] = msg[2] = msg[3] = msg[4] = 0; - err = ixgbevf_write_msg_read_ack(hw, msg, msg, -sizeof(msg) / sizeof(u32)); + err = ixgbevf_write_msg_read_ack(hw, msg, msg, ARRAY_SIZE(msg)); if (!err) { msg[0] &= ~IXGBE_VT_MSGTYPE_CTS; -- 2.15.1
Re: [patch net-next v6 06/11] net: sched: use block index as a handle instead of qdisc when block is shared
On 18-01-07 09:28 AM, Jamal Hadi Salim wrote: On 18-01-07 08:46 AM, Jiri Pirko wrote: Sun, Jan 07, 2018 at 02:11:19PM CET, j...@mojatatu.com wrote: On 18-01-06 03:43 PM, Jiri Pirko wrote: @@ -886,8 +887,13 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb, tcm->tcm_family = AF_UNSPEC; tcm->tcm__pad1 = 0; tcm->tcm__pad2 = 0; - tcm->tcm_ifindex = qdisc_dev(q)->ifindex; - tcm->tcm_parent = parent; + if (q) { + tcm->tcm_ifindex = qdisc_dev(q)->ifindex; + tcm->tcm_parent = parent; + } else { + tcm->tcm_ifindex = 0; /* block index is stored in parent */ + tcm->tcm_parent = block->index; + } Please guys, please look at this reuse (also on the client side). I would like you to double-check this reuse of existing API for block_index carrying purposes. I believe it's UAPI safe. But please, check it out carefully. Should not break any ABI/UAPI AFAIK. Maybe go for a negative ifindex (not sure if zero means something special to someone). Like -1 means parent is block_index? Yes. Why would 0 mean something special? Could you point to a code that suggests it? I can't point to any such code; it is just that the ifindex is an int. And the negative space seems less likely for someone to think of using for signalling (0x as an example). tcpdump -i any probably assumes some weird ifindex (haven't looked at the code). In any case, 0 is fine too. Also a #define for whatever value you use would make it more readable, ex: #define IFINDEX_ANY .. cheers, jamal
[iproute2 1/1] tc: Fix filter protocol output
From: Jamal Hadi Salim

Fixes: 249284ff5a44 ("tc: jsonify filter core") Signed-off-by: Jamal Hadi Salim --- tc/tc_filter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tc/tc_filter.c b/tc/tc_filter.c index 545cc3a..546311a 100644 --- a/tc/tc_filter.c +++ b/tc/tc_filter.c @@ -276,7 +276,7 @@ int print_filter(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg) if (!filter_protocol || filter_protocol != f_proto) { if (f_proto) { SPRINT_BUF(b1); - print_string(PRINT_JSON, "protocol", + print_string(PRINT_ANY, "protocol", "protocol %s ", ll_proto_n2a(f_proto, b1, sizeof(b1))); } -- 1.9.1
[PATCH iproute2] ip fou: Pass family attribute as u8
This fixes fou on big-endian systems. Signed-off-by: Filip Moc --- ip/ipfou.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/ip/ipfou.c b/ip/ipfou.c index febc2c8c..1f392ade 100644 --- a/ip/ipfou.c +++ b/ip/ipfou.c @@ -52,7 +52,7 @@ static int fou_parse_opt(int argc, char **argv, struct nlmsghdr *n, __u8 ipproto, type; bool gue_set = false; int ipproto_set = 0; - unsigned short family = AF_INET; + __u8 family = AF_INET; while (argc > 0) { if (!matches(*argv, "port")) { @@ -103,7 +103,7 @@ static int fou_parse_opt(int argc, char **argv, struct nlmsghdr *n, addattr16(n, 1024, FOU_ATTR_PORT, port); addattr8(n, 1024, FOU_ATTR_TYPE, type); - addattr16(n, 1024, FOU_ATTR_AF, family); + addattr8(n, 1024, FOU_ATTR_AF, family); if (ipproto_set) addattr8(n, 1024, FOU_ATTR_IPPROTO, ipproto); -- 2.11.0
Re: [patch net-next v6 06/11] net: sched: use block index as a handle instead of qdisc when block is shared
On 18-01-07 08:46 AM, Jiri Pirko wrote: Sun, Jan 07, 2018 at 02:11:19PM CET, j...@mojatatu.com wrote: On 18-01-06 03:43 PM, Jiri Pirko wrote: @@ -886,8 +887,13 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb, tcm->tcm_family = AF_UNSPEC; tcm->tcm__pad1 = 0; tcm->tcm__pad2 = 0; - tcm->tcm_ifindex = qdisc_dev(q)->ifindex; - tcm->tcm_parent = parent; + if (q) { + tcm->tcm_ifindex = qdisc_dev(q)->ifindex; + tcm->tcm_parent = parent; + } else { + tcm->tcm_ifindex = 0; /* block index is stored in parent */ + tcm->tcm_parent = block->index; + } Please guys, please look at this reuse (also on the client side). I would like you to double-check this reuse of existing API for block_index carrying purposes. I believe it's UAPI safe. But please, check it out carefully. Should not break any ABI/UAPI AFAIK. Maybe go for a negative ifindex (not sure if zero means something special to someone). Like -1 means parent is block_index? Yes. Why would 0 mean something special? Could you point to a code that suggests it? I can't point to any such code; it is just that the ifindex is an int. And the negative space seems less likely for someone to think of using for signalling (0x as an example). tcpdump -i any probably assumes some weird ifindex (haven't looked at the code). In any case, 0 is fine too. cheers, jamal
Re: [PATCH 06/18] x86, barrier: stop speculation for failed access_ok
> > 2. It is very very complicated to answer a question like "is > > sequence x safe on all of vendor's microprocessors" even for the vendor > > so far "is sequence x safe" was viewed by cpu vendors as > "is sequence x going to stop speculative execution". Incorrect. Modern processors are very very sophisticated beasts and you are dealing with a wide range of architectures and micro-architectures that all have their own differences. > > Intel's current position is 'lfence'. > which clearly states that bpf_tail_call() was used in the attack. > Yet none of the intel nor arm patches address speculation in > this bpf helper! There is a set of patches under discussion for eBPF, covering both the JIT and the interpreter. BTW I would expect that there are things Coverity didn't find, and that we'll also see variants on the attack where different tricks for measurement emerge that may change what is needed a little. > which means that POC is relying on 64-bit address speculation. > In the places coverity found the user supplied value is 32-bit, People have 32bit computers as well as 64bit and in some cases 32bit is fine for an attack depending where your target is relative to the object. > > The differences involved on the "lfence" versus "and" versus before are > > not likely to be anywhere in that order of magnitude. > > we clearly disagree here. Both intel and arm patches proposed > to add lfence in bpf_map_lookup() which is the hottest function > we have and we do run it at 40+Gbps speeds where every nanosecond > counts, so no, lfence is not a solution. The last solution I saw proposed in that path for the JIT is to "and" with a constant, which in that situation clearly makes the most sense provided it is safe on all the CPUs involved. lfence timing is also heavily dependent upon what work has to be done to retire previous live instructions. BPF does not normally do a lot of writing so you'd expect the cost to be low. Anyway - Intel's current position is that lfence is safe. 
It's not impossible other sequences will in future be documented as safe by one or more vendors of x86 processors. Alan
Re: [patch net-next v6 06/11] net: sched: use block index as a handle instead of qdisc when block is shared
Sun, Jan 07, 2018 at 02:11:19PM CET, j...@mojatatu.com wrote: >On 18-01-06 03:43 PM, Jiri Pirko wrote: > > >> >> > @@ -886,8 +887,13 @@ static int tcf_fill_node(struct net *net, struct >> > sk_buff *skb, >> >tcm->tcm_family = AF_UNSPEC; >> >tcm->tcm__pad1 = 0; >> >tcm->tcm__pad2 = 0; >> > - tcm->tcm_ifindex = qdisc_dev(q)->ifindex; >> > - tcm->tcm_parent = parent; >> > + if (q) { >> > + tcm->tcm_ifindex = qdisc_dev(q)->ifindex; >> > + tcm->tcm_parent = parent; >> > + } else { >> > + tcm->tcm_ifindex = 0; /* block index is stored in parent */ >> > + tcm->tcm_parent = block->index; >> > + } >> >> Please guys, please look at this reuse (also on the client side). I would like >> you to double-check this reuse of existing API for block_index carrying >> purpose. I believe it's UAPI safe. But please, check it out carefully. >> > > >Should not break any ABI/UAPI AFAIK. Maybe go for a negative ifindex >(not sure if zero means something special to someone). Like -1 means parent is block_index? Why would 0 mean something special? Could you point to a code that suggests it? > >cheers, >jamal >
Re: [patch net-next v6 06/11] net: sched: use block index as a handle instead of qdisc when block is shared
On 18-01-06 03:43 PM, Jiri Pirko wrote: @@ -886,8 +887,13 @@ static int tcf_fill_node(struct net *net, struct sk_buff *skb, tcm->tcm_family = AF_UNSPEC; tcm->tcm__pad1 = 0; tcm->tcm__pad2 = 0; - tcm->tcm_ifindex = qdisc_dev(q)->ifindex; - tcm->tcm_parent = parent; + if (q) { + tcm->tcm_ifindex = qdisc_dev(q)->ifindex; + tcm->tcm_parent = parent; + } else { + tcm->tcm_ifindex = 0; /* block index is stored in parent */ + tcm->tcm_parent = block->index; + } Please guys, please look at this reuse (also on the client side). I would like you to double-check this reuse of existing API for block_index carrying purposes. I believe it's UAPI safe. But please, check it out carefully. Should not break any ABI/UAPI AFAIK. Maybe go for a negative ifindex (not sure if zero means something special to someone). cheers, jamal
Re: dvb usb issues since kernel 4.9
On Sat, 6 Jan 2018 16:44:20 -0500 (EST), Alan Stern wrote: > On Sat, 6 Jan 2018, Mauro Carvalho Chehab wrote: > > > Hi Josef, > > > > On Sat, 6 Jan 2018 16:04:16 +0100 > > "Josef Griebichler" wrote: > > > > > Hi, > > > > > > the causing commit has been identified. > > > After reverting commit > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4cd13c21b207e80ddb1144c576500098f2d5f882 > > > it's working again. > > > > Just replying to me won't magically fix this. The ones that were involved in > > this patch should also be cc'd, plus USB people. Just added them. > > > > > Please have a look into the thread > > > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?pageNo=13 > > > here are several users acknowledging the revert solves their issues with > > > usb dvb cards. > > > > I read the entire (long) thread there. In order to make it easier for the > > others, from what I understand, the problem happens on both x86 and arm, > > although almost all comments there are mentioning tests with the raspbian > > Kernel (which uses a different USB host driver than the upstream one). > > > > It happens when watching digital TV DVB-C channels, which usually means > > a sustained bit rate of 11 Mbps to 54 Mbps. > > > > The reports mention the dvbsky, which uses USB URB bulk transfers. > > Every several minutes (5 to 10 mins), the stream suffers "glitches" > > caused by frame losses. 
> > > > The part of the thread that contains the bisect is at: > > > > https://forum.libreelec.tv/thread/4235-dvb-issue-since-le-switched-to-kernel-4-9-x/?postID=75965#post75965 > > > > It indirectly mentions another comment on the thread which points > > to: > > https://github.com/raspberrypi/linux/issues/2134 > > > > There, it says that this fixes part of the issues: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=34f41c0316ed52b0b44542491d89278efdaa70e4 > > > > but it affects URB packet losses to a lesser extent. > > > > The main issue is really the logic change in the core softirq logic. > > > > Using Kernel 4.14.10 on a Raspberry Pi 3 with the 4cd13c2 commit reverted > > fixed the issue. > > > > Joseph, is the above right? Anything else to mention? Does the > > same issue also show up on x86 with vanilla Kernel 4.14.10? > > > > - > > > > It seems that the original patch was designed to solve some IRQ issues > > with network cards which cause data losses on high traffic. However, > > it is also causing bad effects on sustained high bandwidth demands > > required by DVB cards, at least on some USB host drivers. > > > > Alan/Greg/Eric/David: > > > > Any ideas about how to fix it without causing regressions to > > network? > > It would be good to know what hardware was involved on the x86 system > > and to have some timing data. Can we see the output from lsusb and > > usbmon, running on a vanilla kernel that gets plenty of video glitches? From Josef's report, and from the BZ, the affected hardware seems to be based on the Montage Technology M88DS3103/M88TS2022 chipset. The driver it uses is at drivers/media/usb/dvb-usb-v2/dvbsky.c, which shares a USB implementation that is used by a lot more drivers. The URB handling code is at: drivers/media/usb/dvb-usb-v2/usb_urb.c This particular driver allocates 8 buffers with 4096 bytes each for bulk transfers, using transfer_flags = URB_NO_TRANSFER_DMA_MAP. This has become popular USB hardware nowadays. 
I have one S960c myself, so I can send you the lsusb from it. You should notice, however, that a DVB-C/DVB-S2 channel can easily provide very high sustained bit rates. Here, on my DVB-S2 provider, a typical transponder produces 58 Mbps of payload after removing URB headers. A 10-minute recording of the entire data (which typically contains 5-10 channels) can easily go above 4 GB, just to reproduce 1-2 glitches. So, I'm not sure if a usbmon dump would be useful. I'm enclosing the lsusb from a S960C device, which is based on those Montage chipsets:

Bus 002 Device 007: ID 0572:960c Conexant Systems (Rockwell), Inc. DVBSky S960C DVB-S2 tuner
Couldn't open device, some information will be missing
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 (Defined at Interface level)
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0        64
  idVendor           0x0572 Conexant Systems (Rockwell), Inc.
  idProduct          0x960c DVBSky S960C DVB-S2 tuner
  bcdDevice            0.00
  iManufacturer           1
  iProduct                2
  iSerial                 3
  bNumConfigurations      1
  Configuration Descriptor:
    bLength                 9
    bDescriptorType         2
    wTotalLength          219
    bNumInterfaces          1
    bConfigurationValue     1
    iConfiguration          4
[PATCH net-next 07/18] ipv6: Check nexthop flags during route lookup instead of carrier
Now that the RTNH_F_LINKDOWN flag is set in nexthops, we can avoid the need to dereference the nexthop device and check its carrier and instead check for the presence of the flag. Signed-off-by: Ido Schimmel--- net/ipv6/route.c | 7 +++ 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 314e3bf41f6f..ab0eed43ed97 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -474,7 +474,7 @@ static struct rt6_info *rt6_multipath_select(struct rt6_info *match, if (route_choosen == 0) { struct inet6_dev *idev = sibling->rt6i_idev; - if (!netif_carrier_ok(sibling->dst.dev) && + if (sibling->rt6i_nh_flags & RTNH_F_LINKDOWN && idev->cnf.ignore_routes_with_linkdown) break; if (rt6_score_route(sibling, oif, strict) < 0) @@ -679,10 +679,9 @@ static struct rt6_info *find_match(struct rt6_info *rt, int oif, int strict, int m; bool match_do_rr = false; struct inet6_dev *idev = rt->rt6i_idev; - struct net_device *dev = rt->dst.dev; - if (dev && !netif_carrier_ok(dev) && - idev->cnf.ignore_routes_with_linkdown && + if (idev->cnf.ignore_routes_with_linkdown && + rt->rt6i_nh_flags & RTNH_F_LINKDOWN && !(strict & RT6_LOOKUP_F_IGNORE_LINKSTATE)) goto out; -- 2.14.3
[PATCH net-next 02/18] ipv6: Mark dead nexthops with appropriate flags
When a netdev is put administratively down or unregistered all the nexthops using it as their nexthop device should be marked with the 'dead' and 'linkdown' flags. Currently, when a route is dumped its nexthop device is tested and the flags are set accordingly. A similar check is performed during route lookup. Instead, we can simply mark the nexthops based on netdev events and avoid checking the netdev's state during route dump and lookup. Signed-off-by: Ido Schimmel--- net/ipv6/route.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/ipv6/route.c b/net/ipv6/route.c index c557362daa23..f5eda0aeab55 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3473,8 +3473,10 @@ static int fib6_ifdown(struct rt6_info *rt, void *arg) if (rt->dst.dev == dev && rt != adn->net->ipv6.ip6_null_entry && (rt->rt6i_nsiblings == 0 || netdev_unregistering(dev) || -!rt->rt6i_idev->cnf.ignore_routes_with_linkdown)) +!rt->rt6i_idev->cnf.ignore_routes_with_linkdown)) { + rt->rt6i_nh_flags |= (RTNH_F_DEAD | RTNH_F_LINKDOWN); return -1; + } return 0; } -- 2.14.3
[PATCH net-next 03/18] ipv6: Clear nexthop flags upon netdev up
Previous patch marked nexthops with the 'dead' and 'linkdown' flags. Clear these flags when the netdev comes back up. Signed-off-by: Ido Schimmel--- include/net/ip6_route.h | 1 + net/ipv6/addrconf.c | 3 +++ net/ipv6/route.c| 29 + 3 files changed, 33 insertions(+) diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 18e442ea93d8..caad39198c2a 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -169,6 +169,7 @@ void rt6_ifdown(struct net *net, struct net_device *dev); void rt6_mtu_change(struct net_device *dev, unsigned int mtu); void rt6_remove_prefsrc(struct inet6_ifaddr *ifp); void rt6_clean_tohost(struct net *net, struct in6_addr *gateway); +void rt6_sync_up(struct net_device *dev, unsigned int nh_flags); static inline const struct rt6_info *skb_rt6_info(const struct sk_buff *skb) { diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index ed06b1190f05..b6405568ed7b 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -3484,6 +3484,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event, if (run_pending) addrconf_dad_run(idev); + /* Device has an address by now */ + rt6_sync_up(dev, RTNH_F_DEAD); + /* * If the MTU changed during the interface down, * when the interface up, the changed MTU must be diff --git a/net/ipv6/route.c b/net/ipv6/route.c index f5eda0aeab55..4796d87e0b93 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3459,6 +3459,35 @@ void rt6_clean_tohost(struct net *net, struct in6_addr *gateway) fib6_clean_all(net, fib6_clean_tohost, gateway); } +struct arg_netdev_event { + const struct net_device *dev; + unsigned int nh_flags; +}; + +static int fib6_ifup(struct rt6_info *rt, void *p_arg) +{ + const struct arg_netdev_event *arg = p_arg; + const struct net *net = dev_net(arg->dev); + + if (rt != net->ipv6.ip6_null_entry && rt->dst.dev == arg->dev) + rt->rt6i_nh_flags &= ~arg->nh_flags; + + return 0; +} + +void rt6_sync_up(struct net_device *dev, unsigned int nh_flags) +{ + 
struct arg_netdev_event arg = { + .dev = dev, + .nh_flags = nh_flags, + }; + + if (nh_flags & RTNH_F_DEAD && netif_carrier_ok(dev)) + arg.nh_flags |= RTNH_F_LINKDOWN; + + fib6_clean_all(dev_net(dev), fib6_ifup, ); +} + struct arg_dev_net { struct net_device *dev; struct net *net; -- 2.14.3
[PATCH net-next 05/18] ipv6: Set nexthop flags upon carrier change
Similar to IPv4, when the carrier of a netdev changes we should toggle the 'linkdown' flag on all the nexthops using it as their nexthop device. This will later allow us to test for the presence of this flag during route lookup and dump. Up until commit 4832c30d5458 ("net: ipv6: put host and anycast routes on device with address") host and anycast routes used the loopback netdev as their nexthop device and thus were not marked with the 'linkdown' flag. The patch preserves this behavior and allows one to ping the local address even when the nexthop device does not have a carrier and the 'ignore_routes_with_linkdown' sysctl is set. Signed-off-by: Ido Schimmel--- include/net/ip6_route.h | 1 + net/ipv6/addrconf.c | 2 ++ net/ipv6/route.c| 23 +-- 3 files changed, 20 insertions(+), 6 deletions(-) diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h index 6a2f80cbdf65..34cd3b0c6ded 100644 --- a/include/net/ip6_route.h +++ b/include/net/ip6_route.h @@ -170,6 +170,7 @@ void rt6_remove_prefsrc(struct inet6_ifaddr *ifp); void rt6_clean_tohost(struct net *net, struct in6_addr *gateway); void rt6_sync_up(struct net_device *dev, unsigned int nh_flags); void rt6_disable_ip(struct net_device *dev, unsigned long event); +void rt6_sync_down_dev(struct net_device *dev, unsigned long event); static inline const struct rt6_info *skb_rt6_info(const struct sk_buff *skb) { diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c index a13e1ffe87ec..2435f7ab070b 100644 --- a/net/ipv6/addrconf.c +++ b/net/ipv6/addrconf.c @@ -3438,6 +3438,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event, } else if (event == NETDEV_CHANGE) { if (!addrconf_link_ready(dev)) { /* device is still not ready. 
*/ + rt6_sync_down_dev(dev, event); break; } @@ -3449,6 +3450,7 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event, * multicast snooping switches */ ipv6_mc_up(idev); + rt6_sync_up(dev, RTNH_F_LINKDOWN); break; } idev->if_flags |= IF_READY; diff --git a/net/ipv6/route.c b/net/ipv6/route.c index 194fe9d9cd85..2fd36c7dd143 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -3498,18 +3498,29 @@ static int fib6_ifdown(struct rt6_info *rt, void *p_arg) const struct net_device *dev = arg->dev; const struct net *net = dev_net(dev); - if (rt->dst.dev == dev && - rt != net->ipv6.ip6_null_entry && - (rt->rt6i_nsiblings == 0 || netdev_unregistering(dev) || -!rt->rt6i_idev->cnf.ignore_routes_with_linkdown)) { - rt->rt6i_nh_flags |= (RTNH_F_DEAD | RTNH_F_LINKDOWN); + if (rt->dst.dev != dev || rt == net->ipv6.ip6_null_entry) + return 0; + + switch (arg->event) { + case NETDEV_UNREGISTER: return -1; + case NETDEV_DOWN: + if (rt->rt6i_nsiblings == 0 || + !rt->rt6i_idev->cnf.ignore_routes_with_linkdown) + return -1; + rt->rt6i_nh_flags |= RTNH_F_DEAD; + /* fall through */ + case NETDEV_CHANGE: + if (rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST)) + break; + rt->rt6i_nh_flags |= RTNH_F_LINKDOWN; + break; } return 0; } -static void rt6_sync_down_dev(struct net_device *dev, unsigned long event) +void rt6_sync_down_dev(struct net_device *dev, unsigned long event) { struct arg_netdev_event arg = { .dev = dev, -- 2.14.3
[PATCH net-next 04/18] ipv6: Prepare to handle multiple netdev events
To make IPv6 more in line with IPv4 we need to be able to respond
differently to different netdev events. For example, when a netdev is
unregistered all the routes using it as their nexthop device should be
flushed, whereas when the netdev's carrier changes only the 'linkdown'
flag should be toggled.

Currently, this is not possible, as the function that traverses the
routing tables is not aware of the triggering event.

Propagate the triggering event down, so that it could be used in later
patches.

Signed-off-by: Ido Schimmel
---
 include/net/ip6_route.h |  2 +-
 net/ipv6/addrconf.c     |  4 ++--
 net/ipv6/route.c        | 37 +++++++++++++++++++++----------------
 3 files changed, 24 insertions(+), 19 deletions(-)

diff --git a/include/net/ip6_route.h b/include/net/ip6_route.h
index caad39198c2a..6a2f80cbdf65 100644
--- a/include/net/ip6_route.h
+++ b/include/net/ip6_route.h
@@ -165,11 +165,11 @@ struct rt6_rtnl_dump_arg {
 };
 
 int rt6_dump_route(struct rt6_info *rt, void *p_arg);
-void rt6_ifdown(struct net *net, struct net_device *dev);
 void rt6_mtu_change(struct net_device *dev, unsigned int mtu);
 void rt6_remove_prefsrc(struct inet6_ifaddr *ifp);
 void rt6_clean_tohost(struct net *net, struct in6_addr *gateway);
 void rt6_sync_up(struct net_device *dev, unsigned int nh_flags);
+void rt6_disable_ip(struct net_device *dev, unsigned long event);
 
 static inline const struct rt6_info *skb_rt6_info(const struct sk_buff *skb)
 {
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index b6405568ed7b..a13e1ffe87ec 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -3580,6 +3580,7 @@ static bool addr_is_local(const struct in6_addr *addr)
 
 static int addrconf_ifdown(struct net_device *dev, int how)
 {
+	unsigned long event = how ? NETDEV_UNREGISTER : NETDEV_DOWN;
 	struct net *net = dev_net(dev);
 	struct inet6_dev *idev;
 	struct inet6_ifaddr *ifa, *tmp;
@@ -3589,8 +3590,7 @@ static int addrconf_ifdown(struct net_device *dev, int how)
 
 	ASSERT_RTNL();
 
-	rt6_ifdown(net, dev);
-	neigh_ifdown(&nd_tbl, dev);
+	rt6_disable_ip(dev, event);
 
 	idev = __in6_dev_get(dev);
 	if (!idev)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 4796d87e0b93..194fe9d9cd85 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2344,7 +2344,7 @@ struct dst_entry *icmp6_dst_alloc(struct net_device *dev,
 	rt->rt6i_idev = idev;
 	dst_metric_set(&rt->dst, RTAX_HOPLIMIT, 0);
 
-	/* Add this dst into uncached_list so that rt6_ifdown() can
+	/* Add this dst into uncached_list so that rt6_disable_ip() can
 	 * do proper release of the net_device
 	 */
 	rt6_uncached_list_add(rt);
@@ -3461,7 +3461,10 @@ void rt6_clean_tohost(struct net *net, struct in6_addr *gateway)
 
 struct arg_netdev_event {
 	const struct net_device *dev;
-	unsigned int nh_flags;
+	union {
+		unsigned int nh_flags;
+		unsigned long event;
+	};
 };
 
 static int fib6_ifup(struct rt6_info *rt, void *p_arg)
@@ -3488,19 +3491,15 @@ void rt6_sync_up(struct net_device *dev, unsigned int nh_flags)
 	fib6_clean_all(dev_net(dev), fib6_ifup, &arg);
 }
 
-struct arg_dev_net {
-	struct net_device *dev;
-	struct net *net;
-};
-
 /* called with write lock held for table with rt */
-static int fib6_ifdown(struct rt6_info *rt, void *arg)
+static int fib6_ifdown(struct rt6_info *rt, void *p_arg)
 {
-	const struct arg_dev_net *adn = arg;
-	const struct net_device *dev = adn->dev;
+	const struct arg_netdev_event *arg = p_arg;
+	const struct net_device *dev = arg->dev;
+	const struct net *net = dev_net(dev);
 
 	if (rt->dst.dev == dev &&
-	    rt != adn->net->ipv6.ip6_null_entry &&
+	    rt != net->ipv6.ip6_null_entry &&
 	    (rt->rt6i_nsiblings == 0 || netdev_unregistering(dev) ||
 	     !rt->rt6i_idev->cnf.ignore_routes_with_linkdown)) {
 		rt->rt6i_nh_flags |= (RTNH_F_DEAD | RTNH_F_LINKDOWN);
@@ -3510,15 +3509,21 @@ static int fib6_ifdown(struct rt6_info *rt, void *p_arg)
 	return 0;
 }
 
-void rt6_ifdown(struct net *net, struct net_device *dev)
+static void rt6_sync_down_dev(struct net_device *dev, unsigned long event)
 {
-	struct arg_dev_net adn = {
+	struct arg_netdev_event arg = {
 		.dev = dev,
-		.net = net,
+		.event = event,
 	};
 
-	fib6_clean_all(net, fib6_ifdown, &adn);
-	rt6_uncached_list_flush_dev(net, dev);
+	fib6_clean_all(dev_net(dev), fib6_ifdown, &arg);
+}
+
+void rt6_disable_ip(struct net_device *dev, unsigned long event)
+{
+	rt6_sync_down_dev(dev, event);
+	rt6_uncached_list_flush_dev(dev_net(dev), dev);
+	neigh_ifdown(&nd_tbl, dev);
 }
 
 struct rt6_mtu_change_arg {
-- 
2.14.3
[PATCH net-next 15/18] ipv6: Flush multipath routes when all siblings are dead
By default, IPv6 deletes nexthops from a multipath route when the
nexthop device is put administratively down. This differs from IPv4
where the nexthops are kept, but marked with the RTNH_F_DEAD flag. A
multipath route is flushed when all of its nexthops become dead.

Align IPv6 with IPv4 and have it conform to the same guidelines.

In case the multipath route needs to be flushed, its siblings are
flushed one by one. Otherwise, the nexthops are marked with the
appropriate flags and the tree walker is instructed to skip all the
siblings.

As explained in previous patches, care is taken to update the sernum of
the affected tree nodes, so as to prevent the use of wrong dst entries.

Signed-off-by: Ido Schimmel
---
 net/ipv6/route.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 75 insertions(+), 8 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index a3bfce71c861..1054b059747f 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -3486,8 +3486,10 @@ static int fib6_ifup(struct rt6_info *rt, void *p_arg)
 	const struct arg_netdev_event *arg = p_arg;
 	const struct net *net = dev_net(arg->dev);
 
-	if (rt != net->ipv6.ip6_null_entry && rt->dst.dev == arg->dev)
+	if (rt != net->ipv6.ip6_null_entry && rt->dst.dev == arg->dev) {
 		rt->rt6i_nh_flags &= ~arg->nh_flags;
+		fib6_update_sernum_upto_root(dev_net(rt->dst.dev), rt);
+	}
 
 	return 0;
 }
@@ -3505,6 +3507,58 @@ void rt6_sync_up(struct net_device *dev, unsigned int nh_flags)
 	fib6_clean_all(dev_net(dev), fib6_ifup, &arg);
 }
 
+static bool rt6_multipath_uses_dev(const struct rt6_info *rt,
+				   const struct net_device *dev)
+{
+	struct rt6_info *iter;
+
+	if (rt->dst.dev == dev)
+		return true;
+	list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
+		if (iter->dst.dev == dev)
+			return true;
+
+	return false;
+}
+
+static void rt6_multipath_flush(struct rt6_info *rt)
+{
+	struct rt6_info *iter;
+
+	rt->should_flush = 1;
+	list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
+		iter->should_flush = 1;
+}
+
+static unsigned int rt6_multipath_dead_count(const struct rt6_info *rt,
+					     const struct net_device *down_dev)
+{
+	struct rt6_info *iter;
+	unsigned int dead = 0;
+
+	if (rt->dst.dev == down_dev || rt->rt6i_nh_flags & RTNH_F_DEAD)
+		dead++;
+	list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
+		if (iter->dst.dev == down_dev ||
+		    iter->rt6i_nh_flags & RTNH_F_DEAD)
+			dead++;
+
+	return dead;
+}
+
+static void rt6_multipath_nh_flags_set(struct rt6_info *rt,
+				       const struct net_device *dev,
+				       unsigned int nh_flags)
+{
+	struct rt6_info *iter;
+
+	if (rt->dst.dev == dev)
+		rt->rt6i_nh_flags |= nh_flags;
+	list_for_each_entry(iter, &rt->rt6i_siblings, rt6i_siblings)
+		if (iter->dst.dev == dev)
+			iter->rt6i_nh_flags |= nh_flags;
+}
+
 /* called with write lock held for table with rt */
 static int fib6_ifdown(struct rt6_info *rt, void *p_arg)
 {
@@ -3512,20 +3566,33 @@ static int fib6_ifdown(struct rt6_info *rt, void *p_arg)
 	const struct net_device *dev = arg->dev;
 	const struct net *net = dev_net(dev);
 
-	if (rt->dst.dev != dev || rt == net->ipv6.ip6_null_entry)
+	if (rt == net->ipv6.ip6_null_entry)
 		return 0;
 
 	switch (arg->event) {
 	case NETDEV_UNREGISTER:
-		return -1;
+		return rt->dst.dev == dev ? -1 : 0;
 	case NETDEV_DOWN:
-		if (rt->rt6i_nsiblings == 0 ||
-		    !rt->rt6i_idev->cnf.ignore_routes_with_linkdown)
+		if (rt->should_flush)
 			return -1;
-		rt->rt6i_nh_flags |= RTNH_F_DEAD;
-		/* fall through */
+		if (!rt->rt6i_nsiblings)
+			return rt->dst.dev == dev ? -1 : 0;
+		if (rt6_multipath_uses_dev(rt, dev)) {
+			unsigned int count;
+
+			count = rt6_multipath_dead_count(rt, dev);
+			if (rt->rt6i_nsiblings + 1 == count) {
+				rt6_multipath_flush(rt);
+				return -1;
+			}
+			rt6_multipath_nh_flags_set(rt, dev, RTNH_F_DEAD |
+						   RTNH_F_LINKDOWN);
+			fib6_update_sernum(rt);
+		}
+		return -2;
 	case NETDEV_CHANGE:
-		if (rt->rt6i_flags & (RTF_LOCAL | RTF_ANYCAST))
+		if (rt->dst.dev != dev ||
[PATCH net-next 14/18] ipv6: Take table lock outside of sernum update function
The next patch is going to allow dead routes to remain in the FIB tree
in certain situations.

When this happens we need to be sure to bump the sernum of the nodes
where these are stored, so that potential copies cached in sockets are
invalidated.

The function that performs this update assumes the table lock is not
taken when it is invoked, but that will not be the case when it is
invoked by the tree walker.

Have the function assume the lock is taken and make the single caller
take the lock itself.

Signed-off-by: Ido Schimmel
---
 net/ipv6/ip6_fib.c | 5 +----
 net/ipv6/route.c   | 2 ++
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index c1bbe7bf9fdd..edda5ad3b405 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -107,16 +107,13 @@ enum {
 
 void fib6_update_sernum(struct rt6_info *rt)
 {
-	struct fib6_table *table = rt->rt6i_table;
 	struct net *net = dev_net(rt->dst.dev);
 	struct fib6_node *fn;
 
-	spin_lock_bh(&table->tb6_lock);
 	fn = rcu_dereference_protected(rt->rt6i_node,
-			lockdep_is_held(&table->tb6_lock));
+			lockdep_is_held(&rt->rt6i_table->tb6_lock));
 	if (fn)
 		fn->fn_sernum = fib6_new_sernum(net);
-	spin_unlock_bh(&table->tb6_lock);
 }
 
 /*
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f62d24948aa2..a3bfce71c861 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1353,7 +1353,9 @@ static int rt6_insert_exception(struct rt6_info *nrt,
 
 	/* Update fn->fn_sernum to invalidate all cached dst */
 	if (!err) {
+		spin_lock_bh(&ort->rt6i_table->tb6_lock);
 		fib6_update_sernum(ort);
+		spin_unlock_bh(&ort->rt6i_table->tb6_lock);
 		fib6_force_start_gc(net);
 	}
 
-- 
2.14.3