Re: [PATCH net-next V2] tun: introduce tx skb ring

2016-06-28 Thread Michael S. Tsirkin
On Thu, Jun 23, 2016 at 01:14:07PM +0800, Jason Wang wrote:
> 
> 
> On 2016-06-23 02:18, Michael S. Tsirkin wrote:
> > On Fri, Jun 17, 2016 at 03:41:20AM +0300, Michael S. Tsirkin wrote:
> > > >Would it help to have ptr_ring_resize that gets an array of
> > > >rings and resizes them both to same length?
> > OK, here it is. Untested so far, and no skb wrapper.
> > Pls let me know whether this is what you had in mind.
> 
> Exactly what I want.
> 
> Thanks

Ok and this for skb_array

-->
skb_array: add wrappers for resizing

Signed-off-by: Michael S. Tsirkin 

--

diff --git a/include/linux/skb_array.h b/include/linux/skb_array.h
index c900708..7e01c1f 100644
--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -151,16 +151,24 @@ static inline int skb_array_init(struct skb_array *a, int size, gfp_t gfp)
return ptr_ring_init(&a->ring, size, 0, gfp);
 }
 
-void __skb_array_destroy_skb(void *ptr)
+static void __skb_array_destroy_skb(void *ptr)
 {
kfree_skb(ptr);
 }
 
-int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
+static inline int skb_array_resize(struct skb_array *a, int size, gfp_t gfp)
 {
return ptr_ring_resize(&a->ring, size, gfp, __skb_array_destroy_skb);
 }
 
+static inline int skb_array_resize_multiple(struct skb_array **rings,
+                                            int nrings, int size, gfp_t gfp)
+{
+   BUILD_BUG_ON(offsetof(struct skb_array, ring));
+   return ptr_ring_resize_multiple((struct ptr_ring **)rings, nrings, size,
+                                   gfp, __skb_array_destroy_skb);
+}
+
 static inline void skb_array_cleanup(struct skb_array *a)
 {
ptr_ring_cleanup(&a->ring, __skb_array_destroy_skb);
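
The BUILD_BUG_ON(offsetof(struct skb_array, ring)) in the patch above guards the pointer cast: treating an skb_array as a ptr_ring only makes sense while ring is the first member. A minimal userspace sketch of that layout check follows. The types ring_model and array_model and both functions are hypothetical stand-ins, not the kernel structures, and the sketch converts one element at a time rather than casting the whole pointer array as the patch does.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct ptr_ring and struct skb_array. */
struct ring_model {
	int size;
};

struct array_model {
	struct ring_model ring;	/* must remain the first member */
};

/* C11 analogue of BUILD_BUG_ON(offsetof(struct skb_array, ring)). */
_Static_assert(offsetof(struct array_model, ring) == 0,
	       "ring must sit at offset 0 for the pointer conversion");

/* Resize one inner ring; rejects non-positive sizes. */
static int ring_model_resize(struct ring_model *r, int size)
{
	if (size <= 0)
		return -1;
	r->size = size;
	return 0;
}

/* Resize every wrapper by converting each pointer to its first member. */
static int array_model_resize_multiple(struct array_model **arrays,
				       int nrings, int size)
{
	for (int i = 0; i < nrings; i++) {
		/* Valid only because ring is at offset 0 (checked above). */
		int err = ring_model_resize((struct ring_model *)arrays[i],
					    size);
		if (err)
			return err;
	}
	return 0;
}
```

If a field were ever added in front of ring, the static assertion would stop the build instead of letting the conversion silently point at the wrong data.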


Re: [PATCH 2/3] can: fix oops caused by wrong rtnl dellink usage

2016-06-28 Thread Holger Schurig
> static void can_dellink(struct net_device *dev, struct list_head *head);
>
> and
>
> static void can_dellink(struct net_device *dev, struct list_head *head)
> {
>   return;
> }

Wouldn't the canonical form be this:

static void can_dellink(struct net_device *dev, struct list_head *head)
{
}


- the curly braces make sure this isn't a forward declaration
- but no useless return either


But then again, this "return" is only cosmetic. No compiler will
generate any code from it.


Re: IP ID check (flush_id) in inet_gro_receive is necessary or not?

2016-06-28 Thread Tan Xiaojun
On 2016/6/28 12:57, Eric Dumazet wrote:
> On Tue, 2016-06-28 at 12:40 +0800, Tan Xiaojun wrote:
>> Hi everyone,
>>
>>  I'm sorry to bother you. But I was confused.
>>
>>  The IP ID check (flush_id) in inet_gro_receive is only used by
>> tcp_gro_receive, and in tcp_gro_receive we have tcphdr check to ensure
>> the order of skbs,
>>  like below:
>>
>>  flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
>>  flush |= (ntohl(th2->seq) + skb_gro_len(p)) ^ ntohl(th->seq);
>>
>>  So if I remove the IP ID check in inet_gro_receive, there will be a
>> problem ? And under what circumstances ?
> 
> You probably missed a recent patch ?
> 

Thank you very much. 

Does this patch mean that forcing the IP ID to increment by 1 is necessary
when a tunnel is used (if IP_DF is not set in frag_off)?

I have not used tunneled frames. Do you have any examples of that?

Xiaojun.

> commit 1530545ed64b42e87acb43c0c16401bd1ebae6bf
> Author: Alexander Duyck 
> Date:   Sun Apr 10 21:44:57 2016 -0400
> 
> GRO: Add support for TCP with fixed IPv4 ID field, limit tunnel IP ID values
> 
> This patch does two things.
> 
> First it allows TCP to aggregate TCP frames with a fixed IPv4 ID field.  As
> a result we should now be able to aggregate flows that were converted from
> IPv6 to IPv4.  In addition this allows us more flexibility for future
> implementations of segmentation as we may be able to use a fixed IP ID when
> segmenting the flow.
> 
> The second thing this does is that it places limitations on the outer IPv4
> ID header in the case of tunneled frames.  Specifically it forces the IP ID
> to be incrementing by 1 unless the DF bit is set in the outer IPv4 header.
> This way we can avoid creating overlapping series of IP IDs that could
> possibly be fragmented if the frame goes through GRO and is then
> resegmented via GSO.
> 
> Signed-off-by: Alexander Duyck 
> Signed-off-by: David S. Miller 
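
The outer IP ID rule described in the quoted commit message can be sketched as a small predicate. This is a simplified userspace model under stated assumptions, not the code in inet_gro_receive(): the function name and its parameters are hypothetical, and the real implementation tracks more state.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Simplified model: returns true when the incoming frame must not be
 * merged into the flow already held by GRO.
 *
 * prev_id: IP ID of the first frame held for this flow
 * count:   number of segments aggregated so far
 * cur_id:  IP ID of the incoming frame
 * df:      whether DF is set in the outer IPv4 header
 */
static bool outer_ip_id_forces_flush(uint16_t prev_id, uint16_t count,
				     uint16_t cur_id, bool df)
{
	/* 16-bit subtraction handles wraparound from 0xFFFF to 0. */
	uint16_t delta = (uint16_t)(cur_id - prev_id);

	if (delta == count)
		return false;	/* ID advanced by one per segment */
	if (df && delta == 0)
		return false;	/* fixed ID tolerated only with DF set */
	return true;
}
```

In this model a fixed ID without DF always forces a flush, which is the case the commit message is concerned about: such a frame could later be fragmented, so merging it and resegmenting via GSO could produce overlapping IP ID series.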
> 
> 
> 
> .
> 




Re: [PATCH v10 04/22] IB/hns: Add RoCE engine reset function

2016-06-28 Thread Leon Romanovsky
On Tue, Jun 28, 2016 at 02:31:41PM +0800, Wei Hu (Xavier) wrote:
> 
> 
> On 2016/6/27 16:31, oulijun wrote:
> >Hi, Leon
> >On 2016/6/27 16:01, Leon Romanovsky wrote:
> >>On Sat, Jun 25, 2016 at 06:25:37PM +0800, Wei Hu (Xavier) wrote:
> >>>
> >>>On 2016/6/24 22:59, Leon Romanovsky wrote:
> On Thu, Jun 16, 2016 at 10:35:12PM +0800, Lijun Ou wrote:
> >This patch mainly added reset flow of RoCE engine in RoCE
> >driver. It is necessary when RoCE is loaded and removed.
> >
> >Signed-off-by: Wei Hu 
> >Signed-off-by: Nenglong Zhao 
> >Signed-off-by: Lijun Ou 
> >---
> >>...
> >>
> >+
> >+#define SLEEP_TIME_INTERVAL 20
> >+
> >+extern int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool enable);
> Why did you add this extern?
> You already exported this function.
> drivers/net/ethernet/hisilicon/hns/hns_dsaf_main.c:EXPORT_SYMBOL(hns_dsaf_roce_reset);
> >>>Hi, Leon
> >>>
> >>> The function named hns_dsaf_roce_reset is defined in 
> >>> hns_dsaf_main.c
> >>> It exists in hns_dsaf.ko(ethernet driver)
> >>>
> >>> RoCE driver will call this function.
> >>>
> >>> Is your suggestion to delete "extern", as below?
> >>> In /drivers/infiniband/hw/hns/hns_roce_hw_v1.h:
> >>>
> >>>   int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool enable);
> >>>
> >>>Right? Or is there another solution?
> >>You placed it in header file.
> >>Please move it to your hns_roce_hw_v1.c file.
> >>
> >  You suggest doing the following, right?
> >  In hns_roce_hw_v1.c:
> >int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool enable);
> >
> >  and delete the keyword extern,
> >
> >  because if extern is kept in hns_roce_hw_v1.c, checkpatch does not pass.
> Hi, Leon & Doug Ledford
> 
> If we move it to the hns_roce_hw_v1.c file as below:
> int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool enable);
> checkpatch reports a warning.
> 
> We plan to add a header file for this function as follows:
> In the include/linux directory, create a subdirectory named hns.
> Add hns_driver.h in include/linux/hns.
> In hns_driver.h, add the declaration:
>int hns_dsaf_roce_reset(struct fwnode_handle *dsaf_fwnode, bool enable);
> What do you think?
> 
>

Please avoid creating new directories/files under include/linux,
especially for one function only.




Re: [PATCH] mpls: Add missing RCU-bh read side critical section locking in output path

2016-06-28 Thread Lennert Buytenhek
On Thu, Jun 23, 2016 at 12:00:55PM -0400, David Miller wrote:

> > From: David Barroso 
> > 
> > When locally originated IP traffic hits a route that says to push
> > MPLS labels, we'll get a call chain dst_output() -> lwtunnel_output()
> > -> mpls_output() -> neigh_xmit() -> ___neigh_lookup_noref() where the
> > last function in this chain accesses a RCU-bh protected struct
> > neigh_table pointer without us ever having declared an RCU-bh read
> > side critical section.
> > 
> > As in case of locally originated IP traffic we'll be running in process
> > context, with softirqs enabled, we can be preempted by a softirq at any
> > time, and RCU-bh considers the completion of a softirq as signaling
> > the end of any pending read-side critical sections, so if we do get a
> > softirq here, we can end up with an unexpected RCU grace period and
> > all the nastiness that that comes with.
> > 
> > This patch makes neigh_xmit() take rcu_read_{,un}lock_bh() around the
> > code that expects to be treated as an RCU-bh read side critical section.
> > 
> > Signed-off-by: David Barroso 
> > Signed-off-by: Lennert Buytenhek 
> 
> Whilst the case that was used to discover this problem was MPLS, that
> is not the subsystem where the bug exists and is being fixed.
> 
> Therefore please fix your Subject line.
> 
> Thanks.

I'd say that the bug _is_ in the MPLS code, but that we're just fixing
it in a helper function that lives elsewhere (and which is only used by
MPLS), but yeah, the subject line and the patch body don't match up. :(
I've resubmitted the patch with the commit message below, I hope that
that'll do.

Thanks!


===

[PATCH] neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()

From: David Barroso 

neigh_xmit() expects to be called inside an RCU-bh read side critical
section, and while one of its two current callers gets this right, the
other one doesn't.

More specifically, neigh_xmit() has two callers, mpls_forward() and
mpls_output(), and while both callers call neigh_xmit() under
rcu_read_lock(), this provides sufficient protection for neigh_xmit()
only in the case of mpls_forward(), as that is always called from
softirq context and therefore doesn't need explicit BH protection,
while mpls_output() can be called from process context with softirqs
enabled.

When mpls_output() is called from process context, with softirqs
enabled, we can be preempted by a softirq at any time, and RCU-bh
considers the completion of a softirq as signaling the end of any
pending read-side critical sections, so if we do get a softirq
while we are in the part of neigh_xmit() that expects to be run inside
an RCU-bh read side critical section, we can end up with an unexpected
RCU grace period running right in the middle of that critical section,
making things go boom.

This patch fixes this impedance mismatch in the callee, by making
neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
expects to be treated as an RCU-bh read side critical section, as this
seems a safer option than fixing it in the callers.

Fixes: 4fd3d7d9e868f ("neigh: Add helper function neigh_xmit")
Signed-off-by: David Barroso 
Signed-off-by: Lennert Buytenhek 
Acked-by: David Ahern 
Acked-by: Robert Shearman 


[PATCH] neigh: Explicitly declare RCU-bh read side critical section in neigh_xmit()

2016-06-28 Thread Lennert Buytenhek
From: David Barroso 

neigh_xmit() expects to be called inside an RCU-bh read side critical
section, and while one of its two current callers gets this right, the
other one doesn't.

More specifically, neigh_xmit() has two callers, mpls_forward() and
mpls_output(), and while both callers call neigh_xmit() under
rcu_read_lock(), this provides sufficient protection for neigh_xmit()
only in the case of mpls_forward(), as that is always called from
softirq context and therefore doesn't need explicit BH protection,
while mpls_output() can be called from process context with softirqs
enabled.

When mpls_output() is called from process context, with softirqs
enabled, we can be preempted by a softirq at any time, and RCU-bh
considers the completion of a softirq as signaling the end of any
pending read-side critical sections, so if we do get a softirq
while we are in the part of neigh_xmit() that expects to be run inside
an RCU-bh read side critical section, we can end up with an unexpected
RCU grace period running right in the middle of that critical section,
making things go boom.

This patch fixes this impedance mismatch in the callee, by making
neigh_xmit() always take rcu_read_{,un}lock_bh() around the code that
expects to be treated as an RCU-bh read side critical section, as this
seems a safer option than fixing it in the callers.

Fixes: 4fd3d7d9e868f ("neigh: Add helper function neigh_xmit")
Signed-off-by: David Barroso 
Signed-off-by: Lennert Buytenhek 
Acked-by: David Ahern 
Acked-by: Robert Shearman 
---
 net/core/neighbour.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/net/core/neighbour.c b/net/core/neighbour.c
index 29dd8cc..510cd62 100644
--- a/net/core/neighbour.c
+++ b/net/core/neighbour.c
@@ -2469,13 +2469,17 @@ int neigh_xmit(int index, struct net_device *dev,
tbl = neigh_tables[index];
if (!tbl)
goto out;
+   rcu_read_lock_bh();
neigh = __neigh_lookup_noref(tbl, addr, dev);
if (!neigh)
neigh = __neigh_create(tbl, addr, dev, false);
err = PTR_ERR(neigh);
-   if (IS_ERR(neigh))
+   if (IS_ERR(neigh)) {
+   rcu_read_unlock_bh();
goto out_kfree_skb;
+   }
err = neigh->output(neigh, skb);
+   rcu_read_unlock_bh();
}
else if (index == NEIGH_LINK_TABLE) {
err = dev_hard_header(skb, dev, ntohs(skb->protocol),
-- 
2.7.4
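
The shape of the fix above is that every exit path taken after rcu_read_lock_bh() must also reach rcu_read_unlock_bh(), including the early error exit. A userspace sketch of that control flow follows, assuming a plain counter in place of the real RCU-bh primitives. All model_* names are hypothetical and nothing here provides actual RCU protection.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical counter standing in for read-side nesting depth. */
static int model_bh_depth;

static void model_read_lock_bh(void)   { model_bh_depth++; }
static void model_read_unlock_bh(void) { model_bh_depth--; }

/* Hypothetical lookups: one succeeds, one simulates the IS_ERR() case. */
static const int model_entry;	/* zero-initialised "output" result */

static const int *model_lookup_ok(void)   { return &model_entry; }
static const int *model_lookup_fail(void) { return NULL; }

/* Mirrors the patched flow: lock, look up, unlock on both exits. */
static int model_xmit(const int *(*lookup)(void))
{
	const int *entry;
	int err;

	model_read_lock_bh();
	entry = lookup();
	if (!entry) {
		model_read_unlock_bh();	/* error path unlocks too */
		return -ENOMEM;
	}
	err = *entry;			/* stands in for neigh->output() */
	model_read_unlock_bh();
	return err;
}
```

After either call the depth counter is back at zero, which is the property the two added rcu_read_unlock_bh() calls in the patch preserve.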


Backported alx driver fix for stable 4.4 kernel and older

2016-06-28 Thread Feng Tang
Hi David,

Greg KH has picked up the alx driver fix for the 4.6 stable kernel, and
people are asking about the plan for the 4.4/4.1 stable kernels in
https://bugzilla.kernel.org/show_bug.cgi?id=70761

The fix in your "net" git tree can't be applied to 4.4 and older
kernels as is, so I have backported it as follows. Could you please add it
to your stable queue?

Let me know if I have broken any netdev+stable patch rule, thanks

- Feng



From 9c9caee22400c7ed3a514b1ee5e017e5e5b6b812 Mon Sep 17 00:00:00 2001
From: Feng Tang 
Date: Fri, 24 Jun 2016 15:26:05 +0800
Subject: [PATCH] net: alx: Work around the DMA RX overflow issue

Note: This is a verified backport for the stable 4.4 kernel, and it
can also be applied to 4.3/4.2/4.1/3.18/3.16.

There is a problem with alx devices: the network link is
lost 1-5 minutes after the device is brought up.

From debugging without a datasheet, we found that the error always
happens when the DMA RX address ends in 0xfc0, which is very
likely to be a HW/silicon problem.

This patch allocates each rx skb with 64 bytes of extra space, and if the
allocated skb has a 0x...fc0 address, it uses skb_reserve(skb, 64)
to advance the address, so that the RX overflow can be avoided.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761
Signed-off-by: Feng Tang 
Suggested-by: Eric Dumazet 
Tested-by: Ole Lukoie 
---
 drivers/net/ethernet/atheros/alx/main.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/atheros/alx/main.c b/drivers/net/ethernet/atheros/alx/main.c
index c8af3ce..a43d6a8 100644
--- a/drivers/net/ethernet/atheros/alx/main.c
+++ b/drivers/net/ethernet/atheros/alx/main.c
@@ -86,9 +86,14 @@ static int alx_refill_rx_ring(struct alx_priv *alx, gfp_t gfp)
while (!cur_buf->skb && next != rxq->read_idx) {
struct alx_rfd *rfd = &rxq->rfd[cur];
 
-   skb = __netdev_alloc_skb(alx->dev, alx->rxbuf_size, gfp);
+   skb = __netdev_alloc_skb(alx->dev, alx->rxbuf_size + 64, gfp);
if (!skb)
break;
+
+   /* Workround for the HW RX DMA overflow issue */
+   if (((unsigned long)skb->data & 0xfff) == 0xfc0)
+   skb_reserve(skb, 64);
+
dma = dma_map_single(&alx->hw.pdev->dev,
 skb->data, alx->rxbuf_size,
 DMA_FROM_DEVICE);
-- 
2.5.0
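
The address test in the workaround above can be looked at in isolation. The sketch below is a userspace model: rx_buf_skip() and ALX_MODEL_PAD are hypothetical names, and only the mask, the 0xfc0 offset and the 64-byte skip are taken from the patch.

```c
#include <stdint.h>

/* Extra bytes requested per RX buffer, as in the patch. */
#define ALX_MODEL_PAD 64u

/* Returns how many bytes to skip at the start of the buffer. */
static unsigned int rx_buf_skip(uintptr_t data_addr)
{
	/* Low 12 bits equal to 0xfc0: the offset reported to trigger
	 * the RX overflow, so move past it. */
	if ((data_addr & 0xfff) == 0xfc0)
		return ALX_MODEL_PAD;
	return 0;
}
```

Since 0xfc0 + 64 is 0x1000, skipping 64 bytes moves the buffer start to the beginning of the next 4 KiB boundary, and the extra 64 bytes in the allocation keep the usable length unchanged.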




Re: [PATCH net] bonding: fix 802.3ad aggregator reselection

2016-06-28 Thread David Miller
From: Jay Vosburgh 
Date: Thu, 23 Jun 2016 14:20:51 -0700

> 
>   Since commit 7bb11dc9f59d ("bonding: unify all places where
> actor-oper key needs to be updated."), the logic in bonding to handle
> selection between multiple aggregators has not functioned.
> 
>   This affects only configurations wherein the bonding slaves
> connect to two discrete aggregators (e.g., two independent switches, each
> with LACP enabled), thus creating two separate aggregation groups within a
> single bond.
> 
>   The cause is a change in 7bb11dc9f59d to no longer set
> AD_PORT_BEGIN on a port after a link state change, which would cause the
> port to be reselected for attachment to an aggregator as if it were newly
> added to the bond.  We cannot restore the prior behavior, as it
> contradicts IEEE 802.1AX 5.4.12, which requires ports that "become
> inoperable" (lose carrier, setting port_enabled=false as per 802.1AX
> 5.4.7) to remain selected (i.e., assigned to the aggregator).  As the port
> now remains selected, the aggregator selection logic is not invoked.
> 
>   A side effect of this change is that aggregators in bonding will
> now contain ports that are link down.  The aggregator selection logic
> does not currently handle this situation correctly, causing incorrect
> aggregator selection.
> 
>   This patch makes two changes to repair the aggregator selection
> logic in bonding to function as documented and within the confines of the
> standard:
> 
>   First, the aggregator selection and related logic now utilizes the
> number of active ports per aggregator, not the number of selected ports
> (as some selected ports may be down).  The ad_select "bandwidth" and
> "count" options only consider ports that are link up.
> 
>   Second, on any carrier state change of any slave, the aggregator
> selection logic is explicitly called to ensure the correct aggregator is
> active.
> 
> Reported-by: Veli-Matti Lintu 
> Fixes: 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to be updated.")
> Signed-off-by: Jay Vosburgh 

Applied and queued up for -stable, thanks Jay.


Re: [PATCH net 0/3] net: bgmac: Random fixes

2016-06-28 Thread David Miller
From: Florian Fainelli 
Date: Thu, 23 Jun 2016 14:23:11 -0700

> This patch series fixes a few issues spotted by code inspection and
> actual testing.

Series applied, thanks.


[PATCH net-next] net_sched: netem: do not call qdisc_drop() with a NULL skb

2016-06-28 Thread Eric Dumazet
From: Eric Dumazet 

If skb_unshare() fails, we call qdisc_drop() with a NULL skb, which
is no longer supported.

Fixes: 520ac30f4551 ("net_sched: drop packets after root qdisc lock is released")
Signed-off-by: Eric Dumazet 
Reported-by: Dan Carpenter 
---
 net/sched/sch_netem.c |   12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index ccca8ca4c722c603e8b8e6052eead51243e590b5..6eac3d8800480a4c463ae8d3b78a4fcfeec8165b 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -487,10 +487,14 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch,
skb = segs;
segs = segs->next;
 
-   if (!(skb = skb_unshare(skb, GFP_ATOMIC)) ||
-   (skb->ip_summed == CHECKSUM_PARTIAL &&
-skb_checksum_help(skb))) {
-   rc = qdisc_drop(skb, sch, to_free);
+   skb = skb_unshare(skb, GFP_ATOMIC);
+   if (unlikely(!skb)) {
+   qdisc_qstats_drop(sch);
+   goto finish_segs;
+   }
+   if (skb->ip_summed == CHECKSUM_PARTIAL &&
+   skb_checksum_help(skb)) {
+   qdisc_drop(skb, sch, to_free);
goto finish_segs;
}
 




Re: [PATCH 1/2] net: ethernet: dnet: use phydev from struct net_device

2016-06-28 Thread David Miller
From: Philippe Reynes 
Date: Thu, 23 Jun 2016 23:48:58 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH 2/2] net: ethernet: dnet: use phy_ethtool_{get|set}_link_ksettings

2016-06-28 Thread David Miller
From: Philippe Reynes 
Date: Thu, 23 Jun 2016 23:48:59 +0200

> There are two generic functions phy_ethtool_{get|set}_link_ksettings,
> so we can use them instead of defining the same code in the driver.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH usbnet v2.1] mtu change needs to stop RX

2016-06-28 Thread David Miller
From: Soohoon Lee 
Date: Fri, 24 Jun 2016 00:30:16 +

> 
> When the MTU is changed, unlink_urbs() flushes the RX queue, but meanwhile
> usbnet_bh() can refill the queue at the same time.
> Depending on which HCD is underneath, unlink can take a long time, and then
> the flush never ends.
> 
> Signed-off-by: Soohoon Lee 
> Reviewed-by: Kimball Murray 

This patch is mangled by your email client, the TAB characters have been
converted into spaces.

Please do not resubmit this patch until you can successfully email the
patch to yourself and apply it cleanly.

Also, your subject line should be formatted like:

[PATCH net v2.x] usbnet: 

Thanks.



Re: [PATCH v3] net/mlx5: use mlx5_buf_alloc_node instead of mlx5_buf_alloc in mlx5_wq_ll_create

2016-06-28 Thread David Miller
From: Wang Sheng-Hui 
Date: Fri, 24 Jun 2016 08:52:11 +0800

> Commit 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on
> reader NUMA node") introduced mlx5_*_alloc_node() but missed changing
> some calling and warn messages. This patch introduces 2 changes:
>   * Use mlx5_buf_alloc_node() instead of mlx5_buf_alloc() in
> mlx5_wq_ll_create()
>   * Update the failure warn messages with _node postfix for
> mlx5_*_alloc function names
> 
> Fixes: 311c7c71c9bb ("net/mlx5e: Allocate DMA coherent memory on reader NUMA node")
> Signed-off-by: Wang Sheng-Hui 

Applied.


Re: [PATCH 3/3] net: hisilicon: Add Fast Ethernet MAC driver

2016-06-28 Thread Dongpo Li


On 2016/6/15 5:20, Arnd Bergmann wrote:
> On Tuesday, June 14, 2016 9:17:44 PM CEST Li Dongpo wrote:
>> On 2016/6/13 17:06, Arnd Bergmann wrote:
>>> On Monday, June 13, 2016 2:07:56 PM CEST Dongpo Li wrote:
>>> You tx function uses BQL to optimize the queue length, and that
>>> is great. You also check xmit reclaim for rx interrupts, so
>>> as long as you have both rx and tx traffic, this should work
>>> great.
>>>
>>> However, I notice that you only have a 'tx fifo empty'
>>> interrupt triggering the napi poll, so I guess on a tx-only
>>> workload you will always end up pushing packets into the
>>> queue until BQL throttles tx, and then get the interrupt
>>> after all packets have been sent, which will cause BQL to
>>> make the queue longer up to the maximum queue size, and that
>>> negates the effect of BQL.
>>>
>>> Is there any way you can get a tx interrupt earlier than
>>> this in order to get a more balanced queue, or is it ok
>>> to just rely on rx packets to come in occasionally, and
>>> just use the tx fifo empty interrupt as a fallback?
>>>
>> In tx direction, there are only two kinds of interrupts, 'tx fifo empty'
>> and 'tx one packet finish'. I didn't use 'tx one packet finish' because
>> it would lead to high hardware interrupts rate. This has been verified in
>> our chips. It's ok to just use tx fifo empty interrupt.
> 
> I'm not convinced by the explanation, I don't think that has anything
> to do with the hardware design, but instead is about the correctness
> of the BQL logic with your driver.
> 
> Maybe your xmit function can do something like
> 
>   if (dql_avail(netdev_get_tx_queue(dev, 0)->dql) < 0)
>   enable per-packet interrupt
>   else
>   use only fifo-empty interrupt
> 
> That way, you don't get a lot of interrupts when the system is
> in a state of packets being received and sent continuously,
> but if you get to the point where your tx queue fills up
> and no rx interrupts arrive, you don't have to wait for it
> to become completely empty before adding new packets, and
> BQL won't keep growing the queue.
> 
Hi, Arnd
I tried enabling the per-packet interrupt in the xmit function when the tx
queue is full and disabling it in NAPI poll. But the number of interrupts is
a little higher than when only the fifo-empty interrupt is used.
On the other hand, this is a fast ethernet MAC. Its maximum speed is 100Mbps.
This speed is very easily achieved and the efficiency of BQL is not
so important. What we focus on is lower cpu utilization.
So I think it is okay to just use the tx fifo empty interrupt.

 +priv->phy_mode = of_get_phy_mode(node);
 +if (priv->phy_mode < 0) {
 +dev_err(dev, "not find phy-mode\n");
 +ret = -EINVAL;
 +goto out_disable_clk;
 +}
 +
 +priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
 +if (!priv->phy_node) {
 +dev_err(dev, "not find phy-handle\n");
 +ret = -EINVAL;
 +goto out_disable_clk;
 +}
 +
 +priv->phy = of_phy_connect(ndev, priv->phy_node,
 +   hisi_femac_adjust_link, 0, priv->phy_mode);
 +if (!(priv->phy) || IS_ERR(priv->phy)) {
 +dev_err(dev, "connect to PHY failed!\n");
 +ret = -ENODEV;
 +goto out_phy_node;
 +}
>>>
>>> I wonder if we could generalize this set of three calls, I
>>> get the impression that we duplicate this across several
>>> drivers that shouldn't need to bother with the specific
>>> phy-handle and phy-mode properties.
>>>
>> Some drivers only call 'of_phy_connect' when ndo_open called,
>> some call when driver probed. But 'phy_mode' and 'phy_node' are
>> usually initialized when driver probed.
>> So I think it's not suitable to combine 'of_phy_connect' with
>> 'of_get_phy_mode' and 'of_parse_phandle'.
>> Do you have any more suggestions ?
> 
> My idea was to add another interface that drivers could optionally
> call if they use the logic that you have here, but other drivers
> could keep using the plain of_phy_connect.
> 
> Anyway, this was just an idea, it's not important.
> 
>   Arnd
> 
> .
> 

Regards,
Dongpo

.



Re: [PATCH net-next] net: diag: Add support to filter on device index

2016-06-28 Thread David Miller
From: David Ahern 
Date: Thu, 23 Jun 2016 18:42:51 -0700

> Add support to inet_diag facility to filter sockets based on device
> index. If an interface index is in the filter only sockets bound
> to that index (sk_bound_dev_if) are returned.
> 
> Signed-off-by: David Ahern 

Applied.


Re: [PATCH] caif: Remove unneeded header file

2016-06-28 Thread David Miller
From: Amitoj Kaur Chawla 
Date: Fri, 24 Jun 2016 11:53:54 +0530

> Drop redundant include of moduleparam.h
> 
> The Coccinelle semantic patch used to make this change is as follows:
> @ includesmodule @
> @@
> 
> #include <linux/module.h>
> 
> @ depends on includesmodule @
> @@
> 
> - #include <linux/moduleparam.h>
> 
> Signed-off-by: Amitoj Kaur Chawla 

Applied, thanks.


Re: [PATCH 3/3] net: hisilicon: Add Fast Ethernet MAC driver

2016-06-28 Thread Arnd Bergmann
On Tuesday, June 28, 2016 5:21:19 PM CEST Dongpo Li wrote:
> On 2016/6/15 5:20, Arnd Bergmann wrote:
> > On Tuesday, June 14, 2016 9:17:44 PM CEST Li Dongpo wrote:
> >> On 2016/6/13 17:06, Arnd Bergmann wrote:
> >>> On Monday, June 13, 2016 2:07:56 PM CEST Dongpo Li wrote:
> >>> You tx function uses BQL to optimize the queue length, and that
> >>> is great. You also check xmit reclaim for rx interrupts, so
> >>> as long as you have both rx and tx traffic, this should work
> >>> great.
> >>>
> >>> However, I notice that you only have a 'tx fifo empty'
> >>> interrupt triggering the napi poll, so I guess on a tx-only
> >>> workload you will always end up pushing packets into the
> >>> queue until BQL throttles tx, and then get the interrupt
> >>> after all packets have been sent, which will cause BQL to
> >>> make the queue longer up to the maximum queue size, and that
> >>> negates the effect of BQL.
> >>>
> >>> Is there any way you can get a tx interrupt earlier than
> >>> this in order to get a more balanced queue, or is it ok
> >>> to just rely on rx packets to come in occasionally, and
> >>> just use the tx fifo empty interrupt as a fallback?
> >>>
> >> In tx direction, there are only two kinds of interrupts, 'tx fifo empty'
> >> and 'tx one packet finish'. I didn't use 'tx one packet finish' because
> >> it would lead to high hardware interrupts rate. This has been verified in
> >> our chips. It's ok to just use tx fifo empty interrupt.
> > 
> > I'm not convinced by the explanation, I don't think that has anything
> > to do with the hardware design, but instead is about the correctness
> > of the BQL logic with your driver.
> > 
> > Maybe your xmit function can do something like
> > 
> >   if (dql_avail(netdev_get_tx_queue(dev, 0)->dql) < 0)
> >   enable per-packet interrupt
> >   else
> >   use only fifo-empty interrupt
> > 
> > That way, you don't get a lot of interrupts when the system is
> > in a state of packets being received and sent continuously,
> > but if you get to the point where your tx queue fills up
> > and no rx interrupts arrive, you don't have to wait for it
> > to become completely empty before adding new packets, and
> > BQL won't keep growing the queue.
> > 
> Hi, Arnd
> I tried enabling the per-packet interrupt in the xmit function when the tx
> queue is full and disabling it in NAPI poll. But the number of interrupts is
> a little higher than when only the fifo-empty interrupt is used.

Right, I'd expect that to be the case, it basically means that the
algorithm works as expected.

Just to be sure you didn't have extra interrupts: you only enable the
per-packet interrupts if interrupts are currently enabled, not in
NAPI polling mode, right?

> On the other hand, this is a fast ethernet MAC. Its maximum speed is 100Mbps.
> This speed is very easily achieved and the efficiency of BQL is not
> so important. What we focus on is lower cpu utilization.
> So I think it is okay to just use the tx fifo empty interrupt.

BQL is not about efficiency, it's about keeping the latency down, which
is at least as important for low-throughput devices as it is for faster
ones. I don't think that disabling BQL here would be the right answer,
you'd just end up with the maximum TX queue length all the time.

Your queue length is 12 packets of 1500 bytes, meaning that you have 1.4ms
of latency at 100mbit/s rate, or 14ms for 10mbit/s. This is much less
than most, but it's probably still worth using BQL on it.
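
The latency figures above follow from the queue size and the link rate. A small sketch of the arithmetic, using a hypothetical helper name and ignoring Ethernet framing overhead:

```c
/* Time in microseconds to drain a full TX queue at a given link rate. */
static unsigned long queue_drain_us(unsigned int packets,
				    unsigned int bytes_per_packet,
				    unsigned int mbit_per_s)
{
	unsigned long bits = (unsigned long)packets * bytes_per_packet * 8;

	/* 1 Mbit/s is one bit per microsecond. */
	return bits / mbit_per_s;
}
```

For 12 packets of 1500 bytes this gives 144000 bits, so 1440 microseconds at 100 Mbit/s and 14400 microseconds at 10 Mbit/s, matching the roughly 1.4 ms and 14 ms quoted.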

Arnd


Re: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-28 Thread David Miller
From: Dexuan Cui 
Date: Fri, 24 Jun 2016 07:45:24 +

> + while ((ret = vmalloc(size)) == NULL)
> + ssleep(1);

This is completely, and entirely, unacceptable.

If the allocation fails, you return an error and release
your resources.

You don't just loop forever waiting for it to succeed.
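
The pattern being asked for is to fail the operation and unwind, rather than retry the allocation indefinitely. A userspace sketch follows, using malloc/free in place of vmalloc/vfree. The conn_model structure, the injectable allocator and the helper names are hypothetical and only serve to make the failure paths visible.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-connection state with two buffers. */
struct conn_model {
	void *send_buf;
	void *recv_buf;
};

typedef void *(*alloc_fn)(size_t);

/* Allocate both buffers, or fail cleanly with nothing left allocated. */
static int conn_model_init(struct conn_model *c, size_t size, alloc_fn alloc)
{
	c->recv_buf = NULL;
	c->send_buf = alloc(size);
	if (!c->send_buf)
		return -ENOMEM;

	c->recv_buf = alloc(size);
	if (!c->recv_buf) {
		free(c->send_buf);	/* release what was already acquired */
		c->send_buf = NULL;
		return -ENOMEM;
	}
	return 0;
}

/* Test allocators: always fail, or fail on the second call only. */
static void *model_alloc_never(size_t size)
{
	(void)size;
	return NULL;
}

static int model_alloc_calls;

static void *model_alloc_second_fails(size_t size)
{
	return ++model_alloc_calls == 2 ? NULL : malloc(size);
}
```

The caller sees -ENOMEM and can report it upward or retry at a higher level if that makes sense, while the function itself never sleeps waiting for memory.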


Re: [PATCH V5 1/1] net: ethernet: Add TSE PCS support to dwmac-socfpga

2016-06-28 Thread David Miller
From: 
Date: Fri, 24 Jun 2016 02:13:23 -0700

> From: Tien Hock Loh 
> 
> This adds support for TSE PCS that uses SGMII adapter when the phy-mode of
> the dwmac is set to sgmii.
> 
> Signed-off-by: Tien Hock Loh 
> Acked-by: Giuseppe Cavallaro 
> Acked-by: Rob Herring 

Applied to net-next, thanks.


Re: [PATCH] of_mdio: select fixed phy support unconditionally

2016-06-28 Thread David Miller
From: Arnd Bergmann 
Date: Fri, 24 Jun 2016 11:24:08 +0200

> Calling the fixed-phy functions when CONFIG_FIXED_PHY=m as a previous
> change tried cannot work if the caller is in built-in code:
> 
> drivers/of/built-in.o: In function `of_phy_register_fixed_link':
> of_reserved_mem.c:(.text+0x85e0): undefined reference to `fixed_phy_register'
> 
> Making of_mdio depend on 'FIXED_PHY || !FIXED_PHY' would solve this
> dependency by enforcing that OF_MDIO itself becomes a loadable module
> when FIXED_PHY=m, but that creates a different dependency as it
> breaks any built-in ethernet driver that uses of_mdio.
> 
> Making FIXED_PHY a bool option also cannot work, since it depends on
> PHYLIB, which again is tristate.
> 
> This version now uses 'select FIXED_PHY' to ensure that the fixed-phy
> portion of of_mdio is not optional. The main downside of this is
> a small increase in code size for cases that do not need fixed phy
> support, but it should avoid all of the link-time problems.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: d1bd330a229f ("of_mdio: Enable fixed PHY support if driver is a module")

Applied to net-next, thanks Arnd.

In the future, please be explicit about what tree a patch is
targeting by specifying it in your Subject line, as:

[PATCH net-next] ...

or similar.

Thanks.


Re: [PATCH] of_mdio: select fixed phy support unconditionally

2016-06-28 Thread Arnd Bergmann
On Tuesday, June 28, 2016 5:43:42 AM CEST David Miller wrote:
> Applied to net-next, thanks Arnd.
> 
> In the future, please be explicit about what tree a patch is
> targeting by specifying it in your Subject line, as:
> 
> [PATCH net-next] ...
> 

Sure, will do.

Arnd



Re: [PATCH V5 1/1] net: ethernet: Add TSE PCS support to dwmac-socfpga

2016-06-28 Thread David Miller
From: David Miller 
Date: Tue, 28 Jun 2016 05:34:50 -0400 (EDT)

> From: 
> Date: Fri, 24 Jun 2016 02:13:23 -0700
> 
>> From: Tien Hock Loh 
>> 
>> This adds support for TSE PCS that uses SGMII adapter when the phy-mode of
>> the dwmac is set to sgmii.
>> 
>> Signed-off-by: Tien Hock Loh 
>> Acked-by: Giuseppe Cavallaro 
>> Acked-by: Rob Herring 
> 
> Applied to net-next, thanks.

I had to revert, this breaks the build:

ERROR: "tse_pcs_fix_mac_speed" [drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.ko] undefined!
ERROR: "tse_pcs_init" [drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.ko] undefined!


Re: [PATCH net] Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address

2016-06-28 Thread David Miller
From: Daniel Danzberger 
Date: Fri, 24 Jun 2016 12:35:18 +0200

> The bridge is falsely dropping ipv6 multicast packets if there is:
>  1. No ipv6 address assigned on the bridge.
>  2. No external mld querier present.
>  3. The internal querier enabled.
> 
> When the bridge fails to build mld queries, because it has no
> ipv6 address, it silently returns, but keeps the local querier enabled.
> This specific case causes confusing packet loss.
> 
> Ipv6 multicast snooping can only work if:
>  a) An external querier is present
>  OR
>  b) The bridge has an ipv6 address and is capable of sending its own queries
> 
> Otherwise it has to forward/flood the ipv6 multicast traffic,
> because snooping cannot work.
> 
> This patch fixes the issue by adding a flag to the bridge struct that
> indicates that there is currently no ipv6 address assigned to the bridge
> and returns a false state for the local querier in
> __br_multicast_querier_exists().
> 
> Special thanks to Linus Lüssing.
> 
> Signed-off-by: Daniel Danzberger 

Applied.


Re: [PATCH V5 1/1] net: ethernet: Add TSE PCS support to dwmac-socfpga

2016-06-28 Thread Tien Hock Loh
My fault, I wasn't testing a module build; I was always building the
driver into the kernel. I'll get it fixed and post another patch
for review.

Thanks
Tien Hock

On Tue, 2016-06-28 at 05:48 -0400, David Miller wrote:
> From: David Miller 
> Date: Tue, 28 Jun 2016 05:34:50 -0400 (EDT)
> 
> > From: 
> > Date: Fri, 24 Jun 2016 02:13:23 -0700
> > 
> >> From: Tien Hock Loh 
> >> 
> >> This adds support for TSE PCS that uses SGMII adapter when the phy-mode of
> >> the dwmac is set to sgmii.
> >> 
> >> Signed-off-by: Tien Hock Loh 
> >> Acked-by: Giuseppe Cavallaro 
> >> Acked-by: Rob Herring 
> > 
> > Applied to net-next, thanks.
> 
> I had to revert, this breaks the build:
> 
> ERROR: "tse_pcs_fix_mac_speed" [drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.ko] undefined!
> ERROR: "tse_pcs_init" [drivers/net/ethernet/stmicro/stmmac/dwmac-socfpga.ko] undefined!



[PATCH stable] sfc: report supported link speeds on SFP connections

2016-06-28 Thread Bert Kenward
Hello David,

Can you queue up this commit for -stable please? The sfc 8000-series
NICs are less tolerant of missing supported link speeds. It applies
cleanly on to 4.5.7 and 4.6.3. The Fixes: commit below is the commit
that introduced the 8000-series PCI IDs.

commit 1974282ab547df7437276c8d4ec47f3d2300f339
Author: Bert Kenward 
Date:   Mon Jun 6 17:29:30 2016 +0100

sfc: report supported link speeds on SFP connections

Fixes: dd248f1bc65b49cba622a7e925d90d790e572996

Thanks,

Bert.


[PATCH] wlcore/wl18xx: mesh: added initial mesh support for wl8

2016-06-28 Thread Yaniv Machani
From: Maital Hahn 

1. Added support for interface and role of mesh type.
2. Enabled enable/start of mesh-point role,
   and opening and closing a connection with a mesh peer.
3. Added multirole combination of mesh and ap
   under the same limits of dual ap mode.
4. Add support for 'sta_rc_update' opcode for mesh IF.
   The 'sta_rc_update' opcode is being used in mesh_plink.c.
   Add support in wlcore to handle this opcode correctly for mesh
   (as opposed to current implementation that handles STA only).
5. Bumped the firmware version to support new Mesh functionality

Signed-off-by: Maital Hahn 
Signed-off-by: Yaniv Machani 
---
 drivers/net/wireless/ti/wl18xx/main.c | 15 ---
 drivers/net/wireless/ti/wl18xx/wl18xx.h   |  2 +-
 drivers/net/wireless/ti/wlcore/acx.h  |  1 +
 drivers/net/wireless/ti/wlcore/boot.c |  2 +-
 drivers/net/wireless/ti/wlcore/cmd.c  | 13 -
 drivers/net/wireless/ti/wlcore/main.c | 32 +++
 drivers/net/wireless/ti/wlcore/wlcore_i.h |  1 +
 7 files changed, 52 insertions(+), 14 deletions(-)

diff --git a/drivers/net/wireless/ti/wl18xx/main.c 
b/drivers/net/wireless/ti/wl18xx/main.c
index ae47c79..4811b74 100644
--- a/drivers/net/wireless/ti/wl18xx/main.c
+++ b/drivers/net/wireless/ti/wl18xx/main.c
@@ -1821,9 +1821,12 @@ static const struct ieee80211_iface_limit 
wl18xx_iface_limits[] = {
},
{
.max = 1,
-   .types = BIT(NL80211_IFTYPE_AP) |
-BIT(NL80211_IFTYPE_P2P_GO) |
-BIT(NL80211_IFTYPE_P2P_CLIENT),
+   .types =   BIT(NL80211_IFTYPE_AP)
+| BIT(NL80211_IFTYPE_P2P_GO)
+| BIT(NL80211_IFTYPE_P2P_CLIENT)
+#ifdef CONFIG_MAC80211_MESH
+| BIT(NL80211_IFTYPE_MESH_POINT)
+#endif
},
{
.max = 1,
@@ -1836,6 +1839,12 @@ static const struct ieee80211_iface_limit 
wl18xx_iface_ap_limits[] = {
.max = 2,
.types = BIT(NL80211_IFTYPE_AP),
},
+#ifdef CONFIG_MAC80211_MESH
+   {
+   .max = 1,
+   .types = BIT(NL80211_IFTYPE_MESH_POINT),
+   },
+#endif
{
.max = 1,
.types = BIT(NL80211_IFTYPE_P2P_DEVICE),
diff --git a/drivers/net/wireless/ti/wl18xx/wl18xx.h 
b/drivers/net/wireless/ti/wl18xx/wl18xx.h
index 71e9e38..d65cc6d 100644
--- a/drivers/net/wireless/ti/wl18xx/wl18xx.h
+++ b/drivers/net/wireless/ti/wl18xx/wl18xx.h
@@ -29,7 +29,7 @@
 #define WL18XX_IFTYPE_VER  9
 #define WL18XX_MAJOR_VER   WLCORE_FW_VER_IGNORE
 #define WL18XX_SUBTYPE_VER WLCORE_FW_VER_IGNORE
-#define WL18XX_MINOR_VER   11
+#define WL18XX_MINOR_VER   58
 
 #define WL18XX_CMD_MAX_SIZE  740
 
diff --git a/drivers/net/wireless/ti/wlcore/acx.h 
b/drivers/net/wireless/ti/wlcore/acx.h
index 0d61fae..6321ed4 100644
--- a/drivers/net/wireless/ti/wlcore/acx.h
+++ b/drivers/net/wireless/ti/wlcore/acx.h
@@ -105,6 +105,7 @@ enum wl12xx_role {
WL1271_ROLE_DEVICE,
WL1271_ROLE_P2P_CL,
WL1271_ROLE_P2P_GO,
+   WL1271_ROLE_MESH_POINT,
 
WL12XX_INVALID_ROLE_TYPE = 0xff
 };
diff --git a/drivers/net/wireless/ti/wlcore/boot.c 
b/drivers/net/wireless/ti/wlcore/boot.c
index 19b7ec7..f75d304 100644
--- a/drivers/net/wireless/ti/wlcore/boot.c
+++ b/drivers/net/wireless/ti/wlcore/boot.c
@@ -130,7 +130,7 @@ fail:
wl1271_error("Your WiFi FW version (%u.%u.%u.%u.%u) is invalid.\n"
 "Please use at least FW %s\n"
 "You can get the latest firmwares at:\n"
-"git://github.com/TI-OpenLink/firmwares.git",
+"git://git.ti.com/wilink8-wlan/wl18xx_fw.git",
 fw_ver[FW_VER_CHIP], fw_ver[FW_VER_IF_TYPE],
 fw_ver[FW_VER_MAJOR], fw_ver[FW_VER_SUBTYPE],
 fw_ver[FW_VER_MINOR], min_fw_str);
diff --git a/drivers/net/wireless/ti/wlcore/cmd.c 
b/drivers/net/wireless/ti/wlcore/cmd.c
index 3315356..d002dc7 100644
--- a/drivers/net/wireless/ti/wlcore/cmd.c
+++ b/drivers/net/wireless/ti/wlcore/cmd.c
@@ -629,11 +629,14 @@ int wl12xx_cmd_role_start_ap(struct wl1271 *wl, struct 
wl12xx_vif *wlvif)
 
wl1271_debug(DEBUG_CMD, "cmd role start ap %d", wlvif->role_id);
 
-   /* trying to use hidden SSID with an old hostapd version */
-   if (wlvif->ssid_len == 0 && !bss_conf->hidden_ssid) {
-   wl1271_error("got a null SSID from beacon/bss");
-   ret = -EINVAL;
-   goto out;
+   /* If MESH --> ssid_len is always 0 */
+   if (!ieee80211_vif_is_mesh(vif)) {
+   /* trying to use hidden SSID with an old hostapd version */
+   if (wlvif->ssid_len == 0 && !bss_conf->hidden_ssid) {
+   wl1271_error("got a null SSID from beacon/bss");
+   ret = -EINVAL;
+   goto out;
+  

[PATCH net-next 3/6] bpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_read

2016-06-28 Thread Daniel Borkmann
Follow-up commit to 1e33759c788c ("bpf, trace: add BPF_F_CURRENT_CPU
flag for bpf_perf_event_output") to add the same functionality into
bpf_perf_event_read() helper. The split of index into flags and index
component is also safe here, since such large maps are rejected during
map allocation time.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h |  2 +-
 kernel/trace/bpf_trace.c | 11 ---
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 406459b..58df2da 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -347,7 +347,7 @@ enum bpf_func_id {
 #define BPF_F_ZERO_CSUM_TX (1ULL << 1)
 #define BPF_F_DONT_FRAGMENT    (1ULL << 2)
 
-/* BPF_FUNC_perf_event_output flags. */
+/* BPF_FUNC_perf_event_output and BPF_FUNC_perf_event_read flags. */
 #define BPF_F_INDEX_MASK   0xffffffffULL
 #define BPF_F_CURRENT_CPU  BPF_F_INDEX_MASK
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 505f9e9..19c5b4a 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -188,13 +188,19 @@ const struct bpf_func_proto 
*bpf_get_trace_printk_proto(void)
return &bpf_trace_printk_proto;
 }
 
-static u64 bpf_perf_event_read(u64 r1, u64 index, u64 r3, u64 r4, u64 r5)
+static u64 bpf_perf_event_read(u64 r1, u64 flags, u64 r3, u64 r4, u64 r5)
 {
struct bpf_map *map = (struct bpf_map *) (unsigned long) r1;
struct bpf_array *array = container_of(map, struct bpf_array, map);
+   unsigned int cpu = smp_processor_id();
+   u64 index = flags & BPF_F_INDEX_MASK;
struct bpf_event_entry *ee;
struct perf_event *event;
 
+   if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
+   return -EINVAL;
+   if (index == BPF_F_CURRENT_CPU)
+   index = cpu;
if (unlikely(index >= array->map.max_entries))
return -E2BIG;
 
@@ -208,8 +214,7 @@ static u64 bpf_perf_event_read(u64 r1, u64 index, u64 r3, 
u64 r4, u64 r5)
return -EINVAL;
 
/* make sure event is local and doesn't have pmu::count */
-   if (event->oncpu != smp_processor_id() ||
-   event->pmu->count)
+   if (unlikely(event->oncpu != cpu || event->pmu->count))
return -EINVAL;
 
/*
-- 
1.9.3
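
For readers following the flags handling in the bpf_perf_event_read()
change above, the index resolution can be modelled in plain userspace C.
This is only a sketch under stated assumptions: the DEMO_ macros are local
stand-ins for the uapi definitions, the mask is assumed to cover the low
32 bits (0xffffffffULL, as in mainline), and demo_resolve_index() is a
made-up name, not a kernel function:

```c
#include <errno.h>
#include <stdint.h>

/* Local stand-ins for BPF_F_INDEX_MASK and BPF_F_CURRENT_CPU. */
#define DEMO_F_INDEX_MASK	0xffffffffULL
#define DEMO_F_CURRENT_CPU	DEMO_F_INDEX_MASK

/*
 * Hypothetical model of the checks in the helper: returns the resolved
 * array index, or a negative errno value if the flags are rejected.
 */
static int64_t demo_resolve_index(uint64_t flags, uint32_t cpu,
				  uint32_t max_entries)
{
	uint64_t index = flags & DEMO_F_INDEX_MASK;

	/* Bits above the index mask are reserved and must be zero. */
	if (flags & ~DEMO_F_INDEX_MASK)
		return -EINVAL;
	/* An all-ones index selects the entry of the current CPU. */
	if (index == DEMO_F_CURRENT_CPU)
		index = cpu;
	if (index >= max_entries)
		return -E2BIG;

	return (int64_t)index;
}
```

The all-ones value can double as a flag because no map can have that many
entries; as the commit message says, such large maps are rejected at
allocation time.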



[PATCH net-next 4/6] bpf: don't use raw processor id in generic helper

2016-06-28 Thread Daniel Borkmann
Use smp_processor_id() for the generic helper bpf_get_smp_processor_id()
instead of the raw variant. This allows for preemption checks when we
have DEBUG_PREEMPT, and otherwise uses the raw variant anyway. We only
need to keep the raw variant for socket filters, but we can reuse the
helper that is already there from cBPF side.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/helpers.c |  2 +-
 net/core/filter.c| 10 +-
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index ad7a057..1ea3afb 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -101,7 +101,7 @@ const struct bpf_func_proto bpf_get_prandom_u32_proto = {
 
 static u64 bpf_get_smp_processor_id(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 {
-   return raw_smp_processor_id();
+   return smp_processor_id();
 }
 
 const struct bpf_func_proto bpf_get_smp_processor_id_proto = {
diff --git a/net/core/filter.c b/net/core/filter.c
index cb9fc16..46c88d9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -150,6 +150,12 @@ static u64 __get_raw_cpu_id(u64 ctx, u64 a, u64 x, u64 r4, 
u64 r5)
return raw_smp_processor_id();
 }
 
+static const struct bpf_func_proto bpf_get_raw_smp_processor_id_proto = {
+   .func   = __get_raw_cpu_id,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+};
+
 static u32 convert_skb_access(int skb_field, int dst_reg, int src_reg,
  struct bpf_insn *insn_buf)
 {
@@ -2037,7 +2043,7 @@ sk_filter_func_proto(enum bpf_func_id func_id)
case BPF_FUNC_get_prandom_u32:
return &bpf_get_prandom_u32_proto;
case BPF_FUNC_get_smp_processor_id:
-   return &bpf_get_smp_processor_id_proto;
+   return &bpf_get_raw_smp_processor_id_proto;
case BPF_FUNC_tail_call:
return &bpf_tail_call_proto;
case BPF_FUNC_ktime_get_ns:
@@ -2086,6 +2092,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
return &bpf_get_route_realm_proto;
case BPF_FUNC_perf_event_output:
return bpf_get_event_output_proto();
+   case BPF_FUNC_get_smp_processor_id:
+   return &bpf_get_smp_processor_id_proto;
default:
return sk_filter_func_proto(func_id);
}
-- 
1.9.3



[PATCH net-next 5/6] bpf: add bpf_skb_change_proto helper

2016-06-28 Thread Daniel Borkmann
This patch adds a minimal helper for doing the groundwork of changing
the skb->protocol in a controlled way. Currently supported is v4 to
v6 and vice versa transitions, which allows f.e. for a minimal, static
nat64 implementation where applications in containers that still
require IPv4 can be transparently operated in an IPv6-only environment.
For example, host facing veth of the container can transparently do
the transitions in a programmatic way with the help of clsact qdisc
and cls_bpf.

Idea is to separate concerns for keeping complexity of the helper
lower, which means that the programs utilize bpf_skb_change_proto(),
bpf_skb_store_bytes() and bpf_lX_csum_replace() to get the job done,
instead of doing everything in a single helper (and thus partially
duplicating helper functionality). Also, bpf_skb_change_proto()
shouldn't need to deal with raw packet data as this is done by other
helpers.

bpf_skb_proto_6_to_4() and bpf_skb_proto_4_to_6() unclone the skb to
operate on a private one, push or pop additionally required header
space and migrate the gso/gro meta data from the shared info. We do
mark the gso type as dodgy so that headers are checked and segs
recalculated by the gso/gro engine. The gso_size target is adapted
as well. The flags argument added is currently reserved and can be
used for future extensions.
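
To illustrate the "push" half of this: the sketch below mimics, on a plain
byte buffer, what the generic push step does with the first `off` bytes
when `len` bytes of header space are opened up. It is a userspace model
only; struct demo_buf and demo_push() are hypothetical names, and the
explicit headroom check stands in for the skb_cow() done by the caller in
the real code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the few skb fields the push step touches. */
struct demo_buf {
	unsigned char *data;	/* current start of packet data */
	size_t len;		/* bytes of packet data */
	size_t headroom;	/* free bytes in front of data */
};

/*
 * Open `len` zeroed bytes at offset `off`: grow the buffer towards the
 * front, move the first `off` bytes down to the new start, then clear
 * the gap. Returns 0 on success, -1 if the request does not fit.
 */
static int demo_push(struct demo_buf *b, size_t off, size_t len)
{
	assert(b != NULL);

	if (len > b->headroom || off > b->len)
		return -1;

	b->data -= len;
	b->headroom -= len;
	b->len += len;

	memmove(b->data, b->data + len, off);
	memset(b->data + off, 0, len);
	return 0;
}
```

For a v4 to v6 transition `len` would be the difference between the two
header sizes (40 - 20 = 20 bytes); filling the new header afterwards is
left to the program via the store-bytes and checksum helpers.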

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h |  14 
 net/core/filter.c| 200 +++
 2 files changed, 214 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 58df2da..66cd738 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -313,6 +313,20 @@ enum bpf_func_id {
 */
BPF_FUNC_skb_get_tunnel_opt,
BPF_FUNC_skb_set_tunnel_opt,
+
+   /**
+* bpf_skb_change_proto(skb, proto, flags)
+* Change protocol of the skb. Currently supported is
+* v4 -> v6, v6 -> v4 transitions. The helper will also
+* resize the skb. eBPF program is expected to fill the
+* new headers via skb_store_bytes and lX_csum_replace.
+* @skb: pointer to skb
+* @proto: new skb->protocol type
+* @flags: reserved
+* Return: 0 on success or negative error
+*/
+   BPF_FUNC_skb_change_proto,
+
__BPF_FUNC_MAX_ID,
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 46c88d9..d983e76 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1783,6 +1783,202 @@ const struct bpf_func_proto bpf_skb_vlan_pop_proto = {
 };
 EXPORT_SYMBOL_GPL(bpf_skb_vlan_pop_proto);
 
+static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
+{
+   /* Caller already did skb_cow() with len as headroom,
+* so no need to do it here.
+*/
+   skb_push(skb, len);
+   memmove(skb->data, skb->data + len, off);
+   memset(skb->data + off, 0, len);
+
+   /* No skb_postpush_rcsum(skb, skb->data + off, len)
+* needed here as it does not change the skb->csum
+* result for checksum complete when summing over
+* zeroed blocks.
+*/
+   return 0;
+}
+
+static int bpf_skb_generic_pop(struct sk_buff *skb, u32 off, u32 len)
+{
+   /* skb_ensure_writable() is not needed here, as we're
+* already working on an uncloned skb.
+*/
+   if (unlikely(!pskb_may_pull(skb, off + len)))
+   return -ENOMEM;
+
+   skb_postpull_rcsum(skb, skb->data + off, len);
+   memmove(skb->data + len, skb->data, off);
+   __skb_pull(skb, len);
+
+   return 0;
+}
+
+static int bpf_skb_net_hdr_push(struct sk_buff *skb, u32 off, u32 len)
+{
+   bool trans_same = skb->transport_header == skb->network_header;
+   int ret;
+
+   /* There's no need for __skb_push()/__skb_pull() pair to
+* get to the start of the mac header as we're guaranteed
+* to always start from here under eBPF.
+*/
+   ret = bpf_skb_generic_push(skb, off, len);
+   if (likely(!ret)) {
+   skb->mac_header -= len;
+   skb->network_header -= len;
+   if (trans_same)
+   skb->transport_header = skb->network_header;
+   }
+
+   return ret;
+}
+
+static int bpf_skb_net_hdr_pop(struct sk_buff *skb, u32 off, u32 len)
+{
+   bool trans_same = skb->transport_header == skb->network_header;
+   int ret;
+
+   /* Same here, __skb_push()/__skb_pull() pair not needed. */
+   ret = bpf_skb_generic_pop(skb, off, len);
+   if (likely(!ret)) {
+   skb->mac_header += len;
+   skb->network_header += len;
+   if (trans_same)
+   skb->transport_header = skb->network_header;
+   }
+
+   return ret;
+}
+
+static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
+{
+   const u32 len_diff = sizeof(struct ipv6hdr) - sizeof(struct iphdr);
+   u32 off = skb->network_

[PATCH net-next 6/6] bpf: add bpf_skb_change_type helper

2016-06-28 Thread Daniel Borkmann
This work adds a helper for changing skb->pkt_type in a controlled way.
We only allow a subset of possible values and can extend that in future
should other use cases come up. Doing this as a helper has the advantage
that errors can be handled gracefully and the helper thus kept extensible.

It's a write counterpart to pkt_type member we can already read from
struct __sk_buff context. Major use case is to change incoming skbs to
PACKET_HOST in a programmatic way instead of having to recirculate via
redirect(..., BPF_F_INGRESS), for example.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h |  9 +
 net/core/filter.c| 24 
 2 files changed, 33 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 66cd738..be6ac12 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -327,6 +327,15 @@ enum bpf_func_id {
 */
BPF_FUNC_skb_change_proto,
 
+   /**
+* bpf_skb_change_type(skb, type)
+* Change packet type of skb.
+* @skb: pointer to skb
+* @type: new skb->pkt_type type
+* Return: 0 on success or negative error
+*/
+   BPF_FUNC_skb_change_type,
+
__BPF_FUNC_MAX_ID,
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index d983e76..76f9a49 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1979,6 +1979,28 @@ static const struct bpf_func_proto 
bpf_skb_change_proto_proto = {
.arg3_type  = ARG_ANYTHING,
 };
 
+static u64 bpf_skb_change_type(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+   struct sk_buff *skb = (struct sk_buff *) (long) r1;
+   u32 pkt_type = r2;
+
+   /* We only allow a restricted subset to be changed for now. */
+   if (unlikely(skb->pkt_type > PACKET_OTHERHOST ||
+pkt_type > PACKET_OTHERHOST))
+   return -EINVAL;
+
+   skb->pkt_type = pkt_type;
+   return 0;
+}
+
+static const struct bpf_func_proto bpf_skb_change_type_proto = {
+   .func   = bpf_skb_change_type,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_ANYTHING,
+};
+
 bool bpf_helper_changes_skb_data(void *func)
 {
if (func == bpf_skb_vlan_push)
@@ -2278,6 +2300,8 @@ tc_cls_act_func_proto(enum bpf_func_id func_id)
return &bpf_skb_vlan_pop_proto;
case BPF_FUNC_skb_change_proto:
return &bpf_skb_change_proto_proto;
+   case BPF_FUNC_skb_change_type:
+   return &bpf_skb_change_type_proto;
case BPF_FUNC_skb_get_tunnel_key:
return &bpf_skb_get_tunnel_key_proto;
case BPF_FUNC_skb_set_tunnel_key:
-- 
1.9.3



[PATCH net-next 2/6] bpf, trace: fetch current cpu only once

2016-06-28 Thread Daniel Borkmann
We currently have two invocations, which is unnecessary. Fetch it only
once and use the smp_processor_id() variant, so we also get preemption
checks along with it when DEBUG_PREEMPT is set.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/trace/bpf_trace.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4e61f74..505f9e9 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -233,6 +233,7 @@ static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 flags, 
u64 r4, u64 size)
struct pt_regs *regs = (struct pt_regs *) (long) r1;
struct bpf_map *map = (struct bpf_map *) (long) r2;
struct bpf_array *array = container_of(map, struct bpf_array, map);
+   unsigned int cpu = smp_processor_id();
u64 index = flags & BPF_F_INDEX_MASK;
void *data = (void *) (long) r4;
struct perf_sample_data sample_data;
@@ -246,7 +247,7 @@ static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 flags, 
u64 r4, u64 size)
if (unlikely(flags & ~(BPF_F_INDEX_MASK)))
return -EINVAL;
if (index == BPF_F_CURRENT_CPU)
-   index = raw_smp_processor_id();
+   index = cpu;
if (unlikely(index >= array->map.max_entries))
return -E2BIG;
 
@@ -259,7 +260,7 @@ static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 flags, 
u64 r4, u64 size)
 event->attr.config != PERF_COUNT_SW_BPF_OUTPUT))
return -EINVAL;
 
-   if (unlikely(event->oncpu != smp_processor_id()))
+   if (unlikely(event->oncpu != cpu))
return -EOPNOTSUPP;
 
perf_sample_data_init(&sample_data, 0, 0);
-- 
1.9.3



Re: [PATCH] nl80211: improve nl80211_parse_mesh_config type checking

2016-06-28 Thread Johannes Berg
On Wed, 2016-06-15 at 22:29 +0200, Arnd Bergmann wrote:
> When building a kernel with W=1, the nl80211.c file causes a number
> of
> warnings, all about the same problem:
> 
> net/wireless/nl80211.c: In function 'nl80211_parse_mesh_config':
> net/wireless/nl80211.c:5287:103: error: comparison is always false
> due to limited range of data type [-Werror=type-limits]
> net/wireless/nl80211.c:5290:96: error: comparison is always false due
> to limited range of data type [-Werror=type-limits]
> net/wireless/nl80211.c:5293:124: error: comparison is always false
> due to limited range of data type [-Werror=type-limits]
> net/wireless/nl80211.c:5295:148: error: comparison is always false
> due to limited range of data type [-Werror=type-limits]
> net/wireless/nl80211.c:5298:106: error: comparison is always false
> due to limited range of data type [-Werror=type-limits]
> net/wireless/nl80211.c:5305:116: error: comparison is always false
> due to limited range of data type [-Werror=type-limits]
> 
> The problem is that gcc does not notice that the check is generated
> by a macro, so it complains about comparing an unsigned type against
> 0.
> 
> I've tried to come up with a way to rephrase that code in a way that
> avoids the warnings and otherwise improves the code as well.
> 
> This uses a set of new helper functions that perform the range
> checking,
> and should provide slightly better type safety than the older patch,
> at the expense of adding 44 lines to the code. Binary code size is
> basically unchanged though (20 bytes added to 126561 bytes .text).
> 
Applied.

johannes


[PATCH net-next 0/6] BPF helper improvements

2016-06-28 Thread Daniel Borkmann
This set adds various BPF helper improvements, that is, cleaning
up and adding BPF_F_CURRENT_CPU flag for tracing helper, allowing
for preemption checks on bpf_get_smp_processor_id() helper, and
adding two new helpers bpf_skb_change_{proto, type} for tc related
programs. For further details please see individual patches.

Note, this set requires -net to be merged into -net-next tree first.

Thanks a lot!

Daniel Borkmann (6):
  bpf: minor cleanups on fd maps and helpers
  bpf, trace: fetch current cpu only once
  bpf, trace: add BPF_F_CURRENT_CPU flag for bpf_perf_event_read
  bpf: don't use raw processor id in generic helper
  bpf: add bpf_skb_change_proto helper
  bpf: add bpf_skb_change_type helper

 include/uapi/linux/bpf.h |  25 -
 kernel/bpf/core.c|   3 +-
 kernel/bpf/helpers.c |   2 +-
 kernel/trace/bpf_trace.c |  32 +++
 net/core/filter.c| 234 ++-
 5 files changed, 275 insertions(+), 21 deletions(-)

-- 
1.9.3



[PATCH net-next 1/6] bpf: minor cleanups on fd maps and helpers

2016-06-28 Thread Daniel Borkmann
Some minor cleanups: i) Remove the unlikely() from fd array map lookups
and let the CPU branch predictor do its job, scenarios where there is not
always a map entry are very well valid. ii) Move the attribute type check
in the bpf_perf_event_read() helper a bit earlier so it's consistent wrt
checks with bpf_perf_event_output() helper as well. iii) remove some
comments that are self-documenting in kprobe_prog_is_valid_access() and
therefore make it consistent to tp_prog_is_valid_access() as well.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/core.c|  3 +--
 kernel/trace/bpf_trace.c | 18 ++
 2 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b94a365..d638062 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -719,14 +719,13 @@ select_insn:
 
if (unlikely(index >= array->map.max_entries))
goto out;
-
if (unlikely(tail_call_cnt > MAX_TAIL_CALL_CNT))
goto out;
 
tail_call_cnt++;
 
prog = READ_ONCE(array->ptrs[index]);
-   if (unlikely(!prog))
+   if (!prog)
goto out;
 
/* ARG1 at this point is guaranteed to point to CTX from
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3de25fb..4e61f74 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -199,19 +199,19 @@ static u64 bpf_perf_event_read(u64 r1, u64 index, u64 r3, 
u64 r4, u64 r5)
return -E2BIG;
 
ee = READ_ONCE(array->ptrs[index]);
-   if (unlikely(!ee))
+   if (!ee)
return -ENOENT;
 
event = ee->event;
+   if (unlikely(event->attr.type != PERF_TYPE_HARDWARE &&
+event->attr.type != PERF_TYPE_RAW))
+   return -EINVAL;
+
/* make sure event is local and doesn't have pmu::count */
if (event->oncpu != smp_processor_id() ||
event->pmu->count)
return -EINVAL;
 
-   if (unlikely(event->attr.type != PERF_TYPE_HARDWARE &&
-event->attr.type != PERF_TYPE_RAW))
-   return -EINVAL;
-
/*
 * we don't know if the function is run successfully by the
 * return value. It can be judged in other places, such as
@@ -251,7 +251,7 @@ static u64 bpf_perf_event_output(u64 r1, u64 r2, u64 flags, 
u64 r4, u64 size)
return -E2BIG;
 
ee = READ_ONCE(array->ptrs[index]);
-   if (unlikely(!ee))
+   if (!ee)
return -ENOENT;
 
event = ee->event;
@@ -354,18 +354,12 @@ static const struct bpf_func_proto 
*kprobe_prog_func_proto(enum bpf_func_id func
 static bool kprobe_prog_is_valid_access(int off, int size, enum 
bpf_access_type type,
enum bpf_reg_type *reg_type)
 {
-   /* check bounds */
if (off < 0 || off >= sizeof(struct pt_regs))
return false;
-
-   /* only read is allowed */
if (type != BPF_READ)
return false;
-
-   /* disallow misaligned access */
if (off % size != 0)
return false;
-
return true;
 }
 
-- 
1.9.3



Re: [PATCH] cfg80211/nl80211: add wifi tx power mode switching support

2016-06-28 Thread Johannes Berg
On Thu, 2016-05-12 at 17:34 +0800, Wei-Ning Huang wrote:
> 
> Johannes, I feel like being able to set calibration data at runtime
> is something common to all wireless drivers, so instead of using
> vendor commands what do you think if I pass the calibration data name
> instead of using those magic constants? This way, userspace does not
> need to know the details of what band/range power limit the driver
> supports. It allows for flexible driver side implementation and
> easier for userspace to control.
> 

Sorry - I dropped this thread accidentally.

I'm not really sure I understand the situation fully, but right now to
me this seems very strange.

The physical antennas probably don't really change between "clamshell"
and "tablet" mode, do the physical radiation properties change enough
to actually require different *calibration*? To me, that sounds very
strange.

Assuming they don't really change fundamentally, then I understand the
need to set different power levels, per band/channel/whatever
granularity. But that can be achieved in very different ways, and in
fact if you look at Chrome then for our iwl7000 driver there we do have
a command to do something similar (currently a vendor command, but that
can be changed) without ever changing the *calibration*.

So to me, the whole premise of the patch is confusing and/or wrong.

johannes


RE: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-28 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, June 28, 2016 17:34
> To: Dexuan Cui 
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; vkuzn...@redhat.com;
> cav...@redhat.com; KY Srinivasan ; Haiyang Zhang
> ; j...@perches.com
> Subject: Re: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> From: Dexuan Cui 
> Date: Fri, 24 Jun 2016 07:45:24 +
> 
> > +   while ((ret = vmalloc(size)) == NULL)
> > +   ssleep(1);
> 
> This is completely, and entirely, unacceptable.
> 
> If the allocation fails, you return an error and release
> your resources.
> 
> You don't just loop forever waiting for it to succeed.

Hi David,
I agree this is ugly...

The idea here is: IMO the syscalls sys_read()/write() shouldn't return
-ENOMEM, so I have to make sure the buffer allocation succeeds?

I tried to use kmalloc with __GFP_NOFAIL, but I hit a warning
in mm/page_alloc.c:
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));

What error code do you think I should return? 
EAGAIN, ERESTARTSYS, or something else?

May I have your suggestion? Thanks!

-- Dexuan



Re: [Bridge] [PATCH net-next] net: bridge: add support for IGMP/MLD stats and export them via netlink

2016-06-28 Thread Linus Lüssing
On Mon, Jun 27, 2016 at 08:10:48PM +0200, Nikolay Aleksandrov via Bridge wrote:
> These are invaluable when monitoring or debugging complex multicast setups
> with bridges.

Indeed! Great patch :). Especially if people are unable to provide
pcap files for debugging (due to whatever reason). Hopefully that
will help with bugzilla ticket #99081, too...

I know it might not quite fit into your current patch, which simply
stores the ICMPv6 and IGMP type in the bridge private skb->cb, but
do you think you could count and export the following two more
things, too:

* MLDv1 vs. MLDv2 querier (and IGMP accordingly)
* Number of (potential) MLD/IGMP parse errors
  (e.g. beginning of br_multicast_ipv{4,6}_rcv():
   http://lxr.free-electrons.com/source/net/bridge/br_multicast.c?v=4.5#L1588 
and
   http://lxr.free-electrons.com/source/net/bridge/br_multicast.c?v=4.5#L1634)

The former would help to know how the network is expected to
behave (for instance whether you should see MLDv2 reports at all or
whether / how much report suppression to expect).

The latter would help to spot either potential IGMP/MLD parsing bugs in
the bridge or malformed IGMP/MLD messages sent by someone else.


Ideally, there would be per port counters again for the overall
IPv4/IPv6 multicast traffic. That would help for multicast streams
for instance, to easily see whether multicast counters increase
rapidly on the ports you would expect them to. And whether snooping
is working in general for such streams, without needing to check
each port individually via tcpdump, for instance.


Just some thoughts, would love to hear what you think about them.

Regards, Linus


[PATCH 0/4] Mesh mpm fixes and enhancements

2016-06-28 Thread Yaniv Machani
This patch set addresses some issues found in the current 802.11s
implementation, specifically when using hostap mpm.
It aligns the beacon format and handles some corner cases.

Maital Hahn (2):
  mac80211: mesh: flush stations before beacons are stopped
  mac80211/cfg: mesh: fix healing time when a mesh peer is disconnecting

Meirav Kama (2):
  mac80211: mesh: fixed HT ies in beacon template
  mac80211: sta_info: max_peers reached falsely

 net/mac80211/cfg.c   |  1 +
 net/mac80211/mesh.c  | 46 --
 net/mac80211/mesh_hwmp.c | 42 +-
 net/mac80211/sta_info.c  | 14 ++
 net/mac80211/util.c  |  3 ---
 net/wireless/mesh.c  |  2 +-
 6 files changed, 81 insertions(+), 27 deletions(-)

-- 
2.9.0



[PATCH 2/4] mac80211/cfg: mesh: fix healing time when a mesh peer is disconnecting

2016-06-28 Thread Yaniv Machani
From: Maital Hahn 

On receiving a CLOSE action frame from the disconnecting peer,
flush all entries in the path table which have this peer as the
next hop.

In addition, upon receiving a packet, if the next hop is not found,
trigger a PREQ immediately, instead of just putting it in the queue.

Signed-off-by: Maital Hahn 
Acked-by: Yaniv Machani 
---
 net/mac80211/cfg.c   |  1 +
 net/mac80211/mesh.c  |  3 ++-
 net/mac80211/mesh_hwmp.c | 42 +-
 3 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
index 0c12e40..f876ef7 100644
--- a/net/mac80211/cfg.c
+++ b/net/mac80211/cfg.c
@@ -1011,6 +1011,7 @@ static void sta_apply_mesh_params(struct ieee80211_local *local,
if (sta->mesh->plink_state == NL80211_PLINK_ESTAB)
changed = mesh_plink_dec_estab_count(sdata);
sta->mesh->plink_state = params->plink_state;
+   mesh_path_flush_by_nexthop(sta);
 
ieee80211_mps_sta_status_update(sta);
changed |= ieee80211_mps_set_sta_local_pm(sta,
diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c
index 9214bc1..1f5be54 100644
--- a/net/mac80211/mesh.c
+++ b/net/mac80211/mesh.c
@@ -159,7 +159,8 @@ void mesh_sta_cleanup(struct sta_info *sta)
if (!sdata->u.mesh.user_mpm) {
changed |= mesh_plink_deactivate(sta);
del_timer_sync(&sta->mesh->plink_timer);
-   }
+   } else
+   mesh_path_flush_by_nexthop(sta);
 
/* make sure no readers can access nexthop sta from here on */
mesh_path_flush_by_nexthop(sta);
diff --git a/net/mac80211/mesh_hwmp.c b/net/mac80211/mesh_hwmp.c
index 8f9c3bd..9783d49 100644
--- a/net/mac80211/mesh_hwmp.c
+++ b/net/mac80211/mesh_hwmp.c
@@ -19,7 +19,7 @@
 
 #define MAX_PREQ_QUEUE_LEN 64
 
-static void mesh_queue_preq(struct mesh_path *, u8);
+static void mesh_queue_preq(struct mesh_path *, u8, bool);
 
 static inline u32 u32_field_get(const u8 *preq_elem, int offset, bool ae)
 {
@@ -830,7 +830,8 @@ static void hwmp_rann_frame_process(struct ieee80211_sub_if_data *sdata,
mhwmp_dbg(sdata,
  "time to refresh root mpath %pM\n",
  orig_addr);
-   mesh_queue_preq(mpath, PREQ_Q_F_START | PREQ_Q_F_REFRESH);
+   mesh_queue_preq(mpath, PREQ_Q_F_START | PREQ_Q_F_REFRESH,
+   false);
mpath->last_preq_to_root = jiffies;
}
 
@@ -925,7 +926,7 @@ void mesh_rx_path_sel_frame(struct ieee80211_sub_if_data *sdata,
  * Locking: the function must be called from within a rcu read lock block.
  *
  */
-static void mesh_queue_preq(struct mesh_path *mpath, u8 flags)
+static void mesh_queue_preq(struct mesh_path *mpath, u8 flags, bool immediate)
 {
struct ieee80211_sub_if_data *sdata = mpath->sdata;
struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh;
@@ -964,18 +965,24 @@ static void mesh_queue_preq(struct mesh_path *mpath, u8 flags)
++ifmsh->preq_queue_len;
spin_unlock_bh(&ifmsh->mesh_preq_queue_lock);
 
-   if (time_after(jiffies, ifmsh->last_preq + min_preq_int_jiff(sdata)))
+   if (immediate) {
ieee80211_queue_work(&sdata->local->hw, &sdata->work);
+   } else {
+   if (time_after(jiffies,
+  ifmsh->last_preq + min_preq_int_jiff(sdata))) {
+   ieee80211_queue_work(&sdata->local->hw, &sdata->work);
 
-   else if (time_before(jiffies, ifmsh->last_preq)) {
-   /* avoid long wait if did not send preqs for a long time
-* and jiffies wrapped around
-*/
-   ifmsh->last_preq = jiffies - min_preq_int_jiff(sdata) - 1;
-   ieee80211_queue_work(&sdata->local->hw, &sdata->work);
-   } else
-   mod_timer(&ifmsh->mesh_path_timer, ifmsh->last_preq +
-   min_preq_int_jiff(sdata));
+   } else if (time_before(jiffies, ifmsh->last_preq)) {
+   /* avoid long wait if did not send preqs for a long time
+* and jiffies wrapped around
+*/
+   ifmsh->last_preq = jiffies -
+  min_preq_int_jiff(sdata) - 1;
+   ieee80211_queue_work(&sdata->local->hw, &sdata->work);
+   } else
+   mod_timer(&ifmsh->mesh_path_timer, ifmsh->last_preq +
+ min_preq_int_jiff(sdata));
+   }
 }
 
 /**
@@ -1110,7 +1117,7 @@ int mesh_nexthop_resolve(struct ieee80211_sub_if_data *sdata,
}
 
if (!(mpath->flags & MESH_PATH_RESOLVING))
-   mesh_queue_preq(mpath, PREQ_Q_F_START);
+   mesh_queue_preq(mpath, PREQ_Q_F_START, true);
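
The scheduling decision this patch gives mesh_queue_preq() can be
sketched outside the kernel. The sketch below is an illustration under
assumptions, not the mac80211 code: a free-running 32-bit tick counter
stands in for jiffies, tick_after() is a hypothetical stand-in for the
wraparound-safe comparison the kernel performs with time_after(), and
the enum and function names are invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is later than b" for a free-running 32-bit counter. */
static bool tick_after(uint32_t a, uint32_t b)
{
	return (int32_t)(b - a) < 0;
}

enum preq_action {
	PREQ_SEND_NOW,		/* queue the work item right away */
	PREQ_SEND_NOW_RESET,	/* counter wrapped: reset last_preq, then queue */
	PREQ_ARM_TIMER,		/* too soon: wait for last_preq + min_interval */
};

/*
 * Decide what to do with a newly queued PREQ.
 * "immediate" bypasses the minimum-interval check, as the new bool
 * argument does in the patch.
 */
enum preq_action preq_decide(uint32_t now, uint32_t last_preq,
			     uint32_t min_interval, bool immediate)
{
	/* The signed-difference trick only works for short intervals. */
	assert(min_interval < INT32_MAX);

	if (immediate)
		return PREQ_SEND_NOW;
	if (tick_after(now, last_preq + min_interval))
		return PREQ_SEND_NOW;
	if (tick_after(last_preq, now))	/* i.e. now is before last_preq */
		return PREQ_SEND_NOW_RESET;
	return PREQ_ARM_TIMER;
}
```

Because the comparison is done on the signed difference, a last_preq
value taken shortly before the counter wraps still compares correctly
against a "now" taken shortly after the wrap.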
 
 

[PATCH 3/4] mac80211: mesh: fixed HT ies in beacon template

2016-06-28 Thread Yaniv Machani
From: Meirav Kama 

Several values in the HT info elements of the mesh beacon (built by
mac80211) are incorrect.
To fix them:
1. mac80211 checks the configuration from cfg and builds the elements
   accordingly.
2. The mesh default values are changed.

Signed-off-by: Meirav Kama 
Acked-by: Yaniv Machani 
---
 net/mac80211/mesh.c | 33 -
 net/mac80211/util.c |  3 ---
 net/wireless/mesh.c |  2 +-
 3 files changed, 33 insertions(+), 5 deletions(-)

diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c
index 1f5be54..1b63b11 100644
--- a/net/mac80211/mesh.c
+++ b/net/mac80211/mesh.c
@@ -423,6 +423,8 @@ int mesh_add_ht_cap_ie(struct ieee80211_sub_if_data *sdata,
enum nl80211_band band = ieee80211_get_sdata_band(sdata);
struct ieee80211_supported_band *sband;
u8 *pos;
+   u16 cap;
+
 
sband = local->hw.wiphy->bands[band];
if (!sband->ht_cap.ht_supported ||
@@ -431,11 +433,40 @@ int mesh_add_ht_cap_ie(struct ieee80211_sub_if_data *sdata,
sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_10)
return 0;
 
+/* determine capability flags */
+   cap = sband->ht_cap.cap;
+
+/* if channel width is 20MHz - configure HT capab accordingly*/
+   if (sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_20) {
+   cap &= ~IEEE80211_HT_CAP_SUP_WIDTH_20_40;
+   cap &= ~IEEE80211_HT_CAP_DSSSCCK40;
+   }
+
+   /* set SM PS mode properly */
+   cap &= ~IEEE80211_HT_CAP_SM_PS;
+   switch (sdata->smps_mode) {
+   case IEEE80211_SMPS_AUTOMATIC:
+   case IEEE80211_SMPS_NUM_MODES:
+   WARN_ON(1);
+   case IEEE80211_SMPS_OFF:
+   cap |= WLAN_HT_CAP_SM_PS_DISABLED <<
+   IEEE80211_HT_CAP_SM_PS_SHIFT;
+   break;
+   case IEEE80211_SMPS_STATIC:
+   cap |= WLAN_HT_CAP_SM_PS_STATIC <<
+   IEEE80211_HT_CAP_SM_PS_SHIFT;
+   break;
+   case IEEE80211_SMPS_DYNAMIC:
+   cap |= WLAN_HT_CAP_SM_PS_DYNAMIC <<
+   IEEE80211_HT_CAP_SM_PS_SHIFT;
+   break;
+   }
+
if (skb_tailroom(skb) < 2 + sizeof(struct ieee80211_ht_cap))
return -ENOMEM;
 
pos = skb_put(skb, 2 + sizeof(struct ieee80211_ht_cap));
-   ieee80211_ie_build_ht_cap(pos, &sband->ht_cap, sband->ht_cap.cap);
+   ieee80211_ie_build_ht_cap(pos, &sband->ht_cap, cap);
 
return 0;
 }
diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 42bf0b6..5375a82 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -2349,10 +2349,7 @@ u8 *ieee80211_ie_build_ht_oper(u8 *pos, struct ieee80211_sta_ht_cap *ht_cap,
ht_oper->operation_mode = cpu_to_le16(prot_mode);
ht_oper->stbc_param = 0x0000;
 
-   /* It seems that Basic MCS set and Supported MCS set
-  are identical for the first 10 bytes */
memset(&ht_oper->basic_set, 0, 16);
-   memcpy(&ht_oper->basic_set, &ht_cap->mcs, 10);
 
return pos + sizeof(struct ieee80211_ht_operation);
 }
diff --git a/net/wireless/mesh.c b/net/wireless/mesh.c
index fa2066b..ac19a19 100644
--- a/net/wireless/mesh.c
+++ b/net/wireless/mesh.c
@@ -70,7 +70,7 @@ const struct mesh_config default_mesh_config = {
.dot11MeshGateAnnouncementProtocol = false,
.dot11MeshForwarding = true,
.rssi_threshold = MESH_RSSI_THRESHOLD,
-   .ht_opmode = IEEE80211_HT_OP_MODE_PROTECTION_NONHT_MIXED,
+   .ht_opmode = IEEE80211_HT_OP_MODE_PROTECTION_NONE,
.dot11MeshHWMPactivePathToRootTimeout = MESH_PATH_TO_ROOT_TIMEOUT,
.dot11MeshHWMProotInterval = MESH_ROOT_INTERVAL,
.dot11MeshHWMPconfirmationInterval = MESH_ROOT_CONFIRMATION_INTERVAL,
-- 
2.9.0
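
The capability-word handling added to mesh_add_ht_cap_ie() is a
clear-then-set operation on a bit field, which can be sketched on its
own. The numeric values below are what I believe the corresponding
IEEE80211_HT_CAP_* and WLAN_HT_CAP_SM_PS_* definitions in
include/linux/ieee80211.h to be; treat them as assumptions to check
against the header. The HT_CAP_*/SM_PS_* names and mesh_ht_cap() are
hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror include/linux/ieee80211.h; verify before relying on them. */
#define HT_CAP_SUP_WIDTH_20_40	0x0002
#define HT_CAP_SM_PS		0x000C
#define HT_CAP_SM_PS_SHIFT	2
#define HT_CAP_DSSSCCK40	0x1000

/* Values of the two-bit SM power save field (assumed encoding). */
enum sm_ps {
	SM_PS_STATIC = 0,
	SM_PS_DYNAMIC = 1,
	SM_PS_DISABLED = 3,
};

/*
 * Derive the advertised HT capability word from the hardware one:
 * drop the 40 MHz related bits on a 20 MHz channel, then replace the
 * SM power save field with the configured mode.
 */
uint16_t mesh_ht_cap(uint16_t hw_cap, int is_20mhz, enum sm_ps mode)
{
	uint16_t cap = hw_cap;

	assert(mode == SM_PS_STATIC || mode == SM_PS_DYNAMIC ||
	       mode == SM_PS_DISABLED);

	if (is_20mhz)
		cap &= (uint16_t)~(HT_CAP_SUP_WIDTH_20_40 | HT_CAP_DSSSCCK40);

	/* Clear the whole field first so stale bits cannot survive. */
	cap &= (uint16_t)~HT_CAP_SM_PS;
	cap |= (uint16_t)(mode << HT_CAP_SM_PS_SHIFT);

	return cap;
}
```

Clearing the field before OR-ing in the new mode matters because the
hardware default may already have bits set in it.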



[PATCH 4/4] mac80211: sta_info: max_peers reached falsely

2016-06-28 Thread Yaniv Machani
From: Meirav Kama 

The issue happened when a delete_sta command was received without
plink_state first having been changed from ESTAB to HOLDING.
When a delete_sta command is received for a mesh interface,
check whether plink_state is still ESTAB and, if so, decrease
the plink count and update the beacon.

Signed-off-by: Meirav Kama 
Acked-by: Yaniv Machani 
---
 net/mac80211/sta_info.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 76b737d..1ce6320 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -1009,11 +1009,25 @@ int sta_info_destroy_addr_bss(struct ieee80211_sub_if_data *sdata,
 {
struct sta_info *sta;
int ret;
+#ifdef CONFIG_MAC80211_MESH
+   bool dec_links = false;
+#endif
 
mutex_lock(&sdata->local->sta_mtx);
sta = sta_info_get_bss(sdata, addr);
+#ifdef CONFIG_MAC80211_MESH
+   if (sdata->vif.type == NL80211_IFTYPE_MESH_POINT &&
+   sta->mesh->plink_state == NL80211_PLINK_ESTAB)
+   dec_links = true;
+#endif
ret = __sta_info_destroy(sta);
mutex_unlock(&sdata->local->sta_mtx);
+#ifdef CONFIG_MAC80211_MESH
+   if (dec_links) {
+   mesh_plink_dec_estab_count(sdata);
+   ieee80211_mbss_info_change_notify(sdata, BSS_CHANGED_BEACON);
+   }
+#endif
 
return ret;
 }
-- 
2.9.0



[PATCH 1/4] mac80211: mesh: flush stations before beacons are stopped

2016-06-28 Thread Yaniv Machani
From: Maital Hahn 

Some drivers (e.g. wl18xx) expect that the last stage in the
de-initialization process will be stopping the beacons, similar to ap.
Update ieee80211_stop_mesh() flow accordingly.

Signed-off-by: Maital Hahn 
Acked-by: Yaniv Machani 
---
 net/mac80211/mesh.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/mac80211/mesh.c b/net/mac80211/mesh.c
index 21b1fdf..9214bc1 100644
--- a/net/mac80211/mesh.c
+++ b/net/mac80211/mesh.c
@@ -896,20 +896,22 @@ void ieee80211_stop_mesh(struct ieee80211_sub_if_data *sdata)
 
netif_carrier_off(sdata->dev);
 
+   /* flush STAs and mpaths on this iface */
+   sta_info_flush(sdata);
+   mesh_path_flush_by_iface(sdata);
+
/* stop the beacon */
ifmsh->mesh_id_len = 0;
sdata->vif.bss_conf.enable_beacon = false;
clear_bit(SDATA_STATE_OFFCHANNEL_BEACON_STOPPED, &sdata->state);
ieee80211_bss_info_change_notify(sdata, BSS_CHANGED_BEACON_ENABLED);
+
+   /* remove beacon */
bcn = rcu_dereference_protected(ifmsh->beacon,
lockdep_is_held(&sdata->wdev.mtx));
RCU_INIT_POINTER(ifmsh->beacon, NULL);
kfree_rcu(bcn, rcu_head);
 
-   /* flush STAs and mpaths on this iface */
-   sta_info_flush(sdata);
-   mesh_path_flush_by_iface(sdata);
-
/* free all potentially still buffered group-addressed frames */
local->total_ps_buffered -= skb_queue_len(&ifmsh->ps.bc_buf);
skb_queue_purge(&ifmsh->ps.bc_buf);
-- 
2.9.0



[PATCH] mac80211: util: mesh is not connected properly after recovery

2016-06-28 Thread Yaniv Machani
From: Maital Hahn 

In the reconfigure process for a mesh interface, move the reconfiguration
of the mesh peers so that it is done only after the beacons are restarted,
the same as is done for AP.

Signed-off-by: Maital Hahn 
Acked-by: Yaniv Machani 
---
 net/mac80211/util.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/mac80211/util.c b/net/mac80211/util.c
index 5375a82..2431684 100644
--- a/net/mac80211/util.c
+++ b/net/mac80211/util.c
@@ -1910,6 +1910,7 @@ int ieee80211_reconfig(struct ieee80211_local *local)
ieee80211_reconfig_stations(sdata);
/* fall through */
case NL80211_IFTYPE_AP: /* AP stations are handled later */
+   case NL80211_IFTYPE_MESH_POINT: /* MP peers are handled later */
for (i = 0; i < IEEE80211_NUM_ACS; i++)
drv_conf_tx(local, sdata, i,
&sdata->tx_conf[i]);
@@ -2013,7 +2014,8 @@ int ieee80211_reconfig(struct ieee80211_local *local)
if (!sta->uploaded)
continue;
 
-   if (sta->sdata->vif.type != NL80211_IFTYPE_AP)
+   if ((sta->sdata->vif.type != NL80211_IFTYPE_AP) &&
+   (sta->sdata->vif.type != NL80211_IFTYPE_MESH_POINT))
continue;
 
for (state = IEEE80211_STA_NOTEXIST;
-- 
2.9.0



[PATCH] mac80211: rx: frames received out of order

2016-06-28 Thread Yaniv Machani
From: Meirav Kama 

An MP receives data frames from another MP. The frames are forwarded
from Rx to Tx to be transmitted to a third MP.
Upon cloning the skb, the tx_info was zeroed and the
hw_queue wasn't set correctly, causing frames to be
inserted into queue 0 (VOICE). If a re-queue occurred for some
reason, the frame was inserted into the correct queue 2 (BE).
In that case frames were dequeued from 2 different queues and
sent out of order.

Signed-off-by: Meirav Kama 
Acked-by: Yaniv Machani 
---
 net/mac80211/rx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 9a1eb70..88dc744 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2392,6 +2392,7 @@ ieee80211_rx_h_mesh_fwding(struct ieee80211_rx_data *rx)
info->flags |= IEEE80211_TX_INTFL_NEED_TXPROCESSING;
info->control.vif = &rx->sdata->vif;
info->control.jiffies = jiffies;
+   info->hw_queue = q;
if (is_multicast_ether_addr(fwd_hdr->addr1)) {
IEEE80211_IFSTA_MESH_CTR_INC(ifmsh, fwded_mcast);
memcpy(fwd_hdr->addr2, sdata->vif.addr, ETH_ALEN);
-- 
2.9.0



Re: [PATCH net] bonding: fix 802.3ad aggregator reselection

2016-06-28 Thread Veli-Matti Lintu
2016-06-24 0:20 GMT+03:00 Jay Vosburgh :
>
> Since commit 7bb11dc9f59d ("bonding: unify all places where
> actor-oper key needs to be updated."), the logic in bonding to handle
> selection between multiple aggregators has not functioned.
>
> This affects only configurations wherein the bonding slaves
> connect to two discrete aggregators (e.g., two independent switches, each
> with LACP enabled), thus creating two separate aggregation groups within a
> single bond.
>
> The cause is a change in 7bb11dc9f59d to no longer set
> AD_PORT_BEGIN on a port after a link state change, which would cause the
> port to be reselected for attachment to an aggregator as if were newly
> added to the bond.  We cannot restore the prior behavior, as it
> contradicts IEEE 802.1AX 5.4.12, which requires ports that "become
> inoperable" (lose carrier, setting port_enabled=false as per 802.1AX
> 5.4.7) to remain selected (i.e., assigned to the aggregator).  As the port
> now remains selected, the aggregator selection logic is not invoked.
>
> A side effect of this change is that aggregators in bonding will
> now contain ports that are link down.  The aggregator selection logic
> does not currently handle this situation correctly, causing incorrect
> aggregator selection.
>
> This patch makes two changes to repair the aggregator selection
> logic in bonding to function as documented and within the confines of the
> standard:
>
> First, the aggregator selection and related logic now utilizes the
> number of active ports per aggregator, not the number of selected ports
> (as some selected ports may be down).  The ad_select "bandwidth" and
> "count" options only consider ports that are link up.
>
> Second, on any carrier state change of any slave, the aggregator
> selection logic is explicitly called to insure the correct aggregator is
> active.
>
> Reported-by: Veli-Matti Lintu 
> Fixes: 7bb11dc9f59d ("bonding: unify all places where actor-oper key needs to 
> be updated.")
> Signed-off-by: Jay Vosburgh 


Hi,

Thanks for the patch. I have now been testing it and the reselection
seems to be working now in most cases, but I hit one case that seems
to consistently fail in my test environment.

I've been doing most of testing with ad_select=count and this happens
with it. I haven't yet done extensive testing with
ad_select=stable/bandwidth.

The sequence to trigger the failure seems to be:

      Switch A (Agg ID 2)          Switch B (Agg ID 1)
enp5s0f0  ens5f0  ens6f0     enp5s0f1  ens5f1  ens6f1
   X        X       -           X        -       -     Connection works (Agg ID 2 active)
   X        -       -           X        -       -     Connection works (Agg ID 1 active)
   X        -       -           -        -       -     No connection (Agg ID 2 active)

I'm also wondering why a link down event causes a change of aggregator
when the active aggregator has the same number of active links as
the new aggregator.

The situation here clears once a port comes up. I also once hit
problems without disabling all ports on the active switch:

      Switch A (Agg ID 2)          Switch B (Agg ID 1)
enp5s0f0  ens5f0  ens6f0     enp5s0f1  ens5f1  ens6f1
   X        X       -           X        -       -     Connection works (Agg ID 2 active)
   X        -       -           X        -       -     No connection (Agg ID 1 active)

The active switch does not seem to matter either when disabling ports:

      Switch A (Agg ID 2)          Switch B (Agg ID 1)
enp5s0f0  ens5f0  ens6f0     enp5s0f1  ens5f1  ens6f1
   X        -       -           X        X       X     Connection works (Agg ID 1 active)
   X        -       -           X        -       -     Connection works (Agg ID 2 active)
   -        -       -           X        -       -     No connection (Agg ID 1 active)

All testing was done with upstream version 4.6.2 with the patch
applied. When there is no connection, /proc/net/bonding/bond0 still
shows that there is an active aggregator that has a link up, but for
some reason no traffic passes through. I added some debugging
information in bond_procfs.c and the number of active ports seems to
match the enabled ports on switches.

I'll continue doing tests with different scenarios and I can also test
specific cases if needed.


Veli-Matti
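
For reference, the "count" rule from the quoted commit message (compare
aggregators by link-up ports, not by selected ports) can be sketched as
below. This is not the bonding driver's code: struct agg and
agg_select_by_count() are hypothetical, and keeping the current
aggregator on a tie is an assumption that reflects the question raised
above, not confirmed driver behaviour.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-aggregator summary. */
struct agg {
	int id;
	int selected_ports;	/* ports assigned to the aggregator */
	int active_ports;	/* of those, ports with link up */
};

/*
 * Return the index of the aggregator that should be active: the one
 * with the most link-up ports. The current aggregator keeps its place
 * on a tie. Returns -1 if no aggregator has a link-up port.
 * "current" is an index into aggs, or -1 if none is active yet.
 */
int agg_select_by_count(const struct agg *aggs, size_t n, int current)
{
	int best = -1;
	size_t i;

	assert(aggs != NULL || n == 0);

	/* Start from the current aggregator if it is still usable. */
	if (current >= 0 && (size_t)current < n &&
	    aggs[current].active_ports > 0)
		best = current;

	for (i = 0; i < n; i++) {
		int best_count = best >= 0 ? aggs[best].active_ports : 0;

		/* Strictly greater: equal counts never displace "best". */
		if (aggs[i].active_ports > best_count)
			best = (int)i;
	}
	return best;
}
```

With the first sequence above (Switch A as index 0, Switch B as index 1),
this rule would keep Switch A active through all three steps, since
Switch B never has strictly more link-up ports.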


Re: [PATCH net-next] tcp: increase size at which tcp_bound_to_half_wnd bounds to > TCP_MSS_DEFAULT

2016-06-28 Thread Eric Dumazet
On Tue, 2016-06-28 at 04:33 +, Seymour, Shane M wrote:
> In previous commit 01f83d69844d307be2aa6fea88b0e8fe5cbdb2f4
> the following comments were added:
> 
> "When peer uses tiny windows, there is no use in packetizing to sub-MSS
> pieces for the sake of SWS or making sure there are enough packets in
> the pipe for fast recovery."
> 
> The test should be > TCP_MSS_DEFAULT not >= 512. This allows low end
> devices that send an MSS of 536 (TCP_MSS_DEFAULT) to see better network
> performance by sending it 536 bytes of data at a time instead of bounding
> to half window size (268). Other network stacks work this way, e.g. HP-UX.


Trying to cope with ridiculous windows these days is really a waste of
time, as we perform this check for all tcp sendmsg() calls :(

Anyway, your patch is reversed.
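
To make the two conditions under discussion concrete, here is a
simplified sketch written from the description in the quoted commit
message. It is not the kernel's tcp_bound_to_half_wnd(): the function
name, the "proposed" switch and the MSS_DEFAULT macro are hypothetical,
and any minimum-size clamp the real function applies is left out.

```c
#include <assert.h>

#define MSS_DEFAULT 536	/* value the thread gives for TCP_MSS_DEFAULT */

/*
 * Bound a packet size by the peer's largest advertised window.
 * proposed == 0: halve the window once it is >= 512 (existing test).
 * proposed != 0: halve it only once it is > MSS_DEFAULT (suggested test).
 */
int bound_to_half_wnd(int max_window, int pktsize, int proposed)
{
	int cutoff;

	assert(max_window >= 0 && pktsize >= 0);

	if (proposed ? max_window > MSS_DEFAULT : max_window >= 512)
		cutoff = max_window >> 1;
	else
		cutoff = max_window;

	/* A zero cutoff means the window is unknown: leave pktsize alone. */
	if (cutoff && pktsize > cutoff)
		return cutoff;
	return pktsize;
}
```

For a peer whose maximum window is 536, the existing test bounds a
536-byte segment to 268 bytes, while the suggested test lets the full
536 bytes through; for ordinary large windows the two behave the same.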





Re: [PATCH net-next 08/16] net/devlink: Add E-Switch mode control

2016-06-28 Thread Or Gerlitz

On 6/28/2016 8:57 AM, John Fastabend wrote:
> On 16-06-27 09:07 AM, Saeed Mahameed wrote:
>> Add the commands to set and show the mode of SRIOV E-Switch, two modes are
>> supported:
>>
>> * legacy   : operating in the "old" L2 based mode (DMAC --> VF vport)
>> * offloads : offloading SW rules/policy (e.g Bridge/FDB or TC/Flows based)
>>   set by the host OS
>
> Nice work overall also I really appreciated that the core networking
> interfaces appear to able to support this without any change.

thanks..

> On this patch though do we really need modes like this? My concern with
> modes is two fold. One its another knob that some controller will have
> to get right which I would prefer to avoid. And two I suspect switching
> between the two modes flushes the tables or leaves them in some
> unexpected state? At least I can't figure out what the expected should
> be off-hand.

Re the 1st concern (another knob), I think we do want that, see below.

Re the 2nd concern, I will re-read the cover letter and change logs and,
if needed, clarify/improve: the transition is clean! When you are moving
from legacy to offloads or the other way around, nothing is left in an
unexpected state: all HW forwarding tables filled by the current mode
are flushed, and then they are set up as needed for the new mode.

> Could we instead continue to use the "legacy" mode by default by just
> populating the fdb table correctly and then if users want to enable
> the "offloads" mode they can modify the fdb tables by deleting entries
> or adding them or just extending the dmac/vf mapping via 'tc'. This
> would seem natural to me. The flooding rules in fdb might need to be
> exposed a bit more cleanly to get the right default flooding behavior
> etc. But to me at least this would be much cleaner. Everything will be
> nicely defined and we wont have issues with drivers doing slightly
> and subtle different defaults between legacy/offload and the transitions
> between the states or on resets or etc. If users need to discover the
> current configuration then they just query fdb, query tc, and the state
> is known no need for any magic toggle switch as best I can see.

A few comments here:

In each mode the driver sets up the HW tables, and populates them, in
its own way.

The offloads mode needs to create a black hole miss rule and
send-to-vport rules, and to create the tables so that they can later
hold rules set by the kernel, in a way which is HW/driver dependent.

The legacy mode creates the tables differently and populates them later
with rules set by the driver and not the kernel.

Even if we put the different table setup issue aside, I don't think it
would be correct for bridge/tc to remove rules they didn't add, which is
needed under your proposal when moving from legacy type rules to
offloads mode. Querying is problematic too, since legacy could (and
does) involve some default rules set by the FW, e.g. rules that deal
with outer world MACs (== not belonging to a VM on this host), which are
invisible to the driver.

Legacy mode was here first and we can't avoid handling it properly, for
which this knob is needed. Note that a vendor can choose to put their
default to be offloads; hopefully over time we will all go there :)

> Otherwise I didn't review the mlx code but read the commit msgs and
> it looks good. I'll take a closer look in the morning.

appreciated



Re: [Bridge] [PATCH net-next] net: bridge: add support for IGMP/MLD stats and export them via netlink

2016-06-28 Thread Nikolay Aleksandrov
On 28/06/16 13:03, Linus Lüssing wrote:
> On Mon, Jun 27, 2016 at 08:10:48PM +0200, Nikolay Aleksandrov via Bridge 
> wrote:
>> These are invaluable when monitoring or debugging complex multicast setups
>> with bridges.
> 
> Indeed! Great patch :). Especially if people are unable to provide
> pcap files for debugging (due to whatever reason). Hopefully that
> will help with bugzilla ticket #99081, too...
> 
> I know it might not quite fit into your current patch, which simply
> stores the ICMPv6 and IGMP type in the bridge private skb->cb, but
> do you think you could count and export the following two more
> things, too:
> 
> * MLDv1 vs. MLDv2 querier (and IGMP accordingly)
> * Number of (potential) MLD/IGMP parse errors
>   (e.g. beginning of br_multicast_ipv{4,6}_rcv():
>http://lxr.free-electrons.com/source/net/bridge/br_multicast.c?v=4.5#L1588 
> and
>http://lxr.free-electrons.com/source/net/bridge/br_multicast.c?v=4.5#L1634)
> 
> The former would help to know how the network is expected to
> behave (for instance whether you should see MLDv2 reports at all or
> whether / how much report suppression to expect).
> 
> The latter would help to spot either potential IGMP/MLD parsing bugs in
> the bridge or malformed IGMP/MLD messages send by someone else.
> 
> 
> Ideally, there would be per port counters again for the overall
> IPv4/IPv6 multicast traffic. That would help for multicast streams
> for instance, to easily see whether multicast counters increase
> rapidly on the ports you would expect them to. And whether snooping
> is working in general for such streames, without needing to check
> each port individually via tcpdump, for instance.
> 
> 
> Just some thoughts, would love to hear what you think about them.
> 
> Regards, Linus
> 

Hi Linus,
I think these are all reasonable and helpful things to export in addition.
I will definitely look into extending the stats with them. If this patch
is accepted as-is I'll just do it as a follow-up.

Thanks for the good suggestions!

Cheers,
 Nik



Re: [PATCHv2, 2/7] ppc: bpf/jit: Fix/enhance 32-bit Load Immediate implementation

2016-06-28 Thread Michael Ellerman
On Wed, 2016-22-06 at 16:25:02 UTC, "Naveen N. Rao" wrote:
> The existing LI32() macro can sometimes result in a sign-extended 32-bit
> load that does not clear the top 32-bits properly. As an example,
> loading 0x7fffffff results in the register containing
> 0xffffffff7fffffff. While this does not impact classic BPF JIT
> implementation (since that only uses the lower word for all operations),
> we would like to share this macro between classic BPF JIT and extended
> BPF JIT, wherein the entire 64-bit value in the register matters. Fix
> this by first doing a shifted LI followed by ORI.
> 
> An additional optimization is with loading values between -32768 to -1,
> where we now only need a single LI.
> 
> The new implementation now generates the same or less number of
> instructions.
> 
> Acked-by: Alexei Starovoitov 
> Signed-off-by: Naveen N. Rao 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/aaf2f7e09932a08c1287d8e4c6

cheers
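
The effect described in the quoted commit message can be reproduced
with a small model of the two instruction sequences. This is a sketch
under assumptions, not the JIT code: the old sequence is modelled as
li followed by addis, which is my reading of the pre-patch macro, the
new one as a single li for small values or lis followed by ori, and the
instruction semantics are reduced to 64-bit integer arithmetic. All
function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v to 64 bits. */
static int64_t sext16(uint32_t v)
{
	return (int64_t)(int16_t)(uint16_t)v;
}

/* li rD, imm: load a sign-extended 16-bit immediate. */
static int64_t li(uint32_t imm)
{
	return sext16(imm);
}

/* lis rD, imm: load a sign-extended immediate shifted left by 16. */
static int64_t lis(uint32_t imm)
{
	return sext16(imm) * 65536;
}

/* addis rD, rA, imm: add a sign-extended immediate shifted left by 16. */
static int64_t addis(int64_t ra, uint32_t imm)
{
	return ra + sext16(imm) * 65536;
}

/* ori rD, rA, imm: OR in a zero-extended 16-bit immediate. */
static int64_t ori(int64_t ra, uint32_t imm)
{
	assert(imm <= 0xffff);
	return (int64_t)((uint64_t)ra | imm);
}

/* Assumed old sequence: li low half, then addis the adjusted high half. */
uint64_t li32_old(int32_t i)
{
	uint32_t u = (uint32_t)i;
	int64_t r = li(u & 0xffff);

	if (u >= 32768)
		r = addis(r, ((u >> 16) + ((u & 0x8000) >> 15)) & 0xffff);
	return (uint64_t)r;
}

/* New sequence: one li for -32768..32767, otherwise lis plus optional ori. */
uint64_t li32_new(int32_t i)
{
	uint32_t u = (uint32_t)i;
	int64_t r;

	if (i >= -32768 && i < 32768)
		return (uint64_t)li(u & 0xffff);

	r = lis(u >> 16);
	if (u & 0xffff)
		r = ori(r, u & 0xffff);
	return (uint64_t)r;
}
```

Under these assumptions li32_old(0x7fffffff) leaves the upper word set,
while li32_new(0x7fffffff) produces a clean 64-bit value, and values in
the range -32768 to -1 need only the single li.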


[PATCH net] openvswitch: fix conntrack netlink event delivery

2016-06-28 Thread Samuel Gauthier
Only the first and last netlink message for a particular conntrack are
actually sent. The first message is sent through nf_conntrack_confirm when
the conntrack is committed. The last one is sent when the conntrack is
destroyed on timeout. The other conntrack state change messages are not
advertised.

When the conntrack subsystem is used from netfilter, nf_conntrack_confirm
is called for each packet, from the postrouting hook, which in turn calls
nf_ct_deliver_cached_events to send the state change netlink messages.

This commit fixes the problem by calling nf_conntrack_confirm all the time,
i.e. not only in the commit case.

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
CC: Joe Stringer 
CC: Justin Pettit 
CC: Andy Zhou 
CC: Thomas Graf 
Signed-off-by: Samuel Gauthier 
---
This patch was tested against the net tree, checking the notifications with
conntrack -E.

David, this patch conflicts with the patch 7d904c7bcd51 ("openvswitch: Only
set mark and labels with a commit flag.") from net-next. I can help resolve
the conflict if needed.

 net/openvswitch/conntrack.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/openvswitch/conntrack.c b/net/openvswitch/conntrack.c
index 3d5feede962d..4ea97f1c3861 100644
--- a/net/openvswitch/conntrack.c
+++ b/net/openvswitch/conntrack.c
@@ -834,9 +834,6 @@ static int ovs_ct_commit(struct net *net, struct sw_flow_key *key,
err = __ovs_ct_lookup(net, key, info, skb);
if (err)
return err;
-   /* This is a no-op if the connection has already been confirmed. */
-   if (nf_conntrack_confirm(skb) != NF_ACCEPT)
-   return -EINVAL;
 
return 0;
 }
@@ -888,6 +885,11 @@ int ovs_ct_execute(struct net *net, struct sk_buff *skb,
if (labels_nonzero(&info->labels.mask))
err = ovs_ct_set_labels(skb, key, &info->labels.value,
&info->labels.mask);
+
+   /* This is a no-op if the connection has already been confirmed. */
+   if (nf_conntrack_confirm(skb) != NF_ACCEPT)
+   return -EINVAL;
+
 err:
skb_push(skb, nh_ofs);
if (err)
-- 
2.2.1.62.g3f15098



Re: [PATCH net-next 08/16] net/devlink: Add E-Switch mode control

2016-06-28 Thread Jiri Pirko
Mon, Jun 27, 2016 at 06:07:21PM CEST, sae...@mellanox.com wrote:
>From: Or Gerlitz 
>
>Add the commands to set and show the mode of SRIOV E-Switch,
>two modes are supported:
>
>* legacy   : operating in the "old" L2 based mode (DMAC --> VF vport)
>* offloads : offloading SW rules/policy (e.g Bridge/FDB or TC/Flows based) set 
>by the host OS
>
>Signed-off-by: Or Gerlitz 
>Signed-off-by: Saeed Mahameed 

Acked-by: Jiri Pirko 

Looks fine to me. Usable for many drivers of devices containing an embedded
switch. We need this for a clean transition from the legacy handling of
embedded switches we currently have in drivers to the new switchdev model.

Thanks!


Re: [PATCH 2/4] mac80211/cfg: mesh: fix healing time when a mesh peer is disconnecting

2016-06-28 Thread Bob Copeland
On Tue, Jun 28, 2016 at 02:13:05PM +0300, Yaniv Machani wrote:
> From: Maital Hahn 
> 
> Once receiving a CLOSE action frame from the disconnecting peer,
> flush all entries in the path table which has this peer as the
> next hop.

Please address the user-visible behavior in your commit messages.
Does it crash?  Does it send frames to an invalid peer?  Do
frames get dropped?

> In addition, upon receiving a packet, if next hop is not found,
> trigger PERQ immidiatly, instead of just putting it in the queue.

"PREQ"

Please split this into a separate patch that we can review
separately (and also give the "why" in the commit log).

> Signed-off-by: Maital Hahn 
> Acked-by: Yaniv Machani 
> ---
>  net/mac80211/cfg.c   |  1 +
>  net/mac80211/mesh.c  |  3 ++-
>  net/mac80211/mesh_hwmp.c | 42 +-
>  3 files changed, 28 insertions(+), 18 deletions(-)
> 
> diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c
> index 0c12e40..f876ef7 100644
> --- a/net/mac80211/cfg.c
> +++ b/net/mac80211/cfg.c
> @@ -1011,6 +1011,7 @@ static void sta_apply_mesh_params(struct 
> ieee80211_local *local,
>   if (sta->mesh->plink_state == NL80211_PLINK_ESTAB)
>   changed = mesh_plink_dec_estab_count(sdata);
>   sta->mesh->plink_state = params->plink_state;
> + mesh_path_flush_by_nexthop(sta);

This isn't necessary, caller should already be doing
mesh_path_flush_by_nexthop() in every case I could see.  Besides it
cannot be done under plink lock.

> +++ b/net/mac80211/mesh.c
> @@ -159,7 +159,8 @@ void mesh_sta_cleanup(struct sta_info *sta)
>   if (!sdata->u.mesh.user_mpm) {
>   changed |= mesh_plink_deactivate(sta);
>   del_timer_sync(&sta->mesh->plink_timer);
> - }
> + } else
> + mesh_path_flush_by_nexthop(sta);

And this is already fixed in mac80211-next.

-- 
Bob Copeland %% http://bobcopeland.com/


[PATCH net 1/1] qed: Protect the doorbell BAR with the write barriers.

2016-06-28 Thread Sudarsana Reddy Kalluru
The SPQ doorbell is currently protected only by a compiler barrier. Under
stress scenarios we may get into a state where, due to weak ordering,
several ramrod doorbells are written to the BAR with out-of-order
producer values. Change the barrier type to a write barrier to make
sure that the write buffer is flushed after each doorbell.

Signed-off-by: Sudarsana Reddy Kalluru 
---
 drivers/net/ethernet/qlogic/qed/qed_spq.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_spq.c b/drivers/net/ethernet/qlogic/qed/qed_spq.c
index 67d9893..b122f60 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_spq.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_spq.c
@@ -213,19 +213,15 @@ static int qed_spq_hw_post(struct qed_hwfn *p_hwfn,
SET_FIELD(db.params, CORE_DB_DATA_AGG_VAL_SEL,
  DQ_XCM_CORE_SPQ_PROD_CMD);
db.agg_flags = DQ_XCM_CORE_DQ_CF_CMD;
-
-   /* validate producer is up to-date */
-   rmb();
-
db.spq_prod = cpu_to_le16(qed_chain_get_prod_idx(p_chain));
 
-   /* do not reorder */
-   barrier();
+   /* make sure the SPQE is updated before the doorbell */
+   wmb();
 
DOORBELL(p_hwfn, qed_db_addr(p_spq->cid, DQ_DEMS_LEGACY), *(u32 *)&db);
 
/* make sure doorbell is rang */
-   mmiowb();
+   wmb();
 
DP_VERBOSE(p_hwfn, QED_MSG_SPQ,
   "Doorbelled [0x%08x, CID 0x%08x] with Flags: %02x 
agg_params: %02x, prod: %04x\n",
-- 
1.8.3.1
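
The ordering argument in the commit message above (element first, barrier,
then doorbell) can be sketched in userspace C11. This is only an analogue
under stated assumptions: atomic_thread_fence() stands in for the kernel's
wmb(), and struct toy_ring / toy_post() are hypothetical names that do not
exist in the qed driver. It does not model MMIO or the second barrier that
the patch places after the doorbell write.

```c
#include <stdatomic.h>
#include <stdint.h>

#define TOY_RING_SIZE 8

/* Hypothetical stand-in for the SPQ chain plus its doorbell register. */
struct toy_ring {
	uint32_t entries[TOY_RING_SIZE];	/* stands in for the SPQ elements */
	uint16_t prod;				/* software producer index */
	_Atomic uint16_t doorbell;		/* stands in for the doorbell BAR */
};

/* Publish one element, then ring the doorbell with the new producer index. */
static void toy_post(struct toy_ring *r, uint32_t payload)
{
	r->entries[r->prod % TOY_RING_SIZE] = payload;
	r->prod++;

	/*
	 * Userspace analogue of wmb(): stores issued before the fence are
	 * ordered before the doorbell store for any observer that reads
	 * the doorbell with acquire semantics.
	 */
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&r->doorbell, r->prod, memory_order_relaxed);
}
```

With only a compiler barrier in place of the fence, the compiler could not
reorder the two stores, but a weakly ordered CPU still could, which is the
situation the commit message describes.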



[PATCH net-next v3 6/7] r8152: support RTL8153B

2016-06-28 Thread Hayes Wang
Support new chip RTL8153B.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 569 +---
 1 file changed, 542 insertions(+), 27 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 7227931..bd74fab 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -28,7 +28,7 @@
 #include 
 
 /* Information for net-next */
-#define NETNEXT_VERSION"08"
+#define NETNEXT_VERSION"09"
 
 /* Information for net */
 #define NET_VERSION"3"
@@ -50,11 +50,14 @@
 #define PLA_FMC0xc0b4
 #define PLA_CFG_WOL0xc0b6
 #define PLA_TEREDO_CFG 0xc0bc
+#define PLA_TEREDO_WAKE_BASE   0xc0c4
 #define PLA_MAR0xcd00
 #define PLA_BACKUP 0xd000
 #define PAL_BDC_CR 0xd1a0
 #define PLA_TEREDO_TIMER   0xd2cc
 #define PLA_REALWOW_TIMER  0xd2e8
+#define PLA_EFUSE_DATA 0xdd00
+#define PLA_EFUSE_CMD  0xdd02
 #define PLA_LEDSEL 0xdd90
 #define PLA_LED_FEATURE0xdd92
 #define PLA_PHYAR  0xde00
@@ -104,7 +107,9 @@
 #define USB_CSR_DUMMY2 0xb466
 #define USB_DEV_STAT   0xb808
 #define USB_CONNECT_TIMER  0xcbf8
+#define USB_MSC_TIMER  0xcbfc
 #define USB_BURST_SIZE 0xcfc0
+#define USB_LPM_CONFIG 0xcfd8
 #define USB_USB_CTRL   0xd406
 #define USB_PHY_CTRL   0xd408
 #define USB_TX_AGG 0xd40a
@@ -112,14 +117,19 @@
 #define USB_USB_TIMER  0xd428
 #define USB_RX_EARLY_TIMEOUT   0xd42c
 #define USB_RX_EARLY_SIZE  0xd42e
-#define USB_PM_CTRL_STATUS 0xd432
+#define USB_PM_CTRL_STATUS 0xd432  /* RTL8153A */
+#define USB_RX_EXTRA_AGGR_TMR  0xd432  /* RTL8153B */
 #define USB_TX_DMA 0xd434
+#define USB_UPT_RXDMA_OWN  0xd437
 #define USB_TOLERANCE  0xd490
 #define USB_LPM_CTRL   0xd41a
+#define USB_U1U2_TIMER 0xd4da
 #define USB_UPS_CTRL   0xd800
 #define USB_MISC_0 0xd81a
 #define USB_POWER_CUT  0xd80a
 #define USB_AFE_CTRL2  0xd824
+#define USB_UPS_CFG0xd842
+#define USB_UPS_FLAGS  0xd848
 #define USB_WDT11_CTRL 0xe43c
 #define USB_BP_BA  0xfc26
 #define USB_BP_0   0xfc28
@@ -141,6 +151,7 @@
 #define OCP_EEE_AR 0xa41a
 #define OCP_EEE_DATA   0xa41c
 #define OCP_PHY_STATUS 0xa420
+#define OCP_NCTL_CFG   0xa42c
 #define OCP_POWER_CFG  0xa430
 #define OCP_EEE_CFG0xa432
 #define OCP_SRAM_ADDR  0xa436
@@ -150,9 +161,14 @@
 #define OCP_EEE_ADV0xa5d0
 #define OCP_EEE_LPABLE 0xa5d2
 #define OCP_PHY_STATE  0xa708  /* nway state for 8153 */
+#define OCP_PHY_PATCH_STAT 0xb800
+#define OCP_PHY_PATCH_CMD  0xb820
+#define OCP_ADC_IOFFSET0xbcfc
 #define OCP_ADC_CFG0xbc06
+#define OCP_SYSCLK_CFG 0xc416
 
 /* SRAM Register */
+#define SRAM_GREEN_CFG 0x8011
 #define SRAM_LPF_CFG   0x8012
 #define SRAM_10M_AMP1  0x8080
 #define SRAM_10M_AMP2  0x8082
@@ -250,6 +266,10 @@
 /* PAL_BDC_CR */
 #define ALDPS_PROXY_MODE   0x0001
 
+/* PLA_EFUSE_CMD */
+#define EFUSE_READ_CMD BIT(15)
+#define EFUSE_DATA_BIT16   BIT(7)
+
 /* PLA_CONFIG34 */
 #define LINK_ON_WAKE_EN0x0010
 #define LINK_OFF_WAKE_EN   0x0008
@@ -275,6 +295,7 @@
 
 /* PLA_MAC_PWR_CTRL2 */
 #define EEE_SPDWN_RATIO0x8007
+#define MAC_CLK_SPDWN_EN   BIT(15)
 
 /* PLA_MAC_PWR_CTRL3 */
 #define PKT_AVAIL_SPDWN_EN 0x0100
@@ -326,6 +347,9 @@
 #define STAT_SPEED_HIGH0x
 #define STAT_SPEED_FULL0x0002
 
+/* USB_LPM_CONFIG */
+#define LPM_U1U2_ENBIT(0)
+
 /* USB_TX_AGG */
 #define TX_AGG_MAX_THRESHOLD   0x03
 
@@ -333,11 +357,16 @@
 #define RX_THR_SUPPER  0x0c350180
 #define RX_THR_HIGH0x7a120180
 #define RX_THR_SLOW0x0180
+#define RX_THR_B   0x00010001
 
 /* USB_TX_DMA */
 #define TEST_MODE_DISABLE  0x0001
 #define TX_SIZE_ADJUST10x0100
 
+/* USB_UPT_RXDMA_OWN */
+#define OWN_UPDATE BIT(0)
+#define OWN_CLEAR  BIT(1)
+
 /* USB_UPS_CTRL */
 #define POWER_CUT  0x0100
 
@@ -354,6 +383,8 @@
 /* USB_POWER_CUT */
 #define PWR_EN 0x0001
 #define PHASE2_EN  0x0008
+#define UPS_EN BIT(4)
+#define USP_PREWAKEBIT(5)
 
 /* USB_MISC_0 */
 #define PCUT_STATUS0x0001
@@ -380,6 +411,37 @@
 #define SEN_VAL_NORMAL 0xa000
 #define SEL_RXIDLE 0x0100
 
+/* USB_UPS_CFG */
+#define SAW_CNT_1MS_MASK   0x0fff
+
+/* USB_UPS_FLAGS */
+#define UPS_FLAGS_R_TUNE   BIT(0)
+#define UPS_FLAGS_EN_10M_CKDIV BIT(1)
+#define UPS_FLAGS_250M_CKDIV   BIT(2)
+#define UPS_F

[PATCH net-next v3 5/7] r8152: support the new chip 8050

2016-06-28 Thread Hayes Wang
Support a new chip which has the product ID 0x8050.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index df370e5..7227931 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -647,6 +647,7 @@ enum rtl_version {
RTL_VER_04,
RTL_VER_05,
RTL_VER_06,
+   RTL_VER_07,
RTL_VER_MAX
 };
 
@@ -3921,6 +3922,7 @@ static int rtl8152_get_coalesce(struct net_device *netdev,
switch (tp->version) {
case RTL_VER_01:
case RTL_VER_02:
+   case RTL_VER_07:
return -EOPNOTSUPP;
default:
break;
@@ -3940,6 +3942,7 @@ static int rtl8152_set_coalesce(struct net_device *netdev,
switch (tp->version) {
case RTL_VER_01:
case RTL_VER_02:
+   case RTL_VER_07:
return -EOPNOTSUPP;
default:
break;
@@ -4039,6 +4042,7 @@ static int rtl8152_change_mtu(struct net_device *dev, int new_mtu)
switch (tp->version) {
case RTL_VER_01:
case RTL_VER_02:
+   case RTL_VER_07:
return eth_change_mtu(dev, new_mtu);
default:
break;
@@ -4110,6 +4114,9 @@ static void r8152b_get_version(struct r8152 *tp)
tp->version = RTL_VER_06;
tp->mii.supports_gmii = 1;
break;
+   case 0x4800:
+   tp->version = RTL_VER_07;
+   break;
default:
netif_info(tp, probe, tp->netdev,
   "Unknown version 0x%04x\n", version);
@@ -4142,6 +4149,7 @@ static int rtl_ops_init(struct r8152 *tp)
switch (tp->version) {
case RTL_VER_01:
case RTL_VER_02:
+   case RTL_VER_07:
ops->init   = r8152b_init;
ops->enable = rtl8152_enable;
ops->disable= rtl8152_disable;
@@ -4339,6 +4347,7 @@ static void rtl8152_disconnect(struct usb_interface *intf)
 
 /* table of devices that work with this driver */
 static struct usb_device_id rtl8152_table[] = {
+   {REALTEK_USB_DEVICE(VENDOR_ID_REALTEK, 0x8050)},
{REALTEK_USB_DEVICE(VENDOR_ID_REALTEK, 0x8152)},
{REALTEK_USB_DEVICE(VENDOR_ID_REALTEK, 0x8153)},
{REALTEK_USB_DEVICE(VENDOR_ID_SAMSUNG, 0xa101)},
-- 
2.4.11



[PATCH net-next v3 7/7] r8152: add byte_enable for ocp_read_word function

2016-06-28 Thread Hayes Wang
Add byte_enable to ocp_read_word() so that it reads only the
desired 2 bytes of data instead of 4 bytes.

This avoids the issue described in commit b4d99def0938
("r8152: remove sram_read"). The original method always
reads 4 bytes of data, which may cause problems when
reading the PHY registers.

The new method is supported since RTL8152B, but it does
not affect the previous chips. On those chips the
byte_enable bits are reserved bits, and the hardware
ignores them.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index bd74fab..e5405e6 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -946,11 +946,13 @@ static u16 ocp_read_word(struct r8152 *tp, u16 type, u16 index)
 {
u32 data;
__le32 tmp;
+   u16 byen = BYTE_EN_WORD;
u8 shift = index & 2;
 
index &= ~3;
+   byen <<= shift;
 
-   generic_ocp_read(tp, index, sizeof(tmp), &tmp, type);
+   generic_ocp_read(tp, index, sizeof(tmp), &tmp, type | byen);
 
data = __le32_to_cpu(tmp);
data >>= (shift * 8);
-- 
2.4.11
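
The index/shift/byte-enable arithmetic in the hunk above can be tried out
in isolation. The following is a minimal sketch, not driver code:
toy_regs[] and toy_read_word() are hypothetical stand-ins for the hardware
and for generic_ocp_read(), and the value 0x33 for BYTE_EN_WORD is an
assumption about the driver's definition (two enabled bytes, mirrored in
both nibbles).

```c
#include <stdint.h>

#define BYTE_EN_WORD 0x33	/* assumed value, see the note above */

/* Two fake 32-bit registers at byte offsets 0 and 4. */
static const uint32_t toy_regs[2] = { 0x44332211u, 0x88776655u };

struct toy_read {
	uint16_t aligned_index;	/* index after rounding down to 4 bytes */
	uint16_t byen;		/* byte-enable mask that would be sent */
	uint16_t value;		/* the 16-bit word that was asked for */
};

static struct toy_read toy_read_word(uint16_t index)
{
	struct toy_read r;
	uint16_t byen = BYTE_EN_WORD;
	uint8_t shift = index & 2;	/* 0 for the low half, 2 for the high */
	uint32_t data;

	index &= ~3;			/* the access itself is 4-byte aligned */
	byen <<= shift;			/* select bytes 0-1 or bytes 2-3 */

	data = toy_regs[index / 4];	/* stand-in for the real register read */
	data >>= (shift * 8);		/* move the wanted half to the bottom */

	r.aligned_index = index;
	r.byen = byen;
	r.value = (uint16_t)(data & 0xffff);
	return r;
}
```

For index 2 the sketch produces an aligned index of 0, a mask of 0xcc and
the upper half of the first register, which mirrors how the patch ORs the
shifted mask into the type argument.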



[PATCH net-next v3 2/7] r8152: add u1u2_enable for rtl_ops

2016-06-28 Thread Hayes Wang
Add u1u2_enable() for rtl_ops.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index b253003..f51d799 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -620,6 +620,7 @@ struct r8152 {
int (*eee_set)(struct r8152 *, struct ethtool_eee *);
bool (*in_nway)(struct r8152 *);
void (*aldps_enable)(struct r8152 *tp, bool enable);
+   void (*u1u2_enable)(struct r8152 *tp, bool enable);
void (*hw_phy_cfg)(struct r8152 *);
} rtl_ops;
 
@@ -2408,7 +2409,7 @@ static void rtl_runtime_suspend_enable(struct r8152 *tp, bool enable)
if (enable) {
u32 ocp_data;
 
-   r8153_u1u2en(tp, false);
+   tp->rtl_ops.u1u2_enable(tp, false);
r8153_u2p3en(tp, false);
 
__rtl_set_wol(tp, WAKE_ANY);
@@ -2423,7 +2424,7 @@ static void rtl_runtime_suspend_enable(struct r8152 *tp, bool enable)
} else {
__rtl_set_wol(tp, tp->saved_wolopts);
r8153_u2p3en(tp, true);
-   r8153_u1u2en(tp, true);
+   tp->rtl_ops.u1u2_enable(tp, true);
}
 }
 
@@ -2922,12 +2923,12 @@ static void rtl8153_up(struct r8152 *tp)
if (test_bit(RTL8152_UNPLUG, &tp->flags))
return;
 
-   r8153_u1u2en(tp, false);
+   tp->rtl_ops.u1u2_enable(tp, false);
tp->rtl_ops.aldps_enable(tp, false);
r8153_first_init(tp);
tp->rtl_ops.aldps_enable(tp, true);
r8153_u2p3en(tp, true);
-   r8153_u1u2en(tp, true);
+   tp->rtl_ops.u1u2_enable(tp, true);
usb_enable_lpm(tp->udev);
 }
 
@@ -2938,7 +2939,7 @@ static void rtl8153_down(struct r8152 *tp)
return;
}
 
-   r8153_u1u2en(tp, false);
+   tp->rtl_ops.u1u2_enable(tp, false);
r8153_u2p3en(tp, false);
r8153_power_cut_en(tp, false);
tp->rtl_ops.aldps_enable(tp, false);
@@ -4142,6 +4143,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->eee_set= r8152_set_eee;
ops->in_nway= rtl8152_in_nway;
ops->aldps_enable   = r8152_aldps_en;
+   ops->u1u2_enable= r8153_u1u2en;
ops->hw_phy_cfg = r8152b_hw_phy_cfg;
break;
 
@@ -4159,6 +4161,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->eee_set= r8153_set_eee;
ops->in_nway= rtl8153_in_nway;
ops->aldps_enable   = r8153_aldps_en;
+   ops->u1u2_enable= r8153_u1u2en;
ops->hw_phy_cfg = r8153_hw_phy_cfg;
break;
 
-- 
2.4.11
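
The rtl_ops patches in this series all follow the same pattern: a direct
call to a chip-specific helper becomes a call through a function pointer
that is chosen once, by chip version. A minimal sketch of that pattern
follows. All names (toy_dev, toy_ops, toy_a_u1u2en, toy_b_u1u2en) are
hypothetical, and the "two writes" behaviour of the second variant is
invented purely so the two callbacks differ.

```c
#include <stdbool.h>

struct toy_dev;

/* Per-chip operations, selected once at init time. */
struct toy_ops {
	void (*u1u2_enable)(struct toy_dev *tp, bool enable);
};

struct toy_dev {
	int version;		/* which chip variant was detected */
	bool u1u2_on;		/* stands in for the hardware state */
	int reg_writes;		/* counts fake register writes */
	struct toy_ops ops;
};

static void toy_a_u1u2en(struct toy_dev *tp, bool enable)
{
	tp->u1u2_on = enable;
	tp->reg_writes += 1;
}

static void toy_b_u1u2en(struct toy_dev *tp, bool enable)
{
	tp->u1u2_on = enable;
	tp->reg_writes += 2;	/* pretend the newer chip needs two writes */
}

/* Pick the callbacks by version; unknown versions are rejected. */
static int toy_ops_init(struct toy_dev *tp)
{
	switch (tp->version) {
	case 1:
		tp->ops.u1u2_enable = toy_a_u1u2en;
		return 0;
	case 2:
		tp->ops.u1u2_enable = toy_b_u1u2en;
		return 0;
	default:
		return -1;
	}
}

/* Shared bring-up path: no chip-specific helper is named here. */
static void toy_up(struct toy_dev *tp)
{
	tp->ops.u1u2_enable(tp, false);
	/* ... chip-independent bring-up would go here ... */
	tp->ops.u1u2_enable(tp, true);
}
```

The point of the indirection is that shared paths such as toy_up() need no
changes when a new chip variant is added; only the init switch grows.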



[PATCH net-next v3 0/7] r8152: support new chips

2016-06-28 Thread Hayes Wang
v3:
Insert a patch "r8152: add u2p3_enable for rtl_ops".

Change the patch "r8152: support RTL8153B". Disable U2P3.

v2:
Fix the commit message for patch #6.

v1:
To support new chips, adjust some of the code, then add the settings
for the new chips.

Hayes Wang (7):
  r8152: add aldps_enable for rtl_ops
  r8152: add u1u2_enable for rtl_ops
  r8152: add power_cut_en for rtl_ops
  r8152: add u2p3_enable for rtl_ops
  r8152: support the new chip 8050
  r8152: support RTL8153B
  r8152: add byte_enable for ocp_read_word function

 drivers/net/usb/r8152.c | 641 
 1 file changed, 592 insertions(+), 49 deletions(-)

-- 
2.4.11



[PATCH net-next v3 4/7] r8152: add u2p3_enable for rtl_ops

2016-06-28 Thread Hayes Wang
Add u2p3_enable() for rtl_ops.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index a4f8a01..df370e5 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -621,6 +621,7 @@ struct r8152 {
bool (*in_nway)(struct r8152 *);
void (*aldps_enable)(struct r8152 *tp, bool enable);
void (*u1u2_enable)(struct r8152 *tp, bool enable);
+   void (*u2p3_enable)(struct r8152 *tp, bool enable);
void (*hw_phy_cfg)(struct r8152 *);
void (*power_cut_en)(struct r8152 *tp, bool enable);
} rtl_ops;
@@ -2418,7 +2419,7 @@ static void rtl_runtime_suspend_enable(struct r8152 *tp, bool enable)
u32 ocp_data;
 
tp->rtl_ops.u1u2_enable(tp, false);
-   r8153_u2p3en(tp, false);
+   tp->rtl_ops.u2p3_enable(tp, false);
 
__rtl_set_wol(tp, WAKE_ANY);
 
@@ -2431,7 +2432,7 @@ static void rtl_runtime_suspend_enable(struct r8152 *tp, bool enable)
ocp_write_byte(tp, MCU_TYPE_PLA, PLA_CRWECR, CRWECR_NORAML);
} else {
__rtl_set_wol(tp, tp->saved_wolopts);
-   r8153_u2p3en(tp, true);
+   tp->rtl_ops.u2p3_enable(tp, true);
tp->rtl_ops.u1u2_enable(tp, true);
}
 }
@@ -2935,7 +2936,7 @@ static void rtl8153_up(struct r8152 *tp)
tp->rtl_ops.aldps_enable(tp, false);
r8153_first_init(tp);
tp->rtl_ops.aldps_enable(tp, true);
-   r8153_u2p3en(tp, true);
+   tp->rtl_ops.u2p3_enable(tp, true);
tp->rtl_ops.u1u2_enable(tp, true);
usb_enable_lpm(tp->udev);
 }
@@ -2948,7 +2949,7 @@ static void rtl8153_down(struct r8152 *tp)
}
 
tp->rtl_ops.u1u2_enable(tp, false);
-   r8153_u2p3en(tp, false);
+   tp->rtl_ops.u2p3_enable(tp, false);
tp->rtl_ops.power_cut_en(tp, false);
tp->rtl_ops.aldps_enable(tp, false);
r8153_enter_oob(tp);
@@ -4152,6 +4153,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->in_nway= rtl8152_in_nway;
ops->aldps_enable   = r8152_aldps_en;
ops->u1u2_enable= r8153_u1u2en;
+   ops->u2p3_enable= r8153_u2p3en;
ops->hw_phy_cfg = r8152b_hw_phy_cfg;
ops->power_cut_en   = r8152_power_cut_en;
break;
@@ -4171,6 +4173,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->in_nway= rtl8153_in_nway;
ops->aldps_enable   = r8153_aldps_en;
ops->u1u2_enable= r8153_u1u2en;
+   ops->u2p3_enable= r8153_u2p3en;
ops->hw_phy_cfg = r8153_hw_phy_cfg;
ops->power_cut_en   = r8153A_power_cut_en;
break;
-- 
2.4.11



[PATCH net-next v3 3/7] r8152: add power_cut_en for rtl_ops

2016-06-28 Thread Hayes Wang
Add power_cut_en() for rtl_ops.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index f51d799..a4f8a01 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -622,6 +622,7 @@ struct r8152 {
void (*aldps_enable)(struct r8152 *tp, bool enable);
void (*u1u2_enable)(struct r8152 *tp, bool enable);
void (*hw_phy_cfg)(struct r8152 *);
+   void (*power_cut_en)(struct r8152 *tp, bool enable);
} rtl_ops;
 
int intr_interval;
@@ -2391,6 +2392,13 @@ static void r8153_power_cut_en(struct r8152 *tp, bool enable)
else
ocp_data &= ~(PWR_EN | PHASE2_EN);
ocp_write_word(tp, MCU_TYPE_USB, USB_POWER_CUT, ocp_data);
+}
+
+static void r8153A_power_cut_en(struct r8152 *tp, bool enable)
+{
+   u32 ocp_data;
+
+   r8153_power_cut_en(tp, enable);
 
ocp_data = ocp_read_word(tp, MCU_TYPE_USB, USB_MISC_0);
ocp_data &= ~PCUT_STATUS;
@@ -2941,7 +2949,7 @@ static void rtl8153_down(struct r8152 *tp)
 
tp->rtl_ops.u1u2_enable(tp, false);
r8153_u2p3en(tp, false);
-   r8153_power_cut_en(tp, false);
+   tp->rtl_ops.power_cut_en(tp, false);
tp->rtl_ops.aldps_enable(tp, false);
r8153_enter_oob(tp);
tp->rtl_ops.aldps_enable(tp, true);
@@ -3397,7 +3405,7 @@ static void r8153_init(struct r8152 *tp)
 
ocp_write_word(tp, MCU_TYPE_USB, USB_CONNECT_TIMER, 0x0001);
 
-   r8153_power_cut_en(tp, false);
+   r8153A_power_cut_en(tp, false);
r8153_u1u2en(tp, true);
 
ocp_write_word(tp, MCU_TYPE_PLA, PLA_MAC_PWR_CTRL, ALDPS_SPDWN_RATIO);
@@ -4122,7 +4130,7 @@ static void rtl8153_unload(struct r8152 *tp)
if (test_bit(RTL8152_UNPLUG, &tp->flags))
return;
 
-   r8153_power_cut_en(tp, false);
+   tp->rtl_ops.power_cut_en(tp, false);
 }
 
 static int rtl_ops_init(struct r8152 *tp)
@@ -4145,6 +4153,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->aldps_enable   = r8152_aldps_en;
ops->u1u2_enable= r8153_u1u2en;
ops->hw_phy_cfg = r8152b_hw_phy_cfg;
+   ops->power_cut_en   = r8152_power_cut_en;
break;
 
case RTL_VER_03:
@@ -4163,6 +4172,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->aldps_enable   = r8153_aldps_en;
ops->u1u2_enable= r8153_u1u2en;
ops->hw_phy_cfg = r8153_hw_phy_cfg;
+   ops->power_cut_en   = r8153A_power_cut_en;
break;
 
default:
-- 
2.4.11



[PATCH net-next v3 1/7] r8152: add aldps_enable for rtl_ops

2016-06-28 Thread Hayes Wang
Add aldps_enable() for rtl_ops.

Signed-off-by: Hayes Wang 
---
 drivers/net/usb/r8152.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index 11178f9..b253003 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -619,6 +619,7 @@ struct r8152 {
int (*eee_get)(struct r8152 *, struct ethtool_eee *);
int (*eee_set)(struct r8152 *, struct ethtool_eee *);
bool (*in_nway)(struct r8152 *);
+   void (*aldps_enable)(struct r8152 *tp, bool enable);
void (*hw_phy_cfg)(struct r8152 *);
} rtl_ops;
 
@@ -2474,9 +2475,9 @@ static void r8152_aldps_en(struct r8152 *tp, bool enable)
 
 static void rtl8152_disable(struct r8152 *tp)
 {
-   r8152_aldps_en(tp, false);
+   tp->rtl_ops.aldps_enable(tp, false);
rtl_disable(tp);
-   r8152_aldps_en(tp, true);
+   tp->rtl_ops.aldps_enable(tp, true);
 }
 
 static void r8152b_hw_phy_cfg(struct r8152 *tp)
@@ -2801,9 +2802,7 @@ static void r8153_aldps_en(struct r8152 *tp, bool enable)
 
 static void rtl8153_disable(struct r8152 *tp)
 {
-   r8153_aldps_en(tp, false);
-   rtl_disable(tp);
-   r8153_aldps_en(tp, true);
+   rtl8152_disable(tp);
usb_enable_lpm(tp->udev);
 }
 
@@ -2924,9 +2923,9 @@ static void rtl8153_up(struct r8152 *tp)
return;
 
r8153_u1u2en(tp, false);
-   r8153_aldps_en(tp, false);
+   tp->rtl_ops.aldps_enable(tp, false);
r8153_first_init(tp);
-   r8153_aldps_en(tp, true);
+   tp->rtl_ops.aldps_enable(tp, true);
r8153_u2p3en(tp, true);
r8153_u1u2en(tp, true);
usb_enable_lpm(tp->udev);
@@ -2942,9 +2941,9 @@ static void rtl8153_down(struct r8152 *tp)
r8153_u1u2en(tp, false);
r8153_u2p3en(tp, false);
r8153_power_cut_en(tp, false);
-   r8153_aldps_en(tp, false);
+   tp->rtl_ops.aldps_enable(tp, false);
r8153_enter_oob(tp);
-   r8153_aldps_en(tp, true);
+   tp->rtl_ops.aldps_enable(tp, true);
 }
 
 static bool rtl8152_in_nway(struct r8152 *tp)
@@ -4142,6 +4141,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->eee_get= r8152_get_eee;
ops->eee_set= r8152_set_eee;
ops->in_nway= rtl8152_in_nway;
+   ops->aldps_enable   = r8152_aldps_en;
ops->hw_phy_cfg = r8152b_hw_phy_cfg;
break;
 
@@ -4158,6 +4158,7 @@ static int rtl_ops_init(struct r8152 *tp)
ops->eee_get= r8153_get_eee;
ops->eee_set= r8153_set_eee;
ops->in_nway= rtl8153_in_nway;
+   ops->aldps_enable   = r8153_aldps_en;
ops->hw_phy_cfg = r8153_hw_phy_cfg;
break;
 
-- 
2.4.11



Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread George Spelvin
Just a note, crypto/cts.c also does a lot of sg_set_buf() in stack buffers.

I have a local patch (appended, if anyone wants) to reduce the wasteful
amount of buffer space it uses (from 7 to 3 blocks on encrypt, from
6 to 3 blocks on decrypt), but it would take some rework to convert to
crypto_cipher_encrypt_one() or avoid stack buffers entirely.

commit c0aa0ae38dc6115b378939c5483ba6c7eb65d92a
Author: George Spelvin 
Date:   Sat Oct 10 17:26:08 2015 -0400

crypto: cts - Reduce internal buffer usage

It only takes a 3-block temporary buffer to handle all the tricky
CTS cases.  Encryption could in theory be done with two, but at a cost
in complexity.

But it's still a saving from the previous six blocks on the stack.

One issue I'm uncertain of and I'd like clarification on: to simplify
the cts_cbc_{en,de}crypt calls, I pass in the lcldesc structure which
contains the ctx->child transform rather than the parent one.  I'm
assuming the block sizes are guaranteed to be the same (they're set up
in crypto_cts_alloc by copying), but I haven't been able to prove it to
my satisfaction.

Signed-off-by: George Spelvin 

diff --git a/crypto/cts.c b/crypto/cts.c
index e467ec0ac..e24d2e15 100644
--- a/crypto/cts.c
+++ b/crypto/cts.c
@@ -70,54 +70,44 @@ static int crypto_cts_setkey(struct crypto_tfm *parent, const u8 *key,
return err;
 }
 
-static int cts_cbc_encrypt(struct crypto_cts_ctx *ctx,
-  struct blkcipher_desc *desc,
+/*
+ * The final CTS encryption is just like CBC encryption except that:
+ * - the last plaintext block is zero-padded,
+ * - the second-last ciphertext block is trimmed, and
+ * - the last (complete) block of ciphertext is output before the
+ *   (truncated) second-last one.
+ */
+static int cts_cbc_encrypt(struct blkcipher_desc *lcldesc,
   struct scatterlist *dst,
   struct scatterlist *src,
   unsigned int offset,
   unsigned int nbytes)
 {
-   int bsize = crypto_blkcipher_blocksize(desc->tfm);
-   u8 tmp[bsize], tmp2[bsize];
-   struct blkcipher_desc lcldesc;
-   struct scatterlist sgsrc[1], sgdst[1];
+   int bsize = crypto_blkcipher_blocksize(lcldesc->tfm);
+   u8 tmp[3*bsize] __aligned(8);
+   struct scatterlist sgsrc[1], sgdst[2];
int lastn = nbytes - bsize;
-   u8 iv[bsize];
-   u8 s[bsize * 2], d[bsize * 2];
int err;
 
-   if (lastn < 0)
+   if (lastn <= 0)
return -EINVAL;
 
-   sg_init_table(sgsrc, 1);
-   sg_init_table(sgdst, 1);
-
-   memset(s, 0, sizeof(s));
-   scatterwalk_map_and_copy(s, src, offset, nbytes, 0);
-
-   memcpy(iv, desc->info, bsize);
-
-   lcldesc.tfm = ctx->child;
-   lcldesc.info = iv;
-   lcldesc.flags = desc->flags;
-
-   sg_set_buf(&sgsrc[0], s, bsize);
-   sg_set_buf(&sgdst[0], tmp, bsize);
-   err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
-
-   memcpy(d + bsize, tmp, lastn);
-
-   lcldesc.info = tmp;
-
-   sg_set_buf(&sgsrc[0], s + bsize, bsize);
-   sg_set_buf(&sgdst[0], tmp2, bsize);
-   err = crypto_blkcipher_encrypt_iv(&lcldesc, sgdst, sgsrc, bsize);
-
-   memcpy(d, tmp2, bsize);
-
-   scatterwalk_map_and_copy(d, dst, offset, nbytes, 1);
-
-   memcpy(desc->info, tmp2, bsize);
+   /* Copy the input to a temporary buffer; tmp = xxx, P[n-1], P[n] */
+   memset(tmp+2*bsize, 0, bsize);
+   scatterwalk_map_and_copy(tmp+bsize, src, offset, nbytes, 0);
+
+   sg_init_one(sgsrc, tmp+bsize, 2*bsize);
+   /* Initialize dst specially to do the rearrangement for us */
+   sg_init_table(sgdst, 2);
+   sg_set_buf(sgdst+0, tmp+bsize, bsize);
+   sg_set_buf(sgdst+1, tmp,   bsize);
+
+   /* CBC-encrypt in place the two blocks; tmp = C[n], C[n-1], P[n] */
+   err = crypto_blkcipher_encrypt_iv(lcldesc, sgdst, sgsrc, 2*bsize);
+
+   /* Copy beginning of tmp to the output */
+   scatterwalk_map_and_copy(tmp, dst, offset, nbytes, 1);
+   memzero_explicit(tmp, sizeof(tmp));
 
return err;
 }
@@ -126,8 +116,8 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
  struct scatterlist *dst, struct scatterlist *src,
  unsigned int nbytes)
 {
-   struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int bsize = crypto_blkcipher_blocksize(desc->tfm);
+   struct crypto_cts_ctx *ctx = crypto_blkcipher_ctx(desc->tfm);
int tot_blocks = (nbytes + bsize - 1) / bsize;
int cbc_blocks = tot_blocks > 2 ? tot_blocks - 2 : 0;
struct blkcipher_desc lcldesc;
@@ -140,14 +130,14 @@ static int crypto_cts_encrypt(struct blkcipher_desc *desc,
if (tot_blocks == 1) {
err = crypto_blkcipher_encrypt_iv(&lcldesc, dst, src, bsize);
} else if (nbytes <

Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread Herbert Xu
On Tue, Jun 28, 2016 at 08:37:43AM -0400, George Spelvin wrote:
> Just a note, crypto/cts.c also does a lot of sg_set_buf() in stack buffers.
> 
> I have a local patch (appended, if anyone wants) to reduce the wasteful
> amount of buffer space it uses (from 7 to 3 blocks on encrypt, from
> 6 to 3 blocks on decrypt), but it would take some rework to convert to
> crypto_cipher_encrypt_one() or avoid stack buffers entirely.

I'm currently working on cts and I'm removing the stack usage
altogether by having it operate on the src/dst SG lists only.

It's part of the skcipher conversion though so it'll have to go
through the crypto tree.

BTW, the only cts user in our tree appears to be implementing
CTS all over again and is only calling the crypto API cts for
the last two blocks.  Someone should fix that.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
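
For readers following the cts discussion in this thread, here is a
self-contained sketch of the final two-block step as George's patch comment
describes it: zero-pad the last plaintext block, CBC-encrypt both blocks,
emit the full last ciphertext block first and the truncated second-last
block after it. Everything here is an assumption made for illustration: the
8-byte block size, the toy (insecure) block transform and all toy_* names.
It does not use the kernel crypto API or scatterlists.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BS 8	/* toy block size in bytes (assumption) */

/* Toy, insecure block "cipher": only here so the CTS steps can be run. */
static const uint8_t toy_key[BS] = {
	0x13, 0x57, 0x9b, 0xdf, 0x24, 0x68, 0xac, 0xe0
};

static void toy_encrypt_block(uint8_t *b)
{
	for (size_t i = 0; i < BS; i++) {
		uint8_t v = b[i] ^ toy_key[i];

		b[i] = (uint8_t)((v << 3) | (v >> 5));	/* rotate left by 3 */
	}
}

static void toy_decrypt_block(uint8_t *b)
{
	for (size_t i = 0; i < BS; i++) {
		uint8_t v = (uint8_t)((b[i] >> 3) | (b[i] << 5)); /* rotate right */

		b[i] = v ^ toy_key[i];
	}
}

/* Encrypt the final BS < nbytes <= 2*BS bytes; out receives nbytes bytes. */
static int toy_cts_encrypt_tail(const uint8_t *iv, const uint8_t *in,
				size_t nbytes, uint8_t *out)
{
	uint8_t c1[BS], c2[BS];
	size_t lastn, i;

	if (nbytes <= BS || nbytes > 2 * BS)
		return -1;
	lastn = nbytes - BS;

	for (i = 0; i < BS; i++)	/* C[n-1] = E(P[n-1] ^ IV) */
		c1[i] = in[i] ^ iv[i];
	toy_encrypt_block(c1);

	memset(c2, 0, BS);		/* zero-pad the last plaintext block */
	memcpy(c2, in + BS, lastn);
	for (i = 0; i < BS; i++)	/* C[n] = E(pad(P[n]) ^ C[n-1]) */
		c2[i] ^= c1[i];
	toy_encrypt_block(c2);

	memcpy(out, c2, BS);		/* full last block goes out first */
	memcpy(out + BS, c1, lastn);	/* then the truncated second-last one */
	return 0;
}

/* Inverse of toy_cts_encrypt_tail(). */
static int toy_cts_decrypt_tail(const uint8_t *iv, const uint8_t *in,
				size_t nbytes, uint8_t *out)
{
	uint8_t d[BS], c1[BS];
	size_t lastn, i;

	if (nbytes <= BS || nbytes > 2 * BS)
		return -1;
	lastn = nbytes - BS;

	memcpy(d, in, BS);		/* d = D(C[n]) = pad(P[n]) ^ C[n-1] */
	toy_decrypt_block(d);

	memcpy(c1, d, BS);		/* tail of d is the dropped tail of C[n-1] */
	memcpy(c1, in + BS, lastn);	/* head is what was transmitted */

	for (i = 0; i < lastn; i++)	/* P[n] = head(d) ^ head(C[n-1]) */
		out[BS + i] = d[i] ^ c1[i];

	toy_decrypt_block(c1);
	for (i = 0; i < BS; i++)	/* P[n-1] = D(C[n-1]) ^ IV */
		out[i] = c1[i] ^ iv[i];
	return 0;
}
```

Decryption works because the padding is zero: the tail of D(C[n]) equals
the part of C[n-1] that was not transmitted, so the full second-last block
can be rebuilt before it is decrypted.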


Re: [PATCH net] Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address

2016-06-28 Thread David Miller
From: Linus Lüssing 
Date: Sat, 25 Jun 2016 16:20:28 +0200

> On Fri, Jun 24, 2016 at 12:35:18PM +0200, Daniel Danzberger wrote:
>> The bridge is falsely dropping ipv6 multicast packets if there is:
>>  1. No ipv6 address assigned on the bridge.
>>  2. No external mld querier present.
>>  3. The internal querier enabled.
>> 
>> When the bridge fails to build mld queries, because it has no
>> ipv6 address, it silently returns, but keeps the local querier enabled.
>> This specific case causes confusing packet loss.
>> 
>> Ipv6 multicast snooping can only work if:
>>  a) An external querier is present
>>  OR
>>  b) The bridge has an ipv6 address and is capable of sending its own queries
>> 
>> Otherwise it has to forward/flood the ipv6 multicast traffic,
>> because snooping cannot work.
>> 
>> This patch fixes the issue by adding a flag to the bridge struct that
>> indicates that there is currently no ipv6 address assigned to the bridge
>> and returns a false state for the local querier in
>> __br_multicast_querier_exists().
> 
> Fixes: 1d81d4c3dd88 ("bridge: check return value of ipv6_dev_get_saddr()")

You're missing an initial 'd' in that SHA1-ID.

With that fixed, applied and queued up for -stable.
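
The rule in the quoted patch description (snooping only works with an
external querier, or with an own querier that actually has an IPv6 source
address) can be written as a small predicate. This is a sketch of the
decision only; struct toy_bridge and its field names are hypothetical and
are not taken from the bridge code.

```c
#include <stdbool.h>

/* Hypothetical view of the state the decision depends on. */
struct toy_bridge {
	bool own_querier_enabled;	/* internal querier switched on */
	bool has_ipv6_addr;		/* bridge can source its own MLD queries */
	bool other_querier_present;	/* an external MLD querier was heard */
};

/* True if some querier is really able to keep snooping state fresh. */
static bool toy_mld_querier_exists(const struct toy_bridge *br)
{
	if (br->other_querier_present)
		return true;

	/* An enabled own querier counts only if it can send queries. */
	return br->own_querier_enabled && br->has_ipv6_addr;
}

/* Without a working querier, multicast has to be flooded. */
static bool toy_must_flood(const struct toy_bridge *br)
{
	return !toy_mld_querier_exists(br);
}
```

The failure case from the description is the combination "own querier
enabled, no address, no external querier": the predicate reports no
querier, so traffic is flooded rather than dropped.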


Re: [PATCH net-next] net: bridge: add support for IGMP/MLD stats and export them via netlink

2016-06-28 Thread Nikolay Aleksandrov
On 27/06/16 20:10, Nikolay Aleksandrov wrote:
> This patch adds stats support for the currently used IGMP/MLD types by the
> bridge. The stats are per-port (plus one stat per-bridge) and per-direction
> (RX/TX). The stats are exported via netlink via the new linkxstats API
> (RTM_GETSTATS). In order to minimize the performance impact, a new option
> is used to enable/disable the stats - multicast_stats_enabled, similar to
> the recent vlan stats. Also in order to avoid multiple IGMP/MLD type
> lookups and checks, we make use of the current "igmp" member of the bridge
> private skb->cb region to record the type on Rx (both host-generated and
> external packets pass by multicast_rcv()). We can do that since the igmp
> member was used as a boolean and all the valid IGMP/MLD types are positive
> values. The normal bridge fast-path is not affected at all, the only
> affected paths are the flooding ones and since we make use of the IGMP/MLD
> type, we can quickly determine if the packet should be counted using
> cache-hot data (cb's igmp member). We add counters for:
> * IGMP Queries
> * IGMP Leaves
> * IGMP v1/v2/v3 reports
> 
> * MLD Queries
> * MLD Leaves
> * MLD v1/v2 reports
> 
> These are invaluable when monitoring or debugging complex multicast setups
> with bridges.
> 
> Signed-off-by: Nikolay Aleksandrov 
> ---
>  include/uapi/linux/if_bridge.h |  27 +++
>  include/uapi/linux/if_link.h   |   1 +
>  net/bridge/br_device.c |  10 ++-
>  net/bridge/br_forward.c|  13 ++-
>  net/bridge/br_if.c |   9 ++-
>  net/bridge/br_input.c  |   3 +
>  net/bridge/br_multicast.c  | 176 
> +
>  net/bridge/br_netlink.c|  94 --
>  net/bridge/br_private.h|  41 +-
>  net/bridge/br_sysfs_br.c   |  25 ++
>  10 files changed, 356 insertions(+), 43 deletions(-)
> 

Self-NAK. While the patch is okay, Roopa and I have been talking about a
better way to export the stats via the new API. We would like to introduce
a way to expose only per-port stats via a specific RTM_GETSTATS request,
instead of dumping everything via the bridge request, so that the stats can
be fetched for a single device.

I will post v2 after working out the details on how to achieve per-port
export of linkxstats.

Thanks,
 Nik
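
The counting scheme in the quoted description (record the message type once
on receive, then count per direction only when stats are enabled) might
look roughly like the sketch below. The enum values, struct layouts and
function names are hypothetical; in particular the real IGMP/MLD type codes
are not small consecutive integers as they are here.

```c
#include <stdbool.h>
#include <stdint.h>

enum toy_dir { TOY_RX, TOY_TX, TOY_DIR_MAX };

/* 0 means "not an IGMP/MLD packet", as with the boolean it replaces. */
enum toy_type { TOY_NONE = 0, TOY_QUERY, TOY_REPORT, TOY_LEAVE, TOY_TYPE_MAX };

struct toy_port {
	bool stats_enabled;	/* analogue of multicast_stats_enabled */
	uint64_t count[TOY_DIR_MAX][TOY_TYPE_MAX];
};

/* Per-packet scratch area, standing in for the skb control block. */
struct toy_cb {
	uint8_t igmp;		/* type recorded once on receive */
};

/* Count a packet using only the cached type; no header re-parsing. */
static void toy_mcast_count(struct toy_port *p, const struct toy_cb *cb,
			    enum toy_dir dir)
{
	if (!p->stats_enabled)
		return;
	if (cb->igmp == TOY_NONE || cb->igmp >= TOY_TYPE_MAX)
		return;
	p->count[dir][cb->igmp]++;
}
```

Because zero keeps its old meaning, code that only tests the field as a
boolean continues to work unchanged.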





Re: [PATCH] connector: fix out-of-order cn_proc netlink message delivery

2016-06-28 Thread David Miller
From: Aaron Campbell 
Date: Fri, 24 Jun 2016 10:05:32 -0300

> The proc connector messages include a sequence number, allowing userspace
> programs to detect lost messages.  However, performing this detection is
> currently more difficult than necessary, since netlink messages can be
> delivered to the application out-of-order.  To fix this, leave pre-emption
> disabled during cn_netlink_send(), and use GFP_NOWAIT.
> 
> The following was written as a test case.  Building the kernel w/ make -j32
> proved a reliable way to generate out-of-order cn_proc messages.
 ...
> Signed-off-by: Aaron Campbell 

Applied.
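
To illustrate why ordering matters to a consumer of these messages, here is
a sketch of the sequence check a userspace listener might run. It assumes a
single monotonically increasing sequence stream; if the sequence is kept
per CPU, one tracker per CPU would be needed. The names seq_tracker and
seq_check are hypothetical.

```c
#include <stdint.h>

enum seq_state { SEQ_IN_ORDER, SEQ_GAP, SEQ_OUT_OF_ORDER };

struct seq_tracker {
	uint32_t next;	/* sequence number expected next */
	int primed;	/* 0 until the first message has been seen */
};

/* Classify one received sequence number and advance the tracker. */
static enum seq_state seq_check(struct seq_tracker *t, uint32_t seq)
{
	if (!t->primed) {
		t->primed = 1;
		t->next = seq + 1;
		return SEQ_IN_ORDER;
	}
	if (seq == t->next) {
		t->next = seq + 1;
		return SEQ_IN_ORDER;
	}
	/* Signed difference keeps the comparison valid across wraparound. */
	if ((int32_t)(seq - t->next) > 0) {
		t->next = seq + 1;
		return SEQ_GAP;		/* lost, or merely late? */
	}
	return SEQ_OUT_OF_ORDER;	/* arrived after a newer message */
}
```

When delivery can be reordered, SEQ_GAP is ambiguous: the missing message
may be lost or may simply show up later as SEQ_OUT_OF_ORDER. With in-order
delivery a gap always means loss, which is the simplification the patch
description is after.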


Re: [PATCH (net-next.git) 0/3] stmmac: rework and enhance the PCS support

2016-06-28 Thread David Miller
From: Giuseppe Cavallaro 
Date: Fri, 24 Jun 2016 15:16:23 +0200

> The 3.xx and 4.xx synopsys gmacs have a very similar
> PCS embedded module and they share almost the same registers;
> for example:
>   AN_Control, AN_Status, AN_Advertisement, AN_Link_Partner_Ability,
>   AN_Expansion, TBI_Extended_Status.
> 
> Just the RGMII/SMII Control/Status register differs.
> 
> So these patches aim to reorganize and enhance the PCS support;
> to do that, some small inline functions have been provided and
> also some rework to the PCS ISR part has been done.
> 
> In the end, the SGMII for MAC2MAC connection has been introduced.
> 
> All patches have been built on top of net-next git and, as for
> the previous version, not fully tested.

Series applied, thanks.


Re: [PATCH 4/4] mac80211: sta_info: max_peers reached falsely

2016-06-28 Thread Bob Copeland
On Tue, Jun 28, 2016 at 02:13:07PM +0300, Yaniv Machani wrote:
> From: Meirav Kama 
> 
> Issue happened when receiving delete_sta command without
> changing plink_state from ESTAB to HOLDING before.
> When receiving delete_sta command for mesh interface
> verify plink_state is not ESTAB and if so, decrease
> plink count and update beacon.

This should be fixed already (and properly) by the patch
"mac80211: Fix mesh estab links counting" -- please let us
know if you have a case that's still broken with that fix.

-- 
Bob Copeland %% http://bobcopeland.com/


Re: [PATCH v2 00/15] drivers: net: cpsw: improve runtime pm

2016-06-28 Thread David Miller
From: Grygorii Strashko 
Date: Fri, 24 Jun 2016 21:23:40 +0300

> This series intended to improve runtime PM and allow CPSW to be
> RPM suspended when all ethX netdevices are down.

Series applied, thanks.


Re: [PATCH] etherdevice.h & bridge: netfilter: Add and use ether_addr_equal_masked

2016-06-28 Thread David Miller
From: Joe Perches 
Date: Fri, 24 Jun 2016 11:32:26 -0700

> There are code duplications of a masked ethernet address comparison here
> so make it a separate function instead.
> 
> Miscellanea:
> 
> o Neaten alignment of FWINV macro uses to make it clearer for the reader
> 
> Signed-off-by: Joe Perches 

Pablo feel free to take this:

Acked-by: David S. Miller 
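
For reference, a masked Ethernet address comparison of the kind the patch
description talks about can be sketched in plain C as follows. This is a
userspace illustration with a hypothetical name (toy_ether_addr_equal_masked),
not the helper from the patch itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define ETH_ALEN 6	/* bytes in an Ethernet address */

/* True if addr1 and addr2 agree on every bit that is set in mask. */
static bool toy_ether_addr_equal_masked(const uint8_t *addr1,
					const uint8_t *addr2,
					const uint8_t *mask)
{
	for (int i = 0; i < ETH_ALEN; i++) {
		/* XOR exposes differing bits; the mask keeps relevant ones. */
		if ((addr1[i] ^ addr2[i]) & mask[i])
			return false;
	}
	return true;
}
```

A mask of ff:ff:ff:00:00:00, for example, compares only the first three
bytes (the OUI), and an all-zero mask matches any pair of addresses.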


Re: [PATCH] connector: fix out-of-order cn_proc netlink message delivery

2016-06-28 Thread Evgeniy Polyakov
Hi Aaron

24.06.2016, 16:07, "Aaron Campbell" :
> The proc connector messages include a sequence number, allowing userspace
> programs to detect lost messages. However, performing this detection is
> currently more difficult than necessary, since netlink messages can be
> delivered to the application out-of-order. To fix this, leave pre-emption
> disabled during cn_netlink_send(), and use GFP_NOWAIT.
>
> The following was written as a test case. Building the kernel w/ make -j32
> proved a reliable way to generate out-of-order cn_proc messages.

This is not actually about out-of-order sending, which is impossible iirc,
but about the way fork pushes messages into the socket queue in parallel.
What you've done is sync one layer higher.

I'm not against this patch if you think it does fix some issues, but the
wording is not correct imo.


RE: [PATCH 4/4] mac80211: sta_info: max_peers reached falsely

2016-06-28 Thread Machani, Yaniv
On Tue, Jun 28, 2016 at 15:56:21, Bob Copeland wrote:
> 
> On Tue, Jun 28, 2016 at 02:13:07PM +0300, Yaniv Machani wrote:
> > From: Meirav Kama 
> >
> > Issue happened when receiving delete_sta command without changing 
> > plink_state from ESTAB to HOLDING before.
> > When receiving delete_sta command for mesh interface verify 
> > plink_state is not ESTAB and if so, decrease plink count and update 
> > beacon.
> 
> This should be fixed already (and properly) by the patch
> "mac80211: Fix mesh estab links counting" -- please let us know if you 
> have a case that's still broken with that fix.
> 

Thanks Bob,
Will be dropped.

Yaniv
> --
> Bob Copeland %% http://bobcopeland.com/




[iproute PATCH 1/2] ip-address: Support filtering by slave type, too

2016-06-28 Thread Phil Sutter
This patch allows querying all interfaces enslaved to a bridge or bond
using the following syntax:

| ip addr show type bridge_slave

Filtering has to be done in userspace since the kernel does not support
filtering on IFLA_INFO_SLAVE_KIND.

The functionality introduced in this patch is not complete, since it
does not allow matching on type and slave type at the same time, but it
does not prevent implementing a dedicated slave_type match either.

Signed-off-by: Phil Sutter 
---
 ip/ipaddress.c | 52 ++--
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 8766530f7fa7c..56f68eb21c0fc 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -61,6 +61,7 @@ static struct
int group;
int master;
char *kind;
+   char *slave_kind;
 } filter;
 
 static int do_link;
@@ -206,18 +207,27 @@ static void print_linkmode(FILE *f, struct rtattr *tb)
fprintf(f, "mode %s ", link_modes[mode]);
 }
 
-static char *parse_link_kind(struct rtattr *tb)
+static char *parse_link_kind(struct rtattr *tb, bool slave)
 {
struct rtattr *linkinfo[IFLA_INFO_MAX+1];
+   int attr = slave ? IFLA_INFO_SLAVE_KIND : IFLA_INFO_KIND;
 
parse_rtattr_nested(linkinfo, IFLA_INFO_MAX, tb);
 
-   if (linkinfo[IFLA_INFO_KIND])
-   return RTA_DATA(linkinfo[IFLA_INFO_KIND]);
+   if (linkinfo[attr])
+   return RTA_DATA(linkinfo[attr]);
 
return "";
 }
 
+static int match_link_kind(struct rtattr **tb, char *kind, bool slave)
+{
+   if (!tb[IFLA_LINKINFO])
+   return -1;
+
+   return strcmp(parse_link_kind(tb[IFLA_LINKINFO], slave), kind);
+}
+
 static void print_linktype(FILE *fp, struct rtattr *tb)
 {
struct rtattr *linkinfo[IFLA_INFO_MAX+1];
@@ -680,16 +690,11 @@ int print_linkinfo_brief(const struct sockaddr_nl *who,
} else if (filter.master > 0)
return -1;
 
-   if (filter.kind) {
-   if (tb[IFLA_LINKINFO]) {
-   char *kind = parse_link_kind(tb[IFLA_LINKINFO]);
+   if (filter.kind && match_link_kind(tb, filter.kind, 0))
+   return -1;
 
-   if (strcmp(kind, filter.kind))
-   return -1;
-   } else {
-   return -1;
-   }
-   }
+   if (filter.slave_kind && match_link_kind(tb, filter.slave_kind, 1))
+   return -1;
 
if (n->nlmsg_type == RTM_DELLINK)
fprintf(fp, "Deleted ");
@@ -781,16 +786,11 @@ int print_linkinfo(const struct sockaddr_nl *who,
} else if (filter.master > 0)
return -1;
 
-   if (filter.kind) {
-   if (tb[IFLA_LINKINFO]) {
-   char *kind = parse_link_kind(tb[IFLA_LINKINFO]);
+   if (filter.kind && match_link_kind(tb, filter.kind, 0))
+   return -1;
 
-   if (strcmp(kind, filter.kind))
-   return -1;
-   } else {
-   return -1;
-   }
-   }
+   if (filter.slave_kind && match_link_kind(tb, filter.slave_kind, 1))
+   return -1;
 
if (n->nlmsg_type == RTM_DELLINK)
fprintf(fp, "Deleted ");
@@ -1621,8 +1621,16 @@ static int ipaddr_list_flush_or_save(int argc, char **argv, int action)
invarg("Device does not exist\n", *argv);
filter.master = ifindex;
} else if (strcmp(*argv, "type") == 0) {
+   int soff;
+
NEXT_ARG();
-   filter.kind = *argv;
+   soff = strlen(*argv) - strlen("_slave");
+   if (!strcmp(*argv + soff, "_slave")) {
+   (*argv)[soff] = '\0';
+   filter.slave_kind = *argv;
+   } else {
+   filter.kind = *argv;
+   }
} else {
if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
-- 
2.8.2
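
A side note on the "_slave" suffix handling in the last hunk: soff can
become negative when the argument is shorter than "_slave" (for example
"vlan"), in which case *argv + soff points before the start of the string.
A length check before the comparison looks advisable. Below is a minimal
standalone sketch of a guarded suffix split; split_slave_suffix() is a
hypothetical helper for illustration only, not part of iproute2, and
treating a bare "_slave" as "no suffix" is my assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: if type ends in "_slave" and has at least one
 * character before the suffix, cut the suffix off in place and return 1.
 * Otherwise leave type untouched and return 0.
 */
int split_slave_suffix(char *type)
{
	static const char suffix[] = "_slave";
	const size_t slen = sizeof(suffix) - 1;
	size_t len;

	assert(type != NULL);

	len = strlen(type);
	if (len <= slen)
		return 0;	/* too short to carry the suffix */

	if (strcmp(type + len - slen, suffix) != 0)
		return 0;	/* does not end in "_slave" */

	type[len - slen] = '\0';
	return 1;
}
```

With such a helper the caller would store the truncated string in
filter.slave_kind when it returns 1, and the unchanged string in
filter.kind otherwise.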



[iproute PATCH 2/2] ip-address: Align type list in help and man page

2016-06-28 Thread Phil Sutter
This adds missing entries on both sides until they are identical.

Signed-off-by: Phil Sutter 
---
 ip/ipaddress.c   | 6 +++---
 man/man8/ip-address.8.in | 3 +++
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 56f68eb21c0fc..d4d649505e15a 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -95,9 +95,9 @@ static void usage(void)
fprintf(stderr, "LIFETIME := [ valid_lft LFT ] [ preferred_lft LFT ]\n");
fprintf(stderr, "LFT := forever | SECONDS\n");
fprintf(stderr, "TYPE := { vlan | veth | vcan | dummy | ifb | macvlan | macvtap |\n");
-   fprintf(stderr, "  bridge | bond | ipoib | ip6tnl | ipip | sit | vxlan |\n");
-   fprintf(stderr, "  gre | gretap | ip6gre | ip6gretap | vti | nlmon |\n");
-   fprintf(stderr, "  bond_slave | ipvlan | geneve | bridge_slave | vrf }\n");
+   fprintf(stderr, "  bridge | bond | ipoib | ip6tnl | ipip | sit | vxlan | lowpan |\n");
+   fprintf(stderr, "  gre | gretap | ip6gre | ip6gretap | vti | nlmon | can |\n");
+   fprintf(stderr, "  bond_slave | ipvlan | geneve | bridge_slave | vrf | hsr}\n");
 
exit(-1);
 }
diff --git a/man/man8/ip-address.8.in b/man/man8/ip-address.8.in
index 8d34adb336af4..3cbe4181f7e36 100644
--- a/man/man8/ip-address.8.in
+++ b/man/man8/ip-address.8.in
@@ -98,7 +98,9 @@ ip-address \- protocol address management
 .ti -8
 .IR TYPE " := [ "
 .BR bridge " | "
+.BR bridge_slave " |"
 .BR bond " | "
+.BR bond_slave " |"
 .BR can " | "
 .BR dummy " | "
 .BR hsr " | "
@@ -118,6 +120,7 @@ ip-address \- protocol address management
 .BR ip6gre " |"
 .BR ip6gretap " |"
 .BR vti " |"
+.BR vrf " |"
 .BR nlmon " |"
 .BR ipvlan " |"
 .BR lowpan " |"
-- 
2.8.2



[iproute PATCH 0/2] ip-address: fix type list inconsistencies

2016-06-28 Thread Phil Sutter
The basic problem was differences in type list reported by help output
and stated in man page.

I decided to tackle the problem from both sides:
a) Make sure 'ip addr show' supports matching on all types reported.
b) Add missing types in either list.

This is still rather best-effort, actually things are quite messed up:
- Lists are not sorted, it's easy to miss something.
- The type list is duplicated four times as ip-link help and man page
  contain it, too.
- The kernel supports more types than listed here.
- We can't add but match on all types the kernel supports.

Phil Sutter (2):
  ip-address: Support filtering by slave type, too
  ip-address: Align type list in help and man page

 ip/ipaddress.c   | 58 +++-
 man/man8/ip-address.8.in |  3 +++
 2 files changed, 36 insertions(+), 25 deletions(-)

-- 
2.8.2



Re: [PATCH net] sock_diag: do not broadcast raw socket destruction

2016-06-28 Thread David Miller
From: Willem de Bruijn 
Date: Fri, 24 Jun 2016 16:02:35 -0400

> From: Willem de Bruijn 
> 
> Diag intends to broadcast tcp_sk and udp_sk socket destruction.
> Testing sk->sk_protocol for IPPROTO_TCP/IPPROTO_UDP alone is not
> sufficient for this. Raw sockets can have the same type.
> 
> Add a test for sk->sk_type.
> 
> Fixes: eb4cb008529c ("sock_diag: define destruction multicast groups")
> Signed-off-by: Willem de Bruijn 

Applied and queued up for -stable, thanks.
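
As an illustration of the problem being fixed: a raw socket can be opened
with IPPROTO_TCP or IPPROTO_UDP as its protocol, so the protocol alone does
not identify a TCP or UDP socket. The userspace sketch below shows the idea
of testing the socket type as well. It is not the kernel code;
wants_destroy_broadcast() is a hypothetical name and the exact pairing of
type and protocol is my assumption about what the added test amounts to.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * Hypothetical predicate: return 1 only for genuine TCP or UDP sockets,
 * i.e. when both the socket type and the protocol match.
 */
int wants_destroy_broadcast(int sk_type, int sk_protocol)
{
	assert(sk_type > 0);	/* socket types are small positive values */

	switch (sk_protocol) {
	case IPPROTO_TCP:
		return sk_type == SOCK_STREAM;
	case IPPROTO_UDP:
		return sk_type == SOCK_DGRAM;
	default:
		return 0;
	}
}
```

A socket created with socket(AF_INET, SOCK_RAW, IPPROTO_TCP) passes a
protocol-only test but is rejected by the combined one.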


Re: [PATCH 1/2] net: ethernet: hix5hd2: use phydev from struct net_device

2016-06-28 Thread David Miller
From: Philippe Reynes 
Date: Sat, 25 Jun 2016 16:55:12 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the pointer
> phy in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH 1/2] net: ethernet: sxgbe: use phydev from struct net_device

2016-06-28 Thread David Miller
From: Philippe Reynes 
Date: Sat, 25 Jun 2016 22:05:26 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH 1/2] net: ethernet: r6040: use phydev from struct net_device

2016-06-28 Thread David Miller
From: Philippe Reynes 
Date: Sat, 25 Jun 2016 21:09:01 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH 2/2] net: ethernet: hix5hd2: use phy_ethtool_{get|set}_link_ksettings

2016-06-28 Thread David Miller
From: Philippe Reynes 
Date: Sat, 25 Jun 2016 16:55:13 +0200

> There are two generic functions, phy_ethtool_{get|set}_link_ksettings,
> so we can use them instead of defining the same code in the driver.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH 2/2] net: ethernet: r6040: use phy_ethtool_{get|set}_link_ksettings

2016-06-28 Thread David Miller
From: Philippe Reynes 
Date: Sat, 25 Jun 2016 21:09:02 +0200

> There are two generic functions, phy_ethtool_{get|set}_link_ksettings,
> so we can use them instead of defining the same code in the driver.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [ovs-dev] [PATCH net] openvswitch: fix conntrack netlink event delivery

2016-06-28 Thread Joe Stringer
On 28 June 2016 at 14:12, Samuel Gauthier  wrote:
> Only the first and last netlink message for a particular conntrack are
> actually sent. The first message is sent through nf_conntrack_confirm when
> the conntrack is committed. The last one is sent when the conntrack is
> destroyed on timeout. The other conntrack state change messages are not
> advertised.
>
> When the conntrack subsystem is used from netfilter, nf_conntrack_confirm
> is called for each packet, from the postrouting hook, which in turn calls
> nf_ct_deliver_cached_events to send the state change netlink messages.
>
> This commit fixes the problem by calling nf_conntrack_confirm all the time,
> i.e. not only in the commit case.
>
> Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
> CC: Joe Stringer 
> CC: Justin Pettit 
> CC: Andy Zhou 
> CC: Thomas Graf 
> Signed-off-by: Samuel Gauthier 

This breaks the semantics of OVS_CT_ATTR_COMMIT. If you just want to
ensure that nf_ct_deliver_cached_events() is run, then we should call
to that for confirmed connections in the non-commit case.


Re: [PATCH 2/2] net: ethernet: sxgbe: use phy_ethtool_{get|set}_link_ksettings

2016-06-28 Thread David Miller
From: Philippe Reynes 
Date: Sat, 25 Jun 2016 22:05:27 +0200

> There are two generic functions, phy_ethtool_{get|set}_link_ksettings,
> so we can use them instead of defining the same code in the driver.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH 1/2] net: ethernet: dwc_eth_qos: use phydev from struct net_device

2016-06-28 Thread David Miller
From: Philippe Reynes 
Date: Sat, 25 Jun 2016 23:05:15 +0200

> The private structure contains a pointer to phydev, but the structure
> net_device already contains such a pointer. So we can remove the pointer
> phydev in the private structure, and update the driver to use the
> one contained in struct net_device.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: [PATCH net] Bridge: Fix ipv6 mc snooping if bridge has no ipv6 address

2016-06-28 Thread Linus Lüssing
On Tue, Jun 28, 2016 at 08:04:42AM -0400, David Miller wrote:
> From: Linus Lüssing 
> [...]
> > Fixes: 1d81d4c3dd88 ("bridge: check return value of ipv6_dev_get_saddr()")
> 
> You're missing an initial 'd' in that SHA1-ID.
> 
> With that fixed, applied and queued up for -stable.

Sorry :(. Thanks for taking care of it!


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread George Spelvin
Herbert Xu wrote:
> I'm currently working on cts and I'm removing the stack usage
> altogether by having it operate on the src/dst SG lists only.

Wow, I should see how you do that.  I couldn't get it below 3
blocks of temporary, and the dst SG list only gives you
one and a half.

> BTW, the only cts user in our tree appears to be implementing
> CTS all over again and is only calling the crypto API cts for
> the last two blocks.  Someone should fix that.

Hint taken.  Although I'm having a hard time finding that only user
amidst all the drivers thinking it means Clear To Send or (for HDMI)
Cycle Time Stamp.

Um...the uses in fs/crypto/keyinfo.c and fs/ext4/crypto_key.c
don't seem to do anything untoward.

Is net/sunrpc/auth_gss/gss_krb5_mech.c doing something odd?


I have a request of you: like Andy, I find the crypto layer an
impenetrable thicket of wrapper structures.  I'm not suggesting there
aren't reasons for it, but it's extremely hard to infer those reasons by
looking at the code.  If I were to draft a (hilariously wrong) overview
document, would you be willing to edit it into correctness?


Re: [PATCH 2/2] net: ethernet: dwc_eth_qos: use phy_ethtool_{get|set}_link_ksettings

2016-06-28 Thread David Miller
From: Philippe Reynes 
Date: Sat, 25 Jun 2016 23:05:16 +0200

> There are two generic functions, phy_ethtool_{get|set}_link_ksettings,
> so we can use them instead of defining the same code in the driver.
> 
> Signed-off-by: Philippe Reynes 

Applied.


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread Herbert Xu
On Tue, Jun 28, 2016 at 09:23:01AM -0400, George Spelvin wrote:
> 
> Wow, I should see how you do that.  I couldn't get it below 3
> blocks of temporary, and the dst SG list only gives you
> one and a half.

I don't mean that I'm using no temporary buffers at all, just
that the actual crypto only operates on the SG lists.  I'm still
doing the xoring and stitching in temp buffers.  I just counted
and I'm using three blocks like you.

> Is net/sunrpc/auth_gss/gss_krb5_mech.c doing something odd?

Yes gss_krb5_crypto.c is the one.

> I have a request of you: like Andy, I find the crypto layer an
> impenetrable thicket of wrapper structures.  I'm not suggesting there
> aren't reasons for it, but it's extremely hard to infer those reasons by
> looking at the code.  If I were to draft a (hilariously wrong) overview
> document, would you be willing to edit it into correctness?

We have actually gained quite a bit of documentation recently.
Have you looked at Documentation/DocBook/crypto-API.tmpl?

More is always welcome of course.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


[PATCH v3] fib_rules: Added NLM_F_EXCL support to fib_nl_newrule

2016-06-28 Thread Mateusz Bajorski
When adding a rule with the NLM_F_EXCL flag, check whether the same rule
already exists. If it does, exit with -EEXIST.

This is already implemented in iproute2:
if (cmd == RTM_NEWRULE) {
req.n.nlmsg_flags |= NLM_F_CREATE|NLM_F_EXCL;
req.r.rtm_type = RTN_UNICAST;
}

Tested ipv4 and ipv6 with net-next linux on qemu x86

expected behavior after patch:
localhost ~ # ip rule
0:from all lookup local
32766:from all lookup main
32767:from all lookup default
localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
localhost ~ # ip rule add from 10.46.177.97 lookup 104 pref 1005
RTNETLINK answers: File exists
localhost ~ # ip rule
0:from all lookup local
1005:from 10.46.177.97 lookup 104
32766:from all lookup main
32767:from all lookup default

There was already a thread about this, but I don't see any changes
merged and the problem still occurs.
https://lkml.kernel.org/r/1135778809.5944.7.camel+%28%29+localhost+%21+localdomain

Signed-off-by: Mateusz Bajorski 
---
Changes in v2: section moved to new place where new rule is already built
Changes in v3: compare moved to helper, added fr_net compare

 net/core/fib_rules.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index 98298b1..fa0c3ff 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -269,6 +269,46 @@ errout:
return err;
 }
 
+static int rule_exists(struct fib_rules_ops *ops, struct fib_rule_hdr *frh,
+  struct nlattr **tb, struct fib_rule *rule)
+{
+   struct fib_rule *r;
+
+   list_for_each_entry(r, &ops->rules_list, list) {
+   if (r->action != rule->action)
+   continue;
+
+   if (r->table != rule->table)
+   continue;
+
+   if (r->pref != rule->pref)
+   continue;
+
+   if (memcmp(r->iifname, rule->iifname, IFNAMSIZ))
+   continue;
+
+   if (memcmp(r->oifname, rule->oifname, IFNAMSIZ))
+   continue;
+
+   if (r->mark != rule->mark)
+   continue;
+
+   if (r->mark_mask != rule->mark_mask)
+   continue;
+
+   if (r->tun_id != rule->tun_id)
+   continue;
+
+   if (r->fr_net != rule->fr_net)
+   continue;
+
+   if (!ops->compare(r, frh, tb))
+   continue;
+   return 1;
+   }
+   return 0;
+}
+
 int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh)
 {
struct net *net = sock_net(skb->sk);
@@ -386,6 +426,12 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh)
if (rule->l3mdev && rule->table)
goto errout_free;
 
+   if ((nlh->nlmsg_flags & NLM_F_EXCL) &&
+   rule_exists(ops, frh, tb, rule)) {
+   err = -EEXIST;
+   goto errout_free;
+   }
+
err = ops->configure(rule, skb, frh, tb);
if (err < 0)
goto errout_free;
-- 
2.6.4



Re: [PATCH net-next 10/16] net/mlx5e: Add devlink based SRIOV mode changes (legacy --> offloads)

2016-06-28 Thread Andy Gospodarek
On Mon, Jun 27, 2016 at 07:07:23PM +0300, Saeed Mahameed wrote:
> From: Or Gerlitz 
> 
> Implement handlers for the devlink commands to get and set the SRIOV
> E-Switch mode.
> 
> When turning to the offloads mode, we disable the e-switch and enable
> it again in the new mode, create the NIC offloads table and create VF reps.
> 
> When turning to legacy mode, we remove the VF reps and the offloads
> table, and re-initiate the e-switch in its legacy mode.
> 
> The actual creation/removal of the VF reps is done in downstream patches.
> 
> Signed-off-by: Or Gerlitz 
> Signed-off-by: Saeed Mahameed 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  12 ++-
>  .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 102 -
>  2 files changed, 105 insertions(+), 9 deletions(-)
> 
[...]
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index 3b3afbd..a39af6b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
[...]
>  int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
>  {
> - return -EOPNOTSUPP;
> + struct mlx5_core_dev *dev;
> + u16 cur_mode;
> +
> + dev = devlink_priv(devlink);
> +
> + if (!MLX5_CAP_GEN(dev, vport_group_manager))
> + return -EOPNOTSUPP;
> +
> + cur_mode = dev->priv.eswitch->mode;
> +
> + if (cur_mode == SRIOV_NONE || mode == SRIOV_NONE)
> + return -EOPNOTSUPP;
> +
> + if (cur_mode == mode)
> + return 0;
> +
> + if (mode == SRIOV_OFFLOADS) /* current mode is legacy */
> + return esw_offloads_start(dev->priv.eswitch);
> + else if (mode == SRIOV_LEGACY) /* current mode is offloads */
> + return esw_offloads_stop(dev->priv.eswitch);
> + else
> + return -EINVAL;
>  }
>  
>  int mlx5_devlink_eswitch_mode_get(struct devlink *devlink, u16 *mode)
>  {
> - return -EOPNOTSUPP;
> + struct mlx5_core_dev *dev;
> +
> + dev = devlink_priv(devlink);
> +
> + if (!MLX5_CAP_GEN(dev, vport_group_manager))
> + return -EOPNOTSUPP;
> +
> + if (dev->priv.eswitch->mode == SRIOV_NONE)
> + return -EOPNOTSUPP;
> +
> + *mode = dev->priv.eswitch->mode;
> +
> + return 0;
>  }

This is an _extremely_ minor nit, but I only bring it up since you are
leading the way here and your model may be one that other people
follow...

Internally you have an enum to track the SRIOV modes:

enum {
   SRIOV_NONE,
   SRIOV_LEGACY,
   SRIOV_OFFLOADS
};

But patch 8 adds a new enum for devlink to track this as well.

enum devlink_eswitch_mode {
   DEVLINK_ESWITCH_MODE_NONE,
   DEVLINK_ESWITCH_MODE_LEGACY,
   DEVLINK_ESWITCH_MODE_OFFLOADS,
};

Would it make sense at some point to use the devlink modes in the driver
so it's less to track?

Again, this is an extremely _minor_ concern.  The rest of the set looks
great and I like the architectural decisions made here.  Awesome work
all around!



Re: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-28 Thread David Miller
From: Dexuan Cui 
Date: Tue, 28 Jun 2016 09:59:21 +

> The idea here is: IMO the syscalls sys_read()/write() shouldn't return
> -ENOMEM, so I have to make sure the buffer allocation succeeds?

You have to fail if resources cannot be allocated.


[PATCH] net: can: Introduce MEN 16Z192-00 CAN controller driver

2016-06-28 Thread Andreas Werner
This CAN Controller is found on MEN Chameleon FPGAs.

The driver/device supports the CAN2.0 specification.
There are 255 RX and 255 TX buffers within the IP. The
pointers for the buffers are handled by HW to make the
access from within the driver as simple as possible.

The driver also supports parameters to configure the
buffer level interrupt for RX/TX as well as an RX timeout
interrupt.

With these configuration options, the driver/device
provides flexibility for different types of use cases.

Signed-off-by: Andreas Werner 
---
 drivers/net/can/Kconfig|  10 +
 drivers/net/can/Makefile   |   1 +
 drivers/net/can/men_z192_can.c | 990 +
 3 files changed, 1001 insertions(+)
 create mode 100644 drivers/net/can/men_z192_can.c

diff --git a/drivers/net/can/Kconfig b/drivers/net/can/Kconfig
index 0d40aef..0fa0387 100644
--- a/drivers/net/can/Kconfig
+++ b/drivers/net/can/Kconfig
@@ -104,6 +104,16 @@ config CAN_JANZ_ICAN3
  This driver can also be built as a module. If so, the module will be
  called janz-ican3.ko.
 
+config CAN_MEN_Z192
+   tristate "MEN 16Z192-00 CAN Controller"
+   depends on MCB
+   ---help---
+ Driver for MEN 16Z192-00 CAN Controller IP-Core, which
+ is connected to the MEN Chameleon Bus.
+
+ This driver can also be built as a module. If so, the module will be
+ called men_z192_can.ko.
+
 config CAN_RCAR
tristate "Renesas R-Car CAN controller"
depends on ARCH_RENESAS || ARM
diff --git a/drivers/net/can/Makefile b/drivers/net/can/Makefile
index e3db0c8..eb206b3 100644
--- a/drivers/net/can/Makefile
+++ b/drivers/net/can/Makefile
@@ -22,6 +22,7 @@ obj-$(CONFIG_CAN_FLEXCAN) += flexcan.o
 obj-$(CONFIG_CAN_GRCAN)	+= grcan.o
 obj-$(CONFIG_CAN_IFI_CANFD)	+= ifi_canfd/
 obj-$(CONFIG_CAN_JANZ_ICAN3)	+= janz-ican3.o
+obj-$(CONFIG_CAN_MEN_Z192)	+= men_z192_can.o
 obj-$(CONFIG_CAN_MSCAN)	+= mscan/
 obj-$(CONFIG_CAN_M_CAN)	+= m_can/
 obj-$(CONFIG_CAN_RCAR)	+= rcar_can.o
diff --git a/drivers/net/can/men_z192_can.c b/drivers/net/can/men_z192_can.c
new file mode 100644
index 000..d3acc2e
--- /dev/null
+++ b/drivers/net/can/men_z192_can.c
@@ -0,0 +1,990 @@
+/*
+ * MEN 16Z192 CAN Controller driver
+ *
+ * Copyright (C) 2016 MEN Mikroelektronik GmbH (www.men.de)
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; version 2 of the License.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DRV_NAME   "z192_can"
+
+#define MEN_Z192_NAPI_WEIGHT   64
+#define MEN_Z192_MODE_TOUT_US  40
+
+/* CTL/BTR Register Bits */
+#define MEN_Z192_CTL0_INITRQ   BIT(0)
+#define MEN_Z192_CTL0_SLPRQ    BIT(1)
+#define MEN_Z192_CTL1_INITAK   BIT(8)
+#define MEN_Z192_CTL1_SLPAK    BIT(9)
+#define MEN_Z192_CTL1_LISTEN   BIT(12)
+#define MEN_Z192_CTL1_LOOPB    BIT(13)
+#define MEN_Z192_CTL1_CANE     BIT(15)
+#define MEN_Z192_BTR0_BRP(x)   (((x) & 0x3f) << 16)
+#define MEN_Z192_BTR0_SJW(x)   (((x) & 0x03) << 22)
+#define MEN_Z192_BTR1_TSEG1(x) (((x) & 0x0f) << 24)
+#define MEN_Z192_BTR1_TSEG2(x) (((x) & 0x07) << 28)
+#define MEN_Z192_BTR1_SAMP     BIT(31)
+
+/* IER Interrupt Enable Register bits */
+#define MEN_Z192_RXIE  BIT(0)
+#define MEN_Z192_OVRIE BIT(1)
+#define MEN_Z192_CSCIE BIT(6)
+#define MEN_Z192_TOUTE BIT(7)
+#define MEN_Z192_TXIE  BIT(16)
+#define MEN_Z192_ERRIE BIT(17)
+
+#define MEN_Z192_IRQ_ALL   \
+   (MEN_Z192_RXIE | MEN_Z192_OVRIE |   \
+MEN_Z192_CSCIE | MEN_Z192_TOUTE |  \
+MEN_Z192_TXIE)
+
+#define MEN_Z192_IRQ_NAPI  (MEN_Z192_RXIE | MEN_Z192_TOUTE)
+
+/* RX_TX_STAT RX/TX Status status register bits */
+#define MEN_Z192_RX_BUF_CNT(x) ((x) & 0xff)
+#define MEN_Z192_TX_BUF_CNT(x) (((x) & 0xff00) >> 8)
+#define MEN_Z192_RFLG_RXIF     BIT(16)
+#define MEN_Z192_RFLG_OVRF     BIT(17)
+#define MEN_Z192_RFLG_TSTATE   GENMASK(19, 18)
+#define MEN_Z192_RFLG_RSTATE   GENMASK(21, 20)
+#define MEN_Z192_RFLG_CSCIF    BIT(22)
+#define MEN_Z192_RFLG_TOUTF    BIT(23)
+#define MEN_Z192_TFLG_TXIF     BIT(24)
+
+#define MEN_Z192_GET_TSTATE(x) (((x) & MEN_Z192_RFLG_TSTATE) >> 18)
+#define MEN_Z192_GET_RSTATE(x) (((x) & MEN_Z192_RFLG_RSTATE) >> 20)
+
+#define MEN_Z192_IRQ_FLAGS_ALL \
+   (MEN_Z192_RFLG_RXIF | MEN_Z192_RFLG_OVRF |  \
+MEN_Z192_RFLG_TSTATE | MEN_Z192_RFLG_RSTATE |  \
+MEN_Z192_RFLG_CSCIF | MEN_Z192_RFLG_TOUTF |\
+MEN_Z192_TFLG_TXIF)
+
+/* RX/TX Error counter bits */
+#define MEN_Z192_GET_RX_ERR_CNT(x) ((x) & 0xff)
+#define MEN_Z192_GET_TX_ERR_CNT

Re: [PATCH 3/4] mac80211: mesh: fixed HT ies in beacon template

2016-06-28 Thread Bob Copeland
On Tue, Jun 28, 2016 at 02:13:06PM +0300, Yaniv Machani wrote:
> From: Meirav Kama 
> 
> There are several values in HT info elements of mesh beacon (built by the
> mac80211) that are incorrect.

Would be good to enumerate the problems here.

> To fix them:
> 1. mac80211 will check configuration from cfg and will build accordingly.
> 2. changes made in mesh default values.

What is wrong with the defaults?

>   sband = local->hw.wiphy->bands[band];
>   if (!sband->ht_cap.ht_supported ||
> @@ -431,11 +433,40 @@ int mesh_add_ht_cap_ie(struct ieee80211_sub_if_data *sdata,
>   sdata->vif.bss_conf.chandef.width == NL80211_CHAN_WIDTH_10)
>   return 0;
>  
> +/* determine capability flags */
> + cap = sband->ht_cap.cap;

There is some weird whitespace here (space instead of tabs for the
comment).

-- 
Bob Copeland %% http://bobcopeland.com/


Re: [PATCH net-next 0/6] net: dsa: Platform data for dsa2.c

2016-06-28 Thread Andrew Lunn
On Mon, Jun 27, 2016 at 06:19:28PM -0700, Florian Fainelli wrote:
> 2016-06-27 18:05 GMT-07:00 Andrew Lunn :
> > On Mon, Jun 27, 2016 at 05:52:37PM -0700, Florian Fainelli wrote:
> >> Hi all,
> >>
> >> This patch series adds support for platform data using the new code from
> >> net/dsa/dsa2.c. The motivation behind this is that we have a bit of in tree
> >> platforms (ar7, bcm47xx, x86, others) that could be benefiting from the new
> >> dsa_register_switch() API model but do not support Device Tree, nor is
> >> there a plan to bring Device Tree to these platforms (time vs. benefits).
> >
> > Hi Florian
> >
> > Please could you convert an in tree device to actually use this.
> 
> Sure, I don't think there are going to be in tree users who need the
> dsa2_port_link information most of what we have is typically single
> chip, and so in that case, we can even re-use the existing
> dsa_platform_data.

O.K.

Less is better. When going through the old code for the earlier
re-structuring proposals, the code is not simple at times. I would
prefer not adding more complexity to dsa2 than really is needed.

   Andrew


Re: [PATCH] rtlwifi: Create _rtl_dbg_trace function to reduce RT_TRACE code size

2016-06-28 Thread Larry Finger

On 06/27/2016 10:55 PM, Joe Perches wrote:
> On Mon, 2016-06-27 at 19:53 -0500, Larry Finger wrote:
> > On 06/25/2016 05:46 PM, Joe Perches wrote:
> > > This debugging macro can expand to a lot of code.
> > > Make it a function to reduce code size.
> > >
> > > (x86-64 defconfig w/ all rtlwifi drivers and allyesconfig)
> > > $ size drivers/net/wireless/realtek/rtlwifi/built-in.o*
> > >    text    data     bss     dec     hex filename
> > >  900083  200499    1907 1102489  10d299 drivers/net/wireless/realtek/rtlwifi/built-in.o.defconfig.new
> > > 1113597  200499    1907 1316003  1414a3 drivers/net/wireless/realtek/rtlwifi/built-in.o.defconfig.old
> > > 1746879  453503    8512 2208894  21b47e drivers/net/wireless/realtek/rtlwifi/built-in.o.new
> > > 2051965  503311    8512 2563788  271ecc drivers/net/wireless/realtek/rtlwifi/built-in.o.old
> > >
> > > Signed-off-by: Joe Perches 
> >
> > I acked this before; however there is a bug that breaks the build if
> > CONFIG_RTLWIFI_DEBUG is not defined. The rest of the code calls
> > _rtl_dbg_trace(), but that symbol is never defined. The problem can be
> > fixed in debug.c or debug.h.
>
> Confused a bit.  What breaks again?

Nothing breaks and your patch is OK. I had ported it to a GitHub repo of these
drivers, which had a different debug.h. That led to the missing global when
CONFIG_RTLWIFI_DEBUG was not defined. That has now been fixed.

Sorry for the confusion.

Larry
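
For readers wondering why this change shrinks the text size: with a macro,
the level check and the printing code are expanded at every call site, while
with a function each call site only sets up arguments and makes a call. Below
is a minimal userspace sketch of that pattern. It is not the rtlwifi code;
dbg_trace(), DBG_TRACE(), dbg_level and the level constants are hypothetical
stand-ins.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debug levels; lower values are more important. */
enum { DBG_EMERG = 0, DBG_WARNING = 2, DBG_LOUD = 4 };

/* Messages with a level above this threshold are dropped. */
static int dbg_level = DBG_WARNING;

/*
 * The filter and the formatting live here, once, instead of being
 * expanded at every call site. Returns the number of characters
 * printed, or 0 when the message was filtered out.
 */
int dbg_trace(int level, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(fmt != NULL);

	if (level > dbg_level)
		return 0;

	va_start(args, fmt);
	n = vfprintf(stderr, fmt, args);
	va_end(args);

	return n;
}

/* The macro is now only a thin wrapper around the function call. */
#define DBG_TRACE(level, ...) dbg_trace(level, __VA_ARGS__)
```

Existing call sites keep using the macro name unchanged, which is what makes
this kind of conversion cheap to apply across a large driver.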




Re: [PATCH net-next 10/16] net/mlx5e: Add devlink based SRIOV mode changes (legacy --> offloads)

2016-06-28 Thread Or Gerlitz

On 6/28/2016 4:42 PM, Andy Gospodarek wrote:
> On Mon, Jun 27, 2016 at 07:07:23PM +0300, Saeed Mahameed wrote:
> > From: Or Gerlitz 
> >
> > Implement handlers for the devlink commands to get and set the SRIOV
> > E-Switch mode.
> >
> > When turning to the offloads mode, we disable the e-switch and enable
> > it again in the new mode, create the NIC offloads table and create VF reps.
> >
> > When turning to legacy mode, we remove the VF reps and the offloads
> > table, and re-initiate the e-switch in its legacy mode.
> >
> > The actual creation/removal of the VF reps is done in downstream patches.
> >
> > Signed-off-by: Or Gerlitz 
> > Signed-off-by: Saeed Mahameed 
> > ---
> >  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |  12 ++-
> >  .../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 102 -
> >  2 files changed, 105 insertions(+), 9 deletions(-)
> >
> [...]
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> > index 3b3afbd..a39af6b 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> [...]
> >  int mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode)
> >  {
> > -   return -EOPNOTSUPP;
> > +   struct mlx5_core_dev *dev;
> > +   u16 cur_mode;
> > +
> > +   dev = devlink_priv(devlink);
> > +
> > +   if (!MLX5_CAP_GEN(dev, vport_group_manager))
> > +   return -EOPNOTSUPP;
> > +
> > +   cur_mode = dev->priv.eswitch->mode;
> > +
> > +   if (cur_mode == SRIOV_NONE || mode == SRIOV_NONE)
> > +   return -EOPNOTSUPP;
> > +
> > +   if (cur_mode == mode)
> > +   return 0;
> > +
> > +   if (mode == SRIOV_OFFLOADS) /* current mode is legacy */
> > +   return esw_offloads_start(dev->priv.eswitch);
> > +   else if (mode == SRIOV_LEGACY) /* current mode is offloads */
> > +   return esw_offloads_stop(dev->priv.eswitch);
> > +   else
> > +   return -EINVAL;
> >  }
> >
>
> This is an _extremely_ minor nit, but I only bring it up since you are
> leading the way here and your model may be one that other people
> follow...
>
> Internally you have an enum to track the SRIOV modes:
>
> enum {
>    SRIOV_NONE,
>    SRIOV_LEGACY,
>    SRIOV_OFFLOADS
> };
>
> But patch 8 adds a new enum for devlink to track this as well.
>
> enum devlink_eswitch_mode {
>    DEVLINK_ESWITCH_MODE_NONE,
>    DEVLINK_ESWITCH_MODE_LEGACY,
>    DEVLINK_ESWITCH_MODE_OFFLOADS,
> };

Andy,

In mlx5 we have an eswitch driver instance even when not in SRIOV mode. In
that case the mlx5 eswitch mode is called SRIOV_NONE, which is maybe not a
very successful name; I'll look into that.

On the devlink/system level, the eswitch modes are relevant only for SRIOV.
You can see in the mlx5 set function that we return an error when in the
none mode or when asked to go there.

So... with your comment, I realize now that I forgot to remove the
DEVLINK_ESWITCH_MODE_NONE value from the submission.

> Would it make sense at some point to use the devlink modes in the driver
> so it's less to track?

This makes it a bit problematic for mlx5 to use the
DEVLINK_ESWITCH_MODE_YYY values internally.



> Again, this is an extremely _minor_ concern.  The rest of the set looks
> great and I like the architectural decisions made here.  Awesome work
> all around!
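
One way to picture the relationship described above is a small translation
layer between the driver-internal modes and the devlink ones, where the
driver-only "none" state has no devlink counterpart. The sketch below is
hypothetical: the enum values mirror the ones quoted in this thread (without
DEVLINK_ESWITCH_MODE_NONE, which is to be dropped), while
esw_mode_to_devlink() and esw_mode_from_devlink() are illustrative names and
not functions from the posted series.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Driver-internal modes, as quoted in the thread. */
enum { SRIOV_NONE, SRIOV_LEGACY, SRIOV_OFFLOADS };

/* devlink modes, assuming the NONE value has been removed. */
enum devlink_eswitch_mode {
	DEVLINK_ESWITCH_MODE_LEGACY,
	DEVLINK_ESWITCH_MODE_OFFLOADS,
};

/* Hypothetical helper: driver mode -> devlink mode. */
int esw_mode_to_devlink(int esw_mode, enum devlink_eswitch_mode *out)
{
	assert(out != NULL);

	switch (esw_mode) {
	case SRIOV_LEGACY:
		*out = DEVLINK_ESWITCH_MODE_LEGACY;
		return 0;
	case SRIOV_OFFLOADS:
		*out = DEVLINK_ESWITCH_MODE_OFFLOADS;
		return 0;
	default:
		/* SRIOV_NONE has no devlink equivalent. */
		return -EOPNOTSUPP;
	}
}

/* Hypothetical helper: devlink mode -> driver mode. */
int esw_mode_from_devlink(enum devlink_eswitch_mode mode, int *out)
{
	assert(out != NULL);

	switch (mode) {
	case DEVLINK_ESWITCH_MODE_LEGACY:
		*out = SRIOV_LEGACY;
		return 0;
	case DEVLINK_ESWITCH_MODE_OFFLOADS:
		*out = SRIOV_OFFLOADS;
		return 0;
	default:
		return -EINVAL;
	}
}
```

Keeping the two enums separate costs a pair of helpers like these, but lets
the driver keep a state that devlink never needs to expose.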


Re: Doing crypto in small stack buffers (bluetooth vs vmalloc-stack crash, etc)

2016-06-28 Thread George Spelvin
> We have actually gained quite a bit of documentation recently.
> Have you looked at Documentation/DocBook/crypto-API.tmpl?
> 
> More is always welcome of course.

It's improved since I last looked at it, but there are still many structures
that aren't described:

- struct crypto_instance
- struct crypto_spawn
- struct crypto_blkcipher
- struct blkcipher_desc
- More on the context structures returned by crypto_tfm_ctx

Also not mentioned in the documentation is that some algorithms *do*
have different implementations depending on key size.  SHA-2 is the
classic example.


Re: [Patch net] mlx4: set csum_complete_sw bit when fixing complete csum

2016-06-28 Thread Or Gerlitz
On Tue, Jun 28, 2016 at 12:44 AM, Cong Wang  wrote:
> On Mon, Jun 27, 2016 at 2:08 PM, Or Gerlitz  wrote:
>> On Mon, Jun 27, 2016 at 9:22 PM, Cong Wang  wrote:

>> can you point/paste the exact warning and how to reproduce that? is
>> that as simple as running ping and/or ping6?

> Yes, ping is enough to reproduce it every time.

> The warning is below:
> [ 8693.680997] eth0: hw csum failure

Wow, can you please report to us the exact card type (lspci -nn | grep -i
mellanox) and firmware version (ethtool -i $DEV)?


[RFC] WireGuard: next generation secure network tunnel

2016-06-28 Thread Jason A. Donenfeld
Hi Dave & Folks,

Today I'm releasing WireGuard, an encrypted and authenticated
tunneling virtual interface for the kernel. It uses next-generation
cryptography and is designed to be both easy to use and simple to
implement (only ~4000 LoC, which compared to xfrm or openvpn is
spectacular), avoiding the enormous complexities of all other secure
tunneling tools. It's been a long road, but after considerable
research, experiments, cryptographic review, and implementing, I think
I'm at a point where I feel comfortable releasing this and asking for
your feedback. This isn't yet a patch series, however. There's still
some work to be done, I anticipate, before this is mergeable. But what
we have now is a good basis for discussion and talking about what
needs to be done for this to be a proper patch series.

You may visit the main info site about WireGuard at
https://www.wireguard.io and you can read the whitepaper and full
technical description and argumentation at
https://www.wireguard.io/papers/wireguard.pdf . The source code lives
at https://git.zx2c4.com/WireGuard/tree/src/ and you can read
instructions on building it in the install and quickstart sections of
the website. I'm not going to recapitulate all of the paper here, but
I will discuss the things that are most relevant to kernel
development.

WireGuard acts as a virtual interface, doing layer 3 IP tunneling,
addable with "ip link add dev wg0 type wireguard". You can set the
interface's local IP and routes using the usual ip-address and
ip-route tools. The WireGuard-specific elements are in a new tool
called `wg`, which will at some point be merged into the usual ip
tools. With `wg` you can set the device's private key, and give it a
list of associations between peers' public keys, their allowed IP
addresses, and their remote UDP endpoints. When a locally generated
packet hits the device, it looks at the dst IP, looks up this dst IP
in the aforementioned association table, and then encrypts it using
the proper public key's session. Conversely, when an encrypted packet
arrives on the interface, after it's been decrypted, the inner src IP
is looked up in this association table to see if it matches the public
key from which it originated. This is the "cryptokey routing table",
and many more details and explanations are found on the site and paper
above. But that's the basic gist; you add a device with ip-link, give
it keys with `wg`, and then you can start sending and receiving
packets on the interface that are secure.
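
The two lookups described above can be sketched roughly as follows. This is an illustration only, not WireGuard's code: the struct and function names are hypothetical, it handles IPv4 only, peers are represented by a plain integer index instead of a public key, and the choice of a linear longest-prefix match is an assumption made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry: one allowed IP range mapped to one peer. */
struct allowed_ip {
	uint32_t addr;	/* IPv4 network address, host byte order */
	uint8_t cidr;	/* prefix length, assumed to be in 0..32 */
	int peer;	/* index standing in for the peer's public key */
};

/* Build the netmask for a prefix length in 0..32. */
static uint32_t prefix_mask(uint8_t cidr)
{
	return cidr == 0 ? 0 : ~UINT32_C(0) << (32 - cidr);
}

/* Longest-prefix match over the table; returns a peer index or -1. */
static int lookup_peer(const struct allowed_ip *tbl, size_t n, uint32_t ip)
{
	int best = -1;
	int best_cidr = -1;

	for (size_t i = 0; i < n; i++) {
		uint32_t mask = prefix_mask(tbl[i].cidr);

		if ((ip & mask) == (tbl[i].addr & mask) &&
		    tbl[i].cidr > best_cidr) {
			best = tbl[i].peer;
			best_cidr = tbl[i].cidr;
		}
	}
	return best;
}

/* Outgoing: the destination IP selects the peer whose session encrypts. */
static int tx_select_peer(const struct allowed_ip *tbl, size_t n,
			  uint32_t dst_ip)
{
	return lookup_peer(tbl, n, dst_ip);
}

/*
 * Incoming: after decryption, accept the packet only if its inner source
 * IP maps back to the same peer whose session decrypted it.
 */
static int rx_accept(const struct allowed_ip *tbl, size_t n,
		     uint32_t inner_src_ip, int decrypting_peer)
{
	return decrypting_peer >= 0 &&
	       lookup_peer(tbl, n, inner_src_ip) == decrypting_peer;
}
```

For example, with 10.0.0.0/8 assigned to peer 0 and 10.1.0.0/16 assigned to peer 1, a packet to 10.1.2.3 would be encrypted for peer 1, and a decrypted packet claiming source 10.1.2.3 would be dropped if it arrived on peer 0's session.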

In order to make this so seamless, WireGuard does away with a lot of
the _theoretically pure_ layering abstractions typically seen. First
of all, WireGuard is an interface, where crypto is done, which is a
considerable departure from the (hugely complex) xfrm-approach. It is
not unprecedented, however; the mac80211 infrastructure also does
crypto at this same layer. The gain is not only a simpler codebase,
but also much simpler and safer operation for administrators. If a
packet comes from a
WireGuard interface, it can be trusted as authentic and confidential.
If you want outgoing packets to be tunneled, point your routing table
at the WireGuard interface. It's basically that simple, removing years
and years of headaches (and catastrophically insecure
misconfigurations) people often have with the xfrm layer.

Second, WireGuard uses something based on the Noise Protocol Framework
(in Noise_IK) for key agreement and handshake, rather than, say,
relegating to a userspace daemon. The reason, again, is massive
simplicity and security savings. The Noise_IK handshake is extremely
simple, and tight integration between the handshake and the transport
layer allows WireGuard itself to handle all session-state and
connection-state and so-forth, making the whole process appear
"stateless" to the administrator (you set it up with `wg`, and then it
_just works_). There is no x509, no ASN.1, no huge complexity; the
user configures the public keys, and then the rest is taken care of.
Other configuration frameworks (based on x509 or SSL or LDAP or
whatever you want) can then build on top of this in userspace, if that
sort of thing is desired. But the basic handshake fundamentals are
left to WireGuard. This is more or less similar to SSH, which cares
about the authorized_keys file.

These two design choices are fundamental to WireGuard, and I believe
they confer significant benefits, which are discussed extensively in
the paper. There are two incidental implementation choices, however,
that I think will be more controversial from a kernel perspective, and
depending on the result of this discussion, maybe things will change,
or maybe they won't.

First, WireGuard doesn't use the kernel's crypto API. The overhead of
memory allocation and abstraction/indirection behind each
encryption/hashing/ec-multiplication operation not only adds
unfortunate performance overhead, but also bloats the code, impacting
ease of auditing and verific
