Re: [PATCH] usbnet: Fix a race between usbnet_stop() and the BH

2015-09-08 Thread Eugene Shatokhin

01.09.2015 17:05, Eugene Shatokhin wrote:

The race may happen when a device (e.g. YOTA 4G LTE Modem) is
unplugged while the system is downloading a large file from the Net.

Hardware breakpoints and Kprobes with delays were used to confirm that
the race does actually happen.

The race is on skb_queue ('next' pointer) between usbnet_stop()
and rx_complete(), which, in turn, calls usbnet_bh().

Here is a part of the call stack with the code where the changes to the
queue happen. The line numbers are for the kernel 4.1.0:

*0 __skb_unlink (skbuff.h:1517)
 prev->next = next;
*1 defer_bh (usbnet.c:430)
 spin_lock_irqsave(&list->lock, flags);
 old_state = entry->state;
 entry->state = state;
 __skb_unlink(skb, list);
 spin_unlock(&list->lock);
 spin_lock(&dev->done.lock);
 __skb_queue_tail(&dev->done, skb);
 if (dev->done.qlen == 1)
 tasklet_schedule(&dev->bh);
 spin_unlock_irqrestore(&dev->done.lock, flags);
*2 rx_complete (usbnet.c:640)
 state = defer_bh(dev, skb, &dev->rxq, state);

At the same time, the following code repeatedly checks if the queue is
empty and reads these values concurrently with the above changes:

*0  usbnet_terminate_urbs (usbnet.c:765)
 /* maybe wait for deletions to finish. */
 while (!skb_queue_empty(&dev->rxq)
 && !skb_queue_empty(&dev->txq)
 && !skb_queue_empty(&dev->done)) {
 schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
 set_current_state(TASK_UNINTERRUPTIBLE);
 netif_dbg(dev, ifdown, dev->net,
   "waited for %d urb completions\n", temp);
 }
*1  usbnet_stop (usbnet.c:806)
 if (!(info->flags & FLAG_AVOID_UNLINK_URBS))
 usbnet_terminate_urbs(dev);

As a result, it is possible, for example, that the skb is removed from
dev->rxq by __skb_unlink() before the check
"!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
also possible in this case that the skb is added to dev->done queue
after "!skb_queue_empty(&dev->done)" is checked. So
usbnet_terminate_urbs() may stop waiting and return while dev->done
queue still has an item.

Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid
this race.
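
The window described above can be illustrated with a toy model. This is only a sketch: it tracks queue lengths rather than real skbs, uses no threads or locks, and every name in it (struct queues, all_empty(), old_move_exposes_gap(), new_move_exposes_gap()) is made up for illustration and is not part of usbnet.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model: only the queue lengths matter, not the skbs themselves. */
struct queues {
	int rxq_len;
	int done_len;
};

/* What an observer would conclude from one snapshot of both queues. */
static bool all_empty(const struct queues *q)
{
	return q->rxq_len == 0 && q->done_len == 0;
}

/*
 * Old defer_bh(): list->lock is dropped before dev->done.lock is taken,
 * so the state between the two steps is visible to other CPUs.
 * Returns true if an observer could see both queues empty while the
 * skb is still in flight.
 */
static bool old_move_exposes_gap(struct queues *q)
{
	bool gap;

	q->rxq_len--;            /* __skb_unlink() under list->lock */
	gap = all_empty(q);      /* window: neither lock is held here */
	q->done_len++;           /* __skb_queue_tail() under done.lock */
	return gap;
}

/*
 * Patched defer_bh(): done.lock is taken while list->lock is still held,
 * so observers only see the state before or after the whole move.
 */
static bool new_move_exposes_gap(struct queues *q)
{
	bool before, after;

	before = all_empty(q);   /* last state visible before the locks */
	q->rxq_len--;
	q->done_len++;
	after = all_empty(q);    /* first state visible after the unlocks */
	return before || after;
}
```

Starting from one skb in rxq, the old ordering reports a moment where both queues look empty, while the patched ordering does not.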

Signed-off-by: Eugene Shatokhin 
---
  drivers/net/usb/usbnet.c | 39 ---
  1 file changed, 28 insertions(+), 11 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index e049857..b4cf107 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -428,12 +428,18 @@ static enum skb_state defer_bh(struct usbnet *dev, struct sk_buff *skb,
old_state = entry->state;
entry->state = state;
__skb_unlink(skb, list);
-   spin_unlock(&list->lock);
-   spin_lock(&dev->done.lock);
+
+   /* defer_bh() is never called with list == &dev->done.
+* spin_lock_nested() tells lockdep that it is OK to take
+* dev->done.lock here with list->lock held.
+*/
+   spin_lock_nested(&dev->done.lock, SINGLE_DEPTH_NESTING);
+
__skb_queue_tail(&dev->done, skb);
if (dev->done.qlen == 1)
tasklet_schedule(&dev->bh);
-   spin_unlock_irqrestore(&dev->done.lock, flags);
+   spin_unlock(&dev->done.lock);
+   spin_unlock_irqrestore(&list->lock, flags);
return old_state;
  }

@@ -749,6 +755,20 @@ EXPORT_SYMBOL_GPL(usbnet_unlink_rx_urbs);

  /*-*/

+static void wait_skb_queue_empty(struct sk_buff_head *q)
+{
+   unsigned long flags;
+
+   spin_lock_irqsave(&q->lock, flags);
+   while (!skb_queue_empty(q)) {
+   spin_unlock_irqrestore(&q->lock, flags);
+   schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
+   set_current_state(TASK_UNINTERRUPTIBLE);
+   spin_lock_irqsave(&q->lock, flags);
+   }
+   spin_unlock_irqrestore(&q->lock, flags);
+}
+
  // precondition: never called in_interrupt
  static void usbnet_terminate_urbs(struct usbnet *dev)
  {
@@ -762,14 +782,11 @@ static void usbnet_terminate_urbs(struct usbnet *dev)
unlink_urbs(dev, &dev->rxq);

/* maybe wait for deletions to finish. */
-   while (!skb_queue_empty(&dev->rxq)
-   && !skb_queue_empty(&dev->txq)
-   && !skb_queue_empty(&dev->done)) {
-   schedule_timeout(msecs_to_jiffies(UNLINK_TIMEOUT_MS));
-   set_current_state(TASK_UNINTERRUPTIBLE);
-   netif_dbg(dev, ifdown, dev->net,
- "waited for %d urb completions\n", temp);
-   }
+   wait_skb_queue_empty(&dev->rxq);
+   wait_skb_queue_empty(&dev->txq);
+   wait_skb_queue_empty(&dev->done);
+   netif_dbg(dev, ifdown, dev->net,
+ "waited for %d urb completions\n", temp);
set_current_state(TASK_RUNNING);
remove_wait_queue(&

Re: [PATCH] usbnet: Fix a race between usbnet_stop() and the BH

2015-09-08 Thread Bjørn Mork
Eugene Shatokhin  writes:

> I resent the patch to make it separate. What is the status of this now?

One of the many nice features of patchwork is that you don't need to ask
those questions :)

See http://patchwork.ozlabs.org/patch/512856/

I really don't think it's appropriate for me to ack this, but I can add
my

Reviewed-by: Bjørn Mork 

if that helps.


Bjørn
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] usbnet: Fix a race between usbnet_stop() and the BH

2015-09-08 Thread Oliver Neukum
On Tue, 2015-09-08 at 09:37 +0200, Bjørn Mork wrote:
> Eugene Shatokhin  writes:
> 
> > I resent the patch to make it separate. What is the status of this now?
> 
> One of the many nice features of patchwork is that you don't need to ask
> those questions :)
> 
> See http://patchwork.ozlabs.org/patch/512856/
> 
> I really don't think it's appropriate for me to ack this, but I can add
> my

Well, in case this helps:

> Reviewed-by: Bjørn Mork 
Acked-by: Oliver Neukum 

Regards
Oliver





Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.

2015-09-08 Thread Corinna Vinschen
Hi David,

On Sep  7 17:00, David Miller wrote:
> From: Corinna Vinschen 
> Date: Mon, 7 Sep 2015 11:29:49 +0200
> 
> > Still wondering though.  Given that the driver never failed before if
> > the counter values couldn't be updated, and given that these counter
> > values only have statistical relevance, why should this suddenly result
> > in a fatal failure at open time?
> 
> Failing to allocate such a small buffer means we have much deeper issues
> at hand.  A GFP_KERNEL allocation of this size really should not fail.

I'm not talking about the allocation.  I agree with you on that score.

What I'm talking about is the situation where the NIC hardware fails to
reset or update its own counters for whatever reason.  Apparently the
mechanism is supposed to be performed within a given timeframe.  The
code sets some registers and then waits for a flag bit to be set to 0.
For that it utilizes a busy loop checking the flag bit up to 1000 times
with a delay of about 10 us.

The error condition is that the flag bit hasn't been set to 0 when the
loop exits, after roughly 10ms, and *this* part does not constitute a
fatal error which breaks the operation of the NIC.  So, from my
perspective a timeout while trying to wait for updated counter values
from the NIC at @ndo_open time should not be treated as fatal.
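
To make the argument concrete, here is a minimal userspace sketch of a bounded poll whose timeout is reported but not treated as fatal. It is not the r8169 code: the register accessor, struct fake_nic, and fake_open() are hypothetical, and the per-read delay of about 10 us is left out so the sketch stays portable. With 1000 reads at about 10 us each, the worst case is roughly 1000 * 10 us = 10 ms.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register accessor type; a real driver reads MMIO. */
typedef uint32_t (*read_reg_fn)(void *ctx);

/*
 * Poll until (reg & flag) == 0 or until `tries` reads have been made.
 * A real driver would delay about 10 us between reads; the delay is
 * omitted here.
 * Returns true if the flag cleared, false on timeout.
 */
static bool wait_flag_low(read_reg_fn read_reg, void *ctx,
			  uint32_t flag, unsigned int tries)
{
	unsigned int i;

	for (i = 0; i < tries; i++) {
		if (!(read_reg(ctx) & flag))
			return true;
	}
	return false;
}

/* Fake hardware: the flag stays set for `busy_reads` reads, then clears. */
struct fake_nic {
	unsigned int busy_reads;
	uint32_t flag;
};

static uint32_t fake_read(void *ctx)
{
	struct fake_nic *nic = ctx;

	if (nic->busy_reads > 0) {
		nic->busy_reads--;
		return nic->flag;
	}
	return 0;
}

/*
 * Open path in the spirit of the argument above: a counter timeout is
 * recorded in *counters_valid but does not fail the open.
 * Returns 0 in both cases.
 */
static int fake_open(struct fake_nic *nic, bool *counters_valid)
{
	*counters_valid = wait_flag_low(fake_read, nic, nic->flag, 1000);
	return 0;
}
```

A caller could log a warning when counters_valid comes back false and carry on with stale statistics.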


Corinna




Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.

2015-09-08 Thread Corinna Vinschen
On Sep  7 23:52, Francois Romieu wrote:
> Corinna Vinschen  :
> [...]
> > I have a bit of a problem with this patch.  It crashes when loading the
> > driver manually w/ modprobe.  For some reason dev_get_stats is called
> > during rtl_init_one and at that time the counters pointer is NULL, so
> > the kernel gets a SEGV.
> >
> >  After I worked around that I got a SEGV in
> > rtl_remove_one, which I still have to find out why.  I didn't test with
> > the latest kernel, though, so I still have to check if that's the reason
> > for the crashes.  That takes a bit longer, I just wanted to let you
> > know.
> 
> Call me stupid: I forgot that it's perfectly fine to request the stats
> of a down interface. :o/
> 
> Updated patch is on the way.
> 
> > There's also a problem with rtl8169_cmd_counters:  It always calls
> > rtl_udelay_loop_wait_low w/ rtl_reset_counters_cond, even when called
> > from rtl8169_update_counters, where it should call
> > rtl_udelay_loop_wait_low w/ rtl_counters_cond to check for the
> > CounterDump flag, rather than for the CounterReset flag.
> > 
> > For now I applied the below patch, but I think it's a bit ugly to
> > conditionalize the rtl_cond struct by the incoming flag value.
> 
> 
> 
> Let's check both at once:
> 
>   return RTL_R32(CounterAddrLow) & (CounterDump | CounterReset);

If you're sure that's valid, then why not?  It's certainly cleaner
code.


Corinna




[PATCH v3] stmmac: fix check for phydev being open

2015-09-08 Thread Alexey Brodkin
The current check of phydev with IS_ERR(phydev) may not make much sense
because of_phy_connect() returns NULL on failure instead of an error value.

Still, for checking the result of phy_connect(), IS_ERR() makes perfect sense.

So let's use the combined check IS_ERR_OR_NULL(), which covers both cases.

Cc: Sergei Shtylyov 
Cc: Giuseppe Cavallaro 
Cc: linux-ker...@vger.kernel.org
Cc: sta...@vger.kernel.org
Cc: David Miller 
Signed-off-by: Alexey Brodkin 
---

Changes compared to v2:
 * Updated commit message with mention of of_phy_connect() instead of
   of_phy_attach().
 * Return ENODEV in case of of_phy_connect() failure

Changes compared to v1:
 * Use IS_ERR_OR_NULL() instead of discrete checks for null and err

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 864b476..e2c9c86 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -837,9 +837,12 @@ static int stmmac_init_phy(struct net_device *dev)
 interface);
}
 
-   if (IS_ERR(phydev)) {
+   if (IS_ERR_OR_NULL(phydev)) {
pr_err("%s: Could not attach to PHY\n", dev->name);
-   return PTR_ERR(phydev);
+   if (!phydev)
+   return -ENODEV;
+   else
+   return PTR_ERR(phydev);
}
 
/* Stop Advertising 1000BASE Capability if interface is not GMII */
-- 
2.4.3



[PATCH] net: eth: altera: Fix the initial device operstate

2015-09-08 Thread Atsushi Nemoto
Call netif_carrier_off() prior to register_netdev(), otherwise
userspace can see incorrect link state.

Signed-off-by: Atsushi Nemoto 
---
 drivers/net/ethernet/altera/altera_tse_main.c |1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/altera/altera_tse_main.c b/drivers/net/ethernet/altera/altera_tse_main.c
index 3ade46c..91fd718 100644
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -1518,6 +1518,7 @@ static int altera_tse_probe(struct platform_device *pdev)
spin_lock_init(&priv->tx_lock);
spin_lock_init(&priv->rxdma_irq_lock);
 
+   netif_carrier_off(ndev);
ret = register_netdev(ndev);
if (ret) {
dev_err(&pdev->dev, "failed to register TSE net device\n");
-- 


Re: [PATCH net-next v2] ipv6: fix multipath route replace error recovery

2015-09-08 Thread Nicolas Dichtel

On 08/09/2015 02:01, roopa wrote:

On 9/7/15, 5:03 AM, Nicolas Dichtel wrote:

[snip]

yes, i had submitted the patch you mention above to fix a slightly different
problem that existed then..which
was introduced by "51ebd3181572 ("ipv6: add support of equal cost multipath
(ECMP)")".
Commit "35f1b4e96b9258a3668872b1139c51e5a23eb876 ipv6: do not delete previously
existing ECMP routes if add fails"
subsequently fixed it.

Ok, got it. Thank you for the details.


Nicolas


Re: [PATCH net v3] ipv6: fix multipath route replace error recovery

2015-09-08 Thread Nicolas Dichtel

On 08/09/2015 02:42, roopa wrote:

On 9/6/15, 1:46 PM, Roopa Prabhu wrote:

From: Roopa Prabhu 

Problem:
The ECMP route replace support for IPv6 in the kernel deletes the
existing ECMP route too early, i.e. when it installs the first nexthop.
If there is an error in installing the subsequent nexthops, it's too late
to recover the already deleted existing route.

This patch fixes the problem with the following:
a) Changes the existing multipath route add code to a two stage process:
   build rt6_infos + insert them
ip6_route_add rt6_info creation code is moved into
ip6_route_info_create.
b) This ensures that all errors are caught during building rt6_infos
   and we fail early
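
The two-stage idea in a) and b) can be sketched in plain C as follows. This is only an illustration under simplifying assumptions: struct route, struct nexthop, build_nexthop() and route_replace() are hypothetical stand-ins rather than kernel code, and the insert stage here is a plain struct copy that cannot fail, whereas a real insert still can.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NEXTHOPS 8

/* Hypothetical stand-ins for a multipath route and its nexthops. */
struct nexthop {
	int gateway;            /* a negative value marks an invalid request */
};

struct route {
	size_t n;
	struct nexthop nh[MAX_NEXTHOPS];
};

/* Stage 1: validate and build one entry; may fail, touches nothing live. */
static bool build_nexthop(int gateway, struct nexthop *out)
{
	if (gateway < 0)
		return false;
	out->gateway = gateway;
	return true;
}

/*
 * Replace *live with a route made of `n` gateways.
 * All entries are built into a scratch route first; *live is only
 * overwritten once every build has succeeded.
 * Returns 0 on success, -1 if any nexthop is invalid or n is too large.
 */
static int route_replace(struct route *live, const int *gateways, size_t n)
{
	struct route scratch = { 0 };
	size_t i;

	if (n > MAX_NEXTHOPS)
		return -1;

	for (i = 0; i < n; i++) {
		if (!build_nexthop(gateways[i], &scratch.nh[i]))
			return -1;      /* fail early: *live is untouched */
	}
	scratch.n = n;

	*live = scratch;                /* stage 2: insert */
	return 0;
}
```

If the second of two requested nexthops is invalid, the call fails and the previously installed route is still in place.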


The other way I have been thinking of solving the problem is to mark the sibling
routes being replaced with some state
...so they can be restored on error. Still figuring out a way to do this in a
clean and non-intrusive way.

If I'm not wrong, the only error which may result in an inconsistent list of
nexthops is ENOMEM (after your patch). I'm not sure it's worth adding too much
complexity to the code to handle this error.


Or maybe  just save the sibling routes (rt6_infos) being replaced in a list and
re-insert them on error.

Yes, but we can also fail to re-insert the route.


[PATCH net 1/2] cxgb4: Fix tx flit calculation

2015-09-08 Thread Hariprasad Shenai
Commit 0aac3f56d4a63f04 ("cxgb4: Add comment for calculate tx flits
and sge length code") introduced a regression where the tx flit
calculation goes wrong, which can lead to data corruption, hangs, stalls
and write-combining failures. Fix it.

Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/sge.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/sge.c b/drivers/net/ethernet/chelsio/cxgb4/sge.c
index 78f446c..9162746 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/sge.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/sge.c
@@ -807,7 +807,7 @@ static inline unsigned int calc_tx_flits(const struct sk_buff *skb)
 * message or, if we're doing a Large Send Offload, an LSO CPL message
 * with an embedded TX Packet Write CPL message.
 */
-   flits = sgl_len(skb_shinfo(skb)->nr_frags + 1) + 4;
+   flits = sgl_len(skb_shinfo(skb)->nr_frags + 1);
if (skb_shinfo(skb)->gso_size)
flits += (sizeof(struct fw_eth_tx_pkt_wr) +
  sizeof(struct cpl_tx_pkt_lso_core) +
-- 
2.3.4



[PATCH net 0/2] Fix tx flit calculation and wc stat configuration

2015-09-08 Thread Hariprasad Shenai
Hi,

This patch series fixes the following:
Patch 1/2 fixes the tx flit calculation, which if wrong can lead to
stalls, hangs, data corruption and write-combining failures. Patch 2/2 fixes
the PCI-E write-combining stats configuration.

This patch series has been created against net tree and includes
patches on cxgb4 driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.

Thanks

Hariprasad Shenai (2):
  cxgb4: Fix tx flit calculation
  cxgb4: Fix for write-combining stats configuration

 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 7 +--
 drivers/net/ethernet/chelsio/cxgb4/sge.c| 2 +-
 2 files changed, 6 insertions(+), 3 deletions(-)

-- 
2.3.4



[PATCH net 2/2] cxgb4: Fix for write-combining stats configuration

2015-09-08 Thread Hariprasad Shenai
The write-combining configuration register SGE_STAT_CFG_A needs to
be configured after FW initializes the adapter; otherwise FW will reset
the configuration.

Signed-off-by: Hariprasad Shenai 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
index 592a4d6..f5dcde2 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c
@@ -4719,8 +4719,6 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
err = -ENOMEM;
goto out_free_adapter;
}
-   t4_write_reg(adapter, SGE_STAT_CFG_A,
-STATSOURCE_T5_V(7) | STATMODE_V(0));
}
 
setup_memwin(adapter);
@@ -4732,6 +4730,11 @@ static int init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
if (err)
goto out_unmap_bar;
 
+   /* configure SGE_STAT_CFG_A to read WC stats */
+   if (!is_t4(adapter->params.chip))
+   t4_write_reg(adapter, SGE_STAT_CFG_A,
+STATSOURCE_T5_V(7) | STATMODE_V(0));
+
for_each_port(adapter, i) {
struct net_device *netdev;
 
-- 
2.3.4



Re: [Xen-devel] [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-09-08 Thread Julien Grall
Hi Wei,

On 07/09/15 17:57, Wei Liu wrote:
> You might need to rebase your patch. A patch to netback went in recently.

Do you mean 210c34dcd8d912dcc740f1f17625a7293af5cb56 "xen-netback: add
support for multicast control"?

If so I didn't see any specific issue while rebasing on the latest
linus' master.

> On Mon, Sep 07, 2015 at 04:33:56PM +0100, Julien Grall wrote:
>> The PV network protocol is using 4KB page granularity. The goal of this
>> patch is to allow a Linux using 64KB page granularity working as a
>> network backend on a non-modified Xen.
>>
>> It's only necessary to adapt the ring size and break skb data in small
>> chunk of 4KB. The rest of the code is relying on the grant table code.
>>
>> Signed-off-by: Julien Grall 
>>
> 
> Reviewed-by: Wei Liu 

Thank you!

Regards,

-- 
Julien Grall


Re: [Xen-devel] [PATCH v4 18/20] net/xen-netback: Make it running on 64KB page granularity

2015-09-08 Thread Wei Liu
On Tue, Sep 08, 2015 at 12:07:31PM +0100, Julien Grall wrote:
> Hi Wei,
> 
> On 07/09/15 17:57, Wei Liu wrote:
> > You might need to rebase your patch. A patch to netback went in recently.
> 
> Do you mean 210c34dcd8d912dcc740f1f17625a7293af5cb56 "xen-netback: add
> support for multicast control"?
> 

Yes, that one.

> If so I didn't see any specific issue while rebasing on the latest
> linus' master.
> 

Good.

> > On Mon, Sep 07, 2015 at 04:33:56PM +0100, Julien Grall wrote:
> >> The PV network protocol is using 4KB page granularity. The goal of this
> >> patch is to allow a Linux using 64KB page granularity working as a
> >> network backend on a non-modified Xen.
> >>
> >> It's only necessary to adapt the ring size and break skb data in small
> >> chunk of 4KB. The rest of the code is relying on the grant table code.
> >>
> >> Signed-off-by: Julien Grall 
> >>
> > 
> > Reviewed-by: Wei Liu 
> 
> Thank you!
> 
> Regards,
> 
> -- 
> Julien Grall


Re: [PATCH v3] stmmac: fix check for phydev being open

2015-09-08 Thread Sergei Shtylyov

Hello.

On 9/8/2015 11:43 AM, Alexey Brodkin wrote:


Current check of phydev with IS_ERR(phydev) may make not much sense
because of_phy_connect() returns NULL on failure instead of error value.

Still for checking result of phy_connect() IS_ERR() makes perfect sense.

So let's use combined check IS_ERR_OR_NULL() that covers both cases.

Cc: Sergei Shtylyov 
Cc: Giuseppe Cavallaro 
Cc: linux-ker...@vger.kernel.org
Cc: sta...@vger.kernel.org
Cc: David Miller 
Signed-off-by: Alexey Brodkin 
---

Changes compared to v2:
  * Updated commit message with mention of of_phy_connect() instead of
of_phy_attach().
  * Return ENODEV in case of of_phy_connect() failure

Changes compared to v1:
  * Use IS_ERR_OR_NULL() instead of discrete checks for null and err

  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 864b476..e2c9c86 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -837,9 +837,12 @@ static int stmmac_init_phy(struct net_device *dev)
 interface);
}

-   if (IS_ERR(phydev)) {
+   if (IS_ERR_OR_NULL(phydev)) {
pr_err("%s: Could not attach to PHY\n", dev->name);
-   return PTR_ERR(phydev);
+   if (!phydev)
+   return -ENODEV;
+   else
+   return PTR_ERR(phydev);


   Don't need *else* after *return* and scripts/checkpatch.pl should have 
complained about that.
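
For reference, the check can be written without the *else* roughly as below. This is a userspace sketch only: the helpers approximate the kernel's err.h macros as inline functions so that the example compiles on its own, and phy_connect_status() is a hypothetical name, not a function in the driver.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace approximations of the kernel's err.h helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO addresses; NULL is not one. */
static inline bool IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline bool IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/*
 * Map the result of a connect call to an errno-style return value.
 * NULL becomes -ENODEV, an error pointer yields its encoded errno,
 * and a valid pointer yields 0.
 */
static int phy_connect_status(const void *phydev)
{
	if (IS_ERR_OR_NULL(phydev)) {
		if (!phydev)
			return -ENODEV;
		return (int)PTR_ERR(phydev);    /* no else needed after return */
	}
	return 0;
}
```

Note that IS_ERR(NULL) is false, which is why a NULL return from of_phy_connect() slips past a plain IS_ERR() check.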


MBR, Sergei



Re: [PATCH v3] stmmac: fix check for phydev being open

2015-09-08 Thread Alexey Brodkin
Hi Sergei,

On Tue, 2015-09-08 at 14:20 +0300, Sergei Shtylyov wrote:
> Hello.
> 
> On 9/8/2015 11:43 AM, Alexey Brodkin wrote:
> 
> > Current check of phydev with IS_ERR(phydev) may make not much sense
> > because of_phy_connect() returns NULL on failure instead of error value.
> > 
> > Still for checking result of phy_connect() IS_ERR() makes perfect sense.
> > 
> > So let's use combined check IS_ERR_OR_NULL() that covers both cases.
> > 
> > Cc: Sergei Shtylyov 
> > Cc: Giuseppe Cavallaro 
> > Cc: linux-ker...@vger.kernel.org
> > Cc: sta...@vger.kernel.org
> > Cc: David Miller 
> > Signed-off-by: Alexey Brodkin 
> > ---
> > 
> > Changes compared to v2:
> >   * Updated commit message with mention of of_phy_connect() instead of
> > of_phy_attach().
> >   * Return ENODEV in case of of_phy_connect() failure
> > 
> > Changes compared to v1:
> >   * Use IS_ERR_OR_NULL() instead of discrete checks for null and err
> > 
> >   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +--
> >   1 file changed, 5 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
> > b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > index 864b476..e2c9c86 100644
> > --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> > @@ -837,9 +837,12 @@ static int stmmac_init_phy(struct net_device *dev)
> >  interface);
> > }
> > 
> > -   if (IS_ERR(phydev)) {
> > +   if (IS_ERR_OR_NULL(phydev)) {
> > pr_err("%s: Could not attach to PHY\n", dev->name);
> > -   return PTR_ERR(phydev);
> > +   if (!phydev)
> > +   return -ENODEV;
> > +   else
> > +   return PTR_ERR(phydev);
> 
> Don't need *else* after *return* and scripts/checkpatch.pl should have 
> complained about that.

./scripts/checkpatch.pl 0001-stmmac-fix-check-for-phydev-being-open.patch 
total: 0 errors, 0 warnings, 0 checks, 14 lines checked

-Alexey


[PATCHv1 net] xen-netback: require fewer guest Rx slots when not using GSO

2015-09-08 Thread David Vrabel
Commit f48da8b14d04ca87ffcffe68829afd45f926ec6a (xen-netback: fix
unlimited guest Rx internal queue and carrier flapping) introduced a
regression.

The PV frontend in IPXE only places 4 requests on the guest Rx ring.
Since netback required at least (MAX_SKB_FRAGS + 1) slots, IPXE could
not receive any packets.

a) If GSO is not enabled on the VIF, fewer guest Rx slots are required
   for the largest possible packet.  Calculate the required slots
   based on the maximum GSO size or the MTU.

   This calculation of the number of required slots relies on
   1650d5455bd2 (xen-netback: always fully coalesce guest Rx packets)
   which is present in 4.0-rc1 and later.

b) Reduce the Rx stall detection to checking for at least one
   available Rx request.  This is fine since we're predominately
   concerned with detecting interfaces which are down and thus have
   zero available Rx requests.
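
The arithmetic behind a) and b) can be sketched in userspace C as follows. The page size of 4096 bytes and the function names are assumptions made for illustration; the real xenvif_rx_ring_slots_available() also sets req_event and re-checks the ring, which is left out here.

```c
#include <stdbool.h>

/* Assumed value for illustration; the kernel uses its own PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096u
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Guest Rx slots needed for the largest packet the VIF may receive:
 * with GSO, enough pages for gso_max_size plus one slot for the GSO
 * meta-data; without GSO, enough pages for one MTU-sized packet.
 */
static unsigned int rx_slots_needed(bool gso_enabled,
				    unsigned int gso_max_size,
				    unsigned int mtu)
{
	if (gso_enabled)
		return DIV_ROUND_UP(gso_max_size, SKETCH_PAGE_SIZE) + 1;
	return DIV_ROUND_UP(mtu, SKETCH_PAGE_SIZE);
}

/*
 * Ring indices are free-running unsigned counters, so prod - cons gives
 * the number of outstanding requests even after the counters wrap.
 */
static bool rx_slots_available(unsigned int prod, unsigned int cons,
			       unsigned int needed)
{
	return prod - cons >= needed;
}
```

With a 1500-byte MTU and no GSO this yields 1 slot, so a frontend that posts only 4 requests can receive packets; with GSO and a 65536-byte maximum it yields 17 slots.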

Signed-off-by: David Vrabel 
---
 drivers/net/xen-netback/common.h  | 10 --
 drivers/net/xen-netback/netback.c | 23 ---
 2 files changed, 16 insertions(+), 17 deletions(-)

Note that this can only be backported as-is on top of
1650d5455bd2dc6b5ee134bd6fc1a3236c266b5b (xen-netback: always fully
coalesce guest Rx packets) which is in 4.0 and later.

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8a495b3..25990b1 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -200,11 +200,6 @@ struct xenvif_queue { /* Per-queue data for xenvif */
struct xenvif_stats stats;
 };
 
-/* Maximum number of Rx slots a to-guest packet may use, including the
- * slot needed for GSO meta-data.
- */
-#define XEN_NETBK_RX_SLOTS_MAX (MAX_SKB_FRAGS + 1)
-
 enum state_bit_shift {
/* This bit marks that the vif is connected */
VIF_STATUS_CONNECTED,
@@ -306,11 +301,6 @@ int xenvif_dealloc_kthread(void *data);
 
 void xenvif_rx_queue_tail(struct xenvif_queue *queue, struct sk_buff *skb);
 
-/* Determine whether the needed number of slots (req) are available,
- * and set req_event if not.
- */
-bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed);
-
 void xenvif_carrier_on(struct xenvif *vif);
 
 /* Callback from stack when TX packet can be released */
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 3f44b52..e791930 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -149,9 +149,20 @@ static inline pending_ring_idx_t pending_index(unsigned i)
return i & (MAX_PENDING_REQS-1);
 }
 
-bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue, int needed)
+static int xenvif_rx_ring_slots_needed(struct xenvif *vif)
+{
+   if (vif->gso_mask)
+   return DIV_ROUND_UP(vif->dev->gso_max_size, PAGE_SIZE) + 1;
+   else
+   return DIV_ROUND_UP(vif->dev->mtu, PAGE_SIZE);
+}
+
+static bool xenvif_rx_ring_slots_available(struct xenvif_queue *queue)
 {
RING_IDX prod, cons;
+   int needed;
+
+   needed = xenvif_rx_ring_slots_needed(queue->vif);
 
do {
prod = queue->rx.sring->req_prod;
@@ -513,7 +524,7 @@ static void xenvif_rx_action(struct xenvif_queue *queue)
 
skb_queue_head_init(&rxq);
 
-   while (xenvif_rx_ring_slots_available(queue, XEN_NETBK_RX_SLOTS_MAX)
+   while (xenvif_rx_ring_slots_available(queue)
   && (skb = xenvif_rx_dequeue(queue)) != NULL) {
queue->last_rx_time = jiffies;
 
@@ -1839,8 +1850,7 @@ static bool xenvif_rx_queue_stalled(struct xenvif_queue *queue)
prod = queue->rx.sring->req_prod;
cons = queue->rx.req_cons;
 
-   return !queue->stalled
-   && prod - cons < XEN_NETBK_RX_SLOTS_MAX
+   return !queue->stalled && prod - cons < 1
&& time_after(jiffies,
  queue->last_rx_time + queue->vif->stall_timeout);
 }
@@ -1852,14 +1862,13 @@ static bool xenvif_rx_queue_ready(struct xenvif_queue *queue)
prod = queue->rx.sring->req_prod;
cons = queue->rx.req_cons;
 
-   return queue->stalled
-   && prod - cons >= XEN_NETBK_RX_SLOTS_MAX;
+   return queue->stalled && prod - cons >= 1;
 }
 
 static bool xenvif_have_rx_work(struct xenvif_queue *queue)
 {
return (!skb_queue_empty(&queue->rx_queue)
-   && xenvif_rx_ring_slots_available(queue, XEN_NETBK_RX_SLOTS_MAX))
+   && xenvif_rx_ring_slots_available(queue))
|| (queue->vif->stall_timeout &&
(xenvif_rx_queue_stalled(queue)
 || xenvif_rx_queue_ready(queue)))
-- 
2.1.4



Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd

2015-09-08 Thread Tycho Andersen
On Sat, Sep 05, 2015 at 09:13:02AM +0200, Michael Kerrisk (man-pages) wrote:
> On 09/04/2015 10:41 PM, Kees Cook wrote:
> > On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
> >  wrote:
> >> This is the final bit needed to support seccomp filters created via the bpf
> >> syscall.
> 
> Hmm. Thanks Kees, for CCing linux-api@. That really should have been done at
> the outset.

Apologies, I'll cc the list on future versions.

> Tycho, where's the man-pages patch describing this new kernel-userspace
> API feature? :-)

Once we get the API finalized I'm happy to write it.

> >> One concern with this patch is exactly what the interface should look like
> >> for users, since seccomp()'s second argument is a pointer, we could ask
> >> people to pass a pointer to the fd, but implies we might write to it which
> >> seems impolite. Right now we cast the pointer (and force the user to cast
> >> it), which generates ugly warnings. I'm not sure what the right answer is
> >> here.
> >>
> >> Signed-off-by: Tycho Andersen 
> >> CC: Kees Cook 
> >> CC: Will Drewry 
> >> CC: Oleg Nesterov 
> >> CC: Andy Lutomirski 
> >> CC: Pavel Emelyanov 
> >> CC: Serge E. Hallyn 
> >> CC: Alexei Starovoitov 
> >> CC: Daniel Borkmann 
> >> ---
> >>  include/linux/seccomp.h  |  3 +-
> >>  include/uapi/linux/seccomp.h |  1 +
> >>  kernel/seccomp.c | 70 
> >> 
> >>  3 files changed, 61 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
> >> index d1a86ed..a725dd5 100644
> >> --- a/include/linux/seccomp.h
> >> +++ b/include/linux/seccomp.h
> >> @@ -3,7 +3,8 @@
> >>
> >>  #include 
> >>
> >> -#define SECCOMP_FILTER_FLAG_MASK   (SECCOMP_FILTER_FLAG_TSYNC)
> >> +#define SECCOMP_FILTER_FLAG_MASK   (\
> >> +   SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
> >>
> >>  #ifdef CONFIG_SECCOMP
> >>
> >> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
> >> index 0f238a4..c29a423 100644
> >> --- a/include/uapi/linux/seccomp.h
> >> +++ b/include/uapi/linux/seccomp.h
> >> @@ -16,6 +16,7 @@
> >>
> >>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
> >>  #define SECCOMP_FILTER_FLAG_TSYNC  1
> >> +#define SECCOMP_FILTER_FLAG_EBPF   (1 << 1)
> >>
> >>  /*
> >>   * All BPF programs must return a 32-bit value.
> >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
> >> index a2c5b32..9c6bea6 100644
> >> --- a/kernel/seccomp.c
> >> +++ b/kernel/seccomp.c
> >> @@ -355,17 +355,6 @@ static struct seccomp_filter *seccomp_prepare_filter(struct sock_fprog *fprog)
> >>
> >> BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
> >>
> >> -   /*
> >> -* Installing a seccomp filter requires that the task has
> >> -* CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
> >> -* This avoids scenarios where unprivileged tasks can affect the
> >> -* behavior of privileged children.
> >> -*/
> >> -   if (!task_no_new_privs(current) &&
> >> -   security_capable_noaudit(current_cred(), current_user_ns(),
> >> -CAP_SYS_ADMIN) != 0)
> >> -   return ERR_PTR(-EACCES);
> >> -
> >> /* Allocate a new seccomp_filter */
> >> sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
> >> if (!sfilter)
> >> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int 
> >> reason)
> >> info.si_syscall = syscall;
> >> force_sig_info(SIGSYS, &info, current);
> >>  }
> >> +
> >> +#ifdef CONFIG_BPF_SYSCALL
> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user 
> >> *filter)
> >> +{
> >> +   /* XXX: this cast generates a warning. should we make people pass 
> >> in
> >> +* &fd, or is there some nicer way of doing this?
> >> +*/
> >> +   u32 fd = (u32) filter;
> > 
> > I think this is probably the right way to do it, modulo getting the
> > warning fixed. Let me invoke the great linux-api subscribers to get
> > some more opinions.
> 
> Sigh. It's sad, but using a cast does seem the simplest option.
> But, how about another idea...
> 
> > tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the
> > pointer argument into an fd argument. Is this sane, should it be a
> > pointer to an fd, or should it not be a flag at all, creating a new
> > seccomp command instead (SECCOMP_MODE_FILTER_EBPF)?
> 
> What about
> 
> seccomp(SECCOMP_MODE_FILTER_EBPF, flags, structp)
> 
> Where structp is a pointer to something like
> 
> struct seccomp_ebpf {
> int size;  /* Size of this whole struct */
> int fd;
> }
> 
> 'size' allows for future expansion of the struct (in case we want to 
> expand it later), and placing 'fd' inside a struct avoids unpleasant
> implication that would be made by passing a pointer to an fd as the
> third argument.
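
A minimal userspace sketch of how a size-prefixed struct like the one proposed above could be read so that later expansion stays possible. The struct layout is taken from the proposal; the helper read_seccomp_ebpf() and the "extra bytes must be zero" rule are assumptions for illustration, not an existing kernel interface.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout from the proposal above; not an existing kernel ABI. */
struct seccomp_ebpf {
	int size;	/* size of the whole struct, filled in by the caller */
	int fd;		/* eBPF program file descriptor */
};

/*
 * Hypothetical helper: read a caller-supplied struct that may be larger
 * (newer) than the version this code knows about.
 * Returns 0 on success or a negative errno value.
 */
static int read_seccomp_ebpf(const void *src, struct seccomp_ebpf *out)
{
	const unsigned char *bytes = src;
	int size;
	size_t i;

	memcpy(&size, bytes, sizeof(size));
	if (size < (int)sizeof(*out))
		return -EINVAL;		/* too small to hold the known fields */

	/* A newer caller may pass extra fields; accept them only if zero. */
	for (i = sizeof(*out); i < (size_t)size; i++)
		if (bytes[i] != 0)
			return -E2BIG;

	memcpy(out, bytes, sizeof(*out));
	return 0;
}
```

With this shape, an old reader keeps working with a newer, larger struct as long as the fields it does not understand are left at zero.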

I like this; although perhaps something like bpf() has, with the

Re: [PATCH net v3] ipv6: fix multipath route replace error recovery

2015-09-08 Thread roopa

On 9/8/15, 2:55 AM, Nicolas Dichtel wrote:

On 08/09/2015 02:42, roopa wrote:

On 9/6/15, 1:46 PM, Roopa Prabhu wrote:

From: Roopa Prabhu 

Problem:
The ecmp route replace support for ipv6 in the kernel deletes the
existing ecmp route too early, i.e. when it installs the first nexthop.
If there is an error in installing the subsequent nexthops, it's too late
to recover the already deleted existing route.

This patch fixes the problem with the following:
a) Changes the existing multipath route add code to a two stage process:
   build rt6_infos + insert them
   ip6_route_add rt6_info creation code is moved into
   ip6_route_info_create.
b) This ensures that all errors are caught during building rt6_infos
   and we fail early

The other way I have been thinking of solving the problem is to mark the
sibling routes being replaced with some state
...so they can be restored on error. Still figuring out a way to do this in a
clean and non-intrusive way.

If I'm not wrong, the only error which may result in an inconsistent list of
nexthops is ENOMEM (after your patch). I'm not sure it's worth adding too much
complexity to the code to handle this error.

yes, agreed. And that's the reason I went down the path presented in
the patch in context.
I was just reflecting back on the other possible implementations. Thanks
for the review.


Or maybe just save the sibling routes (rt6_infos) being replaced in a list and
re-insert them on error.

Yes, but we can also fail to re-insert the route.

ack.

posting v4 soon.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


PATCH: netdev: add a cast NLMSG_OK to avoid a GCC warning in users' code

2015-09-08 Thread D. Hugh Redelmeier
Using netlink.h's NLMSG_OK correctly will cause GCC to issue a warning
on systems with 32-bit userland.  The definition can easily be changed
to avoid this.

GCC's warning is to flag comparisons where C's implicit "Usual
Arithmetic Conversions" could lead to a surprising result.

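Consider this context:
    int i;
    unsigned u;
The C standard says that i < u is evaluated as (unsigned)i < u, which
can give a surprising result when i is negative.

The current definition of NLMSG_OK is:
#define NLMSG_OK(nlh,len) ((len) >= (int)sizeof(struct nlmsghdr) && \
   (nlh)->nlmsg_len >= sizeof(struct nlmsghdr) && \
   (nlh)->nlmsg_len <= (len))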

This comparison looks suspicious to GCC:
(nlh)->nlmsg_len <= (len)
nlmsg_len is of type __u32.  If int is 32 bits, this compare will be
done as
(nlh)->nlmsg_len <= (unsigned)(len)
We know that this is actually safe because the first conjunct
determined that len isn't negative, but GCC apparently doesn't know.

A change that would calm GCC and also be correct would be to add
a cast to unsigned:
(nlh)->nlmsg_len <= (unsigned)(len))
But I imagine that len might well actually have type ssize_t.  It is
often the result of a call to recvfrom(2), which returns a ssize_t.  So I
think that this would be safer:
(nlh)->nlmsg_len <= (size_t)(len))
I know of no system where size_t is narrower than unsigned.
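
A small self-contained sketch of the conversion issue described above. To avoid depending on the kernel headers it uses a stand-in header struct (struct msg_hdr) and a stand-in macro (MSG_OK); both names are made up for illustration, and the macro only mirrors the shape of the proposed NLMSG_OK.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct nlmsghdr; only the length field matters here. */
struct msg_hdr {
	uint32_t len;
};

/* Same shape as the proposed NLMSG_OK, with the explicit size_t cast. */
#define MSG_OK(h, n) ((n) >= (int)sizeof(struct msg_hdr) && \
		      (h)->len >= sizeof(struct msg_hdr) && \
		      (h)->len <= (size_t)(n))

/*
 * What the usual arithmetic conversions do implicitly to "i < u":
 * i is converted to unsigned first, so a negative i becomes huge.
 */
static int less_converted(int i, unsigned u)
{
	return (unsigned)i < u;
}

/* Comparison that treats a negative i as smaller than any unsigned. */
static int less_checked(int i, unsigned u)
{
	return i < 0 || (unsigned)i < u;
}
```

The cast in MSG_OK is safe for the same reason given in the text: the first conjunct has already rejected negative lengths before the cast is evaluated.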

This problem came up when building a userland component of libreswan in a 
32-bit environment with a recent GCC and was reported by Lennart Sorensen.

Signed-off-by: D. Hugh Redelmeier 

diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 6f3fe16..dd15537 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -86,7 +86,7 @@ struct nlmsghdr {
  (struct nlmsghdr*)(((char*)(nlh)) + 
NLMSG_ALIGN((nlh)->nlmsg_len)))
 #define NLMSG_OK(nlh,len) ((len) >= (int)sizeof(struct nlmsghdr) && \
   (nlh)->nlmsg_len >= sizeof(struct nlmsghdr) && \
-  (nlh)->nlmsg_len <= (len))
+  (nlh)->nlmsg_len <= (size_t)(len))
 #define NLMSG_PAYLOAD(nlh,len) ((nlh)->nlmsg_len - NLMSG_SPACE((len)))
 
 #define NLMSG_NOOP 0x1 /* Nothing. */


[PATCH mm] slab: implement bulking for SLAB allocator

2015-09-08 Thread Jesper Dangaard Brouer
Implement a basic approach of bulking in the slab allocator. Simply
use local_irq_{disable,enable} and call single alloc/free in a loop.
This simple implementation approach is surprisingly fast.

Notice the normal slab fastpath is: 96 cycles (24.119 ns). The table below
shows that single object bulking only takes 42 cycles.  This can be
explained by the bulk API's requirement to be called from a known
interrupt context, that is with interrupts enabled.  This allows us to
avoid the expensive (37 cycles) local_irq_{save,restore}, and instead
use the much faster (7 cycles) local_irq_{disable,enable}.
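
The structure described above can be sketched in userspace as follows. This is only an illustration: malloc()/free() stand in for the single-object slab calls, fake_irq_disable()/fake_irq_enable() are counters standing in for the interrupt toggles, and the rollback on a failed allocation is an assumption about what a bulk allocator would reasonably do.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-ins: count calls instead of touching interrupts. */
static int irq_toggles;
static void fake_irq_disable(void) { irq_toggles++; }
static void fake_irq_enable(void)  { irq_toggles++; }

/* Free 'size' objects with a single disable/enable pair around the loop. */
static void bulk_free(size_t size, void **p)
{
	size_t i;

	fake_irq_disable();
	for (i = 0; i < size; i++) {
		free(p[i]);
		p[i] = NULL;
	}
	fake_irq_enable();
}

/* Allocate 'size' objects; on failure, release what was obtained so far. */
static bool bulk_alloc(size_t obj_size, size_t size, void **p)
{
	size_t i;

	fake_irq_disable();
	for (i = 0; i < size; i++) {
		p[i] = malloc(obj_size);
		if (!p[i])
			break;
	}
	fake_irq_enable();

	if (i < size) {
		bulk_free(i, p);	/* assumed rollback policy */
		return false;
	}
	return true;
}
```

The point of the shape is that the cost of toggling interrupts is paid once per bulk call rather than once per object.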

Benchmarked[1] obj size 256 bytes on CPU i7-4790K @ 4.00GHz:

bulk - Current  - simple slab bulk implementation
  1 - 115 cycles(tsc) 28.812 ns - 42 cycles(tsc) 10.715 ns - improved 63.5%
  2 - 103 cycles(tsc) 25.956 ns - 27 cycles(tsc)  6.985 ns - improved 73.8%
  3 - 101 cycles(tsc) 25.336 ns - 22 cycles(tsc)  5.733 ns - improved 78.2%
  4 - 100 cycles(tsc) 25.147 ns - 21 cycles(tsc)  5.319 ns - improved 79.0%
  8 -  98 cycles(tsc) 24.616 ns - 18 cycles(tsc)  4.620 ns - improved 81.6%
 16 -  97 cycles(tsc) 24.408 ns - 17 cycles(tsc)  4.344 ns - improved 82.5%
 30 -  98 cycles(tsc) 24.641 ns - 16 cycles(tsc)  4.202 ns - improved 83.7%
 32 -  98 cycles(tsc) 24.607 ns - 16 cycles(tsc)  4.199 ns - improved 83.7%
 34 -  98 cycles(tsc) 24.605 ns - 18 cycles(tsc)  4.579 ns - improved 81.6%
 48 -  97 cycles(tsc) 24.463 ns - 17 cycles(tsc)  4.405 ns - improved 82.5%
 64 -  97 cycles(tsc) 24.370 ns - 17 cycles(tsc)  4.384 ns - improved 82.5%
128 -  99 cycles(tsc) 24.763 ns - 19 cycles(tsc)  4.755 ns - improved 80.8%
158 -  98 cycles(tsc) 24.708 ns - 18 cycles(tsc)  4.723 ns - improved 81.6%
250 - 101 cycles(tsc) 25.342 ns - 20 cycles(tsc)  5.035 ns - improved 80.2%

Also notice how well bulking maintains the performance when the bulk
size increases (which is a sore spot for the slub allocator).

Increasing the bulk size further:
 20 cycles(tsc)  5.214 ns (bulk: 512)
 30 cycles(tsc)  7.734 ns (bulk: 768)
 40 cycles(tsc) 10.244 ns (bulk:1024)
 72 cycles(tsc) 18.049 ns (bulk:2048)
 90 cycles(tsc) 22.585 ns (bulk:4096)

[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/mm/slab_bulk_test01.c

Signed-off-by: Jesper Dangaard Brouer 
---
 mm/slab.c |   87 +++--
 1 file changed, 62 insertions(+), 25 deletions(-)

diff --git a/mm/slab.c b/mm/slab.c
index d890750ec31e..0086b24210ad 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3234,11 +3234,15 @@ __do_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
 #endif /* CONFIG_NUMA */
 
 static __always_inline void *
-slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
+slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller,
+  bool irq_off_needed)
 {
unsigned long save_flags;
void *objp;
 
+   /* Compiler need to remove irq_off_needed branch statements */
+   BUILD_BUG_ON(!__builtin_constant_p(irq_off_needed));
+
flags &= gfp_allowed_mask;
 
lockdep_trace_alloc(flags);
@@ -3249,9 +3253,11 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, 
unsigned long caller)
cachep = memcg_kmem_get_cache(cachep, flags);
 
cache_alloc_debugcheck_before(cachep, flags);
-   local_irq_save(save_flags);
+   if (irq_off_needed)
+   local_irq_save(save_flags);
objp = __do_cache_alloc(cachep, flags);
-   local_irq_restore(save_flags);
+   if (irq_off_needed)
+   local_irq_restore(save_flags);
objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
kmemleak_alloc_recursive(objp, cachep->object_size, 1, cachep->flags,
 flags);
@@ -3407,7 +3413,7 @@ static inline void __cache_free(struct kmem_cache 
*cachep, void *objp,
  */
 void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t flags)
 {
-   void *ret = slab_alloc(cachep, flags, _RET_IP_);
+   void *ret = slab_alloc(cachep, flags, _RET_IP_, true);
 
trace_kmem_cache_alloc(_RET_IP_, ret,
   cachep->object_size, cachep->size, flags);
@@ -3416,16 +3422,23 @@ void *kmem_cache_alloc(struct kmem_cache *cachep, gfp_t 
flags)
 }
 EXPORT_SYMBOL(kmem_cache_alloc);
 
-void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
-{
-   __kmem_cache_free_bulk(s, size, p);
-}
-EXPORT_SYMBOL(kmem_cache_free_bulk);
-
+/* Note that interrupts must be enabled when calling this function. */
 bool kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
-   void **p)
+  void **p)
 {
-   return __kmem_cache_alloc_bulk(s, flags, size, p);
+   size_t i;
+
+   local_irq_disable();
+   for (i = 0; i < size; i++) {
+   void *x = p[i] = slab_alloc(s, flags, _RET_IP_, false);
+
+   if (!x) {
+

[PATCH v2] RDS: verify the underlying transport exists before creating a connection

2015-09-08 Thread Sasha Levin
There was no verification that an underlying transport exists when creating
a connection; this would cause a NULL pointer dereference.
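
A userspace sketch of the check-before-use pattern this kind of fix relies on. The ERR_PTR()/IS_ERR()/PTR_ERR() helpers below are simplified re-implementations for illustration, and struct transport, struct conn and conn_create() are hypothetical names, not the RDS code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace versions of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long err) { return (void *)err; }
static inline long PTR_ERR(const void *p) { return (long)p; }
static inline int IS_ERR(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical transport with one callback, and a connection using it. */
struct transport {
	int (*conn_alloc)(void *conn);
};

struct conn {
	struct transport *trans;
};

static int dummy_alloc(void *conn)
{
	(void)conn;
	return 0;
}

static struct conn *conn_create(struct transport *trans)
{
	struct conn *conn = calloc(1, sizeof(*conn));
	int ret;

	if (!conn)
		return ERR_PTR(-ENOMEM);

	/* Verify the transport exists before it is dereferenced below. */
	if (trans == NULL) {
		free(conn);
		return ERR_PTR(-ENODEV);
	}

	conn->trans = trans;
	ret = trans->conn_alloc(conn);
	if (ret) {
		free(conn);
		return ERR_PTR(ret);
	}
	return conn;
}
```

Callers then test the result with IS_ERR() instead of comparing against NULL.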

It might happen on sockets that weren't properly bound before attempting to
send a message, which will cause a NULL ptr deref:

[135546.047719] kasan: GPF could be caused by NULL-ptr deref or user memory 
accessgeneral protection fault:  [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[135546.051270] Modules linked in:
[135546.051781] CPU: 4 PID: 15650 Comm: trinity-c4 Not tainted 
4.2.0-next-20150902-sasha-00041-gbaa1222-dirty #2527
[135546.053217] task: 8800835bc000 ti: 8800bc708000 task.ti: 
8800bc708000
[135546.054291] RIP: __rds_conn_create (net/rds/connection.c:194)
[135546.055666] RSP: 0018:8800bc70fab0  EFLAGS: 00010202
[135546.056457] RAX: dc00 RBX: 0f2c RCX: 
8800835bc000
[135546.057494] RDX: 0007 RSI: 8800835bccd8 RDI: 
0038
[135546.058530] RBP: 8800bc70fb18 R08: 0001 R09: 

[135546.059556] R10: ed014d7a3a23 R11: ed014d7a3a21 R12: 

[135546.060614] R13: 0001 R14: 8801ec3d R15: 

[135546.061668] FS:  7faad4ffb700() GS:88025200() 
knlGS:
[135546.062836] CS:  0010 DS:  ES:  CR0: 8005003b
[135546.063682] CR2: 846a CR3: 9d137000 CR4: 
06a0
[135546.064723] Stack:
[135546.065048]  afe2055c afe23fc1 ed00493097bf 
8801ec3d0008
[135546.066247]   00d0  
ac194a24c0586342
[135546.067438]  1100178e1f78 880320581b00 8800bc70fdd0 
880320581b00
[135546.068629] Call Trace:
[135546.069028] ? __rds_conn_create (include/linux/rcupdate.h:856 
net/rds/connection.c:134)
[135546.069989] ? rds_message_copy_from_user (net/rds/message.c:298)
[135546.071021] rds_conn_create_outgoing (net/rds/connection.c:278)
[135546.071981] rds_sendmsg (net/rds/send.c:1058)
[135546.072858] ? perf_trace_lock (include/trace/events/lock.h:38)
[135546.073744] ? lockdep_init (kernel/locking/lockdep.c:3298)
[135546.074577] ? rds_send_drop_to (net/rds/send.c:976)
[135546.075508] ? __might_fault (./arch/x86/include/asm/current.h:14 
mm/memory.c:3795)
[135546.076349] ? __might_fault (mm/memory.c:3795)
[135546.077179] ? rds_send_drop_to (net/rds/send.c:976)
[135546.078114] sock_sendmsg (net/socket.c:611 net/socket.c:620)
[135546.078856] SYSC_sendto (net/socket.c:1657)
[135546.079596] ? SYSC_connect (net/socket.c:1628)
[135546.080510] ? trace_dump_stack (kernel/trace/trace.c:1926)
[135546.081397] ? ring_buffer_unlock_commit (kernel/trace/ring_buffer.c:2479 
kernel/trace/ring_buffer.c:2558 kernel/trace/ring_buffer.c:2674)
[135546.082390] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.083410] ? trace_event_raw_event_sys_enter 
(include/trace/events/syscalls.h:16)
[135546.084481] ? do_audit_syscall_entry (include/trace/events/syscalls.h:16)
[135546.085438] ? trace_buffer_unlock_commit (kernel/trace/trace.c:1749)
[135546.085515] rds_ib_laddr_check(): addr 36.74.25.172 ret -99 node type -1 


Acked-by: Santosh Shilimkar  
Signed-off-by: Sasha Levin 
---
 net/rds/connection.c |6 ++
 1 file changed, 6 insertions(+)

diff --git a/net/rds/connection.c b/net/rds/connection.c
index a50e652..0218d81 100644
--- a/net/rds/connection.c
+++ b/net/rds/connection.c
@@ -189,6 +189,12 @@ new_conn:
}
}
 
+   if (trans == NULL) {
+   kmem_cache_free(rds_conn_slab, conn);
+   conn = ERR_PTR(-ENODEV);
+   goto out;
+   }
+
conn->c_trans = trans;
 
ret = trans->conn_alloc(conn, gfp);
-- 
1.7.10.4



Re: linux-next: Tree for Sep 8 (netfilter build error)

2015-09-08 Thread Randy Dunlap
On 09/07/15 22:21, Stephen Rothwell wrote:
> Hi all,
> 
> Please do not add material for v4.4 until after v4.3-rc1 is out.
> 
> Changes since 20150903:
> 

on i386:

net/built-in.o: In function `nf_dup_ipv6':
(.text+0x16dfba): undefined reference to `nf_conntrack_untracked'
net/built-in.o: In function `nf_dup_ipv6':
(.text+0x16dfd7): undefined reference to `nf_conntrack_untracked'


Full randconfig file is attached.


-- 
~Randy
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.2.0 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_DMA_MAP_STATE=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_LAZY_GS=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=3
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_CONSTRUCTORS=y
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
CONFIG_KERNEL_GZIP=y
# CONFIG_KERNEL_BZIP2 is not set
# CONFIG_KERNEL_LZMA is not set
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
# CONFIG_POSIX_MQUEUE is not set
# CONFIG_CROSS_MEMORY_ATTACH is not set
CONFIG_FHANDLE=y
# CONFIG_USELIB is not set
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
CONFIG_AUDITSYSCALL=y
CONFIG_AUDIT_WATCH=y
CONFIG_AUDIT_TREE=y

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ_COMMON=y
# CONFIG_HZ_PERIODIC is not set
CONFIG_NO_HZ_IDLE=y
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set

#
# CPU/Task time and stats accounting
#
CONFIG_TICK_CPU_ACCOUNTING=y
# CONFIG_IRQ_TIME_ACCOUNTING is not set

#
# RCU Subsystem
#
CONFIG_TINY_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_SRCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
# CONFIG_TREE_RCU_TRACE is not set
CONFIG_RCU_KTHREAD_PRIO=0
# CONFIG_RCU_EXPEDITE_BOOT is not set
# CONFIG_BUILD_BIN2C is not set
# CONFIG_IKCONFIG is not set
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
CONFIG_CGROUP_FREEZER=y
# CONFIG_CGROUP_PIDS is not set
# CONFIG_CGROUP_DEVICE is not set
# CONFIG_CPUSETS is not set
# CONFIG_CGROUP_CPUACCT is not set
CONFIG_PAGE_COUNTER=y
# CONFIG_MEMCG is not set
CONFIG_CGROUP_HUGETLB=y
CONFIG_CGROUP_PERF=y
CONFIG_CGROUP_SCHED=y
CONFIG_FAIR_GROUP_SCHED=y
CONFIG_CFS_BANDWIDTH=y
CONFIG_RT_GROUP_SCHED=y
# CONFIG_CHECKPOINT_RESTORE is not set
CONFIG_SCHED_AUTOGROUP=y
CONFIG_RELAY=y
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
# CONFIG_LTO_MENU is not set
CONFIG_ANON_INODES=y
CONFIG_HAVE_UID16=y
CONFIG_SYSCTL_EXCEPTION_TRACE=y
CONFIG_HAVE_PCSPKR_PLATFORM=y
CONFIG_BPF=y
CONFIG_EXPERT=y
# CONFIG_MULTIUSER is not set
# CONFIG_SGETMASK_SYSCALL is not set
# CONFIG_SYSFS_SYSCALL is not set
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_PRINTK is not set
# CONFIG_BUG is not set
CONFIG_PCSPKR_PLATFORM=y
CONFIG_BASE_FULL=y
# CONFIG_FUTEX is not set
CONFIG_EPOLL=y
# CONFIG_SIGNALFD is not set
# CONFIG_TIMERFD is not set
# CONFIG_EVENTFD is not set
CONFIG_BPF_SYSCALL=y
# CONFIG_SHMEM is not set
# CONFIG_AIO is not set
CONFIG_ADVISE_SYSCALLS=y
CONFIG_USERFAULTFD=y
# CONFIG_MEMBARRIER is not set
CONFIG_EMBEDDED=y
CONFIG_HAVE_PERF_EVENTS=y

#
# Kernel Performance Events And Counters
#
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_COMPAT_BRK=y
# CO

Re: [PATCH mm] slab: implement bulking for SLAB allocator

2015-09-08 Thread Christoph Lameter
On Tue, 8 Sep 2015, Jesper Dangaard Brouer wrote:

> Also notice how well bulking maintains the performance when the bulk
> size increases (which is a sore spot for the slub allocator).

Well you are not actually completing the free action in SLAB. This is
simply queueing the item to be freed later. Also was this test done on a
NUMA system? Alien caches at some point come into the picture.


[PATCH net] ebpf: fix fd refcount leaks related to maps in bpf syscall

2015-09-08 Thread Daniel Borkmann
We may already have gotten a proper fd struct through fdget(), so
whenever we return at the end of a map operation, we need to call
fdput(). However, each map operation from syscall side first probes
CHECK_ATTR() to verify that unused fields in the bpf_attr union are
zero.

In case of malformed input, we return with error, but the lookup to
the map_fd was already performed at that time, so that we return
without a corresponding fdput(). Fix it by performing an fdget()
only right before bpf_map_get(). The fdget() invocation on maps in
the verifier is not affected.
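
The ordering problem described above can be shown with a tiny userspace model. Here get_ref()/put_ref() are made-up stand-ins for fdget()/fdput() that only adjust a counter, and attr_ok models the result of the CHECK_ATTR() validation; none of this is the actual bpf syscall code.

```c
#include <errno.h>

static int refs;	/* outstanding references, for illustration only */

static void get_ref(void) { refs++; }
static void put_ref(void) { refs--; }

/* Buggy ordering: the reference is taken before the input is validated. */
static int op_leaky(int attr_ok)
{
	get_ref();
	if (!attr_ok)
		return -EINVAL;	/* early return without put_ref(): leak */
	put_ref();
	return 0;
}

/* Fixed ordering: validate first, take the reference only when needed. */
static int op_fixed(int attr_ok)
{
	if (!attr_ok)
		return -EINVAL;
	get_ref();
	put_ref();
	return 0;
}
```

With the fixed ordering no early-return path can hold a reference, so no extra cleanup code is needed.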

Fixes: db20fd2b0108 ("bpf: add lookup/update/delete/iterate methods to BPF maps")
Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/syscall.c | 12 
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index dc9b464..35bac8e 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -155,14 +155,15 @@ static int map_lookup_elem(union bpf_attr *attr)
void __user *ukey = u64_to_ptr(attr->key);
void __user *uvalue = u64_to_ptr(attr->value);
int ufd = attr->map_fd;
-   struct fd f = fdget(ufd);
struct bpf_map *map;
void *key, *value, *ptr;
+   struct fd f;
int err;
 
if (CHECK_ATTR(BPF_MAP_LOOKUP_ELEM))
return -EINVAL;
 
+   f = fdget(ufd);
map = bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
@@ -213,14 +214,15 @@ static int map_update_elem(union bpf_attr *attr)
void __user *ukey = u64_to_ptr(attr->key);
void __user *uvalue = u64_to_ptr(attr->value);
int ufd = attr->map_fd;
-   struct fd f = fdget(ufd);
struct bpf_map *map;
void *key, *value;
+   struct fd f;
int err;
 
if (CHECK_ATTR(BPF_MAP_UPDATE_ELEM))
return -EINVAL;
 
+   f = fdget(ufd);
map = bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
@@ -265,14 +267,15 @@ static int map_delete_elem(union bpf_attr *attr)
 {
void __user *ukey = u64_to_ptr(attr->key);
int ufd = attr->map_fd;
-   struct fd f = fdget(ufd);
struct bpf_map *map;
+   struct fd f;
void *key;
int err;
 
if (CHECK_ATTR(BPF_MAP_DELETE_ELEM))
return -EINVAL;
 
+   f = fdget(ufd);
map = bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
@@ -305,14 +308,15 @@ static int map_get_next_key(union bpf_attr *attr)
void __user *ukey = u64_to_ptr(attr->key);
void __user *unext_key = u64_to_ptr(attr->next_key);
int ufd = attr->map_fd;
-   struct fd f = fdget(ufd);
struct bpf_map *map;
void *key, *next_key;
+   struct fd f;
int err;
 
if (CHECK_ATTR(BPF_MAP_GET_NEXT_KEY))
return -EINVAL;
 
+   f = fdget(ufd);
map = bpf_map_get(f);
if (IS_ERR(map))
return PTR_ERR(map);
-- 
1.9.3



Re: [PATCH mm] slab: implement bulking for SLAB allocator

2015-09-08 Thread Jesper Dangaard Brouer
On Tue, 8 Sep 2015 10:22:32 -0500 (CDT)
Christoph Lameter  wrote:

> On Tue, 8 Sep 2015, Jesper Dangaard Brouer wrote:
> 
> > Also notice how well bulking maintains the performance when the bulk
> > size increases (which is a sore spot for the slub allocator).
> 
> Well you are not actually completing the free action in SLAB. This is
> simply queueing the item to be freed later. Also was this test done on a
> NUMA system? Alien caches at some point come into the picture.

This test was a single CPU benchmark with no congestion or concurrency.
But the code was compiled with CONFIG_NUMA=y.

I don't know the slAb code very well, but the kmem_cache_node->list_lock
looks like a scalability issue.  I guess that is what you are referring
to ;-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Sr. Network Kernel Developer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [GIT] Networking

2015-09-08 Thread Rustad, Mark D
> On Sep 7, 2015, at 4:02 AM, David Laight  wrote:
> 
> Feed:
> int bar(int (*f)[10]) { return sizeof *f; }
> into cc -O2 -S and look at the generated code - returns 40 not 4.

Yes, indeed it does. And with clang too. I guess I was too easily discouraged 
when looking for a workable syntax some years ago. At the time I stopped when 
the typedef worked, which really just encapsulates this. I should have 
recognized that then. Thanks for making it all so clear.
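
A short sketch of the difference being discussed, for reference. The two function names are invented for this example; the first has the same parameter type as the bar() quoted above.

```c
#include <stddef.h>

/* Pointer to an array of 10 ints: the array type, and so its size, is kept. */
static size_t size_via_array_ptr(int (*f)[10])
{
	return sizeof *f;
}

/* Plain pointer to int: only the element type is visible to sizeof. */
static size_t size_via_elem_ptr(int *f)
{
	return sizeof *f;
}
```

A caller passes &array to the first form and array to the second; the first also makes the compiler check that the argument really is an array of 10 ints.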

--
Mark Rustad, Networking Division, Intel Corporation





[PATCH] alx: Add the Device ID for the Killer E2400 03:00.0 Ethernet controller: Qualcomm Atheros Device e0a1 (rev 10) Subsystem: Micro-Star International Co., Ltd. [MSI] Device 7977 Control: I/O+ M

2015-09-08 Thread Ben Pope
From: BenPope 

Signed-off-by: BenPope 
---
 drivers/net/ethernet/atheros/alx/main.c | 2 ++
 drivers/net/ethernet/atheros/alx/reg.h  | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/atheros/alx/main.c 
b/drivers/net/ethernet/atheros/alx/main.c
index c8af3ce..bd377a6 100644
--- a/drivers/net/ethernet/atheros/alx/main.c
+++ b/drivers/net/ethernet/atheros/alx/main.c
@@ -1534,6 +1534,8 @@ static const struct pci_device_id alx_pci_tbl[] = {
  .driver_data = ALX_DEV_QUIRK_MSI_INTX_DISABLE_BUG },
{ PCI_VDEVICE(ATTANSIC, ALX_DEV_ID_E2200),
  .driver_data = ALX_DEV_QUIRK_MSI_INTX_DISABLE_BUG },
+   { PCI_VDEVICE(ATTANSIC, ALX_DEV_ID_E2400),
+ .driver_data = ALX_DEV_QUIRK_MSI_INTX_DISABLE_BUG },
{ PCI_VDEVICE(ATTANSIC, ALX_DEV_ID_AR8162),
  .driver_data = ALX_DEV_QUIRK_MSI_INTX_DISABLE_BUG },
{ PCI_VDEVICE(ATTANSIC, ALX_DEV_ID_AR8171) },
diff --git a/drivers/net/ethernet/atheros/alx/reg.h 
b/drivers/net/ethernet/atheros/alx/reg.h
index af006b4..0959e68 100644
--- a/drivers/net/ethernet/atheros/alx/reg.h
+++ b/drivers/net/ethernet/atheros/alx/reg.h
@@ -37,6 +37,7 @@
 
 #define ALX_DEV_ID_AR8161  0x1091
 #define ALX_DEV_ID_E2200   0xe091
+#define ALX_DEV_ID_E2400   0xe0a1
 #define ALX_DEV_ID_AR8162  0x1090
 #define ALX_DEV_ID_AR8171  0x10A1
 #define ALX_DEV_ID_AR8172  0x10A0
-- 
2.1.4



Re: _DSD standardization note (WAS: Re: [PATCH 2/2] net, thunder, bgx: Add support for ACPI binding.)

2015-09-08 Thread David Daney

On 09/05/2015 01:00 PM, Jon Masters wrote:

Following up on this thread after finally seeing it...figured I would
send something just for the archive mainly (we discussed this in person
recently at a few different events and I think are aligned already).

On 08/07/2015 08:28 PM, Rafael J. Wysocki wrote:

Hi David,

On Sat, Aug 8, 2015 at 2:11 AM, David Daney  wrote:

On 08/07/2015 05:05 PM, Rafael J. Wysocki wrote:


[cut]



It is actually useful to people as far as I can say.

Also, if somebody is going to use properties with ACPI, why would
they use a different set of properties with DT?


Generally speaking, if there's a net new thing to handle, there is of
course no particular problem with using DT as an inspiration, but we
need to be conscious of the fact that Linux isn't the only Operating
System that may need to support these bindings, so the correct thing (as
stated by many of you, and below, and in person also recently - so we
are aligned) is to get this (the MAC address binding for _DSD in ACPI)
standardized properly through UEFI where everyone who has a vested OS
interest beyond Linux can also have their own involvement as well. As
discussed, that doesn't make it part of ACPI6.0, just a binding.

FWIW I made a decision on the Red Hat end that in our guidelines to
partners for ARM RHEL(SA - Server for ARM) builds we would not generally
endorse any use of _DSD, with the exception of the MAC address binding
being discussed here. In that case, I realized I had not been fully
prescriptive enough with the vendors building early hw in that I should
have realized this would happen and have required that they do this the
right way. MAC IP should be more sophisticated such that it can handle
being reset even after the firmware has loaded its MAC address(es).
Platform flash storage separate from UEFI variable storage (which is
being abused to contain too much now that DXE drivers then load into the
ACPI tables prior to exiting Boot Services, etc.) should contain the
actual MAC address(es), as it is done on other arches, and it should not
be necessary to communicate this via ACPI tables to begin with (that's a
cost saving embedded concept that should not happen on server systems in
the general case). I have already had several unannounced future designs
adjusted in light of this _DSD usage.

In the case of providing MAC address information (only) by _DSD, I
connected the initial ARMv8 SoC silicon vendors who needed to use such a
hack to ensure they were using the same properties, and will follow up
off list to ensure Cavium are looped into that. But, we do need to get
the _DSD property for MAC address provision standardized through UEFI
properly as an official binding rather than just a link on the website,
and then we need to be extremely careful not to grow any further
dependence upon _DSD elsewhere. Generally, if you're using that approach
on a server system (other than for this MAC case), your firmware or
design (or both) need to be modified to not use _DSD.


I think we need to be cognizant of the fact that ARMv8 SoCs do contain,
and in the future will still contain, many different hardware bus
interface devices.  These include I2C, MDIO, GPIO, and xMII network MAC
interfaces (x in {,SG,RGM, etc}).  In the context of network interfaces
these are often used in conjunction with stand-alone PHY devices.


A network driver on such a system must know several things that cannot 
be probed, and thus must be communicated by the firmware:


 - PHY type/model-number.

 - PHY management channel (MDIO/I2C + management bus address)

 - PHY interrupt connection, if any, (Often a GPIO pin).

 - SFP eeprom location (Which I2C bus is it on).

On x86 systems, all those things were placed on a PCI NIC, and the 
configuration could be identified in a stand alone manner by the NIC 
driver, so everything was simple.


For SoC based systems, I don't see a better alternative than to expose 
the topology via firmware tables.  In the case of OF device-tree, this 
is done in a standard manner with "phy-handle" and "interrupts" 
properties utilizing phandle links to traverse branches of the device tree.


I am not an ACPI guru, so I don't know for certain the best way to 
express this in ACPI, but it seems like _DSD may have to be involved.


David Daney




Re: [PATCH mm] slab: implement bulking for SLAB allocator

2015-09-08 Thread Christoph Lameter
On Tue, 8 Sep 2015, Jesper Dangaard Brouer wrote:

> This test was a single CPU benchmark with no congestion or concurrency.
> But the code was compiled with CONFIG_NUMA=y.
>
> I don't know the slAb code very well, but the kmem_cache_node->list_lock
> looks like a scalability issue.  I guess that is what you are referring
> to ;-)

That lock can be mitigated like in SLUB by increasing per cpu resources.
The problem in SLAB is the categorization of objects on free as to which
node they came from and the use of arrays of pointers to avoid freeing the
object to the object tracking metadata structures in the slab page.

The arrays of pointers have to be replicated for each node, each slab and
each processor.





Re: [RFC PATCH 0/3] Network stack, first user of SLAB/kmem_cache bulk free API.

2015-09-08 Thread Christoph Lameter
On Sat, 5 Sep 2015, Jesper Dangaard Brouer wrote:

> The double_cmpxchg without lock prefix still cost 9 cycles, which is
> very fast but still a cost (add approx 19 cycles for a lock prefix).
>
> It is slower than local_irq_disable + local_irq_enable that only cost
> 7 cycles, which the bulking call uses.  (That is the reason bulk calls
> with 1 object can almost compete with fastpath).

Hmmm... Guess we need to come up with distinct versions of kmalloc() for
irq and non-irq contexts to take advantage of that. Most calls are made in
non-irq context anyway.



[PATCH net v4] ipv6: fix multipath route replace error recovery

2015-09-08 Thread Roopa Prabhu
From: Roopa Prabhu 

Problem:
The ecmp route replace support for ipv6 in the kernel deletes the
existing ecmp route too early, i.e. when it installs the first nexthop.
If there is an error in installing the subsequent nexthops, it's too late
to recover the already deleted existing route, leaving the fib
in an inconsistent state.

This patch reduces the possibility of this by doing the following:
a) Changes the existing multipath route add code to a two stage process:
  build rt6_infos + insert them
ip6_route_add rt6_info creation code is moved into
ip6_route_info_create.
b) This ensures that most errors are caught during building rt6_infos
  and we fail early
c) Separates multipath add and del code. Because add needs the special
  two stage mode in a) and delete essentially does not care.
d) In any event if the code fails during inserting a route again, a
  warning is printed (This should be unlikely)
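
For illustration only, here is a minimal userspace sketch of the two-stage
idea in a) and b): build and validate the complete new nexthop set first,
and touch the live route only once that has succeeded. This is not the
kernel code. The names struct route, build_nexthops() and replace_route()
are hypothetical, and placing the duplicate check in the build stage is an
assumption made to keep the sketch short.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_NH 8	/* hypothetical limit for this sketch */
#define GW_LEN 40	/* room for a textual IPv6 address */

struct nexthop {
	char gw[GW_LEN];
};

struct route {
	struct nexthop nh[MAX_NH];
	size_t n;
};

/* Stage 1: validate and build the new nexthop set; never touches the live route. */
static int build_nexthops(const char *const *gws, size_t n, struct route *out)
{
	size_t i, j;

	assert(out != NULL);
	if (n == 0 || n > MAX_NH)
		return -EINVAL;

	memset(out, 0, sizeof(*out));
	for (i = 0; i < n; i++) {
		if (strlen(gws[i]) >= GW_LEN)
			return -EINVAL;
		/* Reject a nexthop that is already part of the new set. */
		for (j = 0; j < out->n; j++)
			if (strcmp(out->nh[j].gw, gws[i]) == 0)
				return -EEXIST;
		strcpy(out->nh[out->n].gw, gws[i]);
		out->n++;
	}
	return 0;
}

/* Stage 2: swap in the staged set only after stage 1 has succeeded. */
static int replace_route(struct route *live, const char *const *gws, size_t n)
{
	struct route staged;
	int err;

	err = build_nexthops(gws, n, &staged);
	if (err)
		return err;	/* fail early: the live route is untouched */

	*live = staged;
	return 0;
}
```

With this shape, a replace request that carries a duplicate nexthop returns
-EEXIST and leaves the previously installed set as it was.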

Before the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
/* previously added ecmp route 3000:1000:1000:1000::2 disappears from
 * kernel */

After the patch:
$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

/* Try replacing the route with a duplicate nexthop */
$ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
RTNETLINK answers: File exists

$ip -6 route show
3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Roopa Prabhu 

v1 - v2 : fix leak
v2 - v3: fix 'Fixes' tag and warn msg (feedback from nicolas)
 resending against net
v3 - v4: reword warn msg (feedback from nicolas). I still print the
 nexthops in the warning to help user know the offending
 route replace. The msg is printed for each nexthop which I
 think should be ok because this is consistent with all other cases
 (notifications etc) where IPV6 multipath nexthops are
 treated as individual routes and this warn should be very
 unlikely.
---
 net/ipv6/route.c | 201 ---
 1 file changed, 175 insertions(+), 26 deletions(-)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index f45cac6..34539d3 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1748,7 +1748,7 @@ static int ip6_convert_metrics(struct mx6_config *mxc,
return -EINVAL;
 }
 
-int ip6_route_add(struct fib6_config *cfg)
+int ip6_route_info_create(struct fib6_config *cfg, struct rt6_info **rt_ret)
 {
int err;
struct net *net = cfg->fc_nlinfo.nl_net;
@@ -1756,7 +1756,6 @@ int ip6_route_add(struct fib6_config *cfg)
struct net_device *dev = NULL;
struct inet6_dev *idev = NULL;
struct fib6_table *table;
-   struct mx6_config mxc = { .mx = NULL, };
int addr_type;
 
if (cfg->fc_dst_len > 128 || cfg->fc_src_len > 128)
@@ -1981,6 +1980,32 @@ install_route:
 
cfg->fc_nlinfo.nl_net = dev_net(dev);
 
+   *rt_ret = rt;
+
+   return 0;
+out:
+   if (dev)
+   dev_put(dev);
+   if (idev)
+   in6_dev_put(idev);
+   if (rt)
+   dst_free(&rt->dst);
+
+   *rt_ret = NULL;
+
+   return err;
+}
+
+int ip6_route_add(struct fib6_config *cfg)
+{
+   struct mx6_config mxc = { .mx = NULL, };
+   struct rt6_info *rt = NULL;
+   int err;
+
+   err = ip6_route_info_create(cfg, &rt);
+   if (err)
+   goto out;
+
err = ip6_convert_metrics(&mxc, cfg);
if (err)
goto out;
@@ -1988,14 +2013,12 @@ install_route:
err = __ip6_ins_rt(rt, &cfg->fc_nlinfo, &mxc);
 
kfree(mxc.mx);
+
return err;
 out:
-   if (dev)
-   dev_put(dev);
-   if (idev)
-   in6_dev_put(idev);
if (rt)
dst_free(&rt->dst);
+
return err;
 }
 
@@ -2776,19 +2799,78 @@ errout:
return err;
 }
 
-static int ip6_route_multipath(struct fib6_config *cfg, int add)
+struct rt6_nh {
+   struct rt6_info *rt6_info;
+   struct fib6_config r_cfg;
+

Re: [PATCH v3] stmmac: fix check for phydev being open

2015-09-08 Thread Sergei Shtylyov

Hello.

On 09/08/2015 03:46 PM, Alexey Brodkin wrote:


Current check of phydev with IS_ERR(phydev) may not make much sense
because of_phy_connect() returns NULL on failure instead of an error value.

Still, for checking the result of phy_connect(), IS_ERR() makes perfect sense.

So let's use the combined check IS_ERR_OR_NULL(), which covers both cases.

Cc: Sergei Shtylyov 
Cc: Giuseppe Cavallaro 
Cc: linux-ker...@vger.kernel.org
Cc: sta...@vger.kernel.org
Cc: David Miller 
Signed-off-by: Alexey Brodkin 
---

Changes compared to v2:
   * Updated commit message with mention of of_phy_connect() instead of
 of_phy_attach().
   * Return ENODEV in case of of_phy_connect() failure

Changes compared to v1:
   * Use IS_ERR_OR_NULL() instead of discrete checks for null and err

   drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 7 +--
   1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 864b476..e2c9c86 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -837,9 +837,12 @@ static int stmmac_init_phy(struct net_device *dev)
 interface);
}

-   if (IS_ERR(phydev)) {
+   if (IS_ERR_OR_NULL(phydev)) {
pr_err("%s: Could not attach to PHY\n", dev->name);
-   return PTR_ERR(phydev);
+   if (!phydev)
+   return -ENODEV;
+   else
+   return PTR_ERR(phydev);


 Don't need *else* after *return* and scripts/checkpatch.pl should have
complained about that.


./scripts/checkpatch.pl 0001-stmmac-fix-check-for-phydev-being-open.patch
total: 0 errors, 0 warnings, 0 checks, 14 lines checked


   Hm... I bet I saw such warning from checkpatch.pl recently (it was a false 
positive though, so maybe the check was removed recently, not sure). Your 
patch is clean indeed, however my comment is still valid.
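
A minimal userspace sketch of the check being discussed, written without
the *else* after *return*. The ERR_PTR()/PTR_ERR()/IS_ERR()/IS_ERR_OR_NULL()
helpers below are simplified re-implementations for illustration, not the
kernel's definitions, and check_phydev() is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* The casts below assume long and pointers have the same size (true on Linux). */
static_assert(sizeof(long) == sizeof(void *), "sketch assumes LP64/ILP32");

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* An "error pointer" lives in the last MAX_ERRNO bytes of the address space. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/* Hypothetical stand-in for the error path in stmmac_init_phy(). */
static int check_phydev(const void *phydev)
{
	if (IS_ERR_OR_NULL(phydev)) {
		if (!phydev)
			return -ENODEV;		/* NULL: no error code to decode */
		return (int)PTR_ERR(phydev);	/* no else needed after return */
	}
	return 0;
}
```

Behaviour is the same as the if/else form in the patch; only the control
flow is flattened.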



-Alexey


MBR, Sergei



Re: [PATCH net v4] ipv6: fix multipath route replace error recovery

2015-09-08 Thread Nikolay Aleksandrov
On 09/08/2015 07:53 PM, Roopa Prabhu wrote:
> From: Roopa Prabhu 
> 
> Problem:
> The ecmp route replace support for ipv6 in the kernel deletes the
> existing ecmp route too early, i.e. when it installs the first nexthop.
> If there is an error in installing the subsequent nexthops, it's too late
> to recover the already deleted existing route, leaving the fib
> in an inconsistent state.
> 
> This patch reduces the possibility of this by doing the following:
> a) Changes the existing multipath route add code to a two stage process:
>   build rt6_infos + insert them
>   ip6_route_add rt6_info creation code is moved into
>   ip6_route_info_create.
> b) This ensures that most errors are caught during building rt6_infos
>   and we fail early
> c) Separates multipath add and del code. Because add needs the special
>   two stage mode in a) and delete essentially does not care.
> d) In any event if the code fails during inserting a route again, a
>   warning is printed (This should be unlikely)
> 
> Before the patch:
> $ip -6 route show
> 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
> 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
> 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
> 
> /* Try replacing the route with a duplicate nexthop */
> $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
> fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
> swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
> RTNETLINK answers: File exists
> 
> $ip -6 route show
> /* previously added ecmp route 3000:1000:1000:1000::2 disappears from
>  * kernel */
> 
> After the patch:
> $ip -6 route show
> 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
> 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
> 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
> 
> /* Try replacing the route with a duplicate nexthop */
> $ip -6 route change 3000:1000:1000:1000::2/128 nexthop via
> fe80::202:ff:fe00:b dev swp49s0 nexthop via fe80::202:ff:fe00:d dev
> swp49s1 nexthop via fe80::202:ff:fe00:d dev swp49s1
> RTNETLINK answers: File exists
> 
> $ip -6 route show
> 3000:1000:1000:1000::2 via fe80::202:ff:fe00:b dev swp49s0 metric 1024
> 3000:1000:1000:1000::2 via fe80::202:ff:fe00:d dev swp49s1 metric 1024
> 3000:1000:1000:1000::2 via fe80::202:ff:fe00:f dev swp49s2 metric 1024
> 
> Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
> Signed-off-by: Roopa Prabhu 
> 
> v1 - v2 : fix leak
> v2 - v3: fix 'Fixes' tag and warn msg (feedback from nicolas)
>  resending against net
> v3 - v4: reword warn msg (feedback from nicolas). I still print the
>  nexthops in the warning to help user know the offending
>  route replace. The msg is printed for each nexthop which I
>  think should be ok because this is consistent with all other cases
>  (notifications etc) where IPV6 multipath nexthops are
>  treated as individual routes and this warn should be very
>  unlikely.
> ---
>  net/ipv6/route.c | 201 
> ---
>  1 file changed, 175 insertions(+), 26 deletions(-)
> 

I went over it and also ran a few tests with the change, IMO printing
the offending entry is helpful to analyze the problem.
FWIW,

Reviewed-by: Nikolay Aleksandrov 




Re: [net-next PATCH v3] drivers: net: cpsw: Add support to drive gpios for ethernet to be functional

2015-09-08 Thread Tony Lindgren
* Mugunthan V N  [150907 02:50]:
> In DRA72x EVM, by default slave 1 is connected to the onboard
> phy, but slave 2 pins are also muxed with the video input module,
> which is controlled by a pcf857x gpio. Currently gpio hogging is
> used to select slave 0 to connect to the phy, but with
> omap2plus_defconfig the pcf857x gpio driver is built as a module.
> So when using NFS on DRA72x EVM, the board doesn't boot: gpio
> hogging does not set the proper gpio state to connect slave 0 to
> the phy because the driver is built as a module. No error is
> reported for the unset gpio; the only symptom is a message that
> no dhcp reply was received.
> 
> To solve this issue, introduce "mode-gpios" in DT for cases where
> gpio based muxing is required. The driver throws a warning when
> gpio get fails and returns probe defer. When the gpio-pcf857x module
> is installed, cpsw probes again and ethernet becomes functional.
> Verified this on DRA72x with pcf as module and ramdisk.
> 
> Signed-off-by: Mugunthan V N 

Acked-by: Tony Lindgren 

> ---
> 
> Changes from v2:
> * Used mode-gpios, so that the driver is generic enough to handle
>   multiple gpios
> 
> This patch is tested on DRA72x, Logs [1] and pushed a branch [2]
> 
> [1]: http://pastebin.ubuntu.com/12306224/
> [2]: git://git.ti.com/~mugunthanvnm/ti-linux-kernel/linux.git 
> cpsw-gpio-optional-v3
> 
> ---
>  Documentation/devicetree/bindings/net/cpsw.txt | 7 +++
>  drivers/net/ethernet/ti/cpsw.c | 9 +
>  2 files changed, 16 insertions(+)
> 
> diff --git a/Documentation/devicetree/bindings/net/cpsw.txt 
> b/Documentation/devicetree/bindings/net/cpsw.txt
> index a9df21a..676ecf6 100644
> --- a/Documentation/devicetree/bindings/net/cpsw.txt
> +++ b/Documentation/devicetree/bindings/net/cpsw.txt
> @@ -30,6 +30,13 @@ Optional properties:
>  - dual_emac  : Specifies Switch to act as Dual EMAC
>  - syscon : Phandle to the system control device node, which is
> the control module device of the am33x
> +- mode-gpios : Should be added if one/multiple gpio lines are
> +   required to be driven so that cpsw data lines
> +   can be connected to the phy via selective mux.
> +   For example in dra72x-evm, pcf gpio has to be
> +   driven low so that cpsw slave 0 and phy data
> +   lines are connected via mux.
> +
>  
>  Slave Properties:
>  Required properties:
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index 8fc90f1..c670317 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -2207,6 +2208,7 @@ static int cpsw_probe(struct platform_device *pdev)
>   void __iomem*ss_regs;
>   struct resource *res, *ss_res;
>   const struct of_device_id   *of_id;
> + struct gpio_descs   *mode;
>   u32 slave_offset, sliver_offset, slave_size;
>   int ret = 0, i;
>   int irq;
> @@ -2232,6 +2234,13 @@ static int cpsw_probe(struct platform_device *pdev)
>   goto clean_ndev_ret;
>   }
>  
> + mode = devm_gpiod_get_array_optional(&pdev->dev, "mode", GPIOD_OUT_LOW);
> + if (IS_ERR(mode)) {
> + ret = PTR_ERR(mode);
> + dev_err(&pdev->dev, "gpio request failed, ret %d\n", ret);
> + goto clean_ndev_ret;
> + }
> +
>   /*
>* This may be required here for child devices.
>*/
> -- 
> 2.6.0.rc0.24.gec371ff
> 


Re: [PATCH] usbnet: Fix a race between usbnet_stop() and the BH

2015-09-08 Thread David Miller
From: Eugene Shatokhin 
Date: Tue,  1 Sep 2015 17:05:33 +0300

> The race may happen when a device (e.g. YOTA 4G LTE Modem) is
> unplugged while the system is downloading a large file from the Net.
> 
> Hardware breakpoints and Kprobes with delays were used to confirm that
> the race does actually happen.
> 
> The race is on skb_queue ('next' pointer) between usbnet_stop()
> and rx_complete(), which, in turn, calls usbnet_bh().
> 
> Here is a part of the call stack with the code where the changes to the
> queue happen. The line numbers are for the kernel 4.1.0:
 ...
> As a result, it is possible, for example, that the skb is removed from
> dev->rxq by __skb_unlink() before the check
> "!skb_queue_empty(&dev->rxq)" in usbnet_terminate_urbs() is made. It is
> also possible in this case that the skb is added to dev->done queue
> after "!skb_queue_empty(&dev->done)" is checked. So
> usbnet_terminate_urbs() may stop waiting and return while dev->done
> queue still has an item.
> 
> Locking in defer_bh() and usbnet_terminate_urbs() was revisited to avoid
> this race.
> 
> Signed-off-by: Eugene Shatokhin 

Applied, thanks.


Re: [PATCH 2/3] net: irda: pxaficp_ir: convert to readl and writel

2015-09-08 Thread Petr Cvek
On 3.9.2015 at 08:20, Robert Jarzmik wrote:
> Convert the pxa IRDA driver to readl and writel primitives, and remove
> another set of direct registers access. This leaves only the DMA
> registers access, which will be dealt with dmaengine conversion.

Tested on magician (non-vanilla, but there should not be any collision).

>  
> - err = request_mem_region(__PREG(STUART), 0x24, "IrDA") ? 0 : -EBUSY;
> - if (err)
> - goto err_mem_1;
> + res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> + ficp = devm_ioremap_resource(&pdev->dev, res);
> + if (IS_ERR(ficp)) {
> + dev_err(&pdev->dev, "resource ficp not defined\n");

Fails around here with:

[ 4245.368764] pxa2xx-ir pxa2xx-ir: invalid resource
[ 4245.369191] pxa2xx-ir pxa2xx-ir: resource ficp not defined
[ 4245.369364] pxa2xx-ir: probe of pxa2xx-ir failed with error -22

Did you define the resources somewhere? The actual resources are in the
"pxa_ir_resources" variable at:

http://lxr.free-electrons.com/source/arch/arm/mach-pxa/devices.c#L386

Or should this pdata be moved into the specific machine files?

Cheers,
Petr


Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.

2015-09-08 Thread Corinna Vinschen
On Sep  8 02:02, Francois Romieu wrote:
> Francois Romieu  :
> [...]
> > Updated patch is on the way.
> 
> Fixed memcpy in patch 0001, moved counters allocation from open() 
> to probe(), returned open() to its original state but something is
> still wrong: the link does not come up.

I tested and debugged the attached patches.  Just as you noticed, the
interfaces (my test machine has two) don't come up at boot time and
subsequently I can also reproduce two kinds of crashes:

- Calling `ip link ... up' crashes the kernel in rtl_open like this:

[  138.031190]  [] dump_stack+0x44/0x55
[  138.036311]  [] __setup_irq+0x515/0x580
[  138.041693]  [] ? rtl8169_gset_xmii+0x20/0x20 [r8169]
[  138.048284]  [] request_threaded_irq+0xf4/0x1a0
[  138.054357]  [] rtl_open+0x3a7/0xab4 [r8169]
[...]

- Alternatively I can still reproduce the SEGV in rtl_remove_one
  when trying to rmmod the module, I just don't have the stack dump
  handy while writing this mail.  I can show it if needed.

I debugged this on and off the entire day (tweaking, compiling, rebooting,
kernel crash, rinse and repeat).

And the result of my debugging is totally crazy:

If I disable the call to rtl_init_counter_offsets in rtl_open, as in

  #if 0
retval = rtl_init_counter_offsets(dev);
if (retval < 0)
netif_warn(tp, hw, dev, "counter reset/update failed\n");
  #endif

the interfaces come up just fine.

If I reenable the rtl_init_counter_offsets call in rtl_open, and reduce
the rtl_init_counter_offsets function to just this:

  static int rtl_init_counter_offsets(struct net_device *dev)
  {
  return 1;
  }

then the interfaces refuse to come up, and a subsequent `ip link ... up'
crashes the kernel.

No, I do not understand this :(


Corinna




Re: [PATCH net-next] net: Remove VRF change to udp_sendmsg

2015-09-08 Thread David Miller
From: David Ahern 
Date: Mon, 31 Aug 2015 14:44:46 -0600

> If anything I should be going straight to fib_table_lookup in the VRF
> driver for this new lookup to get the source address. It knows the
> exact table that should be used and hence can avoid the rules walk +
> local table miss which happens using the ip_route_x functions as
> well as the rth lookup/create which is not needed here.

Definitely if this is the case you should do a direct fib_table_lookup().


Re: [PATCH net-next 1/3 v7] net: track link-status of ipv4 nexthops

2015-09-08 Thread Julian Anastasov

Hello,

On Tue, 23 Jun 2015, Andy Gospodarek wrote:

> Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
> reachable via an interface where carrier is off.  No action is taken,
> but additional flags are passed to userspace to indicate carrier status.
> 
> This also includes a cleanup to fib_disable_ip to more clearly indicate
> what event made the function call to replace the more cryptic force
> option previously used.
> 

> diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c
> index 872494e..534eb14 100644
> --- a/net/ipv4/fib_frontend.c
> +++ b/net/ipv4/fib_frontend.c

> @@ -1093,7 +1093,7 @@ static int fib_inetaddr_event(struct notifier_block 
> *this, unsigned long event,
>   /* Last address was deleted from this interface.
>* Disable IP.
>*/
> - fib_disable_ip(dev, 1);
> + fib_disable_ip(dev, event);

NETDEV_DOWN for "inetaddr" event is used instead of force=1 ...

>   case NETDEV_DOWN:
> - fib_disable_ip(dev, 0);
> + fib_disable_ip(dev, event);

Oops, NETDEV_DOWN for a different "netdev" event is
used this time, for force=0 ...

> diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c
> index 28ec3c1..b1b305b 100644
> --- a/net/ipv4/fib_semantics.c
> +++ b/net/ipv4/fib_semantics.c

> @@ -1112,7 +1124,8 @@ int fib_sync_down_dev(struct net_device *dev, int force)
>   struct hlist_head *head = &fib_info_devhash[hash];
>   struct fib_nh *nh;
>  
> - if (force)
> + if (event == NETDEV_UNREGISTER ||
> + event == NETDEV_DOWN)

Wrong, both kinds of NETDEV_DOWN events set scope = -1.
Maybe this leads to removal of RT_SCOPE_NOWHERE NHs when the
"netdev" event NETDEV_DOWN comes. Maybe we still need a
flag to know if the NETDEV_DOWN event was for force=1 and
to check it again above? Or maybe use the "force" flag again?

>   scope = -1;
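
To make the concern concrete, here is a small userspace sketch. The
constants are stand-ins rather than the kernel's definitions, all three
function names are hypothetical, and the third function is only one
possible shape of a fix, not a proposal for the actual patch.

```c
#include <assert.h>

/* Stand-in values for this sketch; only their distinctness matters. */
enum { EV_DOWN = 2, EV_UNREGISTER = 6 };
#define SCOPE_NOWHERE 255	/* stand-in for RT_SCOPE_NOWHERE */

static_assert(SCOPE_NOWHERE != -1, "the two scope values must differ");

/* Old interface: the caller says explicitly whether to force. */
static int scope_from_force(int force)
{
	return force ? -1 : SCOPE_NOWHERE;
}

/*
 * Interface as in the quoted hunk: scope is derived from the event alone,
 * so a DOWN from the "netdev" chain and a DOWN from the "inetaddr" chain
 * cannot be told apart.
 */
static int scope_from_event(unsigned long event)
{
	if (event == EV_UNREGISTER || event == EV_DOWN)
		return -1;
	return SCOPE_NOWHERE;
}

/* One hypothetical fix: keep the event, but pass the force decision along. */
static int scope_from_event_and_force(unsigned long event, int force)
{
	if (event == EV_UNREGISTER || force)
		return -1;
	return SCOPE_NOWHERE;
}
```

In the old scheme a "netdev" NETDEV_DOWN (force=0) kept the scope at
RT_SCOPE_NOWHERE, while the event-only version returns -1 for it.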

Regards

--
Julian Anastasov 


[PATCH net] bpf: fix out of bounds access in verifier log

2015-09-08 Thread Alexei Starovoitov
When the verifier log is enabled, print_bpf_insn() does
bpf_alu_string[BPF_OP(insn->code) >> 4]
and
bpf_jmp_string[BPF_OP(insn->code) >> 4]
where BPF_OP is a 4-bit instruction opcode.
Malformed insns can cause out-of-bounds access.
Fix it by sizing the arrays appropriately.

The bug was found by clang address sanitizer with libfuzzer.

Reported-by: Yonghong Song 
Signed-off-by: Alexei Starovoitov 
---
fyi sanitizer error looks like:
...
 27 invalid dst register in STX OK
 28 invalid dst register in ST OK
 29 invalid src register in LDX OK
 30 invalid dst register in LDX OK
 31 junk insn OK
 32 junk insn2 OK
=
==52730==ERROR: AddressSanitizer: global-buffer-overflow on address 
0x00500c58
READ of size 8 at 0x00500c58 thread T0
#0 0x4e480b in print_bpf_insn verifier.c:332:5
#1 0x4e1bcb in do_check verifier.c:1657:4
...
0x00500c58 is located 8 bytes to the right of global variable 
'bpf_alu_string'
defined in 'verifier.c:286:26' (0x500be0) of size 112
---
 kernel/bpf/verifier.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index ed12e385fb75..b074b23000d6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -283,7 +283,7 @@ static const char *const bpf_class_string[] = {
[BPF_ALU64] = "alu64",
 };
 
-static const char *const bpf_alu_string[] = {
+static const char *const bpf_alu_string[16] = {
[BPF_ADD >> 4]  = "+=",
[BPF_SUB >> 4]  = "-=",
[BPF_MUL >> 4]  = "*=",
@@ -307,7 +307,7 @@ static const char *const bpf_ldst_string[] = {
[BPF_DW >> 3] = "u64",
 };
 
-static const char *const bpf_jmp_string[] = {
+static const char *const bpf_jmp_string[16] = {
[BPF_JA >> 4]   = "jmp",
[BPF_JEQ >> 4]  = "==",
[BPF_JGT >> 4]  = ">",
-- 
1.7.9.5
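
For readers unfamiliar with the C rule behind this fix, a minimal userspace
sketch follows. An array declared with [] and designated initializers gets
exactly (highest index + 1) elements, so a 4-bit index can run past its end;
an explicit [16] covers every possible index and leaves unnamed slots NULL.
The OP_* values are defined locally and are meant to mirror the BPF opcode
encoding. The array and function names are hypothetical, and the NULL check
in alu_name() is an addition of this sketch, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define OP(code)	((code) & 0xf0)	/* assumed to mirror BPF_OP() */
#define OP_ADD		0x00
#define OP_SUB		0x10
#define OP_END		0xd0		/* highest opcode named in this sketch */

#define ARRAY_SIZE(a)	(sizeof(a) / sizeof((a)[0]))

/* Unsized: the compiler allocates (0xd0 >> 4) + 1 = 14 elements. */
static const char *const alu_unsized[] = {
	[OP_ADD >> 4] = "+=",
	[OP_SUB >> 4] = "-=",
	[OP_END >> 4] = "endian",
};

/* Sized: all 16 values of a 4-bit index are in bounds; gaps are NULL. */
static const char *const alu_sized[16] = {
	[OP_ADD >> 4] = "+=",
	[OP_SUB >> 4] = "-=",
	[OP_END >> 4] = "endian",
};

static_assert(ARRAY_SIZE(alu_sized) == 16, "must cover every 4-bit index");

static size_t alu_unsized_len(void)
{
	return ARRAY_SIZE(alu_unsized);
}

/* Safe lookup: the index is always < 16; unnamed opcodes map to "?". */
static const char *alu_name(unsigned char code)
{
	const char *s = alu_sized[OP(code) >> 4];

	return s ? s : "?";
}
```

The 14-element unsized array is consistent with the 112-byte size in the
sanitizer report above, assuming 8-byte pointers.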



Re: [PATCH] device property: Don't overwrite addr when failing in device_get_mac_address

2015-09-08 Thread David Miller
From: Julien Grall 
Date: Thu, 3 Sep 2015 23:59:50 +0100

> The function device_get_mac_address is trying different property names
> in order to get the mac address. To check the return value, the variable
> addr (which contains the buffer passed by the caller) will be re-used. This
> means that if the previous property is not found, the next property will
> be read using a NULL buffer.
> 
> Therefore it's only possible to retrieve the mac if the node contains a
> property "mac-address". Fix it by using a temporary buffer for the
> return value.
> 
> This has been introduced by commit 4c96b7dc0d393f12c17e0d81db15aa4a820a6ab3
> "Add a matching set of device_ functions for determining mac/phy"
> 
> Signed-off-by: Julien Grall 

Applied, thanks.
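
A minimal userspace sketch of the bug pattern described in the quoted
commit message, for illustration. struct prop, read_prop() and both
get_mac_*() functions are hypothetical. The fallback property name
"local-mac-address" is an assumption (only "mac-address" is named above),
and read_prop() returning NULL for a NULL buffer is a simplification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

struct prop {
	const char *name;
	unsigned char val[ETH_ALEN];
};

/*
 * Hypothetical lookup: copy the value into buf and return buf, or return
 * NULL if the property is absent or no buffer was supplied.
 */
static unsigned char *read_prop(const struct prop *props, size_t n,
				const char *name, unsigned char *buf)
{
	size_t i;

	assert(name != NULL);
	if (!buf)
		return NULL;
	for (i = 0; i < n; i++) {
		if (strcmp(props[i].name, name) == 0) {
			memcpy(buf, props[i].val, ETH_ALEN);
			return buf;
		}
	}
	return NULL;
}

/* Buggy shape: the result overwrites addr, so the fallback gets NULL. */
static unsigned char *get_mac_buggy(const struct prop *props, size_t n,
				    unsigned char *addr)
{
	addr = read_prop(props, n, "mac-address", addr);
	if (addr)
		return addr;
	/* addr is NULL here: the caller's buffer has been lost. */
	return read_prop(props, n, "local-mac-address", addr);
}

/* Fixed shape: keep the result in a temporary, leave addr alone. */
static unsigned char *get_mac_fixed(const struct prop *props, size_t n,
				    unsigned char *addr)
{
	unsigned char *res;

	res = read_prop(props, n, "mac-address", addr);
	if (res)
		return res;
	return read_prop(props, n, "local-mac-address", addr);
}
```

With only the fallback property present, the buggy variant finds nothing
while the fixed one fills the caller's buffer.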


[RFC PATCH net-next 2/3] net: switchdev: extract switchdev_obj_ipv4_fib

2015-09-08 Thread Vivien Didelot
Move the switchdev_obj_ipv4_fib structure out of the switchdev_obj
union.

This lightens the switchdev_obj structure and allows drivers to access
the object transaction and callback directly from a
switchdev_obj_ipv4_fib. This is more consistent and convenient for add
and dump operations.

The patch updates the Rocker driver accordingly.
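
For readers new to the pattern, a minimal userspace sketch of embedding the
generic object as the first member of a specific one. All struct and
function names here are hypothetical. The sketch recovers the outer struct
with a container_of()-style macro instead of the plain cast used in the
patch; with the base object as first member the two give the same pointer.

```c
#include <assert.h>
#include <stddef.h>

enum obj_id { OBJ_VLAN, OBJ_IPV4_FIB };

/* Generic part shared by every object type. */
struct base_obj {
	enum obj_id id;
	int trans;
};

/* Specific object: the generic part is embedded, not a union member. */
struct ipv4_fib_obj {
	struct base_obj obj;	/* must be first */
	unsigned int dst;
	int dst_len;
};

static_assert(offsetof(struct ipv4_fib_obj, obj) == 0,
	      "a plain cast from the base pointer relies on this");

/* Simplified userspace version of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* A handler receives only the generic pointer and dispatches on the id. */
static int handle_obj(struct base_obj *obj)
{
	struct ipv4_fib_obj *fib;

	switch (obj->id) {
	case OBJ_IPV4_FIB:
		fib = container_of(obj, struct ipv4_fib_obj, obj);
		return fib->dst_len;
	default:
		return -1;	/* type not handled in this sketch */
	}
}
```

A caller builds the specific struct and passes &fib.obj, the same way the
patch passes &fib_obj.obj to switchdev_port_obj_add().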

Signed-off-by: Vivien Didelot 
---
 drivers/net/ethernet/rocker/rocker.c |  4 ++--
 include/net/switchdev.h  | 21 +
 net/switchdev/switchdev.c| 44 
 3 files changed, 34 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index e72d49a..41aabbc 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4447,7 +4447,7 @@ static int rocker_port_obj_add(struct net_device *dev,
err = rocker_port_vlans_add(rocker_port, vlan);
break;
case SWITCHDEV_OBJ_IPV4_FIB:
-   fib4 = &obj->u.ipv4_fib;
+   fib4 = (struct switchdev_obj_ipv4_fib *) obj;
err = rocker_port_fib_ipv4(rocker_port, obj->trans,
   htonl(fib4->dst), fib4->dst_len,
   fib4->fi, fib4->tb_id, 0);
@@ -4519,7 +4519,7 @@ static int rocker_port_obj_del(struct net_device *dev,
err = rocker_port_vlans_del(rocker_port, vlan);
break;
case SWITCHDEV_OBJ_IPV4_FIB:
-   fib4 = &obj->u.ipv4_fib;
+   fib4 = (struct switchdev_obj_ipv4_fib *) obj;
err = rocker_port_fib_ipv4(rocker_port, SWITCHDEV_TRANS_NONE,
   htonl(fib4->dst), fib4->dst_len,
   fib4->fi, fib4->tb_id,
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 55fa106..0b76aa8 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -55,15 +55,6 @@ struct switchdev_obj {
enum switchdev_trans trans;
int (*cb)(struct net_device *dev, struct switchdev_obj *obj);
union {
-   struct switchdev_obj_ipv4_fib { /* IPV4_FIB */
-   u32 dst;
-   int dst_len;
-   struct fib_info *fi;
-   u8 tos;
-   u8 type;
-   u32 nlflags;
-   u32 tb_id;
-   } ipv4_fib;
struct switchdev_obj_fdb {  /* PORT_FDB */
const unsigned char *addr;
u16 vid;
@@ -80,6 +71,18 @@ struct switchdev_obj_vlan {
u16 vid_end;
 };
 
+/* SWITCHDEV_OBJ_IPV4_FIB */
+struct switchdev_obj_ipv4_fib {
+   struct switchdev_obj obj;   /* must be first */
+   u32 dst;
+   int dst_len;
+   struct fib_info *fi;
+   u8 tos;
+   u8 type;
+   u32 nlflags;
+   u32 tb_id;
+};
+
 /**
  * struct switchdev_ops - switchdev operations
  *
diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 9923a97..10fde6f 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -936,17 +936,15 @@ static struct net_device *switchdev_get_dev_by_nhs(struct 
fib_info *fi)
 int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct fib_info *fi,
   u8 tos, u8 type, u32 nlflags, u32 tb_id)
 {
-   struct switchdev_obj fib_obj = {
-   .id = SWITCHDEV_OBJ_IPV4_FIB,
-   .u.ipv4_fib = {
-   .dst = dst,
-   .dst_len = dst_len,
-   .fi = fi,
-   .tos = tos,
-   .type = type,
-   .nlflags = nlflags,
-   .tb_id = tb_id,
-   },
+   struct switchdev_obj_ipv4_fib fib_obj = {
+   .obj.id = SWITCHDEV_OBJ_IPV4_FIB,
+   .dst = dst,
+   .dst_len = dst_len,
+   .fi = fi,
+   .tos = tos,
+   .type = type,
+   .nlflags = nlflags,
+   .tb_id = tb_id,
};
struct net_device *dev;
int err = 0;
@@ -967,7 +965,7 @@ int switchdev_fib_ipv4_add(u32 dst, int dst_len, struct 
fib_info *fi,
if (!dev)
return 0;
 
-   err = switchdev_port_obj_add(dev, &fib_obj);
+   err = switchdev_port_obj_add(dev, &fib_obj.obj);
if (!err)
fi->fib_flags |= RTNH_F_OFFLOAD;
 
@@ -990,17 +988,15 @@ EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_add);
 int switchdev_fib_ipv4_del(u32 dst, int dst_len, struct fib_info *fi,
   u8 tos, u8 type, u32 tb_id)
 {
-   struct switchdev_obj fib_obj = {
-   .id = SWITCHDEV_OBJ_IPV4_FIB,
-   .u.ipv4_fib = {
-   .dst = dst,
-   .dst_len = dst_len,
-

[RFC PATCH net-next 1/3] net: switchdev: extract switchdev_obj_vlan

2015-09-08 Thread Vivien Didelot
Move the switchdev_obj_vlan structure out of the switchdev_obj union.

This lightens the switchdev_obj structure and allows drivers to access
the object transaction and callback directly from a switchdev_obj_vlan.
This is more consistent and convenient for add and dump operations.

The patch updates bridge, the Rocker driver and DSA accordingly.

Signed-off-by: Vivien Didelot 
---
 drivers/net/ethernet/rocker/rocker.c | 21 +
 include/net/switchdev.h  | 13 +++
 net/bridge/br_vlan.c | 26 +
 net/dsa/slave.c  | 27 --
 net/switchdev/switchdev.c| 44 ++--
 5 files changed, 68 insertions(+), 63 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index 34ac41a..e72d49a 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4394,14 +4394,13 @@ static int rocker_port_vlan_add(struct rocker_port 
*rocker_port,
 }
 
 static int rocker_port_vlans_add(struct rocker_port *rocker_port,
-enum switchdev_trans trans,
 const struct switchdev_obj_vlan *vlan)
 {
u16 vid;
int err;
 
for (vid = vlan->vid_begin; vid <= vlan->vid_end; vid++) {
-   err = rocker_port_vlan_add(rocker_port, trans,
+   err = rocker_port_vlan_add(rocker_port, vlan->obj.trans,
   vid, vlan->flags);
if (err)
return err;
@@ -4427,6 +4426,7 @@ static int rocker_port_obj_add(struct net_device *dev,
   struct switchdev_obj *obj)
 {
struct rocker_port *rocker_port = netdev_priv(dev);
+   const struct switchdev_obj_vlan *vlan;
const struct switchdev_obj_ipv4_fib *fib4;
int err = 0;
 
@@ -4443,8 +4443,8 @@ static int rocker_port_obj_add(struct net_device *dev,
 
switch (obj->id) {
case SWITCHDEV_OBJ_PORT_VLAN:
-   err = rocker_port_vlans_add(rocker_port, obj->trans,
-   &obj->u.vlan);
+   vlan = (struct switchdev_obj_vlan *) obj;
+   err = rocker_port_vlans_add(rocker_port, vlan);
break;
case SWITCHDEV_OBJ_IPV4_FIB:
fib4 = &obj->u.ipv4_fib;
@@ -4509,12 +4509,14 @@ static int rocker_port_obj_del(struct net_device *dev,
   struct switchdev_obj *obj)
 {
struct rocker_port *rocker_port = netdev_priv(dev);
+   const struct switchdev_obj_vlan *vlan;
const struct switchdev_obj_ipv4_fib *fib4;
int err = 0;
 
switch (obj->id) {
case SWITCHDEV_OBJ_PORT_VLAN:
-   err = rocker_port_vlans_del(rocker_port, &obj->u.vlan);
+   vlan = (struct switchdev_obj_vlan *) obj;
+   err = rocker_port_vlans_del(rocker_port, vlan);
break;
case SWITCHDEV_OBJ_IPV4_FIB:
fib4 = &obj->u.ipv4_fib;
@@ -4563,9 +4565,8 @@ static int rocker_port_fdb_dump(const struct rocker_port 
*rocker_port,
 }
 
 static int rocker_port_vlan_dump(const struct rocker_port *rocker_port,
-struct switchdev_obj *obj)
+struct switchdev_obj_vlan *vlan)
 {
-   struct switchdev_obj_vlan *vlan = &obj->u.vlan;
u16 vid;
int err = 0;
 
@@ -4576,7 +4577,7 @@ static int rocker_port_vlan_dump(const struct rocker_port 
*rocker_port,
if (rocker_vlan_id_is_internal(htons(vid)))
vlan->flags |= BRIDGE_VLAN_INFO_PVID;
vlan->vid_begin = vlan->vid_end = vid;
-   err = obj->cb(rocker_port->dev, obj);
+   err = vlan->obj.cb(rocker_port->dev, &vlan->obj);
if (err)
break;
}
@@ -4588,6 +4589,7 @@ static int rocker_port_obj_dump(struct net_device *dev,
struct switchdev_obj *obj)
 {
const struct rocker_port *rocker_port = netdev_priv(dev);
+   struct switchdev_obj_vlan *vlan;
int err = 0;
 
switch (obj->id) {
@@ -4595,7 +4597,8 @@ static int rocker_port_obj_dump(struct net_device *dev,
err = rocker_port_fdb_dump(rocker_port, obj);
break;
case SWITCHDEV_OBJ_PORT_VLAN:
-   err = rocker_port_vlan_dump(rocker_port, obj);
+   vlan = (struct switchdev_obj_vlan *) obj;
+   err = rocker_port_vlan_dump(rocker_port, vlan);
break;
default:
err = -EOPNOTSUPP;
diff --git a/include/net/switchdev.h b/include/net/switchdev.h
index 319baab..55fa106 100644
--- a/include/net/switchdev.h
+++ b/include/net/switchdev.h
@@ -55,11 +55,6 @@ struct switchdev_obj {
enum switchdev_trans trans;
int (*cb)(struct net_d

[RFC PATCH net-next 3/3] net: switchdev: extract switchdev_obj_fdb

2015-09-08 Thread Vivien Didelot
Move the switchdev_obj_fdb structure out of the switchdev_obj union.

This lightens the switchdev_obj structure and allows drivers to access
the object transaction and callback directly from a switchdev_obj_fdb.
This is more consistent and convenient for add and dump operations.

The patch updates bridge, the Rocker driver and DSA accordingly.

Signed-off-by: Vivien Didelot 
---
 drivers/net/ethernet/rocker/rocker.c | 25 ++-
 include/net/switchdev.h  | 15 +++---
 net/bridge/br_fdb.c  | 12 +--
 net/dsa/slave.c  | 32 -
 net/switchdev/switchdev.c| 39 
 5 files changed, 63 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index 41aabbc..c3f25a9 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4410,7 +4410,6 @@ static int rocker_port_vlans_add(struct rocker_port 
*rocker_port,
 }
 
 static int rocker_port_fdb_add(struct rocker_port *rocker_port,
-  enum switchdev_trans trans,
   const struct switchdev_obj_fdb *fdb)
 {
__be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, fdb->vid, NULL);
@@ -4419,7 +4418,8 @@ static int rocker_port_fdb_add(struct rocker_port 
*rocker_port,
if (!rocker_port_is_bridged(rocker_port))
return -EINVAL;
 
-   return rocker_port_fdb(rocker_port, trans, fdb->addr, vlan_id, flags);
+   return rocker_port_fdb(rocker_port, fdb->obj.trans, fdb->addr, vlan_id,
+  flags);
 }
 
 static int rocker_port_obj_add(struct net_device *dev,
@@ -4428,6 +4428,7 @@ static int rocker_port_obj_add(struct net_device *dev,
struct rocker_port *rocker_port = netdev_priv(dev);
const struct switchdev_obj_vlan *vlan;
const struct switchdev_obj_ipv4_fib *fib4;
+   const struct switchdev_obj_fdb *fdb;
int err = 0;
 
switch (obj->trans) {
@@ -4453,7 +4454,8 @@ static int rocker_port_obj_add(struct net_device *dev,
   fib4->fi, fib4->tb_id, 0);
break;
case SWITCHDEV_OBJ_PORT_FDB:
-   err = rocker_port_fdb_add(rocker_port, obj->trans, &obj->u.fdb);
+   fdb = (struct switchdev_obj_fdb *) obj;
+   err = rocker_port_fdb_add(rocker_port, fdb);
break;
default:
err = -EOPNOTSUPP;
@@ -4493,7 +4495,6 @@ static int rocker_port_vlans_del(struct rocker_port 
*rocker_port,
 }
 
 static int rocker_port_fdb_del(struct rocker_port *rocker_port,
-  enum switchdev_trans trans,
   const struct switchdev_obj_fdb *fdb)
 {
__be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, fdb->vid, NULL);
@@ -4502,7 +4503,8 @@ static int rocker_port_fdb_del(struct rocker_port 
*rocker_port,
if (!rocker_port_is_bridged(rocker_port))
return -EINVAL;
 
-   return rocker_port_fdb(rocker_port, trans, fdb->addr, vlan_id, flags);
+   return rocker_port_fdb(rocker_port, fdb->obj.trans, fdb->addr, vlan_id,
+  flags);
 }
 
 static int rocker_port_obj_del(struct net_device *dev,
@@ -4511,6 +4513,7 @@ static int rocker_port_obj_del(struct net_device *dev,
struct rocker_port *rocker_port = netdev_priv(dev);
const struct switchdev_obj_vlan *vlan;
const struct switchdev_obj_ipv4_fib *fib4;
+   const struct switchdev_obj_fdb *fdb;
int err = 0;
 
switch (obj->id) {
@@ -4526,7 +4529,8 @@ static int rocker_port_obj_del(struct net_device *dev,
   ROCKER_OP_FLAG_REMOVE);
break;
case SWITCHDEV_OBJ_PORT_FDB:
-   err = rocker_port_fdb_del(rocker_port, obj->trans, &obj->u.fdb);
+   fdb = (struct switchdev_obj_fdb *) obj;
+   err = rocker_port_fdb_del(rocker_port, fdb);
break;
default:
err = -EOPNOTSUPP;
@@ -4537,10 +4541,9 @@ static int rocker_port_obj_del(struct net_device *dev,
 }
 
 static int rocker_port_fdb_dump(const struct rocker_port *rocker_port,
-   struct switchdev_obj *obj)
+   struct switchdev_obj_fdb *fdb)
 {
struct rocker *rocker = rocker_port->rocker;
-   struct switchdev_obj_fdb *fdb = &obj->u.fdb;
struct rocker_fdb_tbl_entry *found;
struct hlist_node *tmp;
unsigned long lock_flags;
@@ -4555,7 +4558,7 @@ static int rocker_port_fdb_dump(const struct rocker_port 
*rocker_port,
fdb->ndm_state = NUD_REACHABLE;
fdb->vid = rocker_port_vlan_to_vid(rocker_port,
   found->key.vlan_id);
-   err = obj->cb(rocke

[RFC PATCH net-next 0/3] net: switchdev: extract specific object structures

2015-09-08 Thread Vivien Didelot
Hi!

Current implementations of .switchdev_port_obj_add and .switchdev_port_obj_dump
must pass the generic switchdev_obj structure instead of a specific one (e.g.
switchdev_obj_fdb) to the related driver accessors, because it contains the
object transaction and callback. This is not very convenient for drivers.

Instead of having all specific structures included into a switchdev_obj union,
it would be simpler to embed a switchdev_obj structure as the first member of
the more specific ones. That way, a driver can cast the switchdev_obj pointer
to the specific structure and have access to all the information from it.

As an example, this allows Rocker to change its two FDB accessors:

rocker_port_fdb_add(rocker_port, obj->trans, &obj->u.fdb);
rocker_port_fdb_dump(rocker_port, obj);

for these more consistent ones:

fdb = (struct switchdev_obj_fdb *) obj;
rocker_port_fdb_add(rocker_port, fdb);
rocker_port_fdb_dump(rocker_port, fdb);

This is what struct netdev_notifier_info and its specific supersets (e.g.
struct netdev_notifier_changeupper_info) do in include/linux/netdevice.h.
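
As a rough illustration of the embedding described above, here is a minimal
userspace sketch. All demo_* names and field sets are hypothetical stand-ins,
not the actual switchdev definitions; the only point is that a pointer to the
first member can be cast back to the enclosing structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel types. */
enum demo_obj_id { DEMO_OBJ_PORT_VLAN, DEMO_OBJ_PORT_FDB };

struct demo_obj {
	enum demo_obj_id id;
	int trans;			/* stands in for enum switchdev_trans */
};

struct demo_obj_fdb {
	struct demo_obj obj;		/* must be first for the cast below */
	unsigned short vid;
};

/* Specific accessor: reaches the transaction through the embedded base. */
static int demo_fdb_add(const struct demo_obj_fdb *fdb)
{
	return fdb->obj.trans * 10000 + fdb->vid;
}

/* Generic entry point: sees only the base pointer and dispatches on id. */
static int demo_obj_add(struct demo_obj *obj)
{
	switch (obj->id) {
	case DEMO_OBJ_PORT_FDB:
		/* Valid only because obj sits at offset 0 of demo_obj_fdb. */
		return demo_fdb_add((struct demo_obj_fdb *)obj);
	default:
		return -1;		/* unsupported object in this sketch */
	}
}
```

The cast relies on the base structure staying at offset 0, which is why the
specific structures would need a "must be first" comment.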

This patchset does that and updates bridge, Rocker and DSA accordingly.

Note that this patchset is sent as an RFC so as not to bother David with new
net-next material, but if the changes look good, it is ready to merge.

Also, please take note that the change sits on top of:
http://patchwork.ozlabs.org/patch/514894/

Vivien Didelot (3):
  net: switchdev: extract switchdev_obj_vlan
  net: switchdev: extract switchdev_obj_ipv4_fib
  net: switchdev: extract switchdev_obj_fdb

 drivers/net/ethernet/rocker/rocker.c |  50 --
 include/net/switchdev.h  |  49 --
 net/bridge/br_fdb.c  |  12 ++--
 net/bridge/br_vlan.c |  26 +++
 net/dsa/slave.c  |  59 +---
 net/switchdev/switchdev.c| 127 ---
 6 files changed, 165 insertions(+), 158 deletions(-)

-- 
2.5.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] bpf: fix out of bounds access in verifier log

2015-09-08 Thread Daniel Borkmann

On 09/08/2015 10:40 PM, Alexei Starovoitov wrote:

When the verifier log is enabled, print_bpf_insn() does
bpf_alu_string[BPF_OP(insn->code) >> 4]
and
bpf_jmp_string[BPF_OP(insn->code) >> 4]
where BPF_OP is a 4-bit instruction opcode.
Malformed insns can cause an out-of-bounds access.
Fix it by sizing the arrays appropriately.

The bug was found by clang address sanitizer with libfuzzer.
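
A minimal userspace sketch of the idea, not the verifier code itself: the
demo_* names and the "(unknown)" fallback are assumptions made for this
example. The table is declared with 16 entries so that every value of a
4-bit index stays in bounds, even for opcodes that have no string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same mask as BPF_OP(): keep the upper four bits of the code byte. */
#define DEMO_OP(code) ((code) & 0xf0)

/* Explicitly sized to 16; entries without an initializer are NULL. */
static const char *const demo_alu_string[16] = {
	[0x00 >> 4] = "+=",	/* BPF_ADD */
	[0x10 >> 4] = "-=",	/* BPF_SUB */
	[0x20 >> 4] = "*=",	/* BPF_MUL */
};

/* Look up the operator string; the index can never exceed 15. */
static const char *demo_alu_name(unsigned char code)
{
	const char *s = demo_alu_string[DEMO_OP(code) >> 4];

	return s ? s : "(unknown)";	/* fallback is this sketch's choice */
}
```

Without the explicit size, the array would only be as long as its last
initializer, and a malformed opcode would index past the end.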

Reported-by: Yonghong Song 
Signed-off-by: Alexei Starovoitov 


Acked-by: Daniel Borkmann 


Re: [RFC PATCH 1/3] net: introduce kfree_skb_bulk() user of kmem_cache_free_bulk()

2015-09-08 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Fri, 04 Sep 2015 19:00:53 +0200

> +/**
> + *   kfree_skb_bulk - bulk free SKBs when refcnt allows to
> + *   @skbs: array of SKBs to free
> + *   @size: number of SKBs in array
> + *
> + *   If SKB refcnt allows for free, then release any auxiliary data
> + *   and then bulk free SKBs to the SLAB allocator.
> + *
> + *   Note that interrupts must be enabled when calling this function.
> + */
> +void kfree_skb_bulk(struct sk_buff **skbs, unsigned int size)
> +{
> + int i;
> + size_t cnt = 0;
> +
> + for (i = 0; i < size; i++) {
> + struct sk_buff *skb = skbs[i];
> +
> + if (!skb_dec_and_test(skb))
> + continue; /* skip skb, not ready to free */
> +
> + /* Construct an array of SKBs, ready to be free'ed and
> +  * cleanup all auxiliary, before bulk free to SLAB.
> +  * For now, only handle non-cloned SKBs, related to
> +  * SLAB skbuff_head_cache
> +  */
> + if (skb->fclone == SKB_FCLONE_UNAVAILABLE) {
> + skb_release_all(skb);
> + skbs[cnt++] = skb;
> + } else {
> + /* SKB was a clone, don't handle this case */
> + __kfree_skb(skb);
> + }
> + }
> + if (likely(cnt)) {
> + kmem_cache_free_bulk(skbuff_head_cache, cnt, (void **) skbs);
> + }
> +}

You're going to have to do a trace_kfree_skb() or trace_consume_skb() for
these things.


Re: [RFC PATCH net-next 1/3] net: switchdev: extract switchdev_obj_vlan

2015-09-08 Thread Andrew Lunn
> +/* SWITCHDEV_OBJ_PORT_VLAN */
> +struct switchdev_obj_vlan {
> + struct switchdev_obj obj;   /* must be first */
> + u16 flags;
> + u16 vid_begin;
> + u16 vid_end;
> +};

I know you give a few examples of where this is done in network code,
but I think container_of() is used much more. You can then place the
struct switchdev_obj anywhere in the structure, and it will work.

#define to_switchdev_obj_vlan(o) container_of(o, struct switchdev_obj_vlan, obj)
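
To make the difference concrete, here is a small userspace sketch. The
ex_container_of macro is a simplified approximation of the kernel one (no
type checking), and the ex_* structures are hypothetical; the embedded object
is deliberately not the first member.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace approximation of the kernel's container_of(). */
#define ex_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct ex_obj {
	int id;
};

struct ex_obj_vlan {
	unsigned short vid_begin;
	unsigned short vid_end;
	struct ex_obj obj;		/* deliberately not the first member */
};

#define to_ex_obj_vlan(o) ex_container_of(o, struct ex_obj_vlan, obj)

/* Recover the enclosing structure from a pointer to the embedded object. */
static unsigned short ex_vlan_count(struct ex_obj *obj)
{
	struct ex_obj_vlan *vlan = to_ex_obj_vlan(obj);

	return (unsigned short)(vlan->vid_end - vlan->vid_begin + 1);
}
```

A plain cast would point at the wrong address here, while the offset-based
macro keeps working wherever the member is placed.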

Andrew


[PATCH net-next 1/2] driver: net: xgene: Add support for 2nd 10GbE port

2015-09-08 Thread Iyappan Subramanian
Adding support for the second 10GbE port on APM X-Gene SoC

Signed-off-by: Iyappan Subramanian 
---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c   |  3 ++-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 16 
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h |  5 +
 3 files changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
index cfa3704..21749f0 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_hw.c
@@ -107,7 +107,8 @@ static void xgene_enet_set_ring_state(struct 
xgene_enet_desc_ring *ring)
 {
xgene_enet_ring_set_type(ring);
 
-   if (xgene_enet_ring_owner(ring->id) == RING_OWNER_ETH0)
+   if (xgene_enet_ring_owner(ring->id) == RING_OWNER_ETH0 ||
+   xgene_enet_ring_owner(ring->id) == RING_OWNER_ETH1)
xgene_enet_ring_set_recombbuf(ring);
 
xgene_enet_ring_init(ring);
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
index e47298f..6b1846d 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.c
@@ -1305,10 +1305,17 @@ static void xgene_enet_setup_ops(struct 
xgene_enet_pdata *pdata)
pdata->ring_num = START_RING_NUM_0;
break;
case 1:
-   pdata->cpu_bufnum = START_CPU_BUFNUM_1;
-   pdata->eth_bufnum = START_ETH_BUFNUM_1;
-   pdata->bp_bufnum = START_BP_BUFNUM_1;
-   pdata->ring_num = START_RING_NUM_1;
+   if (pdata->phy_mode == PHY_INTERFACE_MODE_XGMII) {
+   pdata->cpu_bufnum = XG_START_CPU_BUFNUM_1;
+   pdata->eth_bufnum = XG_START_ETH_BUFNUM_1;
+   pdata->bp_bufnum = XG_START_BP_BUFNUM_1;
+   pdata->ring_num = XG_START_RING_NUM_1;
+   } else {
+   pdata->cpu_bufnum = START_CPU_BUFNUM_1;
+   pdata->eth_bufnum = START_ETH_BUFNUM_1;
+   pdata->bp_bufnum = START_BP_BUFNUM_1;
+   pdata->ring_num = START_RING_NUM_1;
+   }
break;
default:
break;
@@ -1478,6 +1485,7 @@ static const struct acpi_device_id 
xgene_enet_acpi_match[] = {
{ "APMC0D05", XGENE_ENET1},
{ "APMC0D30", XGENE_ENET1},
{ "APMC0D31", XGENE_ENET1},
+   { "APMC0D3F", XGENE_ENET1},
{ "APMC0D26", XGENE_ENET2},
{ "APMC0D25", XGENE_ENET2},
{ }
diff --git a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h 
b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
index 50f92c3..ff89a5d 100644
--- a/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
+++ b/drivers/net/ethernet/apm/xgene/xgene_enet_main.h
@@ -56,6 +56,11 @@
 #define START_BP_BUFNUM_1  0x2A
 #define START_RING_NUM_1   264
 
+#define XG_START_CPU_BUFNUM_1  12
+#define XG_START_ETH_BUFNUM_1  2
+#define XG_START_BP_BUFNUM_1   0x22
+#define XG_START_RING_NUM_1    264
+
 #define X2_START_CPU_BUFNUM_0  0
 #define X2_START_ETH_BUFNUM_0  0
 #define X2_START_BP_BUFNUM_0   0x20
-- 
1.9.1



[PATCH net-next 0/2] driver: net: xgene: Enable 2nd 10GbE port on APM X-Gene SoC

2015-09-08 Thread Iyappan Subramanian
This patch set adds support for the 2nd 10GbE port on the APM X-Gene SoC.

---
Iyappan Subramanian (2):
  driver: net: xgene: Add support for 2nd 10GbE port
  dtb: xgene: Add 2nd 10GbE node

 arch/arm64/boot/dts/apm/apm-storm.dtsi   | 28 
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c   |  3 ++-
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c | 16 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h |  5 +
 4 files changed, 47 insertions(+), 5 deletions(-)

-- 
1.9.1



[PATCH net-next 2/2] dtb: xgene: Add 2nd 10GbE node

2015-09-08 Thread Iyappan Subramanian
Add the second 10GbE DT node to the APM X-Gene SoC device tree.

Signed-off-by: Iyappan Subramanian 
---
 arch/arm64/boot/dts/apm/apm-storm.dtsi | 28 
 1 file changed, 28 insertions(+)

diff --git a/arch/arm64/boot/dts/apm/apm-storm.dtsi 
b/arch/arm64/boot/dts/apm/apm-storm.dtsi
index d831bc2..d483e7e 100644
--- a/arch/arm64/boot/dts/apm/apm-storm.dtsi
+++ b/arch/arm64/boot/dts/apm/apm-storm.dtsi
@@ -207,6 +207,17 @@
clock-output-names = "xge0clk";
};
 
+   xge1clk: xge1clk@1f62c000 {
+   compatible = "apm,xgene-device-clock";
+   status = "disabled";
+   #clock-cells = <1>;
+   clocks = <&socplldiv2 0>;
+   reg = <0x0 0x1f62c000 0x0 0x1000>;
+   reg-names = "csr-reg";
+   csr-mask = <0x3>;
+   clock-output-names = "xge1clk";
+   };
+
sataphy1clk: sataphy1clk@1f21c000 {
compatible = "apm,xgene-device-clock";
#clock-cells = <1>;
@@ -816,6 +827,23 @@
phy-connection-type = "xgmii";
};
 
+   xgenet1: ethernet@1f62 {
+   compatible = "apm,xgene1-xgenet";
+   status = "disabled";
+   reg = <0x0 0x1f62 0x0 0xd100>,
+ <0x0 0x1f60 0x0 0Xc300>,
+ <0x0 0x1800 0x0 0X8000>;
+   reg-names = "enet_csr", "ring_csr", "ring_cmd";
+   interrupts = <0x0 0x6C 0x4>,
+<0x0 0x6D 0x4>;
+   port-id = <1>;
+   dma-coherent;
+   clocks = <&xge1clk 0>;
+   /* mac address will be overwritten by the bootloader */
+   local-mac-address = [00 00 00 00 00 00];
+   phy-connection-type = "xgmii";
+   };
+
rng: rng@1052 {
compatible = "apm,xgene-rng";
reg = <0x0 0x1052 0x0 0x100>;
-- 
1.9.1



Re: [PATCH net 3/3] r8169: increase the lifespan of the hardware counters dump area.

2015-09-08 Thread Francois Romieu
Corinna Vinschen  :
[...]
> - Alternatively I can still reproduce the SEGV in rtl_remove_one
>   when trying to rmmod the module, I just don't have the stack dump
>   handy while writing this mail.  I can show it if needed.

I see it too. 

> I debugged this on and off the entire day (tweaking, compiling, rebooting,
> kernel crash, rinse and repeat).
> 
> And the result of my debugging is totally crazy:

My patch corrupts memory.

-- 
Ueimor


Re: [PATCH 5/6] seccomp: add a way to attach a filter via eBPF fd

2015-09-08 Thread Kees Cook
On Tue, Sep 8, 2015 at 6:40 AM, Tycho Andersen
 wrote:
> On Sat, Sep 05, 2015 at 09:13:02AM +0200, Michael Kerrisk (man-pages) wrote:
>> On 09/04/2015 10:41 PM, Kees Cook wrote:
>> > On Fri, Sep 4, 2015 at 9:04 AM, Tycho Andersen
>> >  wrote:
>> >> This is the final bit needed to support seccomp filters created via the bpf
>> >> syscall.
>>
>> Hmm. Thanks Kees, for CCing linux-api@. That really should have been done at
>> the outset.
>
> Apologies, I'll cc the list on future versions.
>
>> Tycho, where's the man-pages patch describing this new kernel-userspace
>> API feature? :-)
>
> Once we get the API finalized I'm happy to write it.
>
>> >> One concern with this patch is exactly what the interface should look like
>> >> for users, since seccomp()'s second argument is a pointer, we could ask
>> >> people to pass a pointer to the fd, but implies we might write to it which
>> >> seems impolite. Right now we cast the pointer (and force the user to cast
>> >> it), which generates ugly warnings. I'm not sure what the right answer is
>> >> here.
>> >>
>> >> Signed-off-by: Tycho Andersen 
>> >> CC: Kees Cook 
>> >> CC: Will Drewry 
>> >> CC: Oleg Nesterov 
>> >> CC: Andy Lutomirski 
>> >> CC: Pavel Emelyanov 
>> >> CC: Serge E. Hallyn 
>> >> CC: Alexei Starovoitov 
>> >> CC: Daniel Borkmann 
>> >> ---
>> >>  include/linux/seccomp.h  |  3 +-
>> >>  include/uapi/linux/seccomp.h |  1 +
>> >>  kernel/seccomp.c | 70 
>> >> 
>> >>  3 files changed, 61 insertions(+), 13 deletions(-)
>> >>
>> >> diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h
>> >> index d1a86ed..a725dd5 100644
>> >> --- a/include/linux/seccomp.h
>> >> +++ b/include/linux/seccomp.h
>> >> @@ -3,7 +3,8 @@
>> >>
>> >>  #include 
>> >>
>> >> -#define SECCOMP_FILTER_FLAG_MASK   (SECCOMP_FILTER_FLAG_TSYNC)
>> >> +#define SECCOMP_FILTER_FLAG_MASK   (\
>> >> +   SECCOMP_FILTER_FLAG_TSYNC | SECCOMP_FILTER_FLAG_EBPF)
>> >>
>> >>  #ifdef CONFIG_SECCOMP
>> >>
>> >> diff --git a/include/uapi/linux/seccomp.h b/include/uapi/linux/seccomp.h
>> >> index 0f238a4..c29a423 100644
>> >> --- a/include/uapi/linux/seccomp.h
>> >> +++ b/include/uapi/linux/seccomp.h
>> >> @@ -16,6 +16,7 @@
>> >>
>> >>  /* Valid flags for SECCOMP_SET_MODE_FILTER */
>> >>  #define SECCOMP_FILTER_FLAG_TSYNC  1
>> >> +#define SECCOMP_FILTER_FLAG_EBPF   (1 << 1)
>> >>
>> >>  /*
>> >>   * All BPF programs must return a 32-bit value.
>> >> diff --git a/kernel/seccomp.c b/kernel/seccomp.c
>> >> index a2c5b32..9c6bea6 100644
>> >> --- a/kernel/seccomp.c
>> >> +++ b/kernel/seccomp.c
>> >> @@ -355,17 +355,6 @@ static struct seccomp_filter 
>> >> *seccomp_prepare_filter(struct sock_fprog *fprog)
>> >>
>> >> BUG_ON(INT_MAX / fprog->len < sizeof(struct sock_filter));
>> >>
>> >> -   /*
>> >> -* Installing a seccomp filter requires that the task has
>> >> -* CAP_SYS_ADMIN in its namespace or be running with no_new_privs.
>> >> -* This avoids scenarios where unprivileged tasks can affect the
>> >> -* behavior of privileged children.
>> >> -*/
>> >> -   if (!task_no_new_privs(current) &&
>> >> -   security_capable_noaudit(current_cred(), current_user_ns(),
>> >> -CAP_SYS_ADMIN) != 0)
>> >> -   return ERR_PTR(-EACCES);
>> >> -
>> >> /* Allocate a new seccomp_filter */
>> >> sfilter = kzalloc(sizeof(*sfilter), GFP_KERNEL | __GFP_NOWARN);
>> >> if (!sfilter)
>> >> @@ -509,6 +498,48 @@ static void seccomp_send_sigsys(int syscall, int 
>> >> reason)
>> >> info.si_syscall = syscall;
>> >> force_sig_info(SIGSYS, &info, current);
>> >>  }
>> >> +
>> >> +#ifdef CONFIG_BPF_SYSCALL
>> >> +static struct seccomp_filter *seccomp_prepare_ebpf(const char __user 
>> >> *filter)
>> >> +{
>> >> +   /* XXX: this cast generates a warning. should we make people pass 
>> >> in
>> >> +* &fd, or is there some nicer way of doing this?
>> >> +*/
>> >> +   u32 fd = (u32) filter;
>> >
>> > I think this is probably the right way to do it, modulo getting the
>> > warning fixed. Let me invoke the great linux-api subscribers to get
>> > some more opinions.
>>
>> Sigh. It's sad, but using a cast does seem the simplest option.
>> But, how about another idea...
>>
>> > tl;dr: adding SECCOMP_FILTER_FLAG_EBPF to the flags changes the
>> > pointer argument into an fd argument. Is this sane, should it be a
>> > pointer to an fd, or should it not be a flag at all, creating a new
>> > seccomp command instead (SECCOMP_MODE_FILTER_EBPF)?
>>
>> What about
>>
>> seccomp(SECCOMP_MODE_FILTER_EBPF, flags, structp)
>>
>> Where structp is a pointer to something like
>>
>> struct seccomp_ebpf {
>> int size;  /* Size of this whole struct */
>> int fd;
>> }
>>
>> 'size' allows for future expansion of the struct (in case we want to
>> expand it later), and placing 'fd' ins

Re: [PATCH 2/3] net: irda: pxaficp_ir: convert to readl and writel

2015-09-08 Thread Petr Cvek
On 8.9.2015 at 22:24, Petr Cvek wrote:
> 
> Did you define the resources somewhere? The actual resources are in the
> "pxa_ir_resources" variable at:
> 
>   http://lxr.free-electrons.com/source/arch/arm/mach-pxa/devices.c#L386
> 
> Or should this pdata be moved into the specific machine files?
> 

I tried adding the following patch for the new resources, but now it fails with:

[  141.534545] pxa2xx-ir pxa2xx-ir: can't request region for resource [mem 
0x4070-0x40700100]
[  141.534574] pxa2xx-ir pxa2xx-ir: resource stuart not defined
[  141.534656] pxa2xx-ir: probe of pxa2xx-ir failed with error -16

That's because STUART is allocated by the normal UART driver at:

http://lxr.free-electrons.com/source/arch/arm/mach-pxa/devices.c#L244

So somehow there must be one configuration for STUART used with FICP and
another for STUART alone (which can probably be used as a normal UART).


diff --git a/arch/arm/mach-pxa/devices.c b/arch/arm/mach-pxa/devices.c
index 3543466..316ffa3 100644
--- a/arch/arm/mach-pxa/devices.c
+++ b/arch/arm/mach-pxa/devices.c
@@ -394,6 +394,26 @@ static struct resource pxa_ir_resources[] = {
.end= IRQ_ICP,
.flags  = IORESOURCE_IRQ,
},
+   [2] = {
+   .start  = 0x4080,
+   .end= 0x4080 + 0x100,
+   .flags  = IORESOURCE_MEM,
+   },
+   [3] = {
+   .start  = 0x4070,
+   .end= 0x4070 + 0x100,
+   .flags  = IORESOURCE_MEM,
+   },
+   [4] = {
+   .start  = 17,
+   .end= 17,
+   .flags  = IORESOURCE_DMA,
+   },
+   [5] = {
+   .start  = 18,
+   .end= 18,
+   .flags  = IORESOURCE_DMA,
+   },
 };
 
 struct platform_device pxa_device_ficp = {
-- 
1.7.12.1




[PATCH v2] atomics,cmpxchg: Privatize the inclusion of asm/cmpxchg.h

2015-09-08 Thread Boqun Feng
After commit:

atomics: add acquire/release/relaxed variants of some atomic operations

Architectures may only provide {cmp,}xchg_relaxed definitions in
asm/cmpxchg.h. Other variants, such as {cmp,}xchg, may be built in
linux/atomic.h, which means simply including asm/cmpxchg.h may not get
the definitions of all the {cmp,}xchg variants. Therefore, we should
privatize the inclusions of asm/cmpxchg.h so that it is only included in
arch/*, and replace the inclusions elsewhere with linux/atomic.h.

Acked-by: Will Deacon 
Signed-off-by: Boqun Feng 
---
v1 --> v2:
1. rebase on current master branch of tip tree
2. remove documentation modification

 drivers/net/ethernet/sfc/mcdi.c | 2 +-
 drivers/phy/phy-rcar-gen2.c | 3 +--
 drivers/staging/speakup/selection.c | 2 +-
 3 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/sfc/mcdi.c b/drivers/net/ethernet/sfc/mcdi.c
index 81640f8..968383e 100644
--- a/drivers/net/ethernet/sfc/mcdi.c
+++ b/drivers/net/ethernet/sfc/mcdi.c
@@ -9,7 +9,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include "net_driver.h"
 #include "nic.h"
 #include "io.h"
diff --git a/drivers/phy/phy-rcar-gen2.c b/drivers/phy/phy-rcar-gen2.c
index 6e0d9fa..c7a0599 100644
--- a/drivers/phy/phy-rcar-gen2.c
+++ b/drivers/phy/phy-rcar-gen2.c
@@ -17,8 +17,7 @@
 #include 
 #include 
 #include 
-
-#include 
+#include 
 
 #define USBHS_LPSTS0x02
 #define USBHS_UGCTRL   0x80
diff --git a/drivers/staging/speakup/selection.c 
b/drivers/staging/speakup/selection.c
index 98af3b1..aa5ab6c 100644
--- a/drivers/staging/speakup/selection.c
+++ b/drivers/staging/speakup/selection.c
@@ -7,7 +7,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 
 #include "speakup.h"
 
-- 
2.5.1



Re: Linux kernel commit breaks IPMI on iface downing

2015-09-08 Thread Harish Patil

>On Fri, 2015-09-04 at 09:55 +0200, Sébastien Bocahu wrote:
>> Hi,
>> 
>> Any chance this behaviour gets fixed, with either a new firmware or a
>> workaround in the kernel ?
>> 
>
>As I said earlier, when we call bnx2_shutdown_chip(), we inform the
>firmware that the driver is shutting down.  The firmware should know
>that there is IPMI firmware and the link needs to stay up.
>
>In the older driver, we would also call bnx2_set_power_state() which
>would send some additional messages to the firmware before putting the
>device in D3hot.  May be these messages are required by the firmware to
>keep the link up.  Harish, please check with your firmware team.
>Thanks.
>
>
>
>

Hi Michael/Sebastien,
ACK. Sure, I will look into it and get back.

Thanks,
Harish



Re: [GIT] Networking

2015-09-08 Thread Konrad Rzeszutek Wilk
On Tue, Sep 8, 2015 at 10:14 PM, Konrad Rzeszutek Wilk
 wrote:
>
> (Removed Linus and Andrew from the To, added Corinna ..)

and resending again without HTML (sorry, thought I had HTML-emails
disabled by default)
>
> On Thu, Sep 3, 2015 at 1:35 AM, David Miller  wrote:
>>
>>
>> Another merge window, another set of networking changes.  I've heard
>> rumblings that the lightweight tunnels infrastructure has been voted
>> networking change of the year.  But what do I know?
>>
> .. snip..
>>
>> Corinna Vinschen (2):
>>   r8169: Add values missing in @get_stats64 from HW counters
>
>
> .. causes a regression when SWIOTLB is in use (see the full attached serial
> log), but the relevant snippet is:
>
> [   12.010065] BUG: sleeping function called from invalid context at 
> /home/konrad/ssd/konrad/linux/mm/page_alloc.c:3186
> [   12.88021e49b938 8174ff50 8800c6637748
> [   12.051548] ttyS1: 4 input overrun(s)
> [   12.055485]  8800c6637140 88021e49b958 810cf22e 
> 0001^G^G^G^G^G^G^G^G
> [   12.064566]   88021e49b988^G^G^G^G^G^G^G^G 
> 810cf2cd 88021e49b9b4
> [   12.073639] Call Trace:
> [   12.076156]  [] dump_stack+0x4f/0x68
> [   12.081444]  [] ___might_sleep+0xde/0x130
> [   12.087176]  [] __might_sleep+0x4d/0x90
> [   12.092731]  [] __alloc_pages_nodemask+0x26f/0xa20
> [   12.099271]  [] ? create_object+0x21e/0x2c0
> [   12.105183]  [] ? kmemleak_alloc+0x23/0x40
> [   12.111006]  [] ? kmem_cache_alloc_trace+0x184/0x190
> [   12.117721]  [] ? kmemleak_alloc+0x23/0x40
> [   12.123546]  [] dma_generic_alloc_coherent+0x9d/0x140
> [   12.130354]  [] x86_swiotlb_alloc_coherent+0x30/0x60
> [   12.137072]  [] dma_alloc_attrs+0x4c/0xb0
> [   12.142808]  [] rtl8169_update_counters+0x7e/0x150 
> [r8169]
> [   12.150061]  [] rtl8169_get_stats64+0xcb/0x130 [r8169]
> [   12.156956]  [] dev_get_stats+0x38/0x90
> [   12.162511]  [] dev_seq_printf_stats+0x23/0x100
> [   12.168786]  [] ? create_object+0x21e/0x2c0
> [   12.174715]  [] dev_seq_show+0xf/0x30
> [   12.180098]  [] seq_read+0x26a/0x400
> [   12.185384]  [] proc_reg_read+0x3e/0x70
> [   12.190943]  [] __vfs_read+0x2f/0xe0
> [   12.196245]  [] ? security_file_permission+0xa2/0xb0
> [   12.202972]  [] ? rw_verify_area+0x58/0xe0
> [   12.208799]  [] vfs_read+0x92/0xd0
> [   12.213908]  [] ? __fdget+0xe/0x10
> [   12.219024]  [] SyS_read+0x51/0xb0
> [   12.224140]  [] entry_SYSCALL_64_fastpath+0x12/0x71
>  done.
>
>  If I revert 6e85d5ad36a26debc23a9a865c029cbe242b2dc8 "r8169: Add values 
> missing in @get_stats64 from HW counters"
> I don't get this message.
>
> Thank you!
>>
>>


tst038
Description: Binary data


[PATCH net-next 1/1] net: fec: add netif status check before set mac address

2015-09-08 Thread Fugang Duan
The following sequence causes a system hang:
ifconfig eth0 down
ifconfig eth0 hw ether 00:10:19:19:81:19

After eth0 goes down, all FEC clocks are gated off. fec_set_mac_address()
then writes the new MAC address to the registers, which hangs the system.

So add a netif status check to avoid register access while the clocks are
gated off. The new MAC address is written to the registers once eth0 is
brought up.
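
The deferral described above can be pictured with a small userspace model.
Everything here is hypothetical (the demo_* names, the "running" flag standing
in for netif_running(), and the hw_addr array standing in for the MAC
registers); it only shows the address being stored while the interface is
down and reaching the "hardware" when it comes up.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

struct demo_dev {
	bool running;			/* stands in for netif_running() */
	unsigned char dev_addr[6];	/* software copy of the MAC address */
	unsigned char hw_addr[6];	/* stands in for the MAC registers */
	int hw_writes;			/* counts simulated register writes */
};

/* Simulated register write; only safe while the clocks are on. */
static void demo_program_hw(struct demo_dev *d)
{
	memcpy(d->hw_addr, d->dev_addr, sizeof(d->hw_addr));
	d->hw_writes++;
}

static int demo_set_mac(struct demo_dev *d, const unsigned char *addr)
{
	memcpy(d->dev_addr, addr, sizeof(d->dev_addr));
	if (!d->running)
		return 0;		/* interface down: skip register access */
	demo_program_hw(d);
	return 0;
}

/* Bringing the interface up applies whatever address is pending. */
static void demo_open(struct demo_dev *d)
{
	d->running = true;
	demo_program_hw(d);
}
```

In this model a MAC address set while the interface is down is not lost; it
is simply applied later, on the open path.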

Signed-off-by: Fugang Duan 
---
 drivers/net/ethernet/freescale/fec_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index 91925e3..cd09dbb 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3029,6 +3029,9 @@ fec_set_mac_address(struct net_device *ndev, void *p)
memcpy(ndev->dev_addr, addr->sa_data, ndev->addr_len);
}
 
+   if (!netif_running(ndev))
+   return 0;
+
writel(ndev->dev_addr[3] | (ndev->dev_addr[2] << 8) |
(ndev->dev_addr[1] << 16) | (ndev->dev_addr[0] << 24),
fep->hwp + FEC_ADDR_LOW);
-- 
1.9.1



[PATCH net] net: dsa: bcm_sf2: Fix 64-bits register writes

2015-09-08 Thread Florian Fainelli
The macro used to write 64-bit quantities to the 32-bit registers swapped
the value and offset arguments. We want to preserve the argument ordering
that writel() uses, for instance: value first, offset/base second.
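
A simplified userspace sketch of the convention follows. It does not model
the driver's indirect data register; the demo_* names and the flat array used
as a register file are assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t demo_regs[8];	/* fake register file, one slot per 32-bit word */

/* writel()-style ordering: value first, offset second. */
static void demo_writel(uint32_t val, uint32_t off)
{
	assert(off / 4 < 8);	/* stay inside the fake register file */
	demo_regs[off / 4] = val;
}

/* 64-bit write split into two 32-bit writes, same argument ordering. */
static void demo_writeq(uint64_t val, uint32_t off)
{
	demo_writel((uint32_t)(val >> 32), off + 4);	/* upper half */
	demo_writel((uint32_t)val, off);		/* lower half */
}
```

Because both parameters are integer types, a caller that passes them in the
wrong order still compiles, which is how such a swap can go unnoticed.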

Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
Signed-off-by: Florian Fainelli 
---
David,

Can you also queue this one for -stable? Thanks!

 drivers/net/dsa/bcm_sf2.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.h b/drivers/net/dsa/bcm_sf2.h
index 22e2ebf31333..789d7b7737da 100644
--- a/drivers/net/dsa/bcm_sf2.h
+++ b/drivers/net/dsa/bcm_sf2.h
@@ -112,8 +112,8 @@ static inline u64 name##_readq(struct bcm_sf2_priv *priv, 
u32 off)  \
spin_unlock(&priv->indir_lock); \
return (u64)indir << 32 | dir;  \
 }  \
-static inline void name##_writeq(struct bcm_sf2_priv *priv, u32 off,   \
-   u64 val)\
+static inline void name##_writeq(struct bcm_sf2_priv *priv, u64 val,   \
+   u32 off)\
 {  \
spin_lock(&priv->indir_lock);   \
reg_writel(priv, upper_32_bits(val), REG_DIR_DATA_WRITE);   \
-- 
2.1.0
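
The argument convention described above (value first, offset second) can be sketched in user space as follows. Everything here is an assumption made for illustration: fake_regs, fake_writel() and fake_writeq() are hypothetical, and the layout (low word at the offset, high word at offset + 4) is not the driver's actual indirect-register scheme. The point is only that a 64-bit helper should take its arguments in the same order as writel(), because a caller passing a u64 value where a u32 offset is expected may compile without an error while silently truncating the value.

```c
#include <assert.h>
#include <stdint.h>

#define FAKE_NUM_REGS 8 /* hypothetical number of 32-bit registers */

static uint32_t fake_regs[FAKE_NUM_REGS];

/* Same argument order as writel(): value first, offset second. */
static void fake_writel(uint32_t val, uint32_t off)
{
	assert(off % 4 == 0 && off / 4 < FAKE_NUM_REGS);
	fake_regs[off / 4] = val;
}

/*
 * 64-bit write built from two 32-bit writes, keeping the writel()
 * ordering. Assumed layout: low word at off, high word at off + 4.
 */
static void fake_writeq(uint64_t val, uint32_t off)
{
	fake_writel((uint32_t)(val & 0xffffffffu), off);
	fake_writel((uint32_t)(val >> 32), off + 4);
}
```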



Re: [PATCH net] net: dsa: bcm_sf2: Fix 64-bits register writes

2015-09-08 Thread Vivien Didelot
On Sep. Tuesday 08 (37) 08:06 PM, Florian Fainelli wrote:
> The macro to write 64-bits quantities to the 32-bits register swapped
> the value and offsets arguments, we want to preserve the ordering of the
> arguments with respect to how writel() is implemented for instance:
> value first, offset/base second.
> 
> Fixes: 246d7f773c13 ("net: dsa: add Broadcom SF2 switch driver")
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot 


Re: [PATCH net-next 1/1] net: fec: add netif status check before set mac address

2015-09-08 Thread Florian Fainelli
On 09/08/15 19:42, Fugang Duan wrote:
> The following sequence causes a system hang:
> ifconfig eth0 down
> ifconfig eth0 hw ether 00:10:19:19:81:19
> 
> After eth0 is brought down, all fec clocks are gated off. The
> .fec_set_mac_address() function then writes the new MAC address to the
> registers, which causes the system hang.
> 
> So a netif status check is needed to avoid register accesses while the
> clocks are gated off. The new MAC address is written into the related
> registers once eth0 is brought up.

Since this is a bug fix, shouldn't this target the "net" tree
instead of "net-next"?

> 
> Signed-off-by: Fugang Duan 
> ---
>  drivers/net/ethernet/freescale/fec_main.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 91925e3..cd09dbb 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -3029,6 +3029,9 @@ fec_set_mac_address(struct net_device *ndev, void *p)
>   memcpy(ndev->dev_addr, addr->sa_data, ndev->addr_len);
>   }
>  
> + if (!netif_running(ndev))
> + return 0;
> +
>   writel(ndev->dev_addr[3] | (ndev->dev_addr[2] << 8) |
>   (ndev->dev_addr[1] << 16) | (ndev->dev_addr[0] << 24),
>   fep->hwp + FEC_ADDR_LOW);
> 


-- 
Florian


RE: [PATCH net-next 1/1] net: fec: add netif status check before set mac address

2015-09-08 Thread Duan Andy
From: Florian Fainelli  Sent: Wednesday, September 09, 2015 11:38 AM
> To: Duan Fugang-B38611; da...@davemloft.net
> Cc: netdev@vger.kernel.org; bhutchi...@solarflare.com
> Subject: Re: [PATCH net-next 1/1] net: fec: add netif status check before
> set mac address
> 
> On 09/08/15 19:42, Fugang Duan wrote:
> > The following sequence causes a system hang:
> > ifconfig eth0 down
> > ifconfig eth0 hw ether 00:10:19:19:81:19
> >
> > After eth0 is brought down, all fec clocks are gated off. The
> > .fec_set_mac_address() function then writes the new MAC address to the
> > registers, which causes the system hang.
> >
> > So a netif status check is needed to avoid register accesses while the
> > clocks are gated off. The new MAC address is written into the related
> > registers once eth0 is brought up.
> 
> Since this is a bug fix, shouldn't this target the "net" tree
> instead of "net-next"?
> 
Thanks for the reminder; it should go into the net tree.

David, if you apply it, please put it into the net tree. Thanks.

> >
> > Signed-off-by: Fugang Duan 
> > ---
> >  drivers/net/ethernet/freescale/fec_main.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> > index 91925e3..cd09dbb 100644
> > --- a/drivers/net/ethernet/freescale/fec_main.c
> > +++ b/drivers/net/ethernet/freescale/fec_main.c
> > @@ -3029,6 +3029,9 @@ fec_set_mac_address(struct net_device *ndev, void *p)
> > 		memcpy(ndev->dev_addr, addr->sa_data, ndev->addr_len);
> > 	}
> >
> > +	if (!netif_running(ndev))
> > +		return 0;
> > +
> > 	writel(ndev->dev_addr[3] | (ndev->dev_addr[2] << 8) |
> > 		(ndev->dev_addr[1] << 16) | (ndev->dev_addr[0] << 24),
> > 		fep->hwp + FEC_ADDR_LOW);
> >
> 
> 
> --
> Florian

Re: [PATCH net] net: dsa: bcm_sf2: Fix ageing conditions and operation

2015-09-08 Thread David Miller
From: Florian Fainelli 
Date: Sat,  5 Sep 2015 13:07:27 -0700

> The comparison check between cur_hw_state and hw_state is currently
> invalid because cur_hw_state is right shifted by G_MISTP_SHIFT, while
> hw_state is not, so we end up comparing bits 2:0 with bits 7:5, which is
> going to cause an additional ageing to occur. Fix this by not shifting
> cur_hw_state while reading it, but instead, mask the value with the
> appropriately shifted bitmask.
> 
> The other problem with the fast-ageing process is that we did not set
> the EN_AGE_DYNAMIC bit to request the ageing to occur for dynamically
> learned MAC addresses. Finally, write back 0 to the FAST_AGE_CTRL
> register to avoid leaving spurious bits set from one operation to the
> other.
> 
> Fixes: 12f460f23423 ("net: dsa: bcm_sf2: add HW bridging support")
> Signed-off-by: Florian Fainelli 
> ---
> David, this dates back to 4.1, could you queue this for -stable?

Applied and queued up for -stable, thanks!
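
The shifted-versus-unshifted comparison described in the quoted message can be reproduced with a short sketch. The names STATE_SHIFT and STATE_MASK are hypothetical; the shift of 5 and the 3-bit width are assumptions inferred from the "bits 2:0 with bits 7:5" wording, not taken from the driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define STATE_SHIFT 5u                    /* assumed: field occupies bits 7:5 */
#define STATE_MASK  (0x7u << STATE_SHIFT) /* assumed: 3-bit wide field */

/* Buggy variant: the register value is shifted down, the new state is not. */
static bool state_changed_buggy(uint32_t reg, uint32_t hw_state)
{
	uint32_t cur = (reg >> STATE_SHIFT) & 0x7u; /* now in bits 2:0 */

	return cur != hw_state;                     /* hw_state is in bits 7:5 */
}

/* Fixed variant: mask in place so both operands live in bits 7:5. */
static bool state_changed_fixed(uint32_t reg, uint32_t hw_state)
{
	uint32_t cur = reg & STATE_MASK;

	assert((hw_state & ~STATE_MASK) == 0);
	return cur != hw_state;
}
```

With a register already holding the requested state, the buggy variant still reports a change (and would trigger an unnecessary ageing pass), while the masked variant does not.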


Re: [PATCH net] net: bridge: check __vlan_vid_del for error

2015-09-08 Thread David Miller
From: Vivien Didelot 
Date: Sat,  5 Sep 2015 21:27:57 -0400

> Since __vlan_del can return an error code, change its inner function
> __vlan_vid_del to return any error from switchdev_port_obj_del.
> 
> Signed-off-by: Vivien Didelot 

Applied, thanks.


Re: [PATCH] net: bridge: remove unnecessary switchdev include

2015-09-08 Thread David Miller
From: Vivien Didelot 
Date: Sat,  5 Sep 2015 21:49:41 -0400

> Remove the unnecessary switchdev.h include from br_netlink.c.
> 
> Signed-off-by: Vivien Didelot 

Applied.


Re: [net-next PATCH v2] net: ipv6: use common fib_default_rule_pref

2015-09-08 Thread David Miller
From: Phil Sutter 
Date: Sun,  6 Sep 2015 12:20:58 +0200

> This switches IPv6 policy routing to use the shared
> fib_default_rule_pref() function of IPv4 and DECnet. It is also used in
> multicast routing for IPv4 as well as IPv6.
> 
> The motivation for this patch is a complaint about iproute2 behaving
> inconsistently between IPv4 and IPv6 when adding policy rules: Formerly,
> IPv6 rules were assigned a fixed priority of 0x3FFF whereas for IPv4 the
> assigned priority value was decreased with each rule added.
> 
> Since then all users of the default_pref field have been converted to
> assign the generic function fib_default_rule_pref(), fib_nl_newrule()
> may just use it directly instead. Therefore get rid of the function
> pointer altogether and make fib_default_rule_pref() static, as it's not
> used outside fib_rules.c anymore.
> 
> Signed-off-by: Phil Sutter 
> ---
> Changes since v1:
> - Folded together with API change and adjusted commit message accordingly.

Applied, thanks.


Re: [net-next PATCH v2] net: ipv6: use common fib_default_rule_pref

2015-09-08 Thread David Miller
From: David Miller 
Date: Tue, 08 Sep 2015 22:37:05 -0700 (PDT)

> Applied, thanks.

Actually, reverted, this doesn't even compile.

net/core/fib_rules.c:47:12: error: static declaration of ‘fib_default_rule_pref’ follows non-static declaration
 static u32 fib_default_rule_pref(struct fib_rules_ops *ops)
            ^
In file included from net/core/fib_rules.c:18:0:
include/net/fib_rules.h:120:5: note: previous declaration of ‘fib_default_rule_pref’ was here
 u32 fib_default_rule_pref(struct fib_rules_ops *ops);
     ^


Re: [PATCH] net: stmmac: Use msleep rather then udelay for reset delay

2015-09-08 Thread David Miller
From: Sjoerd Simons 
Date: Sun,  6 Sep 2015 17:50:59 +0200

> @@ -161,11 +161,16 @@ int stmmac_mdio_reset(struct mii_bus *bus)
>  
>   if (!gpio_request(reset_gpio, "mdio-reset")) {
>   gpio_direction_output(reset_gpio, active_low ? 1 : 0);
> - udelay(data->delays[0]);
> + if (data->delays[0])
> + msleep((data->delays[0] + 999) / 1000);

Please use something like DIV_ROUND_UP(..., USEC_PER_MSEC) or similar.
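
For reference, the suggested form can be tried out in user space. This is a minimal sketch: the two macros are local copies written for this example (the kernel provides its own DIV_ROUND_UP and USEC_PER_MSEC), and usecs_to_msecs_round_up() is a hypothetical helper name, not something from the stmmac driver.

```c
#include <assert.h>

/* Local copies for a user-space build; the kernel has its own definitions. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define USEC_PER_MSEC 1000L

/* Convert a delay in microseconds to whole milliseconds, rounding up. */
static long usecs_to_msecs_round_up(long usecs)
{
	assert(usecs >= 0);
	return DIV_ROUND_UP(usecs, USEC_PER_MSEC);
}
```

This computes the same value as the open-coded (x + 999) / 1000 in the quoted hunk, but states the intent and the unit conversion explicitly.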


Re: [PATCH] dm9000: fix a typo

2015-09-08 Thread David Miller
From: Barry Song <21cn...@gmail.com>
Date: Mon,  7 Sep 2015 03:15:20 +

> From: Barry Song 
> 
> Signed-off-by: Barry Song 

Applied, thanks.


Re: [PATCH] net: tipc: fix stall during bclink wakeup procedure

2015-09-08 Thread David Miller
From: Kolmakov Dmitriy 
Date: Mon, 7 Sep 2015 09:05:48 +

> If an attempt to wake up users of the broadcast link is made when there
> is not enough space in the send queue, it may hang inside the
> tipc_sk_rcv() function, since the loop breaks only after the wakeup
> queue becomes empty. This can lead to a complete CPU stall with the
> following message generated by RCU:
 ...
> The issue occurs only when tipc_sk_rcv() is used to wake up postponed
> senders:
 ...
> After the sender thread is woken up, it can gain control and attempt
> to send a message. But if there is not enough space in the send
> queue, it will call the link_schedule_user() function, which puts a
> message of type SOCK_WAKEUP on the wakeup queue and puts the sender to
> sleep. Thus the size of the queue does not actually change and the
> while() loop never exits.
> 
> The approach I propose is to wake up only senders for which there is
> enough space in the send queue, so the described issue can't occur.
> Moreover, the same approach is already used to wake up senders on
> unicast links.
> 
> I ran into the issue in our product code, but to reproduce it I
> changed a benchmark test application (from
> tipcutils/demos/benchmark) to perform the following scenario:
>   1. Run 64 instances of the test application (nodes). This can be done
>      on one physical machine.
>   2. Each application connects to all the others using TIPC sockets in
>      RDM mode.
>   3. When setup is done, all nodes simultaneously start sending
>      broadcast messages.
>   4. Everything hangs.
> 
> The issue is reproducible only when congestion on the broadcast link
> occurs. For example, when there are only 8 nodes it works fine since
> congestion doesn't occur. The send queue limit is 40 in my case (I use
> a critical importance level) and when 64 nodes send a message at the
> same moment, congestion occurs every time.
> 
> Signed-off-by: Dmitry S Kolmakov 
> Reviewed-by: Jon Maloy 
> Acked-by: Ying Xue 
> ---
> v2: Updated after comments from Jon and Ying.

Applied, thanks.
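
The stall described in the quoted message can be modelled with a toy queue. This is a user-space sketch, not TIPC code: struct toy_link and the toy_* functions are hypothetical, and the iteration cap exists only so the naive loop terminates in the model. The limit of 40 is the send queue limit mentioned in the report.

```c
#include <assert.h>
#include <stdbool.h>

#define SEND_QUEUE_LIMIT 40 /* send queue limit mentioned in the report */

/* Hypothetical model of a congested link. */
struct toy_link {
	int queued;  /* messages currently in the send queue */
	int waiting; /* senders parked on the wakeup queue */
};

/* A woken sender either queues its message or parks itself again. */
static bool toy_sender_run(struct toy_link *l)
{
	if (l->queued >= SEND_QUEUE_LIMIT) {
		l->waiting++; /* back on the wakeup queue */
		return false;
	}
	l->queued++;
	return true;
}

/*
 * Naive wakeup: loop until the wakeup queue is empty. Returns the number
 * of iterations, or -1 if max_iter was exceeded (the modelled stall).
 */
static int toy_wakeup_naive(struct toy_link *l, int max_iter)
{
	int iter = 0;

	while (l->waiting > 0) {
		if (iter++ >= max_iter)
			return -1;
		l->waiting--;
		toy_sender_run(l);
	}
	return iter;
}

/* Bounded wakeup: only wake senders while the send queue has room. */
static int toy_wakeup_bounded(struct toy_link *l)
{
	int woken = 0;

	while (l->waiting > 0 && l->queued < SEND_QUEUE_LIMIT) {
		bool sent;

		l->waiting--;
		sent = toy_sender_run(l);
		assert(sent); /* room was checked before waking */
		woken++;
	}
	return woken;
}
```

With a full send queue the naive loop never drains the wakeup queue, because each woken sender immediately parks itself again; the bounded variant simply leaves the remaining senders asleep until space becomes available.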


Re: [PATCHv1 net-next 0/5] netlink: mmap: kernel panic and some issues

2015-09-08 Thread David Miller
From: Daniel Borkmann 
Date: Mon, 07 Sep 2015 16:54:46 +0200

> On 08/17/2015 11:02 PM, David Miller wrote:
> ...
>> I would seriously rather see us do an expensive full copy of the SKB
>> than to have traffic which is unexpectedly invisible to taps.
> 
> I've been looking into this issue a bit further, so the copy for the
> tap seems doable, but while further going through the code to find
> similar issues elsewhere, and doing some experiments, it looks like we
> write shared info also in some edge-cases of upcalls such as nfqueue or
> ovs when mmaped netlink is used for rx. I did a test with nfqueue using
> the libmnl mmap branch [1].

Honestly if it's something isolated to something like nf_queue it can
be contained to just being a special fix there.

nf_queue is usually very special and needs hacks to handle things
properly since it acts as an "escape" point for various SKB things.

But if it's in OVS too

I guess we need a more generic fix.


Re: [PATCH 2/3] net: irda: pxaficp_ir: convert to readl and writel

2015-09-08 Thread Robert Jarzmik
Petr Cvek  writes:

> On 8.9.2015 at 22:24, Petr Cvek wrote:
>> 
>> Did you define the resources somewhere? The actual resources are in
>> the "pxa_ir_resources" variable at:
I have them in patch [1], which is exactly the patch you have made yourself.

> I tried to add the following patch for new resources, but now it fails with:
>
> [  141.534545] pxa2xx-ir pxa2xx-ir: can't request region for resource [mem 0x4070-0x40700100]
> [  141.534574] pxa2xx-ir pxa2xx-ir: resource stuart not defined
> [  141.534656] pxa2xx-ir: probe of pxa2xx-ir failed with error -16
>
> That's because STUART is allocated by normal UART driver at:
>
>   http://lxr.free-electrons.com/source/arch/arm/mach-pxa/devices.c#L244
>
> So somehow there must be configuration for STUART used with FICP and STUART
> alone (probably can be used for normal UART).
That's because you have to remove from magician.c:
pxa_set_stuart_info(NULL);

Cheers.

--
Robert

[1]
From ea242c5b1c4dcdf2a99ea604ee542ded5e6384b9 Mon Sep 17 00:00:00 2001
From: Robert Jarzmik 
Date: Sat, 29 Aug 2015 00:37:51 +0200
Subject: [PATCH] ARM: pxa: add resources to pxaficp_ir

Add io memory and dma requestor lines to the irda pxa device. This is
part of the conversion of pxaficp_ir to dmaengine, and to shrink its
adherence to 'mach' includes.

Signed-off-by: Robert Jarzmik 
---
 arch/arm/mach-pxa/devices.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/arch/arm/mach-pxa/devices.c b/arch/arm/mach-pxa/devices.c
index e6ce669b54af..6d5ab8199536 100644
--- a/arch/arm/mach-pxa/devices.c
+++ b/arch/arm/mach-pxa/devices.c
@@ -395,6 +395,26 @@ static struct resource pxa_ir_resources[] = {
.end= IRQ_ICP,
.flags  = IORESOURCE_IRQ,
},
+   [3] = {
+   .start  = 0x4080,
+   .end= 0x4080001b,
+   .flags  = IORESOURCE_MEM,
+   },
+   [4] = {
+   .start  = 0x4070,
+   .end= 0x40700023,
+   .flags  = IORESOURCE_MEM,
+   },
+   [5] = {
+   .start  = 17,
+   .end= 17,
+   .flags  = IORESOURCE_DMA,
+   },
+   [6] = {
+   .start  = 18,
+   .end= 18,
+   .flags  = IORESOURCE_DMA,
+   },
 };
 
 struct platform_device pxa_device_ficp = {
-- 
2.1.4



Re: [RFC PATCH net-next 0/3] net: switchdev: extract specific object structures

2015-09-08 Thread Jiri Pirko
Tue, Sep 08, 2015 at 10:47:52PM CEST, vivien.dide...@savoirfairelinux.com wrote:
>Hi!
>
>Current implementations of .switchdev_port_obj_add and .switchdev_port_obj_dump
>must pass the generic switchdev_obj structure instead of a specific one (e.g.
>switchdev_obj_fdb) to the related driver accessors, because it contains the
>object transaction and callback. This is not very convenient for drivers.
>
>Instead of having all specific structures included into a switchdev_obj union,
>it would be simpler to embed a switchdev_obj structure as the first member of
>the more specific ones. That way, a driver can cast the switchdev_obj pointer
>to the specific structure and have access to all the information from it.
>
>As an example, this allows Rocker to change its two FDB accessors:
>
>rocker_port_fdb_add(rocker_port, obj->trans, &obj->u.fdb);
>rocker_port_fdb_dump(rocker_port, obj);
>
>for these more consistent ones:
>
>fdb = (struct switchdev_obj_fdb *) obj;
>rocker_port_fdb_add(rocker_port, fdb);
>rocker_port_fdb_dump(rocker_port, fdb);
>
>This is what struct netdev_notifier_info and its specific supersets (e.g.
>struct netdev_notifier_changeupper_info) do in include/linux/netdevice.h.
>
>This patchset does that and updates bridge, Rocker and DSA accordingly.
>
>Note that this patchset was sent as an RFC so as not to bother David with new
>net-next material, but if the changes look good, it is ready to merge.
>
>Also, please take note that the change sits on top of:
>http://patchwork.ozlabs.org/patch/514894/
>
>Vivien Didelot (3):
>  net: switchdev: extract switchdev_obj_vlan
>  net: switchdev: extract switchdev_obj_ipv4_fib
>  net: switchdev: extract switchdev_obj_fdb

I like this patchset a lot! I think it is the right way. Thanks for doing this.
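
The embedding pattern discussed in the cover letter can be shown with simplified types. This is a sketch only: struct toy_obj, struct toy_obj_fdb and toy_port_obj_add() are hypothetical names with illustrative fields, not the switchdev definitions. It relies on the C guarantee that a pointer to a structure can be converted to a pointer to its first member and back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified types; field names are illustrative only. */
enum toy_obj_id { TOY_OBJ_VLAN, TOY_OBJ_FDB };

struct toy_obj {
	enum toy_obj_id id; /* tells the callee which superset it got */
};

struct toy_obj_fdb {
	struct toy_obj obj; /* must remain the first member */
	uint8_t addr[6];
	uint16_t vid;
};

/*
 * Generic entry point: receives the base object and casts it to the
 * specific structure once the id has been checked.
 */
static uint16_t toy_port_obj_add(const struct toy_obj *obj)
{
	assert(offsetof(struct toy_obj_fdb, obj) == 0);

	if (obj->id == TOY_OBJ_FDB) {
		const struct toy_obj_fdb *fdb = (const struct toy_obj_fdb *)obj;

		return fdb->vid;
	}
	return 0;
}
```

A caller builds the specific structure and passes &fdb.obj to the generic function; the callee recovers the full structure with a single cast instead of reaching into a union.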