date:20160902

[PATCH] RDS: Simplify code

2016-09-02 Thread Christophe JAILLET

Calling 'list_splice' followed by 'INIT_LIST_HEAD' is equivalent to
'list_splice_init'.

This has been spotted with the following coccinelle script:
/
@@
expression y,z;
@@

-   list_splice(y,z);
-   INIT_LIST_HEAD(y);
+   list_splice_init(y,z);

Signed-off-by: Christophe JAILLET 
---
 net/rds/loop.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/rds/loop.c b/net/rds/loop.c
index f2bf78de5688..c3e6da4fdf97 100644
--- a/net/rds/loop.c
+++ b/net/rds/loop.c
@@ -167,8 +167,7 @@ void rds_loop_exit(void)
 
/* avoid calling conn_destroy with irqs off */
spin_lock_irq(_conns_lock);
-   list_splice(_conns, _list);
-   INIT_LIST_HEAD(_conns);
+   list_splice_init(_conns, _list);
spin_unlock_irq(_conns_lock);
 
list_for_each_entry_safe(lc, _lc, _list, loop_node) {
-- 
2.7.4

Re: [BUG] af_packet FANOUT and device dismantle

2016-09-02 Thread David Miller

From: Eric Dumazet 
Date: Thu, 01 Sep 2016 17:04:43 -0700

> When you added fanout in commit dc99f600698dcac,
> it seems a device dismantle is not properly handled.
> 
> packet_notifier() does properly finds all sockets attached to the
> device and we call __unregister_prot_hook()
> 
> But the actual dev_remove_pack() is called when the last socket attached
> to the FANOUT group is _closed_ , possibly minutes after the device
> disappeared.
> 
> So we trigger the BUG_ON(!list_empty(>ptype_all)); in
> netdev_run_todo(), that came with linux-4.0 in 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=7866a621043fbaca3d7389e9b9f69dd1a2e5a855
> 
> Fix would be to call dev_remote_pack() from __fanout_unlink() when
> num_members becomes 0.
> 
> Let me know if you agree with this, thanks.

I completely agree with your analysis, good catch.

Re: [PATCH v3 0/4] net: smsc911x: Move phy and interrupt config

2016-09-02 Thread David Miller

From: Jeremy Linton 
Date: Thu,  1 Sep 2016 15:15:05 -0500

> v2-v3: Move error handing into separate patch, replace a couple cases
>  of fixed errors with the errors being returned from the failing functions.
>  Hoist irq handler.
> 
> The smsc911x driver is doing a number of things in its probe routine that
> should be delayed until the interface is started. Because of this, the module
> cannot be unloaded, the phy states are incorrect/stale if the interface isn't
> running, open's unnecessarily fail causing network configuration problems, and
> the /proc/irq nodes are incorrectly named.
> 
> Clean up a number of these problems by moving the mdio and interrupt
> configuration into the smsc911x_open routine.

Series applied, thanks.

Re: [PATCH net-next V5 0/8] liquidio CN23XX support

2016-09-02 Thread David Miller

From: Raghu Vatsavayi 
Date: Thu, 1 Sep 2016 11:16:03 -0700

> I am posting the remaining half of patchset after the
> acceptance of first half. With this patchset I am able
> to completely submit the code of V3 patchset  which you
> earlier advised me to split into smaller ones.
> 
> This V5 patch also addresses all the comments from previous
> submission:
> 1) Avoid busy loop while reading registers.
> 2) Other minor comments about debug messages and constants.
> 
> Please apply patches in following order as some of the
> patches depend on earlier patches.

Series applied.

Re: [PATCH net-next 0/3] tipc: improve broadcast NACK mechanism

2016-09-02 Thread David Miller

From: Jon Maloy 
Date: Thu,  1 Sep 2016 13:52:48 -0400

> The broadcast protocol has turned out to not scale well beyond 70-80
> nodes, while it is now possible to build TIPC clusters of at least ten
> times that size. This commit series improves the NACK/retransmission
> mechanism of the broadcast protocol to make is at scalable as the rest
> of TIPC.

Series applied, thanks Jon.

Re: [PATCH v4 0/6] Support the rk3399 gmac and pd function

2016-09-02 Thread David Miller

From: Caesar Wang 
Date: Fri,  2 Sep 2016 01:49:58 +0800

> This patch have the following changes:
> 
> 7edf13e net: stmmac: dwmac-rk: add rk3366 & rk3399 specific data
> 26e004e net: stmmac: dwmac-rk: fixes the gmac resume after PD on/off
> b216c2f net: stmmac: dwmac-rk: add pd_gmac support for rk3399
> 848bb71 arm64: dts: rockchip: add the gmac power domain on rk3399
> 508e41f arm64: dts: rockchip: add the gmac needed node for rk3399
> fb26795 arm64: dts: rockchip: enable the gmac for rk3399 evb board
> 
> Hi David,
> The patch 1,2,3 is related to the rockchip net/stammc driver,
> 
> Hi Heiko,
> The patch 4,5,6 is related to the dts changes.

Patches 1, 2, and 3 applied to net-next, thanks.

[PATCH 0/2] hso: neatening

2016-09-02 Thread Joe Perches

This seems to be the only code in the kernel that uses
macro defines with a trailing underscore.  Fix that.

Joe Perches (2):
  hso: Use a more common logging style
  hso: Convert printk to pr_

 drivers/net/usb/hso.c | 118 +++---
 1 file changed, 55 insertions(+), 63 deletions(-)

-- 
2.10.0.rc2.1.g053435c

[PATCH 2/2] hso: Convert printk to pr_

2016-09-02 Thread Joe Perches

Use a more common logging style

Miscellanea:

o Add pr_fmt to prefix each output message
o Realign arguments

Signed-off-by: Joe Perches 
---
 drivers/net/usb/hso.c | 21 +++--
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
index 6c37512..e7b5163 100644
--- a/drivers/net/usb/hso.c
+++ b/drivers/net/usb/hso.c
@@ -50,6 +50,8 @@
  *
  */
 
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
 #include 
 #include 
 #include 
@@ -638,7 +640,7 @@ static int get_free_serial_index(void)
}
spin_unlock_irqrestore(_table_lock, flags);
 
-   printk(KERN_ERR "%s: no free serial devices in table\n", __func__);
+   pr_err("%s: no free serial devices in table\n", __func__);
return -1;
 }
 
@@ -1102,7 +1104,7 @@ static void _hso_serial_set_termios(struct tty_struct 
*tty,
struct hso_serial *serial = tty->driver_data;
 
if (!serial) {
-   printk(KERN_ERR "%s: no tty structures", __func__);
+   pr_err("%s: no tty structures", __func__);
return;
}
 
@@ -1347,7 +1349,7 @@ static int hso_serial_write(struct tty_struct *tty, const 
unsigned char *buf,
 
/* sanity check */
if (serial == NULL) {
-   printk(KERN_ERR "%s: serial is NULL\n", __func__);
+   pr_err("%s: serial is NULL\n", __func__);
return -ENODEV;
}
 
@@ -1773,7 +1775,7 @@ static int mux_device_request(struct hso_serial *serial, 
u8 type, u16 port,
 
/* Sanity check */
if (!serial || !ctrl_urb || !ctrl_req) {
-   printk(KERN_ERR "%s: Wrong arguments\n", __func__);
+   pr_err("%s: Wrong arguments\n", __func__);
return -EINVAL;
}
 
@@ -3220,7 +3222,7 @@ static int __init hso_init(void)
int result;
 
/* put it in the log */
-   printk(KERN_INFO "hso: %s\n", version);
+   pr_info("%s\n", version);
 
/* Initialise the serial table semaphore and table */
spin_lock_init(_table_lock);
@@ -3251,16 +3253,15 @@ static int __init hso_init(void)
/* register the tty driver */
result = tty_register_driver(tty_drv);
if (result) {
-   printk(KERN_ERR "%s - tty_register_driver failed(%d)\n",
-   __func__, result);
+   pr_err("%s - tty_register_driver failed(%d)\n",
+  __func__, result);
goto err_free_tty;
}
 
/* register this module as an usb driver */
result = usb_register(_driver);
if (result) {
-   printk(KERN_ERR "Could not register hso driver? error: %d\n",
-   result);
+   pr_err("Could not register hso driver - error: %d\n", result);
goto err_unreg_tty;
}
 
@@ -3275,7 +3276,7 @@ err_free_tty:
 
 static void __exit hso_exit(void)
 {
-   printk(KERN_INFO "hso: unloaded\n");
+   pr_info("unloaded\n");
 
tty_unregister_driver(tty_drv);
put_tty_driver(tty_drv);
-- 
2.10.0.rc2.1.g053435c

[PATCH 1/2] hso: Use a more common logging style

2016-09-02 Thread Joe Perches

Macros that end in an underscore are just odd.
Add hso_dbg(level, fmt, ...) and use it everwhere instead.

Several uses had additional unnecessary newlines as the
macro added a newline.  Remove the newline from the macro
and add newlines to each use as appropriate.

Remove now unused D macros.

Signed-off-by: Joe Perches 
---
 drivers/net/usb/hso.c | 97 +++
 1 file changed, 44 insertions(+), 53 deletions(-)

diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
index c5544d3..6c37512 100644
--- a/drivers/net/usb/hso.c
+++ b/drivers/net/usb/hso.c
@@ -108,23 +108,12 @@
 /*/
 /* Debugging functions   */
 /*/
-#define D__(lvl_, fmt, arg...) \
-   do {\
-   printk(lvl_ "[%d:%s]: " fmt "\n",   \
-  __LINE__, __func__, ## arg); \
-   } while (0)
-
-#define D_(lvl, args...)   \
-   do {\
-   if (lvl & debug)\
-   D__(KERN_INFO, args);   \
-   } while (0)
-
-#define D1(args...)D_(0x01, ##args)
-#define D2(args...)D_(0x02, ##args)
-#define D3(args...)D_(0x04, ##args)
-#define D4(args...)D_(0x08, ##args)
-#define D5(args...)D_(0x10, ##args)
+#define hso_dbg(lvl, fmt, ...) \
+do {   \
+   if ((lvl) & debug)  \
+   pr_info("[%d:%s] " fmt, \
+   __LINE__, __func__, ##__VA_ARGS__); \
+} while (0)
 
 /*/
 /* Enumerators   */
@@ -709,7 +698,8 @@ static void handle_usb_error(int status, const char 
*function,
}
 
/* log a meaningful explanation of an USB status */
-   D1("%s: received USB status - %s (%d)", function, explanation, status);
+   hso_dbg(0x1, "%s: received USB status - %s (%d)\n",
+   function, explanation, status);
 }
 
 /* Network interface functions */
@@ -808,7 +798,7 @@ static netdev_tx_t hso_net_start_xmit(struct sk_buff *skb,
DUMP1(skb->data, skb->len);
/* Copy it from kernel memory to OUR memory */
memcpy(odev->mux_bulk_tx_buf, skb->data, skb->len);
-   D1("len: %d/%d", skb->len, MUX_BULK_TX_BUF_SIZE);
+   hso_dbg(0x1, "len: %d/%d\n", skb->len, MUX_BULK_TX_BUF_SIZE);
 
/* Fill in the URB for shipping it out. */
usb_fill_bulk_urb(odev->mux_bulk_tx_urb,
@@ -872,7 +862,7 @@ static void packetizeRx(struct hso_net *odev, unsigned char 
*ip_pkt,
unsigned char *tmp_rx_buf;
 
/* log if needed */
-   D1("Rx %d bytes", count);
+   hso_dbg(0x1, "Rx %d bytes\n", count);
DUMP(ip_pkt, min(128, (int)count));
 
while (count) {
@@ -912,7 +902,7 @@ static void packetizeRx(struct hso_net *odev, unsigned char 
*ip_pkt,
frame_len);
if (!odev->skb_rx_buf) {
/* We got no receive buffer. */
-   D1("could not allocate memory");
+   hso_dbg(0x1, "could not allocate 
memory\n");
odev->rx_parse_state = WAIT_SYNC;
continue;
}
@@ -972,11 +962,11 @@ static void packetizeRx(struct hso_net *odev, unsigned 
char *ip_pkt,
break;
 
case WAIT_SYNC:
-   D1(" W_S");
+   hso_dbg(0x1, " W_S\n");
count = 0;
break;
default:
-   D1(" ");
+   hso_dbg(0x1, "\n");
count--;
break;
}
@@ -1020,7 +1010,7 @@ static void read_bulk_callback(struct urb *urb)
 
/* Sanity check */
if (!odev || !test_bit(HSO_NET_RUNNING, >flags)) {
-   D1("BULK IN callback but driver is not active!");
+   hso_dbg(0x1, "BULK IN callback but driver is not active!\n");
return;
}
usb_mark_last_busy(urb->dev);
@@ -1116,7 +1106,7 @@ static void _hso_serial_set_termios(struct tty_struct 
*tty,
return;
}
 
-   D4("port %d", serial->minor);
+   hso_dbg(0x8, "port %d\n", serial->minor);
 
/*

Re: [ovs-dev] [PATCH net-next v21 3/4] openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes

2016-09-02 Thread pravin shelar

On Fri, Sep 2, 2016 at 2:42 PM, pravin shelar  wrote:
> On Thu, Sep 1, 2016 at 1:45 PM, Eric Garver  wrote:
>> Add support for 802.1ad including the ability to push and pop double
>> tagged vlans. Add support for 802.1ad to netlink parsing and flow
>> conversion. Uses double nested encap attributes to represent double
>> tagged vlan. Inner TPID encoded along with ctci in nested attributes.
>>
>> This is based on Thomas F Herbert's original v20 patch. I made some
>> small clean ups and bug fixes.
>>
>> Signed-off-by: Thomas F Herbert 
>> Signed-off-by: Eric Garver 
>

checkpatch.pl is also not happy with the patch, It found bunch of issues.

./scripts/checkpatch.pl
0003-openvswitch-802.1AD-Flow-handling-actions-vlan-parsi.patch
...
...
total: 6 errors, 9 warnings, 1 checks, 460 lines checked

Re: [ovs-dev] [PATCH net-next v21 4/4] openvswitch: report error on VLAN nlattr in ovs_key_from_nlattrs()

2016-09-02 Thread pravin shelar

On Thu, Sep 1, 2016 at 1:45 PM, Eric Garver  wrote:
> With 802.1ad support these are parsed and set upfront by
> parse_vlan_from_nlattrs() before ovs_key_from_nlattrs() is ever called.
> As such we should never see a VLAN attribute in ovs_key_from_nlattrs().
>
> Signed-off-by: Eric Garver 
> ---
>  net/openvswitch/flow_netlink.c | 19 +--
>  1 file changed, 5 insertions(+), 14 deletions(-)
>
> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index fbe9e0e4792d..0f36aead744b 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
> @@ -923,20 +923,11 @@ static int ovs_key_from_nlattrs(struct net *net, struct 
> sw_flow_match *match,
> }
>
> if (attrs & (1 << OVS_KEY_ATTR_VLAN)) {
> -   __be16 tci;
> -
> -   tci = nla_get_be16(a[OVS_KEY_ATTR_VLAN]);
> -   if (!(tci & htons(VLAN_TAG_PRESENT))) {
> -   if (is_mask)
> -   OVS_NLERR(log, "VLAN TCI mask does not have 
> exact match for VLAN_TAG_PRESENT bit.");
> -   else
> -   OVS_NLERR(log, "VLAN TCI does not have 
> VLAN_TAG_PRESENT bit set.");
> -
> -   return -EINVAL;
> -   }
> -
> -   SW_FLOW_KEY_PUT(match, eth.vlan.tci, tci, is_mask);
> -   attrs &= ~(1 << OVS_KEY_ATTR_VLAN);
> +   /* VLAN attribute is always parsed before getting here since 
> it may
> +* occur multiple times.
> +*/
> +   OVS_NLERR(log, "VLAN attribute unexpected.");
> +   return -EINVAL;
> }
>
Is there reason for not merging this patch with earlier patch?

Re: [ovs-dev] [PATCH net-next v21 1/4] openvswitch: 802.1ad uapi changes.

2016-09-02 Thread pravin shelar

On Thu, Sep 1, 2016 at 1:45 PM, Eric Garver  wrote:
> From: Thomas F Herbert 
>
> openvswitch: Add support for 8021.AD
>
> Change the description of the VLAN tpid field.
>
> Signed-off-by: Thomas F Herbert 

Acked-by: Pravin B Shelar

Re: [ovs-dev] [PATCH net-next v21 2/4] vlan: Check for vlan ethernet types for 8021.q or 802.1ad

2016-09-02 Thread pravin shelar

On Thu, Sep 1, 2016 at 1:45 PM, Eric Garver  wrote:
> This is to simplify using double tagged vlans. This function allows all
> valid vlan ethertypes to be checked in a single function call.
> Also replace some instances that check for both ETH_P_8021Q and
> ETH_P_8021AD.
>
> Patch based on one originally by Thomas F Herbert.
>
> Signed-off-by: Thomas F Herbert 
> Signed-off-by: Eric Garver 

Acked-by: Pravin B Shelar

Re: [ovs-dev] [PATCH net-next v21 3/4] openvswitch: 802.1AD Flow handling, actions, vlan parsing, netlink attributes

2016-09-02 Thread pravin shelar

On Thu, Sep 1, 2016 at 1:45 PM, Eric Garver  wrote:
> Add support for 802.1ad including the ability to push and pop double
> tagged vlans. Add support for 802.1ad to netlink parsing and flow
> conversion. Uses double nested encap attributes to represent double
> tagged vlan. Inner TPID encoded along with ctci in nested attributes.
>
> This is based on Thomas F Herbert's original v20 patch. I made some
> small clean ups and bug fixes.
>
> Signed-off-by: Thomas F Herbert 
> Signed-off-by: Eric Garver 

Thanks for working on this. This version looks pretty clone to complete.

> ---
>  net/openvswitch/actions.c  |  16 +--
>  net/openvswitch/flow.c |  64 
>  net/openvswitch/flow.h |   8 +-
>  net/openvswitch/flow_netlink.c | 227 
> ++---
>  net/openvswitch/vport.c|   7 +-
>  5 files changed, 235 insertions(+), 87 deletions(-)
>
...
> diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
> index 0ea128eeeab2..13f6ebdf379b 100644
> --- a/net/openvswitch/flow.c
> +++ b/net/openvswitch/flow.c
> @@ -302,24 +302,56 @@ static bool icmp6hdr_ok(struct sk_buff *skb)
>   sizeof(struct icmp6hdr));
>  }
>
> -static int parse_vlan(struct sk_buff *skb, struct sw_flow_key *key)
> +/**
> + * Parse vlan tag from vlan header.
> + * Returns ERROR on memory error.
> + * Returns 0 if it encounters a non-vlan or incomplete packet.
> + * Returns 1 after successfully parsing vlan tag.
> + */
> +static int parse_vlan_tag(struct sk_buff *skb, struct vlan_head *vlan)
>  {
> -   struct qtag_prefix {
> -   __be16 eth_type; /* ETH_P_8021Q */
> -   __be16 tci;
> -   };
> -   struct qtag_prefix *qp;
> +   struct vlan_head *qp = (struct vlan_head *)skb->data;
>
> -   if (unlikely(skb->len < sizeof(struct qtag_prefix) + sizeof(__be16)))
> +   if (likely(!eth_type_vlan(qp->tpid)))
> return 0;
>
> -   if (unlikely(!pskb_may_pull(skb, sizeof(struct qtag_prefix) +
> -sizeof(__be16
> +   if (unlikely(skb->len < sizeof(struct vlan_head) + sizeof(__be16)))
> +   return 0;
> +
> +   if (unlikely(!pskb_may_pull(skb, sizeof(struct vlan_head) +
> +sizeof(__be16
> return -ENOMEM;
>
pskb_may_pull() can change skb->data, so you need to refresh qp pointer.

> -   qp = (struct qtag_prefix *) skb->data;
> -   key->eth.tci = qp->tci | htons(VLAN_TAG_PRESENT);
> -   __skb_pull(skb, sizeof(struct qtag_prefix));
> +   vlan->tci = qp->tci | htons(VLAN_TAG_PRESENT);
> +   vlan->tpid = qp->tpid;
> +
> +   __skb_pull(skb, sizeof(struct vlan_head));
> +   return 1;
> +}
> +
...
...

> diff --git a/net/openvswitch/flow_netlink.c b/net/openvswitch/flow_netlink.c
> index c78a6a1476fb..fbe9e0e4792d 100644
> --- a/net/openvswitch/flow_netlink.c
> +++ b/net/openvswitch/flow_netlink.c
...

> +static int __parse_vlan_from_nlattrs(struct sw_flow_match *match,
> +u64 *key_attrs, bool inner,
> +const struct nlattr **a, bool is_mask,
> +bool log)
> +{
> +   int err;
> +   const struct nlattr *encap;
> +
> +   err = encode_vlan_from_nlattrs(match, a, is_mask, inner, log);
> +   if (err)
> +   return err;
> +
> +   *key_attrs &= ~(1 << OVS_KEY_ATTR_ENCAP);
> +
> +   /* Ensure that tci key attribute isn't
> +* overwritten by encapsulated customer tci.
> +* Ethertype is cleared because it is c_tpid.
> +*/
> +   *key_attrs &= ~(1 << OVS_KEY_ATTR_VLAN);
> +   *key_attrs &= ~(1 << OVS_KEY_ATTR_ETHERTYPE);
> +
> +   encap = a[OVS_KEY_ATTR_ENCAP];
> +
> +   if (is_mask)
> +   err = parse_flow_mask_nlattrs(encap, a, key_attrs, log);
> +   else
> +   err = parse_flow_nlattrs(encap, a, key_attrs, log);
> +
> +   return err;
> +}
> +
> +static int parse_vlan_from_nlattrs(struct sw_flow_match *match,
> +  u64 *key_attrs, bool *ie_valid,
> +  const struct nlattr **a, bool is_mask,
> +  bool log)
> +{
> +   int err;
> +
> +   err = __parse_vlan_from_nlattrs(match, key_attrs,
> +   false, a, is_mask, log);
> +   if (err)
> +   return err;
> +
> +   if (!is_mask) {
> +   if ((*key_attrs & (1 << OVS_KEY_ATTR_ETHERTYPE)) &&
> +   eth_type_vlan(nla_get_be16(a[OVS_KEY_ATTR_ETHERTYPE]))) {
> +
> +   if (!((*key_attrs & (1 << OVS_KEY_ATTR_VLAN)) &&
> + (*key_attrs & (1 << OVS_KEY_ATTR_ENCAP {
> +   OVS_NLERR(log, "Invalid Inner VLAN frame");
> +

Re: [PATCH net-next 2/3] smsc95xx: Add register define

2016-09-02 Thread Joe Perches

On Fri, 2016-09-02 at 20:34 +, woojung@microchip.com wrote:
> From: Woojung Huh 
> 
> Add STRAP_STATUS defines.
[]
> diff --git a/drivers/net/usb/smsc95xx.h b/drivers/net/usb/smsc95xx.h
[]
> @@ -144,6 +144,14 @@
>  
>  #define BURST_CAP(0x38)
>  
> +#define  STRAP_STATUS(0x3C)
> +#define  STRAP_STATUS_PWR_SEL_   (0x0020)
> +#define  STRAP_STATUS_AMDIX_EN_  (0x0010)
> +#define  STRAP_STATUS_PORT_SWAP_ (0x0008)
> +#define  STRAP_STATUS_EEP_SIZE_  (0x0004)
> +#define  STRAP_STATUS_RMT_WKP_   (0x0002)
> +#define  STRAP_STATUS_EEP_DISABLE_   (0x0001)

Using BIT would be more common.

Ending the #defines with an underscore is just odd
and unappealing.

[PATCH net-next 3/3] smsc95xx: Add mdix control via ethtool

2016-09-02 Thread Woojung.Huh

From: Woojung Huh 

Add mdix control through ethtool.

Signed-off-by: Woojung Huh 
---
 drivers/net/usb/smsc95xx.c | 109 +++--
 1 file changed, 106 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/smsc95xx.c b/drivers/net/usb/smsc95xx.c
index dc989a8..831aa33 100644
--- a/drivers/net/usb/smsc95xx.c
+++ b/drivers/net/usb/smsc95xx.c
@@ -33,7 +33,7 @@
 #include "smsc95xx.h"
 
 #define SMSC_CHIPNAME  "smsc95xx"
-#define SMSC_DRIVER_VERSION"1.0.4"
+#define SMSC_DRIVER_VERSION"1.0.5"
 #define HS_USB_PKT_SIZE(512)
 #define FS_USB_PKT_SIZE(64)
 #define DEFAULT_HS_BURST_CAP_SIZE  (16 * 1024 + 5 * HS_USB_PKT_SIZE)
@@ -64,6 +64,7 @@
 #define CARRIER_CHECK_DELAY (2 * HZ)
 
 struct smsc95xx_priv {
+   u32 chip_id;
u32 mac_cr;
u32 hash_hi;
u32 hash_lo;
@@ -71,6 +72,7 @@ struct smsc95xx_priv {
spinlock_t mac_cr_lock;
u8 features;
u8 suspend_flags;
+   u8 mdix_ctrl;
bool link_ok;
struct delayed_work carrier_check;
struct usbnet *dev;
@@ -782,14 +784,113 @@ static int smsc95xx_ethtool_set_wol(struct net_device 
*net,
return ret;
 }
 
+static int get_mdix_status(struct net_device *net)
+{
+   struct usbnet *dev = netdev_priv(net);
+   u32 val;
+   int buf;
+
+   buf = smsc95xx_mdio_read(dev->net, dev->mii.phy_id, SPECIAL_CTRL_STS);
+   if (buf & SPECIAL_CTRL_STS_OVRRD_AMDIX_) {
+   if (buf & SPECIAL_CTRL_STS_AMDIX_ENABLE_)
+   return ETH_TP_MDI_AUTO;
+   else if (buf & SPECIAL_CTRL_STS_AMDIX_STATE_)
+   return ETH_TP_MDI_X;
+   } else {
+   buf = smsc95xx_read_reg(dev, STRAP_STATUS, );
+   if (val & STRAP_STATUS_AMDIX_EN_)
+   return ETH_TP_MDI_AUTO;
+   }
+
+   return ETH_TP_MDI;
+}
+
+static void set_mdix_status(struct net_device *net, __u8 mdix_ctrl)
+{
+   struct usbnet *dev = netdev_priv(net);
+   struct smsc95xx_priv *pdata = (struct smsc95xx_priv *)(dev->data[0]);
+   int buf;
+
+   if ((pdata->chip_id == ID_REV_CHIP_ID_9500A_) ||
+   (pdata->chip_id == ID_REV_CHIP_ID_9530_) ||
+   (pdata->chip_id == ID_REV_CHIP_ID_89530_) ||
+   (pdata->chip_id == ID_REV_CHIP_ID_9730_)) {
+   /* Extend Manual AutoMDIX timer for 9500A/9500Ai */
+   buf = smsc95xx_mdio_read(dev->net, dev->mii.phy_id,
+PHY_EDPD_CONFIG);
+   buf |= PHY_EDPD_CONFIG_EXT_CROSSOVER_;
+   smsc95xx_mdio_write(dev->net, dev->mii.phy_id,
+   PHY_EDPD_CONFIG, buf);
+   }
+
+   if (mdix_ctrl == ETH_TP_MDI) {
+   buf = smsc95xx_mdio_read(dev->net, dev->mii.phy_id,
+SPECIAL_CTRL_STS);
+   buf |= SPECIAL_CTRL_STS_OVRRD_AMDIX_;
+   buf &= ~(SPECIAL_CTRL_STS_AMDIX_ENABLE_ |
+SPECIAL_CTRL_STS_AMDIX_STATE_);
+   smsc95xx_mdio_write(dev->net, dev->mii.phy_id,
+   SPECIAL_CTRL_STS, buf);
+   } else if (mdix_ctrl == ETH_TP_MDI_X) {
+   buf = smsc95xx_mdio_read(dev->net, dev->mii.phy_id,
+SPECIAL_CTRL_STS);
+   buf |= SPECIAL_CTRL_STS_OVRRD_AMDIX_;
+   buf &= ~(SPECIAL_CTRL_STS_AMDIX_ENABLE_ |
+SPECIAL_CTRL_STS_AMDIX_STATE_);
+   buf |= SPECIAL_CTRL_STS_AMDIX_STATE_;
+   smsc95xx_mdio_write(dev->net, dev->mii.phy_id,
+   SPECIAL_CTRL_STS, buf);
+   } else if (mdix_ctrl == ETH_TP_MDI_AUTO) {
+   buf = smsc95xx_mdio_read(dev->net, dev->mii.phy_id,
+SPECIAL_CTRL_STS);
+   buf &= ~SPECIAL_CTRL_STS_OVRRD_AMDIX_;
+   buf &= ~(SPECIAL_CTRL_STS_AMDIX_ENABLE_ |
+SPECIAL_CTRL_STS_AMDIX_STATE_);
+   buf |= SPECIAL_CTRL_STS_AMDIX_ENABLE_;
+   smsc95xx_mdio_write(dev->net, dev->mii.phy_id,
+   SPECIAL_CTRL_STS, buf);
+   }
+   pdata->mdix_ctrl = mdix_ctrl;
+}
+
+static int smsc95xx_get_settings(struct net_device *net,
+struct ethtool_cmd *cmd)
+{
+   struct usbnet *dev = netdev_priv(net);
+   struct smsc95xx_priv *pdata = (struct smsc95xx_priv *)(dev->data[0]);
+   int retval;
+
+   retval = usbnet_get_settings(net, cmd);
+
+   cmd->eth_tp_mdix = pdata->mdix_ctrl;
+   cmd->eth_tp_mdix_ctrl = pdata->mdix_ctrl;
+
+   return retval;
+}
+
+static int smsc95xx_set_settings(struct net_device *net,
+struct ethtool_cmd *cmd)
+{
+   struct usbnet *dev =

[PATCH net-next 1/3] smsc95xx: Add maintainer

2016-09-02 Thread Woojung.Huh

From: Woojung Huh 

Add Microchip Linux Driver Support as maintainer
because this driver is maintaining by Microchip.

Signed-off-by: Woojung Huh 
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 58b5029..324e3b8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12279,6 +12279,7 @@ F:  drivers/net/usb/smsc75xx.*
 
 USB SMSC95XX ETHERNET DRIVER
 M: Steve Glendinning 
+M: Microchip Linux Driver Support 
 L: netdev@vger.kernel.org
 S: Maintained
 F: drivers/net/usb/smsc95xx.*
-- 
2.7.4

[PATCH net-next 2/3] smsc95xx: Add register define

2016-09-02 Thread Woojung.Huh

From: Woojung Huh 

Add STRAP_STATUS defines.

Signed-off-by: Woojung Huh 
---
 drivers/net/usb/smsc95xx.h | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/usb/smsc95xx.h b/drivers/net/usb/smsc95xx.h
index 526faa0..29a4d9e 100644
--- a/drivers/net/usb/smsc95xx.h
+++ b/drivers/net/usb/smsc95xx.h
@@ -144,6 +144,14 @@
 
 #define BURST_CAP  (0x38)
 
+#defineSTRAP_STATUS(0x3C)
+#defineSTRAP_STATUS_PWR_SEL_   (0x0020)
+#defineSTRAP_STATUS_AMDIX_EN_  (0x0010)
+#defineSTRAP_STATUS_PORT_SWAP_ (0x0008)
+#defineSTRAP_STATUS_EEP_SIZE_  (0x0004)
+#defineSTRAP_STATUS_RMT_WKP_   (0x0002)
+#defineSTRAP_STATUS_EEP_DISABLE_   (0x0001)
+
 #define GPIO_WAKE  (0x64)
 
 #define INT_EP_CTL (0x68)
-- 
2.7.4

[PATCH net-next 0/3] smsc95xx: patch serises

2016-09-02 Thread Woojung.Huh

From: Woojung Huh 

Woojung Huh (3):
  Smsc95xx: Add maintainer
  smsc95xx: Add register define
  smsc95xx: Add mdix control via ethtool

 MAINTAINERS|   1 +
 drivers/net/usb/smsc95xx.c | 109 +++--
 drivers/net/usb/smsc95xx.h |   8 
 3 files changed, 115 insertions(+), 3 deletions(-)

-- 
2.7.4

Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'

2016-09-02 Thread Hannes Frederic Sowa

On 02.09.2016 20:13, Linus Torvalds wrote:
> 
> From: Linus Torvalds 
> Date: Thu, 1 Sep 2016 14:43:53 -0700
> Subject: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 
> 'bindlock'
> 
> Right now we use the 'readlock' both for protecting some of the af_unix
> IO path and for making the bind be single-threaded.
> 
> The two are independent, but using the same lock makes for a nasty
> deadlock due to ordering with regards to filesystem locking.  The bind
> locking would want to nest outside the VSF pathname locking, but the IO
> locking wants to nest inside some of those same locks.
> 
> We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix
> splice-bind deadlock") which moved the readlock inside the vfs locks,
> but that caused problems with overlayfs that will then call back into
> filesystem routines that take the lock in the wrong order anyway.
> 
> Splitting the locks means that we can go back to having the bind lock be
> the outermost lock, and we don't have any deadlocks with lock ordering.
> 
> Acked-by: Rainer Weikusat 
> Acked-by: Al Viro 
> Signed-off-by: Linus Torvalds 
> ---
> 
> This patch is really trivial, and I've tried to be careful and look at the 
> locking, but somebody who really knows the AF_UNIX code should definitely 
> take a second look.
> 
> Note that I did the revert (that re-introduces the original splice 
> deadlock) first, because that made the whole series much easier to 
> explain. Doing it in the other order made the revert nastier because this 
> patch obviously touches the same code that the revert in 1/2 does.
> 
> So this way the series ends up being "go back to the original code with 
> the original deadlock, and then fix that original deadlock by splitting 
> the bind lock".

Acked-by: Hannes Frederic Sowa

Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'

2016-09-02 Thread Hannes Frederic Sowa

On 02.09.2016 21:15, David Miller wrote:
> From: Linus Torvalds 
> Date: Fri, 2 Sep 2016 11:17:18 -0700
> 
>> Oh, this was missing a
>>
>>   Reported-and-tested-by: CAI Qian 
>>
>> who found the new deadlock.
>>
>> There's now *another* lockdep deadlock report by him, but that one has
>> nothing to do with networking.
>>
>> (And neither of these deadlocks will actually deadlock the machine in
>> practice, but you can trigger the lockdep reports with some odd splice
>> patterns and overlayfs use)
> 
> I read over this and can't find any problems.
> 
> The main thing I was concerned about was an I/O path that really
> expects the socket's hash not to change for whatever reason, but even
> all of the unix_find_other() calls are done outside of the mutex
> already.

Furthermore we don't allow rehashing, if the socket is "published" once,
it can only be destroyed but can't change identity during its lifetime.
IIRC this was another bug we had a while ago.

Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'

2016-09-02 Thread Linus Torvalds

On Fri, Sep 2, 2016 at 12:15 PM, David Miller  wrote:
>
> The main thing I was concerned about was an I/O path that really
> expects the socket's hash not to change for whatever reason, but even
> all of the unix_find_other() calls are done outside of the mutex
> already.

That's my read of it too. I did actually go through the effort on
trying to follow all the locking in there yesterday, so while I would
want an AF_UNIX person to double-check me, I think it's fine.

The real locking seem to be the u->lock spinlock (taken by
unix_state_lock() and friends).

The bindlock is only about serializing the binders, which do things
that can sleep (notably the mknod of the unix socket node in the
filesystem).

The one thing I looked at and wasn't sure about was
unix_stream_connect(), which creates a *new* unix domain socket and
binds it to the old one without holding the bindlock. Note that it
never did hold the lock, so that's not a new thing, but it worries me
a bit.

I *think* it's ok, because I think this all happens before that new
socket is actually reachable, so it cannot race against somebody else
doing a bind. And my patch doesn't actually change anything in this
area, so I'm not making it worse. But it was the one thing I reacted
to when going through the locking there.

(Well, not the "one thing". The other thing I reacted to was the
overall reaction that none of this is documented, and the locking was
clearly not very clear).

 Linus

Re: [PATCH iproute2 net-next] nstat: add sctp snmp support

2016-09-02 Thread Stephen Hemminger

On Fri, 2 Sep 2016 12:46:49 -0700
Stephen Hemminger  wrote:

> On Fri,  2 Sep 2016 15:12:38 +0800
> Hangbin Liu  wrote:
> 
> > SCTP module was not load by default. But this should be OK since we will not
> > load table if fdopen() failed.
> > 
> > Signed-off-by: Hangbin Liu   
> 
> This seems like a bad idea for the normal distro user.
> If they run nstat command the SCTP kernel module would get loaded even
> if never used.
> 
> Makes more sense not to display SCTP statistics if it isn't loaded.
> 

Never mind, opening the proc file won't load SCTP kernel module.

Re: [PATCH iproute2 net-next] nstat: add sctp snmp support

2016-09-02 Thread Stephen Hemminger

On Fri,  2 Sep 2016 15:12:38 +0800
Hangbin Liu  wrote:

> SCTP module was not load by default. But this should be OK since we will not
> load table if fdopen() failed.
> 
> Signed-off-by: Hangbin Liu 

This seems like a bad idea for the normal distro user.
If they run nstat command the SCTP kernel module would get loaded even
if never used.

Makes more sense not to display SCTP statistics if it isn't loaded.

Re: [RFC Patch net-next 0/6] net_sched: really switch to RCU for tc actions

2016-09-02 Thread Cong Wang

On Fri, Sep 2, 2016 at 12:09 AM, Jiri Pirko  wrote:
>
> I wonder, do you happen to have a very tiny narrow screen?

LOL, yeah, I should fix the indentation... ;)

Re: [net-next PATCH =v2] e1000: add initial XDP support

2016-09-02 Thread John Fastabend

On 16-09-01 11:50 PM, Jesper Dangaard Brouer wrote:
> On Thu, 01 Sep 2016 14:39:44 -0700
> John Fastabend  wrote:
> 
>> From: Alexei Starovoitov 
>>
>> This patch adds initial support for XDP on e1000 driver. Note e1000
>> driver does not support page recycling in general which could be
>> added as a further improvement. However XDP_DROP case will recycle.
>> XDP_TX and XDP_PASS do not support recycling yet.
>>
>> This patch includes the rcu_read_lock/rcu_read_unlock pair noted by
>> Brenden Blanco in another pending patch.
>>
>>   net/mlx4_en: protect ring->xdp_prog with rcu_read_lock
>>
>> I tested this patch running e1000 in a VM using KVM over a tap
>> device.
>>
>> CC: William Tu 
>> Signed-off-by: Alexei Starovoitov 
>> Signed-off-by: John Fastabend 
>> ---
>>  drivers/net/ethernet/intel/e1000/e1000.h  |2 
>>  drivers/net/ethernet/intel/e1000/e1000_main.c |  170 
>> +
>>  2 files changed, 169 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/intel/e1000/e1000.h 
>> b/drivers/net/ethernet/intel/e1000/e1000.h
>> index d7bdea7..5cf8a0a 100644
>> --- a/drivers/net/ethernet/intel/e1000/e1000.h
>> +++ b/drivers/net/ethernet/intel/e1000/e1000.h
>> @@ -150,6 +150,7 @@ struct e1000_adapter;
>>   */
>>  struct e1000_tx_buffer {
>>  struct sk_buff *skb;
>> +struct page *page;
>>  dma_addr_t dma;
>>  unsigned long time_stamp;
>>  u16 length;
>> @@ -279,6 +280,7 @@ struct e1000_adapter {
>>   struct e1000_rx_ring *rx_ring,
>>   int cleaned_count);
>>  struct e1000_rx_ring *rx_ring;  /* One per active queue */
>> +struct bpf_prog *prog;
> 
> The bpf_prog should be in the rx_ring structure.
> 

ok sure it helps I guess if you use e1000 as a template for implementing
XDP and logically makes a bit more sense. But it doesn't functionally
matter here.


>>  struct napi_struct napi;
>>  
>>  int num_tx_queues;

[...]

>> +static void e1000_xmit_raw_frame(struct e1000_rx_buffer *rx_buffer_info,
>> + unsigned int len,
>> + struct net_device *netdev,
>> + struct e1000_adapter *adapter)
>> +{
>> +struct netdev_queue *txq = netdev_get_tx_queue(netdev, 0);
>> +struct e1000_hw *hw = >hw;
>> +struct e1000_tx_ring *tx_ring;
>> +
>> +if (len > E1000_MAX_DATA_PER_TXD)
>> +return;
>> +
>> +/* e1000 only support a single txq at the moment so the queue is being
>> + * shared with stack. To support this requires locking to ensure the
>> + * stack and XDP are not running at the same time. Devices with
>> + * multiple queues should allocate a separate queue space.
>> + */
>> +HARD_TX_LOCK(netdev, txq, smp_processor_id());
>> +
>> +tx_ring = adapter->tx_ring;
>> +
>> +if (E1000_DESC_UNUSED(tx_ring) < 2)
>> +return;
>> +
>> +e1000_tx_map_rxpage(tx_ring, rx_buffer_info, len);
>> +
>> +e1000_tx_queue(adapter, tx_ring, 0/*tx_flags*/, 1);
>> +
>> +writel(tx_ring->next_to_use, hw->hw_addr + tx_ring->tdt);
>> +mmiowb();
>> +
>> +HARD_TX_UNLOCK(netdev, txq);
>> +}
> 
> Above is going to give really bad XDP_TX performance. Both locking and
> a HW TX tailptr pointer per TX packet, that is as bad as it gets.
> 

Yep.

> You might say this is just for testing my eBPF-XDP program. BUT people
> wanting to try XDP is going to start with this driver, and they will be
> disappointed and never return (and no they will not read the comment in
> the code).

hmm perhaps we should look at a vhost_net implementation for performance
setup. My gut feeling is vhost_net is a better target for performance.

> 
> It should be fairly easy to introduce a bulking/bundling XDP_TX
> facility into the TX-ring (taking HARD_TX_LOCK a single time), and then
> flush the TX-ring at the end of the loop (in e1000_clean_jumbo_rx_irq).
> All you need is an array/stack of RX *buffer_info ptrs being build up
> in the XDP_TX case. (Experiments show minimum bulking/array size should
> be 8).
> 
> If you want to get fancy, and save space in the bulking structure,
> then you can even just use the RX ring index "i" to describe which RX
> packets need to be XDP_TX'ed. (as the driver code "owns" this part of
> the ring, until updating rx_ring->next_to_clean).
> 

Sure I'll add this seems easy enough.

Re: [PATCH net 0/2] net: thunderx: Fixes for TSO offload issues

2016-09-02 Thread David Miller

From: sunil.kovv...@gmail.com
Date: Tue, 30 Aug 2016 11:36:25 +0530

> This patch series fixes couple of issues w.r.t HW TSO offload

Series applied, thanks.

Re: [PATCH] net/mlx4_en: protect ring->xdp_prog with rcu_read_lock

2016-09-02 Thread Tom Herbert

On Fri, Sep 2, 2016 at 11:13 AM, Brenden Blanco  wrote:
> On Mon, Aug 29, 2016 at 10:46:38AM -0700, Tom Herbert wrote:
> [...]
>> Brenden, tracking down how the structure is freed needed a few steps,
>> please make sure the RCU requirements are well documented. Also, I'm
> Really? It's just bpf_prog_put->call_rcu(__bpf_prog_put_rcu). I suppose
> what's missing is a general guideline for which functions new consumers
> of bpf should use, but I wouldn't trust myself to write such holistic
> documentation accurately (e.g. interacting with nmi probes and such).
>> still not a fan of using xchg to set the program, seems that a lock
>> could be used in that path.
> Where would such a lock go? Everything in mlx4/en_netdev.c relies on
> rtnl, which seems sufficient and obvious...adding some new field
> specific lock would be distracting and unneeded.

Just use the same mutex_lock that is already being taken in case when
priv->xdp_ring_num != xdp_ring_num. Then use normal rcu functions to
dereference and assign pointers.

Tom

>>
>> Thanks,
>> Tom

Re: [PATCH net v2 1/2] ipv6: add missing netconf notif when 'all' is updated

2016-09-02 Thread David Miller

From: Nicolas Dichtel 
Date: Tue, 30 Aug 2016 10:09:21 +0200

> The 'default' value was not advertised.
> 
> Fixes: f3a1bfb11ccb ("rtnl/ipv6: use netconf msg to advertise forwarding 
> status")
> Signed-off-by: Nicolas Dichtel 

Applied.

Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'

2016-09-02 Thread David Miller

From: Linus Torvalds 
Date: Fri, 2 Sep 2016 11:17:18 -0700

> Oh, this was missing a
> 
>   Reported-and-tested-by: CAI Qian 
> 
> who found the new deadlock.
> 
> There's now *another* lockdep deadlock report by him, but that one has
> nothing to do with networking.
> 
> (And neither of these deadlocks will actually deadlock the machine in
> practice, but you can trigger the lockdep reports with some odd splice
> patterns and overlayfs use)

I read over this and can't find any problems.

The main thing I was concerned about was an I/O path that really
expects the socket's hash not to change for whatever reason, but even
all of the unix_find_other() calls are done outside of the mutex
already.

Re: [PATCH net v2 2/2] netconf: add a notif when settings are created

2016-09-02 Thread David Miller

From: Nicolas Dichtel 
Date: Tue, 30 Aug 2016 10:09:22 +0200

> All changes are notified, but the initial state was missing.
> 
> Signed-off-by: Nicolas Dichtel 

Applied.

Re: Centralizing support for TCAM?

2016-09-02 Thread Andrew Lunn

> I need to get reasonable operations per second on the TCAM tables.
> Reasonable for me being somewhere in the 50k to 100k add/del/update
> commands per second. I'm hesitant to create more abstractions then
> are actually needed.

Hi John

That is an interesting requirement. Could you explain why?

The Marvell TCAM is accessed using MDIO. Maybe it can do 50
add/del/updates per second. For a WiFi access point, or cable modem,
even 50 per second seems ample, they get set at boot and never
changed.

Andrew

Re: [PATCH 0/4] SA11x0 Clocks and removal of Neponset SMC91x hack

2016-09-02 Thread Russell King - ARM Linux

On Thu, Sep 01, 2016 at 04:32:41PM -0700, David Miller wrote:
> From: Russell King - ARM Linux 
> Date: Tue, 30 Aug 2016 11:51:17 +0100
> 
> > Please ack these changes; due to the dependencies, I wish to merge
> > them through my tree.  Thanks.
> 
> Acked-by: David S. Miller 

Many thanks David, I'll add that only to the two SMC91x patches.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

Re: Centralizing support for TCAM?

2016-09-02 Thread John Fastabend

On 16-09-02 10:18 AM, Florian Fainelli wrote:
> Hi all,
> 

Hi Florian,

> (apologies for the long CC list and the fact that I can't type correctly
> email addresses)
> 

My favorite topic ;)

> While working on adding support for the Broadcom Ethernet switches
> Compact Field Processor (which is essentially a TCAM,
> action/policer/rate meter RAMs, 256 entries), I started working with the
> ethtool::rxnfc API which is actually kind of nice in that it fits nicely
> with my use simple case of being able to insert rules at a given or
> driver selected location and has a pretty good flow representation for
> common things you may match: TCP/UDP v4/v6 (not so much for non-IP, or
> L2 stuff though you can use the extension flow representation). It lacks
> support for more complex actions other than redirect to a particular
> port/queue.

When I was doing this for one of the products I work on I decided that
extending ethtool was likely not a good approach and building a netlink
interface would be a better choice. My reasons were mainly extending
ethtool is a bit painful to keep structure compatibility across versions
and I also had use cases that wanted to get notifications both made
easier when using netlink. However my netlink port+extensions were not
accepted and were called a "kernel bypass" and the general opinion was
that it was not going to be accepted upstream. Hence the 'tc' effort.

> 
> Now ethtool::rxnfc is one possible user, but tc and netfiler also are,
> more powerful and extensible, but since this is a resource constrained
> piece of hardware, and it would suck for people to have to implement
> these 3 APIs if we could come up with a central one that satisfies the
> superset offered by tc + netfilter. We can surely imagine an use case we

My opinion is that tc and netfilter are sufficiently different that
building a common layer is challenging and is actually more complex vs
just implementing two interfaces. Always happy to review code though.

There is also an already established packet flow through tc, netfilter,
fdb, l3 in linux that folks want to maintain. At the moment I just don't
see the need for a common layer IMO.

Also adding another layer of abstraction so we end up doing multiple
translations into and out of these layers adds overhead. Eventually
I need to get reasonable operations per second on the TCAM tables.
Reasonable for me being somewhere in the 50k to 100k add/del/update
commands per second. I'm hesitant to create more abstractions then
are actually needed.

> centralize the whole matching + action into a Domain Specific Language
> that we compiled into eBPF and then translate into whatever the HW
> understands, although that raises the question of where do we put the
> translation tool in user space or kernel space.

The eBPF to HW translation I started to look at but gave up. The issue
was the program space of eBPF is much larger than any traditional
parser, table hardware implementation can support so most programs get
rejected (obvious observation right?). I'm more inclined to build
hardware that can support eBPF vs restricting eBPF to fit into a
parser/table model.

Surely something like P4 (DSL) -> ebpf -> HW can constrain the ebpf
programs so they can be loaded without issues. This might be worth
while but mapping it onto 'tc' classifiers like cls_{u32|flower} is a
bit more straight forward.

> 
> So what's everybody's take on this?

Seems a good time to bring up my other issue. When I have a pipeline
with multiple TCAM tables I was trying to figure out how to abstract
that in Linux. Something like the following

TCAM -> exact match -> TCAM -> exact match

So for now I was thinking of lifting two netdevs into linux something
like, ethx-frontend, ethx-backend. Where rules added to the frontend
go into the front part of the pipeline and rules added to the backend
go into the second half of the pipeline.

It probably needs more thought.

> 
> Thanks!
> 

Not sure that helps but my suggestion is to see if the
cls_u32/cls_flower implementation that exists today solves at least
the TCAM entry problem. Note the "order" field in u32 allows you to
place rules in user specific order.

.John

[PATCH net-next v2 1/3] net: dsa: mv88e6xxx: fix module naming

2016-09-02 Thread Vivien Didelot

Since the mv88e6xxx.c file has been renamed, the driver compiled as a
module is called chip.ko instead of mv88e6xxx.ko. Fix this.

Fixes: fad09c73c270 ("net: dsa: mv88e6xxx: rename single-chip support")
Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/Makefile | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/Makefile 
b/drivers/net/dsa/mv88e6xxx/Makefile
index 6e29a75..e58b745 100644
--- a/drivers/net/dsa/mv88e6xxx/Makefile
+++ b/drivers/net/dsa/mv88e6xxx/Makefile
@@ -1 +1,2 @@
-obj-$(CONFIG_NET_DSA_MV88E6XXX) += chip.o
+obj-$(CONFIG_NET_DSA_MV88E6XXX) += mv88e6xxx.o
+mv88e6xxx-objs := chip.o
-- 
2.9.3

[PATCH net-next v2 0/3] net: dsa: mv88e6xxx: isolate Global2 support

2016-09-02 Thread Vivien Didelot

Registers of Marvell chips are organized in internal SMI devices.

One of them at address 0x1C is called Global2. It provides an extended
set of registers, used for interrupt control, EEPROM access, indirect
PHY access (to bypass the PHY Polling Unit) and cross-chip setup.

Most chips have it, but some others don't (older ones such as 6060).

Now that its related code is isolated in mv88e6xxx_g2_* functions, move
it to its own global2.c file, making most of its setup code static.

Then make its compilation optional, which allows to reduce the size of
the mv88e6xxx driver for devices such as home routers embedding Ethernet
chips without Global2 support.

It is present on most recent chips, thus enable its support by default.

Changes in v2: fail probe if GLOBAL2 is required but not enabled.

Vivien Didelot (3):
  net: dsa: mv88e6xxx: fix module naming
  net: dsa: mv88e6xxx: move Global2 code
  net: dsa: mv88e6xxx: make global2 code optional

 drivers/net/dsa/mv88e6xxx/Kconfig |  11 +
 drivers/net/dsa/mv88e6xxx/Makefile|   4 +-
 drivers/net/dsa/mv88e6xxx/chip.c  | 467 ++---
 drivers/net/dsa/mv88e6xxx/global2.c   | 471 ++
 drivers/net/dsa/mv88e6xxx/global2.h   |  88 +++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |   6 +
 6 files changed, 596 insertions(+), 451 deletions(-)
 create mode 100644 drivers/net/dsa/mv88e6xxx/global2.c
 create mode 100644 drivers/net/dsa/mv88e6xxx/global2.h

-- 
2.9.3

[PATCH net-next v2 3/3] net: dsa: mv88e6xxx: make global2 code optional

2016-09-02 Thread Vivien Didelot

Since not every chip has a Global2 set of registers, make its support
optional, in which case the related functions will return -EOPNOTSUPP.

This also allows to reduce the size of the mv88e6xxx driver for devices
such as home routers embedding Ethernet chips without Global2 support.

It is present on most recent chips, thus enable its support by default.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/Kconfig   | 11 +++
 drivers/net/dsa/mv88e6xxx/Makefile  |  2 +-
 drivers/net/dsa/mv88e6xxx/chip.c|  4 +++
 drivers/net/dsa/mv88e6xxx/global2.h | 58 +
 4 files changed, 74 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx/Kconfig 
b/drivers/net/dsa/mv88e6xxx/Kconfig
index ac77737..4866688 100644
--- a/drivers/net/dsa/mv88e6xxx/Kconfig
+++ b/drivers/net/dsa/mv88e6xxx/Kconfig
@@ -6,3 +6,14 @@ config NET_DSA_MV88E6XXX
help
  This driver adds support for most of the Marvell 88E6xxx models of
  Ethernet switch chips, except 88E6060.
+
+config NET_DSA_MV88E6XXX_GLOBAL2
+   bool "Switch Global 2 Registers support"
+   default y
+   depends on NET_DSA_MV88E6XXX
+   help
+ This registers set at internal SMI address 0x1C provides extended
+ features like EEPROM interface, trunking, cross-chip setup, etc.
+
+ It is required on most chips. If the chip you compile the support for
+ doesn't have such registers set, say N here. In doubt, say Y.
diff --git a/drivers/net/dsa/mv88e6xxx/Makefile 
b/drivers/net/dsa/mv88e6xxx/Makefile
index 5a42066..6971039 100644
--- a/drivers/net/dsa/mv88e6xxx/Makefile
+++ b/drivers/net/dsa/mv88e6xxx/Makefile
@@ -1,3 +1,3 @@
 obj-$(CONFIG_NET_DSA_MV88E6XXX) += mv88e6xxx.o
 mv88e6xxx-objs := chip.o
-mv88e6xxx-objs += global2.o
+mv88e6xxx-$(CONFIG_NET_DSA_MV88E6XXX_GLOBAL2) += global2.o
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 745e158..70a812d 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -3444,6 +3444,10 @@ static int mv88e6xxx_detect(struct mv88e6xxx_chip *chip)
/* Update the compatible info with the probed one */
chip->info = info;
 
+   err = mv88e6xxx_g2_require(chip);
+   if (err)
+   return err;
+
dev_info(chip->dev, "switch 0x%x detected: %s, revision %u\n",
 chip->info->prod_num, chip->info->name, rev);
 
diff --git a/drivers/net/dsa/mv88e6xxx/global2.h 
b/drivers/net/dsa/mv88e6xxx/global2.h
index bee1bc9..c4bb903 100644
--- a/drivers/net/dsa/mv88e6xxx/global2.h
+++ b/drivers/net/dsa/mv88e6xxx/global2.h
@@ -16,6 +16,13 @@
 
 #include "mv88e6xxx.h"
 
+#ifdef CONFIG_NET_DSA_MV88E6XXX_GLOBAL2
+
+static inline int mv88e6xxx_g2_require(struct mv88e6xxx_chip *chip)
+{
+   return 0;
+}
+
 int mv88e6xxx_g2_smi_phy_read(struct mv88e6xxx_chip *chip, int addr, int reg,
  u16 *val);
 int mv88e6xxx_g2_smi_phy_write(struct mv88e6xxx_chip *chip, int addr, int reg,
@@ -27,4 +34,55 @@ int mv88e6xxx_g2_set_eeprom16(struct mv88e6xxx_chip *chip,
  struct ethtool_eeprom *eeprom, u8 *data);
 int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip);
 
+#else /* !CONFIG_NET_DSA_MV88E6XXX_GLOBAL2 */
+
+static inline int mv88e6xxx_g2_require(struct mv88e6xxx_chip *chip)
+{
+   if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_GLOBAL2)) {
+   dev_err(chip->dev, "this chip requires 
CONFIG_NET_DSA_MV88E6XXX_GLOBAL2 enabled\n");
+   return -EOPNOTSUPP;
+   }
+
+   return 0;
+}
+
+static inline int mv88e6xxx_g2_smi_phy_read(struct mv88e6xxx_chip *chip,
+   int addr, int reg, u16 *val)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int mv88e6xxx_g2_smi_phy_write(struct mv88e6xxx_chip *chip,
+int addr, int reg, u16 val)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int mv88e6xxx_g2_set_switch_mac(struct mv88e6xxx_chip *chip,
+ u8 *addr)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int mv88e6xxx_g2_get_eeprom16(struct mv88e6xxx_chip *chip,
+   struct ethtool_eeprom *eeprom,
+   u8 *data)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int mv88e6xxx_g2_set_eeprom16(struct mv88e6xxx_chip *chip,
+   struct ethtool_eeprom *eeprom,
+   u8 *data)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int mv88e6xxx_g2_setup(struct mv88e6xxx_chip *chip)
+{
+   return -EOPNOTSUPP;
+}
+
+#endif /* CONFIG_NET_DSA_MV88E6XXX_GLOBAL2 */
+
 #endif /* _MV88E6XXX_GLOBAL2_H */
-- 
2.9.3

[PATCH net-next v2 2/3] net: dsa: mv88e6xxx: move Global2 code

2016-09-02 Thread Vivien Didelot

Marvell chips are composed of multiple SMI devices. One of them at
address 0x1C is called Global2. It provides an extended set of
registers, used for interrupt control, EEPROM access, indirect PHY
access (to bypass the PHY Polling Unit) and cross-chip related setup.

Most chips have it, but some others don't (older ones such as 6060).

Now that its related code is isolated in mv88e6xxx_g2_* functions, move
it to its own global2.c file, making most of its setup code static.
Document each registers in the meantime.

Its compilation can be later avoided for chips without such registers.

Signed-off-by: Vivien Didelot 
---
 drivers/net/dsa/mv88e6xxx/Makefile|   1 +
 drivers/net/dsa/mv88e6xxx/chip.c  | 463 +
 drivers/net/dsa/mv88e6xxx/global2.c   | 471 ++
 drivers/net/dsa/mv88e6xxx/global2.h   |  30 +++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |   6 +
 5 files changed, 521 insertions(+), 450 deletions(-)
 create mode 100644 drivers/net/dsa/mv88e6xxx/global2.c
 create mode 100644 drivers/net/dsa/mv88e6xxx/global2.h

diff --git a/drivers/net/dsa/mv88e6xxx/Makefile 
b/drivers/net/dsa/mv88e6xxx/Makefile
index e58b745..5a42066 100644
--- a/drivers/net/dsa/mv88e6xxx/Makefile
+++ b/drivers/net/dsa/mv88e6xxx/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_NET_DSA_MV88E6XXX) += mv88e6xxx.o
 mv88e6xxx-objs := chip.o
+mv88e6xxx-objs += global2.o
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index d6b0f78..745e158 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -29,7 +29,9 @@
 #include 
 #include 
 #include 
+
 #include "mv88e6xxx.h"
+#include "global2.h"
 
 static void assert_reg_lock(struct mv88e6xxx_chip *chip)
 {
@@ -182,8 +184,7 @@ static const struct mv88e6xxx_ops 
mv88e6xxx_smi_multi_chip_ops = {
.write = mv88e6xxx_smi_multi_chip_write,
 };
 
-static int mv88e6xxx_read(struct mv88e6xxx_chip *chip,
- int addr, int reg, u16 *val)
+int mv88e6xxx_read(struct mv88e6xxx_chip *chip, int addr, int reg, u16 *val)
 {
int err;
 
@@ -199,8 +200,7 @@ static int mv88e6xxx_read(struct mv88e6xxx_chip *chip,
return 0;
 }
 
-static int mv88e6xxx_write(struct mv88e6xxx_chip *chip,
-  int addr, int reg, u16 val)
+int mv88e6xxx_write(struct mv88e6xxx_chip *chip, int addr, int reg, u16 val)
 {
int err;
 
@@ -306,8 +306,7 @@ static int mv88e6xxx_serdes_write(struct mv88e6xxx_chip 
*chip, int reg, u16 val)
reg, val);
 }
 
-static int mv88e6xxx_wait(struct mv88e6xxx_chip *chip, int addr, int reg,
- u16 mask)
+int mv88e6xxx_wait(struct mv88e6xxx_chip *chip, int addr, int reg, u16 mask)
 {
int i;
 
@@ -330,8 +329,7 @@ static int mv88e6xxx_wait(struct mv88e6xxx_chip *chip, int 
addr, int reg,
 }
 
 /* Indirect write to single pointer-data register with an Update bit */
-static int mv88e6xxx_update(struct mv88e6xxx_chip *chip, int addr, int reg,
-   u16 update)
+int mv88e6xxx_update(struct mv88e6xxx_chip *chip, int addr, int reg, u16 
update)
 {
u16 val;
int err;
@@ -2878,330 +2876,6 @@ static int mv88e6xxx_g1_setup(struct mv88e6xxx_chip 
*chip)
return 0;
 }
 
-static int mv88e6xxx_g2_device_mapping_write(struct mv88e6xxx_chip *chip,
-int target, int port)
-{
-   u16 val = (target << 8) | (port & 0xf);
-
-   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_DEVICE_MAPPING, val);
-}
-
-static int mv88e6xxx_g2_set_device_mapping(struct mv88e6xxx_chip *chip)
-{
-   int target, port;
-   int err;
-
-   /* Initialize the routing port to the 32 possible target devices */
-   for (target = 0; target < 32; ++target) {
-   port = 0xf;
-
-   if (target < DSA_MAX_SWITCHES) {
-   port = chip->ds->rtable[target];
-   if (port == DSA_RTABLE_NONE)
-   port = 0xf;
-   }
-
-   err = mv88e6xxx_g2_device_mapping_write(chip, target, port);
-   if (err)
-   break;
-   }
-
-   return err;
-}
-
-static int mv88e6xxx_g2_trunk_mask_write(struct mv88e6xxx_chip *chip, int num,
-bool hask, u16 mask)
-{
-   const u16 port_mask = BIT(chip->info->num_ports) - 1;
-   u16 val = (num << 12) | (mask & port_mask);
-
-   if (hask)
-   val |= GLOBAL2_TRUNK_MASK_HASK;
-
-   return mv88e6xxx_update(chip, REG_GLOBAL2, GLOBAL2_TRUNK_MASK, val);
-}
-
-static int mv88e6xxx_g2_trunk_mapping_write(struct mv88e6xxx_chip *chip, int 
id,
-   u16 map)
-{
-   const u16 port_mask = BIT(chip->info->num_ports) - 1;
-   u16 val = (id << 11) | (map & port_mask);
-
-   return

Re: [PATCH net v2] l2tp: fix use-after-free during module unload

2016-09-02 Thread David Miller

From: Sabrina Dubroca 
Date: Fri,  2 Sep 2016 10:22:54 +0200

> Tunnel deletion is delayed by both a workqueue (l2tp_tunnel_delete -> wq
>  -> l2tp_tunnel_del_work) and RCU (sk_destruct -> RCU ->
> l2tp_tunnel_destruct).
> 
> By the time l2tp_tunnel_destruct() runs to destroy the tunnel and finish
> destroying the socket, the private data reserved via the net_generic
> mechanism has already been freed, but l2tp_tunnel_destruct() actually
> uses this data.
> 
> Make sure tunnel deletion for the netns has completed before returning
> from l2tp_exit_net() by first flushing the tunnel removal workqueue, and
> then waiting for RCU callbacks to complete.
> 
> Fixes: 167eb17e0b17 ("l2tp: create tunnel sockets in the right namespace")
> Signed-off-by: Sabrina Dubroca 
> ---
> v2: fix typo in commit message, spotted by Sergei Shtylyov

Applied.

ipv6: release dst in ping_v6_sendmsg

2016-09-02 Thread Dave Jones

Neither the failure or success paths of ping_v6_sendmsg release
the dst it acquires.  This leads to a flood of warnings from
"net/core/dst.c:288 dst_release" on older kernels that
don't have 8bf4ada2e21378816b28205427ee6b0e1ca4c5f1 backported.

That patch optimistically hoped this had been fixed post 3.10, but
it seems at least one case wasn't, where I've seen this triggered
a lot from machines doing unprivileged icmp sockets.

Cc: Martin Lau 
Signed-off-by: Dave Jones 

diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c
index 0900352c924c..0e983b694ee8 100644
--- a/net/ipv6/ping.c
+++ b/net/ipv6/ping.c
@@ -126,8 +126,10 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr 
*msg, size_t len)
rt = (struct rt6_info *) dst;
 
np = inet6_sk(sk);
-   if (!np)
-   return -EBADF;
+   if (!np) {
+   err = -EBADF;
+   goto dst_err_out;
+   }
 
if (!fl6.flowi6_oif && ipv6_addr_is_multicast())
fl6.flowi6_oif = np->mcast_oif;
@@ -163,6 +165,9 @@ static int ping_v6_sendmsg(struct sock *sk, struct msghdr 
*msg, size_t len)
}
release_sock(sk);
 
+dst_err_out:
+   dst_release(dst);
+
if (err)
return err;

Re: [PATCH v3 1/2] wcn36xx: Transition driver to SMD client

2016-09-02 Thread Kalle Valo

Bjorn Andersson  writes:

> On Fri 02 Sep 09:24 PDT 2016, Kalle Valo wrote:
>
>> Bjorn Andersson  writes:
>> 
>> > --- a/drivers/net/wireless/ath/wcn36xx/Kconfig
>> > +++ b/drivers/net/wireless/ath/wcn36xx/Kconfig
>> > @@ -1,6 +1,6 @@
>> >  config WCN36XX
>> >tristate "Qualcomm Atheros WCN3660/3680 support"
>> > -  depends on MAC80211 && HAS_DMA
>> > +  depends on MAC80211 && HAS_DMA && QCOM_SMD
>> 
>> While I had this patch on my pending branch I noticed that I can't
>> compile wcn36xx on x86 anymore. This is actually quite inconvenient, for
>> example then it's easy to miss mac80211 API changes etc. Is there any
>> way we could continue build testing wcn36xx on x86 (or all) platforms?
>> 
>
> Sorry about that, we should at least be able to COMPILE_TEST it in
> non-qcom builds. Thanks for letting me know.

Yeah, that would be better. Even though it's a bit shame that
COMPILE_TEST disables DEBUG_INFO (I use the same build also for
development) so I need to toggle it on and off whenever I need debug
symbols. Oh well...

> And the driver should also depend on QCOM_WCNSS_CTRL, in the normal
> case.
>
> I'll respin this, unless you prefer an incremental patch for those
> changes(?)

Yes, please respin.

>> Also what about older platforms? Earlier I used wcn36xx on my Nexus 4
>> with help of backports project. I can't do that anymore, right?
>> 
>
> This has been tested on 8064, 8974 and 8916. So your Nexus 4 is covered.
>
> Unfortunately I don't have a Nexus 4, but we currently have Nexus 7
> (the 8064 version), Xperia Z, Xperia Z1 and DB410c using this chip and
> all four has been tested with this code.

Actually I meant running wcn36xx on older kernels, where your QCOM_SMD
is not yet supported.

> I've staged the PIL/remoteproc (firmware "loader") driver for v4.9, so
> with this patch the only thing missing to fill in the dts files is one
> clock from the RPM.

Nice.

> JFYI, There seems to be some race in the removal path, which I will look
> into. But I would prefer if we could merge this to make the driver
> usable, and more accessible to work on.

Sure, a race like that isn't that big of a deal compared to the benefits
your work brings. But it's good to document knows regressions to the
commit log anyway so that others can be prepared if they test it.

-- 
Kalle Valo

Re: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'

2016-09-02 Thread Linus Torvalds

On Fri, Sep 2, 2016 at 11:13 AM, Linus Torvalds
 wrote:
>
> From: Linus Torvalds 
> Date: Thu, 1 Sep 2016 14:43:53 -0700
> Subject: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 
> 'bindlock'
>
> Right now we use the 'readlock' both for protecting some of the af_unix
> IO path and for making the bind be single-threaded.
>
> The two are independent, but using the same lock makes for a nasty
> deadlock due to ordering with regards to filesystem locking.  The bind
> locking would want to nest outside the VSF pathname locking, but the IO
> locking wants to nest inside some of those same locks.
>
> We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix
> splice-bind deadlock") which moved the readlock inside the vfs locks,
> but that caused problems with overlayfs that will then call back into
> filesystem routines that take the lock in the wrong order anyway.
>
> Splitting the locks means that we can go back to having the bind lock be
> the outermost lock, and we don't have any deadlocks with lock ordering.
>
> Acked-by: Rainer Weikusat 
> Acked-by: Al Viro 
> Signed-off-by: Linus Torvalds 

Oh, this was missing a

  Reported-and-tested-by: CAI Qian 

who found the new deadlock.

There's now *another* lockdep deadlock report by him, but that one has
nothing to do with networking.

(And neither of these deadlocks will actually deadlock the machine in
practice, but you can trigger the lockdep reports with some odd splice
patterns and overlayfs use)

 Linus

Re: [PATCH] net/mlx4_en: protect ring->xdp_prog with rcu_read_lock

2016-09-02 Thread Brenden Blanco

On Mon, Aug 29, 2016 at 10:46:38AM -0700, Tom Herbert wrote:
[...]
> Brenden, tracking down how the structure is freed needed a few steps,
> please make sure the RCU requirements are well documented. Also, I'm
Really? It's just bpf_prog_put->call_rcu(__bpf_prog_put_rcu). I suppose
what's missing is a general guideline for which functions new consumers
of bpf should use, but I wouldn't trust myself to write such holistic
documentation accurately (e.g. interacting with nmi probes and such).
> still not a fan of using xchg to set the program, seems that a lock
> could be used in that path.
Where would such a lock go? Everything in mlx4/en_netdev.c relies on
rtnl, which seems sufficient and obvious...adding some new field
specific lock would be distracting and unneeded.
> 
> Thanks,
> Tom

[PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 'bindlock'

2016-09-02 Thread Linus Torvalds

From: Linus Torvalds 
Date: Thu, 1 Sep 2016 14:43:53 -0700
Subject: [PATCH 2/2] af_unix: split 'u->readlock' into two: 'iolock' and 
'bindlock'

Right now we use the 'readlock' both for protecting some of the af_unix
IO path and for making the bind be single-threaded.

The two are independent, but using the same lock makes for a nasty
deadlock due to ordering with regards to filesystem locking.  The bind
locking would want to nest outside the VSF pathname locking, but the IO
locking wants to nest inside some of those same locks.

We tried to fix this earlier with commit c845acb324aa ("af_unix: Fix
splice-bind deadlock") which moved the readlock inside the vfs locks,
but that caused problems with overlayfs that will then call back into
filesystem routines that take the lock in the wrong order anyway.

Splitting the locks means that we can go back to having the bind lock be
the outermost lock, and we don't have any deadlocks with lock ordering.

Acked-by: Rainer Weikusat 
Acked-by: Al Viro 
Signed-off-by: Linus Torvalds 
---

This patch is really trivial, and I've tried to be careful and look at the 
locking, but somebody who really knows the AF_UNIX code should definitely 
take a second look.

Note that I did the revert (that re-introduces the original splice 
deadlock) first, because that made the whole series much easier to 
explain. Doing it in the other order made the revert nastier because this 
patch obviously touches the same code that the revert in 1/2 does.

So this way the series ends up being "go back to the original code with 
the original deadlock, and then fix that original deadlock by splitting 
the bind lock".

 include/net/af_unix.h |  2 +-
 net/unix/af_unix.c| 45 +++--
 2 files changed, 24 insertions(+), 23 deletions(-)

diff --git a/include/net/af_unix.h b/include/net/af_unix.h
index 9b4c418bebd8..fd60eccb59a6 100644
--- a/include/net/af_unix.h
+++ b/include/net/af_unix.h
@@ -52,7 +52,7 @@ struct unix_sock {
struct sock sk;
struct unix_address *addr;
struct path path;
-   struct mutexreadlock;
+   struct mutexiolock, bindlock;
struct sock *peer;
struct list_headlink;
atomic_long_t   inflight;
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 433ae1bbef97..8309687a56b0 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -661,11 +661,11 @@ static int unix_set_peek_off(struct sock *sk, int val)
 {
struct unix_sock *u = unix_sk(sk);

-   if (mutex_lock_interruptible(>readlock))
+   if (mutex_lock_interruptible(>iolock))
return -EINTR;

sk->sk_peek_off = val;
-   mutex_unlock(>readlock);
+   mutex_unlock(>iolock);

return 0;
 }
@@ -779,7 +779,8 @@ static struct sock *unix_create1(struct net *net, struct 
socket *sock, int kern)
spin_lock_init(>lock);
atomic_long_set(>inflight, 0);
INIT_LIST_HEAD(>link);
-   mutex_init(>readlock); /* single task reading lock */
+   mutex_init(>iolock); /* single task reading lock */
+   mutex_init(>bindlock); /* single task binding lock */
init_waitqueue_head(>peer_wait);
init_waitqueue_func_entry(>peer_wake, unix_dgram_peer_wake_relay);
unix_insert_socket(unix_sockets_unbound(sk), sk);
@@ -848,7 +849,7 @@ static int unix_autobind(struct socket *sock)
int err;
unsigned int retries = 0;

-   err = mutex_lock_interruptible(>readlock);
+   err = mutex_lock_interruptible(>bindlock);
if (err)
return err;

@@ -895,7 +896,7 @@ retry:
spin_unlock(_table_lock);
err = 0;

-out:   mutex_unlock(>readlock);
+out:   mutex_unlock(>bindlock);
return err;
 }

@@ -1009,7 +1010,7 @@ static int unix_bind(struct socket *sock, struct sockaddr 
*uaddr, int addr_len)
goto out;
addr_len = err;

-   err = mutex_lock_interruptible(>readlock);
+   err = mutex_lock_interruptible(>bindlock);
if (err)
goto out;

@@ -1063,7 +1064,7 @@ static int unix_bind(struct socket *sock, struct sockaddr 
*uaddr, int addr_len)
 out_unlock:
spin_unlock(_table_lock);
 out_up:
-   mutex_unlock(>readlock);
+   mutex_unlock(>bindlock);
 out:
return err;
 }
@@ -1955,17 +1956,17 @@ static ssize_t unix_stream_sendpage(struct socket 
*socket, struct page *page,
if (false) {
 alloc_skb:
unix_state_unlock(other);
-   mutex_unlock(_sk(other)->readlock);
+   mutex_unlock(_sk(other)->iolock);
newskb = sock_alloc_send_pskb(sk, 0, 0, flags & MSG_DONTWAIT,
  , 0);
if (!newskb)
goto err;
}

[PATCH 1/2] Revert "af_unix: Fix splice-bind deadlock"

2016-09-02 Thread Linus Torvalds


From: Linus Torvalds 
Date: Thu, 1 Sep 2016 14:56:49 -0700
Subject: [PATCH 1/2] Revert "af_unix: Fix splice-bind deadlock"

This reverts commit c845acb324aa85a39650a14e7696982ceea75dc1.

It turns out that it just replaces one deadlock with another one: we can
still get the wrong lock ordering with the readlock due to overlayfs
calling back into the filesystem layer and still taking the vfs locks
after the readlock.

The proper solution ends up being to just split the readlock into two
pieces: the bind lock (taken *outside* the vfs locks) and the IO lock
(taken *inside* the filesystem locks).  The two locks are independent
anyway.

Signed-off-by: Linus Torvalds 
---

This is not a completely clean revert, because other changes had happened 
in this area since that commit, but the conflicts were pretty trivial.

The next patch actually fixes the problem as described above ("proper 
solution").

Also, David, if you'd prefer I just apply these directly, you can just 
tell me so. But I really wanted some AF_UNIX people to look at the next 
patch regardless.

 net/unix/af_unix.c | 66 +-
 1 file changed, 26 insertions(+), 40 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index f1dffe84f0d5..433ae1bbef97 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -954,20 +954,32 @@ fail:
return NULL;
 }
 
-static int unix_mknod(struct dentry *dentry, const struct path *path, umode_t 
mode,
- struct path *res)
+static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
 {
-   int err;
+   struct dentry *dentry;
+   struct path path;
+   int err = 0;
+   /*
+* Get the parent directory, calculate the hash for last
+* component.
+*/
+   dentry = kern_path_create(AT_FDCWD, sun_path, , 0);
+   err = PTR_ERR(dentry);
+   if (IS_ERR(dentry))
+   return err;
 
-   err = security_path_mknod(path, dentry, mode, 0);
+   /*
+* All right, let's create it.
+*/
+   err = security_path_mknod(, dentry, mode, 0);
if (!err) {
-   err = vfs_mknod(d_inode(path->dentry), dentry, mode, 0);
+   err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
if (!err) {
-   res->mnt = mntget(path->mnt);
+   res->mnt = mntget(path.mnt);
res->dentry = dget(dentry);
}
}
-
+   done_path_create(, dentry);
return err;
 }
 
@@ -978,12 +990,10 @@ static int unix_bind(struct socket *sock, struct sockaddr 
*uaddr, int addr_len)
struct unix_sock *u = unix_sk(sk);
struct sockaddr_un *sunaddr = (struct sockaddr_un *)uaddr;
char *sun_path = sunaddr->sun_path;
-   int err, name_err;
+   int err;
unsigned int hash;
struct unix_address *addr;
struct hlist_head *list;
-   struct path path;
-   struct dentry *dentry;
 
err = -EINVAL;
if (sunaddr->sun_family != AF_UNIX)
@@ -999,34 +1009,14 @@ static int unix_bind(struct socket *sock, struct 
sockaddr *uaddr, int addr_len)
goto out;
addr_len = err;
 
-   name_err = 0;
-   dentry = NULL;
-   if (sun_path[0]) {
-   /* Get the parent directory, calculate the hash for last
-* component.
-*/
-   dentry = kern_path_create(AT_FDCWD, sun_path, , 0);
-
-   if (IS_ERR(dentry)) {
-   /* delay report until after 'already bound' check */
-   name_err = PTR_ERR(dentry);
-   dentry = NULL;
-   }
-   }
-
err = mutex_lock_interruptible(>readlock);
if (err)
-   goto out_path;
+   goto out;
 
err = -EINVAL;
if (u->addr)
goto out_up;
 
-   if (name_err) {
-   err = name_err == -EEXIST ? -EADDRINUSE : name_err;
-   goto out_up;
-   }
-
err = -ENOMEM;
addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL);
if (!addr)
@@ -1037,11 +1027,11 @@ static int unix_bind(struct socket *sock, struct 
sockaddr *uaddr, int addr_len)
addr->hash = hash ^ sk->sk_type;
atomic_set(>refcnt, 1);
 
-   if (dentry) {
-   struct path u_path;
+   if (sun_path[0]) {
+   struct path path;
umode_t mode = S_IFSOCK |
   (SOCK_INODE(sock)->i_mode & ~current_umask());
-   err = unix_mknod(dentry, , mode, _path);
+   err = unix_mknod(sun_path, mode, );
if (err) {
if (err == -EEXIST)
err = -EADDRINUSE;
@@ -1049,9 +1039,9 @@ static int unix_bind(struct socket *sock, struct sockaddr 
*uaddr, int

Re: [PATCH] net/mlx4_en: protect ring->xdp_prog with rcu_read_lock

2016-09-02 Thread Brenden Blanco

On Fri, Sep 02, 2016 at 01:59:40AM +0300, Saeed Mahameed wrote:
> On Wed, Aug 31, 2016 at 4:50 AM, Brenden Blanco  wrote:
> > On Tue, Aug 30, 2016 at 12:35:58PM +0300, Saeed Mahameed wrote:
[...]
> >> Sorry folks I am with Tariq on this, you can't just add a single
> >> instruction which is only valid/needed for 1% of the use cases
> >> to the driver's general data path, even if it was as cheap as one cpu 
> >> cycle!
> > How about 0?
> >
> > $ diff mlx4_en.ko.norcu.s mlx4_en.ko.rcu.s | wc -l
> > 0
> >
> 
> Well, If you put it this way, it seems OK then.
> 
> Anyway I would add a friendly comment beside the rcu_read_lock that
> "this is needed to protect
> access to ring->xdp_prog".

Thanks, I will go ahead with this then.

> 
> >>
> >> Let me try to suggest something:
> >> instead of taking the rcu_read_lock for the whole
> >> mlx4_en_process_rx_cq, we can minimize to XDP code path only
> >> by double checking xdp_prog (non-protected check followed by a
> >> protected check inside mlx4 XDP critical path).
> >>
> >> i.e instead of:
> >>
> >> rcu_read_lock();
> >>
> >> xdp_prog = ring->xdp_prog;
> >>
> >> //__Do lots of non-XDP related stuff__
> >>
> >> if (xdp_prog) {
> >> //Do XDP magic ..
> >> }
> >> //__Do more of non-XDP related stuff__
> >>
> >> rcu_read_unlock();
> >>
> >>
> >> We can minimize it to XDP critical path only:
> >>
> >> //Non protected xdp_prog dereference.
> >> if (xdp_prog) {
> >>  rcu_read_lock();
> >>  //Protected dereference to ring->xdp_prog
> >>  xdp_prog = ring->xdp_prog;
> >>  if(unlikely(!xdp_prg)) goto unlock;
> >
> > The addition of this branch and extra deref is now slowing down the xdp
> > path compared to the current proposal.
> >
> 
> Yep, but this is an unlikely condition and the critical code here is
> much smaller and it is more clear that the rcu_read_lock here meant to
> protect the ring->xdp_prog under this small xdp critical section in
> comparison to your patch where it is held across the whole RX
> function.

It's really an improper use of RCU though.

RCU is meant to provide correctness without sacrificing any performance
in the fastpath. It is designed to avoid having to double-dereference
and other such tricks, so shouldn't we use it how it was designed?
Having a larger scoped rcu_read_lock doesn't hurt anybody here, but the
extra memory reads certainly _does_ impact the XDP path, which folks are
going to start relying on to be performant. Let's not start chipping
away at that.

Re: [PATCH net-next] switchdev: Fix return value of switchdev_port_fdb_dump().

2016-09-02 Thread David Miller

From: Rami Rosen 
Date: Fri,  2 Sep 2016 14:11:57 +0300

> This patch fixes the retun value of switchdev_port_fdb_dump() when
> CONFIG_NET_SWITCHDEV is not set. This avoids getting "warning: return makes
> integer from pointer without a cast [-Wint-conversion]" when building
> when CONFIG_NET_SWITCHDEV is not set under several compiler versions.
> This warning is due to commit d297653dd6f07afbe7e6c702a4bcd7615680002e
> ("rtnetlink: fdb dump: optimize by saving last interface markers").
> 
> Signed-off-by: Rami Rosen 

Applied, thanks.

Re: [PATCH RESEND net-next 13/15] smc: receive data from RMBE

2016-09-02 Thread David Miller

From: Ursula Braun 
Date: Fri, 2 Sep 2016 15:05:01 +0200

> Understood, I wrongly used xchg() for atomicity. I now realize that I
> would need cursor locking for 32-bit architectures - something I would
> like to defer. Thus I would like to come up with V2 of SMC-R with
> builds restricted to 64-bit architectures only, and thus no usage of
> xchg() anymore.

Please don't restrict the driver build to 64-bit

Re: [PATCH v3 net-next 0/6] perf, bpf: add support for bpf in sw/hw perf_events

2016-09-02 Thread David Miller

From: Alexei Starovoitov 
Date: Thu, 1 Sep 2016 18:37:20 -0700

> Hi Peter, Dave,
> 
> this patch set is a follow up to the discussion:
> https://lkml.kernel.org/r/20160804142853.GO6862%20()%20twins%20!%20programming%20!%20kicks-ass%20!%20net
> It turned out to be simpler than what we discussed.
> 
> Patches 1-3 is bpf-side prep for the main patch 4
> that adds bpf program as an overflow_handler to sw and hw perf_events.
> 
> Patches 5 and 6 are examples from myself and Brendan.
> 
> Peter,
> to implement your suggestion to add ifdef CONFIG_BPF_SYSCALL
> inside struct perf_event, I had to shuffle ifdefs in events/core.c
> Please double check whether that is what you wanted to see.
> 
> v2->v3: fixed few more minor issues
> v1->v2: fixed issues spotted by Peter and Daniel.

Series applied, thanks.

Re: possible circular locking dependency detected

2016-09-02 Thread Rainer Weikusat

Linus Torvalds  writes:
> On Fri, Sep 2, 2016 at 10:02 AM, Al Viro  wrote:
>>
>> It's very much _not_ just overlayfs being pathological - that it certainly 
>> is,
>> but the problem is much wider.
>
> Al, can you take a look at my two patches, and see if you agree that
> they fix it, though?

The original deadlock occurred because of some code path locking the
superblock followed by trying to acquire the af_unix readlock while
unix_bind did the same in the opposite order (by doing kern_path_create
with the readlock held). If unix_bind doesn't share a lock with the
receive routines anymore, this obviously can't happen anymore.

The other problem situation is one where a lock on someting can be
acquired both by kern_path_create and a mknod operation and the readlock
is taken in between. Because that sits in between the kern_path_create
and the mknod, it can block the thread which got a certain lock via
kern_path_create because the one which is about to try to acquire it via
mknod got the readlock first. This obviously can't happen anymore the when the
original 'acquire readlock (now bindlock) first' is restored.

Re: possible circular locking dependency detected

2016-09-02 Thread Al Viro

On Fri, Sep 02, 2016 at 06:40:59PM +0100, Rainer Weikusat wrote:

> The original deadlock occurred because of some code path locking the
> superblock followed by trying to acquire the af_unix readlock while

Not even that - one code path takes ->readlock under pipe lock, while
another takes pipe lock under sb_start_write...

Re: possible circular locking dependency detected

2016-09-02 Thread Al Viro

On Fri, Sep 02, 2016 at 10:12:08AM -0700, Linus Torvalds wrote:
> On Fri, Sep 2, 2016 at 10:02 AM, Al Viro  wrote:
> >
> > It's very much _not_ just overlayfs being pathological - that it certainly 
> > is,
> > but the problem is much wider.
> 
> Al, can you take a look at my two patches, and see if you agree that
> they fix it, though?

AFAICS, they should.  Locking is obviously saner that way and AFAICS the
rest is absolutely straightforward.

Acked-by: Al Viro 

> Of course, we now have *another* splice deadlock. That pipe inode is
> nasty, it's very easy to deadlock on it in subtle ways.

I'm still digging through iomap.c, but that's better taken to another branch
of this thread...

Re: [PATCH] net/mlx4_en: protect ring->xdp_prog with rcu_read_lock

2016-09-02 Thread Brenden Blanco

On Thu, Sep 01, 2016 at 04:30:28PM -0700, Tom Herbert wrote:
[...]
> > Yep, but this is an unlikely condition and the critical code here is
> > much smaller and it is more clear that the rcu_read_lock here meant to
> > protect the ring->xdp_prog under this small xdp critical section in
> > comparison to your patch where it is held across the whole RX
> > function.
> 
> Note that there is already an rcu_read_lock potentially per packet
> buried in the function, if the whole function is under rcu_read_lock
> then that can be removed.
Yes I was aware of that, I had left it as-is since: 1. it seemed to be
in an exception path and less performance sensitive to nested calls, and
2. in case some future developer removed the top-level rcu_read_lock,
the finer-grained one would have been unprotected if not code reviewed
carefully.

I'll instead add a note at the top pointing out the dual need for the
lock, to address both yours and Saeed's comments.

As a side note, when considering the idea of moving the rcu_read_lock to
a more generic location (napi), I had toyed with the idea of
benchmarking to see if removing the actually-fast-path use of
rcu_read_lock in netif_receive_skb_internal could have any performance
benefit for the universal use case (non-xdp). However, that seems
completely out of scope at the moment, and only beneficial for
non-standard (IMO) .configs, besides being much harder to review. It was
showing up in perf at about 1-2% overhead in preempt=y kernels.
> 
> Tom

[PATCH] qed: Remove OOM messages

2016-09-02 Thread Joe Perches

These messages are unnecessary as OOM allocation failures already do
a dump_stack() giving more or less the same information.

$ size drivers/net/ethernet/qlogic/qed/built-in.o* (defconfig x86-64)
   textdata bss dec hex filename
 126849   27968   32800  187617   2dce1 
drivers/net/ethernet/qlogic/qed/built-in.o.new
 131506   27968   32800  192274   2ef12 
drivers/net/ethernet/qlogic/qed/built-in.o.old

Miscellanea:

o Change allocs to the generally preferred forms where possible.

Signed-off-by: Joe Perches 
---
 drivers/net/ethernet/qlogic/qed/qed_cxt.c  | 20 +++--
 drivers/net/ethernet/qlogic/qed/qed_dcbx.c | 13 ++
 drivers/net/ethernet/qlogic/qed/qed_dev.c  | 57 +++---
 drivers/net/ethernet/qlogic/qed/qed_hw.c   | 12 ++
 drivers/net/ethernet/qlogic/qed/qed_init_ops.c |  4 +-
 drivers/net/ethernet/qlogic/qed/qed_int.c  | 26 
 drivers/net/ethernet/qlogic/qed/qed_main.c |  4 +-
 drivers/net/ethernet/qlogic/qed/qed_mcp.c  |  1 -
 drivers/net/ethernet/qlogic/qed/qed_spq.c  | 31 --
 drivers/net/ethernet/qlogic/qed/qed_sriov.c|  9 ++--
 drivers/net/ethernet/qlogic/qed/qed_vf.c   | 14 ++-
 11 files changed, 49 insertions(+), 142 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c 
b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
index 1c35f37..e61185a 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_cxt.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
@@ -795,10 +795,9 @@ static int qed_cxt_src_t2_alloc(struct qed_hwfn *p_hwfn)
p_mngr->t2_num_pages = DIV_ROUND_UP(total_size, psz);
 
/* allocate t2 */
-   p_mngr->t2 = kzalloc(p_mngr->t2_num_pages * sizeof(struct qed_dma_mem),
+   p_mngr->t2 = kcalloc(p_mngr->t2_num_pages, sizeof(struct qed_dma_mem),
 GFP_KERNEL);
if (!p_mngr->t2) {
-   DP_NOTICE(p_hwfn, "Failed to allocate t2 table\n");
rc = -ENOMEM;
goto t2_fail;
}
@@ -963,7 +962,6 @@ static int qed_ilt_shadow_alloc(struct qed_hwfn *p_hwfn)
p_mngr->ilt_shadow = kcalloc(size, sizeof(struct qed_dma_mem),
 GFP_KERNEL);
if (!p_mngr->ilt_shadow) {
-   DP_NOTICE(p_hwfn, "Failed to allocate ilt shadow table\n");
rc = -ENOMEM;
goto ilt_shadow_fail;
}
@@ -1056,10 +1054,8 @@ int qed_cxt_mngr_alloc(struct qed_hwfn *p_hwfn)
u32 i;
 
p_mngr = kzalloc(sizeof(*p_mngr), GFP_KERNEL);
-   if (!p_mngr) {
-   DP_NOTICE(p_hwfn, "Failed to allocate `struct qed_cxt_mngr'\n");
+   if (!p_mngr)
return -ENOMEM;
-   }
 
/* Initialize ILT client registers */
clients = p_mngr->clients;
@@ -,24 +1107,18 @@ int qed_cxt_tables_alloc(struct qed_hwfn *p_hwfn)
 
/* Allocate the ILT shadow table */
rc = qed_ilt_shadow_alloc(p_hwfn);
-   if (rc) {
-   DP_NOTICE(p_hwfn, "Failed to allocate ilt memory\n");
+   if (rc)
goto tables_alloc_fail;
-   }
 
/* Allocate the T2  table */
rc = qed_cxt_src_t2_alloc(p_hwfn);
-   if (rc) {
-   DP_NOTICE(p_hwfn, "Failed to allocate T2 memory\n");
+   if (rc)
goto tables_alloc_fail;
-   }
 
/* Allocate and initialize the acquired cids bitmaps */
rc = qed_cid_map_alloc(p_hwfn);
-   if (rc) {
-   DP_NOTICE(p_hwfn, "Failed to allocate cid maps\n");
+   if (rc)
goto tables_alloc_fail;
-   }
 
return 0;
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c 
b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
index 3656d2f..54a7899 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_dcbx.c
@@ -875,11 +875,8 @@ int qed_dcbx_info_alloc(struct qed_hwfn *p_hwfn)
int rc = 0;
 
p_hwfn->p_dcbx_info = kzalloc(sizeof(*p_hwfn->p_dcbx_info), GFP_KERNEL);
-   if (!p_hwfn->p_dcbx_info) {
-   DP_NOTICE(p_hwfn,
- "Failed to allocate 'struct qed_dcbx_info'\n");
+   if (!p_hwfn->p_dcbx_info)
rc = -ENOMEM;
-   }
 
return rc;
 }
@@ -1190,10 +1187,8 @@ int qed_dcbx_get_config_params(struct qed_hwfn *p_hwfn,
}
 
dcbx_info = kzalloc(sizeof(*dcbx_info), GFP_KERNEL);
-   if (!dcbx_info) {
-   DP_ERR(p_hwfn, "Failed to allocate struct qed_dcbx_info\n");
+   if (!dcbx_info)
return -ENOMEM;
-   }
 
rc = qed_dcbx_query_params(p_hwfn, dcbx_info, QED_DCBX_OPERATIONAL_MIB);
if (rc) {
@@ -1227,10 +1222,8 @@ static struct qed_dcbx_get *qed_dcbnl_get_dcbx(struct 
qed_hwfn *hwfn,
struct qed_dcbx_get *dcbx_info;
 
dcbx_info = kzalloc(sizeof(*dcbx_info), GFP_KERNEL);
-   if (!dcbx_info) {
-   DP_ERR(hwfn->cdev, "Failed to

Re: [PATCH] qed: fix kzalloc-simple.cocci warnings

2016-09-02 Thread Joe Perches

On Thu, 2016-09-01 at 07:37 +, Yuval Mintz wrote:
> > drivers/net/ethernet/qlogic/qed/qed_dcbx.c:1230:13-20: WARNING: kzalloc
> > should be used for dcbx_info, instead of kmalloc/memset
> > drivers/net/ethernet/qlogic/qed/qed_dcbx.c:1192:13-20: WARNING: kzalloc
> > should be used for dcbx_info, instead of kmalloc/memset
> > 
> >  Use kzalloc rather than kmalloc followed by memset with 0
[]
> One question the automated script -
> Can't it [relative] easily be upgraded to also have 'Fixes:' as part
> of its message?

It's really not a "fix" as it has no real effect on behavior.

The code is perfectly fine as is really.

It is a code size reduction though.

Another thing with a behavior change and one that would also
reduce code size would be to remove the unnecessary OOM
messages after the allocs.

I'll send a patch for that.

Re: [PATCH net-next 3/3] net: dsa: mv88e6xxx: make global2 code optional

2016-09-02 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

>> I agree. Does the following snippet looks OK?
>> 
>> 
>> #ifndef CONFIG_NET_DSA_MV88E6XXX_GLOBAL2
>> if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_GLOBAL2)) {
>> dev_err(chip->dev, "Missing support for Global 2 
>> registers\n");
>
> I would include the name of the option which needs enabling. Also it
> is not really missing. It has not been enabled.
>
> "The required compile time options needed to support this switch have
> not been enabled. Please enable: CONFIG_NET_DSA_MV88E6XXX_GLOBAL2"

That is much better indeed, respining.

Thanks!

Vivien

Re: [Patch net-next 2/2] netns: avoid disabling irq for netns id

2016-09-02 Thread Cong Wang

On Fri, Sep 2, 2016 at 9:39 AM, Cong Wang  wrote:
> On Fri, Sep 2, 2016 at 1:12 AM, Nicolas Dichtel
>  wrote:
>> Le 02/09/2016 à 06:53, Cong Wang a écrit :
>>> We never read or change netns id in hardirq context,
>>> the only place we read netns id in softirq context
>>> is in vxlan_xmit(). So, it should be enough to just
>>> disable BH.
>>
>> Are you sure? Did you audit all part of the code?
>
> I did audit all the callers, and I didn't find any of them in IRQ context.
>
>> peernet2id() is called from netlink core system (do_one_broadcast()). Are you
>> sure that no driver call this function from an hard irq context?
>
> I audit all callers of netlink_broadcast(), and I don't see any of
> them in hardirq context.

Note, you can rule out most of them by checking GFP_KERNEL,
which indicates a process context. ;) For GFP_ATOMIC cases,
I don't see any of them in hardirq context either, but I am definitely
not familiar with drivers like infiniband.

Re: [PATCH net-next 3/3] net: dsa: mv88e6xxx: make global2 code optional

2016-09-02 Thread Andrew Lunn

> I agree. Does the following snippet looks OK?
> 
> 
> #ifndef CONFIG_NET_DSA_MV88E6XXX_GLOBAL2
> if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_GLOBAL2)) {
> dev_err(chip->dev, "Missing support for Global 2 
> registers\n");

I would include the name of the option which needs enabling. Also it
is not really missing. It has not been enabled.

"The required compile time options needed to support this switch have
not been enabled. Please enable: CONFIG_NET_DSA_MV88E6XXX_GLOBAL2"

  Andrew

Centralizing support for TCAM?

2016-09-02 Thread Florian Fainelli

Hi all,

(apologies for the long CC list and the fact that I can't type correctly
email addresses)

While working on adding support for the Broadcom Ethernet switches
Compact Field Processor (which is essentially a TCAM,
action/policer/rate meter RAMs, 256 entries), I started working with the
ethtool::rxnfc API which is actually kind of nice in that it fits nicely
with my use simple case of being able to insert rules at a given or
driver selected location and has a pretty good flow representation for
common things you may match: TCP/UDP v4/v6 (not so much for non-IP, or
L2 stuff though you can use the extension flow representation). It lacks
support for more complex actions other than redirect to a particular
port/queue.

Now ethtool::rxnfc is one possible user, but tc and netfiler also are,
more powerful and extensible, but since this is a resource constrained
piece of hardware, and it would suck for people to have to implement
these 3 APIs if we could come up with a central one that satisfies the
superset offered by tc + netfilter. We can surely imagine an use case we
centralize the whole matching + action into a Domain Specific Language
that we compiled into eBPF and then translate into whatever the HW
understands, although that raises the question of where do we put the
translation tool in user space or kernel space.

So what's everybody's take on this?

Thanks!
-- 
Florian

Re: [PATCH net] bonding: Fix bonding crash

2016-09-02 Thread Jiri Pirko

Fri, Sep 02, 2016 at 07:08:54PM CEST, mahe...@google.com wrote:
>On Fri, Sep 2, 2016 at 9:37 AM, Eric Dumazet  wrote:
>> On Fri, 2016-09-02 at 18:25 +0200, Jiri Pirko wrote:
>>
>>> Understand the reason. All I say is we tried hard (I surely did) in
>>> the past to remove bonding specific quirks and now we are adding one.
>>> And the simple reason is that the code is such a mess.
>>>
>I would not qualify this as bonding specific change! The check of you can
>really use the interface is simple enough, and can be used by other virtual
>drivers (e.g. macvlan, ipvlan, vrf etc.). Waiting until
>rx_handler_register() can be
>called in your code and then try to do rollback is bad. Also doing
>register() early
>has it's own consequences. On contrary, I would argue that this is much 
>cleaner.

Fair enough. Would be great to do this in all other rx_handler users so
we are consistent. Thanks!


>
>Agreed that bonding-driver has become a monster but that doesn't mean we
>should not fix issues. This argument would be valid when we are
>dealing with last
>couple installations / use cases of bonding and everyone else has moved to
>alternatives. Unfortunately we are not there yet :(
>
>>> Just use team instead and you'll be fine. You can "google" it :)
>>
>> Sure, please join _our_ team and make all the needed changes in user
>> land.
>>
>:) Cannot put it more eloquently than Eric did but we would (in
>theory) love to move
>to team-driver, but logistics don't support this (yet!).
>
>> The kernel part of it is epsilon compared to all the control part (ie
>> talk to the various Google switches.), and monitoring.
>>
>> I am fine to leave the bug in upstream bonding driver, if you really
>> want to force people out of bonding land !
>>
>> Here an automated test simply used macvlan and did not remove the
>> macvlan before enslaving the netdev again in a bonding, we already fixed
>> the test, because it is faster than deploying new kernels anyway.
>>
>>
>>

Re: [PATCH net-next 3/3] net: dsa: mv88e6xxx: make global2 code optional

2016-09-02 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

>> What do you think?
>
> I think the probe() needs to fail with a very obvious error message
> saying you need to recompile your kernel with option XYZ enabled in
> order to support this switch, when the optional stuff is not
> optional...

I agree. Does the following snippet looks OK?


#ifndef CONFIG_NET_DSA_MV88E6XXX_GLOBAL2
if (mv88e6xxx_has(chip, MV88E6XXX_FLAG_GLOBAL2)) {
dev_err(chip->dev, "Missing support for Global 2 
registers\n");
return -EOPNOTSUPP;
}
#endif /* CONFIG_NET_DSA_MV88E6XXX_GLOBAL2 */


Thanks,

Vivien

Re: possible circular locking dependency detected

2016-09-02 Thread Linus Torvalds

On Fri, Sep 2, 2016 at 10:02 AM, Al Viro  wrote:
>
> It's very much _not_ just overlayfs being pathological - that it certainly is,
> but the problem is much wider.

Al, can you take a look at my two patches, and see if you agree that
they fix it, though?

Of course, we now have *another* splice deadlock. That pipe inode is
nasty, it's very easy to deadlock on it in subtle ways.

 Linus

Re: possible circular locking dependency detected

2016-09-02 Thread Linus Torvalds

On Fri, Sep 2, 2016 at 8:51 AM, CAI Qian  wrote:
>
> Actually, I took it back, and now spice seems start to deadlock using the 
> reproducer,
>
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/splice/splice01.c

This is a different deadlock, though. This is a deadlock due to mixed
lock ordering between the pipe mutex, the XFS "mr_lock", and the inode
mutex.

If I read the lockdep trace correctly, we have:

Normally we have write doing inode->i_mutex -> i_iolock.mr_lock fro
the regular write path.

But the normal splice "write()" case has

  pipe->mutex -> filesystem write lock (normally i_mutex)

(in iter_file_splice_write() that calls vfs_iter_write() that calls
->write_iter())

and then the XFS splice case as

mr_lock -> pipe->mutex

in xfs_file_splice_read() calling splice_to_pipe().

So you end up with a A->B->C->A chain.

I think it's new to the new iomap based buffered write path in 4.8.

Dave, Christoph?

 Linus

> [ 1749.956818]
> [ 1749.958492] ==
> [ 1749.965386] [ INFO: possible circular locking dependency detected ]
> [ 1749.972381] 4.8.0-rc4+ #34 Not tainted
> [ 1749.976560] ---
> [ 1749.983554] splice01/35921 is trying to acquire lock:
> [ 1749.989188]  (>s_type->i_mutex_key#14){+.+.+.}, at: 
> [] xfs_file_buffered_aio_write+0x127/0x840 [xfs]
> [ 1750.001644]
> [ 1750.001644] but task is already holding lock:
> [ 1750.008151]  (>mutex/1){+.+.+.}, at: [] 
> pipe_lock+0x51/0x60
> [ 1750.016753]
> [ 1750.016753] which lock already depends on the new lock.
> [ 1750.016753]
> [ 1750.025880]
> [ 1750.025880] the existing dependency chain (in reverse order) is:
> [ 1750.034229]
> -> #2 (>mutex/1){+.+.+.}:
> [ 1750.039139][] lock_acquire+0x1fa/0x440
> [ 1750.045857][] mutex_lock_nested+0xdd/0x850
> [ 1750.052963][] pipe_lock+0x51/0x60
> [ 1750.059190][] splice_to_pipe+0x75/0x9e0
> [ 1750.066001][] 
> __generic_file_splice_read+0xa71/0xe90
> [ 1750.074071][] generic_file_splice_read+0xc1/0x1f0
> [ 1750.081849][] xfs_file_splice_read+0x368/0x7b0 
> [xfs]
> [ 1750.089940][] do_splice_to+0xee/0x150
> [ 1750.096555][] SyS_splice+0x1144/0x1c10
> [ 1750.103269][] do_syscall_64+0x1a6/0x500
> [ 1750.110084][] return_from_SYSCALL_64+0x0/0x7a
> [ 1750.117479]
> -> #1 (&(>i_iolock)->mr_lock#2){++}:
> [ 1750.123649][] lock_acquire+0x1fa/0x440
> [ 1750.130362][] down_write_nested+0x5e/0xe0
> [ 1750.137371][] xfs_ilock+0x2fe/0x550 [xfs]
> [ 1750.144397][] 
> xfs_file_buffered_aio_write+0x134/0x840 [xfs]
> [ 1750.153175][] xfs_file_write_iter+0x26d/0x6d0 
> [xfs]
> [ 1750.161177][] __vfs_write+0x2be/0x640
> [ 1750.167799][] vfs_write+0x152/0x4b0
> [ 1750.174220][] SyS_write+0xdf/0x1d0
> [ 1750.180547][] entry_SYSCALL_64_fastpath+0x1f/0xbd
> [ 1750.188328]
> -> #0 (>s_type->i_mutex_key#14){+.+.+.}:
> [ 1750.194508][] __lock_acquire+0x3043/0x3dd0
> [ 1750.201609][] lock_acquire+0x1fa/0x440
> [ 1750.208321][] down_write+0x5a/0xe0
> [ 1750.214645][] 
> xfs_file_buffered_aio_write+0x127/0x840 [xfs]
> [ 1750.223421][] xfs_file_write_iter+0x26d/0x6d0 
> [xfs]
> [ 1750.231423][] vfs_iter_write+0x29e/0x550
> [ 1750.238330][] iter_file_splice_write+0x529/0xb70
> [ 1750.246012][] SyS_splice+0x724/0x1c10
> [ 1750.252627][] do_syscall_64+0x1a6/0x500
> [ 1750.259438][] return_from_SYSCALL_64+0x0/0x7a
> [ 1750.266830]
> [ 1750.266830] other info that might help us debug this:
> [ 1750.266830]
> [ 1750.275764] Chain exists of:
>   >s_type->i_mutex_key#14 --> &(>i_iolock)->mr_lock#2 --> 
> >mutex/1
>
> [ 1750.287213]  Possible unsafe locking scenario:
> [ 1750.287213]
> [ 1750.293817]CPU0CPU1
> [ 1750.298871]
> [ 1750.303924]   lock(>mutex/1);
> [ 1750.307845]
> lock(&(>i_iolock)->mr_lock#2);
> [ 1750.315836]lock(>mutex/1);
> [ 1750.322567]   lock(>s_type->i_mutex_key#14);
> [ 1750.327748]
> [ 1750.327748]  *** DEADLOCK ***
> [ 1750.327748]
> [ 1750.334355] 2 locks held by splice01/35921:
> [ 1750.339019]  #0:  (sb_writers#8){.+.+.+}, at: [] 
> __sb_start_write+0xb4/0xf0
> [ 1750.348595]  #1:  (>mutex/1){+.+.+.}, at: [] 
> pipe_lock+0x51/0x60
> [ 1750.357686]
> [ 1750.357686] stack backtrace:
> [ 1750.362548] CPU: 50 PID: 35921 Comm: splice01 Not tainted 4.8.0-rc4+ #34
> [ 1750.370026] Hardware name: Intel Corporation S2600WTT/S2600WTT, BIOS 
> GRNDSDP1.86B.0044.R00.1501191641 01/19/2015
> [ 1750.381382]   3bca9477 88044c4176e0 
> 81a3d191
> [ 1750.389675]  84292880 842b9e30 88044c417730 
> 812a6aa6
> [ 1750.397968]

Re: [PATCH net] bonding: Fix bonding crash

2016-09-02 Thread महेश बंडेवार

On Fri, Sep 2, 2016 at 9:37 AM, Eric Dumazet  wrote:
> On Fri, 2016-09-02 at 18:25 +0200, Jiri Pirko wrote:
>
>> Understand the reason. All I say is we tried hard (I surely did) in
>> the past to remove bonding specific quirks and now we are adding one.
>> And the simple reason is that the code is such a mess.
>>
I would not qualify this as bonding specific change! The check of you can
really use the interface is simple enough, and can be used by other virtual
drivers (e.g. macvlan, ipvlan, vrf etc.). Waiting until
rx_handler_register() can be
called in your code and then try to do rollback is bad. Also doing
register() early
has it's own consequences. On contrary, I would argue that this is much cleaner.

Agreed that bonding-driver has become a monster but that doesn't mean we
should not fix issues. This argument would be valid when we are
dealing with last
couple installations / use cases of bonding and everyone else has moved to
alternatives. Unfortunately we are not there yet :(

>> Just use team instead and you'll be fine. You can "google" it :)
>
> Sure, please join _our_ team and make all the needed changes in user
> land.
>
:) Cannot put it more eloquently than Eric did but we would (in
theory) love to move
to team-driver, but logistics don't support this (yet!).

> The kernel part of it is epsilon compared to all the control part (ie
> talk to the various Google switches.), and monitoring.
>
> I am fine to leave the bug in upstream bonding driver, if you really
> want to force people out of bonding land !
>
> Here an automated test simply used macvlan and did not remove the
> macvlan before enslaving the netdev again in a bonding, we already fixed
> the test, because it is faster than deploying new kernels anyway.
>
>
>

Minimum MTU Mess

2016-09-02 Thread Jarod Wilson

So... I had a bug reported, about a NIC that ceased to work, if it's MTU
was set to 0, then back to it's original value (1500). This got me
thinking... What does an MTU of 0 even mean? Why should it be allowed?

As it turns out, most (but not all) network drivers have a check in their
ndo_change_mtu function that returns -EINVAL if new_mtu < $magic_number,
which is often 68 (per page 25 of RFC 791), but is sometimes 64, or 60, or
46... Sometimes other manipulations are done. But the short version is
that it seems there's an almost universal need to check for a minimum MTU.
There's just some disagreement on what that minimum is.

So, rather than having nearly every ndo_change_mut callback do the exact
same thing, would it make sense to settle on one minimum MTU value, and
check that unilaterally (at least for certain netdev types) in
net/core/dev.c's dev_set_mtu()? Or is intentionally left vague, because
it's really up to the hardware to care?

Alternatively, perhaps each driver should set a netdev->min_mtu value, and
net/core/dev.c could check against that. Could even have a centralized
IP_MIN_MTU of 68 that all devices using ether_setup() and/or
alloc_etherdev() used by default.

In any case, the number of "mtu < 68" and "#define FOO_MIN_MTU 68", or
variations thereof, under drivers/net/ is kind of crazy.

In the interim, I think I'll just do a 3-line patch for this one driver
that mirrors all the existing drivers to keep it from going splat...

-- 
Jarod Wilson
ja...@redhat.com

Re: possible circular locking dependency detected

2016-09-02 Thread Al Viro

On Fri, Sep 02, 2016 at 05:10:13PM +0100, Rainer Weikusat wrote:

> > Bullshit.  kern_path_create() *does* mean the same thing in all cases.
> > Namely, find the parent, lock it and leave the final name component for
> > the create-type operation.  It sure as hell is not guaranteed to take
> > *all* locks that are going to be taken in process of mknod/mkdir/etc.
> > Never had been.
> 
> This isn't about "all locks", it's about the lock in question. No other
> mknod operation (I'm aware of) calls this with another superblock than
> the one already acted upon by kern_path_create. This may be wrong (if
> so, feel free to correct it) but it's not "bullshit" (intentional
> deception in order to sell something to someone).
> 

Never had been promised.  And it's not just this lock - e.g. ->i_rwsem is
taken on the parent by kern_path_create() and on parent in underlying
filesystem by ecryptfs ->mknod() (as well as overlayfs one).  bind/bind
deadlock - one for a path to ecryptfs, another for that on the raw
filesystem behind it (which can be mounted elsewhere/in another namespace/etc.)
with those paths ending in the matching directories (the last components may
be same or different - doesn't matter)

A: lock parent in ecryptfs (via kern_path_create())
B: lock the directory behind it in underlying fs (ditto)
A: grab ->readlock
B: block on ->readlock
A: call ecryptfs_mknod() and block trying to lock the directory held by B

Deadlock.  And while we are at it, ecryptfs probably ought to claim transient
write access for the duration of modifications of the underlying one similar
to overlayfs.  The point is, it had never been promised that you can stick
random locks just outside of ->mknod()/->mkdir()/etc.  The same goes for e.g.
NFS mount of something exported by localhost; knfsd must lock the parent
directory on server before creating anything in it.  Suppose you have
/srv/nfs/foo exported and mounted on the same host at /mnt/bar.  bind to
/mnt/bar/barf/a vs. bind to /srv/nfs/foo/barf/b:
A: lock /mnt/bar/barf
B: lock /srv/nfs/foo/barf
A: grab ->readlock
B: block on ->readlock
A: call nfs_mknod(), wait for reply
knfsd: block trying to lock /srv/nfs/foo/barf

It's very much _not_ just overlayfs being pathological - that it certainly is,
but the problem is much wider.  You could try to argue that kern_path_create()
should've known to lock all relevant directories in case of overlayfs and
ecryptfs, but it has absolutely no chance to do that in case of NFS - the
protocol doesn't allow "lock this directory, one of my next requests will
be to create something in it".  Even leases are only for regular files...

Re: e1000e on Thinkpad x60: gigabit not available due to "SmartSpeed"

2016-09-02 Thread Lennart Sorensen

On Thu, Sep 01, 2016 at 02:58:13PM -0700, Greg wrote:
> On Thu, 2016-09-01 at 22:14 +0200, Pavel Machek wrote:
> > Hi!
> > 
> > I have trouble getting 1000mbit out of my ethernet card.
> > 
> > I tried direct connection between two PCs with different cables, and
> > no luck.
> > 
> > Today I tried connection to 1000mbit switch, and no luck, either. (Two
> > cables, one was cat6, both short).
> > 
> > My computer sees 1000mbit being advertised by the other side, but does
> > not advertise 1000mbit, "Link Speed was downgraded by SmartSpeed".
> 
> Check your cables?
> 
> https://vmxp.wordpress.com/2015/01/06/1gbe-intel-nic-throttled-to-100mbit-by-smartspeed/

Of course if it isn't the cable, then it could even be a broken pin in
the port.  As far as I can tell, anything that causes one of the 3rd
or 4th pairs of wires to not work will degrade to 100Mbit on just the
first 2 pairs of wires and give that message.  Some badly implemented
switches can also cause it of course.

-- 
Len Sorensen

[PATCH net-next v4 1/2] net: ethernet: mediatek: enhance RX path by reducing the frequency of the memory barrier used

2016-09-02 Thread sean.wang

From: Sean Wang 

The patch makes move wmb() to outside the loop that could help
RX path handling more faster although that RX descriptors aren't
freed for DMA to use as soon as possible, but based on my experiment
and the result shows it still can reach about 943Mbpis without
performance drop that is tested based on the setup with one port
using Giga PHY and 256 RX descriptors for DMA to move.

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 048d763..33bb10f 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -951,13 +951,14 @@ release_desc:
rxd->rxd2 = RX_DMA_PLEN0(ring->buf_size);
 
ring->calc_idx = idx;
-   /* make sure that all changes to the dma ring are flushed before
-* we continue
-*/
-   wmb();
-   mtk_w32(eth, ring->calc_idx, MTK_PRX_CRX_IDX0);
+
done++;
}
+   /* make sure that all changes to the dma ring are flushed before
+* we continue
+*/
+   wmb();
+   mtk_w32(eth, ring->calc_idx, MTK_PRX_CRX_IDX0);
 
if (done < budget)
mtk_w32(eth, MTK_RX_DONE_INT, MTK_PDMA_INT_STATUS);
-- 
1.9.1

[PATCH net-next v4 2/2] net: ethernet: mediatek: enhance RX path by aggregating more SKBs into NAPI

2016-09-02 Thread sean.wang

From: Sean Wang 

The patch adds support for aggregating more SKBs feed into NAPI in
order to get more benefits from generic receive offload (GRO) by
peeking at the RX ring status and moving more packets right before
returning from NAPI RX polling handler if NAPI budgets are still
available and some packets already present in hardware.

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 33bb10f..44cab1b 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -960,9 +960,6 @@ release_desc:
wmb();
mtk_w32(eth, ring->calc_idx, MTK_PRX_CRX_IDX0);
 
-   if (done < budget)
-   mtk_w32(eth, MTK_RX_DONE_INT, MTK_PDMA_INT_STATUS);
-
return done;
 }
 
@@ -1081,10 +1078,13 @@ static int mtk_napi_rx(struct napi_struct *napi, int 
budget)
struct mtk_eth *eth = container_of(napi, struct mtk_eth, rx_napi);
u32 status, mask;
int rx_done = 0;
+   int remain_budget = budget;
 
mtk_handle_status_irq(eth);
+
+poll_again:
mtk_w32(eth, MTK_RX_DONE_INT, MTK_PDMA_INT_STATUS);
-   rx_done = mtk_poll_rx(napi, budget, eth);
+   rx_done = mtk_poll_rx(napi, remain_budget, eth);
 
if (unlikely(netif_msg_intr(eth))) {
status = mtk_r32(eth, MTK_PDMA_INT_STATUS);
@@ -1093,14 +1093,14 @@ static int mtk_napi_rx(struct napi_struct *napi, int 
budget)
 "done rx %d, intr 0x%08x/0x%x\n",
 rx_done, status, mask);
}
-
-   if (rx_done == budget)
+   if (rx_done == remain_budget)
return budget;
 
status = mtk_r32(eth, MTK_PDMA_INT_STATUS);
-   if (status & MTK_RX_DONE_INT)
-   return budget;
-
+   if (status & MTK_RX_DONE_INT) {
+   remain_budget -= rx_done;
+   goto poll_again;
+   }
napi_complete(napi);
mtk_irq_enable(eth, MTK_PDMA_INT_MASK, MTK_RX_DONE_INT);
 
-- 
1.9.1

[PATCH net-next v4 0/2] net: ethernet: mediatek: add enhancements to RX path

2016-09-02 Thread sean.wang

From: Sean Wang 

Changes since v1:
- fix message typos and add coverletter
  
Changes since v2:
- split from the previous series for submitting add enhancements as 
a series targeting 'net-next' and add indents before comments.

Changes since v3:
- merge the patch using PDMA RX path
- fixed the input of mtk_poll_rx is with the remaining budget

Sean Wang (2):
  net: ethernet: mediatek: enhance RX path by reducing the frequency of
the memory barrier used
  net: ethernet: mediatek: enhance RX path by aggregating more SKBs into
NAPI

 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

-- 
1.9.1

Re: [PATCH 4/4] net: smsc911x: Move interrupt allocation to open/stop

2016-09-02 Thread Will Deacon

On Thu, Sep 01, 2016 at 03:15:09PM -0500, Jeremy Linton wrote:
> The /proc/irq/xx information is incorrect for smsc911x because
> the request_irq is happening before the register_netdev has the
> proper device name. Moving it to the open also fixes the case
> of when the device is renamed.
> 
> Reported-by: Will Deacon 
> Signed-off-by: Jeremy Linton 
> ---
>  drivers/net/ethernet/smsc/smsc911x.c | 47 
> ++--
>  1 file changed, 18 insertions(+), 29 deletions(-)

FWIW, this fixes the interface naming under /proc//ethX for me:

Tested-by: Will Deacon 

Will

Re: possible circular locking dependency detected

2016-09-02 Thread Rainer Weikusat

Linus Torvalds  writes:
> On Thu, Sep 1, 2016 at 2:43 PM, Linus Torvalds
>  wrote:
>> On Thu, Sep 1, 2016 at 2:01 PM, Al Viro  wrote:
>>>
>>> Outside as in "all fs activity in bind happens under it".  Along with
>>> assignment to ->u.addr, etc.  IOW, make it the outermost lock there.
>>
>> Hah, yes. I misunderstood you.
>>
>> Yes. In fact that fixes the problem I mentioned, rather than introducing it.
>
> So the easiest approach would seem to be to revert commit c845acb324aa
> ("af_unix: Fix splice-bind deadlock"), and then apply the lock split.
>
> Like the attached two patches.
>
> This is still *entirely* untested.

As far as I can tell, this should work as I can't currently imagine why
a fs operation might end up binding a unix socket despite the idea to
make af_unix.c yet more complicated in order to work around irregular
behaviour of (as far as I can tell) a single filesystem (for which
kern_path_create doesn't really mean kern_path_create and it has to work
around that once it gets control) goes against all instincts I have in
this area. If filesystems need to do arbitrary stuff when
__sb_start_write is called for 'their' superblock, they should be able
to do so directly.

At present, this is a theoretic concern as I can't (due to other work
committments) put any non-cursory work into this before Sunday. There
may also be other reasons why this idea is impractical or even
unworkable.

Re: possible circular locking dependency detected

2016-09-02 Thread CAI Qian



- Original Message -
> From: "CAI Qian" 
> To: "Linus Torvalds" 
> Cc: "Al Viro" , "Miklos Szeredi" 
> , "Rainer Weikusat"
> , "Hannes Frederic Sowa" 
> , "Rainer Weikusat"
> , "Eric Sandeen" , 
> "Network Development"
> 
> Sent: Friday, September 2, 2016 11:51:58 AM
> Subject: Re: possible circular locking dependency detected
> 
> 
> 
> - Original Message -
> > From: "CAI Qian" 
> > To: "Linus Torvalds" 
> > Cc: "Al Viro" , "Miklos Szeredi"
> > , "Rainer Weikusat"
> > , "Hannes Frederic Sowa"
> > , "Rainer Weikusat"
> > , "Eric Sandeen" ,
> > "Network Development"
> > 
> > Sent: Friday, September 2, 2016 10:43:20 AM
> > Subject: Re: possible circular locking dependency detected
> > 
> > 
> > 
> > - Original Message -
> > > From: "Linus Torvalds" 
> > > To: "Al Viro" , "CAI Qian" 
> > > Cc: "Miklos Szeredi" , "Rainer Weikusat"
> > > , "Hannes Frederic Sowa"
> > > , "Rainer Weikusat"
> > > , "Eric Sandeen"
> > > , "Network Development" 
> > > Sent: Thursday, September 1, 2016 6:04:38 PM
> > > Subject: Re: possible circular locking dependency detected
> > > 
> > > On Thu, Sep 1, 2016 at 2:43 PM, Linus Torvalds
> > >  wrote:
> > > > On Thu, Sep 1, 2016 at 2:01 PM, Al Viro 
> > > > wrote:
> > > >>
> > > >> Outside as in "all fs activity in bind happens under it".  Along with
> > > >> assignment to ->u.addr, etc.  IOW, make it the outermost lock there.
> > > >
> > > > Hah, yes. I misunderstood you.
> > > >
> > > > Yes. In fact that fixes the problem I mentioned, rather than
> > > > introducing
> > > > it.
> > > 
> > > So the easiest approach would seem to be to revert commit c845acb324aa
> > > ("af_unix: Fix splice-bind deadlock"), and then apply the lock split.
> > > 
> > > Like the attached two patches.
> > > 
> > > This is still *entirely* untested.
> > Tested-by: CAI Qian 
OK, this tag still stand. The below issue is also reproduced without those
patches, so a separate problem most likely was introduced recently (after
rc3 or rc4) by probably some xfs update.
   CAI Qian 
> Actually, I took it back, and now spice seems start to deadlock using the
> reproducer,
> 
> https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/splice/splice01.c
> 
> [ 1749.956818]
> [ 1749.958492] ==
> [ 1749.965386] [ INFO: possible circular locking dependency detected ]
> [ 1749.972381] 4.8.0-rc4+ #34 Not tainted
> [ 1749.976560] ---
> [ 1749.983554] splice01/35921 is trying to acquire lock:
> [ 1749.989188]  (>s_type->i_mutex_key#14){+.+.+.}, at:
> [] xfs_file_buffered_aio_write+0x127/0x840 [xfs]
> [ 1750.001644]
> [ 1750.001644] but task is already holding lock:
> [ 1750.008151]  (>mutex/1){+.+.+.}, at: []
> pipe_lock+0x51/0x60
> [ 1750.016753]
> [ 1750.016753] which lock already depends on the new lock.
> [ 1750.016753]
> [ 1750.025880]
> [ 1750.025880] the existing dependency chain (in reverse order) is:
> [ 1750.034229]
> -> #2 (>mutex/1){+.+.+.}:
> [ 1750.039139][] lock_acquire+0x1fa/0x440
> [ 1750.045857][] mutex_lock_nested+0xdd/0x850
> [ 1750.052963][] pipe_lock+0x51/0x60
> [ 1750.059190][] splice_to_pipe+0x75/0x9e0
> [ 1750.066001][]
> __generic_file_splice_read+0xa71/0xe90
> [ 1750.074071][]
> generic_file_splice_read+0xc1/0x1f0
> [ 1750.081849][] xfs_file_splice_read+0x368/0x7b0
> [xfs]
> [ 1750.089940][] do_splice_to+0xee/0x150
> [ 1750.096555][] SyS_splice+0x1144/0x1c10
> [ 1750.103269][] do_syscall_64+0x1a6/0x500
> [ 1750.110084][] return_from_SYSCALL_64+0x0/0x7a
> [ 1750.117479]
> -> #1 (&(>i_iolock)->mr_lock#2){++}:
> [ 1750.123649][] lock_acquire+0x1fa/0x440
> [ 1750.130362][] down_write_nested+0x5e/0xe0
> [ 1750.137371][] xfs_ilock+0x2fe/0x550 [xfs]
> [ 1750.144397][]
> xfs_file_buffered_aio_write+0x134/0x840 [xfs]
> [ 1750.153175][] xfs_file_write_iter+0x26d/0x6d0
> [xfs]
> [ 1750.161177][] __vfs_write+0x2be/0x640
> [ 1750.167799][] vfs_write+0x152/0x4b0
> [ 1750.174220][] SyS_write+0xdf/0x1d0
> [ 1750.180547][]
>

Re: [PATCH v3 1/2] wcn36xx: Transition driver to SMD client

2016-09-02 Thread Bjorn Andersson

On Fri 02 Sep 09:24 PDT 2016, Kalle Valo wrote:

> Bjorn Andersson  writes:
> 
> > The wcn36xx wifi driver follows the life cycle of the WLAN_CTRL SMD
> > channel, as such it should be a SMD client. This patch makes this
> > transition, now that we have the necessary frameworks available.
> >
> > Signed-off-by: Bjorn Andersson 
> 
> [...]
> 
> > --- a/drivers/net/wireless/ath/wcn36xx/Kconfig
> > +++ b/drivers/net/wireless/ath/wcn36xx/Kconfig
> > @@ -1,6 +1,6 @@
> >  config WCN36XX
> > tristate "Qualcomm Atheros WCN3660/3680 support"
> > -   depends on MAC80211 && HAS_DMA
> > +   depends on MAC80211 && HAS_DMA && QCOM_SMD
> 
> While I had this patch on my pending branch I noticed that I can't
> compile wcn36xx on x86 anymore. This is actually quite inconvenient, for
> example then it's easy to miss mac80211 API changes etc. Is there any
> way we could continue build testing wcn36xx on x86 (or all) platforms?
> 

Sorry about that, we should at least be able to COMPILE_TEST it in
non-qcom builds. Thanks for letting me know.

And the driver should also depend on QCOM_WCNSS_CTRL, in the normal
case.

I'll respin this, unless you prefer an incremental patch for those
changes(?)

> Also what about older platforms? Earlier I used wcn36xx on my Nexus 4
> with help of backports project. I can't do that anymore, right?
> 

This has been tested on 8064, 8974 and 8916. So your Nexus 4 is covered.

Unfortunately I don't have a Nexus 4, but we currently have Nexus 7
(the 8064 version), Xperia Z, Xperia Z1 and DB410c using this chip and
all four has been tested with this code.

I've staged the PIL/remoteproc (firmware "loader") driver for v4.9, so
with this patch the only thing missing to fill in the dts files is one
clock from the RPM.

JFYI, There seems to be some race in the removal path, which I will look
into. But I would prefer if we could merge this to make the driver
usable, and more accessible to work on.

Regards,
Bjorn

Re: [PATCH net 0/2] vxlan: fix error reporting

2016-09-02 Thread Jiri Benc

On Fri, 2 Sep 2016 09:03:11 -0700, Stephen Hemminger wrote:
> On Fri,  2 Sep 2016 13:37:10 +0200
> Jiri Benc  wrote:
> 
> > This patchset improves checking for invalid configuration in VXLAN and fixes
> > problems with duplicated and inappropriate error messages.
> > 
> > Jiri Benc (2):
> >   vxlan: reject multicast destination without an interface
> >   vxlan: fix duplicated and wrong error messages
> > 
> >  drivers/net/vxlan.c | 38 --
> >  1 file changed, 12 insertions(+), 26 deletions(-)
> > 
> 
> These should also be detected and rejected in iproute2
> errors in kernel log are user unfriendly api.

I agree and I intend to send a patch once this is accepted.

 Jiri

Re: [Patch net-next 2/2] netns: avoid disabling irq for netns id

2016-09-02 Thread Cong Wang

On Fri, Sep 2, 2016 at 1:12 AM, Nicolas Dichtel
 wrote:
> Le 02/09/2016 à 06:53, Cong Wang a écrit :
>> We never read or change netns id in hardirq context,
>> the only place we read netns id in softirq context
>> is in vxlan_xmit(). So, it should be enough to just
>> disable BH.
>
> Are you sure? Did you audit all part of the code?

I did audit all the callers, and I didn't find any of them in IRQ context.

> peernet2id() is called from netlink core system (do_one_broadcast()). Are you
> sure that no driver call this function from an hard irq context?

I audit all callers of netlink_broadcast(), and I don't see any of
them in hardirq context.

>
> I think that NETLINK_LISTEN_ALL_NSID is largely untested, so it will be hard 
> to
> detect a bug introduced in this feature.

This patch passed my netns stress tests, I have LOCKDEP enabled,
and I don't get any warning or crash etc.

Re: [PATCH v3 1/2] wcn36xx: Transition driver to SMD client

2016-09-02 Thread Marcel Holtmann

Hi Kalle,

>> The wcn36xx wifi driver follows the life cycle of the WLAN_CTRL SMD
>> channel, as such it should be a SMD client. This patch makes this
>> transition, now that we have the necessary frameworks available.
>> 
>> Signed-off-by: Bjorn Andersson 
> 
> [...]
> 
>> --- a/drivers/net/wireless/ath/wcn36xx/Kconfig
>> +++ b/drivers/net/wireless/ath/wcn36xx/Kconfig
>> @@ -1,6 +1,6 @@
>> config WCN36XX
>>  tristate "Qualcomm Atheros WCN3660/3680 support"
>> -depends on MAC80211 && HAS_DMA
>> +depends on MAC80211 && HAS_DMA && QCOM_SMD
> 
> While I had this patch on my pending branch I noticed that I can't
> compile wcn36xx on x86 anymore. This is actually quite inconvenient, for
> example then it's easy to miss mac80211 API changes etc. Is there any
> way we could continue build testing wcn36xx on x86 (or all) platforms?

I would also like that for the btqcomsmd Bluetooth driver. Doing quick build 
tests on x86 would be great.

Regards

Marcel

Re: [PATCH net] bonding: Fix bonding crash

2016-09-02 Thread Eric Dumazet

On Fri, 2016-09-02 at 18:25 +0200, Jiri Pirko wrote:

> Understand the reason. All I say is we tried hard (I surely did) in
> the past to remove bonding specific quirks and now we are adding one.
> And the simple reason is that the code is such a mess.
> 
> Just use team instead and you'll be fine. You can "google" it :)

Sure, please join _our_ team and make all the needed changes in user
land.

The kernel part of it is epsilon compared to all the control part (ie
talk to the various Google switches.), and monitoring.

I am fine to leave the bug in upstream bonding driver, if you really
want to force people out of bonding land !

Here an automated test simply used macvlan and did not remove the
macvlan before enslaving the netdev again in a bonding, we already fixed
the test, because it is faster than deploying new kernels anyway.

Re: [PATCH net] bonding: Fix bonding crash

2016-09-02 Thread Jiri Pirko

Fri, Sep 02, 2016 at 03:33:20PM CEST, eric.duma...@gmail.com wrote:
>On Fri, 2016-09-02 at 09:52 +0200, Jiri Pirko wrote:
>
>> 
>> No, please, don't make bonding a spacial citizen introducing this.
>> Please handle the issue inside the bonding code, like we do for the rest
>> of master devices (and how it was once done for bonding). Thanks.
>
>I do not feel this netdev_is_rx_handler_busy() use is special to
>bonding.
>
>It makes sense to use it early, to avoid complex rollback, once various
>events have been sent all over.
>
>bond_enslave() is 447 lines long already.
>
>You perfectly know how hard it is to 'handle the issue inside the
>bonding code' as you chose to not fix bonding and write team instead.
>
>So Mahesh patch makes perfect sense to me. It exactly fixes a stupid
>bond_enslave() behavior, trying to set rx_handler way too late.
>
>By doing sanity checks before any action, we simply do not have to add
>complex rollback.

Understand the reason. All I say is we tried hard (I surely did) in
the past to remove bonding specific quirks and now we are adding one.
And the simple reason is that the code is such a mess.

Just use team instead and you'll be fine. You can "google" it :)

Re: [PATCH v3 1/2] wcn36xx: Transition driver to SMD client

2016-09-02 Thread Kalle Valo

Bjorn Andersson  writes:

> The wcn36xx wifi driver follows the life cycle of the WLAN_CTRL SMD
> channel, as such it should be a SMD client. This patch makes this
> transition, now that we have the necessary frameworks available.
>
> Signed-off-by: Bjorn Andersson 

[...]

> --- a/drivers/net/wireless/ath/wcn36xx/Kconfig
> +++ b/drivers/net/wireless/ath/wcn36xx/Kconfig
> @@ -1,6 +1,6 @@
>  config WCN36XX
>   tristate "Qualcomm Atheros WCN3660/3680 support"
> - depends on MAC80211 && HAS_DMA
> + depends on MAC80211 && HAS_DMA && QCOM_SMD

While I had this patch on my pending branch I noticed that I can't
compile wcn36xx on x86 anymore. This is actually quite inconvenient, for
example then it's easy to miss mac80211 API changes etc. Is there any
way we could continue build testing wcn36xx on x86 (or all) platforms?

Also what about older platforms? Earlier I used wcn36xx on my Nexus 4
with help of backports project. I can't do that anymore, right?

-- 
Kalle Valo

Re: possible circular locking dependency detected

2016-09-02 Thread Rainer Weikusat

Al Viro  writes:
> On Fri, Sep 02, 2016 at 04:18:04PM +0100, Rainer Weikusat wrote:
>
>> As far as I can tell, this should work as I can't currently imagine
>> why a fs operation might end up binding a unix socket despite the
>> idea to make af_unix.c yet more complicated in order to work around
>> irregular behaviour of (as far as I can tell) a single filesystem
>> (for which kern_path_create doesn't really mean kern_path_create and
>> it has to work around that once it gets control) goes against all
>> instincts I have in this area. If filesystems need to do arbitrary
>> stuff when __sb_start_write is called for 'their' superblock, they
>> should be able to do so directly.
>
> Bullshit.  kern_path_create() *does* mean the same thing in all cases.
> Namely, find the parent, lock it and leave the final name component for
> the create-type operation.  It sure as hell is not guaranteed to take
> *all* locks that are going to be taken in process of mknod/mkdir/etc.
> Never had been.

This isn't about "all locks", it's about the lock in question. No other
mknod operation (I'm aware of) calls this with another superblock than
the one already acted upon by kern_path_create. This may be wrong (if
so, feel free to correct it) but it's not "bullshit" (intentional
deception in order to sell something to someone).

[PATCH net-next v3 0/2] net: ethernet: mediatek: add enhancements to RX path

2016-09-02 Thread sean.wang

From: Sean Wang 

Changes since v1:
- fix message typos and add coverletter
  
Changes since v2:
- split from the previous series for submitting add enhancements as 
a series targeting 'net-next' and add indents before comments.

Changes since v3:
- merge the patch using PDMA RX path
- fixed the input of mtk_poll_rx is with the remaining budget

Sean Wang (2):
  net: ethernet: mediatek: enhance RX path by reducing the frequency of
the memory barrier used
  net: ethernet: mediatek: enhance RX path by aggregating more SKBs into
NAPI

 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 29 +++--
 1 file changed, 15 insertions(+), 14 deletions(-)

-- 
1.9.1

[PATCH net-next v3 1/2] net: ethernet: mediatek: enhance RX path by reducing the frequency of the memory barrier used

2016-09-02 Thread sean.wang

From: Sean Wang 

The patch makes move wmb() to outside the loop that could help
RX path handling more faster although that RX descriptors aren't
freed for DMA to use as soon as possible, but based on my experiment
and the result shows it still can reach about 943Mbpis without
performance drop that is tested based on the setup with one port
using Giga PHY and 256 RX descriptors for DMA to move.

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 048d763..33bb10f 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -951,13 +951,14 @@ release_desc:
rxd->rxd2 = RX_DMA_PLEN0(ring->buf_size);
 
ring->calc_idx = idx;
-   /* make sure that all changes to the dma ring are flushed before
-* we continue
-*/
-   wmb();
-   mtk_w32(eth, ring->calc_idx, MTK_PRX_CRX_IDX0);
+
done++;
}
+   /* make sure that all changes to the dma ring are flushed before
+* we continue
+*/
+   wmb();
+   mtk_w32(eth, ring->calc_idx, MTK_PRX_CRX_IDX0);
 
if (done < budget)
mtk_w32(eth, MTK_RX_DONE_INT, MTK_PDMA_INT_STATUS);
-- 
1.9.1

[PATCH net-next v3 2/2] net: ethernet: mediatek: enhance RX path by aggregating more SKBs into NAPI

2016-09-02 Thread sean.wang

From: Sean Wang 

The patch adds support for aggregating more SKBs feed into NAPI in
order to get more benefits from generic receive offload (GRO) by
peeking at the RX ring status and moving more packets right before
returning from NAPI RX polling handler if NAPI budgets are still
available and some packets already present in hardware.

Signed-off-by: Sean Wang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 33bb10f..44cab1b 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -960,9 +960,6 @@ release_desc:
wmb();
mtk_w32(eth, ring->calc_idx, MTK_PRX_CRX_IDX0);
 
-   if (done < budget)
-   mtk_w32(eth, MTK_RX_DONE_INT, MTK_PDMA_INT_STATUS);
-
return done;
 }
 
@@ -1081,10 +1078,13 @@ static int mtk_napi_rx(struct napi_struct *napi, int 
budget)
struct mtk_eth *eth = container_of(napi, struct mtk_eth, rx_napi);
u32 status, mask;
int rx_done = 0;
+   int remain_budget = budget;
 
mtk_handle_status_irq(eth);
+
+poll_again:
mtk_w32(eth, MTK_RX_DONE_INT, MTK_PDMA_INT_STATUS);
-   rx_done = mtk_poll_rx(napi, budget, eth);
+   rx_done = mtk_poll_rx(napi, remain_budget, eth);
 
if (unlikely(netif_msg_intr(eth))) {
status = mtk_r32(eth, MTK_PDMA_INT_STATUS);
@@ -1093,14 +1093,14 @@ static int mtk_napi_rx(struct napi_struct *napi, int 
budget)
 "done rx %d, intr 0x%08x/0x%x\n",
 rx_done, status, mask);
}
-
-   if (rx_done == budget)
+   if (rx_done == remain_budget)
return budget;
 
status = mtk_r32(eth, MTK_PDMA_INT_STATUS);
-   if (status & MTK_RX_DONE_INT)
-   return budget;
-
+   if (status & MTK_RX_DONE_INT) {
+   remain_budget -= rx_done;
+   goto poll_again;
+   }
napi_complete(napi);
mtk_irq_enable(eth, MTK_PDMA_INT_MASK, MTK_RX_DONE_INT);
 
-- 
1.9.1

Re: [PATCH net-next 3/3] net: dsa: mv88e6xxx: make global2 code optional

2016-09-02 Thread Andrew Lunn

Hi Vivien

> What do you think?

I think the probe() needs to fail with a very obvious error message
saying you need to recompile your kernel with option XYZ enabled in
order to support this switch, when the optional stuff is not
optional...

Andrew

Re: [v2] ath9k: mark ath_fill_led_pin() static

2016-09-02 Thread Kalle Valo

Baoyou Xie  wrote:
> We get 1 warning about global functions without a declaration
> in the ath9k gpio driver when building with W=1:
> drivers/net/wireless/ath/ath9k/gpio.c:25:6: warning: no previous prototype 
> for 'ath_fill_led_pin' [-Wmissing-prototypes]
> 
> In fact, this function is only used in the file in which it is declared
> and don't need a declaration, but can be made static.
> so this patch marks it 'static'.
> 
> Signed-off-by: Baoyou Xie 

Thanks, 1 patch applied to ath-next branch of ath.git:

c39265f72ae6 ath9k: mark ath_fill_led_pin() static

-- 
Sent by pwcli
https://patchwork.kernel.org/patch/9303795/

Re: [PATCH] net: Don't delete routes in different VRFs

2016-09-02 Thread David Ahern

On 9/1/16 11:26 PM, Mark Tomlinson wrote:
> When deleting an IP address from an interface, there is a clean-up of
> routes which refer to this local address. However, there was no check to
> see that the VRF matched. This meant that deletion wasn't confined to
> the VRF it should have been.
> 
> To solve this, a new field has been added to fib_info to hold a table
> id. When removing fib entries corresponding to a local ip address, this
> table id is also used in the comparison.
> 
> The table id is populated when the fib_info is created. This was already
> done in some places, but not in ip_rt_ioctl(). This has now been fixed.
> 

The best fixes tag is:
Fixes: 021dd3b8a142 ("net: Add routes to the table associated with the device")

> Signed-off-by: Mark Tomlinson 
> ---
>  include/net/ip_fib.h | 3 ++-
>  net/ipv4/fib_frontend.c  | 3 ++-
>  net/ipv4/fib_semantics.c | 8 ++--
>  3 files changed, 10 insertions(+), 4 deletions(-)

Acked-by: David Ahern 
Tested-by: David Ahern 

Mark: send a v2 with the Fixes tag and my acked-by and tested-by.

Thanks for the patch.

Re: ath10k: fix spelling mistake "montior" -> "monitor"

2016-09-02 Thread Kalle Valo

Colin Ian King  wrote:
> From: Colin Ian King 
> 
> Trivial fix to spelling mistake in ath10k_warn message.
> 
> Signed-off-by: Colin Ian King 
> Reviewed-by: Julian Calaby 

Thanks, 1 patch applied to ath-next branch of ath.git:

7f03d3069381 ath10k: fix spelling mistake "montior" -> "monitor"

-- 
Sent by pwcli
https://patchwork.kernel.org/patch/9301869/

Re: possible circular locking dependency detected

2016-09-02 Thread Al Viro

On Fri, Sep 02, 2016 at 04:18:04PM +0100, Rainer Weikusat wrote:

> As far as I can tell, this should work as I can't currently imagine why
> a fs operation might end up binding a unix socket despite the idea to
> make af_unix.c yet more complicated in order to work around irregular
> behaviour of (as far as I can tell) a single filesystem (for which
> kern_path_create doesn't really mean kern_path_create

Bullshit.  kern_path_create() *does* mean the same thing in all cases.
Namely, find the parent, lock it and leave the final name component for
the create-type operation.  It sure as hell is not guaranteed to take
*all* locks that are going to be taken in process of mknod/mkdir/etc.
Never had been.

 and it has to work
> around that once it gets control) goes against all instincts I have in
> this area. If filesystems need to do arbitrary stuff when
> __sb_start_write is called for 'their' superblock, they should be able
> to do so directly.
> 
> At present, this is a theoretic concern as I can't (due to other work
> committments) put any non-cursory work into this before Sunday. There
> may also be other reasons why this idea is impractical or even
> unworkable.

Re: ath10k: replace config_enabled() with IS_REACHABLE()

2016-09-02 Thread Kalle Valo

Masahiro Yamada  wrote:
> Commit 97f2645f358b ("tree-wide: replace config_enabled() with
> IS_ENABLED()") mostly did away with config_enabled().
> 
> This is one of the postponed TODO items as config_enabled() is used
> for a tristate option here.  Theoretically, config_enabled() is
> equivalent to IS_BUILTIN(), but I guess IS_REACHABLE() is the best
> fit for this case because both CONFIG_HWMON and CONFIG_ATH10K are
> tristate.
> 
> Signed-off-by: Masahiro Yamada 

Thanks, 1 patch applied to ath-next branch of ath.git:

749bc03ae2cd ath10k: replace config_enabled() with IS_REACHABLE()

-- 
Sent by pwcli
https://patchwork.kernel.org/patch/9295945/

Re: [PATCH net 0/2] vxlan: fix error reporting

2016-09-02 Thread Stephen Hemminger

On Fri,  2 Sep 2016 13:37:10 +0200
Jiri Benc  wrote:

> This patchset improves checking for invalid configuration in VXLAN and fixes
> problems with duplicated and inappropriate error messages.
> 
> Jiri Benc (2):
>   vxlan: reject multicast destination without an interface
>   vxlan: fix duplicated and wrong error messages
> 
>  drivers/net/vxlan.c | 38 --
>  1 file changed, 12 insertions(+), 26 deletions(-)
> 

These should also be detected and rejected in iproute2
errors in kernel log are user unfriendly api.

Re: possible circular locking dependency detected

2016-09-02 Thread CAI Qian



- Original Message -
> From: "CAI Qian" 
> To: "Linus Torvalds" 
> Cc: "Al Viro" , "Miklos Szeredi" 
> , "Rainer Weikusat"
> , "Hannes Frederic Sowa" 
> , "Rainer Weikusat"
> , "Eric Sandeen" , 
> "Network Development"
> 
> Sent: Friday, September 2, 2016 10:43:20 AM
> Subject: Re: possible circular locking dependency detected
> 
> 
> 
> - Original Message -
> > From: "Linus Torvalds" 
> > To: "Al Viro" , "CAI Qian" 
> > Cc: "Miklos Szeredi" , "Rainer Weikusat"
> > , "Hannes Frederic Sowa"
> > , "Rainer Weikusat"
> > , "Eric Sandeen"
> > , "Network Development" 
> > Sent: Thursday, September 1, 2016 6:04:38 PM
> > Subject: Re: possible circular locking dependency detected
> > 
> > On Thu, Sep 1, 2016 at 2:43 PM, Linus Torvalds
> >  wrote:
> > > On Thu, Sep 1, 2016 at 2:01 PM, Al Viro  wrote:
> > >>
> > >> Outside as in "all fs activity in bind happens under it".  Along with
> > >> assignment to ->u.addr, etc.  IOW, make it the outermost lock there.
> > >
> > > Hah, yes. I misunderstood you.
> > >
> > > Yes. In fact that fixes the problem I mentioned, rather than introducing
> > > it.
> > 
> > So the easiest approach would seem to be to revert commit c845acb324aa
> > ("af_unix: Fix splice-bind deadlock"), and then apply the lock split.
> > 
> > Like the attached two patches.
> > 
> > This is still *entirely* untested.
> Tested-by: CAI Qian 
Actually, I took it back, and now spice seems start to deadlock using the 
reproducer,

https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/splice/splice01.c

[ 1749.956818] 
[ 1749.958492] ==
[ 1749.965386] [ INFO: possible circular locking dependency detected ]
[ 1749.972381] 4.8.0-rc4+ #34 Not tainted
[ 1749.976560] ---
[ 1749.983554] splice01/35921 is trying to acquire lock:
[ 1749.989188]  (>s_type->i_mutex_key#14){+.+.+.}, at: [] 
xfs_file_buffered_aio_write+0x127/0x840 [xfs]
[ 1750.001644] 
[ 1750.001644] but task is already holding lock:
[ 1750.008151]  (>mutex/1){+.+.+.}, at: [] 
pipe_lock+0x51/0x60
[ 1750.016753] 
[ 1750.016753] which lock already depends on the new lock.
[ 1750.016753] 
[ 1750.025880] 
[ 1750.025880] the existing dependency chain (in reverse order) is:
[ 1750.034229] 
-> #2 (>mutex/1){+.+.+.}:
[ 1750.039139][] lock_acquire+0x1fa/0x440
[ 1750.045857][] mutex_lock_nested+0xdd/0x850
[ 1750.052963][] pipe_lock+0x51/0x60
[ 1750.059190][] splice_to_pipe+0x75/0x9e0
[ 1750.066001][] 
__generic_file_splice_read+0xa71/0xe90
[ 1750.074071][] generic_file_splice_read+0xc1/0x1f0
[ 1750.081849][] xfs_file_splice_read+0x368/0x7b0 
[xfs]
[ 1750.089940][] do_splice_to+0xee/0x150
[ 1750.096555][] SyS_splice+0x1144/0x1c10
[ 1750.103269][] do_syscall_64+0x1a6/0x500
[ 1750.110084][] return_from_SYSCALL_64+0x0/0x7a
[ 1750.117479] 
-> #1 (&(>i_iolock)->mr_lock#2){++}:
[ 1750.123649][] lock_acquire+0x1fa/0x440
[ 1750.130362][] down_write_nested+0x5e/0xe0
[ 1750.137371][] xfs_ilock+0x2fe/0x550 [xfs]
[ 1750.144397][] 
xfs_file_buffered_aio_write+0x134/0x840 [xfs]
[ 1750.153175][] xfs_file_write_iter+0x26d/0x6d0 [xfs]
[ 1750.161177][] __vfs_write+0x2be/0x640
[ 1750.167799][] vfs_write+0x152/0x4b0
[ 1750.174220][] SyS_write+0xdf/0x1d0
[ 1750.180547][] entry_SYSCALL_64_fastpath+0x1f/0xbd
[ 1750.188328] 
-> #0 (>s_type->i_mutex_key#14){+.+.+.}:
[ 1750.194508][] __lock_acquire+0x3043/0x3dd0
[ 1750.201609][] lock_acquire+0x1fa/0x440
[ 1750.208321][] down_write+0x5a/0xe0
[ 1750.214645][] 
xfs_file_buffered_aio_write+0x127/0x840 [xfs]
[ 1750.223421][] xfs_file_write_iter+0x26d/0x6d0 [xfs]
[ 1750.231423][] vfs_iter_write+0x29e/0x550
[ 1750.238330][] iter_file_splice_write+0x529/0xb70
[ 1750.246012][] SyS_splice+0x724/0x1c10
[ 1750.252627][] do_syscall_64+0x1a6/0x500
[ 1750.259438][] return_from_SYSCALL_64+0x0/0x7a
[ 1750.266830] 
[ 1750.266830] other info that might help us debug this:
[ 1750.266830] 
[ 1750.275764] Chain exists of:
  >s_type->i_mutex_key#14 --> &(>i_iolock)->mr_lock#2 --> >mutex/1

[ 1750.287213]  Possible unsafe locking scenario:
[ 1750.287213] 
[ 1750.293817]CPU0CPU1
[ 1750.298871]
[

Re: [2/2] ath10k: use complete() instead complete_all()

2016-09-02 Thread Kalle Valo

Daniel Wagner  wrote:
> From: Daniel Wagner 
> 
> There is only one waiter for the completion, therefore there
> is no need to use complete_all(). Let's make that clear by
> using complete() instead of complete_all().
> 
> The usage pattern of the completion is:
> 
> waiter context  waker context
> 
> scan.started
> 
> 
> ath10k_start_scan()
>   lockdep_assert_held(conf_mutex)
>   auth10k_wmi_start_scan()
>   wait_for_completion_timeout(scan.started)
> 
>   ath10k_wmi_event_scan_start_failed()
> complete(scan.started)
> 
>   ath10k_wmi_event_scan_started()
> complete(scan.started)
> 
> scan.completed
> --
> 
> ath10k_scan_stop()
>   lockdep_assert_held(conf_mutex)
>   ath10k_wmi_stop_scan()
>   wait_for_completion_timeout(scan.completed)
> 
>   __ath10k_scan_finish()
> complete(scan.completed)
> 
> scan.on_channel
> ---
> 
> ath10k_remain_on_channel()
>   mutex_lock(conf_mutex)
>   ath10k_start_scan()
>   wait_for_completion_timeout(scan.on_channel)
> 
>   ath10k_wmi_event_scan_foreign_chan()
> complete(scan.on_channel)
> 
> offchan_tx_completed
> 
> 
> ath10k_offchan_tx_work()
>   mutex_lock(conf_mutex)
>   reinit_completion(offchan_tx_completed)
>   wait_for_completion_timeout(offchan_tx_completed)
> 
>   ath10k_report_offchain_tx()
> complete(offchan_tx_completed)
> 
> install_key_done
> 
> ath10k_install_key()
>   lockep_assert_held(conf_mutex)
>   reinit_completion(install_key_done)
>   wait_for_completion_timeout(install_key_done)
> 
>   ath10k_htt_t2h_msg_handler()
> complete(install_key_done)
> 
> vdev_setup_done
> ---
> 
> ath10k_monitor_vdev_start()
>   lockdep_assert_held(conf_mutex)
>reinit_completion(vdev_setup_done)
>   ath10k_vdev_setup_sync()
> wait_for_completion_timeout(vdev_setup_done)
> 
>   ath10k_wmi_event_vdev_start_resp()
> complete(vdev_setup_done)
> 
> ath10k_monitor_vdev_stop()
>   lockdep_assert_held(conf_mutex)
>   reinit_completion(vdev_setup_done()
>   ath10k_vdev_setup_sync()
> wait_for_completion_timeout(vdev_setup_done)
> 
>   ath10k_wmi_event_vdev_stopped()
>complete(vdev_setup_done)
> 
> thermal.wmi_sync
> 
> ath10k_thermal_show_temp()
>   mutex_lock(conf_mutex)
>   reinit_completion(thermal.wmi_sync)
>   wait_for_completion_timeout(thermal.wmi_sync)
> 
>   ath10k_thermal_event_temperature()
> complete(thermal.wmi_sync)
> 
> bss_survey_done
> ---
> ath10k_mac_update_bss_chan_survey
>   lockdep_assert_held(conf_mutex)
>   reinit_completion(bss_survey_done)
>   wait_for_completion_timeout(bss_survey_done)
> 
>   ath10k_wmi_event_pdev_bss_chan_info()
> complete(bss_survey_done)
> 
> All complete() calls happen while the conf_mutex is taken. That means
> at max one waiter is possible.
> 
> Signed-off-by: Daniel Wagner 

Thanks, 1 patch applied to ath-next branch of ath.git:

881ed54ecc13 ath10k: use complete() instead complete_all()

-- 
Sent by pwcli
https://patchwork.kernel.org/patch/9287731/

Re: [PATCH] ath10k: fix memory leak on caldata on error exit path

2016-09-02 Thread Valo, Kalle

Colin King  writes:

> From: Colin Ian King 
>
> caldata is not being free'd on the error exit path, causing
> a memory leak. kfree it to fix the leak.
>
> Signed-off-by: Colin Ian King 
> ---
>  drivers/net/wireless/ath/ath10k/pci.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/drivers/net/wireless/ath/ath10k/pci.c 
> b/drivers/net/wireless/ath/ath10k/pci.c
> index 9a22c47..886337c 100644
> --- a/drivers/net/wireless/ath/ath10k/pci.c
> +++ b/drivers/net/wireless/ath/ath10k/pci.c
> @@ -2725,6 +2725,7 @@ static int ath10k_pci_hif_fetch_cal_eeprom(struct 
> ath10k *ar, void **data,
>   return 0;
>  
>  err_free:
> + kfree(caldata);
>   kfree(data);
>  
>   return -EINVAL;

I don't think we should free data at all:

static int ath10k_download_cal_eeprom(struct ath10k *ar)
{
size_t data_len;
void *data = NULL;
int ret;

ret = ath10k_hif_fetch_cal_eeprom(ar, , _len);

Instead we should free only caldata, right?

-- 
Kalle Valo

Re: [PATCH net] bonding: Fix bonding crash

2016-09-02 Thread Eric Dumazet

On Fri, 2016-09-02 at 08:21 -0700, Eric Dumazet wrote:

> Ideally the netdev_rx_handler_register() should be called only when no
> further errors can be detected in the 'enslaving' process, otherwise
> some live packets could come and be incorrectly processed/dropped by a
> not fully initialized driver.
> 
> So it is a chicken and egg problem, if you allow
> netdev_rx_handler_register() to return an error, while it is so easy
> to early check what could go wrong with it.

One final point is that this patch is easy to backport to stable
versions.

bond_enslave() is a monster, and changing the rollback chain is likely
to cause backport merge conflicts.

Re: [PATCH net-next 3/3] net: dsa: mv88e6xxx: make global2 code optional

2016-09-02 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

> On Fri, Sep 02, 2016 at 08:08:19AM -0400, Vivien Didelot wrote:
>> Since not every chip has a Global2 set of registers, make its support
>> optional, in which case the related functions will return -EOPNOTSUPP.
>> 
>> This also allows to reduce the size of the mv88e6xxx driver for devices
>> such as home routers embedding Ethernet chips without Global2 support.
>
> I've no problems with splitting this out.

Yes, I think it is important to split support for independent internal
SMI devices (Globals, PHY, TCAM, ...).

> However, making it optional? I don't think that is a good idea.
>
> 1) If you don't have this, and you should, it looks like the PHYs stop
> working, and since you cannot set the switch MAC address, maybe Pause
> frames are broken.
>
> 2) Distros won't build a kernel per target hardware. They probably
> build a kernel per SoC vendor. That means, they will have this
> optional code built all the time. Very few builds will leave it out.
>
> So the only way i think making this optional, is if it is a kernel
> module, and it gets loaded if needed.

Your points are correct. But unless I'm mistaken, I purposely default
this suppor to 'y'. Distros will compile everything, which is good.

Only people who know their chips and what they are doing will eventually
go there and disable the support for some devices.

I plan to merge mv88e6060.c into the mv88e6xxx driver, these chips don't
have Global2. And we will soon add support for TCAM, which is huge. The
majority of devices supported by mv88e6xxx don't have a TCAM device.

So my point is that I'd like to disable support for unnecessary internal
devices, even if it doesn't make much difference on our huge dev boards,
but it does make a difference for the many embedded devices out there
(think home routers) with some 88E6060, 88E6176 or 88E6185 chips.

What do you think?

Thanks,

Vivien

Re: [PATCH net] bonding: Fix bonding crash

2016-09-02 Thread Eric Dumazet

On Fri, 2016-09-02 at 08:57 -0600, David Ahern wrote:

> I hit this same problem yesterday but with the bridge. I forgot I had
> a macvlan device on an interface and tried to enslave it to a bridge.
> It failed with EBUSY without crashing the kernel so it is one example
> that handles the conflict, and the bridge also calls the register
> before the enslaving is done.
> 

Sure, some netdev_rx_handler_register() users are simpler than bonding,
this was the point Jiri raised. Hey guys just use team instead of
bonding ;)

If you carefully look at bridge code, you'll find other kind of errors
for sure ;)

For example, should dev_disable_lro() be called before or after
netdev_rx_handler_register()  ? ;)

Mahesh proposal allows for simplification, since one level can be
removed from the error rollback chain.

Ideally the netdev_rx_handler_register() should be called only when no
further errors can be detected in the 'enslaving' process, otherwise
some live packets could come and be incorrectly processed/dropped by a
not fully initialized driver.

So it is a chicken and egg problem, if you allow
netdev_rx_handler_register() to return an error, while it is so easy
to early check what could go wrong with it.

Re: [PATCH net] bonding: Fix bonding crash

2016-09-02 Thread David Ahern

On 9/2/16 8:45 AM, Eric Dumazet wrote:
> On Fri, 2016-09-02 at 08:30 -0600, David Ahern wrote:
> 
>> This check duplicates what netdev_rx_handler_register does. Why not
>> move the call to netdev_rx_handler_register here and then call
>> unregister on failure paths?
> 
> As soon as you call netdev_rx_handler_register(), incoming packets will
> hit your driver and we'll likely crash since the enslaving is not done
> yet.
> 
> Really think about RCU, we do rcu_assign_pointer() because we want all
> prior changes being committed to memory before 'enabling' readers to see
> the updated rcu protected pointer.
> 
> There are 9 call sites where netdev_rx_handler_register() is used.
> 
> We can get rid of the extra check in netdev_rx_handler_register() once
> all of them are using netdev_is_rx_handler_busy()
> 
> Since this patch takes care of bonding only, we need to keep the
> existing check in netdev_rx_handler_register()
> 
> Anyway, we are speaking of control function, an extra check is simply
> safer, like all the ASSERT_RTNL() we do have...
> 

I hit this same problem yesterday but with the bridge. I forgot I had a macvlan 
device on an interface and tried to enslave it to a bridge. It failed with 
EBUSY without crashing the kernel so it is one example that handles the 
conflict, and the bridge also calls the register before the enslaving is done.

Re: [PATCH net-next 3/3] net: dsa: mv88e6xxx: make global2 code optional

2016-09-02 Thread Andrew Lunn

On Fri, Sep 02, 2016 at 08:08:19AM -0400, Vivien Didelot wrote:
> Since not every chip has a Global2 set of registers, make its support
> optional, in which case the related functions will return -EOPNOTSUPP.
> 
> This also allows to reduce the size of the mv88e6xxx driver for devices
> such as home routers embedding Ethernet chips without Global2 support.

Hi Vivien

I've no problems with splitting this out.

However, making it optional? I don't think that is a good idea.

1) If you don't have this, and you should, it looks like the PHYs stop
working, and since you cannot set the switch MAC address, maybe Pause
frames are broken.

2) Distros won't build a kernel per target hardware. They probably
build a kernel per SoC vendor. That means, they will have this
optional code built all the time. Very few builds will leave it out.

So the only way i think making this optional, is if it is a kernel
module, and it gets loaded if needed.

Andrew

Re: [PATCH net] bonding: Fix bonding crash

2016-09-02 Thread Eric Dumazet

On Fri, 2016-09-02 at 08:30 -0600, David Ahern wrote:

> This check duplicates what netdev_rx_handler_register does. Why not
> move the call to netdev_rx_handler_register here and then call
> unregister on failure paths?

As soon as you call netdev_rx_handler_register(), incoming packets will
hit your driver and we'll likely crash since the enslaving is not done
yet.

Really think about RCU, we do rcu_assign_pointer() because we want all
prior changes being committed to memory before 'enabling' readers to see
the updated rcu protected pointer.

There are 9 call sites where netdev_rx_handler_register() is used.

We can get rid of the extra check in netdev_rx_handler_register() once
all of them are using netdev_is_rx_handler_busy()

Since this patch takes care of bonding only, we need to keep the
existing check in netdev_rx_handler_register()

Anyway, we are speaking of control function, an extra check is simply
safer, like all the ASSERT_RTNL() we do have...

1 2 >

1 - 100 of 172 matches

Mail list logo