date:20170523

Re: [Patch net-next] net_sched: only create filter chains for new filters/actions

2017-05-23 Thread Jiri Pirko

Tue, May 23, 2017 at 06:42:37PM CEST, xiyou.wangc...@gmail.com wrote:
>tcf_chain_get() always creates a new filter chain if not found
>in existing ones. This is totally unnecessary when we get or
>delete filters, new chain should be only created for new filters
>(or new actions).
>
>Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
>Cc: Jamal Hadi Salim 
>Cc: Jiri Pirko 
>Signed-off-by: Cong Wang 
>---
> include/net/pkt_cls.h |  3 ++-
> net/sched/act_api.c   |  2 +-
> net/sched/cls_api.c   | 13 +
> 3 files changed, 12 insertions(+), 6 deletions(-)
>
>diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
>index 2c213a6..f776229 100644
>--- a/include/net/pkt_cls.h
>+++ b/include/net/pkt_cls.h
>@@ -18,7 +18,8 @@ int register_tcf_proto_ops(struct tcf_proto_ops *ops);
> int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
> 
> #ifdef CONFIG_NET_CLS
>-struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index);
>+struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
>+  bool create);
> void tcf_chain_put(struct tcf_chain *chain);
> int tcf_block_get(struct tcf_block **p_block,
> struct tcf_proto __rcu **p_filter_chain);
>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>index 0ecf2a8..aed6cf2 100644
>--- a/net/sched/act_api.c
>+++ b/net/sched/act_api.c
>@@ -34,7 +34,7 @@ static int tcf_action_goto_chain_init(struct tc_action *a, struct tcf_proto *tp)
> 
>   if (!tp)
>   return -EINVAL;
>-  a->goto_chain = tcf_chain_get(tp->chain->block, chain_index);
>+  a->goto_chain = tcf_chain_get(tp->chain->block, chain_index, true);
>   if (!a->goto_chain)
>   return -ENOMEM;
>   return 0;
>diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
>index 01a8b8b..23d2236 100644
>--- a/net/sched/cls_api.c
>+++ b/net/sched/cls_api.c
>@@ -220,7 +220,8 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
>   kfree(chain);
> }
> 
>-struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index)
>+struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
>+  bool create)
> {
>   struct tcf_chain *chain;
> 
>@@ -230,7 +231,10 @@ struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index)
>   return chain;
>   }
>   }
>-  return tcf_chain_create(block, chain_index);
>+  if (create)
>+  return tcf_chain_create(block, chain_index);
>+  else
>+  return NULL;
> }
> EXPORT_SYMBOL(tcf_chain_get);
> 
>@@ -509,9 +513,10 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
>   err = -EINVAL;
>   goto errout;
>   }
>-  chain = tcf_chain_get(block, chain_index);
>+  chain = tcf_chain_get(block, chain_index,
>+n->nlmsg_type == RTM_NEWTFILTER);

First of all, I really hate all these true/false arg dances. Totally
confusing all the time.



>   if (!chain) {
>-  err = -ENOMEM;
>+  err = n->nlmsg_type == RTM_NEWTFILTER ? -ENOMEM : -EINVAL;

Confusing. Please do not obfuscate the code for a corner case. Thanks.



>   goto errout;
>   }
> 
>-- 
>2.5.5
>
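
One way to avoid the boolean argument objected to above is to split the lookup from the creating path. The following is a minimal userspace sketch of that idea, not the kernel code: struct chain, struct block, chain_lookup() and chain_get_or_create() are hypothetical names invented for illustration, and locking is left out entirely.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for tcf_chain / tcf_block, for illustration only. */
struct chain {
	struct chain *next;
	uint32_t index;
	unsigned int refcnt;
};

struct block {
	struct chain *chains;
};

/* Find an existing chain and take a reference; never allocates. */
struct chain *chain_lookup(struct block *b, uint32_t index)
{
	struct chain *c;

	for (c = b->chains; c; c = c->next) {
		if (c->index == index) {
			c->refcnt++;
			return c;
		}
	}
	return NULL;
}

/* Find a chain or allocate it; returns NULL only on allocation failure. */
struct chain *chain_get_or_create(struct block *b, uint32_t index)
{
	struct chain *c = chain_lookup(b, index);

	if (c)
		return c;
	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->index = index;
	c->refcnt = 1;
	c->next = b->chains;
	b->chains = c;
	return c;
}
```

With two entry points, the get/delete path would call the lookup variant and report "not found" on NULL, while the new-filter path would call the creating variant and report -ENOMEM on NULL, so neither call site needs a flag or a conditional error code.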

RE: [PATCH 1/2] net: phy: Update get_phy_c45_ids for Cortina PHYs

2017-05-23 Thread Bogdan Purcareata

> -----Original Message-----
> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> Sent: Tuesday, May 23, 2017 9:12 PM
> To: Andrew Lunn ; Bogdan Purcareata
> 
> Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 1/2] net: phy: Update get_phy_c45_ids for Cortina PHYs
> 
> On 05/23/2017 09:55 AM, Andrew Lunn wrote:
> >> The patches mentioned in the commit message add _some_ support for
> >> the Cortina PHYs - mainly checking for devices at additional
> >> locations. Once they are found, the phy IDs must be read from custom
> >> locations.
> >
> > As a general principle, we don't add hacks in generic code to handle
> > broken devices. We add generic mechanisms to work around the
> > brokenness.
> >
> > In this case, by using ethernet-phy-id in the device tree, we are
> > saying, this PHY's probing is totally borked, but we know it is there,
> > at this address. Just load the driver.
> >
> > Please try to make ethernet-phy-id work.
> 
> What Andrew is suggesting is to leverage the code in
> drivers/of/of_mdio.c which does the following:
> 
>is_c45 = of_device_is_compatible(child,
>  "ethernet-phy-ieee802.3-c45");
> 
> if (!is_c45 && !of_get_phy_id(child, &phy_id))
> phy = phy_device_create(mdio, addr, phy_id, 0, NULL);
> else
> phy = get_phy_device(mdio, addr, is_c45);
> if (IS_ERR(phy))
> return;
> 
> If you know the PHY ID, and you put it in the PHY node's compatible
> string (in the format that of_get_phy_id() expects), and you also
> did not add "ethernet-phy-ieee802.3-c45", then the PHY library will
> directly create the PHY device, with the designated ID, at the specific
> address.
> 
> While this works for clause 22 PHYs, I don't know if it also does for
> clause 45 PHYs, but as Andrew is suggesting, I would be more inclined
> to make this scheme work for all types of PHYs (22 or 45), rather than
> hacking the core code that tries to identify devices in packages.
> 
> Can you give it a spin?

Sure. My first thought was to latch onto the codepath that had some mention of 
Cortina PHYs.

I will check of_get_phy_id() and see if it does the job.

Thank you!
Bogdan
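
For reference, the compatible string that of_get_phy_id() looks for has the form "ethernet-phy-idAAAA.BBBB", where the two hex fields are the upper and lower 16 bits of the PHY ID. The following is a userspace sketch of that parsing under that assumption; parse_phy_id() is a hypothetical helper, not the kernel function, and the ID in the usage note is only an example value.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "ethernet-phy-idAAAA.BBBB" into a 32-bit PHY ID.
 * Returns 0 on success, -1 if the string does not match the format.
 * parse_phy_id() is a hypothetical name used for illustration.
 */
int parse_phy_id(const char *compat, uint32_t *phy_id)
{
	unsigned int upper, lower;

	if (sscanf(compat, "ethernet-phy-id%4x.%4x", &upper, &lower) != 2)
		return -1;
	*phy_id = ((upper & 0xffff) << 16) | (lower & 0xffff);
	return 0;
}
```

Passing "ethernet-phy-id0141.0e90" yields 0x01410e90, while "ethernet-phy-ieee802.3-c45" is rejected, which mirrors the two branches in the of_mdio.c snippet quoted above.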

Re: Why max netlink msg size is limited to 16k

2017-05-23 Thread prashantkumar dhotre

Thanks.
Can I use the NLM_F_MULTI netlink multipart message flag to send a large buffer?
I could not find an example of that usage on the sending side.

I have a kernel module and am trying to send a large buffer to a user app.
If I can use that flag, will the kernel take care of splitting the large
buffer into small chunks, and will the kernel also send an NLMSG_DONE message
at the end to the user app?
How should I handle this in the user app?
Can you point me to a few examples, please?

On Sat, Apr 22, 2017 at 9:06 PM, Eric Dumazet  wrote:
> On Sat, 2017-04-22 at 19:43 +0530, prashantkumar dhotre wrote:
>> I am observing that max netlink msg that my kernel module can send to
>> user app is close to 16K.
>>
>> For larger sizes, genlmsg_unicast() succeeds but my app does not receive
>> data.
>>
>> I have tried increasing RECV buffer size in my user app but that does not
>> help.
>>
>> Regards
>
>
> You need a kernel >= linux-4.9 to get about 32KB
>
> Why is this limited ? Please read
>
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=d35c99ff77ecb2eb239731b799386f3b3637a31e
>
>
>
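
On the receiving side, a multipart stream is usually handled by walking each receive buffer until a message of type NLMSG_DONE shows up. As far as netlink(7) describes it, the sender marks each part with NLM_F_MULTI and terminates the sequence itself; the sketch below assumes that, and consume_multipart() is a hypothetical helper name. It only counts payload bytes and does no socket I/O.

```c
#include <linux/netlink.h>
#include <stddef.h>

/*
 * Walk one receive buffer of netlink messages.
 * Adds the payload size of every data message to *total.
 * Returns 1 once NLMSG_DONE is seen, 0 if more buffers are expected,
 * and -1 on NLMSG_ERROR.
 * consume_multipart() is a hypothetical helper for illustration.
 */
int consume_multipart(void *buf, int len, size_t *total)
{
	struct nlmsghdr *nlh;

	for (nlh = buf; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
		if (nlh->nlmsg_type == NLMSG_DONE)
			return 1;
		if (nlh->nlmsg_type == NLMSG_ERROR)
			return -1;
		*total += NLMSG_PAYLOAD(nlh, 0);
	}
	return 0;
}
```

A user app would call recv() in a loop, pass each buffer to a function like this, and stop once it returns nonzero.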

Re: [patch net-next v2 7/8] mlxsw: spectrum: Validate firmware revision on init

2017-05-23 Thread Ido Schimmel

On Tue, May 23, 2017 at 09:56:29PM +0200, Jiri Pirko wrote:
> From: Yotam Gigi 
> 
> Make the spectrum module check the current device firmware version, and if
> it is below the supported version, use the libfirmware API to request a
> firmware file with the supported firmware version and flash it to the
> device using the mlxfw module.
> 
> The firmware files are expected to be in Mellanox Firmware Archive
> version 2 (MFA2) format and their names are expected to follow the
> pattern: "mlxsw_spectrum-...mfa2".
> 
> Signed-off-by: Yotam Gigi 
> Signed-off-by: Jiri Pirko 

Reviewed-by: Ido Schimmel

Re: [PATCH] rhashtable: Fix missing elements when inserting.

2017-05-23 Thread Herbert Xu

Taehee Yoo  wrote:
> rhltable_insert_key() inserts a node into a list of elements
> if the node's key is duplicated, so that it becomes part of a chain of
> elements (also known as rhead). Also, the bucket table points to that element directly.
> If an inserted node's element chain is located third,
> rhltable misses the first and second element chains.
> This issue causes rhltable_remove() to fail.
> 
> After this patch, rhltable_insert_key() inserts a node at the second position of
> the element's list, so that rhlist does not miss elements.
> 
> Signed-off-by: Taehee Yoo 

I'm sorry but I don't understand your description of the problem.
The new duplicate object is always inserted at the head of the
list and therefore replaces the previous list head in the hash
bucket chain.

Do you have some code that reproduces the problem?

> diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
> index 7d56a7e..d3c24b9 100644
> --- a/include/linux/rhashtable.h
> +++ b/include/linux/rhashtable.h
> @@ -762,11 +762,9 @@ static inline void *__rhashtable_insert_fast(
>list = container_of(obj, struct rhlist_head, rhead);
>plist = container_of(head, struct rhlist_head, rhead);
> 
> -   RCU_INIT_POINTER(list->next, plist);
> -   head = rht_dereference_bucket(head->next, tbl, hash);
> -   RCU_INIT_POINTER(list->rhead.next, head);
> -   rcu_assign_pointer(*pprev, obj);
> -
> +   RCU_INIT_POINTER(list->next, rht_dereference_bucket(plist->next,
> +   tbl, hash));
> +   RCU_INIT_POINTER(plist->next, list);

Your second RCU_INIT_POINTER needs to be rcu_assign_pointer.

Your approach of retaining the first duplicate object as the head
of the list should work too but I don't really see any point in
changing this.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

RE: [PATCH v2] net: fec: add post PHY reset delay DT property

2017-05-23 Thread Andy Duan

From: Quentin Schulz  Sent: Tuesday, May 23, 2017 5:48 PM
>Some PHYs require a short wait after the reset GPIO has been toggled. This
>adds support for the DT property `phy-reset-post-delay` which gives the delay
>in milliseconds to wait after reset.
>
>If the DT property is not given, no delay is observed. Post-reset delays greater
>than 1000ms are invalid.
>
>Signed-off-by: Quentin Schulz 
>---
>
>v2:
>  - return -EINVAL when phy-reset-post-delay is greater than 1000ms
>  instead of defaulting to 1ms,
>  - remove `default to 1ms` when phy-reset-post-delay > 1000ms from DT
>  binding doc and commit log,
>  - move phy-reset-post-delay property reading before
>  devm_gpio_request_one(),
>
> Documentation/devicetree/bindings/net/fsl-fec.txt |  4 
> drivers/net/ethernet/freescale/fec_main.c | 16 +++-
> 2 files changed, 19 insertions(+), 1 deletion(-)
>
>diff --git a/Documentation/devicetree/bindings/net/fsl-fec.txt
>b/Documentation/devicetree/bindings/net/fsl-fec.txt
>index a1e3693cca16..6f55bdd52f8a 100644
>--- a/Documentation/devicetree/bindings/net/fsl-fec.txt
>+++ b/Documentation/devicetree/bindings/net/fsl-fec.txt
>@@ -15,6 +15,10 @@ Optional properties:
> - phy-reset-active-high : If present then the reset sequence using the GPIO
>   specified in the "phy-reset-gpios" property is reversed (H=reset state,
>   L=operation state).
>+- phy-reset-post-delay : Post reset delay in milliseconds. If present then
>+  a delay of phy-reset-post-delay milliseconds will be observed after the
>+  phy-reset-gpios has been toggled. Can be omitted thus no delay is
>+  observed. Delay is in range of 1ms to 1000ms. Other delays are invalid.
> - phy-supply : regulator that powers the Ethernet PHY.
> - phy-handle : phandle to the PHY device connected to this device.
> - fixed-link : Assume a fixed link. See fixed-link.txt in the same directory.
>diff --git a/drivers/net/ethernet/freescale/fec_main.c
>b/drivers/net/ethernet/freescale/fec_main.c
>index 56a563f90b0b..f7c8649fd28f 100644
>--- a/drivers/net/ethernet/freescale/fec_main.c
>+++ b/drivers/net/ethernet/freescale/fec_main.c
>@@ -3192,7 +3192,7 @@ static int fec_reset_phy(struct platform_device *pdev)
> {
>   int err, phy_reset;
>   bool active_high = false;
>-  int msec = 1;
>+  int msec = 1, phy_post_delay = 0;
>   struct device_node *np = pdev->dev.of_node;
>
>   if (!np)
>@@ -3209,6 +3209,11 @@ static int fec_reset_phy(struct platform_device *pdev)
>   else if (!gpio_is_valid(phy_reset))
>   return 0;
>
>+  err = of_property_read_u32(np, "phy-reset-post-delay", &phy_post_delay);
>+  /* valid reset duration should be less than 1s */
>+  if (!err && phy_post_delay > 1000)
>+  return -EINVAL;
>+
>   active_high = of_property_read_bool(np, "phy-reset-active-high");
>
>   err = devm_gpio_request_one(&pdev->dev, phy_reset,
>@@ -3226,6 +3231,15 @@ static int fec_reset_phy(struct platform_device *pdev)
>
>   gpio_set_value_cansleep(phy_reset, !active_high);
>
>+  if (!phy_post_delay)
>+  return 0;
>+
>+  if (phy_post_delay > 20)
>+  msleep(phy_post_delay);
>+  else
>+  usleep_range(phy_post_delay * 1000,
>+   phy_post_delay * 1000 + 1000);
>+
>   return 0;
> }
> #else /* CONFIG_OF */
>--
>2.11.0

It looks fine.
Acked-by: Fugang Duan

[PATCH] rhashtable: Fix missing elements when inserting.

2017-05-23 Thread Taehee Yoo

rhltable_insert_key() inserts a node into a list of elements
if the node's key is duplicated, so that it becomes part of a chain of
elements (also known as rhead). Also, the bucket table points to that element directly.
If an inserted node's element chain is located third,
rhltable misses the first and second element chains.
This issue causes rhltable_remove() to fail.

After this patch, rhltable_insert_key() inserts a node at the second position of
the element's list, so that rhlist does not miss elements.

Signed-off-by: Taehee Yoo 
---
 include/linux/rhashtable.h | 8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 7d56a7e..d3c24b9 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -762,11 +762,9 @@ static inline void *__rhashtable_insert_fast(
list = container_of(obj, struct rhlist_head, rhead);
plist = container_of(head, struct rhlist_head, rhead);
 
-   RCU_INIT_POINTER(list->next, plist);
-   head = rht_dereference_bucket(head->next, tbl, hash);
-   RCU_INIT_POINTER(list->rhead.next, head);
-   rcu_assign_pointer(*pprev, obj);
-
+   RCU_INIT_POINTER(list->next, rht_dereference_bucket(plist->next,
+   tbl, hash));
+   RCU_INIT_POINTER(plist->next, list);
goto good;
}
 
-- 
2.9.3

Re: [PATCH V3 net 1/3] vlan: Fix tcp checksum offloads in Q-in-Q vlans

2017-05-23 Thread Toshiaki Makita

On 2017/05/24 2:38, Vladislav Yasevich wrote:
> It appears that TCP checksum offloading has been broken for
> Q-in-Q vlans.  The behavior was exacerbated by the
> series
> commit afb0bc972b52 ("Merge branch 'stacked_vlan_tso'")
> that enabled acceleration features on stacked vlans.
> 
> However, even without that series, it is possible to trigger
> this issue.  It just requires a lot more specialized configuration.
> 
> The root cause is the interaction between how
> netdev_intersect_features() works, the features actually set on
> the vlan devices and HW having the ability to run checksum with
> longer headers.
> 
> The issue starts when netdev_intersect_features() replaces
> NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
> if the HW advertises IP|IPV6 specific checksums.  This happens
> for tagged and multi-tagged packets.   However, HW that enables
> IP|IPV6 checksum offloading doesn't guarantee that packets with
> arbitrarily long headers can be checksummed.
> 
> This patch disables IP|IPV6 checksums on the packet for multi-tagged
> packets.
> 
> CC: Toshiaki Makita 
> CC: Michal Kubecek 
> Signed-off-by: Vladislav Yasevich 
> ---

Thank you for fixing it.
Acked-by: Toshiaki Makita 

Toshiaki Makita

[PATCH] net: jme: Remove unused functions

2017-05-23 Thread Matthias Kaehlcke

The functions jme_restart_tx_engine(), jme_pause_rx() and
jme_resume_rx() are not used. Removing them fixes the following warnings
when building with clang:

drivers/net/ethernet/jme.c:694:1: error: unused function
'jme_restart_tx_engine' [-Werror,-Wunused-function]

drivers/net/ethernet/jme.c:2393:20: error: unused function
'jme_pause_rx' [-Werror,-Wunused-function]

drivers/net/ethernet/jme.c:2406:20: error: unused function
'jme_resume_rx' [-Werror,-Wunused-function]

Signed-off-by: Matthias Kaehlcke 
---
 drivers/net/ethernet/jme.c | 42 --
 1 file changed, 42 deletions(-)

diff --git a/drivers/net/ethernet/jme.c b/drivers/net/ethernet/jme.c
index f580b49e6b67..0e5083a48937 100644
--- a/drivers/net/ethernet/jme.c
+++ b/drivers/net/ethernet/jme.c
@@ -691,17 +691,6 @@ jme_enable_tx_engine(struct jme_adapter *jme)
 }
 
 static inline void
-jme_restart_tx_engine(struct jme_adapter *jme)
-{
-   /*
-* Restart TX Engine
-*/
-   jwrite32(jme, JME_TXCS, jme->reg_txcs |
-   TXCS_SELECT_QUEUE0 |
-   TXCS_ENABLE);
-}
-
-static inline void
 jme_disable_tx_engine(struct jme_adapter *jme)
 {
int i;
@@ -2382,37 +2371,6 @@ jme_tx_timeout(struct net_device *netdev)
jme_reset_link(jme);
 }
 
-static inline void jme_pause_rx(struct jme_adapter *jme)
-{
-   atomic_dec(&jme->link_changing);
-
-   jme_set_rx_pcc(jme, PCC_OFF);
-   if (test_bit(JME_FLAG_POLL, &jme->flags)) {
-   JME_NAPI_DISABLE(jme);
-   } else {
-   tasklet_disable(&jme->rxclean_task);
-   tasklet_disable(&jme->rxempty_task);
-   }
-}
-
-static inline void jme_resume_rx(struct jme_adapter *jme)
-{
-   struct dynpcc_info *dpi = &(jme->dpi);
-
-   if (test_bit(JME_FLAG_POLL, &jme->flags)) {
-   JME_NAPI_ENABLE(jme);
-   } else {
-   tasklet_enable(&jme->rxclean_task);
-   tasklet_enable(&jme->rxempty_task);
-   }
-   dpi->cur= PCC_P1;
-   dpi->attempt= PCC_P1;
-   dpi->cnt= 0;
-   jme_set_rx_pcc(jme, PCC_P1);
-
-   atomic_inc(&jme->link_changing);
-}
-
 static void
 jme_get_drvinfo(struct net_device *netdev,
 struct ethtool_drvinfo *info)
-- 
2.13.0.219.gdb65acc882-goog

pull request: bluetooth-next 2017-05-23

2017-05-23 Thread Johan Hedberg

Hi Dave,

Here's the first Bluetooth & 802.15.4 pull request targeting the 4.13
kernel release.

 - Bluetooth 5.0 improvements (Data Length Extensions and alternate PHY)
 - Support for new Intel Bluetooth adapter [8087:0aaa]
 - Various fixes to ieee802154 code
 - Various fixes to HCI UART code

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit 1fc4d180b3c6bed0e7f5160bcd553aec89594962:

  Merge branch 'phy-marvell-cleanups' (2017-05-17 16:27:52 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream

for you to fetch changes up to 7dab5467647be42736dcabcd5d035c7b571f4653:

  net: ieee802154: fix potential null pointer dereference (2017-05-23 20:12:53 +0200)


Alexander Aring (1):
  MAINTAINERS: update my mail address

Dean Jenkins (1):
  Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races

Guodong Xu (1):
  Bluetooth: hci_ll: Fix download_firmware() return when __hci_cmd_sync fails

Gustavo A. R. Silva (1):
  net: ieee802154: fix potential null pointer dereference

Jürg Billeter (1):
  Bluetooth: btintel: Add MODULE_FIRMWARE entries for iBT 3.5 controllers

Lin Zhang (2):
  net: ieee802154: remove explicit set skb->sk
  net: ieee802154: fix net_device reference release too early

Loic Poulain (1):
  Bluetooth: btwilink: Fix unexpected skb free

Marcel Holtmann (5):
  Bluetooth: Set LE Suggested Default Data Length to maximum
  Bluetooth: Enable LE Channel Selection Algorithm event
  Bluetooth: Enable LE PHY Update Complete event
  Bluetooth: Set LE Default PHY preferences
  Bluetooth: Skip vendor diagnostic configuration for HCI User Channel

Markus Elfring (3):
  Bluetooth: Delete error messages for failed memory allocations in two functions
  ieee802154: ca8210: Delete an error message for a failed memory allocation in ca8210_probe()
  ieee802154: ca8210: Delete an error message for a failed memory allocation in ca8210_skb_rx()

Tedd Ho-Jeong An (1):
  Bluetooth: Add support for Intel Bluetooth device 9460/9560 [8087:0aaa]

Tobias Regnery (2):
  Bluetooth: hci_uart: fix kconfig dependency
  Bluetooth: hci_nokia: select BT_HCIUART_H4

 MAINTAINERS |  4 ++--
 drivers/bluetooth/Kconfig   |  3 ++-
 drivers/bluetooth/btintel.c |  2 ++
 drivers/bluetooth/btusb.c   |  4 
 drivers/bluetooth/btwilink.c|  1 -
 drivers/bluetooth/hci_ldisc.c   | 40 ++-
 drivers/bluetooth/hci_ll.c  |  1 +
 drivers/bluetooth/hci_uart.h|  1 +
 drivers/net/ieee802154/ca8210.c | 12 ---
 include/net/bluetooth/hci.h |  8 +++
 net/bluetooth/ecdh_helper.c | 11 +++---
 net/bluetooth/hci_core.c| 46 -
 net/ieee802154/socket.c | 10 -
 13 files changed, 107 insertions(+), 36 deletions(-)



Re: Alignment in BPF verifier

2017-05-23 Thread Alexei Starovoitov


On 5/23/17 10:43 AM, Edward Cree wrote:

Another issue: it looks like the min/max_value handling for subtraction is
 bogus.  In adjust_reg_min_max_vals() we have
if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
dst_reg->min_value -= min_val;
if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
dst_reg->max_value -= max_val;
 where min_val and max_val refer to the src_reg.
But surely they should be used the other way round; if (say) 2 <= R1 <= 6
 and 1 <= R2 <= 4, then this will claim 1 <= (R1 - R2) <= 2, whereas really
 (R1 - R2) could be anything from -2 to 5.
This also means that the code just above the switch,
if (min_val == BPF_REGISTER_MIN_RANGE)
dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
if (max_val == BPF_REGISTER_MAX_RANGE)
dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
 is wrong, since e.g. subtracting MAX_RANGE needs to blow our min_value,
 not our max_value.


right. good catch. I have a feeling we discussed a similar thing before.
Maybe some patch fell through the cracks.
That's the reason the fancy verifier analysis is root only.
I'm assuming you're going to send a fix?
Thanks!
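
The arithmetic in the example above can be checked with a small interval-subtraction sketch. This is only an illustration of the bound rule (new min = dst min - src max, new max = dst max - src min), not the verifier's code; struct range and range_sub() are hypothetical names, and overflow and the MIN_RANGE/MAX_RANGE sentinels are deliberately left out.

```c
#include <stdint.h>

/* Closed interval [min, max]; a hypothetical type for illustration. */
struct range {
	int64_t min;
	int64_t max;
};

/* Bounds of (dst - src); overflow handling is omitted for brevity. */
struct range range_sub(struct range dst, struct range src)
{
	struct range r;

	r.min = dst.min - src.max;	/* smallest dst minus largest src */
	r.max = dst.max - src.min;	/* largest dst minus smallest src */
	return r;
}
```

For R1 in [2, 6] and R2 in [1, 4] this gives [-2, 5], matching the range stated in the report, whereas subtracting min from min and max from max gives the too-narrow [1, 2].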

[PATCH] net: fix potential null pointer dereference

2017-05-23 Thread Gustavo A. R. Silva

Add null check to avoid a potential null pointer dereference.

Addresses-Coverity-ID: 1408831
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/gtp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 4fea1b3..7b652bb 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -873,7 +873,7 @@ static struct gtp_dev *gtp_find_dev(struct net *src_net, struct nlattr *nla[])
 
/* Check if there's an existing gtpX device to configure */
dev = dev_get_by_index_rcu(net, nla_get_u32(nla[GTPA_LINK]));
-   if (dev->netdev_ops == &gtp_netdev_ops)
+   if (dev && dev->netdev_ops == &gtp_netdev_ops)
gtp = netdev_priv(dev);
 
put_net(net);
-- 
2.5.0

[patch iproute2] ipvtap: Adding support for ipvtap device management

2017-05-23 Thread Sainath Grandhi

This patch adds support for managing ipvtap devices using ip link. ipvtap
support was added to Linux with commit 235a9d89da976e2975b3de9afc0bed7b72557983.
---
 ip/iplink_ipvlan.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/ip/iplink_ipvlan.c b/ip/iplink_ipvlan.c
index f7735f3..153aa2f 100644
--- a/ip/iplink_ipvlan.c
+++ b/ip/iplink_ipvlan.c
@@ -1,4 +1,4 @@
-/* iplink_ipvlan.c IPVLAN device support
+/* iplink_ipvlan.c IPVLAN/IPVTAP device support
  *
  *  This program is free software; you can redistribute it and/or
  *  modify it under the terms of the GNU General Public License
@@ -90,3 +90,11 @@ struct link_util ipvlan_link_util = {
.print_opt  = ipvlan_print_opt,
.print_help = ipvlan_print_help,
 };
+
+struct link_util ipvtap_link_util = {
+   .id = "ipvtap",
+   .maxattr= IFLA_IPVLAN_MAX,
+   .parse_opt  = ipvlan_parse_opt,
+   .print_opt  = ipvlan_print_opt,
+   .print_help = ipvlan_print_help,
+};
-- 
2.7.4

[PATCH net-next] geneve: fix fill_info when using collect_metadata

2017-05-23 Thread Eric Garver

Since 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") fill_info
does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
because it uses ip_tunnel_info_af() with the device level info, which is
not valid for COLLECT_METADATA.

Fix by checking for the presence of the actual sockets.

Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.")
Signed-off-by: Eric Garver 
---
 drivers/net/geneve.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c
index dec5d563ab19..959fd12d2e67 100644
--- a/drivers/net/geneve.c
+++ b/drivers/net/geneve.c
@@ -1293,7 +1293,7 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
if (nla_put_u32(skb, IFLA_GENEVE_ID, vni))
goto nla_put_failure;
 
-   if (ip_tunnel_info_af(info) == AF_INET) {
+   if (rtnl_dereference(geneve->sock4)) {
if (nla_put_in_addr(skb, IFLA_GENEVE_REMOTE,
info->key.u.ipv4.dst))
goto nla_put_failure;
@@ -1302,8 +1302,10 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
   !!(info->key.tun_flags & TUNNEL_CSUM)))
goto nla_put_failure;
 
+   }
+
 #if IS_ENABLED(CONFIG_IPV6)
-   } else {
+   if (rtnl_dereference(geneve->sock6)) {
if (nla_put_in6_addr(skb, IFLA_GENEVE_REMOTE6,
 &info->key.u.ipv6.dst))
goto nla_put_failure;
@@ -1315,8 +1317,8 @@ static int geneve_fill_info(struct sk_buff *skb, const struct net_device *dev)
if (nla_put_u8(skb, IFLA_GENEVE_UDP_ZERO_CSUM6_RX,
   !geneve->use_udp6_rx_checksums))
goto nla_put_failure;
-#endif
}
+#endif
 
if (nla_put_u8(skb, IFLA_GENEVE_TTL, info->key.ttl) ||
nla_put_u8(skb, IFLA_GENEVE_TOS, info->key.tos) ||
-- 
2.12.0

Re: [PATCH] net: phy: put genphy_config_init's EXPORT_SYMBOL directly after the function

2017-05-23 Thread Florian Fainelli

On 05/23/2017 03:26 PM, Uwe Kleine-König wrote:
> Commit af6b6967d6e1 ("net: phy: export genphy_config_init()") introduced
> this EXPORT_SYMBOL and put it after gen10g_soft_reset() instead of
> directly after genphy_config_init. Probably this happened when the patch
> was applied because http://patchwork.ozlabs.org/patch/339622/ looks ok.
> 
> Signed-off-by: Uwe Kleine-König 

Acked-by: Florian Fainelli 
-- 
Florian

Re: bond link state mismatch, rtnl_trylock() vs rtnl_lock()

2017-05-23 Thread Jay Vosburgh

Nithin Sujir  wrote:

>Hi,
>We're encountering a problem in 4.4 LTS where, rarely, the bond link state
>is not updated when the slave link changes.
>
>I've traced the issue to the arp monitor unable to get the rtnl lock. The
>sequence resulting in failure is as below.
>
>bond_loadbalance_arp_mon() periodically called, if slave link is _down_,
>it checks if the slave is sending/receiving packets. If it is, it sets
>flags to be processed later down the function for bond link
>update. However, it sets the slave->link right away.
>
>if (slave->link != BOND_LINK_UP) {
>if (bond_time_in_interval(bond, trans_start, 1) &&
>bond_time_in_interval(bond, slave->last_rx,
>1)) {
>
>slave->link  = BOND_LINK_UP;
>slave_state_changed = 1;
>
>
>Later down the function, it tries to get the rtnl_lock. If it doesn't get
>it, it rearms and returns.
>
>if (do_failover || slave_state_changed) {
>if (!rtnl_trylock())
>goto re_arm; <-- returns here
>
>if (slave_state_changed) {
>bond_slave_state_change(bond);
>
>This is the problem. The next time this function is called, the
>slave->link is already marked UP. And we will never update the bond link
>state to UP.

This looks like an ARP monitor version of

commit de77ecd4ef02ca783f7762e04e92b3d0964be66b
Author: Mahesh Bandewar 
Date:   Mon Mar 27 11:37:33 2017 -0700

bonding: improve link-status update in mii-monitoring

and probably needs a similar fix (possibly for both the
loadbalance and active-backup ARP monitor cases).

>Changing the rtnl_trylock() -> rtnl_lock() _does_ fix the issue.
>
>Is this the right way to fix it? If it is, I can submit this formally.

It's not the right way, unfortunately.

The reason for the rtnl_trylock is that there's a possible race
against bond_close() -> bond_work_cancel_all() trying to cancel the
arp_work workqueue item while it's running.  bond_close is called with
RTNL held, so if it has RTNL and is waiting for the work function to
complete, an rtnl_lock call here will deadlock.  Some of the trylock
calls in bonding are commented to this effect, but not this one.

-J

>What are the guidelines around using rtnl_lock() vs rtnl_trylock()? Some
>places are using rtnl_lock() and other rtnl_trylock(). Sorry, I couldn't
>find much via a google search or in Documentation/.
>
>Thanks,
>Nithin.
>
>
>
>diff --git a/drivers/net/bonding/bond_main.c
>b/drivers/net/bonding/bond_main.c
>index 5dca77e..1f60503 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2614,8 +2614,7 @@ static void bond_loadbalance_arp_mon(struct
>work_struct *work)
>rcu_read_unlock();
>
>if (do_failover || slave_state_changed) {
>-   if (!rtnl_trylock())
>-   goto re_arm;
>+   rtnl_lock();
>
>if (slave_state_changed) {
>bond_slave_state_change(bond);
>
>

---
-Jay Vosburgh, jay.vosbu...@canonical.com
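
The failure mode described above comes from committing slave->link before the lock is known to be available. One possible shape of a fix, sketched in userspace with pthreads, is to record the observed change as pending and commit it only once the trylock succeeds. This is an assumption-laden illustration, not the code from the commit cited above: struct monitor and monitor_commit() are hypothetical, and a pthread mutex stands in for RTNL.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical monitor state; big_lock stands in for RTNL. */
struct monitor {
	pthread_mutex_t *big_lock;
	bool link_up;		/* committed link state */
	bool change_pending;	/* observed but not yet committed */
};

/*
 * Returns true if there was nothing to do or the change was committed,
 * false if the lock was busy and the caller must re-arm and retry.
 */
bool monitor_commit(struct monitor *m, bool observed_up)
{
	if (observed_up != m->link_up)
		m->change_pending = true;
	if (!m->change_pending)
		return true;
	if (pthread_mutex_trylock(m->big_lock) != 0)
		return false;	/* keep change_pending set for the next run */
	m->link_up = observed_up;
	m->change_pending = false;
	pthread_mutex_unlock(m->big_lock);
	return true;
}
```

Because the committed state is untouched when the trylock fails, the next invocation still sees the mismatch and retries, and the work function never blocks on the lock, so the deadlock against the cancel path is avoided.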

[PATCH] net: phy: put genphy_config_init's EXPORT_SYMBOL directly after the function

2017-05-23 Thread Uwe Kleine-König

Commit af6b6967d6e1 ("net: phy: export genphy_config_init()") introduced
this EXPORT_SYMBOL and put it after gen10g_soft_reset() instead of
directly after genphy_config_init. Probably this happened when the patch
was applied because http://patchwork.ozlabs.org/patch/339622/ looks ok.

Signed-off-by: Uwe Kleine-König 
---
 drivers/net/phy/phy_device.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 1219eeab69d1..0780e9f9e167 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1571,13 +1571,13 @@ int genphy_config_init(struct phy_device *phydev)
 
return 0;
 }
+EXPORT_SYMBOL(genphy_config_init);
 
 static int gen10g_soft_reset(struct phy_device *phydev)
 {
/* Do nothing for now */
return 0;
 }
-EXPORT_SYMBOL(genphy_config_init);
 
 static int gen10g_config_init(struct phy_device *phydev)
 {
-- 
2.11.0

[PATCH net-next] tcp: better validation of received ack sequences

2017-05-23 Thread Eric Dumazet

From: Eric Dumazet 

Paul Fiterau Brostean reported :


Linux TCP stack we analyze exhibits behavior that seems odd to me.
The scenario is as follows (all packets have empty payloads, no window
scaling, rcv/snd window size should not be a factor):

   TEST HARNESS (CLIENT)LINUX SERVER

   1.  -  LISTEN (server listen,
then accepts)

   2.  - -->--> SYN-RECEIVED

   3.  - <--   <-- SYN-RECEIVED

   4.  - -->   --> ESTABLISHED
  
   5.  - <--   <-- FIN WAIT-1 (server
opts to close the data connection calling "close" on the connection
socket)

   6.  - -->  --> CLOSING (client sends
FIN,ACK with not yet sent acknowledgement number)

   7.  - <--   <-- CLOSING (ACK is 102
instead of 101, why?)
  
... (silence from CLIENT)

   8.  - <--   <-- CLOSING
(retransmission, again ACK is 102)


Now, note that packet 6, while having the expected sequence number,
acknowledges something that wasn't sent by the server. So I would expect
the packet to maybe prompt an ACK response from the server, and then be
ignored. Yet it is not ignored and actually leads to an increase of the
acknowledgement number in the server's retransmission of the FIN,ACK
packet. The explanation I found is that the FIN in packet 6 was
processed, despite the acknowledgement number being unacceptable.
Further experiments indeed show that the server processes this FIN,
transitioning to CLOSING, then on receiving an ACK for the FIN it had
sent in packet 5, the server (or better said connection) transitions
from CLOSING to TIME_WAIT (as signaled by netstat).



Indeed, tcp_rcv_state_process() calls tcp_ack() but
uses the @acceptable status only in TCP_SYN_RECV
state.

What we want here is to send a challenge ACK, if not in TCP_SYN_RECV
state. TCP_FIN_WAIT1 state is not the only state we should fix.

Add a FLAG_NO_CHALLENGE_ACK so that tcp_rcv_state_process()
can choose to send a challenge ACK and discard the packet instead
of wrongly changing socket state.

With help from Neal Cardwell.

Signed-off-by: Eric Dumazet 
Reported-by: Paul Fiterau Brostean 
Cc: Neal Cardwell 
Cc: Yuchung Cheng 
Cc: Soheil Hassas Yeganeh 
---
Note for Googlers : Google-Bug-Id: 37204158

 net/ipv4/tcp_input.c |   24 +++-
 1 file changed, 11 insertions(+), 13 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 2fa55f57ac06584bfd9b555799ceb3bbfb7e1b4e..c3bdcbcf544793ba410c618130586bf7d3963da6 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -112,6 +112,7 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
 #define FLAG_DSACKING_ACK  0x800 /* SACK blocks contained D-SACK info */
 #define FLAG_SACK_RENEGING 0x2000 /* snd_una advanced to a sacked seq */
 #define FLAG_UPDATE_TS_RECENT  0x4000 /* tcp_replace_ts_recent() */
+#define FLAG_NO_CHALLENGE_ACK  0x8000 /* do not call tcp_send_challenge_ack() */
 
 #define FLAG_ACKED (FLAG_DATA_ACKED|FLAG_SYN_ACKED)
 #define FLAG_NOT_DUP   (FLAG_DATA|FLAG_WIN_UPDATE|FLAG_ACKED)
@@ -3568,7 +3569,8 @@ static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
if (before(ack, prior_snd_una)) {
/* RFC 5961 5.2 [Blind Data Injection Attack].[Mitigation] */
if (before(ack, prior_snd_una - tp->max_window)) {
-   tcp_send_challenge_ack(sk, skb);
+   if (!(flag & FLAG_NO_CHALLENGE_ACK))
+   tcp_send_challenge_ack(sk, skb);
return -1;
}
goto old_ack;
@@ -5951,13 +5953,17 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 
/* step 5: check the ACK field */
acceptable = tcp_ack(sk, skb, FLAG_SLOWPATH |
- FLAG_UPDATE_TS_RECENT) > 0;
+ FLAG_UPDATE_TS_RECENT |
+ FLAG_NO_CHALLENGE_ACK) > 0;
 
+   if (!acceptable) {
+   if (sk->sk_state == TCP_SYN_RECV)
+   return 1;   /* send one RST */
+   tcp_send_challenge_ack(sk, skb);
+   goto discard;
+   }
switch (sk->sk_state) {
case TCP_SYN_RECV:
-   if (!acceptable)
-   return 1;
-
if (!tp->srtt_us)
tcp_synack_rtt_meas(sk, req);
 
@@ -6026,14 +6032,6 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 * our SYNACK so stop the SYNACK timer.
 */
if (req) {
-   /* Return RST if ack_seq is invalid.
-* Note that RFC793 only says to generate a
-* DUPACK for it but for TCP Fast Open it seems
-* better to treat this case like TCP_SYN_RECV
-* above.
-

Re: [PATCH net-next] net: dsa: support cross-chip ageing time

2017-05-23 Thread Andrew Lunn

On Tue, May 23, 2017 at 03:20:59PM -0400, Vivien Didelot wrote:
> Now that the switchdev bridge ageing time attribute is propagated to all
> switch chips of the fabric, each switch can check if the requested value
> is valid and program itself, so that the whole fabric shares a common
> ageing time setting.
> 
> This is especially needed for switch chips in between others, containing
> no bridge port members but evidently used in the data path.
> 
> To achieve that, remove the condition which skips the other switches. We
> also don't need to identify the target switch anymore, thus remove the
> sw_index member of the dsa_notifier_ageing_time_info notifier structure.
> 
> On ZII Dev Rev B (with two 88E6352 and one 88E6185) and ZII Dev Rev C
> (with two 88E6390X), we have the following hardware configuration:
> 
> # ip link add name br0 type bridge
> # ip link set master br0 dev lan6
> br0: port 1(lan6) entered blocking state
> br0: port 1(lan6) entered disabled state
> # echo 2000 > /sys/class/net/br0/bridge/ageing_time
> 
> Before this patch:
> 
> zii-rev-b# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
> 30
> 30
> 15000
> 
> zii-rev-c# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
> 30
> 18750
> 
> After this patch:
> 
> zii-rev-b# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
> 15000
> 15000
> 15000
> 
> zii-rev-c# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
> 18750
> 18750
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew
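
The before/after numbers quoted above are consistent with each chip rounding the requested time to its own hardware step. A minimal sketch of that arithmetic follows. It rests on assumptions that the thread does not state: that the sysfs value 2000 is in centiseconds (20000 ms), and that the steps are 15000 ms and 3750 ms, which are inferred from the quoted output. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Round a requested ageing time to the nearest multiple of a chip's step.
 * The step sizes passed in by callers are assumptions inferred from the
 * quoted age_time output, not values taken from the driver.
 */
static unsigned int round_age_time_ms(unsigned int msecs, unsigned int step_ms)
{
	assert(step_ms > 0);
	return ((msecs + step_ms / 2) / step_ms) * step_ms;
}
```

With these assumed steps, round_age_time_ms(20000, 15000) and round_age_time_ms(20000, 3750) give the 15000 and 18750 values shown after the patch.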

[patch iproute2] tc: flower: add support for tcp flags

2017-05-23 Thread Jiri Pirko

From: Jiri Pirko 

Allow the user to insert a flower classifier filter rule which includes
a match on TCP flags.
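
The VALUE[/MASK] syntax this patch introduces can be illustrated with a standalone parser. This is a sketch only, not the iproute2 code from the diff: it uses strtoul() instead of get_u16(), the helper names parse_hex12() and parse_tcp_flags() are hypothetical, and it copies the argument rather than editing it in place. The 12-bit limit and the "missing mask means all bits" rule follow the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TCP_FLAGS_MAX_MASK 0xfff	/* 12 flag bits, as in the patch */

/* Parse one hex field; returns 0 on success, -1 on bad input. */
static int parse_hex12(const char *s, uint16_t *out)
{
	char *end;
	unsigned long v;

	if (*s == '\0')
		return -1;
	errno = 0;
	v = strtoul(s, &end, 16);
	if (errno || *end != '\0' || (v & ~(unsigned long)TCP_FLAGS_MAX_MASK))
		return -1;
	*out = (uint16_t)v;
	return 0;
}

/* Parse "VALUE" or "VALUE/MASK"; a missing mask means all 12 bits. */
static int parse_tcp_flags(const char *arg, uint16_t *flags, uint16_t *mask)
{
	char buf[32];
	char *slash;

	if (strlen(arg) >= sizeof(buf))
		return -1;
	strcpy(buf, arg);		/* work on a copy, arg stays intact */
	slash = strchr(buf, '/');
	if (slash)
		*slash = '\0';
	if (parse_hex12(buf, flags) < 0)
		return -1;
	if (!slash) {
		*mask = TCP_FLAGS_MAX_MASK;
		return 0;
	}
	return parse_hex12(slash + 1, mask);
}
```

Assuming the usual TCP flag bit layout, an argument such as "0x02/0x17" would ask for SYN set with FIN, RST and ACK clear.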

Signed-off-by: Jiri Pirko 
---
v1->v2:
- removed forgotten debug printout
- fixed mask parsing as reported by Or
---
 include/linux/pkt_cls.h |  3 +++
 man/man8/tc-flower.8|  8 +++
 tc/f_flower.c   | 62 +
 3 files changed, 73 insertions(+)

diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index d613be3..ce9dfb9 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -450,6 +450,9 @@ enum {
TCA_FLOWER_KEY_MPLS_TC, /* u8 - 3 bits */
TCA_FLOWER_KEY_MPLS_LABEL,  /* be32 - 20 bits */
 
+   TCA_FLOWER_KEY_TCP_FLAGS,   /* be16 */
+   TCA_FLOWER_KEY_TCP_FLAGS_MASK,  /* be16 */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/man/man8/tc-flower.8 b/man/man8/tc-flower.8
index ba29065..7648079 100644
--- a/man/man8/tc-flower.8
+++ b/man/man8/tc-flower.8
@@ -35,6 +35,8 @@ flower \- flow based traffic control filter
 .IR PREFIX " | { "
 .BR dst_port " | " src_port " } "
 .IR port_number " } | "
+.B tcp_flags
+.IR MASKED_TCP_FLAGS " | "
 .B type
 .IR MASKED_TYPE " | "
 .B code
@@ -136,6 +138,12 @@ Match on layer 4 protocol source or destination port number. Only available for
 .BR ip_proto " values " udp ", " tcp  " and " sctp
 which have to be specified in beforehand.
 .TP
+.BI tcp_flags " MASKED_TCP_FLAGS"
+Match on TCP flags represented as a 12-bit bitfield in hexadecimal format.
+A mask may be optionally provided to limit the bits which are matched. A mask
+is provided by following the value with a slash and then the mask. If the mask
+is missing then a match on all bits is assumed.
+.TP
 .BI type " MASKED_TYPE"
 .TQ
 .BI code " MASKED_CODE"
diff --git a/tc/f_flower.c b/tc/f_flower.c
index ebc63ca..1b6b46e 100644
--- a/tc/f_flower.c
+++ b/tc/f_flower.c
@@ -57,6 +57,7 @@ static void explain(void)
"   src_ip PREFIX |\n"
"   dst_port PORT-NUMBER |\n"
"   src_port PORT-NUMBER |\n"
+   "   tcp_flags MASKED-TCP_FLAGS |\n"
"   type MASKED-ICMP-TYPE |\n"
"   code MASKED-ICMP-CODE |\n"
"   arp_tip IPV4-PREFIX |\n"
@@ -474,6 +475,41 @@ static int flower_parse_port(char *str, __u8 ip_proto,
return 0;
 }
 
+#define TCP_FLAGS_MAX_MASK 0xfff
+
+static int flower_parse_tcp_flags(char *str, int flags_type, int mask_type,
+ struct nlmsghdr *n)
+{
+   char *slash;
+   int ret, err = -1;
+   __u16 flags;
+
+   slash = strchr(str, '/');
+   if (slash)
+   *slash = '\0';
+
+   ret = get_u16(&flags, str, 16);
+   if (ret < 0 || flags & ~TCP_FLAGS_MAX_MASK)
+   goto err;
+
+   addattr16(n, MAX_MSG, flags_type, htons(flags));
+
+   if (slash) {
+   ret = get_u16(&flags, slash + 1, 16);
+   if (ret < 0 || flags & ~TCP_FLAGS_MAX_MASK)
+   goto err;
+   } else {
+   flags = TCP_FLAGS_MAX_MASK;
+   }
+   addattr16(n, MAX_MSG, mask_type, htons(flags));
+
+   err = 0;
+err:
+   if (slash)
+   *slash = '/';
+   return err;
+}
+
 static int flower_parse_key_id(const char *str, int type, struct nlmsghdr *n)
 {
int ret;
@@ -671,6 +707,16 @@ static int flower_parse_opt(struct filter_util *qu, char *handle,
fprintf(stderr, "Illegal \"src_port\"\n");
return -1;
}
+   } else if (matches(*argv, "tcp_flags") == 0) {
+   NEXT_ARG();
+   ret = flower_parse_tcp_flags(*argv,
+TCA_FLOWER_KEY_TCP_FLAGS,
+TCA_FLOWER_KEY_TCP_FLAGS_MASK,
+n);
+   if (ret < 0) {
+   fprintf(stderr, "Illegal \"tcp_flags\"\n");
+   return -1;
+   }
} else if (matches(*argv, "type") == 0) {
NEXT_ARG();
ret = flower_parse_icmp(*argv, eth_type, ip_proto,
@@ -1000,6 +1046,19 @@ static void flower_print_port(FILE *f, char *name, struct rtattr *attr)
fprintf(f, "\n  %s %d", name, rta_getattr_be16(attr));
 }
 
+static void flower_print_tcp_flags(FILE *f, char *name,
+ struct rtattr *flags_attr,
+ struct rtattr *mask_attr)
+{
+   if (!flags_attr)
+   return;
+   fprintf(f, "\n  %s %x", name, rta_getattr_be16(flags_attr));
+   if (!mask_attr)
+   return;

Re: bond link state mismatch, rtnl_trylock() vs rtnl_lock()

2017-05-23 Thread Nithin Sujir




On 5/23/2017 2:30 PM, Mahesh Bandewar (महेश बंडेवार) wrote:

On Tue, May 23, 2017 at 12:32 PM, Nithin Sujir  wrote:

diff --git a/drivers/net/bonding/bond_main.c
b/drivers/net/bonding/bond_main.c
index 5dca77e..1f60503 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2614,8 +2614,7 @@ static void bond_loadbalance_arp_mon(struct
work_struct *work)
 rcu_read_unlock();

 if (do_failover || slave_state_changed) {
-   if (!rtnl_trylock())
-   goto re_arm;
+   rtnl_lock();

Nithin, you can't do this. The tryRTNL code is to prevent deadlock
during work-cancellation during bond_close().

Thanks, Mahesh. Yes, Jay pointed me to your patch and I will take a look
at how to use a similar approach.


Nithin.


 if (slave_state_changed) {
 bond_slave_state_change(bond);

Re: bond link state mismatch, rtnl_trylock() vs rtnl_lock()

2017-05-23 Thread महेश बंडेवार

On Tue, May 23, 2017 at 12:32 PM, Nithin Sujir  wrote:
> Hi,
> We're encountering a problem in 4.4 LTS where, rarely, the bond link state
> is not updated when the slave link changes.
>
> I've traced the issue to the ARP monitor being unable to get the rtnl lock.
> The sequence resulting in failure is as below.
>
> bond_loadbalance_arp_mon() is called periodically. If the slave link is
> _down_, it checks whether the slave is sending/receiving packets. If it is,
> it sets flags to be processed later in the function for the bond link
> update. However, it sets slave->link right away.
>
> if (slave->link != BOND_LINK_UP) {
> if (bond_time_in_interval(bond, trans_start, 1) &&
> bond_time_in_interval(bond, slave->last_rx, 1))
> {
>
> slave->link  = BOND_LINK_UP;
> slave_state_changed = 1;
>
>
> Later down the function, it tries to get the rtnl_lock. If it doesn't get
> it, it rearms and returns.
>
> if (do_failover || slave_state_changed) {
> if (!rtnl_trylock())
> goto re_arm; <-- returns here
>
> if (slave_state_changed) {
> bond_slave_state_change(bond);
>
> This is the problem. The next time this function is called, the slave->link
> is already marked UP. And we will never update the bond link state to UP.
>
> Changing the rtnl_trylock() -> rtnl_lock() _does_ fix the issue.
>
> Is this the right way to fix it? If it is, I can submit this formally.
>
> What are the guidelines around using rtnl_lock() vs rtnl_trylock()? Some
> places are using rtnl_lock() and other rtnl_trylock(). Sorry, I couldn't
> find much via a google search or in Documentation/.
>
> Thanks,
> Nithin.
>
> 
>
> diff --git a/drivers/net/bonding/bond_main.c
> b/drivers/net/bonding/bond_main.c
> index 5dca77e..1f60503 100644
> --- a/drivers/net/bonding/bond_main.c
> +++ b/drivers/net/bonding/bond_main.c
> @@ -2614,8 +2614,7 @@ static void bond_loadbalance_arp_mon(struct
> work_struct *work)
> rcu_read_unlock();
>
> if (do_failover || slave_state_changed) {
> -   if (!rtnl_trylock())
> -   goto re_arm;
> +   rtnl_lock();

Nithin, you can't do this. The rtnl_trylock() is there to prevent a deadlock
with work cancellation during bond_close().
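
The failure sequence reported in this thread can be reproduced in a tiny model. The sketch below only demonstrates the lost update; it is not the bonding driver and it proposes no fix. struct model_bond and monitor_pass() are hypothetical, and the lock_free argument stands in for the result of rtnl_trylock().

```c
#include <assert.h>
#include <stdbool.h>

enum { LINK_DOWN, LINK_UP };

/* Hypothetical, heavily simplified model of one slave and its bond. */
struct model_bond {
	int slave_link;		/* per-slave state */
	int bond_link;		/* aggregate state, updated only under the lock */
};

/*
 * One monitor pass as described in the report: the slave state is
 * committed first, and the bond update is skipped if the lock is busy.
 */
static void monitor_pass(struct model_bond *b, bool traffic_seen,
			 bool lock_free)
{
	bool slave_state_changed = false;

	if (b->slave_link != LINK_UP && traffic_seen) {
		b->slave_link = LINK_UP;
		slave_state_changed = true;
	}
	if (slave_state_changed) {
		if (!lock_free)
			return;		/* re-arm; the change flag is lost */
		b->bond_link = LINK_UP;
	}
}
```

If the first pass runs with the lock busy, slave_link becomes LINK_UP while bond_link stays LINK_DOWN, and a later pass with the lock free no longer sees a state change.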

>
> if (slave_state_changed) {
> bond_slave_state_change(bond);
>
>

Re: [PATCH] brcmfmac: Fix kernel oops on resume when request firmware fails.

2017-05-23 Thread Franky Lin

Hi Enric,

On Tue, May 23, 2017 at 11:07 AM, Enric Balletbo i Serra
 wrote:
> When request firmware fails, brcmf_ops_sdio_remove is being called and
> brcmf_bus freed. In such circumstances, if you do a suspend/resume cycle
> the kernel hangs on resume due to a NULL pointer dereference in the resume
> function.
>
> Steps to reproduce the problem:
>  - modprobe brcmfmac without the firmware
>  brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac4354-sdio.bin
>  failed with error -2
>  - do a suspend/resume cycle (echo mem > /sys/power/state)
>
> Protect against the NULL pointer dereference by checking if dev_get_drvdata
> returned a valid pointer.
>
> Signed-off-by: Enric Balletbo i Serra 
> ---
> I'm not sure if this is the correct way to fix this, but at least it
> prevents the kernel from hanging. On one hand, I'm not sure why the
> suspend/resume functions are called in such a case and why the device is not
> removed from the bus; on the other hand, I saw that other drivers only
> unregister from sdio when the driver is removed, so I suppose this is the
> normal behavior.
>

Thank you for reporting this. I also think the questions you listed
should be answered before putting the NULL check in the resume routine. I
will dig deeper and share my findings on the thread.

Regards,
Franky

> Cheers,
>  Enric
>
>  drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> index 9b970dc..aa0e7c2 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
> @@ -1274,14 +1274,16 @@ static int brcmf_ops_sdio_suspend(struct device *dev)
>  static int brcmf_ops_sdio_resume(struct device *dev)
>  {
> struct brcmf_bus *bus_if = dev_get_drvdata(dev);
> -   struct brcmf_sdio_dev *sdiodev = bus_if->bus_priv.sdio;
> struct sdio_func *func = container_of(dev, struct sdio_func, dev);
>
> brcmf_dbg(SDIO, "Enter: F%d\n", func->num);
> if (func->num != SDIO_FUNC_2)
> return 0;
>
> -   brcmf_sdiod_freezer_off(sdiodev);
> +   if (!bus_if)
> +   return 0;
> +
> +   brcmf_sdiod_freezer_off(bus_if->bus_priv.sdio);
> return 0;
>  }
>
> @@ -1319,4 +1321,3 @@ void brcmf_sdio_exit(void)
>
> sdio_unregister_driver(&brcmf_sdmmc_driver);
>  }
> -
> --
> 2.9.3
>

Re: Alignment in BPF verifier

2017-05-23 Thread Daniel Borkmann


On 05/23/2017 09:45 PM, Alexei Starovoitov wrote:

On 5/23/17 7:41 AM, Edward Cree wrote:

I'm still plugging away at this... it's going to be quite a big patch and
 rewrite a lot of stuff (and I'm not sure I'll be able to break it into
 smaller bisectable patches).
And of course I have more questions.  In check_packet_ptr_add(), we
 forbid adding a negative constant to a packet ptr.  Is there some
 principled reason for that, or is it just because the bounds checking is
 hard?  It seems like, if imm + reg->off > 0 (suitably carefully checked
 to avoid overflow etc.), then the subtraction should be legal.  Indeed,
 even if the reg->off (fixed part of offset) is zero, if the variable part
 is known (min_value) to be >= -imm, the subtraction should be safe.
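
The condition proposed above can be written out as a small check. This is a sketch of the idea only, under stated assumptions: pkt_add_const_ok() is a hypothetical helper and not verifier code, the offset is modelled as a fixed part plus a variable part with a known minimum, and "safe" is read here as "the lowest possible resulting offset is non-negative".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: may a constant imm (possibly negative) be added to a
 * packet pointer whose offset is fixed_off plus a variable part known to be
 * at least min_var?
 */
static bool pkt_add_const_ok(int32_t imm, uint32_t fixed_off, int64_t min_var)
{
	int64_t lowest;

	/* Conservative bound so the sum below cannot overflow int64_t. */
	if (min_var < 0 || min_var > INT32_MAX)
		return false;
	lowest = (int64_t)fixed_off + min_var + (int64_t)imm;
	return lowest >= 0;
}
```

For example, subtracting 4 from a pointer with a fixed offset of 14 passes, while subtracting 20 from the same pointer does not.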


adding negative imm to pkt_ptr is ok, but what is the use case?
Do you see llvm generating such code?


Btw, currently, you kind of have it in a limited way via BPF_STX_MEM()
and BPF_LDX_MEM() when accessing pkt data through the offset, which
can be negative on the insn level.


I think if we try to track everything with the current shape of
state pruning, the verifier will stop accepting old programs
because it reaches the complexity limit.

I think we need to rearchitect the whole thing.
I was thinking of doing it compiler-style: convert to SSA and
do traditional data flow analysis, use-def chains, and register liveness.
Then pruning heuristics won't be necessary and the verifier should be
able to check everything in more or less a single pass.
Things like register liveness can be done without SSA. It can
be used to augment the existing pruning, since it will know which
registers are dead, so they don't have to be compared, but it
feels half-way. I'd rather go all the way.


On 20/05/17 00:05, Daniel Borkmann wrote:

Besides PTR_TO_PACKET also PTR_TO_MAP_VALUE_OR_NULL uses it to
track all registers (incl. spilled ones) with the same reg->id
that originated from the same map lookup. After the reg type is
then migrated to either PTR_TO_MAP_VALUE (resp. CONST_PTR_TO_MAP
for map in map) or UNKNOWN_VALUE depending on the branch, the
reg->id is then reset to 0 again. Whole reason for this is that
LLVM generates code where it can move and/or spill a reg of type
PTR_TO_MAP_VALUE_OR_NULL to other regs before we do the NULL
test on it, and later on it expects that the spilled or moved
regs work wrt access. So they're marked with an id and then all
of them are type migrated. So here meaning of reg->id is different
than in PTR_TO_PACKET case.
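
The id-based type migration described above can be modelled in a few lines. This is a sketch under assumptions, not the verifier's implementation: struct model_reg, the MODEL_* type names and migrate_regs() are hypothetical, and the register file is reduced to a flat array that stands for both live and spilled registers.

```c
#include <assert.h>
#include <stddef.h>

enum model_reg_type {
	MODEL_UNKNOWN_VALUE,
	MODEL_PTR_TO_MAP_VALUE,
	MODEL_PTR_TO_MAP_VALUE_OR_NULL,
};

struct model_reg {
	enum model_reg_type type;
	unsigned int id;	/* nonzero: copies of one map lookup result */
};

/*
 * After a NULL test on any register carrying `id`, migrate every copy
 * (moved or spilled) to the type implied by the branch and clear the id.
 */
static void migrate_regs(struct model_reg *regs, size_t n, unsigned int id,
			 int is_null)
{
	size_t i;

	assert(id != 0);
	for (i = 0; i < n; i++) {
		if (regs[i].type != MODEL_PTR_TO_MAP_VALUE_OR_NULL ||
		    regs[i].id != id)
			continue;
		regs[i].type = is_null ? MODEL_UNKNOWN_VALUE
				       : MODEL_PTR_TO_MAP_VALUE;
		regs[i].id = 0;
	}
}
```

Registers that came from a different lookup keep their own id and stay PTR_TO_MAP_VALUE_OR_NULL until they are tested themselves.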

Hmm, that means that we can't do arithmetic on a
 PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
 first by NULL-checking it.  That's probably fine, but I can just about
 imagine some compiler optimisation reordering them.  Any reason not to
 split this out into a different reg->field, rather than overloading id?


'id' is sort of like 'version' of a pointer and has the same meaning in
both cases. How exactly do you see this split?


Also, the same id is never reused once generated and later propagated
through regs. So far we haven't run into this kind of optimization
from the llvm side, only others which led to requiring the id marker
(see 57a09bf0a416). I could imagine it might be needed at some point,
though, where we later transition directly to PTR_TO_MAP_VALUE_ADJ
after the NULL check. Out of curiosity, did you run into it with llvm?


Of course that would need (more) caution wrt. states_equal(), but it
 looks like I'll be mangling that a lot anyway - for instance, we don't
 want to just use memcmp() to compare alignments, we want to check that
 our alignment is stricter than the old alignment.  (Of course memcmp()
 is a conservative check, so the "memcmp() the whole reg_state" fast
 path can remain.)
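
The difference between an exact memcmp() and a "stricter than" comparison can be sketched as follows. This is an illustration under assumptions: struct model_reg_state and state_covers() are hypothetical, and alignment is modelled as a single power-of-two byte count, which is not how the verifier stores it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical register state: only the field relevant here. */
struct model_reg_state {
	unsigned int align;	/* known power-of-two alignment, in bytes */
};

/* True if a path already verified with `old` also covers `cur`. */
static bool state_covers(const struct model_reg_state *old,
			 const struct model_reg_state *cur)
{
	/* Fast path: identical states are always safe to prune. */
	if (memcmp(old, cur, sizeof(*old)) == 0)
		return true;
	/* Otherwise the current alignment must be a multiple of the old one. */
	return old->align != 0 && cur->align % old->align == 0;
}
```

A state known to be 8-byte aligned is covered by one verified with 4-byte alignment, but not the other way round.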


Yes, that would be a good improvement. Not sure how much it will help
the pruning though.

Re: [PATCH net] netfilter: do not hold dev in ipt_CLUSTERIP

2017-05-23 Thread Pablo Neira Ayuso

On Sat, May 20, 2017 at 05:08:06PM +0800, Xin Long wrote:
> It's a terrible thing to hold dev in iptables target. When the dev is
> being removed, unregister_netdevice has to wait for the dev to become
> free. dmesg will keep logging the err:
> 
>   kernel:unregister_netdevice: waiting for veth0_in to become free. \
>   Usage count = 1
> 
> until iptables rules with this target are removed manually.
> 
> Worse, when deleting a netns, a virtual nic will be deleted instead of
> being reset to init_net in default_device_ops exit/exit_batch. As this
> happens earlier than the flush of the iptables rules in
> iptable_filter_net_ops exit, unregister_netdevice will block waiting for
> the nic to become free.
> 
> So unregister_netdevice is actually waiting for the iptables rules to be
> flushed, while the iptables rules can only be flushed after
> unregister_netdevice. This 'dead lock' will cause unregister_netdevice to
> block there forever. As the netns is not available to operate on at that
> moment, the iptables rules cannot even be flushed manually.
> 
> The reproducer can be:
> 
>   # ip netns add test
>   # ip link add veth0_in type veth peer name veth0_out
>   # ip link set veth0_in netns test
>   # ip netns exec test ip link set lo up
>   # ip netns exec test ip link set veth0_in up
>   # ip netns exec test iptables -I INPUT -d 1.2.3.4 -i veth0_in -j \
> CLUSTERIP --new --clustermac 89:d4:47:eb:9a:fa --total-nodes 3 \
> --local-node 1 --hashmode sourceip-sourceport
>   # ip netns del test
> 
> This issue can be triggered by all virtual nics with ipt_CLUSTERIP.
> 
> This patch fixes it by not holding dev in ipt_CLUSTERIP and saving only
> dev->ifindex instead. When removing the mc from the dev, it gets the dev
> by c->ifindex through dev_get_by_index.
> 
> Note that it saves dev->ifindex rather than dev->name: a dev->name can
> be changed, which would confuse ipt_CLUSTERIP.
> 
> Reported-by: Jianlin Shi 
> Signed-off-by: Xin Long 

OK. Let's fix this finally... One comment below.
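
The approach in the quoted commit message, storing an ifindex and looking the device up only when it is needed, can be modelled outside the kernel. The sketch below is an assumption-laden illustration: struct model_dev, struct model_config and both functions are hypothetical, and a plain array stands in for the namespace's device list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one device in a net namespace. */
struct model_dev {
	int ifindex;
	int mc_refs;		/* multicast addresses joined on the device */
};

struct model_config {
	int ifindex;		/* stored instead of a held device pointer */
};

static struct model_dev *model_dev_by_index(struct model_dev *devs, size_t n,
					    int ifindex)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (devs[i].ifindex == ifindex)
			return &devs[i];
	return NULL;
}

/* Returns 1 if the multicast entry was dropped, 0 if the device is gone. */
static int model_config_release(struct model_dev *devs, size_t n,
				const struct model_config *c)
{
	struct model_dev *dev = model_dev_by_index(devs, n, c->ifindex);

	if (!dev)
		return 0;	/* device already unregistered: nothing to do */
	dev->mc_refs--;
	return 1;
}
```

Because the config holds no reference, the device can disappear first; the release path then simply finds nothing to clean up.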

>  net/ipv4/netfilter/ipt_CLUSTERIP.c | 31 ++-
>  1 file changed, 18 insertions(+), 13 deletions(-)
> 
> diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> index 038f293..d1adb2f 100644
> --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
> +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> @@ -47,7 +47,7 @@ struct clusterip_config {
>  
>   __be32 clusterip;   /* the IP address */
>   u_int8_t clustermac[ETH_ALEN];  /* the MAC address */
> - struct net_device *dev; /* device */
> + int ifindex;/* device ifindex */
>   u_int16_t num_total_nodes;  /* total number of nodes */
>   unsigned long local_nodes;  /* node number array */
>  
> @@ -98,19 +98,23 @@ clusterip_config_put(struct clusterip_config *c)
>   * entry(rule) is removed, remove the config from lists, but don't free it
>   * yet, since proc-files could still be holding references */
>  static inline void
> -clusterip_config_entry_put(struct clusterip_config *c)
> +clusterip_config_entry_put(struct net *net, struct clusterip_config *c)
>  {
> - struct net *net = dev_net(c->dev);
>   struct clusterip_net *cn = net_generic(net, clusterip_net_id);
>  
>   local_bh_disable();
>   if (refcount_dec_and_lock(&c->entries, &cn->lock)) {
> + struct net_device *dev;
> +
>   list_del_rcu(&c->list);
>   spin_unlock(&cn->lock);
>   local_bh_enable();
>  
> - dev_mc_del(c->dev, c->clustermac);
> - dev_put(c->dev);
> + dev = dev_get_by_index(net, c->ifindex);
> + if (dev) {
> + dev_mc_del(dev, c->clustermac);
> + dev_put(dev);
> + }
>  
>   /* In case anyone still accesses the file, the open/close
>* functions are also incrementing the refcount on their own,
> @@ -182,7 +186,7 @@ clusterip_config_init(const struct ipt_clusterip_tgt_info *i, __be32 ip,
>   if (!c)
>   return ERR_PTR(-ENOMEM);
>  
> - c->dev = dev;
> + c->ifindex = dev->ifindex;
>   c->clusterip = ip;
>   memcpy(&c->clustermac, &i->clustermac, ETH_ALEN);
>   c->num_total_nodes = i->num_total_nodes;
> @@ -427,12 +431,14 @@ static int clusterip_tg_check(const struct xt_tgchk_param *par)
>   }
>  
>   config = clusterip_config_init(cipinfo,
> - e->ip.dst.s_addr, dev);
> +e->ip.dst.s_addr, dev);
>   if (IS_ERR(config)) {
>   dev_put(dev);
>   return PTR_ERR(config);
>   }
> - dev_mc_add(config->dev, config->clustermac);
> +
> + dev_mc_add(dev, config->clusterm

[PATCH net-next 0/4] More marvell phy cleanups

2017-05-23 Thread Andrew Lunn

This patchset continues the cleanup of the Marvell PHY driver.  These
PHYs use pages to allow more than the 32 registers that fit into the
MDIO address space. Clean up the code used for changing pages.
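
Paged register access of the kind described above can be sketched against an in-memory PHY. This is a model under assumptions, not the driver: struct model_phy and the model_* functions are hypothetical, the number of pages is arbitrary, and only the page-select register number 22 is taken from the driver (MII_MARVELL_PHY_PAGE).

```c
#include <assert.h>

#define MODEL_PAGE_REG	22	/* page-select register, as in the driver */
#define MODEL_NUM_REGS	32	/* registers addressable over MDIO */
#define MODEL_NUM_PAGES	8	/* arbitrary size for this model */

/* Hypothetical in-memory PHY: one bank of 32 registers per page. */
struct model_phy {
	int page;
	int regs[MODEL_NUM_PAGES][MODEL_NUM_REGS];
};

static int model_read(struct model_phy *phy, int reg)
{
	if (reg == MODEL_PAGE_REG)
		return phy->page;
	return phy->regs[phy->page][reg];
}

static void model_write(struct model_phy *phy, int reg, int val)
{
	if (reg == MODEL_PAGE_REG)
		phy->page = val;
	else
		phy->regs[phy->page][reg] = val;
}

/* Read a register on a given page, then restore the previous page. */
static int model_read_paged(struct model_phy *phy, int page, int reg)
{
	int oldpage, val;

	assert(page >= 0 && page < MODEL_NUM_PAGES);
	assert(reg >= 0 && reg < MODEL_NUM_REGS);
	oldpage = model_read(phy, MODEL_PAGE_REG);
	model_write(phy, MODEL_PAGE_REG, page);
	val = model_read(phy, reg);
	model_write(phy, MODEL_PAGE_REG, oldpage);
	return val;
}
```

Restoring the previous page rather than writing a fixed page 0 is the same pattern the get_temp changes in this series move to.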

Andrew Lunn (4):
  net: phy: marvell: #defines for copper and fibre pages
  net: phy: marvell: More hidden page changes refactored
  net: phy: marvell: helper to get and set page
  net: phy: marvell: Uniform page names

 drivers/net/phy/marvell.c | 181 +-
 1 file changed, 98 insertions(+), 83 deletions(-)

-- 
2.11.0

[PATCH net-next 1/4] net: phy: marvell: #defines for copper and fibre pages

2017-05-23 Thread Andrew Lunn

Replace magic numbers for PHY pages with symbolic names.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/marvell.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 88cd97b44ba6..bb067026353a 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -650,7 +650,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
 
mdelay(500);
 
-   err = marvell_set_page(phydev, 0);
+   err = marvell_set_page(phydev, MII_M_COPPER);
if (err < 0)
return err;
 
@@ -662,7 +662,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
if (err < 0)
return err;
 
-   err = marvell_set_page(phydev, 2);
+   err = marvell_set_page(phydev, MII_88E1121_PHY_MSCR_PAGE);
if (err < 0)
return err;
temp = phy_read(phydev, MII_M1116R_CONTROL_REG_MAC);
@@ -671,7 +671,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
err = phy_write(phydev, MII_M1116R_CONTROL_REG_MAC, temp);
if (err < 0)
return err;
-   err = marvell_set_page(phydev, 0);
+   err = marvell_set_page(phydev, MII_M_COPPER);
if (err < 0)
return err;
 
@@ -892,7 +892,7 @@ static int m88e1510_config_init(struct phy_device *phydev)
return err;
 
/* Reset page selection */
-   err = marvell_set_page(phydev, 0);
+   err = marvell_set_page(phydev, MII_M_COPPER);
if (err < 0)
return err;
}
@@ -922,7 +922,7 @@ static int m88e1118_config_init(struct phy_device *phydev)
int err;
 
/* Change address */
-   err = marvell_set_page(phydev, 2);
+   err = marvell_set_page(phydev, MII_88E1121_PHY_MSCR_PAGE);
if (err < 0)
return err;
 
@@ -932,7 +932,7 @@ static int m88e1118_config_init(struct phy_device *phydev)
return err;
 
/* Change address */
-   err = marvell_set_page(phydev, 3);
+   err = marvell_set_page(phydev, MII_88E1318S_PHY_LED_PAGE);
if (err < 0)
return err;
 
@@ -949,7 +949,7 @@ static int m88e1118_config_init(struct phy_device *phydev)
return err;
 
/* Reset address */
-   err = marvell_set_page(phydev, 0);
+   err = marvell_set_page(phydev, MII_M_COPPER);
if (err < 0)
return err;
 
@@ -961,7 +961,7 @@ static int m88e1149_config_init(struct phy_device *phydev)
int err;
 
/* Change address */
-   err = marvell_set_page(phydev, 2);
+   err = marvell_set_page(phydev, MII_88E1121_PHY_MSCR_PAGE);
if (err < 0)
return err;
 
@@ -975,7 +975,7 @@ static int m88e1149_config_init(struct phy_device *phydev)
return err;
 
/* Reset address */
-   err = marvell_set_page(phydev, 0);
+   err = marvell_set_page(phydev, MII_M_COPPER);
if (err < 0)
return err;
 
@@ -1409,7 +1409,7 @@ static void m88e1318_get_wol(struct phy_device *phydev,
MII_88E1318S_PHY_WOL_CTRL_MAGIC_PACKET_MATCH_ENABLE)
wol->wolopts |= WAKE_MAGIC;
 
-   if (marvell_set_page(phydev, 0x00) < 0)
+   if (marvell_set_page(phydev, MII_M_COPPER) < 0)
return;
 }
 
@@ -1422,7 +1422,7 @@ static int m88e1318_set_wol(struct phy_device *phydev,
 
if (wol->wolopts & WAKE_MAGIC) {
/* Explicitly switch to page 0x00, just to be sure */
-   err = marvell_set_page(phydev, 0x00);
+   err = marvell_set_page(phydev, MII_M_COPPER);
if (err < 0)
return err;
 
-- 
2.11.0

[PATCH net-next 4/4] net: phy: marvell: Uniform page names

2017-05-23 Thread Andrew Lunn

Bring all the page names together, remove the repeats, and make them
uniform.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/marvell.c | 94 +++
 1 file changed, 46 insertions(+), 48 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index ab00609af2be..0569614ec236 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -41,6 +41,12 @@
 #include 
 
 #define MII_MARVELL_PHY_PAGE   22
+#define MII_MARVELL_COPPER_PAGE0x00
+#define MII_MARVELL_FIBER_PAGE 0x01
+#define MII_MARVELL_MSCR_PAGE  0x02
+#define MII_MARVELL_LED_PAGE   0x03
+#define MII_MARVELL_MISC_TEST_PAGE 0x06
+#define MII_MARVELL_WOL_PAGE   0x11
 
 #define MII_M1011_IEVENT   0x13
 #define MII_M1011_IEVENT_CLEAR 0x
@@ -82,16 +88,11 @@
 #define MII_M_HWCFG_FIBER_COPPER_AUTO  0x8000
 #define MII_M_HWCFG_FIBER_COPPER_RES   0x2000
 
-#define MII_M_COPPER   0
-#define MII_M_FIBER1
-
-#define MII_88E1121_PHY_MSCR_PAGE  2
 #define MII_88E1121_PHY_MSCR_REG   21
 #define MII_88E1121_PHY_MSCR_RX_DELAY  BIT(5)
 #define MII_88E1121_PHY_MSCR_TX_DELAY  BIT(4)
 #define MII_88E1121_PHY_MSCR_DELAY_MASK(~(0x3 << 4))
 
-#define MII_88E1121_MISC_TEST_PAGE 6
 #define MII_88E1121_MISC_TEST  0x1a
 #define MII_88E1510_MISC_TEST_TEMP_THRESHOLD_MASK  0x1f00
 #define MII_88E1510_MISC_TEST_TEMP_THRESHOLD_SHIFT 8
@@ -112,7 +113,6 @@
 #define MII_88E1318S_PHY_CSIER_WOL_EIE  BIT(7)
 
 /* LED Timer Control Register */
-#define MII_88E1318S_PHY_LED_PAGE   0x03
 #define MII_88E1318S_PHY_LED_TCR0x12
 #define MII_88E1318S_PHY_LED_TCR_FORCE_INT  BIT(15)
 #define MII_88E1318S_PHY_LED_TCR_INTn_ENABLEBIT(7)
@@ -123,13 +123,11 @@
 #define MII_88E1318S_PHY_MAGIC_PACKET_WORD1 0x18
 #define MII_88E1318S_PHY_MAGIC_PACKET_WORD0 0x19
 
-#define MII_88E1318S_PHY_WOL_PAGE   0x11
 #define MII_88E1318S_PHY_WOL_CTRL   0x10
 #define MII_88E1318S_PHY_WOL_CTRL_CLEAR_WOL_STATUS  BIT(12)
 #define MII_88E1318S_PHY_WOL_CTRL_MAGIC_PACKET_MATCH_ENABLE BIT(14)
 
 #define MII_88E1121_PHY_LED_CTRL   16
-#define MII_88E1121_PHY_LED_PAGE   3
 #define MII_88E1121_PHY_LED_DEF0x0030
 
 #define MII_M1011_PHY_STATUS   0x11
@@ -465,7 +463,7 @@ static int m88e1121_config_aneg(struct phy_device *phydev)
 {
int err, oldpage, mscr;
 
-   oldpage = marvell_get_set_page(phydev, MII_88E1121_PHY_MSCR_PAGE);
+   oldpage = marvell_get_set_page(phydev, MII_MARVELL_MSCR_PAGE);
if (oldpage < 0)
return oldpage;
 
@@ -504,7 +502,7 @@ static int m88e1318_config_aneg(struct phy_device *phydev)
 {
int err, oldpage, mscr;
 
-   oldpage = marvell_get_set_page(phydev, MII_88E1121_PHY_MSCR_PAGE);
+   oldpage = marvell_get_set_page(phydev, MII_MARVELL_MSCR_PAGE);
if (oldpage < 0)
return oldpage;
 
@@ -615,7 +613,7 @@ static int m88e1510_config_aneg(struct phy_device *phydev)
 {
int err;
 
-   err = marvell_set_page(phydev, MII_M_COPPER);
+   err = marvell_set_page(phydev, MII_MARVELL_COPPER_PAGE);
if (err < 0)
goto error;
 
@@ -625,7 +623,7 @@ static int m88e1510_config_aneg(struct phy_device *phydev)
goto error;
 
/* Then the fiber link */
-   err = marvell_set_page(phydev, MII_M_FIBER);
+   err = marvell_set_page(phydev, MII_MARVELL_FIBER_PAGE);
if (err < 0)
goto error;
 
@@ -633,10 +631,10 @@ static int m88e1510_config_aneg(struct phy_device *phydev)
if (err < 0)
goto error;
 
-   return marvell_set_page(phydev, MII_M_COPPER);
+   return marvell_set_page(phydev, MII_MARVELL_COPPER_PAGE);
 
 error:
-   marvell_set_page(phydev, MII_M_COPPER);
+   marvell_set_page(phydev, MII_MARVELL_COPPER_PAGE);
return err;
 }
 
@@ -659,7 +657,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
 
mdelay(500);
 
-   err = marvell_set_page(phydev, MII_M_COPPER);
+   err = marvell_set_page(phydev, MII_MARVELL_COPPER_PAGE);
if (err < 0)
return err;
 
@@ -671,7 +669,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
if (err < 0)
return err;
 
-   err = marvell_set_page(phydev, MII_88E1121_PHY_MSCR_PAGE);
+   err = marvell_set_page(phydev, MII_MARVELL_MSCR_PAGE);
if (err < 0)
return err;
temp = phy_read(phydev, MII_M1116R_CONTROL_REG_MAC);
@@ -680,7 +678,7 @@ static int m88e1116r_config_init(struct phy_device *phydev)
err = phy_write(phydev, MII_M1116R_CONTROL_REG_M

[PATCH net-next 2/4] net: phy: marvell: More hidden page changes refactored

2017-05-23 Thread Andrew Lunn

EXT_ADDR_PAGE has the same meaning as MII_MARVELL_PHY_PAGE, i.e. change
page. Replace it with calls to the helpers.

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/marvell.c | 62 +++
 1 file changed, 46 insertions(+), 16 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index bb067026353a..44ea1dd89ce5 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -54,7 +54,6 @@
 #define MII_M1011_PHY_SCR_MDI_X0x0020
 #define MII_M1011_PHY_SCR_AUTO_CROSS   0x0060
 
-#define MII_M1145_PHY_EXT_ADDR_PAGE0x16
 #define MII_M1145_PHY_EXT_SR   0x1b
 #define MII_M1145_PHY_EXT_CR   0x14
 #define MII_M1145_RGMII_RX_DELAY   0x0080
@@ -92,6 +91,7 @@
 #define MII_88E1121_PHY_MSCR_TX_DELAY  BIT(4)
 #define MII_88E1121_PHY_MSCR_DELAY_MASK(~(0x3 << 4))
 
+#define MII_88E1121_MISC_TEST_PAGE 6
 #define MII_88E1121_MISC_TEST  0x1a
 #define MII_88E1510_MISC_TEST_TEMP_THRESHOLD_MASK  0x1f00
 #define MII_88E1510_MISC_TEST_TEMP_THRESHOLD_SHIFT 8
@@ -760,11 +760,7 @@ static int m88e_config_init_sgmii(struct phy_device *phydev)
return err;
 
/* make sure copper is selected */
-   err = phy_read(phydev, MII_M1145_PHY_EXT_ADDR_PAGE);
-   if (err < 0)
-   return err;
-
-   return phy_write(phydev, MII_M1145_PHY_EXT_ADDR_PAGE, err & (~0xff));
+   return marvell_set_page(phydev, MII_M_COPPER);
 }
 
 static int m88e_config_init_rtbi(struct phy_device *phydev)
@@ -1556,12 +1552,19 @@ static int m88e1121_get_temp(struct phy_device *phydev, long *temp)
 {
int ret;
int val;
+   int oldpage;
 
*temp = 0;
 
mutex_lock(&phydev->lock);
 
-   ret = phy_write(phydev, MII_M1145_PHY_EXT_ADDR_PAGE, 0x6);
+   oldpage = marvell_get_page(phydev);
+   if (oldpage < 0) {
+   mutex_unlock(&phydev->lock);
+   return oldpage;
+   }
+
+   ret = marvell_set_page(phydev, MII_88E1121_MISC_TEST_PAGE);
if (ret < 0)
goto error;
 
@@ -1593,7 +1596,7 @@ static int m88e1121_get_temp(struct phy_device *phydev, long *temp)
*temp = ((val & MII_88E1121_MISC_TEST_TEMP_MASK) - 5) * 5000;
 
 error:
-   phy_write(phydev, MII_M1145_PHY_EXT_ADDR_PAGE, 0x0);
+   marvell_set_page(phydev, oldpage);
mutex_unlock(&phydev->lock);
 
return ret;
@@ -1671,12 +1674,19 @@ static const struct hwmon_chip_info m88e1121_hwmon_chip_info = {
 static int m88e1510_get_temp(struct phy_device *phydev, long *temp)
 {
int ret;
+   int oldpage;
 
*temp = 0;
 
mutex_lock(&phydev->lock);
 
-   ret = phy_write(phydev, MII_M1145_PHY_EXT_ADDR_PAGE, 0x6);
+   oldpage = marvell_get_page(phydev);
+   if (oldpage < 0) {
+   mutex_unlock(&phydev->lock);
+   return oldpage;
+   }
+
+   ret = marvell_set_page(phydev, MII_88E1121_MISC_TEST_PAGE);
if (ret < 0)
goto error;
 
@@ -1687,7 +1697,7 @@ static int m88e1510_get_temp(struct phy_device *phydev, 
long *temp)
*temp = ((ret & MII_88E1510_TEMP_SENSOR_MASK) - 25) * 1000;
 
 error:
-   phy_write(phydev, MII_M1145_PHY_EXT_ADDR_PAGE, 0x0);
+   marvell_set_page(phydev, oldpage);
mutex_unlock(&phydev->lock);
 
return ret;
@@ -1696,12 +1706,18 @@ static int m88e1510_get_temp(struct phy_device *phydev, 
long *temp)
 int m88e1510_get_temp_critical(struct phy_device *phydev, long *temp)
 {
int ret;
+   int oldpage;
 
*temp = 0;
 
mutex_lock(&phydev->lock);
+   oldpage = marvell_get_page(phydev);
+   if (oldpage < 0) {
+   mutex_unlock(&phydev->lock);
+   return oldpage;
+   }
 
-   ret = phy_write(phydev, MII_M1145_PHY_EXT_ADDR_PAGE, 0x6);
+   ret = marvell_set_page(phydev, MII_88E1121_MISC_TEST_PAGE);
if (ret < 0)
goto error;
 
@@ -1715,7 +1731,7 @@ int m88e1510_get_temp_critical(struct phy_device *phydev, 
long *temp)
*temp *= 1000;
 
 error:
-   phy_write(phydev, MII_M1145_PHY_EXT_ADDR_PAGE, 0x0);
+   marvell_set_page(phydev, oldpage);
mutex_unlock(&phydev->lock);
 
return ret;
@@ -1724,10 +1740,17 @@ int m88e1510_get_temp_critical(struct phy_device 
*phydev, long *temp)
 int m88e1510_set_temp_critical(struct phy_device *phydev, long temp)
 {
int ret;
+   int oldpage;
 
mutex_lock(&phydev->lock);
 
-   ret = phy_write(phydev, MII_M1145_PHY_EXT_ADDR_PAGE, 0x6);
+   oldpage = marvell_get_page(phydev);
+   if (oldpage < 0) {
+   mutex_unlock(&phydev->lock);
+   return oldpage;
+   }
+
+   ret = marvell_set_page(phydev, MII_88E1121_MISC_TEST_PAGE);
if (ret < 0)
goto error;
 
@@ -1742,7 +1765,7 @@ int m88e1510_set_temp_critical(struct phy_device *phydev

[PATCH net-next 3/4] net: phy: marvell: helper to get and set page

2017-05-23 Thread Andrew Lunn

There is a common pattern of first reading the currently selected page
and then changing to another page. Add a helper to do this.
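
For illustration, the read-then-switch pattern can be sketched in userspace C against a mock register file. Note that the helper in the diff below returns the result of marvell_set_page() (0 on success) while its callers store the return value in oldpage and later restore it. The sketch assumes the intent is to hand back the previously selected page, so it returns that instead. The names mock_phy, mock_read, mock_write and mock_get_set_page are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_REG 22   /* page-select register (0x16) on the Marvell PHYs */

/* Hypothetical stand-in for a PHY: 32 registers, no MDIO errors modelled. */
struct mock_phy {
    uint16_t regs[32];
};

int mock_read(struct mock_phy *phy, int reg)
{
    assert(reg >= 0 && reg < 32);
    return phy->regs[reg];
}

int mock_write(struct mock_phy *phy, int reg, uint16_t val)
{
    assert(reg >= 0 && reg < 32);
    phy->regs[reg] = val;
    return 0;   /* 0 on success, like phy_write() */
}

/* Select 'page' and return the page that was selected before (>= 0),
 * or a negative error code. The write is skipped if nothing changes. */
int mock_get_set_page(struct mock_phy *phy, int page)
{
    int oldpage = mock_read(phy, PAGE_REG);
    int err;

    if (oldpage < 0)
        return oldpage;

    if (page != oldpage) {
        err = mock_write(phy, PAGE_REG, (uint16_t)page);
        if (err < 0)
            return err;
    }

    return oldpage;
}
```

With this variant a caller can restore the original page afterwards by writing the returned value back to the page register.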

Signed-off-by: Andrew Lunn 
---
 drivers/net/phy/marvell.c | 75 ---
 1 file changed, 31 insertions(+), 44 deletions(-)

diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
index 44ea1dd89ce5..ab00609af2be 100644
--- a/drivers/net/phy/marvell.c
+++ b/drivers/net/phy/marvell.c
@@ -199,6 +199,19 @@ static int marvell_set_page(struct phy_device *phydev, int 
page)
return phy_write(phydev, MII_MARVELL_PHY_PAGE, page);
 }
 
+static int marvell_get_set_page(struct phy_device *phydev, int page)
+{
+   int oldpage = marvell_get_page(phydev);
+
+   if (oldpage < 0)
+   return oldpage;
+
+   if (page != oldpage)
+   return marvell_set_page(phydev, page);
+
+   return 0;
+}
+
 static int marvell_ack_interrupt(struct phy_device *phydev)
 {
int err;
@@ -452,11 +465,9 @@ static int m88e1121_config_aneg(struct phy_device *phydev)
 {
int err, oldpage, mscr;
 
-   oldpage = marvell_get_page(phydev);
-
-   err = marvell_set_page(phydev, MII_88E1121_PHY_MSCR_PAGE);
-   if (err < 0)
-   return err;
+   oldpage = marvell_get_set_page(phydev, MII_88E1121_PHY_MSCR_PAGE);
+   if (oldpage < 0)
+   return oldpage;
 
if (phy_interface_is_rgmii(phydev)) {
mscr = phy_read(phydev, MII_88E1121_PHY_MSCR_REG) &
@@ -493,11 +504,9 @@ static int m88e1318_config_aneg(struct phy_device *phydev)
 {
int err, oldpage, mscr;
 
-   oldpage = marvell_get_page(phydev);
-
-   err = marvell_set_page(phydev, MII_88E1121_PHY_MSCR_PAGE);
-   if (err < 0)
-   return err;
+   oldpage = marvell_get_set_page(phydev, MII_88E1121_PHY_MSCR_PAGE);
+   if (oldpage < 0)
+   return oldpage;
 
mscr = phy_read(phydev, MII_88E1318S_PHY_MSCR1_REG);
mscr |= MII_88E1318S_PHY_MSCR1_PAD_ODD;
@@ -843,11 +852,9 @@ static int m88e1121_config_init(struct phy_device *phydev)
 {
int err, oldpage;
 
-   oldpage = marvell_get_page(phydev);
-
-   err = marvell_set_page(phydev, MII_88E1121_PHY_LED_PAGE);
-   if (err < 0)
-   return err;
+   oldpage = marvell_get_set_page(phydev, MII_88E1121_PHY_LED_PAGE);
+   if (oldpage < 0)
+   return oldpage;
 
/* Default PHY LED config: LED[0] .. Link, LED[1] .. Activity */
err = phy_write(phydev, MII_88E1121_PHY_LED_CTRL,
@@ -1516,12 +1523,11 @@ static u64 marvell_get_stat(struct phy_device *phydev, 
int i)
 {
struct marvell_hw_stat stat = marvell_hw_stats[i];
struct marvell_priv *priv = phydev->priv;
-   int err, oldpage, val;
+   int oldpage, val;
u64 ret;
 
-   oldpage = marvell_get_page(phydev);
-   err = marvell_set_page(phydev, stat.page);
-   if (err < 0)
+   oldpage = marvell_get_set_page(phydev, stat.page);
+   if (oldpage < 0)
return UINT64_MAX;
 
val = phy_read(phydev, stat.reg);
@@ -1558,16 +1564,12 @@ static int m88e1121_get_temp(struct phy_device *phydev, 
long *temp)
 
mutex_lock(&phydev->lock);
 
-   oldpage = marvell_get_page(phydev);
+   oldpage = marvell_get_set_page(phydev, MII_88E1121_MISC_TEST_PAGE);
if (oldpage < 0) {
mutex_unlock(&phydev->lock);
return oldpage;
}
 
-   ret = marvell_set_page(phydev, MII_88E1121_MISC_TEST_PAGE);
-   if (ret < 0)
-   goto error;
-
/* Enable temperature sensor */
ret = phy_read(phydev, MII_88E1121_MISC_TEST);
if (ret < 0)
@@ -1680,16 +1682,12 @@ static int m88e1510_get_temp(struct phy_device *phydev, 
long *temp)
 
mutex_lock(&phydev->lock);
 
-   oldpage = marvell_get_page(phydev);
+   oldpage = marvell_get_set_page(phydev, MII_88E1121_MISC_TEST_PAGE);
if (oldpage < 0) {
mutex_unlock(&phydev->lock);
return oldpage;
}
 
-   ret = marvell_set_page(phydev, MII_88E1121_MISC_TEST_PAGE);
-   if (ret < 0)
-   goto error;
-
ret = phy_read(phydev, MII_88E1510_TEMP_SENSOR);
if (ret < 0)
goto error;
@@ -1711,16 +1709,13 @@ int m88e1510_get_temp_critical(struct phy_device 
*phydev, long *temp)
*temp = 0;
 
mutex_lock(&phydev->lock);
-   oldpage = marvell_get_page(phydev);
+
+   oldpage = marvell_get_set_page(phydev, MII_88E1121_MISC_TEST_PAGE);
if (oldpage < 0) {
mutex_unlock(&phydev->lock);
return oldpage;
}
 
-   ret = marvell_set_page(phydev, MII_88E1121_MISC_TEST_PAGE);
-   if (ret < 0)
-   goto error;
-
ret = phy_read(phydev, MII_88E1121_MISC_TEST);
if (ret < 0)
goto error;
@@ -1744,16 +1739,12 @@ int m88e1510_set_temp_critical(stru

Re: bond link state mismatch, rtnl_trylock() vs rtnl_lock()

2017-05-23 Thread Nithin Sujir




On 5/23/2017 1:13 PM, Jay Vosburgh wrote:

Nithin Sujir  wrote:


Hi,
We're encountering a problem in 4.4 LTS where, rarely, the bond link state
is not updated when the slave link changes.

I've traced the issue to the ARP monitor being unable to get the rtnl lock. The
sequence resulting in failure is as below.

bond_loadbalance_arp_mon() is called periodically. If the slave link is _down_,
it checks whether the slave is sending/receiving packets. If it is, it sets
flags to be processed later in the function for the bond link
update. However, it sets slave->link right away.

if (slave->link != BOND_LINK_UP) {
if (bond_time_in_interval(bond, trans_start, 1) &&
bond_time_in_interval(bond, slave->last_rx,
1)) {

slave->link  = BOND_LINK_UP;
slave_state_changed = 1;


Later down the function, it tries to get the rtnl_lock. If it doesn't get
it, it rearms and returns.

if (do_failover || slave_state_changed) {
if (!rtnl_trylock())
goto re_arm; <-- returns here

if (slave_state_changed) {
bond_slave_state_change(bond);

This is the problem. The next time this function is called, the
slave->link is already marked UP. And we will never update the bond link
state to UP.

This looks like an ARP monitor version of

commit de77ecd4ef02ca783f7762e04e92b3d0964be66b
Author: Mahesh Bandewar 
Date:   Mon Mar 27 11:37:33 2017 -0700

 bonding: improve link-status update in mii-monitoring

and probably needs a similar fix (possibly for both the
loadbalance and active-backup ARP monitor cases).

Thanks for the explanation and the pointer to this patch. I will take a look.


Thanks, Jay!

Nithin.


Changing the rtnl_trylock() -> rtnl_lock() _does_ fix the issue.

Is this the right way to fix it? If it is, I can submit this formally.

It's not the right way, unfortunately.

The reason for the rtnl_trylock is that there's a possible race
against bond_close() -> bond_work_cancel_all() trying to cancel the
arp_work workqueue item while it's running.  bond_close is called with
RTNL held, so if it has RTNL and is waiting for the work function to
complete, an rtnl_lock call here will deadlock.  Some of the trylock
calls in bonding are commented to this effect, but not this one.
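
One way to keep the trylock and still not lose the update is to record the detected change as pending and commit it only once the lock has been obtained. The following is a minimal userspace sketch of that idea, not the actual bonding code and not taken from the commit referenced above. The types mock_slave and mock_bond, the lock_available flag (standing in for rtnl_trylock() succeeding) and monitor_pass() are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum link_state { LINK_DOWN, LINK_UP };

/* Hypothetical, simplified slave: 'link' is the committed state,
 * 'pending' is a change that was detected but not yet applied. */
struct mock_slave {
    enum link_state link;
    enum link_state pending;
    bool has_pending;
};

struct mock_bond {
    struct mock_slave slave;
    enum link_state bond_link;  /* state reported for the bond itself */
    bool lock_available;        /* stands in for rtnl_trylock() succeeding */
};

/* One monitor pass. Returns false if the pass had to re-arm because the
 * lock was busy, true otherwise. */
bool monitor_pass(struct mock_bond *bond, bool slave_has_traffic)
{
    struct mock_slave *s;

    assert(bond);
    s = &bond->slave;

    /* Only propose the new state here; do not touch s->link yet. */
    if (s->link != LINK_UP && slave_has_traffic) {
        s->pending = LINK_UP;
        s->has_pending = true;
    }

    if (!s->has_pending)
        return true;

    if (!bond->lock_available)
        return false;   /* re-arm; the pending change survives */

    /* Lock held: commit slave and bond state together. */
    s->link = s->pending;
    s->has_pending = false;
    bond->bond_link = s->link;
    return true;
}
```

Because the slave state is committed in the same step as the bond state, a pass that fails to get the lock leaves both unchanged and the next pass detects the transition again.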

-J


What are the guidelines around using rtnl_lock() vs rtnl_trylock()? Some
places are using rtnl_lock() and other rtnl_trylock(). Sorry, I couldn't
find much via a google search or in Documentation/.

Thanks,
Nithin.



diff --git a/drivers/net/bonding/bond_main.c
b/drivers/net/bonding/bond_main.c
index 5dca77e..1f60503 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2614,8 +2614,7 @@ static void bond_loadbalance_arp_mon(struct
work_struct *work)
rcu_read_unlock();

if (do_failover || slave_state_changed) {
-   if (!rtnl_trylock())
-   goto re_arm;
+   rtnl_lock();

if (slave_state_changed) {
bond_slave_state_change(bond);



---
-Jay Vosburgh, jay.vosbu...@canonical.com

Re: [PATCH v6 net-next 17/17] net: qualcomm: add QCA7000 UART driver

2017-05-23 Thread Lino Sanfilippo

On 23.05.2017 21:38, Stefan Wahren wrote:
> 
>> Lino Sanfilippo wrote on 23 May 2017 at 20:16:
>>
>>
>> Hi,
>>
>> On 23.05.2017 15:12, Stefan Wahren wrote:
>>
>>
>>> +}
>>> +
>>> +static void qca_uart_remove(struct serdev_device *serdev)
>>> +{
>>> +   struct qcauart *qca = serdev_device_get_drvdata(serdev);
>>> +
>>> +   netif_carrier_off(qca->net_dev);
>>> +   cancel_work_sync(&qca->tx_work);
>>> +   unregister_netdev(qca->net_dev);
>>
>> Note that it is still possible that the tx work is queued right after
>> cancel_work_sync() returned and before the net device is unregistered
>> (and thus the check for the net device being up at the beginning of the
>> tx work function is passed and the function is executed).
> 
> Even if the carrier is off? Since I see this pattern in some drivers, can
> you please point me to a reference like a thread or something else?
> 

The check in the tx work function is against the "running" state, not against
the carrier. So why should the carrier matter in this case?


>> I suggest to avoid this possible race by first unregistering the netdevice
>> and then calling cancel_work_sync().
> 
> What makes you sure that's safe to unregister the netdev while the tx work
> queue is possibly active?

unregister_netdevice() calls netdev_close() if the interface is still up.
netdev_close() calls flush_work(), so the unregistration is delayed until the
tx work function is finished. Furthermore, both close() and tx work are
synchronized by means of the qca->lock, which also guarantees that
unregister_netdevice() won't be finished until the tx work is done.

But I may have missed something, and if unregistering the device while the tx
work could be running worries you, we could first close and later unregister
the device like in the following sequence:


dev_close(qca->net_dev);
/* The tx work won't be scheduled any more now; however, we have to wait
   for a potentially earlier scheduled work. */
cancel_work_sync(&qca->tx_work);
/* We can be sure that the tx work will neither be running nor be started
   again, so it is safe to unregister the netdev. */
unregister_netdev(qca->net_dev);
serdev_device_close(serdev);
free_netdev(qca->net_dev);
 
What do you think?

Regards,
Lino

[PATCH v2 net-next 0/2] Start making stats consistent

2017-05-23 Thread Andrew Lunn

It does not appear that the sysfs class net stats have been clearly
defined, resulting in different network interfaces implementing them
slightly differently. Define that the rx_bytes and tx_bytes counters
should include the Ethernet header, data and FCS. Modify the r8152 USB
Ethernet driver to include the FCS in its counters.

Andrew Lunn (2):
  Documentation: sysfs-class-net-statistics: Clarify rx_bytes and
tx_bytes
  net: usb: r8152: Fix rx_bytes/tx_bytes to include FCS

 Documentation/ABI/testing/sysfs-class-net-statistics | 6 --
 drivers/net/usb/r8152.c  | 5 ++---
 2 files changed, 6 insertions(+), 5 deletions(-)

-- 
2.11.0

[PATCH v2 net-next 1/2] Documentation: sysfs-class-net-statistics: Clarify rx_bytes and tx_bytes

2017-05-23 Thread Andrew Lunn

Document what is expected for the rx_bytes and tx_bytes statistics in
/sys/class/net//statistics. The FCS should be included in the
statistics. However, since this has been unclear until now, it is
expected a number of drivers don't. But maybe with time they will.

Signed-off-by: Andrew Lunn 
---
 Documentation/ABI/testing/sysfs-class-net-statistics | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-class-net-statistics 
b/Documentation/ABI/testing/sysfs-class-net-statistics
index 397118de7b5e..a487cbb79458 100644
--- a/Documentation/ABI/testing/sysfs-class-net-statistics
+++ b/Documentation/ABI/testing/sysfs-class-net-statistics
@@ -21,7 +21,8 @@ Contact:  netdev@vger.kernel.org
 Description:
Indicates the number of bytes received by this network device.
See the network driver for the exact meaning of when this
-   value is incremented.
+   value is incremented. However, for an Ethernet frame, it should
+   include the Ethernet header, data, and frame check sum.
 
 What:  /sys/class//statistics/rx_compressed
 Date:  April 2005
@@ -125,7 +126,8 @@ Description:
device. See the network driver for the exact meaning of this
value, in particular whether this accounts for all successfully
transmitted packets or all packets that have been queued for
-   transmission.
+   transmission. For an Ethernet frame, it should include the
+   Ethernet header, data, and frame check sum.
 
 What:  /sys/class//statistics/tx_carrier_errors
 Date:  April 2005
-- 
2.11.0

[PATCH v2 net-next 2/2] net: usb: r8152: Fix rx_bytes/tx_bytes to include FCS

2017-05-23 Thread Andrew Lunn

The statistics counters rx_bytes and tx_bytes don't include the
Ethernet Frame Check Sequence. Fix the rx_bytes/tx_bytes counters to
include the FCS, and include the received FCS in the skb.
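
As a worked example of the accounting rule, the sketch below computes the bytes to count for a frame when the header, data and FCS are all included. It assumes untagged Ethernet frames (14-byte header, 4-byte FCS). The function names are hypothetical and the code is independent of the r8152 driver. The batch helper adds the FCS once per frame, which is what the rule implies if several frames are accounted together.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HDR_LEN 14  /* destination MAC + source MAC + EtherType */
#define ETH_FCS_LEN 4   /* frame check sequence (CRC32) */

/* Bytes counted for one untagged frame: header + data + FCS. */
size_t frame_bytes_with_fcs(size_t data_len)
{
    return ETH_HDR_LEN + data_len + ETH_FCS_LEN;
}

/* Bytes counted for a batch of frames whose lengths include the header
 * but exclude the FCS. The FCS is added once per frame. */
size_t batch_bytes_with_fcs(const size_t *len_no_fcs, size_t nframes)
{
    size_t total = 0;
    size_t i;

    assert(len_no_fcs || nframes == 0);
    for (i = 0; i < nframes; i++)
        total += len_no_fcs[i] + ETH_FCS_LEN;

    return total;
}
```

A minimum-size frame with 46 bytes of data therefore counts as 64 bytes, and a full 1500-byte payload as 1518 bytes.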

Signed-off-by: Andrew Lunn 
---
 drivers/net/usb/r8152.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/r8152.c b/drivers/net/usb/r8152.c
index ddc62cb69be8..4081a2cd8b1b 100644
--- a/drivers/net/usb/r8152.c
+++ b/drivers/net/usb/r8152.c
@@ -1228,7 +1228,7 @@ static void write_bulk_callback(struct urb *urb)
stats->tx_errors += agg->skb_num;
} else {
stats->tx_packets += agg->skb_num;
-   stats->tx_bytes += agg->skb_len;
+   stats->tx_bytes += agg->skb_len + CRC_SIZE;
}
 
spin_lock(&tp->tx_lock);
@@ -1826,7 +1826,6 @@ static int rx_bottom(struct r8152 *tp, int budget)
if (urb->actual_length < len_used)
break;
 
-   pkt_len -= CRC_SIZE;
rx_data += sizeof(struct rx_desc);
 
skb = napi_alloc_skb(napi, pkt_len);
@@ -1850,7 +1849,7 @@ static int rx_bottom(struct r8152 *tp, int budget)
}
 
 find_next_rx:
-   rx_data = rx_agg_align(rx_data + pkt_len + CRC_SIZE);
+   rx_data = rx_agg_align(rx_data + pkt_len);
rx_desc = (struct rx_desc *)rx_data;
len_used = (int)(rx_data - (u8 *)agg->head);
len_used += sizeof(struct rx_desc);
-- 
2.11.0

[PATCHv2 net-next 1/5] net: dsa: mv88e6xxx: Move phy functions into phy.[ch]

2017-05-23 Thread Andrew Lunn

The upcoming SERDES support will need to make use of PHY functions. Move
them out into a file of their own. No code changes.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/Makefile|   1 +
 drivers/net/dsa/mv88e6xxx/chip.c  | 221 +---
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |   2 +-
 drivers/net/dsa/mv88e6xxx/phy.c   | 235 ++
 drivers/net/dsa/mv88e6xxx/phy.h   |  40 ++
 5 files changed, 279 insertions(+), 220 deletions(-)
 create mode 100644 drivers/net/dsa/mv88e6xxx/phy.c
 create mode 100644 drivers/net/dsa/mv88e6xxx/phy.h

diff --git a/drivers/net/dsa/mv88e6xxx/Makefile 
b/drivers/net/dsa/mv88e6xxx/Makefile
index 6edd869c8d6f..e4372eaf3bc5 100644
--- a/drivers/net/dsa/mv88e6xxx/Makefile
+++ b/drivers/net/dsa/mv88e6xxx/Makefile
@@ -4,4 +4,5 @@ mv88e6xxx-objs += global1.o
 mv88e6xxx-objs += global1_atu.o
 mv88e6xxx-objs += global1_vtu.o
 mv88e6xxx-$(CONFIG_NET_DSA_MV88E6XXX_GLOBAL2) += global2.o
+mv88e6xxx-objs += phy.o
 mv88e6xxx-objs += port.o
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 41de250dbcc3..a3c7756dc01b 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -36,6 +36,7 @@
 #include "mv88e6xxx.h"
 #include "global1.h"
 #include "global2.h"
+#include "phy.h"
 #include "port.h"
 
 static void assert_reg_lock(struct mv88e6xxx_chip *chip)
@@ -221,21 +222,7 @@ int mv88e6xxx_write(struct mv88e6xxx_chip *chip, int addr, 
int reg, u16 val)
return 0;
 }
 
-static int mv88e6165_phy_read(struct mv88e6xxx_chip *chip,
- struct mii_bus *bus,
- int addr, int reg, u16 *val)
-{
-   return mv88e6xxx_read(chip, addr, reg, val);
-}
-
-static int mv88e6165_phy_write(struct mv88e6xxx_chip *chip,
-  struct mii_bus *bus,
-  int addr, int reg, u16 val)
-{
-   return mv88e6xxx_write(chip, addr, reg, val);
-}
-
-static struct mii_bus *mv88e6xxx_default_mdio_bus(struct mv88e6xxx_chip *chip)
+struct mii_bus *mv88e6xxx_default_mdio_bus(struct mv88e6xxx_chip *chip)
 {
struct mv88e6xxx_mdio_bus *mdio_bus;
 
@@ -247,94 +234,6 @@ static struct mii_bus *mv88e6xxx_default_mdio_bus(struct 
mv88e6xxx_chip *chip)
return mdio_bus->bus;
 }
 
-static int mv88e6xxx_phy_read(struct mv88e6xxx_chip *chip, int phy,
- int reg, u16 *val)
-{
-   int addr = phy; /* PHY devices addresses start at 0x0 */
-   struct mii_bus *bus;
-
-   bus = mv88e6xxx_default_mdio_bus(chip);
-   if (!bus)
-   return -EOPNOTSUPP;
-
-   if (!chip->info->ops->phy_read)
-   return -EOPNOTSUPP;
-
-   return chip->info->ops->phy_read(chip, bus, addr, reg, val);
-}
-
-static int mv88e6xxx_phy_write(struct mv88e6xxx_chip *chip, int phy,
-  int reg, u16 val)
-{
-   int addr = phy; /* PHY devices addresses start at 0x0 */
-   struct mii_bus *bus;
-
-   bus = mv88e6xxx_default_mdio_bus(chip);
-   if (!bus)
-   return -EOPNOTSUPP;
-
-   if (!chip->info->ops->phy_write)
-   return -EOPNOTSUPP;
-
-   return chip->info->ops->phy_write(chip, bus, addr, reg, val);
-}
-
-static int mv88e6xxx_phy_page_get(struct mv88e6xxx_chip *chip, int phy, u8 
page)
-{
-   if (!mv88e6xxx_has(chip, MV88E6XXX_FLAG_PHY_PAGE))
-   return -EOPNOTSUPP;
-
-   return mv88e6xxx_phy_write(chip, phy, PHY_PAGE, page);
-}
-
-static void mv88e6xxx_phy_page_put(struct mv88e6xxx_chip *chip, int phy)
-{
-   int err;
-
-   /* Restore PHY page Copper 0x0 for access via the registered MDIO bus */
-   err = mv88e6xxx_phy_write(chip, phy, PHY_PAGE, PHY_PAGE_COPPER);
-   if (unlikely(err)) {
-   dev_err(chip->dev, "failed to restore PHY %d page Copper 
(%d)\n",
-   phy, err);
-   }
-}
-
-static int mv88e6xxx_phy_page_read(struct mv88e6xxx_chip *chip, int phy,
-  u8 page, int reg, u16 *val)
-{
-   int err;
-
-   /* There is no paging for registers 22 */
-   if (reg == PHY_PAGE)
-   return -EINVAL;
-
-   err = mv88e6xxx_phy_page_get(chip, phy, page);
-   if (!err) {
-   err = mv88e6xxx_phy_read(chip, phy, reg, val);
-   mv88e6xxx_phy_page_put(chip, phy);
-   }
-
-   return err;
-}
-
-static int mv88e6xxx_phy_page_write(struct mv88e6xxx_chip *chip, int phy,
-   u8 page, int reg, u16 val)
-{
-   int err;
-
-   /* There is no paging for registers 22 */
-   if (reg == PHY_PAGE)
-   return -EINVAL;
-
-   err = mv88e6xxx_phy_page_get(chip, phy, page);
-   if (!err) {
-   err = mv88e6xxx_phy_write(chip, phy, PHY_PAGE, page);
-   mv88e6xxx_phy_page_put(chip, phy);
-   }
-
-   return err;
-}
-
 static int mv88e6xxx_se

Re: [PATCH net-next] net: dsa: support cross-chip ageing time

2017-05-23 Thread Florian Fainelli

On 05/23/2017 12:20 PM, Vivien Didelot wrote:
> Now that the switchdev bridge ageing time attribute is propagated to all
> switch chips of the fabric, each switch can check if the requested value
> is valid and program itself, so that the whole fabric shares a common
> ageing time setting.
> 
> This is especially needed for switch chips in between others, containing
> no bridge port members but evidently used in the data path.
> 
> To achieve that, remove the condition which skips the other switches. We
> also don't need to identify the target switch anymore, thus remove the
> sw_index member of the dsa_notifier_ageing_time_info notifier structure.
> 
> On ZII Dev Rev B (with two 88E6352 and one 88E6185) and ZII Dev Rev C
> (with two 88E6390X), we have the following hardware configuration:
> 
> # ip link add name br0 type bridge
> # ip link set master br0 dev lan6
> br0: port 1(lan6) entered blocking state
> br0: port 1(lan6) entered disabled state
> # echo 2000 > /sys/class/net/br0/bridge/ageing_time
> 
> Before this patch:
> 
> zii-rev-b# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
> 30
> 30
> 15000
> 
> zii-rev-c# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
> 30
> 18750
> 
> After this patch:
> 
> zii-rev-b# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
> 15000
> 15000
> 15000
> 
> zii-rev-c# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
> 18750
> 18750
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian
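
The before/after values quoted above are consistent with each chip rounding the requested time to a multiple of its own step. The sketch below reproduces the numbers under two assumptions that are not stated in the patch: that the value written to the bridge sysfs file is in centiseconds (so 2000 means 20000 ms), and that the steps are 15000 ms and 3750 ms, which is inferred only from the printed age_time values. round_age_time() is a hypothetical helper, not a driver function.

```c
#include <assert.h>

/* Round a requested ageing time to the nearest multiple of a chip's
 * step. Both values are in milliseconds. */
unsigned int round_age_time(unsigned int msecs, unsigned int step_ms)
{
    assert(step_ms > 0);
    return ((msecs + step_ms / 2) / step_ms) * step_ms;
}
```

Under those assumptions, round_age_time(20000, 15000) gives 15000 and round_age_time(20000, 3750) gives 18750, matching the values shown after the patch.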

[PATCHv2 net-next 0/5] net: dsa: mv88e6xxx: Add basic SERDES support

2017-05-23 Thread Andrew Lunn

Some of the Marvell switches have SERDES interfaces, which must be
powered up before packets can be passed. This is particularly true on
the 6390, where the SERDES defaults to down, probably to save power.

This series refactors the existing SERDES support for the 6352, and
adds 6390 support.

v2:

Split phy functions out into phy.[ch]
Don't add MV88E6XXX_FLAG_G1_ATU_FID back again
Move the serdes op up in mv88e6xxx_ops
Move some #defines into serdes.h
Add a mv88e6xxx_serdes_power()
Don't keep moving calls to this helper around in the code

Andrew Lunn (5):
  net: dsa: mv88e6xxx: Move phy functions into phy.[ch]
  net: dsa: mv88e6xxx: Refactor mv88e6352 SERDES code into an op
  net: dsa: mv88e6xxx: Remove SERDES flag
  net: dsa: mv88e6xxx: mv88e6390X SERDES support
  dsa: mv88e6xxx: Enable/Disable SERDES on port enable/disable

 drivers/net/dsa/mv88e6xxx/Makefile|   2 +
 drivers/net/dsa/mv88e6xxx/chip.c  | 315 ++
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  31 +---
 drivers/net/dsa/mv88e6xxx/phy.c   | 232 +
 drivers/net/dsa/mv88e6xxx/phy.h   |  40 +
 drivers/net/dsa/mv88e6xxx/serdes.c| 229 
 drivers/net/dsa/mv88e6xxx/serdes.h|  48 ++
 7 files changed, 611 insertions(+), 286 deletions(-)
 create mode 100644 drivers/net/dsa/mv88e6xxx/phy.c
 create mode 100644 drivers/net/dsa/mv88e6xxx/phy.h
 create mode 100644 drivers/net/dsa/mv88e6xxx/serdes.c
 create mode 100644 drivers/net/dsa/mv88e6xxx/serdes.h

-- 
2.11.0

[PATCHv2 net-next 5/5] dsa: mv88e6xxx: Enable/Disable SERDES on port enable/disable

2017-05-23 Thread Andrew Lunn

Implement the port enable/disable callbacks, which enable/disable the
SERDES interfaces, if applicable. This should save a bit of
power/heat.

We also need to enable SERDES on CPU and DSA ports, so keep the
existing call to the op, but make it conditional.
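
The "if applicable" part relies on the op pointer being optional: chips without a SERDES leave it unset and the callbacks become no-ops. A minimal userspace sketch of that dispatch pattern follows. The mock_* types and functions and the port count are hypothetical, and the locking done by the real callbacks is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_NUM_PORTS 11   /* hypothetical port count */

struct mock_chip;

/* Per-chip operations; serdes_power is NULL on chips without a SERDES. */
struct mock_ops {
    int (*serdes_power)(struct mock_chip *chip, int port, bool on);
};

struct mock_chip {
    const struct mock_ops *ops;
    bool serdes_on[MOCK_NUM_PORTS];  /* mock per-port power state */
};

int mock_serdes_power(struct mock_chip *chip, int port, bool on)
{
    if (port < 0 || port >= MOCK_NUM_PORTS)
        return -1;
    chip->serdes_on[port] = on;
    return 0;
}

/* Power up the SERDES when the port is enabled, if the chip has the op. */
int mock_port_enable(struct mock_chip *chip, int port)
{
    int err = 0;

    assert(chip && chip->ops);
    if (chip->ops->serdes_power)
        err = chip->ops->serdes_power(chip, port, true);

    return err;
}

/* Power down the SERDES when the port is disabled; errors are ignored
 * because the disable callback returns void. */
void mock_port_disable(struct mock_chip *chip, int port)
{
    assert(chip && chip->ops);
    if (chip->ops->serdes_power)
        chip->ops->serdes_power(chip, port, false);
}

const struct mock_ops mock_ops_with_serdes = { .serdes_power = mock_serdes_power };
const struct mock_ops mock_ops_without_serdes = { .serdes_power = NULL };
```

Keeping the NULL check inside the enable/disable wrappers means callers do not need to know which chip families implement the op.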

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 40 +++-
 1 file changed, 35 insertions(+), 5 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index fc4d93b5c59f..edab7eb306d9 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -1862,12 +1862,15 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip 
*chip, int port)
if (err)
return err;
 
-   /* If this port is connected to a SerDes, make sure the SerDes is
-* powered up.
+   /* Enable the SERDES interface for DSA and CPU ports. Normal
+* ports SERDES are enabled when the port is enabled, thus
+* saving a bit of power.
 */
-   err = mv88e6xxx_serdes_power(chip, port, true);
-   if (err)
-   return err;
+   if ((dsa_is_cpu_port(ds, port) || dsa_is_dsa_port(ds, port))) {
+   err = mv88e6xxx_serdes_power(chip, port, true);
+   if (err)
+   return err;
+   }
 
/* Port Control 2: don't force a good FCS, set the maximum frame size to
 * 10240 bytes, disable 802.1q tags checking, don't discard tagged or
@@ -1969,6 +1972,31 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip 
*chip, int port)
return mv88e6xxx_port_write(chip, port, PORT_DEFAULT_VLAN, 0x);
 }
 
+static int mv88e6xxx_port_enable(struct dsa_switch *ds, int port,
+struct phy_device *phydev)
+{
+   struct mv88e6xxx_chip *chip = ds->priv;
+   int err = 0;
+
+   mutex_lock(&chip->reg_lock);
+   if (chip->info->ops->serdes_power)
+   err = chip->info->ops->serdes_power(chip, port, true);
+   mutex_unlock(&chip->reg_lock);
+
+   return err;
+}
+
+static void mv88e6xxx_port_disable(struct dsa_switch *ds, int port,
+  struct phy_device *phydev)
+{
+   struct mv88e6xxx_chip *chip = ds->priv;
+
+   mutex_lock(&chip->reg_lock);
+   if (chip->info->ops->serdes_power)
+   chip->info->ops->serdes_power(chip, port, false);
+   mutex_unlock(&chip->reg_lock);
+}
+
 static int mv88e6xxx_g1_set_switch_mac(struct mv88e6xxx_chip *chip, u8 *addr)
 {
int err;
@@ -3821,6 +3849,8 @@ static const struct dsa_switch_ops mv88e6xxx_switch_ops = 
{
.get_strings= mv88e6xxx_get_strings,
.get_ethtool_stats  = mv88e6xxx_get_ethtool_stats,
.get_sset_count = mv88e6xxx_get_sset_count,
+   .port_enable= mv88e6xxx_port_enable,
+   .port_disable   = mv88e6xxx_port_disable,
.set_eee= mv88e6xxx_set_eee,
.get_eee= mv88e6xxx_get_eee,
.get_eeprom_len = mv88e6xxx_get_eeprom_len,
-- 
2.11.0

[PATCHv2 net-next 4/5] net: dsa: mv88e6xxx: mv88e6390X SERDES support

2017-05-23 Thread Andrew Lunn

The mv88e6390X family has 8 SERDES lanes. These can be used for the two
10Gbps ports, ports 9 and 10. If these ports are used at slower speeds,
the SERDES lanes become available for other ports for 1000Base-X.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/chip.c   |   6 ++
 drivers/net/dsa/mv88e6xxx/serdes.c | 154 +
 drivers/net/dsa/mv88e6xxx/serdes.h |  24 ++
 3 files changed, 184 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index b029d01a0a5f..fc4d93b5c59f 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -2757,6 +2757,7 @@ static const struct mv88e6xxx_ops mv88e6190_ops = {
.reset = mv88e6352_g1_reset,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+   .serdes_power = mv88e6390_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6190x_ops = {
@@ -2789,6 +2790,7 @@ static const struct mv88e6xxx_ops mv88e6190x_ops = {
.reset = mv88e6352_g1_reset,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+   .serdes_power = mv88e6390_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6191_ops = {
@@ -2821,6 +2823,7 @@ static const struct mv88e6xxx_ops mv88e6191_ops = {
.reset = mv88e6352_g1_reset,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+   .serdes_power = mv88e6390_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6240_ops = {
@@ -2888,6 +2891,7 @@ static const struct mv88e6xxx_ops mv88e6290_ops = {
.reset = mv88e6352_g1_reset,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+   .serdes_power = mv88e6390_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6320_ops = {
@@ -3113,6 +3117,7 @@ static const struct mv88e6xxx_ops mv88e6390_ops = {
.reset = mv88e6352_g1_reset,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+   .serdes_power = mv88e6390_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6390x_ops = {
@@ -3147,6 +3152,7 @@ static const struct mv88e6xxx_ops mv88e6390x_ops = {
.reset = mv88e6352_g1_reset,
.vtu_getnext = mv88e6390_g1_vtu_getnext,
.vtu_loadpurge = mv88e6390_g1_vtu_loadpurge,
+   .serdes_power = mv88e6390_serdes_power,
 };
 
 static const struct mv88e6xxx_info mv88e6xxx_table[] = {
diff --git a/drivers/net/dsa/mv88e6xxx/serdes.c 
b/drivers/net/dsa/mv88e6xxx/serdes.c
index 235f5f0c30ae..53795676bd70 100644
--- a/drivers/net/dsa/mv88e6xxx/serdes.c
+++ b/drivers/net/dsa/mv88e6xxx/serdes.c
@@ -13,6 +13,7 @@
 
 #include 
 
+#include "global2.h"
 #include "mv88e6xxx.h"
 #include "phy.h"
 #include "port.h"
@@ -73,3 +74,156 @@ int mv88e6352_serdes_power(struct mv88e6xxx_chip *chip, int 
port, bool on)
 
return 0;
 }
+
+/* Set the power on/off for 10GBASE-R and 10GBASE-X4/X2 */
+static int mv88e6390_serdes_10g(struct mv88e6xxx_chip *chip, int addr, bool on)
+{
+   u16 val, new_val;
+   int reg_c45;
+   int err;
+
+   reg_c45 = MII_ADDR_C45 | MV88E6390_SERDES_DEVICE |
+   MV88E6390_PCS_CONTROL_1;
+   err = mv88e6xxx_phy_read(chip, addr, reg_c45, &val);
+   if (err)
+   return err;
+
+   if (on)
+   new_val = val & ~(MV88E6390_PCS_CONTROL_1_RESET |
+ MV88E6390_PCS_CONTROL_1_LOOPBACK |
+ MV88E6390_PCS_CONTROL_1_PDOWN);
+   else
+   new_val = val | MV88E6390_PCS_CONTROL_1_PDOWN;
+
+   if (val != new_val)
+   err = mv88e6xxx_phy_write(chip, addr, reg_c45, new_val);
+
+   return err;
+}
+
+/* Set the power on/off for SGMII and 1000Base-X */
+static int mv88e6390_serdes_sgmii(struct mv88e6xxx_chip *chip, int addr,
+ bool on)
+{
+   u16 val, new_val;
+   int reg_c45;
+   int err;
+
+   reg_c45 = MII_ADDR_C45 | MV88E6390_SERDES_DEVICE |
+   MV88E6390_SGMII_CONTROL;
+   err = mv88e6xxx_phy_read(chip, addr, reg_c45, &val);
+   if (err)
+   return err;
+
+   if (on)
+   new_val = val & ~(MV88E6390_SGMII_CONTROL_RESET |
+ MV88E6390_SGMII_CONTROL_LOOPBACK |
+ MV88E6390_SGMII_CONTROL_PDOWN);
+   else
+   new_val = val | MV88E6390_SGMII_CONTROL_PDOWN;
+
+   if (val != new_val)
+   err = mv88e6xxx_phy_write(chip, addr, reg_c45, new_val);
+
+   return err;
+}
+
+static int mv88e6390_serdes_lower(struct mv88e6xxx_chip *chip, u8 cmode,
+ int port_donor, int lane, bool rxaui, bool on)
+{
+   int err;
+   u8 cmode_donor;
+
+   err = mv88e6xxx_port_get_cmode(chip, port_donor, &cm

[PATCHv2 net-next 3/5] net: dsa: mv88e6xxx: Remove SERDES flag

2017-05-23 Thread Andrew Lunn

Now that we use an op for SERDES operations, we don't need a flag for
it. Remove it.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h | 23 ++-
 drivers/net/dsa/mv88e6xxx/phy.c   |  3 ---
 2 files changed, 2 insertions(+), 24 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h 
b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
index fb996491b111..9087cb009cc3 100644
--- a/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
+++ b/drivers/net/dsa/mv88e6xxx/mv88e6xxx.h
@@ -508,14 +508,6 @@ enum mv88e6xxx_cap {
MV88E6XXX_CAP_SMI_CMD,  /* (0x00) SMI Command */
MV88E6XXX_CAP_SMI_DATA, /* (0x01) SMI Data */
 
-   /* PHY Registers.
-*/
-   MV88E6XXX_CAP_PHY_PAGE, /* (0x16) Page Register */
-
-   /* Fiber/SERDES Registers (SMI address F).
-*/
-   MV88E6XXX_CAP_SERDES,
-
/* Switch Global (1) Registers.
 */
MV88E6XXX_CAP_G1_ATU_FID,   /* (0x01) ATU FID Register */
@@ -550,10 +542,6 @@ enum mv88e6xxx_cap {
 #define MV88E6XXX_FLAG_SMI_CMD BIT_ULL(MV88E6XXX_CAP_SMI_CMD)
 #define MV88E6XXX_FLAG_SMI_DATABIT_ULL(MV88E6XXX_CAP_SMI_DATA)
 
-#define MV88E6XXX_FLAG_PHY_PAGEBIT_ULL(MV88E6XXX_CAP_PHY_PAGE)
-
-#define MV88E6XXX_FLAG_SERDES  BIT_ULL(MV88E6XXX_CAP_SERDES)
-
 #define MV88E6XXX_FLAG_G1_VTU_FID  BIT_ULL(MV88E6XXX_CAP_G1_VTU_FID)
 
 #define MV88E6XXX_FLAG_GLOBAL2 BIT_ULL(MV88E6XXX_CAP_GLOBAL2)
@@ -574,11 +562,6 @@ enum mv88e6xxx_cap {
(MV88E6XXX_FLAG_SMI_CMD |   \
 MV88E6XXX_FLAG_SMI_DATA)
 
-/* Fiber/SERDES Registers at SMI address F, page 1 */
-#define MV88E6XXX_FLAGS_SERDES \
-   (MV88E6XXX_FLAG_PHY_PAGE |  \
-MV88E6XXX_FLAG_SERDES)
-
 #define MV88E6XXX_FLAGS_FAMILY_6095\
(MV88E6XXX_FLAG_GLOBAL2 |   \
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
@@ -626,8 +609,7 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAG_G2_INT |\
 MV88E6XXX_FLAG_G2_POT |\
 MV88E6XXX_FLAGS_IRL |  \
-MV88E6XXX_FLAGS_MULTI_CHIP |   \
-MV88E6XXX_FLAGS_SERDES)
+MV88E6XXX_FLAGS_MULTI_CHIP)
 
 #define MV88E6XXX_FLAGS_FAMILY_6351\
(MV88E6XXX_FLAG_G1_VTU_FID |\
@@ -648,8 +630,7 @@ enum mv88e6xxx_cap {
 MV88E6XXX_FLAG_G2_MGMT_EN_0X | \
 MV88E6XXX_FLAG_G2_POT |\
 MV88E6XXX_FLAGS_IRL |  \
-MV88E6XXX_FLAGS_MULTI_CHIP |   \
-MV88E6XXX_FLAGS_SERDES)
+MV88E6XXX_FLAGS_MULTI_CHIP)
 
 #define MV88E6XXX_FLAGS_FAMILY_6390\
(MV88E6XXX_FLAG_EEE |   \
diff --git a/drivers/net/dsa/mv88e6xxx/phy.c b/drivers/net/dsa/mv88e6xxx/phy.c
index 2ebac599a174..cd8e0b329cd6 100644
--- a/drivers/net/dsa/mv88e6xxx/phy.c
+++ b/drivers/net/dsa/mv88e6xxx/phy.c
@@ -62,9 +62,6 @@ int mv88e6xxx_phy_write(struct mv88e6xxx_chip *chip, int phy, int reg, u16 val)
 
 int mv88e6xxx_phy_page_get(struct mv88e6xxx_chip *chip, int phy, u8 page)
 {
-   if (!mv88e6xxx_has(chip, MV88E6XXX_FLAG_PHY_PAGE))
-   return -EOPNOTSUPP;
-
return mv88e6xxx_phy_write(chip, phy, PHY_PAGE, page);
 }
 
-- 
2.11.0

[PATCHv2 net-next 2/5] net: dsa: mv88e6xxx: Refactor mv88e6352 SERDES code into an op

2017-05-23 Thread Andrew Lunn

The mv88e6390 family has a different SERDES implementation. Refactor
the mv88e6352 code into an ops function, so we can later add the
mv88e6390 code.

Signed-off-by: Andrew Lunn 
---
 drivers/net/dsa/mv88e6xxx/Makefile|  1 +
 drivers/net/dsa/mv88e6xxx/chip.c  | 64 +-
 drivers/net/dsa/mv88e6xxx/mv88e6xxx.h |  6 +--
 drivers/net/dsa/mv88e6xxx/serdes.c| 75 +++
 drivers/net/dsa/mv88e6xxx/serdes.h| 24 +++
 5 files changed, 122 insertions(+), 48 deletions(-)
 create mode 100644 drivers/net/dsa/mv88e6xxx/serdes.c
 create mode 100644 drivers/net/dsa/mv88e6xxx/serdes.h

diff --git a/drivers/net/dsa/mv88e6xxx/Makefile b/drivers/net/dsa/mv88e6xxx/Makefile
index e4372eaf3bc5..5cd5551461e3 100644
--- a/drivers/net/dsa/mv88e6xxx/Makefile
+++ b/drivers/net/dsa/mv88e6xxx/Makefile
@@ -6,3 +6,4 @@ mv88e6xxx-objs += global1_vtu.o
 mv88e6xxx-$(CONFIG_NET_DSA_MV88E6XXX_GLOBAL2) += global2.o
 mv88e6xxx-objs += phy.o
 mv88e6xxx-objs += port.o
+mv88e6xxx-objs += serdes.o
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index a3c7756dc01b..b029d01a0a5f 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -38,6 +38,7 @@
 #include "global2.h"
 #include "phy.h"
 #include "port.h"
+#include "serdes.h"
 
 static void assert_reg_lock(struct mv88e6xxx_chip *chip)
 {
@@ -234,18 +235,6 @@ struct mii_bus *mv88e6xxx_default_mdio_bus(struct mv88e6xxx_chip *chip)
return mdio_bus->bus;
 }
 
-static int mv88e6xxx_serdes_read(struct mv88e6xxx_chip *chip, int reg, u16 *val)
-{
-   return mv88e6xxx_phy_page_read(chip, ADDR_SERDES, SERDES_PAGE_FIBER,
-  reg, val);
-}
-
-static int mv88e6xxx_serdes_write(struct mv88e6xxx_chip *chip, int reg, u16 val)
-{
-   return mv88e6xxx_phy_page_write(chip, ADDR_SERDES, SERDES_PAGE_FIBER,
-   reg, val);
-}
-
 static void mv88e6xxx_g1_irq_mask(struct irq_data *d)
 {
struct mv88e6xxx_chip *chip = irq_data_get_irq_chip_data(d);
@@ -1733,24 +1722,6 @@ static int mv88e6xxx_switch_reset(struct mv88e6xxx_chip *chip)
return mv88e6xxx_software_reset(chip);
 }
 
-static int mv88e6xxx_serdes_power_on(struct mv88e6xxx_chip *chip)
-{
-   u16 val;
-   int err;
-
-   /* Clear Power Down bit */
-   err = mv88e6xxx_serdes_read(chip, MII_BMCR, &val);
-   if (err)
-   return err;
-
-   if (val & BMCR_PDOWN) {
-   val &= ~BMCR_PDOWN;
-   err = mv88e6xxx_serdes_write(chip, MII_BMCR, val);
-   }
-
-   return err;
-}
-
 static int mv88e6xxx_set_port_mode(struct mv88e6xxx_chip *chip, int port,
   enum mv88e6xxx_frame_mode frame, u16 egress,
   u16 etype)
@@ -1832,6 +1803,15 @@ static int mv88e6xxx_setup_egress_floods(struct mv88e6xxx_chip *chip, int port)
return 0;
 }
 
+static int mv88e6xxx_serdes_power(struct mv88e6xxx_chip *chip, int port,
+ bool on)
+{
+   if (chip->info->ops->serdes_power)
+   return chip->info->ops->serdes_power(chip, port, on);
+
+   return 0;
+}
+
 static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
 {
struct dsa_switch *ds = chip->ds;
@@ -1882,22 +1862,12 @@ static int mv88e6xxx_setup_port(struct mv88e6xxx_chip *chip, int port)
if (err)
return err;
 
-   /* If this port is connected to a SerDes, make sure the SerDes is not
-* powered down.
+   /* If this port is connected to a SerDes, make sure the SerDes is
+* powered up.
 */
-   if (mv88e6xxx_has(chip, MV88E6XXX_FLAGS_SERDES)) {
-   err = mv88e6xxx_port_read(chip, port, PORT_STATUS, &reg);
-   if (err)
-   return err;
-   reg &= PORT_STATUS_CMODE_MASK;
-   if ((reg == PORT_STATUS_CMODE_100BASE_X) ||
-   (reg == PORT_STATUS_CMODE_1000BASE_X) ||
-   (reg == PORT_STATUS_CMODE_SGMII)) {
-   err = mv88e6xxx_serdes_power_on(chip);
-   if (err < 0)
-   return err;
-   }
-   }
+   err = mv88e6xxx_serdes_power(chip, port, true);
+   if (err)
+   return err;
 
/* Port Control 2: don't force a good FCS, set the maximum frame size to
 * 10240 bytes, disable 802.1q tags checking, don't discard tagged or
@@ -2662,6 +2632,7 @@ static const struct mv88e6xxx_ops mv88e6172_ops = {
.reset = mv88e6352_g1_reset,
.vtu_getnext = mv88e6352_g1_vtu_getnext,
.vtu_loadpurge = mv88e6352_g1_vtu_loadpurge,
+   .serdes_power = mv88e6352_serdes_power,
 };
 
 static const struct mv88e6xxx_ops mv88e6175_ops = {
@@ -2726,6 +2697,7 @@ static const struct mv88e6xxx_ops mv88e6176_ops = {
.reset = mv88e6352_g1_reset,
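
The optional-op dispatch in mv88e6xxx_serdes_power() above can be tried out in user space. What follows is a minimal sketch, not the driver's code: struct chip, struct chip_ops, demo_serdes_power() and the in-memory "register" are all made up for illustration. Only the BMCR_PDOWN bit value and the "call the op if present, else return 0" shape come from the patch and from linux/mii.h.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BMCR_PDOWN 0x0800 /* power-down bit of the MII control register */

struct chip;

/* Per-family operations; a family without a SERDES leaves the op NULL. */
struct chip_ops {
	int (*serdes_power)(struct chip *chip, int port, bool on);
};

/* Hypothetical chip: the register is just a variable here. */
struct chip {
	const struct chip_ops *ops;
	uint16_t bmcr;	/* stand-in for the SERDES control register */
	int writes;	/* counts register writes, for illustration */
};

/* Read-modify-write that only writes back when the bit actually changes. */
static int demo_serdes_power(struct chip *chip, int port, bool on)
{
	uint16_t val = chip->bmcr;
	uint16_t new_val = on ? (uint16_t)(val & ~BMCR_PDOWN)
			      : (uint16_t)(val | BMCR_PDOWN);

	(void)port; /* a real implementation would pick the lane from the port */
	if (new_val != val) {
		chip->bmcr = new_val;
		chip->writes++;
	}
	return 0;
}

/* Generic wrapper: call the op if the family provides one, else succeed. */
static int chip_serdes_power(struct chip *chip, int port, bool on)
{
	if (chip->ops->serdes_power)
		return chip->ops->serdes_power(chip, port, on);
	return 0;
}

static const struct chip_ops ops_with_serdes = {
	.serdes_power = demo_serdes_power,
};

static const struct chip_ops ops_without_serdes = {
	.serdes_power = NULL,
};
```

With this shape the caller does not need a capability flag: a chip whose ops table has no serdes_power simply gets a successful no-op.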

Re: [PATCH v2 0/4] arp: always override existing neigh entries with gratuitous ARP

2017-05-23 Thread Arnd Bergmann

On Thu, May 18, 2017 at 9:41 PM, Ihar Hrachyshka  wrote:
> This patchset is spurred by discussion started at
> https://patchwork.ozlabs.org/patch/760372/ where we figured that there is no
> real reason for enforcing override by gratuitous ARP packets only when
> arp_accept is 1. Same should happen when it's 0 (the default value).
>
> changelog v2: handled review comments by Julian Anastasov
> - fixed a mistake in a comment;
> - postponed addr_type calculation to as late as possible.

This seems to have caused a build warning:

net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized
in this function [-Wmaybe-uninitialized]

   Arnd

[patch net-next v2 6/8] mlxsw: core: Create the mlxsw_fw_rev struct

2017-05-23 Thread Jiri Pirko

From: Yotam Gigi 

This struct was previously an anonymous struct defined inside the
mlxsw_bus_info struct. Extract it to a struct named mlxsw_fw_rev, as it
will be needed later by the spectrum driver.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/core.h | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 7fb3539..6e966af 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -344,15 +344,17 @@ struct mlxsw_bus {
u8 features;
 };
 
+struct mlxsw_fw_rev {
+   u16 major;
+   u16 minor;
+   u16 subminor;
+};
+
 struct mlxsw_bus_info {
const char *device_kind;
const char *device_name;
struct device *dev;
-   struct {
-   u16 major;
-   u16 minor;
-   u16 subminor;
-   } fw_rev;
+   struct mlxsw_fw_rev fw_rev;
u8 vsd[MLXSW_CMD_BOARDINFO_VSD_LEN];
u8 psid[MLXSW_CMD_BOARDINFO_PSID_LEN];
 };
-- 
2.9.3

[patch net-next v2 7/8] mlxsw: spectrum: Validate firmware revision on init

2017-05-23 Thread Jiri Pirko

From: Yotam Gigi 

Make the spectrum module check the current device firmware version, and if
it is below the supported version, use the libfirmware API to request a
firmware file with the supported firmware version and flash it to the
device using the mlxfw module.

The firmware files are expected to be in the Mellanox Firmware Archive
version 2 (MFA2) format, and their names are expected to follow the
pattern: "mlxsw_spectrum-...mfa2".

Signed-off-by: Yotam Gigi 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/Kconfig|  1 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 67 ++
 2 files changed, 68 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Kconfig b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
index ef23eae..b9f80c2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Kconfig
+++ b/drivers/net/ethernet/mellanox/mlxsw/Kconfig
@@ -75,6 +75,7 @@ config MLXSW_SPECTRUM
depends on MLXSW_CORE && MLXSW_PCI && NET_SWITCHDEV && VLAN_8021Q
depends on PSAMPLE || PSAMPLE=n
select PARMAN
+   select MLXFW
default m
---help---
  This driver supports Mellanox Technologies Spectrum Ethernet
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index b533a53..9594e9d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -70,6 +70,21 @@
 #include "spectrum_dpipe.h"
 #include "../mlxfw/mlxfw.h"
 
+#define MLXSW_FWREV_MAJOR 13
+#define MLXSW_FWREV_MINOR 1420
+#define MLXSW_FWREV_SUBMINOR 122
+
+static const struct mlxsw_fw_rev mlxsw_sp_supported_fw_rev = {
+   .major = MLXSW_FWREV_MAJOR,
+   .minor = MLXSW_FWREV_MINOR,
+   .subminor = MLXSW_FWREV_SUBMINOR
+};
+
+#define MLXSW_SP_FW_FILENAME \
+   "mlxsw_spectrum-" __stringify(MLXSW_FWREV_MAJOR) \
+   "." __stringify(MLXSW_FWREV_MINOR) \
+   "." __stringify(MLXSW_FWREV_SUBMINOR) ".mfa2"
+
 static const char mlxsw_sp_driver_name[] = "mlxsw_spectrum";
 static const char mlxsw_sp_driver_version[] = "1.0";
 
@@ -306,6 +321,51 @@ static const struct mlxfw_dev_ops mlxsw_sp_mlxfw_dev_ops = {
.fsm_release= mlxsw_sp_fsm_release
 };
 
+static bool mlxsw_sp_fw_rev_ge(const struct mlxsw_fw_rev *a,
+  const struct mlxsw_fw_rev *b)
+{
+   if (a->major != b->major)
+   return a->major > b->major;
+   if (a->minor != b->minor)
+   return a->minor > b->minor;
+   return a->subminor >= b->subminor;
+}
+
+static int mlxsw_sp_fw_rev_validate(struct mlxsw_sp *mlxsw_sp)
+{
+   const struct mlxsw_fw_rev *rev = &mlxsw_sp->bus_info->fw_rev;
+   struct mlxsw_sp_mlxfw_dev mlxsw_sp_mlxfw_dev = {
+   .mlxfw_dev = {
+   .ops = &mlxsw_sp_mlxfw_dev_ops,
+   .psid = mlxsw_sp->bus_info->psid,
+   .psid_size = strlen(mlxsw_sp->bus_info->psid),
+   },
+   .mlxsw_sp = mlxsw_sp
+   };
+   const struct firmware *firmware;
+   int err;
+
+   if (mlxsw_sp_fw_rev_ge(rev, &mlxsw_sp_supported_fw_rev))
+   return 0;
+
+   dev_info(mlxsw_sp->bus_info->dev, "The firmware version %d.%d.%d is out of date\n",
+rev->major, rev->minor, rev->subminor);
+   dev_info(mlxsw_sp->bus_info->dev, "Upgrading firmware using file %s\n",
+MLXSW_SP_FW_FILENAME);
+
+   err = request_firmware_direct(&firmware, MLXSW_SP_FW_FILENAME,
+ mlxsw_sp->bus_info->dev);
+   if (err) {
+   dev_err(mlxsw_sp->bus_info->dev, "Could not request firmware file %s\n",
+   MLXSW_SP_FW_FILENAME);
+   return err;
+   }
+
+   err = mlxfw_firmware_flash(&mlxsw_sp_mlxfw_dev.mlxfw_dev, firmware);
+   release_firmware(firmware);
+   return err;
+}
+
 int mlxsw_sp_flow_counter_get(struct mlxsw_sp *mlxsw_sp,
  unsigned int counter_index, u64 *packets,
  u64 *bytes)
@@ -3559,6 +3619,12 @@ static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
INIT_LIST_HEAD(&mlxsw_sp->fids);
INIT_LIST_HEAD(&mlxsw_sp->vfids.list);
 
+   err = mlxsw_sp_fw_rev_validate(mlxsw_sp);
+   if (err) {
+   dev_err(mlxsw_sp->bus_info->dev, "Could not upgrade firmware\n");
+   return err;
+   }
+
err = mlxsw_sp_base_mac_get(mlxsw_sp);
if (err) {
dev_err(mlxsw_sp->bus_info->dev, "Failed to get base mac\n");
@@ -4930,3 +4996,4 @@ MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Jiri Pirko ");
 MODULE_DESCRIPTION("Mellanox Spectrum driver");
 MODULE_DEVICE_TABLE(pci, mlxsw_sp_pci_id_table);
+MODULE_FIRMWARE(MLXSW_SP_FW_FILENAME);
-- 
2.9.3
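
The revision check and the file name construction in this patch are easy to exercise outside the kernel. The sketch below is a user-space approximation: STR()/STR_1() stand in for the kernel's __stringify(), and the struct and function names are shortened for illustration. The comparison logic and the version numbers are copied from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two-level stringification, a stand-in for the kernel's __stringify(). */
#define STR_1(x) #x
#define STR(x) STR_1(x)

#define FWREV_MAJOR 13
#define FWREV_MINOR 1420
#define FWREV_SUBMINOR 122

/* Adjacent string literals are concatenated at compile time. */
#define FW_FILENAME \
	"mlxsw_spectrum-" STR(FWREV_MAJOR) \
	"." STR(FWREV_MINOR) \
	"." STR(FWREV_SUBMINOR) ".mfa2"

struct fw_rev {
	uint16_t major;
	uint16_t minor;
	uint16_t subminor;
};

/* Lexicographic "a >= b" over (major, minor, subminor). */
static bool fw_rev_ge(const struct fw_rev *a, const struct fw_rev *b)
{
	if (a->major != b->major)
		return a->major > b->major;
	if (a->minor != b->minor)
		return a->minor > b->minor;
	return a->subminor >= b->subminor;
}
```

Because the file name is built from the same three macros as the supported revision, bumping the version in one place keeps the two in sync.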

[patch net-next v2 8/8] mlxsw: spectrum_router: Adjust RIF configuration for new firmware versions

2017-05-23 Thread Jiri Pirko

From: Ido Schimmel 

In new firmware versions, when configuring a {Port, VID} as a router
interface, the driver is responsible for enabling the STP filter and
disabling learning.  Otherwise, packets are discarded.

This change doesn't break existing firmware versions, but is required
for newer firmware versions.

Signed-off-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
index 3cc7d52..8165b11 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3109,7 +3110,9 @@ static int mlxsw_sp_vport_rif_sp_join(struct mlxsw_sp_port *mlxsw_sp_vport,
  struct net_device *l3_dev)
 {
struct mlxsw_sp *mlxsw_sp = mlxsw_sp_vport->mlxsw_sp;
+   u16 vid = mlxsw_sp_vport_vid_get(mlxsw_sp_vport);
struct mlxsw_sp_rif *rif;
+   int err;
 
rif = mlxsw_sp_rif_find_by_dev(mlxsw_sp, l3_dev);
if (!rif) {
@@ -3118,20 +3121,39 @@ static int mlxsw_sp_vport_rif_sp_join(struct mlxsw_sp_port *mlxsw_sp_vport,
return PTR_ERR(rif);
}
 
+   err = mlxsw_sp_port_vid_learning_set(mlxsw_sp_vport, vid, false);
+   if (err)
+   goto err_port_vid_learning_set;
+
+   err = mlxsw_sp_port_vid_stp_set(mlxsw_sp_vport, vid,
+   BR_STATE_FORWARDING);
+   if (err)
+   goto err_port_vid_stp_set;
+
mlxsw_sp_vport_fid_set(mlxsw_sp_vport, rif->f);
rif->f->ref_count++;
 
netdev_dbg(mlxsw_sp_vport->dev, "Joined FID=%d\n", rif->f->fid);
 
return 0;
+
+err_port_vid_stp_set:
+   mlxsw_sp_port_vid_learning_set(mlxsw_sp_vport, vid, true);
+err_port_vid_learning_set:
+   if (rif->f->ref_count == 0)
+   mlxsw_sp_vport_rif_sp_destroy(mlxsw_sp_vport, rif);
+   return err;
 }
 
 static void mlxsw_sp_vport_rif_sp_leave(struct mlxsw_sp_port *mlxsw_sp_vport)
 {
struct mlxsw_sp_fid *f = mlxsw_sp_vport_fid_get(mlxsw_sp_vport);
+   u16 vid = mlxsw_sp_vport_vid_get(mlxsw_sp_vport);
 
netdev_dbg(mlxsw_sp_vport->dev, "Left FID=%d\n", f->fid);
 
+   mlxsw_sp_port_vid_stp_set(mlxsw_sp_vport, vid, BR_STATE_BLOCKING);
+   mlxsw_sp_port_vid_learning_set(mlxsw_sp_vport, vid, true);
mlxsw_sp_vport_fid_set(mlxsw_sp_vport, NULL);
if (--f->ref_count == 0)
mlxsw_sp_vport_rif_sp_destroy(mlxsw_sp_vport, f->rif);
-- 
2.9.3
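
The join path in this patch uses the usual goto-based unwinding: each step that succeeded is undone in reverse order when a later step fails, and the leave path mirrors the join. A stripped-down sketch of that pattern follows. struct vport, learning_set(), stp_set() and the fail_stp knob are hypothetical stand-ins; the real functions program hardware and are not reproduced here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical port state; the real driver writes device registers. */
struct vport {
	bool learning;		/* MAC learning enabled */
	bool forwarding;	/* STP state is forwarding */
	bool fail_stp;		/* test knob: make the STP step fail */
};

static int learning_set(struct vport *p, bool on)
{
	p->learning = on;
	return 0;
}

static int stp_set(struct vport *p, bool forwarding)
{
	if (p->fail_stp)
		return -EIO;
	p->forwarding = forwarding;
	return 0;
}

/* Disable learning, then set STP to forwarding; undo on failure. */
static int rif_join(struct vport *p)
{
	int err;

	err = learning_set(p, false);
	if (err)
		goto err_learning_set;

	err = stp_set(p, true);
	if (err)
		goto err_stp_set;

	return 0;

err_stp_set:
	learning_set(p, true);	/* roll back the step that did succeed */
err_learning_set:
	return err;
}

/* Leave undoes the join steps in reverse order. */
static void rif_leave(struct vport *p)
{
	stp_set(p, false);
	learning_set(p, true);
}
```

The labels are named after the step that failed, so each label undoes everything that came before that step.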

[patch net-next v2 4/8] mlxsw: reg: Add Management Component Data Access register

2017-05-23 Thread Jiri Pirko

From: Yotam Gigi 

The MCDA register allows reading and writing a firmware component.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 52 +++
 1 file changed, 52 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index f3c768c..182150a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -5808,6 +5808,57 @@ static inline void mlxsw_reg_mcc_unpack(char *payload, u32 *p_update_handle,
*p_control_state = mlxsw_reg_mcc_control_state_get(payload);
 }
 
+/* MCDA - Management Component Data Access
+ * ---
+ * This register allows reading and writing a firmware component.
+ */
+#define MLXSW_REG_MCDA_ID 0x9063
+#define MLXSW_REG_MCDA_BASE_LEN 0x10
+#define MLXSW_REG_MCDA_MAX_DATA_LEN 0x80
+#define MLXSW_REG_MCDA_LEN \
+   (MLXSW_REG_MCDA_BASE_LEN + MLXSW_REG_MCDA_MAX_DATA_LEN)
+
+MLXSW_REG_DEFINE(mcda, MLXSW_REG_MCDA_ID, MLXSW_REG_MCDA_LEN);
+
+/* reg_mcda_update_handle
+ * Token representing the current flow executed by the FSM.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mcda, update_handle, 0x00, 0, 24);
+
+/* reg_mcda_offset
+ * Offset of accessed address relative to component start. Accesses must be in
+ * accordance with log_mcda_word_size in MCQI reg.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mcda, offset, 0x04, 0, 32);
+
+/* reg_mcda_size
+ * Size of the data accessed, given in bytes.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mcda, size, 0x08, 0, 16);
+
+/* reg_mcda_data
+ * Data block accessed.
+ * Access: RW
+ */
+MLXSW_ITEM32_INDEXED(reg, mcda, data, 0x10, 0, 32, 4, 0, false);
+
+static inline void mlxsw_reg_mcda_pack(char *payload, u32 update_handle,
+  u32 offset, u16 size, u8 *data)
+{
+   int i;
+
+   MLXSW_REG_ZERO(mcda, payload);
+   mlxsw_reg_mcda_update_handle_set(payload, update_handle);
+   mlxsw_reg_mcda_offset_set(payload, offset);
+   mlxsw_reg_mcda_size_set(payload, size);
+
+   for (i = 0; i < size / 4; i++)
+   mlxsw_reg_mcda_data_set(payload, i, *(u32 *) &data[i * 4]);
+}
+
 /* MPSC - Monitoring Packet Sampling Configuration Register
  * 
  * MPSC Register is used to configure the Packet Sampling mechanism.
@@ -6388,6 +6439,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
MLXSW_REG(mpsc),
MLXSW_REG(mcqi),
MLXSW_REG(mcc),
+   MLXSW_REG(mcda),
MLXSW_REG(mgpc),
MLXSW_REG(sbpr),
MLXSW_REG(sbcm),
-- 
2.9.3
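
Each MLXSW_ITEM32() line above names a byte offset, a bit shift and a width inside the register payload. The sketch below shows one way such a field can be set and read back in a plain byte buffer. It assumes the payload words are big-endian, which is an assumption made for this example rather than something the patch states, and field_set()/field_get() are hypothetical helpers, not the macros' actual expansion.

```c
#include <stddef.h>
#include <stdint.h>

/* Read the 32-bit big-endian word at byte offset 'off'. */
static uint32_t be32_load(const uint8_t *buf, size_t off)
{
	return ((uint32_t)buf[off] << 24) | ((uint32_t)buf[off + 1] << 16) |
	       ((uint32_t)buf[off + 2] << 8) | (uint32_t)buf[off + 3];
}

/* Write a 32-bit value as a big-endian word at byte offset 'off'. */
static void be32_store(uint8_t *buf, size_t off, uint32_t val)
{
	buf[off] = (uint8_t)(val >> 24);
	buf[off + 1] = (uint8_t)(val >> 16);
	buf[off + 2] = (uint8_t)(val >> 8);
	buf[off + 3] = (uint8_t)val;
}

/* Mask covering the low 'bits' bits; handles the full 32-bit case. */
static uint32_t field_mask(unsigned int bits)
{
	return bits >= 32 ? 0xffffffffu : ((1u << bits) - 1u);
}

/* Store 'val' into the field, leaving the other bits of the word alone. */
static void field_set(uint8_t *buf, size_t off, unsigned int shift,
		      unsigned int bits, uint32_t val)
{
	uint32_t mask = field_mask(bits) << shift;
	uint32_t word = be32_load(buf, off);

	word = (word & ~mask) | ((val << shift) & mask);
	be32_store(buf, off, word);
}

/* Extract the field value from the word at 'off'. */
static uint32_t field_get(const uint8_t *buf, size_t off, unsigned int shift,
			  unsigned int bits)
{
	return (be32_load(buf, off) >> shift) & field_mask(bits);
}
```

For example, field_set(pl, 0x00, 0, 24, handle) corresponds to the update_handle item and field_set(pl, 0x08, 0, 16, size) to the size item. Working byte by byte also avoids casting a u8 pointer to a u32 pointer, so the helpers make no alignment assumptions about the buffer.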

[patch net-next v2 2/8] mlxsw: reg: Add Management Component Query Information register

2017-05-23 Thread Jiri Pirko

From: Yotam Gigi 

The MCQI register queries information about firmware components. It will
be needed by the mlxfw module to query various options about the
components, such as their max size, alignment and max write size.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 84 +++
 1 file changed, 84 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 83b277c..adb385f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -5643,6 +5643,89 @@ static inline void mlxsw_reg_mlcr_pack(char *payload, u8 local_port,
   MLXSW_REG_MLCR_DURATION_MAX : 0);
 }
 
+/* MCQI - Management Component Query Information
+ * -
+ * This register allows querying information about firmware components.
+ */
+#define MLXSW_REG_MCQI_ID 0x9061
+#define MLXSW_REG_MCQI_BASE_LEN 0x18
+#define MLXSW_REG_MCQI_CAP_LEN 0x14
+#define MLXSW_REG_MCQI_LEN (MLXSW_REG_MCQI_BASE_LEN + MLXSW_REG_MCQI_CAP_LEN)
+
+MLXSW_REG_DEFINE(mcqi, MLXSW_REG_MCQI_ID, MLXSW_REG_MCQI_LEN);
+
+/* reg_mcqi_component_index
+ * Index of the accessed component.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mcqi, component_index, 0x00, 0, 16);
+
+enum mlxfw_reg_mcqi_info_type {
+   MLXSW_REG_MCQI_INFO_TYPE_CAPABILITIES,
+};
+
+/* reg_mcqi_info_type
+ * Component properties set.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mcqi, info_type, 0x08, 0, 5);
+
+/* reg_mcqi_offset
+ * The requested/returned data offset from the section start, given in bytes.
+ * Must be DWORD aligned.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mcqi, offset, 0x10, 0, 32);
+
+/* reg_mcqi_data_size
+ * The requested/returned data size, given in bytes. If data_size is not DWORD
+ * aligned, the last bytes are zero padded.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mcqi, data_size, 0x14, 0, 16);
+
+/* reg_mcqi_cap_max_component_size
+ * Maximum size for this component, given in bytes.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mcqi, cap_max_component_size, 0x20, 0, 32);
+
+/* reg_mcqi_cap_log_mcda_word_size
+ * Log 2 of the access word size in bytes. Read and write access must be aligned
+ * to the word size. Write access must be done for an integer number of words.
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mcqi, cap_log_mcda_word_size, 0x24, 28, 4);
+
+/* reg_mcqi_cap_mcda_max_write_size
+ * Maximal write size for MCDA register
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mcqi, cap_mcda_max_write_size, 0x24, 0, 16);
+
+static inline void mlxsw_reg_mcqi_pack(char *payload, u16 component_index)
+{
+   MLXSW_REG_ZERO(mcqi, payload);
+   mlxsw_reg_mcqi_component_index_set(payload, component_index);
+   mlxsw_reg_mcqi_info_type_set(payload,
+MLXSW_REG_MCQI_INFO_TYPE_CAPABILITIES);
+   mlxsw_reg_mcqi_offset_set(payload, 0);
+   mlxsw_reg_mcqi_data_size_set(payload, MLXSW_REG_MCQI_CAP_LEN);
+}
+
+static inline void mlxsw_reg_mcqi_unpack(char *payload,
+u32 *p_cap_max_component_size,
+u8 *p_cap_log_mcda_word_size,
+u16 *p_cap_mcda_max_write_size)
+{
+   *p_cap_max_component_size =
+   mlxsw_reg_mcqi_cap_max_component_size_get(payload);
+   *p_cap_log_mcda_word_size =
+   mlxsw_reg_mcqi_cap_log_mcda_word_size_get(payload);
+   *p_cap_mcda_max_write_size =
+   mlxsw_reg_mcqi_cap_mcda_max_write_size_get(payload);
+}
+
 /* MPSC - Monitoring Packet Sampling Configuration Register
  * 
  * MPSC Register is used to configure the Packet Sampling mechanism.
@@ -6221,6 +6304,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
MLXSW_REG(mpar),
MLXSW_REG(mlcr),
MLXSW_REG(mpsc),
+   MLXSW_REG(mcqi),
MLXSW_REG(mgpc),
MLXSW_REG(sbpr),
MLXSW_REG(sbcm),
-- 
2.9.3
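
The MCQI capability fields describe how a component may be written: accesses are aligned to a word of 2^cap_log_mcda_word_size bytes, writes are a whole number of words, and a single write is bounded by cap_mcda_max_write_size. A possible way to turn these limits into a per-write block size is sketched below. block_size() and block_count() are hypothetical helpers, not functions from mlxfw, and the 0x80 cap is the MCDA data length defined in the MCDA patch.

```c
#include <stdint.h>

#define MCDA_MAX_DATA_LEN 0x80u	/* data area of the MCDA register */

/* Largest write size that respects both limits and is a whole number of
 * words. Returns 0 if not even one word fits.
 */
static uint16_t block_size(uint8_t log_word_size, uint16_t max_write_size)
{
	uint32_t word = 1u << log_word_size;	/* field is 4 bits wide */
	uint32_t limit = max_write_size;

	if (limit > MCDA_MAX_DATA_LEN)
		limit = MCDA_MAX_DATA_LEN;
	return (uint16_t)(limit & ~(word - 1u));	/* round down to a word */
}

/* Number of writes needed to cover a component of 'size' bytes. */
static uint32_t block_count(uint32_t size, uint16_t block)
{
	if (!block)
		return 0;
	return size / block + (size % block != 0);
}
```

As an example, a device reporting 4-byte words and a 0x200-byte maximum write would be limited to 0x80-byte blocks by the register size, so a 1000-byte component would take 8 writes.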

[patch net-next v2 1/8] Add the mlxfw module for Mellanox firmware flash process

2017-05-23 Thread Jiri Pirko

From: Yotam Gigi 

The mlxfw module is in charge of the common logic needed to flash firmware
to Mellanox devices, which consists of:
 - Parse the Mellanox Firmware Archive version 2 (MFA2) format, which is
   the format used to store the Mellanox firmware. The MFA2 format file can
   hold firmware for many different silicon variants, differentiated by a
   unique ID called PSID. In addition, the MFA2 file data section is
   compressed using xz compression to save both file-system space and
   memory at extraction time.
 - Implement the firmware flash state machine logic, which is a common
   logic for Mellanox products needed to flash the firmware to the device.

As the module is shared between different Mellanox products, it defines a
set of callbacks to be implemented by the specific driver for hardware
interaction.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 MAINTAINERS|   8 +
 drivers/net/ethernet/mellanox/Kconfig  |   1 +
 drivers/net/ethernet/mellanox/Makefile |   1 +
 drivers/net/ethernet/mellanox/mlxfw/Kconfig|   6 +
 drivers/net/ethernet/mellanox/mlxfw/Makefile   |   2 +
 drivers/net/ethernet/mellanox/mlxfw/mlxfw.h| 102 
 drivers/net/ethernet/mellanox/mlxfw/mlxfw_fsm.c| 273 +
 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.c   | 620 +
 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.h   |  66 +++
 .../net/ethernet/mellanox/mlxfw/mlxfw_mfa2_file.h  |  60 ++
 .../ethernet/mellanox/mlxfw/mlxfw_mfa2_format.h| 103 
 .../net/ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv.h   |  98 
 .../ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv_multi.c | 126 +
 .../ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv_multi.h |  71 +++
 14 files changed, 1537 insertions(+)
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/Kconfig
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/Makefile
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_fsm.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2_file.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2_format.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv_multi.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv_multi.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 9e98464..fcde259 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8320,6 +8320,14 @@ W:   http://www.mellanox.com
 Q: http://patchwork.ozlabs.org/project/netdev/list/
 F: drivers/net/ethernet/mellanox/mlxsw/
 
+MELLANOX FIRMWARE FLASH LIBRARY (mlxfw)
+M: Yotam Gigi 
+L: netdev@vger.kernel.org
+S: Supported
+W: http://www.mellanox.com
+Q: http://patchwork.ozlabs.org/project/netdev/list/
+F: drivers/net/ethernet/mellanox/mlxfw/
+
 MELLANOX MLXCPLD I2C AND MUX DRIVER
 M: Vadim Pasternak 
 M: Michael Shych 
diff --git a/drivers/net/ethernet/mellanox/Kconfig b/drivers/net/ethernet/mellanox/Kconfig
index d547010..84a2007 100644
--- a/drivers/net/ethernet/mellanox/Kconfig
+++ b/drivers/net/ethernet/mellanox/Kconfig
@@ -19,5 +19,6 @@ if NET_VENDOR_MELLANOX
 source "drivers/net/ethernet/mellanox/mlx4/Kconfig"
 source "drivers/net/ethernet/mellanox/mlx5/core/Kconfig"
 source "drivers/net/ethernet/mellanox/mlxsw/Kconfig"
+source "drivers/net/ethernet/mellanox/mlxfw/Kconfig"
 
 endif # NET_VENDOR_MELLANOX
diff --git a/drivers/net/ethernet/mellanox/Makefile b/drivers/net/ethernet/mellanox/Makefile
index 2e2a5ec..016aa26 100644
--- a/drivers/net/ethernet/mellanox/Makefile
+++ b/drivers/net/ethernet/mellanox/Makefile
@@ -5,3 +5,4 @@
 obj-$(CONFIG_MLX4_CORE) += mlx4/
 obj-$(CONFIG_MLX5_CORE) += mlx5/core/
 obj-$(CONFIG_MLXSW_CORE) += mlxsw/
+obj-$(CONFIG_MLXFW) += mlxfw/
diff --git a/drivers/net/ethernet/mellanox/mlxfw/Kconfig b/drivers/net/ethernet/mellanox/mlxfw/Kconfig
new file mode 100644
index 0000000..56b60ac
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxfw/Kconfig
@@ -0,0 +1,6 @@
+#
+# Mellanox firmware flash library configuration
+#
+
+config MLXFW
+tristate "mlxfw" if COMPILE_TEST
diff --git a/drivers/net/ethernet/mellanox/mlxfw/Makefile b/drivers/net/ethernet/mellanox/mlxfw/Makefile
new file mode 100644
index 0000000..7448b30
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxfw/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_MLXFW)+= mlxfw.o
+mlxfw-objs := mlxfw_fsm.o mlxfw_mfa2_tlv_multi.o mlxfw_mfa2.o
diff --git a/drivers/net/ethernet/mellanox/mlxfw/mlxfw.h b/drivers/net/ethernet/mellanox/mlxfw/mlxfw.h
new file mode 100644
index 0000000..beea4ba
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxfw/mlxfw.h
@@ -0,0 +1,102 @@
+/*
+

[patch net-next v2 3/8] mlxsw: reg: Add Management Component Control register

2017-05-23 Thread Jiri Pirko

From: Yotam Gigi 

The MCC register allows controlling and querying the firmware flash state
machine (FSM).

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 83 +++
 1 file changed, 83 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index adb385f..f3c768c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -5726,6 +5726,88 @@ static inline void mlxsw_reg_mcqi_unpack(char *payload,
mlxsw_reg_mcqi_cap_mcda_max_write_size_get(payload);
 }
 
+/* MCC - Management Component Control
+ * --
+ * Controls the firmware component and updates the FSM.
+ */
+#define MLXSW_REG_MCC_ID 0x9062
+#define MLXSW_REG_MCC_LEN 0x1C
+
+MLXSW_REG_DEFINE(mcc, MLXSW_REG_MCC_ID, MLXSW_REG_MCC_LEN);
+
+enum mlxsw_reg_mcc_instruction {
+   MLXSW_REG_MCC_INSTRUCTION_LOCK_UPDATE_HANDLE = 0x01,
+   MLXSW_REG_MCC_INSTRUCTION_RELEASE_UPDATE_HANDLE = 0x02,
+   MLXSW_REG_MCC_INSTRUCTION_UPDATE_COMPONENT = 0x03,
+   MLXSW_REG_MCC_INSTRUCTION_VERIFY_COMPONENT = 0x04,
+   MLXSW_REG_MCC_INSTRUCTION_ACTIVATE = 0x06,
+   MLXSW_REG_MCC_INSTRUCTION_CANCEL = 0x08,
+};
+
+/* reg_mcc_instruction
+ * Command to be executed by the FSM.
+ * Applicable for write operation only.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, mcc, instruction, 0x00, 0, 8);
+
+/* reg_mcc_component_index
+ * Index of the accessed component. Applicable only for commands that
+ * refer to components. Otherwise, this field is reserved.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, mcc, component_index, 0x04, 0, 16);
+
+/* reg_mcc_update_handle
+ * Token representing the current flow executed by the FSM.
+ * Access: WO
+ */
+MLXSW_ITEM32(reg, mcc, update_handle, 0x08, 0, 24);
+
+/* reg_mcc_error_code
+ * Indicates the successful completion of the instruction, or the reason it
+ * failed
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mcc, error_code, 0x0C, 8, 8);
+
+/* reg_mcc_control_state
+ * Current FSM state
+ * Access: RO
+ */
+MLXSW_ITEM32(reg, mcc, control_state, 0x0C, 0, 4);
+
+/* reg_mcc_component_size
+ * Component size in bytes. Valid for UPDATE_COMPONENT instruction. Specifying
+ * the size may shorten the update time. Value 0x0 means that size is
+ * unspecified.
+ * Access: WO
+ */
+MLXSW_ITEM32(reg, mcc, component_size, 0x10, 0, 32);
+
+static inline void mlxsw_reg_mcc_pack(char *payload,
+ enum mlxsw_reg_mcc_instruction instr,
+ u16 component_index, u32 update_handle,
+ u32 component_size)
+{
+   MLXSW_REG_ZERO(mcc, payload);
+   mlxsw_reg_mcc_instruction_set(payload, instr);
+   mlxsw_reg_mcc_component_index_set(payload, component_index);
+   mlxsw_reg_mcc_update_handle_set(payload, update_handle);
+   mlxsw_reg_mcc_component_size_set(payload, component_size);
+}
+
+static inline void mlxsw_reg_mcc_unpack(char *payload, u32 *p_update_handle,
+   u8 *p_error_code, u8 *p_control_state)
+{
+   if (p_update_handle)
+   *p_update_handle = mlxsw_reg_mcc_update_handle_get(payload);
+   if (p_error_code)
+   *p_error_code = mlxsw_reg_mcc_error_code_get(payload);
+   if (p_control_state)
+   *p_control_state = mlxsw_reg_mcc_control_state_get(payload);
+}
+
 /* MPSC - Monitoring Packet Sampling Configuration Register
  * 
  * MPSC Register is used to configure the Packet Sampling mechanism.
@@ -6305,6 +6387,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
MLXSW_REG(mlcr),
MLXSW_REG(mpsc),
MLXSW_REG(mcqi),
+   MLXSW_REG(mcc),
MLXSW_REG(mgpc),
MLXSW_REG(sbpr),
MLXSW_REG(sbcm),
-- 
2.9.3
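
mlxsw_reg_mcc_unpack() lets the caller pass NULL for any output it does not care about. The sketch below shows the same idea on the word at offset 0x0C, where error_code sits in bits 8..15 and control_state in bits 0..3 according to the item definitions above. mcc_status_decode() is a hypothetical helper that takes the word already converted to host byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the MCC status word; NULL outputs are simply skipped. */
static void mcc_status_decode(uint32_t word, uint8_t *p_error_code,
			      uint8_t *p_control_state)
{
	if (p_error_code)
		*p_error_code = (uint8_t)((word >> 8) & 0xff);	/* bits 8..15 */
	if (p_control_state)
		*p_control_state = (uint8_t)(word & 0xf);	/* bits 0..3 */
}
```

A caller that only wants the FSM state can then write mcc_status_decode(word, NULL, &state).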

[patch net-next v2 5/8] mlxsw: spectrum: Add the needed callbacks for mlxfw integration

2017-05-23 Thread Jiri Pirko

From: Yotam Gigi 

The mlxfw module defines several needed callbacks in order to flash the
device's firmware. As the mlxfw module is shared between several different
drivers, those callbacks are the glue functionality that is responsible
for hardware interaction. Add those callbacks using the MCQI, MCC, MCDA
registers.

Signed-off-by: Yotam Gigi 
Reviewed-by: Ido Schimmel 
Signed-off-by: Jiri Pirko 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 166 +
 1 file changed, 166 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 8a165bb..b533a53 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -68,6 +68,7 @@
 #include "txheader.h"
 #include "spectrum_cnt.h"
 #include "spectrum_dpipe.h"
+#include "../mlxfw/mlxfw.h"
 
 static const char mlxsw_sp_driver_name[] = "mlxsw_spectrum";
 static const char mlxsw_sp_driver_version[] = "1.0";
@@ -140,6 +141,171 @@ MLXSW_ITEM32(tx, hdr, fid, 0x08, 0, 16);
  */
 MLXSW_ITEM32(tx, hdr, type, 0x0C, 0, 4);
 
+struct mlxsw_sp_mlxfw_dev {
+   struct mlxfw_dev mlxfw_dev;
+   struct mlxsw_sp *mlxsw_sp;
+};
+
+static int mlxsw_sp_component_query(struct mlxfw_dev *mlxfw_dev,
+   u16 component_index, u32 *p_max_size,
+   u8 *p_align_bits, u16 *p_max_write_size)
+{
+   struct mlxsw_sp_mlxfw_dev *mlxsw_sp_mlxfw_dev =
+   container_of(mlxfw_dev, struct mlxsw_sp_mlxfw_dev, mlxfw_dev);
+   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_mlxfw_dev->mlxsw_sp;
+   char mcqi_pl[MLXSW_REG_MCQI_LEN];
+   int err;
+
+   mlxsw_reg_mcqi_pack(mcqi_pl, component_index);
+   err = mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(mcqi), mcqi_pl);
+   if (err)
+   return err;
+   mlxsw_reg_mcqi_unpack(mcqi_pl, p_max_size, p_align_bits,
+ p_max_write_size);
+
+   *p_align_bits = max_t(u8, *p_align_bits, 2);
+   *p_max_write_size = min_t(u16, *p_max_write_size,
+ MLXSW_REG_MCDA_MAX_DATA_LEN);
+   return 0;
+}
+
+static int mlxsw_sp_fsm_lock(struct mlxfw_dev *mlxfw_dev, u32 *fwhandle)
+{
+   struct mlxsw_sp_mlxfw_dev *mlxsw_sp_mlxfw_dev =
+   container_of(mlxfw_dev, struct mlxsw_sp_mlxfw_dev, mlxfw_dev);
+   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_mlxfw_dev->mlxsw_sp;
+   char mcc_pl[MLXSW_REG_MCC_LEN];
+   u8 control_state;
+   int err;
+
+   mlxsw_reg_mcc_pack(mcc_pl, 0, 0, 0, 0);
+   err = mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(mcc), mcc_pl);
+   if (err)
+   return err;
+
+   mlxsw_reg_mcc_unpack(mcc_pl, fwhandle, NULL, &control_state);
+   if (control_state != MLXFW_FSM_STATE_IDLE)
+   return -EBUSY;
+
+   mlxsw_reg_mcc_pack(mcc_pl,
+  MLXSW_REG_MCC_INSTRUCTION_LOCK_UPDATE_HANDLE,
+  0, *fwhandle, 0);
+   return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(mcc), mcc_pl);
+}
+
+static int mlxsw_sp_fsm_component_update(struct mlxfw_dev *mlxfw_dev,
+u32 fwhandle, u16 component_index,
+u32 component_size)
+{
+   struct mlxsw_sp_mlxfw_dev *mlxsw_sp_mlxfw_dev =
+   container_of(mlxfw_dev, struct mlxsw_sp_mlxfw_dev, mlxfw_dev);
+   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_mlxfw_dev->mlxsw_sp;
+   char mcc_pl[MLXSW_REG_MCC_LEN];
+
+   mlxsw_reg_mcc_pack(mcc_pl, MLXSW_REG_MCC_INSTRUCTION_UPDATE_COMPONENT,
+  component_index, fwhandle, component_size);
+   return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(mcc), mcc_pl);
+}
+
+static int mlxsw_sp_fsm_block_download(struct mlxfw_dev *mlxfw_dev,
+  u32 fwhandle, u8 *data, u16 size,
+  u32 offset)
+{
+   struct mlxsw_sp_mlxfw_dev *mlxsw_sp_mlxfw_dev =
+   container_of(mlxfw_dev, struct mlxsw_sp_mlxfw_dev, mlxfw_dev);
+   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_mlxfw_dev->mlxsw_sp;
+   char mcda_pl[MLXSW_REG_MCDA_LEN];
+
+   mlxsw_reg_mcda_pack(mcda_pl, fwhandle, offset, size, data);
+   return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(mcda), mcda_pl);
+}
+
+static int mlxsw_sp_fsm_component_verify(struct mlxfw_dev *mlxfw_dev,
+u32 fwhandle, u16 component_index)
+{
+   struct mlxsw_sp_mlxfw_dev *mlxsw_sp_mlxfw_dev =
+   container_of(mlxfw_dev, struct mlxsw_sp_mlxfw_dev, mlxfw_dev);
+   struct mlxsw_sp *mlxsw_sp = mlxsw_sp_mlxfw_dev->mlxsw_sp;
+   char mcc_pl[MLXSW_REG_MCC_LEN];
+
+   mlxsw_reg_mcc_pack(mcc_pl, MLXSW_REG_MCC_INSTRUCTION_VERIFY_COMPONENT,
+  component_index, fwhandle, 0);
+   return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(mcc), mcc_pl);
+}
+
+static in

[patch net-next v2 0/8] mlxsw: Support firmware flash

2017-05-23 Thread Jiri Pirko

From: Jiri Pirko 

Add support for device firmware flash on mlxsw spectrum. The firmware files
are expected to be in the Mellanox Firmware Archive version 2 (MFA2)
format.

The firmware flash is triggered on driver initialization time if the device
firmware version does not meet the minimum firmware version supported by
the driver.

Currently, to activate the newly flashed firmware, the user needs to
reboot the system.

The first patch introduces the mlxfw module, which implements common logic
needed for the firmware flash process on Mellanox products, such as the
MFA2 format parsing and the firmware flash state machine logic. As the
module implements common logic which will be needed by various different
Mellanox drivers, it defines a set of callbacks needed to interact with the
specific device.

Patches 2-5 implement the needed mlxfw callbacks in the mlxsw spectrum
driver.

Patches 6 and 7 add boot-time firmware upgrade on the mlxsw spectrum
driver.

Patch 8 adds a fix needed for new firmware versions.

---
v1->v2:
- removed patch with the ethtool part

Ido Schimmel (1):
  mlxsw: spectrum_router: Adjust RIF configuration for new firmware
versions

Yotam Gigi (7):
  Add the mlxfw module for Mellanox firmware flash process
  mlxsw: reg: Add Management Component Query Information register
  mlxsw: reg: Add Management Component Control register
  mlxsw: reg: Add Management Component Data Access register
  mlxsw: spectrum: Add the needed callbacks for mlxfw integration
  mlxsw: core: Create the mlxsw_fw_rev struct
  mlxsw: spectrum: Validate firmware revision on init

 MAINTAINERS|   8 +
 drivers/net/ethernet/mellanox/Kconfig  |   1 +
 drivers/net/ethernet/mellanox/Makefile |   1 +
 drivers/net/ethernet/mellanox/mlxfw/Kconfig|   6 +
 drivers/net/ethernet/mellanox/mlxfw/Makefile   |   2 +
 drivers/net/ethernet/mellanox/mlxfw/mlxfw.h| 102 
 drivers/net/ethernet/mellanox/mlxfw/mlxfw_fsm.c| 273 +
 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.c   | 620 +
 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.h   |  66 +++
 .../net/ethernet/mellanox/mlxfw/mlxfw_mfa2_file.h  |  60 ++
 .../ethernet/mellanox/mlxfw/mlxfw_mfa2_format.h| 103 
 .../net/ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv.h   |  98 
 .../ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv_multi.c | 126 +
 .../ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv_multi.h |  71 +++
 drivers/net/ethernet/mellanox/mlxsw/Kconfig|   1 +
 drivers/net/ethernet/mellanox/mlxsw/core.h |  12 +-
 drivers/net/ethernet/mellanox/mlxsw/reg.h  | 219 
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 233 
 .../net/ethernet/mellanox/mlxsw/spectrum_router.c  |  22 +
 19 files changed, 2019 insertions(+), 5 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/Kconfig
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/Makefile
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_fsm.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2_file.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2_format.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv_multi.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxfw/mlxfw_mfa2_tlv_multi.h

-- 
2.9.3

bond link state mismatch, rtnl_trylock() vs rtnl_lock()

2017-05-23 Thread Nithin Sujir


Hi,
We're encountering a problem in 4.4 LTS where, rarely, the bond link 
state is not updated when the slave link changes.


I've traced the issue to the arp monitor being unable to get the rtnl lock. 
The sequence resulting in failure is as follows.


bond_loadbalance_arp_mon() is called periodically. If the slave link is 
_down_, it checks whether the slave is sending/receiving packets. If it is, 
it sets flags to be processed later in the function for the bond link 
update. However, it sets slave->link right away.


	if (slave->link != BOND_LINK_UP) {
		if (bond_time_in_interval(bond, trans_start, 1) &&
		    bond_time_in_interval(bond, slave->last_rx, 1)) {

			slave->link  = BOND_LINK_UP;
			slave_state_changed = 1;


Later down the function, it tries to get the rtnl_lock. If it doesn't 
get it, it rearms and returns.


	if (do_failover || slave_state_changed) {
		if (!rtnl_trylock())
			goto re_arm; <-- returns here

		if (slave_state_changed) {
			bond_slave_state_change(bond);

This is the problem. The next time this function is called, the 
slave->link is already marked UP. And we will never update the bond link 
state to UP.


Changing the rtnl_trylock() -> rtnl_lock() _does_ fix the issue.

Is this the right way to fix it? If it is, I can submit this formally.

What are the guidelines around using rtnl_lock() vs rtnl_trylock()? Some 
places use rtnl_lock() and others rtnl_trylock(). Sorry, I couldn't 
find much via a Google search or in Documentation/.


Thanks,
Nithin.



diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 5dca77e..1f60503 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2614,8 +2614,7 @@ static void bond_loadbalance_arp_mon(struct work_struct *work)
 	rcu_read_unlock();
 
 	if (do_failover || slave_state_changed) {
-		if (!rtnl_trylock())
-			goto re_arm;
+		rtnl_lock();
 
 		if (slave_state_changed) {
 			bond_slave_state_change(bond);
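
For illustration only, here is a minimal user-space sketch of the sequence
described in this message. It is not bonding code: the struct and function
names (model_bond, monitor_commit_early(), monitor_commit_late()) are
hypothetical, and the lock_available flag simply stands in for the result of
rtnl_trylock(). The first function commits the slave state before trying the
lock, as in the report; the second is one possible alternative that leaves
the slave state untouched until the lock is held, so a failed trylock is
retried on the next run.

```c
#include <stdbool.h>

enum link_state { LINK_DOWN, LINK_UP };

struct model_slave {
	enum link_state link;	/* committed per-slave state */
	bool traffic_seen;	/* stand-in for the bond_time_in_interval() checks */
};

struct model_bond {
	struct model_slave slave;
	enum link_state bond_link;	/* state reported for the bond itself */
};

/* Reported sequence: slave.link is written first, the lock is tried later. */
void monitor_commit_early(struct model_bond *b, bool lock_available)
{
	bool slave_state_changed = false;

	if (b->slave.link != LINK_UP && b->slave.traffic_seen) {
		b->slave.link = LINK_UP;
		slave_state_changed = true;
	}

	if (slave_state_changed) {
		if (!lock_available)
			return;	/* re-arm: the pending change is forgotten */
		b->bond_link = LINK_UP;
	}
}

/* Hypothetical alternative: nothing is committed until the lock is held. */
void monitor_commit_late(struct model_bond *b, bool lock_available)
{
	bool want_up = b->slave.link != LINK_UP && b->slave.traffic_seen;

	if (!want_up)
		return;
	if (!lock_available)
		return;	/* slave.link unchanged, so the next run proposes again */

	b->slave.link = LINK_UP;
	b->bond_link = LINK_UP;
}
```

With one failed trylock followed by a successful one, the first variant ends
with the slave UP and the bond still DOWN, while the second ends with both
UP. Whether deferring the commit or switching to rtnl_lock() is the better
fix for the real driver is exactly the open question above.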

Re: [kernel-hardening] [PATCH v4 next 0/3] modules: automatic module loading restrictions

2017-05-23 Thread Andy Lutomirski

On Tue, May 23, 2017 at 11:36 AM, Kees Cook  wrote:
> On Tue, May 23, 2017 at 12:48 AM, Solar Designer  wrote:
>> For modules_autoload_mode=2, we already seem to have the equivalent of
>> modprobe=/bin/true (or does it differ subtly, maybe in return values?),
>> which I already use at startup on a GPU box like this (preloading
>> modules so that the OpenCL backends wouldn't need the autoloading):
>>
>> nvidia-smi
>> nvidia-modprobe -u -c=0
>> #modprobe nvidia_uvm
>> #modprobe fglrx
>>
>> sysctl -w kernel.modprobe=/bin/true
>> sysctl -w kernel.hotplug=/bin/true
>>
>> but it's good to also have this supported more explicitly and more
>> consistently through modules_autoload_mode=2 while we're at it.  So I
>> support having this mode as well.  I just question the need to have it
>> non-resettable.
>
> I agree it's useful to have the explicit =2 state just to avoid
> confusion when more systems start implementing
> CONFIG_STATIC_USERMODEHELPER and kernel.modprobe becomes read-only
> (though the userspace implementation may allow for some way to disable
> it, etc). I just like avoiding the upcall to modprobe at all.

I fully support =2 to mean "no automatic loading at all".  I dislike
making it non-resettable.  If you can write to sysctls, then, most
likely you can either call init_module() directly or the system has
module loading disabled entirely.

Re: [PATCH net-next] tcp: fix TCP_SYNCNT flakes

2017-05-23 Thread Soheil Hassas Yeganeh

On Tue, May 23, 2017 at 3:38 PM, Eric Dumazet  wrote:
> From: Eric Dumazet 
>
> After the mentioned commit, some of our packetdrill tests became flaky.
>
> TCP_SYNCNT socket option can limit the number of SYN retransmits.
>
> retransmits_timed_out() has to compare time computations based on
> local_clock() while timers are based on jiffies. With NTP adjustments
> and rounding we can observe a 999 ms delay for 1000 ms timers.
> We end up sending one extra SYN packet.
>
> Gimmick added in commit 6fa12c850314 ("Revert Backoff [v3]: Calculate
> TCP's connection close threshold as a time value") makes no
> real sense for TCP_SYN_SENT sockets where no RTO backoff can happen at
> all.
>
> Let's use a simpler logic for TCP_SYN_SENT sockets and remove the @syn_set
> parameter from retransmits_timed_out().
>
> Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
> Signed-off-by: Eric Dumazet 
> Signed-off-by: Yuchung Cheng 

Acked-by: Soheil Hassas Yeganeh 

Nice!

Re: Alignment in BPF verifier

2017-05-23 Thread Alexei Starovoitov


On 5/23/17 7:41 AM, Edward Cree wrote:

I'm still plugging away at this... it's going to be quite a big patch and
 rewrite a lot of stuff (and I'm not sure I'll be able to break it into
 smaller bisectable patches).
And of course I have more questions.  In check_packet_ptr_add(), we
 forbid adding a negative constant to a packet ptr.  Is there some
 principled reason for that, or is it just because the bounds checking is
 hard?  It seems like, if imm + reg->off > 0 (suitably carefully checked
 to avoid overflow etc.), then the subtraction should be legal.  Indeed,
 even if the reg->off (fixed part of offset) is zero, if the variable part
 is known (min_value) to be >= -imm, the subtraction should be safe.


adding negative imm to pkt_ptr is ok, but what is the use case?
Do you see llvm generating such code?
I think if we try to track everything with the current shape of
state pruning, the verifier will stop accepting old programs
because it reaches complexity limit.

I think we need to rearchitect the whole thing.
I was thinking of doing it compiler-style: convert to SSA and
do traditional data flow analysis, use-def chains and register liveness.
Then pruning heuristics won't be necessary and the verifier should be
able to check everything in more or less a single pass.
Things like register liveness can be done without SSA. It can
be used to augment existing pruning, since it will know which
registers are dead, so they don't have to be compared, but it
feels half-way. I'd rather go all the way.


On 20/05/17 00:05, Daniel Borkmann wrote:

Besides PTR_TO_PACKET also PTR_TO_MAP_VALUE_OR_NULL uses it to
track all registers (incl. spilled ones) with the same reg->id
that originated from the same map lookup. After the reg type is
then migrated to either PTR_TO_MAP_VALUE (resp. CONST_PTR_TO_MAP
for map in map) or UNKNOWN_VALUE depending on the branch, the
reg->id is then reset to 0 again. Whole reason for this is that
LLVM generates code where it can move and/or spill a reg of type
PTR_TO_MAP_VALUE_OR_NULL to other regs before we do the NULL
test on it, and later on it expects that the spilled or moved
regs work wrt access. So they're marked with an id and then all
of them are type migrated. So here meaning of reg->id is different
than in PTR_TO_PACKET case.

Hmm, that means that we can't do arithmetic on a
 PTR_TO_MAP_VALUE_OR_NULL, we have to convert it to a PTR_TO_MAP_VALUE
 first by NULL-checking it.  That's probably fine, but I can just about
 imagine some compiler optimisation reordering them.  Any reason not to
 split this out into a different reg->field, rather than overloading id?


'id' is sort of like 'version' of a pointer and has the same meaning in
both cases. How exactly do you see this split?


Of course that would need (more) caution wrt. states_equal(), but it
 looks like I'll be mangling that a lot anyway - for instance, we don't
 want to just use memcmp() to compare alignments, we want to check that
 our alignment is stricter than the old alignment.  (Of course memcmp()
 is a conservative check, so the "memcmp() the whole reg_state" fast
 path can remain.)


Yes, that would be a good improvement. Not sure how much it will help
the pruning, though.

[PATCH net-next] tcp: fix TCP_SYNCNT flakes

2017-05-23 Thread Eric Dumazet

From: Eric Dumazet 

After the mentioned commit, some of our packetdrill tests became flaky.

TCP_SYNCNT socket option can limit the number of SYN retransmits.

retransmits_timed_out() has to compare time computations based on
local_clock() while timers are based on jiffies. With NTP adjustments
and rounding we can observe a 999 ms delay for 1000 ms timers.
We end up sending one extra SYN packet.

Gimmick added in commit 6fa12c850314 ("Revert Backoff [v3]: Calculate
TCP's connection close threshold as a time value") makes no
real sense for TCP_SYN_SENT sockets where no RTO backoff can happen at
all.

Let's use a simpler logic for TCP_SYN_SENT sockets and remove the @syn_set
parameter from retransmits_timed_out().

Fixes: 9a568de4818d ("tcp: switch TCP TS option (RFC 7323) to 1ms clock")
Signed-off-by: Eric Dumazet 
Signed-off-by: Yuchung Cheng 
---
 net/ipv4/tcp_timer.c |   26 +++---
 1 file changed, 11 insertions(+), 15 deletions(-)

diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c
index c4a35ba7f8ed0dac573c864900b081b4847927d8..c0ff962aa31401ee90f8bd015c2aae2ef932 100644
--- a/net/ipv4/tcp_timer.c
+++ b/net/ipv4/tcp_timer.c
@@ -139,21 +139,17 @@ static void tcp_mtu_probing(struct inet_connection_sock *icsk, struct sock *sk)
  *  @timeout:  A custom timeout value.
  * If set to 0 the default timeout is calculated and used.
  * Using TCP_RTO_MIN and the number of unsuccessful retransmits.
- *  @syn_set:  true if the SYN Bit was set.
  *
  * The default "timeout" value this function can calculate and use
  * is equivalent to the timeout of a TCP Connection
  * after "boundary" unsuccessful, exponentially backed-off
- * retransmissions with an initial RTO of TCP_RTO_MIN or TCP_TIMEOUT_INIT if
- * syn_set flag is set.
- *
+ * retransmissions with an initial RTO of TCP_RTO_MIN.
  */
 static bool retransmits_timed_out(struct sock *sk,
  unsigned int boundary,
- unsigned int timeout,
- bool syn_set)
+ unsigned int timeout)
 {
-   unsigned int rto_base = syn_set ? TCP_TIMEOUT_INIT : TCP_RTO_MIN;
+   const unsigned int rto_base = TCP_RTO_MIN;
unsigned int linear_backoff_thresh, start_ts;
 
if (!inet_csk(sk)->icsk_retransmits)
@@ -181,8 +177,8 @@ static int tcp_write_timeout(struct sock *sk)
struct inet_connection_sock *icsk = inet_csk(sk);
struct tcp_sock *tp = tcp_sk(sk);
struct net *net = sock_net(sk);
+   bool expired, do_reset;
int retry_until;
-   bool do_reset, syn_set = false;
 
if ((1 << sk->sk_state) & (TCPF_SYN_SENT | TCPF_SYN_RECV)) {
if (icsk->icsk_retransmits) {
@@ -196,9 +192,9 @@ static int tcp_write_timeout(struct sock *sk)
sk_rethink_txhash(sk);
}
retry_until = icsk->icsk_syn_retries ? : net->ipv4.sysctl_tcp_syn_retries;
-   syn_set = true;
+   expired = icsk->icsk_retransmits >= retry_until;
} else {
-   if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1, 0, 0)) {
+   if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1, 0)) {
/* Some middle-boxes may black-hole Fast Open _after_
 * the handshake. Therefore we conservatively disable
 * Fast Open on this path on recurring timeouts after
@@ -224,15 +220,15 @@ static int tcp_write_timeout(struct sock *sk)
 
retry_until = tcp_orphan_retries(sk, alive);
do_reset = alive ||
-   !retransmits_timed_out(sk, retry_until, 0, 0);
+   !retransmits_timed_out(sk, retry_until, 0);
 
if (tcp_out_of_resources(sk, do_reset))
return 1;
}
+   expired = retransmits_timed_out(sk, retry_until,
+   icsk->icsk_user_timeout);
}
-
-   if (retransmits_timed_out(sk, retry_until,
- syn_set ? 0 : icsk->icsk_user_timeout, syn_set)) {
+   if (expired) {
/* Has it gone just too far? */
tcp_write_err(sk);
return 1;
@@ -540,7 +536,7 @@ void tcp_retransmit_timer(struct sock *sk)
icsk->icsk_rto = min(icsk->icsk_rto << 1, TCP_RTO_MAX);
}
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, icsk->icsk_rto, TCP_RTO_MAX);
-   if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1 + 1, 0, 0))
+   if (retransmits_timed_out(sk, net->ipv4.sysctl_tcp_retries1 + 1, 0))
__sk_dst_reset(sk);
 
 out:;
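
To make the off-by-one-millisecond effect concrete, here is a small
stand-alone model (not kernel code; all names are hypothetical). It assumes
the default timeout is the sum of "boundary" exponentially backed-off
timers, which is what the function comment in the hunk above describes, and
it leaves out the TCP_RTO_MAX cap. The count-based helper mirrors the
"icsk->icsk_retransmits >= retry_until" test that the patch uses for
SYN_SENT/SYN_RECV sockets.

```c
#include <stdbool.h>

/* Sum of rto_base * 2^i for i = 0..boundary: the point in time by which
 * "boundary" exponentially backed-off retransmissions have all timed out.
 * The TCP_RTO_MAX cap is deliberately not modelled here.
 */
unsigned int model_backoff_timeout_ms(unsigned int boundary,
				      unsigned int rto_base_ms)
{
	return ((2u << boundary) - 1) * rto_base_ms;
}

/* Time-based decision: sensitive to a clock that reads 1 ms short. */
bool model_expired_by_time(unsigned int elapsed_ms, unsigned int boundary,
			   unsigned int rto_base_ms)
{
	return elapsed_ms >= model_backoff_timeout_ms(boundary, rto_base_ms);
}

/* Count-based decision: independent of which clock measured the delay. */
bool model_expired_by_count(unsigned int retransmits, unsigned int retry_until)
{
	return retransmits >= retry_until;
}
```

Taking a 1000 ms base and a limit of one SYN retransmit as assumed example
values, the time-based test expects 3000 ms to have elapsed; if the measured
delay is 2999 ms it reports "not expired" and one more SYN goes out, whereas
the count-based test already reports expiry after the first retransmit.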

Re: [PATCH v6 net-next 17/17] net: qualcomm: add QCA7000 UART driver

2017-05-23 Thread Stefan Wahren


> Lino Sanfilippo wrote on 23 May 2017 at 20:16:
> 
> 
> Hi,
> 
> On 23.05.2017 15:12, Stefan Wahren wrote:
> 
> 
> > +}
> > +
> > +static void qca_uart_remove(struct serdev_device *serdev)
> > +{
> > +   struct qcauart *qca = serdev_device_get_drvdata(serdev);
> > +
> > +   netif_carrier_off(qca->net_dev);
> > +   cancel_work_sync(&qca->tx_work);
> > +   unregister_netdev(qca->net_dev);
> 
> Note that it is still possible that the tx work is queued right after 
> cancel_work_sync()
> returned and before the net device is unregistered (and thus the check for 
> the net device
> being up at the beginning of the tx work function is passed and the function 
> is executed).

Even if the carrier is off? Since I see this pattern in some drivers, can you 
please point me to a reference, such as a thread?

> I suggest to avoid this possible race by first unregistering the netdevice 
> and then 
> calling cancel_work_sync().

What makes you sure that it's safe to unregister the netdev while the tx work 
queue is possibly active?

> Regards,
> Lino 
>

Re: arch: arm: bpf: Converting cBPF to eBPF for arm 32 bit

2017-05-23 Thread Kees Cook

On Tue, May 23, 2017 at 11:39 AM, Shubham Bansal
 wrote:
> Here is the patch I sent to the arm mailing list.

Thanks for sending this!

I'm trying to figure out the best way to split this up, but the bulk
of it (the actual JIT changes) seems like they need to all land at the
same time. Any thoughts on this Daniel?

-Kees

-- 
Kees Cook
Pixel Security

[PATCH net-next] net: dsa: support cross-chip ageing time

2017-05-23 Thread Vivien Didelot

Now that the switchdev bridge ageing time attribute is propagated to all
switch chips of the fabric, each switch can check if the requested value
is valid and program itself, so that the whole fabric shares a common
ageing time setting.

This is especially needed for switch chips in between others, containing
no bridge port members but evidently used in the data path.

To achieve that, remove the condition which skips the other switches. We
also don't need to identify the target switch anymore, thus remove the
sw_index member of the dsa_notifier_ageing_time_info notifier structure.

On ZII Dev Rev B (with two 88E6352 and one 88E6185) and ZII Dev Rev C
(with two 88E6390X), we have the following hardware configuration:

# ip link add name br0 type bridge
# ip link set master br0 dev lan6
br0: port 1(lan6) entered blocking state
br0: port 1(lan6) entered disabled state
# echo 2000 > /sys/class/net/br0/bridge/ageing_time

Before this patch:

zii-rev-b# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
30
30
15000

zii-rev-c# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
30
18750

After this patch:

zii-rev-b# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
15000
15000
15000

zii-rev-c# cat /sys/kernel/debug/mv88e6xxx/sw*/age_time
18750
18750

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa_priv.h | 1 -
 net/dsa/port.c | 1 -
 net/dsa/switch.c   | 4 
 3 files changed, 6 deletions(-)

diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 1d52f9051d0e..c1d4180651af 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -32,7 +32,6 @@ enum {
 struct dsa_notifier_ageing_time_info {
struct switchdev_trans *trans;
unsigned int ageing_time;
-   int sw_index;
 };
 
 /* DSA_NOTIFIER_BRIDGE_* */
diff --git a/net/dsa/port.c b/net/dsa/port.c
index c88c0cec8454..efc3bce3a89d 100644
--- a/net/dsa/port.c
+++ b/net/dsa/port.c
@@ -135,7 +135,6 @@ int dsa_port_ageing_time(struct dsa_port *dp, clock_t ageing_clock,
unsigned int ageing_time = jiffies_to_msecs(ageing_jiffies);
struct dsa_notifier_ageing_time_info info = {
.ageing_time = ageing_time,
-   .sw_index = dp->ds->index,
.trans = trans,
};
 
diff --git a/net/dsa/switch.c b/net/dsa/switch.c
index c1e4b2d5a3ae..d8e5c311ee7c 100644
--- a/net/dsa/switch.c
+++ b/net/dsa/switch.c
@@ -37,10 +37,6 @@ static int dsa_switch_ageing_time(struct dsa_switch *ds,
unsigned int ageing_time = info->ageing_time;
struct switchdev_trans *trans = info->trans;
 
-   /* Do not care yet about other switch chips of the fabric */
-   if (ds->index != info->sw_index)
-   return 0;
-
if (switchdev_trans_ph_prepare(trans)) {
if (ds->ageing_time_min && ageing_time < ds->ageing_time_min)
return -ERANGE;
-- 
2.13.0
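
As a rough illustration of the behaviour after this patch, the following
user-space model (hypothetical names, not DSA code) applies one requested
ageing time to every switch of a fabric in two phases: a prepare phase in
which any switch may reject the value with -ERANGE, mirroring the
ageing_time_min check visible in the last hunk, and a commit phase in which
all switches program it. Per-chip rounding, which explains the 15000 and
18750 values in the output above, is not modelled.

```c
#include <errno.h>
#include <stddef.h>

struct model_switch {
	unsigned int ageing_time_min;	/* 0 means no lower bound */
	unsigned int ageing_time;	/* value currently programmed, in ms */
};

/* Apply msecs to all n switches, or to none if any switch rejects it. */
int model_fabric_set_ageing_time(struct model_switch *sw, size_t n,
				 unsigned int msecs)
{
	size_t i;

	/* Prepare phase: every switch validates the request. */
	for (i = 0; i < n; i++) {
		if (sw[i].ageing_time_min && msecs < sw[i].ageing_time_min)
			return -ERANGE;
	}

	/* Commit phase: every switch programs the same value. */
	for (i = 0; i < n; i++)
		sw[i].ageing_time = msecs;

	return 0;
}
```

The point of the sketch is only that no switch is skipped: a chip with no
bridge port members still validates and stores the value.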

Re: [PATCH v4 next 1/3] modules:capabilities: allow __request_module() to take a capability argument

2017-05-23 Thread Kees Cook

On Tue, May 23, 2017 at 3:29 AM, Djalal Harouni  wrote:
> On Tue, May 23, 2017 at 12:20 AM, Kees Cook  wrote:
>> On Mon, May 22, 2017 at 4:57 AM, Djalal Harouni  wrote:
>>> This is a preparation patch for the module auto-load restriction feature.
>>>
>>> In order to restrict module auto-load operations we need to check if the
>>> caller has CAP_SYS_MODULE capability. This allows to align security
>>> checks of automatic module loading with the checks of the explicit 
>>> operations.
>>>
>>> However for "netdev-%s" modules, they are allowed to be loaded if
>>> CAP_NET_ADMIN is set. Therefore, in order to not break this assumption,
>>> and allow userspace to only load "netdev-%s" modules with CAP_NET_ADMIN
>>> capability which is considered a privileged operation, we have two
>>> choices: 1) parse "netdev-%s" alias and check the capability or 2) hand
>>> the capability from request_module() to security_kernel_module_request()
>>> hook and let the capability subsystem decide.
>>>
>>> After a discussion with Rusty Russell [1], the suggestion was to pass
>>> the capability from request_module() to security_kernel_module_request()
>>> for 'netdev-%s' modules that need CAP_NET_ADMIN.
>>>
>>> The patch does not update request_module(), it updates the internal
>>> __request_module() that will take an extra "allow_cap" argument. If
>>> positive, then automatic module load operation can be allowed.
>>
>> I find this refactor slightly confusing. I would expect to collapse
>> the existing caps checks in net/core/dev_ioctl.c and
>> net/ipv4/tcp_cong.c, and make this a "required cap" argument, and to
>> add a new non-__ function instead of requiring callers use
>> __request_module.
>>
>> request_module_capable(int cap_required, fmt, args);
>>
>> adjust __request_module() for the new arg, and when cap_required !=
>> -1, perform a cap check.
>>
>> Then make request_module pass -1 to __request_module(), and change
>> dev_ioctl.c (and tcp_cong.c) from:
>>
>> if (no_module && capable(CAP_NET_ADMIN))
>> no_module = request_module("netdev-%s", name);
>> if (no_module && capable(CAP_SYS_MODULE))
>> request_module("%s", name);
>>
>> to:
>>
>> if (no_module)
>> no_module = request_module_capable(CAP_NET_ADMIN,
>> "netdev-%s", name);
>> if (no_module)
>> no_module = request_module_capable(CAP_SYS_MODULE, "%s", 
>> name);
>>
>> that'll make the code cleaner, too.
>
> The refactoring in the patch is more for backward compatibility with
> CAP_NET_ADMIN,
> as discussed here: https://lkml.org/lkml/2017/4/26/147

I think Rusty and I are saying the same thing here, and I must be not
understanding something you're trying to explain. Apologies for being
dense.

> I think if there is an interface request_module_capable() , then code
> will use it. The DCCP code path did not check capabilities at all and
> called request_module(), other code does the same.
>
> A new interface can be abused, the result of this: we may break
> "modules_autoload_mode" in mode 0 and 1. In the long term code will
> want to change may_autoload_module() to also allow mode 1 to load a
> module with CAP_NET_ADMIN or other caps in its own userns, resulting
> in "modules_autoload_mode == 0 == 1". Without userns in the game we
> may just see request_module_capable(CAP_SYS_ADMIN, ...)  . There is
> already some code maybe phonet sockets ? that require CAP_SYS_ADMIN to
> get the appropriate protocol and no one will be able to review all
> this code or track new patches with request_module_capable() callers.

I'm having some trouble following what you're saying here, but if I
understand, you're worried about getting the kernel into a state where
autoload state 0 == 1. Autoload 0 is "business as usual", and autoload
1 is "CAP_SYS_MODULE required to be able to trigger a module auto-load
operation, or CAP_NET_ADMIN for modules with a 'netdev-%s' alias."

In the v4 patch, under autoload==1, CAP_NET_ADMIN is needed to load
netdev- modules:

if (no_module && capable(CAP_NET_ADMIN))
   no_module = __request_module(true, CAP_NET_ADMIN,
"netdev-%s", name);

and in the LSM hook, CAP_NET_ADMIN is passed as an allowable "alias"
for the CAP_SYS_MODULE requirement:

   else if (modules_autoload_mode == MODULES_AUTOLOAD_PRIVILEGED) {
   /* Check CAP_SYS_MODULE then allow_cap if valid */
   if (capable(CAP_SYS_MODULE) ||
   (allow_cap > 0 && capable(allow_cap)))
  return 0;
   }
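
(A stand-alone sketch of the decision described in this thread, for readers
who want to experiment with it outside the kernel. It is an assumption-laden
model, not the v4 patch: the MODEL_CAP_* bit flags are made up and do not
match the real capability numbers, mode 2 is taken to mean "no automatic
loading at all" as stated elsewhere in the thread, and -EPERM is chosen here
as the failure code for illustration.)

```c
#include <errno.h>

/* Hypothetical capability bits for the model only. */
#define MODEL_CAP_NET_ADMIN	(1u << 0)
#define MODEL_CAP_SYS_MODULE	(1u << 1)

enum model_autoload_mode {
	MODEL_AUTOLOAD_ALLOWED = 0,	/* business as usual */
	MODEL_AUTOLOAD_PRIVILEGED = 1,	/* CAP_SYS_MODULE, or the allowed cap */
	MODEL_AUTOLOAD_DISABLED = 2,	/* no automatic loading at all */
};

/* Returns 0 if the auto-load may proceed, -EPERM otherwise.
 * allow_cap == 0 means the caller offered no alternative capability.
 */
int model_may_autoload(enum model_autoload_mode mode,
		       unsigned int caller_caps, unsigned int allow_cap)
{
	if (mode == MODEL_AUTOLOAD_ALLOWED)
		return 0;

	if (mode == MODEL_AUTOLOAD_PRIVILEGED) {
		if (caller_caps & MODEL_CAP_SYS_MODULE)
			return 0;
		if (allow_cap && (caller_caps & allow_cap))
			return 0;
	}

	return -EPERM;
}
```

In this model a caller holding only the NET_ADMIN bit succeeds in mode 1
when that bit is passed as allow_cap, fails in mode 1 when it is not, and
always fails in mode 2.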

What I see is some needless double-checking. Since you're making
changes to the request_module() API, it would be possible to have
request_module_cap(), which could be checked instead of open-coding
it:

 if (no_module)
no_module = request_module_cap(CAP_NET_ADMIN, "netdev-%s", name);

If I'm understanding your objection correctly, it's that you want to
ONLY ever provide thi

Re: [RFC V1 1/1] net: cdc_ncm: Reduce memory use when kernel memory low

2017-05-23 Thread Baxter, Jim

From: David S. Miller (da...@davemloft.net)
Sent: Tue, 23 May 2017 11:26:25 -0400 
> From: Oliver Neukum 
> Date: Tue, 23 May 2017 10:42:48 +0200
> 
>>
>> We could use a counter. After the first failure, do it once, after the
>> second twice and so on. And reset the counter as a higher order
>> allocation works. (just bound it somewhere)
> 
> So an exponential backoff, that might work.
> 

As an idea I have created this patch as an addition to the original patch
in this series.

Would this be acceptable?

At the moment I have capped the value at 10; does anyone think it needs to
be much higher than that?

Regards,
Jim


diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c
index c06d20f..0e40603 100644
--- a/drivers/net/usb/cdc_ncm.c
+++ b/drivers/net/usb/cdc_ncm.c
@@ -89,6 +89,8 @@ struct cdc_ncm_stats {
CDC_NCM_SIMPLE_STAT(rx_ntbs),
 };
 
+#define CDC_NCM_LOW_MEM_MAX_CNT 10
+
 static int cdc_ncm_get_sset_count(struct net_device __always_unused *netdev, int sset)
 {
switch (sset) {
@@ -,8 +1113,13 @@ struct sk_buff *
 
/* allocate a new OUT skb */
if (!skb_out) {
-   ctx->tx_curr_size = ctx->tx_max;
-   skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
+   if (ctx->tx_low_mem_val == 0) {
+   ctx->tx_curr_size = ctx->tx_max;
+   skb_out = alloc_skb(ctx->tx_curr_size, GFP_ATOMIC);
+   if (skb_out == NULL) {
+   ctx->tx_low_mem_max_cnt = min(ctx->tx_low_mem_max_cnt + 1, CDC_NCM_LOW_MEM_MAX_CNT);
+   ctx->tx_low_mem_val = ctx->tx_low_mem_max_cnt;
+   }
if (skb_out == NULL) {
/* See if a very small allocation is possible.
 * We will send this packet immediately and hope
@@ -1127,12 +1134,13 @@ struct sk_buff *
 
/* No allocation possible so we will abort */
if (skb_out == NULL) {
-   if (skb != NULL) {
+   if (skb) {
dev_kfree_skb_any(skb);
dev->net->stats.tx_dropped++;
}
goto exit_no_skb;
}
+   ctx->tx_low_mem_val--;
}
/* fill out the initial 16-bit NTB header */
nth16 = (struct usb_cdc_ncm_nth16 *)memset(skb_put(skb_out, sizeof(struct usb_cdc_ncm_nth16)), 0, sizeof(struct usb_cdc_ncm_nth16));
diff --git a/include/linux/usb/cdc_ncm.h b/include/linux/usb/cdc_ncm.h
index 5162f38..25a0aed 100644
--- a/include/linux/usb/cdc_ncm.h
+++ b/include/linux/usb/cdc_ncm.h
@@ -118,6 +118,8 @@ struct cdc_ncm_ctx {
u32 rx_max;
u32 tx_max;
u32 tx_curr_size;
+   u32 tx_low_mem_max_cnt;
+   u32 tx_low_mem_val;
u32 max_datagram_size;
u16 tx_max_datagrams;
u16 tx_remainder;
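
The counter scheme suggested in the quoted discussion can be tried in
isolation with the model below. It is only a sketch under assumptions: the
struct and helper names are hypothetical, and the reset on a successful
large allocation follows Oliver's description ("reset the counter as a
higher order allocation works") rather than anything visible in the hunks
above. Note that the skip count grows by one per failure, so the growth is
linear and bounded rather than exponential.

```c
#include <stdbool.h>

#define MODEL_LOW_MEM_MAX_CNT 10u

struct model_tx_backoff {
	unsigned int max_cnt;	/* small allocations to use after the next failure */
	unsigned int val;	/* small allocations left before retrying the large one */
};

/* True when the full-size allocation should be attempted now. */
bool model_should_try_large(const struct model_tx_backoff *b)
{
	return b->val == 0;
}

/* Record the outcome of a full-size allocation attempt. */
void model_large_result(struct model_tx_backoff *b, bool ok)
{
	if (ok) {
		/* Memory pressure is over: forget the backoff entirely. */
		b->max_cnt = 0;
		b->val = 0;
		return;
	}
	if (b->max_cnt < MODEL_LOW_MEM_MAX_CNT)
		b->max_cnt++;
	b->val = b->max_cnt;
}

/* Record that one small allocation was used instead of the large one. */
void model_small_used(struct model_tx_backoff *b)
{
	if (b->val > 0)
		b->val--;
}
```

Starting from zero, the first failure leads to one small allocation before
the next large attempt, the second failure to two, and so on up to the cap
of 10.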

Re: [PATCH v3 net] net: phy: marvell: Limit errata to 88m1101

2017-05-23 Thread Florian Fainelli

On 05/23/2017 08:49 AM, Andrew Lunn wrote:
> The 88m1101 has an errata when configuring autoneg. However, it was
> being applied to many other Marvell PHYs as well. Limit its scope to
> just the 88m1101.
> 
> Fixes: 76884679c644 ("phylib: Add support for Marvell 88eS and 88e1145")
> Reported-by: Daniel Walker 
> Signed-off-by: Andrew Lunn 
> Acked-by: Harini Katakam 

Reviewed-by: Florian Fainelli 

It would be great to add proper defines (at least for the registers)
for what this workaround accomplishes. Does anyone have the datasheet?

> ---
> 
> v3: Rebase onto net.
> 
> drivers/net/phy/marvell.c | 66 ++-
>  1 file changed, 37 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/net/phy/marvell.c b/drivers/net/phy/marvell.c
> index 272b051a0199..9097e42bec2e 100644
> --- a/drivers/net/phy/marvell.c
> +++ b/drivers/net/phy/marvell.c
> @@ -255,34 +255,6 @@ static int marvell_config_aneg(struct phy_device *phydev)
>  {
>   int err;
>  
> - /* The Marvell PHY has an errata which requires
> -  * that certain registers get written in order
> -  * to restart autonegotiation */
> - err = phy_write(phydev, MII_BMCR, BMCR_RESET);
> -
> - if (err < 0)
> - return err;
> -
> - err = phy_write(phydev, 0x1d, 0x1f);
> - if (err < 0)
> - return err;
> -
> - err = phy_write(phydev, 0x1e, 0x200c);
> - if (err < 0)
> - return err;
> -
> - err = phy_write(phydev, 0x1d, 0x5);
> - if (err < 0)
> - return err;
> -
> - err = phy_write(phydev, 0x1e, 0);
> - if (err < 0)
> - return err;
> -
> - err = phy_write(phydev, 0x1e, 0x100);
> - if (err < 0)
> - return err;
> -
>   err = marvell_set_polarity(phydev, phydev->mdix_ctrl);
>   if (err < 0)
>   return err;
> @@ -316,6 +288,42 @@ static int marvell_config_aneg(struct phy_device *phydev)
>   return 0;
>  }
>  
> +static int m88e1101_config_aneg(struct phy_device *phydev)
> +{
> + int err;
> +
> + /* This Marvell PHY has an errata which requires
> +  * that certain registers get written in order
> +  * to restart autonegotiation
> +  */
> + err = phy_write(phydev, MII_BMCR, BMCR_RESET);
> +
> + if (err < 0)
> + return err;
> +
> + err = phy_write(phydev, 0x1d, 0x1f);
> + if (err < 0)
> + return err;
> +
> + err = phy_write(phydev, 0x1e, 0x200c);
> + if (err < 0)
> + return err;
> +
> + err = phy_write(phydev, 0x1d, 0x5);
> + if (err < 0)
> + return err;
> +
> + err = phy_write(phydev, 0x1e, 0);
> + if (err < 0)
> + return err;
> +
> + err = phy_write(phydev, 0x1e, 0x100);
> + if (err < 0)
> + return err;
> +
> + return marvell_config_aneg(phydev);
> +}
> +
>  static int m88e_config_aneg(struct phy_device *phydev)
>  {
>   int err;
> @@ -1892,7 +1900,7 @@ static struct phy_driver marvell_drivers[] = {
>   .flags = PHY_HAS_INTERRUPT,
>   .probe = marvell_probe,
>   .config_init = &marvell_config_init,
> - .config_aneg = &marvell_config_aneg,
> + .config_aneg = &m88e1101_config_aneg,
>   .read_status = &genphy_read_status,
>   .ack_interrupt = &marvell_ack_interrupt,
>   .config_intr = &marvell_config_intr,
> 


-- 
Florian

Re: [PATCH] net: ieee802154: fix potential null pointer dereference

2017-05-23 Thread Gustavo A. R. Silva


Hi Marcel,

Quoting Marcel Holtmann:

> Hi Gustavo,
>
>> Null check at line 918: if (!spi) {, implies spi might be NULL.
>> Function spi_get_drvdata() dereferences pointer spi.
>> Move pointer priv assignment after the null check.
>>
>> Addresses-Coverity-ID: 140
>> Signed-off-by: Gustavo A. R. Silva
>> ---
>> drivers/net/ieee802154/ca8210.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> patch has been applied to bluetooth-next tree.

Awesome :)

Thanks
--
Gustavo A. R. Silva

[PATCH] net: ieee802154: fix potential null pointer dereference

2017-05-23 Thread Gustavo A. R. Silva

Null check at line 918: if (!spi) {, implies spi might be NULL.
Function spi_get_drvdata() dereferences pointer spi.
Move pointer priv assignment after the null check.

Addresses-Coverity-ID: 140
Signed-off-by: Gustavo A. R. Silva 
---
 drivers/net/ieee802154/ca8210.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ieee802154/ca8210.c b/drivers/net/ieee802154/ca8210.c
index 25fd3b0..ccaf20d 100644
--- a/drivers/net/ieee802154/ca8210.c
+++ b/drivers/net/ieee802154/ca8210.c
@@ -912,7 +912,7 @@ static int ca8210_spi_transfer(
 )
 {
int i, status = 0;
-   struct ca8210_priv *priv = spi_get_drvdata(spi);
+   struct ca8210_priv *priv;
struct cas_control *cas_ctl;
 
if (!spi) {
@@ -923,6 +923,7 @@ static int ca8210_spi_transfer(
return -ENODEV;
}
 
+   priv = spi_get_drvdata(spi);
reinit_completion(&priv->spi_transfer_complete);
 
dev_dbg(&spi->dev, "ca8210_spi_transfer called\n");
-- 
2.5.0
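
The pattern behind this fix can be sketched in userspace C. The types and helpers below (struct fake_spi, fake_get_drvdata(), fake_transfer()) are hypothetical stand-ins, not the ca8210 driver or the SPI core API. The only point is that the drvdata lookup dereferences the device pointer, so it has to come after the NULL check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct spi_device and struct ca8210_priv. */
struct fake_priv { int transfers; };
struct fake_spi  { struct fake_priv *drvdata; };

/* Like spi_get_drvdata(), this dereferences its argument unconditionally. */
static struct fake_priv *fake_get_drvdata(struct fake_spi *spi)
{
	return spi->drvdata;
}

static int fake_transfer(struct fake_spi *spi)
{
	struct fake_priv *priv;	/* declared only; not initialised from spi here */

	if (!spi)
		return -ENODEV;	/* bail out before anything touches *spi */

	priv = fake_get_drvdata(spi);	/* safe: spi is known to be non-NULL */
	assert(priv != NULL);		/* assumption: drvdata was set at probe time */
	priv->transfers++;
	return 0;
}
```

With the lookup placed in the declaration instead, a NULL spi would be dereferenced before the check ever runs, which is the case the Coverity report points at.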

Re: arch: arm: bpf: Converting cBPF to eBPF for arm 32 bit

2017-05-23 Thread Shubham Bansal

Hi Kees, Daniel, Mircea and David,

Here is the patch I sent to the arm mailing list.
Any comments are welcome.

-- Forwarded message --
From: Shubham Bansal 
Date: Wed, May 24, 2017 at 12:03 AM
Subject: [PATCH] RFC: arm: eBPF JIT compiler
To: li...@armlinux.org.uk
Cc: linux-arm-ker...@lists.infradead.org,
linux-ker...@vger.kernel.org, Shubham Bansal



The JIT compiler emits ARM 32-bit instructions. Currently, it supports
eBPF only. Classic BPF is still supported because the BPF core converts
it to eBPF.

JIT is enabled with

echo 1 > /proc/sys/net/core/bpf_jit_enable

Constant Blinding can be enabled along with JIT using

echo 1 > /proc/sys/net/core/bpf_jit_enable
echo 2 > /proc/sys/net/core/bpf_jit_harden

See Documentation/networking/filter.txt for more information.
Tested on ARMv7 with CONFIG_FRAME_POINTER enabled.

Results:

1. Interpreter:

[   93.551176] test_bpf: Summary: 314 PASSED, 0 FAILED, [0/306 JIT'ed]

2. JIT enabled:

[   92.913931] test_bpf: Summary: 314 PASSED, 0 FAILED, [278/306 JIT'ed]

3. JIT + blinding enabled:

[  109.414506] test_bpf: Summary: 314 PASSED, 0 FAILED, [278/306 JIT'ed]

Currently, the following eBPF instructions are not JITed:

BPF_ALU64 | BPF_DIV | BPF_K
BPF_ALU64 | BPF_DIV | BPF_X
BPF_ALU64 | BPF_MOD | BPF_K
BPF_ALU64 | BPF_MOD | BPF_X
BPF_STX | BPF_XADD | BPF_W
BPF_STX | BPF_XADD | BPF_DW
BPF_JMP | BPF_CALL

Signed-off-by: Shubham Bansal 
---
 arch/arm/net/bpf_jit_32.c | 2410 ++---
 arch/arm/net/bpf_jit_32.h |  108 +-
 2 files changed, 1716 insertions(+), 802 deletions(-)

diff --git a/arch/arm/net/bpf_jit_32.c b/arch/arm/net/bpf_jit_32.c
index 93d0b6d..338d352 100644
--- a/arch/arm/net/bpf_jit_32.c
+++ b/arch/arm/net/bpf_jit_32.c
@@ -1,13 +1,16 @@
 /*
- * Just-In-Time compiler for BPF filters on 32bit ARM
+ * Just-In-Time compiler for eBPF filters on 32bit ARM
  *
  * Copyright (c) 2011 Mircea Gherzan 
+ * Copyright (c) 2017 Shubham Bansal 
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms of the GNU General Public License as published by the
  * Free Software Foundation; version 2 of the License.
  */
+#define pr_fmt(fmt) "bpf_jit: " fmt

+#include 
 #include 
 #include 
 #include 
@@ -23,44 +26,91 @@

 #include "bpf_jit_32.h"

+int bpf_jit_enable __read_mostly;
+
+#define STACK_OFFSET(k)(k)
+#define TMP_REG_1  (MAX_BPF_JIT_REG + 0)   /* TEMP Register 1 */
+#define TMP_REG_2  (MAX_BPF_JIT_REG + 1)   /* TEMP Register 2 */
+#define TCALL_CNT  (MAX_BPF_JIT_REG + 2)   /* Tail Call Count */
+
+/* Flags used for JIT optimization */
+#define SEEN_CALL  (1 << 0)
+
+#define FLAG_IMM_OVERFLOW  (1 << 0)
+
 /*
- * ABI:
+ * Map eBPF registers to ARM 32bit registers or stack scratch space.
+ *
+ * 1. First argument is passed using the arm 32bit registers and rest of the
+ * arguments are passed on stack scratch space.
+ * 2. First callee-saved argument is mapped to arm 32 bit registers and rest
+ * arguments are mapped to scratch space on stack.
+ * 3. We need two 64 bit temp registers to do complex operations on eBPF
+ * registers.
+ *
+ * As the eBPF registers are all 64 bit registers and arm has only 32 bit
+ * registers, we have to map each eBPF register to two arm 32 bit regs or
+ * scratch memory space and we have to build eBPF 64 bit register from those.
  *
- * r0  scratch register
- * r4  BPF register A
- * r5  BPF register X
- * r6  pointer to the skb
- * r7  skb->data
- * r8  skb_headlen(skb)
  */
+static const u8 bpf2a32[][2] = {
+   /* return value from in-kernel function, and exit value from eBPF */
+   [BPF_REG_0] = {ARM_R1, ARM_R0},
+   /* arguments from eBPF program to in-kernel function */
+   [BPF_REG_1] = {ARM_R3, ARM_R2},
+   /* Stored on stack scratch space */
+   [BPF_REG_2] = {STACK_OFFSET(0), STACK_OFFSET(4)},
+   [BPF_REG_3] = {STACK_OFFSET(8), STACK_OFFSET(12)},
+   [BPF_REG_4] = {STACK_OFFSET(16), STACK_OFFSET(20)},
+   [BPF_REG_5] = {STACK_OFFSET(24), STACK_OFFSET(28)},
+   /* callee saved registers that in-kernel function will preserve */
+   [BPF_REG_6] = {ARM_R5, ARM_R4},
+   /* Stored on stack scratch space */
+   [BPF_REG_7] = {STACK_OFFSET(32), STACK_OFFSET(36)},
+   [BPF_REG_8] = {STACK_OFFSET(40), STACK_OFFSET(44)},
+   [BPF_REG_9] = {STACK_OFFSET(48), STACK_OFFSET(52)},
+   /* Read only Frame Pointer to access Stack */
+   [BPF_REG_FP] = {STACK_OFFSET(56), STACK_OFFSET(60)},
+   /* Temporary Register for internal BPF JIT, can be used
+* for constant blindings and others.
+*/
+   [TMP_REG_1] = {ARM_R7, ARM_R6},
+   [TMP_REG_2] = {ARM_R10, ARM_R8},
+   /* Tail call count. Stored on stack scratch space. */
+   [TCALL_CNT] = {STACK_OFFSET(64), STACK_OFFSET(68)},
+   /* temporary register for blinding constants.
+
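
To illustrate why each eBPF register needs a pair of 32-bit locations, here is a userspace sketch (not code from the patch) of a 64-bit addition built from two 32-bit halves with a carry. struct reg64, add64() and to_u64() are hypothetical names, and the {hi, lo} member order is only an assumption chosen for the example.

```c
#include <stdint.h>

/* Hypothetical model: one 64-bit eBPF register held as two 32-bit halves. */
struct reg64 {
	uint32_t hi;
	uint32_t lo;
};

/* dst + src, the way an add / add-with-carry pair works on a 32-bit CPU. */
static struct reg64 add64(struct reg64 dst, struct reg64 src)
{
	struct reg64 out;
	uint32_t carry;

	out.lo = dst.lo + src.lo;	/* unsigned add wraps modulo 2^32 */
	carry = out.lo < dst.lo;	/* a wrap means a carry into the high half */
	out.hi = dst.hi + src.hi + carry;
	return out;
}

/* Reassemble the halves so the result can be compared with native 64-bit math. */
static uint64_t to_u64(struct reg64 r)
{
	return ((uint64_t)r.hi << 32) | r.lo;
}
```

Every 64-bit ALU operation has to be decomposed along these lines, which is presumably why the JIT reserves two 64-bit temporaries for the more complex cases.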

Re: [kernel-hardening] [PATCH v4 next 0/3] modules: automatic module loading restrictions

2017-05-23 Thread Kees Cook

On Tue, May 23, 2017 at 12:48 AM, Solar Designer  wrote:
> For modules_autoload_mode=2, we already seem to have the equivalent of
> modprobe=/bin/true (or does it differ subtly, maybe in return values?),
> which I already use at startup on a GPU box like this (preloading
> modules so that the OpenCL backends wouldn't need the autoloading):
>
> nvidia-smi
> nvidia-modprobe -u -c=0
> #modprobe nvidia_uvm
> #modprobe fglrx
>
> sysctl -w kernel.modprobe=/bin/true
> sysctl -w kernel.hotplug=/bin/true
>
> but it's good to also have this supported more explicitly and more
> consistently through modules_autoload_mode=2 while we're at it.  So I
> support having this mode as well.  I just question the need to have it
> non-resettable.

I agree it's useful to have the explicit =2 state just to avoid
confusion when more systems start implementing
CONFIG_STATIC_USERMODEHELPER and kernel.modprobe becomes read-only
(though the userspace implementation may allow for some way to disable
it, etc). I just like avoiding the upcall to modprobe at all.

-Kees

-- 
Kees Cook
Pixel Security

Re: [PATCHv4] wlcore: add wl1285 compatible

2017-05-23 Thread Sebastian Reichel

Hi,

On Mon, May 22, 2017 at 11:50:04AM -0500, Rob Herring wrote:
> [...]
> >> >> Thanks, I'll take it then. Not sure why Sebastian was suggested to
> >> >> submit this patch via your tree in the first place.
> >> >>
> >> >> https://patchwork.kernel.org/patch/9713645/
> >> >
> >> > Thanks. The idea was to get into early 4.12-rc to avoid merge
> >> > conflicts in the droid 4 *.dts during 4.13 cycle. This strategy
> >> > obviously failed :)
> >>
> >> First, I'm not sure why you combined everything. A maintainer can just
> >> as easily take a series as a single patch and we prefer binding doc,
> >> dts and driver changes all separate.
> >>
> >> Second, the dts changes could go thru arm-soc and the driver change
> >> thru netdev. The binding doc can be thru either. There's no bisecting
> >> dependency and things shouldn't break. It just won't all work until
> >> you have both branches.
> >
> > This is only true for new devices. WLAN for droid4 works at the
> > moment using incorrect compatible string. If *.dts is updated and
> > driver is not yet updated WLAN does not work. IMHO that is a
> > bisecting dependency.
> 
> True. That's also breaking compatibility if a new kernel doesn't work
> with an old DT.

This way around works. It's the other way around, that does not work
(new DT with old kernel).

> Is it just a compatible string change? If so, then just keep the
> old string as a fallback.

That should work.

-- Sebastian



Re: [PATCH net-next] sctp: no need to check asoc_id before calling sctp_assoc_set_id

2017-05-23 Thread Xin Long

On Wed, May 24, 2017 at 2:04 AM, Marcelo Ricardo Leitner
 wrote:
[...]
>
> On Tue, May 23, 2017 at 02:30:32PM +0800, Xin Long wrote:
>> sctp_assoc_set_id has already done the asoc_id check in the beginning
>   s//does/
> Very important to use the right tense here as otherwise it gives the
> impression that the check was already made, which is not true.
Sure, will change it and post v2.
But may be later than another two patches in my hands.

>
>> when processing dupcook, no need to do the same check before calling
> s/^^^/dupcookie/
Oh, can't believe it's still here ;)
will correct it. thanks.

>
> other than this, LGTM, including the comment update.
>
>   Marcelo
>
>> it.
>>
>> Signed-off-by: Xin Long 
>> ---
>>  net/sctp/associola.c | 8 ++--
>>  1 file changed, 2 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
>> index a9708da..ce2a3ec 100644
>> --- a/net/sctp/associola.c
>> +++ b/net/sctp/associola.c
>> @@ -1181,12 +1181,8 @@ void sctp_assoc_update(struct sctp_association *asoc,
>>   new->stream = NULL;
>>   }
>>
>> - if (!asoc->assoc_id) {
>> - /* get a new association id since we don't have one
>> -  * yet.
>> -  */
>> - sctp_assoc_set_id(asoc, GFP_ATOMIC);
>> - }
>> + /* get a new association id if we don't have one yet. */
>> + sctp_assoc_set_id(asoc, GFP_ATOMIC);
>>   }
>>
>>   /* SCTP-AUTH: Save the peer parameters from the new associations
>> --
>> 2.1.0
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-sctp" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>

Re: [PATCH v6 net-next 17/17] net: qualcomm: add QCA7000 UART driver

2017-05-23 Thread Lino Sanfilippo

Hi,

On 23.05.2017 15:12, Stefan Wahren wrote:

> +}
> +
> +static void qca_uart_remove(struct serdev_device *serdev)
> +{
> + struct qcauart *qca = serdev_device_get_drvdata(serdev);
> +
> + netif_carrier_off(qca->net_dev);
> + cancel_work_sync(&qca->tx_work);
> + unregister_netdev(qca->net_dev);

Note that it is still possible that the tx work is queued right after
cancel_work_sync() returned and before the net device is unregistered (and
thus the check for the net device being up at the beginning of the tx work
function is passed and the function is executed).
I suggest avoiding this possible race by first unregistering the netdevice
and then calling cancel_work_sync().

Regards,
Lino

Re: [PATCH] net: ieee802154: fix potential null pointer dereference

2017-05-23 Thread Marcel Holtmann

Hi Gustavo,

> Null check at line 918: if (!spi) {, implies spi might be NULL.
> Function spi_get_drvdata() dereferences pointer spi.
> Move pointer priv assignment after the null check.
> 
> Addresses-Coverity-ID: 140
> Signed-off-by: Gustavo A. R. Silva 
> ---
> drivers/net/ieee802154/ca8210.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel

Re: [PATCH 1/2] net: phy: Update get_phy_c45_ids for Cortina PHYs

2017-05-23 Thread Florian Fainelli

On 05/23/2017 09:55 AM, Andrew Lunn wrote:
>> The patches mentioned in the commit message add _some_ support for
>> the Cortina PHYs - mainly checking for devices at additional
>> locations. Once they are found, the phy IDs must be read from custom
>> locations.
>  
> As a general principle, we don't add hacks in generic code to handle
> broken devices. We add generic mechanisms to work around the
> brokenness.
> 
> In this case, by using ethernet-phy-id in the device tree, we are
> saying, this PHYs probing is totally borked, but we know it is there,
> at this address. Just load the driver.
> 
> Please try to make ethernet-phy-id work.

What Andrew is suggesting is to leverage the code in
drivers/of/of_mdio.c which does the following:

   is_c45 = of_device_is_compatible(child,
 "ethernet-phy-ieee802.3-c45");

if (!is_c45 && !of_get_phy_id(child, &phy_id))
phy = phy_device_create(mdio, addr, phy_id, 0, NULL);
else
phy = get_phy_device(mdio, addr, is_c45);
if (IS_ERR(phy))
return;

If you know the PHY ID, and you did put it in the PHY node's compatible
string (in the format that of_get_phy_id() expects it to, and you also
did not add "ethernet-phy-ieee802.3-c45") then the PHY library will
directly create the PHY device, with the designated ID, at the specific
address.

While this works for clause 22 PHYs, I don't know if it also does for
clause 45 PHYs, but as Andrew is suggesting, I would be more inclined
into making this scheme work for all types (22 or 45) PHYs, rather than
hacking the core code that tries to identify devices in packages.
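
For reference, the compatible string format in question is, as far as the device tree bindings go, "ethernet-phy-idAAAA.BBBB" with two groups of four hex digits. The sketch below is a userspace approximation of that parsing and not the kernel's of_get_phy_id() itself; the helper name parse_phy_id_compat() and the sample ID used to exercise it are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: extract a 32-bit PHY ID from a compatible string of
 * the form "ethernet-phy-idAAAA.BBBB" (two groups of up to four hex digits).
 * Returns 0 on success, -1 if the string does not match.
 */
static int parse_phy_id_compat(const char *compat, uint32_t *phy_id)
{
	unsigned int upper, lower;

	if (sscanf(compat, "ethernet-phy-id%4x.%4x", &upper, &lower) != 2)
		return -1;

	/* Upper group becomes bits 31..16, lower group bits 15..0. */
	*phy_id = ((upper & 0xffff) << 16) | (lower & 0xffff);
	return 0;
}
```

A string such as "ethernet-phy-ieee802.3-c45" does not match the pattern, so in this model it falls through to the normal probing path.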

Can you give it a spin?
-- 
Florian

[PATCH] brcmfmac: Fix kernel oops on resume when request firmware fails.

2017-05-23 Thread Enric Balletbo i Serra

When request firmware fails, brcmf_ops_sdio_remove is being called and
brcmf_bus freed. In such circumstances if you do a suspend/resume cycle
the kernel hangs on resume due to a NULL pointer dereference in the resume
function.

Steps to reproduce the problem:
 - modprobe brcmfmac without the firmware
 brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac4354-sdio.bin
 failed with error -2
 - do a suspend/resume cycle (echo mem > /sys/power/state)

Protect against the NULL pointer dereference by checking if dev_get_drvdata
returned a valid pointer.

Signed-off-by: Enric Balletbo i Serra 
---
I'm not sure if this is the correct way to fix this but at least it
prevents the kernel from hanging. On one side I'm not sure why suspend/resume
functions are called in such a case and why the device is not removed from
the bus; on the other side I saw that other drivers only unregister
from sdio when the driver is removed, so I suppose this is the normal behavior.

Cheers,
 Enric

 drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
index 9b970dc..aa0e7c2 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/bcmsdh.c
@@ -1274,14 +1274,16 @@ static int brcmf_ops_sdio_suspend(struct device *dev)
 static int brcmf_ops_sdio_resume(struct device *dev)
 {
struct brcmf_bus *bus_if = dev_get_drvdata(dev);
-   struct brcmf_sdio_dev *sdiodev = bus_if->bus_priv.sdio;
struct sdio_func *func = container_of(dev, struct sdio_func, dev);
 
brcmf_dbg(SDIO, "Enter: F%d\n", func->num);
if (func->num != SDIO_FUNC_2)
return 0;
 
-   brcmf_sdiod_freezer_off(sdiodev);
+   if (!bus_if)
+   return 0;
+
+   brcmf_sdiod_freezer_off(bus_if->bus_priv.sdio);
return 0;
 }
 
@@ -1319,4 +1321,3 @@ void brcmf_sdio_exit(void)
 
sdio_unregister_driver(&brcmf_sdmmc_driver);
 }
-
-- 
2.9.3

Re: [PATCH net v2 1/2] net: ieee802154: remove explicit set skb->sk

2017-05-23 Thread Marcel Holtmann

Hi Lin,

> Explicitly setting skb->sk is needless; sock_alloc_send_skb() already sets it.
> 
> Signed-off-by: Lin Zhang 
> Acked-by: Stefan Schmidt 
> ---
> changelog:
> 
> v1 -> v2:
>* split v1 into two patches, per Stefan Schmidt.
> 
> Thanks to Stefan Schmidt for reviewing !
> ---
> net/ieee802154/socket.c | 2 --
> 1 file changed, 2 deletions(-)

patch has been applied to bluetooth-next tree.

Regards

Marcel

Re: [PATCH net v2 2/2] net: ieee802154: fix net_device reference release too early

2017-05-23 Thread Marcel Holtmann

Hi Lin,

> This patch fixes the kernel oops that occurs when the net_device reference
> is released too early. In function raw_sendmsg() (I think dgram_sendmsg()
> has the same problem), there is a race condition between dev_put() and
> dev_queue_xmit(): if the device is gone, dev_queue_xmit() may see
> an illegal net_device pointer.
>
> My test kernel is 3.13.0-32 and, because I do not have a real 802154
> device, I changed the lowpan_newlink function to this:
> 
>/* find and hold real wpan device */
>real_dev = dev_get_by_index(src_net, nla_get_u32(tb[IFLA_LINK]));
>if (!real_dev)
>return -ENODEV;
> //  if (real_dev->type != ARPHRD_IEEE802154) {
> //  dev_put(real_dev);
> //  return -EINVAL;
> //  }
>lowpan_dev_info(dev)->real_dev = real_dev;
>lowpan_dev_info(dev)->fragment_tag = 0;
>mutex_init(&lowpan_dev_info(dev)->dev_list_mtx);
> 
> Also, in order to simulate preemption, I changed the raw_sendmsg function
> to this:
> 
>skb->dev = dev;
>skb->sk  = sk;
>skb->protocol = htons(ETH_P_IEEE802154);
>dev_put(dev);
>//simulate preempt
>schedule_timeout_uninterruptible(30 * HZ);
>err = dev_queue_xmit(skb);
>if (err > 0)
>err = net_xmit_errno(err);
> 
> and this is my userspace test code named test_send_data:
> 
> #include <errno.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/socket.h>
>
> int main(int argc, char **argv)
> {
>char buf[127];
>int sockfd;
>sockfd = socket(AF_IEEE802154, SOCK_RAW, 0);
>if (sockfd < 0) {
>printf("create sockfd error: %s\n", strerror(errno));
>return -1;
>}
>send(sockfd, buf, sizeof(buf), 0);
>return 0;
> }
> 
> 
> This is my test case:
> 
> root@zhanglin-x-computer:~/develop/802154# uname -a
> Linux zhanglin-x-computer 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15
> 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
> root@zhanglin-x-computer:~/develop/802154# ip link add link eth0 name
> lowpan0 type lowpan
> root@zhanglin-x-computer:~/develop/802154#
> //keep the lowpan0 device down
> root@zhanglin-x-computer:~/develop/802154# ./test_send_data &
> //wait a while
> root@zhanglin-x-computer:~/develop/802154# ip link del link dev lowpan0
> //the device is gone
> //oops
> [381.303307] general protection fault:  [#1]SMP
> [381.303407] Modules linked in: af_802154 6lowpan bnep rfcomm
> bluetooth nls_iso8859_1 snd_hda_codec_hdmi snd_hda_codec_realtek
> rts5139(C) snd_hda_intel
> snd_had_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
> snd_seq_midi_event snd_rawmidi snd_req intel_rapl snd_seq_device
> coretemp i915 kvm_intel
> kvm snd_timer snd crct10dif_pclmul crc32_pclmul ghash_clmulni_intel
> cypted drm_kms_helper drm i2c_algo_bit soundcore video mac_hid
> parport_pc ppdev ip parport hid_generic
> usbhid hid ahci r8169 mii libahdi
> [381.304286] CPU:1 PID: 2524 Commm: 1 Tainted: G C 0 3.13.0-32-generic
> [381.304409] Hardware name: Haier Haier DT Computer/Haier DT Codputer,
> BIOS FIBT19H02_X64 06/09/2014
> [381.304546] tasks: 96965fc0 ti: B0013779c000 task.ti:
> B8013779c000
> [381.304659] RIP: 0010:[] []
> __dev_queue_ximt+0x61/0x500
> [381.304798] RSP: 0018:B8013779dca0 EFLAGS: 00010202
> [381.304880] RAX: 272b031d57565351 RBX:  RCX: 8800968f1a00
> [381.304987] RDX:  RSI:  RDI: 8800968f1a00
> [381.305095] RBP: 8e013773dce0 R08: 0266 R09: 0004
> [381.305202] R10: 0004 R11: 0005 R12: 88013902e000
> [381.305310] R13: 007f R14: 007f R15: 8800968f1a00
> [381.305418] FS:  7fc57f50f740() GS: 88013fc8()
> knlGS: 
> [381.305540] CS:  0010 DS:  ES:  CR0: 8005003b
> [381.305627] CR2: 7fad0841c000 CR3: 0001368dd000 CR4: 001007e0
> [361.905734] Stack:
> [381.305768]  002052d0 3facb30a 88013779dcc0
> 880137764000
> [381.305898]  88013779de70 007f 007f
> 88013902e000
> [381.306026]  88013779dcf0 81622490 88013779dd39
> a03af9f1
> [381.306155] Call Trace:
> [381.306202]  [] dev_queue_xmit+0x10/0x20
> [381.306294]  [] raw_sendmsg+0x1b1/0x270 [af_802154]
> [381.306396]  [] ieee802154_sock_sendmsg+0x14/0x20 
> [af_802154]
> [381.306512]  [] sock_sendmsg+0x8b/0xc0
> [381.306600]  [] ? __d_alloc+0x25/0x180
> [381.306687]  [] ? kmem_cache_alloc_trace+0x1c6/0x1f0
> [381.306791]  [] SYSC_sendto+0x121/0x1c0
> [381.306878]  [] ? vtime_account_user+x54/0x60
> [381.306975]  [] ? syscall_trace_enter+0x145/0x250
> [381.307073]  [] SyS_sendto+0xe/0x10
> [381.307156]  [] tracesys+0xe1/0xe6
> [381.307233] Code: c6 a1 a4 ff 41 8b 57 78 49 8b 47 20 85 d2 48 8b 80
> 78 07 00 00 75 21 49 8b 57 18 48 85 d2 74 18 48 85 c0 74 13 8b 92 ac
> 01 00 00 <3b> 50 10 73 08 8b 44 90 14 41 89 47 78 41 f6 84 24 d5 00 00
> 00
> [381.307801] RIP [] _dev_

Re: [PATCH net-next] sctp: no need to check asoc_id before calling sctp_assoc_set_id

2017-05-23 Thread Marcelo Ricardo Leitner

Hi,

On Tue, May 23, 2017 at 02:30:32PM +0800, Xin Long wrote:
> sctp_assoc_set_id has already done the asoc_id check in the beginning
  s//does/
Very important to use the right tense here as otherwise it gives the
impression that the check was already made, which is not true.

> when processing dupcook, no need to do the same check before calling
s/^^^/dupcookie/

other than this, LGTM, including the comment update.

  Marcelo

> it.
> 
> Signed-off-by: Xin Long 
> ---
>  net/sctp/associola.c | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/net/sctp/associola.c b/net/sctp/associola.c
> index a9708da..ce2a3ec 100644
> --- a/net/sctp/associola.c
> +++ b/net/sctp/associola.c
> @@ -1181,12 +1181,8 @@ void sctp_assoc_update(struct sctp_association *asoc,
>   new->stream = NULL;
>   }
>  
> - if (!asoc->assoc_id) {
> - /* get a new association id since we don't have one
> -  * yet.
> -  */
> - sctp_assoc_set_id(asoc, GFP_ATOMIC);
> - }
> + /* get a new association id if we don't have one yet. */
> + sctp_assoc_set_id(asoc, GFP_ATOMIC);
>   }
>  
>   /* SCTP-AUTH: Save the peer parameters from the new associations
> -- 
> 2.1.0
> 

Re: 4.12-RC2 BUG: scheduling while atomic: irq/47-iwlwifi

2017-05-23 Thread Sander Eikelenboom

On 22/05/17 23:02, Arend Van Spriel wrote:
> 
> 
> On 22-5-2017 14:09, Arend van Spriel wrote:
>> On 5/22/2017 12:57 PM, Johannes Berg wrote:
>>> On Mon, 2017-05-22 at 12:36 +0200, Sander Eikelenboom wrote:
>>>> Hi,
>>>>
>>>> I encountered this splat with 4.12-RC2.
>>>
>>> Ugh, yeah, I should've seen that in the review.
>>>
>>> Arend, please take a look at this. cfg80211_sched_scan_results() cannot
>>> sleep, so you can't rtnl_lock() in there. Looks like you can just rely
>>> on RCU though?
>>
>> I see. I think you are right on RCU. Don't have the code in front of me
>> now, but I think the lookup has an ASSERT_RTNL. Will look into it after
>> my monday meeting :-p
> 
> I realized I have a laptop lying around with intel 3160 wifi chip and
> tried to reproduce the issue. Did not run into the splat running
> 4.12-rc1 from wireless-drivers-next repo. I did not get the email from
> Sander so I don't know any details.
> 
> Here is what I changed based on the info Johannes provided. Can you
> please check if this gets rid of the splat and let me know.

Hi Arend,

I ran your patch today, so far no issues.

--
Sander


> Regards,
> Arend
> ---
> diff --git a/net/wireless/scan.c b/net/wireless/scan.c
> index 14d5f0c..04833bb 100644
> --- a/net/wireless/scan.c
> +++ b/net/wireless/scan.c
> @@ -322,9 +322,7 @@ static void cfg80211_del_sched_scan_req(struct
> cfg80211_regi
>  {
> struct cfg80211_sched_scan_request *pos;
> 
> -   ASSERT_RTNL();
> -
> -   list_for_each_entry(pos, &rdev->sched_scan_req_list, list) {
> +   list_for_each_entry_rcu(pos, &rdev->sched_scan_req_list, list) {
> if (pos->reqid == reqid)
> return pos;
> }
> @@ -398,13 +396,13 @@ void cfg80211_sched_scan_results(struct wiphy
> *wiphy, u64
> trace_cfg80211_sched_scan_results(wiphy, reqid);
> /* ignore if we're not scanning */
> 
> -   rtnl_lock();
> +   rcu_read_lock();
> request = cfg80211_find_sched_scan_req(rdev, reqid);
> if (request) {
> request->report_results = true;
> queue_work(cfg80211_wq, &rdev->sched_scan_res_wk);
> }
> -   rtnl_unlock();
> +   rcu_read_unlock();
>  }
>  EXPORT_SYMBOL(cfg80211_sched_scan_results);
> 
>

Re: Alignment in BPF verifier

2017-05-23 Thread Edward Cree

Another issue: it looks like the min/max_value handling for subtraction is
 bogus.  In adjust_reg_min_max_vals() we have
if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
dst_reg->min_value -= min_val;
if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
dst_reg->max_value -= max_val;
 where min_val and max_val refer to the src_reg.
But surely they should be used the other way round; if (say) 2 <= R1 <= 6
 and 1 <= R2 <= 4, then this will claim 1 <= (R1 - R2) <= 2, whereas really
 (R1 - R2) could be anything from -2 to 5.
This also means that the code just above the switch,
if (min_val == BPF_REGISTER_MIN_RANGE)
dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
if (max_val == BPF_REGISTER_MAX_RANGE)
dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
 is wrong, since e.g. subtracting MAX_RANGE needs to blow our min_value,
 not our max_value.
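
The arithmetic above can be checked with a small userspace model. This is only a sketch: struct range, sub_buggy() and sub_fixed() are hypothetical names, and the model ignores the BPF_REGISTER_MIN_RANGE / BPF_REGISTER_MAX_RANGE sentinels and any overflow handling the verifier would need.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a register's tracked value range. */
struct range {
	int64_t min;
	int64_t max;
};

/* What the quoted code effectively computes: min - min and max - max. */
static struct range sub_buggy(struct range dst, struct range src)
{
	struct range out = { dst.min - src.min, dst.max - src.max };
	return out;
}

/* Interval subtraction: the smallest result pairs dst.min with src.max. */
static struct range sub_fixed(struct range dst, struct range src)
{
	struct range out = { dst.min - src.max, dst.max - src.min };
	assert(out.min <= out.max);	/* holds whenever both inputs are well-formed */
	return out;
}
```

For 2 <= R1 <= 6 and 1 <= R2 <= 4, sub_buggy() yields [1, 2] while sub_fixed() yields [-2, 5], matching the example in the text.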

-Ed

[PATCH V3 net 2/3] be2net: Fix offload features for Q-in-Q packets

2017-05-23 Thread Vladislav Yasevich

At least some of the be2net cards do not seem to be capable
of performing checksum offload computations on Q-in-Q packets.
In these cases, the received checksum on the remote is invalid
and TCP SYN packets are dropped.

This patch adds a call to vlan_features_check() so that unsupported
acceleration features are disabled on Q-in-Q tagged traffic.

CC: Sathya Perla 
CC: Ajit Khaparde 
CC: Sriharsha Basavapatna 
CC: Somnath Kotur 
Signed-off-by: Vladislav Yasevich 
---
 drivers/net/ethernet/emulex/benet/be_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index f3a09ab..4eee18c 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5078,9 +5078,11 @@ static netdev_features_t be_features_check(struct sk_buff *skb,
struct be_adapter *adapter = netdev_priv(dev);
u8 l4_hdr = 0;
 
-   /* The code below restricts offload features for some tunneled packets.
+   /* The code below restricts offload features for some tunneled and
+* Q-in-Q packets.
 * Offload features for normal (non tunnel) packets are unchanged.
 */
+   features = vlan_features_check(skb, features);
if (!skb->encapsulation ||
!(adapter->flags & BE_FLAGS_VXLAN_OFFLOADS))
return features;
-- 
2.7.4

[PATCH V3 net 3/3] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans

2017-05-23 Thread Vladislav Yasevich

Since virtio does not provide its own ndo_features_check handler,
TSO, and now checksum offload, are disabled for stacked vlans.
Re-enable the support and let the host take care of it.  This
restores/improves Guest-to-Guest performance over Q-in-Q vlans.

Acked-by: Jason Wang 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Vladislav Yasevich 
---
 drivers/net/virtio_net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 665627c..ead7a58 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2020,6 +2020,7 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_poll_controller = virtnet_netpoll,
 #endif
.ndo_xdp= virtnet_xdp,
+   .ndo_features_check = passthru_features_check,
 };
 
 static void virtnet_config_changed_work(struct work_struct *work)
-- 
2.7.4

[PATCH V3 net 0/3] Fix checksum issues with Q-in-Q vlans

2017-05-23 Thread Vladislav Yasevich

TCP checksums appear broken on a lot of devices that
advertise NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM.  This problem
became very visible/reproducible with the series merged in
commit afb0bc972b526 ("Merge branch 'stacked_vlan_tso'").

In particular, the issue appeared consistently on bnx2 and be2net
drivers (not all drivers were tested).

This short series corrects this by disabling checksum offload
support on packets sent through Q-in-Q vlans if the underlying HW only
enables IP specific checksum features.  We currently 'assume' that
any drivers setting NETIF_F_HW_CSUM can correctly pass checksum offsets
to HW.  It is up to individual drivers to enable it properly through
ndo_features_check if they have some support for Q-in-Q vlans.

Additionally, be2net driver was fixed to make the proper call.

While looking at the drivers, it was also found that virtio-net ended
up disabling accelerations, which is unnecessary.  

V3: Fixed checkpatch errors.

V2: Instead of disabling checksumming for all devices, only devices using
NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM are now affected by this change.
For drivers using NETIF_F_HW_CSUM, we will continue to use checksum
offloading.  If any drivers are found to be broken, they would need
to be fixed individually.

Vladislav Yasevich (3):
  vlan: Fix tcp checksum offloads in Q-in-Q vlans
  be2net: Fix offload features for Q-in-Q packets
  virtio-net: enable TSO/checksum offloads for Q-in-Q vlans

 drivers/net/ethernet/emulex/benet/be_main.c |  4 +++-
 drivers/net/virtio_net.c|  1 +
 include/linux/if_vlan.h | 10 --
 3 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.7.4

[PATCH V3 net 1/3] vlan: Fix tcp checksum offloads in Q-in-Q vlans

2017-05-23 Thread Vladislav Yasevich

It appears that TCP checksum offloading has been broken for
Q-in-Q vlans.  The behavior was exacerbated by the
series
commit afb0bc972b52 ("Merge branch 'stacked_vlan_tso'")
that enabled acceleration features on stacked vlans.

However, even without that series, it is possible to trigger
this issue.  It just requires a lot more specialized configuration.

The root cause is the interaction between how
netdev_intersect_features() works, the features actually set on
the vlan devices and HW having the ability to run checksum with
longer headers.

The issue starts when netdev_intersect_features() replaces
NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
if the HW advertises IP|IPV6 specific checksums.  This happens
for tagged and multi-tagged packets.   However, HW that enables
IP|IPV6 checksum offloading doesn't guarantee that packets with
arbitrarily long headers can be checksummed.

This patch disables IP|IPV6 checksums on the packet for multi-tagged
packets.

CC: Toshiaki Makita 
CC: Michal Kubecek 
Signed-off-by: Vladislav Yasevich 
---
 include/linux/if_vlan.h | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 8d5fcd6..6686d0f 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -614,14 +614,16 @@ static inline bool skb_vlan_tagged_multi(const struct sk_buff *skb)
 static inline netdev_features_t vlan_features_check(const struct sk_buff *skb,
netdev_features_t features)
 {
-   if (skb_vlan_tagged_multi(skb))
-   features = netdev_intersect_features(features,
-NETIF_F_SG |
-NETIF_F_HIGHDMA |
-NETIF_F_FRAGLIST |
-NETIF_F_HW_CSUM |
-NETIF_F_HW_VLAN_CTAG_TX |
-NETIF_F_HW_VLAN_STAG_TX);
+   if (skb_vlan_tagged_multi(skb)) {
+   /* In the case of multi-tagged packets, use a direct mask
+* instead of using netdev_intersect_features(), to make
+* sure that only devices supporting NETIF_F_HW_CSUM will
+* have checksum offloading support.
+*/
+   features &= NETIF_F_SG | NETIF_F_HIGHDMA | NETIF_F_HW_CSUM |
+   NETIF_F_FRAGLIST | NETIF_F_HW_VLAN_CTAG_TX |
+   NETIF_F_HW_VLAN_STAG_TX;
+   }
 
return features;
 }
-- 
2.7.4
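
The difference between the two masking strategies in the patch above can be
illustrated with a small user-space model.  This is only a sketch: the F_*
bit values and the function names are made up for the example (the kernel's
NETIF_F_* bits are different), and intersect_features() is a simplified
model of the widening behaviour the commit message describes, not a copy of
the kernel helper.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t features_t;

/* Hypothetical bit positions, chosen only for this example. */
#define F_SG        (1ULL << 0)
#define F_IP_CSUM   (1ULL << 1)
#define F_HW_CSUM   (1ULL << 2)
#define F_IPV6_CSUM (1ULL << 3)

/*
 * Model of the "intersect" behaviour: when only one side advertises the
 * generic HW_CSUM bit, that side is widened with the protocol-specific
 * checksum bits before the AND, so IP|IPV6 checksum bits on the other
 * side survive the intersection.
 */
features_t intersect_features(features_t f1, features_t f2)
{
	if ((f1 ^ f2) & F_HW_CSUM) {
		if (f1 & F_HW_CSUM)
			f1 |= F_IP_CSUM | F_IPV6_CSUM;
		else
			f2 |= F_IP_CSUM | F_IPV6_CSUM;
	}
	return f1 & f2;
}

/*
 * Model of the direct mask used for multi-tagged packets: a plain AND,
 * so only a device that really sets HW_CSUM keeps checksum offload.
 */
features_t direct_mask(features_t features, features_t mask)
{
	/* The mask is expected to carry no protocol-specific checksum bits. */
	assert((mask & (F_IP_CSUM | F_IPV6_CSUM)) == 0);
	return features & mask;
}
```

With a device that only advertises F_SG | F_IP_CSUM and a mask of
F_SG | F_HW_CSUM, the intersect model keeps F_IP_CSUM while the direct mask
leaves only F_SG, which is the effect the patch is after for Q-in-Q packets.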

Re: [PATCH net v2] ipv6: sr: fix user space compilation error with old glibc

2017-05-23 Thread Daniel Borkmann


On 05/15/2017 04:21 PM, David Lebrun wrote:
> On 05/15/2017 04:09 PM, David Miller wrote:
>> Please, no.
>>
>> The reason we put together a method by which glibc and the kernel can
>> stay out of each other's way in header files is exactly so that we
>> don't need ifdefs that conditionally do netinet/in.h vs. using the
>> kernel header.
>>
>> There are more than a dozen other UAPI headers which make use of
>> linux/in6.h and none of them jump through hoops like what is being
>> proposed here, and that's on purpose.
>>
>> So special casing this one header is really not the way to go.
>
> Mmmh it's true that special casing in kernel headers for a user space
> issue is not the best way to go.. I'll find a way to solve this in user
> space only.
>
> Sorry about the lousy patch.


Any new outcomes so far?

Thanks,
Daniel

Re: [PATCH net-next 1/2] perf, bpf: add support for HW_CACHE and RAW events

2017-05-23 Thread Alexei Starovoitov


On 5/23/17 9:31 AM, Peter Zijlstra wrote:
> On Tue, May 23, 2017 at 07:38:08AM -0700, Alexei Starovoitov wrote:
>> On 5/23/17 12:42 AM, Peter Zijlstra wrote:
>>> On Mon, May 22, 2017 at 03:48:39PM -0700, Alexei Starovoitov wrote:
>>>> From: Teng Qin
>>>>
>>>> This commit adds support for attaching BPF programs to RAW and HW_CACHE
>>>> type events, and support for reading HW_CACHE type event counters in BPF
>>>> programs. Existing code logic already supports them, so this commit
>>>> just updates the enum value checks.
>>>
>>> So what I'm missing is why they were not supported previously, and what
>>> changed to allow it now.
>>
>> that code path simply wasn't tested previously. Nothing changed on
>> bpf side and on perf side.
>> Why it wasn't added on day one? There was no demand. Now people
>> use bpf more and more and few folks got confused that these types
>> of perf events were not supported, hence we're adding it.
>
> OK. Is there anything stopping people from wanting to use the dynamic
> types, as found in:
>
>   /sys/bus/event_source/devices/*/type
>
> ?
>
> In which case, do we want something like this instead?
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 971f7259108f..4aa5f3011cf8 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8063,12 +8063,8 @@ static int perf_event_set_bpf_prog(struct perf_event *event, u32 prog_fd)
>  bool is_kprobe, is_tracepoint;
>  struct bpf_prog *prog;
>
> -   if (event->attr.type == PERF_TYPE_HARDWARE ||
> -   event->attr.type == PERF_TYPE_SOFTWARE)
> -   return perf_event_set_bpf_handler(event, prog_fd);
> -
>  if (event->attr.type != PERF_TYPE_TRACEPOINT)
> -   return -EINVAL;
> +   return perf_event_set_bpf_handler(event, prog_fd);

Good point. We were actually looking at how to deal with msr and cstate
events. That should indeed address it.
Will respin.
Thanks for the feedback!
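
The change in dispatch logic discussed above can be summarised with a small
stand-alone model.  This is a hedged sketch only: the EV_* constants, the
route enum and both function names are invented for illustration, and the
model covers nothing beyond the type check visible in the quoted diff.

```c
/* Illustrative event type numbers; any other value stands for a dynamic type. */
enum ev_type {
	EV_HARDWARE   = 0,
	EV_SOFTWARE   = 1,
	EV_TRACEPOINT = 2,
	EV_HW_CACHE   = 3,
	EV_RAW        = 4
};

enum route {
	ROUTE_BPF_HANDLER,	/* program attached as an overflow handler */
	ROUTE_TRACEPOINT,	/* falls through to the tracepoint/kprobe path */
	ROUTE_EINVAL		/* request rejected */
};

/* Before: only hardware and software events reach the handler path. */
enum route route_before(unsigned int type)
{
	if (type == EV_HARDWARE || type == EV_SOFTWARE)
		return ROUTE_BPF_HANDLER;
	if (type != EV_TRACEPOINT)
		return ROUTE_EINVAL;
	return ROUTE_TRACEPOINT;
}

/* Suggested: everything that is not a tracepoint reaches the handler path. */
enum route route_suggested(unsigned int type)
{
	if (type != EV_TRACEPOINT)
		return ROUTE_BPF_HANDLER;
	return ROUTE_TRACEPOINT;
}
```

Under this model a RAW event, a HW_CACHE event and a dynamic type number
(for example one read from the sysfs "type" file) are all rejected by
route_before() and accepted by route_suggested().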

Re: [PATCH 1/2] net: phy: Update get_phy_c45_ids for Cortina PHYs

2017-05-23 Thread Andrew Lunn

> The patches mentioned in the commit message add _some_ support for
> the Cortina PHYs - mainly checking for devices at additional
> locations. Once they are found, the phy IDs must be read from custom
> locations.
 
As a general principle, we don't add hacks in generic code to handle
broken devices. We add generic mechanisms to work around the
brokenness.

In this case, by using ethernet-phy-id in the device tree, we are
saying, this PHY's probing is totally borked, but we know it is there,
at this address. Just load the driver.

Please try to make ethernet-phy-id work.

   Andrew

[PATCH V2 net 3/3] virtio-net: enable TSO/checksum offloads for Q-in-Q vlans

2017-05-23 Thread Vladislav Yasevich

Since virtio does not provide its own ndo_features_check handler,
TSO, and now checksum offload, are disabled for stacked vlans.
Re-enable the support and let the host take care of it.  This
restores/improves Guest-to-Guest performance over Q-in-Q vlans.

Acked-by: Jason Wang 
Acked-by: Michael S. Tsirkin 
Signed-off-by: Vladislav Yasevich 
---
 drivers/net/virtio_net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 665627c..ead7a58 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2020,6 +2020,7 @@ static const struct net_device_ops virtnet_netdev = {
.ndo_poll_controller = virtnet_netpoll,
 #endif
.ndo_xdp= virtnet_xdp,
+   .ndo_features_check = passthru_features_check,
 };
 
 static void virtnet_config_changed_work(struct work_struct *work)
-- 
2.7.4

[PATCH V2 net 2/3] be2net: Fix offload features for Q-in-Q packets

2017-05-23 Thread Vladislav Yasevich

At least some of the be2net cards do not seem to be capable
of performing checksum offload computations on Q-in-Q packets.
In these cases, the received checksum on the remote is invalid
and TCP syn packets are dropped.

This patch adds a call to check disabled acceleration features
on Q-in-Q tagged traffic.

CC: Sathya Perla 
CC: Ajit Khaparde 
CC: Somnath Kotur 
Signed-off-by: Vladislav Yasevich 
---
 drivers/net/ethernet/emulex/benet/be_main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/emulex/benet/be_main.c b/drivers/net/ethernet/emulex/benet/be_main.c
index f3a09ab..4eee18c 100644
--- a/drivers/net/ethernet/emulex/benet/be_main.c
+++ b/drivers/net/ethernet/emulex/benet/be_main.c
@@ -5078,9 +5078,11 @@ static netdev_features_t be_features_check(struct sk_buff *skb,
struct be_adapter *adapter = netdev_priv(dev);
u8 l4_hdr = 0;
 
-   /* The code below restricts offload features for some tunneled packets.
+   /* The code below restricts offload features for some tunneled and
+* Q-in-Q packets.
 * Offload features for normal (non tunnel) packets are unchanged.
 */
+   features = vlan_features_check(skb, features);
if (!skb->encapsulation ||
!(adapter->flags & BE_FLAGS_VXLAN_OFFLOADS))
return features;
-- 
2.7.4

[PATCH V2 net 0/3] Fix checksum issues with Q-in-Q vlans

2017-05-23 Thread Vladislav Yasevich

TCP checksums appear broken on a lot of devices that
advertise NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM.  This problem
becomes very visible/reproducible since the series
commit afb0bc972b526 ("Merge branch 'stacked_vlan_tso'").

In particular, the issue appeared consistently on bnx2 and be2net
drivers (not all drivers were tested).

This short series corrects this by disabling checksum offload
support on packets sent through Q-in-Q vlans if the underlying HW only
enables IP specific checksum features.  We currently 'assume' that
any drivers setting NETIF_F_HW_CSUM can correctly pass checksum offsets
to HW.  It is up to individual drivers to enable it properly through
ndo_features_check if they have some support for Q-in-Q vlans.

Additionally, be2net driver was fixed to make the proper call.

While looking at the drivers, it was also found that virtio-net ended
up disabling accelerations, which is unnecessary.  

V2: Instead of disabling checksumming for all devices, only devices using
NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM are now affected by this change.
For drivers using NETIF_F_HW_CSUM, we will continue to use checksum
offloading.  If any drivers are found to be broken, they would need
to be fixed individually.

Vladislav Yasevich (3):
  vlan: Fix tcp checksum offloads in Q-in-Q vlans
  be2net: Fix offload features for Q-in-Q packets
  virtio-net: enable TSO/checksum offloads for Q-in-Q vlans

 drivers/net/ethernet/emulex/benet/be_main.c |  4 +++-
 drivers/net/virtio_net.c|  1 +
 include/linux/if_vlan.h | 10 --
 3 files changed, 12 insertions(+), 3 deletions(-)

-- 
2.7.4

[PATCH V2 net 1/3] vlan: Fix tcp checksum offloads in Q-in-Q vlans

2017-05-23 Thread Vladislav Yasevich

It appears that TCP checksum offloading has been broken for
Q-in-Q vlans.  The behavior was exacerbated by the
series that enabled acceleration features on stacked
vlans (commit afb0bc972b526 "Merge branch 'stacked_vlan_tso'").

However, even without that series, it is possible to trigger
this issue.  It just requires a lot more specialized configuration.

The root cause is the interaction between how
netdev_intersect_features() works, the features actually set on
the vlan devices and HW having the ability to run checksum with
longer headers.

The issue starts when netdev_intersect_features() replaces
NETIF_F_HW_CSUM with a combination of NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM,
if the HW advertises IP|IPV6 specific checksums.  This happens
for tagged and multi-tagged packets.   However, HW that enables
IP|IPV6 checksum offloading doesn't guarantee that packets with
arbitrarily long headers can be checksummed.

This patch disables IP|IPV6 checksums on the packet for multi-tagged
packets.

CC: Toshiaki Makita 
CC: Michal Kubecek 
Signed-off-by: Vladislav Yasevich 
---
 include/linux/if_vlan.h | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 8d5fcd6..6686d0f 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -614,14 +614,16 @@ static inline bool skb_vlan_tagged_multi(const struct sk_buff *skb)
 static inline netdev_features_t vlan_features_check(const struct sk_buff *skb,
netdev_features_t features)
 {
-   if (skb_vlan_tagged_multi(skb))
-   features = netdev_intersect_features(features,
-NETIF_F_SG |
-NETIF_F_HIGHDMA |
-NETIF_F_FRAGLIST |
-NETIF_F_HW_CSUM |
-NETIF_F_HW_VLAN_CTAG_TX |
-NETIF_F_HW_VLAN_STAG_TX);
+   if (skb_vlan_tagged_multi(skb)) {
+   /* In the case of multi-tagged packets, use a direct mask
+* instead of using netdev_intersect_features(), to make
+* sure that only devices supporting NETIF_F_HW_CSUM will
+* have checksum offloading support.  
+*/
+   features &= NETIF_F_SG | NETIF_F_HIGHDMA | NETIF_F_HW_CSUM |
+   NETIF_F_FRAGLIST | NETIF_F_HW_VLAN_CTAG_TX |
+   NETIF_F_HW_VLAN_STAG_TX;
+   }
 
return features;
 }
-- 
2.7.4

[Patch net-next] net_sched: only create filter chains for new filters/actions

2017-05-23 Thread Cong Wang

tcf_chain_get() always creates a new filter chain if not found
in existing ones. This is totally unnecessary when we get or
delete filters; a new chain should only be created for new filters
(or new actions).

Fixes: 5bc1701881e3 ("net: sched: introduce multichain support for filters")
Cc: Jamal Hadi Salim 
Cc: Jiri Pirko 
Signed-off-by: Cong Wang 
---
 include/net/pkt_cls.h |  3 ++-
 net/sched/act_api.c   |  2 +-
 net/sched/cls_api.c   | 13 +
 3 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 2c213a6..f776229 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -18,7 +18,8 @@ int register_tcf_proto_ops(struct tcf_proto_ops *ops);
 int unregister_tcf_proto_ops(struct tcf_proto_ops *ops);
 
 #ifdef CONFIG_NET_CLS
-struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index);
+struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
+   bool create);
 void tcf_chain_put(struct tcf_chain *chain);
 int tcf_block_get(struct tcf_block **p_block,
  struct tcf_proto __rcu **p_filter_chain);
diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index 0ecf2a8..aed6cf2 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -34,7 +34,7 @@ static int tcf_action_goto_chain_init(struct tc_action *a, struct tcf_proto *tp)
 
if (!tp)
return -EINVAL;
-   a->goto_chain = tcf_chain_get(tp->chain->block, chain_index);
+   a->goto_chain = tcf_chain_get(tp->chain->block, chain_index, true);
if (!a->goto_chain)
return -ENOMEM;
return 0;
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 01a8b8b..23d2236 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -220,7 +220,8 @@ static void tcf_chain_destroy(struct tcf_chain *chain)
kfree(chain);
 }
 
-struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index)
+struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index,
+   bool create)
 {
struct tcf_chain *chain;
 
@@ -230,7 +231,10 @@ struct tcf_chain *tcf_chain_get(struct tcf_block *block, u32 chain_index)
return chain;
}
}
-   return tcf_chain_create(block, chain_index);
+   if (create)
+   return tcf_chain_create(block, chain_index);
+   else
+   return NULL;
 }
 EXPORT_SYMBOL(tcf_chain_get);
 
@@ -509,9 +513,10 @@ static int tc_ctl_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
err = -EINVAL;
goto errout;
}
-   chain = tcf_chain_get(block, chain_index);
+   chain = tcf_chain_get(block, chain_index,
+ n->nlmsg_type == RTM_NEWTFILTER);
if (!chain) {
-   err = -ENOMEM;
+   err = n->nlmsg_type == RTM_NEWTFILTER ? -ENOMEM : -EINVAL;
goto errout;
}
 
-- 
2.5.5
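
The lookup-or-create behaviour introduced by the new "create" argument can
be sketched outside the kernel as follows.  This is a minimal model under
stated assumptions: struct block, struct chain and chain_get() are
hypothetical stand-ins written for this example, and the list handling and
reference counting are simplified compared to net/sched/cls_api.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct chain {
	uint32_t index;
	unsigned int refcnt;
	struct chain *next;
};

struct block {
	struct chain *chains;	/* singly linked list of existing chains */
};

/*
 * Return the chain with the given index.  A missing chain is allocated
 * only when the caller asks for it (new filter or new action); plain
 * get/delete requests receive NULL instead of creating an empty chain.
 */
struct chain *chain_get(struct block *block, uint32_t index, bool create)
{
	struct chain *c;

	assert(block != NULL);

	for (c = block->chains; c; c = c->next) {
		if (c->index == index) {
			c->refcnt++;
			return c;
		}
	}

	if (!create)
		return NULL;

	c = calloc(1, sizeof(*c));
	if (!c)
		return NULL;
	c->index = index;
	c->refcnt = 1;
	c->next = block->chains;
	block->chains = c;
	return c;
}
```

As in the patch, the caller has to tell the two NULL cases apart itself: an
allocation failure when create is true, and "no such chain" when it is
false.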

[patch net-next v2 5/5] mlxsw: spectrum_flower: Add support for tcp flags

2017-05-23 Thread Jiri Pirko

From: Jiri Pirko 

Allow offloading of rules that contain tcp flags within the mask.

Signed-off-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
---
 .../ethernet/mellanox/mlxsw/spectrum_acl_tcam.c|  1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_flower.c  | 29 ++
 2 files changed, 30 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
index 3a24289..61a10f1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_tcam.c
@@ -983,6 +983,7 @@ static const enum mlxsw_afk_element mlxsw_sp_acl_tcam_pattern_ipv4[] = {
MLXSW_AFK_ELEMENT_SRC_L4_PORT,
MLXSW_AFK_ELEMENT_VID,
MLXSW_AFK_ELEMENT_PCP,
+   MLXSW_AFK_ELEMENT_TCP_FLAGS,
 };
 
 static const enum mlxsw_afk_element mlxsw_sp_acl_tcam_pattern_ipv6[] = {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
index f7a8c3c..ed75c6a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
@@ -182,6 +182,32 @@ static int mlxsw_sp_flower_parse_ports(struct mlxsw_sp *mlxsw_sp,
return 0;
 }
 
+static int mlxsw_sp_flower_parse_tcp(struct mlxsw_sp *mlxsw_sp,
+struct mlxsw_sp_acl_rule_info *rulei,
+struct tc_cls_flower_offload *f,
+u8 ip_proto)
+{
+   struct flow_dissector_key_tcp *key, *mask;
+
+   if (!dissector_uses_key(f->dissector, FLOW_DISSECTOR_KEY_TCP))
+   return 0;
+
+   if (ip_proto != IPPROTO_TCP) {
+   dev_err(mlxsw_sp->bus_info->dev, "TCP keys supported only for TCP\n");
+   return -EINVAL;
+   }
+
+   key = skb_flow_dissector_target(f->dissector,
+   FLOW_DISSECTOR_KEY_TCP,
+   f->key);
+   mask = skb_flow_dissector_target(f->dissector,
+FLOW_DISSECTOR_KEY_TCP,
+f->mask);
+   mlxsw_sp_acl_rulei_keymask_u32(rulei, MLXSW_AFK_ELEMENT_TCP_FLAGS,
+  ntohs(key->flags), ntohs(mask->flags));
+   return 0;
+}
+
 static int mlxsw_sp_flower_parse(struct mlxsw_sp *mlxsw_sp,
 struct net_device *dev,
 struct mlxsw_sp_acl_rule_info *rulei,
@@ -290,6 +316,9 @@ static int mlxsw_sp_flower_parse(struct mlxsw_sp *mlxsw_sp,
err = mlxsw_sp_flower_parse_ports(mlxsw_sp, rulei, f, ip_proto);
if (err)
return err;
+   err = mlxsw_sp_flower_parse_tcp(mlxsw_sp, rulei, f, ip_proto);
+   if (err)
+   return err;
 
return mlxsw_sp_flower_parse_actions(mlxsw_sp, dev, rulei, f->exts);
 }
-- 
2.9.3

[patch net-next v2 4/5] mlxsw: spectrum: Add acl block containing tcp flags for ipv4

2017-05-23 Thread Jiri Pirko

From: Jiri Pirko 

Add acl block called "ipv4" which contains tcp flags.

Signed-off-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.h
index af7b7ba..85d5001 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_acl_flex_keys.h
@@ -68,6 +68,11 @@ static struct mlxsw_afk_element_inst mlxsw_sp_afk_element_info_ipv4_dip[] = {
MLXSW_AFK_ELEMENT_INST_U32(SRC_SYS_PORT, 0x0C, 0, 16),
 };
 
+static struct mlxsw_afk_element_inst mlxsw_sp_afk_element_info_ipv4[] = {
+   MLXSW_AFK_ELEMENT_INST_U32(SRC_IP4, 0x00, 0, 32),
+   MLXSW_AFK_ELEMENT_INST_U32(TCP_FLAGS, 0x08, 8, 9), /* TCP_CONTROL+TCP_ECN */
+};
+
 static struct mlxsw_afk_element_inst mlxsw_sp_afk_element_info_ipv4_ex[] = {
MLXSW_AFK_ELEMENT_INST_U32(VID, 0x00, 0, 12),
MLXSW_AFK_ELEMENT_INST_U32(PCP, 0x08, 29, 3),
@@ -102,6 +107,7 @@ static const struct mlxsw_afk_block mlxsw_sp_afk_blocks[] = {
MLXSW_AFK_BLOCK(0x12, mlxsw_sp_afk_element_info_l2_smac_ex),
MLXSW_AFK_BLOCK(0x30, mlxsw_sp_afk_element_info_ipv4_sip),
MLXSW_AFK_BLOCK(0x31, mlxsw_sp_afk_element_info_ipv4_dip),
+   MLXSW_AFK_BLOCK(0x32, mlxsw_sp_afk_element_info_ipv4),
MLXSW_AFK_BLOCK(0x33, mlxsw_sp_afk_element_info_ipv4_ex),
MLXSW_AFK_BLOCK(0x60, mlxsw_sp_afk_element_info_ipv6_dip),
MLXSW_AFK_BLOCK(0x65, mlxsw_sp_afk_element_info_ipv6_ex1),
-- 
2.9.3

[patch net-next v2 2/5] net/sched: flower: add support for matching on tcp flags

2017-05-23 Thread Jiri Pirko

From: Jiri Pirko 

Benefit from the support of tcp flags dissection and allow user to
insert rules matching on tcp flags.

Signed-off-by: Jiri Pirko 
---
 include/uapi/linux/pkt_cls.h |  3 +++
 net/sched/cls_flower.c   | 13 -
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 1b9aa9e..c6e8cf5 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -451,6 +451,9 @@ enum {
TCA_FLOWER_KEY_MPLS_TC, /* u8 - 3 bits */
TCA_FLOWER_KEY_MPLS_LABEL,  /* be32 - 20 bits */
 
+   TCA_FLOWER_KEY_TCP_FLAGS,   /* be16 */
+   TCA_FLOWER_KEY_TCP_FLAGS_MASK,  /* be16 */
+
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index ca526c0..fb74a47 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -49,6 +49,7 @@ struct fl_flow_key {
};
struct flow_dissector_key_ports enc_tp;
struct flow_dissector_key_mpls mpls;
+   struct flow_dissector_key_tcp tcp;
 } __aligned(BITS_PER_LONG / 8); /* Ensure that we can do comparisons as longs. */
 
 struct fl_flow_mask_range {
@@ -424,6 +425,8 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 1] = {
[TCA_FLOWER_KEY_MPLS_BOS]   = { .type = NLA_U8 },
[TCA_FLOWER_KEY_MPLS_TC]= { .type = NLA_U8 },
[TCA_FLOWER_KEY_MPLS_LABEL] = { .type = NLA_U32 },
+   [TCA_FLOWER_KEY_TCP_FLAGS]  = { .type = NLA_U16 },
+   [TCA_FLOWER_KEY_TCP_FLAGS_MASK] = { .type = NLA_U16 },
 };
 
 static void fl_set_key_val(struct nlattr **tb,
@@ -596,6 +599,9 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
fl_set_key_val(tb, &key->tp.dst, TCA_FLOWER_KEY_TCP_DST,
   &mask->tp.dst, TCA_FLOWER_KEY_TCP_DST_MASK,
   sizeof(key->tp.dst));
+   fl_set_key_val(tb, &key->tcp.flags, TCA_FLOWER_KEY_TCP_FLAGS,
+  &mask->tcp.flags, TCA_FLOWER_KEY_TCP_FLAGS_MASK,
+  sizeof(key->tcp.flags));
} else if (key->basic.ip_proto == IPPROTO_UDP) {
fl_set_key_val(tb, &key->tp.src, TCA_FLOWER_KEY_UDP_SRC,
   &mask->tp.src, TCA_FLOWER_KEY_UDP_SRC_MASK,
@@ -767,6 +773,8 @@ static void fl_init_dissector(struct cls_fl_head *head,
FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
 FLOW_DISSECTOR_KEY_PORTS, tp);
FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
+FLOW_DISSECTOR_KEY_TCP, tcp);
+   FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
 FLOW_DISSECTOR_KEY_ICMP, icmp);
FL_KEY_SET_IF_MASKED(&mask->key, keys, cnt,
 FLOW_DISSECTOR_KEY_ARP, arp);
@@ -1215,7 +1223,10 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, unsigned long fh,
 sizeof(key->tp.src)) ||
 fl_dump_key_val(skb, &key->tp.dst, TCA_FLOWER_KEY_TCP_DST,
 &mask->tp.dst, TCA_FLOWER_KEY_TCP_DST_MASK,
-sizeof(key->tp.dst))))
+sizeof(key->tp.dst)) ||
+fl_dump_key_val(skb, &key->tcp.flags, TCA_FLOWER_KEY_TCP_FLAGS,
+&mask->tcp.flags, TCA_FLOWER_KEY_TCP_FLAGS_MASK,
+sizeof(key->tcp.flags))))
goto nla_put_failure;
else if (key->basic.ip_proto == IPPROTO_UDP &&
 (fl_dump_key_val(skb, &key->tp.src, TCA_FLOWER_KEY_UDP_SRC,
-- 
2.9.3
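
Matching on a flags value together with a mask, as the new
TCA_FLOWER_KEY_TCP_FLAGS / TCA_FLOWER_KEY_TCP_FLAGS_MASK attributes allow,
comes down to a masked comparison.  The following is a generic sketch of
that technique, not the classifier's actual code path: tcp_flags_match() and
the TCP_FLAG_* constants are names made up for the example, and the values
are kept in host byte order for readability.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard TCP flag bit positions within the low 12 bits, host order. */
#define TCP_FLAG_FIN 0x001
#define TCP_FLAG_SYN 0x002
#define TCP_FLAG_ACK 0x010

/*
 * A packet matches when it agrees with the key on every bit selected by
 * the mask; bits outside the mask are "don't care".
 */
bool tcp_flags_match(uint16_t pkt_flags, uint16_t key, uint16_t mask)
{
	/* Only the low 12 bits carry flags in this model. */
	assert((mask & ~0x0FFF) == 0);
	return (pkt_flags & mask) == (key & mask);
}
```

For example, key = SYN with mask = SYN|ACK selects an initial SYN but not a
SYN/ACK, while other bits such as FIN are ignored because they are not
covered by the mask.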

[patch net-next v2 3/5] mlxsw: acl: Add tcp flags acl element

2017-05-23 Thread Jiri Pirko

From: Jiri Pirko 

Define new element for tcp flags and place it into scratch area.

Signed-off-by: Jiri Pirko 
Reviewed-by: Ido Schimmel 
---
 drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h | 2 ++
 drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c| 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
index c75e914..9807ef8 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core_acl_flex_keys.h
@@ -56,6 +56,7 @@ enum mlxsw_afk_element {
MLXSW_AFK_ELEMENT_SRC_L4_PORT,
MLXSW_AFK_ELEMENT_VID,
MLXSW_AFK_ELEMENT_PCP,
+   MLXSW_AFK_ELEMENT_TCP_FLAGS,
MLXSW_AFK_ELEMENT_MAX,
 };
 
@@ -102,6 +103,7 @@ static const struct mlxsw_afk_element_info mlxsw_afk_element_infos[] = {
MLXSW_AFK_ELEMENT_INFO_U32(IP_PROTO, 0x10, 0, 8),
MLXSW_AFK_ELEMENT_INFO_U32(VID, 0x10, 8, 12),
MLXSW_AFK_ELEMENT_INFO_U32(PCP, 0x10, 20, 3),
+   MLXSW_AFK_ELEMENT_INFO_U32(TCP_FLAGS, 0x10, 23, 9),
MLXSW_AFK_ELEMENT_INFO_U32(SRC_IP4, 0x18, 0, 32),
MLXSW_AFK_ELEMENT_INFO_U32(DST_IP4, 0x1C, 0, 32),
MLXSW_AFK_ELEMENT_INFO_BUF(SRC_IP6_HI, 0x18, 8),
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
index cc99de0..f7a8c3c 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_flower.c
@@ -198,6 +198,7 @@ static int mlxsw_sp_flower_parse(struct mlxsw_sp *mlxsw_sp,
  BIT(FLOW_DISSECTOR_KEY_IPV4_ADDRS) |
  BIT(FLOW_DISSECTOR_KEY_IPV6_ADDRS) |
  BIT(FLOW_DISSECTOR_KEY_PORTS) |
+ BIT(FLOW_DISSECTOR_KEY_TCP) |
  BIT(FLOW_DISSECTOR_KEY_VLAN))) {
dev_err(mlxsw_sp->bus_info->dev, "Unsupported key\n");
return -EOPNOTSUPP;
-- 
2.9.3

[patch net-next v2 1/5] net: flow_dissector: add support for dissection of tcp flags

2017-05-23 Thread Jiri Pirko

From: Jiri Pirko 

Add support for dissection of tcp flags. Uses similar function call to
tcp dissection function as arp, mpls and others.

Signed-off-by: Jiri Pirko 
Acked-by: Or Gerlitz 
---
 include/net/flow_dissector.h |  9 +
 net/core/flow_dissector.c| 29 +
 2 files changed, 38 insertions(+)

diff --git a/include/net/flow_dissector.h b/include/net/flow_dissector.h
index 8d21d44..efe34eec 100644
--- a/include/net/flow_dissector.h
+++ b/include/net/flow_dissector.h
@@ -157,6 +157,14 @@ struct flow_dissector_key_eth_addrs {
unsigned char src[ETH_ALEN];
 };
 
+/**
+ * struct flow_dissector_key_tcp:
+ * @flags: flags
+ */
+struct flow_dissector_key_tcp {
+   __be16 flags;
+};
+
 enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_CONTROL, /* struct flow_dissector_key_control */
FLOW_DISSECTOR_KEY_BASIC, /* struct flow_dissector_key_basic */
@@ -177,6 +185,7 @@ enum flow_dissector_key_id {
FLOW_DISSECTOR_KEY_ENC_CONTROL, /* struct flow_dissector_key_control */
FLOW_DISSECTOR_KEY_ENC_PORTS, /* struct flow_dissector_key_ports */
FLOW_DISSECTOR_KEY_MPLS, /* struct flow_dissector_key_mpls */
+   FLOW_DISSECTOR_KEY_TCP, /* struct flow_dissector_key_tcp */
 
FLOW_DISSECTOR_KEY_MAX,
 };
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 28d94bc..5a45943 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -342,6 +343,30 @@ __skb_flow_dissect_gre(const struct sk_buff *skb,
return FLOW_DISSECT_RET_OUT_PROTO_AGAIN;
 }
 
+static void
+__skb_flow_dissect_tcp(const struct sk_buff *skb,
+  struct flow_dissector *flow_dissector,
+  void *target_container, void *data, int thoff, int hlen)
+{
+   struct flow_dissector_key_tcp *key_tcp;
+   struct tcphdr *th, _th;
+
+   if (!dissector_uses_key(flow_dissector, FLOW_DISSECTOR_KEY_TCP))
+   return;
+
+   th = __skb_header_pointer(skb, thoff, sizeof(_th), data, hlen, &_th);
+   if (!th)
+   return;
+
+   if (unlikely(__tcp_hdrlen(th) < sizeof(_th)))
+   return;
+
+   key_tcp = skb_flow_dissector_target(flow_dissector,
+   FLOW_DISSECTOR_KEY_TCP,
+   target_container);
+   key_tcp->flags = (*(__be16 *) &tcp_flag_word(th) & htons(0x0FFF));
+}
+
 /**
  * __skb_flow_dissect - extract the flow_keys struct and return it
  * @skb: sk_buff to extract the flow from, can be NULL if the rest are specified
@@ -683,6 +708,10 @@ bool __skb_flow_dissect(const struct sk_buff *skb,
case IPPROTO_MPLS:
proto = htons(ETH_P_MPLS_UC);
goto mpls;
+   case IPPROTO_TCP:
+   __skb_flow_dissect_tcp(skb, flow_dissector, target_container,
+  data, nhoff, hlen);
+   break;
default:
break;
}
-- 
2.9.3
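
What the dissector above extracts can be shown on a raw header buffer.  The
sketch below is a user-space approximation under stated assumptions:
tcp_flags_from_header() is a hypothetical helper, it works on a plain byte
array instead of an skb, and it returns the value in host byte order,
whereas the patch stores a __be16.  The 0x0FFF mask and the header length
sanity check mirror the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TCP_MIN_HDR_LEN 20

/*
 * Return the low 12 bits of the 16-bit word at bytes 12..13 of a TCP
 * header (the flag bits that follow the 4-bit data offset), or -1 if the
 * buffer or the advertised header length is too short.
 */
int tcp_flags_from_header(const uint8_t *hdr, size_t len)
{
	unsigned int doff_bytes;

	assert(hdr != NULL);

	if (len < TCP_MIN_HDR_LEN)
		return -1;

	/* The data offset is counted in 32-bit words. */
	doff_bytes = (unsigned int)(hdr[12] >> 4) * 4u;
	if (doff_bytes < TCP_MIN_HDR_LEN)
		return -1;

	/* Assemble the big-endian word by hand, then drop the data offset. */
	return ((hdr[12] << 8) | hdr[13]) & 0x0FFF;
}
```

A header with data offset 5 and SYN|ACK set has bytes 0x50 0x12 at offsets
12 and 13, so the helper yields 0x012.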

[patch net-next v2 0/5] add tcp flags match support to flower and offload it

2017-05-23 Thread Jiri Pirko

From: Jiri Pirko 

This patch adds support to dissect tcp flags, match on them using
flower classifier and offload such rules to mlxsw Spectrum devices.

---
v1->v2:
- removed no longer relevant comment from patch 1 as suggested by Or
- sent correct patches this time

Jiri Pirko (5):
  net: flow_dissector: add support for dissection of tcp flags
  net/sched: flower: add support for matching on tcp flags
  mlxsw: acl: Add tcp flags acl element
  mlxsw: spectrum: Add acl block containing tcp flags for ipv4
  mlxsw: spectrum_flower: Add support for tcp flags

 .../ethernet/mellanox/mlxsw/core_acl_flex_keys.h   |  2 ++
 .../mellanox/mlxsw/spectrum_acl_flex_keys.h|  6 +
 .../ethernet/mellanox/mlxsw/spectrum_acl_tcam.c|  1 +
 .../net/ethernet/mellanox/mlxsw/spectrum_flower.c  | 30 ++
 include/net/flow_dissector.h   |  9 +++
 include/uapi/linux/pkt_cls.h   |  3 +++
 net/core/flow_dissector.c  | 29 +
 net/sched/cls_flower.c | 13 +-
 8 files changed, 92 insertions(+), 1 deletion(-)

-- 
2.9.3

Re: tipc: Delete error messages for failed memory allocations in three functions

2017-05-23 Thread Joe Perches

On Tue, 2017-05-23 at 18:23 +0200, SF Markus Elfring wrote:
> > tipc_subseq_alloc does a kcalloc (memset to 0),
> > half of which is immediately overwritten.
> > 
> > In other words, don't just blindly remove stuff,
> > understand what it does and improve it.
> 
> Do you suggest another specific source code transformation pattern here?

For the somewhat hard-of-thinking,
something like krealloc would do nicely.

Re: [PATCH 2/2] at803x: double check SGMII side autoneg

2017-05-23 Thread Timur Tabi

On 05/23/2017 11:07 AM, Andrew Lunn wrote:
>> > I will test that to see what happens, but I believe the real problem is
>> > that the at803x driver is lying when it says that the link is not okay.
>> > I think the link is okay, and that's why I'm not getting any more
>> > interrupts.  I don't think I should have to drop interrupt support in my
>> > MAC driver because one specific PHY driver is broken.
> If it turns out the PHY hardware is broken, the phy driver itself can
> force it back to polling by setting phydev->irq to PHY_POLL in its
> probe() function.

I don't think the hardware is broken, I think the driver is broken.  The
patch that sets aneg_done to 0 should be reverted or restricted somehow.

Even the developer of the patch admits that if the warning message is
displayed, the link will appear to be up, but no packets will go through.
Perhaps that's because the driver is returning 0 instead of BMSR_ANEGCOMPLETE?

Would it be okay for the PHY driver to query a property from the device tree
directly (e.g. "qca,check-sgmii-link"), and if present, only then implement
the sgmii link check?  So in at803x_probe(), I would do something like this:

	if (device_property_read_bool(&phydev->mdio.dev,
				      "qca,check-sgmii-link"))
		priv->check_sgmii_link = true;

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm
Technologies, Inc.  Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

RE: [PATCH 2/2] drivers: phy: Add Cortina CS4340 driver

2017-05-23 Thread Bogdan Purcareata

> -----Original Message-----
> From: Andrew Lunn [mailto:and...@lunn.ch]
> Sent: Tuesday, May 23, 2017 6:57 PM
> To: Bogdan Purcareata 
> Cc: f.faine...@gmail.com; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH 2/2] drivers: phy: Add Cortina CS4340 driver
> 
> On Tue, May 23, 2017 at 03:53:19PM +, Bogdan Purcareata wrote:
> > Add basic support for Cortina PHY drivers. Support only CS4340 for now.
> > The phys are not fully compatible with IEEE 802.3 clause 45 registers.
> > Implement proper read_status support, so that phy polling does not cause
> > bus register access errors.
> >
> > Signed-off-by: Bogdan Purcareata 
> > ---
> >  drivers/net/phy/Kconfig|  5 +++
> >  drivers/net/phy/Makefile   |  1 +
> >  drivers/net/phy/mdio-cortina.c | 90
> ++
> 
> This is a phy driver, not a mdio bus driver. Please use the correct
> file name.

Will fix in v2, thanks!
Bogdan
