date:20170108

Re: [PATCH net-next V2 0/3] net/sched: act_pedit: Use offset relative to conventional network headers

2017-01-08 Thread Amir Vadai

On Fri, Jan 06, 2017 at 08:51:09PM -0500, David Miller wrote:
> From: Amir Vadai 
> Date: Thu,  5 Jan 2017 11:54:51 +0200
> 
> > Enhancing the UAPI to allow for specifying that would allow the same
> > flows to be set into both SW and HW.
> 
> This is actually not backward compatible.
> 
> When pedit rules are dumped, older tools will not know about the
> type field and therefore will completely misinterpret the rule.
> 
> You must extend this the proper way, which is to add a new attribute
> or something along those lines.  The presence of a new attribute
> is an explicit communication to older tools that something they
> might not support and understand is going on.

Sorry, I missed this scenario. Going back to the drawing board.
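
The compatibility argument above can be sketched in userspace C. This is not the act_pedit or netlink code: the `struct attr` layout, the attribute numbers and the helpers `put_attr()` and `parse_rule()` are all hypothetical, chosen only to show why a separate attribute is safer than changing the meaning of an existing field. A parser that predates the new attribute steps over it by length and can tell that something it does not understand was present, instead of silently misreading the rule.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical TLV header: type, then payload length in bytes. */
struct attr {
	uint16_t type;
	uint16_t len;
};

enum {
	ATTR_OFFSET   = 1,	/* understood by old and new tools */
	ATTR_HDR_TYPE = 2,	/* new attribute, only new tools know it */
};

struct rule {
	uint32_t offset;
	uint32_t hdr_type;	/* 0 means "not specified" */
	unsigned int skipped;	/* attributes this parser did not understand */
};

/* Append one attribute carrying a 32-bit value; returns bytes written. */
static size_t put_attr(unsigned char *buf, uint16_t type, uint32_t val)
{
	struct attr a = { type, (uint16_t)sizeof(val) };

	memcpy(buf, &a, sizeof(a));
	memcpy(buf + sizeof(a), &val, sizeof(val));
	return sizeof(a) + sizeof(val);
}

/*
 * Walk the attributes in buf. max_known is the highest attribute type
 * this (hypothetical) tool version understands; anything else is
 * counted and skipped. Returns 0 on success, -1 on a malformed buffer.
 */
static int parse_rule(const unsigned char *buf, size_t len,
		      uint16_t max_known, struct rule *r)
{
	memset(r, 0, sizeof(*r));

	while (len >= sizeof(struct attr)) {
		struct attr a;
		uint32_t val = 0;

		memcpy(&a, buf, sizeof(a));
		buf += sizeof(a);
		len -= sizeof(a);
		if (a.len > len)
			return -1;		/* truncated payload */
		if (a.len == sizeof(val))
			memcpy(&val, buf, sizeof(val));

		if (a.type == ATTR_OFFSET && a.type <= max_known)
			r->offset = val;
		else if (a.type == ATTR_HDR_TYPE && a.type <= max_known)
			r->hdr_type = val;
		else
			r->skipped++;		/* unknown: step over it */

		buf += a.len;
		len -= a.len;
	}
	return len == 0 ? 0 : -1;
}
```

Calling `parse_rule()` with `max_known = ATTR_OFFSET` models an older tool: it still recovers the offset and reports one skipped attribute, which is the explicit signal the review asks for.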

Re: [PATCH net-next 6/7] net/mlx5: E-Switch, Add control for inline mode

2017-01-08 Thread Jiri Pirko

Mon, Nov 21, 2016 at 02:06:00PM CET, sae...@mellanox.com wrote:
>From: Roi Dayan 
>
>Implement devlink show and set of HW inline-mode.
>The supported modes: none, link, network, transport.
>We currently support one mode for all vports so set is done on all vports.
>When eswitch is first initialized the inline-mode is queried from the FW.
>
>Signed-off-by: Roi Dayan 
>Signed-off-by: Saeed Mahameed 

Saeed, could you please use the get_maintainer script and cc those people
for your submissions? Thanks!

[net-next 8/8] fm10k: remove FM10K_FLAG_DEBUG_STATS

2017-01-08 Thread Jeff Kirsher

From: Jacob Keller 

The debug statistics were removed due to complications with the ethtool
statistics API which are not possible to resolve without a new
statistics interface. The flag was left behind, but we no longer need
it.

Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h 
b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 75d2c80..52b9794 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -261,7 +261,6 @@ struct fm10k_intfc {
 #define FM10K_FLAG_RSS_FIELD_IPV4_UDP  (u32)(BIT(1))
 #define FM10K_FLAG_RSS_FIELD_IPV6_UDP  (u32)(BIT(2))
 #define FM10K_FLAG_SWPRI_CONFIG        (u32)(BIT(3))
-#define FM10K_FLAG_DEBUG_STATS (u32)(BIT(4))
int xcast_mode;
 
/* Tx fast path data */
-- 
2.9.3

[net-next 7/8] fm10k: report the receive timestamp in FM10K_CB(skb)->tstamp

2017-01-08 Thread Jeff Kirsher

From: Jacob Keller 

This was accidentally removed when we defeatured the full 1588 Clock
support. We need to report the Rx descriptor timestamp value so that
applications built on top of the IES API can function properly.

Additionally, remove the FM10K_FLAG_RX_TS_ENABLED, as it is not used now
that 1588 functionality has been removed.

Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k.h  | 5 ++---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 ++
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k.h 
b/drivers/net/ethernet/intel/fm10k/fm10k.h
index 4d19e46..75d2c80 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k.h
+++ b/drivers/net/ethernet/intel/fm10k/fm10k.h
@@ -260,9 +260,8 @@ struct fm10k_intfc {
 #define FM10K_FLAG_RESET_REQUESTED (u32)(BIT(0))
 #define FM10K_FLAG_RSS_FIELD_IPV4_UDP  (u32)(BIT(1))
 #define FM10K_FLAG_RSS_FIELD_IPV6_UDP  (u32)(BIT(2))
-#define FM10K_FLAG_RX_TS_ENABLED   (u32)(BIT(3))
-#define FM10K_FLAG_SWPRI_CONFIG        (u32)(BIT(4))
-#define FM10K_FLAG_DEBUG_STATS (u32)(BIT(5))
+#define FM10K_FLAG_SWPRI_CONFIG        (u32)(BIT(3))
+#define FM10K_FLAG_DEBUG_STATS (u32)(BIT(4))
int xcast_mode;
 
/* Tx fast path data */
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 8f90c6d..5bb233a 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -475,6 +475,8 @@ static unsigned int fm10k_process_skb_fields(struct 
fm10k_ring *rx_ring,
 
fm10k_rx_checksum(rx_ring, rx_desc, skb);
 
+   FM10K_CB(skb)->tstamp = rx_desc->q.timestamp;
+
FM10K_CB(skb)->fi.w.vlan = rx_desc->w.vlan;
 
skb_record_rx_queue(skb, rx_ring->queue_index);
-- 
2.9.3

[net-next 6/8] fm10k: Limit dma sync of RX buffers to actual packet size

2017-01-08 Thread Jeff Kirsher

From: Scott Peterson 

On packet RX, we perform a dma sync for cpu before passing the
packet up.  Here we limit that sync to the actual length of the
incoming packet, rather than always syncing the entire buffer.

Signed-off-by: Scott Peterson 
Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 509514d..8f90c6d 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -251,6 +251,7 @@ static bool fm10k_can_reuse_rx_page(struct fm10k_rx_buffer 
*rx_buffer,
 /**
  * fm10k_add_rx_frag - Add contents of Rx buffer to sk_buff
  * @rx_buffer: buffer containing page to add
+ * @size: packet size from rx_desc
  * @rx_desc: descriptor containing length of buffer written by hardware
  * @skb: sk_buff to place the data into
  *
@@ -263,12 +264,12 @@ static bool fm10k_can_reuse_rx_page(struct 
fm10k_rx_buffer *rx_buffer,
  * true if the buffer can be reused by the interface.
  **/
 static bool fm10k_add_rx_frag(struct fm10k_rx_buffer *rx_buffer,
+ unsigned int size,
  union fm10k_rx_desc *rx_desc,
  struct sk_buff *skb)
 {
struct page *page = rx_buffer->page;
unsigned char *va = page_address(page) + rx_buffer->page_offset;
-   unsigned int size = le16_to_cpu(rx_desc->w.length);
 #if (PAGE_SIZE < 8192)
unsigned int truesize = FM10K_RX_BUFSZ;
 #else
@@ -314,6 +315,7 @@ static struct sk_buff *fm10k_fetch_rx_buffer(struct 
fm10k_ring *rx_ring,
 union fm10k_rx_desc *rx_desc,
 struct sk_buff *skb)
 {
+   unsigned int size = le16_to_cpu(rx_desc->w.length);
struct fm10k_rx_buffer *rx_buffer;
struct page *page;
 
@@ -350,11 +352,11 @@ static struct sk_buff *fm10k_fetch_rx_buffer(struct 
fm10k_ring *rx_ring,
dma_sync_single_range_for_cpu(rx_ring->dev,
  rx_buffer->dma,
  rx_buffer->page_offset,
- FM10K_RX_BUFSZ,
+ size,
  DMA_FROM_DEVICE);
 
/* pull page into skb */
-   if (fm10k_add_rx_frag(rx_buffer, rx_desc, skb)) {
+   if (fm10k_add_rx_frag(rx_buffer, size, rx_desc, skb)) {
/* hand second half of page back to the ring */
fm10k_reuse_rx_page(rx_ring, rx_buffer);
} else {
-- 
2.9.3

[net-next 5/8] fm10k: bump version number

2017-01-08 Thread Jeff Kirsher

From: Jacob Keller 

Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_main.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
index 5de9378..509514d 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_main.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_main.c
@@ -28,7 +28,7 @@
 
 #include "fm10k.h"
 
-#define DRV_VERSION "0.21.2-k"
+#define DRV_VERSION "0.21.7-k"
 #define DRV_SUMMARY "Intel(R) Ethernet Switch Host Interface Driver"
 const char fm10k_driver_version[] = DRV_VERSION;
 char fm10k_driver_name[] = "fm10k";
-- 
2.9.3

[net-next 0/8][pull request] 100GbE Intel Wired LAN Driver Updates 2017-01-08

2017-01-08 Thread Jeff Kirsher

This series contains updates to fm10k only.

Ngai-Mint changes the driver to use the MAC pointer in the fm10k_mac_info
structure for fm10k_get_host_state_generic().  Fixes a race condition
where the mailbox interrupt request bits can be cleared before being
handled, leaving certain mailbox messages from the PF unhandled and
the PF in an inactive state.

Jake removes the typecast of u8 to char, and the extra variable that was
created for the typecast.  Bumps the driver version.  Adds back the
receive descriptor timestamp value so that applications built on top
of the IES API can function properly.  Cleans up the debug statistics
flag, since debug statistics were removed and the flag was missed in
the removal.

Scott limits the DMA sync for CPU to the actual length of the packet,
instead of the entire buffer, since the DMA sync occurs every time a
packet is received.

The following are changes since commit 111427f6eb5a5d9ce22f8a90780ac1c18113091a:
  net: dsa: move HWMON support to its own file
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 100GbE

Jacob Keller (4):
  fm10k: remove extraneous variable definition in fm10k_ethtool.c
  fm10k: bump version number
  fm10k: report the receive timestamp in FM10K_CB(skb)->tstamp
  fm10k: remove FM10K_FLAG_DEBUG_STATS

Ngai-Mint Kwan (3):
  fm10k-shared: use mac-> instead of hw->mac.
  fm10k: request reset when mbx->state changes
  fm10k: do not clear global mailbox interrupt bits

Scott Peterson (1):
  fm10k: Limit dma sync of RX buffers to actual packet size

 drivers/net/ethernet/intel/fm10k/fm10k.h |  4 +---
 drivers/net/ethernet/intel/fm10k/fm10k_common.c  |  6 +++---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 21 +
 drivers/net/ethernet/intel/fm10k/fm10k_main.c| 12 
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.c | 10 +++---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |  6 +-
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c  |  4 
 7 files changed, 33 insertions(+), 30 deletions(-)

-- 
2.9.3

[net-next 4/8] fm10k: do not clear global mailbox interrupt bits

2017-01-08 Thread Jeff Kirsher

From: Ngai-Mint Kwan 

Partially revert commit 5e93cbadd3e9 ("fm10k: Reset mailbox global
interrupts", 2016-06-07)

The register bits related to this commit are now solely being handled by
the IES API. Recent changes in the IES API will allow an automatic
recovery from improper handling of these bits.

Signed-off-by: Ngai-Mint Kwan 
Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_pf.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
index 23fb319..40ee024 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pf.c
@@ -72,10 +72,6 @@ static s32 fm10k_reset_hw_pf(struct fm10k_hw *hw)
fm10k_write_flush(hw);
udelay(FM10K_RESET_TIMEOUT);
 
-   /* Reset mailbox global interrupts */
-   reg = FM10K_MBX_GLOBAL_REQ_INTERRUPT | FM10K_MBX_GLOBAL_ACK_INTERRUPT;
-   fm10k_write_reg(hw, FM10K_GMBX, reg);
-
/* Verify we made it out of reset */
reg = fm10k_read_reg(hw, FM10K_IP);
if (!(reg & FM10K_IP_NOTINRESET))
-- 
2.9.3

[net-next 1/8] fm10k-shared: use mac-> instead of hw->mac.

2017-01-08 Thread Jeff Kirsher

From: Ngai-Mint Kwan 

Since a pointer "mac" to fm10k_mac_info structure exists, use it to
access the contents of its members.

Signed-off-by: Ngai-Mint Kwan 
Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_common.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_common.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_common.c
index dd95ac4..62a6ad9 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_common.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_common.c
@@ -506,7 +506,7 @@ s32 fm10k_get_host_state_generic(struct fm10k_hw *hw, bool 
*host_ready)
goto out;
 
/* if we somehow dropped the Tx enable we should reset */
-   if (hw->mac.tx_ready && !(txdctl & FM10K_TXDCTL_ENABLE)) {
+   if (mac->tx_ready && !(txdctl & FM10K_TXDCTL_ENABLE)) {
ret_val = FM10K_ERR_RESET_REQUESTED;
goto out;
}
@@ -523,8 +523,8 @@ s32 fm10k_get_host_state_generic(struct fm10k_hw *hw, bool 
*host_ready)
 
/* interface cannot receive traffic without logical ports */
if (mac->dglort_map == FM10K_DGLORTMAP_NONE) {
-   if (hw->mac.ops.request_lport_map)
-   ret_val = hw->mac.ops.request_lport_map(hw);
+   if (mac->ops.request_lport_map)
+   ret_val = mac->ops.request_lport_map(hw);
 
goto out;
}
-- 
2.9.3

[net-next 3/8] fm10k: request reset when mbx->state changes

2017-01-08 Thread Jeff Kirsher

From: Ngai-Mint Kwan 

Multiple IES API resets can cause a race condition where the mailbox
interrupt request bits can be cleared before being handled. This can
leave certain mailbox messages from the PF unhandled, and the PF will
enter an inactive state. If this situation occurs, the IES API
will initiate a mailbox version reset which, in turn, triggers a mailbox
state change. Once this mailbox transition occurs (from OPEN to CONNECT
state), a request for reset will be returned.

This ensures that PF will undergo a reset whenever IES API encounters an
unknown global mailbox interrupt event or whenever the IES API
terminates.

Signed-off-by: Ngai-Mint Kwan 
Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_mbx.c | 10 +++---
 drivers/net/ethernet/intel/fm10k/fm10k_pci.c |  6 +-
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
index c9dfa65..334088a 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_mbx.c
@@ -2011,9 +2011,10 @@ static void fm10k_sm_mbx_create_reply(struct fm10k_hw 
*hw,
  *  function can also be used to respond to an error as the connection
  *  resetting would also be a means of dealing with errors.
  **/
-static void fm10k_sm_mbx_process_reset(struct fm10k_hw *hw,
-  struct fm10k_mbx_info *mbx)
+static s32 fm10k_sm_mbx_process_reset(struct fm10k_hw *hw,
+ struct fm10k_mbx_info *mbx)
 {
+   s32 err = 0;
const enum fm10k_mbx_state state = mbx->state;
 
switch (state) {
@@ -2026,6 +2027,7 @@ static void fm10k_sm_mbx_process_reset(struct fm10k_hw 
*hw,
case FM10K_STATE_OPEN:
/* flush any incomplete work */
fm10k_sm_mbx_connect_reset(mbx);
+   err = FM10K_ERR_RESET_REQUESTED;
break;
case FM10K_STATE_CONNECT:
/* Update remote value to match local value */
@@ -2035,6 +2037,8 @@ static void fm10k_sm_mbx_process_reset(struct fm10k_hw 
*hw,
}
 
fm10k_sm_mbx_create_reply(hw, mbx, mbx->tail);
+
+   return err;
 }
 
 /**
@@ -2115,7 +2119,7 @@ static s32 fm10k_sm_mbx_process(struct fm10k_hw *hw,
 
switch (FM10K_MSG_HDR_FIELD_GET(mbx->mbx_hdr, SM_VER)) {
case 0:
-   fm10k_sm_mbx_process_reset(hw, mbx);
+   err = fm10k_sm_mbx_process_reset(hw, mbx);
break;
case FM10K_SM_MBX_VERSION:
err = fm10k_sm_mbx_process_version_1(hw, mbx);
diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
index b1a2f84..e372a58 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_pci.c
@@ -1144,6 +1144,7 @@ static irqreturn_t fm10k_msix_mbx_pf(int __always_unused 
irq, void *data)
struct fm10k_hw *hw = &interface->hw;
struct fm10k_mbx_info *mbx = &hw->mbx;
u32 eicr;
+   s32 err = 0;
 
/* unmask any set bits related to this interrupt */
eicr = fm10k_read_reg(hw, FM10K_EICR);
@@ -1159,12 +1160,15 @@ static irqreturn_t fm10k_msix_mbx_pf(int 
__always_unused irq, void *data)
 
/* service mailboxes */
if (fm10k_mbx_trylock(interface)) {
-   mbx->ops.process(hw, mbx);
+   err = mbx->ops.process(hw, mbx);
/* handle VFLRE events */
fm10k_iov_event(interface);
fm10k_mbx_unlock(interface);
}
 
+   if (err == FM10K_ERR_RESET_REQUESTED)
+   interface->flags |= FM10K_FLAG_RESET_REQUESTED;
+
/* if switch toggled state we should reset GLORTs */
if (eicr & FM10K_EICR_SWITCHNOTREADY) {
/* force link down for at least 4 seconds */
-- 
2.9.3

[net-next 2/8] fm10k: remove extraneous variable definition in fm10k_ethtool.c

2017-01-08 Thread Jeff Kirsher

From: Jacob Keller 

We don't need to typecast a u8 * into a char *, so just remove the extra
variable.

Signed-off-by: Jacob Keller 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c 
b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
index 5241e08..0c84fef 100644
--- a/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
+++ b/drivers/net/ethernet/intel/fm10k/fm10k_ethtool.c
@@ -148,7 +148,7 @@ enum {
 static const char fm10k_prv_flags[FM10K_PRV_FLAG_LEN][ETH_GSTRING_LEN] = {
 };
 
-static void fm10k_add_stat_strings(char **p, const char *prefix,
+static void fm10k_add_stat_strings(u8 **p, const char *prefix,
   const struct fm10k_stats stats[],
   const unsigned int size)
 {
@@ -164,32 +164,31 @@ static void fm10k_add_stat_strings(char **p, const char 
*prefix,
 static void fm10k_get_stat_strings(struct net_device *dev, u8 *data)
 {
struct fm10k_intfc *interface = netdev_priv(dev);
-   char *p = (char *)data;
unsigned int i;
 
-   fm10k_add_stat_strings(&p, "", fm10k_gstrings_net_stats,
+   fm10k_add_stat_strings(&data, "", fm10k_gstrings_net_stats,
   FM10K_NETDEV_STATS_LEN);
 
-   fm10k_add_stat_strings(&p, "", fm10k_gstrings_global_stats,
+   fm10k_add_stat_strings(&data, "", fm10k_gstrings_global_stats,
   FM10K_GLOBAL_STATS_LEN);
 
-   fm10k_add_stat_strings(&p, "", fm10k_gstrings_mbx_stats,
+   fm10k_add_stat_strings(&data, "", fm10k_gstrings_mbx_stats,
   FM10K_MBX_STATS_LEN);
 
if (interface->hw.mac.type != fm10k_mac_vf)
-   fm10k_add_stat_strings(&p, "", fm10k_gstrings_pf_stats,
+   fm10k_add_stat_strings(&data, "", fm10k_gstrings_pf_stats,
   FM10K_PF_STATS_LEN);
 
for (i = 0; i < interface->hw.mac.max_queues; i++) {
char prefix[ETH_GSTRING_LEN];
 
snprintf(prefix, ETH_GSTRING_LEN, "tx_queue_%u_", i);
-   fm10k_add_stat_strings(&p, prefix,
+   fm10k_add_stat_strings(&data, prefix,
   fm10k_gstrings_queue_stats,
   FM10K_QUEUE_STATS_LEN);
 
snprintf(prefix, ETH_GSTRING_LEN, "rx_queue_%u_", i);
-   fm10k_add_stat_strings(&p, prefix,
+   fm10k_add_stat_strings(&data, prefix,
   fm10k_gstrings_queue_stats,
   FM10K_QUEUE_STATS_LEN);
}
@@ -198,18 +197,16 @@ static void fm10k_get_stat_strings(struct net_device 
*dev, u8 *data)
 static void fm10k_get_strings(struct net_device *dev,
  u32 stringset, u8 *data)
 {
-   char *p = (char *)data;
-
switch (stringset) {
case ETH_SS_TEST:
-   memcpy(data, *fm10k_gstrings_test,
+   memcpy(data, fm10k_gstrings_test,
   FM10K_TEST_LEN * ETH_GSTRING_LEN);
break;
case ETH_SS_STATS:
fm10k_get_stat_strings(dev, data);
break;
case ETH_SS_PRIV_FLAGS:
-   memcpy(p, fm10k_prv_flags,
+   memcpy(data, fm10k_prv_flags,
   FM10K_PRV_FLAG_LEN * ETH_GSTRING_LEN);
break;
}
-- 
2.9.3

Re: [PATCH net-next 4/7] devlink: Add E-Switch inline mode control

2017-01-08 Thread Jiri Pirko

Mon, Nov 21, 2016 at 02:05:58PM CET, sae...@mellanox.com wrote:
>From: Roi Dayan 
>
>Some HWs need the VF driver to put part of the packet headers on the
>TX descriptor so the e-switch can do proper matching and steering.

Could you please elaborate a bit about possible use-cases for different
modes? Thanks.


>
>The supported modes: none, link, network, transport.
>
>Signed-off-by: Roi Dayan 
>Signed-off-by: Saeed Mahameed

[PATCH] net: fix accept4() flags not work

2017-01-08 Thread yuan linyu

From: yuan linyu 

The flags supplied by the user are stored in newsock's file, so those
are the flags that should be passed to ->accept().

Signed-off-by: yuan linyu 
---
 net/socket.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/socket.c b/net/socket.c
index a8c2307..415f988 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -1508,7 +1508,7 @@ SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user 
*, upeer_sockaddr,
if (err)
goto out_fd;
 
-   err = sock->ops->accept(sock, newsock, sock->file->f_flags);
+   err = sock->ops->accept(sock, newsock, newsock->file->f_flags);
if (err < 0)
goto out_fd;
 
-- 
2.7.4

Re: [PATCH net-next 4/7] devlink: Add E-Switch inline mode control

2017-01-08 Thread Or Gerlitz


On 1/8/2017 12:29 PM, Jiri Pirko wrote:
> Mon, Nov 21, 2016 at 02:05:58PM CET, sae...@mellanox.com wrote:
>> From: Roi Dayan
>>
>> Some HWs need the VF driver to put part of the packet headers on the
>> TX descriptor so the e-switch can do proper matching and steering.
>
> Could you please elaborate a bit about possible use-cases for different
> modes? Thanks.


As written in the change log, some HW models require that the header
set you want eswitch matching on (e.g. L2/L3) is present as MD on the
xmit DMA descriptor.


To address these requirements, following the admin devlink directive, the
FW advertises the requirement to the VFs, which follow it in their xmit
logic, and the host driver enforces that the VF has the proper inline mode
before we are willing to offload eswitch matching rules. If the VF doesn't
obey the requirement, the packets are dropped by HW.


Or.

Re: [PATCH net-next 4/7] devlink: Add E-Switch inline mode control

2017-01-08 Thread Jiri Pirko

Sun, Jan 08, 2017 at 11:49:20AM CET, ogerl...@mellanox.com wrote:
>On 1/8/2017 12:29 PM, Jiri Pirko wrote:
>> Mon, Nov 21, 2016 at 02:05:58PM CET, sae...@mellanox.com wrote:
>> > From: Roi Dayan 
>> > 
>> > Some HWs need the VF driver to put part of the packet headers on the
>> > TX descriptor so the e-switch can do proper matching and steering.
>> Could you please elaborate a bit about possible use-cases for different
>> modes? Thanks.
>
>As written in the change log, some HW models have this requirement that the
>header set you want eswitch matching on (e.g L2/L3) is present as MD on the
>xmit DMA descriptor.
>
>To address these requirements, following the admin devlink directive the FW
>advertises that to the VF, they are doing so in their xmit logic and the host
>driver enforces that the VF has the proper inline mode before we are willing
>to offload eswitch matching rules. If the VF doesn't obey the requirement the
>packets are dropped by HW.

Okay, makes sense. Do you expect this will ever have to be needed
per-vf? In general, not only for mlx* drivers. I believe that this is
an e-switch requirement so it should be same for all connected VFs,
right?

Thanks!

Re: [PATCH net-next 4/7] devlink: Add E-Switch inline mode control

2017-01-08 Thread Or Gerlitz


On 1/8/2017 12:54 PM, Jiri Pirko wrote:

> I believe that this is an e-switch requirement so it should be same for all
> connected VFs, right?

yes
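
For readers new to the four modes named in this thread (none, link, network, transport), a rough way to picture them is as "how many leading header bytes the VF must copy onto the TX descriptor". The sketch below is an illustration under stated assumptions, not mlx5 or devlink code: the enum, the function names, the fixed header sizes (untagged Ethernet, IPv4 without options, TCP without options) and the "at least the required mode" check are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical mirror of the modes discussed in the thread. */
enum inline_mode {
	INLINE_NONE,		/* nothing needs to be inlined */
	INLINE_LINK,		/* L2 header */
	INLINE_NETWORK,		/* L2 + L3 headers */
	INLINE_TRANSPORT,	/* L2 + L3 + L4 headers */
};

/* Assumed sizes: untagged Ethernet, IPv4 and TCP without options. */
enum { ETH_HDR_LEN = 14, IPV4_HDR_LEN = 20, TCP_HDR_LEN = 20 };

/* Bytes a VF would have to inline for the assumed frame layout. */
static size_t inline_hdr_bytes(enum inline_mode mode)
{
	switch (mode) {
	case INLINE_LINK:
		return ETH_HDR_LEN;
	case INLINE_NETWORK:
		return ETH_HDR_LEN + IPV4_HDR_LEN;
	case INLINE_TRANSPORT:
		return ETH_HDR_LEN + IPV4_HDR_LEN + TCP_HDR_LEN;
	case INLINE_NONE:
	default:
		return 0;
	}
}

/*
 * Assumed policy for the sketch: matching rules may be offloaded only
 * if the VF inlines at least the headers the e-switch mode requires.
 */
static int may_offload(enum inline_mode required, enum inline_mode vf_mode)
{
	return vf_mode >= required;
}
```

Because the requirement comes from the e-switch, one `required` value applies to every connected VF in this model, which is the point confirmed in the exchange above.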

Re: [PATCH v5 3/3] stmmac: adding new glue driver dwmac-dwc-qos-eth

2017-01-08 Thread Lars Persson


> On 06 Jan 2017, at 11:48 , Joao Pinto  wrote:
> 
> This patch adds a new glue driver called dwmac-dwc-qos-eth, which
> is based on dwc_eth_qos as is. To ensure backward compatibility, a slight
> tweak was also added to stmmac_platform.
> 
> Signed-off-by: Joao Pinto 
> ---
> changes v4 -> v5:
> - memset was not done properly
> changes v3 -> v4:
> - stmmac_res is now being initialized to 0
> changes v2 -> v3:
> - Nothing changed, just to keep up patch set version
> changes v1 -> v2:
> - WOL was not declared in the new glue driver
> - clocks were switched and now fixed (apb_pclk and phy_ref_clk)
> 
> .../bindings/net/snps,dwc-qos-ethernet.txt |   3 +
> drivers/net/ethernet/stmicro/stmmac/Kconfig|   9 +
> drivers/net/ethernet/stmicro/stmmac/Makefile   |   1 +
> .../ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c| 202 +
> .../net/ethernet/stmicro/stmmac/stmmac_platform.c  |  15 +-
> 5 files changed, 227 insertions(+), 3 deletions(-)
> create mode 100644 drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
> 
> diff --git a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt 
> b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
> index d93f71c..21d27aa 100644
> --- a/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
> +++ b/Documentation/devicetree/bindings/net/snps,dwc-qos-ethernet.txt
> @@ -1,5 +1,8 @@
> * Synopsys DWC Ethernet QoS IP version 4.10 driver (GMAC)
> 
> +This binding is deprecated, but it continues to be supported, but new
> +features should be preferably added to the stmmac binding document.
> +
> This binding supports the Synopsys Designware Ethernet QoS (Quality Of 
> Service)
> IP block. The IP supports multiple options for bus type, clocking and reset
> structure, and feature list. Consequently, a number of properties and list
> diff --git a/drivers/net/ethernet/stmicro/stmmac/Kconfig 
> b/drivers/net/ethernet/stmicro/stmmac/Kconfig
> index ab66248..99594e3 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/Kconfig
> +++ b/drivers/net/ethernet/stmicro/stmmac/Kconfig
> @@ -29,6 +29,15 @@ config STMMAC_PLATFORM
> 
> if STMMAC_PLATFORM
> 
> +config DWMAC_DWC_QOS_ETH
> + tristate "Support for snps,dwc-qos-ethernet.txt DT binding."
> + select PHYLIB
> + select CRC32
> + select MII
> + depends on OF && HAS_DMA
> + help
> +   Support for chips using the snps,dwc-qos-ethernet.txt DT binding.
> +
> config DWMAC_GENERIC
>   tristate "Generic driver for DWMAC"
>   default STMMAC_PLATFORM
> diff --git a/drivers/net/ethernet/stmicro/stmmac/Makefile 
> b/drivers/net/ethernet/stmicro/stmmac/Makefile
> index 8f83a86..700c603 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/Makefile
> +++ b/drivers/net/ethernet/stmicro/stmmac/Makefile
> @@ -16,6 +16,7 @@ obj-$(CONFIG_DWMAC_SOCFPGA) += dwmac-altr-socfpga.o
> obj-$(CONFIG_DWMAC_STI)   += dwmac-sti.o
> obj-$(CONFIG_DWMAC_STM32) += dwmac-stm32.o
> obj-$(CONFIG_DWMAC_SUNXI) += dwmac-sunxi.o
> +obj-$(CONFIG_DWMAC_DWC_QOS_ETH)  += dwmac-dwc-qos-eth.o
> obj-$(CONFIG_DWMAC_GENERIC)   += dwmac-generic.o
> stmmac-platform-objs:= stmmac_platform.o
> dwmac-altr-socfpga-objs := altr_tse_pcs.o dwmac-socfpga.o
> diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c 
> b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
> new file mode 100644
> index 000..7bdbc77
> --- /dev/null
> +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-dwc-qos-eth.c
> @@ -0,0 +1,202 @@
> +/*
> + * Synopsys DWC Ethernet Quality-of-Service v4.10a linux driver
> + *
> + * Copyright (C) 2016 Joao Pinto 
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program. If not, see .
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "stmmac_platform.h"
> +
> +static int dwc_eth_dwmac_config_dt(struct platform_device *pdev,
> +struct plat_stmmacenet_data *plat_dat)
> +{
> + struct device_node *np = pdev->dev.of_node;
> + u32 burst_map = 0;
> + u32 bit_index = 0;
> + u32 a_index = 0;
> +
> + if (!plat_dat->axi) {
> + plat_dat->axi = kzalloc(sizeof(struct stmmac_axi), GFP_KERNEL);
> +
> + if (!plat_dat->axi)
> + return -ENOMEM;
> + }
> +
> + plat_dat->axi->axi_lpi_en = of_property_read_bool(np, "snps,en-lpi");
> + if (of_property_read_u32(np, "snps,write-requests",
> +  &plat_dat->axi->axi_wr_osr_lmt)) {
> + /**
> +  * Since the register has a reset value of 1, if property
> +  * i

Re: [PATCH v2 net-next 3/4] secure_seq: use SipHash in place of MD5

2017-01-08 Thread Jason A. Donenfeld

Hi David,

On Sat, Jan 7, 2017 at 10:37 PM, David Miller  wrote:
> This and the next patch are a real shame, performance wise, on cpus
> that have single-instruction SHA1 and MD5 implementations.  Sparc64
> has both, and I believe x86_64 can do SHA1 these days.
>
> It took so long to get those instructions into real silicon, and then
> have software implemented to make use of them as well.

Actually, from a performance perspective, these patches are strictly
better than what was already there, since nothing actually used the
special instructions. They're also better security wise, because the
prior use of these functions was quite dubious. On x86, using the FPU
isn't really an option in these situations, as you well know. On
Sparc64, sure, I guess it's a bummer that silicon is lagging
cryptography. If after merging these improvements, you want to start
thinking about a special construction just for Sparc64 that would be
faster and have a matching security level, this would of course be
great. But so far, nobody even bothered to do this for the old
insecure slow code that this is replacing.

> Who knows when we'll see SipHash widely deployed in any instruction
> set, if at all, right?  And by that time we'll possibly find out that
> "Oh shit, this SipHash thing has flaws!" and we'll need
> DIPPY_DO_DA_HASH and thus be forced back to a software implementation
> again.

The literature and cryptanalyses on SipHash have been quite positive.
And as I mentioned earlier in patchset messages, SipHash is really
_not_ some newfangled hipster thing, but rather something that's been
around a while, pretty extensively studied, and considered quite
venerable. I think if you're going to bet on something, SipHash is one
of the safer bets to be made.

> I understand the reasons why these patches are being proposed, I just
> thought I'd mention the issue of cpus that implement secure hash
> algorithm instructions.

Yea, agreed, it's a bummer. Hopefully silicon will catch up someday,
and we'll all be happy. In the meantime, at least these patches
improve the situation on Linux.

I interpret your letter's omission of any substantive comments on the
code itself to be an indication that things are mostly sane. I'll
follow up with Eric's suggestions to produce a v3, and then hopefully
we can get this merged.

Regards,
Jason

Re: [PATCH v2 net-next 0/4] Introduce The SipHash PRF

2017-01-08 Thread Jason A. Donenfeld

Hi Eric,

Thanks for round two. I'll address these. Comments are inline below.

On Sat, Jan 7, 2017 at 8:54 PM, Eric Biggers  wrote:
> Hi Jason, thanks for doing this!  Yes, I had gotten a little lost in the
> earlier discussions about the 'random' driver and other potential users of
> SipHash.  I agree with the approach to just introduce the two uses in net/ to
> start with, and introduce more users later.  The changes from v1 to v2 look
> good too.

Indeed the initial patchset was _insane_ and the discussion became
sprawling and impossible. Everybody suggested I do baby steps, so
voila, there's this more manageable patchset now.

> Now that the HalfSipHash patch is Cc'ed to me too I do have one other small
> suggestion which is that this:
>
> #if BITS_PER_LONG == 64
> typedef siphash_key_t hsiphash_key_t;
> #define HSIPHASH_ALIGNMENT SIPHASH_ALIGNMENT
> #else
> typedef struct {
> u32 key[2];
> } hsiphash_key_t;
> #define HSIPHASH_ALIGNMENT __alignof__(u32)
> #endif
>
> could cause confusion if someone accidentally uses 'siphash_key_t' instead of
> 'hsiphash_key_t', as their code would compile fine on a 64-bit platform but
> would fail to compile on a 32-bit platform.  I think there should just always
> be a hsiphash_key_t struct defined, and it can use unsigned long (no need for
> an #ifdef):
>
> #define HSIPHASH_ALIGNMENT __alignof__(unsigned long)
> typedef struct {
> unsigned long key[2];
> } hsiphash_key_t;

Good idea. Will adjust. That makes things a lot simpler.

> There's also a small error in Documentation/siphash.txt: hsiphash() is shown
> as taking siphash_key_t instead of hsiphash_key_t.

Arg, nice catch, fixing.

> The uses in net look good too.  Something to watch out for is accidentally
> defining the structs in a way that leaves internal padding bytes, which could
> theoretically take on any value and cause the same input to produce different
> hashes.  But AFAICS, in the proposed patch all the structs are laid out
> properly, so that won't happen.

Indeed that's something closely examined. In fact, originally, just to
be careful, I was using __packed, but David pointed out that using
__packed makes gcc resort to byte-by-byte assignment, even if the
alignment would otherwise be natural. So, instead I just made sure to
list the members in descending order of size, and made sure to use
offsetofend instead of sizeof. I'll be sure to document this
precaution in the Documentation/siphash.txt file for the next version,
alongside a simple example.
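
To make the precaution concrete, here is a minimal userspace sketch. It is
not taken from the patch: struct flow_input and its helpers are hypothetical
names, and the offsetofend() macro below is a stand-in for the kernel's. It
lists members in descending order of size so there are no internal holes, and
it hashes only up to the end of the last member so tail padding is excluded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's offsetofend(): offset of the first
 * byte past MEMBER within TYPE. */
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))

/* Hypothetical hash input. Members are listed in descending order of size,
 * so the compiler has no reason to insert padding between them. */
struct flow_input {
	uint64_t cookie;
	uint32_t saddr;
	uint16_t sport;
	uint8_t  proto;
	/* tail padding may follow here to round the struct size up */
};

/* Number of bytes to hash: everything up to and including the last member,
 * which leaves out any tail padding that sizeof() would count. */
static size_t flow_input_hash_len(void)
{
	return offsetofend(struct flow_input, proto);
}

/* Check that the members really are laid out back to back. */
static void flow_input_check_layout(void)
{
	assert(offsetof(struct flow_input, cookie) == 0);
	assert(offsetof(struct flow_input, saddr) == 8);
	assert(offsetof(struct flow_input, sport) == 12);
	assert(offsetof(struct flow_input, proto) == 14);
}
```

On common ABIs sizeof(struct flow_input) is larger than the 15 bytes returned
by flow_input_hash_len(), which is exactly the tail padding that must not be
fed to the hash.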

> 'net_secret' could also be marked as __read_mostly, like the keys in
> syncookies.c, I suppose; it may not matter much.

Good point. Fixing.

New version coming your way soon.

Thanks again,
Jason

[PATCH v3 net-next 1/4] siphash: add cryptographically secure PRF

2017-01-08 Thread Jason A. Donenfeld

SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function, or as a
general PRF for short input use cases, such as sequence numbers or RNG
chaining.

For the first usage:

There are a variety of attacks known as "hashtable poisoning" in which an
attacker forms some data such that the hash of that data will be the
same, and then proceeds to fill up all entries of a hashbucket. This is
a realistic and well-known denial-of-service vector. Currently
hashtables use jhash, which is fast but not secure, and some kind of
rotating key scheme (or none at all, which isn't good). SipHash is meant
as a replacement for jhash in these cases.

There are a handful of places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's not a reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.

While SipHash is extremely fast for a cryptographically secure function,
it is likely a bit slower than the insecure jhash, and so replacements
will be evaluated on a case-by-case basis based on whether or not the
difference in speed is negligible and whether or not the current jhash usage
poses a real security risk.
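
For readers who want to see the primitive itself, the following is a minimal,
unoptimized userspace sketch of SipHash-2-4 as described in the SipHash paper,
used here to pick a hash bucket. It is not the code in lib/siphash.c; the
names siphash24(), load_le() and bucket_of() are made up for illustration,
bucket_of() assumes a power-of-two table size, and the fixed key in the
self-test is the paper's test key (a real key must come from
get_random_bytes()).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t rotl64(uint64_t x, int b)
{
	return (x << b) | (x >> (64 - b));
}

/* One SipRound over the four 64-bit state words. */
static void sipround(uint64_t v[4])
{
	v[0] += v[1]; v[1] = rotl64(v[1], 13); v[1] ^= v[0]; v[0] = rotl64(v[0], 32);
	v[2] += v[3]; v[3] = rotl64(v[3], 16); v[3] ^= v[2];
	v[0] += v[3]; v[3] = rotl64(v[3], 21); v[3] ^= v[0];
	v[2] += v[1]; v[1] = rotl64(v[1], 17); v[1] ^= v[2]; v[2] = rotl64(v[2], 32);
}

/* Load n (0..8) bytes as a little-endian integer. */
static uint64_t load_le(const uint8_t *p, size_t n)
{
	uint64_t x = 0;
	size_t i;

	for (i = 0; i < n; i++)
		x |= (uint64_t)p[i] << (8 * i);
	return x;
}

/* SipHash-2-4: 2 rounds per 8-byte input word, 4 finalization rounds. */
static uint64_t siphash24(const void *data, size_t len, const uint8_t key[16])
{
	const uint8_t *in = data;
	const uint64_t k0 = load_le(key, 8), k1 = load_le(key + 8, 8);
	uint64_t v[4] = {
		k0 ^ 0x736f6d6570736575ULL, k1 ^ 0x646f72616e646f6dULL,
		k0 ^ 0x6c7967656e657261ULL, k1 ^ 0x7465646279746573ULL,
	};
	uint64_t m;
	size_t left;
	int i;

	for (left = len; left >= 8; in += 8, left -= 8) {
		m = load_le(in, 8);
		v[3] ^= m;
		sipround(v);
		sipround(v);
		v[0] ^= m;
	}
	/* Last word: the remaining bytes plus the length in the top byte. */
	m = load_le(in, left) | ((uint64_t)len << 56);
	v[3] ^= m;
	sipround(v);
	sipround(v);
	v[0] ^= m;
	v[2] ^= 0xff;
	for (i = 0; i < 4; i++)
		sipround(v);
	return v[0] ^ v[1] ^ v[2] ^ v[3];
}

/* Keyed bucket selection; nbuckets is assumed to be a power of two. */
static unsigned int bucket_of(const void *data, size_t len,
			      const uint8_t key[16], unsigned int nbuckets)
{
	return (unsigned int)(siphash24(data, len, key) & (nbuckets - 1));
}

/* Test vector from the SipHash paper: key 00..0f, message 00..0e. */
static void siphash24_selftest(void)
{
	uint8_t key[16], msg[15];
	int i;

	for (i = 0; i < 16; i++)
		key[i] = (uint8_t)i;
	for (i = 0; i < 15; i++)
		msg[i] = (uint8_t)i;
	assert(siphash24(msg, sizeof(msg), key) == 0xa129ca6149be45e5ULL);
}
```

Because the bucket depends on a secret key, an attacker who cannot read the
key cannot precompute inputs that all land in the same bucket, which is the
property jhash lacks.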

For the second usage:

A few places in the kernel are using MD5 or SHA1 for creating secure
sequence numbers, syn cookies, port numbers, or fast random numbers.
SipHash is a faster, more fitting, and more secure replacement for MD5
in those situations. Replacing MD5 and SHA1 with SipHash for these uses is
obvious and straight-forward, and so is submitted along with this patch
series. There shouldn't be much of a debate over its efficacy.

Dozens of languages are already using this internally for their hash
tables and PRFs. Some of the BSDs already use this in their kernels.
SipHash is a widely known high-speed solution to a widely known set of
problems, and it's time we catch up.

Signed-off-by: Jason A. Donenfeld 
Reviewed-by: Jean-Philippe Aumasson 
Cc: Linus Torvalds 
Cc: Eric Biggers 
Cc: David Laight 
Cc: Eric Dumazet 
---
 Documentation/siphash.txt | 100 
 MAINTAINERS   |   7 ++
 include/linux/siphash.h   |  85 +
 lib/Kconfig.debug |   6 +-
 lib/Makefile  |   5 +-
 lib/siphash.c | 232 ++
 lib/test_siphash.c| 131 ++
 7 files changed, 561 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/siphash.txt
 create mode 100644 include/linux/siphash.h
 create mode 100644 lib/siphash.c
 create mode 100644 lib/test_siphash.c

diff --git a/Documentation/siphash.txt b/Documentation/siphash.txt
new file mode 100644
index ..e8e6ddbbaab4
--- /dev/null
+++ b/Documentation/siphash.txt
@@ -0,0 +1,100 @@
+ SipHash - a short input PRF
+---
+Written by Jason A. Donenfeld 
+
+SipHash is a cryptographically secure PRF -- a keyed hash function -- that
+performs very well for short inputs, hence the name. It was designed by
+cryptographers Daniel J. Bernstein and Jean-Philippe Aumasson. It is intended
+as a replacement for some uses of: `jhash`, `md5_transform`, `sha_transform`,
+and so forth.
+
+SipHash takes a secret key filled with randomly generated numbers and either
+an input buffer or several input integers. It spits out an integer that is
+indistinguishable from random. You may then use that integer as part of secure
+sequence numbers, secure cookies, or mask it off for use in a hash table.
+
+1. Generating a key
+
+Keys should always be generated from a cryptographically secure source of
+random numbers, either using get_random_bytes or get_random_once:
+
+siphash_key_t key;
+get_random_bytes(&key, sizeof(key));
+
+If you're not deriving your key from here, you're doing it wrong.
+
+2. Using the functions
+
+There are two variants of the function, one that takes a list of integers, and
+one that takes a buffer:
+
+u64 siphash(const void *data, size_t len, const siphash_key_t *key);
+
+And:
+
+u64 siphash_1u64(u64, const siphash_key_t *key);
+u64 siphash_2u64(u64, u64, const siphash_key_t *key);
+u64 siphash_3u64(u64, u64, u64, const siphash_key_t *key);
+u64 siphash_4u64(u64, u64, u64, u64, const siphash_key_t *key);
+u64 siphash_1u32(u32, const siphash_key_t *key);
+u64 siphash_2u32(u32, u32, const siphash_key_t *key);
+u64 siphash_3u32(u32, u32, u32, const siphash_key_t *key);
+u64 siphash_4u32(u32, u32, u32, u32, const siphash_key_t *key);
+
+If you pass the generic siphash function something of a constant length, it
+will constant fold at compile-time and automatically choose one of the
+optimized functions

[PATCH v3 net-next 3/4] secure_seq: use SipHash in place of MD5

2017-01-08 Thread Jason A. Donenfeld

This gives a clear speed and security improvement. Siphash is both
faster and is more solid crypto than the aging MD5.

Rather than manually filling MD5 buffers, for IPv6, we simply create
a layout by a simple anonymous struct, for which gcc generates
rather efficient code. For IPv4, we pass the values directly to the
short input convenience functions.

64-bit x86_64:
[1.683628] secure_tcpv6_sequence_number_md5# cycles: 99563527
[1.717350] secure_tcp_sequence_number_md5# cycles: 92890502
[1.741968] secure_tcpv6_sequence_number_siphash# cycles: 67825362
[1.762048] secure_tcp_sequence_number_siphash# cycles: 67485526

32-bit x86:
[1.600012] secure_tcpv6_sequence_number_md5# cycles: 103227892
[1.634219] secure_tcp_sequence_number_md5# cycles: 94732544
[1.669102] secure_tcpv6_sequence_number_siphash# cycles: 96299384
[1.700165] secure_tcp_sequence_number_siphash# cycles: 86015473
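
To put the cycle counts above in relative terms, here is a small sketch that
divides the MD5 figure by the SipHash figure for each function. The struct
and function names are hypothetical; only the numbers come from the runs
listed above. The ratios work out to roughly 1.47x and 1.38x on x86_64, and
roughly 1.07x and 1.10x on 32-bit x86.

```c
#include <assert.h>
#include <stdint.h>

/* One benchmark row: cycle counts copied from the commit message. */
struct seq_bench {
	const char *name;
	uint64_t md5_cycles;
	uint64_t siphash_cycles;
};

static const struct seq_bench seq_benches[] = {
	{ "x86_64 tcpv6", 99563527,  67825362 },
	{ "x86_64 tcp",   92890502,  67485526 },
	{ "x86 tcpv6",    103227892, 96299384 },
	{ "x86 tcp",      94732544,  86015473 },
};

/* Ratio of MD5 cycles to SipHash cycles; above 1.0 means SipHash is faster. */
static double seq_speedup(const struct seq_bench *b)
{
	assert(b->siphash_cycles != 0);
	return (double)b->md5_cycles / (double)b->siphash_cycles;
}
```

Calling seq_speedup(&seq_benches[i]) for each row gives the factors quoted
above; the gain is clearly larger on the 64-bit machine.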

Signed-off-by: Jason A. Donenfeld 
Cc: Andi Kleen 
Cc: David Miller 
Cc: David Laight 
Cc: Tom Herbert 
Cc: Hannes Frederic Sowa 
Cc: Eric Dumazet 
---
 net/core/secure_seq.c | 145 ++
 1 file changed, 63 insertions(+), 82 deletions(-)

diff --git a/net/core/secure_seq.c b/net/core/secure_seq.c
index 88a8e429fc3e..3a9fcec94ace 100644
--- a/net/core/secure_seq.c
+++ b/net/core/secure_seq.c
@@ -1,3 +1,7 @@
+/*
+ * Copyright (C) 2016 Jason A. Donenfeld . All Rights Reserved.
+ */
+
 #include 
 #include 
 #include 
@@ -8,18 +12,18 @@
 #include 
 #include 
 #include 
-
+#include 
 #include 
 
 #if IS_ENABLED(CONFIG_IPV6) || IS_ENABLED(CONFIG_INET)
+#include 
 #include 
-#define NET_SECRET_SIZE (MD5_MESSAGE_BYTES / 4)
 
-static u32 net_secret[NET_SECRET_SIZE] cacheline_aligned;
+static siphash_key_t net_secret __read_mostly;
 
 static __always_inline void net_secret_init(void)
 {
-   net_get_random_once(net_secret, sizeof(net_secret));
+   net_get_random_once(&net_secret, sizeof(net_secret));
 }
 #endif
 
@@ -44,80 +48,70 @@ static u32 seq_scale(u32 seq)
 u32 secure_tcpv6_sequence_number(const __be32 *saddr, const __be32 *daddr,
 __be16 sport, __be16 dport, u32 *tsoff)
 {
-   u32 secret[MD5_MESSAGE_BYTES / 4];
-   u32 hash[MD5_DIGEST_WORDS];
-   u32 i;
-
+   const struct {
+   struct in6_addr saddr;
+   struct in6_addr daddr;
+   __be16 sport;
+   __be16 dport;
+   } __aligned(SIPHASH_ALIGNMENT) combined = {
+   .saddr = *(struct in6_addr *)saddr,
+   .daddr = *(struct in6_addr *)daddr,
+   .sport = sport,
+   .dport = dport
+   };
+   u64 hash;
net_secret_init();
-   memcpy(hash, saddr, 16);
-   for (i = 0; i < 4; i++)
-   secret[i] = net_secret[i] + (__force u32)daddr[i];
-   secret[4] = net_secret[4] +
-   (((__force u16)sport << 16) + (__force u16)dport);
-   for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-   secret[i] = net_secret[i];
-
-   md5_transform(hash, secret);
-
-   *tsoff = sysctl_tcp_timestamps == 1 ? hash[1] : 0;
-   return seq_scale(hash[0]);
+   hash = siphash(&combined, offsetofend(typeof(combined), dport),
+  &net_secret);
+   *tsoff = sysctl_tcp_timestamps == 1 ? (hash >> 32) : 0;
+   return seq_scale(hash);
 }
 EXPORT_SYMBOL(secure_tcpv6_sequence_number);
 
 u32 secure_ipv6_port_ephemeral(const __be32 *saddr, const __be32 *daddr,
   __be16 dport)
 {
-   u32 secret[MD5_MESSAGE_BYTES / 4];
-   u32 hash[MD5_DIGEST_WORDS];
-   u32 i;
-
+   const struct {
+   struct in6_addr saddr;
+   struct in6_addr daddr;
+   __be16 dport;
+   } __aligned(SIPHASH_ALIGNMENT) combined = {
+   .saddr = *(struct in6_addr *)saddr,
+   .daddr = *(struct in6_addr *)daddr,
+   .dport = dport
+   };
net_secret_init();
-   memcpy(hash, saddr, 16);
-   for (i = 0; i < 4; i++)
-   secret[i] = net_secret[i] + (__force u32) daddr[i];
-   secret[4] = net_secret[4] + (__force u32)dport;
-   for (i = 5; i < MD5_MESSAGE_BYTES / 4; i++)
-   secret[i] = net_secret[i];
-
-   md5_transform(hash, secret);
-
-   return hash[0];
+   return siphash(&combined, offsetofend(typeof(combined), dport),
+  &net_secret);
 }
 EXPORT_SYMBOL(secure_ipv6_port_ephemeral);
 #endif
 
 #ifdef CONFIG_INET
 
+/* secure_tcp_sequence_number(a, b, 0, d) == secure_ipv4_port_ephemeral(a, b, d),
+ * but fortunately, `sport' cannot be 0 in any circumstances. If this changes,
+ * it would be easy enough to have the former function use siphash_4u32, passing
+ * the arguments as separate u32.
+ */
+
 u32 secure_tcp_sequence_number(__be32 saddr, __be32 daddr,
   __be16 sport, __be16 dport, u32 *tsoff)
 {
-   u32 h

[PATCH v3 net-next 4/4] syncookies: use SipHash in place of SHA1

2017-01-08 Thread Jason A. Donenfeld

SHA1 is slower and less secure than SipHash, and so replacing syncookie
generation with SipHash makes natural sense. Some BSDs have been doing
this for several years in fact.

The speedup should be similar to, and even more impressive than, the
speedup from the sequence number fix in this series.

Signed-off-by: Jason A. Donenfeld 
Cc: Eric Dumazet 
Cc: David Miller 
---
 net/ipv4/syncookies.c | 21 +
 net/ipv6/syncookies.c | 41 +++--
 2 files changed, 24 insertions(+), 38 deletions(-)

diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index 3e88467d70ee..496b97e17aaf 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -13,13 +13,13 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
 #include 
 
-static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
+static siphash_key_t syncookie_secret[2] __read_mostly;
 
 #define COOKIEBITS 24  /* Upper bits store count */
 #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
@@ -48,24 +48,13 @@ static u32 syncookie_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
 #define TSBITS 6
 #define TSMASK (((__u32)1 << TSBITS) - 1)
 
-static DEFINE_PER_CPU(__u32 [16 + 5 + SHA_WORKSPACE_WORDS], ipv4_cookie_scratch);
-
 static u32 cookie_hash(__be32 saddr, __be32 daddr, __be16 sport, __be16 dport,
   u32 count, int c)
 {
-   __u32 *tmp;
-
net_get_random_once(syncookie_secret, sizeof(syncookie_secret));
-
-   tmp  = this_cpu_ptr(ipv4_cookie_scratch);
-   memcpy(tmp + 4, syncookie_secret[c], sizeof(syncookie_secret[c]));
-   tmp[0] = (__force u32)saddr;
-   tmp[1] = (__force u32)daddr;
-   tmp[2] = ((__force u32)sport << 16) + (__force u32)dport;
-   tmp[3] = count;
-   sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);
-
-   return tmp[17];
+   return siphash_4u32((__force u32)saddr, (__force u32)daddr,
+   (__force u32)sport << 16 | (__force u32)dport,
+   count, &syncookie_secret[c]);
 }
 
 
diff --git a/net/ipv6/syncookies.c b/net/ipv6/syncookies.c
index a4d49760bf43..895ff650db43 100644
--- a/net/ipv6/syncookies.c
+++ b/net/ipv6/syncookies.c
@@ -16,7 +16,7 @@
 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -24,7 +24,7 @@
 #define COOKIEBITS 24  /* Upper bits store count */
 #define COOKIEMASK (((__u32)1 << COOKIEBITS) - 1)
 
-static u32 syncookie6_secret[2][16-4+SHA_DIGEST_WORDS] __read_mostly;
+static siphash_key_t syncookie6_secret[2] __read_mostly;
 
 /* RFC 2460, Section 8.3:
  * [ipv6 tcp] MSS must be computed as the maximum packet size minus 60 [..]
@@ -41,30 +41,27 @@ static __u16 const msstab[] = {
9000 - 60,
 };
 
-static DEFINE_PER_CPU(__u32 [16 + 5 + SHA_WORKSPACE_WORDS], ipv6_cookie_scratch);
-
-static u32 cookie_hash(const struct in6_addr *saddr, const struct in6_addr *daddr,
+static u32 cookie_hash(const struct in6_addr *saddr,
+  const struct in6_addr *daddr,
   __be16 sport, __be16 dport, u32 count, int c)
 {
-   __u32 *tmp;
+   const struct {
+   struct in6_addr saddr;
+   struct in6_addr daddr;
+   u32 count;
+   __be16 sport;
+   __be16 dport;
+   } __aligned(SIPHASH_ALIGNMENT) combined = {
+   .saddr = *saddr,
+   .daddr = *daddr,
+   .count = count,
+   .sport = sport,
+   .dport = dport
+   };
 
net_get_random_once(syncookie6_secret, sizeof(syncookie6_secret));
-
-   tmp  = this_cpu_ptr(ipv6_cookie_scratch);
-
-   /*
-* we have 320 bits of information to hash, copy in the remaining
-* 192 bits required for sha_transform, from the syncookie6_secret
-* and overwrite the digest with the secret
-*/
-   memcpy(tmp + 10, syncookie6_secret[c], 44);
-   memcpy(tmp, saddr, 16);
-   memcpy(tmp + 4, daddr, 16);
-   tmp[8] = ((__force u32)sport << 16) + (__force u32)dport;
-   tmp[9] = count;
-   sha_transform(tmp + 16, (__u8 *)tmp, tmp + 16 + 5);
-
-   return tmp[17];
+   return siphash(&combined, offsetofend(typeof(combined), dport),
+  &syncookie6_secret[c]);
 }
 
 static __u32 secure_tcp_syn_cookie(const struct in6_addr *saddr,
-- 
2.11.0

[PATCH v3 net-next 0/4] Introduce The SipHash PRF

2017-01-08 Thread Jason A. Donenfeld

This patch series introduces SipHash into the kernel. SipHash is a
cryptographically secure PRF, which serves a variety of functions, and is
introduced in patch #1. The following patch #2 introduces HalfSipHash,
an optimization suitable for hash tables only. Finally, the last two patches
in this series show two usages of the introduced siphash function family.
It is expected that after this initial introduction, other usages will follow.

Please read the extensive descriptions in patch #1 and patch #2 of what these
functions do and the various levels of assurances. They're products of intense
cryptographic research, and I believe they're suitable for the uses outlined
herein.

The use of SipHash is not limited to the networking subsystem -- indeed I
would like to use it in other places too in the kernel. But after discussing
with a few on this list and at Linus' suggestion, the initial import of these
functions is coming through the networking tree. After these are merged, it
will then be easier to expand use elsewhere.

Changes v2->v3:
  - hsiphash keys now simply use an unsigned long, in order to avoid
a cluttered ifdef and make it a bit more clear what's happening.
  - A typo in the documentation has been fixed.
  - The documentation has been augmented with an example relating to struct
packing and passing.
  - The net_secret variable is now __read_mostly.

Hopefully this is the last of the required revisions, and v3 can be merged
into net-next.

Jason A. Donenfeld (4):
  siphash: add cryptographically secure PRF
  siphash: implement HalfSipHash1-3 for hash tables
  secure_seq: use SipHash in place of MD5
  syncookies: use SipHash in place of SHA1

 Documentation/siphash.txt | 175 +++
 MAINTAINERS   |   7 +
 include/linux/siphash.h   | 140 
 lib/Kconfig.debug |   6 +-
 lib/Makefile  |   5 +-
 lib/siphash.c | 551 ++
 lib/test_siphash.c| 223 +++
 net/core/secure_seq.c | 145 ++--
 net/ipv4/syncookies.c |  21 +-
 net/ipv6/syncookies.c |  41 ++--
 10 files changed, 1189 insertions(+), 125 deletions(-)
 create mode 100644 Documentation/siphash.txt
 create mode 100644 include/linux/siphash.h
 create mode 100644 lib/siphash.c
 create mode 100644 lib/test_siphash.c

-- 
2.11.0

[PATCH v3 net-next 2/4] siphash: implement HalfSipHash1-3 for hash tables

2017-01-08 Thread Jason A. Donenfeld

HalfSipHash, or hsiphash, is a shortened version of SipHash, which
generates 32-bit outputs using a weaker 64-bit key. It has *much* lower
security margins, and shouldn't be used for anything too sensitive, but
it could be used as a hashtable key function replacement, if the output
is never exposed, and if the security requirement is not too high.

The goal is to make this something that performance-critical jhash users
would be willing to use.

On 64-bit machines, HalfSipHash1-3 is slower than SipHash1-3, so we alias
SipHash1-3 to HalfSipHash1-3 on those systems.

64-bit x86_64:
[0.509409] test_siphash: SipHash2-4 cycles: 4049181
[0.510650] test_siphash: SipHash1-3 cycles: 2512884
[0.512205] test_siphash: HalfSipHash1-3 cycles: 3429920
[0.512904] test_siphash:JenkinsHash cycles:  978267
So, we map hsiphash() -> SipHash1-3

32-bit x86:
[0.509868] test_siphash: SipHash2-4 cycles: 14812892
[0.513601] test_siphash: SipHash1-3 cycles:  9510710
[0.515263] test_siphash: HalfSipHash1-3 cycles:  3856157
[0.515952] test_siphash:JenkinsHash cycles:  1148567
So, we map hsiphash() -> HalfSipHash1-3

hsiphash() is roughly 3 times slower than jhash(), but comes with a
considerable security improvement.
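
The mapping and the "roughly 3 times" figure can be checked against the
numbers above with a short sketch. The struct and helper names are
hypothetical, and picking the faster variant at run time only mirrors the
reasoning; the patch itself makes this choice at compile time. With these
figures the slowdown relative to jhash comes out at about 2.6x on x86_64
(SipHash1-3) and about 3.4x on 32-bit x86 (HalfSipHash1-3).

```c
#include <assert.h>
#include <stdint.h>

/* Cycle counts copied from the test_siphash output above. */
struct hash_bench {
	uint64_t siphash13;     /* SipHash1-3 */
	uint64_t halfsiphash13; /* HalfSipHash1-3 */
	uint64_t jhash;         /* JenkinsHash */
};

static const struct hash_bench bench_x86_64 = { 2512884, 3429920, 978267 };
static const struct hash_bench bench_x86_32 = { 9510710, 3856157, 1148567 };

/* Cycles of the variant hsiphash() maps to: whichever of the two is faster. */
static uint64_t hsiphash_cycles(const struct hash_bench *b)
{
	return b->siphash13 < b->halfsiphash13 ? b->siphash13 : b->halfsiphash13;
}

/* How many times slower hsiphash() is than jhash on this machine. */
static double slowdown_vs_jhash(const struct hash_bench *b)
{
	assert(b->jhash != 0);
	return (double)hsiphash_cycles(b) / (double)b->jhash;
}
```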

Signed-off-by: Jason A. Donenfeld 
Reviewed-by: Jean-Philippe Aumasson 
---
 Documentation/siphash.txt |  75 +++
 include/linux/siphash.h   |  57 +++-
 lib/siphash.c | 321 +-
 lib/test_siphash.c|  98 +-
 4 files changed, 546 insertions(+), 5 deletions(-)

diff --git a/Documentation/siphash.txt b/Documentation/siphash.txt
index e8e6ddbbaab4..908d348ff777 100644
--- a/Documentation/siphash.txt
+++ b/Documentation/siphash.txt
@@ -98,3 +98,78 @@ u64 h = siphash(&combined, offsetofend(typeof(combined), dport), &secret);
 
 Read the SipHash paper if you're interested in learning more:
 https://131002.net/siphash/siphash.pdf
+
+
+~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~=~
+
+HalfSipHash - SipHash's insecure younger cousin
+---
+Written by Jason A. Donenfeld 
+
+On the off-chance that SipHash is not fast enough for your needs, you might be
+able to justify using HalfSipHash, a terrifying but potentially useful
+possibility. HalfSipHash cuts SipHash's rounds down from "2-4" to "1-3" and,
+even scarier, uses an easily brute-forceable 64-bit key (with a 32-bit output)
+instead of SipHash's 128-bit key. However, this may appeal to some
+high-performance `jhash` users.
+
+Danger!
+
+Do not ever use HalfSipHash except as a hashtable key function, and only
+then when you can be absolutely certain that the outputs will never be
+transmitted out of the kernel. This is only remotely useful over `jhash` as a
+means of mitigating hashtable flooding denial of service attacks.
+
+1. Generating a key
+
+Keys should always be generated from a cryptographically secure source of
+random numbers, either using get_random_bytes or get_random_once:
+
+hsiphash_key_t key;
+get_random_bytes(&key, sizeof(key));
+
+If you're not deriving your key from here, you're doing it wrong.
+
+2. Using the functions
+
+There are two variants of the function, one that takes a list of integers, and
+one that takes a buffer:
+
+u32 hsiphash(const void *data, size_t len, const hsiphash_key_t *key);
+
+And:
+
+u32 hsiphash_1u32(u32, const hsiphash_key_t *key);
+u32 hsiphash_2u32(u32, u32, const hsiphash_key_t *key);
+u32 hsiphash_3u32(u32, u32, u32, const hsiphash_key_t *key);
+u32 hsiphash_4u32(u32, u32, u32, u32, const hsiphash_key_t *key);
+
+If you pass the generic hsiphash function something of a constant length, it
+will constant fold at compile-time and automatically choose one of the
+optimized functions.
+
+3. Hashtable key function usage:
+
+struct some_hashtable {
+   DECLARE_HASHTABLE(hashtable, 8);
+   hsiphash_key_t key;
+};
+
+void init_hashtable(struct some_hashtable *table)
+{
+   get_random_bytes(&table->key, sizeof(table->key));
+}
+
+static inline struct hlist_head *some_hashtable_bucket(struct some_hashtable *table, struct interesting_input *input)
+{
+   return &table->hashtable[hsiphash(input, sizeof(*input), &table->key) & (HASH_SIZE(table->hashtable) - 1)];
+}
+
+You may then iterate like usual over the returned hash bucket.
+
+4. Performance
+
+HalfSipHash is roughly 3 times slower than JenkinsHash. For many replacements,
+this will not be a problem, as the hashtable lookup isn't the bottleneck. And
+in general, this is probably a good sacrifice to make for the security and DoS
+resistance of HalfSipHash.
diff --git a/include/linux/siphash.h b/include/linux/siphash.h
index feeb29cd113e..fa7a6b9cedbf 100644
--- a/include/linux/siphash.h
+++ b/include/linux/siphash.h
@@ -5,7 +5,9 @@
  * SipHash: a fast short-input PRF
  * https://131002.net/siphash/
  *
- * This implementation is specifically for SipHash2-4.
+ * This imp

[PATCH] net: ethernet: ti: cpsw: remove dual check from common res usage function

2017-01-08 Thread Ivan Khoronzhuk

Common resources can be in use only while an interface is running. In
non-dual-EMAC (switch) mode there can be only one interface, so only one
interface can be opened; thus when ndo_open is called no interface is
running yet and no common resources are in use. So remove the dual EMAC
check: it simplifies the code and makes the function match its name.

Signed-off-by: Ivan Khoronzhuk 
---

Based on linux-next/master

 drivers/net/ethernet/ti/cpsw.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index b203143..91684f1 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1235,9 +1235,6 @@ static int cpsw_common_res_usage_state(struct cpsw_common *cpsw)
u32 i;
u32 usage_count = 0;
 
-   if (!cpsw->data.dual_emac)
-   return 0;
-
for (i = 0; i < cpsw->data.slaves; i++)
if (cpsw->slaves[i].open_stat)
usage_count++;
-- 
2.7.4

[PATCH stable 4.1] openvswitch: gre: filter gre packets

2017-01-08 Thread Pravin B Shelar

OVS can only process L2 packets, but the OVS GRE receive handler
can accept IP-GRE packets. When such a packet is processed by the
OVS datapath, it can trigger the following assert failure due
to insufficient linear data in the skb. This patch filters
received packets to avoid the issue.

[68240.441681] [ cut here ]
[68240.496918] kernel BUG at /build/linux-lts-trusty-D60X6T/linux-lts-trusty-3.13.0/include/linux/skbuff.h:1486!
[68240.615520] invalid opcode:  [#1] SMP
[68241.953939] RIP: [] __skb_pull.part.7+0x4/0x6 [openvswitch]
[68243.099945] Call Trace:
[68243.129188]  
[68243.152204]  [] ovs_flow_extract+0x664/0x720 [openvswitch]
[68243.314912]  [] ovs_dp_process_received_packet+0x60/0x130 [openvswitch]
[68243.481559]  [] ovs_vport_receive+0x2a/0x30 [openvswitch]
[68243.564884]  [] gre_rcv+0xa4/0xb8 [openvswitch]
[68243.637802]  [] gre_cisco_rcv+0x75/0xbc [gre]
[68243.708621]  [] gre_rcv+0x65/0x90 [gre]
[68243.773214]  [] ip_local_deliver_finish+0xa8/0x220
[68243.849244]  [] ip_local_deliver+0x4b/0x90
[68243.916951]  [] ip_rcv_finish+0x121/0x380
[68243.983627]  [] ip_rcv+0x286/0x380
[68244.043023]  [] __netif_receive_skb_core+0x61a/0x760
[68244.121122]  [] __netif_receive_skb+0x21/0x70
[68244.191942]  [] process_backlog+0xb1/0x190
[68244.259642]  [] net_rx_action+0x139/0x280
[68244.326305]  [] __do_softirq+0xed/0x360
[68244.390887]  [] irq_exit+0x11e/0x140
[68244.452358]  [] do_IRQ+0x63/0xe0
[68244.509674]  [] common_interrupt+0x6d/0x6d
[68245.392237] RIP  [] __skb_pull.part.7+0x4/0x6 [openvswitch]
[68245.520082] ---[ end trace 383bac9f3e676970 ]---
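
The filter described above amounts to a single comparison on the GRE
protocol field. A minimal userspace sketch, assuming a hypothetical
struct tnl_info that carries the protocol in network byte order and a
hypothetical verdict enum (the kernel uses its own types and return codes),
could look like this. The two protocol values are the well-known EtherType
numbers for Transparent Ethernet Bridging and IPv4.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PROTO_TEB  0x6558 /* Transparent Ethernet Bridging (ETH_P_TEB) */
#define PROTO_IPV4 0x0800 /* IPv4 (ETH_P_IP) */

/* Hypothetical verdicts; stand-ins for the kernel's return codes. */
enum pkt_verdict { PKT_ACCEPT, PKT_REJECT };

/* Hypothetical, trimmed-down tunnel info: protocol in network byte order. */
struct tnl_info {
	uint16_t proto;
};

/* Accept only GRE packets that carry an Ethernet (L2) frame. */
static enum pkt_verdict gre_rcv_filter(const struct tnl_info *tpi)
{
	assert(tpi != NULL);
	if (tpi->proto != htons(PROTO_TEB))
		return PKT_REJECT;
	return PKT_ACCEPT;
}
```

An IP-GRE packet (proto 0x0800) is rejected before it reaches the datapath,
so flow extraction never tries to pull an Ethernet header that is not there.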

Fixes: aa310701e7 ("openvswitch: Add gre tunnel support.")
Reported-by: Uri Foox 
CC: Joe Stringer 
Signed-off-by: Pravin B Shelar 
---
Newer OVS GRE vport uses LWT interface which does not have this issue.
---
 net/openvswitch/vport-gre.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/openvswitch/vport-gre.c b/net/openvswitch/vport-gre.c
index f17ac96..de67fd1 100644
--- a/net/openvswitch/vport-gre.c
+++ b/net/openvswitch/vport-gre.c
@@ -102,6 +102,9 @@ static int gre_rcv(struct sk_buff *skb,
struct vport *vport;
__be64 key;
 
+   if (tpi->proto != htons(ETH_P_TEB))
+   return PACKET_REJECT;
+
ovs_net = net_generic(dev_net(skb->dev), ovs_net_id);
vport = rcu_dereference(ovs_net->vport_net.gre_vport);
if (unlikely(!vport))
-- 
2.9.3

[for-next V2 04/10] IB/mlx5: Fix retrieval of index to first hi class bfreg

2017-01-08 Thread Saeed Mahameed

From: Eli Cohen 

Fix the function retrieving the index of the first hi latency class
blue flame register. High latency class bfregs are located right above
medium latency class bfregs.

Fixes: c1be5232d21d ('IB/mlx5: Fix micro UAR allocator')
Signed-off-by: Eli Cohen 
Reviewed-by: Matan Barak 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/qp.c | 24 ++--
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index fbea9bd..240fbb0 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -490,12 +490,21 @@ static int next_bfreg(int n)
return n;
 }
 
+enum {
+   /* this is the first blue flame register in the array of bfregs assigned
+* to a process. Since we do not use it for blue flame but rather
+* regular 64 bit doorbells, we do not need a lock for maintaining
+* "odd/even" order
+*/
+   NUM_NON_BLUE_FLAME_BFREGS = 1,
+};
+
 static int num_med_bfreg(struct mlx5_bfreg_info *bfregi)
 {
int n;
 
n = bfregi->num_uars * MLX5_NON_FP_BFREGS_PER_UAR -
-   bfregi->num_low_latency_bfregs - 1;
+   bfregi->num_low_latency_bfregs - NUM_NON_BLUE_FLAME_BFREGS;
 
return n >= 0 ? n : 0;
 }
@@ -508,17 +517,9 @@ static int max_bfregi(struct mlx5_bfreg_info *bfregi)
 static int first_hi_bfreg(struct mlx5_bfreg_info *bfregi)
 {
int med;
-   int i;
-   int t;
 
med = num_med_bfreg(bfregi);
-   for (t = 0, i = first_med_bfreg();; i = next_bfreg(i)) {
-   t++;
-   if (t == med)
-   return next_bfreg(i);
-   }
-
-   return 0;
+   return next_bfreg(med);
 }
 
 static int alloc_high_class_bfreg(struct mlx5_bfreg_info *bfregi)
@@ -544,6 +545,8 @@ static int alloc_med_class_bfreg(struct mlx5_bfreg_info *bfregi)
for (i = first_med_bfreg(); i < first_hi_bfreg(bfregi); i = next_bfreg(i)) {
if (bfregi->count[i] < bfregi->count[minidx])
minidx = i;
+   if (!bfregi->count[minidx])
+   break;
}
 
bfregi->count[minidx]++;
@@ -558,6 +561,7 @@ static int alloc_bfreg(struct mlx5_bfreg_info *bfregi,
mutex_lock(&bfregi->lock);
switch (lat) {
case MLX5_IB_LATENCY_CLASS_LOW:
+   BUILD_BUG_ON(NUM_NON_BLUE_FLAME_BFREGS != 1);
bfregn = 0;
bfregi->count[bfregn]++;
break;
-- 
2.7.4

[for-next V2 01/10] IB/mlx5: Fix kernel to user leak prevention logic

2017-01-08 Thread Saeed Mahameed

From: Eli Cohen 

The logic was broken as it failed to update the response length for
architectures with PAGE_SIZE larger than 4kB. As a result further
extension of the ucontext response struct would fail.

Fixes: d69e3bcf7976 ('IB/mlx5: Mmap the HCA's core clock register to user-space')
Signed-off-by: Eli Cohen 
Reviewed-by: Matan Barak 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/main.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 86c61e7..852b5b7 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -1148,13 +1148,13 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 * pretend we don't support reading the HCA's core clock. This is also
 * forced by mmap function.
 */
-   if (PAGE_SIZE <= 4096 &&
-   field_avail(typeof(resp), hca_core_clock_offset, udata->outlen)) {
-   resp.comp_mask |=
-   MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET;
-   resp.hca_core_clock_offset =
-   offsetof(struct mlx5_init_seg, internal_timer_h) %
-   PAGE_SIZE;
+   if (field_avail(typeof(resp), hca_core_clock_offset, udata->outlen)) {
+   if (PAGE_SIZE <= 4096) {
+   resp.comp_mask |=
+   MLX5_IB_ALLOC_UCONTEXT_RESP_MASK_CORE_CLOCK_OFFSET;
+   resp.hca_core_clock_offset =
+   offsetof(struct mlx5_init_seg, internal_timer_h) % PAGE_SIZE;
+   }
resp.response_length += sizeof(resp.hca_core_clock_offset) +
sizeof(resp.reserved2);
}
-- 
2.7.4

[for-next V2 02/10] IB/mlx5: Fix error handling order in create_kernel_qp

2017-01-08 Thread Saeed Mahameed

From: Eli Cohen 

Make sure order of cleanup is exactly the opposite of initialization.

Fixes: 9603b61de1ee ('mlx5: Move pci device handling from mlx5_ib to mlx5_core')
Signed-off-by: Eli Cohen 
Reviewed-by: Matan Barak 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/qp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c
index 53f4dd3..42d021cd 100644
--- a/drivers/infiniband/hw/mlx5/qp.c
+++ b/drivers/infiniband/hw/mlx5/qp.c
@@ -994,12 +994,12 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
return 0;
 
 err_wrid:
-   mlx5_db_free(dev->mdev, &qp->db);
kfree(qp->sq.wqe_head);
kfree(qp->sq.w_list);
kfree(qp->sq.wrid);
kfree(qp->sq.wr_data);
kfree(qp->rq.wrid);
+   mlx5_db_free(dev->mdev, &qp->db);
 
 err_free:
kvfree(*in);
@@ -1014,12 +1014,12 @@ static int create_kernel_qp(struct mlx5_ib_dev *dev,
 
 static void destroy_qp_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_qp *qp)
 {
-   mlx5_db_free(dev->mdev, &qp->db);
kfree(qp->sq.wqe_head);
kfree(qp->sq.w_list);
kfree(qp->sq.wrid);
kfree(qp->sq.wr_data);
kfree(qp->rq.wrid);
+   mlx5_db_free(dev->mdev, &qp->db);
mlx5_buf_free(dev->mdev, &qp->buf);
free_uuar(&dev->mdev->priv.uuari, qp->bf->uuarn);
 }
-- 
2.7.4

[for-next V2 00/10][pull request] Mellanox 100G mlx5 4K UAR support

2017-01-08 Thread Saeed Mahameed

Hi Dave and Doug,

Following the mlx5-odp submission, you can find here the 2nd mlx5
submission for 4.11 as a pull-request including mlx5 4K UAR support from
Eli Cohen (details below).  For you Doug, this pull request will provide 
you with both mlx5 odp and mlx5 4k UAR since it is based on Dave's
net-next mlx5-odp merge commit.

v1->v2:
  - Removed 64BIT arch dependency.

Thank you,
Saeed.

---

The following changes since commit 525dfa2cdce4f5ab76251b5e57ebabf4f2dfc40c:

  Merge branch 'mlx5-odp' (2017-01-02 15:51:21 -0500)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git tags/mlx5-4kuar-for-4.11

for you to fetch changes up to ca704520bd370758aec9d70afeeecc9d643fe132:

  net/mlx5: Activate support for 4K UARs (2017-01-08 11:21:27 +0200)


mlx5 4K UAR

The following series of patches optimizes the usage of the UAR area, which is
contained within BAR 0-1. Previous versions of the firmware and the driver
assumed that each system page contains a single UAR. This patch set queries the
firmware for a new capability that, if published, means the firmware can
support UARs of a fixed 4K size regardless of the system page size. On
powerpc, where the page size is 64KB, this means we can utilize 16 UARs per
system page. Since user-space processes consume eight UARs per context by
default, with this change a process needs only a single system page to fulfill
that requirement, and can in fact make use of more UARs, which is
better in terms of performance.
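
For illustration only (not part of the series), the arithmetic above can be
sketched in plain C. The helper names are hypothetical, and the sketch assumes
the number of UARs per page is simply the page size divided by 4K, whereas the
driver takes that value from a firmware capability:

```c
#include <assert.h>
#include <stddef.h>

/* Size of a fixed-size UAR under the 4K scheme. */
#define UAR_4K_SIZE 4096u

/* Hypothetical helper: how many UARs fit in one system page.
 * Without the 4K capability each system page holds exactly one UAR. */
static unsigned int uars_per_page(size_t page_size, int fw_uar_4k)
{
	assert(page_size >= UAR_4K_SIZE && page_size % UAR_4K_SIZE == 0);
	return fw_uar_4k ? (unsigned int)(page_size / UAR_4K_SIZE) : 1u;
}

/* Hypothetical helper: system pages needed to provide `uars` UARs,
 * rounding up to whole pages. */
static unsigned int pages_for_uars(unsigned int uars, size_t page_size,
				   int fw_uar_4k)
{
	unsigned int per_page = uars_per_page(page_size, fw_uar_4k);

	return (uars + per_page - 1) / per_page;
}
```

With a 64KB page and the capability set, eight UARs fit in one system page;
without it, the same request takes eight system pages.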

In addition to optimizing user-space processes, we introduce an allocator
that kernel consumers can use to allocate blue flame registers
(areas within a UAR that are used to write doorbells). This further
optimizes the use of the UAR area: the Ethernet driver used to make
use of a single blue flame register per system page, and now it will use two
blue flame registers per 4K.

The series also changes naming conventions so that the terms used in
the driver code match the terms used in the PRM (Programmer's Reference Manual).
Thus, what used to be called UUAR (micro UAR) is now called BFREG (blue flame
register).

To support compatibility between different versions of the
library/driver/firmware, the library now has a means to notify the kernel driver
that it supports the new scheme, and the kernel can notify the library whether it
supports this extension. Mixed versions of libraries can therefore run concurrently
without any issues.
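
As a rough sketch of that negotiation (an illustration under stated
assumptions, not the driver code): the capability bit value and the helper
name below are made up, and the sketch assumes several UARs per system page
are used for a context only when both the library and the firmware support
the 4K scheme.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bit. The series names the flag
 * MLX5_LIB_CAP_4K_UAR, but its numeric value is not shown in this
 * thread, so this one is invented for the sketch. */
#define SKETCH_LIB_CAP_4K_UAR (UINT64_C(1) << 0)

/* Hypothetical helper: UARs used per system page for one user context.
 * An old library (bit clear) or old firmware keeps one UAR per page. */
static unsigned int ctx_uars_per_sys_page(uint64_t lib_caps, bool fw_uar_4k,
					   unsigned int fw_uars_per_page)
{
	bool lib_uar_4k = (lib_caps & SKETCH_LIB_CAP_4K_UAR) != 0;

	assert(fw_uars_per_page >= 1);
	return (lib_uar_4k && fw_uar_4k) ? fw_uars_per_page : 1;
}
```

Because the decision is made per context, an old and a new library can share
the same device in this model.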

Thanks,
Eli and Matan


Eli Cohen (10):
  IB/mlx5: Fix kernel to user leak prevention logic
  IB/mlx5: Fix error handling order in create_kernel_qp
  mlx5: Fix naming convention with respect to UARs
  IB/mlx5: Fix retrieval of index to first hi class bfreg
  net/mlx5: Introduce blue flame register allocator
  net/mlx5: Add interface to get reference to a UAR
  IB/mlx5: Use blue flame register allocator in mlx5_ib
  IB/mlx5: Allow future extension of libmlx5 input data
  IB/mlx5: Support 4k UAR for libmlx5
  net/mlx5: Activate support for 4K UARs

 drivers/infiniband/hw/mlx5/cq.c                   |  10 +-
 drivers/infiniband/hw/mlx5/main.c                 | 278 ++--
 drivers/infiniband/hw/mlx5/mlx5_ib.h              |  32 +-
 drivers/infiniband/hw/mlx5/qp.c                   | 290 +++--
 drivers/net/ethernet/mellanox/mlx5/core/cq.c      |   2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  11 +-
 .../net/ethernet/mellanox/mlx5/core/en_common.c   |  12 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |  21 +-
 drivers/net/ethernet/mellanox/mlx5/core/eq.c      |  14 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c    |  26 +-
 drivers/net/ethernet/mellanox/mlx5/core/uar.c     | 351 +
 include/linux/mlx5/cq.h                           |   5 +-
 include/linux/mlx5/device.h                       |  23 +-
 include/linux/mlx5/doorbell.h                     |   6 +-
 include/linux/mlx5/driver.h                       |  81 ++---
 include/linux/mlx5/mlx5_ifc.h                     |   7 +-
 include/uapi/rdma/mlx5-abi.h                      |  19 +-
 17 files changed, 672 insertions(+), 516 deletions(-)

-- 
2.7.4

[for-next V2 10/10] net/mlx5: Activate support for 4K UARs

2017-01-08 Thread Saeed Mahameed

From: Eli Cohen 

Activate 4K UAR support for firmware versions that support it.

Signed-off-by: Eli Cohen 
Reviewed-by: Matan Barak 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index ff1f144..a16ee16 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -530,6 +530,10 @@ static int handle_hca_cap(struct mlx5_core_dev *dev)
/* disable cmdif checksum */
MLX5_SET(cmd_hca_cap, set_hca_cap, cmdif_checksum, 0);
 
+   /* If the HCA supports 4K UARs use it */
+   if (MLX5_CAP_GEN_MAX(dev, uar_4k))
+   MLX5_SET(cmd_hca_cap, set_hca_cap, uar_4k, 1);
+
MLX5_SET(cmd_hca_cap, set_hca_cap, log_uar_page_sz, PAGE_SHIFT - 12);
 
err = set_caps(dev, set_ctx, set_sz,
-- 
2.7.4

[for-next V2 03/10] mlx5: Fix naming convention with respect to UARs

2017-01-08 Thread Saeed Mahameed

From: Eli Cohen 

This establishes a solid naming convention for UARs. A UAR (User Access
Region) can have a size identical to a system page or a fixed size of 4KB,
depending on a value queried from firmware. Each UAR always has four blue
flame registers, which are used to post doorbells to send queues. In
addition, a UAR has a section used for posting doorbells to CQs or EQs. In
this patch we change names to reflect this convention.
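
For illustration only, the bfreg accounting that the renamed macros express
can be sketched as below. The names are hypothetical stand-ins for the driver
macros. Four bfregs per UAR is stated above; that two of them are regular
(non fast path) is an assumption of this sketch.

```c
#include <assert.h>

#define SKETCH_BFREGS_PER_UAR        4u /* stated in the commit message */
#define SKETCH_NON_FP_BFREGS_PER_UAR 2u /* assumed value */

struct bfreg_layout {
	unsigned int total_bfregs; /* request rounded up to whole UARs */
	unsigned int num_uars;     /* UARs needed to serve the request */
	unsigned int gross_bfregs; /* all bfregs in those UARs */
};

/* Hypothetical helper: round the requested number of regular bfregs up to
 * a whole number of UARs, then count UARs and the bfregs they contain. */
static struct bfreg_layout layout_for_request(unsigned int requested)
{
	struct bfreg_layout l;
	unsigned int per_uar = SKETCH_NON_FP_BFREGS_PER_UAR;

	assert(requested > 0);
	l.total_bfregs = (requested + per_uar - 1) / per_uar * per_uar;
	l.num_uars = l.total_bfregs / per_uar;
	l.gross_bfregs = l.num_uars * SKETCH_BFREGS_PER_UAR;
	return l;
}
```

Under these assumptions a request for five bfregs is rounded up to six, which
takes three UARs holding twelve bfregs in total.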

Signed-off-by: Eli Cohen 
Reviewed-by: Matan Barak 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/cq.c                |   6 +-
 drivers/infiniband/hw/mlx5/main.c              |  80 +--
 drivers/infiniband/hw/mlx5/mlx5_ib.h           |   6 +-
 drivers/infiniband/hw/mlx5/qp.c                | 176 -
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |   8 +-
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  |  90 ++---
 include/linux/mlx5/device.h                    |   9 +-
 include/linux/mlx5/driver.h                    |  14 +-
 include/uapi/rdma/mlx5-abi.h                   |  12 +-
 10 files changed, 206 insertions(+), 203 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index b3ef47c..bb7e91c 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -689,7 +689,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
 {
struct mlx5_core_dev *mdev = to_mdev(ibcq->device)->mdev;
struct mlx5_ib_cq *cq = to_mcq(ibcq);
-   void __iomem *uar_page = mdev->priv.uuari.uars[0].map;
+   void __iomem *uar_page = mdev->priv.bfregi.uars[0].map;
unsigned long irq_flags;
int ret = 0;
 
@@ -790,7 +790,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
MLX5_SET(cqc, cqc, log_page_size,
 page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = to_mucontext(context)->uuari.uars[0].index;
+   *index = to_mucontext(context)->bfregi.uars[0].index;
 
if (ucmd.cqe_comp_en == 1) {
if (unlikely((*cqe_size != 64) ||
@@ -886,7 +886,7 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
MLX5_SET(cqc, cqc, log_page_size,
 cq->buf.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = dev->mdev->priv.uuari.uars[0].index;
+   *index = dev->mdev->priv.bfregi.uars[0].index;
 
return 0;
 
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 852b5b7..d5cf82b 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -999,12 +999,12 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
struct mlx5_ib_alloc_ucontext_req_v2 req = {};
struct mlx5_ib_alloc_ucontext_resp resp = {};
struct mlx5_ib_ucontext *context;
-   struct mlx5_uuar_info *uuari;
+   struct mlx5_bfreg_info *bfregi;
struct mlx5_uar *uars;
-   int gross_uuars;
+   int gross_bfregs;
int num_uars;
int ver;
-   int uuarn;
+   int bfregn;
int err;
int i;
size_t reqlen;
@@ -1032,10 +1032,10 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
if (req.flags)
return ERR_PTR(-EINVAL);
 
-   if (req.total_num_uuars > MLX5_MAX_UUARS)
+   if (req.total_num_bfregs > MLX5_MAX_BFREGS)
return ERR_PTR(-ENOMEM);
 
-   if (req.total_num_uuars == 0)
+   if (req.total_num_bfregs == 0)
return ERR_PTR(-EINVAL);
 
if (req.comp_mask || req.reserved0 || req.reserved1 || req.reserved2)
@@ -1046,13 +1046,13 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
 reqlen - sizeof(req)))
return ERR_PTR(-EOPNOTSUPP);
 
-   req.total_num_uuars = ALIGN(req.total_num_uuars,
-   MLX5_NON_FP_BF_REGS_PER_PAGE);
-   if (req.num_low_latency_uuars > req.total_num_uuars - 1)
+   req.total_num_bfregs = ALIGN(req.total_num_bfregs,
+   MLX5_NON_FP_BFREGS_PER_UAR);
+   if (req.num_low_latency_bfregs > req.total_num_bfregs - 1)
return ERR_PTR(-EINVAL);
 
-   num_uars = req.total_num_uuars / MLX5_NON_FP_BF_REGS_PER_PAGE;
-   gross_uuars = num_uars * MLX5_BF_REGS_PER_PAGE;
+   num_uars = req.total_num_bfregs / MLX5_NON_FP_BFREGS_PER_UAR;
+   gross_bfregs = num_uars * MLX5_BFREGS_PER_UAR;
resp.qp_tab_size = 1 << MLX5_CAP_GEN(dev->mdev, log_max_qp);
if (mlx5_core_is_pf(dev->mdev) && MLX5_CAP_GEN(dev->mdev, bf))
resp.bf_reg_size = 1 << MLX5_CAP_GEN(dev->mdev, 
log_bf_reg_size);
@@ -1072,32 +1072,33 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
if (!contex

[for-next V2 07/10] IB/mlx5: Use blue flame register allocator in mlx5_ib

2017-01-08 Thread Saeed Mahameed

From: Eli Cohen 

Make use of the blue flame register allocator in mlx5_ib. Since blue
flame was not really supported, we remove all the code related to
blue flame and let all consumers use the same blue flame register.
Once blue flame is supported we will add the code. As part of this patch
we also move the definition of struct mlx5_bf to mlx5_ib.h, as it is only
used by mlx5_ib.

Signed-off-by: Eli Cohen 
Reviewed-by: Matan Barak 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/cq.c                |   8 +-
 drivers/infiniband/hw/mlx5/main.c              |  28 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h           |  11 ++-
 drivers/infiniband/hw/mlx5/qp.c                |  73 +++-
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |   2 +-
 drivers/net/ethernet/mellanox/mlx5/core/main.c |  16 +---
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 114 -
 include/linux/mlx5/cq.h                        |   3 +-
 include/linux/mlx5/doorbell.h                  |   6 +-
 include/linux/mlx5/driver.h                    |  19 -
 10 files changed, 59 insertions(+), 221 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index bb7e91c..a28ec33 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -689,7 +689,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
 {
struct mlx5_core_dev *mdev = to_mdev(ibcq->device)->mdev;
struct mlx5_ib_cq *cq = to_mcq(ibcq);
-   void __iomem *uar_page = mdev->priv.bfregi.uars[0].map;
+   void __iomem *uar_page = mdev->priv.uar->map;
unsigned long irq_flags;
int ret = 0;
 
@@ -704,9 +704,7 @@ int mlx5_ib_arm_cq(struct ib_cq *ibcq, enum ib_cq_notify_flags flags)
mlx5_cq_arm(&cq->mcq,
(flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED ?
MLX5_CQ_DB_REQ_NOT_SOL : MLX5_CQ_DB_REQ_NOT,
-   uar_page,
-   MLX5_GET_DOORBELL_LOCK(&mdev->priv.cq_uar_lock),
-   to_mcq(ibcq)->mcq.cons_index);
+   uar_page, to_mcq(ibcq)->mcq.cons_index);
 
return ret;
 }
@@ -886,7 +884,7 @@ static int create_cq_kernel(struct mlx5_ib_dev *dev, struct mlx5_ib_cq *cq,
MLX5_SET(cqc, cqc, log_page_size,
 cq->buf.buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = dev->mdev->priv.bfregi.uars[0].index;
+   *index = dev->mdev->priv.uar->index;
 
return 0;
 
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index d5cf82b..e9f0830 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -3074,8 +3074,6 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
if (mlx5_use_mad_ifc(dev))
get_ext_port_caps(dev);
 
-   MLX5_INIT_DOORBELL_LOCK(&dev->uar_lock);
-
if (!mlx5_lag_is_active(mdev))
name = "mlx5_%d";
else
@@ -3251,9 +3249,21 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
if (err)
goto err_odp;
 
+   dev->mdev->priv.uar = mlx5_get_uars_page(dev->mdev);
+   if (!dev->mdev->priv.uar)
+   goto err_q_cnt;
+
+   err = mlx5_alloc_bfreg(dev->mdev, &dev->bfreg, false, false);
+   if (err)
+   goto err_uar_page;
+
+   err = mlx5_alloc_bfreg(dev->mdev, &dev->fp_bfreg, false, true);
+   if (err)
+   goto err_bfreg;
+
err = ib_register_device(&dev->ib_dev, NULL);
if (err)
-   goto err_q_cnt;
+   goto err_fp_bfreg;
 
err = create_umr_res(dev);
if (err)
@@ -3276,6 +3286,15 @@ static void *mlx5_ib_add(struct mlx5_core_dev *mdev)
 err_dev:
ib_unregister_device(&dev->ib_dev);
 
+err_fp_bfreg:
+   mlx5_free_bfreg(dev->mdev, &dev->fp_bfreg);
+
+err_bfreg:
+   mlx5_free_bfreg(dev->mdev, &dev->bfreg);
+
+err_uar_page:
+   mlx5_put_uars_page(dev->mdev, dev->mdev->priv.uar);
+
 err_q_cnt:
mlx5_ib_dealloc_q_counters(dev);
 
@@ -3307,6 +3326,9 @@ static void mlx5_ib_remove(struct mlx5_core_dev *mdev, void *context)
 
mlx5_remove_netdev_notifier(dev);
ib_unregister_device(&dev->ib_dev);
+   mlx5_free_bfreg(dev->mdev, &dev->fp_bfreg);
+   mlx5_free_bfreg(dev->mdev, &dev->bfreg);
+   mlx5_put_uars_page(dev->mdev, mdev->priv.uar);
mlx5_ib_dealloc_q_counters(dev);
destroy_umrc_res(dev);
mlx5_ib_odp_remove_one(dev);
diff --git a/drivers/infiniband/hw/mlx5/mlx5_ib.h b/drivers/infiniband/hw/mlx5/mlx5_ib.h
index d4d1329..ae3bc4a 100644
--- a/drivers/infiniband/hw/mlx5/mlx5_ib.h
+++ b/drivers/infiniband/hw/mlx5/mlx5_ib.h
@@ -324,6 +324,12 @@ struct mlx5_ib_raw_packet_qp {
struct mlx5_ib_rq rq;
 };
 
+struct mlx5_bf {
+   int buf_size;
+   unsigned long   offset;
+

[for-next V2 05/10] net/mlx5: Introduce blue flame register allocator

2017-01-08 Thread Saeed Mahameed

From: Eli Cohen 

Here is an implementation of an allocator that allocates blue flame
registers. A blue flame register is used for generating send doorbells.
It can be used to generate either a regular doorbell
or a blue flame doorbell, where the data to be sent is written to the
device's I/O memory, saving the need to read the data from memory.
For blue flame doorbells to succeed, the blue flame register
needs to be mapped as write combining. The user can specify what kind of
send doorbells she wishes to use. If she requests a write-combining
mapping but that fails, the allocator falls back to a non-write-combining
mapping and indicates that to the user.
Subsequent patches in this series will make use of this allocator.
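
For illustration only, the fallback described above can be sketched in plain C
as follows. This is a minimal model, not the allocator in this patch: the
mapping hook, the enum and the two example hooks are hypothetical and stand in
for the real I/O mapping calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What kind of mapping the caller ended up with. */
enum map_kind { MAP_NONE, MAP_WC, MAP_REGULAR };

/* Hypothetical mapping hook: returns true if a mapping of the requested
 * kind (write combining or not) could be established. */
typedef bool (*map_fn)(bool write_combining);

/* Try write combining first when requested; on failure fall back to a
 * regular mapping and report which kind was obtained. */
static enum map_kind map_with_fallback(map_fn try_map, bool want_wc)
{
	assert(try_map != NULL);
	if (want_wc && try_map(true))
		return MAP_WC;
	if (try_map(false))
		return MAP_REGULAR;
	return MAP_NONE;
}

/* Example hooks: a platform without write combining, and one where
 * every mapping succeeds. */
static bool wc_unavailable(bool write_combining)
{
	return !write_combining;
}

static bool always_ok(bool write_combining)
{
	(void)write_combining;
	return true;
}
```

The returned kind is how the caller learns that blue flame doorbells are not
usable on the mapping it received.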

Signed-off-by: Eli Cohen 
Reviewed-by: Matan Barak 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/uar.c | 235 ++
 include/linux/mlx5/device.h   |   2 +
 include/linux/mlx5/driver.h   |  37 
 include/linux/mlx5/mlx5_ifc.h |   7 +-
 4 files changed, 279 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/uar.c b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
index ce7fceb..6a081a8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/uar.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/uar.c
@@ -231,3 +231,238 @@ void mlx5_unmap_free_uar(struct mlx5_core_dev *mdev, struct mlx5_uar *uar)
mlx5_cmd_free_uar(mdev, uar->index);
 }
 EXPORT_SYMBOL(mlx5_unmap_free_uar);
+
+static int uars_per_sys_page(struct mlx5_core_dev *mdev)
+{
+   if (MLX5_CAP_GEN(mdev, uar_4k))
+   return MLX5_CAP_GEN(mdev, num_of_uars_per_page);
+
+   return 1;
+}
+
+static u64 uar2pfn(struct mlx5_core_dev *mdev, u32 index)
+{
+   u32 system_page_index;
+
+   if (MLX5_CAP_GEN(mdev, uar_4k))
+   system_page_index = index >> (PAGE_SHIFT - 
MLX5_ADAPTER_PAGE_SHIFT);
+   else
+   system_page_index = index;
+
+   return (pci_resource_start(mdev->pdev, 0) >> PAGE_SHIFT) + 
system_page_index;
+}
+
+static void up_rel_func(struct kref *kref)
+{
+   struct mlx5_uars_page *up = container_of(kref, struct mlx5_uars_page, 
ref_count);
+
+   list_del(&up->list);
+   if (mlx5_cmd_free_uar(up->mdev, up->index))
+   mlx5_core_warn(up->mdev, "failed to free uar index %d\n", 
up->index);
+   kfree(up->reg_bitmap);
+   kfree(up->fp_bitmap);
+   kfree(up);
+}
+
+static struct mlx5_uars_page *alloc_uars_page(struct mlx5_core_dev *mdev,
+ bool map_wc)
+{
+   struct mlx5_uars_page *up;
+   int err = -ENOMEM;
+   phys_addr_t pfn;
+   int bfregs;
+   int i;
+
+   bfregs = uars_per_sys_page(mdev) * MLX5_BFREGS_PER_UAR;
+   up = kzalloc(sizeof(*up), GFP_KERNEL);
+   if (!up)
+   return ERR_PTR(err);
+
+   up->mdev = mdev;
+   up->reg_bitmap = kcalloc(BITS_TO_LONGS(bfregs), sizeof(unsigned long), 
GFP_KERNEL);
+   if (!up->reg_bitmap)
+   goto error1;
+
+   up->fp_bitmap = kcalloc(BITS_TO_LONGS(bfregs), sizeof(unsigned long), 
GFP_KERNEL);
+   if (!up->fp_bitmap)
+   goto error1;
+
+   for (i = 0; i < bfregs; i++)
+   if ((i % MLX5_BFREGS_PER_UAR) < MLX5_NON_FP_BFREGS_PER_UAR)
+   set_bit(i, up->reg_bitmap);
+   else
+   set_bit(i, up->fp_bitmap);
+
+   up->bfregs = bfregs;
+   up->fp_avail = bfregs * MLX5_FP_BFREGS_PER_UAR / MLX5_BFREGS_PER_UAR;
+   up->reg_avail = bfregs * MLX5_NON_FP_BFREGS_PER_UAR / 
MLX5_BFREGS_PER_UAR;
+
+   err = mlx5_cmd_alloc_uar(mdev, &up->index);
+   if (err) {
+   mlx5_core_warn(mdev, "mlx5_cmd_alloc_uar() failed, %d\n", err);
+   goto error1;
+   }
+
+   pfn = uar2pfn(mdev, up->index);
+   if (map_wc) {
+   up->map = ioremap_wc(pfn << PAGE_SHIFT, PAGE_SIZE);
+   if (!up->map) {
+   err = -EAGAIN;
+   goto error2;
+   }
+   } else {
+   up->map = ioremap(pfn << PAGE_SHIFT, PAGE_SIZE);
+   if (!up->map) {
+   err = -ENOMEM;
+   goto error2;
+   }
+   }
+   kref_init(&up->ref_count);
+   mlx5_core_dbg(mdev, "allocated UAR page: index %d, total bfregs %d\n",
+ up->index, up->bfregs);
+   return up;
+
+error2:
+   if (mlx5_cmd_free_uar(mdev, up->index))
+   mlx5_core_warn(mdev, "failed to free uar index %d\n", 
up->index);
+error1:
+   kfree(up->fp_bitmap);
+   kfree(up->reg_bitmap);
+   kfree(up);
+   return ERR_PTR(err);
+}
+
+static unsigned long map_offset(struct mlx5_core_dev *mdev, int dbi)
+{
+   /* return the offset in bytes from the start of

[for-next V2 06/10] net/mlx5: Add interface to get reference to a UAR

2017-01-08 Thread Saeed Mahameed

From: Eli Cohen 

A reference to a UAR is required to generate CQ or EQ doorbells. Since
CQ or EQ doorbells can all be generated using the same UAR area without
any effect on performance, we just get a reference to any
available UAR. If one is not available we allocate it, but we don't waste
the blue flame registers it can provide; we will use them for
subsequent allocations.
We get a reference to such a UAR and put it in mlx5_priv so any kernel
consumer can make use of it.
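
For illustration only, the get/put pattern described above can be modeled in
user-space C as below. The structure and function names are hypothetical, the
sketch tracks a single shared page, and it does no locking, so it is not the
driver implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a shared, reference-counted UAR page. */
struct uars_page {
	unsigned int refcount;
	unsigned int index;
};

struct device_ctx {
	struct uars_page *shared; /* currently available page, if any */
	unsigned int next_index;  /* stand-in for a firmware-assigned index */
};

/* Return a reference to an available page, allocating one if needed. */
static struct uars_page *get_uars_page(struct device_ctx *d)
{
	struct uars_page *up;

	assert(d != NULL);
	if (d->shared) {
		d->shared->refcount++;
		return d->shared;
	}
	up = calloc(1, sizeof(*up));
	if (!up)
		return NULL;
	up->refcount = 1;
	up->index = d->next_index++;
	d->shared = up;
	return up;
}

/* Drop a reference; the page is freed when the last user lets go. */
static void put_uars_page(struct device_ctx *d, struct uars_page *up)
{
	assert(d != NULL && up != NULL && up->refcount > 0);
	if (--up->refcount == 0) {
		if (d->shared == up)
			d->shared = NULL;
		free(up);
	}
}
```

Two consumers calling get_uars_page() on the same device receive the same page
in this model, which is why CQ and EQ doorbells can share it.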

Signed-off-by: Eli Cohen 
Reviewed-by: Matan Barak 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 14 ---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 22 ++
 drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 32 ++
 include/linux/mlx5/driver.h|  5 +++-
 4 files changed, 59 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
index 11a8d63..9849ee9 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
@@ -512,7 +512,7 @@ static void init_eq_buf(struct mlx5_eq *eq)
 
 int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 
vecidx,
   int nent, u64 mask, const char *name,
-  struct mlx5_uar *uar, enum mlx5_eq_type type)
+  enum mlx5_eq_type type)
 {
u32 out[MLX5_ST_SZ_DW(create_eq_out)] = {0};
struct mlx5_priv *priv = &dev->priv;
@@ -556,7 +556,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
 
eqc = MLX5_ADDR_OF(create_eq_in, in, eq_context_entry);
MLX5_SET(eqc, eqc, log_eq_size, ilog2(eq->nent));
-   MLX5_SET(eqc, eqc, uar_page, uar->index);
+   MLX5_SET(eqc, eqc, uar_page, priv->uar->index);
MLX5_SET(eqc, eqc, intr, vecidx);
MLX5_SET(eqc, eqc, log_page_size,
 eq->buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
@@ -571,7 +571,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 vecidx,
eq->eqn = MLX5_GET(create_eq_out, out, eq_number);
eq->irqn = priv->msix_arr[vecidx].vector;
eq->dev = dev;
-   eq->doorbell = uar->map + MLX5_EQ_DOORBEL_OFFSET;
+   eq->doorbell = priv->uar->map + MLX5_EQ_DOORBEL_OFFSET;
err = request_irq(eq->irqn, handler, 0,
  priv->irq_info[vecidx].name, eq);
if (err)
@@ -686,8 +686,7 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 
err = mlx5_create_map_eq(dev, &table->cmd_eq, MLX5_EQ_VEC_CMD,
 MLX5_NUM_CMD_EQE, 1ull << MLX5_EVENT_TYPE_CMD,
-"mlx5_cmd_eq", &dev->priv.bfregi.uars[0],
-MLX5_EQ_TYPE_ASYNC);
+"mlx5_cmd_eq",  MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create cmd EQ %d\n", err);
return err;
@@ -697,8 +696,7 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 
err = mlx5_create_map_eq(dev, &table->async_eq, MLX5_EQ_VEC_ASYNC,
 MLX5_NUM_ASYNC_EQE, async_event_mask,
-"mlx5_async_eq", &dev->priv.bfregi.uars[0],
-MLX5_EQ_TYPE_ASYNC);
+"mlx5_async_eq", MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create async EQ %d\n", err);
goto err1;
@@ -708,7 +706,6 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 MLX5_EQ_VEC_PAGES,
 /* TODO: sriov max_vf + */ 1,
 1 << MLX5_EVENT_TYPE_PAGE_REQUEST, 
"mlx5_pages_eq",
-&dev->priv.bfregi.uars[0],
 MLX5_EQ_TYPE_ASYNC);
if (err) {
mlx5_core_warn(dev, "failed to create pages EQ %d\n", err);
@@ -722,7 +719,6 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
 MLX5_NUM_ASYNC_EQE,
 1 << MLX5_EVENT_TYPE_PAGE_FAULT,
 "mlx5_page_fault_eq",
-&dev->priv.bfregi.uars[0],
 MLX5_EQ_TYPE_PF);
if (err) {
mlx5_core_warn(dev, "failed to create page fault EQ 
%d\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 634e96a..2882d04 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -753,8 +753,7 @@ static int alloc_comp_eqs(struct mlx5_core_dev *dev)
snprintf(name, MLX5_MAX_IRQ_NAME, "mlx5_comp%d", i)

[for-next V2 08/10] IB/mlx5: Allow future extension of libmlx5 input data

2017-01-08 Thread Saeed Mahameed

From: Eli Cohen 

The current check requires that new fields in struct
mlx5_ib_alloc_ucontext_req_v2 that are not known to the driver be zero.
This was introduced so that new libraries passing additional information to
the kernel through struct mlx5_ib_alloc_ucontext_req_v2 are notified
by old kernels that do not support their request, by failing the
operation. This scheme is problematic since it requires libmlx5 to issue
the requests with descending input size for struct
mlx5_ib_alloc_ucontext_req_v2.

To avoid this, we require that new features obey the following
rules:
If the feature requires one or more fields in the response and at
least one of the fields can be encoded such that a zero value means the
kernel ignored the request, then this field will provide the indication
to the library. If no response is required or if zero is a valid
response, a new field should be added that indicates to the library
whether its request was processed.

Fixes: b368d7cb8ceb ('IB/mlx5: Add hca_core_clock_offset to udata in init_ucontext')
Signed-off-by: Eli Cohen 
Reviewed-by: Matan Barak 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/cq.c  |   2 +-
 drivers/infiniband/hw/mlx5/main.c| 201 ++-
 drivers/infiniband/hw/mlx5/mlx5_ib.h |  15 ++-
 drivers/infiniband/hw/mlx5/qp.c  | 133 ++-
 include/linux/mlx5/device.h  |  12 ++-
 include/linux/mlx5/driver.h  |  12 +--
 6 files changed, 209 insertions(+), 166 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/cq.c b/drivers/infiniband/hw/mlx5/cq.c
index a28ec33..31803b3 100644
--- a/drivers/infiniband/hw/mlx5/cq.c
+++ b/drivers/infiniband/hw/mlx5/cq.c
@@ -788,7 +788,7 @@ static int create_cq_user(struct mlx5_ib_dev *dev, struct ib_udata *udata,
MLX5_SET(cqc, cqc, log_page_size,
 page_shift - MLX5_ADAPTER_PAGE_SHIFT);
 
-   *index = to_mucontext(context)->bfregi.uars[0].index;
+   *index = to_mucontext(context)->bfregi.sys_pages[0];
 
if (ucmd.cqe_comp_en == 1) {
if (unlikely((*cqe_size != 64) ||
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index e9f0830..6640672 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -992,6 +992,80 @@ static int mlx5_ib_modify_port(struct ib_device *ibdev, u8 port, int mask,
return err;
 }
 
+static int calc_total_bfregs(struct mlx5_ib_dev *dev, bool lib_uar_4k,
+struct mlx5_ib_alloc_ucontext_req_v2 *req,
+u32 *num_sys_pages)
+{
+   int uars_per_sys_page;
+   int bfregs_per_sys_page;
+   int ref_bfregs = req->total_num_bfregs;
+
+   if (req->total_num_bfregs == 0)
+   return -EINVAL;
+
+   BUILD_BUG_ON(MLX5_MAX_BFREGS % MLX5_NON_FP_BFREGS_IN_PAGE);
+   BUILD_BUG_ON(MLX5_MAX_BFREGS < MLX5_NON_FP_BFREGS_IN_PAGE);
+
+   if (req->total_num_bfregs > MLX5_MAX_BFREGS)
+   return -ENOMEM;
+
+   uars_per_sys_page = get_uars_per_sys_page(dev, lib_uar_4k);
+   bfregs_per_sys_page = uars_per_sys_page * MLX5_NON_FP_BFREGS_PER_UAR;
+   req->total_num_bfregs = ALIGN(req->total_num_bfregs, 
bfregs_per_sys_page);
+   *num_sys_pages = req->total_num_bfregs / bfregs_per_sys_page;
+
+   if (req->num_low_latency_bfregs > req->total_num_bfregs - 1)
+   return -EINVAL;
+
+   mlx5_ib_dbg(dev, "uar_4k: fw support %s, lib support %s, user requested %d bfregs, allocated %d, using %d sys pages\n",
+   MLX5_CAP_GEN(dev->mdev, uar_4k) ? "yes" : "no",
+   lib_uar_4k ? "yes" : "no", ref_bfregs,
+   req->total_num_bfregs, *num_sys_pages);
+
+   return 0;
+}
+
+static int allocate_uars(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext 
*context)
+{
+   struct mlx5_bfreg_info *bfregi;
+   int err;
+   int i;
+
+   bfregi = &context->bfregi;
+   for (i = 0; i < bfregi->num_sys_pages; i++) {
+   err = mlx5_cmd_alloc_uar(dev->mdev, &bfregi->sys_pages[i]);
+   if (err)
+   goto error;
+
+   mlx5_ib_dbg(dev, "allocated uar %d\n", bfregi->sys_pages[i]);
+   }
+   return 0;
+
+error:
+   for (--i; i >= 0; i--)
+   if (mlx5_cmd_free_uar(dev->mdev, bfregi->sys_pages[i]))
+   mlx5_ib_warn(dev, "failed to free uar %d\n", i);
+
+   return err;
+}
+
+static int deallocate_uars(struct mlx5_ib_dev *dev, struct mlx5_ib_ucontext 
*context)
+{
+   struct mlx5_bfreg_info *bfregi;
+   int err;
+   int i;
+
+   bfregi = &context->bfregi;
+   for (i = 0; i < bfregi->num_sys_pages; i++) {
+   err = mlx5_cmd_free_uar(dev->mdev, bfregi->sys_pages[i]);
+   if (err) {
+   mlx5_ib_warn(dev, "failed to free uar %d\n", i);
+   return er

[for-next V2 09/10] IB/mlx5: Support 4k UAR for libmlx5

2017-01-08 Thread Saeed Mahameed

From: Eli Cohen 

Add fields to structs to convey to the kernel an indication of whether the
library supports multiple UARs per page, and return to the library the size
of a UAR based on the queried value.

Signed-off-by: Eli Cohen 
Reviewed-by: Matan Barak 
Signed-off-by: Leon Romanovsky 
Signed-off-by: Saeed Mahameed 
---
 drivers/infiniband/hw/mlx5/main.c                 | 21 +++-
 drivers/net/ethernet/mellanox/mlx5/core/cq.c      |  2 +
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |  9 ++--
 .../net/ethernet/mellanox/mlx5/core/en_common.c   | 12 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 21
 drivers/net/ethernet/mellanox/mlx5/core/uar.c     | 56 --
 include/linux/mlx5/cq.h                           |  2 +-
 include/linux/mlx5/driver.h                       | 12 -
 include/uapi/rdma/mlx5-abi.h                      |  7 +++
 9 files changed, 42 insertions(+), 100 deletions(-)

diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 6640672..a191b93 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -992,6 +992,12 @@ static int mlx5_ib_modify_port(struct ib_device *ibdev, u8 port, int mask,
return err;
 }
 
+static void print_lib_caps(struct mlx5_ib_dev *dev, u64 caps)
+{
+   mlx5_ib_dbg(dev, "MLX5_LIB_CAP_4K_UAR = %s\n",
+   caps & MLX5_LIB_CAP_4K_UAR ? "y" : "n");
+}
+
 static int calc_total_bfregs(struct mlx5_ib_dev *dev, bool lib_uar_4k,
 struct mlx5_ib_alloc_ucontext_req_v2 *req,
 u32 *num_sys_pages)
@@ -1122,6 +1128,10 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
resp.cqe_version = min_t(__u8,
 (__u8)MLX5_CAP_GEN(dev->mdev, cqe_version),
 req.max_cqe_version);
+   resp.log_uar_size = MLX5_CAP_GEN(dev->mdev, uar_4k) ?
+   MLX5_ADAPTER_PAGE_SHIFT : PAGE_SHIFT;
+   resp.num_uars_per_page = MLX5_CAP_GEN(dev->mdev, uar_4k) ?
+   MLX5_CAP_GEN(dev->mdev, 
num_of_uars_per_page) : 1;
resp.response_length = min(offsetof(typeof(resp), response_length) +
   sizeof(resp.response_length), udata->outlen);
 
@@ -1129,7 +1139,7 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
if (!context)
return ERR_PTR(-ENOMEM);
 
-   lib_uar_4k = false;
+   lib_uar_4k = req.lib_caps & MLX5_LIB_CAP_4K_UAR;
bfregi = &context->bfregi;
 
/* updates req->total_num_bfregs */
@@ -1209,6 +1219,12 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
sizeof(resp.reserved2);
}
 
+   if (field_avail(typeof(resp), log_uar_size, udata->outlen))
+   resp.response_length += sizeof(resp.log_uar_size);
+
+   if (field_avail(typeof(resp), num_uars_per_page, udata->outlen))
+   resp.response_length += sizeof(resp.num_uars_per_page);
+
err = ib_copy_to_udata(udata, &resp, resp.response_length);
if (err)
goto out_td;
@@ -1216,7 +1232,8 @@ static struct ib_ucontext *mlx5_ib_alloc_ucontext(struct ib_device *ibdev,
bfregi->ver = ver;
bfregi->num_low_latency_bfregs = req.num_low_latency_bfregs;
context->cqe_version = resp.cqe_version;
-   context->lib_caps = false;
+   context->lib_caps = req.lib_caps;
+   print_lib_caps(dev, context->lib_caps);
 
return &context->ibucontext;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cq.c b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
index 32d4af9..336d473 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/cq.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/cq.c
@@ -179,6 +179,8 @@ int mlx5_core_create_cq(struct mlx5_core_dev *dev, struct mlx5_core_cq *cq,
mlx5_core_dbg(dev, "failed adding CP 0x%x to debug file 
system\n",
  cq->cqn);
 
+   cq->uar = dev->priv.uar;
+
return 0;
 
 err_cmd:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 3037631..a473cea 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -465,7 +465,6 @@ struct mlx5e_sq {
/* read only */
struct mlx5_wq_cyc wq;
u32dma_fifo_mask;
-   void __iomem  *uar_map;
struct netdev_queue   *txq;
u32sqn;
u16bf_buf_size;
@@ -479,7 +478,7 @@ struct mlx5e_sq {
 
/* control path */
struct mlx5_wq_ctrlwq_ctrl;
-   struct mlx5_uaruar;
+   struct mlx5_sq_bfreg   bfreg;
struct mlx5e_channel  *channel

Re: [PATCH net-next 6/7] net/mlx5: E-Switch, Add control for inline mode

2017-01-08 Thread Saeed Mahameed

On Sun, Jan 8, 2017 at 11:56 AM, Jiri Pirko  wrote:
> Mon, Nov 21, 2016 at 02:06:00PM CET, sae...@mellanox.com wrote:
>>From: Roi Dayan 
>>
>>Implement devlink show and set of HW inline-mode.
>>The supported modes: none, link, network, transport.
>>We currently support one mode for all vports so set is done on all vports.
>>When eswitch is first initialized the inline-mode is queried from the FW.
>>
>>Signed-off-by: Roi Dayan 
>>Signed-off-by: Saeed Mahameed 
>
> Saeed, could you please use get_maintainer script and cc those people
> for your submissions? Thanks!

Sure,

Or, Roi, please make sure you do this in your future work.
I will verify prior to submission of course.

Re: [PATCH] net: ethernet: ti: cpsw: remove dual check from common res usage function

2017-01-08 Thread Ivan Khoronzhuk

Please ignore it, I've included it in new series

On Sun, Jan 08, 2017 at 03:56:27PM +0200, Ivan Khoronzhuk wrote:
> Common res usage is possible only in case an interface is
> running. In case of not dual emac here can be only one interface,
> so while ndo_open and switch mode, only one interface can be opened,
> thus if open is called no any interface is running ... and no common
> res are used. So remove check on dual emac, it will simplify
> code/understanding and will match the name it's called.
> 
> Signed-off-by: Ivan Khoronzhuk 
> ---
> 
> Based on linux-next/master
> 
>  drivers/net/ethernet/ti/cpsw.c | 3 ---
>  1 file changed, 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index b203143..91684f1 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -1235,9 +1235,6 @@ static int cpsw_common_res_usage_state(struct cpsw_common *cpsw)
>   u32 i;
>   u32 usage_count = 0;
>  
> - if (!cpsw->data.dual_emac)
> - return 0;
> -
>   for (i = 0; i < cpsw->data.slaves; i++)
>   if (cpsw->slaves[i].open_stat)
>   usage_count++;
> -- 
> 2.7.4
>

[PATCH 4/4] net: ethernet: ti: cpsw: don't duplicate common res in rx handler

2017-01-08 Thread Ivan Khoronzhuk

There is no need to duplicate the common res usage logic in the rx
handler just to find out whether any interface is running.

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 40 
 1 file changed, 16 insertions(+), 24 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index daae87f..458298d 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -671,6 +671,18 @@ static void cpsw_intr_disable(struct cpsw_common *cpsw)
return;
 }
 
+static int cpsw_common_res_usage_state(struct cpsw_common *cpsw)
+{
+   u32 i;
+   u32 usage_count = 0;
+
+   for (i = 0; i < cpsw->data.slaves; i++)
+   if (netif_running(cpsw->slaves[i].ndev))
+   usage_count++;
+
+   return usage_count;
+}
+
 static void cpsw_tx_handler(void *token, int len, int status)
 {
struct netdev_queue *txq;
@@ -703,18 +715,10 @@ static void cpsw_rx_handler(void *token, int len, int status)
cpsw_dual_emac_src_port_detect(cpsw, status, ndev, skb);
 
if (unlikely(status < 0) || unlikely(!netif_running(ndev))) {
-   bool ndev_status = false;
-   struct cpsw_slave *slave = cpsw->slaves;
-   int n;
-
-   if (cpsw->data.dual_emac) {
-   /* In dual emac mode check for all interfaces */
-   for (n = cpsw->data.slaves; n; n--, slave++)
-   if (netif_running(slave->ndev))
-   ndev_status = true;
-   }
-
-   if (ndev_status && (status >= 0)) {
+   /* In dual emac mode check for all interfaces */
+   if (cpsw->data.dual_emac &&
+   cpsw_common_res_usage_state(cpsw) &&
+   (status >= 0)) {
/* The packet received is for the interface which
 * is already down and the other interface is up
 * and running, instead of freeing which results
@@ -1234,18 +1238,6 @@ static void cpsw_get_ethtool_stats(struct net_device *ndev,
}
 }
 
-static int cpsw_common_res_usage_state(struct cpsw_common *cpsw)
-{
-   u32 i;
-   u32 usage_count = 0;
-
-   for (i = 0; i < cpsw->data.slaves; i++)
-   if (netif_running(cpsw->slaves[i].ndev))
-   usage_count++;
-
-   return usage_count;
-}
-
 static inline int cpsw_tx_packet_submit(struct cpsw_priv *priv,
struct sk_buff *skb,
struct cpdma_chan *txch)
-- 
2.7.4

[PATCH 3/4] net: ethernet: ti: cpsw: don't duplicate ndev_running

2017-01-08 Thread Ivan Khoronzhuk

There is no need for an additional variable to identify whether an
interface is running. Simplify the code by removing the redundant
variable and checking the usage counter instead.

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 14 --
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 40d7fc9..daae87f 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -357,7 +357,6 @@ struct cpsw_slave {
struct phy_device   *phy;
struct net_device   *ndev;
u32 port_vlan;
-   u32 open_stat;
 };
 
 static inline u32 slave_read(struct cpsw_slave *slave, u32 offset)
@@ -1241,7 +1240,7 @@ static int cpsw_common_res_usage_state(struct cpsw_common *cpsw)
u32 usage_count = 0;
 
for (i = 0; i < cpsw->data.slaves; i++)
-   if (cpsw->slaves[i].open_stat)
+   if (netif_running(cpsw->slaves[i].ndev))
usage_count++;
 
return usage_count;
@@ -1502,7 +1501,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
 CPSW_RTL_VERSION(reg));
 
/* initialize host and slave ports */
-   if (!cpsw_common_res_usage_state(cpsw))
+   if (cpsw_common_res_usage_state(cpsw) < 2)
cpsw_init_host_port(priv);
for_each_slave(priv, cpsw_slave_open, priv);
 
@@ -1513,7 +1512,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
cpsw_ale_add_vlan(cpsw->ale, cpsw->data.default_vlan,
  ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0);
 
-   if (!cpsw_common_res_usage_state(cpsw)) {
+   if (cpsw_common_res_usage_state(cpsw) < 2) {
/* disable priority elevation */
__raw_writel(0, &cpsw->regs->ptype);
 
@@ -1556,9 +1555,6 @@ static int cpsw_ndo_open(struct net_device *ndev)
cpdma_ctlr_start(cpsw->dma);
cpsw_intr_enable(cpsw);
 
-   if (cpsw->data.dual_emac)
-   cpsw->slaves[priv->emac_port].open_stat = true;
-
return 0;
 
 err_cleanup:
@@ -1578,7 +1574,7 @@ static int cpsw_ndo_stop(struct net_device *ndev)
netif_tx_stop_all_queues(priv->ndev);
netif_carrier_off(priv->ndev);
 
-   if (cpsw_common_res_usage_state(cpsw) <= 1) {
+   if (!cpsw_common_res_usage_state(cpsw)) {
napi_disable(&cpsw->napi_rx);
napi_disable(&cpsw->napi_tx);
cpts_unregister(cpsw->cpts);
@@ -1592,8 +1588,6 @@ static int cpsw_ndo_stop(struct net_device *ndev)
cpsw_split_res(ndev);
 
pm_runtime_put_sync(cpsw->dev);
-   if (cpsw->data.dual_emac)
-   cpsw->slaves[priv->emac_port].open_stat = false;
return 0;
 }
 
-- 
2.7.4
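
For readers following the threshold changes in this patch, here is a minimal userspace sketch of the counting logic. It assumes that a device's running state is already set when its ndo_open is called and already cleared when its ndo_stop is called, which would explain why the open path now tests for "< 2" and the stop path tests for zero. All mock_* names are hypothetical stand-ins, not driver code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for net_device, cpsw_slave and cpsw_common. */
struct mock_ndev {
	bool running;			/* plays the role of netif_running() */
};

struct mock_slave {
	struct mock_ndev *ndev;
};

struct mock_cpsw {
	struct mock_slave slaves[2];
	size_t num_slaves;
};

/* Count the interfaces that are currently running. */
static unsigned int mock_usage_count(const struct mock_cpsw *cpsw)
{
	unsigned int count = 0;
	size_t i;

	for (i = 0; i < cpsw->num_slaves; i++)
		if (cpsw->slaves[i].ndev && cpsw->slaves[i].ndev->running)
			count++;

	return count;
}

/* Open path: the caller itself is assumed to be counted already,
 * so "fewer than 2" means no other interface is up.
 */
static bool mock_is_first_open(const struct mock_cpsw *cpsw)
{
	return mock_usage_count(cpsw) < 2;
}

/* Stop path: the caller is assumed to be no longer counted,
 * so zero means it was the last interface.
 */
static bool mock_is_last_stop(const struct mock_cpsw *cpsw)
{
	return mock_usage_count(cpsw) == 0;
}
```

With one interface marked running, mock_is_first_open() holds; once both are running it no longer does, and mock_is_last_stop() only holds when neither is.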

[PATCH 0/4] net: ethernet: ti: cpsw: correct common res usage

2017-01-08 Thread Ivan Khoronzhuk

This series removes redundancies around the common resource usage
function.

Based on net-next/master
Tested on am572x idk

Ivan Khoronzhuk (4):
  net: ethernet: ti: cpsw: remove dual check from common res usage
    function
  net: ethernet: ti: cpsw: don't disable interrupts in ndo_open
  net: ethernet: ti: cpsw: don't duplicate ndev_running
  net: ethernet: ti: cpsw: don't duplicate common res in rx handler

 drivers/net/ethernet/ti/cpsw.c | 57 ++
 1 file changed, 19 insertions(+), 38 deletions(-)

-- 
2.7.4

[PATCH 2/4] net: ethernet: ti: cpsw: don't disable interrupts in ndo_open

2017-01-08 Thread Ivan Khoronzhuk

If any interface is running the interrupts are disabled anyway.
It makes sense to disable interrupts if any of the interfaces is running,
but in this place, obviously, it didn't have any effect. So there is no
need for the redundant check and interrupt disable.

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index d261024..40d7fc9 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1480,8 +1480,6 @@ static int cpsw_ndo_open(struct net_device *ndev)
return ret;
}
 
-   if (!cpsw_common_res_usage_state(cpsw))
-   cpsw_intr_disable(cpsw);
netif_carrier_off(ndev);
 
/* Notify the stack of the actual queue counts. */
-- 
2.7.4

[PATCH 1/4] net: ethernet: ti: cpsw: remove dual check from common res usage function

2017-01-08 Thread Ivan Khoronzhuk

Common resources can only be in use while an interface is running.
When not in dual EMAC mode there can be only one interface, so in
switch mode only one interface can be opened; thus when ndo_open is
called no interface is running yet and no common resources are in
use. So remove the dual EMAC check: it simplifies the code and makes
the function match its name.

Signed-off-by: Ivan Khoronzhuk 
---
 drivers/net/ethernet/ti/cpsw.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index f339268..d261024 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1240,9 +1240,6 @@ static int cpsw_common_res_usage_state(struct cpsw_common *cpsw)
u32 i;
u32 usage_count = 0;
 
-   if (!cpsw->data.dual_emac)
-   return 0;
-
for (i = 0; i < cpsw->data.slaves; i++)
if (cpsw->slaves[i].open_stat)
usage_count++;
-- 
2.7.4

Re: [PATCH net-next 2/2] net/sched: act_csum: compute crc32c on SCTP packets

2017-01-08 Thread Davide Caratti

On Fri, 2017-01-06 at 10:23 +0100, Nicolas Dichtel wrote:
> On 05/01/2017 at 17:59, Davide Caratti wrote:
> > @@ -21,7 +21,8 @@ enum {
> >     TCA_CSUM_UPDATE_FLAG_IGMP= 4,
> >     TCA_CSUM_UPDATE_FLAG_TCP = 8,
> >     TCA_CSUM_UPDATE_FLAG_UDP = 16,
> > -   TCA_CSUM_UPDATE_FLAG_UDPLITE = 32
> > +   TCA_CSUM_UPDATE_FLAG_UDPLITE = 32,
> > +   TCA_CSUM_UPDATE_FLAG_SCTP= 64
> nit: please put a comma after the '64' so that the next person who adds
> a flag will not have to touch that line.
> 

ok,

> > @@ -365,6 +385,12 @@ static int tcf_csum_ipv4(struct sk_buff *skb, u32
> > update_flags)
> >        ntohs(iph->tot_len),
> > 1))
> >     goto fail;
> >     break;
> > +   case IPPROTO_SCTP:
> > +   if (update_flags & TCA_CSUM_UPDATE_FLAG_SCTP)
> > +   if (!tcf_csum_sctp(skb, iph->ihl * 4,
> > +      ntohs(iph->tot_len)))
> nit: one 'if' only?
>   if (update_flags & TCA_CSUM_UPDATE_FLAG_SCTP &&
>       !tcf_csum_sctp(skb, iph->ihl * 4, ntohs(iph->tot_len)))
> 

ok,

> > @@ -481,6 +507,12 @@ static int tcf_csum_ipv6(struct sk_buff *skb, u32
> > update_flags)
> >        pl +
> > sizeof(*ip6h), 1))
> >     goto fail;
> >     goto done;
> > +   case IPPROTO_SCTP:
> > +   if (update_flags & TCA_CSUM_UPDATE_FLAG_SCTP)
> > +   if (!tcf_csum_sctp(skb, hl,
> > +      pl +
> > sizeof(*ip6h)))
> Same here.
> 

ok,

> 
> Regards,
> Nicolas

hello Nicolas,
thank you for the attention! I will apply the changes you suggested and
repost a v2.

regards,
--
davide
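
As an illustration of the two nits agreed on above, here is a small standalone sketch. The EX_* names and ex_csum_sctp() are hypothetical stand-ins (the real flags live in the uapi header and the real helper takes an skb); only the shape of the enum and of the condition is the point.

```c
#include <stdbool.h>
#include <stdint.h>

/* A trailing comma after the last enumerator means the next flag can be
 * added without touching this line in the diff.
 */
enum {
	EX_CSUM_UPDATE_FLAG_UDPLITE = 32,
	EX_CSUM_UPDATE_FLAG_SCTP    = 64,
};

/* Hypothetical stand-in for the checksum helper: nonzero means success. */
static int ex_csum_sctp(bool csum_ok)
{
	return csum_ok ? 1 : 0;
}

/* Single 'if' form: the helper is only called when the flag is set,
 * because && short-circuits. Returns true when the caller should take
 * its failure path.
 */
static bool ex_sctp_update_failed(uint32_t update_flags, bool csum_ok)
{
	if (update_flags & EX_CSUM_UPDATE_FLAG_SCTP &&
	    !ex_csum_sctp(csum_ok))
		return true;

	return false;
}
```

Since '&' binds tighter than '&&', the condition needs no extra parentheses around the flag test.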

[PATCH] cls_u32: don't bother explicitly initializing ->divisor to zero

2017-01-08 Thread Alexandru Moise

This struct member is already initialized to zero upon root_ht's
allocation via kzalloc().

Signed-off-by: Alexandru Moise <00moses.alexande...@gmail.com>
---
 net/sched/cls_u32.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sched/cls_u32.c b/net/sched/cls_u32.c
index ae83c3ae..a6ec3e4b 100644
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -334,7 +334,6 @@ static int u32_init(struct tcf_proto *tp)
if (root_ht == NULL)
return -ENOBUFS;
 
-   root_ht->divisor = 0;
root_ht->refcnt++;
root_ht->handle = tp_c ? gen_new_htid(tp_c) : 0x8000;
root_ht->prio = tp->prio;
-- 
2.1.4
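
The reasoning in this patch can be sketched in userspace with calloc(), which, like kzalloc(), returns zero-filled memory. The struct and the handle value below are hypothetical and only mirror the shape of the code being patched.

```c
#include <stdlib.h>

/* Hypothetical stand-in for the hash node allocated in u32_init(). */
struct ex_hnode {
	unsigned int divisor;
	int refcnt;
	unsigned int handle;
};

/* Allocate a zero-filled node; only the non-zero members are assigned. */
static struct ex_hnode *ex_hnode_alloc(void)
{
	struct ex_hnode *ht = calloc(1, sizeof(*ht));

	if (!ht)
		return NULL;

	/* No "ht->divisor = 0;" needed: calloc() already zeroed it. */
	ht->refcnt++;
	ht->handle = 0x1234;	/* arbitrary example value */

	return ht;
}
```

The caller owns the returned node and releases it with free().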

Re: [for-next V2 06/10] net/mlx5: Add interface to get reference to a UAR

2017-01-08 Thread Yuval Shaia

On Sun, Jan 08, 2017 at 05:54:47PM +0200, Saeed Mahameed wrote:
> From: Eli Cohen 
> 
> A reference to a UAR is required to generate CQ or EQ doorbells. Since
> CQ or EQ doorbells can all be generated using the same UAR area without
> any effect on performance, we are just getting a reference to any
> available UAR. If one is not available we allocate it but we don't waste
> the blue flame registers it can provide and we will use them for
> subsequent allocations.
> We get a reference to such UAR and put in mlx5_priv so any kernel
> consumer can make use of it.
> 
> Signed-off-by: Eli Cohen 
> Reviewed-by: Matan Barak 
> Signed-off-by: Leon Romanovsky 
> Signed-off-by: Saeed Mahameed 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/eq.c   | 14 ---
>  drivers/net/ethernet/mellanox/mlx5/core/main.c | 22 ++
>  drivers/net/ethernet/mellanox/mlx5/core/uar.c  | 32 
> ++
>  include/linux/mlx5/driver.h|  5 +++-
>  4 files changed, 59 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eq.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> index 11a8d63..9849ee9 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eq.c
> @@ -512,7 +512,7 @@ static void init_eq_buf(struct mlx5_eq *eq)
>  
>  int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct mlx5_eq *eq, u8 
> vecidx,
>  int nent, u64 mask, const char *name,
> -struct mlx5_uar *uar, enum mlx5_eq_type type)
> +enum mlx5_eq_type type)
>  {
>   u32 out[MLX5_ST_SZ_DW(create_eq_out)] = {0};
>   struct mlx5_priv *priv = &dev->priv;
> @@ -556,7 +556,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct 
> mlx5_eq *eq, u8 vecidx,
>  
>   eqc = MLX5_ADDR_OF(create_eq_in, in, eq_context_entry);
>   MLX5_SET(eqc, eqc, log_eq_size, ilog2(eq->nent));
> - MLX5_SET(eqc, eqc, uar_page, uar->index);
> + MLX5_SET(eqc, eqc, uar_page, priv->uar->index);
>   MLX5_SET(eqc, eqc, intr, vecidx);
>   MLX5_SET(eqc, eqc, log_page_size,
>eq->buf.page_shift - MLX5_ADAPTER_PAGE_SHIFT);
> @@ -571,7 +571,7 @@ int mlx5_create_map_eq(struct mlx5_core_dev *dev, struct 
> mlx5_eq *eq, u8 vecidx,
>   eq->eqn = MLX5_GET(create_eq_out, out, eq_number);
>   eq->irqn = priv->msix_arr[vecidx].vector;
>   eq->dev = dev;
> - eq->doorbell = uar->map + MLX5_EQ_DOORBEL_OFFSET;
> + eq->doorbell = priv->uar->map + MLX5_EQ_DOORBEL_OFFSET;
>   err = request_irq(eq->irqn, handler, 0,
> priv->irq_info[vecidx].name, eq);
>   if (err)
> @@ -686,8 +686,7 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
>  
>   err = mlx5_create_map_eq(dev, &table->cmd_eq, MLX5_EQ_VEC_CMD,
>MLX5_NUM_CMD_EQE, 1ull << MLX5_EVENT_TYPE_CMD,
> -  "mlx5_cmd_eq", &dev->priv.bfregi.uars[0],
> -  MLX5_EQ_TYPE_ASYNC);
> +  "mlx5_cmd_eq",  MLX5_EQ_TYPE_ASYNC);

Remove extra space

>   if (err) {
>   mlx5_core_warn(dev, "failed to create cmd EQ %d\n", err);
>   return err;
> @@ -697,8 +696,7 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
>  
>   err = mlx5_create_map_eq(dev, &table->async_eq, MLX5_EQ_VEC_ASYNC,
>MLX5_NUM_ASYNC_EQE, async_event_mask,
> -  "mlx5_async_eq", &dev->priv.bfregi.uars[0],
> -  MLX5_EQ_TYPE_ASYNC);
> +  "mlx5_async_eq", MLX5_EQ_TYPE_ASYNC);
>   if (err) {
>   mlx5_core_warn(dev, "failed to create async EQ %d\n", err);
>   goto err1;
> @@ -708,7 +706,6 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
>MLX5_EQ_VEC_PAGES,
>/* TODO: sriov max_vf + */ 1,
>1 << MLX5_EVENT_TYPE_PAGE_REQUEST, 
> "mlx5_pages_eq",
> -  &dev->priv.bfregi.uars[0],
>MLX5_EQ_TYPE_ASYNC);
>   if (err) {
>   mlx5_core_warn(dev, "failed to create pages EQ %d\n", err);
> @@ -722,7 +719,6 @@ int mlx5_start_eqs(struct mlx5_core_dev *dev)
>MLX5_NUM_ASYNC_EQE,
>1 << MLX5_EVENT_TYPE_PAGE_FAULT,
>"mlx5_page_fault_eq",
> -  &dev->priv.bfregi.uars[0],
>MLX5_EQ_TYPE_PF);
>   if (err) {
>   mlx5_core_warn(dev, "failed to create page fault EQ 
> %d\n",
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index 634e96a..2882d04 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b

Re: [PATCH net-next] net/sched: cls_flower: Add user specified data

2017-01-08 Thread Jiri Pirko

Mon, Jan 02, 2017 at 03:59:49PM CET, j...@mojatatu.com wrote:
>
>We have been using a cookie as well for actions (which we have been
>using but have been too lazy to submit so far). I am going to port
>it over to the newer kernels and post it.

Hard to deal with something we can't look at :)


>In our case that is intended to be opaque to the kernel, i.e. the kernel
>never interprets it; in that case it is similar to the kernel
>FIB protocol field.

In case of this patch, the kernel also never interprets it. What makes you
think otherwise? For the kernel, it is always a binary blob.


>
>In your case - could this cookie have been a class/flowid
>(a 32 bit)?
>And would it not make more sense for the cookie to be
>generic to all classifiers? i.e. why is it specific to flower?

Correct, makes sense to have it generic for all cls and perhaps also
acts.


>
>cheers,
>jamal
>
>On 17-01-02 08:13 AM, Paul Blakey wrote:
>> This is to support saving extra data that might be helpful on retrieval.
>> First use case is upcoming openvswitch flow offloads, extra data will
>> include UFID and port mappings for each added flow.
>> 
>> Signed-off-by: Paul Blakey 
>> Reviewed-by: Roi Dayan 
>> Acked-by: Jiri Pirko 
>> ---
>>  include/uapi/linux/pkt_cls.h |  3 +++
>>  net/sched/cls_flower.c   | 22 +-
>>  2 files changed, 24 insertions(+), 1 deletion(-)
>> 
>> diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>> index cb4bcdc..ca9bbe3 100644
>> --- a/include/uapi/linux/pkt_cls.h
>> +++ b/include/uapi/linux/pkt_cls.h
>> @@ -471,10 +471,13 @@ enum {
>>  TCA_FLOWER_KEY_ICMPV6_TYPE, /* u8 */
>>  TCA_FLOWER_KEY_ICMPV6_TYPE_MASK,/* u8 */
>> 
>> +TCA_FLOWER_COOKIE,  /* binary */
>> +
>>  __TCA_FLOWER_MAX,
>>  };
>> 
>>  #define TCA_FLOWER_MAX (__TCA_FLOWER_MAX - 1)
>> +#define FLOWER_MAX_COOKIE_SIZE 128
>> 
>>  enum {
>>  TCA_FLOWER_KEY_FLAGS_IS_FRAGMENT = (1 << 0),
>> diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
>> index 333f8e2..e2f5b25 100644
>> --- a/net/sched/cls_flower.c
>> +++ b/net/sched/cls_flower.c
>> @@ -85,6 +85,8 @@ struct cls_fl_filter {
>>  struct rcu_head rcu;
>>  struct tc_to_netdev tc;
>>  struct net_device *hw_dev;
>> +size_t cookie_len;
>> +long cookie[0];
>>  };
>> 
>>  static unsigned short int fl_mask_range(const struct fl_flow_mask *mask)
>> @@ -794,6 +796,9 @@ static int fl_change(struct net *net, struct sk_buff 
>> *in_skb,
>>  struct cls_fl_filter *fnew;
>>  struct nlattr *tb[TCA_FLOWER_MAX + 1];
>>  struct fl_flow_mask mask = {};
>> +const struct nlattr *attr;
>> +size_t cookie_len = 0;
>> +void *cookie;
>>  int err;
>> 
>>  if (!tca[TCA_OPTIONS])
>> @@ -806,10 +811,22 @@ static int fl_change(struct net *net, struct sk_buff 
>> *in_skb,
>>  if (fold && handle && fold->handle != handle)
>>  return -EINVAL;
>> 
>> -fnew = kzalloc(sizeof(*fnew), GFP_KERNEL);
>> +if (tb[TCA_FLOWER_COOKIE]) {
>> +attr = tb[TCA_FLOWER_COOKIE];
>> +cookie_len = nla_len(attr);
>> +cookie = nla_data(attr);
>> +if (cookie_len > FLOWER_MAX_COOKIE_SIZE)
>> +return -EINVAL;
>> +}
>> +
>> +fnew = kzalloc(sizeof(*fnew) + cookie_len, GFP_KERNEL);
>>  if (!fnew)
>>  return -ENOBUFS;
>> 
>> +fnew->cookie_len = cookie_len;
>> +if (cookie_len)
>> +memcpy(fnew->cookie, cookie, cookie_len);
>> +
>>  err = tcf_exts_init(&fnew->exts, TCA_FLOWER_ACT, 0);
>>  if (err < 0)
>>  goto errout;
>> @@ -1151,6 +1168,9 @@ static int fl_dump(struct net *net, struct tcf_proto 
>> *tp, unsigned long fh,
>> 
>>  nla_put_u32(skb, TCA_FLOWER_FLAGS, f->flags);
>> 
>> +if (f->cookie_len)
>> +nla_put(skb, TCA_FLOWER_COOKIE, f->cookie_len, f->cookie);
>> +
>>  if (tcf_exts_dump(skb, &f->exts))
>>  goto nla_put_failure;
>> 
>> 
>

Re: [PATCH net-next] net/sched: cls_flower: Add user specified data

2017-01-08 Thread Jiri Pirko

Mon, Jan 02, 2017 at 07:23:27PM CET, john.fastab...@gmail.com wrote:
>On 17-01-02 06:59 AM, Jamal Hadi Salim wrote:
>> 
>> We have been using a cookie as well for actions (which we have been
>> using but have been too lazy to submit so far). I am going to port
>> it over to the newer kernels and post it.
>> In our case that is intended to be opaque to the kernel, i.e. the kernel
>> never interprets it; in that case it is similar to the kernel
>> FIB protocol field.
>> 
>> In your case - could this cookie have been a class/flowid
>> (a 32 bit)?
>> And would it not make more sense for the cookie to be
>> generic to all classifiers? i.e. why is it specific to flower?
>> 
>> cheers,
>> jamal
>> 
>> On 17-01-02 08:13 AM, Paul Blakey wrote:
>>> This is to support saving extra data that might be helpful on retrieval.
>>> First use case is upcoming openvswitch flow offloads, extra data will
>>> include UFID and port mappings for each added flow.
>>>
>>> Signed-off-by: Paul Blakey 
>>> Reviewed-by: Roi Dayan 
>>> Acked-by: Jiri Pirko 
>>> ---
>
>Additionally I would like to point out this is an arbitrary length binary
>blob (for undefined use, without even a specified encoding) that gets pushed
>between user space and hardware ;) This seemed to get folks fairly excited in
>the past.

No John, this is very different. What was frowned upon was interchange
of binary blobs between userspace and hw. In this case, the cookie is never
interpreted; it is only stored in kernel memory and *always* used only by
the user.
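
To make the "stored, never interpreted" point concrete, here is a minimal userspace sketch of the storage scheme in the quoted patch: the cookie is bounded in size, copied into a trailing flexible array member at allocation time, and handed back byte for byte on dump. The ex_* names are hypothetical, and plain calloc()/memcpy() stand in for kzalloc() and the netlink attribute helpers.

```c
#include <stdlib.h>
#include <string.h>

#define EX_MAX_COOKIE_SIZE 128	/* mirrors FLOWER_MAX_COOKIE_SIZE */

/* Hypothetical filter object with the cookie stored inline at the end. */
struct ex_filter {
	size_t cookie_len;
	unsigned char cookie[];	/* flexible array member */
};

/* Create a filter carrying an opaque cookie; NULL if it is too large
 * or the allocation fails. The bytes are copied, never parsed.
 */
static struct ex_filter *ex_filter_new(const void *cookie, size_t cookie_len)
{
	struct ex_filter *f;

	if (cookie_len > EX_MAX_COOKIE_SIZE)
		return NULL;

	f = calloc(1, sizeof(*f) + cookie_len);
	if (!f)
		return NULL;

	f->cookie_len = cookie_len;
	if (cookie_len)
		memcpy(f->cookie, cookie, cookie_len);

	return f;
}

/* Copy the cookie back out unchanged; returns the number of bytes
 * written, or 0 if there is no cookie or the buffer is too small.
 */
static size_t ex_filter_dump_cookie(const struct ex_filter *f,
				    void *out, size_t out_len)
{
	if (!f->cookie_len || f->cookie_len > out_len)
		return 0;

	memcpy(out, f->cookie, f->cookie_len);
	return f->cookie_len;
}
```

A 16-byte value such as a UFID would round-trip through these two helpers unchanged, while anything above the size cap is rejected up front.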

Re: [PATCH net-next] net/sched: cls_flower: Add user specified data

2017-01-08 Thread Jiri Pirko

Mon, Jan 02, 2017 at 11:21:41PM CET, j...@mojatatu.com wrote:
>On 17-01-02 01:23 PM, John Fastabend wrote:
>
>> 
>> Additionally I would like to point out this is an arbitrary length binary
>> blob (for undefined use, without even a specified encoding) that gets pushed
>> between user space and hardware ;) This seemed to get folks fairly excited in
>> the past.
>> 
>
>The binary blob size is a little strange - but I think there is value
>in storing some "cookie" field. The challenge is whether the kernel
>gets to interpret it; in which case encoding must be specified. Or
>whether we should leave it up to user space - in which case something
>like tc could standardize its own encodings.

This should never be interpreted by the kernel. I think it would be good
to make this clear in a comment in the code.


>
>> Some questions, exactly what do you mean by "port mappings" above? In
>> general the 'tc' API uses the netdev the netlink msg is processed on as
>> the port mapping. If you mean OVS port to netdev port I think this is
>> an OVS problem and nothing to do with 'tc'. For what it's worth there is an
>> existing problem with 'tc' where rules only apply to a single ingress or
>> egress port which is limiting on hardware.
>> 
>
>In our case the desire is to be able to correlate for a system wide
>mostly identity/key mapping.
>
>> The UFID in my ovs code base is defined as best I can tell here,
>> 
>> [OVS_FLOW_ATTR_UFID] = { .type = NL_A_UNSPEC, .optional = true,
>>  .min_len = sizeof(ovs_u128) },
>> 
>> So you need 128 bits if you want a 1:1 mapping onto 'tc'. So rather
>> than an arbitrary blob why not make the case that 'tc' ids need to be
>> 128 bits long? Even if it's just initially done in flower call it
>> flower_flow_id and define it so it's not opaque and at least at the code
>> level it isn't an arbitrary blob of data.
>> 
>
>I don't know what this UFID is, but do note:
>The idea is not new - the FIB for example has some such cookie
>(albeit a tiny one) which will typically be populated to tell
>you who/what installed the entry.
>I could see, for example, a use for this cookie to simplify pretty printing in
>a human language for the u32 classifier (i.e. user space tc sets
>some fields in the cookie when updating the kernel and when user space
>invokes get/dump it uses the cookie to interpret how to pretty print).
>
>I have attached a compile tested version of the cookies on actions
>(flat 64 bit; now that we have experienced the use when we have a
>large number of counters - I would not mind a 128 bit field).
>
>
>cheers,
>jamal
>
>> And what are the "next" uses of this besides OVS. It would be really
>> valuable to see how this generalizes to other usage models. To avoid
>> embedding OVS syntax into 'tc'.
>> 
>> Finally if you want to see an example of binary data encodings look at
>> how drivers/hardware/users are currently using the user defined bits in
>> ethtools ntuple API. Also track down out of tree drivers to see other
>> interesting uses. And that was capped at 64bits :/
>> 
>> Thanks,
>> John
>> 
>> 
>> 
>> 
>> 
>

>diff --git a/include/net/act_api.h b/include/net/act_api.h
>index 1d71644..f299ed3 100644
>--- a/include/net/act_api.h
>+++ b/include/net/act_api.h
>@@ -41,6 +41,7 @@ struct tc_action {
>   struct rcu_head tcfa_rcu;
>   struct gnet_stats_basic_cpu __percpu *cpu_bstats;
>   struct gnet_stats_queue __percpu *cpu_qstats;
>+  u64 cookie;
> };
> #define tcf_head  common.tcfa_head
> #define tcf_index common.tcfa_index
>diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
>index cb4bcdc..2e968ee 100644
>--- a/include/uapi/linux/pkt_cls.h
>+++ b/include/uapi/linux/pkt_cls.h
>@@ -67,6 +67,7 @@ enum {
>   TCA_ACT_INDEX,
>   TCA_ACT_STATS,
>   TCA_ACT_PAD,
>+  TCA_ACT_COOKIE,
>   __TCA_ACT_MAX
> };
> 
>diff --git a/net/sched/act_api.c b/net/sched/act_api.c
>index 2095c83..97eae6b 100644
>--- a/net/sched/act_api.c
>+++ b/net/sched/act_api.c
>@@ -26,6 +26,7 @@
> #include 
> #include 
> #include 
>+#include 
> 
> static void free_tcf(struct rcu_head *head)
> {
>@@ -467,17 +468,21 @@ int tcf_action_destroy(struct list_head *actions, int 
>bind)
>   return a->ops->dump(skb, a, bind, ref);
> }
> 
>-int
>-tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind, int ref)
>+int tcf_action_dump_1(struct sk_buff *skb, struct tc_action *a, int bind,
>+int ref)
> {
>   int err = -EINVAL;
>   unsigned char *b = skb_tail_pointer(skb);
>   struct nlattr *nest;
>+  u64 cookie = a->cookie;
> 
>   if (nla_put_string(skb, TCA_KIND, a->ops->kind))
>   goto nla_put_failure;
>   if (tcf_action_copy_stats(skb, a, 0))
>   goto nla_put_failure;
>+  if (nla_put_u64_64bit(skb, TCA_ACT_COOKIE, cookie, TCA_ACT_PAD))
>+  goto nla_put_failure;
>+
>   nest = nla_nest_start(skb, TCA_OPTIONS);
>   if (nest == NULL)
>

Re: [PATCH net 1/2] net: dsa: bcm_sf2: Do not clobber b53_switch_ops

2017-01-08 Thread Andrew Lunn

On Sat, Jan 07, 2017 at 09:01:56PM -0800, Florian Fainelli wrote:
> We make the bcm_sf2 driver override ds->ops which points to
> b53_switch_ops since b53_switch_alloc() did the assignment. This is all
> well and good until a second b53 switch comes in, and ends up using the
> bcm_sf2 operations. Make a proper local copy, substitute the ds->ops
> pointer and then override the operations.
> 
> Fixes: f458995b9ad8 ("net: dsa: bcm_sf2: Utilize core B53 driver when possible")
> Signed-off-by: Florian Fainelli 

Hi Florian

There is a general trend of making ops structures const. It closes off
kernel exploits. This copying and then modifying prevents us making
ds->ops a pointer to a const.

You are already using b53_common.c as a library. Could you go further
with the concept, and export the ops you need for SF2, and have SF2
define its own ops structure? We can then swap to const ops dsa wide.

Thanks
Andrew

Re: [PATCH net 2/2] net: dsa: bcm_sf2: Utilize nested MDIO read/write

2017-01-08 Thread Andrew Lunn

On Sat, Jan 07, 2017 at 09:01:57PM -0800, Florian Fainelli wrote:
> We are implementing a MDIO bus which is behind another one, so use the
> nested version of the accessors to get lockdep annotations correct.
> 
> Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus")
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net 1/2] net: dsa: bcm_sf2: Do not clobber b53_switch_ops

2017-01-08 Thread Florian Fainelli

On 01/08/17 at 09:41, Andrew Lunn wrote:
> On Sat, Jan 07, 2017 at 09:01:56PM -0800, Florian Fainelli wrote:
>> We make the bcm_sf2 driver override ds->ops which points to
>> b53_switch_ops since b53_switch_alloc() did the assignment. This is all
>> well and good until a second b53 switch comes in, and ends up using the
>> bcm_sf2 operations. Make a proper local copy, substitute the ds->ops
>> pointer and then override the operations.
>>
>> Fixes: f458995b9ad8 ("net: dsa: bcm_sf2: Utilize core B53 driver when possible")
>> Signed-off-by: Florian Fainelli 
> 
> Hi Florian

Hi Andrew,

> 
> There is a general trend of making ops structures const. It closes off
> kernel exploits. This copying and then modifying prevents us making
> ds->ops a pointer to a const.

Agreed, and this was my initial approach, but I also wanted a minimal
fix for David to pull into "net" while we properly resolve this for
"net-next"; see below.

> 
> You are already using b53_common.c as a library. Could you go further
> with the concept, and export the ops you need for SF2, and have SF2
> define its own ops structure? We can then swap to const ops dsa wide.

Making the ops const was my initial approach but there are several
challenges to making it possible right now which I will address against
net-next:

- register/unregister_switch_driver actually do modify dsa_switch_ops
while updating the list pointer, so we need to encapsulate
dsa_switch_ops into a dsa_switch_driver plus a list member

- as you pointed out, b53 needs to export the operations to other
drivers that are going to make use of them

Thanks for your comments!
-- 
Florian
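
The minimal fix discussed in this thread (copy the shared ops table, then override entries in the copy) can be sketched outside the kernel as follows. Everything here is a hypothetical stand-in: the ex_* types only mimic the relationship between a switch instance and its operations table, and where the real patch keeps its copy is not shown in this thread.

```c
/* Hypothetical operations table shared by a family of switch drivers. */
struct ex_switch_ops {
	int (*setup)(void);
	int (*get_tag)(void);
};

/* Hypothetical switch instance with room for a private ops copy. */
struct ex_switch {
	const struct ex_switch_ops *ops;
	struct ex_switch_ops local_ops;
};

static int ex_b53_setup(void)   { return 1; }
static int ex_b53_get_tag(void) { return 10; }
static int ex_sf2_setup(void)   { return 2; }

/* The shared table can stay const because nobody writes through it. */
static const struct ex_switch_ops ex_b53_ops = {
	.setup   = ex_b53_setup,
	.get_tag = ex_b53_get_tag,
};

/* Take a private copy of the shared table, repoint this instance at
 * the copy, and override only there. Other instances that still point
 * at ex_b53_ops are unaffected.
 */
static void ex_sf2_override_ops(struct ex_switch *ds)
{
	ds->local_ops = *ds->ops;
	ds->local_ops.setup = ex_sf2_setup;
	ds->ops = &ds->local_ops;
}
```

The alternative raised in review, having the second driver define its own complete const table from exported helpers, would avoid the writable copy entirely.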

[PATCH] net: ethernet: ti: cpsw: extend limits for cpsw_get/set_ringparam

2017-01-08 Thread Ivan Khoronzhuk

Allow the number of descs to be set closer to the possible limits. The
minimum limit equals the number of channels, so that at least one desc
per channel can be set. The maximum limit leaves enough descs for the
tx channels.

Signed-off-by: Ivan Khoronzhuk 
---

Based on net-next/master

 drivers/net/ethernet/ti/cpsw.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 458298d..09e0ed6 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2474,8 +2474,7 @@ static void cpsw_get_ringparam(struct net_device *ndev,
/* not supported */
ering->tx_max_pending = 0;
ering->tx_pending = cpdma_get_num_tx_descs(cpsw->dma);
-   /* Max 90% RX buffers */
-   ering->rx_max_pending = (descs_pool_size * 9) / 10;
+   ering->rx_max_pending = descs_pool_size - CPSW_MAX_QUEUES;
ering->rx_pending = cpdma_get_num_rx_descs(cpsw->dma);
 }
 
@@ -2490,8 +2489,8 @@ static int cpsw_set_ringparam(struct net_device *ndev,
/* ignore ering->tx_pending - only rx_pending adjustment is supported */
 
if (ering->rx_mini_pending || ering->rx_jumbo_pending ||
-   ering->rx_pending < (descs_pool_size / 10) ||
-   ering->rx_pending > ((descs_pool_size * 9) / 10))
+   ering->rx_pending < CPSW_MAX_QUEUES ||
+   ering->rx_pending > (descs_pool_size - CPSW_MAX_QUEUES))
return -EINVAL;
 
if (ering->rx_pending == cpdma_get_num_rx_descs(cpsw->dma))
-- 
2.7.4
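
The new bounds can be expressed as a small standalone predicate. This is only a sketch: EX_MAX_QUEUES is an assumed value standing in for CPSW_MAX_QUEUES, and the guard against a pool smaller than twice that value is an addition for the sketch, not something the patch does.

```c
#include <stdbool.h>

#define EX_MAX_QUEUES 8	/* assumed stand-in for CPSW_MAX_QUEUES */

/* Accept rx_pending only if every channel can get at least one desc
 * and at least EX_MAX_QUEUES descs remain for the tx side.
 */
static bool ex_rx_pending_valid(unsigned int rx_pending,
				unsigned int descs_pool_size)
{
	/* Added for the sketch: avoid unsigned underflow below. */
	if (descs_pool_size < 2 * EX_MAX_QUEUES)
		return false;

	return rx_pending >= EX_MAX_QUEUES &&
	       rx_pending <= descs_pool_size - EX_MAX_QUEUES;
}
```

With a pool of 256 descs and the assumed queue count, the accepted range is 8 to 248, compared with roughly 25 to 230 under the old 10%/90% rule.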

Re: [PATCH net 1/2] net: dsa: bcm_sf2: Do not clobber b53_switch_ops

2017-01-08 Thread Andrew Lunn

> Agreed, and this was my initial approach, but I also wanted a minimal
> fix for David to pull into "net" while we can properly resolve this for
> "net-next" see below.

O.K, so in that case, this is fine.

> Making the ops const was my initial approach but there are several
> challenges to making it possible right now which I will address against
> net-next:
> 
> - register/unregister_switch_driver actually do modify dsa_switch_ops
> while updating the list pointer, so we need to encapsulate
> dsa_switch_ops into a dsa_switch_driver plus a list member

O.K, this is dsa v1. I had v2 in mind. Yes, the list needs
abstracting.

Thanks
Andrew

Re: [PATCH net 1/2] net: dsa: bcm_sf2: Do not clobber b53_switch_ops

2017-01-08 Thread Andrew Lunn

On Sat, Jan 07, 2017 at 09:01:56PM -0800, Florian Fainelli wrote:
> We make the bcm_sf2 driver override ds->ops which points to
> b53_switch_ops since b53_switch_alloc() did the assignment. This is all
> well and good until a second b53 switch comes in, and ends up using the
> bcm_sf2 operations. Make a proper local copy, substitute the ds->ops
> pointer and then override the operations.
> 
> Fixes: f458995b9ad8 ("net: dsa: bcm_sf2: Utilize core B53 driver when possible")
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net] bpf: change back to orig prog on too many passes

2017-01-08 Thread David Miller

From: Daniel Borkmann 
Date: Sat,  7 Jan 2017 00:26:33 +0100

> If after too many passes still no image could be emitted, then
> swap back to the original program as we do in all other cases
> and don't use the one with blinding.
> 
> Fixes: 959a75791603 ("bpf, x86: add support for constant blinding")
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 

Applied and queued up for -stable, thanks Daniel.

patch 4.8 "net: handle no dst on skb in icmp6_send"

2017-01-08 Thread Bronek Kozicki


Hello,

Is there any particular reason why this fix
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=79dc7e3f1cd323be4c81aa1a94faa1b3ed987fb2
was missed from the stable 4.8 line? Apparently the bug being fixed has its
own CVE: https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-9919


Thank you for your hard work and best regards


B.

Re: [PATCH net-next] liquidio: store the L4 hash of rx packets in skb

2017-01-08 Thread David Miller

From: Felix Manlunas 
Date: Fri, 6 Jan 2017 16:55:42 -0800

>  
> + if (rh->r_dh.has_hash) {
> + u32 hash = be32_to_cpu(*(u32 *)(skb->data + r_dh_off));

Is the checksum defined to be in the first 4-bytes of the 8-byte DHLEN unit,
or the second 4-bytes?  Is the answer to this question endian-dependent?
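
As background to the endianness question, here is a sketch of reading a 32-bit big-endian value from a byte buffer in a way that does not depend on host byte order. It does not answer which 4-byte half of the 8-byte unit holds the hash; that is exactly the open question, and the offsets used with this helper are the caller's assumption. ex_read_be32() is a hypothetical name, not a kernel or driver function.

```c
#include <stdint.h>

/* Assemble a 32-bit value from four bytes stored most significant
 * first. Works identically on little- and big-endian hosts because it
 * never reinterprets the buffer as a native integer.
 */
static uint32_t ex_read_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) |
	       ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8)  |
	       (uint32_t)p[3];
}
```

If the device writes the 8-byte unit as a single 64-bit quantity, the half that holds the hash in memory may differ from what a byte-wise description suggests, which is why the layout needs to be stated explicitly.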

Re: [PATCH net-next] liquidio: simplify octeon_flush_iq()

2017-01-08 Thread David Miller

From: Felix Manlunas 
Date: Fri, 6 Jan 2017 17:16:12 -0800

> From: Derek Chickles 
> 
> Because every call to octeon_flush_iq() has a hardcoded 1 for the
> pending_thresh argument, simplify that function by removing that argument.
> This avoids one atomic read as well.
> 
> Signed-off-by: Derek Chickles 
> Signed-off-by: Felix Manlunas 
> Signed-off-by: Satanand Burla 

Applied.

Re: [PATCH net-next] net: ipv4: Remove flow arg from ip_mkroute_input

2017-01-08 Thread David Miller

From: David Ahern 
Date: Fri,  6 Jan 2017 17:39:58 -0800

> fl4 arg is not used; remove it.
> 
> Signed-off-by: David Ahern 

Applied.

Re: [PATCH net-next] net: ipmr: Remove nowait arg to ipmr_get_route

2017-01-08 Thread David Miller

From: David Ahern 
Date: Fri,  6 Jan 2017 17:39:06 -0800

> ipmr_get_route has 1 caller and the nowait arg is 0. Remove the arg and
> simplify ipmr_get_route accordingly.
> 
> Signed-off-by: David Ahern 

Applied.

Re: [net-next 0/8][pull request] 100GbE Intel Wired LAN Driver Updates 2017-01-08

2017-01-08 Thread David Miller

From: Jeff Kirsher 
Date: Sun,  8 Jan 2017 02:10:26 -0800

> This series contains updates to fm10k only.

Pulled, thanks Jeff.

Re: [PATCH v3 3/3] nfc: trf7970a: Prevent repeated polling from crashing the kernel

2017-01-08 Thread Geoff Lansberry

On Tue, Jan 3, 2017 at 4:21 PM, Mark Greer  wrote:
> On Tue, Jan 03, 2017 at 01:35:18PM -0500, Geoff Lansberry wrote:
>> On Tue, Jan 3, 2017 at 11:33 AM, Mark Greer  wrote:
>> > On Tue, Dec 27, 2016 at 09:18:32AM -0500, Geoff Lansberry wrote:
>
>> >> In the meantime - here is some more info about how we use it.
>> >>
>> >> We do use NFC structures.  I did find an interesting clue in that
>> >> there are certain bottles that cause neard to segfault,  I'm not sure
>> >> what is different about them.  We write a string, like
>> >> "coppola_chardonnay_2015" to the bottles.
>> >
>> > Off the top of my head, it could be the length of the text.
>> > It would be useful to compare the data that works to the data
>> > that doesn't work.  Can you install NXP's 'TagInfo' app on a
>> > smartphone and scan tags with working & non-working data?
>> > You can email the data from the app to yourself, edit out
>> > the cruft, and share here.
>>
>> The data is always the same - and the tags are all the same.  Only
>> difference is that the tag is physically different, and perhaps
>> orientation; distance from antenna to tag is fixed.
>
> Interesting...  They're all type 2 tags, right?

Yes type 2.

>
>> I can't even
>> write the tags at all, so reading them will show blank.   Also a minor
>> but significant detail, is that the tags are embedded in such a way
>> that the phone cannot get close enough to them to connect.
>
> This section had me completely confused for a couple minutes until I realized
> that you mean that you can read & write the tags using the trf7970a with
> an attached antenna but not with your phone.  Is that correct?

Correct, due to the physical arrangement of the part the tag is embedded in.

>
> If so, try a tag that isn't embedded in something else and move it around
> the back of the phone.  Try to find where it works best.  The phone
> manufacturers are notorious for paying little attention to the NFC antenna
> they put on their products.  For example, I have a Samsung S5 next to me
> and it seems to work best around the center of the phone.  I've used others
> where I had to use the upper-left or upper-right corner of the phone.

I can borrow a phone and try; I do have some other tags.  This will
take me some time, and I'm not optimistic that we will learn much, other
than that the tag was not programmed when it does not work.  Don't wait
on this answer.
>
> Mark
> --

Re: patch 4.8 "net: handle no dst on skb in icmp6_send"

2017-01-08 Thread David Miller

From: Bronek Kozicki 
Date: Sun, 8 Jan 2017 21:46:18 +

> Hello,
> 
> any particular reason why this fix
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=79dc7e3f1cd323be4c81aa1a94faa1b3ed987fb2
> was missed from stable 4.8 line? Apparently the bug being fixed has
> its own https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-9919
> 
> Thank you for your hard work and best regards

You should always check the networking -stable queue before asking
such questions:

http://patchwork.ozlabs.org/bundle/davem/stable/?submitter=&state=*&q=&archive=

Every patch sitting there is queued up and will be submitted to -stable
at some time in the next week or two, or whenever I get around to vetting
and submitting -stable changes.

The patch you are asking about is in fact in there, and will be attended
to at an appropriate time.

Thanks.

[PATCH net-next 3/4] net: dsa: Encapsulate legacy switch drivers into dsa_switch_driver

2017-01-08 Thread Florian Fainelli

In preparation for making struct dsa_switch_ops const, encapsulate it
within a dsa_switch_driver which has a list pointer and a pointer to
dsa_switch_ops. This allows us to take the list_head pointer out of
dsa_switch_ops, which is written to by {un,}register_switch_driver.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/mv88e6060.c  |  8 ++--
 drivers/net/dsa/mv88e6xxx/chip.c |  8 ++--
 include/net/dsa.h| 11 +++
 net/dsa/dsa.c| 12 +++-
 4 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index 7ce36dbd9b62..bcbd6dcbd8e8 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -261,16 +261,20 @@ static struct dsa_switch_ops mv88e6060_switch_ops = {
.phy_write  = mv88e6060_phy_write,
 };
 
+static struct dsa_switch_driver mv88e6060_switch_drv = {
+   .ops= &mv88e6060_switch_ops,
+};
+
 static int __init mv88e6060_init(void)
 {
-   register_switch_driver(&mv88e6060_switch_ops);
+   register_switch_driver(&mv88e6060_switch_drv);
return 0;
 }
 module_init(mv88e6060_init);
 
 static void __exit mv88e6060_cleanup(void)
 {
-   unregister_switch_driver(&mv88e6060_switch_ops);
+   unregister_switch_driver(&mv88e6060_switch_drv);
 }
 module_exit(mv88e6060_cleanup);
 
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index 676b0e2ad221..d43d12c281b3 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4403,6 +4403,10 @@ static struct dsa_switch_ops mv88e6xxx_switch_ops = {
.port_mdb_dump  = mv88e6xxx_port_mdb_dump,
 };
 
+static struct dsa_switch_driver mv88e6xxx_switch_drv = {
+   .ops= &mv88e6xxx_switch_ops,
+};
+
 static int mv88e6xxx_register_switch(struct mv88e6xxx_chip *chip,
 struct device_node *np)
 {
@@ -4565,7 +4569,7 @@ static struct mdio_driver mv88e6xxx_driver = {
 
 static int __init mv88e6xxx_init(void)
 {
-   register_switch_driver(&mv88e6xxx_switch_ops);
+   register_switch_driver(&mv88e6xxx_switch_drv);
return mdio_driver_register(&mv88e6xxx_driver);
 }
 module_init(mv88e6xxx_init);
@@ -4573,7 +4577,7 @@ module_init(mv88e6xxx_init);
 static void __exit mv88e6xxx_cleanup(void)
 {
mdio_driver_unregister(&mv88e6xxx_driver);
-   unregister_switch_driver(&mv88e6xxx_switch_ops);
+   unregister_switch_driver(&mv88e6xxx_switch_drv);
 }
 module_exit(mv88e6xxx_cleanup);
 
diff --git a/include/net/dsa.h b/include/net/dsa.h
index b122196d5a1f..edfa9b130953 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -240,8 +240,6 @@ struct switchdev_obj_port_mdb;
 struct switchdev_obj_port_vlan;
 
 struct dsa_switch_ops {
-   struct list_headlist;
-
/*
 * Probing and setup.
 */
@@ -390,8 +388,13 @@ struct dsa_switch_ops {
 int (*cb)(struct switchdev_obj *obj));
 };
 
-void register_switch_driver(struct dsa_switch_ops *type);
-void unregister_switch_driver(struct dsa_switch_ops *type);
+struct dsa_switch_driver {
+   struct list_headlist;
+   struct dsa_switch_ops   *ops;
+};
+
+void register_switch_driver(struct dsa_switch_driver *type);
+void unregister_switch_driver(struct dsa_switch_driver *type);
 struct mii_bus *dsa_host_dev_to_mii_bus(struct device *dev);
 
 static inline bool dsa_uses_tagged_protocol(struct dsa_switch_tree *dst)
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index cda787ebad15..4e7bc57cdae5 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -60,18 +60,18 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = {
 static DEFINE_MUTEX(dsa_switch_drivers_mutex);
 static LIST_HEAD(dsa_switch_drivers);
 
-void register_switch_driver(struct dsa_switch_ops *ops)
+void register_switch_driver(struct dsa_switch_driver *drv)
 {
mutex_lock(&dsa_switch_drivers_mutex);
-   list_add_tail(&ops->list, &dsa_switch_drivers);
+   list_add_tail(&drv->list, &dsa_switch_drivers);
mutex_unlock(&dsa_switch_drivers_mutex);
 }
 EXPORT_SYMBOL_GPL(register_switch_driver);
 
-void unregister_switch_driver(struct dsa_switch_ops *ops)
+void unregister_switch_driver(struct dsa_switch_driver *drv)
 {
mutex_lock(&dsa_switch_drivers_mutex);
-   list_del_init(&ops->list);
+   list_del_init(&drv->list);
mutex_unlock(&dsa_switch_drivers_mutex);
 }
 EXPORT_SYMBOL_GPL(unregister_switch_driver);
@@ -90,8 +90,10 @@ dsa_switch_probe(struct device *parent, struct device *host_dev, int sw_addr,
mutex_lock(&dsa_switch_drivers_mutex);
list_for_each(list, &dsa_switch_drivers) {
struct dsa_switch_ops *ops;
+   struct dsa_switch_driver *drv;
 
-   ops = list_entry(list, struct dsa_switch_ops, list);
+   drv = list_entry(list, struct dsa_switch_driver, list);

[PATCH net-next 4/4] net: dsa: Make dsa_switch_ops const

2017-01-08 Thread Florian Fainelli

Now that we have properly encapsulated and made drivers utilize exported
functions, we can switch dsa_switch_ops to be annotated with const.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c |  2 +-
 drivers/net/dsa/bcm_sf2.c|  2 +-
 drivers/net/dsa/mv88e6060.c  |  2 +-
 drivers/net/dsa/mv88e6xxx/chip.c |  2 +-
 drivers/net/dsa/qca8k.c  |  2 +-
 include/net/dsa.h|  4 ++--
 net/dsa/dsa.c| 10 +-
 net/dsa/hwmon.c  |  2 +-
 8 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index a448661b55c6..5102a3701a1a 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1453,7 +1453,7 @@ static enum dsa_tag_protocol b53_get_tag_protocol(struct dsa_switch *ds)
return DSA_TAG_PROTO_NONE;
 }
 
-static struct dsa_switch_ops b53_switch_ops = {
+static const struct dsa_switch_ops b53_switch_ops = {
.get_tag_protocol   = b53_get_tag_protocol,
.setup  = b53_setup,
.get_strings= b53_get_strings,
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index fcfc2cb5f3cd..4e7581788465 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -977,7 +977,7 @@ static struct b53_io_ops bcm_sf2_io_ops = {
.write64 = bcm_sf2_core_write64,
 };
 
-static struct dsa_switch_ops bcm_sf2_ops = {
+static const struct dsa_switch_ops bcm_sf2_ops = {
.get_tag_protocol   = bcm_sf2_sw_get_tag_protocol,
.setup  = bcm_sf2_sw_setup,
.get_strings= b53_get_strings,
diff --git a/drivers/net/dsa/mv88e6060.c b/drivers/net/dsa/mv88e6060.c
index bcbd6dcbd8e8..5934b7a4c448 100644
--- a/drivers/net/dsa/mv88e6060.c
+++ b/drivers/net/dsa/mv88e6060.c
@@ -252,7 +252,7 @@ mv88e6060_phy_write(struct dsa_switch *ds, int port, int regnum, u16 val)
return reg_write(ds, addr, regnum, val);
 }
 
-static struct dsa_switch_ops mv88e6060_switch_ops = {
+static const struct dsa_switch_ops mv88e6060_switch_ops = {
.get_tag_protocol = mv88e6060_get_tag_protocol,
.probe  = mv88e6060_drv_probe,
.setup  = mv88e6060_setup,
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index d43d12c281b3..eea8e0176e33 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -4361,7 +4361,7 @@ static int mv88e6xxx_port_mdb_dump(struct dsa_switch *ds, int port,
return err;
 }
 
-static struct dsa_switch_ops mv88e6xxx_switch_ops = {
+static const struct dsa_switch_ops mv88e6xxx_switch_ops = {
.probe  = mv88e6xxx_drv_probe,
.get_tag_protocol   = mv88e6xxx_get_tag_protocol,
.setup  = mv88e6xxx_setup,
diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
index b3df70d07ff6..54d270d59eb0 100644
--- a/drivers/net/dsa/qca8k.c
+++ b/drivers/net/dsa/qca8k.c
@@ -911,7 +911,7 @@ qca8k_get_tag_protocol(struct dsa_switch *ds)
return DSA_TAG_PROTO_QCA;
 }
 
-static struct dsa_switch_ops qca8k_switch_ops = {
+static const struct dsa_switch_ops qca8k_switch_ops = {
.get_tag_protocol   = qca8k_get_tag_protocol,
.setup  = qca8k_setup,
.get_strings= qca8k_get_strings,
diff --git a/include/net/dsa.h b/include/net/dsa.h
index edfa9b130953..b94d1f2ef912 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -169,7 +169,7 @@ struct dsa_switch {
/*
 * The switch operations.
 */
-   struct dsa_switch_ops   *ops;
+   const struct dsa_switch_ops *ops;
 
/*
 * An array of which element [a] indicates which port on this
@@ -390,7 +390,7 @@ struct dsa_switch_ops {
 
 struct dsa_switch_driver {
struct list_headlist;
-   struct dsa_switch_ops   *ops;
+   const struct dsa_switch_ops *ops;
 };
 
 void register_switch_driver(struct dsa_switch_driver *type);
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 4e7bc57cdae5..fd532487dfdf 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -76,11 +76,11 @@ void unregister_switch_driver(struct dsa_switch_driver *drv)
 }
 EXPORT_SYMBOL_GPL(unregister_switch_driver);
 
-static struct dsa_switch_ops *
+static const struct dsa_switch_ops *
 dsa_switch_probe(struct device *parent, struct device *host_dev, int sw_addr,
 const char **_name, void **priv)
 {
-   struct dsa_switch_ops *ret;
+   const struct dsa_switch_ops *ret;
struct list_head *list;
const char *name;
 
@@ -89,7 +89,7 @@ dsa_switch_probe(struct device *parent, struct device *host_dev, int sw_addr,
 
mutex_lock(&dsa_switch_drivers_mutex);
list_for_each(list, &dsa_switch_drivers) {
-   struct dsa_switch_ops *ops;
+   const struct dsa_switch_ops *ops;

[PATCH net-next 2/4] net: dsa: bcm_sf2: Declare our own dsa_switch_ops

2017-01-08 Thread Florian Fainelli

Utilize the b53 exported functions to fill our bcm_sf2_ops structure,
also making it clear what we utilize and what we specifically override.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 53 +--
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 9ec33b51a0ed..fcfc2cb5f3cd 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -977,6 +977,38 @@ static struct b53_io_ops bcm_sf2_io_ops = {
.write64 = bcm_sf2_core_write64,
 };
 
+static struct dsa_switch_ops bcm_sf2_ops = {
+   .get_tag_protocol   = bcm_sf2_sw_get_tag_protocol,
+   .setup  = bcm_sf2_sw_setup,
+   .get_strings= b53_get_strings,
+   .get_ethtool_stats  = b53_get_ethtool_stats,
+   .get_sset_count = b53_get_sset_count,
+   .get_phy_flags  = bcm_sf2_sw_get_phy_flags,
+   .adjust_link= bcm_sf2_sw_adjust_link,
+   .fixed_link_update  = bcm_sf2_sw_fixed_link_update,
+   .suspend= bcm_sf2_sw_suspend,
+   .resume = bcm_sf2_sw_resume,
+   .get_wol= bcm_sf2_sw_get_wol,
+   .set_wol= bcm_sf2_sw_set_wol,
+   .port_enable= bcm_sf2_port_setup,
+   .port_disable   = bcm_sf2_port_disable,
+   .get_eee= bcm_sf2_sw_get_eee,
+   .set_eee= bcm_sf2_sw_set_eee,
+   .port_bridge_join   = b53_br_join,
+   .port_bridge_leave  = b53_br_leave,
+   .port_stp_state_set = b53_br_set_stp_state,
+   .port_fast_age  = b53_br_fast_age,
+   .port_vlan_filtering= b53_vlan_filtering,
+   .port_vlan_prepare  = b53_vlan_prepare,
+   .port_vlan_add  = b53_vlan_add,
+   .port_vlan_del  = b53_vlan_del,
+   .port_vlan_dump = b53_vlan_dump,
+   .port_fdb_prepare   = b53_fdb_prepare,
+   .port_fdb_dump  = b53_fdb_dump,
+   .port_fdb_add   = b53_fdb_add,
+   .port_fdb_del   = b53_fdb_del,
+};
+
 static int bcm_sf2_sw_probe(struct platform_device *pdev)
 {
const char *reg_names[BCM_SF2_REGS_NUM] = BCM_SF2_REGS_NAME;
@@ -1012,26 +1044,7 @@ static int bcm_sf2_sw_probe(struct platform_device *pdev)
 
priv->dev = dev;
ds = dev->ds;
-
-   /* Override the parts that are non-standard wrt. normal b53 devices */
-   ds->ops->get_tag_protocol = bcm_sf2_sw_get_tag_protocol;
-   ds->ops->setup = bcm_sf2_sw_setup;
-   ds->ops->get_phy_flags = bcm_sf2_sw_get_phy_flags;
-   ds->ops->adjust_link = bcm_sf2_sw_adjust_link;
-   ds->ops->fixed_link_update = bcm_sf2_sw_fixed_link_update;
-   ds->ops->suspend = bcm_sf2_sw_suspend;
-   ds->ops->resume = bcm_sf2_sw_resume;
-   ds->ops->get_wol = bcm_sf2_sw_get_wol;
-   ds->ops->set_wol = bcm_sf2_sw_set_wol;
-   ds->ops->port_enable = bcm_sf2_port_setup;
-   ds->ops->port_disable = bcm_sf2_port_disable;
-   ds->ops->get_eee = bcm_sf2_sw_get_eee;
-   ds->ops->set_eee = bcm_sf2_sw_set_eee;
-
-   /* Avoid having DSA free our slave MDIO bus (checking for
-* ds->slave_mii_bus and ds->ops->phy_read being non-NULL)
-*/
-   ds->ops->phy_read = NULL;
+   ds->ops = &bcm_sf2_ops;
 
dev_set_drvdata(&pdev->dev, priv);
 
-- 
2.9.3

[PATCH net-next 1/4] net: dsa: b53: Export most operations to other drivers

2017-01-08 Thread Florian Fainelli

In preparation for making dsa_switch_ops const, export b53 operations
utilized by other drivers such as bcm_sf2.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 79 +++-
 drivers/net/dsa/b53/b53_priv.h   | 33 +
 2 files changed, 79 insertions(+), 33 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index d5370c227043..a448661b55c6 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -712,7 +712,7 @@ static unsigned int b53_get_mib_size(struct b53_device *dev)
return B53_MIBS_SIZE;
 }
 
-static void b53_get_strings(struct dsa_switch *ds, int port, uint8_t *data)
+void b53_get_strings(struct dsa_switch *ds, int port, uint8_t *data)
 {
struct b53_device *dev = ds->priv;
const struct b53_mib_desc *mibs = b53_get_mib(dev);
@@ -723,9 +723,9 @@ static void b53_get_strings(struct dsa_switch *ds, int port, uint8_t *data)
memcpy(data + i * ETH_GSTRING_LEN,
   mibs[i].name, ETH_GSTRING_LEN);
 }
+EXPORT_SYMBOL(b53_get_strings);
 
-static void b53_get_ethtool_stats(struct dsa_switch *ds, int port,
- uint64_t *data)
+void b53_get_ethtool_stats(struct dsa_switch *ds, int port, uint64_t *data)
 {
struct b53_device *dev = ds->priv;
const struct b53_mib_desc *mibs = b53_get_mib(dev);
@@ -756,13 +756,15 @@ static void b53_get_ethtool_stats(struct dsa_switch *ds, int port,
 
mutex_unlock(&dev->stats_mutex);
 }
+EXPORT_SYMBOL(b53_get_ethtool_stats);
 
-static int b53_get_sset_count(struct dsa_switch *ds)
+int b53_get_sset_count(struct dsa_switch *ds)
 {
struct b53_device *dev = ds->priv;
 
return b53_get_mib_size(dev);
 }
+EXPORT_SYMBOL(b53_get_sset_count);
 
 static int b53_setup(struct dsa_switch *ds)
 {
@@ -921,15 +923,15 @@ static void b53_adjust_link(struct dsa_switch *ds, int port,
}
 }
 
-static int b53_vlan_filtering(struct dsa_switch *ds, int port,
- bool vlan_filtering)
+int b53_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering)
 {
return 0;
 }
+EXPORT_SYMBOL(b53_vlan_filtering);
 
-static int b53_vlan_prepare(struct dsa_switch *ds, int port,
-   const struct switchdev_obj_port_vlan *vlan,
-   struct switchdev_trans *trans)
+int b53_vlan_prepare(struct dsa_switch *ds, int port,
+const struct switchdev_obj_port_vlan *vlan,
+struct switchdev_trans *trans)
 {
struct b53_device *dev = ds->priv;
 
@@ -943,10 +945,11 @@ static int b53_vlan_prepare(struct dsa_switch *ds, int port,
 
return 0;
 }
+EXPORT_SYMBOL(b53_vlan_prepare);
 
-static void b53_vlan_add(struct dsa_switch *ds, int port,
-const struct switchdev_obj_port_vlan *vlan,
-struct switchdev_trans *trans)
+void b53_vlan_add(struct dsa_switch *ds, int port,
+ const struct switchdev_obj_port_vlan *vlan,
+ struct switchdev_trans *trans)
 {
struct b53_device *dev = ds->priv;
bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
@@ -977,9 +980,10 @@ static void b53_vlan_add(struct dsa_switch *ds, int port,
b53_fast_age_vlan(dev, vid);
}
 }
+EXPORT_SYMBOL(b53_vlan_add);
 
-static int b53_vlan_del(struct dsa_switch *ds, int port,
-   const struct switchdev_obj_port_vlan *vlan)
+int b53_vlan_del(struct dsa_switch *ds, int port,
+const struct switchdev_obj_port_vlan *vlan)
 {
struct b53_device *dev = ds->priv;
bool untagged = vlan->flags & BRIDGE_VLAN_INFO_UNTAGGED;
@@ -1015,10 +1019,11 @@ static int b53_vlan_del(struct dsa_switch *ds, int port,
 
return 0;
 }
+EXPORT_SYMBOL(b53_vlan_del);
 
-static int b53_vlan_dump(struct dsa_switch *ds, int port,
-struct switchdev_obj_port_vlan *vlan,
-int (*cb)(struct switchdev_obj *obj))
+int b53_vlan_dump(struct dsa_switch *ds, int port,
+ struct switchdev_obj_port_vlan *vlan,
+ int (*cb)(struct switchdev_obj *obj))
 {
struct b53_device *dev = ds->priv;
u16 vid, vid_start = 0, pvid;
@@ -1057,6 +1062,7 @@ static int b53_vlan_dump(struct dsa_switch *ds, int port,
 
return err;
 }
+EXPORT_SYMBOL(b53_vlan_dump);
 
 /* Address Resolution Logic routines */
 static int b53_arl_op_wait(struct b53_device *dev)
@@ -1175,9 +1181,9 @@ static int b53_arl_op(struct b53_device *dev, int op, int port,
return b53_arl_rw_op(dev, 0);
 }
 
-static int b53_fdb_prepare(struct dsa_switch *ds, int port,
-  const struct switchdev_obj_port_fdb *fdb,
-  struct switchdev_trans *trans)
+int b53_fdb_prepare(struct dsa_switch *ds, int port,
+   const str

[PATCH net-next 0/4] net: dsa: Make dsa_switch_ops const

2017-01-08 Thread Florian Fainelli

Hi all,

This patch series allows us to annotate dsa_switch_ops with a const
qualifier.

Florian Fainelli (4):
  net: dsa: b53: Export most operations to other drivers
  net: dsa: bcm_sf2: Declare our own dsa_switch_ops
  net: dsa: Encapsulate legacy switch drivers into dsa_switch_driver
  net: dsa: Make dsa_switch_ops const

 drivers/net/dsa/b53/b53_common.c | 81 +++-
 drivers/net/dsa/b53/b53_priv.h   | 33 
 drivers/net/dsa/bcm_sf2.c| 53 --
 drivers/net/dsa/mv88e6060.c  | 10 +++--
 drivers/net/dsa/mv88e6xxx/chip.c | 10 +++--
 drivers/net/dsa/qca8k.c  |  2 +-
 include/net/dsa.h| 13 ---
 net/dsa/dsa.c| 22 ++-
 net/dsa/hwmon.c  |  2 +-
 9 files changed, 149 insertions(+), 77 deletions(-)

-- 
2.9.3
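
As a rough userspace sketch of the idea behind patches 3/4 and 4/4, assuming nothing beyond what the commit messages state: the mutable list linkage moves into a small wrapper, so the operations table itself can be const. The kernel code uses struct list_head and list_add_tail() under a mutex; this sketch substitutes a plain singly linked list that prepends, with no locking, and every demo_* and chip_a_* name is hypothetical.

```c
#include <stddef.h>

/* Read-only operations table (stand-in for struct dsa_switch_ops). */
struct demo_ops {
	const char *(*probe)(int addr);
};

/* Wrapper holding the only field registration has to write to. */
struct demo_driver {
	struct demo_driver *next;   /* mutable list linkage */
	const struct demo_ops *ops; /* never written through */
};

static struct demo_driver *demo_drivers;

static void demo_register(struct demo_driver *drv)
{
	drv->next = demo_drivers;
	demo_drivers = drv;
}

static void demo_unregister(struct demo_driver *drv)
{
	struct demo_driver **pp;

	for (pp = &demo_drivers; *pp; pp = &(*pp)->next) {
		if (*pp == drv) {
			*pp = drv->next;
			drv->next = NULL;
			return;
		}
	}
}

/* Walk the registered drivers and return the first ops table whose
 * probe() recognises the address, or NULL if none does.
 */
static const struct demo_ops *demo_probe(int addr, const char **name)
{
	struct demo_driver *drv;

	for (drv = demo_drivers; drv; drv = drv->next) {
		const char *n = drv->ops->probe(addr);

		if (n) {
			*name = n;
			return drv->ops;
		}
	}
	return NULL;
}

static const char *chip_a_probe(int addr)
{
	return addr == 1 ? "chip-a" : NULL;
}

static const struct demo_ops chip_a_ops = { .probe = chip_a_probe };
static struct demo_driver chip_a_drv = { .ops = &chip_a_ops };
```

Because registration only touches the wrapper, chip_a_ops can live in read-only data, which is what the const annotation in the series is after.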

Re: [PATCH net-next v2 1/2] net: make ndo_get_stats64 a void function

2017-01-08 Thread David Miller

From: Stephen Hemminger 
Date: Fri,  6 Jan 2017 19:12:52 -0800

> The network device operation for reading statistics is only called
> in one place, and it ignores the return value. Having a structure
> return value is potentially confusing because some future driver could
> incorrectly assume that the return value was used.
> 
> Fix all drivers with ndo_get_stats64 to have a void function.
> 
> Signed-off-by: Stephen Hemminger 

Applied.

Re: [PATCH net-next 2/2] net: remove useless memset's in drivers get_stats64

2017-01-08 Thread David Miller

From: Stephen Hemminger 
Date: Fri,  6 Jan 2017 19:12:53 -0800

> In dev_get_stats() the statistic structure storage has already been
> zeroed. Therefore network drivers do not need to call memset() again.
> 
> Signed-off-by: Stephen Hemminger 

Applied.
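
The two changes above can be pictured together with a small userspace sketch. It assumes only what the commit messages say: the single caller zeroes the storage and ignores any return value from the driver hook. All demo_* names are hypothetical and this is not the dev_get_stats() implementation.

```c
#include <string.h>
#include <stdint.h>

struct demo_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

struct demo_dev;

struct demo_dev_ops {
	/* void: the one caller never looked at a return value anyway */
	void (*get_stats64)(struct demo_dev *dev, struct demo_stats *s);
};

struct demo_dev {
	const struct demo_dev_ops *ops;
	uint64_t rx;
};

/* Driver hook: fills in what it tracks and does not memset(), because
 * the caller has already cleared the structure.
 */
static void demo_drv_get_stats64(struct demo_dev *dev, struct demo_stats *s)
{
	s->rx_packets = dev->rx;
}

static const struct demo_dev_ops demo_drv_ops = {
	.get_stats64 = demo_drv_get_stats64,
};

/* Single call site: zero the storage once, then let the driver fill it. */
static struct demo_stats *demo_get_stats(struct demo_dev *dev,
					 struct demo_stats *storage)
{
	memset(storage, 0, sizeof(*storage));
	dev->ops->get_stats64(dev, storage);
	return storage;
}
```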

Re: [PATCH net-next] mdio: Demote print from info to debug in mdio_device_register

2017-01-08 Thread David Miller

From: Florian Fainelli 
Date: Fri,  6 Jan 2017 22:27:59 -0800

> While it is useful to know which MDIO device is being registered, demote
> the dev_info() to a dev_dbg().
> 
> Signed-off-by: Florian Fainelli 

Applied.

Re: [PATCH v2] phy state machine: failsafe leave invalid RUNNING state

2017-01-08 Thread David Miller

From: Zefir Kurtisi 
Date: Fri,  6 Jan 2017 12:14:48 +0100

> While in RUNNING state, phy_state_machine() checks for link changes by
> comparing phydev->link before and after calling phy_read_status().
> This works as long as it is guaranteed that phydev->link is never
> changed outside the phy_state_machine().
> 
> If in some setups this happens, it causes the state machine to miss
> a link loss and remain RUNNING despite phydev->link being 0.
> 
> This has been observed running a dsa setup with a process continuously
> polling the link states over ethtool each second (SNMPD RFC-1213
> agent). Disconnecting the link on a phy followed by a ETHTOOL_GSET
> causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
> call phy_read_status() and with that modify the link status - and
> with that bricking the phy state machine.
> 
> This patch adds a fail-safe check while in RUNNING, which causes to
> move to CHANGELINK when the link is gone and we are still RUNNING.
> 
> Signed-off-by: Zefir Kurtisi 
> ---
> Changes to v1:
> * fix kbuild test robot error: use phydev_err instead of dev_warn
>   (adapt to changed struct phy_device after 4.4.21)

Florian and Andrew, please provide some feedback on this.

Thank you.
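
To make the failure mode in the quoted description easier to follow, here is a toy model, not the phylib code. It assumes a cached link flag that an outside caller can refresh between state machine runs; the demo_* names and the failsafe parameter are hypothetical.

```c
enum demo_phy_state {
	DEMO_PHY_RUNNING,
	DEMO_PHY_CHANGELINK,
};

struct demo_phy {
	enum demo_phy_state state;
	int link;    /* cached link flag, also updated by outside callers */
	int hw_link; /* stands in for what a status read returns */
};

/* Refresh the cached flag; in the report this is also reached from the
 * ethtool path, outside the state machine.
 */
static void demo_read_status(struct demo_phy *phy)
{
	phy->link = phy->hw_link;
}

/* One RUNNING step. Without the fail-safe, a link loss is only noticed
 * when the cached flag changes across this particular read.
 */
static void demo_state_machine_step(struct demo_phy *phy, int failsafe)
{
	int old_link;

	if (phy->state != DEMO_PHY_RUNNING)
		return;

	old_link = phy->link;
	demo_read_status(phy);

	if (old_link != phy->link)
		phy->state = DEMO_PHY_CHANGELINK;
	else if (failsafe && !phy->link)
		phy->state = DEMO_PHY_CHANGELINK; /* RUNNING with no link */
}
```

If an outside reader clears the cached flag first, the step without the fail-safe sees no change and stays in RUNNING; with the extra check it moves to CHANGELINK.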

[PATCH net-next v2] net: dsa: make "label" property optional for dsa2

2017-01-08 Thread Vivien Didelot

In the new DTS bindings for DSA (dsa2), the "ethernet" and "link"
phandles are respectively mandatory and exclusive to CPU port and DSA
link device tree nodes.

Simplify dsa2.c a bit by checking the presence of such phandle instead
of checking the redundant "label" property.

Then the Linux philosophy for Ethernet switch ports is to expose them to
userspace as standard NICs by default. Thus use the standard enumerated
"eth%d" device name if no "label" property is provided for a user port.
This spares DTS files from carrying subjective net device names.

Here's an example on a ZII Dev Rev B board without "label" properties:

# ip link | grep ': ' | cut -d: -f2
 lo
 eth0
 eth1
 eth2@eth1
 eth3@eth1
 eth4@eth1
 eth5@eth1
 eth6@eth1
 eth7@eth1
 eth8@eth1
 eth9@eth1
 eth10@eth1
 eth11@eth1
 eth12@eth1

If one wants to rename an interface, udev rules can be used as usual, as
suggested in the switchdev documentation:

# cat /etc/udev/rules.d/90-net-dsa.rules
SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", NAME="sw$attr{phys_switch_id}p$attr{phys_port_id}"

# ip link | awk '/@eth/ { split($2,a,"@"); print a[1]; }'
swp00
swp01
swp02
sw0100p00
sw0100p01
sw0100p02
sw0200p00
sw0200p01
sw0200p02
sw0200p03
sw0200p04

Until the printing of netdev_phys_item_id structures is fixed in
net/core/net-sysfs.c, an external helper can be used like this:

# cat /etc/udev/rules.d/90-net-dsa.rules
SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", PROGRAM="/lib/udev/dsanitizer $attr{phys_switch_id} $attr{phys_port_id}", NAME="$result"

# cat /lib/udev/dsanitizer
#!/bin/sh
echo $1 | sed -e 's,^0*,,' -e 's,0*$,,' | xargs printf sw%d
echo $2 | sed -e 's,^0*,,' | xargs printf p%d

# ip link | awk '/@eth/ { split($2,a,"@"); print a[1]; }'
sw0p0
sw0p1
sw0p2
sw1p0
sw1p1
sw1p2
sw2p0
sw2p1
sw2p2
sw2p3
sw2p4

Of course the current behavior is unchanged, and the optional "label"
property for user ports has precedence over the enumerated name.

Signed-off-by: Vivien Didelot 
Acked-by: Uwe Kleine-König 
---
 Documentation/devicetree/bindings/net/dsa/dsa.txt | 20 ---
 net/dsa/dsa2.c| 24 ---
 2 files changed, 12 insertions(+), 32 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt b/Documentation/devicetree/bindings/net/dsa/dsa.txt
index a4a570fb2494..cfe8f64eca4f 100644
--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
@@ -34,13 +34,9 @@ Required properties:
 
 Each port children node must have the following mandatory properties:
 - reg  : Describes the port address in the switch
-- label: Describes the label associated with this port, which
-  will become the netdev name. Special labels are
- "cpu" to indicate a CPU port and "dsa" to
- indicate an uplink/downlink port between switches in
- the cluster.
 
-A port labelled "dsa" has the following mandatory property:
+An uplink/downlink port between switches in the cluster has the following
+mandatory property:
 
 - link : Should be a list of phandles to other switch's DSA
  port. This port is used as the outgoing port
@@ -48,12 +44,17 @@ A port labelled "dsa" has the following mandatory property:
  information must be given, not just the one hop
  routes to neighbouring switches.
 
-A port labelled "cpu" has the following mandatory property:
+A CPU port has the following mandatory property:
 
 - ethernet : Should be a phandle to a valid Ethernet device node.
   This host device is what the switch port is
  connected to.
 
+A user port has the following optional property:
+
+- label: Describes the label associated with this port, which
+  will become the netdev name.
+
 Port child nodes may also contain the following optional standardised
 properties, described in binding documents:
 
@@ -107,7 +108,6 @@ linked into one DSA cluster.
 
switch0port5: port@5 {
reg = <5>;
-   label = "dsa";
phy-mode = "rgmii-txid";
link = <&switch1port6
&switch2port9>;
@@ -119,7 +119,6 @@ linked into one DSA cluster.
 
port@6 {
reg = <6>;
-   label = "cpu";
ethernet = <&fec1>;

Re: [PATCH v2 03/12] net: ethernet: aquantia: Add ring support code

2017-01-08 Thread Rami Rosen

Hi, Alexander,

After a brief review, I have the following minor comments:
...
...
> diff --git a/drivers/net/ethernet/aquantia/aq_ring.c b/drivers/net/ethernet/aquantia/aq_ring.c
> new file mode 100644
> index 000..a7ef6aa
> --- /dev/null
> +++ b/drivers/net/ethernet/aquantia/aq_ring.c
> @@ -0,0 +1,380 @@

Should be aq_ring.c and not aq_pci_ring.c

> +
> +/* File aq_pci_ring.c: Definition of functions for Rx/Tx rings. */
> +

The aq_nic_cfg parameter is not used, it should be removed:

> +struct aq_ring_s *aq_ring_tx_alloc(struct aq_ring_s *self,
> +  struct aq_nic_s *aq_nic,
> +  unsigned int idx,
> +  struct aq_nic_cfg_s *aq_nic_cfg)
> +{
> +   int err = 0;
> +
> +   if (!self) {
> +   err = -ENOMEM;
> +   goto err_exit;
> +   }
> +   self->aq_nic = aq_nic;
> +   self->idx = idx;
> +   self->size = aq_nic_cfg->txds;
> +   self->dx_size = aq_nic_cfg->aq_hw_caps->txd_size;
> +
> +   self = aq_ring_alloc(self, aq_nic, aq_nic_cfg);
> +   if (!self) {
> +   err = -ENOMEM;
> +   goto err_exit;
> +   }
> +
> +err_exit:
> +   if (err < 0) {
> +   aq_ring_free(self);
> +   self = NULL;
> +   }
> +   return self;
> +}
> +

Shouldn't the return type be void for next 2 methods?

> +int aq_ring_init(struct aq_ring_s *self)
> +{
> +   self->hw_head = 0;
> +   self->sw_head = 0;
> +   self->sw_tail = 0;
> +   return 0;
> +}
> +
> +int aq_ring_deinit(struct aq_ring_s *self)
> +{
> +   return 0;
> +}
> +
> +void aq_ring_free(struct aq_ring_s *self)
> +{
> +   if (!self)

I would prefer here simply "return" and remove altogether the err_exit
label, but it is up to you:

> +   goto err_exit;
> +
> +   kfree(self->buff_ring);
> +
> +   if (self->dx_ring)
> +   dma_free_coherent(aq_nic_get_dev(self->aq_nic),
> + self->size * self->dx_size, self->dx_ring,
> + self->dx_ring_pa);
> +
> +err_exit:;
> +}
> +
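
For reference, the early-return form suggested above could look roughly like the following userspace sketch. It is only an illustration: free() stands in for both kfree() and dma_free_coherent(), and demo_ring, demo_release and demo_frees are hypothetical names used to make the behaviour observable.

```c
#include <stdlib.h>

struct demo_ring {
	void *buff_ring;
	void *dx_ring;
};

static int demo_frees; /* counts non-NULL releases, for the demo only */

static void demo_release(void *p)
{
	if (p)
		demo_frees++;
	free(p); /* free(NULL) is a no-op, like kfree(NULL) */
}

/* Same shape as the quoted function, with a plain return in place of
 * the goto and the empty err_exit label.
 */
static void demo_ring_free(struct demo_ring *self)
{
	if (!self)
		return;

	demo_release(self->buff_ring);

	if (self->dx_ring)
		demo_release(self->dx_ring);
}
```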

Shouldn't the following method return type be void ?
> +
> +int aq_ring_tx_clean(struct aq_ring_s *self)
> +{
> +   struct device *dev = aq_nic_get_dev(self->aq_nic);
> +   struct net_device *ndev = aq_nic_get_ndev(self->aq_nic);
> +
> +   for (; self->sw_head != self->hw_head;
> +   self->sw_head = aq_ring_next_dx(self, self->sw_head)) {
> +   struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
> +
> +   ++self->stats.tx_packets;
> +   ++ndev->stats.tx_packets;
> +   ndev->stats.tx_bytes += buff->len;
> +
> +   if (likely(buff->is_mapped)) {
> +   if (unlikely(buff->is_sop))
> +   dma_unmap_single(dev, buff->pa, buff->len,
> +DMA_TO_DEVICE);
> +   else
> +   dma_unmap_page(dev, buff->pa, buff->len,
> +  DMA_TO_DEVICE);
> +   }
> +
> +   if (unlikely(buff->is_eop))
> +   dev_kfree_skb_any(buff->skb);
> +   }
> +
> +   if (aq_ring_avail_dx(self) > AQ_CFG_SKB_FRAGS_MAX)
> +   aq_nic_ndev_queue_start(self->aq_nic, self->idx);
> +
> +   return 0;
> +}
> +

The "err" variable in aq_ring_rx_clean() is meaningless and, according to
the current implementation of this method, it should be removed. You set it
to 0 at the beginning, then later on you also assign 0 to it under certain
conditions, and that's it, no other assignment. Maybe the second assignment
should have been to some other value than 0, but as it is, the "err"
variable has no meaning.

> +int aq_ring_rx_clean(struct aq_ring_s *self, int *work_done, int budget)
> +{
> +   struct net_device *ndev = aq_nic_get_ndev(self->aq_nic);
> +   int err = 0;
> +   bool is_rsc_completed = true;
> +
> +   for (; (self->sw_head != self->hw_head) && budget;
> +   self->sw_head = aq_ring_next_dx(self, self->sw_head),
> +   --budget, ++(*work_done)) {
> +   struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
> +   struct sk_buff *skb = NULL;
> +   unsigned int next_ = 0U;
> +   unsigned int i = 0U;
> +   struct aq_ring_buff_s *buff_ = NULL;
> +
> +   if (buff->is_error) {
> +   __free_pages(buff->page, 0);
> +   continue;
> +   }
> +
> +   if (buff->is_cleaned)
> +   continue;
> +
> +   ++self->stats.rx_packets;
> +   ++ndev->stats.rx_packets;
> +   ndev->stats.rx_bytes += buff->len;
> +
> +   if (!buff->is_eop) {
> +   for (next_ = buf

[PATCH net-next] net: dsa: select NET_SWITCHDEV

2017-01-08 Thread Vivien Didelot

DSA wraps SWITCHDEV, thus select it instead of depending on it.

Signed-off-by: Vivien Didelot 
---
 net/dsa/Kconfig | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index 2ae9bb357523..675acbf1502d 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -6,7 +6,8 @@ config HAVE_NET_DSA
 
 config NET_DSA
tristate "Distributed Switch Architecture"
-   depends on HAVE_NET_DSA && NET_SWITCHDEV
+   depends on HAVE_NET_DSA
+   select NET_SWITCHDEV
select PHYLIB
---help---
  Say Y if you want to enable support for the hardware switches supported
-- 
2.11.0

Re: [PATCH v5] net: stmmac: fix maxmtu assignment to be within valid range

2017-01-08 Thread David Miller

From: "Kweh, Hock Leong"
Date: Sat,  7 Jan 2017 17:32:03 +0800

> From: "Kweh, Hock Leong" 
> 
> There is no checking valid value of maxmtu when getting it from
> device tree. This resolution added the checking condition to
> ensure the assignment is made within a valid range.
> 
> Signed-off-by: Kweh, Hock Leong 

Applied, thank you.
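
For illustration only, the range check described in the commit message can be sketched in plain C as below. The helper name, its parameters and the fallback to a driver default are assumptions made for this sketch; the actual stmmac change is not shown in this thread.

```c
#include <assert.h>

/*
 * Hypothetical helper: decide which maximum MTU to use.
 *   dt_maxmtu   - value read from the device tree (may be out of range)
 *   min_mtu     - smallest MTU the device accepts
 *   default_max - largest MTU the driver would otherwise allow
 * The device-tree value is honoured only when it lies inside
 * [min_mtu, default_max]; anything else keeps the driver default.
 */
static unsigned int pick_max_mtu(unsigned int dt_maxmtu,
				 unsigned int min_mtu,
				 unsigned int default_max)
{
	assert(min_mtu <= default_max);	/* caller contract */

	if (dt_maxmtu >= min_mtu && dt_maxmtu <= default_max)
		return dt_maxmtu;

	return default_max;
}
```

A value such as 10 or 20000 with limits 46 and 9000 would therefore leave the default of 9000 in place, while 1500 would be accepted.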

Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2

2017-01-08 Thread Andrew Lunn

> Until the printing of netdev_phys_item_id structures is fixed in
> net/core/net-sysfs.c, an external helper can be used like this:

Hi Vivien

As Florian pointed out, this cannot be changed. It is now part of the
ABI. We have to live with it printing little endian numbers as big
endian.
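
To make the complaint concrete: if an integer ID is stored little endian and then dumped byte by byte in memory order, the least significant byte comes first. The sketch below assumes a 4-byte ID and uses made-up helper names (put_le32, hex_dump_id); it is not the net-sysfs code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Store a 32-bit value as explicit little-endian bytes (host independent). */
static void put_le32(uint32_t v, unsigned char out[4])
{
	out[0] = v & 0xff;
	out[1] = (v >> 8) & 0xff;
	out[2] = (v >> 16) & 0xff;
	out[3] = (v >> 24) & 0xff;
}

/* Format the bytes in memory order, two hex digits per byte. */
static void hex_dump_id(const unsigned char *id, size_t len,
			char *buf, size_t buflen)
{
	size_t i;

	assert(buflen >= 2 * len + 1);	/* room for the digits and the NUL */

	buf[0] = '\0';
	for (i = 0; i < len; i++)
		snprintf(buf + 2 * i, 3, "%02x", (unsigned int)id[i]);
}
```

With these helpers, ID 1 is rendered as 01000000, which reads as 16777216 when taken as a big-endian number.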

> # cat /etc/udev/rules.d/90-net-dsa.rules
> SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", PROGRAM="/lib/udev/dsanitizer $attr{phys_switch_id} $attr{phys_port_id}", NAME="$result"
> 
> # cat /lib/udev/dsanitizer
> #!/bin/sh
> echo $1 | sed -e 's,^0*,,' -e 's,0*$,,' | xargs printf sw%d
> echo $2 | sed -e 's,^0*,,' | xargs printf p%d
> 
> # ip link | awk '/@eth/ { split($2,a,"@"); print a[1]; }'
> sw0p0
> sw0p1
> sw0p2
> sw1p0
> sw1p1
> sw1p2
> sw2p0
> sw2p1
> sw2p2
> sw2p3
> sw2p4

Rather than recommending something, it might be better to point to the
Free Desktop "Predictable Network Interface Names" which is what most
people will end up with, if they rename:

https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/

It would also be good to test on a recent systemd system and see what
happens. What names does it pick?

Andrew

Re: [PATCH net-next] net: dsa: select NET_SWITCHDEV

2017-01-08 Thread Andrew Lunn

On Sun, Jan 08, 2017 at 06:17:24PM -0500, Vivien Didelot wrote:
> DSA wraps SWITCHDEV, thus select it instead of depending on it.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH v2] PCI: lock each enable/disable num_vfs operation in sysfs

2017-01-08 Thread Gavin Shan

On Fri, Jan 06, 2017 at 01:59:08PM -0800, Emil Tantilov wrote:
>Enabling/disabling SRIOV via sysfs by echo-ing multiple values
>simultaneously:
>
>echo 63 > /sys/class/net/ethX/device/sriov_numvfs&
>echo 63 > /sys/class/net/ethX/device/sriov_numvfs
>
>sleep 5
>
>echo 0 > /sys/class/net/ethX/device/sriov_numvfs&
>echo 0 > /sys/class/net/ethX/device/sriov_numvfs
>
>Results in the following bug:
>
>kernel BUG at drivers/pci/iov.c:495!
>invalid opcode:  [#1] SMP
>CPU: 1 PID: 8050 Comm: bash Tainted: G   W   4.9.0-rc7-net-next #2092
>RIP: 0010:[]
> [] pci_iov_release+0x57/0x60
>
>Call Trace:
> [] pci_release_dev+0x26/0x70
> [] device_release+0x3e/0xb0
> [] kobject_cleanup+0x67/0x180
> [] kobject_put+0x2d/0x60
> [] put_device+0x17/0x20
> [] pci_dev_put+0x1a/0x20
> [] pci_get_dev_by_id+0x5b/0x90
> [] pci_get_subsys+0x35/0x40
> [] pci_get_device+0x18/0x20
> [] pci_get_domain_bus_and_slot+0x2b/0x60
> [] pci_iov_remove_virtfn+0x57/0x180
> [] pci_disable_sriov+0x65/0x140
> [] ixgbe_disable_sriov+0xc7/0x1d0 [ixgbe]
> [] ixgbe_pci_sriov_configure+0x3d/0x170 [ixgbe]
> [] sriov_numvfs_store+0xdc/0x130
>...
>RIP  [] pci_iov_release+0x57/0x60
>
>Use the existing mutex lock to protect each enable/disable operation.
>
>-v2: move the existing lock from protecting the config of the IOV bus
>to protecting the writes to sriov_numvfs in sysfs without maintaining
>a "locked" version of pci_iov_add/remove_virtfn().
>As suggested by Gavin Shan 
>
>CC: Alexander Duyck 
>Signed-off-by: Emil Tantilov 
>---

Reviewed-by: Gavin Shan

Re: [PATCH net-next] net: dsa: select NET_SWITCHDEV

2017-01-08 Thread Florian Fainelli

On 01/08/2017 03:17 PM, Vivien Didelot wrote:
> DSA wraps SWITCHDEV, thus select it instead of depending on it.
> 
> Signed-off-by: Vivien Didelot 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH v2] phy state machine: failsafe leave invalid RUNNING state

2017-01-08 Thread Florian Fainelli



On 01/06/2017 03:14 AM, Zefir Kurtisi wrote:
> While in RUNNING state, phy_state_machine() checks for link changes by
> comparing phydev->link before and after calling phy_read_status().
> This works as long as it is guaranteed that phydev->link is never
> changed outside the phy_state_machine().
> 
> If in some setups this happens, it causes the state machine to miss
> a link loss and remain RUNNING despite phydev->link being 0.
> 
> This has been observed running a dsa setup with a process continuously
> polling the link states over ethtool each second (SNMPD RFC-1213
> agent). Disconnecting the link on a phy followed by a ETHTOOL_GSET
> causes dsa_slave_get_settings() / dsa_slave_get_link_ksettings() to
> call phy_read_status() and with that modify the link status - and
> with that bricking the phy state machine.
> 
> This patch adds a fail-safe check while in RUNNING, which causes to
> move to CHANGELINK when the link is gone and we are still RUNNING.
> 
> Signed-off-by: Zefir Kurtisi 

Reviewed-by: Florian Fainelli 
-- 
Florian
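
The fail-safe described in the commit message amounts to one extra check in the RUNNING branch. The stand-alone sketch below uses hypothetical names (sketch_state, sketch_phy, sketch_step) and models only the two states involved; the real phy_state_machine() has many more states and reads the link status itself.

```c
#include <assert.h>

enum sketch_state { SKETCH_RUNNING, SKETCH_CHANGELINK };

struct sketch_phy {
	enum sketch_state state;
	int link;	/* 1 = link up, 0 = link down */
};

/*
 * One pass of the simplified state machine. new_link is what a fresh
 * status read reports; phy->link may already have been updated by
 * someone else (for example an ethtool query) before this pass runs.
 */
static void sketch_step(struct sketch_phy *phy, int new_link)
{
	int old_link;

	assert(phy);

	old_link = phy->link;
	phy->link = new_link;

	if (phy->state != SKETCH_RUNNING)
		return;

	if (old_link != phy->link) {
		/* Normal path: the link changed between two polls. */
		phy->state = SKETCH_CHANGELINK;
	} else if (!phy->link) {
		/*
		 * Fail-safe: no change is visible here because the link
		 * was cleared elsewhere, yet RUNNING without a link is
		 * an invalid state, so leave it anyway.
		 */
		phy->state = SKETCH_CHANGELINK;
	}
}
```

Without the else branch, a phy whose link field was already 0 before the poll would stay in RUNNING indefinitely, which is the stuck state the patch description reports.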

Re: [PATCH net-next] net: dsa: select NET_SWITCHDEV

2017-01-08 Thread Randy Dunlap

On 01/08/17 17:18, Florian Fainelli wrote:
> On 01/08/2017 03:17 PM, Vivien Didelot wrote:
>> DSA wraps SWITCHDEV, thus select it instead of depending on it.
>>
>> Signed-off-by: Vivien Didelot 
> 
> Reviewed-by: Florian Fainelli 
> 

but when CONFIG_INET is not enabled, the patch causes this warning:

warning: (NET_DSA) selects NET_SWITCHDEV which has unmet direct dependencies (NET && INET)


-- 
~Randy

Re: [PATCH net-next 0/6] convert tc_verd to integer bitfields

2017-01-08 Thread David Miller

From: Willem de Bruijn 
Date: Sat,  7 Jan 2017 17:06:32 -0500

> The skb tc_verd field takes up two bytes but uses far fewer bits.
> Convert the remaining use cases to bitfields that fit in existing
> holes (depending on config options) and potentially save the two
> bytes in struct sk_buff.
 ...

Series applied, thanks!

Re: [PATCH V4 net-next 3/3] tun: rx batching

2017-01-08 Thread Jason Wang




On 2017年01月07日 03:47, Michael S. Tsirkin wrote:

+static int tun_get_coalesce(struct net_device *dev,
+   struct ethtool_coalesce *ec)
+{
+   struct tun_struct *tun = netdev_priv(dev);
+
+   ec->rx_max_coalesced_frames = tun->rx_batched;
+
+   return 0;
+}
+
+static int tun_set_coalesce(struct net_device *dev,
+   struct ethtool_coalesce *ec)
+{
+   struct tun_struct *tun = netdev_priv(dev);
+
+   if (ec->rx_max_coalesced_frames > NAPI_POLL_WEIGHT)
+   return -EINVAL;

So what should userspace do? Keep trying until it succeeds?
I think it's better to just use NAPI_POLL_WEIGHT instead and DTRT here.



Well, looking at how set_coalesce is implemented in other drivers,
-EINVAL is usually returned when the user gives a value that exceeds the
limit. For tuntap, what is missing here is probably just documentation
for coalescing in tuntap.txt (or an ethtool extension to return the max
value). This seems much better than silently reducing the value to the
limit.
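
The two behaviours under discussion, rejecting an out-of-range value versus silently capping it, can be contrasted in a small user-space sketch. SKETCH_LIMIT stands in for NAPI_POLL_WEIGHT, and both function names are invented for this illustration.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_LIMIT 64U	/* stand-in for NAPI_POLL_WEIGHT */

/* Variant used in the patch: refuse values above the limit. */
static int set_batch_strict(unsigned int *rx_batched, unsigned int frames)
{
	assert(rx_batched);

	if (frames > SKETCH_LIMIT)
		return -EINVAL;

	*rx_batched = frames;
	return 0;
}

/* Variant suggested in review: accept the request but cap it. */
static int set_batch_clamped(unsigned int *rx_batched, unsigned int frames)
{
	assert(rx_batched);

	*rx_batched = frames > SKETCH_LIMIT ? SKETCH_LIMIT : frames;
	return 0;
}
```

With the strict variant the caller learns that 65 was refused and the old setting is untouched; with the clamped variant the call succeeds and the caller has to read the value back to notice it became 64.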


Thanks

Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2

2017-01-08 Thread Vivien Didelot

Hi Andrew,

Andrew Lunn  writes:

>> Until the printing of netdev_phys_item_id structures is fixed in
>> net/core/net-sysfs.c, an external helper can be used like this:
>
> As Florian pointed out, this cannot be changed. It is now part of the
> ABI. We have to live with it printing little endian numbers as big
> endian.

I totally understand the fact that ABI must not be changed. However we
should be aware that the current phys_switch_id of DSA is broken.

In addition to the minor issue of being hardly useable, it does not meet
the requirement described in the switchdev documentation of being unique
on a system. A switch ID in DSA is currently unique only to a DSA tree.
A system with two disjoint switch trees will have two switches with a
phys_switch_id of "".

> Rather than recommending something, it might be better to point to the
> Free Desktop "Predictable Network Interface Names" which is what most
> people will end up with, if they rename:
>
> https://www.freedesktop.org/wiki/Software/systemd/PredictableNetworkInterfaceNames/
>
> It would also be good to test on a recent systemd system and see what
> happens. What names does it pick?

Note that the udev rules I gave in this commit message were only there
as examples of renaming DSA slave interfaces from userspace. This is
orthogonal with the purpose of this patch.

Thanks,

Vivien

Re: [PATCH V4 net-next 1/3] vhost: better detection of available buffers

2017-01-08 Thread Jason Wang




On 2017年01月07日 03:55, Michael S. Tsirkin wrote:

On Fri, Jan 06, 2017 at 10:13:15AM +0800, Jason Wang wrote:

This patch tries to do several tweaks on vhost_vq_avail_empty() for a
better performance:

- check cached avail index first which could avoid userspace memory access.
- using unlikely() for the failure of userspace access
- check vq->last_avail_idx instead of cached avail index as the last
   step.

This patch is needed for batching support, which needs to peek whether
there are still available buffers in the ring.

Reviewed-by: Stefan Hajnoczi 
Signed-off-by: Jason Wang 
---
  drivers/vhost/vhost.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index d643260..9f11838 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -2241,11 +2241,15 @@ bool vhost_vq_avail_empty(struct vhost_dev *dev, struct vhost_virtqueue *vq)
__virtio16 avail_idx;
int r;
  
+   if (vq->avail_idx != vq->last_avail_idx)
+   return false;
+
r = vhost_get_user(vq, avail_idx, &vq->avail->idx);
-   if (r)
+   if (unlikely(r))
return false;
+   vq->avail_idx = vhost16_to_cpu(vq, avail_idx);
  
-   return vhost16_to_cpu(vq, avail_idx) == vq->avail_idx;
+   return vq->avail_idx == vq->last_avail_idx;
  }
  EXPORT_SYMBOL_GPL(vhost_vq_avail_empty);

So again, this did not address the issue I pointed out in v1:
if we have 1 buffer in RX queue and
that is not enough to store the whole packet,
vhost_vq_avail_empty returns false, then we re-read
the descriptors again and again.

You have saved a single index access but not the more expensive
descriptor access.


It does not look that way. If I understand the code correctly, in this
case get_rx_bufs() will return zero, and we will try to enable rx kick
and exit the loop.
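
For readers following along, the reworked check from the patch can be modelled outside the kernel as below. sketch_vq and its shared_idx pointer are hypothetical stand-ins for the virtqueue and the guest-visible avail index, and the user-space access that can fail in vhost is reduced to a plain read here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_vq {
	uint16_t avail_idx;		/* cached copy of the shared index */
	uint16_t last_avail_idx;	/* next entry that would be consumed */
	const uint16_t *shared_idx;	/* stand-in for vq->avail->idx */
	unsigned int shared_reads;	/* counts the "expensive" reads */
};

static bool sketch_vq_avail_empty(struct sketch_vq *vq)
{
	assert(vq && vq->shared_idx);

	/* Cheap test first: the cached index already shows pending work. */
	if (vq->avail_idx != vq->last_avail_idx)
		return false;

	/* Otherwise refresh the cache from the shared ring. */
	vq->shared_reads++;
	vq->avail_idx = *vq->shared_idx;

	return vq->avail_idx == vq->last_avail_idx;
}
```

The shared index is read only when the cached value gives no answer, which is the saving the commit message describes; the sketch says nothing about the cost of re-reading descriptors raised in the review.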


Thanks

Re: [PATCH net 0/2] net: dsa: bcm_sf2: Couple fixes

2017-01-08 Thread David Miller

From: Florian Fainelli 
Date: Sat,  7 Jan 2017 21:01:55 -0800

> Here are a couple of fixes for bcm_sf2, please queue these up for
> -stable as well, thank you very much!

Series applied and queued up for -stable, thanks.

[GIT] Networking

2017-01-08 Thread David Miller


1) Fix dumping of nft_quota entries, from Pablo Neira Ayuso.

2) Fix out of bounds access in nf_tables discovered by KASAN,
   from Florian Westphal.

3) Fix IRQ enabling in dp83867 driver, from Grygorii Strashko.

4) Fix unicast filtering in be2net driver, from Ivan Vecera.

5) tg3_get_stats64() can race with driver close and ethtool
   reconfigurations, fix from Michael Chan.

6) Fix error handling when pass limit is reached in bpf code
   gen on x86.  From Daniel Borkmann.

7) Don't clobber switch ops and use proper MDIO nested reads
   and writes in bcm_sf2 driver, from Florian Fainelli.

Please pull, thanks a lot!

The following changes since commit e02003b515e8d95f40f20f213622bb82510873d2:

  Merge tag 'xfs-for-linus-4.10-rc3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux (2017-01-04 18:33:35 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 

for you to fetch changes up to 03430fa10b99e95e3a15eb7c00978fb1652f3b24:

  Merge branch 'bcm_sf2-fixes' (2017-01-08 22:01:22 -0500)


Artur Molchanov (1):
  bridge: netfilter: Fix dropping packets that moving through bridge interface

Daniel Borkmann (1):
  bpf: change back to orig prog on too many passes

David Forster (1):
  vti6: fix device register to report IFLA_INFO_KIND

David S. Miller (3):
  Merge git://git.kernel.org/.../pablo/nf
  Merge tag 'mac80211-for-davem-2017-01-06' of git://git.kernel.org/.../jberg/mac80211
  Merge branch 'bcm_sf2-fixes'

Florian Fainelli (2):
  net: dsa: bcm_sf2: Do not clobber b53_switch_ops
  net: dsa: bcm_sf2: Utilize nested MDIO read/write

Florian Westphal (1):
  netfilter: nf_tables: fix oob access

Grygorii Strashko (1):
  net: phy: dp83867: fix irq generation

Ivan Vecera (2):
  be2net: fix accesses to unicast list
  be2net: fix unicast list filling

Johannes Berg (1):
  nl80211: fix sched scan netlink socket owner destruction

Kweh, Hock Leong (1):
  net: stmmac: fix maxmtu assignment to be within valid range

Lendacky, Thomas (1):
  amd-xgbe: Fix IRQ processing when running in single IRQ mode

Michael Chan (1):
  tg3: Fix race condition in tg3_get_stats64().

Pablo Neira Ayuso (3):
  netfilter: nft_quota: reset quota after dump
  netfilter: nft_queue: use raw_smp_processor_id()
  netfilter: nft_payload: mangle ckecksum if NFT_PAYLOAD_L4CSUM_PSEUDOHDR is set

Paul Moore (1):
  netlabel: add CALIPSO to the list of built-in protocols

Sergei Shtylyov (2):
  sh_eth: fix EESIPR values for SH77{34|63}
  sh_eth: R8A7740 supports packet shecksumming

Xin Long (1):
  netfilter: ipt_CLUSTERIP: check duplicate config when initializing

Zhu Yanjun (1):
  r8169: fix the typo in the comment

 arch/x86/net/bpf_jit_comp.c                       |  2 ++
 drivers/net/dsa/bcm_sf2.c                         | 11 +--
 drivers/net/ethernet/amd/xgbe/xgbe-drv.c          |  2 +-
 drivers/net/ethernet/broadcom/tg3.c               |  3 +++
 drivers/net/ethernet/emulex/benet/be_main.c       | 12
 drivers/net/ethernet/realtek/r8169.c              |  2 +-
 drivers/net/ethernet/renesas/sh_eth.c             |  5 +++--
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 +-
 drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c  |  6 ++
 drivers/net/phy/dp83867.c                         | 10 ++
 net/bridge/br_netfilter_hooks.c                   |  2 +-
 net/ipv4/netfilter/ipt_CLUSTERIP.c                | 34 +++---
 net/ipv6/ip6_vti.c                                |  2 +-
 net/netfilter/nf_tables_api.c                     |  2 +-
 net/netfilter/nft_payload.c                       | 27 +++
 net/netfilter/nft_queue.c                         |  2 +-
 net/netfilter/nft_quota.c                         | 26 ++
 net/netlabel/netlabel_kapi.c                      |  5 +
 net/wireless/nl80211.c                            | 16 +++-
 19 files changed, 116 insertions(+), 63 deletions(-)

[PATCH v2 net-next] net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering.

2017-01-08 Thread Mao Wenan

Relaxed ordering (RO) is a feature of the 82599 NIC; enabling it can
improve performance on some CPU architectures, such as SPARC.
Currently the 82599 driver enables RO for only one architecture (SPARC),
which does not help other CPU architectures that also need the RO feature.
This patch adds a common config option, CONFIG_ARCH_WANT_RELAX_ORDER, to
control the RO feature, and selects it in the sparc Kconfig first.

Signed-off-by: Mao Wenan 
---
 arch/Kconfig| 3 +++
 arch/sparc/Kconfig  | 1 +
 drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/Kconfig b/arch/Kconfig
index 99839c2..bd04eac 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -781,4 +781,7 @@ config VMAP_STACK
  the stack to map directly to the KASAN shadow map using a formula
  that is incorrect if the stack is in vmalloc space.
 
+config ARCH_WANT_RELAX_ORDER
+   bool
+
 source "kernel/gcov/Kconfig"
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index cf4034c..68ac5c7 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -44,6 +44,7 @@ config SPARC
select CPU_NO_EFFICIENT_FFS
select HAVE_ARCH_HARDENED_USERCOPY
select PROVE_LOCKING_SMALL if PROVE_LOCKING
+   select ARCH_WANT_RELAX_ORDER
 
 config SPARC32
def_bool !64BIT
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
index 094e1d6..c38d50c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_common.c
@@ -350,7 +350,7 @@ s32 ixgbe_start_hw_gen2(struct ixgbe_hw *hw)
}
IXGBE_WRITE_FLUSH(hw);
 
-#ifndef CONFIG_SPARC
+#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
/* Disable relaxed ordering */
for (i = 0; i < hw->mac.max_tx_queues; i++) {
u32 regval;
-- 
2.7.0
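
The effect of the last hunk is purely compile-time: the code that disables relaxed ordering is built only when the architecture has not opted in via CONFIG_ARCH_WANT_RELAX_ORDER. A minimal stand-alone sketch of that pattern follows; the function, the SKETCH_RO_EN bit and the plain array used in place of device registers are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_RO_EN 0x1u	/* made-up "relaxed ordering enable" bit */

/*
 * Walk the per-queue control words and, unless the architecture asked
 * for relaxed ordering at build time, clear the enable bit in each.
 * Returns how many queues were changed.
 */
static unsigned int sketch_setup_queues(uint32_t *ctrl, size_t nqueues)
{
	unsigned int cleared = 0;
	size_t i;

	assert(ctrl != NULL || nqueues == 0);

#ifndef CONFIG_ARCH_WANT_RELAX_ORDER
	/* Architecture did not opt in: disable relaxed ordering. */
	for (i = 0; i < nqueues; i++) {
		if (ctrl[i] & SKETCH_RO_EN) {
			ctrl[i] &= ~SKETCH_RO_EN;
			cleared++;
		}
	}
#else
	/* Architecture opted in: leave the control words untouched. */
	(void)i;
#endif

	return cleared;
}
```

Building the same file with -DCONFIG_ARCH_WANT_RELAX_ORDER compiles the loop out, which mirrors what selecting the option in an architecture's Kconfig would do for the driver.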

Re: [PATCH net-next] net:add one common config ARCH_WANT_RELAX_ORDER to support relax ordering.

2017-01-08 Thread maowenan



On 2017/1/6 23:41, Alexander Duyck wrote:
> On Fri, Jan 6, 2017 at 1:52 AM, Mao Wenan  wrote:
>> Relax ordering(RO) is one feature of 82599 NIC, to enable this feature can
>> enhance the performance for some cpu architecure, such as SPARC and so on.
>> Currently it only supports one special cpu architecture(SPARC) in 82599
>> driver to enable RO feature, this is not very common for other cpu 
>> architecture
>> which really needs RO feature.
>> This patch add one common config CONFIG_ARCH_WANT_RELAX_ORDER to set RO 
>> feature,
>> and should define CONFIG_ARCH_WANT_RELAX_ORDER in sparc Kconfig firstly.
>>
>> Signed-off-by: Mao Wenan 
>> ---
>>  arch/sparc/Kconfig  | 1 +
>>  drivers/net/ethernet/intel/ixgbe/ixgbe_common.c | 2 +-
>>  2 files changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
>> index cf4034c..68ac5c7 100644
>> --- a/arch/sparc/Kconfig
>> +++ b/arch/sparc/Kconfig
>> @@ -44,6 +44,7 @@ config SPARC
>> select CPU_NO_EFFICIENT_FFS
>> select HAVE_ARCH_HARDENED_USERCOPY
>> select PROVE_LOCKING_SMALL if PROVE_LOCKING
>> +   select ARCH_WANT_RELAX_ORDER
>>
>>  config SPARC32
>> def_bool !64BIT
> 
> 
> I'm pretty sure this is incomplete.  I think you need to add a couple
> lines to arch/Kconfig so that the config option itself is listed
> somewhere.  You might look at using something like HAVE_CMPXCHG_DOUBLE
> as an example.
> 
> - Alex
> 
> 

Thank you for the comments; I will send a v2 patch soon.

UNSUBSCIBE

2017-01-08 Thread Vink, Ronald

-Original Message-
From: netfilter-announce 
[mailto:netfilter-announce-boun...@lists.netfilter.org] On Behalf Of Pablo 
Neira Ayuso
Sent: dinsdag 20 december 2016 21:47
To: netfilter-de...@vger.kernel.org
Cc: l...@lwn.net; netdev@vger.kernel.org; netfil...@vger.kernel.org; 
netfilter-annou...@lists.netfilter.org
Subject: [ANNOUNCE] nftables 0.7 release

Hi!

The Netfilter project proudly presents:

nftables 0.7

This release contains many accumulated bug fixes and new features available up 
to the (upcoming) Linux 4.10-rc1 kernel release.

* Facilitate migration from iptables to nftables:

  At compilation time, you have to pass this option.

  # ./configure --with-xtables

  And libxtables needs to be installed in your system. This allows you
  to list a ruleset containing xt extensions loaded through
  iptables-compat-restore tool. The nft tool provides a native
  translation for iptables extensions (if available).

* Add new fib expression, which can be used to obtain the output
  interface from the route table based on either source or destination
  address of a packet. This can be used to e.g. add reverse path
  filtering, eg. drop if not coming from the same interface packet
  arrived on:

  # nft add rule x prerouting fib saddr . iif oif eq 0 drop

  Accept only if from eth0:

  # nft add rule x prerouting fib saddr . iif oif eq "eth0" accept

  Accept if from any valid interface:

  # nft add rule x prerouting fib saddr oif accept

  Querying of address type is also supported, this can be used
  to only accept packets to addresses configured in the same
  interface, eg.

  # nft add rule x prerouting fib daddr . iif type local accept

  It's also possible to use mark and verdict map, eg,

  # nft add rule x prerouting \
meta mark set 0xdead fib daddr . mark type vmap {
blackhole : drop,
prohibit : drop,
unicast : accept
}

* Support hashing of any arbitrary key combination, eg.

  # nft add rule x y \
dnat to jhash ip saddr . tcp dport mod 2 map { \
0 : 192.168.20.100, \
1 : 192.168.30.100 \
}

  Another usecase: Set packet marks based on any arbitrary hashing.

* Add number generation support. Useful for round-robin packet mark
  setting, eg.

  # nft add rule filter prerouting meta mark set numgen inc mod 2

  You can also specify an offset to indicate from what value you want
  to start from.

  The modulus provides the scale of the counting sequence. You can
  also use this from maps, eg.

  # nft add rule nat prerouting \
dnat to numgen inc mod 2 map { 0 : 192.168.10.100, 1 : 192.168.20.200 }

  So this is distributing new connections in a round-robin fashion
  between 192.168.10.100 and 192.168.20.200. Don't forget the special NAT
  chain semantics: Only the first packet evaluates the rule, follow up
  packets rely on conntrack to apply the NAT information.

  You can also emulate flow distribution with different backend weights
  using intervals, eg.

  # nft add rule nat prerouting \
dnat to numgen inc mod 10 map { 0-5 : 192.168.10.100, 6-9 : 192.168.20.200 }

* Add quota support, eg.

  # nft add rule filter input \
flow table http { ip saddr timeout 60s quota over 50 mbytes } drop

  This creates a flow table, where every flow gets a quota of 50
  mbytes. You can also use simple rules to enforce quotas, of
  course.

* Introduce routing expression, for routing related data with support
  for nexthop (i.e. the directly connected IP address that an outgoing
  packet is sent to), which can be used either for matching or accounting, eg.

 # nft add rule filter postrouting \
  ip daddr 192.168.1.0/24 rt nexthop != 192.168.0.1 drop

  This will drop any traffic to 192.168.1.0/24 that is not routed via
  192.168.0.1.

 # nft add rule filter postrouting \
  flow table acct { rt nexthop timeout 600s counter }

 # nft add rule ip6 filter postrouting \
  flow table acct { rt nexthop timeout 600s counter }

  These rules count outgoing traffic per nexthop. Note that the timeout
  releases an entry if no traffic is seen for this nexthop within 10
  minutes.

* Notrack support, to explicitly skip connection tracking for matching
  packets, eg.

 # nft add rule ip raw prerouting tcp dport { 80, 443 } notrack

  So you can skip tracking for http and https traffic.

* Support to set non-byte bound packet header fields, including
  checksum adjustment, eg. ip6 ecn set 1.

* Add 'create set' and 'create element' commands, eg.

 # nft add set x y { type ipv4_addr\; }
 # nft create set x y { type ipv4_addr\; }
 :1:1-35: Error: Could not process rule: File exists
 create set x y { type ipv4_addr; }
 ^^^
 # nft add set x y { type ipv4_addr\; }
 #

  So 'create' bails out if the set already exists, while 'add'
  doesn't, for more ergonomic usage as s

Re: [PATCH 2/3] xen: modify xenstore watch event interface

2017-01-08 Thread Juergen Gross

On 06/01/17 22:57, Boris Ostrovsky wrote:
> On 01/06/2017 10:05 AM, Juergen Gross wrote:
>> Today a Xenstore watch event is delivered via a callback function
>> declared as:
>>
>> void (*callback)(struct xenbus_watch *,
>>  const char **vec, unsigned int len);
>>
>> As all watch events only ever come with two parameters (path and token)
>> changing the prototype to:
>>
>> void (*callback)(struct xenbus_watch *,
>>  const char *path, const char *token);
>>
>> is the natural thing to do.
>>
>> Apply this change and adapt all users.
>>
>> Cc: konrad.w...@oracle.com
>> Cc: roger@citrix.com
>> Cc: wei.l...@citrix.com
>> Cc: paul.durr...@citrix.com
>> Cc: netdev@vger.kernel.org
>>
>> Signed-off-by: Juergen Gross 
> 
> 
>>  
>> @@ -903,24 +902,24 @@ static int process_msg(void)
>>  body[msg->hdr.len] = '\0';
>>  
>>  if (msg->hdr.type == XS_WATCH_EVENT) {
>> -msg->u.watch.vec = split(body, msg->hdr.len,
>> - &msg->u.watch.vec_size);
>> -if (IS_ERR(msg->u.watch.vec)) {
>> -err = PTR_ERR(msg->u.watch.vec);
>> +if (count_strings(body, msg->hdr.len) != 2) {
>> +err = -EINVAL;
> 
> xenbus_write_watch() returns -EILSEQ when this type of error is
> encountered so perhaps for we should return the same error here.

Not since 9a6161fe73bdd3ae4a1e18421b0b20cb7141f680. :-)

> 
> Either way
> 
> Reviewed-by: Boris Ostrovsky 

Thanks,

Juergen
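
As background to the count_strings() check in the quoted hunk: a watch event body is a sequence of NUL-terminated strings, so validating it comes down to counting terminators and, when there are exactly two, taking the first string as the path and the second as the token. The sketch below is an independent user-space illustration; sketch_count_strings() and sketch_split_watch() are made-up names, not the functions in the xenbus code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Count the NUL terminators found in body[0..len). */
static unsigned int sketch_count_strings(const char *body, size_t len)
{
	unsigned int num = 0;
	size_t i;

	for (i = 0; i < len; i++)
		if (body[i] == '\0')
			num++;

	return num;
}

/*
 * Split a watch event body into path and token. Returns 0 on success
 * or -EINVAL when the body does not hold exactly two terminated
 * strings.
 */
static int sketch_split_watch(const char *body, size_t len,
			      const char **path, const char **token)
{
	assert(body && path && token);

	if (sketch_count_strings(body, len) != 2 || body[len - 1] != '\0')
		return -EINVAL;

	*path = body;
	*token = body + strlen(body) + 1;
	return 0;
}
```

Which error code to return for a malformed event is exactly the point debated above; the sketch simply follows the -EINVAL used in the quoted hunk.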

Re: [PATCH net-next v2] net: dsa: make "label" property optional for dsa2

2017-01-08 Thread Jiri Pirko

Mon, Jan 09, 2017 at 12:15:52AM CET, vivien.dide...@savoirfairelinux.com wrote:
>In the new DTS bindings for DSA (dsa2), the "ethernet" and "link"
>phandles are respectively mandatory and exclusive to CPU port and DSA
>link device tree nodes.
>
>Simplify dsa2.c a bit by checking the presence of such phandle instead
>of checking the redundant "label" property.
>
>Then the Linux philosophy for Ethernet switch ports is to expose them to
>userspace as standard NICs by default. Thus use the standard enumerated
>"eth%d" device name if no "label" property is provided for a user port.
>This allows to save DTS files from subjective net device names.
>
>Here's an example on a ZII Dev Rev B board without "label" properties:
>
># ip link | grep ': ' | cut -d: -f2
> lo
> eth0
> eth1
> eth2@eth1
> eth3@eth1
> eth4@eth1
> eth5@eth1
> eth6@eth1
> eth7@eth1
> eth8@eth1
> eth9@eth1
> eth10@eth1
> eth11@eth1
> eth12@eth1
>
>If one wants to rename an interface, udev rules can be used as usual, as
>suggested in the switchdev documentation:
>
># cat /etc/udev/rules.d/90-net-dsa.rules
>SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", NAME="sw$attr{phys_switch_id}p$attr{phys_port_id}"
>
># ip link | awk '/@eth/ { split($2,a,"@"); print a[1]; }'
>swp00
>swp01
>swp02
>sw0100p00
>sw0100p01
>sw0100p02
>sw0200p00
>sw0200p01
>sw0200p02
>sw0200p03
>sw0200p04
>
>Until the printing of netdev_phys_item_id structures is fixed in
>net/core/net-sysfs.c, an external helper can be used like this:
>
># cat /etc/udev/rules.d/90-net-dsa.rules
>SUBSYSTEM=="net", ACTION=="add", ENV{DEVTYPE}=="dsa", PROGRAM="/lib/udev/dsanitizer $attr{phys_switch_id} $attr{phys_port_id}", NAME="$result"

I know this is kind of confusing, but phys_port_id is meant to indicate
the same physical port shared by multiple netdevices, for example in the
SR-IOV use case. For the switchdev use case, you should use
phys_port_name.

I will add some documentation to kernel regarding this. But I see that
net/dsa/slave.c already implements .ndo_get_phys_port_id :(

I recently made changes in udev so it names the switch ports according
to phys_port_name, out of the box, without need for any rules:
https://github.com/systemd/systemd/pull/4506/commits/c960caa0c2a620fc506c6f0f7b6c40eeace48e4d

I guess that it should be enough for you to implement
ndo_get_phys_port_name.
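
As a rough idea of what such a callback has to do, the sketch below formats a port number into a caller-supplied buffer and reports an error if it does not fit. It is a user-space illustration only: the "p%d" naming scheme, the function name and the plain int port parameter are assumptions, not an existing DSA implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a phys_port_name callback: write the port
 * name into name[0..len) and return 0, or -EINVAL if it would be
 * truncated.
 */
static int sketch_get_phys_port_name(int port, char *name, size_t len)
{
	int n;

	assert(name && len > 0);

	n = snprintf(name, len, "p%d", port);
	if (n < 0 || (size_t)n >= len)
		return -EINVAL;

	return 0;
}
```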





>
># cat /lib/udev/dsanitizer
>#!/bin/sh
>echo $1 | sed -e 's,^0*,,' -e 's,0*$,,' | xargs printf sw%d
>echo $2 | sed -e 's,^0*,,' | xargs printf p%d
>
># ip link | awk '/@eth/ { split($2,a,"@"); print a[1]; }'
>sw0p0
>sw0p1
>sw0p2
>sw1p0
>sw1p1
>sw1p2
>sw2p0
>sw2p1
>sw2p2
>sw2p3
>sw2p4
>
>Of course the current behavior is unchanged, and the optional "label"
>property for user ports has precedence over the enumerated name.
>
>Signed-off-by: Vivien Didelot 
>Acked-by: Uwe Kleine-König 
>---
> Documentation/devicetree/bindings/net/dsa/dsa.txt | 20 ---
> net/dsa/dsa2.c| 24 ---
> 2 files changed, 12 insertions(+), 32 deletions(-)
>
>diff --git a/Documentation/devicetree/bindings/net/dsa/dsa.txt 
>b/Documentation/devicetree/bindings/net/dsa/dsa.txt
>index a4a570fb2494..cfe8f64eca4f 100644
>--- a/Documentation/devicetree/bindings/net/dsa/dsa.txt
>+++ b/Documentation/devicetree/bindings/net/dsa/dsa.txt
>@@ -34,13 +34,9 @@ Required properties:
> 
> Each port children node must have the following mandatory properties:
> - reg : Describes the port address in the switch
>-- label   : Describes the label associated with this 
>port, which
>-  will become the netdev name. Special labels are
>-"cpu" to indicate a CPU port and "dsa" to
>-indicate an uplink/downlink port between switches in
>-the cluster.
> 
>-A port labelled "dsa" has the following mandatory property:
>+An uplink/downlink port between switches in the cluster has the following
>+mandatory property:
> 
> - link: Should be a list of phandles to other 
> switch's DSA
> port. This port is used as the outgoing port
>@@ -48,12 +44,17 @@ A port labelled "dsa" has the following mandatory property:
> information must be given, not just the one hop
> routes to neighbouring switches.
> 
>-A port labelled "cpu" has the following mandatory property:
>+A CPU port has the following mandatory property:
> 
> - ethernet: Should be a phandle to a valid Ethernet device node.
>   This host device is what the switch port is
> connected to.
> 
>+A user port has the following optional

[PATCH v2] ipv4: make tcp_notsent_lowat sysctl knob behave as true unsigned int

2017-01-08 Thread Pavel Tikhomirov

> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-1
> echo 4294967295 > /proc/sys/net/ipv4/tcp_notsent_lowat
-bash: echo: write error: Invalid argument
> echo -2147483648 > /proc/sys/net/ipv4/tcp_notsent_lowat
> cat /proc/sys/net/ipv4/tcp_notsent_lowat
-2147483648

but in documentation we have "tcp_notsent_lowat - UNSIGNED INTEGER"

v2: simplify to just proc_douintvec
Signed-off-by: Pavel Tikhomirov 
---
 net/ipv4/sysctl_net_ipv4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c
index 80bc36b..566cfc5 100644
--- a/net/ipv4/sysctl_net_ipv4.c
+++ b/net/ipv4/sysctl_net_ipv4.c
@@ -958,7 +958,7 @@ static struct ctl_table ipv4_net_table[] = {
.data   = &init_net.ipv4.sysctl_tcp_notsent_lowat,
.maxlen = sizeof(unsigned int),
.mode   = 0644,
-   .proc_handler   = proc_dointvec,
+   .proc_handler   = proc_douintvec,
},
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
{
-- 
2.9.3
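
The transcript at the top of this patch follows from parsing the written string as a signed int: 4294967295 is out of range for int, while -2147483648 is accepted. The difference between the two parsing strategies can be sketched with standard C library calls. The helper names are hypothetical, a 32-bit int is assumed, and this is not how proc_dointvec or proc_douintvec are implemented.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Accept only values that fit in a signed int. */
static int sketch_parse_int(const char *s, int *out)
{
	char *end;
	long long v;

	assert(s && out);

	errno = 0;
	v = strtoll(s, &end, 10);
	if (errno || end == s || *end != '\0' || v < INT_MIN || v > INT_MAX)
		return -EINVAL;

	*out = (int)v;
	return 0;
}

/* Accept only non-negative values that fit in an unsigned int. */
static int sketch_parse_uint(const char *s, unsigned int *out)
{
	char *end;
	long long v;

	assert(s && out);

	errno = 0;
	v = strtoll(s, &end, 10);
	if (errno || end == s || *end != '\0' || v < 0 || v > UINT_MAX)
		return -EINVAL;

	*out = (unsigned int)v;
	return 0;
}
```

Under the signed parser the largest unsigned value is rejected and a negative one is stored; under the unsigned parser the situation is reversed, which matches the "UNSIGNED INTEGER" wording in the documentation.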

Re: [RFC v2 00/10] HFI Virtual Network Interface Controller (VNIC)

2017-01-08 Thread Leon Romanovsky

On Thu, Dec 15, 2016 at 11:28:06AM -0500, Doug Ledford wrote:
> On 12/15/2016 9:52 AM, ira.weiny wrote:
>
> 2) With more than 60% of the code being MAD related, and another
> significant chunk being hfi related, and only a minor bit (20% maybe?)
> being net related,

Hi Doug and Ira,

I may admit that I didn't read the code very deep, but from brief
overview, I didn't find support for the claim the "60% code is MAD related".
It looks like the opposite thing will be more accurate.

Can you help me to understand this claim? How did you come to this
conclusion?

Thanks


Re: patch 4.8 "net: handle no dst on skb in icmp6_send"

2017-01-08 Thread Bronek Kozicki

On 08/01/2017 22:50, David Miller wrote:
> From: Bronek Kozicki
> Date: Sun, 8 Jan 2017 21:46:18 +
>
> > Hello,
> >
> > any particular reason why this fix
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=79dc7e3f1cd323be4c81aa1a94faa1b3ed987fb2
> > was missed from stable 4.8 line? Apparently the bug being fixed has
> > its own https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-9919
> >
> > Thank you for your hard work and best regards
>
> You should always check the networking -stable queue before asking
> such questions:
>
> http://patchwork.ozlabs.org/bundle/davem/stable/?submitter=&state=*&q=&archive=
>
> Every patch sitting there is queued up and will be submitted to -stable
> at some time in the next week or two, or whenever I get around to vetting
> and submitting -stable changes.
>
> The patch you are asking about is in fact in there, and will be attended
> to at an appropriate time.

Thank you, David, for the prompt reply. Perhaps you are not aware that
patches to the 4.8 stable line might not be accepted after Sun Jan 8th
(yesterday), and that it will be considered EOL as of version 4.8.17?

Best regards

B.

Re: patch 4.8 "net: handle no dst on skb in icmp6_send"

2017-01-08 Thread Greg Kroah-Hartman

On Mon, Jan 09, 2017 at 07:53:49AM +, Bronek Kozicki wrote:
> On 08/01/2017 22:50, David Miller wrote:
> > From: Bronek Kozicki 
> > Date: Sun, 8 Jan 2017 21:46:18 +
> > 
> > > Hello,
> > > 
> > > any particular reason why this fix
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=79dc7e3f1cd323be4c81aa1a94faa1b3ed987fb2
> > > was missed from stable 4.8 line? Apparently the bug being fixed has
> > > its own https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-9919
> > > 
> > > Thank you for your hard work and best regards
> > 
> > You should always check the networking -stable queue before asking
> > such questions:
> > 
> > 
> > http://patchwork.ozlabs.org/bundle/davem/stable/?submitter=&state=*&q=&archive=
> > 
> > Every patch sitting there is queued up and will be submitted to -stable
> > at some time in the next week or two, or whenever I get around to vetting
> > and submitting -stable changes.
> > 
> > The patch you are asking about is in fact in there, and will be attended
> > to at an appropriate time.
> 
> 
> Thank you David for prompt reply. I guess perhaps you are not aware that
> patches to stable line 4.8 might not be accepted after Sun Jan 8th
> (yesterday), and it will be considered EOL by version 4.8.17 ?

It's ok, no one should be using 4.8 anymore now, and if this is fixed in
4.9, all is good :)

thanks,

greg k-h
