[net-next 08/13] i40e: convert to cpu from le16 to generate switch_id correctly

2017-02-18 Thread Jeff Kirsher
From: Jacob Keller 

On Big Endian platforms we would calculate the wrong switch
id, since we did not properly convert the le16 value into CPU format.
Caught by sparse.
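As a userspace-only sketch (not the driver's code; the mask value is an
illustrative stand-in for I40E_AQ_VSI_SW_ID_MASK), the endianness pitfall
being fixed looks like this:

```c
#include <stdint.h>

#define SW_ID_MASK 0x0FFFu  /* stand-in for I40E_AQ_VSI_SW_ID_MASK */

/* Correct path: convert the little-endian 16-bit wire value to host
 * order (the job le16_to_cpu does in the kernel), then mask. */
static uint16_t le16_to_host(const uint8_t bytes[2])
{
	return (uint16_t)(bytes[0] | (bytes[1] << 8));
}

/* Buggy path: reinterpret the raw bytes in native order. On a
 * big-endian CPU this yields a byte-swapped value before masking. */
static uint16_t native_read(const uint8_t bytes[2], int big_endian)
{
	return big_endian ? (uint16_t)((bytes[0] << 8) | bytes[1])
			  : (uint16_t)(bytes[0] | (bytes[1] << 8));
}
```

With the wire bytes {0x34, 0x12}, the converted value masks to 0x0234 on
any host, while the unconverted read on a big-endian host masks to 0x0412.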

Change-ID: I69a2f9fa064a0a91691f7d0e6fcc206adceb8e36
Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c 
b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index f1f41f12902f..267ad2588255 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -974,7 +974,7 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
struct i40e_dcbx_config *r_cfg =
&pf->hw.remote_dcbx_config;
int i, ret;
-   u32 switch_id;
+   u16 switch_id;
 
bw_data = kzalloc(sizeof(
struct i40e_aqc_query_port_ets_config_resp),
@@ -986,7 +986,8 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
 
vsi = pf->vsi[pf->lan_vsi];
switch_id =
-   vsi->info.switch_id & I40E_AQ_VSI_SW_ID_MASK;
+   le16_to_cpu(vsi->info.switch_id) &
+   I40E_AQ_VSI_SW_ID_MASK;
 
ret = i40e_aq_query_port_ets_config(&pf->hw,
switch_id,
-- 
2.11.0



[net-next 10/13] i40e: Error handling for link event

2017-02-18 Thread Jeff Kirsher
From: Harshitha Ramamurthy 

There exists an intermittent bug which causes the 'Link Detected'
field reported by the 'ethtool ' command to be 'Yes' when,
in fact, there is no link. This patch fixes the problem by
enabling temporary link polling when i40e_get_link_status returns
an error. This causes the driver to remember that an admin queue
command failed and to keep polling until the function returns success.
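The set-on-failure/clear-on-success logic above can be sketched in plain C;
the flag names and bit positions are illustrative stand-ins for the
driver's pf->flags bits:

```c
#include <stdint.h>

#define FLAG_LINK_POLLING_ENABLED (1ULL << 0)  /* stand-in flags */
#define FLAG_TEMP_LINK_POLLING    (1ULL << 1)

/* On an admin queue failure, set the temporary-polling flag so the
 * watchdog keeps calling the link event; clear it on the first
 * successful status read. */
static void update_on_link_status(uint64_t *flags, int status_ok)
{
	if (status_ok)
		*flags &= ~FLAG_TEMP_LINK_POLLING;
	else
		*flags |= FLAG_TEMP_LINK_POLLING;
}

/* Mirrors the watchdog check: poll if either flag is set. */
static int should_poll(uint64_t flags)
{
	return (flags & (FLAG_LINK_POLLING_ENABLED |
			 FLAG_TEMP_LINK_POLLING)) != 0;
}
```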

Change-Id: I64c69b008db4017b8729f3fc27b8f65c8fe2eaa0
Signed-off-by: Harshitha Ramamurthy 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h  |  1 +
 drivers/net/ethernet/intel/i40e/i40e_main.c | 14 --
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index f4c4ee3edde1..82d8040fa418 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -349,6 +349,7 @@ struct i40e_pf {
 #define I40E_FLAG_HAVE_CRT_RETIMER BIT_ULL(52)
 #define I40E_FLAG_PTP_L4_CAPABLE   BIT_ULL(53)
 #define I40E_FLAG_WOL_MC_MAGIC_PKT_WAKEBIT_ULL(54)
+#define I40E_FLAG_TEMP_LINK_POLLINGBIT_ULL(55)
 
/* tracks features that get auto disabled by errors */
u64 auto_disable_flags;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index df78271bdce5..199ef34e00f8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6353,7 +6353,16 @@ static void i40e_link_event(struct i40e_pf *pf)
old_link = (pf->hw.phy.link_info_old.link_info & I40E_AQ_LINK_UP);
 
status = i40e_get_link_status(&pf->hw, &new_link);
-   if (status) {
+
+   /* On success, disable temp link polling */
+   if (status == I40E_SUCCESS) {
+   if (pf->flags & I40E_FLAG_TEMP_LINK_POLLING)
+   pf->flags &= ~I40E_FLAG_TEMP_LINK_POLLING;
+   } else {
+   /* Enable link polling temporarily until i40e_get_link_status
+* returns I40E_SUCCESS
+*/
+   pf->flags |= I40E_FLAG_TEMP_LINK_POLLING;
dev_dbg(&pf->pdev->dev, "couldn't get link state, status: %d\n",
status);
return;
@@ -6405,7 +6414,8 @@ static void i40e_watchdog_subtask(struct i40e_pf *pf)
return;
pf->service_timer_previous = jiffies;
 
-   if (pf->flags & I40E_FLAG_LINK_POLLING_ENABLED)
+   if ((pf->flags & I40E_FLAG_LINK_POLLING_ENABLED) ||
+   (pf->flags & I40E_FLAG_TEMP_LINK_POLLING))
i40e_link_event(pf);
 
/* Update the stats for active netdevs so the network stack
-- 
2.11.0



[net-next 03/13] i40e: remove unnecessary call to i40e_update_link_info

2017-02-18 Thread Jeff Kirsher
From: Jacob Keller 

This call is made just prior to running i40e_link_event. In
i40e_link_event, we set hw->phy.get_link_info to true just prior to
calling i40e_get_link_status, which conveniently runs
i40e_update_link_info for us. Thus, we are running i40e_update_link_info
twice, which seems like something we don't need to do...

Change-ID: I36467a570f44b7546d218c99e134ff97c2709315
Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e9335af4cc28..a6dca5822cff 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -10761,7 +10761,6 @@ static int i40e_setup_pf_switch(struct i40e_pf *pf, 
bool reinit)
i40e_pf_config_rss(pf);
 
/* fill in link information and enable LSE reporting */
-   i40e_update_link_info(&pf->hw);
i40e_link_event(pf);
 
/* Initialize user-specific link properties */
-- 
2.11.0



[net-next 06/13] i40e: Fix Adaptive ITR enabling

2017-02-18 Thread Jeff Kirsher
From: Carolyn Wyborny 

This patch fixes a bug introduced with the addition of the per queue
ITR feature support in ethtool.  With that addition, there were
functions added which converted the ITR settings to binary values.
The IS_ENABLED macros that run on those values check whether a bit
is set or not; with the value reduced to binary, the bit check always
returned ITR disabled, which prevented any updating of the ITR rate.
This patch fixes the problem by changing the functions to return the
current ITR value instead and renaming them to better reflect their
function.  These functions now provide a value which will be
accurately assessed and update the ITR as intended.
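A minimal sketch of the bug, assuming an illustrative bit value for the
dynamic-ITR flag (not the driver's actual constant):

```c
#include <stdint.h>

#define ITR_DYNAMIC_BIT 0x8000u  /* illustrative "adaptive ITR" bit */
#define ITR_IS_DYNAMIC(setting) (((setting) & ITR_DYNAMIC_BIT) != 0)

/* Old helper: collapses the setting to 0/1, destroying the bit the
 * ITR_IS_DYNAMIC check looks for. */
static unsigned get_itr_buggy(unsigned itr_setting)
{
	return !!itr_setting;
}

/* Fixed helper: returns the raw ITR setting so the bit survives. */
static unsigned get_itr_fixed(unsigned itr_setting)
{
	return itr_setting;
}
```

With a setting that has the dynamic bit set, the buggy helper returns 1,
so the subsequent bit test always reports "not dynamic".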

Change-ID: I14f1d088d052e27f652aaa3113e186415ddea1fc
Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 12 ++--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 12 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 09f09ea7a5e5..4dc993bb16bf 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1864,14 +1864,14 @@ static u32 i40e_buildreg_itr(const int type, const u16 
itr)
 
 /* a small macro to shorten up some long lines */
 #define INTREG I40E_PFINT_DYN_CTLN
-static inline int get_rx_itr_enabled(struct i40e_vsi *vsi, int idx)
+static inline int get_rx_itr(struct i40e_vsi *vsi, int idx)
 {
-   return !!(vsi->rx_rings[idx]->rx_itr_setting);
+   return vsi->rx_rings[idx]->rx_itr_setting;
 }
 
-static inline int get_tx_itr_enabled(struct i40e_vsi *vsi, int idx)
+static inline int get_tx_itr(struct i40e_vsi *vsi, int idx)
 {
-   return !!(vsi->tx_rings[idx]->tx_itr_setting);
+   return vsi->tx_rings[idx]->tx_itr_setting;
 }
 
 /**
@@ -1897,8 +1897,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi 
*vsi,
 */
rxval = txval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
 
-   rx_itr_setting = get_rx_itr_enabled(vsi, idx);
-   tx_itr_setting = get_tx_itr_enabled(vsi, idx);
+   rx_itr_setting = get_rx_itr(vsi, idx);
+   tx_itr_setting = get_tx_itr(vsi, idx);
 
if (q_vector->itr_countdown > 0 ||
(!ITR_IS_DYNAMIC(rx_itr_setting) &&
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index b758846d4dc5..7cd28ef6e714 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1324,18 +1324,18 @@ static u32 i40e_buildreg_itr(const int type, const u16 
itr)
 
 /* a small macro to shorten up some long lines */
 #define INTREG I40E_VFINT_DYN_CTLN1
-static inline int get_rx_itr_enabled(struct i40e_vsi *vsi, int idx)
+static inline int get_rx_itr(struct i40e_vsi *vsi, int idx)
 {
struct i40evf_adapter *adapter = vsi->back;
 
-   return !!(adapter->rx_rings[idx].rx_itr_setting);
+   return adapter->rx_rings[idx].rx_itr_setting;
 }
 
-static inline int get_tx_itr_enabled(struct i40e_vsi *vsi, int idx)
+static inline int get_tx_itr(struct i40e_vsi *vsi, int idx)
 {
struct i40evf_adapter *adapter = vsi->back;
 
-   return !!(adapter->tx_rings[idx].tx_itr_setting);
+   return adapter->tx_rings[idx].tx_itr_setting;
 }
 
 /**
@@ -1361,8 +1361,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi 
*vsi,
 */
rxval = txval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
 
-   rx_itr_setting = get_rx_itr_enabled(vsi, idx);
-   tx_itr_setting = get_tx_itr_enabled(vsi, idx);
+   rx_itr_setting = get_rx_itr(vsi, idx);
+   tx_itr_setting = get_tx_itr(vsi, idx);
 
if (q_vector->itr_countdown > 0 ||
(!ITR_IS_DYNAMIC(rx_itr_setting) &&
-- 
2.11.0



[net-next 04/13] i40evf: free rings in remove function

2017-02-18 Thread Jeff Kirsher
From: Mitch Williams 

When i40evf_remove() calls netdev close, the device doesn't actually
close - it schedules the work for the watchdog to perform. Since we're
stopping the watchdog, this work doesn't get done. However, we're
resetting the part, so we can free resources after the reset request has
gone through. This plugs a memory leak.

Change-ID: Id5335dcaf76ce00d2a4c3d26e9faf711d7f051cf
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 920c1cb06a92..5673dbd2cf7d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2871,7 +2871,8 @@ static void i40evf_remove(struct pci_dev *pdev)
i40evf_request_reset(adapter);
msleep(50);
}
-
+   i40evf_free_all_tx_resources(adapter);
+   i40evf_free_all_rx_resources(adapter);
i40evf_misc_irq_disable(adapter);
i40evf_free_misc_irq(adapter);
i40evf_reset_interrupt_capability(adapter);
-- 
2.11.0



[net-next 07/13] i40e: refactor AQ CMD buffer debug printing

2017-02-18 Thread Jeff Kirsher
From: Alan Brady 

This patch refactors the '%*ph' printk format specifier to instead use
the print_hex_dump function, as recommended by the '%*ph' documentation.
This produces better/more standardized output.
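A userspace approximation of the refactored output path: build_prefix
mirrors the snprintf in the patch, and hex_dump is a hypothetical
stand-in for print_hex_dump() with DUMP_PREFIX_OFFSET (16 bytes per row,
1-byte groups):

```c
#include <stdio.h>
#include <string.h>

/* Build the per-device prefix, as the patch does with snprintf. */
static void build_prefix(char *prefix, size_t len,
			 unsigned bus, unsigned dev, unsigned func)
{
	snprintf(prefix, len, "i40e %02x:%02x.%x: \t0x", bus, dev, func);
}

/* Stand-in for print_hex_dump(..., DUMP_PREFIX_OFFSET, 16, 1, ...):
 * each 16-byte row is printed as "<prefix><offset>: <hex bytes>". */
static void hex_dump(const char *prefix, const unsigned char *buf,
		     size_t len)
{
	for (size_t i = 0; i < len; i += 16) {
		printf("%s%08zx:", prefix, i);
		for (size_t j = i; j < len && j < i + 16; j++)
			printf(" %02x", buf[j]);
		printf("\n");
	}
}
```

Note the 20-byte prefix buffer in the patch is just large enough: the
formatted prefix is 17 characters plus the terminating NUL.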

Change-ID: Id56700b4e8abc40ff8c04bc8379e7df04cb4d6fd
Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c   | 19 ---
 drivers/net/ethernet/intel/i40evf/i40e_common.c | 19 ---
 2 files changed, 24 insertions(+), 14 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c 
b/drivers/net/ethernet/intel/i40e/i40e_common.c
index fc73e4ef27ac..ece57d6a6e23 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -300,7 +300,6 @@ void i40e_debug_aq(struct i40e_hw *hw, enum i40e_debug_mask 
mask, void *desc,
struct i40e_aq_desc *aq_desc = (struct i40e_aq_desc *)desc;
u16 len;
u8 *buf = (u8 *)buffer;
-   u16 i = 0;
 
if ((!(mask & hw->debug_mask)) || (desc == NULL))
return;
@@ -328,12 +327,18 @@ void i40e_debug_aq(struct i40e_hw *hw, enum 
i40e_debug_mask mask, void *desc,
if (buf_len < len)
len = buf_len;
/* write the full 16-byte chunks */
-   for (i = 0; i < (len - 16); i += 16)
-   i40e_debug(hw, mask, "\t0x%04X  %16ph\n", i, buf + i);
-   /* write whatever's left over without overrunning the buffer */
-   if (i < len)
-   i40e_debug(hw, mask, "\t0x%04X  %*ph\n",
-i, len - i, buf + i);
+   if (hw->debug_mask & mask) {
+   char prefix[20];
+
+   snprintf(prefix, 20,
+"i40e %02x:%02x.%x: \t0x",
+hw->bus.bus_id,
+hw->bus.device,
+hw->bus.func);
+
+   print_hex_dump(KERN_INFO, prefix, DUMP_PREFIX_OFFSET,
+  16, 1, buf, len, false);
+   }
}
 }
 
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_common.c 
b/drivers/net/ethernet/intel/i40evf/i40e_common.c
index b5a59dd72a0c..89dfdbca13db 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_common.c
@@ -304,7 +304,6 @@ void i40evf_debug_aq(struct i40e_hw *hw, enum 
i40e_debug_mask mask, void *desc,
 {
struct i40e_aq_desc *aq_desc = (struct i40e_aq_desc *)desc;
u8 *buf = (u8 *)buffer;
-   u16 i = 0;
 
if ((!(mask & hw->debug_mask)) || (desc == NULL))
return;
@@ -332,12 +331,18 @@ void i40evf_debug_aq(struct i40e_hw *hw, enum 
i40e_debug_mask mask, void *desc,
if (buf_len < len)
len = buf_len;
/* write the full 16-byte chunks */
-   for (i = 0; i < (len - 16); i += 16)
-   i40e_debug(hw, mask, "\t0x%04X  %16ph\n", i, buf + i);
-   /* write whatever's left over without overrunning the buffer */
-   if (i < len)
-   i40e_debug(hw, mask, "\t0x%04X  %*ph\n",
-i, len - i, buf + i);
+   if (hw->debug_mask & mask) {
+   char prefix[20];
+
+   snprintf(prefix, 20,
+"i40evf %02x:%02x.%x: \t0x",
+hw->bus.bus_id,
+hw->bus.device,
+hw->bus.func);
+
+   print_hex_dump(KERN_INFO, prefix, DUMP_PREFIX_OFFSET,
+  16, 1, buf, len, false);
+   }
}
 }
 
-- 
2.11.0



[net-next 02/13] i40e: enable mc magic pkt wakeup during power down

2017-02-18 Thread Jeff Kirsher
From: Joshua Hay 

This patch adds a call to the mac_address_write admin q function during
power down to update the PRTPM_SAH/SAL registers with the MC_MAG_EN bit
thus enabling multicast magic packet wakeup.

A FW workaround is needed to write the multicast magic wake up enable
bit in the PRTPM_SAH register. The FW expects the mac address write
admin q cmd to be called first with one of the WRITE_TYPE_LAA flags
and then with the relevant multicast flags.

*Note: This solution only works for X722 devices currently. A PFR will
clear the previously mentioned bit by default, but X722 has support for a
WOL_PRESERVE_ON_PFR flag which prevents the bit from being cleared. Once
other devices support this flag, this solution should work as well.

Change-ID: I51bd5b8535bd9051c2676e27c999c1657f786827
Signed-off-by: Joshua Hay 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h|  1 +
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h |  2 +
 drivers/net/ethernet/intel/i40e/i40e_main.c   | 74 ---
 3 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h 
b/drivers/net/ethernet/intel/i40e/i40e.h
index 7a23d3e47c6f..f4c4ee3edde1 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -348,6 +348,7 @@ struct i40e_pf {
 #define I40E_FLAG_TRUE_PROMISC_SUPPORT BIT_ULL(51)
 #define I40E_FLAG_HAVE_CRT_RETIMER BIT_ULL(52)
 #define I40E_FLAG_PTP_L4_CAPABLE   BIT_ULL(53)
+#define I40E_FLAG_WOL_MC_MAGIC_PKT_WAKEBIT_ULL(54)
 
/* tracks features that get auto disabled by errors */
u64 auto_disable_flags;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h 
b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index b2101a51534c..451f48b7540a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -538,6 +538,8 @@ I40E_CHECK_STRUCT_LEN(24, i40e_aqc_mac_address_read_data);
 /* Manage MAC Address Write Command (0x0108) */
 struct i40e_aqc_mac_address_write {
__le16  command_flags;
+#define I40E_AQC_MC_MAG_EN 0x0100
+#define I40E_AQC_WOL_PRESERVE_ON_PFR   0x0200
 #define I40E_AQC_WRITE_TYPE_LAA_ONLY   0x
 #define I40E_AQC_WRITE_TYPE_LAA_WOL0x4000
 #define I40E_AQC_WRITE_TYPE_PORT   0x8000
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index fb8a52dd94cd..e9335af4cc28 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8815,16 +8815,17 @@ static int i40e_sw_init(struct i40e_pf *pf)
}
 #endif /* CONFIG_PCI_IOV */
if (pf->hw.mac.type == I40E_MAC_X722) {
-   pf->flags |= I40E_FLAG_RSS_AQ_CAPABLE |
-I40E_FLAG_128_QP_RSS_CAPABLE |
-I40E_FLAG_HW_ATR_EVICT_CAPABLE |
-I40E_FLAG_OUTER_UDP_CSUM_CAPABLE |
-I40E_FLAG_WB_ON_ITR_CAPABLE |
-I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE |
-I40E_FLAG_NO_PCI_LINK_CHECK |
-I40E_FLAG_USE_SET_LLDP_MIB |
-I40E_FLAG_GENEVE_OFFLOAD_CAPABLE |
-I40E_FLAG_PTP_L4_CAPABLE;
+   pf->flags |= I40E_FLAG_RSS_AQ_CAPABLE
+| I40E_FLAG_128_QP_RSS_CAPABLE
+| I40E_FLAG_HW_ATR_EVICT_CAPABLE
+| I40E_FLAG_OUTER_UDP_CSUM_CAPABLE
+| I40E_FLAG_WB_ON_ITR_CAPABLE
+| I40E_FLAG_MULTIPLE_TCP_UDP_RSS_PCTYPE
+| I40E_FLAG_NO_PCI_LINK_CHECK
+| I40E_FLAG_USE_SET_LLDP_MIB
+| I40E_FLAG_GENEVE_OFFLOAD_CAPABLE
+| I40E_FLAG_PTP_L4_CAPABLE
+| I40E_FLAG_WOL_MC_MAGIC_PKT_WAKE;
} else if ((pf->hw.aq.api_maj_ver > 1) ||
   ((pf->hw.aq.api_maj_ver == 1) &&
(pf->hw.aq.api_min_ver > 4))) {
@@ -11741,6 +11742,53 @@ static void i40e_pci_error_resume(struct pci_dev *pdev)
 }
 
 /**
+ * i40e_enable_mc_magic_wake - enable multicast magic packet wake up
+ * using the mac_address_write admin q function
+ * @pf: pointer to i40e_pf struct
+ **/
+static void i40e_enable_mc_magic_wake(struct i40e_pf *pf)
+{
+   struct i40e_hw *hw = &pf->hw;
+   i40e_status ret;
+   u8 mac_addr[6];
+   u16 flags = 0;
+
+   /* Get current MAC address in case it's an LAA */
+   if (pf->vsi[pf->lan_vsi] && pf->vsi[pf->lan_vsi]->netdev) {
+   ether_addr_copy(mac_addr,
+   pf->vsi[pf->lan_vsi]->netdev->dev_addr);
+   } else {
+   dev_err(&pf->pdev->dev,
+   "Failed to

[net-next 09/13] i40e: properly convert le16 value to CPU format

2017-02-18 Thread Jeff Kirsher
From: Jacob Keller 

This ensures that the pvid which is stored in __le16 format is converted
to the CPU format. This will fix comparison issues on Big Endian
platforms.
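The VLAN-correction rule the patch touches can be sketched as a small
decision function; VLAN_ANY and the helper are illustrative stand-ins for
I40E_VLAN_ANY and the driver's in-place filter updates:

```c
#define VLAN_ANY (-1)  /* stand-in for I40E_VLAN_ANY */

/* Decide the corrected VLAN for a filter, given the (CPU-order) port
 * VLAN id and whether any VLAN filters exist on the VSI. With the pvid
 * converted via le16_to_cpu up front, these comparisons are correct on
 * both little- and big-endian hosts. */
static int corrected_vlan(int cur_vlan, int pvid, int vlan_filters)
{
	if (pvid && cur_vlan != pvid)
		return pvid;      /* port VLAN overrides everything */
	if (vlan_filters && cur_vlan == VLAN_ANY)
		return 0;         /* "any" not allowed alongside VLANs */
	if (!vlan_filters && cur_vlan == 0)
		return VLAN_ANY;  /* no VLAN filters: untagged -> "any" */
	return cur_vlan;
}
```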

Change-ID: I92c80d1315dc2a0f9f095d5a0c48d461beb052ed
Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index a6dca5822cff..df78271bdce5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1254,6 +1254,7 @@ static int i40e_correct_mac_vlan_filters(struct i40e_vsi 
*vsi,
 struct hlist_head *tmp_del_list,
 int vlan_filters)
 {
+   s16 pvid = le16_to_cpu(vsi->info.pvid);
struct i40e_mac_filter *f, *add_head;
struct i40e_new_mac_filter *new;
struct hlist_node *h;
@@ -1275,8 +1276,8 @@ static int i40e_correct_mac_vlan_filters(struct i40e_vsi 
*vsi,
 
/* Update the filters about to be added in place */
hlist_for_each_entry(new, tmp_add_list, hlist) {
-   if (vsi->info.pvid && new->f->vlan != vsi->info.pvid)
-   new->f->vlan = vsi->info.pvid;
+   if (pvid && new->f->vlan != pvid)
+   new->f->vlan = pvid;
else if (vlan_filters && new->f->vlan == I40E_VLAN_ANY)
new->f->vlan = 0;
else if (!vlan_filters && new->f->vlan == 0)
@@ -1290,12 +1291,12 @@ static int i40e_correct_mac_vlan_filters(struct 
i40e_vsi *vsi,
 * order to avoid duplicating code for adding the new filter
 * then deleting the old filter.
 */
-   if ((vsi->info.pvid && f->vlan != vsi->info.pvid) ||
+   if ((pvid && f->vlan != pvid) ||
(vlan_filters && f->vlan == I40E_VLAN_ANY) ||
(!vlan_filters && f->vlan == 0)) {
/* Determine the new vlan we will be adding */
-   if (vsi->info.pvid)
-   new_vlan = vsi->info.pvid;
+   if (pvid)
+   new_vlan = pvid;
else if (vlan_filters)
new_vlan = 0;
else
-- 
2.11.0



[net-next 00/13][pull request] 40GbE Intel Wired LAN Driver Updates 2017-02-18

2017-02-18 Thread Jeff Kirsher
This series contains updates to i40e and i40evf only.

Alan fixes a bug in which the driver is unable to exit overflow
promiscuous mode after having added "too many" mac filters.  Refactored
the '%*ph' printk format specifier to instead use print_hex_dump().

Josh enables multicast magic packet wakeup by adding calls to
the mac_address_write admin q function during power down to update the
PRTPM_SAH/SAL registers with the MC_MAG_EN bit.

Jake removes a duplicate call to i40e_update_link_info(), since we do
not need to call it twice.  Fixes an issue where we were calculating the
wrong switch id on big endian platforms.  Avoids a sparse warning by
adding a typecast to ensure the value is of the type expected by
csum_replace_by_diff().

Mitch fixes a memory leak by freeing resources during i40evf_remove().
Clears up some code confusion by adding a proper code comment.

Carolyn fixes a bug introduced with the addition of the per queue ITR
feature support in ethtool.  Removes a duplicate device id from the
PCI table.

Harshitha fixes a bug which caused the 'Link Detected' field in
ethtool to report an incorrect link status.

Benjamin Poirier from SuSE applies the fix from commit ec13ee80145c
("virtio_net: invoke softirqs after __napi_schedule") to i40e as well.

The following are changes since commit 4e33e34625103593a71d2bae471ce49cef62ef06:
  tcp: use page_ref_inc() in tcp_sendmsg()
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Alan Brady (2):
  i40e: fix disable overflow promiscuous mode
  i40e: refactor AQ CMD buffer debug printing

Benjamin Poirier (1):
  i40e: Invoke softirqs after napi_reschedule

Carolyn Wyborny (2):
  i40e: Fix Adaptive ITR enabling
  i40e: remove duplicate device id from PCI table

Harshitha Ramamurthy (1):
  i40e: Error handling for link event

Jacob Keller (4):
  i40e: remove unnecessary call to i40e_update_link_info
  i40e: convert to cpu from le16 to generate switch_id correctly
  i40e: properly convert le16 value to CPU format
  i40e: mark the value passed to csum_replace_by_diff as __wsum

Joshua Hay (1):
  i40e: enable mc magic pkt wakeup during power down

Mitch Williams (2):
  i40evf: free rings in remove function
  i40evf: add comment

 drivers/net/ethernet/intel/i40e/i40e.h|   2 +
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h |   2 +
 drivers/net/ethernet/intel/i40e/i40e_common.c |  19 ++--
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c|   5 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c   | 115 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  17 ++--
 drivers/net/ethernet/intel/i40evf/i40e_common.c   |  19 ++--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c |  17 ++--
 drivers/net/ethernet/intel/i40evf/i40evf_main.c   |   8 +-
 9 files changed, 147 insertions(+), 57 deletions(-)

-- 
2.11.0



[net-next 13/13] i40e: Invoke softirqs after napi_reschedule

2017-02-18 Thread Jeff Kirsher
From: Benjamin Poirier 

The following message is logged from time to time when using i40e:
NOHZ: local_softirq_pending 08

i40e may schedule napi from a workqueue. Afterwards, softirqs are not run
in a deterministic time frame. The problem is the same as what was
described in commit ec13ee80145c ("virtio_net: invoke softirqs after
__napi_schedule") and this patch applies the same fix to i40e.

Signed-off-by: Benjamin Poirier 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index b0215c1159fe..e8a8351c8ea9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -4685,8 +4685,10 @@ static void i40e_detect_recover_hung_queue(int q_idx, 
struct i40e_vsi *vsi)
 */
if ((!tx_pending_hw) && i40e_get_tx_pending(tx_ring, true) &&
(!(val & I40E_PFINT_DYN_CTLN_INTENA_MASK))) {
+   local_bh_disable();
if (napi_reschedule(&tx_ring->q_vector->napi))
tx_ring->tx_stats.tx_lost_interrupt++;
+   local_bh_enable();
}
 }
 
-- 
2.11.0



[net-next 11/13] i40e: mark the value passed to csum_replace_by_diff as __wsum

2017-02-18 Thread Jeff Kirsher
From: Jacob Keller 

Fix, or rather, avoid a sparse warning caused by the fact that
csum_replace_by_diff expects to receive a __wsum value. Since the
calculation appears to work, simply typecast the passed paylen value to
__wsum to avoid the warning.

This seems pretty fishy since __wsum was obviously annotated as
a separate type on purpose, so this throws the entire calculation into
question. Since it currently appears to behave as expected, the typecast
is probably safe.
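A sketch of why the cast is needed, using a struct as a compile-time
stand-in for the sparse-only __wsum annotation (the type name and
helpers here are hypothetical, not kernel APIs):

```c
#include <stdint.h>

/* Stand-in for __wsum: a distinct type the compiler itself enforces,
 * analogous to what sparse checks via the __bitwise annotation. */
typedef struct { uint32_t v; } wsum_t;

/* Portable htonl equivalent: host order to big-endian. */
static uint32_t to_be32(uint32_t host)
{
	return ((host & 0x000000ffu) << 24) | ((host & 0x0000ff00u) << 8) |
	       ((host & 0x00ff0000u) >> 8)  | ((host & 0xff000000u) >> 24);
}

/* The explicit conversion, mirroring (__force __wsum)htonl(paylen):
 * without a step like this, a plain uint32_t cannot be passed where
 * the annotated checksum type is expected. */
static wsum_t as_wsum(uint32_t be)
{
	return (wsum_t){ be };
}
```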

Change-ID: I4fdc5cddd589abc16098176e8a61127e761488f4
Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 5 +++--
 drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 4dc993bb16bf..97d46058d71d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2335,7 +2335,8 @@ static int i40e_tso(struct i40e_tx_buffer *first, u8 
*hdr_len,
 
/* remove payload length from outer checksum */
paylen = skb->len - l4_offset;
-   csum_replace_by_diff(&l4.udp->check, htonl(paylen));
+   csum_replace_by_diff(&l4.udp->check,
+(__force __wsum)htonl(paylen));
}
 
/* reset pointers to inner headers */
@@ -2356,7 +2357,7 @@ static int i40e_tso(struct i40e_tx_buffer *first, u8 
*hdr_len,
 
/* remove payload length from inner checksum */
paylen = skb->len - l4_offset;
-   csum_replace_by_diff(&l4.tcp->check, htonl(paylen));
+   csum_replace_by_diff(&l4.tcp->check, (__force __wsum)htonl(paylen));
 
/* compute length of segmentation header */
*hdr_len = (l4.tcp->doff * 4) + l4_offset;
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
index 7cd28ef6e714..c91fcf43ccbc 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40evf/i40e_txrx.c
@@ -1629,7 +1629,8 @@ static int i40e_tso(struct i40e_tx_buffer *first, u8 
*hdr_len,
 
/* remove payload length from outer checksum */
paylen = skb->len - l4_offset;
-   csum_replace_by_diff(&l4.udp->check, htonl(paylen));
+   csum_replace_by_diff(&l4.udp->check,
+(__force __wsum)htonl(paylen));
}
 
/* reset pointers to inner headers */
@@ -1650,7 +1651,7 @@ static int i40e_tso(struct i40e_tx_buffer *first, u8 
*hdr_len,
 
/* remove payload length from inner checksum */
paylen = skb->len - l4_offset;
-   csum_replace_by_diff(&l4.tcp->check, htonl(paylen));
+   csum_replace_by_diff(&l4.tcp->check, (__force __wsum)htonl(paylen));
 
/* compute length of segmentation header */
*hdr_len = (l4.tcp->doff * 4) + l4_offset;
-- 
2.11.0



[net-next 12/13] i40e: remove duplicate device id from PCI table

2017-02-18 Thread Jeff Kirsher
From: Carolyn Wyborny 

Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 199ef34e00f8..b0215c1159fe 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -77,7 +77,6 @@ static const struct pci_device_id i40e_pci_tbl[] = {
{PCI_VDEVICE(INTEL, I40E_DEV_ID_QSFP_C), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_10G_BASE_T4), 0},
-   {PCI_VDEVICE(INTEL, I40E_DEV_ID_20G_KR2), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_KX_X722), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_QSFP_X722), 0},
{PCI_VDEVICE(INTEL, I40E_DEV_ID_SFP_X722), 0},
-- 
2.11.0



[net-next 05/13] i40evf: add comment

2017-02-18 Thread Jeff Kirsher
From: Mitch Williams 

Add a comment to reduce confusion.

Change-ID: I3d5819c0f3f5174680442ae54398a073d4a61f4f
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c 
b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index 5673dbd2cf7d..f35dcaac5bb7 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -2153,6 +2153,11 @@ static int i40evf_close(struct net_device *netdev)
adapter->state = __I40EVF_DOWN_PENDING;
i40evf_free_traffic_irqs(adapter);
 
+   /* We explicitly don't free resources here because the hardware is
+* still active and can DMA into memory. Resources are cleared in
+* i40evf_virtchnl_completion() after we get confirmation from the PF
+* driver that the rings have been stopped.
+*/
return 0;
 }
 
-- 
2.11.0



[net-next 01/13] i40e: fix disable overflow promiscuous mode

2017-02-18 Thread Jeff Kirsher
From: Alan Brady 

There exists a bug in which the driver is unable to exit overflow
promiscuous mode after having added "too many" mac filters.  It is
expected that after triggering overflow promiscuous, removing the
failed/extra filters should then disable overflow promiscuous mode.

The bug exists because we were intentionally skipping the sync_vsi_filter
path in cases where we were removing failed filters, since they shouldn't
have been added to the firmware in the first place.  However, we still
need to go through the sync_vsi_filter code path to determine whether or
not it is ok to exit overflow promiscuous mode.  This patch fixes the
bug by making sure we go through the sync_vsi_filter path in cases of
failed filters.
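The fixed deletion logic can be sketched as a small decision function;
the states and the needs_sync field are illustrative stand-ins for the
driver's filter states and the VSI/PF flag updates:

```c
enum filter_state { FILTER_NEW, FILTER_FAILED, FILTER_ACTIVE,
		    FILTER_REMOVE };

struct del_result {
	enum filter_state state;
	int needs_sync;
};

/* Failed/new filters never reached firmware, so they can be freed
 * directly (state unchanged here stands in for hash_del + kfree);
 * filters known to firmware are marked for removal. Either way the
 * sync is now requested, so overflow promiscuous gets re-evaluated. */
static struct del_result del_filter(enum filter_state state)
{
	struct del_result r = { state, 1 };

	if (state != FILTER_FAILED && state != FILTER_NEW)
		r.state = FILTER_REMOVE;
	return r;
}
```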

Change-ID: I634d249ca3e5fa50729553137c295e73e7722143
Signed-off-by: Alan Brady 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c 
b/drivers/net/ethernet/intel/i40e/i40e_main.c
index e83a8ca5dd65..fb8a52dd94cd 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -1447,18 +1447,20 @@ void __i40e_del_filter(struct i40e_vsi *vsi, struct 
i40e_mac_filter *f)
if (!f)
return;
 
+   /* If the filter was never added to firmware then we can just delete it
+* directly and we don't want to set the status to remove or else an
+* admin queue command will unnecessarily fire.
+*/
if ((f->state == I40E_FILTER_FAILED) ||
(f->state == I40E_FILTER_NEW)) {
-   /* this one never got added by the FW. Just remove it,
-* no need to sync anything.
-*/
hash_del(&f->hlist);
kfree(f);
} else {
f->state = I40E_FILTER_REMOVE;
-   vsi->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
-   vsi->back->flags |= I40E_FLAG_FILTER_SYNC;
}
+
+   vsi->flags |= I40E_VSI_FLAG_FILTER_CHANGED;
+   vsi->back->flags |= I40E_FLAG_FILTER_SYNC;
 }
 
 /**
-- 
2.11.0



Re: [PATCH rfc 4/4] iscsi-target: use generic inet_pton_with_scope

2017-02-18 Thread Nicholas A. Bellinger
On Thu, 2017-02-16 at 19:43 +0200, Sagi Grimberg wrote:
> Signed-off-by: Sagi Grimberg 
> ---
>  drivers/target/iscsi/iscsi_target_configfs.c | 46 
> 
>  1 file changed, 12 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/target/iscsi/iscsi_target_configfs.c 
> b/drivers/target/iscsi/iscsi_target_configfs.c
> index bf40f03755dd..f30c27b83c5e 100644
> --- a/drivers/target/iscsi/iscsi_target_configfs.c
> +++ b/drivers/target/iscsi/iscsi_target_configfs.c
> @@ -167,10 +167,7 @@ static struct se_tpg_np *lio_target_call_addnptotpg(
>   struct iscsi_portal_group *tpg;
>   struct iscsi_tpg_np *tpg_np;
>   char *str, *str2, *ip_str, *port_str;
> - struct sockaddr_storage sockaddr;
> - struct sockaddr_in *sock_in;
> - struct sockaddr_in6 *sock_in6;
> - unsigned long port;
> + struct sockaddr_storage sockaddr = { };
>   int ret;
>   char buf[MAX_PORTAL_LEN + 1];
>  
> @@ -182,21 +179,19 @@ static struct se_tpg_np *lio_target_call_addnptotpg(
>   memset(buf, 0, MAX_PORTAL_LEN + 1);
>   snprintf(buf, MAX_PORTAL_LEN + 1, "%s", name);
>  
> - memset(&sockaddr, 0, sizeof(struct sockaddr_storage));
> -
>   str = strstr(buf, "[");
>   if (str) {
> - const char *end;
> -
>   str2 = strstr(str, "]");
>   if (!str2) {
>   pr_err("Unable to locate trailing \"]\""
>   " in IPv6 iSCSI network portal address\n");
>   return ERR_PTR(-EINVAL);
>   }
> - str++; /* Skip over leading "[" */
> +
> + ip_str = str + 1; /* Skip over leading "[" */
>   *str2 = '\0'; /* Terminate the unbracketed IPv6 address */
>   str2++; /* Skip over the \0 */
> +
>   port_str = strstr(str2, ":");
>   if (!port_str) {
>   pr_err("Unable to locate \":port\""
> @@ -205,23 +200,8 @@ static struct se_tpg_np *lio_target_call_addnptotpg(
>   }
>   *port_str = '\0'; /* Terminate string for IP */
>   port_str++; /* Skip over ":" */
> -
> - ret = kstrtoul(port_str, 0, &port);
> - if (ret < 0) {
> - pr_err("kstrtoul() failed for port_str: %d\n", ret);
> - return ERR_PTR(ret);
> - }
> - sock_in6 = (struct sockaddr_in6 *)&sockaddr;
> - sock_in6->sin6_family = AF_INET6;
> - sock_in6->sin6_port = htons((unsigned short)port);
> - ret = in6_pton(str, -1,
> - (void *)&sock_in6->sin6_addr.in6_u, -1, &end);
> - if (ret <= 0) {
> - pr_err("in6_pton returned: %d\n", ret);
> - return ERR_PTR(-EINVAL);
> - }
>   } else {
> - str = ip_str = &buf[0];
> + ip_str = &buf[0];
>   port_str = strstr(ip_str, ":");
>   if (!port_str) {
>   pr_err("Unable to locate \":port\""
> @@ -230,17 +210,15 @@ static struct se_tpg_np *lio_target_call_addnptotpg(
>   }
>   *port_str = '\0'; /* Terminate string for IP */
>   port_str++; /* Skip over ":" */
> + }
>  
> - ret = kstrtoul(port_str, 0, &port);
> - if (ret < 0) {
> - pr_err("kstrtoul() failed for port_str: %d\n", ret);
> - return ERR_PTR(ret);
> - }
> - sock_in = (struct sockaddr_in *)&sockaddr;
> - sock_in->sin_family = AF_INET;
> - sock_in->sin_port = htons((unsigned short)port);
> - sock_in->sin_addr.s_addr = in_aton(ip_str);
> + ret = inet_pton_with_scope(&init_net, AF_UNSPEC, ip_str,
> + port_str, &sockaddr);
> + if (ret) {
> + pr_err("malformed ip/port passed: %s\n", name);
> + return ERR_PTR(ret);
>   }
> +
>   tpg = container_of(se_tpg, struct iscsi_portal_group, tpg_se_tpg);
>   ret = iscsit_get_tpg(tpg);
>   if (ret < 0)

A nice cleanup.

Acked-by: Nicholas Bellinger 



Re: [PATCH repost] ptr_ring: fix race conditions when resizing

2017-02-18 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Sun, 19 Feb 2017 07:17:17 +0200

> Dave, could you merge this before 4.10? If not - I can try.

I just sent my last pull request to Linus, please merge it to
him directly.

Thanks.


Re: net: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected in skb_array_produce

2017-02-18 Thread Michael S. Tsirkin
On Sat, Feb 18, 2017 at 06:28:39PM +0100, Dmitry Vyukov wrote:
> On Fri, Feb 10, 2017 at 6:17 AM, Jason Wang  wrote:
> >
> >
> >> On 2017-02-10 02:10, Michael S. Tsirkin wrote:
> >>
> >> On Thu, Feb 09, 2017 at 05:02:31AM -0500, Jason Wang wrote:
> >>>
> >>> - Original Message -
> 
>  Hello,
> 
>  I've got the following report while running syzkaller fuzzer on mmotm
>  (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
>  remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e:
> 
> >>> [...]
> >>>
>  other info that might help us debug this:
> 
>    Possible interrupt unsafe locking scenario:
> 
>        CPU0                    CPU1
>        ----                    ----
>   lock(&(&r->consumer_lock)->rlock);
>                                local_irq_disable();
>                                lock(&(&r->producer_lock)->rlock);
>                                lock(&(&r->consumer_lock)->rlock);
>   <Interrupt>
>     lock(&(&r->producer_lock)->rlock);
> 
> >>> Thanks a lot for the testing.
> >>>
> >>> Looks like we could address this by using skb_array_consume_bh() instead.
> >>>
> >>> Could you pls verify if the following patch works?
> >>
> >> I think we should use _bh for the produce call as well,
> >> since resizing takes the producer lock.
> >
> > Looks not since irq was disabled during resizing?
> 
> 
> Hello,
> 
> Is there a fix for this that we can pick up?
> This killed 10'000 VMs on our testing infra over the last day. Still
> happening on linux-next.
> 
> Thanks

I posted a fix: "ptr_ring: fix race conditions when resizing".
Just reposted.  I'll push it into linux-next ASAP.

-- 
MST


Re: [PATCH net-next] virtio-net: set queues after reset during xdp_set

2017-02-18 Thread Michael S. Tsirkin
On Fri, Feb 17, 2017 at 01:10:08PM +0800, Jason Wang wrote:
> 
> 
> > On 2017-02-17 12:53, John Fastabend wrote:
> > On 17-02-15 01:08 AM, Jason Wang wrote:
> > > We set queues before reset, which causes a crash[1]. This is
> > > because is_xdp_raw_buffer_queue() depends on the old xdp queue pairs
> > > number to do the correct detection. So fix this by:
> > > 
> > > - set queues after reset, to keep the old vi->curr_queue_pairs. (in
> > >fact setting queues before reset does not work since after feature
> > >set, all queue pairs were enabled by default during reset).
> > > - change xdp_queue_pairs only after virtnet_reset() is succeed.
> > > 
> > > [1]
> > I'm guessing this occurs when enabling XDP while receiving lots of traffic?
> 
> I hit this when disabling XDP while receiving lots of traffic.
> 
> > 
> > > [   74.328168] general protection fault:  [#1] SMP
> > > [   74.328625] Modules linked in: nfsd xfs libcrc32c virtio_net virtio_pci
> > > [   74.329117] CPU: 0 PID: 2849 Comm: xdp2 Not tainted 4.10.0-rc7+ #499
> > > [   74.329577] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
> > > [   74.330424] task: 88007a894000 task.stack: c90004388000
> > > [   74.330844] RIP: 0010:skb_release_head_state+0x28/0x80
> > > [   74.331298] RSP: 0018:c9000438b8d0 EFLAGS: 00010206
> > > [   74.331676] RAX:  RBX: 88007ad96300 RCX: 
> > > 
> > > [   74.332217] RDX: 88007fc137a8 RSI: 88007fc0db28 RDI: 
> > > 0001bf0001be
> > > [   74.332758] RBP: c9000438b8d8 R08: 0005008f R09: 
> > > 05f9
> > > [   74.333274] R10: 88007d001700 R11: 820a8a4d R12: 
> > > 88007ad96300
> > > [   74.333787] R13: 0002 R14: 880036604000 R15: 
> > > 77ff8000
> > > [   74.334308] FS:  7fc70d8a7b40() GS:88007fc0() 
> > > knlGS:
> > > [   74.334891] CS:  0010 DS:  ES:  CR0: 80050033
> > > [   74.335314] CR2: 7fff4144a710 CR3: 7ab56000 CR4: 
> > > 003406f0
> > > [   74.335830] DR0:  DR1:  DR2: 
> > > 
> > > [   74.336373] DR3:  DR6: fffe0ff0 DR7: 
> > > 0400
> > > [   74.336895] Call Trace:
> > > [   74.337086]  skb_release_all+0xd/0x30
> > > [   74.337356]  consume_skb+0x2c/0x90
> > > [   74.337607]  free_unused_bufs+0x1ff/0x270 [virtio_net]
> > > [   74.337988]  ? vp_synchronize_vectors+0x3b/0x60 [virtio_pci]
> > > [   74.338398]  virtnet_xdp+0x21e/0x440 [virtio_net]
> > > [   74.338741]  dev_change_xdp_fd+0x101/0x140
> > > [   74.339048]  do_setlink+0xcf4/0xd20
> > > [   74.339304]  ? symcmp+0xf/0x20
> > > [   74.339529]  ? mls_level_isvalid+0x52/0x60
> > > [   74.339828]  ? mls_range_isvalid+0x43/0x50
> > > [   74.340135]  ? nla_parse+0xa0/0x100
> > > [   74.340400]  rtnl_setlink+0xd4/0x120
> > > [   74.340664]  ? cpumask_next_and+0x30/0x50
> > > [   74.340966]  rtnetlink_rcv_msg+0x7f/0x1f0
> > > [   74.341259]  ? sock_has_perm+0x59/0x60
> > > [   74.341586]  ? napi_consume_skb+0xe2/0x100
> > > [   74.342010]  ? rtnl_newlink+0x890/0x890
> > > [   74.342435]  netlink_rcv_skb+0x92/0xb0
> > > [   74.342846]  rtnetlink_rcv+0x23/0x30
> > > [   74.343277]  netlink_unicast+0x162/0x210
> > > [   74.343677]  netlink_sendmsg+0x2db/0x390
> > > [   74.343968]  sock_sendmsg+0x33/0x40
> > > [   74.344233]  SYSC_sendto+0xee/0x160
> > > [   74.344482]  ? SYSC_bind+0xb0/0xe0
> > > [   74.344806]  ? sock_alloc_file+0x92/0x110
> > > [   74.345106]  ? fd_install+0x20/0x30
> > > [   74.345360]  ? sock_map_fd+0x3f/0x60
> > > [   74.345586]  SyS_sendto+0x9/0x10
> > > [   74.345790]  entry_SYSCALL_64_fastpath+0x1a/0xa9
> > > [   74.346086] RIP: 0033:0x7fc70d1b8f6d
> > > [   74.346312] RSP: 002b:7fff4144a708 EFLAGS: 0246 ORIG_RAX: 
> > > 002c
> > > [   74.346785] RAX: ffda RBX:  RCX: 
> > > 7fc70d1b8f6d
> > > [   74.347244] RDX: 002c RSI: 7fff4144a720 RDI: 
> > > 0003
> > > [   74.347683] RBP: 0003 R08:  R09: 
> > > 
> > > [   74.348544] R10:  R11: 0246 R12: 
> > > 7fff4144bd90
> > > [   74.349082] R13: 0002 R14: 0002 R15: 
> > > 7fff4144cda0
> > > [   74.349607] Code: 00 00 00 55 48 89 e5 53 48 89 fb 48 8b 7f 58 48 85 
> > > ff 74 0e 40 f6 c7 01 74 3d 48 c7 43 58 00 00 00 00 48 8b 7b 68 48 85 ff 
> > > 74 05  ff 0f 74 20 48 8b 43 60 48 85 c0 74 14 65 8b 15 f3 ab 8d 7e
> > > [   74.351008] RIP: skb_release_head_state+0x28/0x80 RSP: c9000438b8d0
> > > [   74.351625] ---[ end trace fe6e19fd11cfc80b ]---
> > > 
> > > Fixes: 2de2f7f40ef9 ("virtio_net: XDP support for adjust_head")
> > > Cc: John Fastabend 
> > > Signed-off-by: Jason Wang 
> > > ---
> > >   drivers/net/virtio_net.c | 35 ++

[PATCH repost] ptr_ring: fix race conditions when resizing

2017-02-18 Thread Michael S. Tsirkin
Resizing currently drops consumer lock.  This can cause entries to be
reordered, which isn't good in itself.  More importantly, consumer can
detect a false ring empty condition and block forever.

Further, nesting of consumer within producer lock is problematic for
tun, since it produces entries in a BH, which causes a lock order
reversal:

   CPU0                                    CPU1
   ----                                    ----
  consume:
  lock(&(&r->consumer_lock)->rlock);
                                           resize:
                                           local_irq_disable();
                                           lock(&(&r->producer_lock)->rlock);
                                           lock(&(&r->consumer_lock)->rlock);
  <interrupt>
  produce:
  lock(&(&r->producer_lock)->rlock);

To fix, nest producer lock within consumer lock during resize,
and keep consumer lock during the whole swap operation.

Reported-by: Dmitry Vyukov 
Cc: sta...@vger.kernel.org
Cc: "David S. Miller" 
Acked-by: Jason Wang 
Signed-off-by: Michael S. Tsirkin 
---

Dave, could you merge this before 4.10? If not - I can try.

 include/linux/ptr_ring.h | 36 +++-
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
index 2052011..6c70444 100644
--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -111,6 +111,11 @@ static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
return 0;
 }
 
+/*
+ * Note: resize (below) nests producer lock within consumer lock, so if you
+ * consume in interrupt or BH context, you must disable interrupts/BH when
+ * calling this.
+ */
 static inline int ptr_ring_produce(struct ptr_ring *r, void *ptr)
 {
int ret;
@@ -242,6 +247,11 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
return ptr;
 }
 
+/*
+ * Note: resize (below) nests producer lock within consumer lock, so if you
+ * call this in interrupt or BH context, you must disable interrupts/BH when
+ * producing.
+ */
 static inline void *ptr_ring_consume(struct ptr_ring *r)
 {
void *ptr;
@@ -357,7 +367,7 @@ static inline void **__ptr_ring_swap_queue(struct ptr_ring *r, void **queue,
void **old;
void *ptr;
 
-   while ((ptr = ptr_ring_consume(r)))
+   while ((ptr = __ptr_ring_consume(r)))
if (producer < size)
queue[producer++] = ptr;
else if (destroy)
@@ -372,6 +382,12 @@ static inline void **__ptr_ring_swap_queue(struct ptr_ring *r, void **queue,
return old;
 }
 
+/*
+ * Note: producer lock is nested within consumer lock, so if you
+ * resize you must make sure all uses nest correctly.
+ * In particular if you consume ring in interrupt or BH context, you must
+ * disable interrupts/BH when doing so.
+ */
 static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
  void (*destroy)(void *))
 {
@@ -382,17 +398,25 @@ static inline int ptr_ring_resize(struct ptr_ring *r, int size, gfp_t gfp,
if (!queue)
return -ENOMEM;
 
-   spin_lock_irqsave(&(r)->producer_lock, flags);
+   spin_lock_irqsave(&(r)->consumer_lock, flags);
+   spin_lock(&(r)->producer_lock);
 
old = __ptr_ring_swap_queue(r, queue, size, gfp, destroy);
 
-   spin_unlock_irqrestore(&(r)->producer_lock, flags);
+   spin_unlock(&(r)->producer_lock);
+   spin_unlock_irqrestore(&(r)->consumer_lock, flags);
 
kfree(old);
 
return 0;
 }
 
+/*
+ * Note: producer lock is nested within consumer lock, so if you
+ * resize you must make sure all uses nest correctly.
+ * In particular if you consume ring in interrupt or BH context, you must
+ * disable interrupts/BH when doing so.
+ */
 static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
   int size,
   gfp_t gfp, void (*destroy)(void *))
@@ -412,10 +436,12 @@ static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
}
 
for (i = 0; i < nrings; ++i) {
-   spin_lock_irqsave(&(rings[i])->producer_lock, flags);
+   spin_lock_irqsave(&(rings[i])->consumer_lock, flags);
+   spin_lock(&(rings[i])->producer_lock);
queues[i] = __ptr_ring_swap_queue(rings[i], queues[i],
  size, gfp, destroy);
-   spin_unlock_irqrestore(&(rings[i])->producer_lock, flags);
+   spin_unlock(&(rings[i])->producer_lock);
+   spin_unlock_irqrestore(&(rings[i])->consumer_lock, flags);
}
 
for (i = 0; i < nrings; ++i)
-- 
MST


[GIT] Networking

2017-02-18 Thread David Miller

One last brown-paper-bag fix for the release.  If we fail the ipv4
mapped source address check, we have to release the route.  From
Willem de Bruijn.

Please pull, thanks a lot!

The following changes since commit 2763f92f858f7c4c3198335c0542726eaed07ba3:

  Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc (2017-02-18 17:38:09 -0800)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 00ea1ceebe0d9f2dc1cc2b7bd575a00100c27869:

  ipv6: release dst on error in ip6_dst_lookup_tail (2017-02-18 22:55:13 -0500)


Willem de Bruijn (1):
  ipv6: release dst on error in ip6_dst_lookup_tail

 net/ipv6/ip6_output.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)


Re: [PATCH net v2] ipv6: release dst on error in ip6_dst_lookup_tail

2017-02-18 Thread David Miller
From: Willem de Bruijn 
Date: Sat, 18 Feb 2017 19:00:45 -0500

> From: Willem de Bruijn 
> 
> If ip6_dst_lookup_tail has acquired a dst and fails the IPv4-mapped
> check, release the dst before returning an error.
> 
> Fixes: ec5e3b0a1d41 ("ipv6: Inhibit IPv4-mapped src address on the wire.")
> Signed-off-by: Willem de Bruijn 

Applied, thanks Willem.


Re: Questions on XDP

2017-02-18 Thread John Fastabend
On 17-02-18 06:16 PM, Alexander Duyck wrote:
> On Sat, Feb 18, 2017 at 3:48 PM, John Fastabend
>  wrote:
>> On 17-02-18 03:31 PM, Alexei Starovoitov wrote:
>>> On Sat, Feb 18, 2017 at 10:18 AM, Alexander Duyck
>>>  wrote:

> XDP_DROP does not require having one page per frame.

 Agreed.
>>>
>>> why do you think so?
>>> xdp_drop is targeting ddos where in good case
>>> all traffic is passed up and in bad case
>>> most of the traffic is dropped, but good traffic still needs
>>> to be serviced by the layers after. Like other xdp
>>> programs and the stack.
>>> Say ixgbe+xdp goes with 2k per packet,
>>> very soon we will have a bunch of half pages
>>> sitting in the stack and other halfs requiring
>>> complex refcnting and making the actual
>>> ddos mitigation ineffective and forcing nic to drop packets
>>
>> I'm not seeing the distinction here. If its a 4k page and
>> in the stack the driver will get overrun as well.
>>
>>> because it runs out of buffers. Why complicate things?
>>
>> It doesn't seem complex to me and the driver already handles this
>> case so it actually makes the drivers simpler because there is only
>> a single buffer management path.
>>
>>> packet per page approach is simple and effective.
>>> virtio is different. there we don't have hw that needs
>>> to have buffers ready for dma.
>>>
 Looking at the Mellanox way of doing it I am not entirely sure it is
 useful.  It looks good for benchmarks but that is about it.  Also I
>>>
>>> it's the opposite. It already runs very nicely in production.
>>> In real life it's always a combination of xdp_drop, xdp_tx and
>>> xdp_pass actions.
>>> Sounds like ixgbe wants to do things differently because
>>> of not-invented-here. That new approach may turn
>>> out to be good or bad, but why risk it?
>>> mlx4 approach works.
>>> mlx5 has few issues though, because page recycling
>>> was done too simplistic. Generic page pool/recycling
>>> that all drivers will use should solve that. I hope.
>>> Is the proposal to have generic split-page recycler ?
>>> How that is going to work?
>>>
>>
>> No, just give the driver a page when it asks for it. How the
>> driver uses the page is not the pools concern.
>>
 don't see it extending out to the point that we would be able to
 exchange packets between interfaces which really seems like it should
 be the ultimate goal for XDP_TX.
>>>
>>> we don't have a use case for multi-port xdp_tx,
>>> but I'm not objecting to doing it in general.
>>> Just right now I don't see a need to complicate
>>> drivers to do so.
>>
>> We are running our vswitch in userspace now for many workloads
>> it would be nice to have these in kernel if possible.
>>
>>>
 It seems like eventually we want to be able to peel off the buffer and
 send it to something other than ourselves.  For example it seems like
 it might be useful at some point to use XDP to do traffic
 classification and have it route packets between multiple interfaces
 on a host and it wouldn't make sense to have all of them map every
 page as bidirectional because it starts becoming ridiculous if you
 have dozens of interfaces in a system.
>>>
>>> dozen interfaces? Like a single nic with dozen ports?
>>> or many nics with many ports on the same system?
>>> are you trying to build a switch out of x86?
>>> I don't think it's realistic to have multi-terabit x86 box.
>>> Is it all because of dpdk/6wind demos?
>>> I saw how dpdk was bragging that they can saturate
>>> pcie bus. So? Why is this useful?
> 
> Actually I was thinking more of an OVS, bridge, or routing
> replacement.  Basically with a couple of physical interfaces and then
> either veth and/or vhost interfaces.
> 

Yep valid use case for me. We would use this with Intel Clear Linux
assuming we can sort it out and perf metrics are good.

>>> Why anyone would care to put a bunch of nics
>>> into x86 and demonstrate that bandwidth of pcie is now
>>> a limiting factor ?
>>
>> Maybe Alex had something else in mind but we have many virtual interfaces
>> plus physical interfaces in vswitch use case. Possibly thousands.
> 
> I was thinking about the fact that the Mellanox driver is currently
> mapping pages as bidirectional, so I was sticking to the device to
> device case in regards to that discussion.  For virtual interfaces we
> don't even need the DMA mapping, it is just a copy to user space we
> have to deal with in the case of vhost.  In that regard I was thinking
> we need to start looking at taking XDP_TX one step further and
> possibly look at supporting the transmit of an xdp_buf on an unrelated
> netdev.  Although it looks like that means adding a netdev pointer to
> xdp_buf in order to support returning that.
> 
> Anyway I am just running on conjecture at this point.  But it seems
> like if we want to make XDP capable of doing transmit we should
> support something other than bounce on the same port since that seems
> like a "just saturate the bus" use case m

Re: Questions on XDP

2017-02-18 Thread Alexander Duyck
On Sat, Feb 18, 2017 at 3:48 PM, John Fastabend
 wrote:
> On 17-02-18 03:31 PM, Alexei Starovoitov wrote:
>> On Sat, Feb 18, 2017 at 10:18 AM, Alexander Duyck
>>  wrote:
>>>
 XDP_DROP does not require having one page per frame.
>>>
>>> Agreed.
>>
>> why do you think so?
>> xdp_drop is targeting ddos where in good case
>> all traffic is passed up and in bad case
>> most of the traffic is dropped, but good traffic still needs
>> to be serviced by the layers after. Like other xdp
>> programs and the stack.
>> Say ixgbe+xdp goes with 2k per packet,
>> very soon we will have a bunch of half pages
>> sitting in the stack and other halfs requiring
>> complex refcnting and making the actual
>> ddos mitigation ineffective and forcing nic to drop packets
>
> I'm not seeing the distinction here. If its a 4k page and
> in the stack the driver will get overrun as well.
>
>> because it runs out of buffers. Why complicate things?
>
> It doesn't seem complex to me and the driver already handles this
> case so it actually makes the drivers simpler because there is only
> a single buffer management path.
>
>> packet per page approach is simple and effective.
>> virtio is different. there we don't have hw that needs
>> to have buffers ready for dma.
>>
>>> Looking at the Mellanox way of doing it I am not entirely sure it is
>>> useful.  It looks good for benchmarks but that is about it.  Also I
>>
>> it's the opposite. It already runs very nicely in production.
>> In real life it's always a combination of xdp_drop, xdp_tx and
>> xdp_pass actions.
>> Sounds like ixgbe wants to do things differently because
>> of not-invented-here. That new approach may turn
>> out to be good or bad, but why risk it?
>> mlx4 approach works.
>> mlx5 has few issues though, because page recycling
>> was done too simplistic. Generic page pool/recycling
>> that all drivers will use should solve that. I hope.
>> Is the proposal to have generic split-page recycler ?
>> How that is going to work?
>>
>
> No, just give the driver a page when it asks for it. How the
> driver uses the page is not the pools concern.
>
>>> don't see it extending out to the point that we would be able to
>>> exchange packets between interfaces which really seems like it should
>>> be the ultimate goal for XDP_TX.
>>
>> we don't have a use case for multi-port xdp_tx,
>> but I'm not objecting to doing it in general.
>> Just right now I don't see a need to complicate
>> drivers to do so.
>
> We are running our vswitch in userspace now for many workloads
> it would be nice to have these in kernel if possible.
>
>>
>>> It seems like eventually we want to be able to peel off the buffer and
>>> send it to something other than ourselves.  For example it seems like
>>> it might be useful at some point to use XDP to do traffic
>>> classification and have it route packets between multiple interfaces
>>> on a host and it wouldn't make sense to have all of them map every
>>> page as bidirectional because it starts becoming ridiculous if you
>>> have dozens of interfaces in a system.
>>
>> dozen interfaces? Like a single nic with dozen ports?
>> or many nics with many ports on the same system?
>> are you trying to build a switch out of x86?
>> I don't think it's realistic to have multi-terabit x86 box.
>> Is it all because of dpdk/6wind demos?
>> I saw how dpdk was bragging that they can saturate
>> pcie bus. So? Why is this useful?

Actually I was thinking more of an OVS, bridge, or routing
replacement.  Basically with a couple of physical interfaces and then
either veth and/or vhost interfaces.

>> Why anyone would care to put a bunch of nics
>> into x86 and demonstrate that bandwidth of pcie is now
>> a limiting factor ?
>
> Maybe Alex had something else in mind but we have many virtual interfaces
> plus physical interfaces in vswitch use case. Possibly thousands.

I was thinking about the fact that the Mellanox driver is currently
mapping pages as bidirectional, so I was sticking to the device to
device case in regards to that discussion.  For virtual interfaces we
don't even need the DMA mapping, it is just a copy to user space we
have to deal with in the case of vhost.  In that regard I was thinking
we need to start looking at taking XDP_TX one step further and
possibly look at supporting the transmit of an xdp_buf on an unrelated
netdev.  Although it looks like that means adding a netdev pointer to
xdp_buf in order to support returning that.

Anyway I am just running on conjecture at this point.  But it seems
like if we want to make XDP capable of doing transmit we should
support something other than bounce on the same port since that seems
like a "just saturate the bus" use case more than anything.  I suppose
you can do a one armed router, or have it do encap/decap for a tunnel,
but that is about the limits of it.  If we allow it to do transmit on
other netdevs then suddenly this has the potential to replace
significant existing infrastructure.

Sorry if I am stirring the hor

Re: [PATCH iproute2 net-next 0/3] iplink: add support for link xstats

2017-02-18 Thread Stephen Hemminger
On Wed, 15 Feb 2017 15:23:10 +0100
Nikolay Aleksandrov  wrote:

> Hi,
> This set adds support for printing link xstats per link type. Currently
> only the bridge and its ports support such call and it dumps the mcast
> stats. This model makes it easy to use the same callback for both bridge
> and bridge_slave link types. Patch 01 also updates the man page with the
> new xstats link option and you can find an example in patch 02's commit
> message.
> 
> Thanks,
>  Nik
> 
> 
> Nikolay Aleksandrov (3):
>   iplink: add support for xstats subcommand
>   iplink: bridge: add support for displaying xstats
>   iplink: bridge_slave: add support for displaying xstats
> 
>  ip/Makefile  |   2 +-
>  ip/ip_common.h   |  12 +++-
>  ip/iplink.c  |   5 ++
>  ip/iplink_bridge.c   | 153 +++
>  ip/iplink_bridge_slave.c |   2 +
>  ip/iplink_xstats.c   |  81 +
>  man/man8/ip-link.8.in|  12 
>  7 files changed, 264 insertions(+), 3 deletions(-)
>  create mode 100644 ip/iplink_xstats.c
> 

Applied thanks. There was some minor fuzz on first batch due to other changes.


Re: [PATCH iproute2] devlink: Call dl_free in early exit case

2017-02-18 Thread Stephen Hemminger
On Tue, 14 Feb 2017 07:29:38 +0200
Leon Romanovsky  wrote:

> From: Leon Romanovsky 
> 
> Prior to parsing command options, the devlink tool allocates memory
> to store results. In case of early exit (wrong parameters or version
> check), this memory wasn't freed.
> 
> Signed-off-by: Leon Romanovsky 
> Acked-by: Jiri Pirko 

Applied, thanks.


Re: [PATCH iproute2 1/1] man page: add page for skbmod action

2017-02-18 Thread Stephen Hemminger
On Fri, 10 Feb 2017 18:28:54 -0500
Lucas Bates  wrote:

> Signed-off-by: Lucas Bates 
> Signed-off-by: Jamal Hadi Salim 
> Signed-off-by: Roman Mashak 

Applied to master branch.


Re: [PATCH net v2] ipv6: release dst on error in ip6_dst_lookup_tail

2017-02-18 Thread Eric Dumazet
On Sat, 2017-02-18 at 19:00 -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn 
> 
> If ip6_dst_lookup_tail has acquired a dst and fails the IPv4-mapped
> check, release the dst before returning an error.
> 
> Fixes: ec5e3b0a1d41 ("ipv6: Inhibit IPv4-mapped src address on the wire.")
> Signed-off-by: Willem de Bruijn 
> ---
Acked-by: Eric Dumazet 




Re: [PATCH v2 iproute2 2/2] actions: Add support for user cookies

2017-02-18 Thread Stephen Hemminger
On Fri, 10 Feb 2017 06:25:44 -0500
Jamal Hadi Salim  wrote:

> From: Jamal Hadi Salim 
> 
> Make use of 128b user cookies
> 
> Introduce optional 128-bit action cookie.
> Like all other cookie schemes in the networking world (eg in protocols
> like http or existing kernel fib protocol field, etc) the idea is to
> save user state that when retrieved serves as a correlator. The kernel
> _should not_ interpret it. The user can store whatever they wish in the
> 128 bits.

The support for this has not been accepted upstream in net-next yet.
Therefore submitting it into iproute2 is premature.
Please resubmit when the kernel part (and uapi) are upstream.



Re: [PATCH v2 iproute2 1/2] utils: make hex2mem available to all users

2017-02-18 Thread Stephen Hemminger
On Fri, 10 Feb 2017 06:25:43 -0500
Jamal Hadi Salim  wrote:

> From: Jamal Hadi Salim 
> 
> hex2mem() api is useful for parsing hexstrings which are then packed in
> a stream of chars.
> 
> Signed-off-by: Jamal Hadi Salim 

I went ahead and applied this part since it makes sense to have it now.
The other part depends on TC_ACT_COOKIE which is not upstream in net-next yet.


Re: [PATCH iproute2/net-next repost 0/3] tc: flower: support masked ICMP code and type match

2017-02-18 Thread Stephen Hemminger
On Thu,  9 Feb 2017 14:48:58 +0100
Simon Horman  wrote:

> Hi,
> 
> this short series allows the tc tool to configure masked matches on the
> ICMP code and type. Unmasked matches are already supported by the tool.
> 
> This does not depend on any kernel changes as support for both masked and
> unmasked matches were added to the kernel at the same time.
> 
> Sample usage:
> 
> tc qdisc add dev eth0 ingress
> tc filter add dev eth0 protocol ipv6 parent : flower \
> indev eth0 ip_proto icmpv6 type 128/240 code 0 action drop
> 
> Reposting after breaking out of a larger patchset.
> 
> Simon Horman (3):
>   tc: flower: provide generic masked u8 parser helper
>   tc: flower: provide generic masked u8 print helper
>   tc: flower: support masked ICMP code and type match
> 
>  man/man8/tc-flower.8 |  16 --
>  tc/f_flower.c| 158 ++-
>  2 files changed, 119 insertions(+), 55 deletions(-)
> 

Applied to net-next


Re: [PATCH iproute2 0/4] ip vrf: updates to handle cgroup hiearchy and namespace nesting

2017-02-18 Thread Stephen Hemminger
On Thu, 16 Feb 2017 08:58:54 -0800
David Ahern  wrote:

> More updates to ip vrf for 4.10. Major changes: handle vrf in an existing
> cgroup hierarchy and handle vrf nesting in network namespaces.
> 
> Comparison of the netns in bpf code will be added once the kernel patch
> is accepted.
> 
> David Ahern (4):
>   ip vrf: Handle vrf in a cgroup hierarchy
>   ip netns: refactor netns_identify
>   ip vrf: Handle VRF nesting in namespace
>   ip vrf: Detect invalid vrf name in pids command
> 
>  ip/ip_common.h |   1 +
>  ip/ipnetns.c   |  47 +++-
>  ip/ipvrf.c | 222 ++---
>  3 files changed, 227 insertions(+), 43 deletions(-)
> 

Applied all four thanks


Re: [PATCH iproute2 1/2] testsuite: refactor kernel config search

2017-02-18 Thread Stephen Hemminger
On Wed, 15 Feb 2017 21:26:41 +
Asbjørn Sloth Tønnesen  wrote:

> Signed-off-by: Asbjørn Sloth Tønnesen 

Both applied thanks


Re: [PATCH iproute] tc: matchall: Print skip flags when dumping a filter

2017-02-18 Thread Stephen Hemminger
On Thu,  9 Feb 2017 15:10:14 +0200
Or Gerlitz  wrote:

> Print the skip flags when we dump a filter.
> 
> Signed-off-by: Or Gerlitz 
> Acked by: Yotam Gigi 

Applied, thanks




Re: [PATCH iproute2 v2] ip route: Make name of protocol 0 consistent

2017-02-18 Thread Stephen Hemminger
On Mon, 13 Feb 2017 12:21:53 -0800
David Ahern  wrote:

> iproute2 can inconsistently show the name of protocol 0 if a route with
> a custom protocol is added. For example:
>   dsa@cartman:~$ ip -6 ro ls table all | egrep 'proto none|proto unspec'
>   local ::1 dev lo  table local  proto none  metric 0  pref medium
>   local fe80::225:90ff:fecb:1c18 dev lo  table local  proto none  metric 0  
> pref medium
>   local fe80::92e2:baff:fe5c:da5d dev lo  table local  proto none  metric 0  
> pref medium
> 
> protocol 0 is pretty printed as "none". Add a route with a custom protocol:
>   dsa@cartman:~$ sudo ip -6 ro add  2001:db8:200::1/128 dev eth0 proto 123
> 
> And now display has switched from "none" to "unspec":
>   dsa@cartman:~$ ip -6 ro ls table all | egrep 'proto none|proto unspec'
>   local ::1 dev lo  table local  proto unspec  metric 0  pref medium
>   local fe80::225:90ff:fecb:1c18 dev lo  table local  proto unspec  metric 0  
> pref medium
>   local fe80::92e2:baff:fe5c:da5d dev lo  table local  proto unspec  metric 0 
>  pref medium
> 
> The rt_protos file has the id to name mapping as "unspec" while
> rtnl_rtprot_tab[0] has "none". The presence of a custom protocol id
> triggers reading the rt_protos file and overwriting the string in
> rtnl_rtprot_tab. All of this is logic from 2004 and earlier.
> 
> Update rtnl_rtprot_tab to "unspec" to match the enum value.
> 
> Signed-off-by: David Ahern 

Applied, thanks.


[PATCH net v2] ipv6: release dst on error in ip6_dst_lookup_tail

2017-02-18 Thread Willem de Bruijn
From: Willem de Bruijn 

If ip6_dst_lookup_tail has acquired a dst and fails the IPv4-mapped
check, release the dst before returning an error.

Fixes: ec5e3b0a1d41 ("ipv6: Inhibit IPv4-mapped src address on the wire.")
Signed-off-by: Willem de Bruijn 
---
 net/ipv6/ip6_output.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index e164684456df..7cebee58e55b 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1022,8 +1022,10 @@ static int ip6_dst_lookup_tail(struct net *net, const 
struct sock *sk,
}
 #endif
if (ipv6_addr_v4mapped(&fl6->saddr) &&
-   !(ipv6_addr_v4mapped(&fl6->daddr) || ipv6_addr_any(&fl6->daddr)))
-   return -EAFNOSUPPORT;
+   !(ipv6_addr_v4mapped(&fl6->daddr) || ipv6_addr_any(&fl6->daddr))) {
+   err = -EAFNOSUPPORT;
+   goto out_err_release;
+   }
 
return 0;
 
-- 
2.11.0.483.g087da7b7c-goog



Re: Questions on XDP

2017-02-18 Thread Alexei Starovoitov
On Sat, Feb 18, 2017 at 3:48 PM, John Fastabend
 wrote:
>
> We are running our vswitch in userspace now for many workloads
> it would be nice to have these in kernel if possible.
...
> Maybe Alex had something else in mind but we have many virtual interfaces
> plus physical interfaces in vswitch use case. Possibly thousands.

virtual interfaces towards many VMs is certainly a good use case
that we need to address.
we'd still need to copy the packet from memory of one vm into another,
right? so per packet allocation strategy for virtual interface can
be anything.

Sounds like you already have patches that do that?
Excellent. Please share.


Re: Questions on XDP

2017-02-18 Thread Eric Dumazet
On Sat, 2017-02-18 at 15:48 -0800, John Fastabend wrote:

> I'm not seeing the distinction here. If its a 4k page and
> in the stack the driver will get overrun as well.

Agree.

Using a full page per Ethernet frame does not change the attack vector.

It makes the attacker's job easier.




Re: [PATCH net] ipv6: release dst on error in ip6_dst_lookup_tail

2017-02-18 Thread Willem de Bruijn
> But looks like the commit sha1 was
> ec5e3b0a1d41fbda0cc33a45bc9e54e91d9d12c7
>
> So the correct tag would be
>
> Fixes: ec5e3b0a1d41 ("ipv6: Inhibit IPv4-mapped src address on the wire.")

Thanks for catching that! I'll send a v2 with the fixed commit.


Re: [PATCH net] ipv6: release dst on error in ip6_dst_lookup_tail

2017-02-18 Thread Eric Dumazet
On Sat, 2017-02-18 at 18:51 -0500, Willem de Bruijn wrote:
> From: Willem de Bruijn 
> 
> If ip6_dst_lookup_tail has acquired a dst and fails the IPv4-mapped
> check, release the dst before returning an error.
> 
> Fixes: b2e6f0c8dde5 ("ipv6: Inhibit IPv4-mapped src address on the wire.")
> Signed-off-by: Willem de Bruijn 
> ---

Acked-by: Eric Dumazet 

But looks like the commit sha1 was
ec5e3b0a1d41fbda0cc33a45bc9e54e91d9d12c7

So the correct tag would be

Fixes: ec5e3b0a1d41 ("ipv6: Inhibit IPv4-mapped src address on the wire.")



[PATCH net] ipv6: release dst on error in ip6_dst_lookup_tail

2017-02-18 Thread Willem de Bruijn
From: Willem de Bruijn 

If ip6_dst_lookup_tail has acquired a dst and fails the IPv4-mapped
check, release the dst before returning an error.

Fixes: b2e6f0c8dde5 ("ipv6: Inhibit IPv4-mapped src address on the wire.")
Signed-off-by: Willem de Bruijn 
---
 net/ipv6/ip6_output.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index e164684456df..7cebee58e55b 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1022,8 +1022,10 @@ static int ip6_dst_lookup_tail(struct net *net, const 
struct sock *sk,
}
 #endif
if (ipv6_addr_v4mapped(&fl6->saddr) &&
-   !(ipv6_addr_v4mapped(&fl6->daddr) || ipv6_addr_any(&fl6->daddr)))
-   return -EAFNOSUPPORT;
+   !(ipv6_addr_v4mapped(&fl6->daddr) || ipv6_addr_any(&fl6->daddr))) {
+   err = -EAFNOSUPPORT;
+   goto out_err_release;
+   }
 
return 0;
 
-- 
2.11.0.483.g087da7b7c-goog



Re: Questions on XDP

2017-02-18 Thread John Fastabend
On 17-02-18 03:31 PM, Alexei Starovoitov wrote:
> On Sat, Feb 18, 2017 at 10:18 AM, Alexander Duyck
>  wrote:
>>
>>> XDP_DROP does not require having one page per frame.
>>
>> Agreed.
> 
> why do you think so?
> xdp_drop is targeting ddos where in good case
> all traffic is passed up and in bad case
> most of the traffic is dropped, but good traffic still needs
> to be serviced by the layers after. Like other xdp
> programs and the stack.
> Say ixgbe+xdp goes with 2k per packet,
> very soon we will have a bunch of half pages
> sitting in the stack and other halfs requiring
> complex refcnting and making the actual
> ddos mitigation ineffective and forcing nic to drop packets

I'm not seeing the distinction here. If its a 4k page and
in the stack the driver will get overrun as well.

> because it runs out of buffers. Why complicate things?

It doesn't seem complex to me and the driver already handles this
case so it actually makes the drivers simpler because there is only
a single buffer management path.

> packet per page approach is simple and effective.
> virtio is different. there we don't have hw that needs
> to have buffers ready for dma.
> 
>> Looking at the Mellanox way of doing it I am not entirely sure it is
>> useful.  It looks good for benchmarks but that is about it.  Also I
> 
> it's the opposite. It already runs very nicely in production.
> In real life it's always a combination of xdp_drop, xdp_tx and
> xdp_pass actions.
> Sounds like ixgbe wants to do things differently because
> of not-invented-here. That new approach may turn
> out to be good or bad, but why risk it?
> mlx4 approach works.
> mlx5 has few issues though, because page recycling
> was done too simplistic. Generic page pool/recycling
> that all drivers will use should solve that. I hope.
> Is the proposal to have generic split-page recycler ?
> How that is going to work?
> 

No, just give the driver a page when it asks for it. How the
driver uses the page is not the pools concern.

>> don't see it extending out to the point that we would be able to
>> exchange packets between interfaces which really seems like it should
>> be the ultimate goal for XDP_TX.
> 
> we don't have a use case for multi-port xdp_tx,
> but I'm not objecting to doing it in general.
> Just right now I don't see a need to complicate
> drivers to do so.

We are running our vswitch in userspace now for many workloads;
it would be nice to have these in kernel if possible.

> 
>> It seems like eventually we want to be able to peel off the buffer and
>> send it to something other than ourselves.  For example it seems like
>> it might be useful at some point to use XDP to do traffic
>> classification and have it route packets between multiple interfaces
>> on a host and it wouldn't make sense to have all of them map every
>> page as bidirectional because it starts becoming ridiculous if you
>> have dozens of interfaces in a system.
> 
> dozen interfaces? Like a single nic with dozen ports?
> or many nics with many ports on the same system?
> are you trying to build a switch out of x86?
> I don't think it's realistic to have a multi-terabit x86 box.
> Is it all because of dpdk/6wind demos?
> I saw how dpdk was bragging that they can saturate
> pcie bus. So? Why is this useful?
> Why anyone would care to put a bunch of nics
> into x86 and demonstrate that bandwidth of pcie is now
> a limiting factor ?

Maybe Alex had something else in mind but we have many virtual interfaces
plus physical interfaces in vswitch use case. Possibly thousands.

> 
>> Also as far as the one page per frame it occurs to me that you will
>> have to eventually deal with things like frame replication.
> 
> ... only in cases where one needs to demo a multi-port
> bridge with lots of nics in one x86 box.
> I don't see practicality of such setup and I think
> that copying full page every time xdp needs to
> broadcast is preferred vs doing atomic refcnting
> that will slow down the main case. broadcast is slow path.
> 
> My strong belief is that xdp should not care about
> niche architectures. It was never meant to be a solution
> for everyone and for all use cases.
> If xdp sucks on powerpc, so be it.
> cpus with 64k pages are doomed. We should
> not sacrifice performance on x86 because of ppc.
> I think it was a mistake that ixgbe chose to do that
> in the past, when mb()s were added because
> of powerpc, and it took years to introduce dma_mb()
> and return performance to good levels.
> btw, dma_mb work was awesome.
> In xdp I don't want to make such trade-offs.
> Really only x86 and arm64 archs matter today.
> Everything else is best effort.
> 



[PATCH] fsl/fman: fix spelling mistake in variable name en_tsu_err_exeption

2017-02-18 Thread Colin King
From: Colin Ian King 

trivial fix to spelling mistake, en_tsu_err_exeption should
be en_tsu_err_exception

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/freescale/fman/fman_dtsec.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fman/fman_dtsec.c 
b/drivers/net/ethernet/freescale/fman/fman_dtsec.c
index c88918c..84ea130 100644
--- a/drivers/net/ethernet/freescale/fman/fman_dtsec.c
+++ b/drivers/net/ethernet/freescale/fman/fman_dtsec.c
@@ -337,7 +337,7 @@ struct fman_mac {
u8 mac_id;
u32 exceptions;
bool ptp_tsu_enabled;
-   bool en_tsu_err_exeption;
+   bool en_tsu_err_exception;
struct dtsec_cfg *dtsec_drv_param;
void *fm;
struct fman_rev_info fm_rev_info;
@@ -1247,12 +1247,12 @@ int dtsec_set_exception(struct fman_mac *dtsec,
switch (exception) {
case FM_MAC_EX_1G_1588_TS_RX_ERR:
if (enable) {
-   dtsec->en_tsu_err_exeption = true;
+   dtsec->en_tsu_err_exception = true;
iowrite32be(ioread32be(®s->tmr_pemask) |
TMR_PEMASK_TSREEN,
®s->tmr_pemask);
} else {
-   dtsec->en_tsu_err_exeption = false;
+   dtsec->en_tsu_err_exception = false;
iowrite32be(ioread32be(®s->tmr_pemask) &
~TMR_PEMASK_TSREEN,
®s->tmr_pemask);
@@ -1420,7 +1420,7 @@ struct fman_mac *dtsec_config(struct fman_mac_params 
*params)
dtsec->event_cb = params->event_cb;
dtsec->dev_id = params->dev_id;
dtsec->ptp_tsu_enabled = dtsec->dtsec_drv_param->ptp_tsu_en;
-   dtsec->en_tsu_err_exeption = dtsec->dtsec_drv_param->ptp_exception_en;
+   dtsec->en_tsu_err_exception = dtsec->dtsec_drv_param->ptp_exception_en;
 
dtsec->fm = params->fm;
dtsec->basex_if = params->basex_if;
-- 
2.10.2



Re: Questions on XDP

2017-02-18 Thread John Fastabend
On 17-02-18 10:18 AM, Alexander Duyck wrote:
> On Sat, Feb 18, 2017 at 9:41 AM, Eric Dumazet  wrote:
>> On Sat, 2017-02-18 at 17:34 +0100, Jesper Dangaard Brouer wrote:
>>> On Thu, 16 Feb 2017 14:36:41 -0800
>>> John Fastabend  wrote:
>>>
 On 17-02-16 12:41 PM, Alexander Duyck wrote:
> So I'm in the process of working on enabling XDP for the Intel NICs
> and I had a few questions so I just thought I would put them out here
> to try and get everything sorted before I paint myself into a corner.
>
> So my first question is why does the documentation mention 1 frame per
> page for XDP?
>>>
>>> Yes, XDP defines upfront a memory model where there is only one packet
>>> per page[1], please respect that!
>>>
>>> This is currently used/needed for fast-direct recycling of pages inside
>>> the driver for XDP_DROP and XDP_TX, _without_ performing any atomic
>>> refcnt operations on the page. E.g. see mlx4_en_rx_recycle().

Alex, does your pagecnt_bias trick resolve this? It seems to me that the
recycling is working in ixgbe patches just fine (at least I never see the
allocator being triggered with simple XDP programs). The biggest win for
me right now is to avoid the dma mapping operations.

>>
>>
>> XDP_DROP does not require having one page per frame.
> 
> Agreed.
> 
>> (Look after my recent mlx4 patch series if you need to be convinced)
>>
>> Only XDP_TX is.

I'm still not sure what page per packet buys us on XDP_TX. What was the
explanation again?

>>
>> This requirement makes XDP useless (very OOM likely) on arches with 64K
>> pages.
> 
> Actually I have been having a side discussion with John about XDP_TX.
> Looking at the Mellanox way of doing it I am not entirely sure it is
> useful.  It looks good for benchmarks but that is about it.  Also I
> don't see it extending out to the point that we would be able to
> exchange packets between interfaces which really seems like it should
> be the ultimate goal for XDP_TX.

This is needed if we want XDP to be used for vswitch use cases. We have
a patch running on virtio but really need to get it working on real
hardware before we push it.

> 
> It seems like eventually we want to be able to peel off the buffer and
> send it to something other than ourselves.  For example it seems like
> it might be useful at some point to use XDP to do traffic
> classification and have it route packets between multiple interfaces
> on a host and it wouldn't make sense to have all of them map every
> page as bidirectional because it starts becoming ridiculous if you
> have dozens of interfaces in a system.
> 
> As per our original discussion at netconf if we want to be able to do
> XDP Tx with a fully lockless Tx ring we needed to have a Tx ring per
> CPU that is performing XDP.  The Tx path will end up needing to do the
> map/unmap itself in the case of physical devices but the expense of
> that can be somewhat mitigated on x86 at least by either disabling the
> IOMMU or using identity mapping.  I think this might be the route
> worth exploring as we could then start looking at doing things like
> implementing bridges and routers in XDP and see what performance gains
> can be had there.

One issue I have with TX ring per CPU per device is that in my current use
case I have 2k tap/vhost devices and need to scale up to more than that.
Taking the naive approach and making each tap/vhost create a per cpu
ring would be 128k rings on my current dev box. I think locking could
be optional without too much difficulty.

> 
> Also as far as the one page per frame it occurs to me that you will
> have to eventually deal with things like frame replication.  Once that
> comes into play everything becomes much more difficult because the
> recycling doesn't work without some sort of reference counting, and
> since the device interrupt can migrate you could end up with clean-up
> occurring on a different CPUs so you need to have some sort of
> synchronization mechanism.
> 
> Thanks.
> 
> - Alex
> 



Re: Questions on XDP

2017-02-18 Thread Alexei Starovoitov
On Sat, Feb 18, 2017 at 10:18 AM, Alexander Duyck
 wrote:
>
>> XDP_DROP does not require having one page per frame.
>
> Agreed.

why do you think so?
xdp_drop is targeting ddos where in good case
all traffic is passed up and in bad case
most of the traffic is dropped, but good traffic still needs
to be serviced by the layers after. Like other xdp
programs and the stack.
Say ixgbe+xdp goes with 2k per packet,
very soon we will have a bunch of half pages
sitting in the stack and other halfs requiring
complex refcnting and making the actual
ddos mitigation ineffective and forcing nic to drop packets
because it runs out of buffers. Why complicate things?
packet per page approach is simple and effective.
virtio is different. there we don't have hw that needs
to have buffers ready for dma.

> Looking at the Mellanox way of doing it I am not entirely sure it is
> useful.  It looks good for benchmarks but that is about it.  Also I

it's the opposite. It already runs very nicely in production.
In real life it's always a combination of xdp_drop, xdp_tx and
xdp_pass actions.
Sounds like ixgbe wants to do things differently because
of not-invented-here. That new approach may turn
out to be good or bad, but why risk it?
mlx4 approach works.
mlx5 has few issues though, because page recycling
was done too simplistic. Generic page pool/recycling
that all drivers will use should solve that. I hope.
Is the proposal to have generic split-page recycler ?
How that is going to work?

> don't see it extending out to the point that we would be able to
> exchange packets between interfaces which really seems like it should
> be the ultimate goal for XDP_TX.

we don't have a use case for multi-port xdp_tx,
but I'm not objecting to doing it in general.
Just right now I don't see a need to complicate
drivers to do so.

> It seems like eventually we want to be able to peel off the buffer and
> send it to something other than ourselves.  For example it seems like
> it might be useful at some point to use XDP to do traffic
> classification and have it route packets between multiple interfaces
> on a host and it wouldn't make sense to have all of them map every
> page as bidirectional because it starts becoming ridiculous if you
> have dozens of interfaces in a system.

dozen interfaces? Like a single nic with dozen ports?
or many nics with many ports on the same system?
are you trying to build a switch out of x86?
I don't think it's realistic to have a multi-terabit x86 box.
Is it all because of dpdk/6wind demos?
I saw how dpdk was bragging that they can saturate
pcie bus. So? Why is this useful?
Why anyone would care to put a bunch of nics
into x86 and demonstrate that bandwidth of pcie is now
a limiting factor ?

> Also as far as the one page per frame it occurs to me that you will
> have to eventually deal with things like frame replication.

... only in cases where one needs to demo a multi-port
bridge with lots of nics in one x86 box.
I don't see practicality of such setup and I think
that copying full page every time xdp needs to
broadcast is preferred vs doing atomic refcnting
that will slow down the main case. broadcast is slow path.

My strong belief is that xdp should not care about
niche architectures. It was never meant to be a solution
for everyone and for all use cases.
If xdp sucks on powerpc, so be it.
cpus with 64k pages are doomed. We should
not sacrifice performance on x86 because of ppc.
I think it was a mistake that ixgbe chose to do that
in the past, when mb()s were added because
of powerpc, and it took years to introduce dma_mb()
and return performance to good levels.
btw, dma_mb work was awesome.
In xdp I don't want to make such trade-offs.
Really only x86 and arm64 archs matter today.
Everything else is best effort.


[PATCH] net: qlogic: qla3xxx: use new api ethtool_{get|set}_link_ksettings

2017-02-18 Thread Philippe Reynes
The ethtool api {get|set}_settings is deprecated.
We move this driver to new api {get|set}_link_ksettings.

As I don't have the hardware, I'd be very pleased if
someone could test this patch.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/qlogic/qla3xxx.c |   29 ++---
 1 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/qlogic/qla3xxx.c 
b/drivers/net/ethernet/qlogic/qla3xxx.c
index ea38236..2991179 100644
--- a/drivers/net/ethernet/qlogic/qla3xxx.c
+++ b/drivers/net/ethernet/qlogic/qla3xxx.c
@@ -1707,23 +1707,30 @@ static int ql_get_full_dup(struct ql3_adapter *qdev)
return status;
 }
 
-static int ql_get_settings(struct net_device *ndev, struct ethtool_cmd *ecmd)
+static int ql_get_link_ksettings(struct net_device *ndev,
+struct ethtool_link_ksettings *cmd)
 {
struct ql3_adapter *qdev = netdev_priv(ndev);
+   u32 supported, advertising;
 
-   ecmd->transceiver = XCVR_INTERNAL;
-   ecmd->supported = ql_supported_modes(qdev);
+   supported = ql_supported_modes(qdev);
 
if (test_bit(QL_LINK_OPTICAL, &qdev->flags)) {
-   ecmd->port = PORT_FIBRE;
+   cmd->base.port = PORT_FIBRE;
} else {
-   ecmd->port = PORT_TP;
-   ecmd->phy_address = qdev->PHYAddr;
+   cmd->base.port = PORT_TP;
+   cmd->base.phy_address = qdev->PHYAddr;
}
-   ecmd->advertising = ql_supported_modes(qdev);
-   ecmd->autoneg = ql_get_auto_cfg_status(qdev);
-   ethtool_cmd_speed_set(ecmd, ql_get_speed(qdev));
-   ecmd->duplex = ql_get_full_dup(qdev);
+   advertising = ql_supported_modes(qdev);
+   cmd->base.autoneg = ql_get_auto_cfg_status(qdev);
+   cmd->base.speed = ql_get_speed(qdev);
+   cmd->base.duplex = ql_get_full_dup(qdev);
+
+   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+   supported);
+   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising,
+   advertising);
+
return 0;
 }
 
@@ -1769,12 +1776,12 @@ static void ql_get_pauseparam(struct net_device *ndev,
 }
 
 static const struct ethtool_ops ql3xxx_ethtool_ops = {
-   .get_settings = ql_get_settings,
.get_drvinfo = ql_get_drvinfo,
.get_link = ethtool_op_get_link,
.get_msglevel = ql_get_msglevel,
.set_msglevel = ql_set_msglevel,
.get_pauseparam = ql_get_pauseparam,
+   .get_link_ksettings = ql_get_link_ksettings,
 };
 
 static int ql_populate_free_queue(struct ql3_adapter *qdev)
-- 
1.7.4.4



[patch] sunrpc: silence uninitialized variable warning

2017-02-18 Thread Dan Carpenter
kstrtouint() can return a couple different error codes so the check for
"ret == -EINVAL" is wrong and static analysis tools correctly complain
that we can use "num" without initializing it.  It's not super harmful
because we check the bounds.  But it's also easy enough to fix.

Signed-off-by: Dan Carpenter 

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 956c7bce80d1..311ce92384b7 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -3209,7 +3209,9 @@ static int param_set_uint_minmax(const char *val,
if (!val)
return -EINVAL;
ret = kstrtouint(val, 0, &num);
-   if (ret == -EINVAL || num < min || num > max)
+   if (ret)
+   return ret;
+   if (num < min || num > max)
return -EINVAL;
*((unsigned int *)kp->arg) = num;
return 0;


Re: [RFC v2 11/20] scsi: megaraid: Replace PCI pool old API

2017-02-18 Thread Peter Senna Tschudin
On Sat, Feb 18, 2017 at 09:35:47AM +0100, Romain Perier wrote:
> The PCI pool API is deprecated. This commits replaces the PCI pool old
> API by the appropriated function with the DMA pool API.

Did not apply on linux-next-20170217


> 
> Signed-off-by: Romain Perier 
> ---
>  drivers/scsi/megaraid/megaraid_mbox.c   | 30 -
>  drivers/scsi/megaraid/megaraid_mm.c | 29 
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 25 +++---
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 51 
> +++--
>  4 files changed, 70 insertions(+), 65 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_mbox.c 
> b/drivers/scsi/megaraid/megaraid_mbox.c
> index f0987f2..6d0bd3a 100644
> --- a/drivers/scsi/megaraid/megaraid_mbox.c
> +++ b/drivers/scsi/megaraid/megaraid_mbox.c
> @@ -1153,8 +1153,8 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>  
>  
>   // Allocate memory for 16-bytes aligned mailboxes
> - raid_dev->mbox_pool_handle = pci_pool_create("megaraid mbox pool",
> - adapter->pdev,
> + raid_dev->mbox_pool_handle = dma_pool_create("megaraid mbox pool",
> + &adapter->pdev->dev,
>   sizeof(mbox64_t) + 16,
>   16, 0);
>  
> @@ -1164,7 +1164,7 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>  
>   mbox_pci_blk = raid_dev->mbox_pool;
>   for (i = 0; i < MBOX_MAX_SCSI_CMDS; i++) {
> - mbox_pci_blk[i].vaddr = pci_pool_alloc(
> + mbox_pci_blk[i].vaddr = dma_pool_alloc(
>   raid_dev->mbox_pool_handle,
>   GFP_KERNEL,
>   &mbox_pci_blk[i].dma_addr);
> @@ -1181,8 +1181,8 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>* share common memory pool. Passthru structures piggyback on memory
>* allocted to extended passthru since passthru is smaller of the two
>*/
> - raid_dev->epthru_pool_handle = pci_pool_create("megaraid mbox pthru",
> - adapter->pdev, sizeof(mraid_epassthru_t), 128, 0);
> + raid_dev->epthru_pool_handle = dma_pool_create("megaraid mbox pthru",
> + &adapter->pdev->dev, sizeof(mraid_epassthru_t), 128, 0);
>  
>   if (raid_dev->epthru_pool_handle == NULL) {
>   goto fail_setup_dma_pool;
> @@ -1190,7 +1190,7 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>  
>   epthru_pci_blk = raid_dev->epthru_pool;
>   for (i = 0; i < MBOX_MAX_SCSI_CMDS; i++) {
> - epthru_pci_blk[i].vaddr = pci_pool_alloc(
> + epthru_pci_blk[i].vaddr = dma_pool_alloc(
>   raid_dev->epthru_pool_handle,
>   GFP_KERNEL,
>   &epthru_pci_blk[i].dma_addr);
> @@ -1202,8 +1202,8 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>  
>   // Allocate memory for each scatter-gather list. Request for 512 bytes
>   // alignment for each sg list
> - raid_dev->sg_pool_handle = pci_pool_create("megaraid mbox sg",
> - adapter->pdev,
> + raid_dev->sg_pool_handle = dma_pool_create("megaraid mbox sg",
> + &adapter->pdev->dev,
>   sizeof(mbox_sgl64) * MBOX_MAX_SG_SIZE,
>   512, 0);
>  
> @@ -1213,7 +1213,7 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>  
>   sg_pci_blk = raid_dev->sg_pool;
>   for (i = 0; i < MBOX_MAX_SCSI_CMDS; i++) {
> - sg_pci_blk[i].vaddr = pci_pool_alloc(
> + sg_pci_blk[i].vaddr = dma_pool_alloc(
>   raid_dev->sg_pool_handle,
>   GFP_KERNEL,
>   &sg_pci_blk[i].dma_addr);
> @@ -1249,29 +1249,29 @@ megaraid_mbox_teardown_dma_pools(adapter_t *adapter)
>  
>   sg_pci_blk = raid_dev->sg_pool;
>   for (i = 0; i < MBOX_MAX_SCSI_CMDS && sg_pci_blk[i].vaddr; i++) {
> - pci_pool_free(raid_dev->sg_pool_handle, sg_pci_blk[i].vaddr,
> + dma_pool_free(raid_dev->sg_pool_handle, sg_pci_blk[i].vaddr,
>   sg_pci_blk[i].dma_addr);
>   }
>   if (raid_dev->sg_pool_handle)
> - pci_pool_destroy(raid_dev->sg_pool_handle);
> + dma_pool_destroy(raid_dev->sg_pool_handle);
>  
>  
>   epthru_pci_blk = raid_dev->epthru_pool;
>   for (i = 0; i < MBOX_MAX_SCSI_CMDS && epthru_pci_blk[i].vaddr; i++) {
> - pci_pool_free(raid_dev->epthru_pool_handle,
> + dma_pool_free(raid_dev->epthru_pool_handle,
>   epthru_pci_blk[i].vaddr, epthr

Re: [RFC v2 00/20] Replace PCI pool by DMA pool API

2017-02-18 Thread Peter Senna Tschudin
On Sat, Feb 18, 2017 at 09:35:36AM +0100, Romain Perier wrote:

Tested all patches by compilation and checkpatch. All of them compile
fine, but patches 11 and 12 need some fixes. You can resend as
PATCH instead of RFC.

> The current PCI pool API are simple macro functions direct expanded to
> the appropriated dma pool functions. The prototypes are almost the same
> and semantically, they are very similar. I propose to use the DMA pool
> API directly and get rid of the old API.
> 
> This set of patches, replaces the old API by the dma pool API, adds
> support to warn about this old API in checkpath.pl and remove the
> defines.
> 
> Changes in v2:
> - Introduced patch 18/20
> - Fixed cosmetic changes: spaces before brace, lines over 80 characters
> - Removed some of the check for NULL pointers before calling dma_pool_destroy
> - Improved the regexp in checkpatch for pci_pool, thanks to Joe Perches
> - Added Tested-by and Acked-by tags
> 
> Romain Perier (20):
>   block: DAC960: Replace PCI pool old API
>   dmaengine: pch_dma: Replace PCI pool old API
>   IB/mthca: Replace PCI pool old API
>   net: e100: Replace PCI pool old API
>   mlx4: Replace PCI pool old API
>   mlx5: Replace PCI pool old API
>   wireless: ipw2200: Replace PCI pool old API
>   scsi: be2iscsi: Replace PCI pool old API
>   scsi: csiostor: Replace PCI pool old API
>   scsi: lpfc: Replace PCI pool old API
>   scsi: megaraid: Replace PCI pool old API
>   scsi: mpt3sas: Replace PCI pool old API
>   scsi: mvsas: Replace PCI pool old API
>   scsi: pmcraid: Replace PCI pool old API
>   usb: gadget: amd5536udc: Replace PCI pool old API
>   usb: gadget: net2280: Replace PCI pool old API
>   usb: gadget: pch_udc: Replace PCI pool old API
>   usb: host: Remove remaining pci_pool in comments
>   PCI: Remove PCI pool macro functions
>   checkpatch: warn for use of old PCI pool API
> 
>  drivers/block/DAC960.c| 36 ++---
>  drivers/block/DAC960.h|  4 +-
>  drivers/dma/pch_dma.c | 12 ++---
>  drivers/infiniband/hw/mthca/mthca_av.c| 10 ++--
>  drivers/infiniband/hw/mthca/mthca_cmd.c   |  8 +--
>  drivers/infiniband/hw/mthca/mthca_dev.h   |  4 +-
>  drivers/net/ethernet/intel/e100.c | 12 ++---
>  drivers/net/ethernet/mellanox/mlx4/cmd.c  | 10 ++--
>  drivers/net/ethernet/mellanox/mlx4/mlx4.h |  2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 11 ++--
>  drivers/net/wireless/intel/ipw2x00/ipw2200.c  | 13 ++---
>  drivers/scsi/be2iscsi/be_iscsi.c  |  6 +--
>  drivers/scsi/be2iscsi/be_main.c   |  6 +--
>  drivers/scsi/be2iscsi/be_main.h   |  2 +-
>  drivers/scsi/csiostor/csio_hw.h   |  2 +-
>  drivers/scsi/csiostor/csio_init.c | 11 ++--
>  drivers/scsi/csiostor/csio_scsi.c |  6 +--
>  drivers/scsi/lpfc/lpfc.h  | 10 ++--
>  drivers/scsi/lpfc/lpfc_init.c |  6 +--
>  drivers/scsi/lpfc/lpfc_mem.c  | 73 
> +--
>  drivers/scsi/lpfc/lpfc_scsi.c | 12 ++---
>  drivers/scsi/megaraid/megaraid_mbox.c | 30 +--
>  drivers/scsi/megaraid/megaraid_mm.c   | 29 ++-
>  drivers/scsi/megaraid/megaraid_sas_base.c | 25 -
>  drivers/scsi/megaraid/megaraid_sas_fusion.c   | 51 ++-
>  drivers/scsi/mpt3sas/mpt3sas_base.c   | 73 
> +--
>  drivers/scsi/mvsas/mv_init.c  |  6 +--
>  drivers/scsi/mvsas/mv_sas.c   |  6 +--
>  drivers/scsi/pmcraid.c| 10 ++--
>  drivers/scsi/pmcraid.h|  2 +-
>  drivers/usb/gadget/udc/amd5536udc.c   |  8 +--
>  drivers/usb/gadget/udc/amd5536udc.h   |  4 +-
>  drivers/usb/gadget/udc/net2280.c  | 12 ++---
>  drivers/usb/gadget/udc/net2280.h  |  2 +-
>  drivers/usb/gadget/udc/pch_udc.c  | 31 ++--
>  drivers/usb/host/ehci-hcd.c   |  2 +-
>  drivers/usb/host/fotg210-hcd.c|  2 +-
>  drivers/usb/host/oxu210hp-hcd.c   |  2 +-
>  include/linux/mlx5/driver.h   |  2 +-
>  include/linux/pci.h   |  9 
>  scripts/checkpatch.pl |  9 +++-
>  41 files changed, 284 insertions(+), 287 deletions(-)
> 
> -- 
> 2.9.3
> 


Re: [RFC v2 12/20] scsi: mpt3sas: Replace PCI pool old API

2017-02-18 Thread Peter Senna Tschudin
On Sat, Feb 18, 2017 at 09:35:48AM +0100, Romain Perier wrote:
> The PCI pool API is deprecated. This commits replaces the PCI pool old
> API by the appropriated function with the DMA pool API.

Please run checkpatch, fix the style issue and resend.

> 
> Signed-off-by: Romain Perier 
> ---
>  drivers/scsi/mpt3sas/mpt3sas_base.c | 73 
> +
>  1 file changed, 34 insertions(+), 39 deletions(-)
> 
> diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c 
> b/drivers/scsi/mpt3sas/mpt3sas_base.c
> index a3fe1fb..3c2206d 100644
> --- a/drivers/scsi/mpt3sas/mpt3sas_base.c
> +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
> @@ -3210,9 +3210,8 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
>   }
>  
>   if (ioc->sense) {
> - pci_pool_free(ioc->sense_dma_pool, ioc->sense, ioc->sense_dma);
> - if (ioc->sense_dma_pool)
> - pci_pool_destroy(ioc->sense_dma_pool);
> + dma_pool_free(ioc->sense_dma_pool, ioc->sense, ioc->sense_dma);
> + dma_pool_destroy(ioc->sense_dma_pool);
>   dexitprintk(ioc, pr_info(MPT3SAS_FMT
>   "sense_pool(0x%p): free\n",
>   ioc->name, ioc->sense));
> @@ -3220,9 +3219,8 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
>   }
>  
>   if (ioc->reply) {
> - pci_pool_free(ioc->reply_dma_pool, ioc->reply, ioc->reply_dma);
> - if (ioc->reply_dma_pool)
> - pci_pool_destroy(ioc->reply_dma_pool);
> + dma_pool_free(ioc->reply_dma_pool, ioc->reply, ioc->reply_dma);
> + dma_pool_destroy(ioc->reply_dma_pool);
>   dexitprintk(ioc, pr_info(MPT3SAS_FMT
>   "reply_pool(0x%p): free\n",
>   ioc->name, ioc->reply));
> @@ -3230,10 +3228,9 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
>   }
>  
>   if (ioc->reply_free) {
> - pci_pool_free(ioc->reply_free_dma_pool, ioc->reply_free,
> + dma_pool_free(ioc->reply_free_dma_pool, ioc->reply_free,
>   ioc->reply_free_dma);
> - if (ioc->reply_free_dma_pool)
> - pci_pool_destroy(ioc->reply_free_dma_pool);
> + dma_pool_destroy(ioc->reply_free_dma_pool);
>   dexitprintk(ioc, pr_info(MPT3SAS_FMT
>   "reply_free_pool(0x%p): free\n",
>   ioc->name, ioc->reply_free));
> @@ -3244,7 +3241,7 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
>   do {
>   rps = &ioc->reply_post[i];
>   if (rps->reply_post_free) {
> - pci_pool_free(
> + dma_pool_free(
>   ioc->reply_post_free_dma_pool,
>   rps->reply_post_free,
>   rps->reply_post_free_dma);
> @@ -3256,8 +3253,7 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER *ioc)
>   } while (ioc->rdpq_array_enable &&
>  (++i < ioc->reply_queue_count));
>  
> - if (ioc->reply_post_free_dma_pool)
> - pci_pool_destroy(ioc->reply_post_free_dma_pool);
> + dma_pool_destroy(ioc->reply_post_free_dma_pool);
>   kfree(ioc->reply_post);
>   }
>  
> @@ -3278,12 +3274,11 @@ _base_release_memory_pools(struct MPT3SAS_ADAPTER 
> *ioc)
>   if (ioc->chain_lookup) {
>   for (i = 0; i < ioc->chain_depth; i++) {
>   if (ioc->chain_lookup[i].chain_buffer)
> - pci_pool_free(ioc->chain_dma_pool,
> + dma_pool_free(ioc->chain_dma_pool,
>   ioc->chain_lookup[i].chain_buffer,
>   ioc->chain_lookup[i].chain_buffer_dma);
>   }
> - if (ioc->chain_dma_pool)
> - pci_pool_destroy(ioc->chain_dma_pool);
> + dma_pool_destroy(ioc->chain_dma_pool);
>   free_pages((ulong)ioc->chain_lookup, ioc->chain_pages);
>   ioc->chain_lookup = NULL;
>   }
> @@ -3458,23 +3453,23 @@ _base_allocate_memory_pools(struct MPT3SAS_ADAPTER 
> *ioc)
>   ioc->name);
>   goto out;
>   }
> - ioc->reply_post_free_dma_pool = pci_pool_create("reply_post_free pool",
> - ioc->pdev, sz, 16, 0);
> + ioc->reply_post_free_dma_pool = dma_pool_create("reply_post_free pool",
> + &ioc->pdev->dev, sz, 16, 0);
>   if (!ioc->reply_post_free_dma_pool) {
>   pr_err(MPT3SAS_FMT
> -  "reply_post_free pool: pci_pool_create failed\n",
> +  "reply_post_free pool: dma_pool_create failed\n",
>ioc->name);
>   goto out;
>   }
>   i = 0;
>   do {
>   ioc->reply_post[i].reply_post_free =
> -  

Re: [RFC v2 11/20] scsi: megaraid: Replace PCI pool old API

2017-02-18 Thread Peter Senna Tschudin
On Sat, Feb 18, 2017 at 09:35:47AM +0100, Romain Perier wrote:

Hi Romain,

Checkpatch reports some warnings you can fix, related to redundant NULL tests
before dma_pool_destroy(), and you changed the indentation style in some of
your changes. Sometimes it is important to keep consistency within a file
even if its style is not the default. Please fix and resend.


> The PCI pool API is deprecated. This commit replaces the old PCI pool
> API with the appropriate functions from the DMA pool API.
> 
> Signed-off-by: Romain Perier 
> ---
>  drivers/scsi/megaraid/megaraid_mbox.c   | 30 -
>  drivers/scsi/megaraid/megaraid_mm.c | 29 
>  drivers/scsi/megaraid/megaraid_sas_base.c   | 25 +++---
>  drivers/scsi/megaraid/megaraid_sas_fusion.c | 51 
> +++--
>  4 files changed, 70 insertions(+), 65 deletions(-)
> 
> diff --git a/drivers/scsi/megaraid/megaraid_mbox.c 
> b/drivers/scsi/megaraid/megaraid_mbox.c
> index f0987f2..6d0bd3a 100644
> --- a/drivers/scsi/megaraid/megaraid_mbox.c
> +++ b/drivers/scsi/megaraid/megaraid_mbox.c
> @@ -1153,8 +1153,8 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>  
>  
>   // Allocate memory for 16-bytes aligned mailboxes
> - raid_dev->mbox_pool_handle = pci_pool_create("megaraid mbox pool",
> - adapter->pdev,
> + raid_dev->mbox_pool_handle = dma_pool_create("megaraid mbox pool",
> + &adapter->pdev->dev,
>   sizeof(mbox64_t) + 16,
>   16, 0);
>  
> @@ -1164,7 +1164,7 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>  
>   mbox_pci_blk = raid_dev->mbox_pool;
>   for (i = 0; i < MBOX_MAX_SCSI_CMDS; i++) {
> - mbox_pci_blk[i].vaddr = pci_pool_alloc(
> + mbox_pci_blk[i].vaddr = dma_pool_alloc(
>   raid_dev->mbox_pool_handle,
>   GFP_KERNEL,
>   &mbox_pci_blk[i].dma_addr);
> @@ -1181,8 +1181,8 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>* share common memory pool. Passthru structures piggyback on memory
>* allocted to extended passthru since passthru is smaller of the two
>*/
> - raid_dev->epthru_pool_handle = pci_pool_create("megaraid mbox pthru",
> - adapter->pdev, sizeof(mraid_epassthru_t), 128, 0);
> + raid_dev->epthru_pool_handle = dma_pool_create("megaraid mbox pthru",
> + &adapter->pdev->dev, sizeof(mraid_epassthru_t), 128, 0);
>  
>   if (raid_dev->epthru_pool_handle == NULL) {
>   goto fail_setup_dma_pool;
> @@ -1190,7 +1190,7 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>  
>   epthru_pci_blk = raid_dev->epthru_pool;
>   for (i = 0; i < MBOX_MAX_SCSI_CMDS; i++) {
> - epthru_pci_blk[i].vaddr = pci_pool_alloc(
> + epthru_pci_blk[i].vaddr = dma_pool_alloc(
>   raid_dev->epthru_pool_handle,
>   GFP_KERNEL,
>   &epthru_pci_blk[i].dma_addr);
> @@ -1202,8 +1202,8 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>  
>   // Allocate memory for each scatter-gather list. Request for 512 bytes
>   // alignment for each sg list
> - raid_dev->sg_pool_handle = pci_pool_create("megaraid mbox sg",
> - adapter->pdev,
> + raid_dev->sg_pool_handle = dma_pool_create("megaraid mbox sg",
> + &adapter->pdev->dev,
>   sizeof(mbox_sgl64) * MBOX_MAX_SG_SIZE,
>   512, 0);
>  
> @@ -1213,7 +1213,7 @@ megaraid_mbox_setup_dma_pools(adapter_t *adapter)
>  
>   sg_pci_blk = raid_dev->sg_pool;
>   for (i = 0; i < MBOX_MAX_SCSI_CMDS; i++) {
> - sg_pci_blk[i].vaddr = pci_pool_alloc(
> + sg_pci_blk[i].vaddr = dma_pool_alloc(
>   raid_dev->sg_pool_handle,
>   GFP_KERNEL,
>   &sg_pci_blk[i].dma_addr);
> @@ -1249,29 +1249,29 @@ megaraid_mbox_teardown_dma_pools(adapter_t *adapter)
>  
>   sg_pci_blk = raid_dev->sg_pool;
>   for (i = 0; i < MBOX_MAX_SCSI_CMDS && sg_pci_blk[i].vaddr; i++) {
> - pci_pool_free(raid_dev->sg_pool_handle, sg_pci_blk[i].vaddr,
> + dma_pool_free(raid_dev->sg_pool_handle, sg_pci_blk[i].vaddr,
>   sg_pci_blk[i].dma_addr);
>   }
>   if (raid_dev->sg_pool_handle)
> - pci_pool_destroy(raid_dev->sg_pool_handle);
> + dma_pool_destroy(raid_dev->sg_pool_handle);
>  
>  
>   epthru_pci_blk = raid_dev->epthru_pool;
>   

[PATCH net] mlx4: reduce OOM risk on arches with large pages

2017-02-18 Thread Eric Dumazet
From: Eric Dumazet 

Since mlx4 NIC are used on PowerPC with 64K pages, we need to adapt
MLX4_EN_ALLOC_PREFER_ORDER definition.

Otherwise, a fragment sitting in an out of order TCP queue can hold
0.5 Mbytes and it is a serious OOM risk.

Fixes: 51151a16a60f ("mlx4: allow order-0 memory allocations in RX path")
Signed-off-by: Eric Dumazet 
---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h 
b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 
cec59bc264c9ac197048fd7c98bcd5cf25de0efd..0f6d2f3b7d54f51de359d4ccde21f4585e6b7852
 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -102,7 +102,8 @@
 /* Use the maximum between 16384 and a single page */
 #define MLX4_EN_ALLOC_SIZE PAGE_ALIGN(16384)
 
-#define MLX4_EN_ALLOC_PREFER_ORDER PAGE_ALLOC_COSTLY_ORDER
+#define MLX4_EN_ALLOC_PREFER_ORDER min_t(int, get_order(32768),
\
+PAGE_ALLOC_COSTLY_ORDER)
 
 /* Receive fragment sizes; we use at most 3 fragments (for 9600 byte MTU
  * and 4K allocations) */




Re: Questions on XDP

2017-02-18 Thread Alexander Duyck
On Sat, Feb 18, 2017 at 9:41 AM, Eric Dumazet  wrote:
> On Sat, 2017-02-18 at 17:34 +0100, Jesper Dangaard Brouer wrote:
>> On Thu, 16 Feb 2017 14:36:41 -0800
>> John Fastabend  wrote:
>>
>> > On 17-02-16 12:41 PM, Alexander Duyck wrote:
>> > > So I'm in the process of working on enabling XDP for the Intel NICs
>> > > and I had a few questions so I just thought I would put them out here
>> > > to try and get everything sorted before I paint myself into a corner.
>> > >
>> > > So my first question is why does the documentation mention 1 frame per
>> > > page for XDP?
>>
>> Yes, XDP defines upfront a memory model where there is only one packet
>> per page[1], please respect that!
>>
>> This is currently used/needed for fast-direct recycling of pages inside
>> the driver for XDP_DROP and XDP_TX, _without_ performing any atomic
>> refcnt operations on the page. E.g. see mlx4_en_rx_recycle().
>
>
> XDP_DROP does not require having one page per frame.

Agreed.

> (Look after my recent mlx4 patch series if you need to be convinced)
>
> Only XDP_TX is.
>
> This requirement makes XDP useless (very OOM likely) on arches with 64K
> pages.

Actually I have been having a side discussion with John about XDP_TX.
Looking at the Mellanox way of doing it I am not entirely sure it is
useful.  It looks good for benchmarks but that is about it.  Also I
don't see it extending out to the point that we would be able to
exchange packets between interfaces which really seems like it should
be the ultimate goal for XDP_TX.

It seems like eventually we want to be able to peel off the buffer and
send it to something other than ourselves.  For example it seems like
it might be useful at some point to use XDP to do traffic
classification and have it route packets between multiple interfaces
on a host and it wouldn't make sense to have all of them map every
page as bidirectional because it starts becoming ridiculous if you
have dozens of interfaces in a system.

As per our original discussion at netconf if we want to be able to do
XDP Tx with a fully lockless Tx ring we needed to have a Tx ring per
CPU that is performing XDP.  The Tx path will end up needing to do the
map/unmap itself in the case of physical devices but the expense of
that can be somewhat mitigated on x86 at least by either disabling the
IOMMU or using identity mapping.  I think this might be the route
worth exploring as we could then start looking at doing things like
implementing bridges and routers in XDP and see what performance gains
can be had there.

Also as far as the one page per frame it occurs to me that you will
have to eventually deal with things like frame replication.  Once that
comes into play everything becomes much more difficult because the
recycling doesn't work without some sort of reference counting, and
since the device interrupt can migrate you could end up with clean-up
occurring on a different CPUs so you need to have some sort of
synchronization mechanism.

Thanks.

- Alex


[PATCH net-next 2/2] sctp: add support for MSG_MORE

2017-02-18 Thread Xin Long
This patch is to add support for MSG_MORE on sctp.

It adds force_delay in sctp_datamsg to save MSG_MORE, and sets it after
creating datamsg according to the send flag. sctp_packet_can_append_data
then uses it to decide if the chunks of this msg will be sent at once or
delay it.

Note that unlike [1], this patch saves MSG_MORE in the datamsg instead of
in the assoc, because sctp enqueues the chunks first and then dequeues
them one by one. If the flag were saved in the assoc, the current msg's
send flag (MSG_MORE) might affect the bundling of other msgs' chunks.

Since last patch, sctp flush out queue once assoc state falls into
SHUTDOWN_PENDING, the close block problem mentioned in [1] has been
solved as well.

[1] https://patchwork.ozlabs.org/patch/372404/

Signed-off-by: Xin Long 
---
 include/net/sctp/structs.h | 1 +
 net/sctp/output.c  | 9 +++--
 net/sctp/socket.c  | 1 +
 3 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/include/net/sctp/structs.h b/include/net/sctp/structs.h
index 387c802..a244db5 100644
--- a/include/net/sctp/structs.h
+++ b/include/net/sctp/structs.h
@@ -497,6 +497,7 @@ struct sctp_datamsg {
/* Did the messenge fail to send? */
int send_error;
u8 send_failed:1,
+  force_delay:1,
   can_delay;   /* should this message be Nagle delayed */
 };
 
diff --git a/net/sctp/output.c b/net/sctp/output.c
index 814eac0..85406d5 100644
--- a/net/sctp/output.c
+++ b/net/sctp/output.c
@@ -704,18 +704,15 @@ static sctp_xmit_t sctp_packet_can_append_data(struct 
sctp_packet *packet,
 * unacknowledged.
 */
 
-   if (sctp_sk(asoc->base.sk)->nodelay)
-   /* Nagle disabled */
+   if ((sctp_sk(asoc->base.sk)->nodelay || inflight == 0) &&
+   !chunk->msg->force_delay)
+   /* Nothing unacked */
return SCTP_XMIT_OK;
 
if (!sctp_packet_empty(packet))
/* Append to packet */
return SCTP_XMIT_OK;
 
-   if (inflight == 0)
-   /* Nothing unacked */
-   return SCTP_XMIT_OK;
-
if (!sctp_state(asoc, ESTABLISHED))
return SCTP_XMIT_OK;
 
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 75f35ce..b532148 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1964,6 +1964,7 @@ static int sctp_sendmsg(struct sock *sk, struct msghdr 
*msg, size_t msg_len)
err = PTR_ERR(datamsg);
goto out_free;
}
+   datamsg->force_delay = !!(msg->msg_flags & MSG_MORE);
 
/* Now send the (possibly) fragmented message. */
list_for_each_entry(chunk, &datamsg->chunks, frag_list) {
-- 
2.1.0
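
MSG_MORE has the same cork-like semantics that Linux TCP already supports:
the kernel may hold back data until a send without the flag flushes it. A
minimal user-space sketch, using a TCP loopback pair as a stand-in for an
SCTP association (MSG_MORE is Linux-specific, and SCTP sockets need the
sctp kernel module, so TCP is the hedged substitute here):

```python
import socket

# MSG_MORE tells the kernel more data is coming, so it may delay
# transmission and coalesce the pieces into fewer segments -- the same
# semantics this patch adds for SCTP data chunks.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket()
cli.connect(srv.getsockname())
conn, _ = srv.accept()

cli.send(b"hello", socket.MSG_MORE)  # held back, waiting for more data
cli.send(b"world")                   # no flag: flush the coalesced payload
cli.close()

data = b""
while chunk := conn.recv(64):        # read until EOF
    data += chunk
conn.close()
srv.close()

print(data)  # b'helloworld'
```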



Re: [RFC v2 00/20] Replace PCI pool by DMA pool API

2017-02-18 Thread Romain Perier


Le 18/02/2017 à 14:06, Greg Kroah-Hartman a écrit :
> On Sat, Feb 18, 2017 at 09:35:36AM +0100, Romain Perier wrote:
>> The current PCI pool API is a set of simple macros that expand directly
>> to the corresponding DMA pool functions. The prototypes are almost the
>> same and, semantically, they are very similar. I propose to use the DMA
>> pool API directly and get rid of the old API.
>>
>> This set of patches replaces the old API with the DMA pool API, adds
>> support to checkpatch.pl for warning about the old API, and removes the
>> defines.
> Why is this a "RFC" series?  Personally, I never apply those as it
> implies that the author doesn't think they are ready to be merged :)
>
> thanks,
>
> greg k-h
Hi,

I was not sure about this. I have noticed that most of the API changes
are tagged as RFC.
I can re-send a v3 without the prefix RFC if you prefer.

Thanks,
Romain


[PATCH net-next 0/2] sctp: support MSG_MORE flag when sending msg

2017-02-18 Thread Xin Long
This patch is to add support for MSG_MORE on sctp. Patch 1/2 is an
improvement ahead of patch 2/2 to solve the close block problem
mentioned in https://patchwork.ozlabs.org/patch/372404/.

Xin Long (2):
  sctp: flush out queue once assoc state falls into SHUTDOWN_PENDING
  sctp: add support for MSG_MORE

 include/net/sctp/structs.h | 1 +
 net/sctp/output.c  | 9 +++--
 net/sctp/sm_sideeffect.c   | 4 
 net/sctp/socket.c  | 1 +
 4 files changed, 9 insertions(+), 6 deletions(-)

-- 
2.1.0



[PATCH net-next 1/2] sctp: flush out queue once assoc state falls into SHUTDOWN_PENDING

2017-02-18 Thread Xin Long
This patch flushes the out queue when the assoc state falls into
SHUTDOWN_PENDING if there are still chunks in it, so that the
data can be sent out as soon as possible before sending the
SHUTDOWN chunk.

When sctp supports MSG_MORE flag in next patch, this improvement
can also solve the problem that the chunks with MSG_MORE flag
may be stuck in queue when closing an assoc.

Signed-off-by: Xin Long 
---
 net/sctp/sm_sideeffect.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/sctp/sm_sideeffect.c b/net/sctp/sm_sideeffect.c
index 51abcc9..25384fa 100644
--- a/net/sctp/sm_sideeffect.c
+++ b/net/sctp/sm_sideeffect.c
@@ -872,6 +872,10 @@ static void sctp_cmd_new_state(sctp_cmd_seq_t *cmds,
if (!sctp_style(sk, UDP))
sk->sk_state_change(sk);
}
+
+   if (sctp_state(asoc, SHUTDOWN_PENDING) &&
+   !sctp_outq_is_empty(&asoc->outqueue))
+   sctp_outq_uncork(&asoc->outqueue, GFP_ATOMIC);
 }
 
 /* Helper function to delete an association. */
-- 
2.1.0



Re: Questions on XDP

2017-02-18 Thread Eric Dumazet
On Sat, 2017-02-18 at 17:34 +0100, Jesper Dangaard Brouer wrote:
> On Thu, 16 Feb 2017 14:36:41 -0800
> John Fastabend  wrote:
> 
> > On 17-02-16 12:41 PM, Alexander Duyck wrote:
> > > So I'm in the process of working on enabling XDP for the Intel NICs
> > > and I had a few questions so I just thought I would put them out here
> > > to try and get everything sorted before I paint myself into a corner.
> > >   
> > > So my first question is why does the documentation mention 1 frame per
> > > page for XDP?  
> 
> Yes, XDP defines upfront a memory model where there is only one packet
> per page[1], please respect that!
> 
> This is currently used/needed for fast-direct recycling of pages inside
> the driver for XDP_DROP and XDP_TX, _without_ performing any atomic
> refcnt operations on the page. E.g. see mlx4_en_rx_recycle().


XDP_DROP does not require having one page per frame.

(Look after my recent mlx4 patch series if you need to be convinced)

Only XDP_TX is.

This requirement makes XDP useless (very OOM likely) on arches with 64K
pages.






Re: net: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected in skb_array_produce

2017-02-18 Thread Dmitry Vyukov
On Sat, Feb 18, 2017 at 6:28 PM, Dmitry Vyukov  wrote:
> On Fri, Feb 10, 2017 at 6:17 AM, Jason Wang  wrote:
>>
>>
>> On 2017年02月10日 02:10, Michael S. Tsirkin wrote:
>>>
>>> On Thu, Feb 09, 2017 at 05:02:31AM -0500, Jason Wang wrote:

 - Original Message -
>
> Hello,
>
> I've got the following report while running syzkaller fuzzer on mmotm
> (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
> remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e:
>
 [...]

> other info that might help us debug this:
>
>   Possible interrupt unsafe locking scenario:
>
> CPU0                    CPU1
> ----                    ----
> lock(&(&r->consumer_lock)->rlock);
>                         local_irq_disable();
>                         lock(&(&r->producer_lock)->rlock);
>                         lock(&(&r->consumer_lock)->rlock);
> <Interrupt>
>   lock(&(&r->producer_lock)->rlock);
>
 Thanks a lot for the testing.

 Looks like we could address this by using skb_array_consume_bh() instead.

 Could you pls verify if the following patch works?
>>>
>>> I think we should use _bh for the produce call as well,
>>> since resizing takes the producer lock.
>>
>> Looks not since irq was disabled during resizing?
>
>
> Hello,
>
> Is there a fix for this that we can pick up?
> This killed 10'000 VMs on our testing infra over the last day. Still
> happening on linux-next.


Ah, sorry, I see the patch above with skb_array_consume_bh. It's just
that it's not in linux-next. Will manually apply it now then.
Should we also do something with produce_skb?


Re: net: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected in skb_array_produce

2017-02-18 Thread Dmitry Vyukov
On Fri, Feb 10, 2017 at 6:17 AM, Jason Wang  wrote:
>
>
> On 2017年02月10日 02:10, Michael S. Tsirkin wrote:
>>
>> On Thu, Feb 09, 2017 at 05:02:31AM -0500, Jason Wang wrote:
>>>
>>> - Original Message -

 Hello,

 I've got the following report while running syzkaller fuzzer on mmotm
 (git://git.kernel.org/pub/scm/linux/kernel/git/mhocko/mm.git)
 remotes/mmotm/auto-latest ee4ba7533626ba7bf2f8b992266467ac9fdc045e:

>>> [...]
>>>
 other info that might help us debug this:

   Possible interrupt unsafe locking scenario:

 CPU0                    CPU1
 ----                    ----
lock(&(&r->consumer_lock)->rlock);
                         local_irq_disable();
                         lock(&(&r->producer_lock)->rlock);
                         lock(&(&r->consumer_lock)->rlock);
<Interrupt>
  lock(&(&r->producer_lock)->rlock);

>>> Thanks a lot for the testing.
>>>
>>> Looks like we could address this by using skb_array_consume_bh() instead.
>>>
>>> Could you pls verify if the following patch works?
>>
>> I think we should use _bh for the produce call as well,
>> since resizing takes the producer lock.
>
> Looks not since irq was disabled during resizing?


Hello,

Is there a fix for this that we can pick up?
This killed 10'000 VMs on our testing infra over the last day. Still
happening on linux-next.

Thanks


Re: [PATCH net-next v2 11/12] net: ethernet: aquantia: Fixed memory allocation if AQ_CFG_RX_FRAME_MAX > 1 page.

2017-02-18 Thread Pavel Belous



On 02/18/2017 02:50 PM, Lino Sanfilippo wrote:

Hi,

On 17.02.2017 22:07, Pavel Belous wrote:

From: Pavel Belous 

We should allocate the number of pages based on the config parameter
AQ_CFG_RX_FRAME_MAX.

Signed-off-by: Pavel Belous 



do {
if (spin_trylock(&ring->header.lock)) {
-   frags = aq_nic_map_skb(self, skb, &buffers[0]);
+   frags = aq_nic_map_skb(self, skb, buffers);

-   aq_ring_tx_append_buffs(ring, &buffers[0], frags);
+   aq_ring_tx_append_buffs(ring, buffers, frags);



This change has nothing to do with what the commit message claims that the
patch is about. Please dont mix fixes and totally unrelated cleanups in one
patch.

Regards,
Lino



Sorry, it's just a small fix for readability.
I will remove it or put it in a separate patch in v3.

Regards,
Pavel


Re: [PATCH net-next v2 00/12] net: ethernet: aquantia: improvements and fixes

2017-02-18 Thread Pavel Belous


On 02/18/2017 02:56 PM, Lino Sanfilippo wrote:

Hi,

On 17.02.2017 22:07, Pavel Belous wrote:

From: Pavel Belous 

The following patchset contains improvements and fixes for aQuantia
AQtion ethernet driver from net-next tree.

Most fixes are based on the comments from Lino Sanfilippo.

Sanity testing was performed on real HW. No regression found.

v1->v2 :Removed buffers copying.
Fixed dma error handling.


Please review.


You could have added all "reviewed-by" tags that you have received so
far for patches in the former version of this series. I think otherwise this
information will get lost.

Regards,
Lino



I was thinking of adding these tags, but I was unsure.

Thank you, I will add "Reviewed-by" tags for the patches already reviewed
in v1 (and unchanged) when I send v3.


Regards,
Pavel


Re: [PATCH net-next v2 11/12] net: ethernet: aquantia: Fixed memory allocation if AQ_CFG_RX_FRAME_MAX > 1 page.

2017-02-18 Thread Pavel Belous

On 02/18/2017 01:43 AM, Andrew Lunn wrote:

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c 
b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index 4c40644..0877625 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -278,6 +278,8 @@ int aq_ring_rx_fill(struct aq_ring_s *self)
struct aq_ring_buff_s *buff = NULL;
int err = 0;
int i = 0;
+   unsigned int pages_order = fls(AQ_CFG_RX_FRAME_MAX / PAGE_SIZE +
+   (AQ_CFG_RX_FRAME_MAX % PAGE_SIZE ? 1 : 0)) - 1;


Reverse Christmas tree?

Andrew



Thank you.
I will fix it in v3.

Regards,
Pavel


Re: Questions on XDP

2017-02-18 Thread Jesper Dangaard Brouer
On Thu, 16 Feb 2017 14:36:41 -0800
John Fastabend  wrote:

> On 17-02-16 12:41 PM, Alexander Duyck wrote:
> > So I'm in the process of working on enabling XDP for the Intel NICs
> > and I had a few questions so I just thought I would put them out here
> > to try and get everything sorted before I paint myself into a corner.
> >   
> > So my first question is why does the documentation mention 1 frame per
> > page for XDP?  

Yes, XDP defines upfront a memory model where there is only one packet
per page[1], please respect that!

This is currently used/needed for fast-direct recycling of pages inside
the driver for XDP_DROP and XDP_TX, _without_ performing any atomic
refcnt operations on the page. E.g. see mlx4_en_rx_recycle().

This is also about controlling the cache-coherency state of the
struct-page cache-line.  (With two (or more) packets per page,
the struct-page cache-line will be jumping around.) Controlling this is
essential when packets are transferred between CPUs. We need an
architecture where we can control this, please.

[1] 
https://prototype-kernel.readthedocs.io/en/latest/networking/XDP/design/requirements.html#page-per-packet

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH net] sctp: check duplicate node before inserting a new transport

2017-02-18 Thread Xin Long
On Sat, Feb 18, 2017 at 4:19 AM, David Miller  wrote:
> From: Xin Long 
> Date: Fri, 17 Feb 2017 16:35:24 +0800
>
>
>> + list = rhltable_lookup(&sctp_transport_hashtable, &arg,
>> +sctp_hash_params);
>> +
>> + rhl_for_each_entry_rcu(transport, tmp, list, node)
>> + if (transport->asoc->ep == t->asoc->ep) {
>> + err = -EEXIST;
>> + goto out;
>> + }
>> +
>>   err = rhltable_insert_key(&sctp_transport_hashtable, &arg,
>> &t->node, sctp_hash_params);
>> +
>> +out:
>
> Well, what if another thread of control inserts a matching transport
> after you've checked the list but before rhltable_insert_key() does
> it's work?
>
> What write side lock is being held to protect the table from
> modifications here?
The sock lock.
...
sctp_assoc_add_peer()
  sctp_hash_transport()
    rhltable_insert_key()

All the places that call sctp_assoc_add_peer() are protected by
lock_sock(). It's a big lock, so there is no need to worry about race
issues here.
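
The lookup-then-insert sequence is only race-free because both steps run
under the same lock, as lock_sock() guarantees here. A minimal user-space
analogy, purely illustrative: a dict and a threading.Lock stand in for the
transport rhltable and the sock lock.

```python
import threading

table = {}                # stands in for the transport rhltable
lock = threading.Lock()   # stands in for lock_sock()

def insert_once(key, value):
    """Check for a duplicate and insert under one lock, so no second
    thread can slip in between the lookup and the insert."""
    with lock:
        if key in table:      # the rhl_for_each_entry_rcu() duplicate scan
            return -17        # -EEXIST
        table[key] = value    # rhltable_insert_key()
        return 0

results = []
threads = [threading.Thread(target=lambda: results.append(insert_once("ep", 1)))
           for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one thread wins the insert; the rest see the duplicate.
print(results.count(0), results.count(-17))  # 1 7
```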


Re: [RFC v2 00/20] Replace PCI pool by DMA pool API

2017-02-18 Thread Greg Kroah-Hartman
On Sat, Feb 18, 2017 at 09:35:36AM +0100, Romain Perier wrote:
> The current PCI pool API is a set of simple macros that expand directly
> to the corresponding DMA pool functions. The prototypes are almost the
> same and, semantically, they are very similar. I propose to use the DMA
> pool API directly and get rid of the old API.
> 
> This set of patches replaces the old API with the DMA pool API, adds
> support to checkpatch.pl for warning about the old API, and removes the
> defines.

Why is this a "RFC" series?  Personally, I never apply those as it
implies that the author doesn't think they are ready to be merged :)

thanks,

greg k-h
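
For reference, the PCI pool API was roughly just a set of thin wrapper
macros over the DMA pool API (as defined in include/linux/pci.h before
their removal), which is why the conversion across drivers is largely
mechanical:

```c
/* Old PCI pool API, approximately as it appeared in include/linux/pci.h
 * (kernel-internal fragment, shown for illustration only): */
#define pci_pool dma_pool
#define pci_pool_create(name, pdev, size, align, allocation) \
		dma_pool_create(name, &pdev->dev, size, align, allocation)
#define pci_pool_destroy(pool) dma_pool_destroy(pool)
#define pci_pool_alloc(pool, flags, handle) dma_pool_alloc(pool, flags, handle)
#define pci_pool_free(pool, vaddr, addr) dma_pool_free(pool, vaddr, addr)
```

So each conversion hunk in the series simply substitutes the expanded
form: `pci_pool_*()` becomes `dma_pool_*()`, and a `struct pci_dev *`
argument becomes `&pdev->dev`.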


[PATCH net-next] GTP: Add some basic documentation about drivers/net/gtp.c

2017-02-18 Thread Harald Welte
In order to clarify what the module actually does, and how to use it,
let's add some basic documentation to the kernel tree, together with
pointers to related specs and projects.

Signed-off-by: Harald Welte 
---
 Documentation/networking/gtp.txt | 135 +++
 1 file changed, 135 insertions(+)
 create mode 100644 Documentation/networking/gtp.txt

diff --git a/Documentation/networking/gtp.txt b/Documentation/networking/gtp.txt
new file mode 100644
index ..93e96750f103
--- /dev/null
+++ b/Documentation/networking/gtp.txt
@@ -0,0 +1,135 @@
+The Linux kernel GTP tunneling module
+==
+Documentation by Harald Welte 
+
+In 'drivers/net/gtp.c' you can find a kernel-level implementation
+of a GTP tunnel endpoint.
+
+== What is GTP ==
+
+GTP is the GPRS Tunnelling Protocol, a 3GPP protocol used for
+tunneling User-IP payload between a mobile station (phone, modem)
+and the gateway that interconnects with an external packet data
+network (such as the internet).
+
+So when you start a 'data connection' from your mobile phone, the
+phone will use the control plane to signal for the establishment of
+such a tunnel between that external data network and the phone.  The
+tunnel endpoints thus reside on the phone and in the gateway.  All
+intermediate nodes just transport the encapsulated packet.
+
+The phone itself does not implement GTP but uses some other
+technology-dependent protocol stack for transmitting the user IP
+payload, such as LLC/SNDCP/RLC/MAC.
+
+At some network element inside the cellular operator infrastructure
+(SGSN in case of GPRS/EGPRS or classic UMTS, hNodeB in case of a 3G
+femtocell, eNodeB in case of 4G/LTE), the cellular protocol stacking
+is translated into GTP *without breaking the end-to-end tunnel*.  So
+intermediate nodes just perform some specific relay function.
+
+At some point the GTP packet ends up on the so-called GGSN (GSM/UMTS)
+or P-GW (LTE), which terminates the tunnel, decapsulates the packet
+and forwards it onto an external packet data network.  This can be
+public internet, but can also be any private IP network (or even
+theoretically some non-IP network like X.25).
+
+You can find the protocol specification in 3GPP TS 29.060, available
+publicly via the 3GPP website at http://www.3gpp.org/DynaReport/29060.htm
+
+A direct PDF link to v13.6.0 is provided for convenience below:
+http://www.etsi.org/deliver/etsi_ts/129000_129099/129060/13.06.00_60/ts_129060v130600p.pdf
+
+== The Linux GTP tunnelling module ==
+
+The module implements the function of a tunnel endpoint, i.e. it is
+able to decapsulate tunneled IP packets in the uplink originated by
+the phone, and encapsulate raw IP packets received from the external
+packet network in downlink towards the phone.
+
+It *only* implements the so-called 'user plane', carrying the User-IP
+payload, called GTP-U.  It does not implement the 'control plane',
+which is a signaling protocol used for establishment and teardown of
+GTP tunnels (GTP-C).
+
+So in order to have a working GGSN/P-GW setup, you will need a
+userspace program that implements the GTP-C protocol and which then
+uses the netlink interface provided by the GTP-U module in the kernel
+to configure the kernel module.
+
+This split architecture follows the tunneling modules of other
+protocols, e.g. PPPoE or L2TP, where you also run a userspace daemon
+to handle the tunnel establishment, authentication etc. and only the
+data plane is accelerated inside the kernel.
+
+Don't be confused by terminology:  The GTP User Plane goes through the
+kernel-accelerated path, while the GTP Control Plane goes to
+Userspace :)
+
+The official homepage of the module is at
+https://osmocom.org/projects/linux-kernel-gtp-u/wiki
+
+== Userspace Programs with Linux Kernel GTP-U support ==
+
+At the time of this writing, there are at least two Free Software
+implementations that implement GTP-C and can use the netlink interface
+to make use of the Linux kernel GTP-U support:
+
+* OpenGGSN (classic 2G/3G GGSN in C):
+  https://osmocom.org/projects/openggsn/wiki/OpenGGSN
+
+* ergw (GGSN + P-GW in Erlang):
+  https://github.com/travelping/ergw
+
+== Userspace Library / Command Line Utilities ==
+
+There is a userspace library called 'libgtpnl' which is based on
+libmnl and which implements a C-language API towards the netlink
+interface provided by the Kernel GTP module:
+
+http://git.osmocom.org/libgtpnl/
+
+== Protocol Versions ==
+
+There are two different versions of GTP-U: v0 and v1.  Both are
+implemented in the Kernel GTP module.  Version 0 is a legacy version,
+and deprecated from recent 3GPP specifications.
+
+There are three versions of GTP-C: v0, v1, and v2.  As the kernel
+doesn't implement GTP-C, we don't have to worry about this.  It's the
+responsibility of the control plane implementation in userspace to
+implement that.
+
+== IPv6 ==
+
+The 3GPP specifications indicate eit

Re: [RFC v2 01/20] block: DAC960: Replace PCI pool old API

2017-02-18 Thread Peter Senna Tschudin
On Sat, Feb 18, 2017 at 09:35:37AM +0100, Romain Perier wrote:
> The PCI pool API is deprecated. This commit replaces the old PCI pool
> API with the appropriate functions from the DMA pool API.
> 

no new errors added, tested by compilation only.

> Signed-off-by: Romain Perier 
> Acked-by: Peter Senna Tschudin 
> Tested-by: Peter Senna Tschudin 
> ---
>  drivers/block/DAC960.c | 36 ++--
>  drivers/block/DAC960.h |  4 ++--
>  2 files changed, 20 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/block/DAC960.c b/drivers/block/DAC960.c
> index 26a51be..2b221cc 100644
> --- a/drivers/block/DAC960.c
> +++ b/drivers/block/DAC960.c
> @@ -268,17 +268,17 @@ static bool 
> DAC960_CreateAuxiliaryStructures(DAC960_Controller_T *Controller)
>void *AllocationPointer = NULL;
>void *ScatterGatherCPU = NULL;
>dma_addr_t ScatterGatherDMA;
> -  struct pci_pool *ScatterGatherPool;
> +  struct dma_pool *ScatterGatherPool;
>void *RequestSenseCPU = NULL;
>dma_addr_t RequestSenseDMA;
> -  struct pci_pool *RequestSensePool = NULL;
> +  struct dma_pool *RequestSensePool = NULL;
>  
>if (Controller->FirmwareType == DAC960_V1_Controller)
>  {
>CommandAllocationLength = offsetof(DAC960_Command_T, V1.EndMarker);
>CommandAllocationGroupSize = DAC960_V1_CommandAllocationGroupSize;
> -  ScatterGatherPool = pci_pool_create("DAC960_V1_ScatterGather",
> - Controller->PCIDevice,
> +  ScatterGatherPool = dma_pool_create("DAC960_V1_ScatterGather",
> + &Controller->PCIDevice->dev,
>   DAC960_V1_ScatterGatherLimit * sizeof(DAC960_V1_ScatterGatherSegment_T),
>   sizeof(DAC960_V1_ScatterGatherSegment_T), 0);
>if (ScatterGatherPool == NULL)
> @@ -290,18 +290,18 @@ static bool 
> DAC960_CreateAuxiliaryStructures(DAC960_Controller_T *Controller)
>  {
>CommandAllocationLength = offsetof(DAC960_Command_T, V2.EndMarker);
>CommandAllocationGroupSize = DAC960_V2_CommandAllocationGroupSize;
> -  ScatterGatherPool = pci_pool_create("DAC960_V2_ScatterGather",
> - Controller->PCIDevice,
> +  ScatterGatherPool = dma_pool_create("DAC960_V2_ScatterGather",
> + &Controller->PCIDevice->dev,
>   DAC960_V2_ScatterGatherLimit * sizeof(DAC960_V2_ScatterGatherSegment_T),
>   sizeof(DAC960_V2_ScatterGatherSegment_T), 0);
>if (ScatterGatherPool == NULL)
>   return DAC960_Failure(Controller,
>   "AUXILIARY STRUCTURE CREATION (SG)");
> -  RequestSensePool = pci_pool_create("DAC960_V2_RequestSense",
> - Controller->PCIDevice, sizeof(DAC960_SCSI_RequestSense_T),
> +  RequestSensePool = dma_pool_create("DAC960_V2_RequestSense",
> + &Controller->PCIDevice->dev, sizeof(DAC960_SCSI_RequestSense_T),
>   sizeof(int), 0);
>if (RequestSensePool == NULL) {
> - pci_pool_destroy(ScatterGatherPool);
> + dma_pool_destroy(ScatterGatherPool);
>   return DAC960_Failure(Controller,
>   "AUXILIARY STRUCTURE CREATION (SG)");
>}
> @@ -335,16 +335,16 @@ static bool DAC960_CreateAuxiliaryStructures(DAC960_Controller_T *Controller)
>Command->Next = Controller->FreeCommands;
>Controller->FreeCommands = Command;
>Controller->Commands[CommandIdentifier-1] = Command;
> -  ScatterGatherCPU = pci_pool_alloc(ScatterGatherPool, GFP_ATOMIC,
> +  ScatterGatherCPU = dma_pool_alloc(ScatterGatherPool, GFP_ATOMIC,
>   &ScatterGatherDMA);
>if (ScatterGatherCPU == NULL)
> return DAC960_Failure(Controller, "AUXILIARY STRUCTURE CREATION");
>  
>if (RequestSensePool != NULL) {
> -   RequestSenseCPU = pci_pool_alloc(RequestSensePool, GFP_ATOMIC,
> +   RequestSenseCPU = dma_pool_alloc(RequestSensePool, GFP_ATOMIC,
>   &RequestSenseDMA);
> if (RequestSenseCPU == NULL) {
> -pci_pool_free(ScatterGatherPool, ScatterGatherCPU,
> +dma_pool_free(ScatterGatherPool, ScatterGatherCPU,
>  ScatterGatherDMA);
>   return DAC960_Failure(Controller,
>   "AUXILIARY STRUCTURE CREATION");
> @@ -379,8 +379,8 @@ static bool DAC960_CreateAuxiliaryStructures(DAC960_Controller_T *Controller)
>  static void DAC960_DestroyAuxiliaryStructures(DAC960_Controller_T *Controller)
>  {
>int i;
> -  struct pci_pool *ScatterGatherPool = Controller->ScatterGatherPool;
> -  struct pci_pool *RequestSensePool = NULL;
> +  struct dma_pool *ScatterGatherPool = Controller->ScatterGatherPool;
> +  struct dma_pool *RequestSensePool = NULL;
>void *ScatterGatherCPU;
>dma_addr_t ScatterGatherDMA;
>void *RequestSenseCPU;
> @@ -411,9 +411,9 @@ static void DAC960_DestroyAuxiliaryStructures(DAC960_Controller_T *Controller)
> RequestS

Re: [PATCH net-next v2 00/12] net: ethernet: aquantia: improvements and fixes

2017-02-18 Thread Lino Sanfilippo
Hi,

On 17.02.2017 22:07, Pavel Belous wrote:
> From: Pavel Belous 
> 
> The following patchset contains improvements and fixes for aQuantia
> AQtion ethernet driver from net-next tree.
> 
> Most fixes are based on the comments from Lino Sanfilippo.
> 
> Sanity testing was performed on real HW. No regression found.
> 
> v1->v2 :Removed buffers copying.
>   Fixed dma error handling.
> 
> 
> Please review.

You could have added all of the "Reviewed-by" tags that you have received
so far for patches in the former version of this series; otherwise that
information will get lost.

Regards,
Lino



Re: [PATCH net-next v2 11/12] net: ethernet: aquantia: Fixed memory allocation if AQ_CFG_RX_FRAME_MAX > 1 page.

2017-02-18 Thread Lino Sanfilippo
Hi,

On 17.02.2017 22:07, Pavel Belous wrote:
> From: Pavel Belous 
> 
> We should allocate the number of pages based on the config parameter
> AQ_CFG_RX_FRAME_MAX.
> 
> Signed-off-by: Pavel Belous 

>   do {
>   if (spin_trylock(&ring->header.lock)) {
> - frags = aq_nic_map_skb(self, skb, &buffers[0]);
> + frags = aq_nic_map_skb(self, skb, buffers);
>  
> - aq_ring_tx_append_buffs(ring, &buffers[0], frags);
> + aq_ring_tx_append_buffs(ring, buffers, frags);
>  

This change has nothing to do with what the commit message claims the
patch is about. Please don't mix fixes and totally unrelated cleanups in
one patch.

Regards,
Lino



[PATCH] net: aquantia: remove function aq_ring_tx_deinit

2017-02-18 Thread Lino Sanfilippo
Both functions aq_ring_tx_deinit() and aq_ring_tx_clean() are almost
identical aside from an additional check in the latter.
Move that check from the function into its caller and replace
aq_ring_tx_deinit() with aq_ring_tx_clean().

While at it, also change the function's return value from int to void,
since it can never fail.

Signed-off-by: Lino Sanfilippo 
---
 drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 33 +---
 drivers/net/ethernet/aquantia/atlantic/aq_ring.h |  3 +--
 drivers/net/ethernet/aquantia/atlantic/aq_vec.c  | 14 ++
 3 files changed, 11 insertions(+), 39 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index dea9e9b..fed6ac5 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -123,7 +123,7 @@ void aq_ring_tx_append_buffs(struct aq_ring_s *self,
}
 }
 
-int aq_ring_tx_clean(struct aq_ring_s *self)
+void aq_ring_tx_clean(struct aq_ring_s *self)
 {
struct device *dev = aq_nic_get_dev(self->aq_nic);
 
@@ -143,11 +143,6 @@ int aq_ring_tx_clean(struct aq_ring_s *self)
if (unlikely(buff->is_eop))
dev_kfree_skb_any(buff->skb);
}
-
-   if (aq_ring_avail_dx(self) > AQ_CFG_SKB_FRAGS_MAX)
-   aq_nic_ndev_queue_start(self->aq_nic, self->idx);
-
-   return 0;
 }
 
 static inline unsigned int aq_ring_dx_in_range(unsigned int h, unsigned int i,
@@ -333,32 +328,6 @@ void aq_ring_rx_deinit(struct aq_ring_s *self)
 err_exit:;
 }
 
-void aq_ring_tx_deinit(struct aq_ring_s *self)
-{
-   if (!self)
-   goto err_exit;
-
-   for (; self->sw_head != self->sw_tail;
-   self->sw_head = aq_ring_next_dx(self, self->sw_head)) {
-   struct aq_ring_buff_s *buff = &self->buff_ring[self->sw_head];
-   struct device *ndev = aq_nic_get_dev(self->aq_nic);
-
-   if (likely(buff->is_mapped)) {
-   if (unlikely(buff->is_sop)) {
-   dma_unmap_single(ndev, buff->pa, buff->len,
-DMA_TO_DEVICE);
-   } else {
-   dma_unmap_page(ndev, buff->pa, buff->len,
-  DMA_TO_DEVICE);
-   }
-   }
-
-   if (unlikely(buff->is_eop))
-   dev_kfree_skb_any(buff->skb);
-   }
-err_exit:;
-}
-
 void aq_ring_free(struct aq_ring_s *self)
 {
if (!self)
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
index 0ac3f9e..fb296b3 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.h
@@ -144,13 +144,12 @@ struct aq_ring_s *aq_ring_rx_alloc(struct aq_ring_s *self,
   unsigned int idx,
   struct aq_nic_cfg_s *aq_nic_cfg);
 int aq_ring_init(struct aq_ring_s *self);
-void aq_ring_tx_deinit(struct aq_ring_s *self);
 void aq_ring_rx_deinit(struct aq_ring_s *self);
 void aq_ring_free(struct aq_ring_s *self);
 void aq_ring_tx_append_buffs(struct aq_ring_s *ring,
 struct aq_ring_buff_s *buffer,
 unsigned int buffers);
-int aq_ring_tx_clean(struct aq_ring_s *self);
+void aq_ring_tx_clean(struct aq_ring_s *self);
 int aq_ring_rx_clean(struct aq_ring_s *self, int *work_done, int budget);
 int aq_ring_rx_fill(struct aq_ring_s *self);
 
diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
index cb30a63..ad5b4d4d 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_vec.c
@@ -59,10 +59,14 @@ static int aq_vec_poll(struct napi_struct *napi, int budget)
}
 
if (ring[AQ_VEC_TX_ID].sw_head !=
-   ring[AQ_VEC_TX_ID].hw_head) {
-   err = aq_ring_tx_clean(&ring[AQ_VEC_TX_ID]);
-   if (err < 0)
-   goto err_exit;
+   ring[AQ_VEC_TX_ID].hw_head) {
+   aq_ring_tx_clean(&ring[AQ_VEC_TX_ID]);
+
+   if (aq_ring_avail_dx(&ring[AQ_VEC_TX_ID]) >
+   AQ_CFG_SKB_FRAGS_MAX) {
+   aq_nic_ndev_queue_start(self->aq_nic,
+   ring[AQ_VEC_TX_ID].idx);
+   }
was_tx_cleaned = true;
}
 
@@ -271,7 +275,7 @@ void aq_vec_deinit(struct aq_vec_s *self)
 
for (i = 0U, ring = self->ring[0];
self->tx_rings > i; ++i, 

[PATCH] net: ena: remove superfluous check in ena_remove()

2017-02-18 Thread Lino Sanfilippo
The check in ena_remove() for the pci driver data not being NULL is not
needed, since it is always set in the probe() function. Remove the
superfluous check.

Signed-off-by: Lino Sanfilippo 
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c | 6 --
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index d8c920b..35f1943 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -3163,12 +3163,6 @@ static void ena_remove(struct pci_dev *pdev)
struct ena_com_dev *ena_dev;
struct net_device *netdev;
 
-   if (!adapter)
-   /* This device didn't load properly and it's resources
-* already released, nothing to do
-*/
-   return;
-
ena_dev = adapter->ena_dev;
netdev = adapter->netdev;
 
-- 
2.7.4



Re: [PATCH 1/2] tcp: setup random timestamp offset when write_seq already set

2017-02-18 Thread Alexey Kodanev
Hi,
On 18.02.2017 3:56, Alexey Kodanev wrote:
> Found that when the random offset is enabled (the default), a TCP client
> can still start new connections both with and without random offsets.
> Later, if the server does an active close and re-uses sockets in
> TIME-WAIT state, a new SYN from the client can be rejected by the PAWS
> check inside tcp_timewait_state_process().
>

Actually, on second thoughts, we can just copy tsoffset from tw socket:

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 89a95da..f40a61d 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -126,6 +126,7 @@ int tcp_twsk_unique(struct sock *sk, struct sock
*sktw, void *twp)
tp->write_seq = 1;
tp->rx_opt.ts_recent   = tcptw->tw_ts_recent;
tp->rx_opt.ts_recent_stamp = tcptw->tw_ts_recent_stamp;
+   tp->tsoffset   = tcptw->tw_ts_offset;
sock_hold(sktw);
return 1;
}

Will test this and send a new version.

Thanks,
Alexey