Re: [patch net-next 1/4] netdevice: add SW statistics ndo

2016-05-14 Thread Jiri Pirko
Sun, May 15, 2016 at 06:11:20AM CEST, ro...@cumulusnetworks.com wrote:
>On 5/14/16, 11:46 AM, Jiri Pirko wrote:
>> Sat, May 14, 2016 at 05:47:41PM CEST, ro...@cumulusnetworks.com wrote:
>>> On 5/14/16, 5:49 AM, Jiri Pirko wrote:
 Fri, May 13, 2016 at 08:47:48PM CEST, ro...@cumulusnetworks.com wrote:
>
>[snip]
>  Jiri Pirko 
> ---
>
>>> To me netdev stats is  combined 'SW + HW' stats for that netdev.
>>> ndo_get_stats64 callback into the drivers does the magic of adding HW 
>>> stats
>>> to SW (netdev) stats and returning (see enic_get_stats). HW stats is 
>>> available for netdevs
>>> that are offloaded or are backed by hardware. SW stats is the stats 
>>> that the driver maintains
>>> (logical or physical). HW stats is queried and added to the SW stats.
>> I'm not sure I follow. HW stats already contain SW stats. Because on
>> slow path every packet that is not offloaded and goes through kernel is
>> counted into HW stats as well (because it goes through HW port). 
> yes, correct... we don't want to double count those. But since these 
> stats are
> generally queried from hw, I am calling them HW stats.
> you will not really maintain a software counter for this. But, the driver 
> can maintain its own
> counters for rx and tx errors etc and I call these SW stats. They are 
> counted at the driver.
>
>> If you
>> do HW stats + SW stats, what you get makes no sense. Am I missing 
>> something?
> If you go by my definition of HW and SW stats above, on a 
> ndo_get_stats64() call,
> you will add the SW counters + HW counters and return. In my definition, 
> the pkts
> that were rx'ed or tx'ed successfully are always in the HW count.
>
>> Btw, looking at enic_get_stats, looks exactly what we introduce for
>> mlxsw in this patchset.
> In enic_get_stats, the ones counted in software are the ones taken from 
> 'enic->'
> net_stats->rx_over_errors = enic->rq_truncated_pkts;
> net_stats->rx_crc_errors = enic->rq_bad_fcs;
>
>> With this patchset, we only allow user to see the actual stats for
>> slow-path aka SW stats.
> hmm...ok. But i am not sure how many will use this new attribute.
> When you do 'ip -s link show' you really want all counters on that port
> hardware or software does not matter at that point.
>
> My suggestion to move this to ethtool like attribute is because that is 
> an existing
> way to break down your stats which ever way you want. And the best part 
> is it can be
> customized (say rx_pkts_cpu_saw)
 I believe that ethtool is really not a place to expose sw stats. Does
 not make sense.
>>> 2 things:
>>> - i was surprised you don't want your ndo_get_stats64 to be a unified view 
>>> of HW and SW stats
>> Roopa, please, look at the patch 4/4. That is exactly what we are doing.
>> We expose HW stats via ndo_get_stats64 and that is of course including
>> whatever comes through slowpath (non-forwarded in HW).
>
>Maybe i missed it but i did not think it included any rx or tx err counters 
>counted solely
>by the driver.
>>
>>
>>> - by bringing up ethtool like stats (IFLA_STATS_LINK_HW_EXTENDED) I am just 
>>> saying
>>> it has always been a way to breakdown stats. If you don't want to show 
>>> explicit SW stats there,
>>> there is always a way to show HW only stats and now you know the delta 
>>> between the unified stats
>>> and the HW only stats is your SW stats.
>> I think we don't understand each other. HW stats always include SW
>> stats. Because whatever goes in or out goes through HW. Therefore, the
>> "unified stats" you mention are exactly HW stats.
>>
>> This is fine, Patch 4/4 would do to make this correct. However, I think
>> it has value for user to know what went via slowpath (non-forwarded in HW).
>> And that is exactly exposed by the SW stats we try to add.
>>
>> Is that confusing?
>
>It's not confusing. I understand what you are doing.
>The only point I was making was that most drivers have unified stats via ndo
>and there are also hw stats via ethtool like api (which will also be part of 
>the stats
>api in the future). And sw only stats can be derived from that...which is the 
>way most

The thing is, they can't be derived from it. That is my whole point.
HW-HW=0


>people do today.
>But that's fine. If you think it will be useful/easier to have a new 
>api/attribute
>for software only stats for some drivers, sure, fine. Let's move on.
>
>
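
The "HW-HW=0" point can be sketched in a few lines of C. This is only an illustration under stated assumptions: the struct and function names below are hypothetical and do not come from mlxsw or any other driver. It assumes, as argued in the thread, that a packet trapped to the CPU has already been counted by the hardware port counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-port counters, for illustration only. */
struct port_counters {
	uint64_t hw_rx_packets;	/* everything that entered the HW port */
	uint64_t sw_rx_packets;	/* subset trapped to the CPU (slow path) */
};

/* A packet forwarded entirely in hardware. */
void rx_offloaded(struct port_counters *c)
{
	assert(c != NULL);
	c->hw_rx_packets++;
}

/* A packet trapped to the CPU: it still crossed the HW port. */
void rx_slow_path(struct port_counters *c)
{
	assert(c != NULL);
	c->hw_rx_packets++;
	c->sw_rx_packets++;
}

/* The "unified" view is the HW counter; adding SW would double count. */
uint64_t unified_rx(const struct port_counters *c)
{
	assert(c != NULL);
	return c->hw_rx_packets;
}

/* Trying to derive SW stats as "unified - HW" always yields 0. */
uint64_t derived_sw_rx(const struct port_counters *c)
{
	assert(c != NULL);
	return unified_rx(c) - c->hw_rx_packets;
}
```

Under these assumptions the slow-path count is only recoverable from a counter that is kept separately (sw_rx_packets here), which is the argument for exposing it through its own attribute.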


Re: [PATCH net-next 2/9] bnxt_en: Add Support for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPRO

2016-05-14 Thread Michael Chan
On Sat, May 14, 2016 at 6:31 PM, Ben Hutchings  wrote:
> On Sat, 2016-05-14 at 20:29 -0400, Michael Chan wrote:
>> From: Ajit Khaparde 
> [...]
>> + /* Read A2 portion of the EEPROM */
>> + if (length) {
>> + start -= ETH_MODULE_SFF_8436_LEN;
>> + bnxt_read_sfp_module_eeprom_info(bp, I2C_DEV_ADDR_A2, 1, start,
>> +  length, data + start);
>
> The output address calculation (data + start) makes no sense at all.
> If eeprom->offset < ETH_MODULE_SFF_8436_LEN then start == 0 here and
> this read overwrites earlier data in the output buffer.  If
> eeprom->offset > ETH_MODULE_SFF_8436_LEN then start > 0 here and this
> overruns the output buffer.
>
> I think that 'data' should be incremented along with 'start' in the
> previous if-block.
>

Yes, you're right.  We'll fix it and resend.  Thanks.
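
For reference, a minimal userspace sketch of the fix Ben describes, where the output pointer advances together with the offset. It is not the bnxt_en code: fake_module, PAGE_LEN and read_module_eeprom are hypothetical names, and memcpy stands in for the firmware read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_LEN 256u	/* stands in for ETH_MODULE_SFF_8436_LEN */

/* Hypothetical module with two 256-byte pages: A0 followed by A2. */
struct fake_module {
	uint8_t a0[PAGE_LEN];
	uint8_t a2[PAGE_LEN];
};

/* Copy len bytes starting at logical offset into out.
 * Returns 0 on success, -1 if the request is out of range.
 */
int read_module_eeprom(const struct fake_module *m, uint32_t offset,
		       uint32_t len, uint8_t *out)
{
	assert(m != NULL && out != NULL);

	if (offset > 2 * PAGE_LEN || len > 2 * PAGE_LEN - offset)
		return -1;

	/* Read A0 portion of the EEPROM */
	if (offset < PAGE_LEN) {
		uint32_t n = len;

		if (offset + n > PAGE_LEN)
			n = PAGE_LEN - offset;
		memcpy(out, m->a0 + offset, n);
		offset += n;
		out += n;	/* advance the output along with the offset */
		len -= n;
	}

	/* Read A2 portion of the EEPROM */
	if (len) {
		offset -= PAGE_LEN;
		memcpy(out, m->a2 + offset, len);
	}
	return 0;
}
```

Because out is advanced in the first block, the second copy always lands right after whatever the first one wrote, whether the request starts in A0, straddles the boundary, or starts in A2.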


Re: [PATCH] nf_conntrack: avoid kernel pointer value leak in slab name

2016-05-14 Thread Willy Tarreau
On Sat, May 14, 2016 at 03:21:31PM -0700, Linus Torvalds wrote:
> On Sat, May 14, 2016 at 2:33 PM, Willy Tarreau  wrote:
> >
> > Why not simply cast the atomic to (unsigned long long) instead of (u64)
> > so that %llu always matches ?
> 
> Yes, that fixes the problem. It's just more typing, and annoying. The
> fact that MS got it right while posix and gcc screwed it up is a bit
> embarrassing..

Well on the other hand, because of this MS still has problems porting
code from 32 to 64 bit. The real problem is that on both sides they
imagined that you needed only one way to specify your types. In practice
users generally want either the most optimal types for the architecture
because they don't care about the size (char, int, size_t, void *...)
or a specific size. This last one is annoying to use with printf format.

> If we ever start using __uint128_t, we'll have even more problems in
> this area. Oh well.

Definitely.

Willy
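
A small userspace illustration of the two options discussed above, assuming standard C99 headers rather than kernel types: either cast to unsigned long long so that %llu always matches, or use the PRIu64 macro from <inttypes.h>. The function names are made up for the example.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Option 1: cast so that %llu matches on every architecture. */
int format_cast(char *buf, size_t size, uint64_t v)
{
	assert(buf != NULL);
	return snprintf(buf, size, "%llu", (unsigned long long)v);
}

/* Option 2: let <inttypes.h> pick the right conversion for uint64_t. */
int format_inttypes(char *buf, size_t size, uint64_t v)
{
	assert(buf != NULL);
	return snprintf(buf, size, "%" PRIu64, v);
}
```

Both produce the same text; the cast is more typing at each call site, while the macro splits the format string, which is the annoyance mentioned for fixed-size types.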



[net-next 11/13] i40e: don't add broadcast filter for VFs

2016-05-14 Thread Jeff Kirsher
From: Mitch Williams 

Now that all VSIs are configured to receive broadcasts as default, we
don't need to add a filter. This eliminates an annoying but harmless
error message each time VFs are created or reset.

Change-ID: I4cd6339684df45b0d2722133eeb84c14fa93ea19
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 9473429..1fcafcf 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -665,8 +665,6 @@ static int i40e_alloc_vsi_res(struct i40e_vf *vf, enum i40e_vsi_type type)
goto error_alloc_vsi_res;
}
if (type == I40E_VSI_SRIOV) {
-   u8 brdcast[ETH_ALEN] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
-
vf->lan_vsi_idx = vsi->idx;
vf->lan_vsi_id = vsi->id;
/* If the port VLAN has been configured and then the
@@ -688,12 +686,6 @@ static int i40e_alloc_vsi_res(struct i40e_vf *vf, enum i40e_vsi_type type)
 "Could not add MAC filter %pM for VF %d\n",
vf->default_lan_addr.addr, vf->vf_id);
}
-   f = i40e_add_filter(vsi, brdcast,
-   vf->port_vlan_id ? vf->port_vlan_id : -1,
-   true, false);
-   if (!f)
-   dev_info(&pf->pdev->dev,
-"Could not allocate VF broadcast filter\n");
spin_unlock_bh(&vsi->mac_filter_list_lock);
}
 
-- 
2.5.5



[net-next 01/13] i40e: Add support for disabling all link and change bits needed for PHY interactions

2016-05-14 Thread Jeff Kirsher
From: Kevin Scott 

Add flag to tell firmware to disable link on all ports.

This patch changes the bits set for telling firmware the PHY needs
to be modified by driver.  Without this patch, the setting will only
set that mode for the current port on the device.  Because the
MDIO interface is common for the copper device, the command needs to
set the mode for all ports.

Change-ID: I8baa7da91d384291ac95b41ae1a516604f8eb67f
Signed-off-by: Kevin Scott 
Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h| 4 +++-
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h | 3 +++
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c| 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 2a6a5d3..01cc732 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -111,7 +111,9 @@
 #define I40E_OEM_VER_PATCH_MASK    0xff
 #define I40E_OEM_VER_BUILD_SHIFT   8
 #define I40E_OEM_VER_SHIFT 24
-#define I40E_PHY_DEBUG_PORT        BIT(4)
+#define I40E_PHY_DEBUG_ALL \
+   (I40E_AQ_PHY_DEBUG_DISABLE_LINK_FW | \
+   I40E_AQ_PHY_DEBUG_DISABLE_ALL_LINK_FW)
 
 /* The values in here are decimal coded as hex as is the case in the NVM map*/
 #define I40E_CURRENT_NVM_VERSION_HI 0x2
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
index eacbe74..11cf1a5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h
@@ -1833,7 +1833,10 @@ struct i40e_aqc_set_phy_debug {
 #define I40E_AQ_PHY_DEBUG_RESET_EXTERNAL_NONE  0x00
 #define I40E_AQ_PHY_DEBUG_RESET_EXTERNAL_HARD  0x01
 #define I40E_AQ_PHY_DEBUG_RESET_EXTERNAL_SOFT  0x02
+/* Disable link manageability on a single port */
 #define I40E_AQ_PHY_DEBUG_DISABLE_LINK_FW  0x10
+/* Disable link manageability on all ports */
+#define I40E_AQ_PHY_DEBUG_DISABLE_ALL_LINK_FW  0x20
u8  reserved[15];
 };
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 51a994d..6fa05c4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1880,7 +1880,7 @@ static int i40e_set_phys_id(struct net_device *netdev,
if (!(pf->flags & I40E_FLAG_HAVE_10GBASET_PHY)) {
pf->led_status = i40e_led_get(hw);
} else {
-   i40e_aq_set_phy_debug(hw, I40E_PHY_DEBUG_PORT, NULL);
+   i40e_aq_set_phy_debug(hw, I40E_PHY_DEBUG_ALL, NULL);
ret = i40e_led_get_phy(hw, &temp_status,
   &pf->phy_led_val);
pf->led_status = temp_status;
-- 
2.5.5



[net-next 03/13] i40e: Implement the API function for aq_set_switch_config

2016-05-14 Thread Jeff Kirsher
From: Shannon Nelson 

Add the support code for calling the AdminQ API call aq_set_switch_config

Signed-off-by: Shannon Nelson 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c| 29 
 drivers/net/ethernet/intel/i40e/i40e_prototype.h |  4 
 2 files changed, 33 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 4a934e1..4739a9c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -2283,6 +2283,35 @@ i40e_status i40e_aq_get_switch_config(struct i40e_hw *hw,
 }
 
 /**
+ * i40e_aq_set_switch_config
+ * @hw: pointer to the hardware structure
+ * @flags: bit flag values to set
+ * @valid_flags: which bit flags to set
+ * @cmd_details: pointer to command details structure or NULL
+ *
+ * Set switch configuration bits
+ **/
+enum i40e_status_code i40e_aq_set_switch_config(struct i40e_hw *hw,
+   u16 flags,
+   u16 valid_flags,
+   struct i40e_asq_cmd_details *cmd_details)
+{
+   struct i40e_aq_desc desc;
+   struct i40e_aqc_set_switch_config *scfg =
+   (struct i40e_aqc_set_switch_config *)&desc.params.raw;
+   enum i40e_status_code status;
+
+   i40e_fill_default_direct_cmd_desc(&desc,
+ i40e_aqc_opc_set_switch_config);
+   scfg->flags = cpu_to_le16(flags);
+   scfg->valid_flags = cpu_to_le16(valid_flags);
+
+   status = i40e_asq_send_command(hw, &desc, NULL, 0, cmd_details);
+
+   return status;
+}
+
+/**
  * i40e_aq_get_firmware_version
  * @hw: pointer to the hw struct
  * @fw_major_version: firmware major version
diff --git a/drivers/net/ethernet/intel/i40e/i40e_prototype.h b/drivers/net/ethernet/intel/i40e/i40e_prototype.h
index 4c8977c..b76b158 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_prototype.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_prototype.h
@@ -182,6 +182,10 @@ i40e_status i40e_aq_get_switch_config(struct i40e_hw *hw,
struct i40e_aqc_get_switch_config_resp *buf,
u16 buf_size, u16 *start_seid,
struct i40e_asq_cmd_details *cmd_details);
+enum i40e_status_code i40e_aq_set_switch_config(struct i40e_hw *hw,
+   u16 flags,
+   u16 valid_flags,
+   struct i40e_asq_cmd_details *cmd_details);
 i40e_status i40e_aq_request_resource(struct i40e_hw *hw,
enum i40e_aq_resources_ids resource,
enum i40e_aq_resource_access_type access,
-- 
2.5.5



[net-next 13/13] i40e: fix an uninitialized variable bug

2016-05-14 Thread Jeff Kirsher
From: Dan Carpenter 

We removed this initialization but it is required.  Let's put it back.

Fixes: 895106a577c4 ('i40e: trivial fixes')
Signed-off-by: Dan Carpenter 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_hmc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_hmc.c b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
index 5ebe12d..a7c7b1d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_hmc.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_hmc.c
@@ -49,7 +49,7 @@ i40e_status i40e_add_sd_table_entry(struct i40e_hw *hw,
struct i40e_hmc_sd_entry *sd_entry;
bool dma_mem_alloc_done = false;
struct i40e_dma_mem mem;
-   i40e_status ret_code;
+   i40e_status ret_code = I40E_SUCCESS;
u64 alloc_len;
 
if (NULL == hmc_info->sd_table.sd_entry) {
-- 
2.5.5
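
Why the initializer matters can be shown with a reduced sketch. The types and names below are hypothetical and only mimic the shape of i40e_add_sd_table_entry: the status variable is assigned on error paths only, so a path that jumps straight to the exit label relies on the initial value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes and table entry, for illustration only. */
enum status {
	STATUS_SUCCESS = 0,
	STATUS_ERR_INVALID_INDEX = -1,
};

struct sd_entry {
	int valid;
};

enum status add_entry(struct sd_entry *table, size_t count, size_t idx)
{
	/* Without this initializer the "already valid" path below would
	 * return an indeterminate value.
	 */
	enum status ret = STATUS_SUCCESS;

	assert(table != NULL);

	if (idx >= count) {
		ret = STATUS_ERR_INVALID_INDEX;
		goto exit;
	}

	/* Entry already present: nothing to do, ret is not touched. */
	if (table[idx].valid)
		goto exit;

	table[idx].valid = 1;
exit:
	return ret;
}
```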



[net-next 09/13] i40e: set context to use VSI RSS LUT for SR-IOV

2016-05-14 Thread Jeff Kirsher
From: Ashish Shah 

For the SR-IOV VSIs, when the queue filtering section is valid,
the RSS LUT needs to be set to use the VSI specific lookup table
(otherwise it will use the PF RSS LUT table).

Change-ID: Ia9377cc818078238a75c3bdeade1b593a91b3480
Signed-off-by: Ashish Shah 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f8038d0..a981246 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -9362,7 +9362,8 @@ static int i40e_add_vsi(struct i40e_vsi *vsi)
ctxt.info.valid_sections |=
cpu_to_le16(I40E_AQ_VSI_PROP_QUEUE_OPT_VALID);
ctxt.info.queueing_opt_flags |=
-   I40E_AQ_VSI_QUE_OPT_TCP_ENA;
+   (I40E_AQ_VSI_QUE_OPT_TCP_ENA |
+I40E_AQ_VSI_QUE_OPT_RSS_LUT_VSI);
}
 
ctxt.info.valid_sections |= cpu_to_le16(I40E_AQ_VSI_PROP_VLAN_VALID);
-- 
2.5.5



[net-next 12/13] i40e: Bump version from 1.5.10 to 1.5.16

2016-05-14 Thread Jeff Kirsher
From: Bimmy Pujari 

Signed-off-by: Bimmy Pujari 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index a981246..1cd0ebf 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -46,7 +46,7 @@ static const char i40e_driver_string[] =
 
 #define DRV_VERSION_MAJOR 1
 #define DRV_VERSION_MINOR 5
-#define DRV_VERSION_BUILD 10
+#define DRV_VERSION_BUILD 16
 #define DRV_VERSION __stringify(DRV_VERSION_MAJOR) "." \
 __stringify(DRV_VERSION_MINOR) "." \
 __stringify(DRV_VERSION_BUILD) DRV_KERN
-- 
2.5.5



[net-next 07/13] i40e: change Rx hang message into a WARN_ONCE

2016-05-14 Thread Jeff Kirsher
From: Jacob Keller 

Use WARN_ONCE in order to highlight the issue, but don't display
a warning every time. The user should be able to see the ethtool counter
we created if necessary to see how often it is occurring.

Change-ID: I40c4ea159819b64a7d33b7f5716749089791533a
Signed-off-by: Jacob Keller 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ptp.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ptp.c b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
index a1b878a..ed39cba 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ptp.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ptp.c
@@ -289,9 +289,7 @@ void i40e_ptp_rx_hang(struct i40e_vsi *vsi)
rd32(hw, I40E_PRTTSYN_RXTIME_H(3));
pf->last_rx_ptp_check = jiffies;
pf->rx_hwtstamp_cleared++;
-   dev_warn(&vsi->back->pdev->dev,
-"%s: clearing Rx timestamp hang\n",
-__func__);
+   WARN_ONCE(1, "Detected Rx timestamp register hang\n");
}
 }
 
-- 
2.5.5
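
The behaviour the commit message describes (count every event, print only once) can be approximated in userspace as follows. This is a sketch, not the kernel macro: the kernel's WARN_ONCE also dumps a backtrace, and the counter names here are hypothetical stand-ins for pf->rx_hwtstamp_cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

unsigned int rx_hang_events;	/* stands in for pf->rx_hwtstamp_cleared */
unsigned int rx_hang_warnings;	/* how many times the message was printed */

void report_rx_hang(void)
{
	static bool warned;	/* persists across calls, like WARN_ONCE's flag */

	rx_hang_events++;	/* the counter still tracks every occurrence */
	if (!warned) {
		warned = true;
		rx_hang_warnings++;
		fprintf(stderr, "Detected Rx timestamp register hang\n");
	}

	/* Invariant: the message has been printed exactly once by now. */
	assert(rx_hang_warnings == 1);
}
```

The event counter keeps the full history, so the frequency of the hang remains visible even though the log only shows the first occurrence.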



[net-next 10/13] i40e/i40evf: properly report Rx packet hash

2016-05-14 Thread Jeff Kirsher
From: Mitch Williams 

This logic is inverted. If the RXHASH flag is set, then we should go
ahead and call skb_set_hash.

Change-ID: Ib2e30356dced1d3e939c8061ab6ad5bd94197e7c
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index b0edffe..99a524d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1394,7 +1394,7 @@ static inline void i40e_rx_hash(struct i40e_ring *ring,
cpu_to_le64((u64)I40E_RX_DESC_FLTSTAT_RSS_HASH <<
I40E_RX_DESC_STATUS_FLTSTAT_SHIFT);
 
-   if (ring->netdev->features & NETIF_F_RXHASH)
+   if (!(ring->netdev->features & NETIF_F_RXHASH))
return;
 
if ((rx_desc->wb.qword1.status_error_len & rss_mask) == rss_mask) {
-- 
2.5.5
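
The corrected guard, reduced to a standalone sketch. FEATURE_RXHASH and fake_skb are hypothetical stand-ins for NETIF_F_RXHASH and struct sk_buff; the point is only the direction of the early return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FEATURE_RXHASH (1u << 0)	/* stands in for NETIF_F_RXHASH */

/* Hypothetical, heavily reduced packet buffer. */
struct fake_skb {
	uint32_t hash;
	bool hash_valid;
};

void rx_hash(uint32_t features, bool desc_has_hash, uint32_t desc_hash,
	     struct fake_skb *skb)
{
	assert(skb != NULL);

	/* Bail out when the feature is NOT enabled (the fixed condition). */
	if (!(features & FEATURE_RXHASH))
		return;

	/* Only report a hash the descriptor actually carries. */
	if (desc_has_hash) {
		skb->hash = desc_hash;
		skb->hash_valid = true;
	}
}
```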



[net-next 02/13] i40e: Add allmulti support for the VF

2016-05-14 Thread Jeff Kirsher
From: Anjali Singhai Jain 

This patch enables a feature to enable/disable all multicast
for a trusted VF.

Change-Id: I926eba7f8850c8d40f8ad7e08bbe4056bbd3985f
Signed-off-by: Anjali Singhai Jain 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40evf/i40evf.h  |  3 +++
 drivers/net/ethernet/intel/i40evf/i40evf_main.c | 15 ++-
 drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c | 15 +--
 3 files changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40evf/i40evf.h b/drivers/net/ethernet/intel/i40evf/i40evf.h
index fa044a9..76ed97d 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf.h
+++ b/drivers/net/ethernet/intel/i40evf/i40evf.h
@@ -215,6 +215,7 @@ struct i40evf_adapter {
 #define I40EVF_FLAG_OUTER_UDP_CSUM_CAPABLE BIT(12)
 #define I40EVF_FLAG_ADDR_SET_BY_PF BIT(13)
 #define I40EVF_FLAG_PROMISC_ON BIT(15)
+#define I40EVF_FLAG_ALLMULTI_ON BIT(16)
 /* duplicates for common code */
 #define I40E_FLAG_FDIR_ATR_ENABLED  0
 #define I40E_FLAG_DCB_ENABLED   0
@@ -241,6 +242,8 @@ struct i40evf_adapter {
 #define I40EVF_FLAG_AQ_SET_RSS_LUT BIT(14)
 #define I40EVF_FLAG_AQ_REQUEST_PROMISC BIT(15)
 #define I40EVF_FLAG_AQ_RELEASE_PROMISC BIT(16)
+#define I40EVF_FLAG_AQ_REQUEST_ALLMULTI BIT(17)
+#define I40EVF_FLAG_AQ_RELEASE_ALLMULTI BIT(18)
 
/* OS defined structs */
struct net_device *netdev;
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_main.c b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
index b548dbe..642bb45 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_main.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_main.c
@@ -934,6 +934,13 @@ bottom_of_search_loop:
 adapter->flags & I40EVF_FLAG_PROMISC_ON)
adapter->aq_required |= I40EVF_FLAG_AQ_RELEASE_PROMISC;
 
+   if (netdev->flags & IFF_ALLMULTI &&
+   !(adapter->flags & I40EVF_FLAG_ALLMULTI_ON))
+   adapter->aq_required |= I40EVF_FLAG_AQ_REQUEST_ALLMULTI;
+   else if (!(netdev->flags & IFF_ALLMULTI) &&
+adapter->flags & I40EVF_FLAG_ALLMULTI_ON)
+   adapter->aq_required |= I40EVF_FLAG_AQ_RELEASE_ALLMULTI;
+
clear_bit(__I40EVF_IN_CRITICAL_TASK, &adapter->crit_section);
 }
 
@@ -1612,7 +1619,13 @@ static void i40evf_watchdog_task(struct work_struct *work)
goto watchdog_done;
}
 
-   if (adapter->aq_required & I40EVF_FLAG_AQ_RELEASE_PROMISC) {
+   if (adapter->aq_required & I40EVF_FLAG_AQ_REQUEST_ALLMULTI) {
+   i40evf_set_promiscuous(adapter, I40E_FLAG_VF_MULTICAST_PROMISC);
+   goto watchdog_done;
+   }
+
+   if ((adapter->aq_required & I40EVF_FLAG_AQ_RELEASE_PROMISC) &&
+   (adapter->aq_required & I40EVF_FLAG_AQ_RELEASE_ALLMULTI)) {
i40evf_set_promiscuous(adapter, 0);
goto watchdog_done;
}
diff --git a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
index c5d33a2..f134456 100644
--- a/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
+++ b/drivers/net/ethernet/intel/i40evf/i40evf_virtchnl.c
@@ -641,6 +641,7 @@ void i40evf_del_vlans(struct i40evf_adapter *adapter)
 void i40evf_set_promiscuous(struct i40evf_adapter *adapter, int flags)
 {
struct i40e_virtchnl_promisc_info vpi;
+   int promisc_all;
 
if (adapter->current_op != I40E_VIRTCHNL_OP_UNKNOWN) {
/* bail because we already have a command pending */
@@ -649,11 +650,21 @@ void i40evf_set_promiscuous(struct i40evf_adapter *adapter, int flags)
return;
}
 
-   if (flags) {
+   promisc_all = I40E_FLAG_VF_UNICAST_PROMISC |
+ I40E_FLAG_VF_MULTICAST_PROMISC;
+   if ((flags & promisc_all) == promisc_all) {
adapter->flags |= I40EVF_FLAG_PROMISC_ON;
adapter->aq_required &= ~I40EVF_FLAG_AQ_REQUEST_PROMISC;
dev_info(&adapter->pdev->dev, "Entering promiscuous mode\n");
-   } else {
+   }
+
+   if (flags & I40E_FLAG_VF_MULTICAST_PROMISC) {
+   adapter->flags |= I40EVF_FLAG_ALLMULTI_ON;
+   adapter->aq_required &= ~I40EVF_FLAG_AQ_REQUEST_ALLMULTI;
+   dev_info(&adapter->pdev->dev, "Entering multicast promiscuous mode\n");
+   }
+
+   if (!flags) {
adapter->flags &= ~I40EVF_FLAG_PROMISC_ON;
adapter->aq_required &= ~I40EVF_FLAG_AQ_RELEASE_PROMISC;
dev_info(&adapter->pdev->dev, "Leaving promiscuous mode\n");
-- 
2.5.5
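
The request/release bookkeeping added to the set_rx_mode path can be read as a comparison between what the netdev wants and what the adapter currently has. A minimal sketch, assuming hypothetical bit names in place of IFF_ALLMULTI and the I40EVF_FLAG_* definitions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for IFF_ALLMULTI and the I40EVF_FLAG_* bits. */
#define WANT_ALLMULTI        (1u << 0)
#define STATE_ALLMULTI_ON    (1u << 0)
#define AQ_REQUEST_ALLMULTI  (1u << 0)
#define AQ_RELEASE_ALLMULTI  (1u << 1)

/* Queue an admin-queue operation only when wanted and actual state differ. */
void update_aq_required(uint32_t netdev_flags, uint32_t adapter_flags,
			uint32_t *aq_required)
{
	int wanted = (netdev_flags & WANT_ALLMULTI) != 0;
	int active = (adapter_flags & STATE_ALLMULTI_ON) != 0;

	assert(aq_required != NULL);

	if (wanted && !active)
		*aq_required |= AQ_REQUEST_ALLMULTI;
	else if (!wanted && active)
		*aq_required |= AQ_RELEASE_ALLMULTI;
}
```

When the two already agree nothing is queued, so repeated calls from the rx-mode callback do not generate redundant requests to the PF.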



[net-next 06/13] i40e: Refactor ethtool get_settings

2016-05-14 Thread Jeff Kirsher
From: Catherine Sullivan 

Previously we were only looking at the FW supported PHY types if link
was down, because we want to be more specific when link is up. This
refactor changes this. When link is down, we still rely on the FW
supported PHY types, but when link is up, we select the possible
supported link modes from what we know about the current PHY type, and
AND that with the FW supported PHY types.

Change-ID: Ice5dad83f2a17932b0b8b59f07439696ad6aa013
Signed-off-by: Catherine Sullivan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 258 +
 1 file changed, 135 insertions(+), 123 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 52b58e3..5e8d84f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -262,6 +262,110 @@ static void i40e_partition_setting_complaint(struct i40e_pf *pf)
 }
 
 /**
+ * i40e_phy_type_to_ethtool - convert the phy_types to ethtool link modes
+ * @phy_types: PHY types to convert
+ * @supported: pointer to the ethtool supported variable to fill in
+ * @advertising: pointer to the ethtool advertising variable to fill in
+ *
+ **/
+static void i40e_phy_type_to_ethtool(struct i40e_pf *pf, u32 *supported,
+u32 *advertising)
+{
+   enum i40e_aq_capabilities_phy_type phy_types = pf->hw.phy.phy_types;
+
+   *supported = 0x0;
+   *advertising = 0x0;
+
+   if (phy_types & I40E_CAP_PHY_TYPE_SGMII) {
+   *supported |= SUPPORTED_Autoneg |
+ SUPPORTED_1000baseT_Full;
+   *advertising |= ADVERTISED_Autoneg |
+   ADVERTISED_1000baseT_Full;
+   if (pf->flags & I40E_FLAG_100M_SGMII_CAPABLE) {
+   *supported |= SUPPORTED_100baseT_Full;
+   *advertising |= ADVERTISED_100baseT_Full;
+   }
+   }
+   if (phy_types & I40E_CAP_PHY_TYPE_XAUI ||
+   phy_types & I40E_CAP_PHY_TYPE_XFI ||
+   phy_types & I40E_CAP_PHY_TYPE_SFI ||
+   phy_types & I40E_CAP_PHY_TYPE_10GBASE_SFPP_CU ||
+   phy_types & I40E_CAP_PHY_TYPE_10GBASE_AOC)
+   *supported |= SUPPORTED_10000baseT_Full;
+   if (phy_types & I40E_CAP_PHY_TYPE_10GBASE_CR1_CU ||
+   phy_types & I40E_CAP_PHY_TYPE_10GBASE_CR1 ||
+   phy_types & I40E_CAP_PHY_TYPE_10GBASE_T ||
+   phy_types & I40E_CAP_PHY_TYPE_10GBASE_SR ||
+   phy_types & I40E_CAP_PHY_TYPE_10GBASE_LR) {
+   *supported |= SUPPORTED_Autoneg |
+ SUPPORTED_10000baseT_Full;
+   *advertising |= ADVERTISED_Autoneg |
+   ADVERTISED_10000baseT_Full;
+   }
+   if (phy_types & I40E_CAP_PHY_TYPE_XLAUI ||
+   phy_types & I40E_CAP_PHY_TYPE_XLPPI ||
+   phy_types & I40E_CAP_PHY_TYPE_40GBASE_AOC)
+   *supported |= SUPPORTED_40000baseCR4_Full;
+   if (phy_types & I40E_CAP_PHY_TYPE_40GBASE_CR4_CU ||
+   phy_types & I40E_CAP_PHY_TYPE_40GBASE_CR4) {
+   *supported |= SUPPORTED_Autoneg |
+ SUPPORTED_40000baseCR4_Full;
+   *advertising |= ADVERTISED_Autoneg |
+   ADVERTISED_40000baseCR4_Full;
+   }
+   if ((phy_types & I40E_CAP_PHY_TYPE_100BASE_TX) &&
+   !(phy_types & I40E_CAP_PHY_TYPE_1000BASE_T)) {
+   *supported |= SUPPORTED_Autoneg |
+ SUPPORTED_100baseT_Full;
+   *advertising |= ADVERTISED_Autoneg |
+   ADVERTISED_100baseT_Full;
+   }
+   if (phy_types & I40E_CAP_PHY_TYPE_1000BASE_T ||
+   phy_types & I40E_CAP_PHY_TYPE_1000BASE_SX ||
+   phy_types & I40E_CAP_PHY_TYPE_1000BASE_LX ||
+   phy_types & I40E_CAP_PHY_TYPE_1000BASE_T_OPTICAL) {
+   *supported |= SUPPORTED_Autoneg |
+ SUPPORTED_1000baseT_Full;
+   *advertising |= ADVERTISED_Autoneg |
+   ADVERTISED_1000baseT_Full;
+   }
+   if (phy_types & I40E_CAP_PHY_TYPE_40GBASE_SR4)
+   *supported |= SUPPORTED_40000baseSR4_Full;
+   if (phy_types & I40E_CAP_PHY_TYPE_40GBASE_LR4)
+   *supported |= SUPPORTED_40000baseLR4_Full;
+   if (phy_types & I40E_CAP_PHY_TYPE_40GBASE_KR4) {
+   *supported |= SUPPORTED_40000baseKR4_Full |
+ SUPPORTED_Autoneg;
+   *advertising |= ADVERTISED_40000baseKR4_Full |
+   ADVERTISED_Autoneg;
+   }
+   if (phy_types & I40E_CAP_PHY_TYPE_20GBASE_KR2) {
+   *supported |= SUPPORTED_20000baseKR2_Full |
+ SUPPORTED_Autoneg;
+   *advertising |= ADVERTISED_2bas

[net-next 08/13] i40e: Correct UDP packet header for non_tunnel-ipv6

2016-05-14 Thread Jeff Kirsher
From: Akeem G Abodunrin 

This patch corrects Rx ptype payload layer for non_tunneled ipv6. It
should be layer 4 for UDP, instead of layer 3.

Change-ID: I9382e4458ab3c4e58f6d2e9f195d5d4ee513805e
Signed-off-by: Akeem G Abodunrin 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 27c6f9d..422b41d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -696,7 +696,7 @@ struct i40e_rx_ptype_decoded i40e_ptype_lookup[] = {
/* Non Tunneled IPv6 */
I40E_PTT(88, IP, IPV6, FRG, NONE, NONE, NOF, NONE, PAY3),
I40E_PTT(89, IP, IPV6, NOF, NONE, NONE, NOF, NONE, PAY3),
-   I40E_PTT(90, IP, IPV6, NOF, NONE, NONE, NOF, UDP,  PAY3),
+   I40E_PTT(90, IP, IPV6, NOF, NONE, NONE, NOF, UDP,  PAY4),
I40E_PTT_UNUSED_ENTRY(91),
I40E_PTT(92, IP, IPV6, NOF, NONE, NONE, NOF, TCP,  PAY4),
I40E_PTT(93, IP, IPV6, NOF, NONE, NONE, NOF, SCTP, PAY4),
-- 
2.5.5



[net-next 05/13] i40e: lie to the VF

2016-05-14 Thread Jeff Kirsher
From: Mitch Williams 

If an untrusted VF attempts to configure promiscuous mode, log a message
pointing out its naughty behavior. But then, instead of returning an
error to the offender, just lie to it and say everything's OK. It will
continue on its way, thinking it's in promiscuous mode, but receiving no
packets except its own.

Change-ID: I63369215b1720f3c531eedfc06af86ff8c0e3dc8
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index 6430933..9473429 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -1474,12 +1474,16 @@ static int i40e_vc_config_promiscuous_mode_msg(struct i40e_vf *vf,
 
vsi = i40e_find_vsi_from_id(pf, info->vsi_id);
if (!test_bit(I40E_VF_STAT_ACTIVE, &vf->vf_states) ||
-   !test_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps) ||
!i40e_vc_isvalid_vsi_id(vf, info->vsi_id)) {
+   aq_ret = I40E_ERR_PARAM;
+   goto error_param;
+   }
+   if (!test_bit(I40E_VIRTCHNL_VF_CAP_PRIVILEGE, &vf->vf_caps)) {
dev_err(&pf->pdev->dev,
-   "VF %d doesn't meet requirements to enter promiscuous mode\n",
+   "Unprivileged VF %d is attempting to configure promiscuous mode\n",
vf->vf_id);
-   aq_ret = I40E_ERR_PARAM;
+   /* Lie to the VF on purpose. */
+   aq_ret = 0;
goto error_param;
}
/* Multicast promiscuous handling*/
-- 
2.5.5
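
The split between "reject the request" and "pretend it worked" can be sketched as below. Everything here is hypothetical (fake_vf, ERR_PARAM, config_promisc); it only mirrors the control flow of the patch: malformed requests still get an error, while an unprivileged VF gets a success code and no state change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ERR_PARAM (-5)	/* hypothetical stand-in for I40E_ERR_PARAM */

/* Hypothetical, reduced VF state. */
struct fake_vf {
	bool active;
	bool privileged;
	bool promisc_enabled;
};

int config_promisc(struct fake_vf *vf, bool valid_vsi, bool enable)
{
	assert(vf != NULL);

	/* Malformed or out-of-state request: report the error. */
	if (!vf->active || !valid_vsi)
		return ERR_PARAM;

	/* Unprivileged VF: report success, but change nothing. */
	if (!vf->privileged)
		return 0;

	vf->promisc_enabled = enable;
	return 0;
}
```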



[net-next 04/13] i40e: Add vf-true-promisc-support priv flag

2016-05-14 Thread Jeff Kirsher
From: Anjali Singhai Jain 

This patch adds priv-flag knob to configure global true promisc
support. With this patch the user can decide the flavor of
promiscuous that the VFs will see when promiscuous mode is enabled
on the interface. Since this is a global setting for the whole device,
the priv-flag is exposed only on the first PF of the device.

By default, true promisc support is off, which means the promisc
mode for the VF will be limited/defport mode.

For the PF, we still will be in limited promisc unless in MFP mode
irrespective of the flavor picked through this knob.

Usage:
On PF0
ethtool --show-priv-flags p261p1
Private flags for p261p1:
MFP: off
LinkPolling: off
flow-director-atr  : on
veb-stats  : off
hw-atr-eviction: off
vf-true-promisc-support: off

to enable setting true promisc
ethtool --set-priv-flags p261p1 vf-true-promisc-support on

At this point if the VF is set to trust and promisc is enabled
on the VF through
ip link set ... promisc on
The VF/VFs will be able to see ALL ingress traffic

Change-Id: I8fac4b6eb1af9ca77b5376b79c50bdce5055bd94
Signed-off-by: Anjali Singhai Jain 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h | 12 ++--
 drivers/net/ethernet/intel/i40e/i40e_common.c  |  9 ++-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 72 +++---
 drivers/net/ethernet/intel/i40e/i40e_main.c| 30 -
 drivers/net/ethernet/intel/i40e/i40e_prototype.h   |  3 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  3 +-
 6 files changed, 111 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 01cc732..9c44739 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -97,11 +97,12 @@
 #define I40E_INT_NAME_STR_LEN (IFNAMSIZ + 16)
 
 /* Ethtool Private Flags */
-#define I40E_PRIV_FLAGS_NPAR_FLAG  BIT(0)
-#define I40E_PRIV_FLAGS_LINKPOLL_FLAG  BIT(1)
-#define I40E_PRIV_FLAGS_FD_ATR BIT(2)
-#define I40E_PRIV_FLAGS_VEB_STATS  BIT(3)
-#define I40E_PRIV_FLAGS_HW_ATR_EVICT   BIT(5)
+#define I40E_PRIV_FLAGS_MFP_FLAG       BIT(0)
+#define I40E_PRIV_FLAGS_LINKPOLL_FLAG  BIT(1)
+#define I40E_PRIV_FLAGS_FD_ATR BIT(2)
+#define I40E_PRIV_FLAGS_VEB_STATS  BIT(3)
+#define I40E_PRIV_FLAGS_HW_ATR_EVICT   BIT(4)
+#define I40E_PRIV_FLAGS_TRUE_PROMISC_SUPPORT   BIT(5)
 
 #define I40E_NVM_VERSION_LO_SHIFT  0
 #define I40E_NVM_VERSION_LO_MASK   (0xff << I40E_NVM_VERSION_LO_SHIFT)
@@ -358,6 +359,7 @@ struct i40e_pf {
 #define I40E_FLAG_STOP_FW_LLDP BIT_ULL(47)
 #define I40E_FLAG_HAVE_10GBASET_PHY	BIT_ULL(48)
 #define I40E_FLAG_PF_MAC   BIT_ULL(50)
+#define I40E_FLAG_TRUE_PROMISC_SUPPORT BIT_ULL(51)
 
/* tracks features that get auto disabled by errors */
u64 auto_disable_flags;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c 
b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 4739a9c..27c6f9d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1972,10 +1972,12 @@ aq_add_vsi_exit:
  * @seid: vsi number
  * @set: set unicast promiscuous enable/disable
  * @cmd_details: pointer to command details structure or NULL
+ * @rx_only_promisc: flag to decide if egress traffic gets mirrored in promisc
  **/
 i40e_status i40e_aq_set_vsi_unicast_promiscuous(struct i40e_hw *hw,
u16 seid, bool set,
-   struct i40e_asq_cmd_details *cmd_details)
+   struct i40e_asq_cmd_details *cmd_details,
+   bool rx_only_promisc)
 {
struct i40e_aq_desc desc;
struct i40e_aqc_set_vsi_promiscuous_modes *cmd =
@@ -1988,8 +1990,9 @@ i40e_status i40e_aq_set_vsi_unicast_promiscuous(struct 
i40e_hw *hw,
 
if (set) {
flags |= I40E_AQC_SET_VSI_PROMISC_UNICAST;
-   if (((hw->aq.api_maj_ver == 1) && (hw->aq.api_min_ver >= 5)) ||
-   (hw->aq.api_maj_ver > 1))
+   if (rx_only_promisc &&
+   (((hw->aq.api_maj_ver == 1) && (hw->aq.api_min_ver >= 5)) ||
+(hw->aq.api_maj_ver > 1)))
flags |= I40E_AQC_SET_VSI_PROMISC_TX;
}
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 6fa05c4..52b58e3 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -230,6 +230,17 @@ static const char i40e_gstrings_test[][ETH_GSTRING_LEN] = {
 
 #define I40E_TEST_LEN (sizeof(i40e_gstrings_test) / ETH_GSTRING_LEN)
 
+static const char i40e_priv_flags_strings_gl[][ETH_GSTRING_LEN] = {
+   "M

[net-next 00/13][pull request] 40GbE Intel Wired LAN Driver Updates 2016-05-14

2016-05-14 Thread Jeff Kirsher
This series contains updates to i40e and i40evf.

Kevin adds support to disable link on all ports and changes bits set
for telling firmware the PHY needs to be modified by the driver.

Anjali adds a feature to enable/disable all multicast for a trusted
VF.  Added priv-flag knob to configure global true promiscuous
support.

Shannon adds the support code for calling the admin queue API call
aq_set_switch_config().

Mitch changes the driver to log a message if an untrusted VF attempts to
configure promiscuous mode, but to lie to the VF and report that
everything is ok instead of returning an error.  He also corrects the
logic for reporting the receive packet hash, and stops adding a
broadcast filter for VFs: all VSIs are configured to receive broadcasts
by default, so no filter is needed.

Catherine refactors the ethtool get_settings to report the possible
supported link modes from what we know about the current PHY type and
that with the firmware supported PHY types.

Jacob changes the driver to use WARN_ONCE in order to highlight the
issue without displaying a warning every time a receive hang message
is received.

Akeem corrects the receive ptype payload layer for non_tunneled IPv6
UDP, which should be layer 4 instead of layer 3.

Dan Carpenter fixes an uninitialized variable bug.

The following are changes since commit 8ea658cea453e3deede3851b58113112ae1dd9cb:
  Merge branch '1GbE' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
and are available in the git repository at:
  git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue 40GbE

Akeem G Abodunrin (1):
  i40e: Correct UDP packet header for non_tunnel-ipv6

Anjali Singhai Jain (2):
  i40e: Add allmulti support for the VF
  i40e: Add vf-true-promisc-support priv flag

Ashish Shah (1):
  i40e: set context to use VSI RSS LUT for SR-IOV

Bimmy Pujari (1):
  i40e: Bump version from 1.5.10 to 1.5.16

Catherine Sullivan (1):
  i40e: Refactor ethtool get_settings

Dan Carpenter (1):
  i40e: fix an uninitialized variable bug

Jacob Keller (1):
  i40e: change Rx hang message into a WARN_ONCE

Kevin Scott (1):
  i40e: Add support for disabling all link and change bits needed for
PHY interactions

Mitch Williams (3):
  i40e: lie to the VF
  i40e/i40evf: properly report Rx packet hash
  i40e: don't add broadcast filter for VFs

Shannon Nelson (1):
  i40e: Implement the API function for aq_set_switch_config

 drivers/net/ethernet/intel/i40e/i40e.h |  16 +-
 drivers/net/ethernet/intel/i40e/i40e_adminq_cmd.h  |   3 +
 drivers/net/ethernet/intel/i40e/i40e_common.c  |  40 ++-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 332 +
 drivers/net/ethernet/intel/i40e/i40e_hmc.c |   2 +-
 drivers/net/ethernet/intel/i40e/i40e_main.c|  35 ++-
 drivers/net/ethernet/intel/i40e/i40e_prototype.h   |   7 +-
 drivers/net/ethernet/intel/i40e/i40e_ptp.c |   4 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c|   2 +-
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c |  21 +-
 drivers/net/ethernet/intel/i40evf/i40evf.h |   3 +
 drivers/net/ethernet/intel/i40evf/i40evf_main.c|  15 +-
 .../net/ethernet/intel/i40evf/i40evf_virtchnl.c|  15 +-
 13 files changed, 330 insertions(+), 165 deletions(-)

-- 
2.5.5



Re: [patch net-next 1/4] netdevice: add SW statistics ndo

2016-05-14 Thread Roopa Prabhu
On 5/14/16, 11:46 AM, Jiri Pirko wrote:
> Sat, May 14, 2016 at 05:47:41PM CEST, ro...@cumulusnetworks.com wrote:
>> On 5/14/16, 5:49 AM, Jiri Pirko wrote:
>>> Fri, May 13, 2016 at 08:47:48PM CEST, ro...@cumulusnetworks.com wrote:

[snip]
  Jiri Pirko 
 ---

>> To me netdev stats is  combined 'SW + HW' stats for that netdev.
>> ndo_get_stats64 callback into the drivers does the magic of adding HW 
>> stats
>> to SW (netdev) stats and returning (see enic_get_stats). HW stats is 
>> available for netdevs
>> that are offloaded or are backed by hardware. SW stats is the stats that 
>> the driver maintains
>> (logical or physical). HW stats is queried and added to the SW stats.
> I'm not sure I follow. HW stats already contain SW stats. Because on
> slow path every packet that is not offloaded and goes through kernel is
> counted into HW stats as well (because it goes through HW port). 
 yes, correct... we don't want to double count those. But since these stats 
 are
 generally queried from hw, I am calling them HW stats.
 you will not really maintain a software counter for this. But, the driver 
 can maintain its own
 counters for rx and tx errors etc and I call these SW stats. They are 
 counted at the driver.

> If you
> do HW stats + SW stats, what you get makes no sense. Am I missing 
> something?
 If you go by my definition of HW and SW stats above, on a 
 ndo_get_stats64() call,
 you will add the SW counters + HW counters and return. In my definition, 
 the pkts
 that was rx'ed or tx'ed successfully are always in the HW count.

> Btw, looking at enic_get_stats, looks exactly what we introduce for
> mlxsw in this patchset.
 In enic_get_stats, the ones counted in software are the ones taken from 
 'enic->'
 net_stats->rx_over_errors = enic->rq_truncated_pkts;
 net_stats->rx_crc_errors = enic->rq_bad_fcs;

> With this patchset, we only allow user to see the actual stats for
> slow-path aka SW stats.
 hmm...ok. But i am not sure how many will use this new attribute.
 When you do 'ip -s link show' you really want all counters on that port
 hardware or software does not matter at that point.

 My suggestion to move this to ethtool like attribute is because that is an 
 existing
 way to break down your stats which ever way you want. And the best part is 
 it can be
 customized (say rx_pkts_cpu_saw)
>>> I believe that ethtool is really not a place to expose sw stats. Does
>>> not make sense.
>> 2 things:
>> - i was surprised you don't want your ndo_get_stats64 to be a unified view 
>> of HW and SW stats
> Roopa, please, look at the patch 4/4. That is exactly what we are doing.
> We expose HW stats via ndo_get_stats64 and that is of course including
> whatever comes through slowpath (non-forwarded in HW).

Maybe i missed it but i did not think it included any rx or tx err counters 
counted solely
by the driver.
>
>
>> - by bringing up ethtool like stats (IFLA_STATS_LINK_HW_EXTENDED) I am just 
>> saying
>> it has always been a way to breakdown stats. If you don't want to show 
>> explicit SW stats there,
>> there is always a way to show HW only stats and now you know the delta 
>> between the unified stats
>> and the HW only stats is your SW stats.
> I think we don't understand each other. HW stats always include SW
> stats. Because whatever goes in or out goes through HW. Therefore, the
> "unified stats" you mention are exactly HW stats.
>
> This is fine, Patch 4/4 would do to make this correct. However, I think
> it has value for user to know what went via slowpath (non-forwarded in HW).
> And that is exactly exposed by the SW stats we try to add.
>
> Is that confusing?

It's not confusing. I understand what you are doing.
The only point I was making was that most drivers have unified stats via ndo
and there are also hw stats via ethtool like api (which will also be part of 
the stats
api in the future). And sw only stats can be derived from that...which is the 
way most
people do today.
But that's fine. If you think it will be useful/easier to have a new 
api/attribute
for software only stats for some drivers, sure, fine. Let's move on.
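
The derivation described above (software-only stats as the delta between
the unified stats and the HW-only stats) can be sketched roughly as
follows. This is an illustration only: struct pkt_counters and both
function names are hypothetical, and clamping at zero is an assumption
made here because the two snapshots would not be read atomically.

```c
#include <stdint.h>

/* Hypothetical counter pair; field names are illustrative only. */
struct pkt_counters {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/*
 * "Unified minus HW-only" for a single counter.  Clamp at zero
 * (assumption): the HW-only value may have been sampled slightly later
 * and run ahead of the unified one.
 */
uint64_t counter_delta(uint64_t unified, uint64_t hw_only)
{
	return unified > hw_only ? unified - hw_only : 0;
}

/* Derive the software-only share for every counter in the pair. */
struct pkt_counters sw_only_stats(struct pkt_counters unified,
				  struct pkt_counters hw_only)
{
	struct pkt_counters sw = {
		.rx_packets = counter_delta(unified.rx_packets,
					    hw_only.rx_packets),
		.tx_packets = counter_delta(unified.tx_packets,
					    hw_only.tx_packets),
	};

	return sw;
}
```

Note that this only works under the definition where HW stats exclude
what the driver counts on its own; under the other definition discussed
in this thread (HW stats already include slow-path traffic) the delta
would always be zero.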




Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding

2016-05-14 Thread Eric W. Biederman
Florian Westphal  writes:

> Eric W. Biederman  wrote:
>> Florian Westphal  writes:
>> 
>> > Eric W. Biederman  wrote:
>> >> Florian could you test and verify this patch fixes your issues?
>> >
>> > Yes, this seems to work.
>> >
>> > Pablo, I'm fine with this patch going into -nf/stable but I do not think
>> > making the pointers per netns is a desirable option in the long term.
>> >
>> >> Unlike the other possibilities that have been discussed this also
>> >> addresses the nf_queue path as well as the nf_queue_hook_drop path.
>> >
>> > The nf_queue path should have been fine, no?
>> >
>> > Or putting it differently: can we start processing skbs before a netns
>> > is fully initialized?
>> 
>> The practical case that worries me is what happens when someone does
>> "rmmod nfnetlink_queue" while the system is running.  It appears to me
>> that today we could free the per netns data during the rcu grace period
>> and cause a similar issue in nfnl_queue_pernet.
>>
>> That looks like it could affect both the nf_queue path and the
>> nf_queue_nf_hook_drop path.
>
> OK, I'll check this again but I seem to recall this was fine (the
> nfqueue module exit path sets the handler to NULL before doing anything
> else).

Good point.

Yes, the nfnetlink_queue module calls nf_unregister_queue_handler()
in the module fini method before it does anything else.  That does
set queue_handler to NULL and calls synchronize_rcu() before anything
else.

So in practice that is not a problem, but being disconnected from
everything else that is not immediately apparent.  Sigh.

> The normal netns exit path should be fine too as exit and free happens
> in two distinct loops, i.e. while (without your change) we can have
> calls to nf_queue_hook_drop after the nfqueue netns exit function was
> called, these calls will always happen before the pernets data is
> freed.

Ouch.  That is a little scary.  Today we only have remove_proc_entry
in nfnl_queue_net_exit.  If we had something more substantial, those
calls after .exit (without my change) could get us into nasty-to-find
oopses.  So I guess I did prevent a possible future issue with my patch.


I am half wondering if we could make everything simpler by simply not
allowing nfnetlink_queue to be a module.

Eric


Re: BUG: net/tipc: NULL-ptr dereference in tipc_nl_publ_dump

2016-05-14 Thread Baozeng Ding



On 2016/5/15 1:13, Eric Dumazet wrote:

On Sat, 2016-05-14 at 23:22 +0800, Baozeng Ding wrote:

Hello all,
The following program triggers NULL-ptr dereference in
tipc_nl_publ_dump. The kernel version is 4.6.0-rc7+ (on May 13 commit
1410b74e4061e05a5d2bffb1f99829efce27c8a9). Thanks.
--
netlink: 1 bytes leftover after parsing attributes in process
`syz-executor'.
kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory
accessgeneral protection fault:  [#1] SMP KASAN
Modules linked in:
CPU: 2 PID: 1346 Comm: syz-executor Not tainted 4.6.0-rc7+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
task: 88001eb1dd40 ti: 88001bd98000 task.ti: 88001bd98000
RIP: 0010:[]  []
tipc_nl_publ_dump+0xa39/0xdf0
RSP: 0018:88001bd9f428  EFLAGS: 00010246
RAX: dc00 RBX: 88003562efc0 RCX: c900012c7000
RDX:  RSI: 880036215d98 RDI: 8800196fda98
RBP: 88001bd9f678 R08: 0001 R09: 
R10: ed00032dfb5a R11: 11131255 R12: 
R13: 88002d0f8040 R14:  R15: 88002ea220a8
FS:  7f0b7c70f700() GS:88003620() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 20b5d7f2 CR3: 301fe000 CR4: 06e0
Stack:
    88002ea22100 88002ea220f8 88002ea220f0
   1bd9f520 1100037b3e92 88002ea220b0 88001bd9f498
   815bcc6e 880036223e40 88002fd60008 
Call Trace:
   [] genl_lock_dumpit+0x68/0x90
net/netlink/genetlink.c:517
   [] netlink_dump+0x36a/0xa40
net/netlink/af_netlink.c:2108
   [] __netlink_dump_start+0x4e9/0x760
net/netlink/af_netlink.c:2196
   [] genl_family_rcv_msg+0xa91/0xc30
net/netlink/genetlink.c:584
   [] genl_rcv_msg+0x1ab/0x260 net/netlink/genetlink.c:658
   [] netlink_rcv_skb+0x29c/0x390
net/netlink/af_netlink.c:2277
   [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669
   [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
   [] netlink_unicast+0x5a2/0x890
net/netlink/af_netlink.c:1240
   [] netlink_sendmsg+0x981/0xcb0
net/netlink/af_netlink.c:1786
   [< inline >] sock_sendmsg_nosec net/socket.c:612
   [] sock_sendmsg+0xca/0x110 net/socket.c:622
   [] ___sys_sendmsg+0x728/0x860 net/socket.c:1946
   [] __sys_sendmsg+0xd1/0x170 net/socket.c:1980
   [< inline >] SYSC_sendmsg net/socket.c:1991
   [] SyS_sendmsg+0x2d/0x50 net/socket.c:1987
   [] entry_SYSCALL_64_fastpath+0x23/0xc1
arch/x86/entry/entry_64.S:207
Code: df 49 8d 7e 10 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 df 01 00 00
4d 8b 76 10 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <0f> b6
14 02 4c 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85
RIP  [] tipc_nl_publ_dump+0xa39/0xdf0
net/tipc/socket.c:2810
   RSP 
---[ end trace e8355fded2057a4f ]---

Probable fix :

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 3eeb50a27b89..5f80d3fa9c85 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2807,6 +2807,9 @@ int tipc_nl_publ_dump(struct sk_buff *skb, struct 
netlink_callback *cb)
if (err)
return err;
  
+   if (!attrs[TIPC_NLA_SOCK])
+   return -EINVAL;
+
err = nla_parse_nested(sock, TIPC_NLA_SOCK_MAX,
   attrs[TIPC_NLA_SOCK],
   tipc_nl_sock_policy);


Yes. I tested with the patch. It works. Thanks.



Re: What ixgbe devices support HWTSTAMP_FILTER_ALL for hardware time stamping?

2016-05-14 Thread Guy Harris
On May 14, 2016, at 1:26 PM, Richard Cochran  wrote:

> On Sat, May 14, 2016 at 11:47:22AM -0700, Guy Harris wrote:
>> So if you have a GUI application for packet capture, with a combo box to 
>> select the type of time stamping, should it:
>> 
>>  1) regardless of whether ETHTOOL_GET_TS_INFO is available, open the 
>> adapter, try each of the time stamp types to see whether it works, and show 
>> a combo box based on that;
>> 
>>  2) use ETHTOOL_GET_TS_INFO if available;
>> 
>>  3) offer all possibilities regardless of whether they work with the 
>> adapter or not, and just report an error for possibilities that don't work?
>> 
>> My preference is 2) - which is the main reason why libpcap offers "what 
>> possibilities are available?" APIs, not just "request this possibility" APIs.
> 
> You are going to have to implement #1 in any case, if you want your
> program to work on all kernels.

What libpcap currently implements is a combination of #2 and #3, where:

if it's compiled with headers that define ETHTOOL_GET_TS_INFO, it tries 
to do ETHTOOL_GET_TS_INFO and, if that fails with EOPNOTSUPP or EINVAL, it 
offers all possibilities;

if it's compiled with headers that don't define it, it just offers all 
possibilities.

It could do a combination of #2 and #1, where "offers all possibilities" is 
replaced by "opens the adapter, tries each of the possibilities, and offers the 
ones that don't fail" - but, other than the current bugs with 
ETHTOOL_GET_TS_INFO, I don't see any advantage to doing only #1, rather than 
trying #2, perhaps with some special-casing to work around the bugs in 
question, and only falling back on actually trying to set the options if we 
can't ask about them.
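
The "#2 with a fallback to #3" behaviour described above can be sketched
as a small decision function. This is not libpcap's actual code: the enum,
the function name and its parameters are hypothetical, and treating any
other errno as a hard failure is an assumption, since the message above
only covers EOPNOTSUPP and EINVAL.

```c
#include <errno.h>
#include <stdbool.h>

enum ts_offer {
	TS_OFFER_REPORTED,	/* offer only what the driver reported */
	TS_OFFER_ALL,		/* cannot ask: offer every time stamp type */
	TS_OFFER_FAIL,		/* unexpected error (assumption) */
};

/*
 * have_query:  true if the build headers define ETHTOOL_GET_TS_INFO.
 * query_rc:    result of the query, 0 on success and -1 on failure.
 * query_errno: errno captured right after a failed query.
 */
enum ts_offer choose_ts_offer(bool have_query, int query_rc, int query_errno)
{
	if (!have_query)
		return TS_OFFER_ALL;
	if (query_rc == 0)
		return TS_OFFER_REPORTED;
	if (query_errno == EOPNOTSUPP || query_errno == EINVAL)
		return TS_OFFER_ALL;
	return TS_OFFER_FAIL;
}
```

Switching to the "#2 then #1" combination would only change what
TS_OFFER_ALL means to the caller: instead of listing everything, it would
probe each time stamp type on the opened adapter and list the ones that
do not fail.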


Re: [PATCH net-next 2/9] bnxt_en: Add Support for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPRO

2016-05-14 Thread Ben Hutchings
On Sat, 2016-05-14 at 20:29 -0400, Michael Chan wrote:
> From: Ajit Khaparde 
[...]
> + /* Read A2 portion of the EEPROM */
> + if (length) {
> + start -= ETH_MODULE_SFF_8436_LEN;
> + bnxt_read_sfp_module_eeprom_info(bp, I2C_DEV_ADDR_A2, 1, start,
> +  length, data + start);

The output address calculation (data + start) makes no sense at all.
If eeprom->offset < ETH_MODULE_SFF_8436_LEN then start == 0 here and
this read overwrites earlier data in the output buffer.  If
eeprom->offset > ETH_MODULE_SFF_8436_LEN then start > 0 here and this
overruns the output buffer.

I think that 'data' should be incremented along with 'start' in the
previous if-block.
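
To illustrate the point, here is a minimal stand-alone sketch of splitting
one read across the two pages, where the output position advances by the
number of bytes already copied rather than by the device offset. It is not
the driver's code: struct eeprom_seg and split_eeprom_read() are
hypothetical names, and the page length is passed in as a plain parameter.

```c
#include <stddef.h>

struct eeprom_seg {
	size_t dev_off;	/* offset within the page being read */
	size_t len;	/* bytes to read from that page */
	size_t buf_off;	/* where those bytes land in the output buffer */
};

/*
 * Split a read of `len` bytes at flat offset `offset` across two
 * consecutive pages of `page_len` bytes each (A0 first, then A2).
 * No bounds check is done against the end of the second page.
 */
void split_eeprom_read(size_t offset, size_t len, size_t page_len,
		       struct eeprom_seg *a0, struct eeprom_seg *a2)
{
	size_t copied = 0;

	a0->dev_off = 0;
	a0->len = 0;
	a0->buf_off = 0;
	a2->dev_off = 0;
	a2->len = 0;
	a2->buf_off = 0;

	if (offset < page_len) {
		size_t n = page_len - offset;

		if (n > len)
			n = len;
		a0->dev_off = offset;
		a0->len = n;
		copied = n;
		offset += n;
	}

	if (copied < len) {
		a2->dev_off = offset - page_len;
		a2->len = len - copied;
		a2->buf_off = copied;	/* not the device offset */
	}
}
```

With a 256-byte page, a request for 100 bytes at offset 200 becomes 56
bytes from A0 written at buffer offset 0, followed by 44 bytes from the
start of A2 written at buffer offset 56.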

Ben.

> + }
> + return rc;
> +}
[...]

-- 
Ben Hutchings
For every action, there is an equal and opposite criticism. - Harrison




[PATCH net-next 6/9] bnxt_en: Fix length value in dmesg log firmware error message.

2016-05-14 Thread Michael Chan
The len value in the hwrm error message is wrong.  Use the properly adjusted
value in the variable len.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index d33b20f..0a83fd8 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2774,7 +2774,7 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void 
*msg, u32 msg_len,
if (i >= tmo_count) {
netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 
0x%x} len:%d\n",
   timeout, le16_to_cpu(req->req_type),
-  le16_to_cpu(req->seq_id), *resp_len);
+  le16_to_cpu(req->seq_id), len);
return -1;
}
 
-- 
1.8.3.1



[PATCH net-next 7/9] bnxt_en: Simplify and improve unsupported SFP+ module reporting.

2016-05-14 Thread Michael Chan
The current code is more complicated than necessary and can only report
unsupported SFP+ module if it is plugged in after the device is up.

Rename bnxt_port_module_event() to bnxt_get_port_module_status().  We
already have the current module_status in the link_info structure, so
just check that and report any unsupported SFP+ module status.  Delete
the unnecessary last_port_module_event.  Call this function at the
end of bnxt_open to report unsupported module already plugged in.

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 66 ++-
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  1 -
 2 files changed, 30 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 0a83fd8..6def145 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1263,15 +1263,6 @@ next_rx_no_prod:
((data) &   \
 HWRM_ASYNC_EVENT_CMPL_PORT_CONN_NOT_ALLOWED_EVENT_DATA1_PORT_ID_MASK)
 
-#define BNXT_EVENT_POLICY_MASK \
-   
HWRM_ASYNC_EVENT_CMPL_PORT_CONN_NOT_ALLOWED_EVENT_DATA1_ENFORCEMENT_POLICY_MASK
-
-#define BNXT_EVENT_POLICY_SFT  \
-   
HWRM_ASYNC_EVENT_CMPL_PORT_CONN_NOT_ALLOWED_EVENT_DATA1_ENFORCEMENT_POLICY_SFT
-
-#define BNXT_GET_EVENT_POLICY(data)\
-   (((data) & BNXT_EVENT_POLICY_MASK) >> BNXT_EVENT_POLICY_SFT)
-
 static int bnxt_async_event_process(struct bnxt *bp,
struct hwrm_async_event_cmpl *cmpl)
 {
@@ -1310,9 +1301,6 @@ static int bnxt_async_event_process(struct bnxt *bp,
if (bp->pf.port_id != port_id)
break;
 
-   bp->link_info.last_port_module_event =
-   BNXT_GET_EVENT_POLICY(data1);
-
set_bit(BNXT_HWRM_PORT_MODULE_SP_EVENT, &bp->sp_event);
break;
}
@@ -4725,6 +4713,33 @@ static int bnxt_update_link(struct bnxt *bp, bool 
chng_link_state)
return 0;
 }
 
+static void bnxt_get_port_module_status(struct bnxt *bp)
+{
+   struct bnxt_link_info *link_info = &bp->link_info;
+   struct hwrm_port_phy_qcfg_output *resp = &link_info->phy_qcfg_resp;
+   u8 module_status;
+
+   if (bnxt_update_link(bp, true))
+   return;
+
+   module_status = link_info->module_status;
+   switch (module_status) {
+   case PORT_PHY_QCFG_RESP_MODULE_STATUS_DISABLETX:
+   case PORT_PHY_QCFG_RESP_MODULE_STATUS_PWRDOWN:
+   case PORT_PHY_QCFG_RESP_MODULE_STATUS_WARNINGMSG:
+   netdev_warn(bp->dev, "Unqualified SFP+ module detected on port 
%d\n",
+   bp->pf.port_id);
+   if (bp->hwrm_spec_code >= 0x10201) {
+   netdev_warn(bp->dev, "Module part number %s\n",
+   resp->phy_vendor_partnumber);
+   }
+   if (module_status == PORT_PHY_QCFG_RESP_MODULE_STATUS_DISABLETX)
+   netdev_warn(bp->dev, "TX is disabled\n");
+   if (module_status == PORT_PHY_QCFG_RESP_MODULE_STATUS_PWRDOWN)
+   netdev_warn(bp->dev, "SFP+ module is shutdown\n");
+   }
+}
+
 static void
 bnxt_hwrm_set_pause_common(struct bnxt *bp, struct hwrm_port_phy_cfg_input 
*req)
 {
@@ -5017,7 +5032,8 @@ static int __bnxt_open_nic(struct bnxt *bp, bool 
irq_re_init, bool link_re_init)
/* Enable TX queues */
bnxt_tx_enable(bp);
mod_timer(&bp->timer, jiffies + bp->current_interval);
-   bnxt_update_link(bp, true);
+   /* Poll link status and check for SFP+ module status */
+   bnxt_get_port_module_status(bp);
 
return 0;
 
@@ -5552,28 +5568,6 @@ bnxt_restart_timer:
mod_timer(&bp->timer, jiffies + bp->current_interval);
 }
 
-static void bnxt_port_module_event(struct bnxt *bp)
-{
-   struct bnxt_link_info *link_info = &bp->link_info;
-   struct hwrm_port_phy_qcfg_output *resp = &link_info->phy_qcfg_resp;
-
-   if (bnxt_update_link(bp, true))
-   return;
-
-   if (link_info->last_port_module_event != 0) {
-   netdev_warn(bp->dev, "Unqualified SFP+ module detected on port 
%d\n",
-   bp->pf.port_id);
-   if (bp->hwrm_spec_code >= 0x10201) {
-   netdev_warn(bp->dev, "Module part number %s\n",
-   resp->phy_vendor_partnumber);
-   }
-   }
-   if (link_info->last_port_module_event == 1)
-   netdev_warn(bp->dev, "TX is disabled\n");
-   if (link_info->last_port_module_event == 3)
-   netdev_warn(bp->dev, "Shutdown SFP+ module\n");
-}
-
 static void bnxt_cfg_ntp_filters(struct bnxt *);
 
 static void bnxt_sp_task(struct work_struct *work)
@@ -5622,7 +5616,7 @@ static void bnxt_sp_task(struct work_struct *work)
}
 
if (test_and_clear_bit(BNXT_HWRM_

[PATCH net-next 3/9] bnxt_en: Report PCIe link speed and width during driver load

2016-05-14 Thread Michael Chan
From: Ajit Khaparde 

Add code to log a message during driver load indicating PCIe link
speed and width.

The log message will look like this:
bnxt_en :86:00.0 eth0: PCIe: Speed 8.0GT/s Width x8

Signed-off-by: Ajit Khaparde 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 59b2e36..ba0c3e5 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -6198,6 +6198,22 @@ static int bnxt_set_dflt_rings(struct bnxt *bp)
return rc;
 }
 
+static void bnxt_parse_log_pcie_link(struct bnxt *bp)
+{
+   enum pcie_link_width width = PCIE_LNK_WIDTH_UNKNOWN;
+   enum pci_bus_speed speed = PCI_SPEED_UNKNOWN;
+
+   if (pcie_get_minimum_link(bp->pdev, &speed, &width) ||
+   speed == PCI_SPEED_UNKNOWN || width == PCIE_LNK_WIDTH_UNKNOWN)
+   netdev_info(bp->dev, "Failed to determine PCIe Link Info\n");
+   else
+   netdev_info(bp->dev, "PCIe: Speed %s Width x%d\n",
+   speed == PCIE_SPEED_2_5GT ? "2.5GT/s" :
+   speed == PCIE_SPEED_5_0GT ? "5.0GT/s" :
+   speed == PCIE_SPEED_8_0GT ? "8.0GT/s" :
+   "Unknown", width);
+}
+
 static int bnxt_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
 {
static int version_printed;
@@ -6318,6 +6334,8 @@ static int bnxt_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
board_info[ent->driver_data].name,
(long)pci_resource_start(pdev, 0), dev->dev_addr);
 
+   bnxt_parse_log_pcie_link(bp);
+
return 0;
 
 init_err:
-- 
1.8.3.1



[PATCH net-next 9/9] bnxt_en: Use dma_rmb() instead of rmb().

2016-05-14 Thread Michael Chan
Use the weaker but more appropriate dma_rmb() to order the reading of
the completion ring.

Suggested-by: Ajit Khaparde 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index f2ac7da..643c3ec 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -1433,7 +1433,7 @@ static int bnxt_poll_work(struct bnxt *bp, struct 
bnxt_napi *bnapi, int budget)
/* The valid test of the entry must be done first before
 * reading any further.
 */
-   rmb();
+   dma_rmb();
if (TX_CMP_TYPE(txcmp) == CMP_TYPE_TX_L2_CMP) {
tx_pkts++;
/* return full budget so NAPI will complete. */
-- 
1.8.3.1



[PATCH net-next 1/9] bnxt_en: Fix invalid max channel parameter in ethtool -l.

2016-05-14 Thread Michael Chan
From: Satish Baddipadige 

When there is only 1 MSI-X vector or in INTA mode, tx and rx pre-set
max channel parameters are shown incorrectly in ethtool -l.  With only 1
vector, bnxt_get_max_rings() will return -ENOMEM.  bnxt_get_channels
should check this return value, and set max_rx/max_tx to 0 if it is
non-zero.

Signed-off-by: Satish Baddipadige 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index d6e41f2..28171f9 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -327,7 +327,11 @@ static void bnxt_get_channels(struct net_device *dev,
bnxt_get_max_rings(bp, &max_rx_rings, &max_tx_rings, true);
channel->max_combined = max_rx_rings;
 
-   bnxt_get_max_rings(bp, &max_rx_rings, &max_tx_rings, false);
+   if (bnxt_get_max_rings(bp, &max_rx_rings, &max_tx_rings, false)) {
+   max_rx_rings = 0;
+   max_tx_rings = 0;
+   }
+
tcs = netdev_get_num_tc(dev);
if (tcs > 1)
max_tx_rings /= tcs;
-- 
1.8.3.1



[PATCH net-next 4/9] bnxt_en: Reduce maximum ring pages if page size is 64K.

2016-05-14 Thread Michael Chan
The chip supports 4K/8K/64K page sizes for the rings and we try to
match it to the CPU PAGE_SIZE.  The current page size limits for the rings
are based on 4K/8K page size. If the page size is 64K, these limits are
too large.  Reduce them appropriately.
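
As a rough illustration of why the limits need to shrink, the sketch below
computes how many descriptors fit in a ring. The helper name
ring_entries() is hypothetical, and the 16-byte descriptor size used in
the example figures is an assumption; the driver derives the per-page
count as BNXT_PAGE_SIZE / sizeof(struct rx_bd).

```c
#include <stddef.h>

/* Descriptors that fit in a ring of `pages` pages of `page_size` bytes. */
size_t ring_entries(size_t pages, size_t page_size, size_t desc_size)
{
	return pages * (page_size / desc_size);
}
```

Assuming 16-byte descriptors, 8 pages of 4K hold 2048 entries, while 8
pages of 64K would hold 32768.  A single 64K page holds 4096 entries,
which is much closer to the 4K-based limit.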

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 355843b..408bb00 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -425,10 +425,17 @@ struct rx_tpa_end_cmp_ext {
 
 #define MAX_TPA		64
 
+#if (BNXT_PAGE_SHIFT == 16)
+#define MAX_RX_PAGES   1
+#define MAX_RX_AGG_PAGES   4
+#define MAX_TX_PAGES   1
+#define MAX_CP_PAGES   8
+#else
 #define MAX_RX_PAGES   8
 #define MAX_RX_AGG_PAGES   32
 #define MAX_TX_PAGES   8
 #define MAX_CP_PAGES   64
+#endif
 
 #define RX_DESC_CNT (BNXT_PAGE_SIZE / sizeof(struct rx_bd))
 #define TX_DESC_CNT (BNXT_PAGE_SIZE / sizeof(struct tx_bd))
-- 
1.8.3.1



[PATCH net-next 8/9] bnxt_en: Add BCM57314 device ID.

2016-05-14 Thread Michael Chan
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 6def145..f2ac7da 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -78,6 +78,7 @@ enum board_idx {
BCM57402,
BCM57404,
BCM57406,
+   BCM57314,
BCM57304_VF,
BCM57404_VF,
 };
@@ -92,6 +93,7 @@ static const struct {
{ "Broadcom BCM57402 NetXtreme-E Dual-port 10Gb Ethernet" },
{ "Broadcom BCM57404 NetXtreme-E Dual-port 10Gb/25Gb Ethernet" },
{ "Broadcom BCM57406 NetXtreme-E Dual-port 10GBase-T Ethernet" },
+   { "Broadcom BCM57314 NetXtreme-C Dual-port 10Gb/25Gb/40Gb/50Gb 
Ethernet" },
{ "Broadcom BCM57304 NetXtreme-C Ethernet Virtual Function" },
{ "Broadcom BCM57404 NetXtreme-E Ethernet Virtual Function" },
 };
@@ -103,6 +105,7 @@ static const struct pci_device_id bnxt_pci_tbl[] = {
{ PCI_VDEVICE(BROADCOM, 0x16d0), .driver_data = BCM57402 },
{ PCI_VDEVICE(BROADCOM, 0x16d1), .driver_data = BCM57404 },
{ PCI_VDEVICE(BROADCOM, 0x16d2), .driver_data = BCM57406 },
+   { PCI_VDEVICE(BROADCOM, 0x16df), .driver_data = BCM57314 },
 #ifdef CONFIG_BNXT_SRIOV
{ PCI_VDEVICE(BROADCOM, 0x16cb), .driver_data = BCM57304_VF },
{ PCI_VDEVICE(BROADCOM, 0x16d3), .driver_data = BCM57404_VF },
-- 
1.8.3.1



[PATCH net-next 5/9] bnxt_en: Improve the delay logic for firmware response.

2016-05-14 Thread Michael Chan
The current code has 2 problems:

1. The maximum wait time is not long enough.  It is about 60% of the
duration specified by the firmware.  It is calling usleep_range(600, 800)
for every 1 msec we are supposed to wait.

2. The granularity of the delay is too coarse.  Many simple firmware
commands finish in 25 usec or less.

We fix these 2 issues by multiplying the original 1 msec loop counter by
40 and calling usleep_range(25, 40) for each iteration.

There is also a second delay loop to wait for the last DMA word to
complete.  This delay loop should be a very short 5 usec wait.
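
The arithmetic behind both points can be checked with a small sketch. The
struct and function names are hypothetical; the figures are the nominal
usleep_range() bounds multiplied by the loop count, so the real upper
bound also depends on scheduling.

```c
struct wait_bounds {
	long min_us;
	long max_us;
};

/* Nominal total sleep across `iterations` usleep_range(lo, hi) calls. */
struct wait_bounds total_wait(long iterations, long lo_us, long hi_us)
{
	struct wait_bounds b = { iterations * lo_us, iterations * hi_us };

	return b;
}

/* Old loop: one usleep_range(600, 800) per msec of timeout. */
struct wait_bounds old_wait(long timeout_ms)
{
	return total_wait(timeout_ms, 600, 800);
}

/* New loop: 40 iterations of usleep_range(25, 40) per msec of timeout. */
struct wait_bounds new_wait(long timeout_ms)
{
	return total_wait(timeout_ms * 40, 25, 40);
}
```

For an example timeout of 500 msec, the old loop is only guaranteed to
wait 300000 usec (60%), while the new loop is guaranteed to wait the full
500000 usec and polls every 25 to 40 usec.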

This change results in much faster bring-up/down time:

Before the patch:

time ip link set p4p1 up

real	0m0.120s
user	0m0.001s
sys	0m0.009s

After the patch:

time ip link set p4p1 up

real	0m0.030s
user	0m0.000s
sys	0m0.010s

Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index ba0c3e5..d33b20f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -2718,7 +2718,7 @@ void bnxt_hwrm_cmd_hdr_init(struct bnxt *bp, void 
*request, u16 req_type,
 static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void *msg, u32 msg_len,
 int timeout, bool silent)
 {
-   int i, intr_process, rc;
+   int i, intr_process, rc, tmo_count;
struct input *req = msg;
u32 *data = msg;
__le32 *resp_len, *valid;
@@ -2747,11 +2747,12 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void 
*msg, u32 msg_len,
timeout = DFLT_HWRM_CMD_TIMEOUT;
 
i = 0;
+   tmo_count = timeout * 40;
if (intr_process) {
/* Wait until hwrm response cmpl interrupt is processed */
while (bp->hwrm_intr_seq_id != HWRM_SEQ_ID_INVALID &&
-  i++ < timeout) {
-   usleep_range(600, 800);
+  i++ < tmo_count) {
+   usleep_range(25, 40);
}
 
if (bp->hwrm_intr_seq_id != HWRM_SEQ_ID_INVALID) {
@@ -2762,15 +2763,15 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void 
*msg, u32 msg_len,
} else {
/* Check if response len is updated */
resp_len = bp->hwrm_cmd_resp_addr + HWRM_RESP_LEN_OFFSET;
-   for (i = 0; i < timeout; i++) {
+   for (i = 0; i < tmo_count; i++) {
len = (le32_to_cpu(*resp_len) & HWRM_RESP_LEN_MASK) >>
  HWRM_RESP_LEN_SFT;
if (len)
break;
-   usleep_range(600, 800);
+   usleep_range(25, 40);
}
 
-   if (i >= timeout) {
+   if (i >= tmo_count) {
netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 
0x%x} len:%d\n",
   timeout, le16_to_cpu(req->req_type),
   le16_to_cpu(req->seq_id), *resp_len);
@@ -2779,13 +2780,13 @@ static int bnxt_hwrm_do_send_msg(struct bnxt *bp, void 
*msg, u32 msg_len,
 
/* Last word of resp contains valid bit */
valid = bp->hwrm_cmd_resp_addr + len - 4;
-   for (i = 0; i < timeout; i++) {
+   for (i = 0; i < 5; i++) {
if (le32_to_cpu(*valid) & HWRM_RESP_VALID_MASK)
break;
-   usleep_range(600, 800);
+   udelay(1);
}
 
-   if (i >= timeout) {
+   if (i >= 5) {
netdev_err(bp->dev, "Error (timeout: %d) msg {0x%x 
0x%x} len:%d v:%d\n",
   timeout, le16_to_cpu(req->req_type),
   le16_to_cpu(req->seq_id), len, *valid);
-- 
1.8.3.1



[PATCH net-next 2/9] bnxt_en: Add Support for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPRO

2016-05-14 Thread Michael Chan
From: Ajit Khaparde 

Add support to fetch the SFP EEPROM settings from the firmware
and display it via the ethtool -m command.  We support SFP+ and QSFP
modules.

Signed-off-by: Ajit Khaparde 
Signed-off-by: Michael Chan 
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c |   1 +
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  11 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 120 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h |  34 ++
 4 files changed, 166 insertions(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 6a5a717..59b2e36 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -4671,6 +4671,7 @@ static int bnxt_update_link(struct bnxt *bp, bool 
chng_link_state)
link_info->transceiver = resp->xcvr_pkg_type;
link_info->phy_addr = resp->eee_config_phy_addr &
  PORT_PHY_QCFG_RESP_PHY_ADDR_MASK;
+   link_info->module_status = resp->module_status;
 
if (bp->flags & BNXT_FLAG_EEE_CAP) {
struct ethtool_eee *eee = &bp->eee;
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h 
b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
index 6289635..355843b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h
@@ -829,6 +829,7 @@ struct bnxt_link_info {
u16 lp_auto_link_speeds;
u16 force_link_speed;
u32 preemphasis;
+   u8  module_status;
 
/* copy of requested setting from ethtool cmd */
u8  autoneg;
@@ -1121,6 +1122,16 @@ static inline void bnxt_disable_poll(struct bnxt_napi 
*bnapi)
 
 #endif
 
+#define I2C_DEV_ADDR_A0    0xa0
+#define I2C_DEV_ADDR_A2    0xa2
+#define SFP_EEPROM_SFF_8472_COMP_ADDR  0x5e
+#define SFP_EEPROM_SFF_8472_COMP_SIZE  1
+#define SFF_MODULE_ID_SFP  0x3
+#define SFF_MODULE_ID_QSFP 0xc
+#define SFF_MODULE_ID_QSFP_PLUS    0xd
+#define SFF_MODULE_ID_QSFP28   0x11
+#define BNXT_MAX_PHY_I2C_RESP_SIZE 64
+
 void bnxt_set_ring_params(struct bnxt *);
 void bnxt_hwrm_cmd_hdr_init(struct bnxt *, void *, u16, u16, u16);
 int _hwrm_send_message(struct bnxt *, void *, u32, int);
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c 
b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
index 28171f9..93a3e5f 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c
@@ -1498,6 +1498,124 @@ static int bnxt_get_eee(struct net_device *dev, struct 
ethtool_eee *edata)
return 0;
 }
 
+static int bnxt_read_sfp_module_eeprom_info(struct bnxt *bp, u16 i2c_addr,
+   u16 page_number, u16 start_addr,
+   u16 data_length, u8 *buf)
+{
+   struct hwrm_port_phy_i2c_read_input req = {0};
+   struct hwrm_port_phy_i2c_read_output *output = bp->hwrm_cmd_resp_addr;
+   int rc, byte_offset = 0;
+
+   bnxt_hwrm_cmd_hdr_init(bp, &req, HWRM_PORT_PHY_I2C_READ, -1, -1);
+   req.i2c_slave_addr = i2c_addr;
+   req.page_number = cpu_to_le16(page_number);
+   req.port_id = cpu_to_le16(bp->pf.port_id);
+   do {
+   u16 xfer_size;
+
+   xfer_size = min_t(u16, data_length, BNXT_MAX_PHY_I2C_RESP_SIZE);
+   data_length -= xfer_size;
+   req.page_offset = cpu_to_le16(start_addr + byte_offset);
+   req.data_length = xfer_size;
+   req.enables = cpu_to_le32(start_addr + byte_offset ?
+PORT_PHY_I2C_READ_REQ_ENABLES_PAGE_OFFSET : 0);
+   mutex_lock(&bp->hwrm_cmd_lock);
+   rc = _hwrm_send_message(bp, &req, sizeof(req),
+   HWRM_CMD_TIMEOUT);
+   if (!rc)
+   memcpy(buf + byte_offset, output->data, xfer_size);
+   mutex_unlock(&bp->hwrm_cmd_lock);
+   byte_offset += xfer_size;
+   } while (!rc && data_length > 0);
+
+   return rc;
+}
+
+static int bnxt_get_module_info(struct net_device *dev,
+   struct ethtool_modinfo *modinfo)
+{
+   struct bnxt *bp = netdev_priv(dev);
+   struct hwrm_port_phy_i2c_read_input req = {0};
+   struct hwrm_port_phy_i2c_read_output *output = bp->hwrm_cmd_resp_addr;
+   int rc;
+
+   /* No point in going further if phy status indicates
+* module is not inserted or if it is powered down or
+* if it is of type 10GBase-T
+*/
+   if (bp->link_info.module_status >
+   PORT_PHY_QCFG_RESP_MODULE_STATUS_WARNINGMSG)
+   return -EOPNOTSUPP;
+
+  

[PATCH net-next 0/9] bnxt_en: updates for net-next.

2016-05-14 Thread Michael Chan
Non-critical bug fixes, improvements, a new ethtool feature, and a new
device ID.

Michael Chan (9):
  bnxt_en: Fix invalid max channel parameter in ethtool -l.
  bnxt_en: Add Support for ETHTOOL_GMODULEINFO and ETHTOOL_GMODULEEEPRO
  bnxt_en: Report PCIe link speed and width during driver load
  bnxt_en: Reduce maximum ring pages if page size is 64K.
  bnxt_en: Improve the delay logic for firmware response.
  bnxt_en: Fix length value in dmesg log firmware error message.
  bnxt_en: Simplify and improve unsupported SFP+ module reporting.
  bnxt_en: Add BCM57314 device ID.
  bnxt_en: Use dma_rmb() instead of rmb().

 drivers/net/ethernet/broadcom/bnxt/bnxt.c | 111 +++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h |  19 +++-
 drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 126 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt_hsi.h |  34 ++
 4 files changed, 241 insertions(+), 49 deletions(-)

-- 
1.8.3.1



Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer

2016-05-14 Thread Shuyu Wei
Sorry, the last two lines are wrong; please ignore them.

I mean I intended to ignore one or two packets.
It's just a trade-off for performance, but it
doesn't cause any memory leak.


Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer

2016-05-14 Thread Shuyu Wei
On Sat, May 14, 2016 at 10:03:56PM +0200, Francois Romieu wrote:
> Shuyu Wei  :
> > The tail of the ring buffer(txbd_dirty) should never go ahead of the
> > head(txbd_curr) or the ring buffer will corrupt. 
> > 
> > This is the root cause of racing.
> 
> No (see below).
> 
> It may suffer from some barrier illness though.
> 
> > Besides, setting the FOR_EMAC flag should be the last step of modifying
> > the buffer descriptor, or possible racing will occur.
> 
> (s/Besides//)
> 
> Yes. Good catch.
> 
> > Signed-off-by: Shuyu Wei 
> > ---
> > 
> > diff --git a/drivers/net/ethernet/arc/emac_main.c 
> > b/drivers/net/ethernet/arc/emac_main.c
> > index a3a9392..5ece05b 100644
> > --- a/drivers/net/ethernet/arc/emac_main.c
> > +++ b/drivers/net/ethernet/arc/emac_main.c
> > @@ -155,7 +155,7 @@ static void arc_emac_tx_clean(struct net_device *ndev)
> > struct net_device_stats *stats = &ndev->stats;
> > unsigned int i;
> >  
> > -   for (i = 0; i < TX_BD_NUM; i++) {
> > +   for (i = priv->txbd_dirty; i != priv->txbd_curr; i = (i + 1) % 
> > TX_BD_NUM) {
> > unsigned int *txbd_dirty = &priv->txbd_dirty;
> > struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
> > struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
> 
> "i" is only used as a loop counter in arc_emac_tx_clean. It is not even
> used as an index to dereference an array or whatever. Only "priv->txbd_dirty"
> is used.
> 
> arc_emac_tx_clean() checks FOR_EMAC, skb, and dirty tx data. It takes care of
> clearing those itself. Thus, (memory / io barrier considerations apart) it can
> only proceed beyond its own "if ((info & FOR_EMAC) || !txbd->data || !skb)"
> check if arc_emac_tx wrote all of those.
> 
> Where they are used as loop counters, both TX_BD_NUM and txbd_curr - txbd_dirty
> can be considered as hints (please note that unsigned arithmetic can replace
> the "%" sledgehammer here).
> 
> > @@ -686,12 +686,12 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
> > net_device *ndev)
> >  
> > skb_tx_timestamp(skb);
> 
> > +   priv->tx_buff[*txbd_curr].skb = skb;
> 
>   dma_wmb();
> 
> (sync writes to memory before releasing descriptor)
> 
> > *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
> >  
> > /* Make sure info word is set */
> > wmb();
> 
> arc_emac_tx_clean can run from this point.
> 
> txbd_curr is still not set (and it does need to). So, if you insist
> on txbd_curr appearing in arc_emac_tx_clean::for(...), it's perfectly
> possible to ignore a sent packet.
> 
> I ignored arc_reg_set() at the end of arc_emac_tx(). I have no idea
> if it is posted nor if it forces the chipset to read the descriptors
> (synchronously ?) so part of the sentence above could be wrong.
> 
> You have found a big offender in arc_emac_tx() but the arc_emac_tx_clean()
> part is imho useless, incorrectly understood or misworded.
> 
> -- 
> Ueimor


Hi, Ueimor. Thanks for your reply.

I don't think treating txbd_curr and txbd_dirty only as hints is a good idea.
That could be a big waste, since tx_clean has to go through all the txbds.
I tried your advice, and Tx throughput can only reach 5.52MB/s.

Leaving one sent packet in tx_clean is acceptable if we respect txbd_curr
and txbd_dirty, since the ignored packet will be cleaned when new packets
arrive.

>  for (i = priv->txbd_dirty; i != priv->txbd_curr; i = (i + 1) % TX_BD_NUM) {
In fact, the loop above will always ignore one or two sent packets; the loop
below can free all packets, or leave one if txbd_curr is not updated. I
use the one above since it is clearer.

  for (i = priv->txbd_dirty; (i + 1) % TX_BD_NUM != priv->txbd_curr;
       i = (i + 1) % TX_BD_NUM) {


Re: [PATCH] nf_conntrack: avoid kernel pointer value leak in slab name

2016-05-14 Thread Linus Torvalds
On Sat, May 14, 2016 at 2:33 PM, Willy Tarreau  wrote:
>
> Why simply not cast the atomic to (unsigned long long) instead of (u64)
> so that %llu always matches ?

Yes, that fixes the problem. It's just more typing, and annoying. The
fact that MS got it right while posix and gcc screwed it up is a bit
embarrassing..

If we ever start using __uint128_t, we'll have even more problems in
this area. Oh well.

Linus


Re: [PATCH RFT 1/2] phylib: add device reset GPIO support

2016-05-14 Thread Sergei Shtylyov

Hello.

On 05/14/2016 10:50 PM, Andrew Lunn wrote:


Another issue is that on some boards we have one reset line tied to
multiple PHYs. How do we prevent multiple resets taking place when each of
the PHYs is registered?


 My patch just doesn't address this case -- it's about the
individual resets only.


This actually needs to be addressed a layer above. What you have is a
bus reset, not a device reset.


  No.
  There's simply no such thing as a bus reset for the xMII/MDIO
busses, there's simply no reset signaling on them. Every device has
its own reset signal and its own timing requirements.


Except in the case above, where two phys are sharing the same reset
signal. So although it is not part of the mdio standard to have a bus
reset, this is in effect what the gpio line is doing, resetting all
devices on the bus. If you don't model that as a bus reset, how do you
model it?


   I'm not suggesting that the shared reset should be handled by my
patch. Contrariwise, I suggested to use the mii_bus::reset() method


I think we misunderstood each other somewhere.

Your code is great for one gpio reset line for one phy.

I think there could be similar code one layer above to handle one gpio
line for multiple phys.


   Ah, you want me to recognize some MAC/MDIO bound prop (e.g. 
"mdio-reset-gpios") in of_mdiobus_register()? I'll think about it now that my 
patch needs fixing anyway...



 Andrew


MBR, Sergei



Re: [PATCH] nf_conntrack: avoid kernel pointer value leak in slab name

2016-05-14 Thread Willy Tarreau
On Sat, May 14, 2016 at 02:31:04PM -0700, Linus Torvalds wrote:
> On Sat, May 14, 2016 at 11:24 AM, Linus Torvalds
>  wrote:
> >
> >
> > -   net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%p", net);
> > +   net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%llu",
> > +   (u64)atomic64_inc_return(&unique_id));
> 
> Oh well. I suspect this is going to cause a new warning on alpha and
> ia64 and possibly others.
> 
> "u64" is indeed "unsigned long long" on x86 and many other
> architectures, but on alpha and ia64 it's just "unsigned long".
> 
> So that cast should have been to "long long". I detest how there isn't
> a "64-bit size" printf specifier.

Why simply not cast the atomic to (unsigned long long) instead of (u64)
so that %llu always matches ?
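
A minimal user-space sketch of that suggestion, with `uint64_t` standing in for the kernel's `u64` and `snprintf()` for `kasprintf()`; `format_slab_name()` is a made-up helper, not the actual patch:

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit id portably: cast to unsigned long long so the
 * argument always matches %llu, whatever the 64-bit type happens to be
 * typedef'd to on a given architecture.
 * Hypothetical helper for illustration; not kernel code.
 */
static int format_slab_name(char *buf, size_t len, uint64_t id)
{
	return snprintf(buf, len, "nf_conntrack_%llu", (unsigned long long)id);
}
```

The cast costs nothing at run time; it only pins down the argument type that the format string expects.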

Willy



Re: [PATCH] nf_conntrack: avoid kernel pointer value leak in slab name

2016-05-14 Thread Linus Torvalds
On Sat, May 14, 2016 at 11:24 AM, Linus Torvalds
 wrote:
>
>
> -   net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%p", net);
> +   net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%llu",
> +   (u64)atomic64_inc_return(&unique_id));

Oh well. I suspect this is going to cause a new warning on alpha and
ia64 and possibly others.

"u64" is indeed "unsigned long long" on x86 and many other
architectures, but on alpha and ia64 it's just "unsigned long".

So that cast should have been to "long long". I detest how there isn't
a "64-bit size" printf specifier.

And no, PRId64 preprocessor garbage and similar disgusting hacks by
POSIX isn't the answer. I think MSC actually got that right with
"%I64d".

Oh well. It's a fairly harmless compiler warning, it isn't a code
correctness issue.

   Linus


Re: [PATCH net-next] net: switchdev: Drop EXPERIMENTAL from description

2016-05-14 Thread David Miller
From: Florian Fainelli 
Date: Sat, 14 May 2016 12:49:54 -0700

> Switchdev has been around for quite a while now, putting "EXPERIMENTAL"
> in the description is no longer accurate, drop it.
> 
> Signed-off-by: Florian Fainelli 

Yeah this is long overdue, applied, thanks.


Re: [PATCH RFT 1/2] phylib: add device reset GPIO support

2016-05-14 Thread Sergei Shtylyov

Hello.

On 05/13/2016 10:18 PM, Sergei Shtylyov wrote:


[we already talked about this patch in #armlinux, I'm now just
forwarding my comments on the list. Background was that I sent an easier
and less complete patch with the same idea. See
http://patchwork.ozlabs.org/patch/621418/]

[added Linus Walleij to Cc, there is a question for you/him below]

On Fri, Apr 29, 2016 at 01:12:54AM +0300, Sergei Shtylyov wrote:

--- net-next.orig/Documentation/devicetree/bindings/net/phy.txt
+++ net-next/Documentation/devicetree/bindings/net/phy.txt
@@ -35,6 +35,8 @@ Optional Properties:
 - broken-turn-around: If set, indicates the PHY device does not correctly
   release the turn around line low at the end of a MDIO transaction.

+- reset-gpios: The GPIO phandle and specifier for the PHY reset signal.
+
 Example:

 ethernet-phy@0 {


This is great.


Index: net-next/drivers/net/phy/at803x.c
===
--- net-next.orig/drivers/net/phy/at803x.c
+++ net-next/drivers/net/phy/at803x.c
@@ -65,7 +65,6 @@ MODULE_LICENSE("GPL");
[...]


My patch breaks this driver. I wasn't aware of it.


   I tried to be as careful as I could but still it looks that I didn't
succeed at that too...


   Hm, I'm starting to forget the vital details about my patch...


[...]

Index: net-next/drivers/net/phy/mdio_device.c
===
--- net-next.orig/drivers/net/phy/mdio_device.c
+++ net-next/drivers/net/phy/mdio_device.c

[...]

@@ -117,9 +126,16 @@ static int mdio_probe(struct device *dev
 struct mdio_driver *mdiodrv = to_mdio_driver(drv);
 int err = 0;

-if (mdiodrv->probe)
+if (mdiodrv->probe) {
+/* Deassert the reset signal */
+mdio_device_reset(mdiodev, 0);
+
 err = mdiodrv->probe(mdiodev);

+/* Assert the reset signal */
+mdio_device_reset(mdiodev, 1);


I wonder if it's safe to do this in general. What if ->probe does
something with the phy that is lost by resetting but that is relied on
later?


   Well, I thought that config_init() method is designed for that but indeed
the LXT driver writes to BMCR in its probe() method and hence is broken.
Thank you for noticing...


   It's broken even without my patch. The phylib will cause a PHY soft reset


   Only iff the config_init() method exists in the PHY driver...


when attaching to the PHY device, so all BMCR programming done in the probe()
method will be lost. My patch does make sense as is.


   No, actually it doesn't. :-(


Looks like I should also look into fixing lxt.c.


   It took me to actually do a patch to understand my fault. Sigh... :-/


Florian, what do you think?


   Florian, is phy_init_hw() logic correct?

MBR, Sergei



[PATCH v2] r8169: default to 64-bit DMA on recent PCIe chips

2016-05-14 Thread Ard Biesheuvel
The current logic around the 'use_dac' module parameter prevents the
r8169 driver from being loadable on 64-bit systems without any RAM
below 4 GB when the parameter is left at its default value.

So introduce a new default value -1 which indicates that 64-bit DMA
should be enabled on sufficiently recent PCIe chips, i.e., versions
RTL_GIGA_MAC_VER_18 or later. Explicit param values of 0 or 1 retain
the existing behavior of unconditionally disabling/enabling 64-bit DMA
on 64-bit architectures (i.e., regardless of the type and version of the
chip).

Since PCIe chips do not need the CPlusCmd Dual Address Cycle bit to be set,
make that conditional on the device type as well.
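
The decision described above can be sketched as a pair of pure predicates. This is a user-space illustration only: the function names and `SKETCH_MAC_VER_18` are assumptions made for the sketch, and it leaves out the check that setting the 64-bit DMA mask actually succeeds.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for RTL_GIGA_MAC_VER_18; the numeric value is
 * an assumption for this sketch, not the driver's enum value.
 */
#define SKETCH_MAC_VER_18 18

/* use_dac: -1 = auto (the new default), 0 = force off, 1 = force on.
 * dma_addr_size models sizeof(dma_addr_t) on the platform.
 */
static bool want_64bit_dma(int use_dac, bool is_pcie, int mac_version,
			   size_t dma_addr_size)
{
	if (dma_addr_size <= 4)
		return false;	/* 32-bit DMA addresses: nothing to enable */
	if (use_dac == 1)
		return true;	/* explicit request, any chip */
	if (use_dac == -1)
		return is_pcie && mac_version >= SKETCH_MAC_VER_18;
	return false;
}

/* The PCIDAC bit in CPlusCmd is only wanted for non-PCIe devices. */
static bool want_pcidac_bit(bool use_64bit_dma, bool is_pcie)
{
	return use_64bit_dma && !is_pcie;
}
```

Keeping the policy separate from the PCI calls makes it easy to see that only the -1 case looks at the chip type and version.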

Cc: Realtek linux nic maintainers 
Signed-off-by: Ard Biesheuvel 
---
This is a followup to 'r8169: default to 64-bit DMA on systems without memory
below 4 GB' [1]. At the request of Francois, this version bases the decision
whether to use 64-bit DMA by default on whether the device is PCIe and
sufficiently recent, rather than whether the platform requires 64-bit DMA
because it does not have any memory below 4 GB to begin with. This is safer,
since it will prevent the use of such problematic cards on these platforms.

v2: drop unnecessary reordering of rtl8169_get_mac_version() call with pcie
check

[1] http://article.gmane.org/gmane.linux.network/412246

 drivers/net/ethernet/realtek/r8169.c | 44 +++-
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 94f08f1e841c..0e62d74b09b3 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -345,7 +345,7 @@ static const struct pci_device_id rtl8169_pci_tbl[] = {
 MODULE_DEVICE_TABLE(pci, rtl8169_pci_tbl);
 
 static int rx_buf_sz = 16383;
-static int use_dac;
+static int use_dac = -1;
 static struct {
u32 msg_enable;
 } debug = { -1 };
@@ -8224,20 +8224,6 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
goto err_out_mwi_2;
}
 
-   tp->cp_cmd = 0;
-
-   if ((sizeof(dma_addr_t) > 4) &&
-   !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && use_dac) {
-   tp->cp_cmd |= PCIDAC;
-   dev->features |= NETIF_F_HIGHDMA;
-   } else {
-   rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
-   if (rc < 0) {
-   netif_err(tp, probe, dev, "DMA configuration failed\n");
-   goto err_out_free_res_3;
-   }
-   }
-
/* ioremap MMIO region */
ioaddr = ioremap(pci_resource_start(pdev, region), R8169_REGS_SIZE);
if (!ioaddr) {
@@ -8253,6 +8239,25 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
/* Identify chip attached to board */
rtl8169_get_mac_version(tp, dev, cfg->default_ver);
 
+   tp->cp_cmd = 0;
+
+   if ((sizeof(dma_addr_t) > 4) &&
+   (use_dac == 1 || (use_dac == -1 && pci_is_pcie(pdev) &&
+ tp->mac_version >= RTL_GIGA_MAC_VER_18)) &&
+   !pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
+
+   /* CPlusCmd Dual Access Cycle is only needed for non-PCIe */
+   if (!pci_is_pcie(pdev))
+   tp->cp_cmd |= PCIDAC;
+   dev->features |= NETIF_F_HIGHDMA;
+   } else {
+   rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+   if (rc < 0) {
+   netif_err(tp, probe, dev, "DMA configuration failed\n");
+   goto err_out_unmap_4;
+   }
+   }
+
rtl_init_rxcfg(tp);
 
rtl_irq_disable(tp);
@@ -8412,12 +8417,12 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
   &tp->counters_phys_addr, GFP_KERNEL);
if (!tp->counters) {
rc = -ENOMEM;
-   goto err_out_msi_4;
+   goto err_out_msi_5;
}
 
rc = register_netdev(dev);
if (rc < 0)
-   goto err_out_cnt_5;
+   goto err_out_cnt_6;
 
pci_set_drvdata(pdev, dev);
 
@@ -8451,12 +8456,13 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
 out:
return rc;
 
-err_out_cnt_5:
+err_out_cnt_6:
dma_free_coherent(&pdev->dev, sizeof(*tp->counters), tp->counters,
  tp->counters_phys_addr);
-err_out_msi_4:
+err_out_msi_5:
netif_napi_del(&tp->napi);
rtl_disable_msi(pdev, tp);
+err_out_unmap_4:
iounmap(ioaddr);
 err_out_free_res_3:
pci_release_regions(pdev);
-- 
2.7.4



Re: What ixgbe devices support HWTSTAMP_FILTER_ALL for hardware time stamping?

2016-05-14 Thread Richard Cochran
On Sat, May 14, 2016 at 11:47:22AM -0700, Guy Harris wrote:
> So if you have a GUI application for packet capture, with a combo box to 
> select the type of time stamping, should it:
> 
>   1) regardless of whether ETHTOOL_GET_TS_INFO is available, open the 
> adapter, try each of the time stamp types to see whether it works, and show a 
> combo box based on that;
> 
>   2) use ETHTOOL_GET_TS_INFO if available;
> 
>   3) offer all possibilities regardless of whether they work with the 
> adapter or not, and just report an error for possibilities that don't work?
> 
> My preference is 2) - which is the main reason why libpcap offers "what 
> possibilities are available?" APIs, not just "request this possibility" APIs.

You are going to have to implement #1 in any case, if you want your
program to work on all kernels.  IIRC get_ts_info appeared in 3.5.

Thanks,
Richard


Re: [PATCH net-next] net: switchdev: Drop EXPERIMENTAL from description

2016-05-14 Thread Jiri Pirko
Sat, May 14, 2016 at 09:49:54PM CEST, f.faine...@gmail.com wrote:
>Switchdev has been around for quite a while now, putting "EXPERIMENTAL"
>in the description is no longer accurate, drop it.
>
>Signed-off-by: Florian Fainelli 

Acked-by: Jiri Pirko 


[GIT] Networking

2016-05-14 Thread David Miller

1) Fix mvneta/bm dependencies, from Arnd Bergmann.

2) RX completion hw bug workaround in bnxt_en, from Michael Chan.

3) Kernel pointer leak in nf_conntrack, from Linus.

4) Hoplimit route attribute limits not enforced properly, from
   Paolo Abeni.

5) qlcnic driver NULL deref fix from Dan Carpenter.

Please pull, thanks a lot!

The following changes since commit 685764b108a7e5fe9f5ee213d6a627c1166d7c88:

  Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi (2016-05-11 13:17:12 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 98397fc547e3f4553553a30ea56fa34d613f0a4c:

  arm64: bpf: jit JMP_JSET_{X,K} (2016-05-14 16:11:45 -0400)


Arnd Bergmann (1):
  net: mvneta: bm: fix dependencies again

Dan Carpenter (1):
  qlcnic: potential NULL dereference in qlcnic_83xx_get_minidump_template()

David S. Miller (2):
  Merge branch 'bnxt_en-fixes'
  Merge branch 'xgene-fixes'

Iyappan Subramanian (5):
  drivers: net: xgene: fix IPv4 forward crash
  drivers: net: xgene: fix sharing of irqs
  drivers: net: xgene: fix ununiform latency across queues
  drivers: net: xgene: fix statistics counters race condition
  drivers: net: xgene: fix register offset

Linus Torvalds (1):
  nf_conntrack: avoid kernel pointer value leak in slab name

Michael Chan (2):
  bnxt_en: Add workaround to detect bad opaque in rx completion (part 1)
  bnxt_en: Add workaround to detect bad opaque in rx completion (part 2)

Paolo Abeni (1):
  net/route: enforce hoplimit max value

Paul Durrant (1):
  xen-netback: fix extra_info handling in xenvif_tx_err()

Zi Shen Lim (1):
  arm64: bpf: jit JMP_JSET_{X,K}

 arch/arm64/net/bpf_jit_comp.c                        |  1 +
 drivers/net/ethernet/apm/xgene/xgene_enet_cle.c      | 11 +-
 drivers/net/ethernet/apm/xgene/xgene_enet_cle.h      |  2 ++
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.c       | 19 ++---
 drivers/net/ethernet/apm/xgene/xgene_enet_hw.h       |  8 ---
 drivers/net/ethernet/apm/xgene/xgene_enet_main.c     | 75 +++--
 drivers/net/ethernet/apm/xgene/xgene_enet_main.h     | 18
 drivers/net/ethernet/apm/xgene/xgene_enet_sgmac.h    |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c            | 63 ++
 drivers/net/ethernet/broadcom/bnxt/bnxt.h            |  2 ++
 drivers/net/ethernet/marvell/Kconfig                 |  2 +-
 drivers/net/ethernet/qlogic/qlcnic/qlcnic_minidump.c |  8 +--
 drivers/net/xen-netback/netback.c                    |  1 +
 net/ipv4/fib_semantics.c                             |  2 ++
 net/ipv6/route.c                                     |  2 ++
 net/netfilter/nf_conntrack_core.c                    |  4 +++-
 16 files changed, 175 insertions(+), 45 deletions(-)


Re: [PATCH] arm64: bpf: jit JMP_JSET_{X,K}

2016-05-14 Thread David Miller
From: Will Deacon 
Date: Fri, 13 May 2016 11:03:07 +0100

> On Fri, May 13, 2016 at 10:57:18AM +0100, Will Deacon wrote:
>> On Thu, May 12, 2016 at 11:37:58PM -0700, Zi Shen Lim wrote:
>> > Original implementation commit e54bcde3d69d ("arm64: eBPF JIT compiler")
>> > had the relevant code paths, but due to an oversight always fail jiting.
>> > 
>> > As a result, we had been falling back to BPF interpreter whenever a BPF
>> > program has JMP_JSET_{X,K} instructions.
>> > 
>> > With this fix, we confirm that the corresponding tests in lib/test_bpf
>> > continue to pass, and also jited.
>> > 
>> > ...
>> > [2.784553] test_bpf: #30 JSET jited:1 188 192 197 PASS
>> > [2.791373] test_bpf: #31 tcpdump port 22 jited:1 325 677 625 PASS
>> > [2.808800] test_bpf: #32 tcpdump complex jited:1 323 731 991 PASS
>> > ...
>> > [3.190759] test_bpf: #237 JMP_JSET_K: if (0x3 & 0x2) return 1 jited:1 
>> > 110 PASS
>> > [3.192524] test_bpf: #238 JMP_JSET_K: if (0x3 & 0x) return 1 
>> > jited:1 98 PASS
>> > [3.211014] test_bpf: #249 JMP_JSET_X: if (0x3 & 0x2) return 1 jited:1 
>> > 120 PASS
>> > [3.212973] test_bpf: #250 JMP_JSET_X: if (0x3 & 0x) return 1 
>> > jited:1 89 PASS
>> > ...
>> > 
>> > Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler")
>> > Signed-off-by: Zi Shen Lim 
 ...
> Acked-by: Will Deacon 
> 
> I'm assuming David will queue this?

Yep I got this, applied, thanks.


Re: [PATCH] ethernet:arc: Fix racing of TX ring buffer

2016-05-14 Thread Francois Romieu
Shuyu Wei  :
> The tail of the ring buffer(txbd_dirty) should never go ahead of the
> head(txbd_curr) or the ring buffer will corrupt. 
> 
> This is the root cause of racing.

No (see below).

It may suffer from some barrier illness though.

> Besides, setting the FOR_EMAC flag should be the last step of modifying
> the buffer descriptor, or possible racing will occur.

(s/Besides//)

Yes. Good catch.

> Signed-off-by: Shuyu Wei 
> ---
> 
> diff --git a/drivers/net/ethernet/arc/emac_main.c 
> b/drivers/net/ethernet/arc/emac_main.c
> index a3a9392..5ece05b 100644
> --- a/drivers/net/ethernet/arc/emac_main.c
> +++ b/drivers/net/ethernet/arc/emac_main.c
> @@ -155,7 +155,7 @@ static void arc_emac_tx_clean(struct net_device *ndev)
> struct net_device_stats *stats = &ndev->stats;
> unsigned int i;
>  
> -   for (i = 0; i < TX_BD_NUM; i++) {
> +   for (i = priv->txbd_dirty; i != priv->txbd_curr; i = (i + 1) % 
> TX_BD_NUM) {
> unsigned int *txbd_dirty = &priv->txbd_dirty;
> struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
> struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];

"i" is only used as a loop counter in arc_emac_tx_clean. It is not even
used as an index to dereference an array or whatever. Only "priv->txbd_dirty"
is used.

arc_emac_tx_clean() checks FOR_EMAC, skb, and dirty tx data. It takes care of
clearing those itself. Thus, (memory / io barrier considerations apart) it can
only proceed beyond its own "if ((info & FOR_EMAC) || !txbd->data || !skb)"
check if arc_emac_tx wrote all of those.

Where they are used as loop counters, both TX_BD_NUM and txbd_curr - txbd_dirty
can be considered as hints (please note that unsigned arithmetic can replace
the "%" sledgehammer here).
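
One way to read the remark about unsigned arithmetic is sketched below. It assumes free-running unsigned counters and a power-of-two ring size; both are assumptions of this sketch rather than a description of the current arc_emac code, and every name here is made up.

```c
#include <stdint.h>

/* Assumed for this sketch: the ring size is a power of two, so a
 * free-running counter maps to a slot with a mask instead of "%".
 */
#define RING_SIZE 128u
#define RING_MASK (RING_SIZE - 1u)

struct ring {
	uint32_t curr;	/* producer counter, only ever incremented */
	uint32_t dirty;	/* consumer counter, only ever incremented */
};

/* Number of descriptors queued but not yet cleaned. Unsigned
 * subtraction stays correct even after curr wraps past UINT32_MAX.
 */
static uint32_t ring_pending(const struct ring *r)
{
	return r->curr - r->dirty;
}

/* Map a free-running counter to its slot in the ring. */
static uint32_t ring_slot(uint32_t counter)
{
	return counter & RING_MASK;
}
```

With this layout, "empty" is simply a pending count of zero, and no modulo is needed on the hot path.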

> @@ -686,12 +686,12 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
> net_device *ndev)
>  
> skb_tx_timestamp(skb);

> +   priv->tx_buff[*txbd_curr].skb = skb;

dma_wmb();

(sync writes to memory before releasing descriptor)

> *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
>  
> /* Make sure info word is set */
> wmb();

arc_emac_tx_clean can run from this point.

txbd_curr is still not set (and it does need to). So, if you insist
on txbd_curr appearing in arc_emac_tx_clean::for(...), it's perfectly
possible to ignore a sent packet.

I ignored arc_reg_set() at the end of arc_emac_tx(). I have no idea
if it is posted nor if it forces the chipset to read the descriptors
(synchronously ?) so part of the sentence above could be wrong.

You have found a big offender in arc_emac_tx() but the arc_emac_tx_clean()
part is imho useless, incorrectly understood or misworded.

-- 
Ueimor


Re: [PATCH RFT 1/2] phylib: add device reset GPIO support

2016-05-14 Thread Andrew Lunn
On Sat, May 14, 2016 at 10:36:38PM +0300, Sergei Shtylyov wrote:
> Hello.
> 
> On 05/14/2016 02:44 AM, Andrew Lunn wrote:
> 
> >Another issue is that on some boards we have one reset line tied to
> >multiple PHYs. How do we prevent multiple resets taking place when
> >each of the PHYs is registered?
> 
>   My patch just doesn't address this case -- it's about the
> individual resets only.
> >>>
> >>>This actually needs to be addressed a layer above. What you have is a
> >>>bus reset, not a device reset.
> >>
> >>   No.
> >>   There's simply no such thing as a bus reset for the xMII/MDIO
> >>busses, there's simply no reset signaling on them. Every device has
> >>its own reset signal and its own timing requirements.
> >
> >Except in the case above, where two phys are sharing the same reset
> >signal. So although it is not part of the mdio standard to have a bus
> >reset, this is in effect what the gpio line is doing, resetting all
> >devices on the bus. If you don't model that as a bus reset, how do you
> >model it?
> 
>I'm not suggesting that the shared reset should be handled by my
> patch. Contrariwise, I suggested to use the mii_bus::reset() method

I think we misunderstood each other somewhere.

Your code is great for one gpio reset line for one phy.

I think there could be similar code one layer above to handle one gpio
line for multiple phys.

 Andrew


[PATCH net-next] net: switchdev: Drop EXPERIMENTAL from description

2016-05-14 Thread Florian Fainelli
Switchdev has been around for quite a while now, putting "EXPERIMENTAL"
in the description is no longer accurate, drop it.

Signed-off-by: Florian Fainelli 
---
 net/switchdev/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/switchdev/Kconfig b/net/switchdev/Kconfig
index 86a47e1..651fa20 100644
--- a/net/switchdev/Kconfig
+++ b/net/switchdev/Kconfig
@@ -3,7 +3,7 @@
 #
 
 config NET_SWITCHDEV
-   bool "Switch (and switch-ish) device support (EXPERIMENTAL)"
+   bool "Switch (and switch-ish) device support"
depends on INET
---help---
  This module provides glue between core networking code and device
-- 
2.7.4



Re: QRTR merge conflict resolution (was: Re: linux-next: build failure after merge of the net-next tree)

2016-05-14 Thread Arnd Bergmann
On Friday 13 May 2016 17:47:17 Andy Gross wrote:
> On 13 May 2016 at 17:19, Bjorn Andersson  wrote:
> > On Fri 13 May 14:01 PDT 2016, Arnd Bergmann wrote:
> >
> >> On Tuesday 10 May 2016 11:39:34 Bjorn Andersson wrote:
> > [..]
> >> > I assume we could have the QRTR go through Andy and arm-soc, with
> >> > David's approval and this fix squashed in. But we're running rather late
> >> > in this cycle, perhaps we should just back the QRTR patches out and I
> >> > can respin and resend them after the merge window (for v4.8 instead)?
> >>
> >> I'd suggest you do a merge of net-next with the qcom/soc-2 branch that
> >> we have in arm-soc and resolve the conflict in the merge, then send
> >> a pull request with the merge to davem.
> >>
> >
> > Hi David,
> >
> > In case you missed this thread, linux-next highlighted an upcoming merge
> > conflict between the net-next and one of the branches included in the
> > arm-soc trees.
> >
> > I have prepared the merge of net-next and the conflicting tag from the
> > Qualcomm SOC, please include this in your pull towards Linus to avoid
> > the merge conflict.
> >
> > Regards,
> > Bjorn
> >
> > The following changes since commit ed7cbbce544856b20e5811de373cf92e92499771:
> >
> >   udp: Resolve NULL pointer dereference over flow-based vxlan device 
> > (2016-05-13 01:56:14 -0400)
> 
> 
> OK. The contents look good to me.
> 
> Acked-by: Andy Gross 

Acked-by: Arnd Bergmann 


Re: [PATCH RFT 1/2] phylib: add device reset GPIO support

2016-05-14 Thread Sergei Shtylyov

Hello.

On 05/14/2016 02:44 AM, Andrew Lunn wrote:


>>>>> Another issue is that on some boards we have one reset line tied to
>>>>> multiple PHYs. How do we prevent multiple resets taking place when each of
>>>>> the PHYs are registered?
>>>>
>>>>   My patch just doesn't address this case -- it's about the
>>>> individual resets only.
>>>
>>> This actually needs to be addressed a layer above. What you have is a
>>> bus reset, not a device reset.
>>
>>    No.
>>    There's simply no such thing as a bus reset for the xMII/MDIO
>> busses, there's simply no reset signaling on them. Every device has
>> its own reset signal and its own timing requirements.
>
> Except in the case above, where two phys are sharing the same reset
> signal. So although it is not part of the mdio standard to have a bus
> reset, this is in effect what the gpio line is doing, resetting all
> devices on the bus. If you don't model that as a bus reset, how do you
> model it?

   I'm not suggesting that the shared reset should be handled by my patch.
Contrariwise, I suggested to use the mii_bus::reset() method -- I see it as a
necessary evil. However, in the more common case of a single PHY, this method
simply doesn't scale -- you'd have to teach each and every individual MAC/MDIO
driver to do the GPIO reset trick.

>  Andrew


MBR, Sergei



Re: [PATCH net] net/route: enforce hoplimit max value

2016-05-14 Thread David Miller
From: Paolo Abeni 
Date: Fri, 13 May 2016 18:33:41 +0200

> Currently, when creating or updating a route, no check is performed
> on the hoplimit value in either the ipv4 or the ipv6 code.
> 
> The caller can, e.g., set hoplimit to 256, and when such a route is
> used, packets will be sent with hoplimit/ttl equal to 0.
> 
> This commit adds checks for the RTAX_HOPLIMIT value in both the ipv4
> and ipv6 route code, substituting any value greater than 255 with 255.
> 
> This is consistent with what is currently done for ADVMSS and MTU
> in the ipv4 code.
> 
> Signed-off-by: Paolo Abeni 

Applied, thanks for fixing this.
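
As a rough illustration of the behaviour described in the commit message
above, here is a user-space sketch in C. It is not the applied kernel change;
the helper names clamp_hoplimit() and hoplimit_on_wire() are hypothetical and
exist only for this example. It shows why an unchecked value of 256 ends up as
0 in an 8-bit field, and what substituting 255 does instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: substitute any hop limit above 255 with 255,
 * as the commit message describes for RTAX_HOPLIMIT. */
static uint32_t clamp_hoplimit(uint32_t val)
{
	return val > 255 ? 255 : val;
}

/* What an 8-bit hoplimit/ttl header field carries when the metric is
 * stored without any check: only the low 8 bits survive. */
static uint8_t unchecked_on_wire(uint32_t val)
{
	return (uint8_t)val;
}

/* Value placed in the header once the metric has been clamped. */
static uint8_t hoplimit_on_wire(uint32_t val)
{
	uint32_t clamped = clamp_hoplimit(val);

	assert(clamped <= 255);
	return (uint8_t)clamped;
}
```

With these helpers, a metric of 256 goes out as 0 when unchecked and as 255
once clamped, while ordinary values such as 64 pass through unchanged.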


Re: [PATCH] nf_conntrack: avoid kernel pointer value leak in slab name

2016-05-14 Thread David Miller
From: Linus Torvalds 
Date: Sat, 14 May 2016 11:24:08 -0700 (PDT)

> 
> From: Linus Torvalds 
> Date: Sat, 14 May 2016 11:11:44 -0700
> Subject: [PATCH] nf_conntrack: avoid kernel pointer value leak in slab name
> 
> The slab name ends up being visible in the directory structure under
> /sys, and even if you don't have access rights to the file you can see
> the filenames.
> 
> Just use a 64-bit counter instead of the pointer to the 'net' structure
> to generate a unique name.
> 
> This code will go away in 4.7 when the conntrack code moves to a single
> kmemcache, but this is the backportable simple solution to avoiding
> leaking kernel pointers to user space.
> 
> Signed-off-by: Linus Torvalds 
> Acked-by: Eric Dumazet 
> Cc: sta...@vger.kernel.org

Applied, thanks.
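
The idea in the patch (derive the unique part of the name from a counter
rather than from a pointer value) can be sketched in user-space C11. This is
only an illustration under assumptions: make_slab_name() is a hypothetical
helper, and atomic_fetch_add() plus one stands in for the kernel's
atomic64_inc_return().

```c
#include <assert.h>
#include <inttypes.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

/* Process-wide counter; stands in for the patch's static atomic64_t. */
static atomic_uint_fast64_t unique_id;

/* Hypothetical helper: write "nf_conntrack_<n>" into buf.
 * Returns the number of characters written, or -1 if buf is too small. */
static int make_slab_name(char *buf, size_t len)
{
	/* fetch_add returns the old value; adding 1 gives the new one. */
	uint64_t id = (uint64_t)atomic_fetch_add(&unique_id, 1) + 1;
	int n = snprintf(buf, len, "nf_conntrack_%" PRIu64, id);

	assert(n >= 0);
	return (size_t)n < len ? n : -1;
}
```

Each call yields a distinct name, and nothing about the address of any
structure appears in it.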


Re: [PATCH] r8169: default to 64-bit DMA on recent PCIe chips

2016-05-14 Thread David Miller
From: Ard Biesheuvel 
Date: Sat, 14 May 2016 14:08:54 +0200

> The reordering above is actually unnecessary, it crept in inadvertently.

Then post a new version of the patch without it.


Re: What ixgbe devices support HWTSTAMP_FILTER_ALL for hardware time stamping?

2016-05-14 Thread Guy Harris
On May 14, 2016, at 12:41 AM, Jeff Kirsher  wrote:

> Are you planning to produce a patch or are you wanting us to do the work to
> fix the issue?  Just asking so that work is not duplicated.

I'm willing to produce the patches, although

1) I don't currently have a platform set up to test whether they compile;

2) I don't have hardware on which to test whether they work (the person who
submitted

https://github.com/the-tcpdump-group/tcpdump/issues/393#issuecomment-218442072

does, although he doesn't have X550 hardware - I guess he wants hardware time
stamping on an 82599EB; it sounds as if that won't work, and his patch to the
driver won't give him what he wants);

3) patches from the driver maintainers might 1) be more likely to do the right
thing and 2) be more likely to be accepted.


Re: What ixgbe devices support HWTSTAMP_FILTER_ALL for hardware time stamping?

2016-05-14 Thread Guy Harris
On May 14, 2016, at 12:30 AM, Richard Cochran  wrote:

> On Fri, May 13, 2016 at 04:12:52PM -0700, Guy Harris wrote:
>> The Linux implementation currently implements the inquiry by doing a
>> ETHTOOL_GET_TS_INFO SIOETHTOOL ioctl and looking at the
>> so_timestamping bits, if the linux/ethtool.h header defines
>> ETHTOOL_GET_TS_INFO and the ioctl succeeds on the device.
> 
> So far, so good. 
> 
>> This is inadequate - as libpcap requests hardware time stamping for
>> all packets, it should also check whether HWTSTAMP_FILTER_ALL is set
>> in rx_filters, and only offer hardware time stamping if it's set.
> 
> The SO_TIMESTAMPING and SIOCSHWTSTAMP interfaces predate
> ETHTOOL_GET_TS_INFO, and they work fine without it.  Applications
> should simply use SIOCSHWTSTAMP to request the mode that they need and
> check the result.

So if you have a GUI application for packet capture, with a combo box to select
the type of time stamping, should it:

1) regardless of whether ETHTOOL_GET_TS_INFO is available, open the adapter,
try each of the time stamp types to see whether it works, and show a combo box
based on that;

2) use ETHTOOL_GET_TS_INFO if available;

3) offer all possibilities regardless of whether they work with the adapter or
not, and just report an error for possibilities that don't work?

My preference is 2) - which is the main reason why libpcap offers "what
possibilities are available?" APIs, not just "request this possibility" APIs.


Re: [patch net-next 1/4] netdevice: add SW statistics ndo

2016-05-14 Thread Jiri Pirko
Sat, May 14, 2016 at 05:47:41PM CEST, ro...@cumulusnetworks.com wrote:
>On 5/14/16, 5:49 AM, Jiri Pirko wrote:
>> Fri, May 13, 2016 at 08:47:48PM CEST, ro...@cumulusnetworks.com wrote:
>>> On 5/12/16, 11:03 PM, Jiri Pirko wrote:
>>>> Thu, May 12, 2016 at 11:10:08PM CEST, ro...@cumulusnetworks.com wrote:
>>>>> On 5/12/16, 4:48 AM, Jiri Pirko wrote:
>>>>>> From: Nogah Frankel
>>>>>>
>>>>>> Till now we had a ndo statistics function that returned SW statistics.
>>>>>> We want to change the "basic" statistics to return HW statistics if
>>>>>> available.
>>>>>> In this case we need to expose a new ndo to return the SW statistics.
>>>>>> Add a new ndo declaration to get SW statistics
>>>>>> Add a function that gets SW statistics if a compatible ndo exists
>>>>>>
>>>>>> Signed-off-by: Nogah Frankel
>>>>>> Reviewed-by: Ido Schimmel
>>>>>> Signed-off-by: Jiri Pirko
>>>>>> ---
>>>>>>
>>>>> To me netdev stats is combined 'SW + HW' stats for that netdev.
>>>>> ndo_get_stats64 callback into the drivers does the magic of adding HW stats
>>>>> to SW (netdev) stats and returning (see enic_get_stats). HW stats is available for netdevs
>>>>> that are offloaded or are backed by hardware. SW stats is the stats that the driver maintains
>>>>> (logical or physical). HW stats is queried and added to the SW stats.
>>>> I'm not sure I follow. HW stats already contain SW stats. Because on
>>>> slow path every packet that is not offloaded and goes through kernel is
>>>> counted into HW stats as well (because it goes through HW port).
>>> yes, correct... we don't want to double count those. But since these stats are
>>> generally queried from hw, I am calling them HW stats.
>>> you will not really maintain a software counter for this. But, the driver can maintain its own
>>> counters for rx and tx errors etc and I call these SW stats. They are counted at the driver.
>>>
>>>> If you
>>>> do HW stats + SW stats, what you get makes no sense. Am I missing something?
>>> If you go by my definition of HW and SW stats above, on a ndo_get_stats64() call,
>>> you will add the SW counters + HW counters and return. In my definition, the pkts
>>> that were rx'ed or tx'ed successfully are always in the HW count.
>>>
>>>> Btw, looking at enic_get_stats, looks exactly what we introduce for
>>>> mlxsw in this patchset.
>>> In enic_get_stats, the ones counted in software are the ones taken from 'enic->'
>>> net_stats->rx_over_errors = enic->rq_truncated_pkts;
>>> net_stats->rx_crc_errors = enic->rq_bad_fcs;
>>>
>>>> With this patchset, we only allow user to see the actual stats for
>>>> slow-path aka SW stats.
>>> hmm...ok. But i am not sure how many will use this new attribute.
>>> When you do 'ip -s link show' you really want all counters on that port
>>> hardware or software does not matter at that point.
>>>
>>> My suggestion to move this to ethtool like attribute is because that is an existing
>>> way to break down your stats whichever way you want. And the best part is it can be
>>> customized (say rx_pkts_cpu_saw)
>> I believe that ethtool is really not a place to expose sw stats. Does
>> not make sense.
>2 things:
>- i was surprised you don't want your ndo_get_stats64 to be a unified view of HW and SW stats

Roopa, please, look at the patch 4/4. That is exactly what we are doing.
We expose HW stats via ndo_get_stats64 and that is of course including
whatever comes through slowpath (non-forwarded in HW).


>- by bringing up ethtool like stats (IFLA_STATS_LINK_HW_EXTENDED) I am just saying
>it has always been a way to breakdown stats. If you don't want to show explicit SW stats there,
>there is always a way to show HW only stats, and now you know the delta between the unified stats
>and the HW only stats is your SW stats.

I think we don't understand each other. HW stats always include SW
stats. Because whatever goes in or out goes through HW. Therefore, the
"unified stats" you mention are exactly HW stats.

That is fine, and patch 4/4 is what makes it correct. However, I think
it has value for user to know what went via slowpath (non-forwarded in HW).
And that is exactly what is exposed by the SW stats we try to add.

Is that confusing?


Re: [PATCH] nf_conntrack: avoid kernel pointer value leak in slab name

2016-05-14 Thread Eric Dumazet
On Sat, 2016-05-14 at 11:24 -0700, Linus Torvalds wrote:
> From: Linus Torvalds 
> Date: Sat, 14 May 2016 11:11:44 -0700
> Subject: [PATCH] nf_conntrack: avoid kernel pointer value leak in slab name
> 
> The slab name ends up being visible in the directory structure under
> /sys, and even if you don't have access rights to the file you can see
> the filenames.
> 
> Just use a 64-bit counter instead of the pointer to the 'net' structure
> to generate a unique name.
> 
> This code will go away in 4.7 when the conntrack code moves to a single
> kmemcache, but this is the backportable simple solution to avoiding
> leaking kernel pointers to user space.
> 
> Signed-off-by: Linus Torvalds 
> Acked-by: Eric Dumazet 
> Cc: sta...@vger.kernel.org
> ---
> 
> This would seem to be the minimal patch.
> 
> Eric - I marked you as "acking" this patch from the discussion. It's not 
> actually any of the exact patches that were flying around, but close 
> enough..
> 
> It's been "tested" by booting and looking at the end result. Seems to 
> work, and it's not exactly complicated.
> 
> diff --git a/net/netfilter/nf_conntrack_core.c 
> b/net/netfilter/nf_conntrack_core.c
> index 895d11dced3c..e27fd17c6743 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -1778,6 +1778,7 @@ void nf_conntrack_init_end(void)
>  
>  int nf_conntrack_init_net(struct net *net)
>  {
> + static atomic64_t unique_id;
>   int ret = -ENOMEM;
>   int cpu;
>  
> @@ -1800,7 +1801,8 @@ int nf_conntrack_init_net(struct net *net)
>   if (!net->ct.stat)
>   goto err_pcpu_lists;
>  
> - net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%p", net);
> + net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%llu",
> + (u64)atomic64_inc_return(&unique_id));
>   if (!net->ct.slabname)
>   goto err_slabname;
>  

SGTM, thanks Linus.

Fixes: 5b3501faa874 ("netfilter: nf_conntrack: per netns nf_conntrack_cachep")




[PATCH] nf_conntrack: avoid kernel pointer value leak in slab name

2016-05-14 Thread Linus Torvalds

From: Linus Torvalds 
Date: Sat, 14 May 2016 11:11:44 -0700
Subject: [PATCH] nf_conntrack: avoid kernel pointer value leak in slab name

The slab name ends up being visible in the directory structure under
/sys, and even if you don't have access rights to the file you can see
the filenames.

Just use a 64-bit counter instead of the pointer to the 'net' structure
to generate a unique name.

This code will go away in 4.7 when the conntrack code moves to a single
kmemcache, but this is the backportable simple solution to avoiding
leaking kernel pointers to user space.

Signed-off-by: Linus Torvalds 
Acked-by: Eric Dumazet 
Cc: sta...@vger.kernel.org
---

This would seem to be the minimal patch.

Eric - I marked you as "acking" this patch from the discussion. It's not 
actually any of the exact patches that were flying around, but close 
enough..

It's been "tested" by booting and looking at the end result. Seems to 
work, and it's not exactly complicated.

diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 895d11dced3c..e27fd17c6743 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1778,6 +1778,7 @@ void nf_conntrack_init_end(void)
 
 int nf_conntrack_init_net(struct net *net)
 {
+	static atomic64_t unique_id;
 	int ret = -ENOMEM;
 	int cpu;
 
@@ -1800,7 +1801,8 @@ int nf_conntrack_init_net(struct net *net)
 	if (!net->ct.stat)
 		goto err_pcpu_lists;
 
-	net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%p", net);
+	net->ct.slabname = kasprintf(GFP_KERNEL, "nf_conntrack_%llu",
+				(u64)atomic64_inc_return(&unique_id));
 	if (!net->ct.slabname)
 		goto err_slabname;
 


Re: BUG: net/tipc: NULL-ptr dereference in tipc_nl_publ_dump

2016-05-14 Thread Eric Dumazet
On Sat, 2016-05-14 at 23:22 +0800, Baozeng Ding wrote:
> Hello all,
> The following program triggers NULL-ptr dereference in 
> tipc_nl_publ_dump. The kernel version is 4.6.0-rc7+ (on May 13 commit
> 1410b74e4061e05a5d2bffb1f99829efce27c8a9). Thanks.
> --
> netlink: 1 bytes leftover after parsing attributes in process 
> `syz-executor'.
> kasan: CONFIG_KASAN_INLINE enabled
> kasan: GPF could be caused by NULL-ptr deref or user memory 
> accessgeneral protection fault:  [#1] SMP KASAN
> Modules linked in:
> CPU: 2 PID: 1346 Comm: syz-executor Not tainted 4.6.0-rc7+ #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> task: 88001eb1dd40 ti: 88001bd98000 task.ti: 88001bd98000
> RIP: 0010:[]  [] 
> tipc_nl_publ_dump+0xa39/0xdf0
> RSP: 0018:88001bd9f428  EFLAGS: 00010246
> RAX: dc00 RBX: 88003562efc0 RCX: c900012c7000
> RDX:  RSI: 880036215d98 RDI: 8800196fda98
> RBP: 88001bd9f678 R08: 0001 R09: 
> R10: ed00032dfb5a R11: 11131255 R12: 
> R13: 88002d0f8040 R14:  R15: 88002ea220a8
> FS:  7f0b7c70f700() GS:88003620() knlGS:
> CS:  0010 DS:  ES:  CR0: 80050033
> CR2: 20b5d7f2 CR3: 301fe000 CR4: 06e0
> Stack:
>    88002ea22100 88002ea220f8 88002ea220f0
>   1bd9f520 1100037b3e92 88002ea220b0 88001bd9f498
>   815bcc6e 880036223e40 88002fd60008 
> Call Trace:
>   [] genl_lock_dumpit+0x68/0x90 
> net/netlink/genetlink.c:517
>   [] netlink_dump+0x36a/0xa40 
> net/netlink/af_netlink.c:2108
>   [] __netlink_dump_start+0x4e9/0x760 
> net/netlink/af_netlink.c:2196
>   [] genl_family_rcv_msg+0xa91/0xc30 
> net/netlink/genetlink.c:584
>   [] genl_rcv_msg+0x1ab/0x260 net/netlink/genetlink.c:658
>   [] netlink_rcv_skb+0x29c/0x390 
> net/netlink/af_netlink.c:2277
>   [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669
>   [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
>   [] netlink_unicast+0x5a2/0x890 
> net/netlink/af_netlink.c:1240
>   [] netlink_sendmsg+0x981/0xcb0 
> net/netlink/af_netlink.c:1786
>   [< inline >] sock_sendmsg_nosec net/socket.c:612
>   [] sock_sendmsg+0xca/0x110 net/socket.c:622
>   [] ___sys_sendmsg+0x728/0x860 net/socket.c:1946
>   [] __sys_sendmsg+0xd1/0x170 net/socket.c:1980
>   [< inline >] SYSC_sendmsg net/socket.c:1991
>   [] SyS_sendmsg+0x2d/0x50 net/socket.c:1987
>   [] entry_SYSCALL_64_fastpath+0x23/0xc1 
> arch/x86/entry/entry_64.S:207
> Code: df 49 8d 7e 10 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 df 01 00 00 
> 4d 8b 76 10 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <0f> b6 
> 14 02 4c 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85
> RIP  [] tipc_nl_publ_dump+0xa39/0xdf0 
> net/tipc/socket.c:2810
>   RSP 
> ---[ end trace e8355fded2057a4f ]---

Probable fix :

diff --git a/net/tipc/socket.c b/net/tipc/socket.c
index 3eeb50a27b89..5f80d3fa9c85 100644
--- a/net/tipc/socket.c
+++ b/net/tipc/socket.c
@@ -2807,6 +2807,9 @@ int tipc_nl_publ_dump(struct sk_buff *skb, struct netlink_callback *cb)
 	if (err)
 		return err;
 
+	if (!attrs[TIPC_NLA_SOCK])
+		return -EINVAL;
+
 	err = nla_parse_nested(sock, TIPC_NLA_SOCK_MAX,
 			       attrs[TIPC_NLA_SOCK],
 			       tipc_nl_sock_policy);




Re: [v4.6-rc7-183-g1410b74e4061]

2016-05-14 Thread Eric Dumazet
On Sat, May 14, 2016 at 2:22 AM, Sedat Dilek  wrote:
> Hi,
>
> as Linux v4.6 is very near, I decided to write this bug report (only
> drunk one coffee).
>
> First, I am not absolutely sure if this is a real issue as...
> #1: This is only a (lockdep) warning.
> #2: I have not a "vanilla" Linux v4.6-rc7+ here (see P.S. and attached patch)
>
> For a more helpful feedback I should test a...
> #1: vanilla v4.6-rc7-183-g1410b74e4061
> #2: net.git#master on top of #1
>
> What I am seeing is this while surfing with a UMTS/HSPA internet-stick
> (using PPP) and running Firefox on Ubuntu/precise AMD64...
>
> [  423.484105] [ cut here ]
> [  423.484119] WARNING: CPU: 2 PID: 2392 at
> kernel/locking/lockdep.c:2098 __lock_acquire
> [  423.484123] DEBUG_LOCKS_WARN_ON(chain_hlocks[chain->base
> [  423.484125] Modules linked in: btrfs xor raid6_pq ntfs xfs
> libcrc32c ppp_deflate bsd_comp ppp_async crc_ccitt option usb_wwan
> cdc_ether usbserial usbnet snd_hda_codec_hdmi i915
> snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel
> snd_hda_codec arc4 uvcvideo iwldvm snd_hwdep joydev mac80211
> videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_core
> videodev rfcomm bnep kvm_intel kvm usb_storage btusb i2c_algo_bit
> iwlwifi btrtl snd_hda_core drm_kms_helper btbcm snd_pcm parport_pc
> btintel syscopyarea irqbypass ppdev snd_seq_midi bluetooth sysfillrect
> psmouse snd_seq_midi_event sysimgblt snd_rawmidi fb_sys_fops cfg80211
> snd_seq drm serio_raw snd_timer samsung_laptop snd_seq_device snd
> soundcore wmi mac_hid video intel_rst lpc_ich lp parport binfmt_misc
> hid_generic usbhid hid r8169 mii
> [  423.484241] CPU: 2 PID: 2392 Comm: firefox Not tainted
> 4.6.0-rc7-183.1-iniza-small #1
> [  423.484244] Hardware name: SAMSUNG ELECTRONICS CO., LTD.
> 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
> [  423.484247]   88011fa83910 81413825
> 88011fa83960
> [  423.484253]   88011fa83950 81083ea1
> 083282f34ec0
> [  423.484259]  0005 880082f34540 
> 0027
> [  423.484265] Call Trace:
> [  423.484268][] dump_stack
> [  423.484280]  [] __warn
> [  423.484285]  [] warn_slowpath_fmt
> [  423.484289]  [] __lock_acquire
> [  423.484294]  [] ? __lock_acquire
> [  423.484298]  [] ? __lock_acquire
> [  423.484302]  [] lock_acquire
> [  423.484307]  [] ? __dev_queue_xmit
> [  423.484313]  [] _raw_spin_lock
> [  423.484317]  [] ? __dev_queue_xmit
> [  423.484321]  [] __dev_queue_xmit
> [  423.484326]  [] ? __dev_queue_xmit
> [  423.484330]  [] dev_queue_xmit
> [  423.484334]  [] neigh_direct_output
> [  423.484339]  [] ip_finish_output2
> [  423.484344]  [] ? ip_finish_output2
> [  423.484349]  [] ip_finish_output
> [  423.484353]  [] ip_output
> [  423.484357]  [] ? __lock_is_held
> [  423.484362]  [] ip_local_out
> [  423.484366]  [] ip_queue_xmit
> [  423.484371]  [] ? ip_queue_xmit
> [  423.484376]  [] tcp_transmit_skb
> [  423.484380]  [] __tcp_retransmit_skb
> [  423.484385]  [] tcp_retransmit_skb
> [  423.484389]  [] tcp_retransmit_timer
> [  423.484394]  [] ? tcp_write_timer_handler
> [  423.484398]  [] tcp_write_timer_handler
> [  423.484402]  [] tcp_write_timer
> [  423.484407]  [] call_timer_fn
> [  423.484411]  [] ? call_timer_fn
> [  423.484416]  [] ? tcp_write_timer_handler
> [  423.484419]  [] run_timer_softirq
> [  423.484424]  [] __do_softirq
> [  423.484428]  [] irq_exit
> [  423.484432]  [] smp_apic_timer_interrupt
> [  423.484437]  [] apic_timer_interrupt
> [  423.484439]  
> [  423.484443] ---[ end trace a29d8ee0ef420d5c ]---
> [  423.484446]
> [  423.484447] ==
> [  423.484449] [chain_key collision ]
> [  423.484452] 4.6.0-rc7-183.1-iniza-small #1 Tainted: GW
> [  423.484454] --
> [  423.484457] firefox/2392: Hash chain already cached but the
> contents don't match!
> [  423.484460] Held locks:depth: 6
> [  423.484463]  class_idx:1993 -> chain_key:07c9
> (((&icsk->icsk_retransmit_timer))){
> [  423.484473]  class_idx:1334 -> chain_key:00f92536 (slock-AF_INET){
> [  423.484482]  class_idx:33 -> chain_key:001f24a6c021
> (rcu_read_lock){..}, at: [] ip_queue_xmit
> [  423.484492]  class_idx:1005 -> chain_key:0003e494d80423ed
> (rcu_read_lock_bh){..}, at: [] ip_finish_output2
> [  423.484500]  class_idx:1005 -> chain_key:7c929b00847da3ed
> (rcu_read_lock_bh){..}, at: [] __dev_queue_xmit
> [  423.484509]  class_idx:1996 -> chain_key:5360108fb47da85e
> (dev->qdisc_tx_busylock ?: &qdisc_tx_busylock){
> [  423.484517] Locks in cached chain:depth: 6
> [  423.484520]  class_idx:1998 -> chain_key:07ce
> (((&sk->sk_timer))){
> [  423.484525]  class_idx:1334 -> chain_key:00f9c536 (slock-AF_INET){
> [  423.484531]  class_idx:33 -> chain_key:001f38a6c021
> (rcu_read_lock){..}
> [  423.484536]  class_idx:1005 -> chain_key:0003e714d80423ed
> (rcu_read_lock_

[PATCH] ethernet:arc: Fix racing of TX ring buffer

2016-05-14 Thread Shuyu Wei
The tail of the ring buffer (txbd_dirty) should never move ahead of the
head (txbd_curr), or the ring buffer will be corrupted.

This is the root cause of the race.

Besides, setting the FOR_EMAC flag should be the last step in modifying
the buffer descriptor; otherwise a race is possible.

Signed-off-by: Shuyu Wei 
---

diff --git a/drivers/net/ethernet/arc/emac_main.c b/drivers/net/ethernet/arc/emac_main.c
index a3a9392..5ece05b 100644
--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -155,7 +155,7 @@ static void arc_emac_tx_clean(struct net_device *ndev)
 	struct net_device_stats *stats = &ndev->stats;
 	unsigned int i;
 
-	for (i = 0; i < TX_BD_NUM; i++) {
+	for (i = priv->txbd_dirty; i != priv->txbd_curr; i = (i + 1) % TX_BD_NUM) {
 		unsigned int *txbd_dirty = &priv->txbd_dirty;
 		struct arc_emac_bd *txbd = &priv->txbd[*txbd_dirty];
 		struct buffer_state *tx_buff = &priv->tx_buff[*txbd_dirty];
@@ -686,12 +686,12 @@ static int arc_emac_tx(struct sk_buff *skb, struct net_device *ndev)
 
 	skb_tx_timestamp(skb);
 
+	priv->tx_buff[*txbd_curr].skb = skb;
 	*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
 
 	/* Make sure info word is set */
 	wmb();
 
-	priv->tx_buff[*txbd_curr].skb = skb;
 
 	/* Increment index to point to the next BD */
 	*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
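
The invariant the patch relies on (the cleaner walks from the dirty index and
stops when it reaches the current index) can be shown with a single-threaded
model in plain C. This is only a sketch under assumptions: struct tx_ring,
RING_SIZE and the owned_by_hw[] array are hypothetical stand-ins for the
driver's private data, TX_BD_NUM and the FOR_EMAC flag, the full-ring check is
this example's own choice, and no memory barriers are modelled.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8u	/* hypothetical; the driver uses TX_BD_NUM */

struct tx_ring {
	unsigned int curr;		/* head: next slot the sender fills */
	unsigned int dirty;		/* tail: next slot the cleaner reclaims */
	int owned_by_hw[RING_SIZE];	/* stands in for the FOR_EMAC flag */
};

/* Queue one descriptor. Returns 0 on success, -1 if the ring is full.
 * One slot is kept free so that curr == dirty always means "empty". */
static int ring_queue(struct tx_ring *r)
{
	unsigned int next;

	assert(r != NULL);
	next = (r->curr + 1) % RING_SIZE;
	if (next == r->dirty)
		return -1;

	/* Handing the slot to the hardware is the last change made to it. */
	r->owned_by_hw[r->curr] = 1;
	r->curr = next;
	return 0;
}

/* Reclaim finished descriptors. The walk starts at dirty and stops at
 * curr, so the tail can never move ahead of the head. */
static unsigned int ring_clean(struct tx_ring *r)
{
	unsigned int cleaned = 0;

	assert(r != NULL);
	while (r->dirty != r->curr && !r->owned_by_hw[r->dirty]) {
		r->dirty = (r->dirty + 1) % RING_SIZE;
		cleaned++;
	}
	return cleaned;
}
```

On an empty ring ring_clean() does nothing, whereas a loop that always visits
every slot has no such stopping point and can step the tail past descriptors
that were never queued.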



Re: [patch net-next 1/4] netdevice: add SW statistics ndo

2016-05-14 Thread Roopa Prabhu
On 5/14/16, 5:49 AM, Jiri Pirko wrote:
> Fri, May 13, 2016 at 08:47:48PM CEST, ro...@cumulusnetworks.com wrote:
>> On 5/12/16, 11:03 PM, Jiri Pirko wrote:
>>> Thu, May 12, 2016 at 11:10:08PM CEST, ro...@cumulusnetworks.com wrote:
 On 5/12/16, 4:48 AM, Jiri Pirko wrote:
> From: Nogah Frankel 
>
> Till now we had a ndo statistics function that returned SW statistics.
> We want to change the "basic" statistics to return HW statistics if
> available.
> In this case we need to expose a new ndo to return the SW statistics.
> Add a new ndo declaration to get SW statistics
> Add a function that gets SW statistics if a competible ndo exist
>
> Signed-off-by: Nogah Frankel 
> Reviewed-by: Ido Schimmel 
> Signed-off-by: Jiri Pirko 
> ---
>
 To me netdev stats is  combined 'SW + HW' stats for that netdev.
 ndo_get_stats64 callback into the drivers does the magic of adding HW stats
 to SW (netdev) stats and returning (see enic_get_stats). HW stats is 
 available for netdevs
 that are offloaded or are backed by hardware. SW stats is the stats that 
 the driver maintains
 (logical or physical). HW stats is queried and added to the SW stats.
>>> I'm not sure I follow. HW stats already contain SW stats. Because on
>>> slow path every packet that is not offloaded and goes through kernel is
>>> counted into HW stats as well (because it goes through HW port). 
>> yes, correct... we don't want to double count those. But since these stats 
>> are
>> generally queried from hw, I am calling them HW stats.
>> you will not really maintain a software counter for this. But, the driver 
>> can maintain its own
>> counters for rx and tx errors etc and I call these SW stats. They are 
>> counted at the driver.
>>
>>> If you
>>> do HW stats + SW stats, what you get makes no sense. Am I missing something?
>> If you go by my definition of HW and SW stats above, on a ndo_get_stats64() 
>> call,
>> you will add the SW counters + HW counters and return. In my definition, the 
>> pkts
>> that was rx'ed or tx'ed successfully are always in the HW count.
>>
>>> Btw, looking at enic_get_stats, looks exactly what we introduce for
>>> mlxsw in this patchset.
>> In enic_get_stats, the ones counted in software are the ones taken from 
>> 'enic->'
>> net_stats->rx_over_errors = enic->rq_truncated_pkts;
>> net_stats->rx_crc_errors = enic->rq_bad_fcs;
>>
>>> With this patchset, we only allow user to se the actual stats for
>>> slow-path aka SW stats.
>> hmm...ok. But i am not sure how many will use this new attribute.
>> When you do 'ip -s link show' you really want all counters on that port
>> hardware or software does not matter at that point.
>>
>> My suggestion to move this to ethtool like attribute is because that is an 
>> existing
>> way to break down your stats which ever way you want. And the best part is 
>> it can be
>> customized (say rx_pkts_cpu_saw)
> I bevieve that ethtool is really not a place to expose sw stats. Does
> not make sense.
2 things:
- i was surprised you don't want your ndo_get_stats64 to be a unified view of 
HW and SW stats
- by bringing up ethtool like stats (IFLA_STATS_LINK_HW_EXTENDED) I am just 
saying
it has always been a way to breakdown stats. If you don't want to show explicit 
SW stats there,
there is always a way to show HW only statsand now you know the delta 
between the unified stats
and the HW only stats is your SW stats.


Re: [linux-next: May 13] intel/iwlwifi/mvm/mvm.h:1069 suspicious rcu_dereference_protected() usage

2016-05-14 Thread Sergey Senozhatsky
On (05/15/16 01:31), Sergey Senozhatsky wrote:
> [11455.550649] ===
> [11455.550652] [ INFO: suspicious RCU usage. ]
> [11455.550657] 4.6.0-rc7-next-20160513-dbg-4-g8de8b92-dirty #655 Not 
> tainted
> [11455.550660] ---
> [11455.550664] drivers/net/wireless/intel/iwlwifi/mvm/mvm.h:1069 suspicious 
> rcu_dereference_protected() usage!
> [11455.550667] 
>other info that might help us debug this:
> 
> [11455.550671] 
>rcu_scheduler_active = 1, debug_locks = 0
> [11455.550675] 5 locks held by irq/29-iwlwifi/247:
> [11455.550677]  #0:  (sync_cmd_lockdep_map){..}, at: [] 
> iwl_pcie_irq_handler+0x0/0x635 [iwlwifi]
> [11455.550705]  #1:  (&(&rxq->lock)->rlock){+.+...}, at: [] 
> iwl_pcie_rx_handle+0x38/0x5d5 [iwlwifi]
> [11455.550725]  #2:  (rcu_read_lock){..}, at: [] 
> ieee80211_rx_napi+0x152/0x8e2 [mac80211]
> [11455.550768]  #3:  (&(&local->rx_path_lock)->rlock){+.-...}, at: 
> [] ieee80211_rx_handlers+0x2e/0x1fe1 [mac80211]
> [11455.550804]  #4:  (rcu_read_lock){..}, at: [] 
> iwl_mvm_update_tkip_key+0x0/0x162 [iwlmvm]
> [11455.550833] 


[ 5406.034379] iwlwifi :02:00.0: Queue 16 stuck for 1 ms.
[ 5406.034385] iwlwifi :02:00.0: Current SW read_ptr 98 write_ptr 125
[ 5406.034431] iwl data: : 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 
00  
[ 5406.034454] iwlwifi :02:00.0: FH TRBs(0) = 0x8000300f
[ 5406.034475] iwlwifi :02:00.0: FH TRBs(1) = 0xc0110071
[ 5406.034491] iwlwifi :02:00.0: FH TRBs(2) = 0x
[ 5406.034505] iwlwifi :02:00.0: FH TRBs(3) = 0x8030001e
[ 5406.034520] iwlwifi :02:00.0: FH TRBs(4) = 0x
[ 5406.034536] iwlwifi :02:00.0: FH TRBs(5) = 0x
[ 5406.034551] iwlwifi :02:00.0: FH TRBs(6) = 0x
[ 5406.034566] iwlwifi :02:00.0: FH TRBs(7) = 0x00709087
[ 5406.034625] iwlwifi :02:00.0: Q 0 is active and mapped to fifo 3 ra_tid 
0x [31,31]
[ 5406.034690] iwlwifi :02:00.0: Q 1 is active and mapped to fifo 2 ra_tid 
0x [0,0]
[ 5406.034756] iwlwifi :02:00.0: Q 2 is active and mapped to fifo 1 ra_tid 
0x [17,17]
[ 5406.034821] iwlwifi :02:00.0: Q 3 is active and mapped to fifo 0 ra_tid 
0x [16,16]
[ 5406.034886] iwlwifi :02:00.0: Q 4 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.034944] iwlwifi :02:00.0: Q 5 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035010] iwlwifi :02:00.0: Q 6 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035068] iwlwifi :02:00.0: Q 7 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035133] iwlwifi :02:00.0: Q 8 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035192] iwlwifi :02:00.0: Q 9 is active and mapped to fifo 7 ra_tid 
0x [136,136]
[ 5406.035257] iwlwifi :02:00.0: Q 10 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035323] iwlwifi :02:00.0: Q 11 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035388] iwlwifi :02:00.0: Q 12 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035446] iwlwifi :02:00.0: Q 13 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035505] iwlwifi :02:00.0: Q 14 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035563] iwlwifi :02:00.0: Q 15 is active and mapped to fifo 5 ra_tid 
0x [0,0]
[ 5406.035622] iwlwifi :02:00.0: Q 16 is active and mapped to fifo 1 ra_tid 
0x [98,125]
[ 5406.035687] iwlwifi :02:00.0: Q 17 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035752] iwlwifi :02:00.0: Q 18 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035817] iwlwifi :02:00.0: Q 19 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.035883] iwlwifi :02:00.0: Q 20 is inactive and mapped to fifo 0 
ra_tid 0xfffc [0,0]
[ 5406.035940] iwlwifi :02:00.0: Q 21 is inactive and mapped to fifo 0 
ra_tid 0x0003 [0,0]
[ 5406.035999] iwlwifi :02:00.0: Q 22 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.036064] iwlwifi :02:00.0: Q 23 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.036122] iwlwifi :02:00.0: Q 24 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.036188] iwlwifi :02:00.0: Q 25 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.036246] iwlwifi :02:00.0: Q 26 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.036305] iwlwifi :02:00.0: Q 27 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.036370] iwlwifi :02:00.0: Q 28 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.036428] iwlwifi :02:00.0: Q 29 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.036487] iwlwifi :02:00.0: Q 30 is inactive and mapped to fifo 0 
ra_tid 0x [0,0]
[ 5406.036555] iwlwifi :02:00.0: Microcode SW error detected.  Restarting 
0x200.
[ 5406.036558] iwlwifi :02:00.0: CSR values:
[ 5406.0

[linux-next: May 13] intel/iwlwifi/mvm/mvm.h:1069 suspicious rcu_dereference_protected() usage

2016-05-14 Thread Sergey Senozhatsky
Hello,

[11455.550649] ===
[11455.550652] [ INFO: suspicious RCU usage. ]
[11455.550657] 4.6.0-rc7-next-20160513-dbg-4-g8de8b92-dirty #655 Not tainted
[11455.550660] ---
[11455.550664] drivers/net/wireless/intel/iwlwifi/mvm/mvm.h:1069 suspicious 
rcu_dereference_protected() usage!
[11455.550667] 
   other info that might help us debug this:

[11455.550671] 
   rcu_scheduler_active = 1, debug_locks = 0
[11455.550675] 5 locks held by irq/29-iwlwifi/247:
[11455.550677]  #0:  (sync_cmd_lockdep_map){..}, at: [] 
iwl_pcie_irq_handler+0x0/0x635 [iwlwifi]
[11455.550705]  #1:  (&(&rxq->lock)->rlock){+.+...}, at: [] 
iwl_pcie_rx_handle+0x38/0x5d5 [iwlwifi]
[11455.550725]  #2:  (rcu_read_lock){..}, at: [] 
ieee80211_rx_napi+0x152/0x8e2 [mac80211]
[11455.550768]  #3:  (&(&local->rx_path_lock)->rlock){+.-...}, at: 
[] ieee80211_rx_handlers+0x2e/0x1fe1 [mac80211]
[11455.550804]  #4:  (rcu_read_lock){..}, at: [] 
iwl_mvm_update_tkip_key+0x0/0x162 [iwlmvm]
[11455.550833] 
   stack backtrace:
[11455.550840] CPU: 4 PID: 247 Comm: irq/29-iwlwifi Not tainted 
4.6.0-rc7-next-20160513-dbg-4-g8de8b92-dirty #655
[11455.550844]   880037ff78e8 81187f9c 
88041b7ea980
[11455.550854]  0001 880037ff7918 8106b836 
88041bc0e028
[11455.550863]   88041d247878 88041bc0e028 
880037ff7938
[11455.550872] Call Trace:
[11455.550883]  [] dump_stack+0x68/0x92
[11455.550890]  [] lockdep_rcu_suspicious+0xf7/0x100
[11455.550911]  [] iwl_mvm_get_key_sta.part.0+0x5d/0x80 
[iwlmvm]
[11455.550930]  [] iwl_mvm_update_tkip_key+0xd3/0x162 [iwlmvm]
[11455.550945]  [] iwl_mvm_mac_update_tkip_key+0x17/0x19 
[iwlmvm]
[11455.550973]  [] ieee80211_tkip_decrypt_data+0x22c/0x24b 
[mac80211]
[11455.550996]  [] ieee80211_crypto_tkip_decrypt+0xc5/0x110 
[mac80211]
[11455.551026]  [] ieee80211_rx_handlers+0x9bb/0x1fe1 
[mac80211]
[11455.551035]  [] ? __lock_is_held+0x3c/0x57
[11455.551063]  [] 
ieee80211_prepare_and_rx_handle+0xe89/0xf33 [mac80211]
[11455.551071]  [] ? debug_smp_processor_id+0x17/0x19
[11455.551098]  [] ieee80211_rx_napi+0x4bf/0x8e2 [mac80211]
[11455.551119]  [] iwl_mvm_rx_rx_mpdu+0x6af/0x754 [iwlmvm]
[11455.551134]  [] iwl_mvm_rx+0x44/0x6d [iwlmvm]
[11455.551147]  [] iwl_pcie_rx_handle+0x461/0x5d5 [iwlwifi]
[11455.551160]  [] iwl_pcie_irq_handler+0x452/0x635 [iwlwifi]
[11455.551167]  [] ? irq_finalize_oneshot+0xc9/0xc9
[11455.551172]  [] irq_thread_fn+0x18/0x2f
[11455.551176]  [] irq_thread+0x108/0x1b0
[11455.551183]  [] ? __schedule+0x48d/0x58f
[11455.551188]  [] ? wake_threads_waitq+0x28/0x28
[11455.551193]  [] ? irq_thread_dtor+0x93/0x93
[11455.551198]  [] kthread+0xf3/0xfb
[11455.551205]  [] ? _raw_spin_unlock_irq+0x27/0x45
[11455.551212]  [] ret_from_fork+0x1f/0x40
[11455.551217]  [] ? kthread_create_on_node+0x1ca/0x1ca

-ss


BUG: net/tipc: NULL-ptr dereference in tipc_nl_publ_dump

2016-05-14 Thread Baozeng Ding

Hello all,
The following program triggers a NULL-ptr dereference in 
tipc_nl_publ_dump. The kernel version is 4.6.0-rc7+ (as of the May 13 commit
1410b74e4061e05a5d2bffb1f99829efce27c8a9). Thanks.
--
netlink: 1 bytes leftover after parsing attributes in process 
`syz-executor'.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory 
accessgeneral protection fault:  [#1] SMP KASAN

Modules linked in:
CPU: 2 PID: 1346 Comm: syz-executor Not tainted 4.6.0-rc7+ #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
Ubuntu-1.8.2-1ubuntu1 04/01/2014

task: 88001eb1dd40 ti: 88001bd98000 task.ti: 88001bd98000
RIP: 0010:[]  [] 
tipc_nl_publ_dump+0xa39/0xdf0

RSP: 0018:88001bd9f428  EFLAGS: 00010246
RAX: dc00 RBX: 88003562efc0 RCX: c900012c7000
RDX:  RSI: 880036215d98 RDI: 8800196fda98
RBP: 88001bd9f678 R08: 0001 R09: 
R10: ed00032dfb5a R11: 11131255 R12: 
R13: 88002d0f8040 R14:  R15: 88002ea220a8
FS:  7f0b7c70f700() GS:88003620() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 20b5d7f2 CR3: 301fe000 CR4: 06e0
Stack:
  88002ea22100 88002ea220f8 88002ea220f0
 1bd9f520 1100037b3e92 88002ea220b0 88001bd9f498
 815bcc6e 880036223e40 88002fd60008 
Call Trace:
 [] genl_lock_dumpit+0x68/0x90 
net/netlink/genetlink.c:517
 [] netlink_dump+0x36a/0xa40 
net/netlink/af_netlink.c:2108
 [] __netlink_dump_start+0x4e9/0x760 
net/netlink/af_netlink.c:2196
 [] genl_family_rcv_msg+0xa91/0xc30 
net/netlink/genetlink.c:584

 [] genl_rcv_msg+0x1ab/0x260 net/netlink/genetlink.c:658
 [] netlink_rcv_skb+0x29c/0x390 
net/netlink/af_netlink.c:2277

 [] genl_rcv+0x28/0x40 net/netlink/genetlink.c:669
 [< inline >] netlink_unicast_kernel net/netlink/af_netlink.c:1214
 [] netlink_unicast+0x5a2/0x890 
net/netlink/af_netlink.c:1240
 [] netlink_sendmsg+0x981/0xcb0 
net/netlink/af_netlink.c:1786

 [< inline >] sock_sendmsg_nosec net/socket.c:612
 [] sock_sendmsg+0xca/0x110 net/socket.c:622
 [] ___sys_sendmsg+0x728/0x860 net/socket.c:1946
 [] __sys_sendmsg+0xd1/0x170 net/socket.c:1980
 [< inline >] SYSC_sendmsg net/socket.c:1991
 [] SyS_sendmsg+0x2d/0x50 net/socket.c:1987
 [] entry_SYSCALL_64_fastpath+0x23/0xc1 
arch/x86/entry/entry_64.S:207
Code: df 49 8d 7e 10 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 df 01 00 00 
4d 8b 76 10 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1 ea 03 <0f> b6 
14 02 4c 89 f0 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85
RIP  [] tipc_nl_publ_dump+0xa39/0xdf0 
net/tipc/socket.c:2810

 RSP 
---[ end trace e8355fded2057a4f ]---

#include 
#include 
#include 
#include 
#include 
#include 

int main()
{
mmap((void *)0x2000ul, 0xd7f000ul, PROT_READ|PROT_WRITE, 
MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0);

int sock = socket(AF_NETLINK, SOCK_RAW, NETLINK_GENERIC);
*(uint64_t*)0x2363 = (uint64_t)0x0;
*(uint32_t*)0x236b = (uint32_t)0x0;
*(uint64_t*)0x2373 = (uint64_t)0x20001ff0;
*(uint64_t*)0x237b = (uint64_t)0x1;
*(uint64_t*)0x2383 = (uint64_t)0x20aab000;
*(uint64_t*)0x238b = (uint64_t)0x5;
*(uint32_t*)0x2393 = (uint32_t)0x81;
*(uint64_t*)0x20001ff0 = (uint64_t)0x20001000;
*(uint64_t*)0x20001ff8 = (uint64_t)0x3e;
*(uint32_t*)0x20001000 = (uint32_t)0x15;
*(uint16_t*)0x20001004 = (uint16_t)0x22;
*(uint16_t*)0x20001006 = (uint16_t)0x71b;
*(uint32_t*)0x20001008 = (uint32_t)0x2;
*(uint32_t*)0x2000100c = (uint32_t)0x2;
*(uint8_t*)0x20001010 = (uint8_t)0x7;
*(uint8_t*)0x20001011 = (uint8_t)0x8;
*(uint8_t*)0x20001012 = (uint8_t)0xa0ad8f89e1b1651f;
*(uint8_t*)0x20001013 = (uint8_t)0x44;
*(uint8_t*)0x20001014 = (uint8_t)0x1;
*(uint32_t*)0x20001015 = (uint32_t)0x15;
*(uint16_t*)0x20001019 = (uint16_t)0xfffa;
*(uint16_t*)0x2000101b = (uint16_t)0x100;
*(uint32_t*)0x2000101d = (uint32_t)0x1ff;
*(uint32_t*)0x20001021 = (uint32_t)0x4;
*(uint8_t*)0x20001025 = (uint8_t)0x3;
*(uint8_t*)0x20001026 = (uint8_t)0x7;
*(uint8_t*)0x20001027 = (uint8_t)0x4;
*(uint8_t*)0x20001028 = (uint8_t)0x2;
*(uint8_t*)0x20001029 = (uint8_t)0x9;
*(uint32_t*)0x2000102a = (uint32_t)0x14;
*(uint16_t*)0x2000102e = (uint16_t)0x1;
*(uint16_t*)0x20001030 = (uint16_t)0x400;
*(uint32_t*)0x20001032 = (uint32_t)0x8000;
*(uint32_t*)0x20001036 = (uint32_t)0x60;
*(uint8_t*)0x2000103a = (uint8_t)0x1;
*(uint8_t*)0x2000103b = (uint8_t)0x1ff;
*(uint8_t*)0x2000103c = (uint8_t)0x3ff;
*(uint8_t


[iproute2 net-next repost 2/2] devlink: implement shared buffer occupancy control

2016-05-14 Thread Jiri Pirko
From: Jiri Pirko 

Use the kernel shared buffer occupancy control commands to take a snapshot
and clear the occupancy watermarks. Also, allow occupancy values to be shown
in a readable way.

Signed-off-by: Jiri Pirko 
---
 devlink/devlink.c | 378 ++
 1 file changed, 378 insertions(+)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index ca3f586..ffefa86 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -27,6 +27,12 @@
 
 #define pr_err(args...) fprintf(stderr, ##args)
 #define pr_out(args...) fprintf(stdout, ##args)
+#define pr_out_sp(num, args...)\
+   do {\
+   int ret = fprintf(stdout, ##args);  \
+   if (ret < num)  \
+   fprintf(stdout, "%*s", num - ret, "");  \
+   } while (0)
 
 static int _mnlg_socket_recv_run(struct mnlg_socket *nlg,
 mnl_cb_t data_cb, void *data)
@@ -275,6 +281,12 @@ static int attr_cb(const struct nlattr *attr, void *data)
if (type == DEVLINK_ATTR_SB_TC_INDEX &&
mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_OCC_CUR &&
+   mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_OCC_MAX &&
+   mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+   return MNL_CB_ERROR;
tb[type] = attr;
return MNL_CB_OK;
 }
@@ -858,6 +870,42 @@ static int dl_argv_parse_put(struct nlmsghdr *nlh, struct 
dl *dl,
return 0;
 }
 
+static bool dl_dump_filter(struct dl *dl, struct nlattr **tb)
+{
+   struct dl_opts *opts = &dl->opts;
+   struct nlattr *attr_bus_name = tb[DEVLINK_ATTR_BUS_NAME];
+   struct nlattr *attr_dev_name = tb[DEVLINK_ATTR_DEV_NAME];
+   struct nlattr *attr_port_index = tb[DEVLINK_ATTR_PORT_INDEX];
+   struct nlattr *attr_sb_index = tb[DEVLINK_ATTR_SB_INDEX];
+
+   if (opts->present & DL_OPT_HANDLE &&
+   attr_bus_name && attr_dev_name) {
+   const char *bus_name = mnl_attr_get_str(attr_bus_name);
+   const char *dev_name = mnl_attr_get_str(attr_dev_name);
+
+   if (strcmp(bus_name, opts->bus_name) != 0 ||
+   strcmp(dev_name, opts->dev_name) != 0)
+   return false;
+   }
+   if (opts->present & DL_OPT_HANDLEP &&
+   attr_bus_name && attr_dev_name && attr_port_index) {
+   const char *bus_name = mnl_attr_get_str(attr_bus_name);
+   const char *dev_name = mnl_attr_get_str(attr_dev_name);
+   uint32_t port_index = mnl_attr_get_u32(attr_port_index);
+
+   if (strcmp(bus_name, opts->bus_name) != 0 ||
+   strcmp(dev_name, opts->dev_name) != 0 ||
+   port_index != opts->port_index)
+   return false;
+   }
+   if (opts->present & DL_OPT_SB && attr_sb_index) {
+   uint32_t sb_index = mnl_attr_get_u32(attr_sb_index);
+
+   if (sb_index != opts->sb_index)
+   return false;
+   }
+   return true;
+}
 
 static void cmd_dev_help(void)
 {
@@ -1139,6 +1187,9 @@ static void cmd_sb_help(void)
pr_out("   devlink sb tc bind set DEV/PORT_INDEX [ sb SB_INDEX ] tc 
TC_INDEX\n");
pr_out("  type { ingress | egress } pool 
POOL_INDEX\n");
pr_out("  th THRESHOLD\n");
+   pr_out("   devlink sb occupancy show { DEV | DEV/PORT_INDEX } [ sb 
SB_INDEX ]\n");
+   pr_out("   devlink sb occupancy snapshot DEV [ sb SB_INDEX ]\n");
+   pr_out("   devlink sb occupancy clearmax DEV [ sb SB_INDEX ]\n");
 }
 
 static void pr_out_sb(struct nlattr **tb)
@@ -1475,6 +1526,330 @@ static int cmd_sb_tc(struct dl *dl)
return -ENOENT;
 }
 
+struct occ_item {
+   struct list_head list;
+   uint32_t index;
+   uint32_t cur;
+   uint32_t max;
+   uint32_t bound_pool_index;
+};
+
+struct occ_port {
+   struct list_head list;
+   char *bus_name;
+   char *dev_name;
+   uint32_t port_index;
+   uint32_t sb_index;
+   struct list_head pool_list;
+   struct list_head ing_tc_list;
+   struct list_head eg_tc_list;
+};
+
+struct occ_show {
+   struct dl *dl;
+   int err;
+   struct list_head port_list;
+};
+
+static struct occ_item *occ_item_alloc(void)
+{
+   return calloc(1, sizeof(struct occ_item));
+}
+
+static void occ_item_free(struct occ_item *occ_item)
+{
+   free(occ_item);
+}
+
+static struct occ_port *occ_port_alloc(uint32_t port_index)
+{
+   struct occ_port *occ_port;
+
+   occ_port = calloc(1, sizeof(*occ_port));
+   if (!occ_port)
+   return NULL;
+   occ_port->port_index = port_index;
+

[iproute2 net-next repost 1/2] devlink: implement shared buffer support

2016-05-14 Thread Jiri Pirko
From: Jiri Pirko 

Implement the kernel devlink shared buffer interface. Introduce a new object
"sb" that allows browsing the shared buffer parameters and changing the
configuration.

Signed-off-by: Jiri Pirko 
---
 devlink/devlink.c | 653 +-
 1 file changed, 652 insertions(+), 1 deletion(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 89a3083..ca3f586 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -114,6 +114,13 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_HANDLEP BIT(1)
 #define DL_OPT_PORT_TYPE   BIT(2)
 #define DL_OPT_PORT_COUNT  BIT(3)
+#define DL_OPT_SB  BIT(4)
+#define DL_OPT_SB_POOL BIT(5)
+#define DL_OPT_SB_SIZE BIT(6)
+#define DL_OPT_SB_TYPE BIT(7)
+#define DL_OPT_SB_THTYPE   BIT(8)
+#define DL_OPT_SB_TH   BIT(9)
+#define DL_OPT_SB_TC   BIT(10)
 
 struct dl_opts {
uint32_t present; /* flags of present items */
@@ -122,6 +129,13 @@ struct dl_opts {
uint32_t port_index;
enum devlink_port_type port_type;
uint32_t port_count;
+   uint32_t sb_index;
+   uint16_t sb_pool_index;
+   uint32_t sb_pool_size;
+   enum devlink_sb_pool_type sb_pool_type;
+   enum devlink_sb_threshold_type sb_pool_thtype;
+   uint32_t sb_threshold;
+   uint16_t sb_tc_index;
 };
 
 struct dl {
@@ -225,6 +239,42 @@ static int attr_cb(const struct nlattr *attr, void *data)
if (type == DEVLINK_ATTR_PORT_IBDEV_NAME &&
mnl_attr_validate(attr, MNL_TYPE_NUL_STRING) < 0)
return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_INDEX &&
+   mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_SIZE &&
+   mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_INGRESS_POOL_COUNT &&
+   mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_EGRESS_POOL_COUNT &&
+   mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_INGRESS_TC_COUNT &&
+   mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_EGRESS_TC_COUNT &&
+   mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_POOL_INDEX &&
+   mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_POOL_TYPE &&
+   mnl_attr_validate(attr, MNL_TYPE_U8) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_POOL_SIZE &&
+   mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_POOL_THRESHOLD_TYPE &&
+   mnl_attr_validate(attr, MNL_TYPE_U8) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_THRESHOLD &&
+   mnl_attr_validate(attr, MNL_TYPE_U32) < 0)
+   return MNL_CB_ERROR;
+   if (type == DEVLINK_ATTR_SB_TC_INDEX &&
+   mnl_attr_validate(attr, MNL_TYPE_U16) < 0)
+   return MNL_CB_ERROR;
tb[type] = attr;
return MNL_CB_OK;
 }
@@ -307,6 +357,23 @@ static int ifname_map_lookup(struct dl *dl, const char 
*ifname,
return -ENOENT;
 }
 
+static int ifname_map_rev_lookup(struct dl *dl, const char *bus_name,
+const char *dev_name, uint32_t port_index,
+char **p_ifname)
+{
+   struct ifname_map *ifname_map;
+
+   list_for_each_entry(ifname_map, &dl->ifname_map_list, list) {
+   if (strcmp(bus_name, ifname_map->bus_name) == 0 &&
+   strcmp(dev_name, ifname_map->dev_name) == 0 &&
+   port_index == ifname_map->port_index) {
+   *p_ifname = ifname_map->ifname;
+   return 0;
+   }
+   }
+   return -ENOENT;
+}
+
 static unsigned int strslashcount(char *str)
 {
unsigned int count = 0;
@@ -346,6 +413,20 @@ static int strtouint32_t(const char *str, uint32_t *p_val)
return 0;
 }
 
+static int strtouint16_t(const char *str, uint16_t *p_val)
+{
+   char *endptr;
+   unsigned long int val;
+
+   val = strtoul(str, &endptr, 10);
+   if (endptr == str || *endptr != '\0')
+   return -EINVAL;
+   if (val > USHRT_MAX)
+   return -ERANGE;
+   *p_val = val;
+   return 0;
+}
+
 static int __dl_argv_handle(char *str, char **p_bus_name, char **p_dev_name)
 {
strslashrsplit(str, p_bus_name, p_dev_name);
@@ -486,6 +567,24 @@ static int dl_argv_uint32_t(struct dl *dl, uint32_t *p_val)
return 0;
 }
 
+static int dl_argv_uint16_t(struct dl

Re: [patch net-next 1/4] netdevice: add SW statistics ndo

2016-05-14 Thread Jiri Pirko
Fri, May 13, 2016 at 08:47:48PM CEST, ro...@cumulusnetworks.com wrote:
>On 5/12/16, 11:03 PM, Jiri Pirko wrote:
>> Thu, May 12, 2016 at 11:10:08PM CEST, ro...@cumulusnetworks.com wrote:
>>> On 5/12/16, 4:48 AM, Jiri Pirko wrote:
 From: Nogah Frankel 

 Till now we had a ndo statistics function that returned SW statistics.
 We want to change the "basic" statistics to return HW statistics if
 available.
 In this case we need to expose a new ndo to return the SW statistics.
 Add a new ndo declaration to get SW statistics.
 Add a function that gets SW statistics if a compatible ndo exists.

 Signed-off-by: Nogah Frankel 
 Reviewed-by: Ido Schimmel 
 Signed-off-by: Jiri Pirko 
 ---

>>> To me netdev stats is  combined 'SW + HW' stats for that netdev.
>>> ndo_get_stats64 callback into the drivers does the magic of adding HW stats
>>> to SW (netdev) stats and returning (see enic_get_stats). HW stats is 
>>> available for netdevs
>>> that are offloaded or are backed by hardware. SW stats is the stats that 
>>> the driver maintains
>>> (logical or physical). HW stats is queried and added to the SW stats.
>> I'm not sure I follow. HW stats already contain SW stats. Because on
>> slow path every packet that is not offloaded and goes through kernel is
>> counted into HW stats as well (because it goes through HW port). 
>yes, correct... we don't want to double count those. But since these stats are
>generally queried from hw, I am calling them HW stats.
>you will not really maintain a software counter for this. But, the driver can 
>maintain its own
> counters for rx and tx errors etc and I call these SW stats. They are counted 
> at the driver.
>
>> If you
>> do HW stats + SW stats, what you get makes no sense. Am I missing something?
>If you go by my definition of HW and SW stats above, on a ndo_get_stats64() 
>call,
>you will add the SW counters + HW counters and return. In my definition, the 
>pkts
>that was rx'ed or tx'ed successfully are always in the HW count.
>
>> Btw, looking at enic_get_stats, looks exactly what we introduce for
>> mlxsw in this patchset.
>
>In enic_get_stats, the ones counted in software are the ones taken from 
>'enic->'
> net_stats->rx_over_errors = enic->rq_truncated_pkts;
> net_stats->rx_crc_errors = enic->rq_bad_fcs;
>
>>
>> With this patchset, we only allow user to see the actual stats for
>> slow-path aka SW stats.
>hmm...ok. But i am not sure how many will use this new attribute.
>When you do 'ip -s link show' you really want all counters on that port
>hardware or software does not matter at that point.
>
>My suggestion to move this to ethtool like attribute is because that is an 
>existing
> way to break down your stats which ever way you want. And the best part is it 
> can be
>customized (say rx_pkts_cpu_saw)

I believe that ethtool is really not a place to expose sw stats. Does
not make sense.
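
To make the two definitions used in this thread concrete, here is a minimal
userspace sketch in C of the enic-style accounting described above: packet
counts come from the hardware counters only (slow-path packets already cross
the HW port, so adding a software packet count would double count them), while
error counters kept by the driver are filled in from software. Every struct
and function name below is hypothetical and only illustrates the idea; none
of it is kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counters read back from the device. Slow-path packets
 * also pass through the HW port, so they are already included here. */
struct hw_counters {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/* Hypothetical counters maintained by the driver itself. */
struct sw_counters {
	uint64_t rq_truncated_pkts;
	uint64_t rq_bad_fcs;
};

/* Simplified stand-in for the stats structure reported to the user. */
struct link_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
	uint64_t rx_over_errors;
	uint64_t rx_crc_errors;
};

static void fill_link_stats(const struct hw_counters *hw,
			    const struct sw_counters *sw,
			    struct link_stats *out)
{
	assert(hw && sw && out);
	memset(out, 0, sizeof(*out));

	/* Packet counts: HW only, never HW + SW. */
	out->rx_packets = hw->rx_packets;
	out->tx_packets = hw->tx_packets;

	/* Error counts: the ones the driver tracks in software. */
	out->rx_over_errors = sw->rq_truncated_pkts;
	out->rx_crc_errors = sw->rq_bad_fcs;
}
```

In this model the two sources never overlap field by field, which is why
combining them does not double count anything.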


Re: [PATCH] r8169: default to 64-bit DMA on recent PCIe chips

2016-05-14 Thread Ard Biesheuvel
On 14 May 2016 at 12:41, Ard Biesheuvel  wrote:
> The current logic around the 'use_dac' module parameter prevents the
> r8169 driver from being loadable on 64-bit systems without any RAM
> below 4 GB when the parameter is left at its default value.
>
> So introduce a new default value -1 which indicates that 64-bit DMA
> should be enabled on sufficiently recent PCIe chips, i.e., versions
> RTL_GIGA_MAC_VER_18 or later. Explicit param values of 0 or 1 retain
> the existing behavior of unconditionally disabling/enabling 64-bit DMA
> on 64-bit architectures (i.e., regardless of the type and version of the
> chip).
>
> Since PCIe chips do not need the CPlusCmd Dual Address Cycle bit to be set,
> make that conditional on the device type as well.
>
> Cc: Realtek linux nic maintainers 
> Signed-off-by: Ard Biesheuvel 
> ---
>
> This is a followup to 'r8169: default to 64-bit DMA on systems without memory
> below 4 GB' [1]. At the request of Francois, this version bases the decision
> whether to use 64-bit DMA by default on whether the device is PCIe and
> sufficiently recent, rather than whether the platform requires 64-bit DMA
> because it does not have any memory below 4 GB to begin with. This is safer,
> since it will prevent the use of such problematic cards on these platforms.
>
> [1] http://article.gmane.org/gmane.linux.network/412246
>
>  drivers/net/ethernet/realtek/r8169.c | 48 +++-
>  1 file changed, 27 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/net/ethernet/realtek/r8169.c 
> b/drivers/net/ethernet/realtek/r8169.c
> index 94f08f1e841c..80bb8ea265ad 100644
> --- a/drivers/net/ethernet/realtek/r8169.c
> +++ b/drivers/net/ethernet/realtek/r8169.c
> @@ -345,7 +345,7 @@ static const struct pci_device_id rtl8169_pci_tbl[] = {
>  MODULE_DEVICE_TABLE(pci, rtl8169_pci_tbl);
>
>  static int rx_buf_sz = 16383;
> -static int use_dac;
> +static int use_dac = -1;
>  static struct {
> u32 msg_enable;
>  } debug = { -1 };
> @@ -8224,20 +8224,6 @@ static int rtl_init_one(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
> goto err_out_mwi_2;
> }
>
> -   tp->cp_cmd = 0;
> -
> -   if ((sizeof(dma_addr_t) > 4) &&
> -   !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && use_dac) {
> -   tp->cp_cmd |= PCIDAC;
> -   dev->features |= NETIF_F_HIGHDMA;
> -   } else {
> -   rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
> -   if (rc < 0) {
> -   netif_err(tp, probe, dev, "DMA configuration 
> failed\n");
> -   goto err_out_free_res_3;
> -   }
> -   }
> -
> /* ioremap MMIO region */
> ioaddr = ioremap(pci_resource_start(pdev, region), R8169_REGS_SIZE);
> if (!ioaddr) {
> @@ -8247,11 +8233,30 @@ static int rtl_init_one(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
> }
> tp->mmio_addr = ioaddr;
>
> +   /* Identify chip attached to board */
> +   rtl8169_get_mac_version(tp, dev, cfg->default_ver);
> +
> if (!pci_is_pcie(pdev))
> netif_info(tp, probe, dev, "not PCI Express\n");
>
> -   /* Identify chip attached to board */
> -   rtl8169_get_mac_version(tp, dev, cfg->default_ver);

The reordering above is actually unnecessary, it crept in inadvertently.

> +   tp->cp_cmd = 0;
> +
> +   if ((sizeof(dma_addr_t) > 4) &&
> +   (use_dac == 1 || (use_dac == -1 && pci_is_pcie(pdev) &&
> + tp->mac_version >= RTL_GIGA_MAC_VER_18)) &&
> +   !pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
> +
> +   /* CPlusCmd Dual Address Cycle is only needed for non-PCIe */
> +   if (!pci_is_pcie(pdev))
> +   tp->cp_cmd |= PCIDAC;
> +   dev->features |= NETIF_F_HIGHDMA;
> +   } else {
> +   rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
> +   if (rc < 0) {
> +   netif_err(tp, probe, dev, "DMA configuration 
> failed\n");
> +   goto err_out_unmap_4;
> +   }
> +   }
>
> rtl_init_rxcfg(tp);
>
> @@ -8412,12 +8417,12 @@ static int rtl_init_one(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>&tp->counters_phys_addr, 
> GFP_KERNEL);
> if (!tp->counters) {
> rc = -ENOMEM;
> -   goto err_out_msi_4;
> +   goto err_out_msi_5;
> }
>
> rc = register_netdev(dev);
> if (rc < 0)
> -   goto err_out_cnt_5;
> +   goto err_out_cnt_6;
>
> pci_set_drvdata(pdev, dev);
>
> @@ -8451,12 +8456,13 @@ static int rtl_init_one(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>  out:
> return rc;
>
> -err_out_cnt_5:
> +err_out_cnt_6:
> dma_free_coherent(&pdev->dev, sizeof(*tp->counters), tp->counters,
>   tp->counters_phys_ad

pull request: bluetooth-next 2016-05-14

2016-05-14 Thread Johan Hedberg
Hi Dave,

Here are two more Bluetooth patches for the 4.7 kernel which we wanted
to get into net-next before the merge window opens. Please let me know
if there are any issues pulling. Thanks.

Johan

---
The following changes since commit ed7cbbce544856b20e5811de373cf92e92499771:

  udp: Resolve NULL pointer dereference over flow-based vxlan device 
(2016-05-13 01:56:14 -0400)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git 
for-upstream

for you to fetch changes up to 72f9f8b58bc743e6b6abdc68f60db98486c3ffcf:

  Bluetooth: Add USB ID 13D3:3487 to ath3k (2016-05-13 16:54:59 +0200)


Jiri Slaby (1):
  Bluetooth: fix power_on vs close race

Lauro Costa (1):
  Bluetooth: Add USB ID 13D3:3487 to ath3k

 drivers/bluetooth/ath3k.c | 2 ++
 drivers/bluetooth/btusb.c | 1 +
 net/bluetooth/hci_core.c  | 4 ++--
 3 files changed, 5 insertions(+), 2 deletions(-)



signature.asc
Description: PGP signature


[PATCH] r8169: default to 64-bit DMA on recent PCIe chips

2016-05-14 Thread Ard Biesheuvel
The current logic around the 'use_dac' module parameter prevents the
r8169 driver from being loadable on 64-bit systems without any RAM
below 4 GB when the parameter is left at its default value.

So introduce a new default value -1 which indicates that 64-bit DMA
should be enabled on sufficiently recent PCIe chips, i.e., versions
RTL_GIGA_MAC_VER_18 or later. Explicit param values of 0 or 1 retain
the existing behavior of unconditionally disabling/enabling 64-bit DMA
on 64-bit architectures (i.e., regardless of the type and version of the
chip).

Since PCIe chips do not need the CPlusCmd Dual Address Cycle bit to be set,
make that conditional on the device type as well.

Cc: Realtek linux nic maintainers 
Signed-off-by: Ard Biesheuvel 
---

This is a followup to 'r8169: default to 64-bit DMA on systems without memory
below 4 GB' [1]. At the request of Francois, this version bases the decision
whether to use 64-bit DMA by default on whether the device is PCIe and
sufficiently recent, rather than whether the platform requires 64-bit DMA
because it does not have any memory below 4 GB to begin with. This is safer,
since it will prevent the use of such problematic cards on these platforms.

[1] http://article.gmane.org/gmane.linux.network/412246

 drivers/net/ethernet/realtek/r8169.c | 48 +++-
 1 file changed, 27 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c 
b/drivers/net/ethernet/realtek/r8169.c
index 94f08f1e841c..80bb8ea265ad 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -345,7 +345,7 @@ static const struct pci_device_id rtl8169_pci_tbl[] = {
 MODULE_DEVICE_TABLE(pci, rtl8169_pci_tbl);
 
 static int rx_buf_sz = 16383;
-static int use_dac;
+static int use_dac = -1;
 static struct {
u32 msg_enable;
 } debug = { -1 };
@@ -8224,20 +8224,6 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
goto err_out_mwi_2;
}
 
-   tp->cp_cmd = 0;
-
-   if ((sizeof(dma_addr_t) > 4) &&
-   !pci_set_dma_mask(pdev, DMA_BIT_MASK(64)) && use_dac) {
-   tp->cp_cmd |= PCIDAC;
-   dev->features |= NETIF_F_HIGHDMA;
-   } else {
-   rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
-   if (rc < 0) {
-   netif_err(tp, probe, dev, "DMA configuration failed\n");
-   goto err_out_free_res_3;
-   }
-   }
-
/* ioremap MMIO region */
ioaddr = ioremap(pci_resource_start(pdev, region), R8169_REGS_SIZE);
if (!ioaddr) {
@@ -8247,11 +8233,30 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
}
tp->mmio_addr = ioaddr;
 
+   /* Identify chip attached to board */
+   rtl8169_get_mac_version(tp, dev, cfg->default_ver);
+
if (!pci_is_pcie(pdev))
netif_info(tp, probe, dev, "not PCI Express\n");
 
-   /* Identify chip attached to board */
-   rtl8169_get_mac_version(tp, dev, cfg->default_ver);
+   tp->cp_cmd = 0;
+
+   if ((sizeof(dma_addr_t) > 4) &&
+   (use_dac == 1 || (use_dac == -1 && pci_is_pcie(pdev) &&
+ tp->mac_version >= RTL_GIGA_MAC_VER_18)) &&
+   !pci_set_dma_mask(pdev, DMA_BIT_MASK(64))) {
+
+   /* CPlusCmd Dual Address Cycle is only needed for non-PCIe */
+   if (!pci_is_pcie(pdev))
+   tp->cp_cmd |= PCIDAC;
+   dev->features |= NETIF_F_HIGHDMA;
+   } else {
+   rc = pci_set_dma_mask(pdev, DMA_BIT_MASK(32));
+   if (rc < 0) {
+   netif_err(tp, probe, dev, "DMA configuration failed\n");
+   goto err_out_unmap_4;
+   }
+   }
 
rtl_init_rxcfg(tp);
 
@@ -8412,12 +8417,12 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
   &tp->counters_phys_addr, GFP_KERNEL);
if (!tp->counters) {
rc = -ENOMEM;
-   goto err_out_msi_4;
+   goto err_out_msi_5;
}
 
rc = register_netdev(dev);
if (rc < 0)
-   goto err_out_cnt_5;
+   goto err_out_cnt_6;
 
pci_set_drvdata(pdev, dev);
 
@@ -8451,12 +8456,13 @@ static int rtl_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
 out:
return rc;
 
-err_out_cnt_5:
+err_out_cnt_6:
dma_free_coherent(&pdev->dev, sizeof(*tp->counters), tp->counters,
  tp->counters_phys_addr);
-err_out_msi_4:
+err_out_msi_5:
netif_napi_del(&tp->napi);
rtl_disable_msi(pdev, tp);
+err_out_unmap_4:
iounmap(ioaddr);
 err_out_free_res_3:
pci_release_regions(pdev);
-- 
2.7.4
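
For readers who want the decision logic of this patch in isolation, the
following is a small userspace sketch in C, not driver code. It models the
'use_dac' tri-state (-1 = automatic, 0 = off, 1 = on) as a pure function.
The names want_64bit_dma(), need_pcidac() and MAC_VER_18 are hypothetical
stand-ins, and the sketch assumes only the ordering of chip versions
matters. It also leaves out the check that setting the 64-bit DMA mask
actually succeeds, which the patch still performs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for RTL_GIGA_MAC_VER_18; only ordering matters. */
#define MAC_VER_18 18

/* Should the driver try 64-bit DMA?
 *   use_dac ==  1: yes, regardless of chip type and version
 *   use_dac == -1: only on PCIe chips of version 18 or later
 *   otherwise    : no
 * dma_addr_is_64bit models the sizeof(dma_addr_t) > 4 test. */
static bool want_64bit_dma(int use_dac, bool dma_addr_is_64bit,
			   bool is_pcie, int mac_version)
{
	if (!dma_addr_is_64bit)
		return false;
	if (use_dac == 1)
		return true;
	if (use_dac == -1)
		return is_pcie && mac_version >= MAC_VER_18;
	return false;
}

/* The PCIDAC bit in CPlusCmd is only set for non-PCIe devices. */
static bool need_pcidac(bool using_64bit_dma, bool is_pcie)
{
	return using_64bit_dma && !is_pcie;
}
```

With the default of -1, an old PCI card falls back to 32-bit DMA, while a
recent PCIe chip gets 64-bit DMA without touching PCIDAC.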



Re: [PATCH nf V2] netfilter: fix oops in nfqueue during netns error unwinding

2016-05-14 Thread Florian Westphal
Eric W. Biederman  wrote:
> Florian Westphal  writes:
> 
> > Eric W. Biederman  wrote:
> >> Florian could you test and verify this patch fixes your issues?
> >
> > Yes, this seems to work.
> >
> > Pablo, I'm fine with this patch going into -nf/stable but I do not think
> > making the pointers per netns is a desirable option in the long term.
> >
> >> Unlike the other possibilities that have been discussed this also
> >> addresses the nf_queue path as well as the nf_queue_hook_drop path.
> >
> > The nf_queue path should have been fine, no?
> >
> > Or putting it differently: can we start processing skbs before a netns
> > is fully initialized?
> 
> The practical case that worries me is what happens when someone does
> "rmmod nfnetlink_queue" while the system is running.  It appears to me
> that today we could free the per netns data during the rcu grace period
> and cause a similar issue in nfnl_queue_pernet.
>
> That looks like it could affect both the nf_queue path and the
> nf_queue_nf_hook_drop path.

OK, I'll check this again but I seem to recall this was fine (the
nfqueue module exit path sets the handler to NULL before doing anything
else).

The normal netns exit path should be fine too as exit and free happen
in two distinct loops, i.e. while (without your change) we can have
calls to nf_queue_hook_drop after the nfqueue netns exit function was
called, these calls will always happen before the pernet data is
freed.
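
The ordering argument above can be illustrated with a toy model in C. This
is a userspace sketch under the assumption stated in the mail (all exit
callbacks run in one loop, all frees in a later, separate loop); it is not
kernel code, and struct fake_net, late_hook_call() and the other names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define NSUBSYS 2

/* One subsystem's per-netns state in the toy model. */
struct pernet_slot {
	int *data;	/* per-netns private data */
	bool exited;	/* exit callback has already run */
};

struct fake_net {
	struct pernet_slot slot[NSUBSYS];
};

static int fake_net_init(struct fake_net *net)
{
	for (int i = 0; i < NSUBSYS; i++) {
		net->slot[i].exited = false;
		net->slot[i].data = calloc(1, sizeof(int));
		if (!net->slot[i].data) {
			/* Undo the allocations made so far. */
			while (i-- > 0) {
				free(net->slot[i].data);
				net->slot[i].data = NULL;
			}
			return -1;
		}
	}
	return 0;
}

/* Phase 1: run every exit callback; nothing is freed yet. */
static void fake_net_exit_all(struct fake_net *net)
{
	for (int i = 0; i < NSUBSYS; i++)
		net->slot[i].exited = true;
}

/* Phase 2: only now release the per-netns data. */
static void fake_net_free_all(struct fake_net *net)
{
	for (int i = 0; i < NSUBSYS; i++) {
		free(net->slot[i].data);
		net->slot[i].data = NULL;
	}
}

/* A late call into subsystem i, e.g. from another subsystem's exit
 * path. Returns true if the data was still there to be touched. */
static bool late_hook_call(struct fake_net *net, int i)
{
	if (!net->slot[i].data)
		return false;
	(*net->slot[i].data)++;
	return true;
}
```

A late call made between the two phases still finds valid data even though
the subsystem's exit callback has already run; only a call after the second
phase would find it gone.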


[PATCH] mac80211_hwsim: Allow wmediumd to attach to radios created in its netns

2016-05-14 Thread Martin Willi
Registering wmediumd is currently limited to the initial network
namespace. This patch enables wmediumd to attach from non-initial
network namespaces using a user namespace having CAP_NET_ADMIN. A
registered wmediumd can forward frames on radios that have been created
in the same network namespace, even if they have been moved to other
network namespaces.

The wmediumd Netlink portid is tracked per net namespace. Additionally,
the portid is stored on all radios created in that net namespace to
simplify the portid lookup in the data path.

Signed-off-by: Martin Willi 
---
 drivers/net/wireless/mac80211_hwsim.c | 92 +--
 1 file changed, 76 insertions(+), 16 deletions(-)

diff --git a/drivers/net/wireless/mac80211_hwsim.c b/drivers/net/wireless/mac80211_hwsim.c
index a16cd0c..5bb9f0a 100644
--- a/drivers/net/wireless/mac80211_hwsim.c
+++ b/drivers/net/wireless/mac80211_hwsim.c
@@ -41,8 +41,6 @@ MODULE_AUTHOR("Jouni Malinen");
 MODULE_DESCRIPTION("Software simulator of 802.11 radio(s) for mac80211");
 MODULE_LICENSE("GPL");
 
-static u32 wmediumd_portid;
-
 static int radios = 2;
 module_param(radios, int, 0444);
 MODULE_PARM_DESC(radios, "Number of simulated radios");
@@ -258,6 +256,7 @@ static int hwsim_netgroup;
 
 struct hwsim_net {
int netgroup;
+   u32 wmediumd;
 };
 
 static inline int hwsim_net_get_netgroup(struct net *net)
@@ -274,6 +273,20 @@ static inline void hwsim_net_set_netgroup(struct net *net)
hwsim_net->netgroup = hwsim_netgroup++;
 }
 
+static inline u32 hwsim_net_get_wmediumd(struct net *net)
+{
+   struct hwsim_net *hwsim_net = net_generic(net, hwsim_net_id);
+
+   return hwsim_net->wmediumd;
+}
+
+static inline void hwsim_net_set_wmediumd(struct net *net, u32 portid)
+{
+   struct hwsim_net *hwsim_net = net_generic(net, hwsim_net_id);
+
+   hwsim_net->wmediumd = portid;
+}
+
 static struct class *hwsim_class;
 
 static struct net_device *hwsim_mon; /* global monitor netdev */
@@ -552,6 +565,8 @@ struct mac80211_hwsim_data {
 
/* group shared by radios created in the same netns */
int netgroup;
+   /* wmediumd portid responsible for netgroup of this radio */
+   u32 wmediumd;
 
int power_level;
 
@@ -983,6 +998,29 @@ static bool hwsim_ps_rx_ok(struct mac80211_hwsim_data *data,
return true;
 }
 
+static int hwsim_unicast_netgroup(struct mac80211_hwsim_data *data,
+                                  struct sk_buff *skb, int portid)
+{
+   struct net *net;
+   bool found = false;
+   int res = -ENOENT;
+
+   rcu_read_lock();
+   for_each_net_rcu(net) {
+       if (data->netgroup == hwsim_net_get_netgroup(net)) {
+           res = genlmsg_unicast(net, skb, portid);
+           found = true;
+           break;
+       }
+   }
+   rcu_read_unlock();
+
+   if (!found)
+       nlmsg_free(skb);
+
+   return res;
+}
+
 static void mac80211_hwsim_tx_frame_nl(struct ieee80211_hw *hw,
   struct sk_buff *my_skb,
   int dst_portid)
@@ -1062,7 +1100,7 @@ static void mac80211_hwsim_tx_frame_nl(struct ieee80211_hw *hw,
goto nla_put_failure;
 
genlmsg_end(skb, msg_head);
-   if (genlmsg_unicast(&init_net, skb, dst_portid))
+   if (hwsim_unicast_netgroup(data, skb, dst_portid))
goto err_free_txskb;
 
/* Enqueue the packet */
@@ -1355,7 +1393,7 @@ static void mac80211_hwsim_tx(struct ieee80211_hw *hw,
mac80211_hwsim_monitor_rx(hw, skb, channel);
 
/* wmediumd mode check */
-   _portid = ACCESS_ONCE(wmediumd_portid);
+   _portid = ACCESS_ONCE(data->wmediumd);
 
if (_portid)
return mac80211_hwsim_tx_frame_nl(hw, skb, _portid);
@@ -1451,7 +1489,8 @@ static void mac80211_hwsim_tx_frame(struct ieee80211_hw *hw,
struct sk_buff *skb,
struct ieee80211_channel *chan)
 {
-   u32 _pid = ACCESS_ONCE(wmediumd_portid);
+   struct mac80211_hwsim_data *data = hw->priv;
+   u32 _pid = ACCESS_ONCE(data->wmediumd);
 
if (ieee80211_hw_check(hw, SUPPORTS_RC_TABLE)) {
struct ieee80211_tx_info *txi = IEEE80211_SKB_CB(skb);
@@ -2796,6 +2835,20 @@ static struct mac80211_hwsim_data *get_hwsim_data_ref_from_addr(const u8 *addr)
return data;
 }
 
+static void hwsim_register_wmediumd(struct net *net, u32 portid)
+{
+   struct mac80211_hwsim_data *data;
+
+   hwsim_net_set_wmediumd(net, portid);
+
+   spin_lock_bh(&hwsim_radio_lock);
+   list_for_each_entry(data, &hwsim_radios, list) {
+       if (data->netgroup == hwsim_net_get_netgroup(net))
+           data->wmediumd = portid;
+   }
+   spin_unlock_bh(&hwsim_radio_lock);
+}
+
 static int hwsim_tx_info_frame_received_nl(struct sk_buff *skb_2,
 

Re: What ixgbe devices support HWTSTAMP_FILTER_ALL for hardware time stamping?

2016-05-14 Thread Jeff Kirsher
On Fri, 2016-05-13 at 16:12 -0700, Guy Harris wrote:
> libpcap offers the ability to request hardware time stamping for packets
> and to inquire which forms of hardware time stamping, if any, are
> supported for an interface.
> 
> The Linux implementation currently implements the inquiry by doing a
> ETHTOOL_GET_TS_INFO SIOETHTOOL ioctl and looking at the so_timestamping
> bits, if the linux/ethtool.h header defines ETHTOOL_GET_TS_INFO and the
> ioctl succeeds on the device.
> 
> This is inadequate - as libpcap requests hardware time stamping for all
> packets, it should also check whether HWTSTAMP_FILTER_ALL is set in
> rx_filters, and only offer hardware time stamping if it's set.
> 
> The code in ixgbe_ptp.c does:
> 
> case HWTSTAMP_FILTER_PTP_V1_L4_EVENT:
> case HWTSTAMP_FILTER_ALL:
> /* The X550 controller is capable of timestamping all packets,
>  * which allows it to accept any filter.
>  */
> if (hw->mac.type >= ixgbe_mac_X550) {
> tsync_rx_ctl |= IXGBE_TSYNCRXCTL_TYPE_ALL;
> config->rx_filter = HWTSTAMP_FILTER_ALL;
> adapter->flags |= IXGBE_FLAG_RX_HWTSTAMP_ENABLED;
> break;
> }
> /* fall through */
> default:
> /*
>  * register RXMTRL must be set in order to do V1 packets,
>  * therefore it is not possible to time stamp both V1 Sync and
>  * Delay_Req messages and hardware does not support
>  * timestamping all packets => return error
>  */
> adapter->flags &= ~(IXGBE_FLAG_RX_HWTSTAMP_ENABLED |
>     IXGBE_FLAG_RX_HWTSTAMP_IN_REGISTER);
> config->rx_filter = HWTSTAMP_FILTER_NONE;
> return -ERANGE;
> 
> which seems to indicate that only the X550 controller supports time
> stamping all packets in hardware.
> 
> However, the code in ixgbe_ethtool.c does:
> 
> switch (adapter->hw.mac.type) {
> case ixgbe_mac_X550:
> case ixgbe_mac_X550EM_x:
> case ixgbe_mac_X540:
> case ixgbe_mac_82599EB:
> info->so_timestamping =
> SOF_TIMESTAMPING_TX_SOFTWARE |
> SOF_TIMESTAMPING_RX_SOFTWARE |
> SOF_TIMESTAMPING_SOFTWARE |
> SOF_TIMESTAMPING_TX_HARDWARE |
> SOF_TIMESTAMPING_RX_HARDWARE |
> SOF_TIMESTAMPING_RAW_HARDWARE;
> 
> if (adapter->ptp_clock)
> info->phc_index = ptp_clock_index(adapter->ptp_clock);
> else
> info->phc_index = -1;
> 
> info->tx_types =
> (1 << HWTSTAMP_TX_OFF) |
> (1 << HWTSTAMP_TX_ON);
> 
> info->rx_filters =
> (1 << HWTSTAMP_FILTER_NONE) |
> (1 << HWTSTAMP_FILTER_PTP_V1_L4_SYNC) |
> (1 << HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ) |
> (1 << HWTSTAMP_FILTER_PTP_V2_EVENT);
> break;
> default:
> return ethtool_op_get_ts_info(dev, info);
> }
> 
> which draws no distinction between the X550 controller and the X540 and
> 82599, and doesn't say *any* of them support HWTSTAMP_FILTER_ALL.
> 
> Is it the case that only the ixgbe_mac_X550 and ixgbe_mac_X550EM_x
> controllers support HWTSTAMP_FILTER_ALL?  If so, shouldn't
> ixgbe_get_ts_info() be doing something such as:
> 
> switch (adapter->hw.mac.type) {
> case ixgbe_mac_X550:
> case ixgbe_mac_X550EM_x:
> case ixgbe_mac_X540:
> case ixgbe_mac_82599EB:
> info->so_timestamping =
> SOF_TIMESTAMPING_TX_SOFTWARE |
> SOF_TIMESTAMPING_RX_SOFTWARE |
> SOF_TIMESTAMPING_SOFTWARE |
> SOF_TIMESTAMPING_TX_HARDWARE |
> SOF_TIMESTAMPING_RX_HARDWARE |
> SOF_TIMESTAMPING_RAW_HARDWARE;
> 
> if (adapter->ptp_clock)
> info->phc_index = ptp_clock_index(adapter->ptp_clock);
> else
> info->phc_index = -1;
> 
> info->tx_types =
> (1 << HWTSTAMP_TX_OFF) |
> (1 << HWTSTAMP_TX_ON);
> 
> info->rx_filters =
> (1 << HWTSTAMP_FILTER_NONE) |
> (1 << HWTSTAMP_FILTER_PTP_V1_L4_SYNC) |
> (1 << HWTSTAMP_FILTER_PTP_V1_L4_DELAY_REQ) |
> (1 << HWTSTAMP_FILTER_PTP_V2_EVENT);
> if (adapter->hw.mac.type >= ixgbe_mac_X550)
>    

Re: What ixgbe devices support HWTSTAMP_FILTER_ALL for hardware time stamping?

2016-05-14 Thread Richard Cochran
On Fri, May 13, 2016 at 04:12:52PM -0700, Guy Harris wrote:
> The Linux implementation currently implements the inquiry by doing a
> ETHTOOL_GET_TS_INFO SIOETHTOOL ioctl and looking at the
> so_timestamping bits, if the linux/ethtool.h header defines
> ETHTOOL_GET_TS_INFO and the ioctl succeeds on the device.

So far, so good. 

> This is inadequate - as libpcap requests hardware time stamping for
> all packets, it should also check whether HWTSTAMP_FILTER_ALL is set
> in rx_filters, and only offer hardware time stamping if it's set.

The SO_TIMESTAMPING and SIOCSHWTSTAMP interfaces predate
ETHTOOL_GET_TS_INFO, and they work fine without it.  Applications
should simply use SIOCSHWTSTAMP to request the mode that they need and
check the result.

That said, the information in ETHTOOL_GET_TS_INFO should be correct.

> Is it the case that only the ixgbe_mac_X550 and ixgbe_mac_X550EM_x
> controllers support HWTSTAMP_FILTER_ALL?  

Looks like it.

> If so, shouldn't ixgbe_get_ts_info() be doing something such as:

>   if (adapter->hw.mac.type >= ixgbe_mac_X550)
>   info->rx_filters |= (1 << HWTSTAMP_FILTER_ALL);

Yes, probably.

> From a quick scan of drivers/net, it looks as if
> 
>   drivers/net/ethernet/cavium/liquidio
> 
> also support HWTSTAMP_FILTER_ALL but don't advertise it, 

For this and the other drivers you mentioned, their maintainers might
appreciate patches...

Thanks,
Richard