Re: [PATCH] - gre: add eth_p_teb gro-handler for OVS with gre tunnels

2015-10-12 Thread Jesse Gross
On Mon, Oct 12, 2015 at 3:26 PM, Ramu Ramamurthy wrote:
>
> Problem:
> 
>
> When using OVS with GRE tunnels and GRO enabled on the NIC,
> we find that GRO doesn't really take effect. As a result, TCP stream
> performance on a 10G NIC is around 2-3Gbps.
>
> Root Cause:
> ---
>
> The protocol field set in GRE (by OVS) is ETH_P_TEB.
> The code in gre_gro_receive() (gre_offload.c) calls
> gro_find_receive_by_type() to determine a gro handler for the
> ETH_P_TEB protocol. However, no such protocol is registered
> at the device layer (only ETH_P_IP, ETH_P_IPV6, and mpls related
> protocols are registered). Hence, GRO is skipped.

Why doesn't this work?

commit 9b174d88c257150562b0101fcc6cb6c3cb74275c
Author: Jesse Gross 
Date:   Tue Dec 30 19:10:15 2014 -0800

net: Add Transparent Ethernet Bridging GRO support.

Currently the only tunnel protocol that supports GRO with encapsulated
Ethernet is VXLAN. This pulls out the Ethernet code into a proper layer
so that it can be used by other tunnel protocols such as GRE and Geneve.

Signed-off-by: Jesse Gross 
Signed-off-by: David S. Miller 
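The root cause in the report comes down to a lookup keyed by EtherType. If nothing is registered for ETH_P_TEB (0x6558), the lookup returns NULL and GRO is skipped. The user-space sketch below models only that dispatch step. `struct gro_handler`, `register_gro_handler()` and `find_gro_handler()` are hypothetical stand-ins, not the kernel's `gro_find_receive_by_type()` or its data structures.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_P_IP_VAL   0x0800 /* IPv4 */
#define ETH_P_IPV6_VAL 0x86DD /* IPv6 */
#define ETH_P_TEB_VAL  0x6558 /* Transparent Ethernet Bridging */
#define MAX_HANDLERS   8

/* Hypothetical per-protocol GRO handler entry (illustration only). */
struct gro_handler {
	uint16_t type;    /* EtherType, host byte order */
	const char *name; /* label for the sketch */
};

static struct gro_handler handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Hypothetical registration: returns 0 on success, -1 if the table is full. */
static int register_gro_handler(uint16_t type, const char *name)
{
	if (nr_handlers == MAX_HANDLERS)
		return -1;
	handlers[nr_handlers].type = type;
	handlers[nr_handlers].name = name;
	nr_handlers++;
	return 0;
}

/* Hypothetical lookup by EtherType; NULL means "no GRO for this protocol". */
static const struct gro_handler *find_gro_handler(uint16_t type)
{
	for (size_t i = 0; i < nr_handlers; i++)
		if (handlers[i].type == type)
			return &handlers[i];
	return NULL;
}
```

If only IPv4 and IPv6 are registered, `find_gro_handler(ETH_P_TEB_VAL)` returns NULL, which matches the symptom in the report. Once a TEB entry is registered, the lookup succeeds. In the real kernel, the commit quoted above provides that registration.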
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v2] ipv6 route: use err pointers instead of returning pointer by reference

2015-10-12 Thread David Miller
From: Roopa Prabhu 
Date: Sat, 10 Oct 2015 08:26:36 -0700

> From: Roopa Prabhu 
> 
> This patch makes ip6_route_info_create return an err pointer instead of
> returning the rt pointer by reference, as suggested by Dave.
> 
> Signed-off-by: Roopa Prabhu 
> ---
> v1 - v2: remove unnecessary NULL initialization of rt as pointed out by Scott Feldman

Applied, thanks.
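Returning an error-encoded pointer packs the status and the result into a single return value. The user-space sketch below re-creates the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() convention, in which error codes occupy the top 4095 values of the address space. `route_info_create()` and `struct route` are hypothetical; they do not reproduce the actual ip6_route_info_create() signature.

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Encode a negative errno as a pointer value in the top page. */
static inline void *ERR_PTR(long error) { return (void *)error; }

/* Recover the negative errno from an error pointer. */
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }

/* True if ptr falls in the reserved [-MAX_ERRNO, -1] range. */
static inline int IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Hypothetical object built by the constructor below. */
struct route { int metric; };

/* Hypothetical constructor: returns a route or ERR_PTR(-errno), so the
 * caller needs no extra "out" parameter. */
static struct route *route_info_create(int metric)
{
	struct route *rt;

	if (metric < 0)
		return ERR_PTR(-EINVAL);
	rt = malloc(sizeof(*rt));
	if (!rt)
		return ERR_PTR(-ENOMEM);
	rt->metric = metric;
	return rt;
}
```

The caller tests the result with `IS_ERR()` and, on failure, propagates `PTR_ERR()`.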


Re: [PATCH net-next] net: hns: fix the unknown phy_interface_t type error

2015-10-12 Thread David Miller
From: huangdaode 
Date: Sat, 10 Oct 2015 17:20:38 +0800

> This patch fixes the build error reported by Jiri Pirko
> 
> drivers/net/ethernet/hisilicon/hns/hnae.h:465:2: error: unknown type
> name 'phy_interface_t'
> phy_interface_t phy_if;
>   ^
> The full build log is at https://lists.01.org/pipermail/kbuild-all.
> 
> Signed-off-by: huangdaode 
> Signed-off-by: yankejian 

Applied, thanks.


Re: [patch net-next v4 1/7] switchdev: introduce switchdev workqueue

2015-10-12 Thread Scott Feldman
On Mon, Oct 12, 2015 at 10:54 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> This is going to be used for deferred operations.
>
> Signed-off-by: Jiri Pirko 

Acked-by: Scott Feldman 


Re: [patch net-next v4 3/7] switchdev: remove pointers from switchdev objects

2015-10-12 Thread Scott Feldman
On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> When an object is used in deferred work, we cannot use pointers in
> switchdev object structures because the memory they point at may already be
> used by someone else. So make a local copy of the value instead.
>
> Signed-off-by: Jiri Pirko 

Acked-by: Scott Feldman 
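The patch's reasoning is that deferred work must not follow pointers into memory the caller may reuse. A tiny deferred queue that copies each object by value at enqueue time illustrates the point. This is a single-threaded user-space sketch. `struct fdb_obj`, `defer_obj()`, `run_deferred()` and `queue_from_stack()` are hypothetical and are not the switchdev API.

```c
#include <stddef.h>
#include <stdint.h>

#define QUEUE_LEN 16

/* Hypothetical object: carries its data by value, no pointers. */
struct fdb_obj {
	uint8_t addr[6];
	uint16_t vid;
};

static struct fdb_obj queue[QUEUE_LEN];
static size_t queued;

/* Snapshot the object now; the caller's copy may change or vanish later. */
static int defer_obj(const struct fdb_obj *obj)
{
	if (queued == QUEUE_LEN)
		return -1;
	queue[queued++] = *obj; /* struct assignment also copies addr[] */
	return 0;
}

/* Process every queued object in order, then empty the queue.
 * Returns the sum of the VLAN ids seen so the result is checkable. */
static unsigned run_deferred(void)
{
	unsigned sum = 0;

	for (size_t i = 0; i < queued; i++)
		sum += queue[i].vid;
	queued = 0;
	return sum;
}

/* Hypothetical caller whose stack frame ends before the work runs. */
static void queue_from_stack(uint16_t vid)
{
	struct fdb_obj obj = { .addr = { 0x02, 0, 0, 0, 0, 1 }, .vid = vid };

	defer_obj(&obj);
	obj.vid = 0; /* later reuse of the caller's memory does not leak in */
}
```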


Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn

2015-10-12 Thread Scott Feldman
On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> Since a spinlock is held here, defer the switchdev operation.
>
> Signed-off-by: Jiri Pirko 
> ---
>  net/bridge/br_fdb.c | 5 -
>  net/bridge/br_if.c  | 3 +++
>  2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
> index f5e7da0..c88bd8e 100644
> --- a/net/bridge/br_fdb.c
> +++ b/net/bridge/br_fdb.c
> @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>  static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
>  {
> struct switchdev_obj_port_fdb fdb = {
> -   .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
> +   .obj = {
> +   .id = SWITCHDEV_OBJ_ID_PORT_FDB,
> +   .flags = SWITCHDEV_F_DEFER,
> +   },
> .vid = f->vlan_id,
> };
>
> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
> index 934cae9..09147cb 100644
> --- a/net/bridge/br_if.c
> +++ b/net/bridge/br_if.c
> @@ -24,6 +24,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "br_private.h"
>
> @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
> list_del_rcu(&p->list);
>
> br_fdb_delete_by_port(br, p, 0, 1);
> +   switchdev_flush_deferred();
> +

This potentially flushes other (valid) work on the deferred queue that is
not related to FDB del.

I wonder if this flush step is necessary at all. The work we deferred
to delete the FDB entry can still happen after the port has been
removed (del_nbp). If the port driver/device finds the FDB entry, then
delete it, otherwise ignore it.
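This alternative works because the deferred delete is idempotent: if the entry is already gone when the work finally runs, the delete still reports success. Below is a minimal sketch built on a hypothetical fixed-size FDB table. `fdb_add()`, `fdb_del()` and `fdb_count()` are illustrative names, not a driver API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FDB_SIZE 8

/* Hypothetical device-side FDB entry. */
struct fdb_entry {
	bool used;
	uint8_t addr[6];
	uint16_t vid;
};

static struct fdb_entry fdb[FDB_SIZE];

/* Hypothetical insert: returns 0 on success, -1 if the table is full. */
static int fdb_add(const uint8_t addr[6], uint16_t vid)
{
	for (size_t i = 0; i < FDB_SIZE; i++) {
		if (!fdb[i].used) {
			fdb[i].used = true;
			memcpy(fdb[i].addr, addr, 6);
			fdb[i].vid = vid;
			return 0;
		}
	}
	return -1;
}

/* Delete if present, otherwise ignore. Calling it twice is harmless, so a
 * late deferred delete after the port is gone needs no flush. */
static int fdb_del(const uint8_t addr[6], uint16_t vid)
{
	for (size_t i = 0; i < FDB_SIZE; i++) {
		if (fdb[i].used && fdb[i].vid == vid &&
		    memcmp(fdb[i].addr, addr, 6) == 0) {
			fdb[i].used = false;
			return 0;
		}
	}
	return 0; /* not found: nothing to do */
}

/* Number of live entries, for checking the table state. */
static size_t fdb_count(void)
{
	size_t n = 0;

	for (size_t i = 0; i < FDB_SIZE; i++)
		n += fdb[i].used;
	return n;
}
```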


Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers

2015-10-12 Thread Wangnan (F)



On 2015/10/13 3:29, Alexei Starovoitov wrote:

On 10/12/15 2:02 AM, Kaixu Xia wrote:

+extern const struct bpf_func_proto bpf_perf_event_sample_enable_proto;
+extern const struct bpf_func_proto bpf_perf_event_sample_disable_proto;


externs are unnecessary. Just make them static.
Also I prefer a single helper that takes a flag, so we can extend it
instead of adding a func_id for every little operation.

To avoid conflicts, if you touch kernel/bpf/* or bpf.h please always
base your patches on net-next.

> +atomic_set(>perf_sample_disable, 0);

a global flag per map is a no-go.
events are independent and should be treated as such.
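The suggestion to use one helper that takes a flag, rather than a new func_id per operation, keeps the interface extensible as long as unknown flag bits are rejected. Here is a user-space sketch of that shape. `event_ctl()`, `EVENT_CTL_ENABLE` and `EVENT_CTL_DISABLE` are hypothetical names, not an existing BPF helper.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits for a single control helper. */
#define EVENT_CTL_ENABLE  (1ULL << 0)
#define EVENT_CTL_DISABLE (1ULL << 1)
#define EVENT_CTL_KNOWN   (EVENT_CTL_ENABLE | EVENT_CTL_DISABLE)

/* Hypothetical event state. */
struct event { int enabled; };

/* One entry point; new operations can become new flag bits later. */
static long event_ctl(struct event *ev, uint64_t flags)
{
	/* Reject unknown bits now so they can be given a meaning later. */
	if (flags & ~EVENT_CTL_KNOWN)
		return -EINVAL;
	/* Exactly one operation per call. */
	if (flags == EVENT_CTL_ENABLE)
		ev->enabled = 1;
	else if (flags == EVENT_CTL_DISABLE)
		ev->enabled = 0;
	else
		return -EINVAL;
	return 0;
}
```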



Then how do we avoid races? For example, while one core is disabling all
events in a map, another core may be enabling all of them. This race may
cause several perf events in a map to dump samples while other events do
not. To avoid such races I think some locking must be introduced, and
then the cost is even higher.

The reason we introduce an atomic pointer is that each operation should
control a set of events, not one event, due to the per-cpu nature of
perf events.
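The concern is that toggling each event separately can leave a set half-enabled while another CPU toggles it the other way. One way to make the switch all-or-nothing is to have every event in the set consult a single shared atomic flag. Below is a C11 sketch under that assumption. `struct event_set`, `set_init()`, `set_enable()` and `sample_allowed()` are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_EVENTS 4

/* Hypothetical set of (per-CPU) events sharing one enable switch. */
struct event_set {
	atomic_bool enabled;              /* single gate for the whole set */
	unsigned long samples[NR_EVENTS]; /* assumed updated only by owning CPU */
};

static void set_init(struct event_set *s)
{
	atomic_init(&s->enabled, false);
	for (size_t i = 0; i < NR_EVENTS; i++)
		s->samples[i] = 0;
}

/* Toggling the set is one atomic store. Every event reads the same flag,
 * so there is no per-event state that can end up mixed. */
static void set_enable(struct event_set *s, bool on)
{
	atomic_store_explicit(&s->enabled, on, memory_order_release);
}

/* Sample path of event 'idx': record only if the set is enabled. */
static bool sample_allowed(struct event_set *s, size_t idx)
{
	if (!atomic_load_explicit(&s->enabled, memory_order_acquire))
		return false;
	s->samples[idx]++;
	return true;
}
```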


Thank you.


Please squash these two patches, since they're part of one logical
feature. Splitting them like this only makes review harder.



Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers

2015-10-12 Thread Wangnan (F)



On 2015/10/13 12:16, Alexei Starovoitov wrote:

On 10/12/15 8:51 PM, Wangnan (F) wrote:

why is 'set disable' needed?
the example given in the cover letter shows the use case where you want
to receive samples only within the sys_write() syscall.
The example makes sense, but sys_write() is running on this cpu, so just
disabling it on the current one is enough.



Our real use case is control of system-wide sampling. For example,
we need to sample all CPUs when a smartphone starts refreshing its display.
We need all CPUs because in an Android system plenty of threads
get involved in this behavior. We can't achieve this by controlling
sampling on only one CPU. This is the reason we need 'set enable'
and 'set disable'.


ok, but that use case may have a different enable/disable pattern.
In the sys_write example, ultra-fast enable/disable is a must-have, since
the whole syscall is fast and overhead should be minimal.
but for display refresh? we're talking milliseconds, no?
Can you just ioctl() it from user space?
If the cost of enable/disable is high or the time range between toggling is
long, then doing it from the bpf program doesn't make sense. Instead
the program can do bpf_perf_event_output() to send a notification to
user space that the condition is met, and user space can ioctl() the events.



OK. I think I understand your design principle that everything inside BPF
should be as fast as possible.

Making userspace control events using ioctl makes things harder. You know that
'perf record' itself doesn't care much about the events it receives. It only
copies data to perf.data, but what we want is to use perf record simply like
this:

 # perf record -e evt=cycles -e control.o/pmu=evt/ -a sleep 100

And in control.o we create uprobe points to mark the start and finish of a frame:


 SEC("target=/a/b/c.o\nstartFrame=0x123456")
 int startFrame(void *ctx) {
   bpf_pmu_enable(pmu);
   return 1;
 }

 SEC("target=/a/b/c.o\nfinishFrame=0x234568")
 int finishFrame(void *ctx) {
   bpf_pmu_disable(pmu);
   return 1;
 }

I think this makes sense too.

I still think perf events are not necessarily independent of each other. You know
we have PERF_EVENT_IOC_SET_OUTPUT, which can direct the output of multiple events
into one ring buffer. This way perf events are connected.

I think the 'set disable/enable' design in this patchset satisfies the design
goal that in a BPF program we only do simple and fast things. The only
inconvenience is that we add something into the map, which is ugly. What about
an implementation similar to PERF_EVENT_IOC_SET_OUTPUT: create a new ioctl
like PERF_EVENT_IOC_SET_ENABLER, let perf select an event as the 'enabler',
and then BPF can still control one atomic variable to enable/disable a set
of events.

Thank you.



Re: [patch net-next v4 3/7] switchdev: remove pointers from switchdev objects

2015-10-12 Thread John Fastabend
On 15-10-12 08:01 PM, Scott Feldman wrote:
> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko  wrote:
>> From: Jiri Pirko 
>>
>> When an object is used in deferred work, we cannot use pointers in
>> switchdev object structures because the memory they point at may already be
>> used by someone else. So make a local copy of the value instead.
>>
>> Signed-off-by: Jiri Pirko 
> 
> Acked-by: Scott Feldman 
> 

also fwiw

Reviewed-by: John Fastabend 


Re: [PATCH net] switchdev: check if the vlan id is in the proper vlan range

2015-10-12 Thread Scott Feldman
On Mon, Oct 12, 2015 at 5:31 AM, Nikolay Aleksandrov wrote:
> From: Nikolay Aleksandrov 
>
> VLANs 0 and 4095 are reserved and shouldn't be used; add checks to
> switchdev similar to the bridge's. Also make sure IDs above 4095 cannot
> be passed.
>
> Fixes: 47f8328bb1a4 ("switchdev: add new switchdev bridge setlink")
> Signed-off-by: Nikolay Aleksandrov 

Acked-by: Scott Feldman 
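The check described in this patch reduces to accepting only VLAN IDs 1 through 4094. Below is a sketch. `vid_is_valid()` is a hypothetical helper, and `VLAN_N_VID` (4096) mirrors the kernel constant of the same name.

```c
#include <stdbool.h>
#include <stdint.h>

#define VLAN_N_VID 4096 /* 12-bit VID space: 0..4095 */

/* Hypothetical helper: 0 and 4095 are reserved, and anything >= 4096
 * does not fit in the 12-bit VID field. */
static bool vid_is_valid(uint32_t vid)
{
	return vid >= 1 && vid < VLAN_N_VID - 1;
}
```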


[net-next 07/17] i40e: remove redundant call

2015-10-12 Thread Jeff Kirsher
From: Mitch Williams 

This function call isn't needed here; the same function is already
called by i40e_reset_vf.

Change-ID: I96ccbf91b752965c9e28fe895d4c7d4c46e3ba44
Signed-off-by: Mitch Williams 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
index ee747dc..2102280 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c
@@ -964,8 +964,6 @@ int i40e_alloc_vfs(struct i40e_pf *pf, u16 num_alloc_vfs)
/* VF resources get allocated during reset */
i40e_reset_vf([i], false);
 
-   /* enable VF vplan_qtable mappings */
-   i40e_enable_vf_mappings([i]);
}
pf->num_alloc_vfs = num_alloc_vfs;
 
-- 
2.4.3



[net-next 06/17] i40e: Convert CEE App TLV selector to IEEE selector

2015-10-12 Thread Jeff Kirsher
From: Greg Bowers 

Changes the parsing of CEE App TLVs to fill in the App selector in struct
i40e_dcbx_config with the IEEE App selector so the caller doesn't have to
consider whether the App came from a CEE or IEEE DCBX negotiation.

Change-ID: Ia7d9d664cde04d2ebcc9822fd22e4929c6edab3a
Signed-off-by: Greg Bowers 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_dcb.c  | 16 
 drivers/net/ethernet/intel/i40e/i40e_type.h |  2 ++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.c b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
index 251a841..2691277 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_dcb.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
@@ -380,7 +380,7 @@ static void i40e_parse_cee_app_tlv(struct i40e_cee_feat_tlv *tlv,
 {
u16 length, typelength, offset = 0;
struct i40e_cee_app_prio *app;
-   u8 i, up;
+   u8 i, up, selector;
 
typelength = ntohs(tlv->hdr.typelen);
length = (u16)((typelength & I40E_LLDP_TLV_LEN_MASK) >>
@@ -397,9 +397,17 @@ static void i40e_parse_cee_app_tlv(struct i40e_cee_feat_tlv *tlv,
break;
}
dcbcfg->app[i].priority = up;
-   /* Get Selector from lower 2 bits */
-   dcbcfg->app[i].selector = (app->upper_oui_sel &
-  I40E_CEE_APP_SELECTOR_MASK);
+
+   /* Get Selector from lower 2 bits, and convert to IEEE */
+   selector = (app->upper_oui_sel & I40E_CEE_APP_SELECTOR_MASK);
+   if (selector == I40E_CEE_APP_SEL_ETHTYPE)
+   dcbcfg->app[i].selector = I40E_APP_SEL_ETHTYPE;
+   else if (selector == I40E_CEE_APP_SEL_TCPIP)
+   dcbcfg->app[i].selector = I40E_APP_SEL_TCPIP;
+   else
+   /* Keep selector as it is for unknown types */
+   dcbcfg->app[i].selector = selector;
+
dcbcfg->app[i].protocolid = ntohs(app->protocol);
/* Move to next app */
offset += sizeof(*app);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index c8f7a52..4ec3ffa 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -418,6 +418,8 @@ struct i40e_fc_info {
 #define I40E_APP_PROTOID_FIP   0x8914
 #define I40E_APP_SEL_ETHTYPE   0x1
 #define I40E_APP_SEL_TCPIP 0x2
+#define I40E_CEE_APP_SEL_ETHTYPE   0x0
+#define I40E_CEE_APP_SEL_TCPIP 0x1
 
 /* CEE or IEEE 802.1Qaz ETS Configuration data */
 struct i40e_dcb_ets_config {
-- 
2.4.3
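The conversion in this patch is a small mapping from the 2-bit CEE selector to the IEEE selector value, with unknown values passed through unchanged. Below is a standalone restatement. The selector constants are copied from the i40e_type.h hunk above. The mask value and the function name `cee_to_ieee_selector()` are assumptions for illustration, not driver code.

```c
#include <stdint.h>

/* Values as defined in the i40e_type.h hunk of this patch. */
#define I40E_APP_SEL_ETHTYPE     0x1
#define I40E_APP_SEL_TCPIP       0x2
#define I40E_CEE_APP_SEL_ETHTYPE 0x0
#define I40E_CEE_APP_SEL_TCPIP   0x1
/* Assumed from the "lower 2 bits" comment; the real mask is not shown. */
#define CEE_APP_SELECTOR_MASK    0x03

/* Hypothetical helper: map the CEE selector in upper_oui_sel to IEEE. */
static uint8_t cee_to_ieee_selector(uint8_t upper_oui_sel)
{
	uint8_t selector = upper_oui_sel & CEE_APP_SELECTOR_MASK;

	if (selector == I40E_CEE_APP_SEL_ETHTYPE)
		return I40E_APP_SEL_ETHTYPE;
	if (selector == I40E_CEE_APP_SEL_TCPIP)
		return I40E_APP_SEL_TCPIP;
	return selector; /* unknown types are kept as-is */
}
```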



[net-next 12/17] i40e/i40evf: Add module_types and update_link_info

2015-10-12 Thread Jeff Kirsher
From: Catherine Sullivan 

Add a module_types variable to the link_info struct to save the module
information from get_phy_capabilities. This information can be used to
determine which speeds the module supports.

Also add a new function update_link_info which updates the module_types
parameter and then calls get_link_info. This function should be called
in place of get_link_info so that the module_types variable stays
up-to-date with the rest of the link information.

The EAS table does not reflect the values that are actually returned,
so instead base these values on the Ethernet compliance codes
specified in table 33 of SFF-8436, which have proven accurate.

Use the new variable in ethtool to differentiate between a 10G/1G dual
speed fiber module and a 10G only module.
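The ethtool change described above advertises 1G for a 10G fiber PHY only when the module's gigabit compliance byte (`module_type[2]` in the patch) reports 1000BASE-SX or 1000BASE-LX. Below is a standalone sketch of that decision. The bit values (bit 0 for 1000BASE-SX, bit 1 for 1000BASE-LX) are our reading of the SFF-8436 compliance-code layout, and all names are hypothetical. None of this is copied from the driver.

```c
#include <stdint.h>

/* Assumed bit layout of the gigabit Ethernet compliance byte. */
#define MOD_1000BASE_SX 0x01
#define MOD_1000BASE_LX 0x02

/* Hypothetical speed mask bits. */
#define SPEED_10G (1u << 0)
#define SPEED_1G  (1u << 1)

/* Hypothetical helper: supported speeds for a 10G fiber PHY given the
 * module's gigabit compliance byte. */
static unsigned supported_speeds(uint8_t gig_compliance)
{
	unsigned speeds = SPEED_10G;

	if (gig_compliance & (MOD_1000BASE_SX | MOD_1000BASE_LX))
		speeds |= SPEED_1G; /* dual-speed 10G/1G module */
	return speeds;
}
```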

Change-ID: Ib7585cce321319c10ce15180054c41a6cbd41389
Signed-off-by: Catherine Sullivan 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c| 30 +---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c   | 16 +
 drivers/net/ethernet/intel/i40e/i40e_main.c  |  2 +-
 drivers/net/ethernet/intel/i40e/i40e_prototype.h |  1 +
 drivers/net/ethernet/intel/i40e/i40e_type.h  | 18 ++
 drivers/net/ethernet/intel/i40evf/i40e_type.h| 18 ++
 6 files changed, 76 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index d9519ce..c1d0dca 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -1717,14 +1717,14 @@ enum i40e_status_code i40e_set_fc(struct i40e_hw *hw, u8 *aq_failures,
*aq_failures |= I40E_SET_FC_AQ_FAIL_SET;
}
/* Update the link info */
-   status = i40e_aq_get_link_info(hw, true, NULL, NULL);
+   status = i40e_update_link_info(hw);
if (status) {
/* Wait a little bit (on 40G cards it sometimes takes a really
 * long time for link to come back from the atomic reset)
 * and try once more
 */
msleep(1000);
-   status = i40e_aq_get_link_info(hw, true, NULL, NULL);
+   status = i40e_update_link_info(hw);
}
if (status)
*aq_failures |= I40E_SET_FC_AQ_FAIL_UPDATE;
@@ -2315,7 +2315,7 @@ i40e_status i40e_get_link_status(struct i40e_hw *hw, bool *link_up)
i40e_status status = 0;
 
if (hw->phy.get_link_info) {
-   status = i40e_aq_get_link_info(hw, true, NULL, NULL);
+   status = i40e_update_link_info(hw);
 
if (status)
i40e_debug(hw, I40E_DEBUG_LINK, "get link failed: status %d\n",
@@ -2328,6 +2328,30 @@ i40e_status i40e_get_link_status(struct i40e_hw *hw, bool *link_up)
 }
 
 /**
+ * i40e_update_link_info - update status of the HW network link
+ * @hw: pointer to the hw struct
+ **/
+i40e_status i40e_update_link_info(struct i40e_hw *hw)
+{
+   struct i40e_aq_get_phy_abilities_resp abilities;
+   i40e_status status = 0;
+
+   status = i40e_aq_get_link_info(hw, true, NULL, NULL);
+   if (status)
+   return status;
+
+   status = i40e_aq_get_phy_capabilities(hw, false, false, &abilities,
+ NULL);
+   if (status)
+   return status;
+
+   memcpy(hw->phy.link_info.module_type, &abilities.module_type,
+  sizeof(hw->phy.link_info.module_type));
+
+   return status;
+}
+
+/**
  * i40e_aq_add_veb - Insert a VEB between the VSI and the MAC
  * @hw: pointer to the hw struct
  * @uplink_seid: the MAC or other gizmo SEID
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 148f614..46019e9 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -307,12 +307,18 @@ static void i40e_get_settings_link_up(struct i40e_hw *hw,
case I40E_PHY_TYPE_10GBASE_LR:
case I40E_PHY_TYPE_1000BASE_SX:
case I40E_PHY_TYPE_1000BASE_LX:
-   ecmd->supported = SUPPORTED_1baseT_Full |
- SUPPORTED_1000baseT_Full;
+   ecmd->supported = SUPPORTED_1baseT_Full;
+   if (hw_link_info->module_type[2] &
+   I40E_MODULE_TYPE_1000BASE_SX ||
+   hw_link_info->module_type[2] &
+   I40E_MODULE_TYPE_1000BASE_LX) {
+   ecmd->supported |= SUPPORTED_1000baseT_Full;
+   if (hw_link_info->requested_speeds &
+   I40E_LINK_SPEED_1GB)
+   ecmd->advertising |= ADVERTISED_1000baseT_Full;
+   }
if 

[net-next 05/17] i40e/i40evf: Add info to nvm info struct for OEM version data

2015-10-12 Thread Jeff Kirsher
From: Carolyn Wyborny 

This patch adds a member to the nvm_info struct for oem_ver info to be
output either by OID or ethtool.

Change-ID: I1e5d513ae67622e2af17042924fdb4b5d6d85366
Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_adminq.c | 9 -
 drivers/net/ethernet/intel/i40e/i40e_type.h   | 3 +++
 drivers/net/ethernet/intel/i40evf/i40e_type.h | 2 ++
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.c b/drivers/net/ethernet/intel/i40e/i40e_adminq.c
index fa2e916..5c950e2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.c
@@ -553,8 +553,9 @@ shutdown_arq_out:
  **/
 i40e_status i40e_init_adminq(struct i40e_hw *hw)
 {
-   i40e_status ret_code;
+   u16 cfg_ptr, oem_hi, oem_lo;
u16 eetrack_lo, eetrack_hi;
+   i40e_status ret_code;
int retry = 0;
 
/* verify input for valid configuration */
@@ -613,6 +614,12 @@ i40e_status i40e_init_adminq(struct i40e_hw *hw)
i40e_read_nvm_word(hw, I40E_SR_NVM_EETRACK_LO, &eetrack_lo);
i40e_read_nvm_word(hw, I40E_SR_NVM_EETRACK_HI, &eetrack_hi);
hw->nvm.eetrack = (eetrack_hi << 16) | eetrack_lo;
+   i40e_read_nvm_word(hw, I40E_SR_BOOT_CONFIG_PTR, &cfg_ptr);
+   i40e_read_nvm_word(hw, (cfg_ptr + I40E_NVM_OEM_VER_OFF),
+  &oem_hi);
+   i40e_read_nvm_word(hw, (cfg_ptr + (I40E_NVM_OEM_VER_OFF + 1)),
+  &oem_lo);
+   hw->nvm.oem_ver = ((u32)oem_hi << 16) | oem_lo;
 
if (hw->aq.api_maj_ver > I40E_FW_API_VERSION_MAJOR) {
ret_code = I40E_ERR_FIRMWARE_API_VERSION;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_type.h b/drivers/net/ethernet/intel/i40e/i40e_type.h
index d1ec5a4..c8f7a52 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_type.h
@@ -289,6 +289,7 @@ struct i40e_nvm_info {
bool blank_nvm_mode;  /* is NVM empty (no FW present)*/
u16 version;  /* NVM package version */
u32 eetrack;  /* NVM data version */
+   u32 oem_ver;  /* OEM version info */
 };
 
 /* definitions used in NVM update support */
@@ -1204,6 +1205,8 @@ struct i40e_hw_port_stats {
 #define I40E_SR_EMP_MODULE_PTR 0x0F
 #define I40E_SR_PBA_FLAGS  0x15
 #define I40E_SR_PBA_BLOCK_PTR  0x16
+#define I40E_SR_BOOT_CONFIG_PTR    0x17
+#define I40E_NVM_OEM_VER_OFF   0x83
 #define I40E_SR_NVM_DEV_STARTER_VERSION    0x18
 #define I40E_SR_NVM_WAKE_ON_LAN    0x19
 #define I40E_SR_ALTERNATE_SAN_MAC_ADDRESS_PTR  0x27
diff --git a/drivers/net/ethernet/intel/i40evf/i40e_type.h b/drivers/net/ethernet/intel/i40evf/i40e_type.h
index a59b60f..b3c65dd 100644
--- a/drivers/net/ethernet/intel/i40evf/i40e_type.h
+++ b/drivers/net/ethernet/intel/i40evf/i40e_type.h
@@ -288,6 +288,7 @@ struct i40e_nvm_info {
bool blank_nvm_mode;  /* is NVM empty (no FW present)*/
u16 version;  /* NVM package version */
u32 eetrack;  /* NVM data version */
+   u32 oem_ver;  /* OEM version info */
 };
 
 /* definitions used in NVM update support */
@@ -1173,6 +1174,7 @@ struct i40e_hw_port_stats {
 /* Checksum and Shadow RAM pointers */
 #define I40E_SR_NVM_CONTROL_WORD   0x00
 #define I40E_SR_EMP_MODULE_PTR 0x0F
+#define I40E_NVM_OEM_VER_OFF   0x83
 #define I40E_SR_NVM_DEV_STARTER_VERSION    0x18
 #define I40E_SR_NVM_WAKE_ON_LAN    0x19
 #define I40E_SR_ALTERNATE_SAN_MAC_ADDRESS_PTR  0x27
-- 
2.4.3
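The new NVM read combines two 16-bit words into a 32-bit `oem_ver`. The word at `cfg_ptr + I40E_NVM_OEM_VER_OFF` becomes the high half, and the word after it becomes the low half. Below is a standalone sketch of that packing. The NVM read is replaced by a hypothetical in-memory word array, and `read_oem_ver()` is not a driver function.

```c
#include <stdint.h>

#define I40E_NVM_OEM_VER_OFF 0x83 /* from this patch */

/* Hypothetical helper: pack two NVM words into oem_ver, with the word
 * at cfg_ptr + I40E_NVM_OEM_VER_OFF in the high half. */
static uint32_t read_oem_ver(const uint16_t *nvm, uint16_t cfg_ptr)
{
	uint16_t oem_hi = nvm[cfg_ptr + I40E_NVM_OEM_VER_OFF];
	uint16_t oem_lo = nvm[cfg_ptr + I40E_NVM_OEM_VER_OFF + 1];

	return ((uint32_t)oem_hi << 16) | oem_lo;
}
```

The `(uint32_t)` cast before the shift matters: without it, `oem_hi` is promoted to a signed `int`, and a high word of 0x8000 or above would overflow.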



[net-next 03/17] i40e: Use BIT() macro for priority map parsing

2015-10-12 Thread Jeff Kirsher
From: Neerav Parikh 

Replace one left over (1 << up) in the i40e_dcb.c file with the BIT()
macro.

Change-ID: I39492a400a2cee5ac566143a5b436cc478bea0db
Signed-off-by: Neerav Parikh 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_dcb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_dcb.c b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
index 6fa07ef..251a841 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_dcb.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_dcb.c
@@ -393,7 +393,7 @@ static void i40e_parse_cee_app_tlv(struct i40e_cee_feat_tlv *tlv,
for (i = 0; i < dcbcfg->numapps; i++) {
app = (struct i40e_cee_app_prio *)(tlv->tlvinfo + offset);
for (up = 0; up < I40E_MAX_USER_PRIORITY; up++) {
-   if (app->prio_map & (1 << up))
+   if (app->prio_map & BIT(up))
break;
}
dcbcfg->app[i].priority = up;
-- 
2.4.3
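The loop touched by this patch picks the lowest user priority whose bit is set in `prio_map`. Below is a standalone sketch. `BIT()` is defined the way the kernel defines it (`1UL << nr`). `I40E_MAX_USER_PRIORITY` is assumed to be 8, and `first_priority()` is a hypothetical wrapper around the loop.

```c
#include <stdint.h>

#define BIT(nr) (1UL << (nr))
#define I40E_MAX_USER_PRIORITY 8 /* assumed value */

/* Hypothetical wrapper: lowest set priority, or
 * I40E_MAX_USER_PRIORITY if no bit is set. */
static uint8_t first_priority(uint8_t prio_map)
{
	uint8_t up;

	for (up = 0; up < I40E_MAX_USER_PRIORITY; up++) {
		if (prio_map & BIT(up))
			break;
	}
	return up;
}
```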



[net-next 15/17] ixgbe: add flow control ethertype to the anti-spoofing filter

2015-10-12 Thread Jeff Kirsher
From: Emil Tantilov 

This patch makes sure that flow control packets initiated by the VF are
dropped and reported as spoofed.

Flow control packets can be used to limit the throughput or as a DoS
attack when generated from a VF. Flow control is not supported per VF,
hence any pause frames generated from a VF are considered malicious.

Also cleaned up indentation and some redundant comments.

Signed-off-by: Emil Tantilov 
Tested-by: Krishneil Singh 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 16 +++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h |  4 
 2 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 1910039..c4608f8 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -3723,14 +3723,20 @@ static void ixgbe_configure_virtualization(struct ixgbe_adapter *adapter)
hw->mac.ops.set_mac_anti_spoofing(hw, (adapter->num_vfs != 0),
  adapter->num_vfs);
 
-   /* Ensure LLDP is set for Ethertype Antispoofing if we will be
+   /* Ensure LLDP and FC are set for Ethertype Antispoofing if we will be
 * calling set_ethertype_anti_spoofing for each VF in loop below
 */
-   if (hw->mac.ops.set_ethertype_anti_spoofing)
+   if (hw->mac.ops.set_ethertype_anti_spoofing) {
IXGBE_WRITE_REG(hw, IXGBE_ETQF(IXGBE_ETQF_FILTER_LLDP),
-   (IXGBE_ETQF_FILTER_EN| /* enable filter */
-IXGBE_ETQF_TX_ANTISPOOF | /* tx antispoof */
-IXGBE_ETH_P_LLDP));   /* LLDP eth type */
+   (IXGBE_ETQF_FILTER_EN|
+IXGBE_ETQF_TX_ANTISPOOF |
+IXGBE_ETH_P_LLDP));
+
+   IXGBE_WRITE_REG(hw, IXGBE_ETQF(IXGBE_ETQF_FILTER_FC),
+   (IXGBE_ETQF_FILTER_EN |
+IXGBE_ETQF_TX_ANTISPOOF |
+ETH_P_PAUSE));
+   }
 
/* For VFs that have spoof checking turned off */
for (i = 0; i < adapter->num_vfs; i++) {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index 939c90c..995f031 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -1752,6 +1752,9 @@ enum {
  *FCoE (0x8906): Filter 2
  *1588 (0x88f7): Filter 3
  *FIP  (0x8914): Filter 4
+ *LLDP (0x88CC): Filter 5
+ *LACP (0x8809): Filter 6
+ *FC   (0x8808): Filter 7
  */
 #define IXGBE_ETQF_FILTER_EAPOL  0
 #define IXGBE_ETQF_FILTER_FCOE   2
@@ -1759,6 +1762,7 @@ enum {
 #define IXGBE_ETQF_FILTER_FIP    4
 #define IXGBE_ETQF_FILTER_LLDP  5
 #define IXGBE_ETQF_FILTER_LACP  6
+#define IXGBE_ETQF_FILTER_FC    7
 
 /* VLAN Control Bit Masks */
 #define IXGBE_VLNCTRL_VET   0x  /* bits 0-15 */
-- 
2.4.3
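Each ETQF entry written in this patch packs an enable bit, a TX anti-spoof bit and the EtherType into one 32-bit register value. Below is a sketch of that packing. The two flag values (bit 31 for `IXGBE_ETQF_FILTER_EN`, bit 29 for `IXGBE_ETQF_TX_ANTISPOOF`) are our assumption about ixgbe_type.h and should be checked against the header. `etqf_antispoof()` is a hypothetical helper.

```c
#include <stdint.h>

#define IXGBE_ETQF_FILTER_EN    0x80000000u /* assumed: bit 31 */
#define IXGBE_ETQF_TX_ANTISPOOF 0x20000000u /* assumed: bit 29 */
#define ETH_P_PAUSE             0x8808      /* 802.3 flow control */
#define IXGBE_ETH_P_LLDP        0x88CC

/* Hypothetical helper: ETQF value that enables the filter and flags
 * VF-originated frames of this EtherType as spoofed. */
static uint32_t etqf_antispoof(uint16_t ethertype)
{
	return IXGBE_ETQF_FILTER_EN | IXGBE_ETQF_TX_ANTISPOOF | ethertype;
}
```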



[net-next 02/17] i40e: Make it clear a parameter is never used

2015-10-12 Thread Jeff Kirsher
From: Carolyn Wyborny 

Flag the filter_mask parameter as __always_unused in the
ndo_bridge_getlink function.

Change-ID: Ifc1e99c7fb84bcbf81cf7b0ac891ad8ca956ffb2
Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index a484f22..d5d8b66 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8336,7 +8336,8 @@ static int i40e_ndo_bridge_setlink(struct net_device *dev,
  **/
 static int i40e_ndo_bridge_getlink(struct sk_buff *skb, u32 pid, u32 seq,
   struct net_device *dev,
-  u32 filter_mask, int nlflags)
+  u32 __always_unused filter_mask,
+  int nlflags)
 {
struct i40e_netdev_priv *np = netdev_priv(dev);
struct i40e_vsi *vsi = np->vsi;
-- 
2.4.3
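The kernel's `__always_unused` expands to GCC's `unused` attribute. It tells the compiler that a parameter is intentionally ignored, which silences `-Wunused-parameter` (enabled by `-Wextra`). Below is a minimal sketch with a fallback definition for builds outside the kernel. `getlink()` is a hypothetical stand-in for the ndo callback.

```c
#include <stdint.h>

#ifndef __always_unused
#define __always_unused __attribute__((unused))
#endif

/* Hypothetical callback: filter_mask is part of the signature but is
 * deliberately not consulted. */
static int getlink(uint32_t pid, uint32_t __always_unused filter_mask,
		   int nlflags)
{
	return (int)pid + nlflags;
}
```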



[net-next 09/17] i40e: update fw version text string per previous product formats

2015-10-12 Thread Jeff Kirsher
From: Carolyn Wyborny 

This patch moves the internal fw version and fw api version info so that it is
printed at probe time. The nvm version, etrack and oem version info are now
configured for output via ethtool -i.

Change-ID: I05d490093a7137dbefcdef263d014d1e5c9e83d0
Signed-off-by: Carolyn Wyborny 
Tested-by: Andrew Bowers 
Signed-off-by: Jeff Kirsher 
---
 drivers/net/ethernet/intel/i40e/i40e.h  | 10 ++
 drivers/net/ethernet/intel/i40e/i40e_main.c |  7 +++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index f26dcb2..cfe8f83 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -107,6 +107,8 @@
 #define I40E_NVM_VERSION_LO_MASK   (0xff << I40E_NVM_VERSION_LO_SHIFT)
 #define I40E_NVM_VERSION_HI_SHIFT  12
 #define I40E_NVM_VERSION_HI_MASK   (0xf << I40E_NVM_VERSION_HI_SHIFT)
+#define I40E_OEM_VER_BUILD_MASK    0xff00
+#define I40E_OEM_VER_PATCH_MASK    0xff
 
 /* The values in here are decimal coded as hex as is the case in the NVM map*/
 #define I40E_CURRENT_NVM_VERSION_HI 0x2
@@ -587,14 +589,14 @@ static inline char *i40e_fw_version_str(struct i40e_hw *hw)
static char buf[32];
 
snprintf(buf, sizeof(buf),
-"f%d.%d.%05d a%d.%d n%x.%02x e%x",
-hw->aq.fw_maj_ver, hw->aq.fw_min_ver, hw->aq.fw_build,
-hw->aq.api_maj_ver, hw->aq.api_min_ver,
+"%x.%02x 0x%x %d.%d.%d",
 (hw->nvm.version & I40E_NVM_VERSION_HI_MASK) >>
I40E_NVM_VERSION_HI_SHIFT,
 (hw->nvm.version & I40E_NVM_VERSION_LO_MASK) >>
I40E_NVM_VERSION_LO_SHIFT,
-(hw->nvm.eetrack & 0xff));
+hw->nvm.eetrack, (hw->nvm.oem_ver >> 24),
+(hw->nvm.oem_ver & I40E_OEM_VER_BUILD_MASK) >> 8,
+hw->nvm.oem_ver & I40E_OEM_VER_PATCH_MASK);
 
return buf;
 }
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index d5d8b66..45b3292 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -10064,6 +10064,13 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
 
err = i40e_init_adminq(hw);
dev_info(&pdev->dev, "%s\n", i40e_fw_version_str(hw));
+
+   /* provide additional fw info, like api and ver */
+   dev_info(&pdev->dev, "fw_version:%d.%d.%05d\n",
+hw->aq.fw_maj_ver, hw->aq.fw_min_ver, hw->aq.fw_build);
+   dev_info(&pdev->dev, "fw api version:%d.%d\n",
+hw->aq.api_maj_ver, hw->aq.api_min_ver);
+
if (err) {
dev_info(&pdev->dev,
 "The driver for the device stopped because the NVM image is newer than expected. You must install the most recent version of the network driver.\n");
-- 
2.4.3
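After this patch, the first line printed at probe contains the decoded NVM version nibbles, the full eetrack value, and `oem_ver` split into its top byte, bits 8-15 and its low byte. With these masks, bits 16-23 of `oem_ver` are not printed. Below is a standalone sketch that uses the masks from the i40e.h hunk. `I40E_NVM_VERSION_LO_SHIFT` is not shown in the hunk and is assumed to be 0. `fw_version_str()` writes to a caller-supplied buffer instead of the driver's static one, so it is a hypothetical variant.

```c
#include <stdint.h>
#include <stdio.h>

#define I40E_NVM_VERSION_LO_SHIFT 0 /* assumed; not shown in the hunk */
#define I40E_NVM_VERSION_LO_MASK  (0xff << I40E_NVM_VERSION_LO_SHIFT)
#define I40E_NVM_VERSION_HI_SHIFT 12
#define I40E_NVM_VERSION_HI_MASK  (0xf << I40E_NVM_VERSION_HI_SHIFT)
#define I40E_OEM_VER_BUILD_MASK   0xff00
#define I40E_OEM_VER_PATCH_MASK   0xff

/* Hypothetical variant: "<nvm hi>.<nvm lo> 0x<eetrack> <oem>.<build>.<patch>".
 * Returns the snprintf() result (length of the full string). */
static int fw_version_str(char *buf, size_t len, uint16_t nvm_version,
			  uint32_t eetrack, uint32_t oem_ver)
{
	unsigned int nvm_hi = (nvm_version & I40E_NVM_VERSION_HI_MASK) >>
			      I40E_NVM_VERSION_HI_SHIFT;
	unsigned int nvm_lo = (nvm_version & I40E_NVM_VERSION_LO_MASK) >>
			      I40E_NVM_VERSION_LO_SHIFT;

	return snprintf(buf, len, "%x.%02x 0x%x %d.%d.%d", nvm_hi, nvm_lo,
			(unsigned int)eetrack, (int)(oem_ver >> 24),
			(int)((oem_ver & I40E_OEM_VER_BUILD_MASK) >> 8),
			(int)(oem_ver & I40E_OEM_VER_PATCH_MASK));
}
```

For example, `nvm_version = 0x4053`, `eetrack = 0x80001234` and `oem_ver = 0x01020304` produce the string `4.53 0x80001234 1.3.4`. The 0x02 in bits 16-23 is dropped.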



Re: [PATCH v3 net-next 0/4] tcp: better smp listener behavior

2015-10-12 Thread David Miller
From: Eric Dumazet 
Date: Thu,  8 Oct 2015 19:33:20 -0700

> As promised in last patch series, we implement a better SO_REUSEPORT
> strategy, based on cpu affinities if selected by the application.
> 
> We also moved sk_refcnt out of the cache line containing the lookup
> keys, as it was considerably slowing down smp operations because
> of false sharing. This was simpler than converting listen sockets
> to conventional RCU (to avoid sk_refcnt dirtying)
> 
> Could process 6.0 Mpps SYN instead of 4.2 Mpps on my test server.

Just clarifying that I applied this v3 not v2 which I just replied
to by accident :-)
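A rough illustration of the false-sharing fix described in the cover letter: keep a frequently written field such as a refcount off the cache line that holds read-mostly lookup keys. The struct below is hypothetical (it is not struct sock), and the 64-byte line size is an assumption.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines, as on most x86-64 parts. */
#define CACHE_LINE_SIZE 64

/* Hypothetical listener layout; this is not the real struct sock. */
struct listener {
	/* Read-mostly lookup keys, touched by every CPU during lookup. */
	unsigned int daddr;
	unsigned int rcv_saddr;
	unsigned short num;

	/* Written on every get/put. Forcing it onto its own cache line
	 * keeps those writes from invalidating the line that holds the
	 * lookup keys on other CPUs (false sharing). */
	_Alignas(CACHE_LINE_SIZE) atomic_int refcnt;
};

/* Which cache line, relative to the start of the struct, an offset is in. */
static size_t line_of(size_t offset)
{
	return offset / CACHE_LINE_SIZE;
}
```

_Alignas fixes the offset inside the struct; the object itself also has to start on a line boundary (true for static and automatic objects of this type, but plain malloc() may need to be replaced by aligned_alloc()).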


Re: [PATCH v2 net-next 0/4] tcp: better smp listener behavior

2015-10-12 Thread David Miller
From: Eric Dumazet 
Date: Thu,  8 Oct 2015 14:58:53 -0700

> As promised in last patch series, we implement a better SO_REUSEPORT
> strategy, based on cpu hints if given by the application.
> 
> We also moved sk_refcnt out of the cache line containing the lookup
> keys, as it was considerably slowing down smp operations because
> of false sharing. This was simpler than converting listen sockets
> to conventional RCU (to avoid sk_refcnt dirtying)
> 
> Could process 6.0 Mpps SYN instead of 4.2 Mpps on my test server.

Series applied, thanks Eric.


Re: [PATCH v4 4/4] Adds hardware supported cross timestamp

2015-10-12 Thread kbuild test robot
Hi Christopher,

[auto build test ERROR on net/master -- if it's inappropriate base, please suggest rules for selecting the more suitable base]

url:
https://github.com/0day-ci/linux/commits/Christopher-S-Hall/Patchset-enabling-hardware-based-cross-timestamps-for-next-gen-Intel-platforms/20151013-095135
config: sparc64-defconfig (attached as .config)
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=sparc64 

All errors (new ones prefixed by >>):

>> drivers/net/ethernet/intel/e1000e/ptp.c:28:21: fatal error: asm/tsc.h: No such file or directory
#include <asm/tsc.h>
^
   compilation terminated.

vim +28 drivers/net/ethernet/intel/e1000e/ptp.c

22  /* PTP 1588 Hardware Clock (PHC)
23   * Derived from PTP Hardware Clock driver for Intel 82576 and 82580 (igb)
24   * Copyright (C) 2011 Richard Cochran 
25   */
26  
27  #include "e1000.h"
  > 28  #include <asm/tsc.h>
29  #include 
30  
31  /**

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn

2015-10-12 Thread John Fastabend
On 15-10-12 08:28 PM, Scott Feldman wrote:
> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko  wrote:
>> From: Jiri Pirko 
>>
>> Since spinlock is held here, defer the switchdev operation.
>>
>> Signed-off-by: Jiri Pirko 
>> ---
>>  net/bridge/br_fdb.c | 5 -
>>  net/bridge/br_if.c  | 3 +++
>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>> index f5e7da0..c88bd8e 100644
>> --- a/net/bridge/br_fdb.c
>> +++ b/net/bridge/br_fdb.c
>> @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>  static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
>>  {
>> struct switchdev_obj_port_fdb fdb = {
>> -   .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
>> +   .obj = {
>> +   .id = SWITCHDEV_OBJ_ID_PORT_FDB,
>> +   .flags = SWITCHDEV_F_DEFER,
>> +   },
>> .vid = f->vlan_id,
>> };
>>
>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
>> index 934cae9..09147cb 100644
>> --- a/net/bridge/br_if.c
>> +++ b/net/bridge/br_if.c
>> @@ -24,6 +24,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include "br_private.h"
>>
>> @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
>> list_del_rcu(&p->list);
>>
>> br_fdb_delete_by_port(br, p, 0, 1);
>> +   switchdev_flush_deferred();
>> +
> 
> This potentially flushes other (valid) work on the deferred queue not
> related to FDB del.
> 
> I wonder if this flush step is necessary at all?  The work we deferred
> to delete the FDB entry can still happen after the port has been
> removed (del_nbp).  If the port driver/device find the FDB entry, then
> delete it, otherwise ignore it.
> 

Just the first thing that springs to mind reading this comment is,

  - del gets deferred
  - add fdb
  - del runs

Is there an issue here? Sorry I'll do a more thorough review now just
thought I would toss it out there before I forget.

Thanks,
John
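John's sequence can be modelled in a few lines. This is a hypothetical userspace model (struct fdb_model and the fdb_* helpers are invented for illustration, not switchdev API): deletions are queued and only applied at flush time, while adds take effect immediately. It shows why an add that slips in between a deferred delete and its execution would be lost; Scott's follow-up in this thread argues that, for externally learned entries, such an add is not expected to occur.

```c
#include <stdbool.h>

#define MAX_ENTRIES  16
#define MAX_DEFERRED 16

/* Hypothetical device FDB: present[i] says whether entry i is
 * programmed in hardware; deferred_del holds queued deletions. */
struct fdb_model {
	bool present[MAX_ENTRIES];
	int deferred_del[MAX_DEFERRED];
	int n_deferred;
};

/* Immediate add: takes effect synchronously. */
void fdb_add_now(struct fdb_model *m, int entry)
{
	m->present[entry] = true;
}

/* Deferred delete: only records the request (e.g. because a spinlock
 * is held); hardware is not touched yet. */
void fdb_del_deferred(struct fdb_model *m, int entry)
{
	if (m->n_deferred < MAX_DEFERRED)
		m->deferred_del[m->n_deferred++] = entry;
}

/* Flush: run queued deletions in FIFO order. */
void fdb_flush_deferred(struct fdb_model *m)
{
	for (int i = 0; i < m->n_deferred; i++)
		m->present[m->deferred_del[i]] = false;
	m->n_deferred = 0;
}
```

Running "del deferred, add, flush" leaves the entry absent even though the last visible request was an add; flushing before the add leaves it present.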


Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers

2015-10-12 Thread Alexei Starovoitov

On 10/12/15 8:51 PM, Wangnan (F) wrote:

Why is 'set disable' needed?
the example given in cover letter shows the use case where you want
to receive samples only within sys_write() syscall.
The example makes sense, but sys_write() is running on this cpu, so just
disabling it on the current one is enough.



Our real use case is control of system-wide sampling. For example,
we need to sample on all CPUs when a smartphone starts refreshing its display.
We need all CPUs because in an Android system there are plenty of threads
involved in this behavior. We can't achieve this by controlling
sampling on only one CPU. This is the reason we need 'set enable'
and 'set disable'.


ok, but that use case may have a different enable/disable pattern.
In the sys_write example, ultra-fast enable/disable is a must-have, since
the whole syscall is fast and overhead should be minimal.
But for display refresh? We're talking milliseconds, no?
Can you just ioctl() it from user space?
If the cost of enable/disable is high or the time range between toggling is
long, then doing it from the bpf program doesn't make sense. Instead
the program can do bpf_perf_event_output() to send a notification to
user space that the condition is met, and user space can ioctl() the events.



Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred

2015-10-12 Thread John Fastabend
On 15-10-12 07:52 PM, Scott Feldman wrote:
> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko  wrote:
>> From: Jiri Pirko 
>>
>> Caller should know if he can call attr_set directly (when holding RTNL)
>> or if he has to defer the attr_set processing for later.
>>
>> This also allows drivers to sleep inside attr_set and report operation
>> status back to switchdev core. Switchdev core then warns if status is
>> not ok, instead of silent errors happening in drivers.
>>
>> Signed-off-by: Jiri Pirko 
>> ---
>>  include/net/switchdev.h   |   1 +
>>  net/bridge/br_stp.c   |   3 +-
>>  net/switchdev/switchdev.c | 107 
>> --
>>  3 files changed, 59 insertions(+), 52 deletions(-)
>>
>> diff --git a/include/net/switchdev.h b/include/net/switchdev.h
>> index d2879f2..6b109e4 100644
>> --- a/include/net/switchdev.h
>> +++ b/include/net/switchdev.h
>> @@ -17,6 +17,7 @@
>>
>>  #define SWITCHDEV_F_NO_RECURSE BIT(0)
>>  #define SWITCHDEV_F_SKIP_EOPNOTSUPP BIT(1)
>> +#define SWITCHDEV_F_DEFER  BIT(2)
>>
>>  struct switchdev_trans_item {
>> struct list_head list;
>> diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
>> index db6d243de..80c34d7 100644
>> --- a/net/bridge/br_stp.c
>> +++ b/net/bridge/br_stp.c
>> @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
>>  {
>> struct switchdev_attr attr = {
>> .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
>> +   .flags = SWITCHDEV_F_DEFER,
>> .u.stp_state = state,
>> };
>> int err;
>>
>> p->state = state;
>> err = switchdev_port_attr_set(p->dev, &attr);
>> -   if (err && err != -EOPNOTSUPP)
>> +   if (err)
> 
> This looks like a problem as now all other non-switchdev ports will
> get a WARN in the log when STP state changes.  We should only WARN if
> there was an err and the err is not -EOPNOTSUPP.
> 
>> br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
>> (unsigned int) p->port_no, p->dev->name);
>>  }
> 
> 
> 
>>  struct switchdev_attr_set_work {
>> struct work_struct work;
>> struct net_device *dev;
>> @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
>>  {
>> struct switchdev_attr_set_work *asw =
>> container_of(work, struct switchdev_attr_set_work, work);
>> +   bool rtnl_locked = rtnl_is_locked();
>> int err;
>>
>> -   rtnl_lock();
>> -   err = switchdev_port_attr_set(asw->dev, &asw->attr);
>> +   if (!rtnl_locked)
>> +   rtnl_lock();
> 
> I'm not following this change.  If someone else has rtnl_lock, we'll
> not wait to grab it here ourselves, and proceed as if we have the
> lock.  But what if that someone else releases the lock in the middle
> of us doing switchdev_port_attr_set_now?  Seems we want to
> unconditionally wait and grab the lock.  We need to block anything
> from moving while we do the attr set.
> 

There is also an additional race between setting rtnl_locked, the if stmt,
and then grabbing the lock. There seems to be something of a pattern
around this where other subsystems use rtnl_trylock() and, if it fails,
restart/re-queue the operation to retry. That looks like how you handle
it in the team driver, at least.

>> +   err = switchdev_port_attr_set_now(asw->dev, &asw->attr);
>> if (err && err != -EOPNOTSUPP)
>> netdev_err(asw->dev, "failed (err=%d) to set attribute (id=%d)\n",
>>err, asw->attr.id);
>> -   rtnl_unlock();
>> +   if (!rtnl_locked)
>> +   rtnl_unlock();
>>
>> dev_put(asw->dev);
>> kfree(work);
>> @@ -211,7 +257,7 @@ static int switchdev_port_attr_set_defer(struct net_device *dev,
>> asw->dev = dev;
>> memcpy(&asw->attr, attr, sizeof(asw->attr));
>>
>> -   schedule_work(&asw->work);
>> +   queue_work(switchdev_wq, &asw->work);
>>
>> return 0;
>>  }

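The condition Scott asks to keep can be isolated as a predicate. A minimal sketch; stp_offload_should_warn() is a hypothetical helper, not bridge code. The idea is that a port without switchdev support reports -EOPNOTSUPP, which is expected and should not produce a warning.

```c
#include <errno.h>
#include <stdbool.h>

/* Warn only on real failures: success (0) and "not supported by this
 * port" (-EOPNOTSUPP) are both quiet outcomes. */
bool stp_offload_should_warn(int err)
{
	return err && err != -EOPNOTSUPP;
}
```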


Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers

2015-10-12 Thread Alexei Starovoitov

On 10/12/15 9:34 PM, Wangnan (F) wrote:



On 2015/10/13 12:16, Alexei Starovoitov wrote:

On 10/12/15 8:51 PM, Wangnan (F) wrote:

Why is 'set disable' needed?
the example given in cover letter shows the use case where you want
to receive samples only within sys_write() syscall.
The example makes sense, but sys_write() is running on this cpu, so just
disabling it on the current one is enough.



Our real use case is control of system-wide sampling. For example,
we need to sample on all CPUs when a smartphone starts refreshing its display.
We need all CPUs because in an Android system there are plenty of threads
involved in this behavior. We can't achieve this by controlling
sampling on only one CPU. This is the reason we need 'set enable'
and 'set disable'.


ok, but that use case may have a different enable/disable pattern.
In the sys_write example, ultra-fast enable/disable is a must-have, since
the whole syscall is fast and overhead should be minimal.
But for display refresh? We're talking milliseconds, no?
Can you just ioctl() it from user space?
If the cost of enable/disable is high or the time range between toggling is
long, then doing it from the bpf program doesn't make sense. Instead
the program can do bpf_perf_event_output() to send a notification to
user space that the condition is met, and user space can ioctl() the events.



OK. I think I understand your design principle: everything inside BPF
should be as fast as possible.

Making user space control events using ioctl makes things harder. You know that
'perf record' itself doesn't care too much about the events it receives. It only
copies data to perf.data, but what we want is to use perf record simply like
this:

  # perf record -e evt=cycles -e control.o/pmu=evt/ -a sleep 100

And in control.o we create uprobe points to mark the start and finish of
a frame:

  SEC("target=/a/b/c.o\nstartFrame=0x123456")
  int startFrame(void *) {
bpf_pmu_enable(pmu);
return 1;
  }

  SEC("target=/a/b/c.o\nfinishFrame=0x234568")
  int finishFrame(void *) {
bpf_pmu_disable(pmu);
return 1;
  }

I think this makes sense as well.


yes. that looks quite useful,
but did you consider re-entrant startFrame()?
start << here sampling starts
  start
  finish << here all samples disabled?!
finish
and what about startFrame()/finishFrame() running on all cpus of that user app?
If one cpu enters startFrame() while another cpu is doing finishFrame(),
what should the behavior be? Is sampling still enabled on all cpus, or off?
Either case doesn't seem to work with simple enable/disable.
A few emails back in this thread, I mentioned inc/dec of a flag
to solve that.


What about using an implementation similar to
PERF_EVENT_IOC_SET_OUTPUT: create a new ioctl like
PERF_EVENT_IOC_SET_ENABLER, let perf select an event as the 'enabler',
and then BPF can still control one atomic variable to enable/disable
a set of events.


you lost me on that last sentence. How this 'enabler' will work?
Also I'm still missing what's wrong with perf doing ioctl() on
events on all cpus manually when bpf program tells it to do so.
Is it speed you concerned about or extra work in perf ?
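The inc/dec idea Alexei mentions can be sketched with a shared atomic counter: sampling stays on while at least one frame is open, so a nested start/start/finish sequence does not switch it off early, and overlapping start/finish on different CPUs resolves to "on until the last finish". This is a hypothetical userspace model using C11 atomics; frame_start(), frame_finish() and sampling_enabled() are invented names, not BPF helpers.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Number of frames currently open, across all CPUs. */
static atomic_int active_frames = 0;

/* Called where startFrame() would run. */
void frame_start(void)
{
	atomic_fetch_add(&active_frames, 1);
}

/* Called where finishFrame() would run. */
void frame_finish(void)
{
	atomic_fetch_sub(&active_frames, 1);
}

/* Samples are kept only while at least one frame is open. */
bool sampling_enabled(void)
{
	return atomic_load(&active_frames) > 0;
}
```

An unmatched finish would drive the counter negative; a real implementation would need to decide how to handle that case.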



Re: Issue with /proc/sys/net/ipv4/tcp_mem

2015-10-12 Thread Eric W. Biederman
Eric Dumazet  writes:

> On Mon, 2015-10-12 at 11:37 -0500, Eric W. Biederman wrote:
>> wangyufen  writes:
>> 
>> > Hi,
>> >
>> > I tried on linux-4.1:
>> > linux:~# cat /proc/sys/net/ipv4/tcp_mem 
>> > 8388608	12582912	16777216
>> > linux:~# echo 1234 >/proc/sys/net/ipv4/tcp_mem 
>> > -bash: echo: write error: Invalid argument
>> > linux:~# cat /proc/sys/net/ipv4/tcp_mem 
>> > 1234	12582912	16777216
>> >
>> > the echo operation got error, but value already written to tcp_mem.
>> >
>> > I checked, patch f594d63199688ad568fb caused the issue.
>> 
>> 
>> If your problem is that you can not write a single value and instead
>> have to write all three values I don't know what to tell you.  I don't
>> see how that could have ever worked.
>> 
>> Certainly the commit you pointed at did not change that behavior.
>
> I would not be so sure.
> Above commit added a regression for partial writes.
> If a write() returns an error like EINVAL, we expect no change occurred.
>
> Prior code was calling proc_doulongvec_minmax() using a temporary array,
> and updated tcp_mem[0 .. 2] only if proc_doulongvec_minmax() returned 0
>
>ret = proc_doulongvec_minmax(&tmp, write, buffer, lenp, ppos);
>if (ret)
>return ret;
> #ifdef CONFIG_MEMCG_KMEM
>   // deleted for clarity
> #endif
>
>net->ipv4.sysctl_tcp_mem[0] = vec[0];
>net->ipv4.sysctl_tcp_mem[1] = vec[1];
>net->ipv4.sysctl_tcp_mem[2] = vec[2];
>
>return 0;
>
> We could argue it is a bug in proc_doulongvec_minmax().
> This helper probably should allocate a temp buffer,
> as we have the same issue with udp_mem[].

Point.  We do store the value on partial writes when before we did not.

That is weird.  Clearly someone noticed.  I agree this is a confusing
corner case in proc_doulongvec_minmax that may be worth addressing.

Does this cause a regression in a real application?   I definitely would
like to know what in the world a real application is doing that causes
it to break with this difference in behavior before doing anything,
because I am dense enough not to see how an application could
meaningfully care about this difference in behavior.

Eric
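Eric Dumazet's point is that a write should be all-or-nothing: parse into a temporary array and commit only if every value parsed. A minimal userspace sketch of that pattern, assuming a valid write carries exactly three unsigned values; tcp_mem_model and tcp_mem_write() are hypothetical stand-ins, not the kernel's sysctl handler or proc_doulongvec_minmax(), and no min/max bounds are enforced.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the three tcp_mem values. */
unsigned long tcp_mem_model[3] = { 8388608UL, 12582912UL, 16777216UL };

int tcp_mem_write(const char *buf)
{
	unsigned long vec[3];	/* temporary: nothing is committed yet */
	const char *p = buf;
	char *end;

	for (int i = 0; i < 3; i++) {
		errno = 0;
		vec[i] = strtoul(p, &end, 10);
		if (end == p || errno == ERANGE)
			return -EINVAL;	/* missing or malformed value */
		p = end;
	}
	while (isspace((unsigned char)*p))
		p++;
	if (*p != '\0')
		return -EINVAL;		/* trailing garbage */

	/* All three parsed: only now touch the live values. */
	for (int i = 0; i < 3; i++)
		tcp_mem_model[i] = vec[i];
	return 0;
}
```

With this shape, writing "1234" returns -EINVAL and leaves all three values untouched, which is the behavior the original report expected.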



Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn

2015-10-12 Thread John Fastabend
On 15-10-12 09:19 PM, Scott Feldman wrote:
> On Mon, Oct 12, 2015 at 8:31 PM, John Fastabend
>  wrote:
>> On 15-10-12 08:28 PM, Scott Feldman wrote:
>>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko  wrote:
 From: Jiri Pirko 

 Since spinlock is held here, defer the switchdev operation.

 Signed-off-by: Jiri Pirko 
 ---
  net/bridge/br_fdb.c | 5 -
  net/bridge/br_if.c  | 3 +++
  2 files changed, 7 insertions(+), 1 deletion(-)

 diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
 index f5e7da0..c88bd8e 100644
 --- a/net/bridge/br_fdb.c
 +++ b/net/bridge/br_fdb.c
 @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
  static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
  {
 struct switchdev_obj_port_fdb fdb = {
 -   .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
 +   .obj = {
 +   .id = SWITCHDEV_OBJ_ID_PORT_FDB,
 +   .flags = SWITCHDEV_F_DEFER,
 +   },
 .vid = f->vlan_id,
 };

 diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
 index 934cae9..09147cb 100644
 --- a/net/bridge/br_if.c
 +++ b/net/bridge/br_if.c
 @@ -24,6 +24,7 @@
  #include 
  #include 
  #include 
 +#include 

  #include "br_private.h"

 @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
 list_del_rcu(&p->list);

 br_fdb_delete_by_port(br, p, 0, 1);
 +   switchdev_flush_deferred();
 +
>>>
>>> This potentially flushes other (valid) work on the deferred queue not
>>> related to FDB del.
>>>
>>> I wonder if this flush step is necessary at all?  The work we deferred
>>> to delete the FDB entry can still happen after the port has been
>>> removed (del_nbp).  If the port driver/device find the FDB entry, then
>>> delete it, otherwise ignore it.
>>>
>>
>> Just the first thing that springs to mind reading this comment is,
>>
>>   - del gets deferred
>>   - add fdb
>>   - del runs
>>
>> Is there an issue here? Sorry I'll do a more thorough review now just
>> thought I would toss it out there before I forget.
> 
> It's a valid thought to consider, for sure.  The context is these are
> only FDB entries added by an external learn event.  So I believe in
> your sequence, the second step to add fdb entry wouldn't happen as the
> fdb entry already exists at that point (in other words, the entry has
> already been learned on external device and pushed up via notifier to
> bridge).  So I think we're OK in regards to your question.
> 

ah I see, so the takeaway is we need to be very careful about who/what
sets the deferred bit or you might get yourself in a world of hurt.

Here you are just ensuring you get all the fdb addrs out of the device.
Seems OK to me; just be sure you don't try to set the deferred bit on
the attributes setting the state to DISABLED, so we don't get a race
there.

Thanks,
John



Re: [patch net-next v4 2/7] switchdev: allow caller to explicitly request attr_set as deferred

2015-10-12 Thread John Fastabend
On 15-10-12 10:45 PM, Jiri Pirko wrote:
> Tue, Oct 13, 2015 at 06:40:25AM CEST, john.fastab...@gmail.com wrote:
>> On 15-10-12 07:52 PM, Scott Feldman wrote:
>>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko  wrote:
 From: Jiri Pirko 

 Caller should know if he can call attr_set directly (when holding RTNL)
 or if he has to defer the attr_set processing for later.

 This also allows drivers to sleep inside attr_set and report operation
 status back to switchdev core. Switchdev core then warns if status is
 not ok, instead of silent errors happening in drivers.

 Signed-off-by: Jiri Pirko 
 ---
  include/net/switchdev.h   |   1 +
  net/bridge/br_stp.c   |   3 +-
  net/switchdev/switchdev.c | 107 
 --
  3 files changed, 59 insertions(+), 52 deletions(-)

 diff --git a/include/net/switchdev.h b/include/net/switchdev.h
 index d2879f2..6b109e4 100644
 --- a/include/net/switchdev.h
 +++ b/include/net/switchdev.h
 @@ -17,6 +17,7 @@

  #define SWITCHDEV_F_NO_RECURSE BIT(0)
  #define SWITCHDEV_F_SKIP_EOPNOTSUPP BIT(1)
 +#define SWITCHDEV_F_DEFER  BIT(2)

  struct switchdev_trans_item {
 struct list_head list;
 diff --git a/net/bridge/br_stp.c b/net/bridge/br_stp.c
 index db6d243de..80c34d7 100644
 --- a/net/bridge/br_stp.c
 +++ b/net/bridge/br_stp.c
 @@ -41,13 +41,14 @@ void br_set_state(struct net_bridge_port *p, unsigned int state)
  {
 struct switchdev_attr attr = {
 .id = SWITCHDEV_ATTR_ID_PORT_STP_STATE,
 +   .flags = SWITCHDEV_F_DEFER,
 .u.stp_state = state,
 };
 int err;

 p->state = state;
 err = switchdev_port_attr_set(p->dev, &attr);
 -   if (err && err != -EOPNOTSUPP)
 +   if (err)
>>>
>>> This looks like a problem as now all other non-switchdev ports will
>>> get a WARN in the log when STP state changes.  We should only WARN if
>>> there was an err and the err is not -EOPNOTSUPP.
>>>
 br_warn(p->br, "error setting offload STP state on port %u(%s)\n",
 (unsigned int) p->port_no, p->dev->name);
  }
>>>
>>> 
>>>
  struct switchdev_attr_set_work {
 struct work_struct work;
 struct net_device *dev;
 @@ -183,14 +226,17 @@ static void switchdev_port_attr_set_work(struct work_struct *work)
  {
 struct switchdev_attr_set_work *asw =
 container_of(work, struct switchdev_attr_set_work, work);
 +   bool rtnl_locked = rtnl_is_locked();
 int err;

 -   rtnl_lock();
 -   err = switchdev_port_attr_set(asw->dev, &asw->attr);
 +   if (!rtnl_locked)
 +   rtnl_lock();
>>>
>>> I'm not following this change.  If someone else has rtnl_lock, we'll
>>> not wait to grab it here ourselves, and proceed as if we have the
>>> lock.  But what if that someone else releases the lock in the middle
>>> of us doing switchdev_port_attr_set_now?  Seems we want to
>>> unconditionally wait and grab the lock.  We need to block anything
>>> from moving while we do the attr set.
>>>
>>
>> There is also an additional race between setting rtnl_locked, the if stmt,
>> and then grabbing the lock. There seems to be something of a pattern
>> around this where other subsystems use rtnl_trylock() and, if it fails,
>> restart/re-queue the operation to retry. That looks like how you handle
>> it in the team driver, at least.
> 
> No, this is for a different case: the case where someone calls
> switchdev_flush_deferred() while holding the rtnl_lock.
> 

OK rather than funky if stmt could you just do a rtnl_trylock() and
put a comment explaining the reasoning?
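John's suggestion replaces the rtnl_is_locked() check followed by rtnl_lock() with a single try-lock, so there is no window between testing and taking the lock. A minimal userspace model of the try-lock-or-requeue pattern, using a C11 atomic_flag as a stand-in for RTNL; big_lock_trylock(), attr_set_work() and the WORK_* results are hypothetical names. It does not by itself address the case Jiri describes, where the thread flushing the deferred work already holds the lock: a work item that only requeues would not make progress while that lock is held.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for a global lock such as RTNL (hypothetical model). */
static atomic_flag big_lock = ATOMIC_FLAG_INIT;

/* test_and_set returns the previous value: false means we now own it.
 * Check and acquire happen in one atomic step. */
static bool big_lock_trylock(void)
{
	return !atomic_flag_test_and_set(&big_lock);
}

static void big_lock_unlock(void)
{
	atomic_flag_clear(&big_lock);
}

enum work_result { WORK_DONE, WORK_REQUEUE };

/* Work item body: never assumes someone else holds the lock on our
 * behalf; if the lock is busy, ask the caller to requeue us. */
enum work_result attr_set_work(int *attr, int value)
{
	if (!big_lock_trylock())
		return WORK_REQUEUE;
	*attr = value;		/* the "attr set" done under the lock */
	big_lock_unlock();
	return WORK_DONE;
}
```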


Re: [PATCH net-next 0/3] net: Pass net into defragmentation

2015-10-12 Thread David Miller
From: ebied...@xmission.com (Eric W. Biederman)
Date: Fri, 09 Oct 2015 13:42:20 -0500

> 
> This is the next installment of my work to pass struct net through the
> output path so the code does not need to guess how to figure out which
> network namespace it is in, and ultimately routes can have output
> devices in another network namespace.
> 
> In netfilter and af_packet we defragment packets in the output path,
> and there is the usual amount of confusion about how to compute which
> net we are processing the packets in.  This patchset clears that
> confusion up by explicitly passing in struct net in ip_defrag,
> ip_check_defrag, and nf_ct_frag6_gather.
> 
> The changes are also available against net-next at:
> git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/net-next.git master

I applied this as a patch series instead of pulling, in order to
get Pablo's ACKs.

Thanks.


Re: [RFC PATCH 1/2] perf: Add the flag sample_disable not to output data on samples

2015-10-12 Thread xiakaixu
于 2015/10/13 3:20, Alexei Starovoitov 写道:
> On 10/12/15 2:02 AM, Kaixu Xia wrote:
>> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
>> index f57d7fe..25e073d 100644
>> --- a/include/linux/bpf.h
>> +++ b/include/linux/bpf.h
>> @@ -39,6 +39,7 @@ struct bpf_map {
>>   u32 max_entries;
>>   const struct bpf_map_ops *ops;
>>   struct work_struct work;
>> +atomic_t perf_sample_disable;
>>   };
>>
>>   struct bpf_map_type_list {
>> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
>> index 092a0e8..0606d1d 100644
>> --- a/include/linux/perf_event.h
>> +++ b/include/linux/perf_event.h
>> @@ -483,6 +483,8 @@ struct perf_event {
>>   perf_overflow_handler_t overflow_handler;
>>   void *overflow_handler_context;
>>
>> +atomic_t *sample_disable;
> 
> this looks fragile and unnecessary.
> Why add such a field to the generic bpf_map and carry its pointer into perf_event?
> A single extra field in perf_event would have been enough.
> Even better is to avoid adding any fields.
> There is already event->state; why not use that?
> Are the proper perf_event_enable/disable so heavy that another
> mechanism is needed? cpu_function_call is probably too much to do
> from a bpf program, but can that be simplified?
> Based on the use case from cover letter, sounds like you want
> something like soft_disable?
> Then extending event->state would make the most sense.
> Also consider the case of re-entrant event enable/disable.
> So inc/dec of a flag may be needed?

Thanks for your comments!
I've tried perf_event_enable/disable, but there is a warning caused
by cpu_function_call. The main reason is as follows:
 int smp_call_function_single(...)
 {
...
WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
 && !oops_in_progress);
...
}
So I added the extra atomic flag field in order to avoid this problem.
> 
> 
> .
> 




Re: [PATCH net-next] tun: use sk_fullsock() before reading sk->sk_tsflags

2015-10-12 Thread David Miller
From: Eric Dumazet 
Date: Fri, 09 Oct 2015 15:42:21 -0700

> From: Eric Dumazet 
> 
> timewait or request sockets are small and do not contain sk->sk_tsflags
> 
> Without this fix, we might read garbage, and crash later in
> 
> __skb_complete_tx_timestamp()
>  -> sock_queue_err_skb()
> 
> (These pseudo sockets do not have an error queue either)
> 
> Fixes: ca6fb0651883 ("tcp: attach SYNACK messages to request sockets instead of listener")
> Signed-off-by: Eric Dumazet 

Applied, thanks.
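The fix relies on checking the socket kind before touching a field that only full sockets have. A heavily simplified, hypothetical model of that idea (these types are not struct sock, and the kernel's sk_fullsock() tests sk_state rather than a separate kind field): all variants share a common header, and tsflags lives only in the full variant.

```c
#include <stdbool.h>

enum sk_kind { SK_FULL, SK_TIMEWAIT, SK_REQUEST };

/* Common header shared by every socket-like object. */
struct sock_common_model {
	enum sk_kind kind;
};

/* Only full sockets are large enough to carry tsflags. */
struct full_sock_model {
	struct sock_common_model common;	/* must be first */
	unsigned int tsflags;
};

static bool sk_is_full(const struct sock_common_model *sk)
{
	return sk->kind == SK_FULL;
}

/* Read tsflags only when the object really is a full socket; for the
 * small variants the field does not exist, and reading it would read
 * past the end of the object. */
unsigned int get_tsflags(const struct sock_common_model *sk)
{
	if (!sk_is_full(sk))
		return 0;
	return ((const struct full_sock_model *)sk)->tsflags;
}
```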


Re: [PATCH 2/2] atm: iphase: fix misleading indention

2015-10-12 Thread David Miller
From: Tillmann Heidsieck 
Date: Sat, 10 Oct 2015 21:47:19 +0200

> Fix a smatch warning:
> drivers/atm/iphase.c:1178 rx_pkt() warn: curly braces intended?
> 
> The code is correct; the indentation is misleading. In case the allocation
> of skb fails, we want to skip to the end.
> 
> Signed-off-by: Tillmann Heidsieck 

Applied to net-next.


Re: [PATCH 1/2] atm: iphase: return -ENOMEM instead of -1 in case of failed kmalloc()

2015-10-12 Thread David Miller
From: Tillmann Heidsieck 
Date: Sat, 10 Oct 2015 21:47:18 +0200

> Smatch complains about returning hard coded error codes, silence this
> warning.
> 
> drivers/atm/iphase.c:115 ia_enque_rtn_q() warn: returning -1 instead of -ENOMEM is sloppy
> 
> Signed-off-by: Tillmann Heidsieck 

Applied to net-next.


Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers

2015-10-12 Thread Alexei Starovoitov

On 10/12/15 8:27 PM, Wangnan (F) wrote:

Then how do we avoid racing? For example, while one core is disabling all
events in a map, another core is enabling all of them. This race may cause
several perf events in a map to dump samples while other events do not. To
avoid such a race I think some locking must be introduced, and then the cost
is even higher.

The reason we introduce an atomic pointer is that each operation should
control a set of events, not one event, due to the per-cpu nature of
perf events.


Why is 'set disable' needed?
the example given in cover letter shows the use case where you want
to receive samples only within sys_write() syscall.
The example makes sense, but sys_write() is running on this cpu, so just
disabling it on the current one is enough.
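Wangnan's argument is that one atomic value shared by every per-CPU event lets a single store switch the whole set at once, instead of toggling each event and risking a partially updated set. A hypothetical userspace model of that layout (struct event_model and event_overflow() are invented names, not perf or BPF API):

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4

/* Hypothetical per-CPU event: every event points at the same flag. */
struct event_model {
	int cpu;
	atomic_bool *sample_enabled;	/* shared by the whole set */
	int samples_emitted;
};

/* Overflow path: emit a sample only if the shared flag is set. One
 * atomic_store() to the flag changes the outcome for every CPU. */
void event_overflow(struct event_model *ev)
{
	if (atomic_load(ev->sample_enabled))
		ev->samples_emitted++;
}
```

Alexei's question remains whether per-set control is needed at all when the triggering condition is local to the current CPU.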



RE: [PATCH net-next v5 01/10] qed: Add module with basic common support

2015-10-12 Thread Yuval Mintz
> 500 KB of a basic something is mildly reviewable for mere mortals.
Understood, obviously. Thanks for the time you've put into this.

> [...]
> > +/* forward */
> > +struct qed_ptt_pool;
> > +struct qed_spq;
> > +struct qed_sb_info;
> > +struct qed_sb_attn_info;
> > +struct qed_cxt_mngr;
> > +struct qed_sb_sp_info;
> > +struct qed_mcp_info;
Could have been solved by adding additional header files.
But given the high number of files already introduced by this,
the thinking was that a couple of forward declarations were better than
adding lots of new header files to prevent this sort of dependency.

> [...]
> > +struct qed_simd_fp_handler {
> > + void*token;
> > + void(*func)(void *);
> > +};
> Use union * ?
The token is a cookie to be used by a func, so union isn't appropriate.

> [...]
> > +static int qed_ilt_shadow_alloc(struct qed_hwfn *p_hwfn)
> > +{
> > + struct qed_cxt_mngr *p_mngr = p_hwfn->p_cxt_mngr;
> > + struct qed_ilt_client_cfg *clients = p_mngr->clients;
> > + struct qed_ilt_cli_blk *p_blk;
> > + u32 size, i, j;
> > + int rc;
> > +
> > + size = qed_cxt_ilt_shadow_size(clients);
> > + p_mngr->ilt_shadow = kcalloc(size, sizeof(struct qed_dma_mem),
> > +  GFP_KERNEL);
> > + if (!p_mngr->ilt_shadow) {
> > + DP_NOTICE(p_hwfn, "Failed to allocate ilt shadow table\n");
> > + rc = -ENOMEM;
> > + goto ilt_shadow_fail;
> > + } else {
> > + DP_VERBOSE(p_hwfn, QED_MSG_ILT,
> > +"Allocated 0x%x bytes for ilt shadow\n",
> > +(u32)(size * sizeof(struct qed_dma_mem)));
> > + }
> The "else" branch after the "goto" isn't idiomatic.
Not that I mind, but is such a preference described in any style guide?

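For reference, the shape the reviewer describes drops the else branch: the error path jumps away, so the success log can follow unconditionally. A minimal userspace sketch, with calloc() standing in for kcalloc() and fprintf() for DP_NOTICE()/DP_VERBOSE(); struct dma_mem_model and shadow_alloc() are hypothetical names, not qed code.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct qed_dma_mem. */
struct dma_mem_model {
	void *virt;
	unsigned long phys;
};

int shadow_alloc(struct dma_mem_model **out, unsigned int size)
{
	struct dma_mem_model *shadow;
	int rc;

	shadow = calloc(size, sizeof(*shadow));
	if (!shadow) {
		fprintf(stderr, "Failed to allocate ilt shadow table\n");
		rc = -ENOMEM;
		goto shadow_fail;
	}
	/* No "else": the failure path has already jumped away. */
	fprintf(stderr, "Allocated 0x%x bytes for ilt shadow\n",
		(unsigned int)(size * sizeof(*shadow)));

	*out = shadow;
	return 0;

shadow_fail:
	*out = NULL;
	return rc;
}
```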

> [...]
> > +static int qed_init_qm_info(struct qed_hwfn *p_hwfn)
> > +{
> [...]
> > + /* PQs will be arranged as follows: First per-TC PQ then pure-LB queue.
> > +   */
> > + qm_info->qm_pq_params = kzalloc(sizeof(*qm_info->qm_pq_params) *
> > + num_pqs, GFP_ATOMIC);
> qed_init_qm_info is only used in qed_resc_alloc. qed_resc_alloc performs
> GFP_KERNEL alloc and qed_resc_alloc does not use qed_init_qm_info in
> a spinlocked section. I would thus expect both to use the same allocation
> flag.
I know we're wasteful in using GFP_ATOMIC in many places in the driver.
We've already revised this in our dev tree, but we're trying to use the
same code base for the initial submission [otherwise it would make a
difficult task even more difficult].
Thanks for pointing this out, but unless this is considered crucial for the
initial submission, we'll fix it later on.

Thanks,
Yuval


Re: [PATCH v4 1/4] Produce system time from correlated clocksource

2015-10-12 Thread Richard Cochran
On Mon, Oct 12, 2015 at 11:45:19AM -0700, Christopher S. Hall wrote:
> Another representative use case of time sync and the correlated
> clocksource (in addition to PTP noted above) is PTP synchronized
> audio.

The added explanations of the audio use case do help.  However, you
did not address my point in the last series in any way.
 
> In a streaming application, as an example, samples will be sent
> and/or received by multiple devices with a presentation time that is
> in terms of the PTP master clock. Synchronizing the audio output on
> these devices requires correlating the audio clock with the PTP
> master clock. The more precise this correlation is, the better the
> audio quality (i.e. out of sync audio sounds bad).


This is mega important.  You want to convert PTP time into audio clock
time.  There is no need for the system time at all.
 
> From an application standpoint, to correlate the PTP master clock
> with the audio device clock, the system clock is used as a
> intermediate timebase.

But why involve the system time base?

> The transforms such an application would
> perform are:
> 
> System Clock <-> Audio clock
> System Clock <-> Network Device Clock [<-> PTP Master Clock]

This is extra work with no benefit.  In fact, this hurts you
because of the need to avoid update_wall_time AND because of the
NTP frequency adjustments.  Cascaded servos are prone to gain peaking,
and this can easily be avoided in this case.
 
> Modern Intel platforms can perform a more accurate cross-
> timestamp in hardware (ART,audio device clock).  The audio driver
> requires ART->system time transforms -- the same as required for
> the network driver.

No, it doesn't need the system time.  It only needs the PTP time.

> The modification to the original patch accomodates these
> slow devices by adding the option of providing an ART value outside
> of the retry loop and adding a history which can consulted in the
> case of an out of date counter value. The history is kept by
> making the shadow_timekeeper an array. Each write to the
> timekeeper rotates through the array, preserving a
> history of updates.

This is all wrong.  All you need is to provide the DSP with (ART, PTP)
pairs.  This can be done at a multiple of the DSP period, like every
1, 10, or 100 milliseconds.
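
Richard's alternative can be sketched as plain linear interpolation. Given two (ART, PTP) correlation pairs captured some DSP periods apart, an ART timestamp maps to PTP time without consulting the system clock. This minimal sketch assumes the frequency ratio is constant between the two samples. `struct art_ptp_pair` and `art_to_ptp` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One correlation sample: an ART counter value and the PTP time (ns)
 * captured at the same instant. Hypothetical type. */
struct art_ptp_pair {
	uint64_t art;
	int64_t ptp_ns;
};

/* Map an ART value to PTP time by linear interpolation between two
 * pairs (extrapolates if art lies outside [a->art, b->art]). */
static int64_t art_to_ptp(const struct art_ptp_pair *a,
			  const struct art_ptp_pair *b, uint64_t art)
{
	assert(b->art != a->art);
	/* Signed deltas so timestamps before a->art still work. */
	int64_t d_art = (int64_t)(b->art - a->art);
	int64_t d_ptp = b->ptp_ns - a->ptp_ns;
	int64_t off = (int64_t)(art - a->art);

	return a->ptp_ns + (off * d_ptp) / d_art;
}
```

A real implementation would also need to guard the multiplication against overflow for long intervals.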

Thanks,
Richard


Re: [PATCH v4 1/4] Produce system time from correlated clocksource

2015-10-12 Thread Richard Cochran
On Mon, Oct 12, 2015 at 11:45:19AM -0700, Christopher S. Hall wrote:
> +int get_correlated_timestamp(struct correlated_ts *crt,
> +  struct correlated_cs *crs)
> +{
> + struct timekeeper *tk = &tk_core.timekeeper;
> + unsigned long seq;
> + cycles_t cycles, cycles_now, cycles_last;
> + ktime_t base;
> + s64 nsecs;
> + int ret;
> +
> + do {
> + seq = read_seqcount_begin(&tk_core.seq);
> + /*
> +  * Verify that the correlated clocksoure is related to
> +  * the currently installed timekeeper clocksoure
> +  */
> + if (tk->tkr_mono.clock != crs->related_cs)
> + return -ENODEV;
> +
> + /*
> +  * Get a timestamp from the device if get_ts is non-NULL
> +  */
> + if( crt->get_ts ) {

CodingStyle: this should be "if (crt->get_ts) {".

> + ret = crt->get_ts(crt);
> + if (ret)
> + return ret;
> + }
> +
> + /*
> +  * Convert the timestamp to timekeeper clock cycles
> +  */
> + cycles = crs->convert(crs, crt->system_ts);
> +
> + /*
> +  * If we have get_ts is valid, we know the cycles value
> +  * value is up to date and we can just do the conversion
> +  */
> + if( crt->get_ts )

Ditto.

> + goto do_convert;
> +

Thanks,
Richard


Re: [patch net-next] bridge: try switchdev op first in __vlan_vid_add/del

2015-10-12 Thread David Miller
From: Jiri Pirko 
Date: Fri,  9 Oct 2015 13:54:11 +0200

> From: Jiri Pirko 
> 
> Some drivers need to implement both switchdev vlan ops and
> vid_add/kill ndos. For that to work in bridge code, we need to try
> switchdev op first when adding/deleting vlan id.
> 
> Signed-off-by: Jiri Pirko 
> Signed-off-by: Ido Schimmel 

Applied, thanks.


Re: [PATCH net] ipv4/icmp: redirect messages can use the ingress daddr as source

2015-10-12 Thread David Miller
From: Paolo Abeni 
Date: Fri,  9 Oct 2015 14:34:31 +0200

> This patch allows configuring how the source address of ICMP
> redirect messages is selected; by default the old behaviour is
> retained, while setting icmp_redirects_use_orig_daddr force the
> usage of the destination address of the packet that caused the
> redirect.
> 
> The new behaviour fits closely the RFC 5798 section 8.1.1, and fix the
> following scenario:
> 
> Two machines are set up with VRRP to act as routers out of a subnet,
> they have IPs x.x.x.1/24 and x.x.x.2/24, with VRRP holding on to
> x.x.x.254/24.
> 
> If a host in said subnet needs to get an ICMP redirect from the VRRP
> router, i.e. to reach a destination behind a different gateway, the
> source IP in the ICMP redirect is chosen as the primary IP on the
> interface that the packet arrived at, i.e. x.x.x.1 or x.x.x.2.
> 
> The host will then ignore said redirect, due to RFC 1122 section 3.2.2.2,
> and will continue to use the wrong next-op.
> 
> Signed-off-by: Paolo Abeni 

Applied to net-next, thanks.
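
The scenario above hinges on the host-side check. Below is a minimal sketch of the two discard rules from RFC 1122 section 3.2.2.2. The first rule is that the new gateway must be on the connected subnet. The second is that the redirect must come from the current first-hop gateway. 10.0.0.0/24 stands in for x.x.x.0/24. The sketch does not model how the kernel picks the redirect's source address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IPv4 addresses as host-order integers. */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Host-side validation of an ICMP redirect, modelled on the discard
 * rules in RFC 1122 section 3.2.2.2. */
static bool host_accepts_redirect(uint32_t redirect_src,
				  uint32_t current_gateway,
				  uint32_t new_gateway,
				  uint32_t local_addr, uint32_t netmask)
{
	assert(netmask != 0);	/* a zero mask would match everything */
	/* The new gateway must be on the directly connected subnet. */
	if ((new_gateway & netmask) != (local_addr & netmask))
		return false;
	/* The redirect must come from the current first-hop gateway. */
	return redirect_src == current_gateway;
}
```

With a VRRP gateway, a redirect sourced from a router's primary address (x.x.x.1) fails the second rule. A redirect sourced from the virtual address the host actually uses (x.x.x.254) passes it.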


Re: [patch net-next v4 4/7] switchdev: introduce possibility to defer obj_add/del

2015-10-12 Thread Scott Feldman
On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> Similar to the attr usecase, the caller knows if he is holding RTNL and is
> in atomic section. So let the called to decide the correct call variant.
>
> This allows drivers to sleep inside their ops and wait for hw to get the
> operation status. Then the status is propagated into switchdev core.
> This avoids silent errors in drivers.
>
> Signed-off-by: Jiri Pirko 



> +static void switchdev_port_obj_work(struct work_struct *work)
> +{
> +   struct switchdev_obj_work *ow =
> +   container_of(work, struct switchdev_obj_work, work);
> +   bool rtnl_locked = rtnl_is_locked();
> +   int err;
> +
> +   if (!rtnl_locked)
> +   rtnl_lock();

Same comment as on patch 2/7 about not unconditionally grabbing rtnl_lock.


RE: [PATCH] qlcnic: constify qlcnic_mbx_ops structure

2015-10-12 Thread Sony Chacko
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Julia Lawall
> Sent: Sunday, October 11, 2015 4:48 AM
> To: Dept-GE Linux NIC Dev 
> Cc: kernel-janit...@vger.kernel.org; netdev ; linux-kernel
> Subject: [PATCH] qlcnic: constify qlcnic_mbx_ops structure
> 
> The only instance of a qlcnic_mbx_ops structure is never modified.  Thus the
> declaration of the structure and all references to the structure type can be
> made const.
> 
> In the definition of the qlcnic_mailbox structure, the ops field is no longer
> lined up with the other fields.  This was left as is, to avoid a lot of trivial
> changes on the other lines.
> 
> Done with the help of Coccinelle.
> 
> Signed-off-by: Julia Lawall 
> 
> ---
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic.h |2 +-
>  drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c |4 ++--
>  2 files changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> index 9f0bdd9..37a731b 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_hw.c
> @@ -4048,7 +4048,7 @@ static void qlcnic_83xx_mailbox_worker(struct work_struct *work)
>   struct qlcnic_mailbox *mbx = container_of(work, struct qlcnic_mailbox,
> work);
>   struct qlcnic_adapter *adapter = mbx->adapter;
> - struct qlcnic_mbx_ops *mbx_ops = mbx->ops;
> + const struct qlcnic_mbx_ops *mbx_ops = mbx->ops;
>   struct device *dev = &adapter->pdev->dev;
>   atomic_t *rsp_status = &mbx->rsp_status;
>   struct list_head *head = &mbx->cmd_q;
> @@ -4098,7 +4098,7 @@ static void qlcnic_83xx_mailbox_worker(struct work_struct *work)
>   }
>  }
> 
> -static struct qlcnic_mbx_ops qlcnic_83xx_mbx_ops = {
> +static const struct qlcnic_mbx_ops qlcnic_83xx_mbx_ops = {
>   .enqueue_cmd= qlcnic_83xx_enqueue_mbx_cmd,
>   .dequeue_cmd= qlcnic_83xx_dequeue_mbx_cmd,
>   .decode_resp= qlcnic_83xx_decode_mbx_rsp,
> diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
> index d6696cf..46bbea8 100644
> --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
> +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic.h
> @@ -1092,7 +1092,7 @@ struct qlcnic_filter_hash {
>  struct qlcnic_mailbox {
>   struct workqueue_struct *work_q;
>   struct qlcnic_adapter   *adapter;
> - struct qlcnic_mbx_ops   *ops;
> + const struct qlcnic_mbx_ops *ops;
>   struct work_struct  work;
>   struct completion   completion;
>   struct list_head cmd_q;
> 
Acked-by: Sony Chacko 

Thanks,
Sony
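
Below is a hypothetical userspace reduction of the constification pattern in Julia's patch. A single ops table is declared `static const` and held through a pointer-to-const member. The type and function names are made up for illustration.

```c
#include <assert.h>

struct mbx_cmd_sketch {
	int opcode;
};

/* A vtable-style ops structure, analogous to struct qlcnic_mbx_ops. */
struct mbx_ops_sketch {
	int (*enqueue_cmd)(struct mbx_cmd_sketch *cmd);
	int (*decode_resp)(const struct mbx_cmd_sketch *cmd);
};

static int enqueue_stub(struct mbx_cmd_sketch *cmd)
{
	assert(cmd != 0);
	return cmd->opcode;
}

static int decode_stub(const struct mbx_cmd_sketch *cmd)
{
	return cmd->opcode == 0 ? -1 : 0;
}

/* The only instance is never modified, so it can be const; assigning
 * to a member through it is then a compile-time error. */
static const struct mbx_ops_sketch mbx_ops = {
	.enqueue_cmd = enqueue_stub,
	.decode_resp = decode_stub,
};

/* Holders keep a pointer-to-const, like the patched ops field. */
struct mailbox_sketch {
	const struct mbx_ops_sketch *ops;
};
```

Beyond catching accidental writes, a const table of function pointers is typically placed in read-only data.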


Re: [RFC PATCH 2/2] bpf: Implement bpf_perf_event_sample_enable/disable() helpers

2015-10-12 Thread Wangnan (F)

On 2015/10/13 11:39, Alexei Starovoitov wrote:
> On 10/12/15 8:27 PM, Wangnan (F) wrote:
>> Then how to avoid racing? For example, when one core disabling all events
>> in a map, another core is enabling all of them. This racing may causes sereval
>> perf events in a map dump samples while other events not. To avoid such racing
>> I think some locking must be introduced, then cost is even higher.
>>
>> The reason why we introduce an atomic pointer is because each operation should
>> controls a set of events, not one event, due to the per-cpu manner of
>> perf events.
>
> why 'set disable' is needed ?
> the example given in cover letter shows the use case where you want
> to receive samples only within sys_write() syscall.
> The example makes sense, but sys_write() is running on this cpu, so just
> disabling it on the current one is enough.

Our real use case is controlling system-wide sampling. For example,
we need to sample all CPUs when a smartphone starts refreshing its display.
We need all CPUs because on Android plenty of threads are
involved in this behavior. We can't achieve this by controlling
sampling on only one CPU. This is the reason we need 'set enable'
and 'set disable'.

Thank you.



Re: [patch net-next v4 6/7] rocker: remove nowait from switchdev callbacks.

2015-10-12 Thread Scott Feldman
On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko  wrote:
> From: Jiri Pirko 
>
> No need to avoid sleeping in switchdev callbacks now, as the switchdev
> core allows it.
>
> Signed-off-by: Jiri Pirko 
> ---
>  drivers/net/ethernet/rocker/rocker.c | 7 +++
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
> index bb956a5..9629c5b5 100644
> --- a/drivers/net/ethernet/rocker/rocker.c
> +++ b/drivers/net/ethernet/rocker/rocker.c
> @@ -3672,7 +3672,7 @@ static int rocker_port_fdb_flush(struct rocker_port *rocker_port,
> rocker_port->stp_state == BR_STATE_FORWARDING)
> return 0;
>
> -   flags |= ROCKER_OP_FLAG_REMOVE;
> +   flags |= ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;

I understand the two changes below where you're removing NOWAIT, but
here you're adding NOWAIT, and I'm not sure how that relates to
the switchdev core changes.  Should this be two patches?


> spin_lock_irqsave(>fdb_tbl_lock, lock_flags);
>
> @@ -4382,8 +4382,7 @@ static int rocker_port_attr_set(struct net_device *dev,
>
> switch (attr->id) {
> case SWITCHDEV_ATTR_ID_PORT_STP_STATE:
> -   err = rocker_port_stp_update(rocker_port, trans,
> -ROCKER_OP_FLAG_NOWAIT,
> +   err = rocker_port_stp_update(rocker_port, trans, 0,
>  attr->u.stp_state);
> break;
> case SWITCHDEV_ATTR_ID_PORT_BRIDGE_FLAGS:
> @@ -4517,7 +4516,7 @@ static int rocker_port_fdb_del(struct rocker_port *rocker_port,
>const struct switchdev_obj_port_fdb *fdb)
>  {
> __be16 vlan_id = rocker_port_vid_to_vlan(rocker_port, fdb->vid, NULL);
> -   int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_REMOVE;
> +   int flags = ROCKER_OP_FLAG_REMOVE;
>
> if (!rocker_port_is_bridged(rocker_port))
> return -EINVAL;
> --
> 1.9.3
>


Re: [patch net-next v4 5/7] bridge: defer switchdev fdb del call in fdb_del_external_learn

2015-10-12 Thread Scott Feldman
On Mon, Oct 12, 2015 at 8:31 PM, John Fastabend
 wrote:
> On 15-10-12 08:28 PM, Scott Feldman wrote:
>> On Mon, Oct 12, 2015 at 11:03 AM, Jiri Pirko  wrote:
>>> From: Jiri Pirko 
>>>
>>> Since spinlock is held here, defer the switchdev operation.
>>>
>>> Signed-off-by: Jiri Pirko 
>>> ---
>>>  net/bridge/br_fdb.c | 5 -
>>>  net/bridge/br_if.c  | 3 +++
>>>  2 files changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/bridge/br_fdb.c b/net/bridge/br_fdb.c
>>> index f5e7da0..c88bd8e 100644
>>> --- a/net/bridge/br_fdb.c
>>> +++ b/net/bridge/br_fdb.c
>>> @@ -134,7 +134,10 @@ static void fdb_del_hw_addr(struct net_bridge *br, const unsigned char *addr)
>>>  static void fdb_del_external_learn(struct net_bridge_fdb_entry *f)
>>>  {
>>> struct switchdev_obj_port_fdb fdb = {
>>> -   .obj.id = SWITCHDEV_OBJ_ID_PORT_FDB,
>>> +   .obj = {
>>> +   .id = SWITCHDEV_OBJ_ID_PORT_FDB,
>>> +   .flags = SWITCHDEV_F_DEFER,
>>> +   },
>>> .vid = f->vlan_id,
>>> };
>>>
>>> diff --git a/net/bridge/br_if.c b/net/bridge/br_if.c
>>> index 934cae9..09147cb 100644
>>> --- a/net/bridge/br_if.c
>>> +++ b/net/bridge/br_if.c
>>> @@ -24,6 +24,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  #include "br_private.h"
>>>
>>> @@ -249,6 +250,8 @@ static void del_nbp(struct net_bridge_port *p)
>>> list_del_rcu(&p->list);
>>>
>>> br_fdb_delete_by_port(br, p, 0, 1);
>>> +   switchdev_flush_deferred();
>>> +
>>
>> This potentially flushes other (valid) work on the deferred queue not
>> related to FDB del.
>>
>> I wonder if this flush step is necessary at all?  The work we deferred
>> to delete the FDB entry can still happen after the port has been
>> removed (del_nbp).  If the port driver/device find the FDB entry, then
>> delete it, otherwise ignore it.
>>
>
> Just the first thing that springs to mind reading this comment is,
>
>   - del gets deffered
>   - add fdb
>   - del runs
>
> Is there an issue here? Sorry I'll do a more thorough review now just
> thought I would toss it out there before I forget.

It's a valid thought to consider, for sure.  The context is these are
only FDB entries added by an external learn event.  So I believe in
your sequence, the second step to add fdb entry wouldn't happen as the
fdb entry already exists at that point (in other words, the entry has
already been learned on external device and pushed up via notifier to
bridge).  So I think we're OK in regards to your question.
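
Both concerns in this exchange are about ordering in a deferred queue. The hypothetical userspace model below is not the switchdev implementation, and all its names are made up. It shows that a flush runs every pending op, not just FDB deletes, and it reproduces John's del/add/del sequence. Per Scott, the add step should not happen for externally learned entries, which this model does not capture.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OPS 8
#define FDB_SLOTS 16

/* Hypothetical deferred-operation FIFO, only enough to reason about
 * ordering. */
enum op_kind { OP_FDB_DEL, OP_OTHER };

struct deferred_op {
	enum op_kind kind;
	int key;		/* FDB address reduced to a slot index */
};

struct deferred_queue {
	struct deferred_op ops[MAX_OPS];
	size_t head, tail;
};

/* Toy device FDB table plus a counter for unrelated deferred work. */
struct fdb_sketch {
	int present[FDB_SLOTS];
	int other_ops_run;
};

static void defer_op(struct deferred_queue *q, enum op_kind kind, int key)
{
	assert(q->tail < MAX_OPS && key >= 0 && key < FDB_SLOTS);
	q->ops[q->tail++] = (struct deferred_op){ kind, key };
}

/* A flush runs every pending op in FIFO order, whatever its kind. */
static void flush_deferred(struct deferred_queue *q, struct fdb_sketch *fdb)
{
	while (q->head < q->tail) {
		const struct deferred_op *op = &q->ops[q->head++];

		if (op->kind == OP_FDB_DEL)
			fdb->present[op->key] = 0;
		else
			fdb->other_ops_run++;
	}
}
```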


[PATCH] net: phy: smsc: disable energy detect mode

2015-10-12 Thread Heiko Schocher
On some boards, energy detect mode causes trouble with
some switches, so make enabling this mode configurable
through DT.

Signed-off-by: Heiko Schocher 
---

 .../devicetree/bindings/net/smsc-lan87xx.txt   | 19 +
 drivers/net/phy/smsc.c | 24 +-
 2 files changed, 38 insertions(+), 5 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/smsc-lan87xx.txt

diff --git a/Documentation/devicetree/bindings/net/smsc-lan87xx.txt b/Documentation/devicetree/bindings/net/smsc-lan87xx.txt
new file mode 100644
index 000..39aa1dc
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/smsc-lan87xx.txt
@@ -0,0 +1,19 @@
+SMSC LAN87xx Ethernet PHY
+
+Some boards require special tuning values. Configure them
+through an Ethernet OF device node.
+
+Optional properties:
+
+- disable-energy-detect:
+  If set, do not enable energy detect mode for the SMSC phy.
+  default: enable energy detect mode
+
+Examples:
+
+   /* Attach to an Ethernet device with autodetected PHY */
+   _emac0 {
+   phy_id = <_mdio>, <0>;
+   phy-mode = "mii";
+   disable-energy-detect;
+   };
diff --git a/drivers/net/phy/smsc.c b/drivers/net/phy/smsc.c
index 70b0895..f90fbf3 100644
--- a/drivers/net/phy/smsc.c
+++ b/drivers/net/phy/smsc.c
@@ -43,16 +43,30 @@ static int smsc_phy_ack_interrupt(struct phy_device *phydev)
 
 static int smsc_phy_config_init(struct phy_device *phydev)
 {
+#ifdef CONFIG_OF
+   int len;
+   struct device *dev = &phydev->dev;
+   struct device_node *of_node = dev->of_node;
+#endif
int rc = phy_read(phydev, MII_LAN83C185_CTRL_STATUS);
+   int enable_energy = 1;
 
if (rc < 0)
return rc;
 
-   /* Enable energy detect mode for this SMSC Transceivers */
-   rc = phy_write(phydev, MII_LAN83C185_CTRL_STATUS,
-  rc | MII_LAN83C185_EDPWRDOWN);
-   if (rc < 0)
-   return rc;
+#ifdef CONFIG_OF
+   if (!of_node && dev->parent->of_node)
+   of_node = dev->parent->of_node;
+   if (of_find_property(of_node, "disable-energy-detect", &len))
+   enable_energy = 0;
+#endif
+   if (enable_energy) {
+   /* Enable energy detect mode for this SMSC Transceivers */
+   rc = phy_write(phydev, MII_LAN83C185_CTRL_STATUS,
+  rc | MII_LAN83C185_EDPWRDOWN);
+   if (rc < 0)
+   return rc;
+   }
 
return smsc_phy_ack_interrupt(phydev);
 }
-- 
2.1.0



Re: [PATCH net-next v3 3/4] bridge: push bridge setting ageing_time down to switchdev

2015-10-12 Thread Scott Feldman
On Sat, Oct 10, 2015 at 8:56 AM, Vivien Didelot
 wrote:

> Scott, didn't you have a plan to add a struct device for the parent of
> switchdev ports?

I had sent out a rough RFC for a switch device in the last window.  I
have continued working on it, and I plan to send it very soon,
probably again as an RFC.

