RE: [PATCH 1/2] bnx2x:Fix error handling in the function bnxc2x_dbcx_set_params
> This fixes the error handling in the function bnxc2x_dbcx_params for both > calls > to bnx2x_dcbnl_update_applist by checking if they failed by returning a error > code and if so return immediately to this function's caller due to this > nonrecoverable internal failure. Hi Nicholas, Even assuming this is the reasonable thing to do [which I'm not fully convinced], I don't think this solution is complete - you'd also need to break the iteration in bnx2x_dcbnl_update_applist() [at the moment said function always returns 'rc' from the last dcb_app weve tried setting]. Also, why do you send multiple patches for fixing the same issue in different functions in the same file? Thanks, Yuval > > Signed-off-by: Nicholas Krause > --- > drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c > b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c > index 6e4294e..e3cd0c6 100644 > --- a/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c > +++ b/drivers/net/ethernet/broadcom/bnx2x/bnx2x_dcb.c > @@ -724,7 +724,8 @@ void bnx2x_dcbx_set_params(struct bnx2x *bp, u32 > state) >* Delete app tlvs from dcbnl before reading new >* negotiation results >*/ > - bnx2x_dcbnl_update_applist(bp, true); > + if (bnx2x_dcbnl_update_applist(bp, true)) > + return; > > /* Read remote mib if dcbx is in the FW */ > if (bnx2x_dcbx_read_shmem_remote_mib(bp)) > @@ -748,7 +749,8 @@ void bnx2x_dcbx_set_params(struct bnx2x *bp, u32 > state) > /* >* Add new app tlvs to dcbnl >*/ > - bnx2x_dcbnl_update_applist(bp, false); > + if (bnx2x_dcbnl_update_applist(bp, false)) > + return; > #endif > /* >* reconfigure the netdevice with the results of the new > -- > 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 RFC net-next] net/vxlan: Fix kernel unaligned access in __vxlan_find_mac
From: Joe Perches Date: Sat, 18 Jul 2015 11:06:26 -0700 > As it's not fatal, naybe the sparc64 message should be > KERN_DEBUG/pr_debug. It's not fatal insofar as it doesn't crash the system. But performance wise is _IS_ fatal, and I want the kernel to bark at the user so that they report the issue or fix it sooner rather than later. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next]r8169:Disable advertise full duplex on chip id RTL_GIGA_MAC_VER_09. Disable advertise full duplex on chip id RTL_GIGA_MAC_VER_09 wich support random full duplex.(very sad) On branc
This is still ver far from acceptable. You aren't even talking to us when we ask you questions and ask you to make changes, you just make another improperly formatted submission of a bad patch all over again. Your entire description is in your Subject line, and it insufficiently describes your change. Frankly, I am very sick and tired of these low quality submissions. I want to know where these changes are coming from, who they are important to, and most importantly why it is so difficult for you to submit your changes properly. Just simply _LOOK_ at postings by other developers who submit changes that successfully get accepted by us. _LOOK_ at how their Subject line looks, and mimick what they do _EXACTLY_ _LOOK_ at how their commit messages look, how detailed they are, how descriptive they are, and how they are formatted and composed, and mimick _EACTLY_ what they are doing. This isn't rocket science, if you want us to take your changes seriously you _MUST_ learn how to conform to our patch submission methods and expectations. Thanks. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next] inet: Always increment refcount in inet_twsk_schedule
I am seeing an issue with the reference count of time wait sockets which leads to freeing of active timer object. This occurs in some data stress test setups, so I am unable to determine the exact step when it occured. However, I logged the refcount and was able to find out the code path which leads to this problem. //Initialize time wait socket and setup timer inet_twsk_alloc() tw_refcnt = 0 __inet_twsk_hashdance() tw_refcnt = 3 inet_twsk_schedule() tw_refcnt = 4 inet_twsk_put() tw_refcnt = 3 //Receive packet 1 in timewait state tcp_timewait_state_process() -> inet_twsk_schedule tw_refcnt = 3 (no change) TCP: tcp_v4_timewait_ack() -> inet_twsk_put() tw_refcnt = 2 //Receive packet 2 in timewait state tcp_timewait_state_process() -> inet_twsk_schedule tw_refcnt = 2 (no change) TCP: tcp_v4_timewait_ack() -> inet_twsk_put() tw_refcnt = 1 //Receive packet 3 in timewait state tcp_timewait_state_process() -> inet_twsk_schedule tw_refcnt = 1 (no change) TCP: tcp_v4_timewait_ack() -> inet_twsk_put() tw_refcnt = 0 After this step, the time wait socket is destroyed along with the active timer object. This leads to a warning being printed which eventually leads to a crash. ODEBUG: free active (active state 0) object type: timer_list hint: tw_timer_handler+0x0/0x68 It appears that inet_twsk_schedule needs to increment the reference count unconditionally, otherwise the socket will be destroyed since reference count will be decremented each time an ack is sent out as a response for an incoming packet. Signed-off-by: Subash Abhinov Kasiviswanathan --- net/ipv4/inet_timewait_sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv4/inet_timewait_sock.c b/net/ipv4/inet_timewait_sock.c index cbeb022..99c349a 100644 --- a/net/ipv4/inet_timewait_sock.c +++ b/net/ipv4/inet_timewait_sock.c @@ -246,9 +246,9 @@ void inet_twsk_schedule(struct inet_timewait_sock *tw, const int timeo) tw->tw_kill = timeo <= 4*HZ; if (!mod_timer_pinned(&tw->tw_timer, jiffies + timeo)) { - atomic_inc(&tw->tw_refcnt); atomic_inc(&tw->tw_dr->tw_count); } + atomic_inc(&tw->tw_refcnt); } EXPORT_SYMBOL_GPL(inet_twsk_schedule); -- Employee of Qualcomm Innovation Center, Inc. Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux Foundation Collaborative Project -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next v3 4/5] rocker: add offload_fwd_mark support
From: Scott Feldman If device flags ingress packet as "fwd offload", mark the skb->offlaod_fwd_mark using the ingress port's dev->offlaod_fwd_mark. This will be the hint to the kernel that this packet has already been forwarded by device to egress ports matching skb->offlaod_fwd_mark. For rocker, derive port dev->offlaod_fwd_mark based on device switch ID and port ifindex. If port is bridged, use the bridge ifindex rather than the port ifindex. Signed-off-by: Scott Feldman Acked-by: Jiri Pirko --- drivers/net/ethernet/rocker/rocker.c | 11 +++ drivers/net/ethernet/rocker/rocker.h |1 + 2 files changed, 12 insertions(+) diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c index 9324283..0fdfa47 100644 --- a/drivers/net/ethernet/rocker/rocker.c +++ b/drivers/net/ethernet/rocker/rocker.c @@ -4800,6 +4800,7 @@ static int rocker_port_rx_proc(const struct rocker *rocker, const struct rocker_tlv *attrs[ROCKER_TLV_RX_MAX + 1]; struct sk_buff *skb = rocker_desc_cookie_ptr_get(desc_info); size_t rx_len; + u16 rx_flags = 0; if (!skb) return -ENOENT; @@ -4807,6 +4808,8 @@ static int rocker_port_rx_proc(const struct rocker *rocker, rocker_tlv_parse_desc(attrs, ROCKER_TLV_RX_MAX, desc_info); if (!attrs[ROCKER_TLV_RX_FRAG_LEN]) return -EINVAL; + if (attrs[ROCKER_TLV_RX_FLAGS]) + rx_flags = rocker_tlv_get_u16(attrs[ROCKER_TLV_RX_FLAGS]); rocker_dma_rx_ring_skb_unmap(rocker, attrs); @@ -4814,6 +4817,9 @@ static int rocker_port_rx_proc(const struct rocker *rocker, skb_put(skb, rx_len); skb->protocol = eth_type_trans(skb, rocker_port->dev); + if (rx_flags & ROCKER_RX_FLAGS_FWD_OFFLOAD) + skb->offload_fwd_mark = rocker_port->dev->offload_fwd_mark; + rocker_port->dev->stats.rx_packets++; rocker_port->dev->stats.rx_bytes += skb->len; @@ -4951,6 +4957,8 @@ static int rocker_probe_port(struct rocker *rocker, unsigned int port_number) } rocker->ports[port_number] = rocker_port; + switchdev_port_fwd_mark_set(rocker_port->dev, NULL, false); + rocker_port_set_learning(rocker_port, SWITCHDEV_TRANS_NONE); err = rocker_port_ig_tbl(rocker_port, SWITCHDEV_TRANS_NONE, 0); @@ -5230,6 +5238,7 @@ static int rocker_port_bridge_join(struct rocker_port *rocker_port, rocker_port_internal_vlan_id_get(rocker_port, bridge->ifindex); rocker_port->bridge_dev = bridge; + switchdev_port_fwd_mark_set(rocker_port->dev, bridge, true); return rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE, untagged_vid, 0); @@ -5250,6 +5259,8 @@ static int rocker_port_bridge_leave(struct rocker_port *rocker_port) rocker_port_internal_vlan_id_get(rocker_port, rocker_port->dev->ifindex); + switchdev_port_fwd_mark_set(rocker_port->dev, rocker_port->bridge_dev, + false); rocker_port->bridge_dev = NULL; err = rocker_port_vlan_add(rocker_port, SWITCHDEV_TRANS_NONE, diff --git a/drivers/net/ethernet/rocker/rocker.h b/drivers/net/ethernet/rocker/rocker.h index 08b2c3d..12490b2 100644 --- a/drivers/net/ethernet/rocker/rocker.h +++ b/drivers/net/ethernet/rocker/rocker.h @@ -246,6 +246,7 @@ enum { #define ROCKER_RX_FLAGS_TCPBIT(5) #define ROCKER_RX_FLAGS_UDPBIT(6) #define ROCKER_RX_FLAGS_TCP_UDP_CSUM_GOOD BIT(7) +#define ROCKER_RX_FLAGS_FWD_OFFLOADBIT(8) enum { ROCKER_TLV_TX_UNSPEC, -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next v3 5/5] switchdev: update documentation for offload_fwd_mark
From: Scott Feldman Signed-off-by: Scott Feldman Acked-by: Jiri Pirko --- Documentation/networking/switchdev.txt | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/Documentation/networking/switchdev.txt b/Documentation/networking/switchdev.txt index c5d7ade..9825f32 100644 --- a/Documentation/networking/switchdev.txt +++ b/Documentation/networking/switchdev.txt @@ -279,8 +279,18 @@ and unknown unicast packets to all ports in domain, if allowed by port's current STP state. The switch driver, knowing which ports are within which vlan L2 domain, can program the switch device for flooding. The packet should also be sent to the port netdev for processing by the bridge driver. The -bridge should not reflood the packet to the same ports the device flooded. -XXX: the mechanism to avoid duplicate flood packets is being discuseed. +bridge should not reflood the packet to the same ports the device flooded, +otherwise there will be duplicate packets on the wire. + +To avoid duplicate packets, the device/driver should mark a packet as already +forwarded using skb->offload_fwd_mark. The same mark is set on the device +ports in the domain using dev->offload_fwd_mark. If the skb->offload_fwd_mark +is non-zero and matches the forwarding egress port's dev->skb_mark, the kernel +will drop the skb right before transmit on the egress port, with the +understanding that the device already forwarded the packet on same egress port. +The driver can use switchdev_port_fwd_mark_set() to set a globally unique mark +for port's dev->offload_fwd_mark, based on the port's parent ID (switch ID) and +a group ifindex. It is possible for the switch device to not handle flooding and push the packets up to the bridge driver for flooding. This is not ideal as the number -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next v3 2/5] net: add phys ID compare helper to test if two IDs are the same
From: Scott Feldman Signed-off-by: Scott Feldman Acked-by: Jiri Pirko --- include/linux/netdevice.h |7 +++ net/switchdev/switchdev.c |8 ++-- 2 files changed, 9 insertions(+), 6 deletions(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 8364f29..607b5f4 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -766,6 +766,13 @@ struct netdev_phys_item_id { unsigned char id_len; }; +static inline bool netdev_phys_item_id_same(struct netdev_phys_item_id *a, + struct netdev_phys_item_id *b) +{ + return a->id_len == b->id_len && + memcmp(a->id, b->id, a->id_len) == 0; +} + typedef u16 (*select_queue_fallback_t)(struct net_device *dev, struct sk_buff *skb); diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c index 9f2add3..4e5bba5 100644 --- a/net/switchdev/switchdev.c +++ b/net/switchdev/switchdev.c @@ -910,13 +910,9 @@ static struct net_device *switchdev_get_dev_by_nhs(struct fib_info *fi) if (switchdev_port_attr_get(dev, &attr)) return NULL; - if (nhsel > 0) { - if (prev_attr.u.ppid.id_len != attr.u.ppid.id_len) + if (nhsel > 0 && + !netdev_phys_item_id_same(&prev_attr.u.ppid, &attr.u.ppid)) return NULL; - if (memcmp(prev_attr.u.ppid.id, attr.u.ppid.id, - attr.u.ppid.id_len)) - return NULL; - } prev_attr = attr; } -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next v3 0/5] switchdev: avoid duplicate packet forwarding
From: Scott Feldman v3: - Per Nicolas Dichtel review: remove errant empty union. v2: - Per davem review: in sk_buff, union fwd_mark with secmark to save space since features appear to be mutually exclusive. - Per Simon Horman review: - fix grammar in switchdev.txt wrt fwd_mark - remove some unrelated changes that snuck in v1: This patchset was previously submitted as RFC. No changes from the last version (v2) sent under RFC. Including RFC version history here for reference. RFC v2: - s/fwd_mark/offload_fwd_mark - use consume_skb rather than kfree_skb when dropping pkt on egress. - Use Jiri's suggestion to use ifindex of one of the ports in a group as the mark for all the ports in the group. This can be done with no additional storage (no hashtable from v1). To pull it off, we need some simple recursive routines to walk the netdev tree ensuring all leaves in the tree (ports) in the same group (e.g. bridge) belonging to the same switch device will have the same offload fwd mark. Maybe someone sees a better design for the recusive routines? They're not too bad, and should cover the stacked driver cases. RFC v1: With switchdev support for offloading L2/L3 forwarding data path to a switch device, we have a general problem where both the device and the kernel may forward the packet, resulting in duplicate packets on the wire. Anytime a packet is forwarded by the device and a copy is sent to the CPU, there is potential for duplicate forwarding, as the kernel may also do a forwarding lookup and send the packet on the wire. The specific problem this patch series is interested in solving is avoiding duplicate packets on bridged ports. There was a previous RFC from Roopa (http://marc.info/?l=linux-netdev&m=142687073314252&w=2) to address this problem, but didn't solve the problem of mixed ports in the bridge from different devices; there was no way to exclude some ports from forwarding and include others. This RFC solves that problem by tagging the ingressing packet with a unique mark, and then comparing the packet mark with the egress port mark, and skip forwarding when there is a match. For the mixed ports bridge case, only those ports with matching marks are skipped. The switchdev port driver must do two things: 1) Generate a fwd_mark for each switch port, using some unique key of the switch device (and optionally port). This is done when the port netdev is registered or if the port's group membership changes (joins/leaves a bridge, for example). 2) On packet ingress from port, mark the skb with the ingress port's fwd_mark. If the device supports it, it's useful to only mark skbs which were already forwarded by the device. If the device does not support such indication, all skbs can be marked, even if they're local dst. Two new 32-bit fields are added to struct sk_buff and struct netdevice to hold the fwd_mark. I've wrapped these with CONFIG_NET_SWITCHDEV for now. I tried using skb->mark for this purpose, but ebtables can overwrite the skb->mark before the bridge gets it, so that will not work. In general, this fwd_mark can be used for any case where a packet is forwarded by the device and a copy is sent to the CPU, to avoid the kernel re-forwarding the packet. sFlow is another use-case that comes to mind, but I haven't explored the details. Scott Feldman (5): net: don't reforward packets already forwarded by offload device net: add phys ID compare helper to test if two IDs are the same switchdev: add offload_fwd_mark generator helper rocker: add offload_fwd_mark support switchdev: update documentation for offload_fwd_mark Documentation/networking/switchdev.txt | 14 +++- drivers/net/ethernet/rocker/rocker.c | 11 drivers/net/ethernet/rocker/rocker.h |1 + include/linux/netdevice.h | 13 include/linux/skbuff.h |9 ++- include/net/switchdev.h|9 +++ net/core/dev.c | 10 +++ net/switchdev/switchdev.c | 111 ++-- 8 files changed, 169 insertions(+), 9 deletions(-) -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next v3 1/5] net: don't reforward packets already forwarded by offload device
From: Scott Feldman Just before queuing skb for xmit on port, check if skb has been marked by switchdev port driver as already fordwarded by device. If so, drop skb. A non-zero skb->offload_fwd_mark field is set by the switchdev port driver/device on ingress to indicate the skb has already been forwarded by the device to egress ports with matching dev->skb_mark. The switchdev port driver would assign a non-zero dev->offload_skb_mark for each device port netdev during registration, for example. Signed-off-by: Scott Feldman Acked-by: Jiri Pirko Acked-by: Roopa Prabhu --- include/linux/netdevice.h |6 ++ include/linux/skbuff.h|9 - net/core/dev.c| 10 ++ 3 files changed, 24 insertions(+), 1 deletion(-) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 45cfd79..8364f29 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1456,6 +1456,8 @@ enum netdev_priv_flags { * * @xps_maps: XXX: need comments on this one * + * @offload_fwd_mark: Offload device fwding mark + * * @trans_start: Time (in jiffies) of last Tx * @watchdog_timeo:Represents the timeout that is used by * the watchdog ( see dev_watchdog() ) @@ -1697,6 +1699,10 @@ struct net_device { struct xps_dev_maps __rcu *xps_maps; #endif +#ifdef CONFIG_NET_SWITCHDEV + u32 offload_fwd_mark; +#endif + /* These may be needed for future network-power-down code. */ /* diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index d6cdd6e..af7a096 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -506,6 +506,7 @@ static inline u32 skb_mstamp_us_delta(const struct skb_mstamp *t1, * @no_fcs: Request NIC to treat last 4 bytes as Ethernet FCS *@napi_id: id of the NAPI struct this skb came from * @secmark: security marking + * @offload_fwd_mark: fwding offload mark * @mark: Generic packet mark * @vlan_proto: vlan encapsulation protocol * @vlan_tci: vlan tag control information @@ -650,9 +651,15 @@ struct sk_buff { unsigned intsender_cpu; }; #endif + union { #ifdef CONFIG_NETWORK_SECMARK - __u32 secmark; + __u32 secmark; +#endif +#ifdef CONFIG_NET_SWITCHDEV + __u32 offload_fwd_mark; #endif + }; + union { __u32 mark; __u32 reserved_tailroom; diff --git a/net/core/dev.c b/net/core/dev.c index 8810b6b..2ee15af 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3061,6 +3061,16 @@ static int __dev_queue_xmit(struct sk_buff *skb, void *accel_priv) else skb_dst_force(skb); +#ifdef CONFIG_NET_SWITCHDEV + /* Don't forward if offload device already forwarded */ + if (skb->offload_fwd_mark && + skb->offload_fwd_mark == dev->offload_fwd_mark) { + consume_skb(skb); + rc = NET_XMIT_SUCCESS; + goto out; + } +#endif + txq = netdev_pick_tx(dev, skb, accel_priv); q = rcu_dereference_bh(txq->qdisc); -- 1.7.10.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH net-next v3 3/5] switchdev: add offload_fwd_mark generator helper
From: Scott Feldman skb->offload_fwd_mark and dev->offload_fwd_mark are 32-bit and should be unique for device and may even be unique for a sub-set of ports within device, so add switchdev helper function to generate unique marks based on port's switch ID and group_ifindex. group_ifindex would typically be the container dev's ifindex, such as the bridge's ifindex. The generator uses a global hash table to store offload_fwd_marks hashed by {switch ID, group_ifindex} key. Signed-off-by: Scott Feldman Acked-by: Jiri Pirko --- include/net/switchdev.h |9 net/switchdev/switchdev.c | 103 + 2 files changed, 112 insertions(+) diff --git a/include/net/switchdev.h b/include/net/switchdev.h index d5671f1..89da893 100644 --- a/include/net/switchdev.h +++ b/include/net/switchdev.h @@ -157,6 +157,9 @@ int switchdev_port_fdb_del(struct ndmsg *ndm, struct nlattr *tb[], int switchdev_port_fdb_dump(struct sk_buff *skb, struct netlink_callback *cb, struct net_device *dev, struct net_device *filter_dev, int idx); +void switchdev_port_fwd_mark_set(struct net_device *dev, +struct net_device *group_dev, +bool joining); #else @@ -271,6 +274,12 @@ static inline int switchdev_port_fdb_dump(struct sk_buff *skb, return -EOPNOTSUPP; } +static inline void switchdev_port_fwd_mark_set(struct net_device *dev, + struct net_device *group_dev, + bool joining) +{ +} + #endif #endif /* _LINUX_SWITCHDEV_H_ */ diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c index 4e5bba5..33bafa2 100644 --- a/net/switchdev/switchdev.c +++ b/net/switchdev/switchdev.c @@ -1039,3 +1039,106 @@ void switchdev_fib_ipv4_abort(struct fib_info *fi) fi->fib_net->ipv4.fib_offload_disabled = true; } EXPORT_SYMBOL_GPL(switchdev_fib_ipv4_abort); + +static bool switchdev_port_same_parent_id(struct net_device *a, + struct net_device *b) +{ + struct switchdev_attr a_attr = { + .id = SWITCHDEV_ATTR_PORT_PARENT_ID, + .flags = SWITCHDEV_F_NO_RECURSE, + }; + struct switchdev_attr b_attr = { + .id = SWITCHDEV_ATTR_PORT_PARENT_ID, + .flags = SWITCHDEV_F_NO_RECURSE, + }; + + if (switchdev_port_attr_get(a, &a_attr) || + switchdev_port_attr_get(b, &b_attr)) + return false; + + return netdev_phys_item_id_same(&a_attr.u.ppid, &b_attr.u.ppid); +} + +static u32 switchdev_port_fwd_mark_get(struct net_device *dev, + struct net_device *group_dev) +{ + struct net_device *lower_dev; + struct list_head *iter; + + netdev_for_each_lower_dev(group_dev, lower_dev, iter) { + if (lower_dev == dev) + continue; + if (switchdev_port_same_parent_id(dev, lower_dev)) + return lower_dev->offload_fwd_mark; + return switchdev_port_fwd_mark_get(dev, lower_dev); + } + + return dev->ifindex; +} + +static void switchdev_port_fwd_mark_reset(struct net_device *group_dev, + u32 old_mark, u32 *reset_mark) +{ + struct net_device *lower_dev; + struct list_head *iter; + + netdev_for_each_lower_dev(group_dev, lower_dev, iter) { + if (lower_dev->offload_fwd_mark == old_mark) { + if (!*reset_mark) + *reset_mark = lower_dev->ifindex; + lower_dev->offload_fwd_mark = *reset_mark; + } + switchdev_port_fwd_mark_reset(lower_dev, old_mark, reset_mark); + } +} + +/** + * switchdev_port_fwd_mark_set - Set port offload forwarding mark + * + * @dev: port device + * @group_dev: containing device + * @joining: true if dev is joining group; false if leaving group + * + * An ungrouped port's offload mark is just its ifindex. A grouped + * port's (member of a bridge, for example) offload mark is the ifindex + * of one of the ports in the group with the same parent (switch) ID. + * Ports on the same device in the same group will have the same mark. + * + * Example: + * + * br0 ifindex=9 + * sw1p1 ifindex=2 mark=2 + * sw1p2 ifindex=3 mark=2 + * sw2p1 ifindex=4 mark=5 + * sw2p2 ifindex=5 mark=5 + * + * If sw2p2 leaves the bridge, we'll have: + * + * br0 ifindex=9 + * sw1p1 ifindex=2 mark=2 + * sw1p2 ifindex=3 mark=2 + * sw2p1 ifindex=4 mark=4 + * sw2p2
Re: [PATCH 1/3] fixed_phy: handle link-down case
18.07.2015 05:29, Florian Fainelli пишет: Le 07/17/15 16:53, Stas Sergeev a écrit : 18.07.2015 02:35, Florian Fainelli пишет: On 17/07/15 16:24, Stas Sergeev wrote: 18.07.2015 01:01, Florian Fainelli пишет: On 17/07/15 13:03, Stas Sergeev wrote: 17.07.2015 21:50, Florian Fainelli пишет: On 17/07/15 04:26, Stas Sergeev wrote: 17.07.2015 02:25, Florian Fainelli пишет: On 16/07/15 07:50, Stas Sergeev wrote: Currently fixed_phy driver recognizes only the link-up state. This simple patch adds an implementation of link-down state. It fixes the status registers when link is down, and also allows to register the fixed-phy with link down without specifying the speed. This patch still breaks my setups here, e.g: drivers/net/dsa/bcm_sf2.c, but I will look into it. Do we really need this for now for your two other patches to work properly, or is it just nicer to have? Yes, absolutely. Otherwise registering fixed phy will return -EINVAL because of the missing link speed (even though the link is down). Ok, I see the problem that you have now. Arguably you could say that according to the fixed-link binding, speed needs to be specified and the code correctly errors out with such an error if you do not specify it. I Aren't you missing the fact that .link=0? I think what you say is true only for the link-up case, no? .speed==0 is valid for link-down IMHO: no link - zero speed. Pardon me being very dense and stupid here, but your problem is that the "speed" parameter is not specified in your DT, Not even a fixed-link at all, since the latest patches. I removed fixed-link defs from my DT. Hummm, okay, so you just have the inband-status property and that's it, not even a fixed-link node anymore, right? How does mvneta_fixed_link_update() work then since it needs a fixed PHY to be registered? You can see it from my patch: --- +err = of_property_read_string(np, "managed", &managed); +if (err == 0) { +if (strcmp(managed, "in-band-status") == 0) { +/* status is zeroed, namely its .link member */ +phy = fixed_phy_register(PHY_POLL, &status, np); +return IS_ERR(phy) ? PTR_ERR(phy) : 0; +} +} --- which is the hunk added to the of_phy_register_fixed_link(). So in that case we register fixed-phy, but do not parse the fixed-link. Ok, I missed that part. Could not you just override everything that is needed here to get past the point where you register your fixed PHY even with link = 0, this will be discarded anyway once you start in-band negotiation. Maybe my English is bad, but I have problems understanding some of your senteneces. What do you mean? If you meant to re-use the existing registration code instead of adding a new hunk, please note that there is no fixed-link node at all, so we do not even enter the parsing code block. As such, there is nothing to override. I will work on something anyway. Thanks, hope to hear from you soon. This stream of regressions is disturbing. :) Should finally be fixed for real. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net 2/3] sched: cls_flower: fix panic on filter replace
Fri, Jul 17, 2015 at 10:38:44PM CEST, dan...@iogearbox.net wrote: >The following test case causes a NULL pointer dereference in cls_flower: > > tc filter add dev foo parent 1: flower eth_type ipv4 action ok flowid 1:1 > tc filter replace dev foo parent 1: pref 49152 handle 0x1 \ >flower eth_type ipv6 action ok flowid 1:1 > >The problem is that commit 77b9900ef53a ("tc: introduce Flower classifier") >accidentally swapped the arguments of list_replace_rcu(), the old >element needs to be the first argument and the new element the second. > >Fixes: 77b9900ef53a ("tc: introduce Flower classifier") >Signed-off-by: Daniel Borkmann Acked-by: Jiri Pirko -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: mvneta.c, MVNETA_TXQ_UPDATE_REG, shut down issue
Constantine, On Sat, 18 Jul 2015 21:48:59 +0300, Constantine Shulyupin wrote: > I try to run vanilla kernel 4.2.0 on custom Armada-370 board, based on > DB-88F6710-BP. > > The problem is that the board shuts down after writing to > MVNETA_TXQ_UPDATE_REG in this code: What do you mean by shut down? Physically turns off? I hardly see how writing to this register can turn off the board. Can you try on a non-custom A370 board? Thanks, Thomas -- Thomas Petazzoni, CTO, Free Electrons Embedded Linux, Kernel and Android engineering http://free-electrons.com -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
mvneta.c, MVNETA_TXQ_UPDATE_REG, shut down issue
Hi Rami and Thomas, I try to run vanilla kernel 4.2.0 on custom Armada-370 board, based on DB-88F6710-BP. The problem is that the board shuts down after writing to MVNETA_TXQ_UPDATE_REG in this code: static void mvneta_txq_pend_desc_add(struct mvneta_port *pp, struct mvneta_tx_queue *txq, int pend_desc) { u32 val; /* Only 255 descriptors can be added at once ; Assume caller * process TX desriptors in quanta less than 256 */ val = pend_desc; mvreg_write(pp, MVNETA_TXQ_UPDATE_REG(txq->id), val); } BTW, pend_desc == 1 What can you suggest? Thanks Constantine -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2 RFC net-next] net/vxlan: Fix kernel unaligned access in __vxlan_find_mac
On Fri, 2015-07-17 at 18:18 -0700, David Miller wrote: > From: Joe Perches Date: Fri, 17 Jul 2015 16:07:02 -0700 > > On Fri, 2015-07-17 at 22:00 +0200, Sowmini Varadhan wrote: > >> __vxlan_find_mac invokes ether_addr_equal on the eth_addr field, > >> which triggers unaligned access messages, so rearrange vxlan_fdb > >> to avoid this in the most non-intrusive way. > > What arch does this? > Sparc, MIPS, etc. It seems that this code has had unaligned accesses on this field even before compare_ether_addr was converted to ether_addr_equal. Is sparc64 the only one that emits / ratelimits that unaligned access message? I looked a little, but I didn't find a fixup message when MIPS does unaligned accesses. Are all the other arches silent when fixing up unaligned accesses? Maye adding a generic debug only ratelimited message might help remove more of these. As it's not fatal, naybe the sparc64 message should be KERN_DEBUG/pr_debug. --- arch/sparc/kernel/unaligned_64.c | 8 ++-- 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/arch/sparc/kernel/unaligned_64.c b/arch/sparc/kernel/unaligned_64.c index 62098a8..6b7aeb7 100644 --- a/arch/sparc/kernel/unaligned_64.c +++ b/arch/sparc/kernel/unaligned_64.c @@ -294,13 +294,9 @@ static void kernel_mna_trap_fault(int fixup_tstate_asi) static void log_unaligned(struct pt_regs *regs) { - static DEFINE_RATELIMIT_STATE(ratelimit, 5 * HZ, 5); + pr_debug_ratelimited("Kernel unaligned access at TPC[%lx] %pS\n", +regs->tpc, (void *)regs->tpc); - if (__ratelimit(&ratelimit)) { - printk("Kernel unaligned access at TPC[%lx] %pS\n", - regs->tpc, (void *) regs->tpc); - } -} asmlinkage void kernel_unaligned_trap(struct pt_regs *regs, unsigned int insn) { -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 -next] net: fib: use fib result when zero-length prefix aliases exist
Julian Anastasov wrote: [ Dave, please toss my patch, its either v3 or something else entirely ] > In fact, TOS should be matched just like in > fib_table_lookup but it is not. > > > This changes fib_select_default to not change the FIB chosen result EXCEPT > > if this nexthop appears to be unreachable. > > > > fib_detect_death() logic is reversed -- we consider a nexthop 'dead' only > > if it has a neigh entry in unreachable state. > > The only difference I see is that NUD_NODE is > declared by fib_nud_is_unreach() as reachable. May be > it is better, for example, new route in NUD_NONE state > will be tried for a while, until its state is determined. fib_detect_death considers neigh_lookup() returning NULL as unreach. I don't think its good idea and should rather only skip if NUD is failed or incomplete. > > + if (likely(!fib_nud_is_unreach(res->fi))) > > + return; > > here first is unreachable... > > + /* attempt to pick another nexthop */ > > hlist_for_each_entry_rcu(fa, fa_head, fa_list) { > > struct fib_info *next_fi = fa->fa_info; > > > > @@ -1223,38 +1223,30 @@ void fib_select_default(struct fib_result *res) > > next_fi->fib_nh[0].nh_scope != RT_SCOPE_LINK) > > continue; > > > > + order++; > > + > > + if (next_fi == res->fi) /* already tested, not reachable */ > > + continue; > > + > > fib_alias_accessed(fa); > > > > - if (!fi) { > > - if (next_fi != res->fi) > > + if (fib_nud_is_unreach(next_fi)) > > all others are unreachable too... > > > + continue; > > + > > + /* try to round-robin among all fa_aliases in case > > +* res->fi nexthop is unreachable. > > round-robin only among reachables? But original algorithm > can round-robin among unreachables. State will not change > without trying some packets to unreachable dests. Hmm... thanks for pointing that out. But thats confusing. If thats true, why does fib_detect_death even exist? If we skip unreachables, they cannot become reachable again...? > > + fi = next_fi; > > + fi_idx = order; > > + if (order > tb->tb_default) > > break; > > 1=unreac, 2=reach, 3=tb_default(reach), 4=reach. > We like 2 (fi == NULL), why we continue after 2, skip > default 3 (because order > tb->tb_default is false) and > finally select 4 (order=4 > tb->tb_default=3)? Hmm. Let me ask different question. What is the supposed behaviour in the case you describe? Removing order > default test means we'll always pick first reachable in the list, no? I don't think thats bad, but, how/why is is that 'better'? > > + tb->tb_default = fi_idx; > > And we do not round-robin if all look unreachable? Right. One possible solution might be this patch, but alter fib_nud_is_unreach() to also check timer_pending(n->timer), i.e. if (state & (NUD_INCOMPLETE | NUD_FAILED)) return timer_was_pending ? UNREACHABLE : REACHABLE; This way we would re-try gw not only if neigh doesn't exist but also if no retry is pending. What do you think? > As for fib_select_default, > may be only change like below is needed. It additionally > restricts the alternative routes to same prefix, same TOS. > The TOS restriction was missing from long time but the > prefix check was lost recently when fa_head was changed > to contain aliases from different prefixes. > /* Must be invoked inside of an RCU protected region. */ > -void fib_select_default(struct fib_result *res) > +void fib_select_default(const struct flowi4 *flp, struct fib_result *res) > { > struct fib_info *fi = NULL, *last_resort = NULL; > struct hlist_head *fa_head = res->fa_head; > struct fib_table *tb = res->table; > + u8 slen = 32 - res->prefixlen; > int order = -1, last_idx = -1; > struct fib_alias *fa; > > hlist_for_each_entry_rcu(fa, fa_head, fa_list) { > struct fib_info *next_fi = fa->fa_info; > > + if (fa->fa_slen != slen) > + continue; > if (next_fi->fib_scope != res->scope || > fa->fa_type != RTN_UNICAST) > continue; > > + if (fa->tb_id != tb->tb_id) > + continue; > + if (fa->fa_tos && fa->fa_tos != flp->flowi4_tos) > + continue; Hmm, this means we won't consider alternative if tos doesn't match, except 0. So with ip route add tos 0x00 via 192.168.1.1 ip route add tos 0x04 via 192.168.1.2 ip route add tos 0x08 via 192.168.1.3 ip route add tos 0x0c via 192.168.1.4 ip route add tos 0x10 via 192.168.1.5 and ping -Q 1.2.3.4 if 1.5 is unreachable, we pick 1.1 (tos 0x0). Others are not attempted. Do you consider that 'correct'? I don't see why we can use tos 0x00 bu
Re: [PATCH 1/2] net: dsa: mv88e6xxx: sleep in _mv88e6xxx_stats_wait
On Sat, Jul 18, 2015 at 11:23:19AM -0400, Vivien Didelot wrote: > Hi all, > > - On Jul 18, 2015, at 10:58 AM, Andrew Lunn and...@lunn.ch wrote: > > >> Good point. The timeout is most definitely quite large and for sure on > >> the safe side. It might make sense to add some statistics gathering to > >> see how long the maximum observed delay actually is. > > > > Hi All > > > > Statistics are something which can be used a lot, i bursts and > > interactivily. ATU, VTU etc, are much less often used. So different > > delays might be justified. > > > > I agree about doing some statistics gathering to determine actual > > delays needed. > > > >Andrew > > What do you think about something like this? Hi Vivien Lets get some actually statistics first. I would suggest for testing you make _mv88e6xxx_wait() a busy loop and time how long it actually takes for different busy bits to clear. We should also test it on different families. Andrew -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: reproducable panic eviction work queue
On 07/18/2015 05:28 PM, Johan Schuijt wrote: > Thx for your looking into this! > >> >> Thank you for the report, I will try to reproduce this locally >> Could you please post the full crash log ? > > Of course, please see attached file. > >> Also could you test >> with a clean current kernel from Linus' tree or Dave's -net ? > > Will do. > >> These are available at: >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git >> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net >> respectively. >> >> One last question how many IRQs do you pin i.e. how many cores >> do you actively use for receive ? > > This varies a bit across our systems, but we’ve managed to reproduce this > with IRQs pinned on as many as 2,4,8 or 20 cores. > > I won’t have access to our test-setup till Monday again, so I’ll be testing 3 > scenario’s then: > - Your patch - > - Linux tree > - Dave’s -net tree Just one of these two would be enough. I couldn't reproduce it here but I don't have as many machines to test right now and had to improvise with VMs. :-) > > I’ll make sure to keep you posted on all the results then. We have a kernel > dump of the panic, so if you need me to extract any data from there just let > me know! (Some instructions might be needed) > > - Johan > Great, thank you! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: reproducable panic eviction work queue
With attachment this time, also not sure wether this is what you were referring to, so let me know if anything else needed! - Johan > On 18 Jul 2015, at 17:28, Johan Schuijt-Li wrote: > > Thx for your looking into this! > >> >> Thank you for the report, I will try to reproduce this locally >> Could you please post the full crash log ? > > Of course, please see attached file. > >> Also could you test >> with a clean current kernel from Linus' tree or Dave's -net ? > > Will do. > >> These are available at: >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git >> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net >> respectively. >> >> One last question how many IRQs do you pin i.e. how many cores >> do you actively use for receive ? > > This varies a bit across our systems, but we’ve managed to reproduce this > with IRQs pinned on as many as 2,4,8 or 20 cores. > > I won’t have access to our test-setup till Monday again, so I’ll be testing 3 > scenario’s then: > - Your patch > - Linux tree > - Dave’s -net tree > > I’ll make sure to keep you posted on all the results then. We have a kernel > dump of the panic, so if you need me to extract any data from there just let > me know! (Some instructions might be needed) > > - Johan [28732.285611] general protection fault: [#1] SMP [28732.285665] Modules linked in: vhost_net vhost macvtap macvlan act_police cls_u32 sch_ingress cls_fw sch_sfq sch_htb nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack xt_physdev br_netfilter ebt_arp ebt_ip6 ebt_ip ebtable_nat tun rpcsec_gss_krb5 nfsv4 dns_resolver ebtable_filter ebtables ip6table_raw ip6table_mangle ip6table_filter ip6_tables nfsd auth_rpcgss oid_registry nfs_acl nfs lockd grace fscache sunrpc bridge 8021q garp mrp stp llc bonding xt_CT xt_DSCP iptable_mangle ipt_REJECT nf_reject_ipv4 xt_pkttype xt_tcpudp nf_log_ipv4 nf_log_common xt_LOG xt_limit xt_comment nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_owner iptable_filter iptable_raw ip_tables x_tables loop joydev hid_generic usbhid hid x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm ttm crct10dif_pclmul crc32_pclmul [28732.286421] ghash_clmulni_intel aesni_intel drm_kms_helper drm i2c_algo_bit aes_x86_64 lrw gf128mul dcdbas ipmi_si i2c_core evdev glue_helper ablk_helper tpm_tis mei_me tpm ehci_pci ehci_hcd mei cryptd usbcore iTCO_wdt iTCO_vendor_support ipmi_msghandler lpc_ich mfd_core wmi pcspkr usb_common shpchp sb_edac edac_core acpi_power_meter acpi_pad button processor thermal_sys ext4 crc16 mbcache jbd2 dm_mod sg sd_mod ahci libahci bnx2x libata ptp pps_core mdio crc32c_generic megaraid_sas crc32c_intel scsi_mod libcrc32c [28732.286955] CPU: 9 PID: 56 Comm: kworker/9:0 Not tainted 3.18.7-transip-2.0 #1 [28732.287023] Hardware name: Dell Inc. PowerEdge M620/0VHRN7, BIOS 2.5.2 02/03/2015 [28732.287096] Workqueue: events inet_frag_worker [28732.287139] task: 885f3d9fc210 ti: 885f3da0 task.ti: 885f3da0 [28732.287205] RIP: 0010:[] [] inet_evict_bucket+0x119/0x180 [28732.287278] RSP: 0018:885f3da03d58 EFLAGS: 00010292 [28732.287318] RAX: 885f3da03d08 RBX: dead001000a8 RCX: 885f3da03d08 [28732.287362] RDX: 0006 RSI: 885f3da03ce8 RDI: dead001000a8 [28732.287406] RBP: 0002 R08: 0286 R09: 88302f401640 [28732.287450] R10: 8000 R11: 88602ec0c138 R12: 81a8d8c0 [28732.287494] R13: 885f3da03d70 R14: R15: 881d6efe1a00 [28732.287538] FS: () GS:88602f28() knlGS: [28732.287606] CS: 0010 DS: ES: CR0: 80050033 [28732.287647] CR2: 00b11000 CR3: 004f05b24000 CR4: 000427e0 [28732.287691] Stack: [28732.287722] 81a905e0 81a905e8 814f4599 881d6efe1a58 [28732.287807] 0246 002e 81a8d8c0 81a918c0 [28732.287891] 02d3 0019 0240 8148075a [28732.287975] Call Trace: [28732.288013] [] ? _raw_spin_unlock_irqrestore+0x9/0x10 [28732.288056] [] ? inet_frag_worker+0x5a/0x250 [28732.288103] [] ? process_one_work+0x149/0x3f0 [28732.288146] [] ? worker_thread+0x63/0x490 [28732.288187] [] ? rescuer_thread+0x290/0x290 [28732.288229] [] ? kthread+0xce/0xf0 [28732.288269] [] ? kthread_create_on_node+0x180/0x180 [28732.288313] [] ? ret_from_fork+0x7c/0xb0 [28732.288353] [] ? kthread_create_on_node+0x180/0x180 [28732.288396] Code: 8b 04 24 66 83 40 08 01 48 8b 7c 24 18 48 85 ff 74 2a 48 83 ef 58 75 13 eb 22 0f 1f 84 00 00 00 00 00 48 83 eb 58 48 89 df 74 11 <48> 8b 5f 58 41 ff 94 24 70 40 00 00 48 85 db 75 e6 48 83 c4 28 [28732.288827] RIP [] inet_evict_bucket+0x119/0x180 [28732.288873] RSP
Re: reproducable panic eviction work queue
Thx for your looking into this! > > Thank you for the report, I will try to reproduce this locally > Could you please post the full crash log ? Of course, please see attached file. > Also could you test > with a clean current kernel from Linus' tree or Dave's -net ? Will do. > These are available at: > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net > respectively. > > One last question how many IRQs do you pin i.e. how many cores > do you actively use for receive ? This varies a bit across our systems, but we’ve managed to reproduce this with IRQs pinned on as many as 2,4,8 or 20 cores. I won’t have access to our test-setup till Monday again, so I’ll be testing 3 scenario’s then: - Your patch - Linux tree - Dave’s -net tree I’ll make sure to keep you posted on all the results then. We have a kernel dump of the panic, so if you need me to extract any data from there just let me know! (Some instructions might be needed) - JohanN�r��yb�X��ǧv�^�){.n�+���z�^�)w*jg����ݢj/���z�ޖ��2�ޙ&�)ߡ�a�����G���h��j:+v���w��٥
Re: [PATCH 1/2] net: dsa: mv88e6xxx: sleep in _mv88e6xxx_stats_wait
Hi all, - On Jul 18, 2015, at 10:58 AM, Andrew Lunn and...@lunn.ch wrote: >> Good point. The timeout is most definitely quite large and for sure on >> the safe side. It might make sense to add some statistics gathering to >> see how long the maximum observed delay actually is. > > Hi All > > Statistics are something which can be used a lot, i bursts and > interactivily. ATU, VTU etc, are much less often used. So different > delays might be justified. > > I agree about doing some statistics gathering to determine actual > delays needed. > >Andrew What do you think about something like this? Thanks, -v diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c index 4f3701f..6471807 100644 --- a/drivers/net/dsa/mv88e6xxx.c +++ b/drivers/net/dsa/mv88e6xxx.c @@ -563,9 +563,10 @@ static bool mv88e6xxx_6352_family(struct dsa_switch *ds) /* Must be called with SMI lock held */ static int _mv88e6xxx_wait(struct dsa_switch *ds, int reg, int offset, - u16 mask) + u16 mask, unsigned int msecs) { unsigned long timeout = jiffies + HZ / 10; + unsigned long usecs = msecs * 1000; while (time_before(jiffies, timeout)) { int ret; @@ -576,7 +577,8 @@ static int _mv88e6xxx_wait(struct dsa_switch *ds, int reg, int offset, if (!(ret & mask)) return 0; - usleep_range(1000, 2000); + if (usecs) + usleep_range(usecs, usecs + 1000); } return -ETIMEDOUT; } @@ -585,7 +587,7 @@ static int _mv88e6xxx_wait(struct dsa_switch *ds, int reg, int offset, static int _mv88e6xxx_stats_wait(struct dsa_switch *ds) { return _mv88e6xxx_wait(ds, REG_GLOBAL, GLOBAL_STATS_OP, - GLOBAL_STATS_OP_BUSY); + GLOBAL_STATS_OP_BUSY, 0); } /* Must be called with SMI mutex held */ @@ -872,13 +874,14 @@ error: } #endif /* CONFIG_NET_DSA_HWMON */ -static int mv88e6xxx_wait(struct dsa_switch *ds, int reg, int offset, u16 mask) +static int mv88e6xxx_wait(struct dsa_switch *ds, int reg, int offset, u16 mask, + unsigned int msecs) { struct mv88e6xxx_priv_state *ps = ds_to_priv(ds); int ret; mutex_lock(&ps->smi_mutex); - ret = _mv88e6xxx_wait(ds, reg, offset, mask); + ret = _mv88e6xxx_wait(ds, reg, offset, mask, msecs); mutex_unlock(&ps->smi_mutex); return ret; @@ -887,33 +890,33 @@ static int mv88e6xxx_wait(struct dsa_switch *ds, int reg, int offset, u16 mask) static int _mv88e6xxx_phy_wait(struct dsa_switch *ds) { return _mv88e6xxx_wait(ds, REG_GLOBAL2, GLOBAL2_SMI_OP, - GLOBAL2_SMI_OP_BUSY); + GLOBAL2_SMI_OP_BUSY, 1); } int mv88e6xxx_eeprom_load_wait(struct dsa_switch *ds) { return mv88e6xxx_wait(ds, REG_GLOBAL2, GLOBAL2_EEPROM_OP, - GLOBAL2_EEPROM_OP_LOAD); + GLOBAL2_EEPROM_OP_LOAD, 1); } int mv88e6xxx_eeprom_busy_wait(struct dsa_switch *ds) { return mv88e6xxx_wait(ds, REG_GLOBAL2, GLOBAL2_EEPROM_OP, - GLOBAL2_EEPROM_OP_BUSY); + GLOBAL2_EEPROM_OP_BUSY, 1); } /* Must be called with SMI lock held */ static int _mv88e6xxx_atu_wait(struct dsa_switch *ds) { return _mv88e6xxx_wait(ds, REG_GLOBAL, GLOBAL_ATU_OP, - GLOBAL_ATU_OP_BUSY); + GLOBAL_ATU_OP_BUSY, 1); } /* Must be called with SMI lock held */ static int _mv88e6xxx_scratch_wait(struct dsa_switch *ds) { return _mv88e6xxx_wait(ds, REG_GLOBAL2, GLOBAL2_SCRATCH_MISC, - GLOBAL2_SCRATCH_BUSY); + GLOBAL2_SCRATCH_BUSY, 1); } /* Must be called with SMI mutex held */ -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] net: dsa: mv88e6xxx: sleep in _mv88e6xxx_stats_wait
> Good point. The timeout is most definitely quite large and for sure on > the safe side. It might make sense to add some statistics gathering to > see how long the maximum observed delay actually is. Hi All Statistics are something which can be used a lot, i bursts and interactivily. ATU, VTU etc, are much less often used. So different delays might be justified. I agree about doing some statistics gathering to determine actual delays needed. Andrew -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] xfrm: Fix a typo
Signed-off-by: Jakub Wilk --- net/xfrm/xfrm_user.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c index bd16c6c..0cebf1f 100644 --- a/net/xfrm/xfrm_user.c +++ b/net/xfrm/xfrm_user.c @@ -2048,7 +2048,7 @@ static int xfrm_add_pol_expire(struct sk_buff *skb, struct nlmsghdr *nlh, xfrm_audit_policy_delete(xp, 1, true); } else { // reset the timers here? - WARN(1, "Dont know what to do with soft policy expire\n"); + WARN(1, "Don't know what to do with soft policy expire\n"); } km_policy_expired(xp, p->dir, up->hard, nlh->nlmsg_pid); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: reproducable panic eviction work queue
On 07/18/2015 12:02 PM, Nikolay Aleksandrov wrote: > On 07/18/2015 11:01 AM, Johan Schuijt wrote: >> Yes, we already found these and are included in our kernel, but even with >> these patches we still receive the panic. >> >> - Johan >> >> >>> On 18 Jul 2015, at 10:56, Eric Dumazet wrote: >>> >>> On Fri, 2015-07-17 at 21:18 +, Johan Schuijt wrote: Hey guys, We’re currently running into a reproducible panic in the eviction work queue code when we pin al our eth* IRQ to different CPU cores (in order to scale our networking performance for our virtual servers). This only occurs in kernels >= 3.17 and is a result of the following change: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24 The race/panic we see seems to be the same as, or similar to: https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29 We can confirm that this is directly exposed by the IRQ pinning since disabling this stops us from being able to reproduce this case :) How te reproduce: in our test-setup we have 4 machines generating UDP packets which are send to the vulnerable host. These all have a MTU of 100 (for test purposes) and send UDP packets of a size of 256 bytes. Within half an hour you will see the following panic: crash> bt PID: 56 TASK: 885f3d9fc210 CPU: 9 COMMAND: "kworker/9:0" #0 [885f3da03b60] machine_kexec at 8104a1f7 #1 [885f3da03bb0] crash_kexec at 810db187 #2 [885f3da03c80] oops_end at 81015140 #3 [885f3da03ca0] general_protection at 814f6c88 [exception RIP: inet_evict_bucket+281] RIP: 81480699 RSP: 885f3da03d58 RFLAGS: 00010292 RAX: 885f3da03d08 RBX: dead001000a8 RCX: 885f3da03d08 RDX: 0006 RSI: 885f3da03ce8 RDI: dead001000a8 RBP: 0002 R8: 0286 R9: 88302f401640 R10: 8000 R11: 88602ec0c138 R12: 81a8d8c0 R13: 885f3da03d70 R14: R15: 881d6efe1a00 ORIG_RAX: CS: 0010 SS: 0018 #4 [885f3da03db0] inet_frag_worker at 8148075a #5 [885f3da03e10] process_one_work at 8107be19 #6 [885f3da03e60] worker_thread at 8107c6e3 #7 [885f3da03ed0] kthread at 8108103e #8 [885f3da03f50] ret_from_fork at 814f4d7c We would love to receive your input on this matter. Thx in advance, - Johan >>> >>> Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 & >>> d70127e8a942364de8dd140fe73893efda363293 >>> >>> Also please send your mails in text format, not html, and CC netdev ( I >>> did here) >>> >>> >>> >> >> N�r��y���b�X��ǧv�^�){.n�+���z�^�)���w*jg����ݢj/���z�ޖ��2�ޙ���&�)ߡ�a�����G���h��j:+v���w�٥ >> > > Thank you for the report, I will try to reproduce this locally > Could you please post the full crash log ? Also could you test > with a clean current kernel from Linus' tree or Dave's -net ? > These are available at: > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git > git://git.kernel.org/pub/scm/linux/kernel/git/davem/net > respectively. > > One last question how many IRQs do you pin i.e. how many cores > do you actively use for receive ? > Flags seems to be modified while still linked and we may get the following (theoretical) situation: CPU 1 CPU 2 inet_frag_evictor (wait for chainlock) spin_lock(chainlock) unlock(chainlock) get lock, set EVICT flag, hlist_del etc. change flags again while qp is in the evict list So could you please try the following patch which sets the flag while holding the chain lock: diff --git a/net/ipv4/inet_fragment.c b/net/ipv4/inet_fragment.c index 5e346a082e5f..2521ed9c1b52 100644 --- a/net/ipv4/inet_fragment.c +++ b/net/ipv4/inet_fragment.c @@ -354,8 +354,8 @@ static struct inet_frag_queue *inet_frag_intern(struct netns_frags *nf, hlist_for_each_entry(qp, &hb->chain, list) { if (qp->net == nf && f->match(qp, arg)) { atomic_inc(&qp->refcnt); - spin_unlock(&hb->chain_lock); qp_in->flags |= INET_FRAG_COMPLETE; + spin_unlock(&hb->chain_lock); inet_frag_put(qp_in, f); return qp; } -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of
[net-next]r8169: Disable advertise full duplex on chip RTL_GIGA_MAC_VER_09. Disable advertise full duplex on chip id RTL_GIGA_MAC_VER_09 wich support random full duplex.(very sad) On branch master
Changes to be committed: modified: drivers/net/ethernet/realtek/r8169.c Signed-off-by: Corcodel Marian --- drivers/net/ethernet/realtek/r8169.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 714af89..a373679 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -1918,12 +1918,23 @@ static int rtl8169_set_speed_xmi(struct net_device *dev, if (adv & ADVERTISED_10baseT_Half) auto_nego |= ADVERTISE_10HALF; - if (adv & ADVERTISED_10baseT_Full) - auto_nego |= ADVERTISE_10FULL; + if (!tp->mac_version == RTL_GIGA_MAC_VER_09) { + if (adv & ADVERTISED_10baseT_Full) + auto_nego |= ADVERTISE_10FULL; + } else { + netif_info(tp, link, dev, + "PHY does not support 10Mbps full duplex\n"); + } if (adv & ADVERTISED_100baseT_Half) auto_nego |= ADVERTISE_100HALF; - if (adv & ADVERTISED_100baseT_Full) - auto_nego |= ADVERTISE_100FULL; + if (!tp->mac_version == RTL_GIGA_MAC_VER_09) { + if (adv & ADVERTISED_100baseT_Full) + auto_nego |= ADVERTISE_100FULL; + } else { + netif_info(tp, link, dev, + "PHY does not support 100Mbps full duplex\n"); + } + auto_nego |= ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM;
[PATCH net-next]r8169:Disable advertise full duplex on chip id RTL_GIGA_MAC_VER_09. Disable advertise full duplex on chip id RTL_GIGA_MAC_VER_09 wich support random full duplex.(very sad) On branch
Changes to be committed: modified: drivers/net/ethernet/realtek/r8169.c Signed-off-by: Corcodel Marian --- drivers/net/ethernet/realtek/r8169.c | 19 +++ 1 file changed, 15 insertions(+), 4 deletions(-) diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c index 714af89..a373679 100644 --- a/drivers/net/ethernet/realtek/r8169.c +++ b/drivers/net/ethernet/realtek/r8169.c @@ -1918,12 +1918,23 @@ static int rtl8169_set_speed_xmii(struct net_device *dev, if (adv & ADVERTISED_10baseT_Half) auto_nego |= ADVERTISE_10HALF; - if (adv & ADVERTISED_10baseT_Full) - auto_nego |= ADVERTISE_10FULL; + if (!tp->mac_version == RTL_GIGA_MAC_VER_09) { + if (adv & ADVERTISED_10baseT_Full) + auto_nego |= ADVERTISE_10FULL; + } else { + netif_info(tp, link, dev, + "PHY does not support 10Mbps full duplex\n"); + } if (adv & ADVERTISED_100baseT_Half) auto_nego |= ADVERTISE_100HALF; - if (adv & ADVERTISED_100baseT_Full) - auto_nego |= ADVERTISE_100FULL; + if (!tp->mac_version == RTL_GIGA_MAC_VER_09) { + if (adv & ADVERTISED_100baseT_Full) + auto_nego |= ADVERTISE_100FULL; + } else { + netif_info(tp, link, dev, + "PHY does not support 100Mbps full duplex\n"); + } + auto_nego |= ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM;
Re: reproducable panic eviction work queue
On 07/18/2015 11:01 AM, Johan Schuijt wrote: > Yes, we already found these and are included in our kernel, but even with > these patches we still receive the panic. > > - Johan > > >> On 18 Jul 2015, at 10:56, Eric Dumazet wrote: >> >> On Fri, 2015-07-17 at 21:18 +, Johan Schuijt wrote: >>> Hey guys, >>> >>> >>> We’re currently running into a reproducible panic in the eviction work >>> queue code when we pin al our eth* IRQ to different CPU cores (in >>> order to scale our networking performance for our virtual servers). >>> This only occurs in kernels >= 3.17 and is a result of the following >>> change: >>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24 >>> >>> >>> The race/panic we see seems to be the same as, or similar to: >>> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29 >>> >>> >>> We can confirm that this is directly exposed by the IRQ pinning since >>> disabling this stops us from being able to reproduce this case :) >>> >>> >>> How te reproduce: in our test-setup we have 4 machines generating UDP >>> packets which are send to the vulnerable host. These all have a MTU of >>> 100 (for test purposes) and send UDP packets of a size of 256 bytes. >>> Within half an hour you will see the following panic: >>> >>> >>> crash> bt >>> PID: 56 TASK: 885f3d9fc210 CPU: 9 COMMAND: "kworker/9:0" >>> #0 [885f3da03b60] machine_kexec at 8104a1f7 >>> #1 [885f3da03bb0] crash_kexec at 810db187 >>> #2 [885f3da03c80] oops_end at 81015140 >>> #3 [885f3da03ca0] general_protection at 814f6c88 >>>[exception RIP: inet_evict_bucket+281] >>>RIP: 81480699 RSP: 885f3da03d58 RFLAGS: 00010292 >>>RAX: 885f3da03d08 RBX: dead001000a8 RCX: >>> 885f3da03d08 >>>RDX: 0006 RSI: 885f3da03ce8 RDI: >>> dead001000a8 >>>RBP: 0002 R8: 0286 R9: >>> 88302f401640 >>>R10: 8000 R11: 88602ec0c138 R12: >>> 81a8d8c0 >>>R13: 885f3da03d70 R14: R15: >>> 881d6efe1a00 >>>ORIG_RAX: CS: 0010 SS: 0018 >>> #4 [885f3da03db0] inet_frag_worker at 8148075a >>> #5 [885f3da03e10] process_one_work at 8107be19 >>> #6 [885f3da03e60] worker_thread at 8107c6e3 >>> #7 [885f3da03ed0] kthread at 8108103e >>> #8 [885f3da03f50] ret_from_fork at 814f4d7c >>> >>> >>> We would love to receive your input on this matter. >>> >>> >>> Thx in advance, >>> >>> >>> - Johan >> >> Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 & >> d70127e8a942364de8dd140fe73893efda363293 >> >> Also please send your mails in text format, not html, and CC netdev ( I >> did here) >> >>> >>> >> >> > > N�r��y���b�X��ǧv�^�){.n�+���z�^�)���w*jg����ݢj/���z�ޖ��2�ޙ���&�)ߡ�a�����G���h��j:+v���w�٥ > Thank you for the report, I will try to reproduce this locally Could you please post the full crash log ? Also could you test with a clean current kernel from Linus' tree or Dave's -net ? These are available at: git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git git://git.kernel.org/pub/scm/linux/kernel/git/davem/net respectively. One last question how many IRQs do you pin i.e. how many cores do you actively use for receive ? -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: reproducable panic eviction work queue
Yes, we already found these and are included in our kernel, but even with these patches we still receive the panic. - Johan > On 18 Jul 2015, at 10:56, Eric Dumazet wrote: > > On Fri, 2015-07-17 at 21:18 +, Johan Schuijt wrote: >> Hey guys, >> >> >> We’re currently running into a reproducible panic in the eviction work >> queue code when we pin al our eth* IRQ to different CPU cores (in >> order to scale our networking performance for our virtual servers). >> This only occurs in kernels >= 3.17 and is a result of the following >> change: >> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24 >> >> >> The race/panic we see seems to be the same as, or similar to: >> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29 >> >> >> We can confirm that this is directly exposed by the IRQ pinning since >> disabling this stops us from being able to reproduce this case :) >> >> >> How te reproduce: in our test-setup we have 4 machines generating UDP >> packets which are send to the vulnerable host. These all have a MTU of >> 100 (for test purposes) and send UDP packets of a size of 256 bytes. >> Within half an hour you will see the following panic: >> >> >> crash> bt >> PID: 56 TASK: 885f3d9fc210 CPU: 9 COMMAND: "kworker/9:0" >> #0 [885f3da03b60] machine_kexec at 8104a1f7 >> #1 [885f3da03bb0] crash_kexec at 810db187 >> #2 [885f3da03c80] oops_end at 81015140 >> #3 [885f3da03ca0] general_protection at 814f6c88 >>[exception RIP: inet_evict_bucket+281] >>RIP: 81480699 RSP: 885f3da03d58 RFLAGS: 00010292 >>RAX: 885f3da03d08 RBX: dead001000a8 RCX: >> 885f3da03d08 >>RDX: 0006 RSI: 885f3da03ce8 RDI: >> dead001000a8 >>RBP: 0002 R8: 0286 R9: >> 88302f401640 >>R10: 8000 R11: 88602ec0c138 R12: >> 81a8d8c0 >>R13: 885f3da03d70 R14: R15: >> 881d6efe1a00 >>ORIG_RAX: CS: 0010 SS: 0018 >> #4 [885f3da03db0] inet_frag_worker at 8148075a >> #5 [885f3da03e10] process_one_work at 8107be19 >> #6 [885f3da03e60] worker_thread at 8107c6e3 >> #7 [885f3da03ed0] kthread at 8108103e >> #8 [885f3da03f50] ret_from_fork at 814f4d7c >> >> >> We would love to receive your input on this matter. >> >> >> Thx in advance, >> >> >> - Johan > > Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 & > d70127e8a942364de8dd140fe73893efda363293 > > Also please send your mails in text format, not html, and CC netdev ( I > did here) > >> >> > >
Re: reproducable panic eviction work queue
On Fri, 2015-07-17 at 21:18 +, Johan Schuijt wrote: > Hey guys, > > > We’re currently running into a reproducible panic in the eviction work > queue code when we pin al our eth* IRQ to different CPU cores (in > order to scale our networking performance for our virtual servers). > This only occurs in kernels >= 3.17 and is a result of the following > change: > https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=b13d3cbfb8e8a8f53930af67d1ebf05149f32c24 > > > The race/panic we see seems to be the same as, or similar to: > https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-3.18.y&id=65ba1f1ec0eff1c25933468e1d238201c0c2cb29 > > > We can confirm that this is directly exposed by the IRQ pinning since > disabling this stops us from being able to reproduce this case :) > > > How te reproduce: in our test-setup we have 4 machines generating UDP > packets which are send to the vulnerable host. These all have a MTU of > 100 (for test purposes) and send UDP packets of a size of 256 bytes. > Within half an hour you will see the following panic: > > > crash> bt > PID: 56 TASK: 885f3d9fc210 CPU: 9 COMMAND: "kworker/9:0" > #0 [885f3da03b60] machine_kexec at 8104a1f7 > #1 [885f3da03bb0] crash_kexec at 810db187 > #2 [885f3da03c80] oops_end at 81015140 > #3 [885f3da03ca0] general_protection at 814f6c88 > [exception RIP: inet_evict_bucket+281] > RIP: 81480699 RSP: 885f3da03d58 RFLAGS: 00010292 > RAX: 885f3da03d08 RBX: dead001000a8 RCX: > 885f3da03d08 > RDX: 0006 RSI: 885f3da03ce8 RDI: > dead001000a8 > RBP: 0002 R8: 0286 R9: > 88302f401640 > R10: 8000 R11: 88602ec0c138 R12: > 81a8d8c0 > R13: 885f3da03d70 R14: R15: > 881d6efe1a00 > ORIG_RAX: CS: 0010 SS: 0018 > #4 [885f3da03db0] inet_frag_worker at 8148075a > #5 [885f3da03e10] process_one_work at 8107be19 > #6 [885f3da03e60] worker_thread at 8107c6e3 > #7 [885f3da03ed0] kthread at 8108103e > #8 [885f3da03f50] ret_from_fork at 814f4d7c > > > We would love to receive your input on this matter. > > > Thx in advance, > > > - Johan Check commits 65ba1f1ec0eff1c25933468e1d238201c0c2cb29 & d70127e8a942364de8dd140fe73893efda363293 Also please send your mails in text format, not html, and CC netdev ( I did here) > > -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RGC PATCH v3.10] net: sched: validate that class is found in qdisc_tree_decrease_qlen
On Fri, 2015-07-17 at 17:27 -0700, Alex Gartrell wrote: > On Fri, Jul 17, 2015 at 5:10 PM, Cong Wang wrote: > > Hmm, but in htb_delete() we do reset the leaf qdisc before removing the > > class from ha > > > > if (!cl->level) { > > qlen = cl->un.leaf.q->q.qlen; > > qdisc_reset(cl->un.leaf.q); > > qdisc_tree_decrease_qlen(cl->un.leaf.q, qlen); > > } > > > > therefore, the leaf fq_codel qdisc is supposed to have 0 skb after that, > > that is, the second fq_codel_reset() is supposed to return NULL immediately? > > Why is qdidsc_tree_decrease_qlen() called in the second fq_codel_reset()? > > > > Or there is a race condition between ->delete() and ->put()? In which new > > skb can be enqueued? > > This makes the most sense to me, but I have ~32 hours of experience > with this subsystem :) > > Taking that for granted, it seems like it'd be appropriate to note the > invariant in the code i've changed with a WARN_ON and to skip it, and > then to otherwise find a way to close the hole. Do you agree? > HTB + fq_codel ? Make sure you applied Cong fix commit c0afd9ce4d6a646fb6433536f95a418bb348fab1 Author: WANG Cong Date: Tue Jul 14 11:21:58 2015 -0700 fq_codel: fix return value of fq_codel_drop() The ->drop() is supposed to return the number of bytes it dropped, however fq_codel_drop() returns the index of the flow where it drops a packet from. Fix this by introducing a helper to wrap fq_codel_drop(). -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html