Re: [RFC PATCH 2/3] net: macb: Add support for 1588 for Zynq Ultrascale+ MPSoC
Hi Nicolas,

Thanks for your reply.

On Tue, Aug 9, 2016 at 10:26 PM, Punnaiah Choudary Kalluri wrote:
> Hi Nicolas,
>
> 1588 implementation in cadence GEM IP we have in Zynq Ultrascale+ MPSoC is
> different to the one in Zynq SOC.
>
> In earlier version, all timestamp values will be stored in registers and
> there is no specific mechanism to distinguish the received ethernet frame
> that contains time stamp information other than parsing the frame for PTP
> packet type.
>
> We have basic implementation for earlier version in our out of tree driver,
> which is going to be deprecated soon. You could also check the below driver
> for 1588 support.
> https://gitenterprise.xilinx.com/Linux/linux-xlnx/blob/master/drivers/net/ethernet/xilinx/xilinx_emacps.c
>
>
> Regards,
> Punnaiah
>
>> -----Original Message-----
>> From: Nicolas Ferre [mailto:nicolas.fe...@atmel.com]
>> Sent: Tuesday, August 09, 2016 10:10 PM
>> To: Harini Katakam; Harini Katakam; Andrei Pistirica
>> Cc: da...@davemloft.net; Boris Brezillon;
>> alexandre.bell...@free-electrons.com;
>> netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
>> devicet...@vger.kernel.org; Punnaiah Choudary Kalluri;
>> Michal Simek; Anirudha Sarangi
>> Subject: Re: [RFC PATCH 2/3] net: macb: Add support for 1588 for Zynq
>> Ultrascale+ MPSoC
>>
>> On 21/09/2015 at 19:49, Harini Katakam wrote:
>> > On Fri, Sep 11, 2015 at 1:27 PM, Harini Katakam wrote:
>> >> Cadence GEM in Zynq Ultrascale+ MPSoC supports 1588 and provides a
>> >> 102 bit time counter with 48 bits for seconds, 30 bits for nsecs and
>> >> 24 bits for sub-nsecs. The timestamp is made available to the SW through
>> >> registers as well as (more precisely) through upper two words in
>> >> an extended BD.
>> >>
>> >> This patch does the following:
>> >> - Adds MACB_CAPS_TSU in zynqmp_config.
>> >> - Registers to ptp clock framework (after checking for timestamp support in
>> >>   IP and capability in config).
>> >> - TX BD and RX BD control registers are written to populate timestamp in
>> >>   extended BD words.
>> >> - Timer initialization is done by writing time of day to the timer
>> >>   counter.
>> >> - ns increment register is programmed as NS_PER_SEC/TSU_CLK.
>> >>   For a 24 bit subns precision, the subns increment equals
>> >>   remainder of (NS_PER_SEC/TSU_CLK) * (2^24).
>> >>   TSU (Time stamp unit) clock is obtained by the driver from devicetree.
>> >> - HW time stamp capabilities are advertised via ethtool and macb ioctl is
>> >>   updated accordingly.
>> >> - For all PTP event frames, nanoseconds and the lower 5 bits of seconds are
>> >>   obtained from the BD. This offers a precise timestamp. The upper bits
>> >>   (which don't vary between consecutive packets) are obtained from the
>> >>   TX/RX PTP event/PEER registers. The timestamp obtained thus is updated
>> >>   in skb for upper layers to access.
>> >> - The drivers register functions with ptp to perform time and frequency
>> >>   adjustment.
>> >> - Time adjustment is done by writing to the 1588_ADJUST register.
>> >>   The controller will read the delta in this register and update the timer
>> >>   counter register. Alternatively, for large time offset adjustments,
>> >>   the driver reads the secs and nsecs counter values, adds/subtracts the
>> >>   delta and updates the timer counter. In order to be as precise as possible,
>> >>   nsecs counter is read again if secs has incremented during the counter read.
>> >> - Frequency adjustment is not directly supported by this IP.
>> >>   addend is the initial value ns increment and similarly addendsub.
>> >>   The ppb (parts per billion) provided is used as
>> >>   ns_incr = addend +/- (ppb/rate).
>> >>   Similarly the remainder of the above is used to populate subns increment.
>> >>   In case the ppb requested is negative AND subns adjustment greater than
>> >>   the addendsub, ns_incr is reduced by 1 and subns_incr is adjusted in
>> >>   positive accordingly.
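
The ns/subns increment bullet above can be sketched in plain C. This is only
one reading of the description, not the code from the patch: it assumes the
"remainder" is NS_PER_SEC modulo the TSU clock rate, scaled by 2^24 and
divided by the clock rate. The names tsu_incr and tsu_calc_incr are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL
#define SUBNS_BITS 24 /* sub-nanosecond precision stated in the description */

/* Hypothetical container for the two increment register values. */
struct tsu_incr {
	uint32_t ns;    /* whole nanoseconds added per TSU clock tick */
	uint32_t subns; /* fractional part, in units of 2^-24 ns */
};

/* Assumed interpretation: split NS_PER_SEC / tsu_clk_hz into integer and
 * 24-bit fractional parts using integer arithmetic only.
 */
static struct tsu_incr tsu_calc_incr(uint32_t tsu_clk_hz)
{
	struct tsu_incr incr;
	uint64_t rem;

	assert(tsu_clk_hz != 0);

	incr.ns = (uint32_t)(NS_PER_SEC / tsu_clk_hz);
	rem = NS_PER_SEC % tsu_clk_hz;
	/* rem < 2^32, so the shifted value stays below 2^56 and fits in 64 bits */
	incr.subns = (uint32_t)((rem << SUBNS_BITS) / tsu_clk_hz);

	return incr;
}
```

With a 250 MHz TSU clock this gives 4 ns per tick and no fractional part; a
clock that does not divide 10^9 evenly leaves the rest in the subns field.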
>> >>
>> >> Signed-off-by: Harini Katakam :
>> >> ---
>> >>  drivers/net/ethernet/cadence/macb.c | 372 ++-
>> >>  drivers/net/ethernet/cadence/macb.h |  64 ++
>> >>  2 files changed, 428 insertions(+), 8 deletions(-)
>> >>
>> >> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
>> >> index bb2932c..b531008 100644
>> >> --- a/drivers/net/ethernet/cadence/macb.c
>> >> +++ b/drivers/net/ethernet/cadence/macb.c
>> >> @@ -30,6 +30,8 @@
>> >>  #include
>> >>  #include
>> >> [..]
>> >>
>> >> +	unsigned int		ns_incr;
>> >> +	unsigned int		subns_incr;
>> >>  };
>> >>
>> >>  static inline bool macb_is_gem(struct macb *bp)
>> >> --
>> >> 1.7.9.5
>> >
>> > Ping
>> >
>> > Thanks.
>>
>> Harini,
>>
>> I come back to this patch of last year and I'm sorry about being so late
>> answering you.
>>
>> Andrei who is added to the discussion will have some time to deal with
>> this fe
Re: [PATCH net] bridge: Fix problems around fdb entries pointing to the bridge device
From: Toshiaki Makita
Date: Thu, 4 Aug 2016 11:11:19 +0900

> Adding fdb entries pointing to the bridge device uses fdb_insert(),
> which lacks various checks and does not respect added_by_user flag.
>
> As a result, some inconsistent behavior can happen:
> * Adding temporary entries succeeds but results in permanent entries.
> * Same goes for "dynamic" and "use".
> * Changing mac address of the bridge device causes deletion of
>   user-added entries.
> * Replacing existing entries looks successful from userspace but actually
>   not, regardless of NLM_F_EXCL flag.
>
> Use the same logic as other entries and fix them.
>
> Fixes: 3741873b4f73 ("bridge: allow adding of fdb entries pointing to the bridge device")
> Signed-off-by: Toshiaki Makita

Applied, thanks.
Re: [net-next v2 v2 1/2] bpf: Add bpf_current_task_in_cgroup helper
On Tue, Aug 09, 2016 at 08:52:01PM -0700, Alexei Starovoitov wrote: > On Tue, Aug 09, 2016 at 08:40:05PM -0700, Sargun Dhillon wrote: > > On Tue, Aug 09, 2016 at 08:27:32PM -0700, Alexei Starovoitov wrote: > > > On Tue, Aug 09, 2016 at 06:26:37PM -0700, Sargun Dhillon wrote: > > > > On Tue, Aug 09, 2016 at 06:02:34PM -0700, Alexei Starovoitov wrote: > > > > > On Tue, Aug 09, 2016 at 05:55:26PM -0700, Sargun Dhillon wrote: > > > > > > On Tue, Aug 09, 2016 at 05:23:50PM -0700, Alexei Starovoitov wrote: > > > > > > > On Tue, Aug 09, 2016 at 05:00:12PM -0700, Sargun Dhillon wrote: > > > > > > > > This adds a bpf helper that's similar to the skb_in_cgroup > > > > > > > > helper to check > > > > > > > > whether the probe is currently executing in the context of a > > > > > > > > specific > > > > > > > > subset of the cgroupsv2 hierarchy. It does this based on > > > > > > > > membership test > > > > > > > > for a cgroup arraymap. It is invalid to call this in an > > > > > > > > interrupt, and > > > > > > > > it'll return an error. The helper is primarily to be used in > > > > > > > > debugging > > > > > > > > activities for containers, where you may have multiple programs > > > > > > > > running in > > > > > > > > a given top-level "container". > > > > > > > > > > > > > > > > This patch also genericizes some of the arraymap fetching logic > > > > > > > > between the > > > > > > > > skb_in_cgroup helper and this new helper. 
> > > > > > > > > > > > > > > > Signed-off-by: Sargun Dhillon > > > > > > > > Cc: Alexei Starovoitov > > > > > > > > Cc: Daniel Borkmann > > > > > > > > --- > > > > > > > > include/linux/bpf.h | 24 > > > > > > > > include/uapi/linux/bpf.h | 11 +++ > > > > > > > > kernel/bpf/arraymap.c| 2 +- > > > > > > > > kernel/bpf/verifier.c| 4 +++- > > > > > > > > kernel/trace/bpf_trace.c | 34 > > > > > > > > ++ > > > > > > > > net/core/filter.c| 11 --- > > > > > > > > 6 files changed, 77 insertions(+), 9 deletions(-) > > > > > > > > > > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > > > > > > index 1113423..9adf712 100644 > > > > > > > > --- a/include/linux/bpf.h > > > > > > > > +++ b/include/linux/bpf.h > > > > > > > > @@ -319,4 +319,28 @@ extern const struct bpf_func_proto > > > > > > > > bpf_get_stackid_proto; > > > > > > > > void bpf_user_rnd_init_once(void); > > > > > > > > u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); > > > > > > > > > > > > > > > > +#ifdef CONFIG_CGROUPS > > > > > > > > +/* Helper to fetch a cgroup pointer based on index. 
> > > > > > > > + * @map: a cgroup arraymap > > > > > > > > + * @idx: index of the item you want to fetch > > > > > > > > + * > > > > > > > > + * Returns pointer on success, > > > > > > > > + * Error code if item not found, or out-of-bounds access > > > > > > > > + */ > > > > > > > > +static inline struct cgroup *fetch_arraymap_ptr(struct bpf_map > > > > > > > > *map, int idx) > > > > > > > > +{ > > > > > > > > + struct cgroup *cgrp; > > > > > > > > + struct bpf_array *array = container_of(map, struct > > > > > > > > bpf_array, map); > > > > > > > > + > > > > > > > > + if (unlikely(idx >= array->map.max_entries)) > > > > > > > > + return ERR_PTR(-E2BIG); > > > > > > > > + > > > > > > > > + cgrp = READ_ONCE(array->ptrs[idx]); > > > > > > > > + if (unlikely(!cgrp)) > > > > > > > > + return ERR_PTR(-EAGAIN); > > > > > > > > + > > > > > > > > + return cgrp; > > > > > > > > +} > > > > > > > > +#endif /* CONFIG_CGROUPS */ > > > > > > > > + > > > > > > > > #endif /* _LINUX_BPF_H */ > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > > > > > index da218fe..64b1a07 100644 > > > > > > > > --- a/include/uapi/linux/bpf.h > > > > > > > > +++ b/include/uapi/linux/bpf.h > > > > > > > > @@ -375,6 +375,17 @@ enum bpf_func_id { > > > > > > > > */ > > > > > > > > BPF_FUNC_probe_write_user, > > > > > > > > > > > > > > > > + /** > > > > > > > > +* bpf_current_task_in_cgroup(map, index) - Check > > > > > > > > cgroup2 membership of current task > > > > > > > > +* @map: pointer to bpf_map in > > > > > > > > BPF_MAP_TYPE_CGROUP_ARRAY type > > > > > > > > +* @index: index of the cgroup in the bpf_map > > > > > > > > +* Return: > > > > > > > > +* == 0 current failed the cgroup2 descendant test > > > > > > > > +* == 1 current succeeded the cgroup2 descendant test > > > > > > > > +*< 0 error > > > > > > > > +*/ > > > > > > > > + BPF_FUNC_current_task_in_cgroup, > > > > > > > > + > > > > > > > > __BPF_FUNC_MAX_ID, > > > > > > > > }; > > > > > > > > > > > > 
> > > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > > > > > > > > index 633a650..a2ac051 100644 > > > > > > > > --- a/kernel/bpf/arraymap.c > > >
e1000: __pskb_pull_tail failed
My NFS server running 4.8-rc1 is getting flooded with this message:

e1000e :00:19.0 eth0: __pskb_pull_tail failed.

Never saw it happen with 4.7 or earlier.

That device is this onboard NIC:

00:19.0 Ethernet controller: Intel Corporation Ethernet Connection (2) I218-V

Dave
Re: problem with MPLS and TSO/GSO
On 7/25/16 10:39 AM, Lennert Buytenhek wrote:
> Hi!
>
> I am seeing pretty horrible TCP transmit performance (anywhere between
> 1 and 10 Mb/s, on a 10 Gb/s interface) when traffic is sent out over a
> route that involves MPLS labeling, and this seems to be due to an
> interaction between MPLS and TSO/GSO that causes all segmentable TCP
> frames that are MPLS-labeled to be dropped on egress.

...

> But, loading mpls_gso doesn't change much -- skb_gso_segment() then
> starts return -EINVAL instead, which is due to the
> skb_network_protocol() call in skb_mac_gso_segment() returning zero.
> And looking at skb_network_protocol(), I don't see how this is
> supposed to work -- skb->protocol is 0 at this point, and there is no
> way to figure out that what we are encapsulating is IP traffic, because
> unlike what is the case with VLAN tags, MPLS labels aren't followed by
> an inner ethertype that says what kind of traffic is in here, you have
> to have explicit knowledge of the payload type for MPLS.
>
> Any ideas?

A quick update. I have a pretty good handle on the GSO changes for MPLS
but I am still puzzled by a few things. Hopefully by end of week I can
send out a patch series.
Current performance comparison with my changes and a patch from Roopa:

MPLS

root@kenny-jessie3:~# ip netns exec ns0 netperf -c -C -H 10.10.10.10 -l 10 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.10.10.10 () port 0 AF_INET
Recv   Send   Send                          Utilization      Service Demand
Socket Socket Message  Elapsed              Send     Recv    Send    Recv
Size   Size   Size     Time     Throughput  local    remote  local   remote
bytes  bytes  bytes    secs.    10^6bits/s  % S      % S     us/KB   us/KB

 87380  16384  16384   10.00     3510.26    48.11    48.11   4.491   4.491

non-MPLS

root@kenny-jessie3:~# ip netns exec ns0 netperf -c -C -H 172.16.21.22 -l 30 -t TCP_STREAM
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 172.16.21.22 () port 0 AF_INET
Recv   Send   Send                          Utilization      Service Demand
Socket Socket Message  Elapsed              Send     Recv    Send    Recv
Size   Size   Size     Time     Throughput  local    remote  local   remote
bytes  bytes  bytes    secs.    10^6bits/s  % S      % S     us/KB   us/KB

 87380  16384  16384   30.00     9654.97    42.37    42.37   1.438   1.438
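
For readers following the quoted point that MPLS labels carry no inner
ethertype, here is a small userspace sketch of walking a label stack. It
assumes the standard 32-bit label stack entry layout (20-bit label, 3-bit
traffic class, 1-bit bottom-of-stack flag, 8-bit TTL); the helper names are
made up for illustration and are not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

#define MPLS_LABEL_SHIFT 12          /* label occupies the top 20 bits */
#define MPLS_BOS_BIT     0x00000100u /* S bit: set on the last entry */

/* Read a big-endian 32-bit word from a byte buffer. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Extract the 20-bit label value from one stack entry. */
static uint32_t mpls_label(uint32_t entry)
{
	return entry >> MPLS_LABEL_SHIFT;
}

/*
 * Return the byte offset of the payload that follows the label stack,
 * or -1 if no bottom-of-stack entry is found within len bytes.
 * Note what is missing: no field in any entry names the payload protocol.
 */
static long mpls_payload_offset(const uint8_t *buf, size_t len)
{
	size_t off;

	for (off = 0; off + 4 <= len; off += 4) {
		if (load_be32(buf + off) & MPLS_BOS_BIT)
			return (long)(off + 4);
	}
	return -1;
}
```

The walk can find where the payload starts, but nothing in the entries says
whether it is IPv4, IPv6 or something else, which matches the "explicit
knowledge of the payload type" remark in the quoted mail.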
Re: [net-next v2 v2 1/2] bpf: Add bpf_current_task_in_cgroup helper
On Tue, Aug 09, 2016 at 08:40:05PM -0700, Sargun Dhillon wrote: > On Tue, Aug 09, 2016 at 08:27:32PM -0700, Alexei Starovoitov wrote: > > On Tue, Aug 09, 2016 at 06:26:37PM -0700, Sargun Dhillon wrote: > > > On Tue, Aug 09, 2016 at 06:02:34PM -0700, Alexei Starovoitov wrote: > > > > On Tue, Aug 09, 2016 at 05:55:26PM -0700, Sargun Dhillon wrote: > > > > > On Tue, Aug 09, 2016 at 05:23:50PM -0700, Alexei Starovoitov wrote: > > > > > > On Tue, Aug 09, 2016 at 05:00:12PM -0700, Sargun Dhillon wrote: > > > > > > > This adds a bpf helper that's similar to the skb_in_cgroup helper > > > > > > > to check > > > > > > > whether the probe is currently executing in the context of a > > > > > > > specific > > > > > > > subset of the cgroupsv2 hierarchy. It does this based on > > > > > > > membership test > > > > > > > for a cgroup arraymap. It is invalid to call this in an > > > > > > > interrupt, and > > > > > > > it'll return an error. The helper is primarily to be used in > > > > > > > debugging > > > > > > > activities for containers, where you may have multiple programs > > > > > > > running in > > > > > > > a given top-level "container". > > > > > > > > > > > > > > This patch also genericizes some of the arraymap fetching logic > > > > > > > between the > > > > > > > skb_in_cgroup helper and this new helper. 
> > > > > > > > > > > > > > Signed-off-by: Sargun Dhillon > > > > > > > Cc: Alexei Starovoitov > > > > > > > Cc: Daniel Borkmann > > > > > > > --- > > > > > > > include/linux/bpf.h | 24 > > > > > > > include/uapi/linux/bpf.h | 11 +++ > > > > > > > kernel/bpf/arraymap.c| 2 +- > > > > > > > kernel/bpf/verifier.c| 4 +++- > > > > > > > kernel/trace/bpf_trace.c | 34 ++ > > > > > > > net/core/filter.c| 11 --- > > > > > > > 6 files changed, 77 insertions(+), 9 deletions(-) > > > > > > > > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > > > > > index 1113423..9adf712 100644 > > > > > > > --- a/include/linux/bpf.h > > > > > > > +++ b/include/linux/bpf.h > > > > > > > @@ -319,4 +319,28 @@ extern const struct bpf_func_proto > > > > > > > bpf_get_stackid_proto; > > > > > > > void bpf_user_rnd_init_once(void); > > > > > > > u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); > > > > > > > > > > > > > > +#ifdef CONFIG_CGROUPS > > > > > > > +/* Helper to fetch a cgroup pointer based on index. 
> > > > > > > + * @map: a cgroup arraymap > > > > > > > + * @idx: index of the item you want to fetch > > > > > > > + * > > > > > > > + * Returns pointer on success, > > > > > > > + * Error code if item not found, or out-of-bounds access > > > > > > > + */ > > > > > > > +static inline struct cgroup *fetch_arraymap_ptr(struct bpf_map > > > > > > > *map, int idx) > > > > > > > +{ > > > > > > > + struct cgroup *cgrp; > > > > > > > + struct bpf_array *array = container_of(map, struct bpf_array, > > > > > > > map); > > > > > > > + > > > > > > > + if (unlikely(idx >= array->map.max_entries)) > > > > > > > + return ERR_PTR(-E2BIG); > > > > > > > + > > > > > > > + cgrp = READ_ONCE(array->ptrs[idx]); > > > > > > > + if (unlikely(!cgrp)) > > > > > > > + return ERR_PTR(-EAGAIN); > > > > > > > + > > > > > > > + return cgrp; > > > > > > > +} > > > > > > > +#endif /* CONFIG_CGROUPS */ > > > > > > > + > > > > > > > #endif /* _LINUX_BPF_H */ > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > > > > index da218fe..64b1a07 100644 > > > > > > > --- a/include/uapi/linux/bpf.h > > > > > > > +++ b/include/uapi/linux/bpf.h > > > > > > > @@ -375,6 +375,17 @@ enum bpf_func_id { > > > > > > >*/ > > > > > > > BPF_FUNC_probe_write_user, > > > > > > > > > > > > > > + /** > > > > > > > + * bpf_current_task_in_cgroup(map, index) - Check cgroup2 > > > > > > > membership of current task > > > > > > > + * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type > > > > > > > + * @index: index of the cgroup in the bpf_map > > > > > > > + * Return: > > > > > > > + * == 0 current failed the cgroup2 descendant test > > > > > > > + * == 1 current succeeded the cgroup2 descendant test > > > > > > > + *< 0 error > > > > > > > + */ > > > > > > > + BPF_FUNC_current_task_in_cgroup, > > > > > > > + > > > > > > > __BPF_FUNC_MAX_ID, > > > > > > > }; > > > > > > > > > > > > > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > > > > > > > index 633a650..a2ac051 
100644 > > > > > > > --- a/kernel/bpf/arraymap.c > > > > > > > +++ b/kernel/bpf/arraymap.c > > > > > > > @@ -538,7 +538,7 @@ static int __init > > > > > > > register_perf_event_array_map(void) > > > > > > > } > > > > > > > late_initcall(register_perf_event_array_map); > > > > > > > > > > > > > > -#ifdef CONFIG_SOCK_CGROUP_DATA > > > > > > > +#ifdef CONFIG_CGROUPS > > > > > > > static void *cgroup_fd_array_get_ptr(struct bpf_map *map, > > > > > > >
Re: [net-next v2 v2 1/2] bpf: Add bpf_current_task_in_cgroup helper
On Tue, Aug 09, 2016 at 08:27:32PM -0700, Alexei Starovoitov wrote: > On Tue, Aug 09, 2016 at 06:26:37PM -0700, Sargun Dhillon wrote: > > On Tue, Aug 09, 2016 at 06:02:34PM -0700, Alexei Starovoitov wrote: > > > On Tue, Aug 09, 2016 at 05:55:26PM -0700, Sargun Dhillon wrote: > > > > On Tue, Aug 09, 2016 at 05:23:50PM -0700, Alexei Starovoitov wrote: > > > > > On Tue, Aug 09, 2016 at 05:00:12PM -0700, Sargun Dhillon wrote: > > > > > > This adds a bpf helper that's similar to the skb_in_cgroup helper > > > > > > to check > > > > > > whether the probe is currently executing in the context of a > > > > > > specific > > > > > > subset of the cgroupsv2 hierarchy. It does this based on membership > > > > > > test > > > > > > for a cgroup arraymap. It is invalid to call this in an interrupt, > > > > > > and > > > > > > it'll return an error. The helper is primarily to be used in > > > > > > debugging > > > > > > activities for containers, where you may have multiple programs > > > > > > running in > > > > > > a given top-level "container". > > > > > > > > > > > > This patch also genericizes some of the arraymap fetching logic > > > > > > between the > > > > > > skb_in_cgroup helper and this new helper. 
> > > > > > > > > > > > Signed-off-by: Sargun Dhillon > > > > > > Cc: Alexei Starovoitov > > > > > > Cc: Daniel Borkmann > > > > > > --- > > > > > > include/linux/bpf.h | 24 > > > > > > include/uapi/linux/bpf.h | 11 +++ > > > > > > kernel/bpf/arraymap.c| 2 +- > > > > > > kernel/bpf/verifier.c| 4 +++- > > > > > > kernel/trace/bpf_trace.c | 34 ++ > > > > > > net/core/filter.c| 11 --- > > > > > > 6 files changed, 77 insertions(+), 9 deletions(-) > > > > > > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > > > > index 1113423..9adf712 100644 > > > > > > --- a/include/linux/bpf.h > > > > > > +++ b/include/linux/bpf.h > > > > > > @@ -319,4 +319,28 @@ extern const struct bpf_func_proto > > > > > > bpf_get_stackid_proto; > > > > > > void bpf_user_rnd_init_once(void); > > > > > > u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); > > > > > > > > > > > > +#ifdef CONFIG_CGROUPS > > > > > > +/* Helper to fetch a cgroup pointer based on index. > > > > > > + * @map: a cgroup arraymap > > > > > > + * @idx: index of the item you want to fetch > > > > > > + * > > > > > > + * Returns pointer on success, > > > > > > + * Error code if item not found, or out-of-bounds access > > > > > > + */ > > > > > > +static inline struct cgroup *fetch_arraymap_ptr(struct bpf_map > > > > > > *map, int idx) > > > > > > +{ > > > > > > + struct cgroup *cgrp; > > > > > > + struct bpf_array *array = container_of(map, struct bpf_array, > > > > > > map); > > > > > > + > > > > > > + if (unlikely(idx >= array->map.max_entries)) > > > > > > + return ERR_PTR(-E2BIG); > > > > > > + > > > > > > + cgrp = READ_ONCE(array->ptrs[idx]); > > > > > > + if (unlikely(!cgrp)) > > > > > > + return ERR_PTR(-EAGAIN); > > > > > > + > > > > > > + return cgrp; > > > > > > +} > > > > > > +#endif /* CONFIG_CGROUPS */ > > > > > > + > > > > > > #endif /* _LINUX_BPF_H */ > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > > > index da218fe..64b1a07 100644 > > > > > 
> --- a/include/uapi/linux/bpf.h > > > > > > +++ b/include/uapi/linux/bpf.h > > > > > > @@ -375,6 +375,17 @@ enum bpf_func_id { > > > > > > */ > > > > > > BPF_FUNC_probe_write_user, > > > > > > > > > > > > + /** > > > > > > +* bpf_current_task_in_cgroup(map, index) - Check cgroup2 > > > > > > membership of current task > > > > > > +* @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type > > > > > > +* @index: index of the cgroup in the bpf_map > > > > > > +* Return: > > > > > > +* == 0 current failed the cgroup2 descendant test > > > > > > +* == 1 current succeeded the cgroup2 descendant test > > > > > > +*< 0 error > > > > > > +*/ > > > > > > + BPF_FUNC_current_task_in_cgroup, > > > > > > + > > > > > > __BPF_FUNC_MAX_ID, > > > > > > }; > > > > > > > > > > > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > > > > > > index 633a650..a2ac051 100644 > > > > > > --- a/kernel/bpf/arraymap.c > > > > > > +++ b/kernel/bpf/arraymap.c > > > > > > @@ -538,7 +538,7 @@ static int __init > > > > > > register_perf_event_array_map(void) > > > > > > } > > > > > > late_initcall(register_perf_event_array_map); > > > > > > > > > > > > -#ifdef CONFIG_SOCK_CGROUP_DATA > > > > > > +#ifdef CONFIG_CGROUPS > > > > > > static void *cgroup_fd_array_get_ptr(struct bpf_map *map, > > > > > > struct file *map_file /* not used > > > > > > */, > > > > > > int fd) > > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > > > > > index 7094c69..80efab8 100644 > > > > > >
Re: [net-next v2 v2 1/2] bpf: Add bpf_current_task_in_cgroup helper
On Tue, Aug 09, 2016 at 06:26:37PM -0700, Sargun Dhillon wrote: > On Tue, Aug 09, 2016 at 06:02:34PM -0700, Alexei Starovoitov wrote: > > On Tue, Aug 09, 2016 at 05:55:26PM -0700, Sargun Dhillon wrote: > > > On Tue, Aug 09, 2016 at 05:23:50PM -0700, Alexei Starovoitov wrote: > > > > On Tue, Aug 09, 2016 at 05:00:12PM -0700, Sargun Dhillon wrote: > > > > > This adds a bpf helper that's similar to the skb_in_cgroup helper to > > > > > check > > > > > whether the probe is currently executing in the context of a specific > > > > > subset of the cgroupsv2 hierarchy. It does this based on membership > > > > > test > > > > > for a cgroup arraymap. It is invalid to call this in an interrupt, and > > > > > it'll return an error. The helper is primarily to be used in debugging > > > > > activities for containers, where you may have multiple programs > > > > > running in > > > > > a given top-level "container". > > > > > > > > > > This patch also genericizes some of the arraymap fetching logic > > > > > between the > > > > > skb_in_cgroup helper and this new helper. 
> > > > > > > > > > Signed-off-by: Sargun Dhillon > > > > > Cc: Alexei Starovoitov > > > > > Cc: Daniel Borkmann > > > > > --- > > > > > include/linux/bpf.h | 24 > > > > > include/uapi/linux/bpf.h | 11 +++ > > > > > kernel/bpf/arraymap.c| 2 +- > > > > > kernel/bpf/verifier.c| 4 +++- > > > > > kernel/trace/bpf_trace.c | 34 ++ > > > > > net/core/filter.c| 11 --- > > > > > 6 files changed, 77 insertions(+), 9 deletions(-) > > > > > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > > > index 1113423..9adf712 100644 > > > > > --- a/include/linux/bpf.h > > > > > +++ b/include/linux/bpf.h > > > > > @@ -319,4 +319,28 @@ extern const struct bpf_func_proto > > > > > bpf_get_stackid_proto; > > > > > void bpf_user_rnd_init_once(void); > > > > > u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); > > > > > > > > > > +#ifdef CONFIG_CGROUPS > > > > > +/* Helper to fetch a cgroup pointer based on index. > > > > > + * @map: a cgroup arraymap > > > > > + * @idx: index of the item you want to fetch > > > > > + * > > > > > + * Returns pointer on success, > > > > > + * Error code if item not found, or out-of-bounds access > > > > > + */ > > > > > +static inline struct cgroup *fetch_arraymap_ptr(struct bpf_map *map, > > > > > int idx) > > > > > +{ > > > > > + struct cgroup *cgrp; > > > > > + struct bpf_array *array = container_of(map, struct bpf_array, > > > > > map); > > > > > + > > > > > + if (unlikely(idx >= array->map.max_entries)) > > > > > + return ERR_PTR(-E2BIG); > > > > > + > > > > > + cgrp = READ_ONCE(array->ptrs[idx]); > > > > > + if (unlikely(!cgrp)) > > > > > + return ERR_PTR(-EAGAIN); > > > > > + > > > > > + return cgrp; > > > > > +} > > > > > +#endif /* CONFIG_CGROUPS */ > > > > > + > > > > > #endif /* _LINUX_BPF_H */ > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > > index da218fe..64b1a07 100644 > > > > > --- a/include/uapi/linux/bpf.h > > > > > +++ b/include/uapi/linux/bpf.h > > > > > @@ -375,6 +375,17 @@ 
enum bpf_func_id { > > > > >*/ > > > > > BPF_FUNC_probe_write_user, > > > > > > > > > > + /** > > > > > + * bpf_current_task_in_cgroup(map, index) - Check cgroup2 > > > > > membership of current task > > > > > + * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type > > > > > + * @index: index of the cgroup in the bpf_map > > > > > + * Return: > > > > > + * == 0 current failed the cgroup2 descendant test > > > > > + * == 1 current succeeded the cgroup2 descendant test > > > > > + *< 0 error > > > > > + */ > > > > > + BPF_FUNC_current_task_in_cgroup, > > > > > + > > > > > __BPF_FUNC_MAX_ID, > > > > > }; > > > > > > > > > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > > > > > index 633a650..a2ac051 100644 > > > > > --- a/kernel/bpf/arraymap.c > > > > > +++ b/kernel/bpf/arraymap.c > > > > > @@ -538,7 +538,7 @@ static int __init > > > > > register_perf_event_array_map(void) > > > > > } > > > > > late_initcall(register_perf_event_array_map); > > > > > > > > > > -#ifdef CONFIG_SOCK_CGROUP_DATA > > > > > +#ifdef CONFIG_CGROUPS > > > > > static void *cgroup_fd_array_get_ptr(struct bpf_map *map, > > > > >struct file *map_file /* not used > > > > > */, > > > > >int fd) > > > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > > > > index 7094c69..80efab8 100644 > > > > > --- a/kernel/bpf/verifier.c > > > > > +++ b/kernel/bpf/verifier.c > > > > > @@ -1053,7 +1053,8 @@ static int check_map_func_compatibility(struct > > > > > bpf_map *map, int func_id) > > > > > goto error; > > > > > break; > > > > > case
Re: [PATCH] [v7] net: emac: emac gigabit ethernet controller driver
On 08/09/2016 06:09 PM, Timur Tabi wrote:
> Florian Fainelli wrote:
>
>> nr_frags can't be bigger than MAX_SKB_FRAGS, hence these checks all
>> other drivers do against 1 + MAX_SKB_FRAGS.
>
> Doh, I just realized something. emac_mac_tx_buf_send() just needs to
> make sure that there's enough room for ONE skb. For some reason I
> thought it had to make sure there's enough room for multiple SKBs.
>
> Now it makes a lot more sense. Thank you.
>
> So it looks like a given SKB can occupy 3 + nr_frags descriptors. So I
> need to change that line to:
>
>     if (emac_tpd_num_free_descs(tx_q) < (MAX_SKB_FRAGS + 3))
>         netif_stop_queue(adpt->netdev);
>
> Question, some drivers do <= instead of just <, like this:
>
>     if (ring->free_bds <= (MAX_SKB_FRAGS + 1))
>         netif_tx_stop_queue(txq);
>
> Is it necessary to stop the queue if there exactly enough descriptors to
> hold an SKB? Shouldn't the above be this instead:
>
>     if (ring->free_bds < (MAX_SKB_FRAGS + 1))
>         netif_tx_stop_queue(txq);

Hmm, it kind of depends, but I would err on the side of strictly less
than, since that still allows a fully fragmented SKB to make it through
on the next xmit call (I know the code you quote does exactly the
opposite).

>
>
>>> However, I'm confused about one thing. Almost every other driver just
>>> sets "netdev->mtu = new_mtu" and does nothing else. I can't find any
>>> other driver that actually stops the RX path, reprograms the hardware,
>>> and then restarts the RX path. I know this is a stupid question, but
>>> why is my driver doing that?
>>
>> Most drivers allocate RX buffer sizes that are usually bigger than the
>> MTU, but would probably silently fail or expose transient behavior once
>> the MTU changes to greater than the size pre-defined.
>
> So it looks like the real problem is a race condition between
>
>     adpt->rxbuf_size = new_mtu > EMAC_DEF_RX_BUF_SIZE ?
>         ALIGN(max_frame, 8) : EMAC_DEF_RX_BUF_SIZE;
>
> and
>
>     if (netif_running(netdev))
>         return emac_reinit_locked(adpt);
>
>
> That is, if the interface is running, I set rxbuf_size. If suddenly I
> receive some packets, then the driver will use the wrong buffer size.

Correct, and possibly other HW settings that you may have to program to
tell the MAC what the maximum packet length should be.

>
> Is there an easy way for me to stop the RX path before I set rxbuf_size?
> Some netif_xxx function I can call?

napi_disable() should take care of that.
--
Florian
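
The < versus <= question above comes down to what happens when the ring has
exactly enough free descriptors for one worst-case SKB. A minimal userspace
sketch of the two policies follows; the constant values and function names
are illustrative assumptions, not taken from the emac driver.

```c
#include <stdbool.h>

/* Illustrative values only: the real MAX_SKB_FRAGS depends on the kernel
 * configuration, and the "+ 3" overhead is the figure quoted in the thread.
 */
#define SKETCH_MAX_SKB_FRAGS 17u
#define SKETCH_DESC_OVERHEAD 3u

/* Descriptors needed by one maximally fragmented SKB. */
static unsigned int worst_case_descs(void)
{
	return SKETCH_MAX_SKB_FRAGS + SKETCH_DESC_OVERHEAD;
}

/* Strict policy: stop only when a worst-case SKB would no longer fit. */
static bool should_stop_strict(unsigned int free_descs)
{
	return free_descs < worst_case_descs();
}

/* Inclusive policy: also stop when a worst-case SKB would fit exactly. */
static bool should_stop_inclusive(unsigned int free_descs)
{
	return free_descs <= worst_case_descs();
}
```

The two functions differ only at the boundary: with exactly 20 free
descriptors in this sketch, the strict check keeps the queue running so one
more fully fragmented SKB can be sent, while the inclusive check stops it.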
Re: [net-next v2 v2 1/2] bpf: Add bpf_current_task_in_cgroup helper
On Tue, Aug 09, 2016 at 06:02:34PM -0700, Alexei Starovoitov wrote:
> On Tue, Aug 09, 2016 at 05:55:26PM -0700, Sargun Dhillon wrote:
> > On Tue, Aug 09, 2016 at 05:23:50PM -0700, Alexei Starovoitov wrote:
> > > On Tue, Aug 09, 2016 at 05:00:12PM -0700, Sargun Dhillon wrote:
> > > > This adds a bpf helper that's similar to the skb_in_cgroup helper to check
> > > > whether the probe is currently executing in the context of a specific
> > > > subset of the cgroupsv2 hierarchy. It does this based on membership test
> > > > for a cgroup arraymap. It is invalid to call this in an interrupt, and
> > > > it'll return an error. The helper is primarily to be used in debugging
> > > > activities for containers, where you may have multiple programs running in
> > > > a given top-level "container".
> > > >
> > > > This patch also genericizes some of the arraymap fetching logic between the
> > > > skb_in_cgroup helper and this new helper.
> > > >
> > > > Signed-off-by: Sargun Dhillon
> > > > Cc: Alexei Starovoitov
> > > > Cc: Daniel Borkmann
> > > > ---
> > > >  include/linux/bpf.h      | 24
> > > >  include/uapi/linux/bpf.h | 11 +++
> > > >  kernel/bpf/arraymap.c    |  2 +-
> > > >  kernel/bpf/verifier.c    |  4 +++-
> > > >  kernel/trace/bpf_trace.c | 34 ++
> > > >  net/core/filter.c        | 11 ---
> > > >  6 files changed, 77 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > > > index 1113423..9adf712 100644
> > > > --- a/include/linux/bpf.h
> > > > +++ b/include/linux/bpf.h
> > > > @@ -319,4 +319,28 @@ extern const struct bpf_func_proto bpf_get_stackid_proto;
> > > >  void bpf_user_rnd_init_once(void);
> > > >  u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
> > > >
> > > > +#ifdef CONFIG_CGROUPS
> > > > +/* Helper to fetch a cgroup pointer based on index.
> > > > + * @map: a cgroup arraymap
> > > > + * @idx: index of the item you want to fetch
> > > > + *
> > > > + * Returns pointer on success,
> > > > + * Error code if item not found, or out-of-bounds access
> > > > + */
> > > > +static inline struct cgroup *fetch_arraymap_ptr(struct bpf_map *map, int idx)
> > > > +{
> > > > +	struct cgroup *cgrp;
> > > > +	struct bpf_array *array = container_of(map, struct bpf_array, map);
> > > > +
> > > > +	if (unlikely(idx >= array->map.max_entries))
> > > > +		return ERR_PTR(-E2BIG);
> > > > +
> > > > +	cgrp = READ_ONCE(array->ptrs[idx]);
> > > > +	if (unlikely(!cgrp))
> > > > +		return ERR_PTR(-EAGAIN);
> > > > +
> > > > +	return cgrp;
> > > > +}
> > > > +#endif /* CONFIG_CGROUPS */
> > > > +
> > > >  #endif /* _LINUX_BPF_H */
> > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> > > > index da218fe..64b1a07 100644
> > > > --- a/include/uapi/linux/bpf.h
> > > > +++ b/include/uapi/linux/bpf.h
> > > > @@ -375,6 +375,17 @@ enum bpf_func_id {
> > > >  	 */
> > > >  	BPF_FUNC_probe_write_user,
> > > >
> > > > +	/**
> > > > +	 * bpf_current_task_in_cgroup(map, index) - Check cgroup2 membership of current task
> > > > +	 * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type
> > > > +	 * @index: index of the cgroup in the bpf_map
> > > > +	 * Return:
> > > > +	 * == 0 current failed the cgroup2 descendant test
> > > > +	 * == 1 current succeeded the cgroup2 descendant test
> > > > +	 *  < 0 error
> > > > +	 */
> > > > +	BPF_FUNC_current_task_in_cgroup,
> > > > +
> > > >  	__BPF_FUNC_MAX_ID,
> > > >  };
> > > >
> > > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> > > > index 633a650..a2ac051 100644
> > > > --- a/kernel/bpf/arraymap.c
> > > > +++ b/kernel/bpf/arraymap.c
> > > > @@ -538,7 +538,7 @@ static int __init register_perf_event_array_map(void)
> > > >  }
> > > >  late_initcall(register_perf_event_array_map);
> > > >
> > > > -#ifdef CONFIG_SOCK_CGROUP_DATA
> > > > +#ifdef CONFIG_CGROUPS
> > > >  static void *cgroup_fd_array_get_ptr(struct bpf_map *map,
> > > >  				     struct file *map_file /* not used */,
> > > >  				     int fd)
> > > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > > > index 7094c69..80efab8 100644
> > > > --- a/kernel/bpf/verifier.c
> > > > +++ b/kernel/bpf/verifier.c
> > > > @@ -1053,7 +1053,8 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
> > > >  			goto error;
> > > >  		break;
> > > >  	case BPF_MAP_TYPE_CGROUP_ARRAY:
> > > > -		if (func_id != BPF_FUNC_skb_in_cgroup)
> > > > +		if (func_id != BPF_FUNC_skb_in_cgroup &&
> > > > +		    func_id != BPF_FUNC_current_task_in_cgroup)
> > > >
Re: [PATCH] bonding: Allow tun-interfaces as slaves
On 2016/8/10 7:51, Jay Vosburgh wrote: > Jörn Engel wrote: > >> On Tue, Aug 09, 2016 at 12:06:36PM -0700, David Miller wrote: On Tue, Aug 09, 2016 at 09:28:45PM +0800, Ding Tianhong wrote: Simply not checking errors when setting the mac address solves the problem for me. No new features needed. >>> >>> But it only works in certain modes. >>> >>> So the best we can do is enforce the MAC address setting in the >>> modes that absolutely require it. We cannot ignore the MAC >>> address setting unilaterally. >> >> Something like this? >> >> [PATCH] bonding: Allow tun-interfaces as slaves in balance-rr mode >> >> Up until 00503b6f702e (part of 3.14-rc1), the bonding driver could be >> used to enslave tun-interfaces. 00503b6f702e broke that behaviour, >> afaics as an unintended side-effect. >> >> For the purpose of bond-over-tun in balance-rr mode, simply ignoring the >> error from dev_set_mac_address() is good enough. >> >> Signed-off-by: Joern Engel >> --- >> drivers/net/bonding/bond_main.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/drivers/net/bonding/bond_main.c >> b/drivers/net/bonding/bond_main.c >> index 1f276fa30ba6..2f686bfe4304 100644 >> --- a/drivers/net/bonding/bond_main.c >> +++ b/drivers/net/bonding/bond_main.c >> @@ -1490,7 +1490,8 @@ int bond_enslave(struct net_device *bond_dev, struct >> net_device *slave_dev) >> memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len); >> addr.sa_family = slave_dev->type; >> res = dev_set_mac_address(slave_dev, &addr); >> -if (res) { >> +/* round-robin mode works fine without a mac address */ >> +if (res && BOND_MODE(bond) != BOND_MODE_ROUNDROBIN) { > > This will cause balance-rr to add the slave to the bond if any > device's dev_set_mac_address call fails. 
> > If a bond of regular Ethernet devices is connected to a static > link aggregation (Etherchannel channel group), a set_mac failure would > result in that slave having a different MAC address than the bond, which > in turn would cause traffic inbound from the switch to that slave to be > dropped (as the destination MAC would not pass the device MAC filters). > > The failure check for the set_mac call serves a legitimate > purpose, and I don't believe we should bypass it without making the > bypass an option that is explicitly enabled for those special cases that > need it. > > E.g., something like the following (which I have not tested); > this would also need documentation and iproute2 updates to go with it. > This would be enabled with "fail_over_mac=keepmac". > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index 1f276fa30ba6..d2283fc23b16 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -1483,7 +1483,8 @@ int bond_enslave(struct net_device *bond_dev, struct > net_device *slave_dev) > ether_addr_copy(new_slave->perm_hwaddr, slave_dev->dev_addr); > > if (!bond->params.fail_over_mac || > - BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) { > + (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP && > + bond->params.fail_over_mac != BOND_FOM_KEEPMAC)) { > /* Set slave to master's mac address. 
The application already
>          * set the master's mac address to that of the first slave
>          */
> diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
> index 577e57cad1dc..f9653fe4d622 100644
> --- a/drivers/net/bonding/bond_options.c
> +++ b/drivers/net/bonding/bond_options.c
> @@ -125,6 +125,7 @@ static const struct bond_opt_value bond_fail_over_mac_tbl[] = {
>         { "none", BOND_FOM_NONE, BOND_VALFLAG_DEFAULT},
>         { "active", BOND_FOM_ACTIVE, 0},
>         { "follow", BOND_FOM_FOLLOW, 0},
> +       { "keepmac", BOND_FOM_KEEPMAC, 0},
>         { NULL, -1, 0},
>  };
>
> diff --git a/include/net/bonding.h b/include/net/bonding.h
> index 6360c259da6d..ec3442b3aa83 100644
> --- a/include/net/bonding.h
> +++ b/include/net/bonding.h
> @@ -420,6 +420,7 @@ static inline bool bond_slave_can_tx(struct slave *slave)
>  #define BOND_FOM_NONE 0
>  #define BOND_FOM_ACTIVE 1
>  #define BOND_FOM_FOLLOW 2
> +#define BOND_FOM_KEEPMAC 3
>
>  #define BOND_ARP_TARGETS_ANY 0
>  #define BOND_ARP_TARGETS_ALL 1
>
>
> -J

Hi Jay,

This looks like the best solution so far. The user needs to ensure that the slave no longer needs the same MAC address, and there is no need to check ndo_set_mac_address. It probably needs more thought later, but let us fix it first.

Ding

> ---
> -Jay Vosburgh, jay.vosbu...@canonical.com
>
> .
>
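The two proposals in this thread differ only in when a failed dev_set_mac_address() is allowed to pass. A minimal sketch of both policies follows. The enum and function names are hypothetical stand-ins, not the bonding driver's own identifiers, and only the boolean conditions are taken from the two diffs quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's mode and fail_over_mac values. */
enum demo_mode { DEMO_MODE_ROUNDROBIN, DEMO_MODE_ACTIVEBACKUP, DEMO_MODE_8023AD };
enum demo_fom { DEMO_FOM_NONE, DEMO_FOM_ACTIVE, DEMO_FOM_FOLLOW, DEMO_FOM_KEEPMAC };

/* Mode-based policy (first patch): a set_mac error (res != 0) aborts
 * enslaving in every mode except balance-rr. */
bool enslave_fails_mode_based(int res, enum demo_mode mode)
{
    return res != 0 && mode != DEMO_MODE_ROUNDROBIN;
}

/* Opt-in policy (second patch): decide whether the slave's MAC is set at
 * all. With "keepmac" the call is skipped, so it can never fail. */
bool should_set_slave_mac(enum demo_mode mode, enum demo_fom fom)
{
    return fom == DEMO_FOM_NONE ||
           (mode != DEMO_MODE_ACTIVEBACKUP && fom != DEMO_FOM_KEEPMAC);
}

/* Under the opt-in policy a set_mac error is fatal whenever the call
 * was attempted. */
bool enslave_fails_opt_in(int res, enum demo_mode mode, enum demo_fom fom)
{
    return should_set_slave_mac(mode, fom) && res != 0;
}
```

With the opt-in variant, a balance-rr bond of ordinary Ethernet devices still rejects a slave whose MAC cannot be set, which is the case Jay is worried about; only a bond explicitly configured with the proposed keepmac value skips the call.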
Re: [PATCH] [v7] net: emac: emac gigabit ethernet controller driver
Florian Fainelli wrote:

nr_frags can't be bigger than MAX_SKB_FRAGS, hence these checks all other drivers do against 1 + MAX_SKB_FRAGS.

Doh, I just realized something. emac_mac_tx_buf_send() just needs to make sure that there's enough room for ONE skb. For some reason I thought it had to make sure there's enough room for multiple SKBs. Now it makes a lot more sense. Thank you.

So it looks like a given SKB can occupy 3 + nr_frags descriptors. So I need to change that line to:

    if (emac_tpd_num_free_descs(tx_q) < (MAX_SKB_FRAGS + 3))
        netif_stop_queue(adpt->netdev);

Question, some drivers do <= instead of just <, like this:

    if (ring->free_bds <= (MAX_SKB_FRAGS + 1))
        netif_tx_stop_queue(txq);

Is it necessary to stop the queue if there are exactly enough descriptors to hold an SKB? Shouldn't the above be this instead:

    if (ring->free_bds < (MAX_SKB_FRAGS + 1))
        netif_tx_stop_queue(txq);

However, I'm confused about one thing. Almost every other driver just sets "netdev->mtu = new_mtu" and does nothing else. I can't find any other driver that actually stops the RX path, reprograms the hardware, and then restarts the RX path. I know this is a stupid question, but why is my driver doing that?

Most drivers allocate RX buffer sizes that are usually bigger than the MTU, but would probably silently fail or expose transient behavior once the MTU changes to greater than the size pre-defined.

So it looks like the real problem is a race condition between

    adpt->rxbuf_size = new_mtu > EMAC_DEF_RX_BUF_SIZE ?
        ALIGN(max_frame, 8) : EMAC_DEF_RX_BUF_SIZE;

and

    if (netif_running(netdev))
        return emac_reinit_locked(adpt);

That is, if the interface is running, I set rxbuf_size. If suddenly I receive some packets, then the driver will use the wrong buffer size. Is there an easy way for me to stop the RX path before I set rxbuf_size? Some netif_xxx function I can call?

--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
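The "<" versus "<=" question above can be checked with plain arithmetic. The sketch below assumes MAX_SKB_FRAGS is 17 (a common value with 4 KiB pages, but an assumption here) and takes the "3 + nr_frags" worst case from the message itself. The function names are made up for illustration and are not part of any driver.

```c
#include <stdbool.h>

#define DEMO_MAX_SKB_FRAGS 17U                          /* assumption: 4 KiB pages */
#define DEMO_WORST_CASE_DESCS (DEMO_MAX_SKB_FRAGS + 3U) /* "3 + nr_frags" above */

/* Strict check: stop only when a worst-case skb would no longer fit. */
bool must_stop_queue(unsigned int free_descs)
{
    return free_descs < DEMO_WORST_CASE_DESCS;
}

/* Conservative check: also stop when a worst-case skb would fit exactly. */
bool must_stop_queue_conservative(unsigned int free_descs)
{
    return free_descs <= DEMO_WORST_CASE_DESCS;
}
```

With exactly 20 free descriptors the strict check keeps the queue running, and one more worst-case skb still fits. The conservative check stops one descriptor earlier. Whether the drivers that use "<=" rely on that extra slot or simply keep a margin cannot be told from this thread.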
Re: [net-next v2 v2 1/2] bpf: Add bpf_current_task_in_cgroup helper
On Tue, Aug 09, 2016 at 05:55:26PM -0700, Sargun Dhillon wrote: > On Tue, Aug 09, 2016 at 05:23:50PM -0700, Alexei Starovoitov wrote: > > On Tue, Aug 09, 2016 at 05:00:12PM -0700, Sargun Dhillon wrote: > > > This adds a bpf helper that's similar to the skb_in_cgroup helper to check > > > whether the probe is currently executing in the context of a specific > > > subset of the cgroupsv2 hierarchy. It does this based on membership test > > > for a cgroup arraymap. It is invalid to call this in an interrupt, and > > > it'll return an error. The helper is primarily to be used in debugging > > > activities for containers, where you may have multiple programs running in > > > a given top-level "container". > > > > > > This patch also genericizes some of the arraymap fetching logic between > > > the > > > skb_in_cgroup helper and this new helper. > > > > > > Signed-off-by: Sargun Dhillon > > > Cc: Alexei Starovoitov > > > Cc: Daniel Borkmann > > > --- > > > include/linux/bpf.h | 24 > > > include/uapi/linux/bpf.h | 11 +++ > > > kernel/bpf/arraymap.c| 2 +- > > > kernel/bpf/verifier.c| 4 +++- > > > kernel/trace/bpf_trace.c | 34 ++ > > > net/core/filter.c| 11 --- > > > 6 files changed, 77 insertions(+), 9 deletions(-) > > > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > > index 1113423..9adf712 100644 > > > --- a/include/linux/bpf.h > > > +++ b/include/linux/bpf.h > > > @@ -319,4 +319,28 @@ extern const struct bpf_func_proto > > > bpf_get_stackid_proto; > > > void bpf_user_rnd_init_once(void); > > > u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); > > > > > > +#ifdef CONFIG_CGROUPS > > > +/* Helper to fetch a cgroup pointer based on index. 
> > > + * @map: a cgroup arraymap > > > + * @idx: index of the item you want to fetch > > > + * > > > + * Returns pointer on success, > > > + * Error code if item not found, or out-of-bounds access > > > + */ > > > +static inline struct cgroup *fetch_arraymap_ptr(struct bpf_map *map, int > > > idx) > > > +{ > > > + struct cgroup *cgrp; > > > + struct bpf_array *array = container_of(map, struct bpf_array, map); > > > + > > > + if (unlikely(idx >= array->map.max_entries)) > > > + return ERR_PTR(-E2BIG); > > > + > > > + cgrp = READ_ONCE(array->ptrs[idx]); > > > + if (unlikely(!cgrp)) > > > + return ERR_PTR(-EAGAIN); > > > + > > > + return cgrp; > > > +} > > > +#endif /* CONFIG_CGROUPS */ > > > + > > > #endif /* _LINUX_BPF_H */ > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > index da218fe..64b1a07 100644 > > > --- a/include/uapi/linux/bpf.h > > > +++ b/include/uapi/linux/bpf.h > > > @@ -375,6 +375,17 @@ enum bpf_func_id { > > >*/ > > > BPF_FUNC_probe_write_user, > > > > > > + /** > > > + * bpf_current_task_in_cgroup(map, index) - Check cgroup2 membership of > > > current task > > > + * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type > > > + * @index: index of the cgroup in the bpf_map > > > + * Return: > > > + * == 0 current failed the cgroup2 descendant test > > > + * == 1 current succeeded the cgroup2 descendant test > > > + *< 0 error > > > + */ > > > + BPF_FUNC_current_task_in_cgroup, > > > + > > > __BPF_FUNC_MAX_ID, > > > }; > > > > > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > > > index 633a650..a2ac051 100644 > > > --- a/kernel/bpf/arraymap.c > > > +++ b/kernel/bpf/arraymap.c > > > @@ -538,7 +538,7 @@ static int __init register_perf_event_array_map(void) > > > } > > > late_initcall(register_perf_event_array_map); > > > > > > -#ifdef CONFIG_SOCK_CGROUP_DATA > > > +#ifdef CONFIG_CGROUPS > > > static void *cgroup_fd_array_get_ptr(struct bpf_map *map, > > >struct file *map_file /* not used */, > > >int fd) > 
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > > index 7094c69..80efab8 100644 > > > --- a/kernel/bpf/verifier.c > > > +++ b/kernel/bpf/verifier.c > > > @@ -1053,7 +1053,8 @@ static int check_map_func_compatibility(struct > > > bpf_map *map, int func_id) > > > goto error; > > > break; > > > case BPF_MAP_TYPE_CGROUP_ARRAY: > > > - if (func_id != BPF_FUNC_skb_in_cgroup) > > > + if (func_id != BPF_FUNC_skb_in_cgroup && > > > + func_id != BPF_FUNC_current_task_in_cgroup) > > > goto error; > > > break; > > > default: > > > @@ -1075,6 +1076,7 @@ static int check_map_func_compatibility(struct > > > bpf_map *map, int func_id) > > > if (map->map_type != BPF_MAP_TYPE_STACK_TRACE) > > > goto error; > > > break; > > > + case BPF_FUNC_current_task_in_cgroup: > > > case BPF_FUNC_skb_in_cgroup: > > > if (map->map_type != BPF_MAP_TYPE_CGROUP_ARRAY) > > > goto error; > > > diff --git a/kernel/trace/bpf_tr
Re: [net-next v2 v2 1/2] bpf: Add bpf_current_task_in_cgroup helper
On Tue, Aug 09, 2016 at 05:23:50PM -0700, Alexei Starovoitov wrote: > On Tue, Aug 09, 2016 at 05:00:12PM -0700, Sargun Dhillon wrote: > > This adds a bpf helper that's similar to the skb_in_cgroup helper to check > > whether the probe is currently executing in the context of a specific > > subset of the cgroupsv2 hierarchy. It does this based on membership test > > for a cgroup arraymap. It is invalid to call this in an interrupt, and > > it'll return an error. The helper is primarily to be used in debugging > > activities for containers, where you may have multiple programs running in > > a given top-level "container". > > > > This patch also genericizes some of the arraymap fetching logic between the > > skb_in_cgroup helper and this new helper. > > > > Signed-off-by: Sargun Dhillon > > Cc: Alexei Starovoitov > > Cc: Daniel Borkmann > > --- > > include/linux/bpf.h | 24 > > include/uapi/linux/bpf.h | 11 +++ > > kernel/bpf/arraymap.c| 2 +- > > kernel/bpf/verifier.c| 4 +++- > > kernel/trace/bpf_trace.c | 34 ++ > > net/core/filter.c| 11 --- > > 6 files changed, 77 insertions(+), 9 deletions(-) > > > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > > index 1113423..9adf712 100644 > > --- a/include/linux/bpf.h > > +++ b/include/linux/bpf.h > > @@ -319,4 +319,28 @@ extern const struct bpf_func_proto > > bpf_get_stackid_proto; > > void bpf_user_rnd_init_once(void); > > u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); > > > > +#ifdef CONFIG_CGROUPS > > +/* Helper to fetch a cgroup pointer based on index. 
> > + * @map: a cgroup arraymap > > + * @idx: index of the item you want to fetch > > + * > > + * Returns pointer on success, > > + * Error code if item not found, or out-of-bounds access > > + */ > > +static inline struct cgroup *fetch_arraymap_ptr(struct bpf_map *map, int > > idx) > > +{ > > + struct cgroup *cgrp; > > + struct bpf_array *array = container_of(map, struct bpf_array, map); > > + > > + if (unlikely(idx >= array->map.max_entries)) > > + return ERR_PTR(-E2BIG); > > + > > + cgrp = READ_ONCE(array->ptrs[idx]); > > + if (unlikely(!cgrp)) > > + return ERR_PTR(-EAGAIN); > > + > > + return cgrp; > > +} > > +#endif /* CONFIG_CGROUPS */ > > + > > #endif /* _LINUX_BPF_H */ > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index da218fe..64b1a07 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -375,6 +375,17 @@ enum bpf_func_id { > > */ > > BPF_FUNC_probe_write_user, > > > > + /** > > +* bpf_current_task_in_cgroup(map, index) - Check cgroup2 membership of > > current task > > +* @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type > > +* @index: index of the cgroup in the bpf_map > > +* Return: > > +* == 0 current failed the cgroup2 descendant test > > +* == 1 current succeeded the cgroup2 descendant test > > +*< 0 error > > +*/ > > + BPF_FUNC_current_task_in_cgroup, > > + > > __BPF_FUNC_MAX_ID, > > }; > > > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > > index 633a650..a2ac051 100644 > > --- a/kernel/bpf/arraymap.c > > +++ b/kernel/bpf/arraymap.c > > @@ -538,7 +538,7 @@ static int __init register_perf_event_array_map(void) > > } > > late_initcall(register_perf_event_array_map); > > > > -#ifdef CONFIG_SOCK_CGROUP_DATA > > +#ifdef CONFIG_CGROUPS > > static void *cgroup_fd_array_get_ptr(struct bpf_map *map, > > struct file *map_file /* not used */, > > int fd) > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > > index 7094c69..80efab8 100644 > > --- 
a/kernel/bpf/verifier.c > > +++ b/kernel/bpf/verifier.c > > @@ -1053,7 +1053,8 @@ static int check_map_func_compatibility(struct > > bpf_map *map, int func_id) > > goto error; > > break; > > case BPF_MAP_TYPE_CGROUP_ARRAY: > > - if (func_id != BPF_FUNC_skb_in_cgroup) > > + if (func_id != BPF_FUNC_skb_in_cgroup && > > + func_id != BPF_FUNC_current_task_in_cgroup) > > goto error; > > break; > > default: > > @@ -1075,6 +1076,7 @@ static int check_map_func_compatibility(struct > > bpf_map *map, int func_id) > > if (map->map_type != BPF_MAP_TYPE_STACK_TRACE) > > goto error; > > break; > > + case BPF_FUNC_current_task_in_cgroup: > > case BPF_FUNC_skb_in_cgroup: > > if (map->map_type != BPF_MAP_TYPE_CGROUP_ARRAY) > > goto error; > > diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c > > index b20438f..39f0290 100644 > > --- a/kernel/trace/bpf_trace.c > > +++ b/kernel/trace/bpf_trace.c > > @@ -376,6 +376,36 @@ static const struct bpf_func_proto > > bpf_get_current_task_proto = { >
Re: [net-next v2 v2 2/2] samples/bpf: Add opensnoop example that uses current_task_in_cgroup helper
On Tue, Aug 09, 2016 at 05:00:58PM -0700, Sargun Dhillon wrote: > This example adds the trace_opensnoop BPF sample. This example program > prints all activities of files being opened for all programs in the > provided cgroupsv2 cgroup and it's descendants in the cgroupv2 hierarchy. > > It populate a cgroups arraymap prior to execution in userspace. This means > that the program must be run in the same cgroups namespace as the programs > that are being traced. > > Signed-off-by: Sargun Dhillon > Cc: Alexei Starovoitov > Cc: Daniel Borkmann > --- > samples/bpf/Makefile | 4 +++ > samples/bpf/bpf_helpers.h | 2 ++ > samples/bpf/trace_opensnoop_kern.c | 35 +++ > samples/bpf/trace_opensnoop_user.c | 69 > ++ > 4 files changed, 110 insertions(+) > create mode 100644 samples/bpf/trace_opensnoop_kern.c > create mode 100644 samples/bpf/trace_opensnoop_user.c > > diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile > index 90ebf7d..d9c37a4 100644 > --- a/samples/bpf/Makefile > +++ b/samples/bpf/Makefile > @@ -24,6 +24,7 @@ hostprogs-y += test_overhead > hostprogs-y += test_cgrp2_array_pin > hostprogs-y += xdp1 > hostprogs-y += xdp2 > +hostprogs-y += trace_opensnoop > > test_verifier-objs := test_verifier.o libbpf.o > test_maps-objs := test_maps.o libbpf.o > @@ -49,6 +50,7 @@ test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o > xdp1-objs := bpf_load.o libbpf.o xdp1_user.o > # reuse xdp1 source intentionally > xdp2-objs := bpf_load.o libbpf.o xdp1_user.o > +trace_opensnoop-objs := bpf_load.o libbpf.o trace_opensnoop_user.o > > # Tell kbuild to always build the programs > always := $(hostprogs-y) > @@ -74,6 +76,7 @@ always += parse_varlen.o parse_simple.o parse_ldabs.o > always += test_cgrp2_tc_kern.o > always += xdp1_kern.o > always += xdp2_kern.o > +always += trace_opensnoop_kern.o > > HOSTCFLAGS += -I$(objtree)/usr/include > > @@ -97,6 +100,7 @@ HOSTLOADLIBES_map_perf_test += -lelf -lrt > HOSTLOADLIBES_test_overhead += -lelf -lrt > HOSTLOADLIBES_xdp1 += -lelf > 
HOSTLOADLIBES_xdp2 += -lelf > +HOSTLOADLIBES_trace_opensnoop += -lelf > > # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on > cmdline: > # make samples/bpf/ LLC=~/git/llvm/build/bin/llc > CLANG=~/git/llvm/build/bin/clang > diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h > index 217c8d5..d409cbb 100644 > --- a/samples/bpf/bpf_helpers.h > +++ b/samples/bpf/bpf_helpers.h > @@ -43,6 +43,8 @@ static int (*bpf_get_stackid)(void *ctx, void *map, int > flags) = > (void *) BPF_FUNC_get_stackid; > static int (*bpf_probe_write_user)(void *dst, void *src, int size) = > (void *) BPF_FUNC_probe_write_user; > +static int (*bpf_current_task_in_cgroup)(void *map, int index) = > + (void *) BPF_FUNC_current_task_in_cgroup; > > /* llvm builtin functions that eBPF C program may use to > * emit BPF_LD_ABS and BPF_LD_IND instructions > diff --git a/samples/bpf/trace_opensnoop_kern.c > b/samples/bpf/trace_opensnoop_kern.c > new file mode 100644 > index 000..dade471 > --- /dev/null > +++ b/samples/bpf/trace_opensnoop_kern.c > @@ -0,0 +1,35 @@ > +/* Copyright (c) 2016 Sargun Dhillon > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of version 2 of the GNU General Public > + * License as published by the Free Software Foundation. > + */ > + > +#include > +#include > +#include > +#include "bpf_helpers.h" > + > +struct bpf_map_def SEC("maps") cgroup_map = { > + .type = BPF_MAP_TYPE_CGROUP_ARRAY, > + .key_size = sizeof(u32), > + .value_size = sizeof(u32), > + .max_entries = 1, > +}; > + > +SEC("kprobe/sys_open") > +int bpf_prog1(struct pt_regs *ctx) > +{ > + const char *filename = (char *)PT_REGS_PARM1(ctx); > + char fmt[] = "Opening file: %s\n"; > + > + if (!bpf_current_task_in_cgroup(&cgroup_map, 0)) > + return 0; > + > + bpf_trace_printk(fmt, sizeof(fmt), filename); > + > + return 1; what is the point of return 1 here? Could you also add a bit more meat in here like real opensnoop does? 
Computing sys_open time delta and capturing return code? Then it will be a solid example and test.

> +++ b/samples/bpf/trace_opensnoop_user.c
> @@ -0,0 +1,69 @@
> +#include

license banner pls.

> +       ret = bpf_update_elem(map_fd[0],
> +                             &array_index,
> +                             &cg2_fd, BPF_ANY);

couldn't it be on the same line?

> +       if (ret) {
> +               perror("bpf_update_elem");
> +               return 1;
> +       }
> +
> +       read_trace_pipe();

could you make it into test instead? The examples that have to be ctrl-c are not friendly for automatic testing.
Re: [net-next v2 v2 1/2] bpf: Add bpf_current_task_in_cgroup helper
On Tue, Aug 09, 2016 at 05:00:12PM -0700, Sargun Dhillon wrote: > This adds a bpf helper that's similar to the skb_in_cgroup helper to check > whether the probe is currently executing in the context of a specific > subset of the cgroupsv2 hierarchy. It does this based on membership test > for a cgroup arraymap. It is invalid to call this in an interrupt, and > it'll return an error. The helper is primarily to be used in debugging > activities for containers, where you may have multiple programs running in > a given top-level "container". > > This patch also genericizes some of the arraymap fetching logic between the > skb_in_cgroup helper and this new helper. > > Signed-off-by: Sargun Dhillon > Cc: Alexei Starovoitov > Cc: Daniel Borkmann > --- > include/linux/bpf.h | 24 > include/uapi/linux/bpf.h | 11 +++ > kernel/bpf/arraymap.c| 2 +- > kernel/bpf/verifier.c| 4 +++- > kernel/trace/bpf_trace.c | 34 ++ > net/core/filter.c| 11 --- > 6 files changed, 77 insertions(+), 9 deletions(-) > > diff --git a/include/linux/bpf.h b/include/linux/bpf.h > index 1113423..9adf712 100644 > --- a/include/linux/bpf.h > +++ b/include/linux/bpf.h > @@ -319,4 +319,28 @@ extern const struct bpf_func_proto bpf_get_stackid_proto; > void bpf_user_rnd_init_once(void); > u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); > > +#ifdef CONFIG_CGROUPS > +/* Helper to fetch a cgroup pointer based on index. 
> + * @map: a cgroup arraymap > + * @idx: index of the item you want to fetch > + * > + * Returns pointer on success, > + * Error code if item not found, or out-of-bounds access > + */ > +static inline struct cgroup *fetch_arraymap_ptr(struct bpf_map *map, int idx) > +{ > + struct cgroup *cgrp; > + struct bpf_array *array = container_of(map, struct bpf_array, map); > + > + if (unlikely(idx >= array->map.max_entries)) > + return ERR_PTR(-E2BIG); > + > + cgrp = READ_ONCE(array->ptrs[idx]); > + if (unlikely(!cgrp)) > + return ERR_PTR(-EAGAIN); > + > + return cgrp; > +} > +#endif /* CONFIG_CGROUPS */ > + > #endif /* _LINUX_BPF_H */ > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index da218fe..64b1a07 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -375,6 +375,17 @@ enum bpf_func_id { >*/ > BPF_FUNC_probe_write_user, > > + /** > + * bpf_current_task_in_cgroup(map, index) - Check cgroup2 membership of > current task > + * @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type > + * @index: index of the cgroup in the bpf_map > + * Return: > + * == 0 current failed the cgroup2 descendant test > + * == 1 current succeeded the cgroup2 descendant test > + *< 0 error > + */ > + BPF_FUNC_current_task_in_cgroup, > + > __BPF_FUNC_MAX_ID, > }; > > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c > index 633a650..a2ac051 100644 > --- a/kernel/bpf/arraymap.c > +++ b/kernel/bpf/arraymap.c > @@ -538,7 +538,7 @@ static int __init register_perf_event_array_map(void) > } > late_initcall(register_perf_event_array_map); > > -#ifdef CONFIG_SOCK_CGROUP_DATA > +#ifdef CONFIG_CGROUPS > static void *cgroup_fd_array_get_ptr(struct bpf_map *map, >struct file *map_file /* not used */, >int fd) > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 7094c69..80efab8 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -1053,7 +1053,8 @@ static int check_map_func_compatibility(struct bpf_map > *map, int 
func_id)
>                         goto error;
>                 break;
>         case BPF_MAP_TYPE_CGROUP_ARRAY:
> -               if (func_id != BPF_FUNC_skb_in_cgroup)
> +               if (func_id != BPF_FUNC_skb_in_cgroup &&
> +                   func_id != BPF_FUNC_current_task_in_cgroup)
>                         goto error;
>                 break;
>         default:
> @@ -1075,6 +1076,7 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id)
>                 if (map->map_type != BPF_MAP_TYPE_STACK_TRACE)
>                         goto error;
>                 break;
> +       case BPF_FUNC_current_task_in_cgroup:
>         case BPF_FUNC_skb_in_cgroup:
>                 if (map->map_type != BPF_MAP_TYPE_CGROUP_ARRAY)
>                         goto error;
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index b20438f..39f0290 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -376,6 +376,36 @@ static const struct bpf_func_proto bpf_get_current_task_proto = {
>         .ret_type = RET_INTEGER,
>  };
>
> +#ifdef CONFIG_CGROUPS
> +static u64 bpf_current_task_in_cgroup(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)

please don't introduce #ifdef into .c code. In this case add #else in .h to fetch_arraymap_p
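The review comment is cut off, but it appears to ask for the usual header-stub pattern: the header provides a real inline helper under the config switch and a failing stub in the #else branch, so that callers in .c files need no #ifdef. A userspace sketch of that pattern follows. CONFIG_DEMO_CGROUPS, demo_fetch_cgroup() and lookup_cgroup_id() are hypothetical names, and the error codes in the enabled branch mirror the fetch_arraymap_ptr() helper quoted above.

```c
#include <errno.h>
#include <stddef.h>

#define CONFIG_DEMO_CGROUPS 1 /* hypothetical switch; remove to build the stub */

struct cgroup {
    int id;
};

/* "Header" part: one helper, two implementations. */
#ifdef CONFIG_DEMO_CGROUPS
static inline int demo_fetch_cgroup(struct cgroup **slots, size_t n, size_t idx,
                                    struct cgroup **out)
{
    if (idx >= n)
        return -E2BIG;  /* out-of-bounds index */
    if (slots[idx] == NULL)
        return -EAGAIN; /* slot not populated */
    *out = slots[idx];
    return 0;
}
#else
static inline int demo_fetch_cgroup(struct cgroup **slots, size_t n, size_t idx,
                                    struct cgroup **out)
{
    (void)slots; (void)n; (void)idx; (void)out;
    return -ENOSYS; /* feature compiled out */
}
#endif

/* ".c" part: the caller is identical in both configurations. */
int lookup_cgroup_id(struct cgroup **slots, size_t n, size_t idx)
{
    struct cgroup *cg = NULL;
    int err = demo_fetch_cgroup(slots, n, idx, &cg);

    return err ? err : cg->id;
}
```

The caller compiles unchanged either way; with the switch removed, every lookup simply returns the stub's error.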
Re: [PATCH v6 1/1] rps: Inspect PPTP encapsulated by GRE to get flow hash
On 08/08/2016 10:38 PM, f...@48lvckh6395k16k5.yundunddos.com wrote:
> From: Gao Feng
>
> PPTP is encapsulated in a GRE header whose GRE_VERSION bits must contain one, but the current GRE RPS code requires GRE_VERSION to be zero. So RPS does not work for PPTP traffic.
>
> In my test environment there are four MIPS cores, and all traffic passes through PPTP. As a result, only one core is 100% busy while the other three cores are nearly idle. After this patch, the load is well balanced across the four cores.
>
> Signed-off-by: Gao Feng

Reviewed-by: Philip Prindeville
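To make the version check concrete, here is a small userspace sketch that reads the GRE version bits and, for a PPTP (enhanced GRE, version 1) header, the call ID. The field layout follows RFC 2637. The function and macro names are made up for this example, and which fields the patch actually feeds into the flow hash is not shown in this message, so the call ID is used only as a plausible per-session value.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_GRE_FLAG_KEY 0x2000u     /* K bit in the 16-bit flags word */
#define DEMO_GRE_VERSION_MASK 0x0007u /* low three bits of the flags word */
#define DEMO_GRE_PROTO_PPP 0x880Bu    /* protocol type used by PPTP */

/* Read a big-endian 16-bit value from a byte buffer. */
static uint16_t demo_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Version field of a GRE header: 0 for plain GRE, 1 for PPTP. */
unsigned int gre_version(const uint8_t *hdr)
{
    return demo_be16(hdr) & DEMO_GRE_VERSION_MASK;
}

/* Return the PPTP call ID (0..65535), or -1 if this is not a version 1
 * GRE header carrying PPP with the key field present. */
long pptp_call_id(const uint8_t *hdr, size_t len)
{
    uint16_t flags;

    if (len < 8)
        return -1;
    flags = demo_be16(hdr);
    if ((flags & DEMO_GRE_VERSION_MASK) != 1)
        return -1;
    if (demo_be16(hdr + 2) != DEMO_GRE_PROTO_PPP)
        return -1;
    if (!(flags & DEMO_GRE_FLAG_KEY))
        return -1;
    /* Key field: payload length in bytes 4-5, call ID in bytes 6-7. */
    return demo_be16(hdr + 6);
}
```

A dissector that accepts only version 0 bails out on such packets before reaching any per-flow field, which is consistent with the single busy core described above.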
Re: [PATCHv2 3/4] pci: Determine actual VPD size on first access
On Tue, 2016-08-09 at 11:12 -0700, Alexander Duyck wrote:
>
> The PCI spec is what essentially assumes that there is only one block.
> If I am not mistaken in the case of this device the second block here
> actually contains device configuration data, not actual VPD data. The
> issue here is that the second block is being accessed as VPD when it
> isn't.

Devices do funny things with config space, film at 11. VFIO trying to be the middle man and intercept/interpret things is broken, cannot work, will never work, and will just result in lots and lots of useless code, but I've been singing that song for too long and nobody seems to care...

>
> > # Large item 42 bytes; name 0x2 Identifier String
> > #002d Large item 74 bytes; name 0x10
> > #007a Small item 1 bytes; name 0xf End Tag
> > ---
> > #0c00 Large item 16 bytes; name 0x2 Identifier String
> > #0c13 Large item 234 bytes; name 0x10
> > #0d00 Large item 252 bytes; name 0x11
> > #0dff Small item 0 bytes; name 0xf End Tag
>
> The second block here is driver proprietary setup bits.

Right. They happen to be in VPD on this device. They can be elsewhere on other devices. In between capabilities on some, in vendor caps on others...

>
> > The cxgb3 driver is reading the second bit starting from 0xc00 but since
> > the size is wrongly detected as 0x7c, VFIO blocks access beyond it and the
> > guest driver fails to probe.
> >
> > I also cannot find a clause in the PCI 3.0 spec saying that there must be
> > just a single block, is it there?
> >
> The problem is we need to be able to parse it.

We can parse the standard part for generic stuff like inventory tools or lsvpd, but we shouldn't get in the way of the driver poking at its device.

> The spec defines a
> series of tags that can be used starting at offset 0. That is how we
> are supposed to get around through the VPD data. The problem is we
> can't have more than one end tag and what appears to be happening here
> is that we are defining a second block of data which uses the same
> formatting as VPD but is not VPD.
>
> > What would the correct fix be? Scanning all 32k of VPD is not an option I
> > suppose as this is what this patch is trying to avoid. Thanks.
>
> I am adding the current cxgb3 maintainer and netdev list to the Cc. This
> is something that can probably be addressed via a PCI quirk as what
> needs to happen is that we need to extend the VPD in the case of this
> part in order to include this second block. As long as we can read
> the VPD data all the way out to 0xdff odds are we could probably just
> have the size arbitrarily increased to 0xe00 via the quirk and then
> you would be able to access all of the VPD for the device. We already
> have code making other modifications to drivers/pci/quirks.c for
> several Broadcom devices and probably just need something similar to
> allow extended access in the case of these devices.
> > > > > > > > > This is the device: > > > > > [aik@p81-p9 ~]$ sudo lspci -vvnns 0001:03:00.0 > > 0001:03:00.0 Ethernet controller [0200]: Chelsio Communications Inc T310 > > 10GbE Single Port Adapter [1425:0030] > > Subsystem: IBM Device [1014:038c] > > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- > > Stepping- SERR- FastB2B- DisINTx+ > > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > > > SERR- > Latency: 0 > > Interrupt: pin A routed to IRQ 494 > > Region 0: Memory at 3fe08088 (64-bit, non-prefetchable) > >[size=4K] > > Region 2: Memory at 3fe08000 (64-bit, non-prefetchable) > >[size=8M] > > Region 4: Memory at 3fe080881000 (64-bit, non-prefetchable) > >[size=4K] > > [virtual] Expansion ROM at 3fe08080 [disabled] [size=512K] > > Capabilities: [40] Power Management version 3 > > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA > >PME(D0+,D1-,D2-,D3hot+,D3cold-) > > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > > Capabilities: [48] MSI: Enable- Count=1/32 Maskable- 64bit+ > > Address: Data: > > Capabilities: [58] Express (v2) Endpoint, MSI 00 > > DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s > ><64ns, L1 <1us > > ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- > > DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ > >Unsupported+ > > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ > > MaxPayload 256 bytes, MaxReadReq 512 bytes > > DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- > >TransPend- > > LnkCap: Port #0, Speed 2.5GT/s, Width x8, ASPM L0s L1, Exit > >Latency L0s > > unlimited, L1 unlimited > > ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- > > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- > > E
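The 0x7c size quoted in this thread follows directly from walking the resource tags up to the first End tag. The sketch below is a simplified userspace walker, not the kernel's implementation; vpd_first_block_size() is a made-up name. It relies only on the standard tag encoding: a large resource has bit 7 set and a 16-bit little-endian length, while a small resource keeps its name in bits 6..3 and its length in bits 2..0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_VPD_LARGE_FLAG 0x80u /* bit 7 set: large resource tag */
#define DEMO_VPD_SMALL_END 0x0Fu  /* small resource name of the End tag */

/* Return the number of bytes up to and including the first End tag,
 * or 0 if the buffer ends before an End tag is found. */
size_t vpd_first_block_size(const uint8_t *buf, size_t len)
{
    size_t off = 0;

    assert(buf != NULL);
    while (off < len) {
        uint8_t tag = buf[off];

        if (tag & DEMO_VPD_LARGE_FLAG) {
            size_t dlen;

            if (len - off < 3)
                return 0; /* truncated large-resource header */
            dlen = (size_t)buf[off + 1] | ((size_t)buf[off + 2] << 8);
            off += 3 + dlen;
        } else {
            size_t dlen = tag & 0x07u;
            unsigned int name = (tag >> 3) & 0x0Fu;

            off += 1 + dlen;
            if (name == DEMO_VPD_SMALL_END)
                return off <= len ? off : 0;
        }
    }
    return 0;
}
```

Fed the first block from the dump above (a 42-byte identifier string at offset 0, a 74-byte item with name 0x10 at 0x2d, and a 1-byte End tag at 0x7a), the walker stops at 0x7c, so anything the device keeps at 0xc00 lies outside the detected size.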
[net-next v2 v2 2/2] samples/bpf: Add opensnoop example that uses current_task_in_cgroup helper
This example adds the trace_opensnoop BPF sample. This example program prints all activities of files being opened for all programs in the provided cgroupsv2 cgroup and its descendants in the cgroupv2 hierarchy. It populates a cgroups arraymap prior to execution in userspace. This means that the program must be run in the same cgroups namespace as the programs that are being traced. Signed-off-by: Sargun Dhillon Cc: Alexei Starovoitov Cc: Daniel Borkmann --- samples/bpf/Makefile | 4 +++ samples/bpf/bpf_helpers.h | 2 ++ samples/bpf/trace_opensnoop_kern.c | 35 +++ samples/bpf/trace_opensnoop_user.c | 69 ++ 4 files changed, 110 insertions(+) create mode 100644 samples/bpf/trace_opensnoop_kern.c create mode 100644 samples/bpf/trace_opensnoop_user.c diff --git a/samples/bpf/Makefile b/samples/bpf/Makefile index 90ebf7d..d9c37a4 100644 --- a/samples/bpf/Makefile +++ b/samples/bpf/Makefile @@ -24,6 +24,7 @@ hostprogs-y += test_overhead hostprogs-y += test_cgrp2_array_pin hostprogs-y += xdp1 hostprogs-y += xdp2 +hostprogs-y += trace_opensnoop test_verifier-objs := test_verifier.o libbpf.o test_maps-objs := test_maps.o libbpf.o @@ -49,6 +50,7 @@ test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o xdp1-objs := bpf_load.o libbpf.o xdp1_user.o # reuse xdp1 source intentionally xdp2-objs := bpf_load.o libbpf.o xdp1_user.o +trace_opensnoop-objs := bpf_load.o libbpf.o trace_opensnoop_user.o # Tell kbuild to always build the programs always := $(hostprogs-y) @@ -74,6 +76,7 @@ always += parse_varlen.o parse_simple.o parse_ldabs.o always += test_cgrp2_tc_kern.o always += xdp1_kern.o always += xdp2_kern.o +always += trace_opensnoop_kern.o HOSTCFLAGS += -I$(objtree)/usr/include @@ -97,6 +100,7 @@ HOSTLOADLIBES_map_perf_test += -lelf -lrt HOSTLOADLIBES_test_overhead += -lelf -lrt HOSTLOADLIBES_xdp1 += -lelf HOSTLOADLIBES_xdp2 += -lelf +HOSTLOADLIBES_trace_opensnoop += -lelf # Allows pointing LLC/CLANG to a LLVM backend with bpf support, redefine on cmdline: # make samples/bpf/ 
LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang diff --git a/samples/bpf/bpf_helpers.h b/samples/bpf/bpf_helpers.h index 217c8d5..d409cbb 100644 --- a/samples/bpf/bpf_helpers.h +++ b/samples/bpf/bpf_helpers.h @@ -43,6 +43,8 @@ static int (*bpf_get_stackid)(void *ctx, void *map, int flags) = (void *) BPF_FUNC_get_stackid; static int (*bpf_probe_write_user)(void *dst, void *src, int size) = (void *) BPF_FUNC_probe_write_user; +static int (*bpf_current_task_in_cgroup)(void *map, int index) = + (void *) BPF_FUNC_current_task_in_cgroup; /* llvm builtin functions that eBPF C program may use to * emit BPF_LD_ABS and BPF_LD_IND instructions diff --git a/samples/bpf/trace_opensnoop_kern.c b/samples/bpf/trace_opensnoop_kern.c new file mode 100644 index 000..dade471 --- /dev/null +++ b/samples/bpf/trace_opensnoop_kern.c @@ -0,0 +1,35 @@ +/* Copyright (c) 2016 Sargun Dhillon + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of version 2 of the GNU General Public + * License as published by the Free Software Foundation. 
+ */ + +#include +#include +#include +#include "bpf_helpers.h" + +struct bpf_map_def SEC("maps") cgroup_map = { + .type = BPF_MAP_TYPE_CGROUP_ARRAY, + .key_size = sizeof(u32), + .value_size = sizeof(u32), + .max_entries = 1, +}; + +SEC("kprobe/sys_open") +int bpf_prog1(struct pt_regs *ctx) +{ + const char *filename = (char *)PT_REGS_PARM1(ctx); + char fmt[] = "Opening file: %s\n"; + + if (!bpf_current_task_in_cgroup(&cgroup_map, 0)) + return 0; + + bpf_trace_printk(fmt, sizeof(fmt), filename); + + return 1; +} + +char _license[] SEC("license") = "GPL"; +u32 _version SEC("version") = LINUX_VERSION_CODE; diff --git a/samples/bpf/trace_opensnoop_user.c b/samples/bpf/trace_opensnoop_user.c new file mode 100644 index 000..403664e --- /dev/null +++ b/samples/bpf/trace_opensnoop_user.c @@ -0,0 +1,69 @@ +#include +#include +#include +#include +#include "libbpf.h" +#include "bpf_load.h" +#include +#include +#include +#include +#include +#include +#include + +static void usage(char **argv) +{ + printf("Usage: %s [...]\n", argv[0]); + printf("Prints the file opening activity of all processes under a given cgroupv2 hierarchy.\n"); + printf("-v Full path of the cgroup2\n"); + printf("-h Display this help\n"); +} + +int main(int argc, char **argv) +{ + char filename[256]; + const char *cg2 = NULL; + int ret, opt, cg2_fd; + int array_index = 0; + + while ((opt = getopt(argc, argv, "v:")) != -1) { + switch (opt) { + case 'v': + cg2 = optarg; + break; + default: +
[net-next v2 v2 1/2] bpf: Add bpf_current_task_in_cgroup helper
This adds a bpf helper that's similar to the skb_in_cgroup helper to check whether the probe is currently executing in the context of a specific subset of the cgroupsv2 hierarchy. It does this based on membership test for a cgroup arraymap. It is invalid to call this in an interrupt, and it'll return an error. The helper is primarily to be used in debugging activities for containers, where you may have multiple programs running in a given top-level "container". This patch also genericizes some of the arraymap fetching logic between the skb_in_cgroup helper and this new helper. Signed-off-by: Sargun Dhillon Cc: Alexei Starovoitov Cc: Daniel Borkmann --- include/linux/bpf.h | 24 include/uapi/linux/bpf.h | 11 +++ kernel/bpf/arraymap.c| 2 +- kernel/bpf/verifier.c| 4 +++- kernel/trace/bpf_trace.c | 34 ++ net/core/filter.c| 11 --- 6 files changed, 77 insertions(+), 9 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 1113423..9adf712 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -319,4 +319,28 @@ extern const struct bpf_func_proto bpf_get_stackid_proto; void bpf_user_rnd_init_once(void); u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5); +#ifdef CONFIG_CGROUPS +/* Helper to fetch a cgroup pointer based on index. 
+ * @map: a cgroup arraymap + * @idx: index of the item you want to fetch + * + * Returns pointer on success, + * Error code if item not found, or out-of-bounds access + */ +static inline struct cgroup *fetch_arraymap_ptr(struct bpf_map *map, int idx) +{ + struct cgroup *cgrp; + struct bpf_array *array = container_of(map, struct bpf_array, map); + + if (unlikely(idx >= array->map.max_entries)) + return ERR_PTR(-E2BIG); + + cgrp = READ_ONCE(array->ptrs[idx]); + if (unlikely(!cgrp)) + return ERR_PTR(-EAGAIN); + + return cgrp; +} +#endif /* CONFIG_CGROUPS */ + #endif /* _LINUX_BPF_H */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index da218fe..64b1a07 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -375,6 +375,17 @@ enum bpf_func_id { */ BPF_FUNC_probe_write_user, + /** +* bpf_current_task_in_cgroup(map, index) - Check cgroup2 membership of current task +* @map: pointer to bpf_map in BPF_MAP_TYPE_CGROUP_ARRAY type +* @index: index of the cgroup in the bpf_map +* Return: +* == 0 current failed the cgroup2 descendant test +* == 1 current succeeded the cgroup2 descendant test +*< 0 error +*/ + BPF_FUNC_current_task_in_cgroup, + __BPF_FUNC_MAX_ID, }; diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c index 633a650..a2ac051 100644 --- a/kernel/bpf/arraymap.c +++ b/kernel/bpf/arraymap.c @@ -538,7 +538,7 @@ static int __init register_perf_event_array_map(void) } late_initcall(register_perf_event_array_map); -#ifdef CONFIG_SOCK_CGROUP_DATA +#ifdef CONFIG_CGROUPS static void *cgroup_fd_array_get_ptr(struct bpf_map *map, struct file *map_file /* not used */, int fd) diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index 7094c69..80efab8 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1053,7 +1053,8 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id) goto error; break; case BPF_MAP_TYPE_CGROUP_ARRAY: - if (func_id != BPF_FUNC_skb_in_cgroup) + if (func_id != 
BPF_FUNC_skb_in_cgroup && + func_id != BPF_FUNC_current_task_in_cgroup) goto error; break; default: @@ -1075,6 +1076,7 @@ static int check_map_func_compatibility(struct bpf_map *map, int func_id) if (map->map_type != BPF_MAP_TYPE_STACK_TRACE) goto error; break; + case BPF_FUNC_current_task_in_cgroup: case BPF_FUNC_skb_in_cgroup: if (map->map_type != BPF_MAP_TYPE_CGROUP_ARRAY) goto error; diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c index b20438f..39f0290 100644 --- a/kernel/trace/bpf_trace.c +++ b/kernel/trace/bpf_trace.c @@ -376,6 +376,36 @@ static const struct bpf_func_proto bpf_get_current_task_proto = { .ret_type = RET_INTEGER, }; +#ifdef CONFIG_CGROUPS +static u64 bpf_current_task_in_cgroup(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5) +{ + struct bpf_map *map = (struct bpf_map *)(long)r1; + struct css_set *cset; + struct cgroup *cgrp; + u32 idx = (u32)r2; + + if (unlikely(in_interrupt())) + return -EINVAL; + + cgrp = fetch_arraymap_ptr(map, idx); + + if (unlikely(IS_ERR(cgrp))) + return PTR_
[net-next v2 v2 0/2] Add bpf current_task_in_cgroup helper & opensnoop example
This patchset includes a helper and an example to determine whether the
probe is currently executing in the context of a specific cgroup, based on a
cgroup bpf map / array. The helper looks up the cgroupsv2 handle stored in
the map and checks whether the current cgroup is equal to it or a descendant
of it. The helper was tested with the example program, and it was verified
that the correct behaviour occurs in the interrupt context.

The example, "open snoop", is a much simplified version of the one in the
iovisor/BCC project. In order to run it, you must supply a specific cgroup
in the hierarchy, and it will print out all files being opened under it.

v1->v2: Add better example code -- OpenSnoop, clean up

Sargun Dhillon (2):
  bpf: Add bpf_current_task_in_cgroup helper
  samples/bpf: Add opensnoop example that uses current_task_in_cgroup helper

 include/linux/bpf.h                | 24 +
 include/uapi/linux/bpf.h           | 11 ++
 kernel/bpf/arraymap.c              |  2 +-
 kernel/bpf/verifier.c              |  4 ++-
 kernel/trace/bpf_trace.c           | 34 ++
 net/core/filter.c                  | 11 +++---
 samples/bpf/Makefile               |  4 +++
 samples/bpf/bpf_helpers.h          |  2 ++
 samples/bpf/trace_opensnoop_kern.c | 35 +++
 samples/bpf/trace_opensnoop_user.c | 70 ++
 10 files changed, 188 insertions(+), 9 deletions(-)
 create mode 100644 samples/bpf/trace_opensnoop_kern.c
 create mode 100644 samples/bpf/trace_opensnoop_user.c

-- 
2.7.4
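For readers who want the membership test spelled out, here is a minimal
user-space sketch of the semantics described above. It is not the kernel
implementation: struct cg_node, task_in_cgroup_model() and its parameters
are hypothetical names made up for this illustration, and the error values
simply mirror the ones returned by fetch_arraymap_ptr() in patch 1/2.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a cgroup: only the parent link matters here. */
struct cg_node {
	const struct cg_node *parent;	/* NULL for the root of the hierarchy */
};

/*
 * Model of the helper's return convention:
 *    1 - cur is the cgroup stored at map[idx], or one of its descendants
 *    0 - cur lies outside that subtree
 *   <0 - idx is out of bounds (-E2BIG) or the slot is empty (-EAGAIN)
 */
static int task_in_cgroup_model(const struct cg_node *const *map,
				size_t max_entries, size_t idx,
				const struct cg_node *cur)
{
	const struct cg_node *target;

	if (idx >= max_entries)
		return -E2BIG;

	target = map[idx];
	if (!target)
		return -EAGAIN;

	/* Walk from the task's cgroup up towards the root. */
	for (; cur; cur = cur->parent)
		if (cur == target)
			return 1;

	return 0;
}
```

With a hierarchy root -> child -> grandchild and the map holding "child",
this model reports 1 for tasks in child or grandchild and 0 for tasks in a
sibling of child. The real helper additionally refuses to run in interrupt
context, which a user-space model cannot express.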
Re: [PATCH] bonding: Allow tun-interfaces as slaves
Jörn Engel wrote:

>On Tue, Aug 09, 2016 at 12:06:36PM -0700, David Miller wrote:
>> > On Tue, Aug 09, 2016 at 09:28:45PM +0800, Ding Tianhong wrote:
>> >
>> > Simply not checking errors when setting the mac address solves the
>> > problem for me. No new features needed.
>>
>> But it only works in certain modes.
>>
>> So the best we can do is enforce the MAC address setting in the
>> modes that absolutely require it. We cannot ignore the MAC
>> address setting unilaterally.
>
>Something like this?
>
>[PATCH] bonding: Allow tun-interfaces as slaves in balance-rr mode
>
>Up until 00503b6f702e (part of 3.14-rc1), the bonding driver could be
>used to enslave tun-interfaces. 00503b6f702e broke that behaviour,
>afaics as an unintended side-effect.
>
>For the purpose of bond-over-tun in balance-rr mode, simply ignoring the
>error from dev_set_mac_address() is good enough.
>
>Signed-off-by: Joern Engel
>---
> drivers/net/bonding/bond_main.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 1f276fa30ba6..2f686bfe4304 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -1490,7 +1490,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
> 		memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len);
> 		addr.sa_family = slave_dev->type;
> 		res = dev_set_mac_address(slave_dev, &addr);
>-		if (res) {
>+		/* round-robin mode works fine without a mac address */
>+		if (res && BOND_MODE(bond) != BOND_MODE_ROUNDROBIN) {

	This will cause balance-rr to add the slave to the bond if any
device's dev_set_mac_address call fails. If a bond of regular Ethernet
devices is connected to a static link aggregation (Etherchannel channel
group), a set_mac failure would result in that slave having a different
MAC address than the bond, which in turn would cause traffic inbound from
the switch to that slave to be dropped (as the destination MAC would not
pass the device MAC filters).

	The failure check for the set_mac call serves a legitimate
purpose, and I don't believe we should bypass it without making the bypass
an option that is explicitly enabled for those special cases that need it.

	E.g., something like the following (which I have not tested); this
would also need documentation and iproute2 updates to go with it. This
would be enabled with "fail_over_mac=keepmac".

diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 1f276fa30ba6..d2283fc23b16 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1483,7 +1483,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev)
 	ether_addr_copy(new_slave->perm_hwaddr, slave_dev->dev_addr);

 	if (!bond->params.fail_over_mac ||
-	    BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) {
+	    (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP &&
+	     bond->params.fail_over_mac != BOND_FOM_KEEPMAC)) {
 		/* Set slave to master's mac address. The application already
 		 * set the master's mac address to that of the first slave
 		 */
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 577e57cad1dc..f9653fe4d622 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -125,6 +125,7 @@ static const struct bond_opt_value bond_fail_over_mac_tbl[] = {
 	{ "none", BOND_FOM_NONE, BOND_VALFLAG_DEFAULT},
 	{ "active", BOND_FOM_ACTIVE, 0},
 	{ "follow", BOND_FOM_FOLLOW, 0},
+	{ "keepmac", BOND_FOM_KEEPMAC, 0},
 	{ NULL, -1, 0},
 };

diff --git a/include/net/bonding.h b/include/net/bonding.h
index 6360c259da6d..ec3442b3aa83 100644
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -420,6 +420,7 @@ static inline bool bond_slave_can_tx(struct slave *slave)
 #define BOND_FOM_NONE		0
 #define BOND_FOM_ACTIVE		1
 #define BOND_FOM_FOLLOW		2
+#define BOND_FOM_KEEPMAC	3

 #define BOND_ARP_TARGETS_ANY	0
 #define BOND_ARP_TARGETS_ALL	1

	-J

---
	-Jay Vosburgh, jay.vosbu...@canonical.com
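To make the effect of the proposed condition easier to check by hand, the
decision can be modelled in plain C outside the kernel. This is only a
sketch of the boolean logic in the untested diff above: the enum and
function names are hypothetical, and only the numeric BOND_FOM_* values are
taken from the patch.

```c
#include <stdbool.h>

/* Hypothetical mirrors of the bonding constants, for illustration only. */
enum model_mode {
	MODEL_ROUNDROBIN,
	MODEL_ACTIVEBACKUP,
};

enum model_fom {
	MODEL_FOM_NONE = 0,
	MODEL_FOM_ACTIVE = 1,
	MODEL_FOM_FOLLOW = 2,
	MODEL_FOM_KEEPMAC = 3,	/* the value proposed in the diff above */
};

/*
 * True when the enslave path would try to set the slave's MAC to the
 * master's MAC, i.e. when a dev_set_mac_address() failure can still
 * abort the enslave.
 */
static bool slave_takes_master_mac(enum model_mode mode, enum model_fom fom)
{
	return fom == MODEL_FOM_NONE ||
	       (mode != MODEL_ACTIVEBACKUP && fom != MODEL_FOM_KEEPMAC);
}
```

Under this model, balance-rr with fail_over_mac=keepmac skips the MAC
assignment entirely, so the error check is never reached for that case,
while every existing combination keeps its current behaviour.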
[PATCH v3 05/13] net: ethernet: ti: cpsw: don't check slave num in runtime
There is no need to check the constant slave number at runtime for every
packet, and ndev is NULL anyway for slaves without an ndev. So remove the
redundant check and macro.

Reviewed-by: Mugunthan V N
Signed-off-by: Ivan Khoronzhuk
---
 drivers/net/ethernet/ti/cpsw.c | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 0b6958d..cfbb1f2 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -497,9 +497,6 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {
 				n; n--)					\
 				(func)(slave++, ##arg);			\
 	} while (0)
-#define cpsw_get_slave_ndev(priv, __slave_no__)		\
-	((__slave_no__ < priv->data.slaves) ?			\
-		priv->slaves[__slave_no__].ndev : NULL)
 #define cpsw_get_slave_priv(priv, __slave_no__)		\
 	(((__slave_no__ < priv->data.slaves) &&			\
 	(priv->slaves[__slave_no__].ndev)) ?			\
@@ -510,11 +507,11 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {
 		if (!priv->data.dual_emac)				\
 			break;						\
 		if (CPDMA_RX_SOURCE_PORT(status) == 1) {		\
-			ndev = cpsw_get_slave_ndev(priv, 0);		\
+			ndev = priv->slaves[0].ndev;			\
 			priv = netdev_priv(ndev);			\
 			skb->dev = ndev;				\
 		} else if (CPDMA_RX_SOURCE_PORT(status) == 2) {	\
-			ndev = cpsw_get_slave_ndev(priv, 1);		\
+			ndev = priv->slaves[1].ndev;			\
 			priv = netdev_priv(ndev);			\
 			skb->dev = ndev;				\
 		}							\
@@ -2561,7 +2558,7 @@ static int cpsw_remove(struct platform_device *pdev)
 	}

 	if (priv->data.dual_emac)
-		unregister_netdev(cpsw_get_slave_ndev(priv, 1));
+		unregister_netdev(priv->slaves[1].ndev);
 	unregister_netdev(ndev);

 	cpsw_ale_destroy(priv->ale);
@@ -2570,7 +2567,7 @@ static int cpsw_remove(struct platform_device *pdev)
 	pm_runtime_put_sync(&pdev->dev);
 	pm_runtime_disable(&pdev->dev);
 	if (priv->data.dual_emac)
-		free_netdev(cpsw_get_slave_ndev(priv, 1));
+		free_netdev(priv->slaves[1].ndev);
 	free_netdev(ndev);
 	return 0;
 }
-- 
1.9.1
[PATCH v3 04/13] net: ethernet: ti: cpsw: remove clk var from priv
There is no need to keep a reference to clk in priv; it is used only once,
during probe.

Reviewed-by: Mugunthan V N
Reviewed-by: Grygorii Strashko
Signed-off-by: Ivan Khoronzhuk
---
 drivers/net/ethernet/ti/cpsw.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 4f6a4c1..0b6958d 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -379,7 +379,6 @@ struct cpsw_priv {
 	u32 coal_intvl;
 	u32 bus_freq_mhz;
 	int rx_packet_max;
-	struct clk *clk;
 	u8 mac_addr[ETH_ALEN];
 	struct cpsw_slave *slaves;
 	struct cpdma_ctlr *dma;
@@ -2177,8 +2176,6 @@ static int cpsw_probe_dual_emac(struct platform_device *pdev,
 	memcpy(ndev->dev_addr, priv_sl2->mac_addr, ETH_ALEN);

 	priv_sl2->slaves = priv->slaves;
-	priv_sl2->clk = priv->clk;
-
 	priv_sl2->coal_intvl = 0;
 	priv_sl2->bus_freq_mhz = priv->bus_freq_mhz;

@@ -2256,6 +2253,7 @@ MODULE_DEVICE_TABLE(of, cpsw_of_mtable);

 static int cpsw_probe(struct platform_device *pdev)
 {
+	struct clk *clk;
 	struct cpsw_platform_data *data;
 	struct net_device *ndev;
 	struct cpsw_priv *priv;
@@ -2334,14 +2332,14 @@ static int cpsw_probe(struct platform_device *pdev)
 	priv->slaves[0].ndev = ndev;
 	priv->emac_port = 0;

-	priv->clk = devm_clk_get(&pdev->dev, "fck");
-	if (IS_ERR(priv->clk)) {
+	clk = devm_clk_get(&pdev->dev, "fck");
+	if (IS_ERR(clk)) {
 		dev_err(priv->dev, "fck is not found\n");
 		ret = -ENODEV;
 		goto clean_runtime_disable_ret;
 	}
 	priv->coal_intvl = 0;
-	priv->bus_freq_mhz = clk_get_rate(priv->clk) / 100;
+	priv->bus_freq_mhz = clk_get_rate(clk) / 100;

 	ss_res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 	ss_regs = devm_ioremap_resource(&pdev->dev, ss_res);
-- 
1.9.1
[PATCH v3 06/13] net: ethernet: ti: cpsw: create common struct to hold shared driver data
This patch simply creates a holder for common data and, as a start, moves
the pdev variable to it.

Signed-off-by: Ivan Khoronzhuk
---
 drivers/net/ethernet/ti/cpsw.c | 62 ++
 1 file changed, 39 insertions(+), 23 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index cfbb1f2..3ccf577 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -363,8 +363,11 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset)
 	__raw_writel(val, slave->regs + offset);
 }

-struct cpsw_priv {
+struct cpsw_common {
 	struct platform_device *pdev;
+};
+
+struct cpsw_priv {
 	struct net_device *ndev;
 	struct napi_struct napi_rx;
 	struct napi_struct napi_tx;
@@ -394,6 +397,7 @@ struct cpsw_priv {
 	u32 num_irqs;
 	struct cpts *cpts;
 	u32 emac_port;
+	struct cpsw_common *cpsw;
 };

 struct cpsw_stats {
@@ -484,6 +488,7 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {

 #define CPSW_STATS_LEN ARRAY_SIZE(cpsw_gstrings_stats)

+#define ndev_to_cpsw(ndev) (((struct cpsw_priv *)netdev_priv(ndev))->cpsw)
 #define napi_to_priv(napi) container_of(napi, struct cpsw_priv, napi)
 #define for_each_slave(priv, func, arg...)				\
 	do {								\
@@ -1091,6 +1096,7 @@ static void soft_reset_slave(struct cpsw_slave *slave)
 static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
 {
 	u32 slave_port;
+	struct cpsw_common *cpsw = priv->cpsw;

 	soft_reset_slave(slave);

@@ -1149,7 +1155,7 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)
 	phy_start(slave->phy);

 	/* Configure GMII_SEL register */
-	cpsw_phy_sel(&priv->pdev->dev, slave->phy->interface, slave->slave_num);
+	cpsw_phy_sel(&cpsw->pdev->dev, slave->phy->interface, slave->slave_num);
 }

 static inline void cpsw_add_default_vlan(struct cpsw_priv *priv)
@@ -1231,12 +1237,13 @@ static void cpsw_slave_stop(struct cpsw_slave *slave, struct cpsw_priv *priv)
 static int cpsw_ndo_open(struct net_device *ndev)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;
 	int i, ret;
 	u32 reg;

-	ret = pm_runtime_get_sync(&priv->pdev->dev);
+	ret = pm_runtime_get_sync(&cpsw->pdev->dev);
 	if (ret < 0) {
-		pm_runtime_put_noidle(&priv->pdev->dev);
+		pm_runtime_put_noidle(&cpsw->pdev->dev);
 		return ret;
 	}

@@ -1313,7 +1320,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
 		 */
 		cpsw_info(priv, ifup, "submitted %d rx descriptors\n", i);

-		if (cpts_register(&priv->pdev->dev, priv->cpts,
+		if (cpts_register(&cpsw->pdev->dev, priv->cpts,
 				  priv->data.cpts_clock_mult,
 				  priv->data.cpts_clock_shift))
 			dev_err(priv->dev, "error registering cpts device\n");
@@ -1338,7 +1345,7 @@ static int cpsw_ndo_open(struct net_device *ndev)
 err_cleanup:
 	cpdma_ctlr_stop(priv->dma);
 	for_each_slave(priv, cpsw_slave_stop, priv);
-	pm_runtime_put_sync(&priv->pdev->dev);
+	pm_runtime_put_sync(&cpsw->pdev->dev);
 	netif_carrier_off(priv->ndev);
 	return ret;
 }
@@ -1346,6 +1353,7 @@ err_cleanup:
 static int cpsw_ndo_stop(struct net_device *ndev)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
+	struct cpsw_common *cpsw = priv->cpsw;

 	cpsw_info(priv, ifdown, "shutting down cpsw device\n");
 	netif_stop_queue(priv->ndev);
@@ -1362,7 +1370,7 @@ static int cpsw_ndo_stop(struct net_device *ndev)
 		cpsw_ale_stop(priv->ale);
 	}
 	for_each_slave(priv, cpsw_slave_stop, priv);
-	pm_runtime_put_sync(&priv->pdev->dev);
+	pm_runtime_put_sync(&cpsw->pdev->dev);
 	if (priv->data.dual_emac)
 		priv->slaves[priv->emac_port].open_stat = false;
 	return 0;
@@ -1594,6 +1602,7 @@ static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p)
 {
 	struct cpsw_priv *priv = netdev_priv(ndev);
 	struct sockaddr *addr = (struct sockaddr *)p;
+	struct cpsw_common *cpsw = priv->cpsw;
 	int flags = 0;
 	u16 vid = 0;
 	int ret;
@@ -1601,9 +1610,9 @@ static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p)
 	if (!is_valid_ether_addr(addr->sa_data))
 		return -EADDRNOTAVAIL;

-	ret = pm_runtime_get_sync(&priv->pdev->dev);
+	ret = pm_runtime_get_sync(&cpsw->pdev->dev);
 	if (ret < 0) {
-		pm_runtime_put_noidle(&priv->pdev->dev);
+		pm_runtime_put_noidle(&cpsw->pdev->dev);
 		return ret;
 	}
[PATCH v3 08/13] net: ethernet: ti: cpsw: move links on h/w registers to cpsw_common
The pointers on h/w registers are common for every cpsw_private instance, so no need to hold them for every ndev. Signed-off-by: Ivan Khoronzhuk --- drivers/net/ethernet/ti/cpsw.c | 97 +++--- 1 file changed, 53 insertions(+), 44 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index c21cc38..5db2a55 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -365,6 +365,10 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset) struct cpsw_common { struct device *dev; + struct cpsw_ss_regs __iomem *regs; + struct cpsw_wr_regs __iomem *wr_regs; + u8 __iomem *hw_stats; + struct cpsw_host_regs __iomem *host_port_regs; }; struct cpsw_priv { @@ -373,10 +377,6 @@ struct cpsw_priv { struct napi_struct napi_tx; struct device *dev; struct cpsw_platform_data data; - struct cpsw_ss_regs __iomem *regs; - struct cpsw_wr_regs __iomem *wr_regs; - u8 __iomem *hw_stats; - struct cpsw_host_regs __iomem *host_port_regs; u32 msg_enable; u32 version; u32 coal_intvl; @@ -656,8 +656,10 @@ static void cpsw_ndo_set_rx_mode(struct net_device *ndev) static void cpsw_intr_enable(struct cpsw_priv *priv) { - __raw_writel(0xFF, &priv->wr_regs->tx_en); - __raw_writel(0xFF, &priv->wr_regs->rx_en); + struct cpsw_common *cpsw = priv->cpsw; + + __raw_writel(0xFF, &cpsw->wr_regs->tx_en); + __raw_writel(0xFF, &cpsw->wr_regs->rx_en); cpdma_ctlr_int_ctrl(priv->dma, true); return; @@ -665,8 +667,10 @@ static void cpsw_intr_enable(struct cpsw_priv *priv) static void cpsw_intr_disable(struct cpsw_priv *priv) { - __raw_writel(0, &priv->wr_regs->tx_en); - __raw_writel(0, &priv->wr_regs->rx_en); + struct cpsw_common *cpsw = priv->cpsw; + + __raw_writel(0, &cpsw->wr_regs->tx_en); + __raw_writel(0, &cpsw->wr_regs->rx_en); cpdma_ctlr_int_ctrl(priv->dma, false); return; @@ -750,8 +754,9 @@ requeue: static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id) { struct cpsw_priv *priv = dev_id; + struct cpsw_common *cpsw = 
priv->cpsw; - writel(0, &priv->wr_regs->tx_en); + writel(0, &cpsw->wr_regs->tx_en); cpdma_ctlr_eoi(priv->dma, CPDMA_EOI_TX); if (priv->quirk_irq) { @@ -766,9 +771,10 @@ static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id) static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id) { struct cpsw_priv *priv = dev_id; + struct cpsw_common *cpsw = priv->cpsw; cpdma_ctlr_eoi(priv->dma, CPDMA_EOI_RX); - writel(0, &priv->wr_regs->rx_en); + writel(0, &cpsw->wr_regs->rx_en); if (priv->quirk_irq) { disable_irq_nosync(priv->irqs_table[0]); @@ -783,11 +789,12 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget) { struct cpsw_priv*priv = napi_to_priv(napi_tx); int num_tx; + struct cpsw_common *cpsw = priv->cpsw; num_tx = cpdma_chan_process(priv->txch, budget); if (num_tx < budget) { napi_complete(napi_tx); - writel(0xff, &priv->wr_regs->tx_en); + writel(0xff, &cpsw->wr_regs->tx_en); if (priv->quirk_irq && priv->tx_irq_disabled) { priv->tx_irq_disabled = false; enable_irq(priv->irqs_table[1]); @@ -801,11 +808,12 @@ static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget) { struct cpsw_priv*priv = napi_to_priv(napi_rx); int num_rx; + struct cpsw_common *cpsw = priv->cpsw; num_rx = cpdma_chan_process(priv->rxch, budget); if (num_rx < budget) { napi_complete(napi_rx); - writel(0xff, &priv->wr_regs->rx_en); + writel(0xff, &cpsw->wr_regs->rx_en); if (priv->quirk_irq && priv->rx_irq_disabled) { priv->rx_irq_disabled = false; enable_irq(priv->irqs_table[0]); @@ -925,10 +933,11 @@ static int cpsw_set_coalesce(struct net_device *ndev, u32 prescale = 0; u32 addnl_dvdr = 1; u32 coal_intvl = 0; + struct cpsw_common *cpsw = priv->cpsw; coal_intvl = coal->rx_coalesce_usecs; - int_ctrl = readl(&priv->wr_regs->int_control); + int_ctrl = readl(&cpsw->wr_regs->int_control); prescale = priv->bus_freq_mhz * 4; if (!coal->rx_coalesce_usecs) { @@ -957,15 +966,15 @@ static int cpsw_set_coalesce(struct net_device *ndev, } num_interrupts = (1000 * addn
[PATCH v3 03/13] net: ethernet: ti: cpsw: remove priv from cpsw_get_slave_port() parameters list
The priv argument is not needed here, so drop it.

Reviewed-by: Mugunthan V N
Reviewed-by: Grygorii Strashko
Signed-off-by: Ivan Khoronzhuk
---
 drivers/net/ethernet/ti/cpsw.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 21cf367..4f6a4c1 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -525,7 +525,7 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {
 		if (priv->data.dual_emac) {				\
 			struct cpsw_slave *slave = priv->slaves +	\
 						priv->emac_port;	\
-			int slave_port = cpsw_get_slave_port(priv,	\
+			int slave_port = cpsw_get_slave_port(		\
 						slave->slave_num);	\
 			cpsw_ale_add_mcast(priv->ale, addr,		\
 				1 << slave_port | ALE_PORT_HOST,	\
@@ -537,7 +537,7 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = {
 		}							\
 	} while (0)

-static inline int cpsw_get_slave_port(struct cpsw_priv *priv, u32 slave_num)
+static inline int cpsw_get_slave_port(u32 slave_num)
 {
 	return slave_num + 1;
 }
@@ -847,7 +847,7 @@ static void _cpsw_adjust_link(struct cpsw_slave *slave,
 	if (!phy)
 		return;

-	slave_port = cpsw_get_slave_port(priv, slave->slave_num);
+	slave_port = cpsw_get_slave_port(slave->slave_num);

 	if (phy->link) {
 		mac_control = priv->data.mac_control;
@@ -1118,7 +1118,7 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv)

 	slave->mac_control = 0;	/* no link yet */

-	slave_port = cpsw_get_slave_port(priv, slave->slave_num);
+	slave_port = cpsw_get_slave_port(slave->slave_num);

 	if (priv->data.dual_emac)
 		cpsw_add_dual_emac_def_ale_entries(priv, slave, slave_port);
@@ -1220,7 +1220,7 @@ static void cpsw_slave_stop(struct cpsw_slave *slave, struct cpsw_priv *priv)
 {
 	u32 slave_port;

-	slave_port = cpsw_get_slave_port(priv, slave->slave_num);
+	slave_port = cpsw_get_slave_port(slave->slave_num);

 	if (!slave->phy)
 		return;
-- 
1.9.1
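As a small worked example of why priv is unnecessary, the port arithmetic
used by the patch depends only on the slave number. The sketch below is a
stand-alone model with hypothetical names; in particular the value of
MODEL_ALE_PORT_HOST (host port on bit 0) is an assumption made for the
example and is not taken from this thread.

```c
#include <stdint.h>

/* Assumption for this example only: the host port occupies bit 0. */
#define MODEL_ALE_PORT_HOST	0x1u

/* Same arithmetic as cpsw_get_slave_port(): slave N sits on port N + 1. */
static inline uint32_t model_slave_port(uint32_t slave_num)
{
	return slave_num + 1;
}

/* Port mask in the shape of "1 << slave_port | ALE_PORT_HOST". */
static inline uint32_t model_mcast_port_mask(uint32_t slave_num)
{
	return (1u << model_slave_port(slave_num)) | MODEL_ALE_PORT_HOST;
}
```

Under that assumption, slave 0 maps to port 1 and mask 0x3, and slave 1
maps to port 2 and mask 0x5.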
[PATCH v3 00/13] net: ethernet: ti: cpsw: split driver data and per ndev data
In dual_emac mode the driver can handle 2 network devices. Each of them can
use its own private data and common data/resources. This patchset splits
common driver data/resources and private per net device data. It leads to:
- reduced memory usage
- increased code readability
- a number of further simplifications
- prerequisites for adding multi-channel support, where channels are
  shared between net devices

It has no negative impact on performance.

v2: https://lkml.org/lkml/2016/8/6/108

Since v2:
- removed patch: net: ethernet: ti: cpsw: fix int dbg message
- replaced patch "net: ethernet: ti: cpsw: remove redundant check in napi
  poll" with "net: ethernet: ti: cpsw: remove intr dbg msg from poll
  handlers"
- removed macro "cpsw_get_slave_ndev"
- corrected some commits

Since v1:
- added several patch improvements
- avoided variable reordering in structures
- removed static variable for common function
- split big patch into several patches:
  net: ethernet: ti: cpsw: remove priv from cpsw_get_slave_port() parameters list
  net: ethernet: ti: cpsw: remove clk var from priv
  net: ethernet: ti: cpsw: don't check slave num in runtime
  net: ethernet: ti: cpsw: create common struct to hold shared driver data
  net: ethernet: ti: cpsw: replace pdev on dev
  net: ethernet: ti: cpsw: move links on h/w registers to cpsw_common
  net: ethernet: ti: cpsw: move cpdma resources to cpsw_common
  net: ethernet: ti: cpsw: move irq stuff under cpsw_common
  net: ethernet: ti: cpsw: move data platform data and slaves info to cpsw_common
  net: ethernet: ti: cpsw: fix int dbg message
  net: ethernet: ti: cpsw: move napi struct to cpsw_common
  net: ethernet: ti: cpsw: move ale, cpts and drivers params under

Based on net-next/master

Ivan Khoronzhuk (13):
  net: ethernet: ti: cpsw: simplify submit routine
  net: ethernet: ti: cpsw: remove intr dbg msg from poll handlers
  net: ethernet: ti: cpsw: remove priv from cpsw_get_slave_port() parameters list
  net: ethernet: ti: cpsw: remove clk var from priv
  net: ethernet: ti: cpsw: don't check slave num in runtime
  net: ethernet: ti: cpsw: create common struct to hold shared driver data
  net: ethernet: ti: cpsw: replace pdev on dev
  net: ethernet: ti: cpsw: move links on h/w registers to cpsw_common
  net: ethernet: ti: cpsw: move cpdma resources to cpsw_common
  net: ethernet: ti: cpsw: move irq stuff under cpsw_common
  net: ethernet: ti: cpsw: move platform data and slaves info to cpsw_common
  net: ethernet: ti: cpsw: move napi struct to cpsw_common
  net: ethernet: ti: cpsw: move ale, cpts and drivers params under cpsw_common

 drivers/net/ethernet/ti/cpsw.c | 847 -
 1 file changed, 413 insertions(+), 434 deletions(-)

-- 
1.9.1
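The general shape of the split can be sketched outside the driver as
follows. This is only an illustration of the idea, assuming one shared
structure and a back-pointer from each per-device structure; all names
(model_common, model_priv, model_priv_init, model_intr_enable) are
hypothetical and the fields are reduced to the minimum needed to show the
sharing.

```c
#include <stddef.h>

/* State that exists once per controller, shared by both net devices. */
struct model_common {
	unsigned int irq_enable;	/* stand-in for a shared h/w register */
};

/* State that exists once per net device, plus a link to the shared part. */
struct model_priv {
	unsigned int emac_port;
	struct model_common *cpsw;
};

static void model_priv_init(struct model_priv *priv, unsigned int port,
			    struct model_common *cpsw)
{
	priv->emac_port = port;
	priv->cpsw = cpsw;
}

/* Operates on shared state only, so it takes the common struct directly. */
static void model_intr_enable(struct model_common *cpsw)
{
	cpsw->irq_enable = 0xFF;
}
```

Because both per-device structures point at the same common object, a
change made through one device is visible through the other without any
copying or synchronisation between the two private structures.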
[PATCH v3 11/13] net: ethernet: ti: cpsw: move platform data and slaves info to cpsw_common
These data are common for net devs in dual_emac mode. No need to hold it for every priv instance, so move them under cpsw_common. Signed-off-by: Ivan Khoronzhuk --- drivers/net/ethernet/ti/cpsw.c | 265 + 1 file changed, 137 insertions(+), 128 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index b2482b6..ab5488b 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -140,9 +140,9 @@ do { \ #define CPSW_CMINTMAX_INTVL(1000 / CPSW_CMINTMIN_CNT) #define CPSW_CMINTMIN_INTVL((1000 / CPSW_CMINTMAX_CNT) + 1) -#define cpsw_slave_index(priv) \ - ((priv->data.dual_emac) ? priv->emac_port : \ - priv->data.active_slave) +#define cpsw_slave_index(cpsw, priv) \ + ((cpsw->data.dual_emac) ? priv->emac_port : \ + cpsw->data.active_slave) #define IRQ_NUM2 static int debug_level; @@ -366,10 +366,12 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset) struct cpsw_common { struct device *dev; + struct cpsw_platform_data data; struct cpsw_ss_regs __iomem *regs; struct cpsw_wr_regs __iomem *wr_regs; u8 __iomem *hw_stats; struct cpsw_host_regs __iomem *host_port_regs; + struct cpsw_slave *slaves; struct cpdma_ctlr *dma; struct cpdma_chan *txch, *rxch; boolquirk_irq; @@ -383,14 +385,12 @@ struct cpsw_priv { struct napi_struct napi_rx; struct napi_struct napi_tx; struct device *dev; - struct cpsw_platform_data data; u32 msg_enable; u32 version; u32 coal_intvl; u32 bus_freq_mhz; int rx_packet_max; u8 mac_addr[ETH_ALEN]; - struct cpsw_slave *slaves; struct cpsw_ale *ale; boolrx_pause; booltx_pause; @@ -492,38 +492,39 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = { #define for_each_slave(priv, func, arg...) 
\ do {\ struct cpsw_slave *slave; \ + struct cpsw_common *cpsw = (priv)->cpsw;\ int n; \ - if (priv->data.dual_emac) \ - (func)((priv)->slaves + priv->emac_port, ##arg);\ + if (cpsw->data.dual_emac) \ + (func)((cpsw)->slaves + priv->emac_port, ##arg);\ else\ - for (n = (priv)->data.slaves, \ - slave = (priv)->slaves; \ + for (n = cpsw->data.slaves, \ + slave = cpsw->slaves; \ n; n--) \ (func)(slave++, ##arg); \ } while (0) -#define cpsw_get_slave_priv(priv, __slave_no__) \ - (((__slave_no__ < priv->data.slaves) && \ - (priv->slaves[__slave_no__].ndev)) ?\ - netdev_priv(priv->slaves[__slave_no__].ndev) : NULL)\ +#define cpsw_get_slave_priv(cpsw, __slave_no__) \ + (((__slave_no__ < cpsw->data.slaves) && \ + (cpsw->slaves[__slave_no__].ndev)) ?\ + netdev_priv(cpsw->slaves[__slave_no__].ndev) : NULL)\ -#define cpsw_dual_emac_src_port_detect(status, priv, ndev, skb) \ +#define cpsw_dual_emac_src_port_detect(cpsw, status, priv, ndev, skb) \ do {\ - if (!priv->data.dual_emac) \ + if (!cpsw->data.dual_emac) \ break; \ if (CPDMA_RX_SOURCE_PORT(status) == 1) {\ - ndev = priv->slaves[0].ndev;\ + nde
[PATCH v3 09/13] net: ethernet: ti: cpsw: move cpdma resources to cpsw_common
Every net device private struct holds links to shared cpdma resources. No need to save and every time synchronize these resources per net dev. So, move it to common driver struct. Signed-off-by: Ivan Khoronzhuk --- drivers/net/ethernet/ti/cpsw.c | 97 +- 1 file changed, 48 insertions(+), 49 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 5db2a55..6d99d1e 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -369,6 +369,8 @@ struct cpsw_common { struct cpsw_wr_regs __iomem *wr_regs; u8 __iomem *hw_stats; struct cpsw_host_regs __iomem *host_port_regs; + struct cpdma_ctlr *dma; + struct cpdma_chan *txch, *rxch; }; struct cpsw_priv { @@ -384,8 +386,6 @@ struct cpsw_priv { int rx_packet_max; u8 mac_addr[ETH_ALEN]; struct cpsw_slave *slaves; - struct cpdma_ctlr *dma; - struct cpdma_chan *txch, *rxch; struct cpsw_ale *ale; boolrx_pause; booltx_pause; @@ -654,25 +654,21 @@ static void cpsw_ndo_set_rx_mode(struct net_device *ndev) } } -static void cpsw_intr_enable(struct cpsw_priv *priv) +static void cpsw_intr_enable(struct cpsw_common *cpsw) { - struct cpsw_common *cpsw = priv->cpsw; - __raw_writel(0xFF, &cpsw->wr_regs->tx_en); __raw_writel(0xFF, &cpsw->wr_regs->rx_en); - cpdma_ctlr_int_ctrl(priv->dma, true); + cpdma_ctlr_int_ctrl(cpsw->dma, true); return; } -static void cpsw_intr_disable(struct cpsw_priv *priv) +static void cpsw_intr_disable(struct cpsw_common *cpsw) { - struct cpsw_common *cpsw = priv->cpsw; - __raw_writel(0, &cpsw->wr_regs->tx_en); __raw_writel(0, &cpsw->wr_regs->rx_en); - cpdma_ctlr_int_ctrl(priv->dma, false); + cpdma_ctlr_int_ctrl(cpsw->dma, false); return; } @@ -700,6 +696,7 @@ static void cpsw_rx_handler(void *token, int len, int status) struct net_device *ndev = skb->dev; struct cpsw_priv*priv = netdev_priv(ndev); int ret = 0; + struct cpsw_common *cpsw = priv->cpsw; cpsw_dual_emac_src_port_detect(status, priv, ndev, skb); @@ -745,8 +742,8 @@ static void 
cpsw_rx_handler(void *token, int len, int status) } requeue: - ret = cpdma_chan_submit(priv->rxch, new_skb, new_skb->data, - skb_tailroom(new_skb), 0); + ret = cpdma_chan_submit(cpsw->rxch, new_skb, new_skb->data, + skb_tailroom(new_skb), 0); if (WARN_ON(ret < 0)) dev_kfree_skb_any(new_skb); } @@ -757,7 +754,7 @@ static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id) struct cpsw_common *cpsw = priv->cpsw; writel(0, &cpsw->wr_regs->tx_en); - cpdma_ctlr_eoi(priv->dma, CPDMA_EOI_TX); + cpdma_ctlr_eoi(cpsw->dma, CPDMA_EOI_TX); if (priv->quirk_irq) { disable_irq_nosync(priv->irqs_table[1]); @@ -773,7 +770,7 @@ static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id) struct cpsw_priv *priv = dev_id; struct cpsw_common *cpsw = priv->cpsw; - cpdma_ctlr_eoi(priv->dma, CPDMA_EOI_RX); + cpdma_ctlr_eoi(cpsw->dma, CPDMA_EOI_RX); writel(0, &cpsw->wr_regs->rx_en); if (priv->quirk_irq) { @@ -791,7 +788,7 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget) int num_tx; struct cpsw_common *cpsw = priv->cpsw; - num_tx = cpdma_chan_process(priv->txch, budget); + num_tx = cpdma_chan_process(cpsw->txch, budget); if (num_tx < budget) { napi_complete(napi_tx); writel(0xff, &cpsw->wr_regs->tx_en); @@ -810,7 +807,7 @@ static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget) int num_rx; struct cpsw_common *cpsw = priv->cpsw; - num_rx = cpdma_chan_process(priv->rxch, budget); + num_rx = cpdma_chan_process(cpsw->rxch, budget); if (num_rx < budget) { napi_complete(napi_rx); writel(0xff, &cpsw->wr_regs->rx_en); @@ -1020,17 +1017,16 @@ static void cpsw_get_strings(struct net_device *ndev, u32 stringset, u8 *data) static void cpsw_get_ethtool_stats(struct net_device *ndev, struct ethtool_stats *stats, u64 *data) { - struct cpsw_priv *priv = netdev_priv(ndev); struct cpdma_chan_stats rx_stats; struct cpdma_chan_stats tx_stats; u32 val; u8 *p; int i; - struct cpsw_common *cpsw = priv->cpsw; + struct cpsw_common *cpsw = ndev_to_cpsw(nd
[PATCH v3 01/13] net: ethernet: ti: cpsw: simplify submit routine
As the second net dev is created only in dual_emac mode, the port number
can be figured out in a simpler way. Also, there is no need to pass the
redundant ndev struct.

Reviewed-by: Mugunthan V N
Reviewed-by: Grygorii Strashko
Signed-off-by: Ivan Khoronzhuk
---
 drivers/net/ethernet/ti/cpsw.c | 18 +++++-------------
 1 file changed, 5 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c51f346..8972bf6 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1065,19 +1065,11 @@ static int cpsw_common_res_usage_state(struct cpsw_priv *priv)
 	return usage_count;
 }
 
-static inline int cpsw_tx_packet_submit(struct net_device *ndev,
-			struct cpsw_priv *priv, struct sk_buff *skb)
+static inline int cpsw_tx_packet_submit(struct cpsw_priv *priv,
+					struct sk_buff *skb)
 {
-	if (!priv->data.dual_emac)
-		return cpdma_chan_submit(priv->txch, skb, skb->data,
-				  skb->len, 0);
-
-	if (ndev == cpsw_get_slave_ndev(priv, 0))
-		return cpdma_chan_submit(priv->txch, skb, skb->data,
-				  skb->len, 1);
-	else
-		return cpdma_chan_submit(priv->txch, skb, skb->data,
-				  skb->len, 2);
+	return cpdma_chan_submit(priv->txch, skb, skb->data, skb->len,
+				 priv->emac_port + priv->data.dual_emac);
 }
 
 static inline void cpsw_add_dual_emac_def_ale_entries(
@@ -1406,7 +1398,7 @@ static netdev_tx_t cpsw_ndo_start_xmit(struct sk_buff *skb,
 
 	skb_tx_timestamp(skb);
 
-	ret = cpsw_tx_packet_submit(ndev, priv, skb);
+	ret = cpsw_tx_packet_submit(priv, skb);
 	if (unlikely(ret != 0)) {
 		cpsw_err(priv, tx_err, "desc submit failed\n");
 		goto fail;
-- 
1.9.1
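
For readers following the arithmetic in this patch, here is a small userspace sketch of why `emac_port + dual_emac` yields the same last argument to cpdma_chan_submit() as the removed branches. The struct and both function names are hypothetical stand-ins, not driver code, and the sketch assumes that emac_port is 0 for the first slave's net dev and stays 0 outside dual_emac mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced model of the two fields the patch relies on. */
struct toy_priv {
	bool dual_emac;		/* models priv->data.dual_emac */
	unsigned int emac_port;	/* 0 or 1; assumed 0 outside dual_emac mode */
};

/* The removed logic: 0 in switch mode, 1 for the first slave, 2 otherwise. */
int old_port_arg(const struct toy_priv *priv)
{
	if (!priv->dual_emac)
		return 0;
	/* models "ndev == cpsw_get_slave_ndev(priv, 0)" under the assumption above */
	return priv->emac_port == 0 ? 1 : 2;
}

/* The replacement: a bool promotes to 0 or 1, so one addition is enough. */
int new_port_arg(const struct toy_priv *priv)
{
	assert(priv->dual_emac || priv->emac_port == 0);
	return (int)priv->emac_port + priv->dual_emac;
}
```

The simplification only holds while the stated assumption does; if emac_port could be nonzero in switch mode, the two versions would diverge.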
[PATCH v3 02/13] net: ethernet: ti: cpsw: remove intr dbg msg from poll handlers
In the poll handler there is no way to figure out which network device is
handling packets, as the cpdma channels are common to both network devices
in dual_emac mode. Currently, the messages are printed for only one device
when, in fact, there are two. This debug message is incorrect and does not
seem very useful, so drop it from the poll handlers.

Reviewed-by: Mugunthan V N
Signed-off-by: Ivan Khoronzhuk
---
 drivers/net/ethernet/ti/cpsw.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 8972bf6..21cf367 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -793,9 +793,6 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget)
 		}
 	}
 
-	if (num_tx)
-		cpsw_dbg(priv, intr, "poll %d tx pkts\n", num_tx);
-
 	return num_tx;
 }
 
@@ -814,9 +811,6 @@ static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget)
 		}
 	}
 
-	if (num_rx)
-		cpsw_dbg(priv, intr, "poll %d rx pkts\n", num_rx);
-
 	return num_rx;
 }
 
-- 
1.9.1
[PATCH v3 10/13] net: ethernet: ti: cpsw: move irq stuff under cpsw_common
The irq data are common for net devs in dual_emac mode. So no need to hold these data in every priv struct, move them under cpsw_common. Also delete irq_num var, as after optimization it's not needed. Correct number of irqs to 2, as anyway, driver is using only 2, at least for now. Signed-off-by: Ivan Khoronzhuk --- drivers/net/ethernet/ti/cpsw.c | 65 +++--- 1 file changed, 29 insertions(+), 36 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 6d99d1e..b2482b6 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -143,6 +143,7 @@ do { \ #define cpsw_slave_index(priv) \ ((priv->data.dual_emac) ? priv->emac_port : \ priv->data.active_slave) +#define IRQ_NUM2 static int debug_level; module_param(debug_level, int, 0); @@ -371,6 +372,10 @@ struct cpsw_common { struct cpsw_host_regs __iomem *host_port_regs; struct cpdma_ctlr *dma; struct cpdma_chan *txch, *rxch; + boolquirk_irq; + boolrx_irq_disabled; + booltx_irq_disabled; + u32 irqs_table[IRQ_NUM]; }; struct cpsw_priv { @@ -389,12 +394,6 @@ struct cpsw_priv { struct cpsw_ale *ale; boolrx_pause; booltx_pause; - boolquirk_irq; - boolrx_irq_disabled; - booltx_irq_disabled; - /* snapshot of IRQ numbers */ - u32 irqs_table[4]; - u32 num_irqs; struct cpts *cpts; u32 emac_port; struct cpsw_common *cpsw; @@ -756,9 +755,9 @@ static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id) writel(0, &cpsw->wr_regs->tx_en); cpdma_ctlr_eoi(cpsw->dma, CPDMA_EOI_TX); - if (priv->quirk_irq) { - disable_irq_nosync(priv->irqs_table[1]); - priv->tx_irq_disabled = true; + if (cpsw->quirk_irq) { + disable_irq_nosync(cpsw->irqs_table[1]); + cpsw->tx_irq_disabled = true; } napi_schedule(&priv->napi_tx); @@ -773,9 +772,9 @@ static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id) cpdma_ctlr_eoi(cpsw->dma, CPDMA_EOI_RX); writel(0, &cpsw->wr_regs->rx_en); - if (priv->quirk_irq) { - disable_irq_nosync(priv->irqs_table[0]); - priv->rx_irq_disabled = true; + if 
(cpsw->quirk_irq) { + disable_irq_nosync(cpsw->irqs_table[0]); + cpsw->rx_irq_disabled = true; } napi_schedule(&priv->napi_rx); @@ -792,9 +791,9 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget) if (num_tx < budget) { napi_complete(napi_tx); writel(0xff, &cpsw->wr_regs->tx_en); - if (priv->quirk_irq && priv->tx_irq_disabled) { - priv->tx_irq_disabled = false; - enable_irq(priv->irqs_table[1]); + if (cpsw->quirk_irq && cpsw->tx_irq_disabled) { + cpsw->tx_irq_disabled = false; + enable_irq(cpsw->irqs_table[1]); } } @@ -811,9 +810,9 @@ static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget) if (num_rx < budget) { napi_complete(napi_rx); writel(0xff, &cpsw->wr_regs->rx_en); - if (priv->quirk_irq && priv->rx_irq_disabled) { - priv->rx_irq_disabled = false; - enable_irq(priv->irqs_table[0]); + if (cpsw->quirk_irq && cpsw->rx_irq_disabled) { + cpsw->rx_irq_disabled = false; + enable_irq(cpsw->irqs_table[0]); } } @@ -1299,14 +1298,14 @@ static int cpsw_ndo_open(struct net_device *ndev) napi_enable(&priv_sl0->napi_rx); napi_enable(&priv_sl0->napi_tx); - if (priv_sl0->tx_irq_disabled) { - priv_sl0->tx_irq_disabled = false; - enable_irq(priv->irqs_table[1]); + if (cpsw->tx_irq_disabled) { + cpsw->tx_irq_disabled = false; + enable_irq(cpsw->irqs_table[1]); } - if (priv_sl0->rx_irq_disabled) { - priv_sl0->rx_irq_disabled = false; - enable_irq(priv->irqs_table[0]); + if (cpsw->rx_irq_disabled) { + cpsw->rx_irq_disabled = false; + enable_irq(cpsw->irqs_table[0]); } buf_num = cpdma_ch
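
As an aside on the quirk handling this patch moves into cpsw_common, the following userspace model sketches the state machine visible in the hunks above: the interrupt handler masks the line and records that it did so, and the poll routine unmasks it only once it handled fewer packets than its budget. All names are hypothetical; rx_line_enabled merely stands in for what disable_irq_nosync() and enable_irq() do to the real IRQ line.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one shared state block instead of one per net dev. */
struct toy_common {
	bool quirk_irq;
	bool rx_irq_disabled;
	bool rx_line_enabled;	/* stands in for the real IRQ line state */
};

/* Models cpsw_rx_interrupt(): mask the line until polling has caught up. */
void toy_rx_interrupt(struct toy_common *c)
{
	if (c->quirk_irq) {
		c->rx_line_enabled = false;	/* disable_irq_nosync() in the driver */
		c->rx_irq_disabled = true;
	}
}

/* Models cpsw_rx_poll(): returns the packets handled, at most budget. */
int toy_rx_poll(struct toy_common *c, int pending, int budget)
{
	int done = pending < budget ? pending : budget;

	if (done < budget && c->quirk_irq && c->rx_irq_disabled) {
		c->rx_irq_disabled = false;
		c->rx_line_enabled = true;	/* enable_irq() in the driver */
	}
	return done;
}
```

Because the flag lives in the shared structure, either net device can open, poll or stop without consulting the other device's private data, which is the point the commit message makes.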
[PATCH v3 07/13] net: ethernet: ti: cpsw: replace pdev on dev
No need to hold pdev link when only dev is needed. This allows to simplify a bunch of cpsw->pdev->dev now and farther. Signed-off-by: Ivan Khoronzhuk --- drivers/net/ethernet/ti/cpsw.c | 65 ++ 1 file changed, 34 insertions(+), 31 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 3ccf577..c21cc38 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -364,7 +364,7 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset) } struct cpsw_common { - struct platform_device *pdev; + struct device *dev; }; struct cpsw_priv { @@ -1155,7 +1155,7 @@ static void cpsw_slave_open(struct cpsw_slave *slave, struct cpsw_priv *priv) phy_start(slave->phy); /* Configure GMII_SEL register */ - cpsw_phy_sel(&cpsw->pdev->dev, slave->phy->interface, slave->slave_num); + cpsw_phy_sel(cpsw->dev, slave->phy->interface, slave->slave_num); } static inline void cpsw_add_default_vlan(struct cpsw_priv *priv) @@ -1241,9 +1241,9 @@ static int cpsw_ndo_open(struct net_device *ndev) int i, ret; u32 reg; - ret = pm_runtime_get_sync(&cpsw->pdev->dev); + ret = pm_runtime_get_sync(cpsw->dev); if (ret < 0) { - pm_runtime_put_noidle(&cpsw->pdev->dev); + pm_runtime_put_noidle(cpsw->dev); return ret; } @@ -1320,7 +1320,7 @@ static int cpsw_ndo_open(struct net_device *ndev) */ cpsw_info(priv, ifup, "submitted %d rx descriptors\n", i); - if (cpts_register(&cpsw->pdev->dev, priv->cpts, + if (cpts_register(cpsw->dev, priv->cpts, priv->data.cpts_clock_mult, priv->data.cpts_clock_shift)) dev_err(priv->dev, "error registering cpts device\n"); @@ -1345,7 +1345,7 @@ static int cpsw_ndo_open(struct net_device *ndev) err_cleanup: cpdma_ctlr_stop(priv->dma); for_each_slave(priv, cpsw_slave_stop, priv); - pm_runtime_put_sync(&cpsw->pdev->dev); + pm_runtime_put_sync(cpsw->dev); netif_carrier_off(priv->ndev); return ret; } @@ -1370,7 +1370,7 @@ static int cpsw_ndo_stop(struct net_device *ndev) cpsw_ale_stop(priv->ale); } 
for_each_slave(priv, cpsw_slave_stop, priv); - pm_runtime_put_sync(&cpsw->pdev->dev); + pm_runtime_put_sync(cpsw->dev); if (priv->data.dual_emac) priv->slaves[priv->emac_port].open_stat = false; return 0; @@ -1610,9 +1610,9 @@ static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p) if (!is_valid_ether_addr(addr->sa_data)) return -EADDRNOTAVAIL; - ret = pm_runtime_get_sync(&cpsw->pdev->dev); + ret = pm_runtime_get_sync(cpsw->dev); if (ret < 0) { - pm_runtime_put_noidle(&cpsw->pdev->dev); + pm_runtime_put_noidle(cpsw->dev); return ret; } @@ -1630,7 +1630,7 @@ static int cpsw_ndo_set_mac_address(struct net_device *ndev, void *p) memcpy(ndev->dev_addr, priv->mac_addr, ETH_ALEN); for_each_slave(priv, cpsw_set_slave_mac, priv); - pm_runtime_put(&cpsw->pdev->dev); + pm_runtime_put(cpsw->dev); return 0; } @@ -1702,9 +1702,9 @@ static int cpsw_ndo_vlan_rx_add_vid(struct net_device *ndev, if (vid == priv->data.default_vlan) return 0; - ret = pm_runtime_get_sync(&cpsw->pdev->dev); + ret = pm_runtime_get_sync(cpsw->dev); if (ret < 0) { - pm_runtime_put_noidle(&cpsw->pdev->dev); + pm_runtime_put_noidle(cpsw->dev); return ret; } @@ -1724,7 +1724,7 @@ static int cpsw_ndo_vlan_rx_add_vid(struct net_device *ndev, dev_info(priv->dev, "Adding vlanid %d to vlan filter\n", vid); ret = cpsw_add_vlan_ale_entry(priv, vid); - pm_runtime_put(&cpsw->pdev->dev); + pm_runtime_put(cpsw->dev); return ret; } @@ -1738,9 +1738,9 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev, if (vid == priv->data.default_vlan) return 0; - ret = pm_runtime_get_sync(&cpsw->pdev->dev); + ret = pm_runtime_get_sync(cpsw->dev); if (ret < 0) { - pm_runtime_put_noidle(&cpsw->pdev->dev); + pm_runtime_put_noidle(cpsw->dev); return ret; } @@ -1765,7 +1765,7 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev, ret = cpsw_ale_del_mcast(priv->ale, priv->ndev->broadcast, 0, ALE_VLAN, vid); - pm_runtime_put(&cpsw->pdev->dev); + pm_runtime_put(cpsw->dev); return ret; } @@ -1809,10 
+1809,11 @@ static void cpsw_get_drvinfo(struct net_device *nd
[PATCH v3 12/13] net: ethernet: ti: cpsw: move napi struct to cpsw_common
The napi structs are common for both net devices in dual_emac mode, In order to not hold duplicate links to them, move to cpsw_common. Signed-off-by: Ivan Khoronzhuk --- drivers/net/ethernet/ti/cpsw.c | 52 ++ 1 file changed, 22 insertions(+), 30 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index ab5488b..2c2e36a 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -367,6 +367,8 @@ static inline void slave_write(struct cpsw_slave *slave, u32 val, u32 offset) struct cpsw_common { struct device *dev; struct cpsw_platform_data data; + struct napi_struct napi_rx; + struct napi_struct napi_tx; struct cpsw_ss_regs __iomem *regs; struct cpsw_wr_regs __iomem *wr_regs; u8 __iomem *hw_stats; @@ -382,8 +384,6 @@ struct cpsw_common { struct cpsw_priv { struct net_device *ndev; - struct napi_struct napi_rx; - struct napi_struct napi_tx; struct device *dev; u32 msg_enable; u32 version; @@ -488,7 +488,7 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = { #define CPSW_STATS_LEN ARRAY_SIZE(cpsw_gstrings_stats) #define ndev_to_cpsw(ndev) (((struct cpsw_priv *)netdev_priv(ndev))->cpsw) -#define napi_to_priv(napi) container_of(napi, struct cpsw_priv, napi) +#define napi_to_cpsw(napi) container_of(napi, struct cpsw_common, napi) #define for_each_slave(priv, func, arg...) 
\ do {\ struct cpsw_slave *slave; \ @@ -752,8 +752,7 @@ requeue: static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id) { - struct cpsw_priv *priv = dev_id; - struct cpsw_common *cpsw = priv->cpsw; + struct cpsw_common *cpsw = dev_id; writel(0, &cpsw->wr_regs->tx_en); cpdma_ctlr_eoi(cpsw->dma, CPDMA_EOI_TX); @@ -763,14 +762,13 @@ static irqreturn_t cpsw_tx_interrupt(int irq, void *dev_id) cpsw->tx_irq_disabled = true; } - napi_schedule(&priv->napi_tx); + napi_schedule(&cpsw->napi_tx); return IRQ_HANDLED; } static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id) { - struct cpsw_priv *priv = dev_id; - struct cpsw_common *cpsw = priv->cpsw; + struct cpsw_common *cpsw = dev_id; cpdma_ctlr_eoi(cpsw->dma, CPDMA_EOI_RX); writel(0, &cpsw->wr_regs->rx_en); @@ -780,15 +778,14 @@ static irqreturn_t cpsw_rx_interrupt(int irq, void *dev_id) cpsw->rx_irq_disabled = true; } - napi_schedule(&priv->napi_rx); + napi_schedule(&cpsw->napi_rx); return IRQ_HANDLED; } static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget) { - struct cpsw_priv*priv = napi_to_priv(napi_tx); + struct cpsw_common *cpsw = napi_to_cpsw(napi_tx); int num_tx; - struct cpsw_common *cpsw = priv->cpsw; num_tx = cpdma_chan_process(cpsw->txch, budget); if (num_tx < budget) { @@ -805,9 +802,8 @@ static int cpsw_tx_poll(struct napi_struct *napi_tx, int budget) static int cpsw_rx_poll(struct napi_struct *napi_rx, int budget) { - struct cpsw_priv*priv = napi_to_priv(napi_rx); + struct cpsw_common *cpsw = napi_to_cpsw(napi_rx); int num_rx; - struct cpsw_common *cpsw = priv->cpsw; num_rx = cpdma_chan_process(cpsw->rxch, budget); if (num_rx < budget) { @@ -1283,7 +1279,6 @@ static int cpsw_ndo_open(struct net_device *ndev) ALE_ALL_PORTS, ALE_ALL_PORTS, 0, 0); if (!cpsw_common_res_usage_state(cpsw)) { - struct cpsw_priv *priv_sl0 = cpsw_get_slave_priv(cpsw, 0); int buf_num; /* setup tx dma to fixed prio and zero offset */ @@ -1299,8 +1294,8 @@ static int cpsw_ndo_open(struct net_device *ndev) /* Enable 
internal fifo flow control */ writel(0x7, &cpsw->regs->flow_control); - napi_enable(&priv_sl0->napi_rx); - napi_enable(&priv_sl0->napi_tx); + napi_enable(&cpsw->napi_rx); + napi_enable(&cpsw->napi_tx); if (cpsw->tx_irq_disabled) { cpsw->tx_irq_disabled = false; @@ -1373,10 +1368,8 @@ static int cpsw_ndo_stop(struct net_device *ndev) netif_carrier_off(priv->ndev); if (cpsw_common_res_usage_state(cpsw) <= 1) { - struct cpsw_priv *priv_sl0 = cpsw_get_slave_priv(cpsw, 0); - - napi_disable(&priv_sl0->napi_rx); - napi_disa
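
One detail of the napi_to_cpsw() macro in this patch is easy to miss: the macro parameter is named napi, so the preprocessor substitutes the argument both as the pointer and as the member name handed to container_of(). The userspace sketch below illustrates that; the toy_* types and the simplified container_of() are assumptions made for illustration and are not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(); no type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_napi { int weight; };	/* placeholder for struct napi_struct */

struct toy_common {
	int id;
	struct toy_napi napi_rx;
	struct toy_napi napi_tx;
};

/* Same shape as napi_to_cpsw(): "napi" is replaced in BOTH positions. */
#define napi_to_common(napi) container_of(napi, struct toy_common, napi)

struct toy_common *from_rx(struct toy_napi *napi_rx)
{
	/* expands to container_of(napi_rx, struct toy_common, napi_rx) */
	return napi_to_common(napi_rx);
}

struct toy_common *from_tx(struct toy_napi *napi_tx)
{
	/* expands to container_of(napi_tx, struct toy_common, napi_tx) */
	return napi_to_common(napi_tx);
}
```

The consequence is that such a macro only compiles when the variable passed in carries exactly the name of the struct member, as napi_rx and napi_tx do in the poll handlers.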
[PATCH v3 13/13] net: ethernet: ti: cpsw: move ale, cpts and drivers params under cpsw_common
The ale, cpts, version, rx_packet_max, bus_freq, interrupt pacing parameters are common per net device that uses the same h/w. So, move them to common driver structure. Signed-off-by: Ivan Khoronzhuk --- drivers/net/ethernet/ti/cpsw.c | 235 +++-- 1 file changed, 106 insertions(+), 129 deletions(-) diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c index 2c2e36a..b4d3b41 100644 --- a/drivers/net/ethernet/ti/cpsw.c +++ b/drivers/net/ethernet/ti/cpsw.c @@ -373,28 +373,28 @@ struct cpsw_common { struct cpsw_wr_regs __iomem *wr_regs; u8 __iomem *hw_stats; struct cpsw_host_regs __iomem *host_port_regs; + u32 version; + u32 coal_intvl; + u32 bus_freq_mhz; + int rx_packet_max; struct cpsw_slave *slaves; struct cpdma_ctlr *dma; struct cpdma_chan *txch, *rxch; + struct cpsw_ale *ale; boolquirk_irq; boolrx_irq_disabled; booltx_irq_disabled; u32 irqs_table[IRQ_NUM]; + struct cpts *cpts; }; struct cpsw_priv { struct net_device *ndev; struct device *dev; u32 msg_enable; - u32 version; - u32 coal_intvl; - u32 bus_freq_mhz; - int rx_packet_max; u8 mac_addr[ETH_ALEN]; - struct cpsw_ale *ale; boolrx_pause; booltx_pause; - struct cpts *cpts; u32 emac_port; struct cpsw_common *cpsw; }; @@ -502,22 +502,16 @@ static const struct cpsw_stats cpsw_gstrings_stats[] = { n; n--) \ (func)(slave++, ##arg); \ } while (0) -#define cpsw_get_slave_priv(cpsw, __slave_no__) \ - (((__slave_no__ < cpsw->data.slaves) && \ - (cpsw->slaves[__slave_no__].ndev)) ?\ - netdev_priv(cpsw->slaves[__slave_no__].ndev) : NULL)\ -#define cpsw_dual_emac_src_port_detect(cpsw, status, priv, ndev, skb) \ +#define cpsw_dual_emac_src_port_detect(cpsw, status, ndev, skb) \ do {\ if (!cpsw->data.dual_emac) \ break; \ if (CPDMA_RX_SOURCE_PORT(status) == 1) {\ ndev = cpsw->slaves[0].ndev;\ - priv = netdev_priv(ndev); \ skb->dev = ndev;\ } else if (CPDMA_RX_SOURCE_PORT(status) == 2) { \ ndev = cpsw->slaves[1].ndev;\ - priv = netdev_priv(ndev); \ skb->dev = ndev;\ } \ } while (0) @@ -528,11 +522,11 @@ 
static const struct cpsw_stats cpsw_gstrings_stats[] = { priv->emac_port;\ int slave_port = cpsw_get_slave_port( \ slave->slave_num); \ - cpsw_ale_add_mcast(priv->ale, addr, \ + cpsw_ale_add_mcast(cpsw->ale, addr, \ 1 << slave_port | ALE_PORT_HOST,\ ALE_VLAN, slave->port_vlan, 0); \ } else {\ - cpsw_ale_add_mcast(priv->ale, addr, \ + cpsw_ale_add_mcast(cpsw->ale, addr, \ ALE_ALL_PORTS, \ 0, 0, 0); \ } \ @@ -545,9 +539,8 @@ static inline int cpsw_get_slave_port(u32 slave_num) static void cpsw_set_promiscious(struct net_device *ndev, bool enable) { - struct cpsw_priv *priv = netdev_priv(ndev); - struct cpsw_common *cpsw = priv->cpsw; - struct cpsw_ale *ale = priv->ale; +
Re: [PATCH v2] net: phy: micrel: Add specific suspend
From: Wenyou Yang
Date: Fri, 5 Aug 2016 14:35:41 +0800

> Disable all interrupts when suspend, they will be enabled
> when resume. Otherwise, the suspend/resume process will be
> blocked occasionally.
>
> Signed-off-by: Wenyou Yang
> Acked-by: Nicolas Ferre
> ---
>
> Changes in v2:
>  - Use fairly generic phydrv->config_intr() with
>    PHY_INTERRUPT_DISABLED, then call genphy_suspend().
>  - Modify kszphy_resume() with PHY_INTERRUPT_ENABLED accordingly.
>  - Add static attribute for kszphy_suspend().

Applied, thanks.
Re: [PATCH net-next v2 0/2] HFSC patches
From: Michal Soltys
Date: Wed, 3 Aug 2016 00:44:53 +0200

> Changes since v1:
>
> - in the first patch's commit message, reference old patch using kernel.org's
>   historical tree (instead of using gmane)
> - in the second patch we just remove variable
>
> Notes regarding patch #1/2:
>
> This patch syncs virtual times with fair service curve and fixes a very old
> subtle bug.
>
> The detailed explanation is in the commit message. Additionally
> I've made an illustration to help understand the issue better:
>
> http://imgur.com/a/N8uMC
>
> See the example at the bottom of the commit message - Am1_3 and Am2_3 is what
> should happen with such queue setup, Am1_3real and Am2_3real is what actually
> happens due to rtsc_min() calculating minimum from corrected and uncorrected
> curves.

I applied this series to net-next, thanks Michal.
Re: [PATCH 2/2] ravb: add sleep PM suspend/resume support
From: Niklas Söderlund
Date: Wed, 3 Aug 2016 15:56:47 +0200

> The interface would not function after the system had been woken up
> after have been suspended (echo mem > /sys/power/state) cycle. The
> reason for this is that all device registers have been reset to its
> default values. This patch adds sleep suspend and resume functions that
> detached the interface at suspend and restore the registers and reattach
> the interface at resume.
>
> Only the registers that are only configured at probe time needs to be
> explicitly restored by the resume handler. All other registers are
> reconfigured by either reopening the device in the resume handler (if
> the device was running when the system was suspended) or when the
> interface is opened by a user at a later time.
>
> Signed-off-by: Niklas Söderlund

Applied to net-next, thanks.
Re: [PATCH net-next v3 1/3] strparser: Stream parser for messages
From: Tom Herbert
Date: Mon, 1 Aug 2016 14:28:47 -0700

> +/* Lower lock held */
> +void strp_parser_err(struct strparser *strp, int err, read_descriptor_t *desc)
> +{
> +	desc->error = err;
> +	kfree_skb(strp->rx_skb_head);
> +	strp->rx_skb_head = NULL;
> +	strp_abort_rx_strp(strp, err);
> +}

Unused outside of this file, please mark "static".

> +void strp_queue_work(struct strparser *strp)
> +{
> +	queue_work(strp_wq, &strp->rx_work);
> +}

Completely unused, please remove.

Thanks.
Re: [PATCH net] net: ipv6: Fix ping to link-local addresses.
David Ahern wrote:
> On 8/9/16 1:01 AM, Erik Kline wrote:
>> On 9 August 2016 at 14:20, David Miller wrote:
>>> From: Lorenzo Colitti
>>> Date: Tue, 9 Aug 2016 10:00:25 +0900
>>>
>>>> Note that pretty much every sendmsg codepath allows other data to
>>>> take precedence over sk_bound_dev_if:
>>>>
>>>> - udpv6_sendmsg: if sin6_scope_id specified on a scoped address
>>>> - rawv6_sendmsg: if sin6_scope_id specified on a scoped address
>>>> - l2tp_ip6_sendmsg: if sin6_scope_id specified on a scoped address
>>>> - ip_cmsg_send: if IP_PKTINFO or IPV6_PKTINFO specified
>>>>
>>>> What should I do about those? -EINVAL? Ignore the conflicting data?
>>>> Leave as is?
>>>
>>> That's a good point, I guess this needs some more thought.
>>
>> I could see a point of view that says when bound_if is in play sending
>> to destinations on/via other interfaces--by any mechanism--should
>> effectively get ENETUNREACH (or something).
>
> VRF uses this capability to send on an enslaved interface. ie., socket is
> bound to VRF device to limit packets to that L3 domain and then uses PKTINFO
> to force a packet out a particular interface.
>

We could extend our code to allow enslave devices, maybe.

-- 
Hideaki Yoshifuji
Technical Division, MIRACLE LINUX CORPORATION
Re: [PATCH net] ibmveth: Disable tx queue while changing mtu
From: Thomas Falcon
Date: Tue, 9 Aug 2016 12:47:37 -0500

> If the device is running while the MTU is changed, ibmveth
> is closed and the bounce buffer is freed. If a transmission
> is sent before ibmveth can be reopened, ibmveth_start_xmit
> tries to copy to the null bounce buffer, leading to a kernel
> oops. The proposed solution disables the tx queue until
> ibmveth is restarted.
>
> Reported-by: Jan Stancek
> Tested-by: Jan Stancek
> Signed-off-by: Thomas Falcon

The bugs in the patch show clearly why this kind of non-unwindable
behavior is so undesirable.

> @@ -1378,14 +1379,18 @@ static int ibmveth_change_mtu(struct net_device *dev, int new_mtu)
>  				ibmveth_get_desired_dma
>  				(viodev));
>  			if (need_restart) {
> -				return ibmveth_open(adapter->netdev);
> +				rc = ibmveth_open(adapter->netdev);
> +				netif_wake_queue(dev);
> +				return rc;

If the open fails, the last thing in the world you should do is
wake the TX queue.

Furthermore, ibmveth_open() does netif_start_queue() so this call
should be completely unnecessary.

But fundamentally the real problem here is that the whole operation
should be done in a "prepare, commit" style transaction, so that if we
can't make the MTU change for whatever reason, the original MTU
configuration is retained and the interface stays up and operational.

The error recovery mechanism here in this function is unacceptable,
and needs to be rewritten from scratch.
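
To make the "prepare, commit" idea from this review concrete, here is a minimal userspace sketch. It is not ibmveth code: toy_adapter, toy_prepare_mtu() and toy_change_mtu() are hypothetical names, and the 14 bytes of header room are an assumption. The property it tries to show is that everything that can fail happens before any live state is touched, so a failed change leaves the old configuration fully intact.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an adapter; not the ibmveth structures. */
struct toy_adapter {
	int mtu;
	unsigned char *bounce_buffer;
	size_t bounce_len;
	int running;
};

/* Prepare: allocate what the new MTU needs without touching live state. */
unsigned char *toy_prepare_mtu(int new_mtu, size_t *len)
{
	if (new_mtu <= 0)
		return NULL;
	*len = (size_t)new_mtu + 14;	/* assumed Ethernet header room */
	return malloc(*len);
}

/* Returns 0 on success; on failure the adapter is left exactly as it was. */
int toy_change_mtu(struct toy_adapter *a, int new_mtu)
{
	size_t len = 0;
	unsigned char *buf = toy_prepare_mtu(new_mtu, &len);

	if (!buf)
		return -1;	/* nothing has been committed yet */

	/* Commit: only steps that cannot fail. */
	free(a->bounce_buffer);
	a->bounce_buffer = buf;
	a->bounce_len = len;
	a->mtu = new_mtu;
	return 0;
}
```

In a real driver the commit step would also have to be serialized against the transmit path; that locking is deliberately left out of this sketch.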
Re: [PATCH v2] net: dsa: b53: constify b53_io_ops structures
From: Julia Lawall
Date: Tue, 9 Aug 2016 19:09:45 +0200

> The b53_io_ops structures are never modified, so declare them as const.
>
> Done with the help of Coccinelle.
>
> Signed-off-by: Julia Lawall
>
> ---
> v2: Refer to the right structure in the commit message

Applied to net-next, thanks.
Re: [PATCH] dm9000: Fix irq trigger type setup on non-dt platforms
From: Robert Jarzmik
Date: Tue, 09 Aug 2016 19:20:44 +0200

> Sylwester Nawrocki writes:
>
>> Commit b5a099c67a1c36b "net: ethernet: davicom: fix devicetree irq
>> resource" causes an interrupt storm after the ethernet interface
>> is activated on S3C24XX platform (ARM non-dt), due to the interrupt
>> trigger type not being set properly.
>>
>> It seems, after adding parsing of IRQ flags in commit 7085a7401ba54e92b
>> "drivers: platform: parse IRQ flags from resources", there is no path
>> for non-dt platforms where irq_set_type callback could be invoked when
>> we don't pass the trigger type flags to the request_irq() call.
>>
>> In case of a board where the regression is seen the interrupt trigger
>> type flags are passed through a platform device's resource and it is
>> not currently handled properly without passing the irq trigger type
>> flags to the request_irq() call. In case of OF an of_irq_get() call
>> within platform_get_irq() function seems to be ensuring required irq_chip
>> setup, but there is no equivalent code for non OF/ACPI platforms.
>>
>> This patch mostly restores irq trigger type setting code which has been
>> removed in commit ("net: ethernet: davicom: fix devicetree irq resource").
>>
>> Fixes: b5a099c67a1c36b913 ("net: ethernet: davicom: fix devicetree irq
>> resource")
>>
>> Signed-off-by: Sylwester Nawrocki
>> ---
>>
>> Perhaps instead the core could be configuring the irqchip automatically as it
>> is done for OF/ACPI cases. I had doubts though if trying to make such changes
>> for a bug fix patch was the right thing to do.
> Hi Sylvester,
>
> You're right, and I came to the same conclusion a bit earlier, in [1], but I
> didn't notice my FAI didn't actually send the mail. Your analysis of the core in
> non-OF/ACPI case is the reason I didn't post a patch for dm9000 ... I was
> overconfident in finding a reason in irq core code within a couple of days.
>
> Therefore:
> Acked-by: Robert Jarzmik

Applied.
Re: [PATCH net-next] ppp: build ifname using unit identifier for rtnl based devices
From: Guillaume Nault
Date: Tue, 9 Aug 2016 15:12:26 +0200

> Userspace programs generally need to know the name of the ppp devices
> they create. Both ioctl and rtnl interfaces use the ppp scheme
> to name them. But although the suffix used by the ioctl interface can
> be known by userspace (it's the PPP unit identifier returned by the
> PPPIOCGUNIT ioctl), the one used by the rtnl is only known by the
> kernel.
>
> This patch brings more consistency between ioctl and rtnl based ppp
> devices by generating device names using the PPP unit identifier as
> suffix in both cases. This way, userspace can always infer the name of
> the devices they create.
>
> Signed-off-by: Guillaume Nault

Applied, thanks.
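
As an illustration of what "infer the name" means for userspace here, the sketch below builds an interface name from a unit identifier. It assumes the name is the literal prefix "ppp" followed by the decimal unit number, which is how the commit message reads; toy_ppp_ifname() and TOY_IFNAMSIZ are hypothetical names, the latter assumed to mirror the kernel's IFNAMSIZ of 16.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_IFNAMSIZ 16	/* assumed to mirror the kernel's IFNAMSIZ */

/* Returns 0 on success, -1 if the unit is invalid or the name does not fit. */
int toy_ppp_ifname(char *buf, size_t len, int unit)
{
	int n;

	if (unit < 0)
		return -1;
	n = snprintf(buf, len, "ppp%d", unit);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A program that learned its unit number, for example from the PPPIOCGUNIT ioctl mentioned above, could pass it to such a helper instead of asking the kernel for the device name separately.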
[PATCH 2/2] net: ethernet: renesas: sh_eth: use new api ethtool_{get|set}_link_ksettings
The ethtool API {get|set}_settings is deprecated.
Move this driver to the new API {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes
---
 drivers/net/ethernet/renesas/sh_eth.c |   18 +++++++++---------
 1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c
index 901ed36..1f8240a 100644
--- a/drivers/net/ethernet/renesas/sh_eth.c
+++ b/drivers/net/ethernet/renesas/sh_eth.c
@@ -1817,8 +1817,8 @@ static int sh_eth_phy_start(struct net_device *ndev)
 	return 0;
 }
 
-static int sh_eth_get_settings(struct net_device *ndev,
-			       struct ethtool_cmd *ecmd)
+static int sh_eth_get_link_ksettings(struct net_device *ndev,
+				     struct ethtool_link_ksettings *cmd)
 {
 	struct sh_eth_private *mdp = netdev_priv(ndev);
 	unsigned long flags;
@@ -1828,14 +1828,14 @@ static int sh_eth_get_settings(struct net_device *ndev,
 		return -ENODEV;
 
 	spin_lock_irqsave(&mdp->lock, flags);
-	ret = phy_ethtool_gset(ndev->phydev, ecmd);
+	ret = phy_ethtool_ksettings_get(ndev->phydev, cmd);
 	spin_unlock_irqrestore(&mdp->lock, flags);
 
 	return ret;
 }
 
-static int sh_eth_set_settings(struct net_device *ndev,
-			       struct ethtool_cmd *ecmd)
+static int sh_eth_set_link_ksettings(struct net_device *ndev,
+				     const struct ethtool_link_ksettings *cmd)
 {
 	struct sh_eth_private *mdp = netdev_priv(ndev);
 	unsigned long flags;
@@ -1849,11 +1849,11 @@ static int sh_eth_set_settings(struct net_device *ndev,
 	/* disable tx and rx */
 	sh_eth_rcv_snd_disable(ndev);
 
-	ret = phy_ethtool_sset(ndev->phydev, ecmd);
+	ret = phy_ethtool_ksettings_set(ndev->phydev, cmd);
 	if (ret)
 		goto error_exit;
 
-	if (ecmd->duplex == DUPLEX_FULL)
+	if (cmd->base.duplex == DUPLEX_FULL)
 		mdp->duplex = 1;
 	else
 		mdp->duplex = 0;
@@ -2195,8 +2195,6 @@ static int sh_eth_set_ringparam(struct net_device *ndev,
 }
 
 static const struct ethtool_ops sh_eth_ethtool_ops = {
-	.get_settings	= sh_eth_get_settings,
-	.set_settings	= sh_eth_set_settings,
 	.get_regs_len	= sh_eth_get_regs_len,
 	.get_regs	= sh_eth_get_regs,
 	.nway_reset	= sh_eth_nway_reset,
@@ -2208,6 +2206,8 @@ static const struct ethtool_ops sh_eth_ethtool_ops = {
 	.get_sset_count     = sh_eth_get_sset_count,
 	.get_ringparam	= sh_eth_get_ringparam,
 	.set_ringparam	= sh_eth_set_ringparam,
+	.get_link_ksettings = sh_eth_get_link_ksettings,
+	.set_link_ksettings = sh_eth_set_link_ksettings,
 };
 
 /* network device open function */
-- 
1.7.4.4
[PATCH 1/2] net: ethernet: renesas: sh_eth: use phydev from struct net_device
The private structure contain a pointer to phydev, but the structure net_device already contain such pointer. So we can remove the pointer phy_dev in the private structure, and update the driver to use the one contained in struct net_device. Signed-off-by: Philippe Reynes --- drivers/net/ethernet/renesas/sh_eth.c | 29 - drivers/net/ethernet/renesas/sh_eth.h |1 - 2 files changed, 12 insertions(+), 18 deletions(-) diff --git a/drivers/net/ethernet/renesas/sh_eth.c b/drivers/net/ethernet/renesas/sh_eth.c index 799d58d..901ed36 100644 --- a/drivers/net/ethernet/renesas/sh_eth.c +++ b/drivers/net/ethernet/renesas/sh_eth.c @@ -1723,7 +1723,7 @@ out: static void sh_eth_adjust_link(struct net_device *ndev) { struct sh_eth_private *mdp = netdev_priv(ndev); - struct phy_device *phydev = mdp->phydev; + struct phy_device *phydev = ndev->phydev; int new_state = 0; if (phydev->link) { @@ -1800,22 +1800,19 @@ static int sh_eth_phy_init(struct net_device *ndev) phy_attached_info(phydev); - mdp->phydev = phydev; - return 0; } /* PHY control start function */ static int sh_eth_phy_start(struct net_device *ndev) { - struct sh_eth_private *mdp = netdev_priv(ndev); int ret; ret = sh_eth_phy_init(ndev); if (ret) return ret; - phy_start(mdp->phydev); + phy_start(ndev->phydev); return 0; } @@ -1827,11 +1824,11 @@ static int sh_eth_get_settings(struct net_device *ndev, unsigned long flags; int ret; - if (!mdp->phydev) + if (!ndev->phydev) return -ENODEV; spin_lock_irqsave(&mdp->lock, flags); - ret = phy_ethtool_gset(mdp->phydev, ecmd); + ret = phy_ethtool_gset(ndev->phydev, ecmd); spin_unlock_irqrestore(&mdp->lock, flags); return ret; @@ -1844,7 +1841,7 @@ static int sh_eth_set_settings(struct net_device *ndev, unsigned long flags; int ret; - if (!mdp->phydev) + if (!ndev->phydev) return -ENODEV; spin_lock_irqsave(&mdp->lock, flags); @@ -1852,7 +1849,7 @@ static int sh_eth_set_settings(struct net_device *ndev, /* disable tx and rx */ sh_eth_rcv_snd_disable(ndev); - ret = 
phy_ethtool_sset(mdp->phydev, ecmd); + ret = phy_ethtool_sset(ndev->phydev, ecmd); if (ret) goto error_exit; @@ -2067,11 +2064,11 @@ static int sh_eth_nway_reset(struct net_device *ndev) unsigned long flags; int ret; - if (!mdp->phydev) + if (!ndev->phydev) return -ENODEV; spin_lock_irqsave(&mdp->lock, flags); - ret = phy_start_aneg(mdp->phydev); + ret = phy_start_aneg(ndev->phydev); spin_unlock_irqrestore(&mdp->lock, flags); return ret; @@ -2408,10 +2405,9 @@ static int sh_eth_close(struct net_device *ndev) sh_eth_dev_exit(ndev); /* PHY Disconnect */ - if (mdp->phydev) { - phy_stop(mdp->phydev); - phy_disconnect(mdp->phydev); - mdp->phydev = NULL; + if (ndev->phydev) { + phy_stop(ndev->phydev); + phy_disconnect(ndev->phydev); } free_irq(ndev->irq, ndev); @@ -2429,8 +2425,7 @@ static int sh_eth_close(struct net_device *ndev) /* ioctl to device function */ static int sh_eth_do_ioctl(struct net_device *ndev, struct ifreq *rq, int cmd) { - struct sh_eth_private *mdp = netdev_priv(ndev); - struct phy_device *phydev = mdp->phydev; + struct phy_device *phydev = ndev->phydev; if (!netif_running(ndev)) return -EINVAL; diff --git a/drivers/net/ethernet/renesas/sh_eth.h b/drivers/net/ethernet/renesas/sh_eth.h index c62380e..d050f37 100644 --- a/drivers/net/ethernet/renesas/sh_eth.h +++ b/drivers/net/ethernet/renesas/sh_eth.h @@ -518,7 +518,6 @@ struct sh_eth_private { /* MII transceiver section. */ u32 phy_id; /* PHY ID */ struct mii_bus *mii_bus;/* MDIO bus control */ - struct phy_device *phydev; /* PHY device control */ int link; phy_interface_t phy_interface; int msg_enable; -- 1.7.4.4
Re: [PATCH 1/1] bonding: fix the typo
From: zyjzyj2...@gmail.com
Date: Tue, 9 Aug 2016 21:36:04 +0800

> From: Zhu Yanjun
>
> The message "803.ad" should be "802.3ad".
>
> Signed-off-by: Zhu Yanjun

Applied.
Re: [PATCH net-next] net: Remove fib_local variable
From: David Ahern
Date: Tue, 9 Aug 2016 06:51:06 -0700

> After commit 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse")
> fib_local is set but not used. Remove it.
>
> Signed-off-by: David Ahern

Applied, thanks David.
Re: [RFC PATCH v5 2/3] Documentation: DT: net: Add Xilinx gmiitorgmii converter device tree binding documentation
On 08/09/2016 02:34 AM, Kedareswara rao Appana wrote:
> Device-tree binding documentation for xilinx gmiitorgmii converter.
>
> Signed-off-by: Kedareswara rao Appana
> ---
> Changes for v5:
> ---> Fixed Indentation in the example as suggested by Michal.
> Changes for v4:
> --> Modified compatible as suggested by Rob.
> --> Removed underscores from the converter node name as suggested by Rob.
> Changes for v3:
> --> None.
> Changes for v2:
> --> New patch.
>
>  .../devicetree/bindings/net/xilinx_gmii2rgmii.txt | 38 ++
>  1 file changed, 38 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt
>
> diff --git a/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt b/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt
> new file mode 100644
> index 000..5f48793
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/xilinx_gmii2rgmii.txt
> @@ -0,0 +1,38 @@
> +XILINX GMIITORGMII Converter Driver Device Tree Bindings
> +
> +The Gigabit Media Independent Interface (GMII) to Reduced Gigabit Media
> +Independent Interface (RGMII) core provides the RGMII between RGMII-compliant
> +Ethernet physical media devices (PHY) and the Gigabit Ethernet controller.
> +This core can be used in all three modes of operation(10/100/1000 Mb/s).
> +The Management Data Input/Output (MDIO) interface is used to configure the
> +Speed of operation. This core can switch dynamically between the three
> +Different speed modes by configuring the converter register through mdio write.
> +
> +The MDIO is a bus to which the PHY devices are connected. For each
> +device that exists on this bus, a child node should be created. See
> +the definition of the PHY node in booting-without-of.txt for an example
> +of how to define a PHY.

I would skip this paragraph, which does not really help with understanding, and just refer to Documentation/devicetree/bindings/net/phy.txt for examples.

> +
> +This converter sits between the ethernet MAC and the external phy.
> +MAC <==> GMII2RGMII <==> RGMII_PHY
> +
> +Required properties:
> +- compatible : Should be "xlnx,gmii-to-rgmii-1.0"
> +- reg: The ID number for the phy, usually a small integer

You would want to specify that the "reg" property needs to match that of the PHY (specified via phy-handle) you are converting to/from for this "proxy" piece of hardware to work.

If these two have the same "reg" value, isn't that going to lead to duplicate MDIO devices being created on the bus? This may work, depending on probe ordering, but it seems unusual. It seems you don't really need the "reg" property here?

> +- phy-handle : Should point to the external phy device.
> +  See ethernet.txt file in the same directory.
> +
> +Example:
> +	mdio {
> +		#address-cells = <1>;
> +		#size-cells = <0>;
> +		phy: ethernet-phy@0 {
> +			..
> +		};
> +		gmiitorgmii: gmiitorgmii@8 {
> +			compatible = "xlnx,gmii-to-rgmii-1.0";
> +			reg = <8>;
> +			phy-handle = <&phy>;
> +		};
> +	};
>

--
Florian
[PATCH iproute v3 0/5] iproute: ila and fou additions
Patch set includes:
  - Allow configuring checksum mode for ila LWT (e.g. configure
    checksum neutral)
  - Configuration for performing ila translations using netfilter hook
  - fou encapsulation for ip6tnl and gre6
  - fou listener for IPv6

v2:
  - Fixed coding style issues

v3:
  - Fixed uninitialized variable warning in ipila.c

Tom Herbert (5):
  ila: Support for checksum neutral translation
  ila: Support for configuring ila to use netfilter hook
  ip6tnl: Support for fou encapsulation
  gre6: Support for fou encapsulation
  fou: Allowing configuring IPv6 listener

 ip/Makefile           |   2 +-
 ip/ip.c               |   3 +-
 ip/ip_common.h        |   1 +
 ip/ipfou.c            |   9 +-
 ip/ipila.c            | 268 ++
 ip/iproute_lwtunnel.c |  58 ++-
 ip/link_gre.c         |   2 +-
 ip/link_gre6.c        | 101 +++
 ip/link_ip6tnl.c      |  92 -
 9 files changed, 528 insertions(+), 8 deletions(-)
 create mode 100644 ip/ipila.c

--
2.8.0.rc2
[PATCH iproute v3 1/5] ila: Support for checksum neutral translation
Add configuration of ila LWT tunnels for checksum mode including checksum neutral translation. Signed-off-by: Tom Herbert --- ip/iproute_lwtunnel.c | 58 +-- 1 file changed, 56 insertions(+), 2 deletions(-) diff --git a/ip/iproute_lwtunnel.c b/ip/iproute_lwtunnel.c index bdbb15d..b656143 100644 --- a/ip/iproute_lwtunnel.c +++ b/ip/iproute_lwtunnel.c @@ -90,6 +90,32 @@ static void print_encap_ip(FILE *fp, struct rtattr *encap) fprintf(fp, "tos %d ", rta_getattr_u8(tb[LWTUNNEL_IP_TOS])); } +static char *ila_csum_mode2name(__u8 csum_mode) +{ + switch (csum_mode) { + case ILA_CSUM_ADJUST_TRANSPORT: + return "adj-transport"; + case ILA_CSUM_NEUTRAL_MAP: + return "neutral-map"; + case ILA_CSUM_NO_ACTION: + return "no-action"; + default: + return "unknown"; + } +} + +static __u8 ila_csum_name2mode(char *name) +{ + if (strcmp(name, "adj-transport") == 0) + return ILA_CSUM_ADJUST_TRANSPORT; + else if (strcmp(name, "neutral-map") == 0) + return ILA_CSUM_NEUTRAL_MAP; + else if (strcmp(name, "no-action") == 0) + return ILA_CSUM_NO_ACTION; + else + return -1; +} + static void print_encap_ila(FILE *fp, struct rtattr *encap) { struct rtattr *tb[ILA_ATTR_MAX+1]; @@ -103,6 +129,10 @@ static void print_encap_ila(FILE *fp, struct rtattr *encap) abuf, sizeof(abuf)); fprintf(fp, " %s ", abuf); } + + if (tb[ILA_ATTR_CSUM_MODE]) + fprintf(fp, " csum-mode %s ", + ila_csum_mode2name(rta_getattr_u8(tb[ILA_ATTR_CSUM_MODE]))); } static void print_encap_ip6(FILE *fp, struct rtattr *encap) @@ -246,10 +276,34 @@ static int parse_encap_ila(struct rtattr *rta, size_t len, exit(1); } + argc--; argv++; + rta_addattr64(rta, 1024, ILA_ATTR_LOCATOR, locator); - *argcp = argc; - *argvp = argv; + while (argc > 0) { + if (strcmp(*argv, "csum-mode") == 0) { + __u8 csum_mode; + + NEXT_ARG(); + + csum_mode = ila_csum_name2mode(*argv); + if (csum_mode < 0) + invarg("\"csum-mode\" value is invalid\n", *argv); + + rta_addattr8(rta, 1024, ILA_ATTR_CSUM_MODE, csum_mode); + + argc--; argv++; + } else { + break; + } 
+ } + + /* argv is currently the first unparsed argument, +* but the lwt_parse_encap() caller will move to the next, +* so step back +*/ + *argcp = argc + 1; + *argvp = argv - 1; return 0; } -- 2.8.0.rc2
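A note on the csum-mode parsing in this patch: ila_csum_name2mode() is declared to return __u8 and its result is stored in a __u8, so the -1 returned for an unknown name becomes 255 and the later `csum_mode < 0` test can never be true. The following is a minimal standalone sketch of one way the lookup could report errors, not the submitted code. The SK_-prefixed names are hypothetical stand-ins for the ILA_CSUM_* constants, and the helper names are invented for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the ILA_CSUM_* values used by the patch. */
enum sk_csum_mode {
	SK_CSUM_ADJUST_TRANSPORT,
	SK_CSUM_NEUTRAL_MAP,
	SK_CSUM_NO_ACTION,
};

/* Map a mode name to its value. Returning int (not an unsigned 8-bit
 * type) lets the caller see -1 for an unknown name.
 */
static int sk_csum_name2mode(const char *name)
{
	if (strcmp(name, "adj-transport") == 0)
		return SK_CSUM_ADJUST_TRANSPORT;
	if (strcmp(name, "neutral-map") == 0)
		return SK_CSUM_NEUTRAL_MAP;
	if (strcmp(name, "no-action") == 0)
		return SK_CSUM_NO_ACTION;
	return -1;
}

/* Shows the pitfall: once narrowed to an unsigned 8-bit value, -1 is
 * stored as 255, so a "< 0" check on it is always false.
 */
static int sk_narrowed_is_negative(const char *name)
{
	uint8_t narrowed = (uint8_t)sk_csum_name2mode(name);

	return narrowed < 0;
}
```

With this shape, a caller would keep the result in an int, reject negative values, and only then narrow it to 8 bits for the netlink attribute.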
[PATCH iproute v3 2/5] ila: Support for configuring ila to use netfilter hook
Signed-off-by: Tom Herbert --- ip/Makefile| 2 +- ip/ip.c| 3 +- ip/ip_common.h | 1 + ip/ipila.c | 268 + 4 files changed, 272 insertions(+), 2 deletions(-) create mode 100644 ip/ipila.c diff --git a/ip/Makefile b/ip/Makefile index 33e9286..86c8cdc 100644 --- a/ip/Makefile +++ b/ip/Makefile @@ -7,7 +7,7 @@ IPOBJ=ip.o ipaddress.o ipaddrlabel.o iproute.o iprule.o ipnetns.o \ iplink_vxlan.o tcp_metrics.o iplink_ipoib.o ipnetconf.o link_ip6tnl.o \ link_iptnl.o link_gre6.o iplink_bond.o iplink_bond_slave.o iplink_hsr.o \ iplink_bridge.o iplink_bridge_slave.o ipfou.o iplink_ipvlan.o \ -iplink_geneve.o iplink_vrf.o iproute_lwtunnel.o ipmacsec.o +iplink_geneve.o iplink_vrf.o iproute_lwtunnel.o ipmacsec.o ipila.o RTMONOBJ=rtmon.o diff --git a/ip/ip.c b/ip/ip.c index 166ef17..cb3adcb 100644 --- a/ip/ip.c +++ b/ip/ip.c @@ -51,7 +51,7 @@ static void usage(void) " ip [ -force ] -batch filename\n" "where OBJECT := { link | address | addrlabel | route | rule | neigh | ntable |\n" " tunnel | tuntap | maddress | mroute | mrule | monitor | xfrm |\n" -" netns | l2tp | fou | macsec | tcp_metrics | token | netconf }\n" +" netns | l2tp | fou | macsec | tcp_metrics | token | netconf | ila }\n" " OPTIONS := { -V[ersion] | -s[tatistics] | -d[etails] | -r[esolve] |\n" "-h[uman-readable] | -iec |\n" "-f[amily] { inet | inet6 | ipx | dnet | mpls | bridge | link } |\n" @@ -84,6 +84,7 @@ static const struct cmd { { "link", do_iplink }, { "l2tp", do_ipl2tp }, { "fou",do_ipfou }, + { "ila",do_ipila }, { "macsec", do_ipmacsec }, { "tunnel", do_iptunnel }, { "tunl", do_iptunnel }, diff --git a/ip/ip_common.h b/ip/ip_common.h index c818812..93ff5bc 100644 --- a/ip/ip_common.h +++ b/ip/ip_common.h @@ -52,6 +52,7 @@ int do_netns(int argc, char **argv); int do_xfrm(int argc, char **argv); int do_ipl2tp(int argc, char **argv); int do_ipfou(int argc, char **argv); +extern int do_ipila(int argc, char **argv); int do_tcp_metrics(int argc, char **argv); int do_ipnetconf(int argc, char **argv); int 
do_iptoken(int argc, char **argv); diff --git a/ip/ipila.c b/ip/ipila.c new file mode 100644 index 000..42da9f2 --- /dev/null +++ b/ip/ipila.c @@ -0,0 +1,268 @@ +/* + * ipila.c ILA (Identifier Locator Addressing) support + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version + * 2 of the License, or (at your option) any later version. + * + * Authors:Tom Herbert + */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include "libgenl.h" +#include "utils.h" +#include "ip_common.h" + +static void usage(void) +{ + fprintf(stderr, "Usage: ip ila add loc_match LOCATOR_MATCH " + "loc LOCATOR [ dev DEV ]\n"); + fprintf(stderr, " ip ila del loc_match LOCATOR_MATCH " + "[ loc LOCATOR ] [ dev DEV ]\n"); + fprintf(stderr, " ip ila list\n"); + fprintf(stderr, "\n"); + + exit(-1); +} + +/* netlink socket */ +static struct rtnl_handle genl_rth = { .fd = -1 }; +static int genl_family = -1; + +#define ILA_REQUEST(_req, _bufsiz, _cmd, _flags) \ + GENL_REQUEST(_req, _bufsiz, genl_family, 0, \ +ILA_GENL_VERSION, _cmd, _flags) + +#define ILA_RTA(g) ((struct rtattr *)(((char *)(g)) + \ + NLMSG_ALIGN(sizeof(struct genlmsghdr + +#define ADDR_BUF_SIZE sizeof(":::") + +static int print_addr64(__u64 addr, char *buff, size_t len) +{ + __u16 *words = (__u16 *)&addr; + __u16 v; + int i, ret; + size_t written = 0; + char *sep = ":"; + + for (i = 0; i < 4; i++) { + v = ntohs(words[i]); + + if (i == 3) + sep = ""; + + ret = snprintf(&buff[written], len - written, "%x%s", v, sep); + if (ret < 0) + return ret; + + written += ret; + } + + return written; +} + +static void print_ila_locid(FILE *fp, int attr, struct rtattr *tb[], int space) +{ + char abuf[256]; + size_t blen; + int i; + + if (tb[attr]) { + blen = print_addr64(rta_getattr_u32(tb[attr]), + abuf, sizeof(abuf)); + fprintf(fp, "%s", abuf); + } else { + 
fprintf(fp, "-"); + blen = 1; + } + + for (i = 0; i < space - blen; i++) + fprintf(fp, " "); +} + +static int print_ila_mapping(const struct sockaddr_nl *who, +
[PATCH iproute v3 4/5] gre6: Support for fou encapsulation
Signed-off-by: Tom Herbert --- ip/link_gre.c | 2 +- ip/link_gre6.c | 101 + 2 files changed, 102 insertions(+), 1 deletion(-) diff --git a/ip/link_gre.c b/ip/link_gre.c index 5dc4067..3b99e56 100644 --- a/ip/link_gre.c +++ b/ip/link_gre.c @@ -429,7 +429,7 @@ static void gre_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[]) fputs("external ", f); if (tb[IFLA_GRE_ENCAP_TYPE] && - *(__u16 *)RTA_DATA(tb[IFLA_GRE_ENCAP_TYPE]) != TUNNEL_ENCAP_NONE) { + rta_getattr_u16(tb[IFLA_GRE_ENCAP_TYPE]) != TUNNEL_ENCAP_NONE) { __u16 type = rta_getattr_u16(tb[IFLA_GRE_ENCAP_TYPE]); __u16 flags = rta_getattr_u16(tb[IFLA_GRE_ENCAP_FLAGS]); __u16 sport = rta_getattr_u16(tb[IFLA_GRE_ENCAP_SPORT]); diff --git a/ip/link_gre6.c b/ip/link_gre6.c index 6767ef6..d00db1f 100644 --- a/ip/link_gre6.c +++ b/ip/link_gre6.c @@ -38,6 +38,9 @@ static void print_usage(FILE *f) fprintf(f, " [ hoplimit TTL ] [ encaplimit ELIM ]\n"); fprintf(f, " [ tclass TCLASS ] [ flowlabel FLOWLABEL ]\n"); fprintf(f, " [ dscp inherit ] [ dev PHYS_DEV ]\n"); + fprintf(f, " [ noencap ] [ encap { fou | gue | none } ]\n"); + fprintf(f, " [ encap-sport PORT ] [ encap-dport PORT ]\n"); + fprintf(f, " [ [no]encap-csum ] [ [no]encap-csum6 ] [ [no]encap-remcsum ]\n"); fprintf(f, "\n"); fprintf(f, "Where: NAME := STRING\n"); fprintf(f, " ADDR := IPV6_ADDRESS\n"); @@ -86,6 +89,10 @@ static int gre_parse_opt(struct link_util *lu, int argc, char **argv, unsigned int flags = 0; __u8 hop_limit = DEFAULT_TNL_HOP_LIMIT; __u8 encap_limit = IPV6_DEFAULT_TNL_ENCAP_LIMIT; + __u16 encaptype = 0; + __u16 encapflags = TUNNEL_ENCAP_FLAG_CSUM6; + __u16 encapsport = 0; + __u16 encapdport = 0; int len; if (!(n->nlmsg_flags & NLM_F_CREATE)) { @@ -146,6 +153,18 @@ get_failed: if (greinfo[IFLA_GRE_FLAGS]) flags = rta_getattr_u32(greinfo[IFLA_GRE_FLAGS]); + + if (greinfo[IFLA_GRE_ENCAP_TYPE]) + encaptype = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_TYPE]); + + if (greinfo[IFLA_GRE_ENCAP_FLAGS]) + encapflags = 
rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_FLAGS]); + + if (greinfo[IFLA_GRE_ENCAP_SPORT]) + encapsport = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_SPORT]); + + if (greinfo[IFLA_GRE_ENCAP_DPORT]) + encapdport = rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_DPORT]); } while (argc > 0) { @@ -277,6 +296,40 @@ get_failed: if (strcmp(*argv, "inherit") != 0) invarg("not inherit", *argv); flags |= IP6_TNL_F_RCV_DSCP_COPY; + } else if (strcmp(*argv, "noencap") == 0) { + encaptype = TUNNEL_ENCAP_NONE; + } else if (strcmp(*argv, "encap") == 0) { + NEXT_ARG(); + if (strcmp(*argv, "fou") == 0) + encaptype = TUNNEL_ENCAP_FOU; + else if (strcmp(*argv, "gue") == 0) + encaptype = TUNNEL_ENCAP_GUE; + else if (strcmp(*argv, "none") == 0) + encaptype = TUNNEL_ENCAP_NONE; + else + invarg("Invalid encap type.", *argv); + } else if (strcmp(*argv, "encap-sport") == 0) { + NEXT_ARG(); + if (strcmp(*argv, "auto") == 0) + encapsport = 0; + else if (get_u16(&encapsport, *argv, 0)) + invarg("Invalid source port.", *argv); + } else if (strcmp(*argv, "encap-dport") == 0) { + NEXT_ARG(); + if (get_u16(&encapdport, *argv, 0)) + invarg("Invalid destination port.", *argv); + } else if (strcmp(*argv, "encap-csum") == 0) { + encapflags |= TUNNEL_ENCAP_FLAG_CSUM; + } else if (strcmp(*argv, "noencap-csum") == 0) { + encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM; + } else if (strcmp(*argv, "encap-udp6-csum") == 0) { + encapflags |= TUNNEL_ENCAP_FLAG_CSUM6; + } else if (strcmp(*argv, "noencap-udp6-csum") == 0) { + encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM6; + } else if (strcmp(*argv, "encap-remcsum") == 0) { + encapflags |= TUNNEL_ENCAP_FLAG_REMCSUM; + } else if (strcmp(*argv, "noencap-remcsum") == 0) { +
[PATCH iproute v3 3/5] ip6tnl: Support for fou encapsulation
Signed-off-by: Tom Herbert --- ip/link_ip6tnl.c | 92 +++- 1 file changed, 91 insertions(+), 1 deletion(-) diff --git a/ip/link_ip6tnl.c b/ip/link_ip6tnl.c index 89861c6..59162a3 100644 --- a/ip/link_ip6tnl.c +++ b/ip/link_ip6tnl.c @@ -37,6 +37,9 @@ static void print_usage(FILE *f) fprintf(f, " [ dev PHYS_DEV ] [ encaplimit ELIM ]\n"); fprintf(f, " [ hoplimit HLIM ] [ tclass TCLASS ] [ flowlabel FLOWLABEL ]\n"); fprintf(f, " [ dscp inherit ] [ fwmark inherit ]\n"); + fprintf(f, " [ noencap ] [ encap { fou | gue | none } ]\n"); + fprintf(f, " [ encap-sport PORT ] [ encap-dport PORT ]\n"); + fprintf(f, " [ [no]encap-csum ] [ [no]encap-csum6 ] [ [no]encap-remcsum ]\n"); fprintf(f, "\n"); fprintf(f, "Where: NAME := STRING\n"); fprintf(f, " ADDR := IPV6_ADDRESS\n"); @@ -82,6 +85,10 @@ static int ip6tunnel_parse_opt(struct link_util *lu, int argc, char **argv, __u32 flags = 0; __u32 link = 0; __u8 proto = 0; + __u16 encaptype = 0; + __u16 encapflags = TUNNEL_ENCAP_FLAG_CSUM6; + __u16 encapsport = 0; + __u16 encapdport = 0; if (!(n->nlmsg_flags & NLM_F_CREATE)) { if (rtnl_talk(&rth, &req.n, &req.n, sizeof(req)) < 0) { @@ -182,7 +189,7 @@ get_failed: if (get_u8(&uval, *argv, 0)) invarg("invalid HLIM", *argv); hop_limit = uval; - } else if (matches(*argv, "encaplimit") == 0) { + } else if (strcmp(*argv, "encaplimit") == 0) { NEXT_ARG(); if (strcmp(*argv, "none") == 0) { flags |= IP6_TNL_F_IGN_ENCAP_LIMIT; @@ -236,6 +243,40 @@ get_failed: if (strcmp(*argv, "inherit") != 0) invarg("not inherit", *argv); flags |= IP6_TNL_F_USE_ORIG_FWMARK; + } else if (strcmp(*argv, "noencap") == 0) { + encaptype = TUNNEL_ENCAP_NONE; + } else if (strcmp(*argv, "encap") == 0) { + NEXT_ARG(); + if (strcmp(*argv, "fou") == 0) + encaptype = TUNNEL_ENCAP_FOU; + else if (strcmp(*argv, "gue") == 0) + encaptype = TUNNEL_ENCAP_GUE; + else if (strcmp(*argv, "none") == 0) + encaptype = TUNNEL_ENCAP_NONE; + else + invarg("Invalid encap type.", *argv); + } else if (strcmp(*argv, "encap-sport") == 0) { + 
NEXT_ARG(); + if (strcmp(*argv, "auto") == 0) + encapsport = 0; + else if (get_u16(&encapsport, *argv, 0)) + invarg("Invalid source port.", *argv); + } else if (strcmp(*argv, "encap-dport") == 0) { + NEXT_ARG(); + if (get_u16(&encapdport, *argv, 0)) + invarg("Invalid destination port.", *argv); + } else if (strcmp(*argv, "encap-csum") == 0) { + encapflags |= TUNNEL_ENCAP_FLAG_CSUM; + } else if (strcmp(*argv, "noencap-csum") == 0) { + encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM; + } else if (strcmp(*argv, "encap-udp6-csum") == 0) { + encapflags |= TUNNEL_ENCAP_FLAG_CSUM6; + } else if (strcmp(*argv, "noencap-udp6-csum") == 0) { + encapflags &= ~TUNNEL_ENCAP_FLAG_CSUM6; + } else if (strcmp(*argv, "encap-remcsum") == 0) { + encapflags |= TUNNEL_ENCAP_FLAG_REMCSUM; + } else if (strcmp(*argv, "noencap-remcsum") == 0) { + encapflags &= ~TUNNEL_ENCAP_FLAG_REMCSUM; } else usage(); argc--, argv++; @@ -250,6 +291,11 @@ get_failed: addattr32(n, 1024, IFLA_IPTUN_FLAGS, flags); addattr32(n, 1024, IFLA_IPTUN_LINK, link); + addattr16(n, 1024, IFLA_IPTUN_ENCAP_TYPE, encaptype); + addattr16(n, 1024, IFLA_IPTUN_ENCAP_FLAGS, encapflags); + addattr16(n, 1024, IFLA_IPTUN_ENCAP_SPORT, htons(encapsport)); + addattr16(n, 1024, IFLA_IPTUN_ENCAP_DPORT, htons(encapdport)); + return 0; } @@ -334,6 +380,50 @@ static void ip6tunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb if (flags & IP6_TNL_F_USE_ORIG_FWMARK) fprintf(f, "fwmark inherit "); + + if (tb[IFLA_IPTUN_ENCAP_TYPE] && + rta_getattr_u16(tb[IFLA_IPTUN_ENCAP_TYPE]) != + TUNNEL_ENCAP_NONE) { + __u16 type
[PATCH iproute v3 5/5] fou: Allowing configuring IPv6 listener
Signed-off-by: Tom Herbert
---
 ip/ipfou.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/ip/ipfou.c b/ip/ipfou.c
index 2a6ae17..0673d11 100644
--- a/ip/ipfou.c
+++ b/ip/ipfou.c
@@ -25,8 +25,9 @@
 
 static void usage(void)
 {
-	fprintf(stderr, "Usage: ip fou add port PORT { ipproto PROTO | gue }\n");
-	fprintf(stderr, "       ip fou del port PORT\n");
+	fprintf(stderr, "Usage: ip fou add port PORT "
+		"{ ipproto PROTO | gue } [ -6 ]\n");
+	fprintf(stderr, "       ip fou del port PORT [ -6 ]\n");
 	fprintf(stderr, "\n");
 	fprintf(stderr, "Where: PROTO { ipproto-name | 1..255 }\n");
 	fprintf(stderr, "       PORT { 1..65535 }\n");
@@ -50,6 +51,7 @@ static int fou_parse_opt(int argc, char **argv, struct nlmsghdr *n,
 	__u8 ipproto, type;
 	bool gue_set = false;
 	int ipproto_set = 0;
+	unsigned short family = AF_INET;
 
 	while (argc > 0) {
 		if (!matches(*argv, "port")) {
@@ -71,6 +73,8 @@ static int fou_parse_opt(int argc, char **argv, struct nlmsghdr *n,
 			ipproto_set = 1;
 		} else if (!matches(*argv, "gue")) {
 			gue_set = true;
+		} else if (!matches(*argv, "-6")) {
+			family = AF_INET6;
 		} else {
 			fprintf(stderr, "fou: unknown command \"%s\"?\n", *argv);
 			usage();
@@ -98,6 +102,7 @@ static int fou_parse_opt(int argc, char **argv, struct nlmsghdr *n,
 
 	addattr16(n, 1024, FOU_ATTR_PORT, port);
 	addattr8(n, 1024, FOU_ATTR_TYPE, type);
+	addattr16(n, 1024, FOU_ATTR_AF, family);
 
 	if (ipproto_set)
 		addattr8(n, 1024, FOU_ATTR_IPPROTO, ipproto);
--
2.8.0.rc2
Re: [RFC PATCH v5 1/3] net: Add mask for Control register 10Mbps speed
On 08/09/2016 02:34 AM, Kedareswara rao Appana wrote:
> This patch adds mask for the Control register
> 10Mbps speed.
>
> Signed-off-by: Kedareswara rao Appana

Reviewed-by: Florian Fainelli
--
Florian
Re: [PATCH RESEND net-next 15/15] smc: proc-fs interface for smc connections
From: Ursula Braun
Date: Tue, 9 Aug 2016 12:13:00 +0200

> +	sock_hold(&smc->sk);
 ...
> +out_line:
> +	seq_putc(m, '\n');
> +	sock_put(&smc->sk);

You hold the smc_proc_list_lock during this function's execution, so the table cannot change and the socket cannot go away.

Taking a reference count here is therefore unnecessary overhead; please remove it.
Re: [PATCH v1 1/1] net: phy: Add edge-rate, mac-if, read, write func to Microsemi PHYs.
On 08/08/2016 06:43 AM, Nagaraju Lakkaraju wrote:
> Hello,
>
> As part of the 2nd patch, add edge-rate control, MAC interface, and
> read and write driver functions for Microsemi PHYs.
>
> Please review and send your comments.

The first thing is to get the initial patch accepted, which is *not* the case yet; then you can submit incremental changes using the accepted driver as a baseline.

--
Florian
Re: [PATCH RESEND net-next 13/15] smc: receive data from RMBE
From: Ursula Braun
Date: Tue, 9 Aug 2016 12:12:58 +0200

> +	xchg(&conn->rx_curs_confirmed.acurs,
> +	     smc_curs_read(conn->local_tx_ctrl.cons.acurs));

Why in the world do you need to use xchg() in all of these places?

It makes no sense whatsoever, especially since you don't even check the return value.

If you need the operation to be atomic, then you have to check the return value and do something to recover if something else beat you to the xchg() and put something else into the location.

Otherwise, you don't need it to be atomic and can avoid this expensive operation and just store the value normally.
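The distinction drawn in this reply can be illustrated outside the kernel. The following is a minimal userspace sketch using C11 <stdatomic.h> rather than the kernel's xchg(). The variable and function names are hypothetical, and a compare-and-exchange is used here as one possible way of "checking the return value"; it is not taken from the smc patches.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a shared cursor word. */
static _Atomic uint64_t sk_cursor;

/* Nobody looks at the previous value, so a plain store is enough;
 * an atomic exchange would only add cost here.
 */
static void sk_publish_plain(uint64_t val)
{
	atomic_store_explicit(&sk_cursor, val, memory_order_relaxed);
}

/* If the update must not overwrite a concurrent writer, compare against
 * the value we expected and tell the caller whether the swap happened,
 * so it can recover when it lost the race.
 */
static bool sk_publish_checked(uint64_t expected, uint64_t val)
{
	return atomic_compare_exchange_strong(&sk_cursor, &expected, val);
}
```

A caller of sk_publish_checked() would typically reread the current value and retry or give up when it returns false; a caller that never needs that information can use the plain store.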
Re: [RFC PATCH v5 3/3] net: phy: Add gmiitorgmii converter support
On 08/09/2016 02:34 AM, Kedareswara rao Appana wrote: > This patch adds support for gmiitorgmii converter. > > The GMII to RGMII IP core provides the Reduced Gigabit Media > Independent Interface (RGMII) between Ethernet physical media > Devices and the Gigabit Ethernet controller. This core can > Switch dynamically between the three different speed modes of > Operation by configuring the converter register through mdio write. > > MDIO interface is used to set operating speed of Ethernet MAC. > > This converter sits between the MAC and the external phy > MAC <==> GMII2RGMII <==> RGMII_PHY This looks good, just a few things, see below: > > Signed-off-by: Kedareswara rao Appana > --- > Thanks a lot Andrew for your inputs. > Changes for v5: > --> Fixed return values in the probe as suggested by punnaiah. > --> Added a mask for the converter speed as suggested by punnaiah. > Changes for v4: > --> Updated phydev speed for all 3 speeds as suggested by zhuyj. > Changes for v3: > --> Updated the driver as suggested by Andrew. > Changes for v2: > --> Passed struct xphy pointer directly to the fix_mac_speed > API as suggested by the Florian. > --> Added checks for the phy-node fail case as suggested > by the Florian > > drivers/net/phy/Kconfig | 8 +++ > drivers/net/phy/Makefile| 1 + > drivers/net/phy/xilinx_gmii2rgmii.c | 121 > > 3 files changed, 130 insertions(+) > create mode 100644 drivers/net/phy/xilinx_gmii2rgmii.c > > diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig > index 1b534ea..c79f347 100644 > --- a/drivers/net/phy/Kconfig > +++ b/drivers/net/phy/Kconfig > @@ -312,6 +312,14 @@ config MICROSEMI_PHY > ---help--- >Currently supports the VSC8531 and VSC8541 PHYs > > +config XILINX_GMII2RGMII > + tristate "Xilinx GMII2RGMII converter driver" > + default y Don't force that, or at least make the default based on the potential users/drivers here. 
> + ---help--- > + This driver support xilinx GMII to RGMII IP core it provides > + the Reduced Gigabit Media Independent Interface(RGMII) between > + Ethernet physical media devices and the Gigabit Ethernet controller. > + > endif # PHYLIB > > config MICREL_KS8995MA > diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile > index a713bd4..73d65ce 100644 > --- a/drivers/net/phy/Makefile > +++ b/drivers/net/phy/Makefile > @@ -50,3 +50,4 @@ obj-$(CONFIG_MDIO_BCM_IPROC)+= mdio-bcm-iproc.o > obj-$(CONFIG_INTEL_XWAY_PHY) += intel-xway.o > obj-$(CONFIG_MDIO_HISI_FEMAC)+= mdio-hisi-femac.o > obj-$(CONFIG_MDIO_XGENE) += mdio-xgene.o > +obj-$(CONFIG_XILINX_GMII2RGMII) += xilinx_gmii2rgmii.o > diff --git a/drivers/net/phy/xilinx_gmii2rgmii.c > b/drivers/net/phy/xilinx_gmii2rgmii.c > new file mode 100644 > index 000..1456e27 > --- /dev/null > +++ b/drivers/net/phy/xilinx_gmii2rgmii.c > @@ -0,0 +1,121 @@ > +/* Xilinx GMII2RGMII Converter driver > + * > + * Copyright (C) 2016 Xilinx, Inc. > + * > + * Author: Kedareswara rao Appana > + * > + * Description: > + * This driver is developed for Xilinx GMII2RGMII Converter > + * > + * This program is free software: you can redistribute it and/or modify > + * it under the terms of the GNU General Public License as published by > + * the Free Software Foundation, either version 2 of the License, or > + * (at your option) any later version. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + */ > +#include > +#include > +#include > +#include > +#include > +#include > + > +#define XILINX_GMII2RGMII_REG0x10 > +#define XILINX_GMII2RGMII_SPEED_MASK 0x2040 BMCR_SPEED1000 | BMCR_SPEED100 would be clearer here. 
> + > +struct gmii2rgmii { > + struct phy_device *phy_dev; > + struct phy_driver *phy_drv; > + struct phy_driver conv_phy_drv; > + int addr; > +}; > + > +static int xgmiitorgmii_read_status(struct phy_device *phydev) > +{ > + struct gmii2rgmii *priv = (struct gmii2rgmii *)phydev->priv; Casting is not required here, priv is void *. > + u16 val = 0; > + > + priv->phy_drv->read_status(phydev); > + > + val = mdiobus_read(phydev->mdio.bus, priv->addr, XILINX_GMII2RGMII_REG); > + val &= XILINX_GMII2RGMII_SPEED_MASK; > + > + switch (phydev->speed) { > + case SPEED_1000: > + val |= BMCR_SPEED1000; Is the fall through really intentional here? See genphy_setup_forced() for instance. > + case SPEED_100: > + val |= BMCR_SPEED100; > + case SPEED_10: > + val |= BMCR_SPEED10; > + } > + > + mdiobus_write(phydev->mdio.bus, priv->addr, XILINX_GMII2RGMII_REG, val); > + > + return 0; > +} [snip] > +static int __init xgmiitorgmii_i
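The review comments on the speed mask and the switch fall-through can be made concrete with a small standalone sketch. It assumes the intent is to clear the two speed-select bits and then set exactly one speed encoding; that intent is an assumption, not something the posted patch states. The SK_ constants are redefined locally with the values BMCR_SPEED10, BMCR_SPEED100 and BMCR_SPEED1000 have in the kernel's MII header, and the helper name is hypothetical.

```c
#include <stdint.h>

/* Local copies of the BMCR speed-select encodings so the sketch builds
 * in userspace; the SK_ names are hypothetical.
 */
#define SK_BMCR_SPEED10		0x0000
#define SK_BMCR_SPEED100	0x2000
#define SK_BMCR_SPEED1000	0x0040
#define SK_SPEED_MASK		(SK_BMCR_SPEED1000 | SK_BMCR_SPEED100)

/* Return reg with its speed-select bits replaced by the encoding for
 * speed_mbps. Each case ends in break, so only one encoding is applied.
 */
static uint16_t sk_conv_speed_bits(uint16_t reg, int speed_mbps)
{
	reg &= (uint16_t)~SK_SPEED_MASK;	/* drop the old speed selection */

	switch (speed_mbps) {
	case 1000:
		reg |= SK_BMCR_SPEED1000;
		break;
	case 100:
		reg |= SK_BMCR_SPEED100;
		break;
	default:
		/* 10 Mb/s: both speed bits stay clear */
		reg |= SK_BMCR_SPEED10;
		break;
	}

	return reg;
}
```

Without the break statements, a 1000 Mb/s link would end up with both speed bits set, which is the concern behind the fall-through question above.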
[patch net-next v6 2/3] net: core: add SW stats to if_stats_msg
From: Nogah Frankel Add a nested attribute of SW stats to if_stats_msg under IFLA_STATS_LINK_SW_64. Signed-off-by: Nogah Frankel Reviewed-by: Ido Schimmel Signed-off-by: Jiri Pirko --- include/uapi/linux/if_link.h | 1 + net/core/rtnetlink.c | 21 + 2 files changed, 22 insertions(+) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index a1b5202..1c9b808 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -825,6 +825,7 @@ enum { IFLA_STATS_LINK_64, IFLA_STATS_LINK_XSTATS, IFLA_STATS_LINK_XSTATS_SLAVE, + IFLA_STATS_LINK_SW_64, __IFLA_STATS_MAX, }; diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 189cc78..910f802 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -3583,6 +3583,21 @@ static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev, dev_get_stats(dev, sp); } + if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_SW_64, *idxattr)) { + if (dev_have_sw_stats(dev)) { + struct rtnl_link_stats64 *sp; + + attr = nla_reserve_64bit(skb, IFLA_STATS_LINK_SW_64, +sizeof(struct rtnl_link_stats64), +IFLA_STATS_UNSPEC); + if (!attr) + goto nla_put_failure; + + sp = nla_data(attr); + dev_get_sw_stats(dev, sp); + } + } + if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_XSTATS, *idxattr)) { const struct rtnl_link_ops *ops = dev->rtnl_link_ops; @@ -3644,6 +3659,7 @@ nla_put_failure: static const struct nla_policy ifla_stats_policy[IFLA_STATS_MAX + 1] = { [IFLA_STATS_LINK_64]= { .len = sizeof(struct rtnl_link_stats64) }, + [IFLA_STATS_LINK_SW_64] = { .len = sizeof(struct rtnl_link_stats64) }, }; static size_t if_nlmsg_stats_size(const struct net_device *dev, @@ -3685,6 +3701,11 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev, } } + if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_SW_64, 0)) { + if (dev_have_sw_stats(dev)) + size += nla_total_size_64bit(sizeof(struct rtnl_link_stats64)); + } + return size; } -- 2.5.5
[patch net-next v6 0/3] return offloaded stats as default and expose original sw stats
From: Jiri Pirko

The problem we try to handle is about offloaded forwarded packets
which are not seen by kernel. Let me try to draw it:

    port1                       port2 (HW stats are counted here)
      \                          /
       \                        /
        \                      /
         --(A)---- ASIC --(B)--
                    |
                   (C)
                    |
                   CPU (SW stats are counted here)

Now we have couple of flows for TX and RX (direction does not matter here):

1) port1->A->ASIC->C->CPU

   For this flow, HW and SW stats are equal.

2) port1->A->ASIC->C->CPU->C->ASIC->B->port2

   For this flow, HW and SW stats are equal.

3) port1->A->ASIC->B->port2

   For this flow, SW stats are 0.

The purpose of this patchset is to provide facility for user to
find out the difference between flows 1+2 and 3. In other words, user
will be able to see the statistics for the slow-path (through kernel).

Also note that HW stats are what someone calls "accumulated" stats.
Every packet counted by SW is also counted by HW. Not the other way around.

As a default the accumulated stats (HW) will be exposed to user
so the userspace apps can react properly.

---
v5->v6:
- patch 2/4 was dropped as requested by Roopa
- patch 1/3:
 - comment changed to indicate that default stats are combined stats
 - commit message changed
- patch 2/3: (previously 3/4)
 - SW stats return nothing if there is no SW stats ndo
v4->v5:
- updated cover letter
- patch3/4:
 - using memcpy directly to copy stats as requested by DaveM
v3->v4:
- patch1/4:
 - fixed "return ()" pointed out by EricD
- patch2/4:
 - fixed if_nlmsg_size as pointed out by EricD
v2->v3:
- patch1/4:
 - added dev_have_sw_stats helper
- patch2/4:
 - avoided memcpy as requested by DaveM
- patch3/4:
 - use new dev_have_sw_stats helper
v1->v2:
- patch3/4:
 - fixed NULL initialization

Nogah Frankel (3):
  netdevice: add SW statistics ndo
  net: core: add SW stats to if_stats_msg
  mlxsw: spectrum: Implement SW stats ndo and expose HW stats by default

 drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 105 +++--
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h |   5 ++
 include/linux/netdevice.h                      |  14
 include/uapi/linux/if_link.h                   |   1 +
 net/core/dev.c                                 |  31
 net/core/rtnetlink.c                           |  21 +
 6 files changed, 171 insertions(+), 6 deletions(-)

--
2.5.5
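Since the cover letter states that every packet counted by SW is also counted by HW, the traffic handled purely by the ASIC (flow 3) can be estimated in userspace as the difference between the two sets of counters. The following is only a minimal sketch of that arithmetic. The `port_stats` struct is a simplified, hypothetical stand-in for `struct rtnl_link_stats64`, and both helper names are made up for illustration; neither is part of the patchset. The clamp at zero is an assumption motivated by patch 3/3, where the HW counters come from a cache refreshed roughly once a second and so may briefly lag the SW counters.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct rtnl_link_stats64. */
struct port_stats {
	uint64_t rx_packets;
	uint64_t tx_packets;
};

/*
 * Packets seen by HW but not by SW never reached the CPU (flow 3).
 * The two snapshots are not taken atomically, so clamp at zero rather
 * than underflow when the SW value is momentarily ahead of the HW one.
 */
static uint64_t offloaded_only(uint64_t hw, uint64_t sw)
{
	return hw >= sw ? hw - sw : 0;
}

/* Derive fast-path (ASIC-only) counters from accumulated HW and SW stats. */
static struct port_stats fast_path_stats(const struct port_stats *hw,
					 const struct port_stats *sw)
{
	struct port_stats fp = {
		.rx_packets = offloaded_only(hw->rx_packets, sw->rx_packets),
		.tx_packets = offloaded_only(hw->tx_packets, sw->tx_packets),
	};

	return fp;
}
```

With HW rx_packets of 100 and SW rx_packets of 30, the sketch reports 70 packets that were forwarded without touching the kernel.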
[patch net-next v6 1/3] netdevice: add SW statistics ndo
From: Nogah Frankel Stats should return the number of packets that went through a port by SW or HW. But when one has offloaded traffic, one might want to know how many packets went via slow path. So this ndo returns SW statistics for packets that went via the CPU (as opposed to the HW counters, which count all packets, slow path or not). Add a new ndo declaration to get SW statistics. Add a function that gets SW statistics if a compatible ndo exists. Signed-off-by: Nogah Frankel Reviewed-by: Ido Schimmel Signed-off-by: Jiri Pirko --- include/linux/netdevice.h | 14 ++ net/core/dev.c | 31 +++ 2 files changed, 45 insertions(+) diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 076df53..4f5c0875 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -922,6 +922,15 @@ struct netdev_xdp { *field is written atomically. * 3. Update dev->stats asynchronously and atomically, and define *neither operation. + * If there are both SW and HW stats, driver should return combined + * stats. + * + * struct rtnl_link_stats64* (*ndo_get_sw_stats64)(struct net_device *dev, + * struct rtnl_link_stats64 *storage); + * Similar to rtnl_link_stats64 but used to get SW statistics, + * if it is possible to get HW and SW statistics separately. + * If this option isn't valid - driver doesn't need to define + * this function. 
* * int (*ndo_vlan_rx_add_vid)(struct net_device *dev, __be16 proto, u16 vid); * If device supports VLAN filtering this function is called when a @@ -1154,6 +1163,9 @@ struct net_device_ops { struct rtnl_link_stats64* (*ndo_get_stats64)(struct net_device *dev, struct rtnl_link_stats64 *storage); + struct rtnl_link_stats64* (*ndo_get_sw_stats64)(struct net_device *dev, + struct rtnl_link_stats64 *storage); + struct net_device_stats* (*ndo_get_stats)(struct net_device *dev); int (*ndo_vlan_rx_add_vid)(struct net_device *dev, @@ -3798,6 +3810,8 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev, struct rtnl_link_stats64 *storage); void netdev_stats_to_stats64(struct rtnl_link_stats64 *stats64, const struct net_device_stats *netdev_stats); +int dev_get_sw_stats(struct net_device *dev, struct rtnl_link_stats64 *storage); +bool dev_have_sw_stats(const struct net_device *dev); extern int netdev_max_backlog; extern int netdev_tstamp_prequeue; diff --git a/net/core/dev.c b/net/core/dev.c index 4ce07dc..e5b8cbf 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -7494,6 +7494,8 @@ EXPORT_SYMBOL(netdev_stats_to_stats64); * The device driver may provide its own method by setting * dev->netdev_ops->get_stats64 or dev->netdev_ops->get_stats; * otherwise the internal statistics structure is used. + * If device supports both HW & SW statistics - this function should + * return the combined statistics. 
*/ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev, struct rtnl_link_stats64 *storage) @@ -7515,6 +7517,35 @@ struct rtnl_link_stats64 *dev_get_stats(struct net_device *dev, } EXPORT_SYMBOL(dev_get_stats); +/* dev_get_sw_stats- get network device SW statistics + * (if it is possible to get HW & SW statistics separately) + * @dev: device to get statistics from + * @storage: place to store stats + * + * if exist a function to query the netdev SW statistics get it to storage + * return 0 if did, or -EINVAL if this function doesn't exist + */ +int dev_get_sw_stats(struct net_device *dev, +struct rtnl_link_stats64 *storage) +{ + const struct net_device_ops *ops = dev->netdev_ops; + + if (ops->ndo_get_sw_stats64) { + memset(storage, 0, sizeof(*storage)); + ops->ndo_get_sw_stats64(dev, storage); + } else { + return -EINVAL; + } + return 0; +} +EXPORT_SYMBOL(dev_get_sw_stats); + +bool dev_have_sw_stats(const struct net_device *dev) +{ + return dev->netdev_ops->ndo_get_sw_stats64 != NULL; +} +EXPORT_SYMBOL(dev_have_sw_stats); + struct netdev_queue *dev_ingress_queue_create(struct net_device *dev) { struct netdev_queue *queue = dev_ingress_queue(dev); -- 2.5.5
[patch net-next v6 3/3] mlxsw: spectrum: Implement SW stats ndo and expose HW stats by default
From: Nogah Frankel Add a function to get the SW statistics with an ndo. Change the default statistics ndo to return HW statistics (like the one returned by ethtool_ops). The HW stats are collected to a cache by delayed work every 1 sec. Signed-off-by: Nogah Frankel Reviewed-by: Ido Schimmel Signed-off-by: Jiri Pirko --- drivers/net/ethernet/mellanox/mlxsw/spectrum.c | 105 +++-- drivers/net/ethernet/mellanox/mlxsw/spectrum.h | 5 ++ 2 files changed, 104 insertions(+), 6 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c index e1b8f62..29230e2 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c +++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c @@ -811,8 +811,8 @@ err_span_port_mtu_update: } static struct rtnl_link_stats64 * -mlxsw_sp_port_get_stats64(struct net_device *dev, - struct rtnl_link_stats64 *stats) +mlxsw_sp_port_get_sw_stats64(struct net_device *dev, +struct rtnl_link_stats64 *stats) { struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev); struct mlxsw_sp_port_pcpu_stats *p; @@ -842,6 +842,86 @@ mlxsw_sp_port_get_stats64(struct net_device *dev, return stats; } +static int mlxsw_sp_port_get_stats_raw(struct net_device *dev, int grp, + int prio, char *ppcnt_pl) +{ + struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev); + struct mlxsw_sp *mlxsw_sp = mlxsw_sp_port->mlxsw_sp; + + mlxsw_reg_ppcnt_pack(ppcnt_pl, mlxsw_sp_port->local_port, grp, prio); + return mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(ppcnt), ppcnt_pl); +} + +static int mlxsw_sp_port_get_hw_stats(struct net_device *dev, + struct rtnl_link_stats64 *stats) +{ + char ppcnt_pl[MLXSW_REG_PPCNT_LEN]; + int err; + + err = mlxsw_sp_port_get_stats_raw(dev, MLXSW_REG_PPCNT_IEEE_8023_CNT, + 0, ppcnt_pl); + if (err) + goto out; + + stats->tx_packets = + mlxsw_reg_ppcnt_a_frames_transmitted_ok_get(ppcnt_pl); + stats->rx_packets = + mlxsw_reg_ppcnt_a_frames_received_ok_get(ppcnt_pl); + stats->tx_bytes = + 
mlxsw_reg_ppcnt_a_octets_transmitted_ok_get(ppcnt_pl); + stats->rx_bytes = + mlxsw_reg_ppcnt_a_octets_received_ok_get(ppcnt_pl); + stats->multicast = + mlxsw_reg_ppcnt_a_multicast_frames_received_ok_get(ppcnt_pl); + + stats->rx_crc_errors = + mlxsw_reg_ppcnt_a_frame_check_sequence_errors_get(ppcnt_pl); + stats->rx_frame_errors = + mlxsw_reg_ppcnt_a_alignment_errors_get(ppcnt_pl); + + stats->rx_length_errors = ( + mlxsw_reg_ppcnt_a_in_range_length_errors_get(ppcnt_pl) + + mlxsw_reg_ppcnt_a_out_of_range_length_field_get(ppcnt_pl) + + mlxsw_reg_ppcnt_a_frame_too_long_errors_get(ppcnt_pl)); + + stats->rx_errors = (stats->rx_crc_errors + + stats->rx_frame_errors + stats->rx_length_errors); + +out: + return err; +} + +static void update_stats_cache(struct work_struct *work) +{ + struct mlxsw_sp_port *mlxsw_sp_port = + container_of(work, struct mlxsw_sp_port, +hw_stats.update_dw.work); + + if (!netif_carrier_ok(mlxsw_sp_port->dev)) + goto out; + + mlxsw_sp_port_get_hw_stats(mlxsw_sp_port->dev, + mlxsw_sp_port->hw_stats.cache); + +out: + mlxsw_core_schedule_dw(&mlxsw_sp_port->hw_stats.update_dw, + MLXSW_HW_STATS_UPDATE_TIME); +} + +/* Return the stats from a cache that is updated periodically, + * as this function might get called in an atomic context. 
+ */ +static struct rtnl_link_stats64 * +mlxsw_sp_port_get_stats64(struct net_device *dev, + struct rtnl_link_stats64 *stats) +{ + struct mlxsw_sp_port *mlxsw_sp_port = netdev_priv(dev); + + memcpy(stats, mlxsw_sp_port->hw_stats.cache, sizeof(*stats)); + + return stats; +} + int mlxsw_sp_port_vlan_set(struct mlxsw_sp_port *mlxsw_sp_port, u16 vid_begin, u16 vid_end, bool is_member, bool untagged) { @@ -1234,6 +1314,7 @@ static const struct net_device_ops mlxsw_sp_port_netdev_ops = { .ndo_set_mac_address= mlxsw_sp_port_set_mac_address, .ndo_change_mtu = mlxsw_sp_port_change_mtu, .ndo_get_stats64= mlxsw_sp_port_get_stats64, + .ndo_get_sw_stats64 = mlxsw_sp_port_get_sw_stats64, .ndo_vlan_rx_add_vid= mlxsw_sp_port_add_vid, .ndo_vlan_rx_kill_vid = mlxsw_sp_port_kill_vid, .ndo_neigh_construct= mlxsw_sp_router_neigh_construct, @@ -1572,8 +1653,6 @@ static void __mlxsw_sp_port_get_stats(struct net_device *dev,
Re: [PATCH] bonding: Allow tun-interfaces as slaves
On Tue, Aug 09, 2016 at 12:06:36PM -0700, David Miller wrote: > > On Tue, Aug 09, 2016 at 09:28:45PM +0800, Ding Tianhong wrote: > > > > Simply not checking errors when setting the mac address solves the > > problem for me. No new features needed. > > But it only works in certain modes. > > So the best we can do is enforce the MAC address setting in the > modes that absolutely require it. We cannot ignore the MAC > address setting unilaterally. Something like this? [PATCH] bonding: Allow tun-interfaces as slaves in balance-rr mode Up until 00503b6f702e (part of 3.14-rc1), the bonding driver could be used to enslave tun-interfaces. 00503b6f702e broke that behaviour, afaics as an unintended side-effect. For the purpose of bond-over-tun in balance-rr mode, simply ignoring the error from dev_set_mac_address() is good enough. Signed-off-by: Joern Engel --- drivers/net/bonding/bond_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 1f276fa30ba6..2f686bfe4304 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -1490,7 +1490,8 @@ int bond_enslave(struct net_device *bond_dev, struct net_device *slave_dev) memcpy(addr.sa_data, bond_dev->dev_addr, bond_dev->addr_len); addr.sa_family = slave_dev->type; res = dev_set_mac_address(slave_dev, &addr); - if (res) { + /* round-robin mode works fine without a mac address */ + if (res && BOND_MODE(bond) != BOND_MODE_ROUNDROBIN) { netdev_dbg(bond_dev, "Error %d calling set_mac_address\n", res); goto err_restore_mtu; } -- 2.1.4
Re: [Patch net 1/5] net_sched: remove the leftover cleanup_a()
On 16-08-08 04:46 PM, Cong Wang wrote:
> After refactoring tc_action into tcf_common, we no
> longer need to cleanup temporary "actions" in list,
> they are permanently stored in the hashtable.
>
> Fixes: a85a970af265 ("net_sched: move tc_action into tcf_common")
> Reported-by: Jamal Hadi Salim
> Cc: Jamal Hadi Salim
> Signed-off-by: Cong Wang

Cong,

I will test these patches and provide feedback by end of day or tomorrow morning.

cheers,
jamal
Re: [PATCH net v2] vti: flush x-netns xfrm cache when vti interface is removed
From: Lance Richardson Date: Tue, 9 Aug 2016 15:29:42 -0400 > When executing the script included below, the netns delete operation > hangs with the following message (repeated at 10 second intervals): > > kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1 > > This occurs because a reference to the lo interface in the "secure" netns > is still held by a dst entry in the xfrm bundle cache in the init netns. > > Address this problem by garbage collecting the tunnel netns flow cache > when a cross-namespace vti interface receives a NETDEV_DOWN notification. ... > Reported-by: Hangbin Liu > Reported-by: Jan Tluka > Signed-off-by: Lance Richardson > --- > v2: Perform garbage collection on NETDEV_DOWN notification (v1 did this > in uninit op handler). Looks good, applied and queued up for -stable, thanks!
Re: [PATCH net 0/6] rxrpc: Miscellaneous fixes
From: David Howells Date: Tue, 09 Aug 2016 17:33:12 +0100 > Here are a bunch of miscellaneous fixes to AF_RXRPC: > > (*) Fix an uninitialised pointer. > > (*) Fix error handling when we fail to connect a call. > > (*) Fix a NULL pointer dereference. > > (*) Fix two occasions where a packet is accessed again after being queued > for someone else to deal with. > > (*) Fix a missing skb free. > > --- > The patches can be found here also: > > > http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes > > Tagged thusly: > > git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git > rxrpc-fixes-20160809 Pulled, thanks David.
Re: [PATCH] [v7] net: emac: emac gigabit ethernet controller driver
Timur Tabi wrote:

> I used to work on PowerPC, so I respect making things work for both
> endians. However, even I think that this is overkill for my driver.
> First, there's no way this driver will ever be used on a big-endian
> system. Second, I'm pretty sure there are lots of places that would
> need cpu_to_le32() in order to make this driver big-endian compatible.
> It would be a huge mess.
>
> #define TPD_BUFFER_ADDR_H_SET(tpd, val) BITS_SET((tpd)->word[3], 18, 30, val)
>
> This macro sets specific bits based on the definition of the register.
> I could change it to this:
>
> #define TPD_BUFFER_ADDR_H_SET(tpd, val) BITS_SET((tpd)->word[3], 18, 30, cpu_to_le32(val))
>
> But I honestly don't see what good that will do. There are still
> thousands of other places that assume little-endian.

Ok, so I took another look at this, and even though I still think that it's useless, it seems to be much less difficult to implement than I initially thought.

I think all I need to do is to modify the BITS_GET() and BITS_SET() macros in emac-mac.h, as well as any of the RRD_xxx and TPD_xxx macros that do not use BITS_GET() or BITS_SET(). This is a minor change which I will implement in v8 of the patch.

Every other hardware access uses readl/writel, which is already endian-safe.

-- 
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
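One way to read the plan of changing the get/set macros is that the byte-order conversion has to cover the whole descriptor word, not only the value being inserted: on a big-endian host, converting `val` alone would swap its bytes before it is shifted into place. The following is a minimal userspace sketch of that idea, not the driver's code. The helper names are hypothetical, treating (18, 30) as an inclusive low/high bit range is an assumption about `BITS_SET()`, and the portable byte shuffling stands in for the kernel's `cpu_to_le32()` and `le32_to_cpu()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store a host-order value as little-endian bytes (stand-in for cpu_to_le32()). */
static uint32_t host_to_le32(uint32_t v)
{
	uint8_t b[4] = {
		(uint8_t)(v & 0xff), (uint8_t)((v >> 8) & 0xff),
		(uint8_t)((v >> 16) & 0xff), (uint8_t)(v >> 24),
	};
	uint32_t out;

	memcpy(&out, b, sizeof(out));
	return out;
}

/* Load a little-endian word into host order (stand-in for le32_to_cpu()). */
static uint32_t le32_to_host(uint32_t v)
{
	uint8_t b[4];

	memcpy(b, &v, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Mask covering bits lo..hi inclusive, with lo <= hi <= 31. */
static uint32_t bits_mask(unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi <= 31);
	return (0xffffffffu >> (31 - (hi - lo))) << lo;
}

/* Read-modify-write a bit field in a little-endian descriptor word. */
static void desc_bits_set(uint32_t *word_le, unsigned int lo,
			  unsigned int hi, uint32_t val)
{
	uint32_t mask = bits_mask(lo, hi);
	uint32_t w = le32_to_host(*word_le);	/* convert the whole word first */

	w = (w & ~mask) | ((val << lo) & mask);
	*word_le = host_to_le32(w);		/* convert back before storing */
}

/* Extract a bit field from a little-endian descriptor word. */
static uint32_t desc_bits_get(const uint32_t *word_le, unsigned int lo,
			      unsigned int hi)
{
	return (le32_to_host(*word_le) & bits_mask(lo, hi)) >> lo;
}
```

Because the conversion happens once on load and once on store, the bytes that end up in the descriptor are the same on either kind of host, and callers keep passing plain host-order values.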
[PATCH net v2] vti: flush x-netns xfrm cache when vti interface is removed
When executing the script included below, the netns delete operation hangs
with the following message (repeated at 10 second intervals):

  kernel:unregister_netdevice: waiting for lo to become free. Usage count = 1

This occurs because a reference to the lo interface in the "secure" netns
is still held by a dst entry in the xfrm bundle cache in the init netns.

Address this problem by garbage collecting the tunnel netns flow cache
when a cross-namespace vti interface receives a NETDEV_DOWN notification.

A more detailed description of the problem scenario (referencing commands
in the script below):

(1) ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1

  The vti_test interface is created in the init namespace. vti_tunnel_init()
  attaches a struct ip_tunnel to the vti interface's netdev_priv(dev),
  setting the tunnel net to &init_net.

(2) ip link set vti_test netns secure

  The vti_test interface is moved to the "secure" netns. Note that
  the associated struct ip_tunnel still has tunnel->net set to &init_net.

(3) ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1

  The first packet sent using the vti device causes xfrm_lookup() to be
  called as follows:

      dst = xfrm_lookup(tunnel->net, skb_dst(skb), fl, NULL, 0);

  Note that tunnel->net is the init namespace, while skb_dst(skb) references
  the vti_test interface in the "secure" namespace. The returned dst
  references an interface in the init namespace.

  Also note that the first parameter to xfrm_lookup() determines which flow
  cache is used to store the computed xfrm bundle, so after xfrm_lookup()
  returns there will be a cached bundle in the init namespace flow cache
  with a dst referencing a device in the "secure" namespace.

(4) ip netns del secure

  Kernel begins to delete the "secure" namespace. At some point the
  vti_test interface is deleted, at which point dst_ifdown() changes
  the dst->dev in the cached xfrm bundle flow from vti_test to lo (still
  in the "secure" namespace however).
  Since nothing has happened to cause the init namespace's flow cache
  to be garbage collected, this dst remains attached to the flow cache,
  so the kernel loops waiting for the last reference to lo to go away.

ip link add br1 type bridge
ip link set dev br1 up
ip addr add dev br1 1.1.1.1/8
ip netns add secure
ip link add vti_test type vti local 1.1.1.1 remote 1.1.1.2 key 1
ip link set vti_test netns secure
ip netns exec secure ip link set vti_test up
ip netns exec secure ip link s lo up
ip netns exec secure ip addr add dev lo 192.168.100.1/24
ip netns exec secure ip route add 192.168.200.0/24 dev vti_test
ip xfrm policy flush
ip xfrm state flush
ip xfrm policy add dir out tmpl src 1.1.1.1 dst 1.1.1.2 \
   proto esp mode tunnel mark 1
ip xfrm policy add dir in tmpl src 1.1.1.2 dst 1.1.1.1 \
   proto esp mode tunnel mark 1
ip xfrm state add src 1.1.1.1 dst 1.1.1.2 proto esp spi 1 \
   mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip xfrm state add src 1.1.1.2 dst 1.1.1.1 proto esp spi 1 \
   mode tunnel enc des3_ede 0x112233445566778811223344556677881122334455667788
ip netns exec secure ping -c 4 -i 0.02 -I 192.168.100.1 192.168.200.1
ip netns del secure

Reported-by: Hangbin Liu
Reported-by: Jan Tluka
Signed-off-by: Lance Richardson
---
v2: Perform garbage collection on NETDEV_DOWN notification (v1 did this
    in uninit op handler).
net/ipv4/ip_vti.c | 31 +++ 1 file changed, 31 insertions(+) diff --git a/net/ipv4/ip_vti.c b/net/ipv4/ip_vti.c index a917903..cc701fa 100644 --- a/net/ipv4/ip_vti.c +++ b/net/ipv4/ip_vti.c @@ -557,6 +557,33 @@ static struct rtnl_link_ops vti_link_ops __read_mostly = { .get_link_net = ip_tunnel_get_link_net, }; +static bool is_vti_tunnel(const struct net_device *dev) +{ + return dev->netdev_ops == &vti_netdev_ops; +} + +static int vti_device_event(struct notifier_block *unused, + unsigned long event, void *ptr) +{ + struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct ip_tunnel *tunnel = netdev_priv(dev); + + if (!is_vti_tunnel(dev)) + return NOTIFY_DONE; + + switch (event) { + case NETDEV_DOWN: + if (!net_eq(tunnel->net, dev_net(dev))) + xfrm_garbage_collect(tunnel->net); + break; + } + return NOTIFY_DONE; +} + +static struct notifier_block vti_notifier_block __read_mostly = { + .notifier_call = vti_device_event, +}; + static int __init vti_init(void) { const char *msg; @@ -564,6 +591,8 @@ static int __init vti_init(void) pr_info("IPv4 over IPsec tunneling driver\n"); + register_netdevice_notifier(&vti_notifier_block); + msg = "tunnel device"; err = register_pernet_device(&vti_net_ops); if (err < 0) @@ -596,6 +625,7 @@ xfrm_proto_ah_failed: xfrm_proto_esp_failed: unregister_pern
Re: [PATCH] [v7] net: emac: emac gigabit ethernet controller driver
On 08/09/2016 11:25 AM, Timur Tabi wrote: > I need some help figuring that out. Like I said, I didn't write this > driver initially, so there are parts that I don't really understand. I > copied the above code from other drivers, but after studying your > question, I think I understand what you're asking. I just don't know > how to fix it. > > First of all, why do other drivers test MAX_SKB_FRAGS + 1? Why the +1? The 1 is for the non-fragment part of the SKB, like its head. > > The driver originally used function emac_tx_has_enough_descs() to > determine if there is enough room for the new packet. Then I changed > the code as you suggested, and now it guesses how many descriptors need > to be free to support the next packet. That seems fine and expected. > > If I'm reading emac_tx_fill_tpd() correctly, there could be as many as > (2 + skb_shinfo(skb)->nr_frags) descriptors for a given packet. I don't > know how big nr_frags could get, so I don't know how to calculate the > number of descriptors I really need. I'm assuming I need to do > something like this: nr_frags can't be bigger than MAX_SKB_FRAGS, hence these checks all other drivers do against 1 + MAX_SKB_FRAGS. > > However, I'm confused about one thing. Almost every other driver just > sets "netdev->mtu = new_mtu" and does nothing else. I can't find any > other driver that actually stops the RX path, reprograms the hardware, > and then restarts the RX path. I know this is a stupid question, but > why is my driver doing that? Most drivers allocate RX buffer sizes that are usually bigger than the MTU, but would probably silently fail or expose transient behavior once the MTU changes to greater than the size pre-defined. > > Can I get away with just calling netdev_update_features()? 
MTU change is a pretty disruptive change for the HW I typically work with since we need to choose a RX buffer size that is aligned to the DRAM controller burst size, reprogram the MAC to accept packets up to that size, and potentially change the RX ring allocation strategy and typical buffer size. None of these requirements are unusual or unique; they most likely apply to most MACs out there. My guess is that MTU change is barely tested. -- Florian
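The descriptor accounting discussed in this reply can be sketched as a worst-case check: since nr_frags cannot exceed MAX_SKB_FRAGS, and the driver may use up to 2 + nr_frags descriptors per packet, the queue can be stopped whenever fewer than 2 + MAX_SKB_FRAGS descriptors remain. The code below is a userspace illustration only. The ring structure, the helper names and the value 17 for the fragment limit are assumptions made for the example, and the ring is assumed to keep one slot unused so that full and empty states differ.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the kernel's MAX_SKB_FRAGS (value assumed). */
#define SKETCH_MAX_SKB_FRAGS	17

/*
 * Worst case per packet as described in the thread: one descriptor for
 * the LSO header, one for the rest of the linear area, one per fragment.
 */
#define SKETCH_MAX_TX_DESCS	(2 + SKETCH_MAX_SKB_FRAGS)

/* Hypothetical TX ring bookkeeping. */
struct tx_ring {
	unsigned int count;		/* total descriptors in the ring */
	unsigned int produce_idx;	/* next slot the driver will fill */
	unsigned int consume_idx;	/* next slot the hardware will complete */
};

/* Free descriptors, keeping one slot empty to tell full from empty. */
static unsigned int tx_ring_free(const struct tx_ring *r)
{
	assert(r->count > 0);
	return (r->consume_idx + r->count - r->produce_idx - 1) % r->count;
}

/* True when the next packet might not fit, so the queue should be stopped. */
static bool tx_ring_should_stop(const struct tx_ring *r)
{
	return tx_ring_free(r) < SKETCH_MAX_TX_DESCS;
}
```

Checking against the worst case after queuing each packet means the transmit path does not need to guess the size of a packet it has not seen yet.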
Re: size of data_segs_[in|out] and segs_[in|out]
On 8/9/16 3:04 PM, David Miller wrote:
> From: rapier
> Date: Tue, 9 Aug 2016 13:17:59 -0400
>
>> I cannot deny that would be a problem but conversely, those
>> applications are currently in a position where they may be depending
>> on inaccurate data. I'm not advocating breaking things for the sake of
>> breaking things but my feeling is that this will eventually need to be
>> addressed. Since the segs datastructure is a relatively recent
>> addition it might make sense to make that change now before it's even
>> more baked in to other applications.
>
> Changing the size or position of these data structure members is
> simply not an option. It is locked into stone, and a permanent ABI
> which we cannot change.
>
> Please stop discussing as if changing this is a possibility.

I apologize, I wasn't aware of the constraints on this. There are other solutions available and I'll focus on those.
Re: [PATCH] bonding: Allow tun-interfaces as slaves
From: Jörn Engel Date: Tue, 9 Aug 2016 11:40:57 -0700 > On Tue, Aug 09, 2016 at 11:21:31AM -0700, Jay Vosburgh wrote: >> The balance-rr mode (as well as the -xor mode) is designed to >> interoperate with a Cisco Etherchannel-style static link aggregation, >> which requires all members to have the same MAC address for proper >> function. > > Linux was designed to be a terminal for dialup to a university in > Helsinki, if memory serves. Sometimes it is a good thing to work in > ways the design never intended. You're not addressing the issue Jay is trying to make you aware of in a useful way. You state that Jay doesn't want to help you, but your comment here shows that you really aren't exactly participating in the most positive manner either.
Re: [PATCH] bonding: Allow tun-interfaces as slaves
From: Jörn Engel Date: Tue, 9 Aug 2016 11:08:30 -0700 > On Tue, Aug 09, 2016 at 09:28:45PM +0800, Ding Tianhong wrote: >> >> I think if the bonding dev has to support L3 virtual device, we need to add >> new bond features to distinguish the dev and make the >> bond xmit and transfer without the mac address. > > Simply not checking errors when setting the mac address solves the > problem for me. No new features needed. But it only works in certain modes. So the best we can do is enforce the MAC address setting in the modes that absolutely require it. We cannot ignore the MAC address setting unilaterally.
Re: size of data_segs_[in|out] and segs_[in|out]
From: rapier Date: Tue, 9 Aug 2016 13:17:59 -0400 > I cannot deny that would be a problem but conversely, those > applications are currently in a position where they may be depending > on inaccurate data. I'm not advocating breaking things for the sake of > breaking things but my feeling is that this will eventually need to be > addressed. Since the segs datastructure is a relatively recent > addition it might make sense to make that change now before it's even > more baked in to other applications. Changing the size or position of these data structure members is simply not an option. It is locked into stone, and a permanent ABI which we cannot change. Please stop discussing as if changing this is a possibility. You need to find another solution to the problem, or advocate a user side solution (periodic polling).
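The user-side periodic polling mentioned above can be sketched as follows, assuming the concern in this thread is that the segs counters are 32 bits wide and wrap on long-lived, fast connections. The application keeps its own 64-bit total and adds the modulo-2^32 difference between successive samples. The struct and function names are hypothetical, and the approach is only correct if the counter is sampled often enough that it cannot wrap more than once between two polls.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-counter state kept by the polling application. */
struct counter64 {
	uint64_t total;		/* accumulated 64-bit value */
	uint32_t last;		/* previous raw 32-bit sample */
	int primed;		/* nonzero once the first sample was taken */
};

/*
 * Fold a new raw 32-bit sample into the 64-bit total.  Unsigned
 * subtraction is modulo 2^32, so a single wrap between polls still
 * yields the right delta.
 */
static uint64_t counter64_update(struct counter64 *c, uint32_t sample)
{
	if (!c->primed) {
		c->total = sample;
		c->primed = 1;
	} else {
		c->total += (uint32_t)(sample - c->last);
	}
	c->last = sample;
	return c->total;
}
```

A caller would feed it the raw value read on each poll, for example from a `getsockopt(TCP_INFO)` result, and report `total` instead of the raw field.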
RE: [Intel-wired-lan] e1000e: PHY cann't be initialized correctly on some I218 controllers
> -Original Message- > From: Avargil, Raanan > Sent: Tuesday, August 9, 2016 8:11 AM > To: Denis Turischev ; intel-wired- > l...@lists.osuosl.org; Brown, Aaron F ; Kirsher, > Jeffrey T > Cc: netdev@vger.kernel.org; linux-ker...@vger.kernel.org > Subject: Re: [Intel-wired-lan] e1000e: PHY cann't be initialized correctly on > some I218 controllers > > > > On 8/7/2016 11:56, Denis Turischev wrote: > > Hi Raanan, > > > > On 08/04/2016 07:18 PM, Raanan Avargil wrote: > >> It is hard to determine what is exactly the problem here and what causes > the -3 error, but this is not the place to perform hw reset and retry to init > the > phy. > > I agree that may be this is not the correct place for hw reset, can you > > advice > about another solution? > > > > It works, not only for me, for example it helps here: > > http://www.linux.org.ru/forum/linux-hardware/10268216 > > > > Another fact that Windows drivers work fine with the same hw. > > > > Denis > > > > Hi Denis, > > More suitable place to retry init the device would be in probe function. > However, we haven't seen initialization issues like this on i218. > > Adding Aaron. > > @Aaron - have you seen initialization errors on i218 devices? I have not. But I don't have many systems with i218 so all that means is my systems do not exhibit the problem. > > Raanan
Re: [PATCH] bonding: Allow tun-interfaces as slaves
On Tue, Aug 09, 2016 at 11:21:31AM -0700, Jay Vosburgh wrote: > Jörn Engel wrote: > >On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote: > >> > >> I don't understand your problem clearly, can you explain more about how > >> the 00503b6f702e break tun-interfaces > >> and we will try to fix it. > > > >Here is a trivial testcase: > >openvpn --mktun --dev tun0 > >echo +tun0 > /sys/class/net/bond0/bonding/slaves > > > >Worked fine before your patch, no longer works after your patch. Works > >again after my patch. > > Could you describe your use case a bit further? Are you bonding > together multiple VPN tunnels? Yes. Specifically, I use "ssh -w" to create tunnels. Ssh is single-threaded, so the tunnel is too slow. Aggregate a bunch and you get closer to link speed. Alternative would be pfSense. Afaics that easily beats anything Linux can offer. I'm just more familiar with Linux and trust ssh security more than most alternatives. > This may be a regression, but since the patch that nominally > introduced it was 2 years ago, the impact appears to be very narrow. Did you check the dates on the other two bug reports? Anyone experiencing the problem and checking google will come to the conclusion that you don't care and not bother sending yet another bug report. You then come to the conclusion that users don't care. > >> and more, dev_set_mac_address will change the slave's mac address, some > >> nic don't support to change the mac address and > >> could not work as bond slave, so we need to check the return value, I > >> don't think this patch has any effective improvement. > > > >Using bonding in balance-rr mode, there doesn't seem to be a need to > >change the mac address. I suppose you might care in other modes, but I > >don't. > > The balance-rr mode (as well as the -xor mode) is designed to > interoperate with a Cisco Etherchannel-style static link aggregation, > which requires all members to have the same MAC address for proper > function. 
Linux was designed to be a terminal for dialup to a university in Helsinki, if memory serves. Sometimes it is a good thing to work in ways the design never intended. Jörn -- A defeated army first battles and then seeks victory. -- Sun Tzu
Re: [PATCH] [v7] net: emac: emac gigabit ethernet controller driver
On Tue, Aug 9, 2016 at 1:25 PM, Timur Tabi wrote: > Lino Sanfilippo wrote: > >>> +/* Fill up transmit descriptors */ >>> +static void emac_tx_fill_tpd(struct emac_adapter *adpt, >>> +struct emac_tx_queue *tx_q, struct sk_buff >>> *skb, >>> +struct emac_tpd *tpd) >>> +{ >>> + u16 nr_frags = skb_shinfo(skb)->nr_frags; >>> + unsigned int len = skb_headlen(skb); >>> + struct emac_buffer *tpbuf = NULL; >>> + unsigned int mapped_len = 0; >>> + unsigned int i; >>> + int ret; >>> + >>> + /* if Large Segment Offload is (in TCP Segmentation Offload >>> struct) */ >>> + if (TPD_LSO(tpd)) { >>> + mapped_len = skb_transport_offset(skb) + tcp_hdrlen(skb); >>> + >>> + tpbuf = GET_TPD_BUFFER(tx_q, tx_q->tpd.produce_idx); >>> + tpbuf->length = mapped_len; >>> + tpbuf->dma_addr = >>> dma_map_single(adpt->netdev->dev.parent, >>> +skb->data, mapped_len, >>> +DMA_TO_DEVICE); >>> + ret = dma_mapping_error(adpt->netdev->dev.parent, >>> + tpbuf->dma_addr); >>> + if (ret) { >>> + dev_kfree_skb(skb); >>> + return; >>> + } >>> + >>> + TPD_BUFFER_ADDR_L_SET(tpd, >>> lower_32_bits(tpbuf->dma_addr)); >>> + TPD_BUFFER_ADDR_H_SET(tpd, >>> upper_32_bits(tpbuf->dma_addr)); >> >> >> You should also take big endian systems into account. This means that if >> the multi-byte values >> in the descriptors require little-endian you have to convert from host >> byte order to le and >> vice versa. You can use cpu_to_le32() and friends for this. > > > I used to work on PowerPC, so I respect making things work for both endians. > However, even I think that this is overkill for my driver. First, there's no > way this driver will ever be used on a big-endian system. Second, I'm > pretty sure there are lots of places that would need cpu_to_le32() in order > to make this driver big-endian compatible. It would be a huge mess. I thought that too about Calxeda systems and then someone went off and made them run BE. 
I was surprised it worked, but I guess when the h/w doesn't try to do swizzling of i/o things just work. Rob
Re: [PATCH] [v7] net: emac: emac gigabit ethernet controller driver
Lino Sanfilippo wrote: +/* Fill up transmit descriptors */ +static void emac_tx_fill_tpd(struct emac_adapter *adpt, +struct emac_tx_queue *tx_q, struct sk_buff *skb, +struct emac_tpd *tpd) +{ + u16 nr_frags = skb_shinfo(skb)->nr_frags; + unsigned int len = skb_headlen(skb); + struct emac_buffer *tpbuf = NULL; + unsigned int mapped_len = 0; + unsigned int i; + int ret; + + /* if Large Segment Offload is (in TCP Segmentation Offload struct) */ + if (TPD_LSO(tpd)) { + mapped_len = skb_transport_offset(skb) + tcp_hdrlen(skb); + + tpbuf = GET_TPD_BUFFER(tx_q, tx_q->tpd.produce_idx); + tpbuf->length = mapped_len; + tpbuf->dma_addr = dma_map_single(adpt->netdev->dev.parent, +skb->data, mapped_len, +DMA_TO_DEVICE); + ret = dma_mapping_error(adpt->netdev->dev.parent, + tpbuf->dma_addr); + if (ret) { + dev_kfree_skb(skb); + return; + } + + TPD_BUFFER_ADDR_L_SET(tpd, lower_32_bits(tpbuf->dma_addr)); + TPD_BUFFER_ADDR_H_SET(tpd, upper_32_bits(tpbuf->dma_addr)); You should also take big endian systems into account. This means that if the multi-byte values in the descriptors require little-endian you have to convert from host byte order to le and vice versa. You can use cpu_to_le32() and friends for this. I used to work on PowerPC, so I respect making things work for both endians. However, even I think that this is overkill for my driver. First, there's no way this driver will ever be used on a big-endian system. Second, I'm pretty sure there are lots of places that would need cpu_to_le32() in order to make this driver big-endian compatible. It would be a huge mess. #define TPD_BUFFER_ADDR_H_SET(tpd, val) BITS_SET((tpd)->word[3], 18, 30, val) This macros sets specific bits based on the definition of the register. I could change it to this: #define TPD_BUFFER_ADDR_H_SET(tpd, val) BITS_SET((tpd)->word[3], 18, 30, cpu_to_le32(val)) But I honestly don't see what good that will do. There are still thousands of other places that assume little-endian. 
+		TPD_BUF_LEN_SET(tpd, tpbuf->length);
+		emac_tx_tpd_create(adpt, tx_q, tpd);
+	}
+
+	if (mapped_len < len) {
+		tpbuf = GET_TPD_BUFFER(tx_q, tx_q->tpd.produce_idx);
+		tpbuf->length = len - mapped_len;
+		tpbuf->dma_addr = dma_map_single(adpt->netdev->dev.parent,
+						 skb->data + mapped_len,
+						 tpbuf->length, DMA_TO_DEVICE);
+		ret = dma_mapping_error(adpt->netdev->dev.parent,
+					tpbuf->dma_addr);
+		if (ret) {
+			dev_kfree_skb(skb);
+			return;
+		}
+
+		TPD_BUFFER_ADDR_L_SET(tpd, lower_32_bits(tpbuf->dma_addr));
+		TPD_BUFFER_ADDR_H_SET(tpd, upper_32_bits(tpbuf->dma_addr));
+		TPD_BUF_LEN_SET(tpd, tpbuf->length);
+		emac_tx_tpd_create(adpt, tx_q, tpd);
+	}
+
+	for (i = 0; i < nr_frags; i++) {
+		struct skb_frag_struct *frag;
+
+		frag = &skb_shinfo(skb)->frags[i];
+
+		tpbuf = GET_TPD_BUFFER(tx_q, tx_q->tpd.produce_idx);
+		tpbuf->length = frag->size;
+		tpbuf->dma_addr = dma_map_page(adpt->netdev->dev.parent,
+					       frag->page.p, frag->page_offset,
+					       tpbuf->length, DMA_TO_DEVICE);
+		ret = dma_mapping_error(adpt->netdev->dev.parent,
+					tpbuf->dma_addr);
+		if (ret) {
+			dev_kfree_skb(skb);
+			return;
+		}

In case of error you need to undo all mappings that you have done so far.

Ok.

+
+		TPD_BUFFER_ADDR_L_SET(tpd, lower_32_bits(tpbuf->dma_addr));
+		TPD_BUFFER_ADDR_H_SET(tpd, upper_32_bits(tpbuf->dma_addr));
+		TPD_BUF_LEN_SET(tpd, tpbuf->length);
+		emac_tx_tpd_create(adpt, tx_q, tpd);
+	}
+
+	/* The last tpd */
+	emac_tx_tpd_mark_last(adpt, tx_q);

Use a wmb() here to make sure that all writes to the descriptors in dma memory are completed before you update the producer register (see memory-barriers.txt for the reason why this is needed)

Ok.

+/* Transmit the packet using specified transmit queue */
+int emac_mac_tx_buf_send(struct emac_adapter *adpt, struct emac_tx_queue *tx_q,
+			 struct sk_buff *skb)
+{
+	struct emac_tpd tpd;
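The "undo all mappings" comment above can be sketched with a userspace model. This is only an illustration of the unwind pattern: mock_map() and mock_unmap() are hypothetical stand-ins for the DMA map/unmap calls, and the fail_at parameter exists purely to simulate a mapping error at a chosen index.

```c
#include <stddef.h>

/* Hypothetical model of one mapped transmit buffer. */
struct mock_buf {
	int mapped;
};

/* Stand-in for a mapping call: fails when asked to, else marks the buffer. */
static int mock_map(struct mock_buf *b, int fail)
{
	if (fail)
		return -1;
	b->mapped = 1;
	return 0;
}

/* Stand-in for the matching unmap call. */
static void mock_unmap(struct mock_buf *b)
{
	b->mapped = 0;
}

/* Map n buffers; if mapping index fail_at fails, release every buffer
 * mapped before it so that nothing is leaked. */
static int map_all(struct mock_buf *bufs, size_t n, size_t fail_at)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (mock_map(&bufs[i], i == fail_at))
			goto err_unwind;
	}
	return 0;

err_unwind:
	/* i is the index that failed; walk back over 0..i-1. */
	while (i-- > 0)
		mock_unmap(&bufs[i]);
	return -1;
}
```

In the driver the unwind would also have to cover the head mapping done before the fragment loop; the model only shows the loop part.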
Re: [PATCH] bonding: Allow tun-interfaces as slaves
Jörn Engel wrote:

>Hello Tianhong!
>
>On Tue, Aug 09, 2016 at 10:18:41AM +0800, Ding Tianhong wrote:
>>
>> I don't understand your problem clearly, can you explain more about how
>> 00503b6f702e breaks tun-interfaces and we will try to fix it.
>
>Here is a trivial testcase:
>openvpn --mktun --dev tun0
>echo +tun0 > /sys/class/net/bond0/bonding/slaves
>
>Worked fine before your patch, no longer works after your patch. Works
>again after my patch.

Could you describe your use case a bit further? Are you bonding together multiple VPN tunnels?

This may be a regression, but since the patch that nominally introduced it was 2 years ago, the impact appears to be very narrow.

>> and more, dev_set_mac_address will change the slave's mac address, some nics
>> don't support changing the mac address and could not work as a bond slave,
>> so we need to check the return value, I don't think this patch has any
>> effective improvement.
>
>Using bonding in balance-rr mode, there doesn't seem to be a need to
>change the mac address. I suppose you might care in other modes, but I
>don't.

The balance-rr mode (as well as the -xor mode) is designed to interoperate with a Cisco Etherchannel-style static link aggregation, which requires all members to have the same MAC address for proper function.

Now, the above notwithstanding, I don't have an issue if you want to bond together a couple of tun devices and can make it work. However, for the standard balance-rr case, the enslavement must fail if the call to set the slave MAC fails, as permitting the slave into the bond after set-MAC fails can result in a bond that silently loses packets.

My tentative suggestion is that we extend fail_over_mac to cover additional modes, as a sort of "I really know what I'm doing" flag, and allow the enslavement to succeed when it is set. This would require setting an additional bonding option for this situation, but that doesn't seem to be an undue burden as this looks to be a niche use case.
-J --- -Jay Vosburgh, jay.vosbu...@canonical.com
4.7.0: RCU stall in nf_conntrack
Hi, I just experienced network hangup with 4.7.0, it happened shortly after resume from hibernate: [201988.443552] INFO: rcu_preempt detected stalls on CPUs/tasks: [201988.443556] Tasks blocked on level-0 rcu_node (CPUs 0-3): P14563 [201988.443557] (detected by 3, t=18002 jiffies, g=7365154, c=7365153, q=15274) [201988.443560] client_socket_t R running task0 14563 1 0x [201988.443563] 8800c427a900 e1b77832 880217603da0 810bf66a [201988.443565] 810bf5d1 8800c427a900 81e566c0 880217603dd0 [201988.443567] 8119a3cf 8802177d80c0 81e566c0 81f89ae0 [201988.443569] Call Trace: [201988.443571][] sched_show_task+0xfa/0x160 [201988.443585] [] ? sched_show_task+0x61/0x160 [201988.443587] [] rcu_print_detail_task_stall_rnp+0x52/0x76 [201988.443590] [] rcu_check_callbacks+0x866/0x9e0 [201988.443592] [] update_process_times+0x39/0x60 [201988.443594] [] tick_sched_handle.isra.5+0x21/0x60 [201988.443596] [] tick_sched_timer+0x42/0x70 [201988.443598] [] __hrtimer_run_queues+0x140/0x3c0 [201988.443599] [] ? tick_sched_handle.isra.5+0x60/0x60 [201988.443601] [] hrtimer_interrupt+0xb3/0x1c0 [201988.443603] [] local_apic_timer_interrupt+0x36/0x60 [201988.443606] [] smp_apic_timer_interrupt+0x3d/0x50 [201988.443607] [] apic_timer_interrupt+0x8c/0xa0 [201988.443608][] ? __nf_conntrack_find_get+0x285/0x420 [201988.443611] [] ? nf_conntrack_in+0x1d1/0x8d0 [201988.443612] [] nf_conntrack_in+0x1d1/0x8d0 [201988.443615] [] ipv4_conntrack_local+0x45/0x50 [201988.443616] [] nf_iterate+0x62/0x80 [201988.443618] [] nf_hook_slow+0xa0/0x110 [201988.443620] [] ? nf_hook_slow+0x5/0x110 [201988.443622] [] __ip_local_out+0xd8/0x120 [201988.443624] [] ? ip_forward_options+0x1f0/0x1f0 [201988.443625] [] ip_local_out+0x1c/0x70 [201988.443627] [] ip_queue_xmit+0x18f/0x450 [201988.443628] [] ? ip_queue_xmit+0x5/0x450 [201988.443630] [] tcp_transmit_skb+0x48b/0x8e0 [201988.443632] [] tcp_connect+0x629/0x830 [201988.443634] [] ? 
secure_tcp_sequence_number+0x7f/0xe0 [201988.443636] [] tcp_v4_connect+0x2b9/0x460 [201988.443638] [] __inet_stream_connect+0xb2/0x310 [201988.443640] [] ? preempt_count_sub+0xa1/0x100 [201988.443642] [] ? lock_sock_nested+0x31/0x90 [201988.443644] [] ? __local_bh_enable_ip+0x6f/0xd0 [201988.443646] [] inet_stream_connect+0x38/0x50 [201988.443647] [] SyS_connect+0x7b/0xf0 [201988.443649] [] ? sock_alloc_file+0xa5/0x140 [201988.443651] [] ? trace_hardirqs_on_thunk+0x1a/0x1c [201988.443652] [] entry_SYSCALL_64_fastpath+0x1f/0xbd [201988.443654] client_socket_t R running task0 14563 1 0x [201988.443656] 8800c427a900 e1b77832 880217603da0 810bf66a [201988.443658] 810bf5d1 8800c427a900 81e566c0 880217603dd0 [201988.443660] 8119a3cf 8802177d80c0 81e566c0 81f89ae0 [201988.443662] Call Trace: [201988.443663][] sched_show_task+0xfa/0x160 [201988.443665] [] ? sched_show_task+0x61/0x160 [201988.443666] [] rcu_print_detail_task_stall_rnp+0x52/0x76 [201988.443668] [] rcu_check_callbacks+0x89f/0x9e0 [201988.443669] [] update_process_times+0x39/0x60 [201988.443671] [] tick_sched_handle.isra.5+0x21/0x60 [201988.443672] [] tick_sched_timer+0x42/0x70 [201988.443674] [] __hrtimer_run_queues+0x140/0x3c0 [201988.443675] [] ? tick_sched_handle.isra.5+0x60/0x60 [201988.443677] [] hrtimer_interrupt+0xb3/0x1c0 [201988.443679] [] local_apic_timer_interrupt+0x36/0x60 [201988.443680] [] smp_apic_timer_interrupt+0x3d/0x50 [201988.443682] [] apic_timer_interrupt+0x8c/0xa0 [201988.443682][] ? __nf_conntrack_find_get+0x285/0x420 [201988.443685] [] ? nf_conntrack_in+0x1d1/0x8d0 [201988.443686] [] nf_conntrack_in+0x1d1/0x8d0 [201988.443688] [] ipv4_conntrack_local+0x45/0x50 [201988.443689] [] nf_iterate+0x62/0x80 [201988.443691] [] nf_hook_slow+0xa0/0x110 [201988.443692] [] ? nf_hook_slow+0x5/0x110 [201988.443694] [] __ip_local_out+0xd8/0x120 [201988.443696] [] ? 
ip_forward_options+0x1f0/0x1f0 [201988.443697] [] ip_local_out+0x1c/0x70 [201988.443699] [] ip_queue_xmit+0x18f/0x450 [201988.443700] [] ? ip_queue_xmit+0x5/0x450 [201988.443702] [] tcp_transmit_skb+0x48b/0x8e0 [201988.443703] [] tcp_connect+0x629/0x830 [201988.443705] [] ? secure_tcp_sequence_number+0x7f/0xe0 [201988.443706] [] tcp_v4_connect+0x2b9/0x460 [201988.443708] [] __inet_stream_connect+0xb2/0x310 [201988.443710] [] ? preempt_count_sub+0xa1/0x100 [201988.443711] [] ? lock_sock_nested+0x31/0x90 [201988.443713] [] ? __local_bh_enable_ip+0x6f/0xd0 [201988.443715] [] inet_stream_connect+0x38/0x50 [201988.443716] [] SyS_connect+0x7b/0xf0 [201988.443718] [] ? sock_alloc_file+0xa5/0x140 [201988.443719] [] ? trace_hardirqs_on_thunk+0x1a/0x1c [201988.443720] [] entry_SYSCALL_64_fastpath+0x1f/0xbd [202168.442569] INFO: rcu_preempt detected stalls on CPUs/tasks: [202168.442572] Tasks
Re: [PATCHv2 3/4] pci: Determine actual VPD size on first access
On Tue, Aug 9, 2016 at 5:54 AM, Alexey Kardashevskiy wrote:
> On 10/02/16 08:04, Bjorn Helgaas wrote:
>> On Wed, Jan 13, 2016 at 12:25:34PM +0100, Hannes Reinecke wrote:
>>> PCI-2.2 VPD entries have a maximum size of 32k, but might actually
>>> be smaller than that. To figure out the actual size one has to read
>>> the VPD area until the 'end marker' is reached.
>>> Trying to read VPD data beyond that marker results in 'interesting'
>>> effects, from simple read errors to crashing the card. And to make
>>> matters worse not every PCI card implements this properly, leaving
>>> us with no 'end' marker or even completely invalid data.
>>> This patch tries to determine the size of the VPD data.
>>> If no valid data can be read an I/O error will be returned when
>>> reading the sysfs attribute.
>
>
> I have a problem with this particular feature as today VFIO uses this
> pci_vpd_ API to virtualize access to VPD and the existing code assumes
> there is just one VPD block with 0x2 start and 0xf end. However I have at
> least one device where this is not true - "10 Gigabit Ethernet-SR PCI
> Express Adapter" - it has 2 blocks (made a script to read/parse it as
> /sys/bus/pci/devices/0001\:03\:00.0/vpd shows it wrong):

The PCI spec is what essentially assumes that there is only one block. If I am not mistaken, in the case of this device the second block here actually contains device configuration data, not actual VPD data. The issue here is that the second block is being accessed as VPD when it isn't.

> # Large item 42 bytes; name 0x2 Identifier String
> #002d Large item 74 bytes; name 0x10
> #007a Small item 1 bytes; name 0xf End Tag
> ---
> #0c00 Large item 16 bytes; name 0x2 Identifier String
> #0c13 Large item 234 bytes; name 0x10
> #0d00 Large item 252 bytes; name 0x11
> #0dff Small item 0 bytes; name 0xf End Tag

The second block here is driver proprietary setup bits.
> The cxgb3 driver is reading the second bit starting from 0xc00 but since
> the size is wrongly detected as 0x7c, VFIO blocks access beyond it and the
> guest driver fails to probe.
>
> I also cannot find a clause in the PCI 3.0 spec saying that there must be
> just a single block, is it there?

The problem is we need to be able to parse it. The spec defines a series of tags that can be used starting at offset 0. That is how we are supposed to navigate the VPD data. The problem is we can't have more than one end tag, and what appears to be happening here is that we are defining a second block of data which uses the same formatting as VPD but is not VPD.

> What would the correct fix be? Scanning all 32k of VPD is not an option I
> suppose as this is what this patch is trying to avoid. Thanks.

I am adding the current cxgb3 maintainer and the netdev list to the Cc. This is something that can probably be addressed via a PCI quirk, as what needs to happen is that we need to extend the VPD in the case of this part in order to include this second block. As long as we can read the VPD data all the way out to 0xdff, odds are we could probably just have the size arbitrarily increased to 0xe00 via the quirk and then you would be able to access all of the VPD for the device. We already have code in drivers/pci/quirks.c making other modifications for several Broadcom devices and probably just need something similar to allow extended access in the case of these devices.
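To make the tag walk concrete, here is a userspace sketch of how a size could be derived from the first end tag. It is not the kernel's implementation; vpd_size() is a hypothetical name, and the encoding assumed is the usual one for PCI VPD resources: a large item has bit 7 set, a 7-bit name and a 16-bit little-endian length, while a small item carries a 4-bit name in bits 6:3 and a length in bits 2:0, with name 0xf as the end tag.

```c
#include <stddef.h>
#include <stdint.h>

#define VPD_LARGE_FLAG     0x80	/* bit 7 set: large resource item */
#define VPD_SMALL_END_NAME 0x0f	/* small item name of the end tag */

/* Walk the items from offset 0 and return the number of bytes up to and
 * including the first end tag, or 0 if no end tag is found within
 * max_len bytes. */
static size_t vpd_size(const uint8_t *buf, size_t max_len)
{
	size_t off = 0;

	while (off < max_len) {
		uint8_t hdr = buf[off];

		if (hdr & VPD_LARGE_FLAG) {
			size_t len;

			/* Large item: 1 header byte + 2 length bytes + data */
			if (off + 3 > max_len)
				return 0;
			len = (size_t)buf[off + 1] | ((size_t)buf[off + 2] << 8);
			off += 3 + len;
		} else {
			uint8_t name = (hdr >> 3) & 0x0f;
			size_t len = hdr & 0x07;

			/* Small item: 1 header byte + up to 7 data bytes */
			off += 1 + len;
			if (name == VPD_SMALL_END_NAME)
				return off <= max_len ? off : 0;
		}
	}
	return 0;
}
```

Fed the first block from the listing above (a 42-byte large item at offset 0, a 74-byte large item at 0x2d and a 1-byte end tag at 0x7a), the walk stops at 0x7c, which is the detected size quoted in the report. Anything stored at 0xc00 is never reached by such a walk, which is why a quirk that overrides the size is being suggested.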
> > > > This is the device: > > [aik@p81-p9 ~]$ sudo lspci -vvnns 0001:03:00.0 > 0001:03:00.0 Ethernet controller [0200]: Chelsio Communications Inc T310 > 10GbE Single Port Adapter [1425:0030] > Subsystem: IBM Device [1014:038c] > Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- > Stepping- SERR- FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- > SERR- Latency: 0 > Interrupt: pin A routed to IRQ 494 > Region 0: Memory at 3fe08088 (64-bit, non-prefetchable) [size=4K] > Region 2: Memory at 3fe08000 (64-bit, non-prefetchable) [size=8M] > Region 4: Memory at 3fe080881000 (64-bit, non-prefetchable) [size=4K] > [virtual] Expansion ROM at 3fe08080 [disabled] [size=512K] > Capabilities: [40] Power Management version 3 > Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA > PME(D0+,D1-,D2-,D3hot+,D3cold-) > Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [48] MSI: Enable- Count=1/32 Maskable- 64bit+ > Address: Data: > Capabilities: [58] Express (v2) Endpoint, MSI 00 > DevCap: MaxPayload 4096 bytes, PhantFunc 0, Latency L0s > <64ns, L1 <1us > ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- > DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ > Unsupported+ > RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+ > MaxPayload 256 bytes, MaxR
Re: size of data_segs_[in|out] and segs_[in|out]
On 8/8/16 7:02 PM, David Miller wrote:
> From: rapier
> Date: Mon, 8 Aug 2016 18:02:29 -0400
>
>> As such, would it be feasible to define these instruments as 64bit
>> instead of 32bit? If so, a cursory look at the code seems to indicate
>> that this would only require a change in the header files.
>
> It would break every application looking at these datastructures right now.

I cannot deny that would be a problem, but conversely, those applications are currently in a position where they may be depending on inaccurate data. I'm not advocating breaking things for the sake of breaking things, but my feeling is that this will eventually need to be addressed. Since the segs data structure is a relatively recent addition, it might make sense to make that change now, before it's even more baked into other applications.
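As a rough illustration of the accuracy concern, the sketch below models a 32-bit and a 64-bit segment counter side by side and estimates how quickly the 32-bit one wraps. The struct and function names are hypothetical (they are not the kernel's tcp_info fields), and the 10 Gbit/s link with 1500-byte segments used afterwards is an assumed workload, not a figure from this thread.

```c
#include <stdint.h>

/* Hypothetical pair of counters fed with the same segment counts. */
struct seg_counters {
	uint32_t segs32;	/* wraps modulo 2^32 */
	uint64_t segs64;	/* effectively never wraps */
};

static void count_segments(struct seg_counters *c, uint64_t n)
{
	c->segs32 += (uint32_t)n;
	c->segs64 += n;
}

/* Seconds until a 32-bit segment counter wraps at a sustained rate. */
static uint64_t seconds_until_wrap32(uint64_t link_bits_per_sec,
				     uint64_t seg_bytes)
{
	uint64_t segs_per_sec = link_bits_per_sec / (8 * seg_bytes);

	return segs_per_sec ? (UINT64_C(1) << 32) / segs_per_sec : UINT64_MAX;
}
```

With those assumed numbers, seconds_until_wrap32(10000000000, 1500) comes out at 5153 seconds, i.e. under an hour and a half of sustained traffic before the 32-bit value stops being meaningful on its own.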
Re: [PATCH] bonding: Allow tun-interfaces as slaves
On Tue, Aug 09, 2016 at 09:28:45PM +0800, Ding Tianhong wrote:
>
> This patch is a simple solution for this problem, but I don't think it is the
> right solution, the bond is a virtual device based on L2,
> if the slave has no mac address, it will break the design principle, so we
> need to think more about it.

The important point is: it worked. It solved a problem that at least three people cared enough about to send a bug report. Now it doesn't work anymore. That is a regression.

Whether or not L2 has always been a design principle for bonding can be argued as well. But in the face of a regression, I suggest we fix the regression.

> I think if the bonding dev has to support L3 virtual device, we need to add
> new bond features to distinguish the dev and make the
> bond xmit and transfer without the mac address.

Simply not checking errors when setting the mac address solves the problem for me. No new features needed.

If you want to retain error handling, you can make those checks conditional on the mode. In balance-rr or broadcast mode, ignore the error. I don't need and haven't tested broadcast mode, but it doesn't seem to depend on any L2 attributes either.

Jörn

--
Do not stop an army on its way home.
-- Sun Tzu
Re: [Patch net 5/5] net_sched: convert tcf_exts from list to flex_array
On Tue, Aug 9, 2016 at 1:03 AM, Amir Vadai wrote:
>
>> -#define tc_single_action(_exts) \
>> -	(list_is_singular(&(_exts)->actions))
>> +#define tc_no_actions(_exts) (&(_exts)->nr_actions == 0)
>> +#define tc_single_action(_exts) (&(_exts)->nr_actions == 1)
>
> Should remove the '&' here.

Good catch! I didn't even notice the '&' there. :-/

I will wait for Jamal's comments before sending v2.

Thanks.
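For readers skimming the thread, a standalone sketch of why the stray '&' matters: with it, the macro compares the address of nr_actions against 0 or 1 instead of the counter itself, so neither test can reflect the real number of actions. struct exts_model below is a hypothetical stand-in for struct tcf_exts, and the macros are the corrected forms one would expect in v2, not a quote from it.

```c
/* Hypothetical stand-in for struct tcf_exts, reduced to the one field
 * the macros look at. */
struct exts_model {
	int nr_actions;
};

/*
 * The posted hunk had:
 *   #define tc_no_actions(_exts) (&(_exts)->nr_actions == 0)
 * '->' binds tighter than unary '&', so that expression takes the ADDRESS
 * of the member and compares it with 0. For a valid object the address is
 * never null, so the test is always false regardless of the count.
 *
 * Without the '&' the macros read the counter value:
 */
#define tc_no_actions(_exts)    ((_exts)->nr_actions == 0)
#define tc_single_action(_exts) ((_exts)->nr_actions == 1)
```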
[PATCH net] ibmveth: Disable tx queue while changing mtu
If the device is running while the MTU is changed, ibmveth is closed and the bounce buffer is freed. If a transmission is sent before ibmveth can be reopened, ibmveth_start_xmit tries to copy to the null bounce buffer, leading to a kernel oops. The proposed solution disables the tx queue until ibmveth is restarted. Reported-by: Jan Stancek Tested-by: Jan Stancek Signed-off-by: Thomas Falcon --- drivers/net/ethernet/ibm/ibmveth.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/ibm/ibmveth.c b/drivers/net/ethernet/ibm/ibmveth.c index ebe6071..9a74e4c 100644 --- a/drivers/net/ethernet/ibm/ibmveth.c +++ b/drivers/net/ethernet/ibm/ibmveth.c @@ -1362,6 +1362,7 @@ static int ibmveth_change_mtu(struct net_device *dev, int new_mtu) /* Deactivate all the buffer pools so that the next loop can activate only the buffer pools necessary to hold the new MTU */ if (netif_running(adapter->netdev)) { + netif_tx_disable(dev); need_restart = 1; adapter->pool_config = 1; ibmveth_close(adapter->netdev); @@ -1378,14 +1379,18 @@ static int ibmveth_change_mtu(struct net_device *dev, int new_mtu) ibmveth_get_desired_dma (viodev)); if (need_restart) { - return ibmveth_open(adapter->netdev); + rc = ibmveth_open(adapter->netdev); + netif_wake_queue(dev); + return rc; } return 0; } } - if (need_restart && (rc = ibmveth_open(adapter->netdev))) + if (need_restart && (rc = ibmveth_open(adapter->netdev))) { + netif_wake_queue(dev); return rc; + } return -EINVAL; } -- 2.4.11
Re: [PATCH v2] net: dsa: b53: constify b53_io_ops structures
On 08/09/2016 10:09 AM, Julia Lawall wrote: > The b53_io_ops structures are never modified, so declare them as const. > > Done with the help of Coccinelle. > > Signed-off-by: Julia Lawall Acked-by: Florian Fainelli -- Florian
Re: [PATCH] net: dsa: b53: constify xfrm_replay structures
On Tue, 9 Aug 2016, Florian Fainelli wrote:

> On 08/09/2016 09:58 AM, Julia Lawall wrote:
> > The xfrm_replay structures are never modified, so declare them as const.
>
> You mean b53_io_ops here, right?

Oops. I sent a v2.

julia

> Other than that LGTM, but this will
> have to wait for net-next to re-open since this is more of a cleanup
> than bugfix. Thanks!
> --
> Florian
Re: [PATCH] dm9000: Fix irq trigger type setup on non-dt platforms
Sylwester Nawrocki writes:

> Commit b5a099c67a1c36b "net: ethernet: davicom: fix devicetree irq
> resource" causes an interrupt storm after the ethernet interface
> is activated on S3C24XX platform (ARM non-dt), due to the interrupt
> trigger type not being set properly.
>
> It seems, after adding parsing of IRQ flags in commit 7085a7401ba54e92b
> "drivers: platform: parse IRQ flags from resources", there is no path
> for non-dt platforms where irq_set_type callback could be invoked when
> we don't pass the trigger type flags to the request_irq() call.
>
> In case of a board where the regression is seen the interrupt trigger
> type flags are passed through a platform device's resource and it is
> not currently handled properly without passing the irq trigger type
> flags to the request_irq() call. In case of OF an of_irq_get() call
> within platform_get_irq() function seems to be ensuring required irq_chip
> setup, but there is no equivalent code for non OF/ACPI platforms.
>
> This patch mostly restores irq trigger type setting code which has been
> removed in commit ("net: ethernet: davicom: fix devicetree irq resource").
>
> Fixes: b5a099c67a1c36b913 ("net: ethernet: davicom: fix devicetree irq
> resource")
>
> Signed-off-by: Sylwester Nawrocki
> ---
>
> Perhaps instead the core could be configuring the irqchip automatically as it
> is done for OF/ACPI cases. I had doubts though if trying to make such changes
> for a bug fix patch was the right thing to do.

Hi Sylwester,

You're right, and I came to the same conclusion a bit earlier, in [1], but I didn't notice my FAI didn't actually send the mail.

Your analysis of the core in the non-OF/ACPI case is the reason I didn't post a patch for dm9000 ... I was overconfident in finding a reason in irq core code within a couple of days.

Therefore:
Acked-by: Robert Jarzmik

And I can run a test for you on my cm-x300 board, even though your patch is very similar to the draft I have had in my internal tree since then.

Cheers.

--
Robert

[1] Non-delivered mail, shame on me

From: Robert Jarzmik
To: Linus Walleij
Cc: linux-arm-ker...@lists.infradead.org, Thomas Gleixner , linux-ker...@vger.kernel.org "David S. Miller"
Subject: platform_get_irq and trigger types
X-URL: http://belgarath.falguerolles.org/
Date: Sat, 21 May 2016 11:16:09 +0200
Message-ID: <87y473hiue@belgarion.home>
User-Agent: Gnus/5.130008 (Ma Gnus v0.8) Emacs/24.4 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain

Hi Linus,

I was bitten again by the rising/falling flags of interrupt flags. The commit which triggered this "regression" (the wording regression is rather incorrect so please don't take this as an incentive to revert) is:
b5a099c67a1c ("net: ethernet: davicom: fix devicetree irq resource")

The exact context is that for platform type builds, the irq rising edge flag is not activated in the irqchip, i.e. in gpio-pxa.c pxa_gpio_irq_type() is not called. The board used for this test is arch/arm/mach-pxa/cm-x300.c (line 200).

Now I've started to add printks here and there, and at first glance:
- platform_get_irq() is correctly calling irqd_set_trigger_type()
- but upon the request_irq() in drivers/net/ethernet/davicom/dm9000.c:1319, these flags are not taken into account
  => this is where commit b5a099c67a1c comes into play
  => re-adding irq_get_trigger_type(dev->irq) to the passed flags does solve the issue

I tried to ponder whether my commit was wrong, or if it's gpio-pxa.c which is wrong, or something else. My inner feeling is that the dm9000.c code is now correct, and that something else is happening that I don't understand.

I'm bringing this to your attention in case you have an idea before I begin to dig deeper, add printk() and go down to the problem.

Cheers.

--
Robert
[PATCH v2] net: dsa: b53: constify b53_io_ops structures
The b53_io_ops structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall --- v2: Refer to the right structure in the commit message drivers/net/dsa/b53/b53_common.c |3 ++- drivers/net/dsa/b53/b53_mdio.c |2 +- drivers/net/dsa/b53/b53_mmap.c |2 +- drivers/net/dsa/b53/b53_priv.h |3 ++- drivers/net/dsa/b53/b53_spi.c|2 +- drivers/net/dsa/b53/b53_srab.c |2 +- 6 files changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index bda37d3..38ee10d 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -1681,7 +1681,8 @@ static int b53_switch_init(struct b53_device *dev) return 0; } -struct b53_device *b53_switch_alloc(struct device *base, struct b53_io_ops *ops, +struct b53_device *b53_switch_alloc(struct device *base, + const struct b53_io_ops *ops, void *priv) { struct dsa_switch *ds; diff --git a/drivers/net/dsa/b53/b53_mdio.c b/drivers/net/dsa/b53/b53_mdio.c index aa87c3f..477a16b 100644 --- a/drivers/net/dsa/b53/b53_mdio.c +++ b/drivers/net/dsa/b53/b53_mdio.c @@ -267,7 +267,7 @@ static int b53_mdio_phy_write16(struct b53_device *dev, int addr, int reg, return mdiobus_write_nested(bus, addr, reg, value); } -static struct b53_io_ops b53_mdio_ops = { +static const struct b53_io_ops b53_mdio_ops = { .read8 = b53_mdio_read8, .read16 = b53_mdio_read16, .read32 = b53_mdio_read32, diff --git a/drivers/net/dsa/b53/b53_mmap.c b/drivers/net/dsa/b53/b53_mmap.c index 77ffc43..cc9e6bd 100644 --- a/drivers/net/dsa/b53/b53_mmap.c +++ b/drivers/net/dsa/b53/b53_mmap.c @@ -208,7 +208,7 @@ static int b53_mmap_write64(struct b53_device *dev, u8 page, u8 reg, return 0; } -static struct b53_io_ops b53_mmap_ops = { +static const struct b53_io_ops b53_mmap_ops = { .read8 = b53_mmap_read8, .read16 = b53_mmap_read16, .read32 = b53_mmap_read32, diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h index 
835a744..d268493 100644 --- a/drivers/net/dsa/b53/b53_priv.h +++ b/drivers/net/dsa/b53/b53_priv.h @@ -182,7 +182,8 @@ static inline int is_cpu_port(struct b53_device *dev, int port) return dev->cpu_port; } -struct b53_device *b53_switch_alloc(struct device *base, struct b53_io_ops *ops, +struct b53_device *b53_switch_alloc(struct device *base, + const struct b53_io_ops *ops, void *priv); int b53_switch_detect(struct b53_device *dev); diff --git a/drivers/net/dsa/b53/b53_spi.c b/drivers/net/dsa/b53/b53_spi.c index 2bda0b5..1c847a1 100644 --- a/drivers/net/dsa/b53/b53_spi.c +++ b/drivers/net/dsa/b53/b53_spi.c @@ -270,7 +270,7 @@ static int b53_spi_write64(struct b53_device *dev, u8 page, u8 reg, u64 value) return spi_write(spi, txbuf, sizeof(txbuf)); } -static struct b53_io_ops b53_spi_ops = { +static const struct b53_io_ops b53_spi_ops = { .read8 = b53_spi_read8, .read16 = b53_spi_read16, .read32 = b53_spi_read32, diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c index 3e2d4a5..8a62b6a 100644 --- a/drivers/net/dsa/b53/b53_srab.c +++ b/drivers/net/dsa/b53/b53_srab.c @@ -344,7 +344,7 @@ err: return ret; } -static struct b53_io_ops b53_srab_ops = { +static const struct b53_io_ops b53_srab_ops = { .read8 = b53_srab_read8, .read16 = b53_srab_read16, .read32 = b53_srab_read32,
Re: [PATCH] net: dsa: b53: constify xfrm_replay structures
On 08/09/2016 10:23 AM, Florian Fainelli wrote:
> On 08/09/2016 09:58 AM, Julia Lawall wrote:
>> The xfrm_replay structures are never modified, so declare them as const.
>
> You mean b53_io_ops here, right? Other than that LGTM, but this will
> have to wait for net-next to re-open since this is more of a cleanup
> than bugfix. Thanks!

net-next is open again, just noticed that, please resubmit with s/xfrm_replay/b53_io_ops/, thanks
--
Florian
Re: [PATCH] net: dsa: b53: constify xfrm_replay structures
On 08/09/2016 09:58 AM, Julia Lawall wrote:
> The xfrm_replay structures are never modified, so declare them as const.

You mean b53_io_ops here, right? Other than that LGTM, but this will have to wait for net-next to re-open since this is more of a cleanup than bugfix. Thanks!
--
Florian
[PATCH] net: dsa: b53: constify xfrm_replay structures
The xfrm_replay structures are never modified, so declare them as const. Done with the help of Coccinelle. Signed-off-by: Julia Lawall --- drivers/net/dsa/b53/b53_common.c |3 ++- drivers/net/dsa/b53/b53_mdio.c |2 +- drivers/net/dsa/b53/b53_mmap.c |2 +- drivers/net/dsa/b53/b53_priv.h |3 ++- drivers/net/dsa/b53/b53_spi.c|2 +- drivers/net/dsa/b53/b53_srab.c |2 +- 6 files changed, 8 insertions(+), 6 deletions(-) diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c index bda37d3..38ee10d 100644 --- a/drivers/net/dsa/b53/b53_common.c +++ b/drivers/net/dsa/b53/b53_common.c @@ -1681,7 +1681,8 @@ static int b53_switch_init(struct b53_device *dev) return 0; } -struct b53_device *b53_switch_alloc(struct device *base, struct b53_io_ops *ops, +struct b53_device *b53_switch_alloc(struct device *base, + const struct b53_io_ops *ops, void *priv) { struct dsa_switch *ds; diff --git a/drivers/net/dsa/b53/b53_mdio.c b/drivers/net/dsa/b53/b53_mdio.c index aa87c3f..477a16b 100644 --- a/drivers/net/dsa/b53/b53_mdio.c +++ b/drivers/net/dsa/b53/b53_mdio.c @@ -267,7 +267,7 @@ static int b53_mdio_phy_write16(struct b53_device *dev, int addr, int reg, return mdiobus_write_nested(bus, addr, reg, value); } -static struct b53_io_ops b53_mdio_ops = { +static const struct b53_io_ops b53_mdio_ops = { .read8 = b53_mdio_read8, .read16 = b53_mdio_read16, .read32 = b53_mdio_read32, diff --git a/drivers/net/dsa/b53/b53_mmap.c b/drivers/net/dsa/b53/b53_mmap.c index 77ffc43..cc9e6bd 100644 --- a/drivers/net/dsa/b53/b53_mmap.c +++ b/drivers/net/dsa/b53/b53_mmap.c @@ -208,7 +208,7 @@ static int b53_mmap_write64(struct b53_device *dev, u8 page, u8 reg, return 0; } -static struct b53_io_ops b53_mmap_ops = { +static const struct b53_io_ops b53_mmap_ops = { .read8 = b53_mmap_read8, .read16 = b53_mmap_read16, .read32 = b53_mmap_read32, diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h index 835a744..d268493 100644 --- a/drivers/net/dsa/b53/b53_priv.h +++ 
b/drivers/net/dsa/b53/b53_priv.h @@ -182,7 +182,8 @@ static inline int is_cpu_port(struct b53_device *dev, int port) return dev->cpu_port; } -struct b53_device *b53_switch_alloc(struct device *base, struct b53_io_ops *ops, +struct b53_device *b53_switch_alloc(struct device *base, + const struct b53_io_ops *ops, void *priv); int b53_switch_detect(struct b53_device *dev); diff --git a/drivers/net/dsa/b53/b53_spi.c b/drivers/net/dsa/b53/b53_spi.c index 2bda0b5..1c847a1 100644 --- a/drivers/net/dsa/b53/b53_spi.c +++ b/drivers/net/dsa/b53/b53_spi.c @@ -270,7 +270,7 @@ static int b53_spi_write64(struct b53_device *dev, u8 page, u8 reg, u64 value) return spi_write(spi, txbuf, sizeof(txbuf)); } -static struct b53_io_ops b53_spi_ops = { +static const struct b53_io_ops b53_spi_ops = { .read8 = b53_spi_read8, .read16 = b53_spi_read16, .read32 = b53_spi_read32, diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c index 3e2d4a5..8a62b6a 100644 --- a/drivers/net/dsa/b53/b53_srab.c +++ b/drivers/net/dsa/b53/b53_srab.c @@ -344,7 +344,7 @@ err: return ret; } -static struct b53_io_ops b53_srab_ops = { +static const struct b53_io_ops b53_srab_ops = { .read8 = b53_srab_read8, .read16 = b53_srab_read16, .read32 = b53_srab_read32,
RE: [RFC PATCH 2/3] net: macb: Add support for 1588 for Zynq Ultrascale+ MPSoC
Hi Nicolas,

The 1588 implementation in the Cadence GEM IP we have in Zynq Ultrascale+ MPSoC is different from the one in Zynq SoC.

In the earlier version, all timestamp values are stored in registers, and there is no specific mechanism to distinguish a received Ethernet frame that contains timestamp information other than parsing the frame for the PTP packet type.

We have a basic implementation for the earlier version in our out-of-tree driver, which is going to be deprecated soon. You could also check the driver below for 1588 support.
https://gitenterprise.xilinx.com/Linux/linux-xlnx/blob/master/drivers/net/ethernet/xilinx/xilinx_emacps.c

Regards,
Punnaiah

> -Original Message-
> From: Nicolas Ferre [mailto:nicolas.fe...@atmel.com]
> Sent: Tuesday, August 09, 2016 10:10 PM
> To: Harini Katakam ; Harini Katakam
> ; Andrei Pistirica
> Cc: da...@davemloft.net; Boris Brezillon electrons.com>; alexandre.bell...@free-electrons.com;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> devicet...@vger.kernel.org; Punnaiah Choudary Kalluri
> ; Michal Simek ; Anirudha
> Sarangi
> Subject: Re: [RFC PATCH 2/3] net: macb: Add support for 1588 for Zynq
> Ultrascale+ MPSoC
>
> Le 21/09/2015 à 19:49, Harini Katakam a écrit :
> > On Fri, Sep 11, 2015 at 1:27 PM, Harini Katakam
> > wrote:
> >> Cadence GEM in Zynq Ultrascale+ MPSoC supports 1588 and provides a
> >> 102 bit time counter with 48 bits for seconds, 30 bits for nsecs and
> >> 24 bits for sub-nsecs. The timestamp is made available to the SW through
> >> registers as well as (more precisely) through upper two words in
> >> an extended BD.
> >>
> >> This patch does the following:
> >> - Adds MACB_CAPS_TSU in zynqmp_config.
> >> - Registers to ptp clock framework (after checking for timestamp support
> in
> >> IP and capability in config).
> >> - TX BD and RX BD control registers are written to populate timestamp in
> >> extended BD words.
> >> - Timer initialization is done by writing time of day to the timer counter.
> >> - ns increment register is programmed as NS_PER_SEC/TSU_CLK. > >> For a 24 bit subns precision, the subns increment equals > >> remainder of (NS_PER_SEC/TSU_CLK) * (2^24). > >> TSU (Time stamp unit) clock is obtained by the driver from devicetree. > >> - HW time stamp capabilities are advertised via ethtool and macb ioctl is > >> updated accordingly. > >> - For all PTP event frames, nanoseconds and the lower 5 bits of seconds > are > >> obtained from the BD. This offers a precise timestamp. The upper bits > >> (which dont vary between consecutive packets) are obtained from the > >> TX/RX PTP event/PEER registers. The timestamp obtained thus is > updated > >> in skb for upper layers to access. > >> - The drivers register functions with ptp to perform time and frequency > >> adjustment. > >> - Time adjustment is done by writing to the 1558_ADJUST register. > >> The controller will read the delta in this register and update the timer > >> counter register. Alternatively, for large time offset adjustments, > >> the driver reads the secs and nsecs counter values, adds/subtracts the > >> delta and updates the timer counter. In order to be as precise as > possible, > >> nsecs counter is read again if secs has incremented during the counter > read. > >> - Frequency adjustment is not directly supported by this IP. > >> addend is the initial value ns increment and similarly addendesub. > >> The ppb (parts per billion) provided is used as > >> ns_incr = addend +/- (ppb/rate). > >> Similarly the remainder of the above is used to populate subns > increment. > >> In case the ppb requested is negative AND subns adjustment greater > than > >> the addendsub, ns_incr is reduced by 1 and subns_incr is adjusted in > >> positive accordingly. 
> >>
> >> Signed-off-by: Harini Katakam :
> >> ---
> >> drivers/net/ethernet/cadence/macb.c | 372 ++-
> >> drivers/net/ethernet/cadence/macb.h | 64 ++
> >> 2 files changed, 428 insertions(+), 8 deletions(-)
> >>
> >> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
> >> index bb2932c..b531008 100644
> >> --- a/drivers/net/ethernet/cadence/macb.c
> >> +++ b/drivers/net/ethernet/cadence/macb.c
> >> @@ -30,6 +30,8 @@
> >> #include
> >> #include
>
> [..]
>
> >> + unsigned int ns_incr;
> >> + unsigned int subns_incr;
> >> };
> >>
> >> static inline bool macb_is_gem(struct macb *bp)
> >> --
> >> 1.7.9.5
> >
> > Ping
> >
> > Thanks.
>
> Harini,
>
> I come back to this patch of last year and I'm sorry about being so late
> answering you.
>
> Andrei who is added to the discussion will have some time to deal with
> this feature and we would like to make some progress with it. He already
> had some work done on his side before I recall your email.
>
> So, could you please re-send your original 1588 patch with Andrei in
> copy so that we can all (re-)start the discu
[PATCH] xfrm: constify xfrm_replay structures
The xfrm_replay structures are never modified, so declare them as const.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall

---
 include/net/xfrm.h     |    2 +-
 net/xfrm/xfrm_replay.c |    6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index adfebd6..d2fdd6d 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -187,7 +187,7 @@ struct xfrm_state {
 	struct xfrm_replay_state_esn *preplay_esn;

 	/* The functions for replay detection. */
-	struct xfrm_replay	*repl;
+	const struct xfrm_replay *repl;

 	/* internal flag that only holds state for delayed aevent at the
 	 * moment
diff --git a/net/xfrm/xfrm_replay.c b/net/xfrm/xfrm_replay.c
index 4fd725a..cdc2e2e 100644
--- a/net/xfrm/xfrm_replay.c
+++ b/net/xfrm/xfrm_replay.c
@@ -558,7 +558,7 @@ static void xfrm_replay_advance_esn(struct xfrm_state *x, __be32 net_seq)
 	x->repl->notify(x, XFRM_REPLAY_UPDATE);
 }

-static struct xfrm_replay xfrm_replay_legacy = {
+static const struct xfrm_replay xfrm_replay_legacy = {
 	.advance	= xfrm_replay_advance,
 	.check		= xfrm_replay_check,
 	.recheck	= xfrm_replay_check,
@@ -566,7 +566,7 @@ static struct xfrm_replay xfrm_replay_legacy = {
 	.overflow	= xfrm_replay_overflow,
 };

-static struct xfrm_replay xfrm_replay_bmp = {
+static const struct xfrm_replay xfrm_replay_bmp = {
 	.advance	= xfrm_replay_advance_bmp,
 	.check		= xfrm_replay_check_bmp,
 	.recheck	= xfrm_replay_check_bmp,
@@ -574,7 +574,7 @@ static struct xfrm_replay xfrm_replay_bmp = {
 	.overflow	= xfrm_replay_overflow_bmp,
 };

-static struct xfrm_replay xfrm_replay_esn = {
+static const struct xfrm_replay xfrm_replay_esn = {
 	.advance	= xfrm_replay_advance_esn,
 	.check		= xfrm_replay_check_esn,
 	.recheck	= xfrm_replay_recheck_esn,
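For readers unfamiliar with the pattern, here is a minimal standalone sketch of what constifying an ops table looks like. It is not the xfrm code: `struct replay_ops`, `check_legacy`, `struct state` and `state_check` are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical miniature of an ops table; not the real xfrm_replay layout. */
struct replay_ops {
	int (*check)(uint32_t seq, uint32_t last);
};

/* Accept only sequence numbers newer than the last one seen. */
static int check_legacy(uint32_t seq, uint32_t last)
{
	return seq > last ? 0 : -1;
}

/*
 * Declared const because nothing writes to it after initialisation.
 * Toolchains typically place such objects in read-only data.
 */
static const struct replay_ops replay_legacy = {
	.check = check_legacy,
};

struct state {
	uint32_t last_seq;
	/* Pointer-to-const, matching the const table it refers to. */
	const struct replay_ops *repl;
};

/* Callers only ever read through the pointer, so const costs nothing. */
static int state_check(const struct state *s, uint32_t seq)
{
	return s->repl->check(seq, s->last_seq);
}
```

With this in place, an accidental write such as `s->repl->check = NULL;` is rejected at compile time, which is the property the patch is after.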
Re: [RFC PATCH 2/3] net: macb: Add support for 1588 for Zynq Ultrascale+ MPSoC
On 21/09/2015 at 19:49, Harini Katakam wrote:
> On Fri, Sep 11, 2015 at 1:27 PM, Harini Katakam
> wrote:
>> Cadence GEM in Zynq Ultrascale+ MPSoC supports 1588 and provides a
>> 102 bit time counter with 48 bits for seconds, 30 bits for nsecs and
>> 24 bits for sub-nsecs. The timestamp is made available to the SW through
>> registers as well as (more precisely) through upper two words in
>> an extended BD.
>>
>> This patch does the following:
>> - Adds MACB_CAPS_TSU in zynqmp_config.
>> - Registers to ptp clock framework (after checking for timestamp support in
>>   IP and capability in config).
>> - TX BD and RX BD control registers are written to populate timestamp in
>>   extended BD words.
>> - Timer initialization is done by writing time of day to the timer counter.
>> - ns increment register is programmed as NS_PER_SEC/TSU_CLK.
>>   For a 24 bit subns precision, the subns increment equals
>>   remainder of (NS_PER_SEC/TSU_CLK) * (2^24).
>>   TSU (Time stamp unit) clock is obtained by the driver from devicetree.
>> - HW time stamp capabilities are advertised via ethtool and macb ioctl is
>>   updated accordingly.
>> - For all PTP event frames, nanoseconds and the lower 5 bits of seconds are
>>   obtained from the BD. This offers a precise timestamp. The upper bits
>>   (which dont vary between consecutive packets) are obtained from the
>>   TX/RX PTP event/PEER registers. The timestamp obtained thus is updated
>>   in skb for upper layers to access.
>> - The drivers register functions with ptp to perform time and frequency
>>   adjustment.
>> - Time adjustment is done by writing to the 1558_ADJUST register.
>>   The controller will read the delta in this register and update the timer
>>   counter register. Alternatively, for large time offset adjustments,
>>   the driver reads the secs and nsecs counter values, adds/subtracts the
>>   delta and updates the timer counter. In order to be as precise as possible,
>>   nsecs counter is read again if secs has incremented during the counter
>>   read.
>> - Frequency adjustment is not directly supported by this IP.
>>   addend is the initial value ns increment and similarly addendesub.
>>   The ppb (parts per billion) provided is used as
>>   ns_incr = addend +/- (ppb/rate).
>>   Similarly the remainder of the above is used to populate subns increment.
>>   In case the ppb requested is negative AND subns adjustment greater than
>>   the addendsub, ns_incr is reduced by 1 and subns_incr is adjusted in
>>   positive accordingly.
>>
>> Signed-off-by: Harini Katakam :
>> ---
>> drivers/net/ethernet/cadence/macb.c | 372 ++-
>> drivers/net/ethernet/cadence/macb.h | 64 ++
>> 2 files changed, 428 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/cadence/macb.c b/drivers/net/ethernet/cadence/macb.c
>> index bb2932c..b531008 100644
>> --- a/drivers/net/ethernet/cadence/macb.c
>> +++ b/drivers/net/ethernet/cadence/macb.c
>> @@ -30,6 +30,8 @@
>> #include
>> #include

[..]

>> + unsigned int ns_incr;
>> + unsigned int subns_incr;
>> };
>>
>> static inline bool macb_is_gem(struct macb *bp)
>> --
>> 1.7.9.5
>
> Ping
>
> Thanks.

Harini,

I am coming back to this patch from last year, and I'm sorry for being so
late in answering you.

Andrei, who has been added to the discussion, will have some time to work on
this feature and we would like to make some progress on it. He had already
done some work on his side before I recalled your email.

So, could you please re-send your original 1588 patch with Andrei in copy,
so that we can all (re-)start the discussion and make progress on adding this
feature?

We must also note that some hardware differences between our platforms may
have an impact on the code and on how we implement things (as highlighted on
this forum: http://www.at91.com/discussions/viewtopic.php/f,12/t,25462.html).

Anyway, we'll overcome this and have a widely tested solution at the end of
the day!

Thanks for your patience, bye!

PS: for some reason, I only have the "ping" part of your email but not the
original one.
--
Nicolas Ferre
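As a reference point for the discussion, the increment arithmetic from the quoted patch description can be sketched in plain C. This is a minimal sketch and not the driver code: `tsu_calc_incr`, `struct tsu_incr` and `SUBNS_BITS` are hypothetical names. It assumes that "remainder of (NS_PER_SEC/TSU_CLK) * (2^24)" means the fractional part of the division expressed in units of 2^-24 ns, and that the TSU clock rate is non-zero.

```c
#include <assert.h>
#include <stdint.h>

#define NS_PER_SEC 1000000000ULL
#define SUBNS_BITS 24	/* sub-ns field width stated in the patch description */

/* Hypothetical container for the two increment values. */
struct tsu_incr {
	uint32_t ns;	/* whole nanoseconds added per TSU clock tick */
	uint32_t subns;	/* fractional nanoseconds, in units of 2^-24 ns */
};

/*
 * Split NS_PER_SEC / tsu_rate_hz into an integer part and a 24-bit
 * fraction. The caller must pass a non-zero rate.
 */
static struct tsu_incr tsu_calc_incr(uint32_t tsu_rate_hz)
{
	struct tsu_incr incr;
	uint64_t rem = NS_PER_SEC % tsu_rate_hz;

	incr.ns = (uint32_t)(NS_PER_SEC / tsu_rate_hz);
	/* rem < 2^32, so the shifted value stays well below 2^64. */
	incr.subns = (uint32_t)((rem << SUBNS_BITS) / tsu_rate_hz);
	return incr;
}
```

For a 250 MHz TSU clock this gives 4 ns per tick with no fraction. For 300 MHz it gives 3 ns plus one third of a nanosecond, which truncates to 5592405 in 24-bit fixed point.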
[PATCH net 0/6] rxrpc: Miscellaneous fixes
Hi Dave,

Here are a bunch of miscellaneous fixes to AF_RXRPC:

 (*) Fix an uninitialised pointer.

 (*) Fix error handling when we fail to connect a call.

 (*) Fix a NULL pointer dereference.

 (*) Fix two occasions where a packet is accessed again after being queued
     for someone else to deal with.

 (*) Fix a missing skb free.

---
The patches can be found here also:

	http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-fixes

Tagged thusly:

	git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
	rxrpc-fixes-20160809

David
---
Arnd Bergmann (1):
      rxrpc: fix uninitialized pointer dereference in debug code

David Howells (5):
      rxrpc: Need to flag call as being released on connect failure
      rxrpc: Don't access connection from call if pointer is NULL
      rxrpc: Once packet posted in data_ready, don't retry posting
      rxrpc: Fix a use-after-push in data_ready handler
      rxrpc: Free packets discarded in data_ready

 net/rxrpc/call_event.c  |    4
 net/rxrpc/call_object.c |    3 +++
 net/rxrpc/input.c       |   27 ---
 3 files changed, 23 insertions(+), 11 deletions(-)
[PATCH net 2/6] rxrpc: Need to flag call as being released on connect failure
If rxrpc_new_client_call() fails to make a connection, the call record that
it allocated needs to be marked as RXRPC_CALL_RELEASED before it is passed
to rxrpc_put_call() to indicate that it no longer has any attachment to the
AF_RXRPC socket.

Without this, an assertion failure may occur at:

	net/rxrpc/call_object:635

Signed-off-by: David Howells
---
 net/rxrpc/call_object.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index e8c953c48cb8..ae057e0740f3 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -275,6 +275,7 @@ error:
 	list_del_init(&call->link);
 	write_unlock_bh(&rxrpc_call_lock);

+	set_bit(RXRPC_CALL_RELEASED, &call->flags);
 	call->state = RXRPC_CALL_DEAD;
 	rxrpc_put_call(call);
 	_leave(" = %d", ret);
@@ -287,6 +288,7 @@ error:
 	 */
 found_user_ID_now_present:
 	write_unlock(&rx->call_lock);
+	set_bit(RXRPC_CALL_RELEASED, &call->flags);
 	call->state = RXRPC_CALL_DEAD;
 	rxrpc_put_call(call);
 	_leave(" = -EEXIST [%p]", call);
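The ordering requirement can be illustrated with a toy model. This is a sketch under assumptions, not the rxrpc implementation: `struct toy_call`, `toy_put_call` and `toy_connect_failed` are hypothetical, and the `assert()` merely stands in for whatever check the real destruction path performs at the location quoted above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted call record. */
struct toy_call {
	bool released;	/* models the RXRPC_CALL_RELEASED flag */
	bool destroyed;
	int usage;	/* reference count */
};

/* Drop one reference; the final put insists the call was released first. */
static void toy_put_call(struct toy_call *call)
{
	if (--call->usage == 0) {
		/* Stand-in for the assertion mentioned in the commit message. */
		assert(call->released);
		call->destroyed = true;
	}
}

/* Error path after a failed connect: flag the call, then drop the ref. */
static void toy_connect_failed(struct toy_call *call)
{
	call->released = true;	/* the step the patch adds */
	toy_put_call(call);
}
```

Dropping the `call->released = true;` line makes the final put trip the assertion, which mirrors the failure mode the commit message describes.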
[PATCH net 1/6] rxrpc: fix uninitialized pointer dereference in debug code
From: Arnd Bergmann

A newly added bugfix caused an uninitialized variable to be used for
printing debug output. This is harmless as long as the debug setting
is disabled, but otherwise leads to an immediate crash.

gcc warns about this when -Wmaybe-uninitialized is enabled:

net/rxrpc/call_object.c: In function 'rxrpc_release_call':
net/rxrpc/call_object.c:496:163: error: 'sp' may be used uninitialized in this function [-Werror=maybe-uninitialized]

The initialization was removed but one of the users remains.
This adds back the initialization.

Signed-off-by: Arnd Bergmann
Fixes: 372ee16386bb ("rxrpc: Fix races between skb free, ACK generation and replying")
Signed-off-by: David Howells
---
 net/rxrpc/call_object.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index c47f14fc5e88..e8c953c48cb8 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -493,6 +493,7 @@ void rxrpc_release_call(struct rxrpc_call *call)
 	       (skb = skb_dequeue(&call->rx_oos_queue))) {
 		spin_unlock_bh(&call->lock);

+		sp = rxrpc_skb(skb);
 		_debug("- zap %s %%%u #%u",
 		       rxrpc_pkts[sp->hdr.type],
 		       sp->hdr.serial, sp->hdr.seq);
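The shape of the bug and of the fix can be shown with a small standalone sketch. None of this is rxrpc code: `struct toy_skb`, `struct toy_hdr`, `toy_skb_hdr` and `toy_drain` are hypothetical, and summing the serial numbers stands in for the debug print that reads through `sp`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-packet header and packet types. */
struct toy_hdr {
	unsigned int serial;
};

struct toy_skb {
	struct toy_hdr hdr;
	struct toy_skb *next;
};

/* Hypothetical accessor playing the role of rxrpc_skb(). */
static struct toy_hdr *toy_skb_hdr(struct toy_skb *skb)
{
	return &skb->hdr;
}

/* Walk a list of packets, reading each header through a local pointer. */
static unsigned int toy_drain(struct toy_skb *head)
{
	struct toy_skb *skb;
	struct toy_hdr *sp;	/* deliberately not initialised here */
	unsigned int sum = 0;

	while ((skb = head) != NULL) {
		head = skb->next;
		/* Must be set for every packet before the first use below. */
		sp = toy_skb_hdr(skb);
		sum += sp->serial;	/* stand-in for the debug print */
	}
	return sum;
}
```

If the assignment to `sp` inside the loop is removed, `sp->serial` reads through an indeterminate pointer, which is the situation the gcc warning above reports.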