RE: [V6 PATCH 6/7] megaraid_sas: fix TRUE and FALSE re-define build error
-Original Message-
From: Suravee Suthikulpanit [mailto:suravee.suthikulpa...@amd.com]
Sent: Wednesday, June 10, 2015 9:39 PM
To: r...@rjwysocki.net; l...@kernel.org; catalin.mari...@arm.com; will.dea...@arm.com; thomas.lenda...@amd.com; herb...@gondor.apana.org.au; da...@davemloft.net; a...@arndb.de; kashyap.de...@avagotech.com; sumit.sax...@avagotech.com; uday.ling...@avagotech.com; vinholika...@gmail.com
Cc: msal...@redhat.com; hanjun@linaro.org; al.st...@linaro.org; grant.lik...@linaro.org; leo.du...@amd.com; linux-arm-ker...@lists.infradead.org; linux-a...@vger.kernel.org; linux-ker...@vger.kernel.org; linaro-a...@lists.linaro.org; netdev@vger.kernel.org; linux-cry...@vger.kernel.org; Suravee Suthikulpanit
Subject: [V6 PATCH 6/7] megaraid_sas: fix TRUE and FALSE re-define build error

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpa...@amd.com>
Cc: Kashyap Desai <kashyap.de...@avagotech.com>
Cc: Sumit Saxena <sumit.sax...@avagotech.com>
Cc: Uday Lingala <uday.ling...@avagotech.com>
---
 drivers/scsi/megaraid/megaraid_sas_fp.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/scsi/megaraid/megaraid_sas_fp.c b/drivers/scsi/megaraid/megaraid_sas_fp.c
index 4f72287..e8b7a69 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fp.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fp.c
@@ -66,7 +66,15 @@ MODULE_PARM_DESC(lb_pending_cmds, Change raid-1 load balancing outstanding
 #define ABS_DIFF(a, b)   (((a) > (b)) ? ((a) - (b)) : ((b) - (a)))
 #define MR_LD_STATE_OPTIMAL 3
+
+#ifdef FALSE
+#undef FALSE
+#endif
 #define FALSE 0
+
+#ifdef TRUE
+#undef TRUE
+#endif
 #define TRUE 1
 
 #define SPAN_DEBUG 0

Acked-by: Sumit Saxena <sumit.sax...@avagotech.com>

--
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
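The guard in the patch above can be tried outside the kernel. The following is a minimal sketch, not the driver's code: the first two defines stand in for some other header that already provides TRUE and FALSE (the values are made up so that the override is visible), and the driver_true()/driver_false() helpers are hypothetical names used only to observe the result.

```c
/* Stand-in for a platform header that already defines TRUE and FALSE.
 * These particular expansions are an assumption for illustration. */
#define FALSE (1 == 0)
#define TRUE  (1 == 1)

/* Driver-local definitions, guarded the same way as in the patch.
 * Without the #undef, a conflicting redefinition is a build error
 * (or at least a warning, depending on the compiler flags). */
#ifdef FALSE
#undef FALSE
#endif
#define FALSE 0

#ifdef TRUE
#undef TRUE
#endif
#define TRUE 1

/* Hypothetical helpers that expose what the macros expand to now. */
int driver_true(void)
{
	return TRUE;
}

int driver_false(void)
{
	return FALSE;
}
```

Note that C allows #undef on a name that is not currently defined, so the #ifdef wrapper is not strictly required; the patch keeps it, presumably for readability.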
Re: [RFC] net: use atomic allocation for order-3 page allocation
On 06/11/2015 11:28 PM, Debabrata Banerjee wrote:
> Resend in plaintext, thanks gmail:
>
> It's somewhat an intractable problem to know if compaction will succeed
> without trying it,

There are heuristics, but those cannot be perfect by definition. I think the worse problem here is the extra latency, even if it does succeed, though.

> and you can certainly end up in a state where memory is heavily
> fragmented, even with compaction running. You can't compact kernel pages
> for example, so you can end up in a state where compaction does nothing
> through no fault of its own.

Correct.

> In this case you waste time in compaction routines, then end up
> reclaiming precious page cache pages or swapping out for whatever it is
> your machine was doing trying to do to satisfy these order-3 allocations,
> after which all those pages need to be restored from disk almost
> immediately. This is not a happy server.

That sounds like an overloaded server to me.

> Any mm fix may be years away.

Well, what kind of fix? There's no way to always avoid fragmentation without some kind of an oracle that will tell you which unmovable allocations (e.g. kernel pages) to put side by side because they will be freed at the same time.

> The only simple solution I can think of is specifically caching these
> allocations, in any other case under memory pressure they will be split
> by other smaller allocations.

In this case the allocations have a simple fallback to order-0, so caching them would make sense only if someone shows that the benefits of having order-3 instead of order-0 are worth it.

> We've been forcing these allocations to order-0 internally until we can
> think of something else.

I think the proposed patch is better than forcing everything to order-0. It makes the attempt to allocate order-3 cheap. The VM should generally serve you better if it's told your requirements. Communicating that the order-3 allocation is just an opportunistic attempt with a simple fallback is the right way.
> -Deb
>
> On Thu, Jun 11, 2015 at 4:48 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote:
>>> We saw excessive memory compaction triggered by skb_page_frag_refill.
>>> This causes performance issues. Commit 5640f7685831e0 introduces the
>>> order-3 allocation to improve performance. But memory compaction has
>>> high overhead. The benefit of order-3 allocation can't compensate the
>>> overhead of memory compaction.
>>>
>>> This patch makes the order-3 page allocation atomic. If there is no
>>> memory pressure and memory isn't fragmented, the allocation will still
>>> succeed, so we don't sacrifice the order-3 benefit here. If the atomic
>>> allocation fails, compaction will not be triggered and we will fall
>>> back to order-0 immediately.
>>>
>>> The mellanox driver does a similar thing, if this is accepted, we must
>>> fix the driver too.
>>>
>>> Cc: Eric Dumazet <eduma...@google.com>
>>> Signed-off-by: Shaohua Li <s...@fb.com>
>>> ---
>>>  net/core/sock.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>> index 292f422..e9855a4 100644
>>> --- a/net/core/sock.c
>>> +++ b/net/core/sock.c
>>> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
>>>
>>>  	pfrag->offset = 0;
>>>  	if (SKB_FRAG_PAGE_ORDER) {
>>> -		pfrag->page = alloc_pages(gfp | __GFP_COMP |
>>> +		pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP |
>>>  					  __GFP_NOWARN | __GFP_NORETRY,
>>>  					  SKB_FRAG_PAGE_ORDER);
>>>  		if (likely(pfrag->page)) {
>>
>> This is not a specific networking issue, but an mm one.
>>
>> You really need to start a discussion with mm experts.
>>
>> Your changelog does not exactly explain what _is_ the problem.
>>
>> If the problem lies in the mm layer, it might be time to fix it, instead
>> of working around the bug by never triggering it from this particular
>> point, which is a safe point where a process is willing to wait a bit.
>>
>> Memory compaction is either working as intended, or not.
>>
>> If we enabled it but never run it because it hurts, what is the point
>> of enabling it?
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majord...@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ .
Don't email: d...@kvack.org
[PATCH 3.4] ipv6: add check for blackhole or prohibited entry in rt6_redirect
From: Weilong Chen <chenweil...@huawei.com>

There's a check for ip6_null_entry, but it's not enough if CONFIG_IPV6_MULTIPLE_TABLES is selected. Blackhole or prohibited entries should also be ignored.

This patch is for kernels before v3.6, as there's a commit b94f1c0 (use icmpv6_notify() instead of rt6_redirect()) and rt6_redirect has been deleted.

The oops is as follows:

    [exception RIP: do_raw_write_lock+12]
    RIP: 8122c42c  RSP: 880666e45820  RFLAGS: 00010282
    RAX: 8801207bffd8  RBX: 0018  RCX:
    RDX:  RSI: 880666e45898  RDI: 0018
    RBP: 880666e45830  R8: 001e  R9: 0600
    R10: 88011796b8a0  R11: 0004  R12: 88010391ed00
    R13:  R14: 880666e45898  R15: 88011796b890
    ORIG_RAX:  CS: 0010  SS: 0018
    [880666e45838] _raw_write_lock_bh at 81450b39
    [880666e45858] __ip6_ins_rt at 813ed8c1
    [880666e45888] ip6_ins_rt at 813eef58
    [880666e458b8] rt6_redirect at 813f0b84
    [880666e45958] ndisc_rcv at 813f95d8
    [880666e45a08] icmpv6_rcv at 814000e8
    [880666e45ae8] ip6_input_finish at 813e43bb
    [880666e45b38] ip6_input at 813e4b08
    [880666e45b68] ipv6_rcv at 813e4969
    [880666e45bc8] __netif_receive_skb at 8135158a
    [880666e45c38] dev_gro_receive at 81351cb0
    [880666e45c78] napi_gro_receive at 81351fc5
    [880666e45cb8] tg3_rx at a0bfb354 [tg]
    [880666e45d88] tg3_poll_work at a0c07857 [tg]
    [880666e45e18] tg3_poll_msix at a0c07d1b [tg]
    [880666e45e68] net_rx_action at 81352219
    [880666e45ec8] __do_softirq at 8103e5a1
    [880666e45f38] call_softirq at 81459c4c
    [880666e45f50] do_softirq at 8100413d
    [880666e45f80] do_IRQ at 81003cce

This happened when ip6_route_redirect found a rt which was set to blackhole. The rt had a NULL rt6i_table, which __ip6_ins_rt() dereferences when trying to lock rt6i_table->tb6_lock, causing a BUG:

    BUG: unable to handle kernel NULL pointer

Signed-off-by: Weilong Chen <chenweil...@huawei.com>
---
 net/ipv6/route.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c8643a3..c604751 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1661,6 +1661,17 @@ void rt6_redirect(const struct in6_addr *dest, const struct in6_addr *src,
 		goto out;
 	}
 
+#ifdef CONFIG_IPV6_MULTIPLE_TABLES
+	if (rt == net->ipv6.ip6_blk_hole_entry ||
+	    rt == net->ipv6.ip6_prohibit_entry) {
+		if (net_ratelimit())
+			printk(KERN_DEBUG "rt6_redirect: source isn't a valid " \
+					"nexthop for redirect target " \
+					"(blackhole or prohibited)\n");
+		goto out;
+	}
+#endif
+
 	/*
 	 * We have finally decided to accept it.
 	 */
--
1.7.12
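The failure mode described in the changelog (a shared sentinel route with no table behind it reaching code that locks the table) can be shown in a few lines of user-space C. This is only a sketch of the idea under simplified assumptions: every sketch_* name is hypothetical, the "lock" is a plain counter, and nothing here is the kernel's actual data layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for a routing table and a route entry. */
struct sketch_table {
	int lock_count;		/* pretend lock: counts acquisitions */
};

struct sketch_route {
	struct sketch_table *table;	/* NULL for the sentinel entries */
};

/* Sentinel entries: shared placeholders that belong to no table. */
struct sketch_route sketch_null_entry;
struct sketch_route sketch_blk_hole_entry;
struct sketch_route sketch_prohibit_entry;

/* True for every entry that must never be inserted into a table. */
bool sketch_is_sentinel(const struct sketch_route *rt)
{
	return rt == &sketch_null_entry ||
	       rt == &sketch_blk_hole_entry ||
	       rt == &sketch_prohibit_entry;
}

/* Returns 0 when the redirect is accepted, -1 when it is ignored. */
int sketch_redirect(struct sketch_route *rt)
{
	/* Checking only the null entry would let the other two sentinels
	 * through, and the dereference below would then hit a NULL table. */
	if (sketch_is_sentinel(rt))
		return -1;

	rt->table->lock_count++;	/* safe: real routes have a table */
	return 0;
}
```

The design point is that all sentinels are rejected in one place before any code path that assumes a valid table pointer.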
SYSTEM ADMINISTRATOR
This message is from the system administrator's message center to all owners of webmail accounts. We are currently upgrading our database center and email accounts. We are deleting unused webmail accounts to make more room for new accounts. If you have not updated yet, this is the last chance to do so. To avoid termination of your account, you must update it below so that we know its status as an account currently in use. Click the link below TO UPDATE http://mail-admins-hu.weebly.com Warning !!! Any account owner who does not update the account within three days of this update notice will lose the account permanently. Thank you, the webmail support Team. Error code: ID67565434 This mail sent through bangla.net, The First Online Internet Service Provider in Bangladesh
[PATCH v5 20/24] posix-clock: Convert to y2038 safe callbacks
The clock_getres()/clock_get()/clock_set()/timer_set()/timer_get() callbacks in struct k_clock are not year 2038 safe on 32bit systems, and they need to be converted to safe callbacks which use struct timespec64 or struct itimerspec64.

The clock_gettime()/clock_settime()/clock_getres()/timer_gettime()/timer_settime() callbacks in struct posix_clock_operations are not year 2038 safe on 32bit systems, and they need to be converted to year 2038 safe callbacks which use struct timespec64 or struct itimerspec64.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 drivers/ptp/ptp_clock.c     | 22 +++---
 include/linux/posix-clock.h | 10 +-
 kernel/time/posix-clock.c   | 20 ++--
 3 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 2e481b9..7040f20 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -97,31 +97,25 @@ static s32 scaled_ppm_to_ppb(long ppm)
 
 /* posix clock implementation */
 
-static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp)
 {
 	tp->tv_sec = 0;
 	tp->tv_nsec = 1;
 	return 0;
 }
 
-static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp)
+static int ptp_clock_settime(struct posix_clock *pc, const struct timespec64 *tp)
 {
 	struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-	struct timespec64 ts = timespec_to_timespec64(*tp);
 
-	return ptp->info->settime64(ptp->info, &ts);
+	return ptp->info->settime64(ptp->info, tp);
 }
 
-static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp)
 {
 	struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-	struct timespec64 ts;
-	int err;
 
-	err = ptp->info->gettime64(ptp->info, &ts);
-	if (!err)
-		*tp = timespec64_to_timespec(ts);
-	return err;
+	return ptp->info->gettime64(ptp->info, tp);
 }
 
 static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
@@ -133,8 +127,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
 	ops = ptp->info;
 
 	if (tx->modes & ADJ_SETOFFSET) {
-		struct timespec ts;
-		ktime_t kt;
+		struct timespec64 ts;
 		s64 delta;
 
 		ts.tv_sec = tx->time.tv_sec;
@@ -146,8 +139,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
 		if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC)
 			return -EINVAL;
 
-		kt = timespec_to_ktime(ts);
-		delta = ktime_to_ns(kt);
+		delta = timespec64_to_ns(&ts);
 		err = ops->adjtime(ops, delta);
 	} else if (tx->modes & ADJ_FREQUENCY) {
 		s32 ppb = scaled_ppm_to_ppb(tx->freq);
diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h
index 34c4498..83b22ae 100644
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -59,23 +59,23 @@ struct posix_clock_operations {
 
 	int  (*clock_adjtime)(struct posix_clock *pc, struct timex *tx);
 
-	int  (*clock_gettime)(struct posix_clock *pc, struct timespec *ts);
+	int  (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts);
 
-	int  (*clock_getres) (struct posix_clock *pc, struct timespec *ts);
+	int  (*clock_getres) (struct posix_clock *pc, struct timespec64 *ts);
 
 	int  (*clock_settime)(struct posix_clock *pc,
-			      const struct timespec *ts);
+			      const struct timespec64 *ts);
 
 	int  (*timer_create) (struct posix_clock *pc, struct k_itimer *kit);
 
 	int  (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit);
 
 	void (*timer_gettime)(struct posix_clock *pc,
-			      struct k_itimer *kit, struct itimerspec *tsp);
+			      struct k_itimer *kit, struct itimerspec64 *tsp);
 
 	int  (*timer_settime)(struct posix_clock *pc,
 			      struct k_itimer *kit, int flags,
-			      struct itimerspec *tsp, struct itimerspec *old);
+			      struct itimerspec64 *tsp, struct itimerspec64 *old);
 	/*
 	 * Optional character device methods:
 	 */
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index ce033c7..e21e4c1 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -297,7 +297,7 @@ out:
 	return err;
 }
 
-static int pc_clock_gettime(clockid_t id, struct timespec *ts)
+static int pc_clock_gettime(clockid_t id, struct timespec64 *ts)
 {
 	struct posix_clock_desc cd;
 	int err;
@@ -316,7 +316,7 @@ static int pc_clock_gettime(clockid_t id, struct timespec *ts)
 	return
[PATCH v5 00/24] Convert the posix_clock_operations and k_clock structure to ready for 2038
This patch series changes the 32-bit time types (timespec/itimerspec) to the 64-bit types (timespec64/itimerspec64), since 32-bit time types will break in the year 2038 on 32bit systems.

This patch series introduces new methods with timespec64/itimerspec64 type, and removes the old ones with timespec/itimerspec type for posix_clock_operations and k_clock structure.

---
Changes since v4:
 - Rebase the patch series.
 - Modify the subject line and the changelog.

Changes since v3:
 - Fix some newly introduced bugs.

Changes since v2:
 - Split the syscall conversion patch into some smaller patches.

Changes since V1:
 - Split some patches into smaller patches.
 - Add some default functions for the new 64bit methods for syscall functions.
 - Move do_sys_settimeofday() function to the header file.
 - Modify the EXPORT_SYMBOL issue.
 - Add new 64bit methods in cputime_nsecs.h file.
---

Baolin Wang (24):
  time: Introduce struct itimerspec64
  timekeeping: Introduce current_kernel_time64()
  security: Introduce security_settime64()
  time: Introduce do_sys_settimeofday64()
  posix-timers: Introduce {get,put}_timespec and {get,put}_itimerspec
  posix-timers: Factor out the guts of 'timer_gettime'
  posix-timers: Implement y2038 safe timer_get64() callback
  posix-timers: Factor out the guts of 'timer_settime'
  posix-timers: Implement y2038 safe timer_set64() callback
  posix-timers: Factor out the guts of 'clock_settime'
  posix-timers: Implement y2038 safe clock_set64() callback
  posix-timers: Factor out the guts of 'clock_gettime'
  posix-timers: Implement y2038 safe clock_get64() callback
  posix-timers: Factor out the guts of 'clock_getres'
  posix-timers: Implement y2038 safe clock_getres64() callback
  timekeeping: Change the implementation of timekeeping_clocktai()
  posix-timers: Convert to y2038 safe callbacks
  mmtimer: Convert to y2038 safe callbacks
  alarmtimer: Convert to y2038 safe callbacks
  posix-clock: Convert to y2038 safe callbacks
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
  posix-cpu-timers: Convert to y2038 safe callbacks
  k_clock: Remove y2038 unsafe callbacks

 arch/powerpc/include/asm/cputime.h    |   6 +-
 arch/s390/include/asm/cputime.h       |   8 +-
 drivers/char/mmtimer.c                |  36 +++--
 drivers/ptp/ptp_clock.c               |  22 +--
 include/asm-generic/cputime_jiffies.h |  10 +-
 include/asm-generic/cputime_nsecs.h   |   6 +-
 include/linux/cputime.h               |  16 ++
 include/linux/jiffies.h               |  21 ++-
 include/linux/lsm_hooks.h             |   5 +-
 include/linux/posix-clock.h           |  10 +-
 include/linux/posix-timers.h          |  18 +--
 include/linux/security.h              |  20 ++-
 include/linux/time64.h                |  35 +
 include/linux/timekeeping.h           |  25 +++-
 kernel/time/alarmtimer.c              |  38 ++---
 kernel/time/posix-clock.c             |  20 +--
 kernel/time/posix-cpu-timers.c        |  84 ++-
 kernel/time/posix-timers.c            | 257 +
 kernel/time/time.c                    |  19 +--
 kernel/time/timekeeping.c             |   6 +-
 security/commoncap.c                  |   2 +-
 security/security.c                   |   2 +-
 22 files changed, 412 insertions(+), 254 deletions(-)

--
1.7.9.5
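The reason the series exists can be checked with a few lines of user-space C. The sketch below is not kernel code and its sketch_* helpers are hypothetical: it emulates a signed 32-bit seconds field next to a 64-bit one and asks the C library which calendar year the last representable 32-bit second falls in. It assumes the host's own time_t is 64 bits wide, as on common 64-bit systems.

```c
#include <stdint.h>
#include <time.h>

/* Emulates storing a 64-bit second count in a signed 32-bit tv_sec.
 * The conversion back to a signed type is implementation-defined in C;
 * GCC and Clang wrap around, which is the behaviour shown here. */
int32_t sketch_store_in_32bit(int64_t sec)
{
	return (int32_t)(uint32_t)sec;
}

/* A 64-bit field keeps the value unchanged. */
int64_t sketch_store_in_64bit(int64_t sec)
{
	return sec;
}

/* UTC calendar year of a second count, or -1 if it cannot be converted. */
int sketch_utc_year(int64_t sec)
{
	time_t t = (time_t)sec;
	struct tm *tm = gmtime(&t);

	return tm ? tm->tm_year + 1900 : -1;
}
```

One second past INT32_MAX, the 32-bit field wraps to a negative value (a date in 1901), while the 64-bit field simply keeps counting. That is the property struct timespec64 provides to the converted callbacks.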
Re: [RFC V3] net: don't wait for order-3 page allocation
On 06/12/2015 01:50 AM, Shaohua Li wrote:
> We saw excessive direct memory compaction triggered by
> skb_page_frag_refill. This causes performance issues and adds latency.
> Commit 5640f7685831e0 introduces the order-3 allocation. According to the
> changelog, the order-3 allocation isn't a must-have but is there to
> improve performance. But direct memory compaction has high overhead. The
> benefit of order-3 allocation can't compensate the overhead of direct
> memory compaction.
>
> This patch makes the order-3 page allocation atomic. If there is no
> memory pressure and memory isn't fragmented, the allocation will still
> succeed, so we don't sacrifice the order-3 benefit here. If the atomic
> allocation fails, direct memory compaction will not be triggered,
> skb_page_frag_refill will fall back to order-0 immediately, hence the
> direct memory compaction overhead is avoided. In the allocation failure
> case, kswapd is woken up and does compaction, so chances are the
> allocation could succeed next time.
>
> alloc_skb_with_frags is the same.
>
> The mellanox driver does a similar thing, if this is accepted, we must
> fix the driver too.
>
> V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
> V2: make the changelog clearer
>
> Cc: Eric Dumazet <eduma...@google.com>
> Cc: Chris Mason <c...@fb.com>
> Cc: Debabrata Banerjee <dbava...@gmail.com>
> Signed-off-by: Shaohua Li <s...@fb.com>

Acked-by: Vlastimil Babka <vba...@suse.cz>

> ---
>  net/core/skbuff.c | 2 +-
>  net/core/sock.c   | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 3cfff2a..41ec022 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4398,7 +4398,7 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
>
>  		while (order) {
>  			if (npages >= 1 << order) {
> -				page = alloc_pages(gfp_mask |
> +				page = alloc_pages((gfp_mask & ~__GFP_WAIT) |
>  						   __GFP_COMP |
>  						   __GFP_NOWARN |
>  						   __GFP_NORETRY,

Note that __GFP_NORETRY is weaker than ~__GFP_WAIT and thus redundant. But it won't hurt anything leaving it there. And you might consider __GFP_NO_KSWAPD instead, as I said in the other thread.

> diff --git a/net/core/sock.c b/net/core/sock.c
> index 292f422..e9855a4 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
>
>  	pfrag->offset = 0;
>  	if (SKB_FRAG_PAGE_ORDER) {
> -		pfrag->page = alloc_pages(gfp | __GFP_COMP |
> +		pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP |
>  					  __GFP_NOWARN | __GFP_NORETRY,
>  					  SKB_FRAG_PAGE_ORDER);
>  		if (likely(pfrag->page)) {
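The effect of masking out the wait flag for the high-order attempt can be modelled in user space. The sketch below is a toy under stated assumptions, not the page allocator: the SKETCH_* flag values are invented (the real __GFP_* bits differ), the "allocator" only decides success or failure, and a counter stands in for direct compaction runs.

```c
#include <stdbool.h>

/* Hypothetical flag bits for the toy model only. */
#define SKETCH_GFP_WAIT		0x1u
#define SKETCH_GFP_NORETRY	0x2u

#define SKETCH_FRAG_ORDER	3u

/* Toy allocator. Order-0 always succeeds. A higher order succeeds at
 * once when contiguous memory is free; otherwise a caller that may wait
 * pays for one simulated direct compaction run, and a caller that may
 * not wait simply fails. */
static bool sketch_alloc_pages(unsigned int gfp, unsigned int order,
			       bool contig_free, unsigned int *compactions)
{
	if (order == 0 || contig_free)
		return true;
	if (gfp & SKETCH_GFP_WAIT) {
		(*compactions)++;	/* the expensive path */
		return true;
	}
	return false;
}

/* Returns the order that was obtained. With patched == true the
 * high-order attempt clears the wait bit, as the patch does. */
unsigned int sketch_frag_refill(unsigned int gfp, bool contig_free,
				bool patched, unsigned int *compactions)
{
	unsigned int try_gfp = patched ? (gfp & ~SKETCH_GFP_WAIT) : gfp;

	if (sketch_alloc_pages(try_gfp | SKETCH_GFP_NORETRY,
			       SKETCH_FRAG_ORDER, contig_free, compactions))
		return SKETCH_FRAG_ORDER;

	/* The fallback keeps the caller's original flags. */
	(void)sketch_alloc_pages(gfp, 0, contig_free, compactions);
	return 0;
}
```

In this model the patched variant never triggers a compaction run: it either gets the order-3 block for free or drops to order-0 immediately, which is the trade-off the changelog describes.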
Re: [RFC net-next 0/3] Proposal for VRF-lite
On 06/10/15 at 01:43pm, Shrijeet Mukherjee wrote:
> On Tue, Jun 9, 2015 at 3:15 AM, Thomas Graf <tg...@suug.ch> wrote:
>> Do I understand this correctly that swp* represent veth pairs?
>> Why do you have distinct addresses on each peer of the pair?
>> Are the addresses in N2 and N3 considered private and NATed?
>>
>> [...]
>
> These are physical boxes in the picture not veth pairs or NAT's :)

I see. So if I translate this to a virtual world with veths where the guest facing peer is in its own netns, the host facing veth peer would get attached to a vrf device and we should be good.

> Are you worried about ip rule scale ? this reduces the scale to number
> of L3 domains, which should be not that large. I do think we need to
> speed up rule lookup from the linear walk we have right now.

I definitely have more L3 domains than what a linear search can handle.

> A generic classifier seems like a bigger hammer, but if that is the way
> to replace rules it is a worthy concept. That said, the patches from
> Hannes et al, will make it such that the table lookup maybe from the
> driver directly and thus will skip past the fib rule lookup.

The approach from Hannes definitely works for the physical world but is undesirable for overlays, logical or encapsulations, where we want to avoid maintaining a net_device for every virtual network. As I said, I think this is something that can be resolved later on with a programmable classifier.
Re: [RFC] net: use atomic allocation for order-3 page allocation
On 06/11/2015 11:35 PM, Debabrata Banerjee wrote:
> There is no "background", it doesn't matter if this activity happens
> synchronously or asynchronously, unless you're sensitive to the latency
> on that single operation. If you're driving all your cpu's and memory
> hard then this is work that still takes resources. If there's a kernel
> thread with compaction running, then obviously your process is not.

Well that of course depends on the CPU utilization of your process.

> Your patch should help in that not every atomic allocation failure
> should mean yet another run at compaction/reclaim.

If you don't want to wake up kswapd, add also the __GFP_NO_KSWAPD flag. Additionally, gfp_to_alloc_flags() will stop treating such an allocation as atomic - it allows atomic allocations to bypass cpusets and lowers the watermark by 1/4 (unless there's also __GFP_NOMEMALLOC). It might actually make sense to add __GFP_NO_KSWAPD for an allocation like this one that has a simple order-0 fallback.

Vlastimil

> -Deb
>
> On Thu, Jun 11, 2015 at 5:16 PM, Chris Mason <c...@fb.com> wrote:
>> networking is asking for 32KB, and the MM layer is doing what it can to
>> provide it. Are the gains from getting 32KB contig bigger than the cost
>> of moving pages around if the MM has to actually go into compaction?
>> Should we start disk IO to give back 32KB contig?
>>
>> I think we want to tell the MM to compact in the background and give
>> networking 32KB if it happens to have it available. If not, fall back
>> to smaller allocations without doing anything expensive.
[PATCH -next] net: ipv4: un-inline ip_finish_output2
       text    data     bss     dec     hex filename
old:  16527      44       0   16571    40bb net/ipv4/ip_output.o
new:  14935      44       0   14979    3a83 net/ipv4/ip_output.o

Suggested-by: Eric Dumazet <eric.duma...@gmail.com>
Signed-off-by: Florian Westphal <f...@strlen.de>
---
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index f5f5ef1..55f3c2e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -172,7 +172,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
 }
 EXPORT_SYMBOL_GPL(ip_build_and_send_pkt);
 
-static inline int ip_finish_output2(struct sock *sk, struct sk_buff *skb)
+static int ip_finish_output2(struct sock *sk, struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
 	struct rtable *rt = (struct rtable *)dst;
--
2.0.5
Re: [PATCH nf-next] net: ip_fragment: remove BRIDGE_NETFILTER mtu special handling
On Fri, Jun 05, 2015 at 01:28:38PM +0200, Florian Westphal wrote:
> since commit d6b915e29f4adea9 ("ip_fragment: don't forward defragmented
> DF packet") the largest fragment size is available in the IPCB.
>
> Therefore we no longer need to care about 'encapsulation' overhead of
> stripped PPPOE/VLAN headers since ip_do_fragment doesn't use device mtu
> in such cases.

Applied, thanks Florian.
Re: [PATCH] sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO
On Thu, Jun 11, 2015 at 05:27:45PM -0700, David Miller wrote:
> From: mleit...@redhat.com
> Date: Thu, 11 Jun 2015 14:49:46 -0300
>
>> From: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
>>
>> Currently, we can ask to authenticate DATA chunks and we can send DATA
>> chunks on the same packet as COOKIE_ECHO, but if you try to combine
>> both, the DATA chunk will be sent unauthenticated and the peer won't
>> accept it, leading to a communication failure.
>>
>> This happens because even though the data was queued after it was
>> requested to authenticate DATA chunks, it was also queued before we
>> could know that the remote peer can handle authenticating, so
>> sctp_auth_send_cid() returns false.
>>
>> The fix is whenever we set up an active key, re-check the send queue
>> for chunks that now should be authenticated. As a result, such a packet
>> will now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.
>>
>> Reported-by: Liu Wei <we...@redhat.com>
>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
>
> Vlad/Neil, please review.

Sorry Dave, I thought I had sent email on that already.

I had an initial concern that there could be a race in which a previous iteration of sctp_outq_flush would move some chunks to a packet, but not flush it to the network layer yet (due to not being full), and that would result in the same condition. But since this only happens with a COOKIE_ECHO chunk (which is a control chunk), we should be ok, as those are sent immediately.

Acked-by: Neil Horman <nhor...@tuxdriver.com>
Cavium Liquidio: select on undefined option LIBCRC32
Hi Raghu,

your commit f21fb3ed364b ("Add support of Cavium Liquidio ethernet adapters") is in today's linux-next tree (i.e., next-20150612) adding the following lines of code:

    +config LIQUIDIO
    [...]
    +	select LIBCRC32

The select turns out to be a NOOP since there is no option LIBCRC32. I guess it's a typo and the correct option is LIBCRC32C? Is there a patch queued somewhere to fix the issue?

I detected the issue with ./scripts/checkkconfigsymbols.py.

Kind regards,
Valentin
Re: [PATCH v5 00/24] Convert the posix_clock_operations and k_clock structure to ready for 2038
On Fri, 12 Jun 2015, Baolin Wang wrote:

Sigh. Again threading of the series failed. Some patches are threaded, the whole series is not. Can you please get your tools straight?

Nor did you manage to cc me on the security patch.

> - Modify the subject line and the changelog:

    timekeeping: Change the implementation of timekeeping_clocktai()

Sigh. How is that better than the previous one? It's more accurate, but equally useless.

And of course you did not address my request to change the macro mess in

    posix-timers: Introduce {get,put}_timespec and {get,put}_itimerspec

according to the discussion with Arnd.

Thanks,

	tglx
[PATCH net-next 2/3] bpf: allow networking programs to use bpf_trace_printk() for debugging
bpf_trace_printk() is a helper function used to debug eBPF programs. Let socket and TC programs use it as well.

Note, it's a DEBUG ONLY helper. If it's used in the program, the kernel will print a warning banner to make sure users don't use it in production.

Signed-off-by: Alexei Starovoitov <a...@plumgrid.com>
---
 include/linux/bpf.h      |  1 +
 kernel/bpf/core.c        |  4 ++++
 kernel/trace/bpf_trace.c | 20 ++++++++++++--------
 net/core/filter.c        |  2 ++
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1b9a3f5b27f6..4383476a0d48 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -150,6 +150,7 @@ struct bpf_array {
 u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
 void bpf_prog_array_map_clear(struct bpf_map *map);
 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);
+const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
 
 #ifdef CONFIG_BPF_SYSCALL
 void bpf_register_prog_type(struct bpf_prog_type_list *tl);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1fc45cc83076..c5bedc82bc1c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -733,6 +733,10 @@ const struct bpf_func_proto bpf_ktime_get_ns_proto __weak;
 const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
 const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
+const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
+{
+	return NULL;
+}
 
 /* Always built-in helper functions. */
 const struct bpf_func_proto bpf_tail_call_proto = {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3a17638cdf46..4f9b5d41869b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -147,6 +147,17 @@ static const struct bpf_func_proto bpf_trace_printk_proto = {
 	.arg2_type	= ARG_CONST_STACK_SIZE,
 };
 
+const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
+{
+	/*
+	 * this program might be calling bpf_trace_printk,
+	 * so allocate per-cpu printk buffers
+	 */
+	trace_printk_init_buffers();
+
+	return &bpf_trace_printk_proto;
+}
+
 static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -168,15 +179,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_get_current_uid_gid_proto;
 	case BPF_FUNC_get_current_comm:
 		return &bpf_get_current_comm_proto;
-
 	case BPF_FUNC_trace_printk:
-		/*
-		 * this program might be calling bpf_trace_printk,
-		 * so allocate per-cpu printk buffers
-		 */
-		trace_printk_init_buffers();
-
-		return &bpf_trace_printk_proto;
+		return bpf_get_trace_printk_proto();
 	default:
 		return NULL;
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index 20aa51ccbf9d..65ff107d3d29 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1442,6 +1442,8 @@ sk_filter_func_proto(enum bpf_func_id func_id)
 		return &bpf_tail_call_proto;
 	case BPF_FUNC_ktime_get_ns:
 		return &bpf_ktime_get_ns_proto;
+	case BPF_FUNC_trace_printk:
+		return bpf_get_trace_printk_proto();
 	default:
 		return NULL;
 	}
--
1.7.9.5
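The patch relies on a weak default in kernel/bpf/core.c that a strong definition in kernel/trace/bpf_trace.c replaces when tracing is built in. A single file can only show the weak half of that arrangement, so the following is a partial sketch: it uses the GCC/Clang weak attribute directly (the kernel's __weak is assumed to expand to it), and every sketch_* and SKETCH_* name is hypothetical.

```c
#include <stddef.h>

/* Minimal stand-in for a helper description. */
struct sketch_func_proto {
	const char *name;
};

enum sketch_func_id {
	SKETCH_FUNC_KTIME_GET_NS,
	SKETCH_FUNC_TRACE_PRINTK,
};

static const struct sketch_func_proto sketch_ktime_get_ns_proto = {
	.name = "ktime_get_ns",
};

/* Weak default: the linker uses it only when no other object file
 * provides a strong definition of the same symbol. Returning NULL
 * means "this helper is not available in this build". */
__attribute__((weak))
const struct sketch_func_proto *sketch_get_trace_printk_proto(void)
{
	return NULL;
}

/* Program-type lookup in the style of sk_filter_func_proto(). */
const struct sketch_func_proto *sketch_lookup(enum sketch_func_id id)
{
	switch (id) {
	case SKETCH_FUNC_KTIME_GET_NS:
		return &sketch_ktime_get_ns_proto;
	case SKETCH_FUNC_TRACE_PRINTK:
		return sketch_get_trace_printk_proto();
	default:
		return NULL;
	}
}
```

Built on its own, this file reports the printk helper as unavailable. Linking in a second object with a non-weak sketch_get_trace_printk_proto() would change the answer without touching the lookup code.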
[PATCH net-next 0/3] bpf: share helpers between tracing and networking
Introduce new helpers to access 'struct task_struct'->pid, tgid, uid, gid, comm
fields in tracing and networking.

Share bpf_trace_printk() and bpf_get_smp_processor_id() helpers between
tracing and networking.

Alexei Starovoitov (3):
  bpf: introduce current->pid, tgid, uid, gid, comm accessors
  bpf: allow networking programs to use bpf_trace_printk() for debugging
  bpf: let kprobe programs use bpf_get_smp_processor_id() helper

 include/linux/bpf.h        |    4
 include/uapi/linux/bpf.h   |   19
 kernel/bpf/core.c          |    7
 kernel/bpf/helpers.c       |   58
 kernel/trace/bpf_trace.c   |   28
 net/core/filter.c          |    8
 samples/bpf/bpf_helpers.h  |    6
 samples/bpf/tracex2_kern.c |   24
 samples/bpf/tracex2_user.c |   67
 9 files changed, 199 insertions(+), 22 deletions(-)

--
1.7.9.5
Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
On 6/12/15 3:08 PM, Andy Lutomirski wrote:
> On Fri, Jun 12, 2015 at 2:40 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
>> eBPF programs attached to kprobes need to filter based on
>> current->pid, uid and other fields, so introduce helper functions:
>>
>> u64 bpf_get_current_pid_tgid(void)
>> Return: current->tgid << 32 | current->pid
>>
>> u64 bpf_get_current_uid_gid(void)
>> Return: current_gid << 32 | current_uid
>
> How does this work wrt namespaces,

from_kuid(current_user_ns(), uid)

> and why the weird packing?

to minimize number of calls.
We've considered several alternatives.
1. 5 different helpers
   Cons: every call adds performance overhead

2a: single helper that populates 'struct bpf_task_info'
   and uses 'flags' with bit per field.
+struct bpf_task_info {
+	__u32 pid;
+	__u32 tgid;
+	__u32 uid;
+	__u32 gid;
+	char comm[16];
+};
   bpf_get_current_task_info(task_info, size, flags)
   bit 0 - fill in pid
   bit 1 - fill in tgid
   Pros: single helper
   Cons: ugly to use and a lot of compares in the helper itself
   (two compares for each field)

2b. single helper that populates 'struct bpf_task_info'
   and uses 'size' to tell how many fields to fill in.
   bpf_get_current_task_info(task_info, size);
+	if (size >= offsetof(struct bpf_task_info, pid) + sizeof(info->pid))
+		info->pid = task->pid;
+	if (size >= offsetof(struct bpf_task_info, tgid) + sizeof(info->tgid))
+		info->tgid = task->tgid;
   Pros: single call (with single compare per field).
   Cons: still hard to use when only uid is needed.

These three helpers looked as the best balance between
performance and usability.
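Editorial sketch, not from the thread: the packing discussed above puts one 32-bit id in the upper half of a u64 and the other in the lower half, so a single call returns both. A minimal userspace illustration follows; the function names are hypothetical, and the cast before the shift matters because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* Pack as described for bpf_get_current_pid_tgid(): tgid << 32 | pid. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	/* Widen first; shifting a 32-bit operand by 32 would be undefined. */
	return ((uint64_t)tgid << 32) | pid;
}

/* Upper 32 bits. */
static uint32_t unpack_tgid(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Lower 32 bits. */
static uint32_t unpack_pid(uint64_t v)
{
	return (uint32_t)v;
}
```

A program that only filters on one of the two ids pays for a single shift or truncation after the call, which is the trade-off the reply argues for.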
Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
On Fri, Jun 12, 2015 at 3:44 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
> On 6/12/15 3:08 PM, Andy Lutomirski wrote:
>> On Fri, Jun 12, 2015 at 2:40 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
>>> eBPF programs attached to kprobes need to filter based on
>>> current->pid, uid and other fields, so introduce helper functions:
>>>
>>> u64 bpf_get_current_pid_tgid(void)
>>> Return: current->tgid << 32 | current->pid
>>>
>>> u64 bpf_get_current_uid_gid(void)
>>> Return: current_gid << 32 | current_uid
>>
>> How does this work wrt namespaces,
>
> from_kuid(current_user_ns(), uid)

Is current_user_ns() well defined in the context of an eBPF program?

--Andy
Re: [PATCH net-next 0/2] flow_dissector: Fix MPLS parsing and add ext hdr support
From: Tom Herbert <t...@herbertland.com>
Date: Fri, 12 Jun 2015 09:01:04 -0700

> Need to shift label. Added parsing of dst, hop-by-hop, and routing
> extension headers.

Series applied, thanks Tom.
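Editorial sketch, not taken from the series: "need to shift label" refers to the fact that in an MPLS label stack entry the 20-bit label sits in the top bits of the 32-bit word, so masking alone leaves it scaled by 4096. Assuming the standard RFC 3032 layout (label 20 bits, TC 3 bits, S 1 bit, TTL 8 bits), extraction could be written as below. The macro and function names are illustrative only.

```c
#include <stdint.h>

#define LS_LABEL_MASK  0xFFFFF000u   /* top 20 bits of the entry */
#define LS_LABEL_SHIFT 12

/* entry is a label stack entry already converted to host byte order. */
static uint32_t mpls_label(uint32_t entry)
{
	return (entry & LS_LABEL_MASK) >> LS_LABEL_SHIFT;
}
```

For an entry carrying label 16, the masked value is 65536; only after the shift does it equal 16.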
Re: [PATCH] sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO
From: mleit...@redhat.com
Date: Thu, 11 Jun 2015 14:49:46 -0300

> From: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
>
> Currently, we can ask to authenticate DATA chunks and we can send DATA
> chunks on the same packet as COOKIE_ECHO, but if you try to combine
> both, the DATA chunk will be sent unauthenticated and peer won't accept
> it, leading to a communication failure.
>
> This happens because even though the data was queued after it was
> requested to authenticate DATA chunks, it was also queued before we
> could know that remote peer can handle authenticating, so
> sctp_auth_send_cid() returns false.
>
> The fix is whenever we set up an active key, re-check send queue for
> chunks that now should be authenticated. As a result, such packet will
> now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.
>
> Reported-by: Liu Wei <we...@redhat.com>
> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>

Applied, thanks.
Re: [PATCH -next] net: ipv4: un-inline ip_finish_output2
From: Florian Westphal <f...@strlen.de>
Date: Fri, 12 Jun 2015 12:12:22 +0200

>       text    data     bss     dec     hex filename
> old: 16527      44       0   16571    40bb net/ipv4/ip_output.o
> new: 14935      44       0   14979    3a83 net/ipv4/ip_output.o
>
> Suggested-by: Eric Dumazet <eric.duma...@gmail.com>
> Signed-off-by: Florian Westphal <f...@strlen.de>

Applied, thanks Florian.
Re: next-20150610 - repeated hangs at e1000e_phc_gettime+0x2e/0x60
On Thu, 11 Jun 2015 22:57:48 -0400, Valdis Kletnieks said:

0) next-20150603 works, so the problem landed in linux-next in the last week.

1) All 3 times happened while I was at home, using wireless, so the interface
didn't have link and was ifconfig'ed down.

All 3 crashes happened at almost exactly 4 hours of uptime, but here in my
office I'm now at 6 hours on the same kernel while running with the interface
plugged in and doing traffic.

I have a fighting chance of mostly finishing a bisect over the weekend, I'll
let you know where that leads.
Re: iproute2: missing patches in branch net-next
On 05/29/2015 01:15 AM, Daniel Borkmann wrote:
> On 05/29/2015 01:12 AM, Stephen Hemminger wrote:
> ...
>> I will go back and recreate what is missing.
>> Sorry for the confusion.
>
> Great thanks, no problem.

Hmm, two weeks have passed. :/ Is there any progress so far?
Re: [PATCH] netdevice: add netdev_pub helper function
From: Jason A. Donenfeld <ja...@zx2c4.com>
Date: Fri, 12 Jun 2015 15:30:29 +0200

> Being able to utilize this makes much code a lot simpler and cleaner.
> It's a nice convenience function.
>
> Signed-off-by: Jason A. Donenfeld <ja...@zx2c4.com>

Please do not ever submit patches adding new interfaces without also
submitting changes showing actual uses of the new interface.

Otherwise it's impossible to see how really useful it actually is.

I'm not applying this until you do so, thanks.
[PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
eBPF programs attached to kprobes need to filter based on current-pid, uid and other fields, so introduce helper functions: u64 bpf_get_current_pid_tgid(void) Return: current-tgid 32 | current-pid u64 bpf_get_current_uid_gid(void) Return: current_gid 32 | current_uid bpf_get_current_comm(char *buf, int size_of_buf) stores current-comm into buf They can be used from the programs attached to TC as well to classify packets based on current task fields. Update tracex2 example to print histogram of write syscalls for each process instead of aggregated for all. Signed-off-by: Alexei Starovoitov a...@plumgrid.com --- These helpers will be mainly used by bpf+tracing, but the patch is targeting net-next tree to minimize merge conflicts and they're useful in TC too. The feature was requested by Wang Nan wangn...@huawei.com and Brendan Gregg brendan.d.gr...@gmail.com include/linux/bpf.h|3 ++ include/uapi/linux/bpf.h | 19 + kernel/bpf/core.c |3 ++ kernel/bpf/helpers.c | 58 ++ kernel/trace/bpf_trace.c |6 net/core/filter.c |6 samples/bpf/bpf_helpers.h |6 samples/bpf/tracex2_kern.c | 24 samples/bpf/tracex2_user.c | 67 ++-- 9 files changed, 178 insertions(+), 14 deletions(-) diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 2235aee8096a..1b9a3f5b27f6 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -188,5 +188,8 @@ extern const struct bpf_func_proto bpf_get_prandom_u32_proto; extern const struct bpf_func_proto bpf_get_smp_processor_id_proto; extern const struct bpf_func_proto bpf_tail_call_proto; extern const struct bpf_func_proto bpf_ktime_get_ns_proto; +extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto; +extern const struct bpf_func_proto bpf_get_current_uid_gid_proto; +extern const struct bpf_func_proto bpf_get_current_comm_proto; #endif /* _LINUX_BPF_H */ diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 602f05b7a275..29ef6f99e43d 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -230,6 +230,25 
@@ enum bpf_func_id { * Return: 0 on success */ BPF_FUNC_clone_redirect, + + /** +* u64 bpf_get_current_pid_tgid(void) +* Return: current-tgid 32 | current-pid +*/ + BPF_FUNC_get_current_pid_tgid, + + /** +* u64 bpf_get_current_uid_gid(void) +* Return: current_gid 32 | current_uid +*/ + BPF_FUNC_get_current_uid_gid, + + /** +* bpf_get_current_comm(char *buf, int size_of_buf) +* stores current-comm into buf +* Return: 0 on success +*/ + BPF_FUNC_get_current_comm, __BPF_FUNC_MAX_ID, }; diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c index 1e00aa3316dc..1fc45cc83076 100644 --- a/kernel/bpf/core.c +++ b/kernel/bpf/core.c @@ -730,6 +730,9 @@ const struct bpf_func_proto bpf_map_delete_elem_proto __weak; const struct bpf_func_proto bpf_get_prandom_u32_proto __weak; const struct bpf_func_proto bpf_get_smp_processor_id_proto __weak; const struct bpf_func_proto bpf_ktime_get_ns_proto __weak; +const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak; +const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak; +const struct bpf_func_proto bpf_get_current_comm_proto __weak; /* Always built-in helper functions. 
*/ const struct bpf_func_proto bpf_tail_call_proto = { diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c index 7ad5d8842d5b..d1dce346c56f 100644 --- a/kernel/bpf/helpers.c +++ b/kernel/bpf/helpers.c @@ -14,6 +14,8 @@ #include linux/random.h #include linux/smp.h #include linux/ktime.h +#include linux/sched.h +#include linux/uidgid.h /* If kernel subsystem is allowing eBPF programs to call this function, * inside its own verifier_ops-get_func_proto() callback it should return @@ -124,3 +126,59 @@ const struct bpf_func_proto bpf_ktime_get_ns_proto = { .gpl_only = true, .ret_type = RET_INTEGER, }; + +static u64 bpf_get_current_pid_tgid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5) +{ + struct task_struct *task = current; + + if (!task) + return -EINVAL; + + return (u64) task-tgid 32 | task-pid; +} + +const struct bpf_func_proto bpf_get_current_pid_tgid_proto = { + .func = bpf_get_current_pid_tgid, + .gpl_only = false, + .ret_type = RET_INTEGER, +}; + +static u64 bpf_get_current_uid_gid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5) +{ + struct task_struct *task = current; + kuid_t uid; + kgid_t gid; + + if (!task) + return -EINVAL; + + current_uid_gid(uid, gid); + return (u64) from_kgid(current_user_ns(), gid) 32 | + from_kuid(current_user_ns(), uid); +} + +const struct bpf_func_proto
[PATCH net-next 3/3] bpf: let kprobe programs use bpf_get_smp_processor_id() helper
It's useful to do per-cpu histograms.

Suggested-by: Daniel Wagner <daniel.wag...@bmw-carit.de>
Signed-off-by: Alexei Starovoitov <a...@plumgrid.com>
---
 kernel/trace/bpf_trace.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4f9b5d41869b..88a041adee90 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -181,6 +181,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_get_current_comm_proto;
 	case BPF_FUNC_trace_printk:
 		return bpf_get_trace_printk_proto();
+	case BPF_FUNC_get_smp_processor_id:
+		return &bpf_get_smp_processor_id_proto;
 	default:
 		return NULL;
 	}
--
1.7.9.5
Re: [PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex
On Fri, Jun 12, 2015 at 11:14:25AM -0700, Florian Fainelli wrote: On 12/06/15 10:18, Andrew Lunn wrote: By default, DSA and CPU ports are configured to the maximum speed the switch supports. However there can be use cases where the peer device port is slower. Allow a fixed-link property to be used with the DSA and CPU port in the device tree, and use this information to configure the port. Humm, I suppose this means that we might end-up with two fixed PHY devices, one for the Ethernet MAC, and another one for the switch? Yes. This is exactly what i have for the board i'm working on. The concept also applies for DSA ports, so you could have two switches and two fixed phys for one inter-switch link. That might duplicate the same information, though I cannot think of a better solution than using phandles to resolve that. This seems the simplest solution. It would be possible to create a dual port fixed phy, meaning it exposes two phy_device structures, one for each side. But that seems overkill. Signed-off-by: Andrew Lunn and...@lunn.ch --- include/net/dsa.h | 1 + net/dsa/dsa.c | 39 +++ 2 files changed, 40 insertions(+) diff --git a/include/net/dsa.h b/include/net/dsa.h index fbca63ba8f73..24572f99224c 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -160,6 +160,7 @@ struct dsa_switch { * Slave mii_bus and devices for the individual ports. 
*/ u32 dsa_port_mask; + u32 cpu_port_mask; u32 phys_port_mask; u32 phys_mii_mask; struct mii_bus *slave_mii_bus; diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 392e29a0227d..f9c8f4e7ebce 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -176,6 +176,36 @@ __ATTRIBUTE_GROUPS(dsa_hwmon); #endif /* CONFIG_NET_DSA_HWMON */ /* basic switch operations **/ +static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master) +{ + struct dsa_chip_data *cd = ds-pd; + struct device_node *port_dn; + struct phy_device *phydev; + int ret, port; + + for (port = 0; port DSA_MAX_PORTS; port++) { + if (!((ds-cpu_port_mask | ds-dsa_port_mask) (1 port))) + continue; + + port_dn = cd-port_dn[port]; + if (of_phy_is_fixed_link(port_dn)) { + ret = of_phy_register_fixed_link(port_dn); + if (ret) { + netdev_err(master, + failed to register fixed PHY\n); + return ret; + } + phydev = of_phy_find_device(port_dn); + phydev-is_pseudo_fixed_link = true; + genphy_config_init(phydev); + genphy_read_status(phydev); I was curious as to why you were doing this at first, but I guess this is because the PHY state machine is not started for this fixed PHY that you just created, right? For the fixed phy to be of any use in adjust_link(), it needs to set phydev-link, phydev-speed and phydev-duplex. That only happens when genphy_read_status() is called. And you don't get sensible values unless genphy_config_init() is called first. We don't have a netdev we can attach this phydev to, so the core has no chance to do these genphy_XXX calls. 
+ if (ds-drv-adjust_link) + ds-drv-adjust_link(ds, port, phydev); + } + } + return 0; +} + static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent) { struct dsa_switch_driver *drv = ds-drv; @@ -204,6 +234,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent) } dst-cpu_switch = index; dst-cpu_port = i; + ds-cpu_port_mask |= 1 i; Same question as Guenter here, I assume this is because you plan on having multiple CPU ports connected to the switch and this makes it easier to deal with, is that right? Yes, sort of. At the time i wrote this code, i already had multiple CPU ports working. But the order i'm submitting the patches has been reversed. This could be simplified for a single CPU port. The multiple CPU ports is turning out to be messy, but not because of the code. It works on my DIR665, but the second ethernet does not have a MAC address, which is causing issues i need to track down. For testing i've set one in device tree. And my WRT1900AC has something funny going on with its second interface resulting in it never sending/receiving packets, but works fine with OpenWRT swconfig drivers. Until i have one platform in a state i can mainline, i'm holding off with the multi-cpu patches. I do want to work on them next
[PATCH] ethernet/sfc: mark state UNINIT after unregister
Without this change, modprobe -r sfc hits the BUG_ON() in
efx_pci_remove_main(). Best as I can tell, this was just an oversight,
efx->state gets set to STATE_UNINIT in the error path of
efx_register_netdev() just after unregister_netdevice(), and the same
should happen in efx_unregister_netdev() after its unregister_netdevice()
call. Now I can load and unload no problem.

CC: Solarflare linux maintainers <linux-net-driv...@solarflare.com>
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <ja...@redhat.com>
---
 drivers/net/ethernet/sfc/efx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 0c42ed9..f3eaade 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2448,6 +2448,7 @@ static void efx_unregister_netdev(struct efx_nic *efx)
 #endif
 		device_remove_file(&efx->pci_dev->dev, &dev_attr_phy_type);
 		unregister_netdev(efx->net_dev);
+		efx->state = STATE_UNINIT;
 	}
 }
 
--
1.8.3.1
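Editorial sketch, not the driver's code: the commit message describes a teardown step that asserts on a state field, so every path that unregisters the device has to reset that field. A simplified userspace model of that invariant is below. struct nic, nic_register() and the two-value state enum are hypothetical simplifications and do not mirror the sfc driver's actual state set.

```c
#include <stdbool.h>

enum nic_state { STATE_UNINIT, STATE_READY };

struct nic {
	enum nic_state state;
	bool netdev_registered;
};

/* Returns 0 on success, -1 on a simulated late failure. */
static int nic_register(struct nic *n, bool fail_late)
{
	n->netdev_registered = true;
	n->state = STATE_READY;
	if (fail_late) {
		/* Error path: undo registration and reset the state. */
		n->netdev_registered = false;
		n->state = STATE_UNINIT;
		return -1;
	}
	return 0;
}

static void nic_unregister(struct nic *n)
{
	if (!n->netdev_registered)
		return;
	n->netdev_registered = false;
	/* Mirror the error path so later teardown sees a consistent state. */
	n->state = STATE_UNINIT;
}

/* Final teardown is only legal from STATE_UNINIT. */
static bool nic_can_remove(const struct nic *n)
{
	return n->state == STATE_UNINIT;
}
```

If nic_unregister() left the state at STATE_READY, nic_can_remove() would report false on the normal unload path, which is the shape of the BUG_ON() described above.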
[PATCH next v1] bonding: Display LACP info only to CAP_NET_ADMIN capable user
Actor and Partner details can be accessed via proc-fs and sys-fs entries. These interfaces are world readable at this moment. The earlier patch-series made the LACP communication secure to avoid nuisance attack from within the same L2 domain but it did not prevent someone unprivileged looking at that information on host and perform the same act. This patch essentially avoids spitting those entries if the user in question does not have enough privileges. Signed-off-by: Mahesh Bandewar mahe...@google.com --- drivers/net/bonding/bond_procfs.c | 101 -- drivers/net/bonding/bond_sysfs.c | 12 ++--- 2 files changed, 59 insertions(+), 54 deletions(-) diff --git a/drivers/net/bonding/bond_procfs.c b/drivers/net/bonding/bond_procfs.c index e7f3047a26df..f514fe5e80a5 100644 --- a/drivers/net/bonding/bond_procfs.c +++ b/drivers/net/bonding/bond_procfs.c @@ -135,27 +135,30 @@ static void bond_info_show_master(struct seq_file *seq) bond-params.ad_select); seq_printf(seq, Aggregator selection policy (ad_select): %s\n, optval-string); - seq_printf(seq, System priority: %d\n, - BOND_AD_INFO(bond).system.sys_priority); - seq_printf(seq, System MAC address: %pM\n, - BOND_AD_INFO(bond).system.sys_mac_addr); - - if (__bond_3ad_get_active_agg_info(bond, ad_info)) { - seq_printf(seq, bond %s has no active aggregator\n, - bond-dev-name); - } else { - seq_printf(seq, Active Aggregator Info:\n); - - seq_printf(seq, \tAggregator ID: %d\n, - ad_info.aggregator_id); - seq_printf(seq, \tNumber of ports: %d\n, - ad_info.ports); - seq_printf(seq, \tActor Key: %d\n, - ad_info.actor_key); - seq_printf(seq, \tPartner Key: %d\n, - ad_info.partner_key); - seq_printf(seq, \tPartner Mac Address: %pM\n, - ad_info.partner_system); + if (capable(CAP_NET_ADMIN)) { + seq_printf(seq, System priority: %d\n, + BOND_AD_INFO(bond).system.sys_priority); + seq_printf(seq, System MAC address: %pM\n, + BOND_AD_INFO(bond).system.sys_mac_addr); + + if (__bond_3ad_get_active_agg_info(bond, ad_info)) { + seq_printf(seq, + 
bond %s has no active aggregator\n, + bond-dev-name); + } else { + seq_printf(seq, Active Aggregator Info:\n); + + seq_printf(seq, \tAggregator ID: %d\n, + ad_info.aggregator_id); + seq_printf(seq, \tNumber of ports: %d\n, + ad_info.ports); + seq_printf(seq, \tActor Key: %d\n, + ad_info.actor_key); + seq_printf(seq, \tPartner Key: %d\n, + ad_info.partner_key); + seq_printf(seq, \tPartner Mac Address: %pM\n, + ad_info.partner_system); + } } } } @@ -199,33 +202,35 @@ static void bond_info_show_slave(struct seq_file *seq, seq_printf(seq, Partner Churned Count: %d\n, port-churn_partner_count); - seq_puts(seq, details actor lacp pdu:\n); - seq_printf(seq, system priority: %d\n, - port-actor_system_priority); - seq_printf(seq, system mac address: %pM\n, - port-actor_system); - seq_printf(seq, port key: %d\n, - port-actor_oper_port_key); - seq_printf(seq, port priority: %d\n, - port-actor_port_priority); - seq_printf(seq, port number: %d\n, - port-actor_port_number); - seq_printf(seq, port state: %d\n, - port-actor_oper_port_state); - - seq_puts(seq, details partner lacp pdu:\n); - seq_printf(seq, system priority: %d\n, - port-partner_oper.system_priority); -
Re: [PATCH] Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt
From: Masanari Iida <standby2...@gmail.com>
Date: Sat, 13 Jun 2015 00:23:21 +0900

> This patch fix URL (http to https) for wiki.wireshark.org.
>
> Signed-off-by: Masanari Iida <standby2...@gmail.com>

Applied, thank you.
Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
On Fri, Jun 12, 2015 at 2:40 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
> eBPF programs attached to kprobes need to filter based on
> current->pid, uid and other fields, so introduce helper functions:
>
> u64 bpf_get_current_pid_tgid(void)
> Return: current->tgid << 32 | current->pid
>
> u64 bpf_get_current_uid_gid(void)
> Return: current_gid << 32 | current_uid

How does this work wrt namespaces, and why the weird packing?

--Andy
[PATCH net-next] rocker: gaurd against NULL rocker_port when removing ports
From: Scott Feldman <sfel...@gmail.com>

The ports array is filled in as ports are probed, but if probing doesn't
finish, we need to stop only those ports that were probed successfully.
Check the ports array for NULL to skip un-probed ports when stopping.

Signed-off-by: Scott Feldman <sfel...@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 819289e..c6a6e3c 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4802,6 +4802,8 @@ static void rocker_remove_ports(const struct rocker *rocker)
 
 	for (i = 0; i < rocker->port_count; i++) {
 		rocker_port = rocker->ports[i];
+		if (!rocker_port)
+			continue;
 		rocker_port_ig_tbl(rocker_port, SWITCHDEV_TRANS_NONE,
 				   ROCKER_OP_FLAG_REMOVE);
 		unregister_netdev(rocker_port->dev);
--
1.7.10.4
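Editorial sketch, not the driver's code: the guard above only works because the ports array starts out zeroed, so any slot that probing never reached is still NULL at teardown. A userspace model of a partial probe followed by a tolerant remove could look like this. struct sw, struct port and the fail_at parameter are hypothetical and exist only to simulate a probe that stops early.

```c
#include <stdlib.h>

struct port {
	int id;
};

struct sw {
	struct port **ports;   /* zero-initialized; filled in as probing succeeds */
	size_t port_count;
};

/* Probe ports in order; fail_at simulates a probe error at that index. */
static int sw_probe_ports(struct sw *s, size_t fail_at)
{
	s->ports = calloc(s->port_count, sizeof(*s->ports));
	if (!s->ports)
		return -1;
	for (size_t i = 0; i < s->port_count; i++) {
		if (i == fail_at)
			return -1;   /* later slots stay NULL */
		s->ports[i] = malloc(sizeof(struct port));
		if (!s->ports[i])
			return -1;
		s->ports[i]->id = (int)i;
	}
	return 0;
}

/* Remove only the ports that were actually probed; returns how many. */
static size_t sw_remove_ports(struct sw *s)
{
	size_t removed = 0;

	if (!s->ports)
		return 0;
	for (size_t i = 0; i < s->port_count; i++) {
		struct port *p = s->ports[i];

		if (!p)
			continue;   /* never probed: nothing to undo */
		free(p);
		s->ports[i] = NULL;
		removed++;
	}
	free(s->ports);
	s->ports = NULL;
	return removed;
}
```

Without the NULL check, the remove loop would dereference the unprobed slots as soon as probing fails partway through.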
Re: [PATCH net-next] flow_dissector: fix ipv6 dst, hop-by-hop and routing ext hdrs
From: Eric Dumazet <eric.duma...@gmail.com>
Date: Fri, 12 Jun 2015 19:31:32 -0700

> From: Eric Dumazet <eduma...@google.com>
>
> __skb_header_pointer() returns a pointer that must be checked.
>
> Fixes infinite loop reported by Alexei, and add __must_check to
> catch these errors earlier.
>
> Fixes: 6a74fcf426f5 ("flow_dissector: add support for dst, hop-by-hop and routing ext hdrs")
> Reported-by: Alexei Starovoitov <alexei.starovoi...@gmail.com>
> Tested-by: Alexei Starovoitov <alexei.starovoi...@gmail.com>
> Signed-off-by: Eric Dumazet <eduma...@google.com>

Applied, thanks Eric.
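Editorial sketch, not the kernel code: the fix quoted above is about a bounds-checked accessor whose NULL result must end the parse loop. The userspace model below walks IPv6 hop-by-hop, routing and destination extension headers over a plain byte buffer. header_pointer() and skip_ext_headers() are hypothetical names; the GCC/Clang warn_unused_result attribute is used here as a stand-in for the kernel's __must_check.

```c
#include <stddef.h>
#include <stdint.h>

/* Return a pointer to len bytes at offset, or NULL if out of range. */
__attribute__((warn_unused_result))
static const uint8_t *header_pointer(const uint8_t *pkt, size_t pkt_len,
				     size_t offset, size_t len)
{
	if (offset > pkt_len || len > pkt_len - offset)
		return NULL;
	return pkt + offset;
}

#define NEXTHDR_HOP     0
#define NEXTHDR_ROUTING 43
#define NEXTHDR_DEST    60

/*
 * Walk extension headers starting at *offset. Returns the first
 * next-header value that is not one of the three handled types,
 * or -1 if the packet is truncated.
 */
static int skip_ext_headers(const uint8_t *pkt, size_t pkt_len,
			    uint8_t nexthdr, size_t *offset)
{
	while (nexthdr == NEXTHDR_HOP || nexthdr == NEXTHDR_ROUTING ||
	       nexthdr == NEXTHDR_DEST) {
		const uint8_t *opt = header_pointer(pkt, pkt_len, *offset, 2);

		if (!opt)
			return -1;   /* truncated: stop instead of looping */
		nexthdr = opt[0];
		/* Length field counts 8-octet units beyond the first 8. */
		*offset += ((size_t)opt[1] + 1) * 8;
	}
	return nexthdr;
}
```

The attribute only warns when the return value is discarded entirely; the NULL test itself still has to be written by the caller.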
Re: [PATCH net-next] tcp: tcp_v6_connect() cleanup
From: Eric Dumazet <eric.duma...@gmail.com>
Date: Fri, 12 Jun 2015 19:34:03 -0700

> From: Eric Dumazet <eduma...@google.com>
>
> Remove dead code from tcp_v6_connect()
>
> Signed-off-by: Eric Dumazet <eduma...@google.com>

Also applied, thanks.
[PATCH net-next] rocker: fix neigh tbl index increment race
From: Scott Feldman <sfel...@gmail.com>

rocker->neigh_tbl_next_index is used to generate unique indices for neigh
entries programmed into the device. The way new indices were generated was
racy with the new prepare-commit transaction model. A simple fix here
removes the race. The race was with two processes getting the same index,
one process using prepare-commit, the other not:

Proc A                                  Proc B

PREPARE phase
get neigh_tbl_next_index

                                        NONE phase
                                        get neigh_tbl_next_index
                                        neigh_tbl_next_index++

COMMIT phase
neigh_tbl_next_index++

Both A and B got the same index.

The fix is to store and increment neigh_tbl_next_index in the PREPARE (or
NONE) phase and use value in COMMIT phase:

Proc A                                  Proc B

PREPARE phase
get neigh_tbl_next_index
neigh_tbl_next_index++

                                        NONE phase
                                        get neigh_tbl_next_index
                                        neigh_tbl_next_index++

COMMIT phase
// use value stashed in PREPARE phase

Reported-by: Simon Horman <simon.hor...@netronome.com>
Signed-off-by: Scott Feldman <sfel...@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index c6a6e3c..a9d1559 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -2901,10 +2901,10 @@ static void _rocker_neigh_add(struct rocker *rocker,
 			      enum switchdev_trans trans,
 			      struct rocker_neigh_tbl_entry *entry)
 {
-	entry->index = rocker->neigh_tbl_next_index;
+	if (trans != SWITCHDEV_TRANS_COMMIT)
+		entry->index = rocker->neigh_tbl_next_index++;
 	if (trans == SWITCHDEV_TRANS_PREPARE)
 		return;
-	rocker->neigh_tbl_next_index++;
 	entry->ref_count++;
 	hash_add(rocker->neigh_tbl, &entry->entry,
 		 be32_to_cpu(entry->ip_addr));
--
1.7.10.4
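Editorial sketch, not the driver's code: the interleaving in the commit message can be replayed in a small single-threaded model. The index is reserved and bumped in the PREPARE (or NONE) phase, and COMMIT reuses whatever was stored in the entry. The struct and enum names are hypothetical, and real concurrency (locking) is deliberately left out; only the ordering of phases is modelled.

```c
enum trans { TRANS_NONE, TRANS_PREPARE, TRANS_COMMIT };

struct table {
	unsigned int next_index;
};

struct entry {
	unsigned int index;
	int installed;
};

static void neigh_add(struct table *t, enum trans trans, struct entry *e)
{
	/* Reserve the index in PREPARE (or NONE); COMMIT reuses it. */
	if (trans != TRANS_COMMIT)
		e->index = t->next_index++;
	if (trans == TRANS_PREPARE)
		return;
	e->installed = 1;
}
```

Replaying the sequence from the message (A prepares, B adds with NONE, A commits) leaves A with index 0 and B with index 1, where the old ordering would have handed index 0 to both.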
[PATCH net-next 3/5] rocker: mark STP update as 'no wait' processing
From: Scott Feldman <sfel...@gmail.com>

We can get STP updates from the bridge driver in atomic and non-atomic
contexts. Since we can't test what context we're getting called in, do the
STP processing as 'no wait', which will cover all cases.

Signed-off-by: Scott Feldman <sfel...@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 1995b59..6c15c2e 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4286,7 +4286,8 @@ static int rocker_port_attr_set(struct net_device *dev,
 
 	switch (attr->id) {
 	case SWITCHDEV_ATTR_PORT_STP_STATE:
-		err = rocker_port_stp_update(rocker_port, attr->trans, 0,
+		err = rocker_port_stp_update(rocker_port, attr->trans,
+					     ROCKER_OP_FLAG_NOWAIT,
 					     attr->u.stp_state);
 		break;
 	case SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS:
--
1.7.10.4
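Editorial sketch, not the driver's code: patch 1/5 of this series describes what the NOWAIT flag is meant to change, namely atomic allocations and no sleeping on the device's reply. A userspace model of that flag-to-behaviour mapping is shown below. The OP_FLAG_* macros and enum alloc_mode are hypothetical stand-ins for ROCKER_OP_FLAG_* and the GFP_ATOMIC / GFP_KERNEL choice.

```c
#define OP_FLAG_REMOVE (1u << 0)
#define OP_FLAG_NOWAIT (1u << 1)

enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_ATOMIC };

/* Callers that may run in atomic context must not sleep in the allocator. */
static enum alloc_mode alloc_mode_for(unsigned int flags)
{
	return (flags & OP_FLAG_NOWAIT) ? ALLOC_ATOMIC : ALLOC_MAY_SLEEP;
}

/* Whether the caller should block until the device answers. */
static int should_wait_for_completion(unsigned int flags)
{
	return !(flags & OP_FLAG_NOWAIT);
}
```

Passing the flag unconditionally, as the STP change does, trades a guaranteed completion status for safety in both calling contexts.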
[PATCH net-next 1/5] rocker: revert back to support for nowait processes
From: Scott Feldman <sfel...@gmail.com>

One of the items removed from the rocker driver in the Spring Cleanup patch
series was the ability to mark processing in the driver as no wait for those
contexts where we cannot sleep.  Turns out, we have no wait contexts where we
want to program the device.  So re-add the ROCKER_OP_FLAG_NOWAIT flag to mark
such processes, and propagate flags to mem allocator and to the device cmd
executor.  With NOWAIT, mem allocs are GFP_ATOMIC and device cmds are queued
to the device, but the driver will not wait (sleep) for the response back from
the device.

My bad for removing NOWAIT support in the first place; I thought we could
swing non-sleep contexts to process context using a work queue, for example,
but there is push-back to keep processing in original context.

Signed-off-by: Scott Feldman <sfel...@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.c |  202 +++---
 1 file changed, 112 insertions(+), 90 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index a9d1559..c1910c1 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -326,10 +326,18 @@ static bool rocker_port_is_bridged(const struct rocker_port *rocker_port)
 	return !!rocker_port->bridge_dev;
 }
 
+#define ROCKER_OP_FLAG_REMOVE		BIT(0)
+#define ROCKER_OP_FLAG_NOWAIT		BIT(1)
+#define ROCKER_OP_FLAG_LEARNED		BIT(2)
+#define ROCKER_OP_FLAG_REFRESH		BIT(3)
+
 static void *__rocker_port_mem_alloc(struct rocker_port *rocker_port,
-				     enum switchdev_trans trans, size_t size)
+				     enum switchdev_trans trans, int flags,
+				     size_t size)
 {
 	struct list_head *elem = NULL;
+	gfp_t gfp_flags = (flags & ROCKER_OP_FLAG_NOWAIT) ?
+			  GFP_ATOMIC : GFP_KERNEL;
 
 	/* If in transaction prepare phase, allocate the memory
 	 * and enqueue it on a per-port list.  If in transaction
@@ -342,7 +350,7 @@ static void *__rocker_port_mem_alloc(struct rocker_port *rocker_port,
 
 	switch (trans) {
 	case SWITCHDEV_TRANS_PREPARE:
-		elem = kzalloc(size + sizeof(*elem), GFP_KERNEL);
+		elem = kzalloc(size + sizeof(*elem), gfp_flags);
 		if (!elem)
 			return NULL;
 		list_add_tail(elem, &rocker_port->trans_mem);
@@ -353,7 +361,7 @@ static void *__rocker_port_mem_alloc(struct rocker_port *rocker_port,
 		list_del_init(elem);
 		break;
 	case SWITCHDEV_TRANS_NONE:
-		elem = kzalloc(size + sizeof(*elem), GFP_KERNEL);
+		elem = kzalloc(size + sizeof(*elem), gfp_flags);
 		if (elem)
 			INIT_LIST_HEAD(elem);
 		break;
@@ -365,16 +373,17 @@ static void *__rocker_port_mem_alloc(struct rocker_port *rocker_port,
 }
 
 static void *rocker_port_kzalloc(struct rocker_port *rocker_port,
-				 enum switchdev_trans trans, size_t size)
+				 enum switchdev_trans trans, int flags,
+				 size_t size)
 {
-	return __rocker_port_mem_alloc(rocker_port, trans, size);
+	return __rocker_port_mem_alloc(rocker_port, trans, flags, size);
 }
 
 static void *rocker_port_kcalloc(struct rocker_port *rocker_port,
-				 enum switchdev_trans trans, size_t n,
-				 size_t size)
+				 enum switchdev_trans trans, int flags,
+				 size_t n, size_t size)
 {
-	return __rocker_port_mem_alloc(rocker_port, trans, n * size);
+	return __rocker_port_mem_alloc(rocker_port, trans, flags, n * size);
 }
 
 static void rocker_port_kfree(enum switchdev_trans trans, const void *mem)
@@ -397,11 +406,13 @@ static void rocker_port_kfree(enum switchdev_trans trans, const void *mem)
 struct rocker_wait {
 	wait_queue_head_t wait;
 	bool done;
+	bool nowait;
 };
 
 static void rocker_wait_reset(struct rocker_wait *wait)
 {
 	wait->done = false;
+	wait->nowait = false;
 }
 
 static void rocker_wait_init(struct rocker_wait *wait)
@@ -411,11 +422,12 @@ static void rocker_wait_init(struct rocker_wait *wait)
 }
 
 static struct rocker_wait *rocker_wait_create(struct rocker_port *rocker_port,
-					      enum switchdev_trans trans)
+					      enum switchdev_trans trans,
+					      int flags)
 {
 	struct rocker_wait *wait;
 
-	wait = rocker_port_kzalloc(rocker_port, trans, sizeof(*wait));
+	wait = rocker_port_kzalloc(rocker_port, trans, flags, sizeof(*wait));
 	if (!wait)
 		return NULL;
 	rocker_wait_init(wait);
@@ -1386,7 +1398,12 @@ static
[PATCH net-next 4/5] rocker: move MAC learn event back to 'no wait' processing
From: Scott Feldman <sfel...@gmail.com>

Signed-off-by: Scott Feldman <sfel...@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.c |   40 +++---
 1 file changed, 3 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 6c15c2e..8430cb3 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -1459,36 +1459,14 @@ static int rocker_port_fdb(struct rocker_port *rocker_port,
 			   const unsigned char *addr,
 			   __be16 vlan_id, int flags);
 
-struct rocker_mac_vlan_seen_work {
-	struct work_struct work;
-	struct rocker_port *rocker_port;
-	int flags;
-	unsigned char addr[ETH_ALEN];
-	__be16 vlan_id;
-};
-
-static void rocker_event_mac_vlan_seen_work(struct work_struct *work)
-{
-	const struct rocker_mac_vlan_seen_work *sw =
-		container_of(work, struct rocker_mac_vlan_seen_work, work);
-
-	rtnl_lock();
-	rocker_port_fdb(sw->rocker_port, SWITCHDEV_TRANS_NONE,
-			sw->addr, sw->vlan_id, sw->flags);
-	rtnl_unlock();
-
-	kfree(work);
-}
-
 static int rocker_event_mac_vlan_seen(const struct rocker *rocker,
 				      const struct rocker_tlv *info)
 {
-	struct rocker_mac_vlan_seen_work *sw;
 	const struct rocker_tlv *attrs[ROCKER_TLV_EVENT_MAC_VLAN_MAX + 1];
 	unsigned int port_number;
 	struct rocker_port *rocker_port;
 	const unsigned char *addr;
-	int flags = ROCKER_OP_FLAG_LEARNED;
+	int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_LEARNED;
 	__be16 vlan_id;
 
 	rocker_tlv_parse_nested(attrs, ROCKER_TLV_EVENT_MAC_VLAN_MAX, info);
@@ -1510,20 +1488,8 @@ static int rocker_event_mac_vlan_seen(const struct rocker *rocker,
 	    rocker_port->stp_state != BR_STATE_FORWARDING)
 		return 0;
 
-	sw = kmalloc(sizeof(*sw), GFP_ATOMIC);
-	if (!sw)
-		return -ENOMEM;
-
-	INIT_WORK(&sw->work, rocker_event_mac_vlan_seen_work);
-
-	sw->rocker_port = rocker_port;
-	sw->flags = flags;
-	ether_addr_copy(sw->addr, addr);
-	sw->vlan_id = vlan_id;
-
-	schedule_work(&sw->work);
-
-	return 0;
+	return rocker_port_fdb(rocker_port, SWITCHDEV_TRANS_NONE,
+			       addr, vlan_id, flags);
 }
 
 static int rocker_event_process(const struct rocker *rocker,
-- 
1.7.10.4
[PATCH net-next 5/5] rocker: move port stop to 'no wait' processing
From: Scott Feldman <sfel...@gmail.com>

rocker_port_stop can be called from atomic and non-atomic contexts.  Since we
can't test what context we're getting called in, do the processing as 'no
wait', which will cover all cases.

Signed-off-by: Scott Feldman <sfel...@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 8430cb3..a06b93d 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4004,7 +4004,8 @@ static int rocker_port_stop(struct net_device *dev)
 	rocker_port_set_enable(rocker_port, false);
 	napi_disable(&rocker_port->napi_rx);
 	napi_disable(&rocker_port->napi_tx);
-	rocker_port_fwd_disable(rocker_port, SWITCHDEV_TRANS_NONE, 0);
+	rocker_port_fwd_disable(rocker_port, SWITCHDEV_TRANS_NONE,
+				ROCKER_OP_FLAG_NOWAIT);
 	free_irq(rocker_msix_rx_vector(rocker_port), rocker_port);
 	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
 	rocker_port_dma_rings_fini(rocker_port);
-- 
1.7.10.4
[PATCH net-next 2/5] rocker: mark neigh update event processing as 'no wait'
From: Scott Feldman <sfel...@gmail.com>

Neigh update event handler runs in a context where we can't sleep, so mark
processing in driver with ROCKER_OP_FLAG_NOWAIT.  NOWAIT will use GFP_ATOMIC
for allocations and will queue cmds to the device's cmd ring but will not wait
(sleep) for cmd response back from device.

Signed-off-by: Scott Feldman <sfel...@gmail.com>
---
 drivers/net/ethernet/rocker/rocker.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index c1910c1..1995b59 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -5251,7 +5251,8 @@ static struct notifier_block rocker_netdevice_nb __read_mostly = {
 static int rocker_neigh_update(struct net_device *dev, struct neighbour *n)
 {
 	struct rocker_port *rocker_port = netdev_priv(dev);
-	int flags = (n->nud_state & NUD_VALID) ? 0 : ROCKER_OP_FLAG_REMOVE;
+	int flags = (n->nud_state & NUD_VALID ? 0 : ROCKER_OP_FLAG_REMOVE) |
+		    ROCKER_OP_FLAG_NOWAIT;
 	__be32 ip_addr = *(__be32 *)n->primary_key;
 
 	return rocker_port_ipv4_neigh(rocker_port, SWITCHDEV_TRANS_NONE,
-- 
1.7.10.4
[PATCH net-next 0/5] rocker: revert back to support for nowait processes
From: Scott Feldman <sfel...@gmail.com>

One of the items removed from the rocker driver in the Spring Cleanup patch
series was the ability to mark processing in the driver as no wait for those
contexts where we cannot sleep.  Turns out, we have no wait contexts where we
want to program the device and we don't want to defer the processing to a
process context.  So re-add the ROCKER_OP_FLAG_NOWAIT flag to mark such
processes, and propagate flags to mem allocator and to the device cmd
executor.  With NOWAIT, mem allocs are GFP_ATOMIC and device cmds are queued
to the device, but the driver will not wait (sleep) for the response back from
the device.

My bad for removing NOWAIT support in the first place; I thought we could
swing non-sleep contexts to process context using a work queue, for example,
but there is push-back to keep processing in original context.

Scott Feldman (5):
  rocker: revert back to support for nowait processes
  rocker: mark neigh update event processing as 'no wait'
  rocker: mark STP update as 'no wait' processing
  rocker: move MAC learn event back to 'no wait' processing
  rocker: move port stop to 'no wait' processing

 drivers/net/ethernet/rocker/rocker.c |  245 --
 1 file changed, 118 insertions(+), 127 deletions(-)

-- 
1.7.10.4
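The flag propagation described in this cover letter can be sketched in user space roughly as follows. Only the flag names and bit positions come from patch 1/5; enum alloc_mode, struct cmd_result, alloc_mode_for() and issue_cmd() are hypothetical stand-ins for GFP_KERNEL/GFP_ATOMIC and the device cmd executor, not rocker driver code.

```c
#include <stdbool.h>

/* Flag bits as defined in patch 1/5 (BIT(n) spelled out). */
#define ROCKER_OP_FLAG_REMOVE	(1u << 0)
#define ROCKER_OP_FLAG_NOWAIT	(1u << 1)
#define ROCKER_OP_FLAG_LEARNED	(1u << 2)
#define ROCKER_OP_FLAG_REFRESH	(1u << 3)

/* Hypothetical stand-ins for GFP_KERNEL and GFP_ATOMIC. */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_ATOMIC };

/* Hypothetical summary of how one device command was handled. */
struct cmd_result {
	enum alloc_mode mode;	/* how memory for the command was requested */
	bool waited;		/* whether the caller slept for the response */
};

/* NOWAIT selects the non-sleeping allocation mode. */
static enum alloc_mode alloc_mode_for(unsigned int flags)
{
	return (flags & ROCKER_OP_FLAG_NOWAIT) ? ALLOC_ATOMIC : ALLOC_MAY_SLEEP;
}

/* The same flag decides whether the caller waits for the device. */
static struct cmd_result issue_cmd(unsigned int flags)
{
	struct cmd_result res;

	res.mode = alloc_mode_for(flags);
	res.waited = !(flags & ROCKER_OP_FLAG_NOWAIT);
	return res;
}
```

A caller in a context that cannot sleep would pass ROCKER_OP_FLAG_NOWAIT, possibly OR-ed with other flags, and every layer below makes its own decision from that single bit.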
Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
On 6/12/15 3:54 PM, Andy Lutomirski wrote:
> On Fri, Jun 12, 2015 at 3:44 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
>> On 6/12/15 3:08 PM, Andy Lutomirski wrote:
>>> On Fri, Jun 12, 2015 at 2:40 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
>>>> eBPF programs attached to kprobes need to filter based on
>>>> current->pid, uid and other fields, so introduce helper functions:
>>>>
>>>> u64 bpf_get_current_pid_tgid(void)
>>>> Return: current->tgid << 32 | current->pid
>>>>
>>>> u64 bpf_get_current_uid_gid(void)
>>>> Return: current_gid << 32 | current_uid
>>>
>>> How does this work wrt namespaces,
>>
>> from_kuid(current_user_ns(), uid)
>
> Is current_user_ns() well defined in the context of an eBPF program?

What do you mean 'well defined'?
Semantically same as 'current'. Depending on where particular
kprobe is placed, 'current' is either meaningful or not. Program
author needs to know what he's doing. It's a tool.
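For readers following the thread, the packed return value quoted above (tgid in the upper 32 bits, pid in the lower 32 bits) can be illustrated with a small user-space sketch. The function names pack_pid_tgid(), unpack_tgid() and unpack_pid() are made up for illustration and are not part of the proposed helpers.

```c
#include <stdint.h>

/* Pack tgid into the upper 32 bits and pid into the lower 32 bits,
 * mirroring the documented return value of bpf_get_current_pid_tgid().
 * The cast matters: shifting a 32-bit value by 32 is undefined in C. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return ((uint64_t)tgid << 32) | pid;
}

/* Recover the thread group id (what user space usually calls the PID). */
static uint32_t unpack_tgid(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Recover the kernel pid (what user space usually calls the thread id). */
static uint32_t unpack_pid(uint64_t v)
{
	return (uint32_t)v;
}
```

A program that only wants to filter on the process would compare unpack_tgid() of the value against the target id and ignore the low half; the uid/gid helper follows the same layout with gid in the upper half.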
Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
On 6/12/15 4:25 PM, Andy Lutomirski wrote:
> It's a dangerous tool.  Also, shouldn't the returned uid match the
> namespace of the task that installed the probe, not the task that's
> being probed?

so leaking info to unprivileged apps is the concern?
The whole thing is for root only as you know.
The non-root is still far away. Today root needs to see the whole
kernel. That was the goal from the beginning.
Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
On Fri, Jun 12, 2015 at 5:15 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
> On 6/12/15 5:03 PM, Andy Lutomirski wrote:
>> On Fri, Jun 12, 2015 at 4:55 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
>>> On 6/12/15 4:47 PM, Andy Lutomirski wrote:
>>>> On Fri, Jun 12, 2015 at 4:38 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
>>>>> On 6/12/15 4:25 PM, Andy Lutomirski wrote:
>>>>>> It's a dangerous tool.  Also, shouldn't the returned uid match the
>>>>>> namespace of the task that installed the probe, not the task that's
>>>>>> being probed?
>>>>>
>>>>> so leaking info to unprivileged apps is the concern?
>>>>> The whole thing is for root only as you know.
>>>>> The non-root is still far away. Today root needs to see the whole
>>>>> kernel. That was the goal from the beginning.
>>>>
>>>> This is more of a correctness issue than a security issue.  ISTM using
>>>> current_user_ns() in a kprobe is asking for trouble.  It certainly
>>>> allows any unprivileged user to show any uid it wants to the probe,
>>>> which is probably not what the installer of the probe expects.
>>>
>>> probe doesn't expect anything. it doesn't make any decisions.
>>> bpf is read only. it's _visibility_ into the kernel.
>>> It's not used for security.
>>> When we start connecting eBPF to seccomp I would agree that uid
>>> handling needs to be done carefully, but we're not there yet.
>>> I don't want to kill _visibility_ because in some distant future
>>> bpf becomes a decision making tool in security area and
>>> get_current_uid() will return numbers that shouldn't be blindly
>>> used to reject/accept a user requesting something. That's far away.
>>
>> All that is true, but the code that *installed* the bpf probe might
>> get confused when it logs that uid 0 did such-and-such when
>> really some unprivileged userns root did it.
>
> so what specifically you proposing?
> Use from_kuid(init_user_ns,...) instead?

That seems reasonable to me.  After all, you can't install one of
these probes from a non-init userns.

--Andy
Re: [PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs
On Fri, 2015-06-12 at 18:50 -0700, Alexei Starovoitov wrote:
> sure, that's better. If you're going to submit it officially,
> please add my Tested-by. My server is happy now :)

Sure , will do.

I tried adding __must_check to __skb_header_pointer() but apparently had
to use W=1 to get a warning :

make W=1 net/core/
  CC      net/core/flow_dissector.o
net/core/flow_dissector.c: In function ‘__skb_flow_dissect’:
net/core/flow_dissector.c:390:19: warning: variable ‘opthdr’ set but not used [-Wunused-but-set-variable]
   u8 _opthdr[2], *opthdr;

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index cc612fc0a8943ec853b92e6b3516b0e5582299e2..45252c4f49e4020eec523273f23f65ee87cc0bd5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2743,8 +2743,9 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
 __wsum skb_checksum(const struct sk_buff *skb, int offset, int len,
 		    __wsum csum);
 
-static inline void *__skb_header_pointer(const struct sk_buff *skb, int offset,
-					 int len, void *data, int hlen, void *buffer)
+static inline void * __must_check
+__skb_header_pointer(const struct sk_buff *skb, int offset,
+		     int len, void *data, int hlen, void *buffer)
 {
 	if (hlen - offset >= len)
 		return data + offset;
Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
On Fri, Jun 12, 2015 at 4:23 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
> On 6/12/15 3:54 PM, Andy Lutomirski wrote:
>> On Fri, Jun 12, 2015 at 3:44 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
>>> On 6/12/15 3:08 PM, Andy Lutomirski wrote:
>>>> On Fri, Jun 12, 2015 at 2:40 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
>>>>> eBPF programs attached to kprobes need to filter based on
>>>>> current->pid, uid and other fields, so introduce helper functions:
>>>>>
>>>>> u64 bpf_get_current_pid_tgid(void)
>>>>> Return: current->tgid << 32 | current->pid
>>>>>
>>>>> u64 bpf_get_current_uid_gid(void)
>>>>> Return: current_gid << 32 | current_uid
>>>>
>>>> How does this work wrt namespaces,
>>>
>>> from_kuid(current_user_ns(), uid)
>>
>> Is current_user_ns() well defined in the context of an eBPF program?
>
> What do you mean 'well defined'?
> Semantically same as 'current'. Depending on where particular
> kprobe is placed, 'current' is either meaningful or not. Program
> author needs to know what he's doing. It's a tool.

It's a dangerous tool.  Also, shouldn't the returned uid match the
namespace of the task that installed the probe, not the task that's
being probed?

--Andy
Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
On Fri, Jun 12, 2015 at 4:38 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
> On 6/12/15 4:25 PM, Andy Lutomirski wrote:
>> It's a dangerous tool.  Also, shouldn't the returned uid match the
>> namespace of the task that installed the probe, not the task that's
>> being probed?
>
> so leaking info to unprivileged apps is the concern?
> The whole thing is for root only as you know.
> The non-root is still far away. Today root needs to see the whole
> kernel. That was the goal from the beginning.

This is more of a correctness issue than a security issue.  ISTM using
current_user_ns() in a kprobe is asking for trouble.  It certainly
allows any unprivileged user to show any uid it wants to the probe,
which is probably not what the installer of the probe expects.

--Andy
Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
On 6/12/15 4:47 PM, Andy Lutomirski wrote:
> On Fri, Jun 12, 2015 at 4:38 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
>> On 6/12/15 4:25 PM, Andy Lutomirski wrote:
>>> It's a dangerous tool.  Also, shouldn't the returned uid match the
>>> namespace of the task that installed the probe, not the task that's
>>> being probed?
>>
>> so leaking info to unprivileged apps is the concern?
>> The whole thing is for root only as you know.
>> The non-root is still far away. Today root needs to see the whole
>> kernel. That was the goal from the beginning.
>
> This is more of a correctness issue than a security issue.  ISTM using
> current_user_ns() in a kprobe is asking for trouble.  It certainly
> allows any unprivileged user to show any uid it wants to the probe,
> which is probably not what the installer of the probe expects.

probe doesn't expect anything. it doesn't make any decisions.
bpf is read only. it's _visibility_ into the kernel.
It's not used for security.
When we start connecting eBPF to seccomp I would agree that uid
handling needs to be done carefully, but we're not there yet.
I don't want to kill _visibility_ because in some distant future
bpf becomes a decision making tool in security area and
get_current_uid() will return numbers that shouldn't be blindly
used to reject/accept a user requesting something. That's far away.
Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
On Fri, Jun 12, 2015 at 4:55 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
> On 6/12/15 4:47 PM, Andy Lutomirski wrote:
>> On Fri, Jun 12, 2015 at 4:38 PM, Alexei Starovoitov <a...@plumgrid.com> wrote:
>>> On 6/12/15 4:25 PM, Andy Lutomirski wrote:
>>>> It's a dangerous tool.  Also, shouldn't the returned uid match the
>>>> namespace of the task that installed the probe, not the task that's
>>>> being probed?
>>>
>>> so leaking info to unprivileged apps is the concern?
>>> The whole thing is for root only as you know.
>>> The non-root is still far away. Today root needs to see the whole
>>> kernel. That was the goal from the beginning.
>>
>> This is more of a correctness issue than a security issue.  ISTM using
>> current_user_ns() in a kprobe is asking for trouble.  It certainly
>> allows any unprivileged user to show any uid it wants to the probe,
>> which is probably not what the installer of the probe expects.
>
> probe doesn't expect anything. it doesn't make any decisions.
> bpf is read only. it's _visibility_ into the kernel.
> It's not used for security.
> When we start connecting eBPF to seccomp I would agree that uid
> handling needs to be done carefully, but we're not there yet.
> I don't want to kill _visibility_ because in some distant future
> bpf becomes a decision making tool in security area and
> get_current_uid() will return numbers that shouldn't be blindly
> used to reject/accept a user requesting something. That's far away.

All that is true, but the code that *installed* the bpf probe might
get confused when it logs that uid 0 did such-and-such when
really some unprivileged userns root did it.

Also, as you start calling more and more non-trivial functions from
bpf, you might need to start preventing bpf probe installations in
those functions.

--Andy
[PATCH net-next v2] bridge: use either ndo VLAN ops or switchdev VLAN ops to install MASTER vlans
From: Scott Feldman <sfel...@gmail.com>

v2:

Move struct switchdev_obj automatics to inner scope where they're used.

v1:

To maintain backward compatibility with the existing iproute2 "bridge vlan"
command, let bridge's setlink/dellink handler call into either the port
driver's 8021q ndo ops or the port driver's bridge_setlink/dellink ops.

This allows port driver to choose 8021q ops or the newer
bridge_setlink/dellink ops when implementing VLAN add/del filtering on the
device.  The iproute "bridge vlan" command does not need to be modified.

To summarize using the "bridge vlan" command examples, we have:

1) bridge vlan add|del vid VID dev DEV

Here iproute2 sets MASTER flag.  Bridge's bridge_setlink/dellink is called.
Vlan is set on bridge for port.  If port driver implements ndo 8021q ops, call
those so port driver can install vlan filter on device.  Otherwise, if port
driver implements bridge_setlink/dellink ops, call those to install vlan
filter to device.  This option only works if port is bridged.

2) bridge vlan add|del vid VID dev DEV master

Same as 1)

3) bridge vlan add|del vid VID dev DEV self

Bridge's bridge_setlink/dellink isn't called.  Port driver's
bridge_setlink/dellink is called, if implemented.  This option works if port
is bridged or not.  If port is not bridged, a VLAN can still be added/deleted
to device filter using this variant.

4) bridge vlan add|del vid VID dev DEV master self

This is a combination of 1) and 3), but will only work if port is bridged.

Signed-off-by: Scott Feldman <sfel...@gmail.com>
---
 net/bridge/br_vlan.c |   59 --
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 13013fe..17fc358 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -2,6 +2,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/slab.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 
@@ -36,6 +37,36 @@ static void __vlan_add_flags(struct net_port_vlans *v, u16 vid, u16 flags)
 		clear_bit(vid, v->untagged_bitmap);
 }
 
+static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
+			  u16 vid, u16 flags)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	int err;
+
+	/* If driver uses VLAN ndo ops, use 8021q to install vid
+	 * on device, otherwise try switchdev ops to install vid.
+	 */
+
+	if (ops->ndo_vlan_rx_add_vid) {
+		err = vlan_vid_add(dev, br->vlan_proto, vid);
+	} else {
+		struct switchdev_obj vlan_obj = {
+			.id = SWITCHDEV_OBJ_PORT_VLAN,
+			.u.vlan = {
+				.flags = flags,
+				.vid_start = vid,
+				.vid_end = vid,
+			},
+		};
+
+		err = switchdev_port_obj_add(dev, &vlan_obj);
+		if (err == -EOPNOTSUPP)
+			err = 0;
+	}
+
+	return err;
+}
+
 static int __vlan_add(struct net_port_vlans *v, u16 vid, u16 flags)
 {
 	struct net_bridge_port *p = NULL;
@@ -62,7 +93,7 @@ static int __vlan_add(struct net_port_vlans *v, u16 vid, u16 flags)
 		 * This ensures tagged traffic enters the bridge when
 		 * promiscuous mode is disabled by br_manage_promisc().
 		 */
-		err = vlan_vid_add(dev, br->vlan_proto, vid);
+		err = __vlan_vid_add(dev, br, vid, flags);
 		if (err)
 			return err;
 	}
@@ -86,6 +117,30 @@ out_filt:
 	return err;
 }
 
+static void __vlan_vid_del(struct net_device *dev, struct net_bridge *br,
+			   u16 vid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	/* If driver uses VLAN ndo ops, use 8021q to delete vid
+	 * on device, otherwise try switchdev ops to delete vid.
+	 */
+
+	if (ops->ndo_vlan_rx_kill_vid) {
+		vlan_vid_del(dev, br->vlan_proto, vid);
+	} else {
+		struct switchdev_obj vlan_obj = {
+			.id = SWITCHDEV_OBJ_PORT_VLAN,
+			.u.vlan = {
+				.vid_start = vid,
+				.vid_end = vid,
+			},
+		};
+
+		switchdev_port_obj_del(dev, &vlan_obj);
+	}
+}
+
 static int __vlan_del(struct net_port_vlans *v, u16 vid)
 {
 	if (!test_bit(vid, v->vlan_bitmap))
@@ -96,7 +151,7 @@ static int __vlan_del(struct net_port_vlans *v, u16 vid)
 	if (v->port_idx) {
 		struct net_bridge_port *p = v->parent.port;
 
-		vlan_vid_del(p->dev, p->br->vlan_proto, vid);
+		__vlan_vid_del(p->dev, p->br, vid);
 	}
 
 	clear_bit(vid, v->vlan_bitmap);
-- 
1.7.10.4
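The dispatch rule in this patch (prefer the 8021q ndo op if the driver provides one, otherwise try the switchdev object op and treat "not supported" as success) can be sketched in plain user-space C. Everything here is a simplified assumption: struct port_ops, install_vid() and the two example callbacks are hypothetical and only model the control flow, not the bridge or switchdev APIs.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified port driver: either callback may be NULL. */
struct port_ops {
	/* stands in for ndo_vlan_rx_add_vid */
	int (*legacy_vlan_add)(uint16_t vid);
	/* stands in for the switchdev port object add */
	int (*obj_vlan_add)(uint16_t vid, uint16_t flags);
};

/* Install a VLAN id using whichever mechanism the driver implements. */
static int install_vid(const struct port_ops *ops, uint16_t vid, uint16_t flags)
{
	int err;

	if (ops->legacy_vlan_add)
		return ops->legacy_vlan_add(vid);

	/* No legacy hook: try the object-based op.  A driver without
	 * that op either is modelled as "operation not supported". */
	err = ops->obj_vlan_add ? ops->obj_vlan_add(vid, flags) : -EOPNOTSUPP;
	if (err == -EOPNOTSUPP)
		err = 0;	/* nothing to program is not an error */

	return err;
}

/* Example callbacks used to exercise the dispatch. */
static int legacy_ok(uint16_t vid)
{
	(void)vid;
	return 0;
}

static int obj_fail(uint16_t vid, uint16_t flags)
{
	(void)vid;
	(void)flags;
	return -ENOMEM;
}
```

With both callbacks set, only the legacy one runs; with neither set, install_vid() still returns 0, which matches the patch's choice to let a software-only port keep working.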
Re: [PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs
On Fri, Jun 12, 2015 at 06:37:34PM -0700, Eric Dumazet wrote:
> On Fri, 2015-06-12 at 18:27 -0700, Alexei Starovoitov wrote:
>> On Fri, Jun 12, 2015 at 09:01:06AM -0700, Tom Herbert wrote:
>>> If dst, hop-by-hop or routing extension headers are present determine
>>> length of the options and skip over them in flow dissection.
>>>
>>> Signed-off-by: Tom Herbert <t...@herbertland.com>
>>> ---
>>>  net/core/flow_dissector.c | 17 +
>>>  1 file changed, 17 insertions(+)
>>>
>>> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>>> index 1818cdc..22e4dff 100644
>>> --- a/net/core/flow_dissector.c
>>> +++ b/net/core/flow_dissector.c
>>> @@ -327,6 +327,7 @@ mpls:
>>>  		return false;
>>>  	}
>>>  
>>> +ip_proto_again:
>>>  	switch (ip_proto) {
>>>  	case IPPROTO_GRE: {
>>>  		struct gre_hdr {
>>> @@ -383,6 +384,22 @@ mpls:
>>>  		}
>>>  		goto again;
>>>  	}
>>> +	case NEXTHDR_HOP:
>>> +	case NEXTHDR_ROUTING:
>>> +	case NEXTHDR_DEST: {
>>> +		u8 _opthdr[2], *opthdr;
>>> +
>>> +		if (proto != htons(ETH_P_IPV6))
>>> +			break;
>>> +
>>> +		opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
>>> +					      data, hlen, &_opthdr);
>>> +
>>> +		ip_proto = _opthdr[0];
>>> +		nhoff += (_opthdr[1] + 1) << 3;
>>> +
>>> +		goto ip_proto_again;
>>> +	}
>>
>> Dave, please revert it. My server locks up during boot with:
>
> Seems easy to fix instead ?
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 22e4dffa0c8b3b9a20a7324eae1627313e14ce30..476e5dda59e19822dba98a931369ff2666c59c0d 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -394,9 +394,11 @@ ip_proto_again:
>  
>  		opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
>  					      data, hlen, &_opthdr);
> +		if (!opthdr)
> +			return false;
>  
> -		ip_proto = _opthdr[0];
> -		nhoff += (_opthdr[1] + 1) << 3;
> +		ip_proto = opthdr[0];
> +		nhoff += (opthdr[1] + 1) << 3;
>  
>  		goto ip_proto_again;
>  	}

sure, that's better. If you're going to submit it officially,
please add my Tested-by. My server is happy now :)
Re: [PATCH net-next] Fix Cavium Liquidio build related errors and warnings
From: Raghu Vatsavayi <rvatsav...@caviumnetworks.com>
Date: Fri, 12 Jun 2015 18:11:50 -0700

> 1) Fixed following sparse warnings:
 ...
> 2) Fix build errors corresponding to vmalloc on linux-next 4.1.
> 3) Liquidio now supports 64 bit only, modified Kconfig accordingly.
> 4) Fix some code alignment issues based on kernel build warnings.
>
> Signed-off-by: Derek Chickles <derek.chick...@caviumnetworks.com>
> Signed-off-by: Satanand Burla <satananda.bu...@caviumnetworks.com>
> Signed-off-by: Felix Manlunas <felix.manlu...@caviumnetworks.com>
> Signed-off-by: Raghu Vatsavayi <raghu.vatsav...@caviumnetworks.com>

Applied, but I _seriously_ wish you didn't fix the readq/writeq stuff
by restricting the build of the driver to 64-bit.  That really kills
build test coverage.

Just provide an appropriate set of readq/writeq like other drivers do
by including either io-64-nonatomic-hi-lo.h or io-64-nonatomic-lo-hi.h

Thanks.
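As background to the suggestion above, a 64-bit register access split into two 32-bit accesses, high word first, looks roughly like the following user-space sketch. The names read32(), read64_hi_lo() and write64_hi_lo() are illustrative only, the register file is modelled as a plain array, and the layout (low word at index 0) is an assumption; the kernel headers named in the message provide the real readq/writeq fallbacks.

```c
#include <stdint.h>

/* Hypothetical 32-bit register read; a real driver would use readl()
 * on an __iomem pointer.  Here a register file is just an array. */
static uint32_t read32(const volatile uint32_t *reg)
{
	return *reg;
}

/* Read a 64-bit register as two 32-bit halves, high word first.
 * Assumes the low word sits at reg[0] and the high word at reg[1].
 * The two reads are not atomic with respect to the device. */
static uint64_t read64_hi_lo(const volatile uint32_t *reg)
{
	uint32_t high = read32(reg + 1);
	uint32_t low = read32(reg);

	return ((uint64_t)high << 32) | low;
}

/* Write a 64-bit value as two 32-bit halves, high word first. */
static void write64_hi_lo(volatile uint32_t *reg, uint64_t val)
{
	reg[1] = (uint32_t)(val >> 32);
	reg[0] = (uint32_t)val;
}
```

Whether high-then-low or low-then-high is the right order depends on the device, which is why two variants of the header exist; a driver that can tolerate the non-atomic access stays buildable on 32-bit and keeps its build test coverage.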
Re: [PATCH net-next] flow_dissector: fix ipv6 dst, hop-by-hop and routing ext hdrs
On Fri, Jun 12, 2015 at 7:31 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> From: Eric Dumazet <eduma...@google.com>
>
> __skb_header_pointer() returns a pointer that must be checked.
>
> Fixes infinite loop reported by Alexei, and add __must_check to
> catch these errors earlier.
>
> Fixes: 6a74fcf426f5 ("flow_dissector: add support for dst, hop-by-hop and routing ext hdrs")
> Reported-by: Alexei Starovoitov <alexei.starovoi...@gmail.com>
> Tested-by: Alexei Starovoitov <alexei.starovoi...@gmail.com>
> Signed-off-by: Eric Dumazet <eduma...@google.com>
> ---
>  include/linux/skbuff.h    |    9 +
>  net/core/flow_dissector.c |    6 --
>  2 files changed, 9 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index cc612fc0a8943ec853b92e6b3516b0e5582299e2..a7acc92aa6685d7006077510697e3d9481b02588 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2743,8 +2743,9 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
>  __wsum skb_checksum(const struct sk_buff *skb, int offset, int len,
>  		    __wsum csum);
>  
> -static inline void *__skb_header_pointer(const struct sk_buff *skb, int offset,
> -					 int len, void *data, int hlen, void *buffer)
> +static inline void * __must_check
> +__skb_header_pointer(const struct sk_buff *skb, int offset,
> +		     int len, void *data, int hlen, void *buffer)
>  {
>  	if (hlen - offset >= len)
>  		return data + offset;
> @@ -2756,8 +2757,8 @@ static inline void *__skb_header_pointer(const struct sk_buff *skb, int offset,
>  	return buffer;
>  }
>  
> -static inline void *skb_header_pointer(const struct sk_buff *skb, int offset,
> -				       int len, void *buffer)
> +static inline void * __must_check
> +skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer)
>  {
>  	return __skb_header_pointer(skb, offset, len, skb->data,
>  				    skb_headlen(skb), buffer);
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 22e4dffa0c8b3b9a20a7324eae1627313e14ce30..476e5dda59e19822dba98a931369ff2666c59c0d 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -394,9 +394,11 @@ ip_proto_again:
>  
>  		opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
>  					      data, hlen, &_opthdr);
> +		if (!opthdr)
> +			return false;
>  
> -		ip_proto = _opthdr[0];
> -		nhoff += (_opthdr[1] + 1) << 3;
> +		ip_proto = opthdr[0];
> +		nhoff += (opthdr[1] + 1) << 3;
>  
>  		goto ip_proto_again;
>  	}

Acked-by: Tom Herbert <t...@herbertland.com>
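The fixed logic (fetch two bytes, bail out if they are not available, read the next-header and length fields through the returned pointer, then advance by (len + 1) * 8 bytes) can be tried out on a flat buffer with the sketch below. header_pointer() and skip_ext_hdrs() are hypothetical simplifications of __skb_header_pointer() and the dissector loop; there is no skb and no copy into a scratch buffer here. The NEXTHDR_* values are the standard IPv6 next-header numbers.

```c
#include <stddef.h>
#include <stdint.h>

#define NEXTHDR_HOP	0	/* hop-by-hop options header */
#define NEXTHDR_TCP	6
#define NEXTHDR_ROUTING	43	/* routing header */
#define NEXTHDR_DEST	60	/* destination options header */

/* Simplified analogue of __skb_header_pointer() for a flat buffer:
 * returns NULL when [offset, offset + len) is not inside the packet. */
static const uint8_t *header_pointer(const uint8_t *pkt, size_t pkt_len,
				     size_t offset, size_t len)
{
	if (offset > pkt_len || len > pkt_len - offset)
		return NULL;
	return pkt + offset;
}

/* Skip hop-by-hop, routing and destination option headers.
 * Returns the first non-extension protocol number, or -1 if the packet
 * is truncated.  *nhoff is advanced past every skipped header. */
static int skip_ext_hdrs(const uint8_t *pkt, size_t pkt_len,
			 uint8_t ip_proto, size_t *nhoff)
{
	while (ip_proto == NEXTHDR_HOP || ip_proto == NEXTHDR_ROUTING ||
	       ip_proto == NEXTHDR_DEST) {
		const uint8_t *opthdr = header_pointer(pkt, pkt_len, *nhoff, 2);

		if (!opthdr)
			return -1;	/* the check the fix adds */

		ip_proto = opthdr[0];	/* next header type */
		*nhoff += ((size_t)opthdr[1] + 1) << 3;	/* 8-byte units */
	}
	return ip_proto;
}
```

The important detail is that the fields are read through the returned pointer rather than from the scratch buffer: as the skbuff.h hunk shows, the helper may return data + offset directly without filling the buffer, so the buffer contents cannot be relied on.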
[PATCH v2 net-next 0/3] bpf: share helpers between tracing and networking
v1->v2: switched to init_user_ns from current_user_ns as suggested by Andy

Introduce new helpers to access 'struct task_struct'->pid, tgid, uid, gid, comm
fields in tracing and networking.

Share bpf_trace_printk() and bpf_get_smp_processor_id() helpers between
tracing and networking.

Alexei Starovoitov (3):
  bpf: introduce current->pid, tgid, uid, gid, comm accessors
  bpf: allow networking programs to use bpf_trace_printk() for debugging
  bpf: let kprobe programs use bpf_get_smp_processor_id() helper

 include/linux/bpf.h        |    4 +++
 include/uapi/linux/bpf.h   |   19 +
 kernel/bpf/core.c          |    7 +
 kernel/bpf/helpers.c       |   58 ++
 kernel/trace/bpf_trace.c   |   28 --
 net/core/filter.c          |    8 ++
 samples/bpf/bpf_helpers.h  |    6
 samples/bpf/tracex2_kern.c |   24
 samples/bpf/tracex2_user.c |   67 ++--
 9 files changed, 199 insertions(+), 22 deletions(-)

-- 
1.7.9.5
[PATCH v2 net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors
eBPF programs attached to kprobes need to filter based on current-pid, uid and other fields, so introduce helper functions: u64 bpf_get_current_pid_tgid(void) Return: current-tgid 32 | current-pid u64 bpf_get_current_uid_gid(void) Return: current_gid 32 | current_uid bpf_get_current_comm(char *buf, int size_of_buf) stores current-comm into buf They can be used from the programs attached to TC as well to classify packets based on current task fields. Update tracex2 example to print histogram of write syscalls for each process instead of aggregated for all. Signed-off-by: Alexei Starovoitov a...@plumgrid.com --- v1-v2: switched to init_user_ns from current_user_ns as suggested by Andy These helpers will be mainly used by bpf+tracing, but the patch is targeting net-next tree to minimize merge conflicts and they're useful in TC too. The feature was requested by Wang Nan wangn...@huawei.com and Brendan Gregg brendan.d.gr...@gmail.com We've considered several alternatives: 1: 5 different helpers Cons: every call adds performance overhead 2a: single helper that populates 'struct bpf_task_info' and uses 'flags' with bit per field. struct bpf_task_info { __u32 pid; __u32 tgid; __u32 uid; __u32 gid; char comm[16]; }; bpf_get_current_task_info(task_info, size, flags) bit 0 - fill in pid bit 1 - fill in tgid Pros: single helper. Cons: not easy to use and a lot of compares in the helper itself (two compares for each field). 2b. single helper that populates 'struct bpf_task_info' and uses 'size' to tell how many fields to fill in. bpf_get_current_task_info(task_info, size); if (size = offsetof(struct bpf_task_info, pid) + sizeof(info-pid)) info-pid = task-pid; if (size = offsetof(struct bpf_task_info, tgid) + sizeof(info-tgid)) info-tgid = task-tgid; Pros: single call (with single compare per field). Cons: still hard to use when only some middle field (like uid) is needed. These three helpers looks as the best balance between performance and usability. 
 include/linux/bpf.h        |    3
 include/uapi/linux/bpf.h   |   19
 kernel/bpf/core.c          |    3
 kernel/bpf/helpers.c       |   58
 kernel/trace/bpf_trace.c   |    6
 net/core/filter.c          |    6
 samples/bpf/bpf_helpers.h  |    6
 samples/bpf/tracex2_kern.c |   24
 samples/bpf/tracex2_user.c |   67
 9 files changed, 178 insertions(+), 14 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2235aee8096a..1b9a3f5b27f6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -188,5 +188,8 @@ extern const struct bpf_func_proto bpf_get_prandom_u32_proto;
 extern const struct bpf_func_proto bpf_get_smp_processor_id_proto;
 extern const struct bpf_func_proto bpf_tail_call_proto;
 extern const struct bpf_func_proto bpf_ktime_get_ns_proto;
+extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto;
+extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
+extern const struct bpf_func_proto bpf_get_current_comm_proto;
 
 #endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 602f05b7a275..29ef6f99e43d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -230,6 +230,25 @@ enum bpf_func_id {
 	 * Return: 0 on success
 	 */
 	BPF_FUNC_clone_redirect,
+
+	/**
+	 * u64 bpf_get_current_pid_tgid(void)
+	 * Return: current->tgid << 32 | current->pid
+	 */
+	BPF_FUNC_get_current_pid_tgid,
+
+	/**
+	 * u64 bpf_get_current_uid_gid(void)
+	 * Return: current_gid << 32 | current_uid
+	 */
+	BPF_FUNC_get_current_uid_gid,
+
+	/**
+	 * bpf_get_current_comm(char *buf, int size_of_buf)
+	 * stores current->comm into buf
+	 * Return: 0 on success
+	 */
+	BPF_FUNC_get_current_comm,
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1e00aa3316dc..1fc45cc83076 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -730,6 +730,9 @@ const struct bpf_func_proto bpf_map_delete_elem_proto __weak;
 const struct bpf_func_proto bpf_get_prandom_u32_proto __weak;
 const struct bpf_func_proto bpf_get_smp_processor_id_proto __weak;
 const struct bpf_func_proto bpf_ktime_get_ns_proto __weak;
+const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
+const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
+const struct bpf_func_proto bpf_get_current_comm_proto __weak;
 
 /* Always built-in helper functions. */
 const struct bpf_func_proto bpf_tail_call_proto = {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7ad5d8842d5b..1447ec09421e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -14,6 +14,8 @@
 #include <linux/random.h>
 #include <linux/smp.h>
 #include <linux/ktime.h>
+#include
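The packed return values described in the commit message (tgid in the upper 32 bits, pid in the lower 32) can be illustrated with a small user-space sketch. The function names below are hypothetical and exist only for this illustration; only the bit layout is taken from the patch description.

```c
#include <stdint.h>

/* Hypothetical helpers; only the "tgid << 32 | pid" layout comes from
 * the patch description above. */

/* Pack two 32-bit ids the way bpf_get_current_pid_tgid() is documented. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return ((uint64_t)tgid << 32) | pid;
}

/* Upper 32 bits: thread group id (what user space calls the PID). */
static uint32_t unpack_tgid(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Lower 32 bits: kernel pid (what user space calls the thread id). */
static uint32_t unpack_pid(uint64_t v)
{
	return (uint32_t)v;
}
```

A program that wants to key a histogram per process would presumably use the upper half, and one that wants per-thread data the lower half; the same split applies to the gid/uid pair.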
Re: [PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs
On Fri, 2015-06-12 at 18:27 -0700, Alexei Starovoitov wrote:
> On Fri, Jun 12, 2015 at 09:01:06AM -0700, Tom Herbert wrote:
> > If dst, hop-by-hop or routing extension headers are present determine
> > length of the options and skip over them in flow dissection.
> >
> > Signed-off-by: Tom Herbert t...@herbertland.com
> > ---
> >  net/core/flow_dissector.c | 17 +++++++++++++++++
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > index 1818cdc..22e4dff 100644
> > --- a/net/core/flow_dissector.c
> > +++ b/net/core/flow_dissector.c
> > @@ -327,6 +327,7 @@ mpls:
> >  		return false;
> >  	}
> >  
> > +ip_proto_again:
> >  	switch (ip_proto) {
> >  	case IPPROTO_GRE: {
> >  		struct gre_hdr {
> > @@ -383,6 +384,22 @@ mpls:
> >  		}
> >  		goto again;
> >  	}
> > +	case NEXTHDR_HOP:
> > +	case NEXTHDR_ROUTING:
> > +	case NEXTHDR_DEST: {
> > +		u8 _opthdr[2], *opthdr;
> > +
> > +		if (proto != htons(ETH_P_IPV6))
> > +			break;
> > +
> > +		opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
> > +					      data, hlen, _opthdr);
> > +
> > +		ip_proto = _opthdr[0];
> > +		nhoff += (_opthdr[1] + 1) << 3;
> > +
> > +		goto ip_proto_again;
> > +	}
>
> Dave, please revert it. My server locks up during boot with:

Seems easy to fix instead ?

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 22e4dffa0c8b3b9a20a7324eae1627313e14ce30..476e5dda59e19822dba98a931369ff2666c59c0d 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -394,9 +394,11 @@ ip_proto_again:
 
 		opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
 					      data, hlen, _opthdr);
+		if (!opthdr)
+			return false;
 
-		ip_proto = _opthdr[0];
-		nhoff += (_opthdr[1] + 1) << 3;
+		ip_proto = opthdr[0];
+		nhoff += (opthdr[1] + 1) << 3;
 
 		goto ip_proto_again;
 	}
Re: [PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs
On Fri, Jun 12, 2015 at 07:11:16PM -0700, Eric Dumazet wrote:
> On Fri, 2015-06-12 at 18:50 -0700, Alexei Starovoitov wrote:
> > sure, that's better. If you're going to submit it officially,
> > please add my Tested-by. My server is happy now :)
>
> Sure, will do.
>
> I tried adding __must_check to __skb_header_pointer() but apparently had
> to use W=1 to get a warning:

that is great idea still. At least buildbot can pick it up.
[PATCH net-next] flow_dissector: fix ipv6 dst, hop-by-hop and routing ext hdrs
From: Eric Dumazet eduma...@google.com

__skb_header_pointer() returns a pointer that must be checked.

Fixes infinite loop reported by Alexei, and add __must_check to
catch these errors earlier.

Fixes: 6a74fcf426f5 ("flow_dissector: add support for dst, hop-by-hop and routing ext hdrs")
Reported-by: Alexei Starovoitov alexei.starovoi...@gmail.com
Tested-by: Alexei Starovoitov alexei.starovoi...@gmail.com
Signed-off-by: Eric Dumazet eduma...@google.com
---
 include/linux/skbuff.h    |    9
 net/core/flow_dissector.c |    6
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index cc612fc0a8943ec853b92e6b3516b0e5582299e2..a7acc92aa6685d7006077510697e3d9481b02588 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2743,8 +2743,9 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
 __wsum skb_checksum(const struct sk_buff *skb, int offset, int len,
 		    __wsum csum);
 
-static inline void *__skb_header_pointer(const struct sk_buff *skb, int offset,
-					 int len, void *data, int hlen, void *buffer)
+static inline void * __must_check
+__skb_header_pointer(const struct sk_buff *skb, int offset,
+		     int len, void *data, int hlen, void *buffer)
 {
 	if (hlen - offset >= len)
 		return data + offset;
@@ -2756,8 +2757,8 @@ static inline void *__skb_header_pointer(const struct sk_buff *skb, int offset,
 	return buffer;
 }
 
-static inline void *skb_header_pointer(const struct sk_buff *skb, int offset,
-				       int len, void *buffer)
+static inline void * __must_check
+skb_header_pointer(const struct sk_buff *skb, int offset, int len, void *buffer)
 {
 	return __skb_header_pointer(skb, offset, len, skb->data,
 				    skb_headlen(skb), buffer);
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 22e4dffa0c8b3b9a20a7324eae1627313e14ce30..476e5dda59e19822dba98a931369ff2666c59c0d 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -394,9 +394,11 @@ ip_proto_again:
 
 		opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
 					      data, hlen, _opthdr);
+		if (!opthdr)
+			return false;
 
-		ip_proto = _opthdr[0];
-		nhoff += (_opthdr[1] + 1) << 3;
+		ip_proto = opthdr[0];
+		nhoff += (opthdr[1] + 1) << 3;
 
 		goto ip_proto_again;
 	}
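The pattern this fix enforces (check the returned pointer before reading through it, then advance by the encoded header length) can be sketched in plain user-space C. This is only an illustration under stated assumptions: header_pointer() and skip_ext_hdrs() are hypothetical stand-ins, not kernel functions, and the buffer handling is far simpler than a real skb. The next-header values 0, 43 and 60 are the IPv6 hop-by-hop, routing and destination-options protocol numbers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __skb_header_pointer(): returns NULL when the
 * requested bytes are not available in the buffer. */
static const uint8_t *header_pointer(const uint8_t *data, size_t len,
				     size_t off, size_t want)
{
	if (off > len || len - off < want)
		return NULL;
	return data + off;
}

/* Skip a chain of IPv6 option headers. Each one starts with two bytes:
 * the next-header value and the length in 8-byte units, minus one. */
static bool skip_ext_hdrs(const uint8_t *data, size_t len,
			  uint8_t *proto, size_t *off)
{
	while (*proto == 0 || *proto == 43 || *proto == 60) {
		const uint8_t *opthdr = header_pointer(data, len, *off, 2);

		if (!opthdr)
			return false;	/* truncated packet: stop dissecting */

		*proto = opthdr[0];
		*off += ((size_t)opthdr[1] + 1) << 3;
	}
	return true;
}
```

Without the NULL check, a failed lookup would leave the loop reading whatever happens to be in a scratch buffer, so neither the next-header value nor the offset can be trusted.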
[PATCH v2 net-next 3/3] bpf: let kprobe programs use bpf_get_smp_processor_id() helper
It's useful to do per-cpu histograms.

Suggested-by: Daniel Wagner daniel.wag...@bmw-carit.de
Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
v1->v2: no changes

 kernel/trace/bpf_trace.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4f9b5d41869b..88a041adee90 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -181,6 +181,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_get_current_comm_proto;
 	case BPF_FUNC_trace_printk:
 		return bpf_get_trace_printk_proto();
+	case BPF_FUNC_get_smp_processor_id:
+		return &bpf_get_smp_processor_id_proto;
 	default:
 		return NULL;
 	}
-- 
1.7.9.5
[PATCH v2 net-next 2/3] bpf: allow networking programs to use bpf_trace_printk() for debugging
bpf_trace_printk() is a helper function used to debug eBPF programs.
Let socket and TC programs use it as well.
Note, it's DEBUG ONLY helper. If it's used in the program,
the kernel will print warning banner to make sure users don't use
it in production.

Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
v1->v2: no changes

 include/linux/bpf.h      |    1 +
 kernel/bpf/core.c        |    4
 kernel/trace/bpf_trace.c |   20
 net/core/filter.c        |    2 ++
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1b9a3f5b27f6..4383476a0d48 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -150,6 +150,7 @@ struct bpf_array {
 u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
 void bpf_prog_array_map_clear(struct bpf_map *map);
 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog *fp);
+const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
 
 #ifdef CONFIG_BPF_SYSCALL
 void bpf_register_prog_type(struct bpf_prog_type_list *tl);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1fc45cc83076..c5bedc82bc1c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -733,6 +733,10 @@ const struct bpf_func_proto bpf_ktime_get_ns_proto __weak;
 const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
 const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
+const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
+{
+	return NULL;
+}
 
 /* Always built-in helper functions. */
 const struct bpf_func_proto bpf_tail_call_proto = {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3a17638cdf46..4f9b5d41869b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -147,6 +147,17 @@ static const struct bpf_func_proto bpf_trace_printk_proto = {
 	.arg2_type	= ARG_CONST_STACK_SIZE,
 };
 
+const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
+{
+	/*
+	 * this program might be calling bpf_trace_printk,
+	 * so allocate per-cpu printk buffers
+	 */
+	trace_printk_init_buffers();
+
+	return &bpf_trace_printk_proto;
+}
+
 static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -168,15 +179,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_get_current_uid_gid_proto;
 	case BPF_FUNC_get_current_comm:
 		return &bpf_get_current_comm_proto;
-
 	case BPF_FUNC_trace_printk:
-		/*
-		 * this program might be calling bpf_trace_printk,
-		 * so allocate per-cpu printk buffers
-		 */
-		trace_printk_init_buffers();
-
-		return &bpf_trace_printk_proto;
+		return bpf_get_trace_printk_proto();
 	default:
 		return NULL;
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index 20aa51ccbf9d..65ff107d3d29 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1442,6 +1442,8 @@ sk_filter_func_proto(enum bpf_func_id func_id)
 		return &bpf_tail_call_proto;
 	case BPF_FUNC_ktime_get_ns:
 		return &bpf_ktime_get_ns_proto;
+	case BPF_FUNC_trace_printk:
+		return bpf_get_trace_printk_proto();
 	default:
 		return NULL;
 	}
-- 
1.7.9.5
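The weak stub in kernel/bpf/core.c lets networking code call bpf_get_trace_printk_proto() even when the tracing side is not built. A minimal user-space sketch of the same linker technique follows, assuming GCC or Clang on an ELF target; struct func_proto and both function names are hypothetical and only mirror the shape of the kernel code.

```c
#include <stddef.h>

/* Hypothetical descriptor type, standing in for struct bpf_func_proto. */
struct func_proto {
	const char *name;
};

/* Weak default: the linker uses this definition only when no other
 * object file provides a strong one with the same name. */
__attribute__((weak)) const struct func_proto *get_trace_printk_proto(void)
{
	return NULL;
}

/* Callers treat NULL as "helper not available in this build". */
static const char *trace_printk_name(void)
{
	const struct func_proto *p = get_trace_printk_proto();

	return p ? p->name : "unavailable";
}
```

Linking in a second file with a non-weak get_trace_printk_proto() would override the stub without any #ifdef at the call site, which appears to be the point of the arrangement in the patch.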
[GIT] Networking
1) Fix uninitialized struct station_info in cfg80211_wireless_stats(),
   from Johannes Berg.

2) Revert commit attempt to fix ipv6 protocol resubmission, it adds
   regressions.

3) Endless loops can be created in bridge port lists, fix from
   Nikolay Aleksandrov.

4) Don't WARN_ON() if sk->sk_forward_alloc is non-zero in
   sk_clear_memalloc, it is a legal situation during swap
   deactivation. Fix from Mel Gorman.

5) Fix order of disabling interrupts and unlocking NAPI in enic
   driver to avoid a race. From Govindarajulu Varadarajan.

6) High and low register writes are swapped when programming the
   start of periodic output in igb driver. From Richard Cochran.

7) Fix device rename handling in mpls stack, from Robert Shearman.

8) Do not trigger compaction synchronously when optimistically
   trying to allocate an order 3 page in alloc_skb_with_frags()
   and skb_page_frag_refill(). From Shaohua Li.

9) Authentication with COOKIE_ECHO is not handled properly in SCTP,
   fix from Marcelo Ricardo Leitner.

Please pull, thanks a lot!

The following changes since commit 5879ae5fd052a63d5ac0684320cb7df3e83da7de:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-06-08 17:41:04 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

for you to fetch changes up to b07d496177cd3bc4b70fb8a5e85ede24cb403a11:

  Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt (2015-06-12 14:21:29 -0700)

David S. Miller (1):
      Revert "ipv6: Fix protocol resubmission"

Erik Hugne (1):
      tipc: disconnect socket directly after probe failure

Govindarajulu Varadarajan (3):
      enic: unlock napi busy poll before unmasking intr
      enic: check return value for stat dump
      enic: fix memory leak in rq_clean

Johannes Berg (1):
      cfg80211: wext: clear sinfo struct before calling driver

Marcelo Ricardo Leitner (1):
      sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO

Masanari Iida (1):
      Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt

Mel Gorman (1):
      net, swap: Remove a warning and clarify why sk_mem_reclaim is required when deactivating swap

Nikolay Aleksandrov (1):
      bridge: fix multicast router rlist endless loop

Richard Cochran (1):
      net: igb: fix the start time for periodic output signals

Robert Shearman (1):
      mpls: handle device renames for per-device sysctls

Shaohua Li (1):
      net: don't wait for order-3 page allocation

 Documentation/networking/udplite.txt           |  2 +-
 drivers/net/ethernet/cisco/enic/enic_ethtool.c | 20
 drivers/net/ethernet/cisco/enic/enic_main.c    | 11
 drivers/net/ethernet/cisco/enic/vnic_rq.c      |  9
 drivers/net/ethernet/intel/igb/igb_ptp.c       |  4 ++--
 net/bridge/br_multicast.c                      |  7
 net/core/skbuff.c                              |  2 +-
 net/core/sock.c                                | 15
 net/ipv6/ip6_input.c                           |  8
 net/mpls/af_mpls.c                             | 11
 net/sctp/auth.c                                | 11
 net/tipc/socket.c                              | 16
 net/wireless/wext-compat.c                     |  2 ++
 13 files changed, 80 insertions(+), 38 deletions(-)
Re: [PATCH net-next 1/3] bpf: introduce current-pid, tgid, uid, gid, comm accessors
On 6/12/15 5:03 PM, Andy Lutomirski wrote:
> On Fri, Jun 12, 2015 at 4:55 PM, Alexei Starovoitov a...@plumgrid.com wrote:
>> On 6/12/15 4:47 PM, Andy Lutomirski wrote:
>>> On Fri, Jun 12, 2015 at 4:38 PM, Alexei Starovoitov a...@plumgrid.com wrote:
>>>> On 6/12/15 4:25 PM, Andy Lutomirski wrote:
>>>>> It's a dangerous tool. Also, shouldn't the returned uid match the
>>>>> namespace of the task that installed the probe, not the task that's
>>>>> being probed?
>>>>
>>>> so leaking info to unprivileged apps is the concern?
>>>> The whole thing is for root only as you know.
>>>> The non-root is still far away. Today root needs to see the whole
>>>> kernel. That was the goal from the beginning.
>>>
>>> This is more of a correctness issue than a security issue. ISTM using
>>> current_user_ns() in a kprobe is asking for trouble. It certainly
>>> allows any unprivileged user to show any uid it wants to the probe,
>>> which is probably not what the installer of the probe expects.
>>
>> probe doesn't expect anything. it doesn't make any decisions.
>> bpf is read only. it's _visibility_ into the kernel.
>> It's not used for security.
>> When we start connecting eBPF to seccomp I would agree that uid
>> handling needs to be done carefully, but we're not there yet.
>> I don't want to kill _visibility_ because in some distant future
>> bpf becomes a decision making tool in security area and
>> get_current_uid() will return numbers that shouldn't be blindly
>> used to reject/accept a user requesting something. That's far away.
>
> All that is true, but the code that *installed* the bpf probe might
> get confused when it logs that uid 0 did such-and-such when really
> some unprivileged userns root did it.

so what specifically you proposing?
Use from_kuid(init_user_ns,...) instead?

> Also, as you start calling more and more non-trivial functions from
> bpf, you might need to start preventing bpf probe installations in
> those functions.

yes. may be. I don't want to blacklist stuff yet, unless it causes
crashes. Recursive check is already there. Probably something else
will be needed.
Re: Fw: [Bug 98781] New: WWAN: TX bytes counter shows very huge impossible value
Stephen Hemminger stephen at networkplumber.org writes:

> Begin forwarded message:
>
> Date: Sat, 23 May 2015 16:54:50 +
> From: bugzilla-daemon at bugzilla.kernel.org bugzilla-daemon at bugzilla.kernel.org
> To: shemminger at linux-foundation.org shemminger at linux-foundation.org
> Subject: [Bug 98781] New: WWAN: TX bytes counter shows very huge impossible value
>
> https://bugzilla.kernel.org/show_bug.cgi?id=98781
>
>             Bug ID: 98781
>            Summary: WWAN: TX bytes counter shows very huge impossible value
>            Product: Networking
>            Version: 2.5
>     Kernel Version: 4.0.x
>           Hardware: Intel
>                 OS: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Other
>           Assignee: shemminger at linux-foundation.org
>           Reporter: mm at superbash.de
>         Regression: No
>
> Since version 4.0.x the TX bytes counter of the WWAN module shows a
> weird value.
>
> Example:
> $ ifconfig wwan
> wwan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
>         inet xxx.xxx.xxx.xxx  netmask 255.255.255.252  broadcast xxx.xxx.xxx.xxx
>         inet6 :::::  prefixlen 64  scopeid 0x20<link>
>         ether xx:xx:xx:xx:xx:xx  txqueuelen 1000  (Ethernet)
>         RX packets 19036  bytes 19190321 (18.3 MiB)
>         RX errors 0  dropped 0  overruns 0  frame 0
>         TX packets 15874  bytes 43228847574631 (39.3 TiB)
>         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>
> 39.3 TiB - wow, absolutely not true
>
> The WWAN is used as bridge to my internet provider (LTE usb stick)
> I use the counter to control the traffic.
>
> It's only the TX counter, the RX works ok.

I have exactly the same issue, exhibited when upgrading (n-1) kernel on
Ubuntu Vivid 15.04 from 3.17.x:

Linux uranis 3.19.0-20-generic #20-Ubuntu SMP Fri May 29 10:10:47 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

wwan0     Link encap:Ethernet  HWaddr 26:03:a9:e3:88:2e
          inet addr:41.150.225.132  Bcast:41.150.225.135  Mask:255.255.255.248
          inet6 addr: fe80::2403:a9ff:fee3:882e/64 Scope:Link
          UP BROADCAST RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:3366 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3497 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1459963 (1.4 MB)  TX bytes:15019500985395 (15.0 TB)

Kernel internal or module (driver) bug, shows up everywhere including
'system monitor'. This problem has come up a few years back and was
solved, but seems to be back again...

Any ideas on fix ?
Re: [PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs
On Fri, Jun 12, 2015 at 09:01:06AM -0700, Tom Herbert wrote:
> If dst, hop-by-hop or routing extension headers are present determine
> length of the options and skip over them in flow dissection.
>
> Signed-off-by: Tom Herbert t...@herbertland.com
> ---
>  net/core/flow_dissector.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 1818cdc..22e4dff 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -327,6 +327,7 @@ mpls:
>  		return false;
>  	}
>  
> +ip_proto_again:
>  	switch (ip_proto) {
>  	case IPPROTO_GRE: {
>  		struct gre_hdr {
> @@ -383,6 +384,22 @@ mpls:
>  		}
>  		goto again;
>  	}
> +	case NEXTHDR_HOP:
> +	case NEXTHDR_ROUTING:
> +	case NEXTHDR_DEST: {
> +		u8 _opthdr[2], *opthdr;
> +
> +		if (proto != htons(ETH_P_IPV6))
> +			break;
> +
> +		opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
> +					      data, hlen, _opthdr);
> +
> +		ip_proto = _opthdr[0];
> +		nhoff += (_opthdr[1] + 1) << 3;
> +
> +		goto ip_proto_again;
> +	}

Dave, please revert it. My server locks up during boot with:

[ 32.391955] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! [modprobe:1550]
[ 32.392043] RIP: 0010:[815cd8e2] [815cd8e2] skb_copy_bits+0x12/0x260
[ 32.392060] Call Trace:
[ 32.392061] <IRQ>
[ 32.392063] [815d9f38] __skb_flow_dissect+0x358/0x820
[ 32.392064] [815da48e] __skb_get_hash+0x8e/0x2e0
[ 32.392066] [815def7b] __skb_tx_hash+0x5b/0xb0
[ 32.392067] [815df54a] __netdev_pick_tx+0x18a/0x1a0
[ 32.392068] [815df40a] ? __netdev_pick_tx+0x4a/0x1a0
[ 32.392069] [815e4db0] ? __dev_queue_xmit+0x50/0x620
[ 32.392071] [815e4d0b] netdev_pick_tx+0xcb/0x120
[ 32.392072] [815e4e08] __dev_queue_xmit+0xa8/0x620
[ 32.392073] [815e4db0] ? __dev_queue_xmit+0x50/0x620
[ 32.392076] [81698225] ? ip6_finish_output+0xa5/0x1e0
[ 32.392077] [815e53a3] dev_queue_xmit_sk+0x13/0x20
[ 32.392078] [81696144] ip6_finish_output2+0x464/0x5f0
[ 32.392079] [81698225] ? ip6_finish_output+0xa5/0x1e0
[ 32.392081] [816a5bf2] ? ip6_mtu+0xb2/0xd0
[ 32.392082] [816a5b80] ? ip6_mtu+0x40/0xd0
[ 32.392083] [81698225] ip6_finish_output+0xa5/0x1e0
[ 32.392084] [816983be] ip6_output+0x5e/0x1b0
[PATCH net-next] net: make u64_stats_init() a function
From: Eric Dumazet eduma...@google.com

Using a function instead of a macro is cleaner and remove
following W=1 warnings (extract)

In file included from net/ipv6/ip6_vti.c:29:0:
net/ipv6/ip6_vti.c: In function ‘vti6_dev_init_gen’:
include/linux/netdevice.h:2029:18: warning: variable ‘stat’ set but not used [-Wunused-but-set-variable]
    typeof(type) *stat;   \
                  ^
net/ipv6/ip6_vti.c:862:16: note: in expansion of macro ‘netdev_alloc_pcpu_stats’
  dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
                ^
  CC [M]  net/ipv6/sit.o
In file included from net/ipv6/sit.c:30:0:
net/ipv6/sit.c: In function ‘ipip6_tunnel_init’:
include/linux/netdevice.h:2029:18: warning: variable ‘stat’ set but not used [-Wunused-but-set-variable]
    typeof(type) *stat;   \
                  ^

Signed-off-by: Eric Dumazet eduma...@google.com
---
 include/linux/u64_stats_sync.h |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 4b4439e75f45f8e915f0ffb6b855be5f1113a04f..df89c9bcba7db8dbde3bbf2b99f9af6ed562b112 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -68,11 +68,12 @@ struct u64_stats_sync {
 };
 
 
+static inline void u64_stats_init(struct u64_stats_sync *syncp)
+{
 #if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
-# define u64_stats_init(syncp)	seqcount_init(syncp.seq)
-#else
-# define u64_stats_init(syncp)	do { } while (0)
+	seqcount_init(&syncp->seq);
 #endif
+}
 
 static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
 {
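Why the function form silences the warning can be shown with a stand-alone sketch. The names here (struct stats_sync, stats_init, STATS_NEED_SEQ) are hypothetical and simplified; the real helper calls seqcount_init() and is conditional on BITS_PER_LONG and CONFIG_SMP.

```c
/* Hypothetical, simplified stand-in for struct u64_stats_sync. */
struct stats_sync {
	unsigned int seq;
};

/* Stand-in for the "32-bit SMP" condition; set to 0 to get an empty body. */
#define STATS_NEED_SEQ 1

/* Macro form: when it expands to nothing, the argument expression is
 * dropped, so a variable that only feeds it looks "set but not used". */
#define stats_init_macro(syncp) do { } while (0)

/* Function form: passing the pointer always counts as a use of the
 * caller's variable, whatever the body compiles down to. */
static inline void stats_init(struct stats_sync *syncp)
{
#if STATS_NEED_SEQ
	syncp->seq = 0;
#else
	(void)syncp;
#endif
}
```

With an optimizing compiler the empty variant of the inline function should cost nothing at run time, so the change trades a macro for type checking and a cleaner W=1 build.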
Re: [PATCH net-next 1/3] bpf: introduce current-pid, tgid, uid, gid, comm accessors
On 6/12/15 5:24 PM, Andy Lutomirski wrote:
>> so what specifically you proposing?
>> Use from_kuid(init_user_ns,...) instead?
>
> That seems reasonable to me. After all, you can't install one of
> these probes from a non-init userns.

ok. will respin with that change.
[PATCH net-next] Fix Cavium Liquidio build related errors and warnings
1) Fixed following sparse warnings:
   lio_main.c:213:6: warning: symbol 'octeon_droq_bh' was not declared. Should it be static?
   lio_main.c:233:5: warning: symbol 'lio_wait_for_oq_pkts' was not declared. Should it be static?
   lio_main.c:3083:5: warning: symbol 'lio_nic_info' was not declared. Should it be static?
   lio_main.c:2618:16: warning: cast from restricted __be16
   octeon_device.c:466:6: warning: symbol 'oct_set_config_info' was not declared. Should it be static?
   octeon_device.c:573:25: warning: cast to restricted __be32
   octeon_device.c:582:29: warning: cast to restricted __be32
   octeon_device.c:584:39: warning: cast to restricted __be32
   octeon_device.c:594:13: warning: cast to restricted __be32
   octeon_device.c:596:25: warning: cast to restricted __be32
   octeon_device.c:613:25: warning: cast to restricted __be32
   octeon_device.c:614:29: warning: cast to restricted __be64
   octeon_device.c:615:29: warning: cast to restricted __be32
   octeon_device.c:619:37: warning: cast to restricted __be32
   octeon_device.c:623:33: warning: cast to restricted __be32
   cn66xx_device.c:540:6: warning: symbol 'lio_cn6xxx_get_pcie_qlmport' was not declared. Should it be s
   octeon_mem_ops.c:181:16: warning: cast to restricted __be64
   octeon_mem_ops.c:190:16: warning: cast to restricted __be32
   octeon_mem_ops.c:196:17: warning: incorrect type in initializer
2) Fix build errors corresponding to vmalloc on linux-next 4.1.
3) Liquidio now supports 64 bit only, modified Kconfig accordingly.
4) Fix some code alignment issues based on kernel build warnings.

Signed-off-by: Derek Chickles derek.chick...@caviumnetworks.com
Signed-off-by: Satanand Burla satananda.bu...@caviumnetworks.com
Signed-off-by: Felix Manlunas felix.manlu...@caviumnetworks.com
Signed-off-by: Raghu Vatsavayi raghu.vatsav...@caviumnetworks.com
---
 drivers/net/ethernet/cavium/Kconfig                    |  1 +
 drivers/net/ethernet/cavium/liquidio/cn66xx_device.c   |  2 +-
 drivers/net/ethernet/cavium/liquidio/lio_main.c        |  9
 drivers/net/ethernet/cavium/liquidio/liquidio_image.h  | 14
 drivers/net/ethernet/cavium/liquidio/octeon_device.c   |  8
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c     |  1 +
 drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c  |  6
 drivers/net/ethernet/cavium/liquidio/request_manager.c |  4
 8 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/cavium/Kconfig b/drivers/net/ethernet/cavium/Kconfig
index c7d8674..5e7a0e2 100644
--- a/drivers/net/ethernet/cavium/Kconfig
+++ b/drivers/net/ethernet/cavium/Kconfig
@@ -43,6 +43,7 @@ config	THUNDER_NIC_BGX
 
 config LIQUIDIO
 	tristate "Cavium LiquidIO support"
+	depends on 64BIT
 	select PTP_1588_CLOCK
 	select FW_LOADER
 	select LIBCRC32
diff --git a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
index d23f494..8ad7425 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
@@ -537,7 +537,7 @@ void lio_cn6xxx_disable_interrupt(void *chip)
 	mmiowb();
 }
 
-void lio_cn6xxx_get_pcie_qlmport(struct octeon_device *oct)
+static void lio_cn6xxx_get_pcie_qlmport(struct octeon_device *oct)
 {
 	/* CN63xx Pass2 and newer parts implements the SLI_MAC_NUMBER register
 	 * to determine the PCIE port #
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index c75f517..0660dee 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -26,6 +26,7 @@
 #include <linux/pci.h>
 #include <linux/pci_ids.h>
 #include <linux/ip.h>
+#include <net/ip.h>
 #include <linux/ipv6.h>
 #include <linux/net_tstamp.h>
 #include <linux/if_vlan.h>
@@ -210,7 +211,7 @@ static int liquidio_probe(struct pci_dev *pdev,
 static struct handshake handshake[MAX_OCTEON_DEVICES];
 static struct completion first_stage;
 
-void octeon_droq_bh(unsigned long pdev)
+static void octeon_droq_bh(unsigned long pdev)
 {
 	int q_no;
 	int reschedule = 0;
@@ -230,7 +231,7 @@ void octeon_droq_bh(unsigned long pdev)
 		tasklet_schedule(&oct_priv->droq_tasklet);
 }
 
-int lio_wait_for_oq_pkts(struct octeon_device *oct)
+static int lio_wait_for_oq_pkts(struct octeon_device *oct)
 {
 	struct octeon_device_priv *oct_priv =
 		(struct octeon_device_priv *)oct->priv;
@@ -2615,7 +2616,7 @@ static inline int is_ip_fragmented(struct sk_buff *skb)
 	 * with more to follow; the current offset could be 0 ).
 	 * - ths offset field is non-zero.
 	 */
-	return htons(ip_hdr(skb)->frag_off) & 0x3fff;
+	return (ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET)) ? 1 : 0;
 }
 
 static inline int is_ipv6(struct sk_buff *skb)
@@ -3080,7 +3081,7 @@
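The is_ip_fragmented() change keeps the header field in network byte order and converts the constant instead, which is the form sparse accepts for a __be16 field. A user-space sketch of the same test follows; FRAG_MF and FRAG_OFFSET_MASK are local names chosen for this example that mirror the kernel's IP_MF (0x2000) and IP_OFFSET (0x1FFF), and is_fragmented() is hypothetical.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Local names for this sketch; values mirror the kernel's IP_MF and
 * IP_OFFSET definitions. */
#define FRAG_MF          0x2000	/* "more fragments" flag */
#define FRAG_OFFSET_MASK 0x1FFF	/* 13-bit fragment offset */

/* frag_off_be is the IPv4 frag_off field exactly as stored in the header
 * (big-endian). A packet is a fragment if MF is set or the offset is
 * non-zero; the DF bit alone does not count. */
static int is_fragmented(uint16_t frag_off_be)
{
	return (frag_off_be & htons(FRAG_MF | FRAG_OFFSET_MASK)) ? 1 : 0;
}
```

As far as the result goes, masking the converted field with 0x3fff and masking the raw field with the converted constant agree on every host; the rewrite mainly makes the byte-order intent explicit.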
[PATCH net-next] tcp: tcp_v6_connect() cleanup
From: Eric Dumazet eduma...@google.com

Remove dead code from tcp_v6_connect()

Signed-off-by: Eric Dumazet eduma...@google.com
---
 net/ipv6/tcp_ipv6.c |    2 --
 1 file changed, 2 deletions(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 45a7176ed460681558808439f20e1622423f4c32..6748c4277affad71cd721e3a985af10c31c047ad 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -120,7 +120,6 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	struct ipv6_pinfo *np = inet6_sk(sk);
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct in6_addr *saddr = NULL, *final_p, final;
-	struct rt6_info *rt;
 	struct flowi6 fl6;
 	struct dst_entry *dst;
 	int addr_type;
@@ -258,7 +257,6 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr,
 	sk->sk_gso_type = SKB_GSO_TCPV6;
 	__ip6_dst_store(sk, dst, NULL, NULL);
 
-	rt = (struct rt6_info *) dst;
 	if (tcp_death_row.sysctl_tw_recycle &&
 	    !tp->rx_opt.ts_recent_stamp &&
 	    ipv6_addr_equal(&fl6.daddr, &sk->sk_v6_daddr))
[PATCH net-next 0/2] flow_dissector: Fix MPLS parsing and add ext hdr support
Need to shift label. Added parsing of dst, hop-by-hop, and routing
extension headers.

Tom Herbert (2):
  flow_dissector: Fix MPLS entropy label handling in flow dissector
  flow_dissector: add support for dst, hop-by-hop and routing ext hdrs

 net/core/flow_dissector.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

-- 
1.8.1
[PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs
If dst, hop-by-hop or routing extension headers are present determine
length of the options and skip over them in flow dissection.

Signed-off-by: Tom Herbert t...@herbertland.com
---
 net/core/flow_dissector.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1818cdc..22e4dff 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -327,6 +327,7 @@ mpls:
 		return false;
 	}
 
+ip_proto_again:
 	switch (ip_proto) {
 	case IPPROTO_GRE: {
 		struct gre_hdr {
@@ -383,6 +384,22 @@ mpls:
 		}
 		goto again;
 	}
+	case NEXTHDR_HOP:
+	case NEXTHDR_ROUTING:
+	case NEXTHDR_DEST: {
+		u8 _opthdr[2], *opthdr;
+
+		if (proto != htons(ETH_P_IPV6))
+			break;
+
+		opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
+					      data, hlen, _opthdr);
+
+		ip_proto = _opthdr[0];
+		nhoff += (_opthdr[1] + 1) << 3;
+
+		goto ip_proto_again;
+	}
 	case IPPROTO_IPIP:
 		proto = htons(ETH_P_IP);
 		goto ip;
-- 
1.8.1
[PATCH net-next 1/2] flow_dissector: Fix MPLS entropy label handling in flow dissector
Need to shift after masking to get label value for comparison.

Fixes: b3baa0fbd02a1a9d493d8 ("mpls: Add MPLS entropy label in flow_keys")
Reported-by: Dan Carpenter dan.carpen...@oracle.com
Signed-off-by: Tom Herbert t...@herbertland.com
---
 net/core/flow_dissector.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 77e22e4..1818cdc 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -299,8 +299,8 @@ mpls:
 		if (!hdr)
 			return false;
 
-		if ((ntohl(hdr[0].entry) & MPLS_LS_LABEL_MASK) ==
-		     MPLS_LABEL_ENTROPY) {
+		if ((ntohl(hdr[0].entry) & MPLS_LS_LABEL_MASK) >>
+		     MPLS_LS_LABEL_SHIFT == MPLS_LABEL_ENTROPY) {
 			if (skb_flow_dissector_uses_key(flow_dissector,
 							FLOW_DISSECTOR_KEY_MPLS_ENTROPY)) {
 				key_keyid = skb_flow_dissector_target(flow_dissector,
-- 
1.8.1
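The bug is easy to see with concrete numbers: the label occupies the top 20 bits of the 32-bit label stack entry, so a masked but unshifted value always has its low 12 bits clear and can never equal 7. A stand-alone sketch, with the constants redefined locally to match the kernel's uapi MPLS definitions and a hypothetical function name:

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Local copies for this sketch; values match MPLS_LS_LABEL_MASK,
 * MPLS_LS_LABEL_SHIFT and MPLS_LABEL_ENTROPY in the kernel headers. */
#define LS_LABEL_MASK  0xFFFFF000u
#define LS_LABEL_SHIFT 12
#define LABEL_ENTROPY  7u

/* entry_be is one label stack entry in network byte order. */
static int is_entropy_label_indicator(uint32_t entry_be)
{
	uint32_t label = (ntohl(entry_be) & LS_LABEL_MASK) >> LS_LABEL_SHIFT;

	return label == LABEL_ENTROPY;
}
```

In the patched kernel expression the explicit parentheses around the shift are omitted, which works because `>>` binds tighter than `==` in C.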
Re: [PATCH WIP RFC 0/3] mpls: support for ler
On 6/10/15, 12:13 AM, roopa wrote:
> Robert/Thomas,
>
> All my changes are in the below repo under the 'mpls' branch.
> https://github.com/CumulusNetworks/net-next
> https://github.com/CumulusNetworks/iproute2
> The last iproute2 commit has a sample usage.
>
> The commits pushed to this tree do not contain support for the
> following yet (but working on it):
>
> a) tunnel routes to work with tunnel RTA_OIF and a non-tunnel RTA_OIF:
>    The current commits in the tree assume a non-tunnel RTA_OIF.
>    If the tunnel driver has registered a dst_output func, dst_output
>    is set to the tunnel dst output handler in the receive route lookup
>    path which in turn does the encap and xmits. Thomas had last
>    suggested using a flag to skip the dst output handler re-direction
>    for cases where RTA_OIF is a special tunnel netdev and the tunnel
>    driver xmit function can do the encap. My current thinking is to
>    pass the oif to the encap parse handler and the handler can set the
>    flag on the tunnel state. And this flag can then be used to skip
>    the dst_output re-direction.
>    This change should be trivial will fix it soon.

I have pushed this change to my github tree.

> b) make RTA_OIF optional and do a fib lookup.

thinking about this some more, RTA_OIF is already optional. And
net/ipv4/fib_semantics.c:fib_check_nh will lookup the dev if not specified.
Wouldn't that be enough ?. (unless i have misunderstood something here)

thanks,
Roopa
[PATCH] netlink: add API to retrieve all group memberships
This patch adds getsockopt(SOL_NETLINK, NETLINK_LIST_MEMBERSHIPS) to
retrieve all groups a socket is a member of. Currently, we have to use
getsockname() and look at the nl.nl_groups bitmask. However, this mask is
limited to 32 groups. Hence, similar to NETLINK_ADD_MEMBERSHIP and
NETLINK_DROP_MEMBERSHIP, this adds a separate sockopt to manage group IDs
higher than 32.

This new NETLINK_LIST_MEMBERSHIPS option takes a pointer to __u32 and the
size of the array. The array is filled with the full membership-set of the
socket, and the required array size is returned in optlen. Hence,
user-space can retry with a properly sized array in case it was too small.

Signed-off-by: David Herrmann dh.herrm...@gmail.com
---
 include/uapi/linux/netlink.h | 15 ++++++++-------
 net/netlink/af_netlink.c     | 22 ++++++++++++++++++++++
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 1a85940..e38094f 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -101,13 +101,14 @@ struct nlmsgerr {
 	struct nlmsghdr msg;
 };
 
-#define NETLINK_ADD_MEMBERSHIP	1
-#define NETLINK_DROP_MEMBERSHIP	2
-#define NETLINK_PKTINFO		3
-#define NETLINK_BROADCAST_ERROR	4
-#define NETLINK_NO_ENOBUFS	5
-#define NETLINK_RX_RING		6
-#define NETLINK_TX_RING		7
+#define NETLINK_ADD_MEMBERSHIP		1
+#define NETLINK_DROP_MEMBERSHIP		2
+#define NETLINK_PKTINFO			3
+#define NETLINK_BROADCAST_ERROR		4
+#define NETLINK_NO_ENOBUFS		5
+#define NETLINK_RX_RING			6
+#define NETLINK_TX_RING			7
+#define NETLINK_LIST_MEMBERSHIPS	8
 
 struct nl_pktinfo {
 	__u32	group;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index bf6e766..b84dbe7 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2254,6 +2254,28 @@ static int netlink_getsockopt(struct socket *sock, int level, int optname,
 			return -EFAULT;
 		err = 0;
 		break;
+	case NETLINK_LIST_MEMBERSHIPS: {
+		int pos, idx, shift;
+
+		err = 0;
+		netlink_table_grab();
+		for (pos = 0; pos * 8 < nlk->ngroups; pos += sizeof(u32)) {
+			if (len - pos < sizeof(u32))
+				break;
+
+			idx = pos / sizeof(unsigned long);
+			shift = (pos % sizeof(unsigned long)) * 8;
+			if (put_user((u32)(nlk->groups[idx] >> shift),
+				     (u32 __user *)(optval + pos))) {
+				err = -EFAULT;
+				break;
+			}
+		}
+		if (put_user(ALIGN(nlk->ngroups / 8, sizeof(u32)), optlen))
+			err = -EFAULT;
+		netlink_table_ungrab();
+		break;
+	}
 	default:
 		err = -ENOPROTOOPT;
 	}
-- 
2.4.2
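As a rough illustration of the copy loop in the patch above, here is a userspace model. Everything in it is hypothetical (the function name, plain array stores instead of put_user(), no locking); it only mirrors how the unsigned long bitmap is cut into __u32 words and how the required size is reported so a caller can retry.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of the NETLINK_LIST_MEMBERSHIPS loop.
 * groups:  bitmap stored as unsigned long words
 * ngroups: number of valid bits in the bitmap
 * out/len: caller buffer and its size in bytes
 * Returns the buffer size in bytes that the full set needs.
 */
static size_t list_memberships(const unsigned long *groups,
			       unsigned int ngroups,
			       uint32_t *out, size_t len)
{
	size_t pos;

	for (pos = 0; pos * 8 < ngroups; pos += sizeof(uint32_t)) {
		size_t idx, shift;

		if (len - pos < sizeof(uint32_t))
			break;	/* caller buffer exhausted */

		idx = pos / sizeof(unsigned long);
		shift = (pos % sizeof(unsigned long)) * 8;
		out[pos / sizeof(uint32_t)] = (uint32_t)(groups[idx] >> shift);
	}

	/* ngroups / 8 rounded up to a multiple of sizeof(u32). */
	return (ngroups / 8 + sizeof(uint32_t) - 1) &
	       ~(sizeof(uint32_t) - 1);
}
```

A caller that passes a buffer that is too small still gets the leading words filled in, and can use the returned size to allocate a larger array and call again.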
RE: [PATCH net-next v2 05/19] bna: use BIT(x) instead of (1 << x)
From: Ivan Vecera
...
> diff --git a/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h b/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h
> index 679a503..16090fd 100644
> --- a/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h
> +++ b/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h
> @@ -75,7 +75,7 @@ enum {
>  	CB_GPIO_FC4P2	= (4),		/*!< 4G 2port FC card */
>  	CB_GPIO_FC4P1	= (5),		/*!< 4G 1port FC card */
>  	CB_GPIO_DFLY	= (6),		/*!< 8G 2port FC mezzanine card */
> -	CB_GPIO_PROTO	= (1 << 7)	/*!< 8G 2port FC prototypes */
> +	CB_GPIO_PROTO	= BIT(7)	/*!< 8G 2port FC prototypes */

That doesn't look like a BIT() value to me, just a large number.
Should the release driver even have support for the prototype hardware?

...
> -		if (rx_enet_mask & ((u32)(1 << i))) {
> +		if (rx_enet_mask & ((u32)BIT(i))) {

The (u32) cast looks superfluous.
There are also too many ().

...
> -	int bit = (1 << (vlan_id & BFI_VLAN_WORD_MASK));
> +	int bit = BIT((vlan_id & BFI_VLAN_WORD_MASK));

Too many ()

	David
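To make the review points concrete, a small standalone sketch follows. BIT() here is a local stand-in with the same shape as the kernel macro, and the enum and helper names are hypothetical; this is only an illustration, not a proposed change to the driver.

```c
#include <stdint.h>

/* Local stand-in with the same shape as the kernel's BIT() macro. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical enum mirroring the quoted hunk: small IDs, then bit 7. */
enum cb_gpio_sketch {
	SKETCH_GPIO_FC4P2 = 4,
	SKETCH_GPIO_FC4P1 = 5,
	SKETCH_GPIO_DFLY  = 6,
	SKETCH_GPIO_PROTO = BIT(7),	/* value 128, listed among plain IDs */
};

/* Mask test written without the (u32) cast or the extra parentheses. */
static int rx_bit_set(uint32_t rx_enet_mask, unsigned int i)
{
	return (rx_enet_mask & BIT(i)) != 0;
}
```

The enum shows why BIT(7) reads oddly next to 4, 5 and 6: it is simply the number 128 in a list of identifiers. The helper shows that the mask test gives the same answer without the cast.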
[PATCH] netdevice: add netdev_pub helper function
Being able to utilize this makes much code a lot simpler and cleaner.
It's a nice convenience function.

Signed-off-by: Jason A. Donenfeld ja...@zx2c4.com
---
 include/linux/netdevice.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 05b9a69..f85be18 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1871,6 +1871,17 @@ static inline void *netdev_priv(const struct net_device *dev)
 	return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN);
 }
 
+/**
+ * netdev_pub - access network device from private pointer
+ * @priv: private data pointer of network device
+ *
+ * Get network device from a network device private data pointer
+ */
+static inline struct net_device *netdev_pub(void *priv)
+{
+	return (struct net_device *)((char *)priv - ALIGN(sizeof(struct net_device), NETDEV_ALIGN));
+}
+
 /* Set the sysfs physical device reference for the network logical device
  * if set prior to registration will cause a symlink during initialization.
  */
-- 
2.4.2
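For illustration, a userspace model of the pointer arithmetic, showing that the proposed helper is the inverse of netdev_priv(). The struct, the allocator and the fake_ names are hypothetical stand-ins, and the alignment constant is assumed to be 32 here.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_NETDEV_ALIGN 32	/* assumed alignment for this sketch */
#define SKETCH_ALIGN(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Hypothetical stand-in for struct net_device. */
struct fake_net_device {
	char name[16];
	unsigned int mtu;
};

/* Allocate the device and its private area in one block. */
static struct fake_net_device *fake_alloc_netdev(size_t sizeof_priv)
{
	size_t size = SKETCH_ALIGN(sizeof(struct fake_net_device),
				   SKETCH_NETDEV_ALIGN) + sizeof_priv;

	return calloc(1, size);
}

/* Device to private area: step over the aligned device struct. */
static void *fake_netdev_priv(const struct fake_net_device *dev)
{
	return (char *)dev + SKETCH_ALIGN(sizeof(struct fake_net_device),
					  SKETCH_NETDEV_ALIGN);
}

/* Private area back to device: subtract the same aligned size. */
static struct fake_net_device *fake_netdev_pub(void *priv)
{
	return (struct fake_net_device *)((char *)priv -
		SKETCH_ALIGN(sizeof(struct fake_net_device),
			     SKETCH_NETDEV_ALIGN));
}
```

The round trip only holds for a pointer that really came from the private area of such a combined allocation.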
[PATCH net-next] Increase limit of macvtap queues
Macvtap should be compatible with tuntap for the maximum number of queues.
Commit 1059590254fa9dce9cafc4f07d1103dbec415e76 removed the limitation and
increased the number of queues in tuntap, so it is now safe to increase
the number of queues in macvtap as well.

Signed-off-by: Pankaj Gupta pagu...@redhat.com
---
 include/linux/if_macvlan.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 6f6929e..a4ccc31 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -29,7 +29,7 @@ struct macvtap_queue;
  * Maximum times a macvtap device can be opened. This can be used to
  * configure the number of receive queue, e.g. for multiqueue virtio.
  */
-#define MAX_MACVTAP_QUEUES	16
+#define MAX_MACVTAP_QUEUES	256
 
 #define MACVLAN_MC_FILTER_BITS	8
 #define MACVLAN_MC_FILTER_SZ	(1 << MACVLAN_MC_FILTER_BITS)
-- 
1.7.1
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, Jun 12, 2015 at 10:40 AM, Eric Dumazet eric.duma...@gmail.com wrote:
> On Fri, 2015-06-12 at 10:10 -0400, Trond Myklebust wrote:
>> On Thu, Jun 11, 2015 at 11:49 PM, Steven Rostedt rost...@goodmis.org wrote:
>>>
>>> I recently upgraded my main server to 4.0.4 from 3.19.5 and rkhunter
>>> started reporting a hidden port on my box.
>>>
>>> Running unhide-tcp I see this:
>>>
>>> # unhide-tcp
>>> Unhide-tcp 20121229
>>> Copyright © 2012 Yago Jesus & Patrick Gouin
>>> License GPLv3+ : GNU GPL version 3 or later
>>> http://www.unhide-forensics.info
>>> Used options:
>>> [*]Starting TCP checking
>>>
>>> Found Hidden port that not appears in ss: 946
>>> [*]Starting UDP checking
>>>
>>> This scared the hell out of me as I'm thinking that I have got some
>>> kind of NSA backdoor hooked into my server and it is monitoring my
>>> plans to smuggle Kinder Überraschung into the USA from Germany. I
>>> panicked!
>>>
>>> Well, I wasted the day writing modules to first look at all the
>>> sockets opened by all processes (via their file descriptors) and
>>> posted their port numbers.
>>>
>>> http://rostedt.homelinux.com/private/tasklist.c
>>>
>>> But this port wasn't there either.
>>>
>>> Then I decided to look at the ports in tcp_hashinfo.
>>>
>>> http://rostedt.homelinux.com/private/portlist.c
>>>
>>> This found the port but no file was connected to it, and worse yet,
>>> when I first ran it without using probe_kernel_read(), it crashed my
>>> kernel, because sk->sk_socket pointed to a freed socket!
>>>
>>> Note, each boot, the hidden port is different.
>>>
>>> Finally, I decided to bring in the big guns, and inserted a
>>> trace_printk() into the bind logic, to see if I could find the
>>> culprit. After fiddling with it a few times, I found a suspect:
>>>
>>> kworker/3:1H-123 [003] ..s. 96.696213: inet_bind_hash: add 946
>>>
>>> Bah, it's a kernel thread doing it, via a work queue. I then added a
>>> trace_dump_stack() to find what was calling this, and here it is:
>>>
>>> kworker/3:1H-123 [003] ..s. 96.696222: stack trace
>>>  => inet_csk_get_port
>>>  => inet_addr_type
>>>  => inet_bind
>>>  => xs_bind
>>>  => sock_setsockopt
>>>  => __sock_create
>>>  => xs_create_sock.isra.18
>>>  => xs_tcp_setup_socket
>>>  => process_one_work
>>>  => worker_thread
>>>  => worker_thread
>>>  => kthread
>>>  => kthread
>>>  => ret_from_fork
>>>  => kthread
>>>
>>> I rebooted, and examined what happens. I see the kworker binding that
>>> port, and all seems well:
>>>
>>> # netstat -tapn |grep 946
>>> tcp 0 0 192.168.23.9:946 192.168.23.22:55201 ESTABLISHED -
>>>
>>> But waiting for a bit, the connection goes into a TIME_WAIT, and then
>>> it just disappears. But the bind to the port does not get released,
>>> and that port is from then on, taken.
>>>
>>> This never happened with my 3.19 kernels. I would bisect it but this
>>> is happening on my main server box which I usually only reboot every
>>> other month doing upgrades. It causes too much disturbance for myself
>>> (and my family) as when this box is offline, basically the rest of my
>>> machines are too.
>>>
>>> I figured this may be enough information to see if you can fix it.
>>> Otherwise I can try to do the bisect, but that's not going to happen
>>> any time soon. I may just go back to 3.19 for now, such that rkhunter
>>> stops complaining about the hidden port.
>>>
>>
>> The only new thing that we're doing with 4.0 is to set SO_REUSEPORT on
>> the socket before binding the port (commit 4dda9c8a5e34: "SUNRPC: Set
>> SO_REUSEPORT socket option for TCP connections"). Perhaps there is an
>> issue with that?
>
> Strange, because the usual way to not have time-wait is to use SO_LINGER
> with linger=0
>
> And apparently xs_tcp_finish_connecting() has this :
>
> 	sock_reset_flag(sk, SOCK_LINGER);
> 	tcp_sk(sk)->linger2 = 0;

Are you sure? I thought that SO_LINGER is more about controlling how
the socket behaves w.r.t. waiting for the TCP_CLOSE state to be
achieved (i.e. about aborting the FIN state negotiation early). I've
never observed an effect on the TCP time-wait states.

> Are you sure SO_REUSEADDR was not the thing you wanted ?

Yes. SO_REUSEADDR has the problem that it requires you bind to
something other than 0.0.0.0, so it is less appropriate for outgoing
connections; the RPC code really should not have to worry about routing
and routability of a particular source address.

Cheers
  Trond
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 12 Jun 2015 07:40:35 -0700 Eric Dumazet eric.duma...@gmail.com wrote:
> Strange, because the usual way to not have time-wait is to use SO_LINGER
> with linger=0
>
> And apparently xs_tcp_finish_connecting() has this :
>
> 	sock_reset_flag(sk, SOCK_LINGER);
> 	tcp_sk(sk)->linger2 = 0;
>
> Are you sure SO_REUSEADDR was not the thing you wanted ?
>
> Steven, have you tried kmemleak ?

Nope, and again, I'm hesitant on adding too much debug. This is my main
server (build server, ssh server, web server, mail server, proxy
server, irc server, etc).

Although, I made dprintk() into trace_printk() in xprtsock.c and
xprt.c, and reran it. Here's the output:

(port 684 was the bad one this time)

# tracer: nop
#
# entries-in-buffer/entries-written: 396/396   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
rpc.nfsd-4710 [002] 48.615382: xs_local_setup_socket: RPC: worker connecting xprt 8800d9018000 via AF_LOCAL to /var/run/rpcbind.sock
rpc.nfsd-4710 [002] 48.615393: xs_local_setup_socket: RPC: xprt 8800d9018000 connected to /var/run/rpcbind.sock
rpc.nfsd-4710 [002] 48.615394: xs_setup_local: RPC: set up xprt to /var/run/rpcbind.sock via AF_LOCAL
rpc.nfsd-4710 [002] 48.615399: xprt_create_transport: RPC: created transport 8800d9018000 with 65536 slots
rpc.nfsd-4710 [002] 48.615416: xprt_alloc_slot: RPC: 1 reserved req 8800db829600 xid cb06d5e8
rpc.nfsd-4710 [002] 48.615419: xprt_prepare_transmit: RPC: 1 xprt_prepare_transmit
rpc.nfsd-4710 [002] 48.615420: xprt_transmit: RPC: 1 xprt_transmit(44)
rpc.nfsd-4710 [002] 48.615424: xs_local_send_request: RPC: xs_local_send_request(44) = 0
rpc.nfsd-4710 [002] 48.615425: xprt_transmit: RPC: 1 xmit complete
rpcbind-1829 [003] ..s. 48.615503: xs_local_data_ready: RPC: xs_local_data_ready...
rpcbind-1829 [003] ..s. 48.615506: xprt_complete_rqst: RPC: 1 xid cb06d5e8 complete (24 bytes received)
rpc.nfsd-4710 [002] 48.615556: xprt_release: RPC: 1 release request 8800db829600
rpc.nfsd-4710 [002] 48.615568: xprt_alloc_slot: RPC: 2 reserved req 8800db829600 xid cc06d5e8
rpc.nfsd-4710 [002] 48.615569: xprt_prepare_transmit: RPC: 2 xprt_prepare_transmit
rpc.nfsd-4710 [002] 48.615569: xprt_transmit: RPC: 2 xprt_transmit(44)
rpc.nfsd-4710 [002] 48.615578: xs_local_send_request: RPC: xs_local_send_request(44) = 0
rpc.nfsd-4710 [002] 48.615578: xprt_transmit: RPC: 2 xmit complete
rpcbind-1829 [003] ..s. 48.615643: xs_local_data_ready: RPC: xs_local_data_ready...
rpcbind-1829 [003] ..s. 48.615645: xprt_complete_rqst: RPC: 2 xid cc06d5e8 complete (24 bytes received)
rpc.nfsd-4710 [002] 48.615695: xprt_release: RPC: 2 release request 8800db829600
rpc.nfsd-4710 [002] 48.615698: xprt_alloc_slot: RPC: 3 reserved req 8800db829600 xid cd06d5e8
rpc.nfsd-4710 [002] 48.615699: xprt_prepare_transmit: RPC: 3 xprt_prepare_transmit
rpc.nfsd-4710 [002] 48.615700: xprt_transmit: RPC: 3 xprt_transmit(68)
rpc.nfsd-4710 [002] 48.615706: xs_local_send_request: RPC: xs_local_send_request(68) = 0
rpc.nfsd-4710 [002] 48.615707: xprt_transmit: RPC: 3 xmit complete
rpcbind-1829 [003] ..s. 48.615784: xs_local_data_ready: RPC: xs_local_data_ready...
rpcbind-1829 [003] ..s. 48.615785: xprt_complete_rqst: RPC: 3 xid cd06d5e8 complete (28 bytes received)
rpc.nfsd-4710 [002] 48.615830: xprt_release: RPC: 3 release request 8800db829600
rpc.nfsd-4710 [002] 48.615833: xprt_alloc_slot: RPC: 4 reserved req 8800db829600 xid ce06d5e8
rpc.nfsd-4710 [002] 48.615834: xprt_prepare_transmit: RPC: 4 xprt_prepare_transmit
rpc.nfsd-4710 [002] 48.615835: xprt_transmit: RPC: 4 xprt_transmit(68)
rpc.nfsd-4710 [002] 48.615841: xs_local_send_request: RPC: xs_local_send_request(68) = 0
rpc.nfsd-4710 [002] 48.615841: xprt_transmit: RPC: 4 xmit complete
rpcbind-1829 [003] ..s. 48.615892: xs_local_data_ready: RPC: xs_local_data_ready...
rpcbind-1829 [003] ..s. 48.615894: xprt_complete_rqst: RPC: 4 xid ce06d5e8
[PATCH] Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt
This patch fixes the URL (http to https) for wiki.wireshark.org.

Signed-off-by: Masanari Iida standby2...@gmail.com
---
 Documentation/networking/udplite.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/networking/udplite.txt b/Documentation/networking/udplite.txt
index d727a38..53a7268 100644
--- a/Documentation/networking/udplite.txt
+++ b/Documentation/networking/udplite.txt
@@ -20,7 +20,7 @@
 	files/UDP-Lite-HOWTO.txt
 
    o The Wireshark UDP-Lite WiKi (with capture files):
-       http://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
+       https://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
 
    o The Protocol Spec, RFC 3828, http://www.ietf.org/rfc/rfc3828.txt
-- 
2.4.3.413.ga5fe668
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 2015-06-12 at 10:57 -0400, Trond Myklebust wrote:
> On Fri, Jun 12, 2015 at 10:40 AM, Eric Dumazet eric.duma...@gmail.com wrote:
>> Strange, because the usual way to not have time-wait is to use SO_LINGER
>> with linger=0
>>
>> And apparently xs_tcp_finish_connecting() has this :
>>
>> 	sock_reset_flag(sk, SOCK_LINGER);
>> 	tcp_sk(sk)->linger2 = 0;
>
> Are you sure? I thought that SO_LINGER is more about controlling how
> the socket behaves w.r.t. waiting for the TCP_CLOSE state to be
> achieved (i.e. about aborting the FIN state negotiation early). I've
> never observed an effect on the TCP time-wait states.

Definitely this is standard way to avoid time-wait states.

Maybe not very well documented. We probably should...

http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required

> Yes. SO_REUSEADDR has the problem that it requires you bind to
> something other than 0.0.0.0, so it is less appropriate for outgoing
> connections; the RPC code really should not have to worry about routing
> and routability of a particular source address.

OK understood.

Are you trying to reuse same 4-tuple ?
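For readers who have not seen the idiom, this is roughly what "SO_LINGER with linger=0" looks like from userspace. It is only a sketch of the socket option being discussed, not what the SUNRPC code does, and the helper name is hypothetical.

```c
#include <sys/socket.h>

/*
 * Hypothetical helper: request an abortive close on a TCP socket.
 * With l_onoff = 1 and l_linger = 0, close() resets the connection
 * rather than running the normal FIN sequence, so the local end
 * should not be left in TIME_WAIT.
 */
static int set_linger_zero(int fd)
{
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };

	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

The trade-off is that any unsent data is discarded and the peer sees a reset, which is why it is usually reserved for cases where that is acceptable.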
[PATCH net-next] tcp: cdg: use div_u64()
Fixes cross-compile to mips.

Signed-off-by: Kenneth Klette Jonassen kenne...@ifi.uio.no
---
Fixes build error for mips-allyesconfig:
net/built-in.o: In function `tcp_cdg_cong_avoid':
tcp_cdg.c:(.text+0x217774): undefined reference to `__udivdi3'

https://lists.01.org/pipermail/kbuild-all/2015-June/010142.html
---
 net/ipv4/tcp_cdg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c
index a52ce2d..8c6fd3d 100644
--- a/net/ipv4/tcp_cdg.c
+++ b/net/ipv4/tcp_cdg.c
@@ -145,7 +145,7 @@ static void tcp_cdg_hystart_update(struct sock *sk)
 		return;
 
 	if (hystart_detect & HYSTART_ACK_TRAIN) {
-		u32 now_us = local_clock() / NSEC_PER_USEC;
+		u32 now_us = div_u64(local_clock(), NSEC_PER_USEC);
 
 		if (ca->last_ack == 0 || !tcp_is_cwnd_limited(sk)) {
 			ca->last_ack = now_us;
-- 
2.1.0
[PATCH] net: fec: Ensure clocks are enabled while using mdio bus
When a switch is attached to the mdio bus, the mdio bus can be used
while the interface is not open. If the clocks are not enabled, MDIO
reads/writes will simply time out. So enable the clocks before
starting a transaction, and disable them afterwards. The CCF performs
reference counting so the clocks will only be disabled if there are no
other users.

Signed-off-by: Andrew Lunn and...@lunn.ch
---
 drivers/net/ethernet/freescale/fec_main.c | 21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index bf4cf3fbb5f2..122186b90cdb 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -65,6 +65,7 @@ static void set_multicast_list(struct net_device *ndev);
 static void fec_enet_itr_coal_init(struct net_device *ndev);
+static int fec_enet_clk_enable(struct net_device *ndev, bool enable);
 
 #define DRIVER_NAME	"fec"
 
@@ -1764,6 +1765,11 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 {
 	struct fec_enet_private *fep = bus->priv;
 	unsigned long time_left;
+	int ret;
+
+	ret = fec_enet_clk_enable(fep->netdev, true);
+	if (ret)
+		return 0x;
 
 	fep->mii_timeout = 0;
 	init_completion(&fep->mdio_done);
@@ -1779,11 +1785,14 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int mii_id, int regnum)
 	if (time_left == 0) {
 		fep->mii_timeout = 1;
 		netdev_err(fep->netdev, "MDIO read timeout\n");
+		fec_enet_clk_enable(fep->netdev, false);
 		return -ETIMEDOUT;
 	}
 
-	/* return value */
-	return FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
+	ret = FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
+	fec_enet_clk_enable(fep->netdev, false);
+
+	return ret;
 }
 
 static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
@@ -1791,10 +1800,15 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 {
 	struct fec_enet_private *fep = bus->priv;
 	unsigned long time_left;
+	int ret;
 
 	fep->mii_timeout = 0;
 	init_completion(&fep->mdio_done);
 
+	ret = fec_enet_clk_enable(fep->netdev, true);
+	if (ret)
+		netdev_err(fep->netdev, "Unable to enable clks\n");
+
 	/* start a write op */
 	writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
 		FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
@@ -1807,9 +1821,12 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
 	if (time_left == 0) {
 		fep->mii_timeout = 1;
 		netdev_err(fep->netdev, "MDIO write timeout\n");
+		fec_enet_clk_enable(fep->netdev, false);
 		return -ETIMEDOUT;
 	}
 
+	fec_enet_clk_enable(fep->netdev, false);
+
 	return 0;
 }
 
-- 
2.1.4
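The pattern in this patch (take a clock reference, run the transaction, drop the reference on every exit path) can be modelled in a few lines of userspace C. All names below are hypothetical, and the counter is only a stand-in for the reference counting that the common clock framework is said to provide above.

```c
#include <stdbool.h>

/* Hypothetical refcounted clock, standing in for the real clock framework. */
struct sketch_clk {
	int enable_count;
};

static int sketch_clk_enable(struct sketch_clk *clk, bool enable)
{
	if (enable)
		clk->enable_count++;
	else if (clk->enable_count > 0)
		clk->enable_count--;
	return 0;
}

static bool sketch_clk_running(const struct sketch_clk *clk)
{
	return clk->enable_count > 0;
}

/*
 * Hypothetical MDIO read: enable the clock, run the transaction and
 * release the clock on every exit path, the timeout path included.
 */
static int sketch_mdio_read(struct sketch_clk *clk, bool bus_responds,
			    int value)
{
	int ret;

	ret = sketch_clk_enable(clk, true);
	if (ret)
		return ret;

	if (!bus_responds) {
		sketch_clk_enable(clk, false);
		return -1;	/* stands in for -ETIMEDOUT */
	}

	ret = value;
	sketch_clk_enable(clk, false);
	return ret;
}
```

If the interface is already open and holds its own reference, the count never reaches zero during the read, so the clock keeps running for the other user.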
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 12 Jun 2015 11:34:20 -0400 Steven Rostedt rost...@goodmis.org wrote:
> On Fri, 12 Jun 2015 07:40:35 -0700 Eric Dumazet eric.duma...@gmail.com wrote:
>> Strange, because the usual way to not have time-wait is to use SO_LINGER
>> with linger=0
>>
>> And apparently xs_tcp_finish_connecting() has this :
>>
>> 	sock_reset_flag(sk, SOCK_LINGER);
>> 	tcp_sk(sk)->linger2 = 0;
>>
>> Are you sure SO_REUSEADDR was not the thing you wanted ?
>>
>> Steven, have you tried kmemleak ?
>
> Nope, and again, I'm hesitant on adding too much debug. This is my main
> server (build server, ssh server, web server, mail server, proxy
> server, irc server, etc).
>
> Although, I made dprintk() into trace_printk() in xprtsock.c and
> xprt.c, and reran it. Here's the output:

I reverted the following commits:

c627d31ba0696cbd829437af2be2f2dee3546b1e
9e2b9f37760e129cee053cc7b6e7288acc2a7134
caf4ccd4e88cf2795c927834bc488c8321437586

And the issue goes away. That is, I watched the port go from
ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port.

In fact, I watched the port with my portlist.c module, and it
disappeared there too when it entered the TIME_WAIT state.

Here's the trace of that run:

# tracer: nop
#
# entries-in-buffer/entries-written: 397/397   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
rpc.nfsd-3932 [002] 44.098689: xs_local_setup_socket: RPC: worker connecting xprt 88040b6f5800 via AF_LOCAL to /var/run/rpcbind.sock
rpc.nfsd-3932 [002] 44.098699: xs_local_setup_socket: RPC: xprt 88040b6f5800 connected to /var/run/rpcbind.sock
rpc.nfsd-3932 [002] 44.098700: xs_setup_local: RPC: set up xprt to /var/run/rpcbind.sock via AF_LOCAL
rpc.nfsd-3932 [002] 44.098704: xprt_create_transport: RPC: created transport 88040b6f5800 with 65536 slots
rpc.nfsd-3932 [002] 44.098717: xprt_alloc_slot: RPC: 1 reserved req 8800d8cc6800 xid 0850084b
rpc.nfsd-3932 [002] 44.098720: xprt_prepare_transmit: RPC: 1 xprt_prepare_transmit
rpc.nfsd-3932 [002] 44.098721: xprt_transmit: RPC: 1 xprt_transmit(44)
rpc.nfsd-3932 [002] 44.098724: xs_local_send_request: RPC: xs_local_send_request(44) = 0
rpc.nfsd-3932 [002] 44.098724: xprt_transmit: RPC: 1 xmit complete
rpcbind-1829 [001] ..s. 44.098812: xs_local_data_ready: RPC: xs_local_data_ready...
rpcbind-1829 [001] ..s. 44.098815: xprt_complete_rqst: RPC: 1 xid 0850084b complete (24 bytes received)
rpc.nfsd-3932 [002] 44.098854: xprt_release: RPC: 1 release request 8800d8cc6800
rpc.nfsd-3932 [002] 44.098864: xprt_alloc_slot: RPC: 2 reserved req 8800d8cc6800 xid 0950084b
rpc.nfsd-3932 [002] 44.098865: xprt_prepare_transmit: RPC: 2 xprt_prepare_transmit
rpc.nfsd-3932 [002] 44.098865: xprt_transmit: RPC: 2 xprt_transmit(44)
rpc.nfsd-3932 [002] 44.098870: xs_local_send_request: RPC: xs_local_send_request(44) = 0
rpc.nfsd-3932 [002] 44.098870: xprt_transmit: RPC: 2 xmit complete
rpcbind-1829 [001] ..s. 44.098915: xs_local_data_ready: RPC: xs_local_data_ready...
rpcbind-1829 [001] ..s. 44.098917: xprt_complete_rqst: RPC: 2 xid 0950084b complete (24 bytes received)
rpc.nfsd-3932 [002] 44.098968: xprt_release: RPC: 2 release request 8800d8cc6800
rpc.nfsd-3932 [002] 44.098971: xprt_alloc_slot: RPC: 3 reserved req 8800d8cc6800 xid 0a50084b
rpc.nfsd-3932 [002] 44.098972: xprt_prepare_transmit: RPC: 3 xprt_prepare_transmit
rpc.nfsd-3932 [002] 44.098973: xprt_transmit: RPC: 3 xprt_transmit(68)
rpc.nfsd-3932 [002] 44.098978: xs_local_send_request: RPC: xs_local_send_request(68) = 0
rpc.nfsd-3932 [002] 44.098978: xprt_transmit: RPC: 3 xmit complete
rpcbind-1829 [001] ..s. 44.099029: xs_local_data_ready: RPC: xs_local_data_ready...
rpcbind-1829 [001] ..s. 44.099031: xprt_complete_rqst: RPC: 3 xid 0a50084b complete (28 bytes received)
rpc.nfsd-3932 [002] 44.099083: xprt_release: RPC: 3 release request 8800d8cc6800
rpc.nfsd-3932 [002] 44.099086: xprt_alloc_slot: RPC: 4 reserved req 8800d8cc6800 xid 0b50084b
rpc.nfsd-3932 [002] 44.099086: xprt_prepare_transmit: RPC: 4
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 12 Jun 2015 11:50:38 -0400 Steven Rostedt rost...@goodmis.org wrote:
> On Fri, 12 Jun 2015 11:34:20 -0400 Steven Rostedt rost...@goodmis.org wrote:
>
> And the issue goes away. That is, I watched the port go from
> ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port.

s/theirs/there's/

Time to go back to grammar school. :-p

-- Steve
[PATCH] mlx4_en: don't wait for high order page allocation
High-order page allocation can cause direct memory compaction and harm
performance. The patch makes the high-order page allocation not wait, so
it does not trigger direct memory compaction under memory pressure.

More details can be found in a similar patch for net core:
http://marc.info/?l=linux-mm&m=143406665720428&w=2

Cc: Amir Vadai am...@mellanox.com
Cc: Ido Shamay i...@mellanox.com
Cc: Eric Dumazet eduma...@google.com
Signed-off-by: Shaohua Li s...@fb.com
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 2a77a6b..9bc4143 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -60,8 +60,11 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
 	for (order = MLX4_EN_ALLOC_PREFER_ORDER; ;) {
 		gfp_t gfp = _gfp;
 
-		if (order)
+		if (order) {
+			if ((PAGE_SIZE << (order - 1)) >= frag_info->frag_size)
+				gfp &= ~__GFP_WAIT;
 			gfp |= __GFP_COMP | __GFP_NOWARN;
+		}
 		page = alloc_pages(gfp, order);
 		if (likely(page))
 			break;
-- 
1.8.1
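A userspace sketch of the flag computation in the hunk above, for anyone who wants to check the condition by hand. The flag bits, page size and function name are hypothetical stand-ins; only the shape of the logic follows the patch.

```c
#include <stddef.h>

/* Hypothetical flag bits; the real __GFP_* values differ. */
#define SK_GFP_WAIT	0x1u
#define SK_GFP_COMP	0x2u
#define SK_GFP_NOWARN	0x4u
#define SK_PAGE_SIZE	4096u

/* Compute the allocation flags for one attempt at the given order. */
static unsigned int sketch_gfp_for_order(unsigned int base, int order,
					 size_t frag_size)
{
	unsigned int gfp = base;

	if (order) {
		/*
		 * If the next lower order would still hold a fragment,
		 * this attempt is opportunistic: do not let it block.
		 */
		if (((size_t)SK_PAGE_SIZE << (order - 1)) >= frag_size)
			gfp &= ~SK_GFP_WAIT;
		gfp |= SK_GFP_COMP | SK_GFP_NOWARN;
	}
	return gfp;
}
```

Order 0 keeps the caller's flags untouched, and a high order keeps the wait bit only when no smaller order could hold the fragment.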
[BUG ?] delay always evaluates to 0
Hi !

commit 2c86c275015c ("Add ipw2100 wireless driver.") introduced
drivers/net/wireless/ipw2100.c - line-numbers are from next-20150511

1410 static int ipw2100_hw_phy_off(struct ipw2100_priv *priv)
1411 {
1412
1413 #define HW_PHY_OFF_LOOP_DELAY (HZ / 5000)
1414
...
1437
1438 		schedule_timeout_uninterruptible(HW_PHY_OFF_LOOP_DELAY);
1439 	}

but (HZ / 5000) will evaluate to 0 for all configurable HZ values - typo ?
and this schedule_timeout_uninterruptible() is probably not doing what
is intended.

thx!
hofrat
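The arithmetic is easy to check outside the kernel. The helper below is hypothetical and just reproduces the integer division in the macro; the HZ values mentioned afterwards (100, 250, 300, 1000) are assumed to be typical configuration choices, not an exhaustive list.

```c
/* Hypothetical helper reproducing the macro's integer arithmetic. */
static long hw_phy_off_loop_delay(long hz)
{
	return hz / 5000;	/* integer division truncates toward zero */
}
```

Any tick rate below 5000 gives a delay of 0 jiffies, which covers 100, 250, 300 and 1000.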
Re: [PATCH] mlx4_en: don't wait for high order page allocation
On Fri, Jun 12, 2015 at 10:05:42AM -0700, Alexander Duyck wrote:
> On 06/12/2015 09:50 AM, Shaohua Li wrote:
>> High-order page allocation can cause direct memory compaction and harm
>> performance. The patch makes the high-order page allocation not wait, so
>> it does not trigger direct memory compaction under memory pressure.
>>
>> More details can be found in a similar patch for net core:
>> http://marc.info/?l=linux-mm&m=143406665720428&w=2
>>
>> Cc: Amir Vadai am...@mellanox.com
>> Cc: Ido Shamay i...@mellanox.com
>> Cc: Eric Dumazet eduma...@google.com
>> Signed-off-by: Shaohua Li s...@fb.com
>> ---
>>  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> index 2a77a6b..9bc4143 100644
>> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
>> @@ -60,8 +60,11 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
>>  	for (order = MLX4_EN_ALLOC_PREFER_ORDER; ;) {
>>  		gfp_t gfp = _gfp;
>>
>> -		if (order)
>> +		if (order) {
>> +			if ((PAGE_SIZE << (order - 1)) >= frag_info->frag_size)
>> +				gfp &= ~__GFP_WAIT;
>>  			gfp |= __GFP_COMP | __GFP_NOWARN;
>> +		}
>>  		page = alloc_pages(gfp, order);
>>  		if (likely(page))
>>  			break;
>
> Is this even really necessary? I would think the fact that the refill
> is done using GFP_ATOMIC would be enough to cover the frequently used
> cases. I wouldn't think the initial allocation when the interface is
> brought up would be something that is a big enough deal to justify
> being fixed in this case.

Ok, if the allocation is always using GFP_ATOMIC at runtime, we don't
need this of course. Please ignore it then.

Thanks,
Shaohua
Re: [PATCH net-next v2] switchdev: fix BUG when port driver doesn't support set attr op
On Thu, Jun 11, 2015 at 08:19:01AM -0700, sfel...@gmail.com wrote:
> From: Scott Feldman sfel...@gmail.com
>
> Fix a BUG_ON() where CONFIG_NET_SWITCHDEV is set but the driver for a
> bridged port does not support switchdev_port_attr_set op. Don't
> BUG_ON() if -EOPNOTSUPP is returned.
>
> Also change BUG_ON() to netdev_err since this is a normal error path
> and does not warrant the use of BUG_ON(), which is reserved for
> unrecoverable errs.
>
> Signed-off-by: Scott Feldman sfel...@gmail.com
> Reported-by: Brenden Blanco bbla...@plumgrid.com

This is less aggressive -- good call.

Acked-by: Andy Gospodarek go...@cumulusnetworks.com
Re: [PATCH] mlx4_en: don't wait for high order page allocation
On 06/12/2015 09:50 AM, Shaohua Li wrote:
> High-order page allocation can cause direct memory compaction and harm
> performance. The patch makes the high-order page allocation not wait, so
> it does not trigger direct memory compaction under memory pressure.
>
> More details can be found in a similar patch for net core:
> http://marc.info/?l=linux-mm&m=143406665720428&w=2
>
> Cc: Amir Vadai am...@mellanox.com
> Cc: Ido Shamay i...@mellanox.com
> Cc: Eric Dumazet eduma...@google.com
> Signed-off-by: Shaohua Li s...@fb.com
> ---
>  drivers/net/ethernet/mellanox/mlx4/en_rx.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 2a77a6b..9bc4143 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -60,8 +60,11 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
>  	for (order = MLX4_EN_ALLOC_PREFER_ORDER; ;) {
>  		gfp_t gfp = _gfp;
>
> -		if (order)
> +		if (order) {
> +			if ((PAGE_SIZE << (order - 1)) >= frag_info->frag_size)
> +				gfp &= ~__GFP_WAIT;
>  			gfp |= __GFP_COMP | __GFP_NOWARN;
> +		}
>  		page = alloc_pages(gfp, order);
>  		if (likely(page))
>  			break;

Is this even really necessary? I would think the fact that the refill
is done using GFP_ATOMIC would be enough to cover the frequently used
cases. I wouldn't think the initial allocation when the interface is
brought up would be something that is a big enough deal to justify
being fixed in this case.

- Alex
[PATCH 1/3] net: phy: Allow PHY devices to identify themselves as Ethernet switches, etc.
From: Florian Fainelli f.faine...@gmail.com

Some Ethernet MAC drivers using the PHY library require the hardcoding of link parameters when interfaced to a switch device, SFP module, switch to switch port, etc. This has typically led to various ad-hoc implementations looking like this:

- using a fixed PHY emulated device, which will provide link indication towards the Ethernet MAC driver and hardware
- pretend there is no PHY and hardcode link parameters, ala mv643x_eth

Based on that, it is desirable to have the PHY drivers advertise the correct link parameters, just like regular Ethernet PHYs towards their CPU Ethernet MAC drivers, however, Ethernet MAC drivers should be able to tell whether this link should be monitored or not. In the context of an Ethernet switch, SFP module, switch to switch link, we do not need to monitor this link since it should be always up.

Signed-off-by: Florian Fainelli f.faine...@gmail.com
Signed-off-by: Andrew Lunn and...@lunn.ch
---
 include/linux/phy.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index a26c3f84b8dd..5c3b87c0773c 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -330,6 +330,7 @@ struct phy_c45_device_ids {
  * c45_ids: 802.3-c45 Device Identifers if is_c45.
  * is_c45: Set to true if this phy uses clause 45 addressing.
  * is_internal: Set to true if this phy is internal to a MAC.
+ * is_pseudo_fixed_link: Set to true if this phy is an Ethernet switch, etc.
  * has_fixups: Set to true if this phy has fixups/quirks.
  * suspended: Set to true if this phy has been suspended successfully.
  * state: state of the PHY for management purposes
@@ -368,6 +369,7 @@ struct phy_device {
 	struct phy_c45_device_ids c45_ids;
 	bool is_c45;
 	bool is_internal;
+	bool is_pseudo_fixed_link;
 	bool has_fixups;
 	bool suspended;
 
@@ -686,6 +688,16 @@ static inline bool phy_interface_is_rgmii(struct phy_device *phydev)
 {
 	return phydev->interface >= PHY_INTERFACE_MODE_RGMII &&
 		phydev->interface <= PHY_INTERFACE_MODE_RGMII_TXID;
+};
+
+/*
+ * phy_is_pseudo_fixed_link - Convenience function for testing if this
+ * PHY is the CPU port facing side of an Ethernet switch, or similar.
+ * @phydev: the phy_device struct
+ */
+static inline bool phy_is_pseudo_fixed_link(struct phy_device *phydev)
+{
+	return phydev->is_pseudo_fixed_link;
 }
 
 /**
--
2.1.4
[PATCH 2/3] dsa: mv88e6xxx: Allow speed/duplex of port to be configured
The current code sets user ports to perform auto negotiation using the phy. CPU and DSA ports are configured to full duplex and maximum speed the switch supports. There are however use cases where the CPU has a slower port, and when user ports have SFP modules with fixed speed. In these cases, allow port settings to be read from a fixed_phy devices. Signed-off-by: Andrew Lunn and...@lunn.ch --- drivers/net/dsa/mv88e6123_61_65.c | 1 + drivers/net/dsa/mv88e6131.c | 1 + drivers/net/dsa/mv88e6171.c | 1 + drivers/net/dsa/mv88e6352.c | 1 + drivers/net/dsa/mv88e6xxx.c | 56 +++ drivers/net/dsa/mv88e6xxx.h | 2 ++ net/dsa/slave.c | 4 ++- 7 files changed, 65 insertions(+), 1 deletion(-) diff --git a/drivers/net/dsa/mv88e6123_61_65.c b/drivers/net/dsa/mv88e6123_61_65.c index 71a29a7ce538..3de2a6d73fdc 100644 --- a/drivers/net/dsa/mv88e6123_61_65.c +++ b/drivers/net/dsa/mv88e6123_61_65.c @@ -129,6 +129,7 @@ struct dsa_switch_driver mv88e6123_61_65_switch_driver = { .get_strings= mv88e6xxx_get_strings, .get_ethtool_stats = mv88e6xxx_get_ethtool_stats, .get_sset_count = mv88e6xxx_get_sset_count, + .adjust_link= mv88e6xxx_adjust_link, #ifdef CONFIG_NET_DSA_HWMON .get_temp = mv88e6xxx_get_temp, #endif diff --git a/drivers/net/dsa/mv88e6131.c b/drivers/net/dsa/mv88e6131.c index 32f4a08e9bc9..3e8386529965 100644 --- a/drivers/net/dsa/mv88e6131.c +++ b/drivers/net/dsa/mv88e6131.c @@ -182,6 +182,7 @@ struct dsa_switch_driver mv88e6131_switch_driver = { .get_strings= mv88e6xxx_get_strings, .get_ethtool_stats = mv88e6xxx_get_ethtool_stats, .get_sset_count = mv88e6xxx_get_sset_count, + .adjust_link= mv88e6xxx_adjust_link, }; MODULE_ALIAS(platform:mv88e6085); diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c index 1c7808495a9d..8803e20ebc52 100644 --- a/drivers/net/dsa/mv88e6171.c +++ b/drivers/net/dsa/mv88e6171.c @@ -108,6 +108,7 @@ struct dsa_switch_driver mv88e6171_switch_driver = { .get_strings= mv88e6xxx_get_strings, .get_ethtool_stats = 
mv88e6xxx_get_ethtool_stats, .get_sset_count = mv88e6xxx_get_sset_count, + .adjust_link= mv88e6xxx_adjust_link, #ifdef CONFIG_NET_DSA_HWMON .get_temp = mv88e6xxx_get_temp, #endif diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c index 632815c10a40..7a2deddbe270 100644 --- a/drivers/net/dsa/mv88e6352.c +++ b/drivers/net/dsa/mv88e6352.c @@ -374,6 +374,7 @@ struct dsa_switch_driver mv88e6352_switch_driver = { .get_strings= mv88e6xxx_get_strings, .get_ethtool_stats = mv88e6xxx_get_ethtool_stats, .get_sset_count = mv88e6xxx_get_sset_count, + .adjust_link= mv88e6xxx_adjust_link, .set_eee= mv88e6xxx_set_eee, .get_eee= mv88e6xxx_get_eee, #ifdef CONFIG_NET_DSA_HWMON diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c index 7fba330ce702..3defccb59d42 100644 --- a/drivers/net/dsa/mv88e6xxx.c +++ b/drivers/net/dsa/mv88e6xxx.c @@ -10,6 +10,7 @@ #include linux/delay.h #include linux/etherdevice.h +#include linux/ethtool.h #include linux/if_bridge.h #include linux/jiffies.h #include linux/list.h @@ -543,6 +544,61 @@ static bool mv88e6xxx_6352_family(struct dsa_switch *ds) return false; } +/* We expect the switch to perform auto negotiation if there is a real + * phy. However, in the case of a fixed link phy, we force the port + * settings from the fixed link settings. 
+ */
+void mv88e6xxx_adjust_link(struct dsa_switch *ds, int port,
+			   struct phy_device *phydev)
+{
+	struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+	u32 ret, reg;
+
+	if (!phy_is_pseudo_fixed_link(phydev))
+		return;
+
+	mutex_lock(&ps->smi_mutex);
+
+	ret = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
+	if (ret < 0)
+		goto out;
+
+	reg = ret & ~(PORT_PCS_CTRL_LINK_UP |
+		      PORT_PCS_CTRL_FORCE_LINK |
+		      PORT_PCS_CTRL_DUPLEX_FULL |
+		      PORT_PCS_CTRL_FORCE_DUPLEX |
+		      PORT_PCS_CTRL_UNFORCED);
+
+	reg |= PORT_PCS_CTRL_FORCE_LINK;
+	if (phydev->link)
+		reg |= PORT_PCS_CTRL_LINK_UP;
+
+	if (mv88e6xxx_6065_family(ds) && phydev->speed > SPEED_100)
+		goto out;
+
+	switch (phydev->speed) {
+	case SPEED_1000:
+		reg |= PORT_PCS_CTRL_1000;
+		break;
+	case SPEED_100:
+		reg |= PORT_PCS_CTRL_100;
+		break;
+	case SPEED_10:
+		reg |= PORT_PCS_CTRL_10;
+	default:
+		goto out;
+	}
+
+
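The archived copy of this patch stops partway through the function. As a reading aid, here is a rough userspace sketch of the same idea: clear the link and speed fields of a port control value, then force them from fixed-link settings. Everything named SK_* or sketch_* is hypothetical, and the bit layout is invented for illustration; it is not the mv88e6xxx register map. The sketch also puts a break after the 10 Mb/s case, which is an assumption about the intent, since the hunk as archived shows none there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen for illustration only. */
#define SK_SPEED_MASK 0x0003u
#define SK_SPEED_10   0x0000u
#define SK_SPEED_100  0x0001u
#define SK_SPEED_1000 0x0002u
#define SK_FORCE_LINK 0x0010u
#define SK_LINK_UP    0x0020u

/*
 * Compute a forced port-control value from fixed-link settings.
 * Returns false and leaves *out untouched when the speed is not supported.
 */
static bool sketch_force_port(uint16_t cur, bool link, int speed_mbps,
                              bool gigabit_capable, uint16_t *out)
{
    uint16_t reg;

    assert(out != NULL);
    /* A part without gigabit support cannot be forced above 100 Mb/s. */
    if (!gigabit_capable && speed_mbps > 100)
        return false;

    /* Drop the old link and speed bits, keep everything else. */
    reg = cur & (uint16_t)~(SK_SPEED_MASK | SK_FORCE_LINK | SK_LINK_UP);
    reg |= SK_FORCE_LINK;
    if (link)
        reg |= SK_LINK_UP;

    switch (speed_mbps) {
    case 1000:
        reg |= SK_SPEED_1000;
        break;
    case 100:
        reg |= SK_SPEED_100;
        break;
    case 10:
        reg |= SK_SPEED_10;
        break;
    default:
        return false;
    }

    *out = reg;
    return true;
}
```

Returning a status instead of jumping to a shared exit label keeps the sketch free of locking, which the real function needs around the register access.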
[PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex
By default, DSA and CPU ports are configured to the maximum speed the switch supports. However there can be use cases where the peer device port is slower. Allow a fixed-link property to be used with the DSA and CPU port in the device tree, and use this information to configure the port.

Signed-off-by: Andrew Lunn and...@lunn.ch
---
 include/net/dsa.h |  1 +
 net/dsa/dsa.c     | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 40 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index fbca63ba8f73..24572f99224c 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -160,6 +160,7 @@ struct dsa_switch {
 	 * Slave mii_bus and devices for the individual ports.
 	 */
 	u32 dsa_port_mask;
+	u32 cpu_port_mask;
 	u32 phys_port_mask;
 	u32 phys_mii_mask;
 	struct mii_bus *slave_mii_bus;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 392e29a0227d..f9c8f4e7ebce 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -176,6 +176,36 @@ __ATTRIBUTE_GROUPS(dsa_hwmon);
 #endif /* CONFIG_NET_DSA_HWMON */
 
 /* basic switch operations **/
+static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master)
+{
+	struct dsa_chip_data *cd = ds->pd;
+	struct device_node *port_dn;
+	struct phy_device *phydev;
+	int ret, port;
+
+	for (port = 0; port < DSA_MAX_PORTS; port++) {
+		if (!((ds->cpu_port_mask | ds->dsa_port_mask) & (1 << port)))
+			continue;
+
+		port_dn = cd->port_dn[port];
+		if (of_phy_is_fixed_link(port_dn)) {
+			ret = of_phy_register_fixed_link(port_dn);
+			if (ret) {
+				netdev_err(master,
+					   "failed to register fixed PHY\n");
+				return ret;
+			}
+			phydev = of_phy_find_device(port_dn);
+			phydev->is_pseudo_fixed_link = true;
+			genphy_config_init(phydev);
+			genphy_read_status(phydev);
+			if (ds->drv->adjust_link)
+				ds->drv->adjust_link(ds, port, phydev);
+		}
+	}
+	return 0;
+}
+
 static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
 {
 	struct dsa_switch_driver *drv = ds->drv;
@@ -204,6 +234,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
 			}
 			dst->cpu_switch = index;
 			dst->cpu_port = i;
+			ds->cpu_port_mask |= 1 << i;
 		} else if (!strcmp(name, "dsa")) {
 			ds->dsa_port_mask |= 1 << i;
 		} else {
@@ -297,6 +328,14 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
 		}
 	}
 
+	/* Perform configuration of the CPU and DSA ports */
+	ret = dsa_cpu_dsa_setup(ds, dst->master_netdev);
+	if (ret < 0) {
+		netdev_err(dst->master_netdev, "[%d] : can't configure CPU and DSA ports\n",
+			   index);
+		ret = 0;
+	}
+
 #ifdef CONFIG_NET_DSA_HWMON
 	/* If the switch provides a temperature sensor,
 	 * register with hardware monitoring subsystem.
--
2.1.4
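The port selection in dsa_cpu_dsa_setup() relies on two bitmasks. A small standalone sketch of that selection follows, under the assumption that a port is visited when its bit is set in either mask. SK_MAX_PORTS and sketch_cpu_dsa_ports() are hypothetical names; the former merely stands in for DSA_MAX_PORTS and its value is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_MAX_PORTS 12 /* hypothetical; stands in for DSA_MAX_PORTS */

/*
 * Collect, in ascending order, every port whose bit is set in the CPU mask
 * or in the DSA mask. Returns the number of ports written to ports[].
 */
static int sketch_cpu_dsa_ports(uint32_t cpu_mask, uint32_t dsa_mask,
                                int ports[SK_MAX_PORTS])
{
    int n = 0;

    assert(ports != NULL);
    for (int port = 0; port < SK_MAX_PORTS; port++) {
        /* Skip user ports: neither a CPU port nor a switch-to-switch port. */
        if (!((cpu_mask | dsa_mask) & (UINT32_C(1) << port)))
            continue;
        ports[n++] = port;
    }
    return n;
}
```

With port 5 marked as the CPU port and ports 0 and 9 marked as DSA ports, the function reports three ports: 0, 5 and 9.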
Re: [PATCH] sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO
On 06/12/2015 07:26 AM, Neil Horman wrote: On Thu, Jun 11, 2015 at 05:27:45PM -0700, David Miller wrote: From: mleit...@redhat.com Date: Thu, 11 Jun 2015 14:49:46 -0300 From: Marcelo Ricardo Leitner marcelo.leit...@gmail.com Currently, we can ask to authenticate DATA chunks and we can send DATA chunks on the same packet as COOKIE_ECHO, but if you try to combine both, the DATA chunk will be sent unauthenticated and peer won't accept it, leading to a communication failure. This happens because even though the data was queued after it was requested to authenticate DATA chunks, it was also queued before we could know that remote peer can handle authenticating, so sctp_auth_send_cid() returns false. The fix is whenever we set up an active key, re-check send queue for chunks that now should be authenticated. As a result, such packet will now contain COOKIE_ECHO + AUTH + DATA chunks, in that order. Reported-by: Liu Wei we...@redhat.com Signed-off-by: Marcelo Ricardo Leitner marcelo.leit...@gmail.com Vlad/Neil, please review. sorry Dave, though I had sent email on that already. I had an initial concern that there could be a race in which a previous iteration of sctp_outq_flush would move some chunks to a packet, but not flush it to the network layer yet (due to not being full), and that would result in the same condition. But since this only happens with a COOKIE_ECHO chunk (which is a control chunk), we should be ok, as those are sent immediately. Neil. I don't think this race can happen since outq manipulation always happens under a socket lock and so do socket options. So, we are guaranteed that outq will not change in this case. Acked-by: Vlad Yasevich vyasev...@gmail.com -vlad Acked-by: Neil Horman nhor...@tuxdriver.com -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH next v0] bonding: Display LACP info only to CAP_SYS_ADMIN capable user
On Thu, Jun 11, 2015 at 3:22 PM, David Miller da...@davemloft.net wrote:
> From: Mahesh Bandewar mahe...@google.com
> Date: Wed, 10 Jun 2015 17:19:56 -0700
>
>> Actor and Partner details can be accessed via proc-fs and sys-fs
>> entries. These interfaces are world readable at this moment. The
>> earlier patch-series made the LACP communication secure to avoid
>> nuisance attack from within the same L2 domain but it did not
>> prevent someone unprivileged looking at that information on host
>> and perform the same act.
>>
>> This patch essentially avoids spitting those entries if the user
>> in question does not have enough privileges.
>>
>> Signed-off-by: Mahesh Bandewar mahe...@google.com
>
> I agree with Stephen Hemminger in that you should probably be using
> CAP_NET_ADMIN here.

Will change that in the next revision.

Thanks,
--mahesh..
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Fri, 2015-06-12 at 10:10 -0400, Trond Myklebust wrote: On Thu, Jun 11, 2015 at 11:49 PM, Steven Rostedt rost...@goodmis.org wrote: I recently upgraded my main server to 4.0.4 from 3.19.5 and rkhunter started reporting a hidden port on my box. Running unhide-tcp I see this: # unhide-tcp Unhide-tcp 20121229 Copyright © 2012 Yago Jesus Patrick Gouin License GPLv3+ : GNU GPL version 3 or later http://www.unhide-forensics.info Used options: [*]Starting TCP checking Found Hidden port that not appears in ss: 946 [*]Starting UDP checking This scared the hell out of me as I'm thinking that I have got some kind of NSA backdoor hooked into my server and it is monitoring my plans to smuggle Kinder Überraschung into the USA from Germany. I panicked! Well, I wasted the day writing modules to first look at all the sockets opened by all processes (via their file descriptors) and posted their port numbers. http://rostedt.homelinux.com/private/tasklist.c But this port wasn't there either. Then I decided to look at the ports in tcp_hashinfo. http://rostedt.homelinux.com/private/portlist.c This found the port but no file was connected to it, and worse yet, when I first ran it without using probe_kernel_read(), it crashed my kernel, because sk-sk_socket pointed to a freed socket! Note, each boot, the hidden port is different. Finally, I decided to bring in the big guns, and inserted a trace_printk() into the bind logic, to see if I could find the culprit. After fiddling with it a few times, I found a suspect: kworker/3:1H-123 [003] ..s.96.696213: inet_bind_hash: add 946 Bah, it's a kernel thread doing it, via a work queue. 
I then added a trace_dump_stack() to find what was calling this, and here it is: kworker/3:1H-123 [003] ..s.96.696222: stack trace = inet_csk_get_port = inet_addr_type = inet_bind = xs_bind = sock_setsockopt = __sock_create = xs_create_sock.isra.18 = xs_tcp_setup_socket = process_one_work = worker_thread = worker_thread = kthread = kthread = ret_from_fork = kthread I rebooted, and examined what happens. I see the kworker binding that port, and all seems well: # netstat -tapn |grep 946 tcp0 0 192.168.23.9:946192.168.23.22:55201 ESTABLISHED - But waiting for a bit, the connection goes into a TIME_WAIT, and then it just disappears. But the bind to the port does not get released, and that port is from then on, taken. This never happened with my 3.19 kernels. I would bisect it but this is happening on my main server box which I usually only reboot every other month doing upgrades. It causes too much disturbance for myself (and my family) as when this box is offline, basically the rest of my machines are too. I figured this may be enough information to see if you can fix it. Otherwise I can try to do the bisect, but that's not going to happen any time soon. I may just go back to 3.19 for now, such that rkhunter stops complaining about the hidden port. The only new thing that we're doing with 4.0 is to set SO_REUSEPORT on the socket before binding the port (commit 4dda9c8a5e34: SUNRPC: Set SO_REUSEPORT socket option for TCP connections). Perhaps there is an issue with that? Strange, because the usual way to not have time-wait is to use SO_LINGER with linger=0 And apparently xs_tcp_finish_connecting() has this : sock_reset_flag(sk, SOCK_LINGER); tcp_sk(sk)-linger2 = 0; Are you sure SO_REUSEADDR was not the thing you wanted ? Steven, have you tried kmemleak ? -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
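For reference, the two socket options mentioned in this reply look like this from userspace. This is only an analogue to make the distinction concrete, not the SUNRPC kernel code, and sketch_no_timewait_opts() is a hypothetical helper. SO_REUSEADDR lets a later bind() reuse a local port that still has connections in TIME_WAIT, while SO_LINGER with a zero timeout makes close() reset the connection instead of going through TIME_WAIT.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Apply SO_REUSEADDR and a zero-timeout SO_LINGER to an existing socket.
 * Returns 0 on success, -1 if either setsockopt() call fails.
 */
static int sketch_no_timewait_opts(int fd)
{
    int one = 1;
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };

    assert(fd >= 0);
    /* Allow bind() to a port whose old connections sit in TIME_WAIT. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0)
        return -1;
    /* With linger on and a zero timeout, close() aborts the connection. */
    if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg)) < 0)
        return -1;
    return 0;
}
```

A caller would apply this right after socket() and before bind(). Whether either option is appropriate for the kernel RPC transport is exactly what the thread is debating, so the sketch takes no position on that.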
Re: [PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex
Hi Florian,

On 06/12/2015 10:18 AM, Andrew Lunn wrote:
> By default, DSA and CPU ports are configured to the maximum speed the
> switch supports. However there can be use cases where the peer device
> port is slower. Allow a fixed-link property to be used with the DSA
> and CPU port in the device tree, and use this information to configure
> the port.
>
> Signed-off-by: Andrew Lunn and...@lunn.ch
> ---
>  include/net/dsa.h |  1 +
>  net/dsa/dsa.c     | 39 +++++++++++++++++++++++++++++++++++++++
>  2 files changed, 40 insertions(+)
>
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index fbca63ba8f73..24572f99224c 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -160,6 +160,7 @@ struct dsa_switch {
>  	 * Slave mii_bus and devices for the individual ports.
>  	 */
>  	u32 dsa_port_mask;
> +	u32 cpu_port_mask;
>  	u32 phys_port_mask;
>  	u32 phys_mii_mask;
>  	struct mii_bus *slave_mii_bus;
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 392e29a0227d..f9c8f4e7ebce 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -176,6 +176,36 @@ __ATTRIBUTE_GROUPS(dsa_hwmon);
>  #endif /* CONFIG_NET_DSA_HWMON */
>
>  /* basic switch operations **/
> +static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master)
> +{
> +	struct dsa_chip_data *cd = ds->pd;
> +	struct device_node *port_dn;
> +	struct phy_device *phydev;
> +	int ret, port;
> +
> +	for (port = 0; port < DSA_MAX_PORTS; port++) {
> +		if (!((ds->cpu_port_mask | ds->dsa_port_mask) & (1 << port)))
> +			continue;
> +

How does cpu_port_mask interact / interfere / coexist with dst->cpu_port and dsa_is_cpu_port()? Elsewhere we have

	if (dsa_is_cpu_port(ds, p) || ds->dsa_port_mask & (1 << p))

so I don't entirely see why we need to add cpu_port_mask at this time. Shouldn't that be a separate patch, maybe with a new macro / function to check if the port is a cpu port or an external switch port?

Thanks,
Guenter
Re: [PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex
On 06/12/2015 11:14 AM, Florian Fainelli wrote: On 12/06/15 10:18, Andrew Lunn wrote: By default, DSA and CPU ports are configured to the maximum speed the switch supports. However there can be use cases where the peer device port is slower. Allow a fixed-link property to be used with the DSA and CPU port in the device tree, and use this information to configure the port. Humm, I suppose this means that we might end-up with two fixed PHY devices, one for the Ethernet MAC, and another one for the switch? That might duplicate the same information, though I cannot think of a better solution than using phandles to resolve that. Signed-off-by: Andrew Lunn and...@lunn.ch --- include/net/dsa.h | 1 + net/dsa/dsa.c | 39 +++ 2 files changed, 40 insertions(+) diff --git a/include/net/dsa.h b/include/net/dsa.h index fbca63ba8f73..24572f99224c 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -160,6 +160,7 @@ struct dsa_switch { * Slave mii_bus and devices for the individual ports. */ u32 dsa_port_mask; + u32 cpu_port_mask; u32 phys_port_mask; u32 phys_mii_mask; struct mii_bus *slave_mii_bus; diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 392e29a0227d..f9c8f4e7ebce 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -176,6 +176,36 @@ __ATTRIBUTE_GROUPS(dsa_hwmon); #endif /* CONFIG_NET_DSA_HWMON */ /* basic switch operations **/ +static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master) +{ + struct dsa_chip_data *cd = ds-pd; + struct device_node *port_dn; + struct phy_device *phydev; + int ret, port; + + for (port = 0; port DSA_MAX_PORTS; port++) { + if (!((ds-cpu_port_mask | ds-dsa_port_mask) (1 port))) + continue; + + port_dn = cd-port_dn[port]; + if (of_phy_is_fixed_link(port_dn)) { + ret = of_phy_register_fixed_link(port_dn); + if (ret) { + netdev_err(master, + failed to register fixed PHY\n); + return ret; + } + phydev = of_phy_find_device(port_dn); + phydev-is_pseudo_fixed_link = true; + genphy_config_init(phydev); + genphy_read_status(phydev); 
I was curious as to why you were doing this at first, but I guess this is because the PHY state machine is not started for this fixed PHY that you just created, right? + if (ds-drv-adjust_link) + ds-drv-adjust_link(ds, port, phydev); + } + } + return 0; +} + static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent) { struct dsa_switch_driver *drv = ds-drv; @@ -204,6 +234,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent) } dst-cpu_switch = index; dst-cpu_port = i; + ds-cpu_port_mask |= 1 i; Same question as Guenter here, I assume this is because you plan on having multiple CPU ports connected to the switch and this makes it easier to deal with, is that right? If so, should that be done in a separate patch set ? Guenter -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex
On 12/06/15 10:18, Andrew Lunn wrote: By default, DSA and CPU ports are configured to the maximum speed the switch supports. However there can be use cases where the peer device port is slower. Allow a fixed-link property to be used with the DSA and CPU port in the device tree, and use this information to configure the port. Humm, I suppose this means that we might end-up with two fixed PHY devices, one for the Ethernet MAC, and another one for the switch? That might duplicate the same information, though I cannot think of a better solution than using phandles to resolve that. Signed-off-by: Andrew Lunn and...@lunn.ch --- include/net/dsa.h | 1 + net/dsa/dsa.c | 39 +++ 2 files changed, 40 insertions(+) diff --git a/include/net/dsa.h b/include/net/dsa.h index fbca63ba8f73..24572f99224c 100644 --- a/include/net/dsa.h +++ b/include/net/dsa.h @@ -160,6 +160,7 @@ struct dsa_switch { * Slave mii_bus and devices for the individual ports. */ u32 dsa_port_mask; + u32 cpu_port_mask; u32 phys_port_mask; u32 phys_mii_mask; struct mii_bus *slave_mii_bus; diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c index 392e29a0227d..f9c8f4e7ebce 100644 --- a/net/dsa/dsa.c +++ b/net/dsa/dsa.c @@ -176,6 +176,36 @@ __ATTRIBUTE_GROUPS(dsa_hwmon); #endif /* CONFIG_NET_DSA_HWMON */ /* basic switch operations **/ +static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master) +{ + struct dsa_chip_data *cd = ds-pd; + struct device_node *port_dn; + struct phy_device *phydev; + int ret, port; + + for (port = 0; port DSA_MAX_PORTS; port++) { + if (!((ds-cpu_port_mask | ds-dsa_port_mask) (1 port))) + continue; + + port_dn = cd-port_dn[port]; + if (of_phy_is_fixed_link(port_dn)) { + ret = of_phy_register_fixed_link(port_dn); + if (ret) { + netdev_err(master, +failed to register fixed PHY\n); + return ret; + } + phydev = of_phy_find_device(port_dn); + phydev-is_pseudo_fixed_link = true; + genphy_config_init(phydev); + genphy_read_status(phydev); I was curious as to why you were doing this at 
first, but I guess this is because the PHY state machine is not started for this fixed PHY that you just created, right? + if (ds-drv-adjust_link) + ds-drv-adjust_link(ds, port, phydev); + } + } + return 0; +} + static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent) { struct dsa_switch_driver *drv = ds-drv; @@ -204,6 +234,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent) } dst-cpu_switch = index; dst-cpu_port = i; + ds-cpu_port_mask |= 1 i; Same question as Guenter here, I assume this is because you plan on having multiple CPU ports connected to the switch and this makes it easier to deal with, is that right? } else if (!strcmp(name, dsa)) { ds-dsa_port_mask |= 1 i; } else { @@ -297,6 +328,14 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent) } } + /* Perform configuration of the CPU and DSA ports */ + ret = dsa_cpu_dsa_setup(ds, dst-master_netdev); + if (ret 0) { + netdev_err(dst-master_netdev, [%d] : can't configure CPU and DSA ports\n, +index); + ret = 0; + } + #ifdef CONFIG_NET_DSA_HWMON /* If the switch provides a temperature sensor, * register with hardware monitoring subsystem. -- Florian -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )
On Thu, Jun 11, 2015 at 11:49 PM, Steven Rostedt rost...@goodmis.org wrote: I recently upgraded my main server to 4.0.4 from 3.19.5 and rkhunter started reporting a hidden port on my box. Running unhide-tcp I see this: # unhide-tcp Unhide-tcp 20121229 Copyright © 2012 Yago Jesus Patrick Gouin License GPLv3+ : GNU GPL version 3 or later http://www.unhide-forensics.info Used options: [*]Starting TCP checking Found Hidden port that not appears in ss: 946 [*]Starting UDP checking This scared the hell out of me as I'm thinking that I have got some kind of NSA backdoor hooked into my server and it is monitoring my plans to smuggle Kinder Überraschung into the USA from Germany. I panicked! Well, I wasted the day writing modules to first look at all the sockets opened by all processes (via their file descriptors) and posted their port numbers. http://rostedt.homelinux.com/private/tasklist.c But this port wasn't there either. Then I decided to look at the ports in tcp_hashinfo. http://rostedt.homelinux.com/private/portlist.c This found the port but no file was connected to it, and worse yet, when I first ran it without using probe_kernel_read(), it crashed my kernel, because sk-sk_socket pointed to a freed socket! Note, each boot, the hidden port is different. Finally, I decided to bring in the big guns, and inserted a trace_printk() into the bind logic, to see if I could find the culprit. After fiddling with it a few times, I found a suspect: kworker/3:1H-123 [003] ..s.96.696213: inet_bind_hash: add 946 Bah, it's a kernel thread doing it, via a work queue. I then added a trace_dump_stack() to find what was calling this, and here it is: kworker/3:1H-123 [003] ..s.96.696222: stack trace = inet_csk_get_port = inet_addr_type = inet_bind = xs_bind = sock_setsockopt = __sock_create = xs_create_sock.isra.18 = xs_tcp_setup_socket = process_one_work = worker_thread = worker_thread = kthread = kthread = ret_from_fork = kthread I rebooted, and examined what happens. 
I see the kworker binding that port, and all seems well: # netstat -tapn |grep 946 tcp0 0 192.168.23.9:946192.168.23.22:55201 ESTABLISHED - But waiting for a bit, the connection goes into a TIME_WAIT, and then it just disappears. But the bind to the port does not get released, and that port is from then on, taken. This never happened with my 3.19 kernels. I would bisect it but this is happening on my main server box which I usually only reboot every other month doing upgrades. It causes too much disturbance for myself (and my family) as when this box is offline, basically the rest of my machines are too. I figured this may be enough information to see if you can fix it. Otherwise I can try to do the bisect, but that's not going to happen any time soon. I may just go back to 3.19 for now, such that rkhunter stops complaining about the hidden port. The only new thing that we're doing with 4.0 is to set SO_REUSEPORT on the socket before binding the port (commit 4dda9c8a5e34: SUNRPC: Set SO_REUSEPORT socket option for TCP connections). Perhaps there is an issue with that? Cheers Trond -- To unsubscribe from this list: send the line unsubscribe netdev in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html