RE: [V6 PATCH 6/7] megaraid_sas: fix TRUE and FALSE re-define build error

2015-06-12 Thread Sumit Saxena
-----Original Message-----
From: Suravee Suthikulpanit [mailto:suravee.suthikulpa...@amd.com]
Sent: Wednesday, June 10, 2015 9:39 PM
To: r...@rjwysocki.net; l...@kernel.org; catalin.mari...@arm.com;
will.dea...@arm.com; thomas.lenda...@amd.com;
herb...@gondor.apana.org.au; da...@davemloft.net; a...@arndb.de;
kashyap.de...@avagotech.com; sumit.sax...@avagotech.com;
uday.ling...@avagotech.com; vinholika...@gmail.com
Cc: msal...@redhat.com; hanjun@linaro.org; al.st...@linaro.org;
grant.lik...@linaro.org; leo.du...@amd.com; linux-arm-
ker...@lists.infradead.org; linux-a...@vger.kernel.org; linux-
ker...@vger.kernel.org; linaro-a...@lists.linaro.org;
netdev@vger.kernel.org;
linux-cry...@vger.kernel.org; Suravee Suthikulpanit
Subject: [V6 PATCH 6/7] megaraid_sas: fix TRUE and FALSE re-define build
error

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpa...@amd.com>
Cc: Kashyap Desai <kashyap.de...@avagotech.com>
Cc: Sumit Saxena <sumit.sax...@avagotech.com>
Cc: Uday Lingala <uday.ling...@avagotech.com>
---
 drivers/scsi/megaraid/megaraid_sas_fp.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/scsi/megaraid/megaraid_sas_fp.c b/drivers/scsi/megaraid/megaraid_sas_fp.c
index 4f72287..e8b7a69 100644
--- a/drivers/scsi/megaraid/megaraid_sas_fp.c
+++ b/drivers/scsi/megaraid/megaraid_sas_fp.c
@@ -66,7 +66,15 @@ MODULE_PARM_DESC(lb_pending_cmds, "Change raid-1 load balancing outstanding "

 #define ABS_DIFF(a, b)   (((a) > (b)) ? ((a) - (b)) : ((b) - (a)))
 #define MR_LD_STATE_OPTIMAL 3
+
+#ifdef FALSE
+#undef FALSE
+#endif
 #define FALSE 0
+
+#ifdef TRUE
+#undef TRUE
+#endif
 #define TRUE 1

 #define SPAN_DEBUG 0
Acked-by: Sumit Saxena <sumit.sax...@avagotech.com>

--
2.1.0
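
The guard used in the patch above can be tried outside the kernel. The sketch below is only an illustration: the earlier definitions of TRUE and FALSE stand in for whatever header defined them first, and in_range() is a hypothetical helper, not driver code.

```c
#include <assert.h>

/* Stand-in for an earlier header that already defines TRUE and FALSE
 * (hypothetical values, written differently from the driver's). */
#define FALSE (1 == 0)
#define TRUE  (!FALSE)

/* Same guard as the patch: drop any earlier definition, then define
 * the value this file expects. Without the #undef, redefining a macro
 * with different tokens is diagnosed by the compiler. */
#ifdef FALSE
#undef FALSE
#endif
#define FALSE 0

#ifdef TRUE
#undef TRUE
#endif
#define TRUE 1

/* Hypothetical helper: returns TRUE when x lies in [lo, hi]. */
static int in_range(int x, int lo, int hi)
{
	assert(lo <= hi);
	return (x >= lo && x <= hi) ? TRUE : FALSE;
}
```

After the guard, TRUE and FALSE expand to plain 1 and 0 regardless of what was defined earlier.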
--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] net: use atomic allocation for order-3 page allocation

2015-06-12 Thread Vlastimil Babka

On 06/11/2015 11:28 PM, Debabrata Banerjee wrote:
> Resend in plaintext, thanks gmail:
>
> It's somewhat an intractable problem to know if compaction will succeed
> without trying it,

There are heuristics, but those cannot be perfect by definition. I think
the worse problem here is the extra latency, even if it does succeed,
though.

> and you can certainly end up in a state where memory is
> heavily fragmented, even with compaction running. You can't compact kernel
> pages for example, so you can end up in a state where compaction does
> nothing through no fault of its own.

Correct.

> In this case you waste time in compaction routines, then end up reclaiming
> precious page cache pages or swapping out for whatever it is your machine
> was doing, trying to satisfy these order-3 allocations, after which all
> those pages need to be restored from disk almost immediately. This is not a
> happy server.

That sounds like an overloaded server to me.

> Any mm fix may be years away.

Well, what kind of fix? There's no way to always avoid fragmentation
without some kind of an oracle that will tell you which unmovable
allocations (e.g. kernel pages) to put side by side because they will be
freed at the same time.

> The only simple solution I can
> think of is specifically caching these allocations, in any other case under
> memory pressure they will be split by other smaller allocations.

In this case the allocations have a simple fallback to order-0, so caching
them would make sense only if someone shows that the benefits of having
order-3 instead of order-0 allocations are worth it.

> We've been forcing these allocations to order-0 internally until we can
> think of something else.

I think the proposed patch is better than forcing everything to order-0.
It makes the attempt to allocate order-3 cheap.

The VM should generally serve you better if it's told your requirements.
Communicating that the order-3 allocation is just an opportunistic
attempt with a simple fallback is the right way.
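
To make the "opportunistic attempt with simple fallback" idea concrete, here is a small userspace model. It is a sketch under stated assumptions, not kernel code: the SK_* flag values, sk_alloc() and sk_frag_refill() are hypothetical stand-ins, and "free_order" is a toy description of how fragmented memory is.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag bits; the values are illustrative, not the kernel's. */
#define SK_GFP_WAIT	0x01u	/* caller may sleep: compaction allowed */
#define SK_GFP_COMP	0x02u
#define SK_GFP_NOWARN	0x04u
#define SK_GFP_NORETRY	0x08u

#define SK_FRAG_ORDER	3

struct sk_alloc_stats {
	unsigned int compaction_runs;	/* how often the slow path ran */
};

/* Toy allocator: a block of 'order' is available only if the largest
 * free block ('free_order') is at least that big. A request that is
 * allowed to wait pays for a compaction run when it is not. */
static bool sk_alloc(unsigned int gfp, unsigned int order,
		     unsigned int free_order, struct sk_alloc_stats *st)
{
	if (order <= free_order)
		return true;
	if (gfp & SK_GFP_WAIT)
		st->compaction_runs++;	/* expensive slow path */
	return false;
}

/* Try order-3 without permission to wait, then fall back to order-0
 * with the caller's original flags. Returns the order obtained, or -1. */
static int sk_frag_refill(unsigned int gfp, unsigned int free_order,
			  struct sk_alloc_stats *st)
{
	unsigned int opportunistic = (gfp & ~SK_GFP_WAIT) | SK_GFP_COMP |
				     SK_GFP_NOWARN | SK_GFP_NORETRY;

	assert(st != NULL);
	if (sk_alloc(opportunistic, SK_FRAG_ORDER, free_order, st))
		return SK_FRAG_ORDER;
	return sk_alloc(gfp, 0, free_order, st) ? 0 : -1;
}
```

In this model a caller that passes SK_GFP_WAIT still gets order-3 when it is free, and drops straight to order-0 on fragmented memory without ever bumping compaction_runs.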



> -Deb
>
> On Thu, Jun 11, 2015 at 4:48 PM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>> On Thu, 2015-06-11 at 13:24 -0700, Shaohua Li wrote:
>>> We saw excessive memory compaction triggered by skb_page_frag_refill.
>>> This causes performance issues. Commit 5640f7685831e0 introduced the
>>> order-3 allocation to improve performance, but memory compaction has
>>> high overhead. The benefit of order-3 allocation can't compensate for
>>> the overhead of memory compaction.
>>>
>>> This patch makes the order-3 page allocation atomic. If there is no
>>> memory pressure and memory isn't fragmented, the allocation will still
>>> succeed, so we don't sacrifice the order-3 benefit here. If the atomic
>>> allocation fails, compaction will not be triggered and we will fall
>>> back to order-0 immediately.
>>>
>>> The mellanox driver does a similar thing; if this is accepted, we must
>>> fix the driver too.
>>>
>>> Cc: Eric Dumazet <eduma...@google.com>
>>> Signed-off-by: Shaohua Li <s...@fb.com>
>>> ---
>>>   net/core/sock.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/net/core/sock.c b/net/core/sock.c
>>> index 292f422..e9855a4 100644
>>> --- a/net/core/sock.c
>>> +++ b/net/core/sock.c
>>> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
>>>
>>>          pfrag->offset = 0;
>>>          if (SKB_FRAG_PAGE_ORDER) {
>>> -                pfrag->page = alloc_pages(gfp | __GFP_COMP |
>>> +                pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP |
>>>                                            __GFP_NOWARN | __GFP_NORETRY,
>>>                                            SKB_FRAG_PAGE_ORDER);
>>>                  if (likely(pfrag->page)) {
>>
>> This is not a specific networking issue, but an mm one.
>>
>> You really need to start a discussion with mm experts.
>>
>> Your changelog does not exactly explain what _is_ the problem.
>>
>> If the problem lies in the mm layer, it might be time to fix it, instead
>> of working around the bug by never triggering it from this particular
>> point, which is a safe point where a process is willing to wait a bit.
>>
>> Memory compaction is either working as intended, or not.
>>
>> If we enabled it but never run it because it hurts, what is the point
>> of enabling it?



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majord...@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: a href=mailto:d...@kvack.org; em...@kvack.org /a







[PATCH 3.4] ipv6: add check for blackhole or prohibited entry in rt6_redirect

2015-06-12 Thread Chen Weilong
From: Weilong Chen <chenweil...@huawei.com>

There's a check for ip6_null_entry, but it's not enough if
CONFIG_IPV6_MULTIPLE_TABLES is selected. Blackhole or prohibited entries
should also be ignored.

This patch is for kernels before v3.6, since commit b94f1c0 uses
icmpv6_notify() instead of rt6_redirect(), and rt6_redirect() has
since been deleted.

The oops is as follows:
[exception RIP: do_raw_write_lock+12]
RIP: 8122c42c  RSP: 880666e45820  RFLAGS: 00010282
RAX: 8801207bffd8  RBX: 0018  RCX: 
RDX:   RSI: 880666e45898  RDI: 0018
RBP: 880666e45830   R8: 001e   R9: 0600
R10: 88011796b8a0  R11: 0004  R12: 88010391ed00
R13:   R14: 880666e45898  R15: 88011796b890
ORIG_RAX:   CS: 0010  SS: 0018
[880666e45838] _raw_write_lock_bh at 81450b39
[880666e45858] __ip6_ins_rt at 813ed8c1
[880666e45888] ip6_ins_rt at 813eef58
[880666e458b8] rt6_redirect at 813f0b84
[880666e45958] ndisc_rcv at 813f95d8
[880666e45a08] icmpv6_rcv at 814000e8
[880666e45ae8] ip6_input_finish at 813e43bb
[880666e45b38] ip6_input at 813e4b08
[880666e45b68] ipv6_rcv at 813e4969
[880666e45bc8] __netif_receive_skb at 8135158a
[880666e45c38] dev_gro_receive at 81351cb0
[880666e45c78] napi_gro_receive at 81351fc5
[880666e45cb8] tg3_rx at a0bfb354 [tg]
[880666e45d88] tg3_poll_work at a0c07857 [tg]
[880666e45e18] tg3_poll_msix at a0c07d1b [tg]
[880666e45e68] net_rx_action at 81352219
[880666e45ec8] __do_softirq at 8103e5a1
[880666e45f38] call_softirq at 81459c4c
[880666e45f50] do_softirq at 8100413d
[880666e45f80] do_IRQ at 81003cce
This happened when ip6_route_redirect() found a route that was set to
blackhole. That route had a NULL rt6i_table member, which __ip6_ins_rt()
dereferences when it tries to lock rt6i_table->tb6_lock, causing:
BUG: unable to handle kernel NULL pointer

Signed-off-by: Weilong Chen <chenweil...@huawei.com>
---
 net/ipv6/route.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index c8643a3..c604751 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1661,6 +1661,17 @@ void rt6_redirect(const struct in6_addr *dest, const struct in6_addr *src,
 		goto out;
 	}
 
+#ifdef CONFIG_IPV6_MULTIPLE_TABLES
+	if (rt == net->ipv6.ip6_blk_hole_entry ||
+	    rt == net->ipv6.ip6_prohibit_entry) {
+		if (net_ratelimit())
+			printk(KERN_DEBUG "rt6_redirect: source isn't a valid " \
+			       "nexthop for redirect target " \
+			       "(blackhole or prohibited)\n");
+		goto out;
+	}
+#endif
+
 	/*
 	 *	We have finally decided to accept it.
 	 */
-- 
1.7.12
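
The failure mode and the fix can be modelled in a few lines of userspace C. This is a simplified sketch, assuming that sentinel routes carry no table pointer as the changelog describes; every sk_* name is hypothetical and the real rt6_redirect() does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_table {
	int lock_depth;			/* stands in for tb6_lock */
};

struct sk_route {
	struct sk_table *table;		/* NULL for sentinel entries */
};

/* Per-namespace sentinel routes (hypothetical model of ip6_null_entry,
 * ip6_blk_hole_entry and ip6_prohibit_entry). */
struct sk_net {
	struct sk_route null_entry;
	struct sk_route blk_hole_entry;
	struct sk_route prohibit_entry;
};

static bool sk_is_sentinel(const struct sk_net *net, const struct sk_route *rt)
{
	return rt == &net->null_entry ||
	       rt == &net->blk_hole_entry ||
	       rt == &net->prohibit_entry;
}

/* Models the insert step: it takes the table lock, so it must never
 * see a route without a table. */
static void sk_insert(struct sk_route *rt)
{
	assert(rt->table != NULL);
	rt->table->lock_depth++;
}

/* Returns true if the redirect was accepted. Sentinels are rejected
 * before the insert step can dereference their NULL table. */
static bool sk_redirect(const struct sk_net *net, struct sk_route *rt)
{
	if (sk_is_sentinel(net, rt))
		return false;
	sk_insert(rt);
	return true;
}
```

Checking only null_entry in sk_is_sentinel() would let the other two sentinels reach sk_insert(), which is the situation the oops shows.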






[PATCH v5 20/24] posix-clock: Convert to y2038 safe callbacks

2015-06-12 Thread Baolin Wang
The clock_getres()/clock_get()/clock_set()/timer_set()/timer_get()
callbacks in struct k_clock are not year 2038 safe on 32-bit systems,
and need to be converted to safe callbacks that use struct timespec64
or struct itimerspec64.

The clock_gettime()/clock_settime()/clock_getres()/timer_gettime()/
timer_settime() callbacks in struct posix_clock_operations are not
year 2038 safe on 32-bit systems, and need to be converted to year 2038
safe callbacks that use struct timespec64 or struct itimerspec64.

Signed-off-by: Baolin Wang <baolin.w...@linaro.org>
---
 drivers/ptp/ptp_clock.c |   22 +++---
 include/linux/posix-clock.h |   10 +-
 kernel/time/posix-clock.c   |   20 ++--
 3 files changed, 22 insertions(+), 30 deletions(-)

diff --git a/drivers/ptp/ptp_clock.c b/drivers/ptp/ptp_clock.c
index 2e481b9..7040f20 100644
--- a/drivers/ptp/ptp_clock.c
+++ b/drivers/ptp/ptp_clock.c
@@ -97,31 +97,25 @@ static s32 scaled_ppm_to_ppb(long ppm)
 
 /* posix clock implementation */
 
-static int ptp_clock_getres(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_getres(struct posix_clock *pc, struct timespec64 *tp)
 {
 	tp->tv_sec = 0;
 	tp->tv_nsec = 1;
 	return 0;
 }
 
-static int ptp_clock_settime(struct posix_clock *pc, const struct timespec *tp)
+static int ptp_clock_settime(struct posix_clock *pc, const struct timespec64 *tp)
 {
 	struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-	struct timespec64 ts = timespec_to_timespec64(*tp);
 
-	return  ptp->info->settime64(ptp->info, &ts);
+	return  ptp->info->settime64(ptp->info, tp);
 }
 
-static int ptp_clock_gettime(struct posix_clock *pc, struct timespec *tp)
+static int ptp_clock_gettime(struct posix_clock *pc, struct timespec64 *tp)
 {
 	struct ptp_clock *ptp = container_of(pc, struct ptp_clock, clock);
-	struct timespec64 ts;
-	int err;
 
-	err = ptp->info->gettime64(ptp->info, &ts);
-	if (!err)
-		*tp = timespec64_to_timespec(ts);
-	return err;
+	return ptp->info->gettime64(ptp->info, tp);
 }
 
 static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
@@ -133,8 +127,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
 	ops = ptp->info;
 
 	if (tx->modes & ADJ_SETOFFSET) {
-		struct timespec ts;
-		ktime_t kt;
+		struct timespec64 ts;
 		s64 delta;
 
 		ts.tv_sec  = tx->time.tv_sec;
@@ -146,8 +139,7 @@ static int ptp_clock_adjtime(struct posix_clock *pc, struct timex *tx)
 		if ((unsigned long) ts.tv_nsec >= NSEC_PER_SEC)
 			return -EINVAL;
 
-		kt = timespec_to_ktime(ts);
-		delta = ktime_to_ns(kt);
+		delta = timespec64_to_ns(&ts);
 		err = ops->adjtime(ops, delta);
 	} else if (tx->modes & ADJ_FREQUENCY) {
 		s32 ppb = scaled_ppm_to_ppb(tx->freq);
diff --git a/include/linux/posix-clock.h b/include/linux/posix-clock.h
index 34c4498..83b22ae 100644
--- a/include/linux/posix-clock.h
+++ b/include/linux/posix-clock.h
@@ -59,23 +59,23 @@ struct posix_clock_operations {
 
int  (*clock_adjtime)(struct posix_clock *pc, struct timex *tx);
 
-   int  (*clock_gettime)(struct posix_clock *pc, struct timespec *ts);
+   int  (*clock_gettime)(struct posix_clock *pc, struct timespec64 *ts);
 
-   int  (*clock_getres) (struct posix_clock *pc, struct timespec *ts);
+   int  (*clock_getres) (struct posix_clock *pc, struct timespec64 *ts);
 
int  (*clock_settime)(struct posix_clock *pc,
- const struct timespec *ts);
+ const struct timespec64 *ts);
 
int  (*timer_create) (struct posix_clock *pc, struct k_itimer *kit);
 
int  (*timer_delete) (struct posix_clock *pc, struct k_itimer *kit);
 
void (*timer_gettime)(struct posix_clock *pc,
- struct k_itimer *kit, struct itimerspec *tsp);
+ struct k_itimer *kit, struct itimerspec64 *tsp);
 
int  (*timer_settime)(struct posix_clock *pc,
  struct k_itimer *kit, int flags,
- struct itimerspec *tsp, struct itimerspec *old);
+			      struct itimerspec64 *tsp, struct itimerspec64 *old);
/*
 * Optional character device methods:
 */
diff --git a/kernel/time/posix-clock.c b/kernel/time/posix-clock.c
index ce033c7..e21e4c1 100644
--- a/kernel/time/posix-clock.c
+++ b/kernel/time/posix-clock.c
@@ -297,7 +297,7 @@ out:
return err;
 }
 
-static int pc_clock_gettime(clockid_t id, struct timespec *ts)
+static int pc_clock_gettime(clockid_t id, struct timespec64 *ts)
 {
struct posix_clock_desc cd;
int err;
@@ -316,7 +316,7 @@ static int pc_clock_gettime(clockid_t id, struct timespec 
*ts)
return 

[PATCH v5 00/24] Convert the posix_clock_operations and k_clock structure to ready for 2038

2015-06-12 Thread Baolin Wang
This patch series changes the 32-bit time types (timespec/itimerspec) to
the 64-bit types (timespec64/itimerspec64), since the 32-bit time types
will break in the year 2038 on 32-bit systems.

This patch series introduces new methods that take timespec64/itimerspec64,
and removes the old ones that take timespec/itimerspec, for the
posix_clock_operations and k_clock structures.
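
For readers unfamiliar with the 2038 limit, the sketch below shows where a signed 32-bit seconds counter runs out and how a 64-bit seconds field avoids it. It is a userspace illustration only; struct sk_timespec64 and the sk_* helpers are hypothetical names modelled loosely on the kernel types mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_NSEC_PER_SEC 1000000000LL

/* Hypothetical 64-bit time value: seconds are 64 bits wide on every
 * architecture, unlike a 32-bit time_t. */
struct sk_timespec64 {
	int64_t tv_sec;
	long	tv_nsec;
};

/* A signed 32-bit seconds counter tops out at INT32_MAX seconds after
 * the epoch, which is 2038-01-19 03:14:07 UTC. */
static bool sk_fits_time32(int64_t secs)
{
	return secs >= INT32_MIN && secs <= INT32_MAX;
}

/* Convert to nanoseconds. No overflow check: this sketch assumes
 * tv_sec is small enough for the product to fit in 64 bits. */
static int64_t sk_timespec64_to_ns(const struct sk_timespec64 *ts)
{
	assert(ts->tv_nsec >= 0 && ts->tv_nsec < SK_NSEC_PER_SEC);
	return ts->tv_sec * SK_NSEC_PER_SEC + ts->tv_nsec;
}
```

One second past INT32_MAX no longer fits a 32-bit time type, yet it is an ordinary value for the 64-bit struct.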

---
Changes since v4:
- Rebase the patch series.
- Modify the subject line and the changelog.

Changes since v3:
- Fix some newly introduced bugs.

Changes since v2:
- Split the syscall conversion patch into several smaller patches.

Changes since v1:
- Split some patches into smaller patches.
- Add default functions for the new 64-bit methods for the syscall functions.
- Move the do_sys_settimeofday() function to a header file.
- Fix the EXPORT_SYMBOL issue.
- Add new 64-bit methods in the cputime_nsecs.h file.
---

Baolin Wang (24):
  time: Introduce struct itimerspec64
  timekeeping: Introduce current_kernel_time64()
  security: Introduce security_settime64()
  time: Introduce do_sys_settimeofday64()
  posix-timers: Introduce {get,put}_timespec and {get,put}_itimerspec
  posix-timers: Factor out the guts of 'timer_gettime'
  posix-timers: Implement y2038 safe timer_get64() callback
  posix-timers: Factor out the guts of 'timer_settime'
  posix-timers: Implement y2038 safe timer_set64() callback
  posix-timers: Factor out the guts of 'clock_settime'
  posix-timers: Implement y2038 safe clock_set64() callback
  posix-timers: Factor out the guts of 'clock_gettime'
  posix-timers: Implement y2038 safe clock_get64() callback
  posix-timers: Factor out the guts of 'clock_getres'
  posix-timers: Implement y2038 safe clock_getres64() callback
  timekeeping: Change the implementation of timekeeping_clocktai()
  posix-timers: Convert to y2038 safe callbacks
  mmtimer: Convert to y2038 safe callbacks
  alarmtimer: Convert to y2038 safe callbacks
  posix-clock: Convert to y2038 safe callbacks
  time: Introduce timespec64_to_jiffies()/jiffies_to_timespec64()
  cputime: Introduce cputime_to_timespec64()/timespec64_to_cputime()
  posix-cpu-timers: Convert to y2038 safe callbacks
  k_clock: Remove y2038 unsafe callbacks

 arch/powerpc/include/asm/cputime.h|6 +-
 arch/s390/include/asm/cputime.h   |8 +-
 drivers/char/mmtimer.c|   36 +++--
 drivers/ptp/ptp_clock.c   |   22 +--
 include/asm-generic/cputime_jiffies.h |   10 +-
 include/asm-generic/cputime_nsecs.h   |6 +-
 include/linux/cputime.h   |   16 ++
 include/linux/jiffies.h   |   21 ++-
 include/linux/lsm_hooks.h |5 +-
 include/linux/posix-clock.h   |   10 +-
 include/linux/posix-timers.h  |   18 +--
 include/linux/security.h  |   20 ++-
 include/linux/time64.h|   35 +
 include/linux/timekeeping.h   |   25 +++-
 kernel/time/alarmtimer.c  |   38 ++---
 kernel/time/posix-clock.c |   20 +--
 kernel/time/posix-cpu-timers.c|   84 ++-
 kernel/time/posix-timers.c|  257 +
 kernel/time/time.c|   19 +--
 kernel/time/timekeeping.c |6 +-
 security/commoncap.c  |2 +-
 security/security.c   |2 +-
 22 files changed, 412 insertions(+), 254 deletions(-)

-- 
1.7.9.5



Re: [RFC V3] net: don't wait for order-3 page allocation

2015-06-12 Thread Vlastimil Babka

On 06/12/2015 01:50 AM, Shaohua Li wrote:
> We saw excessive direct memory compaction triggered by skb_page_frag_refill.
> This causes performance issues and adds latency. Commit 5640f7685831e0
> introduced the order-3 allocation. According to the changelog, the order-3
> allocation isn't a must-have but is there to improve performance. Direct
> memory compaction has high overhead, though, and the benefit of order-3
> allocation can't compensate for it.
>
> This patch makes the order-3 page allocation atomic. If there is no memory
> pressure and memory isn't fragmented, the allocation will still succeed, so
> we don't sacrifice the order-3 benefit here. If the atomic allocation fails,
> direct memory compaction will not be triggered and skb_page_frag_refill will
> fall back to order-0 immediately, hence the direct memory compaction
> overhead is avoided. In the allocation failure case, kswapd is woken up and
> does compaction, so chances are the allocation could succeed next time.
>
> alloc_skb_with_frags is the same.
>
> The mellanox driver does a similar thing; if this is accepted, we must fix
> the driver too.
>
> V3: fix the same issue in alloc_skb_with_frags as pointed out by Eric
> V2: make the changelog clearer
>
> Cc: Eric Dumazet <eduma...@google.com>
> Cc: Chris Mason <c...@fb.com>
> Cc: Debabrata Banerjee <dbava...@gmail.com>
> Signed-off-by: Shaohua Li <s...@fb.com>

Acked-by: Vlastimil Babka <vba...@suse.cz>

> ---
>   net/core/skbuff.c | 2 +-
>   net/core/sock.c   | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 3cfff2a..41ec022 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4398,7 +4398,7 @@ struct sk_buff *alloc_skb_with_frags(unsigned long header_len,
>
>          while (order) {
>                  if (npages >= 1 << order) {
> -                        page = alloc_pages(gfp_mask |
> +                        page = alloc_pages((gfp_mask & ~__GFP_WAIT) |
>                                             __GFP_COMP |
>                                             __GFP_NOWARN |
>                                             __GFP_NORETRY,

Note that __GFP_NORETRY is weaker than ~__GFP_WAIT and thus redundant.
But it won't hurt anything leaving it there. And you might consider
__GFP_NO_KSWAPD instead, as I said in the other thread.

> diff --git a/net/core/sock.c b/net/core/sock.c
> index 292f422..e9855a4 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1883,7 +1883,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
>
>          pfrag->offset = 0;
>          if (SKB_FRAG_PAGE_ORDER) {
> -                pfrag->page = alloc_pages(gfp | __GFP_COMP |
> +                pfrag->page = alloc_pages((gfp & ~__GFP_WAIT) | __GFP_COMP |
>                                            __GFP_NOWARN | __GFP_NORETRY,
>                                            SKB_FRAG_PAGE_ORDER);
>                  if (likely(pfrag->page)) {





Re: [RFC net-next 0/3] Proposal for VRF-lite

2015-06-12 Thread Thomas Graf
On 06/10/15 at 01:43pm, Shrijeet Mukherjee wrote:
> On Tue, Jun 9, 2015 at 3:15 AM, Thomas Graf <tg...@suug.ch> wrote:
> > Do I understand this correctly that swp* represent veth pairs?
> > Why do you have distinct addresses on each peer of the pair?
> > Are the addresses in N2 and N3 considered private and NATed?
> >
> > [...]
>
> These are physical boxes in the picture not veth pairs or NAT's :)

I see. So if I translate this to a virtual world with veths where
the guest facing peer is in its own netns, the host facing veth
peer would get attached to a vrf device and we should be good.

> Are you worried about ip rule scale ? this reduces the scale to number of
> L3 domains, which should be not that large. I do think we need to speed up
> rule lookup from the linear walk we have right now.

I definitely have more L3 domains than what a linear search can
handle.

> A generic classifier seems like a bigger hammer, but if that is the way to
> replace rules it is a worthy concept.
>
> That said, the patches from Hannes et al, will make it such that the table
> lookup maybe from the driver directly and thus will skip past the fib rule
> lookup.

The approach from Hannes definitely works for the physical world
but is undesirable for overlays, logical or encapsulations, where
we want to avoid maintaining a net_device for every virtual network.

As I said, I think this is something that can be resolved later on
with a programmable classifier.


Re: [RFC] net: use atomic allocation for order-3 page allocation

2015-06-12 Thread Vlastimil Babka

On 06/11/2015 11:35 PM, Debabrata Banerjee wrote:
> There is no "background", it doesn't matter if this activity happens
> synchronously or asynchronously, unless you're sensitive to the
> latency on that single operation. If you're driving all your cpu's and
> memory hard then this is work that still takes resources. If there's a
> kernel thread with compaction running, then obviously your process is
> not.

Well that of course depends on the CPU utilization of your process.

> Your patch should help in that not every atomic allocation failure
> should mean yet another run at compaction/reclaim.

If you don't want to wake up kswapd, also add the __GFP_NO_KSWAPD flag.
Additionally, gfp_to_alloc_flags() will then stop treating such an
allocation as atomic - it allows atomic allocations to bypass cpusets and
lowers the watermark by 1/4 (unless there's also __GFP_NOMEMALLOC). It
might actually make sense to add __GFP_NO_KSWAPD for an allocation like
this one that has a simple order-0 fallback.
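
The classification described above can be written down as a small model. This is a sketch built only from the description in this message, not a copy of gfp_to_alloc_flags(); the SK_* flag values, struct sk_alloc_policy and sk_policy() are hypothetical.

```c
#include <stdbool.h>

/* Stand-in flag bits; the values are illustrative, not the kernel's. */
#define SK_GFP_WAIT		0x01u
#define SK_GFP_NO_KSWAPD	0x02u
#define SK_GFP_NOMEMALLOC	0x04u

struct sk_alloc_policy {
	bool wake_kswapd;	/* background reclaim/compaction is kicked */
	bool bypass_cpusets;	/* atomic allocations may ignore cpusets */
	unsigned long watermark;/* free pages that must remain afterwards */
};

/* An allocation counts as atomic only if it may neither wait nor has
 * opted out of waking kswapd. Atomic allocations bypass cpusets and
 * get the watermark lowered by 1/4 unless NOMEMALLOC is set. */
static struct sk_alloc_policy sk_policy(unsigned int gfp,
					unsigned long min_watermark)
{
	struct sk_alloc_policy p;
	bool atomic = !(gfp & (SK_GFP_WAIT | SK_GFP_NO_KSWAPD));

	p.wake_kswapd = !(gfp & SK_GFP_NO_KSWAPD);
	p.bypass_cpusets = atomic;
	p.watermark = min_watermark;
	if (atomic && !(gfp & SK_GFP_NOMEMALLOC))
		p.watermark -= min_watermark / 4;
	return p;
}
```

With a watermark of 400 pages, clearing only the wait bit yields an atomic request that may dip to 300, while adding SK_GFP_NO_KSWAPD keeps the full 400 and leaves kswapd asleep.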


Vlastimil

> -Deb
>
> On Thu, Jun 11, 2015 at 5:16 PM, Chris Mason <c...@fb.com> wrote:
>> networking is asking for 32KB, and the MM layer is doing what it can to
>> provide it.  Are the gains from getting 32KB contig bigger than the cost
>> of moving pages around if the MM has to actually go into compaction?
>> Should we start disk IO to give back 32KB contig?
>>
>> I think we want to tell the MM to compact in the background and give
>> networking 32KB if it happens to have it available.  If not, fall back
>> to smaller allocations without doing anything expensive.





[PATCH -next] net: ipv4: un-inline ip_finish_output2

2015-06-12 Thread Florian Westphal
      text    data     bss     dec     hex filename
old: 16527      44       0   16571    40bb net/ipv4/ip_output.o
new: 14935      44       0   14979    3a83 net/ipv4/ip_output.o

Suggested-by: Eric Dumazet <eric.duma...@gmail.com>
Signed-off-by: Florian Westphal <f...@strlen.de>
---
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index f5f5ef1..55f3c2e 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -172,7 +172,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
 }
 EXPORT_SYMBOL_GPL(ip_build_and_send_pkt);
 
-static inline int ip_finish_output2(struct sock *sk, struct sk_buff *skb)
+static int ip_finish_output2(struct sock *sk, struct sk_buff *skb)
 {
 	struct dst_entry *dst = skb_dst(skb);
 	struct rtable *rt = (struct rtable *)dst;
-- 
2.0.5



Re: [PATCH nf-next] net: ip_fragment: remove BRIDGE_NETFILTER mtu special handling

2015-06-12 Thread Pablo Neira Ayuso
On Fri, Jun 05, 2015 at 01:28:38PM +0200, Florian Westphal wrote:
> since commit d6b915e29f4adea9
> ("ip_fragment: don't forward defragmented DF packet") the largest
> fragment size is available in the IPCB.
>
> Therefore we no longer need to care about 'encapsulation'
> overhead of stripped PPPOE/VLAN headers since ip_do_fragment
> doesn't use device mtu in such cases.

Applied, thanks Florian.


Re: [PATCH] sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO

2015-06-12 Thread Neil Horman
On Thu, Jun 11, 2015 at 05:27:45PM -0700, David Miller wrote:
> From: mleit...@redhat.com
> Date: Thu, 11 Jun 2015 14:49:46 -0300
>
> > From: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
> >
> > Currently, we can ask to authenticate DATA chunks and we can send DATA
> > chunks on the same packet as COOKIE_ECHO, but if you try to combine
> > both, the DATA chunk will be sent unauthenticated and peer won't accept
> > it, leading to a communication failure.
> >
> > This happens because even though the data was queued after it was
> > requested to authenticate DATA chunks, it was also queued before we
> > could know that remote peer can handle authenticating, so
> > sctp_auth_send_cid() returns false.
> >
> > The fix is whenever we set up an active key, re-check send queue for
> > chunks that now should be authenticated. As a result, such packet will
> > now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.
> >
> > Reported-by: Liu Wei <we...@redhat.com>
> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
>
> Vlad/Neil, please review.

Sorry Dave, thought I had sent email on that already.

I had an initial concern that there could be a race in which a previous
iteration of sctp_outq_flush would move some chunks to a packet, but not flush
it to the network layer yet (due to not being full), and that would result in
the same condition.  But since this only happens with a COOKIE_ECHO chunk (which
is a control chunk), we should be ok, as those are sent immediately.

Acked-by: Neil Horman nhor...@tuxdriver.com



Cavium Liquidio: select on undefined option LIBCRC32

2015-06-12 Thread Valentin Rothberg
Hi Raghu,

your commit f21fb3ed364b ("Add support of Cavium Liquidio ethernet
adapters") is in today's linux-next tree (i.e., next-20150612) adding
the following lines of code:

+config LIQUIDIO
[...]
+   select LIBCRC32

The select turns out to be a NOOP since there is no option LIBCRC32.
I guess it's a typo and the correct option is LIBCRC32C?

Is there a patch queued somewhere to fix the issue?

I detected the issue with ./scripts/checkkconfigsymbols.py.

Kind regards,
 Valentin


Re: [PATCH v5 00/24] Convert the posix_clock_operations and k_clock structure to ready for 2038

2015-06-12 Thread Thomas Gleixner
On Fri, 12 Jun 2015, Baolin Wang wrote:

Sigh. Again threading of the series failed. Some patches are, the
whole series is not. Can you please get your tools straight?

Nor did you manage to cc me on the security patch.

> - Modify the subject line and the changelog:

   timekeeping: Change the implementation of timekeeping_clocktai()

Sigh. How is that better than the previous one? It's more accurate,
but equally useless.

And of course you did not address my request to change the macro mess
in

   posix-timers: Introduce {get,put}_timespec and {get,put}_itimerspec

according to the discussion with Arnd.

Thanks,

tglx


[PATCH net-next 2/3] bpf: allow networking programs to use bpf_trace_printk() for debugging

2015-06-12 Thread Alexei Starovoitov
bpf_trace_printk() is a helper function used to debug eBPF programs.
Let socket and TC programs use it as well.
Note, it's a DEBUG ONLY helper. If it's used in a program,
the kernel will print a warning banner to make sure users don't use
it in production.

Signed-off-by: Alexei Starovoitov <a...@plumgrid.com>
---
 include/linux/bpf.h  |1 +
 kernel/bpf/core.c|4 
 kernel/trace/bpf_trace.c |   20 
 net/core/filter.c|2 ++
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1b9a3f5b27f6..4383476a0d48 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -150,6 +150,7 @@ struct bpf_array {
 u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
 void bpf_prog_array_map_clear(struct bpf_map *map);
 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog 
*fp);
+const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
 
 #ifdef CONFIG_BPF_SYSCALL
 void bpf_register_prog_type(struct bpf_prog_type_list *tl);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1fc45cc83076..c5bedc82bc1c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -733,6 +733,10 @@ const struct bpf_func_proto bpf_ktime_get_ns_proto __weak;
 const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
 const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
+const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
+{
+   return NULL;
+}
 
 /* Always built-in helper functions. */
 const struct bpf_func_proto bpf_tail_call_proto = {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3a17638cdf46..4f9b5d41869b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -147,6 +147,17 @@ static const struct bpf_func_proto bpf_trace_printk_proto = {
 	.arg2_type	= ARG_CONST_STACK_SIZE,
 };
 
+const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
+{
+	/*
+	 * this program might be calling bpf_trace_printk,
+	 * so allocate per-cpu printk buffers
+	 */
+	trace_printk_init_buffers();
+
+	return &bpf_trace_printk_proto;
+}
+
 static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func_id)
 {
 	switch (func_id) {
@@ -168,15 +179,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_get_current_uid_gid_proto;
 	case BPF_FUNC_get_current_comm:
 		return &bpf_get_current_comm_proto;
-
 	case BPF_FUNC_trace_printk:
-		/*
-		 * this program might be calling bpf_trace_printk,
-		 * so allocate per-cpu printk buffers
-		 */
-		trace_printk_init_buffers();
-
-		return &bpf_trace_printk_proto;
+		return bpf_get_trace_printk_proto();
 	default:
 		return NULL;
 	}
diff --git a/net/core/filter.c b/net/core/filter.c
index 20aa51ccbf9d..65ff107d3d29 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1442,6 +1442,8 @@ sk_filter_func_proto(enum bpf_func_id func_id)
 		return &bpf_tail_call_proto;
 	case BPF_FUNC_ktime_get_ns:
 		return &bpf_ktime_get_ns_proto;
+	case BPF_FUNC_trace_printk:
+		return bpf_get_trace_printk_proto();
 	default:
 		return NULL;
 	}
-- 
1.7.9.5
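
For readers unfamiliar with the __weak fallback used in kernel/bpf/core.c
above, here is a minimal user-space sketch of the same pattern. It assumes
GCC or Clang (for __attribute__((weak))); struct func_proto, lookup_proto()
and the FUNC_* ids are hypothetical stand-ins, not kernel names. With no
strong definition linked in, the weak default returns NULL and the lookup
simply reports the helper as unavailable.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct bpf_func_proto. */
struct func_proto {
	const char *name;
};

/*
 * Weak default: the linker uses it only when no other object file
 * provides a strong definition of the same symbol (GCC/Clang extension).
 */
__attribute__((weak)) const struct func_proto *get_trace_printk_proto(void)
{
	return NULL;
}

enum func_id { FUNC_KTIME_GET_NS, FUNC_TRACE_PRINTK, FUNC_UNKNOWN };

static const struct func_proto ktime_get_ns_proto = { "ktime_get_ns" };

/* Same shape as the switch in sk_filter_func_proto(): NULL means "not available". */
const struct func_proto *lookup_proto(enum func_id id)
{
	switch (id) {
	case FUNC_KTIME_GET_NS:
		return &ktime_get_ns_proto;
	case FUNC_TRACE_PRINTK:
		return get_trace_printk_proto();
	default:
		return NULL;
	}
}
```

Linking a second file that defines get_trace_printk_proto() without the weak
attribute would override the default, which is the role
kernel/trace/bpf_trace.c plays in the patch.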



[PATCH net-next 0/3] bpf: share helpers between tracing and networking

2015-06-12 Thread Alexei Starovoitov
Introduce new helpers to access 'struct task_struct'->pid, tgid, uid, gid, comm
fields in tracing and networking.

Share bpf_trace_printk() and bpf_get_smp_processor_id() helpers between
tracing and networking.

Alexei Starovoitov (3):
  bpf: introduce current->pid, tgid, uid, gid, comm accessors
  bpf: allow networking programs to use bpf_trace_printk() for
debugging
  bpf: let kprobe programs use bpf_get_smp_processor_id() helper

 include/linux/bpf.h|4 +++
 include/uapi/linux/bpf.h   |   19 +
 kernel/bpf/core.c  |7 +
 kernel/bpf/helpers.c   |   58 ++
 kernel/trace/bpf_trace.c   |   28 --
 net/core/filter.c  |8 ++
 samples/bpf/bpf_helpers.h  |6 
 samples/bpf/tracex2_kern.c |   24 
 samples/bpf/tracex2_user.c |   67 ++--
 9 files changed, 199 insertions(+), 22 deletions(-)

-- 
1.7.9.5



Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Alexei Starovoitov

On 6/12/15 3:08 PM, Andy Lutomirski wrote:
> On Fri, Jun 12, 2015 at 2:40 PM, Alexei Starovoitov a...@plumgrid.com wrote:
>> eBPF programs attached to kprobes need to filter based on
>> current->pid, uid and other fields, so introduce helper functions:
>>
>> u64 bpf_get_current_pid_tgid(void)
>> Return: current->tgid << 32 | current->pid
>>
>> u64 bpf_get_current_uid_gid(void)
>> Return: current_gid << 32 | current_uid
>
> How does this work wrt namespaces,

from_kuid(current_user_ns(), uid)

> and why the weird packing?

to minimize number of calls.
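
As an illustration of the packing, a caller can split the returned u64 back
into its two 32-bit halves with a shift and a truncation. This is only a
user-space sketch; pack_u32_pair(), unpack_hi() and unpack_lo() are
hypothetical names, not part of the patch.

```c
#include <stdint.h>

/* Pack two 32-bit values as described for the helpers: hi << 32 | lo. */
static inline uint64_t pack_u32_pair(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}

/* Upper half, e.g. tgid (or gid). */
static inline uint32_t unpack_hi(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Lower half, e.g. pid (or uid). */
static inline uint32_t unpack_lo(uint64_t v)
{
	return (uint32_t)v;
}
```

One call therefore yields both fields, at the cost of the caller doing the
shift itself.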

We've considered several alternatives.
1. 5 different helpers
  Cons: every call adds performance overhead

2a: single helper that populates 'struct bpf_task_info'
  and uses 'flags' with bit per field.
+struct bpf_task_info {
+   __u32 pid;
+   __u32 tgid;
+   __u32 uid;
+   __u32 gid;
+   char comm[16];
+};
bpf_get_current_task_info(task_info, size, flags)
bit 0 - fill in pid
bit 1 - fill in tgid
  Pros: single helper
  Cons: ugly to use and a lot of compares in the helper
  itself (two compares for each field)

2b. single helper that populates 'struct bpf_task_info'
  and uses 'size' to tell how many fields to fill in.
bpf_get_current_task_info(task_info, size);
+	if (size >= offsetof(struct bpf_task_info, pid) + sizeof(info->pid))
+		info->pid = task->pid;
+	if (size >= offsetof(struct bpf_task_info, tgid) + sizeof(info->tgid))
+		info->tgid = task->tgid;

  Pros: single call (with single compare per field).
  Cons: still hard to use when only uid is needed.
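
To make alternative 2b concrete, here is a self-contained user-space sketch
of the size-driven fill. It is not the proposed kernel code: struct
task_info, FILL_IF_FITS() and fill_task_info() are hypothetical, and the
source values come from a second struct rather than from 'current'.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct task_info {
	uint32_t pid;
	uint32_t tgid;
	uint32_t uid;
	uint32_t gid;
	char comm[16];
};

/* Copy one field only if it lies entirely within the first 'size' bytes. */
#define FILL_IF_FITS(dst, src, size, field)				\
	do {								\
		if ((size) >= offsetof(struct task_info, field) +	\
			      sizeof((dst)->field))			\
			memcpy(&(dst)->field, &(src)->field,		\
			       sizeof((dst)->field));			\
	} while (0)

/* One compare per field; fields past 'size' are left untouched. */
void fill_task_info(struct task_info *dst, const struct task_info *src,
		    size_t size)
{
	FILL_IF_FITS(dst, src, size, pid);
	FILL_IF_FITS(dst, src, size, tgid);
	FILL_IF_FITS(dst, src, size, uid);
	FILL_IF_FITS(dst, src, size, gid);
	FILL_IF_FITS(dst, src, size, comm);
}
```

The sketch also shows the drawback listed above: a caller that wants only
uid still has to pass a size that covers pid and tgid.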

These three helpers looked like the best balance between
performance and usability.



Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Andy Lutomirski
On Fri, Jun 12, 2015 at 3:44 PM, Alexei Starovoitov a...@plumgrid.com wrote:
> On 6/12/15 3:08 PM, Andy Lutomirski wrote:
>> On Fri, Jun 12, 2015 at 2:40 PM, Alexei Starovoitov a...@plumgrid.com
>> wrote:
>>> eBPF programs attached to kprobes need to filter based on
>>> current->pid, uid and other fields, so introduce helper functions:
>>>
>>> u64 bpf_get_current_pid_tgid(void)
>>> Return: current->tgid << 32 | current->pid
>>>
>>> u64 bpf_get_current_uid_gid(void)
>>> Return: current_gid << 32 | current_uid
>>
>> How does this work wrt namespaces,
>
> from_kuid(current_user_ns(), uid)


Is current_user_ns() well defined in the context of an eBPF program?

--Andy


Re: [PATCH net-next 0/2] flow_dissector: Fix MPLS parsing and add ext hdr support

2015-06-12 Thread David Miller
From: Tom Herbert t...@herbertland.com
Date: Fri, 12 Jun 2015 09:01:04 -0700

> Need to shift label. Added parsing of dst, hop-by-hop, and routing
> extension headers.

Series applied, thanks Tom.


Re: [PATCH] sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO

2015-06-12 Thread David Miller
From: mleit...@redhat.com
Date: Thu, 11 Jun 2015 14:49:46 -0300

> From: Marcelo Ricardo Leitner marcelo.leit...@gmail.com
>
> Currently, we can ask to authenticate DATA chunks and we can send DATA
> chunks on the same packet as COOKIE_ECHO, but if you try to combine
> both, the DATA chunk will be sent unauthenticated and peer won't accept
> it, leading to a communication failure.
>
> This happens because even though the data was queued after it was
> requested to authenticate DATA chunks, it was also queued before we
> could know that remote peer can handle authenticating, so
> sctp_auth_send_cid() returns false.
>
> The fix is whenever we set up an active key, re-check send queue for
> chunks that now should be authenticated. As a result, such packet will
> now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.
>
> Reported-by: Liu Wei we...@redhat.com
> Signed-off-by: Marcelo Ricardo Leitner marcelo.leit...@gmail.com

Applied, thanks.


Re: [PATCH -next] net: ipv4: un-inline ip_finish_output2

2015-06-12 Thread David Miller
From: Florian Westphal f...@strlen.de
Date: Fri, 12 Jun 2015 12:12:22 +0200

>        text    data     bss     dec     hex filename
> old:  16527      44       0   16571    40bb net/ipv4/ip_output.o
> new:  14935      44       0   14979    3a83 net/ipv4/ip_output.o
>
> Suggested-by: Eric Dumazet eric.duma...@gmail.com
> Signed-off-by: Florian Westphal f...@strlen.de

Applied, thanks Florian.


Re: next-20150610 - repeated hangs at e1000e_phc_gettime+0x2e/0x60

2015-06-12 Thread Valdis . Kletnieks
On Thu, 11 Jun 2015 22:57:48 -0400, Valdis Kletnieks said:

> 0) next-20150603 works, so the problem landed in linux-next in the last week.
>
> 1) All 3 times happened while I was at home, using wireless, so
> the interface didn't have link and was ifconfig'ed down.

All 3 crashes happened at almost exactly 4 hours of uptime, but here
in my office I'm now at 6 hours on the same kernel while running with
the interface plugged in and passing traffic.

I have a fighting chance of mostly finishing a bisect over the weekend,
I'll let you know where that leads.




Re: iproute2: missing patches in branch net-next

2015-06-12 Thread Daniel Borkmann

On 05/29/2015 01:15 AM, Daniel Borkmann wrote:
> On 05/29/2015 01:12 AM, Stephen Hemminger wrote:
> ...
>> I will go back and recreate what is missing.
>> Sorry for the confusion.
>
> Great thanks, no problem.


Hmm, two weeks have passed. :/ Is there any progress so far?


Re: [PATCH] netdevice: add netdev_pub helper function

2015-06-12 Thread David Miller
From: Jason A. Donenfeld ja...@zx2c4.com
Date: Fri, 12 Jun 2015 15:30:29 +0200

> Being able to utilize this makes much code a lot simpler and cleaner.
> It's a nice convenience function.
>
> Signed-off-by: Jason A. Donenfeld ja...@zx2c4.com

Please do not ever submit patches adding new interfaces without
also submitting changes showing actual uses of the new interface.

Otherwise it's impossible to see how really useful it actually
is.

I'm not applying this until you do so, thanks.


[PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Alexei Starovoitov
eBPF programs attached to kprobes need to filter based on
current->pid, uid and other fields, so introduce helper functions:

u64 bpf_get_current_pid_tgid(void)
Return: current->tgid << 32 | current->pid

u64 bpf_get_current_uid_gid(void)
Return: current_gid << 32 | current_uid

bpf_get_current_comm(char *buf, int size_of_buf)
stores current->comm into buf

They can be used from the programs attached to TC as well to classify packets
based on current task fields.

Update tracex2 example to print histogram of write syscalls for each process
instead of aggregated for all.

Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
These helpers will be mainly used by bpf+tracing, but the patch is targeting
net-next tree to minimize merge conflicts and they're useful in TC too.

The feature was requested by Wang Nan wangn...@huawei.com and
Brendan Gregg brendan.d.gr...@gmail.com

 include/linux/bpf.h|3 ++
 include/uapi/linux/bpf.h   |   19 +
 kernel/bpf/core.c  |3 ++
 kernel/bpf/helpers.c   |   58 ++
 kernel/trace/bpf_trace.c   |6 
 net/core/filter.c  |6 
 samples/bpf/bpf_helpers.h  |6 
 samples/bpf/tracex2_kern.c |   24 
 samples/bpf/tracex2_user.c |   67 ++--
 9 files changed, 178 insertions(+), 14 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2235aee8096a..1b9a3f5b27f6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -188,5 +188,8 @@ extern const struct bpf_func_proto bpf_get_prandom_u32_proto;
 extern const struct bpf_func_proto bpf_get_smp_processor_id_proto;
 extern const struct bpf_func_proto bpf_tail_call_proto;
 extern const struct bpf_func_proto bpf_ktime_get_ns_proto;
+extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto;
+extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
+extern const struct bpf_func_proto bpf_get_current_comm_proto;
 
 #endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 602f05b7a275..29ef6f99e43d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -230,6 +230,25 @@ enum bpf_func_id {
 	 * Return: 0 on success
 	 */
 	BPF_FUNC_clone_redirect,
+
+	/**
+	 * u64 bpf_get_current_pid_tgid(void)
+	 * Return: current->tgid << 32 | current->pid
+	 */
+	BPF_FUNC_get_current_pid_tgid,
+
+	/**
+	 * u64 bpf_get_current_uid_gid(void)
+	 * Return: current_gid << 32 | current_uid
+	 */
+	BPF_FUNC_get_current_uid_gid,
+
+	/**
+	 * bpf_get_current_comm(char *buf, int size_of_buf)
+	 * stores current->comm into buf
+	 * Return: 0 on success
+	 */
+	BPF_FUNC_get_current_comm,
 	__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1e00aa3316dc..1fc45cc83076 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -730,6 +730,9 @@ const struct bpf_func_proto bpf_map_delete_elem_proto __weak;
 const struct bpf_func_proto bpf_get_prandom_u32_proto __weak;
 const struct bpf_func_proto bpf_get_smp_processor_id_proto __weak;
 const struct bpf_func_proto bpf_ktime_get_ns_proto __weak;
+const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
+const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
+const struct bpf_func_proto bpf_get_current_comm_proto __weak;
 
 /* Always built-in helper functions. */
 const struct bpf_func_proto bpf_tail_call_proto = {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7ad5d8842d5b..d1dce346c56f 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -14,6 +14,8 @@
 #include <linux/random.h>
 #include <linux/smp.h>
 #include <linux/ktime.h>
+#include <linux/sched.h>
+#include <linux/uidgid.h>
 
 /* If kernel subsystem is allowing eBPF programs to call this function,
  * inside its own verifier_ops->get_func_proto() callback it should return
@@ -124,3 +126,59 @@ const struct bpf_func_proto bpf_ktime_get_ns_proto = {
 	.gpl_only	= true,
 	.ret_type	= RET_INTEGER,
 };
+
+static u64 bpf_get_current_pid_tgid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	struct task_struct *task = current;
+
+	if (!task)
+		return -EINVAL;
+
+	return (u64) task->tgid << 32 | task->pid;
+}
+
+const struct bpf_func_proto bpf_get_current_pid_tgid_proto = {
+	.func		= bpf_get_current_pid_tgid,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+};
+
+static u64 bpf_get_current_uid_gid(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+{
+	struct task_struct *task = current;
+	kuid_t uid;
+	kgid_t gid;
+
+	if (!task)
+		return -EINVAL;
+
+	current_uid_gid(&uid, &gid);
+	return (u64) from_kgid(current_user_ns(), gid) << 32 |
+		from_kuid(current_user_ns(), uid);
+}
+
+const struct bpf_func_proto 

[PATCH net-next 3/3] bpf: let kprobe programs use bpf_get_smp_processor_id() helper

2015-06-12 Thread Alexei Starovoitov
It's useful to do per-cpu histograms.

Suggested-by: Daniel Wagner daniel.wag...@bmw-carit.de
Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
 kernel/trace/bpf_trace.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4f9b5d41869b..88a041adee90 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -181,6 +181,8 @@ static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id func
 		return &bpf_get_current_comm_proto;
 	case BPF_FUNC_trace_printk:
 		return bpf_get_trace_printk_proto();
+	case BPF_FUNC_get_smp_processor_id:
+		return &bpf_get_smp_processor_id_proto;
 	default:
 		return NULL;
 	}
-- 
1.7.9.5



Re: [PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex

2015-06-12 Thread Andrew Lunn
On Fri, Jun 12, 2015 at 11:14:25AM -0700, Florian Fainelli wrote:
> On 12/06/15 10:18, Andrew Lunn wrote:
> > By default, DSA and CPU ports are configured to the maximum speed the
> > switch supports. However there can be use cases where the peer device
> > port is slower. Allow a fixed-link property to be used with the DSA
> > and CPU port in the device tree, and use this information to configure
> > the port.
>
> Humm, I suppose this means that we might end-up with two fixed PHY
> devices, one for the Ethernet MAC, and another one for the switch?

Yes. This is exactly what i have for the board i'm working on. The
concept also applies for DSA ports, so you could have two switches and
two fixed phys for one inter-switch link.

> That might duplicate the same information, though I cannot think of
> a better solution than using phandles to resolve that.

This seems the simplest solution. It would be possible to create a
dual port fixed phy, meaning it exposes two phy_device structures,
one for each side. But that seems overkill.

> > Signed-off-by: Andrew Lunn and...@lunn.ch
> > ---
> >  include/net/dsa.h |  1 +
> >  net/dsa/dsa.c     | 39 +++++++++++++++++++++++++++++++++++++++
> >  2 files changed, 40 insertions(+)
> >
> > diff --git a/include/net/dsa.h b/include/net/dsa.h
> > index fbca63ba8f73..24572f99224c 100644
> > --- a/include/net/dsa.h
> > +++ b/include/net/dsa.h
> > @@ -160,6 +160,7 @@ struct dsa_switch {
> >  	 * Slave mii_bus and devices for the individual ports.
> >  	 */
> >  	u32			dsa_port_mask;
> > +	u32			cpu_port_mask;
> >  	u32			phys_port_mask;
> >  	u32			phys_mii_mask;
> >  	struct mii_bus		*slave_mii_bus;
> > diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> > index 392e29a0227d..f9c8f4e7ebce 100644
> > --- a/net/dsa/dsa.c
> > +++ b/net/dsa/dsa.c
> > @@ -176,6 +176,36 @@ __ATTRIBUTE_GROUPS(dsa_hwmon);
> >  #endif /* CONFIG_NET_DSA_HWMON */
> >
> >  /* basic switch operations **/
> > +static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master)
> > +{
> > +	struct dsa_chip_data *cd = ds->pd;
> > +	struct device_node *port_dn;
> > +	struct phy_device *phydev;
> > +	int ret, port;
> > +
> > +	for (port = 0; port < DSA_MAX_PORTS; port++) {
> > +		if (!((ds->cpu_port_mask | ds->dsa_port_mask) & (1 << port)))
> > +			continue;
> > +
> > +		port_dn = cd->port_dn[port];
> > +		if (of_phy_is_fixed_link(port_dn)) {
> > +			ret = of_phy_register_fixed_link(port_dn);
> > +			if (ret) {
> > +				netdev_err(master,
> > +					   "failed to register fixed PHY\n");
> > +				return ret;
> > +			}
> > +			phydev = of_phy_find_device(port_dn);
> > +			phydev->is_pseudo_fixed_link = true;
> > +			genphy_config_init(phydev);
> > +			genphy_read_status(phydev);
>
> I was curious as to why you were doing this at first, but I guess this
> is because the PHY state machine is not started for this fixed PHY that
> you just created, right?

For the fixed phy to be of any use in adjust_link(), it needs to set
phydev->link, phydev->speed and phydev->duplex. That only happens when
genphy_read_status() is called. And you don't get sensible values
unless genphy_config_init() is called first. We don't have a netdev we
can attach this phydev to, so the core has no chance to do these
genphy_XXX calls.

> > +			if (ds->drv->adjust_link)
> > +				ds->drv->adjust_link(ds, port, phydev);
> > +		}
> > +	}
> > +	return 0;
> > +}
> > +
> >  static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
> >  {
> >  	struct dsa_switch_driver *drv = ds->drv;
> > @@ -204,6 +234,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
> >  			}
> >  			dst->cpu_switch = index;
> >  			dst->cpu_port = i;
> > +			ds->cpu_port_mask |= 1 << i;
>
> Same question as Guenter here, I assume this is because you plan on
> having multiple CPU ports connected to the switch and this makes it
> easier to deal with, is that right?

Yes, sort of. At the time i wrote this code, i already had multiple
CPU ports working. But the order i'm submitting the patches has been
reversed. This could be simplified for a single CPU port.

The multiple CPU ports is turning out to be messy, but not because of
the code. It works on my DIR665, but the second ethernet does not have
a MAC address, which is causing issues i need to track down. For
testing i've set one in device tree. And my WRT1900AC has something
funny going on with its second interface resulting in it never
sending/receiving packets, but works fine with OpenWRT swconfig
drivers. Until i have one platform in a state i can mainline, i'm
holding off with the multi-cpu patches. I do want to work on them next

[PATCH] ethernet/sfc: mark state UNINIT after unregister

2015-06-12 Thread Jarod Wilson
Without this change, modprobe -r sfc hits the BUG_ON() in
efx_pci_remove_main(). Best as I can tell, this was just an oversight,
efx->state gets set to STATE_UNINIT in the error path of
efx_register_netdev() just after unregister_netdevice(), and the same
should happen in efx_unregister_netdev() after its unregister_netdevice()
call. Now I can load and unload no problem.

CC: Solarflare linux maintainers linux-net-driv...@solarflare.com
CC: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson ja...@redhat.com
---
 drivers/net/ethernet/sfc/efx.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/sfc/efx.c b/drivers/net/ethernet/sfc/efx.c
index 0c42ed9..f3eaade 100644
--- a/drivers/net/ethernet/sfc/efx.c
+++ b/drivers/net/ethernet/sfc/efx.c
@@ -2448,6 +2448,7 @@ static void efx_unregister_netdev(struct efx_nic *efx)
 #endif
 		device_remove_file(&efx->pci_dev->dev, &dev_attr_phy_type);
 		unregister_netdev(efx->net_dev);
+		efx->state = STATE_UNINIT;
 	}
 }
 
-- 
1.8.3.1



[PATCH next v1] bonding: Display LACP info only to CAP_NET_ADMIN capable user

2015-06-12 Thread Mahesh Bandewar
Actor and Partner details can be accessed via proc-fs and sys-fs
entries. These interfaces are currently world readable. The
earlier patch series made the LACP communication secure against
nuisance attacks from within the same L2 domain, but it did not
prevent an unprivileged user from looking at that information on the
host and performing the same attack.

This patch avoids emitting those entries if the user in question
does not have sufficient privileges.

Signed-off-by: Mahesh Bandewar mahe...@google.com
---
 drivers/net/bonding/bond_procfs.c | 101 --
 drivers/net/bonding/bond_sysfs.c  |  12 ++---
 2 files changed, 59 insertions(+), 54 deletions(-)

diff --git a/drivers/net/bonding/bond_procfs.c 
b/drivers/net/bonding/bond_procfs.c
index e7f3047a26df..f514fe5e80a5 100644
--- a/drivers/net/bonding/bond_procfs.c
+++ b/drivers/net/bonding/bond_procfs.c
@@ -135,27 +135,30 @@ static void bond_info_show_master(struct seq_file *seq)
 					  bond->params.ad_select);
 		seq_printf(seq, "Aggregator selection policy (ad_select): %s\n",
 			   optval->string);
-		seq_printf(seq, "System priority: %d\n",
-			   BOND_AD_INFO(bond).system.sys_priority);
-		seq_printf(seq, "System MAC address: %pM\n",
-			   &BOND_AD_INFO(bond).system.sys_mac_addr);
-
-		if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
-			seq_printf(seq, "bond %s has no active aggregator\n",
-				   bond->dev->name);
-		} else {
-			seq_printf(seq, "Active Aggregator Info:\n");
-
-			seq_printf(seq, "\tAggregator ID: %d\n",
-				   ad_info.aggregator_id);
-			seq_printf(seq, "\tNumber of ports: %d\n",
-				   ad_info.ports);
-			seq_printf(seq, "\tActor Key: %d\n",
-				   ad_info.actor_key);
-			seq_printf(seq, "\tPartner Key: %d\n",
-				   ad_info.partner_key);
-			seq_printf(seq, "\tPartner Mac Address: %pM\n",
-				   ad_info.partner_system);
+		if (capable(CAP_NET_ADMIN)) {
+			seq_printf(seq, "System priority: %d\n",
+				   BOND_AD_INFO(bond).system.sys_priority);
+			seq_printf(seq, "System MAC address: %pM\n",
+				   &BOND_AD_INFO(bond).system.sys_mac_addr);
+
+			if (__bond_3ad_get_active_agg_info(bond, &ad_info)) {
+				seq_printf(seq,
+					   "bond %s has no active aggregator\n",
+					   bond->dev->name);
+			} else {
+				seq_printf(seq, "Active Aggregator Info:\n");
+
+				seq_printf(seq, "\tAggregator ID: %d\n",
+					   ad_info.aggregator_id);
+				seq_printf(seq, "\tNumber of ports: %d\n",
+					   ad_info.ports);
+				seq_printf(seq, "\tActor Key: %d\n",
+					   ad_info.actor_key);
+				seq_printf(seq, "\tPartner Key: %d\n",
+					   ad_info.partner_key);
+				seq_printf(seq, "\tPartner Mac Address: %pM\n",
+					   ad_info.partner_system);
+			}
 		}
 	}
 }
@@ -199,33 +202,35 @@ static void bond_info_show_slave(struct seq_file *seq,
seq_printf(seq, Partner Churned Count: %d\n,
   port-churn_partner_count);
 
-   seq_puts(seq, details actor lacp pdu:\n);
-   seq_printf(seq, system priority: %d\n,
-  port-actor_system_priority);
-   seq_printf(seq, system mac address: %pM\n,
-  port-actor_system);
-   seq_printf(seq, port key: %d\n,
-  port-actor_oper_port_key);
-   seq_printf(seq, port priority: %d\n,
-  port-actor_port_priority);
-   seq_printf(seq, port number: %d\n,
-  port-actor_port_number);
-   seq_printf(seq, port state: %d\n,
-  port-actor_oper_port_state);
-
-   seq_puts(seq, details partner lacp pdu:\n);
-   seq_printf(seq, system priority: %d\n,
-  port-partner_oper.system_priority);
-   

Re: [PATCH] Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt

2015-06-12 Thread David Miller
From: Masanari Iida standby2...@gmail.com
Date: Sat, 13 Jun 2015 00:23:21 +0900

> This patch fix URL (http to https) for wiki.wireshark.org.
>
> Signed-off-by: Masanari Iida standby2...@gmail.com

Applied, thank you.


Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Andy Lutomirski
On Fri, Jun 12, 2015 at 2:40 PM, Alexei Starovoitov a...@plumgrid.com wrote:
> eBPF programs attached to kprobes need to filter based on
> current->pid, uid and other fields, so introduce helper functions:
>
> u64 bpf_get_current_pid_tgid(void)
> Return: current->tgid << 32 | current->pid
>
> u64 bpf_get_current_uid_gid(void)
> Return: current_gid << 32 | current_uid

How does this work wrt namespaces, and why the weird packing?

--Andy


[PATCH net-next] rocker: guard against NULL rocker_port when removing ports

2015-06-12 Thread sfeldma
From: Scott Feldman sfel...@gmail.com

The ports array is filled in as ports are probed, but if probing doesn't
finish, we need to stop only those ports that were probed successfully.
Check the ports array for NULL to skip un-probed ports when stopping.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index 819289e..c6a6e3c 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4802,6 +4802,8 @@ static void rocker_remove_ports(const struct rocker *rocker)
 
 	for (i = 0; i < rocker->port_count; i++) {
 		rocker_port = rocker->ports[i];
+		if (!rocker_port)
+			continue;
 		rocker_port_ig_tbl(rocker_port, SWITCHDEV_TRANS_NONE,
 				   ROCKER_OP_FLAG_REMOVE);
 		unregister_netdev(rocker_port->dev);
-- 
1.7.10.4
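
The guard in the patch above can be illustrated outside the kernel with a
small sketch. It assumes the ports array starts out zeroed and is filled in
as each port probes; struct sw, struct port, probe_ports() and
remove_ports() are hypothetical names, and the fail_at parameter only
simulates a probe that stops part-way.

```c
#include <stddef.h>
#include <stdlib.h>

struct port {
	int id;
};

struct sw {
	size_t port_count;
	struct port **ports;	/* zero-initialised; filled in as ports probe */
};

/* Probe ports in order and stop at the first failure (simulated by fail_at). */
int probe_ports(struct sw *sw, size_t fail_at)
{
	size_t i;

	for (i = 0; i < sw->port_count; i++) {
		if (i == fail_at)
			return -1;
		sw->ports[i] = malloc(sizeof(*sw->ports[i]));
		if (!sw->ports[i])
			return -1;
		sw->ports[i]->id = (int)i;
	}
	return 0;
}

/* Tear down only the ports that were actually probed; returns how many. */
size_t remove_ports(struct sw *sw)
{
	size_t i, removed = 0;

	for (i = 0; i < sw->port_count; i++) {
		struct port *p = sw->ports[i];

		if (!p)
			continue;	/* never probed: nothing to undo */
		free(p);
		sw->ports[i] = NULL;
		removed++;
	}
	return removed;
}
```

Without the NULL check, the removal loop would dereference the slots that
probing never reached.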



Re: [PATCH net-next] flow_dissector: fix ipv6 dst, hop-by-hop and routing ext hdrs

2015-06-12 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com
Date: Fri, 12 Jun 2015 19:31:32 -0700

> From: Eric Dumazet eduma...@google.com
>
> __skb_header_pointer() returns a pointer that must be checked.
>
> Fixes infinite loop reported by Alexei, and add __must_check to
> catch these errors earlier.
>
> Fixes: 6a74fcf426f5 ("flow_dissector: add support for dst, hop-by-hop and routing ext hdrs")
> Reported-by: Alexei Starovoitov alexei.starovoi...@gmail.com
> Tested-by: Alexei Starovoitov alexei.starovoi...@gmail.com
> Signed-off-by: Eric Dumazet eduma...@google.com

Applied, thanks Eric.


Re: [PATCH net-next] tcp: tcp_v6_connect() cleanup

2015-06-12 Thread David Miller
From: Eric Dumazet eric.duma...@gmail.com
Date: Fri, 12 Jun 2015 19:34:03 -0700

> From: Eric Dumazet eduma...@google.com
>
> Remove dead code from tcp_v6_connect()
>
> Signed-off-by: Eric Dumazet eduma...@google.com

Also applied, thanks.


[PATCH net-next] rocker: fix neigh tbl index increment race

2015-06-12 Thread sfeldma
From: Scott Feldman sfel...@gmail.com

rocker->neigh_tbl_next_index is used to generate unique indices for neigh
entries programmed into the device.  The way new indices were generated was
racy with the new prepare-commit transaction model.  A simple fix here
removes the race.  The race was with two processes getting the same index,
one process using prepare-commit, the other not:

    Proc A                          Proc B

    PREPARE phase
    get neigh_tbl_next_index

                                    NONE phase
                                    get neigh_tbl_next_index
                                    neigh_tbl_next_index++

    COMMIT phase
    neigh_tbl_next_index++

Both A and B got the same index.  The fix is to store and increment
neigh_tbl_next_index in the PREPARE (or NONE) phase and use value in COMMIT
phase:

    Proc A                          Proc B

    PREPARE phase
    get neigh_tbl_next_index
    neigh_tbl_next_index++

                                    NONE phase
                                    get neigh_tbl_next_index
                                    neigh_tbl_next_index++

    COMMIT phase
    // use value stashed in PREPARE phase

Reported-by: Simon Horman simon.hor...@netronome.com
Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index c6a6e3c..a9d1559 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -2901,10 +2901,10 @@ static void _rocker_neigh_add(struct rocker *rocker,
 			      enum switchdev_trans trans,
 			      struct rocker_neigh_tbl_entry *entry)
 {
-	entry->index = rocker->neigh_tbl_next_index;
+	if (trans != SWITCHDEV_TRANS_COMMIT)
+		entry->index = rocker->neigh_tbl_next_index++;
 	if (trans == SWITCHDEV_TRANS_PREPARE)
 		return;
-	rocker->neigh_tbl_next_index++;
 	entry->ref_count++;
 	hash_add(rocker->neigh_tbl, &entry->entry,
 		 be32_to_cpu(entry->ip_addr));
-- 
1.7.10.4

--
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next 3/5] rocker: mark STP update as 'no wait' processing

2015-06-12 Thread sfeldma
From: Scott Feldman sfel...@gmail.com

We can get STP updates from the bridge driver in atomic and non-atomic
contexts.  Since we can't test what context we're getting called in,
do the STP processing as 'no wait', which will cover all cases.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c 
b/drivers/net/ethernet/rocker/rocker.c
index 1995b59..6c15c2e 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4286,7 +4286,8 @@ static int rocker_port_attr_set(struct net_device *dev,
 
 	switch (attr->id) {
 	case SWITCHDEV_ATTR_PORT_STP_STATE:
-		err = rocker_port_stp_update(rocker_port, attr->trans, 0,
+		err = rocker_port_stp_update(rocker_port, attr->trans,
+					     ROCKER_OP_FLAG_NOWAIT,
 					     attr->u.stp_state);
 		break;
 	case SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS:
-- 
1.7.10.4



[PATCH net-next 1/5] rocker: revert back to support for nowait processes

2015-06-12 Thread sfeldma
From: Scott Feldman sfel...@gmail.com

One of the items removed from the rocker driver in the Spring Cleanup patch
series was the ability to mark processing in the driver as 'no wait' for
those contexts where we cannot sleep.  Turns out, we do have 'no wait'
contexts where we want to program the device.  So re-add the
ROCKER_OP_FLAG_NOWAIT flag to mark such processing, and propagate flags to
the mem allocator and to the device cmd executor.  With NOWAIT, mem allocs
are GFP_ATOMIC and device cmds are queued to the device, but the driver will
not wait (sleep) for the response back from the device.

My bad for removing NOWAIT support in the first place; I thought we could
swing non-sleep contexts to process context using a work queue, for
example, but there is push-back to keep processing in original context.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |  202 +++---
 1 file changed, 112 insertions(+), 90 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index a9d1559..c1910c1 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -326,10 +326,18 @@ static bool rocker_port_is_bridged(const struct rocker_port *rocker_port)
 	return !!rocker_port->bridge_dev;
 }
 
+#define ROCKER_OP_FLAG_REMOVE		BIT(0)
+#define ROCKER_OP_FLAG_NOWAIT		BIT(1)
+#define ROCKER_OP_FLAG_LEARNED		BIT(2)
+#define ROCKER_OP_FLAG_REFRESH		BIT(3)
+
 static void *__rocker_port_mem_alloc(struct rocker_port *rocker_port,
-				     enum switchdev_trans trans, size_t size)
+				     enum switchdev_trans trans, int flags,
+				     size_t size)
 {
 	struct list_head *elem = NULL;
+	gfp_t gfp_flags = (flags & ROCKER_OP_FLAG_NOWAIT) ?
+			  GFP_ATOMIC : GFP_KERNEL;
 
 	/* If in transaction prepare phase, allocate the memory
 	 * and enqueue it on a per-port list.  If in transaction
@@ -342,7 +350,7 @@ static void *__rocker_port_mem_alloc(struct rocker_port *rocker_port,
 
 	switch (trans) {
 	case SWITCHDEV_TRANS_PREPARE:
-		elem = kzalloc(size + sizeof(*elem), GFP_KERNEL);
+		elem = kzalloc(size + sizeof(*elem), gfp_flags);
 		if (!elem)
 			return NULL;
 		list_add_tail(elem, &rocker_port->trans_mem);
@@ -353,7 +361,7 @@ static void *__rocker_port_mem_alloc(struct rocker_port *rocker_port,
 		list_del_init(elem);
 		break;
 	case SWITCHDEV_TRANS_NONE:
-		elem = kzalloc(size + sizeof(*elem), GFP_KERNEL);
+		elem = kzalloc(size + sizeof(*elem), gfp_flags);
 		if (elem)
 			INIT_LIST_HEAD(elem);
 		break;
@@ -365,16 +373,17 @@ static void *__rocker_port_mem_alloc(struct rocker_port *rocker_port,
 }
 
 static void *rocker_port_kzalloc(struct rocker_port *rocker_port,
-				 enum switchdev_trans trans, size_t size)
+				 enum switchdev_trans trans, int flags,
+				 size_t size)
 {
-	return __rocker_port_mem_alloc(rocker_port, trans, size);
+	return __rocker_port_mem_alloc(rocker_port, trans, flags, size);
 }
 
 static void *rocker_port_kcalloc(struct rocker_port *rocker_port,
-				 enum switchdev_trans trans, size_t n,
-				 size_t size)
+				 enum switchdev_trans trans, int flags,
+				 size_t n, size_t size)
 {
-	return __rocker_port_mem_alloc(rocker_port, trans, n * size);
+	return __rocker_port_mem_alloc(rocker_port, trans, flags, n * size);
 }
 
 static void rocker_port_kfree(enum switchdev_trans trans, const void *mem)
@@ -397,11 +406,13 @@ static void rocker_port_kfree(enum switchdev_trans trans, const void *mem)
 struct rocker_wait {
 	wait_queue_head_t wait;
 	bool done;
+	bool nowait;
 };
 
 static void rocker_wait_reset(struct rocker_wait *wait)
 {
 	wait->done = false;
+	wait->nowait = false;
 }
 
 static void rocker_wait_init(struct rocker_wait *wait)
@@ -411,11 +422,12 @@ static void rocker_wait_init(struct rocker_wait *wait)
 }
 
 static struct rocker_wait *rocker_wait_create(struct rocker_port *rocker_port,
-					      enum switchdev_trans trans)
+					      enum switchdev_trans trans,
+					      int flags)
 {
 	struct rocker_wait *wait;
 
-	wait = rocker_port_kzalloc(rocker_port, trans, sizeof(*wait));
+	wait = rocker_port_kzalloc(rocker_port, trans, flags, sizeof(*wait));
 	if (!wait)
 		return NULL;
 	rocker_wait_init(wait);
@@ -1386,7 +1398,12 @@ static 

[PATCH net-next 4/5] rocker: move MAC learn event back to 'no wait' processing

2015-06-12 Thread sfeldma
From: Scott Feldman sfel...@gmail.com

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |   40 +++---
 1 file changed, 3 insertions(+), 37 deletions(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 6c15c2e..8430cb3 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -1459,36 +1459,14 @@ static int rocker_port_fdb(struct rocker_port *rocker_port,
 			   const unsigned char *addr,
 			   __be16 vlan_id, int flags);
 
-struct rocker_mac_vlan_seen_work {
-	struct work_struct work;
-	struct rocker_port *rocker_port;
-	int flags;
-	unsigned char addr[ETH_ALEN];
-	__be16 vlan_id;
-};
-
-static void rocker_event_mac_vlan_seen_work(struct work_struct *work)
-{
-	const struct rocker_mac_vlan_seen_work *sw =
-		container_of(work, struct rocker_mac_vlan_seen_work, work);
-
-	rtnl_lock();
-	rocker_port_fdb(sw->rocker_port, SWITCHDEV_TRANS_NONE,
-			sw->addr, sw->vlan_id, sw->flags);
-	rtnl_unlock();
-
-	kfree(work);
-}
-
 static int rocker_event_mac_vlan_seen(const struct rocker *rocker,
 				      const struct rocker_tlv *info)
 {
-	struct rocker_mac_vlan_seen_work *sw;
 	const struct rocker_tlv *attrs[ROCKER_TLV_EVENT_MAC_VLAN_MAX + 1];
 	unsigned int port_number;
 	struct rocker_port *rocker_port;
 	const unsigned char *addr;
-	int flags = ROCKER_OP_FLAG_LEARNED;
+	int flags = ROCKER_OP_FLAG_NOWAIT | ROCKER_OP_FLAG_LEARNED;
 	__be16 vlan_id;
 
 	rocker_tlv_parse_nested(attrs, ROCKER_TLV_EVENT_MAC_VLAN_MAX, info);
@@ -1510,20 +1488,8 @@ static int rocker_event_mac_vlan_seen(const struct rocker *rocker,
 	    rocker_port->stp_state != BR_STATE_FORWARDING)
 		return 0;
 
-	sw = kmalloc(sizeof(*sw), GFP_ATOMIC);
-	if (!sw)
-		return -ENOMEM;
-
-	INIT_WORK(&sw->work, rocker_event_mac_vlan_seen_work);
-
-	sw->rocker_port = rocker_port;
-	sw->flags = flags;
-	ether_addr_copy(sw->addr, addr);
-	sw->vlan_id = vlan_id;
-
-	schedule_work(&sw->work);
-
-	return 0;
+	return rocker_port_fdb(rocker_port, SWITCHDEV_TRANS_NONE,
+			       addr, vlan_id, flags);
 }
 
 static int rocker_event_process(const struct rocker *rocker,
-- 
1.7.10.4



[PATCH net-next 5/5] rocker: move port stop to 'no wait' processing

2015-06-12 Thread sfeldma
From: Scott Feldman sfel...@gmail.com

rocker_port_stop can be called from atomic and non-atomic contexts.  Since
we can't test what context we're getting called in, do the processing as
'no wait', which will cover all cases.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index 8430cb3..a06b93d 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -4004,7 +4004,8 @@ static int rocker_port_stop(struct net_device *dev)
 	rocker_port_set_enable(rocker_port, false);
 	napi_disable(&rocker_port->napi_rx);
 	napi_disable(&rocker_port->napi_tx);
-	rocker_port_fwd_disable(rocker_port, SWITCHDEV_TRANS_NONE, 0);
+	rocker_port_fwd_disable(rocker_port, SWITCHDEV_TRANS_NONE,
+				ROCKER_OP_FLAG_NOWAIT);
 	free_irq(rocker_msix_rx_vector(rocker_port), rocker_port);
 	free_irq(rocker_msix_tx_vector(rocker_port), rocker_port);
 	rocker_port_dma_rings_fini(rocker_port);
-- 
1.7.10.4



[PATCH net-next 2/5] rocker: mark neigh update event processing as 'no wait'

2015-06-12 Thread sfeldma
From: Scott Feldman sfel...@gmail.com

Neigh update event handler runs in a context where we can't sleep, so mark
processing in driver with ROCKER_OP_FLAG_NOWAIT.  NOWAIT will use
GFP_ATOMIC for allocations and will queue cmds to the device's cmd ring but
will not wait (sleep) for cmd response back from device.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 drivers/net/ethernet/rocker/rocker.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/rocker/rocker.c b/drivers/net/ethernet/rocker/rocker.c
index c1910c1..1995b59 100644
--- a/drivers/net/ethernet/rocker/rocker.c
+++ b/drivers/net/ethernet/rocker/rocker.c
@@ -5251,7 +5251,8 @@ static struct notifier_block rocker_netdevice_nb __read_mostly = {
 static int rocker_neigh_update(struct net_device *dev, struct neighbour *n)
 {
 	struct rocker_port *rocker_port = netdev_priv(dev);
-	int flags = (n->nud_state & NUD_VALID) ? 0 : ROCKER_OP_FLAG_REMOVE;
+	int flags = (n->nud_state & NUD_VALID ? 0 : ROCKER_OP_FLAG_REMOVE) |
+		    ROCKER_OP_FLAG_NOWAIT;
 	__be32 ip_addr = *(__be32 *)n->primary_key;
 
 	return rocker_port_ipv4_neigh(rocker_port, SWITCHDEV_TRANS_NONE,
-- 
1.7.10.4



[PATCH net-next 0/5] rocker: revert back to support for nowait processes

2015-06-12 Thread sfeldma
From: Scott Feldman sfel...@gmail.com

One of the items removed from the rocker driver in the Spring Cleanup patch
series was the ability to mark processing in the driver as 'no wait' for
those contexts where we cannot sleep.  Turns out, we do have 'no wait'
contexts where we want to program the device and we don't want to defer the
processing to a process context.  So re-add the ROCKER_OP_FLAG_NOWAIT flag
to mark such processing, and propagate flags to the mem allocator and to the
device cmd executor.  With NOWAIT, mem allocs are GFP_ATOMIC and device
cmds are queued to the device, but the driver will not wait (sleep) for the
response back from the device.

My bad for removing NOWAIT support in the first place; I thought we could
swing non-sleep contexts to process context using a work queue, for
example, but there is push-back to keep processing in original context.


Scott Feldman (5):
  rocker: revert back to support for nowait processes
  rocker: mark neigh update event processing as 'no wait'
  rocker: mark STP update as 'no wait' processing
  rocker: move MAC learn event back to 'no wait' processing
  rocker: move port stop to 'no wait' processing

 drivers/net/ethernet/rocker/rocker.c |  245 --
 1 file changed, 118 insertions(+), 127 deletions(-)

-- 
1.7.10.4
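
The flag propagation described in this cover letter can be sketched in a few
lines of userspace C.  This is an illustration only, under stated
assumptions: OP_FLAG_*, enum alloc_mode, alloc_mode_for() and
should_wait_for_response() are hypothetical names standing in for
ROCKER_OP_FLAG_*, the GFP_KERNEL/GFP_ATOMIC choice, and the decision whether
to sleep for the device's response.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace mirrors of two of the ROCKER_OP_FLAG_* bits. */
#define OP_FLAG_REMOVE	(1u << 0)
#define OP_FLAG_NOWAIT	(1u << 1)

/* Stand-ins for GFP_KERNEL (may sleep) and GFP_ATOMIC (must not sleep). */
enum alloc_mode { ALLOC_MAY_SLEEP, ALLOC_ATOMIC };

/* A NOWAIT operation must allocate without sleeping. */
static enum alloc_mode alloc_mode_for(unsigned int flags)
{
	return (flags & OP_FLAG_NOWAIT) ? ALLOC_ATOMIC : ALLOC_MAY_SLEEP;
}

/* A NOWAIT operation queues its command and returns without waiting
 * for the device's response; everything else blocks until completion.
 */
static bool should_wait_for_response(unsigned int flags)
{
	return !(flags & OP_FLAG_NOWAIT);
}
```

The point of carrying one flags word through every layer is that the caller,
which knows its context, makes the decision once; the allocator and the
command executor merely test the bit.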



Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Alexei Starovoitov

On 6/12/15 3:54 PM, Andy Lutomirski wrote:
> On Fri, Jun 12, 2015 at 3:44 PM, Alexei Starovoitov a...@plumgrid.com wrote:
>> On 6/12/15 3:08 PM, Andy Lutomirski wrote:
>>> On Fri, Jun 12, 2015 at 2:40 PM, Alexei Starovoitov a...@plumgrid.com
>>> wrote:
>>>> eBPF programs attached to kprobes need to filter based on
>>>> current->pid, uid and other fields, so introduce helper functions:
>>>>
>>>> u64 bpf_get_current_pid_tgid(void)
>>>> Return: current->tgid << 32 | current->pid
>>>>
>>>> u64 bpf_get_current_uid_gid(void)
>>>> Return: current_gid << 32 | current_uid
>>>
>>> How does this work wrt namespaces,
>>
>> from_kuid(current_user_ns(), uid)
>
> Is current_user_ns() well defined in the context of an eBPF program?

What do you mean 'well defined'?
Semantically same as 'current'. Depending on where particular
kprobe is placed, 'current' is either meaningful or not. Program
author needs to know what he's doing. It's a tool.
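
For reference, the 64-bit return layout quoted above (tgid in the upper 32
bits, pid in the lower 32) can be packed and unpacked as below.  This is a
plain userspace sketch; pack_pid_tgid(), unpack_tgid() and unpack_pid() are
hypothetical helper names, not part of the proposed BPF interface.

```c
#include <assert.h>
#include <stdint.h>

/* Combine two 32-bit ids into one u64: tgid in the high word, pid in
 * the low word.  The cast must happen before the shift, otherwise the
 * shift would be done in 32 bits.
 */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return ((uint64_t)tgid << 32) | pid;
}

/* Recover the high word. */
static uint32_t unpack_tgid(uint64_t v)
{
	return (uint32_t)(v >> 32);
}

/* Recover the low word. */
static uint32_t unpack_pid(uint64_t v)
{
	return (uint32_t)v;
}
```

A program that filters on the thread group would compare unpack_tgid() of
the returned value against the id it is interested in; the same layout
applies to the gid/uid pair.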



Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Alexei Starovoitov

On 6/12/15 4:25 PM, Andy Lutomirski wrote:
> It's a dangerous tool.  Also, shouldn't the returned uid match the
> namespace of the task that installed the probe, not the task that's
> being probed?

so leaking info to unprivileged apps is the concern?
The whole thing is for root only as you know.
The non-root is still far away. Today root needs to see the whole
kernel. That was the goal from the beginning.



Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Andy Lutomirski
On Fri, Jun 12, 2015 at 5:15 PM, Alexei Starovoitov a...@plumgrid.com wrote:
> On 6/12/15 5:03 PM, Andy Lutomirski wrote:
>> On Fri, Jun 12, 2015 at 4:55 PM, Alexei Starovoitov a...@plumgrid.com
>> wrote:
>>> On 6/12/15 4:47 PM, Andy Lutomirski wrote:
>>>> On Fri, Jun 12, 2015 at 4:38 PM, Alexei Starovoitov a...@plumgrid.com
>>>> wrote:
>>>>> On 6/12/15 4:25 PM, Andy Lutomirski wrote:
>>>>>> It's a dangerous tool.  Also, shouldn't the returned uid match the
>>>>>> namespace of the task that installed the probe, not the task that's
>>>>>> being probed?
>>>>>
>>>>> so leaking info to unprivileged apps is the concern?
>>>>> The whole thing is for root only as you know.
>>>>> The non-root is still far away. Today root needs to see the whole
>>>>> kernel. That was the goal from the beginning.
>>>>
>>>> This is more of a correctness issue than a security issue.  ISTM using
>>>> current_user_ns() in a kprobe is asking for trouble.  It certainly
>>>> allows any unprivileged user to show any uid it wants to the probe,
>>>> which is probably not what the installer of the probe expects.
>>>
>>> probe doesn't expect anything. it doesn't make any decisions.
>>> bpf is read only. it's _visibility_ into the kernel.
>>> It's not used for security.
>>> When we start connecting eBPF to seccomp I would agree that uid
>>> handling needs to be done carefully, but we're not there yet.
>>> I don't want to kill _visibility_ because in some distant future
>>> bpf becomes a decision making tool in security area and
>>> get_current_uid() will return numbers that shouldn't be blindly
>>> used to reject/accept a user requesting something. That's far away.
>>
>> All that is true, but the code that *installed* the bpf probe might
>> get confused when it logs that uid 0 did such-and-such when
>> really some unprivileged userns root did it.
>
> so what specifically are you proposing?
> Use from_kuid(init_user_ns,...) instead?

That seems reasonable to me.  After all, you can't install one of
these probes from a non-init userns.

--Andy


Re: [PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs

2015-06-12 Thread Eric Dumazet
On Fri, 2015-06-12 at 18:50 -0700, Alexei Starovoitov wrote:

> sure, that's better.
> If you're going to submit it officially, please add my Tested-by.
> My server is happy now :)

Sure, will do.

I tried adding __must_check to __skb_header_pointer() but apparently had
to use W=1 to get a warning:

make W=1 net/core/
  CC  net/core/flow_dissector.o
net/core/flow_dissector.c: In function ‘__skb_flow_dissect’:
net/core/flow_dissector.c:390:19: warning: variable ‘opthdr’ set but not used [-Wunused-but-set-variable]
   u8 _opthdr[2], *opthdr;


diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index cc612fc0a8943ec853b92e6b3516b0e5582299e2..45252c4f49e4020eec523273f23f65ee87cc0bd5 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2743,8 +2743,9 @@ __wsum __skb_checksum(const struct sk_buff *skb, int offset, int len,
 __wsum skb_checksum(const struct sk_buff *skb, int offset, int len,
 		    __wsum csum);
 
-static inline void *__skb_header_pointer(const struct sk_buff *skb, int offset,
-					 int len, void *data, int hlen, void *buffer)
+static inline void * __must_check
+__skb_header_pointer(const struct sk_buff *skb, int offset,
+		     int len, void *data, int hlen, void *buffer)
 {
 	if (hlen - offset >= len)
 		return data + offset;




Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Andy Lutomirski
On Fri, Jun 12, 2015 at 4:23 PM, Alexei Starovoitov a...@plumgrid.com wrote:
> On 6/12/15 3:54 PM, Andy Lutomirski wrote:
>> On Fri, Jun 12, 2015 at 3:44 PM, Alexei Starovoitov a...@plumgrid.com
>> wrote:
>>> On 6/12/15 3:08 PM, Andy Lutomirski wrote:
>>>> On Fri, Jun 12, 2015 at 2:40 PM, Alexei Starovoitov a...@plumgrid.com
>>>> wrote:
>>>>> eBPF programs attached to kprobes need to filter based on
>>>>> current->pid, uid and other fields, so introduce helper functions:
>>>>>
>>>>> u64 bpf_get_current_pid_tgid(void)
>>>>> Return: current->tgid << 32 | current->pid
>>>>>
>>>>> u64 bpf_get_current_uid_gid(void)
>>>>> Return: current_gid << 32 | current_uid
>>>>
>>>> How does this work wrt namespaces,
>>>
>>> from_kuid(current_user_ns(), uid)
>>
>> Is current_user_ns() well defined in the context of an eBPF program?
>
> What do you mean 'well defined'?
> Semantically same as 'current'. Depending on where particular
> kprobe is placed, 'current' is either meaningful or not. Program
> author needs to know what he's doing. It's a tool.

It's a dangerous tool.  Also, shouldn't the returned uid match the
namespace of the task that installed the probe, not the task that's
being probed?

--Andy


Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Andy Lutomirski
On Fri, Jun 12, 2015 at 4:38 PM, Alexei Starovoitov a...@plumgrid.com wrote:
> On 6/12/15 4:25 PM, Andy Lutomirski wrote:
>> It's a dangerous tool.  Also, shouldn't the returned uid match the
>> namespace of the task that installed the probe, not the task that's
>> being probed?
>
> so leaking info to unprivileged apps is the concern?
> The whole thing is for root only as you know.
> The non-root is still far away. Today root needs to see the whole
> kernel. That was the goal from the beginning.

This is more of a correctness issue than a security issue.  ISTM using
current_user_ns() in a kprobe is asking for trouble.  It certainly
allows any unprivileged user to show any uid it wants to the probe,
which is probably not what the installer of the probe expects.

--Andy


Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Alexei Starovoitov

On 6/12/15 4:47 PM, Andy Lutomirski wrote:
> On Fri, Jun 12, 2015 at 4:38 PM, Alexei Starovoitov a...@plumgrid.com wrote:
>> On 6/12/15 4:25 PM, Andy Lutomirski wrote:
>>> It's a dangerous tool.  Also, shouldn't the returned uid match the
>>> namespace of the task that installed the probe, not the task that's
>>> being probed?
>>
>> so leaking info to unprivileged apps is the concern?
>> The whole thing is for root only as you know.
>> The non-root is still far away. Today root needs to see the whole
>> kernel. That was the goal from the beginning.
>
> This is more of a correctness issue than a security issue.  ISTM using
> current_user_ns() in a kprobe is asking for trouble.  It certainly
> allows any unprivileged user to show any uid it wants to the probe,
> which is probably not what the installer of the probe expects.

probe doesn't expect anything. it doesn't make any decisions.
bpf is read only. it's _visibility_ into the kernel.
It's not used for security.
When we start connecting eBPF to seccomp I would agree that uid
handling needs to be done carefully, but we're not there yet.
I don't want to kill _visibility_ because in some distant future
bpf becomes a decision making tool in security area and
get_current_uid() will return numbers that shouldn't be blindly
used to reject/accept a user requesting something. That's far away.



Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Andy Lutomirski
On Fri, Jun 12, 2015 at 4:55 PM, Alexei Starovoitov a...@plumgrid.com wrote:
> On 6/12/15 4:47 PM, Andy Lutomirski wrote:
>> On Fri, Jun 12, 2015 at 4:38 PM, Alexei Starovoitov a...@plumgrid.com
>> wrote:
>>> On 6/12/15 4:25 PM, Andy Lutomirski wrote:
>>>> It's a dangerous tool.  Also, shouldn't the returned uid match the
>>>> namespace of the task that installed the probe, not the task that's
>>>> being probed?
>>>
>>> so leaking info to unprivileged apps is the concern?
>>> The whole thing is for root only as you know.
>>> The non-root is still far away. Today root needs to see the whole
>>> kernel. That was the goal from the beginning.
>>
>> This is more of a correctness issue than a security issue.  ISTM using
>> current_user_ns() in a kprobe is asking for trouble.  It certainly
>> allows any unprivileged user to show any uid it wants to the probe,
>> which is probably not what the installer of the probe expects.
>
> probe doesn't expect anything. it doesn't make any decisions.
> bpf is read only. it's _visibility_ into the kernel.
> It's not used for security.
> When we start connecting eBPF to seccomp I would agree that uid
> handling needs to be done carefully, but we're not there yet.
> I don't want to kill _visibility_ because in some distant future
> bpf becomes a decision making tool in security area and
> get_current_uid() will return numbers that shouldn't be blindly
> used to reject/accept a user requesting something. That's far away.

All that is true, but the code that *installed* the bpf probe might
get confused when it logs that uid 0 did such-and-such when
really some unprivileged userns root did it.

Also, as you start calling more and more non-trivial functions from
bpf, you might need to start preventing bpf probe installations in
those functions.
--Andy


[PATCH net-next v2] bridge: use either ndo VLAN ops or switchdev VLAN ops to install MASTER vlans

2015-06-12 Thread sfeldma
From: Scott Feldman sfel...@gmail.com

v2:

Move struct switchdev_obj automatics to inner scope where they're used.

v1:

To maintain backward compatibility with the existing iproute2 bridge vlan
command, let bridge's setlink/dellink handler call into either the port
driver's 8021q ndo ops or the port driver's bridge_setlink/dellink ops.

This allows port driver to choose 8021q ops or the newer
bridge_setlink/dellink ops when implementing VLAN add/del filtering on the
device.  The iproute bridge vlan command does not need to be modified.

To summarize using the bridge vlan command examples, we have:

1) bridge vlan add|del vid VID dev DEV

Here iproute2 sets MASTER flag.  Bridge's bridge_setlink/dellink is called.
Vlan is set on bridge for port.  If port driver implements ndo 8021q ops,
call those so port driver can install vlan filter on device.  Otherwise, if
port driver implements bridge_setlink/dellink ops, call those to install
vlan filter to device.  This option only works if port is bridged.

2) bridge vlan add|del vid VID dev DEV master

Same as 1)

3) bridge vlan add|del vid VID dev DEV self

Bridge's bridge_setlink/dellink isn't called.  Port driver's
bridge_setlink/dellink is called, if implemented.  This option works if
port is bridged or not.  If port is not bridged, a VLAN can still be
added/deleted to device filter using this variant.

4) bridge vlan add|del vid VID dev DEV master self

This is a combination of 1) and 3), but will only work if port is bridged.

Signed-off-by: Scott Feldman sfel...@gmail.com
---
 net/bridge/br_vlan.c |   59 --
 1 file changed, 57 insertions(+), 2 deletions(-)

diff --git a/net/bridge/br_vlan.c b/net/bridge/br_vlan.c
index 13013fe..17fc358 100644
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -2,6 +2,7 @@
 #include <linux/netdevice.h>
 #include <linux/rtnetlink.h>
 #include <linux/slab.h>
+#include <net/switchdev.h>
 
 #include "br_private.h"
 
@@ -36,6 +37,36 @@ static void __vlan_add_flags(struct net_port_vlans *v, u16 vid, u16 flags)
 		clear_bit(vid, v->untagged_bitmap);
 }
 
+static int __vlan_vid_add(struct net_device *dev, struct net_bridge *br,
+			  u16 vid, u16 flags)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+	int err;
+
+	/* If driver uses VLAN ndo ops, use 8021q to install vid
+	 * on device, otherwise try switchdev ops to install vid.
+	 */
+
+	if (ops->ndo_vlan_rx_add_vid) {
+		err = vlan_vid_add(dev, br->vlan_proto, vid);
+	} else {
+		struct switchdev_obj vlan_obj = {
+			.id = SWITCHDEV_OBJ_PORT_VLAN,
+			.u.vlan = {
+				.flags = flags,
+				.vid_start = vid,
+				.vid_end = vid,
+			},
+		};
+
+		err = switchdev_port_obj_add(dev, &vlan_obj);
+		if (err == -EOPNOTSUPP)
+			err = 0;
+	}
+
+	return err;
+}
+
 static int __vlan_add(struct net_port_vlans *v, u16 vid, u16 flags)
 {
 	struct net_bridge_port *p = NULL;
@@ -62,7 +93,7 @@ static int __vlan_add(struct net_port_vlans *v, u16 vid, u16 flags)
 		 * This ensures tagged traffic enters the bridge when
 		 * promiscuous mode is disabled by br_manage_promisc().
 		 */
-		err = vlan_vid_add(dev, br->vlan_proto, vid);
+		err = __vlan_vid_add(dev, br, vid, flags);
 		if (err)
 			return err;
 	}
@@ -86,6 +117,30 @@ out_filt:
 	return err;
 }
 
+static void __vlan_vid_del(struct net_device *dev, struct net_bridge *br,
+			   u16 vid)
+{
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	/* If driver uses VLAN ndo ops, use 8021q to delete vid
+	 * on device, otherwise try switchdev ops to delete vid.
+	 */
+
+	if (ops->ndo_vlan_rx_kill_vid) {
+		vlan_vid_del(dev, br->vlan_proto, vid);
+	} else {
+		struct switchdev_obj vlan_obj = {
+			.id = SWITCHDEV_OBJ_PORT_VLAN,
+			.u.vlan = {
+				.vid_start = vid,
+				.vid_end = vid,
+			},
+		};
+
+		switchdev_port_obj_del(dev, &vlan_obj);
+	}
+}
+
 static int __vlan_del(struct net_port_vlans *v, u16 vid)
 {
 	if (!test_bit(vid, v->vlan_bitmap))
@@ -96,7 +151,7 @@ static int __vlan_del(struct net_port_vlans *v, u16 vid)
 
 	if (v->port_idx) {
 		struct net_bridge_port *p = v->parent.port;
-		vlan_vid_del(p->dev, p->br->vlan_proto, vid);
+		__vlan_vid_del(p->dev, p->br, vid);
 	}
 
 	clear_bit(vid, v->vlan_bitmap);
-- 
1.7.10.4
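
The dispatch rule the patch introduces (prefer the 8021q ndo hook, otherwise
try the switchdev hook and treat "not supported" as success) can be modelled
in userspace as follows.  This is a sketch under assumptions, not bridge
code: struct port_ops, vlan_vid_add_sketch() and the sample_* hooks are
hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a port driver that may implement either hook. */
struct port_ops {
	int (*vlan_rx_add_vid)(unsigned short vid);	/* 8021q-style hook */
	int (*obj_add_vlan)(unsigned short vid);	/* switchdev-style hook */
};

/* Prefer the 8021q hook; otherwise try the switchdev hook.  A driver
 * that implements neither, or reports -EOPNOTSUPP, is not an error:
 * the VLAN is still tracked in software.  Other errors propagate.
 */
static int vlan_vid_add_sketch(const struct port_ops *ops, unsigned short vid)
{
	int err;

	if (ops->vlan_rx_add_vid)
		return ops->vlan_rx_add_vid(vid);

	err = ops->obj_add_vlan ? ops->obj_add_vlan(vid) : -EOPNOTSUPP;
	if (err == -EOPNOTSUPP)
		err = 0;
	return err;
}

/* Sample hooks used to exercise the dispatch. */
static int sample_ndo_add(unsigned short vid)
{
	(void)vid;
	return 0;
}

static int sample_switchdev_unsupported(unsigned short vid)
{
	(void)vid;
	return -EOPNOTSUPP;
}

static int sample_switchdev_nomem(unsigned short vid)
{
	(void)vid;
	return -ENOMEM;
}
```

Swallowing only -EOPNOTSUPP keeps genuine failures (here -ENOMEM) visible to
the caller while letting ports without hardware VLAN filtering keep working.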


Re: [PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs

2015-06-12 Thread Alexei Starovoitov
On Fri, Jun 12, 2015 at 06:37:34PM -0700, Eric Dumazet wrote:
> On Fri, 2015-06-12 at 18:27 -0700, Alexei Starovoitov wrote:
> > On Fri, Jun 12, 2015 at 09:01:06AM -0700, Tom Herbert wrote:
> > > If dst, hop-by-hop or routing extension headers are present determine
> > > length of the options and skip over them in flow dissection.
> > > 
> > > Signed-off-by: Tom Herbert t...@herbertland.com
> > > ---
> > >  net/core/flow_dissector.c | 17 +
> > >  1 file changed, 17 insertions(+)
> > > 
> > > diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> > > index 1818cdc..22e4dff 100644
> > > --- a/net/core/flow_dissector.c
> > > +++ b/net/core/flow_dissector.c
> > > @@ -327,6 +327,7 @@ mpls:
> > >  		return false;
> > >  	}
> > >  
> > > +ip_proto_again:
> > >  	switch (ip_proto) {
> > >  	case IPPROTO_GRE: {
> > >  		struct gre_hdr {
> > > @@ -383,6 +384,22 @@ mpls:
> > >  		}
> > >  		goto again;
> > >  	}
> > > +	case NEXTHDR_HOP:
> > > +	case NEXTHDR_ROUTING:
> > > +	case NEXTHDR_DEST: {
> > > +		u8 _opthdr[2], *opthdr;
> > > +
> > > +		if (proto != htons(ETH_P_IPV6))
> > > +			break;
> > > +
> > > +		opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
> > > +					      data, hlen, _opthdr);
> > > +
> > > +		ip_proto = _opthdr[0];
> > > +		nhoff += (_opthdr[1] + 1) << 3;
> > > +
> > > +		goto ip_proto_again;
> > > +	}
> > 
> > Dave,
> > 
> > please revert it. My server locks up during boot with:
> 
> Seems easy to fix instead ?
> 
> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
> index 22e4dffa0c8b3b9a20a7324eae1627313e14ce30..476e5dda59e19822dba98a931369ff2666c59c0d 100644
> --- a/net/core/flow_dissector.c
> +++ b/net/core/flow_dissector.c
> @@ -394,9 +394,11 @@ ip_proto_again:
>  
>  		opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
>  					      data, hlen, _opthdr);
> +		if (!opthdr)
> +			return false;
>  
> -		ip_proto = _opthdr[0];
> -		nhoff += (_opthdr[1] + 1) << 3;
> +		ip_proto = opthdr[0];
> +		nhoff += (opthdr[1] + 1) << 3;
>  
>  		goto ip_proto_again;
>  	}

sure, that's better.
If you're going to submit it officially, please add my Tested-by.
My server is happy now :)
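
The fix quoted above comes down to two things: read the option header
through the pointer the helper returns (the local buffer is not filled when
the helper can point straight into the packet), and stop when that pointer
is NULL.  A minimal userspace sketch of the same walk follows.
header_pointer() and skip_ext_hdrs() are hypothetical names, and
header_pointer() is a simplified, linear-buffer-only stand-in for
__skb_header_pointer().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* IPv6 next-header values for the three extension headers handled. */
#define NEXTHDR_HOP	0
#define NEXTHDR_ROUTING	43
#define NEXTHDR_DEST	60

/* Return a pointer to len bytes at offset, or NULL when they do not
 * fit inside the hlen bytes available.
 */
static const uint8_t *header_pointer(const uint8_t *data, size_t hlen,
				     size_t offset, size_t len)
{
	if (offset > hlen || hlen - offset < len)
		return NULL;
	return data + offset;
}

/* Skip hop-by-hop, routing and destination option headers.  Each one
 * starts with a next-header byte and a length byte counted in 8-byte
 * units, not including the first 8 bytes.  Returns 0 on success with
 * *ip_proto and *nhoff updated, or -1 if the packet is truncated.
 */
static int skip_ext_hdrs(const uint8_t *data, size_t hlen,
			 uint8_t *ip_proto, size_t *nhoff)
{
	for (;;) {
		const uint8_t *opthdr;

		switch (*ip_proto) {
		case NEXTHDR_HOP:
		case NEXTHDR_ROUTING:
		case NEXTHDR_DEST:
			opthdr = header_pointer(data, hlen, *nhoff, 2);
			if (!opthdr)
				return -1;	/* the check the fix adds */
			*ip_proto = opthdr[0];
			*nhoff += ((size_t)opthdr[1] + 1) << 3;
			break;
		default:
			return 0;
		}
	}
}
```

Because every iteration either advances nhoff by at least 8 bytes or
returns, the walk terminates once the offset runs past the buffer.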



Re: [PATCH net-next] Fix Cavium Liquidio build related errors and warnings

2015-06-12 Thread David Miller
From: Raghu Vatsavayi rvatsav...@caviumnetworks.com
Date: Fri, 12 Jun 2015 18:11:50 -0700

> 1) Fixed following sparse warnings:
> ...
> 2) Fix build errors corresponding to vmalloc on linux-next 4.1.
> 3) Liquidio now supports 64 bit only, modified Kconfig accordingly.
> 4) Fix some code alignment issues based on kernel build warnings.
> 
> Signed-off-by: Derek Chickles derek.chick...@caviumnetworks.com
> Signed-off-by: Satanand Burla satananda.bu...@caviumnetworks.com
> Signed-off-by: Felix Manlunas felix.manlu...@caviumnetworks.com
> Signed-off-by: Raghu Vatsavayi raghu.vatsav...@caviumnetworks.com

Applied, but I _seriously_ wish you didn't fix the readq/writeq stuff by
restricting the build of the driver to 64-bit.  That really kills build
test coverage.

Just provide an appropriate set of readq/writeq like other drivers do
by including either io-64-nonatomic-hi-lo.h or io-64-nonatomic-lo-hi.h

Thanks.
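
For readers unfamiliar with the non-atomic fallbacks mentioned above, the
idea is to build one 64-bit access out of two 32-bit ones.  The following is
a userspace sketch of the low-word-first variant only; read32(),
lo_hi_read64() and lo_hi_write64() are hypothetical names, plain memory
stands in for a device register, and the low word is assumed to sit at the
lower address.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a 32-bit register read. */
static uint32_t read32(const volatile uint32_t *addr)
{
	return *addr;
}

/* Read the low word, then the high word, and combine them.  The two
 * reads are not atomic as a pair, hence "non-atomic".
 */
static uint64_t lo_hi_read64(const volatile uint32_t *addr)
{
	uint32_t low = read32(addr);
	uint32_t high = read32(addr + 1);

	return low + ((uint64_t)high << 32);
}

/* Write the low word, then the high word. */
static void lo_hi_write64(uint64_t val, volatile uint32_t *addr)
{
	addr[0] = (uint32_t)val;
	addr[1] = (uint32_t)(val >> 32);
}
```

Whether the low or the high half must be accessed first depends on the
device, which is why two orderings are offered.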


Re: [PATCH net-next] flow_dissector: fix ipv6 dst, hop-by-hop and routing ext hdrs

2015-06-12 Thread Tom Herbert
On Fri, Jun 12, 2015 at 7:31 PM, Eric Dumazet eric.duma...@gmail.com wrote:
 From: Eric Dumazet eduma...@google.com

 __skb_header_pointer() returns a pointer that must be checked.

 Fixes infinite loop reported by Alexei, and add __must_check to
 catch these errors earlier.

 Fixes: 6a74fcf426f5 (flow_dissector: add support for dst, hop-by-hop and 
 routing ext hdrs)
 Reported-by: Alexei Starovoitov alexei.starovoi...@gmail.com
 Tested-by: Alexei Starovoitov alexei.starovoi...@gmail.com
 Signed-off-by: Eric Dumazet eduma...@google.com
 ---
  include/linux/skbuff.h|9 +
  net/core/flow_dissector.c |6 --
  2 files changed, 9 insertions(+), 6 deletions(-)

 diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
 index 
 cc612fc0a8943ec853b92e6b3516b0e5582299e2..a7acc92aa6685d7006077510697e3d9481b02588
  100644
 --- a/include/linux/skbuff.h
 +++ b/include/linux/skbuff.h
 @@ -2743,8 +2743,9 @@ __wsum __skb_checksum(const struct sk_buff *skb, int 
 offset, int len,
  __wsum skb_checksum(const struct sk_buff *skb, int offset, int len,
 __wsum csum);

 -static inline void *__skb_header_pointer(const struct sk_buff *skb, int 
 offset,
 -int len, void *data, int hlen, void 
 *buffer)
 +static inline void * __must_check
 +__skb_header_pointer(const struct sk_buff *skb, int offset,
 +int len, void *data, int hlen, void *buffer)
  {
 if (hlen - offset >= len)
 return data + offset;
 @@ -2756,8 +2757,8 @@ static inline void *__skb_header_pointer(const struct 
 sk_buff *skb, int offset,
 return buffer;
  }

 -static inline void *skb_header_pointer(const struct sk_buff *skb, int offset,
 -  int len, void *buffer)
 +static inline void * __must_check
 +skb_header_pointer(const struct sk_buff *skb, int offset, int len, void 
 *buffer)
  {
 return __skb_header_pointer(skb, offset, len, skb->data,
 skb_headlen(skb), buffer);
 diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
 index 
 22e4dffa0c8b3b9a20a7324eae1627313e14ce30..476e5dda59e19822dba98a931369ff2666c59c0d
  100644
 --- a/net/core/flow_dissector.c
 +++ b/net/core/flow_dissector.c
 @@ -394,9 +394,11 @@ ip_proto_again:

 opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
   data, hlen, _opthdr);
 +   if (!opthdr)
 +   return false;

 -   ip_proto = _opthdr[0];
 -   nhoff += (_opthdr[1] + 1) << 3;
 +   ip_proto = opthdr[0];
 +   nhoff += (opthdr[1] + 1) << 3;

 goto ip_proto_again;
 }



Acked-by: Tom Herbert t...@herbertland.com


[PATCH v2 net-next 0/3] bpf: share helpers between tracing and networking

2015-06-12 Thread Alexei Starovoitov
v1->v2: switched to init_user_ns from current_user_ns as suggested by Andy

Introduce new helpers to access 'struct task_struct'->pid, tgid, uid, gid, comm
fields in tracing and networking.

Share bpf_trace_printk() and bpf_get_smp_processor_id() helpers between
tracing and networking.

Alexei Starovoitov (3):
  bpf: introduce current->pid, tgid, uid, gid, comm accessors
  bpf: allow networking programs to use bpf_trace_printk() for
debugging
  bpf: let kprobe programs use bpf_get_smp_processor_id() helper

 include/linux/bpf.h|4 +++
 include/uapi/linux/bpf.h   |   19 +
 kernel/bpf/core.c  |7 +
 kernel/bpf/helpers.c   |   58 ++
 kernel/trace/bpf_trace.c   |   28 --
 net/core/filter.c  |8 ++
 samples/bpf/bpf_helpers.h  |6 
 samples/bpf/tracex2_kern.c |   24 
 samples/bpf/tracex2_user.c |   67 ++--
 9 files changed, 199 insertions(+), 22 deletions(-)

-- 
1.7.9.5



[PATCH v2 net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Alexei Starovoitov
eBPF programs attached to kprobes need to filter based on
current->pid, uid and other fields, so introduce helper functions:

u64 bpf_get_current_pid_tgid(void)
Return: current->tgid << 32 | current->pid

u64 bpf_get_current_uid_gid(void)
Return: current_gid << 32 | current_uid

bpf_get_current_comm(char *buf, int size_of_buf)
stores current->comm into buf

They can be used from the programs attached to TC as well to classify packets
based on current task fields.
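
For illustration, the 64-bit return value of bpf_get_current_pid_tgid()
(tgid in the upper 32 bits, pid in the lower 32) can be packed and split as
in the sketch below. The sketch_* names are hypothetical helpers for this
note, not part of the patch:

```c
#include <stdint.h>

/* Pack the two ids: tgid in the upper 32 bits, pid in the lower 32. */
static inline uint64_t sketch_pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return ((uint64_t)tgid << 32) | pid;
}

/* Recover the thread group id (what userspace calls the process id). */
static inline uint32_t sketch_tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Recover the per-thread pid. */
static inline uint32_t sketch_pid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

The uid/gid helper uses the same layout, with gid in the upper half and uid
in the lower half.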

Update tracex2 example to print histogram of write syscalls for each process
instead of aggregated for all.

Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
v1->v2: switched to init_user_ns from current_user_ns as suggested by Andy

These helpers will be mainly used by bpf+tracing, but the patch is targeting
net-next tree to minimize merge conflicts and they're useful in TC too.

The feature was requested by Wang Nan wangn...@huawei.com and
Brendan Gregg brendan.d.gr...@gmail.com

We've considered several alternatives:
1: 5 different helpers
  Cons: every call adds performance overhead

2a: single helper that populates 'struct bpf_task_info'
  and uses 'flags' with bit per field.
  struct bpf_task_info {
   __u32 pid;
   __u32 tgid;
   __u32 uid;
   __u32 gid;
   char comm[16];
  };
  bpf_get_current_task_info(task_info, size, flags)
  bit 0 - fill in pid
  bit 1 - fill in tgid
  Pros: single helper.
  Cons: not easy to use and a lot of compares in the helper itself
  (two compares for each field).

2b. single helper that populates 'struct bpf_task_info'
  and uses 'size' to tell how many fields to fill in.
  bpf_get_current_task_info(task_info, size);
  if (size >= offsetof(struct bpf_task_info, pid) + sizeof(info->pid))
    info->pid = task->pid;
  if (size >= offsetof(struct bpf_task_info, tgid) + sizeof(info->tgid))
    info->tgid = task->tgid;
  Pros: single call (with single compare per field).
  Cons: still hard to use when only some middle field (like uid) is needed.

These three helpers look like the best balance between performance and usability.
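
To make the rejected alternative 2b concrete, here is a userspace sketch of
a size-gated fill. The struct mirrors the 'struct bpf_task_info' layout
listed above; sketch_task_info, SKETCH_FIELD_END and sketch_fill_task_info
are hypothetical names, and a second struct stands in for the task:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct sketch_task_info {
    uint32_t pid;
    uint32_t tgid;
    uint32_t uid;
    uint32_t gid;
    char comm[16];
};

/* Offset of the first byte past 'field' within the struct. */
#define SKETCH_FIELD_END(field) \
    (offsetof(struct sketch_task_info, field) + \
     sizeof(((struct sketch_task_info *)0)->field))

/* Fill only the leading fields that fit in 'size' bytes. */
static void sketch_fill_task_info(struct sketch_task_info *info, size_t size,
                                  const struct sketch_task_info *task)
{
    if (size >= SKETCH_FIELD_END(pid))
        info->pid = task->pid;
    if (size >= SKETCH_FIELD_END(tgid))
        info->tgid = task->tgid;
    if (size >= SKETCH_FIELD_END(uid))
        info->uid = task->uid;
    if (size >= SKETCH_FIELD_END(gid))
        info->gid = task->gid;
    if (size >= SKETCH_FIELD_END(comm))
        memcpy(info->comm, task->comm, sizeof(info->comm));
}
```

A caller that wants only uid still has to pass a size covering pid and
tgid, which is the usability problem noted under 2b.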

 include/linux/bpf.h|3 ++
 include/uapi/linux/bpf.h   |   19 +
 kernel/bpf/core.c  |3 ++
 kernel/bpf/helpers.c   |   58 ++
 kernel/trace/bpf_trace.c   |6 
 net/core/filter.c  |6 
 samples/bpf/bpf_helpers.h  |6 
 samples/bpf/tracex2_kern.c |   24 
 samples/bpf/tracex2_user.c |   67 ++--
 9 files changed, 178 insertions(+), 14 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 2235aee8096a..1b9a3f5b27f6 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -188,5 +188,8 @@ extern const struct bpf_func_proto 
bpf_get_prandom_u32_proto;
 extern const struct bpf_func_proto bpf_get_smp_processor_id_proto;
 extern const struct bpf_func_proto bpf_tail_call_proto;
 extern const struct bpf_func_proto bpf_ktime_get_ns_proto;
+extern const struct bpf_func_proto bpf_get_current_pid_tgid_proto;
+extern const struct bpf_func_proto bpf_get_current_uid_gid_proto;
+extern const struct bpf_func_proto bpf_get_current_comm_proto;
 
 #endif /* _LINUX_BPF_H */
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 602f05b7a275..29ef6f99e43d 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -230,6 +230,25 @@ enum bpf_func_id {
 * Return: 0 on success
 */
BPF_FUNC_clone_redirect,
+
+   /**
+    * u64 bpf_get_current_pid_tgid(void)
+    * Return: current->tgid << 32 | current->pid
+    */
+   BPF_FUNC_get_current_pid_tgid,
+
+   /**
+    * u64 bpf_get_current_uid_gid(void)
+    * Return: current_gid << 32 | current_uid
+    */
+   BPF_FUNC_get_current_uid_gid,
+
+   /**
+    * bpf_get_current_comm(char *buf, int size_of_buf)
+    * stores current->comm into buf
+    * Return: 0 on success
+    */
+   BPF_FUNC_get_current_comm,
__BPF_FUNC_MAX_ID,
 };
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1e00aa3316dc..1fc45cc83076 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -730,6 +730,9 @@ const struct bpf_func_proto bpf_map_delete_elem_proto 
__weak;
 const struct bpf_func_proto bpf_get_prandom_u32_proto __weak;
 const struct bpf_func_proto bpf_get_smp_processor_id_proto __weak;
 const struct bpf_func_proto bpf_ktime_get_ns_proto __weak;
+const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
+const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
+const struct bpf_func_proto bpf_get_current_comm_proto __weak;
 
 /* Always built-in helper functions. */
 const struct bpf_func_proto bpf_tail_call_proto = {
diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 7ad5d8842d5b..1447ec09421e 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -14,6 +14,8 @@
 #include <linux/random.h>
 #include <linux/smp.h>
 #include <linux/ktime.h>
+#include 

Re: [PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs

2015-06-12 Thread Eric Dumazet
On Fri, 2015-06-12 at 18:27 -0700, Alexei Starovoitov wrote:
 On Fri, Jun 12, 2015 at 09:01:06AM -0700, Tom Herbert wrote:
  If dst, hop-by-hop or routing extension headers are present determine
  length of the options and skip over them in flow dissection.
  
  Signed-off-by: Tom Herbert t...@herbertland.com
  ---
   net/core/flow_dissector.c | 17 +
   1 file changed, 17 insertions(+)
  
  diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
  index 1818cdc..22e4dff 100644
  --- a/net/core/flow_dissector.c
  +++ b/net/core/flow_dissector.c
  @@ -327,6 +327,7 @@ mpls:
  return false;
  }
   
  +ip_proto_again:
  switch (ip_proto) {
  case IPPROTO_GRE: {
  struct gre_hdr {
  @@ -383,6 +384,22 @@ mpls:
  }
  goto again;
  }
  +   case NEXTHDR_HOP:
  +   case NEXTHDR_ROUTING:
  +   case NEXTHDR_DEST: {
  +   u8 _opthdr[2], *opthdr;
  +
  +   if (proto != htons(ETH_P_IPV6))
  +   break;
  +
  +   opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
  + data, hlen, _opthdr);
  +
  +   ip_proto = _opthdr[0];
  +   nhoff += (_opthdr[1] + 1) << 3;
  +
  +   goto ip_proto_again;
  +   }
 
 Dave,
 
 please revert it. My server locks up during boot with:

Seems easy to fix instead ?

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 
22e4dffa0c8b3b9a20a7324eae1627313e14ce30..476e5dda59e19822dba98a931369ff2666c59c0d
 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -394,9 +394,11 @@ ip_proto_again:
 
opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
  data, hlen, _opthdr);
+   if (!opthdr)
+   return false;
 
-   ip_proto = _opthdr[0];
-   nhoff += (_opthdr[1] + 1) << 3;
+   ip_proto = opthdr[0];
+   nhoff += (opthdr[1] + 1) << 3;
 
goto ip_proto_again;
}




Re: [PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs

2015-06-12 Thread Alexei Starovoitov
On Fri, Jun 12, 2015 at 07:11:16PM -0700, Eric Dumazet wrote:
 On Fri, 2015-06-12 at 18:50 -0700, Alexei Starovoitov wrote:
 
  
  sure, that's better.
  If you're going to submit it officially, please add my Tested-by.
  My server is happy now :)
 
 Sure , will do.
 
 I tried adding __must_check to __skb_header_pointer() but apparently had
 to use W=1 to get a warning :

that is great idea still. At least buildbot can pick it up.



[PATCH net-next] flow_dissector: fix ipv6 dst, hop-by-hop and routing ext hdrs

2015-06-12 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com

__skb_header_pointer() returns a pointer that must be checked.

Fixes infinite loop reported by Alexei, and add __must_check to
catch these errors earlier.
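
A userspace sketch of the failure mode and the fix, under stated
assumptions: sketch_header_pointer() is a hypothetical model of
__skb_header_pointer() over a flat byte array with a "linear" prefix of hlen
bytes, and SKETCH_MUST_CHECK stands in for __must_check. In the fast path
the helper returns a pointer into the packet and never writes the caller's
buffer, so reading the buffer directly (as the original code did with
_opthdr) can see uninitialised bytes, and a NULL return must end the walk:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MUST_CHECK __attribute__((warn_unused_result))

/* IPv6 extension header protocol numbers. */
enum {
    SKETCH_NEXTHDR_HOP = 0,
    SKETCH_NEXTHDR_ROUTING = 43,
    SKETCH_NEXTHDR_DEST = 60,
};

/* Return a pointer to 'len' bytes at 'offset', or NULL if out of range. */
static SKETCH_MUST_CHECK const uint8_t *
sketch_header_pointer(const uint8_t *pkt, int hlen, int total,
                      int offset, int len, uint8_t *buffer)
{
    if (offset < 0 || len < 0)
        return NULL;
    if (hlen - offset >= len)
        return pkt + offset;        /* fast path: buffer is NOT written */
    if (offset + len > total)
        return NULL;                /* packet too short */
    memcpy(buffer, pkt + offset, (size_t)len);
    return buffer;                  /* slow path: data copied to buffer */
}

/* Skip hop-by-hop, routing and destination option headers. */
static int sketch_skip_ext_hdrs(const uint8_t *pkt, int hlen, int total,
                                uint8_t *proto, int *nhoff)
{
    while (*proto == SKETCH_NEXTHDR_HOP ||
           *proto == SKETCH_NEXTHDR_ROUTING ||
           *proto == SKETCH_NEXTHDR_DEST) {
        uint8_t buf[2];
        const uint8_t *opthdr;

        opthdr = sketch_header_pointer(pkt, hlen, total, *nhoff,
                                       (int)sizeof(buf), buf);
        if (!opthdr)
            return -1;              /* truncated: stop instead of looping */

        *proto = opthdr[0];                 /* next header */
        *nhoff += (opthdr[1] + 1) << 3;     /* length in 8-byte units */
    }
    return 0;
}
```

With the NULL check in place a truncated or all-zero packet ends the walk
with an error rather than spinning.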

Fixes: 6a74fcf426f5 (flow_dissector: add support for dst, hop-by-hop and 
routing ext hdrs)
Reported-by: Alexei Starovoitov alexei.starovoi...@gmail.com
Tested-by: Alexei Starovoitov alexei.starovoi...@gmail.com
Signed-off-by: Eric Dumazet eduma...@google.com
---
 include/linux/skbuff.h|9 +
 net/core/flow_dissector.c |6 --
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 
cc612fc0a8943ec853b92e6b3516b0e5582299e2..a7acc92aa6685d7006077510697e3d9481b02588
 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2743,8 +2743,9 @@ __wsum __skb_checksum(const struct sk_buff *skb, int 
offset, int len,
 __wsum skb_checksum(const struct sk_buff *skb, int offset, int len,
__wsum csum);
 
-static inline void *__skb_header_pointer(const struct sk_buff *skb, int offset,
-int len, void *data, int hlen, void 
*buffer)
+static inline void * __must_check
+__skb_header_pointer(const struct sk_buff *skb, int offset,
+int len, void *data, int hlen, void *buffer)
 {
 if (hlen - offset >= len)
return data + offset;
@@ -2756,8 +2757,8 @@ static inline void *__skb_header_pointer(const struct 
sk_buff *skb, int offset,
return buffer;
 }
 
-static inline void *skb_header_pointer(const struct sk_buff *skb, int offset,
-  int len, void *buffer)
+static inline void * __must_check
+skb_header_pointer(const struct sk_buff *skb, int offset, int len, void 
*buffer)
 {
 return __skb_header_pointer(skb, offset, len, skb->data,
skb_headlen(skb), buffer);
diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 
22e4dffa0c8b3b9a20a7324eae1627313e14ce30..476e5dda59e19822dba98a931369ff2666c59c0d
 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -394,9 +394,11 @@ ip_proto_again:
 
opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
  data, hlen, _opthdr);
+   if (!opthdr)
+   return false;
 
-   ip_proto = _opthdr[0];
-   nhoff += (_opthdr[1] + 1) << 3;
+   ip_proto = opthdr[0];
+   nhoff += (opthdr[1] + 1) << 3;
 
goto ip_proto_again;
}




[PATCH v2 net-next 3/3] bpf: let kprobe programs use bpf_get_smp_processor_id() helper

2015-06-12 Thread Alexei Starovoitov
It's useful to do per-cpu histograms.

Suggested-by: Daniel Wagner daniel.wag...@bmw-carit.de
Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
v1->v2: no changes
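
A minimal userspace sketch of what a per-cpu histogram looks like, assuming
log2 buckets and a small fixed CPU count. All sketch_* names are
hypothetical; a real eBPF program would keep the counters in a BPF map keyed
by the value returned from bpf_get_smp_processor_id():

```c
#include <stdint.h>

#define SKETCH_NR_CPUS  4
#define SKETCH_NR_SLOTS 64

/* One row of log2 buckets per CPU. */
static uint64_t sketch_hist[SKETCH_NR_CPUS][SKETCH_NR_SLOTS];

/* Integer log2, with 0 and 1 both landing in bucket 0. */
static unsigned int sketch_log2(uint64_t v)
{
    unsigned int r = 0;

    while (v >>= 1)
        r++;
    return r;
}

/* Account one sample of size 'v' observed on CPU 'cpu'. */
static void sketch_hist_add(unsigned int cpu, uint64_t v)
{
    if (cpu < SKETCH_NR_CPUS)
        sketch_hist[cpu][sketch_log2(v)]++;
}
```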

 kernel/trace/bpf_trace.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 4f9b5d41869b..88a041adee90 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -181,6 +181,8 @@ static const struct bpf_func_proto 
*kprobe_prog_func_proto(enum bpf_func_id func
return &bpf_get_current_comm_proto;
case BPF_FUNC_trace_printk:
return bpf_get_trace_printk_proto();
+   case BPF_FUNC_get_smp_processor_id:
+   return &bpf_get_smp_processor_id_proto;
default:
return NULL;
}
-- 
1.7.9.5



[PATCH v2 net-next 2/3] bpf: allow networking programs to use bpf_trace_printk() for debugging

2015-06-12 Thread Alexei Starovoitov
bpf_trace_printk() is a helper function used to debug eBPF programs.
Let socket and TC programs use it as well.
Note, it's a DEBUG ONLY helper. If it's used in the program,
the kernel will print a warning banner to make sure users don't use
it in production.

Signed-off-by: Alexei Starovoitov a...@plumgrid.com
---
v1->v2: no changes
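
The patch relies on a weak default for bpf_get_trace_printk_proto() so that
networking code still links when the tracing side is not built. A userspace
sketch of that pattern with GCC/Clang attributes follows; sketch_func_proto,
sketch_get_trace_printk_proto and sketch_lookup are hypothetical names:

```c
#include <stddef.h>

struct sketch_func_proto {
    int id;
};

/*
 * Weak default: the linker uses this only when no other object file
 * provides a strong definition of the same symbol.
 */
__attribute__((weak))
const struct sketch_func_proto *sketch_get_trace_printk_proto(void)
{
    return NULL;
}

/* Callers must cope with NULL, meaning "helper not available". */
static const struct sketch_func_proto *sketch_lookup(int func_id)
{
    if (func_id == 1)
        return sketch_get_trace_printk_proto();
    return NULL;
}
```

Linking in another file with a non-weak sketch_get_trace_printk_proto()
would replace the default without touching the caller.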

 include/linux/bpf.h  |1 +
 kernel/bpf/core.c|4 
 kernel/trace/bpf_trace.c |   20 
 net/core/filter.c|2 ++
 4 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 1b9a3f5b27f6..4383476a0d48 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -150,6 +150,7 @@ struct bpf_array {
 u64 bpf_tail_call(u64 ctx, u64 r2, u64 index, u64 r4, u64 r5);
 void bpf_prog_array_map_clear(struct bpf_map *map);
 bool bpf_prog_array_compatible(struct bpf_array *array, const struct bpf_prog 
*fp);
+const struct bpf_func_proto *bpf_get_trace_printk_proto(void);
 
 #ifdef CONFIG_BPF_SYSCALL
 void bpf_register_prog_type(struct bpf_prog_type_list *tl);
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1fc45cc83076..c5bedc82bc1c 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -733,6 +733,10 @@ const struct bpf_func_proto bpf_ktime_get_ns_proto __weak;
 const struct bpf_func_proto bpf_get_current_pid_tgid_proto __weak;
 const struct bpf_func_proto bpf_get_current_uid_gid_proto __weak;
 const struct bpf_func_proto bpf_get_current_comm_proto __weak;
+const struct bpf_func_proto * __weak bpf_get_trace_printk_proto(void)
+{
+   return NULL;
+}
 
 /* Always built-in helper functions. */
 const struct bpf_func_proto bpf_tail_call_proto = {
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 3a17638cdf46..4f9b5d41869b 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -147,6 +147,17 @@ static const struct bpf_func_proto bpf_trace_printk_proto 
= {
.arg2_type  = ARG_CONST_STACK_SIZE,
 };
 
+const struct bpf_func_proto *bpf_get_trace_printk_proto(void)
+{
+   /*
+* this program might be calling bpf_trace_printk,
+* so allocate per-cpu printk buffers
+*/
+   trace_printk_init_buffers();
+
+   return &bpf_trace_printk_proto;
+}
+
 static const struct bpf_func_proto *kprobe_prog_func_proto(enum bpf_func_id 
func_id)
 {
switch (func_id) {
@@ -168,15 +179,8 @@ static const struct bpf_func_proto 
*kprobe_prog_func_proto(enum bpf_func_id func
return &bpf_get_current_uid_gid_proto;
case BPF_FUNC_get_current_comm:
return &bpf_get_current_comm_proto;
-
case BPF_FUNC_trace_printk:
-   /*
-* this program might be calling bpf_trace_printk,
-* so allocate per-cpu printk buffers
-*/
-   trace_printk_init_buffers();
-
-   return &bpf_trace_printk_proto;
+   return bpf_get_trace_printk_proto();
default:
return NULL;
}
diff --git a/net/core/filter.c b/net/core/filter.c
index 20aa51ccbf9d..65ff107d3d29 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1442,6 +1442,8 @@ sk_filter_func_proto(enum bpf_func_id func_id)
return &bpf_tail_call_proto;
case BPF_FUNC_ktime_get_ns:
return &bpf_ktime_get_ns_proto;
+   case BPF_FUNC_trace_printk:
+   return bpf_get_trace_printk_proto();
default:
return NULL;
}
-- 
1.7.9.5



[GIT] Networking

2015-06-12 Thread David Miller

1) Fix uninitialized struct station_info in cfg80211_wireless_stats(), from
   Johannes Berg.

2) Revert commit attempt to fix ipv6 protocol resubmission, it adds
   regressions.

3) Endless loops can be created in bridge port lists, fix from Nikolay
   Aleksandrov.

4) Don't WARN_ON() if sk->sk_forward_alloc is non-zero in
   sk_clear_memalloc, it is a legal situation during swap deactivation.
   Fix from Mel Gorman.

5) Fix order of disabling interrupts and unlocking NAPI in enic driver
   to avoid a race.  From Govindarajulu Varadarajan.

6) High and low register writes are swapped when programming the start
   of periodic output in igb driver.  From Richard Cochran.

7) Fix device rename handling in mpls stack, from Robert Shearman.

8) Do not trigger compaction synchronously when optimistically trying
   to allocate an order 3 page in alloc_skb_with_frags() and
  skb_page_frag_refill().  From Shaohua Li.

9) Authentication with COOKIE_ECHO is not handled properly in SCTP,
   fix from Marcelo Ricardo Leitner.

Please pull, thanks a lot!

The following changes since commit 5879ae5fd052a63d5ac0684320cb7df3e83da7de:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2015-06-08 
17:41:04 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 

for you to fetch changes up to b07d496177cd3bc4b70fb8a5e85ede24cb403a11:

  Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt (2015-06-12 
14:21:29 -0700)


David S. Miller (1):
  Revert "ipv6: Fix protocol resubmission"

Erik Hugne (1):
  tipc: disconnect socket directly after probe failure

Govindarajulu Varadarajan (3):
  enic: unlock napi busy poll before unmasking intr
  enic: check return value for stat dump
  enic: fix memory leak in rq_clean

Johannes Berg (1):
  cfg80211: wext: clear sinfo struct before calling driver

Marcelo Ricardo Leitner (1):
  sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO

Masanari Iida (1):
  Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt

Mel Gorman (1):
  net, swap: Remove a warning and clarify why sk_mem_reclaim is required 
when deactivating swap

Nikolay Aleksandrov (1):
  bridge: fix multicast router rlist endless loop

Richard Cochran (1):
  net: igb: fix the start time for periodic output signals

Robert Shearman (1):
  mpls: handle device renames for per-device sysctls

Shaohua Li (1):
  net: don't wait for order-3 page allocation

 Documentation/networking/udplite.txt   |  2 +-
 drivers/net/ethernet/cisco/enic/enic_ethtool.c | 20 +---
 drivers/net/ethernet/cisco/enic/enic_main.c| 11 +--
 drivers/net/ethernet/cisco/enic/vnic_rq.c  |  9 -
 drivers/net/ethernet/intel/igb/igb_ptp.c   |  4 ++--
 net/bridge/br_multicast.c  |  7 +++
 net/core/skbuff.c  |  2 +-
 net/core/sock.c| 15 ++-
 net/ipv6/ip6_input.c   |  8 +++-
 net/mpls/af_mpls.c | 11 +++
 net/sctp/auth.c| 11 ++-
 net/tipc/socket.c  | 16 +++-
 net/wireless/wext-compat.c |  2 ++
 13 files changed, 80 insertions(+), 38 deletions(-)


Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Alexei Starovoitov

On 6/12/15 5:03 PM, Andy Lutomirski wrote:
> On Fri, Jun 12, 2015 at 4:55 PM, Alexei Starovoitov a...@plumgrid.com wrote:
>> On 6/12/15 4:47 PM, Andy Lutomirski wrote:
>>> On Fri, Jun 12, 2015 at 4:38 PM, Alexei Starovoitov a...@plumgrid.com
>>> wrote:
>>>> On 6/12/15 4:25 PM, Andy Lutomirski wrote:
>>>>> It's a dangerous tool.  Also, shouldn't the returned uid match the
>>>>> namespace of the task that installed the probe, not the task that's
>>>>> being probed?
>>>>
>>>> so leaking info to unprivileged apps is the concern?
>>>> The whole thing is for root only as you know.
>>>> The non-root is still far away. Today root needs to see the whole
>>>> kernel. That was the goal from the beginning.
>>>
>>> This is more of a correctness issue than a security issue.  ISTM using
>>> current_user_ns() in a kprobe is asking for trouble.  It certainly
>>> allows any unprivileged user to show any uid it wants to the probe,
>>> which is probably not what the installer of the probe expects.
>>
>> probe doesn't expect anything. it doesn't make any decisions.
>> bpf is read only. it's _visibility_ into the kernel.
>> It's not used for security.
>> When we start connecting eBPF to seccomp I would agree that uid
>> handling needs to be done carefully, but we're not there yet.
>> I don't want to kill _visibility_ because in some distant future
>> bpf becomes a decision making tool in security area and
>> get_current_uid() will return numbers that shouldn't be blindly
>> used to reject/accept a user requesting something. That's far away.
>
> All that is true, but the code that *installed* the bpf probe might
> get confused when it logs that uid 0 did such-and-such when
> really some unprivileged userns root did it.

so what specifically you proposing?
Use from_kuid(init_user_ns,...) instead?

> Also, as you start calling more and more non-trivial functions from
> bpf, you might need to start preventing bpf probe installations in
> those functions.

yes. may be. I don't want to blacklist stuff yet, unless it
causes crashes. Recursive check is already there. Probably
something else will be needed.



Re: Fw: [Bug 98781] New: WWAN: TX bytes counter shows very huge impossible value

2015-06-12 Thread Kevin
Stephen Hemminger stephen at networkplumber.org writes:

 
 
 Begin forwarded message:
 
 Date: Sat, 23 May 2015 16:54:50 +
 From: bugzilla-daemon at bugzilla.kernel.org bugzilla-daemon at
bugzilla.kernel.org
 To: shemminger at linux-foundation.org shemminger at
linux-foundation.org
 Subject: [Bug 98781] New: WWAN: TX bytes counter shows very huge
impossible value
 
 https://bugzilla.kernel.org/show_bug.cgi?id=98781
 
 Bug ID: 98781
Summary: WWAN: TX bytes counter shows very huge impossible
 value
Product: Networking
Version: 2.5
 Kernel Version: 4.0.x
   Hardware: Intel
 OS: Linux
   Tree: Mainline
 Status: NEW
   Severity: normal
   Priority: P1
  Component: Other
   Assignee: shemminger at linux-foundation.org
   Reporter: mm at superbash.de
 Regression: No
 
 Since version 4.0.x the TX bytes counter of the WWAN module shows a weird
 value.
 
 Example:
 
 $  ifconfig wwan
 
 wwan0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
 inet xxx.xxx.xxx.xxx  netmask 255.255.255.252  broadcast
 xxx.xxx.xxx.xxx
 inet6 :::::  prefixlen 64  scopeid 0x20<link>
 ether xx:xx:xx:xx:xx:xx  txqueuelen 1000  (Ethernet)
 RX packets 19036  bytes 19190321 (18.3 MiB)
 RX errors 0  dropped 0  overruns 0  frame 0
 TX packets 15874  bytes 43228847574631 (39.3 TiB)
 TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
 
 39.3 TiB - wow, absolutely not true
 
 The WWAN is used as bridge to my internet provider (LTE usb stick)
 
 I use the counter to control the traffic. It's only the TX counter, the RX
 works ok.
 
I have exactly the same issue, exhibited when upgrading (n-1) kernel on
Ubuntu Vivid 15.04 from 3.17.x:

Linux uranis 3.19.0-20-generic #20-Ubuntu SMP Fri May 29 10:10:47 UTC 2015
x86_64 x86_64 x86_64 GNU/Linux

wwan0 Link encap:Ethernet  HWaddr 26:03:a9:e3:88:2e  
  inet addr:41.150.225.132  Bcast:41.150.225.135  Mask:255.255.255.248
  inet6 addr: fe80::2403:a9ff:fee3:882e/64 Scope:Link
  UP BROADCAST RUNNING NOARP MULTICAST  MTU:1500  Metric:1
  RX packets:3366 errors:0 dropped:0 overruns:0 frame:0
  TX packets:3497 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000 
  RX bytes:1459963 (1.4 MB)  TX bytes:15019500985395 (15.0 TB)

This looks like a kernel-internal or module (driver) bug; it shows up
everywhere, including 'system monitor'. This problem came up a few years
back and was solved, but seems to be back again...

Any ideas on a fix?
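
A quick way to show that the TX byte count cannot be right is to bound it by
the packet count. The sketch below uses hypothetical sketch_* helpers and
assumes no frame is larger than the MTU plus a 14-byte Ethernet header
(offloads could loosen this, so it is a rough check only):

```c
#include <stdint.h>

/* Rough upper bound on bytes for a given packet count. */
static uint64_t sketch_max_bytes(uint64_t packets, uint32_t mtu,
                                 uint32_t ll_hdr)
{
    return packets * ((uint64_t)mtu + ll_hdr);
}

/* Non-zero if the byte counter is consistent with the packet counter. */
static int sketch_counter_plausible(uint64_t bytes, uint64_t packets,
                                    uint32_t mtu, uint32_t ll_hdr)
{
    return bytes <= sketch_max_bytes(packets, mtu, ll_hdr);
}
```

For the output above, 3497 TX packets at MTU 1500 bound the total at roughly
5.3 MB, far below the reported 15019500985395 bytes, while the RX counter
(1459963 bytes for 3366 packets) passes the same check.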









Re: [PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs

2015-06-12 Thread Alexei Starovoitov
On Fri, Jun 12, 2015 at 09:01:06AM -0700, Tom Herbert wrote:
 If dst, hop-by-hop or routing extension headers are present determine
 length of the options and skip over them in flow dissection.
 
 Signed-off-by: Tom Herbert t...@herbertland.com
 ---
  net/core/flow_dissector.c | 17 +
  1 file changed, 17 insertions(+)
 
 diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
 index 1818cdc..22e4dff 100644
 --- a/net/core/flow_dissector.c
 +++ b/net/core/flow_dissector.c
 @@ -327,6 +327,7 @@ mpls:
   return false;
   }
  
 +ip_proto_again:
   switch (ip_proto) {
   case IPPROTO_GRE: {
   struct gre_hdr {
 @@ -383,6 +384,22 @@ mpls:
   }
   goto again;
   }
 + case NEXTHDR_HOP:
 + case NEXTHDR_ROUTING:
 + case NEXTHDR_DEST: {
 + u8 _opthdr[2], *opthdr;
 +
 + if (proto != htons(ETH_P_IPV6))
 + break;
 +
 + opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
 +   data, hlen, _opthdr);
 +
 + ip_proto = _opthdr[0];
 + nhoff += (_opthdr[1] + 1) << 3;
 +
 + goto ip_proto_again;
 + }

Dave,

please revert it. My server locks up during boot with:

[   32.391955] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 23s! 
[modprobe:1550]
[   32.392043] RIP: 0010:[815cd8e2]  [815cd8e2] 
skb_copy_bits+0x12/0x260
[   32.392060] Call Trace:
[   32.392061]  IRQ
[   32.392063]  [815d9f38] __skb_flow_dissect+0x358/0x820
[   32.392064]  [815da48e] __skb_get_hash+0x8e/0x2e0
[   32.392066]  [815def7b] __skb_tx_hash+0x5b/0xb0
[   32.392067]  [815df54a] __netdev_pick_tx+0x18a/0x1a0
[   32.392068]  [815df40a] ? __netdev_pick_tx+0x4a/0x1a0
[   32.392069]  [815e4db0] ? __dev_queue_xmit+0x50/0x620
[   32.392071]  [815e4d0b] netdev_pick_tx+0xcb/0x120
[   32.392072]  [815e4e08] __dev_queue_xmit+0xa8/0x620
[   32.392073]  [815e4db0] ? __dev_queue_xmit+0x50/0x620
[   32.392076]  [81698225] ? ip6_finish_output+0xa5/0x1e0
[   32.392077]  [815e53a3] dev_queue_xmit_sk+0x13/0x20
[   32.392078]  [81696144] ip6_finish_output2+0x464/0x5f0
[   32.392079]  [81698225] ? ip6_finish_output+0xa5/0x1e0
[   32.392081]  [816a5bf2] ? ip6_mtu+0xb2/0xd0
[   32.392082]  [816a5b80] ? ip6_mtu+0x40/0xd0
[   32.392083]  [81698225] ip6_finish_output+0xa5/0x1e0
[   32.392084]  [816983be] ip6_output+0x5e/0x1b0



[PATCH net-next] net: make u64_stats_init() a function

2015-06-12 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com

Using a function instead of a macro is cleaner and removes the
following W=1 warnings (extract):

In file included from net/ipv6/ip6_vti.c:29:0:
net/ipv6/ip6_vti.c: In function ‘vti6_dev_init_gen’:
include/linux/netdevice.h:2029:18: warning: variable ‘stat’ set but not
used [-Wunused-but-set-variable]
typeof(type) *stat;   \
  ^
net/ipv6/ip6_vti.c:862:16: note: in expansion of macro
‘netdev_alloc_pcpu_stats’
  dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
^
  CC [M]  net/ipv6/sit.o
In file included from net/ipv6/sit.c:30:0:
net/ipv6/sit.c: In function ‘ipip6_tunnel_init’:
include/linux/netdevice.h:2029:18: warning: variable ‘stat’ set but not
used [-Wunused-but-set-variable]
typeof(type) *stat;   \
  ^
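
A reduced userspace sketch of why the macro form triggers the warning and
the function form does not. The sketch_* types and SKETCH_NEED_SEQCOUNT are
hypothetical stand-ins for the kernel's seqcount, u64_stats_sync and the
"32-bit SMP" condition:

```c
struct sketch_seqcount {
    unsigned int sequence;
};

struct sketch_stats_sync {
    struct sketch_seqcount seq;
};

/* Stand-in for the kernel's 32-bit SMP condition; set to 0 to compare. */
#define SKETCH_NEED_SEQCOUNT 1

/*
 * Macro form: in the configuration where nothing needs initialising it
 * expands to an empty statement, so its argument is never evaluated. A
 * local variable that exists only to be passed here then looks
 * "set but not used" to the compiler.
 */
#define sketch_stats_init_noop(syncp) do { } while (0)

/*
 * Function form: the call always consumes its argument, whichever branch
 * of the #if is compiled, so the caller's variable counts as used.
 */
static inline void sketch_stats_init(struct sketch_stats_sync *syncp)
{
#if SKETCH_NEED_SEQCOUNT
    syncp->seq.sequence = 0;
#else
    (void)syncp;
#endif
}
```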

Signed-off-by: Eric Dumazet eduma...@google.com
---
 include/linux/u64_stats_sync.h |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/u64_stats_sync.h b/include/linux/u64_stats_sync.h
index 
4b4439e75f45f8e915f0ffb6b855be5f1113a04f..df89c9bcba7db8dbde3bbf2b99f9af6ed562b112
 100644
--- a/include/linux/u64_stats_sync.h
+++ b/include/linux/u64_stats_sync.h
@@ -68,11 +68,12 @@ struct u64_stats_sync {
 };
 
 
+static inline void u64_stats_init(struct u64_stats_sync *syncp)
+{
 #if BITS_PER_LONG == 32 && defined(CONFIG_SMP)
-# define u64_stats_init(syncp) seqcount_init(syncp.seq)
-#else
-# define u64_stats_init(syncp) do { } while (0)
+   seqcount_init(&syncp->seq);
 #endif
+}
 
 static inline void u64_stats_update_begin(struct u64_stats_sync *syncp)
 {




Re: [PATCH net-next 1/3] bpf: introduce current->pid, tgid, uid, gid, comm accessors

2015-06-12 Thread Alexei Starovoitov

On 6/12/15 5:24 PM, Andy Lutomirski wrote:
>> so what specifically you proposing?
>> Use from_kuid(init_user_ns,...) instead?
>
> That seems reasonable to me.  After all, you can't install one of
> these probes from a non-init userns.

ok. will respin with that change.
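
A toy model of the point under discussion: the same kernel-wide uid
translates to different numbers depending on which user namespace is used
for the lookup. This is not the kernel's from_kuid(); sketch_uid_extent and
sketch_from_kuid are hypothetical, and the single-extent map and the 100000
base are assumptions chosen for the example:

```c
#include <stdint.h>

/* ns uids [ns_first, ns_first + count) map to kernel uids from k_first. */
struct sketch_uid_extent {
    uint32_t ns_first;
    uint32_t k_first;
    uint32_t count;
};

/* Translate a kernel-wide uid into the id seen in a namespace, or -1. */
static int64_t sketch_from_kuid(const struct sketch_uid_extent *map,
                                uint32_t kuid)
{
    if (kuid >= map->k_first && kuid - map->k_first < map->count)
        return (int64_t)map->ns_first + (kuid - map->k_first);
    return -1;
}
```

With a container map that starts at kernel uid 100000, that uid reads as 0
(root) through the container's namespace but as 100000 through the initial
namespace, which is the number a root-installed probe would want to log.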


[PATCH net-next] Fix Cavium Liquidio build related errors and warnings

2015-06-12 Thread Raghu Vatsavayi
1) Fixed following sparse warnings:
lio_main.c:213:6: warning: symbol 'octeon_droq_bh' was not
declared. Should it be static?
lio_main.c:233:5: warning: symbol 'lio_wait_for_oq_pkts' was
not declared. Should it be static?
lio_main.c:3083:5: warning: symbol 'lio_nic_info' was not
declared. Should it be static?
lio_main.c:2618:16: warning: cast from restricted __be16
octeon_device.c:466:6: warning: symbol 'oct_set_config_info'
was not declared. Should it be static?
octeon_device.c:573:25: warning: cast to restricted __be32
octeon_device.c:582:29: warning: cast to restricted __be32
octeon_device.c:584:39: warning: cast to restricted __be32
octeon_device.c:594:13: warning: cast to restricted __be32
octeon_device.c:596:25: warning: cast to restricted __be32
octeon_device.c:613:25: warning: cast to restricted __be32
octeon_device.c:614:29: warning: cast to restricted __be64
octeon_device.c:615:29: warning: cast to restricted __be32
octeon_device.c:619:37: warning: cast to restricted __be32
octeon_device.c:623:33: warning: cast to restricted __be32
cn66xx_device.c:540:6: warning: symbol
'lio_cn6xxx_get_pcie_qlmport' was not declared. Should it be s
octeon_mem_ops.c:181:16: warning: cast to restricted __be64
octeon_mem_ops.c:190:16: warning: cast to restricted __be32
octeon_mem_ops.c:196:17: warning: incorrect type in initializer
2) Fix build errors corresponding to vmalloc on linux-next 4.1.
3) Liquidio now supports 64 bit only, modified Kconfig accordingly.
4) Fix some code alignment issues based on kernel build warnings.

Signed-off-by: Derek Chickles derek.chick...@caviumnetworks.com
Signed-off-by: Satanand Burla satananda.bu...@caviumnetworks.com
Signed-off-by: Felix Manlunas felix.manlu...@caviumnetworks.com
Signed-off-by: Raghu Vatsavayi raghu.vatsav...@caviumnetworks.com
---
 drivers/net/ethernet/cavium/Kconfig|  1 +
 drivers/net/ethernet/cavium/liquidio/cn66xx_device.c   |  2 +-
 drivers/net/ethernet/cavium/liquidio/lio_main.c|  9 +
 drivers/net/ethernet/cavium/liquidio/liquidio_image.h  | 14 +++---
 drivers/net/ethernet/cavium/liquidio/octeon_device.c   |  8 +---
 drivers/net/ethernet/cavium/liquidio/octeon_droq.c |  1 +
 drivers/net/ethernet/cavium/liquidio/octeon_mem_ops.c  |  6 +++---
 drivers/net/ethernet/cavium/liquidio/request_manager.c |  4 +++-
 8 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/cavium/Kconfig 
b/drivers/net/ethernet/cavium/Kconfig
index c7d8674..5e7a0e2 100644
--- a/drivers/net/ethernet/cavium/Kconfig
+++ b/drivers/net/ethernet/cavium/Kconfig
@@ -43,6 +43,7 @@ config THUNDER_NIC_BGX
 
 config LIQUIDIO
    tristate "Cavium LiquidIO support"
+   depends on 64BIT
select PTP_1588_CLOCK
select FW_LOADER
select LIBCRC32
diff --git a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c 
b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
index d23f494..8ad7425 100644
--- a/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
+++ b/drivers/net/ethernet/cavium/liquidio/cn66xx_device.c
@@ -537,7 +537,7 @@ void lio_cn6xxx_disable_interrupt(void *chip)
mmiowb();
 }
 
-void lio_cn6xxx_get_pcie_qlmport(struct octeon_device *oct)
+static void lio_cn6xxx_get_pcie_qlmport(struct octeon_device *oct)
 {
/* CN63xx Pass2 and newer parts implements the SLI_MAC_NUMBER register
 * to determine the PCIE port #
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_main.c 
b/drivers/net/ethernet/cavium/liquidio/lio_main.c
index c75f517..0660dee 100644
--- a/drivers/net/ethernet/cavium/liquidio/lio_main.c
+++ b/drivers/net/ethernet/cavium/liquidio/lio_main.c
@@ -26,6 +26,7 @@
 #include <linux/pci.h>
 #include <linux/pci_ids.h>
 #include <linux/ip.h>
+#include <net/ip.h>
 #include <linux/ipv6.h>
 #include <linux/net_tstamp.h>
 #include <linux/if_vlan.h>
@@ -210,7 +211,7 @@ static int liquidio_probe(struct pci_dev *pdev,
 static struct handshake handshake[MAX_OCTEON_DEVICES];
 static struct completion first_stage;
 
-void octeon_droq_bh(unsigned long pdev)
+static void octeon_droq_bh(unsigned long pdev)
 {
int q_no;
int reschedule = 0;
@@ -230,7 +231,7 @@ void octeon_droq_bh(unsigned long pdev)
    tasklet_schedule(&oct_priv->droq_tasklet);
 }
 
-int lio_wait_for_oq_pkts(struct octeon_device *oct)
+static int lio_wait_for_oq_pkts(struct octeon_device *oct)
 {
struct octeon_device_priv *oct_priv =
        (struct octeon_device_priv *)oct->priv;
@@ -2615,7 +2616,7 @@ static inline int is_ip_fragmented(struct sk_buff *skb)
 * with more to follow; the current offset could be 0 ).
 * -  ths offset field is non-zero.
 */
-   return htons(ip_hdr(skb)->frag_off) & 0x3fff;
+   return (ip_hdr(skb)->frag_off & htons(IP_MF | IP_OFFSET)) ? 1 : 0;
 }
 
 static inline int is_ipv6(struct sk_buff *skb)
@@ -3080,7 +3081,7 @@ 

[PATCH net-next] tcp: tcp_v6_connect() cleanup

2015-06-12 Thread Eric Dumazet
From: Eric Dumazet eduma...@google.com

Remove dead code from tcp_v6_connect()

Signed-off-by: Eric Dumazet eduma...@google.com
---
 net/ipv6/tcp_ipv6.c |2 --
 1 file changed, 2 deletions(-)

diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 
45a7176ed460681558808439f20e1622423f4c32..6748c4277affad71cd721e3a985af10c31c047ad
 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -120,7 +120,6 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr 
*uaddr,
struct ipv6_pinfo *np = inet6_sk(sk);
struct tcp_sock *tp = tcp_sk(sk);
struct in6_addr *saddr = NULL, *final_p, final;
-   struct rt6_info *rt;
struct flowi6 fl6;
struct dst_entry *dst;
int addr_type;
@@ -258,7 +257,6 @@ static int tcp_v6_connect(struct sock *sk, struct sockaddr 
*uaddr,
    sk->sk_gso_type = SKB_GSO_TCPV6;
__ip6_dst_store(sk, dst, NULL, NULL);
 
-   rt = (struct rt6_info *) dst;
    if (tcp_death_row.sysctl_tw_recycle &&
        !tp->rx_opt.ts_recent_stamp &&
        ipv6_addr_equal(&fl6.daddr, &sk->sk_v6_daddr))




[PATCH net-next 0/2] flow_dissector: Fix MPLS parsing and add ext hdr support

2015-06-12 Thread Tom Herbert
Need to shift label. Added parsing of dst, hop-by-hop, and routing
extension headers.


Tom Herbert (2):
  flow_dissector: Fix MPLS entropy label handling in flow dissector
  flow_dissector: add support for dst, hop-by-hop and routing ext hdrs

 net/core/flow_dissector.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

-- 
1.8.1



[PATCH net-next 2/2] flow_dissector: add support for dst, hop-by-hop and routing ext hdrs

2015-06-12 Thread Tom Herbert
If dst, hop-by-hop or routing extension headers are present determine
length of the options and skip over them in flow dissection.

Signed-off-by: Tom Herbert t...@herbertland.com
---
 net/core/flow_dissector.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 1818cdc..22e4dff 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -327,6 +327,7 @@ mpls:
return false;
}
 
+ip_proto_again:
switch (ip_proto) {
case IPPROTO_GRE: {
struct gre_hdr {
@@ -383,6 +384,22 @@ mpls:
}
goto again;
}
+   case NEXTHDR_HOP:
+   case NEXTHDR_ROUTING:
+   case NEXTHDR_DEST: {
+   u8 _opthdr[2], *opthdr;
+
+   if (proto != htons(ETH_P_IPV6))
+   break;
+
+   opthdr = __skb_header_pointer(skb, nhoff, sizeof(_opthdr),
+                                 data, hlen, &_opthdr);
+
+   ip_proto = _opthdr[0];
+   nhoff += (_opthdr[1] + 1) << 3;
+
+   goto ip_proto_again;
+   }
case IPPROTO_IPIP:
proto = htons(ETH_P_IP);
goto ip;
-- 
1.8.1
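
The length arithmetic in the hunk above can be checked in isolation.
What follows is a minimal user-space sketch, not the kernel code. It
assumes the generic IPv6 extension-header layout (Next Header in byte 0,
Hdr Ext Len in byte 1, counted in 8-octet units beyond the first 8
octets). The names skip_ext_hdr and struct ext_step are hypothetical and
exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Result of stepping over one extension header (hypothetical helper type). */
struct ext_step {
	uint8_t next_proto;  /* value of the Next Header field */
	size_t  next_off;    /* offset of the header that follows */
	int     ok;          /* 0 if the buffer was too short */
};

/*
 * Step over a hop-by-hop, routing, or destination-options header that
 * starts at offset `off` in `buf`.  The header length in bytes is
 * (Hdr Ext Len + 1) * 8, written here as a shift by 3.
 */
static struct ext_step skip_ext_hdr(const uint8_t *buf, size_t len, size_t off)
{
	struct ext_step r = { 0, off, 0 };
	size_t hdr_len;

	if (off > len || len - off < 2)
		return r;                 /* the two fixed bytes are missing */

	hdr_len = ((size_t)buf[off + 1] + 1) << 3;
	if (len - off < hdr_len)
		return r;                 /* header runs past the buffer */

	r.next_proto = buf[off];
	r.next_off = off + hdr_len;
	r.ok = 1;
	return r;
}
```

Unlike the hunk, the sketch refuses to advance when the buffer is too
short; a caller would loop on the returned next_proto in the same way the
patch jumps back to ip_proto_again.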



[PATCH net-next 1/2] flow_dissector: Fix MPLS entropy label handling in flow dissector

2015-06-12 Thread Tom Herbert
Need to shift after masking to get label value for comparison.

Fixes: b3baa0fbd02a1a9d493d8 ("mpls: Add MPLS entropy label in flow_keys")
Reported-by: Dan Carpenter dan.carpen...@oracle.com
Signed-off-by: Tom Herbert t...@herbertland.com
---
 net/core/flow_dissector.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
index 77e22e4..1818cdc 100644
--- a/net/core/flow_dissector.c
+++ b/net/core/flow_dissector.c
@@ -299,8 +299,8 @@ mpls:
if (!hdr)
return false;
 
-   if ((ntohl(hdr[0].entry) & MPLS_LS_LABEL_MASK) ==
-        MPLS_LABEL_ENTROPY) {
+   if ((ntohl(hdr[0].entry) & MPLS_LS_LABEL_MASK) >>
+        MPLS_LS_LABEL_SHIFT == MPLS_LABEL_ENTROPY) {
if (skb_flow_dissector_uses_key(flow_dissector,

FLOW_DISSECTOR_KEY_MPLS_ENTROPY)) {
key_keyid = 
skb_flow_dissector_target(flow_dissector,
-- 
1.8.1
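
To see why the shift matters, here is a small user-space sketch of the
comparison before and after the fix. The constants are local copies
chosen to match the usual MPLS label stack entry layout (20-bit label in
the top bits, entropy label indicator 7); they are assumptions of this
example, not taken from the patch, and the function names are
hypothetical.

```c
#include <stdint.h>

/* Local stand-ins for the label stack entry layout (assumed values). */
#define LS_LABEL_MASK   0xFFFFF000u
#define LS_LABEL_SHIFT  12
#define LABEL_ENTROPY   7u

/* Extract the 20-bit label from an entry already in host byte order. */
static uint32_t mpls_label(uint32_t entry)
{
	return (entry & LS_LABEL_MASK) >> LS_LABEL_SHIFT;
}

/* Post-patch comparison: mask, shift, then compare. */
static int is_entropy_indicator(uint32_t entry)
{
	return mpls_label(entry) == LABEL_ENTROPY;
}

/* Pre-patch comparison: masked but not shifted. */
static int is_entropy_indicator_unshifted(uint32_t entry)
{
	return (entry & LS_LABEL_MASK) == LABEL_ENTROPY;
}
```

With these constants the unshifted form can never match: masking clears
the low 12 bits, so the result cannot equal 7.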



Re: [PATCH WIP RFC 0/3] mpls: support for ler

2015-06-12 Thread roopa

On 6/10/15, 12:13 AM, roopa wrote:
Robert/Thomas, All my changes are in the below repo under the 'mpls' 
branch.

https://github.com/CumulusNetworks/net-next
https://github.com/CumulusNetworks/iproute2

The last iproute2 commit has a sample usage.

The commits pushed to this tree do not contain support for the 
following yet (but working on it):

a) tunnel routes to work with tunnel RTA_OIF and a non-tunnel RTA_OIF:
The current commits in the tree assume a non-tunnel RTA_OIF.
If the tunnel driver has registered a dst_output func,  dst_output
is set to the tunnel dst output handler in the receive route lookup 
path which in turn does the encap
and xmits. Thomas had last suggested using a flag to skip the dst 
output handler re-direction
for cases where RTA_OIF is a special tunnel netdev and the tunnel 
driver xmit function
can do the encap. My current thinking is to pass the oif to the encap 
parse handler and the handler can set the flag on the tunnel state. 
And this flag can then be used to skip the dst_output re-direction.

This change should be trivial will fix it soon.


I have pushed this change to my github tree.


b) make RTA_OIF optional and do a fib lookup.

thinking about this some more, RTA_OIF is already optional. And 
net/ipv4/fib_semantics.c:fib_check_nh will lookup the dev if not 
specified. Wouldn't that be enough ?. (unless i have misunderstood 
something here)


thanks,
Roopa




[PATCH] netlink: add API to retrieve all group memberships

2015-06-12 Thread David Herrmann
This patch adds getsockopt(SOL_NETLINK, NETLINK_LIST_MEMBERSHIPS) to
retrieve all groups a socket is a member of. Currently, we have to use
getsockname() and look at the nl.nl_groups bitmask. However, this mask is
limited to 32 groups. Hence, similar to NETLINK_ADD_MEMBERSHIP and
NETLINK_DROP_MEMBERSHIP, this adds a separate sockopt to manage group
IDs higher than 32.

This new NETLINK_LIST_MEMBERSHIPS option takes a pointer to __u32 and the
size of the array. The array is filled with the full membership-set of the
socket, and the required array size is returned in optlen. Hence,
user-space can retry with a properly sized array in case it was too small.

Signed-off-by: David Herrmann dh.herrm...@gmail.com
---
 include/uapi/linux/netlink.h | 15 ---
 net/netlink/af_netlink.c | 22 ++
 2 files changed, 30 insertions(+), 7 deletions(-)

diff --git a/include/uapi/linux/netlink.h b/include/uapi/linux/netlink.h
index 1a85940..e38094f 100644
--- a/include/uapi/linux/netlink.h
+++ b/include/uapi/linux/netlink.h
@@ -101,13 +101,14 @@ struct nlmsgerr {
struct nlmsghdr msg;
 };
 
-#define NETLINK_ADD_MEMBERSHIP   1
-#define NETLINK_DROP_MEMBERSHIP  2
-#define NETLINK_PKTINFO          3
-#define NETLINK_BROADCAST_ERROR  4
-#define NETLINK_NO_ENOBUFS       5
-#define NETLINK_RX_RING          6
-#define NETLINK_TX_RING          7
+#define NETLINK_ADD_MEMBERSHIP     1
+#define NETLINK_DROP_MEMBERSHIP    2
+#define NETLINK_PKTINFO            3
+#define NETLINK_BROADCAST_ERROR    4
+#define NETLINK_NO_ENOBUFS         5
+#define NETLINK_RX_RING            6
+#define NETLINK_TX_RING            7
+#define NETLINK_LIST_MEMBERSHIPS   8
 
 struct nl_pktinfo {
__u32   group;
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index bf6e766..b84dbe7 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2254,6 +2254,28 @@ static int netlink_getsockopt(struct socket *sock, int 
level, int optname,
return -EFAULT;
err = 0;
break;
+   case NETLINK_LIST_MEMBERSHIPS: {
+   int pos, idx, shift;
+
+   err = 0;
+   netlink_table_grab();
+   for (pos = 0; pos * 8 < nlk->ngroups; pos += sizeof(u32)) {
+       if (len - pos < sizeof(u32))
+           break;
+
+       idx = pos / sizeof(unsigned long);
+       shift = (pos % sizeof(unsigned long)) * 8;
+       if (put_user((u32)(nlk->groups[idx] >> shift),
+                    (u32 __user *)(optval + pos))) {
+           err = -EFAULT;
+           break;
+       }
+   }
+   if (put_user(ALIGN(nlk->ngroups / 8, sizeof(u32)), optlen))
+   err = -EFAULT;
+   netlink_table_ungrab();
+   break;
+   }
default:
err = -ENOPROTOOPT;
}
-- 
2.4.2
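
On the user-space side, the array described in the commit message still
has to be turned into group IDs. A minimal sketch of that step follows.
It assumes that bit b of word w stands for group 32 * w + b + 1 (group
IDs are 1-based); that mapping and the name groups_from_bitmap are
assumptions of this example, not something the patch states.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Expand a membership bitmap (array of 32-bit words) into group IDs.
 * Writes at most `out_cap` IDs to `out` and returns the total number of
 * set bits, so a caller can detect a too-small output array and retry.
 */
static size_t groups_from_bitmap(const uint32_t *words, size_t nwords,
				 uint32_t *out, size_t out_cap)
{
	size_t n = 0;

	for (size_t w = 0; w < nwords; w++) {
		for (unsigned int b = 0; b < 32; b++) {
			if (!(words[w] & (UINT32_C(1) << b)))
				continue;
			if (n < out_cap)
				out[n] = (uint32_t)(32 * w + b + 1);
			n++;
		}
	}
	return n;
}
```

The "return the required size, let the caller retry" shape mirrors how
the proposed option reports the needed array size through optlen.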



RE: [PATCH net-next v2 05/19] bna: use BIT(x) instead of (1 << x)

2015-06-12 Thread David Laight
From: Ivan Vecera
...
> diff --git a/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h
> b/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h
> index 679a503..16090fd 100644
> --- a/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h
> +++ b/drivers/net/ethernet/brocade/bna/bfa_defs_mfg_comm.h
> @@ -75,7 +75,7 @@ enum {
>     CB_GPIO_FC4P2   = (4),  /*!< 4G 2port FC card   */
>     CB_GPIO_FC4P1   = (5),  /*!< 4G 1port FC card   */
>     CB_GPIO_DFLY    = (6),  /*!< 8G 2port FC mezzanine card */
> -   CB_GPIO_PROTO   = (1 << 7)  /*!< 8G 2port FC prototypes */
> +   CB_GPIO_PROTO   = BIT(7)    /*!< 8G 2port FC prototypes */

That doesn't look like a BIT() value to me, just a large number.
Should the release driver even have support for the prototype hardware?

...
> -   if (rx_enet_mask & ((u32)(1 << i))) {
> +   if (rx_enet_mask & ((u32)BIT(i))) {

The (u32) cast looks superfluous.
There are also too many ().

...
> -   int bit = (1 << (vlan_id & BFI_VLAN_WORD_MASK));
> +   int bit = BIT((vlan_id & BFI_VLAN_WORD_MASK));

Too many ()

David
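
For illustration, the two expressions flagged above could be written
without the cast and the extra parentheses roughly as follows. This is a
user-space sketch under stated assumptions: BIT() is redefined locally as
a stand-in for the kernel helper, WORD_MASK is a hypothetical stand-in
for BFI_VLAN_WORD_MASK with an assumed value of 0x1f, and the function
names are made up for the example.

```c
#include <stdint.h>

/* Local stand-in for the kernel's BIT() helper. */
#define BIT(nr) (1UL << (nr))

/* Hypothetical stand-in for BFI_VLAN_WORD_MASK (value assumed). */
enum { WORD_MASK = 0x1f };

/* Test bit i of the mask; no (u32) cast and one pair of parentheses. */
static int rx_bit_set(uint32_t rx_enet_mask, unsigned int i)
{
	return (rx_enet_mask & BIT(i)) != 0;
}

/* Bit within a 32-bit word that corresponds to vlan_id. */
static unsigned long vlan_bit(unsigned int vlan_id)
{
	return BIT(vlan_id & WORD_MASK);
}
```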



[PATCH] netdevice: add netdev_pub helper function

2015-06-12 Thread Jason A. Donenfeld
Being able to utilize this makes much code a lot simpler and cleaner.
It's a nice convenience function.

Signed-off-by: Jason A. Donenfeld ja...@zx2c4.com
---
 include/linux/netdevice.h | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 05b9a69..f85be18 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1871,6 +1871,17 @@ static inline void *netdev_priv(const struct net_device 
*dev)
return (char *)dev + ALIGN(sizeof(struct net_device), NETDEV_ALIGN);
 }
 
+/**
+ * netdev_pub - access network device from private pointer
+ * @priv: private data pointer of network device
+ *
+ * Get network device from a network device private data pointer
+ */
+static inline struct net_device *netdev_pub(void *priv)
+{
+   return (struct net_device *)((char *)priv - ALIGN(sizeof(struct net_device), NETDEV_ALIGN));
+}
+
 /* Set the sysfs physical device reference for the network logical device
  * if set prior to registration will cause a symlink during initialization.
  */
-- 
2.4.2
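
The helper is the arithmetic inverse of netdev_priv(). The round trip
can be sketched in user space with stand-in types; struct demo_dev,
DEMO_ALIGN, ALIGN_UP and the demo_* functions below are hypothetical
names for this illustration and only mimic the layout "device structure,
padding to an alignment boundary, then private data".

```c
#include <stddef.h>
#include <stdlib.h>

#define DEMO_ALIGN 32   /* stand-in for NETDEV_ALIGN */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/* Stand-in for struct net_device. */
struct demo_dev {
	char name[16];
	int ifindex;
};

/* Allocate the device and its private area in one block. */
static struct demo_dev *demo_alloc(size_t priv_size)
{
	return calloc(1, ALIGN_UP(sizeof(struct demo_dev), DEMO_ALIGN) + priv_size);
}

/* Forward direction, mirroring netdev_priv(). */
static void *demo_priv(struct demo_dev *dev)
{
	return (char *)dev + ALIGN_UP(sizeof(struct demo_dev), DEMO_ALIGN);
}

/* Reverse direction, mirroring the proposed netdev_pub(). */
static struct demo_dev *demo_pub(void *priv)
{
	return (struct demo_dev *)((char *)priv -
				   ALIGN_UP(sizeof(struct demo_dev), DEMO_ALIGN));
}
```

The reverse step is only meaningful for a pointer that was obtained from
the forward step on a block laid out this way; applied to any other
pointer it yields an address that does not refer to a device.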



[PATCH net-next] Increase limit of macvtap queues

2015-06-12 Thread Pankaj Gupta
Macvtap should be compatible with tuntap for maximum number
of queues. '1059590254fa9dce9cafc4f07d1103dbec415e76' removes
the limitation and increases number of queues in tuntap.
Now it's safe to increase number of queues in Macvtap as well.

Signed-off-by: Pankaj Gupta pagu...@redhat.com
---
 include/linux/if_macvlan.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/linux/if_macvlan.h b/include/linux/if_macvlan.h
index 6f6929e..a4ccc31 100644
--- a/include/linux/if_macvlan.h
+++ b/include/linux/if_macvlan.h
@@ -29,7 +29,7 @@ struct macvtap_queue;
  * Maximum times a macvtap device can be opened. This can be used to
  * configure the number of receive queue, e.g. for multiqueue virtio.
  */
-#define MAX_MACVTAP_QUEUES 16
+#define MAX_MACVTAP_QUEUES 256

 #define MACVLAN_MC_FILTER_BITS 8
 #define MACVLAN_MC_FILTER_SZ   (1 << MACVLAN_MC_FILTER_BITS)
--
1.7.1



Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-12 Thread Trond Myklebust
On Fri, Jun 12, 2015 at 10:40 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
> On Fri, 2015-06-12 at 10:10 -0400, Trond Myklebust wrote:
>> On Thu, Jun 11, 2015 at 11:49 PM, Steven Rostedt <rost...@goodmis.org> wrote:
>>>
>>> I recently upgraded my main server to 4.0.4 from 3.19.5 and rkhunter
>>> started reporting a hidden port on my box.
>>>
>>> Running unhide-tcp I see this:
>>>
>>> # unhide-tcp
>>> Unhide-tcp 20121229
>>> Copyright © 2012 Yago Jesus & Patrick Gouin
>>> License GPLv3+ : GNU GPL version 3 or later
>>> http://www.unhide-forensics.info
>>> Used options:
>>> [*]Starting TCP checking
>>>
>>> Found Hidden port that not appears in ss: 946
>>> [*]Starting UDP checking
>>>
>>> This scared the hell out of me as I'm thinking that I have got some kind
>>> of NSA backdoor hooked into my server and it is monitoring my plans to
>>> smuggle Kinder Überraschung into the USA from Germany. I panicked!
>>>
>>> Well, I wasted the day writing modules to first look at all the sockets
>>> opened by all processes (via their file descriptors) and posted their
>>> port numbers.
>>>
>>>   http://rostedt.homelinux.com/private/tasklist.c
>>>
>>> But this port wasn't there either.
>>>
>>> Then I decided to look at the ports in tcp_hashinfo.
>>>
>>>   http://rostedt.homelinux.com/private/portlist.c
>>>
>>> This found the port but no file was connected to it, and worse yet,
>>> when I first ran it without using probe_kernel_read(), it crashed my
>>> kernel, because sk->sk_socket pointed to a freed socket!
>>>
>>> Note, each boot, the hidden port is different.
>>>
>>> Finally, I decided to bring in the big guns, and inserted a
>>> trace_printk() into the bind logic, to see if I could find the culprit.
>>> After fiddling with it a few times, I found a suspect:
>>>
>>>    kworker/3:1H-123   [003] ..s.    96.696213: inet_bind_hash: add 946
>>>
>>> Bah, it's a kernel thread doing it, via a work queue. I then added a
>>> trace_dump_stack() to find what was calling this, and here it is:
>>>
>>>    kworker/3:1H-123   [003] ..s.    96.696222: <stack trace>
>>>  => inet_csk_get_port
>>>  => inet_addr_type
>>>  => inet_bind
>>>  => xs_bind
>>>  => sock_setsockopt
>>>  => __sock_create
>>>  => xs_create_sock.isra.18
>>>  => xs_tcp_setup_socket
>>>  => process_one_work
>>>  => worker_thread
>>>  => worker_thread
>>>  => kthread
>>>  => kthread
>>>  => ret_from_fork
>>>  => kthread
>>>
>>> I rebooted, and examined what happens. I see the kworker binding that
>>> port, and all seems well:
>>>
>>> # netstat -tapn |grep 946
>>> tcp        0      0 192.168.23.9:946        192.168.23.22:55201     ESTABLISHED -
>>>
>>> But waiting for a bit, the connection goes into a TIME_WAIT, and then
>>> it just disappears. But the bind to the port does not get released, and
>>> that port is from then on, taken.
>>>
>>> This never happened with my 3.19 kernels. I would bisect it but this is
>>> happening on my main server box which I usually only reboot every other
>>> month doing upgrades. It causes too much disturbance for myself (and my
>>> family) as when this box is offline, basically the rest of my machines
>>> are too.
>>>
>>> I figured this may be enough information to see if you can fix it.
>>> Otherwise I can try to do the bisect, but that's not going to happen
>>> any time soon. I may just go back to 3.19 for now, such that rkhunter
>>> stops complaining about the hidden port.
>>>
>>
>> The only new thing that we're doing with 4.0 is to set SO_REUSEPORT on
>> the socket before binding the port (commit 4dda9c8a5e34: "SUNRPC: Set
>> SO_REUSEPORT socket option for TCP connections"). Perhaps there is an
>> issue with that?
>
> Strange, because the usual way to not have time-wait is to use SO_LINGER
> with linger=0
>
> And apparently xs_tcp_finish_connecting() has this :
>
>     sock_reset_flag(sk, SOCK_LINGER);
>     tcp_sk(sk)->linger2 = 0;

Are you sure? I thought that SO_LINGER is more about controlling how
the socket behaves w.r.t. waiting for the TCP_CLOSE state to be
achieved (i.e. about aborting the FIN state negotiation early). I've
never observed an effect on the TCP time-wait states.

> Are you sure SO_REUSEADDR was not the thing you wanted ?

Yes. SO_REUSEADDR has the problem that it requires you bind to
something other than 0.0.0.0, so it is less appropriate for outgoing
connections; the RPC code really should not have to worry about
routing and routability of a particular source address.

Cheers
  Trond


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-12 Thread Steven Rostedt
On Fri, 12 Jun 2015 07:40:35 -0700
Eric Dumazet <eric.duma...@gmail.com> wrote:

> Strange, because the usual way to not have time-wait is to use SO_LINGER
> with linger=0
>
> And apparently xs_tcp_finish_connecting() has this :
>
>     sock_reset_flag(sk, SOCK_LINGER);
>     tcp_sk(sk)->linger2 = 0;
>
> Are you sure SO_REUSEADDR was not the thing you wanted ?
>
> Steven, have you tried kmemleak ?

Nope, and again, I'm hesitant on adding too much debug. This is my main
server (build server, ssh server, web server, mail server, proxy
server, irc server, etc).

Although, I made dprintk() into trace_printk() in xprtsock.c and
xprt.c, and reran it. Here's the output:

(port 684 was the bad one this time)

# tracer: nop
#
# entries-in-buffer/entries-written: 396/396   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
rpc.nfsd-4710  [002] 48.615382: xs_local_setup_socket: RPC: 
  worker connecting xprt 8800d9018000 via AF_LOCAL to /var/run/rpcbind.sock
rpc.nfsd-4710  [002] 48.615393: xs_local_setup_socket: RPC: 
  xprt 8800d9018000 connected to /var/run/rpcbind.sock
rpc.nfsd-4710  [002] 48.615394: xs_setup_local: RPC:   set 
up xprt to /var/run/rpcbind.sock via AF_LOCAL
rpc.nfsd-4710  [002] 48.615399: xprt_create_transport: RPC: 
  created transport 8800d9018000 with 65536 slots
rpc.nfsd-4710  [002] 48.615416: xprt_alloc_slot: RPC: 1 
reserved req 8800db829600 xid cb06d5e8
rpc.nfsd-4710  [002] 48.615419: xprt_prepare_transmit: RPC: 
1 xprt_prepare_transmit
rpc.nfsd-4710  [002] 48.615420: xprt_transmit: RPC: 1 
xprt_transmit(44)
rpc.nfsd-4710  [002] 48.615424: xs_local_send_request: RPC: 
  xs_local_send_request(44) = 0
rpc.nfsd-4710  [002] 48.615425: xprt_transmit: RPC: 1 xmit 
complete
 rpcbind-1829  [003] ..s.48.615503: xs_local_data_ready: RPC:   
xs_local_data_ready...
 rpcbind-1829  [003] ..s.48.615506: xprt_complete_rqst: RPC: 1 
xid cb06d5e8 complete (24 bytes received)
rpc.nfsd-4710  [002] 48.615556: xprt_release: RPC: 1 
release request 8800db829600
rpc.nfsd-4710  [002] 48.615568: xprt_alloc_slot: RPC: 2 
reserved req 8800db829600 xid cc06d5e8
rpc.nfsd-4710  [002] 48.615569: xprt_prepare_transmit: RPC: 
2 xprt_prepare_transmit
rpc.nfsd-4710  [002] 48.615569: xprt_transmit: RPC: 2 
xprt_transmit(44)
rpc.nfsd-4710  [002] 48.615578: xs_local_send_request: RPC: 
  xs_local_send_request(44) = 0
rpc.nfsd-4710  [002] 48.615578: xprt_transmit: RPC: 2 xmit 
complete
 rpcbind-1829  [003] ..s.48.615643: xs_local_data_ready: RPC:   
xs_local_data_ready...
 rpcbind-1829  [003] ..s.48.615645: xprt_complete_rqst: RPC: 2 
xid cc06d5e8 complete (24 bytes received)
rpc.nfsd-4710  [002] 48.615695: xprt_release: RPC: 2 
release request 8800db829600
rpc.nfsd-4710  [002] 48.615698: xprt_alloc_slot: RPC: 3 
reserved req 8800db829600 xid cd06d5e8
rpc.nfsd-4710  [002] 48.615699: xprt_prepare_transmit: RPC: 
3 xprt_prepare_transmit
rpc.nfsd-4710  [002] 48.615700: xprt_transmit: RPC: 3 
xprt_transmit(68)
rpc.nfsd-4710  [002] 48.615706: xs_local_send_request: RPC: 
  xs_local_send_request(68) = 0
rpc.nfsd-4710  [002] 48.615707: xprt_transmit: RPC: 3 xmit 
complete
 rpcbind-1829  [003] ..s.48.615784: xs_local_data_ready: RPC:   
xs_local_data_ready...
 rpcbind-1829  [003] ..s.48.615785: xprt_complete_rqst: RPC: 3 
xid cd06d5e8 complete (28 bytes received)
rpc.nfsd-4710  [002] 48.615830: xprt_release: RPC: 3 
release request 8800db829600
rpc.nfsd-4710  [002] 48.615833: xprt_alloc_slot: RPC: 4 
reserved req 8800db829600 xid ce06d5e8
rpc.nfsd-4710  [002] 48.615834: xprt_prepare_transmit: RPC: 
4 xprt_prepare_transmit
rpc.nfsd-4710  [002] 48.615835: xprt_transmit: RPC: 4 
xprt_transmit(68)
rpc.nfsd-4710  [002] 48.615841: xs_local_send_request: RPC: 
  xs_local_send_request(68) = 0
rpc.nfsd-4710  [002] 48.615841: xprt_transmit: RPC: 4 xmit 
complete
 rpcbind-1829  [003] ..s.48.615892: xs_local_data_ready: RPC:   
xs_local_data_ready...
 rpcbind-1829  [003] ..s.48.615894: xprt_complete_rqst: RPC: 4 
xid ce06d5e8 

[PATCH] Doc: networking: Fix URL for wiki.wireshark.org in udplite.txt

2015-06-12 Thread Masanari Iida
This patch fixes the URL (http to https) for wiki.wireshark.org.

Signed-off-by: Masanari Iida standby2...@gmail.com
---
 Documentation/networking/udplite.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/networking/udplite.txt 
b/Documentation/networking/udplite.txt
index d727a38..53a7268 100644
--- a/Documentation/networking/udplite.txt
+++ b/Documentation/networking/udplite.txt
@@ -20,7 +20,7 @@
files/UDP-Lite-HOWTO.txt
 
o The Wireshark UDP-Lite WiKi (with capture files):
-   http://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
+   https://wiki.wireshark.org/Lightweight_User_Datagram_Protocol
 
o The Protocol Spec, RFC 3828, http://www.ietf.org/rfc/rfc3828.txt
 
-- 
2.4.3.413.ga5fe668



Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-12 Thread Eric Dumazet
On Fri, 2015-06-12 at 10:57 -0400, Trond Myklebust wrote:
> On Fri, Jun 12, 2015 at 10:40 AM, Eric Dumazet <eric.duma...@gmail.com> wrote:
>
>> Strange, because the usual way to not have time-wait is to use SO_LINGER
>> with linger=0
>>
>> And apparently xs_tcp_finish_connecting() has this :
>>
>>     sock_reset_flag(sk, SOCK_LINGER);
>>     tcp_sk(sk)->linger2 = 0;
>
> Are you sure? I thought that SO_LINGER is more about controlling how
> the socket behaves w.r.t. waiting for the TCP_CLOSE state to be
> achieved (i.e. about aborting the FIN state negotiation early). I've
> never observed an effect on the TCP time-wait states.

Definitely this is standard way to avoid time-wait states.

Maybe not very well documented. We probably should...

http://stackoverflow.com/questions/3757289/tcp-option-so-linger-zero-when-its-required




> Yes. SO_REUSEADDR has the problem that it requires you bind to
> something other than 0.0.0.0, so it is less appropriate for outgoing
> connections; the RPC code really should not have to worry about
> routing and routability of a particular source address.

OK understood.

Are you trying to reuse same 4-tuple ?
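
For reference, the "SO_LINGER with linger=0" technique mentioned above
looks like this from user space. It is a minimal sketch using the
standard sockets API and is not the in-kernel sunrpc code under
discussion; set_abortive_close is a hypothetical helper name.

```c
#include <sys/socket.h>

/*
 * Ask for an abortive close: with l_onoff set and l_linger at zero,
 * close() resets the connection instead of going through the normal
 * FIN exchange, so the closing side does not sit in TIME_WAIT.
 * Returns 0 on success, -1 on error (errno set by setsockopt).
 */
static int set_abortive_close(int fd)
{
	struct linger lg = { .l_onoff = 1, .l_linger = 0 };

	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

The trade-off is that unsent data is discarded and the peer sees a reset
rather than an orderly shutdown.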





[PATCH net-next] tcp: cdg: use div_u64()

2015-06-12 Thread Kenneth Klette Jonassen
Fixes cross-compile to mips.

Signed-off-by: Kenneth Klette Jonassen kenne...@ifi.uio.no
---
Fixes build error for mips-allyesconfig:

  net/built-in.o: In function `tcp_cdg_cong_avoid':
 tcp_cdg.c:(.text+0x217774): undefined reference to `__udivdi3'

https://lists.01.org/pipermail/kbuild-all/2015-June/010142.html
---
 net/ipv4/tcp_cdg.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c
index a52ce2d..8c6fd3d 100644
--- a/net/ipv4/tcp_cdg.c
+++ b/net/ipv4/tcp_cdg.c
@@ -145,7 +145,7 @@ static void tcp_cdg_hystart_update(struct sock *sk)
return;
 
    if (hystart_detect & HYSTART_ACK_TRAIN) {
-       u32 now_us = local_clock() / NSEC_PER_USEC;
+       u32 now_us = div_u64(local_clock(), NSEC_PER_USEC);
 
        if (ca->last_ack == 0 || !tcp_is_cwnd_limited(sk)) {
            ca->last_ack = now_us;
-- 
2.1.0
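
Background for the build error: on a 32-bit target the compiler lowers a
plain 64-bit '/' to a call to the libgcc routine __udivdi3, which the
kernel does not provide, hence the link failure; div_u64() is the
kernel's helper for a 64-bit dividend and a 32-bit divisor. The sketch
below mirrors the patched line in user space. demo_div_u64 and
ns_to_us32 are hypothetical names, and the plain division inside the
stand-in is fine only because user space links libgcc.

```c
#include <stdint.h>

#define NSEC_PER_USEC 1000u

/* User-space stand-in for div_u64(): 64-bit dividend, 32-bit divisor. */
static uint64_t demo_div_u64(uint64_t dividend, uint32_t divisor)
{
	return dividend / divisor;
}

/* Nanoseconds to microseconds, truncated to 32 bits like `u32 now_us`. */
static uint32_t ns_to_us32(uint64_t ns)
{
	return (uint32_t)demo_div_u64(ns, NSEC_PER_USEC);
}
```

Note that the 32-bit result wraps after 2^32 microseconds, which is the
same truncation the original assignment to a u32 already performed.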



[PATCH] net: fec: Ensure clocks are enabled while using mdio bus

2015-06-12 Thread Andrew Lunn
When a switch is attached to the mdio bus, the mdio bus can be used
while the interface is not open. If the clocks are not enabled, MDIO
reads/writes will simply time out. So enable the clocks before
starting a transaction, and disable them afterwards. The CCF performs
reference counting so the clocks will only be disabled if there are no
other users.

Signed-off-by: Andrew Lunn and...@lunn.ch
---
 drivers/net/ethernet/freescale/fec_main.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/freescale/fec_main.c 
b/drivers/net/ethernet/freescale/fec_main.c
index bf4cf3fbb5f2..122186b90cdb 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -65,6 +65,7 @@
 
 static void set_multicast_list(struct net_device *ndev);
 static void fec_enet_itr_coal_init(struct net_device *ndev);
+static int fec_enet_clk_enable(struct net_device *ndev, bool enable);
 
 #define DRIVER_NAME    "fec"
 
@@ -1764,6 +1765,11 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int 
mii_id, int regnum)
 {
    struct fec_enet_private *fep = bus->priv;
    unsigned long time_left;
+   int ret;
+
+   ret = fec_enet_clk_enable(fep->netdev, true);
+   if (ret)
+       return 0x;
 
    fep->mii_timeout = 0;
    init_completion(&fep->mdio_done);
@@ -1779,11 +1785,14 @@ static int fec_enet_mdio_read(struct mii_bus *bus, int 
mii_id, int regnum)
if (time_left == 0) {
        fep->mii_timeout = 1;
        netdev_err(fep->netdev, "MDIO read timeout\n");
+       fec_enet_clk_enable(fep->netdev, false);
return -ETIMEDOUT;
}
 
-   /* return value */
-   return FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
+   ret = FEC_MMFR_DATA(readl(fep->hwp + FEC_MII_DATA));
+   fec_enet_clk_enable(fep->netdev, false);
+
+   return ret;
 }
 
 static int fec_enet_mdio_write(struct mii_bus *bus, int mii_id, int regnum,
@@ -1791,10 +1800,15 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int 
mii_id, int regnum,
 {
    struct fec_enet_private *fep = bus->priv;
    unsigned long time_left;
+   int ret;
 
    fep->mii_timeout = 0;
    init_completion(&fep->mdio_done);
 
+   ret = fec_enet_clk_enable(fep->netdev, true);
+   if (ret)
+       netdev_err(fep->netdev, "Unable to enable clks\n");
+
/* start a write op */
writel(FEC_MMFR_ST | FEC_MMFR_OP_WRITE |
FEC_MMFR_PA(mii_id) | FEC_MMFR_RA(regnum) |
@@ -1807,9 +1821,12 @@ static int fec_enet_mdio_write(struct mii_bus *bus, int 
mii_id, int regnum,
if (time_left == 0) {
        fep->mii_timeout = 1;
        netdev_err(fep->netdev, "MDIO write timeout\n");
+       fec_enet_clk_enable(fep->netdev, false);
return -ETIMEDOUT;
}
 
+   fec_enet_clk_enable(fep->netdev, false);
+
return 0;
 }
 
-- 
2.1.4
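
The commit message relies on enable/disable calls being reference
counted. The toy model below sketches that idea in user space; it is not
the common clock framework's implementation, and struct demo_clk and the
demo_* functions are hypothetical names for this illustration.

```c
/* Toy clock: the hardware is on while at least one user holds it. */
struct demo_clk {
	int enable_count;
	int hw_on;
};

/* First user switches the hardware on; later users only bump the count. */
static int demo_clk_enable(struct demo_clk *c)
{
	if (c->enable_count++ == 0)
		c->hw_on = 1;
	return 0;
}

/* Last user switches the hardware off. */
static void demo_clk_disable(struct demo_clk *c)
{
	if (c->enable_count > 0 && --c->enable_count == 0)
		c->hw_on = 0;
}

/* Bracket a bus access with enable/disable, as the patch does. */
static int demo_mdio_read(struct demo_clk *c, int reg_value)
{
	int ret;

	if (demo_clk_enable(c))
		return -1;
	ret = c->hw_on ? reg_value : -1;   /* access only works while clocked */
	demo_clk_disable(c);
	return ret;
}
```

If the interface is closed, the read turns the clock on and off again
around the access; if the interface is open and already holds a
reference, the clock stays on after the read returns.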



Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-12 Thread Steven Rostedt
On Fri, 12 Jun 2015 11:34:20 -0400
Steven Rostedt <rost...@goodmis.org> wrote:

> On Fri, 12 Jun 2015 07:40:35 -0700
> Eric Dumazet <eric.duma...@gmail.com> wrote:
>
>> Strange, because the usual way to not have time-wait is to use SO_LINGER
>> with linger=0
>>
>> And apparently xs_tcp_finish_connecting() has this :
>>
>>     sock_reset_flag(sk, SOCK_LINGER);
>>     tcp_sk(sk)->linger2 = 0;
>>
>> Are you sure SO_REUSEADDR was not the thing you wanted ?
>>
>> Steven, have you tried kmemleak ?
>
> Nope, and again, I'm hesitant on adding too much debug. This is my main
> server (build server, ssh server, web server, mail server, proxy
> server, irc server, etc).
>
> Although, I made dprintk() into trace_printk() in xprtsock.c and
> xprt.c, and reran it. Here's the output:
>

I reverted the following commits:

c627d31ba0696cbd829437af2be2f2dee3546b1e
9e2b9f37760e129cee053cc7b6e7288acc2a7134
caf4ccd4e88cf2795c927834bc488c8321437586

And the issue goes away. That is, I watched the port go from
ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port.

In fact, I watched the port with my portlist.c module, and it
disappeared there too when it entered the TIME_WAIT state.

Here's the trace of that run:

# tracer: nop
#
# entries-in-buffer/entries-written: 397/397   #P:4
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
rpc.nfsd-3932  [002] 44.098689: xs_local_setup_socket: RPC:   worker connecting xprt 88040b6f5800 via AF_LOCAL to /var/run/rpcbind.sock
rpc.nfsd-3932  [002] 44.098699: xs_local_setup_socket: RPC:   xprt 88040b6f5800 connected to /var/run/rpcbind.sock
rpc.nfsd-3932  [002] 44.098700: xs_setup_local: RPC:   set up xprt to /var/run/rpcbind.sock via AF_LOCAL
rpc.nfsd-3932  [002] 44.098704: xprt_create_transport: RPC:   created transport 88040b6f5800 with 65536 slots
rpc.nfsd-3932  [002] 44.098717: xprt_alloc_slot: RPC: 1 reserved req 8800d8cc6800 xid 0850084b
rpc.nfsd-3932  [002] 44.098720: xprt_prepare_transmit: RPC: 1 xprt_prepare_transmit
rpc.nfsd-3932  [002] 44.098721: xprt_transmit: RPC: 1 xprt_transmit(44)
rpc.nfsd-3932  [002] 44.098724: xs_local_send_request: RPC:   xs_local_send_request(44) = 0
rpc.nfsd-3932  [002] 44.098724: xprt_transmit: RPC: 1 xmit complete
 rpcbind-1829  [001] ..s.44.098812: xs_local_data_ready: RPC:   xs_local_data_ready...
 rpcbind-1829  [001] ..s.44.098815: xprt_complete_rqst: RPC: 1 xid 0850084b complete (24 bytes received)
rpc.nfsd-3932  [002] 44.098854: xprt_release: RPC: 1 release request 8800d8cc6800
rpc.nfsd-3932  [002] 44.098864: xprt_alloc_slot: RPC: 2 reserved req 8800d8cc6800 xid 0950084b
rpc.nfsd-3932  [002] 44.098865: xprt_prepare_transmit: RPC: 2 xprt_prepare_transmit
rpc.nfsd-3932  [002] 44.098865: xprt_transmit: RPC: 2 xprt_transmit(44)
rpc.nfsd-3932  [002] 44.098870: xs_local_send_request: RPC:   xs_local_send_request(44) = 0
rpc.nfsd-3932  [002] 44.098870: xprt_transmit: RPC: 2 xmit complete
 rpcbind-1829  [001] ..s.44.098915: xs_local_data_ready: RPC:   xs_local_data_ready...
 rpcbind-1829  [001] ..s.44.098917: xprt_complete_rqst: RPC: 2 xid 0950084b complete (24 bytes received)
rpc.nfsd-3932  [002] 44.098968: xprt_release: RPC: 2 release request 8800d8cc6800
rpc.nfsd-3932  [002] 44.098971: xprt_alloc_slot: RPC: 3 reserved req 8800d8cc6800 xid 0a50084b
rpc.nfsd-3932  [002] 44.098972: xprt_prepare_transmit: RPC: 3 xprt_prepare_transmit
rpc.nfsd-3932  [002] 44.098973: xprt_transmit: RPC: 3 xprt_transmit(68)
rpc.nfsd-3932  [002] 44.098978: xs_local_send_request: RPC:   xs_local_send_request(68) = 0
rpc.nfsd-3932  [002] 44.098978: xprt_transmit: RPC: 3 xmit complete
 rpcbind-1829  [001] ..s.44.099029: xs_local_data_ready: RPC:   xs_local_data_ready...
 rpcbind-1829  [001] ..s.44.099031: xprt_complete_rqst: RPC: 3 xid 0a50084b complete (28 bytes received)
rpc.nfsd-3932  [002] 44.099083: xprt_release: RPC: 3 release request 8800d8cc6800
rpc.nfsd-3932  [002] 44.099086: xprt_alloc_slot: RPC: 4 reserved req 8800d8cc6800 xid 0b50084b
rpc.nfsd-3932  [002] 44.099086: xprt_prepare_transmit: RPC: 
4 

Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-12 Thread Steven Rostedt
On Fri, 12 Jun 2015 11:50:38 -0400
Steven Rostedt <rost...@goodmis.org> wrote:

> On Fri, 12 Jun 2015 11:34:20 -0400
> Steven Rostedt <rost...@goodmis.org> wrote:
>

>
> And the issue goes away. That is, I watched the port go from
> ESTABLISHED to TIME_WAIT, and then gone, and theirs no hidden port.
>

s/theirs/there's/

Time to go back to grammar school. :-p

-- Steve


[PATCH] mlx4_en: don't wait for high order page allocation

2015-06-12 Thread Shaohua Li
High order page allocation can cause direct memory compaction and harm
performance. The patch makes the high order page allocation not wait,
so that it does not trigger direct memory compaction under memory
pressure. More details can be found in a similar patch for net core:
http://marc.info/?l=linux-mm&m=143406665720428&w=2

Cc: Amir Vadai <am...@mellanox.com>
Cc: Ido Shamay <i...@mellanox.com>
Cc: Eric Dumazet <eduma...@google.com>
Signed-off-by: Shaohua Li <s...@fb.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 2a77a6b..9bc4143 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -60,8 +60,11 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
for (order = MLX4_EN_ALLOC_PREFER_ORDER; ;) {
gfp_t gfp = _gfp;
 
-   if (order)
+   if (order) {
+   if ((PAGE_SIZE << (order - 1)) >= frag_info->frag_size)
+   gfp &= ~__GFP_WAIT;
gfp |= __GFP_COMP | __GFP_NOWARN;
+   }
page = alloc_pages(gfp, order);
if (likely(page))
break;
-- 
1.8.1
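
The flag handling in the hunk above can be sketched in userspace as follows. This is only an illustration under stated assumptions: the SKETCH_* flag values are made up (the kernel's real gfp bits differ), sketch_adjust_gfp() is a hypothetical name, and the comparison is read as `>=`, i.e. "the next smaller order would still hold a fragment".

```c
#include <stddef.h>

/* Hypothetical flag bits for illustration only; the kernel's real
 * __GFP_WAIT, __GFP_COMP and __GFP_NOWARN values differ. */
#define SKETCH_GFP_WAIT   0x0010u
#define SKETCH_GFP_NOWARN 0x0200u
#define SKETCH_GFP_COMP   0x4000u

/* Hypothetical helper mirroring the logic of the patch: for a high-order
 * attempt, drop the "may wait" bit when a smaller order would still fit a
 * fragment, so the allocator fails fast instead of compacting memory. */
static unsigned int sketch_adjust_gfp(unsigned int gfp, unsigned int order,
				      size_t page_size, size_t frag_size)
{
	if (order) {
		/* Assumed comparison: the next lower order is still big enough. */
		if ((page_size << (order - 1)) >= frag_size)
			gfp &= ~SKETCH_GFP_WAIT;
		gfp |= SKETCH_GFP_COMP | SKETCH_GFP_NOWARN;
	}
	return gfp;
}
```

With a 4096-byte page and a 1536-byte fragment, an order-3 attempt loses the wait bit, while an order-0 attempt is left untouched.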



[BUG ?] delay always evaluates to 0

2015-06-12 Thread Nicholas Mc Guire
Hi !

commit 2c86c275015c ("Add ipw2100 wireless driver.") introduced

drivers/net/wireless/ipw2100.c - line-numbers are from next-20150511
1410 static int ipw2100_hw_phy_off(struct ipw2100_priv *priv)
1411 {
1412 
1413 #define HW_PHY_OFF_LOOP_DELAY (HZ / 5000)  
1414
...
1437 
1438 schedule_timeout_uninterruptible(HW_PHY_OFF_LOOP_DELAY);
1439 }

but (HZ / 5000) will evaluate to 0 for all configurable HZ values - typo ?
and this schedule_timeout_uninterruptible() is probably not doing what
is intended.
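
A minimal userspace sketch of the arithmetic, assuming the commonly configurable HZ values (100, 250, 300, 1000); the function name is hypothetical and only stands in for the driver macro:

```c
/* Hypothetical stand-in for HW_PHY_OFF_LOOP_DELAY, parameterised on HZ
 * so the integer division can be checked for several tick rates. */
static unsigned long sketch_phy_off_loop_delay(unsigned long hz)
{
	/* Integer division truncates toward zero, so any hz below 5000
	 * yields a delay of 0 jiffies. */
	return hz / 5000;
}
```

Only a tick rate of 5000 or more would give a non-zero timeout here, and no such HZ is assumed to be selectable.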

thx!
hofrat


Re: [PATCH] mlx4_en: don't wait for high order page allocation

2015-06-12 Thread Shaohua Li
On Fri, Jun 12, 2015 at 10:05:42AM -0700, Alexander Duyck wrote:
> On 06/12/2015 09:50 AM, Shaohua Li wrote:
> High order page allocation can cause direct memory compaction and harm
> performance. The patch makes the high order page allocation not wait,
> so that it does not trigger direct memory compaction under memory
> pressure. More details can be found in a similar patch for net core:
> http://marc.info/?l=linux-mm&m=143406665720428&w=2
>
> Cc: Amir Vadai <am...@mellanox.com>
> Cc: Ido Shamay <i...@mellanox.com>
> Cc: Eric Dumazet <eduma...@google.com>
> Signed-off-by: Shaohua Li <s...@fb.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 2a77a6b..9bc4143 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -60,8 +60,11 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
>  for (order = MLX4_EN_ALLOC_PREFER_ORDER; ;) {
>  gfp_t gfp = _gfp;
> -if (order)
> +if (order) {
> +if ((PAGE_SIZE << (order - 1)) >= frag_info->frag_size)
> +gfp &= ~__GFP_WAIT;
>  gfp |= __GFP_COMP | __GFP_NOWARN;
> +}
>  page = alloc_pages(gfp, order);
>  if (likely(page))
>  break;
>
> Is this even really necessary?  I would think the fact that the
> refill is done using GFP_ATOMIC would be enough to cover the
> frequently used cases.  I wouldn't think the initial allocation when
> the interface is brought up would be something that is a big enough
> deal to justify being fixed in this case.

Ok, if the allocation is always using GFP_ATOMIC at runtime, we
don't need this of course. Please ignore it then.

Thanks,
Shaohua


Re: [PATCH net-next v2] switchdev: fix BUG when port driver doesn't support set attr op

2015-06-12 Thread Andy Gospodarek
On Thu, Jun 11, 2015 at 08:19:01AM -0700, sfel...@gmail.com wrote:
> From: Scott Feldman <sfel...@gmail.com>
>
> Fix a BUG_ON() where CONFIG_NET_SWITCHDEV is set but the driver for a
> bridged port does not support switchdev_port_attr_set op.  Don't BUG_ON()
> if -EOPNOTSUPP is returned.
>
> Also change BUG_ON() to netdev_err since this is a normal error path and
> does not warrant the use of BUG_ON(), which is reserved for unrecoverable
> errs.
>
> Signed-off-by: Scott Feldman <sfel...@gmail.com>
> Reported-by: Brenden Blanco <bbla...@plumgrid.com>

This is less aggressive -- good call.

Acked-by: Andy Gospodarek go...@cumulusnetworks.com



Re: [PATCH] mlx4_en: don't wait for high order page allocation

2015-06-12 Thread Alexander Duyck

On 06/12/2015 09:50 AM, Shaohua Li wrote:

> High order page allocation can cause direct memory compaction and harm
> performance. The patch makes the high order page allocation not wait,
> so that it does not trigger direct memory compaction under memory
> pressure. More details can be found in a similar patch for net core:
> http://marc.info/?l=linux-mm&m=143406665720428&w=2
>
> Cc: Amir Vadai <am...@mellanox.com>
> Cc: Ido Shamay <i...@mellanox.com>
> Cc: Eric Dumazet <eduma...@google.com>
> Signed-off-by: Shaohua Li <s...@fb.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c | 5 ++++-
>   1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 2a77a6b..9bc4143 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -60,8 +60,11 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
> for (order = MLX4_EN_ALLOC_PREFER_ORDER; ;) {
> gfp_t gfp = _gfp;
>
> -   if (order)
> +   if (order) {
> +   if ((PAGE_SIZE << (order - 1)) >= frag_info->frag_size)
> +   gfp &= ~__GFP_WAIT;
> gfp |= __GFP_COMP | __GFP_NOWARN;
> +   }
> page = alloc_pages(gfp, order);
> if (likely(page))
> break;


Is this even really necessary?  I would think the fact that the refill
is done using GFP_ATOMIC would be enough to cover the frequently used
cases.  I wouldn't think the initial allocation when the interface is
brought up would be something that is a big enough deal to justify being
fixed in this case.


- Alex


[PATCH 1/3] net: phy: Allow PHY devices to identify themselves as Ethernet switches, etc.

2015-06-12 Thread Andrew Lunn
From: Florian Fainelli f.faine...@gmail.com

Some Ethernet MAC drivers using the PHY library require the hardcoding
of link parameters when interfaced to a switch device, SFP module,
switch to switch port, etc. This has typically led to various ad-hoc
implementations looking like this:

- using a fixed PHY emulated device, which will provide link
  indication towards the Ethernet MAC driver and hardware

- pretending there is no PHY and hardcoding link parameters, a la
  mv643x_eth

Based on that, it is desirable to have the PHY drivers advertise the
correct link parameters, just like regular Ethernet PHYs do towards
their CPU Ethernet MAC drivers. However, Ethernet MAC drivers should be
able to tell whether this link should be monitored or not. In the
context of an Ethernet switch, SFP module, or switch to switch link, we
do not need to monitor this link since it should always be up.

Signed-off-by: Florian Fainelli <f.faine...@gmail.com>
Signed-off-by: Andrew Lunn <and...@lunn.ch>
---
 include/linux/phy.h | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/include/linux/phy.h b/include/linux/phy.h
index a26c3f84b8dd..5c3b87c0773c 100644
--- a/include/linux/phy.h
+++ b/include/linux/phy.h
@@ -330,6 +330,7 @@ struct phy_c45_device_ids {
  * c45_ids: 802.3-c45 Device Identifers if is_c45.
  * is_c45:  Set to true if this phy uses clause 45 addressing.
  * is_internal: Set to true if this phy is internal to a MAC.
+ * is_pseudo_fixed_link: Set to true if this phy is an Ethernet switch, etc.
  * has_fixups: Set to true if this phy has fixups/quirks.
  * suspended: Set to true if this phy has been suspended successfully.
  * state: state of the PHY for management purposes
@@ -368,6 +369,7 @@ struct phy_device {
struct phy_c45_device_ids c45_ids;
bool is_c45;
bool is_internal;
+   bool is_pseudo_fixed_link;
bool has_fixups;
bool suspended;
 
@@ -686,6 +688,16 @@ static inline bool phy_interface_is_rgmii(struct phy_device *phydev)
 {
return phydev->interface >= PHY_INTERFACE_MODE_RGMII &&
phydev->interface <= PHY_INTERFACE_MODE_RGMII_TXID;
+};
+
+/*
+ * phy_is_pseudo_fixed_link - Convenience function for testing if this
+ * PHY is the CPU port facing side of an Ethernet switch, or similar.
+ * @phydev: the phy_device struct
+ */
+static inline bool phy_is_pseudo_fixed_link(struct phy_device *phydev)
+{
+   return phydev->is_pseudo_fixed_link;
 }
 
 /**
-- 
2.1.4



[PATCH 2/3] dsa: mv88e6xxx: Allow speed/duplex of port to be configured

2015-06-12 Thread Andrew Lunn
The current code sets user ports to perform auto negotiation using the
phy. CPU and DSA ports are configured to full duplex and the maximum
speed the switch supports.

There are however use cases where the CPU has a slower port, and where
user ports have SFP modules with a fixed speed. In these cases, allow
port settings to be read from a fixed_phy device.

Signed-off-by: Andrew Lunn <and...@lunn.ch>
---
 drivers/net/dsa/mv88e6123_61_65.c |  1 +
 drivers/net/dsa/mv88e6131.c   |  1 +
 drivers/net/dsa/mv88e6171.c   |  1 +
 drivers/net/dsa/mv88e6352.c   |  1 +
 drivers/net/dsa/mv88e6xxx.c   | 56 +++
 drivers/net/dsa/mv88e6xxx.h   |  2 ++
 net/dsa/slave.c   |  4 ++-
 7 files changed, 65 insertions(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6123_61_65.c b/drivers/net/dsa/mv88e6123_61_65.c
index 71a29a7ce538..3de2a6d73fdc 100644
--- a/drivers/net/dsa/mv88e6123_61_65.c
+++ b/drivers/net/dsa/mv88e6123_61_65.c
@@ -129,6 +129,7 @@ struct dsa_switch_driver mv88e6123_61_65_switch_driver = {
.get_strings= mv88e6xxx_get_strings,
.get_ethtool_stats  = mv88e6xxx_get_ethtool_stats,
.get_sset_count = mv88e6xxx_get_sset_count,
+   .adjust_link= mv88e6xxx_adjust_link,
 #ifdef CONFIG_NET_DSA_HWMON
.get_temp   = mv88e6xxx_get_temp,
 #endif
diff --git a/drivers/net/dsa/mv88e6131.c b/drivers/net/dsa/mv88e6131.c
index 32f4a08e9bc9..3e8386529965 100644
--- a/drivers/net/dsa/mv88e6131.c
+++ b/drivers/net/dsa/mv88e6131.c
@@ -182,6 +182,7 @@ struct dsa_switch_driver mv88e6131_switch_driver = {
.get_strings= mv88e6xxx_get_strings,
.get_ethtool_stats  = mv88e6xxx_get_ethtool_stats,
.get_sset_count = mv88e6xxx_get_sset_count,
+   .adjust_link= mv88e6xxx_adjust_link,
 };
 
 MODULE_ALIAS(platform:mv88e6085);
diff --git a/drivers/net/dsa/mv88e6171.c b/drivers/net/dsa/mv88e6171.c
index 1c7808495a9d..8803e20ebc52 100644
--- a/drivers/net/dsa/mv88e6171.c
+++ b/drivers/net/dsa/mv88e6171.c
@@ -108,6 +108,7 @@ struct dsa_switch_driver mv88e6171_switch_driver = {
.get_strings= mv88e6xxx_get_strings,
.get_ethtool_stats  = mv88e6xxx_get_ethtool_stats,
.get_sset_count = mv88e6xxx_get_sset_count,
+   .adjust_link= mv88e6xxx_adjust_link,
 #ifdef CONFIG_NET_DSA_HWMON
.get_temp   = mv88e6xxx_get_temp,
 #endif
diff --git a/drivers/net/dsa/mv88e6352.c b/drivers/net/dsa/mv88e6352.c
index 632815c10a40..7a2deddbe270 100644
--- a/drivers/net/dsa/mv88e6352.c
+++ b/drivers/net/dsa/mv88e6352.c
@@ -374,6 +374,7 @@ struct dsa_switch_driver mv88e6352_switch_driver = {
.get_strings= mv88e6xxx_get_strings,
.get_ethtool_stats  = mv88e6xxx_get_ethtool_stats,
.get_sset_count = mv88e6xxx_get_sset_count,
+   .adjust_link= mv88e6xxx_adjust_link,
.set_eee= mv88e6xxx_set_eee,
.get_eee= mv88e6xxx_get_eee,
 #ifdef CONFIG_NET_DSA_HWMON
diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index 7fba330ce702..3defccb59d42 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -10,6 +10,7 @@
 
 #include <linux/delay.h>
 #include <linux/etherdevice.h>
+#include <linux/ethtool.h>
 #include <linux/if_bridge.h>
 #include <linux/jiffies.h>
 #include <linux/list.h>
@@ -543,6 +544,61 @@ static bool mv88e6xxx_6352_family(struct dsa_switch *ds)
return false;
 }
 
+/* We expect the switch to perform auto negotiation if there is a real
+ * phy. However, in the case of a fixed link phy, we force the port
+ * settings from the fixed link settings.
+ */
+void mv88e6xxx_adjust_link(struct dsa_switch *ds, int port,
+  struct phy_device *phydev)
+{
+   struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
+   u32 ret, reg;
+
+   if (!phy_is_pseudo_fixed_link(phydev))
+   return;
+
+   mutex_lock(&ps->smi_mutex);
+
+   ret = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_PCS_CTRL);
+   if (ret < 0)
+   goto out;
+
+   reg = ret & ~(PORT_PCS_CTRL_LINK_UP |
+ PORT_PCS_CTRL_FORCE_LINK |
+ PORT_PCS_CTRL_DUPLEX_FULL |
+ PORT_PCS_CTRL_FORCE_DUPLEX |
+ PORT_PCS_CTRL_UNFORCED);
+
+   reg |= PORT_PCS_CTRL_FORCE_LINK;
+   if (phydev->link)
+   reg |= PORT_PCS_CTRL_LINK_UP;
+
+   if (mv88e6xxx_6065_family(ds) && phydev->speed > SPEED_100)
+   goto out;
+
+   switch (phydev-speed) {
+   case SPEED_1000:
+   reg |= PORT_PCS_CTRL_1000;
+   break;
+   case SPEED_100:
+   reg |= PORT_PCS_CTRL_100;
+   break;
+   case SPEED_10:
+   reg |= PORT_PCS_CTRL_10;
+   default:
+   goto out;
+   }
+
+ 

[PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex

2015-06-12 Thread Andrew Lunn
By default, DSA and CPU ports are configured to the maximum speed the
switch supports. However there can be use cases where the peer device
port is slower. Allow a fixed-link property to be used with the DSA
and CPU port in the device tree, and use this information to configure
the port.

Signed-off-by: Andrew Lunn <and...@lunn.ch>
---
 include/net/dsa.h |  1 +
 net/dsa/dsa.c | 39 +++
 2 files changed, 40 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index fbca63ba8f73..24572f99224c 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -160,6 +160,7 @@ struct dsa_switch {
 * Slave mii_bus and devices for the individual ports.
 */
u32 dsa_port_mask;
+   u32 cpu_port_mask;
u32 phys_port_mask;
u32 phys_mii_mask;
struct mii_bus  *slave_mii_bus;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 392e29a0227d..f9c8f4e7ebce 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -176,6 +176,36 @@ __ATTRIBUTE_GROUPS(dsa_hwmon);
 #endif /* CONFIG_NET_DSA_HWMON */
 
 /* basic switch operations **/
+static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master)
+{
+   struct dsa_chip_data *cd = ds->pd;
+   struct device_node *port_dn;
+   struct phy_device *phydev;
+   int ret, port;
+
+   for (port = 0; port < DSA_MAX_PORTS; port++) {
+   if (!((ds->cpu_port_mask | ds->dsa_port_mask) & (1 << port)))
+   continue;
+
+   port_dn = cd->port_dn[port];
+   if (of_phy_is_fixed_link(port_dn)) {
+   ret = of_phy_register_fixed_link(port_dn);
+   if (ret) {
+   netdev_err(master,
+  "failed to register fixed PHY\n");
+   return ret;
+   }
+   phydev = of_phy_find_device(port_dn);
+   phydev->is_pseudo_fixed_link = true;
+   genphy_config_init(phydev);
+   genphy_read_status(phydev);
+   if (ds->drv->adjust_link)
+   ds->drv->adjust_link(ds, port, phydev);
+   }
+   }
+   return 0;
+}
+
 static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
 {
struct dsa_switch_driver *drv = ds->drv;
@@ -204,6 +234,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
}
dst->cpu_switch = index;
dst->cpu_port = i;
+   ds->cpu_port_mask |= 1 << i;
} else if (!strcmp(name, "dsa")) {
ds->dsa_port_mask |= 1 << i;
} else {
@@ -297,6 +328,14 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
}
}
 
+   /* Perform configuration of the CPU and DSA ports */
+   ret = dsa_cpu_dsa_setup(ds, dst->master_netdev);
+   if (ret < 0) {
+   netdev_err(dst->master_netdev, "[%d] : can't configure CPU and DSA ports\n",
+  index);
+   ret = 0;
+   }
+
 #ifdef CONFIG_NET_DSA_HWMON
/* If the switch provides a temperature sensor,
 * register with hardware monitoring subsystem.
-- 
2.1.4



Re: [PATCH] sctp: allow authenticating DATA chunks that are bundled with COOKIE_ECHO

2015-06-12 Thread Vlad Yasevich
On 06/12/2015 07:26 AM, Neil Horman wrote:
> On Thu, Jun 11, 2015 at 05:27:45PM -0700, David Miller wrote:
>> From: mleit...@redhat.com
>> Date: Thu, 11 Jun 2015 14:49:46 -0300
>>
>>> From: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
>>>
>>> Currently, we can ask to authenticate DATA chunks and we can send DATA
>>> chunks on the same packet as COOKIE_ECHO, but if you try to combine
>>> both, the DATA chunk will be sent unauthenticated and peer won't accept
>>> it, leading to a communication failure.
>>>
>>> This happens because even though the data was queued after it was
>>> requested to authenticate DATA chunks, it was also queued before we
>>> could know that remote peer can handle authenticating, so
>>> sctp_auth_send_cid() returns false.
>>>
>>> The fix is whenever we set up an active key, re-check send queue for
>>> chunks that now should be authenticated. As a result, such packet will
>>> now contain COOKIE_ECHO + AUTH + DATA chunks, in that order.
>>>
>>> Reported-by: Liu Wei <we...@redhat.com>
>>> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leit...@gmail.com>
>>
>> Vlad/Neil, please review.
>>
>
> sorry Dave, thought I had sent email on that already.
>
> I had an initial concern that there could be a race in which a previous
> iteration of sctp_outq_flush would move some chunks to a packet, but not flush
> it to the network layer yet (due to not being full), and that would result in
> the same condition.  But since this only happens with a COOKIE_ECHO chunk (which
> is a control chunk), we should be ok, as those are sent immediately.

Neil.  I don't think this race can happen since outq manipulation always
happens under a socket lock and so do socket options.  So, we are
guaranteed that outq will not change in this case.

Acked-by: Vlad Yasevich <vyasev...@gmail.com>

-vlad

> Acked-by: Neil Horman <nhor...@tuxdriver.com>
>



Re: [PATCH next v0] bonding: Display LACP info only to CAP_SYS_ADMIN capable user

2015-06-12 Thread Mahesh Bandewar
On Thu, Jun 11, 2015 at 3:22 PM, David Miller <da...@davemloft.net> wrote:

> From: Mahesh Bandewar <mahe...@google.com>
> Date: Wed, 10 Jun 2015 17:19:56 -0700
>
> > Actor and Partner details can be accessed via proc-fs and sys-fs
> > entries. These interfaces are world readable at this moment. The
> > earlier patch-series made the LACP communication secure to avoid
> > nuisance attack from within the same L2 domain but it did not
> > prevent someone unprivileged looking at that information on host
> > and perform the same act.
> >
> > This patch essentially avoids spitting those entries if the user
> > in question does not have enough privileges.
> >
> > Signed-off-by: Mahesh Bandewar <mahe...@google.com>
>
> I agree with Stephen Hemminger in that you should probably be using
> CAP_NET_ADMIN here.
Will change that in the next revision.

Thanks,
--mahesh..


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-12 Thread Eric Dumazet
On Fri, 2015-06-12 at 10:10 -0400, Trond Myklebust wrote:
> On Thu, Jun 11, 2015 at 11:49 PM, Steven Rostedt <rost...@goodmis.org> wrote:
> >
> > I recently upgraded my main server to 4.0.4 from 3.19.5 and rkhunter
> > started reporting a hidden port on my box.
> >
> > Running unhide-tcp I see this:
> >
> > # unhide-tcp
> > Unhide-tcp 20121229
> > Copyright © 2012 Yago Jesus & Patrick Gouin
> > License GPLv3+ : GNU GPL version 3 or later
> > http://www.unhide-forensics.info
> > Used options:
> > [*]Starting TCP checking
> >
> > Found Hidden port that not appears in ss: 946
> > [*]Starting UDP checking
> >
> > This scared the hell out of me as I'm thinking that I have got some kind
> > of NSA backdoor hooked into my server and it is monitoring my plans to
> > smuggle Kinder Überraschung into the USA from Germany. I panicked!
> >
> > Well, I wasted the day writing modules to first look at all the sockets
> > opened by all processes (via their file descriptors) and posted their
> > port numbers.
> >
> >   http://rostedt.homelinux.com/private/tasklist.c
> >
> > But this port wasn't there either.
> >
> > Then I decided to look at the ports in tcp_hashinfo.
> >
> >   http://rostedt.homelinux.com/private/portlist.c
> >
> > This found the port but no file was connected to it, and worse yet,
> > when I first ran it without using probe_kernel_read(), it crashed my
> > kernel, because sk->sk_socket pointed to a freed socket!
> >
> > Note, each boot, the hidden port is different.
> >
> > Finally, I decided to bring in the big guns, and inserted a
> > trace_printk() into the bind logic, to see if I could find the culprit.
> > After fiddling with it a few times, I found a suspect:
> >
> >    kworker/3:1H-123   [003] ..s.96.696213: inet_bind_hash: add 946
> >
> > Bah, it's a kernel thread doing it, via a work queue. I then added a
> > trace_dump_stack() to find what was calling this, and here it is:
> >
> >    kworker/3:1H-123   [003] ..s.96.696222: <stack trace>
> >  => inet_csk_get_port
> >  => inet_addr_type
> >  => inet_bind
> >  => xs_bind
> >  => sock_setsockopt
> >  => __sock_create
> >  => xs_create_sock.isra.18
> >  => xs_tcp_setup_socket
> >  => process_one_work
> >  => worker_thread
> >  => worker_thread
> >  => kthread
> >  => kthread
> >  => ret_from_fork
> >  => kthread
> >
> > I rebooted, and examined what happens. I see the kworker binding that
> > port, and all seems well:
> >
> > # netstat -tapn |grep 946
> > tcp        0      0 192.168.23.9:946        192.168.23.22:55201     ESTABLISHED -
> >
> > But waiting for a bit, the connection goes into a TIME_WAIT, and then
> > it just disappears. But the bind to the port does not get released, and
> > that port is from then on, taken.
> >
> > This never happened with my 3.19 kernels. I would bisect it but this is
> > happening on my main server box which I usually only reboot every other
> > month doing upgrades. It causes too much disturbance for myself (and my
> > family) as when this box is offline, basically the rest of my machines
> > are too.
> >
> > I figured this may be enough information to see if you can fix it.
> > Otherwise I can try to do the bisect, but that's not going to happen
> > any time soon. I may just go back to 3.19 for now, such that rkhunter
> > stops complaining about the hidden port.
> >
>
> The only new thing that we're doing with 4.0 is to set SO_REUSEPORT on
> the socket before binding the port (commit 4dda9c8a5e34: "SUNRPC: Set
> SO_REUSEPORT socket option for TCP connections"). Perhaps there is an
> issue with that?

Strange, because the usual way to not have time-wait is to use SO_LINGER
with linger=0

And apparently xs_tcp_finish_connecting() has this :

sock_reset_flag(sk, SOCK_LINGER);
tcp_sk(sk)->linger2 = 0;

Are you sure SO_REUSEADDR was not the thing you wanted ?

Steven, have you tried kmemleak ?
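
For reference, the SO_LINGER with linger=0 approach mentioned above can be sketched from userspace like this. It is only an illustration of the socket option, not of what xprtsock.c does; sketch_set_abortive_close() is a hypothetical name.

```c
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: enable SO_LINGER with a zero timeout on fd.
 * With this set, close() on a connected TCP socket aborts the
 * connection (RST) instead of going through FIN and TIME_WAIT.
 * Returns 0 on success, -1 on error with errno set by setsockopt(). */
static int sketch_set_abortive_close(int fd)
{
	struct linger lg;

	memset(&lg, 0, sizeof(lg));
	lg.l_onoff = 1;		/* turn lingering on */
	lg.l_linger = 0;	/* zero-second linger timeout */

	return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

The option has to be set before close() is called; it does not change anything about a socket that is already in TIME_WAIT.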





Re: [PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex

2015-06-12 Thread Guenter Roeck

Hi Florian,

On 06/12/2015 10:18 AM, Andrew Lunn wrote:

> By default, DSA and CPU ports are configured to the maximum speed the
> switch supports. However there can be use cases where the peer device
> port is slower. Allow a fixed-link property to be used with the DSA
> and CPU port in the device tree, and use this information to configure
> the port.
>
> Signed-off-by: Andrew Lunn <and...@lunn.ch>
> ---
>   include/net/dsa.h |  1 +
>   net/dsa/dsa.c | 39 +++
>   2 files changed, 40 insertions(+)
>
> diff --git a/include/net/dsa.h b/include/net/dsa.h
> index fbca63ba8f73..24572f99224c 100644
> --- a/include/net/dsa.h
> +++ b/include/net/dsa.h
> @@ -160,6 +160,7 @@ struct dsa_switch {
>  * Slave mii_bus and devices for the individual ports.
>  */
> u32 dsa_port_mask;
> +   u32 cpu_port_mask;
> u32 phys_port_mask;
> u32 phys_mii_mask;
> struct mii_bus  *slave_mii_bus;
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 392e29a0227d..f9c8f4e7ebce 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -176,6 +176,36 @@ __ATTRIBUTE_GROUPS(dsa_hwmon);
>   #endif /* CONFIG_NET_DSA_HWMON */
>
>   /* basic switch operations **/
> +static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master)
> +{
> +   struct dsa_chip_data *cd = ds->pd;
> +   struct device_node *port_dn;
> +   struct phy_device *phydev;
> +   int ret, port;
> +
> +   for (port = 0; port < DSA_MAX_PORTS; port++) {
> +   if (!((ds->cpu_port_mask | ds->dsa_port_mask) & (1 << port)))
> +   continue;
> +
+

How does cpu_port_mask interact / interfere / coexist with dst->cpu_port
and dsa_is_cpu_port() ?

Elsewhere we have
if (dsa_is_cpu_port(ds, p) || ds->dsa_port_mask & (1 << p))
so I don't entirely see why we need to add cpu_port_mask at this time.
Shouldn't that be a separate patch, maybe with a new macro / function
to check if the port is a cpu port or an external switch port ?
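
A rough sketch of what such a helper could look like, written as self-contained userspace C. Everything here is hypothetical: struct sketch_switch only models the two masks discussed in this thread, and sketch_is_cpu_or_dsa_port() is not a name from the posted patch or from the DSA core.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for struct dsa_switch: only the two
 * port masks discussed in this thread are modelled. */
struct sketch_switch {
	uint32_t dsa_port_mask;	/* ports wired to another switch */
	uint32_t cpu_port_mask;	/* ports wired to a CPU Ethernet MAC */
};

/* Hypothetical helper: true if the port is a CPU port or an
 * inter-switch (DSA) port. Assumes 0 <= port < 32. */
static inline bool sketch_is_cpu_or_dsa_port(const struct sketch_switch *ds,
					      int port)
{
	uint32_t bit = UINT32_C(1) << port;

	return ((ds->cpu_port_mask | ds->dsa_port_mask) & bit) != 0;
}
```

A single predicate like this would keep the open-coded mask test out of callers, whichever way the CPU port ends up being tracked.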

Thanks,
Guenter



Re: [PATCH 3/3] net: dsa: Allow configuration of CPU DSA port speeds/duplex

2015-06-12 Thread Guenter Roeck

On 06/12/2015 11:14 AM, Florian Fainelli wrote:

On 12/06/15 10:18, Andrew Lunn wrote:

By default, DSA and CPU ports are configured to the maximum speed the
switch supports. However there can be use cases where the peer device
port is slower. Allow a fixed-link property to be used with the DSA
and CPU port in the device tree, and use this information to configure
the port.


Humm, I suppose this means that we might end-up with two fixed PHY
devices, one for the Ethernet MAC, and another one for the switch? That
might duplicate the same information, though I cannot think of a better
solution than using phandles to resolve that.



Signed-off-by: Andrew Lunn and...@lunn.ch
---
  include/net/dsa.h |  1 +
  net/dsa/dsa.c | 39 +++
  2 files changed, 40 insertions(+)

diff --git a/include/net/dsa.h b/include/net/dsa.h
index fbca63ba8f73..24572f99224c 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -160,6 +160,7 @@ struct dsa_switch {
 * Slave mii_bus and devices for the individual ports.
 */
u32 dsa_port_mask;
+   u32 cpu_port_mask;
u32 phys_port_mask;
u32 phys_mii_mask;
struct mii_bus  *slave_mii_bus;
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 392e29a0227d..f9c8f4e7ebce 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -176,6 +176,36 @@ __ATTRIBUTE_GROUPS(dsa_hwmon);
  #endif /* CONFIG_NET_DSA_HWMON */

  /* basic switch operations **/
+static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master)
+{
+   struct dsa_chip_data *cd = ds->pd;
+   struct device_node *port_dn;
+   struct phy_device *phydev;
+   int ret, port;
+
+   for (port = 0; port < DSA_MAX_PORTS; port++) {
+       if (!((ds->cpu_port_mask | ds->dsa_port_mask) & (1 << port)))
+           continue;
+
+       port_dn = cd->port_dn[port];
+       if (of_phy_is_fixed_link(port_dn)) {
+           ret = of_phy_register_fixed_link(port_dn);
+           if (ret) {
+               netdev_err(master,
+                          "failed to register fixed PHY\n");
+               return ret;
+           }
+           phydev = of_phy_find_device(port_dn);
+           phydev->is_pseudo_fixed_link = true;
+           genphy_config_init(phydev);
+           genphy_read_status(phydev);


I was curious as to why you were doing this at first, but I guess this
is because the PHY state machine is not started for this fixed PHY that
you just created, right?


+           if (ds->drv->adjust_link)
+               ds->drv->adjust_link(ds, port, phydev);
+       }
+   }
+   return 0;
+}
+
  static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
  {
    struct dsa_switch_driver *drv = ds->drv;
@@ -204,6 +234,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
            }
            dst->cpu_switch = index;
            dst->cpu_port = i;
+           ds->cpu_port_mask |= 1 << i;


Same question as Guenter here, I assume this is because you plan on
having multiple CPU ports connected to the switch and this makes it
easier to deal with, is that right?



If so, should that be done in a separate patch set ?

Guenter



Re: [PATCH 3/3] net: dsa: Allow configuration of CPU & DSA port speeds/duplex

2015-06-12 Thread Florian Fainelli
On 12/06/15 10:18, Andrew Lunn wrote:
 By default, DSA and CPU ports are configured to the maximum speed the
 switch supports. However there can be use cases where the peer device
 port is slower. Allow a fixed-link property to be used with the DSA
 and CPU port in the device tree, and use this information to configure
 the port.

Humm, I suppose this means that we might end-up with two fixed PHY
devices, one for the Ethernet MAC, and another one for the switch? That
might duplicate the same information, though I cannot think of a better
solution than using phandles to resolve that.

 
 Signed-off-by: Andrew Lunn <and...@lunn.ch>
 ---
  include/net/dsa.h |  1 +
  net/dsa/dsa.c | 39 +++
  2 files changed, 40 insertions(+)
 
 diff --git a/include/net/dsa.h b/include/net/dsa.h
 index fbca63ba8f73..24572f99224c 100644
 --- a/include/net/dsa.h
 +++ b/include/net/dsa.h
 @@ -160,6 +160,7 @@ struct dsa_switch {
* Slave mii_bus and devices for the individual ports.
*/
   u32 dsa_port_mask;
 + u32 cpu_port_mask;
   u32 phys_port_mask;
   u32 phys_mii_mask;
   struct mii_bus  *slave_mii_bus;
 diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
 index 392e29a0227d..f9c8f4e7ebce 100644
 --- a/net/dsa/dsa.c
 +++ b/net/dsa/dsa.c
 @@ -176,6 +176,36 @@ __ATTRIBUTE_GROUPS(dsa_hwmon);
  #endif /* CONFIG_NET_DSA_HWMON */
  
  /* basic switch operations **/
 +static int dsa_cpu_dsa_setup(struct dsa_switch *ds, struct net_device *master)
 +{
 + struct dsa_chip_data *cd = ds->pd;
 + struct device_node *port_dn;
 + struct phy_device *phydev;
 + int ret, port;
 +
 + for (port = 0; port < DSA_MAX_PORTS; port++) {
 +     if (!((ds->cpu_port_mask | ds->dsa_port_mask) & (1 << port)))
 +         continue;
 +
 +     port_dn = cd->port_dn[port];
 +     if (of_phy_is_fixed_link(port_dn)) {
 +         ret = of_phy_register_fixed_link(port_dn);
 +         if (ret) {
 +             netdev_err(master,
 +                        "failed to register fixed PHY\n");
 +             return ret;
 +         }
 +         phydev = of_phy_find_device(port_dn);
 +         phydev->is_pseudo_fixed_link = true;
 +         genphy_config_init(phydev);
 +         genphy_read_status(phydev);

I was curious as to why you were doing this at first, but I guess this
is because the PHY state machine is not started for this fixed PHY that
you just created, right?

 +         if (ds->drv->adjust_link)
 +             ds->drv->adjust_link(ds, port, phydev);
 +     }
 + }
 + return 0;
 +}
 +
  static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
  {
   struct dsa_switch_driver *drv = ds->drv;
 @@ -204,6 +234,7 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
           }
           dst->cpu_switch = index;
           dst->cpu_port = i;
 +         ds->cpu_port_mask |= 1 << i;

Same question as Guenter here, I assume this is because you plan on
having multiple CPU ports connected to the switch and this makes it
easier to deal with, is that right?

       } else if (!strcmp(name, "dsa")) {
           ds->dsa_port_mask |= 1 << i;
       } else {
 @@ -297,6 +328,14 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, struct device *parent)
       }
   }
  
 + /* Perform configuration of the CPU and DSA ports */
 + ret = dsa_cpu_dsa_setup(ds, dst->master_netdev);
 + if (ret < 0) {
 +     netdev_err(dst->master_netdev, "[%d] : can't configure CPU and DSA ports\n",
 +                index);
 +     ret = 0;
 + }
 +
  #ifdef CONFIG_NET_DSA_HWMON
   /* If the switch provides a temperature sensor,
* register with hardware monitoring subsystem.
 


-- 
Florian


Re: [REGRESSION] NFS is creating a hidden port (left over from xs_bind() )

2015-06-12 Thread Trond Myklebust
On Thu, Jun 11, 2015 at 11:49 PM, Steven Rostedt <rost...@goodmis.org> wrote:

 I recently upgraded my main server to 4.0.4 from 3.19.5 and rkhunter
 started reporting a hidden port on my box.

 Running unhide-tcp I see this:

 # unhide-tcp
 Unhide-tcp 20121229
 Copyright © 2012 Yago Jesus & Patrick Gouin
 License GPLv3+ : GNU GPL version 3 or later
 http://www.unhide-forensics.info
 Used options:
 [*]Starting TCP checking

 Found Hidden port that not appears in ss: 946
 [*]Starting UDP checking

 This scared the hell out of me as I'm thinking that I have got some kind
 of NSA backdoor hooked into my server and it is monitoring my plans to
 smuggle Kinder Überraschung into the USA from Germany. I panicked!

 Well, I wasted the day writing modules to first look at all the sockets
 opened by all processes (via their file descriptors) and posted their
 port numbers.

   http://rostedt.homelinux.com/private/tasklist.c

 But this port wasn't there either.

 Then I decided to look at the ports in tcp_hashinfo.

   http://rostedt.homelinux.com/private/portlist.c

 This found the port but no file was connected to it, and worse yet,
 when I first ran it without using probe_kernel_read(), it crashed my
 kernel, because sk->sk_socket pointed to a freed socket!

 Note, each boot, the hidden port is different.

 Finally, I decided to bring in the big guns, and inserted a
 trace_printk() into the bind logic, to see if I could find the culprit.
 After fiddling with it a few times, I found a suspect:

    kworker/3:1H-123   [003] ..s.    96.696213: inet_bind_hash: add 946

 Bah, it's a kernel thread doing it, via a work queue. I then added a
 trace_dump_stack() to find what was calling this, and here it is:

    kworker/3:1H-123   [003] ..s.    96.696222: <stack trace>
  => inet_csk_get_port
  => inet_addr_type
  => inet_bind
  => xs_bind
  => sock_setsockopt
  => __sock_create
  => xs_create_sock.isra.18
  => xs_tcp_setup_socket
  => process_one_work
  => worker_thread
  => worker_thread
  => kthread
  => kthread
  => ret_from_fork
  => kthread

 I rebooted, and examined what happens. I see the kworker binding that
 port, and all seems well:

 # netstat -tapn |grep 946
 tcp        0      0 192.168.23.9:946        192.168.23.22:55201     ESTABLISHED -

 But waiting for a bit, the connection goes into a TIME_WAIT, and then
 it just disappears. But the bind to the port does not get released, and
 that port is from then on, taken.

 This never happened with my 3.19 kernels. I would bisect it but this is
 happening on my main server box which I usually only reboot every other
 month doing upgrades. It causes too much disturbance for myself (and my
 family) as when this box is offline, basically the rest of my machines
 are too.

 I figured this may be enough information to see if you can fix it.
 Otherwise I can try to do the bisect, but that's not going to happen
 any time soon. I may just go back to 3.19 for now, such that rkhunter
 stops complaining about the hidden port.


The only new thing that we're doing with 4.0 is to set SO_REUSEPORT on
the socket before binding the port (commit 4dda9c8a5e34: "SUNRPC: Set
SO_REUSEPORT socket option for TCP connections"). Perhaps there is an
issue with that?

Cheers
  Trond

