[v2] act_mirred: Fix bogus header when redirecting from VLAN

2015-04-16 Thread Herbert Xu
On Thu, Apr 16, 2015 at 07:40:30PM -0700, Alexei Starovoitov wrote:
> On Fri, Apr 17, 2015 at 10:15:01AM +0800, Herbert Xu wrote:
> > > seems the cleaner fix will be to push skb->mac_len instead?
> > 
> > No skb->mac_len is the same as skb2->dev->hard_header_len.
> 
> hmm. please help me understand the problem then.
> In the commit log you mentioned that your vlan dev and ifb
> have unequal hard header length. I think that can only happen if
> your master dev used to create vlan, didn't have vlan offload,
> so vlandev->hhl = 18 and ifb->hhl = 14. Then when tagged packet
> arrives on master device we call skb_vlan_untag() and
> skb->mac_len becomes 14, then vlan_do_receive() will do

You are right.  I should've just used mac_len.

---8<---
When you redirect a VLAN device to any device, you end up with
crap in af_packet on the xmit path because hard_header_len is
not equal to skb->mac_len.  So the redirected packet contains
four extra bytes at the start which then gets interpreted as
part of the MAC address.

This patch fixes this by only pushing skb->mac_len.  We also
need to fix ifb because it tries to undo the pushing done by
act_mirred.

Signed-off-by: Herbert Xu 

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 34f846b..94570aa 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -105,7 +105,7 @@ static void ri_tasklet(unsigned long dev)
if (from & AT_EGRESS) {
dev_queue_xmit(skb);
} else if (from & AT_INGRESS) {
-   skb_pull(skb, skb->dev->hard_header_len);
+   skb_pull(skb, skb->mac_len);
netif_receive_skb(skb);
} else
BUG();
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 5953517..3f63cea 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -157,7 +157,7 @@ static int tcf_mirred(struct sk_buff *skb, const struct 
tc_action *a,
 
if (!(at & AT_EGRESS)) {
if (m->tcfm_ok_push)
-   skb_push(skb2, skb2->dev->hard_header_len);
+   skb_push(skb2, skb->mac_len);
}
 
/* mirror is always swallowed */

-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net-next v4 00/24] switchdev: spring cleanup

2015-04-16 Thread roopa

On 4/13/15, 10:47 PM, roopa wrote:

On 4/12/15, 11:16 PM, sfel...@gmail.com wrote:

From: Scott Feldman 

v4:

Well, it was a lot of work, but now prepare-commit transaction model 
is how
davem advises: if prepare fails, abort the transaction.  The driver 
must do
resource reservations up front in prepare phase and return those 
resources if
aborting.  Commit phase would use reserved resources.  The good news 
is the
driver code (for rocker) now handles resource allocation failures 
better by not

leaving partially device or driver states.  This is a side-effect of the
prepare phase where state isn't modified; only validation of inputs and
resource reservations happen in the prepare phase.  Since we're 
supporting
setting attrs and add objs across lower devs in the stacked case, we 
need to
hold rtnl_lock (or ensure rtnl_lock is held) so lower devs don't move 
on us
during the prepare-commit transaction.  DSA driver code skips the 
prepare phase
and goes straight for the commit phase since no up-front allocations 
are done

and no device failures (that could be detected in the prepare phase) can
happen.


thanks for the series. It definitely does look cleaner and less 
confusing now!.
I do love the abstraction but i was one of the people voting against 
duplicating the
kernel objects into swdev objs which this patches does (which i am 
still not convinced

we should have).



Remove NETIF_F_HW_SWITCH_OFFLOAD from rocker and the swdev_attr_set/get
wrappers.  DSA doesn't set NETIF_F_HW_SWITCH_OFFLOAD, so it can't be in
swdev_attr_set/get.  rocker doesn't need it; or rather can't support
NETIF_F_HW_SWITCH_OFFLOAD being set/cleared at run-time after the device
port is already up and offloading L2/L3. NETIF_F_HW_SWITCH_OFFLOAD is 
still

left as a feature flag for drivers that can use it.


I see that this series removes all uses of it in the switchdev api 
later. I had summarized
the need for the flag in reply to one of your questions a few weeks 
back. Since you have moved all
ndo ops to swdev ops (including ndo_bridge_setlink/dellink), I don't 
want to hold on to the
 feature flag if no one is using it. yes, my userspace driver uses it 
today.

I will come back with stronger justification to keep it or
will submit a patch to remove it and add it back at a later point if 
needed.


scott,  I see that you will be spinning v5 of the series. In which case 
feel free to remove the feature flag during your

code restructuring effort. If i need it again, i will resubmit. thanks.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: act_mirred: Fix bogus header when redirecting from VLAN

2015-04-16 Thread Alexei Starovoitov
On Fri, Apr 17, 2015 at 10:15:01AM +0800, Herbert Xu wrote:
> > seems the cleaner fix will be to push skb->mac_len instead?
> 
> No skb->mac_len is the same as skb2->dev->hard_header_len.

hmm. please help me understand the problem then.
In the commit log you mentioned that your vlan dev and ifb
have unequal hard header length. I think that can only happen if
your master dev used to create vlan, didn't have vlan offload,
so vlandev->hhl = 18 and ifb->hhl = 14. Then when tagged packet
arrives on master device we call skb_vlan_untag() and
skb->mac_len becomes 14, then vlan_do_receive() will do
another_round, ingress qdisc will trigger and act_mirred
eventually will be called. So existing act_mirred will be pushing
18 bytes, whereas only 14 bytes are valid and the lowest 4 have junk.
By 'fixing it' using ifb's 14 byte the patch is only masking the
problem, no?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] net: dsa: register hwmon for any provided function

2015-04-16 Thread Guenter Roeck

hi Vivien,

On 04/16/2015 03:05 PM, Vivien Didelot wrote:

Hi Guenter,


  switch (index) {
+case 0: /* temp1_input */
+if (drv->get_temp)
+mode |= S_IRUGO;


This should be mandatory. Sorry, I don't really understand what you are
trying to accomplish here.

Can you give me a real world example where a chip would support setting
a limit but not reading it ?


I have no such example. I just did not see why this couldn't be allowed
(e.g. setting only set_temp_limit and get_temp_alarm looks fine to me).
But if you say that get_temp should be mandatory, I'm OK with that.


write-only attributes are not defined in the hwmon ABI. If the 'sensors'
command encounters such an attribute, it will create an error message
each time it executes. That doesn't sound very useful to me.

If a chip - for whatever reason - does not have a limit register
but an alarm register or flag, its temperature limit is usually hard-coded
and can be reported this way (the AMD temperature sensor driver does this,
for example). If there is ever a need to support the alarm-register-only
situation for some odd reason, we can add the code at the time.
For now, it just seems to me that you are adding complexity to solve
some theoretic problem which is very unlikely to occur in the real world.


The primary goal of this patchset was to use DEVICE_ATTR_RW to declare
temp1_max, instead of reflecting the minimal permissions needed.


Then why don't you just do that and nothing else ? The goal should be
to simplify code, not to make it more complicated. If the result isn't
less code, I don't think it is worth it.

Thanks,
Guenter

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: act_mirred: Fix bogus header when redirecting from VLAN

2015-04-16 Thread Herbert Xu
On Thu, Apr 16, 2015 at 06:34:02PM -0700, Alexei Starovoitov wrote:
> On Fri, Apr 17, 2015 at 09:02:16AM +0800, Herbert Xu wrote:
> > @@ -105,7 +105,7 @@ static void ri_tasklet(unsigned long dev)
> > if (from & AT_EGRESS) {
> > dev_queue_xmit(skb);
> > } else if (from & AT_INGRESS) {
> > -   skb_pull(skb, skb->dev->hard_header_len);
> > +   skb_pull(skb, _dev->hard_header_len);
> ...
> > if (!(at & AT_EGRESS)) {
> > if (m->tcfm_ok_push)
> > -   skb_push(skb2, skb2->dev->hard_header_len);
> > +   skb_push(skb2, dev->hard_header_len);
> 
> seems the cleaner fix will be to push skb->mac_len instead?

No skb->mac_len is the same as skb2->dev->hard_header_len.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread Guenter Roeck

On 04/16/2015 11:51 AM, Geert Uytterhoeven wrote:

On Thu, Apr 16, 2015 at 3:46 PM, Guenter Roeck  wrote:

On 04/15/2015 10:12 PM, Guenter Roeck wrote:


Return correct error code if _mv88e6xxx_reg_read returns an error.

Fixes: facd95b2e0ec0 ("net: dsa: mv88e6xxx: Add Hardware bridging
support")
Signed-off-by: Guenter Roeck 



I should have given proper credit.

Reported-by: kbuild test robot 

For the curious, neither W=1, W=2, C-1, C=2, nor sparse report this problem,
at least not with gcc 4.9.1.


Good old gcc 4.1.2 (which I still use for m68k builds, and won't retire anytime
soon as it finds real bugs in every new kernel release) says:

 drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_set_port_state’:
 drivers/net/dsa/mv88e6xxx.c:905: warning: ‘ret’ may be used
uninitialized in this function

Even after your patch, as it missed one case (patch sent).



Grumble. So much for trusting tools :-(

Thanks for fixing this!

Guenter

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: act_mirred: Fix bogus header when redirecting from VLAN

2015-04-16 Thread Alexei Starovoitov
On Fri, Apr 17, 2015 at 09:02:16AM +0800, Herbert Xu wrote:
> @@ -105,7 +105,7 @@ static void ri_tasklet(unsigned long dev)
>   if (from & AT_EGRESS) {
>   dev_queue_xmit(skb);
>   } else if (from & AT_INGRESS) {
> - skb_pull(skb, skb->dev->hard_header_len);
> + skb_pull(skb, _dev->hard_header_len);
...
>   if (!(at & AT_EGRESS)) {
>   if (m->tcfm_ok_push)
> - skb_push(skb2, skb2->dev->hard_header_len);
> + skb_push(skb2, dev->hard_header_len);

seems the cleaner fix will be to push skb->mac_len instead?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net] inet_diag: fix access to tcp cc information

2015-04-16 Thread Eric Dumazet
From: Eric Dumazet 

Two different problems are fixed here :

1) inet_sk_diag_fill() might be called without socket lock held.
   icsk->icsk_ca_ops can change under us and module be unloaded.
   -> Access to freed memory.
   Fix this using rcu_read_lock() to prevent module unload.

2) Some TCP Congestion Control modules provide information
   but again this is not safe against icsk->icsk_ca_ops
   change and nla_put() errors were ignored. Some sockets
   could not get the additional info if skb was almost full.

Fix this by returning a status from get_info() handlers and
using rcu protection as well.

Signed-off-by: Eric Dumazet 
---
 include/net/tcp.h   |2 +-
 net/ipv4/inet_diag.c|   28 ++--
 net/ipv4/tcp_dctcp.c|5 +++--
 net/ipv4/tcp_illinois.c |6 +++---
 net/ipv4/tcp_vegas.c|5 +++--
 net/ipv4/tcp_vegas.h|2 +-
 net/ipv4/tcp_westwood.c |6 +++---
 7 files changed, 36 insertions(+), 18 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 9598871485ce..051dc5c2802d 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -829,7 +829,7 @@ struct tcp_congestion_ops {
/* hook for packet ack accounting (optional) */
void (*pkts_acked)(struct sock *sk, u32 num_acked, s32 rtt_us);
/* get info for inet_diag (optional) */
-   void (*get_info)(struct sock *sk, u32 ext, struct sk_buff *skb);
+   int (*get_info)(struct sock *sk, u32 ext, struct sk_buff *skb);
 
charname[TCP_CA_NAME_MAX];
struct module   *owner;
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 70e8b3c308ec..bb77ebdae3b3 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -111,6 +111,7 @@ int inet_sk_diag_fill(struct sock *sk, struct 
inet_connection_sock *icsk,
  const struct nlmsghdr *unlh)
 {
const struct inet_sock *inet = inet_sk(sk);
+   const struct tcp_congestion_ops *ca_ops;
const struct inet_diag_handler *handler;
int ext = req->idiag_ext;
struct inet_diag_msg *r;
@@ -208,16 +209,31 @@ int inet_sk_diag_fill(struct sock *sk, struct 
inet_connection_sock *icsk,
info = nla_data(attr);
}
 
-   if ((ext & (1 << (INET_DIAG_CONG - 1))) && icsk->icsk_ca_ops)
-   if (nla_put_string(skb, INET_DIAG_CONG,
-  icsk->icsk_ca_ops->name) < 0)
+   if (ext & (1 << (INET_DIAG_CONG - 1))) {
+   int err = 0;
+
+   rcu_read_lock();
+   ca_ops = READ_ONCE(icsk->icsk_ca_ops);
+   if (ca_ops)
+   err = nla_put_string(skb, INET_DIAG_CONG, ca_ops->name);
+   rcu_read_unlock();
+   if (err < 0)
goto errout;
+   }
 
handler->idiag_get_info(sk, r, info);
 
-   if (sk->sk_state < TCP_TIME_WAIT &&
-   icsk->icsk_ca_ops && icsk->icsk_ca_ops->get_info)
-   icsk->icsk_ca_ops->get_info(sk, ext, skb);
+   if (sk->sk_state < TCP_TIME_WAIT) {
+   int err = 0;
+
+   rcu_read_lock();
+   ca_ops = READ_ONCE(icsk->icsk_ca_ops);
+   if (ca_ops && ca_ops->get_info)
+   err = ca_ops->get_info(sk, ext, skb);
+   rcu_read_unlock();
+   if (err < 0)
+   goto errout;
+   }
 
 out:
nlmsg_end(skb, nlh);
diff --git a/net/ipv4/tcp_dctcp.c b/net/ipv4/tcp_dctcp.c
index b504371af742..4376016f7fa5 100644
--- a/net/ipv4/tcp_dctcp.c
+++ b/net/ipv4/tcp_dctcp.c
@@ -277,7 +277,7 @@ static void dctcp_cwnd_event(struct sock *sk, enum 
tcp_ca_event ev)
}
 }
 
-static void dctcp_get_info(struct sock *sk, u32 ext, struct sk_buff *skb)
+static int dctcp_get_info(struct sock *sk, u32 ext, struct sk_buff *skb)
 {
const struct dctcp *ca = inet_csk_ca(sk);
 
@@ -297,8 +297,9 @@ static void dctcp_get_info(struct sock *sk, u32 ext, struct 
sk_buff *skb)
info.dctcp_ab_tot = ca->acked_bytes_total;
}
 
-   nla_put(skb, INET_DIAG_DCTCPINFO, sizeof(info), &info);
+   return nla_put(skb, INET_DIAG_DCTCPINFO, sizeof(info), &info);
}
+   return 0;
 }
 
 static struct tcp_congestion_ops dctcp __read_mostly = {
diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c
index 1d5a30a90adf..67476f085e48 100644
--- a/net/ipv4/tcp_illinois.c
+++ b/net/ipv4/tcp_illinois.c
@@ -300,8 +300,7 @@ static u32 tcp_illinois_ssthresh(struct sock *sk)
 }
 
 /* Extract info for Tcp socket info provided via netlink. */
-static void tcp_illinois_info(struct sock *sk, u32 ext,
- struct sk_buff *skb)
+static int tcp_illinois_info(struct sock *sk, u32 ext, struct sk_buff *skb)
 {
const struct illinois *ca = inet_csk_ca(sk);
 
@@ -318,8 +317,9 @@ static void tcp_illinois_info(struct sock *sk, u32 ext,
do_div(t, info.tcpv_rttc

act_mirred: Fix bogus header when redirecting from VLAN

2015-04-16 Thread Herbert Xu
When you redirect a VLAN device to an ifb device, you end up with
crap within the ifb device because they have unequal hard header
lengths and the redirected packet contains four extra bytes at the
start which then gets interpreted as part of the MAC address.

This patch fixes this by only pushing on the hard header length
of ifb.  This assumes that the original packet actually has enough
header for that so checks have been added to that effect.

Signed-off-by: Herbert Xu 

diff --git a/drivers/net/ifb.c b/drivers/net/ifb.c
index 34f846b..1256d26 100644
--- a/drivers/net/ifb.c
+++ b/drivers/net/ifb.c
@@ -105,7 +105,7 @@ static void ri_tasklet(unsigned long dev)
if (from & AT_EGRESS) {
dev_queue_xmit(skb);
} else if (from & AT_INGRESS) {
-   skb_pull(skb, skb->dev->hard_header_len);
+   skb_pull(skb, _dev->hard_header_len);
netif_receive_skb(skb);
} else
BUG();
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 5953517..44e4d50 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -151,13 +151,25 @@ static int tcf_mirred(struct sk_buff *skb, const struct 
tc_action *a,
}
 
at = G_TC_AT(skb->tc_verd);
+
+   if (!(at & AT_EGRESS) && m->tcfm_ok_push &&
+   (skb_headroom(skb) < dev->hard_header_len ||
+skb->dev->hard_header_len < dev->hard_header_len)) {
+   net_notice_ratelimited(
+   "tc mirred: Insufficient headroom from %s to %s: "
+   "%d %d\n",
+   skb->dev->name, dev->name,
+   skb_headroom(skb), skb->dev->hard_header_len);
+   goto out;
+   }
+
skb2 = skb_act_clone(skb, GFP_ATOMIC, m->tcf_action);
if (skb2 == NULL)
goto out;
 
if (!(at & AT_EGRESS)) {
if (m->tcfm_ok_push)
-   skb_push(skb2, skb2->dev->hard_header_len);
+   skb_push(skb2, dev->hard_header_len);
}
 
/* mirror is always swallowed */
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Patrick McHardy
On 17.04, Hannes Frederic Sowa wrote:
> 
> 
> On Thu, Apr 16, 2015, at 22:56, Patrick McHardy wrote:
> > On 17.04, Herbert Xu wrote:
> > > On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote:
> > > >
> > > > So currently we have one fast path, that is: we are not fragmented, we
> > > > get out non-fragmented, none of this code is ever touched and no
> > > > problem.
> > > > 
> > > > We don't want to mak this more complex, but
> > > 
> > > You should read Dave's other email where he gives you an obvious
> > > solution.  If you have to modify the skb then you don't have to
> > > worry about the original fragments.
> > > 
> > > But if you only read the skb then don't linearise it completely
> > > and keep the original fragments.
> > 
> > Yes, I was just responding to that. We need an additional member
> > in the skb, but then this part is quite simple.
> 
> I guess you refer to the solution of using the fragmented list of
> packets to do checks. I don't think it will be that simple to peek into
> all the different skbs if the fragments are split in the header.
> Splitting fragments in the header is the 1x1 of firewall by-passing
> attacks, so we should certainly handle it correctly.

Also correct, that argument hasn't come up so far. Its actually not
too hard to handle, you need to keep track of which parts have been
inspected to make a classification decision and make sure those
parts are not overwritten. The TCP match does this statically, but
we certainly could do this dynamically. The question again is how
to keep geometry if things are actually overwritten, especially
partially. Apparently all this troubly is being seen as worthwhile.

> Logic off peeking into fragments and splitting fragments must be
> logically consent all the time, with all netfilter helpers in the tree.
> 
> I don't think the additional checks in fast-path are in any way
> justified.

I fully agree.

> 
> Bye,
> Hannes
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Hannes Frederic Sowa


On Thu, Apr 16, 2015, at 22:56, Patrick McHardy wrote:
> On 17.04, Herbert Xu wrote:
> > On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote:
> > >
> > > So currently we have one fast path, that is: we are not fragmented, we
> > > get out non-fragmented, none of this code is ever touched and no
> > > problem.
> > > 
> > > We don't want to mak this more complex, but
> > 
> > You should read Dave's other email where he gives you an obvious
> > solution.  If you have to modify the skb then you don't have to
> > worry about the original fragments.
> > 
> > But if you only read the skb then don't linearise it completely
> > and keep the original fragments.
> 
> Yes, I was just responding to that. We need an additional member
> in the skb, but then this part is quite simple.

I guess you refer to the solution of using the fragmented list of
packets to do checks. I don't think it will be that simple to peek into
all the different skbs if the fragments are split in the header.
Splitting fragments in the header is the 1x1 of firewall by-passing
attacks, so we should certainly handle it correctly.

Logic off peeking into fragments and splitting fragments must be
logically consent all the time, with all netfilter helpers in the tree.

I don't think the additional checks in fast-path are in any way
justified.

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-16 Thread Prashant Sreedharan
On Thu, 2015-04-16 at 18:15 +0100, Ian Jackson wrote:
> Michael Chan writes ("Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more 
> messages]"):
> > On Thu, 2015-04-16 at 09:24 -0300, casca...@linux.vnet.ibm.com wrote: 
> > > Yes, this looks like the driver is not syncing the DMA buffers. Unmap is
> > > supposed to synchronize as well.
> > 
> > For small rx packets (< 256 bytes), we sync the DMA buffer before we
> > copy the data to another SKB.  For larger packets, we unmap the DMA
> > buffer.  Do we see the corruption in both cases?
> 
> Yes, at least with swiotlb=force iommu=soft.

Ok this is what is causing the problem, the driver uses
DEFINE_DMA_UNMAP_ADDR(), dma_unmap_addr_set() to keep a copy of the dma
"mapping" and dma_unmap_addr() to get the "mapping" value. On most of
the platforms this is a no-op, but it appears with "iommu=soft and
swiotlb=force" this house keeping is required, when I pass the correct
dma_addr instead of 0 while calling pci_unmap_/pci_dma_sync_ I don't see
the corruption. ie If you set CONFIG_NEED_DMA_MAP_STATE=y in your kernel
config you should not see the problem. Can you confirm ? Thanks


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net] tcp: tcp_get_info() should fetch socket fields once

2015-04-16 Thread Eric Dumazet
From: Eric Dumazet 

tcp_get_info() can be called without holding socket lock,
so any socket fields can change under us.

Use READ_ONCE() to fetch sk_pacing_rate and sk_max_pacing_rate

Fixes: 977cb0ecf82e ("tcp: add pacing_rate information into tcp_info")
Signed-off-by: Eric Dumazet 
---
David, I do not think this needs stable backport, this is a quite
minor bug. I've added the 'Fixes' tag for reference only.

 net/ipv4/tcp.c |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 18e3a12eb1b2..59c8a027721b 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2595,6 +2595,7 @@ void tcp_get_info(const struct sock *sk, struct tcp_info 
*info)
const struct tcp_sock *tp = tcp_sk(sk);
const struct inet_connection_sock *icsk = inet_csk(sk);
u32 now = tcp_time_stamp;
+   u32 rate;
 
memset(info, 0, sizeof(*info));
 
@@ -2655,10 +2656,11 @@ void tcp_get_info(const struct sock *sk, struct 
tcp_info *info)
 
info->tcpi_total_retrans = tp->total_retrans;
 
-   info->tcpi_pacing_rate = sk->sk_pacing_rate != ~0U ?
-   sk->sk_pacing_rate : ~0ULL;
-   info->tcpi_max_pacing_rate = sk->sk_max_pacing_rate != ~0U ?
-   sk->sk_max_pacing_rate : ~0ULL;
+   rate = READ_ONCE(sk->sk_pacing_rate);
+   info->tcpi_pacing_rate = rate != ~0U ? rate : ~0ULL;
+
+   rate = READ_ONCE(sk->sk_max_pacing_rate);
+   info->tcpi_max_pacing_rate = rate != ~0U ? rate : ~0ULL;
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
 


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 06/16] netconsole: make netconsole_target->enabled a bool

2015-04-16 Thread Tejun Heo
We'll add more bool's to netconsole_target.  Let's convert ->enabled
to bool first so that things stay consistent.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 15731d1..09d4e12 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -104,7 +104,7 @@ struct netconsole_target {
 #ifdef CONFIG_NETCONSOLE_DYNAMIC
struct config_item  item;
 #endif
-   int enabled;
+   boolenabled;
struct mutexmutex;
struct netpoll  np;
 };
@@ -197,7 +197,7 @@ static struct netconsole_target *alloc_param_target(char 
*target_config)
if (err)
goto fail;
 
-   nt->enabled = 1;
+   nt->enabled = true;
 
return nt;
 
@@ -322,13 +322,13 @@ static ssize_t store_enabled(struct netconsole_target *nt,
return err;
if (enabled < 0 || enabled > 1)
return -EINVAL;
-   if (enabled == nt->enabled) {
+   if ((bool)enabled == nt->enabled) {
pr_info("network logging has already %s\n",
nt->enabled ? "started" : "stopped");
return -EINVAL;
}
 
-   if (enabled) {  /* 1 */
+   if (enabled) {  /* true */
/*
 * Skip netpoll_parse_options() -- all the attributes are
 * already configured via configfs. Just print them out.
@@ -340,13 +340,13 @@ static ssize_t store_enabled(struct netconsole_target *nt,
return err;
 
pr_info("netconsole: network logging started\n");
-   } else {/* 0 */
+   } else {/* false */
/* We need to disable the netconsole before cleaning it up
 * otherwise we might end up in write_msg() with
-* nt->np.dev == NULL and nt->enabled == 1
+* nt->np.dev == NULL and nt->enabled == true
 */
spin_lock_irqsave(&target_list_lock, flags);
-   nt->enabled = 0;
+   nt->enabled = false;
spin_unlock_irqrestore(&target_list_lock, flags);
netpoll_cleanup(&nt->np);
}
@@ -594,7 +594,7 @@ static struct config_item *make_netconsole_target(struct 
config_group *group,
 
/*
 * Allocate and initialize with defaults.
-* Target is disabled at creation (enabled == 0).
+* Target is disabled at creation (!enabled).
 */
nt = kzalloc(sizeof(*nt), GFP_KERNEL);
if (!nt)
@@ -695,7 +695,7 @@ restart:
spin_lock_irqsave(&target_list_lock, flags);
dev_put(nt->np.dev);
nt->np.dev = NULL;
-   nt->enabled = 0;
+   nt->enabled = false;
stopped = true;
netconsole_target_put(nt);
goto restart;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 11/16] netconsole: consolidate enable/disable and create/destroy paths

2015-04-16 Thread Tejun Heo
netconsole management paths are scattered around in different places.
This patch reorganizes them so that

* All enable logic is in netconsole_enable() and disable in
  netconsole_disable().  Both should be called with netconsole_mutex
  held and netconsole_disable() may be invoked without intervening
  enable.

* alloc_param_target() now also handles linking the new target to
  target_list.  It's renamed to create_param_target() and now returns
  errno.

* store_enabled() now uses netconsole_enable/disable() instead of
  open-coding the logic.  This also fixes the missing synchronization
  against write path when manipulating ->enabled flag.

* drop_netconsole_target() and netconsole_deferred_disable_work_fn()
  updated to use netconsole_disable().

* init/cleanup_netconsole()'s destruction paths are consolidated into
  netconsole_destroy_all() which uses netconsole_disable().
  free_param_target() is no longer used and removed.

While the conversions aren't one-to-one equivalent, this patch
shouldn't cause any visible behavior differences and makes extension
of the enable/disable logic a lot easier.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 147 +--
 1 file changed, 79 insertions(+), 68 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index f0ac9f6..d72d902 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -83,6 +83,8 @@ static LIST_HEAD(target_list);
 /* protects target creation/destruction and enable/disable */
 static DEFINE_MUTEX(netconsole_mutex);
 
+static struct console netconsole;
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:  Links this target into the target_list.
@@ -192,8 +194,43 @@ static struct netconsole_target 
*alloc_netconsole_target(void)
return nt;
 }
 
+static int netconsole_enable(struct netconsole_target *nt)
+{
+   int err;
+
+   lockdep_assert_held(&netconsole_mutex);
+   WARN_ON_ONCE(nt->enabled);
+
+   err = netpoll_setup(&nt->np);
+   if (err)
+   return err;
+
+   console_lock();
+   nt->enabled = true;
+   console_unlock();
+   return 0;
+}
+
+static void netconsole_disable(struct netconsole_target *nt)
+{
+   lockdep_assert_held(&netconsole_mutex);
+
+   /*
+* We need to disable the netconsole before cleaning it up
+* otherwise we might end up in write_msg() with !nt->np.dev &&
+* nt->enabled.
+*/
+   if (nt->enabled) {
+   console_lock();
+   nt->enabled = false;
+   console_unlock();
+
+   netpoll_cleanup(&nt->np);
+   }
+}
+
 /* Allocate new target (from boot/module param) and setup netpoll for it */
-static struct netconsole_target *alloc_param_target(char *target_config)
+static int create_param_target(char *target_config)
 {
int err = -ENOMEM;
struct netconsole_target *nt;
@@ -209,24 +246,26 @@ static struct netconsole_target *alloc_param_target(char 
*target_config)
if (err)
goto fail;
 
-   err = netpoll_setup(&nt->np);
+   console_lock();
+   list_add(&nt->list, &target_list);
+   console_unlock();
+
+   err = netconsole_enable(nt);
if (err)
-   goto fail;
+   goto fail_del;
 
-   nt->enabled = true;
+   /* Dump existing printks when we register */
+   netconsole.flags |= CON_PRINTBUFFER;
 
-   return nt;
+   return 0;
 
+fail_del:
+   console_lock();
+   list_del(&nt->list);
+   console_unlock();
 fail:
kfree(nt);
-   return ERR_PTR(err);
-}
-
-/* Cleanup netpoll for given target (from boot/module param) and free it */
-static void free_param_target(struct netconsole_target *nt)
-{
-   netpoll_cleanup(&nt->np);
-   kfree(nt);
+   return err;
 }
 
 #ifdef CONFIG_NETCONSOLE_DYNAMIC
@@ -350,24 +389,15 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 */
netpoll_print_options(&nt->np);
 
-   err = netpoll_setup(&nt->np);
+   err = netconsole_enable(nt);
if (err)
return err;
 
pr_info("netconsole: network logging started\n");
} else {/* false */
-   /* We need to disable the netconsole before cleaning it up
-* otherwise we might end up in write_msg() with
-* nt->np.dev == NULL and nt->enabled == true
-*/
-   console_lock();
-   nt->enabled = false;
-   console_unlock();
-   netpoll_cleanup(&nt->np);
+   netconsole_disable(nt);
}
 
-   nt->enabled = enabled;
-
return strnlen(buf, count);
 }
 
@@ -633,13 +663,7 @@ static void drop_netconsole_target(struct config_group 
*group,
list_del(&nt->list);
console_unl

[PATCH 13/16] netconsole: implement retransmission support for extended consoles

2015-04-16 Thread Tejun Heo
With extended netconsole, the logger can reliably determine which
messages are missing.  This patch implements receiver for extended
netconsole which accepts retransmission request packets coming into
the the netconsole source port and sends back the requested messages.

The receiving socket is automatically created when an extended console
is enabled.  A poll_table with a custom callback is queued on the
socket.  When a packet is received on the socket, a work item is
scheduled which reads and parses the payload and sends back the
requested ones.  A retransmission packet looks like the following.

 nca  ...

The receiver doesn't interfere with the normal transmission path.
Even if something goes wrong with the network stack or receiver
itself, the normal transmission path should keep working.

This can be used to implement reliable netconsole logger.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 238 +++
 1 file changed, 238 insertions(+)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 626d9f0..b2763e0 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -45,7 +45,10 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
+#include 
 #include 
 #include 
 
@@ -89,6 +92,14 @@ static struct console netconsole_ext;
 
 static struct console netconsole;
 
+union sockaddr_in46 {
+   struct sockaddr addr;
+   struct sockaddr_in6 in6;
+   struct sockaddr_in  in4;
+};
+
+static char *netconsole_ext_tx_buf;/* CONSOLE_EXT_LOG_MAX */
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:  Links this target into the target_list.
@@ -118,6 +129,14 @@ struct netconsole_target {
booldisable_scheduled;
boolextended;
struct netpoll  np;
+
+   /* response packet handling for extended netconsoles */
+   union sockaddr_in46 raddr;  /* logging target address */
+   struct file *rx_file;   /* sock to receive responses */
+   poll_table  rx_ptable;  /* for polling the socket */
+   wait_queue_trx_wait;/* ditto */
+   wait_queue_head_t   *rx_waitq;  /* ditto */
+   struct work_struct  rx_work;/* receive & process packets */
 };
 
 #ifdef CONFIG_NETCONSOLE_DYNAMIC
@@ -274,6 +293,219 @@ static void send_ext_msg_udp(struct netconsole_target 
*nt, const char *msg,
}
 }
 
+static bool sockaddr_in46_equal(union sockaddr_in46 *a, union sockaddr_in46 *b)
+{
+   if (a->addr.sa_family != b->addr.sa_family)
+   return false;
+   if (a->in4.sin_port != b->in4.sin_port)
+   return false;
+   if (a->addr.sa_family == AF_INET &&
+   memcmp(&a->in4.sin_addr, &b->in4.sin_addr, sizeof(a->in4.sin_addr)))
+   return false;
+   if (a->addr.sa_family == AF_INET6 &&
+   memcmp(&a->in6.sin6_addr, &b->in6.sin6_addr, 
sizeof(a->in6.sin6_addr)))
+   return false;
+   return true;
+}
+
+/**
+ * netconsole_rx_work_fn - work function to handle netconsole ack packets
+ * @work: netconsole_target->rx_work
+ *
+ * This function is scheduled when packets are received on the source port
+ * of extended netconsole target and responds to ack messages from the
+ * remote logger.  An ack message has the following format.
+ *
+ * nca   ...
+ *
+ * There can be any number of missing-seq's as long as the whole payload
+ * fits inside MAX_PRINT_CHUNK.  This function re-transmits each message
+ * matching the missing-seq's if available.  This can be used to implement
+ * reliable logging from the receiver side.
+ *
+ * The rx path doesn't interfere with the normal tx path except for when it
+ * actually sends out the messages, and that path doesn't depend on
+ * anything more working compared to the normal tx path.  As such, this
+ * shouldn't make netconsole any less robust when things start going south.
+ */
+static void netconsole_rx_work_fn(struct work_struct *work)
+{
+   struct netconsole_target *nt =
+   container_of(work, struct netconsole_target, rx_work);
+   union sockaddr_in46 raddr;
+   struct msghdr msgh = { .msg_name = &raddr.addr, };
+   struct kvec iov;
+   char *tx_buf = netconsole_ext_tx_buf;
+   char *rx_buf, *pos, *tok;
+   u64 seq;
+   int len;
+
+   rx_buf = kmalloc(MAX_PRINT_CHUNK + 1, GFP_KERNEL);
+   if (!rx_buf && printk_ratelimit()) {
+   pr_warning("failed to allocate RX buffer\n");
+   return;
+   }
+
+   iov.iov_base = rx_buf;
+   iov.iov_len = MAX_PRINT_CHUNK;
+repeat:
+   msgh.msg_namelen = sizeof(raddr);
+   len = kernel_recvmsg(sock_from_file(nt->rx_file, &len), &msgh, &iov, 1,
+MAX_PRINT_CHUNK, MSG_DONTWAIT);
+   if (len < 0) {
+ 

[PATCH 12/16] netconsole: implement extended console support

2015-04-16 Thread Tejun Heo
netconsole transmits raw console messages using one or multiple UDP
packets and there's no way to find out whether the packets are lost in
transit or received out of order.  Depending on the setup, this can
make the logging significantly unreliable and untrustworthy.

With the new extended console support, printk now can be told to
expose log metadata including the message sequence number to console
drivers which can be used by log receivers to determine whether and
which messages are missing and reorder messages received out of order.

This patch implements extended console support for netconsole which
can be enabled by either prepending "+" to a netconsole boot param
entry or echoing 1 to "extended" file in configfs.  When enabled,
netconsole transmits extended log messages with headers identical to
/dev/kmsg output.

netconsole may have to split a single messages to multiple fragments.
In this case, if the extended mode is enabled, an optional header of
the form "ncfrag=OFF@IDX/NR" is added to each fragment where OFF is
the byte offset of the message body, IDX is the 0-based fragment index
and NR is the number of total fragments for this message.

To avoid unnecessarily forcing printk to format extended messages,
extended netconsole is registered with printk iff it's actually used.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 159 ++-
 1 file changed, 158 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index d72d902..626d9f0 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -83,6 +83,10 @@ static LIST_HEAD(target_list);
 /* protects target creation/destruction and enable/disable */
 static DEFINE_MUTEX(netconsole_mutex);
 
+static bool netconsole_ext_used_during_init;
+static bool netconsole_ext_registered;
+static struct console netconsole_ext;
+
 static struct console netconsole;
 
 /**
@@ -112,6 +116,7 @@ struct netconsole_target {
 #endif
boolenabled;
booldisable_scheduled;
+   boolextended;
struct netpoll  np;
 };
 
@@ -194,6 +199,81 @@ static struct netconsole_target 
*alloc_netconsole_target(void)
return nt;
 }
 
+/**
+ * send_ext_msg_udp - send extended log message to target
+ * @nt: target to send message to
+ * @msg: extended log message to send
+ * @msg_len: length of message
+ *
+ * Transfer extended log @msg to @nt.  If @msg is too long, it'll be split
+ * and transmitted in multiple chunks with ncfrag header field added to
+ * enable correct reassembly.
+ */
+static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
+int msg_len)
+{
+   static char buf[MAX_PRINT_CHUNK];
+   const int max_extra_len = sizeof(",ncfrag=@00/00");
+   const char *header, *body;
+   int header_len = msg_len, body_len = 0;
+   int chunk_len, nr_chunks, i;
+
+   if (!nt->enabled || !netif_running(nt->np.dev))
+   return;
+
+   if (msg_len <= MAX_PRINT_CHUNK) {
+   netpoll_send_udp(&nt->np, msg, msg_len);
+   return;
+   }
+
+   /* need to insert extra header fields, detect header and body */
+   header = msg;
+   body = memchr(msg, ';', msg_len);
+   if (body) {
+   header_len = body - header;
+   body_len = msg_len - header_len - 1;
+   body++;
+   }
+
+   chunk_len = MAX_PRINT_CHUNK - header_len - max_extra_len;
+   if (WARN_ON_ONCE(chunk_len <= 0))
+   return;
+
+   /*
+* Transfer possibly multiple chunks with extra header fields.
+*
+* If @msg needs to be split to fit MAX_PRINT_CHUNK, add
+* "ncfrag=@<0-based-chunk-index>/" to
+* enable proper reassembly on receiver side.
+*/
+   memcpy(buf, header, header_len);
+   nr_chunks = DIV_ROUND_UP(body_len, chunk_len);
+
+   for (i = 0; i < nr_chunks; i++) {
+   int this_header = header_len;
+   int this_chunk;
+
+   if (nr_chunks > 1)
+   this_header += scnprintf(buf + this_header,
+sizeof(buf) - this_header,
+",ncfrag=%d@%d/%d",
+i * chunk_len, i, nr_chunks);
+   if (this_header < sizeof(buf))
+   buf[this_header++] = ';';
+
+   if (WARN_ON_ONCE(this_header + chunk_len > MAX_PRINT_CHUNK))
+   return;
+
+   this_chunk = min(body_len, chunk_len);
+   memcpy(buf + this_header, body, this_chunk);
+
+   netpoll_send_udp(&nt->np, buf, this_header + this_chunk);
+
+   body += this_chunk;
+   body_len -= this_chunk;
+   }
+}
+
 static int netconsole_enable

[PATCH 10/16] netconsole: introduce netconsole_mutex

2015-04-16 Thread Tejun Heo
console_lock protects the target_list itself and setting and clearing
of its ->enabled flag; however, nothing protects the overall
enable/disable operations which we'll need to make netconsole
management more dynamic for the scheduled reliable transmission
support.  Also, an earlier patch introduced a small race window where
dynamic console disable may compete against asynchronous disable
kicked off from netdevice_notifier.

This patch adds netconsole_mutex which protects all target
create/destroy and enable/disable operations.  It also replaces
netconsole_target->mutex used by dynamic consoles.  The above
mentioned race is removed by this change.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 48 
 1 file changed, 28 insertions(+), 20 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 57c02ab..f0ac9f6 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -75,10 +75,14 @@ __setup("netconsole=", option_setup);
 
 /*
  * Linked list of all configured targets.  The list and each target's
- * enable/disable state are protected by console_lock.
+ * enable/disable state are protected by both netconsole_mutex and
+ * console_lock.
  */
 static LIST_HEAD(target_list);
 
+/* protects target creation/destruction and enable/disable */
+static DEFINE_MUTEX(netconsole_mutex);
+
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:  Links this target into the target_list.
@@ -106,7 +110,6 @@ struct netconsole_target {
 #endif
boolenabled;
booldisable_scheduled;
-   struct mutexmutex;
struct netpoll  np;
 };
 
@@ -184,7 +187,6 @@ static struct netconsole_target 
*alloc_netconsole_target(void)
strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
nt->np.local_port = 6665;
nt->np.remote_port = ;
-   mutex_init(&nt->mutex);
eth_broadcast_addr(nt->np.remote_mac);
 
return nt;
@@ -196,6 +198,8 @@ static struct netconsole_target *alloc_param_target(char 
*target_config)
int err = -ENOMEM;
struct netconsole_target *nt;
 
+   lockdep_assert_held(&netconsole_mutex);
+
nt = alloc_netconsole_target();
if (!nt)
goto fail;
@@ -573,10 +577,10 @@ static ssize_t netconsole_target_attr_store(struct 
config_item *item,
struct netconsole_target_attr *na =
container_of(attr, struct netconsole_target_attr, attr);
 
-   mutex_lock(&nt->mutex);
+   mutex_lock(&netconsole_mutex);
if (na->store)
ret = na->store(nt, buf, count);
-   mutex_unlock(&nt->mutex);
+   mutex_unlock(&netconsole_mutex);
 
return ret;
 }
@@ -610,9 +614,11 @@ static struct config_item *make_netconsole_target(struct 
config_group *group,
config_item_init_type_name(&nt->item, name, &netconsole_target_type);
 
/* Adding, but it is disabled */
+   mutex_lock(&netconsole_mutex);
console_lock();
list_add(&nt->list, &target_list);
console_unlock();
+   mutex_unlock(&netconsole_mutex);
 
return &nt->item;
 }
@@ -622,6 +628,7 @@ static void drop_netconsole_target(struct config_group 
*group,
 {
struct netconsole_target *nt = to_target(item);
 
+   mutex_lock(&netconsole_mutex);
console_lock();
list_del(&nt->list);
console_unlock();
@@ -634,6 +641,7 @@ static void drop_netconsole_target(struct config_group 
*group,
netpoll_cleanup(&nt->np);
 
config_item_put(&nt->item);
+   mutex_unlock(&netconsole_mutex);
 }
 
 static struct configfs_group_operations netconsole_subsys_group_ops = {
@@ -662,6 +670,7 @@ static void netconsole_deferred_disable_work_fn(struct 
work_struct *work)
 {
struct netconsole_target *nt, *to_disable;
 
+   mutex_lock(&netconsole_mutex);
 repeat:
to_disable = NULL;
console_lock();
@@ -685,6 +694,8 @@ repeat:
netconsole_target_put(to_disable);
goto repeat;
}
+
+   mutex_unlock(&netconsole_mutex);
 }
 
 static DECLARE_WORK(netconsole_deferred_disable_work,
@@ -791,6 +802,8 @@ static int __init init_netconsole(void)
char *target_config;
char *input = config;
 
+   mutex_lock(&netconsole_mutex);
+
if (strnlen(input, MAX_PARAM_LENGTH)) {
while ((target_config = strsep(&input, ";"))) {
nt = alloc_param_target(target_config);
@@ -818,24 +831,21 @@ static int __init init_netconsole(void)
register_console(&netconsole);
pr_info("network logging started\n");
 
+   mutex_unlock(&netconsole_mutex);
return err;
 
 undonotifier:
unregister_netdevice_notifier(&netconsole_netdev_notifier);
-   cancel_work_sync(&netconsole_deferred_disable_work);
 fail:
pr_err("cleanin

[PATCH 05/16] printk: implement log_seq_range() and ext_log_from_seq()

2015-04-16 Thread Tejun Heo
Implement two functions to access log messages by their sequence
numbers.

* log_seq_range() determines the currently available sequence number
  range.

* ext_log_from_seq() outputs log message identified by a given
  sequence number in the extended format.

These will be used to implement reliable netconsole.

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Petr Mladek 
---
 include/linux/printk.h | 14 
 kernel/printk/printk.c | 96 ++
 2 files changed, 110 insertions(+)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 58b1fec..4fd3bb4 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 
 extern const char linux_banner[];
 extern const char linux_proc_banner[];
@@ -169,6 +170,8 @@ void __init setup_log_buf(int early);
 void dump_stack_set_arch_desc(const char *fmt, ...);
 void dump_stack_print_info(const char *log_lvl);
 void show_regs_print_info(const char *log_lvl);
+void log_seq_range(u64 *beginp, u64 *endp);
+ssize_t ext_log_from_seq(char *buf, size_t size, u64 log_seq);
 #else
 static inline __printf(1, 0)
 int vprintk(const char *s, va_list args)
@@ -228,6 +231,17 @@ static inline void dump_stack_print_info(const char 
*log_lvl)
 static inline void show_regs_print_info(const char *log_lvl)
 {
 }
+
+static inline void log_seq_range(u64 *begin_seqp, u64 *end_seqp)
+{
+   *begin_seqp = 0;
+   *end_seqp = 0;
+}
+
+static inline ssize_t ext_log_from_seq(char *buf, size_t size, u64 log_seq)
+{
+   return -ERANGE;
+}
 #endif
 
 extern asmlinkage void dump_stack(void) __cold;
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 349a37b..904413e 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -617,6 +617,102 @@ static ssize_t msg_print_ext_body(char *buf, size_t size,
return p - buf;
 }
 
+/**
+ * log_seq_range - get the range of available log sequence numbers
+ * @begin_seqp: output parameter for the begin of the range
+ * @end_seqp: output parameter for the end of the range
+ *
+ * Returns the log sequence number range [*@begin_seqp, *@ends_seqp) which
+ * is currently available to be retrieved using ext_log_from_seq().  Note
+ * that the range may shift anytime.
+ */
+void log_seq_range(u64 *begin_seqp, u64 *end_seqp)
+{
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&logbuf_lock, flags);
+   *begin_seqp = log_first_seq;
+   *end_seqp = log_next_seq;
+   raw_spin_unlock_irqrestore(&logbuf_lock, flags);
+}
+EXPORT_SYMBOL_GPL(log_seq_range);
+
+/**
+ * ext_log_from_seq - obtain log message in extended format from its seq number
+ * @buf: output buffer
+ * @size: size of @buf
+ * @log_seq: target log sequence number
+ *
+ * Print the log message with @log_seq as its sequence number into @buf
+ * using the extended format.  On success, returns the length of formatted
+ * output excluding the terminating '\0'.  If @log_seq doesn't match any
+ * message, -ERANGE is returned.
+ *
+ * Due to the way messages are stored, accessing @log_seq requires seeking
+ * the log buffer sequentially.  As the last looked up position is cached,
+ * accessing messages in ascending sequence order is recommended.
+ */
+ssize_t ext_log_from_seq(char *buf, size_t size, u64 log_seq)
+{
+   static u64 seq_hint;
+   static u32 idx_hint;
+   static enum log_flags prev_flags_hint;
+   struct printk_log *msg;
+   enum log_flags prev_flags = 0;
+   unsigned long flags;
+   ssize_t len;
+   u32 seq;
+   u32 prev_idx, idx;
+
+   raw_spin_lock_irqsave(&logbuf_lock, flags);
+
+   if (log_seq < log_first_seq || log_seq >= log_next_seq) {
+   len = -ERANGE;
+   goto out_unlock;
+   }
+
+   /*
+* Seek to @log_seq to determine its index. The last looked up seq
+* and index are cached to accelerate in-order accesses.  For now,
+* @prev_flags is best effort.  It'd be better to restructure it so
+* that the current entry carries all the relevant information
+* without depending on the previous one.
+*/
+   seq = log_first_seq;
+   idx = log_first_idx;
+
+   if (seq < seq_hint && seq_hint <= log_seq) {
+   seq = seq_hint;
+   idx = idx_hint;
+   prev_flags = prev_flags_hint;
+   }
+
+   prev_idx = idx;
+   while (seq < log_seq) {
+   seq++;
+   prev_idx = idx;
+   idx = log_next(idx);
+   }
+
+   if (prev_idx != idx)
+   prev_flags = log_from_idx(prev_idx)->flags;
+   msg = log_from_idx(idx);
+
+   /* entry located, format it */
+   len = msg_print_ext_header(buf, size, msg, seq, prev_flags);
+   len += msg_print_ext_body(buf + len, size - len,
+ log_dict(msg), msg->dict_len,
+ log_text(msg), msg->text_len);
+
+  

[PATCH 14/16] netconsole: implement ack handling and emergency transmission

2015-04-16 Thread Tejun Heo
While the retransmission support added by the previous patch goes a
long way towards enabling implementation of reliable remote logger, it
depends on a lot larger part of the kernel working and there's no
non-polling way of discovering whether the latest messages have been
lost.

This patch implements an optional ack handling.  An extended remote
logger can respond with a message formatted like the following.

 nca[]  ...

The optional  enables ack handling.  Whenever ack lags
transmission by more than ACK_TIMEOUT (10s), the netconsole target
will retransmit all unacked messages with increasing interval (100ms
between message at the beginning, exponentially backing out to 10s).
This ensures that the remote logger has a very high chance of getting
all the messages even after severe failures as long as the timer and
netpoll are working.

This doesn't interfere with the normal transmission path.  Even if
something goes wrong with timer, the normal transmission path should
keep working.

Emergency transmissions are marked with "emg=1" header.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 136 +++
 1 file changed, 127 insertions(+), 9 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index b2763e0..82c8be0 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -49,6 +49,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -59,6 +60,10 @@ MODULE_LICENSE("GPL");
 #define MAX_PARAM_LENGTH   256
 #define MAX_PRINT_CHUNK1000
 
+#define ACK_TIMEOUT(10 * HZ)
+#define EMG_TX_MIN_INTV(HZ / 10)
+#define EMG_TX_MAX_INTVHZ
+
 static char config[MAX_PARAM_LENGTH];
 module_param_string(netconsole, config, MAX_PARAM_LENGTH, 0);
 MODULE_PARM_DESC(netconsole, " 
netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@/[tgt-macaddr]");
@@ -137,6 +142,13 @@ struct netconsole_target {
wait_queue_trx_wait;/* ditto */
wait_queue_head_t   *rx_waitq;  /* ditto */
struct work_struct  rx_work;/* receive & process packets */
+
+   /* ack handling and emergency transmission for extended netconsoles */
+   unsigned long   ack_tstmp;  /* pending ack timestamp */
+   u64 ack_seq;/* last acked sequence */
+   struct timer_list   ack_timer;  /* ack timeout */
+   unsigned long   emg_tx_intv;/* emergency tx interval */
+   u64 emg_tx_seq; /* current emg tx sequence */
 };
 
 #ifdef CONFIG_NETCONSOLE_DYNAMIC
@@ -223,16 +235,17 @@ static struct netconsole_target 
*alloc_netconsole_target(void)
  * @nt: target to send message to
  * @msg: extended log message to send
  * @msg_len: length of message
+ * @emg_tx: is it for emergency transmission?
  *
  * Transfer extended log @msg to @nt.  If @msg is too long, it'll be split
  * and transmitted in multiple chunks with ncfrag header field added to
- * enable correct reassembly.
+ * enable correct reassembly.  If @emg_tx, emg header field is added.
  */
 static void send_ext_msg_udp(struct netconsole_target *nt, const char *msg,
-int msg_len)
+int msg_len, bool emg_tx)
 {
static char buf[MAX_PRINT_CHUNK];
-   const int max_extra_len = sizeof(",ncfrag=@00/00");
+   const int max_extra_len = sizeof(",emg=1,ncfrag=@00/00");
const char *header, *body;
int header_len = msg_len, body_len = 0;
int chunk_len, nr_chunks, i;
@@ -240,7 +253,7 @@ static void send_ext_msg_udp(struct netconsole_target *nt, 
const char *msg,
if (!nt->enabled || !netif_running(nt->np.dev))
return;
 
-   if (msg_len <= MAX_PRINT_CHUNK) {
+   if (!emg_tx && msg_len <= MAX_PRINT_CHUNK) {
netpoll_send_udp(&nt->np, msg, msg_len);
return;
}
@@ -261,6 +274,8 @@ static void send_ext_msg_udp(struct netconsole_target *nt, 
const char *msg,
/*
 * Transfer possibly multiple chunks with extra header fields.
 *
+* For emergency transfers due to missing acks, add "emg=1".
+*
 * If @msg needs to be split to fit MAX_PRINT_CHUNK, add
 * "ncfrag=@<0-based-chunk-index>/" to
 * enable proper reassembly on receiver side.
@@ -272,6 +287,10 @@ static void send_ext_msg_udp(struct netconsole_target *nt, 
const char *msg,
int this_header = header_len;
int this_chunk;
 
+   if (emg_tx)
+   this_header += scnprintf(buf + this_header,
+sizeof(buf) - this_header,
+",emg=1");
if (nr_chunks > 1)
this_header += scnprintf(buf + this_header,

[PATCH 08/16] netconsole: punt disabling to workqueue from netdevice_notifier

2015-04-16 Thread Tejun Heo
The netdevice_notifier callback, netconsole_netdev_event(), needs to
perform netpoll_cleanup() for the affected targets; however, the
notifier is called with rtnl_lock held which the netpoll_cleanup()
path also grabs.  To avoid deadlock, the path uses __netpoll_cleanup()
instead and making the code path different from others.

The planned reliable netconsole support will add more logic to the
cleanup path making the slightly different paths painful.  Let's punt
netconsole_target disabling to workqueue so that it can later share
the same cleanup path.  This would also allow ditching
target_list_lock and depending on console_lock for synchronization.

Note that this introduces a narrow race window where the asynchronous
disabling may race against disabling from configfs ending up executing
netpoll_cleanup() more than once on the same instance.  The follow up
patches will fix it by introducing a mutex to protect overall enable /
disable operations; unfortunately, the locking update couldn't be
ordered before this change due to the locking order with rtnl_lock.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 58 ++--
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 17692b8..d355776 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -105,6 +105,7 @@ struct netconsole_target {
struct config_item  item;
 #endif
boolenabled;
+   booldisable_scheduled;
struct mutexmutex;
struct netpoll  np;
 };
@@ -660,6 +661,39 @@ static struct configfs_subsystem netconsole_subsys = {
 
 #endif /* CONFIG_NETCONSOLE_DYNAMIC */
 
+static void netconsole_deferred_disable_work_fn(struct work_struct *work)
+{
+   struct netconsole_target *nt, *to_disable;
+   unsigned long flags;
+
+repeat:
+   to_disable = NULL;
+   spin_lock_irqsave(&target_list_lock, flags);
+   list_for_each_entry(nt, &target_list, list) {
+   if (!nt->disable_scheduled)
+   continue;
+   nt->disable_scheduled = false;
+
+   if (!nt->enabled)
+   continue;
+
+   netconsole_target_get(nt);
+   nt->enabled = false;
+   to_disable = nt;
+   break;
+   }
+   spin_unlock_irqrestore(&target_list_lock, flags);
+
+   if (to_disable) {
+   netpoll_cleanup(&to_disable->np);
+   netconsole_target_put(to_disable);
+   goto repeat;
+   }
+}
+
+static DECLARE_WORK(netconsole_deferred_disable_work,
+   netconsole_deferred_disable_work_fn);
+
 /* Handle network interface device notifications */
 static int netconsole_netdev_event(struct notifier_block *this,
   unsigned long event, void *ptr)
@@ -674,9 +708,7 @@ static int netconsole_netdev_event(struct notifier_block 
*this,
goto done;
 
spin_lock_irqsave(&target_list_lock, flags);
-restart:
list_for_each_entry(nt, &target_list, list) {
-   netconsole_target_get(nt);
if (nt->np.dev == dev) {
switch (event) {
case NETDEV_CHANGENAME:
@@ -685,23 +717,14 @@ restart:
case NETDEV_RELEASE:
case NETDEV_JOIN:
case NETDEV_UNREGISTER:
-   /* rtnl_lock already held
-* we might sleep in __netpoll_cleanup()
-*/
-   spin_unlock_irqrestore(&target_list_lock, 
flags);
-
-   __netpoll_cleanup(&nt->np);
-
-   spin_lock_irqsave(&target_list_lock, flags);
-   dev_put(nt->np.dev);
-   nt->np.dev = NULL;
-   nt->enabled = false;
+   /* rtnl_lock already held, punt to workqueue */
+   if (nt->enabled && !nt->disable_scheduled) {
+   nt->disable_scheduled = true;
+   
schedule_work(&netconsole_deferred_disable_work);
+   }
stopped = true;
-   netconsole_target_put(nt);
-   goto restart;
}
}
-   netconsole_target_put(nt);
}
spin_unlock_irqrestore(&target_list_lock, flags);
if (stopped) {
@@ -810,7 +833,7 @@ static int __init init_netconsole(void)
 
 undonotifier:
unregister_netdevice_notifier(&netconsole_netdev_notifier);
-
+   cancel_work_sync(&netconsole_deferred_disable_work);
 fail:
pr

[PATCH 16/16] netconsole: update documentation for extended netconsole

2015-04-16 Thread Tejun Heo
Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 Documentation/networking/netconsole.txt | 95 -
 1 file changed, 94 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/netconsole.txt 
b/Documentation/networking/netconsole.txt
index a5d574a..e7a57c2 100644
--- a/Documentation/networking/netconsole.txt
+++ b/Documentation/networking/netconsole.txt
@@ -2,6 +2,7 @@
 started by Ingo Molnar , 2001.09.17
 2.6 port and netpoll api by Matt Mackall , Sep 9 2003
 IPv6 support by Cong Wang , Jan 1 2013
+Reliable extended console support by Tejun Heo , Apr 16 2015
 
 Please send bug reports to Matt Mackall 
 Satyam Sharma , and Cong Wang 

@@ -24,9 +25,10 @@ Sender and receiver configuration:
 It takes a string configuration parameter "netconsole" in the
 following format:
 
- netconsole=[src-port]@[src-ip]/[],[tgt-port]@/[tgt-macaddr]
+ netconsole=[+][src-port]@[src-ip]/[],[tgt-port]@/[tgt-macaddr]
 
where
++ if present, enable reliable extended console support
 src-port  source for UDP packets (defaults to 6665)
 src-ipsource IP to use (interface address)
 dev   network interface (eth0)
@@ -107,6 +109,7 @@ To remove a target:
 The interface exposes these parameters of a netconsole target to userspace:
 
enabled Is this target currently enabled?   (read-write)
+   extendedExtended mode enabled   (read-write)
dev_nameLocal network interface name(read-write)
local_port  Source UDP port to use  (read-write)
remote_port Remote agent's UDP port (read-write)
@@ -132,6 +135,96 @@ You can also update the local interface dynamically. This 
is especially
 useful if you want to use interfaces that have newly come up (and may not
 have existed when netconsole was loaded / initialized).
 
+Reliable extended console:
+==
+
+If '+' is prefixed to the configuration line or "extended" config file
+is set to 1, reliable extended console support is enabled. An example
+boot param follows.
+
+ linux netconsole=+@10.0.0.1/eth1,9353@10.0.0.2/12:34:56:78:9a:bc
+
+Reliable extended console support is consisted of three parts.
+
+1. Extended format messages
+---
+
+Log messages are transmitted with extended metadata header in the
+following format which is the same as /dev/kmsg.
+
+ ,,,;
+
+Non printable characters in  are escaped using "\xff"
+notation. If the message contains optional dictionary, verbatim
+newline is used as the delimeter. printk fragments from KERN_CONT
+messages are transmitted in the following form.
+
+ ,-,,-,fragid=;
+
+Where  uniquely identifies the message being assembled. Later
+when the assembly is complete, the completed message with sequence
+number is transmitted. The receiver is responsible for handling the
+overlaps between fragements and the final assembled message.
+
+If a message doesn't fit in 1000 bytes, the message is split into
+multiple fragments by netconsole. These fragments are transmitted with
+"ncfrag" header field added.
+
+ ncfrag=@<0-based-chunk-index>/
+
+For example,
+
+ 6,416,1758426,-,ncfrag=0@0/2;the first chunk,
+ 6,416,1758426,-,ncfrag=16@1/2;the second chunk
+
+2. Retransmission requests
+--
+
+If extended console support is enabled, netconsole listens for packets
+arriving on the source port. The receiver can send the packets of the
+following form to request messages with specific sequence numbers.
+
+ nca  ...
+
+There can be any number of  as long as they fit inside
+1000 bytes. If the messages with the matching sequence numbers exist,
+netconsole will retransmit them immediately.
+
+3. ACK and emergency transmission
+-
+
+If the receiver sends back a packet with the following format,
+netconsole enables ACK support.
+
+ nca  ...
+
+If the ack sequence lags the current sequence by more than 10s,
+emergency transmission is started and all the unacked messages are
+retransmitted periodically with increasing interval. In the first
+round, each unacked message is transmitted every 100ms. In the next,
+every 200ms. The interval doubles until it reaches 1s and stays there.
+
+Each emergency transmitted message has "emg=1" header field added to
+it. For example,
+
+ 6,416,1758426,-,emg=1;netpoll: netconsole: local port 
+
+This exists to ensure that messages reach the destination even when
+things went pear-shaped. As long as netpoll and timer are working, the
+logging target has pretty good chance of receiving all messages.
+
+Receiver
+
+
+libncrx is a library which implements reliable remote logger using the
+above features. The library is available under tools/lib/netconsole.
+The library is a pure state machine with packet and timer plumbing
+left to the library user to ease integration into larger framework.
+See tools/lib/

[PATCH 07/16] netconsole: factor out alloc_netconsole_target()

2015-04-16 Thread Tejun Heo
alloc_param_target() and make_netconsole_target() were duplicating the
same allocation and init logic.  Factor it out to
alloc_netconsole_target().

This is pure reorganization.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 40 
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index 09d4e12..17692b8 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -167,19 +167,17 @@ static void netconsole_target_put(struct 
netconsole_target *nt)
 
 #endif /* CONFIG_NETCONSOLE_DYNAMIC */
 
-/* Allocate new target (from boot/module param) and setup netpoll for it */
-static struct netconsole_target *alloc_param_target(char *target_config)
+/*
+ * Allocate and initialize with defaults.  Note that these targets get
+ * their config_item fields zeroed-out.
+ */
+static struct netconsole_target *alloc_netconsole_target(void)
 {
-   int err = -ENOMEM;
struct netconsole_target *nt;
 
-   /*
-* Allocate and initialize with defaults.
-* Note that these targets get their config_item fields zeroed-out.
-*/
nt = kzalloc(sizeof(*nt), GFP_KERNEL);
if (!nt)
-   goto fail;
+   return NULL;
 
nt->np.name = "netconsole";
strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
@@ -188,6 +186,19 @@ static struct netconsole_target *alloc_param_target(char 
*target_config)
mutex_init(&nt->mutex);
eth_broadcast_addr(nt->np.remote_mac);
 
+   return nt;
+}
+
+/* Allocate new target (from boot/module param) and setup netpoll for it */
+static struct netconsole_target *alloc_param_target(char *target_config)
+{
+   int err = -ENOMEM;
+   struct netconsole_target *nt;
+
+   nt = alloc_netconsole_target();
+   if (!nt)
+   goto fail;
+
/* Parse parameters and setup netpoll */
err = netpoll_parse_options(&nt->np, target_config);
if (err)
@@ -592,21 +603,10 @@ static struct config_item *make_netconsole_target(struct 
config_group *group,
unsigned long flags;
struct netconsole_target *nt;
 
-   /*
-* Allocate and initialize with defaults.
-* Target is disabled at creation (!enabled).
-*/
-   nt = kzalloc(sizeof(*nt), GFP_KERNEL);
+   nt = alloc_netconsole_target();
if (!nt)
return ERR_PTR(-ENOMEM);
 
-   nt->np.name = "netconsole";
-   strlcpy(nt->np.dev_name, "eth0", IFNAMSIZ);
-   nt->np.local_port = 6665;
-   nt->np.remote_port = ;
-   mutex_init(&nt->mutex);
-   eth_broadcast_addr(nt->np.remote_mac);
-
/* Initialize the config_item member */
config_item_init_type_name(&nt->item, name, &netconsole_target_type);
 
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 04/16] printk: implement support for extended console drivers

2015-04-16 Thread Tejun Heo
printk log_buf keeps various metadata for each message including its
sequence number and timestamp.  The metadata is currently available
only through /dev/kmsg and stripped out before passed onto console
drivers.  We want this metadata to be available to console drivers
too.  Immediately, it's to implement reliable netconsole but may be
useful for other console devices too.

This patch implements support for extended console drivers.  Consoles
can indicate that they process extended messages by setting the new
CON_EXTENDED flag and they'll fed messages formatted the same way as
/dev/kmsg output as follows.

 ,,,;

One special case is fragments.  Message fragments are output
immediately to consoles to avoid losing them in case of crashes.  For
normal consoles, this is handled by later suppressing the assembled
result and /dev/kmsg only shows fully assembled message; however,
extended consoles would need both the fragments, to avoid losing
message in case of a crash, and the assembled result, to tell how the
fragments are assembled and which sequence number got assigned to it.

To help matching up the fragments with the resulting message,
fragments are emitted in the following format.

 ,-,,-,fragid=;

And later when the assembly is complete, the following is transmitted,

 fragid=;

* Extended message formatting for console drivers is enabled iff there
  are registered extended consoles.

* Comment describing extended message formats updated to help
  distinguishing variable with verbatim terms.

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Petr Mladek 
---
 include/linux/console.h |   1 +
 kernel/printk/printk.c  | 141 +---
 2 files changed, 123 insertions(+), 19 deletions(-)

diff --git a/include/linux/console.h b/include/linux/console.h
index 7571a16..04bbd09 100644
--- a/include/linux/console.h
+++ b/include/linux/console.h
@@ -115,6 +115,7 @@ static inline int con_debug_leave(void)
 #define CON_BOOT   (8)
 #define CON_ANYTIME(16) /* Safe to call when cpu is offline */
 #define CON_BRL(32) /* Used for a braille device */
+#define CON_EXTENDED   (64) /* Use the extended output format a la /dev/kmsg */
 
 struct console {
charname[16];
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 0175c46..349a37b 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -84,6 +84,8 @@ static struct lockdep_map console_lock_dep_map = {
 };
 #endif
 
+static int nr_ext_console_drivers;
+
 /*
  * Helper macros to handle lockdep when locking/unlocking console_sem. We use
  * macros instead of functions so that _RET_IP_ contains useful information.
@@ -195,14 +197,28 @@ static int console_may_schedule;
  * need to be changed in the future, when the requirements change.
  *
  * /dev/kmsg exports the structured data in the following line format:
- *   "level,sequnum,timestamp;\n"
+ *   ",,,;\n"
  *
  * The optional key/value pairs are attached as continuation lines starting
  * with a space character and terminated by a newline. All possible
  * non-prinatable characters are escaped in the "\xff" notation.
  *
  * Users of the export format should ignore possible additional values
- * separated by ',', and find the message after the ';' character.
+ * separated by ',', and find the message after the ';' character. All
+ * optional header fields should have the form "=".
+ *
+ * For consoles with CON_EXTENDED set, a message formatted like the
+ * following may also be printed. This is a continuation fragment which are
+ * being assembled and will be re-transmitted with a normal header once
+ * assembly finishes. The fragments are sent out immediately to avoid
+ * losing them over a crash.
+ *   ",-,,-,fragid=;\n"
+ *
+ * On completion of assembly, the following is transmitted.
+ *   "fragid=;\n"
+ *
+ * Extended consoles should identify and handle duplicates by matching the
+ * fragids of the fragments and assembled messages.
  */
 
 enum log_flags {
@@ -210,6 +226,7 @@ enum log_flags {
LOG_NEWLINE = 2,/* text ended with a newline */
LOG_PREFIX  = 4,/* text started with a prefix */
LOG_CONT= 8,/* text is a fragment of a continuation line */
+   LOG_DICT_META   = 16,   /* dict contains console meta information */
 };
 
 struct printk_log {
@@ -292,6 +309,12 @@ static char *log_dict(const struct printk_log *msg)
return (char *)msg + sizeof(struct printk_log) + msg->text_len;
 }
 
+/* if LOG_DICT_META is set, the dict buffer carries printk internal info */
+static size_t log_dict_len(const struct printk_log *msg)
+{
+   return (msg->flags & LOG_DICT_META) ? 0 : msg->dict_len;
+}
+
 /* get record by index; idx must point to valid msg */
 static struct printk_log *log_from_idx(u32 idx)
 {
@@ -516,7 +539,9 @@ static ssize_t msg_print_ext_header(char *buf, size_t size,
enum log_flags prev_flags)
 {
u64 ts_u

[PATCH 15/16] netconsole: implement netconsole receiver library

2015-04-16 Thread Tejun Heo
This patch implements libncrx.a and a simple receiver program ncrx
using the library.  The library makes use of the extended header,
retransmission and ack support to implement reliable netconsole
receiver.  The library is structured as a pure state machine leaving
all IO and timing handling to the library user and should be easy to
fit into any environment.

The reordering and retransmission mechanisms are fairly simple.  Each
receiver instance has a ring buffer where messages are slotted
according to their sequence number.  A hole in sequence numbers
trigger retransmission after the sequence progresses certain number of
slots or certain amount of duration has passed.

See tools/lib/netconsole/ncrx.h for information on usage.

Tested with simulated heavy packet loss (50%).  Messages are
transferred without loss and all the common scenarios including
reboots are handled correctly.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 tools/Makefile|  16 +-
 tools/lib/netconsole/Makefile |  36 ++
 tools/lib/netconsole/ncrx.c   | 906 ++
 tools/lib/netconsole/ncrx.h   | 204 ++
 tools/ncrx/Makefile   |  14 +
 tools/ncrx/ncrx.c | 143 +++
 6 files changed, 1318 insertions(+), 1 deletion(-)
 create mode 100644 tools/lib/netconsole/Makefile
 create mode 100644 tools/lib/netconsole/ncrx.c
 create mode 100644 tools/lib/netconsole/ncrx.h
 create mode 100644 tools/ncrx/Makefile
 create mode 100644 tools/ncrx/ncrx.c

diff --git a/tools/Makefile b/tools/Makefile
index 9a617ad..9f588f7 100644
--- a/tools/Makefile
+++ b/tools/Makefile
@@ -9,6 +9,7 @@ help:
@echo '  firewire   - the userspace part of nosy, an IEEE-1394 traffic 
sniffer'
@echo '  hv - tools used when in Hyper-V clients'
@echo '  lguest - a minimal 32-bit x86 hypervisor'
+   @echo '  ncrx   - simple reliable netconsole logger'
@echo '  perf   - Linux performance measurement and analysis tool'
@echo '  selftests  - various kernel selftests'
@echo '  turbostat  - Intel CPU idle stats and freq reporting tool'
@@ -50,6 +51,12 @@ liblockdep: FORCE
 libapikfs: FORCE
$(call descend,lib/api)
 
+libncrx: FORCE
+   $(call descend,lib/netconsole)
+
+ncrx: libncrx FORCE
+   $(call descend,$@)
+
 perf: libapikfs FORCE
$(call descend,$@)
 
@@ -100,6 +107,12 @@ liblockdep_clean:
 libapikfs_clean:
$(call descend,lib/api,clean)
 
+libncrx_clean:
+   $(call descend,lib/netconsole,clean)
+
+ncrx_clean: libncrx_clean
+   $(call descend,$(@:_clean=),clean)
+
 perf_clean: libapikfs_clean
$(call descend,$(@:_clean=),clean)
 
@@ -114,6 +127,7 @@ tmon_clean:
 
 clean: acpi_clean cgroup_clean cpupower_clean hv_clean firewire_clean 
lguest_clean \
perf_clean selftests_clean turbostat_clean usb_clean 
virtio_clean \
-   vm_clean net_clean x86_energy_perf_policy_clean tmon_clean
+   vm_clean net_clean x86_energy_perf_policy_clean tmon_clean \
+   ncrx_clean
 
 .PHONY: FORCE
diff --git a/tools/lib/netconsole/Makefile b/tools/lib/netconsole/Makefile
new file mode 100644
index 000..6bc7997
--- /dev/null
+++ b/tools/lib/netconsole/Makefile
@@ -0,0 +1,36 @@
+include ../../scripts/Makefile.include
+
+CC = $(CROSS_COMPILE)gcc
+AR = $(CROSS_COMPILE)ar
+
+# guard against environment variables
+LIB_H=
+LIB_OBJS=
+
+LIB_H += ncrx.h
+LIB_OBJS += $(OUTPUT)ncrx.o
+
+LIBFILE = libncrx.a
+
+CFLAGS = -g -Wall -O2 $(EXTRA_WARNINGS) $(EXTRA_CFLAGS) -fPIC
+ALL_CFLAGS = $(CFLAGS) $(BASIC_CFLAGS)
+ALL_LDFLAGS = $(LDFLAGS)
+
+RM = rm -f
+
+$(LIBFILE): $(LIB_OBJS)
+   $(QUIET_AR)$(RM) $@ && $(AR) rcs $(OUTPUT)$@ $(LIB_OBJS)
+
+$(LIB_OBJS): $(LIB_H)
+
+$(OUTPUT)%.o: %.c
+   $(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) $<
+$(OUTPUT)%.s: %.c
+   $(QUIET_CC)$(CC) -S $(ALL_CFLAGS) $<
+$(OUTPUT)%.o: %.S
+   $(QUIET_CC)$(CC) -o $@ -c $(ALL_CFLAGS) $<
+
+clean:
+   $(call QUIET_CLEAN, libncrx) $(RM) $(LIB_OBJS) $(LIBFILE)
+
+.PHONY: clean
diff --git a/tools/lib/netconsole/ncrx.c b/tools/lib/netconsole/ncrx.c
new file mode 100644
index 000..8a87e6b
--- /dev/null
+++ b/tools/lib/netconsole/ncrx.c
@@ -0,0 +1,906 @@
+/*
+ * ncrx - extended netconsole receiver library
+ *
+ * Copyright (C) 2015  Facebook, Inc
+ * Copyright (C) 2015  Tejun Heo 
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ncrx.h"
+
+/* max payload len for responses, this is what netconsole uses on tx side */
+#define NCRX_RESP_MAX  1000
+/* oos history is tracked with a uint32_t */
+#define NCRX_OOS_MAX   32
+
+struct ncrx_msg_list {
+   struct ncrx_listhead;
+   int nr; /* number of msgs on the list */
+};
+
+struct ncrx_slot {
+   struct ncrx_msg *msg;
+   uint64_ttimestamp; 

[PATCH 09/16] netconsole: replace target_list_lock with console_lock

2015-04-16 Thread Tejun Heo
netconsole has been using a spinlock - target_list_lock - to protect
the list of configured netconsole targets and their enable/disable
states.  With the disabling from netdevice_notifier moved off to a
workqueue by the previous patch and thus outside of rtnl_lock,
target_list_lock can be replaced with console_lock, which allows us to
avoid grabbing an extra lock in the log write path and can simplify
locking when involving other subsystems as console_lock is only
trylocked from printk path.

This patch replaces target_list_lock with console_lock.  The
conversion is one-to-one except for write_msg().  The function is
called with console_lock() already held so no further locking is
necessary; however, as netpoll_send_udp() expects irq to be disabled,
explicit irq save/restore pair is added around it.

Signed-off-by: Tejun Heo 
Cc: David Miller 
---
 drivers/net/netconsole.c | 50 ++--
 1 file changed, 19 insertions(+), 31 deletions(-)

diff --git a/drivers/net/netconsole.c b/drivers/net/netconsole.c
index d355776..57c02ab 100644
--- a/drivers/net/netconsole.c
+++ b/drivers/net/netconsole.c
@@ -73,12 +73,12 @@ static int __init option_setup(char *opt)
 __setup("netconsole=", option_setup);
 #endif /* MODULE */
 
-/* Linked list of all configured targets */
+/*
+ * Linked list of all configured targets.  The list and each target's
+ * enable/disable state are protected by console_lock.
+ */
 static LIST_HEAD(target_list);
 
-/* This needs to be a spinlock because write_msg() cannot sleep */
-static DEFINE_SPINLOCK(target_list_lock);
-
 /**
  * struct netconsole_target - Represents a configured netconsole target.
  * @list:  Links this target into the target_list.
@@ -325,7 +325,6 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 const char *buf,
 size_t count)
 {
-   unsigned long flags;
int enabled;
int err;
 
@@ -357,9 +356,9 @@ static ssize_t store_enabled(struct netconsole_target *nt,
 * otherwise we might end up in write_msg() with
 * nt->np.dev == NULL and nt->enabled == true
 */
-   spin_lock_irqsave(&target_list_lock, flags);
+   console_lock();
nt->enabled = false;
-   spin_unlock_irqrestore(&target_list_lock, flags);
+   console_unlock();
netpoll_cleanup(&nt->np);
}
 
@@ -601,7 +600,6 @@ static struct config_item_type netconsole_target_type = {
 static struct config_item *make_netconsole_target(struct config_group *group,
  const char *name)
 {
-   unsigned long flags;
struct netconsole_target *nt;
 
nt = alloc_netconsole_target();
@@ -612,9 +610,9 @@ static struct config_item *make_netconsole_target(struct 
config_group *group,
config_item_init_type_name(&nt->item, name, &netconsole_target_type);
 
/* Adding, but it is disabled */
-   spin_lock_irqsave(&target_list_lock, flags);
+   console_lock();
list_add(&nt->list, &target_list);
-   spin_unlock_irqrestore(&target_list_lock, flags);
+   console_unlock();
 
return &nt->item;
 }
@@ -622,12 +620,11 @@ static struct config_item *make_netconsole_target(struct 
config_group *group,
 static void drop_netconsole_target(struct config_group *group,
   struct config_item *item)
 {
-   unsigned long flags;
struct netconsole_target *nt = to_target(item);
 
-   spin_lock_irqsave(&target_list_lock, flags);
+   console_lock();
list_del(&nt->list);
-   spin_unlock_irqrestore(&target_list_lock, flags);
+   console_unlock();
 
/*
 * The target may have never been enabled, or was manually disabled
@@ -664,11 +661,10 @@ static struct configfs_subsystem netconsole_subsys = {
 static void netconsole_deferred_disable_work_fn(struct work_struct *work)
 {
struct netconsole_target *nt, *to_disable;
-   unsigned long flags;
 
 repeat:
to_disable = NULL;
-   spin_lock_irqsave(&target_list_lock, flags);
+   console_lock();
list_for_each_entry(nt, &target_list, list) {
if (!nt->disable_scheduled)
continue;
@@ -682,7 +678,7 @@ repeat:
to_disable = nt;
break;
}
-   spin_unlock_irqrestore(&target_list_lock, flags);
+   console_unlock();
 
if (to_disable) {
netpoll_cleanup(&to_disable->np);
@@ -698,7 +694,6 @@ static DECLARE_WORK(netconsole_deferred_disable_work,
 static int netconsole_netdev_event(struct notifier_block *this,
   unsigned long event, void *ptr)
 {
-   unsigned long flags;
struct netconsole_target *nt;
struct net_device *dev = netdev_notifier_info_to_dev(ptr);
bool stopped = false;
@@ -707,7 +702,

[PATCHSET] printk, netconsole: implement reliable netconsole

2015-04-16 Thread Tejun Heo
In a lot of configurations, netconsole is a useful way to collect
system logs; however, all netconsole does is simply emitting UDP
packets for the raw messages and there's no way for the receiver to
find out whether the packets were lost and/or reordered in flight.

printk already keeps log metadata which contains enough information to
make netconsole reliable.  This patchset does the followings.

* Make printk metadata available to console drivers.  A console driver
  can request this mode by setting CON_EXTENDED.  The metadata is
  emitted in the same format as /dev/kmsg.  This also makes all
  logging metadata including facility, loglevel and dictionary
  available to console receivers.

* Implement extended mode support in netconsole.  When enabled,
  netconsole transmits messages with extended header which is enough
  for the receiver to detect missing messages.

* Implement netconsole retransmission support.  Matching rx socket on
  the source port is automatically created for extended targets and
  the log receiver can request retransmission by sending reponse
  packets.  This is completely decoupled from the main write path and
  doesn't make netconsole less robust when things start go south.

* Implement netconsole ack support.  The response packet can
  optionally contain ack which enables emergency transmission timer.
  If acked sequence lags the current sequence for over 10s, netconsole
  repeatedly re-sends unacked messages with increasing interval.  This
  ensures that the receiver has the latest messages and also that all
  messages are transferred even while the kernel is failing as long as
  timer and netpoll are operational.  This too is completely decoupled
  from the main write path and doesn't make netconsole less robust.

* Implement the receiver library and simple receiver using it
  respectively in tools/lib/netconsole/libncrx.a and tools/ncrx/ncrx.
  In a simulated test with heavy packet loss (50%), ncrx logs all
  messages reliably and handle exceptional conditions including
  reboots as expected.

An obvious alternative for reliable loggin would be using a separate
TCP connection in addition to the UDP packets; however, I decided for
UDP based retransmission and ack mechanism for the following reasons.

* kernel side doesn't get simpler by using TCP.  It'd still need to
  transmit extended format messages, which BTW are useful regardless
  of reliable transmission, to match up UDP and TCP messages and
  detect missing ones from TCP send buffer filling up.  Also, the
  timeout and emergency transmission support would still be necessary
  to ensure that messages are transmitted in case of, e.g., network
  stack faiure.  It'd at least be about the same amount of code as the
  UDP based implementation.

* Receiver side might be a bit simpler but not by much.  It'd still
  need to keep track of the UDP based messages and then match them up
  with TCP messages and put messages from both sources in order (each
  stream may miss different ones) and would have to deal with
  reestablishing connections after reboots.  The only part which can
  completely go away would be the actual ack and retransmission part
  and that isn't a lot of logic.

* When the network condition is good, the only thing the UDP based
  implementation adds is occassional ack messages.  TCP based
  implementation would end up transmitting all messages twice which
  still isn't much but kinda silly given that using TCP doesn't lower
  the complexity in meaningful ways.

This patchset contains the following 16 patches.

 0001-printk-guard-the-amount-written-per-line-by-devkmsg_.patch
 0002-printk-factor-out-message-formatting-from-devkmsg_re.patch
 0003-printk-move-LOG_NOCONS-skipping-into-call_console_dr.patch
 0004-printk-implement-support-for-extended-console-driver.patch
 0005-printk-implement-log_seq_range-and-ext_log_from_seq.patch
 0006-netconsole-make-netconsole_target-enabled-a-bool.patch
 0007-netconsole-factor-out-alloc_netconsole_target.patch
 0008-netconsole-punt-disabling-to-workqueue-from-netdevic.patch
 0009-netconsole-replace-target_list_lock-with-console_loc.patch
 0010-netconsole-introduce-netconsole_mutex.patch
 0011-netconsole-consolidate-enable-disable-and-create-des.patch
 0012-netconsole-implement-extended-console-support.patch
 0013-netconsole-implement-retransmission-support-for-exte.patch
 0014-netconsole-implement-ack-handling-and-emergency-tran.patch
 0015-netconsole-implement-netconsole-receiver-library.patch
 0016-netconsole-update-documentation-for-extended-netcons.patch

0001-0005 implement extended console support in printk.

0006-0011 are prep patches for netconsole.

0012-0014 implement extended mode, retransmission and ack support.

0015 implements receiver library, libncrx, and a simple receiver using
the library, ncrx.

0016 updates documentation.

As the patchset touches both printk and netconsole, I'm not sure how
these patches should be routed once acked.  Either -mm or net sho

[PATCH 02/16] printk: factor out message formatting from devkmsg_read()

2015-04-16 Thread Tejun Heo
The extended message formatting used for /dev/kmsg will be used
implement extended consoles.  Factor out msg_print_ext_header() and
msg_print_ext_body() from devkmsg_read().

This is pure restructuring.

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Petr Mladek 
---
 kernel/printk/printk.c | 157 ++---
 1 file changed, 85 insertions(+), 72 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index b6e24af..5ea6709 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -505,6 +505,86 @@ int check_syslog_permissions(int type, bool from_file)
return security_syslog(type);
 }
 
+static void append_char(char **pp, char *e, char c)
+{
+   if (*pp < e)
+   *(*pp)++ = c;
+}
+
+static ssize_t msg_print_ext_header(char *buf, size_t size,
+   struct printk_log *msg, u64 seq,
+   enum log_flags prev_flags)
+{
+   u64 ts_usec = msg->ts_nsec;
+   char cont = '-';
+
+   do_div(ts_usec, 1000);
+
+   /*
+* If we couldn't merge continuation line fragments during the print,
+* export the stored flags to allow an optional external merge of the
+* records. Merging the records isn't always neccessarily correct, like
+* when we hit a race during printing. In most cases though, it produces
+* better readable output. 'c' in the record flags mark the first
+* fragment of a line, '+' the following.
+*/
+   if (msg->flags & LOG_CONT && !(prev_flags & LOG_CONT))
+   cont = 'c';
+   else if ((msg->flags & LOG_CONT) ||
+((prev_flags & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
+   cont = '+';
+
+   return scnprintf(buf, size, "%u,%llu,%llu,%c;",
+  (msg->facility << 3) | msg->level, seq, ts_usec, cont);
+}
+
+static ssize_t msg_print_ext_body(char *buf, size_t size,
+ char *dict, size_t dict_len,
+ char *text, size_t text_len)
+{
+   char *p = buf, *e = buf + size;
+   size_t i;
+
+   /* escape non-printable characters */
+   for (i = 0; i < text_len; i++) {
+   unsigned char c = text[i];
+
+   if (c < ' ' || c >= 127 || c == '\\')
+   p += scnprintf(p, e - p, "\\x%02x", c);
+   else
+   append_char(&p, e, c);
+   }
+   append_char(&p, e, '\n');
+
+   if (dict_len) {
+   bool line = true;
+
+   for (i = 0; i < dict_len; i++) {
+   unsigned char c = dict[i];
+
+   if (line) {
+   append_char(&p, e, ' ');
+   line = false;
+   }
+
+   if (c == '\0') {
+   append_char(&p, e, '\n');
+   line = true;
+   continue;
+   }
+
+   if (c < ' ' || c >= 127 || c == '\\') {
+   p += scnprintf(p, e - p, "\\x%02x", c);
+   continue;
+   }
+
+   append_char(&p, e, c);
+   }
+   append_char(&p, e, '\n');
+   }
+
+   return p - buf;
+}
 
 /* /dev/kmsg - userspace message inject/listen interface */
 struct devkmsg_user {
@@ -565,30 +645,17 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct 
iov_iter *from)
return ret;
 }
 
-static void append_char(char **pp, char *e, char c)
-{
-   if (*pp < e)
-   *(*pp)++ = c;
-}
-
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
 {
struct devkmsg_user *user = file->private_data;
struct printk_log *msg;
-   char *p, *e;
-   u64 ts_usec;
-   size_t i;
-   char cont = '-';
size_t len;
ssize_t ret;
 
if (!user)
return -EBADF;
 
-   p = user->buf;
-   e = user->buf + sizeof(user->buf);
-
ret = mutex_lock_interruptible(&user->lock);
if (ret)
return ret;
@@ -618,71 +685,17 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
}
 
msg = log_from_idx(user->idx);
-   ts_usec = msg->ts_nsec;
-   do_div(ts_usec, 1000);
-
-   /*
-* If we couldn't merge continuation line fragments during the print,
-* export the stored flags to allow an optional external merge of the
-* records. Merging the records isn't always neccessarily correct, like
-* when we hit a race during printing. In most cases though, it produces
-* better readable output. 'c' in the record flags mark the first
-* fragment of a line, '+' the following.
-*/
-   if (msg->flags & LOG_CON

[PATCH 01/16] printk: guard the amount written per line by devkmsg_read()

2015-04-16 Thread Tejun Heo
devkmsg_read() uses 8k buffer and assumes that the formatted output
message won't overrun which seems safe given LOG_LINE_MAX, the current
use of dict and the escaping method being used; however, we're
planning to use devkmsg formatting wider and accounting for the buffer
size properly isn't that complicated.

This patch defines CONSOLE_EXT_LOG_MAX as 8192 and updates
devkmsg_read() so that it limits output accordingly.

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Petr Mladek 
---
 include/linux/printk.h |  2 ++
 kernel/printk/printk.c | 35 +++
 2 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/include/linux/printk.h b/include/linux/printk.h
index 9b30871..58b1fec 100644
--- a/include/linux/printk.h
+++ b/include/linux/printk.h
@@ -30,6 +30,8 @@ static inline const char *printk_skip_level(const char 
*buffer)
return buffer;
 }
 
+#define CONSOLE_EXT_LOG_MAX8192
+
 /* printk's without a loglevel use this.. */
 #define MESSAGE_LOGLEVEL_DEFAULT CONFIG_MESSAGE_LOGLEVEL_DEFAULT
 
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 879edfc..b6e24af 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -512,7 +512,7 @@ struct devkmsg_user {
u32 idx;
enum log_flags prev;
struct mutex lock;
-   char buf[8192];
+   char buf[CONSOLE_EXT_LOG_MAX];
 };
 
 static ssize_t devkmsg_write(struct kiocb *iocb, struct iov_iter *from)
@@ -565,11 +565,18 @@ static ssize_t devkmsg_write(struct kiocb *iocb, struct 
iov_iter *from)
return ret;
 }
 
+static void append_char(char **pp, char *e, char c)
+{
+   if (*pp < e)
+   *(*pp)++ = c;
+}
+
 static ssize_t devkmsg_read(struct file *file, char __user *buf,
size_t count, loff_t *ppos)
 {
struct devkmsg_user *user = file->private_data;
struct printk_log *msg;
+   char *p, *e;
u64 ts_usec;
size_t i;
char cont = '-';
@@ -579,6 +586,9 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
if (!user)
return -EBADF;
 
+   p = user->buf;
+   e = user->buf + sizeof(user->buf);
+
ret = mutex_lock_interruptible(&user->lock);
if (ret)
return ret;
@@ -625,9 +635,9 @@ static ssize_t devkmsg_read(struct file *file, char __user 
*buf,
 ((user->prev & LOG_CONT) && !(msg->flags & LOG_PREFIX)))
cont = '+';
 
-   len = sprintf(user->buf, "%u,%llu,%llu,%c;",
- (msg->facility << 3) | msg->level,
- user->seq, ts_usec, cont);
+   p += scnprintf(p, e - p, "%u,%llu,%llu,%c;",
+  (msg->facility << 3) | msg->level,
+  user->seq, ts_usec, cont);
user->prev = msg->flags;
 
/* escape non-printable characters */
@@ -635,11 +645,11 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
unsigned char c = log_text(msg)[i];
 
if (c < ' ' || c >= 127 || c == '\\')
-   len += sprintf(user->buf + len, "\\x%02x", c);
+   p += scnprintf(p, e - p, "\\x%02x", c);
else
-   user->buf[len++] = c;
+   append_char(&p, e, c);
}
-   user->buf[len++] = '\n';
+   append_char(&p, e, '\n');
 
if (msg->dict_len) {
bool line = true;
@@ -648,30 +658,31 @@ static ssize_t devkmsg_read(struct file *file, char 
__user *buf,
unsigned char c = log_dict(msg)[i];
 
if (line) {
-   user->buf[len++] = ' ';
+   append_char(&p, e, ' ');
line = false;
}
 
if (c == '\0') {
-   user->buf[len++] = '\n';
+   append_char(&p, e, '\n');
line = true;
continue;
}
 
if (c < ' ' || c >= 127 || c == '\\') {
-   len += sprintf(user->buf + len, "\\x%02x", c);
+   p += scnprintf(p, e - p, "\\x%02x", c);
continue;
}
 
-   user->buf[len++] = c;
+   append_char(&p, e, c);
}
-   user->buf[len++] = '\n';
+   append_char(&p, e, '\n');
}
 
user->idx = log_next(user->idx);
user->seq++;
raw_spin_unlock_irq(&logbuf_lock);
 
+   len = p - user->buf;
if (len > count) {
ret = -EINVAL;
goto out;
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kern

[PATCH 03/16] printk: move LOG_NOCONS skipping into call_console_drivers()

2015-04-16 Thread Tejun Heo
When a line is printed by multiple printk invocations, each chunk is
directly sent out to console drivers so that they don't get lost.
When the line is completed and stored in the log buffer, the line is
suppressed from going out to consoles as that'd lead to duplicate
outputs.  This is tracked with LOG_NOCONS flag.

The suppression is currently implemented in console_unlock() which
skips invoking call_console_drivers() for LOG_NOCONS messages.  This
patch moves the filtering into call_console_drivers() in preparation
of the planned extended console drivers which will deal with the
duplicate messages themselves.

While this makes call_console_drivers() iterate over LOG_NOCONS
messages, this is extremely unlikely to matter especially given that
continuation lines aren't that common and also simplifies
console_unlock() a bit.

Signed-off-by: Tejun Heo 
Cc: Kay Sievers 
Cc: Petr Mladek 
---
 kernel/printk/printk.c | 46 --
 1 file changed, 24 insertions(+), 22 deletions(-)

diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 5ea6709..0175c46 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1417,7 +1417,8 @@ SYSCALL_DEFINE3(syslog, int, type, char __user *, buf, 
int, len)
  * log_buf[start] to log_buf[end - 1].
  * The console_lock must be held.
  */
-static void call_console_drivers(int level, const char *text, size_t len)
+static void call_console_drivers(int level, bool nocons,
+const char *text, size_t len)
 {
struct console *con;
 
@@ -1438,6 +1439,13 @@ static void call_console_drivers(int level, const char 
*text, size_t len)
if (!cpu_online(smp_processor_id()) &&
!(con->flags & CON_ANYTIME))
continue;
+   /*
+* Skip record we have buffered and already printed
+* directly to the console when we received it.
+*/
+   if (nocons)
+   continue;
+
con->write(con, text, len);
}
 }
@@ -1919,7 +1927,8 @@ static struct cont {
 } cont;
 static struct printk_log *log_from_idx(u32 idx) { return NULL; }
 static u32 log_next(u32 idx) { return 0; }
-static void call_console_drivers(int level, const char *text, size_t len) {}
+static void call_console_drivers(int level, bool nocons,
+const char *text, size_t len) {}
 static size_t msg_print_text(const struct printk_log *msg, enum log_flags prev,
 bool syslog, char *buf, size_t size) { return 0; }
 static size_t cont_print_text(char *text, size_t size) { return 0; }
@@ -2190,7 +2199,7 @@ static void console_cont_flush(char *text, size_t size)
len = cont_print_text(text, size);
raw_spin_unlock(&logbuf_lock);
stop_critical_timings();
-   call_console_drivers(cont.level, text, len);
+   call_console_drivers(cont.level, false, text, len);
start_critical_timings();
local_irq_restore(flags);
return;
@@ -2234,6 +2243,7 @@ again:
struct printk_log *msg;
size_t len;
int level;
+   bool nocons;
 
raw_spin_lock_irqsave(&logbuf_lock, flags);
if (seen_seq != log_next_seq) {
@@ -2252,38 +2262,30 @@ again:
} else {
len = 0;
}
-skip:
+
if (console_seq == log_next_seq)
break;
 
msg = log_from_idx(console_idx);
-   if (msg->flags & LOG_NOCONS) {
-   /*
-* Skip record we have buffered and already printed
-* directly to the console when we received it.
-*/
-   console_idx = log_next(console_idx);
-   console_seq++;
-   /*
-* We will get here again when we register a new
-* CON_PRINTBUFFER console. Clear the flag so we
-* will properly dump everything later.
-*/
-   msg->flags &= ~LOG_NOCONS;
-   console_prev = msg->flags;
-   goto skip;
-   }
-
level = msg->level;
+   nocons = msg->flags & LOG_NOCONS;
len += msg_print_text(msg, console_prev, false,
  text + len, sizeof(text) - len);
console_idx = log_next(console_idx);
console_seq++;
console_prev = msg->flags;
+
+   /*
+* The log will be processed again when we register a new
+* CON_PRINTBUFFER console. Clear the flag so we will
+* properly dump everything later.
+*/
+   msg->flags &= ~LOG_NOCONS;
+

Re: [PATCH 2/2] net: dsa: register hwmon for any provided function

2015-04-16 Thread Vivien Didelot
Hi Guenter,

> >  switch (index) {
> > +case 0: /* temp1_input */
> > +if (drv->get_temp)
> > +mode |= S_IRUGO;
> 
> This should be mandatory. Sorry, I don't really understand what you are
> trying to accomplish here.
> 
> Can you give me a real world example where a chip would support setting
> a limit but not reading it ?

I have no such example. I just did not see why this couldn't be allowed
(e.g. setting only set_temp_limit and get_temp_alarm looks fine to me).
But if you say that get_temp should be mandatory, I'm OK with that.

The primary goal of this patchset was to use DEVICE_ATTR_RW to declare
temp1_max, instead of reflecting the minimal permissions needed.

Best,
-v
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/1] altera tse: Fix network-delays and -retransmissions after high throughput.

2015-04-16 Thread Andreas Oetken
From: Andreas Oetken 

Fix bug which occurs when more than  packets are available during
napi-poll, leading to "delays" and retransmissions on the network.

Check for (count < limit) before checking the get_rx_status in tse_rx-function.
Function get_rx_status is reading from the response-fifo.
If there is currently a response in the fifo,
reading the last byte of the response pops the value from the fifo.
If the limit is checked as second condition
and the limit is reached the fifo is popped but the packet is not processed.

Signed-off-by: Andreas Oetken 
---
 drivers/net/ethernet/altera/altera_tse_main.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/altera/altera_tse_main.c 
b/drivers/net/ethernet/altera/altera_tse_main.c
index 6725dc0..dbbbd34 100644
--- a/drivers/net/ethernet/altera/altera_tse_main.c
+++ b/drivers/net/ethernet/altera/altera_tse_main.c
@@ -376,8 +376,13 @@ static int tse_rx(struct altera_tse_private *priv, int 
limit)
u16 pktlength;
u16 pktstatus;
 
-   while (((rxstatus = priv->dmaops->get_rx_status(priv)) != 0) &&
-  (count < limit))  {
+   /* Check for count < limit first as get_rx_status is changing
+   * the response-fifo so we must process the next packet
+   * after calling get_rx_status if a response is pending.
+   * (reading the last byte of the response pops the value from the fifo.)
+   */
+   while ((count < limit) &&
+  ((rxstatus = priv->dmaops->get_rx_status(priv)) != 0)) {
pktstatus = rxstatus >> 16;
pktlength = rxstatus & 0x;
 
-- 
2.1.4

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] net: dsa: register hwmon for any provided function

2015-04-16 Thread Guenter Roeck
On Thu, Apr 16, 2015 at 02:38:19PM -0400, Vivien Didelot wrote:
> A switch driver may only provide one of the temperature limit accessors,
> or the temperature alarm getter. So register the hwmon subsystem if any
> of the related functions is provided.
> 
> Thus, check get_temp to set the visibility of temp1_input.
> 
> Signed-off-by: Vivien Didelot 
> ---
>  net/dsa/dsa.c | 7 ++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 67d2983..6b68994 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -157,6 +157,10 @@ static umode_t dsa_hwmon_attrs_visible(struct kobject 
> *kobj,
>   umode_t mode = 0;
>  
>   switch (index) {
> + case 0: /* temp1_input */
> + if (drv->get_temp)
> + mode |= S_IRUGO;

This should be mandatory. Sorry, I don't really understand what you are
trying to accomplish here.

Can you give me a real world example where a chip would support setting
a limit but not reading it ?

Thanks,
Guenter

> + break;
>   case 1: /* temp1_max */
>   if (drv->get_temp_limit)
>   mode |= S_IRUGO;
> @@ -310,7 +314,8 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
> struct device *parent)
>* register with hardware monitoring subsystem.
>* Treat registration error as non-fatal and ignore it.
>*/
> - if (drv->get_temp) {
> + if (drv->get_temp || drv->get_temp_limit || drv->set_temp_limit ||
> + drv->get_temp_alarm) {
>   const char *netname = netdev_name(dst->master_netdev);
>   char hname[IFNAMSIZ + 1];
>   int i, j;
> -- 
> 2.3.5
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] net: dsa: use DEVICE_ATTR_RW to declare temp1_max

2015-04-16 Thread Guenter Roeck
On Thu, Apr 16, 2015 at 02:38:18PM -0400, Vivien Didelot wrote:
> Since commit da4759c, sysfs will only use the permissions returned by
> is_visible, instead of OR'ing them with the default file mode.
> 
> This allows us to declare temp1_max with the DEVICE_ATTR_RW macro and
> just return the desired permissions for the hwmon sysfs attributes in
> dsa_hwmon_attrs_visible.
> 
> Also, allow temp1_max to be write-only if set_temp_limit is provided,
> but not get_temp_limit.
> 
Hi Vivien,

This would be a first for the entire hwmon subsystem and doesn't really
make sense.

Guenter

> Signed-off-by: Vivien Didelot 
> ---
>  net/dsa/dsa.c | 24 
>  1 file changed, 16 insertions(+), 8 deletions(-)
> 
> diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
> index 5eaadab..67d2983 100644
> --- a/net/dsa/dsa.c
> +++ b/net/dsa/dsa.c
> @@ -124,7 +124,7 @@ static ssize_t temp1_max_store(struct device *dev,
>  
>   return count;
>  }
> -static DEVICE_ATTR(temp1_max, S_IRUGO, temp1_max_show, temp1_max_store);
> +static DEVICE_ATTR_RW(temp1_max);
>  
>  static ssize_t temp1_max_alarm_show(struct device *dev,
>   struct device_attribute *attr, char *buf)
> @@ -154,16 +154,24 @@ static umode_t dsa_hwmon_attrs_visible(struct kobject 
> *kobj,
>   struct device *dev = container_of(kobj, struct device, kobj);
>   struct dsa_switch *ds = dev_get_drvdata(dev);
>   struct dsa_switch_driver *drv = ds->drv;
> - umode_t mode = attr->mode;
> + umode_t mode = 0;
>  
> - if (index == 1) {
> - if (!drv->get_temp_limit)
> - mode = 0;
> - else if (drv->set_temp_limit)
> + switch (index) {
> + case 1: /* temp1_max */
> + if (drv->get_temp_limit)
> + mode |= S_IRUGO;
> + if (drv->set_temp_limit)
>   mode |= S_IWUSR;
> - } else if (index == 2 && !drv->get_temp_alarm) {
> - mode = 0;
> + break;
> + case 2: /* temp1_max_alarm */
> + if (drv->get_temp_alarm)
> + mode |= S_IRUGO;
> + break;
> + default:
> + mode = attr->mode;
> + break;
>   }
> +
>   return mode;
>  }
>  
> -- 
> 2.3.5
> 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] ethtool: return 1 as exit code on a settings(-s) failure.

2015-04-16 Thread Ben Hutchings
On Thu, 2015-04-16 at 13:55 -0700, Sridhar Samudrala wrote:
> Currently 0 is returned on both success or failure.

Previously discussed here:
.

Ben.

> Signed-off-by: Sridhar Samudrala 
> ---
>  ethtool.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/ethtool.c b/ethtool.c
> index 01b13a6..163dff2 100644
> --- a/ethtool.c
> +++ b/ethtool.c
> @@ -2352,7 +2352,7 @@ static int do_sset(struct cmd_context *ctx)
>   int argc = ctx->argc;
>   char **argp = ctx->argp;
>   int i;
> - int err;
> + int err = 0;
>  
>   for (i = 0; i < ARRAY_SIZE(flags_msglvl); i++)
>   flag_to_cmdline_info(flags_msglvl[i].name,
> @@ -2665,7 +2665,7 @@ static int do_sset(struct cmd_context *ctx)
>   }
>   }
>  
> - return 0;
> + return err ? 1 : 0;
>  }
>  
>  static int do_gregs(struct cmd_context *ctx)

-- 
Ben Hutchings
Humour is the best antidote to reality.


signature.asc
Description: This is a digitally signed message part


Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Patrick McHardy
On 17.04, Herbert Xu wrote:
> On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote:
> >
> > So currently we have one fast path, that is: we are not fragmented, we
> > get out non-fragmented, none of this code is ever touched and no
> > problem.
> > 
> > We don't want to mak this more complex, but
> 
> You should read Dave's other email where he gives you an obvious
> solution.  If you have to modify the skb then you don't have to
> worry about the original fragments.
> 
> But if you only read the skb then don't linearise it completely
> and keep the original fragments.

Yes, I was just responding to that. We need an additional member
in the skb, but then this part is quite simple.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Patrick McHardy
On 16.04, David Miller wrote:
> > Netfilter may change the contents of the packet, even change its size.
> > It is *really* hard to do this while keeping the original fragments
> > intact.
> 
> I keep hearing a lot of "it's hard" as the only reason we shouldn't do
> this properly, and that frankly sucks.  People aren't looking for a
> solution and to be honest it's quite tiring.
> 
> The common case is that the rules processed are simple, the size of
> the overall packet does _not_ change, and therefore the best thing
> to do is pass the entire thing as a unit with the frags in tact.
> 
> That's the fundamental fact.  It's also the fastest way to process
> these packets and avoids all of these stupid max frag garbage.
> 
> Only at the point where netfilter makes changes to the size of the
> packet does it take "ownership" and get to take on the responsibility
> of making sure the new resulting fragments are sane.
> 
> But only at that point.

Agreed, that part shouldn't be hard. We need to pass the defragmented
skb through the ruleset, meaning we need to pass it through the stack.
That's needed since the rules depend on this.

If we don't make changes, we can spit out the original fragments, but
for this we need to keep a reference to them from the skb. We still
need the max_frag_size thing, once a modification is made we drop the
frag list reference and just regulary refragment the modified skb
according to the limits.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] ethtool: return 1 as exit code on a settings(-s) failure.

2015-04-16 Thread Sridhar Samudrala
Currently 0 is returned on both success or failure.

Signed-off-by: Sridhar Samudrala 
---
 ethtool.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ethtool.c b/ethtool.c
index 01b13a6..163dff2 100644
--- a/ethtool.c
+++ b/ethtool.c
@@ -2352,7 +2352,7 @@ static int do_sset(struct cmd_context *ctx)
int argc = ctx->argc;
char **argp = ctx->argp;
int i;
-   int err;
+   int err = 0;
 
for (i = 0; i < ARRAY_SIZE(flags_msglvl); i++)
flag_to_cmdline_info(flags_msglvl[i].name,
@@ -2665,7 +2665,7 @@ static int do_sset(struct cmd_context *ctx)
}
}
 
-   return 0;
+   return err ? 1 : 0;
 }
 
 static int do_gregs(struct cmd_context *ctx)
-- 
1.8.4.2

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Herbert Xu
On Thu, Apr 16, 2015 at 06:13:25PM +0200, Hannes Frederic Sowa wrote:
>
> So currently we have one fast path, that is: we are not fragmented, we
> get out non-fragmented, none of this code is ever touched and no
> problem.
> 
> We don't want to mak this more complex, but

You should read Dave's other email where he gives you an obvious
solution.  If you have to modify the skb then you don't have to
worry about the original fragments.

But if you only read the skb then don't linearise it completely
and keep the original fragments.

Cheers,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Bluetooth: Pre-initialize variables in read_local_oob_ext_data_complete()

2015-04-16 Thread Marcel Holtmann
Hi Geert,

> net/bluetooth/mgmt.c: In function ‘read_local_oob_ext_data_complete’:
> net/bluetooth/mgmt.c:6474: warning: ‘r256’ may be used uninitialized in this 
> function
> net/bluetooth/mgmt.c:6474: warning: ‘h256’ may be used uninitialized in this 
> function
> net/bluetooth/mgmt.c:6474: warning: ‘r192’ may be used uninitialized in this 
> function
> net/bluetooth/mgmt.c:6474: warning: ‘h192’ may be used uninitialized in this 
> function
> 
> While these are false positives, the code can be shortened by
> pre-initializing the hash table pointers and eir_len. This has the side
> effect of killing the compiler warnings.

can you be a bit specific on which compiler version is this. I fixed one 
occurrence that seemed valid. However in this case the compiler seems to be 
just plain stupid. On a gcc 4.9, I am not seeing these for example.

Regards

Marcel

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Patrick McHardy
Am 16. April 2015 17:43:23 MESZ, schrieb David Miller :
>From: Hannes Frederic Sowa 
>Date: Thu, 16 Apr 2015 14:11:42 +0200
>
>> On Thu, Apr 16, 2015, at 07:29, Herbert Xu wrote:
>>> On Thu, Apr 16, 2015 at 06:24:00AM +0100, Patrick McHardy wrote:
>>> >
>>> > Netfilter may change the contents of the packet, even change its
>size.
>>> > It is *really* hard to do this while keeping the original
>fragments
>>> > intact.
>>> 
>>> Perhaps we should provide better helpers to facilitate this?
>>> 
>>> So instead of directly manipulating the content of the skb you
>>> would so so through helpers and the helpers can then try to do
>>> sensible things with the fragments.
>> 
>> When Florian and me started discussing how to solve this we also
>wanted
>> to be as transparent as possible. But looking at all possible
>> fragmentation scenarios, this seems to be too complicated.
>> 
>> Even imagine a fragment with overlapping offsets and some of the
>> fragments got duplicated. If we had to keep this in frag_list and now
>> netfilter has to change any of this contents, this will become a
>total
>> mess, like changing one port in multiple skbs at different offsets.
>> 
>> I doubt it is worth the effort.
>
>You guys keep talking about exceptional cases rather than what is
>unquestionable the common case, and the one worth handling in the
>fast paths.

True. But its not like we haven't tried. What I keep hearing is lots of well 
meaning opinions, and the fact is, I've looked into this countless times, you 
can find my first attempt when googling from 11 years ago, and the resolution 
was always - it's not worth it. It's not "too hard", I wouldn't mind. And I 
fail to see the problem with that. We're not talking about a functional defect, 
it's something people (me included) think is "not nice".

I will try to recollect the problems the different approaches have in detail. 
I'm pretty sure you will come to the same conclusion as I have.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] net: dsa: mv88e6xxx: Add missing initialization in mv88e6xxx_set_port_state()

2015-04-16 Thread David Miller
From: Geert Uytterhoeven 
Date: Thu, 16 Apr 2015 20:49:14 +0200

> drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_set_port_state’:
> drivers/net/dsa/mv88e6xxx.c:905: warning: ‘ret’ may be used uninitialized in 
> this function
> 
> If oldstate == state, mv88e6xxx_set_port_state() will return an
> uninitialized value. Pre-initialize ret to zero to fix this.
> 
> Signed-off-by: Geert Uytterhoeven 

Indeed, applied, thanks Geert.
N‹§²ζμrΈ›yϊθšΨb²X¬ΆΗ§vΨ^–)ήΊ{.nΗ+‰·§zΧ^Ύ)ν…
ζθw*jg¬±¨Ά‰šŽŠέ’j/κδzΉή–Šΰ2Šή™¨θ­Ϊ&’)ί‘«aΆΪώψ�G«ιh�ζj:+v‰¨Šwθ†Ω₯

[PATCH] Bluetooth: Pre-initialize variables in read_local_oob_ext_data_complete()

2015-04-16 Thread Geert Uytterhoeven
net/bluetooth/mgmt.c: In function ‘read_local_oob_ext_data_complete’:
net/bluetooth/mgmt.c:6474: warning: ‘r256’ may be used uninitialized in this 
function
net/bluetooth/mgmt.c:6474: warning: ‘h256’ may be used uninitialized in this 
function
net/bluetooth/mgmt.c:6474: warning: ‘r192’ may be used uninitialized in this 
function
net/bluetooth/mgmt.c:6474: warning: ‘h192’ may be used uninitialized in this 
function

While these are false positives, the code can be shortened by
pre-initializing the hash table pointers and eir_len. This has the side
effect of killing the compiler warnings.

Signed-off-by: Geert Uytterhoeven 
---
 net/bluetooth/mgmt.c | 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/net/bluetooth/mgmt.c b/net/bluetooth/mgmt.c
index 7fd87e7135b52753..f8e13b6d58279463 100644
--- a/net/bluetooth/mgmt.c
+++ b/net/bluetooth/mgmt.c
@@ -6471,9 +6471,9 @@ static void read_local_oob_ext_data_complete(struct 
hci_dev *hdev, u8 status,
 {
const struct mgmt_cp_read_local_oob_ext_data *mgmt_cp;
struct mgmt_rp_read_local_oob_ext_data *mgmt_rp;
-   u8 *h192, *r192, *h256, *r256;
+   u8 *h192 = NULL, *r192 = NULL, *h256 = NULL, *r256 = NULL;
struct mgmt_pending_cmd *cmd;
-   u16 eir_len;
+   u16 eir_len = 0;
int err;
 
BT_DBG("%s status %u", hdev->name, status);
@@ -6486,18 +6486,11 @@ static void read_local_oob_ext_data_complete(struct 
hci_dev *hdev, u8 status,
 
if (status) {
status = mgmt_status(status);
-   eir_len = 0;
-
-   h192 = NULL;
-   r192 = NULL;
-   h256 = NULL;
-   r256 = NULL;
} else if (opcode == HCI_OP_READ_LOCAL_OOB_DATA) {
struct hci_rp_read_local_oob_data *rp;
 
if (skb->len != sizeof(*rp)) {
status = MGMT_STATUS_FAILED;
-   eir_len = 0;
} else {
status = MGMT_STATUS_SUCCESS;
rp = (void *)skb->data;
@@ -6505,23 +6498,18 @@ static void read_local_oob_ext_data_complete(struct 
hci_dev *hdev, u8 status,
eir_len = 5 + 18 + 18;
h192 = rp->hash;
r192 = rp->rand;
-   h256 = NULL;
-   r256 = NULL;
}
} else {
struct hci_rp_read_local_oob_ext_data *rp;
 
if (skb->len != sizeof(*rp)) {
status = MGMT_STATUS_FAILED;
-   eir_len = 0;
} else {
status = MGMT_STATUS_SUCCESS;
rp = (void *)skb->data;
 
if (hci_dev_test_flag(hdev, HCI_SC_ONLY)) {
eir_len = 5 + 18 + 18;
-   h192 = NULL;
-   r192 = NULL;
} else {
eir_len = 5 + 18 + 18 + 18 + 18;
h192 = rp->hash192;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH iproute2 -next v2] tc: built-in eBPF exec proxy

2015-04-16 Thread Daniel Borkmann
This work follows upon commit 6256f8c9e45f ("tc, bpf: finalize eBPF
support for cls and act front-end") and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds
generated eBPF map file descriptors.

File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:

  # env | grep BPF
  BPF_NUM_MAPS=3
  BPF_MAP1=6<- BPF_MAP_ID_QUEUE (id 1)
  BPF_MAP0=5<- BPF_MAP_ID_PROTO (id 0)
  BPF_MAP2=7<- BPF_MAP_ID_DROPS (id 2)

  # ls -la /proc/self/fd
  [...]
  lrwx--. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
  lrwx--. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
  lrwx--. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
  [...]
  lrwx--. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
  lrwx--. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
  lrwx--. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map

The advantage (as opposed to the direct/native usage) is that now the
shell is map fd owner and applications can terminate and easily reattach
to descriptors w/o any kernel changes. Moreover, multiple applications
can easily read/write eBPF maps simultaneously.

To further allow users for experimenting with that, next step is to add
a small helper that can get along with simple data types, so that also
shell scripts can make use of bpf syscall, f.e to read/write into maps.

Generally, this allows for prepopulating maps, or any runtime altering
which could influence eBPF program behaviour (f.e. different run-time
classifications, skb modifications, ...), dumping of statistics, etc.

Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa 
Signed-off-by: Daniel Borkmann 
Reviewed-by: Hannes Frederic Sowa 
Acked-by: Alexei Starovoitov 
---
 ( Stephen, this applies to current net-next branch of iproute2. )
 v1 -> v2:
  - s/bpf_map_set_env/bpf_map_get_from_env/ as suggested by Alexei
  - Rest as is, also kept reviewer acks

 examples/bpf/bpf_agent.c |  55 ++
 examples/bpf/bpf_prog.c  |  27 +
 tc/Makefile  |   7 ++-
 tc/e_bpf.c   | 147 +++
 tc/f_bpf.c   |   2 +-
 tc/m_action.c|   2 +-
 tc/m_bpf.c   |   2 +-
 tc/tc.c  |   9 +--
 tc/tc_bpf.c  |  98 ---
 tc/tc_bpf.h  |  15 -
 tc/tc_common.h   |   3 +
 tc/tc_exec.c | 109 +++
 tc/tc_filter.c   |   2 +-
 tc/tc_util.h |  16 --
 14 files changed, 454 insertions(+), 40 deletions(-)
 create mode 100644 tc/e_bpf.c
 create mode 100644 tc/tc_exec.c

diff --git a/examples/bpf/bpf_agent.c b/examples/bpf/bpf_agent.c
index 0f481b1..426b880 100644
--- a/examples/bpf/bpf_agent.c
+++ b/examples/bpf/bpf_agent.c
@@ -24,6 +24,8 @@
  *   -- Happy eBPF hacking! ;)
  */
 
+#define _GNU_SOURCE
+
 #include 
 #include 
 #include 
@@ -31,6 +33,7 @@
 #include 
 #include 
 #include 
+
 #include 
 #include 
 #include 
@@ -102,6 +105,23 @@ static void bpf_dump_proto(int fd)
printf("\n");
 }
 
+static void bpf_dump_map_data(int *tfd)
+{
+   int i;
+
+   for (i = 0; i < 30; i++) {
+   const int period = 5;
+
+   printf("data, period: %dsec\n", period);
+
+   bpf_dump_drops(tfd[BPF_MAP_ID_DROPS]);
+   bpf_dump_queue(tfd[BPF_MAP_ID_QUEUE]);
+   bpf_dump_proto(tfd[BPF_MAP_ID_PROTO]);
+
+   sleep(period);
+   }
+}
+
 static void bpf_info_loop(int *fds, struct bpf_map_aux *aux)
 {
int i, tfd[BPF_MAP_ID_MAX];
@@ -122,16 +142,22 @@ static void bpf_info_loop(int *fds, struct bpf_map_aux 
*aux)
tfd[aux->ent[i].id] = fds[i];
}
 
-   for (i = 0; i < 30; i++) {
-   int period = 5;
+   bpf_dump_map_data(tfd);
+}
 
-   printf("data, period: %dsec\n", period);
+static void bpf_map_get_from_env(int *tfd)
+{
+   char key[64], *val;
+   int i;
 
-   bpf_dump_drops(tfd[BPF_MAP_ID_DROPS]);
-   bpf_dump_queue(tfd[BPF_MAP_ID_QUEUE]);
-   bpf_dump_proto(tfd[BPF_MAP_ID_PROTO]);
+   for (i = 0; i < BPF_MAP_ID_MAX; i++) {
+   memset(key, 0, sizeof(key));
+   snprintf(key, sizeof(key), "BPF_MAP%d", i);
 
-   sleep(period);
+   val = secure_getenv(key);
+   assert(val != NULL);
+
+   tfd[i] = atoi(val);
}
 }
 
@@ -186,9 +212,17 @@ int main(int argc, char **argv)
struct sockaddr_un addr;
int fd, ret, i;
 
-   if (argc < 2) {
-  

Re: [PATCH 1/2] net: dsa: use DEVICE_ATTR_RW to declare temp1_max

2015-04-16 Thread Vivien Didelot
Hello Sergei,

> > Since commit da4759c, sysfs will only use the permissions returned by
> 
> Please also specify that commit's summary line in parens.

Duly noted.

da4759c is "sysfs: Use only return value from is_visible for the file mode"
(see: 
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=da4759c)

Best,
-v
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Or Gerlitz
On Thu, Apr 16, 2015, Yann Droneaud  wrote:
> Hi,
>
> Le jeudi 16 avril 2015 à 16:34 +0300, Erez Shitrit a écrit :
>> Currently, iflink of the parent interface was always accessed, even
>> when interface didn't have a parent and hence we crashed there.
>
> + since commit 5aa7add8f14b ('infiniband/ipoib: implement
> ndo_get_iflink').
>
> as there was no crash here before AFAIK.

Disagree, it's redundant, as the Fixes tag below points to this commit


>> Handle the interface types properly: for a child interface, return
>> the ifindex of the parent, for parent interface, return its ifindex.
>>
>> For child devices, make sure to set the parent pointer prior to
>> invoking register_netdevice(), this allows the new ndo to be called
>> by the stack immediately after the child device is registered.
>>
>> Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')
>
> Cc: Nicolas Dichtel 

Ditto, I find it enough to just copy Nicholas on the patch submission,
as we did here.

Or.

>> Reported-by: Honggang Li 
>> Signed-off-by: Erez Shitrit 
>> Signed-off-by: Honggang Li 
>> ---
>>
>> changes from V0:
>>  - fixed two typos in the change-log
>>
>>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
>>  drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
>> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
>> index 657b89b..915ad04 100644
>> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
>> @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device 
>> *dev)
>>  {
>>   struct ipoib_dev_priv *priv = netdev_priv(dev);
>>
>> + /* parent interface */
>> + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
>> + return dev->ifindex;
>> +
>> + /* child/vlan interface */
>>   return priv->parent->ifindex;
>>  }
>>
>> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
>> b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
>> index 4dd1313..fca1a88 100644
>> --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
>> +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
>> @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
>> ipoib_dev_priv *priv,
>>   /* MTU will be reset when mcast join happens */
>>   priv->dev->mtu   = IPOIB_UD_MTU(priv->max_ib_mtu);
>>   priv->mcast_mtu  = priv->admin_mtu = priv->dev->mtu;
>> + priv->parent = ppriv->dev;
>>   set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags);
>>
>>   result = ipoib_set_dev_features(priv, ppriv->ca);
>> @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
>> ipoib_dev_priv *priv,
>>   goto register_failed;
>>   }
>>
>> - priv->parent = ppriv->dev;
>> -
>>   ipoib_create_debug_files(priv->dev);
>>
>>   /* RTNL childs don't need proprietary sysfs entries */
>
> Regards.
>
> --
> Yann Droneaud
> OPTEYA
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] net: dsa: use DEVICE_ATTR_RW to declare temp1_max

2015-04-16 Thread Sergei Shtylyov

Hello.

On 04/16/2015 09:38 PM, Vivien Didelot wrote:


Since commit da4759c, sysfs will only use the permissions returned by


   Please also specify that commit's summary line in parens.


is_visible, instead of OR'ing them with the default file mode.



This allows us to declare temp1_max with the DEVICE_ATTR_RW macro and
just return the desired permissions for the hwmon sysfs attributes in
dsa_hwmon_attrs_visible.



Also, allow temp1_max to be write-only if set_temp_limit is provided,
but not get_temp_limit.



Signed-off-by: Vivien Didelot 


WBR, Sergei

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: ioremap_uc() followed by set_memory_wc() - burrying MTRR

2015-04-16 Thread Luis R. Rodriguez
On Thu, Apr 16, 2015 at 01:18:37PM +0900, Hyong-Youb Kim wrote:
> On Thu, Apr 16, 2015 at 01:58:16AM +0200, Luis R. Rodriguez wrote:
> > 
> > An alternative... is to just ioremap_wc() the entire region, including
> > MMIO registers for these old devices. I see one ethernet driver that does
> > this, myri10ge, and am curious how and why they ended up deciding this
> > and if they have run into any issues. I wonder if this is a reasonable
> > comrpomise for these 2 remaining corner cases.
> > 
> 
> For myri10ge, it a performance thing.  Descriptor rings are in NIC
> memory BAR0, not in host memory.  Say, to send a packet, the driver
> writes the send descriptor to the ioremap'd NIC memory.  It is a
> multi-word descriptor.  So, to send it as one PCIE MWr transaction,
> the driver maps the whole BAR0 as WC and does "copy descriptor; wmb".

Interesting, so you burst write multi-word descriptor writes using
write-combining here for the Ethernet device.

> Without WC, descriptors would end up as multiple 4B or 8B MWr packets
> to the NIC, which has a pretty big performance impact on this
> particular NIC.

How big are the descriptors?

> Most registers that do not want WC are actually in BAR2, which is not
> mapped as WC.  For registers that are in BAR0, we do "write to the
> register; wmb".  If we want to wait till the NIC has seen the write,
> we do "write; wmb; read".

Interesting, thanks, yeah using this as a work around to the problem sounds
plausible however it still would require likely making just as many changes to
the ivtv and ipath driver as to just do a proper split. I do wonder however if
this sort of work around can be generalized somehow though so that others could
use, if this sort of thing is going to become prevalent. If so then this would
serve two purposes: work around for the corner cases of MTRR use on Linux and
also these sorts of device constraints.

In order to determine if this is likely to be generally useful could you 
elaborate
a bit more about the detals of the performance issues of not bursting writes
for the descriptor on this device.

Even if that is done a conversion over to this work around seems it may require
device specific nitpicks. For instance I note in myri10ge_submit_req() for
small writes you just do a reverse write and do the first set last, then
finally the last 32 bits are rewritten, I guess to trigger something?

> This approach has worked for this device for many years.  I cannot say
> whether it works for other devices, though.

I think it should but the more interesting question would be exactly
*why* it was needed for this device, who determined that, and why?

  Luis
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread Geert Uytterhoeven
On Thu, Apr 16, 2015 at 3:46 PM, Guenter Roeck  wrote:
> On 04/15/2015 10:12 PM, Guenter Roeck wrote:
>>
>> Return correct error code if _mv88e6xxx_reg_read returns an error.
>>
>> Fixes: facd95b2e0ec0 ("net: dsa: mv88e6xxx: Add Hardware bridging
>> support")
>> Signed-off-by: Guenter Roeck 
>
>
> I should have given proper credit.
>
> Reported-by: kbuild test robot 
>
> For the curious, neither W=1, W=2, C-1, C=2, nor sparse report this problem,
> at least not with gcc 4.9.1.

Good old gcc 4.1.2 (which I still use for m68k builds, and won't retire anytime
soon as it finds real bugs in every new kernel release) says:

drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_set_port_state’:
drivers/net/dsa/mv88e6xxx.c:905: warning: ‘ret’ may be used
uninitialized in this function

Even after your patch, as it missed one case (patch sent).

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- ge...@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] net: dsa: mv88e6xxx: Add missing initialization in mv88e6xxx_set_port_state()

2015-04-16 Thread Geert Uytterhoeven
drivers/net/dsa/mv88e6xxx.c: In function ‘mv88e6xxx_set_port_state’:
drivers/net/dsa/mv88e6xxx.c:905: warning: ‘ret’ may be used uninitialized in 
this function

If oldstate == state, mv88e6xxx_set_port_state() will return an
uninitialized value. Pre-initialize ret to zero to fix this.

Signed-off-by: Geert Uytterhoeven 
---
 drivers/net/dsa/mv88e6xxx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index f64186a5f63453f3..859a332fe5cd0251 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -902,7 +902,7 @@ static int _mv88e6xxx_flush_fid(struct dsa_switch *ds, int 
fid)
 static int mv88e6xxx_set_port_state(struct dsa_switch *ds, int port, u8 state)
 {
struct mv88e6xxx_priv_state *ps = ds_to_priv(ds);
-   int reg, ret;
+   int reg, ret = 0;
u8 oldstate;
 
mutex_lock(&ps->smi_mutex);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] net: dsa: register hwmon for any provided function

2015-04-16 Thread Vivien Didelot
A switch driver may only provide one of the temperature limit accessors,
or the temperature alarm getter. So register the hwmon subsystem if any
of the related functions is provided.

Thus, check get_temp to set the visibility of temp1_input.

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 67d2983..6b68994 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -157,6 +157,10 @@ static umode_t dsa_hwmon_attrs_visible(struct kobject 
*kobj,
umode_t mode = 0;
 
switch (index) {
+   case 0: /* temp1_input */
+   if (drv->get_temp)
+   mode |= S_IRUGO;
+   break;
case 1: /* temp1_max */
if (drv->get_temp_limit)
mode |= S_IRUGO;
@@ -310,7 +314,8 @@ static int dsa_switch_setup_one(struct dsa_switch *ds, 
struct device *parent)
 * register with hardware monitoring subsystem.
 * Treat registration error as non-fatal and ignore it.
 */
-   if (drv->get_temp) {
+   if (drv->get_temp || drv->get_temp_limit || drv->set_temp_limit ||
+   drv->get_temp_alarm) {
const char *netname = netdev_name(dst->master_netdev);
char hname[IFNAMSIZ + 1];
int i, j;
-- 
2.3.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] net: dsa: use DEVICE_ATTR_RW to declare temp1_max

2015-04-16 Thread Vivien Didelot
Since commit da4759c, sysfs will only use the permissions returned by
is_visible, instead of OR'ing them with the default file mode.

This allows us to declare temp1_max with the DEVICE_ATTR_RW macro and
just return the desired permissions for the hwmon sysfs attributes in
dsa_hwmon_attrs_visible.

Also, allow temp1_max to be write-only if set_temp_limit is provided,
but not get_temp_limit.

Signed-off-by: Vivien Didelot 
---
 net/dsa/dsa.c | 24 
 1 file changed, 16 insertions(+), 8 deletions(-)

diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index 5eaadab..67d2983 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -124,7 +124,7 @@ static ssize_t temp1_max_store(struct device *dev,
 
return count;
 }
-static DEVICE_ATTR(temp1_max, S_IRUGO, temp1_max_show, temp1_max_store);
+static DEVICE_ATTR_RW(temp1_max);
 
 static ssize_t temp1_max_alarm_show(struct device *dev,
struct device_attribute *attr, char *buf)
@@ -154,16 +154,24 @@ static umode_t dsa_hwmon_attrs_visible(struct kobject 
*kobj,
struct device *dev = container_of(kobj, struct device, kobj);
struct dsa_switch *ds = dev_get_drvdata(dev);
struct dsa_switch_driver *drv = ds->drv;
-   umode_t mode = attr->mode;
+   umode_t mode = 0;
 
-   if (index == 1) {
-   if (!drv->get_temp_limit)
-   mode = 0;
-   else if (drv->set_temp_limit)
+   switch (index) {
+   case 1: /* temp1_max */
+   if (drv->get_temp_limit)
+   mode |= S_IRUGO;
+   if (drv->set_temp_limit)
mode |= S_IWUSR;
-   } else if (index == 2 && !drv->get_temp_alarm) {
-   mode = 0;
+   break;
+   case 2: /* temp1_max_alarm */
+   if (drv->get_temp_alarm)
+   mode |= S_IRUGO;
+   break;
+   default:
+   mode = attr->mode;
+   break;
}
+
return mode;
 }
 
-- 
2.3.5

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Revert "net: Reset secmark when scrubbing packet"

2015-04-16 Thread David Miller
From: Herbert Xu 
Date: Thu, 16 Apr 2015 16:12:53 +0800

> On Thu, Apr 16, 2015 at 05:02:15PM +1000, James Morris wrote:
>> 
>> They don't support namespaces, and maintaining the label is critical for 
>> SELinux, at least, which mediates security for the system as a whole.
> 
> Thanks for the confirmation James, I thought this looked a bit
> dodgy :)
> 
> ---8<---
> This patch reverts commit b8fb4e0648a2ab3734140342002f68fb0c7d1602
> because the secmark must be preserved even when a packet crosses
> namespace boundaries.  The reason is that security labels apply to
> the system as a whole and is not per-namespace.
> 
> Signed-off-by: Herbert Xu 

Applied and queued up for -stable.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [v3] skbuff: Do not scrub skb mark within the same name space

2015-04-16 Thread David Miller
From: Thomas Graf 
Date: Thu, 16 Apr 2015 09:33:35 +0100

> On 04/16/15 at 09:03am, Herbert Xu wrote:
>> The commit ea23192e8e577dfc51e0f4fc5ca113af334edff9 ("tunnels:
>> harmonize cleanup done on skb on rx path") broke anyone trying to
>> use netfilter marking across IPv4 tunnels.  While most of the
>> fields that are cleared by skb_scrub_packet don't matter, the
>> netfilter mark must be preserved.
>> 
>> This patch rearranges skb_scrub_packet to preserve the mark field.
>> 
>> Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path")
>> Signed-off-by: Herbert Xu 
> 
> Acked-by: Thomas Graf 
> 
> We should also add a flag to veth which expclitly allows to preserve
> the mark into the namespace.

Applied and queued up for -stable, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 1/1] stmmac: fix oops on rmmod after assigning ip addr

2015-04-16 Thread David Miller
From: Bryan O'Donoghue 
Date: Thu, 16 Apr 2015 17:56:03 +0100

> An oops exists in the flow of stmmac_release().
> phy_ethtool_get_wol() depends on phydev->drv.
> phydev->drv will be null after stmmac_mdio_unreg() completes.
> 
> Steps to reproduce on Quark X1000:
> 
> 1. ifconfig eth0 192.168.0.1
> 2. rmmod stmmac_pci
> 
> To fix this stmmac_mdio_unreg() should be run after unregister_netdev().
> 
> Signed-off-by: Bryan O'Donoghue 
> Reported-by: Dan O'Donovan 

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-16 Thread David Miller
From: Michael Chan 
Date: Thu, 16 Apr 2015 09:39:13 -0700

> On Thu, 2015-04-16 at 09:24 -0300, casca...@linux.vnet.ibm.com wrote: 
>> Yes, this looks like the driver is not syncing the DMA buffers. Unmap is
>> supposed to synchronize as well.
>> 
> 
> For small rx packets (< 256 bytes), we sync the DMA buffer before we
> copy the data to another SKB.  For larger packets, we unmap the DMA
> buffer.  Do we see the corruption in both cases?

I wonder about that prefetch which is done before the DMA sync.

Also we should think about whether that DMA sync applies to the proper
region.  The 'len' is relative to "data+TG3_RX_OFFSET" yet if I read
the code correctly we are effectively sync'ing from 'data' to
'data+len'.

There is some bug hiding in here I think...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] bpf: fix two bugs in verification logic when accessing 'ctx' pointer

2015-04-16 Thread David Miller
From: Alexei Starovoitov 
Date: Wed, 15 Apr 2015 16:19:33 -0700

> 1.
> first bug is a silly mistake. It broke tracing examples and prevented
> simple bpf programs from loading.
 ...
> 2.
> second bug is more subtle.
> If malicious code is using the same dest register as source register,
> the checks designed to prevent the same instruction to be used with different
> pointer types will fail to trigger, since we were assigning src_reg_type
> when it was already overwritten by check_mem_access().
> The fix is trivial. Just move line:
> src_reg_type = regs[insn->src_reg].type;
> before check_mem_access().
> Add new 'access skb fields bad4' test to check this case.
> 
> Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields")
> Signed-off-by: Alexei Starovoitov 

Applied, thank you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net 0/5] stmmac: Correct flow control configuration

2015-04-16 Thread David Miller
From: Vince Bridgers 
Date: Wed, 15 Apr 2015 11:17:37 -0500

> This series of patches corrects flow control configuration for the Synopsys
> GMAC driver.
> 
> Flow control is configured based on a configurable receive fifo size. If
> less than 4Kbytes flow control is left disabled and a warning is presented. If
> a receive fifo size is not specified, flow control is left disabled to
> maintain current behavior. Unicast pause detection was disabled, but is now 
> enabled. The pause time was modified to be maximum time per a XON/XOFF
> flow control mode of operation.
> 
> This patch was tested on an Altera Cyclone 5 and an Altera Arria 10 devkit,
> and verified that flow control operates as expected when enabled.
> 
> Please consider this series for inclusion so that flow control will
> function as expected for the Synopsys GMAC controller. 

Series applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] bpf: fix bpf helpers to use skb->mac_header relative offsets

2015-04-16 Thread David Miller
From: Alexei Starovoitov 
Date: Wed, 15 Apr 2015 12:55:45 -0700

> For the short-term solution, lets fix bpf helper functions to use
> skb->mac_header relative offsets instead of skb->data in order to
> get the same eBPF programs with cls_bpf and act_bpf work on ingress
> and egress qdisc path. We need to ensure that mac_header is set
> before calling into programs. This is effectively the first option
> from below referenced discussion.
> 
> More long term solution for LD_ABS|LD_IND instructions will be more
> intrusive but also more beneficial than this, and implemented later
> as it's too risky at this point in time.
> 
> I.e., we plan to look into the option of moving skb_pull() out of
> eth_type_trans() and into netif_receive_skb() as has been suggested
> as second option. Meanwhile, this solution ensures ingress can be
> used with eBPF, too, and that we won't run into ABI troubles later.
> For dealing with negative offsets inside eBPF helper functions,
> we've implemented bpf_skb_clone_unwritable() to test for unwriteable
> headers.
> 
> Reference: http://thread.gmane.org/gmane.linux.network/359129/focus=359694
> Fixes: 608cd71a9c7c ("tc: bpf: generalize pedit action")
> Fixes: 91bc4822c3d6 ("tc: bpf: add checksum helpers")
> Signed-off-by: Alexei Starovoitov 
> Signed-off-by: Daniel Borkmann 

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH iproute2 -next] tc: built-in eBPF exec proxy

2015-04-16 Thread Daniel Borkmann

On 04/16/2015 07:48 PM, Alexei Starovoitov wrote:

On 4/15/15 7:52 AM, Daniel Borkmann wrote:


File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:


Amazing that it worked.
Acked-by: Alexei Starovoitov 


+static void bpf_map_set_env(int *tfd)
+{
+char key[64], *val;
+int i;
+for (i = 0; i < BPF_MAP_ID_MAX; i++) {
+memset(key, 0, sizeof(key));
+snprintf(key, sizeof(key), "BPF_MAP%d", i);

+val = secure_getenv(key);
+assert(val != NULL);


everything looks good. My only nit is that the name of the function
reads as this function is setting env vars, whereas it's actually
reading them. I guess in your mind it fits with the rest of
'bpf_map_set_*' functions, but the name is still confusing.


Ok, since it's example code, I'll find a better function name for it,
keep yours and Hannes' tags and resubmit. Thanks!
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH iproute2 -next] tc: built-in eBPF exec proxy

2015-04-16 Thread Alexei Starovoitov

On 4/15/15 7:52 AM, Daniel Borkmann wrote:


File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:


Amazing that it worked.
Acked-by: Alexei Starovoitov 


+static void bpf_map_set_env(int *tfd)
+{
+   char key[64], *val;
+   int i;
+   for (i = 0; i < BPF_MAP_ID_MAX; i++) {
+   memset(key, 0, sizeof(key));
+   snprintf(key, sizeof(key), "BPF_MAP%d", i);

+   val = secure_getenv(key);
+   assert(val != NULL);


everything looks good. My only nit is that the name of the function
reads as this function is setting env vars, whereas it's actually
reading them. I guess in your mind it fits with the rest of
'bpf_map_set_*' functions, but the name is still confusing.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-16 Thread Ian Jackson
Michael Chan writes ("Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more 
messages]"):
> On Thu, 2015-04-16 at 09:24 -0300, casca...@linux.vnet.ibm.com wrote: 
> > Yes, this looks like the driver is not syncing the DMA buffers. Unmap is
> > supposed to synchronize as well.
> 
> For small rx packets (< 256 bytes), we sync the DMA buffer before we
> copy the data to another SKB.  For larger packets, we unmap the DMA
> buffer.  Do we see the corruption in both cases?

Yes, at least with swiotlb=force iommu=soft.

Ian.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Yann Droneaud
Hi,

Le jeudi 16 avril 2015 à 16:34 +0300, Erez Shitrit a écrit :
> Currently, iflink of the parent interface was always accessed, even 
> when interface didn't have a parent and hence we crashed there.

+ since commit 5aa7add8f14b ('infiniband/ipoib: implement
ndo_get_iflink').

as there was no crash here before AFAIK.

> 
> Handle the interface types properly: for a child interface, return
> the ifindex of the parent, for parent interface, return its ifindex.
> 
> For child devices, make sure to set the parent pointer prior to
> invoking register_netdevice(), this allows the new ndo to be called
> by the stack immediately after the child device is registered.
> 
> Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')

Cc: Nicolas Dichtel 

> Reported-by: Honggang Li 
> Signed-off-by: Erez Shitrit 
> Signed-off-by: Honggang Li 
> ---
> 
> changes from V0:
>  - fixed two typos in the change-log
> 
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
>  drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 657b89b..915ad04 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev)
>  {
>   struct ipoib_dev_priv *priv = netdev_priv(dev);
>  
> + /* parent interface */
> + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
> + return dev->ifindex;
> +
> + /* child/vlan interface */
>   return priv->parent->ifindex;
>  }
>  
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
> index 4dd1313..fca1a88 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
> @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
> ipoib_dev_priv *priv,
>   /* MTU will be reset when mcast join happens */
>   priv->dev->mtu   = IPOIB_UD_MTU(priv->max_ib_mtu);
>   priv->mcast_mtu  = priv->admin_mtu = priv->dev->mtu;
> + priv->parent = ppriv->dev;
>   set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags);
>  
>   result = ipoib_set_dev_features(priv, ppriv->ca);
> @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
> ipoib_dev_priv *priv,
>   goto register_failed;
>   }
>  
> - priv->parent = ppriv->dev;
> -
>   ipoib_create_debug_files(priv->dev);
>  
>   /* RTNL childs don't need proprietary sysfs entries */

Regards.

-- 
Yann Droneaud
OPTEYA


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 1/1] stmmac: fix oops on rmmod after assigning ip addr

2015-04-16 Thread Bryan O'Donoghue
An oops exists in the flow of stmmac_release().
phy_ethtool_get_wol() depends on phydev->drv.
phydev->drv will be null after stmmac_mdio_unreg() completes.

Steps to reproduce on Quark X1000:

1. ifconfig eth0 192.168.0.1
2. rmmod stmmac_pci

To fix this stmmac_mdio_unreg() should be run after unregister_netdev().

Signed-off-by: Bryan O'Donoghue 
Reported-by: Dan O'Donovan 
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c 
b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 06103ca..6065173 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -2970,15 +2970,15 @@ int stmmac_dvr_remove(struct net_device *ndev)
priv->hw->dma->stop_tx(priv->ioaddr);
 
stmmac_set_mac(priv->ioaddr, false);
-   if (priv->pcs != STMMAC_PCS_RGMII && priv->pcs != STMMAC_PCS_TBI &&
-   priv->pcs != STMMAC_PCS_RTBI)
-   stmmac_mdio_unregister(ndev);
netif_carrier_off(ndev);
unregister_netdev(ndev);
if (priv->stmmac_rst)
reset_control_assert(priv->stmmac_rst);
clk_disable_unprepare(priv->pclk);
clk_disable_unprepare(priv->stmmac_clk);
+   if (priv->pcs != STMMAC_PCS_RGMII && priv->pcs != STMMAC_PCS_TBI &&
+   priv->pcs != STMMAC_PCS_RTBI)
+   stmmac_mdio_unregister(ndev);
free_netdev(ndev);
 
return 0;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 0/1] stmmac: rmmod oops after ifconfig

2015-04-16 Thread Bryan O'Donoghue
We have an oops with stmmac on Quark X1000/Galileo, triggered by rmmod after
ifconfig. Fix for issue contained in next mail against
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git


root@clanton:~# ifconfig eth0 192.168.0.1
root@clanton:~# rmmod stmmac_pci
[   39.257871] stmmac_dvr_remove:
removing driver
[   39.263618] BUG: unable to handle kernel NULL pointer dereference at
0060
[   39.271030] IP: [] phy_ethtool_get_wol+0x2/0x20
[   39.272391] *pdpt = 0e682001 *pde =  
[   39.272391] Oops:  [#1] 
[   39.272391] Modules linked in: evdev usbhid btusb bluetooth iwlwifi
cfg80211 rfkill ad7298 industrialio_triggered_buffer kfifo_buf industrialio
spi_s
[   39.272391] CPU: 0 PID: 1245 Comm: rmmod Not tainted 4.0.0 #26
[   39.272391] Hardware name: Intel Corp. QUARK/Galileo, BIOS 0x0350
01/01/2014
[   39.272391] task: ce52a9e0 ti: cea64000 task.ti: cea64000
[   39.272391] EIP: 0060:[] EFLAGS: 00010292 CPU: 0
[   39.272391] EIP is at phy_ethtool_get_wol+0x2/0x20
[   39.272391] EAX: ce59e400 EBX: ce59e400 ECX:  EDX: cea65ddc
[   39.272391] ESI:  EDI: cea65e80 EBP: cea65df8 ESP: cea65dd8
[   39.272391]  DS: 007b ES: 007b FS:  GS: 0033 SS: 0068
[   39.272391] CR0: 8005003b CR2: 0060 CR3: 0e641000 CR4: 00100030
[   39.272391] Stack:
[   39.272391]  c1255908 0005    
ce59e400 ce85b4e0
[   39.272391]  cea65e04 c125597c ce59e400 cea65e10 c12559cf ce85b000
cea65e24 e07a9dd8
[   39.272391]  cea65e38 ce85b000 cea65e80 cea65e44 c12a7639 cea0cc60
cdd4a180 c009ad00
[   39.272391] Call Trace:
[   39.272391]  [] ? phy_suspend+0x38/0x70
[   39.272391]  [] phy_detach+0x3c/0x60
[   39.272391]  [] phy_disconnect+0x2f/0x40
[   39.272391]  [] stmmac_release+0x38/0x150 [stmmac]
[   39.272391]  [] __dev_close_many+0x69/0xb0
[   39.272391]  [] dev_close_many+0x5d/0xd0
[   39.272391]  [] rollback_registered_many+0xda/0x240
[   39.272391]  [] rollback_registered+0x1f/0x30
[   39.272391]  [] unregister_netdevice_queue+0x47/0xb0
[   39.272391]  [] ? mutex_lock+0xb/0x19
[   39.272391]  [] unregister_netdev+0x14/0x20
[   39.272391]  [] stmmac_dvr_remove+0x6d/0xb0 [stmmac]
[   39.272391]  [] stmmac_pci_remove+0xe/0x10 [stmmac_pci]
[   39.272391]  [] pci_device_remove+0x28/0xa0
[   39.272391]  [] __device_release_driver+0x5a/0xb0
[   39.272391]  [] driver_detach+0x6f/0x80
[   39.272391]  [] bus_remove_driver+0x3b/0x80
[   39.272391]  [] driver_unregister+0x23/0x60
[   39.272391]  [] ? find_module_all+0x5a/0x80
[   39.272391]  [] pci_unregister_driver+0xf/0x50
[   39.272391]  [] stmmac_pci_driver_exit+0xd/0xc3a [stmmac_pci]
[   39.272391]  [] SyS_delete_module+0x15c/0x1b0
[   39.272391]  [] syscall_call+0x7/0x7
[   39.272391] Code: 42 12 31 c0 c3 8d 74 26 00 8b 08 8b 49 5c 85 c9 74 07
55 89 e5 ff d1 5d c3 b8 a1 ff ff ff c3 8d 76 00 8d bc 27 00 00 00 00 8b 08
<0
[   39.272391] EIP: [] phy_ethtool_get_wol+0x2/0x20 SS:ESP
0068:cea65dd8
[   39.272391] CR2: 0060
[   39.551363] ---[ end trace 98d1260353bf4fec ]---
Killed

Bryan O'Donoghue (1):
  stmmac: fix oops on rmmod after assigning ip addr

 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

-- 
1.9.1
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: tg3 NIC driver bug in 3.14.x under Xen [and 3 more messages]

2015-04-16 Thread Michael Chan
On Thu, 2015-04-16 at 09:24 -0300, casca...@linux.vnet.ibm.com wrote: 
> Yes, this looks like the driver is not syncing the DMA buffers. Unmap is
> supposed to synchronize as well.
> 

For small rx packets (< 256 bytes), we sync the DMA buffer before we
copy the data to another SKB.  For larger packets, we unmap the DMA
buffer.  Do we see the corruption in both cases?

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/1] stmmac: fix oops on rmmod after assigning ip addr

2015-04-16 Thread David Miller
From: Bryan O'Donoghue 
Date: Thu, 16 Apr 2015 17:20:50 +0100

> I'll spin again against :
> https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/

Currently active tree is 'net', not 'net-next'
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH stable 3.10-3.16] tcp: Fix crash in TCP Fast Open

2015-04-16 Thread Luis Henriques
On Wed, Apr 15, 2015 at 07:00:32PM +0100, Ben Hutchings wrote:
> Commit 355a901e6cf1 ("tcp: make connect() mem charging friendly")
> changed tcp_send_syn_data() to perform an open-coded copy of the 'syn'
> skb rather than using skb_copy_expand().
> 
> The open-coded copy does not cover the skb_shared_info::gso_segs
> field, so in the new skb it is left set to 0.  When this commit was
> backported into stable branches between 3.10.y and 3.16.7-ckty
> inclusive, it triggered the BUG() in tcp_transmit_skb().
> 
> Since Linux 3.18 the GSO segment count is kept in the
> tcp_skb_cb::tcp_gso_segs field and tcp_send_syn_data() does copy the
> tcp_skb_cb structure to the new skb, so mainline and newer stable
> branches are not affected.
> 
> Set skb_shared_info::gso_segs to the correct value of 1.
> 
> Signed-off-by: Ben Hutchings 

Thanks a lot, Ben.  I'll queue this for the next 3.16 kernel release.

Cheers,
--
Luís

> ---
>  net/ipv4/tcp_output.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index d5457e4..1ea0a07 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2992,6 +2992,7 @@ static int tcp_send_syn_data(struct sock *sk, struct 
> sk_buff *syn)
>   goto fallback;
>   syn_data->ip_summed = CHECKSUM_PARTIAL;
>   memcpy(syn_data->cb, syn->cb, sizeof(syn->cb));
> + skb_shinfo(syn_data)->gso_segs = 1;
>   if (unlikely(memcpy_fromiovecend(skb_put(syn_data, space),
>fo->data->msg_iov, 0, space))) {
>   kfree_skb(syn_data);
> 
> -- 
> Ben Hutchings
> Editing code like this is akin to sticking plasters on the bleeding stump
> of a severed limb. - me, 29 June 1999

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/1] stmmac: fix oops on rmmod after assigning ip addr

2015-04-16 Thread Bryan O'Donoghue

On 16/04/15 17:07, David Miller wrote:

This patch does not apply to the current tree, please respin.



/facepalm

Did this against :
https://github.com/torvalds/linux.git

I'll spin again against :
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dsa: mv88e6xxx: Drop duplicate declaration of 'ret' variable

2015-04-16 Thread David Miller
From: Guenter Roeck 
Date: Thu, 16 Apr 2015 06:49:50 -0700

> A duplicate declaration of 'ret' can result in hiding an error code.
> Drop it.
> 
> Fixes: 17ee3e04ddbf ("net: dsa: Provide additional RMON statistics")
> Signed-off-by: Guenter Roeck 

Applied, thank you.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next] ethernet: remove unused including

2015-04-16 Thread David Miller
From: weiyj...@163.com
Date: Thu, 16 Apr 2015 21:06:47 +0800

> From: Wei Yongjun 
> 
> Remove including  that don't need it.
> 
> Signed-off-by: Wei Yongjun 

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next] netns: remove duplicated include from net_namespace.c

2015-04-16 Thread David Miller
From: weiyj...@163.com
Date: Thu, 16 Apr 2015 21:17:35 +0800

> From: Wei Yongjun 
> 
> Remove duplicated include.
> 
> Signed-off-by: Wei Yongjun 

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread Hannes Frederic Sowa
Hi David,

On Thu, Apr 16, 2015, at 17:43, David Miller wrote:
> From: Hannes Frederic Sowa 
> Date: Thu, 16 Apr 2015 14:11:42 +0200
> 
> > On Thu, Apr 16, 2015, at 07:29, Herbert Xu wrote:
> >> On Thu, Apr 16, 2015 at 06:24:00AM +0100, Patrick McHardy wrote:
> >> >
> >> > Netfilter may change the contents of the packet, even change its size.
> >> > It is *really* hard to do this while keeping the original fragments
> >> > intact.
> >> 
> >> Perhaps we should provide better helpers to facilitate this?
> >> 
> >> So instead of directly manipulating the content of the skb you
> >> would so so through helpers and the helpers can then try to do
> >> sensible things with the fragments.
> > 
> > When Florian and me started discussing how to solve this we also wanted
> > to be as transparent as possible. But looking at all possible
> > fragmentation scenarios, this seems to be too complicated.
> > 
> > Even imagine a fragment with overlapping offsets and some of the
> > fragments got duplicated. If we had to keep this in frag_list and now
> > netfilter has to change any of this contents, this will become a total
> > mess, like changing one port in multiple skbs at different offsets.
> > 
> > I doubt it is worth the effort.
> 
> You guys keep talking about exceptional cases rather than what is
> unquestionable the common case, and the one worth handling in the
> fast paths.

So currently we have one fast path, that is: we are not fragmented, we
get out non-fragmented, none of this code is ever touched and no
problem.

We don't want to mak this more complex, but

every other solution would need to first expand sk_buff with a new
member(!), then:
  1) push the original fragments through netfilter - I think this will
  definitely break user space compatibility
  2) add new wrappers into the fast path which now will make sure that
  we access the correct skb from the frag_list instead of just
  dereference the pointer - we will make fast path also more slowly
  and a lot more complex
  3) use the reassembled skb for netfilter and emit the original ones -
  seems like a security hazard to me, discrepancy what is 
  checked and what is used
  4) just use the frag_list skb->len's to restore the reassembled skb to
  the original sizes, this would be possible but we would still harm 
  fast-path with that (new skb member) we could switch to
  dynamically allocated array of int[] to store sizes and DF bit
  information
  In particular, even this would not achieve the goal to have
  perfect transparency.

I can give you a more fundamental problem:
We cannot even be transparent if asymmetric routing is in effect and we
don't see all fragments. In case we have contrack rules active, we clear
fragment queues after some time. So a bridge actually will toss
offset!=0 packets if it never sees the head. If we let the packet pass,
we might have a security problem. We can never become fully transparent.

Bye,
Hannes
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread David Miller
From: Guenter Roeck 
Date: Wed, 15 Apr 2015 22:12:42 -0700

> Return correct error code if _mv88e6xxx_reg_read returns an error.
> 
> Fixes: facd95b2e0ec0 ("net: dsa: mv88e6xxx: Add Hardware bridging support")
> Signed-off-by: Guenter Roeck 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rocker: fix error return code in rocker_probe()

2015-04-16 Thread David Miller
From: weiyj...@163.com
Date: Thu, 16 Apr 2015 20:21:02 +0800

> From: Wei Yongjun 
> 
> Fix to return -EINVAL from the invalid PCI region size error
> handling case instead of 0, as done elsewhere in this function.
> 
> Signed-off-by: Wei Yongjun 

Applied.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Jason Gunthorpe
On Thu, Apr 16, 2015 at 04:34:34PM +0300, Erez Shitrit wrote:
> Currently, iflink of the parent interface was always accessed, even 
> when interface didn't have a parent and hence we crashed there.
> 
> Handle the interface types properly: for a child interface, return
> the ifindex of the parent, for parent interface, return its ifindex.
> 
> For child devices, make sure to set the parent pointer prior to
> invoking register_netdevice(), this allows the new ndo to be called
> by the stack immediately after the child device is registered.
> 
> Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')
> Reported-by: Honggang Li 
> Signed-off-by: Erez Shitrit 
> Signed-off-by: Honggang Li 

Reviewed-By: Jason Gunthorpe +

> changes from V0:
>  - fixed two typos in the change-log
> 
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
>  drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index 657b89b..915ad04 100644
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev)
>  {
>   struct ipoib_dev_priv *priv = netdev_priv(dev);
>  
> + /* parent interface */
> + if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
> + return dev->ifindex;
> +
> + /* child/vlan interface */
>   return priv->parent->ifindex;
>  }
>  
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
> b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
> index 4dd1313..fca1a88 100644
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
> @@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
> ipoib_dev_priv *priv,
>   /* MTU will be reset when mcast join happens */
>   priv->dev->mtu   = IPOIB_UD_MTU(priv->max_ib_mtu);
>   priv->mcast_mtu  = priv->admin_mtu = priv->dev->mtu;
> + priv->parent = ppriv->dev;
>   set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags);
>  
>   result = ipoib_set_dev_features(priv, ppriv->ca);
> @@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
> ipoib_dev_priv *priv,
>   goto register_failed;
>   }
>  
> - priv->parent = ppriv->dev;
> -
>   ipoib_create_debug_files(priv->dev);
>  
>   /* RTNL childs don't need proprietary sysfs entries */
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next] fou: Fix missing unlock on error in fou_nl_dump()

2015-04-16 Thread David Miller
From: weiyj...@163.com
Date: Thu, 16 Apr 2015 20:14:39 +0800

> From: Wei Yongjun 
> 
> Add the missing unlock before return from function fou_nl_dump()
> in the error handling case.
> 
> Fixes: 7a6c8c34e5b7 (fou: implement FOU_CMD_GET)
> Signed-off-by: Wei Yongjun 

Your patch adds a warning because you fail to remove the 'done'
label which is no longer used.

Cong Wang already submitted a patch which does remove the 'done'
field properly, so I will apply his version.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Patch net-next v2] fou: avoid missing unlock in failure path

2015-04-16 Thread David Miller
From: Cong Wang 
Date: Wed, 15 Apr 2015 11:48:49 -0700

> Fixes: 7a6c8c34e5b7 ("fou: implement FOU_CMD_GET")
> Reported-by: Dan Carpenter 
> Cc: Dan Carpenter 
> Signed-off-by: Cong Wang 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/1] stmmac: fix oops on rmmod after assigning ip addr

2015-04-16 Thread David Miller
From: Bryan O'Donoghue 
Date: Wed, 15 Apr 2015 02:07:46 +0100

> An oops exists in the flow of stmmac_release().
> phy_ethtool_get_wol() depends on phydev->drv.
> phydev->drv will be null after stmmac_mdio_unreg() completes.
> 
> Steps to reproduce on Quark X1000:
> 
> 1. ifconfig eth0 192.168.0.1
> 2. rmmod stmmac_pci
> 
> To fix this stmmac_mdio_unreg() should be run after unregister_netdev().
> 
> Signed-off-by: Bryan O'Donoghue 
> Reported-by: Dan O'Donovan 

This patch does not apply to the current tree, please respin.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH net] bpf: fix verifier memory corruption

2015-04-16 Thread David Miller
From: Alexei Starovoitov 
Date: Tue, 14 Apr 2015 15:57:13 -0700

> Due to missing bounds check the DAG pass of the BPF verifier can corrupt
> the memory which can cause random crashes during program loading:
> 
> [8.449451] BUG: unable to handle kernel paging request at 
> [8.451293] IP: [] kmem_cache_alloc_trace+0x8d/0x2f0
> [8.452329] Oops:  [#1] SMP
> [8.452329] Call Trace:
> [8.452329]  [] bpf_check+0x852/0x2000
> [8.452329]  [] bpf_prog_load+0x1e4/0x310
> [8.452329]  [] ? might_fault+0x5f/0xb0
> [8.452329]  [] SyS_bpf+0x806/0xa30
> 
> Fixes: f1bca824dabb ("bpf: add search pruning optimization to verifier")
> Signed-off-by: Alexei Starovoitov 

Applied and queued up for -stable, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 31/39] net: core: pktgen: Remove bogus hrtimer_active() check

2015-04-16 Thread David Miller
From: Thomas Gleixner 
Date: Tue, 14 Apr 2015 21:09:16 -

> The check for hrtimer_active() after starting the timer is
> pointless. If the timer is inactive it has expired already and
> therefor the task pointer is already NULL.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: David S. Miller 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 03/39] net: sched: Use hrtimer_resolution instead of hrtimer_get_res()

2015-04-16 Thread David Miller
From: Thomas Gleixner 
Date: Tue, 14 Apr 2015 21:08:28 -

> No point in converting a timespec now that the value is directly
> accessible.
> 
> Signed-off-by: Thomas Gleixner 

Acked-by: David S. Miller 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch 1/1] cxgb4: drop __GFP_NOFAIL allocation

2015-04-16 Thread David Miller
From: a...@linux-foundation.org
Date: Tue, 14 Apr 2015 13:24:33 -0700

> From: Michal Hocko 
> Subject: cxgb4: drop __GFP_NOFAIL allocation
> 
> set_filter_wr is requesting __GFP_NOFAIL allocation although it can return
> ENOMEM without any problems obviously (t4_l2t_set_switching does that
> already).  So the non-failing requirement is too strong without any
> obvious reason.  Drop __GFP_NOFAIL and reorganize the code to have the
> failure paths easier.
> 
> The same applies to _c4iw_write_mem_dma_aligned which uses __GFP_NOFAIL
> and then checks the return value and returns -ENOMEM on failure.  This
> doesn't make any sense what so ever.  Either the allocation cannot fail or
> it can.
> 
> del_filter_wr seems to be safe as well because the filter entry is not
> marked as pending and the return value is propagated up the stack up to
> c4iw_destroy_listen.
> 
> Signed-off-by: Michal Hocko 
 ...
> Signed-off-by: Andrew Morton 

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] altera tse: Fix network-delays and -retransmissions after high throughput.

2015-04-16 Thread David Miller
From: Andreas Oetken 
Date: Tue, 14 Apr 2015 22:25:26 +0200

> From: Andreas Oetken 
> 
> Fix bug which occurs when more than  packets are available during 
> napi-poll,
> leading to "delays" and retransmissions on the network.
> 
> Check for (count < limit) before checking the get_rx_status in 
> tse_rx-function. get_rx_status is reading from the response-fifo.
> If there is currently a response in the fifo, reading the last byte of the 
> response pops the value from the fifo.
> If the limit is check as second condition and the limit is reached the fifo 
> is popped but the packet is not processed.
> 
> Signed-off-by: Andreas Oetken 

Your commit message lines are very long, format them to 80 columns
or less.

> - while (((rxstatus = priv->dmaops->get_rx_status(priv)) != 0) &&
> -(count < limit))  {
> + /* Check for limit first. get_rx_status is reading the from the
> + * response-fifo. If there is a currently a response in the fifo,
> + * reading the last byte of the response pops the value from the fifo.
> + */
> + while ((count < limit) &&
> + ((rxstatus = priv->dmaops->get_rx_status(priv)) != 0)) {

You're corrupted the indentation.

The second line of this while statement should start precisely at
the first column after the openning parenthesis of the first line.
You must use the appropriate number of TAB then SPACE characters
necessary to achieve this.

If you are purely using TAB characters, as you have done here, for
indentation then it is very likely that you have indented things
incorrectly.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread David Miller
From: Hannes Frederic Sowa 
Date: Thu, 16 Apr 2015 14:11:42 +0200

> On Thu, Apr 16, 2015, at 07:29, Herbert Xu wrote:
>> On Thu, Apr 16, 2015 at 06:24:00AM +0100, Patrick McHardy wrote:
>> >
>> > Netfilter may change the contents of the packet, even change its size.
>> > It is *really* hard to do this while keeping the original fragments
>> > intact.
>> 
>> Perhaps we should provide better helpers to facilitate this?
>> 
>> So instead of directly manipulating the content of the skb you
>> would so so through helpers and the helpers can then try to do
>> sensible things with the fragments.
> 
> When Florian and me started discussing how to solve this we also wanted
> to be as transparent as possible. But looking at all possible
> fragmentation scenarios, this seems to be too complicated.
> 
> Even imagine a fragment with overlapping offsets and some of the
> fragments got duplicated. If we had to keep this in frag_list and now
> netfilter has to change any of this contents, this will become a total
> mess, like changing one port in multiple skbs at different offsets.
> 
> I doubt it is worth the effort.

You guys keep talking about exceptional cases rather than what is
unquestionable the common case, and the one worth handling in the
fast paths.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] netns: deinline net_generic()

2015-04-16 Thread David Miller
From: Denys Vlasenko 
Date: Thu, 16 Apr 2015 13:14:14 +0200

> It would help if you tell me how I should change the patches.

Why ask Eric when I told you exactly how to change the patch to make
it acceptable, so please do so.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next 0/3] net: cap size to original frag size when refragmenting

2015-04-16 Thread David Miller
From: Patrick McHardy 
Date: Thu, 16 Apr 2015 06:24:00 +0100

> On 16.04, Herbert Xu wrote:
>> David Miller  wrote:
>> > 
>> > Because then there is no ambiguity at all, you preserve on output
>> > exactly what you had on input.  The same geometry, the same
>> > everything.  No special checks, no max frag len, none of this crap.
>> > Those are all hacks trying to work around the _fundamental_ issue
>> > which is that we potentially change the thing when we refrag.
>> 
>> Agreed.  Doing anything other than preserving the original geometry
>> is simply wrong.
>> 
>> However, this doesn't mean that netfilter has to process each
>> fragment.  What we could do is to preserve the original fragments
>> in frag_list and then process the overall skb as a unit in netfilter.
>> 
>> On output we simply fragment according to the original frag_list.
>> 
>> The only thing to watch out for is to eliminate anything in the
>> middle that tries to linearise the skb.
> 
> Netfilter may change the contents of the packet, even change its size.
> It is *really* hard to do this while keeping the original fragments
> intact.

I keep hearing a lot of "it's hard" as the only reason we shouldn't do
this properly, and that frankly sucks.  People aren't looking for a
solution and to be honest it's quite tiring.

The common case is that the rules processed are simple, the size of
the overall packet does _not_ change, and therefore the best thing
to do is pass the entire thing as a unit with the frags in tact.

That's the fundamental fact.  It's also the fastest way to process
these packets and avoids all of these stupid max frag garbage.

Only at the point where netfilter makes changes to the size of the
packet does it take "ownership" and get to take on the responsibility
of making sure the new resulting fragments are sane.

But only at that point.

It must not molest the geometry in any other set of circumstances,
and I absolutely will not relent on this.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH -next] netns: remove duplicated include from net_namespace.c

2015-04-16 Thread Nicolas Dichtel

Le 16/04/2015 15:17, weiyj...@163.com a écrit :

From: Wei Yongjun 

Remove duplicated include.

Signed-off-by: Wei Yongjun 

Acked-by: Nicolas Dichtel 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] dsa: mv88e6xxx: Drop duplicate declaration of 'ret' variable

2015-04-16 Thread Guenter Roeck
A duplicate declaration of 'ret' can result in hiding an error code.
Drop it.

Fixes: 17ee3e04ddbf ("net: dsa: Provide additional RMON statistics")
Signed-off-by: Guenter Roeck 
---
Found by compiling the file with W=2.

 drivers/net/dsa/mv88e6xxx.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
index f64186a5f634..12f186cb30cc 100644
--- a/drivers/net/dsa/mv88e6xxx.c
+++ b/drivers/net/dsa/mv88e6xxx.c
@@ -602,8 +602,6 @@ static void _mv88e6xxx_get_ethtool_stats(struct dsa_switch 
*ds,
u32 high = 0;
 
if (s->reg >= 0x100) {
-   int ret;
-
ret = mv88e6xxx_reg_read(ds, REG_PORT(port),
 s->reg - 0x100);
if (ret < 0)
-- 
2.1.0

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread Guenter Roeck

On 04/15/2015 10:12 PM, Guenter Roeck wrote:

Return correct error code if _mv88e6xxx_reg_read returns an error.

Fixes: facd95b2e0ec0 ("net: dsa: mv88e6xxx: Add Hardware bridging support")
Signed-off-by: Guenter Roeck 


I should have given proper credit.

Reported-by: kbuild test robot 

For the curious, neither W=1, W=2, C-1, C=2, nor sparse report this problem,
at least not with gcc 4.9.1.

Guenter

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-16 Thread Tim Deegan
At 12:39 +0100 on 16 Apr (1429187952), George Dunlap wrote:
> Your comment lists three benefits:
> 1. better RTT estimation
> 2. faster recovery
> 3. high rates
> 
> #3 is just marketing fluff; it's also contradicted by the statement that
> immediately follows it -- i.e., there are drivers for which the
> limitation does *not* give high rates.

AFAICT #3 is talking about throughput _under TCP_, where inflating the
RTT will absolutely cause problems.

Tim.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH V1 net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Erez Shitrit
Currently, iflink of the parent interface was always accessed, even 
when interface didn't have a parent and hence we crashed there.

Handle the interface types properly: for a child interface, return
the ifindex of the parent, for parent interface, return its ifindex.

For child devices, make sure to set the parent pointer prior to
invoking register_netdevice(), this allows the new ndo to be called
by the stack immediately after the child device is registered.

Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')
Reported-by: Honggang Li 
Signed-off-by: Erez Shitrit 
Signed-off-by: Honggang Li 
---

changes from V0:
 - fixed two typos in the change-log

 drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 657b89b..915ad04 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
 
+   /* parent interface */
+   if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
+   return dev->ifindex;
+
+   /* child/vlan interface */
return priv->parent->ifindex;
 }
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 4dd1313..fca1a88 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
ipoib_dev_priv *priv,
/* MTU will be reset when mcast join happens */
priv->dev->mtu   = IPOIB_UD_MTU(priv->max_ib_mtu);
priv->mcast_mtu  = priv->admin_mtu = priv->dev->mtu;
+   priv->parent = ppriv->dev;
set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags);
 
result = ipoib_set_dev_features(priv, ppriv->ca);
@@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
ipoib_dev_priv *priv,
goto register_failed;
}
 
-   priv->parent = ppriv->dev;
-
ipoib_create_debug_files(priv->dev);
 
/* RTNL childs don't need proprietary sysfs entries */
-- 
1.7.11.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH net-next] IB/ipoib: Fix ndo_get_iflink

2015-04-16 Thread Erez Shitrit
Currently, iflink of the parent interface was always accessed, event 
when interface didn't have a parent and hence we rashed there.

Handle the interface types properly: for a child interface, return
the ifindex of the parent, for parent interface, return its ifindex.

For child devices, make sure to set the parent pointer prior to
invoking register_netdevice(), this allows the new ndo to be called
by the stack immediately after the child device is registered.

Fixes: 5aa7add8f14b ('infiniband/ipoib: implement ndo_get_iflink')
Reported-by: Honggang Li 
Signed-off-by: Erez Shitrit 
Signed-off-by: Honggang Li 
---
 drivers/infiniband/ulp/ipoib/ipoib_main.c | 5 +
 drivers/infiniband/ulp/ipoib/ipoib_vlan.c | 3 +--
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c 
b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index 657b89b..915ad04 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -846,6 +846,11 @@ static int ipoib_get_iflink(const struct net_device *dev)
 {
struct ipoib_dev_priv *priv = netdev_priv(dev);
 
+   /* parent interface */
+   if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags))
+   return dev->ifindex;
+
+   /* child/vlan interface */
return priv->parent->ifindex;
 }
 
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c 
b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
index 4dd1313..fca1a88 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_vlan.c
@@ -58,6 +58,7 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
ipoib_dev_priv *priv,
/* MTU will be reset when mcast join happens */
priv->dev->mtu   = IPOIB_UD_MTU(priv->max_ib_mtu);
priv->mcast_mtu  = priv->admin_mtu = priv->dev->mtu;
+   priv->parent = ppriv->dev;
set_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags);
 
result = ipoib_set_dev_features(priv, ppriv->ca);
@@ -84,8 +85,6 @@ int __ipoib_vlan_add(struct ipoib_dev_priv *ppriv, struct 
ipoib_dev_priv *priv,
goto register_failed;
}
 
-   priv->parent = ppriv->dev;
-
ipoib_create_debug_files(priv->dev);
 
/* RTNL childs don't need proprietary sysfs entries */
-- 
1.7.11.3

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] hso: fix refcnt leak in recent patch.

2015-04-16 Thread Olivier Sobrie
On Tue, Apr 14, 2015 at 11:03:03AM +1000, NeilBrown wrote:
> On Tue, 14 Apr 2015 09:36:34 +1000 NeilBrown  wrote:
> 
> > 
> > 
> > Prior to
> > commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
> > hso: fix crash when device disappears while serial port is open
> > 
> > hso_serial_open would always kref_get(&serial->parent->ref) before
> > returning zero.
> > Since that commit, it only calls kref_get when returning 0 if
> > serial->port.count was zero.
> > 
> > This results in calls to
> >kref_put(&serial->parent->ref, hso_serial_ref_free);
> > 
> > after hso_serial_ref_free has been called, which dereferences a freed
> > pointer.
> > 
> > This patch adds the missing kref_get().
> > 
> > Fixes: commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
> > Cc: sta...@vger.kernel.org (v4.0)
> > Cc: Olivier Sobrie 
> > Signed-off-by: NeilBrown 
> > 
> > diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
> > index 75befc1bd816..6848fc903340 100644
> > --- a/drivers/net/usb/hso.c
> > +++ b/drivers/net/usb/hso.c
> > @@ -1299,6 +1299,7 @@ static int hso_serial_open(struct tty_struct *tty, 
> > struct file *filp)
> > }
> > } else {
> > D1("Port was already open");
> > +   kref_get(&serial->parent->ref);
> > }
> >  
> > usb_autopm_put_interface(serial->parent->interface);
> 
> 
> Sorry - that was wrong.
> I'm getting crashes which strongly suggest the kref_put is being called extra
> times, but I misunderstood the code and was hasty.
> 
> Maybe this instead?

I tested the patch and it looks fine :-)
Thank you,
Olivier

> 
> Thanks,
> NeilBrown
> 
> From: NeilBrown 
> Date: Tue, 14 Apr 2015 09:33:03 +1000
> Subject: [PATCH] hso: fix refcnt leak in recent patch.
> 
> Prior to
> commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
> hso: fix crash when device disappears while serial port is open
> 
> a kref_get on serial->parent->ref would be taken on each open,
> and it would be kref_put on each close.
> 
> Now the kref_put happens when the tty_struct is finally put (via
> the 'cleanup') providing tty->driver_data has been set.
> So the kref_get must be called exact once when tty->driver_data is
> set.
> 
> With the current code, if the first open fails the kref_get() is never
> called, but the kref_put() is called, leaving to a crash.
> 
> So change the kref_get call to happen exactly when ->driver_data is
> changed from NULL to non-NULL.
> 
> Fixes: commit 29bd3bc1194c624ce863cab2a7da9bc1f0c3b47b
> Cc: sta...@vger.kernel.org (v4.0)
> Cc: Olivier Sobrie 

Tested-by: Olivier Sobrie 

> Signed-off-by: NeilBrown 
> 
> diff --git a/drivers/net/usb/hso.c b/drivers/net/usb/hso.c
> index 75befc1bd816..17fd3820263a 100644
> --- a/drivers/net/usb/hso.c
> +++ b/drivers/net/usb/hso.c
> @@ -1278,6 +1278,8 @@ static int hso_serial_open(struct tty_struct *tty, 
> struct file *filp)
>   D1("Opening %d", serial->minor);
>  
>   /* setup */
> + if (tty->driver_data == NULL)
> + kref_get(&serial->parent->ref);
>   tty->driver_data = serial;
>   tty_port_tty_set(&serial->port, tty);
>  
> @@ -1294,8 +1296,6 @@ static int hso_serial_open(struct tty_struct *tty, 
> struct file *filp)
>   if (result) {
>   hso_stop_serial_device(serial->parent);
>   serial->port.count--;
> - } else {
> - kref_get(&serial->parent->ref);
>   }
>   } else {
>   D1("Port was already open");



-- 
Olivier
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -next] netns: remove duplicated include from net_namespace.c

2015-04-16 Thread weiyj_lk
From: Wei Yongjun 

Remove duplicated include.

Signed-off-by: Wei Yongjun 
---
 net/core/net_namespace.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
index e5e96b0..9c43cf6 100644
--- a/net/core/net_namespace.c
+++ b/net/core/net_namespace.c
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH -next] ethernet: remove unused including

2015-04-16 Thread weiyj_lk
From: Wei Yongjun 

Remove including  that don't need it.

Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/qualcomm/qca_spi.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/net/ethernet/qualcomm/qca_spi.c 
b/drivers/net/ethernet/qualcomm/qca_spi.c
index 4a42e96..f66641d 100644
--- a/drivers/net/ethernet/qualcomm/qca_spi.c
+++ b/drivers/net/ethernet/qualcomm/qca_spi.c
@@ -41,7 +41,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "qca_7k.h"
 #include "qca_debug.h"

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread Guenter Roeck

On 04/16/2015 05:37 AM, Andrew Lunn wrote:

On Wed, Apr 15, 2015 at 10:12:42PM -0700, Guenter Roeck wrote:

Return correct error code if _mv88e6xxx_reg_read returns an error.

Fixes: facd95b2e0ec0 ("net: dsa: mv88e6xxx: Add Hardware bridging support")
Signed-off-by: Guenter Roeck 


Hi Guenter

Good catch. I'm surprised there is no compiler warning, possibly used
before set.


There was, actually. No idea how I missed that. Fengguang's build bots caught 
it.


Reviewed-by: Andrew Lunn 


Thanks!

Guenter

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] rocker: fix error return code in rocker_probe()

2015-04-16 Thread Jiri Pirko
Thu, Apr 16, 2015 at 02:21:02PM CEST, weiyj...@163.com wrote:
>From: Wei Yongjun 
>
>Fix to return -EINVAL from the invalid PCI region size error
>handling case instead of 0, as done elsewhere in this function.
>
>Signed-off-by: Wei Yongjun 


Good catch.

Acked-by: Jiri Pirko 
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen

2015-04-16 Thread Eric Dumazet
On Thu, 2015-04-16 at 11:01 +0100, George Dunlap wrote:

> He suggested that after he'd been prodded by 4 more e-mails in which two
> of us guessed what he was trying to get at.  That's what I was
> complaining about.

My big complain is that I suggested to test to double the sysctl, which
gave good results.

Then you provided a patch using a 8x factor. How does that sound ?

Next time I ask a raise, I should try a 8x factor as well, who knows,
it might be accepted.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] dsa: mv88e6xxx: Fix error handling in mv88e6xxx_set_port_state

2015-04-16 Thread Andrew Lunn
On Wed, Apr 15, 2015 at 10:12:42PM -0700, Guenter Roeck wrote:
> Return correct error code if _mv88e6xxx_reg_read returns an error.
> 
> Fixes: facd95b2e0ec0 ("net: dsa: mv88e6xxx: Add Hardware bridging support")
> Signed-off-by: Guenter Roeck 

Hi Guenter

Good catch. I'm surprised there is no compiler warning, possibly used
before set.

Reviewed-by: Andrew Lunn 

Thanks
Andrew

> ---
>  drivers/net/dsa/mv88e6xxx.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/dsa/mv88e6xxx.c b/drivers/net/dsa/mv88e6xxx.c
> index fc8d3b6ffe8e..f64186a5f634 100644
> --- a/drivers/net/dsa/mv88e6xxx.c
> +++ b/drivers/net/dsa/mv88e6xxx.c
> @@ -908,8 +908,10 @@ static int mv88e6xxx_set_port_state(struct dsa_switch 
> *ds, int port, u8 state)
>   mutex_lock(&ps->smi_mutex);
>  
>   reg = _mv88e6xxx_reg_read(ds, REG_PORT(port), PORT_CONTROL);
> - if (reg < 0)
> + if (reg < 0) {
> + ret = reg;
>   goto abort;
> + }
>  
>   oldstate = reg & PORT_CONTROL_STATE_MASK;
>   if (oldstate != state) {
> -- 
> 2.1.0
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >