date:20161104

Re: [PATCH net] sock: fix sendmmsg for partial sendmsg

2016-11-04 Thread Maciej Żenczykowski

Acked-by: Maciej Żenczykowski

[net PATCH] fib_trie: Correct /proc/net/route off by one error

2016-11-04 Thread Alexander Duyck

The display of /proc/net/route has had a couple issues due to the fact that
when I originally rewrote most of fib_trie I made it so that the iterator
was tracking the next value to use instead of the current.

In addition it had an off by 1 error where I was tracking the first piece
of data as position 0, even though in reality that belonged to the
SEQ_START_TOKEN.

This patch updates the code so the iterator tracks the last reported
position and key instead of the next expected position and key.  In
addition it shifts things so that all of the leaves start at 1 instead of
trying to report leaves starting with offset 0 as being valid.  With these
two issues addressed this should resolve any off by one errors that were
present in the display of /proc/net/route.
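
The bookkeeping described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel code:
the sorted array keys[], struct route_iter, leaf_walk() and route_get_idx()
are hypothetical stand-ins for the trie, struct fib_route_iter,
leaf_walk_rcu() and fib_route_get_idx(), and key-wrap handling is left out.
Position 0 is reserved for the header token, leaves are numbered from 1, and
the iterator caches the last reported position and key.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the trie: leaf keys in ascending order. */
static const unsigned int keys[] = { 10, 20, 30, 40 };
#define NKEYS (sizeof(keys) / sizeof(keys[0]))

struct route_iter {
	long pos;          /* position of the last reported leaf, 0 = none */
	unsigned int key;  /* key of the last reported leaf */
};

/* Return the first leaf whose key is >= key, or NULL if there is none. */
static const unsigned int *leaf_walk(unsigned int key)
{
	for (size_t i = 0; i < NKEYS; i++)
		if (keys[i] >= key)
			return &keys[i];
	return NULL;
}

/* Find the leaf at position pos (pos >= 1; 0 belongs to the header). */
static const unsigned int *route_get_idx(struct route_iter *iter, long pos)
{
	const unsigned int *l;
	unsigned int key;

	assert(pos >= 1);

	if (iter->pos > 0 && pos >= iter->pos) {
		/* resume from the cached, previously reported leaf */
		key = iter->key;
	} else {
		/* restart from the beginning: the first leaf is position 1 */
		iter->pos = 1;
		key = 0;
	}

	pos -= iter->pos;

	while ((l = leaf_walk(key)) != NULL && pos-- > 0) {
		key = *l + 1;   /* key wrap is not handled in this sketch */
		iter->pos++;
	}

	if (l)
		iter->key = *l;   /* remember the leaf just reported */
	else
		iter->pos = 0;    /* forget it */

	return l;
}
```

In this model, asking for position 1 yields the first leaf, and a later
request for a higher position resumes from the cached leaf instead of
walking from the start.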

Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
Cc: Andy Whitcroft 
Reported-by: Jason Baron 
Signed-off-by: Alexander Duyck 
---
 net/ipv4/fib_trie.c |   21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 31cef36..4cff74d 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2413,22 +2413,19 @@ static struct key_vector *fib_route_get_idx(struct 
fib_route_iter *iter,
struct key_vector *l, **tp = &iter->tnode;
t_key key;
 
-   /* use cache location of next-to-find key */
+   /* use cached location of previously found key */
if (iter->pos > 0 && pos >= iter->pos) {
-   pos -= iter->pos;
key = iter->key;
} else {
-   iter->pos = 0;
+   iter->pos = 1;
key = 0;
}
 
-   while ((l = leaf_walk_rcu(tp, key)) != NULL) {
+   pos -= iter->pos;
+
+   while ((l = leaf_walk_rcu(tp, key)) && (pos-- > 0)) {
key = l->key + 1;
iter->pos++;
-
-   if (--pos <= 0)
-   break;
-
l = NULL;
 
/* handle unlikely case of a key wrap */
@@ -2437,7 +2434,7 @@ static struct key_vector *fib_route_get_idx(struct 
fib_route_iter *iter,
}
 
if (l)
-   iter->key = key;/* remember it */
+   iter->key = l->key; /* remember it */
else
iter->pos = 0;  /* forget it */
 
@@ -2465,7 +2462,7 @@ static void *fib_route_seq_start(struct seq_file *seq, 
loff_t *pos)
return fib_route_get_idx(iter, *pos);
 
iter->pos = 0;
-   iter->key = 0;
+   iter->key = KEY_MAX;
 
return SEQ_START_TOKEN;
 }
@@ -2474,7 +2471,7 @@ static void *fib_route_seq_next(struct seq_file *seq, 
void *v, loff_t *pos)
 {
struct fib_route_iter *iter = seq->private;
struct key_vector *l = NULL;
-   t_key key = iter->key;
+   t_key key = iter->key + 1;
 
++*pos;
 
@@ -2483,7 +2480,7 @@ static void *fib_route_seq_next(struct seq_file *seq, 
void *v, loff_t *pos)
l = leaf_walk_rcu(&iter->tnode, key);
 
if (l) {
-   iter->key = l->key + 1;
+   iter->key = l->key;
iter->pos++;
} else {
iter->pos = 0;

Re: [PATCH net-next 5/7] vxlan: simplify RTF_LOCAL handling.

2016-11-04 Thread kbuild test robot

Hi Pravin,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Pravin-B-Shelar/vxlan-General-improvements/20161105-060958
config: i386-randconfig-x077-201644 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/net/vxlan.c: In function 'vxlan_group_used':
   drivers/net/vxlan.c:947:21: warning: unused variable 'sock6' 
[-Wunused-variable]
 struct vxlan_sock *sock6 = NULL;
^
   drivers/net/vxlan.c: In function 'check_route_rtf_local':
>> drivers/net/vxlan.c:1954:17: error: 'RTF_LOCAL' undeclared (first use in 
>> this function)
 if (rt_flags & RTF_LOCAL &&
^
   drivers/net/vxlan.c:1954:17: note: each undeclared identifier is reported 
only once for each function it appears in

vim +/RTF_LOCAL +1954 drivers/net/vxlan.c

  1948  static int check_route_rtf_local(struct sk_buff *skb, struct net_device 
*dev,
  1949   struct vxlan_dev *vxlan, union 
vxlan_addr *daddr,
  1950   __be32 dst_port, __be32 vni, struct 
dst_entry *dst,
  1951   u32 rt_flags)
  1952  {
  1953  /* Bypass encapsulation if the destination is local */
> 1954  if (rt_flags & RTF_LOCAL &&
  1955  !(rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
  1956  struct vxlan_dev *dst_vxlan;
  1957  

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip

[PATCH iproute2 4/7] l2tp: fix L2TP_ATTR_{RECV,SEND}_SEQ handling

2016-11-04 Thread Asbjørn Sloth Tønnesen

L2TP_ATTR_RECV_SEQ and L2TP_ATTR_SEND_SEQ are declared as NLA_U8
attributes in the kernel, so let's treat them accordingly.
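
To illustrate why the distinction matters on the reading side, here is a
minimal sketch. It assumes a simplified, hypothetical struct attr in place of
struct rtattr, and the helpers read_as_flag() and read_as_u8() are made up
for this example. A flag-style read only tests for presence, while a
u8-style read looks at the payload and keeps a default when the attribute is
absent.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified attribute; real code uses struct rtattr. */
struct attr {
	uint8_t value;   /* one-byte payload of a u8 attribute */
};

/* Flag-style read: only asks whether the attribute is present. */
static int read_as_flag(const struct attr *a)
{
	return a != NULL;
}

/* u8-style read: use the payload, or dflt when the attribute is absent. */
static int read_as_u8(const struct attr *a, int dflt)
{
	return a ? a->value : dflt;
}
```

With these helpers, an attribute that is present with value 0 reads as
"on" through read_as_flag() but as 0 through read_as_u8().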

Signed-off-by: Asbjørn Sloth Tønnesen 
---
 ip/ipl2tp.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index 2e0e9c7..af89e2f 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -160,8 +160,8 @@ static int create_session(struct l2tp_parm *p)
addattr8(&req.n, 1024, L2TP_ATTR_L2SPEC_LEN, p->l2spec_len);
 
if (p->mtu) addattr16(&req.n, 1024, L2TP_ATTR_MTU, p->mtu);
-   if (p->recv_seq)addattr(&req.n, 1024, L2TP_ATTR_RECV_SEQ);
-   if (p->send_seq)addattr(&req.n, 1024, L2TP_ATTR_SEND_SEQ);
+   if (p->recv_seq)addattr8(&req.n, 1024, L2TP_ATTR_RECV_SEQ, 1);
+   if (p->send_seq)addattr8(&req.n, 1024, L2TP_ATTR_SEND_SEQ, 1);
if (p->lns_mode)addattr(&req.n, 1024, L2TP_ATTR_LNS_MODE);
if (p->data_seq)addattr8(&req.n, 1024, L2TP_ATTR_DATA_SEQ, 
p->data_seq);
if (p->reorder_timeout) addattr64(&req.n, 1024, L2TP_ATTR_RECV_TIMEOUT,
@@ -304,8 +304,10 @@ static int get_response(struct nlmsghdr *n, void *arg)
memcpy(p->peer_cookie, RTA_DATA(attrs[L2TP_ATTR_PEER_COOKIE]),
   p->peer_cookie_len = 
RTA_PAYLOAD(attrs[L2TP_ATTR_PEER_COOKIE]));
 
-   p->recv_seq = !!attrs[L2TP_ATTR_RECV_SEQ];
-   p->send_seq = !!attrs[L2TP_ATTR_SEND_SEQ];
+   if (attrs[L2TP_ATTR_RECV_SEQ])
+   p->recv_seq = rta_getattr_u8(attrs[L2TP_ATTR_RECV_SEQ]);
+   if (attrs[L2TP_ATTR_SEND_SEQ])
+   p->send_seq = rta_getattr_u8(attrs[L2TP_ATTR_SEND_SEQ]);
 
if (attrs[L2TP_ATTR_RECV_TIMEOUT])
p->reorder_timeout = 
rta_getattr_u64(attrs[L2TP_ATTR_RECV_TIMEOUT]);
-- 
2.10.1

[PATCH iproute2 1/7] man: ip-l2tp.8: fix l2spec_type documentation

2016-11-04 Thread Asbjørn Sloth Tønnesen

Signed-off-by: Asbjørn Sloth Tønnesen 
---
 man/man8/ip-l2tp.8 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/man/man8/ip-l2tp.8 b/man/man8/ip-l2tp.8
index 5b7041f..4a3bb20 100644
--- a/man/man8/ip-l2tp.8
+++ b/man/man8/ip-l2tp.8
@@ -239,7 +239,7 @@ find in received L2TP packets. Default is to use no cookie.
 set the layer2specific header type of the session.
 .br
 Valid values are:
-.BR none ", " udp "."
+.BR none ", " default "."
 .TP
 .BI offset " OFFSET"
 sets the byte offset from the L2TP header where user data starts in
-- 
2.10.1

[PATCH iproute2 5/7] l2tp: support sequence numbering

2016-11-04 Thread Asbjørn Sloth Tønnesen

Signed-off-by: Asbjørn Sloth Tønnesen 
---
 ip/ipl2tp.c| 23 +++
 man/man8/ip-l2tp.8 | 15 +++
 2 files changed, 38 insertions(+)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index af89e2f..6d00d09 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -246,6 +246,12 @@ static void print_session(struct l2tp_data *data)
printf("  reorder timeout: %u\n", p->reorder_timeout);
else
printf("\n");
+   if (p->send_seq || p->recv_seq) {
+   printf("  sequence numbering:");
+   if (p->send_seq) printf(" send");
+   if (p->recv_seq) printf(" recv");
+   printf("\n");
+   }
 }
 
 static int get_response(struct nlmsghdr *n, void *arg)
@@ -483,6 +489,7 @@ static void usage(void)
fprintf(stderr, "  session_id ID peer_session_id ID\n");
fprintf(stderr, "  [ cookie HEXSTR ] [ peer_cookie HEXSTR ]\n");
fprintf(stderr, "  [ offset OFFSET ] [ peer_offset OFFSET ]\n");
+   fprintf(stderr, "  [ seq { none | send | recv | both } ]\n");
fprintf(stderr, "  [ l2spec_type L2SPEC ]\n");
fprintf(stderr, "   ip l2tp del tunnel tunnel_id ID\n");
fprintf(stderr, "   ip l2tp del session tunnel_id ID session_id 
ID\n");
@@ -653,6 +660,22 @@ static int parse_args(int argc, char **argv, int cmd, 
struct l2tp_parm *p)
fprintf(stderr, "Unknown layer2specific header 
type \"%s\"\n", *argv);
exit(-1);
}
+   } else if (strcmp(*argv, "seq") == 0) {
+   NEXT_ARG();
+   if (strcasecmp(*argv, "both") == 0) {
+   p->recv_seq = 1;
+   p->send_seq = 1;
+   } else if (strcasecmp(*argv, "recv") == 0) {
+   p->recv_seq = 1;
+   } else if (strcasecmp(*argv, "send") == 0) {
+   p->send_seq = 1;
+   } else if (strcasecmp(*argv, "none") == 0) {
+   p->recv_seq = 0;
+   p->send_seq = 0;
+   } else {
+   fprintf(stderr, "Unknown seq value \"%s\"\n", 
*argv);
+   exit(-1);
+   }
} else if (strcmp(*argv, "tunnel") == 0) {
p->tunnel = 1;
} else if (strcmp(*argv, "session") == 0) {
diff --git a/man/man8/ip-l2tp.8 b/man/man8/ip-l2tp.8
index 991d097..d4e7270 100644
--- a/man/man8/ip-l2tp.8
+++ b/man/man8/ip-l2tp.8
@@ -51,6 +51,8 @@ ip-l2tp - L2TPv3 static unmanaged tunnel configuration
 .br
 .RB "[ " l2spec_type " { " none " | " default " } ]"
 .br
+.RB "[ " seq " { " none " | " send " | " recv " | " both " } ]"
+.br
 .RB "[ " offset
 .IR OFFSET
 .RB " ] [ " peer_offset
@@ -238,6 +240,19 @@ set the layer2specific header type of the session.
 Valid values are:
 .BR none ", " default "."
 .TP
+.BI seq " SEQ"
+controls sequence numbering to prevent or detect out of order packets.
+.B send
+puts a sequence number in the default layer2specific header of each
+outgoing packet.
+.B recv
+reorder packets if they are received out of order.
+Default is
+.BR none "."
+.br
+Valid values are:
+.BR none ", " send ", " recv ", " both "."
+.TP
 .BI offset " OFFSET"
 sets the byte offset from the L2TP header where user data starts in
 transmitted L2TP data packets. This is hardly ever used. If set, the
-- 
2.10.1

[PATCH iproute2 6/7] l2tp: read IPv6 UDP checksum attributes from kernel

2016-11-04 Thread Asbjørn Sloth Tønnesen

In case of an older kernel that doesn't set L2TP_ATTR_UDP_ZERO_CSUM6_{R,T}X
the old hard-coded value is being preserved, since the attribute flag will be
missing.

Signed-off-by: Asbjørn Sloth Tønnesen 
---
 ip/ipl2tp.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index 6d00d09..8f3268d 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -296,12 +296,9 @@ static int get_response(struct nlmsghdr *n, void *arg)
p->l2spec_len = rta_getattr_u8(attrs[L2TP_ATTR_L2SPEC_LEN]);
 
p->udp_csum = !!attrs[L2TP_ATTR_UDP_CSUM];
-   /*
-* Not fetching from L2TP_ATTR_UDP_ZERO_CSUM6_{T,R}X because the
-* kernel doesn't send it so just leave it as default value.
-*/
-   p->udp6_csum_tx = 1;
-   p->udp6_csum_rx = 1;
+   p->udp6_csum_tx = !attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX];
+   p->udp6_csum_rx = !attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX];
+
if (attrs[L2TP_ATTR_COOKIE])
memcpy(p->cookie, RTA_DATA(attrs[L2TP_ATTR_COOKIE]),
   p->cookie_len = RTA_PAYLOAD(attrs[L2TP_ATTR_COOKIE]));
-- 
2.10.1

[PATCH iproute2 3/7] l2tp: fix integers with too few significant bits

2016-11-04 Thread Asbjørn Sloth Tønnesen

udp6_csum{,_tx,_rx}, tunnel and session are the only ones
currently used.

recv_seq, send_seq, lns_mode and data_seq are partially
implemented in a useless way.
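
A short sketch of the underlying problem, with hypothetical struct and
function names: a signed 1-bit bit-field can only represent 0 and -1, so it
can never compare equal to 1. "signed int" is spelled out here because
whether a plain "int" bit-field is signed is implementation-defined, and the
result of storing 1 into the signed field is implementation-defined as well.

```c
/* Hypothetical types for illustration only. */
struct signed_bits   { signed int flag:1; };    /* holds 0 or -1 */
struct unsigned_bits { unsigned int flag:1; };  /* holds 0 or 1 */

/* Store v in a signed 1-bit field and test whether it reads back as 1. */
static int signed_reads_one(int v)
{
	struct signed_bits s = { 0 };

	s.flag = v;
	return s.flag == 1;   /* 1 is outside the field's range */
}

/* Same test with an unsigned 1-bit field. */
static int unsigned_reads_one(int v)
{
	struct unsigned_bits u = { 0 };

	u.flag = v;           /* value is reduced modulo 2 */
	return u.flag == 1;
}
```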

Signed-off-by: Asbjørn Sloth Tønnesen 
---
 ip/ipl2tp.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index d3338ac..2e0e9c7 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -56,15 +56,15 @@ struct l2tp_parm {
 
uint16_t pw_type;
uint16_t mtu;
-   int udp6_csum_tx:1;
-   int udp6_csum_rx:1;
-   int udp_csum:1;
-   int recv_seq:1;
-   int send_seq:1;
-   int lns_mode:1;
-   int data_seq:2;
-   int tunnel:1;
-   int session:1;
+   unsigned int udp6_csum_tx:1;
+   unsigned int udp6_csum_rx:1;
+   unsigned int udp_csum:1;
+   unsigned int recv_seq:1;
+   unsigned int send_seq:1;
+   unsigned int lns_mode:1;
+   unsigned int data_seq:2;
+   unsigned int tunnel:1;
+   unsigned int session:1;
int reorder_timeout;
const char *ifname;
uint8_t l2spec_type;
-- 
2.10.1

[PATCH iproute2 7/7] l2tp: show tunnel: expose UDP checksum state

2016-11-04 Thread Asbjørn Sloth Tønnesen

Signed-off-by: Asbjørn Sloth Tønnesen 
---
 ip/ipl2tp.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/ip/ipl2tp.c b/ip/ipl2tp.c
index 8f3268d..27dc184 100644
--- a/ip/ipl2tp.c
+++ b/ip/ipl2tp.c
@@ -218,9 +218,24 @@ static void print_tunnel(const struct l2tp_data *data)
printf("  Peer tunnel %u\n",
   p->peer_tunnel_id);
 
-   if (p->encap == L2TP_ENCAPTYPE_UDP)
+   if (p->encap == L2TP_ENCAPTYPE_UDP) {
printf("  UDP source / dest ports: %hu/%hu\n",
   p->local_udp_port, p->peer_udp_port);
+
+   switch (p->local_ip.family) {
+   case AF_INET:
+   printf("  UDP checksum: %s\n",
+  p->udp_csum ? "enabled" : "disabled");
+   break;
+   case AF_INET6:
+   printf("  UDP checksum: %s%s%s%s\n",
+  p->udp6_csum_tx && p->udp6_csum_rx ? "enabled" : 
"",
+  p->udp6_csum_tx && !p->udp6_csum_rx ? "tx" : "",
+  !p->udp6_csum_tx && p->udp6_csum_rx ? "rx" : "",
+  !p->udp6_csum_tx && !p->udp6_csum_rx ? 
"disabled" : "");
+   break;
+   }
+   }
 }
 
 static void print_session(struct l2tp_data *data)
-- 
2.10.1

[PATCH iproute2 2/7] man: ip-l2tp.8: remove non-existent tunnel parameter name

2016-11-04 Thread Asbjørn Sloth Tønnesen

The name parameter is only valid for sessions, not tunnels.

Signed-off-by: Asbjørn Sloth Tønnesen 
---
 man/man8/ip-l2tp.8 | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/man/man8/ip-l2tp.8 b/man/man8/ip-l2tp.8
index 4a3bb20..991d097 100644
--- a/man/man8/ip-l2tp.8
+++ b/man/man8/ip-l2tp.8
@@ -154,9 +154,6 @@ tunnels and sessions to be established and provides for 
detecting and
 acting upon network failures.
 .SS ip l2tp add tunnel - add a new tunnel
 .TP
-.BI name " NAME "
-sets the session network interface name. Default is l2tpethN.
-.TP
 .BI tunnel_id " ID"
 set the tunnel id, which is a 32-bit integer value. Uniquely
 identifies the tunnel. The value used must match the peer_tunnel_id
-- 
2.10.1

Re: [PATCH net-next RFC WIP] Patch for XDP support for virtio_net

2016-11-04 Thread John Fastabend

On 16-11-03 05:34 PM, Michael S. Tsirkin wrote:
> On Thu, Nov 03, 2016 at 04:29:22PM -0700, John Fastabend wrote:
>> [...]
>>
> - when XDP is attached disable all LRO using 
> VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET
>   (not used by driver so far, designed to allow dynamic LRO control with
>ethtool)

 I see there is a UAPI bit for this but I guess we also need to add
 support to vhost as well? Seems otherwise we may just drop a bunch
 of packets on the floor out of handle_rx() when recvmsg returns larger
 than a page size. Or did I read this wrong...
>>>
>>> It's already supported host side. However you might
>>> get some packets that were in flight when you attached.
>>>
>>
>> Really I must have missed it I don't see any *GUEST_FEATURES* flag in
>> ./drivers/vhost/?
> 
> It's all done by QEMU catching these commands and calling
> ioctls on the tun/macvtap/packet socket.
> 

Well at least for the tap vhost backend in linux that I found here,

 ./qemu/net/tap-linux.c

there is no LRO feature flag but that is OK I can get it working next
week looks fairly straight forward.

[...]

>> And if I try to merge the last email I sent out here. In mergeable and
>> big_packets modes if LRO is off and MTU < PAGE_SIZE it seems we should
>> always get physically contiguous data on a single page correct?
> 
> Unfortunately not in the mergeable buffer case according to spec, even though
> linux hosts will do that, so it's fine to optimize for that
> but need to somehow work in other cases e.g. by doing a data copy.
> 

ah OK this makes sense I was looking at vhost implementation in Linux.

> 
>> It
>> may be at some offset in a page however. But the offset should not
>> matter to XDP. If I read this right we wouldn't need to add a new
>> XDP mode and could just use the existing merge or big modes. This would
>> seem cleaner to me than adding a new mode and requiring a qemu option.
>>
>> Thanks for all the pointers by the way its very helpful.
> 
> So for mergeable we spend cycles trying to make buffers as small
> as possible and I have a patch to avoid copies for that too,
> I'll post it next week hopefully.
> 

Good to know. I'll get the XDP stuff wrapped up next week or see
if Srijeet wants to do it.

Thanks,
John

[PATCH net-next 5/5] net: l2tp: fix negative assignment to unsigned int

2016-11-04 Thread Asbjoern Sloth Toennesen

recv_seq, send_seq and lns_mode are all defined as
unsigned int foo:1;

Signed-off-by: Asbjoern Sloth Toennesen 
---
 net/l2tp/l2tp_core.c | 2 +-
 net/l2tp/l2tp_ppp.c  | 6 +++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/l2tp/l2tp_core.c b/net/l2tp/l2tp_core.c
index a2ed3bd..85948c6 100644
--- a/net/l2tp/l2tp_core.c
+++ b/net/l2tp/l2tp_core.c
@@ -715,7 +715,7 @@ void l2tp_recv_common(struct l2tp_session *session, struct 
sk_buff *skb,
l2tp_info(session, L2TP_MSG_SEQ,
  "%s: requested to enable seq numbers by 
LNS\n",
  session->name);
-   session->send_seq = -1;
+   session->send_seq = 1;
l2tp_session_set_header_len(session, tunnel->version);
}
} else {
diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 41d47bf..2ddfec1 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -1272,7 +1272,7 @@ static int pppol2tp_session_setsockopt(struct sock *sk,
err = -EINVAL;
break;
}
-   session->recv_seq = val ? -1 : 0;
+   session->recv_seq = !!val;
l2tp_info(session, PPPOL2TP_MSG_CONTROL,
  "%s: set recv_seq=%d\n",
  session->name, session->recv_seq);
@@ -1283,7 +1283,7 @@ static int pppol2tp_session_setsockopt(struct sock *sk,
err = -EINVAL;
break;
}
-   session->send_seq = val ? -1 : 0;
+   session->send_seq = !!val;
{
struct sock *ssk  = ps->sock;
struct pppox_sock *po = pppox_sk(ssk);
@@ -1301,7 +1301,7 @@ static int pppol2tp_session_setsockopt(struct sock *sk,
err = -EINVAL;
break;
}
-   session->lns_mode = val ? -1 : 0;
+   session->lns_mode = !!val;
l2tp_info(session, PPPOL2TP_MSG_CONTROL,
  "%s: set lns_mode=%d\n",
  session->name, session->lns_mode);
-- 
2.10.1

[PATCH net-next 1/5] net: l2tp: fix L2TP_ATTR_UDP_CSUM attribute type

2016-11-04 Thread Asbjoern Sloth Toennesen

L2TP_ATTR_UDP_CSUM is a flag, and gets read with
nla_get_flag, but it is defined as NLA_U8 in
the nla_policy.

It appears that this is only publicly used in
iproute2, where it's broken, because it's used as
a NLA_FLAG, and fails validation as a NLA_U8.

The only place it's used as a NLA_U8 is in
l2tp_nl_tunnel_send(), but iproute2 again reads that
as a flag, it's therefore always set. Fortunately
it is never used for anything, just read.
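
The validation mismatch can be sketched as follows. This is a simplified
model of my understanding of the length checks applied to the two attribute
types, not the kernel's validation code; enum attr_type and validate_len()
are hypothetical names.

```c
#include <stddef.h>

/* Hypothetical mirror of the two policy types discussed above. */
enum attr_type { ATTR_FLAG, ATTR_U8 };

/* Return 0 if payload_len is acceptable for the declared type, else -1. */
static int validate_len(enum attr_type type, size_t payload_len)
{
	switch (type) {
	case ATTR_FLAG:
		/* a flag is signalled by presence alone and carries no payload */
		return payload_len == 0 ? 0 : -1;
	case ATTR_U8:
		/* a u8 needs at least one byte of payload */
		return payload_len >= 1 ? 0 : -1;
	}
	return -1;
}
```

Under this model a sender that emits the attribute as a flag (zero-length
payload) is rejected by a policy that declares it as u8.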

CC: Miao Wang 
Signed-off-by: Asbjoern Sloth Toennesen 
---
 include/uapi/linux/l2tp.h |  2 +-
 net/l2tp/l2tp_netlink.c   | 12 +---
 2 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/l2tp.h b/include/uapi/linux/l2tp.h
index 4bd27d0..73e3a23 100644
--- a/include/uapi/linux/l2tp.h
+++ b/include/uapi/linux/l2tp.h
@@ -104,7 +104,7 @@ enum {
L2TP_ATTR_PEER_CONN_ID, /* u32 */
L2TP_ATTR_SESSION_ID,   /* u32 */
L2TP_ATTR_PEER_SESSION_ID,  /* u32 */
-   L2TP_ATTR_UDP_CSUM, /* u8 */
+   L2TP_ATTR_UDP_CSUM, /* flag */
L2TP_ATTR_VLAN_ID,  /* u16 */
L2TP_ATTR_COOKIE,   /* 0, 4 or 8 bytes */
L2TP_ATTR_PEER_COOKIE,  /* 0, 4 or 8 bytes */
diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index 59aa2d2..1fe05da 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -379,9 +379,15 @@ static int l2tp_nl_tunnel_send(struct sk_buff *skb, u32 
portid, u32 seq, int fla
 
switch (tunnel->encap) {
case L2TP_ENCAPTYPE_UDP:
+   switch (sk->sk_family) {
+   case AF_INET:
+   if ((!sk->sk_no_check_tx) &&
+   nla_put_flag(skb, L2TP_ATTR_UDP_CSUM))
+   goto nla_put_failure;
+   break;
+   }
if (nla_put_u16(skb, L2TP_ATTR_UDP_SPORT, 
ntohs(inet->inet_sport)) ||
-   nla_put_u16(skb, L2TP_ATTR_UDP_DPORT, 
ntohs(inet->inet_dport)) ||
-   nla_put_u8(skb, L2TP_ATTR_UDP_CSUM, !sk->sk_no_check_tx))
+   nla_put_u16(skb, L2TP_ATTR_UDP_DPORT, 
ntohs(inet->inet_dport)))
goto nla_put_failure;
/* NOBREAK */
case L2TP_ENCAPTYPE_IP:
@@ -873,7 +879,7 @@ static const struct nla_policy l2tp_nl_policy[L2TP_ATTR_MAX 
+ 1] = {
[L2TP_ATTR_PEER_CONN_ID]= { .type = NLA_U32, },
[L2TP_ATTR_SESSION_ID]  = { .type = NLA_U32, },
[L2TP_ATTR_PEER_SESSION_ID] = { .type = NLA_U32, },
-   [L2TP_ATTR_UDP_CSUM]= { .type = NLA_U8, },
+   [L2TP_ATTR_UDP_CSUM]= { .type = NLA_FLAG, },
[L2TP_ATTR_VLAN_ID] = { .type = NLA_U16, },
[L2TP_ATTR_DEBUG]   = { .type = NLA_U32, },
[L2TP_ATTR_RECV_SEQ]= { .type = NLA_U8, },
-- 
2.10.1

[PATCH net-next 2/5] net: l2tp: fix L2TP_ATTR_UDP_ZERO_CSUM6_{RX,TX} attribute types

2016-11-04 Thread Asbjoern Sloth Toennesen

The attributes L2TP_ATTR_UDP_ZERO_CSUM6_RX and
L2TP_ATTR_UDP_ZERO_CSUM6_TX are used as flags,
but are documented as u8 in a comment.

This patch redocuments them as flags, and adds
them to the nla_policy, so they get validated.

The only public user, iproute2, already treats
these attributes as flags.

CC: Miao Wang 
Signed-off-by: Asbjoern Sloth Toennesen 
---
 include/uapi/linux/l2tp.h | 4 ++--
 net/l2tp/l2tp_netlink.c   | 2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/l2tp.h b/include/uapi/linux/l2tp.h
index 73e3a23..345e49f 100644
--- a/include/uapi/linux/l2tp.h
+++ b/include/uapi/linux/l2tp.h
@@ -124,8 +124,8 @@ enum {
L2TP_ATTR_STATS,/* nested */
L2TP_ATTR_IP6_SADDR,/* struct in6_addr */
L2TP_ATTR_IP6_DADDR,/* struct in6_addr */
-   L2TP_ATTR_UDP_ZERO_CSUM6_TX,/* u8 */
-   L2TP_ATTR_UDP_ZERO_CSUM6_RX,/* u8 */
+   L2TP_ATTR_UDP_ZERO_CSUM6_TX,/* flag */
+   L2TP_ATTR_UDP_ZERO_CSUM6_RX,/* flag */
L2TP_ATTR_PAD,
__L2TP_ATTR_MAX,
 };
diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index 1fe05da..e45c5409 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -880,6 +880,8 @@ static const struct nla_policy l2tp_nl_policy[L2TP_ATTR_MAX 
+ 1] = {
[L2TP_ATTR_SESSION_ID]  = { .type = NLA_U32, },
[L2TP_ATTR_PEER_SESSION_ID] = { .type = NLA_U32, },
[L2TP_ATTR_UDP_CSUM]= { .type = NLA_FLAG, },
+   [L2TP_ATTR_UDP_ZERO_CSUM6_TX]   = { .type = NLA_FLAG, },
+   [L2TP_ATTR_UDP_ZERO_CSUM6_RX]   = { .type = NLA_FLAG, },
[L2TP_ATTR_VLAN_ID] = { .type = NLA_U16, },
[L2TP_ATTR_DEBUG]   = { .type = NLA_U32, },
[L2TP_ATTR_RECV_SEQ]= { .type = NLA_U8, },
-- 
2.10.1

[PATCH net-next 3/5] net: l2tp: netlink: l2tp_nl_tunnel_send: set UDP6 checksum flags

2016-11-04 Thread Asbjoern Sloth Toennesen

This patch causes the proper attribute flags to be set
when IPv6 UDP checksums are disabled, so that
userspace, e.g. `ip l2tp show tunnel`, knows about it.

Signed-off-by: Asbjoern Sloth Toennesen 
---
 net/l2tp/l2tp_netlink.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index e45c5409..1b3fcde 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -385,6 +385,16 @@ static int l2tp_nl_tunnel_send(struct sk_buff *skb, u32 
portid, u32 seq, int fla
nla_put_flag(skb, L2TP_ATTR_UDP_CSUM))
goto nla_put_failure;
break;
+#if IS_ENABLED(CONFIG_IPV6)
+   case AF_INET6:
+   if (udp_get_no_check6_tx(sk) &&
+   nla_put_flag(skb, L2TP_ATTR_UDP_ZERO_CSUM6_TX))
+   goto nla_put_failure;
+   if (udp_get_no_check6_rx(sk) &&
+   nla_put_flag(skb, L2TP_ATTR_UDP_ZERO_CSUM6_RX))
+   goto nla_put_failure;
+   break;
+#endif
}
if (nla_put_u16(skb, L2TP_ATTR_UDP_SPORT, 
ntohs(inet->inet_sport)) ||
nla_put_u16(skb, L2TP_ATTR_UDP_DPORT, 
ntohs(inet->inet_dport)))
-- 
2.10.1

[PATCH net-next 4/5] net: l2tp: cleanup: remove redundant condition

2016-11-04 Thread Asbjoern Sloth Toennesen

These assignments follow this pattern:

unsigned int foo:1;
struct nlattr *nla = info->attrs[bar];

if (nla)
foo = nla_get_flag(nla); /* expands to: foo = !!nla */

This could be simplified to: if (nla) foo = 1;
but let's just remove the condition and use the macro,

foo = nla_get_flag(nla);

Signed-off-by: Asbjoern Sloth Toennesen 
---
 net/l2tp/l2tp_netlink.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/l2tp/l2tp_netlink.c b/net/l2tp/l2tp_netlink.c
index 1b3fcde..abf6bf1 100644
--- a/net/l2tp/l2tp_netlink.c
+++ b/net/l2tp/l2tp_netlink.c
@@ -220,14 +220,14 @@ static int l2tp_nl_cmd_tunnel_create(struct sk_buff *skb, 
struct genl_info *info
cfg.local_udp_port = 
nla_get_u16(info->attrs[L2TP_ATTR_UDP_SPORT]);
if (info->attrs[L2TP_ATTR_UDP_DPORT])
cfg.peer_udp_port = 
nla_get_u16(info->attrs[L2TP_ATTR_UDP_DPORT]);
-   if (info->attrs[L2TP_ATTR_UDP_CSUM])
-   cfg.use_udp_checksums = 
nla_get_flag(info->attrs[L2TP_ATTR_UDP_CSUM]);
+   cfg.use_udp_checksums = nla_get_flag(
+   info->attrs[L2TP_ATTR_UDP_CSUM]);
 
 #if IS_ENABLED(CONFIG_IPV6)
-   if (info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX])
-   cfg.udp6_zero_tx_checksums = 
nla_get_flag(info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX]);
-   if (info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX])
-   cfg.udp6_zero_rx_checksums = 
nla_get_flag(info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX]);
+   cfg.udp6_zero_tx_checksums = nla_get_flag(
+   info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_TX]);
+   cfg.udp6_zero_rx_checksums = nla_get_flag(
+   info->attrs[L2TP_ATTR_UDP_ZERO_CSUM6_RX]);
 #endif
}
 
-- 
2.10.1

[PATCH] net-ipv6: on device mtu change do not add mtu to mtu-less routes

2016-11-04 Thread Maciej Żenczykowski

From: Maciej Żenczykowski 

Routes can specify an mtu explicitly or inherit the mtu from
the underlying device - this inheritance is implemented in
dst->ops->mtu handlers ip6_mtu() and ip6_blackhole_mtu().

Currently changing the mtu of a device adds mtu explicitly
to routes using that device.

For example:
  # ip link set dev lo mtu 65536
  # ip -6 route add local 2000::1 dev lo
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium

  # ip link set dev lo mtu 65535
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65535 pref medium

  # ip link set dev lo mtu 65536
  # ip -6 route get 2000::1
  local 2000::1 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium

  # ip -6 route del local 2000::1

After this patch the route entry no longer changes unless it already has an mtu.
There is no need: this inheritance is already done in ip6_mtu().

  # ip link set dev lo mtu 65536
  # ip -6 route add local 2000::1 dev lo
  # ip -6 route add local 2000::2 dev lo mtu 2000
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium

  # ip link set dev lo mtu 65535
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 2000 pref medium

  # ip link set dev lo mtu 1501
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 1501 pref medium

  # ip link set dev lo mtu 65536
  # ip -6 route get 2000::1; ip -6 route get 2000::2
  local 2000::1 dev lo  table local  src ...  metric 1024  pref medium
  local 2000::2 dev lo  table local  src ...  metric 1024  mtu 65536 pref medium

  # ip -6 route del local 2000::1
  # ip -6 route del local 2000::2

This is desirable because changing device mtu and then resetting it
to the previous value shouldn't change the user visible routing table.
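
The behaviour shown in the transcripts above can be captured in a small
model. This is a sketch, not the kernel's rt6_mtu_change_route(): struct
route, effective_mtu() and dev_mtu_changed() are hypothetical, and the update
rule is a simplification chosen to reproduce the example output. A metric of
0 means "no explicit mtu, inherit from the device".

```c
/* Hypothetical route model: mtu == 0 means inherit from the device. */
struct route {
	unsigned int mtu;
};

/* MTU used for the route: explicit metric if set, else the device MTU. */
static unsigned int effective_mtu(const struct route *rt, unsigned int dev_mtu)
{
	return rt->mtu ? rt->mtu : dev_mtu;
}

/* Called when the device MTU changes from old_mtu to new_mtu. */
static void dev_mtu_changed(struct route *rt, unsigned int old_mtu,
			    unsigned int new_mtu)
{
	/* the new check: routes without an explicit mtu are left alone */
	if (!rt->mtu)
		return;

	/* shrink to fit, or follow the device if the route was tracking it */
	if (rt->mtu >= new_mtu || rt->mtu == old_mtu)
		rt->mtu = new_mtu;
}
```

A route without an explicit mtu keeps its metric at 0 through any sequence of
device changes, while effective_mtu() still reports the current device value.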

Signed-off-by: Maciej Żenczykowski 
CC: Eric Dumazet 
---
 net/ipv6/route.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 947ed1ded026..fa90d14302f7 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -2758,6 +2758,7 @@ static int rt6_mtu_change_route(struct rt6_info *rt, void 
*p_arg)
   PMTU discouvery.
 */
if (rt->dst.dev == arg->dev &&
+   dst_metric_raw(&rt->dst, RTAX_MTU) &&
!dst_metric_locked(&rt->dst, RTAX_MTU)) {
if (rt->rt6i_flags & RTF_CACHE) {
/* For RTF_CACHE with rt6i_pmtu == 0
-- 
2.8.0.rc3.226.g39d4020

Re: [PATCH net] ipv4: update comment to document GSO fragmentation cases.

2016-11-04 Thread Shmulik Ladkani

Hi,

On Fri,  4 Nov 2016 12:22:38 -0400 Lance Richardson  wrote:
> This is a follow-up to commit eb96202f1e34 ("ipv4: allow local
> fragmentation in ip_finish_output_gso()"), updating the comment
> documenting cases in which fragmentation is needed for egress
> GSO packets.
> 
> Suggested-by: Shmulik Ladkani 
> Signed-off-by: Lance Richardson 

Thanks Lance.

Reviewed-by: Shmulik Ladkani

[PATCH net-next 7/7] vxlan: remove unused vxlan_dev_dst_port()

2016-11-04 Thread Pravin B Shelar

Signed-off-by: Pravin B Shelar 
---
 include/net/vxlan.h | 10 --
 1 file changed, 10 deletions(-)

diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 308adc4..49a5920 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -281,16 +281,6 @@ struct vxlan_dev {
 struct net_device *vxlan_dev_create(struct net *net, const char *name,
u8 name_assign_type, struct vxlan_config 
*conf);
 
-static inline __be16 vxlan_dev_dst_port(struct vxlan_dev *vxlan,
-   unsigned short family)
-{
-#if IS_ENABLED(CONFIG_IPV6)
-   if (family == AF_INET6)
-   return inet_sk(vxlan->vn6_sock->sock->sk)->inet_sport;
-#endif
-   return inet_sk(vxlan->vn4_sock->sock->sk)->inet_sport;
-}
-
 static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 netdev_features_t features)
 {
-- 
1.9.1

[PATCH net-next 6/7] vxlan: simplify vxlan xmit

2016-11-04 Thread Pravin B Shelar

The existing vxlan xmit function handles two distinct cases:
1. vxlan net device
2. vxlan lwt device.
By separating the initialization for these two cases, the egress
path becomes simpler.

Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c | 79 -
 1 file changed, 35 insertions(+), 44 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d496763..7bba92b 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1978,8 +1978,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
struct dst_cache *dst_cache;
struct ip_tunnel_info *info;
struct vxlan_dev *vxlan = netdev_priv(dev);
-   struct sock *sk;
-   const struct iphdr *old_iph;
+   const struct iphdr *old_iph = ip_hdr(skb);
union vxlan_addr *dst;
union vxlan_addr remote_ip, local_ip;
union vxlan_addr *src;
@@ -1987,7 +1986,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
struct vxlan_metadata *md = &_md;
__be16 src_port = 0, dst_port;
__be32 vni, label;
-   __be16 df = 0;
__u8 tos, ttl;
int err;
u32 flags = vxlan->flags;
@@ -1997,11 +1995,34 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
info = skb_tunnel_info(skb);
 
if (rdst) {
+   dst = &rdst->remote_ip;
+   if (vxlan_addr_any(dst)) {
+   if (did_rsc) {
+   /* short-circuited back to local bridge */
+   vxlan_encap_bypass(skb, vxlan, vxlan);
+   return;
+   }
+   goto drop;
+   }
+
dst_port = rdst->remote_port ? rdst->remote_port : 
vxlan->cfg.dst_port;
vni = rdst->remote_vni;
-   dst = &rdst->remote_ip;
src = &vxlan->cfg.saddr;
dst_cache = &rdst->dst_cache;
+   md->gbp = skb->mark;
+   ttl = vxlan->cfg.ttl;
+   if (!ttl && vxlan_addr_multicast(dst))
+   ttl = 1;
+
+   tos = vxlan->cfg.tos;
+   if (tos == 1)
+   tos = ip_tunnel_get_dsfield(old_iph, skb);
+
+   if (dst->sa.sa_family == AF_INET)
+   udp_sum = !(flags & VXLAN_F_UDP_ZERO_CSUM_TX);
+   else
+   udp_sum = !(flags & VXLAN_F_UDP_ZERO_CSUM6_TX);
+   label = vxlan->cfg.label;
} else {
if (!info) {
WARN_ONCE(1, "%s: Missing encapsulation instructions\n",
@@ -2021,32 +2042,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
dst = &remote_ip;
src = &local_ip;
dst_cache = &info->dst_cache;
-   }
-
-   if (vxlan_addr_any(dst)) {
-   if (did_rsc) {
-   /* short-circuited back to local bridge */
-   vxlan_encap_bypass(skb, vxlan, vxlan);
-   return;
-   }
-   goto drop;
-   }
-
-   old_iph = ip_hdr(skb);
-
-   ttl = vxlan->cfg.ttl;
-   if (!ttl && vxlan_addr_multicast(dst))
-   ttl = 1;
-
-   tos = vxlan->cfg.tos;
-   if (tos == 1)
-   tos = ip_tunnel_get_dsfield(old_iph, skb);
-
-   label = vxlan->cfg.label;
-   src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->cfg.port_min,
-vxlan->cfg.port_max, true);
-
-   if (info) {
ttl = info->key.ttl;
tos = info->key.tos;
label = info->key.label;
@@ -2054,15 +2049,15 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
 
if (info->options_len)
md = ip_tunnel_info_opts(info);
-   } else {
-   md->gbp = skb->mark;
}
 
+   src_port = udp_flow_src_port(dev_net(dev), skb, vxlan->cfg.port_min,
+vxlan->cfg.port_max, true);
+
if (dst->sa.sa_family == AF_INET) {
struct vxlan_sock *sock4 = rcu_dereference(vxlan->vn4_sock);
struct rtable *rt;
-
-   sk = sock4->sock->sk;
+   __be16 df = 0;
 
rt = vxlan_get_route(vxlan, dev, sock4, skb,
 rdst ? rdst->remote_ifindex : 0, tos,
@@ -2078,18 +2073,17 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
rt->rt_flags);
if (err)
return;
-   udp_sum = !(flags & VXLAN_F_UDP_ZERO_CSUM_TX);
} else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT) {
df = htons(IP_DF);
}
-

[PATCH net-next 2/7] vxlan: simplify exception handling

2016-11-04 Thread Pravin B Shelar

vxlan egress path error handling has become complicated because it
needs to handle both IPv4 and IPv6 tunnel cases.
An earlier patch removed vlan handling from vxlan_build_skb(), so
vxlan_build_skb() no longer needs to free the skb and we can simplify
the xmit path by having a single error-handling path for both types of
tunnels.

Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c | 28 +++-
 1 file changed, 11 insertions(+), 17 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 756d826..a1e707f 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1789,7 +1789,7 @@ static int vxlan_build_skb(struct sk_buff *skb, struct 
dst_entry *dst,
return 0;
 
 out_free:
-   kfree_skb(skb);
+   dst_release(dst);
return err;
 }
 
@@ -1927,7 +1927,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
struct ip_tunnel_info *info;
struct vxlan_dev *vxlan = netdev_priv(dev);
struct sock *sk;
-   struct rtable *rt = NULL;
const struct iphdr *old_iph;
union vxlan_addr *dst;
union vxlan_addr remote_ip, local_ip;
@@ -2009,6 +2008,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
 
if (dst->sa.sa_family == AF_INET) {
struct vxlan_sock *sock4 = rcu_dereference(vxlan->vn4_sock);
+   struct rtable *rt;
 
if (!sock4)
goto drop;
@@ -2030,7 +2030,8 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
netdev_dbg(dev, "circular route to %pI4\n",
   &dst->sin.sin_addr.s_addr);
dev->stats.collisions++;
-   goto rt_tx_error;
+   ip_rt_put(rt);
+   goto tx_error;
}
 
/* Bypass encapsulation if the destination is local */
@@ -2058,7 +2059,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
err = vxlan_build_skb(skb, &rt->dst, sizeof(struct iphdr),
  vni, md, flags, udp_sum);
if (err < 0)
-   goto xmit_tx_error;
+   goto tx_error;
 
udp_tunnel_xmit_skb(rt, sk, skb, src->sin.sin_addr.s_addr,
dst->sin.sin_addr.s_addr, tos, ttl, df,
@@ -2117,11 +2118,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
skb_scrub_packet(skb, xnet);
err = vxlan_build_skb(skb, ndst, sizeof(struct ipv6hdr),
  vni, md, flags, udp_sum);
-   if (err < 0) {
-   dst_release(ndst);
-   dev->stats.tx_errors++;
-   return;
-   }
+   if (err < 0)
+   goto tx_error;
+
udp_tunnel6_xmit_skb(ndst, sk, skb, dev,
 &src->sin6.sin6_addr,
 &dst->sin6.sin6_addr, tos, ttl,
@@ -2133,17 +2132,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
 
 drop:
dev->stats.tx_dropped++;
-   goto tx_free;
+   dev_kfree_skb(skb);
+   return;
 
-xmit_tx_error:
-   /* skb is already freed. */
-   skb = NULL;
-rt_tx_error:
-   ip_rt_put(rt);
 tx_error:
dev->stats.tx_errors++;
-tx_free:
-   dev_kfree_skb(skb);
+   kfree_skb(skb);
 }
 
 /* Transmit local packets over Vxlan
-- 
1.9.1

[PATCH net-next 5/7] vxlan: simplify RTF_LOCAL handling.

2016-11-04 Thread Pravin B Shelar

Avoid duplicate code for handling RTF_LOCAL routes.

Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c | 78 +
 1 file changed, 43 insertions(+), 35 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 15319f1..d496763 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1945,6 +1945,33 @@ static void vxlan_encap_bypass(struct sk_buff *skb, 
struct vxlan_dev *src_vxlan,
}
 }
 
+static int check_route_rtf_local(struct sk_buff *skb, struct net_device *dev,
+struct vxlan_dev *vxlan, union vxlan_addr 
*daddr,
+__be32 dst_port, __be32 vni, struct dst_entry 
*dst,
+u32 rt_flags)
+{
+   /* Bypass encapsulation if the destination is local */
+   if (rt_flags & RTF_LOCAL &&
+   !(rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
+   struct vxlan_dev *dst_vxlan;
+
+   dst_release(dst);
+   dst_vxlan = vxlan_find_vni(vxlan->net, vni,
+  daddr->sa.sa_family, dst_port,
+  vxlan->flags);
+   if (!dst_vxlan) {
+   dev->stats.tx_errors++;
+   kfree_skb(skb);
+
+   return -ENOENT;
+   }
+   vxlan_encap_bypass(skb, vxlan, dst_vxlan);
+   return 1;
+   }
+
+   return 0;
+}
+
 static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
   struct vxlan_rdst *rdst, bool did_rsc)
 {
@@ -2045,26 +2072,16 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
if (IS_ERR(rt))
goto tx_error;
 
-   /* Bypass encapsulation if the destination is local */
-   if (!info && rt->rt_flags & RTCF_LOCAL &&
-   !(rt->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
-   struct vxlan_dev *dst_vxlan;
-
-   ip_rt_put(rt);
-   dst_vxlan = vxlan_find_vni(vxlan->net, vni,
-  dst->sa.sa_family, dst_port,
-  vxlan->flags);
-   if (!dst_vxlan)
-   goto tx_error;
-   vxlan_encap_bypass(skb, vxlan, dst_vxlan);
-   return;
-   }
-
-   if (!info)
+   if (!info) {
+   err = check_route_rtf_local(skb, dev, vxlan, dst,
+   dst_port, vni, &rt->dst,
+   rt->rt_flags);
+   if (err)
+   return;
udp_sum = !(flags & VXLAN_F_UDP_ZERO_CSUM_TX);
-   else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT)
+   } else if (info->key.tun_flags & TUNNEL_DONT_FRAGMENT) {
df = htons(IP_DF);
-
+   }
tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
err = vxlan_build_skb(skb, &rt->dst, sizeof(struct iphdr),
@@ -2079,7 +2096,6 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
} else {
struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
struct dst_entry *ndst;
-   u32 rt6i_flags;
 
sk = sock6->sock->sk;
 
@@ -2091,24 +2107,16 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
if (IS_ERR(ndst))
goto tx_error;
 
-   /* Bypass encapsulation if the destination is local */
-   rt6i_flags = ((struct rt6_info *)ndst)->rt6i_flags;
-   if (!info && rt6i_flags & RTF_LOCAL &&
-   !(rt6i_flags & (RTCF_BROADCAST | RTCF_MULTICAST))) {
-   struct vxlan_dev *dst_vxlan;
-
-   dst_release(ndst);
-   dst_vxlan = vxlan_find_vni(vxlan->net, vni,
-  dst->sa.sa_family, dst_port,
-  vxlan->flags);
-   if (!dst_vxlan)
-   goto tx_error;
-   vxlan_encap_bypass(skb, vxlan, dst_vxlan);
-   return;
-   }
+   if (!info) {
+   u32 rt6i_flags = ((struct rt6_info *)ndst)->rt6i_flags;
 
-   if (!info)
+   err = check_route_rtf_local(skb, dev, vxlan, dst,
+   dst_port, vni, ndst,
+   rt6i_flags);
+
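
The helper added in this patch reports three outcomes through one return value, and the caller in the diff above only needs to test `if (err) return;`. A minimal user-space sketch of that convention follows. The `toy_` names and the flag values are hypothetical stand-ins, not the kernel's definitions, and the `peer_found` argument stands in for the result of the vxlan_find_vni() lookup.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag bits for illustration only (not the kernel values). */
enum {
	TOY_RTF_LOCAL     = 1 << 0,
	TOY_RTF_BROADCAST = 1 << 1,
	TOY_RTF_MULTICAST = 1 << 2,
};

/*
 * Returns 0 when the packet must still be encapsulated,
 * 1 when it was handed to a local peer (caller is done),
 * and -ENOENT when the route is local but no peer device exists.
 */
static int toy_check_route_local(unsigned int rt_flags, int peer_found)
{
	if ((rt_flags & TOY_RTF_LOCAL) &&
	    !(rt_flags & (TOY_RTF_BROADCAST | TOY_RTF_MULTICAST))) {
		if (!peer_found)
			return -ENOENT;	/* error: packet would be freed here */
		return 1;		/* bypassed: packet consumed */
	}

	return 0;			/* not local: continue with encap */
}
```

Any non-zero result means the skb is no longer owned by the caller, which is why a single `if (err)` test is enough on the transmit path.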

[PATCH net-next 0/7] vxlan: General improvements.

2016-11-04 Thread Pravin B Shelar

The following patch series improves the vxlan fast path, removes
duplicate code and simplifies the vxlan xmit code path.

Pravin B Shelar (7):
  vxlan: avoid vlan processing in vxlan device.
  vxlan: simplify exception handling
  vxlan: avoid checking socket multiple times.
  vxlan: improve vxlan route lookup checks.
  vxlan: simplify RTF_LOCAL handling.
  vxlan: simplify vxlan xmit
  vxlan: remove unsed vxlan_dev_dst_port()

 drivers/net/vxlan.c | 264 ++--
 include/linux/if_vlan.h |  16 ---
 include/net/vxlan.h |  10 --
 3 files changed, 123 insertions(+), 167 deletions(-)

-- 
1.9.1

[PATCH net-next 3/7] vxlan: avoid checking socket multiple times.

2016-11-04 Thread Pravin B Shelar

Check the vxlan socket in vxlan6_get_route().

Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index a1e707f..6435d6a 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1830,6 +1830,7 @@ static struct rtable *vxlan_get_route(struct vxlan_dev 
*vxlan,
 
 #if IS_ENABLED(CONFIG_IPV6)
 static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan,
+ struct vxlan_sock *sock6,
  struct sk_buff *skb, int oif, u8 tos,
  __be32 label,
  const struct in6_addr *daddr,
@@ -1837,7 +1838,6 @@ static struct dst_entry *vxlan6_get_route(struct 
vxlan_dev *vxlan,
  struct dst_cache *dst_cache,
  const struct ip_tunnel_info *info)
 {
-   struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
struct dst_entry *ndst;
struct flowi6 fl6;
@@ -2070,11 +2070,9 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
struct dst_entry *ndst;
u32 rt6i_flags;
 
-   if (!sock6)
-   goto drop;
sk = sock6->sock->sk;
 
-   ndst = vxlan6_get_route(vxlan, skb,
+   ndst = vxlan6_get_route(vxlan, sock6, skb,
rdst ? rdst->remote_ifindex : 0, tos,
label, &dst->sin6.sin6_addr,
&src->sin6.sin6_addr,
@@ -2426,9 +2424,10 @@ static int vxlan_fill_metadata_dst(struct net_device 
*dev, struct sk_buff *skb)
ip_rt_put(rt);
} else {
 #if IS_ENABLED(CONFIG_IPV6)
+   struct vxlan_sock *sock6 = rcu_dereference(vxlan->vn6_sock);
struct dst_entry *ndst;
 
-   ndst = vxlan6_get_route(vxlan, skb, 0, info->key.tos,
+   ndst = vxlan6_get_route(vxlan, sock6, skb, 0, info->key.tos,
info->key.label, &info->key.u.ipv6.dst,
&info->key.u.ipv6.src, NULL, info);
if (IS_ERR(ndst))
-- 
1.9.1

[PATCH net-next 1/7] vxlan: avoid vlan processing in vxlan device.

2016-11-04 Thread Pravin B Shelar

The VxLAN device does not have special handling for vlan tagging on egress.
Therefore it does not make sense to expose the vlan offloading feature.

Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c |  9 +
 include/linux/if_vlan.h | 16 
 2 files changed, 1 insertion(+), 24 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index cb5cc7c..756d826 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1748,18 +1748,13 @@ static int vxlan_build_skb(struct sk_buff *skb, struct 
dst_entry *dst,
}
 
min_headroom = LL_RESERVED_SPACE(dst->dev) + dst->header_len
-   + VXLAN_HLEN + iphdr_len
-   + (skb_vlan_tag_present(skb) ? VLAN_HLEN : 0);
+   + VXLAN_HLEN + iphdr_len;
 
/* Need space for new headers (invalidates iph ptr) */
err = skb_cow_head(skb, min_headroom);
if (unlikely(err))
goto out_free;
 
-   skb = vlan_hwaccel_push_inside(skb);
-   if (WARN_ON(!skb))
-   return -ENOMEM;
-
err = iptunnel_handle_offloads(skb, type);
if (err)
goto out_free;
@@ -2527,10 +2522,8 @@ static void vxlan_setup(struct net_device *dev)
dev->features   |= NETIF_F_GSO_SOFTWARE;
 
dev->vlan_features = dev->features;
-   dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
dev->hw_features |= NETIF_F_SG | NETIF_F_HW_CSUM | NETIF_F_RXCSUM;
dev->hw_features |= NETIF_F_GSO_SOFTWARE;
-   dev->hw_features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
netif_keep_dst(dev);
dev->priv_flags |= IFF_NO_QUEUE;
 
diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 3319d97..8d5fcd6 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -399,22 +399,6 @@ static inline struct sk_buff 
*__vlan_hwaccel_push_inside(struct sk_buff *skb)
skb->vlan_tci = 0;
return skb;
 }
-/*
- * vlan_hwaccel_push_inside - pushes vlan tag to the payload
- * @skb: skbuff to tag
- *
- * Checks is tag is present in @skb->vlan_tci and if it is, it pushes the
- * VLAN tag from @skb->vlan_tci inside to the payload.
- *
- * Following the skb_unshare() example, in case of error, the calling function
- * doesn't have to worry about freeing the original skb.
- */
-static inline struct sk_buff *vlan_hwaccel_push_inside(struct sk_buff *skb)
-{
-   if (skb_vlan_tag_present(skb))
-   skb = __vlan_hwaccel_push_inside(skb);
-   return skb;
-}
 
 /**
  * __vlan_hwaccel_put_tag - hardware accelerated VLAN inserting
-- 
1.9.1

[PATCH net-next 4/7] vxlan: improve vxlan route lookup checks.

2016-11-04 Thread Pravin B Shelar

Move the route sanity checks to the respective vxlan[4/6]_get_route functions.
This allows us to perform all sanity checks before caching the dst so
that we can avoid these checks on subsequent packets.
This gives more accurate metadata information for packets from
fill_metadata_dst().

Signed-off-by: Pravin B Shelar 
---
 drivers/net/vxlan.c | 71 +
 1 file changed, 34 insertions(+), 37 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 6435d6a..15319f1 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1793,7 +1793,8 @@ static int vxlan_build_skb(struct sk_buff *skb, struct 
dst_entry *dst,
return err;
 }
 
-static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan,
+static struct rtable *vxlan_get_route(struct vxlan_dev *vxlan, struct 
net_device *dev,
+ struct vxlan_sock *sock4,
  struct sk_buff *skb, int oif, u8 tos,
  __be32 daddr, __be32 *saddr,
  struct dst_cache *dst_cache,
@@ -1803,6 +1804,9 @@ static struct rtable *vxlan_get_route(struct vxlan_dev 
*vxlan,
struct rtable *rt = NULL;
struct flowi4 fl4;
 
+   if (!sock4)
+   return ERR_PTR(-EIO);
+
if (tos && !info)
use_cache = false;
if (use_cache) {
@@ -1820,16 +1824,27 @@ static struct rtable *vxlan_get_route(struct vxlan_dev 
*vxlan,
fl4.saddr = *saddr;
 
rt = ip_route_output_key(vxlan->net, &fl4);
-   if (!IS_ERR(rt)) {
+   if (likely(!IS_ERR(rt))) {
+   if (rt->dst.dev == dev) {
+   netdev_dbg(dev, "circular route to %pI4\n", &daddr);
+   dev->stats.collisions++;
+   ip_rt_put(rt);
+   return ERR_PTR(-ELOOP);
+   }
+
*saddr = fl4.saddr;
if (use_cache)
dst_cache_set_ip4(dst_cache, &rt->dst, fl4.saddr);
+   } else {
+   netdev_dbg(dev, "no route to %pI4\n", &daddr);
+   dev->stats.tx_carrier_errors++;
}
return rt;
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
 static struct dst_entry *vxlan6_get_route(struct vxlan_dev *vxlan,
+ struct net_device *dev,
  struct vxlan_sock *sock6,
  struct sk_buff *skb, int oif, u8 tos,
  __be32 label,
@@ -1865,8 +1880,18 @@ static struct dst_entry *vxlan6_get_route(struct 
vxlan_dev *vxlan,
err = ipv6_stub->ipv6_dst_lookup(vxlan->net,
 sock6->sock->sk,
 &ndst, &fl6);
-   if (err < 0)
+   if (unlikely(err < 0)) {
+   netdev_dbg(dev, "no route to %pI6\n", daddr);
+   dev->stats.tx_carrier_errors++;
return ERR_PTR(err);
+   }
+
+   if (unlikely(ndst->dev == dev)) {
+   netdev_dbg(dev, "circular route to %pI6\n", daddr);
+   dst_release(ndst);
+   dev->stats.collisions++;
+   return ERR_PTR(-ELOOP);
+   }
 
*saddr = fl6.saddr;
if (use_cache)
@@ -2010,29 +2035,15 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
struct vxlan_sock *sock4 = rcu_dereference(vxlan->vn4_sock);
struct rtable *rt;
 
-   if (!sock4)
-   goto drop;
sk = sock4->sock->sk;
 
-   rt = vxlan_get_route(vxlan, skb,
+   rt = vxlan_get_route(vxlan, dev, sock4, skb,
 rdst ? rdst->remote_ifindex : 0, tos,
 dst->sin.sin_addr.s_addr,
 &src->sin.sin_addr.s_addr,
 dst_cache, info);
-   if (IS_ERR(rt)) {
-   netdev_dbg(dev, "no route to %pI4\n",
-  &dst->sin.sin_addr.s_addr);
-   dev->stats.tx_carrier_errors++;
-   goto tx_error;
-   }
-
-   if (rt->dst.dev == dev) {
-   netdev_dbg(dev, "circular route to %pI4\n",
-  &dst->sin.sin_addr.s_addr);
-   dev->stats.collisions++;
-   ip_rt_put(rt);
+   if (IS_ERR(rt))
goto tx_error;
-   }
 
/* Bypass encapsulation if the destination is local */
if (!info && rt->rt_flags & RTCF_LOCAL &&
@@ -2072,25 +2083,13 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct 
net_device *dev,
 
sk = sock6->sock->sk;
 
-   ndst = vxlan6_get_route(vxlan, sock6, skb,
+
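
With this patch the route lookup helpers report every failure as an encoded error pointer, so the caller needs only one IS_ERR() test. The sketch below is a user-space model of that idiom. The `toy_` helpers mimic the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() and are not the kernel implementation; `toy_get_route()` and the use of -ENETUNREACH for a missing route are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value. */
static void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Recover the errno from an encoded pointer. */
static long toy_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* True if the pointer lies in the top TOY_MAX_ERRNO addresses. */
static int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_route {
	int out_dev;	/* index of the device the route leaves through */
};

/* All sanity checks happen here, before a route would be cached. */
static struct toy_route *toy_get_route(struct toy_route *found,
				       int tunnel_dev, int have_sock)
{
	if (!have_sock)
		return toy_err_ptr(-EIO);
	if (!found)
		return toy_err_ptr(-ENETUNREACH);	/* assumed errno */
	if (found->out_dev == tunnel_dev)
		return toy_err_ptr(-ELOOP);		/* circular route */
	return found;
}
```

Because a rejected route never reaches the cache in this model, later packets that hit the cache can skip the checks, which is the saving the commit message describes.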

Re: [PATCH net] r8152: Fix broken RX checksums.

2016-11-04 Thread Mark Lord

On 16-11-04 09:50 AM, Mark Lord wrote:
> Yeah, the device or driver is definitely getting confused with rx_desc structures.
> I added code to check for unlikely rx_desc values, and it found this for starters:
> 
> rx_desc: 00480801 00480401 00480001 0048fc00 0048f800 0048f400 pkt_len=2045
> rx_data: 00 f0 48 00 00 ec 48 00 00 e8 48 00 00 e4 48 00 00 e0 48 00 00 dc 48 00 00 d8 48 00 00 d4 48 00
> rx_data: 00 d0 48 00 00 cc 48 00 00 c8 48 00 00 c4 48 00 00 c0 48 00 00 bc 48 00 00 b8 48 00 00 b4 48 00
> rx_data: 00 b0 48 00 00 ac 48 00 00 01 00 00 81 ed 00 00 00 01 00 00 00 00 00 00 00 00 00 02 4d ac 00 00
> rx_data: 10 00 ff ff ff ff 00 00 01 28 83 d6 ff 6d 00 20 25 b1 58 1b 68 ff 00 05 20 01 56 41 17 35 00 00
> ...
> 
> The MTU/MRU on this link is the standard 1500 bytes, so a pkt_len of 2045 isn't valid here.
> And the rx_desc values look an awful lot like the rx_data values that follow it.
> 
> There's definitely more broken here than just TCP RX checksums.

I spent a bit more time on this again today, and made progress.
The issue seems to be stale rx buffers.
I'll discuss further offline with Hayes Wang.

-- 
Mark Lord
Real-Time Remedies Inc.
ml...@pobox.com

Re: Coding Style: Reverse XMAS tree declarations ?

2016-11-04 Thread Lino Sanfilippo

On 04.11.2016 18:44, Joe Perches wrote:
> On Fri, 2016-11-04 at 11:07 -0400, David Miller wrote:
>> From: Lino Sanfilippo 
>> > On 04.11.2016 07:53, Joe Perches wrote:
>> >> CHECK:REVERSE_XMAS_TREE: Prefer ordering declarations longest to
>> >> shortest
>> >> #446: FILE: drivers/net/ethernet/ethoc.c:446:
>> >> +int size = bd.stat >> 16;
>> >> +struct sk_buff *skb;
>> > should not this case be valid? Optically the longer line is already
>> > before the shorter.
>> > I think that the whole point in using this reverse xmas tree ordering
>> > is to have
>> > the code optically tidied up and not to enforce ordering between
>> > variable name lengths.
>> 
>> That's correct.
> 
> And also another reason the whole reverse xmas tree
> automatic declaration layout concept is IMO dubious.
> 
> Basically, you're looking not at the initial ordering
> of automatics as important, but helping find a specific
> automatic when reversing from reading code is not always
> correct.
> 
> Something like:
> 
> static void function(args, ...)
> {
>   [longish list of reverse xmas tree identifiers...]
>   struct foo *bar = longish_function(args, ...);
>   struct foobarbaz *qux;
>   [more identifiers]
> 
>   [multiple screenfuls of code later...]
> 
>   new_function(..., bar, ...);
> 
>   [more code...]
> }
> 
> and the reverse xmas tree helpfulness of looking up the
> type of bar is neither obvious nor easy.
> 

In this case it is IMHO rather the declaration + initialization that makes
"bar" hard to find at one glance, not the use of RXT. You could do something like

[longish list of reverse xmas tree identifiers...]
struct foobarbaz *qux;
struct foo *bar;

bar = longish_function(args, ...);

to increase readability. 

Personally I find it more readable to always use a separate line for
initializations by means of functions (regardless of whether the RXT scheme
is used or not).
 
> My preference would be for a bar that serves coffee and alcohol.
> 

At least a bar like this should not be too hard to find :)

Regards,
Lino
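
A small compilable sketch of the layout suggested above: declarations ordered from the longest line to the shortest, with the pointer assigned on its own line instead of in the declaration. The `toy_frame` type and `toy_total_len()` are hypothetical and exist only to show the ordering.

```c
#include <assert.h>
#include <stddef.h>

struct toy_frame {
	size_t len;
};

static size_t toy_total_len(const struct toy_frame *frames, size_t count)
{
	const struct toy_frame *frame;	/* longest declaration first */
	size_t total = 0;
	size_t i;			/* shortest declaration last */

	for (i = 0; i < count; i++) {
		frame = &frames[i];	/* assigned separately from its declaration */
		total += frame->len;
	}

	return total;
}
```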

Re: Coding Style: Reverse XMAS tree declarations ? (was Re: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet)

2016-11-04 Thread David VomLehn

On Fri, Nov 04, 2016 at 10:05:15AM -0700, Randy Dunlap wrote:
> On 11/03/16 23:53, Joe Perches wrote:
> > On Thu, 2016-11-03 at 15:58 -0400, David Miller wrote:
> >> From: Madalin Bucur 
> >> Date: Wed, 2 Nov 2016 22:17:26 +0200
> >>
> >>> This introduces the Freescale Data Path Acceleration Architecture
> >>> +static inline size_t bpool_buffer_raw_size(u8 index, u8 cnt)
> >>> +{
> >>> + u8 i;
> >>> + size_t res = DPAA_BP_RAW_SIZE / 2;
> >>
> >> Always order local variable declarations from longest to shortest line,
> >> also known as Reverse Christmas Tree Format.
> > 
> > I think this declaration sorting order is misguided but
> > here's a possible change to checkpatch adding a test for it
> > that does this test just for net/ and drivers/net/
> 
> I agree with the misguided part.
> That's not actually in CodingStyle AFAICT. Where did this come from?
> 
> 
> thanks.
> -- 
> ~Randy

This puzzles me. The CodingStyle gives some pretty reasonable rationales
for coding style over and above the "it's easier to read if it all looks the
same". I can see rationales for other approaches (and I am not proposing
any of these):

alphabetic order    Easier to search for declarations
complex to simple   As in, structs and unions, pointers to simple
                    data (int, char), simple data. It seems like I
                    can deduce the simple types from usage, but for
                    more complex ones I need to know things like
                    the particular structure.
group by usage      Mirror the ontological locality in the code

Do we have a basis for thinking this is easier or more consistent than
any other approach?
-- 
David VL

[PATCH net] sock: fix sendmmsg for partial sendmsg

2016-11-04 Thread Soheil Hassas Yeganeh

From: Soheil Hassas Yeganeh 

Do not send the next message in sendmmsg for partial sendmsg
invocations.

sendmmsg assumes that it can continue sending the next message
when the return value of the individual sendmsg invocations
is positive. It results in corrupting the data for TCP,
SCTP, and UNIX streams.

For example, sendmmsg([["abcd"], ["efgh"]]) can result in a stream
of "aefgh" if the first sendmsg invocation sends only the first
byte while the second sendmsg goes through.

Datagram sockets either send the entire datagram or fail, so
this patch affects only sockets of type SOCK_STREAM and
SOCK_SEQPACKET.

Fixes: 228e548e6020 ("net: Add sendmmsg socket system call")
Signed-off-by: Soheil Hassas Yeganeh 
Signed-off-by: Eric Dumazet 
Signed-off-by: Willem de Bruijn 
Signed-off-by: Neal Cardwell 
---
 net/socket.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/socket.c b/net/socket.c
index 5a9bf5e..272518b 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -2038,6 +2038,8 @@ int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg, 
unsigned int vlen,
if (err)
break;
++datagrams;
+   if (msg_data_left(&msg_sys))
+   break;
cond_resched();
}
 
-- 
2.8.0.rc3.226.g39d4020
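
The "aefgh" example from the commit message can be reproduced with a toy user-space model. This is only a sketch: `toy_stream`, `toy_sendmsg()` and `toy_sendmmsg()` are hypothetical stand-ins, the `room` array is an assumption that dictates how many bytes each individual send accepts, and the `stop_on_partial` flag models the two lines added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_stream {
	char data[32];
	size_t len;
};

/* Append at most `room` bytes of msg to the stream; return bytes accepted. */
static size_t toy_sendmsg(struct toy_stream *s, const char *msg, size_t room)
{
	size_t n = strlen(msg);

	if (n > room)
		n = room;
	assert(s->len + n < sizeof(s->data));
	memcpy(s->data + s->len, msg, n);
	s->len += n;
	s->data[s->len] = '\0';

	return n;
}

/* Send vlen messages in order; return how many were started. */
static int toy_sendmmsg(struct toy_stream *s, const char *const msgs[],
			const size_t room[], int vlen, int stop_on_partial)
{
	int datagrams = 0;
	int i;

	for (i = 0; i < vlen; i++) {
		size_t sent = toy_sendmsg(s, msgs[i], room[i]);

		++datagrams;
		/* models the msg_data_left() check added by the patch */
		if (stop_on_partial && sent < strlen(msgs[i]))
			break;
	}

	return datagrams;
}
```

With messages "abcd" and "efgh" and room for 1 and 4 bytes respectively, the model without the check leaves "aefgh" in the stream, while the model with the check stops after "a" and reports one message, so the caller can resend the remainder in order.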

Re: [PATCH net-next] sock: do not set sk_err in sock_dequeue_err_skb

2016-11-04 Thread Willem de Bruijn

On Thu, Nov 3, 2016 at 7:10 PM, Hannes Frederic Sowa
 wrote:
> [also cc'ed Andy, albeit this doesn't seem to solve his initial problem,
> right? ]

Indeed, this does not help disambiguate the source of an error
returned by a socketcall. It only reduces how often sk_err is leaked
into the socket call datapath. Local errors and timestamps never set
sk_err in the first place, so there can be no expectation that they
set it on dequeue.

I suspect that that line in sock_dequeue_err_skb originates from icmp
errors, as those do set sk_err whenever queueing an error. So
reenabling it when there are multiple errors on the queue makes sense,
if all queued errors would be icmp errors. But they are not. And that
path is racy, so there is no guarantee, either way. Processes that
rely on syscall blocking to read the errqueue still block for icmp
errors after this patch, due to the initial sk_err assignment on
enqueue.

Note that the comment in the patch that only TCP socket calls are
blocked is probably incorrect. TCP has its own check in tcp_sendmsg
and returns EPIPE when an sk_err is set, without clearing it. But
other send, recv, connect, .. implementations include a path to
sock_error that aborts the call (e.g., in __skb_try_recv_datagram),
returns the value of sk_err, and clears it. This is an inconsistency
between the protocols.

Andy's original problem mentions recvmsg. We could probably make that
unambiguous by returning a cmsg with the value of sk_err if RECVERR is
set and sk_err is the source of the error. But this still does not
solve the general issue, as other socketcalls, like send and connect,
can also return sk_err. We can modify the number of places that check
for it and currently abort system calls. Mainly sock_error(). One
option is to add a socket option (SOL_SOCKET, as sk_err is not limited
to when INET_RECVERR is set) to make that a noop. If the caller sets
this option it instead has to wait with poll to receive POLLERR and
call getsockopt SO_ERROR and recv MSG_ERRQUEUE to get the asynchronous
error.
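
For reference, the polling half of that flow can already be written with existing calls. The sketch below waits for nothing (zero timeout), checks for POLLERR and then fetches the pending error with getsockopt(SO_ERROR), which also clears it. The function name is hypothetical, the proposed socket option itself does not exist and is not shown, and reading the error queue with recvmsg(MSG_ERRQUEUE) is left out.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>

/*
 * Returns 0 if no asynchronous error is pending, the positive errno
 * value taken from SO_ERROR if one is, or -1 if a system call failed.
 */
static int toy_pending_sock_error(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = 0 };
	socklen_t len = sizeof(int);
	int err = 0;

	/* POLLERR is always reported in revents, even when not requested */
	if (poll(&pfd, 1, 0) < 0)
		return -1;
	if (!(pfd.revents & POLLERR))
		return 0;

	/* reading SO_ERROR returns the pending error and resets it */
	if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
		return -1;

	return err;
}
```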

Re: [PATCH net] fib_trie: correct /proc/net/route for large read buffer

2016-11-04 Thread Alexander Duyck

On Fri, Nov 4, 2016 at 12:07 PM, Jason Baron  wrote:
>
>
> On 11/04/2016 02:43 PM, Alexander Duyck wrote:
>>
>> On Fri, Nov 4, 2016 at 7:45 AM, Jason Baron  wrote:
>>>
>>> From: Jason Baron 
>>>
>>> When read() is called on /proc/net/route requesting a size that is one
>>> entry size (128 bytes) less than m->size or greater, the resulting output
>>> has missing and/or duplicate entries. Since m->size is typically
>>> PAGE_SIZE,
>>> for a PAGE_SIZE of 4,096 this means that reads requesting more than 3,968
>>> bytes will see bogus output.
>>>
>>> For example:
>>>
>>> for i in {100..200}; do
>>> ip route add 192.168.1.$i dev eth0
>>> done
>>> dd if=/proc/net/route of=/tmp/good bs=1024
>>> dd if=/proc/net/route of=/tmp/bad bs=4096
>>>
>>> # diff -q /tmp/good /tmp/bad
>>> Files /tmp/good and /tmp/bad differ
>>>
>>> I think this has gone unnoticed, since the output of 'netstat -r' and
>>> 'route' is generated by reading in 1,024 byte increments and thus not
>>> corrupted. Further, the number of entries in the route table needs to be
>>> sufficiently large in order to trigger the problematic case.
>>>
>>> The issue arises because fib_route_get_idx() does not properly handle
>>> the case where pos equals iter->pos. This case only arises when we have
>>> a large read buffer size because we end up re-requesting the last entry
>>> that overflowed m->buf. In the case of a smaller read buffer size,
>>> we don't exceed the size of m->buf, and thus fib_route_get_idx() is
>>> called
>>> with pos greater than iter->pos.
>>>
>>> Fix by properly handling the iter->pos == pos case.
>>>
>>> Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in
>>> /proc/net/route")
>>> Cc: Andy Whitcroft 
>>> Cc: Alexander Duyck 
>>> Signed-off-by: Jason Baron 
>>> ---
>>>  net/ipv4/fib_trie.c | 12 ++--
>>>  1 file changed, 10 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>>> index 31cef3602585..1017533fc75c 100644
>>> --- a/net/ipv4/fib_trie.c
>>> +++ b/net/ipv4/fib_trie.c
>>> @@ -2411,12 +2411,17 @@ static struct key_vector
>>> *fib_route_get_idx(struct fib_route_iter *iter,
>>> loff_t pos)
>>>  {
>>> struct key_vector *l, **tp = &iter->tnode;
>>> +   loff_t saved_pos = 0;
>>> t_key key;
>>>
>>> /* use cache location of next-to-find key */
>>> if (iter->pos > 0 && pos >= iter->pos) {
>>> pos -= iter->pos;
>>> key = iter->key;
>>> +   if (pos == 0) {
>>> +   saved_pos = iter->pos;
>>> +   key--;
>>> +   }
>>> } else {
>>> iter->pos = 0;
>>> key = 0;
>>> @@ -2436,10 +2441,13 @@ static struct key_vector
>>> *fib_route_get_idx(struct fib_route_iter *iter,
>>> break;
>>> }
>>>
>>> -   if (l)
>>> +   if (l) {
>>> iter->key = key;/* remember it */
>>> -   else
>>> +   if (saved_pos)
>>> +   iter->pos = saved_pos;
>>> +   } else {
>>> iter->pos = 0;  /* forget it */
>>> +   }
>>>
>>> return l;
>>>  }
>>
>>
>> This doesn't seem correct to me.  I will have to look through this.
>> My understanding is that the value of iter->pos is supposed to be the
>> next position for us to grab, not the last one that was retrieved.  If
>> we are trying to re-request the last value then we should be falling
>> back into the else case for this since pos should be one less than
>> iter->pos.  The problem is the table could change out from under us
>> which is one of the reasons why we don't want to try and rewind the
>> key like you are doing here.
>>
>> - Alex
>>
>
> Hi Alex,
>
> In this case, seq_read() has called m->op->next(), which sets iter->pos
> equal to pos and iter->key to key + 1. However, when we then go to output
> the item associated with key, the 'm->op->next()' call overflows. Thus, we
> have a situation where iter->pos equals pos, iter->key = key + 1, but we
> have not displayed the item at position 'key' (thus the bug is that we miss
> the item at key).
>
> The change I proposed was simply to restart the search from 'key' in this
> case. If that item has disappeared, we will output the next one, or if it's
> been replaced we will display its replacement. I think that is
> ok?
>
> The bug could also be fixed by changing:
>
> if (iter->pos > 0 && pos >= iter->pos) {
>
> to say:
>
> if (iter->pos > 0 && pos > iter->pos) {
>
> But that restarts the search on every overflow, which could mean every page
> size, and that seems suboptimal to me. Like-wise, if we make pos 1 less than
> iter->pos that restarts the search. The idea with this patch is to not force
> us to redo the entire search on each overflow.
>
> Thanks,
>
> -Jason

Actually I think the underlying issue is that we still have an
unresolved off by one error.  Specifically

Re: [PATCH net] fib_trie: correct /proc/net/route for large read buffer

2016-11-04 Thread Jason Baron




On 11/04/2016 02:43 PM, Alexander Duyck wrote:

On Fri, Nov 4, 2016 at 7:45 AM, Jason Baron  wrote:

From: Jason Baron 

When read() is called on /proc/net/route requesting a size that is one
entry size (128 bytes) less than m->size or greater, the resulting output
has missing and/or duplicate entries. Since m->size is typically PAGE_SIZE,
for a PAGE_SIZE of 4,096 this means that reads requesting more than 3,968
bytes will see bogus output.

For example:

for i in {100..200}; do
ip route add 192.168.1.$i dev eth0
done
dd if=/proc/net/route of=/tmp/good bs=1024
dd if=/proc/net/route of=/tmp/bad bs=4096

# diff -q /tmp/good /tmp/bad
Files /tmp/good and /tmp/bad differ

I think this has gone unnoticed, since the output of 'netstat -r' and
'route' is generated by reading in 1,024 byte increments and thus not
corrupted. Further, the number of entries in the route table needs to be
sufficiently large in order to trigger the problematic case.

The issue arises because fib_route_get_idx() does not properly handle
the case where pos equals iter->pos. This case only arises when we have
a large read buffer size because we end up re-requesting the last entry
that overflowed m->buf. In the case of a smaller read buffer size,
we don't exceed the size of m->buf, and thus fib_route_get_idx() is called
with pos greater than iter->pos.

Fix by properly handling the iter->pos == pos case.

Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in 
/proc/net/route")
Cc: Andy Whitcroft 
Cc: Alexander Duyck 
Signed-off-by: Jason Baron 
---
 net/ipv4/fib_trie.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
index 31cef3602585..1017533fc75c 100644
--- a/net/ipv4/fib_trie.c
+++ b/net/ipv4/fib_trie.c
@@ -2411,12 +2411,17 @@ static struct key_vector *fib_route_get_idx(struct 
fib_route_iter *iter,
loff_t pos)
 {
struct key_vector *l, **tp = &iter->tnode;
+   loff_t saved_pos = 0;
t_key key;

/* use cache location of next-to-find key */
if (iter->pos > 0 && pos >= iter->pos) {
pos -= iter->pos;
key = iter->key;
+   if (pos == 0) {
+   saved_pos = iter->pos;
+   key--;
+   }
} else {
iter->pos = 0;
key = 0;
@@ -2436,10 +2441,13 @@ static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
break;
}

-   if (l)
+   if (l) {
iter->key = key;/* remember it */
-   else
+   if (saved_pos)
+   iter->pos = saved_pos;
+   } else {
iter->pos = 0;  /* forget it */
+   }

return l;
 }


This doesn't seem correct to me.  I will have to look through this.
My understanding is that the value of iter->pos is supposed to be the
next position for us to grab, not the last one that was retrieved.  If
we are trying to re-request the last value then we should be falling
back into the else case for this since pos should be one less than
iter->pos.  The problem is the table could change out from under us
which is one of the reasons why we don't want to try and rewind the
key like you are doing here.

- Alex



Hi Alex,

In this case, seq_read() has called m->op->next(), which sets iter->pos
equal to pos and iter->key to key + 1. However, when we then go to
output the item associated with key, the m->op->show() call overflows
m->buf. Thus, we have a situation where iter->pos equals pos and
iter->key equals key + 1, but we have not displayed the item at
position 'key' (so the bug is that we miss the item at key).


The change I proposed was simply to restart the search from 'key' in
this case. If that item has disappeared, we will output the next one, or
if it's been replaced we will display its replacement. I think that is
ok?

The bug could also be fixed by changing:

if (iter->pos > 0 && pos >= iter->pos) {

to say:

if (iter->pos > 0 && pos > iter->pos) {

But that restarts the search on every overflow, which could mean once
per page of output, and that seems suboptimal to me. Likewise, if we
make pos 1 less than iter->pos, that restarts the search. The idea with
this patch is to not force us to redo the entire search on each overflow.


Thanks,

-Jason
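
To make the pos == iter->pos case concrete, here is a small user-space model, not the kernel code: a sorted array stands in for the trie, and the iterator caches the position and key of the last entry it returned, so that re-requesting the same position after an overflow resumes from the cache instead of rescanning. All model_* names and the sample keys are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the iterator state discussed above. */
struct model_iter {
	long pos;		/* position of the last entry returned, 0 = no cache */
	unsigned int key;	/* key of the last entry returned */
};

/* Sorted, nonzero keys; 0 is reserved to mean "not found" in this model. */
static const unsigned int model_keys[] = { 10, 20, 30, 40, 50 };
#define MODEL_NKEYS (sizeof(model_keys) / sizeof(model_keys[0]))

/* Index of the first key >= 'key', or MODEL_NKEYS if there is none. */
static size_t model_leaf_walk(unsigned int key)
{
	size_t i;

	for (i = 0; i < MODEL_NKEYS; i++)
		if (model_keys[i] >= key)
			return i;
	return MODEL_NKEYS;
}

/*
 * Return the key of entry number 'pos' (1-based), or 0 if out of range.
 * '*steps' reports how many entries had to be walked to get there.
 */
static unsigned int model_get_idx(struct model_iter *it, long pos, int *steps)
{
	unsigned int key;
	long cur;
	size_t i;

	assert(it != NULL && steps != NULL && pos >= 1);

	if (it->pos > 0 && pos >= it->pos) {
		/* resume from the entry handed out last time */
		cur = it->pos;
		key = it->key;
	} else {
		/* going backwards or nothing cached: restart from the top */
		cur = 1;
		key = 0;
	}

	i = model_leaf_walk(key);
	*steps = 0;
	while (i < MODEL_NKEYS && cur < pos) {
		i++;
		cur++;
		(*steps)++;
	}

	if (i == MODEL_NKEYS) {
		it->pos = 0;	/* forget it */
		return 0;
	}

	it->pos = cur;		/* remember it */
	it->key = model_keys[i];
	return model_keys[i];
}
```

In this model, asking for position 3 twice in a row returns the same key both times, and the second call walks zero entries. Because the lookup restarts from a key rather than an index, an entry that disappears between calls would simply yield the next larger key.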

Re: [PATCH net 0/6] Mellanox 100G mlx5 fixes 2016-11-04

2016-11-04 Thread David Miller

From: Saeed Mahameed 
Date: Fri,  4 Nov 2016 01:48:41 +0200

> This series contains six hot fixes of the mlx5 core and mlx5e driver.
> 
> Huy fixed an invalid pointer dereference on initialization flow for when
> the selected mlx5 load profile is out of range.
> 
> Or provided three eswitch offloads related fixes
>  - Prevent changing NS of a VF representor. 
>  - Handle matching on vlan priority for offloaded TC rules
>  - Set the actions for offloaded rules properly
> 
> On my part, I addressed the error flow related issues in
> mlx5e_open_channel reported by Jesper just this week.

Series applied, thanks.

Re: [PATCH net-next resend 00/13] ring reconfiguration and XDP support

2016-11-04 Thread David Miller

From: Jakub Kicinski 
Date: Thu,  3 Nov 2016 17:11:56 +

> This set adds support for ethtool channel API and XDP.

Series applied, thank you!

[PATCH net-next 1/2] tcp: shortcut listeners in tcp_get_info()

2016-11-04 Thread Eric Dumazet

Being lockless in tcp_get_info() is hard, because we need to add
specific synchronization in TCP fast path, like seqcount.

The following patch will change inet_diag_dump_icsk() to no longer
hold any lock for non listeners, so that we can properly acquire
socket lock in tcp_get_info() and let it return more consistent counters.

Signed-off-by: Eric Dumazet 
Signed-off-by: Yuchung Cheng 
Acked-by: Soheil Hassas Yeganeh 
Acked-by: Neal Cardwell 
---
 net/ipv4/tcp.c | 41 -
 1 file changed, 24 insertions(+), 17 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3251fe71f39f..117982be0cab 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2721,6 +2721,27 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 
info->tcpi_state = sk_state_load(sk);
 
+   /* Report meaningful fields for all TCP states, including listeners */
+   rate = READ_ONCE(sk->sk_pacing_rate);
+   rate64 = rate != ~0U ? rate : ~0ULL;
+   put_unaligned(rate64, &info->tcpi_pacing_rate);
+
+   rate = READ_ONCE(sk->sk_max_pacing_rate);
+   rate64 = rate != ~0U ? rate : ~0ULL;
+   put_unaligned(rate64, &info->tcpi_max_pacing_rate);
+
+   info->tcpi_reordering = tp->reordering;
+   info->tcpi_snd_cwnd = tp->snd_cwnd;
+
+   if (info->tcpi_state == TCP_LISTEN) {
+   /* listeners aliased fields :
+* tcpi_unacked -> Number of children ready for accept()
+* tcpi_sacked  -> max backlog
+*/
+   info->tcpi_unacked = sk->sk_ack_backlog;
+   info->tcpi_sacked = sk->sk_max_ack_backlog;
+   return;
+   }
info->tcpi_ca_state = icsk->icsk_ca_state;
info->tcpi_retransmits = icsk->icsk_retransmits;
info->tcpi_probes = icsk->icsk_probes_out;
@@ -2748,13 +2769,9 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
info->tcpi_snd_mss = tp->mss_cache;
info->tcpi_rcv_mss = icsk->icsk_ack.rcv_mss;
 
-   if (info->tcpi_state == TCP_LISTEN) {
-   info->tcpi_unacked = sk->sk_ack_backlog;
-   info->tcpi_sacked = sk->sk_max_ack_backlog;
-   } else {
-   info->tcpi_unacked = tp->packets_out;
-   info->tcpi_sacked = tp->sacked_out;
-   }
+   info->tcpi_unacked = tp->packets_out;
+   info->tcpi_sacked = tp->sacked_out;
+
info->tcpi_lost = tp->lost_out;
info->tcpi_retrans = tp->retrans_out;
info->tcpi_fackets = tp->fackets_out;
@@ -2768,23 +2785,13 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
info->tcpi_rtt = tp->srtt_us >> 3;
info->tcpi_rttvar = tp->mdev_us >> 2;
info->tcpi_snd_ssthresh = tp->snd_ssthresh;
-   info->tcpi_snd_cwnd = tp->snd_cwnd;
info->tcpi_advmss = tp->advmss;
-   info->tcpi_reordering = tp->reordering;
 
info->tcpi_rcv_rtt = jiffies_to_usecs(tp->rcv_rtt_est.rtt)>>3;
info->tcpi_rcv_space = tp->rcvq_space.space;
 
info->tcpi_total_retrans = tp->total_retrans;
 
-   rate = READ_ONCE(sk->sk_pacing_rate);
-   rate64 = rate != ~0U ? rate : ~0ULL;
-   put_unaligned(rate64, &info->tcpi_pacing_rate);
-
-   rate = READ_ONCE(sk->sk_max_pacing_rate);
-   rate64 = rate != ~0U ? rate : ~0ULL;
-   put_unaligned(rate64, &info->tcpi_max_pacing_rate);
-
do {
start = u64_stats_fetch_begin_irq(&tp->syncp);
put_unaligned(tp->bytes_acked, &info->tcpi_bytes_acked);
-- 
2.8.0.rc3.226.g39d4020
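
The reordering in the diff above can be summarized with a small user-space model, offered as a sketch rather than the kernel function: fields that are meaningful in every state are filled first, then listeners get their two aliased fields and return early. The struct layouts and the MODEL_* state values are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical state values for the model only. */
enum { MODEL_ESTABLISHED = 1, MODEL_LISTEN = 10 };

struct model_sock {
	int state;
	uint32_t pacing_rate;		/* ~0U means "no limit" */
	uint32_t ack_backlog;		/* children ready for accept() */
	uint32_t max_ack_backlog;	/* max backlog */
	uint32_t packets_out;
	uint32_t sacked_out;
	uint32_t lost_out;
};

struct model_info {
	int state;
	uint64_t pacing_rate;
	uint32_t unacked;
	uint32_t sacked;
	uint32_t lost;
};

static void model_get_info(const struct model_sock *sk, struct model_info *info)
{
	assert(sk != NULL && info != NULL);

	memset(info, 0, sizeof(*info));
	info->state = sk->state;

	/* Meaningful for all states: widen the 32-bit "no limit" marker to 64 bits. */
	info->pacing_rate = sk->pacing_rate != ~0U ? sk->pacing_rate : ~0ULL;

	if (sk->state == MODEL_LISTEN) {
		/* listeners reuse two fields and skip everything else */
		info->unacked = sk->ack_backlog;
		info->sacked = sk->max_ack_backlog;
		return;
	}

	info->unacked = sk->packets_out;
	info->sacked = sk->sacked_out;
	info->lost = sk->lost_out;
}
```

With the early return, the per-connection counters are never read for a listener, which is what lets the later code take the socket lock only for non-listeners.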

[PATCH net-next 2/2] tcp: no longer hold ehash lock while calling tcp_get_info()

2016-11-04 Thread Eric Dumazet

We had various problems in the past in tcp_get_info() and used
specific synchronization to avoid deadlocks.

We would like to add more instrumentation points for TCP, and
avoiding grabbing the socket lock in tcp_get_info() was too costly.

Being able to lock the socket allows us to provide a consistent set
of fields.

inet_diag_dump_icsk() can make sure ehash locks are not
held any more when tcp_get_info() is called.

We can remove syncp added in commit d654976cbf85
("tcp: fix a potential deadlock in tcp_get_info()"), but we need
to use lock_sock_fast() instead of spin_lock_bh() since TCP input
path can now be run from process context.

Signed-off-by: Eric Dumazet 
Signed-off-by: Yuchung Cheng 
Acked-by: Soheil Hassas Yeganeh 
Acked-by: Neal Cardwell 
---
 include/linux/tcp.h  |  2 --
 net/ipv4/inet_diag.c | 48 +---
 net/ipv4/tcp.c   | 20 +---
 net/ipv4/tcp_input.c |  4 
 4 files changed, 42 insertions(+), 32 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index a17ae7b85218..32a7c7e35b71 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -176,8 +176,6 @@ struct tcp_sock {
 * sum(delta(snd_una)), or how many bytes
 * were acked.
 */
-   struct u64_stats_sync syncp; /* protects 64bit vars (cf tcp_get_info()) */
-
u32 snd_una;/* First byte we want an ack for*/
u32 snd_sml;/* Last byte of the most recently transmitted small packet */
u32 rcv_tstamp; /* timestamp of last received ACK (for keepalives) */
diff --git a/net/ipv4/inet_diag.c b/net/ipv4/inet_diag.c
index 3b34024202d8..4dea33e5f295 100644
--- a/net/ipv4/inet_diag.c
+++ b/net/ipv4/inet_diag.c
@@ -861,10 +861,11 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
 struct netlink_callback *cb,
 const struct inet_diag_req_v2 *r, struct nlattr *bc)
 {
+   bool net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN);
struct net *net = sock_net(skb->sk);
-   int i, num, s_i, s_num;
u32 idiag_states = r->idiag_states;
-   bool net_admin = netlink_net_capable(cb->skb, CAP_NET_ADMIN);
+   int i, num, s_i, s_num;
+   struct sock *sk;
 
if (idiag_states & TCPF_SYN_RECV)
idiag_states |= TCPF_NEW_SYN_RECV;
@@ -877,7 +878,6 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
 
for (i = s_i; i < INET_LHTABLE_SIZE; i++) {
struct inet_listen_hashbucket *ilb;
-   struct sock *sk;
 
num = 0;
ilb = &hashinfo->listening_hash[i];
@@ -922,13 +922,14 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
if (!(idiag_states & ~TCPF_LISTEN))
goto out;
 
+#define SKARR_SZ 16
for (i = s_i; i <= hashinfo->ehash_mask; i++) {
struct inet_ehash_bucket *head = &hashinfo->ehash[i];
spinlock_t *lock = inet_ehash_lockp(hashinfo, i);
struct hlist_nulls_node *node;
-   struct sock *sk;
-
-   num = 0;
+   struct sock *sk_arr[SKARR_SZ];
+   int num_arr[SKARR_SZ];
+   int idx, accum, res;
 
if (hlist_nulls_empty(&head->chain))
continue;
@@ -936,9 +937,12 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
if (i > s_i)
s_num = 0;
 
+next_chunk:
+   num = 0;
+   accum = 0;
spin_lock_bh(lock);
sk_nulls_for_each(sk, node, &head->chain) {
-   int state, res;
+   int state;
 
if (!net_eq(sock_net(sk), net))
continue;
@@ -962,21 +966,35 @@ void inet_diag_dump_icsk(struct inet_hashinfo *hashinfo, struct sk_buff *skb,
if (!inet_diag_bc_sk(bc, sk))
goto next_normal;
 
-   res = sk_diag_fill(sk, skb, r,
+   sock_hold(sk);
+   num_arr[accum] = num;
+   sk_arr[accum] = sk;
+   if (++accum == SKARR_SZ)
+   break;
+next_normal:
+   ++num;
+   }
+   spin_unlock_bh(lock);
+   res = 0;
+   for (idx = 0; idx < accum; idx++) {
+   if (res >= 0) {
+   res = sk_diag_fill(sk_arr[idx], skb, r,
   sk_user_ns(NETLINK_CB(cb->skb).sk),
   NETLINK_CB(cb->skb).portid,

[PATCH net-next 0/2] tcp: tcp_get_info() locking changes

2016-11-04 Thread Eric Dumazet

This short series prepares tcp_get_info() for more detailed information.

In order not to slow down the fast path, our goal is to use the normal
socket spinlock instead of custom synchronization.

All we need to ensure is that tcp_get_info() is not called with the
ehash lock held, which might deadlock, since packet processing would
acquire the spinlocks in the reverse order.

Eric Dumazet (2):
  tcp: shortcut listeners in tcp_get_info()
  tcp: no longer hold ehash lock while calling tcp_get_info()

 include/linux/tcp.h  |  2 --
 net/ipv4/inet_diag.c | 48 +--
 net/ipv4/tcp.c   | 57 
 net/ipv4/tcp_input.c |  4 
 4 files changed, 64 insertions(+), 47 deletions(-)

-- 
2.8.0.rc3.226.g39d4020
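
The approach visible in patch 2/2 (collect up to SKARR_SZ sockets while the bucket lock is held, then fill the reply after the lock is dropped) can be sketched in user space as follows. This is a single-threaded illustration with a flag in place of a real spinlock and plain integers in place of sockets; every model_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_BATCH 16	/* plays the role of SKARR_SZ in the patch */

struct model_bucket {
	int locked;		/* stand-in for the bucket spinlock */
	const int *items;	/* stand-in for the socket chain */
	size_t n;
};

static int model_calls_under_lock;	/* must stay 0 */
static long model_sum;			/* proves every item was processed */

/* Stand-in for the per-socket work that may need its own lock. */
static void model_fill(const struct model_bucket *b, int item)
{
	if (b->locked)
		model_calls_under_lock++;
	model_sum += item;
}

/* Walk the bucket in batches; returns the number of items processed. */
static size_t model_dump_bucket(struct model_bucket *b)
{
	int batch[MODEL_BATCH];
	size_t next = 0, total = 0;

	assert(b != NULL && (b->items != NULL || b->n == 0));

	while (next < b->n) {
		size_t accum = 0, i;

		/* Phase 1: under the lock, only remember what to process. */
		b->locked = 1;
		while (next < b->n && accum < MODEL_BATCH)
			batch[accum++] = b->items[next++];
		b->locked = 0;

		/* Phase 2: lock dropped, do the expensive work. */
		for (i = 0; i < accum; i++)
			model_fill(b, batch[i]);
		total += accum;
	}
	return total;
}
```

The real patch also takes a reference on each socket with sock_hold() before unlocking, so the pointers stay valid once the lock is gone; the model skips that because its items are plain integers.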

Re: [PATCH net-next v1 00/10] amd-xgbe: AMD XGBE driver updates 2016-11-03

2016-11-04 Thread David Miller

From: Tom Lendacky 
Date: Thu, 3 Nov 2016 13:17:28 -0500

> This patch series is targeted at preparing the driver for a new PCI version
> of the hardware.  After this series is applied, a follow-on series will
> introduce the support for the PCI version of the hardware.
> 
> The following updates and fixes are included in this driver update series:
> 
> - Fix formatting of PCS debug register dump
> - Prepare for priority-based FIFO allocation
> - Implement priority-based FIFO allocation
> - Prepare for working with more than one type of PCS/PHY
> - Prepare for the introduction of clause 37 auto-negotiation
> - Add support for clause 37 auto-negotiation
> - Prepare for supporting a new PCS register access method
> - Add support for 64-bit management counter registers
> - Update DMA channel status determination
> - Prepare for supporting PCI devices in addition to platform devices
> 
> This patch series is based on net-next.

Series applied, thanks Tom.

Re: [PATCH net-next v2] net: inet: Support UID-based routing

2016-11-04 Thread David Miller

From: Lorenzo Colitti 
Date: Fri,  4 Nov 2016 02:23:40 +0900

> This patchset adds support for per-UID routing.

Looks great, thanks for all of the hard work.

Series applied.

Please submit the necessary iproute2 patches, as needed.

Thanks again.

Re: [PATCH net] fib_trie: correct /proc/net/route for large read buffer

2016-11-04 Thread Alexander Duyck

On Fri, Nov 4, 2016 at 7:45 AM, Jason Baron  wrote:
> From: Jason Baron 
>
> When read() is called on /proc/net/route requesting a size that is one
> entry size (128 bytes) less than m->size or greater, the resulting output
> has missing and/or duplicate entries. Since m->size is typically PAGE_SIZE,
> for a PAGE_SIZE of 4,096 this means that reads requesting more than 3,968
> bytes will see bogus output.
>
> For example:
>
> for i in {100..200}; do
> ip route add 192.168.1.$i dev eth0
> done
> dd if=/proc/net/route of=/tmp/good bs=1024
> dd if=/proc/net/route of=/tmp/bad bs=4096
>
> # diff -q /tmp/good /tmp/bad
> Files /tmp/good and /tmp/bad differ
>
> I think this has gone unnoticed, since the output of 'netstat -r' and
> 'route' is generated by reading in 1,024 byte increments and thus not
> corrupted. Further, the number of entries in the route table needs to be
> sufficiently large in order to trigger the problematic case.
>
> The issue arises because fib_route_get_idx() does not properly handle
> the case where pos equals iter->pos. This case only arises when we have
> a large read buffer size because we end up re-requesting the last entry
> that overflowed m->buf. In the case of a smaller read buffer size,
> we don't exceed the size of m->buf, and thus fib_route_get_idx() is called
> with pos greater than iter->pos.
>
> Fix by properly handling the iter->pos == pos case.
>
> Fixes: 25b97c016b26 ("ipv4: off-by-one in continuation handling in /proc/net/route")
> Cc: Andy Whitcroft 
> Cc: Alexander Duyck 
> Signed-off-by: Jason Baron 
> ---
>  net/ipv4/fib_trie.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index 31cef3602585..1017533fc75c 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -2411,12 +2411,17 @@ static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
> loff_t pos)
>  {
> struct key_vector *l, **tp = &iter->tnode;
> +   loff_t saved_pos = 0;
> t_key key;
>
> /* use cache location of next-to-find key */
> if (iter->pos > 0 && pos >= iter->pos) {
> pos -= iter->pos;
> key = iter->key;
> +   if (pos == 0) {
> +   saved_pos = iter->pos;
> +   key--;
> +   }
> } else {
> iter->pos = 0;
> key = 0;
> @@ -2436,10 +2441,13 @@ static struct key_vector *fib_route_get_idx(struct fib_route_iter *iter,
> break;
> }
>
> -   if (l)
> +   if (l) {
> iter->key = key;/* remember it */
> -   else
> +   if (saved_pos)
> +   iter->pos = saved_pos;
> +   } else {
> iter->pos = 0;  /* forget it */
> +   }
>
> return l;
>  }

This doesn't seem correct to me.  I will have to look through this.
My understanding is that the value of iter->pos is supposed to be the
next position for us to grab, not the last one that was retrieved.  If
we are trying to re-request the last value then we should be falling
back into the else case for this since pos should be one less than
iter->pos.  The problem is the table could change out from under us
which is one of the reasons why we don't want to try and rewind the
key like you are doing here.

- Alex

Re: [PATCH net-next v2 00/11] net: dsa: mv88e6xxx: refine port operations

2016-11-04 Thread David Miller

From: Vivien Didelot 
Date: Fri,  4 Nov 2016 03:23:25 +0100

> The Marvell chips have one internal SMI device per port, containing a
> set of registers used to configure a port's link, STP state, default
> VLAN or addresses database, etc.
> 
> This patchset creates port files to implement the port operations as
> described in datasheets, and extend the chip ops structure with them.
> 
> Patches 1 to 6 implement accessors for port's STP state, port based VLAN
> map, default FID, default VID, and 802.1Q mode.
> 
> Patches 7 to 11 implement the port's MAC setup of link state, duplex
> mode, RGMII delay and speed, all accessed through port's register 0x01.
> 
> The new port's MAC setup code is used to re-implement the adjust_link
> code and correctly force the link down before changing any of the MAC
> settings, as requested by the datasheets.
> 
> The port's MAC accessors use values compatible with struct phy_device
> (e.g. DUPLEX_FULL) and extend them when needed (e.g. SPEED_MAX).
> 
> Changes in v2:
> 
>   - Strictly use new _UNFORCED values instead of re-using _UNKNOWN ones.

Series applied, thanks Vivien.

Re: [PATCH v2 2/2 ] net: ethernet: nb8800: handle all RGMII definitions

2016-11-04 Thread Sebastian Frias

Hi David,

On 11/04/2016 06:54 PM, David Miller wrote:
> From: Sebastian Frias 
> Date: Fri, 4 Nov 2016 18:02:15 +0100
> 
>> Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
>> and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
>> delay for tx and rx only") added several RGMII definitions:
>> PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
>> PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.
>>
>> Those are all RGMII modes (1Gbit) and must be considered that way when
>> setting the MAC mode or the pad mode for the HW to work properly.
>>
>> Signed-off-by: Sebastian Frias 
> 
> You cannot just repost one part of a patch series when you make changes.
> 
> You must always repost the entire series as a new fresh version, with
> changelog entries added to your "[PATCH v2 0/2] ..." header posting.
> 

Thanks for the information.

I sent v2, and then v3 because v2 had formatting issues. Hopefully it is
OK now.

Best regards,

Sebastian

[PATCH v3 2/2] net: ethernet: nb8800: handle all RGMII definitions

2016-11-04 Thread Sebastian Frias

Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
delay for tx and rx only") added several RGMII definitions:
PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.

Those are all RGMII modes (1Gbit) and must be considered that way when
setting the MAC mode or the pad mode for the HW to work properly.

Signed-off-by: Sebastian Frias 
---
 drivers/net/ethernet/aurora/nb8800.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
index d2855c9..fba2699 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -598,6 +598,7 @@ static irqreturn_t nb8800_irq(int irq, void *dev_id)
 static void nb8800_mac_config(struct net_device *dev)
 {
struct nb8800_priv *priv = netdev_priv(dev);
+   struct phy_device *phydev = dev->phydev;
bool gigabit = priv->speed == SPEED_1000;
u32 mac_mode_mask = RGMII_MODE | HALF_DUPLEX | GMAC_MODE;
u32 mac_mode = 0;
@@ -609,7 +610,7 @@ static void nb8800_mac_config(struct net_device *dev)
mac_mode |= HALF_DUPLEX;
 
if (gigabit) {
-   if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
+   if (phy_interface_is_rgmii(phydev))
mac_mode |= RGMII_MODE;
 
mac_mode |= GMAC_MODE;
@@ -1278,9 +1279,8 @@ static int nb8800_tangox_init(struct net_device *dev)
break;
 
case PHY_INTERFACE_MODE_RGMII:
-   pad_mode = PAD_MODE_RGMII;
-   break;
-
+   case PHY_INTERFACE_MODE_RGMII_ID:
+   case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_TXID:
pad_mode = PAD_MODE_RGMII;
break;
-- 
1.7.11.2

[PATCH v3 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias

The delay can be applied at PHY or MAC level, but since
PHY drivers will apply the delay at PHY level when using
one of the "internal delay" variants of RGMII mode
(like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
at MAC level causes issues.

Signed-off-by: Sebastian Frias 
---
 drivers/net/ethernet/aurora/nb8800.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
index b59aa35..d2855c9 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -1282,7 +1282,7 @@ static int nb8800_tangox_init(struct net_device *dev)
break;
 
case PHY_INTERFACE_MODE_RGMII_TXID:
-   pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
+   pad_mode = PAD_MODE_RGMII;
break;
 
default:
-- 
1.7.11.2

[PATCH v3 0/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias

This is v3 of the series; it fixes the formatting issues of v2.

In v2 of the series, only the second patch:
"net: ethernet: nb8800: handle all RGMII definitions" is modified
to account for Florian's suggestion.

Sebastian Frias (2):
  net: ethernet: nb8800: Do not apply TX delay at MAC level
  net: ethernet: nb8800: handle all RGMII definitions

 drivers/net/ethernet/aurora/nb8800.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

-- 
1.7.11.2

[PATCH v2 2/2] net: ethernet: nb8800: handle all RGMII definitions

2016-11-04 Thread Sebastian Frias

Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
delay for tx and rx only") added several RGMII definitions:
PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.

Those are all RGMII modes (1Gbit) and must be considered that way when
setting the MAC mode or the pad mode for the HW to work properly.

Signed-off-by: Sebastian Frias 
---
drivers/net/ethernet/aurora/nb8800.c | 8 
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
index d2855c9..fba2699 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -598,6 +598,7 @@ static irqreturn_t nb8800_irq(int irq, void *dev_id)
static void nb8800_mac_config(struct net_device *dev)
{
struct nb8800_priv *priv = netdev_priv(dev);
+ struct phy_device *phydev = dev->phydev;
bool gigabit = priv->speed == SPEED_1000;
u32 mac_mode_mask = RGMII_MODE | HALF_DUPLEX | GMAC_MODE;
u32 mac_mode = 0;
@@ -609,7 +610,7 @@ static void nb8800_mac_config(struct net_device *dev)
mac_mode |= HALF_DUPLEX;

if (gigabit) {
- if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
+ if (phy_interface_is_rgmii(phydev))
mac_mode |= RGMII_MODE;

mac_mode |= GMAC_MODE;
@@ -1278,9 +1279,8 @@ static int nb8800_tangox_init(struct net_device *dev)
break;

case PHY_INTERFACE_MODE_RGMII:
- pad_mode = PAD_MODE_RGMII;
- break;
-
+ case PHY_INTERFACE_MODE_RGMII_ID:
+ case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_TXID:
pad_mode = PAD_MODE_RGMII;
break;
-- 
1.7.11.2

[PATCH v2 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias

The delay can be applied at PHY or MAC level, but since
PHY drivers will apply the delay at PHY level when using
one of the "internal delay" variants of RGMII mode
(like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
at MAC level causes issues.

Signed-off-by: Sebastian Frias 
---
drivers/net/ethernet/aurora/nb8800.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
index b59aa35..d2855c9 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -1282,7 +1282,7 @@ static int nb8800_tangox_init(struct net_device *dev)
break;

case PHY_INTERFACE_MODE_RGMII_TXID:
- pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
+ pad_mode = PAD_MODE_RGMII;
break;

default:
-- 
1.7.11.2

[PATCH v2 0/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias

This is v2 of the series, only the second patch:
"net: ethernet: nb8800: handle all RGMII definitions" is modified
to account for Florian's suggestion.

Sebastian Frias (2):
  net: ethernet: nb8800: Do not apply TX delay at MAC level
  net: ethernet: nb8800: handle all RGMII definitions

 drivers/net/ethernet/aurora/nb8800.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

-- 
1.7.11.2

[net-next v2 1/7] stmmac: dwmac-sti: remove useless of_node check

2016-11-04 Thread Joachim Eastwood

Since dwmac-sti is a DT-only driver, checking for the OF node is not necessary.

Signed-off-by: Joachim Eastwood 
Acked-by: Giuseppe Cavallaro 
Tested-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 58c05ac..075ed42 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -270,9 +270,6 @@ static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
struct regmap *regmap;
int err;
 
-   if (!np)
-   return -EINVAL;
-
/* clk selection from extra syscfg register */
dwmac->clk_sel_reg = -ENXIO;
res = platform_get_resource_byname(pdev, IORESOURCE_MEM, "sti-clkconf");
-- 
2.10.2

[net-next v2 0/7] stmmac: dwmac-sti refactor+cleanup

2016-11-04 Thread Joachim Eastwood

This patch set aims to remove the init/exit callbacks from the
dwmac-sti driver and instead use standard PM callbacks. Doing this
will also allow us to clean up the driver.

Eventually the init/exit callbacks will be deprecated and removed
from all dwmac-* drivers except for dwmac-generic. Drivers will be
refactored to use standard PM and remove callbacks.

Changes since v1:
 - add missing static to sti_dwmac_pm_ops
 - s/sti_dwmac_set_phy_mode/sti_dwmac_set_mode
 - acked/tested by Giuseppe

Joachim Eastwood (7):
  stmmac: dwmac-sti: remove useless of_node check
  stmmac: dwmac-sti: remove clk NULL checks
  stmmac: dwmac-sti: add PM ops and resume function
  stmmac: dwmac-sti: move st,gmac_en parsing to sti_dwmac_parse_data
  stmmac: dwmac-sti: move clk_prepare_enable out of init and add error handling
  stmmac: dwmac-sti: clean up and rename sti_dwmac_init
  stmmac: dwmac-sti: remove unused priv dev member

 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 87 -
 1 file changed, 58 insertions(+), 29 deletions(-)

-- 
2.10.2

[net-next v2 3/7] stmmac: dwmac-sti: add PM ops and resume function

2016-11-04 Thread Joachim Eastwood

Implement PM callbacks and driver remove in the driver instead
of relying on the init/exit hooks in stmmac_platform. This gives
the driver more flexibility in how the code is organized.

Eventually the init/exit callbacks will be deprecated in favor
of the standard PM callbacks and driver remove function.

Signed-off-by: Joachim Eastwood 
Acked-by: Giuseppe Cavallaro 
Tested-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 47 +++--
 1 file changed, 37 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index f009bf4..bd6db22 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -253,12 +253,6 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
return 0;
 }
 
-static void sti_dwmac_exit(struct platform_device *pdev, void *priv)
-{
-   struct sti_dwmac *dwmac = priv;
-
-   clk_disable_unprepare(dwmac->clk);
-}
 static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
struct platform_device *pdev)
 {
@@ -352,8 +346,6 @@ static int sti_dwmac_probe(struct platform_device *pdev)
dwmac->fix_retime_src = data->fix_retime_src;
 
plat_dat->bsp_priv = dwmac;
-   plat_dat->init = sti_dwmac_init;
-   plat_dat->exit = sti_dwmac_exit;
plat_dat->fix_mac_speed = data->fix_retime_src;
 
ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
@@ -363,6 +355,41 @@ static int sti_dwmac_probe(struct platform_device *pdev)
return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
 }
 
+static int sti_dwmac_remove(struct platform_device *pdev)
+{
+   struct sti_dwmac *dwmac = get_stmmac_bsp_priv(&pdev->dev);
+   int ret = stmmac_dvr_remove(&pdev->dev);
+
+   clk_disable_unprepare(dwmac->clk);
+
+   return ret;
+}
+
+#ifdef CONFIG_PM_SLEEP
+static int sti_dwmac_suspend(struct device *dev)
+{
+   struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
+   int ret = stmmac_suspend(dev);
+
+   clk_disable_unprepare(dwmac->clk);
+
+   return ret;
+}
+
+static int sti_dwmac_resume(struct device *dev)
+{
+   struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
+   struct platform_device *pdev = to_platform_device(dev);
+
+   sti_dwmac_init(pdev, dwmac);
+
+   return stmmac_resume(dev);
+}
+#endif /* CONFIG_PM_SLEEP */
+
+static SIMPLE_DEV_PM_OPS(sti_dwmac_pm_ops, sti_dwmac_suspend,
+  sti_dwmac_resume);
+
 static const struct sti_dwmac_of_data stih4xx_dwmac_data = {
.fix_retime_src = stih4xx_fix_retime_src,
 };
@@ -382,10 +409,10 @@ MODULE_DEVICE_TABLE(of, sti_dwmac_match);
 
 static struct platform_driver sti_dwmac_driver = {
.probe  = sti_dwmac_probe,
-   .remove = stmmac_pltfr_remove,
+   .remove = sti_dwmac_remove,
.driver = {
.name   = "sti-dwmac",
-   .pm = &stmmac_pltfr_pm_ops,
+   .pm = &sti_dwmac_pm_ops,
.of_match_table = sti_dwmac_match,
},
 };
-- 
2.10.2

[net-next v2 5/7] stmmac: dwmac-sti: move clk_prepare_enable out of init and add error handling

2016-11-04 Thread Joachim Eastwood

Add clock error handling to probe and in the process move clock enabling
out of sti_dwmac_init() to make this easier.

Signed-off-by: Joachim Eastwood 
Acked-by: Giuseppe Cavallaro 
Tested-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 19 +++
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 269c7a5..b71888a 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -237,8 +237,6 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
u32 reg = dwmac->ctrl_reg;
u32 val;
 
-   clk_prepare_enable(dwmac->clk);
-
if (dwmac->gmac_en)
regmap_update_bits(regmap, reg, EN_MASK, EN);
 
@@ -348,11 +346,23 @@ static int sti_dwmac_probe(struct platform_device *pdev)
plat_dat->bsp_priv = dwmac;
plat_dat->fix_mac_speed = data->fix_retime_src;
 
-   ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
+   ret = clk_prepare_enable(dwmac->clk);
if (ret)
return ret;
 
-   return stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+   ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
+   if (ret)
+   goto disable_clk;
+
+   ret = stmmac_dvr_probe(&pdev->dev, plat_dat, &stmmac_res);
+   if (ret)
+   goto disable_clk;
+
+   return 0;
+
+disable_clk:
+   clk_disable_unprepare(dwmac->clk);
+   return ret;
 }
 
 static int sti_dwmac_remove(struct platform_device *pdev)
@@ -381,6 +391,7 @@ static int sti_dwmac_resume(struct device *dev)
struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
struct platform_device *pdev = to_platform_device(dev);
 
+   clk_prepare_enable(dwmac->clk);
sti_dwmac_init(pdev, dwmac);
 
return stmmac_resume(dev);
-- 
2.10.2
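
The error handling added to probe above follows the usual goto-unwind pattern: once the clock is enabled, every later failure jumps to a label that disables it again. A stand-alone sketch, with a counter in place of the clock and hypothetical model_* helpers in place of the driver functions:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device state for the model. */
struct model_dev {
	int clk_users;		/* >0 while the model clock is enabled */
	int fail_set_mode;	/* force the mode setup step to fail */
	int fail_core_probe;	/* force the core probe step to fail */
};

static int model_clk_prepare_enable(struct model_dev *d)
{
	d->clk_users++;
	return 0;
}

static void model_clk_disable_unprepare(struct model_dev *d)
{
	assert(d->clk_users > 0);
	d->clk_users--;
}

static int model_set_mode(const struct model_dev *d)
{
	return d->fail_set_mode ? -EINVAL : 0;
}

static int model_core_probe(const struct model_dev *d)
{
	return d->fail_core_probe ? -ENODEV : 0;
}

/* Same control flow as the probe hunk above: enable, init, probe, unwind. */
static int model_probe(struct model_dev *d)
{
	int ret;

	ret = model_clk_prepare_enable(d);
	if (ret)
		return ret;

	ret = model_set_mode(d);
	if (ret)
		goto disable_clk;

	ret = model_core_probe(d);
	if (ret)
		goto disable_clk;

	return 0;

disable_clk:
	model_clk_disable_unprepare(d);
	return ret;
}
```

The invariant worth checking is that the clock count is 1 after a successful probe and 0 after any failed one, so a failed probe never leaks an enabled clock.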

[net-next v2 6/7] stmmac: dwmac-sti: clean up and rename sti_dwmac_init

2016-11-04 Thread Joachim Eastwood

Rename sti_dwmac_init to sti_dwmac_set_mode, which better describes
what it really does.

Signed-off-by: Joachim Eastwood 
Acked-by: Giuseppe Cavallaro 
Tested-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index b71888a..ac17bff 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -229,9 +229,8 @@ static void stid127_fix_retime_src(void *priv, u32 spd)
regmap_update_bits(dwmac->regmap, reg, STID127_RETIME_SRC_MASK, val);
 }
 
-static int sti_dwmac_init(struct platform_device *pdev, void *priv)
+static int sti_dwmac_set_mode(struct sti_dwmac *dwmac)
 {
-   struct sti_dwmac *dwmac = priv;
struct regmap *regmap = dwmac->regmap;
int iface = dwmac->interface;
u32 reg = dwmac->ctrl_reg;
@@ -245,7 +244,7 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
val = (iface == PHY_INTERFACE_MODE_REVMII) ? 0 : ENMII;
regmap_update_bits(regmap, reg, ENMII_MASK, val);
 
-   dwmac->fix_retime_src(priv, dwmac->speed);
+   dwmac->fix_retime_src(dwmac, dwmac->speed);
 
return 0;
 }
@@ -350,7 +349,7 @@ static int sti_dwmac_probe(struct platform_device *pdev)
if (ret)
return ret;
 
-   ret = sti_dwmac_init(pdev, plat_dat->bsp_priv);
+   ret = sti_dwmac_set_mode(dwmac);
if (ret)
goto disable_clk;
 
@@ -389,10 +388,9 @@ static int sti_dwmac_suspend(struct device *dev)
 static int sti_dwmac_resume(struct device *dev)
 {
struct sti_dwmac *dwmac = get_stmmac_bsp_priv(dev);
-   struct platform_device *pdev = to_platform_device(dev);
 
clk_prepare_enable(dwmac->clk);
-   sti_dwmac_init(pdev, dwmac);
+   sti_dwmac_set_mode(dwmac);
 
return stmmac_resume(dev);
 }
-- 
2.10.2

Re: [PATCH v2 2/2 ] net: ethernet: nb8800: handle all RGMII definitions

2016-11-04 Thread David Miller

From: Sebastian Frias 
Date: Fri, 4 Nov 2016 18:02:15 +0100

> Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
> and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
> delay for tx and rx only") added several RGMII definitions:
> PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
> PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.
> 
> Those are all RGMII modes (1Gbit) and must be considered that way when
> setting the MAC mode or the pad mode for the HW to work properly.
> 
> Signed-off-by: Sebastian Frias 

You cannot just repost one part of a patch series when you make changes.

You must always repost the entire series as a new fresh version, with
changelog entries added to your "[PATCH v2 0/2] ..." header posting.

[net-next v2 7/7] stmmac: dwmac-sti: remove unused priv dev member

2016-11-04 Thread Joachim Eastwood

The dev member of struct sti_dwmac is not used anywhere in the driver
so let's just remove it.

Signed-off-by: Joachim Eastwood 
Acked-by: Giuseppe Cavallaro 
Tested-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index ac17bff..c9006ab 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -126,7 +126,6 @@ struct sti_dwmac {
struct clk *clk;/* PHY clock */
u32 ctrl_reg;   /* GMAC glue-logic control register */
int clk_sel_reg;/* GMAC ext clk selection register */
-   struct device *dev;
struct regmap *regmap;
bool gmac_en;
u32 speed;
@@ -274,7 +273,6 @@ static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
return err;
}
 
-   dwmac->dev = dev;
dwmac->interface = of_get_phy_mode(np);
dwmac->regmap = regmap;
dwmac->gmac_en = of_property_read_bool(np, "st,gmac_en");
-- 
2.10.2

[net-next v2 2/7] stmmac: dwmac-sti: remove clk NULL checks

2016-11-04 Thread Joachim Eastwood

Since sti_dwmac_parse_data() sets dwmac->clk to NULL if no clock was
provided in DT, and NULL is a valid clock, there is no need to check for
NULL before using this clock.

Signed-off-by: Joachim Eastwood 
Acked-by: Giuseppe Cavallaro 
Tested-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index 075ed42..f009bf4 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -191,7 +191,7 @@ static void stih4xx_fix_retime_src(void *priv, u32 spd)
}
}
 
-   if (src == TX_RETIME_SRC_CLKGEN && dwmac->clk && freq)
+   if (src == TX_RETIME_SRC_CLKGEN && freq)
clk_set_rate(dwmac->clk, freq);
 
regmap_update_bits(dwmac->regmap, reg, STIH4XX_RETIME_SRC_MASK,
@@ -222,7 +222,7 @@ static void stid127_fix_retime_src(void *priv, u32 spd)
freq = DWMAC_2_5MHZ;
}
 
-   if (dwmac->clk && freq)
+   if (freq)
clk_set_rate(dwmac->clk, freq);
 
regmap_update_bits(dwmac->regmap, reg, STID127_RETIME_SRC_MASK, val);
@@ -238,8 +238,7 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
u32 reg = dwmac->ctrl_reg;
u32 val;
 
-   if (dwmac->clk)
-   clk_prepare_enable(dwmac->clk);
+   clk_prepare_enable(dwmac->clk);
 
if (of_property_read_bool(np, "st,gmac_en"))
regmap_update_bits(regmap, reg, EN_MASK, EN);
@@ -258,8 +257,7 @@ static void sti_dwmac_exit(struct platform_device *pdev, void *priv)
 {
struct sti_dwmac *dwmac = priv;
 
-   if (dwmac->clk)
-   clk_disable_unprepare(dwmac->clk);
+   clk_disable_unprepare(dwmac->clk);
 }
 static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
struct platform_device *pdev)
-- 
2.10.2
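
The commit message above relies on the clock API treating a NULL clock as valid. A minimal sketch of that contract follows. The example_clk_* functions are hypothetical stand-ins written for illustration and are not the kernel's clk implementation.

```c
#include <stddef.h>

/* Hypothetical clock object, for illustration only. */
struct example_clk {
	int enabled;
	unsigned long rate;
};

/* A NULL clock is an absent optional clock: succeed and do nothing. */
static int example_clk_enable(struct example_clk *clk)
{
	if (!clk)
		return 0;
	clk->enabled = 1;
	return 0;
}

static void example_clk_disable(struct example_clk *clk)
{
	if (!clk)
		return;
	clk->enabled = 0;
}

static int example_clk_set_rate(struct example_clk *clk, unsigned long rate)
{
	if (!clk)
		return 0;
	clk->rate = rate;
	return 0;
}
```

Because every entry point tolerates NULL, callers can drop their own "if (clk)" guards, which is what the patch does.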

[net-next v2 4/7] stmmac: dwmac-sti: move st,gmac_en parsing to sti_dwmac_parse_data

2016-11-04 Thread Joachim Eastwood

The sti_dwmac_init() function is called both from probe and resume.
Since DT properties don't change between suspend/resume cycles, move
parsing of this parameter into sti_dwmac_parse_data() where it belongs.

Signed-off-by: Joachim Eastwood 
Acked-by: Giuseppe Cavallaro 
Tested-by: Giuseppe Cavallaro 
---
 drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
index bd6db22..269c7a5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-sti.c
@@ -128,6 +128,7 @@ struct sti_dwmac {
int clk_sel_reg;/* GMAC ext clk selection register */
struct device *dev;
struct regmap *regmap;
+   bool gmac_en;
u32 speed;
void (*fix_retime_src)(void *priv, unsigned int speed);
 };
@@ -233,14 +234,12 @@ static int sti_dwmac_init(struct platform_device *pdev, void *priv)
struct sti_dwmac *dwmac = priv;
struct regmap *regmap = dwmac->regmap;
int iface = dwmac->interface;
-   struct device *dev = dwmac->dev;
-   struct device_node *np = dev->of_node;
u32 reg = dwmac->ctrl_reg;
u32 val;
 
clk_prepare_enable(dwmac->clk);
 
-   if (of_property_read_bool(np, "st,gmac_en"))
+   if (dwmac->gmac_en)
regmap_update_bits(regmap, reg, EN_MASK, EN);
 
regmap_update_bits(regmap, reg, MII_PHY_SEL_MASK, phy_intf_sels[iface]);
@@ -281,6 +280,7 @@ static int sti_dwmac_parse_data(struct sti_dwmac *dwmac,
dwmac->dev = dev;
dwmac->interface = of_get_phy_mode(np);
dwmac->regmap = regmap;
+   dwmac->gmac_en = of_property_read_bool(np, "st,gmac_en");
dwmac->ext_phyclk = of_property_read_bool(np, "st,ext-phyclk");
dwmac->tx_retime_src = TX_RETIME_SRC_NA;
dwmac->speed = SPEED_100;
-- 
2.10.2

Re: [PATCH net] tipc: Guard against tiny MTU in tipc_msg_build()

2016-11-04 Thread 张谦

Yes. tipc_l2_device_event() only changes the MTU of the bearer, not the MTU
of the link, so tipc_enable_l2_media() is the right place to reject a tiny MTU.

Qian

Sent from my iPhone

On 5 Nov 2016, at 00:00, Jon Maloy  wrote:

>> -----Original Message-----
>> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
>> On Behalf Of 张谦
>> Sent: Friday, 04 November, 2016 03:24
>> To: Jon Maloy ; Ben Hutchings
>> ; Ying Xue 
>> Cc: netdev@vger.kernel.org; Eric Dumazet 
>> Subject: Re: [PATCH net] tipc: Guard against tiny MTU in tipc_msg_build()
>> 
>> Hi,
>> I think both tipc_l2_device_event() and tipc_enable_l2_media() need to
>> refuse a tiny MTU for TIPC bearers.
> 
> Right, except that when looking into the code for tipc_l2_device_event() I 
> realize that it currently doesn't try to re-adapt to a new MTU at all. It 
> just calls tipc_reset_bearer(), which I suspect has changed somewhere along 
> the road to ignore the MTU. So, you only need to change 
> tipc_enable_l2_media().
> 
> ///jon
> 
>> 
>> tipc_l2_device_event() is used to update the TIPC MTU value when a
>> command like 'ifconfig eth0 MTU 1 up' is executed.
>> tipc_enable_l2_media() is invoked when the TIPC network is created.
>> 
>> Thanks.
>> 
>> Qian Zhang
>> MarvelTeam Qihoo 360
>> 
>> 
>> 
>> -----Original Message-----
>> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
>> Sent: 1 November 2016 19:37
>> To: 张谦; Ben Hutchings; Ying Xue
>> Cc: netdev@vger.kernel.org; Eric Dumazet
>> Subject: RE: [PATCH net] tipc: Guard against tiny MTU in tipc_msg_build()
>> 
>> Hi,
>> I think we all agreed in the end that this is a possible, but highly
>> implausible, scenario, and more a point of exploit than a functional bug.
>> The solution is very simple, and is described further down in this mail
>> thread. I have not done anything about it yet, but you are welcome to
>> contribute.
>> 
>> BR
>> ///jon
>> 
>> 
>>> -----Original Message-----
>>> From: 张谦 [mailto:zhangqia...@360.cn]
>>> Sent: Tuesday, 01 November, 2016 02:35
>>> To: Ben Hutchings ; Jon Maloy
>>> ; Ying Xue 
>>> Cc: netdev@vger.kernel.org; Eric Dumazet 
>>> Subject: Re: [PATCH net] tipc: Guard against tiny MTU in
>>> tipc_msg_build()
>>> 
>>> Hi all,
>>> I have completed a PoC that can help you confirm this issue.
>>>
>>> Two weeks have passed since the last mail. Can you tell me the progress
>>> of the patch for this flaw?
>>> 
>>> Thanks.
>>> 
>>> Qian Zhang
>>> Marvel Team Qihoo 360
>>> 
>>> 
>>> -----Original Message-----
>>> From: Ben Hutchings [mailto:b...@decadent.org.uk]
>>> Sent: 21 October 2016 23:00
>>> To: Jon Maloy; Ying Xue
>>> Cc: netdev@vger.kernel.org; 张谦; Eric Dumazet
>>> Subject: Re: [PATCH net] tipc: Guard against tiny MTU in tipc_msg_build()
>>> 
>>> On Fri, 2016-10-21 at 14:57 +, Jon Maloy wrote:
>>>>> -----Original Message-----
>>>>> From: Ben Hutchings [mailto:b...@decadent.org.uk]
>>>>> Sent: Thursday, 20 October, 2016 12:40
>>>>> To: Jon Maloy ; Ying Xue
>>>>> Cc: netdev@vger.kernel.org; Qian Zhang; Eric Dumazet
>>>>> Subject: Re: [PATCH net] tipc: Guard against tiny MTU in
>>>>> tipc_msg_build()
>>>>>
>>>>> On Thu, 2016-10-20 at 14:51 +, Jon Maloy wrote:
>>>>> [...]
>>>>>>> At this point we're about to copy INT_H_SIZE + mhsz bytes into
>>>>>>> the first fragment.  If that's already limited to be less than
>>>>>>> or equal to MAX_H_SIZE, comparing with MAX_H_SIZE would be fine.
>>>>>>> But if MAX_H_SIZE is the maximum value of mhsz, that won't be
>>>>>>> good enough.
>>>>>>
>>>>>> MAX_H_SIZE is 60 bytes, but in practice you will never see an
>>>>>> mhsz larger than the biggest header we are actually using, which
>>>>>> is MCAST_H_SIZE (==44 bytes).
>>>>>> INT_H_SIZE is 40 bytes, so you are in reality testing for
>>>>>> whether we have an mtu < 84 bytes.
>>>>>> You won't find any interfaces or protocols that come even close
>>>>>> to this limitation, so to me this test is redundant.
>>>>>
>>>>> But I can easily create such an interface:
>>>>>
>>>>> $ unshare -n -U -r
>>>>> # ip l set lo mtu 1
>>>>>
>>>>> Ben.
>>>>
>>>>
>>>> It won't be very useful though. But I assume you mean it could be a
>>>> possible exploit,
>>> 
>>> Exactly.
>>> 
>>>> and I suspect a few other things would break both in TIPC and in
>>>> other stacks if you do anything like that. I think the solution to
>>>> this is not to fix all possible places in the code where this can go
>>>> wrong, but rather to have a generic test where we refuse to attach
>>>> bearers/interfaces offering an mtu < e.g. 1000 bytes. This can
>>>> easily be done in tipc_enable_l2_media().
>>> 
>>> Yes.
>>> 
>>> Ben.
>>> 
>>> --
>>> Ben Hutchings
>>> One of the nice things about standards is that there are so many of them.
>
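
The arithmetic in the thread above (INT_H_SIZE of 40 bytes plus MCAST_H_SIZE of 44 bytes giving the 84-byte figure) and the proposed generic refusal of tiny MTUs can be sketched as follows. The header sizes are the values quoted in the thread. EXAMPLE_MIN_BEARER_MTU and both helper functions are hypothetical names, and the 1000-byte bound is only the "e.g." value suggested in the discussion, not a value taken from the TIPC code.

```c
#include <stdbool.h>

/* Header sizes as quoted in the thread. */
#define INT_H_SIZE	40
#define MCAST_H_SIZE	44
#define MAX_H_SIZE	60

/* Hypothetical lower bound; the thread only suggests "e.g. 1000 bytes". */
#define EXAMPLE_MIN_BEARER_MTU	1000

/* Smallest MTU that holds a fragment header plus the largest header in use. */
static inline int min_mtu_for_headers(void)
{
	return INT_H_SIZE + MCAST_H_SIZE;
}

/* Hypothetical check a bearer-enable path could run before attaching. */
static inline bool bearer_mtu_acceptable(int mtu)
{
	return mtu >= EXAMPLE_MIN_BEARER_MTU;
}
```

A single check at attach time would reject the "ip l set lo mtu 1" case without touching every place in the message-building code that assumes a sane MTU.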

Re: Coding Style: Reverse XMAS tree declarations ?

2016-11-04 Thread Joe Perches

On Fri, 2016-11-04 at 11:07 -0400, David Miller wrote:
> From: Lino Sanfilippo 
> > On 04.11.2016 07:53, Joe Perches wrote:
> >> CHECK:REVERSE_XMAS_TREE: Prefer ordering declarations longest to shortest
> >> #446: FILE: drivers/net/ethernet/ethoc.c:446:
> >> +int size = bd.stat >> 16;
> >> +struct sk_buff *skb;
> > Shouldn't this case be valid? Visually, the longer line is already
> > before the shorter one.
> > I think that the whole point of using this reverse xmas tree ordering
> > is to have the code visually tidied up, not to enforce an ordering
> > between variable name lengths.
> 
> That's correct.

And that is another reason the whole reverse xmas tree
automatic declaration layout concept is IMO dubious.

Basically, what matters is not the initial ordering of
automatics but how easily a specific automatic can be
found when working backwards from the code that uses it,
and this layout does not always help with that.

Something like:

static void function(args, ...)
{
[longish list of reverse xmas tree identifiers...]
struct foo *bar = longish_function(args, ...);
struct foobarbaz *qux;
[more identifiers]

[multiple screenfuls of code later...]

new_function(..., bar, ...);

[more code...]
}

and the reverse xmas tree helpfulness of looking up the
type of bar is neither obvious nor easy.

My preference would be for a bar that serves coffee and alcohol.
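
For reference, the rule under discussion can be stated mechanically: each declaration line, measured as a whole line including any initializer, should be no longer than the line before it. The sketch below is a hypothetical stand-alone checker written for illustration. It is not the checkpatch test mentioned in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Returns true when every declaration line is no longer than the one
 * before it, i.e. the block is laid out longest line first.
 */
static bool is_reverse_xmas_tree(const char *const decls[], size_t n)
{
	size_t i;

	for (i = 1; i < n; i++) {
		if (strlen(decls[i]) > strlen(decls[i - 1]))
			return false;
	}
	return true;
}
```

Under this whole-line reading, the ethoc.c pair quoted above ("int size = bd.stat >> 16;" followed by "struct sk_buff *skb;") passes, which matches the point that the ordering is about line length rather than variable name length.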

Re: [PATCH v2 2/2 ] net: ethernet: nb8800: handle all RGMII definitions

2016-11-04 Thread Måns Rullgård

Sebastian Frias  writes:

> Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
> and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
> delay for tx and rx only") added several RGMII definitions:
> PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
> PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.
>
> Those are all RGMII modes (1Gbit) and must be considered that way when
> setting the MAC mode or the pad mode for the HW to work properly.
>
> Signed-off-by: Sebastian Frias 
> ---
>  drivers/net/ethernet/aurora/nb8800.c | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
> index d2855c9..fba2699 100644
> --- a/drivers/net/ethernet/aurora/nb8800.c
> +++ b/drivers/net/ethernet/aurora/nb8800.c
> @@ -598,6 +598,7 @@ static irqreturn_t nb8800_irq(int irq, void *dev_id)
>  static void nb8800_mac_config(struct net_device *dev)
>  {
>   struct nb8800_priv *priv = netdev_priv(dev);
> + struct phy_device *phydev = dev->phydev;
>   bool gigabit = priv->speed == SPEED_1000;
>   u32 mac_mode_mask = RGMII_MODE | HALF_DUPLEX | GMAC_MODE;
>   u32 mac_mode = 0;
> @@ -609,7 +610,7 @@ static void nb8800_mac_config(struct net_device *dev)
>   mac_mode |= HALF_DUPLEX;
>
>   if (gigabit) {
> - if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
> + if (phy_interface_is_rgmii(phydev))
>   mac_mode |= RGMII_MODE;
>
>   mac_mode |= GMAC_MODE;

This part is correct regardless of the outcome of the delay setup
discussion.

> @@ -1278,9 +1279,8 @@ static int nb8800_tangox_init(struct net_device *dev)
>   break;
>
>   case PHY_INTERFACE_MODE_RGMII:
> - pad_mode = PAD_MODE_RGMII;
> - break;
> -
> + case PHY_INTERFACE_MODE_RGMII_ID:
> + case PHY_INTERFACE_MODE_RGMII_RXID:
>   case PHY_INTERFACE_MODE_RGMII_TXID:
>   pad_mode = PAD_MODE_RGMII;
>   break;
> -- 
> 1.7.11.2
>

-- 
Måns Rullgård

Re: Ethernet not working on a different SoC with same eth HW

2016-11-04 Thread Mason

On 04/11/2016 17:55, Måns Rullgård wrote:

> Florian Fainelli  writes:
> 
>> On 11/04/2016 08:22 AM, Måns Rullgård wrote:
>>> Andrew Lunn  writes:
>>>
>>>> On Fri, Nov 04, 2016 at 03:05:00PM +, Måns Rullgård wrote:
>>>>> Andrew Lunn  writes:
>>>>>
>>>>>>>> I agree with you. But fixing it is likely to break boards which
>>>>>>>> currently have "rgmii", but actually need the delay in order to work.
>>>>>>>
>>>>>>> Does the internal delay here refer to the PHY or the MAC?  It's a
>>>>>>> property of the MAC node after all.
>>>>>>
>>>>>> It is the PHY which applies the delay.
>>>>>
>>>>> Says who?
>>>>
>>>> The source code.
>>>
>>> There's source code that disagrees with that.  The Broadcom GENET
>>> driver, for instance.
>>
>> Correct, and in the case where the MAC adds the delay while transmitting
>> (because it supports that) the expectation is that the PHY would remove
>> such a delay internally. Conversely, the PHY would introduce a delay
>> while transmitting back to the MAC, in order to produce the desired 90
>> degrees shift on the RGMII signals, and reproduce the correct clock
>> and data alignment internally.
>>
>>>
>>>>>  Some MACs can do it too.
>>>>
>>>> I'm sure they can. But look at the code. Nearly none do, and those
>>>> that do are potentially broken.
>>>
>>> Those few drivers that do anything differently based on these values
>>> enable clock delay in the MAC.  That's why I wrote the NB8800 driver the
>>> way I did.
>>>
>>
>> I don't really know what is wrong with the nb8800 driver at the moment, so
>> maybe this is just a configuration issue with the Atheros PHY driver,
>> it's not like it has not given people headache judging by the recent
>> discussions...
> 
> We don't even know if the problems Mason is having are caused by
> incorrect clock skew in the first place.  I'd suggest not patching
> anything at all until he gets it working.

All I said was:

> Assuming that "rxid" (rx internal delay) and "rx clock delay" are
> in fact the same concept with different names, do you agree that
> it would be unexpected for "rgmii rx clock delay" to be enabled
> when a DTB specifies "rgmii" or "rgmii-txid" ?

In parallel, Sebastian changed the DT of a 8758 board
from phy-connection-type = "rgmii";
to phy-connection-type = "rgmii-txid";

and this broke the Ethernet on 8758, although the "reference"
legacy 3.4 kernel does enable tx clock delay.

Regards.

Re: Coding Style: Reverse XMAS tree declarations ? (was Re: [PATCH net-next v6 02/10] dpaa_eth: add support for DPAA Ethernet)

2016-11-04 Thread Randy Dunlap

On 11/03/16 23:53, Joe Perches wrote:
> On Thu, 2016-11-03 at 15:58 -0400, David Miller wrote:
>> From: Madalin Bucur 
>> Date: Wed, 2 Nov 2016 22:17:26 +0200
>>
>>> This introduces the Freescale Data Path Acceleration Architecture
>>> +static inline size_t bpool_buffer_raw_size(u8 index, u8 cnt)
>>> +{
>>> + u8 i;
>>> + size_t res = DPAA_BP_RAW_SIZE / 2;
>>
>> Always order local variable declarations from longest to shortest line,
>> also known as Reverse Christmas Tree Format.
> 
> I think this declaration sorting order is misguided but
> here's a possible change to checkpatch adding a test for it
> that does this test just for net/ and drivers/net/

I agree with the misguided part.
That's not actually in CodingStyle AFAICT. Where did this come from?


thanks.
-- 
~Randy

[PATCH net-next 2/2] RDS: TCP: start multipath acceptor loop at 0

2016-11-04 Thread Sowmini Varadhan

The for() loop in rds_tcp_accept_one() assumes that the 0'th
rds_tcp_conn_path is UP and starts multipath accepts at index 1.
But this assumption may not always be true: if the 0'th path
has failed (ERROR or DOWN state) an incoming connection request
should be used to resurrect this path.

Signed-off-by: Sowmini Varadhan 
Acked-by: Santosh Shilimkar 
---
 net/rds/tcp_listen.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c
index e0b23fb..c9c4968 100644
--- a/net/rds/tcp_listen.c
+++ b/net/rds/tcp_listen.c
@@ -103,7 +103,7 @@ struct rds_tcp_connection *rds_tcp_accept_one_path(struct rds_connection *conn)
if (!peer_is_smaller)
return NULL;
 
-   for (i = 1; i < npaths; i++) {
+   for (i = 0; i < npaths; i++) {
struct rds_conn_path *cp = &conn->c_path[i];
 
if (rds_conn_path_transition(cp, RDS_CONN_DOWN,
-- 
1.7.1
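
The effect of starting the scan at index 0 can be shown with a small stand-alone sketch. The path_state enum and claim_free_path() are hypothetical simplifications for illustration. They are not the RDS definitions or the rds_conn_path_transition() helper.

```c
#include <stddef.h>

/* Hypothetical, simplified path states; not the RDS definitions. */
enum path_state { PATH_UP, PATH_DOWN, PATH_ERROR, PATH_CONNECTING };

/*
 * Returns the index of the first path at or after 'first' that is DOWN
 * or in ERROR, moving it to CONNECTING, or -1 if no such path exists.
 */
static int claim_free_path(enum path_state *paths, size_t npaths, size_t first)
{
	size_t i;

	for (i = first; i < npaths; i++) {
		if (paths[i] == PATH_DOWN || paths[i] == PATH_ERROR) {
			paths[i] = PATH_CONNECTING;
			return (int)i;
		}
	}
	return -1;
}
```

With path 0 DOWN and all other paths UP, a scan that starts at 1 finds nothing, while a scan that starts at 0 reuses the incoming connection to bring path 0 back.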

[PATCH net-next 0/2] RDS: TCP: bug fixes

2016-11-04 Thread Sowmini Varadhan

A couple of bug fixes identified during testing. 

Sowmini Varadhan (2):
  RDS: TCP: report addr/port info based on TCP socket in rds-info
  RDS: TCP: start multipath acceptor loop at 0

 net/rds/tcp.c|   20 +++++++++++++-------
 net/rds/tcp_listen.c |2 +-
 2 files changed, 14 insertions(+), 8 deletions(-)

[PATCH net-next 1/2] RDS: TCP: report addr/port info based on TCP socket in rds-info

2016-11-04 Thread Sowmini Varadhan

The socket argument passed to rds_tcp_tc_info() is a PF_RDS socket,
so it is incorrect to report the address and port info based on
rds_getname() as part of the TCP state report.

Invoke inet_getname() for the t_sock associated with the
rds_tcp_connection instead.

Signed-off-by: Sowmini Varadhan 
Acked-by: Santosh Shilimkar 
---
 net/rds/tcp.c |   20 +++++++++++++-------
 1 files changed, 13 insertions(+), 7 deletions(-)

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index fcddacc..3296a6a 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -220,7 +220,7 @@ void rds_tcp_set_callbacks(struct socket *sock, struct rds_conn_path *cp)
write_unlock_bh(&sock->sk->sk_callback_lock);
 }
 
-static void rds_tcp_tc_info(struct socket *sock, unsigned int len,
+static void rds_tcp_tc_info(struct socket *rds_sock, unsigned int len,
struct rds_info_iterator *iter,
struct rds_info_lengths *lens)
 {
@@ -229,6 +229,7 @@ static void rds_tcp_tc_info(struct socket *sock, unsigned int len,
unsigned long flags;
struct sockaddr_in sin;
int sinlen;
+   struct socket *sock;
 
spin_lock_irqsave(&rds_tcp_tc_list_lock, flags);
 
@@ -237,12 +238,17 @@ static void rds_tcp_tc_info(struct socket *sock, unsigned int len,
 
list_for_each_entry(tc, &rds_tcp_tc_list, t_list_item) {
 
-   sock->ops->getname(sock, (struct sockaddr *)&sin, &sinlen, 0);
-   tsinfo.local_addr = sin.sin_addr.s_addr;
-   tsinfo.local_port = sin.sin_port;
-   sock->ops->getname(sock, (struct sockaddr *)&sin, &sinlen, 1);
-   tsinfo.peer_addr = sin.sin_addr.s_addr;
-   tsinfo.peer_port = sin.sin_port;
+   sock = tc->t_sock;
+   if (sock) {
+   sock->ops->getname(sock, (struct sockaddr *)&sin,
+  &sinlen, 0);
+   tsinfo.local_addr = sin.sin_addr.s_addr;
+   tsinfo.local_port = sin.sin_port;
+   sock->ops->getname(sock, (struct sockaddr *)&sin,
+  &sinlen, 1);
+   tsinfo.peer_addr = sin.sin_addr.s_addr;
+   tsinfo.peer_port = sin.sin_port;
+   }
 
tsinfo.hdr_rem = tc->t_tinc_hdr_rem;
tsinfo.data_rem = tc->t_tinc_data_rem;
-- 
1.7.1

[PATCH v2 2/2 ] net: ethernet: nb8800: handle all RGMII definitions

2016-11-04 Thread Sebastian Frias

Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
delay for tx and rx only") added several RGMII definitions:
PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.

Those are all RGMII modes (1Gbit) and must be considered that way when
setting the MAC mode or the pad mode for the HW to work properly.

Signed-off-by: Sebastian Frias 
---
 drivers/net/ethernet/aurora/nb8800.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
index d2855c9..fba2699 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -598,6 +598,7 @@ static irqreturn_t nb8800_irq(int irq, void *dev_id)
 static void nb8800_mac_config(struct net_device *dev)
 {
struct nb8800_priv *priv = netdev_priv(dev);
+   struct phy_device *phydev = dev->phydev;
bool gigabit = priv->speed == SPEED_1000;
u32 mac_mode_mask = RGMII_MODE | HALF_DUPLEX | GMAC_MODE;
u32 mac_mode = 0;
@@ -609,7 +610,7 @@ static void nb8800_mac_config(struct net_device *dev)
mac_mode |= HALF_DUPLEX;
 
if (gigabit) {
-   if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
+   if (phy_interface_is_rgmii(phydev))
mac_mode |= RGMII_MODE;
 
mac_mode |= GMAC_MODE;
@@ -1278,9 +1279,8 @@ static int nb8800_tangox_init(struct net_device *dev)
break;
 
case PHY_INTERFACE_MODE_RGMII:
-   pad_mode = PAD_MODE_RGMII;
-   break;
-
+   case PHY_INTERFACE_MODE_RGMII_ID:
+   case PHY_INTERFACE_MODE_RGMII_RXID:
case PHY_INTERFACE_MODE_RGMII_TXID:
pad_mode = PAD_MODE_RGMII;
break;
-- 
1.7.11.2
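
The idea behind both hunks is that plain RGMII and its three internal-delay variants should be treated as one family when choosing the MAC or pad mode. A minimal sketch of such a predicate follows. The enum and function are hypothetical stand-ins for illustration. They are not the kernel's phy_interface_t or its phy_interface_is_rgmii() helper.

```c
#include <stdbool.h>

/* Hypothetical subset of interface modes, for illustration only. */
enum example_phy_mode {
	EX_MODE_MII,
	EX_MODE_GMII,
	EX_MODE_RGMII,
	EX_MODE_RGMII_ID,
	EX_MODE_RGMII_RXID,
	EX_MODE_RGMII_TXID,
};

/* True for plain RGMII and for each internal-delay variant. */
static bool example_mode_is_rgmii(enum example_phy_mode mode)
{
	switch (mode) {
	case EX_MODE_RGMII:
	case EX_MODE_RGMII_ID:
	case EX_MODE_RGMII_RXID:
	case EX_MODE_RGMII_TXID:
		return true;
	default:
		return false;
	}
}
```

An exact comparison against the plain RGMII value alone misses the three delay variants, which is the case the patch addresses.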

Re: Ethernet not working on a different SoC with same eth HW

2016-11-04 Thread Måns Rullgård

Florian Fainelli  writes:

> On 11/04/2016 08:22 AM, Måns Rullgård wrote:
>> Andrew Lunn  writes:
>> 
>>> On Fri, Nov 04, 2016 at 03:05:00PM +, Måns Rullgård wrote:
>>>> Andrew Lunn  writes:
>>>>
>>>>>>> I agree with you. But fixing it is likely to break boards which
>>>>>>> currently have "rgmii", but actually need the delay in order to work.
>>>>>>
>>>>>> Does the internal delay here refer to the PHY or the MAC?  It's a
>>>>>> property of the MAC node after all.
>>>>>
>>>>> It is the PHY which applies the delay.
>>>>
>>>> Says who?
>>>
>>> The source code.
>> 
>> There's source code that disagrees with that.  The Broadcom GENET
>> driver, for instance.
>
> Correct, and in the case where the MAC adds the delay while transmitting
> (because it supports that) the expectation is that the PHY would remove
> such a delay internally. Conversely, the PHY would introduce a delay
> while transmitting back to the MAC, in order to produce the desired 90
> degrees shift on the RGMII signals, and reproduce the correct clock
> and data alignment internally.
>
>> 
>>>>  Some MACs can do it too.
>>>
>>> I'm sure they can. But look at the code. Nearly none do, and those
>>> that do are potentially broken.
>> 
>> Those few drivers that do anything differently based on these values
>> enable clock delay in the MAC.  That's why I wrote the NB8800 driver the
>> way I did.
>> 
>
> I don't really know what is wrong with the nb8800 driver at the moment, so
> maybe this is just a configuration issue with the Atheros PHY driver,
> it's not like it has not given people headache judging by the recent
> discussions...

We don't even know if the problems Mason is having are caused by
incorrect clock skew in the first place.  I'd suggest not patching
anything at all until he gets it working.

-- 
Måns Rullgård

Re: [PATCH 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Måns Rullgård

Florian Fainelli  writes:

> On 11/04/2016 08:36 AM, Sebastian Frias wrote:
>> Hi Måns,
>> 
>> On 11/04/2016 04:18 PM, Måns Rullgård wrote:
>>> Sebastian Frias  writes:
>>>
>>>> The delay can be applied at PHY or MAC level, but since
>>>> PHY drivers will apply the delay at PHY level when using
>>>> one of the "internal delay" declinations of RGMII mode
>>>> (like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
>>>> at MAC level causes issues.
>>>
>>> The Broadcom GENET driver does the same thing.
>>>
>> 
>> Well, I don't know who uses that driver, or why they did it that way.
>
> I do use this driver and it works for me (tm). I tested mostly with
> Broadcom PHYs and Ethernet switches and only rarely with third party
> PHYs, but all of that is in tree (drivers/net/phy/broadcom.c,
> drivers/net/dsa/b53/), so feel free to "audit" that part of the code
> too.
>
> The configuration of the GENET port multiplexer requires us to specify
> how we want to align the clock and data, if we don't do that, and the
> PHY is also not agreeing with how its own delays should be configured,
> mayhem ensues, ranging from occasional transmit success, to high rates
> of CRC/FCS errors in best cases.
>
> I did verify that the settings were correct using a scope FWIW.
>
>> 
>> However, with the current code and DT bindings, if one requires
>> the delay, phy-connection-type="rgmii-txid" must be set.
>
> Yes, and we would set it correctly for our Broadcom reference boards
> using this driver.
>
>> 
>> But when doing so, both the Atheros 8035 and the Aurora NB8800 drivers
>> will apply the delay.
>> 
>> I think a better way of dealing with this is that both, PHY and MAC
>> drivers exchange information so that the delay is applied only once.
>
> Exchange what information? The PHY device interface (phydev->interface)
> conveys the needed information for both entities.

There doesn't seem to be any consensus among the drivers regarding where
the delay should be applied.  Since only a few drivers, MAC or PHY, act
on this property, most combinations still work by chance.  It is common
for boards to set the delay at the PHY using external config pins so no
software setup is required (although I have one Sigma based board that
gets this wrong).  I suspect if drivers/net/ethernet/broadcom/genet were
used with one of the four PHY drivers that also set the delay based on
this DT property, things would go wrong.

-- 
Måns Rullgård

[PATCH for-next 05/11] IB/hns: Modify the condition of notifying hardware loopback

2016-11-04 Thread Salil Mehta

From: Lijun Ou 

This patch modifies the condition for notifying the hardware of loopback.

In hip06, the RoCE engine has several ports, and each QP is bound to
one port. The hardware only supports loopback within the same port,
not across different ports.

So, for a QP bound to port N, if the dmac in the QP context equals
the smac of local port N, or if loop_idc is 1, we should set the
loopback bit in the QP context to notify the hardware.

Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Lijun Ou 
Signed-off-by: Salil Mehta  
---
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c |   24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index d6df6dd..8ca36a7 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -2244,24 +2244,14 @@ static int hns_roce_v1_m_qp(struct ib_qp *ibqp, const struct ib_qp_attr *attr,
 QP_CONTEXT_QPC_BYTE_32_SIGNALING_TYPE_S,
 hr_qp->sq_signal_bits);
 
-   for (port = 0; port < hr_dev->caps.num_ports; port++) {
-   smac = (u8 *)hr_dev->dev_addr[port];
-   dev_dbg(dev, "smac: %2x: %2x: %2x: %2x: %2x: %2x\n",
-   smac[0], smac[1], smac[2], smac[3], smac[4],
-   smac[5]);
-   if ((dmac[0] == smac[0]) && (dmac[1] == smac[1]) &&
-   (dmac[2] == smac[2]) && (dmac[3] == smac[3]) &&
-   (dmac[4] == smac[4]) && (dmac[5] == smac[5])) {
-   roce_set_bit(context->qpc_bytes_32,
-   QP_CONTEXT_QPC_BYTE_32_LOOPBACK_INDICATOR_S,
-   1);
-   break;
-   }
-   }
-
-   if (hr_dev->loop_idc == 0x1)
+   port = (attr_mask & IB_QP_PORT) ? (attr->port_num - 1) :
+   hr_qp->port;
+   smac = (u8 *)hr_dev->dev_addr[port];
+   /* when dmac equals smac or loop_idc is 1, it should loopback */
+   if (ether_addr_equal_unaligned(dmac, smac) ||
+   hr_dev->loop_idc == 0x1)
roce_set_bit(context->qpc_bytes_32,
-   QP_CONTEXT_QPC_BYTE_32_LOOPBACK_INDICATOR_S, 1);
+ QP_CONTEXT_QPC_BYTE_32_LOOPBACK_INDICATOR_S, 1);
 
roce_set_bit(context->qpc_bytes_32,
 QP_CONTEXT_QPC_BYTE_32_GLOBAL_HEADER_S,
-- 
1.7.9.5
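
The new condition compares the destination MAC only against the source MAC of the QP's own port, instead of against every port as the removed loop did. A stand-alone sketch of that test follows. example_needs_loopback() and EXAMPLE_ETH_ALEN are hypothetical names for illustration, and memcmp() stands in for the kernel's ether_addr_equal_unaligned().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_ETH_ALEN 6

/*
 * Hypothetical restatement of the condition: loop back when the
 * destination MAC matches the source MAC of the QP's own port, or
 * when loop_idc is 1.
 */
static bool example_needs_loopback(const uint8_t *dmac,
				   const uint8_t (*port_smac)[EXAMPLE_ETH_ALEN],
				   unsigned int port, unsigned int loop_idc)
{
	return memcmp(dmac, port_smac[port], EXAMPLE_ETH_ALEN) == 0 ||
	       loop_idc == 0x1;
}
```

A destination that matches the MAC of a different local port no longer sets the loopback bit, which is consistent with the hardware only looping back within one port.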

[PATCH for-next 06/11] IB/hns: Fix the bug for qp state in hns_roce_v1_m_qp()

2016-11-04 Thread Salil Mehta

From: Lijun Ou 

The old code wrote attr->qp_state into the QP state field of the QP
context (qpc). That value may be invalid when (attr_mask &
IB_QP_STATE) is zero, so write new_state instead.

Signed-off-by: Lijun Ou 
Reviewed-by: Wei Hu (Xavier) 
Signed-off-by: Salil Mehta  
---
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 8ca36a7..2d48406 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -2571,7 +2571,7 @@ static int hns_roce_v1_m_qp(struct ib_qp *ibqp, const struct ib_qp_attr *attr,
/* Every status migrate must change state */
roce_set_field(context->qpc_bytes_144,
   QP_CONTEXT_QPC_BYTES_144_QP_STATE_M,
-  QP_CONTEXT_QPC_BYTES_144_QP_STATE_S, attr->qp_state);
+  QP_CONTEXT_QPC_BYTES_144_QP_STATE_S, new_state);
 
/* SW pass context to HW */
ret = hns_roce_v1_qp_modify(hr_dev, &hr_qp->mtt,
-- 
1.7.9.5

[PATCH for-next 00/11] Code improvements & fixes for HNS RoCE driver

2016-11-04 Thread Salil Mehta

This patchset introduces some code improvements and fixes
for the identified problems in the HNS RoCE driver.

Lijun Ou (4):
  IB/hns: Add the interface for querying QP1
  IB/hns: add self loopback for CM
  IB/hns: Modify the condition of notifying hardware loopback
  IB/hns: Fix the bug for qp state in hns_roce_v1_m_qp()

Salil Mehta (1):
  IB/hns: Fix for Checkpatch.pl comment style errors

Shaobo Xu (1):
  IB/hns: Implement the add_gid/del_gid and optimize the GIDs management

Wei Hu (Xavier) (5):
  IB/hns: Add code for refreshing CQ CI using TPTR
  IB/hns: Optimize the logic of allocating memory using APIs
  IB/hns: Modify the macro for the timeout when cmd process
  IB/hns: Modify query info named port_num when querying RC QP
  IB/hns: Change qpn allocation to round-robin mode.

 drivers/infiniband/hw/hns/hns_roce_alloc.c  |   11 +-
 drivers/infiniband/hw/hns/hns_roce_cmd.c|8 +-
 drivers/infiniband/hw/hns/hns_roce_cmd.h|7 +-
 drivers/infiniband/hw/hns/hns_roce_common.h |2 -
 drivers/infiniband/hw/hns/hns_roce_cq.c |   17 +-
 drivers/infiniband/hw/hns/hns_roce_device.h |   45 ++--
 drivers/infiniband/hw/hns/hns_roce_eq.c |6 +-
 drivers/infiniband/hw/hns/hns_roce_hem.c|6 +-
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |  271 +--
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |   17 +-
 drivers/infiniband/hw/hns/hns_roce_main.c   |  311 +++
 drivers/infiniband/hw/hns/hns_roce_mr.c |   21 +-
 drivers/infiniband/hw/hns/hns_roce_pd.c |5 +-
 drivers/infiniband/hw/hns/hns_roce_qp.c |2 +-
 14 files changed, 367 insertions(+), 362 deletions(-)

-- 
1.7.9.5

[PATCH for-next 02/11] IB/hns: Add code for refreshing CQ CI using TPTR

2016-11-04 Thread Salil Mehta

From: "Wei Hu (Xavier)" 

This patch adds the code for refreshing the CQ CI using TPTR on the
hip06 SoC.

We send a doorbell to hardware to refresh the CQ CI when the user
successfully polls a CQE. This fails if the doorbell has been
blocked, so hardware reads a special buffer called TPTR to get the
latest CI value when the CQ is almost full.

This patch supports the special CI buffer as follows:
a) Allocate the memory for TPTR in hns_roce_tptr_init and free it
   in hns_roce_tptr_free. These two functions are called from the
   probe function and the remove function respectively.
b) Add the code for computing the offset (every CQ needs 2 bytes)
   and write the DMA address into every CQ context, in
   hns_roce_v1_write_cqc, to notify hardware.
c) Add code for mapping the TPTR buffer to user space in
   hns_roce_mmap. The mapping distinguishes TPTR from the user-mode
   UAR by vm_pgoff (0: UAR, 1: TPTR, others: invalid) on hip06.
d) Add the code for refreshing the CQ CI using TPTR in
   hns_roce_v1_poll_cq.
e) Add some variable definitions to the related structures.
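
Items b) and c) above can be sketched in user space as follows. This is only an illustration under stated assumptions: the slot is assumed to be indexed by CQ number, and every sketch_* name is hypothetical. The 2-byte slot size and the vm_pgoff values come from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TPTR_ENTRY_SIZE 2u	/* every CQ needs 2 bytes */

/* Byte offset of a CQ's tail-pointer slot inside the shared TPTR buffer. */
static uint32_t sketch_tptr_offset(uint32_t cqn)
{
	/* Guard against wrap-around; assumed indexing by CQ number. */
	assert(cqn <= UINT32_MAX / SKETCH_TPTR_ENTRY_SIZE);
	return cqn * SKETCH_TPTR_ENTRY_SIZE;
}

/* DMA address that would be written into the CQ context for this CQ. */
static uint64_t sketch_tptr_dma(uint64_t tptr_dma_base, uint32_t cqn)
{
	return tptr_dma_base + sketch_tptr_offset(cqn);
}

enum sketch_mmap_target {
	SKETCH_MMAP_UAR,
	SKETCH_MMAP_TPTR,
	SKETCH_MMAP_INVALID
};

/* Classify an mmap request by page offset: 0 is UAR, 1 is TPTR. */
static enum sketch_mmap_target sketch_mmap_classify(unsigned long vm_pgoff)
{
	switch (vm_pgoff) {
	case 0:
		return SKETCH_MMAP_UAR;
	case 1:
		return SKETCH_MMAP_TPTR;
	default:
		return SKETCH_MMAP_INVALID;
	}
}
```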

Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Dongdong Huang(Donald) 
Signed-off-by: Lijun Ou 
Signed-off-by: Salil Mehta  
---
 drivers/infiniband/hw/hns/hns_roce_common.h |2 -
 drivers/infiniband/hw/hns/hns_roce_cq.c |9 +++
 drivers/infiniband/hw/hns/hns_roce_device.h |6 +-
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |   79 ---
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h  |9 +++
 drivers/infiniband/hw/hns/hns_roce_main.c   |   13 -
 6 files changed, 103 insertions(+), 15 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_common.h 
b/drivers/infiniband/hw/hns/hns_roce_common.h
index 2970161..0dcb620 100644
--- a/drivers/infiniband/hw/hns/hns_roce_common.h
+++ b/drivers/infiniband/hw/hns/hns_roce_common.h
@@ -253,8 +253,6 @@
 #define ROCEE_VENDOR_ID_REG0x0
 #define ROCEE_VENDOR_PART_ID_REG   0x4
 
-#define ROCEE_HW_VERSION_REG   0x8
-
 #define ROCEE_SYS_IMAGE_GUID_L_REG 0xC
 #define ROCEE_SYS_IMAGE_GUID_H_REG 0x10
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c 
b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 0973659..5dc8d92 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -349,6 +349,15 @@ struct ib_cq *hns_roce_ib_create_cq(struct ib_device 
*ib_dev,
goto err_mtt;
}
 
+   /*
+* For the QP created by kernel space, tptr value should be initialized
+* to zero; For the QP created by user space, it will cause synchronous
+* problems if tptr is set to zero here, so we initialze it in user
+* space.
+*/
+   if (!context)
+   *hr_cq->tptr_addr = 0;
+
/* Get created cq handler and carry out event */
hr_cq->comp = hns_roce_ib_cq_comp;
hr_cq->event = hns_roce_ib_cq_event;
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
b/drivers/infiniband/hw/hns/hns_roce_device.h
index 3417315..7242b14 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -37,6 +37,8 @@
 
 #define DRV_NAME "hns_roce"
 
+#define HNS_ROCE_HW_VER1   ('h' << 24 | 'i' << 16 | '0' << 8 | '6')
+
 #define MAC_ADDR_OCTET_NUM 6
 #define HNS_ROCE_MAX_MSG_LEN   0x8000
 
@@ -296,7 +298,7 @@ struct hns_roce_cq {
u32 cq_depth;
u32 cons_index;
void __iomem*cq_db_l;
-   void __iomem*tptr_addr;
+   u16 *tptr_addr;
unsigned long   cqn;
u32 vector;
atomic_trefcount;
@@ -553,6 +555,8 @@ struct hns_roce_dev {
 
int cmd_mod;
int loop_idc;
+   dma_addr_t  tptr_dma_addr; /*only for hw v1*/
+   u32 tptr_size; /*only for hw v1*/
struct hns_roce_hw  *hw;
 };
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c 
b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index ca8b784..7750d0d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -849,6 +849,45 @@ static void hns_roce_bt_free(struct hns_roce_dev *hr_dev)
priv->bt_table.qpc_buf.buf, priv->bt_table.qpc_buf.map);
 }
 
+static int hns_roce_tptr_init(struct hns_roce_dev *hr_dev)
+{
+   struct device *dev = &hr_dev->pdev->dev;
+   struct hns_roce_buf_list *tptr_buf;
+   struct hns_roce_v1_priv *priv;
+
+   priv = (struct hns_roce_v1_priv *)hr_dev->hw->priv;
+   tptr_buf = &priv->tptr_table.tptr_buf;
+
+   /*
+* This buffer will be used for CQ's tptr(tail pointer), also

[PATCH for-next 04/11] IB/hns: add self loopback for CM

2016-11-04 Thread Salil Mehta

From: Lijun Ou 

This patch mainly adds self loopback support for CM.

Signed-off-by: Lijun Ou 
Signed-off-by: Peter Chen 
Reviewed-by: Wei Hu (Xavier) 
Signed-off-by: Salil Mehta  
---
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c |   11 +++
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h |2 ++
 2 files changed, 13 insertions(+)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c 
b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 7750d0d..d6df6dd 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -32,6 +32,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include "hns_roce_common.h"
 #include "hns_roce_device.h"
@@ -72,6 +73,8 @@ int hns_roce_v1_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr,
int nreq = 0;
u32 ind = 0;
int ret = 0;
+   u8 *smac;
+   int loopback;
 
if (unlikely(ibqp->qp_type != IB_QPT_GSI &&
ibqp->qp_type != IB_QPT_RC)) {
@@ -129,6 +132,14 @@ int hns_roce_v1_post_send(struct ib_qp *ibqp, struct 
ib_send_wr *wr,
   UD_SEND_WQE_U32_8_DMAC_5_M,
   UD_SEND_WQE_U32_8_DMAC_5_S,
   ah->av.mac[5]);
+
+   smac = (u8 *)hr_dev->dev_addr[qp->port];
+   loopback = ether_addr_equal_unaligned(ah->av.mac,
+ smac) ? 1 : 0;
+   roce_set_bit(ud_sq_wqe->u32_8,
+UD_SEND_WQE_U32_8_LOOPBACK_INDICATOR_S,
+loopback);
+
roce_set_field(ud_sq_wqe->u32_8,
   UD_SEND_WQE_U32_8_OPERATION_TYPE_M,
   UD_SEND_WQE_U32_8_OPERATION_TYPE_S,
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h 
b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
index 6004c7f..cf28f1b 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
@@ -440,6 +440,8 @@ struct hns_roce_ud_send_wqe {
 #define UD_SEND_WQE_U32_8_DMAC_5_M   \
(((1UL << 8) - 1) << UD_SEND_WQE_U32_8_DMAC_5_S)
 
+#define UD_SEND_WQE_U32_8_LOOPBACK_INDICATOR_S 22
+
 #define UD_SEND_WQE_U32_8_OPERATION_TYPE_S 16
 #define UD_SEND_WQE_U32_8_OPERATION_TYPE_M   \
(((1UL << 4) - 1) << UD_SEND_WQE_U32_8_OPERATION_TYPE_S)
-- 
1.7.9.5
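
The loopback decision in the patch above can be sketched in user space as below. memcmp stands in for ether_addr_equal_unaligned, and sketch_set_bit is a hypothetical helper that works on a host-order word and ignores the endianness handling a real WQE field may need. The shift value 22 is taken from the patch; the other names are assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ETH_ALEN		6
#define SKETCH_LOOPBACK_SHIFT	22	/* UD_SEND_WQE_U32_8_LOOPBACK_INDICATOR_S */

/* Returns 1 when the destination MAC equals the port's own MAC. */
static int sketch_is_loopback(const uint8_t *dmac, const uint8_t *smac)
{
	assert(dmac && smac);
	return memcmp(dmac, smac, SKETCH_ETH_ALEN) == 0;
}

/* Set or clear a single bit in a host-order 32-bit word. */
static uint32_t sketch_set_bit(uint32_t word, unsigned int shift, int value)
{
	uint32_t mask = (uint32_t)1 << shift;

	return value ? (word | mask) : (word & ~mask);
}
```

A send to the port's own MAC then sets bit 22 of the word, and any other destination clears it.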

Re: Ethernet not working on a different SoC with same eth HW

2016-11-04 Thread Florian Fainelli



On 11/04/2016 08:22 AM, Måns Rullgård wrote:
> Andrew Lunn  writes:
> 
>> On Fri, Nov 04, 2016 at 03:05:00PM +, Måns Rullgård wrote:
>>> Andrew Lunn  writes:
>>>
>>>>>> I agree with you. But fixing it is likely to break boards which
>>>>>> currently have "rgmii", but actually need the delay in order to work.
>>>>>
>>>>> Does the internal delay here refer to the PHY or the MAC?  It's a
>>>>> property of the MAC node after all.
>>>>
>>>> It is the PHY which applies the delay.
>>>
>>> Says who?
>>
>> The source code.
> 
> There's source code that disagrees with that.  The Broadcom GENET
> driver, for instance.

Correct, and in the case where the MAC adds the delay while transmitting
(because it supports that) the expectation is that the PHY would remove
such a delay internally. Conversely, the PHY would introduce a delay
while transmitting back to the MAC, in order to produce the desired 90
degree shift on the RGMII signals, and get the correct clock and data
alignment reproduced internally.

> 
>>>  Some MACs can do it too.
>>
>> I'm sure they can. But look at the code. Nearly none do, and those
>> that do are potentially broken.
> 
> Those few drivers that do anything differently based on these values
> enable clock delay in the MAC.  That's why I wrote the NB8800 driver the
> way I did.
> 

I don't really know what is wrong with the nb8800 driver at the moment,
so maybe this is just a configuration issue with the Atheros PHY driver;
it's not like it hasn't given people headaches, judging by the recent
discussions...
-- 
Florian

[PATCH for-next 07/11] IB/hns: Modify the macro for the timeout when cmd process

2016-11-04 Thread Salil Mehta

From: "Wei Hu (Xavier)" 

This patch replaces the per-class command timeout enum with a single
timeout macro, as follows:
Before modification:
 enum {
HNS_ROCE_CMD_TIME_CLASS_A   = 1,
HNS_ROCE_CMD_TIME_CLASS_B   = 1,
HNS_ROCE_CMD_TIME_CLASS_C   = 1,
 };
After modification:
 #define HNS_ROCE_CMD_TIMEOUT_MSECS 1

Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Salil Mehta  
---
 drivers/infiniband/hw/hns/hns_roce_cmd.h   |7 +--
 drivers/infiniband/hw/hns/hns_roce_cq.c|4 ++--
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c |8 
 drivers/infiniband/hw/hns/hns_roce_mr.c|4 ++--
 4 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.h 
b/drivers/infiniband/hw/hns/hns_roce_cmd.h
index e3997d3..ed14ad3 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.h
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.h
@@ -34,6 +34,7 @@
 #define _HNS_ROCE_CMD_H
 
 #define HNS_ROCE_MAILBOX_SIZE  4096
+#define HNS_ROCE_CMD_TIMEOUT_MSECS 1
 
 enum {
/* TPT commands */
@@ -57,12 +58,6 @@ enum {
HNS_ROCE_CMD_QUERY_QP   = 0x22,
 };
 
-enum {
-   HNS_ROCE_CMD_TIME_CLASS_A   = 1,
-   HNS_ROCE_CMD_TIME_CLASS_B   = 1,
-   HNS_ROCE_CMD_TIME_CLASS_C   = 1,
-};
-
 struct hns_roce_cmd_mailbox {
void   *buf;
dma_addr_t  dma;
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c 
b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 5dc8d92..461a273 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -77,7 +77,7 @@ static int hns_roce_sw2hw_cq(struct hns_roce_dev *dev,
 unsigned long cq_num)
 {
return hns_roce_cmd_mbox(dev, mailbox->dma, 0, cq_num, 0,
-   HNS_ROCE_CMD_SW2HW_CQ, HNS_ROCE_CMD_TIME_CLASS_A);
+   HNS_ROCE_CMD_SW2HW_CQ, HNS_ROCE_CMD_TIMEOUT_MSECS);
 }
 
 static int hns_roce_cq_alloc(struct hns_roce_dev *hr_dev, int nent,
@@ -176,7 +176,7 @@ static int hns_roce_hw2sw_cq(struct hns_roce_dev *dev,
 {
return hns_roce_cmd_mbox(dev, 0, mailbox ? mailbox->dma : 0, cq_num,
 mailbox ? 0 : 1, HNS_ROCE_CMD_HW2SW_CQ,
-HNS_ROCE_CMD_TIME_CLASS_A);
+HNS_ROCE_CMD_TIMEOUT_MSECS);
 }
 
 static void hns_roce_free_cq(struct hns_roce_dev *hr_dev,
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c 
b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 2d48406..c39a9b2 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -1871,12 +1871,12 @@ static int hns_roce_v1_qp_modify(struct hns_roce_dev 
*hr_dev,
if (op[cur_state][new_state] == HNS_ROCE_CMD_2RST_QP)
return hns_roce_cmd_mbox(hr_dev, 0, 0, hr_qp->qpn, 2,
 HNS_ROCE_CMD_2RST_QP,
-HNS_ROCE_CMD_TIME_CLASS_A);
+HNS_ROCE_CMD_TIMEOUT_MSECS);
 
if (op[cur_state][new_state] == HNS_ROCE_CMD_2ERR_QP)
return hns_roce_cmd_mbox(hr_dev, 0, 0, hr_qp->qpn, 2,
 HNS_ROCE_CMD_2ERR_QP,
-HNS_ROCE_CMD_TIME_CLASS_A);
+HNS_ROCE_CMD_TIMEOUT_MSECS);
 
mailbox = hns_roce_alloc_cmd_mailbox(hr_dev);
if (IS_ERR(mailbox))
@@ -1886,7 +1886,7 @@ static int hns_roce_v1_qp_modify(struct hns_roce_dev 
*hr_dev,
 
ret = hns_roce_cmd_mbox(hr_dev, mailbox->dma, 0, hr_qp->qpn, 0,
op[cur_state][new_state],
-   HNS_ROCE_CMD_TIME_CLASS_C);
+   HNS_ROCE_CMD_TIMEOUT_MSECS);
 
hns_roce_free_cmd_mailbox(hr_dev, mailbox);
return ret;
@@ -2681,7 +2681,7 @@ static int hns_roce_v1_query_qpc(struct hns_roce_dev 
*hr_dev,
 
ret = hns_roce_cmd_mbox(hr_dev, 0, mailbox->dma, hr_qp->qpn, 0,
HNS_ROCE_CMD_QUERY_QP,
-   HNS_ROCE_CMD_TIME_CLASS_A);
+   HNS_ROCE_CMD_TIMEOUT_MSECS);
if (!ret)
memcpy(hr_context, mailbox->buf, sizeof(*hr_context));
else
diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c 
b/drivers/infiniband/hw/hns/hns_roce_mr.c
index d3dfb5f..2227962 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -53,7 +53,7 @@ static int hns_roce_sw2hw_mpt(struct hns_roce_dev *hr_dev,
 {
return hns_roce_cmd_mbox(hr_dev, mailbox->dma, 0, mpt_index, 0,
 HNS_ROCE_CMD_SW2HW_MPT,
-HNS_ROCE_CMD_TIME_CLASS_B);
+HNS_ROCE_CMD_TIMEOUT_MSECS);
 }
 
 static int hns_roce_

[PATCH for-next 09/11] IB/hns: Change qpn allocation to round-robin mode.

2016-11-04 Thread Salil Mehta

From: "Wei Hu (Xavier)" 

When using CM to establish connections, a qp number that was freed
just now will be rejected by the IB core. To fix this problem, we
change qpn allocation to round-robin mode. We add a round-robin
mode for allocating resources using the bitmap, and use it for qp
numbers; other resources such as cq numbers and pd numbers keep the
non-round-robin mode.
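
The difference between the two modes can be sketched with a tiny user-space allocator. This is not the driver's bitmap code: the fixed table size, the byte-per-slot layout and the sketch_* names are assumptions made for illustration. Only the handling of the search hint on free mirrors the patch.

```c
#include <assert.h>

#define SKETCH_NO_RR	0
#define SKETCH_RR	1
#define SKETCH_MAX	8

struct sketch_bitmap {
	unsigned char used[SKETCH_MAX];
	unsigned int last;	/* where the next search starts */
};

/* Allocate the first free slot at or after 'last', wrapping once. */
static int sketch_alloc(struct sketch_bitmap *bm)
{
	unsigned int i;

	for (i = 0; i < SKETCH_MAX; i++) {
		unsigned int obj = (bm->last + i) % SKETCH_MAX;

		if (!bm->used[obj]) {
			bm->used[obj] = 1;
			bm->last = (obj + 1) % SKETCH_MAX;
			return (int)obj;
		}
	}
	return -1;	/* table is full */
}

static void sketch_free(struct sketch_bitmap *bm, unsigned int obj, int rr)
{
	assert(obj < SKETCH_MAX);
	bm->used[obj] = 0;
	/* Without round-robin, pull the hint back so the slot is reused first. */
	if (!rr && obj < bm->last)
		bm->last = obj;
}
```

In round-robin mode a freed number is not handed out again until the search wraps around, which is the property wanted for qp numbers here.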

Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Salil Mehta  
---
 drivers/infiniband/hw/hns/hns_roce_alloc.c  |   11 +++
 drivers/infiniband/hw/hns/hns_roce_cq.c |4 ++--
 drivers/infiniband/hw/hns/hns_roce_device.h |9 +++--
 drivers/infiniband/hw/hns/hns_roce_mr.c |2 +-
 drivers/infiniband/hw/hns/hns_roce_pd.c |5 +++--
 drivers/infiniband/hw/hns/hns_roce_qp.c |2 +-
 6 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_alloc.c 
b/drivers/infiniband/hw/hns/hns_roce_alloc.c
index 863a17a..605962f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_alloc.c
+++ b/drivers/infiniband/hw/hns/hns_roce_alloc.c
@@ -61,9 +61,10 @@ int hns_roce_bitmap_alloc(struct hns_roce_bitmap *bitmap, 
unsigned long *obj)
return ret;
 }
 
-void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, unsigned long obj)
+void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, unsigned long obj,
+ int rr)
 {
-   hns_roce_bitmap_free_range(bitmap, obj, 1);
+   hns_roce_bitmap_free_range(bitmap, obj, 1, rr);
 }
 
 int hns_roce_bitmap_alloc_range(struct hns_roce_bitmap *bitmap, int cnt,
@@ -106,7 +107,8 @@ int hns_roce_bitmap_alloc_range(struct hns_roce_bitmap 
*bitmap, int cnt,
 }
 
 void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap,
-   unsigned long obj, int cnt)
+   unsigned long obj, int cnt,
+   int rr)
 {
int i;
 
@@ -116,7 +118,8 @@ void hns_roce_bitmap_free_range(struct hns_roce_bitmap 
*bitmap,
for (i = 0; i < cnt; i++)
clear_bit(obj + i, bitmap->table);
 
-   bitmap->last = min(bitmap->last, obj);
+   if (!rr)
+   bitmap->last = min(bitmap->last, obj);
bitmap->top = (bitmap->top + bitmap->max + bitmap->reserved_top)
   & bitmap->mask;
spin_unlock(&bitmap->lock);
diff --git a/drivers/infiniband/hw/hns/hns_roce_cq.c 
b/drivers/infiniband/hw/hns/hns_roce_cq.c
index 461a273..c9f6c3d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cq.c
@@ -166,7 +166,7 @@ static int hns_roce_cq_alloc(struct hns_roce_dev *hr_dev, 
int nent,
hns_roce_table_put(hr_dev, &cq_table->table, hr_cq->cqn);
 
 err_out:
-   hns_roce_bitmap_free(&cq_table->bitmap, hr_cq->cqn);
+   hns_roce_bitmap_free(&cq_table->bitmap, hr_cq->cqn, BITMAP_NO_RR);
return ret;
 }
 
@@ -204,7 +204,7 @@ static void hns_roce_free_cq(struct hns_roce_dev *hr_dev,
spin_unlock_irq(&cq_table->lock);
 
hns_roce_table_put(hr_dev, &cq_table->table, hr_cq->cqn);
-   hns_roce_bitmap_free(&cq_table->bitmap, hr_cq->cqn);
+   hns_roce_bitmap_free(&cq_table->bitmap, hr_cq->cqn, BITMAP_NO_RR);
 }
 
 static int hns_roce_ib_get_cq_umem(struct hns_roce_dev *hr_dev,
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
b/drivers/infiniband/hw/hns/hns_roce_device.h
index 7242b14..593a42a 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -72,6 +72,9 @@
 #define HNS_ROCE_MAX_GID_NUM   16
 #define HNS_ROCE_GID_SIZE  16
 
+#define BITMAP_NO_RR   0
+#define BITMAP_RR  1
+
 #define MR_TYPE_MR 0x00
 #define MR_TYPE_DMA0x03
 
@@ -661,7 +664,8 @@ int hns_roce_buf_write_mtt(struct hns_roce_dev *hr_dev,
 void hns_roce_cleanup_qp_table(struct hns_roce_dev *hr_dev);
 
 int hns_roce_bitmap_alloc(struct hns_roce_bitmap *bitmap, unsigned long *obj);
-void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, unsigned long obj);
+void hns_roce_bitmap_free(struct hns_roce_bitmap *bitmap, unsigned long obj,
+int rr);
 int hns_roce_bitmap_init(struct hns_roce_bitmap *bitmap, u32 num, u32 mask,
 u32 reserved_bot, u32 resetrved_top);
 void hns_roce_bitmap_cleanup(struct hns_roce_bitmap *bitmap);
@@ -669,7 +673,8 @@ int hns_roce_bitmap_init(struct hns_roce_bitmap *bitmap, 
u32 num, u32 mask,
 int hns_roce_bitmap_alloc_range(struct hns_roce_bitmap *bitmap, int cnt,
int align, unsigned long *obj);
 void hns_roce_bitmap_free_range(struct hns_roce_bitmap *bitmap,
-   unsigned long obj, int cnt);
+   unsigned long obj, int cnt,
+   int rr);
 
 struct ib_ah *hns_roce_create_ah(struct ib_pd *

[PATCH for-next 01/11] IB/hns: Add the interface for querying QP1

2016-11-04 Thread Salil Mehta

From: Lijun Ou 

The old code only provided an interface for querying ordinary
(non-special) QPs. This patch mainly adds an interface for querying QP1.

Signed-off-by: Lijun Ou 
Reviewed-by: Wei Hu (Xavier) 
Signed-off-by: Salil Mehta  
---
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c |   87 +++-
 drivers/infiniband/hw/hns/hns_roce_hw_v1.h |6 +-
 2 files changed, 90 insertions(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c 
b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index 71232e5..ca8b784 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -2630,8 +2630,82 @@ static int hns_roce_v1_query_qpc(struct hns_roce_dev 
*hr_dev,
return ret;
 }
 
-int hns_roce_v1_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
-int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr)
+static int hns_roce_v1_q_sqp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
+int qp_attr_mask,
+struct ib_qp_init_attr *qp_init_attr)
+{
+   struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
+   struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
+   struct hns_roce_sqp_context *context;
+   u32 addr;
+
+   context = kzalloc(sizeof(*context), GFP_KERNEL);
+   if (!context)
+   return -ENOMEM;
+
+   mutex_lock(&hr_qp->mutex);
+
+   if (hr_qp->state == IB_QPS_RESET) {
+   qp_attr->qp_state = IB_QPS_RESET;
+   goto done;
+   }
+
+   addr = ROCEE_QP1C_CFG0_0_REG + hr_qp->port * sizeof(*context);
+   context->qp1c_bytes_4 = roce_read(hr_dev, addr);
+   context->sq_rq_bt_l = roce_read(hr_dev, addr + 1);
+   context->qp1c_bytes_12 = roce_read(hr_dev, addr + 2);
+   context->qp1c_bytes_16 = roce_read(hr_dev, addr + 3);
+   context->qp1c_bytes_20 = roce_read(hr_dev, addr + 4);
+   context->cur_rq_wqe_ba_l = roce_read(hr_dev, addr + 5);
+   context->qp1c_bytes_28 = roce_read(hr_dev, addr + 6);
+   context->qp1c_bytes_32 = roce_read(hr_dev, addr + 7);
+   context->cur_sq_wqe_ba_l = roce_read(hr_dev, addr + 8);
+   context->qp1c_bytes_40 = roce_read(hr_dev, addr + 9);
+
+   hr_qp->state = roce_get_field(context->qp1c_bytes_4,
+ QP1C_BYTES_4_QP_STATE_M,
+ QP1C_BYTES_4_QP_STATE_S);
+   qp_attr->qp_state   = hr_qp->state;
+   qp_attr->path_mtu   = IB_MTU_256;
+   qp_attr->path_mig_state = IB_MIG_ARMED;
+   qp_attr->qkey   = QKEY_VAL;
+   qp_attr->rq_psn = 0;
+   qp_attr->sq_psn = 0;
+   qp_attr->dest_qp_num= 1;
+   qp_attr->qp_access_flags = 6;
+
+   qp_attr->pkey_index = roce_get_field(context->qp1c_bytes_20,
+QP1C_BYTES_20_PKEY_IDX_M,
+QP1C_BYTES_20_PKEY_IDX_S);
+   qp_attr->port_num = hr_qp->port + 1;
+   qp_attr->sq_draining = 0;
+   qp_attr->max_rd_atomic = 0;
+   qp_attr->max_dest_rd_atomic = 0;
+   qp_attr->min_rnr_timer = 0;
+   qp_attr->timeout = 0;
+   qp_attr->retry_cnt = 0;
+   qp_attr->rnr_retry = 0;
+   qp_attr->alt_timeout = 0;
+
+done:
+   qp_attr->cur_qp_state = qp_attr->qp_state;
+   qp_attr->cap.max_recv_wr = hr_qp->rq.wqe_cnt;
+   qp_attr->cap.max_recv_sge = hr_qp->rq.max_gs;
+   qp_attr->cap.max_send_wr = hr_qp->sq.wqe_cnt;
+   qp_attr->cap.max_send_sge = hr_qp->sq.max_gs;
+   qp_attr->cap.max_inline_data = 0;
+   qp_init_attr->cap = qp_attr->cap;
+   qp_init_attr->create_flags = 0;
+
+   mutex_unlock(&hr_qp->mutex);
+   kfree(context);
+
+   return 0;
+}
+
+static int hns_roce_v1_q_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
+   int qp_attr_mask,
+   struct ib_qp_init_attr *qp_init_attr)
 {
struct hns_roce_dev *hr_dev = to_hr_dev(ibqp->device);
struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
@@ -2767,6 +2841,15 @@ int hns_roce_v1_query_qp(struct ib_qp *ibqp, struct 
ib_qp_attr *qp_attr,
return ret;
 }
 
+int hns_roce_v1_query_qp(struct ib_qp *ibqp, struct ib_qp_attr *qp_attr,
+int qp_attr_mask, struct ib_qp_init_attr *qp_init_attr)
+{
+   struct hns_roce_qp *hr_qp = to_hr_qp(ibqp);
+
+   return hr_qp->doorbell_qpn <= 1 ?
+   hns_roce_v1_q_sqp(ibqp, qp_attr, qp_attr_mask, qp_init_attr) :
+   hns_roce_v1_q_qp(ibqp, qp_attr, qp_attr_mask, qp_init_attr);
+}
 static void hns_roce_v1_destroy_qp_common(struct hns_roce_dev *hr_dev,
  struct hns_roce_qp *hr_qp,
  int is_user)
diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.h 
b/drivers/infiniband/hw/hns/hns_roce_hw_v1.h
index 539b0a3b..2e1878b 100644
--- a/drivers/infiniba

[PATCH for-next 03/11] IB/hns: Optimize the logic of allocating memory using APIs

2016-11-04 Thread Salil Mehta

From: "Wei Hu (Xavier)" 

This patch modifies the logic of allocating memory in the hns RoCE
driver. We use kcalloc instead of kmalloc_array followed by
bitmap_zero, and when kcalloc fails, fall back to vzalloc to
allocate the memory.

Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Ping Zhang 
Signed-off-by: Salil Mehta  
---
 drivers/infiniband/hw/hns/hns_roce_mr.c |   15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_mr.c 
b/drivers/infiniband/hw/hns/hns_roce_mr.c
index fb87883..d3dfb5f 100644
--- a/drivers/infiniband/hw/hns/hns_roce_mr.c
+++ b/drivers/infiniband/hw/hns/hns_roce_mr.c
@@ -137,11 +137,12 @@ static int hns_roce_buddy_init(struct hns_roce_buddy 
*buddy, int max_order)
 
for (i = 0; i <= buddy->max_order; ++i) {
s = BITS_TO_LONGS(1 << (buddy->max_order - i));
-   buddy->bits[i] = kmalloc_array(s, sizeof(long), GFP_KERNEL);
-   if (!buddy->bits[i])
-   goto err_out_free;
-
-   bitmap_zero(buddy->bits[i], 1 << (buddy->max_order - i));
+   buddy->bits[i] = kcalloc(s, sizeof(long), GFP_KERNEL);
+   if (!buddy->bits[i]) {
+   buddy->bits[i] = vzalloc(s * sizeof(long));
+   if (!buddy->bits[i])
+   goto err_out_free;
+   }
}
 
set_bit(0, buddy->bits[buddy->max_order]);
@@ -151,7 +152,7 @@ static int hns_roce_buddy_init(struct hns_roce_buddy 
*buddy, int max_order)
 
 err_out_free:
for (i = 0; i <= buddy->max_order; ++i)
-   kfree(buddy->bits[i]);
+   kvfree(buddy->bits[i]);
 
 err_out:
kfree(buddy->bits);
@@ -164,7 +165,7 @@ static void hns_roce_buddy_cleanup(struct hns_roce_buddy 
*buddy)
int i;
 
for (i = 0; i <= buddy->max_order; ++i)
-   kfree(buddy->bits[i]);
+   kvfree(buddy->bits[i]);
 
kfree(buddy->bits);
kfree(buddy->num_free);
-- 
1.7.9.5
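
The sizing used in hns_roce_buddy_init above can be reproduced in user space. The sketch below is an illustration only: calloc stands in for the zeroing kcalloc/vzalloc pair, and the SKETCH_* and sketch_* names are hypothetical. The rounding matches what BITS_TO_LONGS does.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

#define SKETCH_BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define SKETCH_BITS_TO_LONGS(n)	\
	(((n) + SKETCH_BITS_PER_LONG - 1) / SKETCH_BITS_PER_LONG)

/* Number of longs needed for the bitmap at a given order level. */
static size_t sketch_longs_for_order(unsigned int max_order, unsigned int order)
{
	assert(order <= max_order);
	return SKETCH_BITS_TO_LONGS((size_t)1 << (max_order - order));
}

/* Allocate one zeroed per-order bitmap; the caller releases it with free(). */
static unsigned long *sketch_alloc_bits(unsigned int max_order,
					unsigned int order)
{
	return calloc(sketch_longs_for_order(max_order, order),
		      sizeof(unsigned long));
}
```

Because the allocation is already zeroed, no separate bitmap_zero step is needed, which is the simplification the patch makes.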

[PATCH for-next 08/11] IB/hns: Modify query info named port_num when querying RC QP

2016-11-04 Thread Salil Mehta

From: "Wei Hu (Xavier)" 

This patch fixes the qp_attr->port_num value returned by the query
on hip06; it is now taken from hr_qp->port.

Signed-off-by: Wei Hu (Xavier) 
Signed-off-by: Salil Mehta  
---
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c |4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c 
b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
index c39a9b2..76edebe 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hw_v1.c
@@ -2861,9 +2861,7 @@ static int hns_roce_v1_q_qp(struct ib_qp *ibqp, struct 
ib_qp_attr *qp_attr,
qp_attr->pkey_index = roce_get_field(context->qpc_bytes_12,
  QP_CONTEXT_QPC_BYTES_12_P_KEY_INDEX_M,
  QP_CONTEXT_QPC_BYTES_12_P_KEY_INDEX_S);
-   qp_attr->port_num = (u8)roce_get_field(context->qpc_bytes_156,
-QP_CONTEXT_QPC_BYTES_156_PORT_NUM_M,
-QP_CONTEXT_QPC_BYTES_156_PORT_NUM_S) + 1;
+   qp_attr->port_num = hr_qp->port + 1;
qp_attr->sq_draining = 0;
qp_attr->max_rd_atomic = roce_get_field(context->qpc_bytes_156,
 QP_CONTEXT_QPC_BYTES_156_INITIATOR_DEPTH_M,
-- 
1.7.9.5

[PATCH for-next 10/11] IB/hns: Implement the add_gid/del_gid and optimize the GIDs management

2016-11-04 Thread Salil Mehta

From: Shaobo Xu 

IB core has implemented the calculation of GIDs and the management
of GID tables, and it is now responsible for supplying the query
function for GIDs. So the calculation of GIDs and the management of
GID tables in the RoCE driver is redundant.

This patch implements add_gid/del_gid to set the GIDs in the RoCE
driver, removes the redundant calculation and management of GIDs
from the net device and inet notifier calls, and updates
query_gid.
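
The add_gid callback in this patch receives a 1-based port number and converts it to a 0-based index before touching hardware. A minimal user-space sketch of that conversion and its range check follows; sketch_port_index is a hypothetical helper, not a function from the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a 1-based verbs port number to a 0-based driver port index.
 * Returns 0 and stores the index, or -EINVAL when the port is out of range.
 */
static int sketch_port_index(uint8_t port_num, uint8_t num_ports, uint8_t *port)
{
	/* port_num == 0 wraps to 255 in uint8_t and fails the check below. */
	uint8_t idx = (uint8_t)(port_num - 1);

	assert(port != NULL);
	if (idx >= num_ports)
		return -EINVAL;
	*port = idx;
	return 0;
}
```

Doing the subtraction in an unsigned 8-bit type means a single comparison rejects both port 0 and ports past the end.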

Signed-off-by: Shaobo Xu 
Reviewed-by: Wei Hu (Xavier) 
Signed-off-by: Salil Mehta  
---
 drivers/infiniband/hw/hns/hns_roce_device.h |2 -
 drivers/infiniband/hw/hns/hns_roce_main.c   |  270 +--
 2 files changed, 48 insertions(+), 224 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
b/drivers/infiniband/hw/hns/hns_roce_device.h
index 593a42a..9ef1cc3 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -429,8 +429,6 @@ struct hns_roce_ib_iboe {
struct net_device  *netdevs[HNS_ROCE_MAX_PORTS];
struct notifier_block   nb;
struct notifier_block   nb_inet;
-   /* 16 GID is shared by 6 port in v1 engine. */
-   union ib_gidgid_table[HNS_ROCE_MAX_GID_NUM];
u8  phy_port[HNS_ROCE_MAX_PORTS];
 };
 
diff --git a/drivers/infiniband/hw/hns/hns_roce_main.c 
b/drivers/infiniband/hw/hns/hns_roce_main.c
index 6770171..795ef97 100644
--- a/drivers/infiniband/hw/hns/hns_roce_main.c
+++ b/drivers/infiniband/hw/hns/hns_roce_main.c
@@ -35,52 +35,13 @@
 #include 
 #include 
 #include 
+#include 
 #include "hns_roce_common.h"
 #include "hns_roce_device.h"
 #include "hns_roce_user.h"
 #include "hns_roce_hem.h"
 
 /**
- * hns_roce_addrconf_ifid_eui48 - Get default gid.
- * @eui: eui.
- * @vlan_id:  gid
- * @dev:  net device
- * Description:
- *MAC convert to GID
- *gid[0..7] = fe80   
- *gid[8] = mac[0] ^ 2
- *gid[9] = mac[1]
- *gid[10] = mac[2]
- *gid[11] = ff(VLAN ID high byte (4 MS bits))
- *gid[12] = fe(VLAN ID low byte)
- *gid[13] = mac[3]
- *gid[14] = mac[4]
- *gid[15] = mac[5]
- */
-static void hns_roce_addrconf_ifid_eui48(u8 *eui, u16 vlan_id,
-struct net_device *dev)
-{
-   memcpy(eui, dev->dev_addr, 3);
-   memcpy(eui + 5, dev->dev_addr + 3, 3);
-   if (vlan_id < 0x1000) {
-   eui[3] = vlan_id >> 8;
-   eui[4] = vlan_id & 0xff;
-   } else {
-   eui[3] = 0xff;
-   eui[4] = 0xfe;
-   }
-   eui[0] ^= 2;
-}
-
-static void hns_roce_make_default_gid(struct net_device *dev, union ib_gid 
*gid)
-{
-   memset(gid, 0, sizeof(*gid));
-   gid->raw[0] = 0xFE;
-   gid->raw[1] = 0x80;
-   hns_roce_addrconf_ifid_eui48(&gid->raw[8], 0x, dev);
-}
-
-/**
  * hns_get_gid_index - Get gid index.
  * @hr_dev: pointer to structure hns_roce_dev.
  * @port:  port, value range: 0 ~ MAX
@@ -96,30 +57,6 @@ int hns_get_gid_index(struct hns_roce_dev *hr_dev, u8 port, 
int gid_index)
return gid_index * hr_dev->caps.num_ports + port;
 }
 
-static int hns_roce_set_gid(struct hns_roce_dev *hr_dev, u8 port, int 
gid_index,
-union ib_gid *gid)
-{
-   struct device *dev = &hr_dev->pdev->dev;
-   u8 gid_idx = 0;
-
-   if (gid_index >= hr_dev->caps.gid_table_len[port]) {
-   dev_err(dev, "gid_index %d illegal, port %d gid range: 0~%d\n",
-   gid_index, port, hr_dev->caps.gid_table_len[port] - 1);
-   return -EINVAL;
-   }
-
-   gid_idx = hns_get_gid_index(hr_dev, port, gid_index);
-
-   if (!memcmp(gid, &hr_dev->iboe.gid_table[gid_idx], sizeof(*gid)))
-   return -EINVAL;
-
-   memcpy(&hr_dev->iboe.gid_table[gid_idx], gid, sizeof(*gid));
-
-   hr_dev->hw->set_gid(hr_dev, port, gid_index, gid);
-
-   return 0;
-}
-
 static void hns_roce_set_mac(struct hns_roce_dev *hr_dev, u8 port, u8 *addr)
 {
u8 phy_port;
@@ -147,15 +84,44 @@ static void hns_roce_set_mtu(struct hns_roce_dev *hr_dev, 
u8 port, int mtu)
hr_dev->hw->set_mtu(hr_dev, phy_port, tmp);
 }
 
-static void hns_roce_update_gids(struct hns_roce_dev *hr_dev, int port)
+static int hns_roce_add_gid(struct ib_device *device, u8 port_num,
+   unsigned int index, const union ib_gid *gid,
+   const struct ib_gid_attr *attr, void **context)
+{
+   struct hns_roce_dev *hr_dev = to_hr_dev(device);
+   u8 port = port_num - 1;
+   unsigned long flags;
+
+   if (port >= hr_dev->caps.num_ports)
+   return -EINVAL;
+
+   spin_lock_irqsave(&hr_dev->iboe.lock, flags);
+
+   hr_dev->hw->set_gid(hr_dev, port, index, (union ib_gid *)gid);
+
+   spin_unlock_irqrestore(&hr_dev->iboe.lock, flags);
+
+   return 0;
+}
+
+static int

[PATCH for-next 11/11] IB/hns: Fix for Checkpatch.pl comment style errors

2016-11-04 Thread Salil Mehta

This patch corrects the comment style errors caught by the
checkpatch.pl script.

Signed-off-by: Salil Mehta  
---
 drivers/infiniband/hw/hns/hns_roce_cmd.c|8 ++--
 drivers/infiniband/hw/hns/hns_roce_device.h |   28 +++---
 drivers/infiniband/hw/hns/hns_roce_eq.c |6 +--
 drivers/infiniband/hw/hns/hns_roce_hem.c|6 +--
 drivers/infiniband/hw/hns/hns_roce_hw_v1.c  |   56 +--
 drivers/infiniband/hw/hns/hns_roce_main.c   |   28 +++---
 6 files changed, 66 insertions(+), 66 deletions(-)

diff --git a/drivers/infiniband/hw/hns/hns_roce_cmd.c 
b/drivers/infiniband/hw/hns/hns_roce_cmd.c
index 2a0b6c0..8c1f7a6 100644
--- a/drivers/infiniband/hw/hns/hns_roce_cmd.c
+++ b/drivers/infiniband/hw/hns/hns_roce_cmd.c
@@ -216,10 +216,10 @@ static int __hns_roce_cmd_mbox_wait(struct hns_roce_dev 
*hr_dev, u64 in_param,
goto out;
 
/*
-   * It is timeout when wait_for_completion_timeout return 0
-   * The return value is the time limit set in advance
-   * how many seconds showing
-   */
+* It is timeout when wait_for_completion_timeout return 0
+* The return value is the time limit set in advance
+* how many seconds showing
+*/
if (!wait_for_completion_timeout(&context->done,
 msecs_to_jiffies(timeout))) {
dev_err(dev, "[cmd]wait_for_completion_timeout timeout\n");
diff --git a/drivers/infiniband/hw/hns/hns_roce_device.h 
b/drivers/infiniband/hw/hns/hns_roce_device.h
index 9ef1cc3..e48464d 100644
--- a/drivers/infiniband/hw/hns/hns_roce_device.h
+++ b/drivers/infiniband/hw/hns/hns_roce_device.h
@@ -201,9 +201,9 @@ struct hns_roce_bitmap {
 /* Order = 0: bitmap is biggest, order = max bitmap is least (only a bit) */
 /* Every bit repesent to a partner free/used status in bitmap */
 /*
-* Initial, bits of other bitmap are all 0 except that a bit of max_order is 1
-* Bit = 1 represent to idle and available; bit = 0: not available
-*/
+ * Initial, bits of other bitmap are all 0 except that a bit of max_order is 1
+ * Bit = 1 represent to idle and available; bit = 0: not available
+ */
 struct hns_roce_buddy {
/* Members point to every order level bitmap */
unsigned long **bits;
@@ -365,25 +365,25 @@ struct hns_roce_cmdq {
struct mutexhcr_mutex;
struct semaphorepoll_sem;
/*
-   * Event mode: cmd register mutex protection,
-   * ensure to not exceed max_cmds and user use limit region
-   */
+* Event mode: cmd register mutex protection,
+* ensure to not exceed max_cmds and user use limit region
+*/
struct semaphore event_sem;
int max_cmds;
spinlock_t  context_lock;
int free_head;
struct hns_roce_cmd_context *context;
/*
-   * Result of get integer part
-   * which max_comds compute according a power of 2
-   */
+* Result of get integer part
+* which max_comds compute according a power of 2
+*/
u16 token_mask;
/*
-   * Process whether use event mode, init default non-zero
-   * After the event queue of cmd event ready,
-   * can switch into event mode
-   * close device, switch into poll mode(non event mode)
-   */
+* Process whether use event mode, init default non-zero
+* After the event queue of cmd event ready,
+* can switch into event mode
+* close device, switch into poll mode(non event mode)
+*/
u8  use_events;
u8  toggle;
 };
diff --git a/drivers/infiniband/hw/hns/hns_roce_eq.c 
b/drivers/infiniband/hw/hns/hns_roce_eq.c
index 21e21b0..50f8649 100644
--- a/drivers/infiniband/hw/hns/hns_roce_eq.c
+++ b/drivers/infiniband/hw/hns/hns_roce_eq.c
@@ -371,9 +371,9 @@ static int hns_roce_aeq_ovf_int(struct hns_roce_dev *hr_dev,
int i = 0;
 
/**
-   * AEQ overflow ECC mult bit err CEQ overflow alarm
-   * must clear interrupt, mask irq, clear irq, cancel mask operation
-   */
+* AEQ overflow ECC mult bit err CEQ overflow alarm
+* must clear interrupt, mask irq, clear irq, cancel mask operation
+*/
aeshift_val = roce_read(hr_dev, ROCEE_CAEP_AEQC_AEQE_SHIFT_REG);
 
if (roce_get_bit(aeshift_val,
diff --git a/drivers/infiniband/hw/hns/hns_roce_hem.c 
b/drivers/infiniband/hw/hns/hns_roce_hem.c
index 250d8f2..c5104e0 100644
--- a/drivers/infiniband/hw/hns/hns_roce_hem.c
+++ b/drivers/infiniband/hw/hns/hns_roce_hem.c
@@ -80,9 +80,9 @@ struct hns_roce_hem *hns_roce_alloc_hem(struct hns_roce_dev 
*hr_dev, int npages,
--order;
 
/*
-   * Alloc memory one time. If failed, don't alloc small block
-   * memory, directly return fail.
-

Re: [PATCH v6 7/7] arm64: dts: NS2: add AMAC ethernet support

2016-11-04 Thread Jon Mason

On Fri, Nov 04, 2016 at 04:31:40PM +0300, Sergei Shtylyov wrote:
> Hello.
> 
> On 11/4/2016 8:11 AM, Jon Mason wrote:
> 
> >Add support for the AMAC ethernet to the Broadcom Northstar2 SoC device
> >tree
> >
> >Signed-off-by: Jon Mason 
> >---
> > arch/arm64/boot/dts/broadcom/ns2-svk.dts |  5 +
> > arch/arm64/boot/dts/broadcom/ns2.dtsi| 12 
> > 2 files changed, 17 insertions(+)
> >
> >diff --git a/arch/arm64/boot/dts/broadcom/ns2-svk.dts 
> >b/arch/arm64/boot/dts/broadcom/ns2-svk.dts
> >index b09f3bc..c4d5442 100644
> >--- a/arch/arm64/boot/dts/broadcom/ns2-svk.dts
> >+++ b/arch/arm64/boot/dts/broadcom/ns2-svk.dts
> >@@ -56,6 +56,10 @@
> > };
> > };
> >
> >+&enet {
> >+status = "ok";
> 
>The spec dictates it should be "okay" (although "ok" is also recognized).

The rest of the file uses "ok".  So, the addition above is consistent
with the other entries.

Perhaps a patch outside this series to convert the entire file from
"ok" to "okay" would be acceptable to you.

Thanks,
Jon

> 
> >+};
> >+
> > &pci_phy0 {
> > status = "ok";
> > };
> >@@ -174,6 +178,7 @@
> > &mdio_mux_iproc {
> > mdio@10 {
> > gphy0: eth-phy@10 {
> >+enet-phy-lane-swap;
> > reg = <0x10>;
> > };
> > };
> [...]
> 
> MBR, Sergei
>

Re: [PATCH 2/2] net: ethernet: nb8800: handle all RGMII declinations

2016-11-04 Thread Sebastian Frias

On 11/04/2016 05:23 PM, Florian Fainelli wrote:
> 
> 
> On 11/04/2016 08:05 AM, Sebastian Frias wrote:
>> Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
>> and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
>> delay for tx and rx only") added several RGMII declinations:
>> PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
>> PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.
>>
>> Those are all RGMII modes (1Gbit) and must be considered that way when
>> setting the MAC Mode or the Pads Mode for the HW to work properly.
>>
>> Signed-off-by: Sebastian Frias 
>> ---
>>  drivers/net/ethernet/aurora/nb8800.c | 10 ++
>>  1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/net/ethernet/aurora/nb8800.c 
>> b/drivers/net/ethernet/aurora/nb8800.c
>> index d2855c9..6230ace 100644
>> --- a/drivers/net/ethernet/aurora/nb8800.c
>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>> @@ -609,7 +609,10 @@ static void nb8800_mac_config(struct net_device *dev)
>>  mac_mode |= HALF_DUPLEX;
>>  
>>  if (gigabit) {
>> -if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
>> +if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII ||
>> +priv->phy_mode == PHY_INTERFACE_MODE_RGMII_ID ||
>> +priv->phy_mode == PHY_INTERFACE_MODE_RGMII_RXID ||
>> +priv->phy_mode == PHY_INTERFACE_MODE_RGMII_TXID)
> 
> phy_interface_is_rgmii(phydev)?

Thanks! I'll post an update.

> 
>>  mac_mode |= RGMII_MODE;
>>  
>>  mac_mode |= GMAC_MODE;
>> @@ -1278,9 +1281,8 @@ static int nb8800_tangox_init(struct net_device *dev)
>>  break;
>>  
>>  case PHY_INTERFACE_MODE_RGMII:
>> -pad_mode = PAD_MODE_RGMII;
>> -break;
>> -
>> +case PHY_INTERFACE_MODE_RGMII_ID:
>> +case PHY_INTERFACE_MODE_RGMII_RXID:
>>  case PHY_INTERFACE_MODE_RGMII_TXID:
>>  pad_mode = PAD_MODE_RGMII;
>>  break;
>>
>
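
The helper suggested in the review above can replace the four-way comparison with a single range check. Below is a minimal standalone sketch of that idea. The enum, the flag values and the demo_* names are hypothetical stand-ins for illustration, not the kernel definitions; the only assumption carried over is that the four RGMII modes sit next to each other in the enum.

```c
#include <stdbool.h>

/* Hypothetical stand-in for phy_interface_t. Only the relative order of
 * the four RGMII entries matters for this sketch. */
enum demo_phy_mode {
	DEMO_MODE_MII,
	DEMO_MODE_GMII,
	DEMO_MODE_RGMII,
	DEMO_MODE_RGMII_ID,
	DEMO_MODE_RGMII_RXID,
	DEMO_MODE_RGMII_TXID,
	DEMO_MODE_SGMII,
};

/* Hypothetical MAC mode bits, chosen only for the example. */
#define DEMO_RGMII_MODE 0x1u
#define DEMO_GMAC_MODE  0x2u

/* True for plain RGMII and for every internal-delay variant. */
static bool demo_mode_is_rgmii(enum demo_phy_mode mode)
{
	return mode >= DEMO_MODE_RGMII && mode <= DEMO_MODE_RGMII_TXID;
}

/* Mirrors the shape of the gigabit branch in the quoted hunk. */
static unsigned int demo_mac_mode(enum demo_phy_mode mode, bool gigabit)
{
	unsigned int mac_mode = 0;

	if (gigabit) {
		if (demo_mode_is_rgmii(mode))
			mac_mode |= DEMO_RGMII_MODE;
		mac_mode |= DEMO_GMAC_MODE;
	}
	return mac_mode;
}
```

With a helper of this shape, a newly added RGMII variant only needs to be placed inside the enum range instead of being added to every driver's comparison chain.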

Re: [PATCH 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Florian Fainelli



On 11/04/2016 08:36 AM, Sebastian Frias wrote:
> Hi Måns,
> 
> On 11/04/2016 04:18 PM, Måns Rullgård wrote:
>> Sebastian Frias  writes:
>>
>>> The delay can be applied at PHY or MAC level, but since
>>> PHY drivers will apply the delay at PHY level when using
>>> one of the "internal delay" declinations of RGMII mode
>>> (like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
>>> at MAC level causes issues.
>>
>> The Broadcom GENET driver does the same thing.
>>
> 
> Well, I don't know who uses that driver, or why they did it that way.

I do use this driver and it works for me (tm). I tested mostly with
Broadcom PHYs and Ethernet switches and only rarely with third-party PHYs,
but all of that code is in tree (drivers/net/phy/broadcom.c,
drivers/net/dsa/b53/), so feel free to "audit" that part of the code too.

The configuration of the GENET port multiplexer requires us to specify
how we want to align the clock and data. If we don't do that, and the
PHY also disagrees about how its own delays should be configured,
mayhem ensues, ranging from occasional transmit success to high rates
of CRC/FCS errors in the best cases.

I did verify that the settings were correct using a scope FWIW.

> 
> However, with the current code and DT bindings, if one requires
> the delay, phy-connection-type="rgmii-txid" must be set.

Yes, and we would set it correctly for our Broadcom reference boards
using this driver.

> 
> But when doing so, both the Atheros 8035 and the Aurora NB8800 drivers
> will apply the delay.
> 
> I think a better way of dealing with this is that both, PHY and MAC
> drivers exchange information so that the delay is applied only once.

Exchange what information? The PHY device interface (phydev->interface)
conveys the needed information for both entities.

> 
> I can see how to do that in another patch set.
> 
>>> Signed-off-by: Sebastian Frias 
>>> ---
>>>  drivers/net/ethernet/aurora/nb8800.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/ethernet/aurora/nb8800.c 
>>> b/drivers/net/ethernet/aurora/nb8800.c
>>> index b59aa35..d2855c9 100644
>>> --- a/drivers/net/ethernet/aurora/nb8800.c
>>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>>> @@ -1282,7 +1282,7 @@ static int nb8800_tangox_init(struct net_device *dev)
>>> break;
>>>
>>> case PHY_INTERFACE_MODE_RGMII_TXID:
>>> -   pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
>>> +   pad_mode = PAD_MODE_RGMII;
>>> break;
>>>
>>> default:
>>> -- 
>>> 1.7.11.2
>>
>> If this change is correct (and I'm not convinced it is), that case
>> should be merged with the one above it and PHY_INTERFACE_MODE_RGMII_RXID
>> added as well.
>>
> 
> I can do a single patch.
> 
> The reason I made two patches was that it was clear what this patch
> does, i.e.: do not apply the delay at MAC level, and what the subsequent
> patch does, i.e.: handle all RGMII declinations.
> 
> Best regards,
> 
> Sebastian
> 

-- 
Florian
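
The concern in this thread is that the TX clock delay ends up applied twice, once by the PHY and once by the MAC. The following standalone sketch shows one way to express an "exactly once" rule. It assumes the convention that the "-id" and "-txid" modes ask the PHY to insert the TX delay; the enum and function names are hypothetical, and which side should own the delay is precisely what is being debated above, so treat this as an illustration rather than the agreed behaviour.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the four RGMII interface modes. */
enum demo_rgmii_mode {
	DEMO_RGMII,
	DEMO_RGMII_ID,
	DEMO_RGMII_RXID,
	DEMO_RGMII_TXID,
};

/* Assumption: in ID and TXID modes the PHY inserts the TX delay. */
static bool demo_phy_adds_tx_delay(enum demo_rgmii_mode mode)
{
	return mode == DEMO_RGMII_ID || mode == DEMO_RGMII_TXID;
}

/* The MAC adds the delay only when the board needs one and the PHY is
 * not already providing it, so the delay is never applied twice. */
static bool demo_mac_adds_tx_delay(enum demo_rgmii_mode mode,
				   bool board_needs_tx_delay)
{
	return board_needs_tx_delay && !demo_phy_adds_tx_delay(mode);
}
```

Under this rule both sides derive their behaviour from the same interface mode value, which is the point made about phydev->interface above.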

Re: [PATCH net-next v4 3/9] ipv6: sr: add support for SRH encapsulation and injection with lwtunnels

2016-11-04 Thread Tom Herbert

On Fri, Nov 4, 2016 at 3:29 AM, David Lebrun  wrote:
> This patch creates a new type of interfaceless lightweight tunnel (SEG6),
> enabling the encapsulation and injection of SRH within locally emitted
> packets and forwarded packets.
>
> From a configuration viewpoint, a seg6 tunnel would be configured as follows:
>
>   ip -6 ro ad fc00::1/128 encap seg6 mode encap segs fc42::1,fc42::2,fc42::3 dev eth0
>
> Any packet whose destination address is fc00::1 would thus be encapsulated
> within an outer IPv6 header containing the SRH with three segments, and would
> actually be routed to the first segment of the list. If `mode inline' was
> specified instead of `mode encap', then the SRH would be directly inserted
> after the IPv6 header without outer encapsulation.
>
> Signed-off-by: David Lebrun 
> ---
>  include/linux/seg6_iptunnel.h  |   6 +
>  include/net/seg6.h |   3 +
>  include/uapi/linux/lwtunnel.h  |   1 +
>  include/uapi/linux/seg6_iptunnel.h |  41 
>  net/core/lwtunnel.c|   2 +
>  net/ipv6/Makefile  |   2 +-
>  net/ipv6/seg6.c|   7 +
>  net/ipv6/seg6_iptunnel.c   | 380 
> +
>  8 files changed, 441 insertions(+), 1 deletion(-)
>  create mode 100644 include/linux/seg6_iptunnel.h
>  create mode 100644 include/uapi/linux/seg6_iptunnel.h
>  create mode 100644 net/ipv6/seg6_iptunnel.c
>
> diff --git a/include/linux/seg6_iptunnel.h b/include/linux/seg6_iptunnel.h
> new file mode 100644
> index 000..5377cf6
> --- /dev/null
> +++ b/include/linux/seg6_iptunnel.h
> @@ -0,0 +1,6 @@
> +#ifndef _LINUX_SEG6_IPTUNNEL_H
> +#define _LINUX_SEG6_IPTUNNEL_H
> +
> +#include 
> +
> +#endif
> diff --git a/include/net/seg6.h b/include/net/seg6.h
> index 7c7b8ed..5dac54e 100644
> --- a/include/net/seg6.h
> +++ b/include/net/seg6.h
> @@ -16,6 +16,7 @@
>
>  #include 
>  #include 
> +#include 
>
>  static inline void update_csum_diff4(struct sk_buff *skb, __be32 from,
>  __be32 to)
> @@ -48,5 +49,7 @@ static inline struct seg6_pernet_data *seg6_pernet(struct 
> net *net)
>
>  extern int seg6_init(void);
>  extern void seg6_exit(void);
> +extern int seg6_iptunnel_init(void);
> +extern void seg6_iptunnel_exit(void);
>
>  #endif
> diff --git a/include/uapi/linux/lwtunnel.h b/include/uapi/linux/lwtunnel.h
> index a478fe8..453cc62 100644
> --- a/include/uapi/linux/lwtunnel.h
> +++ b/include/uapi/linux/lwtunnel.h
> @@ -9,6 +9,7 @@ enum lwtunnel_encap_types {
> LWTUNNEL_ENCAP_IP,
> LWTUNNEL_ENCAP_ILA,
> LWTUNNEL_ENCAP_IP6,
> +   LWTUNNEL_ENCAP_SEG6,
> __LWTUNNEL_ENCAP_MAX,
>  };
>
> diff --git a/include/uapi/linux/seg6_iptunnel.h 
> b/include/uapi/linux/seg6_iptunnel.h
> new file mode 100644
> index 000..da5524a
> --- /dev/null
> +++ b/include/uapi/linux/seg6_iptunnel.h
> @@ -0,0 +1,41 @@
> +/*
> + *  SR-IPv6 implementation
> + *
> + *  Author:
> + *  David Lebrun 
> + *
> + *
> + *  This program is free software; you can redistribute it and/or
> + *  modify it under the terms of the GNU General Public License
> + *  as published by the Free Software Foundation; either version
> + *  2 of the License, or (at your option) any later version.
> + */
> +
> +#ifndef _UAPI_LINUX_SEG6_IPTUNNEL_H
> +#define _UAPI_LINUX_SEG6_IPTUNNEL_H
> +
> +enum {
> +   SEG6_IPTUNNEL_UNSPEC,
> +   SEG6_IPTUNNEL_SRH,
> +   __SEG6_IPTUNNEL_MAX,
> +};
> +#define SEG6_IPTUNNEL_MAX (__SEG6_IPTUNNEL_MAX - 1)
> +
> +struct seg6_iptunnel_encap {
> +   int flags;
> +   struct ipv6_sr_hdr srh[0];
> +};
> +
> +#define SEG6_IPTUN_ENCAP_SIZE(x) ((sizeof(*x)) + (((x)->srh->hdrlen + 1) << 
> 3))
> +
> +#define SEG6_IPTUN_FLAG_ENCAP   0x1
> +
> +static inline size_t seg6_lwt_headroom(struct seg6_iptunnel_encap *tuninfo)
> +{
> +   int encap = !!(tuninfo->flags & SEG6_IPTUN_FLAG_ENCAP);
> +
> +   return ((tuninfo->srh->hdrlen + 1) << 3) +
> +  (encap * sizeof(struct ipv6hdr));
> +}
> +
> +#endif
> diff --git a/net/core/lwtunnel.c b/net/core/lwtunnel.c
> index 88fd642..03976e9 100644
> --- a/net/core/lwtunnel.c
> +++ b/net/core/lwtunnel.c
> @@ -39,6 +39,8 @@ static const char *lwtunnel_encap_str(enum 
> lwtunnel_encap_types encap_type)
> return "MPLS";
> case LWTUNNEL_ENCAP_ILA:
> return "ILA";
> +   case LWTUNNEL_ENCAP_SEG6:
> +   return "SEG6";
> case LWTUNNEL_ENCAP_IP6:
> case LWTUNNEL_ENCAP_IP:
> case LWTUNNEL_ENCAP_NONE:
> diff --git a/net/ipv6/Makefile b/net/ipv6/Makefile
> index c92010d..59ee92f 100644
> --- a/net/ipv6/Makefile
> +++ b/net/ipv6/Makefile
> @@ -9,7 +9,7 @@ ipv6-objs :=af_inet6.o anycast.o ip6_output.o ip6_input.o 
> addrconf.o \
> route.o ip6_fib.o ipv6_sockglue.o ndisc.o udp.o udplite.o \
> raw.o icmp.o mcast.o reassembly.o tcp_ipv6.o ping.o \
> exthdrs.o datagram.o ip6
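
The headroom computation in seg6_lwt_headroom() quoted above can be checked with a small standalone calculation. In this sketch the demo_* names and the 40-byte IPv6 header constant are written out for illustration, and the hdrlen value of 6 in the usage note assumes an SRH carrying three 16-byte segments and no TLVs; none of that is taken from the patch beyond the formula itself.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_IPV6_HDR_LEN 40u	/* fixed IPv6 header size in bytes */
#define DEMO_FLAG_ENCAP   0x1	/* same role as SEG6_IPTUN_FLAG_ENCAP */

/* SRH size in bytes: hdrlen counts 8-octet units beyond the first 8. */
static size_t demo_srh_len(uint8_t hdrlen)
{
	return ((size_t)hdrlen + 1) << 3;
}

/* Extra headroom needed: the SRH itself, plus an outer IPv6 header
 * when the tunnel runs in encap mode rather than inline mode. */
static size_t demo_seg6_headroom(uint8_t hdrlen, int flags)
{
	int encap = !!(flags & DEMO_FLAG_ENCAP);

	return demo_srh_len(hdrlen) + (size_t)encap * DEMO_IPV6_HDR_LEN;
}
```

For hdrlen = 6 this gives a 56-byte SRH, so 96 bytes of headroom in encap mode and 56 bytes in inline mode.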

Re: [PATCH 2/2] net: ethernet: nb8800: handle all RGMII declinations

2016-11-04 Thread Florian Fainelli



On 11/04/2016 08:05 AM, Sebastian Frias wrote:
> Commit a999589ccaae ("phylib: add RGMII-ID interface mode definition")
> and commit 7d400a4c5897 ("phylib: add PHY interface modes for internal
> delay for tx and rx only") added several RGMII declinations:
> PHY_INTERFACE_MODE_RGMII_ID, PHY_INTERFACE_MODE_RGMII_RXID and
> PHY_INTERFACE_MODE_RGMII_TXID to deal with internal delays.
> 
> Those are all RGMII modes (1Gbit) and must be considered that way when
> setting the MAC Mode or the Pads Mode for the HW to work properly.
> 
> Signed-off-by: Sebastian Frias 
> ---
>  drivers/net/ethernet/aurora/nb8800.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/aurora/nb8800.c 
> b/drivers/net/ethernet/aurora/nb8800.c
> index d2855c9..6230ace 100644
> --- a/drivers/net/ethernet/aurora/nb8800.c
> +++ b/drivers/net/ethernet/aurora/nb8800.c
> @@ -609,7 +609,10 @@ static void nb8800_mac_config(struct net_device *dev)
>   mac_mode |= HALF_DUPLEX;
>  
>   if (gigabit) {
> - if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII)
> + if (priv->phy_mode == PHY_INTERFACE_MODE_RGMII ||
> + priv->phy_mode == PHY_INTERFACE_MODE_RGMII_ID ||
> + priv->phy_mode == PHY_INTERFACE_MODE_RGMII_RXID ||
> + priv->phy_mode == PHY_INTERFACE_MODE_RGMII_TXID)

phy_interface_is_rgmii(phydev)?

>   mac_mode |= RGMII_MODE;
>  
>   mac_mode |= GMAC_MODE;
> @@ -1278,9 +1281,8 @@ static int nb8800_tangox_init(struct net_device *dev)
>   break;
>  
>   case PHY_INTERFACE_MODE_RGMII:
> - pad_mode = PAD_MODE_RGMII;
> - break;
> -
> + case PHY_INTERFACE_MODE_RGMII_ID:
> + case PHY_INTERFACE_MODE_RGMII_RXID:
>   case PHY_INTERFACE_MODE_RGMII_TXID:
>   pad_mode = PAD_MODE_RGMII;
>   break;
> 

-- 
Florian

[PATCH net] ipv4: update comment to document GSO fragmentation cases.

2016-11-04 Thread Lance Richardson

This is a follow-up to commit eb96202f1e34 ("ipv4: allow local
fragmentation in ip_finish_output_gso()"), updating the comment
documenting cases in which fragmentation is needed for egress
GSO packets.

Suggested-by: Shmulik Ladkani 
Signed-off-by: Lance Richardson 
---
 net/ipv4/ip_output.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 4971401..c2dae40 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -244,12 +244,18 @@ static int ip_finish_output_gso(struct net *net, struct 
sock *sk,
if (skb_gso_validate_mtu(skb, mtu))
return ip_finish_output2(net, sk, skb);
 
-   /* Slowpath -  GSO segment length is exceeding the dst MTU.
+   /* Slowpath -  GSO segment length exceeds the egress MTU.
 *
-* This can happen in two cases:
-* 1) TCP GRO packet, DF bit not set
-* 2) skb arrived via virtio-net, we thus get TSO/GSO skbs directly
-* from host network stack.
+* This can happen in several cases:
+*  - Forwarding of a TCP GRO skb, when DF flag is not set.
+*  - Forwarding of an skb that arrived on a virtualization interface
+*(virtio-net/vhost/tap) with TSO/GSO size set by other network
+*stack.
+*  - Local GSO skb transmitted on an NETIF_F_TSO tunnel stacked over an
+*interface with a smaller MTU.
+*  - Arriving GRO skb (or GSO skb in a virtualized environment) that is
+*bridged to a NETIF_F_TSO tunnel stacked over an interface with an
+*insufficient MTU.
 */
features = netif_skb_features(skb);
BUILD_BUG_ON(sizeof(*IPCB(skb)) > SKB_SGO_CB_OFFSET);
-- 
2.5.5
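
All of the cases listed in the updated comment come down to the same condition: the length of one resulting segment, headers included, is larger than the egress MTU. A simplified model of that check is sketched below. The struct and function names are hypothetical and this is not the kernel's skb_gso_validate_mtu() implementation, only the arithmetic it is understood to perform.

```c
#include <stdbool.h>

/* Hypothetical description of a GSO packet, reduced to the three
 * lengths that matter for the MTU comparison. */
struct demo_gso_pkt {
	unsigned int net_hdr_len;	/* e.g. 20 for IPv4 without options */
	unsigned int trans_hdr_len;	/* e.g. 20 for TCP without options */
	unsigned int gso_size;		/* payload bytes per resulting segment */
};

/* Network-layer length of one segment after segmentation. */
static unsigned int demo_gso_seglen(const struct demo_gso_pkt *p)
{
	return p->net_hdr_len + p->trans_hdr_len + p->gso_size;
}

/* Fast path applies when every segment fits the egress MTU; otherwise
 * the slow path (segment, then fragment if allowed) is needed. */
static bool demo_gso_fits_mtu(const struct demo_gso_pkt *p, unsigned int mtu)
{
	return demo_gso_seglen(p) <= mtu;
}
```

A packet built with gso_size 1460 for a 1500-byte MTU fits exactly, but no longer fits once it is forwarded or tunnelled over a path with a 1400-byte MTU.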

Re: [PATCH] Documentation: networking: dsa: Update tagging protocols

2016-11-04 Thread Florian Fainelli



On 11/04/2016 05:16 AM, Fabian Mewes wrote:
> Add Qualcomm QCA tagging introduced in cafdc45c9 to the
> list of supported protocols.
> 
> Signed-off-by: Fabian Mewes 

Acked-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 1/2] net: mdio-mux-mmioreg: Add support for 16bit and 32bit register sizes

2016-11-04 Thread Florian Fainelli



On 11/04/2016 08:51 AM, Neil Armstrong wrote:
> In order to support PHY switching on Amlogic GXL SoCs, add support for
> 16bit and 32bit register sizes.
> 
> Reviewed-by: Andrew Lunn 
> Signed-off-by: Neil Armstrong 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [PATCH net-next 2/2] net: phy: Add Meson GXL Internal PHY driver

2016-11-04 Thread Florian Fainelli



On 11/04/2016 08:51 AM, Neil Armstrong wrote:
> Add driver for the Internal RMII PHY found in the Amlogic Meson GXL SoCs.
> 
> This PHY seems to only implement some standard registers and needs some
> workarounds to provide autoneg values from vendor registers.
> 
> Some magic values are currently used to configure the PHY, and this is a
> temporary setup until clarification about these register names and
> register fields is provided by Amlogic.
> 
> Signed-off-by: Neil Armstrong 

Reviewed-by: Florian Fainelli 
-- 
Florian

Re: [net-next PATCH 0/7] stmmac: dwmac-sti refactor+cleanup

2016-11-04 Thread Joachim Eastwood

Hi Giuseppe,

On 4 November 2016 at 14:49, Giuseppe CAVALLARO  wrote:
> Hello Joachim.
>
> I have tested the patches on STiH390 with GMAC4 and the driver is ok.
>
> So you can add my Acked-by/Tested-by in the V2.

Thanks! I'll send a V2 later today or tomorrow.


> I would just ask that, when renaming sti_dwmac_init to sti_dwmac_set_phy_mode,
> you use another name: sti_dwmac_set_mode could be good, IMO.

Sure thing.


regards,
Joachim Eastwood

RE: [PATCH net] tipc: Guard against tiny MTU in tipc_msg_build()

2016-11-04 Thread Jon Maloy

> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of 张谦
> Sent: Friday, 04 November, 2016 03:24
> To: Jon Maloy ; Ben Hutchings
> ; Ying Xue 
> Cc: netdev@vger.kernel.org; Eric Dumazet 
> Subject: Re: [PATCH net] tipc: Guard against tiny MTU in tipc_msg_build()
> 
> Hi,
> I think both tipc_l2_device_event() and tipc_enable_l2_media() need to refuse
> a tiny MTU for TIPC bearers.

Right, except that when looking into the code for tipc_l2_device_event() I 
realize that it currently doesn't try to re-adapt to a new MTU at all. It just 
calls tipc_reset_bearer(), which I suspect has changed somewhere along the road 
to ignore the MTU. So, you only need to change tipc_enable_l2_media().

///jon

> 
> tipc_l2_device_event() used to update the TIPC MTU value when executing a
> command like 'ifconfig eth0 MTU 1 up'.
> tipc_enable_l2_media() will be invoked when the TIPC network is created.
> 
> Thanks.
> 
> Qian Zhang
> MarvelTeam Qihoo 360
> 
> 
> 
> -Original Message-
> From: Jon Maloy [mailto:jon.ma...@ericsson.com]
> Sent: 01 November, 2016 19:37
> To: 张谦; Ben Hutchings; Ying Xue
> Cc: netdev@vger.kernel.org; Eric Dumazet
> Subject: RE: [PATCH net] tipc: Guard against tiny MTU in tipc_msg_build()
> 
> Hi,
> I think we all agreed in the end that this is a possible, but highly
> implausible, scenario, and more a point of exploit than a functional bug.
> The solution is very simple, and described further down in this mail thread.
> I have not done anything to it yet, but you are welcome to contribute.
> 
> BR
> ///jon
> 
> 
> > -Original Message-
> > From: 张谦 [mailto:zhangqia...@360.cn]
> > Sent: Tuesday, 01 November, 2016 02:35
> > To: Ben Hutchings ; Jon Maloy
> > ; Ying Xue 
> > Cc: netdev@vger.kernel.org; Eric Dumazet 
> > Subject: Re: [PATCH net] tipc: Guard against tiny MTU in
> > tipc_msg_build()
> >
> > Hi all,
> > I have completed a PoC that can help you confirm this issue.
> >
> > Two weeks have passed since the last mail; can you tell me how the
> > patch for this flaw is progressing?
> >
> > Thanks.
> >
> > Qian Zhang
> > Marvel Team Qihoo 360
> >
> >
> > -Original Message-
> > From: Ben Hutchings [mailto:b...@decadent.org.uk]
> > Sent: 21 October, 2016 23:00
> > To: Jon Maloy; Ying Xue
> > Cc: netdev@vger.kernel.org; 张谦; Eric Dumazet
> > Subject: Re: [PATCH net] tipc: Guard against tiny MTU in tipc_msg_build()
> >
> > On Fri, 2016-10-21 at 14:57 +, Jon Maloy wrote:
> > > > -Original Message-
> > > > From: Ben Hutchings [mailto:b...@decadent.org.uk]
> > > > Sent: Thursday, 20 October, 2016 12:40
> > > > To: Jon Maloy ; Ying Xue
> > > > Cc: netdev@vger.kernel.org; Qian Zhang ; Eric Dumazet
> > > > Subject: Re: [PATCH net] tipc: Guard against tiny MTU in
> > > > tipc_msg_build()
> > > >
> > > > On Thu, 2016-10-20 at 14:51 +, Jon Maloy wrote:
> > > > [...]
> > > > > > At this point we're about to copy INT_H_SIZE + mhsz bytes into
> > > > > > the first fragment.  If that's already limited to be less than
> > > > > > or equal to MAX_H_SIZE, comparing with MAX_H_SIZE would be fine.
> > > > > > But if MAX_H_SIZE is the maximum value of mhsz, that won't be
> > > > > > good enough.
> > > > >
> > > > > MAX_H_SIZE is 60 bytes, but in practice you will never see an
> > > > > mhsz larger than the biggest header we are actually using, which
> > > > > is MCAST_H_SIZE (==44 bytes).
> > > > > INT_H_SIZE is 40 bytes, so you are in reality testing for
> > > > > whether we have an mtu < 84 bytes.
> > > > > You won't find any interfaces or protocols that come even close
> > > > > to this limitation, so to me this test is redundant.
> > > >
> > > > But I can easily create such an interface:
> > > >
> > > > $ unshare -n -U -r
> > > > # ip l set lo mtu 1
> > > >
> > > > Ben.
> > >
> > >
> > > It won't be very useful though. But I assume you mean it could be a
> > > possible exploit,
> >
> > Exactly.
> >
> > >  and I suspect a few other things would break both in TIPC and in
> > > other stacks if you do anything like that. I think the solution to
> > > this is not to fix all possible places in the code where this can go
> > > wrong, but rather to have a generic test where we refuse to attach
> > > bearers/interfaces offering an mtu < e.g. 1000 bytes. This can
> > > easily be done in tipc_enable_l2_media().
> >
> > Yes.
> >
> > Ben.
> >
> > --
> > Ben Hutchings
> > One of the nice things about standards is that there are so many of them.
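
The fix agreed on in this thread is a single check when a bearer is enabled, rather than guards at every place where a tiny MTU could go wrong. A minimal standalone sketch of such a check follows. The function and macro names are hypothetical, the 1000-byte floor is the "e.g." value floated above and not a decided limit, and the header sizes are the ones quoted in the discussion.

```c
#include <errno.h>

#define DEMO_INT_H_SIZE     40	/* internal header size quoted in the thread */
#define DEMO_MCAST_H_SIZE   44	/* largest header actually in use, per the thread */
#define DEMO_MIN_BEARER_MTU 1000	/* hypothetical floor, "e.g. 1000 bytes" */

/* Smallest MTU for which the first fragment can hold both headers. */
static int demo_min_fragment_room(void)
{
	return DEMO_INT_H_SIZE + DEMO_MCAST_H_SIZE;
}

/* Reject a bearer whose device offers less than the chosen floor.
 * Returns 0 when the MTU is acceptable, -EINVAL otherwise. */
static int demo_check_bearer_mtu(int mtu)
{
	if (mtu < DEMO_MIN_BEARER_MTU)
		return -EINVAL;
	return 0;
}
```

Any floor above the 84-byte header requirement closes the hole shown with "ip l set lo mtu 1"; the much larger value simply leaves room for useful payload.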

[PATCH net-next 0/2] ARM64: Add Internal PHY support for Meson GXL

2016-11-04 Thread Neil Armstrong

The Amlogic Meson GXL SoCs have an internal RMII PHY that is muxed with the
external RGMII pins.

In order to support switching between the two PHY links, support for wider
register sizes must be added to mdio-mux-mmioreg.

The DT related patches submitted as RFC in [3] will be sent in a separate
patchset due to multiple patchsets and DTSI migrations.

Changes since v2 RFC patchset at : [3]
 - Change phy Kconfig/Makefile alphabetic order
 - GXL dtsi cleanup

Changes since original RFC patchset at : [2]
 - Remove meson8b experimental phy switching
 - Switch to mdio-mux-mmioreg with extended size support
 - Add internal phy support for S905x and p231
 - Add external PHY support for p230

[1] http://lkml.kernel.org/r/1477932286-27482-1-git-send-email-narmstr...@baylibre.com
[2] http://lkml.kernel.org/r/1477060838-14164-1-git-send-email-narmstr...@baylibre.com
[3] http://lkml.kernel.org/r/1477932987-27871-1-git-send-email-narmstr...@baylibre.com

Neil Armstrong (2):
  net: mdio-mux-mmioreg: Add support for 16bit and 32bit register sizes
  net: phy: Add Meson GXL Internal PHY driver

 .../devicetree/bindings/net/mdio-mux-mmioreg.txt   |  4 +-
 drivers/net/phy/Kconfig|  5 ++
 drivers/net/phy/Makefile   |  1 +
 drivers/net/phy/mdio-mux-mmioreg.c | 60 
 drivers/net/phy/meson-gxl.c| 81 ++
 5 files changed, 136 insertions(+), 15 deletions(-)
 create mode 100644 drivers/net/phy/meson-gxl.c

-- 
1.9.1

[PATCH net-next 1/2] net: mdio-mux-mmioreg: Add support for 16bit and 32bit register sizes

2016-11-04 Thread Neil Armstrong

In order to support PHY switching on Amlogic GXL SoCs, add support for
16bit and 32bit register sizes.

Reviewed-by: Andrew Lunn 
Signed-off-by: Neil Armstrong 
---
 .../devicetree/bindings/net/mdio-mux-mmioreg.txt   |  4 +-
 drivers/net/phy/mdio-mux-mmioreg.c | 60 +-
 2 files changed, 49 insertions(+), 15 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/mdio-mux-mmioreg.txt 
b/Documentation/devicetree/bindings/net/mdio-mux-mmioreg.txt
index 8516929..065e8bd 100644
--- a/Documentation/devicetree/bindings/net/mdio-mux-mmioreg.txt
+++ b/Documentation/devicetree/bindings/net/mdio-mux-mmioreg.txt
@@ -3,7 +3,7 @@ Properties for an MDIO bus multiplexer controlled by a 
memory-mapped device
 This is a special case of a MDIO bus multiplexer.  A memory-mapped device,
 like an FPGA, is used to control which child bus is connected.  The mdio-mux
 node must be a child of the memory-mapped device.  The driver currently only
-supports devices with eight-bit registers.
+supports devices with 8, 16 or 32-bit registers.
 
 Required properties in addition to the generic multiplexer properties:
 
@@ -11,7 +11,7 @@ Required properties in addition to the generic multiplexer 
properties:
 
 - reg : integer, contains the offset of the register that controls the bus
multiplexer.  The size field in the 'reg' property is the size of
-   register, and must therefore be 1.
+   register, and must therefore be 1, 2, or 4.
 
 - mux-mask : integer, contains an eight-bit mask that specifies which
bits in the register control the actual bus multiplexer.  The
diff --git a/drivers/net/phy/mdio-mux-mmioreg.c 
b/drivers/net/phy/mdio-mux-mmioreg.c
index d0bed52..6a33646 100644
--- a/drivers/net/phy/mdio-mux-mmioreg.c
+++ b/drivers/net/phy/mdio-mux-mmioreg.c
@@ -21,7 +21,8 @@
 struct mdio_mux_mmioreg_state {
void *mux_handle;
phys_addr_t phys;
-   uint8_t mask;
+   unsigned int iosize;
+   unsigned int mask;
 };
 
 /*
@@ -47,17 +48,47 @@ static int mdio_mux_mmioreg_switch_fn(int current_child, 
int desired_child,
struct mdio_mux_mmioreg_state *s = data;
 
if (current_child ^ desired_child) {
-   void __iomem *p = ioremap(s->phys, 1);
-   uint8_t x, y;
-
+   void __iomem *p = ioremap(s->phys, s->iosize);
if (!p)
return -ENOMEM;
 
-   x = ioread8(p);
-   y = (x & ~s->mask) | desired_child;
-   if (x != y) {
-   iowrite8((x & ~s->mask) | desired_child, p);
-   pr_debug("%s: %02x -> %02x\n", __func__, x, y);
+   switch (s->iosize) {
+   case sizeof(uint8_t): {
+   uint8_t x, y;
+
+   x = ioread8(p);
+   y = (x & ~s->mask) | desired_child;
+   if (x != y) {
+   iowrite8((x & ~s->mask) | desired_child, p);
+   pr_debug("%s: %02x -> %02x\n", __func__, x, y);
+   }
+
+   break;
+   }
+   case sizeof(uint16_t): {
+   uint16_t x, y;
+
+   x = ioread16(p);
+   y = (x & ~s->mask) | desired_child;
+   if (x != y) {
+   iowrite16((x & ~s->mask) | desired_child, p);
+   pr_debug("%s: %04x -> %04x\n", __func__, x, y);
+   }
+
+   break;
+   }
+   case sizeof(uint32_t): {
+   uint32_t x, y;
+
+   x = ioread32(p);
+   y = (x & ~s->mask) | desired_child;
+   if (x != y) {
+   iowrite32((x & ~s->mask) | desired_child, p);
+   pr_debug("%s: %08x -> %08x\n", __func__, x, y);
+   }
+
+   break;
+   }
}
 
iounmap(p);
@@ -88,8 +119,11 @@ static int mdio_mux_mmioreg_probe(struct platform_device 
*pdev)
}
s->phys = res.start;
 
-   if (resource_size(&res) != sizeof(uint8_t)) {
-   dev_err(&pdev->dev, "only 8-bit registers are supported\n");
+   s->iosize = resource_size(&res);
+   if (s->iosize != sizeof(uint8_t) &&
+   s->iosize != sizeof(uint16_t) &&
+   s->iosize != sizeof(uint32_t)) {
+   dev_err(&pdev->dev, "only 8/16/32-bit registers are 
supported\n");
return -EINVAL;
}
 
@@ -98,8 +132,8 @@ static int mdio_mux_mmioreg_probe(struct platform_device 
*pdev)
dev_err(&pdev->dev, "missing or invalid mux-mask property\n");
return -ENODEV;
}
-   if (be32_to_cpup(iprop) > 255) {
-   dev_err(&pdev->dev, "only 8-bit registers are s
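
The switch function in this patch performs the same masked read-modify-write at three register widths. The standalone sketch below reproduces that logic against ordinary memory so it can be tried outside the kernel. It uses memcpy() where the driver uses ioread/iowrite on an ioremap()ed address, and the function name and return convention are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Clear the mux bits selected by mask and set them to child, using an
 * access of iosize bytes. Returns 0 on success, -1 for an unsupported
 * width. reg points at plain memory standing in for the register. */
static int demo_mux_select(void *reg, unsigned int iosize,
			   unsigned int mask, unsigned int child)
{
	switch (iosize) {
	case sizeof(uint8_t): {
		uint8_t x;

		memcpy(&x, reg, sizeof(x));
		x = (uint8_t)((x & ~mask) | child);
		memcpy(reg, &x, sizeof(x));
		return 0;
	}
	case sizeof(uint16_t): {
		uint16_t x;

		memcpy(&x, reg, sizeof(x));
		x = (uint16_t)((x & ~mask) | child);
		memcpy(reg, &x, sizeof(x));
		return 0;
	}
	case sizeof(uint32_t): {
		uint32_t x;

		memcpy(&x, reg, sizeof(x));
		x = (uint32_t)((x & ~mask) | child);
		memcpy(reg, &x, sizeof(x));
		return 0;
	}
	default:
		return -1;
	}
}
```

Bits outside the mask are preserved at every width, which is why the mask field has to grow along with the register size.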

[PATCH net-next 2/2] net: phy: Add Meson GXL Internal PHY driver

2016-11-04 Thread Neil Armstrong

Add driver for the Internal RMII PHY found in the Amlogic Meson GXL SoCs.

This PHY seems to only implement some standard registers and needs some
workarounds to provide autoneg values from vendor registers.

Some magic values are currently used to configure the PHY, and this is a
temporary setup until clarification about these register names and
register fields is provided by Amlogic.

Signed-off-by: Neil Armstrong 
---
 drivers/net/phy/Kconfig |  5 +++
 drivers/net/phy/Makefile|  1 +
 drivers/net/phy/meson-gxl.c | 81 +
 3 files changed, 87 insertions(+)
 create mode 100644 drivers/net/phy/meson-gxl.c

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index 2651c8d..b48943a 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -264,6 +264,11 @@ config MARVELL_PHY
---help---
  Currently has a driver for the 88E1011S
 
+config MESON_GXL_PHY
+   tristate "Amlogic Meson GXL Internal PHY"
+   ---help---
+ Currently has a driver for the Amlogic Meson GXL Internal PHY
+
 config MICREL_PHY
tristate "Micrel PHYs"
---help---
diff --git a/drivers/net/phy/Makefile b/drivers/net/phy/Makefile
index e58667d..3cd5af7 100644
--- a/drivers/net/phy/Makefile
+++ b/drivers/net/phy/Makefile
@@ -41,6 +41,7 @@ obj-$(CONFIG_INTEL_XWAY_PHY)  += intel-xway.o
 obj-$(CONFIG_LSI_ET1011C_PHY)  += et1011c.o
 obj-$(CONFIG_LXT_PHY)  += lxt.o
 obj-$(CONFIG_MARVELL_PHY)  += marvell.o
+obj-$(CONFIG_MESON_GXL_PHY)+= meson-gxl.o
 obj-$(CONFIG_MICREL_KS8995MA)  += spi_ks8995.o
 obj-$(CONFIG_MICREL_PHY)   += micrel.o
 obj-$(CONFIG_MICROCHIP_PHY)+= microchip.o
diff --git a/drivers/net/phy/meson-gxl.c b/drivers/net/phy/meson-gxl.c
new file mode 100644
index 000..1ea69b7
--- /dev/null
+++ b/drivers/net/phy/meson-gxl.c
@@ -0,0 +1,81 @@
+/*
+ * Amlogic Meson GXL Internal PHY Driver
+ *
+ * Copyright (C) 2015 Amlogic, Inc. All rights reserved.
+ * Copyright (C) 2016 BayLibre, SAS. All rights reserved.
+ * Author: Neil Armstrong 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int meson_gxl_config_init(struct phy_device *phydev)
+{
+   /* Enable Analog and DSP register Bank access by */
+   phy_write(phydev, 0x14, 0x);
+   phy_write(phydev, 0x14, 0x0400);
+   phy_write(phydev, 0x14, 0x);
+   phy_write(phydev, 0x14, 0x0400);
+
+   /* Write Analog register 23 */
+   phy_write(phydev, 0x17, 0x8E0D);
+   phy_write(phydev, 0x14, 0x4417);
+
+   /* Enable fractional PLL */
+   phy_write(phydev, 0x17, 0x0005);
+   phy_write(phydev, 0x14, 0x5C1B);
+
+   /* Program fraction FR_PLL_DIV1 */
+   phy_write(phydev, 0x17, 0x029A);
+   phy_write(phydev, 0x14, 0x5C1D);
+
+   /* Program fraction FR_PLL_DIV1 */
+   phy_write(phydev, 0x17, 0x);
+   phy_write(phydev, 0x14, 0x5C1C);
+
+   return 0;
+}
+
+static struct phy_driver meson_gxl_phy[] = {
+   {
+   .phy_id      = 0x01814400,
+   .phy_id_mask = 0xfffffff0,
+   .name        = "Meson GXL Internal PHY",
+   .features    = PHY_BASIC_FEATURES,
+   .flags       = PHY_IS_INTERNAL,
+   .config_init = meson_gxl_config_init,
+   .config_aneg = genphy_config_aneg,
+   .aneg_done   = genphy_aneg_done,
+   .read_status = genphy_read_status,
+   .suspend     = genphy_suspend,
+   .resume      = genphy_resume,
+   },
+};
+
+static struct mdio_device_id __maybe_unused meson_gxl_tbl[] = {
+   { 0x01814400, 0xfffffff0 },
+   { }
+};
+
+module_phy_driver(meson_gxl_phy);
+
+MODULE_DEVICE_TABLE(mdio, meson_gxl_tbl);
+
+MODULE_DESCRIPTION("Amlogic Meson GXL Internal PHY driver");
+MODULE_AUTHOR("Baoqi wang");
+MODULE_AUTHOR("Neil Armstrong ");
+MODULE_LICENSE("GPL");
-- 
1.9.1
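
The register writes in meson_gxl_config_init() come in pairs: a data word goes to register 0x17, then a control word goes to register 0x14. The sketch below decodes those control words in plain C. The bit layout (write flag in bit 14, bank select in bits 12:11, test-mode flag in bit 10, write address in bits 4:0) is an assumption, not something the patch states, and every macro and function name is hypothetical. It is meant only to show that the constants in the patch are consistent with such a layout, for example that 0x4417 would address register 23 (0x17) as the "Write Analog register 23" comment says.

```c
#include <stdint.h>

/*
 * Assumed layout of the control word written to register 0x14.
 * These names and bit positions are illustrative assumptions; the
 * patch itself only contains the raw hexadecimal constants.
 */
#define CTL_WRITE       (1u << 14)   /* assumed: latch the data word */
#define CTL_BANK_SHIFT  11
#define CTL_BANK_MASK   (0x3u << CTL_BANK_SHIFT)
#define CTL_TEST_MODE   (1u << 10)   /* assumed: bank access enabled */
#define CTL_WADDR_MASK  0x1fu        /* assumed: target register index */

struct ctl_fields {
	unsigned int write;      /* 1 if the write flag is set */
	unsigned int bank;       /* selected register bank */
	unsigned int test_mode;  /* 1 if the test-mode flag is set */
	unsigned int waddr;      /* register index inside the bank */
};

/* Build a control word for a banked write under the assumed layout. */
static uint16_t encode_ctl(unsigned int bank, unsigned int waddr)
{
	return (uint16_t)(CTL_WRITE |
			  ((bank << CTL_BANK_SHIFT) & CTL_BANK_MASK) |
			  CTL_TEST_MODE |
			  (waddr & CTL_WADDR_MASK));
}

/* Split a control word back into its assumed fields. */
static struct ctl_fields decode_ctl(uint16_t word)
{
	struct ctl_fields f;

	f.write = (word & CTL_WRITE) ? 1u : 0u;
	f.bank = (word & CTL_BANK_MASK) >> CTL_BANK_SHIFT;
	f.test_mode = (word & CTL_TEST_MODE) ? 1u : 0u;
	f.waddr = word & CTL_WADDR_MASK;
	return f;
}
```

Under this reading, 0x4417 selects bank 0, register 0x17, while 0x5C1B, 0x5C1C and 0x5C1D all select bank 3 and differ only in the register index.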

Re: [PATCH 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias

Hi Andrew,

On 11/04/2016 04:11 PM, Andrew Lunn wrote:
> On Fri, Nov 04, 2016 at 04:02:24PM +0100, Sebastian Frias wrote:
>> The delay can be applied at PHY or MAC level, but since
>> PHY drivers will apply the delay at PHY level when using
>> one of the "internal delay" declinations of RGMII mode
>> (like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
>> at MAC level causes issues.
>>
>> Signed-off-by: Sebastian Frias 
>> ---
>>  drivers/net/ethernet/aurora/nb8800.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/aurora/nb8800.c 
>> b/drivers/net/ethernet/aurora/nb8800.c
>> index b59aa35..d2855c9 100644
>> --- a/drivers/net/ethernet/aurora/nb8800.c
>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>> @@ -1282,7 +1282,7 @@ static int nb8800_tangox_init(struct net_device *dev)
>>  break;
>>  
>>  case PHY_INTERFACE_MODE_RGMII_TXID:
>> -pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
>> +pad_mode = PAD_MODE_RGMII;
>>  break;
> 
> How many boards use this Ethernet driver? How many boards are you
> potentially breaking, because they need this delay?
> 

This part is specific to the Tango architecture, as noted by the
function name "nb8800_tangox_init".

Also, the register used here is Sigma-specific (i.e. not related to the
Aurora VLSI MAC, "au-nb8800").

Without this patch, if we set phy-connection-type="rgmii-txid" in the
DT, both the PHY and the MAC will apply the delay.
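
To make the double-delay case concrete, here is a minimal model in plain C. It is not driver code: the enum is a local stand-in for the kernel's interface modes, the function names are made up for illustration, and it assumes a PHY driver that adds the TX delay for the TXID and ID modes.

```c
#include <stdbool.h>

/* Local stand-ins for the RGMII interface modes named in this thread. */
enum rgmii_mode {
	MODE_RGMII,
	MODE_RGMII_ID,
	MODE_RGMII_RXID,
	MODE_RGMII_TXID
};

/* Assumption: the PHY driver adds the TX delay for TXID and ID. */
static bool phy_adds_tx_delay(enum rgmii_mode mode)
{
	return mode == MODE_RGMII_TXID || mode == MODE_RGMII_ID;
}

/* MAC side before the patch: TXID also sets PAD_MODE_GTX_CLK_DELAY. */
static bool mac_adds_tx_delay_before_patch(enum rgmii_mode mode)
{
	return mode == MODE_RGMII_TXID;
}

/* MAC side after the patch: plain PAD_MODE_RGMII, no MAC-level delay. */
static bool mac_adds_tx_delay_after_patch(enum rgmii_mode mode)
{
	(void)mode;
	return false;
}

/* Number of times the TX delay ends up applied on the link. */
static int tx_delay_count(enum rgmii_mode mode,
			  bool (*mac_adds)(enum rgmii_mode))
{
	return (phy_adds_tx_delay(mode) ? 1 : 0) + (mac_adds(mode) ? 1 : 0);
}
```

With MODE_RGMII_TXID the count is 2 before the patch and 1 after it, which is the behaviour described above.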

Best regards,

Sebastian

> I guess it is a small number, because doesn't it require the PHY is
> also broken, not adding a delay when it should?
> 
>  Andrew
>

Re: [PATCH 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Sebastian Frias

Hi Måns,

On 11/04/2016 04:18 PM, Måns Rullgård wrote:
> Sebastian Frias  writes:
> 
>> The delay can be applied at PHY or MAC level, but since
>> PHY drivers will apply the delay at PHY level when using
>> one of the "internal delay" declinations of RGMII mode
>> (like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
>> at MAC level causes issues.
> 
> The Broadcom GENET driver does the same thing.
> 

Well, I don't know who uses that driver, or why they did it that way.

However, with the current code and DT bindings, if one requires
the delay, phy-connection-type="rgmii-txid" must be set.

But when doing so, both the Atheros 8035 and the Aurora NB8800 drivers
will apply the delay.

I think a better way of dealing with this is for the PHY and MAC
drivers to exchange information so that the delay is applied only once.

I can look into how to do that in another patch set.
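
As a rough sketch of what "exchange information" could mean, the plain C below picks a single owner for the TX delay. Everything in it is hypothetical: neither phylib nor the nb8800 driver exposes capability flags or a helper by these names, and the PHY-first preference is only one possible policy.

```c
#include <stdbool.h>

/* Hypothetical capability flags reported by each side. */
struct delay_caps {
	bool phy_can_delay_tx;
	bool mac_can_delay_tx;
};

enum delay_owner {
	DELAY_NOT_NEEDED,   /* the board does not ask for a TX delay */
	DELAY_BY_PHY,       /* the PHY applies it, the MAC must not */
	DELAY_BY_MAC,       /* the PHY cannot, so the MAC applies it */
	DELAY_UNAVAILABLE   /* requested, but neither side can provide it */
};

/* Choose exactly one side to apply the delay, preferring the PHY. */
static enum delay_owner pick_tx_delay_owner(bool tx_delay_wanted,
					    struct delay_caps caps)
{
	if (!tx_delay_wanted)
		return DELAY_NOT_NEEDED;
	if (caps.phy_can_delay_tx)
		return DELAY_BY_PHY;
	if (caps.mac_can_delay_tx)
		return DELAY_BY_MAC;
	return DELAY_UNAVAILABLE;
}
```

The MAC-level delay would then be enabled only when the result is DELAY_BY_MAC, so a PHY without the option is still covered and the delay is never applied twice.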

>> Signed-off-by: Sebastian Frias 
>> ---
>>  drivers/net/ethernet/aurora/nb8800.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/ethernet/aurora/nb8800.c 
>> b/drivers/net/ethernet/aurora/nb8800.c
>> index b59aa35..d2855c9 100644
>> --- a/drivers/net/ethernet/aurora/nb8800.c
>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>> @@ -1282,7 +1282,7 @@ static int nb8800_tangox_init(struct net_device *dev)
>>  break;
>>
>>  case PHY_INTERFACE_MODE_RGMII_TXID:
>> -pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
>> +pad_mode = PAD_MODE_RGMII;
>>  break;
>>
>>  default:
>> -- 
>> 1.7.11.2
> 
> If this change is correct (and I'm not convinced it is), that case
> should be merged with the one above it and PHY_INTERFACE_MODE_RGMII_RXID
> added as well.
> 

I can do a single patch.

I made two patches so that it is clear what each one does: this patch
stops applying the delay at MAC level, and the subsequent patch handles
all RGMII variants.

Best regards,

Sebastian

Re: [PATCH 1/2] net: ethernet: nb8800: Do not apply TX delay at MAC level

2016-11-04 Thread Måns Rullgård

Andrew Lunn  writes:

> On Fri, Nov 04, 2016 at 04:02:24PM +0100, Sebastian Frias wrote:
>> The delay can be applied at PHY or MAC level, but since
>> PHY drivers will apply the delay at PHY level when using
>> one of the "internal delay" declinations of RGMII mode
>> (like PHY_INTERFACE_MODE_RGMII_TXID), applying it again
>> at MAC level causes issues.

If this is correct, most of the PHY drivers are broken.

>> Signed-off-by: Sebastian Frias 
>> ---
>>  drivers/net/ethernet/aurora/nb8800.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/drivers/net/ethernet/aurora/nb8800.c 
>> b/drivers/net/ethernet/aurora/nb8800.c
>> index b59aa35..d2855c9 100644
>> --- a/drivers/net/ethernet/aurora/nb8800.c
>> +++ b/drivers/net/ethernet/aurora/nb8800.c
>> @@ -1282,7 +1282,7 @@ static int nb8800_tangox_init(struct net_device *dev)
>>  break;
>>  
>>  case PHY_INTERFACE_MODE_RGMII_TXID:
>> -pad_mode = PAD_MODE_RGMII | PAD_MODE_GTX_CLK_DELAY;
>> +pad_mode = PAD_MODE_RGMII;
>>  break;
>
> How many boards use this Ethernet driver? How many boards are you
> potentially breaking, because they need this delay?
>
> I guess it is a small number, because doesn't it require the PHY is
> also broken, not adding a delay when it should?

What if the PHY doesn't have that option?

-- 
Måns Rullgård

Re: [PATCH (net.git) 0/3] stmmac: fix PTP support

2016-11-04 Thread David Miller

From: Giuseppe CAVALLARO 
Date: Fri, 4 Nov 2016 14:53:09 +0100

> the series has some Acked-bys. Do you prefer a new series (I can
> rebase it if you ask me), or can you keep this one? Or do you have
> any advice or issues to raise?

If it's not in an "Action Required" state in patchwork, you
can safely assume it must be resubmitted in order for me to
consider it.

Thanks.
