Re: [PATCH v5 05/10] dt-bindings: net: dwmac-sun8i: update documentation about integrated PHY

2017-09-18 Thread Corentin Labbe
On Thu, Sep 14, 2017 at 09:19:49PM +0200, Andrew Lunn wrote:
> > > Is the MDIO controller "allwinner,sun8i-h3-emac" or "snps,dwmac-mdio"? 
> > > If the latter, then I think the node is fine, but then the mux should be 
> > > a child node of it. IOW, the child of an MDIO controller should either 
> > > be a mux node or slave devices.
> 
> Hi Rob
> 
> Up until now, children of an MDIO bus have been MDIO devices. Those
> MDIO devices are either Ethernet PHYs, Ethernet Switches, or the
> oddball devices that Broadcom iProc has, like generic PHYs.
> 
> We have never had MDIO-muxes as MDIO children. A Mux is not an MDIO
> device, and does not have the properties of an MDIO device. It is not
> addressable on the MDIO bus. The current MUXes are addressed via GPIOs
> or MMIO.
> 
> There other similar cases. i2c-mux-gpio is not a child of an i2c bus,
> nor i2c-mux-reg or gpio-mux. nxp,pca9548 is however a child of the i2c
> bus, because it is an i2c device itself...
> 
> If the MDIO mux was an MDIO device, i would agree with you. Bit it is
> not, so lets not make it a child.
> 
>   Andrew

Hello Rob, could you anwser/confirm please.
I wait on this for sending the next version.

Thanks
Regards
Corentin Labbe


Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers

2017-09-18 Thread Richard Cochran
On Mon, Sep 18, 2017 at 04:06:28PM -0700, Vinicius Costa Gomes wrote:
> That's the point, the application does not need to know that, and asking
> that would be stupid.

On the contrary, this information is essential to the application.
Probably you have never seen an actual Ethernet field bus in
operation?  In any case, you are missing the point.

> (And that's another nice point of how 802.1Qbv works, applications do
> not need to be changed to use it, and I think we should work to achieve
> this on the Linux side)

Once you start to care about real time performance, then you need to
consider the applications.  This is industrial control, not streaming
your tunes from your ipod.
 
> That being said, that only works for kinds of traffic that maps well to
> this configuration in advance model, which is the model that the IEEE
> (see 802.1Qcc) and the AVNU Alliance[1] are pushing for.

Again, you are missing the point of what they aiming for.  I have
looked at a number of production systems, and in each case the
developers want total control over the transmission, in order to
reduce latency to an absolute minimum.  Typically the data to be sent
are available only microseconds before the transmission deadline.

Consider OpenAVB on github that people are already using.  Take a look
at simple_talker.c and explain how "applications do not need to be
changed to use it."

> [1]
> http://avnu.org/theory-of-operation-for-tsn-enabled-industrial-systems/

Did you even read this?

[page 24]

As described in section 2, some industrial control systems require
predictable, very low latency and cycle-to-cycle variation to meet
hard real-time application requirements. In these systems,
multiple distributed controllers commonly synchronize their
sensor/actuator operations with other controllers by scheduling
these operations in time, typically using a repeating control
cycle.
...
The gate control mechanism is itself a time-aware PTP application
operating within a bridge or end station port.

It is an application, not a "god box."

> In short, I see a per-packet transmission time and a per-queue schedule
> as solutions to different problems.

Well, I can agree with that.  For some non real-time applications,
bandwidth shaping is enough, and your Qdisc idea is sufficient.  For
the really challenging TSN targets (industrial control, automotive),
your idea of an opaque schedule file won't fly.

Thanks,
Richard


[PATCH net-next] net: sk_buff rbnode reorg

2017-09-18 Thread Eric Dumazet
From: Eric Dumazet 

skb->rbnode shares space with skb->next, skb->prev and skb->tstamp

Current uses (TCP receive ofo queue and netem) need to save/restore
tstamp.

Since we might use an RB tree for TCP retransmit queue at some point
to speedup SACK processing with large rtx queues, this patch exchanges
skb->dev and skb->tstamp.

This saves some overhead in both TCP and netem.

Signed-off-by: Eric Dumazet 
---
 include/linux/skbuff.h |   16 
 include/net/tcp.h  |5 -
 net/ipv4/tcp_input.c   |   27 +--
 net/sched/sch_netem.c  |7 ---
 4 files changed, 17 insertions(+), 38 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 
72299ef00061db1ce70d34b96ae1639ecde08837..492828801acba42ac6bccb287d3cc5080039135c
 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -661,8 +661,12 @@ struct sk_buff {
struct sk_buff  *prev;
 
union {
-   ktime_t tstamp;
-   u64 skb_mstamp;
+   struct net_device   *dev;
+   /* Some protocols might use this space to store 
information,
+* while device pointer would be NULL.
+* UDP receive path is one user.
+*/
+   unsigned long   dev_scratch;
};
};
struct rb_node  rbnode; /* used in netem & tcp stack */
@@ -670,12 +674,8 @@ struct sk_buff {
struct sock *sk;
 
union {
-   struct net_device   *dev;
-   /* Some protocols might use this space to store information,
-* while device pointer would be NULL.
-* UDP receive path is one user.
-*/
-   unsigned long   dev_scratch;
+   ktime_t tstamp;
+   u64 skb_mstamp;
};
/*
 * This is the control buffer. It is free to use for every
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 
b510f284427aabc1f508d24d29d0f812e5e0aa61..6ecc01aa667b26a3e2270a3566b2076bfebd8605
 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -797,11 +797,6 @@ struct tcp_skb_cb {
u16 tcp_gso_segs;
u16 tcp_gso_size;
};
-
-   /* Used to stash the receive timestamp while this skb is in the
-* out of order queue, as skb->tstamp is overwritten by the
-* rbnode.
-*/
ktime_t swtstamp;
};
__u8tcp_flags;  /* TCP header flags. (tcp[13])  */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 
bddf724f5c02abe1d0cffe2517f6376e440d..db9bb46b5776f9ee332298c0e95afb0a5966b938
 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4266,11 +4266,6 @@ static void tcp_sack_remove(struct tcp_sock *tp)
tp->rx_opt.num_sacks = num_sacks;
 }
 
-enum tcp_queue {
-   OOO_QUEUE,
-   RCV_QUEUE,
-};
-
 /**
  * tcp_try_coalesce - try to merge skb to prior one
  * @sk: socket
@@ -4286,7 +4281,6 @@ enum tcp_queue {
  * Returns true if caller should free @from instead of queueing it
  */
 static bool tcp_try_coalesce(struct sock *sk,
-enum tcp_queue dest,
 struct sk_buff *to,
 struct sk_buff *from,
 bool *fragstolen)
@@ -4311,10 +4305,7 @@ static bool tcp_try_coalesce(struct sock *sk,
 
if (TCP_SKB_CB(from)->has_rxtstamp) {
TCP_SKB_CB(to)->has_rxtstamp = true;
-   if (dest == OOO_QUEUE)
-   TCP_SKB_CB(to)->swtstamp = TCP_SKB_CB(from)->swtstamp;
-   else
-   to->tstamp = from->tstamp;
+   to->tstamp = from->tstamp;
}
 
return true;
@@ -4351,9 +4342,6 @@ static void tcp_ofo_queue(struct sock *sk)
}
p = rb_next(p);
rb_erase(>rbnode, >out_of_order_queue);
-   /* Replace tstamp which was stomped by rbnode */
-   if (TCP_SKB_CB(skb)->has_rxtstamp)
-   skb->tstamp = TCP_SKB_CB(skb)->swtstamp;
 
if (unlikely(!after(TCP_SKB_CB(skb)->end_seq, tp->rcv_nxt))) {
SOCK_DEBUG(sk, "ofo packet was already received\n");
@@ -4365,8 +4353,7 @@ static void tcp_ofo_queue(struct sock *sk)
   TCP_SKB_CB(skb)->end_seq);
 
tail = skb_peek_tail(>sk_receive_queue);
-   eaten = tail && tcp_try_coalesce(sk, RCV_QUEUE,
-tail, skb, );

linux-next: Signed-off-by missing for commit in the net tree

2017-09-18 Thread Stephen Rothwell
Hi all,

Commit

  129c6cda2de2 ("8139too: revisit napi_complete_done() usage")

is missing a Signed-off-by from its author.

-- 
Cheers,
Stephen Rothwell


Re: [PATCH net-next 12/14] gtp: Configuration for zero UDP checksum

2017-09-18 Thread David Miller
From: Tom Herbert 
Date: Mon, 18 Sep 2017 17:39:02 -0700

> Add configuration to control use of zero checksums on transmit for both
> IPv4 and IPv6, and control over accepting zero IPv6 checksums on
> receive.
> 
> Signed-off-by: Tom Herbert 

I thought we were trying to move away from this special case of allowing
zero UDP checksums with tunnels, especially for ipv6.


Re: [PATCH net-next 11/14] net: Add a facility to support application defined GSO

2017-09-18 Thread David Miller
From: Tom Herbert 
Date: Mon, 18 Sep 2017 17:39:01 -0700

> Allow applications or encapsulation protocols to register a GSO segment
> function to their specific protocol. To faciliate this I reserved the
> upper four bits in the gso_type to indicate the application specific GSO
> type. Zero in these bits indicates no application GSO, so there are
> fifteen instance that can be defined.
> 
> An application registers a a gso_segment using the skb_gso_app_register
> this takes a struct skb_gso_app that indicates a callback function as
> well as a set of GSO types for which at least one must be matched before
> calling he segment function. GSO returns one of the application GSO
> types described above (not a fixed value for the applications).
> Subsequently, when the application sends a GSO packet the application
> gso_type is set in the skb gso_type along with any other types.
> 
> skb_gso_app_segment is the function called from another GSO segment
> function to handle segmentation of the application or encapsulation
> protocol. This function includes check flags that provides context for
> the appropriate GSO instance to match. For instance, in order to handle
> a protocol encapsulated in UDP (GTP for instance) skb_gso_app_segment is
> call from udp_tunnel_segment and check flags would be
> SKB_GSO_UDP_TUNNEL_CSUM | SKB_GSO_UDP_TUNNEL.
> 
> Signed-off-by: Tom Herbert 

What happens on cards that can offload existing arbitrary UDP tunnel
encapsulations?

Will something about the state of the GSO type bits you are adding
prevent that?  Or do we need to add some new checks somewhere?


Re: [PATCH net-next 08/14] gtp: Support encpasulating over IPv6

2017-09-18 Thread David Miller
From: Tom Herbert 
Date: Mon, 18 Sep 2017 17:38:58 -0700

> Allow peers to be specified by IPv6 addresses.
> 
> Signed-off-by: Tom Herbert 

Hmmm, can you just check the socket family or something like that?


Re: [PATCH net-next 07/14] gtp: Support encapsulation of IPv6 packets

2017-09-18 Thread David Miller
From: Tom Herbert 
Date: Mon, 18 Sep 2017 17:38:57 -0700

> @@ -98,6 +104,7 @@ static void pdp_context_delete(struct pdp_ctx *pctx);
>  static inline u32 gtp0_hashfn(u64 tid)
>  {
>   u32 *tid32 = (u32 *) 
> +
>   return jhash_2words(tid32[0], tid32[1], gtp_h_initval);
>  }
>  
> @@ -111,6 +118,11 @@ static inline u32 ipv4_hashfn(__be32 ip)
>   return jhash_1word((__force u32)ip, gtp_h_initval);
>  }
>  
> +static inline u32 ipv6_hashfn(const struct in6_addr *a)
> +{
> + return __ipv6_addr_jhash(a, gtp_h_initval);
> +}

I know you are just following the pattern of the existing "ipv4_hashfn()" here
but this kind of stuff is not very global namespace friendly.  Even simply
adding a "gtp_" prefix to these hash functions would be a lot better.


Re: [PATCH net-next 03/14] gtp: Call common functions to get tunnel routes and add dst_cache

2017-09-18 Thread David Miller
From: Tom Herbert 
Date: Mon, 18 Sep 2017 17:38:53 -0700

> Call ip_tunnel_get_route and dst_cache to pdp context which should
> improve performance by obviating the need to perform a route lookup
> on every packet.
> 
> Signed-off-by: Tom Herbert 

Not caused by your changes, but something to think about:

> -static struct rtable *ip4_route_output_gtp(struct flowi4 *fl4,
> -const struct sock *sk,
> -__be32 daddr)
> -{
> - memset(fl4, 0, sizeof(*fl4));
> - fl4->flowi4_oif = sk->sk_bound_dev_if;
> - fl4->daddr  = daddr;
> - fl4->saddr  = inet_sk(sk)->inet_saddr;
> - fl4->flowi4_tos = RT_CONN_FLAGS(sk);
> - fl4->flowi4_proto   = sk->sk_protocol;
> -
> - return ip_route_output_key(sock_net(sk), fl4);
> -}

This and the new dst caching code ignores any source address selection
done by ip_route_output_key() or the new tunnel route lookup helpers.

Either source address selection should be respected, or if saddr will
never be modified by a route lookup for some specific reason here,
that should be documented.


Re: RFC: Audit Kernel Container IDs

2017-09-18 Thread Richard Guy Briggs
On 2017-09-18 21:45, Eric W. Biederman wrote:
> Richard Guy Briggs  writes:
> 
> > On 2017-09-14 12:33, Eric W. Biederman wrote:
> >> Richard Guy Briggs  writes:
> >> 
> >> > The trigger is a pseudo filesystem (proc, since PID tree already exists)
> >> > write of a u64 representing the container ID to a file representing a
> >> > process that will become the first process in a new container.
> >> > This might place restrictions on mount namespaces required to define a
> >> > container, or at least careful checking of namespaces in the kernel to
> >> > verify permissions of the orchestrator so it can't change its own
> >> > container ID.
> >> 
> >> Why a u64?
> >
> > u32 will roll too quickly.  UUID is large enough that it adds
> > significantly to audit record bandwidth.  I'd prefer u64, but can look
> > at the difference of accommodating a UUID...
> 
> I was imagining a string might be better.  As for the purposes of audit
> it is just a byte string you regurgitate.

Yes, so looking at u128 vs dhowells' proposal, it would be 16 bytes vs
24 bytes, which really isn't that much difference...

What length of string length were you envisioning?

> >> Why a proc filesystem write and not a magic audit message?
> >
> > A magic audit message requires CAP_AUDIT_WRITE, which we'd like to use
> > sparingly.  Given that orchestrators will already require it to send
> > the mandatory AUDIT_VIRT_*, this doesn't seem like an unreasonable burden.
> >
> > I was originally leaning towards an audit message trigger or a syscall.
> >
> >> I don't like the fact that the proc filesystem entry is likely going to
> >> be readable and abusable by non-audit contexts?
> >
> > This proposal wasn't going to start with that link being readable, but
> > its filesystem structure and link names would be, perhaps giving away
> > too much already.
> >
> > I think we will need to find a way for the orchestrator or one of its
> > authorized agents to read this information while blocking reads from
> > unauthorized agents, otherwise this would be of very limited use.
> 
> Something that is set only for future audit messages seems reasonable.
> Once you start reading this from something other than audit messages I
> get neverous, that people will use this beyond audit for things it is
> not intended for.

Understandably.  At the same time, if we implement something that is
more broadly useful and solves a number of other challenges others are
facing, how can we make it available while limiting the potential for
abuse?

> >> Why the ability to change the containerid?  What is the use case you are
> >> thinking of there?
> >
> > This was covered in the end of the conversation with Paul Moore (that
> > maybe you got tired reading?)
> 
> I have not had time to review everything.  As I was busy preparing for my
> wedding and am now in the middle of my honeymoon.

I'm very sorry, my bad!  You had given me a heads up about this and I
appologise for causing a stir during your special time.

> > I'd originally proposed having it write
> > once, but Paul figured there was no good reason to restrict it and leave
> > that decision up to the orchestrator.  The use case would be adding
> > other processes to a container, but it could be argued all additional
> > processes should be spawned by the first process in a container.
> 
> I see two cases here:
> a) Nested containers
> b) Inject processes via something like nsenter into a container.
> 
> In case a) you have to figure out what to do with nested containers
> and that does seem to be a legitimate case for a double write.  Arguably
> with the restriction that you must specify a more nested label.

Is this technically a double write if it is an inheritance?  That should
be solvable with a flag.

> In case b) which you seem to be referring to it would be a process
> created by the container manager outside the container that has no
> container label.  At which point there is not a need for a double write.

Looking at the potential for nesting, if the orchestrator is already in
a container, then it would already have a label, but if we refer to the
flag solution above, then it is still the first write.

> So my recommendation is to not support double writes until you support
> nested containers.

I think this is a reasonable restriction.

Thanks for your time.  Sorry to disturb your holiday.

> Eric

- RGB

--
Richard Guy Briggs 
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635


Re: [PATCH net] net: systemport: Fix 64-bit statistics dependency

2017-09-18 Thread David Miller
From: Florian Fainelli 
Date: Mon, 18 Sep 2017 16:31:30 -0700

> There are several problems with commit 10377ba7673d ("net: systemport:
> Support 64bit statistics", first one got fixed in 7095c973453e ("net:
> systemport: Fix 64-bit stats deadlock").
> 
> The second problem is that this specific code updates the
> stats64.tx_{packets,bytes} from ndo_get_stats64() and that is what we
> are returning to ethtool -S. If we are not running a tool that involves
> calling ndo_get_stats64(), then we won't get updated ethtool stats.
> 
> The solution to this is to update the stats from both call sites,
> factoring that into a specific function, While at it, don't just check
> the sizeof() but also the type of the statistics in order to use the
> 64-bit stats seqlock.
> 
> Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics"
> Signed-off-by: Florian Fainelli 

Applied, thanks Florian.


Re: [PATCH net 0/7] Bug fixes for the HNS3 Ethernet Driver for Hip08 SoC

2017-09-18 Thread Leon Romanovsky
On Tue, Sep 19, 2017 at 02:06:21AM +0100, Salil Mehta wrote:
> This patch set presents some bug fixes for the HNS3 Ethernet driver, 
> identified
> during internal testing & stabilization efforts.
>
> This patch series is meant for Linux 4.14 kernel.
>
> Lipeng (6):
>   net: hns3: get phy addr from NCL_config
>   net: hns3: fix the command used to unmap ring from vector
>   net: hns3: Fix ring and vector map command
>   net: hns3: fix a bug of set mac address
>   net: hns3: set default vlan id to PF
>   net: hns3: Fixes the premature exit of loop when matching clients
>
> Salil Mehta (1):
>   net: hns3: fixes the ether address copy with more appropriate API

1. The fixes patches should have Fixes line and not all of them have
(I didn't look all patches).
2. Please decide on one style: fixes vs. Fixes, fix vs. Fix in the titles
3. Subject should be descriptive and usable, I don't know if it applies
to the "fix a bug of set mac address" patch.

Thanks

>
>  drivers/net/ethernet/hisilicon/hns3/hnae3.c| 43 
> +-
>  .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  8 +++-
>  .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 20 --
>  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c |  7 ++--
>  4 files changed, 35 insertions(+), 43 deletions(-)
>
> --
> 2.11.0
>
>


signature.asc
Description: PGP signature


Re: [PATCH net] 8139too: revisit napi_complete_done() usage

2017-09-18 Thread David Miller
From: Eric Dumazet 
Date: Mon, 18 Sep 2017 13:03:43 -0700

> From: Eric Dumazet 
> 
> It seems we have to be more careful in napi_complete_done()
> use. This patch is not a revert, as it seems we can
> avoid bug that Ville reported by moving the napi_complete_done()
> test in the spinlock section.
> 
> Many thanks to Ville for detective work and all tests.
> 
> Fixes: 617f01211baf ("8139too: use napi_complete_done()")
> Reported-by: Ville Syrjälä 
> Tested-by: Ville Syrjälä 

Applied and queued up for -stable.


Re: [PATCHv2 iproute2 1/2] lib/libnetlink: re malloc buff if size is not enough

2017-09-18 Thread Hangbin Liu
Hi Michal,

On Mon, Sep 18, 2017 at 09:55:05AM +0200, Michal Kubecek wrote:
> > +static int rtnl_recvmsg(int fd, struct msghdr *msg, char **answer)
> > +{
> > +   struct iovec *iov;
> > +   int len = -1, buf_len = 32768;
> > +   char *bufp, *buf = NULL;
> > +
> > +   int flag = MSG_PEEK | MSG_TRUNC;
> > +
> > +realloc:
> > +   bufp = realloc(buf, buf_len);
> > +
> > +   if (bufp == NULL) {
> > +   fprintf(stderr, "malloc error: not enough buffer\n");
> > +   free(buf);
> > +   return -ENOMEM;
> > +   }
> > +   buf = bufp;
> > +   iov = msg->msg_iov;
> > +   iov->iov_base = buf;
> > +   iov->iov_len = buf_len;
> > +
> > +recv:
> > +   len = recvmsg(fd, msg, flag);
> > +
> > +   if (len < 0) {
> > +   if (errno == EINTR || errno == EAGAIN)
> > +   goto recv;
> > +   fprintf(stderr, "netlink receive error %s (%d)\n",
> > +   strerror(errno), errno);
> 
> free(buf);
> 
> > +   return len;
> 
> Maybe we should return -errno (saved before calling fprintf()) to be
> consistent.
> 
> > +   }
> > +
> > +   if (len == 0) {
> > +   fprintf(stderr, "EOF on netlink\n");
> 
> free(buf);

Will fix these three issues.

> > @@ -471,19 +516,23 @@ int rtnl_dump_filter_l(struct rtnl_handle *rth,
> >  
> > if (h->nlmsg_type == NLMSG_ERROR) {
> > rtnl_dump_error(rth, h);
> > +   free(buf);
> > return -1;
> > }
> >  
> > if (!rth->dump_fp) {
> > err = a->filter(, h, a->arg1);
> > -   if (err < 0)
> > +   if (err < 0) {
> > +   free(buf);
> > return err;
> > +   }
> > }
> >  
> >  skip_it:
> > h = NLMSG_NEXT(h, msglen);
> > }
> > }
> > +   free(buf);
> 
> We only free the last buffer returned by rtnl_recvmsg() this way. IMHO
> this free(buf) should be moved inside the loop.

Do you mean the outside while loop or the for loop? I think we could not put
it inside the for loop, because we may need the buf multi times based on arg.

while (1) {
status = rtnl_recvmsg(rth->fd, , );

for (a = arg; a->filter; a++) {
struct nlmsghdr *h = (struct nlmsghdr *)buf;
while (NLMSG_OK(h, msglen)) {
[...]
skip_it:
h = NLMSG_NEXT(h, msglen);
}
}
free(buf);
[...]
}

Thanks
Hangbin


Re: RFC: Audit Kernel Container IDs

2017-09-18 Thread Eric W. Biederman
Richard Guy Briggs  writes:

> On 2017-09-14 12:33, Eric W. Biederman wrote:
>> Richard Guy Briggs  writes:
>> 
>> > The trigger is a pseudo filesystem (proc, since PID tree already exists)
>> > write of a u64 representing the container ID to a file representing a
>> > process that will become the first process in a new container.
>> > This might place restrictions on mount namespaces required to define a
>> > container, or at least careful checking of namespaces in the kernel to
>> > verify permissions of the orchestrator so it can't change its own
>> > container ID.
>> 
>> Why a u64?
>
> u32 will roll too quickly.  UUID is large enough that it adds
> significantly to audit record bandwidth.  I'd prefer u64, but can look
> at the difference of accommodating a UUID...

I was imagining a string might be better.  As for the purposes of audit
it is just a byte string you regurgitate.

>> Why a proc filesystem write and not a magic audit message?
>
> A magic audit message requires CAP_AUDIT_WRITE, which we'd like to use
> sparingly.  Given that orchestrators will already require it to send
> the mandatory AUDIT_VIRT_*, this doesn't seem like an unreasonable burden.
>
> I was originally leaning towards an audit message trigger or a syscall.
>
>> I don't like the fact that the proc filesystem entry is likely going to
>> be readable and abusable by non-audit contexts?
>
> This proposal wasn't going to start with that link being readable, but
> its filesystem structure and link names would be, perhaps giving away
> too much already.
>
> I think we will need to find a way for the orchestrator or one of its
> authorized agents to read this information while blocking reads from
> unauthorized agents, otherwise this would be of very limited use.

Something that is set only for future audit messages seems reasonable.
Once you start reading this from something other than audit messages I
get neverous, that people will use this beyond audit for things it is
not intended for.

>> Why the ability to change the containerid?  What is the use case you are
>> thinking of there?
>
> This was covered in the end of the conversation with Paul Moore (that
> maybe you got tired reading?)

I have not had time to review everything.  As I was busy preparing for my
wedding and am now in the middle of my honeymoon.

> I'd originally proposed having it write
> once, but Paul figured there was no good reason to restrict it and leave
> that decision up to the orchestrator.  The use case would be adding
> other processes to a container, but it could be argued all additional
> processes should be spawned by the first process in a container.

I see two cases here:
a) Nested containers
b) Inject processes via something like nsenter into a container.

In case a) you have to figure out what to do with nested containers
and that does seem to be a legitimate case for a double write.  Arguably
with the restriction that you must specify a more nested label.

In case b) which you seem to be referring to it would be a process
created by the container manager outside the container that has no
container label.  At which point there is not a need for a double write.

So my recommendation is to not support double writes until you support
nested containers.

Eric


[PATCH net-next v2 12/12] net: dsa: bcm_sf2: Utilize b53_{enable,disable}_port

2017-09-18 Thread Florian Fainelli
Export b53_{enable,disable}_port and use these two functions in
bcm_sf2_port_setup and bcm_sf2_port_disable. The generic functions
cannot be used without wrapping because we need to manage additional
switch integration details (PHY, Broadcom tag etc.).

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c |  8 
 drivers/net/dsa/b53/b53_priv.h   |  2 ++
 drivers/net/dsa/bcm_sf2.c| 26 ++
 3 files changed, 8 insertions(+), 28 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index c3f1cd2c33ea..a9f2a5b55a5e 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -502,8 +502,7 @@ void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port)
 }
 EXPORT_SYMBOL(b53_imp_vlan_setup);
 
-static int b53_enable_port(struct dsa_switch *ds, int port,
-  struct phy_device *phy)
+int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
 {
struct b53_device *dev = ds->priv;
unsigned int cpu_port = dev->cpu_port;
@@ -530,9 +529,9 @@ static int b53_enable_port(struct dsa_switch *ds, int port,
 
return 0;
 }
+EXPORT_SYMBOL(b53_enable_port);
 
-static void b53_disable_port(struct dsa_switch *ds, int port,
-struct phy_device *phy)
+void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
 {
struct b53_device *dev = ds->priv;
u8 reg;
@@ -542,6 +541,7 @@ static void b53_disable_port(struct dsa_switch *ds, int 
port,
reg |= PORT_CTRL_RX_DISABLE | PORT_CTRL_TX_DISABLE;
b53_write8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(port), reg);
 }
+EXPORT_SYMBOL(b53_disable_port);
 
 void b53_brcm_hdr_setup(struct dsa_switch *ds, int port)
 {
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index df8bea4105e4..8b5ba78edfd2 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -311,6 +311,8 @@ int b53_mirror_add(struct dsa_switch *ds, int port,
   struct dsa_mall_mirror_tc_entry *mirror, bool ingress);
 void b53_mirror_del(struct dsa_switch *ds, int port,
struct dsa_mall_mirror_tc_entry *mirror);
+int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy);
+void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device *phy);
 void b53_brcm_hdr_setup(struct dsa_switch *ds, int port);
 void b53_eee_enable_set(struct dsa_switch *ds, int port, bool enable);
 int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy);
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 08639674947a..0072a959db5b 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -163,7 +163,6 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int 
port,
  struct phy_device *phy)
 {
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
-   s8 cpu_port = ds->dst->cpu_dp->index;
unsigned int i;
u32 reg;
 
@@ -184,9 +183,6 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int 
port,
reg |= i << (PRT_TO_QID_SHIFT * i);
core_writel(priv, reg, CORE_PORT_TC2_QOS_MAP_PORT(port));
 
-   /* Clear the Rx and Tx disable bits and set to no spanning tree */
-   core_writel(priv, 0, CORE_G_PCTL_PORT(port));
-
/* Re-enable the GPHY and re-apply workarounds */
if (priv->int_phy_mask & 1 << port && priv->hw_params.num_gphy == 1) {
bcm_sf2_gphy_enable_set(ds, true);
@@ -209,23 +205,7 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int 
port,
if (port == priv->moca_port)
bcm_sf2_port_intr_enable(priv, port);
 
-   /* Set this port, and only this one to be in the default VLAN,
-* if member of a bridge, restore its membership prior to
-* bringing down this port.
-*/
-   reg = core_readl(priv, CORE_PORT_VLAN_CTL_PORT(port));
-   reg &= ~PORT_VLAN_CTRL_MASK;
-   reg |= (1 << port);
-   reg |= priv->dev->ports[port].vlan_ctl_mask;
-   core_writel(priv, reg, CORE_PORT_VLAN_CTL_PORT(port));
-
-   b53_imp_vlan_setup(ds, cpu_port);
-
-   /* If EEE was enabled, restore it */
-   if (priv->dev->ports[port].eee.eee_enabled)
-   b53_eee_enable_set(ds, port, true);
-
-   return 0;
+   return b53_enable_port(ds, port, phy);
 }
 
 static void bcm_sf2_port_disable(struct dsa_switch *ds, int port,
@@ -248,9 +228,7 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, int 
port,
else
off = CORE_G_PCTL_PORT(port);
 
-   reg = core_readl(priv, off);
-   reg |= RX_DIS | TX_DIS;
-   core_writel(priv, reg, off);
+   b53_disable_port(ds, port, phy);
 
/* Power down the port memory */
reg = core_readl(priv, CORE_MEM_PSM_VDD_CTRL);
-- 
2.9.3



[PATCH net-next v2 07/12] net: dsa: b53: Define EEE register page

2017-09-18 Thread Florian Fainelli
In preparation for migrating the EEE code from bcm_sf2 to b53, define the full
EEE register page and offsets within that page.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_regs.h | 41 +
 1 file changed, 41 insertions(+)

diff --git a/drivers/net/dsa/b53/b53_regs.h b/drivers/net/dsa/b53/b53_regs.h
index 5e8b8e31fee8..2a9f421680aa 100644
--- a/drivers/net/dsa/b53/b53_regs.h
+++ b/drivers/net/dsa/b53/b53_regs.h
@@ -50,6 +50,9 @@
 /* Jumbo Frame Registers */
 #define B53_JUMBO_PAGE 0x40
 
+/* EEE Control Registers Page */
+#define B53_EEE_PAGE   0x92
+
 /* CFP Configuration Registers Page */
 #define B53_CFP_PAGE   0xa1
 
@@ -472,6 +475,44 @@
 #define   JMS_MAX_SIZE 9724
 
 /*
+ * EEE Configuration Page Registers
+ */
+
+/* EEE Enable control register (16 bit) */
+#define B53_EEE_EN_CTRL0x00
+
+/* EEE LPI assert status register (16 bit) */
+#define B53_EEE_LPI_ASSERT_STS 0x02
+
+/* EEE LPI indicate status register (16 bit) */
+#define B53_EEE_LPI_INDICATE   0x4
+
+/* EEE Receiving idle symbols status register (16 bit) */
+#define B53_EEE_RX_IDLE_SYM_STS0x6
+
+/* EEE Pipeline timer register (32 bit) */
+#define B53_EEE_PIP_TIMER  0xC
+
+/* EEE Sleep timer Gig register (32 bit) */
+#define B53_EEE_SLEEP_TIMER_GIG(i) (0x10 + 4 * (i))
+
+/* EEE Sleep timer FE register (32 bit) */
+#define B53_EEE_SLEEP_TIMER_FE(i)  (0x34 + 4 * (i))
+
+/* EEE Minimum LP timer Gig register (32 bit) */
+#define B53_EEE_MIN_LP_TIMER_GIG(i)(0x58 + 4 * (i))
+
+/* EEE Minimum LP timer FE register (32 bit) */
+#define B53_EEE_MIN_LP_TIMER_FE(i) (0x7c + 4 * (i))
+
+/* EEE Wake timer Gig register (16 bit) */
+#define B53_EEE_WAKE_TIMER_GIG(i)  (0xa0 + 2 * (i))
+
+/* EEE Wake timer FE register (16 bit) */
+#define B53_EEE_WAKE_TIMER_FE(i)   (0xb2 + 2 * (i))
+
+
+/*
  * CFP Configuration Page Registers
  */
 
-- 
2.9.3



[PATCH net-next v2 11/12] net: dsa: bcm_sf2: Use SF2_NUM_EGRESS_QUEUES for CFP

2017-09-18 Thread Florian Fainelli
The magic number 8 in 3 locations in bcm_sf2_cfp.c actually designates the
number of switch port egress queues, so use that define instead of open-coding
it.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2_cfp.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2_cfp.c b/drivers/net/dsa/bcm_sf2_cfp.c
index 8a1da7e67707..94649e1481ec 100644
--- a/drivers/net/dsa/bcm_sf2_cfp.c
+++ b/drivers/net/dsa/bcm_sf2_cfp.c
@@ -144,7 +144,7 @@ static int bcm_sf2_cfp_rule_set(struct dsa_switch *ds, int 
port,
 * destination port is enabled and that we are within the
 * number of ports supported by the switch
 */
-   port_num = fs->ring_cookie / 8;
+   port_num = fs->ring_cookie / SF2_NUM_EGRESS_QUEUES;
 
if (fs->ring_cookie == RX_CLS_FLOW_DISC ||
!(BIT(port_num) & ds->enabled_port_mask) ||
@@ -280,7 +280,7 @@ static int bcm_sf2_cfp_rule_set(struct dsa_switch *ds, int 
port,
 * We have a small oddity where Port 6 just does not have a
 * valid bit here (so we subtract by one).
 */
-   queue_num = fs->ring_cookie % 8;
+   queue_num = fs->ring_cookie % SF2_NUM_EGRESS_QUEUES;
if (port_num >= 7)
port_num -= 1;
 
@@ -401,7 +401,7 @@ static int bcm_sf2_cfp_rule_get(struct bcm_sf2_priv *priv, 
int port,
/* There is no Port 6, so we compensate for that here */
if (nfc->fs.ring_cookie >= 6)
nfc->fs.ring_cookie++;
-   nfc->fs.ring_cookie *= 8;
+   nfc->fs.ring_cookie *= SF2_NUM_EGRESS_QUEUES;
 
/* Extract the destination queue */
queue_num = (reg >> NEW_TC_SHIFT) & NEW_TC_MASK;
-- 
2.9.3



[PATCH net-next v2 10/12] net: dsa: b53: Export b53_imp_vlan_setup()

2017-09-18 Thread Florian Fainelli
bcm_sf2 and b53 do exactly the same thing, so share that piece.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c |  3 ++-
 drivers/net/dsa/b53/b53_priv.h   |  1 +
 drivers/net/dsa/bcm_sf2.c| 23 +--
 3 files changed, 4 insertions(+), 23 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 4e37ec27e496..c3f1cd2c33ea 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -484,7 +484,7 @@ static int b53_fast_age_vlan(struct b53_device *dev, u16 
vid)
return b53_flush_arl(dev, FAST_AGE_VLAN);
 }
 
-static void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port)
+void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port)
 {
struct b53_device *dev = ds->priv;
unsigned int i;
@@ -500,6 +500,7 @@ static void b53_imp_vlan_setup(struct dsa_switch *ds, int 
cpu_port)
b53_write16(dev, B53_PVLAN_PAGE, B53_PVLAN_PORT_MASK(i), pvlan);
}
 }
+EXPORT_SYMBOL(b53_imp_vlan_setup);
 
 static int b53_enable_port(struct dsa_switch *ds, int port,
   struct phy_device *phy)
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index 0ed59672ef07..df8bea4105e4 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -284,6 +284,7 @@ static inline int b53_switch_get_reset_gpio(struct 
b53_device *dev)
 #endif
 
 /* Exported functions towards other drivers */
+void b53_imp_vlan_setup(struct dsa_switch *ds, int cpu_port);
 void b53_get_strings(struct dsa_switch *ds, int port, uint8_t *data);
 void b53_get_ethtool_stats(struct dsa_switch *ds, int port, uint64_t *data);
 int b53_get_sset_count(struct dsa_switch *ds);
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 4e8ef4c07eab..08639674947a 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -40,27 +40,6 @@ static enum dsa_tag_protocol 
bcm_sf2_sw_get_tag_protocol(struct dsa_switch *ds)
return DSA_TAG_PROTO_BRCM;
 }
 
-static void bcm_sf2_imp_vlan_setup(struct dsa_switch *ds, int cpu_port)
-{
-   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
-   unsigned int i;
-   u32 reg;
-
-   /* Enable the IMP Port to be in the same VLAN as the other ports
-* on a per-port basis such that we only have Port i and IMP in
-* the same VLAN.
-*/
-   for (i = 0; i < priv->hw_params.num_ports; i++) {
-   if (!((1 << i) & ds->enabled_port_mask))
-   continue;
-
-   reg = core_readl(priv, CORE_PORT_VLAN_CTL_PORT(i));
-   reg |= (1 << cpu_port);
-   core_writel(priv, reg, CORE_PORT_VLAN_CTL_PORT(i));
-   }
-}
-
-
 static void bcm_sf2_imp_setup(struct dsa_switch *ds, int port)
 {
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
@@ -240,7 +219,7 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int 
port,
reg |= priv->dev->ports[port].vlan_ctl_mask;
core_writel(priv, reg, CORE_PORT_VLAN_CTL_PORT(port));
 
-   bcm_sf2_imp_vlan_setup(ds, cpu_port);
+   b53_imp_vlan_setup(ds, cpu_port);
 
/* If EEE was enabled, restore it */
if (priv->dev->ports[port].eee.eee_enabled)
-- 
2.9.3



[PATCH net-next v2 05/12] net: dsa: b53: Use a macro to define I/O operations

2017-09-18 Thread Florian Fainelli
Instead of repeating the same pattern: acquire mutex, read/write,
release mutex, define a macro: b53_build_op() which takes the type
(read|write), I/O size, and value (scalar or pointer). This helps with
fixing bugs that could exist (e.g: missing barrier, lock etc.).

Reviewed-by: Vivien Didelot 
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_priv.h | 133 +++--
 1 file changed, 22 insertions(+), 111 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index 7528b22aeb03..f1136619e0e4 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -199,119 +199,30 @@ static inline void b53_switch_remove(struct b53_device 
*dev)
dsa_unregister_switch(dev->ds);
 }
 
-static inline int b53_read8(struct b53_device *dev, u8 page, u8 reg, u8 *val)
-{
-   int ret;
-
-   mutex_lock(>reg_mutex);
-   ret = dev->ops->read8(dev, page, reg, val);
-   mutex_unlock(>reg_mutex);
-
-   return ret;
-}
-
-static inline int b53_read16(struct b53_device *dev, u8 page, u8 reg, u16 *val)
-{
-   int ret;
-
-   mutex_lock(>reg_mutex);
-   ret = dev->ops->read16(dev, page, reg, val);
-   mutex_unlock(>reg_mutex);
-
-   return ret;
-}
-
-static inline int b53_read32(struct b53_device *dev, u8 page, u8 reg, u32 *val)
-{
-   int ret;
-
-   mutex_lock(>reg_mutex);
-   ret = dev->ops->read32(dev, page, reg, val);
-   mutex_unlock(>reg_mutex);
-
-   return ret;
-}
-
-static inline int b53_read48(struct b53_device *dev, u8 page, u8 reg, u64 *val)
-{
-   int ret;
-
-   mutex_lock(>reg_mutex);
-   ret = dev->ops->read48(dev, page, reg, val);
-   mutex_unlock(>reg_mutex);
-
-   return ret;
+#define b53_build_op(type, op_size, val_type)  \
+static inline int b53_##type##op_size(struct b53_device *dev, u8 page, 
\
+ u8 reg, val_type val) 
\
+{  
\
+   int ret;
\
+   
\
+   mutex_lock(>reg_mutex);
\
+   ret = dev->ops->type##op_size(dev, page, reg, val); 
\
+   mutex_unlock(>reg_mutex);  
\
+   
\
+   return ret; 
\
 }
 
-static inline int b53_read64(struct b53_device *dev, u8 page, u8 reg, u64 *val)
-{
-   int ret;
-
-   mutex_lock(>reg_mutex);
-   ret = dev->ops->read64(dev, page, reg, val);
-   mutex_unlock(>reg_mutex);
-
-   return ret;
-}
-
-static inline int b53_write8(struct b53_device *dev, u8 page, u8 reg, u8 value)
-{
-   int ret;
-
-   mutex_lock(>reg_mutex);
-   ret = dev->ops->write8(dev, page, reg, value);
-   mutex_unlock(>reg_mutex);
-
-   return ret;
-}
-
-static inline int b53_write16(struct b53_device *dev, u8 page, u8 reg,
- u16 value)
-{
-   int ret;
-
-   mutex_lock(>reg_mutex);
-   ret = dev->ops->write16(dev, page, reg, value);
-   mutex_unlock(>reg_mutex);
-
-   return ret;
-}
-
-static inline int b53_write32(struct b53_device *dev, u8 page, u8 reg,
- u32 value)
-{
-   int ret;
-
-   mutex_lock(>reg_mutex);
-   ret = dev->ops->write32(dev, page, reg, value);
-   mutex_unlock(>reg_mutex);
-
-   return ret;
-}
-
-static inline int b53_write48(struct b53_device *dev, u8 page, u8 reg,
- u64 value)
-{
-   int ret;
-
-   mutex_lock(>reg_mutex);
-   ret = dev->ops->write48(dev, page, reg, value);
-   mutex_unlock(>reg_mutex);
-
-   return ret;
-}
-
-static inline int b53_write64(struct b53_device *dev, u8 page, u8 reg,
-  u64 value)
-{
-   int ret;
-
-   mutex_lock(>reg_mutex);
-   ret = dev->ops->write64(dev, page, reg, value);
-   mutex_unlock(>reg_mutex);
-
-   return ret;
-}
+b53_build_op(read, 8, u8 *);
+b53_build_op(read, 16, u16 *);
+b53_build_op(read, 32, u32 *);
+b53_build_op(read, 48, u64 *);
+b53_build_op(read, 64, u64 *);
+
+b53_build_op(write, 8, u8);
+b53_build_op(write, 16, u16);
+b53_build_op(write, 32, u32);
+b53_build_op(write, 48, u64);
+b53_build_op(write, 64, u64);
 
 struct b53_arl_entry {
u8 port;
-- 
2.9.3



[PATCH net-next v2 06/12] net: dsa: b53: Move Broadcom header setup to b53

2017-09-18 Thread Florian Fainelli
The code to enable Broadcom tags/headers is largely switch independent,
and in preparation for enabling it for multiple devices with b53, move
the code we have in bcm_sf2.c to b53_common.c

Reviewed-by: Vivien Didelot 
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 47 
 drivers/net/dsa/b53/b53_priv.h   |  1 +
 drivers/net/dsa/b53/b53_regs.h   |  7 ++
 drivers/net/dsa/bcm_sf2.c| 43 ++--
 drivers/net/dsa/bcm_sf2_regs.h   |  8 ---
 5 files changed, 57 insertions(+), 49 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 3297af6aab8a..aa2187c71ea5 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -538,6 +538,53 @@ static void b53_disable_port(struct dsa_switch *ds, int 
port,
b53_write8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(port), reg);
 }
 
+void b53_brcm_hdr_setup(struct dsa_switch *ds, int port)
+{
+   struct b53_device *dev = ds->priv;
+   u8 hdr_ctl, val;
+   u16 reg;
+
+   /* Resolve which bit controls the Broadcom tag */
+   switch (port) {
+   case 8:
+   val = BRCM_HDR_P8_EN;
+   break;
+   case 7:
+   val = BRCM_HDR_P7_EN;
+   break;
+   case 5:
+   val = BRCM_HDR_P5_EN;
+   break;
+   default:
+   val = 0;
+   break;
+   }
+
+   /* Enable Broadcom tags for IMP port */
+   b53_read8(dev, B53_MGMT_PAGE, B53_BRCM_HDR, _ctl);
+   hdr_ctl |= val;
+   b53_write8(dev, B53_MGMT_PAGE, B53_BRCM_HDR, hdr_ctl);
+
+   /* Registers below are only accessible on newer devices */
+   if (!is58xx(dev))
+   return;
+
+   /* Enable reception Broadcom tag for CPU TX (switch RX) to
+* allow us to tag outgoing frames
+*/
+   b53_read16(dev, B53_MGMT_PAGE, B53_BRCM_HDR_RX_DIS, );
+   reg &= ~BIT(port);
+   b53_write16(dev, B53_MGMT_PAGE, B53_BRCM_HDR_RX_DIS, reg);
+
+   /* Enable transmission of Broadcom tags from the switch (CPU RX) to
+* allow delivering frames to the per-port net_devices
+*/
+   b53_read16(dev, B53_MGMT_PAGE, B53_BRCM_HDR_TX_DIS, );
+   reg &= ~BIT(port);
+   b53_write16(dev, B53_MGMT_PAGE, B53_BRCM_HDR_TX_DIS, reg);
+}
+EXPORT_SYMBOL(b53_brcm_hdr_setup);
+
 static void b53_enable_cpu_port(struct b53_device *dev, int port)
 {
u8 port_ctrl;
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index f1136619e0e4..44297b7c3795 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -309,5 +309,6 @@ int b53_mirror_add(struct dsa_switch *ds, int port,
   struct dsa_mall_mirror_tc_entry *mirror, bool ingress);
 void b53_mirror_del(struct dsa_switch *ds, int port,
struct dsa_mall_mirror_tc_entry *mirror);
+void b53_brcm_hdr_setup(struct dsa_switch *ds, int port);
 
 #endif
diff --git a/drivers/net/dsa/b53/b53_regs.h b/drivers/net/dsa/b53/b53_regs.h
index e5c86d44667a..5e8b8e31fee8 100644
--- a/drivers/net/dsa/b53/b53_regs.h
+++ b/drivers/net/dsa/b53/b53_regs.h
@@ -210,6 +210,7 @@
 #define B53_BRCM_HDR   0x03
 #define   BRCM_HDR_P8_EN   BIT(0) /* Enable tagging on port 8 */
 #define   BRCM_HDR_P5_EN   BIT(1) /* Enable tagging on port 5 */
+#define   BRCM_HDR_P7_EN   BIT(2) /* Enable tagging on port 7 */
 
 /* Mirror capture control register (16 bit) */
 #define B53_MIR_CAP_CTL0x10
@@ -249,6 +250,12 @@
 /* Revision ID register (8 bit) */
 #define B53_REV_ID 0x40
 
+/* Broadcom header RX control (16 bit) */
+#define B53_BRCM_HDR_RX_DIS0x60
+
+/* Broadcom header TX control (16 bit) */
+#define B53_BRCM_HDR_TX_DIS0x62
+
 /*
  * ARL Access Page Registers
  */
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 8acbd17bc1fd..49cb51223f70 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -60,45 +60,6 @@ static void bcm_sf2_imp_vlan_setup(struct dsa_switch *ds, 
int cpu_port)
}
 }
 
-static void bcm_sf2_brcm_hdr_setup(struct bcm_sf2_priv *priv, int port)
-{
-   u32 reg, val;
-
-   /* Resolve which bit controls the Broadcom tag */
-   switch (port) {
-   case 8:
-   val = BRCM_HDR_EN_P8;
-   break;
-   case 7:
-   val = BRCM_HDR_EN_P7;
-   break;
-   case 5:
-   val = BRCM_HDR_EN_P5;
-   break;
-   default:
-   val = 0;
-   break;
-   }
-
-   /* Enable Broadcom tags for IMP port */
-   reg 

[PATCH net-next v2 09/12] net: dsa: b53: Wire-up EEE

2017-09-18 Thread Florian Fainelli
Add support for enabling and disabling EEE, as well as re-negotiating it in
.adjust_link() and in .port_enable().

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 491e4ffa8a0e..4e37ec27e496 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -523,6 +523,10 @@ static int b53_enable_port(struct dsa_switch *ds, int port,
 
b53_imp_vlan_setup(ds, cpu_port);
 
+   /* If EEE was enabled, restore it */
+   if (dev->ports[port].eee.eee_enabled)
+   b53_eee_enable_set(ds, port, true);
+
return 0;
 }
 
@@ -879,6 +883,7 @@ static void b53_adjust_link(struct dsa_switch *ds, int port,
struct phy_device *phydev)
 {
struct b53_device *dev = ds->priv;
+   struct ethtool_eee *p = >ports[port].eee;
u8 rgmii_ctrl = 0, reg = 0, off;
 
if (!phy_is_pseudo_fixed_link(phydev))
@@ -1000,6 +1005,9 @@ static void b53_adjust_link(struct dsa_switch *ds, int 
port,
b53_write8(dev, B53_CTRL_PAGE, po_reg, gmii_po);
}
}
+
+   /* Re-negotiate EEE if it was enabled already */
+   p->eee_enabled = b53_eee_init(ds, port, phydev);
 }
 
 int b53_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering)
@@ -1605,6 +1613,8 @@ static const struct dsa_switch_ops b53_switch_ops = {
.adjust_link= b53_adjust_link,
.port_enable= b53_enable_port,
.port_disable   = b53_disable_port,
+   .get_mac_eee= b53_get_mac_eee,
+   .set_mac_eee= b53_set_mac_eee,
.port_bridge_join   = b53_br_join,
.port_bridge_leave  = b53_br_leave,
.port_stp_state_set = b53_br_set_stp_state,
-- 
2.9.3



[PATCH net-next v2 08/12] net: dsa: b53: Move EEE functions to b53

2017-09-18 Thread Florian Fainelli
Move the bcm_sf2 EEE-related functions to the b53 driver because this is shared
code amongst Gigabit capable switch, only 5325 and 5365 are too old to support
that.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 63 ++
 drivers/net/dsa/b53/b53_priv.h   |  5 +++
 drivers/net/dsa/bcm_sf2.c| 66 
 drivers/net/dsa/bcm_sf2.h|  2 --
 drivers/net/dsa/bcm_sf2_regs.h   |  3 --
 5 files changed, 74 insertions(+), 65 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index aa2187c71ea5..491e4ffa8a0e 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1531,6 +1531,69 @@ void b53_mirror_del(struct dsa_switch *ds, int port,
 }
 EXPORT_SYMBOL(b53_mirror_del);
 
+void b53_eee_enable_set(struct dsa_switch *ds, int port, bool enable)
+{
+   struct b53_device *dev = ds->priv;
+   u16 reg;
+
+   b53_read16(dev, B53_EEE_PAGE, B53_EEE_EN_CTRL, );
+   if (enable)
+   reg |= BIT(port);
+   else
+   reg &= ~BIT(port);
+   b53_write16(dev, B53_EEE_PAGE, B53_EEE_EN_CTRL, reg);
+}
+EXPORT_SYMBOL(b53_eee_enable_set);
+
+
+/* Returns 0 if EEE was not enabled, or 1 otherwise
+ */
+int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy)
+{
+   int ret;
+
+   ret = phy_init_eee(phy, 0);
+   if (ret)
+   return 0;
+
+   b53_eee_enable_set(ds, port, true);
+
+   return 1;
+}
+EXPORT_SYMBOL(b53_eee_init);
+
+int b53_get_mac_eee(struct dsa_switch *ds, int port, struct ethtool_eee *e)
+{
+   struct b53_device *dev = ds->priv;
+   struct ethtool_eee *p = >ports[port].eee;
+   u16 reg;
+
+   if (is5325(dev) || is5365(dev))
+   return -EOPNOTSUPP;
+
+   b53_read16(dev, B53_EEE_PAGE, B53_EEE_LPI_INDICATE, );
+   e->eee_enabled = p->eee_enabled;
+   e->eee_active = !!(reg & BIT(port));
+
+   return 0;
+}
+EXPORT_SYMBOL(b53_get_mac_eee);
+
+int b53_set_mac_eee(struct dsa_switch *ds, int port, struct ethtool_eee *e)
+{
+   struct b53_device *dev = ds->priv;
+   struct ethtool_eee *p = >ports[port].eee;
+
+   if (is5325(dev) || is5365(dev))
+   return -EOPNOTSUPP;
+
+   p->eee_enabled = e->eee_enabled;
+   b53_eee_enable_set(ds, port, e->eee_enabled);
+
+   return 0;
+}
+EXPORT_SYMBOL(b53_set_mac_eee);
+
 static const struct dsa_switch_ops b53_switch_ops = {
.get_tag_protocol   = b53_get_tag_protocol,
.setup  = b53_setup,
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index 44297b7c3795..0ed59672ef07 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -70,6 +70,7 @@ enum {
 
 struct b53_port {
u16 vlan_ctl_mask;
+   struct ethtool_eee eee;
 };
 
 struct b53_vlan {
@@ -310,5 +311,9 @@ int b53_mirror_add(struct dsa_switch *ds, int port,
 void b53_mirror_del(struct dsa_switch *ds, int port,
struct dsa_mall_mirror_tc_entry *mirror);
 void b53_brcm_hdr_setup(struct dsa_switch *ds, int port);
+void b53_eee_enable_set(struct dsa_switch *ds, int port, bool enable);
+int b53_eee_init(struct dsa_switch *ds, int port, struct phy_device *phy);
+int b53_get_mac_eee(struct dsa_switch *ds, int port, struct ethtool_eee *e);
+int b53_set_mac_eee(struct dsa_switch *ds, int port, struct ethtool_eee *e);
 
 #endif
diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 49cb51223f70..4e8ef4c07eab 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -107,19 +107,6 @@ static void bcm_sf2_imp_setup(struct dsa_switch *ds, int 
port)
core_writel(priv, reg, offset);
 }
 
-static void bcm_sf2_eee_enable_set(struct dsa_switch *ds, int port, bool 
enable)
-{
-   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
-   u32 reg;
-
-   reg = core_readl(priv, CORE_EEE_EN_CTRL);
-   if (enable)
-   reg |= 1 << port;
-   else
-   reg &= ~(1 << port);
-   core_writel(priv, reg, CORE_EEE_EN_CTRL);
-}
-
 static void bcm_sf2_gphy_enable_set(struct dsa_switch *ds, bool enable)
 {
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
@@ -256,8 +243,8 @@ static int bcm_sf2_port_setup(struct dsa_switch *ds, int 
port,
bcm_sf2_imp_vlan_setup(ds, cpu_port);
 
/* If EEE was enabled, restore it */
-   if (priv->port_sts[port].eee.eee_enabled)
-   bcm_sf2_eee_enable_set(ds, port, true);
+   if (priv->dev->ports[port].eee.eee_enabled)
+   b53_eee_enable_set(ds, port, true);
 
return 0;
 }
@@ -292,47 +279,6 @@ static void bcm_sf2_port_disable(struct dsa_switch *ds, 
int port,
core_writel(priv, reg, CORE_MEM_PSM_VDD_CTRL);
 }
 
-/* Returns 0 if EEE was not enabled, or 1 otherwise
- */
-static int 

Re: [PATCH net-next] net_sched: sch_htb: add per class overlimits counter

2017-09-18 Thread David Miller
From: Eric Dumazet 
Date: Mon, 18 Sep 2017 12:36:22 -0700

> From: Eric Dumazet 
> 
> HTB qdisc overlimits counter is properly increased, but we have no per
> class counter, meaning it is difficult to diagnose HTB problems.
> 
> This patch adds this counter, visible in "tc -s class show dev eth0",
> with current iproute2.
> 
> Signed-off-by: Eric Dumazet 
> Reported-by: Denys Fedoryshchenko 

Applied, thanks Eric.


[PATCH net-next v2 04/12] net: dsa: bcm_sf2: Defer port enabling to calling port_enable

2017-09-18 Thread Florian Fainelli
There is no need to configure the enabled ports once in bcm_sf2_sw_setup() and
then a second time around when dsa_switch_ops::port_enable is called, just do
it when port_enable is called which is better in terms of power consumption and
correctness.

Reviewed-by: Vivien Didelot 
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 9 +++--
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index d7b53d53c116..8acbd17bc1fd 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -890,14 +890,11 @@ static int bcm_sf2_sw_setup(struct dsa_switch *ds)
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
unsigned int port;
 
-   /* Enable all valid ports and disable those unused */
+   /* Disable unused ports and configure IMP port */
for (port = 0; port < priv->hw_params.num_ports; port++) {
-   /* IMP port receives special treatment */
-   if ((1 << port) & ds->enabled_port_mask)
-   bcm_sf2_port_setup(ds, port, NULL);
-   else if (dsa_is_cpu_port(ds, port))
+   if (dsa_is_cpu_port(ds, port))
bcm_sf2_imp_setup(ds, port);
-   else
+   else if (!((1 << port) & ds->enabled_port_mask))
bcm_sf2_port_disable(ds, port, NULL);
}
 
-- 
2.9.3



[PATCH net-next v2 03/12] net: dsa: b53: Defer port enabling to calling port_enable

2017-09-18 Thread Florian Fainelli
There is no need to configure the enabled ports once in b53_setup() and then a
second time around when dsa_switch_ops::port_enable is called, just do it when
port_enable is called which is better in terms of power consumption and
correctness.

Reviewed-by: Vivien Didelot 
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index d8bc54cfcfbe..3297af6aab8a 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -815,12 +815,13 @@ static int b53_setup(struct dsa_switch *ds)
if (ret)
dev_err(ds->dev, "failed to apply configuration\n");
 
+   /* Configure IMP/CPU port, disable unused ports. Enabled
+* ports will be configured with .port_enable
+*/
for (port = 0; port < dev->num_ports; port++) {
-   if (BIT(port) & ds->enabled_port_mask)
-   b53_enable_port(ds, port, NULL);
-   else if (dsa_is_cpu_port(ds, port))
+   if (dsa_is_cpu_port(ds, port))
b53_enable_cpu_port(dev, port);
-   else
+   else if (!(BIT(port) & ds->enabled_port_mask))
b53_disable_port(ds, port, NULL);
}
 
-- 
2.9.3



[PATCH net-next v2 02/12] net: dsa: b53: Make b53_enable_cpu_port() take a port argument

2017-09-18 Thread Florian Fainelli
In preparation for future changes allowing the configuring of multiple
CPU ports, make b53_enable_cpu_port() take a port argument.

Reviewed-by: Vivien Didelot 
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 274f3679f33d..d8bc54cfcfbe 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -538,19 +538,18 @@ static void b53_disable_port(struct dsa_switch *ds, int 
port,
b53_write8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(port), reg);
 }
 
-static void b53_enable_cpu_port(struct b53_device *dev)
+static void b53_enable_cpu_port(struct b53_device *dev, int port)
 {
-   unsigned int cpu_port = dev->cpu_port;
u8 port_ctrl;
 
/* BCM5325 CPU port is at 8 */
-   if ((is5325(dev) || is5365(dev)) && cpu_port == B53_CPU_PORT_25)
-   cpu_port = B53_CPU_PORT;
+   if ((is5325(dev) || is5365(dev)) && port == B53_CPU_PORT_25)
+   port = B53_CPU_PORT;
 
port_ctrl = PORT_CTRL_RX_BCST_EN |
PORT_CTRL_RX_MCST_EN |
PORT_CTRL_RX_UCST_EN;
-   b53_write8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(cpu_port), port_ctrl);
+   b53_write8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(port), port_ctrl);
 }
 
 static void b53_enable_mib(struct b53_device *dev)
@@ -820,7 +819,7 @@ static int b53_setup(struct dsa_switch *ds)
if (BIT(port) & ds->enabled_port_mask)
b53_enable_port(ds, port, NULL);
else if (dsa_is_cpu_port(ds, port))
-   b53_enable_cpu_port(dev);
+   b53_enable_cpu_port(dev, port);
else
b53_disable_port(ds, port, NULL);
}
-- 
2.9.3



[PATCH net-next v2 01/12] net: dsa: b53: Remove is_cpu_port()

2017-09-18 Thread Florian Fainelli
This is not used anywhere, so remove it.

Reviewed-by: Vivien Didelot 
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_priv.h | 5 -
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index 01bd8cbe9a3f..7528b22aeb03 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -186,11 +186,6 @@ static inline int is58xx(struct b53_device *dev)
 #define B53_CPU_PORT_25	5
 #define B53_CPU_PORT   8
 
-static inline int is_cpu_port(struct b53_device *dev, int port)
-{
-   return dev->cpu_port;
-}
-
 struct b53_device *b53_switch_alloc(struct device *base,
const struct b53_io_ops *ops,
void *priv);
-- 
2.9.3



[PATCH net-next v2 00/12] net: dsa: b53/bcm_sf2 cleanups

2017-09-18 Thread Florian Fainelli
Hi all,

This patch series is a first pass of clean-ups to reduce the number of LOCs
between b53 and bcm_sf2 and to share as many functions as possible.

There is a number of additional cleanups queued up locally that require more
thorough testing.

Thanks!

Changes in v2:

- added Reviewed-by tags from Vivien
- added a missing EXPORT_SYMBOL() in patch 8
- fixed a typo in patch 5

Florian Fainelli (12):
  net: dsa: b53: Remove is_cpu_port()
  net: dsa: b53: Make b53_enable_cpu_port() take a port argument
  net: dsa: b53: Defer port enabling to calling port_enable
  net: dsa: bcm_sf2: Defer port enabling to calling port_enable
  net: dsa: b53: Use a macro to define I/O operations
  net: dsa: b53: Move Broadcom header setup to b53
  net: dsa: b53: Define EEE register page
  net: dsa: b53: Move EEE functions to b53
  net: dsa: b53: Wire-up EEE
  net: dsa: b53: Export b53_imp_vlan_setup()
  net: dsa: bcm_sf2: Use SF2_NUM_EGRESS_QUEUES for CFP
  net: dsa: bcm_sf2: Utilize b53_{enable,disable}_port

 drivers/net/dsa/b53/b53_common.c | 151 
 drivers/net/dsa/b53/b53_priv.h   | 145 ---
 drivers/net/dsa/b53/b53_regs.h   |  48 
 drivers/net/dsa/bcm_sf2.c| 161 +++
 drivers/net/dsa/bcm_sf2.h|   2 -
 drivers/net/dsa/bcm_sf2_cfp.c|   6 +-
 drivers/net/dsa/bcm_sf2_regs.h   |  11 ---
 7 files changed, 228 insertions(+), 296 deletions(-)

-- 
2.9.3



Re: [PATCH net] bpf: fix ri->map prog pointer on bpf_prog_realloc

2017-09-18 Thread Alexei Starovoitov
On Tue, Sep 19, 2017 at 03:16:44AM +0200, Daniel Borkmann wrote:
> Commit 109980b894e9 ("bpf: don't select potentially stale
> ri->map from buggy xdp progs") passed the pointer to the prog
> itself to be loaded into r4 prior to the bpf_redirect_map() helper
> call, so that we can store the owner into ri->map_owner out of
> the helper.
> 
> Issue with that is that the actual address of the prog is still
> subject to change when subsequent rewrites occur, e.g. through
> patching other helper functions or constant blinding. Thus, we
> really need to take prog->aux as the address we're holding, and
> then during runtime fetch the actual pointer via aux->prog. This
> also works with prog clones as they share the same aux and fixup
> pointer to self after blinding finished.
> 
> Fixes: 109980b894e9 ("bpf: don't select potentially stale ri->map from buggy 
> xdp progs")
> Signed-off-by: Daniel Borkmann 
> ---
>  kernel/bpf/verifier.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 799b245..243c09f 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -4205,9 +4205,17 @@ static int fixup_bpf_calls(struct bpf_verifier_env 
> *env)
>   }
>  
>   if (insn->imm == BPF_FUNC_redirect_map) {
> - u64 addr = (unsigned long)prog;
> + /* Note, we cannot use prog directly as imm as 
> subsequent
> +  * rewrites would still change the prog pointer. The 
> only
> +  * stable address we can use is aux, which also works 
> with
> +  * prog clones during blinding.
> +  */

good catch. extra load at runtime sucks, but I don't see better solution.

> + u64 addr = (unsigned long)prog->aux;
> + const int r4 = BPF_REG_4;
>   struct bpf_insn r4_ld[] = {
> - BPF_LD_IMM64(BPF_REG_4, addr),
> + BPF_LD_IMM64(r4, addr),
> + BPF_LDX_MEM(BPF_DW, r4, r4,
> + offsetof(struct bpf_prog_aux, 
> prog)),

needs to be BPF_FIELD_SIZEOF(struct bpf_prog_aux, prog) to work on 32-bit
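
i.e. something like this (untested):

	BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct bpf_prog_aux, prog), r4, r4,
		    offsetof(struct bpf_prog_aux, prog)),

so the load size follows sizeof(struct bpf_prog *) instead of being
hard-coded to BPF_DW.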



Re: cross namespace interface notification for tun devices

2017-09-18 Thread Jason A. Donenfeld
On Mon, Sep 18, 2017 at 8:47 PM, Jason A. Donenfeld  wrote:
> The best I've come up with is, in a sleep loop, writing to the tun
> device's fd something with a NULL or invalid payload. If the interface
> is down, the kernel returns -EIO. If the interface is up, the kernel
> returns -EINVAL. This seems to be a reliable distinguisher, but is a
> pretty insane way of doing it. And sleep loops are somewhat different
> from events too.

Specifically, I'm referring to the horrific hack exemplified in the
attached .c file, in case anybody is curious about the details of what
I'd rather not use.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <linux/if.h>
#include <linux/if_tun.h>

int main(int argc, char *argv[])
{
	/* If IFF_NO_PI is specified, this still sort of works but it
	 * bumps the device error counters, which we don't want, so
	 * it's best not to use this trick with IFF_NO_PI. */
	struct ifreq ifr = { .ifr_flags = IFF_TUN };
	int tun, sock, ret;

	tun = open("/dev/net/tun", O_RDWR);
	if (tun < 0) {
		perror("[-] open(/dev/net/tun)");
		return 1;
	}

	sock = socket(AF_INET, SOCK_DGRAM, 0);
	if (sock < 0) {
		perror("[-] socket(AF_INET, SOCK_DGRAM)");
		return 1;
	}

	ret = ioctl(tun, TUNSETIFF, &ifr);
	if (ret < 0) {
		perror("[-] ioctl(TUNSETIFF)");
		return 1;
	}

	if (write(tun, NULL, 0) >= 0 || errno != EIO)
		perror("[-] write(if:down, NULL, 0) did not return -EIO");
	else
		fprintf(stderr, "[+] write(if:down, NULL, 0) returned -EIO: test successful\n");

	ifr.ifr_flags = IFF_UP;
	ret = ioctl(sock, SIOCSIFFLAGS, &ifr);
	if (ret < 0) {
		perror("[-] ioctl(SIOCSIFFLAGS)");
		return 1;
	}

	if (write(tun, NULL, 0) >= 0 || errno != EINVAL)
		perror("[-] write(if:up, NULL, 0) did not return -EINVAL");
	else
		fprintf(stderr, "[+] write(if:up, NULL, 0) returned -EINVAL: test successful\n");
	
	return 0;
}


[PATCH net] bpf: fix ri->map prog pointer on bpf_prog_realloc

2017-09-18 Thread Daniel Borkmann
Commit 109980b894e9 ("bpf: don't select potentially stale
ri->map from buggy xdp progs") passed the pointer to the prog
itself to be loaded into r4 prior to the bpf_redirect_map() helper
call, so that we can store the owner into ri->map_owner out of
the helper.

Issue with that is that the actual address of the prog is still
subject to change when subsequent rewrites occur, e.g. through
patching other helper functions or constant blinding. Thus, we
really need to take prog->aux as the address we're holding, and
then during runtime fetch the actual pointer via aux->prog. This
also works with prog clones as they share the same aux and fixup
pointer to self after blinding finished.
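
At runtime the rewritten sequence then behaves like this (pseudo-C
sketch, not the emitted instructions themselves):

	/* the BPF_LD_IMM64 now carries the stable aux pointer */
	struct bpf_prog_aux *aux = (struct bpf_prog_aux *)(long)imm64;
	r4 = (u64)(long)aux->prog;	/* resolve the live prog at run time */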

Fixes: 109980b894e9 ("bpf: don't select potentially stale ri->map from buggy 
xdp progs")
Signed-off-by: Daniel Borkmann 
---
 kernel/bpf/verifier.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 799b245..243c09f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -4205,9 +4205,17 @@ static int fixup_bpf_calls(struct bpf_verifier_env *env)
}
 
if (insn->imm == BPF_FUNC_redirect_map) {
-   u64 addr = (unsigned long)prog;
+   /* Note, we cannot use prog directly as imm as 
subsequent
+* rewrites would still change the prog pointer. The 
only
+* stable address we can use is aux, which also works 
with
+* prog clones during blinding.
+*/
+   u64 addr = (unsigned long)prog->aux;
+   const int r4 = BPF_REG_4;
struct bpf_insn r4_ld[] = {
-   BPF_LD_IMM64(BPF_REG_4, addr),
+   BPF_LD_IMM64(r4, addr),
+   BPF_LDX_MEM(BPF_DW, r4, r4,
+   offsetof(struct bpf_prog_aux, 
prog)),
*insn,
};
cnt = ARRAY_SIZE(r4_ld);
-- 
1.9.3



[PATCH net 0/7] Bug fixes for the HNS3 Ethernet Driver for Hip08 SoC

2017-09-18 Thread Salil Mehta
This patch set presents some bug fixes for the HNS3 Ethernet driver, identified
during internal testing & stabilization efforts.

This patch series is meant for the Linux 4.14 kernel.

Lipeng (6):
  net: hns3: get phy addr from NCL_config
  net: hns3: fix the command used to unmap ring from vector
  net: hns3: Fix ring and vector map command
  net: hns3: fix a bug of set mac address
  net: hns3: set default vlan id to PF
  net: hns3: Fixes the premature exit of loop when matching clients

Salil Mehta (1):
  net: hns3: fixes the ether address copy with more appropriate API

 drivers/net/ethernet/hisilicon/hns3/hnae3.c| 43 +-
 .../net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h |  8 +++-
 .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c| 20 --
 .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c |  7 ++--
 4 files changed, 35 insertions(+), 43 deletions(-)

-- 
2.11.0




[PATCH net 3/7] net: hns3: Fix ring and vector map command

2017-09-18 Thread Salil Mehta
From: Lipeng 

This patch adds INT_GL and the VF id to the vector configuration when
binding a ring to a vector. INT_GL means Interrupt Gap Limiting. Vector
ids start from 0 within each VF, so the bind command must specify the VF id.

Signed-off-by: Lipeng 
Signed-off-by: Mingguang Qu 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h  | 8 ++--
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 8 
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
index 91ae0135ee50..c2b613b40509 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_cmd.h
@@ -238,7 +238,7 @@ struct hclge_tqp_map {
u8 rsv[18];
 };
 
-#define HCLGE_VECTOR_ELEMENTS_PER_CMD  11
+#define HCLGE_VECTOR_ELEMENTS_PER_CMD  10
 
 enum hclge_int_type {
HCLGE_INT_TX,
@@ -252,8 +252,12 @@ struct hclge_ctrl_vector_chain {
 #define HCLGE_INT_TYPE_S   0
 #define HCLGE_INT_TYPE_M   0x3
 #define HCLGE_TQP_ID_S 2
-#define HCLGE_TQP_ID_M (0x3fff << HCLGE_TQP_ID_S)
+#define HCLGE_TQP_ID_M (0x7ff << HCLGE_TQP_ID_S)
+#define HCLGE_INT_GL_IDX_S 13
+#define HCLGE_INT_GL_IDX_M (0x3 << HCLGE_INT_GL_IDX_S)
__le16 tqp_type_and_id[HCLGE_VECTOR_ELEMENTS_PER_CMD];
+   u8 vfid;
+   u8 rsv;
 };
 
 #define HCLGE_TC_NUM   8
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index e324bc6e9f4f..eafd9c678162 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2680,7 +2680,11 @@ int hclge_map_vport_ring_to_vector(struct hclge_vport 
*vport, int vector_id,
   hnae_get_bit(node->flag, HNAE3_RING_TYPE_B));
hnae_set_field(req->tqp_type_and_id[i], HCLGE_TQP_ID_M,
   HCLGE_TQP_ID_S,  node->tqp_index);
+   hnae_set_field(req->tqp_type_and_id[i], HCLGE_INT_GL_IDX_M,
+  HCLGE_INT_GL_IDX_S,
+  hnae_get_bit(node->flag, HNAE3_RING_TYPE_B));
req->tqp_type_and_id[i] = cpu_to_le16(req->tqp_type_and_id[i]);
+   req->vfid = vport->vport_id;
 
if (++i >= HCLGE_VECTOR_ELEMENTS_PER_CMD) {
req->int_cause_num = HCLGE_VECTOR_ELEMENTS_PER_CMD;
@@ -2764,8 +2768,12 @@ static int hclge_unmap_ring_from_vector(
   hnae_get_bit(node->flag, HNAE3_RING_TYPE_B));
hnae_set_field(req->tqp_type_and_id[i], HCLGE_TQP_ID_M,
   HCLGE_TQP_ID_S,  node->tqp_index);
+   hnae_set_field(req->tqp_type_and_id[i], HCLGE_INT_GL_IDX_M,
+  HCLGE_INT_GL_IDX_S,
+  hnae_get_bit(node->flag, HNAE3_RING_TYPE_B));
 
req->tqp_type_and_id[i] = cpu_to_le16(req->tqp_type_and_id[i]);
+   req->vfid = vport->vport_id;
 
if (++i >= HCLGE_VECTOR_ELEMENTS_PER_CMD) {
req->int_cause_num = HCLGE_VECTOR_ELEMENTS_PER_CMD;
-- 
2.11.0




[PATCH net 5/7] net: hns3: fixes the ether address copy with more appropriate API

2017-09-18 Thread Salil Mehta
This patch replaces an open-coded ethernet address copy with the more
appropriate ether_addr_copy() function.

Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index eafd9c678162..8e172afd4876 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1063,8 +1063,7 @@ static int hclge_configure(struct hclge_dev *hdev)
hdev->base_tqp_pid = 0;
hdev->rss_size_max = 1;
hdev->rx_buf_len = cfg.rx_buf_len;
-   for (i = 0; i < ETH_ALEN; i++)
-   hdev->hw.mac.mac_addr[i] = cfg.mac_addr[i];
+   ether_addr_copy(hdev->hw.mac.mac_addr, cfg.mac_addr);
hdev->hw.mac.media_type = cfg.media_type;
hdev->hw.mac.phy_addr = cfg.phy_addr;
hdev->num_desc = cfg.tqp_desc_num;
-- 
2.11.0




[PATCH net 4/7] net: hns3: fix a bug of set mac address

2017-09-18 Thread Salil Mehta
From: Lipeng 

The HNS3 driver gets the mac address from the NCL_config file and sets it
in the HW. If the mac address in NCL_config is invalid, the driver sets a
random mac address and uses it.

The current code sets the random mac address in the HW, but does not set a
valid mac address from the NCL_config file in the HW. This patch fixes the
bug.

Signed-off-by: Lipeng 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
index 1c3e29447891..4d68d6ea5143 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c
@@ -2705,10 +2705,11 @@ static void hns3_init_mac_addr(struct net_device 
*netdev)
eth_hw_addr_random(netdev);
dev_warn(priv->dev, "using random MAC address %pM\n",
 netdev->dev_addr);
-   /* Also copy this new MAC address into hdev */
-   if (h->ae_algo->ops->set_mac_addr)
-   h->ae_algo->ops->set_mac_addr(h, netdev->dev_addr);
}
+
+   if (h->ae_algo->ops->set_mac_addr)
+   h->ae_algo->ops->set_mac_addr(h, netdev->dev_addr);
+
 }
 
 static void hns3_nic_set_priv_ops(struct net_device *netdev)
-- 
2.11.0




[PATCH net 7/7] net: hns3: Fixes the premature exit of loop when matching clients

2017-09-18 Thread Salil Mehta
From: Lipeng 

When registering/unregistering an ae_dev, the ae_dev should be matched
against every client in the client_list. Enet and roce clients can
co-exist, so we should keep checking for both enet and roce presence
instead of breaking out of the loop early.

The premature break caused problems when loading and unloading modules.

Signed-off-by: Lipeng 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hnae3.c | 43 ++---
 1 file changed, 9 insertions(+), 34 deletions(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hnae3.c 
b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
index 59efbd605416..5bcb2238acb2 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hnae3.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hnae3.c
@@ -37,20 +37,15 @@ static bool hnae3_client_match(enum hnae3_client_type 
client_type,
 }
 
 static int hnae3_match_n_instantiate(struct hnae3_client *client,
-struct hnae3_ae_dev *ae_dev,
-bool is_reg, bool *matched)
+struct hnae3_ae_dev *ae_dev, bool is_reg)
 {
int ret;
 
-   *matched = false;
-
/* check if this client matches the type of ae_dev */
if (!(hnae3_client_match(client->type, ae_dev->dev_type) &&
  hnae_get_bit(ae_dev->flag, HNAE3_DEV_INITED_B))) {
return 0;
}
-   /* there is a match of client and dev */
-   *matched = true;
 
/* now, (un-)instantiate client by calling lower layer */
if (is_reg) {
@@ -69,7 +64,6 @@ int hnae3_register_client(struct hnae3_client *client)
 {
struct hnae3_client *client_tmp;
struct hnae3_ae_dev *ae_dev;
-   bool matched;
int ret = 0;
 
	mutex_lock(&hnae3_common_lock);
@@ -86,7 +80,7 @@ int hnae3_register_client(struct hnae3_client *client)
/* if the client could not be initialized on current port, for
 * any error reasons, move on to next available port
 */
-	ret = hnae3_match_n_instantiate(client, ae_dev, true, &matched);
+   ret = hnae3_match_n_instantiate(client, ae_dev, true);
if (ret)
			dev_err(&ae_dev->pdev->dev,
"match and instantiation failed for port\n");
@@ -102,12 +96,11 @@ EXPORT_SYMBOL(hnae3_register_client);
 void hnae3_unregister_client(struct hnae3_client *client)
 {
struct hnae3_ae_dev *ae_dev;
-   bool matched;
 
	mutex_lock(&hnae3_common_lock);
	/* un-initialize the client on every matched port */
	list_for_each_entry(ae_dev, &hnae3_ae_dev_list, node) {
-		hnae3_match_n_instantiate(client, ae_dev, false, &matched);
+		hnae3_match_n_instantiate(client, ae_dev, false);
	}

	list_del(&client->node);
@@ -124,7 +117,6 @@ int hnae3_register_ae_algo(struct hnae3_ae_algo *ae_algo)
const struct pci_device_id *id;
struct hnae3_ae_dev *ae_dev;
struct hnae3_client *client;
-   bool matched;
int ret = 0;
 
	mutex_lock(&hnae3_common_lock);
@@ -151,13 +143,10 @@ int hnae3_register_ae_algo(struct hnae3_ae_algo *ae_algo)
 * initialize the figure out client instance
 */
	list_for_each_entry(client, &hnae3_client_list, node) {
-		ret = hnae3_match_n_instantiate(client, ae_dev, true,
-						&matched);
+		ret = hnae3_match_n_instantiate(client, ae_dev, true);
		if (ret)
			dev_err(&ae_dev->pdev->dev,
"match and instantiation failed\n");
-   if (matched)
-   break;
}
}
 
@@ -175,7 +164,6 @@ void hnae3_unregister_ae_algo(struct hnae3_ae_algo *ae_algo)
const struct pci_device_id *id;
struct hnae3_ae_dev *ae_dev;
struct hnae3_client *client;
-   bool matched;
 
	mutex_lock(&hnae3_common_lock);
/* Check if there are matched ae_dev */
@@ -187,12 +175,8 @@ void hnae3_unregister_ae_algo(struct hnae3_ae_algo 
*ae_algo)
/* check the client list for the match with this ae_dev type and
 * un-initialize the figure out client instance
 */
-	list_for_each_entry(client, &hnae3_client_list, node) {
-		hnae3_match_n_instantiate(client, ae_dev, false,
-					  &matched);
-		if (matched)
-			break;
-	}
+	list_for_each_entry(client, &hnae3_client_list, node)
+		hnae3_match_n_instantiate(client, ae_dev, false);
 
ae_algo->ops->uninit_ae_dev(ae_dev);
hnae_set_bit(ae_dev->flag, HNAE3_DEV_INITED_B, 0);
@@ -212,7 +196,6 @@ int 

RE: [PATCH V2] tipc: Use bsearch library function

2017-09-18 Thread Jon Maloy


> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-
> ow...@vger.kernel.org] On Behalf Of Joe Perches
> Sent: Sunday, September 17, 2017 23:15
> To: Jon Maloy ; Thomas Meyer
> 
> Cc: Ying Xue ; netdev@vger.kernel.org; tipc-
> discuss...@lists.sourceforge.net; linux-ker...@vger.kernel.org;
> da...@davemloft.net
> Subject: Re: [PATCH V2] tipc: Use bsearch library function
> 
> On Sun, 2017-09-17 at 16:27 +, Jon Maloy wrote:
> > > -Original Message-
> > > From: Thomas Meyer [mailto:tho...@m3y3r.de]
> []
> > > What about the other binary search implementation in the same file?
> > > Should I try to convert it it will it get NAKed for performance reasons 
> > > too?
> >
> > The searches for inserting and removing publications is less time
> > critical, so that would be ok with me.
> > If you have any more general interest in improving the code in this
> > file (which is needed) it would also be appreciated.
> 
> Perhaps using an rbtree would be an improvement.

Not a bad idea. It would probably reduce the amount of code, possibly at the
expense of cache hit rate during the binary lookup.
It is worth looking into.
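
Roughly, each entry would hang off an rb_node and the lookup would become
a standard walk; a minimal sketch (the node type and 'lower' key here are
placeholders, not the actual TIPC structures):

	static struct name_seq *name_seq_find(struct rb_root *root, u32 key)
	{
		struct rb_node *n = root->rb_node;

		while (n) {
			struct name_seq *seq = rb_entry(n, struct name_seq,
							node);

			if (key < seq->lower)
				n = n->rb_left;
			else if (key > seq->lower)
				n = n->rb_right;
			else
				return seq;
		}
		return NULL;
	}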

///jon


[PATCH net 6/7] net: hns3: set default vlan id to PF

2017-09-18 Thread Salil Mehta
From: Lipeng 

When there is no vlan id in the packets, the hardware treats the vlan id
as 0 and looks it up in the mac_vlan table. This patch sets the default
vlan id of the PF to 0. Without this config, the mac_vlan table lookup
fails and the hardware drops packets.

Signed-off-by: Mingguang Qu 
Signed-off-by: Lipeng 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index 8e172afd4876..74008ef23169 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3673,6 +3673,7 @@ static int hclge_init_vlan_config(struct hclge_dev *hdev)
 {
 #define HCLGE_VLAN_TYPE_VF_TABLE   0
 #define HCLGE_VLAN_TYPE_PORT_TABLE 1
+   struct hnae3_handle *handle;
int ret;
 
ret = hclge_set_vlan_filter_ctrl(hdev, HCLGE_VLAN_TYPE_VF_TABLE,
@@ -3682,8 +3683,11 @@ static int hclge_init_vlan_config(struct hclge_dev *hdev)
 
ret = hclge_set_vlan_filter_ctrl(hdev, HCLGE_VLAN_TYPE_PORT_TABLE,
 true);
+   if (ret)
+   return ret;
 
-   return ret;
+	handle = &hdev->vport[0].nic;
+   return hclge_set_port_vlan_filter(handle, htons(ETH_P_8021Q), 0, false);
 }
 
 static int hclge_set_mtu(struct hnae3_handle *handle, int new_mtu)
-- 
2.11.0




[PATCH net 1/7] net: hns3: get phy addr from NCL_config

2017-09-18 Thread Salil Mehta
From: Lipeng 

The NCL_config file defines the phy address for every port, and the driver
should get the phy address from it. If the right phy address is not
obtained, every port will use the default phy address 0; different ports
using the same phy address will cause errors.

Signed-off-by: Lipeng 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index bb45365fb817..db4e07dac29a 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -1066,6 +1066,7 @@ static int hclge_configure(struct hclge_dev *hdev)
for (i = 0; i < ETH_ALEN; i++)
hdev->hw.mac.mac_addr[i] = cfg.mac_addr[i];
hdev->hw.mac.media_type = cfg.media_type;
+   hdev->hw.mac.phy_addr = cfg.phy_addr;
hdev->num_desc = cfg.tqp_desc_num;
hdev->tm_info.num_pg = 1;
hdev->tm_info.num_tc = cfg.tc_num;
-- 
2.11.0




[PATCH net 2/7] net: hns3: fix the command used to unmap ring from vector

2017-09-18 Thread Salil Mehta
From: Lipeng 

When unmapping a ring from a vector, the wrong command was used; this
causes an error if the unmap action needs multiple command descriptors.
This patch fixes the error.

Signed-off-by: Lipeng 
Signed-off-by: Salil Mehta 
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c 
b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index db4e07dac29a..e324bc6e9f4f 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -2779,7 +2779,7 @@ static int hclge_unmap_ring_from_vector(
}
i = 0;
		hclge_cmd_setup_basic_desc(&desc,
-  HCLGE_OPC_ADD_RING_TO_VECTOR,
+  HCLGE_OPC_DEL_RING_TO_VECTOR,
   false);
req->int_vector_id = vector_id;
}
-- 
2.11.0




RE: [PATCH RFC 6/6] Modify tag_ksz.c to support other KSZ switch drivers

2017-09-18 Thread Tristram.Ha
> I am not really sure why this is such a concern for you so soon when
> your driver is not even included yet. You should really aim for baby
> steps here: get the basic driver(s) included, with a limited set of
> features, and gradually add more features to the driver. When
> fwd_offload_mark and RSTP become a real problem, we can most
> definitively find a way to fix those in DSA and depending drivers.

I was under the impression that there is a push for this new switchdev
model and so the DSA model was overhauled to support that.

The KSZ9477 driver is already in the kernel, and its register access is
actually much different from that of the other, older switches.  There is
not much common code to be reused.  I have always known this tail tag
handling is the sticking point.  I will submit a much simplified driver
and wait for switch access in the future.


ipv4 ID calculation

2017-09-18 Thread Harsha Chenji
Hi all,

Where is the ID field of the IPv4 header created when the DF flag is
set? I am looking at ip_build_and_send_pkt. The code seems to have
changed in 4.4-rc1:

if (ip_dont_fragment(sk, &rt->dst)) {
iph->frag_off = htons(IP_DF);
iph->id = 0;
} else {
iph->frag_off = 0;
__ip_select_ident(net, iph, 1);
}

old code (executed irrespective of DF or not):

ip_select_ident(sock_net(sk), skb, sk);

The code in Stevens is basically iph->id = htons(ip_ident++) and now
it seems to be calculated based on a hash + lookup table.

So where is the id of 0 overwritten when DF is set? I didn't find any
info in the docs.
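
For reference, the hash + lookup table path is __ip_select_ident() in
net/ipv4/route.c, which (around that kernel version; details vary) looks
roughly like this:

	void __ip_select_ident(struct net *net, struct iphdr *iph, int segs)
	{
		u32 hash, id;

		net_get_random_once(&ip_idents_hashrnd,
				    sizeof(ip_idents_hashrnd));

		hash = jhash_3words((__force u32)iph->daddr,
				    (__force u32)iph->saddr,
				    iph->protocol ^ net_hash_mix(net),
				    ip_idents_hashrnd);
		id = ip_idents_reserve(hash, segs); /* per-bucket counter */
		iph->id = htons(id);
	}

i.e. the single global ip_ident++ from Stevens became an array of
per-flow-hash counters.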

P.S. - is this the right mailing list for these kinds of questions?

Thanks!


[PATCH net-next 06/14] gtp: Eliminate pktinfo and add port configuration

2017-09-18 Thread Tom Herbert
The gtp pktinfo structure is unnecessary and needs a lot of code to
manage it. Remove it. Also, add per-pdp port configuration for transmit.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c| 167 ---
 include/uapi/linux/gtp.h |   1 +
 2 files changed, 71 insertions(+), 97 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index f2089fa4f004..a928279c382c 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -53,6 +53,7 @@ struct pdp_ctx {
} v1;
} u;
u8  gtp_version;
+   __be16  gtp_port;
u16 af;
 
struct in_addr  ms_addr_ip4;
@@ -418,149 +419,112 @@ static inline void gtp1_push_header(struct sk_buff 
*skb, struct pdp_ctx *pctx)
 */
 }
 
-struct gtp_pktinfo {
-   struct sock *sk;
-   struct iphdr*iph;
-   struct flowi4   fl4;
-   struct rtable   *rt;
-   struct pdp_ctx  *pctx;
-   struct net_device   *dev;
-   __be16  gtph_port;
-};
-
-static void gtp_push_header(struct sk_buff *skb, struct gtp_pktinfo *pktinfo)
+static void gtp_push_header(struct sk_buff *skb, struct pdp_ctx *pctx)
 {
-   switch (pktinfo->pctx->gtp_version) {
+   switch (pctx->gtp_version) {
case GTP_V0:
-   pktinfo->gtph_port = htons(GTP0_PORT);
-   gtp0_push_header(skb, pktinfo->pctx);
+   gtp0_push_header(skb, pctx);
break;
case GTP_V1:
-   pktinfo->gtph_port = htons(GTP1U_PORT);
-   gtp1_push_header(skb, pktinfo->pctx);
+   gtp1_push_header(skb, pctx);
break;
}
 }
 
-static inline void gtp_set_pktinfo_ipv4(struct gtp_pktinfo *pktinfo,
-   struct sock *sk, struct iphdr *iph,
-   struct pdp_ctx *pctx, struct rtable *rt,
-   struct flowi4 *fl4,
-   struct net_device *dev)
-{
-   pktinfo->sk = sk;
-   pktinfo->iph= iph;
-   pktinfo->pctx   = pctx;
-   pktinfo->rt = rt;
-   pktinfo->fl4= *fl4;
-   pktinfo->dev= dev;
-}
-
-static int gtp_build_skb_ip4(struct sk_buff *skb, struct net_device *dev,
-struct gtp_pktinfo *pktinfo)
+static int gtp_xmit(struct sk_buff *skb, struct net_device *dev,
+   struct pdp_ctx *pctx)
 {
-   struct gtp_dev *gtp = netdev_priv(dev);
-   struct pdp_ctx *pctx;
+   struct sock *sk = pctx->sk;
+   __be32 saddr = inet_sk(sk)->inet_saddr;
struct rtable *rt;
-   struct flowi4 fl4;
-   struct iphdr *iph;
-   struct sock *sk;
-   __be32 saddr;
-
-   /* Read the IP destination address and resolve the PDP context.
-* Prepend PDP header with TEI/TID from PDP ctx.
-*/
-   iph = ip_hdr(skb);
-   if (gtp->role == GTP_ROLE_SGSN)
-   pctx = ipv4_pdp_find(gtp, iph->saddr);
-   else
-   pctx = ipv4_pdp_find(gtp, iph->daddr);
+   int err = 0;
 
-   if (!pctx) {
-   netdev_dbg(dev, "no PDP ctx found for %pI4, skip\n",
-			   &iph->daddr);
-   return -ENOENT;
-   }
-   netdev_dbg(dev, "found PDP context %p\n", pctx);
+   /* Ensure there is sufficient headroom. */
+   err = skb_cow_head(skb, dev->needed_headroom);
+   if (unlikely(err))
+   goto out_err;
 
-   sk = pctx->sk;
-   saddr = inet_sk(sk)->inet_saddr;
+   skb_reset_inner_headers(skb);
 
rt = ip_tunnel_get_route(dev, skb, sk->sk_protocol,
 sk->sk_bound_dev_if, RT_CONN_FLAGS(sk),
				 pctx->peer_addr_ip4.s_addr, &saddr,
-				 pktinfo->gtph_port, pktinfo->gtph_port,
+				 pctx->gtp_port, pctx->gtp_port,
				 &pctx->dst_cache, NULL);
 
	if (IS_ERR(rt)) {
-		if (rt == ERR_PTR(-ELOOP)) {
-			netdev_dbg(dev, "circular route to SSGN %pI4\n",
-				   &pctx->peer_addr_ip4.s_addr);
-			dev->stats.collisions++;
-			goto err_rt;
-		} else {
-			netdev_dbg(dev, "no route to SSGN %pI4\n",
-				   &pctx->peer_addr_ip4.s_addr);
-			dev->stats.tx_carrier_errors++;
-			goto err;
-		}
+		err = PTR_ERR(rt);
+		goto out_err;
	}
 
skb_dst_drop(skb);
 
-	gtp_set_pktinfo_ipv4(pktinfo, sk, iph, pctx, rt, &fl4, dev);
-   gtp_push_header(skb, pktinfo);
+   gtp_push_header(skb, pctx);
+   udp_tunnel_xmit_skb(rt, sk, skb, saddr,
+   

[PATCH net-next 09/14] gtp: Allow configuring GTP interface as standalone

2017-09-18 Thread Tom Herbert
Add a new configuration mode for GTP interfaces that allows specifying a
port to listen on (as opposed to having to get sockets from a userspace
control plane). This allows GTP interfaces to be configured and the data
path tested without requiring a GTP-C daemon.
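
The new attributes are plain u16s on the netlink side, so a minimal
configurator could fill them in like this (sketch using libmnl, with nlh
being the RTM_NEWLINK message under construction; whether the value is
host or network byte order depends on how the driver reads the
attribute):

	mnl_attr_put_u16(nlh, IFLA_GTP_PORT0, 3386);	/* GTPv0 port */
	mnl_attr_put_u16(nlh, IFLA_GTP_PORT1, 2152);	/* GTP-U v1 port */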

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c| 212 +++
 include/uapi/linux/gtp.h |   5 ++
 2 files changed, 166 insertions(+), 51 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 121b41e7a901..1870469a4982 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -86,6 +86,9 @@ struct gtp_dev {
struct sock *sk0;
struct sock *sk1u;
 
+   struct socket   *sock0;
+   struct socket   *sock1u;
+
struct net_device   *dev;
 
unsigned introle;
@@ -430,26 +433,33 @@ static void gtp_encap_destroy(struct sock *sk)
}
 }
 
-static void gtp_encap_disable_sock(struct sock *sk)
+static void gtp_encap_release(struct gtp_dev *gtp)
 {
-   if (!sk)
-   return;
+   if (gtp->sk0) {
+   if (gtp->sock0) {
+   udp_tunnel_sock_release(gtp->sock0);
+   gtp->sock0 = NULL;
+   } else {
+   gtp_encap_destroy(gtp->sk0);
+   }
 
-   gtp_encap_destroy(sk);
-}
+   gtp->sk0 = NULL;
+   }
 
-static void gtp_encap_disable(struct gtp_dev *gtp)
-{
-   gtp_encap_disable_sock(gtp->sk0);
-   gtp_encap_disable_sock(gtp->sk1u);
+   if (gtp->sk1u) {
+   if (gtp->sock1u) {
+   udp_tunnel_sock_release(gtp->sock1u);
+   gtp->sock1u = NULL;
+   } else {
+   gtp_encap_destroy(gtp->sk1u);
+   }
+
+   gtp->sk1u = NULL;
+   }
 }
 
 static int gtp_dev_init(struct net_device *dev)
 {
-   struct gtp_dev *gtp = netdev_priv(dev);
-
-   gtp->dev = dev;
-
dev->tstats = netdev_alloc_pcpu_stats(struct pcpu_sw_netstats);
if (!dev->tstats)
return -ENOMEM;
@@ -461,7 +471,8 @@ static void gtp_dev_uninit(struct net_device *dev)
 {
struct gtp_dev *gtp = netdev_priv(dev);
 
-   gtp_encap_disable(gtp);
+   gtp_encap_release(gtp);
+
free_percpu(dev->tstats);
 }
 
@@ -676,6 +687,7 @@ static const struct net_device_ops gtp_netdev_ops = {
 
 static void gtp_link_setup(struct net_device *dev)
 {
+   struct gtp_dev *gtp = netdev_priv(dev);
	dev->netdev_ops = &gtp_netdev_ops;
dev->needs_free_netdev  = true;
 
@@ -697,6 +709,8 @@ static void gtp_link_setup(struct net_device *dev)
  sizeof(struct udphdr) +
  sizeof(struct gtp0_header);
 
+   gtp->dev = dev;
+
	gro_cells_init(&gtp->gro_cells, dev);
 }
 
@@ -710,13 +724,19 @@ static int gtp_newlink(struct net *src_net, struct 
net_device *dev,
   struct netlink_ext_ack *extack)
 {
unsigned int role = GTP_ROLE_GGSN;
+   bool have_fd, have_ports;
bool is_ipv6 = false;
struct gtp_dev *gtp;
struct gtp_net *gn;
int hashsize, err;
 
-   if (!data[IFLA_GTP_FD0] && !data[IFLA_GTP_FD1])
+   have_fd = !!data[IFLA_GTP_FD0] || !!data[IFLA_GTP_FD1];
+   have_ports = !!data[IFLA_GTP_PORT0] || !!data[IFLA_GTP_PORT1];
+
+   if (!(have_fd ^ have_ports)) {
+   /* Either got fd(s) or port(s) */
return -EINVAL;
+   }
 
if (data[IFLA_GTP_ROLE]) {
role = nla_get_u32(data[IFLA_GTP_ROLE]);
@@ -773,7 +793,7 @@ static int gtp_newlink(struct net *src_net, struct 
net_device *dev,
 out_hashtable:
gtp_hashtable_free(gtp);
 out_encap:
-   gtp_encap_disable(gtp);
+   gtp_encap_release(gtp);
return err;
 }
 
@@ -782,7 +802,7 @@ static void gtp_dellink(struct net_device *dev, struct 
list_head *head)
struct gtp_dev *gtp = netdev_priv(dev);
 
	gro_cells_destroy(&gtp->gro_cells);
-   gtp_encap_disable(gtp);
+   gtp_encap_release(gtp);
gtp_hashtable_free(gtp);
	list_del_rcu(&gtp->list);
unregister_netdevice_queue(dev, head);
@@ -793,6 +813,8 @@ static const struct nla_policy gtp_policy[IFLA_GTP_MAX + 1] 
= {
[IFLA_GTP_FD1]  = { .type = NLA_U32 },
[IFLA_GTP_PDP_HASHSIZE] = { .type = NLA_U32 },
[IFLA_GTP_ROLE] = { .type = NLA_U32 },
+   [IFLA_GTP_PORT0]= { .type = NLA_U16 },
+   [IFLA_GTP_PORT1]= { .type = NLA_U16 },
 };
 
 static int gtp_validate(struct nlattr *tb[], struct nlattr *data[],
@@ -883,11 +905,35 @@ static void gtp_hashtable_free(struct gtp_dev *gtp)
kfree(gtp->tid_hash);
 }
 
-static struct sock *gtp_encap_enable_socket(int fd, int type,
-   struct gtp_dev 

[PATCH net-next 03/14] gtp: Call common functions to get tunnel routes and add dst_cache

2017-09-18 Thread Tom Herbert
Call ip_tunnel_get_route and add a dst_cache to the pdp context, which
should improve performance by obviating the need to perform a route
lookup on every packet.
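
For background, the pattern this buys us is roughly the following
(sketch, not the exact driver code; assumes a filled-in flowi4 fl4 and
the pdp's socket sk):

	struct rtable *rt;
	__be32 saddr;

	rt = dst_cache_get_ip4(&pctx->dst_cache, &saddr);
	if (!rt) {
		/* slow path: full route lookup, then prime the cache */
		rt = ip_route_output_key(sock_net(sk), &fl4);
		if (!IS_ERR(rt))
			dst_cache_set_ip4(&pctx->dst_cache, &rt->dst,
					  fl4.saddr);
	}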

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c | 59 ++-
 1 file changed, 32 insertions(+), 27 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index f38e32a7ec9c..95df3bcebbb2 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -63,6 +63,8 @@ struct pdp_ctx {
 
atomic_ttx_seq;
struct rcu_head rcu_head;
+
+   struct dst_cachedst_cache;
 };
 
 /* One instance of the GTP device. */
@@ -379,20 +381,6 @@ static void gtp_dev_uninit(struct net_device *dev)
free_percpu(dev->tstats);
 }
 
-static struct rtable *ip4_route_output_gtp(struct flowi4 *fl4,
-  const struct sock *sk,
-  __be32 daddr)
-{
-   memset(fl4, 0, sizeof(*fl4));
-   fl4->flowi4_oif = sk->sk_bound_dev_if;
-   fl4->daddr  = daddr;
-   fl4->saddr  = inet_sk(sk)->inet_saddr;
-   fl4->flowi4_tos = RT_CONN_FLAGS(sk);
-   fl4->flowi4_proto   = sk->sk_protocol;
-
-   return ip_route_output_key(sock_net(sk), fl4);
-}
-
 static inline void gtp0_push_header(struct sk_buff *skb, struct pdp_ctx *pctx)
 {
int payload_len = skb->len;
@@ -479,6 +467,8 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct 
net_device *dev,
struct rtable *rt;
struct flowi4 fl4;
struct iphdr *iph;
+   struct sock *sk;
+   __be32 saddr;
__be16 df;
int mtu;
 
@@ -498,19 +488,27 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct 
net_device *dev,
}
netdev_dbg(dev, "found PDP context %p\n", pctx);
 
-	rt = ip4_route_output_gtp(&fl4, pctx->sk, pctx->peer_addr_ip4.s_addr);
-   if (IS_ERR(rt)) {
-   netdev_dbg(dev, "no route to SSGN %pI4\n",
-			   &pctx->peer_addr_ip4.s_addr);
-   dev->stats.tx_carrier_errors++;
-   goto err;
-   }
+   sk = pctx->sk;
+   saddr = inet_sk(sk)->inet_saddr;
 
-   if (rt->dst.dev == dev) {
-   netdev_dbg(dev, "circular route to SSGN %pI4\n",
-			   &pctx->peer_addr_ip4.s_addr);
-   dev->stats.collisions++;
-   goto err_rt;
+   rt = ip_tunnel_get_route(dev, skb, sk->sk_protocol,
+sk->sk_bound_dev_if, RT_CONN_FLAGS(sk),
+				 pctx->peer_addr_ip4.s_addr, &saddr,
+				 pktinfo->gtph_port, pktinfo->gtph_port,
+				 &pctx->dst_cache, NULL);
+
+   if (IS_ERR(rt)) {
+   if (rt == ERR_PTR(-ELOOP)) {
+   netdev_dbg(dev, "circular route to SSGN %pI4\n",
+			   &pctx->peer_addr_ip4.s_addr);
+   dev->stats.collisions++;
+   goto err_rt;
+   } else {
+   netdev_dbg(dev, "no route to SSGN %pI4\n",
+			   &pctx->peer_addr_ip4.s_addr);
+   dev->stats.tx_carrier_errors++;
+   goto err;
+   }
}
 
skb_dst_drop(skb);
@@ -543,7 +541,7 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct 
net_device *dev,
goto err_rt;
}
 
-	gtp_set_pktinfo_ipv4(pktinfo, pctx->sk, iph, pctx, rt, &fl4, dev);
+	gtp_set_pktinfo_ipv4(pktinfo, sk, iph, pctx, rt, &fl4, dev);
gtp_push_header(skb, pktinfo);
 
return 0;
@@ -917,6 +915,7 @@ static int ipv4_pdp_add(struct gtp_dev *gtp, struct sock 
*sk,
struct pdp_ctx *pctx;
bool found = false;
__be32 ms_addr;
+   int err;
 
ms_addr = nla_get_be32(info->attrs[GTPA_MS_ADDRESS]);
hash_ms = ipv4_hashfn(ms_addr) % gtp->hash_size;
@@ -951,6 +950,12 @@ static int ipv4_pdp_add(struct gtp_dev *gtp, struct sock 
*sk,
if (pctx == NULL)
return -ENOMEM;
 
+	err = dst_cache_init(&pctx->dst_cache, GFP_KERNEL);
+   if (err) {
+   kfree(pctx);
+   return err;
+   }
+
sock_hold(sk);
pctx->sk = sk;
pctx->dev = gtp->dev;
-- 
2.11.0



[PATCH net-next 13/14] gtp: Support for GRO

2017-09-18 Thread Tom Herbert
Populate GRO receive and GRO complete functions for GTP-U v0 and v1.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c | 204 ++
 1 file changed, 204 insertions(+)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index b53946f8b10b..2f9d810cf19f 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -429,6 +430,205 @@ static int gtp1u_udp_encap_recv(struct sock *sk, struct 
sk_buff *skb)
return 1;
 }
 
+static struct sk_buff **gtp_gro_receive_finish(struct sock *sk,
+  struct sk_buff **head,
+  struct sk_buff *skb,
+  void *hdr, size_t hdrlen)
+{
+   const struct packet_offload *ptype;
+   struct sk_buff **pp;
+   __be16 type;
+
+   type = ipver_to_eth((struct iphdr *)((void *)hdr + hdrlen));
+   if (!type)
+   goto out_err;
+
+   rcu_read_lock();
+
+   ptype = gro_find_receive_by_type(type);
+   if (!ptype)
+   goto out_unlock_err;
+
+   skb_gro_pull(skb, hdrlen);
+   skb_gro_postpull_rcsum(skb, hdr, hdrlen);
+   pp = call_gro_receive(ptype->callbacks.gro_receive, head, skb);
+
+   rcu_read_unlock();
+
+   return pp;
+
+out_unlock_err:
+   rcu_read_unlock();
+out_err:
+   NAPI_GRO_CB(skb)->flush |= 1;
+   return NULL;
+}
+
+static struct sk_buff **gtp0_gro_receive(struct sock *sk,
+struct sk_buff **head,
+struct sk_buff *skb)
+{
+   struct gtp0_header *gtp0;
+   size_t len, hdrlen, off;
+   struct sk_buff *p;
+
+   off = skb_gro_offset(skb);
+   len = off + sizeof(*gtp0);
+   hdrlen = sizeof(*gtp0);
+
+   gtp0 = skb_gro_header_fast(skb, off);
+   if (skb_gro_header_hard(skb, len)) {
+   gtp0 = skb_gro_header_slow(skb, len, off);
+   if (unlikely(!gtp0))
+   goto out;
+   }
+
+   if ((gtp0->flags >> 5) != GTP_V0 || gtp0->type != GTP_TPDU)
+   goto out;
+
+   hdrlen += sizeof(*gtp0);
+
+   /* To get IP version */
+   len += sizeof(struct iphdr);
+
+   /* Now get header with GTP header an IPv4 header (for version) */
+   if (skb_gro_header_hard(skb, len)) {
+   gtp0 = skb_gro_header_slow(skb, len, off);
+   if (unlikely(!gtp0))
+   goto out;
+   }
+
+   for (p = *head; p; p = p->next) {
+   const struct gtp0_header *gtp0_t;
+
+   if (!NAPI_GRO_CB(p)->same_flow)
+   continue;
+
+   gtp0_t = (struct gtp0_header *)(p->data + off);
+
+   if (gtp0->flags != gtp0_t->flags ||
+   gtp0->type != gtp0_t->type ||
+   gtp0->flow != gtp0_t->flow ||
+   gtp0->tid != gtp0_t->tid) {
+   NAPI_GRO_CB(p)->same_flow = 0;
+   continue;
+   }
+   }
+
+   return gtp_gro_receive_finish(sk, head, skb, gtp0, hdrlen);
+
+out:
+   NAPI_GRO_CB(skb)->flush |= 1;
+
+   return NULL;
+}
+
+static struct sk_buff **gtp1u_gro_receive(struct sock *sk,
+ struct sk_buff **head,
+ struct sk_buff *skb)
+{
+   struct gtp1_header *gtp1;
+   size_t len, hdrlen, off;
+   struct sk_buff *p;
+
+   off = skb_gro_offset(skb);
+   len = off + sizeof(*gtp1);
+   hdrlen = sizeof(*gtp1);
+
+   gtp1 = skb_gro_header_fast(skb, off);
+   if (skb_gro_header_hard(skb, len)) {
+   gtp1 = skb_gro_header_slow(skb, len, off);
+   if (unlikely(!gtp1))
+   goto out;
+   }
+
+   if ((gtp1->flags >> 5) != GTP_V1 || gtp1->type != GTP_TPDU)
+   goto out;
+
+   if (gtp1->flags & GTP1_F_MASK) {
+   hdrlen += 4;
+   len += 4;
+   }
+
+   len += sizeof(struct iphdr);
+
+   /* Now get header with GTP header an IPv4 header (for version) */
+   if (skb_gro_header_hard(skb, len)) {
+   gtp1 = skb_gro_header_slow(skb, len, off);
+   if (unlikely(!gtp1))
+   goto out;
+   }
+
+   for (p = *head; p; p = p->next) {
+   const struct gtp1_header *gtp1_t;
+
+   if (!NAPI_GRO_CB(p)->same_flow)
+   continue;
+
+   gtp1_t = (struct gtp1_header *)(p->data + off);
+
+   if (gtp1->flags != gtp1_t->flags ||
+   gtp1->type != gtp1_t->type ||
+   gtp1->tid != gtp1_t->tid) {
+   NAPI_GRO_CB(p)->same_flow = 0;
+   continue;
+   }
+   }
+
+   return 

[PATCH net-next 08/14] gtp: Support encpasulating over IPv6

2017-09-18 Thread Tom Herbert
Allow peers to be specified by IPv6 addresses.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c| 198 +--
 include/uapi/linux/gtp.h |   1 +
 include/uapi/linux/if_link.h |   3 +
 3 files changed, 158 insertions(+), 44 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 62c0c968efa6..121b41e7a901 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -61,7 +62,11 @@ struct pdp_ctx {
struct in6_addr ms_addr_ip6;
};
 
-   struct in_addr  peer_addr_ip4;
+   u16 peer_af;
+   union {
+   struct in_addr  peer_addr_ip4;
+   struct in6_addr peer_addr_ip6;
+   };
 
struct sock *sk;
struct net_device   *dev;
@@ -76,6 +81,8 @@ struct pdp_ctx {
 struct gtp_dev {
struct list_headlist;
 
+   unsigned intis_ipv6:1;
+
struct sock *sk0;
struct sock *sk1u;
 
@@ -515,8 +522,6 @@ static int gtp_xmit(struct sk_buff *skb, struct net_device 
*dev,
struct pdp_ctx *pctx)
 {
struct sock *sk = pctx->sk;
-   __be32 saddr = inet_sk(sk)->inet_saddr;
-   struct rtable *rt;
int err = 0;
 
/* Ensure there is sufficient headroom. */
@@ -526,28 +531,63 @@ static int gtp_xmit(struct sk_buff *skb, struct 
net_device *dev,
 
skb_reset_inner_headers(skb);
 
-   rt = ip_tunnel_get_route(dev, skb, sk->sk_protocol,
-sk->sk_bound_dev_if, RT_CONN_FLAGS(sk),
-				 pctx->peer_addr_ip4.s_addr, &saddr,
-				 pctx->gtp_port, pctx->gtp_port,
-				 &pctx->dst_cache, NULL);
+   if (pctx->peer_af == AF_INET) {
+   __be32 saddr = inet_sk(sk)->inet_saddr;
+   struct rtable *rt;
 
-   if (IS_ERR(rt)) {
-   err = PTR_ERR(rt);
-   goto out_err;
-   }
+   rt = ip_tunnel_get_route(dev, skb, sk->sk_protocol,
+sk->sk_bound_dev_if, RT_CONN_FLAGS(sk),
+					 pctx->peer_addr_ip4.s_addr, &saddr,
+					 pctx->gtp_port, pctx->gtp_port,
+					 &pctx->dst_cache, NULL);
+
+   if (IS_ERR(rt)) {
+   err = PTR_ERR(rt);
+   goto out_err;
+   }
+
+   skb_dst_drop(skb);
+
+   gtp_push_header(skb, pctx);
+   udp_tunnel_xmit_skb(rt, sk, skb, saddr,
+   pctx->peer_addr_ip4.s_addr,
+   0, ip4_dst_hoplimit(>dst), 0,
+   pctx->gtp_port, pctx->gtp_port,
+   false, false);
 
-   skb_dst_drop(skb);
+   netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
+			   &saddr, &pctx->peer_addr_ip4.s_addr);
 
-   gtp_push_header(skb, pctx);
-   udp_tunnel_xmit_skb(rt, sk, skb, saddr,
-   pctx->peer_addr_ip4.s_addr,
-   0, ip4_dst_hoplimit(>dst), 0,
-   pctx->gtp_port, pctx->gtp_port,
-   false, false);
+#if IS_ENABLED(CONFIG_IPV6)
+   } else if (pctx->peer_af == AF_INET6) {
+   struct in6_addr saddr = inet6_sk(sk)->saddr;
+   struct dst_entry *dst;
 
-   netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
-  , >peer_addr_ip4.s_addr);
+   dst = ip6_tnl_get_route(dev, skb, sk, sk->sk_protocol,
+   sk->sk_bound_dev_if, 0,
+					0, &pctx->peer_addr_ip6, &saddr,
+					pctx->gtp_port, pctx->gtp_port,
+					&pctx->dst_cache, NULL);
+
+   if (IS_ERR(dst)) {
+   err = PTR_ERR(dst);
+   goto out_err;
+   }
+
+   skb_dst_drop(skb);
+
+   gtp_push_header(skb, pctx);
+   udp_tunnel6_xmit_skb(dst, sk, skb, dev,
+				     &saddr, &pctx->peer_addr_ip6,
+0, ip6_dst_hoplimit(dst), 0,
+pctx->gtp_port, pctx->gtp_port,
+true);
+
+   netdev_dbg(dev, "gtp -> IP src: %pI6 dst: %pI6\n",
+			   &saddr, &pctx->peer_addr_ip6);
+
+#endif
+   }
 
return 0;
 
@@ -652,7 +692,8 @@ static void gtp_link_setup(struct net_device *dev)
 
/* Assume largest header, ie. GTPv0. */
dev->needed_headroom= LL_MAX_HEADER +
- sizeof(struct iphdr) +
+  

[PATCH net-next 14/14] gtp: GSO support

2017-09-18 Thread Tom Herbert
We need to define a gtp_gso_segment since the GTP header includes a length
field that must be set per packet. Also, the GTPv0 header includes a
sequence number that is incremented per packet.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c| 176 +++
 include/uapi/linux/if_link.h |   1 -
 2 files changed, 163 insertions(+), 14 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 2f9d810cf19f..a2c4d9804a8f 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -120,6 +120,8 @@ static u32 gtp_h_initval;
 
 static void pdp_context_delete(struct pdp_ctx *pctx);
 
+static int gtp_gso_type;
+
 static inline u32 gtp0_hashfn(u64 tid)
 {
	u32 *tid32 = (u32 *) &tid;
@@ -430,6 +432,69 @@ static int gtp1u_udp_encap_recv(struct sock *sk, struct 
sk_buff *skb)
return 1;
 }
 
+static struct sk_buff *gtp_gso_segment(struct sk_buff *skb,
+  netdev_features_t features)
+{
+   struct sk_buff *segs = ERR_PTR(-EINVAL);
+   int tnl_hlen = skb->mac_len;
+   struct gtp0_header *gtp0;
+
+   if (unlikely(!pskb_may_pull(skb, tnl_hlen)))
+   return ERR_PTR(-EINVAL);
+
+	/* Make sure we have a minimal GTP header */
+   if (unlikely(tnl_hlen < min_t(size_t, sizeof(struct gtp0_header),
+ sizeof(struct gtp1_header
+   return ERR_PTR(-EINVAL);
+
+   /* Determine version */
+   gtp0 = (struct gtp0_header *)skb->data;
+   switch (gtp0->flags >> 5) {
+   case GTP_V0: {
+   u16 tx_seq;
+
+   if (unlikely(tnl_hlen != sizeof(struct gtp0_header)))
+   return ERR_PTR(-EINVAL);
+
+   tx_seq = ntohs(gtp0->seq);
+
+   /* segment inner packet. */
+   segs = skb_mac_gso_segment(skb, features);
+   if (!IS_ERR_OR_NULL(segs)) {
+   skb = segs;
+   do {
+   gtp0 = (struct gtp0_header *)
+   skb_mac_header(skb);
+   gtp0->length = ntohs(skb->len - tnl_hlen);
+   gtp0->seq = htons(tx_seq);
+   tx_seq++;
+   } while ((skb = skb->next));
+   }
+   break;
+   }
+   case GTP_V1: {
+   struct gtp1_header *gtp1;
+
+   if (unlikely(tnl_hlen != sizeof(struct gtp1_header)))
+   return ERR_PTR(-EINVAL);
+
+   /* segment inner packet. */
+   segs = skb_mac_gso_segment(skb, features);
+   if (!IS_ERR_OR_NULL(segs)) {
+   skb = segs;
+   do {
+   gtp1 = (struct gtp1_header *)
+   skb_mac_header(skb);
+   gtp1->length = ntohs(skb->len - tnl_hlen);
+   } while ((skb = skb->next));
+   }
+   break;
+   }
+   }
+
+   return segs;
+}
+
 static struct sk_buff **gtp_gro_receive_finish(struct sock *sk,
   struct sk_buff **head,
   struct sk_buff *skb,
@@ -688,18 +753,25 @@ static inline void gtp0_push_header(struct sk_buff *skb, 
struct pdp_ctx *pctx)
 {
int payload_len = skb->len;
struct gtp0_header *gtp0;
+   u32 tx_seq;
 
gtp0 = skb_push(skb, sizeof(*gtp0));
 
gtp0->flags = 0x1e; /* v0, GTP-non-prime. */
gtp0->type  = GTP_TPDU;
gtp0->length= htons(payload_len);
-	gtp0->seq	= htons((atomic_inc_return(&pctx->tx_seq) - 1) %
-			      0xffff);
gtp0->flow  = htons(pctx->u.v0.flow);
gtp0->number= 0xff;
gtp0->spare[0]  = gtp0->spare[1] = gtp0->spare[2] = 0xff;
gtp0->tid   = cpu_to_be64(pctx->u.v0.tid);
+
+   /* If skb is GSO allocate sequence numbers for all the segments */
+   tx_seq = skb_shinfo(skb)->gso_segs ?
+   atomic_add_return(skb_shinfo(skb)->gso_segs,
+				  &pctx->tx_seq) :
+		atomic_inc_return(&pctx->tx_seq);
+
+	gtp0->seq	= (htons((u16)tx_seq) - 1) & 0xffff;
 }
 
 static inline void gtp1_push_header(struct sk_buff *skb, struct pdp_ctx *pctx)
@@ -737,6 +809,59 @@ static void gtp_push_header(struct sk_buff *skb, struct 
pdp_ctx *pctx)
}
 }
 
+static size_t gtp_max_header_len(int version)
+
+{
+   switch (version) {
+   case GTP_V0:
+   return sizeof(struct gtp0_header);
+   case GTP_V1:
+   return sizeof(struct gtp1_header) + 4;
+   }
+
+   /* Should not happen */
+   return 0;
+}
+
+static int gtp_build_skb(struct sk_buff *skb, struct dst_entry *dst,
+

[PATCH net-next 10/14] gtp: Add support for devnet

2017-09-18 Thread Tom Herbert
Add a net field to gtp that is derived from src_net. Use net_eq to compute
the cross-net (xnet) argument for the transmit functions.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 1870469a4982..393f63cb2576 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -89,6 +89,7 @@ struct gtp_dev {
struct socket   *sock0;
struct socket   *sock1u;
 
+   struct net  *net;
struct net_device   *dev;
 
unsigned introle;
@@ -271,6 +272,7 @@ static u16 ipver_to_eth(struct iphdr *iph)
 static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
  unsigned int hdrlen, unsigned int role)
 {
+   struct gtp_dev *gtp = netdev_priv(pctx->dev);
struct pcpu_sw_netstats *stats;
u16 inner_protocol;
 
@@ -285,8 +287,7 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
 
/* Get rid of the GTP + UDP headers. */
if (iptunnel_pull_header(skb, hdrlen, inner_protocol,
-!net_eq(sock_net(pctx->sk),
-dev_net(pctx->dev
+!net_eq(gtp->net, dev_net(pctx->dev
return -1;
 
netdev_dbg(pctx->dev, "forwarding packet from GGSN to uplink\n");
@@ -532,6 +533,8 @@ static void gtp_push_header(struct sk_buff *skb, struct 
pdp_ctx *pctx)
 static int gtp_xmit(struct sk_buff *skb, struct net_device *dev,
struct pdp_ctx *pctx)
 {
+   struct gtp_dev *gtp = netdev_priv(dev);
+   bool xnet = !net_eq(gtp->net, dev_net(gtp->dev));
struct sock *sk = pctx->sk;
int err = 0;
 
@@ -564,7 +567,7 @@ static int gtp_xmit(struct sk_buff *skb, struct net_device 
*dev,
pctx->peer_addr_ip4.s_addr,
0, ip4_dst_hoplimit(>dst), 0,
pctx->gtp_port, pctx->gtp_port,
-   false, false);
+   xnet, false);
 
netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
   , >peer_addr_ip4.s_addr);
@@ -782,6 +785,7 @@ static int gtp_newlink(struct net *src_net, struct 
net_device *dev,
 
gtp->role = role;
gtp->is_ipv6 = is_ipv6;
+   gtp->net = src_net;
 
gn = net_generic(dev_net(dev), gtp_net_id);
	list_add_rcu(&gtp->list, &gn->gtp_dev_list);
-- 
2.11.0



[PATCH net-next 12/14] gtp: Configuration for zero UDP checksum

2017-09-18 Thread Tom Herbert
Add configuration to control use of zero checksums on transmit for both
IPv4 and IPv6, and control over accepting zero IPv6 checksums on
receive.
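
On the netlink side these are plain u8 flags, so enabling them from a
configuration tool would look something like this (sketch using libmnl,
with nlh being the message under construction):

	/* 0 here requests zero UDP checksums on IPv4 transmit */
	mnl_attr_put_u8(nlh, IFLA_GTP_UDP_CSUM, 0);
	mnl_attr_put_u8(nlh, IFLA_GTP_UDP_ZERO_CSUM6_TX, 1);
	mnl_attr_put_u8(nlh, IFLA_GTP_UDP_ZERO_CSUM6_RX, 1);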

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c| 35 +--
 include/uapi/linux/if_link.h |  4 
 2 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 393f63cb2576..b53946f8b10b 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -75,6 +75,13 @@ struct pdp_ctx {
struct rcu_head rcu_head;
 
struct dst_cachedst_cache;
+
+   unsigned intcfg_flags;
+
+#define GTP_F_UDP_ZERO_CSUM_TX 0x1
+#define GTP_F_UDP_ZERO_CSUM6_TX0x2
+#define GTP_F_UDP_ZERO_CSUM6_RX0x4
+
 };
 
 /* One instance of the GTP device. */
@@ -536,6 +543,7 @@ static int gtp_xmit(struct sk_buff *skb, struct net_device 
*dev,
struct gtp_dev *gtp = netdev_priv(dev);
bool xnet = !net_eq(gtp->net, dev_net(gtp->dev));
struct sock *sk = pctx->sk;
+   bool udp_csum;
int err = 0;
 
/* Ensure there is sufficient headroom. */
@@ -563,11 +571,12 @@ static int gtp_xmit(struct sk_buff *skb, struct 
net_device *dev,
skb_dst_drop(skb);
 
gtp_push_header(skb, pctx);
+   udp_csum = !(pctx->cfg_flags & GTP_F_UDP_ZERO_CSUM_TX);
udp_tunnel_xmit_skb(rt, sk, skb, saddr,
pctx->peer_addr_ip4.s_addr,
0, ip4_dst_hoplimit(>dst), 0,
pctx->gtp_port, pctx->gtp_port,
-   xnet, false);
+   xnet, !udp_csum);
 
netdev_dbg(dev, "gtp -> IP src: %pI4 dst: %pI4\n",
		   &saddr, &pctx->peer_addr_ip4.s_addr);
@@ -591,11 +600,12 @@ static int gtp_xmit(struct sk_buff *skb, struct 
net_device *dev,
skb_dst_drop(skb);
 
gtp_push_header(skb, pctx);
+   udp_csum = !(pctx->cfg_flags & GTP_F_UDP_ZERO_CSUM6_TX);
udp_tunnel6_xmit_skb(dst, sk, skb, dev,
			     &saddr, &pctx->peer_addr_ip6,
 0, ip6_dst_hoplimit(dst), 0,
 pctx->gtp_port, pctx->gtp_port,
-true);
+!udp_csum);
 
netdev_dbg(dev, "gtp -> IP src: %pI6 dst: %pI6\n",
		   &saddr, &pctx->peer_addr_ip6);
@@ -728,6 +738,7 @@ static int gtp_newlink(struct net *src_net, struct 
net_device *dev,
 {
unsigned int role = GTP_ROLE_GGSN;
bool have_fd, have_ports;
+   unsigned int flags = 0;
bool is_ipv6 = false;
struct gtp_dev *gtp;
struct gtp_net *gn;
@@ -747,6 +758,21 @@ static int gtp_newlink(struct net *src_net, struct 
net_device *dev,
return -EINVAL;
}
 
+   if (data[IFLA_GTP_UDP_CSUM]) {
+   if (!nla_get_u8(data[IFLA_GTP_UDP_CSUM]))
+   flags |= GTP_F_UDP_ZERO_CSUM_TX;
+   }
+
+   if (data[IFLA_GTP_UDP_ZERO_CSUM6_TX]) {
+   if (nla_get_u8(data[IFLA_GTP_UDP_ZERO_CSUM6_TX]))
+   flags |= GTP_F_UDP_ZERO_CSUM6_TX;
+   }
+
+   if (data[IFLA_GTP_UDP_ZERO_CSUM6_RX]) {
+   if (nla_get_u8(data[IFLA_GTP_UDP_ZERO_CSUM6_RX]))
+   flags |= GTP_F_UDP_ZERO_CSUM6_RX;
+   }
+
if (data[IFLA_GTP_AF]) {
u16 af = nla_get_u16(data[IFLA_GTP_AF]);
 
@@ -819,6 +845,9 @@ static const struct nla_policy gtp_policy[IFLA_GTP_MAX + 1] 
= {
[IFLA_GTP_ROLE] = { .type = NLA_U32 },
[IFLA_GTP_PORT0]= { .type = NLA_U16 },
[IFLA_GTP_PORT1]= { .type = NLA_U16 },
+   [IFLA_GTP_UDP_CSUM] = { .type = NLA_U8 },
+   [IFLA_GTP_UDP_ZERO_CSUM6_TX]= { .type = NLA_U8 },
+   [IFLA_GTP_UDP_ZERO_CSUM6_RX]= { .type = NLA_U8 },
 };
 
 static int gtp_validate(struct nlattr *tb[], struct nlattr *data[],
@@ -990,6 +1019,8 @@ static struct socket *gtp_create_sock(struct net *net, 
bool ipv6,
 
if (ipv6) {
udp_conf.family = AF_INET6;
+   udp_conf.use_udp6_rx_checksums =
+   !(flags & GTP_F_UDP_ZERO_CSUM6_RX);
udp_conf.ipv6_v6only = 1;
} else {
udp_conf.family = AF_INET;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 81c26864abeb..14a32d745e24 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -555,6 +555,10 @@ enum {
IFLA_GTP_AF,
IFLA_GTP_PORT0,
IFLA_GTP_PORT1,
+   IFLA_GTP_UDP_CSUM,
+   IFLA_GTP_UDP_ZERO_CSUM6_TX,
+   IFLA_GTP_UDP_ZERO_CSUM6_RX,
+
__IFLA_GTP_MAX,
 };
 #define IFLA_GTP_MAX 

[PATCH net-next 02/14] vxlan: Call common functions to get tunnel routes

2017-09-18 Thread Tom Herbert
Call ip_tunnel_get_route and ip6_tnl_get_route to handle getting a route
and dealing with the dst_cache.

Signed-off-by: Tom Herbert 
---
 drivers/net/vxlan.c | 84 -
 1 file changed, 5 insertions(+), 79 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index d7c49cf1d5e9..810caa9adf37 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -1867,47 +1867,11 @@ static struct rtable *vxlan_get_route(struct vxlan_dev 
*vxlan, struct net_device
  struct dst_cache *dst_cache,
  const struct ip_tunnel_info *info)
 {
-   bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
-   struct rtable *rt = NULL;
-   struct flowi4 fl4;
-
if (!sock4)
return ERR_PTR(-EIO);
 
-   if (tos && !info)
-   use_cache = false;
-   if (use_cache) {
-   rt = dst_cache_get_ip4(dst_cache, saddr);
-   if (rt)
-   return rt;
-   }
-
-	memset(&fl4, 0, sizeof(fl4));
-   fl4.flowi4_oif = oif;
-   fl4.flowi4_tos = RT_TOS(tos);
-   fl4.flowi4_mark = skb->mark;
-   fl4.flowi4_proto = IPPROTO_UDP;
-   fl4.daddr = daddr;
-   fl4.saddr = *saddr;
-   fl4.fl4_dport = dport;
-   fl4.fl4_sport = sport;
-
-	rt = ip_route_output_key(vxlan->net, &fl4);
-	if (likely(!IS_ERR(rt))) {
-		if (rt->dst.dev == dev) {
-			netdev_dbg(dev, "circular route to %pI4\n", &daddr);
-   ip_rt_put(rt);
-   return ERR_PTR(-ELOOP);
-   }
-
-   *saddr = fl4.saddr;
-   if (use_cache)
-			dst_cache_set_ip4(dst_cache, &rt->dst, fl4.saddr);
-	} else {
-		netdev_dbg(dev, "no route to %pI4\n", &daddr);
-   return ERR_PTR(-ENETUNREACH);
-   }
-   return rt;
+   return ip_tunnel_get_route(dev, skb, IPPROTO_UDP, oif, tos, daddr,
+  saddr, dport, sport, dst_cache, info);
 }
 
 #if IS_ENABLED(CONFIG_IPV6)
@@ -1922,50 +1886,12 @@ static struct dst_entry *vxlan6_get_route(struct 
vxlan_dev *vxlan,
  struct dst_cache *dst_cache,
  const struct ip_tunnel_info *info)
 {
-   bool use_cache = ip_tunnel_dst_cache_usable(skb, info);
-   struct dst_entry *ndst;
-   struct flowi6 fl6;
-   int err;
-
if (!sock6)
return ERR_PTR(-EIO);
 
-   if (tos && !info)
-   use_cache = false;
-   if (use_cache) {
-   ndst = dst_cache_get_ip6(dst_cache, saddr);
-   if (ndst)
-   return ndst;
-   }
-
-	memset(&fl6, 0, sizeof(fl6));
-   fl6.flowi6_oif = oif;
-   fl6.daddr = *daddr;
-   fl6.saddr = *saddr;
-   fl6.flowlabel = ip6_make_flowinfo(RT_TOS(tos), label);
-   fl6.flowi6_mark = skb->mark;
-   fl6.flowi6_proto = IPPROTO_UDP;
-   fl6.fl6_dport = dport;
-   fl6.fl6_sport = sport;
-
-   err = ipv6_stub->ipv6_dst_lookup(vxlan->net,
-sock6->sock->sk,
-					 &ndst, &fl6);
-   if (unlikely(err < 0)) {
-   netdev_dbg(dev, "no route to %pI6\n", daddr);
-   return ERR_PTR(-ENETUNREACH);
-   }
-
-   if (unlikely(ndst->dev == dev)) {
-   netdev_dbg(dev, "circular route to %pI6\n", daddr);
-   dst_release(ndst);
-   return ERR_PTR(-ELOOP);
-   }
-
-   *saddr = fl6.saddr;
-   if (use_cache)
-   dst_cache_set_ip6(dst_cache, ndst, saddr);
-   return ndst;
+   return ip6_tnl_get_route(dev, skb, sock6->sock->sk, IPPROTO_UDP, oif,
+  tos, label, daddr, saddr, dport, sport,
+  dst_cache, info);
 }
 #endif
 
-- 
2.11.0



[PATCH net-next 05/14] gtp: Remove special mtu handling

2017-09-18 Thread Tom Herbert
Remove the MTU handling in gtp_build_skb_ip4. This is non-standard
relative to how other tunneling protocols handle MTU. The model espoused
is that the inner interface should set its MTU to be less than the
expected path MTU on the overlay network. Path MTU discovery is not
typically used for modifying tunnel MTUs.
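
For example, with a 1500 byte path MTU on the underlay and GTPv1 over
IPv4, the device (the name gtp0 here is just an example) would be sized
along the lines of:

	# 1500 - 20 (IPv4) - 8 (UDP) - 8 (GTPv1) = 1464
	ip link set dev gtp0 mtu 1464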

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c | 30 --
 1 file changed, 30 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 1de2ea6217ea..f2089fa4f004 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -466,8 +466,6 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct 
net_device *dev,
struct iphdr *iph;
struct sock *sk;
__be32 saddr;
-   __be16 df;
-   int mtu;
 
/* Read the IP destination address and resolve the PDP context.
 * Prepend PDP header with TEI/TID from PDP ctx.
@@ -510,34 +508,6 @@ static int gtp_build_skb_ip4(struct sk_buff *skb, struct 
net_device *dev,
 
skb_dst_drop(skb);
 
-   /* This is similar to tnl_update_pmtu(). */
-   df = iph->frag_off;
-   if (df) {
-   mtu = dst_mtu(&rt->dst) - dev->hard_header_len -
-   sizeof(struct iphdr) - sizeof(struct udphdr);
-   switch (pctx->gtp_version) {
-   case GTP_V0:
-   mtu -= sizeof(struct gtp0_header);
-   break;
-   case GTP_V1:
-   mtu -= sizeof(struct gtp1_header);
-   break;
-   }
-   } else {
-   mtu = dst_mtu(&rt->dst);
-   }
-
-   rt->dst.ops->update_pmtu(&rt->dst, NULL, skb, mtu);
-
-   if (!skb_is_gso(skb) && (iph->frag_off & htons(IP_DF)) &&
-   mtu < ntohs(iph->tot_len)) {
-   netdev_dbg(dev, "packet too big, fragmentation needed\n");
-   memset(IPCB(skb), 0, sizeof(*IPCB(skb)));
-   icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
- htonl(mtu));
-   goto err_rt;
-   }
-
gtp_set_pktinfo_ipv4(pktinfo, sk, iph, pctx, rt, , dev);
gtp_push_header(skb, pktinfo);
 
-- 
2.11.0



[PATCH net-next 11/14] net: Add a facility to support application defined GSO

2017-09-18 Thread Tom Herbert
Allow applications or encapsulation protocols to register a GSO segment
function for their specific protocol. To facilitate this I reserved the
upper four bits in the gso_type to indicate the application specific GSO
type. Zero in these bits indicates no application GSO, so there are
fifteen instances that can be defined.

An application registers a gso_segment using skb_gso_app_register;
this takes a struct skb_gso_app that indicates a callback function as
well as a set of GSO types of which at least one must be matched before
calling the segment function. Registration returns one of the
application GSO types described above (not a fixed value per
application). Subsequently, when the application sends a GSO packet the
application gso_type is set in the skb gso_type along with any other
types.

skb_gso_app_segment is the function called from another GSO segment
function to handle segmentation of the application or encapsulation
protocol. This function includes check flags that provide context for
matching the appropriate GSO instance. For instance, in order to handle
a protocol encapsulated in UDP (GTP for instance), skb_gso_app_segment
is called from udp_tunnel_segment and the check flags would be
SKB_GSO_UDP_TUNNEL_CSUM | SKB_GSO_UDP_TUNNEL.
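
For illustration, registration would look roughly like this (a sketch
only; the GTP patch later in the series is the real user):

    static struct sk_buff *gtp_gso_segment(struct sk_buff *skb,
                                           netdev_features_t features);

    static const struct skb_gso_app gtp_gso_app = {
            .check_flags = SKB_GSO_UDP_TUNNEL | SKB_GSO_UDP_TUNNEL_CSUM,
            .gso_segment = gtp_gso_segment,
    };

    static int gtp_gso_type;

    static int __init gtp_gso_init(void)
    {
            /* Returns an application gso_type (upper four bits encode
             * the table index) or a negative value if no slot is free.
             */
            gtp_gso_type = skb_gso_app_register(&gtp_gso_app);
            return gtp_gso_type < 0 ? gtp_gso_type : 0;
    }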

Signed-off-by: Tom Herbert 
---
 include/linux/netdevice.h | 31 +++
 include/linux/skbuff.h| 25 +
 net/core/dev.c| 47 +++
 net/ipv4/ip_tunnel_core.c |  6 ++
 net/ipv4/udp_offload.c| 20 +++-
 5 files changed, 124 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f535779d9dc1..f3bed4f8ba83 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3932,6 +3932,37 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
netdev_features_t features);
 
+struct skb_gso_app {
+   unsigned int check_flags;
+   struct sk_buff *(*gso_segment)(struct sk_buff *skb,
+  netdev_features_t features);
+};
+
+extern struct skb_gso_app *skb_gso_apps[];
+int skb_gso_app_register(const struct skb_gso_app *app);
+void skb_gso_app_unregister(int num, const struct skb_gso_app *app);
+
+/* rcu_read_lock() must be held */
+static inline struct skb_gso_app *skb_gso_app_lookup(struct sk_buff *skb,
+netdev_features_t features,
+unsigned int check_flags)
+{
+   struct skb_gso_app *app;
+   int type;
+
+   if (!(skb_shinfo(skb)->gso_type & SKB_GSO_APP_MASK))
+   return NULL;
+
+   type = skb_gso_app_to_index(skb_shinfo(skb)->gso_type);
+
+   app = rcu_dereference(skb_gso_apps[type]);
+   if (app && app->gso_segment &&
+   (check_flags & app->check_flags))
+   return app;
+
+   return NULL;
+}
+
 struct netdev_bonding_info {
ifslave slave;
ifbond  master;
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 72299ef00061..ea45fb93897c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -535,6 +535,9 @@ enum {
SKB_FCLONE_CLONE,   /* companion fclone skb (from fclone_cache) */
 };
 
+#define SKB_GSO_APP_LOW_SHIFT  28
+#define SKB_GSO_APP_HIGH_SHIFT 31
+
 enum {
SKB_GSO_TCPV4 = 1 << 0,
 
@@ -569,8 +572,30 @@ enum {
SKB_GSO_SCTP = 1 << 14,
 
SKB_GSO_ESP = 1 << 15,
+
+   /* UDP encapsulation specific GSO consumes bits 28 through 31 */
+
+   SKB_GSO_APP_LOW = 1 << SKB_GSO_APP_LOW_SHIFT,
+
+   SKB_GSO_APP_HIGH = 1 << SKB_GSO_APP_HIGH_SHIFT,
 };
 
+#define SKB_GSO_APP_MASK ((-1U << SKB_GSO_APP_LOW_SHIFT) & \
+ (-1U >> (8*sizeof(u32) - SKB_GSO_APP_HIGH_SHIFT - 1)))
+#define SKB_GSO_APP_NUM (SKB_GSO_APP_MASK >> SKB_GSO_APP_LOW_SHIFT)
+
+static inline int skb_gso_app_to_index(unsigned int x)
+{
+   /* Caller should check that app bits are non-zero */
+
+   return ((SKB_GSO_APP_MASK & x) >> SKB_GSO_APP_LOW_SHIFT) - 1;
+}
+
+static inline int skb_gso_app_to_gso_type(unsigned int x)
+{
+   return (x + 1) << SKB_GSO_APP_LOW_SHIFT;
+}
+
 #if BITS_PER_LONG > 32
 #define NET_SKBUFF_DATA_USES_OFFSET 1
 #endif
diff --git a/net/core/dev.c b/net/core/dev.c
index fb766d906148..c77fca112e67 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -156,6 +156,7 @@
 
 static DEFINE_SPINLOCK(ptype_lock);
 static DEFINE_SPINLOCK(offload_lock);
+static DEFINE_SPINLOCK(skb_gso_app_lock);
 struct list_head ptype_base[PTYPE_HASH_SIZE] __read_mostly;
 struct list_head ptype_all __read_mostly;  /* Taps */
 static struct list_head offload_base __read_mostly;
@@ -2725,6 +2726,52 @@ struct sk_buff *skb_mac_gso_segment(struct sk_buff *skb,
 }
 EXPORT_SYMBOL(skb_mac_gso_segment);
 
+struct 

[PATCH net-next 00/14] gtp: Additional feature support

2017-09-18 Thread Tom Herbert
This patch set builds upon the initial GTP implementation to make
support closer to that enjoyed by other encapsulation protocols.

The major items are:

  - IPv6 support
  - Configurable networking interfaces so that GTP kernel can be
used and tested without needing GSN network emulation (i.e. no user
space daemon needed).
  - GSO,GRO
  - Control of zero UDP checksums
  - Port numbers are configurable
  - Addition of a dst_cache in the GTP structure and other cleanup

Additionally, this patch set also includes a couple of general support
capabilities:

  - A facility that allows application specific GSO callbacks
  - Common functions to get a route for an IP tunnel

For IPv6 support, the mobile subscriber needs to allow IPv6 addresses,
and the remote endpoint can be IPv6.

For configurable interfaces, configuration is added to allow an
alterate means to configure a GTP and device. This follows the
typical UDP encapsulation model of specifying a listener port for
receive, and a remote address and port for transmit. 

GRO was straightforward to implement following the model of other
UDP encapsulations.

Providing GSO support had one wrinkle-- the GTP header includes a
payload length field that needs to be set per GSO segment. In order
to address that in a general way, I created the concept of
application specific GSO.

To implement application layer GSO I reserved the top four bits of
shinfo(skb)->gso_type. The idea is that an application or encapsulation
protocol (like GTP in this case) can register a GSO segment callback.
The facility returns a gso_type with upper four bits set to a value
(index into a table). When the application sets up a packet it includes
the code in the gso_type for the skb. At some point (e.g. from UDP
segment) the gso_type is checked in the skb and if the application
specific GSO is indicated then the callback is called. The
registered callbacks include a set of other gso_types so that
an application callback can be matched to an appropriate instance.
For instance, the GTP callback checks for the UDP GSO flags.
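
On the send side this boils down to roughly the following, where
gtp_gso_type stands for whatever value registration returned
(illustrative only):

    /* Mark the skb with the application GSO type obtained at
     * registration time, alongside the usual tunnel flags.
     */
    skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM | gtp_gso_type;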

Zero UDP checksum, port number configuration, and dst_cache are
straightforward.

Configuration is performed by iproute2/ip. I will post that
in a subsequent patch set.

Tested:

Configured the matrix of IPv4/IPv6 mobile subscriber, IPv4/IPv6 remote
peer, and GTP version 0 and 1 (eight combinations). Observed
connectivity and proper GSO/GRO. Also, tested VXLAN for
regression.

Tom Herbert (14):
  iptunnel: Add common functions to get a tunnel route
  vxlan: Call common functions to get tunnel routes
  gtp: Call common functions to get tunnel routes and add dst_cache
  gtp: udp recv clean up
  gtp: Remove special mtu handling
  gtp: Eliminate pktinfo and add port configuration
  gtp: Support encapsulation of IPv6 packets
  gtp: Support encapsulating over IPv6
  gtp: Allow configuring GTP interface as standalone
  gtp: Add support for devnet
  net: Add a facility to support application defined GSO
  gtp: Configuration for zero UDP checksum
  gtp: Support for GRO
  gtp: GSO support

 drivers/net/gtp.c| 1300 --
 drivers/net/vxlan.c  |   84 +--
 include/linux/netdevice.h|   31 +
 include/linux/skbuff.h   |   25 +
 include/net/ip6_tunnel.h |   33 ++
 include/net/ip_tunnels.h |   33 ++
 include/uapi/linux/gtp.h |8 +
 include/uapi/linux/if_link.h |6 +
 net/core/dev.c   |   47 ++
 net/ipv4/ip_tunnel.c |   41 ++
 net/ipv4/ip_tunnel_core.c|6 +
 net/ipv4/udp_offload.c   |   20 +-
 net/ipv6/ip6_tunnel.c|   43 ++
 13 files changed, 1306 insertions(+), 371 deletions(-)

-- 
2.11.0



[PATCH net-next 07/14] gtp: Support encapsulation of IPv6 packets

2017-09-18 Thread Tom Herbert
Allow IPv6 mobile subscriber packets. This entails adding an IPv6 mobile
subscriber address to pdp context and IPv6 specific variants to find pdp
contexts by address.

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c| 259 +--
 include/uapi/linux/gtp.h |   1 +
 2 files changed, 209 insertions(+), 51 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index a928279c382c..62c0c968efa6 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -54,9 +54,13 @@ struct pdp_ctx {
} u;
u8  gtp_version;
__be16  gtp_port;
-   u16 af;
 
-   struct in_addr  ms_addr_ip4;
+   u16 ms_af;
+   union {
+   struct in_addr  ms_addr_ip4;
+   struct in6_addr ms_addr_ip6;
+   };
+
struct in_addr  peer_addr_ip4;
 
struct sock *sk;
@@ -80,7 +84,9 @@ struct gtp_dev {
unsigned int    role;
unsigned int    hash_size;
struct hlist_head   *tid_hash;
-   struct hlist_head   *addr_hash;
+
+   struct hlist_head   *addr4_hash;
+   struct hlist_head   *addr6_hash;
 
struct gro_cellsgro_cells;
 };
@@ -98,6 +104,7 @@ static void pdp_context_delete(struct pdp_ctx *pctx);
 static inline u32 gtp0_hashfn(u64 tid)
 {
u32 *tid32 = (u32 *) &tid;
+
return jhash_2words(tid32[0], tid32[1], gtp_h_initval);
 }
 
@@ -111,6 +118,11 @@ static inline u32 ipv4_hashfn(__be32 ip)
return jhash_1word((__force u32)ip, gtp_h_initval);
 }
 
+static inline u32 ipv6_hashfn(const struct in6_addr *a)
+{
+   return __ipv6_addr_jhash(a, gtp_h_initval);
+}
+
 /* Resolve a PDP context structure based on the 64bit TID. */
 static struct pdp_ctx *gtp0_pdp_find(struct gtp_dev *gtp, u64 tid)
 {
@@ -149,10 +161,10 @@ static struct pdp_ctx *ipv4_pdp_find(struct gtp_dev *gtp, 
__be32 ms_addr)
struct hlist_head *head;
struct pdp_ctx *pdp;
 
-   head = &gtp->addr_hash[ipv4_hashfn(ms_addr) % gtp->hash_size];
+   head = &gtp->addr4_hash[ipv4_hashfn(ms_addr) % gtp->hash_size];
 
hlist_for_each_entry_rcu(pdp, head, hlist_addr) {
-   if (pdp->af == AF_INET &&
+   if (pdp->ms_af == AF_INET &&
pdp->ms_addr_ip4.s_addr == ms_addr)
return pdp;
}
@@ -176,32 +188,95 @@ static bool gtp_check_ms_ipv4(struct sk_buff *skb, struct 
pdp_ctx *pctx,
return iph->saddr == pctx->ms_addr_ip4.s_addr;
 }
 
+/* Resolve a PDP context based on IPv6 address of MS. */
+static struct pdp_ctx *ipv6_pdp_find(struct gtp_dev *gtp,
+const struct in6_addr *ms_addr)
+{
+   struct hlist_head *head;
+   struct pdp_ctx *pdp;
+
+   head = &gtp->addr6_hash[ipv6_hashfn(ms_addr) % gtp->hash_size];
+
+   hlist_for_each_entry_rcu(pdp, head, hlist_addr) {
+   if (pdp->ms_af == AF_INET6 &&
+   ipv6_addr_equal(&pdp->ms_addr_ip6, ms_addr))
+   return pdp;
+   }
+
+   return NULL;
+}
+
+static bool gtp_check_ms_ipv6(struct sk_buff *skb, struct pdp_ctx *pctx,
+ unsigned int hdrlen, unsigned int role)
+{
+   struct ipv6hdr *ipv6h;
+
+   if (!pskb_may_pull(skb, hdrlen + sizeof(struct ipv6hdr)))
+   return false;
+
+   ipv6h = (struct ipv6hdr *)(skb->data + hdrlen);
+
+   if (role == GTP_ROLE_SGSN)
+   return ipv6_addr_equal(&ipv6h->daddr, &pctx->ms_addr_ip6);
+   else
+   return ipv6_addr_equal(&ipv6h->saddr, &pctx->ms_addr_ip6);
+}
+
 /* Check if the inner IP address in this packet is assigned to any
  * existing mobile subscriber.
  */
 static bool gtp_check_ms(struct sk_buff *skb, struct pdp_ctx *pctx,
 unsigned int hdrlen, unsigned int role)
 {
-   switch (ntohs(skb->protocol)) {
-   case ETH_P_IP:
+   struct iphdr *iph;
+
+   /* Minimally there needs to be an IPv4 header */
+   if (!pskb_may_pull(skb, hdrlen + sizeof(struct iphdr)))
+   return false;
+
+   iph = (struct iphdr *)(skb->data + hdrlen);
+
+   switch (iph->version) {
+   case 4:
return gtp_check_ms_ipv4(skb, pctx, hdrlen, role);
+   case 6:
+   return gtp_check_ms_ipv6(skb, pctx, hdrlen, role);
}
+
return false;
 }
 
+static u16 ipver_to_eth(struct iphdr *iph)
+{
+   switch (iph->version) {
+   case 4:
+   return htons(ETH_P_IP);
+   case 6:
+   return htons(ETH_P_IPV6);
+   default:
+   return 0;
+   }
+}
+
 static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff *skb,
-   unsigned int hdrlen, unsigned int role)
+ unsigned int hdrlen, unsigned int role)
 {
struct pcpu_sw_netstats *stats;
+   u16 inner_protocol;
 

[PATCH net-next 04/14] gtp: udp recv clean up

2017-09-18 Thread Tom Herbert
Create separate UDP receive functions for GTP version 0 and version 1.
Set encap_rcv appropriately when configuring a socket. Also, convert to
using gro_cells.
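
For reference, wiring the handlers up would look roughly like the
following, using the standard udp_tunnel socket API (a sketch; the exact
call site in this patch may differ):

    struct udp_tunnel_sock_cfg tuncfg = {
            .sk_user_data = gtp,
            .encap_type = UDP_ENCAP_GTP1U,          /* or UDP_ENCAP_GTP0 */
            .encap_rcv = gtp1u_udp_encap_recv,      /* or gtp0_udp_encap_recv */
            .encap_destroy = gtp_encap_destroy,
    };

    setup_udp_tunnel_sock(net, sock, &tuncfg);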

Signed-off-by: Tom Herbert 
---
 drivers/net/gtp.c | 130 +-
 1 file changed, 71 insertions(+), 59 deletions(-)

diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c
index 95df3bcebbb2..1de2ea6217ea 100644
--- a/drivers/net/gtp.c
+++ b/drivers/net/gtp.c
@@ -80,6 +80,8 @@ struct gtp_dev {
unsigned int    hash_size;
struct hlist_head   *tid_hash;
struct hlist_head   *addr_hash;
+
+   struct gro_cells    gro_cells;
 };
 
 static unsigned int gtp_net_id __read_mostly;
@@ -217,55 +219,83 @@ static int gtp_rx(struct pdp_ctx *pctx, struct sk_buff 
*skb,
stats->rx_bytes += skb->len;
u64_stats_update_end(&stats->syncp);
 
-   netif_rx(skb);
+   gro_cells_receive(&gtp->gro_cells, skb);
+
return 0;
 }
 
-/* 1 means pass up to the stack, -1 means drop and 0 means decapsulated. */
-static int gtp0_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
+/* UDP encapsulation receive handler for GTPv0-U. See net/ipv4/udp.c.
+ * Return codes: 0: success, <0: error, >0: pass up to userspace UDP socket.
+ */
+static int gtp0_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
+   struct gtp_dev *gtp = rcu_dereference_sk_user_data(sk);
unsigned int hdrlen = sizeof(struct udphdr) +
  sizeof(struct gtp0_header);
struct gtp0_header *gtp0;
struct pdp_ctx *pctx;
 
+   if (!gtp)
+   goto pass;
+
if (!pskb_may_pull(skb, hdrlen))
-   return -1;
+   goto drop;
 
gtp0 = (struct gtp0_header *)(skb->data + sizeof(struct udphdr));
 
if ((gtp0->flags >> 5) != GTP_V0)
-   return 1;
+   goto pass;
 
if (gtp0->type != GTP_TPDU)
-   return 1;
+   goto pass;
+
+   netdev_dbg(gtp->dev, "received GTP0 packet\n");
 
pctx = gtp0_pdp_find(gtp, be64_to_cpu(gtp0->tid));
if (!pctx) {
netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
-   return 1;
+   goto pass;
+   }
+
+   if (!gtp_rx(pctx, skb, hdrlen, gtp->role)) {
+   /* Successfully received */
+   return 0;
}
 
-   return gtp_rx(pctx, skb, hdrlen, gtp->role);
+drop:
+   kfree_skb(skb);
+   return 0;
+
+pass:
+   return 1;
 }
 
-static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, struct sk_buff *skb)
+/* UDP encapsulation receive handler for GTPv1-U. See net/ipv4/udp.c.
+ * Return codes: 0: success, <0: error, >0: pass up to userspace UDP socket.
+ */
+static int gtp1u_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
+   struct gtp_dev *gtp = rcu_dereference_sk_user_data(sk);
unsigned int hdrlen = sizeof(struct udphdr) +
  sizeof(struct gtp1_header);
struct gtp1_header *gtp1;
struct pdp_ctx *pctx;
 
+   if (!gtp)
+   goto pass;
+
if (!pskb_may_pull(skb, hdrlen))
-   return -1;
+   goto drop;
 
gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
 
if ((gtp1->flags >> 5) != GTP_V1)
-   return 1;
+   goto pass;
 
if (gtp1->type != GTP_TPDU)
-   return 1;
+   goto pass;
+
+   netdev_dbg(gtp->dev, "received GTP1 packet\n");
 
/* From 29.060: "This field shall be present if and only if any one or
 * more of the S, PN and E flags are set.".
@@ -278,17 +308,27 @@ static int gtp1u_udp_encap_recv(struct gtp_dev *gtp, 
struct sk_buff *skb)
 
/* Make sure the header is large enough, including extensions. */
if (!pskb_may_pull(skb, hdrlen))
-   return -1;
+   goto drop;
 
gtp1 = (struct gtp1_header *)(skb->data + sizeof(struct udphdr));
 
pctx = gtp1_pdp_find(gtp, ntohl(gtp1->tid));
if (!pctx) {
netdev_dbg(gtp->dev, "No PDP ctx to decap skb=%p\n", skb);
-   return 1;
+   goto pass;
+   }
+
+   if (!gtp_rx(pctx, skb, hdrlen, gtp->role)) {
+   /* Successfully received */
+   return 0;
}
 
-   return gtp_rx(pctx, skb, hdrlen, gtp->role);
+drop:
+   kfree_skb(skb);
+   return 0;
+
+pass:
+   return 1;
 }
 
 static void gtp_encap_destroy(struct sock *sk)
@@ -317,49 +357,6 @@ static void gtp_encap_disable(struct gtp_dev *gtp)
gtp_encap_disable_sock(gtp->sk1u);
 }
 
-/* UDP encapsulation receive handler. See net/ipv4/udp.c.
- * Return codes: 0: success, <0: error, >0: pass up to userspace UDP socket.
- */
-static int gtp_encap_recv(struct sock *sk, struct sk_buff *skb)
-{
-   struct gtp_dev *gtp;
-   int ret = 0;
-
-   gtp = 

[PATCH net-next 01/14] iptunnel: Add common functions to get a tunnel route

2017-09-18 Thread Tom Herbert
ip_tunnel_get_route and ip6_tnl_get_route are created to return
routes for a tunnel. These functions are derived from the VXLAN
functions.
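
A converted caller then looks roughly like this (mirroring the vxlan
conversion in the next patch; variable names illustrative):

    rt = ip_tunnel_get_route(dev, skb, IPPROTO_UDP, oif, tos, daddr,
                             &saddr, dport, sport, dst_cache, info);
    if (IS_ERR(rt))
            return PTR_ERR(rt);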

Signed-off-by: Tom Herbert 
---
 include/net/ip6_tunnel.h | 33 +
 include/net/ip_tunnels.h | 33 +
 net/ipv4/ip_tunnel.c | 41 +
 net/ipv6/ip6_tunnel.c| 43 +++
 4 files changed, 150 insertions(+)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index 08fbc7f7d8d7..233097bf07a2 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -142,6 +142,39 @@ __u32 ip6_tnl_get_cap(struct ip6_tnl *t, const struct 
in6_addr *laddr,
 struct net *ip6_tnl_get_link_net(const struct net_device *dev);
 int ip6_tnl_get_iflink(const struct net_device *dev);
 int ip6_tnl_change_mtu(struct net_device *dev, int new_mtu);
+struct dst_entry *__ip6_tnl_get_route(struct net_device *dev,
+ struct sk_buff *skb, struct sock *sk,
+ u8 proto, int oif, u8 tos, __be32 label,
+ const struct in6_addr *daddr,
+ struct in6_addr *saddr,
+ __be16 dport, __be16 sport,
+ struct dst_cache *dst_cache,
+ const struct ip_tunnel_info *info,
+ bool use_cache);
+
+static inline struct dst_entry *ip6_tnl_get_route(struct net_device *dev,
+   struct sk_buff *skb, struct sock *sk, u8 proto,
+   int oif, u8 tos, __be32 label,
+   const struct in6_addr *daddr,
+   struct in6_addr *saddr,
+   __be16 dport, __be16 sport,
+   struct dst_cache *dst_cache,
+   const struct ip_tunnel_info *info)
+{
+   bool use_cache = (ip_tunnel_dst_cache_usable(skb, info) &&
+   (!tos || info));
+
+   if (use_cache) {
+   struct dst_entry *ndst = dst_cache_get_ip6(dst_cache, saddr);
+
+   if (ndst)
+   return ndst;
+   }
+
+   return __ip6_tnl_get_route(dev, skb, sk, proto, oif, tos, label,
+  daddr, saddr, dport, sport, dst_cache,
+  info, use_cache);
+}
 
 static inline void ip6tunnel_xmit(struct sock *sk, struct sk_buff *skb,
  struct net_device *dev)
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h
index 992652856fe8..91d5150a1044 100644
--- a/include/net/ip_tunnels.h
+++ b/include/net/ip_tunnels.h
@@ -284,6 +284,39 @@ int ip_tunnel_newlink(struct net_device *dev, struct 
nlattr *tb[],
  struct ip_tunnel_parm *p, __u32 fwmark);
 void ip_tunnel_setup(struct net_device *dev, unsigned int net_id);
 
+struct rtable *__ip_tunnel_get_route(struct net_device *dev,
+struct sk_buff *skb, u8 proto,
+int oif, u8 tos,
+__be32 daddr, __be32 *saddr,
+__be16 dport, __be16 sport,
+struct dst_cache *dst_cache,
+const struct ip_tunnel_info *info,
+bool use_cache);
+
+static inline struct rtable *ip_tunnel_get_route(struct net_device *dev,
+struct sk_buff *skb, u8 proto,
+int oif, u8 tos,
+__be32 daddr, __be32 *saddr,
+__be16 dport, __be16 sport,
+struct dst_cache *dst_cache,
+const struct ip_tunnel_info *info)
+{
+   bool use_cache = (ip_tunnel_dst_cache_usable(skb, info) &&
+   (!tos || info));
+
+   if (use_cache) {
+   struct rtable *rt;
+
+   rt = dst_cache_get_ip4(dst_cache, saddr);
+   if (rt)
+   return rt;
+   }
+
+   return __ip_tunnel_get_route(dev, skb, proto, oif, tos,
+daddr, saddr, dport, sport,
+dst_cache, info, use_cache);
+}
+
 struct ip_tunnel_encap_ops {
size_t (*encap_hlen)(struct ip_tunnel_encap *e);
int (*build_header)(struct sk_buff *skb, struct ip_tunnel_encap *e,
diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index e9805ad664ac..f0f35333febd 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -935,6 +935,47 @@ int ip_tunnel_ioctl(struct net_device *dev, struct 
ip_tunnel_parm *p, int cmd)
 }
 EXPORT_SYMBOL_GPL(ip_tunnel_ioctl);
 
+struct rtable 

Re: [PATCH net] tcp: remove two unused functions

2017-09-18 Thread David Miller
From: Yuchung Cheng 
Date: Mon, 18 Sep 2017 11:05:16 -0700

> remove tcp_may_send_now and tcp_snd_test that are no longer used
> 
> Fixes: 840a3cbe8969 ("tcp: remove forward retransmit feature")
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Eric Dumazet 

Applied, thanks.


Re: [PATCH net 1/3] net: mvpp2: fix the dma_mask and coherent_dma_mask settings for PPv2.2

2017-09-18 Thread David Miller
From: Antoine Tenart 
Date: Mon, 18 Sep 2017 15:04:06 +0200

> The dev->dma_mask usually points to dev->coherent_dma_mask. This is an
> issue as setting both of them will override the other. This is
> problematic here as the PPv2 driver uses a 32-bit-mask for coherent
> accesses (txq, rxq, bm) and a 40-bit mask for all other accesses due to
> a hardware limitation.
> 
> This can lead to a memory remap for all dma_map_single() calls when
> dealing with memory above 4GB.
> 
> Fixes: 2067e0a13cfe ("net: mvpp2: set dma mask and coherent dma mask on 
> PPv2.2")
> Reported-by: Stefan Chulski 
> Signed-off-by: Antoine Tenart 

Yikes.

I surmise that if the platform has made dev->dma_mask point to
&dev->coherent_dma_mask, it is because it does not allow the two
settings to be set separately.

By rearranging the pointer, you are bypassing that, and probably
breaking things or creating a situation that the DMA mapping
layer is not expecting.

I want to know more about the situations where dma_mask is set to
point to &coherent_dma_mask and how that is supposed to work.

At a minimum this commit log message needs to go into more detail.

Thanks.


RE: [Intel-wired-lan] [PATCH 2/5] e1000e: Fix wrong comment related to link detection

2017-09-18 Thread Brown, Aaron F
> From: Intel-wired-lan [mailto:intel-wired-lan-boun...@osuosl.org] On Behalf
> Of Benjamin Poirier
> Sent: Friday, July 21, 2017 11:36 AM
> To: Kirsher, Jeffrey T 
> Cc: netdev@vger.kernel.org; intel-wired-...@lists.osuosl.org; linux-
> ker...@vger.kernel.org; Lennart Sorensen 
> Subject: [Intel-wired-lan] [PATCH 2/5] e1000e: Fix wrong comment related to
> link detection
> 
> Reading e1000e_check_for_copper_link() shows that get_link_status is set to
> false after link has been detected. Therefore, it stays TRUE until then.
> 
> Signed-off-by: Benjamin Poirier 
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 

Tested-by: Aaron Brown 


Re: [PATCH] bpf: devmap: pass on return value of bpf_map_precharge_memlock

2017-09-18 Thread David Miller
From: Tobias Klauser 
Date: Mon, 18 Sep 2017 15:03:46 +0200

> If bpf_map_precharge_memlock fails in dev_map_alloc, -ENOMEM is returned
> regardless of the actual error produced by bpf_map_precharge_memlock.
> Fix it by passing on the error returned by bpf_map_precharge_memlock.
> 
> Also return -EINVAL instead of -ENOMEM if the page count overflow check
> fails.
> 
> This makes dev_map_alloc match the behavior of other bpf maps' alloc
> functions wrt. return values.
> 
> Signed-off-by: Tobias Klauser 

Applied, thank you.


Re: [net PATCH] bnxt_en: check for ingress qdisc in flower offload

2017-09-18 Thread David Miller
From: Sathya Perla 
Date: Mon, 18 Sep 2017 17:05:37 +0530

> Check for ingress-only qdisc for flower offload, as other qdiscs
> are not supported for flower offload.
> 
> Suggested-by: Jiri Pirko 
> Signed-off-by: Sathya Perla 

Applied, thanks.


Re: [PATCH] net_sched: use explicit size of struct tcmsg, remove need to declare tcm

2017-09-18 Thread David Miller
From: Colin King 
Date: Mon, 18 Sep 2017 12:40:38 +0100

> From: Colin Ian King 
> 
> Pointer tcm is being initialized and is never read, it is only being used
> to determine the size of struct tcmsg.  Clean this up by removing
> variable tcm and explicitly using the sizeof struct tcmsg rather than *tcm.
> Cleans up clang warning:
> 
> warning: Value stored to 'tcm' during its initialization is never read
> 
> Signed-off-by: Colin Ian King 

Applied to net-next.


Re: [PATCH net-next v2 0/7] korina: performance fixes and cleanup

2017-09-18 Thread David Miller
From: Roman Yeryomin 
Date: Sun, 17 Sep 2017 20:23:53 +0300

> Changes from v1:
> - use GRO instead of increasing ring size
> - use NAPI_POLL_WEIGHT instead of defining own NAPI_WEIGHT
> - optimize rx descriptor flags processing

Series applied, thank you.


Re: [PATCH RFC 6/6] Modify tag_ksz.c to support other KSZ switch drivers

2017-09-18 Thread Florian Fainelli
On 09/18/2017 04:44 PM, tristram...@microchip.com wrote:
>>> In the old DSA implementation all the ports are partitioned into its own
>> device
>>> and the bridge joining them will do all the forwarding.  This is useful for
>> quick
>>> testing with some protocols like RSTP but it is probably useless for real
>>> operation.
>>
>> It is a good minimal driver, to get something into the kernel. You can
>> then add features to it.
>>
>>> The new switchdev model tries to use the switch hardware as much as
>>> possible.  This offload_fwd_mark bit means the frame is forwarded by the
>>> hardware switch, so the software bridge does not need to do it again.
>> Without
>>> this bit there will be duplicated multicast frames coming out the ports if
>> internal
>>> forwarding is enabled.
>>
>> Correct. Once your switch driver is clever enough, you can enable
>> offload_fwd_mark.
>>
>>> When RSTP is used the port can be put in blocked state and so the
>> forwarding
>>> will stop for that port.   Currently the switch driver will check that
>> membership
>>> to decide whether to set that bit.
>>
>> This i don't get. RSTP or STP just break loops. How does RSTP vs STP
>> mean you need to set offload_fwd_mark differently?
>>
> 
> The logic of the switch driver is: if the membership of the port receiving
> the frame contains other ports--not counting the cpu port--the
> offload_fwd_mark bit is set.  In RSTP closing the blocked port is generally good
> enough, but there are exceptions, so the port is removed from the
> membership of other forwarding ports.  A disabled port will have its
> membership completely reset so it cannot receive anything.  It does not
> matter much in RSTP as the software bridge should know whether to forward
> the frame or not.
> 
> We are back to square one.  Is there any plan to add this offload_fwd_mark
> support to the DSA driver so that it can be reported properly?  It can be set
> all the time, except that during port initialization or before bridge
> creation the forwarding state does not reflect reality.
> 
> If not the port membership can be fixed and there is no internal switch
> forwarding, leaving everything handled by the software bridge.

I am not really sure why this is such a concern for you so soon when
your driver is not even included yet. You should really aim for baby
steps here: get the basic driver(s) included, with a limited set of
features, and gradually add more features to the driver. When
fwd_offload_mark and RSTP become a real problem, we can most
definitively find a way to fix those in DSA and depending drivers.
-- 
Florian


RE: [PATCH RFC 6/6] Modify tag_ksz.c to support other KSZ switch drivers

2017-09-18 Thread Tristram.Ha
> > In the old DSA implementation all the ports are partitioned into its own
> device
> > and the bridge joining them will do all the forwarding.  This is useful for
> quick
> > testing with some protocols like RSTP but it is probably useless for real
> > operation.
> 
> It is a good minimal driver, to get something into the kernel. You can
> then add features to it.
> 
> > The new switchdev model tries to use the switch hardware as much as
> > possible.  This offload_fwd_mark bit means the frame is forwarded by the
> > hardware switch, so the software bridge does not need to do it again.
> Without
> > this bit there will be duplicated multicast frames coming out the ports if
> internal
> > forwarding is enabled.
> 
> Correct. Once your switch driver is clever enough, you can enable
> offload_fwd_mark.
> 
> > When RSTP is used the port can be put in blocked state and so the
> forwarding
> > will stop for that port.   Currently the switch driver will check that
> membership
> > to decide whether to set that bit.
> 
> This i don't get. RSTP or STP just break loops. How does RSTP vs STP
> mean you need to set offload_fwd_mark differently?
> 

The logic of the switch driver is: if the membership of the port receiving
the frame contains other ports--not counting the cpu port--the
offload_fwd_mark bit is set.  In RSTP closing the blocked port is generally good
enough, but there are exceptions, so the port is removed from the
membership of other forwarding ports.  A disabled port will have its
membership completely reset so it cannot receive anything.  It does not
matter much in RSTP as the software bridge should know whether to forward
the frame or not.
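
In rough pseudo-C the rule is the following, where port_membership() is
a stand-in for the driver's internal membership lookup, not a real DSA
API:

    u32 member = port_membership(sw, port) & ~BIT(cpu_port);

    skb->offload_fwd_mark = !!member;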

We are back to square one.  Is there any plan to add this offload_fwd_mark
support to the DSA driver so that it can be reported properly?  It can be set
all the time, except that during port initialization or before bridge
creation the forwarding state does not reflect reality.

If not the port membership can be fixed and there is no internal switch
forwarding, leaving everything handled by the software bridge.



Re: [PATCH] hamradio: baycom: use new parport device model

2017-09-18 Thread David Miller
From: Sudip Mukherjee 
Date: Sun, 17 Sep 2017 12:46:20 +0100

> Modify baycom driver to use the new parallel port device model.
> 
> Signed-off-by: Sudip Mukherjee 

Applied to net-next, thanks.


Re: [PATCH] Documentation: networking: fix ASCII art in switchdev.txt

2017-09-18 Thread David Miller
From: Randy Dunlap 
Date: Sat, 16 Sep 2017 13:10:06 -0700

> From: Randy Dunlap 
> 
> Fix ASCII art in Documentation/networking/switchdev.txt:
> 
> Change non-ASCII "spaces" to ASCII spaces.
> 
> Change 2 erroneous '+' characters in ASCII art to '-' (at the '*'
> characters below):
> 
> line 32:
>  +--++++-*--++---+  +-+-+
> line 41:
>  +--+---*+
> 
> Signed-off-by: Randy Dunlap 

Applied, thanks Randy.


[PATCH net] bpf: one perf event close won't free bpf program attached by another perf event

2017-09-18 Thread Yonghong Song
This patch fixes a bug exhibited by the following scenario:
  1. fd1 = perf_event_open with attr.config = ID1
  2. attach bpf program prog1 to fd1
  3. fd2 = perf_event_open with attr.config = ID1
 
  4. user program closes fd2 and prog1 is detached from the tracepoint.
  5. user program with fd1 does not work properly as the tracepoint
 produces no output any more.

The issue happens at step 4. Multiple perf_event_open can be called
successfully, but there is only one bpf prog pointer in the tp_event. In the
current logic, any fd release for the same tp_event will free
the tp_event->prog.

The fix is to free tp_event->prog only when the closing fd
corresponds to the one which registered the program.
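
The failing sequence as plain code (perf_event_attr setup and error
handling elided):

    int fd1 = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);

    ioctl(fd1, PERF_EVENT_IOC_SET_BPF, prog1_fd);   /* step 2 */

    int fd2 = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);

    close(fd2);     /* step 4: before this fix, prog1 was freed here */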

Signed-off-by: Yonghong Song 
---
 Additional context: discussed with Alexei internally but did not find
 a solution which can avoid introducing the additional field in
 trace_event_call structure.

 Peter, could you take a look as well and maybe you could have better
 alternative? Thanks!

 include/linux/trace_events.h | 1 +
 kernel/events/core.c | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 7f11050..2e0f222 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -272,6 +272,7 @@ struct trace_event_call {
int perf_refcount;
struct hlist_head __percpu  *perf_events;
struct bpf_prog *prog;
+   struct perf_event   *bpf_prog_owner;
 
int (*perf_perm)(struct trace_event_call *,
 struct perf_event *);
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3e691b7..6bc21e2 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8171,6 +8171,7 @@ static int perf_event_set_bpf_prog(struct perf_event 
*event, u32 prog_fd)
}
}
event->tp_event->prog = prog;
+   event->tp_event->bpf_prog_owner = event;
 
return 0;
 }
@@ -8185,7 +8186,7 @@ static void perf_event_free_bpf_prog(struct perf_event 
*event)
return;
 
prog = event->tp_event->prog;
-   if (prog) {
+   if (prog && event->tp_event->bpf_prog_owner == event) {
event->tp_event->prog = NULL;
bpf_prog_put(prog);
}
-- 
2.9.5



Re: [PATCH net] net/sched: cls_matchall: fix crash when used with classful qdisc

2017-09-18 Thread David Miller
From: Davide Caratti 
Date: Sat, 16 Sep 2017 14:02:21 +0200

> this script, edited from Linux Advanced Routing and Traffic Control guide
> 
> tc q a dev en0 root handle 1: htb default a
> tc c a dev en0 parent 1:  classid 1:1 htb rate 6mbit burst 15k
> tc c a dev en0 parent 1:1 classid 1:a htb rate 5mbit ceil 6mbit burst 15k
> tc c a dev en0 parent 1:1 classid 1:b htb rate 1mbit ceil 6mbit burst 15k
> tc f a dev en0 parent 1:0 prio 1 $clsname $clsargs classid 1:b
> ping $address -c1
> tc -s c s dev en0
> 
> classifies traffic to 1:b or 1:a, depending on whether the packet matches
> or not the pattern $clsargs of filter $clsname. However, when $clsname is
> 'matchall', a systematic crash can be observed in htb_classify(). HTB and
> classful qdiscs don't assign initial value to struct tcf_result, but then
> they expect it to contain valid values after filters have been run. Thus,
> current 'matchall' ignores the TCA_MATCHALL_CLASSID attribute, configured
> by user, and makes HTB (and classful qdiscs) dereference random pointers.
> 
> By assigning head->res to *res in mall_classify(), before the actions are
> invoked, we fix this crash and enable TCA_MATCHALL_CLASSID functionality,
> that had no effect on 'matchall' classifier since its first introduction.
> 
> BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1460213
> Reported-by: Jiri Benc 
> Fixes: b87f7936a932 ("net/sched: introduce Match-all classifier")
> Signed-off-by: Davide Caratti 

Applied and queued up for -stable, thanks.


Re: [PATCH net] ip6_tunnel: do not allow loading ip6_tunnel if ipv6 is disabled in cmdline

2017-09-18 Thread David Miller
From: Xin Long 
Date: Fri, 15 Sep 2017 15:58:33 +0800

> If ipv6 has been disabled from cmdline since kernel started, it makes
> no sense to allow users to create any ip6 tunnel. Otherwise, it could
> cause some potential problems.
> 
> Jianlin found a kernel crash caused by this in ip6_gre when he set
> ipv6.disable=1 in grub:
> 
> [  209.588865] Unable to handle kernel paging request for data at address 
> 0x0080
> [  209.588872] Faulting instruction address: 0xc0a3aa6c
> [  209.588879] Oops: Kernel access of bad area, sig: 11 [#1]
> [  209.589062] NIP [c0a3aa6c] fib_rules_lookup+0x4c/0x260
> [  209.589071] LR [c0b9ad90] fib6_rule_lookup+0x50/0xb0
> [  209.589076] Call Trace:
> [  209.589097] fib6_rule_lookup+0x50/0xb0
> [  209.589106] rt6_lookup+0xc4/0x110
> [  209.589116] ip6gre_tnl_link_config+0x214/0x2f0 [ip6_gre]
> [  209.589125] ip6gre_newlink+0x138/0x3a0 [ip6_gre]
> [  209.589134] rtnl_newlink+0x798/0xb80
> [  209.589142] rtnetlink_rcv_msg+0xec/0x390
> [  209.589151] netlink_rcv_skb+0x138/0x150
> [  209.589159] rtnetlink_rcv+0x48/0x70
> [  209.589169] netlink_unicast+0x538/0x640
> [  209.589175] netlink_sendmsg+0x40c/0x480
> [  209.589184] ___sys_sendmsg+0x384/0x4e0
> [  209.589194] SyS_sendmsg+0xd4/0x140
> [  209.589201] SyS_socketcall+0x3e0/0x4f0
> [  209.589209] system_call+0x38/0xe0
> 
> This patch is to return -EOPNOTSUPP in ip6_tunnel_init if ipv6 has been
> disabled from cmdline.
> 
> Reported-by: Jianlin Shi 
> Signed-off-by: Xin Long 

Applied and queued up for -stable, thanks.


Re: [PATCH net] net: phy: Fix mask value write on gmii2rgmii converter speed register

2017-09-18 Thread David Miller
From: Fahad Kunnathadi 
Date: Fri, 15 Sep 2017 12:01:58 +0530

> To clear Speed Selection in MDIO control register(0x10),
> ie, clear bits 6 and 13 to zero while keeping other bits same.
> Before the AND operation, the mask value has to be inverted with the
> bitwise NOT operator (ie, ~)
> 
> This patch clears current speed selection before writing the
> new speed settings to gmii2rgmii converter
> 
> Fixes: f411a6160bd4 ("net: phy: Add gmiitorgmii converter support")
> 
> Signed-off-by: Fahad Kunnathadi 
> Reviewed-by: Andrew Lunn 

Applied and queued up for -stable, thanks.


[PATCH net] net: systemport: Fix 64-bit statistics dependency

2017-09-18 Thread Florian Fainelli
There are several problems with commit 10377ba7673d ("net: systemport:
Support 64bit statistics"); the first one got fixed in 7095c973453e ("net:
systemport: Fix 64-bit stats deadlock").

The second problem is that this specific code updates the
stats64.tx_{packets,bytes} from ndo_get_stats64() and that is what we
are returning to ethtool -S. If we are not running a tool that involves
calling ndo_get_stats64(), then we won't get updated ethtool stats.

The solution to this is to update the stats from both call sites,
factoring that into a specific function. While at it, don't just check
the sizeof() but also the type of the statistics in order to use the
64-bit stats seqlock.

Fixes: 10377ba7673d ("net: systemport: Support 64bit statistics")
Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/broadcom/bcmsysport.c | 52 ++
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
b/drivers/net/ethernet/broadcom/bcmsysport.c
index c3c53f6cd9e6..83eec9a8c275 100644
--- a/drivers/net/ethernet/broadcom/bcmsysport.c
+++ b/drivers/net/ethernet/broadcom/bcmsysport.c
@@ -432,6 +432,27 @@ static void bcm_sysport_update_mib_counters(struct 
bcm_sysport_priv *priv)
netif_dbg(priv, hw, priv->netdev, "updated MIB counters\n");
 }
 
+static void bcm_sysport_update_tx_stats(struct bcm_sysport_priv *priv,
+   u64 *tx_bytes, u64 *tx_packets)
+{
+   struct bcm_sysport_tx_ring *ring;
+   u64 bytes = 0, packets = 0;
+   unsigned int start;
+   unsigned int q;
+
+   for (q = 0; q < priv->netdev->num_tx_queues; q++) {
+   ring = &priv->tx_rings[q];
+   do {
+   start = u64_stats_fetch_begin_irq(&ring->syncp);
+   bytes = ring->bytes;
+   packets = ring->packets;
+   } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
+
+   *tx_bytes += bytes;
+   *tx_packets += packets;
+   }
+}
+
 static void bcm_sysport_get_stats(struct net_device *dev,
  struct ethtool_stats *stats, u64 *data)
 {
@@ -439,11 +460,16 @@ static void bcm_sysport_get_stats(struct net_device *dev,
struct bcm_sysport_stats64 *stats64 = &priv->stats64;
struct u64_stats_sync *syncp = &priv->syncp;
struct bcm_sysport_tx_ring *ring;
+   u64 tx_bytes = 0, tx_packets = 0;
unsigned int start;
int i, j;
 
-   if (netif_running(dev))
+   if (netif_running(dev)) {
bcm_sysport_update_mib_counters(priv);
+   bcm_sysport_update_tx_stats(priv, &tx_bytes, &tx_packets);
+   stats64->tx_bytes = tx_bytes;
+   stats64->tx_packets = tx_packets;
+   }
 
for (i =  0, j = 0; i < BCM_SYSPORT_STATS_LEN; i++) {
const struct bcm_sysport_stats *s;
@@ -461,12 +487,13 @@ static void bcm_sysport_get_stats(struct net_device *dev,
continue;
p += s->stat_offset;
 
-   if (s->stat_sizeof == sizeof(u64))
+   if (s->stat_sizeof == sizeof(u64) &&
+   s->type == BCM_SYSPORT_STAT_NETDEV64) {
do {
start = u64_stats_fetch_begin_irq(syncp);
data[i] = *(u64 *)p;
} while (u64_stats_fetch_retry_irq(syncp, start));
-   else
+   } else
data[i] = *(u32 *)p;
j++;
}
@@ -1716,27 +1743,12 @@ static void bcm_sysport_get_stats64(struct net_device 
*dev,
 {
struct bcm_sysport_priv *priv = netdev_priv(dev);
struct bcm_sysport_stats64 *stats64 = &priv->stats64;
-   struct bcm_sysport_tx_ring *ring;
-   u64 tx_packets = 0, tx_bytes = 0;
unsigned int start;
-   unsigned int q;
 
netdev_stats_to_stats64(stats, &dev->stats);
 
-   for (q = 0; q < dev->num_tx_queues; q++) {
-   ring = &priv->tx_rings[q];
-   do {
-   start = u64_stats_fetch_begin_irq(&ring->syncp);
-   tx_bytes = ring->bytes;
-   tx_packets = ring->packets;
-   } while (u64_stats_fetch_retry_irq(&ring->syncp, start));
-
-   stats->tx_bytes += tx_bytes;
-   stats->tx_packets += tx_packets;
-   }
-
-   stats64->tx_bytes = stats->tx_bytes;
-   stats64->tx_packets = stats->tx_packets;
+   bcm_sysport_update_tx_stats(priv, &stats64->tx_bytes,
+   &stats64->tx_packets);
 
do {
start = u64_stats_fetch_begin_irq(&priv->syncp);
-- 
2.9.3



Re: [PATCH net-next 00/12] net: dsa: b53/bcm_sf2 cleanups

2017-09-18 Thread David Miller
From: Florian Fainelli 
Date: Mon, 18 Sep 2017 14:46:48 -0700

> On 09/18/2017 02:41 PM, Florian Fainelli wrote:
>> Hi all,
>> 
>> This patch series is a first pass set of clean-ups to reduce the number of 
>> LOCs
>> between b53 and bcm_sf2 and sharing as many functions as possible.
>> 
>> There is a number of additional cleanups queued up locally that require more
>> thorough testing.
> 
> David, I just spotted a missing EXPORT_SYMBOL() in patch 8 that was not
> flagged since I had temporarily disabled modular build, I will resubmit
> this shortly after checking the other patches too. Thanks!

Ok.


Re: [PATCH 16/16] thunderbolt: Add support for networking over Thunderbolt cable

2017-09-18 Thread Andrew Lunn
On Mon, Sep 18, 2017 at 06:30:49PM +0300, Mika Westerberg wrote:
> From: Amir Levy 
> 
> ThunderboltIP is a protocol created by Apple to tunnel IP/ethernet
> traffic over a Thunderbolt cable. The protocol consists of configuration
> phase where each side sends ThunderboltIP login packets (the protocol is
> determined by UUID in the XDomain packet header) over the configuration
> channel. Once both sides get positive acknowledgment to their login
> packet, they configure high-speed DMA path accordingly. This DMA path is
> then used to transmit and receive networking traffic.
> 
> This patch creates a virtual ethernet interface the host software can
> use in the same way as any other networking interface. Once the
> interface is brought up successfully network packets get tunneled over
> the Thunderbolt cable to the remote host and back.
> 
> The connection is terminated by sending a ThunderboltIP logout packet
> over the configuration channel. We do this when the network interface is
> brought down by user or the driver is unloaded.
> 
> Signed-off-by: Amir Levy 
> Signed-off-by: Michael Jamet 
> Signed-off-by: Mika Westerberg 
> Reviewed-by: Yehezkel Bernat 
> ---
>  Documentation/admin-guide/thunderbolt.rst |   24 +
>  drivers/thunderbolt/Kconfig   |   12 +
>  drivers/thunderbolt/Makefile  |3 +
>  drivers/thunderbolt/net.c | 1392 
> +
>  4 files changed, 1431 insertions(+)
>  create mode 100644 drivers/thunderbolt/net.c

Hi Mika

Could this be renamed to drivers/net/thunderbolt.c?

At minimum, it needs a MAINTAINERS entry pointing to netdev, so patches
get reviewed by netdev people. However, since the driver seems to be a
lot more netdev than thunderbolt, placing it in drivers/net could be
better.

Thanks
Andrew


Re: [RFC net-next 0/5] TSN: Add qdisc-based config interfaces for traffic shapers

2017-09-18 Thread Vinicius Costa Gomes
Hi Richard,

Richard Cochran  writes:

> On Thu, Aug 31, 2017 at 06:26:20PM -0700, Vinicius Costa Gomes wrote:
>>  * Time-aware shaper (802.1Qbv):
>
> I just posted a working alternative showing how to handle 802.1Qbv and
> many other Ethernet field buses.
>
>>The idea we are currently exploring is to add a "time-aware", priority 
>> based
>>qdisc, that also exposes the Tx queues available and provides a mechanism 
>> for
>>mapping priority <-> traffic class <-> Tx queues in a similar fashion as
>>mqprio. We are calling this qdisc 'taprio', and its 'tc' cmd line would 
>> be:
>>
>>$ tc qdisc add dev ens4 parent root handle 100 taprio num_tc 4\
>> map 2 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \
>> queues 0 1 2 3  \
>> sched-file gates.sched [base-time ]   \
>>[cycle-time ] [extension-time ]
>>
>> The sched-file is multi-line, with each line being of the following format:
>>  
>>
>>Qbv only defines one command: "S" for 'SetGates'
>>
>>For example:
>>
>>S 0x01 300
>>S 0x03 500
>>
>>This means that there are two intervals, the first will have the gate
>>for traffic class 0 open for 300 nanoseconds, the second will have
>>both traffic classes open for 500 nanoseconds.
>
> The idea of the schedule file will not work in practice.  Consider the
> fact that the application wants to deliver time critical data in a
> particular slot.  How can it find out a) what the time slots are and
> b) when the next slot is scheduled?  With this Qdisc, it cannot do
> this, AFAICT.  The admin might delete the file after configuring the
> Qdisc!

That's the point, the application does not need to know that, and asking
that would be stupid. From the point of view of the Qbv specification,
an application only needs to care about its basic bandwidth requirements:
its interval, frame size, frames per interval (using the terms of the
SRP section of 802.1Q). The traffic schedule is provided (off band) by a
"god box" which knows all the requirements of all applications in all
the nodes and how they are connected.

(And that's another nice point of how 802.1Qbv works, applications do
not need to be changed to use it, and I think we should work to achieve
this on the Linux side)

That being said, that only works for kinds of traffic that maps well to
this configuration-in-advance model, which is the model that the IEEE
(see 802.1Qcc) and the AVNU Alliance[1] are pushing for.

In the real world, I can see multiple types of applications, some using
something like TXTIME, and some configured in advance.

>
> Using the SO_TXTIME option, the application has total control over the
> scheduling.  The great advantages of this approach is that we can
> support any possible combination of periodic or aperiodic scheduling
> and we can support any priority scheme user space dreams up.

It has the disadvantage that the scheduling information has to be
in-band with the data. I *really* think that for scheduled traffic,
there should be a clear separation, we should not mix the dataflow with
scheduling. In short, an application in the network doesn't need to have
all the information necessary to schedule its own traffic well.

I have two points here: 1. I see both "solutions" (taprio and SO_TXTIME)
as being orthogonal and both useful; 2. trying to make one do the job
of the other, however, looks like "If all I have is a hammer, everything
looks like a nail".

In short, I see a per-packet transmission time and a per-queue schedule
as solutions to different problems.

>
> For example, one can imaging running two or more loops that only
> occasionally collide.  When they do collide, which packet should be
> sent first?  Just let user space decide.
>
> Thanks,
> Richard

Cheers,
--
Vinicius

[1]
http://avnu.org/theory-of-operation-for-tsn-enabled-industrial-systems/


[PATCH net-next v3 2/4] bpf: add a test case for helper bpf_perf_event_read_value

2017-09-18 Thread Yonghong Song
The bpf sample program tracex6 is enhanced to use the new
helper to read enabled/running time as well.

Signed-off-by: Yonghong Song 
---
 samples/bpf/tracex6_kern.c| 26 ++
 samples/bpf/tracex6_user.c| 13 -
 tools/include/uapi/linux/bpf.h|  3 ++-
 tools/testing/selftests/bpf/bpf_helpers.h |  3 +++
 4 files changed, 43 insertions(+), 2 deletions(-)

diff --git a/samples/bpf/tracex6_kern.c b/samples/bpf/tracex6_kern.c
index e7d1803..46c557a 100644
--- a/samples/bpf/tracex6_kern.c
+++ b/samples/bpf/tracex6_kern.c
@@ -15,6 +15,12 @@ struct bpf_map_def SEC("maps") values = {
.value_size = sizeof(u64),
.max_entries = 64,
 };
+struct bpf_map_def SEC("maps") values2 = {
+   .type = BPF_MAP_TYPE_HASH,
+   .key_size = sizeof(int),
+   .value_size = sizeof(struct bpf_perf_event_value),
+   .max_entries = 64,
+};
 
 SEC("kprobe/htab_map_get_next_key")
 int bpf_prog1(struct pt_regs *ctx)
@@ -37,5 +43,25 @@ int bpf_prog1(struct pt_regs *ctx)
return 0;
 }
 
+SEC("kprobe/htab_map_lookup_elem")
+int bpf_prog2(struct pt_regs *ctx)
+{
+   u32 key = bpf_get_smp_processor_id();
+   struct bpf_perf_event_value *val, buf;
+   int error;
+
+   error = bpf_perf_event_read_value(&counters, key, &buf, sizeof(buf));
+   if (error)
+   return 0;
+
+   val = bpf_map_lookup_elem(&values2, &key);
+   if (val)
+   *val = buf;
+   else
+   bpf_map_update_elem(&values2, &key, &buf, BPF_NOEXIST);
+
+   return 0;
+}
+
 char _license[] SEC("license") = "GPL";
 u32 _version SEC("version") = LINUX_VERSION_CODE;
diff --git a/samples/bpf/tracex6_user.c b/samples/bpf/tracex6_user.c
index a05a99a..3341a96 100644
--- a/samples/bpf/tracex6_user.c
+++ b/samples/bpf/tracex6_user.c
@@ -22,6 +22,7 @@
 
 static void check_on_cpu(int cpu, struct perf_event_attr *attr)
 {
+   struct bpf_perf_event_value value2;
int pmu_fd, error = 0;
cpu_set_t set;
__u64 value;
@@ -46,8 +47,18 @@ static void check_on_cpu(int cpu, struct perf_event_attr 
*attr)
fprintf(stderr, "Value missing for CPU %d\n", cpu);
error = 1;
goto on_exit;
+   } else {
+   fprintf(stderr, "CPU %d: %llu\n", cpu, value);
+   }
+   /* The above bpf_map_lookup_elem should trigger the second kprobe */
+   if (bpf_map_lookup_elem(map_fd[2], &cpu, &value2)) {
+   fprintf(stderr, "Value2 missing for CPU %d\n", cpu);
+   error = 1;
+   goto on_exit;
+   } else {
+   fprintf(stderr, "CPU %d: counter: %llu, enabled: %llu, running: 
%llu\n", cpu,
+   value2.counter, value2.enabled, value2.running);
}
-   fprintf(stderr, "CPU %d: %llu\n", cpu, value);
 
 on_exit:
assert(bpf_map_delete_elem(map_fd[0], &cpu) == 0 || error);
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 461811e..79eb529 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -632,7 +632,8 @@ union bpf_attr {
FN(skb_adjust_room),\
FN(redirect_map),   \
FN(sk_redirect_map),\
-   FN(sock_map_update),
+   FN(sock_map_update),\
+   FN(perf_event_read_value),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h 
b/tools/testing/selftests/bpf/bpf_helpers.h
index 36fb916..c866682 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -70,6 +70,9 @@ static int (*bpf_sk_redirect_map)(void *map, int key, int 
flags) =
 static int (*bpf_sock_map_update)(void *map, void *key, void *value,
  unsigned long long flags) =
(void *) BPF_FUNC_sock_map_update;
+static int (*bpf_perf_event_read_value)(void *map, unsigned long long flags,
+  void *buf, unsigned int buf_size) =
+   (void *) BPF_FUNC_perf_event_read_value;
 
 
 /* llvm builtin functions that eBPF C program may use to
-- 
2.9.5



[PATCH net-next v3 1/4] bpf: add helper bpf_perf_event_read_value for perf event array map

2017-09-18 Thread Yonghong Song
Hardware pmu counters are limited resources. When there are more
pmu based perf events opened than available counters, kernel will
multiplex these events so each event gets certain percentage
(but not 100%) of the pmu time. In case that multiplexing happens,
the number of samples or the counter value will not match the
no-multiplexing case. This makes comparison between
different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is the time enabled for event and time_running is
the time running for event since last normalization.

This patch adds helper bpf_perf_event_read_value for kprobed based perf
event array map, to read perf counter and enabled/running time.
The enabled/running time is accumulated since the perf event open.
To compute a scaling factor between two bpf invocations, users
can use cpu_id as the key (which is typical for the perf array usage model)
to remember the previous value and do the calculation inside the
bpf program.
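
For example, the per-invocation scaling could be computed like this,
with prev read back from the map and cur filled by the helper (a sketch
only):

    struct bpf_perf_event_value prev, cur;
    __u64 dcnt, den, drun, normalized;

    dcnt = cur.counter - prev.counter;
    den  = cur.enabled - prev.enabled;
    drun = cur.running - prev.running;
    normalized = drun ? dcnt * den / drun : dcnt;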

Signed-off-by: Yonghong Song 
---
 include/linux/perf_event.h |  3 ++-
 include/uapi/linux/bpf.h   | 18 +-
 kernel/bpf/arraymap.c  |  2 +-
 kernel/bpf/verifier.c  |  4 +++-
 kernel/events/core.c   | 15 ---
 kernel/trace/bpf_trace.c   | 44 
 6 files changed, 75 insertions(+), 11 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8e22f24..13f08ee 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -884,7 +884,8 @@ perf_event_create_kernel_counter(struct perf_event_attr 
*attr,
void *context);
 extern void perf_pmu_migrate_context(struct pmu *pmu,
int src_cpu, int dst_cpu);
-int perf_event_read_local(struct perf_event *event, u64 *value);
+int perf_event_read_local(struct perf_event *event, u64 *value,
+ u64 *enabled, u64 *running);
 extern u64 perf_event_read_value(struct perf_event *event,
 u64 *enabled, u64 *running);
 
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 43ab5c4..2c68b9e 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -582,6 +582,14 @@ union bpf_attr {
  * @map: pointer to sockmap to update
  * @key: key to insert/update sock in map
  * @flags: same flags as map update elem
+ *
+ * int bpf_perf_event_read_value(map, flags, buf, buf_size)
+ * read perf event counter value and perf event enabled/running time
+ * @map: pointer to perf_event_array map
+ * @flags: index of event in the map or bitmask flags
+ * @buf: buf to fill
+ * @buf_size: size of the buf
+ * Return: 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -638,6 +646,7 @@ union bpf_attr {
FN(redirect_map),   \
FN(sk_redirect_map),\
FN(sock_map_update),\
+   FN(perf_event_read_value),  \
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
@@ -681,7 +690,8 @@ enum bpf_func_id {
 #define BPF_F_ZERO_CSUM_TX (1ULL << 1)
 #define BPF_F_DONT_FRAGMENT(1ULL << 2)
 
-/* BPF_FUNC_perf_event_output and BPF_FUNC_perf_event_read flags. */
+/* BPF_FUNC_perf_event_output, BPF_FUNC_perf_event_read and
+ * BPF_FUNC_perf_event_read_value flags. */
 #define BPF_F_INDEX_MASK   0xULL
 #define BPF_F_CURRENT_CPU  BPF_F_INDEX_MASK
 /* BPF_FUNC_perf_event_output for sk_buff input context. */
@@ -864,4 +874,10 @@ enum {
 #define TCP_BPF_IW 1001/* Set TCP initial congestion window */
 #define TCP_BPF_SNDCWND_CLAMP  1002/* Set sndcwnd_clamp */
 
+struct bpf_perf_event_value {
+   __u64 counter;
+   __u64 enabled;
+   __u64 running;
+};
+
 #endif /* _UAPI__LINUX_BPF_H__ */
diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
index 98c0f00..68d8666 100644
--- a/kernel/bpf/arraymap.c
+++ b/kernel/bpf/arraymap.c
@@ -492,7 +492,7 @@ static void *perf_event_fd_array_get_ptr(struct bpf_map 
*map,
 
ee = ERR_PTR(-EOPNOTSUPP);
event = perf_file->private_data;
-   if (perf_event_read_local(event, &value) == -EOPNOTSUPP)
+   if (perf_event_read_local(event, &value, NULL, NULL) == -EOPNOTSUPP)
goto err_out;
 
ee = bpf_event_entry_gen(perf_file, map_file);
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 799b245..1bf9d7b 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1494,7 +1494,8 @@ static int 

[PATCH net-next v3 3/4] bpf: add helper bpf_perf_prog_read_value

2017-09-18 Thread Yonghong Song
This patch adds the helper bpf_perf_prog_read_value for perf event based bpf
programs, to read event counter and enabled/running time.
The enabled/running time is accumulated since the perf event open.

The typical use case for perf event based bpf program is to attach itself
to a single event. In such cases, if it is desirable to get scaling factor
between two bpf invocations, users can save the time values in a map,
and use the value from the map and the current value to calculate
the scaling factor.
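
For illustration, a minimal sketch of that pattern (hypothetical map and
program names; assumes the usual samples/bpf scaffolding from bpf_helpers.h,
not code from this patch):

    struct bpf_map_def SEC("maps") prev_val = {
        .type = BPF_MAP_TYPE_ARRAY,
        .key_size = sizeof(u32),
        .value_size = sizeof(struct bpf_perf_event_value),
        .max_entries = 1,
    };

    SEC("perf_event")
    int sample_prog(struct bpf_perf_event_data *ctx)
    {
        struct bpf_perf_event_value cur, *last;
        u32 key = 0;

        if (bpf_perf_prog_read_value(ctx, (void *)&cur, sizeof(cur)))
            return 0;

        last = bpf_map_lookup_elem(&prev_val, &key);
        if (last) {
            /* scaling factor for this interval:
             * (cur.enabled - last->enabled) / (cur.running - last->running)
             */
            *last = cur;
        } else {
            bpf_map_update_elem(&prev_val, &key, &cur, BPF_ANY);
        }
        return 0;
    }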

Signed-off-by: Yonghong Song 
---
 include/linux/perf_event.h |  1 +
 include/uapi/linux/bpf.h   |  8 
 kernel/events/core.c   |  1 +
 kernel/trace/bpf_trace.c   | 23 +++
 4 files changed, 33 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 13f08ee..5ff3055 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -806,6 +806,7 @@ struct perf_output_handle {
 struct bpf_perf_event_data_kern {
struct pt_regs *regs;
struct perf_sample_data *data;
+   struct perf_event *event;
 };
 
 #ifdef CONFIG_CGROUP_PERF
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2c68b9e..ba77022 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -590,6 +590,13 @@ union bpf_attr {
  * @buf: buf to fill
  * @buf_size: size of the buf
  * Return: 0 on success or negative error code
+ *
+ * int bpf_perf_prog_read_value(ctx, buf, buf_size)
+ * read perf prog attached perf event counter and enabled/running time
+ * @ctx: pointer to ctx
+ * @buf: buf to fill
+ * @buf_size: size of the buf
+ * Return: 0 on success or negative error code
  */
 #define __BPF_FUNC_MAPPER(FN)  \
FN(unspec), \
@@ -647,6 +654,7 @@ union bpf_attr {
FN(sk_redirect_map),\
FN(sock_map_update),\
FN(perf_event_read_value),  \
+   FN(perf_prog_read_value),   \
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 2d5bbe5..d039086 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -8081,6 +8081,7 @@ static void bpf_overflow_handler(struct perf_event *event,
struct bpf_perf_event_data_kern ctx = {
.data = data,
.regs = regs,
+   .event = event,
};
int ret = 0;
 
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 39ce5d9..596b5c9 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -603,6 +603,18 @@ BPF_CALL_3(bpf_get_stackid_tp, void *, tp_buff, struct bpf_map *, map,
   flags, 0, 0);
 }
 
+BPF_CALL_3(bpf_perf_prog_read_value_tp, void *, ctx,
+   struct bpf_perf_event_value *, buf, u32, size)
+{
+   struct bpf_perf_event_data_kern *kctx = (struct bpf_perf_event_data_kern *)ctx;
+
+   if (size != sizeof(struct bpf_perf_event_value))
+   return -EINVAL;
+
+   return perf_event_read_local(kctx->event, &buf->counter, &buf->enabled,
+&buf->running);
+}
+
 static const struct bpf_func_proto bpf_get_stackid_proto_tp = {
.func   = bpf_get_stackid_tp,
.gpl_only   = true,
@@ -612,6 +624,15 @@ static const struct bpf_func_proto bpf_get_stackid_proto_tp = {
.arg3_type  = ARG_ANYTHING,
 };
 
+static const struct bpf_func_proto bpf_perf_prog_read_value_proto_tp = {
+ .func   = bpf_perf_prog_read_value_tp,
+ .gpl_only   = true,
+ .ret_type   = RET_INTEGER,
+ .arg1_type  = ARG_PTR_TO_CTX,
+ .arg2_type  = ARG_PTR_TO_UNINIT_MEM,
+ .arg3_type  = ARG_CONST_SIZE,
+};
+
 static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
 {
switch (func_id) {
@@ -619,6 +640,8 @@ static const struct bpf_func_proto *tp_prog_func_proto(enum bpf_func_id func_id)
return &bpf_perf_event_output_proto_tp;
case BPF_FUNC_get_stackid:
return &bpf_get_stackid_proto_tp;
+   case BPF_FUNC_perf_prog_read_value:
+   return &bpf_perf_prog_read_value_proto_tp;
default:
return tracing_func_proto(func_id);
}
-- 
2.9.5



[PATCH net-next v3 0/4] bpf: add two helpers to read perf event enabled/running time

2017-09-18 Thread Yonghong Song
Hardware pmu counters are limited resources. When more pmu based
perf events are opened than there are available counters, the kernel
will multiplex these events so each event gets a certain percentage
(but not 100%) of the pmu time. When multiplexing happens, the
number of samples or the counter value will not reflect what would
be observed without multiplexing. This makes comparison between
different runs difficult.

Typically, the number of samples or counter value should be
normalized before comparing to other experiments. The typical
normalization is done like:
  normalized_num_samples = num_samples * time_enabled / time_running
  normalized_counter_value = counter_value * time_enabled / time_running
where time_enabled is the time enabled for the event and time_running is
the time running for the event since the last normalization.
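
As a small user-space sketch of that step (hypothetical variable names;
64-bit overflow handling omitted for brevity):

    struct bpf_perf_event_value v;   /* filled in by the new helpers */
    __u64 normalized_counter;

    /* estimate what the counter would read without multiplexing */
    normalized_counter = v.running ?
        v.counter * v.enabled / v.running : v.counter;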

This patch set implements two helper functions.
The helper bpf_perf_event_read_value reads counter/time_enabled/time_running
for a perf event array map. The helper bpf_perf_prog_read_value reads
counter/time_enabled/time_running for a bpf prog of type
BPF_PROG_TYPE_PERF_EVENT.

Changelogs:
v2->v3:
  . counters must be read in order to read enabled/running time. This
prevents counters and enabled/running time from being read separately.
v1->v2:
  . reading enabled/running time should be together with reading counters
which contains the logic to ensure the result is valid.


Yonghong Song (4):
  bpf: add helper bpf_perf_event_read_value for perf event array map
  bpf: add a test case for helper bpf_perf_event_read_value
  bpf: add helper bpf_perf_prog_read_value
  bpf: add a test case for helper bpf_perf_prog_read_value

 include/linux/perf_event.h|  4 +-
 include/uapi/linux/bpf.h  | 26 +++-
 kernel/bpf/arraymap.c |  2 +-
 kernel/bpf/verifier.c |  4 +-
 kernel/events/core.c  | 16 ++--
 kernel/trace/bpf_trace.c  | 67 +--
 samples/bpf/trace_event_kern.c| 10 +
 samples/bpf/trace_event_user.c| 13 +++---
 samples/bpf/tracex6_kern.c| 26 
 samples/bpf/tracex6_user.c| 13 +-
 tools/include/uapi/linux/bpf.h|  4 +-
 tools/testing/selftests/bpf/bpf_helpers.h |  6 +++
 12 files changed, 173 insertions(+), 18 deletions(-)

-- 
2.9.5



[PATCH net-next v3 4/4] bpf: add a test case for helper bpf_perf_prog_read_value

2017-09-18 Thread Yonghong Song
The bpf sample program trace_event is enhanced to use the new
helper to print out enabled/running time.

Signed-off-by: Yonghong Song 
---
 samples/bpf/trace_event_kern.c| 10 ++
 samples/bpf/trace_event_user.c| 13 -
 tools/include/uapi/linux/bpf.h|  3 ++-
 tools/testing/selftests/bpf/bpf_helpers.h |  3 +++
 4 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/samples/bpf/trace_event_kern.c b/samples/bpf/trace_event_kern.c
index 41b6115..a77a583d 100644
--- a/samples/bpf/trace_event_kern.c
+++ b/samples/bpf/trace_event_kern.c
@@ -37,10 +37,14 @@ struct bpf_map_def SEC("maps") stackmap = {
 SEC("perf_event")
 int bpf_prog1(struct bpf_perf_event_data *ctx)
 {
+   char time_fmt1[] = "Time Enabled: %llu, Time Running: %llu";
+   char time_fmt2[] = "Get Time Failed, ErrCode: %d";
char fmt[] = "CPU-%d period %lld ip %llx";
u32 cpu = bpf_get_smp_processor_id();
+   struct bpf_perf_event_value value_buf;
struct key_t key;
u64 *val, one = 1;
+   int ret;
 
if (ctx->sample_period < 1)
/* ignore warmup */
@@ -54,6 +58,12 @@ int bpf_prog1(struct bpf_perf_event_data *ctx)
return 0;
}
 
+   ret = bpf_perf_prog_read_value(ctx, (void *)&value_buf, sizeof(struct bpf_perf_event_value));
+   if (!ret)
+ bpf_trace_printk(time_fmt1, sizeof(time_fmt1), value_buf.enabled, value_buf.running);
+   else
+ bpf_trace_printk(time_fmt2, sizeof(time_fmt2), ret);
+
val = bpf_map_lookup_elem(, );
if (val)
(*val)++;
diff --git a/samples/bpf/trace_event_user.c b/samples/bpf/trace_event_user.c
index 7bd827b..bf4f1b6 100644
--- a/samples/bpf/trace_event_user.c
+++ b/samples/bpf/trace_event_user.c
@@ -127,6 +127,9 @@ static void test_perf_event_all_cpu(struct perf_event_attr *attr)
int *pmu_fd = malloc(nr_cpus * sizeof(int));
int i, error = 0;
 
+   /* system wide perf event, no need to inherit */
+   attr->inherit = 0;
+
/* open perf_event on all cpus */
for (i = 0; i < nr_cpus; i++) {
pmu_fd[i] = sys_perf_event_open(attr, -1, i, -1, 0);
@@ -154,6 +157,11 @@ static void test_perf_event_task(struct perf_event_attr *attr)
 {
int pmu_fd;
 
+   /* per task perf event, enable inherit so the "dd ..." command can be
+* traced properly. Enabling inherit will cause the
+* bpf_perf_prog_read_value helper to fail.
+*/
+   attr->inherit = 1;
+
/* open task bound event */
pmu_fd = sys_perf_event_open(attr, 0, -1, -1, 0);
if (pmu_fd < 0) {
@@ -175,14 +183,12 @@ static void test_bpf_perf_event(void)
.freq = 1,
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
-   .inherit = 1,
};
struct perf_event_attr attr_type_sw = {
.sample_freq = SAMPLE_FREQ,
.freq = 1,
.type = PERF_TYPE_SOFTWARE,
.config = PERF_COUNT_SW_CPU_CLOCK,
-   .inherit = 1,
};
struct perf_event_attr attr_hw_cache_l1d = {
.sample_freq = SAMPLE_FREQ,
@@ -192,7 +198,6 @@ static void test_bpf_perf_event(void)
PERF_COUNT_HW_CACHE_L1D |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16),
-   .inherit = 1,
};
struct perf_event_attr attr_hw_cache_branch_miss = {
.sample_freq = SAMPLE_FREQ,
@@ -202,7 +207,6 @@ static void test_bpf_perf_event(void)
PERF_COUNT_HW_CACHE_BPU |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16),
-   .inherit = 1,
};
struct perf_event_attr attr_type_raw = {
.sample_freq = SAMPLE_FREQ,
@@ -210,7 +214,6 @@ static void test_bpf_perf_event(void)
.type = PERF_TYPE_RAW,
/* Intel Instruction Retired */
.config = 0xc0,
-   .inherit = 1,
};
 
printf("Test HW_CPU_CYCLES\n");
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 79eb529..fa1be2c 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -633,7 +633,8 @@ union bpf_attr {
FN(redirect_map),   \
FN(sk_redirect_map),\
FN(sock_map_update),\
-   FN(perf_event_read_value),
+   FN(perf_event_read_value),  \
+   FN(perf_prog_read_value),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h 
b/tools/testing/selftests/bpf/bpf_helpers.h
index c866682..892d785 100644
--- 

Re: [PATCH net-next 3/3] bpf: Test deletion in BPF_MAP_TYPE_LPM_TRIE

2017-09-18 Thread Alexei Starovoitov

On 9/18/17 12:30 PM, Craig Gallek wrote:

From: Craig Gallek 

Extend the 'random' operation tests to include a delete operation
(delete half of the nodes from both lpm implementations and ensure
that lookups are still equivalent).

Also, add a simple IPv4 test which verifies lookup behavior as nodes
are deleted from the tree.

Signed-off-by: Craig Gallek 


Thanks for the tests!
Acked-by: Alexei Starovoitov 




Re: [PATCH net-next 2/3] bpf: Add uniqueness invariant to trivial lpm test implementation

2017-09-18 Thread Alexei Starovoitov

On 9/18/17 12:30 PM, Craig Gallek wrote:

From: Craig Gallek 

The 'trivial' lpm implementation in this test allows equivalent nodes
to be added (that is, nodes consisting of the same prefix and prefix
length).  For lookup operations, this is fine because insertion happens
at the head of the (singly linked) list and the first, best match is
returned.  In order to support deletion, the tlpm data structure must
first enforce uniqueness.  This change modifies the insertion algorithm
to search for equivalent nodes and remove them.  Note: the
BPF_MAP_TYPE_LPM_TRIE already has a uniqueness invariant that is
implemented as node replacement.
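
A minimal sketch of such an insert-with-dedup on the singly linked test
list (hypothetical helper name; node layout as used by the test):

    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>

    struct tlpm_node {
        struct tlpm_node *next;
        size_t n_bits;
        uint8_t key[];
    };

    /* drop any equivalent node before inserting the new one at the head */
    static struct tlpm_node *tlpm_add_unique(struct tlpm_node *list,
                                             struct tlpm_node *node)
    {
        struct tlpm_node **p;

        for (p = &list; *p; p = &(*p)->next) {
            if ((*p)->n_bits == node->n_bits &&
                !memcmp((*p)->key, node->key, (node->n_bits + 7) / 8)) {
                struct tlpm_node *dup = *p;

                *p = dup->next;
                free(dup);
                break;
            }
        }
        node->next = list;
        return node;
    }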

Signed-off-by: Craig Gallek 


Acked-by: Alexei Starovoitov 



Re: [PATCH net-next 1/3] bpf: Implement map_delete_elem for BPF_MAP_TYPE_LPM_TRIE

2017-09-18 Thread Alexei Starovoitov

On 9/18/17 12:30 PM, Craig Gallek wrote:

From: Craig Gallek 

This is a simple non-recursive delete operation.  It prunes paths
of empty nodes in the tree, but it does not try to further compress
the tree as nodes are removed.

Signed-off-by: Craig Gallek 
---
 kernel/bpf/lpm_trie.c | 80 +--
 1 file changed, 77 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
index 1b767844a76f..9d58a576b2ae 100644
--- a/kernel/bpf/lpm_trie.c
+++ b/kernel/bpf/lpm_trie.c
@@ -389,10 +389,84 @@ static int trie_update_elem(struct bpf_map *map,
return ret;
 }

-static int trie_delete_elem(struct bpf_map *map, void *key)
+/* Called from syscall or from eBPF program */
+static int trie_delete_elem(struct bpf_map *map, void *_key)
 {
-   /* TODO */
-   return -ENOSYS;
+   struct lpm_trie *trie = container_of(map, struct lpm_trie, map);
+   struct bpf_lpm_trie_key *key = _key;
+   struct lpm_trie_node __rcu **trim;
+   struct lpm_trie_node *node;
+   unsigned long irq_flags;
+   unsigned int next_bit;
+   size_t matchlen = 0;
+   int ret = 0;
+
+   if (key->prefixlen > trie->max_prefixlen)
+   return -EINVAL;
+
+   raw_spin_lock_irqsave(&trie->lock, irq_flags);
+
+   /* Walk the tree looking for an exact key/length match and keeping
+* track of where we could begin trimming the tree.  The trim-point
+* is the sub-tree along the walk consisting of only single-child
+* intermediate nodes and ending at a leaf node that we want to
+* remove.
+*/
+   trim = &trie->root;
+   node = rcu_dereference_protected(
+   trie->root, lockdep_is_held(&trie->lock));
+   while (node) {
+   matchlen = longest_prefix_match(trie, node, key);
+
+   if (node->prefixlen != matchlen ||
+   node->prefixlen == key->prefixlen)
+   break;


curious why there is no need to do
'node->prefixlen == trie->max_prefixlen' in the above
like update/lookup do?


+
+   next_bit = extract_bit(key->data, node->prefixlen);
+   /* If we hit a node that has more than one child or is a valid
+* prefix itself, do not remove it. Reset the root of the trim
+* path to its descendant on our path.
+*/
+   if (!(node->flags & LPM_TREE_NODE_FLAG_IM) ||
+   (node->child[0] && node->child[1]))
+   trim = &node->child[next_bit];
+   node = rcu_dereference_protected(
+   node->child[next_bit], lockdep_is_held(&trie->lock));
+   }
+
+   if (!node || node->prefixlen != key->prefixlen ||
+   (node->flags & LPM_TREE_NODE_FLAG_IM)) {
+   ret = -ENOENT;
+   goto out;
+   }
+
+   trie->n_entries--;
+
+   /* If the node we are removing is not a leaf node, simply mark it
+* as intermediate and we are done.
+*/
+   if (rcu_access_pointer(node->child[0]) ||
+   rcu_access_pointer(node->child[1])) {
+   node->flags |= LPM_TREE_NODE_FLAG_IM;
+   goto out;
+   }
+
+   /* trim should now point to the slot holding the start of a path from
+* zero or more intermediate nodes to our leaf node for deletion.
+*/
+   while ((node = rcu_dereference_protected(
+   *trim, lockdep_is_held(&trie->lock)))) {
+   RCU_INIT_POINTER(*trim, NULL);
+   trim = rcu_access_pointer(node->child[0]) ?
+   &node->child[0] :
+   &node->child[1];
+   kfree_rcu(node, rcu);


can it be that some of the nodes this loop walks have
both child[0] and [1] ?



Re: [PATCH RFC 6/6] Modify tag_ksz.c to support other KSZ switch drivers

2017-09-18 Thread Andrew Lunn
> In the old DSA implementation each port is partitioned into its own
> device
> and the bridge joining them will do all the forwarding.  This is useful for 
> quick
> testing with some protocols like RSTP but it is probably useless for real
> operation.

It is a good minimal driver, to get something into the kernel. You can
then add features to it.

> The new switchdev model tries to use the switch hardware as much as
> possible.  This offload_fwd_mark bit means the frame is forwarded by the
> hardware switch, so the software bridge does not need to do it again.  Without
> this bit there will be duplicated multicast frames coming out the ports if 
> internal
> forwarding is enabled.

Correct. Once your switch driver is clever enough, you can enable
offload_fwd_mark.
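
A minimal sketch of what that looks like in a tag driver's receive hook
(hypothetical, not the actual tag_ksz.c code):

static struct sk_buff *ksz_rcv(struct sk_buff *skb, struct net_device *dev,
                               struct packet_type *pt,
                               struct net_device *orig_dev)
{
        /* ... parse the tail tag, set skb->dev to the source port ... */

        /* the hardware already forwarded/flooded this frame, so tell
         * the software bridge not to forward it a second time
         */
        skb->offload_fwd_mark = 1;

        return skb;
}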
 
> When RSTP is used the port can be put in blocked state and so the forwarding
> will stop for that port.   Currently the switch driver will check the
> membership
> to decide whether to set that bit.

This i don't get. RSTP or STP just break loops. How does RSTP vs STP
mean you need to set offload_fwd_mark differently?

> The KSZ switches never have a built-in MAC controller, so it is always a 
> support
> issue for us.

Not quite right. Once your drivers are in mainline, it becomes a
support issue for the community, with you being part of the community.
I've helped others fix this issue: one of the less well used Marvell
Ethernet drivers had this problem, and I gave somebody pointers on how to
fix it.

Andrew


Re: [PATCH RFC 6/6] Modify tag_ksz.c to support other KSZ switch drivers

2017-09-18 Thread Andrew Lunn
On Mon, Sep 18, 2017 at 08:55:17PM +, tristram...@microchip.com wrote:
> > > > This is ugly. We have a clean separation between a switch driver and a
> > > > tag driver. Here you are mixing them together. Don't. Look at the
> > > > mailing lists for what Florian and I suggested to Pavel. It will also
> > > > solve your include file issue above.
> > >
> > > It seems to me all tag_*.c are hard-coded.  They do not have access to
> > > the switch device and just use the bit definitions defined at the top to
> > > do the job.
> > >
> > > This creates a problem for all KSZ switch devices as they have different
> > > tail tag formats and lengths.  There will be potentially 4 additional
> > > DSA_TAG_PROT_KSZ* just to handle them.  Extra files will be added
> > > with about the same code.
> > 
> > Hi Tristram
> > 
> > Think about factoring out the common code into a shared functions.
> >
> 
> I am a little unsure what you have in mind.  Can you elaborate?

You say you need 4 DSA_TAG_PROT_KSZ. I guess the code is nearly
identical in them all? If I remember correctly, the two tag bytes are
the other way around?

static struct sk_buff *ksz8k_xmit(struct sk_buff *skb, struct net_device *dev)
{
u8 *tag;

tag = ksz_xmit(skb, dev);
if (!tag)
 return NULL;

tag[0] = 1 << p->dp->index;
tag[1] = 0;

return skb;
}

static struct sk_buff *ksz9k_xmit(struct sk_buff *skb, struct net_device *dev)
{
u8 *tag;

tag = ksz_xmit(skb, dev);
if (!tag)
 return NULL;

tag[0] = 0;
tag[1] = 1 << p->dp->index;

return skb;
}

const struct dsa_device_ops ksz8k_netdev_ops = {
   .xmit   = ksz8k_xmit,
   .rcv= ksz8k_rcv,
};

const struct dsa_device_ops ksz9k_netdev_ops = {
   .xmit   = ksz9k_xmit,
   .rcv= ksz9k_rcv,
};

Andrew


Re: [PATCH net-next 09/12] net: dsa: b53: Wire-up EEE

2017-09-18 Thread Florian Fainelli
On 09/18/2017 03:29 PM, Vivien Didelot wrote:
> Hi Florian,
> 
> Florian Fainelli  writes:
> 
>> @@ -1000,6 +1005,9 @@ static void b53_adjust_link(struct dsa_switch *ds, int 
>> port,
>>  b53_write8(dev, B53_CTRL_PAGE, po_reg, gmii_po);
>>  }
>>  }
>> +
>> +/* Re-negotiate EEE if it was enabled already */
>> +p->eee_enabled = b53_eee_init(ds, port, phydev);
>>  }
> 
> Same here, I think we can move this up to DSA core, maybe with a
> eee_enabled mask in dsa_switch or a bool in dsa_port.

This can be done as a subsequent patch, sure.
-- 
Florian


Re: [PATCH net-next 08/12] net: dsa: b53: Move EEE functions to b53

2017-09-18 Thread Florian Fainelli
On 09/18/2017 03:27 PM, Vivien Didelot wrote:
> Hi Florian,
> 
> Florian Fainelli  writes:
> 
>> @@ -649,7 +595,7 @@ static void bcm_sf2_sw_adjust_link(struct dsa_switch 
>> *ds, int port,
>>  core_writel(priv, reg, offset);
>>  
>>  if (!phydev->is_pseudo_fixed_link)
>> -p->eee_enabled = bcm_sf2_eee_init(ds, port, phydev);
>> +p->eee_enabled = b53_eee_init(ds, port, phydev);
>>  }
> 
> I know this is a bit out-of-scope of this patch, but I have to say I am
> not comfortable with still having phy device stuff in switch drivers...

Yes, this is out of scope :)

> 
> Can this is_pseudo_fixed_link check + phy_eee_init + eee_enable be moved
> up to dsa_slave_adjust_link in a future patch maybe?

Not 100% positive this applies to all switches, which is why this is
still largely a switch driver decision.
-- 
Florian


Re: [PATCH net-next 09/12] net: dsa: b53: Wire-up EEE

2017-09-18 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> @@ -1000,6 +1005,9 @@ static void b53_adjust_link(struct dsa_switch *ds, int 
> port,
>   b53_write8(dev, B53_CTRL_PAGE, po_reg, gmii_po);
>   }
>   }
> +
> + /* Re-negotiate EEE if it was enabled already */
> + p->eee_enabled = b53_eee_init(ds, port, phydev);
>  }

Same here, I think we can move this up to DSA core, maybe with a
eee_enabled mask in dsa_switch or a bool in dsa_port.


Re: [PATCH net-next 08/12] net: dsa: b53: Move EEE functions to b53

2017-09-18 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> @@ -649,7 +595,7 @@ static void bcm_sf2_sw_adjust_link(struct dsa_switch *ds, 
> int port,
>   core_writel(priv, reg, offset);
>  
>   if (!phydev->is_pseudo_fixed_link)
> - p->eee_enabled = bcm_sf2_eee_init(ds, port, phydev);
> + p->eee_enabled = b53_eee_init(ds, port, phydev);
>  }

I know this is a bit out-of-scope of this patch, but I have to say I am
not comfortable with still having phy device stuff in switch drivers...

Can this is_pseudo_fixed_link check + phy_eee_init + eee_enable be moved
up to dsa_slave_adjust_link in a future patch maybe?

Thanks,

Vivien


Re: [PATCH net-next 06/12] net: dsa: b53: Move Broadcom header setup to b53

2017-09-18 Thread Vivien Didelot
Florian Fainelli  writes:

> The code to enable Broadcom tags/headers is largely switch independent,
> and in preparation for enabling it for multiple devices with b53, move
> the code we have in bcm_sf2.c to b53_common.c
>
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot 


Re: [PATCH net-next 05/12] net: dsa: b53: Use a macro to define I/O operations

2017-09-18 Thread Vivien Didelot
Hi Florian,

Florian Fainelli  writes:

> Instead of repeating the same pattern: acquire mutex, read/write, release
> mutex, define a macro: b53_build_op() which takes the type (read|write), I/O
> size, and value (scalar or pointer). This helps with fixing bugs that could
> exist (e.g. missing barrier, lock etc.).
>
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot 
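
For readers unfamiliar with the idiom, the macro expands to locked
accessors roughly like this (simplified sketch, abbreviated from the
actual patch):

#define b53_build_op(type_op_size, val_type)                           \
static inline int b53_##type_op_size(struct b53_device *dev,           \
                                     u8 page, u8 reg, val_type val)    \
{                                                                      \
        int ret;                                                       \
                                                                       \
        mutex_lock(&dev->reg_mutex);                                   \
        ret = dev->ops->type_op_size(dev, page, reg, val);             \
        mutex_unlock(&dev->reg_mutex);                                 \
                                                                       \
        return ret;                                                    \
}

b53_build_op(read8, u8 *);
b53_build_op(write8, u8);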


Re: [PATCH 2/3] selftests: actually run the various net selftests

2017-09-18 Thread Shuah Khan
On 09/18/2017 11:32 AM, jo...@toxicpanda.com wrote:
> From: Josef Bacik 
> 
> These self tests are just self-contained binaries; they are not run by
> any of the scripts in the directory.  This means they need to be marked
> with TEST_GEN_PROGS to actually be run, not TEST_GEN_FILES.
> 
> Signed-off-by: Josef Bacik 
> ---
>  tools/testing/selftests/net/Makefile | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/testing/selftests/net/Makefile 
> b/tools/testing/selftests/net/Makefile
> index 3df542c84610..45a4e77a47c4 100644
> --- a/tools/testing/selftests/net/Makefile
> +++ b/tools/testing/selftests/net/Makefile
> @@ -6,8 +6,8 @@ CFLAGS += -I../../../../usr/include/
>  TEST_PROGS := run_netsocktests run_afpackettests test_bpf.sh netdevice.sh rtnetlink.sh
>  TEST_GEN_FILES =  socket
>  TEST_GEN_FILES += psock_fanout psock_tpacket
> -TEST_GEN_FILES += reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
> -TEST_GEN_FILES += reuseport_dualstack msg_zerocopy reuseaddr_conflict
> +TEST_GEN_PROGS += reuseport_bpf reuseport_bpf_cpu reuseport_bpf_numa
> +TEST_GEN_PROGS += reuseport_dualstack msg_zerocopy reuseaddr_conflict

Hmm. I see msg_zerocopy.sh for running msg_zerocopy. msg_zerocopy should
still stay in TEST_GEN_FILES and msg_zerocopy.sh needs to be added to
TEST_PROGS so it runs.

thanks,
-- Shuah


Re: RACK not getting disabled

2017-09-18 Thread Yuchung Cheng
On Mon, Sep 18, 2017 at 2:55 PM, hiren panchasara
 wrote:
> On 09/18/17 at 02:46P, Yuchung Cheng wrote:
>> On Mon, Sep 18, 2017 at 2:29 PM, hiren panchasara
>>  wrote:
>> > On 09/18/17 at 02:18P, Eric Dumazet wrote:
>> >> On Mon, 2017-09-18 at 13:14 -0700, hiren panchasara wrote:
>> >> > Hi all, I am trying to disable rack to see 3dupacks in action during
>> >> > loss-detection but based on the pcap, I see that it still triggers
>> >> > loss-recovery on the first SACK (as if RACK is still enabled/active).
>> just to be clear: 3-dupack (aka RFC3517) is still enabled with RACK
>> enabled. I am experimenting a patch set to disable 3-dupack approach
>> completely.
>
> So any incoming packet undergoes both checks right now to decide whether
> to mark it lost based on 3-dupacks (and eventually rfc6675) and also
> rack? Any insights into how they are working together would be great.
>
> Also whichever scheme detects loss first can kick connection into
> loss-recovery, right?
Yes, essentially we run both algorithms. The recovery starts when any
packet is deemed lost.
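
Roughly, in pseudo-C with hypothetical function names (the real logic
lives in tcp_input.c and tcp_recovery.c):

        /* run on every incoming ACK */
        if (rack_detect_loss(tp) || dupack_threshold_hit(tp))
                tcp_enter_recovery(sk); /* whichever fires first wins */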

>
> Thanks for the clarification, Yuchung.
>
> Cheers,
> Hiren


Re: [PATCH net-next 04/12] net: dsa: bcm_sf2: Defer port enabling to calling port_enable

2017-09-18 Thread Vivien Didelot
Florian Fainelli  writes:

> There is no need to configure the enabled ports once in bcm_sf2_sw_setup() and
> then a second time around when dsa_switch_ops::port_enable is called, just do
> it when port_enable is called which is better in terms of power consumption 
> and
> correctness.
>
> Signed-off-by: Florian Fainelli 

Reviewed-by: Vivien Didelot 


Re: [PATCH v2] net/ethernet/freescale: fix warning for ucc_geth

2017-09-18 Thread David Miller
From: Valentin Longchamp 
Date: Fri, 15 Sep 2017 07:58:47 +0200

> uf_info.regs is resource_size_t i.e. phys_addr_t that can be either u32
> or u64 according to CONFIG_PHYS_ADDR_T_64BIT.
> 
> The printk format is thus adapted to u64 and the regs value cast to u64
> to take both u32 and u64 into account.
> 
> Signed-off-by: Valentin Longchamp 

Applied to net-next, thanks.


Re: [PATCH net-next 03/12] net: dsa: b53: Defer port enabling to calling port_enable

2017-09-18 Thread Vivien Didelot
Florian Fainelli  writes:

> There is no need to configure the enabled ports once in b53_setup() and then a
> second time around when dsa_switch_ops::port_enable is called, just do it when
> port_enable is called which is better in terms of power consumption and
> correctness.
>
> Signed-off-by: Florian Fainelli 

Great, my next step is to move up the ports enabling/disabling within
DSA core. This patch helps going in that direction, thanks.

Reviewed-by: Vivien Didelot 


Re: RACK not getting disabled

2017-09-18 Thread hiren panchasara
On 09/18/17 at 02:46P, Yuchung Cheng wrote:
> On Mon, Sep 18, 2017 at 2:29 PM, hiren panchasara
>  wrote:
> > On 09/18/17 at 02:18P, Eric Dumazet wrote:
> >> On Mon, 2017-09-18 at 13:14 -0700, hiren panchasara wrote:
> >> > Hi all, I am trying to disable rack to see 3dupacks in action during
> >> > loss-detection but based on the pcap, I see that it still triggers
> >> > loss-recovery on the first SACK (as if RACK is still enabled/active).
> just to be clear: 3-dupack (aka RFC3517) is still enabled with RACK
> enabled. I am experimenting a patch set to disable 3-dupack approach
> completely.

So any incoming packet undergoes both checks right now to decide whether
to mark it lost based on 3-dupacks (and eventually rfc6675) and also
rack? Any insights into how they are working together would be great.

Also whichever scheme detects loss first can kick connection into
loss-recovery, right?

Thanks for the clarification, Yuchung.

Cheers,
Hiren




Re: [PATCH 1/1] ipv6_skip_exthdr: use ipv6_authlen for AH header length computation

2017-09-18 Thread David Miller
From: Xiang Gao 
Date: Fri, 15 Sep 2017 01:04:27 -0400

> From 09cf2e3cf09cf591283785aaa8159baf39ac2e08 Mon Sep 17 00:00:00 2001
> From: Xiang Gao 
> Date: Fri, 15 Sep 2017 00:44:12 -0400
> Subject: [PATCH] ipv6_skip_exthdr: use ipv6_authlen for AH hdrlen
> 
> In ipv6_skip_exthdr, the length of the AH header is computed manually
> as (hp->hdrlen+2)<<2. However, in include/linux/ipv6.h, a macro
> named ipv6_authlen is already defined for exactly the same job. This
> commit replaces the manual computation code with the macro.
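
For reference, the macro in include/linux/ipv6.h is defined as:

        #define ipv6_authlen(p) (((p)->hdrlen+2) << 2)

so the replacement is purely a readability cleanup.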

Your patch was whitespace corrupted by your email client.

