Re: [PATCH RFC net-next] net: Poptrie based routing table lookup

2018-09-04 Thread Jesper Dangaard Brouer
On Tue, 4 Sep 2018 16:34:36 -0400
"Md. Islam"  wrote:

> On Tue, Sep 4, 2018 at 12:14 PM, Md. Islam  wrote:
> >
> > On Tue, Sep 4, 2018, 6:53 AM Jesper Dangaard Brouer 
> > wrote:  
> >>
> >> Hi Md. Islam,
> >>
> >> People will start to ignore you when you don't interact appropriately
> >> with the community and ignore their advice, especially when it is
> >> about how to interact with the community[1].
> >>
> >> You have not addressed any of my feedback on your patch in [1].
> >>  [1]
> >> http://www.mail-archive.com/search?l=mid=20180827173334.16ff0...@redhat.com
> >>   
> >
> >
> > Jesper,
> >
> > I actually addressed all the feedback on the previous patch except TOS,
> > FIB metrics, etc. This is because I don't think they are relevant in
> > this use case. Please let me know if I am wrong.
> >
> > Thanks  
> 
> Jesper
> 
> Sorry, I missed your review in the first place. I will take a look and
> resubmit the patch.

Good that you actually noticed yourself that you did not address any
of my feedback.  I don't want to repeat myself, so you just need to
follow the above link and the link below (coding style + checkpatch.pl).


> >>
> >>
> >>
> >> --
> >> Best regards,
> >>   Jesper Dangaard Brouer
> >>   MSc.CS, Principal Kernel Engineer at Red Hat
> >>   LinkedIn: http://www.linkedin.com/in/brouer
> >>
> >> p.s. also top-posting is bad, but I suspect you will not read my
> >> response if I don't top-post.
> >>
> >>
> >> On Tue, 4 Sep 2018 01:02:30 -0400 "Md. Islam"  wrote:
> >>  
> >> > This patch implements Poptrie based routing table
> >> > lookup/insert/delete/flush. Currently many carrier routers use kernel
> >> > bypass frameworks such as DPDK and VPP to implement the data plane.
> >> > XDP along with this patch will enable Linux to work as such a router.
> >> > Currently it supports up to 255 ports. Many real-world backbone routers
> >> > have up to 233 ports (to the best of my knowledge), so it seems to be
> >> > sufficient at this moment.
> >> >
> >> > I have also attached a draft paper that explains how it works (poptrie.pdf).
> >> > Please set CONFIG_FIB_POPTRIE=y (default n) before testing the patch.
> >> > Note that poptrie_lookup() is not being called from anywhere. It will
> >> > be used by XDP forwarding.
> >> >
> >> >
> >> > From 3dc9683298ed896dd3080733503c35d68f05370e Mon Sep 17 00:00:00 2001
> >> > From: tamimcse 
> >> > Date: Mon, 3 Sep 2018 23:56:43 -0400
> >> > Subject: [PATCH] Poptrie based routing table lookup
> >> >
> >> > Signed-off-by: tamimcse 
> >> > ---
> >> >  include/net/ip_fib.h   |  42 +
> >> >  net/ipv4/Kconfig       |   4 +
> >> >  net/ipv4/Makefile      |   1 +
> >> >  net/ipv4/fib_poptrie.c | 483 +
> >> >  net/ipv4/fib_trie.c    |  12 ++
> >> >  5 files changed, 542 insertions(+)
> >> >  create mode 100644 net/ipv4/fib_poptrie.c  
> >>
> >> First order of business: you need to conform to the kernel's coding
> >> standards!
> >>
> >> https://www.kernel.org/doc/html/v4.18/process/coding-style.html
> >>
> >> There is a script avail to check this called: scripts/checkpatch.pl
> >> Its summary says:
> >>  total: 139 errors, 238 warnings, 6 checks, 372 lines checked
> >> (Not good, more errors+warnings than lines...)
> >>
> >> Please fix those up... else people will not even read your code!
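To make the data structure under discussion concrete, here is a minimal userspace sketch of a poptrie-style lookup. It assumes a simplified layout that is not taken from the patch: each node consumes 6 address bits, a 64-bit vector bitmap marks slots that point to child nodes, a 64-bit leafvec bitmap marks where each run of identical leaves starts, and a population count over those bitmaps turns a slot number into an array index. The struct, the function names and the example table are hypothetical, the code relies on the GCC/Clang __builtin_popcountll() builtin, and refinements of the real design (such as a direct-pointing top level) are omitted.

```c
#include <stdint.h>

/* Hypothetical, simplified poptrie-style node: 64-way fan-out per level. */
struct pt_node {
	uint64_t vector;  /* bit v set: slot v points to an internal child */
	uint64_t leafvec; /* bit v set: a new run of identical leaves starts at v */
	uint32_t base0;   /* index of this node's first leaf in leaves[] */
	uint32_t base1;   /* index of this node's first child in nodes[] */
};

/* 6-bit chunk of a 32-bit address starting off bits from the MSB;
 * the final chunk (off == 30) is zero-padded on the right. */
static unsigned int pt_chunk(uint32_t addr, unsigned int off)
{
	return (unsigned int)((((uint64_t)addr << off) >> 26) & 0x3f);
}

/* Number of set bits in vec at positions 0..v inclusive. */
static unsigned int pt_popcnt_upto(uint64_t vec, unsigned int v)
{
	/* For v == 63, 2ULL << 63 wraps to 0 and the mask becomes all ones. */
	return (unsigned int)__builtin_popcountll(vec & ((2ULL << v) - 1));
}

/* Longest-prefix-match lookup returning the next hop (e.g. a port number).
 * Assumes a well-formed trie: at most 32 bits deep, and bit 0 of every
 * node's leafvec set so the popcount below is never 0. */
static uint8_t pt_lookup(const struct pt_node *nodes, const uint8_t *leaves,
			 uint32_t addr)
{
	const struct pt_node *n = &nodes[0];
	unsigned int off = 0;
	unsigned int v = pt_chunk(addr, off);

	/* Descend while the current slot refers to an internal node. */
	while (n->vector & (1ULL << v)) {
		n = &nodes[n->base1 + pt_popcnt_upto(n->vector, v) - 1];
		off += 6;
		v = pt_chunk(addr, off);
	}
	/* Leaf slot: count the run starts up to v to find the leaf index. */
	return leaves[n->base0 + pt_popcnt_upto(n->leafvec, v) - 1];
}
```

For example, with a root whose vector has bit 2 set (one child covering 8.0.0.0/6) and whose leafvec has bits 0 and 32 set, 12.0.0.1 resolves to the root's first leaf, 192.168.1.1 to its second leaf, and 10.0.0.1 descends into the child before resolving. Each step is a mask, a popcount and an array access, so the cost is bounded by the number of 6-bit levels (at most six for IPv4).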




[GIT] net merged into net-next

2018-09-04 Thread David Miller


Just FYI...


Re: [PATCH rdma-next v1 12/15] RDMA/mlx5: Add a new flow action verb - modify header

2018-09-04 Thread Leon Romanovsky
On Tue, Sep 04, 2018 at 03:58:23PM -0600, Jason Gunthorpe wrote:
> On Tue, Aug 28, 2018 at 02:18:51PM +0300, Leon Romanovsky wrote:
>
> > +static int UVERBS_HANDLER(MLX5_IB_METHOD_FLOW_ACTION_CREATE_MODIFY_HEADER)(
> > +   struct ib_uverbs_file *file,
> > +   struct uverbs_attr_bundle *attrs)
> > +{
> > +   struct ib_uobject *uobj = uverbs_attr_get_uobject(
> > +   attrs, MLX5_IB_ATTR_CREATE_MODIFY_HEADER_HANDLE);
> > +   struct mlx5_ib_dev *mdev = to_mdev(uobj->context->device);
> > +   enum mlx5_ib_uapi_flow_table_type ft_type;
> > +   struct ib_flow_action *action;
> > +   size_t num_actions;
> > +   void *in;
> > +   int len;
> > +   int ret;
> > +
> > +   if (!mlx5_ib_modify_header_supported(mdev))
> > +   return -EOPNOTSUPP;
> > +
> > +   in = uverbs_attr_get_alloced_ptr(attrs,
> > +   MLX5_IB_ATTR_CREATE_MODIFY_HEADER_ACTIONS_PRM);
> > +   len = uverbs_attr_get_len(attrs,
> > +   MLX5_IB_ATTR_CREATE_MODIFY_HEADER_ACTIONS_PRM);
> > +
> > +   if (len % MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto))
> > +   return -EINVAL;
> > +
> > +   ret = uverbs_get_const(&ft_type, attrs,
> > +  MLX5_IB_ATTR_CREATE_MODIFY_HEADER_FT_TYPE);
> > +   if (ret)
> > +   return -EINVAL;
>
> This should be
>
>   if (ret)
>   return ret;
>
> Every call to uverbs_get_const is wrong in this same way.

Right, from a technical point of view uverbs_get_const can only return
EINVAL, so it is correct for now, but it needs to be changed to a proper
"return ret".

>
> I can probably fix it if this is the only thing though..
>

Thanks, I appreciate it.

> Jason
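To make the review point concrete, here is a small userspace sketch in which a hypothetical get_const_sketch() stands in for uverbs_get_const() and can fail with more than one error code. Collapsing every failure into -EINVAL hides the real cause from the caller, while returning ret preserves it and stays correct if the helper later grows new error codes.

```c
#include <errno.h>

/* Hypothetical stand-in for uverbs_get_const(): fails with -ENOENT when the
 * attribute is absent and -EINVAL when its value is out of range. */
static int get_const_sketch(int *out, int present, int value, int max)
{
	if (!present)
		return -ENOENT;
	if (value < 0 || value > max)
		return -EINVAL;
	*out = value;
	return 0;
}

/* Pattern flagged in review: every failure becomes -EINVAL. */
static int handler_masking(int present, int value)
{
	int ft_type;
	int ret = get_const_sketch(&ft_type, present, value, 3);

	if (ret)
		return -EINVAL;
	return ft_type;
}

/* Suggested pattern: propagate the helper's own error code. */
static int handler_propagating(int present, int value)
{
	int ft_type;
	int ret = get_const_sketch(&ft_type, present, value, 3);

	if (ret)
		return ret;
	return ft_type;
}
```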




Re: [PATCH rdma-next v1 00/15] Flow actions to mutate packets

2018-09-04 Thread Leon Romanovsky
On Tue, Sep 04, 2018 at 04:12:05PM -0600, Jason Gunthorpe wrote:
> On Tue, Aug 28, 2018 at 02:18:39PM +0300, Leon Romanovsky wrote:
> > From: Leon Romanovsky 
> >
> > From Mark,
> >
> > This series exposes the ability to create flow actions which can
> > mutate packet headers. We do that by exposing two new verbs:
> >  * modify header - can change existing packet headers.
> >  * packet reformat - can encapsulate or decapsulate a packet.
> >   Once created a flow action must be attached to a steering
> >   rule for it to take effect.
> >
> > The first 10 patches refactor mlx5_core code, rename internal structures
> > to better reflect their operation and export needed functions so the
> > RDMA side can allocate the action.
> >
> > The last 5 patches expose via the IOCTL infrastructure mlx5_ib methods
> > which do the actual allocation of resources and return a handle to the
> > user. A user of this API is expected to know how to work with the
> > device's spec, as the input to those functions is HW dependent.
> >
> > An example usage of the modify header action is routing: a user can
> > create an action which edits the L2 header and decreases the TTL.
> >
> > An example usage of the packet reformat action is VXLAN encap/decap
> > which is done by the HW.
> >
> > Changelog:
> >  v0 -> v1:
> >   * Patch 1: Addressed Saeed's comments and simplified the logic.
> >   * Patch 2: Changed due to changes in patch 1.
> >
> >  Split the 27-patch series into 3; this is the first one,
> >  which just lets the user create flow actions.
> >  Other than that, styling fixes, mainly in the RDMA patches,
> >  to make sure the 80-char limit isn't exceeded.
> >
> >  RFC -> v0:
> >   * Patch 1: a new patch which refactors the logic
> > when getting a flow namespace.
> >   * Patch 2 was split into two.
> >   * Patch 3: Fixed a typo in commit message
> >   * Patch 5: Updated commit message
> >   * Patch 7: Updated commit message
> > Renamed:
> >   - MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT_ID to
> > MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT
> >   - packet_reformat_id to reformat_id in struct mlx5_flow_act
> >   - packet_reformat_id to encap_id in struct mlx5_esw_flow_attr
> >   - packet_reformat_id to encap_id in struct mlx5e_encap_entry
> >   - PACKET_REFORMAT to REFORMAT when printing trace points
> >   * Patch 9: Updated commit message
> > Updated function declaration in mlx5_core.h, which could have led
> > to a compile error on bisection.
> >   * Patch 11: Disallow egress rules insertion when in switchdev mode
> >   * Patch 12: A new patch to deal with passing enum values using
> > the IOCTL infrastructure.
> >   * Patch 13: Use new enum value attribute when passing enum
> > mlx5_ib_uapi_flow_table_type
> >   * Patch 15: Don't set encap flags on flow tables if in switchdev mode
> >   * Patch 17: Use new enum value attribute when passing enum
> > mlx5_ib_uapi_flow_table_type and enum
> > mlx5_ib_uapi_flow_action_packet_reformat_type
> >   * Patch 19: Allow creation of both
> > MLX5_IB_UAPI_FLOW_ACTION_PACKET_REFORMAT_TYPE_L2_TO_L3_TUNNEL
> > and MLX5_IB_UAPI_FLOW_ACTION_PACKET_REFORMAT_TYPE_L3_TUNNEL_TO_L2
> > packet reformat actions.
> >   * Patch 20: A new patch which allows attaching packet reformat
> > actions to flow tables on NIC RX.
> >
> > Thanks
> >
> > Mark Bloch (15):
> >   net/mlx5: Cleanup flow namespace getter switch logic
> >   net/mlx5: Add proper NIC TX steering flow tables support
> >   net/mlx5: Export modify header alloc/dealloc functions
> >   net/mlx5: Add support for more namespaces when allocating modify
> > header
> >   net/mlx5: Break encap/decap into two separated flow table creation
> > flags
> >   net/mlx5: Move header encap type to IFC header file
> >   {net, RDMA}/mlx5: Rename encap to reformat packet
> >   net/mlx5: Expose new packet reformat capabilities
> >   net/mlx5: Pass a namespace for packet reformat ID allocation
> >   net/mlx5: Export packet reformat alloc/dealloc functions
> >   RDMA/uverbs: Add UVERBS_ATTR_CONST_IN to the specs language
> >   RDMA/mlx5: Add a new flow action verb - modify header
> >   RDMA/uverbs: Add generic function to fill in flow action object
> >   RDMA/mlx5: Add new flow action verb - packet reformat
> >   RDMA/mlx5: Extend packet reformat verbs
> >
> >  drivers/infiniband/core/uverbs_ioctl.c             |  23 ++
> >  .../infiniband/core/uverbs_std_types_flow_action.c |   7 +-
> >  drivers/infiniband/hw/mlx5/devx.c                  |   6 +-
> >  drivers/infiniband/hw/mlx5/flow.c                  | 301 +
> >  drivers/infiniband/hw/mlx5/main.c                  |   3 +
> >  drivers/infiniband/hw/mlx5/mlx5_ib.h               |  19 +-
> >  drivers/net/ethernet/mellanox/mlx5/core/cmd.c      |   8 +-
> >  .../mellanox/mlx5/core/diag/fs_tracepoint.h        |   2 +-
> >  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c    |  51 ++--
> >  
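As an illustration of the routing use case in the cover letter (decrementing the TTL), here is a software sketch of what such a header rewrite does to an IPv4 header, including the incremental checksum update HC' = ~(~HC + ~m + m') from RFC 1624. It is an analogy only: the function names are hypothetical, the L2 rewrite is omitted, and the mlx5 action itself is programmed with device-specific input rather than with code like this.

```c
#include <stddef.h>
#include <stdint.h>

/* One's complement checksum over an IPv4 header of even length,
 * read as big-endian 16-bit words. Returns 0 for a header whose
 * checksum field is already correct. */
static uint16_t ip_csum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Decrement the TTL (byte 8) and patch the checksum (bytes 10-11)
 * incrementally; m is the 16-bit word holding TTL and protocol.
 * Returns -1 without touching the header if the TTL would expire. */
static int ipv4_dec_ttl(uint8_t *hdr)
{
	uint16_t old_word, new_word, hc;
	uint32_t sum;

	if (hdr[8] <= 1)
		return -1;
	old_word = (uint16_t)(hdr[8] << 8 | hdr[9]);
	hdr[8]--;
	new_word = (uint16_t)(hdr[8] << 8 | hdr[9]);
	hc = (uint16_t)(hdr[10] << 8 | hdr[11]);

	/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m') */
	sum = (uint32_t)(uint16_t)~hc + (uint16_t)~old_word + new_word;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	hc = (uint16_t)~sum;
	hdr[10] = (uint8_t)(hc >> 8);
	hdr[11] = (uint8_t)(hc & 0xff);
	return 0;
}
```

Updating the checksum incrementally touches only the changed word, instead of re-summing the whole header for every forwarded packet.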

Re: [PATCH mlx5-next v1 05/15] net/mlx5: Break encap/decap into two separated flow table creation flags

2018-09-04 Thread Leon Romanovsky
On Tue, Sep 04, 2018 at 04:02:42PM -0600, Jason Gunthorpe wrote:
> On Tue, Aug 28, 2018 at 02:18:44PM +0300, Leon Romanovsky wrote:
> > From: Mark Bloch 
> >
> > Today we are able to attach encap and decap actions only to the FDB. In
> > preparation to enable those actions on the NIC flow tables, break the
> > single flag into two. Those flags control whether decap or encap
> > operations can be attached to the created flow table. For FDB, if
> > encapsulation is required, we set both of them.
> >
> > Signed-off-by: Mark Bloch 
> > Reviewed-by: Saeed Mahameed 
> > Reviewed-by: Or Gerlitz 
> > Signed-off-by: Leon Romanovsky 
> >  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 3 ++-
> >  drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   | 7 ---
> >  include/linux/mlx5/fs.h| 3 ++-
> >  3 files changed, 8 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> > index f72b5c9dcfe9..ff21807a0c4b 100644
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> > @@ -529,7 +529,8 @@ static int esw_create_offloads_fast_fdb_table(struct mlx5_eswitch *esw)
> > esw_size >>= 1;
> >
> > if (esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE)
> > -   flags |= MLX5_FLOW_TABLE_TUNNEL_EN;
> > +   flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_ENCAP |
> > + MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
> >
> > fdb = mlx5_create_auto_grouped_flow_table(root_ns, FDB_FAST_PATH,
> >   esw_size,
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
> > index 9ae777e56529..1698f325a21e 100644
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
> > @@ -152,7 +152,8 @@ static int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
> >   struct mlx5_flow_table *next_ft,
> >   unsigned int *table_id, u32 flags)
> >  {
> > -   int en_encap_decap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN);
> > +   int en_encap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN_ENCAP);
> > +   int en_decap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
>
> Yuk, please don't use !!.
>
>   bool en_decap = flags & MLX5_FLOW_TABLE_TUNNEL_EN_DECAP;

We need to provide en_encap and en_decap as an input to MLX5_SET(...)
which is passed to FW as 0 or 1. Boolean type is declared in C as int
and treated as zero for false and any other value for true, so we
can't pass "bool en_decap" without ensuring before that it is 1.

I'm applying this patch as is.

Thanks

>
> Jason
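For reference, a small sketch of the two idioms under discussion, using hypothetical flag values rather than the real MLX5_FLOW_TABLE_TUNNEL_EN_* definitions. Since C99, converting any nonzero scalar to bool (_Bool) yields exactly 1, so both functions below return 0 or 1; only a plain int assignment of the masked flags would produce another value (2 for the decap bit here).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits, standing in for MLX5_FLOW_TABLE_TUNNEL_EN_*. */
#define SKETCH_TUNNEL_EN_ENCAP (1u << 0)
#define SKETCH_TUNNEL_EN_DECAP (1u << 1)

/* Normalize with the double-negation idiom: result is 0 or 1 as an int. */
static int decap_bangbang(uint32_t flags)
{
	return !!(flags & SKETCH_TUNNEL_EN_DECAP);
}

/* Normalize through bool: any nonzero masked value is stored as 1. */
static int decap_bool(uint32_t flags)
{
	bool en_decap = flags & SKETCH_TUNNEL_EN_DECAP;

	return en_decap;
}
```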




[PATCH iproute2] bridge/mdb: fix missing new line when show bridge mdb

2018-09-04 Thread Hangbin Liu
The "bridge mdb show" output is broken on current iproute2, e.g.:
]# bridge mdb show
34: br0  veth0_br  224.1.1.2  temp 34: br0  veth0_br  224.1.1.1  temp

After fix:
]# bridge mdb show
34: br0  veth0_br  224.1.1.2  temp
34: br0  veth0_br  224.1.1.1  temp

Reported-by: Ying Xu 
Fixes: c7c1a1ef51aea ("bridge: colorize output and use JSON print library")
Signed-off-by: Hangbin Liu 
---
 bridge/mdb.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/bridge/mdb.c b/bridge/mdb.c
index f38dc67..d89c065 100644
--- a/bridge/mdb.c
+++ b/bridge/mdb.c
@@ -107,6 +107,10 @@ static void br_print_router_ports(FILE *f, struct rtattr *attr,
fprintf(f, "%s ", port_ifname);
}
}
+
+   if (!is_json_context() && !show_stats)
+   fprintf(f, "\n");
+
close_json_array(PRINT_JSON, NULL);
 }
 
@@ -164,6 +168,10 @@ static void print_mdb_entry(FILE *f, int ifindex, const struct br_mdb_entry *e,
print_string(PRINT_ANY, "timer", " %s",
 format_timer(timer));
}
+
+   if (!is_json_context())
+   fprintf(f, "\n");
+
close_json_object();
 }
 
-- 
2.5.5
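A minimal model of the bug being fixed: if each entry is printed without a trailing newline in plain-text mode, consecutive entries run together on one line, while JSON output must not get the extra newline. The function below is a hypothetical stand-in for print_mdb_entry() that writes into a buffer so the result can be checked, with a json flag standing in for is_json_context().

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for print_mdb_entry(): appends one entry to buf
 * and, as the patch does, ends the line only in plain-text mode. */
static void print_entry_sketch(char *buf, size_t size, bool json, int ifindex,
			       const char *dev, const char *port,
			       const char *grp)
{
	size_t len = strlen(buf);

	if (len >= size)
		return;
	snprintf(buf + len, size - len, "%d: %s  %s  %s  temp",
		 ifindex, dev, port, grp);
	/* The fix: terminate the line, but not when producing JSON. */
	if (!json) {
		len = strlen(buf);
		if (len + 1 < size) {
			buf[len] = '\n';
			buf[len + 1] = '\0';
		}
	}
}
```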



Re: [PATCH bpf-next] bpf/verifier: properly clear union members after a ctx read

2018-09-04 Thread Alexei Starovoitov
On Tue, Sep 04, 2018 at 03:19:52PM +0100, Edward Cree wrote:
> In check_mem_access(), for the PTR_TO_CTX case, after check_ctx_access()
>  has supplied a reg_type, the other members of the register state are set
>  appropriately.  Previously reg.range was set to 0, but as it is in a
>  union with reg.map_ptr, which is larger, upper bytes of the latter were
>  left in place.  This then caused the memcmp() in regsafe() to fail,
>  preventing some branches from being pruned (and occasionally causing the
>  same program to take a varying number of processed insns on repeated
>  verifier runs).
> 
> Signed-off-by: Edward Cree 
> ---
> Possibly something might need adding to __mark_reg_unknown() as well to
>  clear map_ptr/range, I'm not sure (though doing so did not affect the
>  processed insn count on the cilium programs).
> 
>  kernel/bpf/verifier.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index f4ff0c569e54..49e4ea66fdd3 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -1640,9 +1640,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
>   else
>   mark_reg_known_zero(env, regs,
>   value_regno);
> - regs[value_regno].id = 0;
> - regs[value_regno].off = 0;
> - regs[value_regno].range = 0;
> + /* Clear id, off, and union(map_ptr, range) */
> + memset(regs + value_regno, 0,
> +offsetof(struct bpf_reg_state, var_off));
>   regs[value_regno].type = reg_type;
>   }

Awesome! Thanks a bunch for tracking it down.
I vaguely remember thinking about overlapping map_ptr with other fields
and not clearing map_ptr explicitly, because it was unnecessary...
Doing a bit of git-archaeology...
Looks like commit f1174f77b50c ("bpf/verifier: rework value tracking")
removed 'imm' from that union, so the __mark_reg_unknown_value() call
that used to clear both 'imm' and 'map_ptr' no longer happened.
So old sequence:
  mark_reg_unknown_value_and_range(); // which called __mark_reg_unknown_value()
  //  which cleared 'imm' (and id|off|range)
  state->regs[value_regno].type = reg_type;
got replaced with
  mark_reg_known_zero();
  state->regs[value_regno].id = 0;
  state->regs[value_regno].off = 0;
  state->regs[value_regno].range = 0;
  state->regs[value_regno].type = reg_type;
which made map_ptr contain junk in upper bits.
I bet the comment "note that reg.[id|off|range] == 0" a few lines before
that, which was deleted by that commit, caused that bug :)
I added that comment as part of commit 969bf05eb3ce ("bpf: direct packet access").
What I was trying to express in that comment was that
"mark_reg_unknown_value(), which is called right before that comment,
also clears id|off|range, since they are part of the bigger 'imm' field
that mark_reg_unknown_value() clears, so these three fields don't need
to be cleared separately".
Sorry for the confusion that comment caused and for the painful debugging.

So would you agree it's fair to add
Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
?

Also I think it's better to do this memset() in both
__mark_reg_unknown() and in __mark_reg_known()
instead of open coding it here in check_mem_access().

While at it we would also need to adjust this:
static void __mark_reg_const_zero(struct bpf_reg_state *reg)
{
__mark_reg_known(reg, 0);
reg->off = 0;
reg->type = SCALAR_VALUE;
}
since the line reg->off = 0; would be redundant once memset() is added,
and the same applies to a few other places.
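A userspace sketch of that suggestion, assuming a hypothetical, cut-down register struct (the real struct bpf_reg_state has more members and its own field order): the mark helper clears everything that precedes var_off with one memset(), which covers type, the padding hole, the whole range/map_ptr union, id and off. Two registers marked this way then compare equal under memcmp() regardless of what they held before, and a separate reg->off = 0 becomes unnecessary.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, cut-down register state; everything before var_off
 * is cleared in one go by the mark helper. */
struct reg_sketch {
	int type;
	union {
		uint16_t range;
		void *map_ptr;
	};
	uint32_t id;
	int32_t off;
	uint64_t var_off; /* first field that is set rather than wiped */
	int64_t smin_value;
};

/* Sketch of __mark_reg_known(): wipe type, the union (all 8 bytes, not
 * just the 2 bytes of range), id and off, then record the constant. */
static void mark_reg_known_sketch(struct reg_sketch *reg, uint64_t imm)
{
	memset(reg, 0, offsetof(struct reg_sketch, var_off));
	reg->var_off = imm;
	reg->smin_value = (int64_t)imm;
}

/* Sketch of __mark_reg_const_zero(): no separate reg->off = 0 needed. */
static void mark_reg_const_zero_sketch(struct reg_sketch *reg, int scalar_type)
{
	mark_reg_known_sketch(reg, 0);
	reg->type = scalar_type;
}
```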

btw the 4 byte hole:
enum bpf_reg_type          type;                 /*     0     4 */
/* XXX 4 bytes hole, try to pack */
union {
        u16                range;                /*           2 */
        struct bpf_map *   map_ptr;              /*           8 */
};                                               /*     8     8 */
doesn't cause instability issues, since we kzalloc the verifier reg state.

How about patch like the following:

From 422fd975ed78645ab67d2eb50ff6e1ff6fb3de32 Mon Sep 17 00:00:00 2001
From: Alexei Starovoitov 
Date: Tue, 4 Sep 2018 19:13:44 -0700
Subject: [PATCH] bpf/verifier: fix verifier instability

Fixes: f1174f77b50c ("bpf/verifier: rework value tracking")
Debugged-by: Edward Cree  
Signed-off-by: Alexei Starovoitov 
---
 kernel/bpf/verifier.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f4ff0c569e54..6ff1bac1795d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -570,7 +570,9 @@ static void __mark_reg_not_init(struct bpf_reg_state *reg);
  */
 static void __mark_reg_known(struct 


Re: Why not use all the syn queues? in the function "tcp_conn_request", I have some questions.

2018-09-04 Thread Ttttabcd

‐‐‐ Original Message ‐‐‐
On 4 September 2018 9:06 PM, Neal Cardwell  wrote:

> On Tue, Sep 4, 2018 at 1:48 AM Ttttabcd a...@protonmail.com wrote:
>
> > Hello everyone, recently I have been looking at the source code that
> > handles the TCP three-way handshake (Linux kernel version 4.18.5).
> > I found some strange places in the source code that handles SYN messages,
> > in the function "tcp_conn_request".
> > This code is executed when SYN cookies are not enabled.
> >
> > if (!net->ipv4.sysctl_tcp_syncookies &&
> > (net->ipv4.sysctl_max_syn_backlog - 
> > inet_csk_reqsk_queue_len(sk) <
> >  (net->ipv4.sysctl_max_syn_backlog >> 2)) &&
> > !tcp_peer_is_proven(req, dst)) {
> > /* Without syncookies last quarter of
> >  * backlog is filled with destinations,
> >  * proven to be alive.
> >  * It means that we continue to communicate
> >  * to destinations, already remembered
> >  * to the moment of synflood.
> >  */
> > pr_drop_req(req, ntohs(tcp_hdr(skb)->source),
> > rsk_ops->family);
> > goto drop_and_release;
> > }
> >
> >
> > But why don't we use all the syn queues?
>
> If tcp_peer_is_proven() returns true then we do allow ourselves to use
> the whole queue.
>
> > Why do we need to reserve (net->ipv4.sysctl_max_syn_backlog >> 2)
> > entries in the queue?
> > Even if the system is attacked by a SYN flood, there is no need to reserve
> > part of it. Why do we need to reserve a part?
>
> The comment describes the rationale. If syncookies are disabled, then
> the last quarter of the backlog is reserved for filling with
> destinations that were proven to be alive, according to
> tcp_peer_is_proven() (which uses RTTs measured in previous
> connections). The idea is that if there is a SYN flood, we do not want
> to use all of our queue budget on attack traffic but instead want to
> reserve some queue space for SYNs from real remote machines that we
> have actually contacted in the past.
>
> > The value of sysctl_max_syn_backlog is the maximum length of the queue only 
> > if syn cookies are enabled.
>
> Even if syncookies are disabled, sysctl_max_syn_backlog is the maximum
> length of the queue.
>
> > That is the first strange place; here is another one:
> >
> > __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn;
> >
> > if ((net->ipv4.sysctl_tcp_syncookies == 2 ||
> >  inet_csk_reqsk_queue_is_full(sk)) && !isn) {
> >
> > if (!want_cookie && !isn) {
> >
> >
> > The value of "isn" comes from TCP_SKB_CB(skb)->tcp_tw_isn, and then it is
> > checked twice whether its value is 0.
> > But "tcp_tw_isn" is initialized in the function "tcp_v4_fill_cb"
> >
> > TCP_SKB_CB(skb)->tcp_tw_isn = 0;
> >
> >
> > So it is always 0; I tested with printk, and the result was always 0.
>
> That field is also set in tcp_timewait_state_process():
>
> TCP_SKB_CB(skb)->tcp_tw_isn = isn;
>
> So there can be cases where it is not 0.
>
> Hope that helps,
> neal

Thank you very much, I understand.
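A simplified userspace model of the two checks discussed in this thread, with hypothetical function and parameter names; it reproduces only the decision logic of the quoted conditions, not the kernel code. Without syncookies, a SYN from a peer that tcp_peer_is_proven() does not vouch for is dropped once fewer than a quarter of the max_syn_backlog slots remain, and a nonzero isn (which tcp_timewait_state_process() may store when a SYN hits a TIME-WAIT socket) bypasses the cookie / queue-full path.

```c
#include <stdbool.h>

/* Models the first quoted check: without syncookies, a SYN from an unproven
 * peer is dropped once fewer than max_syn_backlog / 4 slots remain free,
 * keeping the last quarter for peers proven alive in earlier connections. */
static bool syn_drop_unproven(int syncookies_sysctl, int max_syn_backlog,
			      int queue_len, bool peer_proven)
{
	return !syncookies_sysctl &&
	       (max_syn_backlog - queue_len < (max_syn_backlog >> 2)) &&
	       !peer_proven;
}

/* Models the second check: the cookie / queue-full path is only taken
 * when isn is 0, i.e. the SYN did not come through the TIME-WAIT path. */
static bool syn_cookie_path(int syncookies_sysctl, bool queue_full,
			    unsigned int isn)
{
	return (syncookies_sysctl == 2 || queue_full) && !isn;
}
```

With max_syn_backlog = 256 the reserve is 256 >> 2 = 64 slots, so an unproven peer's SYN is dropped once 193 or more requests are already queued (256 - 193 = 63 < 64), while a proven peer can still use the whole queue.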


Re: [PATCH net-next 5/5] net: dsa: b53: Add SerDes support

2018-09-04 Thread Andrew Lunn
On Tue, Sep 04, 2018 at 04:55:26PM -0700, Florian Fainelli wrote:
> On 09/04/2018 04:32 PM, Andrew Lunn wrote:
> > 
> > 
> >> +void b53_serdes_phylink_validate(struct b53_device *dev, int port,
> >> +   unsigned long *supported,
> >> +   struct phylink_link_state *state)
> >> +{
> >> +  u8 lane = b53_serdes_map_lane(dev, port);
> >> +
> >> +  if (lane == B53_INVALID_LANE)
> >> +  return;
> >> +
> >> +  switch (lane) {
> >> +  case 0:
> >> +  phylink_set(supported, 2500baseX_Full);
> > 
> > Hi Florian
> > 
> > Could you also use it for 2500BaseT_Full with an appropriate copper
> > PHY?
> 
> My reading of the datasheet (which only mentions 2.5G without further
> detail) makes me think that it is not possible to do copper at 2.5G,
> only 2500baseX, since it only talks about fiber and not copper.
> 
> Would you recommend a specific SFP that allows that? Like this one:
> 
> https://www.flexoptix.net/en/sfp-t-transceiver-2h-gigabit-cat-5e-rj-45-100m-100m-1000m-2500-base-t.html?co8829=85744

I was actually thinking of a 'plain old' copper PHY with a SERDES
interface which can do 2500Base-T. The Marvell 88x3310 or the
Aquantia 10G PHY, for example.

Russell might be able to make a recommendation. I don't have any
Copper SFP modules.

   Andrew


Re: [PATCH net-next 5/5] net: dsa: b53: Add SerDes support

2018-09-04 Thread Florian Fainelli
On 09/04/2018 04:15 PM, Andrew Lunn wrote:
> On Tue, Sep 04, 2018 at 03:11:20PM -0700, Florian Fainelli wrote:
>> Add support for the Northstar Plus SerDes which is accessed through a
>> special page of the switch. Since this is something that most people
>> probably will not want to use, make it a configurable option.
>>
>> The SerDes supports both SGMII and 1000baseX modes, and internally
>> looks like a standard MII PHY, except for the few bits that
>> got repurposed.
> 
> Hi Florian
> 
> The SERDES in the 6352 also looks very similar to a standard MII PHY.
> 
> Maybe at some point, we should look at the SERDES drivers we have
> embedded in different MAC drivers, and see if we can pull them out,
> maybe put them in drivers/net/phy. Any SERDES driver being used in
> combination with phylink probably has the same API.

Yes, that sounds like a good way forward. The SerDes on the
Northstar Plus does have some standard MII registers, but not many
(BMSR, BMCR, MII_PHYSID1/2, AUTONEGADV, AUTONEGLPABIL); beyond
those, it's all custom.

It would be good to look at a third vendor (Mediatek? Qualcomm?)
and see how they did it, so we can define an appropriate API.
-- 
Florian


Re: [PATCH net-next 5/5] net: dsa: b53: Add SerDes support

2018-09-04 Thread Florian Fainelli
On 09/04/2018 04:32 PM, Andrew Lunn wrote:
> 
> 
>> +void b53_serdes_phylink_validate(struct b53_device *dev, int port,
>> + unsigned long *supported,
>> + struct phylink_link_state *state)
>> +{
>> +u8 lane = b53_serdes_map_lane(dev, port);
>> +
>> +if (lane == B53_INVALID_LANE)
>> +return;
>> +
>> +switch (lane) {
>> +case 0:
>> +phylink_set(supported, 2500baseX_Full);
> 
> Hi Florian
> 
> Could you also use it for 2500BaseT_Full with an appropriate copper
> PHY?

My reading of the datasheet (which only mentions 2.5G without further
detail) makes me think that it is not possible to do copper at 2.5G,
only 2500baseX, since it only talks about fiber and not copper.

Would you recommend a specific SFP that allows that? Like this one:

https://www.flexoptix.net/en/sfp-t-transceiver-2h-gigabit-cat-5e-rj-45-100m-100m-1000m-2500-base-t.html?co8829=85744
-- 
Florian


Re: [PATCH net-next 2/5] net: dsa: b53: Make SRAB driver manage port interrupts

2018-09-04 Thread Florian Fainelli
On 09/04/2018 03:59 PM, Andrew Lunn wrote:
>> +static irqreturn_t b53_srab_port_isr(int irq, void *dev_id)
>> +{
>> +struct b53_srab_port_priv *port = dev_id;
>> +struct b53_device *dev = port->dev;
>> +struct b53_srab_priv *priv = dev->priv;
>> +
>> +/* Acknowledge the interrupt */
>> +writel(BIT(port->num), priv->regs + B53_SRAB_INTR);
>> +
>> +schedule_work(&port->irq_work);
>> +
>> +return IRQ_HANDLED;
>> +}
>> +
>> +static int b53_srab_irq_enable(struct b53_device *dev, int port)
>> +{
>> +struct b53_srab_priv *priv = dev->priv;
>> +struct b53_srab_port_priv *p = &priv->port_intrs[port];
>> +int ret;
>> +
>> +ret = request_irq(p->irq, b53_srab_port_isr, 0,
>> +  dev_name(dev->dev), p);
> 
> Hi Florian
> 
> Could you use a threaded interrupt? That would save you from having to
> implement your own work queue. I think you can have one function called
> in interrupt context to acknowledge the interrupt, and another called in
> thread context to do the remaining work.

Indeed, this works nicely actually, thanks for the suggestion.
-- 
Florian
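The threaded-interrupt suggestion splits the handler in two: a hard handler that runs in interrupt context, acknowledges the interrupt and returns IRQ_WAKE_THREAD, and a thread function that does the slower work in process context. In the driver both would be passed to request_threaded_irq(), e.g. request_threaded_irq(p->irq, b53_srab_port_isr, b53_srab_port_thread, 0, dev_name(dev->dev), p), where b53_srab_port_thread is a hypothetical name. The sketch below is only a userspace model of that control flow: the return codes are local stand-ins and the dispatcher plays the role of the genirq core.

```c
#include <stdint.h>

/* Local stand-ins for the kernel's irqreturn_t values. */
enum irqret_sketch { SK_IRQ_NONE, SK_IRQ_HANDLED, SK_IRQ_WAKE_THREAD };

/* Hypothetical per-port state. */
struct port_sketch {
	uint32_t intr_reg;    /* models the pending bits in B53_SRAB_INTR */
	unsigned int num;     /* port number */
	int link_updates;     /* work done in thread context */
};

/* Hard handler: acknowledge quickly and defer the rest. */
static enum irqret_sketch port_isr_sketch(struct port_sketch *p)
{
	p->intr_reg &= ~(1u << p->num); /* models writel(BIT(port->num), ...) */
	return SK_IRQ_WAKE_THREAD;
}

/* Thread function: the work previously queued with schedule_work(). */
static enum irqret_sketch port_thread_sketch(struct port_sketch *p)
{
	p->link_updates++;
	return SK_IRQ_HANDLED;
}

/* Models what the genirq core does for a threaded interrupt. */
static enum irqret_sketch dispatch_sketch(struct port_sketch *p)
{
	enum irqret_sketch ret = port_isr_sketch(p);

	if (ret == SK_IRQ_WAKE_THREAD)
		ret = port_thread_sketch(p);
	return ret;
}
```

The design benefit is that the driver no longer owns a work item per port: the core creates and wakes the IRQ thread.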


Re: [PATCH net-next 5/5] net: dsa: b53: Add SerDes support

2018-09-04 Thread Andrew Lunn



> +void b53_serdes_phylink_validate(struct b53_device *dev, int port,
> +  unsigned long *supported,
> +  struct phylink_link_state *state)
> +{
> + u8 lane = b53_serdes_map_lane(dev, port);
> +
> + if (lane == B53_INVALID_LANE)
> + return;
> +
> + switch (lane) {
> + case 0:
> + phylink_set(supported, 2500baseX_Full);

Hi Florian

Could you also use it for 2500BaseT_Full with an appropriate copper
PHY?

Andrew


Re: [PATCH net-next 5/5] net: dsa: b53: Add SerDes support

2018-09-04 Thread Andrew Lunn
On Tue, Sep 04, 2018 at 03:11:20PM -0700, Florian Fainelli wrote:
> Add support for the Northstar Plus SerDes which is accessed through a
> special page of the switch. Since this is something that most people
> probably will not want to use, make it a configurable option.
> 
> The SerDes supports both SGMII and 1000baseX modes, and internally
> looks like a standard MII PHY, except for the few bits that
> got repurposed.

Hi Florian

The SERDES in the 6352 also looks very similar to a standard MII PHY.

Maybe at some point, we should look at the SERDES drivers we have
embedded in different MAC drivers, and see if we can pull them out,
maybe put them in drivers/net/phy. Any SERDES driver being used in
combination with phylink probably has the same API.

   Andrew


Re: [PATCH net-next 2/5] net: dsa: b53: Make SRAB driver manage port interrupts

2018-09-04 Thread Andrew Lunn
> +static irqreturn_t b53_srab_port_isr(int irq, void *dev_id)
> +{
> + struct b53_srab_port_priv *port = dev_id;
> + struct b53_device *dev = port->dev;
> + struct b53_srab_priv *priv = dev->priv;
> +
> + /* Acknowledge the interrupt */
> + writel(BIT(port->num), priv->regs + B53_SRAB_INTR);
> +
> + schedule_work(&port->irq_work);
> +
> + return IRQ_HANDLED;
> +}
> +
> +static int b53_srab_irq_enable(struct b53_device *dev, int port)
> +{
> + struct b53_srab_priv *priv = dev->priv;
> + struct b53_srab_port_priv *p = &priv->port_intrs[port];
> + int ret;
> +
> + ret = request_irq(p->irq, b53_srab_port_isr, 0,
> +   dev_name(dev->dev), p);

Hi Florian

Could you use a threaded interrupt? That would save you from having to
implement your own work queue. I think you can have one function called
in interrupt context to acknowledge the interrupt, and another called in
thread context to do the remaining work.

Andrew



Re: [PATCH net-next] net: sched: change tcf_del_walker() to use concurrent-safe delete

2018-09-04 Thread Cong Wang
On Mon, Sep 3, 2018 at 1:33 PM Vlad Buslov  wrote:
>
>
> On Mon 03 Sep 2018 at 18:50, Cong Wang  wrote:
> > On Mon, Sep 3, 2018 at 12:06 AM Vlad Buslov  wrote:
> >>
> >> Action API was changed to work with actions and action_idr in a
> >> concurrency-safe manner; however, tcf_del_walker() still uses actions
> >> without taking a reference to them first and deletes them directly,
> >> disregarding a possible concurrent delete.
> >>
> >> Change tcf_del_walker() to use tcf_idr_delete_index(), which doesn't
> >> require the caller to hold a reference to the action and accepts the
> >> action id as an argument instead of a direct action pointer.
> >
> > Hmm, why doesn't tcf_del_walker() just take idrinfo->lock? At least
> > tcf_dump_walker() already does.
>
> Because tcf_del_walker() calls __tcf_idr_release(), which takes
> idrinfo->lock itself (deadlock). It also calls sleeping functions like

Deadlock can be easily resolved by moving the lock out.


> tcf_action_goto_chain_fini(), so just implementing a function that
> releases the action without taking idrinfo->lock is not enough.

Sleeping can be resolved either by making it atomic or
deferring it to a work queue.

None of your arguments here is a blocker to locking
idrinfo->lock. You really should focus on whether it is really
necessary to lock idrinfo->lock in tcf_del_walker(), rather
than on these details.

For me, if you need idrinfo->lock for the dump walker, you must
need it for the delete walker too, because deletion is a writer,
which should require stronger protection than the dumper,
which is merely a reader.
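A userspace sketch of the "move the lock out" approach, under stated assumptions: the names are hypothetical, a plain array stands in for the IDR, and a non-recursive lock model stands in for idrinfo->lock, reporting an attempt to re-take it instead of hanging. The delete walker holds the lock for the whole walk and calls only a _locked release variant; the sleeping-function problem Vlad raises is not modeled and would still need the deferral Cong mentions.

```c
#include <stddef.h>

#define SKETCH_MAX_ACTIONS 8

/* Models a non-recursive lock: re-acquiring it from the same context
 * would deadlock, so report that instead of hanging. */
struct lock_sketch {
	int held;
};

static int lock_sketch_acquire(struct lock_sketch *l)
{
	if (l->held)
		return -1; /* self-deadlock */
	l->held = 1;
	return 0;
}

static void lock_sketch_release(struct lock_sketch *l)
{
	l->held = 0;
}

struct action_sketch {
	int refcnt;
	int in_use;
};

struct idr_sketch {
	struct lock_sketch lock; /* stands in for idrinfo->lock */
	struct action_sketch slots[SKETCH_MAX_ACTIONS];
};

/* Caller must hold t->lock. Returns 1 if the action was freed. */
static int release_locked_sketch(struct idr_sketch *t, size_t i)
{
	struct action_sketch *a = &t->slots[i];

	if (a->in_use && --a->refcnt == 0) {
		a->in_use = 0;
		return 1;
	}
	return 0;
}

/* Models a release helper that takes the lock itself, as described for
 * __tcf_idr_release(); calling it under the walker's lock would deadlock. */
static int release_self_locking_sketch(struct idr_sketch *t, size_t i)
{
	int freed;

	if (lock_sketch_acquire(&t->lock))
		return -1;
	freed = release_locked_sketch(t, i);
	lock_sketch_release(&t->lock);
	return freed;
}

/* Delete walker with the lock moved out: one acquisition covers the
 * whole walk and only the _locked variant is called inside it. */
static int del_walker_sketch(struct idr_sketch *t)
{
	int deleted = 0;
	size_t i;

	if (lock_sketch_acquire(&t->lock))
		return -1;
	for (i = 0; i < SKETCH_MAX_ACTIONS; i++)
		if (t->slots[i].in_use)
			deleted += release_locked_sketch(t, i);
	lock_sketch_release(&t->lock);
	return deleted;
}
```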


[PATCH net-next v2] openvswitch: Derive IP protocol number for IPv6 later frags

2018-09-04 Thread Yi-Hung Wei
Currently, OVS only parses the IP protocol number for the first
IPv6 fragment, but sets the IP protocol number for the later fragments
to NEXTHDR_FRAGMENT.  This patch tries to derive the IP protocol
number for the later IPv6 frags so that we can match on it.

Signed-off-by: Yi-Hung Wei 
---
 net/openvswitch/flow.c | 22 +-
 1 file changed, 9 insertions(+), 13 deletions(-)

diff --git a/net/openvswitch/flow.c b/net/openvswitch/flow.c
index 56b8e7167790..35966da84769 100644
--- a/net/openvswitch/flow.c
+++ b/net/openvswitch/flow.c
@@ -254,21 +254,18 @@ static bool icmphdr_ok(struct sk_buff *skb)
 
 static int parse_ipv6hdr(struct sk_buff *skb, struct sw_flow_key *key)
 {
+   unsigned short frag_off;
+   unsigned int payload_ofs = 0;
unsigned int nh_ofs = skb_network_offset(skb);
unsigned int nh_len;
-   int payload_ofs;
struct ipv6hdr *nh;
-   uint8_t nexthdr;
-   __be16 frag_off;
-   int err;
+   int err, nexthdr, flags = 0;
 
err = check_header(skb, nh_ofs + sizeof(*nh));
if (unlikely(err))
return err;
 
nh = ipv6_hdr(skb);
-   nexthdr = nh->nexthdr;
-   payload_ofs = (u8 *)(nh + 1) - skb->data;
 
key->ip.proto = NEXTHDR_NONE;
key->ip.tos = ipv6_get_dsfield(nh);
@@ -277,10 +274,9 @@ static int parse_ipv6hdr(struct sk_buff *skb, struct sw_flow_key *key)
key->ipv6.addr.src = nh->saddr;
key->ipv6.addr.dst = nh->daddr;
 
-   payload_ofs = ipv6_skip_exthdr(skb, payload_ofs, &nexthdr, &frag_off);
-
-   if (frag_off) {
-   if (frag_off & htons(~0x7))
+   nexthdr = ipv6_find_hdr(skb, &payload_ofs, -1, &frag_off, &flags);
+   if (flags & IP6_FH_F_FRAG) {
+   if (frag_off)
key->ip.frag = OVS_FRAG_TYPE_LATER;
else
key->ip.frag = OVS_FRAG_TYPE_FIRST;
@@ -288,11 +284,11 @@ static int parse_ipv6hdr(struct sk_buff *skb, struct sw_flow_key *key)
key->ip.frag = OVS_FRAG_TYPE_NONE;
}
 
-   /* Delayed handling of error in ipv6_skip_exthdr() as it
-* always sets frag_off to a valid value which may be
+   /* Delayed handling of error in ipv6_find_hdr() as it
+* always sets flags and frag_off to a valid value which may be
 * used to set key->ip.frag above.
 */
-   if (unlikely(payload_ofs < 0))
+   if (unlikely(nexthdr < 0))
return -EPROTO;
 
nh_len = payload_ofs - nh_ofs;
-- 
2.7.4
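As a rough illustration of why a later fragment can still yield its upper-layer protocol: the IPv6 Fragment header carries its own Next Header field, so a parser can stop there and report it. The sketch below is a simplified userspace model under stated assumptions (only Hop-by-Hop, Routing, Destination Options and Fragment headers are walked; AH/ESP are not handled), and `ipv6_classify()` is a hypothetical name, not the kernel's ipv6_find_hdr().

```c
#include <stddef.h>
#include <stdint.h>

/* IPv6 Next Header values (IANA). */
#define NH_HOP      0
#define NH_ROUTING  43
#define NH_FRAGMENT 44
#define NH_DEST     60

enum frag_type { FRAG_NONE, FRAG_FIRST, FRAG_LATER };

/* Walk extension headers after the 40-byte fixed header. On a non-first
 * fragment, stop and return the protocol named in the fragment header.
 * Returns the upper-layer protocol, or -1 if the packet is truncated. */
static int ipv6_classify(const uint8_t *pkt, size_t len, enum frag_type *frag)
{
	size_t off = 40;
	int nexthdr;

	*frag = FRAG_NONE;
	if (len < 40)
		return -1;
	nexthdr = pkt[6];	/* Next Header field of the fixed header */

	for (;;) {
		switch (nexthdr) {
		case NH_HOP:
		case NH_ROUTING:
		case NH_DEST:
			if (off + 8 > len)
				return -1;
			nexthdr = pkt[off];
			off += ((size_t)pkt[off + 1] + 1) * 8;
			break;
		case NH_FRAGMENT: {
			uint16_t frag_off;

			if (off + 8 > len)
				return -1;
			/* 13-bit offset (8-byte units) above 3 flag bits. */
			frag_off = (uint16_t)(((pkt[off + 2] << 8) |
					       pkt[off + 3]) & ~0x7);
			nexthdr = pkt[off];
			if (frag_off) {
				/* Later fragment: nothing more to parse. */
				*frag = FRAG_LATER;
				return nexthdr;
			}
			*frag = FRAG_FIRST;
			off += 8;
			break;
		}
		default:
			return nexthdr;
		}
	}
}
```

This mirrors the decision in the patch: a fragment header with a non-zero offset means OVS_FRAG_TYPE_LATER, a zero offset means OVS_FRAG_TYPE_FIRST, and in both cases the protocol is available for matching.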



[PATCH net-next 0/5] net: dsa: b53: SerDes support

2018-09-04 Thread Florian Fainelli
Hi all,

This patch series adds support for the SerDes found on NorthStar Plus
(NSP) which allows us to use the SFP port on the BCM958625HR board (and
other similar designs).

Florian Fainelli (5):
  net: dsa: b53: Add ability to enable/disable port interrupts
  net: dsa: b53: Make SRAB driver manage port interrupts
  net: dsa: b53: Add helper to set link parameters
  net: dsa: b53: Add PHYLINK support
  net: dsa: b53: Add SerDes support

 drivers/net/dsa/b53/Kconfig  |   7 +
 drivers/net/dsa/b53/Makefile |   1 +
 drivers/net/dsa/b53/b53_common.c | 243 +++
 drivers/net/dsa/b53/b53_priv.h   |  36 +
 drivers/net/dsa/b53/b53_serdes.c | 217 +++
 drivers/net/dsa/b53/b53_serdes.h | 121 +++
 drivers/net/dsa/b53/b53_srab.c   | 217 +++
 7 files changed, 813 insertions(+), 29 deletions(-)
 create mode 100644 drivers/net/dsa/b53/b53_serdes.c
 create mode 100644 drivers/net/dsa/b53/b53_serdes.h

-- 
2.17.1



Re: [PATCH rdma-next v1 00/15] Flow actions to mutate packets

2018-09-04 Thread Jason Gunthorpe
On Tue, Aug 28, 2018 at 02:18:39PM +0300, Leon Romanovsky wrote:
> From: Leon Romanovsky 
> 
> From Mark,
> 
> This series exposes the ability to create flow actions which can
> mutate packet headers. We do that by exposing two new verbs:
>  * modify header - can change existing packet headers.
>  * packet reformat - can encapsulate or decapsulate a packet.
>   Once created, a flow action must be attached to a steering
>   rule for it to take effect.
> 
> The first 10 patches refactor mlx5_core code, rename internal structures
> to better reflect their operation and export needed functions so the
> RDMA side can allocate the action.
> 
> The last 5 patches expose, via the IOCTL infrastructure, mlx5_ib methods
> which do the actual allocation of resources and return a handle to the
> user. A user of this API is expected to know how to work with the
> device's spec, as the input to those functions is HW dependent.
> 
> An example usage of the modify header action is routing: a user can
> create an action which edits the L2 header and decrements the TTL.
> 
> An example usage of the packet reformat action is VXLAN encap/decap
> which is done by the HW.
> 
> Changelog:
>  v0 -> v1:
>   * Patch 1: Addressed Saeed's comments and simplified the logic.
>   * Patch 2: Changed due to changes in patch 1.
> 
>  Split the 27-patch series into 3; this is the first one,
>  which just lets the user create flow actions.
>  Other than that, mainly styling fixes in the RDMA patches
>  to make sure the 80-char limit isn't exceeded.
> 
>  RFC -> v0:
>   * Patch 1 a new patch which refactors the logic
> when getting a flow namespace.
>   * Patch 2 was split into two.
>   * Patch 3: Fixed a typo in commit message
>   * Patch 5: Updated commit message
>   * Patch 7: Updated commit message
> Renamed:
>   - MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT_ID to
> MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT
>   - packet_reformat_id to reformat_id in struct mlx5_flow_act
>   - packet_reformat_id to encap_id in struct mlx5_esw_flow_attr
>   - packet_reformat_id to encap_id in struct mlx5e_encap_entry
>   - PACKET_REFORMAT to REFORMAT when printing trace points
>   * Patch 9: Updated commit message
> Updated function declaration in mlx5_core.h; the old one could have
> led to a compile error during bisection.
>   * Patch 11: Disallow egress rules insertion when in switchdev mode
>   * Patch 12: A new patch to deal with passing enum values using
> the IOCTL infrastructure.
>   * Patch 13: Use new enum value attribute when passing enum
> mlx5_ib_uapi_flow_table_type
>   * Patch 15: Don't set encap flags on flow tables if in switchdev mode
>   * Patch 17: Use new enum value attribute when passing enum
> mlx5_ib_uapi_flow_table_type and enum
> mlx5_ib_uapi_flow_action_packet_reformat_type
>   * Patch 19: Allow creation of both
> MLX5_IB_UAPI_FLOW_ACTION_PACKET_REFORMAT_TYPE_L2_TO_L3_TUNNEL
> and MLX5_IB_UAPI_FLOW_ACTION_PACKET_REFORMAT_TYPE_L3_TUNNEL_TO_L2
> packet
> reformat actions.
>   * Patch 20: A new patch which allows attaching packet reformat
> actions to flow tables on NIC RX.
> 
> Thanks
> 
> Mark Bloch (15):
>   net/mlx5: Cleanup flow namespace getter switch logic
>   net/mlx5: Add proper NIC TX steering flow tables support
>   net/mlx5: Export modify header alloc/dealloc functions
>   net/mlx5: Add support for more namespaces when allocating modify
> header
>   net/mlx5: Break encap/decap into two separated flow table creation
> flags
>   net/mlx5: Move header encap type to IFC header file
>   {net, RDMA}/mlx5: Rename encap to reformat packet
>   net/mlx5: Expose new packet reformat capabilities
>   net/mlx5: Pass a namespace for packet reformat ID allocation
>   net/mlx5: Export packet reformat alloc/dealloc functions
>   RDMA/uverbs: Add UVERBS_ATTR_CONST_IN to the specs language
>   RDMA/mlx5: Add a new flow action verb - modify header
>   RDMA/uverbs: Add generic function to fill in flow action object
>   RDMA/mlx5: Add new flow action verb - packet reformat
>   RDMA/mlx5: Extend packet reformat verbs
> 
>  drivers/infiniband/core/uverbs_ioctl.c |  23 ++
>  .../infiniband/core/uverbs_std_types_flow_action.c |   7 +-
>  drivers/infiniband/hw/mlx5/devx.c  |   6 +-
>  drivers/infiniband/hw/mlx5/flow.c  | 301 +
>  drivers/infiniband/hw/mlx5/main.c  |   3 +
>  drivers/infiniband/hw/mlx5/mlx5_ib.h   |  19 +-
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c  |   8 +-
>  .../mellanox/mlx5/core/diag/fs_tracepoint.h|   2 +-
>  drivers/net/ethernet/mellanox/mlx5/core/en_tc.c|  51 ++--
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch.c  |   2 +-
>  .../ethernet/mellanox/mlx5/core/eswitch_offloads.c |   9 +-
>  drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   |  87 +++---
>  drivers/net/ethernet/mellanox/mlx5/core/fs_core.c  |  73 +++--
>  
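The routing use case quoted above (edit the L2 header, decrement the TTL) is what the hardware modify header action does per packet. As a software analogue only, and not the mlx5 action format, here is a sketch of an IPv4 TTL decrement that patches the header checksum incrementally per RFC 1624, eqn. 3, instead of recomputing it. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Full IPv4 header checksum over hdr_len bytes (checksum field included).
 * A header with a valid checksum yields 0. */
static uint16_t ip_checksum(const uint8_t *hdr, size_t hdr_len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < hdr_len; i += 2)
		sum += (uint32_t)(hdr[i] << 8 | hdr[i + 1]);
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Decrement TTL and update the checksum incrementally:
 * HC' = ~(~HC + ~m + m'), with m/m' the old/new TTL+protocol word.
 * Returns -1 if the TTL would expire (a router drops such packets). */
static int ipv4_dec_ttl(uint8_t *hdr)
{
	uint16_t old_word, new_word, hc;
	uint32_t sum;

	if (hdr[8] <= 1)
		return -1;
	old_word = (uint16_t)(hdr[8] << 8 | hdr[9]);
	hdr[8]--;
	new_word = (uint16_t)(hdr[8] << 8 | hdr[9]);

	hc = (uint16_t)(hdr[10] << 8 | hdr[11]);
	sum = (uint32_t)(uint16_t)~hc + (uint16_t)~old_word + new_word;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	hc = (uint16_t)~sum;
	hdr[10] = (uint8_t)(hc >> 8);
	hdr[11] = (uint8_t)(hc & 0xff);
	return 0;
}
```

The incremental form touches only the changed 16-bit word, which is why it suits per-packet rewrites.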

[PATCH net-next 5/5] net: dsa: b53: Add SerDes support

2018-09-04 Thread Florian Fainelli
Add support for the Northstar Plus SerDes which is accessed through a
special page of the switch. Since this is something that most people
probably will not want to use, make it a configurable option.

The SerDes supports both SGMII and 1000baseX modes, and internally
looks like a standard MII PHY, except for the few bits that got
repurposed.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/Kconfig  |   7 +
 drivers/net/dsa/b53/Makefile |   1 +
 drivers/net/dsa/b53/b53_common.c |  25 
 drivers/net/dsa/b53/b53_priv.h   |  17 +++
 drivers/net/dsa/b53/b53_serdes.c | 217 +++
 drivers/net/dsa/b53/b53_serdes.h | 121 +
 drivers/net/dsa/b53/b53_srab.c   | 109 
 7 files changed, 497 insertions(+)
 create mode 100644 drivers/net/dsa/b53/b53_serdes.c
 create mode 100644 drivers/net/dsa/b53/b53_serdes.h

diff --git a/drivers/net/dsa/b53/Kconfig b/drivers/net/dsa/b53/Kconfig
index 37745f4bf4f6..ceb5cee10218 100644
--- a/drivers/net/dsa/b53/Kconfig
+++ b/drivers/net/dsa/b53/Kconfig
@@ -35,3 +35,10 @@ config B53_SRAB_DRIVER
help
  Select to enable support for memory-mapped Switch Register Access
  Bridge Registers (SRAB) like it is found on the BCM53010
+
+config B53_SERDES
+   tristate "B53 SerDes support"
+   depends on B53
+   default ARCH_BCM_IPROC
+   help
+ Select to enable support for SerDes on e.g: Northstar Plus SoCs.
diff --git a/drivers/net/dsa/b53/Makefile b/drivers/net/dsa/b53/Makefile
index 4256fb42a4dd..b1be13023ae4 100644
--- a/drivers/net/dsa/b53/Makefile
+++ b/drivers/net/dsa/b53/Makefile
@@ -5,3 +5,4 @@ obj-$(CONFIG_B53_SPI_DRIVER)+= b53_spi.o
 obj-$(CONFIG_B53_MDIO_DRIVER)  += b53_mdio.o
 obj-$(CONFIG_B53_MMAP_DRIVER)  += b53_mmap.o
 obj-$(CONFIG_B53_SRAB_DRIVER)  += b53_srab.o
+obj-$(CONFIG_B53_SERDES)   += b53_serdes.o
diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 108d272ca4c7..64d72c713f1e 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -765,6 +765,8 @@ static int b53_reset_switch(struct b53_device *priv)
memset(priv->vlans, 0, sizeof(*priv->vlans) * priv->num_vlans);
memset(priv->ports, 0, sizeof(*priv->ports) * priv->num_ports);
 
+   priv->serdes_lane = B53_INVALID_LANE;
+
return b53_switch_reset(priv);
 }
 
@@ -1128,6 +1130,9 @@ void b53_phylink_validate(struct dsa_switch *ds, int port,
struct b53_device *dev = ds->priv;
__ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
 
+   if (dev->ops->serdes_phylink_validate)
+   dev->ops->serdes_phylink_validate(dev, port, mask, state);
+
/* Allow all the expected bits */
phylink_set(mask, Autoneg);
phylink_set_port_modes(mask);
@@ -1164,8 +1169,12 @@ EXPORT_SYMBOL(b53_phylink_validate);
 int b53_phylink_mac_link_state(struct dsa_switch *ds, int port,
   struct phylink_link_state *state)
 {
+   struct b53_device *dev = ds->priv;
int ret = -EOPNOTSUPP;
 
+   if (dev->ops->serdes_link_state)
+   ret = dev->ops->serdes_link_state(dev, port, state);
+
return ret;
 }
 EXPORT_SYMBOL(b53_phylink_mac_link_state);
@@ -1182,11 +1191,19 @@ void b53_phylink_mac_config(struct dsa_switch *ds, int port,
if (mode == MLO_AN_FIXED)
b53_force_port_config(dev, port, state->speed,
  state->duplex, state->pause);
+
+   if (phy_interface_mode_is_8023z(state->interface) &&
+   dev->ops->serdes_config)
+   dev->ops->serdes_config(dev, port, mode, state);
 }
 EXPORT_SYMBOL(b53_phylink_mac_config);
 
 void b53_phylink_mac_an_restart(struct dsa_switch *ds, int port)
 {
+   struct b53_device *dev = ds->priv;
+
+   if (dev->ops->serdes_an_restart)
+   dev->ops->serdes_an_restart(dev, port);
 }
 EXPORT_SYMBOL(b53_phylink_mac_an_restart);
 
@@ -1203,6 +1220,10 @@ void b53_phylink_mac_link_down(struct dsa_switch *ds, int port,
b53_force_link(dev, port, false);
return;
}
+
+   if (phy_interface_mode_is_8023z(interface) &&
+   dev->ops->serdes_link_set)
+   dev->ops->serdes_link_set(dev, port, mode, interface, false);
 }
 EXPORT_SYMBOL(b53_phylink_mac_link_down);
 
@@ -1220,6 +1241,10 @@ void b53_phylink_mac_link_up(struct dsa_switch *ds, int port,
b53_force_link(dev, port, true);
return;
}
+
+   if (phy_interface_mode_is_8023z(interface) &&
+   dev->ops->serdes_link_set)
+   dev->ops->serdes_link_set(dev, port, mode, interface, true);
 }
 EXPORT_SYMBOL(b53_phylink_mac_link_up);
 
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index 3f79dc07c00f..ec796482792d 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -29,6 +29,7 
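Since the SerDes is described as looking like a standard MII PHY, its link state can plausibly be decoded from the clause 22 BMSR and the clause 37 (1000BASE-X) link partner ability register. The sketch below assumes the standard bit positions from <linux/mii.h> still apply; the repurposed bits mentioned in the commit message are not known here, and `serdes_decode()` is a hypothetical helper, not part of b53_serdes.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard bit positions, as in <linux/mii.h>. */
#define BMSR_LSTATUS      0x0004	/* link status (latched low) */
#define BMSR_ANEGCOMPLETE 0x0020	/* autonegotiation complete */
#define LPA_1000XFULL     0x0020	/* partner: 1000BASE-X full duplex */

struct serdes_state {
	bool link;
	bool an_complete;
	bool full_duplex;
};

/* Decode already-read register values. BMSR link status is latched low,
 * so a caller would read BMSR twice and pass the second value to get the
 * current state rather than a past link drop. Duplex resolution is
 * simplified to "partner advertises full duplex". */
static void serdes_decode(uint16_t bmsr, uint16_t lpa, struct serdes_state *st)
{
	st->link = bmsr & BMSR_LSTATUS;
	st->an_complete = bmsr & BMSR_ANEGCOMPLETE;
	st->full_duplex = st->an_complete && (lpa & LPA_1000XFULL);
}
```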

[PATCH net-next 4/5] net: dsa: b53: Add PHYLINK support

2018-09-04 Thread Florian Fainelli
Add support for PHYLINK. Things are reasonably straightforward since we
do not yet support SerDes interfaces, which leaves us with just
MLO_AN_PHY and MLO_AN_FIXED to deal with.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 120 +++
 drivers/net/dsa/b53/b53_priv.h   |  17 +
 2 files changed, 137 insertions(+)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 78aeaccf19a1..108d272ca4c7 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -1109,6 +1109,120 @@ static void b53_adjust_link(struct dsa_switch *ds, int port,
p->eee_enabled = b53_eee_init(ds, port, phydev);
 }
 
+void b53_port_event(struct dsa_switch *ds, int port)
+{
+   struct b53_device *dev = ds->priv;
+   bool link;
+   u16 sts;
+
+   b53_read16(dev, B53_STAT_PAGE, B53_LINK_STAT, &sts);
+   link = !!(sts & BIT(port));
+   dsa_port_phylink_mac_change(ds, port, link);
+}
+EXPORT_SYMBOL(b53_port_event);
+
+void b53_phylink_validate(struct dsa_switch *ds, int port,
+ unsigned long *supported,
+ struct phylink_link_state *state)
+{
+   struct b53_device *dev = ds->priv;
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
+
+   /* Allow all the expected bits */
+   phylink_set(mask, Autoneg);
+   phylink_set_port_modes(mask);
+   phylink_set(mask, Pause);
+   phylink_set(mask, Asym_Pause);
+
+   /* With the exclusion of 5325/5365, MII, Reverse MII and 802.3z, we
+* support Gigabit, including Half duplex.
+*/
+   if (state->interface != PHY_INTERFACE_MODE_MII &&
+   state->interface != PHY_INTERFACE_MODE_REVMII &&
+   !phy_interface_mode_is_8023z(state->interface) &&
+   !(is5325(dev) || is5365(dev))) {
+   phylink_set(mask, 1000baseT_Full);
+   phylink_set(mask, 1000baseT_Half);
+   }
+
+   if (!phy_interface_mode_is_8023z(state->interface)) {
+   phylink_set(mask, 10baseT_Half);
+   phylink_set(mask, 10baseT_Full);
+   phylink_set(mask, 100baseT_Half);
+   phylink_set(mask, 100baseT_Full);
+   }
+
+   bitmap_and(supported, supported, mask,
+  __ETHTOOL_LINK_MODE_MASK_NBITS);
+   bitmap_and(state->advertising, state->advertising, mask,
+  __ETHTOOL_LINK_MODE_MASK_NBITS);
+
+   phylink_helper_basex_speed(state);
+}
+EXPORT_SYMBOL(b53_phylink_validate);
+
+int b53_phylink_mac_link_state(struct dsa_switch *ds, int port,
+  struct phylink_link_state *state)
+{
+   int ret = -EOPNOTSUPP;
+
+   return ret;
+}
+EXPORT_SYMBOL(b53_phylink_mac_link_state);
+
+void b53_phylink_mac_config(struct dsa_switch *ds, int port,
+   unsigned int mode,
+   const struct phylink_link_state *state)
+{
+   struct b53_device *dev = ds->priv;
+
+   if (mode == MLO_AN_PHY)
+   return;
+
+   if (mode == MLO_AN_FIXED)
+   b53_force_port_config(dev, port, state->speed,
+ state->duplex, state->pause);
+}
+EXPORT_SYMBOL(b53_phylink_mac_config);
+
+void b53_phylink_mac_an_restart(struct dsa_switch *ds, int port)
+{
+}
+EXPORT_SYMBOL(b53_phylink_mac_an_restart);
+
+void b53_phylink_mac_link_down(struct dsa_switch *ds, int port,
+  unsigned int mode,
+  phy_interface_t interface)
+{
+   struct b53_device *dev = ds->priv;
+
+   if (mode == MLO_AN_PHY)
+   return;
+
+   if (mode == MLO_AN_FIXED) {
+   b53_force_link(dev, port, false);
+   return;
+   }
+}
+EXPORT_SYMBOL(b53_phylink_mac_link_down);
+
+void b53_phylink_mac_link_up(struct dsa_switch *ds, int port,
+unsigned int mode,
+phy_interface_t interface,
+struct phy_device *phydev)
+{
+   struct b53_device *dev = ds->priv;
+
+   if (mode == MLO_AN_PHY)
+   return;
+
+   if (mode == MLO_AN_FIXED) {
+   b53_force_link(dev, port, true);
+   return;
+   }
+}
+EXPORT_SYMBOL(b53_phylink_mac_link_up);
+
 int b53_vlan_filtering(struct dsa_switch *ds, int port, bool vlan_filtering)
 {
return 0;
@@ -1750,6 +1864,12 @@ static const struct dsa_switch_ops b53_switch_ops = {
.phy_read   = b53_phy_read16,
.phy_write  = b53_phy_write16,
.adjust_link= b53_adjust_link,
+   .phylink_validate   = b53_phylink_validate,
+   .phylink_mac_link_state = b53_phylink_mac_link_state,
+   .phylink_mac_config = b53_phylink_mac_config,
+   .phylink_mac_an_restart = b53_phylink_mac_an_restart,
+   .phylink_mac_link_down  = b53_phylink_mac_link_down,
+   
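To make the mask logic of b53_phylink_validate() easier to follow, here is a trimmed-down userspace model. The enums, `LM()` and `validate()` are hypothetical stand-ins for the ethtool link-mode bitmap, phy_interface_t and bitmap_and(); `is_fe_only` stands for `is5325(dev) || is5365(dev)`. The `LM_1000X_FD` bit for 802.3z interfaces is not set by this patch; it only stands in for what a SerDes-specific hook could add.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of link modes, one bit each. */
enum link_mode {
	LM_10HD, LM_10FD, LM_100HD, LM_100FD, LM_1000HD, LM_1000FD,
	LM_1000X_FD, LM_AUTONEG, LM_PAUSE, LM_ASYM_PAUSE,
};
#define LM(m) (UINT32_C(1) << (m))

enum iface { IF_MII, IF_REVMII, IF_RGMII, IF_SGMII, IF_1000BASEX };

static bool iface_is_8023z(enum iface i)
{
	return i == IF_1000BASEX;
}

/* Build the MAC's capability mask for this interface and AND it into the
 * caller's supported set. */
static uint32_t validate(uint32_t supported, enum iface i, bool is_fe_only)
{
	uint32_t mask = LM(LM_AUTONEG) | LM(LM_PAUSE) | LM(LM_ASYM_PAUSE);

	/* Gigabit everywhere except MII, RevMII, 802.3z and FE-only chips. */
	if (i != IF_MII && i != IF_REVMII && !iface_is_8023z(i) && !is_fe_only)
		mask |= LM(LM_1000FD) | LM(LM_1000HD);

	if (!iface_is_8023z(i))
		mask |= LM(LM_10HD) | LM(LM_10FD) | LM(LM_100HD) | LM(LM_100FD);
	else
		mask |= LM(LM_1000X_FD);	/* stand-in for a SerDes hook */

	return supported & mask;
}
```

Building a positive mask and intersecting it keeps the function monotonic: it can only remove modes from what phylink offered, never add new ones.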

[PATCH net-next 1/5] net: dsa: b53: Add ability to enable/disable port interrupts

2018-09-04 Thread Florian Fainelli
Some switches expose individual interrupt line(s) for port-specific
event(s). Allow configuring these interrupts at an appropriate time
during the port_enable/disable callbacks, where all port-specific
resources are known to be set up and ready for use.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 9 +
 drivers/net/dsa/b53/b53_priv.h   | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index d93c790bfbe8..85ed264bc163 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -502,8 +502,14 @@ int b53_enable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
 {
struct b53_device *dev = ds->priv;
unsigned int cpu_port = ds->ports[port].cpu_dp->index;
+   int ret = 0;
u16 pvlan;
 
+   if (dev->ops->irq_enable)
+   ret = dev->ops->irq_enable(dev, port);
+   if (ret)
+   return ret;
+
/* Clear the Rx and Tx disable bits and set to no spanning tree */
b53_write8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(port), 0);
 
@@ -536,6 +542,9 @@ void b53_disable_port(struct dsa_switch *ds, int port, struct phy_device *phy)
b53_read8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(port), &reg);
reg |= PORT_CTRL_RX_DISABLE | PORT_CTRL_TX_DISABLE;
b53_write8(dev, B53_CTRL_PAGE, B53_PORT_CTRL(port), reg);
+
+   if (dev->ops->irq_disable)
+   dev->ops->irq_disable(dev, port);
 }
 EXPORT_SYMBOL(b53_disable_port);
 
diff --git a/drivers/net/dsa/b53/b53_priv.h b/drivers/net/dsa/b53/b53_priv.h
index df149756c282..2980a5838f58 100644
--- a/drivers/net/dsa/b53/b53_priv.h
+++ b/drivers/net/dsa/b53/b53_priv.h
@@ -43,6 +43,8 @@ struct b53_io_ops {
int (*write64)(struct b53_device *dev, u8 page, u8 reg, u64 value);
int (*phy_read16)(struct b53_device *dev, int addr, int reg, u16 *value);
int (*phy_write16)(struct b53_device *dev, int addr, int reg, u16 value);
+   int (*irq_enable)(struct b53_device *dev, int port);
+   void (*irq_disable)(struct b53_device *dev, int port);
 };
 
 enum {
-- 
2.17.1
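The ordering used in this patch (optional hook, interrupt set up before the port is enabled, torn down after it is disabled, error returned before touching the port) can be sketched in isolation. The types and names below are hypothetical, trimmed stand-ins for struct b53_io_ops and the port control register; `irq_enable_busy()` is just an example hook that fails.

```c
#include <errno.h>
#include <stddef.h>

struct dev;

/* Trimmed-down ops table: both hooks are optional (may be NULL). */
struct io_ops {
	int (*irq_enable)(struct dev *d, int port);
	void (*irq_disable)(struct dev *d, int port);
};

struct dev {
	const struct io_ops *ops;
	unsigned int port_enabled;	/* bitmask standing in for port regs */
};

/* Example hook that always fails, e.g. because the IRQ is taken. */
static int irq_enable_busy(struct dev *d, int port)
{
	(void)d;
	(void)port;
	return -EBUSY;
}

static int enable_port(struct dev *d, int port)
{
	int ret = 0;

	/* Interrupt first; on failure, leave the port untouched. */
	if (d->ops->irq_enable)
		ret = d->ops->irq_enable(d, port);
	if (ret)
		return ret;

	d->port_enabled |= 1u << port;
	return 0;
}

static void disable_port(struct dev *d, int port)
{
	d->port_enabled &= ~(1u << port);

	/* Free the interrupt only after the port has been quiesced. */
	if (d->ops->irq_disable)
		d->ops->irq_disable(d, port);
}
```

Drivers without per-port interrupts simply leave both pointers NULL and the common code behaves as before.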



[PATCH net-next 3/5] net: dsa: b53: Add helper to set link parameters

2018-09-04 Thread Florian Fainelli
Extract the logic from b53_adjust_link() responsible for overriding a
given port's link, speed, duplex and pause settings into two helper
functions: one to set the port's configuration and one to set the
port's link. We will use both as separate functions when adding
PHYLINK support next.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_common.c | 89 +---
 1 file changed, 60 insertions(+), 29 deletions(-)

diff --git a/drivers/net/dsa/b53/b53_common.c b/drivers/net/dsa/b53/b53_common.c
index 85ed264bc163..78aeaccf19a1 100644
--- a/drivers/net/dsa/b53/b53_common.c
+++ b/drivers/net/dsa/b53/b53_common.c
@@ -26,6 +26,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -947,33 +948,50 @@ static int b53_setup(struct dsa_switch *ds)
return ret;
 }
 
-static void b53_adjust_link(struct dsa_switch *ds, int port,
-   struct phy_device *phydev)
+static void b53_force_link(struct b53_device *dev, int port, int link)
 {
-   struct b53_device *dev = ds->priv;
-   struct ethtool_eee *p = &dev->ports[port].eee;
-   u8 rgmii_ctrl = 0, reg = 0, off;
-
-   if (!phy_is_pseudo_fixed_link(phydev))
-   return;
+   u8 reg, val, off;
 
/* Override the port settings */
if (port == dev->cpu_port) {
off = B53_PORT_OVERRIDE_CTRL;
-   reg = PORT_OVERRIDE_EN;
+   val = PORT_OVERRIDE_EN;
} else {
off = B53_GMII_PORT_OVERRIDE_CTRL(port);
-   reg = GMII_PO_EN;
+   val = GMII_PO_EN;
}
 
-   /* Set the link UP */
-   if (phydev->link)
+   b53_read8(dev, B53_CTRL_PAGE, off, &reg);
+   reg |= val;
+   if (link)
reg |= PORT_OVERRIDE_LINK;
+   else
+   reg &= ~PORT_OVERRIDE_LINK;
+   b53_write8(dev, B53_CTRL_PAGE, off, reg);
+}
+
+static void b53_force_port_config(struct b53_device *dev, int port,
+ int speed, int duplex, int pause)
+{
+   u8 reg, val, off;
+
+   /* Override the port settings */
+   if (port == dev->cpu_port) {
+   off = B53_PORT_OVERRIDE_CTRL;
+   val = PORT_OVERRIDE_EN;
+   } else {
+   off = B53_GMII_PORT_OVERRIDE_CTRL(port);
+   val = GMII_PO_EN;
+   }
 
-   if (phydev->duplex == DUPLEX_FULL)
+   b53_read8(dev, B53_CTRL_PAGE, off, &reg);
+   reg |= val;
+   if (duplex == DUPLEX_FULL)
reg |= PORT_OVERRIDE_FULL_DUPLEX;
+   else
+   reg &= ~PORT_OVERRIDE_FULL_DUPLEX;
 
-   switch (phydev->speed) {
+   switch (speed) {
case 2000:
reg |= PORT_OVERRIDE_SPEED_2000M;
/* fallthrough */
@@ -987,21 +1005,41 @@ static void b53_adjust_link(struct dsa_switch *ds, int port,
reg |= PORT_OVERRIDE_SPEED_10M;
break;
default:
-   dev_err(ds->dev, "unknown speed: %d\n", phydev->speed);
+   dev_err(dev->dev, "unknown speed: %d\n", speed);
return;
}
 
+   if (pause & MLO_PAUSE_RX)
+   reg |= PORT_OVERRIDE_RX_FLOW;
+   if (pause & MLO_PAUSE_TX)
+   reg |= PORT_OVERRIDE_TX_FLOW;
+
+   b53_write8(dev, B53_CTRL_PAGE, off, reg);
+}
+
+static void b53_adjust_link(struct dsa_switch *ds, int port,
+   struct phy_device *phydev)
+{
+   struct b53_device *dev = ds->priv;
+   struct ethtool_eee *p = &dev->ports[port].eee;
+   u8 rgmii_ctrl = 0, reg = 0, off;
+   int pause = 0;
+
+   if (!phy_is_pseudo_fixed_link(phydev))
+   return;
+
/* Enable flow control on BCM5301x's CPU port */
if (is5301x(dev) && port == dev->cpu_port)
-   reg |= PORT_OVERRIDE_RX_FLOW | PORT_OVERRIDE_TX_FLOW;
+   pause = MLO_PAUSE_TXRX_MASK;
 
if (phydev->pause) {
if (phydev->asym_pause)
-   reg |= PORT_OVERRIDE_TX_FLOW;
-   reg |= PORT_OVERRIDE_RX_FLOW;
+   pause |= MLO_PAUSE_TX;
+   pause |= MLO_PAUSE_RX;
}
 
-   b53_write8(dev, B53_CTRL_PAGE, off, reg);
+   b53_force_port_config(dev, port, phydev->speed, phydev->duplex, pause);
+   b53_force_link(dev, port, phydev->link);
 
if (is531x5(dev) && phy_interface_is_rgmii(phydev)) {
if (port == 8)
@@ -1061,16 +1099,9 @@ static void b53_adjust_link(struct dsa_switch *ds, int port,
}
} else if (is5301x(dev)) {
if (port != dev->cpu_port) {
-   u8 po_reg = B53_GMII_PORT_OVERRIDE_CTRL(dev->cpu_port);
-   u8 gmii_po;
-
-   b53_read8(dev, B53_CTRL_PAGE, po_reg, &gmii_po);
-   gmii_po |= GMII_PO_LINK |
-  GMII_PO_RX_FLOW |
-  GMII_PO_TX_FLOW |
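The split between "force link" and "force configuration" can be sketched as two pure functions over a register byte. The bit layout below (`OVR_*`, `PAUSE_*`) is hypothetical and chosen for illustration; the real PORT_OVERRIDE_* and MLO_PAUSE_* values come from the b53 register header and phylink.h. One deliberate difference from the quoted hunks, which only OR in the speed and flow-control bits: this configuration helper clears every field it owns before setting it, so a speed change cannot leave stale bits behind.

```c
#include <stdint.h>

/* Hypothetical override register layout. */
#define OVR_LINK        (1u << 0)
#define OVR_FULL_DUPLEX (1u << 1)
#define OVR_SPEED_SHIFT 2
#define OVR_SPEED_MASK  (3u << OVR_SPEED_SHIFT)
#define OVR_RX_FLOW     (1u << 4)
#define OVR_TX_FLOW     (1u << 5)
#define OVR_EN          (1u << 6)

/* Hypothetical pause flags. */
#define PAUSE_RX (1u << 0)
#define PAUSE_TX (1u << 1)

/* Touch only the link bit (plus the override enable). */
static uint8_t force_link(uint8_t reg, int link)
{
	reg |= OVR_EN;
	if (link)
		reg |= OVR_LINK;
	else
		reg &= ~OVR_LINK;
	return reg;
}

/* Touch only speed, duplex and pause. Returns -1 on an unsupported speed
 * and leaves *reg unchanged. */
static int force_port_config(uint8_t *reg, int speed, int full_duplex,
			     unsigned int pause)
{
	uint8_t v = *reg | OVR_EN;
	unsigned int spd;

	switch (speed) {
	case 10:   spd = 0; break;
	case 100:  spd = 1; break;
	case 1000: spd = 2; break;
	default:   return -1;
	}

	v &= ~(OVR_FULL_DUPLEX | OVR_SPEED_MASK | OVR_RX_FLOW | OVR_TX_FLOW);
	v |= spd << OVR_SPEED_SHIFT;
	if (full_duplex)
		v |= OVR_FULL_DUPLEX;
	if (pause & PAUSE_RX)
		v |= OVR_RX_FLOW;
	if (pause & PAUSE_TX)
		v |= OVR_TX_FLOW;
	*reg = v;
	return 0;
}
```

Keeping the link bit out of the configuration path is what lets PHYLINK call mac_config and mac_link_up/down independently.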

[PATCH net-next 2/5] net: dsa: b53: Make SRAB driver manage port interrupts

2018-09-04 Thread Florian Fainelli
Update the SRAB driver to manage per-port interrupts. Since we cannot
sleep during b53_io_ops, schedule a work item whenever we get a
port-specific interrupt. We will later make use of this to call back into
PHYLINK when there is e.g. a link state change.

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/b53/b53_srab.c | 108 +
 1 file changed, 108 insertions(+)

diff --git a/drivers/net/dsa/b53/b53_srab.c b/drivers/net/dsa/b53/b53_srab.c
index 91de2ba99ad1..411b84f61903 100644
--- a/drivers/net/dsa/b53/b53_srab.c
+++ b/drivers/net/dsa/b53/b53_srab.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "b53_priv.h"
 
@@ -47,6 +48,7 @@
 
 /* command and status register of the SRAB */
 #define B53_SRAB_CTRLS 0x40
+#define  B53_SRAB_CTRLS_HOST_INTR  BIT(1)
 #define  B53_SRAB_CTRLS_RCAREQ BIT(3)
 #define  B53_SRAB_CTRLS_RCAGNT BIT(4)
 #define  B53_SRAB_CTRLS_SW_INIT_DONE   BIT(6)
@@ -60,8 +62,17 @@
 #define  B53_SRAB_P7_SLEEP_TIMER   BIT(11)
 #define  B53_SRAB_IMP0_SLEEP_TIMER BIT(12)
 
+struct b53_srab_port_priv {
+   struct work_struct irq_work;
+   int irq;
+   bool irq_enabled;
+   struct b53_device *dev;
+   unsigned int num;
+};
+
 struct b53_srab_priv {
void __iomem *regs;
+   struct b53_srab_port_priv port_intrs[B53_N_PORTS];
 };
 
 static int b53_srab_request_grant(struct b53_device *dev)
@@ -344,6 +355,50 @@ static int b53_srab_write64(struct b53_device *dev, u8 page, u8 reg,
return ret;
 }
 
+static void b53_srab_port_defer(struct work_struct *work)
+{
+}
+
+static irqreturn_t b53_srab_port_isr(int irq, void *dev_id)
+{
+   struct b53_srab_port_priv *port = dev_id;
+   struct b53_device *dev = port->dev;
+   struct b53_srab_priv *priv = dev->priv;
+
+   /* Acknowledge the interrupt */
+   writel(BIT(port->num), priv->regs + B53_SRAB_INTR);
+
+   schedule_work(&port->irq_work);
+
+   return IRQ_HANDLED;
+}
+
+static int b53_srab_irq_enable(struct b53_device *dev, int port)
+{
+   struct b53_srab_priv *priv = dev->priv;
+   struct b53_srab_port_priv *p = &priv->port_intrs[port];
+   int ret;
+
+   ret = request_irq(p->irq, b53_srab_port_isr, 0,
+ dev_name(dev->dev), p);
+   if (!ret)
+   p->irq_enabled = true;
+
+   return ret;
+}
+
+static void b53_srab_irq_disable(struct b53_device *dev, int port)
+{
+   struct b53_srab_priv *priv = dev->priv;
+   struct b53_srab_port_priv *p = &priv->port_intrs[port];
+
+   if (p->irq_enabled) {
+   free_irq(p->irq, p);
+   cancel_work_sync(&p->irq_work);
+   p->irq_enabled = false;
+   }
+}
+
 static const struct b53_io_ops b53_srab_ops = {
.read8 = b53_srab_read8,
.read16 = b53_srab_read16,
@@ -355,6 +410,8 @@ static const struct b53_io_ops b53_srab_ops = {
.write32 = b53_srab_write32,
.write48 = b53_srab_write48,
.write64 = b53_srab_write64,
+   .irq_enable = b53_srab_irq_enable,
+   .irq_disable = b53_srab_irq_disable,
 };
 
 static const struct of_device_id b53_srab_of_match[] = {
@@ -379,6 +436,53 @@ static const struct of_device_id b53_srab_of_match[] = {
 };
 MODULE_DEVICE_TABLE(of, b53_srab_of_match);
 
+static void b53_srab_intr_set(struct b53_srab_priv *priv, bool set)
+{
+   u32 reg;
+
+   reg = readl(priv->regs + B53_SRAB_CTRLS);
+   if (set)
+   reg |= B53_SRAB_CTRLS_HOST_INTR;
+   else
+   reg &= ~B53_SRAB_CTRLS_HOST_INTR;
+   writel(reg, priv->regs + B53_SRAB_CTRLS);
+}
+
+static void b53_srab_prepare_irq(struct platform_device *pdev)
+{
+   struct b53_device *dev = platform_get_drvdata(pdev);
+   struct b53_srab_priv *priv = dev->priv;
+   struct b53_srab_port_priv *port;
+   unsigned int i;
+   char *name;
+
+   /* Clear all pending interrupts */
+   writel(0x, priv->regs + B53_SRAB_INTR);
+
+   if (dev->pdata && dev->pdata->chip_id != BCM58XX_DEVICE_ID)
+   return;
+
+   for (i = 0; i < B53_N_PORTS; i++) {
+   port = &priv->port_intrs[i];
+
+   /* There is no port 6 */
+   if (i == 6)
+   continue;
+
+   name = kasprintf(GFP_KERNEL, "link_state_p%d", i);
+   if (!name)
+   return;
+
+   port->num = i;
+   port->dev = dev;
+   INIT_WORK(&port->irq_work, b53_srab_port_defer);
+   port->irq = platform_get_irq_byname(pdev, name);
+   kfree(name);
+   }
+
+   b53_srab_intr_set(priv, true);
+}
+
 static int b53_srab_probe(struct platform_device *pdev)
 {
struct b53_platform_data *pdata = pdev->dev.platform_data;
@@ -417,13 +521,17 @@ static int b53_srab_probe(struct platform_device *pdev)
 
platform_set_drvdata(pdev, dev);
 
+   b53_srab_prepare_irq(pdev);
+
return 
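Patch 2/5 gives each port its own work_struct, scheduled from the ISR. An alternative shape of the same top-half/bottom-half split, shown here as a swapped-in technique rather than what the patch does, is a single pending bitmask: the handler only records which port fired, and a worker that may sleep drains it atomically. This is a userspace C11 sketch; `port_isr()` and `port_worker()` are hypothetical names.

```c
#include <stdatomic.h>

struct port_events {
	atomic_uint pending;	/* one bit per port, set by the "ISR" */
	unsigned int handled;	/* worker-side record, for illustration */
};

/* Top half: must not sleep, so just mark the port. In a driver this is
 * also where the interrupt gets acknowledged and the worker scheduled. */
static void port_isr(struct port_events *ev, unsigned int port)
{
	atomic_fetch_or(&ev->pending, 1u << port);
}

/* Bottom half: take every pending bit at once, then handle each port
 * (e.g. read link status and notify PHYLINK). Returns ports handled. */
static int port_worker(struct port_events *ev)
{
	unsigned int todo = atomic_exchange(&ev->pending, 0u);
	int n = 0;

	for (unsigned int port = 0; port < 32; port++) {
		if (!(todo & (1u << port)))
			continue;
		ev->handled |= 1u << port;
		n++;
	}
	return n;
}
```

Repeated interrupts for the same port before the worker runs collapse into one bit, much like scheduling an already-pending work_struct is a no-op.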

Re: [PATCH mlx5-next v1 05/15] net/mlx5: Break encap/decap into two separated flow table creation flags

2018-09-04 Thread Jason Gunthorpe
On Tue, Aug 28, 2018 at 02:18:44PM +0300, Leon Romanovsky wrote:
> From: Mark Bloch 
> 
> Today we are able to attach encap and decap actions only to the FDB. In
> preparation to enable those actions on the NIC flow tables, break the
> single flag into two. Those flags control whether decap or encap
> operations can be attached to the created flow table. For FDB, if
> encapsulation is required, we set both of them.
> 
> Signed-off-by: Mark Bloch 
> Reviewed-by: Saeed Mahameed 
> Reviewed-by: Or Gerlitz 
> Signed-off-by: Leon Romanovsky 
>  drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c | 3 ++-
>  drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c   | 7 ---
>  include/linux/mlx5/fs.h| 3 ++-
>  3 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index f72b5c9dcfe9..ff21807a0c4b 100644
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@ -529,7 +529,8 @@ static int esw_create_offloads_fast_fdb_table(struct mlx5_eswitch *esw)
>   esw_size >>= 1;
>  
>   if (esw->offloads.encap != DEVLINK_ESWITCH_ENCAP_MODE_NONE)
> - flags |= MLX5_FLOW_TABLE_TUNNEL_EN;
> + flags |= (MLX5_FLOW_TABLE_TUNNEL_EN_ENCAP |
> +   MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);
>  
>   fdb = mlx5_create_auto_grouped_flow_table(root_ns, FDB_FAST_PATH,
> esw_size,
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
> index 9ae777e56529..1698f325a21e 100644
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c
> @@ -152,7 +152,8 @@ static int mlx5_cmd_create_flow_table(struct mlx5_core_dev *dev,
> struct mlx5_flow_table *next_ft,
> unsigned int *table_id, u32 flags)
>  {
> - int en_encap_decap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN);
> + int en_encap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN_ENCAP);
> + int en_decap = !!(flags & MLX5_FLOW_TABLE_TUNNEL_EN_DECAP);

Yuk, please don't use !!.

bool en_decap = flags & MLX5_FLOW_TABLE_TUNNEL_EN_DECAP;

Jason
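The reason the `!!` is unnecessary: in C, converting any scalar to `bool` yields 1 if it is non-zero and 0 otherwise, so assigning a masked flag straight to a `bool` already normalizes it. An `int` keeps the raw bit value, which is what `!!` was compensating for. The flag values below are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define TUNNEL_EN_ENCAP (1u << 3)	/* hypothetical flag values */
#define TUNNEL_EN_DECAP (1u << 4)

/* Conversion to bool yields 1 for any non-zero value: no !! needed. */
static bool want_decap(uint32_t flags)
{
	return flags & TUNNEL_EN_DECAP;
}

/* Stored in an int, the masked value keeps the raw bit (16 here). */
static int raw_decap_bit(uint32_t flags)
{
	return (int)(flags & TUNNEL_EN_DECAP);
}
```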


Re: [PATCH rdma-next v1 12/15] RDMA/mlx5: Add a new flow action verb - modify header

2018-09-04 Thread Jason Gunthorpe
On Tue, Aug 28, 2018 at 02:18:51PM +0300, Leon Romanovsky wrote:

> +static int UVERBS_HANDLER(MLX5_IB_METHOD_FLOW_ACTION_CREATE_MODIFY_HEADER)(
> + struct ib_uverbs_file *file,
> + struct uverbs_attr_bundle *attrs)
> +{
> + struct ib_uobject *uobj = uverbs_attr_get_uobject(
> + attrs, MLX5_IB_ATTR_CREATE_MODIFY_HEADER_HANDLE);
> + struct mlx5_ib_dev *mdev = to_mdev(uobj->context->device);
> + enum mlx5_ib_uapi_flow_table_type ft_type;
> + struct ib_flow_action *action;
> + size_t num_actions;
> + void *in;
> + int len;
> + int ret;
> +
> + if (!mlx5_ib_modify_header_supported(mdev))
> + return -EOPNOTSUPP;
> +
> + in = uverbs_attr_get_alloced_ptr(attrs,
> + MLX5_IB_ATTR_CREATE_MODIFY_HEADER_ACTIONS_PRM);
> + len = uverbs_attr_get_len(attrs,
> + MLX5_IB_ATTR_CREATE_MODIFY_HEADER_ACTIONS_PRM);
> +
> + if (len % MLX5_UN_SZ_BYTES(set_action_in_add_action_in_auto))
> + return -EINVAL;
> +
> + ret = uverbs_get_const(&ft_type, attrs,
> +MLX5_IB_ATTR_CREATE_MODIFY_HEADER_FT_TYPE);
> + if (ret)
> + return -EINVAL;

This should be

if (ret)
return ret;

Every call to uverbs_get_const is wrong in the same way.

I can probably fix it if this is the only thing though..

Jason
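The point of the review comment is to propagate the getter's own error code rather than collapsing every failure into -EINVAL, so userspace can tell, say, a missing attribute from a bad value. A minimal sketch, where `get_const()` and `handler()` are hypothetical and the specific errno values are assumptions for illustration, not what uverbs_get_const() actually returns:

```c
#include <errno.h>

/* Hypothetical getter that distinguishes "absent" from "out of range". */
static int get_const(unsigned int *out, int present, unsigned int val,
		     unsigned int max)
{
	if (!present)
		return -ENOENT;
	if (val > max)
		return -EINVAL;
	*out = val;
	return 0;
}

static int handler(int present, unsigned int val)
{
	unsigned int ft_type;
	int ret;

	ret = get_const(&ft_type, present, val, 2);
	if (ret)
		return ret;	/* propagate, don't remap to -EINVAL */

	return (int)ft_type;
}
```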


Re: [PATCH v2 net-next 4/7] dt-bindings: net: Add lantiq,xrx200-net DT bindings

2018-09-04 Thread Hauke Mehrtens
On 09/03/2018 09:46 PM, Florian Fainelli wrote:
> 
> 
> On 9/1/2018 5:04 AM, Hauke Mehrtens wrote:
>> This adds the binding for the PMAC core between the CPU and the GSWIP
>> switch found on the xrx200 / VR9 Lantiq / Intel SoC.
>>
>> Signed-off-by: Hauke Mehrtens 
>> Cc: devicet...@vger.kernel.org
>> ---
>>   .../devicetree/bindings/net/lantiq,xrx200-net.txt   | 21 +
>>   1 file changed, 21 insertions(+)
>>   create mode 100644 Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt
>>
>> diff --git a/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt b/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt
>> b/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt
>> new file mode 100644
>> index ..8a2fe5200cdc
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/lantiq,xrx200-net.txt
>> @@ -0,0 +1,21 @@
>> +Lantiq xRX200 GSWIP PMAC Ethernet driver
>> +==
>> +
>> +Required properties:
>> +
>> +- compatible    : "lantiq,xrx200-net" for the PMAC of the embedded
>> +    : GSWIP in the xRX200
>> +- reg    : memory range of the PMAC core inside of the GSWIP core
>> +- interrupts    : TX and RX DMA interrupts. Use interrupt-names "tx" for
>> +    : the TX interrupt and "rx" for the RX interrupt.
> 
> You would likely want to document that the order should be strict, that
> is TX interrupt first and RX interrupt second, but other than that:
> 
> Reviewed-by: Florian Fainelli 

Currently this is fetched based on the name, like this:
platform_get_irq_byname(pdev, "rx");

I do not care about the order; only the interrupt-names must match.

Hauke
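Why the order does not matter when interrupts are fetched by name: the lookup pairs each entry of interrupt-names with the interrupt at the same position, so swapping both lists together yields the same result. A userspace model with a hypothetical `irq_byname()` helper and made-up IRQ numbers:

```c
#include <string.h>

/* names[i] labels irqs[i], as interrupt-names labels interrupts. */
static int irq_byname(const char *const *names, const int *irqs, int n,
		      const char *want)
{
	for (int i = 0; i < n; i++)
		if (strcmp(names[i], want) == 0)
			return irqs[i];
	return -1;	/* not found; the kernel helper returns a negative error */
}
```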





Re: [PATCH v2 net-next 5/7] net: lantiq: Add Lantiq / Intel VRX200 Ethernet driver

2018-09-04 Thread Hauke Mehrtens
Hi Florian,

Thanks for the review.

On 09/03/2018 09:24 PM, Florian Fainelli wrote:
> 
> 
> On 9/1/2018 5:04 AM, Hauke Mehrtens wrote:
>> This drives the PMAC between the GSWIP Switch and the CPU in the VRX200
>> SoC. This is currently only the very basic version of the Ethernet
>> driver.
>>
>> When the DMA channel is activated we receive some packets which were
>> sent to the SoC while it was still in U-Boot; these packets have the
>> wrong header. Resetting the IP cores did not work, so we read out the
>> extra packets at the beginning and discard them.
>>
>> This also adapts the clock code in sysctrl.c to use the default name of
>> the device node so that the driver gets the correct clock. sysctrl.c
>> should be replaced with a proper common clock driver later.
>>
>> Signed-off-by: Hauke Mehrtens 
>> ---
>>   MAINTAINERS  |   1 +
>>   arch/mips/lantiq/xway/sysctrl.c  |   6 +-
>>   drivers/net/ethernet/Kconfig |   7 +
>>   drivers/net/ethernet/Makefile    |   1 +
>>   drivers/net/ethernet/lantiq_xrx200.c | 591 +++
>>   5 files changed, 603 insertions(+), 3 deletions(-)
>>   create mode 100644 drivers/net/ethernet/lantiq_xrx200.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 4b2ee65f6086..912d31b5 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -8171,6 +8171,7 @@ M:    Hauke Mehrtens 
>>   L:    netdev@vger.kernel.org
>>   S:    Maintained
>>   F:    net/dsa/tag_gswip.c
>> +F:    drivers/net/ethernet/lantiq_xrx200.c
>>     LANTIQ MIPS ARCHITECTURE
>>   M:    John Crispin 
>> diff --git a/arch/mips/lantiq/xway/sysctrl.c b/arch/mips/lantiq/xway/sysctrl.c
>> index e0af39b33e28..eeb89a37e27e 100644
>> --- a/arch/mips/lantiq/xway/sysctrl.c
>> +++ b/arch/mips/lantiq/xway/sysctrl.c
>> @@ -505,7 +505,7 @@ void __init ltq_soc_init(void)
>>   clkdev_add_pmu("1a80.pcie", "msi", 1, 1, PMU1_PCIE2_MSI);
>>   clkdev_add_pmu("1a80.pcie", "pdi", 1, 1, PMU1_PCIE2_PDI);
>>   clkdev_add_pmu("1a80.pcie", "ctl", 1, 1, PMU1_PCIE2_CTL);
>> -    clkdev_add_pmu("1e108000.eth", NULL, 0, 0, PMU_SWITCH | PMU_PPE_DP);
>> +    clkdev_add_pmu("1e10b308.eth", NULL, 0, 0, PMU_SWITCH | PMU_PPE_DP);
>>   clkdev_add_pmu("1da0.usif", "NULL", 1, 0, PMU_USIF);
>>   clkdev_add_pmu("1e103100.deu", NULL, 1, 0, PMU_DEU);
>>   } else if (of_machine_is_compatible("lantiq,ar10")) {
>> @@ -513,7 +513,7 @@ void __init ltq_soc_init(void)
>>     ltq_ar10_fpi_hz(), ltq_ar10_pp32_hz());
>>   clkdev_add_pmu("1e101000.usb", "otg", 1, 0, PMU_USB0);
>>   clkdev_add_pmu("1e106000.usb", "otg", 1, 0, PMU_USB1);
>> -    clkdev_add_pmu("1e108000.eth", NULL, 0, 0, PMU_SWITCH |
>> +    clkdev_add_pmu("1e10b308.eth", NULL, 0, 0, PMU_SWITCH |
>>  PMU_PPE_DP | PMU_PPE_TC);
> 
> Shouldn't that be part of patch 4, where you define the base register
> address?

hmm, I can also put this into patch number 4.

> 
>>   clkdev_add_pmu("1da0.usif", "NULL", 1, 0, PMU_USIF);
>>   clkdev_add_pmu("1f203020.gphy", NULL, 1, 0, PMU_GPHY);
>> @@ -536,7 +536,7 @@ void __init ltq_soc_init(void)
>>   clkdev_add_pmu(NULL, "ahb", 1, 0, PMU_AHBM | PMU_AHBS);
>>     clkdev_add_pmu("1da0.usif", "NULL", 1, 0, PMU_USIF);
>> -    clkdev_add_pmu("1e108000.eth", NULL, 0, 0,
>> +    clkdev_add_pmu("1e10b308.eth", NULL, 0, 0,
>>   PMU_SWITCH | PMU_PPE_DPLUS | PMU_PPE_DPLUM |
>>   PMU_PPE_EMA | PMU_PPE_TC | PMU_PPE_SLL01 |
>>   PMU_PPE_QSB | PMU_PPE_TOP);
> 
> Likewise.
> 
> [snip]
> 
>> +static int xrx200_open(struct net_device *dev)
>> +{
>> +    struct xrx200_priv *priv = netdev_priv(dev);
>> +
>> +    ltq_dma_open(&priv->chan_tx.dma);
>> +    ltq_dma_enable_irq(&priv->chan_tx.dma);
>> +
>> +    napi_enable(&priv->chan_rx.napi);
>> +    ltq_dma_open(&priv->chan_rx.dma);
>> +    /* The boot loader does not always deactivate the receiving of frames
>> + * on the ports and then some packets queue up in the PPE buffers.
>> + * They already passed the PMAC so they do not have the tags
>> + * configured here. Read these packets here and drop them.
>> + * The HW should have written them into memory after 10us
>> + */
>> +    udelay(10);
> 
> You execute in process context in the ndo_open() callback (AFAIR),
> so would usleep_range() work here?

Documentation/timers/timers-howto.txt says that for ~10us I should use
udelay() even in non-atomic context, but I can try usleep_range() too.

>> +    xrx200_flush_dma(&priv->chan_rx);
>> +    ltq_dma_enable_irq(&priv->chan_rx.dma);
>> +
>> +    netif_wake_queue(dev);
>> +
>> +    return 0;
>> +}
>> +
>> +static int xrx200_close(struct net_device *dev)
>> +{
>> +    struct xrx200_priv *priv = netdev_priv(dev);
>> +
>> +    netif_stop_queue(dev);
>> +
>> +    napi_disable(&priv->chan_rx.napi);
>> +    ltq_dma_close(&priv->chan_rx.dma);
>> +
>> +    

Re: phys_port_id in switchdev mode?

2018-09-04 Thread Or Gerlitz
On Tue, Sep 4, 2018 at 1:20 PM, Jakub Kicinski
 wrote:
> On Mon, 3 Sep 2018 12:40:22 +0300, Or Gerlitz wrote:
>> On Tue, Aug 28, 2018 at 9:05 PM, Jakub Kicinski wrote:
>> > Hi!
>>
>> Hi Jakub, and sorry for the late reply; this crazily hot summer refuses to die.
>>
>> Note I replied a couple of minutes ago but it didn't get to the list, so
>> let's take it from this one:
>>
>> > I wonder if we can use phys_port_id in switchdev to group together
>> > interfaces of a single PCI PF?  Here is the problem:
>> >
>> > With a mix of PF and VF interfaces it gets increasingly difficult to
>> > figure out which one corresponds to which PF.  We can identify which
>> > *representor* is which, by means of phys_port_name and devlink
>> > flavours.  But if the actual VF/PF interfaces are also present on the
>> > same host, it gets confusing when one tries to identify the PF they
>> > came from.  Generally one has to resort to matching the PCI DBDF of
>> > the PF and VFs, or reading the relevant info out of ethtool -i.
>> >
>> > In the multi-host scenario this is particularly painful, as there seems to
>> > be no immediately obvious way to match PCI interface ID of a card (0,
>> > 1, 2, 3, 4...) to the DBDF we have connected.
>> >
>> > Another angle to this is legacy SR-IOV NDOs.  User space picks a netdev
>> > from /sys/bus/pci/$VF_DBDF/physfn/net/ to run the NDOs on in a somewhat
>> > random manner, which means we have to provide those for all devices with
>> > link to the PF (all reprs).  And we have to link them (a) because it's
>> > right (tm) and (b) to get correct naming.
>>
>> wait, as you commented later, not only the mellanox VF reprs but also
>> the nfp VF reprs are not linked to the PF, because ip link output
>> grows quadratically.
>
> Right, correct.  If we set phys_port_id libvirt will reliably pick the
> correct netdev to run NDOs on (PF/PF repr) so we can remove them from
> the other netdevs and therefore limit the size of ip link show output.

Just to make sure: this is a suggested/future flow, not an existing one in libvirt?


> Ugh, you're right!  Libvirt is our primary target here.  IIUC we need
> phys_port_id on the actual VF and then *a* netdev linked to physfn in
> sysfs which will have the legacy NDOs.
>
> We can't set the phys_port_id on the VF reprs because then we're back
> to the problem of ip link output growing.  Perhaps we shouldn't set it
> on PF repr either?
>
> Let's make a table (assuming bare metal cloud scenario where Host0 is
> controlling the network, while Host1 is the actual server):

yeah, this would be a superset of the non-smartnic case where
we have only one host.



[...]


> With this libvirt on Host0 should easily find the actual PF0 netdev to
> run the NDO on, if it wants to use VFs:
>  - libvirt finds act VF0/0 to plug into the VM;
>  - reads its phys_port_id -> "PF0 SN";
>  - finds netdev with "PF0 SN" linked to physfn -> "act PF0";
>  - runs NDOs on "act PF0" for PF0's VF correctly.
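The four steps above amount to a join on phys_port_id. Below is a hypothetical sketch of only the matching logic. It assumes the netdev names, IDs and physfn links have already been collected; real code would read them from sysfs. The struct and helpers are illustrative and are not libvirt code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of the netdevs visible on one host. */
struct netdev {
	const char *name;
	const char *phys_port_id;   /* e.g. "PF0 SN"; NULL if unset */
	bool linked_to_physfn;      /* listed under the VF's physfn/net/ */
};

static const struct netdev *find_by_name(const struct netdev *devs, size_t n,
					 const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(devs[i].name, name) == 0)
			return &devs[i];
	return NULL;
}

/* Read the VF's phys_port_id, then pick the physfn-linked netdev carrying
 * the same ID as the target for the legacy SR-IOV NDOs. */
static const struct netdev *ndo_target_for_vf(const struct netdev *devs,
					      size_t n, const char *vf_name)
{
	const struct netdev *vf = find_by_name(devs, n, vf_name);

	if (!vf || !vf->phys_port_id)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (devs[i].linked_to_physfn && devs[i].phys_port_id &&
		    strcmp(devs[i].phys_port_id, vf->phys_port_id) == 0)
			return &devs[i];
	return NULL;
}
```

Netdevs without a phys_port_id, such as a representor under this proposal, never match, so they can stay linked to physfn without confusing the lookup.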

What you describe here doesn't seem to be network
configuration, as it deals only with VFs and PFs but not with reprs,
and hence, AFAIK, runs on Host1.

[...]

> Should Host0 in bare metal cloud have access to SR-IOV NDOs of Host1?

I need to think on that


Re: [PATCH RFC net-next] net: Poptrie based routing table lookup

2018-09-04 Thread Md. Islam
On Tue, Sep 4, 2018 at 12:14 PM, Md. Islam  wrote:
>
> On Tue, Sep 4, 2018, 6:53 AM Jesper Dangaard Brouer 
> wrote:
>>
>> Hi Md. Islam,
>>
>> People will start to ignore you, when you don't interact appropriately
>> with the community, and you ignore their advice, especially when it is
>> about how to interact with the community[1].
>>
>> You have not addressed any of my feedback on your patch in [1].
>>  [1]
>> http://www.mail-archive.com/search?l=mid=20180827173334.16ff0...@redhat.com
>
>
> Jesper,
>
> I actually addressed all the feedback in the previous patch except TOS,
> FIB metrics, etc. This is because I don't think they are relevant in
> this use case. Please let me know if I am wrong.
>
> Thanks

Jesper

Sorry, I missed your review in the first place. I will take a look and
resubmit the patch.

Thanks

>>
>>
>>
>> --
>> Best regards,
>>   Jesper Dangaard Brouer
>>   MSc.CS, Principal Kernel Engineer at Red Hat
>>   LinkedIn: http://www.linkedin.com/in/brouer
>>
>> p.s. also top-posting is bad, but I suspect you will not read my
>> response if I don't top-post.
>>
>>
>> On Tue, 4 Sep 2018 01:02:30 -0400 "Md. Islam"  wrote:
>>
>> > This patch implements Poptrie based routing table
>> > lookup/insert/delete/flush. Currently many carrier routers use kernel
>> > bypass frameworks such as DPDK and VPP to implement the data plane.
>> > XDP along with this patch will enable Linux to work as such a router.
>> > Currently it supports up to 255 ports. Many real-world backbone routers
>> > have up to 233 ports (to the best of my knowledge), so it seems to be
>> > sufficient at this moment.
>> >
>> > I also have attached a draft paper to explain how it works (poptrie.pdf).
>> > Please set CONFIG_FIB_POPTRIE=y (default n) before testing the patch.
>> > Note that poptrie_lookup() is not being called from anywhere. It will
>> > be used by XDP forwarding.
>> >
>> >
>> > From 3dc9683298ed896dd3080733503c35d68f05370e Mon Sep 17 00:00:00 2001
>> > From: tamimcse 
>> > Date: Mon, 3 Sep 2018 23:56:43 -0400
>> > Subject: [PATCH] Poptrie based routing table lookup
>> >
>> > Signed-off-by: tamimcse 
>> > ---
>> >  include/net/ip_fib.h   |  42 +
>> >  net/ipv4/Kconfig   |   4 +
>> >  net/ipv4/Makefile  |   1 +
>> >  net/ipv4/fib_poptrie.c | 483 +
>> >  net/ipv4/fib_trie.c|  12 ++
>> >  5 files changed, 542 insertions(+)
>> >  create mode 100644 net/ipv4/fib_poptrie.c
>>
>> First order of business: you need to conform to the kernel's coding
>> standards!
>>
>> https://www.kernel.org/doc/html/v4.18/process/coding-style.html
>>
>> There is a script available to check this, called scripts/checkpatch.pl.
>> Its summary says:
>>  total: 139 errors, 238 warnings, 6 checks, 372 lines checked
>> (Not good, more errors+warnings than lines...)
>>
>> Please fix those up... else people will not even read your code!
>>
>


Re: [PATCH net] net: phy: sfp: Handle unimplemented hwmon limits and alarms

2018-09-04 Thread David Miller
From: Andrew Lunn 
Date: Tue,  4 Sep 2018 04:23:56 +0200

> Not all SFPs implement the registers containing sensor limits and
> alarms. Luckily, there is a bit indicating if they are implemented or
> not. Add checking for this bit, when deciding if the hwmon attributes
> should be visible.
> 
> Fixes: 1323061a018a ("net: phy: sfp: Add HWMON support for module sensors")
> Signed-off-by: Andrew Lunn 

Applied, thanks Andrew.


Re: [PATCH net-next v2] net: sched: action_ife: take reference to meta module

2018-09-04 Thread David Miller
From: Vlad Buslov 
Date: Tue,  4 Sep 2018 00:44:42 +0300

> Recent refactoring of add_metainfo() caused use_all_metadata() to add
> metainfo to the ife action metalist without taking a reference to the
> module. This causes a warning in module_put() called from the ife action
> cleanup function.
> 
> Implement an add_metainfo_and_get_ops() function that returns with a
> reference to the module taken if the metainfo was added successfully, and
> call it from use_all_metadata(), instead of calling __add_metainfo() directly.
> 
> Example warning:
 ...
> Fixes: 5ffe57da29b3 ("act_ife: fix a potential deadlock")
> Signed-off-by: Vlad Buslov 
> ---
> 
> Changes V1->V2:
> - fold constants into helper function

Applied to 'net'.
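The invariant this fix restores is that every metainfo entry on the list owns exactly one module reference, which keeps the cleanup path's put balanced. Below is a hypothetical userspace model of the "add and get" contract. The names, types and the -ENOMEM failure are assumptions, not the act_ife code.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical model: refcnt stands in for the owning module's refcount. */
struct meta_ops {
	int refcnt;
};

struct metalist {
	int count;
	int capacity;
};

/* Returns 0 with a reference held on success; on failure no reference is
 * left behind, so the cleanup path's matching put stays balanced. */
static int add_and_get_ops(struct metalist *l, struct meta_ops *ops)
{
	ops->refcnt++;                 /* take the reference up front */
	if (l->count == l->capacity) {
		ops->refcnt--;         /* undo on failure */
		return -ENOMEM;
	}
	l->count++;
	return 0;
}

/* Cleanup drops one reference per entry on the list. */
static void cleanup(struct metalist *l, struct meta_ops *ops)
{
	while (l->count > 0) {
		l->count--;
		assert(ops->refcnt > 0);   /* a missing get would trip here */
		ops->refcnt--;
	}
}
```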


Re: [Patch net] act_ife: fix a potential use-after-free

2018-09-04 Thread David Miller
From: Cong Wang 
Date: Mon,  3 Sep 2018 11:08:15 -0700

> Immediately after module_put(), the user could delete this
> module, so e->ops could already be freed before we call
> e->ops->release().
> 
> Fix this by moving module_put() after ops->release().
> 
> Fixes: ef6980b6becb ("introduce IFE action")
> Signed-off-by: Cong Wang 

Applied and queued up for -stable, thanks Cong.
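The ordering rule in this fix generalises: anything reached through a module's ops must be used before the reference that keeps the module loaded is dropped. Below is a userspace model using hypothetical `mod_get()`/`mod_put()`, where the last put frees the ops table. These are not the kernel's module API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model: the ops table lives only as long as its module. */
struct ops {
	int released;
};

struct module {
	int refcnt;
	struct ops *ops;   /* freed when the last reference is dropped */
};

static void mod_get(struct module *m)
{
	m->refcnt++;
}

/* Dropping the last reference "unloads" the module and frees its ops. */
static void mod_put(struct module *m)
{
	if (--m->refcnt == 0) {
		free(m->ops);
		m->ops = NULL;
	}
}

/* Fixed ordering: call into ops first, drop the reference afterwards.
 * With put-before-release, the last call would touch freed memory. */
static void meta_release(struct module *m)
{
	assert(m->ops != NULL);
	m->ops->released++;
	mod_put(m);
}
```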


Re: [PATCH net] net/mlx5: Fix SQ offset in QPs with small RQ

2018-09-04 Thread David Miller
From: Tariq Toukan 
Date: Mon,  3 Sep 2018 18:06:24 +0300

> Correct the formula for calculating the RQ page remainder,
> which should be in byte granularity.  The result will be
> non-zero only for RQs smaller than PAGE_SIZE, as an RQ size
> is a power of 2.
> 
> Divide this by the SQ stride (MLX5_SEND_WQE_BB) to get the
> SQ offset at stride granularity.
> 
> Fixes: d7037ad73daa ("net/mlx5: Fix QP fragmented buffer allocation")
> Signed-off-by: Tariq Toukan 
> Reviewed-by: Eran Ben Elisha 
> Signed-off-by: Saeed Mahameed 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/wq.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> Hi Dave,
> Please queue for -stable v4.18.

Applied and queued up for -stable, thanks.
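In code, the corrected formula takes the RQ size in bytes modulo the page size, then divides the result by the SQ stride. The sketch below uses a hypothetical helper. It assumes 4 KiB pages and a 64-byte MLX5_SEND_WQE_BB, both hard-coded as assumptions. This is not the mlx5 wq.c code.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE   4096u  /* assumption: 4 KiB pages */
#define MODEL_SEND_WQE_BB 64u    /* assumption: SQ stride in bytes */

/* RQ page remainder in bytes, then converted to SQ strides. Since the RQ
 * size is a power of 2, the remainder is non-zero only for RQs smaller
 * than a page. */
static uint32_t sq_strides_offset(uint32_t rq_log_sz, uint32_t rq_log_stride)
{
	assert(rq_log_sz + rq_log_stride < 32);   /* keep the shift defined */

	uint32_t rq_byte_size = 1u << (rq_log_sz + rq_log_stride);
	uint32_t rq_pg_remainder = rq_byte_size % MODEL_PAGE_SIZE;

	return rq_pg_remainder / MODEL_SEND_WQE_BB;
}
```

For example, a 16-entry RQ with 64-byte strides (1 KiB) gives a 1024-byte remainder, which is an SQ offset of 16 strides. Any RQ of a page or larger gives 0.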


Re: [PATCH] neighbour: confirm neigh entries when ARP packet is received

2018-09-04 Thread David Miller
From: Ihar Hrachyshka 
Date: Tue, 4 Sep 2018 11:31:23 -0700

> Of course, I also agree that the comment will need some adjustment to
> reflect the fact that now a single timestamp is being updated. Perhaps
> while at it, Vasily could also explicitly describe in a comment which
> scenario the "if" branch check is supposed to cover. (I should have
> done it myself, mea culpa.)

Yes, that would help a lot.


Re: [PATCH net-next v2] net: sched: action_ife: take reference to meta module

2018-09-04 Thread Cong Wang
On Mon, Sep 3, 2018 at 2:44 PM Vlad Buslov  wrote:
>
> Recent refactoring of add_metainfo() caused use_all_metadata() to add
> metainfo to the ife action metalist without taking a reference to the
> module. This causes a warning in module_put() called from the ife action
> cleanup function.
>
> Implement an add_metainfo_and_get_ops() function that returns with a
> reference to the module taken if the metainfo was added successfully, and
> call it from use_all_metadata(), instead of calling __add_metainfo() directly.


Acked-by: Cong Wang 

This one should go to -net too.


Re: [PATCH] neighbour: confirm neigh entries when ARP packet is received

2018-09-04 Thread Ihar Hrachyshka
On Sat, Sep 1, 2018 at 4:51 PM, David Miller  wrote:
> From: Vasily Khoruzhick 
> Date: Tue, 28 Aug 2018 19:48:25 -0700
>
>> Update the 'confirmed' timestamp when an ARP packet is received. It
>> shouldn't affect the locktime logic, and the entry can be confirmed by any
>> higher-layer protocol anyway. Thus it makes no sense not to confirm it when
>> an ARP packet is received.
>>
>> Fixes: 77d7123342 ("neighbour: update neigh timestamps iff update is effective")
>>
>> Signed-off-by: Vasily Khoruzhick 
>
> I'm not so sure.
>
> The comment above the code you are moving explains that the current
> behavior is intentional, and it explains why too.
>
> Even if your change is correct, you're now making that comment
> inaccurate, so you'd have to update it to match the new code.
>
> But I still think the current code is intentionally behaving that
> way, and for good reason.

Hi David,

(I am the one who put this comment there.)

I agree with the reasoning that Vasily provided for the change (we
should confirm the entry if, e.g., an ARP packet with an identical
hwaddr/ipaddr pair arrives; just not mark it as updated). It was a
mistake of mine to put access to both the updated and confirmed fields
under the "if" branch. Leaving just 'updated' there and moving
'confirmed' outside seems like the right thing to do.

The original intent was not to update the 'updated' field when no update
happens (because of consecutive ARP packets sent in a short span of
time). The fix by Vasily should not negatively affect this scenario.
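A minimal model of the proposed split follows. It assumes a simplified entry with only the fields discussed here. This is not neigh_update(): `changed` loosely stands in for the existing "if" branch check, and every call models a received ARP packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define ETH_ALEN_SKETCH 6

/* Simplified entry with only the fields discussed; not struct neighbour. */
struct nentry {
	unsigned char lladdr[ETH_ALEN_SKETCH];
	int state;
	unsigned long updated;    /* last time the entry actually changed */
	unsigned long confirmed;  /* last time reachability was confirmed */
};

/* A received ARP packet always refreshes 'confirmed', while 'updated'
 * only moves when the entry changes, so repeated no-op updates cannot
 * shift the locktime window. */
static void arp_received(struct nentry *n, const unsigned char *lladdr,
			 int new_state, unsigned long now)
{
	bool changed = new_state != n->state ||
		       memcmp(lladdr, n->lladdr, ETH_ALEN_SKETCH) != 0;

	n->confirmed = now;   /* now outside the "if" branch */

	if (changed) {
		memcpy(n->lladdr, lladdr, ETH_ALEN_SKETCH);
		n->state = new_state;
		n->updated = now;
	}
}
```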

Of course, I also agree that the comment will need some adjustment to
reflect the fact that now a single timestamp is being updated. Perhaps
while at it, Vasily could also explicitly describe in a comment which
scenario the "if" branch check is supposed to cover. (I should have
done it myself, mea culpa.)

I hope it helps,
Ihar


Re: [PATCH net] net/sched: fix memory leak in act_tunnel_key_init()

2018-09-04 Thread Cong Wang
On Tue, Sep 4, 2018 at 10:00 AM Davide Caratti  wrote:
>
> If users try to install act_tunnel_key 'set' rules with duplicate values
> of 'index', the tunnel metadata are allocated, but never released. Then,
> kmemleak complains as follows:

Acked-by: Cong Wang 


[PATCH bpf-next 4/4] i40e: disallow changing the number of descriptors when AF_XDP is on

2018-09-04 Thread Björn Töpel
From: Björn Töpel 

When an AF_XDP UMEM is attached to any of the Rx rings, we disallow users
from changing the number of descriptors via, e.g., "ethtool -G IFNAME".

Otherwise, the size of the stash/reuse queue can grow unbounded, which
would result in OOM or leaking userspace buffers.

Signed-off-by: Björn Töpel 
---
 .../net/ethernet/intel/i40e/i40e_ethtool.c|  9 +++-
 .../ethernet/intel/i40e/i40e_txrx_common.h|  1 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c| 22 +++
 3 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c 
b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index d7d3974beca2..3cd2c88c72f8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -5,7 +5,7 @@
 
 #include "i40e.h"
 #include "i40e_diag.h"
-
+#include "i40e_txrx_common.h"
 #include "i40e_ethtool_stats.h"
 
 #define I40E_PF_STAT(_name, _stat) \
@@ -1493,6 +1493,13 @@ static int i40e_set_ringparam(struct net_device *netdev,
(new_rx_count == vsi->rx_rings[0]->count))
return 0;
 
+   /* If there is an AF_XDP UMEM attached to any of the Rx rings,
+* disallow changing the number of descriptors -- regardless of
+* whether the netdev is running or not.
+*/
+   if (i40e_xsk_any_rx_ring_enabled(vsi))
+   return -EBUSY;
+
while (test_and_set_bit(__I40E_CONFIG_BUSY, pf->state)) {
timeout--;
if (!timeout)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h 
b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index 8d46acff6f2e..09809dffe399 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -89,5 +89,6 @@ static inline void i40e_arm_wb(struct i40e_ring *tx_ring,
 
 void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring);
 void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
+bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi);
 
 #endif /* I40E_TXRX_COMMON_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c 
b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index e4b62e871afc..119f59ec7cc0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -944,3 +944,25 @@ void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
if (xsk_frames)
xsk_umem_complete_tx(umem, xsk_frames);
 }
+
+/**
+ * i40e_xsk_any_rx_ring_enabled - Checks whether any of the Rx rings
+ * has AF_XDP UMEM attached
+ * @vsi: vsi
+ *
+ * Returns true if any of the Rx rings has an AF_XDP UMEM attached
+ **/
+bool i40e_xsk_any_rx_ring_enabled(struct i40e_vsi *vsi)
+{
+   int i;
+
+   if (!vsi->xsk_umems)
+   return false;
+
+   for (i = 0; i < vsi->num_queue_pairs; i++) {
+   if (vsi->xsk_umems[i])
+   return true;
+   }
+
+   return false;
+}
-- 
2.17.1



[PATCH bpf-next 3/4] i40e: clean zero-copy XDP Rx ring on shutdown/reset

2018-09-04 Thread Björn Töpel
From: Björn Töpel 

Outstanding Rx descriptors are temporarily stored on a stash/reuse
queue. When/if the HW rings come up again, entries from the stash are
used to re-populate the ring.

The latter required some restructuring of the allocation scheme for
the AF_XDP zero-copy implementation. There is now a fast and a slow
allocation path. The "fast allocation" is used from the fast-path and
obtains free buffers from the fill ring and the internal recycle
mechanism. The "slow allocation" is only used in ring setup, and
obtains buffers from the fill ring and the stash (if any).
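The split between the two allocation paths can be modelled in userspace as follows. This is a simplified sketch. The struct layout and ring size are illustrative assumptions, and the fast path's internal recycle mechanism is left out. As with the reuse-queue helpers in this series, the stash is drained LIFO before the fill ring is touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SZ 8   /* illustrative size */

/* Hypothetical model of the two buffer sources: the stash filled on
 * teardown, and the userspace fill ring (modelled as a simple FIFO). */
struct buf_sources {
	uint64_t stash[RING_SZ];
	uint32_t stash_len;
	uint64_t fill[RING_SZ];
	uint32_t fill_head, fill_tail;
};

/* "Fast" path: fill ring only (recycling omitted in this model). */
static bool alloc_fast(struct buf_sources *s, uint64_t *addr)
{
	if (s->fill_head == s->fill_tail)
		return false;
	*addr = s->fill[s->fill_head++ % RING_SZ];
	return true;
}

/* "Slow" path used at ring setup: stash first, then the fill ring. */
static bool alloc_slow(struct buf_sources *s, uint64_t *addr)
{
	if (s->stash_len) {
		*addr = s->stash[--s->stash_len];
		return true;
	}
	return alloc_fast(s, addr);
}
```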

Signed-off-by: Björn Töpel 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |   4 +-
 .../ethernet/intel/i40e/i40e_txrx_common.h|   1 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c| 100 --
 3 files changed, 96 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 7f85d4ba8b54..740ea58ba938 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1355,8 +1355,10 @@ void i40e_clean_rx_ring(struct i40e_ring *rx_ring)
rx_ring->skb = NULL;
}
 
-   if (rx_ring->xsk_umem)
+   if (rx_ring->xsk_umem) {
+   i40e_xsk_clean_rx_ring(rx_ring);
goto skip_free;
+   }
 
/* Free all the Rx ring sk_buffs */
for (i = 0; i < rx_ring->count; i++) {
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h 
b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index 29c68b29d36f..8d46acff6f2e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -87,6 +87,7 @@ static inline void i40e_arm_wb(struct i40e_ring *tx_ring,
}
 }
 
+void i40e_xsk_clean_rx_ring(struct i40e_ring *rx_ring);
 void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
 
 #endif /* I40E_TXRX_COMMON_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c 
b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 99116277c4d2..e4b62e871afc 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -140,6 +140,7 @@ static void i40e_xsk_umem_dma_unmap(struct i40e_vsi *vsi, 
struct xdp_umem *umem)
 static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, struct xdp_umem *umem,
u16 qid)
 {
+   struct xdp_umem_fq_reuse *reuseq;
bool if_running;
int err;
 
@@ -156,6 +157,12 @@ static int i40e_xsk_umem_enable(struct i40e_vsi *vsi, 
struct xdp_umem *umem,
return -EBUSY;
}
 
+   reuseq = xsk_reuseq_prepare(vsi->rx_rings[0]->count);
+   if (!reuseq)
+   return -ENOMEM;
+
+   xsk_reuseq_free(xsk_reuseq_swap(umem, reuseq));
+
err = i40e_xsk_umem_dma_map(vsi, umem);
if (err)
return err;
@@ -353,16 +360,46 @@ static bool i40e_alloc_buffer_zc(struct i40e_ring 
*rx_ring,
 }
 
 /**
- * i40e_alloc_rx_buffers_zc - Allocates a number of Rx buffers
+ * i40e_alloc_buffer_slow_zc - Allocates an i40e_rx_buffer
  * @rx_ring: Rx ring
- * @count: The number of buffers to allocate
+ * @bi: Rx buffer to populate
  *
- * This function allocates a number of Rx buffers and places them on
- * the Rx ring.
+ * This function allocates an Rx buffer. The buffer can come from fill
+ * queue, or via the reuse queue.
  *
  * Returns true for a successful allocation, false otherwise
  **/
-bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, u16 count)
+static bool i40e_alloc_buffer_slow_zc(struct i40e_ring *rx_ring,
+ struct i40e_rx_buffer *bi)
+{
+   struct xdp_umem *umem = rx_ring->xsk_umem;
+   u64 handle, hr;
+
+   if (!xsk_umem_peek_addr_rq(umem, &handle)) {
+   rx_ring->rx_stats.alloc_page_failed++;
+   return false;
+   }
+
+   handle &= rx_ring->xsk_umem->chunk_mask;
+
+   hr = umem->headroom + XDP_PACKET_HEADROOM;
+
+   bi->dma = xdp_umem_get_dma(umem, handle);
+   bi->dma += hr;
+
+   bi->addr = xdp_umem_get_data(umem, handle);
+   bi->addr += hr;
+
+   bi->handle = handle + umem->headroom;
+
+   xsk_umem_discard_addr_rq(umem);
+   return true;
+}
+
+static __always_inline bool __i40e_alloc_rx_buffers_zc(
+   struct i40e_ring *rx_ring, u16 count,
+   bool alloc(struct i40e_ring *rx_ring,
+  struct i40e_rx_buffer *bi))
 {
u16 ntu = rx_ring->next_to_use;
union i40e_rx_desc *rx_desc;
@@ -372,7 +409,7 @@ bool i40e_alloc_rx_buffers_zc(struct i40e_ring *rx_ring, 
u16 count)
rx_desc = I40E_RX_DESC(rx_ring, ntu);
bi = &rx_ring->rx_bi[ntu];
do {
-   if (!i40e_alloc_buffer_zc(rx_ring, bi)) {
+   if (!alloc(rx_ring, bi)) {
ok = false;
goto no_buffers;
}
@@ -404,6 +441,38 @@ 

[PATCH bpf-next 1/4] i40e: clean zero-copy XDP Tx ring on shutdown/reset

2018-09-04 Thread Björn Töpel
From: Björn Töpel 

When the zero-copy enabled XDP Tx ring is torn down, due to
configuration changes, outstanding frames on the hardware descriptor
ring are queued on the completion ring.

The completion ring has a back-pressure mechanism that will guarantee
that there is sufficient space on the ring.
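The teardown walk can be sketched in userspace: visit every outstanding descriptor from next-to-clean up to next-to-use and count the AF_XDP frames that must be completed back to userspace. The slot struct and function are hypothetical. Note that the wrap check uses `>=`: with `count` slots, the last valid index is `count - 1`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical Tx slot: either an XDP frame (freed by the driver) or an
 * AF_XDP zero-copy frame (completed back to userspace). */
struct tx_slot {
	bool is_xdpf;
	bool in_use;
};

/* Walk outstanding descriptors [ntc, ntu) of a ring with 'count' slots and
 * return how many AF_XDP frames go to the completion ring. */
static uint32_t clean_tx_ring(struct tx_slot *slots, uint16_t count,
			      uint16_t ntc, uint16_t ntu)
{
	uint32_t xsk_frames = 0;

	while (ntc != ntu) {
		struct tx_slot *bi = &slots[ntc];

		if (!bi->is_xdpf)
			xsk_frames++;   /* XDP frames would be freed here */
		bi->in_use = false;
		bi->is_xdpf = false;

		ntc++;
		if (ntc >= count)       /* valid indices are 0..count-1 */
			ntc = 0;
	}
	return xsk_frames;
}
```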

Signed-off-by: Björn Töpel 
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   | 17 +++
 .../ethernet/intel/i40e/i40e_txrx_common.h|  2 ++
 drivers/net/ethernet/intel/i40e/i40e_xsk.c| 30 +++
 3 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c 
b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 37bd4e50ccde..7f85d4ba8b54 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -636,13 +636,18 @@ void i40e_clean_tx_ring(struct i40e_ring *tx_ring)
unsigned long bi_size;
u16 i;
 
-   /* ring already cleared, nothing to do */
-   if (!tx_ring->tx_bi)
-   return;
+   if (ring_is_xdp(tx_ring) && tx_ring->xsk_umem) {
+   i40e_xsk_clean_tx_ring(tx_ring);
+   } else {
+   /* ring already cleared, nothing to do */
+   if (!tx_ring->tx_bi)
+   return;
 
-   /* Free all the Tx ring sk_buffs */
-   for (i = 0; i < tx_ring->count; i++)
-   i40e_unmap_and_free_tx_resource(tx_ring, &tx_ring->tx_bi[i]);
+   /* Free all the Tx ring sk_buffs */
+   for (i = 0; i < tx_ring->count; i++)
+   i40e_unmap_and_free_tx_resource(tx_ring,
+   &tx_ring->tx_bi[i]);
+   }
 
bi_size = sizeof(struct i40e_tx_buffer) * tx_ring->count;
memset(tx_ring->tx_bi, 0, bi_size);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h 
b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
index b5afd479a9c5..29c68b29d36f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx_common.h
@@ -87,4 +87,6 @@ static inline void i40e_arm_wb(struct i40e_ring *tx_ring,
}
 }
 
+void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring);
+
 #endif /* I40E_TXRX_COMMON_ */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c 
b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 2ebfc78bbd09..99116277c4d2 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -830,3 +830,33 @@ int i40e_xsk_async_xmit(struct net_device *dev, u32 
queue_id)
 
return 0;
 }
+
+/**
+ * i40e_xsk_clean_tx_ring - Clean the XDP Tx ring on shutdown
+ * @tx_ring: XDP Tx ring
+ **/
+void i40e_xsk_clean_tx_ring(struct i40e_ring *tx_ring)
+{
+   u16 ntc = tx_ring->next_to_clean, ntu = tx_ring->next_to_use;
+   struct xdp_umem *umem = tx_ring->xsk_umem;
+   struct i40e_tx_buffer *tx_bi;
+   u32 xsk_frames = 0;
+
+   while (ntc != ntu) {
+   tx_bi = &tx_ring->tx_bi[ntc];
+
+   if (tx_bi->xdpf)
+   i40e_clean_xdp_tx_buffer(tx_ring, tx_bi);
+   else
+   xsk_frames++;
+
+   tx_bi->xdpf = NULL;
+
+   ntc++;
+   if (ntc >= tx_ring->count)
+   ntc = 0;
+   }
+
+   if (xsk_frames)
+   xsk_umem_complete_tx(umem, xsk_frames);
+}
-- 
2.17.1



[PATCH bpf-next 2/4] net: xsk: add a simple buffer reuse queue

2018-09-04 Thread Björn Töpel
From: Jakub Kicinski 

XSK UMEM is strongly single producer single consumer so reuse of
frames is challenging.  Add a simple "stash" of FILL packets to
reuse for drivers to optionally make use of.  This is useful
when a driver has to free (ndo_stop) or resize a ring with an active
AF_XDP ZC socket.
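The stash's prepare/swap semantics can be modelled in userspace. This sketch mirrors the shape of struct xdp_umem_fq_reuse and the prepare/swap logic in this patch, simplified with `malloc` instead of `kvmalloc` and a local power-of-two helper. All names are hypothetical. A swap to a larger queue carries the stashed handles over. A queue that is too small is handed back for the caller to free.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Userspace model of the stash; not the kernel code. */
struct fq_reuse {
	uint32_t nentries;
	uint32_t length;
	uint64_t handles[];
};

/* Caller guarantees 1 <= n <= 2^31, so this terminates without overflow. */
static uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

static struct fq_reuse *reuseq_prepare(uint32_t nentries)
{
	struct fq_reuse *q;

	if (nentries == 0 || nentries > (1u << 31))
		return NULL;
	nentries = roundup_pow2(nentries);
	q = malloc(sizeof(*q) + (size_t)nentries * sizeof(q->handles[0]));
	if (!q)
		return NULL;
	q->nentries = nentries;
	q->length = 0;
	return q;
}

/* Install newq if it can hold every stashed handle. Returns the queue the
 * caller must free: the old one, or newq itself if it was too small. */
static struct fq_reuse *reuseq_swap(struct fq_reuse **cur,
				    struct fq_reuse *newq)
{
	struct fq_reuse *oldq = *cur;

	if (!oldq) {
		*cur = newq;
		return NULL;
	}
	if (newq->nentries < oldq->length)
		return newq;
	memcpy(newq->handles, oldq->handles, oldq->length * sizeof(uint64_t));
	newq->length = oldq->length;
	*cur = newq;
	return oldq;
}
```

Because the return value is always "the queue to free" (possibly NULL), a caller can write `free(reuseq_swap(&cur, q))`, the same shape as the `xsk_reuseq_free(xsk_reuseq_swap(...))` call in the i40e patch.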

Signed-off-by: Jakub Kicinski 
---
 include/net/xdp_sock.h | 43 +
 net/xdp/xdp_umem.c |  2 ++
 net/xdp/xsk_queue.c| 55 ++
 net/xdp/xsk_queue.h|  3 +++
 4 files changed, 103 insertions(+)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 932ca0dad6f3..7b55206da138 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -14,6 +14,7 @@
 #include 
 
 struct net_device;
+struct xdp_umem_fq_reuse;
 struct xsk_queue;
 
 struct xdp_umem_page {
@@ -37,6 +38,7 @@ struct xdp_umem {
struct page **pgs;
u32 npgs;
struct net_device *dev;
+   struct xdp_umem_fq_reuse *fq_reuse;
u16 queue_id;
bool zc;
spinlock_t xsk_list_lock;
@@ -139,4 +141,45 @@ static inline dma_addr_t xdp_umem_get_dma(struct xdp_umem 
*umem, u64 addr)
 }
 #endif /* CONFIG_XDP_SOCKETS */
 
+struct xdp_umem_fq_reuse {
+   u32 nentries;
+   u32 length;
+   u64 handles[];
+};
+
+/* Following functions are not thread-safe in any way */
+struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries);
+struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
+ struct xdp_umem_fq_reuse *newq);
+void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq);
+
+/* Reuse-queue aware version of FILL queue helpers */
+static inline u64 *xsk_umem_peek_addr_rq(struct xdp_umem *umem, u64 *addr)
+{
+   struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
+
+   if (!rq->length)
+   return xsk_umem_peek_addr(umem, addr);
+
+   *addr = rq->handles[rq->length - 1];
+   return addr;
+}
+
+static inline void xsk_umem_discard_addr_rq(struct xdp_umem *umem)
+{
+   struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
+
+   if (!rq->length)
+   xsk_umem_discard_addr(umem);
+   else
+   rq->length--;
+}
+
+static inline void xsk_umem_fq_reuse(struct xdp_umem *umem, u64 addr)
+{
+   struct xdp_umem_fq_reuse *rq = umem->fq_reuse;
+
+   rq->handles[rq->length++] = addr;
+}
+
 #endif /* _LINUX_XDP_SOCK_H */
diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index b3b632c5aeae..555427b3e0fe 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -165,6 +165,8 @@ static void xdp_umem_release(struct xdp_umem *umem)
umem->cq = NULL;
}
 
+   xsk_reuseq_destroy(umem);
+
xdp_umem_unpin_pages(umem);
 
task = get_pid_task(umem->pid, PIDTYPE_PID);
diff --git a/net/xdp/xsk_queue.c b/net/xdp/xsk_queue.c
index 2dc1384d9f27..b66504592d9b 100644
--- a/net/xdp/xsk_queue.c
+++ b/net/xdp/xsk_queue.c
@@ -3,7 +3,9 @@
  * Copyright(c) 2018 Intel Corporation.
  */
 
+#include 
 #include 
+#include 
 
 #include "xsk_queue.h"
 
@@ -62,3 +64,56 @@ void xskq_destroy(struct xsk_queue *q)
page_frag_free(q->ring);
kfree(q);
 }
+
+struct xdp_umem_fq_reuse *xsk_reuseq_prepare(u32 nentries)
+{
+   struct xdp_umem_fq_reuse *newq;
+
+   /* Check for overflow */
+   if (nentries > (u32)roundup_pow_of_two(nentries))
+   return NULL;
+   nentries = roundup_pow_of_two(nentries);
+
+   newq = kvmalloc(struct_size(newq, handles, nentries), GFP_KERNEL);
+   if (!newq)
+   return NULL;
+   memset(newq, 0, offsetof(typeof(*newq), handles));
+
+   newq->nentries = nentries;
+   return newq;
+}
+EXPORT_SYMBOL_GPL(xsk_reuseq_prepare);
+
+struct xdp_umem_fq_reuse *xsk_reuseq_swap(struct xdp_umem *umem,
+ struct xdp_umem_fq_reuse *newq)
+{
+   struct xdp_umem_fq_reuse *oldq = umem->fq_reuse;
+
+   if (!oldq) {
+   umem->fq_reuse = newq;
+   return NULL;
+   }
+
+   if (newq->nentries < oldq->length)
+   return newq;
+
+   memcpy(newq->handles, oldq->handles,
+  array_size(oldq->length, sizeof(u64)));
+   newq->length = oldq->length;
+
+   umem->fq_reuse = newq;
+   return oldq;
+}
+EXPORT_SYMBOL_GPL(xsk_reuseq_swap);
+
+void xsk_reuseq_free(struct xdp_umem_fq_reuse *rq)
+{
+   kvfree(rq);
+}
+EXPORT_SYMBOL_GPL(xsk_reuseq_free);
+
+void xsk_reuseq_destroy(struct xdp_umem *umem)
+{
+   xsk_reuseq_free(umem->fq_reuse);
+   umem->fq_reuse = NULL;
+}
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index 82252cccb4e0..bcb5cbb40419 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -258,4 +258,7 @@ void xskq_set_umem(struct xsk_queue *q, u64 size, u64 chunk_mask);
 struct xsk_queue *xskq_create(u32 nentries, bool umem_queue);
 void xskq_destroy(struct xsk_queue *q_ops);
 
+/* Executed by the core when the 

[PATCH bpf-next 0/4] i40e AF_XDP zero-copy buffer leak fixes

2018-09-04 Thread Björn Töpel
From: Björn Töpel 

This series addresses an AF_XDP zero-copy issue where buffers passed
from userspace to the kernel were leaked when the hardware descriptor
ring was torn down.

These patches fix the i40e AF_XDP zero-copy implementation.

Thanks to Jakub Kicinski for pointing this out!

Some background for folks who don't know the details: a zero-copy
capable driver picks buffers off the fill ring and places them on the
hardware Rx ring, to be completed at a later point when DMA is
complete. Similarly on the Tx side, the driver picks buffers off the Tx
ring and places them on the Tx hardware ring.

In the typical flow, the Rx buffer will be placed onto an Rx ring
(completed to the user), and the Tx buffer will be placed on the
completion ring to notify the user that the transfer is done.

However, if the driver needs to tear down the hardware rings for some
reason (interface goes down, reconfiguration and such), the userspace
buffers must not be leaked. They have to be reused or completed back to
userspace.

The implementation does the following:

* Outstanding Tx descriptors will be passed to the completion
  ring. The Tx code has a back-pressure mechanism in place, so that
  enough empty space in the completion ring is guaranteed.

* Outstanding Rx descriptors are temporarily stored on a stash/reuse
  queue. The reuse queue is based on Jakub's RFC. When/if the HW rings
  come up again, entries from the stash are used to re-populate the
  ring.

* When AF_XDP ZC is enabled, disallow changing the number of hardware
  descriptors via ethtool. Otherwise, the size of the stash/reuse
  queue can grow unbounded.
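In Jakub's patch, the stash/reuse queue is a power-of-two sized array of u64 buffer handles used as a stack (see `xsk_reuseq_prepare()`, `xsk_umem_fq_reuse()` and `xsk_umem_discard_addr_rq()` earlier in the series). A minimal userspace model of that behavior, assuming `malloc()` in place of `kvmalloc()`; the `reuseq_*` names are hypothetical:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace model of struct xdp_umem_fq_reuse. */
struct reuseq {
	uint32_t nentries;  /* capacity, always a power of two */
	uint32_t length;    /* number of stashed handles */
	uint64_t handles[]; /* stashed buffer addresses (LIFO) */
};

/* Round n up to a power of two; returns 0 if that would overflow u32. */
static uint32_t roundup_pow2(uint32_t n)
{
	uint32_t p = 1;

	while (p < n) {
		if (p > UINT32_MAX / 2)
			return 0;
		p <<= 1;
	}
	return p;
}

/* Allocate an empty queue able to hold at least nentries handles. */
static struct reuseq *reuseq_prepare(uint32_t nentries)
{
	uint32_t cap = roundup_pow2(nentries);
	struct reuseq *q;

	if (!cap)
		return NULL;
	q = malloc(sizeof(*q) + (size_t)cap * sizeof(q->handles[0]));
	if (!q)
		return NULL;
	q->nentries = cap;
	q->length = 0;
	return q;
}

/* Stash a handle when the HW ring is torn down; -1 if the queue is full. */
static int reuseq_push(struct reuseq *q, uint64_t addr)
{
	if (q->length >= q->nentries)
		return -1;
	q->handles[q->length++] = addr;
	return 0;
}

/* Look at the most recently stashed handle; returns 0 if empty. */
static int reuseq_peek(const struct reuseq *q, uint64_t *addr)
{
	if (!q->length)
		return 0;
	*addr = q->handles[q->length - 1];
	return 1;
}

/* Consume the handle returned by the last successful peek. */
static void reuseq_discard(struct reuseq *q)
{
	if (q->length)
		q->length--;
}
```

A stack is presumably sufficient because the order in which free frames are handed back to the hardware ring does not matter. Unlike the kernel helper shown in the patch, this sketch bounds-checks the push rather than relying on the caller never stashing more than the ring size.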

Going forward, introducing a "zero-copy allocator" analogous to Jesper
Brouer's page pool would be a more robust and reusable solution.

Jakub: I've made a minor checkpatch fix to your RFC prior to adding it
to this series.


Thanks!
Björn

Björn Töpel (3):
  i40e: clean zero-copy XDP Tx ring on shutdown/reset
  i40e: clean zero-copy XDP Rx ring on shutdown/reset
  i40e: disallow changing the number of descriptors when AF_XDP is on

Jakub Kicinski (1):
  net: xsk: add a simple buffer reuse queue

 .../net/ethernet/intel/i40e/i40e_ethtool.c|   9 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c   |  21 ++-
 .../ethernet/intel/i40e/i40e_txrx_common.h|   4 +
 drivers/net/ethernet/intel/i40e/i40e_xsk.c| 152 +-
 include/net/xdp_sock.h|  43 +
 net/xdp/xdp_umem.c|   2 +
 net/xdp/xsk_queue.c   |  55 +++
 net/xdp/xsk_queue.h   |   3 +
 8 files changed, 273 insertions(+), 16 deletions(-)

-- 
2.17.1



[iproute PATCH] ip-route: Fix segfault with many nexthops

2018-09-04 Thread Phil Sutter
It was possible to crash ip-route by adding an IPv6 route with 37
nexthop statements. A simple reproducer is:

| for i in `seq 37`; do
|   nhs="nexthop via ::$i "$nhs
| done
| ip -6 route add ::/64 $nhs

The related code was broken in multiple ways:

* parse_one_nh() assumed that rta points to 4kB of storage but caller
  provided just 1kB. Fixed by passing 'len' parameter with the correct
  value.

* Error checking of rta_addattr*() calls in parse_one_nh() and called
  functions was completely absent, so with above fix in place output
  flood would occur due to parser looping forever.

Note that it is still not possible to add a route with more than 36
nexthops due to stack buffer sizes; this patch merely fixes the error path.
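The essence of the fix is that every append is checked against the real size of the caller's buffer (1kB here) rather than an assumed 4096 bytes, and each failure is propagated instead of ignored. A minimal self-contained sketch of that pattern, assuming a simplified attribute header with 4-byte alignment similar to RTA_ALIGN(); `struct attr`, `attr_append()`, `add_gateways()` and `ATTR_GATEWAY` are hypothetical stand-ins, not iproute2's `rta_addattr_l()`:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct rtattr: len covers header + payload. */
struct attr {
	uint16_t len;
	uint16_t type;
};

#define ATTR_ALIGN(n) (((size_t)(n) + 3u) & ~(size_t)3u)
#define ATTR_GATEWAY 5 /* hypothetical attribute type */

/*
 * Append a sub-attribute to the container 'rta', whose storage is 'maxlen'
 * bytes in total. Fails with -1 instead of writing past the buffer.
 */
static int attr_append(struct attr *rta, size_t maxlen, uint16_t type,
		       const void *data, size_t alen)
{
	size_t need = sizeof(struct attr) + alen;
	size_t off = ATTR_ALIGN(rta->len);
	struct attr *sub;

	if (off + ATTR_ALIGN(need) > maxlen)
		return -1;
	sub = (struct attr *)((char *)rta + off);
	sub->type = type;
	sub->len = (uint16_t)need;
	memcpy(sub + 1, data, alen);
	rta->len = (uint16_t)(off + ATTR_ALIGN(need));
	return 0;
}

/* Append 'count' gateway attributes, stopping at the first failure so the
 * caller can report an error instead of looping. Returns count or -1. */
static int add_gateways(struct attr *rta, size_t maxlen, int count)
{
	const uint8_t gw[16] = { 0 }; /* e.g. an IPv6 address */
	int i;

	for (i = 0; i < count; i++)
		if (attr_append(rta, maxlen, ATTR_GATEWAY, gw, sizeof(gw)))
			return -1;
	return count;
}
```

With a 1024-byte container, 51 gateway attributes of 20 bytes fit after the 4-byte header; the 52nd append fails cleanly rather than overrunning the buffer.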

Signed-off-by: Phil Sutter 
---
 ip/iproute.c  |  41 ++--
 ip/iproute_lwtunnel.c | 108 +-
 2 files changed, 91 insertions(+), 58 deletions(-)

diff --git a/ip/iproute.c b/ip/iproute.c
index 30833414a3f7f..9e5ae48c0715c 100644
--- a/ip/iproute.c
+++ b/ip/iproute.c
@@ -941,7 +941,7 @@ int print_route(const struct sockaddr_nl *who, struct nlmsghdr *n, void *arg)
 }
 
 static int parse_one_nh(struct nlmsghdr *n, struct rtmsg *r,
-   struct rtattr *rta, struct rtnexthop *rtnh,
+   struct rtattr *rta, size_t len, struct rtnexthop *rtnh,
int *argcp, char ***argvp)
 {
int argc = *argcp;
@@ -962,11 +962,16 @@ static int parse_one_nh(struct nlmsghdr *n, struct rtmsg *r,
if (r->rtm_family == AF_UNSPEC)
r->rtm_family = addr.family;
if (addr.family == r->rtm_family) {
-   rta_addattr_l(rta, 4096, RTA_GATEWAY, , addr.bytelen);
-   rtnh->rtnh_len += sizeof(struct rtattr) + addr.bytelen;
+   if (rta_addattr_l(rta, len, RTA_GATEWAY,
+ , addr.bytelen))
+   return -1;
+   rtnh->rtnh_len += sizeof(struct rtattr)
+ + addr.bytelen;
} else {
-   rta_addattr_l(rta, 4096, RTA_VIA, , addr.bytelen+2);
-   rtnh->rtnh_len += RTA_SPACE(addr.bytelen+2);
+   if (rta_addattr_l(rta, len, RTA_VIA,
+ , addr.bytelen + 2))
+   return -1;
+   rtnh->rtnh_len += RTA_SPACE(addr.bytelen + 2);
}
} else if (strcmp(*argv, "dev") == 0) {
NEXT_ARG();
@@ -988,13 +993,15 @@ static int parse_one_nh(struct nlmsghdr *n, struct rtmsg *r,
NEXT_ARG();
if (get_rt_realms_or_raw(, *argv))
invarg("\"realm\" value is invalid\n", *argv);
-   rta_addattr32(rta, 4096, RTA_FLOW, realm);
+   if (rta_addattr32(rta, len, RTA_FLOW, realm))
+   return -1;
rtnh->rtnh_len += sizeof(struct rtattr) + 4;
} else if (strcmp(*argv, "encap") == 0) {
-   int len = rta->rta_len;
+   int old_len = rta->rta_len;
 
-   lwt_parse_encap(rta, 4096, , );
-   rtnh->rtnh_len += rta->rta_len - len;
+   if (lwt_parse_encap(rta, len, , ))
+   return -1;
+   rtnh->rtnh_len += rta->rta_len - old_len;
} else if (strcmp(*argv, "as") == 0) {
inet_prefix addr;
 
@@ -1002,8 +1009,9 @@ static int parse_one_nh(struct nlmsghdr *n, struct rtmsg *r,
if (strcmp(*argv, "to") == 0)
NEXT_ARG();
get_addr(, *argv, r->rtm_family);
-   rta_addattr_l(rta, 4096, RTA_NEWDST, ,
- addr.bytelen);
+   if (rta_addattr_l(rta, len, RTA_NEWDST,
+ , addr.bytelen))
+   return -1;
rtnh->rtnh_len += sizeof(struct rtattr) + addr.bytelen;
} else
break;
@@ -1036,15 +1044,18 @@ static int parse_nexthops(struct nlmsghdr *n, struct rtmsg *r,
memset(rtnh, 0, sizeof(*rtnh));
rtnh->rtnh_len = sizeof(*rtnh);
rta->rta_len += rtnh->rtnh_len;
-   if (parse_one_nh(n, r, rta, rtnh, , )) {
+   if (parse_one_nh(n, r, rta, 1024, rtnh, , )) {
fprintf(stderr, "Error: cannot parse nexthop\n");
exit(-1);
  

[PATCH net] net/sched: fix memory leak in act_tunnel_key_init()

2018-09-04 Thread Davide Caratti
If users try to install act_tunnel_key 'set' rules with duplicate values
of 'index', the tunnel metadata are allocated, but never released. Then,
kmemleak complains as follows:

 # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
 # echo clear > /sys/kernel/debug/kmemleak
 # tc a a a tunnel_key set src_ip 1.1.1.1 dst_ip 2.2.2.2 id 42 index 111
 Error: TC IDR already exists.
 We have an error talking to the kernel
 # echo scan > /sys/kernel/debug/kmemleak
 # cat /sys/kernel/debug/kmemleak
 unreferenced object 0x8800574e6c80 (size 256):
   comm "tc", pid 5617, jiffies 4298118009 (age 57.990s)
   hex dump (first 32 bytes):
 00 00 00 00 00 00 00 00 00 1c e8 b0 ff ff ff ff  
 81 24 c2 ad ff ff ff ff 00 00 00 00 00 00 00 00  .$..
   backtrace:
 [] tunnel_key_init+0x8a5/0x1800 [act_tunnel_key]
 [<7d98fccd>] tcf_action_init_1+0x698/0xac0
 [<99b8f7cc>] tcf_action_init+0x15c/0x590
 [] tc_ctl_action+0x336/0x5c2
 [<2f5a2f7d>] rtnetlink_rcv_msg+0x357/0x8e0
 [<0bfe7575>] netlink_rcv_skb+0x124/0x350
 [] netlink_unicast+0x40f/0x5d0
 [] netlink_sendmsg+0x6e8/0xba0
 [<63d9d490>] sock_sendmsg+0xb3/0xf0
 [] ___sys_sendmsg+0x654/0x960
 [] __sys_sendmsg+0xd3/0x170
 [] do_syscall_64+0xa5/0x470
 [<5caa2d97>] entry_SYSCALL_64_after_hwframe+0x49/0xbe
 [] 0x

This problem can theoretically also happen when users attempt to set up a
geneve rule with wrong configuration data, or when the kernel fails to
allocate 'params_new'. Ensure that tunnel_key_init() releases the tunnel
metadata also in the above conditions.
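The fix routes every failure path that follows the metadata allocation through the new `release_tun_meta` label, so the metadata is dropped exactly once. The same single-exit pattern as a self-contained userspace sketch; `struct tun_meta`, `meta_alloc()`, `meta_release()` and `key_init()` are hypothetical, `free()` stands in for `dst_release()`, and the IDR reference handling of the real function is omitted:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

struct tun_meta { int id; };
struct params { struct tun_meta *meta; };

static int live_meta;   /* metadata objects allocated but not yet freed */
static bool used[256];  /* hypothetical index table */

static struct tun_meta *meta_alloc(int id)
{
	struct tun_meta *m = malloc(sizeof(*m));

	if (m) {
		m->id = id;
		live_meta++;
	}
	return m;
}

static void meta_release(struct tun_meta *m)
{
	if (m) {
		free(m);
		live_meta--;
	}
}

static int key_init(int index, int id, struct params **out)
{
	struct tun_meta *meta;
	struct params *p;
	int ret;

	if (index < 0 || index > 255)
		return -EINVAL;
	meta = meta_alloc(id);
	if (!meta)
		return -ENOMEM;

	if (used[index]) {
		ret = -EEXIST;
		goto release_meta; /* returning here directly would leak meta */
	}

	p = malloc(sizeof(*p));
	if (!p) {
		ret = -ENOMEM;
		goto release_meta;
	}
	used[index] = true;
	p->meta = meta; /* ownership of meta moves to p */
	*out = p;
	return 0;

release_meta:
	meta_release(meta);
	return ret;
}
```

Installing the same index twice now leaves the live-object count unchanged, which is the property kmemleak was flagging above.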

Addresses-Coverity-ID: 1373974 ("Resource leak")
Fixes: d0f6dd8a914f4 ("net/sched: Introduce act_tunnel_key")
Fixes: 0ed5269f9e41f ("net/sched: add tunnel option support to act_tunnel_key")
Signed-off-by: Davide Caratti 
---
 net/sched/act_tunnel_key.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/net/sched/act_tunnel_key.c b/net/sched/act_tunnel_key.c
index 420759153d5f..28d58bbc953e 100644
--- a/net/sched/act_tunnel_key.c
+++ b/net/sched/act_tunnel_key.c
@@ -317,7 +317,7 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
  >u.tun_info,
  opts_len, extack);
if (ret < 0)
-   goto err_out;
+   goto release_tun_meta;
}
 
metadata->u.tun_info.mode |= IP_TUNNEL_INFO_TX;
@@ -333,23 +333,24 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 _tunnel_key_ops, bind, true);
if (ret) {
NL_SET_ERR_MSG(extack, "Cannot create TC IDR");
-   goto err_out;
+   goto release_tun_meta;
}
 
ret = ACT_P_CREATED;
} else if (!ovr) {
-   tcf_idr_release(*a, bind);
NL_SET_ERR_MSG(extack, "TC IDR already exists");
-   return -EEXIST;
+   ret = -EEXIST;
+   goto release_tun_meta;
}
 
t = to_tunnel_key(*a);
 
params_new = kzalloc(sizeof(*params_new), GFP_KERNEL);
if (unlikely(!params_new)) {
-   tcf_idr_release(*a, bind);
NL_SET_ERR_MSG(extack, "Cannot allocate tunnel key parameters");
-   return -ENOMEM;
+   ret = -ENOMEM;
+   exists = true;
+   goto release_tun_meta;
}
params_new->tcft_action = parm->t_action;
params_new->tcft_enc_metadata = metadata;
@@ -367,6 +368,9 @@ static int tunnel_key_init(struct net *net, struct nlattr *nla,
 
return ret;
 
+release_tun_meta:
+   dst_release(>dst);
+
 err_out:
if (exists)
tcf_idr_release(*a, bind);
-- 
2.17.1



Re: [oss-drivers] Re: [RFC bpf-next PATCH] samples/bpf: xdp1 add XDP hardware offload option

2018-09-04 Thread Jakub Kicinski
On Tue, 4 Sep 2018 18:49:33 +0200, Jesper Dangaard Brouer wrote:
> On Tue, 4 Sep 2018 17:09:12 +0200
> Jakub Kicinski  wrote:
> 
> > On Tue, 04 Sep 2018 16:59:19 +0200, Jesper Dangaard Brouer wrote:  
> > > Trying to use XDP hardware offloading via XDP_FLAGS_HW_MODE
> > > and setting the ifindex in prog_load_attr.ifindex before
> > > loading the BPF code via bpf_prog_load_xattr().
> > > 
> > > This unfortunately does not seem to work...
> > > - Am I doing something wrong?
> > > 
> > > Notice, I also disable the map BPF_MAP_TYPE_PERCPU_ARRAY
> > > to make sure it was not related to the map (not supporting
> > > offloading).
> > > 
> > > Failed with:
> > >  # ./xdp1 -O $(
> > >  libbpf: load bpf program failed: Invalid argument
> > >  libbpf: failed to load program 'xdp1'
> > >  libbpf: failed to load object './xdp1_kern.o'
> > > 
> > > Tested on kernel 4.18.0-2.el8.x86_64 with driver nfp
> > >  Ethernet controller: Netronome Systems, Inc. Device 4000
> > 
> > Are you running the BPF capable FW?
> > 
> > https://help.netronome.com/support/solutions/articles/3650009-agilio-ebpf-2-0-6-extended-berkeley-packet-filter
> >   
> 
> I'm likely not running the correct firmware...
> 
> Can you tell me, with the ethtool -i output, if I'm running the
> appropriate firmware?
> 
> # ethtool -i enp129s0np1
> driver: nfp
> version: 4.18.0-2.el8.x86_64 SMP mod_unl
> firmware-version: 0.0.3.5 0.21 nic-2.0.7 nic
> expansion-rom-version: 
> bus-info: :81:00.0
> supports-statistics: yes
> supports-test: no
> supports-eeprom-access: no
> supports-register-dump: yes
> supports-priv-flags: no

Yup, the BPF firmware has "bpf" in its firmware version string.

> If this is a firmware version case, then we should really improve the
> errors we are giving the user, the -EINVAL can be anything.
> 
>  "libbpf: load bpf program failed: Invalid argument"

That is true.


Re: [bpf-next PATCH 3/3] xdp: split code for map vs non-map redirect

2018-09-04 Thread Jesper Dangaard Brouer
On Tue, 4 Sep 2018 23:39:45 +0800
kbuild test robot  wrote:

> Hi Jesper,
> 
> Thank you for the patch! Perhaps something to improve:

Daniel was faster than the kbuild test robot and has already pointed
this out; it should be fixed in V2.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [RFC bpf-next PATCH] samples/bpf: xdp1 add XDP hardware offload option

2018-09-04 Thread Jesper Dangaard Brouer
On Tue, 4 Sep 2018 17:09:12 +0200
Jakub Kicinski  wrote:

> On Tue, 04 Sep 2018 16:59:19 +0200, Jesper Dangaard Brouer wrote:
> > Trying to use XDP hardware offloading via XDP_FLAGS_HW_MODE
> > and setting the ifindex in prog_load_attr.ifindex before
> > loading the BPF code via bpf_prog_load_xattr().
> > 
> > This unfortunately does not seem to work...
> > - Am I doing something wrong?
> > 
> > Notice, I also disable the map BPF_MAP_TYPE_PERCPU_ARRAY
> > to make sure it was not related to the map (not supporting
> > offloading).
> > 
> > Failed with:
> >  # ./xdp1 -O $(
> >  libbpf: load bpf program failed: Invalid argument
> >  libbpf: failed to load program 'xdp1'
> >  libbpf: failed to load object './xdp1_kern.o'
> > 
> > Tested on kernel 4.18.0-2.el8.x86_64 with driver nfp
> >  Ethernet controller: Netronome Systems, Inc. Device 4000  
> 
> Are you running the BPF capable FW?
> 
> https://help.netronome.com/support/solutions/articles/3650009-agilio-ebpf-2-0-6-extended-berkeley-packet-filter

I'm likely not running the correct firmware...

Can you tell me, with the ethtool -i output, if I'm running the
appropriate firmware?

# ethtool -i enp129s0np1
driver: nfp
version: 4.18.0-2.el8.x86_64 SMP mod_unl
firmware-version: 0.0.3.5 0.21 nic-2.0.7 nic
expansion-rom-version: 
bus-info: :81:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: yes
supports-priv-flags: no

If this is a firmware version case, then we should really improve the
errors we are giving the user, the -EINVAL can be anything.

 "libbpf: load bpf program failed: Invalid argument"

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH v2 net-next] failover: Add missing check to validate 'slave_dev' in net_failover_slave_unregister

2018-09-04 Thread Samudrala, Sridhar

On 9/3/2018 7:56 PM, YueHaibing wrote:

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/net_failover.c: In function 'net_failover_slave_unregister':
drivers/net/net_failover.c:598:35: warning:
  variable 'primary_dev' set but not used [-Wunused-but-set-variable]

There should be a check on the validity of 'slave_dev'.

Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")

Signed-off-by: YueHaibing 


Acked-by: Sridhar Samudrala 



---
v2: use WARN_ON_ONCE as Liran Alon suggested
---
  drivers/net/net_failover.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
index 7ae1856..5a749dc 100644
--- a/drivers/net/net_failover.c
+++ b/drivers/net/net_failover.c
@@ -603,6 +603,9 @@ static int net_failover_slave_unregister(struct net_device *slave_dev,
primary_dev = rtnl_dereference(nfo_info->primary_dev);
standby_dev = rtnl_dereference(nfo_info->standby_dev);
  
+	if (WARN_ON_ONCE(slave_dev != primary_dev && slave_dev != standby_dev))
+   return -ENODEV;
+
vlan_vids_del_by_dev(slave_dev, failover_dev);
dev_uc_unsync(slave_dev, failover_dev);
dev_mc_unsync(slave_dev, failover_dev);





Re: [PATCH net-next v2 1/2] netlink: ipv4 igmp join notifications

2018-09-04 Thread Patrick Ruddy
On Mon, 2018-09-03 at 16:12 -0700, Roopa Prabhu wrote:
> On Sun, Sep 2, 2018 at 4:18 AM, Patrick Ruddy
>  wrote:
> > Hi Roopa
> > 
> > inline
> > 
> > thx
> > 
> > -pr
> > 
> > On Fri, 2018-08-31 at 09:29 -0700, Roopa Prabhu wrote:
> > > On Fri, Aug 31, 2018 at 4:20 AM, Patrick Ruddy
> > >  wrote:
> > > > Some userspace applications need to know about IGMP joins from the kernel
> > > > for 2 reasons
> > > > 1. To allow the programming of multicast MAC filters in hardware
> > > > 2. To form a multicast FORUS list for non link-local multicast
> > > >groups to be sent to the kernel and from there to the interested
> > > >party.
> > > > (1) can be fulfilled by simply sending the hardware multicast MAC
> > > > address to be programmed, but (2) requires the L3 address to be sent
> > > > since this cannot be constructed from the MAC address, whereas the
> > > > reverse translation is a standard library function.
> > > > 
> > > > This commit provides addition and deletion of multicast addresses
> > > > using the RTM_NEWADDR and RTM_DELADDR messages. It also provides
> > > > the RTM_GETADDR extension to allow multicast join state to be read
> > > > from the kernel.
> > > > 
> > > > Signed-off-by: Patrick Ruddy 
> > > > ---
> > > > v2: fix kbuild warnings.
> > > 
> > > I am still going through the series, but AFAICT, user-space caches listening to
> > > RTNLGRP_IPV4_IFADDR will now also get multicast addresses by default?
> > > 
> > 
> > Yes that's the crux of this change. It's unfortunate that I could not
> > use IFA_MULTICAST to distinguish the SAFI. I suppose the other option
> > would be to create a set of new NEW/DEL/GETMULTICAST messages but the
> > partial code for RTM_GETMULTICAST in ipv6/mcast.c complicates that
> > slightly. Happy to look at it if you think that would be better.
> > 
> 
> yeah, true. Thinking about this some more, you are adding an interface
> for multicast entries learnt via igmp.
> There is already a netlink channel for layer2 mc addresses via igmp. I
> can't see why that cannot be used.
> It is RTM_*MDB msgs. It is currently only available for the bridge.
> But, I have a requirement for it to be
> available via a vxlan dev...so, I am looking at making it available on
> other devices.
> 
> Can you check if RTM_*MDB msgs can be made to work for your case ?.
> 
> The reason I think it should be possible is because this is similar to
> bridge fdb entries.
> The bridge fdb api  (RTM_NEWNEIGH with AF_BRIDGE) is overloaded to
> notify and dump netdev unicast addresses.
> similarly I think the mdb api can be overloaded to notify and dump
> netdev multicast addresses (statically added or learnt via igmp)

If I'm reading this correctly I think overloading this channel is
possible. 

What you're suggesting is overloading the RTM_*MDB messages with
AF_INET and AF_INET6 to carry the per-interface joined L3 multicast
addresses.

I've thrown together a quick test of this and it looks good. I can
polish this up and resubmit if you're happy with the approach. FWIW
isolating the multicast addresses this way seems safer and it's a
smaller patchset.

thx

-pr
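The quoted commit message argues that the L3 group address has to be sent because it cannot be reconstructed from the hardware multicast MAC, while the forward mapping is a fixed function. For IPv4 that mapping (RFC 1112) places the low 23 bits of the group address after the 01:00:5e prefix, so 32 different groups share each MAC. A small illustration; `ipv4_mcast_to_mac()` is a hypothetical helper name:

```c
#include <stdint.h>

/*
 * Map an IPv4 multicast group address (host byte order) to its Ethernet
 * multicast MAC: fixed prefix 01:00:5e followed by the low 23 bits of
 * the group. Five bits of the 28-bit group ID are discarded, which is
 * why the MAC alone cannot identify the L3 group.
 */
static void ipv4_mcast_to_mac(uint32_t group, uint8_t mac[6])
{
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (uint8_t)((group >> 16) & 0x7f); /* keep only 7 bits here */
	mac[4] = (uint8_t)((group >> 8) & 0xff);
	mac[5] = (uint8_t)(group & 0xff);
}
```

For example, 239.1.1.1 and 224.129.1.1 both map to 01:00:5e:01:01:01, so a listener that only sees the MAC cannot tell which group was joined.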


Re: [PATCH RFC net-next 00/18] net: Improve route scalability via support for nexthop objects

2018-09-04 Thread David Ahern
On 9/2/18 11:34 AM, David Miller wrote:
> From: dsah...@kernel.org
> Date: Fri, 31 Aug 2018 17:49:35 -0700
> 
>> Examples
>> 1. Single path
>> $ ip nexthop add id 1 via 10.99.1.2 dev veth1
>> $ ip route add 10.1.1.0/24 nhid 1
>>
>> $ ip next ls
>> id 1 via 10.99.1.2 src 10.99.1.1 dev veth1 scope link
>>
>> $ ip ro ls
>> 10.1.1.0/24 nhid 1 scope link
>> ...
> 
> First of all, this whole idea is awesome!  But, you knew that already. :)

:-)

> 
> However, I worry about what happens in a mixed environment where we have routing
> daemons and tools inserting nexthop based routes, and some doing things
> the old way using and expecting inline nexthop information in the routes.
> 
> That mixed environment situation has to function correctly.  Older
> apps have to see the per-route nexthop info in the format and layout
> they expect (gw/dev pairs).  They cannot be expected to just suddenly
> understand the nexthop ID etc.
> 
> Otherwise the concept and ideas are fine, so as long as you can resolve
> the mixed environment situation I fully support this work and look forward
> to it being in a state where I can integrate it :-)
> 

RTA_NH_ID is on par with other new attributes (RTA_ENCAP for example) --
userspace apps get a route attribute and have no idea what it means
until support is added (e.g., it took more than 2 years for libnl to get
support for RTA_ENCAP). I take your comment to mean you prefer this new
attribute to be treated differently -- assume apps are clueless unless
they indicate otherwise. Given the number of ioctl based apps that might
be the better option for this case.

I can add an attribute for apps to specify 'hey, I understand nexthops'
on dump and get requests (per-app flag), and then I can add a sysctl
that controls whether the nexthop spec is included. The sysctl would be
for notifications and a global option for dumps/gets. Users who know
their OS is safe for the short form can set it and get the benefit of
smaller messages. While the biggest win here is pushing routes to the
kernel faster, there is also a gain with less data from the kernel in
route dumps and notifications, especially with multipath environments.
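One possible reading of the proposed policy, written as a hypothetical helper (neither the per-app attribute nor the sysctl exists yet; `app_opted_in` and `sysctl_compact` are placeholder names): dumps and gets honor the requesting app's opt-in first, while notifications, which go to every listener, depend only on the global sysctl.

```c
#include <stdbool.h>

enum msg_kind { MSG_DUMP_OR_GET, MSG_NOTIFICATION };

/*
 * Return true if the full gw/dev nexthop spec must be included in a route
 * message, false if the compact nexthop-id-only form may be sent.
 */
static bool include_nexthop_spec(enum msg_kind kind, bool app_opted_in,
				 bool sysctl_compact)
{
	/* Notifications reach all listeners, so only the global sysctl can
	 * declare the short form safe. */
	if (kind == MSG_NOTIFICATION)
		return !sysctl_compact;

	/* Dumps and gets: an app that declares it understands nexthop ids
	 * gets the short form; otherwise fall back to the global setting. */
	if (app_opted_in)
		return false;
	return !sysctl_compact;
}
```

The default (no opt-in, sysctl unset) keeps the full spec everywhere, which is what keeps older apps working in a mixed environment.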


[net-next:master 13/40] drivers/net/ethernet/altera/altera_tse_main.c:1628: undefined reference to `of_phy_is_fixed_link'

2018-09-04 Thread kbuild test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git master
head:   42220b77495d20eff5880336faff7bca1c111ee3
commit: 7e8d5755be0e6c92d3b86a85e54c6a550b1910c5 [13/40] net: nixge: Add support for 64-bit platforms
config: i386-randconfig-b0-09021507 (attached as .config)
compiler: gcc-4.9 (Debian 4.9.4-2) 4.9.4
reproduce:
git checkout 7e8d5755be0e6c92d3b86a85e54c6a550b1910c5
# save the attached .config to linux build tree
make ARCH=i386 
:: branch date: 3 hours ago
:: commit date: 4 days ago

All errors (new ones prefixed by >>):

   drivers/net/ethernet/altera/altera_tse_main.o: In function `altera_tse_remove':
>> drivers/net/ethernet/altera/altera_tse_main.c:1628: undefined reference to `of_phy_is_fixed_link'
>> drivers/net/ethernet/altera/altera_tse_main.c:1629: undefined reference to `of_phy_deregister_fixed_link'
   drivers/net/ethernet/stmicro/stmmac/stmmac_main.o: In function `stmmac_init_phy':
>> drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:963: undefined reference to `of_phy_connect'
   drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.o: In function `stmmac_mdio_register':
>> drivers/net/ethernet/stmicro/stmmac/stmmac_mdio.c:361: undefined reference to `of_mdiobus_register'

# https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/commit/?id=7e8d5755be0e6c92d3b86a85e54c6a550b1910c5
git remote add net-next https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
git remote update net-next
git checkout 7e8d5755be0e6c92d3b86a85e54c6a550b1910c5
vim +1628 drivers/net/ethernet/altera/altera_tse_main.c

bbd2190c Vince Bridgers  2014-03-17  1617  
bbd2190c Vince Bridgers  2014-03-17  1618  /* Remove Altera TSE MAC device
bbd2190c Vince Bridgers  2014-03-17  1619   */
bbd2190c Vince Bridgers  2014-03-17  1620  static int altera_tse_remove(struct platform_device *pdev)
bbd2190c Vince Bridgers  2014-03-17  1621  {
bbd2190c Vince Bridgers  2014-03-17  1622   struct net_device *ndev = platform_get_drvdata(pdev);
5a89394a Johan Hovold    2016-11-28  1623   struct altera_tse_private *priv = netdev_priv(ndev);
c484994e Kostya Belezko  2014-12-30  1624  
5a89394a Johan Hovold    2016-11-28  1625   if (ndev->phydev) {
941ea69e Philippe Reynes 2016-06-18  1626   phy_disconnect(ndev->phydev);
bbd2190c Vince Bridgers  2014-03-17  1627  
5a89394a Johan Hovold    2016-11-28 @1628   if (of_phy_is_fixed_link(priv->device->of_node))
5a89394a Johan Hovold    2016-11-28 @1629   of_phy_deregister_fixed_link(priv->device->of_node);
5a89394a Johan Hovold    2016-11-28  1630   }
5a89394a Johan Hovold    2016-11-28  1631  
bbd2190c Vince Bridgers  2014-03-17  1632   platform_set_drvdata(pdev, NULL);
bbd2190c Vince Bridgers  2014-03-17  1633   altera_tse_mdio_destroy(ndev);
bbd2190c Vince Bridgers  2014-03-17  1634   unregister_netdev(ndev);
bbd2190c Vince Bridgers  2014-03-17  1635   free_netdev(ndev);
bbd2190c Vince Bridgers  2014-03-17  1636  
bbd2190c Vince Bridgers  2014-03-17  1637   return 0;
bbd2190c Vince Bridgers  2014-03-17  1638  }
bbd2190c Vince Bridgers  2014-03-17  1639  

:: The code at line 1628 was first introduced by commit
:: 5a89394ad2a5b94885bdbbb611518b0cc70bf354 net: ethernet: altera: fix fixed-link phydev leaks

:: TO: Johan Hovold 
:: CC: David S. Miller 

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH iproute2-next] ip: Add support for nexthop objects

2018-09-04 Thread David Ahern
On 9/1/18 2:37 PM, Stephen Hemminger wrote:

>> diff --git a/include/uapi/linux/nexthop.h b/include/uapi/linux/nexthop.h
>> new file mode 100644
>> index ..335182e8229a
>> --- /dev/null
>> +++ b/include/uapi/linux/nexthop.h
>> @@ -0,0 +1,56 @@
>> +#ifndef __LINUX_NEXTHOP_H
>> +#define __LINUX_NEXTHOP_H
>> +
>> +#include 
>> +
>> +struct nhmsg {
>> +unsigned char   nh_family;
>> +unsigned char   nh_scope; /* one of RT_SCOPE */
>> +unsigned char   nh_protocol;  /* Routing protocol that installed nh */
>> +unsigned char   resvd;
>> +unsigned intnh_flags; /* RTNH_F flags */
>> +};
> 
> Why not use __u8 and __u32 for these?

I want consistency with rtmsg on which nhmsg is based and has many
parallels.

> 
>> +struct nexthop_grp {
>> +__u32   id;
>> +__u32   weight;
>> +};
>> +
>> +enum {
>> +NEXTHOP_GRP_TYPE_MPATH,  /* default type if not specified */
>> +__NEXTHOP_GRP_TYPE_MAX,
>> +};
>> +
>> +#define NEXTHOP_GRP_TYPE_MAX (__NEXTHOP_GRP_TYPE_MAX - 1)
>> +
>> +
>> +/* NHA_ID   32-bit id for nexthop. id must be greater than 0.
>> + *  id == 0 means assign an unused id.
>> + */
> 
> Don't use dave's preferred comment style in this file.
> The rest of the file uses standard comments.

The file will eventually come from the kernel via header sync, so I have
to stick to whatever style is appropriate for the uapi files.


>> diff --git a/ip/ipnexthop.c b/ip/ipnexthop.c
>> new file mode 100644
>> index ..9fa4b7292426
>> --- /dev/null
>> +++ b/ip/ipnexthop.c
>> @@ -0,0 +1,652 @@
>> +/*
>> + * ip nexthop
>> + *
>> + * Copyright (C) 2017 Cumulus Networks
>> + * Copyright (c) 2017 David Ahern 
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License as published by
>> + * the Free Software Foundation; either version 2 of the License, or
>> + * (at your option) any later version.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> + * GNU General Public License for more details.
>> + */
>>
> 
> Please use SPDX and not GPL boilerplate in new files.

yes, the file pre-dates SPDX. Need to do the same with the kernel side
files.


> 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
> 
> Is this code really using libmnl?

no. need to fix. The iproute2 patch was only added for the RFC so people
could try out the UAPI which is the point of the RFC.


>> +if (!num || num * sizeof(*nhg) != RTA_PAYLOAD(grps_attr)) {
>> +fprintf(fp, "");
>> +return;
>> +}
>> +
>> +if (gtype)
>> +group_type = rta_getattr_u16(gtype);
>> +
>> +if (is_json_context()) {
>> +open_json_array(PRINT_JSON, "group");
>> +for (i = 0; i < num; ++i) {
>> +open_json_object(NULL);
>> +print_uint(PRINT_ANY, "id", "id %u ", nhg[i].id);
>> +print_uint(PRINT_ANY, "weight", "weight %u ", nhg[i].weight);
>> +close_json_object();
>> +}
>> +close_json_array(PRINT_JSON, NULL);
>> +print_string(PRINT_ANY, "type", "type %s ",
>> + nh_group_type_to_str(group_type, b1, sizeof(b1)));
>> +} else {
>> +fprintf(fp, "group ");
>> +for (i = 0; i < num; ++i) {
>> +if (i)
>> +fprintf(fp, "/");
>> +fprintf(fp, "%u", nhg[i].id);
>> +if (num > 1 && nhg[i].weight > 1)
>> +fprintf(fp, ",%u", nhg[i].weight);
>> +}
>> +}
>> +}
> 
> I think this could be done by using json_print cleverly rather than having
> to use is_json_context(). That would avoid repeating code.
> 
> You are only decoding group type in the json version, why not both?

oversight. group type was a recent change.
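Emitting the group type in the plain-text path as well, as Stephen suggests, could look like the sketch below. It formats into a caller buffer instead of using iproute2's print helpers; `format_nh_group()` and the local `struct nh_grp_entry` (mirroring `struct nexthop_grp` from the quoted uapi header) are hypothetical, and the type string is simply passed in by the caller ("mpath" in the usage is an assumed name for type 0):

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct nh_grp_entry {
	uint32_t id;
	uint32_t weight;
};

/* Format e.g. "group 1/2,3 type mpath": ids separated by '/', and a
 * ",weight" suffix only when the group has several members and
 * weight > 1, matching the quoted text-mode logic. Returns the length
 * written, or -1 if buf is too small. */
static int format_nh_group(char *buf, size_t size,
			   const struct nh_grp_entry *grp, size_t num,
			   const char *type_str)
{
	size_t off;
	size_t i;
	int n;

	n = snprintf(buf, size, "group ");
	if (n < 0 || (size_t)n >= size)
		return -1;
	off = (size_t)n;

	for (i = 0; i < num; i++) {
		n = snprintf(buf + off, size - off, "%s%u", i ? "/" : "",
			     (unsigned int)grp[i].id);
		if (n < 0 || (size_t)n >= size - off)
			return -1;
		off += (size_t)n;
		if (num > 1 && grp[i].weight > 1) {
			n = snprintf(buf + off, size - off, ",%u",
				     (unsigned int)grp[i].weight);
			if (n < 0 || (size_t)n >= size - off)
				return -1;
			off += (size_t)n;
		}
	}

	/* The type is printed in text mode too, not only in JSON. */
	n = snprintf(buf + off, size - off, " type %s", type_str);
	if (n < 0 || (size_t)n >= size - off)
		return -1;
	return (int)(off + (size_t)n);
}
```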



[PATCH net-next] nfp: separate VXLAN and GRE feature handling

2018-09-04 Thread Jakub Kicinski
Currently both the VXLAN and GRE FW features have to be advertised
for the driver to enable either of them.  Separate the handling.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Dirk van der Merwe 
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index a8b9fbab5f73..9b7f28c2e221 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3745,15 +3745,18 @@ static void nfp_net_netdev_init(struct nfp_net *nn)
}
if (nn->cap & NFP_NET_CFG_CTRL_RSS_ANY)
netdev->hw_features |= NETIF_F_RXHASH;
-   if (nn->cap & NFP_NET_CFG_CTRL_VXLAN &&
-   nn->cap & NFP_NET_CFG_CTRL_NVGRE) {
+   if (nn->cap & NFP_NET_CFG_CTRL_VXLAN) {
if (nn->cap & NFP_NET_CFG_CTRL_LSO)
-   netdev->hw_features |= NETIF_F_GSO_GRE |
-  NETIF_F_GSO_UDP_TUNNEL;
-   nn->dp.ctrl |= NFP_NET_CFG_CTRL_VXLAN | NFP_NET_CFG_CTRL_NVGRE;
-
-   netdev->hw_enc_features = netdev->hw_features;
+   netdev->hw_features |= NETIF_F_GSO_UDP_TUNNEL;
+   nn->dp.ctrl |= NFP_NET_CFG_CTRL_VXLAN;
}
+   if (nn->cap & NFP_NET_CFG_CTRL_NVGRE) {
+   if (nn->cap & NFP_NET_CFG_CTRL_LSO)
+   netdev->hw_features |= NETIF_F_GSO_GRE;
+   nn->dp.ctrl |= NFP_NET_CFG_CTRL_NVGRE;
+   }
+   if (nn->cap & (NFP_NET_CFG_CTRL_VXLAN | NFP_NET_CFG_CTRL_NVGRE))
+   netdev->hw_enc_features = netdev->hw_features;
 
netdev->vlan_features = netdev->hw_features;
 
-- 
2.17.1



Re: [bpf-next PATCH 3/3] xdp: split code for map vs non-map redirect

2018-09-04 Thread kbuild test robot
Hi Jesper,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Jesper-Dangaard-Brouer/XDP-micro-optimizations-for-redirect/20180903-121606
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
reproduce:
# apt-get install sparse
make ARCH=x86_64 allmodconfig
make C=1 CF=-D__CHECK_ENDIAN__
:: branch date: 3 hours ago
:: commit date: 3 hours ago

   net/core/filter.c:116:48: sparse: expression using sizeof(void)
   net/core/filter.c:116:48: sparse: expression using sizeof(void)
   net/core/filter.c:210:32: sparse: cast to restricted __be16
   net/core/filter.c:210:32: sparse: cast to restricted __be16
   net/core/filter.c:210:32: sparse: cast to restricted __be16
   net/core/filter.c:210:32: sparse: cast to restricted __be16
   net/core/filter.c:210:32: sparse: cast to restricted __be16
   net/core/filter.c:210:32: sparse: cast to restricted __be16
   net/core/filter.c:210:32: sparse: cast to restricted __be16
   net/core/filter.c:210:32: sparse: cast to restricted __be16
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:237:32: sparse: cast to restricted __be32
   net/core/filter.c:410:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:413:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:416:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:419:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:422:33: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:495:27: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:498:27: sparse: subtraction of functions? Share your drugs
   net/core/filter.c:501:27: sparse: subtraction of functions? Share your drugs
   include/linux/slab.h:631:13: sparse: undefined identifier '__builtin_mul_overflow'
   include/linux/slab.h:631:13: sparse: not a function 
   include/linux/filter.h:644:16: sparse: expression using sizeof(void)
   include/linux/filter.h:644:16: sparse: expression using sizeof(void)
   include/linux/filter.h:644:16: sparse: expression using sizeof(void)
   include/linux/filter.h:644:16: sparse: expression using sizeof(void)
   net/core/filter.c:1389:39: sparse: incorrect type in argument 1 (different address spaces) @@expected struct sock_filter const *filter @@got struct sockstruct sock_filter const *filter @@
   net/core/filter.c:1389:39:expected struct sock_filter const *filter
   net/core/filter.c:1389:39:got struct sock_filter [noderef] *filter
   include/linux/filter.h:644:16: sparse: expression using sizeof(void)
   net/core/filter.c:1467:39: sparse: incorrect type in argument 1 (different address spaces) @@expected struct sock_filter const *filter @@got struct sockstruct sock_filter const *filter @@
   net/core/filter.c:1467:39:expected struct sock_filter const *filter
   net/core/filter.c:1467:39:got struct sock_filter [noderef] *filter
   include/linux/filter.h:644:16: sparse: expression using sizeof(void)
   include/linux/filter.h:644:16: sparse: expression using sizeof(void)
   include/linux/filter.h:644:16: sparse: expression using sizeof(void)
   net/core/filter.c:1843:43: sparse: incorrect type in argument 2 (different base types) @@expected restricted __wsum [usertype] diff @@got unsigned lonrestricted __wsum [usertype] diff @@
   net/core/filter.c:1843:43:expected restricted __wsum [usertype] diff
   net/core/filter.c:1843:43:got unsigned long long [unsigned] [usertype] to
   net/core/filter.c:1846:36: sparse: incorrect type in argument 2 (different base types) @@expected restricted __be16 [usertype] old @@got unsigned lonrestricted __be16 [usertype] old @@
   net/core/filter.c:1846:36:expected restricted __be16 [usertype] old
   net/core/filter.c:1846:36:got unsigned long long [unsigned] [usertype] from
   net/core/filter.c:1846:42: sparse: incorrect type in argument 3 (different base types) @@expected restricted __be16 [usertype] new @@got unsigned lonrestricted __be16 [usertype] new @@
   net/core/filter.c:1846:42:expected restricted __be16 [usertype] new
   net/core/filter.c:1846:42:got unsigned long 

Re: [PATCH RFC net-next 18/18] net/ipv4: Optimization for fib_info lookup

2018-09-04 Thread David Ahern
On 9/1/18 2:43 PM, Stephen Hemminger wrote:
> On Fri, 31 Aug 2018 17:49:53 -0700
> dsah...@kernel.org wrote:
> 
>> +static inline unsigned int fib_info_hashfn_cfg(const struct fib_config *cfg)
>> +{
>> +unsigned int mask = (fib_info_hash_size - 1);
>> +unsigned int val = 0;
>> +
>> +val ^= (cfg->fc_protocol << 8) | cfg->fc_scope;
> 
> Why do assignment to 0 then do initial xor?
> Why not instead just do assignment in the first statement which would be 
> clearer.
> 

Side effect of copy-paste-adjust of the original. Will fix for the next
rfc (really need to not have 2 versions of the hashfn; need to think
through it).
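Stephen's point is that XOR-ing into a zero-initialized variable is the same as assigning the first term, since 0 ^ x == x. A minimal user-space sketch of both shapes follows. `struct cfg_demo`, its fields, the table size, and the mixing terms after the first one are hypothetical; they are patterned after the existing fib_info_hashfn() fold and are not taken from the RFC patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few fib_config fields this sketch hashes. */
struct cfg_demo {
	uint8_t  protocol;
	uint8_t  scope;
	uint32_t prefsrc;
	uint32_t priority;
};

/* Hypothetical table size; must be a power of two for the mask to work. */
#define DEMO_HASH_SIZE 256u

/* Final fold of the accumulated value into a table index. */
static unsigned int demo_fold(unsigned int val)
{
	return (val ^ (val >> 7) ^ (val >> 12)) & (DEMO_HASH_SIZE - 1);
}

/* Shape as posted: start from 0, then XOR in the first term. */
static unsigned int demo_hash_xor_first(const struct cfg_demo *c)
{
	unsigned int val = 0;

	val ^= (unsigned int)(c->protocol << 8) | c->scope;
	val ^= c->prefsrc;
	val ^= c->priority;
	return demo_fold(val);
}

/* Suggested shape: assign the first term directly (0 ^ x == x). */
static unsigned int demo_hash_assign_first(const struct cfg_demo *c)
{
	unsigned int val = (unsigned int)(c->protocol << 8) | c->scope;

	val ^= c->prefsrc;
	val ^= c->priority;
	return demo_fold(val);
}
```

Both functions produce the same index for any input; the second simply drops the redundant initialization.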


[RFC PATCH] xdp: xdp_do_redirect_slow() can be static

2018-09-04 Thread kbuild test robot


Fixes: 974efdb08a35 ("xdp: split code for map vs non-map redirect")
Signed-off-by: kbuild test robot 
---
 filter.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 45ea00b..95454f3 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3171,7 +3171,7 @@ static int __bpf_tx_xdp(struct net_device *dev,
 }
 
 /* non-static to avoid inline by compiler */
-int xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
+static int xdp_do_redirect_slow(struct net_device *dev, struct xdp_buff *xdp,
struct bpf_prog *xdp_prog, struct bpf_redirect_info *ri)
 {
struct net_device *fwd;
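The comment kept above the function says it was left non-static only to stop the compiler from inlining it, and the robot's change to `static` works against that intent. With GCC and Clang the two concerns are independent: a function can have internal linkage and still be kept out of line with the `noinline` attribute, which the kernel exposes as its `noinline` macro. A minimal user-space sketch; `demo_redirect()` and `demo_redirect_slow()` are hypothetical names, not the filter.c code, and how the thread finally resolved this is not shown here.

```c
#include <assert.h>

/* GCC/Clang attribute; the kernel wraps the same thing as 'noinline'. */
#define DEMO_NOINLINE __attribute__((__noinline__))

/* Hypothetical slow path: internal linkage, but never inlined into callers. */
static DEMO_NOINLINE int demo_redirect_slow(int ifindex)
{
	/* Stand-in for the real work: reject an invalid target. */
	return ifindex > 0 ? 0 : -1;
}

/* Hypothetical entry point: a cheap fast path plus the out-of-line fallback. */
int demo_redirect(int ifindex, int have_map)
{
	if (have_map)
		return 0; /* fast path, kept small so it stays cheap to inline */
	return demo_redirect_slow(ifindex);
}
```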


Re: [RFC bpf-next PATCH] samples/bpf: xdp1 add XDP hardware offload option

2018-09-04 Thread Jakub Kicinski
On Tue, 04 Sep 2018 16:59:19 +0200, Jesper Dangaard Brouer wrote:
> Trying to use XDP hardware offloading via XDP_FLAGS_HW_MODE
> and setting the ifindex in prog_load_attr.ifindex before
> loading the BPF code via bpf_prog_load_xattr().
> 
> This unfortunately does not seem to work...
> - Am I doing something wrong?
> 
> Notice, I also disable the map BPF_MAP_TYPE_PERCPU_ARRAY
> to make sure it was not related to the map (not supporting
> offloading).
> 
> Failed with:
>  # ./xdp1 -O $(
>  libbpf: load bpf program failed: Invalid argument
>  libbpf: failed to load program 'xdp1'
>  libbpf: failed to load object './xdp1_kern.o'
> 
> Tested on kernel 4.18.0-2.el8.x86_64 with driver nfp
>  Ethernet controller: Netronome Systems, Inc. Device 4000

Are you running the BPF capable FW?

https://help.netronome.com/support/solutions/articles/3650009-agilio-ebpf-2-0-6-extended-berkeley-packet-filter


[RFC bpf-next PATCH] samples/bpf: xdp1 add XDP hardware offload option

2018-09-04 Thread Jesper Dangaard Brouer
Trying to use XDP hardware offloading via XDP_FLAGS_HW_MODE
and setting the ifindex in prog_load_attr.ifindex before
loading the BPF code via bpf_prog_load_xattr().

This unfortunately does not seem to work...
- Am I doing something wrong?

Notice, I also disable the map BPF_MAP_TYPE_PERCPU_ARRAY
to make sure it was not related to the map (not supporting
offloading).

Failed with:
 # ./xdp1 -O $(
 #include "bpf_helpers.h"
 
+/*
 struct bpf_map_def SEC("maps") rxcnt = {
.type = BPF_MAP_TYPE_PERCPU_ARRAY,
.key_size = sizeof(u32),
.value_size = sizeof(long),
.max_entries = 256,
 };
+*/
 
 static int parse_ipv4(void *data, u64 nh_off, void *data_end)
 {
@@ -83,9 +85,9 @@ int xdp_prog1(struct xdp_md *ctx)
else
ipproto = 0;
 
-   value = bpf_map_lookup_elem(, );
-   if (value)
-   *value += 1;
+// value = bpf_map_lookup_elem(, );
+// if (value)
+// *value += 1;
 
return rc;
 }
diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index b02c531510ed..a362b5adfcf0 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -64,7 +64,8 @@ static void usage(const char *prog)
"usage: %s [OPTS] IFINDEX\n\n"
"OPTS:\n"
"-Suse skb-mode\n"
-   "-Nenforce native mode\n",
+   "-Nenforce native mode\n"
+   "-Ooffload mode\n",
prog);
 }
 
@@ -74,7 +75,7 @@ int main(int argc, char **argv)
struct bpf_prog_load_attr prog_load_attr = {
.prog_type  = BPF_PROG_TYPE_XDP,
};
-   const char *optstr = "SN";
+   const char *optstr = "SNO";
int prog_fd, map_fd, opt;
struct bpf_object *obj;
struct bpf_map *map;
@@ -88,6 +89,9 @@ int main(int argc, char **argv)
case 'N':
xdp_flags |= XDP_FLAGS_DRV_MODE;
break;
+   case 'O':
+   xdp_flags |= XDP_FLAGS_HW_MODE;
+   break;
default:
usage(basename(argv[0]));
return 1;
@@ -109,6 +113,10 @@ int main(int argc, char **argv)
snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
prog_load_attr.file = filename;
 
+   /* For HW offload provide ifindex when loading BPF code */
+   if (xdp_flags & XDP_FLAGS_HW_MODE) {
+   prog_load_attr.ifindex = ifindex;
+   }
if (bpf_prog_load_xattr(_load_attr, , _fd))
return 1;
 

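The option handling this RFC adds boils down to two decisions: which XDP attach flag an option selects, and whether the interface index must already be known when the program is loaded (only for hardware offload, so the kernel can bind the program to that device). A minimal sketch of just that logic. The `DEMO_` constants mirror the values of `XDP_FLAGS_SKB_MODE`, `XDP_FLAGS_DRV_MODE` and `XDP_FLAGS_HW_MODE` in `<linux/if_link.h>` and are redefined only so the sketch builds without kernel headers; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as XDP_FLAGS_{SKB,DRV,HW}_MODE in <linux/if_link.h>. */
#define DEMO_XDP_FLAGS_SKB_MODE (1U << 1)
#define DEMO_XDP_FLAGS_DRV_MODE (1U << 2)
#define DEMO_XDP_FLAGS_HW_MODE  (1U << 3)

/* Map a getopt character to an attach flag; returns 0, or -1 if unknown. */
static int demo_mode_flag(int opt, uint32_t *xdp_flags)
{
	switch (opt) {
	case 'S':
		*xdp_flags |= DEMO_XDP_FLAGS_SKB_MODE;
		return 0;
	case 'N':
		*xdp_flags |= DEMO_XDP_FLAGS_DRV_MODE;
		return 0;
	case 'O':
		*xdp_flags |= DEMO_XDP_FLAGS_HW_MODE;
		return 0;
	default:
		return -1;
	}
}

/* ifindex to pass at load time: the device for offload, 0 (none) otherwise. */
static uint32_t demo_load_ifindex(uint32_t xdp_flags, uint32_t ifindex)
{
	return (xdp_flags & DEMO_XDP_FLAGS_HW_MODE) ? ifindex : 0;
}
```

Getting these flags right is necessary but, judging by the firmware question raised in the thread, may not be sufficient: the NIC also has to run BPF-capable firmware for offload to load.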


bnxt_en fails to initialize MAC address in Oracle cloud

2018-09-04 Thread Seth Forshee
We got a bug report against Ubuntu about networking failing to come up
in the Oracle cloud:

https://bugs.launchpad.net/bugs/1790652

This is with a kernel based on 4.18.5, and it has also been seen with a
4.17-based kernel. I'm not currently aware of any working kernel
version. The driver seems to be getting an error response from the
firmware when trying to set the MAC address.

[2.437420] Broadcom NetXtreme-C/E driver bnxt_en v1.9.1
[2.449820] bnxt_en :00:03.0 (unnamed net_device) (uninitialized): hwrm req_type 0xf seq id 0x5 error 0x
[2.455610] bnxt_en :00:03.0 (unnamed net_device) (uninitialized): VF MAC address 00:00:17:02:05:d0 not approved by the PF
[2.461443] bnxt_en :00:03.0: Unable to initialize mac address.
[2.483531] bnxt_en: probe of :00:03.0 failed with error -99

Let me know if there's more information you need, and we'll try to get
it for you.

Thanks,
Seth


[PATCH net-next 1/3] nfp: fix readq on absolute RTsyms

2018-09-04 Thread Jakub Kicinski
Return the error and report value through the output param.

Fixes: 640917dd81b6 ("nfp: support access to absolute RTsyms")
Reported-by: Dan Carpenter 
Signed-off-by: Jakub Kicinski 
Reviewed-by: Francois H. Theron 
---
 drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
index 108ce8c5e68e..4003ed76a49a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
@@ -337,8 +337,10 @@ int __nfp_rtsym_readq(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
u64 addr;
int err;
 
-   if (sym->type == NFP_RTSYM_TYPE_ABS)
-   return sym->addr;
+   if (sym->type == NFP_RTSYM_TYPE_ABS) {
+   *value = sym->addr;
+   return 0;
+   }
 
err = nfp_rtsym_to_dest(cpp, sym, action, token, off, _id, );
if (err)
-- 
2.17.1

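The bug being fixed: for absolute symbols the old code returned the 64-bit symbol value through the function's int status return, so a caller checking `if (err)` saw a spurious error and `*value` was never written. A cut-down sketch of the before and after shapes; the `demo_` types and functions are hypothetical stand-ins for `struct nfp_rtsym` and `__nfp_rtsym_readq()`, and the non-absolute (memory) path is stubbed out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-ins for the driver's rtsym types. */
enum demo_rtsym_type { DEMO_RTSYM_TYPE_OBJECT, DEMO_RTSYM_TYPE_ABS };

struct demo_rtsym {
	enum demo_rtsym_type type;
	uint64_t addr;
};

/* Before: the value leaks out through the int status (and is truncated). */
static int demo_readq_before(const struct demo_rtsym *sym, uint64_t *value)
{
	(void)value; /* never written on the ABS path */
	if (sym->type == DEMO_RTSYM_TYPE_ABS)
		return (int)sym->addr;
	return -EOPNOTSUPP; /* memory access path elided in this sketch */
}

/* After: report the value through the output param and return 0. */
static int demo_readq_after(const struct demo_rtsym *sym, uint64_t *value)
{
	if (sym->type == DEMO_RTSYM_TYPE_ABS) {
		*value = sym->addr;
		return 0;
	}
	return -EOPNOTSUPP; /* memory access path elided in this sketch */
}
```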


[PATCH net-next 2/3] nfp: prefix rtsym error messages with symbol name

2018-09-04 Thread Jakub Kicinski
To ease debugging, prefix all error messages with the name
of the symbol which caused them.  Use the same message format
for existing messages while at it.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Francois H. Theron 
---
 .../netronome/nfp/nfpcore/nfp_rtsym.c | 21 ++-
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
index 4003ed76a49a..5e416e14e46a 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
@@ -237,10 +237,10 @@ u64 nfp_rtsym_size(const struct nfp_rtsym *sym)
 {
switch (sym->type) {
case NFP_RTSYM_TYPE_NONE:
-   pr_err("rtsym type NONE\n");
+   pr_err("rtsym '%s': type NONE\n", sym->name);
return 0;
default:
-   pr_warn("Unknown rtsym type: %d\n", sym->type);
+   pr_warn("rtsym '%s': unknown type: %d\n", sym->name, sym->type);
/* fall through */
case NFP_RTSYM_TYPE_OBJECT:
case NFP_RTSYM_TYPE_FUNCTION:
@@ -255,7 +255,8 @@ nfp_rtsym_to_dest(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
  u8 action, u8 token, u64 off, u32 *cpp_id, u64 *addr)
 {
if (sym->type != NFP_RTSYM_TYPE_OBJECT) {
-   nfp_err(cpp, "Direct access attempt to non-object rtsym\n");
+   nfp_err(cpp, "rtsym '%s': direct access to non-object rtsym\n",
+   sym->name);
return -EINVAL;
}
 
@@ -270,8 +271,8 @@ nfp_rtsym_to_dest(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
*cpp_id = NFP_CPP_ISLAND_ID(NFP_CPP_TARGET_MU, action, token,
sym->domain);
} else if (sym->target < 0) {
-   nfp_err(cpp, "Unhandled RTsym target encoding: %d\n",
-   sym->target);
+   nfp_err(cpp, "rtsym '%s': unhandled target encoding: %d\n",
+   sym->name, sym->target);
return -EINVAL;
} else {
*cpp_id = NFP_CPP_ISLAND_ID(sym->target, action, token,
@@ -451,7 +452,7 @@ u64 nfp_rtsym_read_le(struct nfp_rtsym_table *rtbl, const char *name,
break;
default:
nfp_err(rtbl->cpp,
-   "rtsym '%s' unsupported or non-scalar size: %lld\n",
+   "rtsym '%s': unsupported or non-scalar size: %lld\n",
name, nfp_rtsym_size(sym));
err = -EINVAL;
break;
@@ -497,7 +498,7 @@ int nfp_rtsym_write_le(struct nfp_rtsym_table *rtbl, const char *name,
break;
default:
nfp_err(rtbl->cpp,
-   "rtsym '%s' unsupported or non-scalar size: %lld\n",
+   "rtsym '%s': unsupported or non-scalar size: %lld\n",
name, nfp_rtsym_size(sym));
err = -EINVAL;
break;
@@ -523,18 +524,18 @@ nfp_rtsym_map(struct nfp_rtsym_table *rtbl, const char *name, const char *id,
err = nfp_rtsym_to_dest(rtbl->cpp, sym, NFP_CPP_ACTION_RW, 0, 0,
_id, );
if (err) {
-   nfp_err(rtbl->cpp, "Symbol %s mapping failed\n", name);
+   nfp_err(rtbl->cpp, "rtsym '%s': mapping failed\n", name);
return (u8 __iomem *)ERR_PTR(err);
}
 
if (sym->size < min_size) {
-   nfp_err(rtbl->cpp, "Symbol %s too small\n", name);
+   nfp_err(rtbl->cpp, "rtsym '%s': too small\n", name);
return (u8 __iomem *)ERR_PTR(-EINVAL);
}
 
mem = nfp_cpp_map_area(rtbl->cpp, id, cpp_id, addr, sym->size, area);
if (IS_ERR(mem)) {
-   nfp_err(rtbl->cpp, "Failed to map symbol %s: %ld\n",
+   nfp_err(rtbl->cpp, "rtsym '%s': failed to map: %ld\n",
name, PTR_ERR(mem));
return mem;
}
-- 
2.17.1

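This patch spells the `rtsym '%s': ` prefix out by hand at each call site. One way to keep that format uniform is to let string-literal concatenation add the prefix inside a single macro. A hedged user-space sketch: `DEMO_RTSYM_FMT` is hypothetical (not part of the nfp driver), writes into a buffer instead of calling `nfp_err()`, and relies on the GNU `##__VA_ARGS__` comma-elision extension supported by GCC and Clang.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Prepend "rtsym '<name>': " to a literal format string at compile time.
 * 'fmt' must be a string literal so the two literals can be concatenated. */
#define DEMO_RTSYM_FMT(buf, len, name, fmt, ...) \
	snprintf((buf), (len), "rtsym '%s': " fmt, (name), ##__VA_ARGS__)
```

Because the prefix lives in one place, every message gets the same spelling and punctuation without each call site repeating it.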


[PATCH net-next 0/3] nfp: improve the new rtsym helpers

2018-09-04 Thread Jakub Kicinski
Hi!

This set fixes a bug in the ABS rtsym handling I added in net-next
and expands the error checking and reporting on rtsym accesses.

Jakub Kicinski (3):
  nfp: fix readq on absolute RTsyms
  nfp: prefix rtsym error messages with symbol name
  nfp: validate rtsym accesses fall within the symbol

 .../netronome/nfp/nfpcore/nfp_rtsym.c | 75 +++
 1 file changed, 60 insertions(+), 15 deletions(-)

-- 
2.17.1



[PATCH net-next 3/3] nfp: validate rtsym accesses fall within the symbol

2018-09-04 Thread Jakub Kicinski
With the accesses to rtsyms now all going via special helpers
we can easily make sure the driver is not reading past the
end of the symbol.

Signed-off-by: Jakub Kicinski 
Reviewed-by: Francois H. Theron 
---
 .../netronome/nfp/nfpcore/nfp_rtsym.c | 48 +--
 1 file changed, 45 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
index 5e416e14e46a..1ad0a015572e 100644
--- a/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
+++ b/drivers/net/ethernet/netronome/nfp/nfpcore/nfp_rtsym.c
@@ -39,6 +39,8 @@
  *  Espen Skoglund 
  *  Francois H. Theron 
  */
+
+#include 
 #include 
 #include 
 #include 
@@ -285,15 +287,23 @@ nfp_rtsym_to_dest(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
 int __nfp_rtsym_read(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
 u8 action, u8 token, u64 off, void *buf, size_t len)
 {
+   u64 sym_size = nfp_rtsym_size(sym);
u32 cpp_id;
u64 addr;
int err;
 
+   if (off > sym_size) {
+   nfp_err(cpp, "rtsym '%s': read out of bounds: off: %lld + len: %zd > size: %lld\n",
+   sym->name, off, len, sym_size);
+   return -ENXIO;
+   }
+   len = min_t(size_t, len, sym_size - off);
+
if (sym->type == NFP_RTSYM_TYPE_ABS) {
-   __le64 tmp = cpu_to_le64(sym->addr);
+   u8 tmp[8];
 
-   len = min(len, sizeof(tmp));
-   memcpy(buf, , len);
+   put_unaligned_le64(sym->addr, tmp);
+   memcpy(buf, [off], len);
 
return len;
}
@@ -318,6 +328,12 @@ int __nfp_rtsym_readl(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
u64 addr;
int err;
 
+   if (off + 4 > nfp_rtsym_size(sym)) {
+   nfp_err(cpp, "rtsym '%s': readl out of bounds: off: %lld + 4 > size: %lld\n",
+   sym->name, off, nfp_rtsym_size(sym));
+   return -ENXIO;
+   }
+
err = nfp_rtsym_to_dest(cpp, sym, action, token, off, _id, );
if (err)
return err;
@@ -338,6 +354,12 @@ int __nfp_rtsym_readq(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
u64 addr;
int err;
 
+   if (off + 8 > nfp_rtsym_size(sym)) {
+   nfp_err(cpp, "rtsym '%s': readq out of bounds: off: %lld + 8 > size: %lld\n",
+   sym->name, off, nfp_rtsym_size(sym));
+   return -ENXIO;
+   }
+
if (sym->type == NFP_RTSYM_TYPE_ABS) {
*value = sym->addr;
return 0;
@@ -359,10 +381,18 @@ int nfp_rtsym_readq(struct nfp_cpp *cpp, const struct nfp_rtsym *sym, u64 off,
 int __nfp_rtsym_write(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
  u8 action, u8 token, u64 off, void *buf, size_t len)
 {
+   u64 sym_size = nfp_rtsym_size(sym);
u32 cpp_id;
u64 addr;
int err;
 
+   if (off > sym_size) {
+   nfp_err(cpp, "rtsym '%s': write out of bounds: off: %lld + len: %zd > size: %lld\n",
+   sym->name, off, len, sym_size);
+   return -ENXIO;
+   }
+   len = min_t(size_t, len, sym_size - off);
+
err = nfp_rtsym_to_dest(cpp, sym, action, token, off, _id, );
if (err)
return err;
@@ -383,6 +413,12 @@ int __nfp_rtsym_writel(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
u64 addr;
int err;
 
+   if (off + 4 > nfp_rtsym_size(sym)) {
+   nfp_err(cpp, "rtsym '%s': writel out of bounds: off: %lld + 4 > size: %lld\n",
+   sym->name, off, nfp_rtsym_size(sym));
+   return -ENXIO;
+   }
+
err = nfp_rtsym_to_dest(cpp, sym, action, token, off, _id, );
if (err)
return err;
@@ -403,6 +439,12 @@ int __nfp_rtsym_writeq(struct nfp_cpp *cpp, const struct nfp_rtsym *sym,
u64 addr;
int err;
 
+   if (off + 8 > nfp_rtsym_size(sym)) {
+   nfp_err(cpp, "rtsym '%s': writeq out of bounds: off: %lld + 8 > size: %lld\n",
+   sym->name, off, nfp_rtsym_size(sym));
+   return -ENXIO;
+   }
+
err = nfp_rtsym_to_dest(cpp, sym, action, token, off, _id, );
if (err)
return err;
-- 
2.17.1

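The checks added here follow two patterns: variable-length reads and writes reject an offset past the end of the symbol and then clamp `len` to what remains, while fixed-width 4- and 8-byte accesses must fit entirely. A user-space sketch of both; the `demo_` helper names are hypothetical, and the fixed-width check is written as a subtraction (`size - off < width`) rather than the patch's `off + width > size` so that a huge `off` cannot wrap around. That difference is a choice made for this sketch only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Variable-length access: reject off past the end, then clamp len to what
 * remains. Returns the usable length, or -ENXIO. */
static long demo_clamp_access(uint64_t sym_size, uint64_t off, size_t len)
{
	if (off > sym_size)
		return -ENXIO;
	if (len > sym_size - off)
		len = (size_t)(sym_size - off);
	return (long)len;
}

/* Fixed-width access (e.g. 4 for readl, 8 for readq): must fit entirely.
 * Subtracting after the off > size check avoids unsigned wraparound. */
static int demo_check_fixed(uint64_t sym_size, uint64_t off, unsigned int width)
{
	if (off > sym_size || sym_size - off < width)
		return -ENXIO;
	return 0;
}
```

Note that with `off == size` the variable-length path passes the check and clamps `len` to 0, the same as the patch's `off > sym_size` test.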


Re: [PATCH net-next 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-04 Thread Jose Abreu
Hi Jerome,

On 04-09-2018 13:27, Jerome Brunet wrote:
> On Tue, 2018-09-04 at 10:57 +0100, Jose Abreu wrote:
>> Hi Jerome,
>>
>> On 03-09-2018 17:22, Jerome Brunet wrote:
>>> Situation is even worse with this.
>>> I'm using an NFS root filesystem. With your fixup, I'm not reaching the prompt
>>> anymore. Looks like the same kind of network breakdown we had previously.
>>>
>> I was able to reproduce your problem and the attached fixup patch
>> fixed it up for me. Can you please try?
> I suppose this applies on top of the initial patch, not the previous fixup
> (judging from the rejection). Could you detail the baseline for each
> patch you send? It's not easy to follow.
>
> BTW, there is something weird (at least for me) with the patches you attach.
> git always refuses to apply them and even patch complains:
>
> git apply fixup2.patch
> error: patch failed: drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1861
> error: drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: patch does not apply
>
> patch -p1 < fixup2.patch
> (Stripping trailing CRs from patch; use --binary to disable.)
> patching file drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> Hunk #6 succeeded at 2252 (offset 1 line).   
> patch unexpectedly ends in middle of line
> patch unexpectedly ends in middle of line

I just do "git diff  > out.patch". Maybe not the best thing
to do then.

>
> Anyway, with this second fixup, I'm back to square one:
> I can boot but iperf3 won't hold for long
>
>
> # iperf3 -c 10.1.2.124 -p 12345 -t 600
> Connecting to host 10.1.2.124, port 12345
> [  4] local 10.1.4.59 port 38650 connected to 10.1.2.124 port 12345
> [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
> [  4]   0.00-1.00   sec  80.8 MBytes   678 Mbits/sec    1    300 KBytes
> [  4]   1.00-2.00   sec  81.1 MBytes   680 Mbits/sec    0    329 KBytes
> [  4]   2.00-3.00   sec  80.7 MBytes   677 Mbits/sec    0    335 KBytes
> [  4]   3.00-4.00   sec  81.7 MBytes   685 Mbits/sec    0    337 KBytes
> [  4]   4.00-5.00   sec  81.0 MBytes   680 Mbits/sec    0    341 KBytes
> [  4]   5.00-6.00   sec  81.0 MBytes   680 Mbits/sec    0    344 KBytes
> [  4]   6.00-7.00   sec  80.7 MBytes   677 Mbits/sec    0    345 KBytes
> [  4]   7.00-8.00   sec  81.5 MBytes   684 Mbits/sec    0    346 KBytes
> [  4]   8.00-9.00   sec  81.2 MBytes   680 Mbits/sec    0    348 KBytes
> [  4]   9.00-10.00  sec  5.59 MBytes  46.9 Mbits/sec    2   1.41 KBytes
> [  4]  10.00-11.00  sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> [  4]  11.00-12.00  sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
> [  4]  12.00-13.00  sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
> [  4]  13.00-14.00  sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes

Okay, so this is odd because I now have a similar setup to yours
and it is working perfectly fine:

---
# dmesg | grep -i stmmac
stmmaceth f0008000.ethernet: PTP uses main clock
stmmaceth f0008000.ethernet: User ID: 0x10, Synopsys ID: 0x37
stmmaceth f0008000.ethernet:DWMAC1000
stmmaceth f0008000.ethernet: DMA HW capability register supported
stmmaceth f0008000.ethernet: RX Checksum Offload Engine supported
stmmaceth f0008000.ethernet: COE Type 2
stmmaceth f0008000.ethernet: TX Checksum insertion supported
stmmaceth f0008000.ethernet: Normal descriptors
stmmaceth f0008000.ethernet: Ring mode enabled
stmmaceth f0008000.ethernet: Enable RX Mitigation via HW Watchdog Timer
libphy: stmmac: probed
stmmaceth f0008000.ethernet eth0: device MAC address 0e:67:f6:6c:59:c6
Micrel KSZ9031 Gigabit PHY stmmac-0:00: attached PHY driver [Micrel KSZ9031 Gigabit PHY] (mii_bus:phy_addr=stmmac-0:00, irq=POLL)
stmmaceth f0008000.ethernet eth0: No Safety Features support found
stmmaceth f0008000.ethernet eth0: PTP not supported by HW
stmmaceth f0008000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off
stmmaceth f0008000.ethernet eth0: Link is Down
stmmaceth f0008000.ethernet eth0: Link is Up - 1Gbps/Full - flow control off

---
# iperf3 -c 192.168.0.3 -t 600
Connecting to host 192.168.0.3, port 5201
[  4] local 192.168.0.1 port 46796 connected to 192.168.0.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.01   sec   101 MBytes   841 Mbits/sec    1    467 KBytes
[  4]   1.01-2.00   sec   112 MBytes   945 Mbits/sec    0    475 KBytes
[  4]   2.00-3.01   sec   114 MBytes   947 Mbits/sec    0    481 KBytes
[  4]   3.01-4.00   sec   112 MBytes   945 Mbits/sec    0    486 KBytes
[  4]   4.00-5.00   sec   113 MBytes   947 Mbits/sec    0    506 KBytes
[  4]   5.00-6.01   sec   113 MBytes   947 Mbits/sec    0    520 KBytes
[  4]   6.01-7.00   sec   112 MBytes   950 Mbits/sec    0    625 KBytes
[  4]   7.00-8.01   sec   114 MBytes   948 Mbits/sec    0    625 KBytes
[  4]   8.01-9.00   sec   112 MBytes   948 Mbits/sec    0    625 KBytes
[  4]   9.00-10.00  sec   114 MBytes   955 Mbits/sec    0    998 KBytes
[  4]  10.00-11.01  sec   114 MBytes   949 Mbits/sec    0    998 KBytes
[  4]  

[PATCH bpf-next] bpf/verifier: properly clear union members after a ctx read

2018-09-04 Thread Edward Cree
In check_mem_access(), for the PTR_TO_CTX case, after check_ctx_access()
 has supplied a reg_type, the other members of the register state are set
 appropriately.  Previously reg.range was set to 0, but as it is in a
 union with reg.map_ptr, which is larger, upper bytes of the latter were
 left in place.  This then caused the memcmp() in regsafe() to fail,
 preventing some branches from being pruned (and occasionally causing the
 same program to take a varying number of processed insns on repeated
 verifier runs).

Signed-off-by: Edward Cree 
---
Possibly something might need adding to __mark_reg_unknown() as well to
 clear map_ptr/range, I'm not sure (though doing so did not affect the
 processed insn count on the cilium programs).

 kernel/bpf/verifier.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index f4ff0c569e54..49e4ea66fdd3 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1640,9 +1640,9 @@ static int check_mem_access(struct bpf_verifier_env *env, int insn_idx, u32 regn
else
mark_reg_known_zero(env, regs,
value_regno);
-   regs[value_regno].id = 0;
-   regs[value_regno].off = 0;
-   regs[value_regno].range = 0;
+   /* Clear id, off, and union(map_ptr, range) */
+   memset(regs + value_regno, 0,
+  offsetof(struct bpf_reg_state, var_off));
regs[value_regno].type = reg_type;
}
 

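The failure mode is easy to reproduce outside the verifier: zeroing a 2-byte union member leaves the remaining bytes of a wider pointer member in place, so a byte-wise memcmp() against a clean state still differs, while a memset() up to a later field clears the whole union. A minimal sketch; `struct demo_reg_state` is a hypothetical, cut-down layout that only mimics the relevant part of `struct bpf_reg_state` (a `range`/`map_ptr` union ahead of `var_off`), and it needs C11 for the anonymous union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register state: a small 'range' shares storage with a
 * pointer-sized 'map_ptr', and var_off is the first field left alone. */
struct demo_reg_state {
	int type;
	union {
		uint16_t range;
		void *map_ptr;
	};
	int32_t off;
	uint32_t id;
	uint64_t var_off;
};

/* Old approach: clear members one by one; only 2 bytes of the union. */
static void demo_clear_fields(struct demo_reg_state *r, int new_type)
{
	r->id = 0;
	r->off = 0;
	r->range = 0;
	r->type = new_type;
}

/* New approach: zero everything before var_off, then set the type. */
static void demo_clear_memset(struct demo_reg_state *r, int new_type)
{
	memset(r, 0, offsetof(struct demo_reg_state, var_off));
	r->type = new_type;
}

/* Byte-wise comparison of the cleared prefix, as a memcmp-based check would. */
static int demo_prefix_equal(const struct demo_reg_state *a,
			     const struct demo_reg_state *b)
{
	return memcmp(a, b, offsetof(struct demo_reg_state, var_off)) == 0;
}
```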

Re: Why not use all the syn queues? in the function "tcp_conn_request", I have some questions.

2018-09-04 Thread Neal Cardwell
On Tue, Sep 4, 2018 at 1:48 AM Ttttabcd  wrote:
>
> Hello everyone, recently I have been looking at the source code for handling the TCP
> three-way handshake (Linux kernel version 4.18.5).
>
> I found some strange places in the source code for handling syn messages.
>
> in the function "tcp_conn_request"
>
> This code will be executed when we don't enable the syn cookies.
>
> if (!net->ipv4.sysctl_tcp_syncookies &&
> (net->ipv4.sysctl_max_syn_backlog - 
> inet_csk_reqsk_queue_len(sk) <
>  (net->ipv4.sysctl_max_syn_backlog >> 2)) &&
> !tcp_peer_is_proven(req, dst)) {
> /* Without syncookies last quarter of
>  * backlog is filled with destinations,
>  * proven to be alive.
>  * It means that we continue to communicate
>  * to destinations, already remembered
>  * to the moment of synflood.
>  */
> pr_drop_req(req, ntohs(tcp_hdr(skb)->source),
> rsk_ops->family);
> goto drop_and_release;
> }
>
> But why don't we use all the syn queues?

If tcp_peer_is_proven() returns true then we do allow ourselves to use
the whole queue.

> Why do we need to leave the size of (net->ipv4.sysctl_max_syn_backlog >> 2) in the queue?
>
> Even if the system is attacked by a syn flood, there is no need to leave a part. Why do we need to leave a part?

The comment describes the rationale. If syncookies are disabled, then
the last quarter of the backlog is reserved for filling with
destinations that were proven to be alive, according to
tcp_peer_is_proven() (which uses RTTs measured in previous
connections). The idea is that if there is a SYN flood, we do not want
to use all of our queue budget on attack traffic but instead want to
reserve some queue space for SYNs from real remote machines that we
have actually contacted in the past.
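Plugging numbers into the quoted condition makes the reservation concrete: with `max_syn_backlog = 128`, a SYN from an unproven peer is dropped once fewer than 128 >> 2 = 32 slots remain, that is, once 97 or more requests are queued. A sketch of just that early-drop test; `demo_drop_unproven_syn()` is a hypothetical helper, and the separate queue-full and syncookie paths in tcp_conn_request() are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Early-drop check from tcp_conn_request(), as quoted in the thread.
 * Returns true when this check alone would drop the SYN. */
static bool demo_drop_unproven_syn(int syncookies, int max_syn_backlog,
				   int qlen, bool peer_proven)
{
	if (syncookies)
		return false; /* this particular check only applies without syncookies */
	return (max_syn_backlog - qlen) < (max_syn_backlog >> 2) && !peer_proven;
}
```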

> The value of sysctl_max_syn_backlog is the maximum length of the queue only if syn cookies are enabled.

Even if syncookies are disabled, sysctl_max_syn_backlog is the maximum
length of the queue.

> This is the first strange place, here is another strange place
>
> __u32 isn = TCP_SKB_CB(skb)->tcp_tw_isn;
>
> if ((net->ipv4.sysctl_tcp_syncookies == 2 ||
>  inet_csk_reqsk_queue_is_full(sk)) && !isn) {
>
> if (!want_cookie && !isn) {
>
> The value of "isn" comes from TCP_SKB_CB(skb)->tcp_tw_isn, then it is judged twice whether its value is indeed 0.
>
> But "tcp_tw_isn" is initialized in the function "tcp_v4_fill_cb"
>
> TCP_SKB_CB(skb)->tcp_tw_isn = 0;
>
> So it has always been 0, I used printk to test, and the result is always 0.

That field is also set in tcp_timewait_state_process():

  TCP_SKB_CB(skb)->tcp_tw_isn = isn;

So there can be cases where it is not 0.

Hope that helps,
neal


Re: [PATCH net-next 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-04 Thread Jerome Brunet
On Tue, 2018-09-04 at 10:57 +0100, Jose Abreu wrote:
> Hi Jerome,
> 
> On 03-09-2018 17:22, Jerome Brunet wrote:
> > 
> > Situation is even worse with this.
> > I'm using an NFS root filesystem. With your fixup, I'm not reaching the prompt
> > anymore. Looks like the same kind of network breakdown we had previously.
> > 
> 
> I was able to reproduce your problem and the attached fixup patch
> fixed it up for me. Can you please try?

I suppose this applies on top of the initial patch, not the previous fixup
(judging from the rejection). Could you detail the baseline for each
patch you send? It's not easy to follow.

BTW, there is something weird (at least for me) with the patches you attach.
git always refuses to apply them and even patch complains:

git apply fixup2.patch
error: patch failed: drivers/net/ethernet/stmicro/stmmac/stmmac_main.c:1861
error: drivers/net/ethernet/stmicro/stmmac/stmmac_main.c: patch does not apply

patch -p1 < fixup2.patch
(Stripping trailing CRs from patch; use --binary to disable.)
patching file drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
Hunk #6 succeeded at 2252 (offset 1 line).   
patch unexpectedly ends in middle of line
patch unexpectedly ends in middle of line

Anyway, with this second fixup, I'm back to square one:
I can boot but iperf3 won't hold for long


# iperf3 -c 10.1.2.124 -p 12345 -t 600
Connecting to host 10.1.2.124, port 12345
[  4] local 10.1.4.59 port 38650 connected to 10.1.2.124 port 12345
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  80.8 MBytes   678 Mbits/sec    1    300 KBytes
[  4]   1.00-2.00   sec  81.1 MBytes   680 Mbits/sec    0    329 KBytes
[  4]   2.00-3.00   sec  80.7 MBytes   677 Mbits/sec    0    335 KBytes
[  4]   3.00-4.00   sec  81.7 MBytes   685 Mbits/sec    0    337 KBytes
[  4]   4.00-5.00   sec  81.0 MBytes   680 Mbits/sec    0    341 KBytes
[  4]   5.00-6.00   sec  81.0 MBytes   680 Mbits/sec    0    344 KBytes
[  4]   6.00-7.00   sec  80.7 MBytes   677 Mbits/sec    0    345 KBytes
[  4]   7.00-8.00   sec  81.5 MBytes   684 Mbits/sec    0    346 KBytes
[  4]   8.00-9.00   sec  81.2 MBytes   680 Mbits/sec    0    348 KBytes
[  4]   9.00-10.00  sec  5.59 MBytes  46.9 Mbits/sec    2   1.41 KBytes
[  4]  10.00-11.00  sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
[  4]  11.00-12.00  sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes
[  4]  12.00-13.00  sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes
[  4]  13.00-14.00  sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes


> 
> Thanks and Best Regards,
> Jose Miguel Abreu




Re: [PATCH mlx5-next] net/mlx5: Add memic command opcode to command checker

2018-09-04 Thread Leon Romanovsky
On Mon, Sep 03, 2018 at 08:19:48PM +0300, Leon Romanovsky wrote:
> From: Ariel Levkovich 
>
> Adding the alloc/dealloc memic FW command opcodes to
> avoid "unknown command" prints in the command string
> converter and internal error status handler.
>
> Signed-off-by: Ariel Levkovich 
> Signed-off-by: Leon Romanovsky 
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 4 
>  1 file changed, 4 insertions(+)
>

Applied to mlx5-next: 09adbb5dd01b net/mlx5: Add memic command opcode to command checker

Thanks




Re: [PATCH mlx5-next] net/mlx5: Fix atomic_mode enum values

2018-09-04 Thread Leon Romanovsky
On Mon, Sep 03, 2018 at 08:19:28PM +0300, Leon Romanovsky wrote:
> From: Moni Shoua 
>
> The field atomic_mode is 4 bits wide and therefore can hold values
> from 0x0 to 0xf. Remove the unnecessary 20-bit shift that made the values
> incorrect. While at it, remove unused enum values.
>
> Fixes: 57cda166bbe0 ("net/mlx5: Add DCT command interface")
> Signed-off-by: Moni Shoua 
> Reviewed-by: Artemy Kovalyov 
> Signed-off-by: Leon Romanovsky 
> ---
>  include/linux/mlx5/driver.h | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
>

Applied to mlx5-next: aa7e80b220f3 net/mlx5: Fix atomic_mode enum values

Thanks




Re: [PATCH RFC net-next] net: Poptrie based routing table lookup

2018-09-04 Thread Jesper Dangaard Brouer
Hi Md. Islam,

People will start to ignore you, when you don't interact appropriately
with the community, and you ignore their advice, especially when it is
about how to interact with the community[1].

You have not addressed any of my feedback on your patch in [1].
 [1] 
http://www.mail-archive.com/search?l=mid=20180827173334.16ff0...@redhat.com

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

p.s. also top-posting is bad, but I suspect you will not read my
response if I don't top-post.


On Tue, 4 Sep 2018 01:02:30 -0400 "Md. Islam"  wrote:

> This patch implements Poptrie based routing table
> lookup/insert/delete/flush. Currently many carrier routers use kernel
> bypass frameworks such as DPDK and VPP to implement the data plane.
> XDP along with this patch will enable Linux to work as such a router.
> Currently it supports up to 255 ports. Many real-world backbone routers
> have up to 233 ports (to the best of my knowledge), so it seems to be
> sufficient at this moment.
> 
> I have also attached a draft paper explaining how it works (poptrie.pdf).
> Please set CONFIG_FIB_POPTRIE=y (default n) before testing the patch.
> Note that poptrie_lookup() is not being called from anywhere. It will
> be used by XDP forwarding.
> 
> 
> From 3dc9683298ed896dd3080733503c35d68f05370e Mon Sep 17 00:00:00 2001
> From: tamimcse 
> Date: Mon, 3 Sep 2018 23:56:43 -0400
> Subject: [PATCH] Poptrie based routing table lookup
> 
> Signed-off-by: tamimcse 
> ---
>  include/net/ip_fib.h   |  42 +
>  net/ipv4/Kconfig       |   4 +
>  net/ipv4/Makefile      |   1 +
>  net/ipv4/fib_poptrie.c | 483 +
>  net/ipv4/fib_trie.c    |  12 ++
>  5 files changed, 542 insertions(+)
>  create mode 100644 net/ipv4/fib_poptrie.c
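For readers unfamiliar with Poptrie, its central idea comes from the design by Asai and Ohara (SIGCOMM 2015); the code here is not taken from this patch. Each node stores only the children that exist and locates one with a population count over a 64-bit bitmap. The following is a simplified sketch of that indexing step only. It assumes a 6-bit stride and an invented `struct pt_node`, and it does not model leaves or the direct-pointing top level. `__builtin_popcountll()` is the GCC/Clang builtin; in the kernel, `hweight64()` would play that role.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical internal node: each node consumes a 6-bit chunk of the
 * key, so it has up to 64 children. Existing children are stored
 * contiguously in a shared array starting at `base`; bit v of `vector`
 * is set when the child for chunk value v exists.
 */
struct pt_node {
	uint64_t vector;
	uint32_t base;
};

/* Extract the 6-bit chunk of a 32-bit IPv4 key at bit offset `off` from the MSB. */
static unsigned int pt_chunk(uint32_t key, unsigned int off)
{
	assert(off <= 26);
	return (key >> (26 - off)) & 0x3f;
}

/*
 * Return the array slot of the child for chunk value v, or -1 if absent.
 * popcount(vector & bits 0..v) counts the existing children with chunk
 * values <= v; subtracting one gives the 0-based offset from `base`.
 */
static long pt_child_index(const struct pt_node *n, unsigned int v)
{
	uint64_t upto;

	assert(v < 64);
	upto = (v == 63) ? ~0ULL : ((1ULL << (v + 1)) - 1);
	if (!(n->vector & (1ULL << v)))
		return -1;
	return (long)n->base + __builtin_popcountll(n->vector & upto) - 1;
}
```

With children only for chunk values 0, 5 and 10, those children occupy slots base, base+1 and base+2. The node therefore needs a bitmap and a base index instead of 64 child pointers.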

First order of business: you need to conform to the kernel's coding
standards!

https://www.kernel.org/doc/html/v4.18/process/coding-style.html

There is a script available to check this: scripts/checkpatch.pl
Its summary says:
 total: 139 errors, 238 warnings, 6 checks, 372 lines checked
(Not good: more errors+warnings than lines...)

Please fix those up... else people will not even read your code!



Re: phys_port_id in switchdev mode?

2018-09-04 Thread Jakub Kicinski
On Mon, 3 Sep 2018 12:40:22 +0300, Or Gerlitz wrote:
> On Tue, Aug 28, 2018 at 9:05 PM, Jakub Kicinski wrote:
> > Hi!  
> 
> Hi Jakub and sorry for the late reply, this crazily hot summer refuses to
> die.
> 
> Note I replied a couple of minutes ago but it didn't get to the list, so
> let's take it from this one:
>
> > I wonder if we can use phys_port_id in switchdev to group together
> > interfaces of a single PCI PF?  Here is the problem:
> >
> > With a mix of PF and VF interfaces it gets increasingly difficult to
> > figure out which one corresponds to which PF.  We can identify which
> > *representor* is which, by means of phys_port_name and devlink
> > flavours.  But if the actual VF/PF interfaces are also present on the
> > same host, it gets confusing when one tries to identify the PF they
> > came from.  Generally one has to resort to matching between PCI DBDF of
> > the PF and VFs or read relevant info out of ethtool -i.
> >
> > In multi host scenario this is particularly painful, as there seems to
> > be no immediately obvious way to match PCI interface ID of a card (0,
> > 1, 2, 3, 4...) to the DBDF we have connected.
> >
> > Another angle to this is legacy SR-IOV NDOs.  User space picks a netdev
> > from /sys/bus/pci/$VF_DBDF/physfn/net/ to run the NDOs on in a somewhat
> > random manner, which means we have to provide those for all devices with
> > link to the PF (all reprs).  And we have to link them (a) because it's
> > right (tm) and (b) to get correct naming.  
> 
> wait, as you commented later, not only the mellanox vf reprs but rather also
> the nfp vf reprs are not linked to the PF, because ip link output
> grows quadratically.

Right, correct.  If we set phys_port_id libvirt will reliably pick the
correct netdev to run NDOs on (PF/PF repr) so we can remove them from
the other netdevs and therefore limit the size of ip link show output.

> > The only reliable way to make
> > user space (libvirt) choose the repr it should run the NDOs on (which is
> > IMHO the corresponding PF repr) is to set phys_port_id on actual VFs,
> > VF reprs, PFs and PF reprs to a value corresponding to the *PCI PF*,
> > not the external/Ethernet port when in switchdev mode.  User space
> > should understand phys_port_id in this context, given it was originally
> > introduced for matching VFs to ports.  
> 
> Using phys_port_id to match/group VFs to PFs makes sense to me.
> 
> So what would be the libvirt use case you envision that needs
> the VF and PF reprs to support that as well? or maybe you were
> not referring to libvirt but to some other provisioning element? I need
> to refresh my memory on that area.

Ugh, you're right!  Libvirt is our primary target here.  IIUC we need
phys_port_id on the actual VF and then *a* netdev linked to physfn in
sysfs which will have the legacy NDOs.

We can't set the phys_port_id on the VF reprs because then we're back
to the problem of ip link output growing.  Perhaps we shouldn't set it
on PF repr either?

Let's make a table (assuming bare metal cloud scenario where Host0 is
controlling the network, while Host1 is the actual server):

[act - actual; rpr - representor; SN - serial number]

Today:

  dev       | host | sysfs | phys_-  | switch- | phys_-    | NDOs
            |      | link  | port_id | dev_id  | port_name |
------------+------+-------+---------+---------+-----------+------
uplink      |   0  |  PF0  |    -    | ASIC SN | p0        | PF0
act PF0     |   0  |  PF0  |    -    |    -    | -         |  -
act VF0/0   |   0  | VF0/0 |    -    |    -    | -         |  -
rpr PF0     |   0  |   -   |    -    | ASIC SN | pf0       |  -
rpr VF0/0   |   0  |   -   |    -    | ASIC SN | pf0vf0    |  -
act PF1     |   1  |  PF1  |    -    |    -    | -         | PF1
act VF1/0   |   1  | VF1/0 |    -    |    -    | -         |  -
rpr PF1     |   0  |   -   |    -    | ASIC SN | pf1       |  -
rpr VF1/0   |   0  |   -   |    -    | ASIC SN | pf1vf0    |  -

Proposed:

  dev       | host | sysfs | phys_-  | switch- | phys_-    | NDOs
            |      | link  | port_id | dev_id  | port_name |
------------+------+-------+---------+---------+-----------+------
uplink      |   0  |  PF0  |    -    | ASIC SN | p0        |  -
act PF0     |   0  |  PF0  | PF0 SN  |    -    | -         | PF0
act VF0/0   |   0  | VF0/0 | PF0 SN  |    -    | -         |  -
rpr PF0     |   0  |  PF0  |    -    | ASIC SN | pf0       |  -
rpr VF0/0   |   0  |  PF0  |    -    | ASIC SN | pf0vf0    |  -
act PF1     |   1  |  PF1  | PF1 SN  |    -    | -         | PF1
act VF1/0   |   1  | VF1/0 | PF1 SN  |    -    | -         |  -
rpr PF1     |   0  |  PF0  |    -    | ASIC SN | pf1       |  -
rpr VF1/0   |   0  |  PF0  |    -    | ASIC SN | pf1vf0    |  -

With this libvirt on Host0 should easily find the actual PF0 netdev to
run the NDO on, if it wants to use VFs:
 - libvirt finds act VF0/0 to plug into the VF;
 - reads its phys_port_id -> "PF0 SN";
 - finds netdev with "PF0 SN" linked to physfn -> "act PF0";
 - runs NDOs on "act PF0" for PF0's VF 
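The lookup steps above can be modelled as a small C sketch over the "Proposed" table. The struct and function names are invented for illustration. Real user space such as libvirt would read phys_port_id and the physfn/net/ directory from sysfs rather than from an array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one row of the table (NULL means "-"). */
struct model_netdev {
	const char *name;
	const char *sysfs_link;   /* PCI function the netdev is linked to */
	const char *phys_port_id; /* e.g. "PF0 SN" */
};

/*
 * Given a VF netdev and the PF its physfn link points to, return the
 * netdev linked to that PF whose phys_port_id matches the VF's, i.e.
 * the device to run the legacy SR-IOV NDOs on, or NULL if none.
 */
static const struct model_netdev *
find_ndo_netdev(const struct model_netdev *devs, size_t n,
		const struct model_netdev *vf, const char *physfn)
{
	size_t i;

	assert(devs != NULL || n == 0);
	if (!vf->phys_port_id)
		return NULL;
	for (i = 0; i < n; i++) {
		const struct model_netdev *d = &devs[i];

		if (d == vf || !d->sysfs_link || !d->phys_port_id)
			continue;
		if (strcmp(d->sysfs_link, physfn) == 0 &&
		    strcmp(d->phys_port_id, vf->phys_port_id) == 0)
			return d;
	}
	return NULL;
}
```

The representors and the uplink share the PF0 sysfs link but carry no phys_port_id in the proposal, so only the actual PF can match.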

Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-04 Thread Jose Abreu



On 04-09-2018 10:16, Jerome Brunet wrote:
> On Mon, 2018-09-03 at 16:47 +0100, Jose Abreu wrote:
>> On 03-09-2018 16:38, Jerome Brunet wrote:
>>> On Mon, 2018-09-03 at 16:22 +0100, Jose Abreu wrote:
>>>> On 03-09-2018 15:10, Jerome Brunet wrote:
>>>>> On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
>>>>>> On 03-09-2018 11:16, Jerome Brunet wrote:
>>>>>>> No notable change. Rx is fine but Tx:
>>>>>>> [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
>>>>>>>
>>>>>>> I suppose the problem has something to do with the retries. When
>>>>>>> doing the Tx test alone, we don't have such a thing: throughput is
>>>>>>> where we expect it to be.
>>>>>> Yeah, I just remembered you are not using GMAC4 so it wouldn't
>>>>>> make a difference. Is your version 3.710? If so please try adding
>>>>>> the following compatible to your DT bindings "snps,dwmac-3.710".
>>>>> According to the documentation, it is a 3.70a but I learned (the hard
>>>>> way) not to trust the documentation too much. Is there any way to make
>>>>> sure which version we have, like a register to read?
>>>> It should be dumped at probe by a string like this one:
>>>>
>>>> "User ID: 0xXY, Synopsys ID: 0xXZ"
>>> User ID: 0x11, Synopsys ID: 0x37? What does it map to?
>> It's 3.7. As for the User ID, this can be changed by the final HW team
>> so I can't confirm what it means.
>>
> Is there any way to know if it is a 3.70a or 3.71?

If the user ID wasn't changed from the default then it's 3.71.

>
>>>>> Out of curiosity, I changed the compatible to "snps,dwmac-3.710"
>>>>> anyway. For some reason, the MDIO bus failed to register with this.
>>>>> Since it is not the documented version, I did not check why.
>>>> No, you can't change it. You need to add it. So it should stay like this:
>>>>
>>>> compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac",
>>>> "snps,dwmac-3.710";
>>> Adding "snps,dwmac-3.710" does not change anything for me.
>>> Having both Tx and Rx at the same time still wrecks Tx throughput,
>>> unfortunately.
>> Okay, so you said that there are lots of retries: can you disable
>> COE entirely? (it should be something like: ethtool -K eth0 rx off
>> tx off).
> Done but no change.

Ok. Are you able to analyze the sent/received packets using pcap so
that we can understand why there are so many retries?

Thanks and Best Regards,
Jose Miguel Abreu

>
>> Thanks and Best Regards,
>> Jose Miguel Abreu
>>
>>>> Thanks and Best Regards,
>>>> Jose Miguel Abreu
>>>>
>>>>>>> By the way, your mailer (and its auto 80 column rule I suppose) made
>>>>>>> the patch below a bit harder to apply
>>>>>> Sorry. Next time I will send as attachment.
>>>>> No worries
>>>>>
>>>>>> Thanks and Best Regards,
>>>>>> Jose Miguel Abreu
>>
>



Re: [PATCH net-next 1/2] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-04 Thread Jose Abreu
Hi Jerome,

On 03-09-2018 17:22, Jerome Brunet wrote:
>
> Situation is even worse with this.
> I'm using an NFS root filesystem. With your fixup, I'm not reaching the prompt
> anymore. Looks like the same kind of network breakdown we had previously
>

I was able to reproduce your problem and the attached fixup patch
fixed it up for me. Can you please try?

Thanks and Best Regards,
Jose Miguel Abreu
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 14f890f2a970..2cf927ccb409 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1861,14 +1861,14 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int limit, u32 queue)
 {
 	struct stmmac_tx_queue *tx_q = &priv->tx_queue[queue];
 	unsigned int bytes_compl = 0, pkts_compl = 0;
-	unsigned int entry, count = 0;
+	unsigned int entry;
 
 	netif_tx_lock(priv->dev);
 
 	priv->xstats.tx_clean++;
 
 	entry = tx_q->dirty_tx;
-	while ((entry != tx_q->cur_tx) && (count < limit)) {
+	while ((entry != tx_q->cur_tx) && (pkts_compl < limit)) {
 		struct sk_buff *skb = tx_q->tx_skbuff[entry];
 		struct dma_desc *p;
 		int status;
@@ -1884,8 +1884,6 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int limit, u32 queue)
 		if (unlikely(status & tx_dma_own))
 			break;
 
-		count++;
-
 		/* Make sure descriptor fields are read after reading
 		 * the own bit.
 		 */
@@ -1955,7 +1953,7 @@ static int stmmac_tx_clean(struct stmmac_priv *priv, int limit, u32 queue)
 	}
 	netif_tx_unlock(priv->dev);
 
-	return count;
+	return pkts_compl;
 }
 
 /**
@@ -2072,8 +2070,11 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
 		if (likely(status[chan] & handle_rx)) {
 			struct stmmac_rx_queue *rx_q = &priv->rx_queue[chan];
 
-			if (likely(napi_schedule_prep(&rx_q->napi)))
+			if (likely(napi_schedule_prep(&rx_q->napi))) {
+				stmmac_disable_dma_irq(priv, priv->ioaddr,
+						       rx_q->queue_index);
 				__napi_schedule(&rx_q->napi);
+			}
 		}
 	}
 
@@ -2085,8 +2086,11 @@ static void stmmac_dma_interrupt(struct stmmac_priv *priv)
 		if (status[chan] & handle_tx) {
 			struct stmmac_tx_queue *tx_q = &priv->tx_queue[chan];
 
-			if (likely(napi_schedule_prep(&tx_q->napi)))
+			if (likely(napi_schedule_prep(&tx_q->napi))) {
+				stmmac_disable_dma_irq(priv, priv->ioaddr,
+						       tx_q->queue_index);
 				__napi_schedule(&tx_q->napi);
+			}
 		}
 	}
 
@@ -2247,11 +2251,7 @@ static void stmmac_tx_timer(struct timer_list *t)
 	struct stmmac_tx_queue *tx_q = from_timer(tx_q, t, txtimer);
 	struct stmmac_priv *priv = tx_q->priv_data;
 
-	if (napi_schedule_prep(&tx_q->napi)) {
-		stmmac_disable_dma_irq(priv, priv->ioaddr, tx_q->queue_index);
-		__napi_schedule(&tx_q->napi);
-	}
-
+	stmmac_tx_clean(priv, ~0, tx_q->queue_index);
 	tx_q->tx_timer_active = 0;
 }
 
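Jose's fixup changes three things:

- It masks the channel's DMA interrupt in stmmac_dma_interrupt() before handing work to NAPI.
- It has the TX coalesce timer call stmmac_tx_clean() directly.
- It makes stmmac_tx_clean() bound its loop by, and return, the number of completed packets.

The userspace model below sketches the usual NAPI discipline this relies on. Every name in it (`struct model_queue`, `model_schedule_prep()`, `model_irq()`, `model_poll()`) is invented and is not a kernel API. The unmask in `model_poll()` assumes the driver's poll routine re-enables the interrupt only once it finishes under budget.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-queue state; not the stmmac or NAPI structures. */
struct model_queue {
	bool irq_enabled;    /* per-channel DMA interrupt unmasked */
	bool napi_scheduled; /* a poll is pending or running */
	int pending;         /* completed descriptors waiting to be cleaned */
};

/* Analogue of napi_schedule_prep(): succeeds only if not already scheduled. */
static bool model_schedule_prep(struct model_queue *q)
{
	if (q->napi_scheduled)
		return false;
	q->napi_scheduled = true;
	return true;
}

/* Interrupt handler: mask the queue's IRQ before handing work to polling. */
static void model_irq(struct model_queue *q)
{
	if (!q->irq_enabled)
		return;
	if (model_schedule_prep(q))
		q->irq_enabled = false;
}

/* Poll: clean up to `budget` entries; complete and unmask only under budget. */
static int model_poll(struct model_queue *q, int budget)
{
	int done;

	assert(budget > 0);
	done = q->pending < budget ? q->pending : budget;
	q->pending -= done;
	if (done < budget) {
		q->napi_scheduled = false;
		q->irq_enabled = true;
	}
	return done;
}
```

Masking before scheduling keeps the hardware from raising further interrupts for a queue that is already being polled. The poll side is then the only place that unmasks it.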


Re: [Patch net] tipc: fix a missing rhashtable_walk_exit()

2018-09-04 Thread Ying Xue
On 08/24/2018 07:19 AM, Cong Wang wrote:
> rhashtable_walk_exit() must be paired with rhashtable_walk_enter().
> 
> Fixes: 40f9f4397060 ("tipc: Fix tipc_sk_reinit race conditions")
> Cc: Herbert Xu 
> Cc: Ying Xue 
> Signed-off-by: Cong Wang 

Acked-by: Ying Xue 

> ---
>  net/tipc/socket.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/tipc/socket.c b/net/tipc/socket.c
> index c1e93c9515bc..c9a50b62c738 100644
> --- a/net/tipc/socket.c
> +++ b/net/tipc/socket.c
> @@ -2672,6 +2672,8 @@ void tipc_sk_reinit(struct net *net)
>  
>   rhashtable_walk_stop(&iter);
>   } while (tsk == ERR_PTR(-EAGAIN));
> +
> + rhashtable_walk_exit(&iter);
>  }
>  
>  static struct tipc_sock *tipc_sk_lookup(struct net *net, u32 portid)
> 
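The fix restores the pairing. rhashtable_walk_enter() registers the iterator with the table, and rhashtable_walk_exit() must undo that. rhashtable_walk_start() and rhashtable_walk_stop() only bracket each pass, and that pass may repeat when a resize makes the walk return -EAGAIN. A userspace model of that lifecycle follows. The `walk_*` functions and the `live_walkers` counter are invented for illustration and are not the rhashtable API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical walker mimicking the enter/start/stop/exit lifecycle. */
struct walker {
	bool entered;
	bool started;
	int retries_left; /* simulated resizes that force a restart */
};

static int live_walkers; /* walkers entered but not yet exited */

static void walk_enter(struct walker *w, int simulated_resizes)
{
	w->entered = true;
	w->started = false;
	w->retries_left = simulated_resizes;
	live_walkers++;
}

static void walk_start(struct walker *w) { assert(w->entered); w->started = true; }
static void walk_stop(struct walker *w) { w->started = false; }

static void walk_exit(struct walker *w)
{
	assert(w->entered && !w->started);
	w->entered = false;
	live_walkers--;
}

/* One pass over the table; true means "restart" (models ERR_PTR(-EAGAIN)). */
static bool walk_pass(struct walker *w)
{
	if (w->retries_left > 0) {
		w->retries_left--;
		return true;
	}
	return false;
}

/* Same shape as tipc_sk_reinit(): exit once, after the retry loop. */
static void reinit_all(int simulated_resizes)
{
	struct walker w;
	bool again;

	walk_enter(&w, simulated_resizes);
	do {
		walk_start(&w);
		again = walk_pass(&w);
		walk_stop(&w);
	} while (again);
	walk_exit(&w); /* the call the patch adds: pairs with walk_enter() */
}
```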


Re: [RFT net-next] net: stmmac: Rework coalesce timer and fix multi-queue races

2018-09-04 Thread Jerome Brunet
On Mon, 2018-09-03 at 16:47 +0100, Jose Abreu wrote:
> On 03-09-2018 16:38, Jerome Brunet wrote:
> > On Mon, 2018-09-03 at 16:22 +0100, Jose Abreu wrote:
> > > On 03-09-2018 15:10, Jerome Brunet wrote:
> > > > On Mon, 2018-09-03 at 12:47 +0100, Jose Abreu wrote:
> > > > > On 03-09-2018 11:16, Jerome Brunet wrote:
> > > > > > No notable change. Rx is fine but Tx:
> > > > > > [  5]   3.00-4.00   sec  3.55 MBytes  29.8 Mbits/sec   51   12.7 KBytes
> > > > > > 
> > > > > > I suppose the problem has something to do with the retries. When
> > > > > > doing the Tx test alone, we don't have such a thing: throughput is
> > > > > > where we expect it to be.
> > > > > 
> > > > > Yeah, I just remembered you are not using GMAC4 so it wouldn't
> > > > > make a difference. Is your version 3.710? If so please try adding
> > > > > the following compatible to your DT bindings "snps,dwmac-3.710".
> > > > 
> > > > According to the documentation, it is a 3.70a but I learned (the hard
> > > > way) not to trust the documentation too much. Is there any way to make
> > > > sure which version we have, like a register to read?
> > > 
> > > It should be dumped at probe by a string like this one:
> > > 
> > > "User ID: 0xXY, Synopsys ID: 0xXZ"
> > 
> > User ID: 0x11, Synopsys ID: 0x37? What does it map to?
> 
> It's 3.7. As for the User ID, this can be changed by the final HW team
> so I can't confirm what it means.
> 

Is there any way to know if it is a 3.70a or 3.71?

> > 
> > > > Out of curiosity, I changed the compatible to "snps,dwmac-3.710"
> > > > anyway. For some reason, the MDIO bus failed to register with this.
> > > > Since it is not the documented version, I did not check why.
> > > 
> > > No, you can't change it. You need to add it. So it should stay like this:
> > > 
> > > compatible = "amlogic,meson-gxbb-dwmac", "snps,dwmac",
> > > "snps,dwmac-3.710";
> > 
> > Adding "snps,dwmac-3.710" does not change anything for me.
> > Having both Tx and Rx at the same time still wrecks Tx throughput,
> > unfortunately.
> 
> Okay, so you said that there are lots of retries: can you disable
> COE entirely? (it should be something like: ethtool -K eth0 rx off
> tx off).

Done but no change.

> 
> Thanks and Best Regards,
> Jose Miguel Abreu
> 
> > 
> > > Thanks and Best Regards,
> > > Jose Miguel Abreu
> > > 
> > > > > > By the way, your mailer (and its auto 80 column rule I suppose) made
> > > > > > the patch below a bit harder to apply
> > > > > 
> > > > > Sorry. Next time I will send as attachment.
> > > > 
> > > > No worries
> > > > 
> > > > > Thanks and Best Regards,
> > > > > Jose Miguel Abreu
> 
> 




Re: [PATCH net-next v2 1/2] netlink: ipv4 igmp join notifications

2018-09-04 Thread Patrick Ruddy
On Mon, 2018-09-03 at 16:12 -0700, Roopa Prabhu wrote:
> On Sun, Sep 2, 2018 at 4:18 AM, Patrick Ruddy
>  wrote:
> > Hi Roopa
> > 
> > inline
> > 
> > thx
> > 
> > -pr
> > 
> > On Fri, 2018-08-31 at 09:29 -0700, Roopa Prabhu wrote:
> > > On Fri, Aug 31, 2018 at 4:20 AM, Patrick Ruddy
> > >  wrote:
> > > > Some userspace applications need to know about IGMP joins from the
> > > > kernel for 2 reasons:
> > > > 1. To allow the programming of multicast MAC filters in hardware
> > > > 2. To form a multicast FORUS list for non link-local multicast
> > > >    groups to be sent to the kernel and from there to the interested
> > > >    party.
> > > > (1) can be fulfilled by simply sending the hardware multicast MAC
> > > > address to be programmed but (2) requires the L3 address to be sent
> > > > since this cannot be constructed from the MAC address whereas the
> > > > reverse translation is a standard library function.
> > > > 
> > > > This commit provides addition and deletion of multicast addresses
> > > > using the RTM_NEWADDR and RTM_DELADDR messages. It also provides
> > > > the RTM_GETADDR extension to allow multicast join state to be read
> > > > from the kernel.
> > > > 
> > > > Signed-off-by: Patrick Ruddy 
> > > > ---
> > > > v2: fix kbuild warnings.
> > > 
> > > I am still going through the series, but AFAICT, user-space caches
> > > listening to RTNLGRP_IPV4_IFADDR will now also get multicast addresses
> > > by default?
> > > 
> > 
> > Yes that's the crux of this change. It's unfortunate that I could not
> > use IFA_MULTICAST to distinguish the SAFI. I suppose the other option
> > would be to create a set of new NEW/DEL/GETMULTICAST messages but the
> > partial code for RTM_GETMULTICAST in ipv6/mcast.c complicates that
> > slightly. Happy to look at it if you think that would be better.
> > 
> 
> yeah, true. Thinking about this some more, you are adding an interface
> for multicast entries learnt via igmp.
> There is already a netlink channel for layer2 mc addresses via igmp. I
> can't see why that cannot be used.
> It is RTM_*MDB msgs. It is currently only available for the bridge.
> But, I have a requirement for it to be
> available via a vxlan dev...so, I am looking at making it available on
> other devices.
>  
> The reason I think it should be possible is because this is similar to
> bridge fdb entries.
> The bridge fdb api  (RTM_NEWNEIGH with AF_BRIDGE) is overloaded to
> notify and dump netdev unicast addresses.
> similarly I think the mdb api can be overloaded to notify and dump
> netdev multicast addresses (statically added or learnt via igmp)
OK I'll take a look at this.
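Point (2) in Patrick's commit message depends on the IPv4-to-Ethernet multicast mapping (RFC 1112) being lossy. Only the low 23 bits of the group survive, so 32 groups share each MAC and the group cannot be recovered from the MAC. The kernel's own helper for this is `ip_eth_mc_map()`. The function below is a stand-alone sketch that takes the group in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Map an IPv4 multicast group (host byte order) to its Ethernet MAC:
 * 01:00:5e followed by the low 23 bits of the group address. */
static void ipv4_mcast_to_mac(uint32_t group, uint8_t mac[6])
{
	assert((group >> 28) == 0xe); /* must be in 224.0.0.0/4 */
	mac[0] = 0x01;
	mac[1] = 0x00;
	mac[2] = 0x5e;
	mac[3] = (group >> 16) & 0x7f; /* top bit of this byte is dropped */
	mac[4] = (group >> 8) & 0xff;
	mac[5] = group & 0xff;
}
```

For example, 224.0.0.1 and 224.128.0.1 both map to 01:00:5e:00:00:01. That is why a FORUS list built from MAC notifications alone would be ambiguous.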


Re: [PATCH v2 net-next] failover: Add missing check to validate 'slave_dev' in net_failover_slave_unregister

2018-09-04 Thread Liran Alon


- yuehaib...@huawei.com wrote:

> Fixes gcc '-Wunused-but-set-variable' warning:
> 
> drivers/net/net_failover.c: In function
> 'net_failover_slave_unregister':
> drivers/net/net_failover.c:598:35: warning:
>  variable 'primary_dev' set but not used [-Wunused-but-set-variable]
> 
> It should check the validity of 'slave_dev'.
> 
> Fixes: cfc80d9a1163 ("net: Introduce net_failover driver")
> 
> Signed-off-by: YueHaibing 
> ---
> v2: use WARN_ON_ONCE as Liran Alon suggested
> ---
>  drivers/net/net_failover.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/net_failover.c b/drivers/net/net_failover.c
> index 7ae1856..5a749dc 100644
> --- a/drivers/net/net_failover.c
> +++ b/drivers/net/net_failover.c
> @@ -603,6 +603,9 @@ static int net_failover_slave_unregister(struct net_device *slave_dev,
>   primary_dev = rtnl_dereference(nfo_info->primary_dev);
>   standby_dev = rtnl_dereference(nfo_info->standby_dev);
>  
> + if (WARN_ON_ONCE(slave_dev != primary_dev && slave_dev != standby_dev))
> + return -ENODEV;
> +

I prefer to put () around different conditions but that's just a matter of taste.
Reviewed-by: Liran Alon 

>   vlan_vids_del_by_dev(slave_dev, failover_dev);
>   dev_uc_unsync(slave_dev, failover_dev);
>   dev_mc_unsync(slave_dev, failover_dev);


Re: Why not use all the syn queues? in the function "tcp_conn_request", I have some questions.

2018-09-04 Thread Eric Dumazet



On 09/03/2018 10:31 PM, Ttttabcd wrote:
> Hello everyone, recently I have been looking at the source code that handles
> the TCP three-way handshake (Linux kernel version 4.18.5).
> 
> I found some strange places in the source code for handling syn messages.
> 
> in the function "tcp_conn_request"
> 
> This code will be executed when we don't enable the syn cookies.
> 
> 	if (!net->ipv4.sysctl_tcp_syncookies &&
> 	    (net->ipv4.sysctl_max_syn_backlog - inet_csk_reqsk_queue_len(sk) <
> 	     (net->ipv4.sysctl_max_syn_backlog >> 2)) &&
> 	    !tcp_peer_is_proven(req, dst)) {
> 		/* Without syncookies last quarter of
> 		 * backlog is filled with destinations,
> 		 * proven to be alive.
> 		 * It means that we continue to communicate
> 		 * to destinations, already remembered
> 		 * to the moment of synflood.
> 		 */
> 		pr_drop_req(req, ntohs(tcp_hdr(skb)->source),
> 			    rsk_ops->family);
> 		goto drop_and_release;
> 	}
> 
> But why don't we use all the syn queues?


Isn't it explained in the comment ?

Anyway, I am not sure anyone disables syn cookies.
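The quoted condition does use the whole queue. It reserves the last quarter for peers that `tcp_peer_is_proven()` vouches for, instead of leaving it empty. The sketch below restates the check as a pure function to make the arithmetic concrete. The function name and parameters are invented and this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Mirrors the quoted condition: with SYN cookies off, once fewer than
 * max_syn_backlog / 4 slots remain free, only proven peers get a slot.
 */
static bool syn_would_be_dropped(bool syncookies, int max_syn_backlog,
				 int queue_len, bool peer_proven)
{
	assert(max_syn_backlog > 0 && queue_len >= 0);
	return !syncookies &&
	       (max_syn_backlog - queue_len < (max_syn_backlog >> 2)) &&
	       !peer_proven;
}
```

With max_syn_backlog = 1024, unproven peers are refused once fewer than 256 slots are free, that is, from 769 queued requests on. Proven peers can still use the remaining slots.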



RE: [net-next 02/15] i40e: move ethtool stats boiler plate code to i40e_ethtool_stats.h

2018-09-04 Thread Keller, Jacob E



> -Original Message-
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of David Miller
> Sent: Wednesday, August 29, 2018 8:05 PM
> To: Kirsher, Jeffrey T 
> Cc: Keller, Jacob E ; netdev@vger.kernel.org;
> nhor...@redhat.com; sassm...@redhat.com; jogre...@redhat.com
> Subject: Re: [net-next 02/15] i40e: move ethtool stats boiler plate code to
> i40e_ethtool_stats.h
> 
> From: Jeff Kirsher 
> Date: Wed, 29 Aug 2018 15:48:21 -0700
> 
> > diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> > new file mode 100644
> > index ..0290ade7494b
> > --- /dev/null
> > +++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool_stats.h
> > @@ -0,0 +1,221 @@
> ...
> > +/**
> > + * __i40e_add_stat_strings - copy stat strings into ethtool buffer
> > + * @p: ethtool supplied buffer
> > + * @stats: stat definitions array
> > + * @size: size of the stats array
> > + *
> > + * Format and copy the strings described by stats into the buffer pointed at
> > + * by p.
> > + **/
> > +static void __i40e_add_stat_strings(u8 **p, const struct i40e_stats stats[],
> > +   const unsigned int size, ...)
> 
> Need to be marked inline.

Marking this inline seems to cause build problems on some systems because
it uses variadic arguments.

Thanks,
Jake
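One plausible explanation, offered here as an assumption, involves how kernels of this period defined `inline`. Unless CONFIG_OPTIMIZE_INLINING was set, `inline` could be defined as `inline __attribute__((always_inline))`, and GCC refuses to force-inline a function that uses variable argument lists. The sketch below therefore keeps the variadic helper an ordinary static function. It is a userspace approximation of `__i40e_add_stat_strings()`, not the driver code. `STAT_STR_LEN` stands in for `ETH_GSTRING_LEN` (32 bytes), and the format strings are invented.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define STAT_STR_LEN 32 /* stand-in for ETH_GSTRING_LEN */

/*
 * Format n printf-style stat names into consecutive fixed-width slots,
 * all using the same variadic arguments (e.g. a queue number).
 * Deliberately not marked inline: forced inlining of a variadic
 * function is rejected by GCC.
 */
static void add_stat_strings(char **p, const char *const fmts[],
			     unsigned int n, ...)
{
	unsigned int i;

	assert(n == 0 || fmts != NULL);
	for (i = 0; i < n; i++) {
		va_list args;

		memset(*p, 0, STAT_STR_LEN);
		va_start(args, n);
		vsnprintf(*p, STAT_STR_LEN, fmts[i], args);
		va_end(args);
		*p += STAT_STR_LEN;
	}
}
```

va_start() and va_end() run once per string because each vsnprintf() call consumes the va_list. Restarting the list lets every format reuse the same arguments.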


Re: [PATCH net 0/3] bnxt_en: Bug fixes.

2018-09-04 Thread Michael Chan
On Mon, Sep 3, 2018 at 10:50 PM, Michael Chan  wrote:
> On Mon, Sep 3, 2018 at 10:01 PM, David Miller  wrote:
>>
>> From: Michael Chan 
>> Date: Mon,  3 Sep 2018 04:23:16 -0400
>>
>> > This short series fixes resource related logic in the driver, mostly
>> > affecting the RDMA driver under corner cases.
>>
>> Series applied, thanks Michael.
>>
>> Do you want patch #3 queued up for -stable?
>
> Yes, please go ahead.  Thanks.

But there is a dependency on patch #2 though.  So #2 needs to be queued as well.