Re: [PATCH v4 00/16] Add Paravirtual RDMA Driver

2016-09-15 Thread Leon Romanovsky
On Wed, Sep 14, 2016 at 11:36:36AM -0600, Jason Gunthorpe wrote:
> On Mon, Sep 12, 2016 at 10:43:00PM +, Adit Ranadive wrote:
> > On Mon, Sep 12, 2016 at 11:03:39 -0700, Jason Gunthorpe wrote:
> > > On Sun, Sep 11, 2016 at 09:49:10PM -0700, Adit Ranadive wrote:
> > > > [2] Libpvrdma User-level library -
> > > > http://git.openfabrics.org/?p=~aditr/libpvrdma.git;a=summary
> > >
> > > You will probably find that rdma-plumbing will be the best way to get
> > > your userspace component into the distributors.
> >
> > Hi Jason,
> >
> > Sorry, I haven't been paying attention to that discussion. Do you know how
> > soon distros will pick up the rdma-plumbing work?
>
> We intend to use this as the vehicle for the userspace included with
> the 4.9 kernel.
>
> I anticipate the tree will be running by Oct 1.

+1

>
> Jason
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


signature.asc
Description: PGP signature


Re: [PATCH v4 05/16] IB/pvrdma: Add functions for Verbs support

2016-09-15 Thread Leon Romanovsky
On Wed, Sep 14, 2016 at 11:15:37PM -0700, Christoph Hellwig wrote:
> On Thu, Sep 15, 2016 at 12:10:10AM +, Adit Ranadive wrote:
> > On Wed, Sep 14, 2016 at 05:49:50 -0700 Christoph Hellwig wrote:
> > > > +   props->max_fmr = dev->dsr->caps.max_fmr;
> > > > +   props->max_map_per_fmr = dev->dsr->caps.max_map_per_fmr;
> > >
> > > Please don't add FMR support to any new drivers.
> >
> > We don't, and our device reports these as 0. If you want me to be more
> > explicit I can remove the zeroed-out properties.
>
> Oh, ok.  I'll withdraw my comment then.

I would suggest removing the zero assignments to a struct that is already zero
from the beginning. It will eliminate the confusion.

Thanks





Re: [RFC 08/11] Add support for data path

2016-09-15 Thread Leon Romanovsky
On Mon, Sep 12, 2016 at 07:07:42PM +0300, Ram Amrani wrote:

> +++ b/drivers/infiniband/hw/qedr/qedr_hsi_rdma.h
> @@ -150,6 +150,12 @@ struct rdma_rq_sge {
>   struct regpair addr;
>   __le32 length;
>   __le32 flags;
> +#define RDMA_RQ_SGE_L_KEY_MASK  0x3FF
> +#define RDMA_RQ_SGE_L_KEY_SHIFT 0
> +#define RDMA_RQ_SGE_NUM_SGES_MASK   0x7
> +#define RDMA_RQ_SGE_NUM_SGES_SHIFT  26
> +#define RDMA_RQ_SGE_RESERVED0_MASK  0x7
> +#define RDMA_RQ_SGE_RESERVED0_SHIFT 29
>  };

It is an interesting twist to mix defines and structs together.




Re: [PATCH v4 16/16] MAINTAINERS: Update for PVRDMA driver

2016-09-15 Thread Leon Romanovsky
On Mon, Sep 12, 2016 at 11:52:22AM -0600, Jason Gunthorpe wrote:
> On Sun, Sep 11, 2016 at 09:49:26PM -0700, Adit Ranadive wrote:
> > Add maintainer info for the PVRDMA driver.
>
> You can probably squash the last three patches.

It doesn't matter, Doug will squash the whole series anyway.




Re: [PATCH v4 09/16] IB/pvrdma: Add support for Completion Queues

2016-09-15 Thread Yuval Shaia
Hi Adit,
Please see my comments inline.

Besides that, I have no more comments on this patch.

Reviewed-by: Yuval Shaia 

Yuval

On Thu, Sep 15, 2016 at 12:07:29AM +, Adit Ranadive wrote:
> On Wed, Sep 14, 2016 at 05:43:37 -0700, Yuval Shaia wrote:
> > On Sun, Sep 11, 2016 at 09:49:19PM -0700, Adit Ranadive wrote:
> > > +
> > > +static int pvrdma_poll_one(struct pvrdma_cq *cq, struct pvrdma_qp
> > **cur_qp,
> > > +struct ib_wc *wc)
> > > +{
> > > + struct pvrdma_dev *dev = to_vdev(cq->ibcq.device);
> > > + int has_data;
> > > + unsigned int head;
> > > + bool tried = false;
> > > + struct pvrdma_cqe *cqe;
> > > +
> > > +retry:
> > > + has_data = pvrdma_idx_ring_has_data(&cq->ring_state->rx,
> > > + cq->ibcq.cqe, &head);
> > > + if (has_data == 0) {
> > > + if (tried)
> > > + return -EAGAIN;
> > > +
> > > + /* Pass down POLL to give physical HCA a chance to poll. */
> > > + pvrdma_write_uar_cq(dev, cq->cq_handle |
> > PVRDMA_UAR_CQ_POLL);
> > > +
> > > + tried = true;
> > > + goto retry;
> > > + } else if (has_data == PVRDMA_INVALID_IDX) {
> > 
> > I didn't go through the entire life cycle of the RX-ring's head and tail,
> > but you need to make sure that the PVRDMA_INVALID_IDX error is a recoverable
> > one, i.e. there is a probability that the next call to pvrdma_poll_one will
> > be fine. Otherwise it is an endless loop.
> 
> We have never run into this issue internally, but I don't think we can
> recover here

I briefly reviewed the life cycle of the RX-ring's head and tail and didn't
catch any suspicious place that might corrupt it.
So I'm glad to see that you never encountered this case.

> in the driver. The only way to recover would be to destroy and recreate the
> CQ, which we shouldn't do since it could be used by multiple QPs.

Agree.
But don't they hit the same problem too?

> We don't have a way yet to recover in the device. Once we add that, this
> check should go away.

To be honest, I have no idea how to do that - I was expecting driver vendors
to come up with ideas :)
I once came up with the idea of forcing a restart of the driver, but it was
rejected.

> 
> The reason I returned an error value from poll_cq in v3 was to break the
> possible loop so that it might give clients a chance to recover. But since
> poll_cq is not expected to fail, I just log the device error here. I can
> revert to that version if you want to break the possible loop.

Clients (ULPs) cannot recover from this case. They do not even check the
reason for the error and treat any error as -EAGAIN.



Re: [PATCH] nfp: fix error return code in nfp_net_netdev_open()

2016-09-15 Thread Jakub Kicinski
On Thu, 15 Sep 2016 03:45:07 +, Wei Yongjun wrote:
> From: Wei Yongjun 
> 
> Fix to return a negative error code from the error handling
> case instead of 0, as done elsewhere in this function.
> 
> Fixes: 73725d9dfd99 ("nfp: allocate ring SW structs dynamically")
> Signed-off-by: Wei Yongjun 

Acked-by: Jakub Kicinski 

FWIW this is for net.  Thanks Wei!


Re: [PATCHv3 net-next 05/15] bpf: enable non-core use of the verfier

2016-09-15 Thread Jakub Kicinski
On Wed, 14 Sep 2016 16:05:51 -0700, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 08:00:13PM +0100, Jakub Kicinski wrote:
> > Advanced JIT compilers and translators may want to use
> > eBPF verifier as a base for parsers or to perform custom
> > checks and validations.
> > 
> > Add the ability for external users to invoke the verifier
> > and provide callbacks to be invoked for every instruction
> > checked.  For now only the most basic callback for
> > per-instruction pre-interpretation checks is added.  More
> > advanced users may also like to have per-instruction post
> > callback and state comparison callback.
> > 
> > Signed-off-by: Jakub Kicinski 
> > ---
> >  include/linux/bpf_parser.h |  89 ++
> >  kernel/bpf/verifier.c  | 134 
> > +++--
> >  2 files changed, 158 insertions(+), 65 deletions(-)
> >  create mode 100644 include/linux/bpf_parser.h
> > 
> > diff --git a/include/linux/bpf_parser.h b/include/linux/bpf_parser.h
> > new file mode 100644
> > index ..daa53b204f4d
> > --- /dev/null
> > +++ b/include/linux/bpf_parser.h  
> 
> 'bpf parser' is a bit of a misleading name, since it can be interpreted
> as a parser written in bpf.
> Also the header file contains verifier bits, therefore I think
> the better name would be bpf_verifier.h?
> 
> > +#define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF 
> > program */
> > +
> > +struct verifier_env;
> > +struct bpf_ext_parser_ops {
> > +   int (*insn_hook)(struct verifier_env *env,
> > +int insn_idx, int prev_insn_idx);
> > +};  
> 
> How about calling this bpf_ext_analyzer_ops
> and main entry bpf_analyzer() ?
> I think it will better convey what it's doing.
> 
> > +
> > +/* single container for all structs
> > + * one verifier_env per bpf_check() call
> > + */
> > +struct verifier_env {
> > +   struct bpf_prog *prog;  /* eBPF program being verified */
> > +   struct verifier_stack_elem *head; /* stack of verifier states to be 
> > processed */
> > +   int stack_size; /* number of states to be processed */
> > +   struct verifier_state cur_state; /* current verifier state */
> > +   struct verifier_state_list **explored_states; /* search pruning 
> > optimization */
> > +   const struct bpf_ext_parser_ops *pops; /* external parser ops */
> > +   void *ppriv; /* pointer to external parser's private data */  
> 
> A bit hard to review, since the move and the addition are in one patch.

Agreed, I'll do move+prefix with bpf_ to one patch since they're both
"no functional changes" and additions to a separate one.

> I think ppriv and pops are too obscure names.
> Maybe analyzer_ops and analyzer_priv?

I'll rename everything as suggested.
 
> Conceptually looks good.

Thanks!


Re: [PATCH v4 01/16] vmxnet3: Move PCI Id to pci_ids.h

2016-09-15 Thread Yuval Shaia
Besides that, no more comments.

Reviewed-by: Yuval Shaia 

On Wed, Sep 14, 2016 at 07:36:34PM +, Adit Ranadive wrote:
> On Wed, Sep 14, 2016 at 09:25:18 -0700, Yuval Shaia wrote:
> > On Wed, Sep 14, 2016 at 04:00:25PM +, Adit Ranadive wrote:
> > > On Wed, Sep 14, 2016 at 04:09:12 -0700, Yuval Shaia wrote:
> > > > Please update vmxnet3_drv.c accordingly.
> > >
> > > Any reason why? I don't think we need to. Vmxnet3 should just pick up
> > > the moved PCI device id from pci_ids.h file.
> > 
> > So now you need to include it from vmxnet3_drv.c.
> > Same with pvrdma_main.c
> 
> If you're asking me to include pci_ids.h in our drivers we already do that
> by including pci.h in both the drivers. 
> pci.h already includes pci_ids.h - 
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/pci.h#n35
> 
> If that's going to change, maybe someone from the PCI group can comment.
> 
> Thanks,
> Adit


Re: [PATCHv3 net-next 08/15] nfp: add BPF to NFP code translator

2016-09-15 Thread Jakub Kicinski
On Wed, 14 Sep 2016 16:15:11 -0700, Alexei Starovoitov wrote:
> On Wed, Sep 14, 2016 at 08:00:16PM +0100, Jakub Kicinski wrote:
> > Add translator for JITing eBPF to operations which
> > can be executed on NFP's programmable engines.
> > 
> > Signed-off-by: Jakub Kicinski 
> > ---
> > v3:
> >  - don't clone the program for the verifier (no longer needed);
> >  - temporarily add a local copy of macros from bitfield.h.  
> 
> so what's the status of that other patch? which tree is it going through?

It's in wireless-drivers-next, Kalle says it should be landing in
net-next early next week.

> Does it mean we'd have to wait till after the merge window? :(
> That would be sad, since it looks like it almost ready.

If it's OK with everyone I was hoping I could have that small local copy
of the macros until bitfield.h gets propagated and then we don't have
to wait :S


Re: [PATCH v5 0/6] Add eBPF hooks for cgroups

2016-09-15 Thread Daniel Mack
On 09/15/2016 08:36 AM, Vincent Bernat wrote:
>  ❦ 12 September 2016 18:12 CEST, Daniel Mack :
> 
>> * The sample program learned to support both ingress and egress, and
>>   can now optionally make the eBPF program drop packets by making it
>>   return 0.
> 
> Ability to lock the eBPF program to avoid modification from a later
> program or in a subcgroup would be pretty interesting from a security
> perspective.

For now, you can achieve that by dropping CAP_NET_ADMIN after installing
a program, between fork and exec. I think that should suffice for a first
version. Flags to further limit that could be added later.


Thanks,
Daniel


Re: [PATCH 3/9] net: ethernet: ti: cpts: rework initialization/deinitialization

2016-09-15 Thread Richard Cochran
On Wed, Sep 14, 2016 at 04:02:25PM +0300, Grygorii Strashko wrote:
> The current implementation CPTS initialization and deinitialization
> (represented by cpts_register/unregister()) is pretty entangled and
> has some issues, like:
> - ptp clock registered before spinlock, which is protecting it, and
> before timecounter and cyclecounter initialization;
> - CPTS ref_clk requested using devm API while cpts_register() is
> called from .ndo_open(), as result additional checks required;
> - CPTS ref_clk is prepared, but never unprepared;
> - CPTS is not disabled even when unregistered..

This list of four items is a clear sign that this one patch should be
broken into a series of four.

Thanks,
Richard


[PATCH] iproute2: build nsid-name cache only for commands that need it

2016-09-15 Thread Anton Aksola
The calling of netns_map_init() before command parsing introduced
a performance issue with a large number of namespaces.

As commands such as add, del and exec do not need to iterate through
/var/run/netns, it would be good not to build the cache before executing
these commands.

Example:
unpatched:
time seq 1 1000 | xargs -n 1 ip netns add

real    0m16.832s
user    0m1.350s
sys     0m15.029s

patched:
time seq 1 1000 | xargs -n 1 ip netns add

real    0m3.859s
user    0m0.132s
sys     0m3.205s

Signed-off-by: Anton Aksola 
---
 ip/ipnetns.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index af87065..4546fe7 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -775,8 +775,6 @@ static int netns_monitor(int argc, char **argv)
 
 int do_netns(int argc, char **argv)
 {
-   netns_map_init();
-
if (argc < 1)
return netns_list(0, NULL);
 
@@ -784,8 +782,10 @@ int do_netns(int argc, char **argv)
(matches(*argv, "lst") == 0))
return netns_list(argc-1, argv+1);
 
-   if ((matches(*argv, "list-id") == 0))
+   if ((matches(*argv, "list-id") == 0)) {
+   netns_map_init();
return netns_list_id(argc-1, argv+1);
+   }
 
if (matches(*argv, "help") == 0)
return usage();
-- 
1.8.3.1



Re: [RFC v2 00/10] Landlock LSM: Unprivileged sandboxing

2016-09-15 Thread Pavel Machek
Hi!

> This series is a proof of concept to fill in some missing parts of seccomp,
> such as the ability to check syscall argument pointers or to create more
> dynamic security policies. The goal of this new stackable Linux Security
> Module (LSM) called Landlock is to allow any process, including unprivileged
> ones, to create powerful security sandboxes comparable to the Seatbelt/XNU
> Sandbox or the OpenBSD Pledge. This kind of sandbox helps to mitigate the
> security impact of bugs or unexpected/malicious behaviors in userland
> applications.
> 
> The first RFC [1] was focused on extending seccomp while staying at the
> syscall level. This brought a working PoC but with some (mitigated) ToCToU
> race conditions due to the seccomp ptrace hole (now fixed) and the non-atomic
> syscall argument evaluation (hence the LSM hooks).

Long and nice description follows. Should it go to Documentation/
somewhere?

Because some documentation would be useful...
Pavel

>  include/linux/bpf.h   |  41 +
>  include/linux/lsm_hooks.h |   5 +
>  include/linux/seccomp.h   |  54 ++-
>  include/uapi/asm-generic/errno-base.h |   1 +
>  include/uapi/linux/bpf.h  | 103 
>  include/uapi/linux/seccomp.h  |   2 +
>  kernel/bpf/arraymap.c | 222 +
>  kernel/bpf/syscall.c  |  18 ++-
>  kernel/bpf/verifier.c |  32 +++-
>  kernel/fork.c |  41 -
>  kernel/seccomp.c  | 211 +++-
>  samples/Makefile  |   2 +-
>  samples/landlock/.gitignore   |   1 +
>  samples/landlock/Makefile |  16 ++
>  samples/landlock/sandbox.c| 295 
> ++
>  security/Kconfig  |   1 +
>  security/Makefile |   2 +
>  security/landlock/Kconfig |  19 +++
>  security/landlock/Makefile|   3 +
>  security/landlock/checker_cgroup.c|  96 +++
>  security/landlock/checker_cgroup.h|  18 +++
>  security/landlock/checker_fs.c| 183 +
>  security/landlock/checker_fs.h|  20 +++
>  security/landlock/lsm.c   | 228 ++
>  security/security.c   |   1 +
>  25 files changed, 1592 insertions(+), 23 deletions(-)
>  create mode 100644 samples/landlock/.gitignore
>  create mode 100644 samples/landlock/Makefile
>  create mode 100644 samples/landlock/sandbox.c
>  create mode 100644 security/landlock/Kconfig
>  create mode 100644 security/landlock/Makefile
>  create mode 100644 security/landlock/checker_cgroup.c
>  create mode 100644 security/landlock/checker_cgroup.h
>  create mode 100644 security/landlock/checker_fs.c
>  create mode 100644 security/landlock/checker_fs.h
>  create mode 100644 security/landlock/lsm.c
> 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


Re: [RFC 09/11] Add LL2 RoCE interface

2016-09-15 Thread Leon Romanovsky
On Mon, Sep 12, 2016 at 07:07:43PM +0300, Ram Amrani wrote:
> Add light L2 interface for RoCE.
>
> Signed-off-by: Rajesh Borundia 
> Signed-off-by: Ram Amrani 
> ---

<>

> + DP_ERR(cdev,
> +"QED RoCE set MAC filter failed - roce_info/ll2 NULL\n");
> + return -EINVAL;
> + }
> +
> + p_ptt = qed_ptt_acquire(QED_LEADING_HWFN(cdev));
> + if (!p_ptt) {
> + DP_ERR(cdev,
> +"qed roce ll2 mac filter set: failed to acquire PTT\n");
> + return -EINVAL;
> + }

Please use a single style for your debug prints: QED RoCE vs. qed roce.




Re: [PATCH v2 net-next 1/2] net: phy: Add Edge-rate driver for Microsemi PHYs.

2016-09-15 Thread Raju Lakkaraju
Hi Andrew,

Thank you for review the code.

On Fri, Sep 09, 2016 at 03:18:32PM +0200, Andrew Lunn wrote:
> EXTERNAL EMAIL
> 
> 
> > > > +static int vsc85xx_edge_rate_cntl_set(struct phy_device *phydev,
> > > > +   u8 edge_rate)
> > >
> > > No spaces place.
> > >
> > I ran checkpatch. I did not find any error. I created another workspace
> > and applied the same patch. It shows the correct alignment. I have used
> > tabs (8-space width), then some spaces to align the braces.
> 
> Sorry, I worded that poorly. I meant the space between the u8 and edge. A
> single space is enough.
> 
I accepted your suggestion.

> > > > +#ifdef CONFIG_OF_MDIO
> > > > +static int vsc8531_of_init(struct phy_device *phydev)
> > > > +{
> > > > + int rc;
> > > > + struct vsc8531_private *vsc8531 = phydev->priv;
> > > > + struct device *dev = &phydev->mdio.dev;
> > > > + struct device_node *of_node = dev->of_node;
> > > > +
> > > > + if (!of_node)
> > > > + return -ENODEV;
> > > > +
> > > > + rc = of_property_read_u8(of_node, "vsc8531,edge-rate",
> > > > +  &vsc8531->edge_rate);
> > >
> > > Until you have written the Documentation, it is hard for me to tell,
> > > but device tree bindings should use real units, like seconds, Ohms,
> > > Farads, etc. Is the edge rate in nS? Or is it some magic value which
> > > just gets written into the register?
> > >
> >
> > This is a magic value that just gets written into the register.
> 
> Magic values are generally not accepted in device tree bindings. Both
> Micrel and Renesas define their clock skew in ps, for example. Since
> this is rise time, it should also be possible to define it in a unit
> of time.
> 

I accepted your comment. I had a discussion with my hardware team and
explained the code review comments.
They asked me to define the units as picoseconds.

> > > >  static int vsc85xx_config_init(struct phy_device *phydev)
> > > >  {
> > > >   int rc;
> > > > + struct vsc8531_private *vsc8531;
> > > > +
> > > > + if (!phydev->priv) {
> > >
> > > How can this happen?
> > >
> >
> > The VSC8531 driver doesn't have any private structure assigned initially.
> > priv always points to NULL.
> 
> So if it cannot happen, don't check for it.
> 
> Also, by convention, you allocate memory in the .probe() function of a
> driver. Please do it there.
> 
I accepted your review comment. 
I will re-send the patch with updates.

> Andrew

---
Thanks,
Raju.


[PATCH net-next 2/2] net sched ife action: Introduce skb tcindex metadata encap decap

2016-09-15 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Sample use case of how this is encoded:
user space via tuntap (or a connected VM/Machine/container)
encodes the tcindex TLV.

Sample use case of decoding:
IFE action decodes it and the skb->tc_index is then used to classify.
So something like this for encoded ICMP packets:

.. first decode then reclassify... skb->tcindex will be set
sudo $TC filter add dev $ETH parent : prio 2 protocol 0xbeef \
u32 match u32 0 0 flowid 1:1 \
action ife decode reclassify

...next match the decode icmp packet...
sudo $TC filter add dev $ETH parent : prio 4 protocol ip \
u32 match ip protocol 1 0xff flowid 1:1 \
action continue
... last classify it using the tcindex classifier and do someaction..
sudo $TC filter add dev $ETH parent : prio 5 protocol ip \
handle 0x11 tcindex classid 1:1 \
action blah..

Signed-off-by: Jamal Hadi Salim 
---
 include/uapi/linux/tc_act/tc_ife.h |  3 +-
 net/sched/Kconfig  |  5 +++
 net/sched/Makefile |  1 +
 net/sched/act_meta_skbtcindex.c| 81 ++
 4 files changed, 89 insertions(+), 1 deletion(-)
 create mode 100644 net/sched/act_meta_skbtcindex.c

diff --git a/include/uapi/linux/tc_act/tc_ife.h 
b/include/uapi/linux/tc_act/tc_ife.h
index 4ece02a..cd18360 100644
--- a/include/uapi/linux/tc_act/tc_ife.h
+++ b/include/uapi/linux/tc_act/tc_ife.h
@@ -32,8 +32,9 @@ enum {
 #define IFE_META_HASHID 2
 #defineIFE_META_PRIO 3
 #defineIFE_META_QMAP 4
+#defineIFE_META_TCINDEX 5
 /*Can be overridden at runtime by module option*/
-#define__IFE_META_MAX 5
+#define__IFE_META_MAX 6
 #define IFE_META_MAX (__IFE_META_MAX - 1)
 
 #endif
diff --git a/net/sched/Kconfig b/net/sched/Kconfig
index 7795d5a..87956a7 100644
--- a/net/sched/Kconfig
+++ b/net/sched/Kconfig
@@ -793,6 +793,11 @@ config NET_IFE_SKBPRIO
 depends on NET_ACT_IFE
 ---help---
 
+config NET_IFE_SKBTCINDEX
+tristate "Support to encoding decoding skb tcindex on IFE action"
+depends on NET_ACT_IFE
+---help---
+
 config NET_CLS_IND
bool "Incoming device classification"
depends on NET_CLS_U32 || NET_CLS_FW
diff --git a/net/sched/Makefile b/net/sched/Makefile
index 148ae0d..4bdda36 100644
--- a/net/sched/Makefile
+++ b/net/sched/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_NET_ACT_SKBMOD)  += act_skbmod.o
 obj-$(CONFIG_NET_ACT_IFE)  += act_ife.o
 obj-$(CONFIG_NET_IFE_SKBMARK)  += act_meta_mark.o
 obj-$(CONFIG_NET_IFE_SKBPRIO)  += act_meta_skbprio.o
+obj-$(CONFIG_NET_IFE_SKBTCINDEX)   += act_meta_skbtcindex.o
 obj-$(CONFIG_NET_ACT_TUNNEL_KEY)+= act_tunnel_key.o
 obj-$(CONFIG_NET_SCH_FIFO) += sch_fifo.o
 obj-$(CONFIG_NET_SCH_CBQ)  += sch_cbq.o
diff --git a/net/sched/act_meta_skbtcindex.c b/net/sched/act_meta_skbtcindex.c
new file mode 100644
index 000..ec43327
--- /dev/null
+++ b/net/sched/act_meta_skbtcindex.c
@@ -0,0 +1,81 @@
+/*
+ * net/sched/act_meta_tc_index.c IFE skb->tc_index metadata module
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * copyright Jamal Hadi Salim (2016)
+ *
+*/
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static int skbtcindex_encode(struct sk_buff *skb, void *skbdata,
+struct tcf_meta_info *e)
+{
+   u32 ifetc_index = skb->tc_index;
+
+   return ife_encode_meta_u16(ifetc_index, skbdata, e);
+}
+
+static int skbtcindex_decode(struct sk_buff *skb, void *data, u16 len)
+{
+   u16 ifetc_index = *(u16 *)data;
+
+   skb->tc_index = ntohs(ifetc_index);
+   return 0;
+}
+
+static int skbtcindex_check(struct sk_buff *skb, struct tcf_meta_info *e)
+{
+   return ife_check_meta_u16(skb->tc_index, e);
+}
+
+static struct tcf_meta_ops ife_skbtcindex_ops = {
+   .metaid = IFE_META_TCINDEX,
+   .metatype = NLA_U16,
+   .name = "tc_index",
+   .synopsis = "skb tc_index 16 bit metadata",
+   .check_presence = skbtcindex_check,
+   .encode = skbtcindex_encode,
+   .decode = skbtcindex_decode,
+   .get = ife_get_meta_u16,
+   .alloc = ife_alloc_meta_u16,
+   .release = ife_release_meta_gen,
+   .validate = ife_validate_meta_u16,
+   .owner = THIS_MODULE,
+};
+
+static int __init ifetc_index_init_module(void)
+{
+   pr_emerg("Loaded IFE tc_index\n");
+   return register_ife_op(&ife_skbtcindex_ops);
+}
+
+static void __exit ifetc_index_cleanup_module(void)
+{
+   pr_emerg("Unloaded IFE tc_index\n");
+   unregister_ife_op(&ife_skbtcindex_ops);
+}
+
+module_init(ifetc_index_init_module);
+module_exit(ifetc_index_cleanup_module);
+
+MODULE_AUTHOR("Jamal Hadi Salim

[PATCH net-next 1/2] net sched ife action: add 16 bit helpers

2016-09-15 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

encoder and checker for 16 bits metadata

Signed-off-by: Jamal Hadi Salim 
---
 include/net/tc_act/tc_ife.h |  2 ++
 net/sched/act_ife.c | 26 ++
 2 files changed, 28 insertions(+)

diff --git a/include/net/tc_act/tc_ife.h b/include/net/tc_act/tc_ife.h
index 5164bd7..9fd2bea0 100644
--- a/include/net/tc_act/tc_ife.h
+++ b/include/net/tc_act/tc_ife.h
@@ -50,9 +50,11 @@ int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 
dlen,
 int ife_alloc_meta_u32(struct tcf_meta_info *mi, void *metaval, gfp_t gfp);
 int ife_alloc_meta_u16(struct tcf_meta_info *mi, void *metaval, gfp_t gfp);
 int ife_check_meta_u32(u32 metaval, struct tcf_meta_info *mi);
+int ife_check_meta_u16(u16 metaval, struct tcf_meta_info *mi);
 int ife_encode_meta_u32(u32 metaval, void *skbdata, struct tcf_meta_info *mi);
 int ife_validate_meta_u32(void *val, int len);
 int ife_validate_meta_u16(void *val, int len);
+int ife_encode_meta_u16(u16 metaval, void *skbdata, struct tcf_meta_info *mi);
 void ife_release_meta_gen(struct tcf_meta_info *mi);
 int register_ife_op(struct tcf_meta_ops *mops);
 int unregister_ife_op(struct tcf_meta_ops *mops);
diff --git a/net/sched/act_ife.c b/net/sched/act_ife.c
index e87cd81..ccf7b4b 100644
--- a/net/sched/act_ife.c
+++ b/net/sched/act_ife.c
@@ -63,6 +63,23 @@ int ife_tlv_meta_encode(void *skbdata, u16 attrtype, u16 
dlen, const void *dval)
 }
 EXPORT_SYMBOL_GPL(ife_tlv_meta_encode);
 
+int ife_encode_meta_u16(u16 metaval, void *skbdata, struct tcf_meta_info *mi)
+{
+   u16 edata = 0;
+
+   if (mi->metaval)
+   edata = *(u16 *)mi->metaval;
+   else if (metaval)
+   edata = metaval;
+
+   if (!edata) /* will not encode */
+   return 0;
+
+   edata = htons(edata);
+   return ife_tlv_meta_encode(skbdata, mi->metaid, 2, &edata);
+}
+EXPORT_SYMBOL_GPL(ife_encode_meta_u16);
+
 int ife_get_meta_u32(struct sk_buff *skb, struct tcf_meta_info *mi)
 {
if (mi->metaval)
@@ -81,6 +98,15 @@ int ife_check_meta_u32(u32 metaval, struct tcf_meta_info *mi)
 }
 EXPORT_SYMBOL_GPL(ife_check_meta_u32);
 
+int ife_check_meta_u16(u16 metaval, struct tcf_meta_info *mi)
+{
+   if (metaval || mi->metaval)
+   return 8; /* T+L+(V) == 2+2+(2+2bytepad) */
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(ife_check_meta_u16);
+
 int ife_encode_meta_u32(u32 metaval, void *skbdata, struct tcf_meta_info *mi)
 {
u32 edata = metaval;
-- 
1.9.1



Re: [PATCH v2 net-next 2/2] net: phy: Add MAC-IF driver for Microsemi PHYs.

2016-09-15 Thread Raju Lakkaraju
Hi Andrew,

Thank you for reviewing the code.
I accepted all your review comments.
I will send the updated patch for review again.

Thanks,
Raju.
On Fri, Sep 09, 2016 at 02:03:46PM +0200, Andrew Lunn wrote:
> EXTERNAL EMAIL
> 
> 
> On Fri, Sep 09, 2016 at 11:23:52AM +0530, Raju Lakkaraju wrote:
> > Hi Andrew,
> >
> > Thank you for reviewing the code and for the valuable comments.
> >
> > On Thu, Sep 08, 2016 at 03:27:27PM +0200, Andrew Lunn wrote:
> > > EXTERNAL EMAIL
> > >
> > >
> > > On Thu, Sep 08, 2016 at 02:47:22PM +0530, Raju Lakkaraju wrote:
> > > > From: Raju Lakkaraju 
> > > >
> > > > Used Device Tree to configure the MAC Interface as per review comments 
> > > > and
> > > > re-sending code for review
> > >
> > > I don't see anything about device tree in this patch...
> > >
> > The Ethernet driver (in my BBB environment, the TI cpsw driver) reads the
> > device tree phy interface parameter and updates it in the phydev structure.
> >
> > In device tree the following code holds the phy interface configuration.
> > &cpsw_emac0 {
> > phy_id = <&davinci_mdio>, <0>;
> > phy-mode = "rgmii";
> > };
> 
> O.K., that is one place it can come from. But it is not the only one,
> e.g. platform data or ACPI. A better comment might be:
> 
> Configure the MAC/PHY interface as indicated in phydev->interface,
> eg. GMII, RMII, RGMII.
> 
> Andrew


Re: [PATCH 7/9] net: ethernet: ti: cpts: calc mult and shift from refclk freq

2016-09-15 Thread Richard Cochran
On Wed, Sep 14, 2016 at 10:26:19PM +0200, Richard Cochran wrote:
> On Wed, Sep 14, 2016 at 04:02:29PM +0300, Grygorii Strashko wrote:
> > +   clocks_calc_mult_shift(&mult, &shift, freq, NSEC_PER_SEC, maxsec);
> > +
> > +   cpts->cc_mult = mult;
> > +   cpts->cc.mult = mult;
> 
> In order to get good resolution on the frequency adjustment, we want
> to keep 'mult' as large as possible.  I don't see your code doing
> this.  We can rely on the watchdog reader (work queue) to prevent
> overflows.

I took a closer look, and assuming cc.mask = 2^32 - 1, then using
clocks_calc_mult_shift() produces good results for a reasonable range
of input frequencies.  Keeping 'maxsec' constant at 4 we have:

   | Freq. MHz |   mult | shift |
   |---++---|
   |   100 | 0xa000 |28 |
   |   250 | 0x8000 |29 |
   |   500 | 0x8000 |30 |
   |  1000 | 0x8000 |31 |

Can the input clock be higher than 1 GHz?  If not, I suggest using
clocks_calc_mult_shift() with maxsec=4 and a setting the watchdog also
to 4*HZ.

Thanks,
Richard



Re: [PATCH net-next 2/3] net/sched: cls_flower: Remove an unsed field from the filter key structure

2016-09-15 Thread Sergei Shtylyov

On 9/13/2016 5:02 PM, Or Gerlitz wrote:


Commit c3f8324188fa "net: Add full IPv6 addresses to flow_keys" added an
unsed instance of struct flow_dissector_key_addrs into struct fl_flow_key,


   Unused?


remove it.

Signed-off-by: Or Gerlitz 
Reported-by: Hadar Hen Zion 

[...]

MBR, Sergei



Re: [PATCH v3 net 1/1] net sched actions: fix GETing actions

2016-09-15 Thread Sergei Shtylyov

On 9/13/2016 2:07 AM, Jamal Hadi Salim wrote:


From: Jamal Hadi Salim 

With the batch changes that translated transient actions into
a temporary list, what was lost in the translation was the fact that
tcf_action_destroy() will eventually delete the action from
the permanent location if the refcount is zero.

Example of what broke:
...add a gact action to drop
sudo $TC actions add action drop index 10
...now retrieve it, looks good
sudo $TC actions get action gact index 10
...retrieve it again and find it is gone!
sudo $TC actions get action gact index 10

Fixes:
commit 22dc13c837c3 ("net_sched: convert tcf_exts from list to pointer array"),
commit 824a7e8863b3 ("net_sched: remove an unnecessary list_del()")
commit f07fed82ad79 ("net_sched: remove the leftover cleanup_a()")

Signed-off-by: Jamal Hadi Salim 
---
 net/sched/act_api.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/net/sched/act_api.c b/net/sched/act_api.c
index d09d068..50720b1 100644
--- a/net/sched/act_api.c
+++ b/net/sched/act_api.c
@@ -592,6 +592,16 @@ err_out:
return ERR_PTR(err);
 }

+static void cleanup_a(struct list_head *actions, int ovr)
+{
+   struct tc_action *a;
+
+   list_for_each_entry(a, actions, list) {
+   if (ovr)
+   a->tcfa_refcnt -= 1;


a->tcfa_refcnt--;

[...]

@@ -612,8 +622,15 @@ int tcf_action_init(struct net *net, struct nlattr *nla,
goto err;
}
act->order = i;
+   if (ovr)
+   act->tcfa_refcnt += 1;


act->tcfa_refcnt++;

[...]

@@ -883,6 +900,8 @@ tca_action_gd(struct net *net, struct nlattr *nla, struct 
nlmsghdr *n,
goto err;
}
act->order = i;
+   if (event == RTM_GETACTION)
+   act->tcfa_refcnt += 1;


act->tcfa_refcnt++;

[...]

MBR, Sergei



Re: [PATCH] net/mlx4_en: fix off by one in error handling

2016-09-15 Thread Tariq Toukan



On 14/09/2016 7:08 PM, Sebastian Ott wrote:

On Wed, 14 Sep 2016, Tariq Toukan wrote:

On 14/09/2016 4:53 PM, Sebastian Ott wrote:

On Wed, 14 Sep 2016, Tariq Toukan wrote:

On 14/09/2016 2:09 PM, Sebastian Ott wrote:

If an error occurs in mlx4_init_eq_table the index used in the
err_out_unmap label is one too big which results in a panic in
mlx4_free_eq. This patch fixes the index in the error path.

You are right, but your change below does not cover all cases.
The full solution looks like this:

@@ -1260,7 +1260,7 @@ int mlx4_init_eq_table(struct mlx4_dev *dev)
   eq);
  }
  if (err)
-   goto err_out_unmap;
+   goto err_out_unmap_excluded;

In this case a call to mlx4_create_eq failed. Do you really have to call
mlx4_free_eq for this index again?

We agree on this part, that's why here we should goto the _excluded_ label.
For all other parts, we should not exclude the eq in the highest index, and
thus we goto the _non_excluded_ label.

But that's exactly what the original patch does. If the failure is within
the for loop at index i, we do the cleanup starting at index i-1. If the
failure is after the for loop then i == dev->caps.num_comp_vectors + 1
and we do the cleanup starting at index i == dev->caps.num_comp_vectors.

In the latter case your patch would have an out of bounds array access.

Indeed. Agreed.


Regards,
Sebastian



Reviewed-by: Tariq Toukan 

Thanks!



[PATCH net-next V2 2/3] net/sched: cls_flower: Remove an unused field from the filter key structure

2016-09-15 Thread Or Gerlitz
Commit c3f8324188fa "net: Add full IPv6 addresses to flow_keys" added an
unused instance of struct flow_dissector_key_addrs into struct fl_flow_key,
remove it.

Signed-off-by: Or Gerlitz 
Reported-by: Hadar Hen Zion 
Acked-by: Jiri Pirko 
---
 net/sched/cls_flower.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index 027523c..a3f4c70 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -33,7 +33,6 @@ struct fl_flow_key {
struct flow_dissector_key_basic basic;
struct flow_dissector_key_eth_addrs eth;
struct flow_dissector_key_vlan vlan;
-   struct flow_dissector_key_addrs ipaddrs;
union {
struct flow_dissector_key_ipv4_addrs ipv4;
struct flow_dissector_key_ipv6_addrs ipv6;
-- 
2.3.7



[PATCH net-next V2 3/3] net/sched: cls_flower: Specify vlan attributes format in the UAPI header

2016-09-15 Thread Or Gerlitz
Specify the format (size and endianess) for the vlan attributes.

Signed-off-by: Or Gerlitz 
Acked-by: Jiri Pirko 
---
 include/uapi/linux/pkt_cls.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index 60ea2a0..8915b61 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -428,9 +428,9 @@ enum {
TCA_FLOWER_KEY_UDP_DST, /* be16 */
 
TCA_FLOWER_FLAGS,
-   TCA_FLOWER_KEY_VLAN_ID,
-   TCA_FLOWER_KEY_VLAN_PRIO,
-   TCA_FLOWER_KEY_VLAN_ETH_TYPE,
+   TCA_FLOWER_KEY_VLAN_ID, /* be16 */
+   TCA_FLOWER_KEY_VLAN_PRIO,   /* u8   */
+   TCA_FLOWER_KEY_VLAN_ETH_TYPE,   /* be16 */
 
TCA_FLOWER_KEY_ENC_KEY_ID,  /* be32 */
TCA_FLOWER_KEY_ENC_IPV4_SRC,/* be32 */
-- 
2.3.7



[PATCH net-next V2 0/3] net/sched: cls_flower: Add ports masks

2016-09-15 Thread Or Gerlitz
Hi Dave, 

This series adds the ability to specify tcp/udp ports masks 
for TC/flower filter matches.

I also removed an unused field from the flower keys struct 
and clarified the format of the recently added vlan attributes.

Or.

v1--> v2 changes: 

 * fixes typo in patch #2 title and change log (Sergei)
 * added acks provided by Jiri on v1
 
FWIW, by mistake the cover letter of V1 (but not the patches)
carried V2 tag, hope this doesn't create too much confusion.

Or Gerlitz (3):
  net/sched: cls_flower: Support masking for matching on tcp/udp ports
  net/sched: cls_flower: Remove an unused field from the filter key structure
  net/sched: cls_flower: Specify vlan attributes format in the UAPI header

 include/uapi/linux/pkt_cls.h | 10 +++---
 net/sched/cls_flower.c   | 21 -
 2 files changed, 19 insertions(+), 12 deletions(-)

-- 
2.3.7



[PATCH net-next V2 1/3] net/sched: cls_flower: Support masking for matching on tcp/udp ports

2016-09-15 Thread Or Gerlitz
Add the definitions for src/dst udp/tcp port masks and use
them when setting && dumping the relevant keys.

Signed-off-by: Or Gerlitz 
Signed-off-by: Paul Blakey 
Acked-by: Jiri Pirko 
---
 include/uapi/linux/pkt_cls.h |  4 
 net/sched/cls_flower.c   | 20 
 2 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index f9c287c..60ea2a0 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -442,6 +442,10 @@ enum {
TCA_FLOWER_KEY_ENC_IPV6_DST,/* struct in6_addr */
TCA_FLOWER_KEY_ENC_IPV6_DST_MASK,/* struct in6_addr */
 
+   TCA_FLOWER_KEY_TCP_SRC_MASK,/* be16 */
+   TCA_FLOWER_KEY_TCP_DST_MASK,/* be16 */
+   TCA_FLOWER_KEY_UDP_SRC_MASK,/* be16 */
+   TCA_FLOWER_KEY_UDP_DST_MASK,/* be16 */
__TCA_FLOWER_MAX,
 };
 
diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c
index b084b2a..027523c 100644
--- a/net/sched/cls_flower.c
+++ b/net/sched/cls_flower.c
@@ -335,6 +335,10 @@ static const struct nla_policy fl_policy[TCA_FLOWER_MAX + 
1] = {
[TCA_FLOWER_KEY_ENC_IPV6_SRC_MASK] = { .len = sizeof(struct in6_addr) },
[TCA_FLOWER_KEY_ENC_IPV6_DST]   = { .len = sizeof(struct in6_addr) },
[TCA_FLOWER_KEY_ENC_IPV6_DST_MASK] = { .len = sizeof(struct in6_addr) },
+   [TCA_FLOWER_KEY_TCP_SRC_MASK]   = { .type = NLA_U16 },
+   [TCA_FLOWER_KEY_TCP_DST_MASK]   = { .type = NLA_U16 },
+   [TCA_FLOWER_KEY_UDP_SRC_MASK]   = { .type = NLA_U16 },
+   [TCA_FLOWER_KEY_UDP_DST_MASK]   = { .type = NLA_U16 },
 };
 
 static void fl_set_key_val(struct nlattr **tb,
@@ -432,17 +436,17 @@ static int fl_set_key(struct net *net, struct nlattr **tb,
 
if (key->basic.ip_proto == IPPROTO_TCP) {
fl_set_key_val(tb, &key->tp.src, TCA_FLOWER_KEY_TCP_SRC,
-  &mask->tp.src, TCA_FLOWER_UNSPEC,
+  &mask->tp.src, TCA_FLOWER_KEY_TCP_SRC_MASK,
   sizeof(key->tp.src));
fl_set_key_val(tb, &key->tp.dst, TCA_FLOWER_KEY_TCP_DST,
-  &mask->tp.dst, TCA_FLOWER_UNSPEC,
+  &mask->tp.dst, TCA_FLOWER_KEY_TCP_DST_MASK,
   sizeof(key->tp.dst));
} else if (key->basic.ip_proto == IPPROTO_UDP) {
fl_set_key_val(tb, &key->tp.src, TCA_FLOWER_KEY_UDP_SRC,
-  &mask->tp.src, TCA_FLOWER_UNSPEC,
+  &mask->tp.src, TCA_FLOWER_KEY_UDP_SRC_MASK,
   sizeof(key->tp.src));
fl_set_key_val(tb, &key->tp.dst, TCA_FLOWER_KEY_UDP_DST,
-  &mask->tp.dst, TCA_FLOWER_UNSPEC,
+  &mask->tp.dst, TCA_FLOWER_KEY_UDP_DST_MASK,
   sizeof(key->tp.dst));
}
 
@@ -877,18 +881,18 @@ static int fl_dump(struct net *net, struct tcf_proto *tp, 
unsigned long fh,
 
if (key->basic.ip_proto == IPPROTO_TCP &&
(fl_dump_key_val(skb, &key->tp.src, TCA_FLOWER_KEY_TCP_SRC,
-&mask->tp.src, TCA_FLOWER_UNSPEC,
+&mask->tp.src, TCA_FLOWER_KEY_TCP_SRC_MASK,
 sizeof(key->tp.src)) ||
 fl_dump_key_val(skb, &key->tp.dst, TCA_FLOWER_KEY_TCP_DST,
-&mask->tp.dst, TCA_FLOWER_UNSPEC,
+&mask->tp.dst, TCA_FLOWER_KEY_TCP_DST_MASK,
 sizeof(key->tp.dst
goto nla_put_failure;
else if (key->basic.ip_proto == IPPROTO_UDP &&
 (fl_dump_key_val(skb, &key->tp.src, TCA_FLOWER_KEY_UDP_SRC,
- &mask->tp.src, TCA_FLOWER_UNSPEC,
+ &mask->tp.src, TCA_FLOWER_KEY_UDP_SRC_MASK,
  sizeof(key->tp.src)) ||
  fl_dump_key_val(skb, &key->tp.dst, TCA_FLOWER_KEY_UDP_DST,
- &mask->tp.dst, TCA_FLOWER_UNSPEC,
+ &mask->tp.dst, TCA_FLOWER_KEY_UDP_DST_MASK,
  sizeof(key->tp.dst
goto nla_put_failure;
 
-- 
2.3.7



[PATCH net 6/7] qeth: do not turn on SG per default

2016-09-15 Thread Ursula Braun
According to recent performance measurements, turning on only the
net_device feature NETIF_F_SG behaves well, but turning on feature
NETIF_F_GSO in addition shows bad results. Since the kernel activates
NETIF_F_GSO automatically as soon as the driver configures feature
NETIF_F_SG, qeth should not activate feature NETIF_F_SG per default
until the qeth problems with NETIF_F_GSO are solved.

Signed-off-by: Ursula Braun 
Reviewed-by: Thomas Richter 
---
 drivers/s390/net/qeth_l2_main.c | 2 --
 drivers/s390/net/qeth_l3_main.c | 1 -
 2 files changed, 3 deletions(-)

diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index 2081c18..bb27058 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1124,8 +1124,6 @@ static int qeth_l2_setup_netdev(struct qeth_card *card)
card->dev->hw_features |= NETIF_F_RXCSUM;
card->dev->vlan_features |= NETIF_F_RXCSUM;
}
-   /* Turn on SG per default */
-   card->dev->features |= NETIF_F_SG;
}
card->info.broadcast_capable = 1;
qeth_l2_request_initial_mac(card);
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 0cbbc80..c00f6db 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3120,7 +3120,6 @@ static int qeth_l3_setup_netdev(struct qeth_card *card)
card->dev->vlan_features = NETIF_F_SG |
NETIF_F_RXCSUM | NETIF_F_IP_CSUM |
NETIF_F_TSO;
-   card->dev->features = NETIF_F_SG;
}
}
} else if (card->info.type == QETH_CARD_TYPE_IQD) {
-- 
2.8.4



[PATCH net 0/7] s390: qeth patches

2016-09-15 Thread Ursula Braun
Hi Dave,

here are several fixes for the s390 qeth driver, built for net.

Thanks,
Ursula

Hans Wippel (1):
  qeth: restore device features after recovery

Thomas Richter (1):
  s390/qeth: fix setting VIPA address

Ursula Braun (5):
  s390/qeth: use ip_lock for hsuid configuration
  s390/qeth: allow hsuid configuration in DOWN state
  qeth: check not more than 16 SBALEs on the completion queue
  qeth: do not limit number of gso segments
  qeth: do not turn on SG per default

 drivers/s390/net/qeth_core.h  |  1 +
 drivers/s390/net/qeth_core_main.c | 32 +++-
 drivers/s390/net/qeth_l2_main.c   |  6 +++---
 drivers/s390/net/qeth_l3_main.c   | 29 -
 drivers/s390/net/qeth_l3_sys.c|  5 +
 5 files changed, 60 insertions(+), 13 deletions(-)

-- 
2.8.4



[PATCH net 5/7] qeth: do not limit number of gso segments

2016-09-15 Thread Ursula Braun
To reduce the need of skb_linearize() calls, gso_max_segs of qeth
net_devices had been limited according to the maximum number of qdio SBAL
elements. But a gso segment cannot be larger than the mtu-size, while an
SBAL element can contain up to 4096 bytes. The gso_max_segs limitation
limits the maximum packet size given to the qeth driver. Performance
measurements with tso-enabled qeth network interfaces and mtu-size 1500
showed that the disadvantage of smaller packets is much more severe than
the advantage of fewer skb_linearize() calls.
This patch gets rid of the gso_max_segs limitations in the qeth driver.

Signed-off-by: Ursula Braun 
Reviewed-by: Thomas Richter 
---
 drivers/s390/net/qeth_l2_main.c | 1 -
 drivers/s390/net/qeth_l3_main.c | 1 -
 2 files changed, 2 deletions(-)

diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index 54fd891..2081c18 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1131,7 +1131,6 @@ static int qeth_l2_setup_netdev(struct qeth_card *card)
qeth_l2_request_initial_mac(card);
card->dev->gso_max_size = (QETH_MAX_BUFFER_ELEMENTS(card) - 1) *
  PAGE_SIZE;
-   card->dev->gso_max_segs = (QETH_MAX_BUFFER_ELEMENTS(card) - 1);
SET_NETDEV_DEV(card->dev, &card->gdev->dev);
netif_napi_add(card->dev, &card->napi, qeth_l2_poll, QETH_NAPI_WEIGHT);
netif_carrier_off(card->dev);
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 4ba82e1..0cbbc80 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3148,7 +3148,6 @@ static int qeth_l3_setup_netdev(struct qeth_card *card)
netif_keep_dst(card->dev);
card->dev->gso_max_size = (QETH_MAX_BUFFER_ELEMENTS(card) - 1) *
  PAGE_SIZE;
-   card->dev->gso_max_segs = (QETH_MAX_BUFFER_ELEMENTS(card) - 1);
 
SET_NETDEV_DEV(card->dev, &card->gdev->dev);
netif_napi_add(card->dev, &card->napi, qeth_l3_poll, QETH_NAPI_WEIGHT);
-- 
2.8.4



[PATCH net 3/7] s390/qeth: allow hsuid configuration in DOWN state

2016-09-15 Thread Ursula Braun
The qeth IP address mapping logic has been reworked recently. It now
causes problems when the qeth sysfs attribute "hsuid" is specified in
DOWN state, which is allowed. Postpone registering or deregistering of
IP addresses in this case.

Signed-off-by: Ursula Braun 
Reviewed-by: Thomas Richter 
---
 drivers/s390/net/qeth_l3_main.c | 22 +-
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 2f51271..4ba82e1 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -257,6 +257,11 @@ int qeth_l3_delete_ip(struct qeth_card *card, struct 
qeth_ipaddr *tmp_addr)
if (addr->in_progress)
return -EINPROGRESS;
 
+   if (!qeth_card_hw_is_reachable(card)) {
+   addr->disp_flag = QETH_DISP_ADDR_DELETE;
+   return 0;
+   }
+
rc = qeth_l3_deregister_addr_entry(card, addr);
 
hash_del(&addr->hnode);
@@ -296,6 +301,11 @@ int qeth_l3_add_ip(struct qeth_card *card, struct 
qeth_ipaddr *tmp_addr)
hash_add(card->ip_htable, &addr->hnode,
qeth_l3_ipaddr_hash(addr));
 
+   if (!qeth_card_hw_is_reachable(card)) {
+   addr->disp_flag = QETH_DISP_ADDR_ADD;
+   return 0;
+   }
+
/* qeth_l3_register_addr_entry can go to sleep
 * if we add a IPV4 addr. It is caused by the reason
 * that SETIP ipa cmd starts ARP staff for IPV4 addr.
@@ -390,12 +400,16 @@ static void qeth_l3_recover_ip(struct qeth_card *card)
int i;
int rc;
 
-   QETH_CARD_TEXT(card, 4, "recoverip");
+   QETH_CARD_TEXT(card, 4, "recovrip");
 
spin_lock_bh(&card->ip_lock);
 
hash_for_each_safe(card->ip_htable, i, tmp, addr, hnode) {
-   if (addr->disp_flag == QETH_DISP_ADDR_ADD) {
+   if (addr->disp_flag == QETH_DISP_ADDR_DELETE) {
+   qeth_l3_deregister_addr_entry(card, addr);
+   hash_del(&addr->hnode);
+   kfree(addr);
+   } else if (addr->disp_flag == QETH_DISP_ADDR_ADD) {
if (addr->proto == QETH_PROT_IPV4) {
addr->in_progress = 1;
spin_unlock_bh(&card->ip_lock);
@@ -407,10 +421,8 @@ static void qeth_l3_recover_ip(struct qeth_card *card)
 
if (!rc) {
addr->disp_flag = QETH_DISP_ADDR_DO_NOTHING;
-   if (addr->ref_counter < 1) {
+   if (addr->ref_counter < 1)
qeth_l3_delete_ip(card, addr);
-   kfree(addr);
-   }
} else {
hash_del(&addr->hnode);
kfree(addr);
-- 
2.8.4



[PATCH net 4/7] qeth: check not more than 16 SBALEs on the completion queue

2016-09-15 Thread Ursula Braun
af_iucv socket programs with HiperSockets as transport make use of the qdio
completion queue. Running such an af_iucv socket program may result in a
crash:

[90341.677709] Oops: 0038 ilc:2 [#1] SMP
[90341.677743] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 
4.6.0-20160720.0.0e86ec7.5e62689.fc23.s390xperformance #1
[90341.677744] Hardware name: IBM  2964 N96  703
  (LPAR)
[90341.677746] task: edb79f00 ti: edb84000 task.ti: 
edb84000
[90341.677748] Krnl PSW : 0704d0018000 0075bc50 
(qeth_qdio_input_handler+0x258/0x4e0)
[90341.677756]R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 
RI:0 EA:3
Krnl GPRS: 03d10391e900 0001 e61e6000 0005
[90341.677759]00a9e6ec 5420040001a77400 0001 
006f
[90341.677761]e0d83f00 0003 0010 
5420040001a77400
[90341.677784]7ba8b000 00943fd0 0075bc4e 
ed3b3c10
[90341.677793] Krnl Code: 0075bc42: e320cc180004lg  
%r2,3096(%r12)
   0075bc48: c0e5c5cc   brasl   %r14,7547e0
  #0075bc4e: 1816   lr  %r1,%r6
  >0075bc50: ba19b008   cs  %r1,%r9,8(%r11)
   0075bc54: ec180041017e   cij %r1,1,8,75bcd6
   0075bc5a: 5810b008   l   %r1,8(%r11)
   0075bc5e: ec16005c027e   cij %r1,2,6,75bd16
   0075bc64: 5090b008   st  %r9,8(%r11)
[90341.677807] Call Trace:
[90341.677810] ([<0075bbc0>] qeth_qdio_input_handler+0x1c8/0x4e0)
[90341.677812] ([<0070efbc>] qdio_kick_handler+0x124/0x2a8)
[90341.677814] ([<00713570>] __tiqdio_inbound_processing+0xf0/0xcd0)
[90341.677818] ([<00143312>] tasklet_action+0x92/0x120)
[90341.677823] ([<008b6e72>] __do_softirq+0x112/0x308)
[90341.677824] ([<00142bce>] irq_exit+0xd6/0xf8)
[90341.677829] ([<0010b1d2>] do_IRQ+0x6a/0x88)
[90341.677830] ([<008b6322>] io_int_handler+0x112/0x220)
[90341.677832] ([<00102b2e>] enabled_wait+0x56/0xa8)
[90341.677833] ([<>]   (null))
[90341.677835] ([<00102e32>] arch_cpu_idle+0x32/0x48)
[90341.677838] ([<0018a126>] cpu_startup_entry+0x266/0x2b0)
[90341.677841] ([<00113b38>] smp_start_secondary+0x100/0x110)
[90341.677843] ([<008b68a6>] restart_int_handler+0x62/0x78)
[90341.677845] ([<008b6588>] psw_idle+0x3c/0x40)
[90341.677846] Last Breaking-Event-Address:
[90341.677848]  [<007547ec>] qeth_dbf_longtext+0xc/0xc0
[90341.677849]
[90341.677850] Kernel panic - not syncing: Fatal exception in interrupt

qeth_qdio_cq_handler() analyzes SBALs on this completion queue, but does
not observe the limit of 16 SBAL elements (SBALEs) per SBAL. This patch
adds a check so that no more than 16 SBAL elements are processed.

Signed-off-by: Ursula Braun 
---
 drivers/s390/net/qeth_core_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 6ad5a14..20cf296 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -3619,7 +3619,8 @@ static void qeth_qdio_cq_handler(struct qeth_card *card,
int e;
 
e = 0;
-   while (buffer->element[e].addr) {
+   while ((e < QDIO_MAX_ELEMENTS_PER_BUFFER) &&
+  buffer->element[e].addr) {
unsigned long phys_aob_addr;
 
phys_aob_addr = (unsigned long) buffer->element[e].addr;
-- 
2.8.4



[PATCH net 2/7] s390/qeth: use ip_lock for hsuid configuration

2016-09-15 Thread Ursula Braun
qeth_l3_dev_hsuid_store() changes the ip hash table, which
requires the ip_lock.

Signed-off-by: Ursula Braun 
---
 drivers/s390/net/qeth_l3_sys.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/s390/net/qeth_l3_sys.c b/drivers/s390/net/qeth_l3_sys.c
index 65645b1..0e00a5c 100644
--- a/drivers/s390/net/qeth_l3_sys.c
+++ b/drivers/s390/net/qeth_l3_sys.c
@@ -297,7 +297,9 @@ static ssize_t qeth_l3_dev_hsuid_store(struct device *dev,
addr->u.a6.pfxlen = 0;
addr->type = QETH_IP_TYPE_NORMAL;
 
+   spin_lock_bh(&card->ip_lock);
qeth_l3_delete_ip(card, addr);
+   spin_unlock_bh(&card->ip_lock);
kfree(addr);
}
 
@@ -329,7 +331,10 @@ static ssize_t qeth_l3_dev_hsuid_store(struct device *dev,
addr->type = QETH_IP_TYPE_NORMAL;
} else
return -ENOMEM;
+
+   spin_lock_bh(&card->ip_lock);
qeth_l3_add_ip(card, addr);
+   spin_unlock_bh(&card->ip_lock);
kfree(addr);
 
return count;
-- 
2.8.4



[PATCH net 1/7] qeth: restore device features after recovery

2016-09-15 Thread Ursula Braun
From: Hans Wippel 

After device recovery, only a basic set of network device features is
enabled on the device. If features like checksum offloading or TSO were
enabled by the user before the recovery, this results in a mismatch
between the network device features, that the kernel assumes to be
enabled on the device, and the features actually enabled on the device.

This patch tries to restore previously set features that require
changes on the device after a device recovery. In case of an
error, the network device's features are changed to contain only the
features that are actually turned on.

Signed-off-by: Hans Wippel 
Signed-off-by: Ursula Braun 
---
 drivers/s390/net/qeth_core.h  |  1 +
 drivers/s390/net/qeth_core_main.c | 29 +
 drivers/s390/net/qeth_l2_main.c   |  3 +++
 drivers/s390/net/qeth_l3_main.c   |  1 +
 4 files changed, 34 insertions(+)

diff --git a/drivers/s390/net/qeth_core.h b/drivers/s390/net/qeth_core.h
index bf40063..6d4b68c4 100644
--- a/drivers/s390/net/qeth_core.h
+++ b/drivers/s390/net/qeth_core.h
@@ -999,6 +999,7 @@ struct qeth_cmd_buffer *qeth_get_setassparms_cmd(struct 
qeth_card *,
 __u16, __u16,
 enum qeth_prot_versions);
 int qeth_set_features(struct net_device *, netdev_features_t);
+int qeth_recover_features(struct net_device *);
 netdev_features_t qeth_fix_features(struct net_device *, netdev_features_t);
 
 /* exports for OSN */
diff --git a/drivers/s390/net/qeth_core_main.c 
b/drivers/s390/net/qeth_core_main.c
index 7dba6c8..6ad5a14 100644
--- a/drivers/s390/net/qeth_core_main.c
+++ b/drivers/s390/net/qeth_core_main.c
@@ -6131,6 +6131,35 @@ static int qeth_set_ipa_tso(struct qeth_card *card, int 
on)
return rc;
 }
 
+/* try to restore device features on a device after recovery */
+int qeth_recover_features(struct net_device *dev)
+{
+   struct qeth_card *card = dev->ml_priv;
+   netdev_features_t recover = dev->features;
+
+   if (recover & NETIF_F_IP_CSUM) {
+   if (qeth_set_ipa_csum(card, 1, IPA_OUTBOUND_CHECKSUM))
+   recover ^= NETIF_F_IP_CSUM;
+   }
+   if (recover & NETIF_F_RXCSUM) {
+   if (qeth_set_ipa_csum(card, 1, IPA_INBOUND_CHECKSUM))
+   recover ^= NETIF_F_RXCSUM;
+   }
+   if (recover & NETIF_F_TSO) {
+   if (qeth_set_ipa_tso(card, 1))
+   recover ^= NETIF_F_TSO;
+   }
+
+   if (recover == dev->features)
+   return 0;
+
+   dev_warn(&card->gdev->dev,
+"Device recovery failed to restore all offload features\n");
+   dev->features = recover;
+   return -EIO;
+}
+EXPORT_SYMBOL_GPL(qeth_recover_features);
+
 int qeth_set_features(struct net_device *dev, netdev_features_t features)
 {
struct qeth_card *card = dev->ml_priv;
diff --git a/drivers/s390/net/qeth_l2_main.c b/drivers/s390/net/qeth_l2_main.c
index 7bc20c5..54fd891 100644
--- a/drivers/s390/net/qeth_l2_main.c
+++ b/drivers/s390/net/qeth_l2_main.c
@@ -1246,6 +1246,9 @@ contin:
}
/* this also sets saved unicast addresses */
qeth_l2_set_rx_mode(card->dev);
+   rtnl_lock();
+   qeth_recover_features(card->dev);
+   rtnl_unlock();
}
/* let user_space know that device is online */
kobject_uevent(&gdev->dev.kobj, KOBJ_CHANGE);
diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index 7293466..2f51271 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -3269,6 +3269,7 @@ contin:
else
dev_open(card->dev);
qeth_l3_set_multicast_list(card->dev);
+   qeth_recover_features(card->dev);
rtnl_unlock();
}
qeth_trace_features(card);
-- 
2.8.4



[PATCH net 7/7] s390/qeth: fix setting VIPA address

2016-09-15 Thread Ursula Braun
From: Thomas Richter 

commit 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
restructured the internal address handling.
This work broke setting a virtual IP address.
The command
echo 10.1.1.1 > /sys/bus/ccwgroup/devices//vipa/add4
fails with a "file exists" error even if the IP address has not
been set before.

It turned out that the search result for the IP address
search is handled incorrectly in the VIPA case.

This patch fixes the setting of a virtual IP address.

Signed-off-by: Thomas Richter 
Signed-off-by: Ursula Braun 
---
 drivers/s390/net/qeth_l3_main.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/net/qeth_l3_main.c b/drivers/s390/net/qeth_l3_main.c
index c00f6db..272d9e7 100644
--- a/drivers/s390/net/qeth_l3_main.c
+++ b/drivers/s390/net/qeth_l3_main.c
@@ -701,7 +701,7 @@ int qeth_l3_add_vipa(struct qeth_card *card, enum 
qeth_prot_versions proto,
 
spin_lock_bh(&card->ip_lock);
 
-   if (!qeth_l3_ip_from_hash(card, ipaddr))
+   if (qeth_l3_ip_from_hash(card, ipaddr))
rc = -EEXIST;
else
qeth_l3_add_ip(card, ipaddr);
@@ -769,7 +769,7 @@ int qeth_l3_add_rxip(struct qeth_card *card, enum 
qeth_prot_versions proto,
 
spin_lock_bh(&card->ip_lock);
 
-   if (!qeth_l3_ip_from_hash(card, ipaddr))
+   if (qeth_l3_ip_from_hash(card, ipaddr))
rc = -EEXIST;
else
qeth_l3_add_ip(card, ipaddr);
-- 
2.8.4



Re: [RFC 06/11] Add support for QP verbs

2016-09-15 Thread Leon Romanovsky
On Mon, Sep 12, 2016 at 07:07:40PM +0300, Ram Amrani wrote:
> Add support for Queue Pair verbs which adds, deletes,
> modifies and queries Queue Pairs.
>
> Signed-off-by: Rajesh Borundia 
> Signed-off-by: Ram Amrani 
> ---
>  drivers/infiniband/hw/qedr/main.c  |   15 +-
>  drivers/infiniband/hw/qedr/qedr.h  |  126 +++
>  drivers/infiniband/hw/qedr/qedr_cm.h   |   40 +
>  drivers/infiniband/hw/qedr/qedr_hsi_rdma.h |   11 +
>  drivers/infiniband/hw/qedr/qedr_user.h |   34 +
>  drivers/infiniband/hw/qedr/verbs.c | 1098 -
>  drivers/infiniband/hw/qedr/verbs.h |7 +
>  drivers/net/ethernet/qlogic/qed/qed_cxt.h  |1 +
>  drivers/net/ethernet/qlogic/qed/qed_roce.c | 1211 
> 
>  drivers/net/ethernet/qlogic/qed/qed_roce.h |   71 ++
>  include/linux/qed/qed_roce_if.h|  144 
>  11 files changed, 2756 insertions(+), 2 deletions(-)
>  create mode 100644 drivers/infiniband/hw/qedr/qedr_cm.h
>

<>

> +
> +static inline int get_gid_info_from_table(struct ib_qp *ibqp,
> +   struct ib_qp_attr *attr,
> +   int attr_mask,
> +   struct qed_rdma_modify_qp_in_params
> +   *qp_params)
> +{
> + enum rdma_network_type nw_type;
> + struct ib_gid_attr gid_attr;
> + union ib_gid gid;
> + u32 ipv4_addr;
> + int rc = 0;
> + int i;
> +
> + rc = ib_get_cached_gid(ibqp->device, attr->ah_attr.port_num,
> +attr->ah_attr.grh.sgid_index, &gid, &gid_attr);
> + if (!rc && !memcmp(&gid, &zgid, sizeof(gid)))
> + rc = -ENOENT;
> +
> + if (!rc && gid_attr.ndev) {
> + qp_params->vlan_id = rdma_vlan_dev_vlan_id(gid_attr.ndev);
> +
> + dev_put(gid_attr.ndev);
> + nw_type = ib_gid_to_network_type(gid_attr.gid_type, &gid);
> + switch (nw_type) {
> + case RDMA_NETWORK_IPV6:
> + memcpy(&qp_params->sgid.bytes[0], &gid.raw[0],
> +sizeof(qp_params->sgid));
> + memcpy(&qp_params->dgid.bytes[0],
> +&attr->ah_attr.grh.dgid,
> +sizeof(qp_params->dgid));
> + qp_params->roce_mode = ROCE_V2_IPV6;
> + SET_FIELD(qp_params->modify_flags,
> +   QED_ROCE_MODIFY_QP_VALID_ROCE_MODE, 1);
> + break;
> + case RDMA_NETWORK_IB:
> + memcpy(&qp_params->sgid.bytes[0], &gid.raw[0],
> +sizeof(qp_params->sgid));
> + memcpy(&qp_params->dgid.bytes[0],
> +&attr->ah_attr.grh.dgid,
> +sizeof(qp_params->dgid));
> + qp_params->roce_mode = ROCE_V1;
> + break;
> + case RDMA_NETWORK_IPV4:
> + memset(&qp_params->sgid, 0, sizeof(qp_params->sgid));
> + memset(&qp_params->dgid, 0, sizeof(qp_params->dgid));
> + ipv4_addr = qedr_get_ipv4_from_gid(gid.raw);
> + qp_params->sgid.ipv4_addr = ipv4_addr;
> + ipv4_addr =
> + qedr_get_ipv4_from_gid(attr->ah_attr.grh.dgid.raw);
> + qp_params->dgid.ipv4_addr = ipv4_addr;
> + SET_FIELD(qp_params->modify_flags,
> +   QED_ROCE_MODIFY_QP_VALID_ROCE_MODE, 1);
> + qp_params->roce_mode = ROCE_V2_IPV4;
> + break;
> + }
> + }
> + if (rc)
> + return -EINVAL;

I think it is better to check "rc" right after the call to
ib_get_cached_gid().



signature.asc
Description: PGP signature


[PATCH net-next 0/3] mlx5e Order-0 pages for Striding RQ

2016-09-15 Thread Tariq Toukan
Hi Dave,

In this series, we refactor our Striding RQ receive-flow to always use
fragmented WQEs (Work Queue Elements) using order-0 pages, omitting the
flow that allocates and splits high-order pages which would fragment
and deplete high-order pages in the system.

The first patch gives a slight degradation, but opens the opportunity
to using a simple page-cache mechanism of a fair size.
The page-cache, implemented in patch 3, not only closes the performance
gap but even gives a gain.
In patch 2 we re-organize the code to better manage the calls for
alloc/de-alloc pages in the RX flow.

Series generated against net-next commit:
bed806cb266e "Merge branch 'mlxsw-ethtool'"

Thanks,
Tariq

Tariq Toukan (3):
  net/mlx5e: Single flow order-0 pages for Striding RQ
  net/mlx5e: Introduce API for RX mapped pages
  net/mlx5e: Implement RX mapped page cache for page recycle

 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  70 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 149 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 359 +++--
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |  20 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   2 +-
 5 files changed, 291 insertions(+), 309 deletions(-)

-- 
1.8.3.1



[PATCH net-next 3/3] net/mlx5e: Implement RX mapped page cache for page recycle

2016-09-15 Thread Tariq Toukan
Instead of reallocating and mapping pages for RX data-path,
recycle already used pages in a per ring cache.

Performance tests:
The following results were measured on a freshly booted system,
giving optimal baseline performance, as high-order pages are yet to
be fragmented and depleted.

We ran pktgen single-stream benchmarks, with iptables-raw-drop:

Single stride, 64 bytes:
* 4,739,057 - baseline
* 4,749,550 - order0 no cache
* 4,786,899 - order0 with cache
1% gain

Larger packets, no page cross, 1024 bytes:
* 3,982,361 - baseline
* 3,845,682 - order0 no cache
* 4,127,852 - order0 with cache
3.7% gain

Larger packets, every 3rd packet crosses a page, 1500 bytes:
* 3,731,189 - baseline
* 3,579,414 - order0 no cache
* 3,931,708 - order0 with cache
5.4% gain

Signed-off-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   | 16 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 15 ++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 57 --
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h | 16 ++
 4 files changed, 99 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 401b2f7b165f..7dd4763e726e 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -287,6 +287,18 @@ struct mlx5e_rx_am { /* Adaptive Moderation */
u8  tired;
 };
 
+/* a single cache unit is capable to serve one napi call (for non-striding rq)
+ * or a MPWQE (for striding rq).
+ */
+#define MLX5E_CACHE_UNIT   (MLX5_MPWRQ_PAGES_PER_WQE > NAPI_POLL_WEIGHT ? \
+MLX5_MPWRQ_PAGES_PER_WQE : NAPI_POLL_WEIGHT)
+#define MLX5E_CACHE_SIZE   (2 * roundup_pow_of_two(MLX5E_CACHE_UNIT))
+struct mlx5e_page_cache {
+   u32 head;
+   u32 tail;
+   struct mlx5e_dma_info page_cache[MLX5E_CACHE_SIZE];
+};
+
 struct mlx5e_rq {
/* data path */
struct mlx5_wq_ll  wq;
@@ -301,6 +313,8 @@ struct mlx5e_rq {
struct mlx5e_tstamp   *tstamp;
struct mlx5e_rq_stats  stats;
struct mlx5e_cqcq;
+   struct mlx5e_page_cache page_cache;
+
mlx5e_fp_handle_rx_cqe handle_rx_cqe;
mlx5e_fp_alloc_wqe alloc_wqe;
mlx5e_fp_dealloc_wqe   dealloc_wqe;
@@ -651,6 +665,8 @@ bool mlx5e_poll_tx_cq(struct mlx5e_cq *cq, int napi_budget);
 int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget);
 void mlx5e_free_tx_descs(struct mlx5e_sq *sq);
 
+void mlx5e_page_release(struct mlx5e_rq *rq, struct mlx5e_dma_info *dma_info,
+   bool recycle);
 void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 bool mlx5e_post_rx_wqes(struct mlx5e_rq *rq);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 136554b77c3b..8595b507e200 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -141,6 +141,10 @@ static void mlx5e_update_sw_counters(struct mlx5e_priv 
*priv)
s->rx_buff_alloc_err += rq_stats->buff_alloc_err;
s->rx_cqe_compress_blks += rq_stats->cqe_compress_blks;
s->rx_cqe_compress_pkts += rq_stats->cqe_compress_pkts;
+   s->rx_cache_reuse += rq_stats->cache_reuse;
+   s->rx_cache_full  += rq_stats->cache_full;
+   s->rx_cache_empty += rq_stats->cache_empty;
+   s->rx_cache_busy  += rq_stats->cache_busy;
 
for (j = 0; j < priv->params.num_tc; j++) {
sq_stats = &priv->channel[i]->sq[j].stats;
@@ -475,6 +479,9 @@ static int mlx5e_create_rq(struct mlx5e_channel *c,
INIT_WORK(&rq->am.work, mlx5e_rx_am_work);
rq->am.mode = priv->params.rx_cq_period_mode;
 
+   rq->page_cache.head = 0;
+   rq->page_cache.tail = 0;
+
return 0;
 
 err_rq_wq_destroy:
@@ -485,6 +492,8 @@ err_rq_wq_destroy:
 
 static void mlx5e_destroy_rq(struct mlx5e_rq *rq)
 {
+   int i;
+
switch (rq->wq_type) {
case MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ:
mlx5e_rq_free_mpwqe_info(rq);
@@ -493,6 +502,12 @@ static void mlx5e_destroy_rq(struct mlx5e_rq *rq)
kfree(rq->skb);
}
 
+   for (i = rq->page_cache.head; i != rq->page_cache.tail;
+i = (i + 1) & (MLX5E_CACHE_SIZE - 1)) {
+   struct mlx5e_dma_info *dma_info = &rq->page_cache.page_cache[i];
+
+   mlx5e_page_release(rq, dma_info, false);
+   }
mlx5_wq_destroy(&rq->wq_ctrl);
 }
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 0c34daa04c43..dc8677933f76 100644
--- a/drivers/net/ethernet/mellanox/mlx5/

[PATCH net-next 2/3] net/mlx5e: Introduce API for RX mapped pages

2016-09-15 Thread Tariq Toukan
Manage the allocation and deallocation of mapped RX pages only
through dedicated API functions.

Signed-off-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 46 +++--
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index 5d1b7b5e4f36..0c34daa04c43 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -305,26 +305,32 @@ static inline void mlx5e_post_umr_wqe(struct mlx5e_rq *rq, u16 ix)
mlx5e_tx_notify_hw(sq, &wqe->ctrl, 0);
 }
 
-static inline int mlx5e_alloc_and_map_page(struct mlx5e_rq *rq,
-  struct mlx5e_mpw_info *wi,
-  int i)
+static inline int mlx5e_page_alloc_mapped(struct mlx5e_rq *rq,
+ struct mlx5e_dma_info *dma_info)
 {
struct page *page = dev_alloc_page();
+
if (unlikely(!page))
return -ENOMEM;
 
-   wi->umr.dma_info[i].page = page;
-   wi->umr.dma_info[i].addr = dma_map_page(rq->pdev, page, 0, PAGE_SIZE,
-   PCI_DMA_FROMDEVICE);
-   if (unlikely(dma_mapping_error(rq->pdev, wi->umr.dma_info[i].addr))) {
+   dma_info->page = page;
+   dma_info->addr = dma_map_page(rq->pdev, page, 0, PAGE_SIZE,
+ DMA_FROM_DEVICE);
+   if (unlikely(dma_mapping_error(rq->pdev, dma_info->addr))) {
put_page(page);
return -ENOMEM;
}
-   wi->umr.mtt[i] = cpu_to_be64(wi->umr.dma_info[i].addr | MLX5_EN_WR);
 
return 0;
 }
 
+static inline void mlx5e_page_release(struct mlx5e_rq *rq,
+ struct mlx5e_dma_info *dma_info)
+{
+   dma_unmap_page(rq->pdev, dma_info->addr, PAGE_SIZE, DMA_FROM_DEVICE);
+   put_page(dma_info->page);
+}
+
 static int mlx5e_alloc_rx_umr_mpwqe(struct mlx5e_rq *rq,
struct mlx5e_rx_wqe *wqe,
u16 ix)
@@ -336,10 +342,13 @@ static int mlx5e_alloc_rx_umr_mpwqe(struct mlx5e_rq *rq,
int i;
 
for (i = 0; i < MLX5_MPWRQ_PAGES_PER_WQE; i++) {
-   err = mlx5e_alloc_and_map_page(rq, wi, i);
+   struct mlx5e_dma_info *dma_info = &wi->umr.dma_info[i];
+
+   err = mlx5e_page_alloc_mapped(rq, dma_info);
if (unlikely(err))
goto err_unmap;
-   page_ref_add(wi->umr.dma_info[i].page, pg_strides);
+   wi->umr.mtt[i] = cpu_to_be64(dma_info->addr | MLX5_EN_WR);
+   page_ref_add(dma_info->page, pg_strides);
wi->skbs_frags[i] = 0;
}
 
@@ -350,10 +359,10 @@ static int mlx5e_alloc_rx_umr_mpwqe(struct mlx5e_rq *rq,
 
 err_unmap:
while (--i >= 0) {
-   dma_unmap_page(rq->pdev, wi->umr.dma_info[i].addr, PAGE_SIZE,
-  PCI_DMA_FROMDEVICE);
-   page_ref_sub(wi->umr.dma_info[i].page, pg_strides);
-   put_page(wi->umr.dma_info[i].page);
+   struct mlx5e_dma_info *dma_info = &wi->umr.dma_info[i];
+
+   page_ref_sub(dma_info->page, pg_strides);
+   mlx5e_page_release(rq, dma_info);
}
 
return err;
@@ -365,11 +374,10 @@ void mlx5e_free_rx_mpwqe(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi)
int i;
 
for (i = 0; i < MLX5_MPWRQ_PAGES_PER_WQE; i++) {
-   dma_unmap_page(rq->pdev, wi->umr.dma_info[i].addr, PAGE_SIZE,
-  PCI_DMA_FROMDEVICE);
-   page_ref_sub(wi->umr.dma_info[i].page,
-pg_strides - wi->skbs_frags[i]);
-   put_page(wi->umr.dma_info[i].page);
+   struct mlx5e_dma_info *dma_info = &wi->umr.dma_info[i];
+
+   page_ref_sub(dma_info->page, pg_strides - wi->skbs_frags[i]);
+   mlx5e_page_release(rq, dma_info);
}
 }
 
-- 
1.8.3.1



[PATCH net-next 1/3] net/mlx5e: Single flow order-0 pages for Striding RQ

2016-09-15 Thread Tariq Toukan
To improve the memory consumption scheme, we drop the flow that
allocates and splits high-order pages in Striding RQ, and keep a
single Striding RQ flow that uses order-0 pages.

Moving to fragmented memory allows the use of larger MPWQEs,
which reduces the number of UMR posts and filler CQEs.

Moving to a single flow allows several optimizations that improve
performance, especially in production servers where we would
anyway fall back to order-0 allocations:
- inline functions that were called via function pointers.
- improve the UMR post process.

This patch alone is expected to give a slight performance reduction.
However, the new memory scheme makes it possible to use a page cache
of a fair size that does not inflate the memory footprint, which will
largely recover the reduction and even give a performance gain.

Performance tests:
The following results were measured on a freshly booted system,
giving optimal baseline performance, as high-order pages are yet to
be fragmented and depleted.

We ran pktgen single-stream benchmarks, with iptables-raw-drop:

Single stride, 64 bytes:
* 4,739,057 - baseline
* 4,749,550 - this patch
no reduction

Larger packets, no page cross, 1024 bytes:
* 3,982,361 - baseline
* 3,845,682 - this patch
3.5% reduction

Larger packets, every 3rd packet crosses a page, 1500 bytes:
* 3,731,189 - baseline
* 3,579,414 - this patch
4% reduction

Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet WQE")
Signed-off-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  54 ++--
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 136 --
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 292 -
 drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |   4 -
 drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   2 +-
 5 files changed, 184 insertions(+), 304 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index a9358cf7386a..401b2f7b165f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -62,12 +62,12 @@
 #define MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE        0xd
 
 #define MLX5E_PARAMS_MINIMUM_LOG_RQ_SIZE_MPW    0x1
-#define MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE_MPW    0x4
+#define MLX5E_PARAMS_DEFAULT_LOG_RQ_SIZE_MPW    0x3
 #define MLX5E_PARAMS_MAXIMUM_LOG_RQ_SIZE_MPW    0x6
 
 #define MLX5_MPWRQ_LOG_STRIDE_SIZE              6  /* >= 6, HW restriction */
 #define MLX5_MPWRQ_LOG_STRIDE_SIZE_CQE_COMPRESS 8  /* >= 6, HW restriction */
-#define MLX5_MPWRQ_LOG_WQE_SZ  17
+#define MLX5_MPWRQ_LOG_WQE_SZ  18
 #define MLX5_MPWRQ_WQE_PAGE_ORDER  (MLX5_MPWRQ_LOG_WQE_SZ - PAGE_SHIFT > 0 ? \
MLX5_MPWRQ_LOG_WQE_SZ - PAGE_SHIFT : 0)
 #define MLX5_MPWRQ_PAGES_PER_WQE   BIT(MLX5_MPWRQ_WQE_PAGE_ORDER)
@@ -293,8 +293,8 @@ struct mlx5e_rq {
u32wqe_sz;
struct sk_buff   **skb;
struct mlx5e_mpw_info *wqe_info;
+   void  *mtt_no_align;
__be32 mkey_be;
-   __be32 umr_mkey_be;
 
struct device *pdev;
struct net_device *netdev;
@@ -323,32 +323,15 @@ struct mlx5e_rq {
 
 struct mlx5e_umr_dma_info {
__be64*mtt;
-   __be64*mtt_no_align;
dma_addr_t mtt_addr;
-   struct mlx5e_dma_info *dma_info;
+   struct mlx5e_dma_info  dma_info[MLX5_MPWRQ_PAGES_PER_WQE];
+   struct mlx5e_umr_wqe   wqe;
 };
 
 struct mlx5e_mpw_info {
-   union {
-   struct mlx5e_dma_info dma_info;
-   struct mlx5e_umr_dma_info umr;
-   };
+   struct mlx5e_umr_dma_info umr;
u16 consumed_strides;
u16 skbs_frags[MLX5_MPWRQ_PAGES_PER_WQE];
-
-   void (*dma_pre_sync)(struct device *pdev,
-struct mlx5e_mpw_info *wi,
-u32 wqe_offset, u32 len);
-   void (*add_skb_frag)(struct mlx5e_rq *rq,
-struct sk_buff *skb,
-struct mlx5e_mpw_info *wi,
-u32 page_idx, u32 frag_offset, u32 len);
-   void (*copy_skb_header)(struct device *pdev,
-   struct sk_buff *skb,
-   struct mlx5e_mpw_info *wi,
-   u32 page_idx, u32 offset,
-   u32 headlen);
-   void (*free_wqe)(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi);
 };
 
 struct mlx5e_tx_wqe_info {
@@ -672,24 +655,11 @@ void mlx5e_handle_rx_cqe(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe);
 void mlx5e_handle_rx_cqe_mpwrq(struct mlx5e_rq *rq, struct mlx5_cqe6

Re: [PATCH] iproute2: build nsid-name cache only for commands that need it

2016-09-15 Thread Nicolas Dichtel
On 15/09/2016 at 10:23, Anton Aksola wrote:
[snip]
> --- a/ip/ipnetns.c
> +++ b/ip/ipnetns.c
> @@ -775,8 +775,6 @@ static int netns_monitor(int argc, char **argv)
>  
>  int do_netns(int argc, char **argv)
>  {
> - netns_map_init();
> -
>   if (argc < 1)
>   return netns_list(0, NULL);
>  
> @@ -784,8 +782,10 @@ int do_netns(int argc, char **argv)
>   (matches(*argv, "lst") == 0))
>   return netns_list(argc-1, argv+1);
>  
> - if ((matches(*argv, "list-id") == 0))
> + if ((matches(*argv, "list-id") == 0)) {
> + netns_map_init();
>   return netns_list_id(argc-1, argv+1);
> + }
'ip netns' (ip netns list) also needs it.


[PATCH net-next 1/2] i40e: remove superfluous I40E_DEBUG_USER statement

2016-09-15 Thread Stefan Assmann
The I40E_DEBUG_USER flag is confusing and never set in the code. Any
debug output should be guarded by the proper I40E_DEBUG_* flag, which
can be enabled via the debug module parameter.
Remove the I40E_DEBUG_USER cases or convert them to I40E_DEBUG_INIT.

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/i40e/i40e_common.c  |  3 ---
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  6 -
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  2 --
 drivers/net/ethernet/intel/i40e/i40e_main.c| 35 +-
 drivers/net/ethernet/intel/i40e/i40e_type.h|  2 --
 5 files changed, 17 insertions(+), 31 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_common.c b/drivers/net/ethernet/intel/i40e/i40e_common.c
index 2154a34..8ccb09c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_common.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_common.c
@@ -3207,9 +3207,6 @@ static void i40e_parse_discover_capabilities(struct i40e_hw *hw, void *buff,
break;
case I40E_AQ_CAP_ID_MSIX:
p->num_msix_vectors = number;
-   i40e_debug(hw, I40E_DEBUG_INIT,
-  "HW Capability: MSIX vector count = %d\n",
-  p->num_msix_vectors);
break;
case I40E_AQ_CAP_ID_VF_MSIX:
p->num_msix_vectors_vf = number;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 05cf9a7..e9c6f1c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -1210,12 +1210,6 @@ static ssize_t i40e_dbg_command_write(struct file *filp,
u32 level;
cnt = sscanf(&cmd_buf[10], "%i", &level);
if (cnt) {
-   if (I40E_DEBUG_USER & level) {
-   pf->hw.debug_mask = level;
-   dev_info(&pf->pdev->dev,
-"set hw.debug_mask = 0x%08x\n",
-pf->hw.debug_mask);
-   }
pf->msg_enable = level;
dev_info(&pf->pdev->dev, "set msg_enable = 0x%08x\n",
 pf->msg_enable);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 1835186..c56877c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -987,8 +987,6 @@ static void i40e_set_msglevel(struct net_device *netdev, u32 data)
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_pf *pf = np->vsi->back;
 
-   if (I40E_DEBUG_USER & data)
-   pf->hw.debug_mask = data;
pf->msg_enable = data;
 }
 
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 61b0fc4..56369761 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -6665,16 +6665,19 @@ static int i40e_get_capabilities(struct i40e_pf *pf)
}
} while (err);
 
-   if (pf->hw.debug_mask & I40E_DEBUG_USER)
-   dev_info(&pf->pdev->dev,
-"pf=%d, num_vfs=%d, msix_pf=%d, msix_vf=%d, fd_g=%d, fd_b=%d, pf_max_q=%d num_vsi=%d\n",
-pf->hw.pf_id, pf->hw.func_caps.num_vfs,
-pf->hw.func_caps.num_msix_vectors,
-pf->hw.func_caps.num_msix_vectors_vf,
-pf->hw.func_caps.fd_filters_guaranteed,
-pf->hw.func_caps.fd_filters_best_effort,
-pf->hw.func_caps.num_tx_qp,
-pf->hw.func_caps.num_vsis);
+   i40e_debug(&pf->hw, I40E_DEBUG_INIT,
+  "HW Capabilities: PF-id[%d] num_vfs=%d, msix_pf=%d, msix_vf=%d\n",
+  pf->hw.pf_id,
+  pf->hw.func_caps.num_vfs,
+  pf->hw.func_caps.num_msix_vectors,
+  pf->hw.func_caps.num_msix_vectors_vf);
+   i40e_debug(&pf->hw, I40E_DEBUG_INIT,
+  "HW Capabilities: PF-id[%d] fd_g=%d, fd_b=%d, pf_max_qp=%d num_vsis=%d\n",
+  pf->hw.pf_id,
+  pf->hw.func_caps.fd_filters_guaranteed,
+  pf->hw.func_caps.fd_filters_best_effort,
+  pf->hw.func_caps.num_tx_qp,
+  pf->hw.func_caps.num_vsis);
 
 #define DEF_NUM_VSI (1 + (pf->hw.func_caps.fcoe ? 1 : 0) \
   + pf->hw.func_caps.num_vfs)
@@ -8495,14 +8498,10 @@ static int i40e_sw_init(struct i40e_pf *pf)
int err = 0;
int size;
 
-   pf->msg_enable = netif_msg_init(I40E_DEFAULT_MSG_ENABLE,
-   (NETIF_MSG_DRV|NETIF_MSG_PROBE|NETIF_MSG_LINK));

[PATCH net-next 2/2] i40e: fix setting debug parameter early

2016-09-15 Thread Stefan Assmann
pf->msg_enable is a bitmask, so assigning it the raw value of the
"debug" parameter is wrong. It is initialized again later in
i40e_sw_init(), so this didn't cause any problem beyond losing early
debug messages. Move the initialization earlier and assign
pf->hw.debug_mask the resulting bitmask, since that is what the driver
actually uses in i40e_debug(); otherwise the debug parameter is a no-op.

Fixes: 5b5faa4 ("i40e: enable debug earlier")

Signed-off-by: Stefan Assmann 
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 56369761..f972f0d 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -8498,11 +8498,6 @@ static int i40e_sw_init(struct i40e_pf *pf)
int err = 0;
int size;
 
-   pf->msg_enable = netif_msg_init(debug,
-   NETIF_MSG_DRV|
-   NETIF_MSG_PROBE  |
-   NETIF_MSG_LINK);
-
/* Set default capability flags */
pf->flags = I40E_FLAG_RX_CSUM_ENABLED |
I40E_FLAG_MSI_ENABLED |
@@ -10812,10 +10807,13 @@ static int i40e_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
mutex_init(&hw->aq.asq_mutex);
mutex_init(&hw->aq.arq_mutex);
 
-   if (debug != -1) {
-   pf->msg_enable = pf->hw.debug_mask;
-   pf->msg_enable = debug;
-   }
+   /* enable debug prints if requested */
+   pf->msg_enable = netif_msg_init(debug,
+   NETIF_MSG_DRV   |
+   NETIF_MSG_PROBE |
+   NETIF_MSG_LINK);
+   if (debug != -1)
+   pf->hw.debug_mask = pf->msg_enable;
 
/* do a special CORER for clearing PXE mode once at init */
if (hw->revision_id == 0 &&
-- 
2.7.4



[PATCH net-next 0/2] i40e: clean-up and fix for the i40e debug code

2016-09-15 Thread Stefan Assmann
Stefan Assmann (2):
  i40e: remove superfluous I40E_DEBUG_USER statement
  i40e: fix setting debug parameter early

 drivers/net/ethernet/intel/i40e/i40e_common.c  |  3 --
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  6 
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c |  2 --
 drivers/net/ethernet/intel/i40e/i40e_main.c| 43 --
 drivers/net/ethernet/intel/i40e/i40e_type.h|  2 --
 5 files changed, 20 insertions(+), 36 deletions(-)

-- 
2.7.4



RE: [RFC 09/11] Add LL2 RoCE interface

2016-09-15 Thread Amrani, Ram
> > +   DP_ERR(cdev,
> > +  "QED RoCE set MAC filter failed - roce_info/ll2 NULL\n");
> > +   return -EINVAL;
> > +   }
> > +
> > +   p_ptt = qed_ptt_acquire(QED_LEADING_HWFN(cdev));
> > +   if (!p_ptt) {
> > +   DP_ERR(cdev,
> > +  "qed roce ll2 mac filter set: failed to acquire PTT\n");
> > +   return -EINVAL;
> > +   }
> 
> Please use single style for your debug prints QED RoCE vs. qed roce.
Sure. This bothers me too. Will fix. Thanks.



[PATCH] mwifiex: fix null pointer dereference when adapter is null

2016-09-15 Thread Colin King
From: Colin Ian King 

If adapter is null, the error exit path in mwifiex_shutdown_sw()
downs the semaphore sem and prints some debug output via
mwifiex_dbg(). However, passing a NULL adapter to mwifiex_dbg()
causes a null pointer dereference when accessing adapter->dev.
This fix checks for a null adapter at the start of the function and
exits without needing to up the semaphore; the debug print is also
skipped to avoid the null pointer dereference.

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/marvell/mwifiex/main.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/main.c b/drivers/net/wireless/marvell/mwifiex/main.c
index 9b2e98c..7a4f8cc 100644
--- a/drivers/net/wireless/marvell/mwifiex/main.c
+++ b/drivers/net/wireless/marvell/mwifiex/main.c
@@ -1369,12 +1369,12 @@ mwifiex_shutdown_sw(struct mwifiex_adapter *adapter, struct semaphore *sem)
struct mwifiex_private *priv;
int i;
 
+   if (!adapter)
+   goto exit_return;
+
if (down_interruptible(sem))
goto exit_sem_err;
 
-   if (!adapter)
-   goto exit_remove;
-
priv = mwifiex_get_priv(adapter, MWIFIEX_BSS_ROLE_ANY);
mwifiex_deauthenticate(priv, NULL);
 
@@ -1434,6 +1434,7 @@ mwifiex_shutdown_sw(struct mwifiex_adapter *adapter, struct semaphore *sem)
up(sem);
 exit_sem_err:
mwifiex_dbg(adapter, INFO, "%s, successful\n", __func__);
+exit_return:
return 0;
 }
 
-- 
2.9.3



Re: [PATCH 7/9] net: ethernet: ti: cpts: calc mult and shift from refclk freq

2016-09-15 Thread Richard Cochran
On Thu, Sep 15, 2016 at 01:58:15PM +0200, Richard Cochran wrote:
> Can the input clock be higher than 1 GHz?  If not, I suggest using
> clocks_calc_mult_shift() with maxsec=4 and a setting the watchdog also
> to 4*HZ.

On second thought, with the new 12% timer batching, using 4*HZ for 32
bits of 1 GHz is cutting it too close.  So just keep it like you had
it, with maxsec=mask/freq and timeout=maxsec/2, to stay on the safe
side.

Thanks,
Richard


Re: [PATCH] mwifiex: fix null pointer dereference when adapter is null

2016-09-15 Thread Julian Calaby
Hi All,

On Thu, Sep 15, 2016 at 11:42 PM, Colin King  wrote:
> From: Colin Ian King 
>
> If adapter is null, the error exit path in mwifiex_shutdown_sw()
> downs the semaphore sem and prints some debug output via
> mwifiex_dbg(). However, passing a NULL adapter to mwifiex_dbg()
> causes a null pointer dereference when accessing adapter->dev.
> This fix checks for a null adapter at the start of the function and
> exits without needing to up the semaphore; the debug print is also
> skipped to avoid the null pointer dereference.
>
> Signed-off-by: Colin Ian King 

Reviewed-by: Julian Calaby 

Thanks,

-- 
Julian Calaby

Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/


Re: [Xen-devel] [RFC PATCH] xen-netback: fix error handling on netback_probe()

2016-09-15 Thread Wei Liu
On Thu, Sep 15, 2016 at 04:05:17PM +0200, Filipe Manco wrote:
> On 14-09-2016 12:10, Wei Liu wrote:
> >CC xen-devel as well.
> >
> >On Tue, Sep 13, 2016 at 02:11:27PM +0200, Filipe Manco wrote:
> >>In case of error during netback_probe() (e.g. an entry missing on the
> >>xenstore) netback_remove() is called on the new device, which will set
> >>the device backend state to XenbusStateClosed by calling
> >>set_backend_state(). However, the backend state wasn't initialized by
> >>netback_probe() at this point, which will cause and invalid transaction
> >>and set_backend_state() to BUG().
> >>
> >>Initialize the backend state at the beginning of netback_probe() to
> >>XenbusStateInitialising, and create a new valid state transaction on
> >>set_backend_state(), from XenbusStateInitialising to XenbusStateClosed.
> >>
> >>Signed-off-by: Filipe Manco 
> >There is a state machine right before set_backend_state. You would also
> >need to update that.
> Good point I'll update the diagram.
> 
> After looking at the diagram and for consistency, shouldn't the transition
> Initialising -> InitWait be handled using set_backend_state()? Currently it
> is done directly in netback_probe() code. If you agree I'll submit a v2 with
> these two changes.

That's fine with me.

Wei.


Re: [Xen-devel] [RFC PATCH] xen-netback: fix error handling on netback_probe()

2016-09-15 Thread Filipe Manco

On 14-09-2016 12:10, Wei Liu wrote:

CC xen-devel as well.

On Tue, Sep 13, 2016 at 02:11:27PM +0200, Filipe Manco wrote:

In case of error during netback_probe() (e.g. an entry missing on the
xenstore) netback_remove() is called on the new device, which will set
the device backend state to XenbusStateClosed by calling
set_backend_state(). However, the backend state wasn't initialized by
netback_probe() at this point, which will cause an invalid transaction
and set_backend_state() to BUG().

Initialize the backend state at the beginning of netback_probe() to
XenbusStateInitialising, and create a new valid state transaction on
set_backend_state(), from XenbusStateInitialising to XenbusStateClosed.

Signed-off-by: Filipe Manco 

There is a state machine right before set_backend_state. You would also
need to update that.

Good point I'll update the diagram.

After looking at the diagram and for consistency, shouldn't the transition
Initialising -> InitWait be handled using set_backend_state()? Currently it
is done directly in netback_probe() code. If you agree I'll submit a v2 with
these two changes.

According to the definition of XenbusStateInitialising, this patch looks
plausible to me.

Wei.


Filipe

---
  drivers/net/xen-netback/xenbus.c | 10 ++
  1 file changed, 10 insertions(+)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 6a31f2610c23..c0e5f6994d01 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -270,6 +270,7 @@ static int netback_probe(struct xenbus_device *dev,
 
 	be->dev = dev;
 	dev_set_drvdata(&dev->dev, be);
+	be->state = XenbusStateInitialising;
 
 	sg = 1;
 
@@ -515,6 +516,15 @@ static void set_backend_state(struct backend_info *be,
 {
while (be->state != state) {
switch (be->state) {
+   case XenbusStateInitialising:
+   switch (state) {
+   case XenbusStateClosed:
+   backend_switch_state(be, XenbusStateClosed);
+   break;
+   default:
+   BUG();
+   }
+   break;
case XenbusStateClosed:
switch (state) {
case XenbusStateInitWait:
--
2.7.4


___
Xen-devel mailing list
xen-de...@lists.xen.org
https://lists.xen.org/xen-devel




[PATCH V3 0/3] net-next: dsa: add QCA8K support

2016-09-15 Thread John Crispin
This series is based on the AR8xxx series posted by Matthieu Olivari in May
2015. The following changes were made since then:

* fixed the nitpicks from the previous review
* updated to latest API
* turned it into an mdio device
* added callbacks for fdb, bridge offloading, stp, eee, port status
* fixed several minor issues in the port setup and ARP learning
* changed the namespacing of this driver to qca8k

The driver has so far only been tested on qca8337/N. It should work on other QCA
switches such as the qca8327 with minor changes.

John Crispin (3):
  Documentation: devicetree: add qca8k binding
  net-next: dsa: add Qualcomm tag RX/TX handler
  net-next: dsa: add new driver for qca8xxx family

 .../devicetree/bindings/net/dsa/qca8k.txt  |   89 ++
 drivers/net/dsa/Kconfig|9 +
 drivers/net/dsa/Makefile   |1 +
 drivers/net/dsa/qca8k.c| 1060 
 drivers/net/dsa/qca8k.h|  185 
 include/net/dsa.h  |1 +
 net/dsa/Kconfig|3 +
 net/dsa/Makefile   |1 +
 net/dsa/dsa.c  |3 +
 net/dsa/dsa_priv.h |2 +
 net/dsa/tag_qca.c  |  138 +++
 11 files changed, 1492 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/dsa/qca8k.txt
 create mode 100644 drivers/net/dsa/qca8k.c
 create mode 100644 drivers/net/dsa/qca8k.h
 create mode 100644 net/dsa/tag_qca.c

-- 
1.7.10.4



[PATCH V3 2/3] net-next: dsa: add Qualcomm tag RX/TX handler

2016-09-15 Thread John Crispin
Add support for the 2-byte Qualcomm tag that gigabit switches such as
the QCA8337/N might insert when receiving packets, or that we need
to insert when targeting specific switch ports. The tag is inserted
directly behind the Ethernet header.

Reviewed-by: Andrew Lunn 
Reviewed-by: Florian Fainelli 
Signed-off-by: John Crispin 
---
Changes in V2
* fix some comments
* remove dead code
* rename variable from phy->reg

 include/net/dsa.h  |1 +
 net/dsa/Kconfig|3 ++
 net/dsa/Makefile   |1 +
 net/dsa/dsa.c  |3 ++
 net/dsa/dsa_priv.h |2 +
 net/dsa/tag_qca.c  |  138 
 6 files changed, 148 insertions(+)
 create mode 100644 net/dsa/tag_qca.c

diff --git a/include/net/dsa.h b/include/net/dsa.h
index 24ee961..7fdd63e 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -26,6 +26,7 @@ enum dsa_tag_protocol {
DSA_TAG_PROTO_TRAILER,
DSA_TAG_PROTO_EDSA,
DSA_TAG_PROTO_BRCM,
+   DSA_TAG_PROTO_QCA,
DSA_TAG_LAST,   /* MUST BE LAST */
 };
 
diff --git a/net/dsa/Kconfig b/net/dsa/Kconfig
index ff7736f..96e47c5 100644
--- a/net/dsa/Kconfig
+++ b/net/dsa/Kconfig
@@ -38,4 +38,7 @@ config NET_DSA_TAG_EDSA
 config NET_DSA_TAG_TRAILER
bool
 
+config NET_DSA_TAG_QCA
+   bool
+
 endif
diff --git a/net/dsa/Makefile b/net/dsa/Makefile
index 8af4ded..a3380ed 100644
--- a/net/dsa/Makefile
+++ b/net/dsa/Makefile
@@ -7,3 +7,4 @@ dsa_core-$(CONFIG_NET_DSA_TAG_BRCM) += tag_brcm.o
 dsa_core-$(CONFIG_NET_DSA_TAG_DSA) += tag_dsa.o
 dsa_core-$(CONFIG_NET_DSA_TAG_EDSA) += tag_edsa.o
 dsa_core-$(CONFIG_NET_DSA_TAG_TRAILER) += tag_trailer.o
+dsa_core-$(CONFIG_NET_DSA_TAG_QCA) += tag_qca.o
diff --git a/net/dsa/dsa.c b/net/dsa/dsa.c
index d8d267e..66e31ac 100644
--- a/net/dsa/dsa.c
+++ b/net/dsa/dsa.c
@@ -54,6 +54,9 @@ const struct dsa_device_ops *dsa_device_ops[DSA_TAG_LAST] = {
 #ifdef CONFIG_NET_DSA_TAG_BRCM
[DSA_TAG_PROTO_BRCM] = &brcm_netdev_ops,
 #endif
+#ifdef CONFIG_NET_DSA_TAG_QCA
+   [DSA_TAG_PROTO_QCA] = &qca_netdev_ops,
+#endif
[DSA_TAG_PROTO_NONE] = &none_ops,
 };
 
diff --git a/net/dsa/dsa_priv.h b/net/dsa/dsa_priv.h
index 00077a9..6cfd738 100644
--- a/net/dsa/dsa_priv.h
+++ b/net/dsa/dsa_priv.h
@@ -81,5 +81,7 @@ extern const struct dsa_device_ops trailer_netdev_ops;
 /* tag_brcm.c */
 extern const struct dsa_device_ops brcm_netdev_ops;
 
+/* tag_qca.c */
+extern const struct dsa_device_ops qca_netdev_ops;
 
 #endif
diff --git a/net/dsa/tag_qca.c b/net/dsa/tag_qca.c
new file mode 100644
index 000..0c90cac
--- /dev/null
+++ b/net/dsa/tag_qca.c
@@ -0,0 +1,138 @@
+/*
+ * Copyright (c) 2015, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include "dsa_priv.h"
+
+#define QCA_HDR_LEN        2
+#define QCA_HDR_VERSION    0x2
+
+#define QCA_HDR_RECV_VERSION_MASK  GENMASK(15, 14)
+#define QCA_HDR_RECV_VERSION_S     14
+#define QCA_HDR_RECV_PRIORITY_MASK GENMASK(13, 11)
+#define QCA_HDR_RECV_PRIORITY_S    11
+#define QCA_HDR_RECV_TYPE_MASK     GENMASK(10, 6)
+#define QCA_HDR_RECV_TYPE_S        6
+#define QCA_HDR_RECV_FRAME_IS_TAGGED   BIT(3)
+#define QCA_HDR_RECV_SOURCE_PORT_MASK  GENMASK(2, 0)
+
+#define QCA_HDR_XMIT_VERSION_MASK  GENMASK(15, 14)
+#define QCA_HDR_XMIT_VERSION_S     14
+#define QCA_HDR_XMIT_PRIORITY_MASK GENMASK(13, 11)
+#define QCA_HDR_XMIT_PRIORITY_S    11
+#define QCA_HDR_XMIT_CONTROL_MASK  GENMASK(10, 8)
+#define QCA_HDR_XMIT_CONTROL_S     8
+#define QCA_HDR_XMIT_FROM_CPU  BIT(7)
+#define QCA_HDR_XMIT_DP_BIT_MASK   GENMASK(6, 0)
+
+static struct sk_buff *qca_tag_xmit(struct sk_buff *skb, struct net_device *dev)
+{
+   struct dsa_slave_priv *p = netdev_priv(dev);
+   u16 *phdr, hdr;
+
+   dev->stats.tx_packets++;
+   dev->stats.tx_bytes += skb->len;
+
+   if (skb_cow_head(skb, 0) < 0)
+   goto out_free;
+
+   skb_push(skb, QCA_HDR_LEN);
+
+   memmove(skb->data, skb->data + QCA_HDR_LEN, 2 * ETH_ALEN);
+   phdr = (u16 *)(skb->data + 2 * ETH_ALEN);
+
+   /* Set the version field, and set destination port information */
+   hdr = QCA_HDR_VERSION << QCA_HDR_XMIT_VERSION_S |
+   QCA_HDR_XMIT_FROM_CPU |
+   BIT(p->port);
+
+   *phdr = htons(hdr);
+
+   return skb;
+
+out_free:
+   kfree_skb(skb);
+   return NULL;
+}
+
+static int qca_tag_rcv(struct sk_buff *skb, struct net_device *dev,
+  struct packet_type *pt, s

Re: [PATCH net-next v2 3/5] cxgb4: add parser to translate u32 filters to internal spec

2016-09-15 Thread John Fastabend
On 16-09-13 04:42 AM, Rahul Lakkireddy wrote:
> Parse information sent by u32 into internal filter specification.
> Add support for parsing several fields in IPv4, IPv6, TCP, and UDP.
> 
> Signed-off-by: Rahul Lakkireddy 
> Signed-off-by: Hariprasad Shenai 
> ---

Looks good to me. Also curious if you would find it worthwhile to
have a cls_u32 mode that starts at L2 instead of the IP header? The
use case would be to use cls_u32 with various encapsulation protocols
in front of the IP header.

Reviewed-by: John Fastabend 



[PATCH V3 3/3] net-next: dsa: add new driver for qca8xxx family

2016-09-15 Thread John Crispin
This patch contains initial support for the QCA8337 switch. It
will detect a QCA8337 switch, if present and declared in the DT.

Each port will be represented through a standalone net_device interface,
as for other DSA switches. The CPU can communicate with any of the ports
by setting an IP address on the ethN interface. Most of the extra
callbacks of the DSA subsystem are already supported, such as bridge
offloading, STP, and fdb.

Signed-off-by: John Crispin 
---
Changes in V2
* add proper locking for the FDB table
* remove udelay when changing the page. neither datasheet nor SDK code
  requires a sleep
* add a cond_resched to the busy wait loop
* use nested locking when accessing the mdio bus
* remove the phy_to_port() wrappers
* remove mmd access function and use existing phy helpers
* fix a copy/paste bug breaking the eee callbacks
* use default vid 1 when fdb entries are added for vid 0
* remove the phy id check and add a switch id check instead
* add error handling to the mdio read/write functions
* remove inline usage

Changes in V3
* remove qca8k_to_priv() helper
* turn fdb mutex into a generic register mutex
* verify that port0 is the cpu port
* flush MIB counters on startup
* more PHY->port
* implement fdb_prepare properly
* power ports up/down on PM event and shutdown

 drivers/net/dsa/Kconfig  |9 +
 drivers/net/dsa/Makefile |1 +
 drivers/net/dsa/qca8k.c  | 1060 ++
 drivers/net/dsa/qca8k.h  |  185 
 4 files changed, 1255 insertions(+)
 create mode 100644 drivers/net/dsa/qca8k.c
 create mode 100644 drivers/net/dsa/qca8k.h

diff --git a/drivers/net/dsa/Kconfig b/drivers/net/dsa/Kconfig
index de6d044..0659846 100644
--- a/drivers/net/dsa/Kconfig
+++ b/drivers/net/dsa/Kconfig
@@ -25,4 +25,13 @@ source "drivers/net/dsa/b53/Kconfig"
 
 source "drivers/net/dsa/mv88e6xxx/Kconfig"
 
+config NET_DSA_QCA8K
+   tristate "Qualcomm Atheros QCA8K Ethernet switch family support"
+   depends on NET_DSA
+   select NET_DSA_TAG_QCA
+   select REGMAP
+   ---help---
+ This enables support for the Qualcomm Atheros QCA8K Ethernet
+ switch chips.
+
 endmenu
diff --git a/drivers/net/dsa/Makefile b/drivers/net/dsa/Makefile
index ca1e71b..8346e4f 100644
--- a/drivers/net/dsa/Makefile
+++ b/drivers/net/dsa/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_NET_DSA_MV88E6060) += mv88e6060.o
 obj-$(CONFIG_NET_DSA_BCM_SF2)  += bcm_sf2.o
+obj-$(CONFIG_NET_DSA_QCA8K)+= qca8k.o
 
 obj-y  += b53/
 obj-y  += mv88e6xxx/
diff --git a/drivers/net/dsa/qca8k.c b/drivers/net/dsa/qca8k.c
new file mode 100644
index 000..7f3f178
--- /dev/null
+++ b/drivers/net/dsa/qca8k.c
@@ -0,0 +1,1060 @@
+/*
+ * Copyright (C) 2009 Felix Fietkau 
+ * Copyright (C) 2011-2012 Gabor Juhos 
+ * Copyright (c) 2015, The Linux Foundation. All rights reserved.
+ * Copyright (c) 2016 John Crispin 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "qca8k.h"
+
+#define MIB_DESC(_s, _o, _n)   \
+   {   \
+   .size = (_s),   \
+   .offset = (_o), \
+   .name = (_n),   \
+   }
+
+static const struct qca8k_mib_desc ar8327_mib[] = {
+   MIB_DESC(1, 0x00, "RxBroad"),
+   MIB_DESC(1, 0x04, "RxPause"),
+   MIB_DESC(1, 0x08, "RxMulti"),
+   MIB_DESC(1, 0x0c, "RxFcsErr"),
+   MIB_DESC(1, 0x10, "RxAlignErr"),
+   MIB_DESC(1, 0x14, "RxRunt"),
+   MIB_DESC(1, 0x18, "RxFragment"),
+   MIB_DESC(1, 0x1c, "Rx64Byte"),
+   MIB_DESC(1, 0x20, "Rx128Byte"),
+   MIB_DESC(1, 0x24, "Rx256Byte"),
+   MIB_DESC(1, 0x28, "Rx512Byte"),
+   MIB_DESC(1, 0x2c, "Rx1024Byte"),
+   MIB_DESC(1, 0x30, "Rx1518Byte"),
+   MIB_DESC(1, 0x34, "RxMaxByte"),
+   MIB_DESC(1, 0x38, "RxTooLong"),
+   MIB_DESC(2, 0x3c, "RxGoodByte"),
+   MIB_DESC(2, 0x44, "RxBadByte"),
+   MIB_DESC(1, 0x4c, "RxOverFlow"),
+   MIB_DESC(1, 0x50, "Filtered"),
+   MIB_DESC(1, 0x54, "TxBroad"),
+   MIB_DESC(1, 0x58, "TxPause"),
+   MIB_DESC(1, 0x5c, "TxMulti"),
+   MIB_DESC(1, 0x60, "TxUnderRun"),
+   MIB_DESC(1, 0x64, "Tx64Byte"),
+   MIB_DESC(1, 0x68, "Tx128Byte"),
+   MIB_DESC(1, 0x6c, "Tx256Byte"),
+   MIB_DESC(1, 0x70, "Tx512Byte"),
+   MIB_DESC(1, 0x74, "Tx1024Byte"),
+   MIB_DESC(1, 0x78, "Tx1518Byte"),
+   MIB_DESC(1, 0x7c, "TxMaxByte"),
+   MIB_DESC(1, 0x80, "TxOverSize"),
+   MIB_

[PATCH V3 1/3] Documentation: devicetree: add qca8k binding

2016-09-15 Thread John Crispin
Add device-tree binding for ar8xxx switch families.

Cc: devicet...@vger.kernel.org
Signed-off-by: John Crispin 
---
Changes in V2
* fixup example to include phy nodes and corresponding phandles
* add a note explaining why we need the phy nodes

Changes in V3
* add note stating that the cpu port is always 0

 .../devicetree/bindings/net/dsa/qca8k.txt  |   89 
 1 file changed, 89 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/dsa/qca8k.txt

diff --git a/Documentation/devicetree/bindings/net/dsa/qca8k.txt 
b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
new file mode 100644
index 000..9c67ee4
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/dsa/qca8k.txt
@@ -0,0 +1,89 @@
+* Qualcomm Atheros QCA8xxx switch family
+
+Required properties:
+
+- compatible: should be "qca,qca8337"
+- #size-cells: must be 0
+- #address-cells: must be 1
+
+Subnodes:
+
+The integrated switch subnode should be specified according to the binding
+described in dsa/dsa.txt. As the QCA8K switches do not have a N:N mapping of
+port and PHY id, each subnode describing a port needs to have a valid phandle
+referencing the internal PHY connected to it. The CPU port of this switch is
+always port 0.
+
+Example:
+
+
+   &mdio0 {
+   phy_port1: phy@0 {
+   reg = <0>;
+   };
+
+   phy_port2: phy@1 {
+   reg = <1>;
+   };
+
+   phy_port3: phy@2 {
+   reg = <2>;
+   };
+
+   phy_port4: phy@3 {
+   reg = <3>;
+   };
+
+   phy_port5: phy@4 {
+   reg = <4>;
+   };
+
+   switch0@0 {
+   compatible = "qca,qca8337";
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   reg = <0>;
+
+   ports {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   port@0 {
+   reg = <0>;
+   label = "cpu";
+   ethernet = <&gmac1>;
+   phy-mode = "rgmii";
+   };
+
+   port@1 {
+   reg = <1>;
+   label = "lan1";
+   phy-handle = <&phy_port1>;
+   };
+
+   port@2 {
+   reg = <2>;
+   label = "lan2";
+   phy-handle = <&phy_port2>;
+   };
+
+   port@3 {
+   reg = <3>;
+   label = "lan3";
+   phy-handle = <&phy_port3>;
+   };
+
+   port@4 {
+   reg = <4>;
+   label = "lan4";
+   phy-handle = <&phy_port4>;
+   };
+
+   port@5 {
+   reg = <5>;
+   label = "wan";
+   phy-handle = <&phy_port5>;
+   };
+   };
+   };
+   };
-- 
1.7.10.4



Re: [PATCH 3/3] mm: memcontrol: consolidate cgroup socket tracking

2016-09-15 Thread Johannes Weiner
On Wed, Sep 14, 2016 at 03:17:14PM -0700, Andrew Morton wrote:
> On Thu, 15 Sep 2016 13:34:24 +0800 kbuild test robot  wrote:
> 
> > Hi Johannes,
> > 
> > [auto build test ERROR on net/master]
> > [also build test ERROR on v4.8-rc6 next-20160914]
> > [if your patch is applied to the wrong git tree, please drop us a note to 
> > help improve the system]
> > [Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto 
> > for convenience) to record what (public, well-known) commit your patch 
> > series was built on]
> > [Check https://git-scm.com/docs/git-format-patch for more information]
> > 
> > url:
> > https://github.com/0day-ci/linux/commits/Johannes-Weiner/mm-memcontrol-make-per-cpu-charge-cache-IRQ-safe-for-socket-accounting/20160915-035634
> > config: m68k-sun3_defconfig (attached as .config)
> > compiler: m68k-linux-gcc (GCC) 4.9.0
> > reproduce:
> > wget 
> > https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
> >  -O ~/bin/make.cross
> > chmod +x ~/bin/make.cross
> > # save the attached .config to linux build tree
> > make.cross ARCH=m68k 
> > 
> > All errors (new ones prefixed by >>):
> > 
> >net/built-in.o: In function `sk_alloc':
> > >> (.text+0x4076): undefined reference to `mem_cgroup_sk_alloc'
> >net/built-in.o: In function `__sk_destruct':
> > >> sock.c:(.text+0x457e): undefined reference to `mem_cgroup_sk_free'
> >net/built-in.o: In function `sk_clone_lock':
> >(.text+0x4f1c): undefined reference to `mem_cgroup_sk_alloc'
> 
> This?

Thanks for fixing it up, Andrew.

I think it'd be nicer to declare the dummy functions for !CONFIG_MEMCG;
it also doesn't look like a hotpath that would necessitate the jump
label in that place. Dave, any preference either way?

Signed-off-by: Johannes Weiner 
---

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index ca11b3e6dd65..61d20c17f3b7 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -773,13 +773,13 @@ static inline void mem_cgroup_wb_stats(struct 
bdi_writeback *wb,
 #endif /* CONFIG_CGROUP_WRITEBACK */
 
 struct sock;
-void mem_cgroup_sk_alloc(struct sock *sk);
-void mem_cgroup_sk_free(struct sock *sk);
 bool mem_cgroup_charge_skmem(struct mem_cgroup *memcg, unsigned int nr_pages);
 void mem_cgroup_uncharge_skmem(struct mem_cgroup *memcg, unsigned int 
nr_pages);
 #ifdef CONFIG_MEMCG
 extern struct static_key_false memcg_sockets_enabled_key;
 #define mem_cgroup_sockets_enabled 
static_branch_unlikely(&memcg_sockets_enabled_key)
+void mem_cgroup_sk_alloc(struct sock *sk);
+void mem_cgroup_sk_free(struct sock *sk);
 static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && memcg->tcpmem_pressure)
@@ -792,6 +792,8 @@ static inline bool mem_cgroup_under_socket_pressure(struct 
mem_cgroup *memcg)
 }
 #else
 #define mem_cgroup_sockets_enabled 0
+static inline void mem_cgroup_sk_alloc(struct sock *sk) { };
+static inline void mem_cgroup_sk_free(struct sock *sk) { };
 static inline bool mem_cgroup_under_socket_pressure(struct mem_cgroup *memcg)
 {
return false;


Re: [4.4-RT PATCH RFC/RFT] drivers: net: cpsw: mark rx/tx irq as IRQF_NO_THREAD

2016-09-15 Thread Sebastian Andrzej Siewior
On 2016-09-09 15:46:44 [+0300], Grygorii Strashko wrote:
> 
> It looks like the scheduler is playing ping-pong between CPUs with the
> threaded irqs irq/354-355.
> And it seems this might be the case - if I pin both threaded IRQ handlers to
> CPU0 I can see better latency and a netperf improvement
> cyclictest -m -Sp98 -q  -D4m
> T: 0 ( 1318) P:98 I:1000 C: 24 Min:  9 Act:   14 Avg:   15 Max:   42
> T: 1 ( 1319) P:98 I:1500 C: 159909 Min:  9 Act:   14 Avg:   16 Max:   39
> 
> if I arrange hwirqs and pin both threaded IRQ handlers on CPU1
> I can observe more or less similar results as with this patch.

so no patch then.

> with this change I do not see "NOHZ: local_softirq_pending 80" any more
> Tested-by: Grygorii Strashko

okay. So I need to think what I do about this. Either this or trying to
run the "higher" softirq first but this could break things.
Thanks for the confirmation.

> > - having the hard-IRQ and IRQ-thread on the same CPU might help, too. It
> >   is not strictly required but saves a few cycles if you don't have to
> >   perform cross CPU wake ups and migrate task forth and back. The latter
> >   happens at prio 99.
> 
> I've experimented with this and it improves netperf, and I also followed
> the instructions from [1].
> But it seems I messed up pinning the threaded irqs to a cpu.
> [1] 
> https://www.osadl.org/Real-time-Ethernet-UDP-worst-case-roun.qa-farm-rt-ethernet-udp-monitor.0.html

There is irq_thread() => irq_thread_check_affinity(). It might not work
as expected on ARM but it makes sense for the thread to follow the
affinity mask of the HW irq.

> > - I am not sure NAPI works as expected. I would assume so. There is IRQ
> >   354 and 355 which fire after each other. One would be enough I guess.
> >   And they seem to be short living / fire often. If NAPI works then it
> >   should put an end to it and push it to the softirq thread.
> >   If you have IRQ-pacing support I suggest to use something like 10ms or
> >   so. That means your ping response will go from <= 1ms to 10ms in the
> >   worst case but since you process more packets at a time your
> >   throughput should increase.
> >   If I count this correctly, it took you almost 4ms from "raise SCHED" to
> >   "try process SCHED" and most of the time was spent in 35[45] hard irq,
> >   raise NET_RX or cross wakeup the IRQ thread.
> 
> The question I have to deal with is why switching to RT causes such a
> significant netperf drop (without additional tuning) compared to vanilla -
> ~120% for 256K and ~200% for 128K windows?

You have a sched / thread ping/pong. That is one thing. !RT with
threaded irqs should show similar problems. The higher latency is caused
by the migration thread.

> It's of course expected to see a netperf drop, but I assumed not such a
> significant one :(
> And I can't find any reports or statistic related to this. Does the same 
> happen on x86?

It should. Maybe at a lower level if it handles migration more
effectively. There is this watchdog thread (for instance) which tries to
detect lockups and runs at P99. It causes "worse" cyclictest numbers on
x86 and on ARM but on ARM this is more visible than on x86.

Sebastian


Re: [net-next,RFC,1/2] fib: introduce fib notification infrastructure

2016-09-15 Thread Andy Gospodarek
On Tue, Sep 6, 2016 at 8:01 AM, Jiri Pirko
 wrote:
> From: Jiri Pirko 
>
> This allows to pass information about added/deleted fib entries to
> whoever is interested. This is done in a very similar way as devinet
> notifies address additions/removals.

(Sorry for the delayed response here...)

I had tried a slightly different approach, but this one also seems
reasonable and possibly better -- especially if this can be made more
generic and shared between ipv4 and ipv6 despite their inherent
differences.

What I did differently was make a more ipv4-specific change to start
with that did this:

+#define RTNH_F_MODIFIED        (1 << 7)  /* used for internal kernel tracking */
+
+#define RTNH_F_COMPARE_MASK    (RTNH_F_DEAD | \
+                                RTNH_F_LINKDOWN | \
+                                RTNH_F_MODIFIED) /* used as mask for route comparisons */

Then in various cases where the route was modified (fib_sync_up, etc),
I added this:

+nexthop_nh->nh_flags |= RTNH_F_MODIFIED;

Checking for the modified flag was then done in fib_table_update().
This new function was a rewrite of fib_table_flush() and checks for
RTNH_F_MODIFIED were done there before calling switchdev infra and
then announcing new routes if routes changed.

The main issue I see right now is that neither userspace nor switchdev
are notified when a route flag changes.  This needs to be resolved.

I think this RFC is along the proper path to provide notification, but
I'm not sure that notification will happen when flags change (most
notably the LINKDOWN flag) and there are some other corner cases that
could probably be covered as well.

I need to forward-port my patch from where it was to the latest
net-next and see if these cases I was concerned about were still an
issue.  I'm happy to do that and see if we can put this all together
to fix a few of the outstanding issues.


>
> Signed-off-by: Jiri Pirko 
> ---
>  include/net/ip_fib.h | 19 +++
>  net/ipv4/fib_trie.c  | 43 +++
>  2 files changed, 62 insertions(+)
>
> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
> index 4079fc1..9ad7ba9 100644
> --- a/include/net/ip_fib.h
> +++ b/include/net/ip_fib.h
> @@ -22,6 +22,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  struct fib_config {
> u8  fc_dst_len;
> @@ -184,6 +185,24 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct 
> fib_nh *nh);
>  #define FIB_RES_PREFSRC(net, res)  ((res).fi->fib_prefsrc ? : \
>  FIB_RES_SADDR(net, res))
>
> +struct fib_notifier_info {
> +   u32 dst;
> +   int dst_len;
> +   struct fib_info *fi;
> +   u8 tos;
> +   u8 type;
> +   u32 tb_id;
> +   u32 nlflags;
> +};
> +
> +enum fib_event_type {
> +   FIB_EVENT_TYPE_ADD,
> +   FIB_EVENT_TYPE_DEL,
> +};
> +
> +int register_fib_notifier(struct notifier_block *nb);
> +int unregister_fib_notifier(struct notifier_block *nb);
> +
>  struct fib_table {
> struct hlist_node   tb_hlist;
> u32 tb_id;
> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
> index e2ffc2a..19ec471 100644
> --- a/net/ipv4/fib_trie.c
> +++ b/net/ipv4/fib_trie.c
> @@ -73,6 +73,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -84,6 +85,36 @@
>  #include 
>  #include "fib_lookup.h"
>
> +static BLOCKING_NOTIFIER_HEAD(fib_chain);
> +
> +int register_fib_notifier(struct notifier_block *nb)
> +{
> +   return blocking_notifier_chain_register(&fib_chain, nb);
> +}
> +EXPORT_SYMBOL(register_fib_notifier);
> +
> +int unregister_fib_notifier(struct notifier_block *nb)
> +{
> +   return blocking_notifier_chain_unregister(&fib_chain, nb);
> +}
> +EXPORT_SYMBOL(unregister_fib_notifier);
> +
> +static int call_fib_notifiers(enum fib_event_type event_type, u32 dst,
> + int dst_len, struct fib_info *fi,
> + u8 tos, u8 type, u32 tb_id, u32 nlflags)
> +{
> +   struct fib_notifier_info info = {
> +   .dst = dst,
> +   .dst_len = dst_len,
> +   .fi = fi,
> +   .tos = tos,
> +   .type = type,
> +   .tb_id = tb_id,
> +   .nlflags = nlflags,
> +   };
> +   return blocking_notifier_call_chain(&fib_chain, event_type, &info);
> +}
> +
>  #define MAX_STAT_DEPTH 32
>
>  #define KEYLENGTH  (8*sizeof(t_key))
> @@ -1190,6 +1221,10 @@ int fib_table_insert(struct fib_table *tb, struct 
> fib_config *cfg)
> fib_release_info(fi_drop);
> if (state & FA_S_ACCESSED)
> rt_cache_flush(cfg->fc_nlinfo.nl_net);
> +
> +   call_fib_notifiers(FIB_EVENT_TYPE_ADD, key, plen, fi,
> +  new_fa->fa_tos, cfg->fc_type,

Re: [PATCH RFC 01/11] net/mlx5e: Single flow order-0 pages for Striding RQ

2016-09-15 Thread Tariq Toukan

Hi Jesper,


On 07/09/2016 10:18 PM, Jesper Dangaard Brouer wrote:

On Wed,  7 Sep 2016 15:42:22 +0300 Saeed Mahameed  wrote:


From: Tariq Toukan 

To improve the memory consumption scheme, we omit the flow that
demands and splits high-order pages in Striding RQ, and stay
with a single Striding RQ flow that uses order-0 pages.

Thank you for doing this! MM-list people thank you!

Thanks. I've just submitted it to net-next.

For others to understand what this means:  This driver was doing
split_page() on high-order pages (for Striding RQ).  This was really bad
because it will cause fragmenting the page-allocator, and depleting the
high-order pages available quickly.

(I've left rest of patch intact below, if some MM people should be
interested in looking at the changes).

There is even a funny comment in split_page() relevant to this:

/* [...]
  * Note: this is probably too low level an operation for use in drivers.
  * Please consult with lkml before using this in your driver.
  */



Moving to fragmented memory allows the use of larger MPWQEs,
which reduces the number of UMR posts and filler CQEs.

Moving to a single flow allows several optimizations that improve
performance, especially in production servers where we would
anyway fallback to order-0 allocations:
- inline functions that were called via function pointers.
- improve the UMR post process.

This patch alone is expected to give a slight performance reduction.
However, the new memory scheme gives the possibility to use a page-cache
of a fair size, that doesn't inflate the memory footprint, which will
dramatically fix the reduction and even give a huge gain.

We ran pktgen single-stream benchmarks, with iptables-raw-drop:

Single stride, 64 bytes:
* 4,739,057 - baseline
* 4,749,550 - this patch
no reduction

Larger packets, no page cross, 1024 bytes:
* 3,982,361 - baseline
* 3,845,682 - this patch
3.5% reduction

Larger packets, every 3rd packet crosses a page, 1500 bytes:
* 3,731,189 - baseline
* 3,579,414 - this patch
4% reduction


Well, the reduction does not really matter than much, because your
baseline benchmarks are from a freshly booted system, where you have
not fragmented and depleted the high-order pages yet... ;-)
Indeed. On fragmented systems we'll get a gain, even w/o the page-cache 
mechanism, as no time is wasted looking for high-order-pages.




Fixes: 461017cb006a ("net/mlx5e: Support RX multi-packet WQE (Striding RQ)")
Fixes: bc77b240b3c5 ("net/mlx5e: Add fragmented memory support for RX multi packet 
WQE")
Signed-off-by: Tariq Toukan 
Signed-off-by: Saeed Mahameed 
---
  drivers/net/ethernet/mellanox/mlx5/core/en.h   |  54 ++--
  drivers/net/ethernet/mellanox/mlx5/core/en_main.c  | 136 --
  drivers/net/ethernet/mellanox/mlx5/core/en_rx.c| 292 -
  drivers/net/ethernet/mellanox/mlx5/core/en_stats.h |   4 -
  drivers/net/ethernet/mellanox/mlx5/core/en_txrx.c  |   2 +-
  5 files changed, 184 insertions(+), 304 deletions(-)


Regards,
Tariq


Re: [net-next,RFC,1/2] fib: introduce fib notification infrastructure

2016-09-15 Thread Jiri Pirko
Thu, Sep 15, 2016 at 04:41:20PM CEST, a...@greyhouse.net wrote:
>On Tue, Sep 6, 2016 at 8:01 AM, Jiri Pirko
> wrote:
>> From: Jiri Pirko 
>>
>> This allows to pass information about added/deleted fib entries to
>> whoever is interested. This is done in a very similar way as devinet
>> notifies address additions/removals.
>
>(Sorry for the delayed response here...)
>
>I had tried a slightly different approach, but this one also seems
>reasonable and possibly better -- especially if this can be made more
>generic and shared between ipv4 and ipv6 despite their inherent
>differences.
>
>What I did differently was make a more ipv4-specific change to start
>with that did this:
>
>+#define RTNH_F_MODIFIED(1 << 7)/* used for
>internal kernel tracking */
>+
>+#define RTNH_F_COMPARE_MASK(RTNH_F_DEAD | \
>+RTNH_F_LINKDOWN | \
>+RTNH_F_MODIFIED) /* used as mask for
>route comparisons */
>
>Then in various cases where the route was modified (fib_sync_up, etc),
>I added this:
>
>+nexthop_nh->nh_flags |= RTNH_F_MODIFIED;
>
>Checking for the modified flag was then done in fib_table_update().
>This new function was a rewrite of fib_table_flush() and checks for
>RTNH_F_MODIFIED were done there before calling switchdev infra and
>then announce new routes if routes changed.
>
>The main issue I see right now is that neither userspace nor switchdev
>are notified when a route flag changes.  This needs to be resolved.
>
>I think this RFC is along the proper path to provide notification, but
>I'm not sure that notification will happen when flags change (most
>notably the LNKDOWN flag) and there are some other corner cases that
>could probably be covered as well.
>
>I need to forward-port my patch from where it was to the latest
>net-next and see if these cases I was concerned about were still an
>issue.  I'm happy to do that and see if we can put this all together
>to fix a few of the outstanding issues.

I believe that "modify" can be easily another fib event. Drivers can
react accordingly. I'm close to sending v1 (hopefully tomorrow). I
believe you can base your patchset on top of mine which saves you lot of
time.


>
>
>>
>> Signed-off-by: Jiri Pirko 
>> ---
>>  include/net/ip_fib.h | 19 +++
>>  net/ipv4/fib_trie.c  | 43 +++
>>  2 files changed, 62 insertions(+)
>>
>> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
>> index 4079fc1..9ad7ba9 100644
>> --- a/include/net/ip_fib.h
>> +++ b/include/net/ip_fib.h
>> @@ -22,6 +22,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  struct fib_config {
>> u8  fc_dst_len;
>> @@ -184,6 +185,24 @@ __be32 fib_info_update_nh_saddr(struct net *net, struct 
>> fib_nh *nh);
>>  #define FIB_RES_PREFSRC(net, res)  ((res).fi->fib_prefsrc ? : \
>>  FIB_RES_SADDR(net, res))
>>
>> +struct fib_notifier_info {
>> +   u32 dst;
>> +   int dst_len;
>> +   struct fib_info *fi;
>> +   u8 tos;
>> +   u8 type;
>> +   u32 tb_id;
>> +   u32 nlflags;
>> +};
>> +
>> +enum fib_event_type {
>> +   FIB_EVENT_TYPE_ADD,
>> +   FIB_EVENT_TYPE_DEL,
>> +};
>> +
>> +int register_fib_notifier(struct notifier_block *nb);
>> +int unregister_fib_notifier(struct notifier_block *nb);
>> +
>>  struct fib_table {
>> struct hlist_node   tb_hlist;
>> u32 tb_id;
>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>> index e2ffc2a..19ec471 100644
>> --- a/net/ipv4/fib_trie.c
>> +++ b/net/ipv4/fib_trie.c
>> @@ -73,6 +73,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>  #include 
>>  #include 
>> @@ -84,6 +85,36 @@
>>  #include 
>>  #include "fib_lookup.h"
>>
>> +static BLOCKING_NOTIFIER_HEAD(fib_chain);
>> +
>> +int register_fib_notifier(struct notifier_block *nb)
>> +{
>> +   return blocking_notifier_chain_register(&fib_chain, nb);
>> +}
>> +EXPORT_SYMBOL(register_fib_notifier);
>> +
>> +int unregister_fib_notifier(struct notifier_block *nb)
>> +{
>> +   return blocking_notifier_chain_unregister(&fib_chain, nb);
>> +}
>> +EXPORT_SYMBOL(unregister_fib_notifier);
>> +
>> +static int call_fib_notifiers(enum fib_event_type event_type, u32 dst,
>> + int dst_len, struct fib_info *fi,
>> + u8 tos, u8 type, u32 tb_id, u32 nlflags)
>> +{
>> +   struct fib_notifier_info info = {
>> +   .dst = dst,
>> +   .dst_len = dst_len,
>> +   .fi = fi,
>> +   .tos = tos,
>> +   .type = type,
>> +   .tb_id = tb_id,
>> +   .nlflags = nlflags,
>> +   };
>> +   return blocking_notifier_call_chain(&fib_chain, event_type, &info);
>> +}
>> +
>>  #define MAX_STAT_DEPTH 32
>>
>>  #define KEYLENGTH  (8*sizeof(t_key))
>> @@ -

Re: [net-next,RFC,1/2] fib: introduce fib notification infrastructure

2016-09-15 Thread Andy Gospodarek
On Thu, Sep 15, 2016 at 10:45 AM, Jiri Pirko  wrote:
> Thu, Sep 15, 2016 at 04:41:20PM CEST, a...@greyhouse.net wrote:
>>On Tue, Sep 6, 2016 at 8:01 AM, Jiri Pirko
>> wrote:
>>> From: Jiri Pirko 
>>>
>>> This allows to pass information about added/deleted fib entries to
>>> whoever is interested. This is done in a very similar way as devinet
>>> notifies address additions/removals.
>>
>>(Sorry for the delayed response here...)
>>
>>I had tried a slightly different approach, but this one also seems
>>reasonable and possibly better -- especially if this can be made more
>>generic and shared between ipv4 and ipv6 despite their inherent
>>differences.
>>
>>What I did differently was make a more ipv4-specific change to start
>>with that did this:
>>
>>+#define RTNH_F_MODIFIED(1 << 7)/* used for
>>internal kernel tracking */
>>+
>>+#define RTNH_F_COMPARE_MASK(RTNH_F_DEAD | \
>>+RTNH_F_LINKDOWN | \
>>+RTNH_F_MODIFIED) /* used as mask for
>>route comparisons */
>>
>>Then in various cases where the route was modified (fib_sync_up, etc),
>>I added this:
>>
>>+nexthop_nh->nh_flags |= RTNH_F_MODIFIED;
>>
>>Checking for the modified flag was then done in fib_table_update().
>>This new function was a rewrite of fib_table_flush() and checks for
>>RTNH_F_MODIFIED were done there before calling switchdev infra and
>>then announce new routes if routes changed.
>>
>>The main issue I see right now is that neither userspace nor switchdev
>>are notified when a route flag changes.  This needs to be resolved.
>>
>>I think this RFC is along the proper path to provide notification, but
>>I'm not sure that notification will happen when flags change (most
>>notably the LNKDOWN flag) and there are some other corner cases that
>>could probably be covered as well.
>>
>>I need to forward-port my patch from where it was to the latest
>>net-next and see if these cases I was concerned about were still an
>>issue.  I'm happy to do that and see if we can put this all together
>>to fix a few of the outstanding issues.
>
> I believe that "modify" can be easily another fib event. Drivers can
> react accordingly. I'm close to sending v1 (hopefully tomorrow). I
> believe you can base your patchset on top of mine which saves you lot of
> time.
>

Sounds good -- looking forward to it.  If you add this email on the
cc-list rather than the other one, I'll see it more quickly this time.
:-)


>
>>
>>
>>>
>>> Signed-off-by: Jiri Pirko 
>>> ---
>>>  include/net/ip_fib.h | 19 +++
>>>  net/ipv4/fib_trie.c  | 43 +++
>>>  2 files changed, 62 insertions(+)
>>>
>>> diff --git a/include/net/ip_fib.h b/include/net/ip_fib.h
>>> index 4079fc1..9ad7ba9 100644
>>> --- a/include/net/ip_fib.h
>>> +++ b/include/net/ip_fib.h
>>> @@ -22,6 +22,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>
>>>  struct fib_config {
>>> u8  fc_dst_len;
>>> @@ -184,6 +185,24 @@ __be32 fib_info_update_nh_saddr(struct net *net, 
>>> struct fib_nh *nh);
>>>  #define FIB_RES_PREFSRC(net, res)  ((res).fi->fib_prefsrc ? : \
>>>  FIB_RES_SADDR(net, res))
>>>
>>> +struct fib_notifier_info {
>>> +   u32 dst;
>>> +   int dst_len;
>>> +   struct fib_info *fi;
>>> +   u8 tos;
>>> +   u8 type;
>>> +   u32 tb_id;
>>> +   u32 nlflags;
>>> +};
>>> +
>>> +enum fib_event_type {
>>> +   FIB_EVENT_TYPE_ADD,
>>> +   FIB_EVENT_TYPE_DEL,
>>> +};
>>> +
>>> +int register_fib_notifier(struct notifier_block *nb);
>>> +int unregister_fib_notifier(struct notifier_block *nb);
>>> +
>>>  struct fib_table {
>>> struct hlist_node   tb_hlist;
>>> u32 tb_id;
>>> diff --git a/net/ipv4/fib_trie.c b/net/ipv4/fib_trie.c
>>> index e2ffc2a..19ec471 100644
>>> --- a/net/ipv4/fib_trie.c
>>> +++ b/net/ipv4/fib_trie.c
>>> @@ -73,6 +73,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  #include 
>>>  #include 
>>>  #include 
>>> @@ -84,6 +85,36 @@
>>>  #include 
>>>  #include "fib_lookup.h"
>>>
>>> +static BLOCKING_NOTIFIER_HEAD(fib_chain);
>>> +
>>> +int register_fib_notifier(struct notifier_block *nb)
>>> +{
>>> +   return blocking_notifier_chain_register(&fib_chain, nb);
>>> +}
>>> +EXPORT_SYMBOL(register_fib_notifier);
>>> +
>>> +int unregister_fib_notifier(struct notifier_block *nb)
>>> +{
>>> +   return blocking_notifier_chain_unregister(&fib_chain, nb);
>>> +}
>>> +EXPORT_SYMBOL(unregister_fib_notifier);
>>> +
>>> +static int call_fib_notifiers(enum fib_event_type event_type, u32 dst,
>>> + int dst_len, struct fib_info *fi,
>>> + u8 tos, u8 type, u32 tb_id, u32 nlflags)
>>> +{
>>> +   struct fib_notifier_info info = {
>>> +   .dst = dst,
>>> +   .dst_len = dst_len,
>>> +   

Re: [PATCH RFC 01/11] net/mlx5e: Single flow order-0 pages for Striding RQ

2016-09-15 Thread Tariq Toukan

Hi Alexei,

On 07/09/2016 8:31 PM, Alexei Starovoitov wrote:

On Wed, Sep 07, 2016 at 03:42:22PM +0300, Saeed Mahameed wrote:

From: Tariq Toukan 

To improve the memory consumption scheme, we omit the flow that
demands and splits high-order pages in Striding RQ, and stay
with a single Striding RQ flow that uses order-0 pages.

Moving to fragmented memory allows the use of larger MPWQEs,
which reduces the number of UMR posts and filler CQEs.

Moving to a single flow allows several optimizations that improve
performance, especially in production servers where we would
anyway fallback to order-0 allocations:
- inline functions that were called via function pointers.
- improve the UMR post process.

This patch alone is expected to give a slight performance reduction.
However, the new memory scheme gives the possibility to use a page-cache
of a fair size, that doesn't inflate the memory footprint, which will
dramatically fix the reduction and even give a huge gain.

We ran pktgen single-stream benchmarks, with iptables-raw-drop:

Single stride, 64 bytes:
* 4,739,057 - baseline
* 4,749,550 - this patch
no reduction

Larger packets, no page cross, 1024 bytes:
* 3,982,361 - baseline
* 3,845,682 - this patch
3.5% reduction

Larger packets, every 3rd packet crosses a page, 1500 bytes:
* 3,731,189 - baseline
* 3,579,414 - this patch
4% reduction

imo it's not a realistic use case, but would be good to mention that
patch 3 brings performance back for this use case anyway.
Exactly, that's what I meant in the previous paragraph (".. will 
dramatically fix the reduction and even give a huge gain.")

Regards,
Tariq


pull-request: wireless-drivers-next 2016-09-15

2016-09-15 Thread Kalle Valo
Hi Dave,

here's the first pull request for 4.9. The ones I want to point out are
the FIELD_PREP() and FIELD_GET() macros added to bitfield.h, which are
reviewed by Linus, and make it possible to remove util.h from mt7601u.

Also we have new HW support to various drivers and other smaller
features, the signed tag below contains more information. And I pulled
my ath-current (uses older net tree as the baseline) branch to fix a
conflict in ath10k.

Once again the diffstat from git request-pull was wrong. I fixed it by
manually copying the diffstat from a test pull against net-next, so
everything should be ok. But please let me know if there are any
problems.

Kalle

The following changes since commit e34f2ff40e0339f6a379e1ecf49e8f2759056453:

  ath9k: bring back direction setting in ath9k_{start_stop} (2016-09-07 
16:21:04 +0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git 
tags/wireless-drivers-next-for-davem-2016-09-15

for you to fetch changes up to b7450e248d71067e0c1a09614cf3d7571f7e10fa:

  mwifiex: firmware name correction for usb8997 chipset (2016-09-14 20:02:14 
+0300)


wireless-drivers-next patches for 4.9

Major changes:

iwlwifi

* preparation for new a000 HW continues
* some DQA improvements
* add support for GMAC
* add support for 9460, 9270 and 9170 series

mwifiex

* support random MAC address for scanning
* add HT aggregation support for adhoc mode
* add custom regulatory domain support
* add manufacturing mode support via nl80211 testmode interface

bcma

* support BCM53573 series of wireless SoCs

bitfield.h

* add FIELD_PREP() and FIELD_GET() macros

mt7601u

* convert to use the new bitfield.h macros

brcmfmac

* add support for bcm4339 chip with modalias sdio:c00v02D0d4339

ath10k

* add nl80211 testmode support for 10.4 firmware
* hide kernel addresses from logs using %pK format specifier
* implement NAPI support
* enable peer stats by default

ath9k

* use ieee80211_tx_status_noskb where possible

wil6210

* extract firmware capabilities from the firmware file

ath6kl

* enable firmware crash dumps on the AR6004

ath-current is also merged to fix a conflict in ath10k.


Amitkumar Karwar (8):
  mwifiex: fix failed to reconnect after interface disabled/enabled
  mwifiex: remove misleading disconnect message
  mwifiex: add CHAN_REGION_CFG command
  mwifiex: add custom regulatory domain support
  mwifiex: add PCIe function level reset support
  mwifiex: PCIe8997 chip specific handling
  mwifiex: handle error if IRQ request fails in mwifiex_sdio_of()
  mwifiex: correction in Rx STBC field of htcapinfo

Arend Van Spriel (2):
  brcmfmac: add support for bcm4339 chip with modalias sdio:c00v02D0d4339
  brcmfmac: sdio: shorten retry loop in brcmf_sdio_kso_control()

Arnd Bergmann (1):
  bcma: use of_dma_configure() to set initial dma mask

Ashok Raj Nagarajan (2):
  ath10k: fix sending frame in management path in push txq logic
  ath10k: fix reporting channel survey data

Ayala Beker (1):
  iwlwifi: mvm: support GMAC protocol

Baoyou Xie (2):
  ath9k: mark ath_fill_led_pin() static
  brcmfmac: add missing header dependencies

Ben Greear (1):
  ath10k: improve logging message

Bob Copeland (2):
  ath9k: fix misleading indent
  ath9k: remove repetitions of mask array size

Chaehyun Lim (1):
  ath10k: remove unused variable ar_pci

Christian Engelmayer (2):
  rtlwifi: rtl8192de: Fix leak in _rtl92de_read_adapter_info()
  rtlwifi: rtl8723ae: Fix leak in _rtl8723e_read_adapter_info()

Christophe Jaillet (4):
  mwifiex: fix the length parameter of a memset
  mwifiex: simplify length computation for some memset
  rt2x00usb: Fix error return code
  mwifiex: scan: Simplify code

Colin Ian King (5):
  ath10k: fix spelling mistake "montior" -> "monitor"
  mwifiex: fix missing break on IEEE80211_STYPE_ACTION case
  zd1211rw: fix spelling mistake "firmeware" -> "firmware"
  ath10k: fix memory leak on caldata on error exit path
  rtl8xxxu: fix spelling mistake "firmare" -> "firmware"

Dan Kephart (1):
  ath6kl: enable firmware crash dumps on the AR6004

Daniel Wagner (2):
  ath10k: use complete() instead complete_all()
  carl9170: Fix wrong completion usage

Eduardo Abinader (1):
  ath9k: consider return code on

Eric Bentley (1):
  ath6kl: Allow the radio to report 0 dbm txpower without timing out

Felix Fietkau (2):
  ath9k: use ieee80211_tx_status_noskb where possible
  ath9k: improve powersave filter handling

Ganapathi Bhat (4):
  mwifiex: support random MAC address for scanning
  mwifiex: fix radar detection issue
  mwifiex: Command 7 handling for USB chipsets
  mwifiex: firmware name correction for usb8997 chipset

Guy Mishol (1):
  

[PATCH v2] xen-netback: fix error handling on netback_probe()

2016-09-15 Thread Filipe Manco
In case of error during netback_probe() (e.g. an entry missing on the
xenstore) netback_remove() is called on the new device, which will set
the device backend state to XenbusStateClosed by calling
set_backend_state(). However, the backend state wasn't initialized by
netback_probe() at this point, which will cause an invalid transition
and set_backend_state() to BUG().

Initialize the backend state at the beginning of netback_probe() to
XenbusStateInitialising, and create two new valid state transitions on
set_backend_state(), from XenbusStateInitialising to XenbusStateClosed,
and from XenbusStateInitialising to XenbusStateInitWait.

Signed-off-by: Filipe Manco 
---
 drivers/net/xen-netback/xenbus.c | 46 ++--
 1 file changed, 30 insertions(+), 16 deletions(-)

diff --git a/drivers/net/xen-netback/xenbus.c b/drivers/net/xen-netback/xenbus.c
index 6a31f2610c23..daf4c7867102 100644
--- a/drivers/net/xen-netback/xenbus.c
+++ b/drivers/net/xen-netback/xenbus.c
@@ -271,6 +271,11 @@ static int netback_probe(struct xenbus_device *dev,
be->dev = dev;
dev_set_drvdata(&dev->dev, be);
 
+   be->state = XenbusStateInitialising;
+   err = xenbus_switch_state(dev, XenbusStateInitialising);
+   if (err)
+   goto fail;
+
sg = 1;
 
do {
@@ -383,11 +388,6 @@ static int netback_probe(struct xenbus_device *dev,
 
be->hotplug_script = script;
 
-   err = xenbus_switch_state(dev, XenbusStateInitWait);
-   if (err)
-   goto fail;
-
-   be->state = XenbusStateInitWait;
 
/* This kicks hotplug scripts, so do it immediately. */
err = backend_create_xenvif(be);
@@ -492,20 +492,20 @@ static inline void backend_switch_state(struct 
backend_info *be,
 
 /* Handle backend state transitions:
  *
- * The backend state starts in InitWait and the following transitions are
+ * The backend state starts in Initialising and the following transitions are
  * allowed.
  *
- * InitWait -> Connected
- *
- *    ^    \         |
- *    |     \        |
- *    |      \       |
- *    |       \      |
- *    |        \     |
- *    |         \    |
- *    |          V   V
+ * Initialising -> InitWait -> Connected
+ *          \
+ *           \        ^    \         |
+ *            \       |     \        |
+ *             \      |      \       |
+ *              \     |       \      |
+ *               \    |        \     |
+ *                \   |         \    |
+ *                 V  |          V   V
  *
- *  Closed  <-> Closing
+ *  Closed  <-> Closing
  *
  * The state argument specifies the eventual state of the backend and the
  * function transitions to that state via the shortest path.
@@ -515,6 +515,20 @@ static void set_backend_state(struct backend_info *be,
 {
while (be->state != state) {
switch (be->state) {
+   case XenbusStateInitialising:
+   switch (state) {
+   case XenbusStateInitWait:
+   case XenbusStateConnected:
+   case XenbusStateClosing:
+   backend_switch_state(be, XenbusStateInitWait);
+   break;
+   case XenbusStateClosed:
+   backend_switch_state(be, XenbusStateClosed);
+   break;
+   default:
+   BUG();
+   }
+   break;
case XenbusStateClosed:
switch (state) {
case XenbusStateInitWait:
-- 
2.7.4



[PATCH net] tcp: fix overflow in __tcp_retransmit_skb()

2016-09-15 Thread Eric Dumazet
From: Eric Dumazet 

If a TCP socket gets a large write queue, an overflow can happen
in a test in __tcp_retransmit_skb() preventing all retransmits.

The flow then stalls and resets after timeouts.

Tested:

sysctl -w net.core.wmem_max=10
netperf -H dest -- -s 10

Signed-off-by: Eric Dumazet 
---
 net/ipv4/tcp_output.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index bdaef7fd6e47..f53d0cca5fa4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2605,7 +2605,8 @@ int __tcp_retransmit_skb(struct sock *sk, struct sk_buff 
*skb, int segs)
 * copying overhead: fragmentation, tunneling, mangling etc.
 */
if (atomic_read(&sk->sk_wmem_alloc) >
-   min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
+   min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
+ sk->sk_sndbuf))
return -EAGAIN;
 
if (skb_still_in_host_queue(sk, skb))




Re: [net-next PATCH 00/11] iw_cxgb4,cxgbit: remove duplicate code

2016-09-15 Thread Varun Prakash
Hi Or,

On Wed, Sep 14, 2016 at 02:02:43PM +0530, Or Gerlitz wrote:
> On Tue, Sep 13, 2016 at 6:53 PM, Varun Prakash  wrote:
> > This patch series removes duplicate code from
> > iw_cxgb4 and cxgbit by adding common function definitions in libcxgb.
> 
> Is that a bunch of misc functionality, or can you provide a more
> high-level description of what you are cleaning out? Also, what other
> areas are you planning to refactor following the review comments we
> had on the target driver?

This patch series removes duplicate function definitions
that are used in connection management.
I am looking into more improvements in connection management and
will post the next series once it is ready. 

Thanks
Varun 


Re: [PATCH V3 1/3] Documentation: devicetree: add qca8k binding

2016-09-15 Thread Andrew Lunn
On Thu, Sep 15, 2016 at 04:26:39PM +0200, John Crispin wrote:
> Add device-tree binding for ar8xxx switch families.
> 
> Cc: devicet...@vger.kernel.org
> Signed-off-by: John Crispin 

Reviewed-by: Andrew Lunn 

Andrew


[PATCH net] net: avoid sk_forward_alloc overflows

2016-09-15 Thread Eric Dumazet
From: Eric Dumazet 

A malicious TCP receiver, sending SACK, can force the sender to split
skbs in write queue and increase its memory usage.

Then, when the socket is closed and its write queue purged, we might
overflow sk_forward_alloc (it becomes negative).

sk_mem_reclaim() does nothing in this case, and more than 2GB
are leaked from the TCP perspective (tcp_memory_allocated is not changed).

Then warnings trigger from inet_sock_destruct() and
sk_stream_kill_queues() seeing a non-zero sk_forward_alloc.

The whole TCP stack can be stuck because TCP is under memory pressure.

A simple fix is to preemptively reclaim from sk_mem_uncharge().

This makes sure a socket won't have more than 2 MB forward allocated
after a burst and an idle period.

Signed-off-by: Eric Dumazet 
---
 include/net/sock.h |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 
c797c57f4d9f6b2ef6cc23f1d63210cd41c8cff4..ebf75db08e062dfe7867cc80c7699f593be16349
 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1339,6 +1339,16 @@ static inline void sk_mem_uncharge(struct sock *sk, int 
size)
if (!sk_has_account(sk))
return;
sk->sk_forward_alloc += size;
+
+   /* Avoid a possible overflow.
+* TCP send queues can make this happen, if sk_mem_reclaim()
+* is not called and more than 2 GBytes are released at once.
+*
+* If we reach 2 MBytes, reclaim 1 MBytes right now, there is
+* no need to hold that much forward allocation anyway.
+*/
+   if (unlikely(sk->sk_forward_alloc >= 1 << 21))
+   __sk_mem_reclaim(sk, 1 << 20);
 }
 
 static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)




RE: [PATCH net] tcp: fix overflow in __tcp_retransmit_skb()

2016-09-15 Thread David Laight
From: Eric Dumazet
> Sent: 15 September 2016 16:13
> If a TCP socket gets a large write queue, an overflow can happen
> in a test in __tcp_retransmit_skb() preventing all retransmits.
...
>   if (atomic_read(&sk->sk_wmem_alloc) >
> - min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
> + min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> +   sk->sk_sndbuf))
>   return -EAGAIN;

Might it also be better to split that test to (say):

u32 wmem_alloc = atomic_read(&sk->sk_wmem_alloc);
if (unlikely(wmem_alloc > sk->sk_sndbuf))
return -EAGAIN;
if (unlikely(wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2)))
return -EAGAIN;

It might even be worth splitting the second test as:

if (unlikely(wmem_alloc > sk->sk_wmem_queued)
&& wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2))
return -EAGAIN;

David



[PATCH] mwifiex: fix memory leak on regd when chan is zero

2016-09-15 Thread Colin King
From: Colin Ian King 

When chan is zero mwifiex_create_custom_regdomain does not kfree
regd and we have a memory leak. Fix this by freeing regd before
the return.

Signed-off-by: Colin Ian King 
---
 drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c 
b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
index 3344a26..15a91f3 100644
--- a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
+++ b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
@@ -1049,8 +1049,10 @@ mwifiex_create_custom_regdomain(struct mwifiex_private 
*priv,
enum nl80211_band band;
 
chan = *buf++;
-   if (!chan)
+   if (!chan) {
+   kfree(regd);
return NULL;
+   }
chflags = *buf++;
band = (chan <= 14) ? NL80211_BAND_2GHZ : NL80211_BAND_5GHZ;
freq = ieee80211_channel_to_frequency(chan, band);
-- 
2.9.3



Re: [PATCH] mwifiex: fix null pointer deference when adapter is null

2016-09-15 Thread kbuild test robot
Hi Colin,

[auto build test WARNING on wireless-drivers-next/master]
[also build test WARNING on next-20160915]
[cannot apply to v4.8-rc6]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for 
convenience) to record what (public, well-known) commit your patch series was 
built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:
https://github.com/0day-ci/linux/commits/Colin-King/mwifiex-fix-null-pointer-deference-when-adapter-is-null/20160915-231625
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next.git 
master
config: x86_64-randconfig-x013-201637 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   drivers/net/wireless/marvell/mwifiex/main.c: In function 
'mwifiex_shutdown_sw':
>> drivers/net/wireless/marvell/mwifiex/main.c:1433:1: warning: label 
>> 'exit_remove' defined but not used [-Wunused-label]
exit_remove:
^~~
   Cyclomatic Complexity 5 include/linux/compiler.h:__read_once_size
   Cyclomatic Complexity 5 include/linux/compiler.h:__write_once_size
   Cyclomatic Complexity 2 arch/x86/include/asm/bitops.h:set_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:constant_test_bit
   Cyclomatic Complexity 1 arch/x86/include/asm/bitops.h:fls64
   Cyclomatic Complexity 1 include/linux/log2.h:__ilog2_u64
   Cyclomatic Complexity 1 include/linux/list.h:INIT_LIST_HEAD
   Cyclomatic Complexity 1 include/linux/list.h:list_empty
   Cyclomatic Complexity 1 include/asm-generic/getorder.h:__get_order
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_read
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_inc
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_dec
   Cyclomatic Complexity 1 arch/x86/include/asm/atomic.h:atomic_add_return
   Cyclomatic Complexity 1 include/linux/spinlock.h:spinlock_check
   Cyclomatic Complexity 1 include/linux/spinlock.h:spin_unlock_irqrestore
   Cyclomatic Complexity 1 include/linux/kasan.h:kasan_kmalloc
   Cyclomatic Complexity 28 include/linux/slab.h:kmalloc_index
   Cyclomatic Complexity 1 include/linux/slab.h:kmem_cache_alloc_trace
   Cyclomatic Complexity 1 include/linux/slab.h:kmalloc_order_trace
   Cyclomatic Complexity 68 include/linux/slab.h:kmalloc_large
   Cyclomatic Complexity 5 include/linux/slab.h:kmalloc
   Cyclomatic Complexity 1 include/linux/slab.h:kzalloc
   Cyclomatic Complexity 1 include/linux/skbuff.h:skb_end_pointer
   Cyclomatic Complexity 1 include/linux/skbuff.h:skb_queue_empty
   Cyclomatic Complexity 1 include/linux/skbuff.h:skb_shared
   Cyclomatic Complexity 1 include/linux/skbuff.h:skb_headroom
   Cyclomatic Complexity 1 include/linux/netdevice.h:netdev_get_tx_queue
   Cyclomatic Complexity 1 include/linux/netdevice.h:netdev_priv
   Cyclomatic Complexity 1 include/linux/netdevice.h:netif_tx_stop_queue
   Cyclomatic Complexity 1 include/linux/netdevice.h:netif_tx_queue_stopped
   Cyclomatic Complexity 1 include/linux/netdevice.h:netif_carrier_ok
   Cyclomatic Complexity 1 include/linux/etherdevice.h:is_multicast_ether_addr
   Cyclomatic Complexity 1 include/linux/etherdevice.h:ether_addr_copy
   Cyclomatic Complexity 1 include/linux/etherdevice.h:ether_addr_equal
   Cyclomatic Complexity 1 
include/linux/etherdevice.h:ether_addr_equal_unaligned
   Cyclomatic Complexity 1 
drivers/net/wireless/marvell/mwifiex/util.h:MWIFIEX_SKB_TXCB
   Cyclomatic Complexity 6 
drivers/net/wireless/marvell/mwifiex/main.h:mwifiex_get_priv
   Cyclomatic Complexity 1 
drivers/net/wireless/marvell/mwifiex/main.h:mwifiex_netdev_get_priv
   Cyclomatic Complexity 1 
drivers/net/wireless/marvell/mwifiex/main.h:mwifiex_is_skb_mgmt_frame
   Cyclomatic Complexity 1 
drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_get_stats
   Cyclomatic Complexity 1 include/linux/workqueue.h:queue_work
   Cyclomatic Complexity 2 
drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_queue_rx_work
   Cyclomatic Complexity 4 
drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_set_multicast_list
   Cyclomatic Complexity 1 
drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_netdev_select_wmm_queue
   Cyclomatic Complexity 1 include/linux/err.h:IS_ERR
   Cyclomatic Complexity 1 include/linux/timekeeping.h:ktime_get_real
   Cyclomatic Complexity 1 include/linux/skbuff.h:__net_timestamp
   Cyclomatic Complexity 1 
drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_open
   Cyclomatic Complexity 2 
drivers/net/wireless/marvell/mwifiex/util.h:MWIFIEX_SKB_RXCB
   Cyclomatic Complexity 1 include/linux/netdevice.h:dev_kfree_skb_any
   Cyclomatic Complexity 6 
drivers/net/wireless/marvell/mwifiex/main.c:mwifiex_unregister
   Cyclomatic Complexit

[PATCH net-next] tcp: prepare skbs for better sack shifting

2016-09-15 Thread Eric Dumazet
From: Eric Dumazet 

With large BDP TCP flows and lossy networks, it is very important
to keep a low number of skbs in the write queue.

RACK and SACK processing can perform a linear scan of it.

We should avoid putting any payload in skb->head, so that SACK
shifting can be done if needed.

With this patch, we can pack ~0.5 MB per skb instead of
the 64KB initially cooked at tcp_sendmsg() time.

This reduces the number of skbs in the write queue by a factor of eight.
tcp_rack_detect_loss() likes this.

We still allow payload in skb->head for first skb put in the queue,
to not impact RPC workloads.

Signed-off-by: Eric Dumazet 
Cc: Yuchung Cheng 
---
 net/ipv4/tcp.c |   31 ---
 1 file changed, 24 insertions(+), 7 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 
a13fcb369f52fe85def7c9d856259bc0509f3453..7dae800092e62cec330544851289d20a68642561
 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1020,17 +1020,31 @@ int tcp_sendpage(struct sock *sk, struct page *page, 
int offset,
 }
 EXPORT_SYMBOL(tcp_sendpage);
 
-static inline int select_size(const struct sock *sk, bool sg)
+/* Do not bother using a page frag for very small frames.
+ * But use this heuristic only for the first skb in write queue.
+ *
+ * Having no payload in skb->head allows better SACK shifting
+ * in tcp_shift_skb_data(), reducing sack/rack overhead, because
+ * write queue has less skbs.
+ * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
+ * This also speeds up tso_fragment(), since it wont fallback
+ * to tcp_fragment().
+ */
+static int linear_payload_sz(bool first_skb)
+{
+   if (first_skb)
+   return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
+   return 0;
+}
+
+static int select_size(const struct sock *sk, bool sg, bool first_skb)
 {
const struct tcp_sock *tp = tcp_sk(sk);
int tmp = tp->mss_cache;
 
if (sg) {
if (sk_can_gso(sk)) {
-   /* Small frames wont use a full page:
-* Payload will immediately follow tcp header.
-*/
-   tmp = SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
+   tmp = linear_payload_sz(first_skb);
} else {
int pgbreak = SKB_MAX_HEAD(MAX_TCP_HEADER);
 
@@ -1161,6 +1175,8 @@ restart:
}
 
if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
+   bool first_skb;
+
 new_segment:
/* Allocate new segment. If the interface is SG,
 * allocate skb fitting to single page.
@@ -1172,10 +1188,11 @@ new_segment:
process_backlog = false;
goto restart;
}
+   first_skb = skb_queue_empty(&sk->sk_write_queue);
skb = sk_stream_alloc_skb(sk,
- select_size(sk, sg),
+ select_size(sk, sg, 
first_skb),
  sk->sk_allocation,
- 
skb_queue_empty(&sk->sk_write_queue));
+ first_skb);
if (!skb)
goto wait_for_memory;
 




Re: [PATCH net] tcp: fix overflow in __tcp_retransmit_skb()

2016-09-15 Thread Eric Dumazet
On Thu, 2016-09-15 at 15:52 +, David Laight wrote:
> From: Eric Dumazet
> > Sent: 15 September 2016 16:13
> > If a TCP socket gets a large write queue, an overflow can happen
> > in a test in __tcp_retransmit_skb() preventing all retransmits.
> ...
> > if (atomic_read(&sk->sk_wmem_alloc) >
> > -   min(sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2), sk->sk_sndbuf))
> > +   min_t(u32, sk->sk_wmem_queued + (sk->sk_wmem_queued >> 2),
> > + sk->sk_sndbuf))
> > return -EAGAIN;
> 
> Might it also be better to split that test to (say):
> 
>   u32 wmem_alloc = atomic_read(&sk->sk_wmem_alloc);
>   if (unlikely(wmem_alloc > sk->sk_sndbuf))
>   return -EAGAIN;
>   if (unlikely(wmem_alloc > sk->sk_wmem_queued + (sk->sk_wmem_queued >> 
> 2)))
>   return -EAGAIN;

Well, I find the existing code more readable, but this is just an
opinion.

Thanks.




[PATCH] irda: Free skb on irda_accept error path.

2016-09-15 Thread Phil Turnbull
skb is not freed if newsk is NULL. Rework the error path so kfree_skb()
is unconditionally called on function exit.

Fixes: c3ea9fa27413 ("[IrDA] af_irda: IRDA_ASSERT cleanups")
Signed-off-by: Phil Turnbull 
---
 net/irda/af_irda.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 8d2f7c9b491d..ccc244406fb9 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -832,7 +832,7 @@ static int irda_accept(struct socket *sock, struct socket 
*newsock, int flags)
struct sock *sk = sock->sk;
struct irda_sock *new, *self = irda_sk(sk);
struct sock *newsk;
-   struct sk_buff *skb;
+   struct sk_buff *skb = NULL;
int err;
 
err = irda_create(sock_net(sk), newsock, sk->sk_protocol, 0);
@@ -900,7 +900,6 @@ static int irda_accept(struct socket *sock, struct socket 
*newsock, int flags)
err = -EPERM; /* value does not seem to make sense. -arnd */
if (!new->tsap) {
pr_debug("%s(), dup failed!\n", __func__);
-   kfree_skb(skb);
goto out;
}
 
@@ -919,7 +918,6 @@ static int irda_accept(struct socket *sock, struct socket 
*newsock, int flags)
/* Clean up the original one to keep it in listen state */
irttp_listen(self->tsap);
 
-   kfree_skb(skb);
sk->sk_ack_backlog--;
 
newsock->state = SS_CONNECTED;
@@ -927,6 +925,7 @@ static int irda_accept(struct socket *sock, struct socket 
*newsock, int flags)
irda_connect_response(new);
err = 0;
 out:
+   kfree_skb(skb);
release_sock(sk);
return err;
 }
-- 
2.9.0.rc2



[net-next PATCH] net: netlink messages for HW addr programming

2016-09-15 Thread Patrick Ruddy
Add RTM_NEWADDR and RTM_DELADDR netlink messages with family
AF_UNSPEC to indicate interest in specific unicast and multicast
hardware addresses. These messages are sent when addresses are
added or deleted from the appropriate interface driver.
An AF_UNSPEC GETADDR handler is also added so that the netlink
notifications can be replayed, avoiding loss of state due to
application start ordering or restart.

Signed-off-by: Patrick Ruddy 
---
 include/linux/netdevice.h |   1 +
 net/core/dev_addr_lists.c | 157 --
 net/core/rtnetlink.c  |   8 ++-
 3 files changed, 161 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2095b6a..2029618 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3751,6 +3751,7 @@ int dev_mc_sync_multiple(struct net_device *to, struct 
net_device *from);
 void dev_mc_unsync(struct net_device *to, struct net_device *from);
 void dev_mc_flush(struct net_device *dev);
 void dev_mc_init(struct net_device *dev);
+int unspec_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb);
 
 /**
  *  __dev_mc_sync - Synchonize device's multicast list
diff --git a/net/core/dev_addr_lists.c b/net/core/dev_addr_lists.c
index c0548d2..70343e6 100644
--- a/net/core/dev_addr_lists.c
+++ b/net/core/dev_addr_lists.c
@@ -12,9 +12,17 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
+#include 
+
+enum unspec_addr_idx {
+   UNSPEC_UCAST = 0,
+   UNSPEC_MCAST,
+   UNSPEC_MAX
+};
 
 /*
  * General list handling functions
@@ -477,6 +485,139 @@ out:
 }
 EXPORT_SYMBOL(dev_uc_add_excl);
 
+static int fill_addr(struct sk_buff *skb, struct net_device *dev,
+const unsigned char *addr, u32 seq, int type,
+int addr_type, int ifa_flags, unsigned int flags)
+{
+   struct nlmsghdr *nlh;
+   struct ifaddrmsg *ifm;
+
+   nlh = nlmsg_put(skb, 0, seq, type, sizeof(*ifm), flags);
+   if (!nlh)
+   return -EMSGSIZE;
+
+   ifm = nlmsg_data(nlh);
+   ifm->ifa_family = AF_UNSPEC;
+   ifm->ifa_prefixlen = 0;
+   ifm->ifa_flags = ifa_flags;
+   ifm->ifa_scope = RT_SCOPE_LINK;
+   ifm->ifa_index = dev->ifindex;
+   if (nla_put(skb, addr_type, dev->addr_len, addr))
+   goto nla_put_failure;
+   nlmsg_end(skb, nlh);
+   return 0;
+
+nla_put_failure:
+   nlmsg_cancel(skb, nlh);
+   return -EMSGSIZE;
+}
+
+static inline size_t addr_nlmsg_size(void)
+{
+   return NLMSG_ALIGN(sizeof(struct ifaddrmsg))
+   + nla_total_size(MAX_ADDR_LEN);
+}
+
+static void addr_notify(struct net_device *dev, const unsigned char *addr,
+   int type, int addr_type)
+{
+   struct net *net = dev_net(dev);
+   struct sk_buff *skb;
+   int err = -ENOBUFS;
+
+   skb = nlmsg_new(addr_nlmsg_size(), GFP_ATOMIC);
+   if (!skb)
+   goto errout;
+
+   err = fill_addr(skb, dev, addr, 0, type, addr_type, IFA_F_SECONDARY,
+   0);
+   if (err < 0) {
+   WARN_ON(err == -EMSGSIZE);
+   kfree_skb(skb);
+   goto errout;
+   }
+   rtnl_notify(skb, net, 0, RTNLGRP_LINK, NULL, GFP_ATOMIC);
+   return;
+errout:
+   if (err < 0)
+   rtnl_set_sk_err(net, RTNLGRP_LINK, err);
+}
+
+int unspec_dump_ifaddr(struct sk_buff *skb, struct netlink_callback *cb)
+{
+   struct net *net = sock_net(skb->sk);
+   struct net_device *dev;
+   struct hlist_head *head;
+   struct netdev_hw_addr_list *list;
+   struct netdev_hw_addr *ha;
+   int h, s_h;
+   int idx = 0, s_idx;
+   int mac_idx = 0, s_mac_idx;
+   enum unspec_addr_idx addr_idx = 0, s_addr_idx;
+   int err = 0;
+
+   s_h = cb->args[0];
+   s_idx = cb->args[1];
+   s_addr_idx = cb->args[2];
+   s_mac_idx = cb->args[3];
+
+   rcu_read_lock();
+   for (h = s_h; h < NETDEV_HASHENTRIES; h++, s_idx = 0) {
+   idx = 0;
+   head = &net->dev_index_head[h];
+   cb->seq = atomic_read(&net->ipv4.dev_addr_genid) ^
+ net->dev_base_seq;
+   hlist_for_each_entry_rcu(dev, head, index_hlist) {
+   if (idx < s_idx)
+   goto cont;
+   if (h > s_h || idx > s_idx)
+   s_mac_idx = 0;
+   for (addr_idx = 0; addr_idx < UNSPEC_MAX;
+addr_idx++, s_addr_idx = 0) {
+   if (addr_idx < s_addr_idx)
+   continue;
+   list = (addr_idx == UNSPEC_UCAST) ? &dev->uc :
+   &dev->mc;
+   if (netdev_hw_addr_list_empty(list))
+   continue;
+   mac_idx = 0;
+  

Re: [PATCH] mwifiex: fix memory leak on regd when chan is zero

2016-09-15 Thread Kalle Valo
Colin King  writes:

> From: Colin Ian King 
>
> When chan is zero mwifiex_create_custom_regdomain does not kfree
> regd and we have a memory leak. Fix this by freeing regd before
> the return.
>
> Signed-off-by: Colin Ian King 
> ---
>  drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c 
> b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
> index 3344a26..15a91f3 100644
> --- a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
> +++ b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
> @@ -1049,8 +1049,10 @@ mwifiex_create_custom_regdomain(struct mwifiex_private 
> *priv,
>   enum nl80211_band band;
>  
>   chan = *buf++;
> - if (!chan)
> + if (!chan) {
> + kfree(regd);
>   return NULL;
> + }

Bob sent a similar fix and he also did more:

mwifiex: fix error handling in mwifiex_create_custom_regdomain

https://patchwork.kernel.org/patch/9331337/

-- 
Kalle Valo


[PATCH net-next] net: vrf: Remove RT_FL_TOS

2016-09-15 Thread David Ahern
No longer used after d66f6c0a8f3c0 ("net: ipv4: Remove l3mdev_get_saddr")

Signed-off-by: David Ahern 
---
 drivers/net/vrf.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/vrf.c b/drivers/net/vrf.c
index 55674b0e65b7..85c271c70d42 100644
--- a/drivers/net/vrf.c
+++ b/drivers/net/vrf.c
@@ -37,9 +37,6 @@
 #include 
 #include 
 
-#define RT_FL_TOS(oldflp4) \
-   ((oldflp4)->flowi4_tos & (IPTOS_RT_MASK | RTO_ONLINK))
-
 #define DRV_NAME   "vrf"
 #define DRV_VERSION"1.0"
 
-- 
2.1.4



Re: [PATCH net-next 1/7] lwt: Add net to build_state argument

2016-09-15 Thread Roopa Prabhu
On 9/14/16, 4:22 PM, Tom Herbert wrote:
> Users of LWT need to know net if they want to have per net operations
> in LWT.
>
> Signed-off-by: Tom Herbert 
> ---
>  
Acked-by: Roopa Prabhu 


[PATCH net-next] net: l3mdev: Remove netif_index_is_l3_master

2016-09-15 Thread David Ahern
No longer used after e0d56fdd73422 ("net: l3mdev: remove redundant calls")

Signed-off-by: David Ahern 
---
 include/net/l3mdev.h | 24 
 1 file changed, 24 deletions(-)

diff --git a/include/net/l3mdev.h b/include/net/l3mdev.h
index 3832099289c5..b220dabeab45 100644
--- a/include/net/l3mdev.h
+++ b/include/net/l3mdev.h
@@ -114,25 +114,6 @@ static inline u32 l3mdev_fib_table(const struct net_device 
*dev)
return tb_id;
 }
 
-static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
-{
-   struct net_device *dev;
-   bool rc = false;
-
-   if (ifindex == 0)
-   return false;
-
-   rcu_read_lock();
-
-   dev = dev_get_by_index_rcu(net, ifindex);
-   if (dev)
-   rc = netif_is_l3_master(dev);
-
-   rcu_read_unlock();
-
-   return rc;
-}
-
 struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 
*fl6);
 
 static inline
@@ -226,11 +207,6 @@ static inline u32 l3mdev_fib_table_by_index(struct net 
*net, int ifindex)
return 0;
 }
 
-static inline bool netif_index_is_l3_master(struct net *net, int ifindex)
-{
-   return false;
-}
-
 static inline
 struct dst_entry *l3mdev_link_scope_lookup(struct net *net, struct flowi6 *fl6)
 {
-- 
2.1.4



Re: [PATCH] mwifiex: fix memory leak on regd when chan is zero

2016-09-15 Thread Colin Ian King
On 15/09/16 18:10, Kalle Valo wrote:
> Colin King  writes:
> 
>> From: Colin Ian King 
>>
>> When chan is zero mwifiex_create_custom_regdomain does not kfree
>> regd and we have a memory leak. Fix this by freeing regd before
>> the return.
>>
>> Signed-off-by: Colin Ian King 
>> ---
>>  drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c | 4 +++-
>>  1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c 
>> b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
>> index 3344a26..15a91f3 100644
>> --- a/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
>> +++ b/drivers/net/wireless/marvell/mwifiex/sta_cmdresp.c
>> @@ -1049,8 +1049,10 @@ mwifiex_create_custom_regdomain(struct 
>> mwifiex_private *priv,
>>  enum nl80211_band band;
>>  
>>  chan = *buf++;
>> -if (!chan)
>> +if (!chan) {
>> +kfree(regd);
>>  return NULL;
>> +}
> 
> Bob sent a similar fix and he also did more:
> 
> mwifiex: fix error handling in mwifiex_create_custom_regdomain
> 
> https://patchwork.kernel.org/patch/9331337/
> 
Ah, sorry for the duplication noise.

Colin


[PATCH] llc: switch type to bool as the timeout is only tested versus 0

2016-09-15 Thread Alan
(As asked by Dave in February)

Signed-off-by: Alan Cox 

---
 net/llc/af_llc.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 8ae3ed9..db916cf 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -38,7 +38,7 @@ static u16 llc_ui_sap_link_no_max[256];
 static struct sockaddr_llc llc_ui_addrnull;
 static const struct proto_ops llc_ui_ops;
 
-static long llc_ui_wait_for_conn(struct sock *sk, long timeout);
+static bool llc_ui_wait_for_conn(struct sock *sk, long timeout);
 static int llc_ui_wait_for_disc(struct sock *sk, long timeout);
 static int llc_ui_wait_for_busy_core(struct sock *sk, long timeout);
 
@@ -551,7 +551,7 @@ static int llc_ui_wait_for_disc(struct sock *sk, long 
timeout)
return rc;
 }
 
-static long llc_ui_wait_for_conn(struct sock *sk, long timeout)
+static bool llc_ui_wait_for_conn(struct sock *sk, long timeout)
 {
DEFINE_WAIT(wait);
 



Re: [PATCH net-next] tcp: prepare skbs for better sack shifting

2016-09-15 Thread Yuchung Cheng
On Thu, Sep 15, 2016 at 9:33 AM, Eric Dumazet  wrote:
>
> From: Eric Dumazet 
>
> With large BDP TCP flows and lossy networks, it is very important
> to keep a low number of skbs in the write queue.
>
> RACK and SACK processing can perform a linear scan of it.
>
> We should avoid putting any payload in skb->head, so that SACK
> shifting can be done if needed.
>
> With this patch, we can pack ~0.5 MB per skb instead of
> the 64KB initially cooked at tcp_sendmsg() time.
>
> This reduces the number of skbs in the write queue by a factor of eight.
> tcp_rack_detect_loss() likes this.
>
> We still allow payload in skb->head for first skb put in the queue,
> to not impact RPC workloads.
>
> Signed-off-by: Eric Dumazet 
> Cc: Yuchung Cheng 
Acked-by: Yuchung Cheng 


> ---
>  net/ipv4/tcp.c |   31 ---
>  1 file changed, 24 insertions(+), 7 deletions(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 
> a13fcb369f52fe85def7c9d856259bc0509f3453..7dae800092e62cec330544851289d20a68642561
>  100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -1020,17 +1020,31 @@ int tcp_sendpage(struct sock *sk, struct page *page, 
> int offset,
>  }
>  EXPORT_SYMBOL(tcp_sendpage);
>
> -static inline int select_size(const struct sock *sk, bool sg)
> +/* Do not bother using a page frag for very small frames.
> + * But use this heuristic only for the first skb in write queue.
> + *
> + * Having no payload in skb->head allows better SACK shifting
> + * in tcp_shift_skb_data(), reducing sack/rack overhead, because
> + * write queue has less skbs.
> + * Each skb can hold up to MAX_SKB_FRAGS * 32Kbytes, or ~0.5 MB.
> + * This also speeds up tso_fragment(), since it wont fallback
> + * to tcp_fragment().
> + */
> +static int linear_payload_sz(bool first_skb)
> +{
> +   if (first_skb)
> +   return SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
> +   return 0;
> +}
> +
> +static int select_size(const struct sock *sk, bool sg, bool first_skb)
>  {
> const struct tcp_sock *tp = tcp_sk(sk);
> int tmp = tp->mss_cache;
>
> if (sg) {
> if (sk_can_gso(sk)) {
> -   /* Small frames wont use a full page:
> -* Payload will immediately follow tcp header.
> -*/
> -   tmp = SKB_WITH_OVERHEAD(2048 - MAX_TCP_HEADER);
> +   tmp = linear_payload_sz(first_skb);
> } else {
> int pgbreak = SKB_MAX_HEAD(MAX_TCP_HEADER);
>
> @@ -1161,6 +1175,8 @@ restart:
> }
>
> if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
> +   bool first_skb;
> +
>  new_segment:
> /* Allocate new segment. If the interface is SG,
>  * allocate skb fitting to single page.
> @@ -1172,10 +1188,11 @@ new_segment:
> process_backlog = false;
> goto restart;
> }
> +   first_skb = skb_queue_empty(&sk->sk_write_queue);
> skb = sk_stream_alloc_skb(sk,
> - select_size(sk, sg),
> + select_size(sk, sg, 
> first_skb),
>   sk->sk_allocation,
> - 
> skb_queue_empty(&sk->sk_write_queue));
> + first_skb);
> if (!skb)
> goto wait_for_memory;
>
>
>


[PATCH net] sctp: fix SSN comparison

2016-09-15 Thread Marcelo Ricardo Leitner
This function actually operates on u32 yet its parameters were declared
as u16, causing integer truncation upon calling.

Note in patch context that ADDIP_SERIAL_SIGN_BIT is already 32 bits.

Signed-off-by: Marcelo Ricardo Leitner 
---

This issue exists since before git import, so I can't put a Fixes tag.
Also, that said, probably not worth queueing it to stable.
Thanks

 include/net/sctp/sm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/sctp/sm.h b/include/net/sctp/sm.h
index 
efc01743b9d641bf6b16a37780ee0df34b4ec698..bafe2a0ab9085f24e17038516c55c00cfddd02f4
 100644
--- a/include/net/sctp/sm.h
+++ b/include/net/sctp/sm.h
@@ -382,7 +382,7 @@ enum {
ADDIP_SERIAL_SIGN_BIT = (1<<31)
 };
 
-static inline int ADDIP_SERIAL_gte(__u16 s, __u16 t)
+static inline int ADDIP_SERIAL_gte(__u32 s, __u32 t)
 {
return ((s) == (t)) || (((t) - (s)) & ADDIP_SERIAL_SIGN_BIT);
 }
-- 
2.7.4



[PATCH next] sctp: make use of WORD_TRUNC macro

2016-09-15 Thread Marcelo Ricardo Leitner
No functional change. Just to avoid the usage of '&~3'.
Also break the line to make it easier to read.

Signed-off-by: Marcelo Ricardo Leitner 
---
 net/sctp/chunk.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index 
a55e54738b81ff8cf9cd711cf5fc466ac71374c0..adae4a41ca2078cfee387631f76e5cb768c2269c
 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -182,9 +182,10 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct 
sctp_association *asoc,
/* This is the biggest possible DATA chunk that can fit into
 * the packet
 */
-   max_data = (asoc->pathmtu -
-   sctp_sk(asoc->base.sk)->pf->af->net_header_len -
-   sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk)) & ~3;
+   max_data = asoc->pathmtu -
+  sctp_sk(asoc->base.sk)->pf->af->net_header_len -
+  sizeof(struct sctphdr) - sizeof(struct sctp_data_chunk);
+   max_data = WORD_TRUNC(max_data);
 
max = asoc->frag_point;
/* If the the peer requested that we authenticate DATA chunks
-- 
2.7.4



XDP user interface confusions

2016-09-15 Thread Jesper Dangaard Brouer
Hi Brenden,

I don't quite understand the semantics of the XDP userspace interface.

We allow XDP programs to be (unconditionally) exchanged by another
program, this avoids taking the link down+up and avoids reallocating
RX ring resources (which is great).

We have two XDP samples programs in samples/bpf/ xdp1 and xdp2.  Now I
want to first load xdp1 and then to avoid the linkdown I load xdp2,
and then afterwards remove/stop program xdp1.

This does NOT work, because (in samples/bpf/xdp1_user.c) when xdp1
exits it unconditionally removes the running XDP program (loaded by xdp2)
via set_link_xdp_fd(ifindex, -1).  The xdp2 user program is still
running, and is unaware that its xdp/bpf program has been unloaded.

I find this userspace interface confusing. Was this your intention?
Perhaps you can explain what the intended semantics or specification is?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: XDP user interface confusions

2016-09-15 Thread Brenden Blanco
On Thu, Sep 15, 2016 at 08:14:02PM +0200, Jesper Dangaard Brouer wrote:
> Hi Brenden,
> 
> I don't quite understand the semantics of the XDP userspace interface.
> 
> We allow XDP programs to be (unconditionally) exchanged by another
> program, this avoids taking the link down+up and avoids reallocating
> RX ring resources (which is great).
> 
> We have two XDP samples programs in samples/bpf/ xdp1 and xdp2.  Now I
> want to first load xdp1 and then to avoid the linkdown I load xdp2,
> and then afterwards remove/stop program xdp1.
> 
> This does NOT work, because (in samples/bpf/xdp1_user.c) when xdp1
> exits it unconditionally removes the running XDP program (loaded by xdp2)
> via set_link_xdp_fd(ifindex, -1).  The xdp2 user program is still
> running, and is unaware that its xdp/bpf program has been unloaded.
> 
> I find this userspace interface confusing. Was this your intention?
> Perhaps you can explain what the intended semantics or specification is?

In practice, we've used a single agent process to manage bpf programs on
behalf of the user applications. This agent process uses common linux
functionalities to add semantics, while not really relying on the bpf
handles themselves to take care of that. For instance, the process may
put some lockfiles and what-not in /var/run/$PID, and maybe returns the
list of running programs through a http: or unix: interface.

So, from a user<->kernel API perspective, the requirements are minimal: the agent
process just overwrites the loaded bpf program when the application
changes, or a new application comes online. There is nobody to 'notify'
when a handle changes.

When translating this into the kernel api that you see now, none of this
exists, because IMHO the kernel api should be unopinionated and generic.
The result is something that appears very "fire-and-forget", which
results in something simple yet safe at the same time; the refcounting
is done transparently by the kernel.

So, in practice, there is no xdp1 or xdp2, just xdp-agent at different
points in time. Or, better yet, no agent, just the programs running in
the kernel, with the handles of the programs residing solely in the
device, which are perhaps pinned to /sys/fs/bpf for semantic management
purposes. I didn't feel like it was appropriate to conflate different
bpf features in the kernel samples, so we don't see (and probably never
will) a sample which combines these features into a whole. That is best
left to userspace tools. It so happens that this is one of the projects
I am currently active on at $DAYJOB, and we fully intend to share the
details of that when it's in a suitable state.
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   Author of http://www.iptv-analyzer.org
>   LinkedIn: http://www.linkedin.com/in/brouer


MDB offloading of local ipv4 multicast groups

2016-09-15 Thread John Crispin
Hi,

While adding MDB support to the qca8k dsa driver I found that ipv4 mcast
groups don't always get propagated to the dsa driver. In my setup there
are 2 clients connected to the switch, both running a mdns client. The
.port_mdb_add() callback is properly called for 33:33:00:00:00:FB but
01:00:5E:00:00:FB never got propagated to the dsa driver.

The reason is that the call to ipv4_is_local_multicast() here [1] will
return true and the notifier is never called. Is this intentional or is
there something missing in the code?

John

[1]
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/net/bridge/br_multicast.c?id=refs/tags/v4.8-rc6#n737


[PATCHv4 net-next 04/15] bpf: don't (ab)use instructions to store state

2016-09-15 Thread Jakub Kicinski
Storing state in reserved fields of instructions makes
it impossible to run verifier on programs already
marked as read-only. Allocate and use an array of
per-instruction state instead.

While touching the error path rename and move existing
jump target.

Suggested-by: Alexei Starovoitov 
Signed-off-by: Jakub Kicinski 
Acked-by: Alexei Starovoitov 
---
v3:
 - new patch.
---
 kernel/bpf/verifier.c | 51 ---
 1 file changed, 32 insertions(+), 19 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 086b3979380c..ce9c0d1721c6 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -181,6 +181,10 @@ struct verifier_stack_elem {
struct verifier_stack_elem *next;
 };
 
+struct bpf_insn_aux_data {
+   enum bpf_reg_type ptr_type; /* pointer type for load/store insns */
+};
+
 #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
 
 /* single container for all structs
@@ -196,6 +200,7 @@ struct verifier_env {
u32 used_map_cnt;   /* number of used maps */
u32 id_gen; /* used to generate unique reg IDs */
bool allow_ptr_leaks;
+   struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 };
 
 #define BPF_COMPLEXITY_LIMIT_INSNS 65536
@@ -2340,7 +2345,7 @@ static int do_check(struct verifier_env *env)
return err;
 
} else if (class == BPF_LDX) {
-   enum bpf_reg_type src_reg_type;
+   enum bpf_reg_type *prev_src_type, src_reg_type;
 
/* check for reserved fields is already done */
 
@@ -2370,16 +2375,18 @@ static int do_check(struct verifier_env *env)
continue;
}
 
-   if (insn->imm == 0) {
+   prev_src_type = &env->insn_aux_data[insn_idx].ptr_type;
+
+   if (*prev_src_type == NOT_INIT) {
/* saw a valid insn
 * dst_reg = *(u32 *)(src_reg + off)
-* use reserved 'imm' field to mark this insn
+* save type to validate intersecting paths
 */
-   insn->imm = src_reg_type;
+   *prev_src_type = src_reg_type;
 
-   } else if (src_reg_type != insn->imm &&
+   } else if (src_reg_type != *prev_src_type &&
   (src_reg_type == PTR_TO_CTX ||
-   insn->imm == PTR_TO_CTX)) {
+   *prev_src_type == PTR_TO_CTX)) {
/* ABuser program is trying to use the same insn
 * dst_reg = *(u32*) (src_reg + off)
 * with different pointer types:
@@ -2392,7 +2399,7 @@ static int do_check(struct verifier_env *env)
}
 
} else if (class == BPF_STX) {
-   enum bpf_reg_type dst_reg_type;
+   enum bpf_reg_type *prev_dst_type, dst_reg_type;
 
if (BPF_MODE(insn->code) == BPF_XADD) {
err = check_xadd(env, insn);
@@ -2420,11 +2427,13 @@ static int do_check(struct verifier_env *env)
if (err)
return err;
 
-   if (insn->imm == 0) {
-   insn->imm = dst_reg_type;
-   } else if (dst_reg_type != insn->imm &&
+   prev_dst_type = &env->insn_aux_data[insn_idx].ptr_type;
+
+   if (*prev_dst_type == NOT_INIT) {
+   *prev_dst_type = dst_reg_type;
+   } else if (dst_reg_type != *prev_dst_type &&
   (dst_reg_type == PTR_TO_CTX ||
-   insn->imm == PTR_TO_CTX)) {
+   *prev_dst_type == PTR_TO_CTX)) {
verbose("same insn cannot be used with 
different pointers\n");
return -EINVAL;
}
@@ -2703,11 +2712,8 @@ static int convert_ctx_accesses(struct verifier_env *env)
else
continue;
 
-   if (insn->imm != PTR_TO_CTX) {
-   /* clear internal mark */
-   insn->imm = 0;
+   if (env->insn_aux_data[i].ptr_type != PTR_TO_CTX)
continue;
-   }
 
cnt = env->prog->aux->ops->
convert_ctx_access(type, insn->dst_reg, insn->src_reg,
@@ -2772,6 +2778,11 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr 
*attr)
if (!env)
  

[PATCHv4 net-next 10/15] net: cls_bpf: allow offloaded filters to update stats

2016-09-15 Thread Jakub Kicinski
Call into offloaded filters to update stats.

Signed-off-by: Jakub Kicinski 
Acked-by: Daniel Borkmann 
---
 include/net/pkt_cls.h |  1 +
 net/sched/cls_bpf.c   | 11 +++
 2 files changed, 12 insertions(+)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 57af9f3032ff..5ccaa4be7d96 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -490,6 +490,7 @@ enum tc_clsbpf_command {
TC_CLSBPF_ADD,
TC_CLSBPF_REPLACE,
TC_CLSBPF_DESTROY,
+   TC_CLSBPF_STATS,
 };
 
 struct tc_cls_bpf_offload {
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 1aad314089e9..86ef331f78e8 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -223,6 +223,15 @@ static void cls_bpf_stop_offload(struct tcf_proto *tp,
prog->offloaded = false;
 }
 
+static void cls_bpf_offload_update_stats(struct tcf_proto *tp,
+struct cls_bpf_prog *prog)
+{
+   if (!prog->offloaded)
+   return;
+
+   cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_STATS);
+}
+
 static int cls_bpf_init(struct tcf_proto *tp)
 {
struct cls_bpf_head *head;
@@ -578,6 +587,8 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto 
*tp, unsigned long fh,
 
tm->tcm_handle = prog->handle;
 
+   cls_bpf_offload_update_stats(tp, prog);
+
nest = nla_nest_start(skb, TCA_OPTIONS);
if (nest == NULL)
goto nla_put_failure;
-- 
1.9.1



[PATCHv4 net-next 07/15] bpf: recognize 64bit immediate loads as consts

2016-09-15 Thread Jakub Kicinski
When running as parser interpret BPF_LD | BPF_IMM | BPF_DW
instructions as loading CONST_IMM with the value stored
in imm.  The verifier will continue not recognizing those
due to concerns about search space/program complexity
increase.

Signed-off-by: Jakub Kicinski 
---
v3:
 - limit to parsers.
---
 kernel/bpf/verifier.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index d93e78331b90..f5bed7cce08d 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1766,9 +1766,19 @@ static int check_ld_imm(struct bpf_verifier_env *env, 
struct bpf_insn *insn)
if (err)
return err;
 
-   if (insn->src_reg == 0)
-   /* generic move 64-bit immediate into a register */
+   if (insn->src_reg == 0) {
+   /* generic move 64-bit immediate into a register,
+* only analyzer needs to collect the ld_imm value.
+*/
+   u64 imm = ((u64)(insn + 1)->imm << 32) | (u32)insn->imm;
+
+   if (!env->analyzer_ops)
+   return 0;
+
+   regs[insn->dst_reg].type = CONST_IMM;
+   regs[insn->dst_reg].imm = imm;
return 0;
+   }
 
/* replace_map_fd_with_map_ptr() should have caught bad ld_imm64 */
BUG_ON(insn->src_reg != BPF_PSEUDO_MAP_FD);
-- 
1.9.1



[PATCHv4 net-next 02/15] net: cls_bpf: limit hardware offload by software-only flag

2016-09-15 Thread Jakub Kicinski
Add cls_bpf support for the TCA_CLS_FLAGS_SKIP_HW flag.
Unlike U32 and flower cls_bpf already has some netlink
flags defined.  Create a new attribute to be able to use
the same flag values as the above.

Unlike U32 and flower reject unknown flags.

Signed-off-by: Jakub Kicinski 
Acked-by: Daniel Borkmann 
---
v3:
 - reject (instead of clear) unsupported flags;
 - fix error handling.
v2:
 - rename TCA_BPF_GEN_TCA_FLAGS -> TCA_BPF_FLAGS_GEN;
 - add comment about clearing unsupported flags;
 - validate flags after clearing unsupported.
---
 include/net/pkt_cls.h|  1 +
 include/uapi/linux/pkt_cls.h |  1 +
 net/sched/cls_bpf.c  | 22 --
 3 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index 41e8071dff87..57af9f3032ff 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -498,6 +498,7 @@ struct tc_cls_bpf_offload {
struct bpf_prog *prog;
const char *name;
bool exts_integrated;
+   u32 gen_flags;
 };
 
 #endif
diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index f9c287c67eae..91dd136445f3 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -396,6 +396,7 @@ enum {
TCA_BPF_FD,
TCA_BPF_NAME,
TCA_BPF_FLAGS,
+   TCA_BPF_FLAGS_GEN,
__TCA_BPF_MAX,
 };
 
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index c3983493aeab..1ae5b6798363 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -27,6 +27,8 @@ MODULE_AUTHOR("Daniel Borkmann ");
 MODULE_DESCRIPTION("TC BPF based classifier");
 
 #define CLS_BPF_NAME_LEN   256
+#define CLS_BPF_SUPPORTED_GEN_FLAGS\
+   TCA_CLS_FLAGS_SKIP_HW
 
 struct cls_bpf_head {
struct list_head plist;
@@ -40,6 +42,7 @@ struct cls_bpf_prog {
struct tcf_result res;
bool exts_integrated;
bool offloaded;
+   u32 gen_flags;
struct tcf_exts exts;
u32 handle;
union {
@@ -55,6 +58,7 @@ struct cls_bpf_prog {
 static const struct nla_policy bpf_policy[TCA_BPF_MAX + 1] = {
[TCA_BPF_CLASSID]   = { .type = NLA_U32 },
[TCA_BPF_FLAGS] = { .type = NLA_U32 },
+   [TCA_BPF_FLAGS_GEN] = { .type = NLA_U32 },
[TCA_BPF_FD]= { .type = NLA_U32 },
[TCA_BPF_NAME]  = { .type = NLA_NUL_STRING, .len = 
CLS_BPF_NAME_LEN },
[TCA_BPF_OPS_LEN]   = { .type = NLA_U16 },
@@ -156,6 +160,7 @@ static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct 
cls_bpf_prog *prog,
bpf_offload.prog = prog->filter;
bpf_offload.name = prog->bpf_name;
bpf_offload.exts_integrated = prog->exts_integrated;
+   bpf_offload.gen_flags = prog->gen_flags;
 
return dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
 tp->protocol, &offload);
@@ -169,14 +174,14 @@ static void cls_bpf_offload(struct tcf_proto *tp, struct 
cls_bpf_prog *prog,
enum tc_clsbpf_command cmd;
 
if (oldprog && oldprog->offloaded) {
-   if (tc_should_offload(dev, tp, 0)) {
+   if (tc_should_offload(dev, tp, prog->gen_flags)) {
cmd = TC_CLSBPF_REPLACE;
} else {
obj = oldprog;
cmd = TC_CLSBPF_DESTROY;
}
} else {
-   if (!tc_should_offload(dev, tp, 0))
+   if (!tc_should_offload(dev, tp, prog->gen_flags))
return;
cmd = TC_CLSBPF_ADD;
}
@@ -372,6 +377,7 @@ static int cls_bpf_modify_existing(struct net *net, struct 
tcf_proto *tp,
 {
bool is_bpf, is_ebpf, have_exts = false;
struct tcf_exts exts;
+   u32 gen_flags = 0;
int ret;
 
is_bpf = tb[TCA_BPF_OPS_LEN] && tb[TCA_BPF_OPS];
@@ -396,8 +402,17 @@ static int cls_bpf_modify_existing(struct net *net, struct 
tcf_proto *tp,
 
have_exts = bpf_flags & TCA_BPF_FLAG_ACT_DIRECT;
}
+   if (tb[TCA_BPF_FLAGS_GEN]) {
+   gen_flags = nla_get_u32(tb[TCA_BPF_FLAGS_GEN]);
+   if (gen_flags & ~CLS_BPF_SUPPORTED_GEN_FLAGS ||
+   !tc_flags_valid(gen_flags)) {
+   ret = -EINVAL;
+   goto errout;
+   }
+   }
 
prog->exts_integrated = have_exts;
+   prog->gen_flags = gen_flags;
 
ret = is_bpf ? cls_bpf_prog_from_ops(tb, prog) :
   cls_bpf_prog_from_efd(tb, prog, tp);
@@ -569,6 +584,9 @@ static int cls_bpf_dump(struct net *net, struct tcf_proto 
*tp, unsigned long fh,
bpf_flags |= TCA_BPF_FLAG_ACT_DIRECT;
if (bpf_flags && nla_put_u32(skb, TCA_BPF_FLAGS, bpf_flags))
goto nla_put_failure;
+   if (prog->gen_flags &&
+   nla_put_u32(skb, TCA_BPF_FLAGS_GEN, prog->gen_flags))
+   goto nla_put_failure;
 

[PATCHv4 net-next 14/15] nfp: bpf: add support for legacy redirect action

2016-09-15 Thread Jakub Kicinski
The data path has redirect support, so expressing a redirect
to the port the frame came from is a trivial matter of
setting the right result code.

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/nfp_bpf.h | 1 +
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c | 2 ++
 drivers/net/ethernet/netronome/nfp/nfp_net_offload.c | 4 
 3 files changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf.h 
b/drivers/net/ethernet/netronome/nfp/nfp_bpf.h
index d550fbc4768a..378d3c35cad5 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_bpf.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf.h
@@ -60,6 +60,7 @@ enum static_regs {
 
 enum nfp_bpf_action_type {
NN_ACT_TC_DROP,
+   NN_ACT_TC_REDIR,
 };
 
 /* Software register representation, hardware encoding in asm.h */
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c 
b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
index 42a8afb67fc8..60a99e0bf459 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
@@ -1440,6 +1440,7 @@ static void nfp_outro_tc_legacy(struct nfp_prog *nfp_prog)
 {
const u8 act2code[] = {
[NN_ACT_TC_DROP]  = 0x22,
+   [NN_ACT_TC_REDIR] = 0x24
};
/* Target for aborts */
nfp_prog->tgt_abort = nfp_prog_current_offset(nfp_prog);
@@ -1468,6 +1469,7 @@ static void nfp_outro(struct nfp_prog *nfp_prog)
 {
switch (nfp_prog->act) {
case NN_ACT_TC_DROP:
+   case NN_ACT_TC_REDIR:
nfp_outro_tc_legacy(nfp_prog);
break;
}
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c 
b/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
index 0537a53e2174..1ec8e5b74651 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
@@ -123,6 +123,10 @@ nfp_net_bpf_get_act(struct nfp_net *nn, struct 
tc_cls_bpf_offload *cls_bpf)
list_for_each_entry(a, &actions, list) {
if (is_tcf_gact_shot(a))
return NN_ACT_TC_DROP;
+
+   if (is_tcf_mirred_redirect(a) &&
+   tcf_mirred_ifindex(a) == nn->netdev->ifindex)
+   return NN_ACT_TC_REDIR;
}
 
return -ENOTSUPP;
-- 
1.9.1



[PATCHv4 net-next 01/15] net: cls_bpf: add hardware offload

2016-09-15 Thread Jakub Kicinski
This patch adds hardware offload capability to the cls_bpf classifier,
similar to what has been done with U32 and flower.

Signed-off-by: Jakub Kicinski 
Acked-by: Daniel Borkmann 
---
v3:
 - s/filter/prog/ in struct tc_cls_bpf_offload.
v2:
 - drop unnecessary WARN_ON;
 - reformat error handling a bit.
---
 include/linux/netdevice.h |  2 ++
 include/net/pkt_cls.h | 14 ++
 net/sched/cls_bpf.c   | 70 +++
 3 files changed, 86 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2095b6ab3661..3c50db29a114 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -789,6 +789,7 @@ enum {
TC_SETUP_CLSU32,
TC_SETUP_CLSFLOWER,
TC_SETUP_MATCHALL,
+   TC_SETUP_CLSBPF,
 };
 
 struct tc_cls_u32_offload;
@@ -800,6 +801,7 @@ struct tc_to_netdev {
struct tc_cls_u32_offload *cls_u32;
struct tc_cls_flower_offload *cls_flower;
struct tc_cls_matchall_offload *cls_mall;
+   struct tc_cls_bpf_offload *cls_bpf;
};
 };
 
diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
index a459be5fe1c2..41e8071dff87 100644
--- a/include/net/pkt_cls.h
+++ b/include/net/pkt_cls.h
@@ -486,4 +486,18 @@ struct tc_cls_matchall_offload {
unsigned long cookie;
 };
 
+enum tc_clsbpf_command {
+   TC_CLSBPF_ADD,
+   TC_CLSBPF_REPLACE,
+   TC_CLSBPF_DESTROY,
+};
+
+struct tc_cls_bpf_offload {
+   enum tc_clsbpf_command command;
+   struct tcf_exts *exts;
+   struct bpf_prog *prog;
+   const char *name;
+   bool exts_integrated;
+};
+
 #endif
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index 4742f415ee5b..c3983493aeab 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -39,6 +39,7 @@ struct cls_bpf_prog {
struct list_head link;
struct tcf_result res;
bool exts_integrated;
+   bool offloaded;
struct tcf_exts exts;
u32 handle;
union {
@@ -140,6 +141,71 @@ static bool cls_bpf_is_ebpf(const struct cls_bpf_prog 
*prog)
return !prog->bpf_ops;
 }
 
+static int cls_bpf_offload_cmd(struct tcf_proto *tp, struct cls_bpf_prog *prog,
+  enum tc_clsbpf_command cmd)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct tc_cls_bpf_offload bpf_offload = {};
+   struct tc_to_netdev offload;
+
+   offload.type = TC_SETUP_CLSBPF;
+   offload.cls_bpf = &bpf_offload;
+
+   bpf_offload.command = cmd;
+   bpf_offload.exts = &prog->exts;
+   bpf_offload.prog = prog->filter;
+   bpf_offload.name = prog->bpf_name;
+   bpf_offload.exts_integrated = prog->exts_integrated;
+
+   return dev->netdev_ops->ndo_setup_tc(dev, tp->q->handle,
+tp->protocol, &offload);
+}
+
+static void cls_bpf_offload(struct tcf_proto *tp, struct cls_bpf_prog *prog,
+   struct cls_bpf_prog *oldprog)
+{
+   struct net_device *dev = tp->q->dev_queue->dev;
+   struct cls_bpf_prog *obj = prog;
+   enum tc_clsbpf_command cmd;
+
+   if (oldprog && oldprog->offloaded) {
+   if (tc_should_offload(dev, tp, 0)) {
+   cmd = TC_CLSBPF_REPLACE;
+   } else {
+   obj = oldprog;
+   cmd = TC_CLSBPF_DESTROY;
+   }
+   } else {
+   if (!tc_should_offload(dev, tp, 0))
+   return;
+   cmd = TC_CLSBPF_ADD;
+   }
+
+   if (cls_bpf_offload_cmd(tp, obj, cmd))
+   return;
+
+   obj->offloaded = true;
+   if (oldprog)
+   oldprog->offloaded = false;
+}
+
+static void cls_bpf_stop_offload(struct tcf_proto *tp,
+struct cls_bpf_prog *prog)
+{
+   int err;
+
+   if (!prog->offloaded)
+   return;
+
+   err = cls_bpf_offload_cmd(tp, prog, TC_CLSBPF_DESTROY);
+   if (err) {
+   pr_err("Stopping hardware offload failed: %d\n", err);
+   return;
+   }
+
+   prog->offloaded = false;
+}
+
 static int cls_bpf_init(struct tcf_proto *tp)
 {
struct cls_bpf_head *head;
@@ -179,6 +245,7 @@ static int cls_bpf_delete(struct tcf_proto *tp, unsigned 
long arg)
 {
struct cls_bpf_prog *prog = (struct cls_bpf_prog *) arg;
 
+   cls_bpf_stop_offload(tp, prog);
list_del_rcu(&prog->link);
tcf_unbind_filter(tp, &prog->res);
call_rcu(&prog->rcu, __cls_bpf_delete_prog);
@@ -195,6 +262,7 @@ static bool cls_bpf_destroy(struct tcf_proto *tp, bool 
force)
return false;
 
list_for_each_entry_safe(prog, tmp, &head->plist, link) {
+   cls_bpf_stop_offload(tp, prog);
list_del_rcu(&prog->link);
tcf_unbind_filter(tp, &prog->res);
call_rcu(&prog->rcu, __cls_bpf_delete_prog);
@@ -416,6 +

[PATCHv4 net-next 00/15] BPF hardware offload (cls_bpf for now)

2016-09-15 Thread Jakub Kicinski
Hi!

Dave, this set depends on bitfield.h which is sitting in the
pull request from Kalle.  I'm expecting buildbot to complain
about patch 8, please pull wireless-drivers-next before applying.

v4:
 - rename parser -> analyzer;
 - reorganize the analyzer patches a bit;
 - use bitfield.h directly.

--- merge blurb:
In the last year a lot of progress has been made on offloading
simpler TC classifiers.  There is also growing interest in using
BPF for generic high-speed packet processing in the kernel.
It seems beneficial to tie those two trends together and think
about hardware offloads of BPF programs.  This patch set presents
such offload to Netronome smart NICs.  cls_bpf is extended with
hardware offload capabilities and NFP driver gets a JIT translator
which in presence of capable firmware can be used to offload
the BPF program onto the card.

The BPF JIT implementation is not 100% complete (e.g. some instructions are missing)
but it is functional.  Encouragingly it should be possible to
offload most (if not all) advanced BPF features onto the NIC - 
including packet modification, maps, tunnel encap/decap etc.

Example of basic tests I used:
  __section_cls_entry
  int cls_entry(struct __sk_buff *skb)
  {
if (load_byte(skb, 0) != 0x0)
return 0;

if (load_byte(skb, 4) != 0x1)
return 0;

skb->mark = 0xcafe;

if (load_byte(skb, 50) != 0xff)
return 0;

return ~0U;
  }

Above code can be compiled with Clang and loaded like this:
# ethtool -K p1p1 hw-tc-offload on
# tc qdisc add dev p1p1 ingress
# tc filter add dev p1p1 parent :  bpf obj prog.o action drop

This set implements the basic transparent offload, the skip_{sw,hw}
flags and reporting statistics for cls_bpf.

Jakub Kicinski (15):
  net: cls_bpf: add hardware offload
  net: cls_bpf: limit hardware offload by software-only flag
  net: cls_bpf: add support for marking filters as hardware-only
  bpf: don't (ab)use instructions to store state
  bpf: expose internal verifier structures
  bpf: enable non-core use of the verifier
  bpf: recognize 64bit immediate loads as consts
  nfp: add BPF to NFP code translator
  nfp: bpf: add hardware bpf offload
  net: cls_bpf: allow offloaded filters to update stats
  net: bpf: allow offloaded filters to update stats
  nfp: bpf: add packet marking support
  net: act_mirred: allow statistic updates from offloaded actions
  nfp: bpf: add support for legacy redirect action
  nfp: bpf: add offload of TC direct action mode

 drivers/net/ethernet/netronome/nfp/Makefile|7 +
 drivers/net/ethernet/netronome/nfp/nfp_asm.h   |  233 +++
 drivers/net/ethernet/netronome/nfp/nfp_bpf.h   |  212 +++
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c   | 1816 
 .../net/ethernet/netronome/nfp/nfp_bpf_verifier.c  |  160 ++
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |   47 +-
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  134 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |   51 +-
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   |   12 +
 .../net/ethernet/netronome/nfp/nfp_net_offload.c   |  291 
 .../net/ethernet/netronome/nfp/nfp_netvf_main.c|2 +-
 include/linux/bpf_verifier.h   |   89 +
 include/linux/netdevice.h  |2 +
 include/net/pkt_cls.h  |   16 +
 include/uapi/linux/pkt_cls.h   |1 +
 kernel/bpf/verifier.c  |  384 +++--
 net/sched/act_mirred.c |8 +
 net/sched/cls_bpf.c|  117 +-
 18 files changed, 3376 insertions(+), 206 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_asm.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf_verifier.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_offload.c
 create mode 100644 include/linux/bpf_verifier.h

-- 
1.9.1



[PATCHv4 net-next 13/15] net: act_mirred: allow statistic updates from offloaded actions

2016-09-15 Thread Jakub Kicinski
Implement .stats_update() callback.  The implementation
is generic and can be reused by other simple actions if
needed.

Signed-off-by: Jakub Kicinski 
---
 net/sched/act_mirred.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 6038c85d92f5..f9862d89cb93 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -204,6 +204,13 @@ static int tcf_mirred(struct sk_buff *skb, const struct 
tc_action *a,
return retval;
 }
 
+static void tcf_stats_update(struct tc_action *a, u64 bytes, u32 packets,
+u64 lastuse)
+{
+   tcf_lastuse_update(&a->tcfa_tm);
+   _bstats_cpu_update(this_cpu_ptr(a->cpu_bstats), bytes, packets);
+}
+
 static int tcf_mirred_dump(struct sk_buff *skb, struct tc_action *a, int bind, 
int ref)
 {
unsigned char *b = skb_tail_pointer(skb);
@@ -280,6 +287,7 @@ static struct tc_action_ops act_mirred_ops = {
.type   =   TCA_ACT_MIRRED,
.owner  =   THIS_MODULE,
.act=   tcf_mirred,
+   .stats_update   =   tcf_stats_update,
.dump   =   tcf_mirred_dump,
.cleanup=   tcf_mirred_release,
.init   =   tcf_mirred_init,
-- 
1.9.1



[PATCHv4 net-next 08/15] nfp: add BPF to NFP code translator

2016-09-15 Thread Jakub Kicinski
Add a translator for JITing eBPF to operations which
can be executed on NFP's programmable engines.

Signed-off-by: Jakub Kicinski 
---
v4:
 - use bitfield.h directly.
v3:
 - don't clone the program for the verifier (no longer needed);
 - temporarily add a local copy of macros from bitfield.h.

NOTE: this one will probably trigger buildbot failures because
  it depends on pull request from wireless-drivers-next.
---
 drivers/net/ethernet/netronome/nfp/Makefile|6 +
 drivers/net/ethernet/netronome/nfp/nfp_asm.h   |  233 +++
 drivers/net/ethernet/netronome/nfp/nfp_bpf.h   |  208 +++
 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c   | 1729 
 .../net/ethernet/netronome/nfp/nfp_bpf_verifier.c  |  151 ++
 5 files changed, 2327 insertions(+)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_asm.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf.h
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf_jit.c
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_bpf_verifier.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index 68178819ff12..5f12689bf523 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -5,4 +5,10 @@ nfp_netvf-objs := \
nfp_net_ethtool.o \
nfp_netvf_main.o
 
+ifeq ($(CONFIG_BPF_SYSCALL),y)
+nfp_netvf-objs += \
+   nfp_bpf_verifier.o \
+   nfp_bpf_jit.o
+endif
+
 nfp_netvf-$(CONFIG_NFP_NET_DEBUG) += nfp_net_debugfs.o
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_asm.h b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
new file mode 100644
index ..22484b6fd3e8
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/nfp_asm.h
@@ -0,0 +1,233 @@
+/*
+ * Copyright (C) 2016 Netronome Systems, Inc.
+ *
+ * This software is dual licensed under the GNU General License Version 2,
+ * June 1991 as shown in the file COPYING in the top-level directory of this
+ * source tree or the BSD 2-Clause License provided below.  You have the
+ * option to license this software under the complete terms of either license.
+ *
+ * The BSD 2-Clause License:
+ *
+ * Redistribution and use in source and binary forms, with or
+ * without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ *  1. Redistributions of source code must retain the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer.
+ *
+ *  2. Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials
+ * provided with the distribution.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS
+ * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ * SOFTWARE.
+ */
+
+#ifndef __NFP_ASM_H__
+#define __NFP_ASM_H__ 1
+
+#include "nfp_bpf.h"
+
+#define REG_NONE   0
+
+#define RE_REG_NO_DST  0x020
+#define RE_REG_IMM 0x020
+#define RE_REG_IMM_encode(x)   \
+   (RE_REG_IMM | ((x) & 0x1f) | (((x) & 0x60) << 1))
+#define RE_REG_IMM_MAX  0x07fULL
+#define RE_REG_XFR 0x080
+
+#define UR_REG_XFR 0x180
+#define UR_REG_NN  0x280
+#define UR_REG_NO_DST  0x300
+#define UR_REG_IMM UR_REG_NO_DST
+#define UR_REG_IMM_encode(x) (UR_REG_IMM | (x))
+#define UR_REG_IMM_MAX  0x0ffULL
+
+#define OP_BR_BASE 0x0d80020ULL
+#define OP_BR_BASE_MASK0x0f8000c3ce0ULL
+#define OP_BR_MASK 0x01fULL
+#define OP_BR_EV_PIP   0x300ULL
+#define OP_BR_CSS  0x003c000ULL
+#define OP_BR_DEFBR0x030ULL
+#define OP_BR_ADDR_LO  0x007ffc0ULL
+#define OP_BR_ADDR_HI  0x100ULL
+
+#define nfp_is_br(_insn)   \
+   (((_insn) & OP_BR_BASE_MASK) == OP_BR_BASE)
+
+enum br_mask {
+   BR_BEQ = 0x00,
+   BR_BNE = 0x01,
+   BR_BHS = 0x04,
+   BR_BLO = 0x05,
+   BR_BGE = 0x08,
+   BR_UNC = 0x18,
+};
+
+enum br_ev_pip {
+   BR_EV_PIP_UNCOND = 0,
+   BR_EV_PIP_COND = 1,
+};
+
+enum br_ctx_signal_state {
+   BR_CSS_NONE = 2,
+};
+
+#define OP_BBYTE_BASE  0x0c8ULL
+#define OP_BB_A_SRC0x0ffULL
+#define OP_BB_BYTE 0x300ULL
+#define OP_BB_B_SRC0x003fc00ULL
+#define OP_BB_I8   0x004ULL
+#define OP_BB_EQ   0x008ULL
+#define OP_BB_DEFBR0x030ULL
+#define OP_BB_ADDR_LO  0x007ffc0ULL
+#define OP_BB_ADDR_HI  0x100

[PATCHv4 net-next 06/15] bpf: enable non-core use of the verifier

2016-09-15 Thread Jakub Kicinski
Advanced JIT compilers and translators may want to use
the eBPF verifier as a base for parsers or to perform custom
checks and validations.

Add the ability for external users to invoke the verifier
and provide callbacks to be invoked for every instruction
checked.  For now only the most basic callback, for
per-instruction pre-interpretation checks, is added.  More
advanced users may also want a per-instruction post
callback and a state comparison callback.

Signed-off-by: Jakub Kicinski 
---
v4:
 - separate from the header split patch.
---
 include/linux/bpf_verifier.h | 11 +++
 kernel/bpf/verifier.c| 68 
 2 files changed, 79 insertions(+)

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 1c0511ef7eaf..e3de907d5bf6 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -59,6 +59,12 @@ struct bpf_insn_aux_data {
 
 #define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */
 
+struct bpf_verifier_env;
+struct bpf_ext_analyzer_ops {
+   int (*insn_hook)(struct bpf_verifier_env *env,
+int insn_idx, int prev_insn_idx);
+};
+
 /* single container for all structs
  * one verifier_env per bpf_check() call
  */
@@ -68,6 +74,8 @@ struct bpf_verifier_env {
int stack_size; /* number of states to be processed */
struct bpf_verifier_state cur_state; /* current verifier state */
struct bpf_verifier_state_list **explored_states; /* search pruning optimization */
+   const struct bpf_ext_analyzer_ops *analyzer_ops; /* external analyzer ops */
+   void *analyzer_priv; /* pointer to external analyzer's private data */
struct bpf_map *used_maps[MAX_USED_MAPS]; /* array of map's used by eBPF program */
u32 used_map_cnt;   /* number of used maps */
u32 id_gen; /* used to generate unique reg IDs */
@@ -75,4 +83,7 @@ struct bpf_verifier_env {
struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
 };
 
+int bpf_analyzer(struct bpf_prog *prog, const struct bpf_ext_analyzer_ops *ops,
+void *priv);
+
 #endif /* _LINUX_BPF_ANALYZER_H */
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6e126a417290..d93e78331b90 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -624,6 +624,10 @@ static int check_packet_access(struct bpf_verifier_env *env, u32 regno, int off,
 static int check_ctx_access(struct bpf_verifier_env *env, int off, int size,
enum bpf_access_type t, enum bpf_reg_type *reg_type)
 {
+   /* for analyzer ctx accesses are already validated and converted */
+   if (env->analyzer_ops)
+   return 0;
+
if (env->prog->aux->ops->is_valid_access &&
env->prog->aux->ops->is_valid_access(off, size, t, reg_type)) {
/* remember the offset of last byte accessed in ctx */
@@ -2222,6 +2226,15 @@ static int is_state_visited(struct bpf_verifier_env *env, int insn_idx)
return 0;
 }
 
+static int ext_analyzer_insn_hook(struct bpf_verifier_env *env,
+ int insn_idx, int prev_insn_idx)
+{
+   if (!env->analyzer_ops || !env->analyzer_ops->insn_hook)
+   return 0;
+
+   return env->analyzer_ops->insn_hook(env, insn_idx, prev_insn_idx);
+}
+
 static int do_check(struct bpf_verifier_env *env)
 {
struct bpf_verifier_state *state = &env->cur_state;
@@ -2280,6 +2293,10 @@ static int do_check(struct bpf_verifier_env *env)
print_bpf_insn(insn);
}
 
+   err = ext_analyzer_insn_hook(env, insn_idx, prev_insn_idx);
+   if (err)
+   return err;
+
if (class == BPF_ALU || class == BPF_ALU64) {
err = check_alu_op(env, insn);
if (err)
@@ -2829,3 +2846,54 @@ int bpf_check(struct bpf_prog **prog, union bpf_attr *attr)
kfree(env);
return ret;
 }
+
+int bpf_analyzer(struct bpf_prog *prog, const struct bpf_ext_analyzer_ops *ops,
+void *priv)
+{
+   struct bpf_verifier_env *env;
+   int ret;
+
+   env = kzalloc(sizeof(struct bpf_verifier_env), GFP_KERNEL);
+   if (!env)
+   return -ENOMEM;
+
+   env->insn_aux_data = vzalloc(sizeof(struct bpf_insn_aux_data) *
+prog->len);
+   ret = -ENOMEM;
+   if (!env->insn_aux_data)
+   goto err_free_env;
+   env->prog = prog;
+   env->analyzer_ops = ops;
+   env->analyzer_priv = priv;
+
+   /* grab the mutex to protect few globals used by verifier */
+   mutex_lock(&bpf_verifier_lock);
+
+   log_level = 0;
+
+   env->explored_states = kcalloc(env->prog->len,
+  sizeof(struct bpf_verifier_state_list *),
+ 

[PATCHv4 net-next 11/15] nfp: bpf: allow offloaded filters to update stats

2016-09-15 Thread Jakub Kicinski
Periodically poll stats and call into offloaded actions
to update them.

Signed-off-by: Jakub Kicinski 
---
v3:
 - add missing hunk with ethtool stats.
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h   | 19 +++
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  3 ++
 .../net/ethernet/netronome/nfp/nfp_net_ethtool.c   | 12 +
 .../net/ethernet/netronome/nfp/nfp_net_offload.c   | 63 ++
 4 files changed, 97 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index ea6f5e667f27..13c6a9001b4d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -62,6 +62,9 @@
 /* Max time to wait for NFP to respond on updates (in seconds) */
 #define NFP_NET_POLL_TIMEOUT   5
 
+/* Interval for reading offloaded filter stats */
+#define NFP_NET_STAT_POLL_IVL  msecs_to_jiffies(100)
+
 /* Bar allocation */
 #define NFP_NET_CTRL_BAR   0
 #define NFP_NET_Q0_BAR 2
@@ -405,6 +408,11 @@ static inline bool nfp_net_fw_ver_eq(struct nfp_net_fw_version *fw_ver,
   fw_ver->minor == minor;
 }
 
+struct nfp_stat_pair {
+   u64 pkts;
+   u64 bytes;
+};
+
 /**
  * struct nfp_net - NFP network device structure
  * @pdev:   Backpointer to PCI device
@@ -428,6 +436,11 @@ static inline bool nfp_net_fw_ver_eq(struct nfp_net_fw_version *fw_ver,
  * @rss_cfg:RSS configuration
  * @rss_key:RSS secret key
  * @rss_itbl:   RSS indirection table
+ * @rx_filter: Filter offload statistics - dropped packets/bytes
+ * @rx_filter_prev:Filter offload statistics - values from previous update
+ * @rx_filter_change:  Jiffies when statistics last changed
+ * @rx_filter_stats_timer:  Timer for polling filter offload statistics
+ * @rx_filter_lock:Lock protecting timer state changes (teardown)
  * @max_tx_rings:   Maximum number of TX rings supported by the Firmware
  * @max_rx_rings:   Maximum number of RX rings supported by the Firmware
  * @num_tx_rings:   Currently configured number of TX rings
@@ -504,6 +517,11 @@ struct nfp_net {
u8 rss_key[NFP_NET_CFG_RSS_KEY_SZ];
u8 rss_itbl[NFP_NET_CFG_RSS_ITBL_SZ];
 
+   struct nfp_stat_pair rx_filter, rx_filter_prev;
+   unsigned long rx_filter_change;
+   struct timer_list rx_filter_stats_timer;
+   spinlock_t rx_filter_lock;
+
int max_tx_rings;
int max_rx_rings;
 
@@ -775,6 +793,7 @@ static inline void nfp_net_debugfs_adapter_del(struct nfp_net *nn)
 }
 #endif /* CONFIG_NFP_NET_DEBUG */
 
+void nfp_net_filter_stats_timer(unsigned long data);
 int
 nfp_net_bpf_offload(struct nfp_net *nn, u32 handle, __be16 proto,
struct tc_cls_bpf_offload *cls_bpf);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 51978dfe883b..f091eb758ca2 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -2703,10 +2703,13 @@ struct nfp_net *nfp_net_netdev_alloc(struct pci_dev *pdev,
nn->rxd_cnt = NFP_NET_RX_DESCS_DEFAULT;
 
spin_lock_init(&nn->reconfig_lock);
+   spin_lock_init(&nn->rx_filter_lock);
spin_lock_init(&nn->link_status_lock);
 
setup_timer(&nn->reconfig_timer,
nfp_net_reconfig_timer, (unsigned long)nn);
+   setup_timer(&nn->rx_filter_stats_timer,
+   nfp_net_filter_stats_timer, (unsigned long)nn);
 
return nn;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
index 4c9897220969..3418f2277e9d 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_ethtool.c
@@ -106,6 +106,18 @@ static const struct _nfp_net_et_stats nfp_net_et_stats[] = {
{"dev_tx_pkts", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_TX_FRAMES)},
{"dev_tx_mc_pkts", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_TX_MC_FRAMES)},
{"dev_tx_bc_pkts", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_TX_BC_FRAMES)},
+
+   {"bpf_pass_pkts", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_APP0_FRAMES)},
+   {"bpf_pass_bytes", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_APP0_BYTES)},
+   /* see comments in outro functions in nfp_bpf_jit.c to find out
+* how different BPF modes use app-specific counters
+*/
+   {"bpf_app1_pkts", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_APP1_FRAMES)},
+   {"bpf_app1_bytes", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_APP1_BYTES)},
+   {"bpf_app2_pkts", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_APP2_FRAMES)},
+   {"bpf_app2_bytes", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_APP2_BYTES)},
+   {"bpf_app3_pkts", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_APP3_FRAMES)},
+   {"bpf_app3_bytes", NN_ET_DEV_STAT(NFP_NET_CFG_STATS_APP3_BYTES)},
 };
 
 #define NN_ET_GLOBAL_STATS_LEN ARRAY_SIZE(nfp_net_et_stat

[PATCHv4 net-next 09/15] nfp: bpf: add hardware bpf offload

2016-09-15 Thread Jakub Kicinski
Add hardware bpf offload on our smart NICs.  Detect if
capable firmware is loaded and use it to load the code JITed
with the just-added translator onto the programmable engines.

This commit only supports offloading cls_bpf in legacy mode
(non-direct action).

Signed-off-by: Jakub Kicinski 
---
 drivers/net/ethernet/netronome/nfp/Makefile|   1 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h   |  26 ++-
 .../net/ethernet/netronome/nfp/nfp_net_common.c|  40 +++-
 drivers/net/ethernet/netronome/nfp/nfp_net_ctrl.h  |  44 -
 .../net/ethernet/netronome/nfp/nfp_net_offload.c   | 220 +
 5 files changed, 324 insertions(+), 7 deletions(-)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_net_offload.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index 5f12689bf523..0efb2ba9a558 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -3,6 +3,7 @@ obj-$(CONFIG_NFP_NETVF) += nfp_netvf.o
 nfp_netvf-objs := \
nfp_net_common.o \
nfp_net_ethtool.o \
+   nfp_net_offload.o \
nfp_netvf_main.o
 
 ifeq ($(CONFIG_BPF_SYSCALL),y)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 690635660195..ea6f5e667f27 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -220,7 +220,7 @@ struct nfp_net_tx_ring {
 #define PCIE_DESC_RX_I_TCP_CSUM_OK cpu_to_le16(BIT(11))
 #define PCIE_DESC_RX_I_UDP_CSUMcpu_to_le16(BIT(10))
 #define PCIE_DESC_RX_I_UDP_CSUM_OK cpu_to_le16(BIT(9))
-#define PCIE_DESC_RX_SPARE cpu_to_le16(BIT(8))
+#define PCIE_DESC_RX_BPF   cpu_to_le16(BIT(8))
 #define PCIE_DESC_RX_EOP   cpu_to_le16(BIT(7))
 #define PCIE_DESC_RX_IP4_CSUM  cpu_to_le16(BIT(6))
 #define PCIE_DESC_RX_IP4_CSUM_OK   cpu_to_le16(BIT(5))
@@ -413,6 +413,7 @@ static inline bool nfp_net_fw_ver_eq(struct nfp_net_fw_version *fw_ver,
  * @is_vf:  Is the driver attached to a VF?
  * @is_nfp3200: Is the driver for a NFP-3200 card?
  * @fw_loaded:  Is the firmware loaded?
+ * @bpf_offload_skip_sw:  Offloaded BPF program will not be rerun by cls_bpf
  * @ctrl:   Local copy of the control register/word.
  * @fl_bufsz:   Currently configured size of the freelist buffers
  * @rx_offset: Offset in the RX buffers where packet data starts
@@ -473,6 +474,7 @@ struct nfp_net {
unsigned is_vf:1;
unsigned is_nfp3200:1;
unsigned fw_loaded:1;
+   unsigned bpf_offload_skip_sw:1;
 
u32 ctrl;
u32 fl_bufsz;
@@ -561,12 +563,28 @@ struct nfp_net {
 /* Functions to read/write from/to a BAR
  * Performs any endian conversion necessary.
  */
+static inline u16 nn_readb(struct nfp_net *nn, int off)
+{
+   return readb(nn->ctrl_bar + off);
+}
+
 static inline void nn_writeb(struct nfp_net *nn, int off, u8 val)
 {
writeb(val, nn->ctrl_bar + off);
 }
 
-/* NFP-3200 can't handle 16-bit accesses too well - hence no readw/writew */
+/* NFP-3200 can't handle 16-bit accesses too well */
+static inline u16 nn_readw(struct nfp_net *nn, int off)
+{
+   WARN_ON_ONCE(nn->is_nfp3200);
+   return readw(nn->ctrl_bar + off);
+}
+
+static inline void nn_writew(struct nfp_net *nn, int off, u16 val)
+{
+   WARN_ON_ONCE(nn->is_nfp3200);
+   writew(val, nn->ctrl_bar + off);
+}
 
 static inline u32 nn_readl(struct nfp_net *nn, int off)
 {
@@ -757,4 +775,8 @@ static inline void nfp_net_debugfs_adapter_del(struct nfp_net *nn)
 }
 #endif /* CONFIG_NFP_NET_DEBUG */
 
+int
+nfp_net_bpf_offload(struct nfp_net *nn, u32 handle, __be16 proto,
+   struct tc_cls_bpf_offload *cls_bpf);
+
 #endif /* _NFP_NET_H_ */
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 252e4924de0f..51978dfe883b 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -60,6 +60,7 @@
 
 #include 
 
+#include 
 #include 
 
 #include "nfp_net_ctrl.h"
@@ -2382,6 +2383,31 @@ static struct rtnl_link_stats64 *nfp_net_stat64(struct net_device *netdev,
return stats;
 }
 
+static bool nfp_net_ebpf_capable(struct nfp_net *nn)
+{
+   if (nn->cap & NFP_NET_CFG_CTRL_BPF &&
+   nn_readb(nn, NFP_NET_CFG_BPF_ABI) == NFP_NET_BPF_ABI)
+   return true;
+   return false;
+}
+
+static int
+nfp_net_setup_tc(struct net_device *netdev, u32 handle, __be16 proto,
+struct tc_to_netdev *tc)
+{
+   struct nfp_net *nn = netdev_priv(netdev);
+
+   if (TC_H_MAJ(handle) != TC_H_MAJ(TC_H_INGRESS))
+   return -ENOTSUPP;
+   if (proto != htons(ETH_P_ALL))
+   return -ENOTSUPP;
+
+   if (tc->type == TC_SETUP_CLSBPF && nfp_ne
