Re: [pull request][for-next 00/12] Mellanox, mlx5 IPoIB Muli Pkey support 2017-10-11,Re: [pull request][for-next 00/12] Mellanox, mlx5 IPoIB Muli Pkey support 2017-10-11

2017-10-15 Thread David Miller
From: Doug Ledford 
Date: Sat, 14 Oct 2017 17:19:34 -0400

> On 10/14/2017 2:48 PM, Saeed Mahameed wrote:
>> Hi Dave and Doug,
>> 
>> This series includes updates for mlx5 IPoIB offloading driver from Alex
>> and Feras to add the support for Muli Pkey in the mlx5i ipoib offloading 
>> netdev,
>> to be merged into net-next and rdma-next trees.
> 
> As far as the two IPoIB patches are concerned, they're fine.
> 
>> Doug, I am sorry I couldn't base this on rc2 since the series needs and 
>> conflicts
>> with a fix that was submitted to rc3, so to keep things simple I based it on 
>> rc4,
>> I hope this is ok with you..
> 
> No worries, it just means I have to submit it under another branch.  But
> I'm already holding one patch series in a stand alone branch, so no big
> deal.  And, actually, the IPoIB changes are so small they can simply go
> through Dave's tree if you don't have any dependent code in the IPoIB
> driver to submit after this but still in this devel cycle.
> 
>> Please pull and let me know if there's any problem.
> 
> Once I hear that Dave is OK with the net changes, I'm ready to pull (if
> I need to).

They look fine and I've pulled this into net-next.

Thanks.


RE: [PATCH v2 2/5] dpaa_eth: move of_phy_connect() to the eth driver

2017-10-15 Thread Madalin-cristian Bucur
> -Original Message-
> From: Andrew Lunn [mailto:and...@lunn.ch]
> Sent: Sunday, October 15, 2017 9:34 PM
> To: Madalin-cristian Bucur 
> Cc: netdev@vger.kernel.org; da...@davemloft.net; f.faine...@gmail.com;
> vivien.dide...@savoirfairelinux.com; jun...@outlook.com; linux-
> ker...@vger.kernel.org
> Subject: Re: [PATCH v2 2/5] dpaa_eth: move of_phy_connect() to the eth
> driver
> 
> On Fri, Oct 13, 2017 at 05:50:09PM +0300, Madalin Bucur wrote:
> > Signed-off-by: Madalin Bucur 
> > ---
> >  drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 48 +++--
> >  drivers/net/ethernet/freescale/fman/mac.c  | 97 ++-
> ---
> >  drivers/net/ethernet/freescale/fman/mac.h  |  5 +-
> >  3 files changed, 66 insertions(+), 84 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> > index 4225806..7cf61d6 100644
> > --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> > +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> > @@ -2435,6 +2435,48 @@ static void dpaa_eth_napi_disable(struct
> dpaa_priv *priv)
> > }
> >  }
> >
> > +static void dpaa_adjust_link(struct net_device *net_dev)
> > +{
> > +   struct mac_device *mac_dev;
> > +   struct dpaa_priv *priv;
> > +
> > +   priv = netdev_priv(net_dev);
> > +   mac_dev = priv->mac_dev;
> > +   mac_dev->adjust_link(mac_dev);
> > +}
> > +
> > +static int dpaa_phy_init(struct net_device *net_dev)
> > +{
> > +   struct mac_device *mac_dev;
> > +   struct phy_device *phy_dev;
> > +   struct dpaa_priv *priv;
> > +
> > +   priv = netdev_priv(net_dev);
> > +   mac_dev = priv->mac_dev;
> > +
> > +   phy_dev = of_phy_connect(net_dev, mac_dev->phy_node,
> > +_adjust_link, 0,
> > +mac_dev->phy_if);
> > +   if (!phy_dev) {
> > +   netif_err(priv, ifup, net_dev, "init_phy() failed\n");
> > +   return -ENODEV;
> > +   }
> > +
> > +   /* Remove any features not supported by the controller */
> > +   phy_dev->supported &= mac_dev->if_support;
> > +
> > +   /* Enable the symmetric and asymmetric PAUSE frame advertisements,
> > +* as most of the PHY drivers do not enable them by default.
> > +*/
> 
> Hi Madalin
> 
> This is just moving code around, so the patch is O.K. However, it
> would be nice to have a followup patch. This comment is wrong. The phy
> driver should never enable symmetric and asymmetric PAUSE frames. The
> MAC needs to, because only the MAC knows if the MAC supports pause
> frames.
> 
>   Andrew

Hi Andrew,

This is obsolete and it will be removed, I'll send a v3. It remained there from
a time it used to be valid (the original DPAA Ethernet driver was developed
and maintained out of tree since about 9 years ago). I see this thread on the
subject which is relatively recent:

https://www.spinics.net/lists/netdev/msg404288.html

Thanks,
Madalin


Re: [PATCH 3/3][v2] selftests: silence test output by default

2017-10-15 Thread Michael Ellerman
Shuah Khan  writes:

> On 09/19/2017 07:51 AM, jo...@toxicpanda.com wrote:
>> From: Josef Bacik 
>> 
>> Some of the networking tests are very noisy and make it impossible to
>> see if we actually passed the tests as they run.  Default to suppressing
>> the output from any tests run in order to make it easier to track what
>> failed.
>> 
>> Signed-off-by: Josef Bacik 
>> ---
>> v1->v2:
>> - dump output into /tmp/testname instead of /dev/null
>> 
>
> Thanks for the fix. Applied to linux-kselftest for 4.14-rc2

Sorry this is not a fix.

This is a regression, it breaks all my test infrastructure, because we
use the output of the test case.

Can we please revert this and fix any tests that are overly verbose.

cheers


RE: [PATCH net v2 2/2] net: fec: Let fec_ptp have its own interrupt routine

2017-10-15 Thread Andy Duan
From: Troy Kisky  Sent: Saturday, October 14, 
2017 10:10 AM
>This is better for code locality and should slightly speed up normal 
>interrupts.
>
>This also allows PPS clock output to start working for i.mx7. This is because
>i.mx7 was already using the limit of 3 interrupts, and needed another.
>
>Signed-off-by: Troy Kisky 
>
>---
>
>v2: made this change independent of any devicetree change so that old dtbs
>continue to work.
>
>Continue to register ptp clock if interrupt is not found.
>---
> drivers/net/ethernet/freescale/fec.h  |  3 +-
> drivers/net/ethernet/freescale/fec_main.c | 25 ++
>drivers/net/ethernet/freescale/fec_ptp.c  | 82 ++
>-
> 3 files changed, 65 insertions(+), 45 deletions(-)
>
>diff --git a/drivers/net/ethernet/freescale/fec.h
>b/drivers/net/ethernet/freescale/fec.h
>index ede1876a9a19..be56ac1f1ac4 100644
>--- a/drivers/net/ethernet/freescale/fec.h
>+++ b/drivers/net/ethernet/freescale/fec.h
>@@ -582,12 +582,11 @@ struct fec_enet_private {
>   u64 ethtool_stats[0];
> };
>
>-void fec_ptp_init(struct platform_device *pdev);
>+void fec_ptp_init(struct platform_device *pdev, int irq_index);
> void fec_ptp_stop(struct platform_device *pdev);  void
>fec_ptp_start_cyclecounter(struct net_device *ndev);  int fec_ptp_set(struct
>net_device *ndev, struct ifreq *ifr);  int fec_ptp_get(struct net_device *ndev,
>struct ifreq *ifr); -uint fec_ptp_check_pps_event(struct fec_enet_private
>*fep);
>
>
>/**
>**/
> #endif /* FEC_H */
>diff --git a/drivers/net/ethernet/freescale/fec_main.c
>b/drivers/net/ethernet/freescale/fec_main.c
>index 3dc2d771a222..21afabbc560f 100644
>--- a/drivers/net/ethernet/freescale/fec_main.c
>+++ b/drivers/net/ethernet/freescale/fec_main.c
>@@ -1602,10 +1602,6 @@ fec_enet_interrupt(int irq, void *dev_id)
>   ret = IRQ_HANDLED;
>   complete(>mdio_done);
>   }
>-
>-  if (fep->ptp_clock)
>-  if (fec_ptp_check_pps_event(fep))
>-  ret = IRQ_HANDLED;
>   return ret;
> }
>
>@@ -3325,6 +3321,8 @@ fec_probe(struct platform_device *pdev)
>   struct device_node *np = pdev->dev.of_node, *phy_node;
>   int num_tx_qs;
>   int num_rx_qs;
>+  char irq_name[8];
>+  int irq_cnt;
>
>   fec_enet_get_queue_num(pdev, _tx_qs, _rx_qs);
>
>@@ -3465,18 +3463,27 @@ fec_probe(struct platform_device *pdev)
>   if (ret)
>   goto failed_reset;
>
>+  irq_cnt = platform_irq_count(pdev);
>+  if (irq_cnt > FEC_IRQ_NUM)
>+  irq_cnt = FEC_IRQ_NUM;  /* last for ptp */
>+  else if (irq_cnt == 2)
>+  irq_cnt = 1;/* last for ptp */
>+  else if (irq_cnt <= 0)
>+  irq_cnt = 1;/* Let the for loop fail */

Don't do like this. Don't suppose pps interrupt is the last one.
And if irq_cnt is 1 like imx28/imx5x,  the patch will break fec interrupt 
function.

I suggest to use .platform_get_irq_byname() to get pps(ptp) interrupt like your 
v1 logic check.

>+
>   if (fep->bufdesc_ex)
>-  fec_ptp_init(pdev);
>+  fec_ptp_init(pdev, irq_cnt);
>
>   ret = fec_enet_init(ndev);
>   if (ret)
>   goto failed_init;
>
>-  for (i = 0; i < FEC_IRQ_NUM; i++) {
>-  irq = platform_get_irq(pdev, i);
>+  for (i = 0; i < irq_cnt; i++) {
>+  sprintf(irq_name, "int%d", i);
>+  irq = platform_get_irq_byname(pdev, irq_name);
>+  if (irq < 0)
>+  irq = platform_get_irq(pdev, i);
>   if (irq < 0) {
>-  if (i)
>-  break;
>   ret = irq;
>   goto failed_irq;
>   }
>diff --git a/drivers/net/ethernet/freescale/fec_ptp.c
>b/drivers/net/ethernet/freescale/fec_ptp.c
>index 6ebad3fac81d..3abeee0d16dd 100644
>--- a/drivers/net/ethernet/freescale/fec_ptp.c
>+++ b/drivers/net/ethernet/freescale/fec_ptp.c
>@@ -549,6 +549,37 @@ static void fec_time_keep(struct work_struct *work)
>   schedule_delayed_work(>time_keep, HZ);  }
>
>+/* This function checks the pps event and reloads the timer compare
>+counter. */ static irqreturn_t fec_ptp_interrupt(int irq, void *dev_id)
>+{
>+  struct net_device *ndev = dev_id;
>+  struct fec_enet_private *fep = netdev_priv(ndev);
>+  u32 val;
>+  u8 channel = fep->pps_channel;
>+  struct ptp_clock_event event;
>+
>+  val = readl(fep->hwp + FEC_TCSR(channel));
>+  if (val & FEC_T_TF_MASK) {
>+  /* Write the next next compare(not the next according the
>spec)
>+   * value to the register
>+   */
>+  writel(fep->next_counter, fep->hwp + FEC_TCCR(channel));
>+  do {
>+  writel(val, fep->hwp + FEC_TCSR(channel));
>+  } while (readl(fep->hwp 

RE: [PATCH v2 5/5] fsl/fman: add dpaa in module names

2017-10-15 Thread Madalin-cristian Bucur
> -Original Message-
> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> Sent: Friday, October 13, 2017 8:39 PM
> To: Madalin-cristian Bucur ;
> netdev@vger.kernel.org; da...@davemloft.net
> Cc: and...@lunn.ch; vivien.dide...@savoirfairelinux.com;
> jun...@outlook.com; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH v2 5/5] fsl/fman: add dpaa in module names
> 
> On 10/13/2017 07:50 AM, Madalin Bucur wrote:
> > Signed-off-by: Madalin Bucur 
> 
> You should provide a line or two to explain why are you making this
> change, it is to resolve modular build configurations?

This change just renames the FMan driver modules, using a common prefix for
the DPAA FMan and DPAA Ethernet drivers. Besides making the names more aligned,
this allows writing udev rules that match on either driver name, if needed,
using the fsl_dpaa_* prefix. The change of netdev dev required for the DSA
probing makes the previous rules written using this prefix fail, this change
makes them work again.

> > ---
> >  drivers/net/ethernet/freescale/fman/Makefile | 12 ++--
> >  1 file changed, 6 insertions(+), 6 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/freescale/fman/Makefile
> b/drivers/net/ethernet/freescale/fman/Makefile
> > index 2c38119..4ae524a 100644
> > --- a/drivers/net/ethernet/freescale/fman/Makefile
> > +++ b/drivers/net/ethernet/freescale/fman/Makefile
> > @@ -1,9 +1,9 @@
> >  subdir-ccflags-y +=  -I$(srctree)/drivers/net/ethernet/freescale/fman
> >
> > -obj-$(CONFIG_FSL_FMAN) += fsl_fman.o
> > -obj-$(CONFIG_FSL_FMAN) += fsl_fman_port.o
> > -obj-$(CONFIG_FSL_FMAN) += fsl_mac.o
> > +obj-$(CONFIG_FSL_FMAN) += fsl_dpaa_fman.o
> > +obj-$(CONFIG_FSL_FMAN) += fsl_dpaa_fman_port.o
> > +obj-$(CONFIG_FSL_FMAN) += fsl_dpaa_mac.o
> >
> > -fsl_fman-objs  := fman_muram.o fman.o fman_sp.o fman_keygen.o
> > -fsl_fman_port-objs := fman_port.o
> > -fsl_mac-objs:= mac.o fman_dtsec.o fman_memac.o fman_tgec.o
> > +fsl_dpaa_fman-objs := fman_muram.o fman.o fman_sp.o fman_keygen.o
> > +fsl_dpaa_fman_port-objs := fman_port.o
> > +fsl_dpaa_mac-objs:= mac.o fman_dtsec.o fman_memac.o fman_tgec.o
> >
> 
> 
> --
> Florian


To learn the netdev code, please give me some advice...

2017-10-15 Thread admin

Hi, all

    I want to learn network programming, and hope to read the kernel 
code of netdev.


    Please give me some advice.

    Thanks!




Re: routing UAPI mismatch invalid state behavior?

2017-10-15 Thread David Ahern
On 10/15/17 11:02 AM, Alexander Aring wrote:
> Hi,
> 
> I figure out some problem, easy to reproduce:
> 
> # setup dummy
> $ modprobe dummy
> $ ip link set dummy0 up
> 
> # issue
> $ ip route replace default via 169.254.65.37 dev dummy0
> RTNETLINK answers: Network is unreachable

This fails in fib_check_nh. Basically the lookup of the address fails
making it an invalid gateway.

> 
> so it will forbid me to do that, but:
> 
> $ ip route replace 169.254.65.37 dev dummy0

This succeeds because iproute2 adds NLM_F_CREATE to flags along with
REPLACE:

if (matches(*argv, "replace") == 0)
return iproute_modify(RTM_NEWROUTE,
NLM_F_CREATE|NLM_F_REPLACE,
  argc-1, argv+1);

but your overall intent here of resolving 169.254.65.37 is the key.

> $ ip route replace default via 169.254.65.37 dev dummy0

The existence of the previous route allows the gateway check in
fib_check_nh to succeed.

> $ ip route del 169.254.65.37 dev dummy0
> 
> allows me to do that. Is there now a invalid state in my routing table
> or is it an expected behavior?

There are no recursive checks to gateway lookups once the routes are
installed. With the way routes and nexthops are currently created doing
so would be very expensive in some setups.


Re: [PATCH net-next 0/2] Add mqprio hardware offload support in hns3 driver

2017-10-15 Thread Yunsheng Lin
Hi, Yuval

On 2017/10/15 16:51, Yuval Mintz wrote:
> This patchset adds a new hardware offload type in mqprio before
>> adding
> mqprio hardware offload support in hns3 driver.
> 
> Apparently Dave has already accepted  Amirtha's changes to mqprio:
> https://marc.info/?l=linux-netdev=150803219824053=2 
> so I guess you need to revise your patchs to align to the new conventions.

Ok.

"If offloads are supported by setting the 'hw' option to 1, the default
offload mode is 'dcb' where only the TC values are offloaded to the
device. "

According to the description of the above patchset, the default mode is already
dcb, so i will drop the dcb mode patch.

I think the scenario you mentioned still existed, and I am willing to implement
it if we come to a solution that will suit most in the community.

Thanks,
Yunsheng Lin

> 

 I think one of the biggest issues in tying this to DCB configuration is the
 non-immediate [and possibly non persistent] configuration.

 Scenario #1:
 User is configuring mqprio offloaded with 3 TCs while device is in willing
>>> mode.
 Would you expect the driver to immediately respond with a success or
>>> instead
 delay the return until the DCBx negotiation is complete and the
>> operational
 num of TCs is actually 3?
>>>
>>> Well, when user requsts the mqprio offloaded by a hardware shared by
>> DCB,
>>> I expect
>>> the user is not using the dcb tool.
>>> If user is still using dcb tool, then result is undefined.
>>>
>>> The scenario you mention maybe can be enforced by setting willing to zero
>>> when user
>>> is requesting the mqprio offload, and restore the willing bit when unloaded
>>> the mqprio
>>> offload.
>>
>> Sounds a bit harsh but would probably work.
>>
>>> But I think the real issue is that dcb and mqprio shares the tc system in 
>>> the
>>> stack,
>>> the problem may be better to be fixed in the stack rather than in the
>> driver,
>>> as you
>>> suggested in the DCB patchset. What do you think?
>>
>> What did you have in mind?
>>
>>>

 Scenario #2:
 Assume user explicitly offloaded mqprio with 3 TCs, but now DCB
>>> configuration
 has changed on the peer side and 4 TCs is the new negotiated operational
>>> value.
 Your current driver logic would change the number of TCs underneath
>> the
>>> user
 configuration [and it would actually probably work due to mqprio being a
>>> crappy
 qdisc]. But was that the user actual intention?
 [I think the likely answer in this scenario is 'yes' since the alternative 
 is no
>>> better.
 But I still thought it was worth mentioning]
>>>
>>> You are right, the problem also have something to do with mqprio and dcb
>>> sharing
>>> the tc in the stack.
>>>
>>> Druing testing, when user explicitly offloaded mqprio with 3 TCs, all
>>> queue has a default pfifo mqprio attached, after DCB changes the tc num
>> to
>>> 4,
>>> using tc qdisc shows some queue does not have a default pfifo mqprio
>>> attached.
>>
>> Really? Then what did it show?
>> [I assume it has some pfifo attached, and it's an mqprio dump kind of an
>> issue]
>>
>>>
>>> Maybe we can add a callback to notify mqprio the configuration has
>> changed.
>>>
>>
>> Which would do what?
>> You already have the notifications available for monitoring using dcbnl 
>> logic if
>> the
>> configuration change [for user]; So user can re-configure whatever it wants.
>> But other than dropping all the qdisc configurations and going back to the
>> default
>> qdiscs, what default action would mqprio be able to do when configuration
>> changes
>> that actually makes sense?
>>
>>> Thanks
>>> Yunsheng Lin
>>>

 Cheers,
 Yuval

>
> Yunsheng Lin (2):
>   mqprio: Add a new hardware offload type in mqprio
>   net: hns3: Add mqprio hardware offload support in hns3 driver
>
>  drivers/net/ethernet/hisilicon/hns3/hnae3.h|  1 +
>  .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 23 +++
>  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 46
>>> ++-
> ---
>  include/uapi/linux/pkt_sched.h |  1 +
>  4 files changed, 55 insertions(+), 16 deletions(-)
>
> --
> 1.9.1



> 



Re: [PATCH net-next 0/2] Add mqprio hardware offload support in hns3 driver

2017-10-15 Thread Yunsheng Lin
Hi, Yuval

On 2017/10/15 13:14, Yuval Mintz wrote:
>> Hi, Yuval
>>
>> On 2017/10/13 4:21, Yuval Mintz wrote:
 This patchset adds a new hardware offload type in mqprio before adding
 mqprio hardware offload support in hns3 driver.
>>>
>>> I think one of the biggest issues in tying this to DCB configuration is the
>>> non-immediate [and possibly non persistent] configuration.
>>>
>>> Scenario #1:
>>> User is configuring mqprio offloaded with 3 TCs while device is in willing
>> mode.
>>> Would you expect the driver to immediately respond with a success or
>> instead
>>> delay the return until the DCBx negotiation is complete and the operational
>>> num of TCs is actually 3?
>>
>> Well, when user requsts the mqprio offloaded by a hardware shared by DCB,
>> I expect
>> the user is not using the dcb tool.
>> If user is still using dcb tool, then result is undefined.
>>
>> The scenario you mention maybe can be enforced by setting willing to zero
>> when user
>> is requesting the mqprio offload, and restore the willing bit when unloaded
>> the mqprio
>> offload.
> 
> Sounds a bit harsh but would probably work.
> 
>> But I think the real issue is that dcb and mqprio shares the tc system in the
>> stack,
>> the problem may be better to be fixed in the stack rather than in the driver,
>> as you
>> suggested in the DCB patchset. What do you think?
> 
> What did you have in mind?

I was thinking maybe the tc system can provide a notification to mqprio and dcb.
mqprio and dcb register a callback to the tc system, when there is some change 
of
tc configuration, the tc system call the callback from mqprio and dcb.

> 
>>
>>>
>>> Scenario #2:
>>> Assume user explicitly offloaded mqprio with 3 TCs, but now DCB
>> configuration
>>> has changed on the peer side and 4 TCs is the new negotiated operational
>> value.
>>> Your current driver logic would change the number of TCs underneath the
>> user
>>> configuration [and it would actually probably work due to mqprio being a
>> crappy
>>> qdisc]. But was that the user actual intention?
>>> [I think the likely answer in this scenario is 'yes' since the alternative 
>>> is no
>> better.
>>> But I still thought it was worth mentioning]
>>
>> You are right, the problem also have something to do with mqprio and dcb
>> sharing
>> the tc in the stack.
>>
>> Druing testing, when user explicitly offloaded mqprio with 3 TCs, all
>> queue has a default pfifo mqprio attached, after DCB changes the tc num to
>> 4,
>> using tc qdisc shows some queue does not have a default pfifo mqprio
>> attached.
> 
> Really? Then what did it show? 
> [I assume it has some pfifo attached, and it's an mqprio dump kind of an 
> issue]

When queue size of the ndev is 16 and tc num is 3, we set the real queue size to
15 ( 5 * 3 = 15), mqprio only attach pfifo to the first 15 queue, when tc num 
change
to 4 by DCB, we set the real queue size to 16 (4 * 4 = 16).
So tc qdisc shows the last queue has no qdisc attached.

> 
>>
>> Maybe we can add a callback to notify mqprio the configuration has changed.
>>
> 
> Which would do what?
> You already have the notifications available for monitoring using dcbnl logic 
> if the
> configuration change [for user]; So user can re-configure whatever it wants.

Yes, if user is only using dcb tool.

> But other than dropping all the qdisc configurations and going back to the 
> default
> qdiscs, what default action would mqprio be able to do when configuration 
> changes
> that actually makes sense?

As explained above, after dcb changing the configuration, some queue may have 
no qdisc
attached, so I was thinking maybe we can add pfifo to it if there is no qdsic 
attached
to it.

Thanks,
Yunsheng Lin

> 
>> Thanks
>> Yunsheng Lin
>>
>>>
>>> Cheers,
>>> Yuval
>>>

 Yunsheng Lin (2):
   mqprio: Add a new hardware offload type in mqprio
   net: hns3: Add mqprio hardware offload support in hns3 driver

  drivers/net/ethernet/hisilicon/hns3/hnae3.h|  1 +
  .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 23 +++
  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 46
>> ++-
 ---
  include/uapi/linux/pkt_sched.h |  1 +
  4 files changed, 55 insertions(+), 16 deletions(-)

 --
 1.9.1
>>>
>>>
>>>
> 



Re: [PATCH net 5/6] rtnetlink: check DO_SETLINK_NOTIFY correctly in do_setlink

2017-10-15 Thread David Ahern
[ cc'ed Nicolas ]

On 10/15/17 4:13 AM, Xin Long wrote:
> The check 'status & DO_SETLINK_NOTIFY' in do_setlink doesn't really
> work after status & DO_SETLINK_MODIFIED, as:
> 
>   DO_SETLINK_MODIFIED 0x1
>   DO_SETLINK_NOTIFY 0x3
> 
> Considering that notifications are suppposed to be sent only when
> status have the flag DO_SETLINK_NOTIFY, the right check would be:
> 
>   (status & DO_SETLINK_NOTIFY) == DO_SETLINK_NOTIFY
> 
> This would avoid lots of duplicated notifications when setting some
> properties of a link.
> 
> Fixes: ba9989069f4e ("rtnl/do_setlink(): notify when a netdev is modified")
> Signed-off-by: Xin Long 
> ---
>  net/core/rtnetlink.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index ab98c1c..3e98fb5 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -2248,7 +2248,7 @@ static int do_setlink(const struct sk_buff *skb,
>  
>  errout:
>   if (status & DO_SETLINK_MODIFIED) {
> - if (status & DO_SETLINK_NOTIFY)
> + if ((status & DO_SETLINK_NOTIFY) == DO_SETLINK_NOTIFY)
>   netdev_state_change(dev);
>  
>   if (err < 0)
> 

Good catch.

Acked-by: David Ahern 


Re: [PATCH net 6/6] rtnetlink: do not set notification for tx_queue_len in do_setlink

2017-10-15 Thread David Ahern
On 10/15/17 4:13 AM, Xin Long wrote:
> NETDEV_CHANGE_TX_QUEUE_LEN event process in rtnetlink_event would
> send a notification for userspace and tx_queue_len's setting in
> do_setlink would trigger NETDEV_CHANGE_TX_QUEUE_LEN.
> 
> So it shouldn't set DO_SETLINK_NOTIFY status for this change to
> send a notification any more.
> 
> Signed-off-by: Xin Long 
> ---
>  net/core/rtnetlink.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 3e98fb5..a6bcf86 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -2093,7 +2093,7 @@ static int do_setlink(const struct sk_buff *skb,
>   dev->tx_queue_len = orig_len;
>   goto errout;
>   }
> - status |= DO_SETLINK_NOTIFY;
> + status |= DO_SETLINK_MODIFIED;
>   }
>   }
>  
> 

Acked-by: David Ahern 



Re: [PATCH net 2/6] rtnetlink: bring NETDEV_CHANGE_TX_QUEUE_LEN event process back in rtnetlink_event

2017-10-15 Thread David Ahern
On 10/15/17 4:13 AM, Xin Long wrote:
> The same fix for changing mtu in the patch 'rtnetlink: bring
> NETDEV_CHANGEMTU event process back in rtnetlink_event' is
> needed for changing tx_queue_len.
> 
> Note that the redundant notifications issue for tx_queue_len
> will be fixed in the later patch 'rtnetlink: do not send
> notification for tx_queue_len in do_setlink'.
> 
> Fixes: 27b3b551d8a7 ("rtnetlink: Do not generate notifications for 
> NETDEV_CHANGE_TX_QUEUE_LEN event")
> Signed-off-by: Xin Long 
> ---
>  net/core/rtnetlink.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 72053ed..bf47360 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -4287,6 +4287,7 @@ static int rtnetlink_event(struct notifier_block *this, 
> unsigned long event, voi
>   case NETDEV_NOTIFY_PEERS:
>   case NETDEV_RESEND_IGMP:
>   case NETDEV_CHANGEINFODATA:
> + case NETDEV_CHANGE_TX_QUEUE_LEN:
>   rtmsg_ifinfo_event(RTM_NEWLINK, dev, 0, rtnl_get_event(event),
>  GFP_KERNEL);
>   break;
> 

Acked-by: David Ahern 



Re: [PATCH net 4/6] rtnetlink: bring NETDEV_CHANGEUPPER event process back in rtnetlink_event

2017-10-15 Thread David Ahern
On 10/15/17 4:13 AM, Xin Long wrote:
> libteam needs this event notification in userspace when dev's master
> dev has been changed. After this, the redundant notifications issue
> would be fixed in the later patch 'rtnetlink: check DO_SETLINK_NOTIFY
> correctly in do_setlink'.
> 
> Fixes: b6b36eb23a46 ("rtnetlink: Do not generate notifications for 
> NETDEV_CHANGEUPPER event")
> Signed-off-by: Xin Long 
> ---
>  net/core/rtnetlink.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index 8e44fd5..ab98c1c 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -4286,6 +4286,7 @@ static int rtnetlink_event(struct notifier_block *this, 
> unsigned long event, voi
>   case NETDEV_BONDING_FAILOVER:
>   case NETDEV_POST_TYPE_CHANGE:
>   case NETDEV_NOTIFY_PEERS:
> + case NETDEV_CHANGEUPPER:
>   case NETDEV_RESEND_IGMP:
>   case NETDEV_CHANGEINFODATA:
>   case NETDEV_CHANGE_TX_QUEUE_LEN:
> 

Acked-by: David Ahern 


Re: [PATCH net 3/6] rtnetlink: bring NETDEV_POST_TYPE_CHANGE event process back in rtnetlink_event

2017-10-15 Thread David Ahern
On 10/15/17 4:13 AM, Xin Long wrote:
> As I said in patch 'rtnetlink: bring NETDEV_CHANGEMTU event process back
> in rtnetlink_event', removing NETDEV_POST_TYPE_CHANGE event was not the
> right fix for the redundant notifications issue.
> 
> So bring this event process back to rtnetlink_event and the old redundant
> notifications issue would be fixed in the later patch 'rtnetlink: check
> DO_SETLINK_NOTIFY correctly in do_setlink'.
> 
> Fixes: aef091ae58aa ("rtnetlink: Do not generate notifications for 
> POST_TYPE_CHANGE event")
> Signed-off-by: Xin Long 
> ---
>  net/core/rtnetlink.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index bf47360..8e44fd5 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -4284,6 +4284,7 @@ static int rtnetlink_event(struct notifier_block *this, 
> unsigned long event, voi
>   case NETDEV_CHANGENAME:
>   case NETDEV_FEAT_CHANGE:
>   case NETDEV_BONDING_FAILOVER:
> + case NETDEV_POST_TYPE_CHANGE:
>   case NETDEV_NOTIFY_PEERS:
>   case NETDEV_RESEND_IGMP:
>   case NETDEV_CHANGEINFODATA:
> 

Acked-by: David Ahern 


Re: RFC: making cn_proc work in {pid,user} namespaces

2017-10-15 Thread Eric W. Biederman
Aleksa Sarai  writes:

> Hi all,
>
> At the moment, cn_proc is not usable by containers or container runtimes. In
> addition, all connectors have an odd relationship with init_net (for example,
> /proc/net/connectors only exists in init_net). There are two main use-cases 
> that
> would be perfect for cn_proc, which is the reason for me pushing this:
>
> First, when adding a process to an existing container, in certain modes runc
> would like to know that process's exit code. But, when joining a PID 
> namespace,
> it is advisable[1] to always double-fork after doing the setns(2) to reparent
> the joining process to the init of the container (this causes the SIGCHLD to 
> be
> received by the container init).  It would also be useful to be able to 
> monitor
> the exit code of the init process in a container without being its parent. At
> the moment, cn_proc doesn't allow unprivileged users to use it (making it a
> problem for user namespaces and "rootless containers"). In addition, it also
> doesn't allow nested containers to use it, because it requires the process to 
> be
> in init_pid. As a result, runc cannot use cn_proc and relies on SIGCHLD (which
> can only be used if we don't double-fork, or keep around a long-running 
> process
> which is something that runc also cannot do).

As far as I know there are no technical issues that require a
daemonizing double fork when injecting a process into a pid namespaces.
A fork is required because the pid is changing and that requires another
process.

Monitoring and acting on the monitored state without keeping around a
single process to do the monitoring does not make sense to me.  So I am
just going to ignore that.

So I don't think fixing cn_proc for this issue makes sense.

> Secondly, there are/were some init systems that rely on cn_proc to manage
> service state. From a "it would be neat" perspective, I think it would be 
> quite
> nice if such init systems could be used inside containers. But that requires
> cn_proc to be able to be used as an unprivileged user and in a pid namespace
> other than init_pid.

Any pointers to these init systems?  In general I agree.  Given how much
work it takes to go through a subsystem and ensure that it is safe for
non-root users I am happy to see the work done, but I am not
volunteering for the work when I have several I have as many tasks as I
have on my plate right now.

> The /proc/net/connectors thing is quite easily resolved (just make it the
> connector driver perdev and make some small changes to make sure the 
> interfaces
> stay sane inside of a container's network namespace). I'm sure that we'll
> probably have to make some changes to the registration API, so that a 
> connector
> can specify whether they want to be visible to non-init_net
> namespaces.
>
> However, the cn_proc problem is a bit harder to resolve nicely and there are
> quite a few interface questions that would need to be agreed upon. The basic
> idea would be that a process can only get cn_proc events if it has
> ptrace_may_access rights over said process (effectively a forced filter -- 
> which
> would ideally be done send-side but it looks like it might have to be done
> receive-side). This should resolve possible concerns about an unprivileged
> process being able to inspect (fairly granular) information about the host. 
> And
> obviously the pids, uids, and gids would all be translated according to the
> receiving process's user namespaces (if it cannot be translated then the 
> message
> is not received). I guess that the translation would be done in the same way 
> as
> SCM_CREDENTIALS (and cgroup.procs files), which is that it's done on the 
> receive
> side not the send side.

Hmm.  We have several of these things such as bsd process accounting
which appear to be working fine.

The basic logic winds up being:
for_each_receiver:
compose_msg in receivers namespace
send_msg.

The tricky bit in my mind is dealing with receivers because of the
connection with the network namespace.

SCM_CREDENTIALS is an unfortunate case, that really should not be
followed as a model.  The major challenge there is not knowing
the receiving socket, or the receiver.  If I had been smarter
when I coded that originally I would have forced everything into
the namespace of the opener of the receiving socket.   I may have to
revisit that one again someday and see if there are improvements that
can be made.

> My reason for sending this email rather than just writing the patch is to see
> whether anyone has any solid NACKs against the use-case or whether there is 
> some
> fundamental issue that I'm not seeing. If nobody objects, I'll be happy to 
> work
> on this.

If you want a non-crazy (with respect to namespace involvement) model
please look at kernel/acct.c:acc_process()

If there are use cases that people still care about that use the
proc connector and want to run in a container it seems sensible to dig
in and sort things out.  I 

Re: [PATCH] bpf: devmap: Check attr->max_entries more carefully

2017-10-15 Thread Richard Weinberger
Am Montag, 16. Oktober 2017, 00:00:20 CEST schrieb Richard Weinberger:
> max_entries is user controlled and used as input for __alloc_percpu().
> This function expects that the allocation size is a power of two and
> less than PCPU_MIN_UNIT_SIZE.
> Otherwise a WARN() is triggered.

On a second though, I think we should also have a hard limit for ->max_entries 
or check for an overflow here:

static u64 dev_map_bitmap_size(const union bpf_attr *attr)
{
return BITS_TO_LONGS(attr->max_entries) * sizeof(unsigned long);
}

Thanks,
//richard



[PATCH] bpf: devmap: Check attr->max_entries more carefully

2017-10-15 Thread Richard Weinberger
max_entries is user controlled and used as input for __alloc_percpu().
This function expects that the allocation size is a power of two and
less than PCPU_MIN_UNIT_SIZE.
Otherwise a WARN() is triggered.

Fixes: 11393cc9b9be ("xdp: Add batching support to redirect map")
Reported-by: Shankara Pailoor 
Reported-by: syzkaller 
Signed-off-by: Richard Weinberger 
---
 kernel/bpf/devmap.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index e093d9a2c4dd..6ce00083103b 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -49,6 +49,7 @@
  */
 #include 
 #include 
+#include 
 
 struct bpf_dtab_netdev {
struct net_device *dev;
@@ -77,6 +78,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
struct bpf_dtab *dtab;
int err = -EINVAL;
u64 cost;
+   size_t palloc_size;
 
/* check sanity of attributes */
if (attr->max_entries == 0 || attr->key_size != 4 ||
@@ -95,9 +97,14 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
dtab->map.map_flags = attr->map_flags;
dtab->map.numa_node = bpf_map_attr_numa_node(attr);
 
+   palloc_size = roundup_pow_of_two(dev_map_bitmap_size(attr));
+   if (palloc_size > PCPU_MIN_UNIT_SIZE ||
+   palloc_size < dev_map_bitmap_size(attr))
+   return ERR_PTR(-EINVAL);
+
/* make sure page count doesn't overflow */
cost = (u64) dtab->map.max_entries * sizeof(struct bpf_dtab_netdev *);
-   cost += dev_map_bitmap_size(attr) * num_possible_cpus();
+   cost += palloc_size * num_possible_cpus();
if (cost >= U32_MAX - PAGE_SIZE)
goto free_dtab;
 
@@ -111,7 +118,7 @@ static struct bpf_map *dev_map_alloc(union bpf_attr *attr)
err = -ENOMEM;
 
/* A per cpu bitfield with a bit per possible net device */
-   dtab->flush_needed = __alloc_percpu(dev_map_bitmap_size(attr),
+   dtab->flush_needed = __alloc_percpu(palloc_size,
__alignof__(unsigned long));
if (!dtab->flush_needed)
goto free_dtab;
-- 
2.13.6



Re: [PATCH net-next 00/10] korina cleanups/optimizations

2017-10-15 Thread David Miller
From: Roman Yeryomin 
Date: Sun, 15 Oct 2017 19:46:02 +0300

> On 2017-10-15 19:38, Florian Fainelli wrote:
>> On October 15, 2017 9:22:26 AM PDT, Roman Yeryomin 
>> wrote:
>>> TX optimizations have led to ~15% performance increase (35->40Mbps)
>>> in local tx usecase (tested with iperf v3.2).
>> Could you avoid empty commit messages and write a paragraph or two for
>> each commit that explains what and why are you changing? The changes
>> look fine but they lack any explanation.
> 
> I thought that short descriptions are already self explanatory and
> just didn't know what to write more.

"Optimize TX handlers."

In what way?  Why?  How are things improved?  Is it measurable?
etc.


[PATCH] hamradio: baycom_par: use new parport device model

2017-10-15 Thread Sudip Mukherjee
Modify baycom driver to use the new parallel port device model.

Signed-off-by: Sudip Mukherjee 
---

Not tested on real hardware, only tested on kvm with 32 bit guest and
verified that the device is binding to the driver properly in par96_open
but then unbinding as the device was not found.

 drivers/net/hamradio/baycom_par.c | 48 +++
 1 file changed, 44 insertions(+), 4 deletions(-)

diff --git a/drivers/net/hamradio/baycom_par.c 
b/drivers/net/hamradio/baycom_par.c
index e178383..1f7ceaf 100644
--- a/drivers/net/hamradio/baycom_par.c
+++ b/drivers/net/hamradio/baycom_par.c
@@ -311,7 +311,9 @@ static void par96_wakeup(void *handle)
 static int par96_open(struct net_device *dev)
 {
struct baycom_state *bc = netdev_priv(dev);
+   struct pardev_cb par_cb;
struct parport *pp;
+   int i;
 
if (!dev || !bc)
return -ENXIO;
@@ -332,8 +334,21 @@ static int par96_open(struct net_device *dev)
}
memset(>modem, 0, sizeof(bc->modem));
bc->hdrv.par.bitrate = 9600;
-   bc->pdev = parport_register_device(pp, dev->name, NULL, par96_wakeup, 
-par96_interrupt, PARPORT_DEV_EXCL, dev);
+   memset(_cb, 0, sizeof(par_cb));
+   par_cb.wakeup = par96_wakeup;
+   par_cb.irq_func = par96_interrupt;
+   par_cb.private = (void *)dev;
+   par_cb.flags = PARPORT_DEV_EXCL;
+   for (i = 0; i < NR_PORTS; i++)
+   if (baycom_device[i] == dev)
+   break;
+
+   if (i == NR_PORTS) {
+   pr_err("%s: no device found\n", bc_drvname);
+   parport_put_port(pp);
+   return -ENODEV;
+   }
+   bc->pdev = parport_register_dev_model(pp, dev->name, _cb, i);
parport_put_port(pp);
if (!bc->pdev) {
printk(KERN_ERR "baycom_par: cannot register parport at 
0x%lx\n", dev->base_addr);
@@ -490,12 +505,34 @@ static int baycom_ioctl(struct net_device *dev, struct 
ifreq *ifr,
 
 /* - */
 
+static int baycom_par_probe(struct pardevice *par_dev)
+{
+   struct device_driver *drv = par_dev->dev.driver;
+   int len = strlen(drv->name);
+
+   if (strncmp(par_dev->name, drv->name, len))
+   return -ENODEV;
+
+   return 0;
+}
+
+static struct parport_driver baycom_par_driver = {
+   .name = "bcp",
+   .probe = baycom_par_probe,
+   .devmodel = true,
+};
+
 static int __init init_baycompar(void)
 {
-   int i, found = 0;
+   int i, found = 0, ret;
char set_hw = 1;
 
printk(bc_drvinfo);
+
+   ret = parport_register_driver(_par_driver);
+   if (ret)
+   return ret;
+
/*
 * register net devices
 */
@@ -524,8 +561,10 @@ static int __init init_baycompar(void)
baycom_device[i] = dev;
}
 
-   if (!found)
+   if (!found) {
+   parport_unregister_driver(_par_driver);
return -ENXIO;
+   }
return 0;
 }
 
@@ -539,6 +578,7 @@ static void __exit cleanup_baycompar(void)
if (dev)
hdlcdrv_unregister(dev);
}
+   parport_unregister_driver(_par_driver);
 }
 
 module_init(init_baycompar);
-- 
1.9.1



Re: [PATCH v2 2/5] dpaa_eth: move of_phy_connect() to the eth driver

2017-10-15 Thread Andrew Lunn
On Fri, Oct 13, 2017 at 05:50:09PM +0300, Madalin Bucur wrote:
> Signed-off-by: Madalin Bucur 
> ---
>  drivers/net/ethernet/freescale/dpaa/dpaa_eth.c | 48 +++--
>  drivers/net/ethernet/freescale/fman/mac.c  | 97 
> ++
>  drivers/net/ethernet/freescale/fman/mac.h  |  5 +-
>  3 files changed, 66 insertions(+), 84 deletions(-)
> 
> diff --git a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c 
> b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> index 4225806..7cf61d6 100644
> --- a/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> +++ b/drivers/net/ethernet/freescale/dpaa/dpaa_eth.c
> @@ -2435,6 +2435,48 @@ static void dpaa_eth_napi_disable(struct dpaa_priv 
> *priv)
>   }
>  }
>  
> +static void dpaa_adjust_link(struct net_device *net_dev)
> +{
> + struct mac_device *mac_dev;
> + struct dpaa_priv *priv;
> +
> + priv = netdev_priv(net_dev);
> + mac_dev = priv->mac_dev;
> + mac_dev->adjust_link(mac_dev);
> +}
> +
> +static int dpaa_phy_init(struct net_device *net_dev)
> +{
> + struct mac_device *mac_dev;
> + struct phy_device *phy_dev;
> + struct dpaa_priv *priv;
> +
> + priv = netdev_priv(net_dev);
> + mac_dev = priv->mac_dev;
> +
> + phy_dev = of_phy_connect(net_dev, mac_dev->phy_node,
> +  _adjust_link, 0,
> +  mac_dev->phy_if);
> + if (!phy_dev) {
> + netif_err(priv, ifup, net_dev, "init_phy() failed\n");
> + return -ENODEV;
> + }
> +
> + /* Remove any features not supported by the controller */
> + phy_dev->supported &= mac_dev->if_support;
> +
> + /* Enable the symmetric and asymmetric PAUSE frame advertisements,
> +  * as most of the PHY drivers do not enable them by default.
> +  */

Hi Madalin

This is just moving code around, so the patch is O.K. However, it
would be nice to have a followup patch. This comment is wrong. The phy
driver should never enable symmetric and asymmetric PAUSE frames. The
MAC needs to, because only the MAC knows if the MAC supports pause
frames.

Andrew


[PATCH] net: dccp: mark expected switch fall-throughs

2017-10-15 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that for options.c file, I placed the "fall through" comment
on its own line, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva 
---
This code was tested by compilation only (GCC 7.2.0 was used).
Please, verify if the actual intention of the code is to fall through.

 net/dccp/input.c   | 1 +
 net/dccp/options.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dccp/input.c b/net/dccp/input.c
index fa6be97..d28d46b 100644
--- a/net/dccp/input.c
+++ b/net/dccp/input.c
@@ -534,6 +534,7 @@ static int dccp_rcv_respond_partopen_state_process(struct 
sock *sk,
case DCCP_PKT_DATA:
if (sk->sk_state == DCCP_RESPOND)
break;
+   /* fall through */
case DCCP_PKT_DATAACK:
case DCCP_PKT_ACK:
/*
diff --git a/net/dccp/options.c b/net/dccp/options.c
index 51cdfc3..4e40db0 100644
--- a/net/dccp/options.c
+++ b/net/dccp/options.c
@@ -227,8 +227,8 @@ int dccp_parse_options(struct sock *sk, struct 
dccp_request_sock *dreq,
 * Ack vectors are processed by the TX CCID if it is
 * interested. The RX CCID need not parse Ack Vectors,
 * since it is only interested in clearing old state.
-* Fall through.
 */
+   /* fall through */
case DCCPO_MIN_TX_CCID_SPECIFIC ... DCCPO_MAX_TX_CCID_SPECIFIC:
if (ccid_hc_tx_parse_options(dp->dccps_hc_tx_ccid, sk,
 pkt_type, opt, value, len))
-- 
2.7.4



Re: [PATCH net-next] ipv6: only update __use and lastusetime once per jiffy at most

2017-10-15 Thread Martin KaFai Lau
On Sat, Oct 14, 2017 at 12:26:28AM +, Eric Dumazet wrote:
> On Fri, Oct 13, 2017 at 5:09 PM, Martin KaFai Lau  wrote:
> > On Fri, Oct 13, 2017 at 10:08:07PM +, Wei Wang wrote:
> >> From: Wei Wang 
> >>
> >> In order to not dirty the cacheline too often, we try to only update
> >> dst->__use and dst->lastusetime at most once per jiffy.
> >
> >
> >> As dst->lastusetime is only used by ipv6 garbage collector, it should
> >> be good enough time resolution.
> > Make sense.
> >
> >> And __use is only used in ipv6_route_seq_show() to show how many times a
> >> dst has been used. And as __use is not atomic_t right now, it does not
> >> show the precise number of usage times anyway. So we think it should be
> >> OK to only update it at most once per jiffy.
> > If __use is only bumped HZ number of times per second and we can do ~3Mpps 
> > now,
> > would __use be way off?
> 
> It is not used in the kernel, and is not even reported by user space
> (iproute2) currently.
> 
> With the percpu stuff, we never did the sum anyway.
The pcpu_rt currently does not track __use.

> 
> I believe we should be fine by being very lazy on this field.
> 
> If really someones complain, we will see, but insuring ~one update per
> HZ seems fine.
Fair point.  Make sense.
We currently also don't find the ipv6_route proc-file very useful
other than debugging purpose.


[PATCH] net: ceph: mark expected switch fall-throughs

2017-10-15 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Signed-off-by: Gustavo A. R. Silva 
---
 net/ceph/ceph_hash.c  | 12 +++-
 net/ceph/messenger.c  |  1 +
 net/ceph/mon_client.c |  1 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/ceph/ceph_hash.c b/net/ceph/ceph_hash.c
index 67bb1f1..9a5850f2 100644
--- a/net/ceph/ceph_hash.c
+++ b/net/ceph/ceph_hash.c
@@ -47,28 +47,38 @@ unsigned int ceph_str_hash_rjenkins(const char *str, 
unsigned int length)
 
/* handle the last 11 bytes */
c = c + length;
-   switch (len) {/* all the case statements fall through */
+   switch (len) {
case 11:
c = c + ((__u32)k[10] << 24);
+   /* fall through */
case 10:
c = c + ((__u32)k[9] << 16);
+   /* fall through */
case 9:
c = c + ((__u32)k[8] << 8);
/* the first byte of c is reserved for the length */
+   /* fall through */
case 8:
b = b + ((__u32)k[7] << 24);
+   /* fall through */
case 7:
b = b + ((__u32)k[6] << 16);
+   /* fall through */
case 6:
b = b + ((__u32)k[5] << 8);
+   /* fall through */
case 5:
b = b + k[4];
+   /* fall through */
case 4:
a = a + ((__u32)k[3] << 24);
+   /* fall through */
case 3:
a = a + ((__u32)k[2] << 16);
+   /* fall through */
case 2:
a = a + ((__u32)k[1] << 8);
+   /* fall through */
case 1:
a = a + k[0];
/* case 0: nothing left to add */
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c
index a67298c..73dac9d 100644
--- a/net/ceph/messenger.c
+++ b/net/ceph/messenger.c
@@ -429,6 +429,7 @@ static void ceph_sock_state_change(struct sock *sk)
switch (sk->sk_state) {
case TCP_CLOSE:
dout("%s TCP_CLOSE\n", __func__);
+   /* fall through */
case TCP_CLOSE_WAIT:
dout("%s TCP_CLOSE_WAIT\n", __func__);
con_sock_state_closing(con);
diff --git a/net/ceph/mon_client.c b/net/ceph/mon_client.c
index 63edc6e..b97c047 100644
--- a/net/ceph/mon_client.c
+++ b/net/ceph/mon_client.c
@@ -1281,6 +1281,7 @@ static struct ceph_msg *mon_alloc_msg(struct 
ceph_connection *con,
 * request had a non-zero tid.  Workaround this weirdness
 * by falling through to the allocate case.
 */
+   /* fall through */
case CEPH_MSG_MON_MAP:
case CEPH_MSG_MDS_MAP:
case CEPH_MSG_OSD_MAP:
-- 
2.7.4



Re: [PATCH net-next 00/10] korina cleanups/optimizations

2017-10-15 Thread Florian Fainelli
On October 15, 2017 9:46:02 AM PDT, Roman Yeryomin  wrote:
>On 2017-10-15 19:38, Florian Fainelli wrote:
>> On October 15, 2017 9:22:26 AM PDT, Roman Yeryomin  
>> wrote:
>>> TX optimizations have led to ~15% performance increase (35->40Mbps)
>>> in local tx usecase (tested with iperf v3.2).
>> 
>> Could you avoid empty commit messages and write a paragraph or two
>for
>> each commit that explains what and why are you changing? The changes
>> look fine but they lack any explanation.
>
>I thought that short descriptions are already self explanatory and just
>
>didn't know what to write more.
>To me they are very obvious.

You can't assume what is obvious to you will be to the reviewers.

>Do you really want me to make up something more?

It's up to David as a maintainer to decide, but if this was up to me I would 
ask you to respin with a least a paragraph for each commit (except the version 
bump) why you did these changes.

>
>>> 
>>> Roman Yeryomin (10):
>>>  net: korina: optimize korina_send_packet logic
>>>  net: korina: reorder functions
>>>  net: korina: introduce and use interrupt disable/enable helpers
>>>  net: korina: optimize tx/rx interrupt handlers
>>>  net: korina: remove unused korina_private members
>>>  net: korina: optimize tx descriptor flags processing
>>>  net: korina: move tx to napi context
>>>  net: korina: optimize tx_full condition
>>>  net: korina: use dma api instead of dma_cache_* functions
>>>  net: korina: bump version
>>> 
>>> drivers/net/ethernet/korina.c | 503
>>> +-
>>> 1 file changed, 204 insertions(+), 299 deletions(-)


-- 
Florian


[PATCH] xen/9pfs: mark expected switch fall-through in xen_9pfs_front_changed

2017-10-15 Thread Gustavo A. R. Silva
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Notice that in this particular case, I placed the "fall through" comment
on its own line, which is what GCC is expecting to find.

Signed-off-by: Gustavo A. R. Silva 
---
This code was tested by compilation only (GCC 7.2.0 was used).

 net/9p/trans_xen.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c
index 6ad3e04..7ec5df9 100644
--- a/net/9p/trans_xen.c
+++ b/net/9p/trans_xen.c
@@ -510,7 +510,8 @@ static void xen_9pfs_front_changed(struct xenbus_device 
*dev,
case XenbusStateClosed:
if (dev->state == XenbusStateClosed)
break;
-   /* Missed the backend's CLOSING state -- fallthrough */
+   /* Missed the backend's CLOSING state */
+   /* fall through */
case XenbusStateClosing:
xenbus_frontend_closed(dev);
break;
-- 
2.7.4



Re: [RFC PATCH v2 5/5] wave: Added TCP Wave

2017-10-15 Thread Eric Dumazet
On Sat, 2017-10-14 at 13:47 +0200, Natale Patriciello wrote:
> TCP Wave (TCPW) replaces the window-based transmission paradigm of the
> standard TCP with a burst-based transmission, the ACK-clock scheduling
> with a self-managed timer and the RTT-based congestion control loop
> with an Ack-based Capacity and Congestion Estimation (ACCE) module. In
> non-technical words, it sends data down the stack when its internal
> timer expires, and the timing of the received ACKs contribute to
> updating this timer regularly.
> 


> It is the first TCP congestion control that uses the timing constraint
> developed in the Linux kernel.

You seem to have missed BBR.



This is against what we have worked in the last years, preventing bursts
in the first place.

If this series is to address WIFI peculiarities with TSQ (this is a well
known problem, still waiting for a fix), I would rather not add all this
code in TCP and find better alternatives.

Since you do not provide information and testing results, there is no
chance we accept such patches.

Your work seems to be a variation of TSQ, and a very complex one.






Re: [RFC PATCH v2 3/5] tcp: added get_segs_per_round

2017-10-15 Thread Eric Dumazet
On Sat, 2017-10-14 at 13:47 +0200, Natale Patriciello wrote:
> Usually, the pacing time is provided per-segment. In some occasion, this
> time refers to the time between a group of segments. With this commit, add
> the possibility for the congestion control module to tell the TCP socket
> how many segments can be sent out before pausing and setting a pacing
> timer.
> 
> Signed-off-by: Natale Patriciello 
> ---
>  include/net/tcp.h |  2 ++
>  net/ipv4/tcp_output.c | 13 +
>  2 files changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index e817f0669d0e..3561eca5a61f 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1019,6 +1019,8 @@ struct tcp_congestion_ops {
>   u64 (*get_pacing_time)(struct sock *sk);
>   /* the pacing timer is expired (optional) */
>   void (*pacing_timer_expired)(struct sock *sk);
> + /* get the # segs to send out when the timer expires (optional) */
> + u32 (*get_segs_per_round)(struct sock *sk);
>  
>   charname[TCP_CA_NAME_MAX];
>   struct module   *owner;
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 25b4cf0802f2..e37941e4328b 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2249,6 +2249,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned 
> int mss_now, int nonagle,
>   int result;
>   bool is_cwnd_limited = false, is_rwnd_limited = false;
>   u32 max_segs;
> + u32 pacing_allowed_segs = 0;
>  
>   sent_pkts = 0;
>  
> @@ -2265,14 +2266,18 @@ static bool tcp_write_xmit(struct sock *sk, unsigned 
> int mss_now, int nonagle,
>   max_segs = tcp_tso_segs(sk, mss_now);
>   tcp_mstamp_refresh(tp);
>  
> - if (!tcp_pacing_timer_check(sk) &&
> - ca_ops && ca_ops->pacing_timer_expired)
> - ca_ops->pacing_timer_expired(sk);
> + if (!tcp_pacing_timer_check(sk)) {
> + pacing_allowed_segs = 1;
> + if (ca_ops && ca_ops->pacing_timer_expired)
> + ca_ops->pacing_timer_expired(sk);
> + if (ca_ops && ca_ops->get_segs_per_round)
> + pacing_allowed_segs = ca_ops->get_segs_per_round(sk);
> + }
>  
>   while ((skb = tcp_send_head(sk))) {
>   unsigned int limit;
>  
> - if (tcp_pacing_check(sk))
> + if (sent_pkts >= pacing_allowed_segs)
>   break;
>  

So, if pacing_allowed_segs == 0, we break immediately of the loop 

Are you sure you have tested this patch with normal CC like CUBIC and
BBR ?





Re: [RFC PATCH v2 2/5] tcp: implemented pacing_expired

2017-10-15 Thread Eric Dumazet
On Sat, 2017-10-14 at 13:47 +0200, Natale Patriciello wrote:
> Inform the congestion control that the pacing timer, previously set,
> has expired. The commit does not consider situations in which another
> kind of timer has expired (e.g., a tail loss probe, a retransmission
> timer...)
> 
> Signed-off-by: Natale Patriciello 
> ---
>  include/net/tcp.h | 2 ++
>  net/ipv4/tcp_output.c | 6 ++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/include/net/tcp.h b/include/net/tcp.h
> index 42c7aa96c4cf..e817f0669d0e 100644
> --- a/include/net/tcp.h
> +++ b/include/net/tcp.h
> @@ -1017,6 +1017,8 @@ struct tcp_congestion_ops {
>  union tcp_cc_info *info);
>   /* get the expiration time for the pacing timer (optional) */
>   u64 (*get_pacing_time)(struct sock *sk);
> + /* the pacing timer is expired (optional) */
> + void (*pacing_timer_expired)(struct sock *sk);
>  
>   charname[TCP_CA_NAME_MAX];
>   struct module   *owner;
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index ec5977156c26..25b4cf0802f2 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2241,6 +2241,7 @@ void tcp_chrono_stop(struct sock *sk, const enum 
> tcp_chrono type)
>  static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int 
> nonagle,
>  int push_one, gfp_t gfp)
>  {
> + const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
>   struct tcp_sock *tp = tcp_sk(sk);
>   struct sk_buff *skb;
>   unsigned int tso_segs, sent_pkts;
> @@ -2263,6 +2264,11 @@ static bool tcp_write_xmit(struct sock *sk, unsigned 
> int mss_now, int nonagle,
>  
>   max_segs = tcp_tso_segs(sk, mss_now);
>   tcp_mstamp_refresh(tp);
> +
> + if (!tcp_pacing_timer_check(sk) &&
> + ca_ops && ca_ops->pacing_timer_expired)
> + ca_ops->pacing_timer_expired(sk);
> +
>  

1) We do not want hrtimer_active() being called for each
tcp_write_xmit() ... :/

2) ca_ops can not be NULL.


->
   if (ca_ops->pacing_timer_expired && !tcp_pacing_timer_check(sk))
   ca_ops->pacing_timer_expired(sk);







routing UAPI mismatch invalid state behavior?

2017-10-15 Thread Alexander Aring
Hi,

I figure out some problem, easy to reproduce:

# setup dummy
$ modprobe dummy
$ ip link set dummy0 up

# issue
$ ip route replace default via 169.254.65.37 dev dummy0
RTNETLINK answers: Network is unreachable

so it will forbid me to do that, but:

$ ip route replace 169.254.65.37 dev dummy0
$ ip route replace default via 169.254.65.37 dev dummy0
$ ip route del 169.254.65.37 dev dummy0

allows me to do that. Is there now a invalid state in my routing table
or is it an expected behavior?

- Alex


[PATCH] ipv6: addrconf: Use normal debugging style

2017-10-15 Thread Joe Perches
Remove local ADBG macro and use netdev_dbg/pr_debug

Miscellanea:

o Remove unnecessary debug message after allocation failure as there
  already is a dump_stack() on the failure paths
o Leave the allocation failure message on snmp6_alloc_dev as there
  is one code path that does not do a dump_stack()

Signed-off-by: Joe Perches 
---

o This is on top of David Ahern's proposed patch
  net: ipv6: Make inet6addr_validator a blocking notifier

 net/ipv6/addrconf.c | 28 
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 31ff12277bcf..c505e84dda1f 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -94,15 +94,6 @@
 #include 
 #include 
 
-/* Set to 3 to get tracing... */
-#define ACONF_DEBUG 2
-
-#if ACONF_DEBUG >= 3
-#define ADBG(fmt, ...) printk(fmt, ##__VA_ARGS__)
-#else
-#define ADBG(fmt, ...) do { if (0) printk(fmt, ##__VA_ARGS__); } while (0)
-#endif
-
 #defineINFINITY_LIFE_TIME  0x
 
 #define IPV6_MAX_STRLEN \
@@ -409,9 +400,8 @@ static struct inet6_dev *ipv6_add_dev(struct net_device 
*dev)
dev_hold(dev);
 
if (snmp6_alloc_dev(ndev) < 0) {
-   ADBG(KERN_WARNING
-   "%s: cannot allocate memory for statistics; dev=%s.\n",
-   __func__, dev->name);
+   netdev_dbg(dev, "%s: cannot allocate memory for statistics\n",
+  __func__);
neigh_parms_release(_tbl, ndev->nd_parms);
dev_put(dev);
kfree(ndev);
@@ -419,9 +409,8 @@ static struct inet6_dev *ipv6_add_dev(struct net_device 
*dev)
}
 
if (snmp6_register_dev(ndev) < 0) {
-   ADBG(KERN_WARNING
-   "%s: cannot create /proc/net/dev_snmp6/%s\n",
-   __func__, dev->name);
+   netdev_dbg(dev, "%s: cannot create /proc/net/dev_snmp6/%s\n",
+  __func__, dev->name);
goto err_release;
}
 
@@ -966,7 +955,7 @@ static int ipv6_add_addr_hash(struct net_device *dev, 
struct inet6_ifaddr *ifa)
 
/* Ignore adding duplicate addresses on an interface */
if (ipv6_chk_same_addr(dev_net(dev), >addr, dev)) {
-   ADBG("ipv6_add_addr: already assigned\n");
+   netdev_dbg(dev, "ipv6_add_addr: already assigned\n");
err = -EEXIST;
goto out;
}
@@ -1029,7 +1018,6 @@ ipv6_add_addr(struct inet6_dev *idev, const struct 
in6_addr *addr,
 
ifa = kzalloc(sizeof(*ifa), gfp_flags);
if (!ifa) {
-   ADBG("ipv6_add_addr: malloc failed\n");
err = -ENOBUFS;
goto out;
}
@@ -2575,7 +2563,7 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, 
int len, bool sllao)
pinfo = (struct prefix_info *) opt;
 
if (len < sizeof(struct prefix_info)) {
-   ADBG("addrconf: prefix option too short\n");
+   netdev_dbg(dev, "addrconf: prefix option too short\n");
return;
}
 
@@ -4373,8 +4361,8 @@ static void addrconf_verify_rtnl(void)
if (time_before(next_sched, jiffies + ADDRCONF_TIMER_FUZZ_MAX))
next_sched = jiffies + ADDRCONF_TIMER_FUZZ_MAX;
 
-   ADBG(KERN_DEBUG "now = %lu, schedule = %lu, rounded schedule = %lu => 
%lu\n",
- now, next, next_sec, next_sched);
+   pr_debug("now = %lu, schedule = %lu, rounded schedule = %lu => %lu\n",
+now, next, next_sec, next_sched);
mod_delayed_work(addrconf_wq, _chk_work, next_sched - now);
rcu_read_unlock_bh();
 }
-- 
2.10.0.rc2.1.g053435c



Re: [PATCH net-next 00/10] korina cleanups/optimizations

2017-10-15 Thread Roman Yeryomin

On 2017-10-15 19:38, Florian Fainelli wrote:
On October 15, 2017 9:22:26 AM PDT, Roman Yeryomin  
wrote:

TX optimizations have led to ~15% performance increase (35->40Mbps)
in local tx usecase (tested with iperf v3.2).


Could you avoid empty commit messages and write a paragraph or two for
each commit that explains what and why are you changing? The changes
look fine but they lack any explanation.


I thought that short descriptions are already self explanatory and just 
didn't know what to write more.

To me they are very obvious.
Do you really want me to make up something more?



Roman Yeryomin (10):
 net: korina: optimize korina_send_packet logic
 net: korina: reorder functions
 net: korina: introduce and use interrupt disable/enable helpers
 net: korina: optimize tx/rx interrupt handlers
 net: korina: remove unused korina_private members
 net: korina: optimize tx descriptor flags processing
 net: korina: move tx to napi context
 net: korina: optimize tx_full condition
 net: korina: use dma api instead of dma_cache_* functions
 net: korina: bump version

drivers/net/ethernet/korina.c | 503
+-
1 file changed, 204 insertions(+), 299 deletions(-)


Re: [PATCH net-next 00/10] korina cleanups/optimizations

2017-10-15 Thread Florian Fainelli
On October 15, 2017 9:22:26 AM PDT, Roman Yeryomin  wrote:
>TX optimizations have led to ~15% performance increase (35->40Mbps)
>in local tx usecase (tested with iperf v3.2).

Could you avoid empty commit messages and write a paragraph or two for each 
commit that explains what and why are you changing? The changes look fine but 
they lack any explanation.

>
>Roman Yeryomin (10):
>  net: korina: optimize korina_send_packet logic
>  net: korina: reorder functions
>  net: korina: introduce and use interrupt disable/enable helpers
>  net: korina: optimize tx/rx interrupt handlers
>  net: korina: remove unused korina_private members
>  net: korina: optimize tx descriptor flags processing
>  net: korina: move tx to napi context
>  net: korina: optimize tx_full condition
>  net: korina: use dma api instead of dma_cache_* functions
>  net: korina: bump version
>
>drivers/net/ethernet/korina.c | 503
>+-
> 1 file changed, 204 insertions(+), 299 deletions(-)


-- 
Florian


[PATCH net-next 10/10] net: korina: bump version

2017-10-15 Thread Roman Yeryomin
Signed-off-by: Roman Yeryomin 
---
 drivers/net/ethernet/korina.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 8fa60d1fe022..fc8e80487507 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -66,8 +66,8 @@
 #include 
 
 #define DRV_NAME   "korina"
-#define DRV_VERSION"0.20"
-#define DRV_RELDATE"15Sep2017"
+#define DRV_VERSION"0.21"
+#define DRV_RELDATE"15Oct2017"
 
 #define STATION_ADDRESS_HIGH(dev) (((dev)->dev_addr[0] << 8) | \
   ((dev)->dev_addr[1]))
-- 
2.11.0



[PATCH net-next 07/10] net: korina: move tx to napi context

2017-10-15 Thread Roman Yeryomin
Signed-off-by: Roman Yeryomin 
---
 drivers/net/ethernet/korina.c | 79 +++
 1 file changed, 20 insertions(+), 59 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index a076b0c49e6e..f606f1e01af1 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -185,10 +185,6 @@ static inline void korina_abort_dma(struct net_device *dev,
 {
if (readl(>dmac) & DMA_CHAN_RUN_BIT) {
writel(0x10, >dmac);
-
-   while (!(readl(>dmas) & DMA_STAT_HALT))
-   netif_trans_update(dev);
-
writel(0, >dmas);
}
 
@@ -260,12 +256,9 @@ static void mdio_write(struct net_device *dev, int mii_id, 
int reg, int val)
 static int korina_send_packet(struct sk_buff *skb, struct net_device *dev)
 {
struct korina_private *lp = netdev_priv(dev);
-   unsigned long flags;
u32 chain_prev, chain_next, dmandptr;
struct dma_desc *td;
 
-   spin_lock_irqsave(>lock, flags);
-
td = >td_ring[lp->tx_chain_tail];
 
/* stop queue when full, drop pkts if queue already full */
@@ -276,7 +269,6 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
if (lp->tx_count > (KORINA_NUM_TDS - 2)) {
dev->stats.tx_dropped++;
dev_kfree_skb_any(skb);
-   spin_unlock_irqrestore(>lock, flags);
 
return NETDEV_TX_BUSY;
}
@@ -319,9 +311,6 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
 
dma_cache_wback((u32) td, sizeof(*td));
 
-   netif_trans_update(dev);
-   spin_unlock_irqrestore(>lock, flags);
-
return NETDEV_TX_OK;
 }
 
@@ -332,8 +321,6 @@ static void korina_tx(struct net_device *dev)
u32 devcs;
u32 dmas;
 
-   spin_lock(>lock);
-
/* Process all desc that are done */
while (IS_DMA_FINISHED(td->control)) {
if (lp->tx_full == 1) {
@@ -389,10 +376,6 @@ static void korina_tx(struct net_device *dev)
/* Clear the DMA status register */
dmas = readl(>tx_dma_regs->dmas);
writel(~dmas, >tx_dma_regs->dmas);
-
-   korina_int_enable_tx(lp);
-
-   spin_unlock(>lock);
 }
 
 static int korina_rx(struct net_device *dev, int limit)
@@ -513,67 +496,47 @@ static int korina_poll(struct napi_struct *napi, int 
budget)
struct net_device *dev = lp->dev;
int work_done;
 
+   korina_tx(dev);
+
work_done = korina_rx(dev, budget);
if (work_done < budget) {
napi_complete_done(napi, work_done);
+   spin_lock_bh(>lock);
+   korina_int_enable_tx(lp);
korina_int_enable_rx(lp);
+   spin_unlock_bh(>lock);
}
return work_done;
 }
 
-static irqreturn_t
-korina_tx_dma_interrupt(int irq, void *dev_id)
-{
-   struct net_device *dev = dev_id;
-   struct korina_private *lp = netdev_priv(dev);
-   u32 dmas;
-
-   dmas = readl(>tx_dma_regs->dmas);
-
-   if (likely(dmas & KORINA_INT_TX)) {
-   korina_int_disable_tx(lp);
-
-   korina_tx(dev);
-
-   if (lp->tx_chain_status == desc_filled &&
-   (readl(&(lp->tx_dma_regs->dmandptr)) == 0)) {
-   writel(CPHYSADDR(>td_ring[lp->tx_chain_head]),
-   &(lp->tx_dma_regs->dmandptr));
-   lp->tx_chain_status = desc_empty;
-   lp->tx_chain_head = lp->tx_chain_tail;
-   netif_trans_update(dev);
-   }
-
-   if (unlikely(dmas & DMA_STAT_ERR))
-   lp->dma_halt_cnt++;
-
-   return IRQ_HANDLED;
-   }
-
-   return IRQ_NONE;
-}
-
 /* Ethernet Rx DMA interrupt */
-static irqreturn_t korina_rx_dma_interrupt(int irq, void *dev_id)
+static irqreturn_t korina_dma_interrupt(int irq, void *dev_id)
 {
struct net_device *dev = dev_id;
struct korina_private *lp = netdev_priv(dev);
u32 dmas;
+   unsigned long flags;
+   irqreturn_t ret = IRQ_NONE;
 
-   dmas = readl(>rx_dma_regs->dmas);
+   spin_lock_irqsave(>lock, flags);
 
-   if (likely(dmas & KORINA_INT_RX)) {
+   dmas = readl(>tx_dma_regs->dmas);
+   dmas |= readl(>rx_dma_regs->dmas);
+
+   if (likely(dmas & KORINA_INT_TXRX)) {
korina_int_disable_rx(lp);
+   korina_int_disable_tx(lp);
 
napi_schedule(>napi);
 
if (unlikely(dmas & DMA_STAT_ERR))
lp->dma_halt_cnt++;
 
-   return IRQ_HANDLED;
+   ret = IRQ_HANDLED;
}
 
-   return IRQ_NONE;
+   spin_unlock_irqrestore(>lock, flags);
+   return ret;
 }
 
 /*
@@ -792,8 +755,6 @@ static int korina_init(struct net_device *dev)
 
 

[PATCH net-next 09/10] net: korina: use dma api instead of dma_cache_* functions

2017-10-15 Thread Roman Yeryomin
Signed-off-by: Roman Yeryomin 
---
 drivers/net/ethernet/korina.c | 44 ++-
 1 file changed, 18 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 9520fa1e35a2..8fa60d1fe022 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -113,6 +113,7 @@ struct korina_private {
struct dma_reg *tx_dma_regs;
struct dma_desc *td_ring; /* transmit descriptor ring */
struct dma_desc *rd_ring; /* receive descriptor ring  */
+   dma_addr_t ring_dma;
 
struct sk_buff *tx_skb[KORINA_NUM_TDS];
struct sk_buff *rx_skb[KORINA_NUM_RDS];
@@ -257,6 +258,7 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
struct korina_private *lp = netdev_priv(dev);
u32 chain_prev, chain_next, dmandptr;
struct dma_desc *td;
+   dma_addr_t dma_addr;
 
td = >td_ring[lp->tx_chain_tail];
 
@@ -267,10 +269,10 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
lp->tx_count++;
lp->tx_skb[lp->tx_chain_tail] = skb;
 
-   dma_cache_wback((u32)skb->data, skb->len);
+   dma_addr = dma_map_single(>dev, skb->data, skb->len, 
DMA_TO_DEVICE);
 
-   /* Setup the transmit descriptor. */
-   dma_cache_inv((u32) td, sizeof(*td));
+   /* flush descriptor */
+   wmb();
 
td->ca = CPHYSADDR(skb->data);
chain_prev = (lp->tx_chain_tail - 1) & KORINA_TDS_MASK;
@@ -299,7 +301,7 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
lp->tx_chain_status = desc_filled;
}
 
-   dma_cache_wback((u32) td, sizeof(*td));
+   dma_unmap_single(>dev, dma_addr, skb->len, DMA_TO_DEVICE);
 
return NETDEV_TX_OK;
 }
@@ -375,8 +377,6 @@ static int korina_rx(struct net_device *dev, int limit)
u32 devcs, pkt_len, dmas;
int count;
 
-   dma_cache_inv((u32)rd, sizeof(*rd));
-
for (count = 0; count < limit; count++) {
skb = lp->rx_skb[lp->rx_next_done];
skb_new = NULL;
@@ -416,9 +416,6 @@ static int korina_rx(struct net_device *dev, int limit)
 * descriptor then */
pkt_buf = (u8 *)lp->rx_skb[lp->rx_next_done]->data;
 
-   /* invalidate the cache */
-   dma_cache_inv((unsigned long)pkt_buf, pkt_len - 4);
-
/* Malloc up new buffer. */
skb_new = netdev_alloc_skb_ip_align(dev, KORINA_RBSIZE);
 
@@ -455,7 +452,6 @@ static int korina_rx(struct net_device *dev, int limit)
~DMA_DESC_COD;
 
lp->rx_next_done = (lp->rx_next_done + 1) & KORINA_RDS_MASK;
-   dma_cache_wback((u32)rd, sizeof(*rd));
rd = >rd_ring[lp->rx_next_done];
writel(~DMA_STAT_DONE, >rx_dma_regs->dmas);
}
@@ -470,7 +466,6 @@ static int korina_rx(struct net_device *dev, int limit)
rd->devcs = 0;
skb = lp->rx_skb[lp->rx_next_done];
rd->ca = CPHYSADDR(skb->data);
-   dma_cache_wback((u32)rd, sizeof(*rd));
korina_chain_rx(lp, rd);
}
 
@@ -676,6 +671,13 @@ static int korina_alloc_ring(struct net_device *dev)
struct sk_buff *skb;
int i;
 
+   lp->td_ring = dma_alloc_coherent(NULL, TD_RING_SIZE + RD_RING_SIZE,
+   >ring_dma, GFP_ATOMIC);
+   if (!lp->td_ring)
+   return -ENOMEM;
+
+   lp->rd_ring = >td_ring[KORINA_NUM_TDS];
+
/* Initialize the transmit descriptors */
for (i = 0; i < KORINA_NUM_TDS; i++) {
lp->td_ring[i].control = DMA_DESC_IOF;
@@ -707,6 +709,8 @@ static int korina_alloc_ring(struct net_device *dev)
 
lp->rx_next_done  = 0;
 
+   wmb();
+
return 0;
 }
 
@@ -728,6 +732,9 @@ static void korina_free_ring(struct net_device *dev)
dev_kfree_skb_any(lp->tx_skb[i]);
lp->tx_skb[i] = NULL;
}
+
+   dma_free_coherent(NULL, TD_RING_SIZE + RD_RING_SIZE,
+   lp->td_ring, lp->ring_dma);
 }
 
 /*
@@ -979,19 +986,6 @@ static int korina_probe(struct platform_device *pdev)
goto probe_err_dma_tx;
}
 
-   lp->td_ring = kmalloc(TD_RING_SIZE + RD_RING_SIZE, GFP_KERNEL);
-   if (!lp->td_ring) {
-   rc = -ENXIO;
-   goto probe_err_td_ring;
-   }
-
-   dma_cache_inv((unsigned long)(lp->td_ring),
-   TD_RING_SIZE + RD_RING_SIZE);
-
-   /* now convert TD_RING pointer to KSEG1 */
-   lp->td_ring = (struct dma_desc *)KSEG1ADDR(lp->td_ring);
-   lp->rd_ring = >td_ring[KORINA_NUM_TDS];
-
spin_lock_init(>lock);
/* just use the rx dma irq */
dev->irq = lp->rx_irq;
@@ -1026,8 +1020,6 @@ static int korina_probe(struct 

[PATCH net-next 08/10] net: korina: optimize tx_full condition

2017-10-15 Thread Roman Yeryomin
Signed-off-by: Roman Yeryomin 
---
 drivers/net/ethernet/korina.c | 20 
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index f606f1e01af1..9520fa1e35a2 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -124,7 +124,6 @@ struct korina_private {
int tx_chain_tail;
enum chain_status tx_chain_status;
int tx_count;
-   int tx_full;
 
int rx_irq;
int tx_irq;
@@ -261,19 +260,10 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
 
td = >td_ring[lp->tx_chain_tail];
 
-   /* stop queue when full, drop pkts if queue already full */
-   if (lp->tx_count >= (KORINA_NUM_TDS - 2)) {
-   lp->tx_full = 1;
+   /* stop queue when full */
+   if (unlikely(lp->tx_count > (KORINA_NUM_TDS - 3)))
netif_stop_queue(dev);
 
-   if (lp->tx_count > (KORINA_NUM_TDS - 2)) {
-   dev->stats.tx_dropped++;
-   dev_kfree_skb_any(skb);
-
-   return NETDEV_TX_BUSY;
-   }
-   }
-
lp->tx_count++;
lp->tx_skb[lp->tx_chain_tail] = skb;
 
@@ -323,10 +313,8 @@ static void korina_tx(struct net_device *dev)
 
/* Process all desc that are done */
while (IS_DMA_FINISHED(td->control)) {
-   if (lp->tx_full == 1) {
+   if (unlikely(lp->tx_count > (KORINA_NUM_TDS - 2)))
netif_wake_queue(dev);
-   lp->tx_full = 0;
-   }
 
devcs = lp->td_ring[lp->tx_next_done].devcs;
 
@@ -696,7 +684,7 @@ static int korina_alloc_ring(struct net_device *dev)
lp->td_ring[i].link = 0;
}
lp->tx_next_done = lp->tx_chain_head = lp->tx_chain_tail =
-   lp->tx_full = lp->tx_count = 0;
+   lp->tx_count = 0;
lp->tx_chain_status = desc_empty;
 
/* Initialize the receive descriptors */
-- 
2.11.0



[PATCH net-next 01/10] net: korina: optimize korina_send_packet logic

2017-10-15 Thread Roman Yeryomin
Signed-off-by: Roman Yeryomin 
---
 drivers/net/ethernet/korina.c | 76 ++-
 1 file changed, 24 insertions(+), 52 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 7cecd9dbc111..84b2654e2d06 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -200,8 +200,7 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
 {
struct korina_private *lp = netdev_priv(dev);
unsigned long flags;
-   u32 length;
-   u32 chain_prev, chain_next;
+   u32 chain_prev, chain_next, dmandptr;
struct dma_desc *td;
 
spin_lock_irqsave(>lock, flags);
@@ -211,10 +210,9 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
/* stop queue when full, drop pkts if queue already full */
if (lp->tx_count >= (KORINA_NUM_TDS - 2)) {
lp->tx_full = 1;
+   netif_stop_queue(dev);
 
-   if (lp->tx_count == (KORINA_NUM_TDS - 2))
-   netif_stop_queue(dev);
-   else {
+   if (lp->tx_count > (KORINA_NUM_TDS - 2)) {
dev->stats.tx_dropped++;
dev_kfree_skb_any(skb);
spin_unlock_irqrestore(>lock, flags);
@@ -224,66 +222,40 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
}
 
lp->tx_count++;
-
lp->tx_skb[lp->tx_chain_tail] = skb;
 
-   length = skb->len;
dma_cache_wback((u32)skb->data, skb->len);
 
/* Setup the transmit descriptor. */
dma_cache_inv((u32) td, sizeof(*td));
+
td->ca = CPHYSADDR(skb->data);
chain_prev = (lp->tx_chain_tail - 1) & KORINA_TDS_MASK;
chain_next = (lp->tx_chain_tail + 1) & KORINA_TDS_MASK;
 
-   if (readl(&(lp->tx_dma_regs->dmandptr)) == 0) {
-   if (lp->tx_chain_status == desc_empty) {
-   /* Update tail */
-   td->control = DMA_COUNT(length) |
-   DMA_DESC_COF | DMA_DESC_IOF;
-   /* Move tail */
-   lp->tx_chain_tail = chain_next;
-   /* Write to NDPTR */
-   writel(CPHYSADDR(>td_ring[lp->tx_chain_head]),
+   dmandptr = readl(&(lp->tx_dma_regs->dmandptr));
+   /* Update tail */
+   td->control = DMA_COUNT(skb->len) | DMA_DESC_COF | DMA_DESC_IOF;
+   /* Move tail */
+   lp->tx_chain_tail = chain_next;
+
+   if (lp->tx_chain_status != desc_empty) {
+   /* Link to prev */
+   lp->td_ring[chain_prev].control &= ~DMA_DESC_COF;
+   lp->td_ring[chain_prev].link =  CPHYSADDR(td);
+   }
+
+   if (!dmandptr) {
+   /* Write to NDPTR */
+   writel(CPHYSADDR(>td_ring[lp->tx_chain_head]),
>tx_dma_regs->dmandptr);
-   /* Move head to tail */
-   lp->tx_chain_head = lp->tx_chain_tail;
-   } else {
-   /* Update tail */
-   td->control = DMA_COUNT(length) |
-   DMA_DESC_COF | DMA_DESC_IOF;
-   /* Link to prev */
-   lp->td_ring[chain_prev].control &=
-   ~DMA_DESC_COF;
-   /* Link to prev */
-   lp->td_ring[chain_prev].link =  CPHYSADDR(td);
-   /* Move tail */
-   lp->tx_chain_tail = chain_next;
-   /* Write to NDPTR */
-   writel(CPHYSADDR(>td_ring[lp->tx_chain_head]),
-   &(lp->tx_dma_regs->dmandptr));
-   /* Move head to tail */
-   lp->tx_chain_head = lp->tx_chain_tail;
-   lp->tx_chain_status = desc_empty;
-   }
+   /* Move head to tail */
+   lp->tx_chain_head = lp->tx_chain_tail;
+   lp->tx_chain_status = desc_empty;
} else {
-   if (lp->tx_chain_status == desc_empty) {
-   /* Update tail */
-   td->control = DMA_COUNT(length) |
-   DMA_DESC_COF | DMA_DESC_IOF;
-   /* Move tail */
-   lp->tx_chain_tail = chain_next;
-   lp->tx_chain_status = desc_filled;
-   } else {
-   /* Update tail */
-   td->control = DMA_COUNT(length) |
-   DMA_DESC_COF | DMA_DESC_IOF;
-   lp->td_ring[chain_prev].control &=
-   ~DMA_DESC_COF;
-   lp->td_ring[chain_prev].link =  CPHYSADDR(td);
-

[PATCH net-next 02/10] net: korina: reorder functions

2017-10-15 Thread Roman Yeryomin
Signed-off-by: Roman Yeryomin 
---
 drivers/net/ethernet/korina.c | 317 +-
 1 file changed, 158 insertions(+), 159 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 84b2654e2d06..5545f86aac4a 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -195,6 +195,35 @@ static void korina_chain_rx(struct korina_private *lp,
korina_chain_dma(lp->rx_dma_regs, CPHYSADDR(rd));
 }
 
+static int mdio_read(struct net_device *dev, int mii_id, int reg)
+{
+   struct korina_private *lp = netdev_priv(dev);
+   int ret;
+
+   mii_id = ((lp->rx_irq == 0x2c ? 1 : 0) << 8);
+
+   writel(0, >eth_regs->miimcfg);
+   writel(0, >eth_regs->miimcmd);
+   writel(mii_id | reg, >eth_regs->miimaddr);
+   writel(ETH_MII_CMD_SCN, >eth_regs->miimcmd);
+
+   ret = (int)(readl(>eth_regs->miimrdd));
+   return ret;
+}
+
+static void mdio_write(struct net_device *dev, int mii_id, int reg, int val)
+{
+   struct korina_private *lp = netdev_priv(dev);
+
+   mii_id = ((lp->rx_irq == 0x2c ? 1 : 0) << 8);
+
+   writel(0, >eth_regs->miimcfg);
+   writel(1, >eth_regs->miimcmd);
+   writel(mii_id | reg, >eth_regs->miimaddr);
+   writel(ETH_MII_CMD_SCN, >eth_regs->miimcmd);
+   writel(val, >eth_regs->miimwtd);
+}
+
 /* transmit packet */
 static int korina_send_packet(struct sk_buff *skb, struct net_device *dev)
 {
@@ -264,60 +293,87 @@ static int korina_send_packet(struct sk_buff *skb, struct 
net_device *dev)
return NETDEV_TX_OK;
 }
 
-static int mdio_read(struct net_device *dev, int mii_id, int reg)
+static void korina_tx(struct net_device *dev)
 {
struct korina_private *lp = netdev_priv(dev);
-   int ret;
+   struct dma_desc *td = >td_ring[lp->tx_next_done];
+   u32 devcs;
+   u32 dmas;
 
-   mii_id = ((lp->rx_irq == 0x2c ? 1 : 0) << 8);
+   spin_lock(>lock);
 
-   writel(0, >eth_regs->miimcfg);
-   writel(0, >eth_regs->miimcmd);
-   writel(mii_id | reg, >eth_regs->miimaddr);
-   writel(ETH_MII_CMD_SCN, >eth_regs->miimcmd);
+   /* Process all desc that are done */
+   while (IS_DMA_FINISHED(td->control)) {
+   if (lp->tx_full == 1) {
+   netif_wake_queue(dev);
+   lp->tx_full = 0;
+   }
 
-   ret = (int)(readl(>eth_regs->miimrdd));
-   return ret;
-}
+   devcs = lp->td_ring[lp->tx_next_done].devcs;
+   if ((devcs & (ETH_TX_FD | ETH_TX_LD)) !=
+   (ETH_TX_FD | ETH_TX_LD)) {
+   dev->stats.tx_errors++;
+   dev->stats.tx_dropped++;
 
-static void mdio_write(struct net_device *dev, int mii_id, int reg, int val)
-{
-   struct korina_private *lp = netdev_priv(dev);
+   /* Should never happen */
+   printk(KERN_ERR "%s: split tx ignored\n",
+   dev->name);
+   } else if (devcs & ETH_TX_TOK) {
+   dev->stats.tx_packets++;
+   dev->stats.tx_bytes +=
+   lp->tx_skb[lp->tx_next_done]->len;
+   } else {
+   dev->stats.tx_errors++;
+   dev->stats.tx_dropped++;
 
-   mii_id = ((lp->rx_irq == 0x2c ? 1 : 0) << 8);
+   /* Underflow */
+   if (devcs & ETH_TX_UND)
+   dev->stats.tx_fifo_errors++;
 
-   writel(0, >eth_regs->miimcfg);
-   writel(1, >eth_regs->miimcmd);
-   writel(mii_id | reg, >eth_regs->miimaddr);
-   writel(ETH_MII_CMD_SCN, >eth_regs->miimcmd);
-   writel(val, >eth_regs->miimwtd);
-}
+   /* Oversized frame */
+   if (devcs & ETH_TX_OF)
+   dev->stats.tx_aborted_errors++;
 
-/* Ethernet Rx DMA interrupt */
-static irqreturn_t korina_rx_dma_interrupt(int irq, void *dev_id)
-{
-   struct net_device *dev = dev_id;
-   struct korina_private *lp = netdev_priv(dev);
-   u32 dmas, dmasm;
-   irqreturn_t retval;
+   /* Excessive deferrals */
+   if (devcs & ETH_TX_ED)
+   dev->stats.tx_carrier_errors++;
 
-   dmas = readl(>rx_dma_regs->dmas);
-   if (dmas & (DMA_STAT_DONE | DMA_STAT_HALT | DMA_STAT_ERR)) {
-   dmasm = readl(>rx_dma_regs->dmasm);
-   writel(dmasm | (DMA_STAT_DONE |
-   DMA_STAT_HALT | DMA_STAT_ERR),
-   >rx_dma_regs->dmasm);
+   /* Collisions: medium busy */
+   if (devcs & ETH_TX_EC)
+   dev->stats.collisions++;
 
-   napi_schedule(>napi);
+   /* Late collision */
+  

[PATCH net-next 05/10] net: korina: remove unused korina_private members

2017-10-15 Thread Roman Yeryomin
Signed-off-by: Roman Yeryomin 
---
 drivers/net/ethernet/korina.c | 7 ---
 1 file changed, 7 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 64ae32af7539..59d4554c43c9 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -118,9 +118,6 @@ struct korina_private {
struct sk_buff *rx_skb[KORINA_NUM_RDS];
 
int rx_next_done;
-   int rx_chain_head;
-   int rx_chain_tail;
-   enum chain_status rx_chain_status;
 
int tx_next_done;
int tx_chain_head;
@@ -135,7 +132,6 @@ struct korina_private {
spinlock_t lock;/* NIC xmit lock */
 
int dma_halt_cnt;
-   int dma_run_cnt;
struct napi_struct napi;
struct timer_list media_check_timer;
struct mii_if_info mii_if;
@@ -770,9 +766,6 @@ static int korina_alloc_ring(struct net_device *dev)
lp->rd_ring[i - 1].control |= DMA_DESC_COD;
 
lp->rx_next_done  = 0;
-   lp->rx_chain_head = 0;
-   lp->rx_chain_tail = 0;
-   lp->rx_chain_status = desc_empty;
 
return 0;
 }
-- 
2.11.0



[PATCH net-next 03/10] net: korina: introduce and use interrupt disable/enable helpers

2017-10-15 Thread Roman Yeryomin
Signed-off-by: Roman Yeryomin 
---
 drivers/net/ethernet/korina.c | 83 +--
 1 file changed, 48 insertions(+), 35 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 5545f86aac4a..04c1a3f38a79 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -91,6 +91,10 @@
 #define RD_RING_SIZE   (KORINA_NUM_RDS * sizeof(struct dma_desc))
 #define TD_RING_SIZE   (KORINA_NUM_TDS * sizeof(struct dma_desc))
 
+#define KORINA_INT_TX  (DMA_STAT_FINI | DMA_STAT_ERR)
+#define KORINA_INT_RX  (DMA_STAT_DONE | DMA_STAT_HALT | DMA_STAT_ERR)
+#define KORINA_INT_TXRX(KORINA_INT_TX | KORINA_INT_RX)
+
 #define TX_TIMEOUT (6000 * HZ / 1000)
 
 enum chain_status {
@@ -142,6 +146,38 @@ struct korina_private {
 
 extern unsigned int idt_cpu_freq;
 
+static inline void korina_int_disable(u32 *dmasm, u32 ints)
+{
+   u32 tmp = readl(dmasm);
+   writel(tmp | ints, dmasm);
+}
+
+static inline void korina_int_enable(u32 *dmasm, u32 ints)
+{
+   u32 tmp = readl(dmasm);
+   writel(tmp & ~ints, dmasm);
+}
+
+static inline void korina_int_disable_tx(struct korina_private *kp)
+{
+   korina_int_disable(>tx_dma_regs->dmasm, KORINA_INT_TX);
+}
+
+static inline void korina_int_disable_rx(struct korina_private *kp)
+{
+   korina_int_disable(>rx_dma_regs->dmasm, KORINA_INT_RX);
+}
+
+static inline void korina_int_enable_tx(struct korina_private *kp)
+{
+   korina_int_enable(>tx_dma_regs->dmasm, KORINA_INT_TX);
+}
+
+static inline void korina_int_enable_rx(struct korina_private *kp)
+{
+   korina_int_enable(>rx_dma_regs->dmasm, KORINA_INT_RX);
+}
+
 static inline void korina_start_dma(struct dma_reg *ch, u32 dma_addr)
 {
writel(0, >dmandptr);
@@ -369,9 +405,7 @@ static void korina_tx(struct net_device *dev)
dmas = readl(>tx_dma_regs->dmas);
writel(~dmas, >tx_dma_regs->dmas);
 
-   writel(readl(>tx_dma_regs->dmasm) &
-   ~(DMA_STAT_FINI | DMA_STAT_ERR),
-   >tx_dma_regs->dmasm);
+   korina_int_enable_tx(lp);
 
spin_unlock(>lock);
 }
@@ -497,10 +531,7 @@ static int korina_poll(struct napi_struct *napi, int 
budget)
work_done = korina_rx(dev, budget);
if (work_done < budget) {
napi_complete_done(napi, work_done);
-
-   writel(readl(>rx_dma_regs->dmasm) &
-   ~(DMA_STAT_DONE | DMA_STAT_HALT | DMA_STAT_ERR),
-   >rx_dma_regs->dmasm);
+   korina_int_enable_rx(lp);
}
return work_done;
 }
@@ -510,15 +541,13 @@ korina_tx_dma_interrupt(int irq, void *dev_id)
 {
struct net_device *dev = dev_id;
struct korina_private *lp = netdev_priv(dev);
-   u32 dmas, dmasm;
+   u32 dmas;
irqreturn_t retval;
 
dmas = readl(>tx_dma_regs->dmas);
 
if (dmas & (DMA_STAT_FINI | DMA_STAT_ERR)) {
-   dmasm = readl(>tx_dma_regs->dmasm);
-   writel(dmasm | (DMA_STAT_FINI | DMA_STAT_ERR),
-   >tx_dma_regs->dmasm);
+   korina_int_disable_tx(lp);
 
korina_tx(dev);
 
@@ -545,15 +574,12 @@ static irqreturn_t korina_rx_dma_interrupt(int irq, void 
*dev_id)
 {
struct net_device *dev = dev_id;
struct korina_private *lp = netdev_priv(dev);
-   u32 dmas, dmasm;
+   u32 dmas;
irqreturn_t retval;
 
dmas = readl(>rx_dma_regs->dmas);
if (dmas & (DMA_STAT_DONE | DMA_STAT_HALT | DMA_STAT_ERR)) {
-   dmasm = readl(>rx_dma_regs->dmasm);
-   writel(dmasm | (DMA_STAT_DONE |
-   DMA_STAT_HALT | DMA_STAT_ERR),
-   >rx_dma_regs->dmasm);
+   korina_int_disable_rx(lp);
 
napi_schedule(>napi);
 
@@ -803,12 +829,8 @@ static int korina_init(struct net_device *dev)
/* Start Rx DMA */
korina_start_rx(lp, >rd_ring[0]);
 
-   writel(readl(>tx_dma_regs->dmasm) &
-   ~(DMA_STAT_FINI | DMA_STAT_ERR),
-   >tx_dma_regs->dmasm);
-   writel(readl(>rx_dma_regs->dmasm) &
-   ~(DMA_STAT_DONE | DMA_STAT_HALT | DMA_STAT_ERR),
-   >rx_dma_regs->dmasm);
+   korina_int_enable_tx(lp);
+   korina_int_enable_rx(lp);
 
/* Accept only packets destined for this Ethernet device address */
writel(ETH_ARC_AB, >eth_regs->etharc);
@@ -867,12 +889,8 @@ static void korina_restart_task(struct work_struct *work)
disable_irq(lp->rx_irq);
disable_irq(lp->tx_irq);
 
-   writel(readl(>tx_dma_regs->dmasm) |
-   DMA_STAT_FINI | DMA_STAT_ERR,
-   >tx_dma_regs->dmasm);
-   writel(readl(>rx_dma_regs->dmasm) |
-   DMA_STAT_DONE | DMA_STAT_HALT | DMA_STAT_ERR,
-   

[PATCH net-next 00/10] korina cleanups/optimizations

2017-10-15 Thread Roman Yeryomin
TX optimizations have led to ~15% performance increase (35->40Mbps)
in local tx usecase (tested with iperf v3.2).

Roman Yeryomin (10):
  net: korina: optimize korina_send_packet logic
  net: korina: reorder functions
  net: korina: introduce and use interrupt disable/enable helpers
  net: korina: optimize tx/rx interrupt handlers
  net: korina: remove unused korina_private members
  net: korina: optimize tx descriptor flags processing
  net: korina: move tx to napi context
  net: korina: optimize tx_full condition
  net: korina: use dma api instead of dma_cache_* functions
  net: korina: bump version

 drivers/net/ethernet/korina.c | 503 +-
 1 file changed, 204 insertions(+), 299 deletions(-)

-- 
2.11.0



[PATCH net-next 06/10] net: korina: optimize tx descriptor flags processing

2017-10-15 Thread Roman Yeryomin
Signed-off-by: Roman Yeryomin 
---
 drivers/net/ethernet/korina.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 59d4554c43c9..a076b0c49e6e 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -342,43 +342,33 @@ static void korina_tx(struct net_device *dev)
}
 
devcs = lp->td_ring[lp->tx_next_done].devcs;
-   if ((devcs & (ETH_TX_FD | ETH_TX_LD)) !=
-   (ETH_TX_FD | ETH_TX_LD)) {
-   dev->stats.tx_errors++;
-   dev->stats.tx_dropped++;
 
-   /* Should never happen */
-   printk(KERN_ERR "%s: split tx ignored\n",
-   dev->name);
-   } else if (devcs & ETH_TX_TOK) {
-   dev->stats.tx_packets++;
-   dev->stats.tx_bytes +=
-   lp->tx_skb[lp->tx_next_done]->len;
-   } else {
+   if (!(devcs & ETH_TX_TOK)) {
dev->stats.tx_errors++;
dev->stats.tx_dropped++;
 
/* Underflow */
if (devcs & ETH_TX_UND)
dev->stats.tx_fifo_errors++;
-
/* Oversized frame */
if (devcs & ETH_TX_OF)
dev->stats.tx_aborted_errors++;
-
/* Excessive deferrals */
if (devcs & ETH_TX_ED)
dev->stats.tx_carrier_errors++;
-
/* Collisions: medium busy */
if (devcs & ETH_TX_EC)
dev->stats.collisions++;
-
/* Late collision */
if (devcs & ETH_TX_LC)
dev->stats.tx_window_errors++;
+
+   goto next;
}
 
+   dev->stats.tx_packets++;
+   dev->stats.tx_bytes += lp->tx_skb[lp->tx_next_done]->len;
+next:
/* We must always free the original skb */
if (lp->tx_skb[lp->tx_next_done]) {
dev_kfree_skb_any(lp->tx_skb[lp->tx_next_done]);
@@ -394,7 +384,6 @@ static void korina_tx(struct net_device *dev)
/* Go on to next transmission */
lp->tx_next_done = (lp->tx_next_done + 1) & KORINA_TDS_MASK;
td = >td_ring[lp->tx_next_done];
-
}
 
/* Clear the DMA status register */
-- 
2.11.0



[PATCH net-next 04/10] net: korina: optimize tx/rx interrupt handlers

2017-10-15 Thread Roman Yeryomin
Signed-off-by: Roman Yeryomin 
---
 drivers/net/ethernet/korina.c | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/korina.c b/drivers/net/ethernet/korina.c
index 04c1a3f38a79..64ae32af7539 100644
--- a/drivers/net/ethernet/korina.c
+++ b/drivers/net/ethernet/korina.c
@@ -542,11 +542,10 @@ korina_tx_dma_interrupt(int irq, void *dev_id)
struct net_device *dev = dev_id;
struct korina_private *lp = netdev_priv(dev);
u32 dmas;
-   irqreturn_t retval;
 
dmas = readl(>tx_dma_regs->dmas);
 
-   if (dmas & (DMA_STAT_FINI | DMA_STAT_ERR)) {
+   if (likely(dmas & KORINA_INT_TX)) {
korina_int_disable_tx(lp);
 
korina_tx(dev);
@@ -559,14 +558,14 @@ korina_tx_dma_interrupt(int irq, void *dev_id)
lp->tx_chain_head = lp->tx_chain_tail;
netif_trans_update(dev);
}
-   if (dmas & DMA_STAT_ERR)
-   printk(KERN_ERR "%s: DMA error\n", dev->name);
 
-   retval = IRQ_HANDLED;
-   } else
-   retval = IRQ_NONE;
+   if (unlikely(dmas & DMA_STAT_ERR))
+   lp->dma_halt_cnt++;
 
-   return retval;
+   return IRQ_HANDLED;
+   }
+
+   return IRQ_NONE;
 }
 
 /* Ethernet Rx DMA interrupt */
@@ -575,22 +574,21 @@ static irqreturn_t korina_rx_dma_interrupt(int irq, void 
*dev_id)
struct net_device *dev = dev_id;
struct korina_private *lp = netdev_priv(dev);
u32 dmas;
-   irqreturn_t retval;
 
dmas = readl(>rx_dma_regs->dmas);
-   if (dmas & (DMA_STAT_DONE | DMA_STAT_HALT | DMA_STAT_ERR)) {
+
+   if (likely(dmas & KORINA_INT_RX)) {
korina_int_disable_rx(lp);
 
napi_schedule(>napi);
 
-   if (dmas & DMA_STAT_ERR)
-   printk(KERN_ERR "%s: DMA error\n", dev->name);
+   if (unlikely(dmas & DMA_STAT_ERR))
+   lp->dma_halt_cnt++;
 
-   retval = IRQ_HANDLED;
-   } else
-   retval = IRQ_NONE;
+   return IRQ_HANDLED;
+   }
 
-   return retval;
+   return IRQ_NONE;
 }
 
 /*
-- 
2.11.0



Re: [PATCH net-next 1/5] ipv6: addrconf: cleanup locking in ipv6_add_addr

2017-10-15 Thread David Ahern
On 10/15/17 9:59 AM, Ido Schimmel wrote:
> On Sun, Oct 15, 2017 at 09:24:07AM -0600, David Ahern wrote:
>> On 10/15/17 1:50 AM, Ido Schimmel wrote:
>>> On Fri, Oct 13, 2017 at 04:02:09PM -0700, David Ahern wrote:
 ipv6_add_addr is called in process context with rtnl lock held
 (e.g., manual config of an address) or during softirq processing
 (e.g., autoconf and address from a router advertisement).

 Currently, ipv6_add_addr calls rcu_read_lock_bh shortly after entry
 and does not call unlock until exit, minus the call around the address
 validator notifier. Similarly, addrconf_hash_lock is taken after the
 validator notifier and held until exit. This forces the allocation of
 inet6_ifaddr to always be atomic.

 Refactor ipv6_add_addr as follows:
 1. add an input boolean to discriminate the call path (process context
or softirq). This new flag controls whether the alloc can be done
with GFP_KERNEL or GFP_ATOMIC.

 2. Move the rcu_read_lock_bh and unlock calls only around functions that
do rcu updates.

 3. Remove the in6_dev_hold and put added by 3ad7d2468f79f ("Ipvlan should
return an error when an address is already in use."). This was done
presumably because rcu_read_unlock_bh needs to be called before calling
the validator. Since rcu_read_lock is not needed before the validator
runs revert the hold and put added by 3ad7d2468f79f and only do the
hold when setting ifp->idev.

 4. move duplicate address check and insertion of new address in the global
address hash into a helper. The helper is called after an ifa is
allocated and filled in.

 This allows the ifa for manually configured addresses to be done with
 GFP_KERNEL and reduces the overall amount of time with rcu_read_lock held
 and hash table spinlock held.

 Signed-off-by: David Ahern 
>>>
>>> [...]
>>>
 @@ -1073,21 +1085,19 @@ ipv6_add_addr(struct inet6_dev *idev, const struct 
 in6_addr *addr,
  
in6_ifa_hold(ifa);
write_unlock(>lock);
 -out2:
 +
rcu_read_unlock_bh();
  
 -  if (likely(err == 0))
 -  inet6addr_notifier_call_chain(NETDEV_UP, ifa);
 -  else {
 +  inet6addr_notifier_call_chain(NETDEV_UP, ifa);
 +out:
 +  if (unlikely(err < 0)) {
 +  if (rt)
 +  ip6_rt_put(rt);
>>>
>>> I believe 'rt' needs to be set to NULL after addrconf_dst_alloc()
>>> fails.
>>
>> The above frees rt and the line below frees the ifa and resets the value
>> to an error, so after the line above rt is no longer referenced.
> 
> Earlier in the code we have:
> 
> rt = addrconf_dst_alloc(idev, addr, false);
> if (IS_ERR(rt)) {
>   err = PTR_ERR(rt);
>   goto out;
> }
> 
> So we end up calling ip6_rt_put() with an error value. I believe it
> should be:
> 
> rt = addrconf_dst_alloc(idev, addr, false);
> if (IS_ERR(rt)) {
>   err = PTR_ERR(rt);
>   rt = NULL;
>   goto out;
> }
> 

gotcha. Will fix.


Re: [PATCH net-next 1/5] ipv6: addrconf: cleanup locking in ipv6_add_addr

2017-10-15 Thread Ido Schimmel
On Sun, Oct 15, 2017 at 09:24:07AM -0600, David Ahern wrote:
> On 10/15/17 1:50 AM, Ido Schimmel wrote:
> > On Fri, Oct 13, 2017 at 04:02:09PM -0700, David Ahern wrote:
> >> ipv6_add_addr is called in process context with rtnl lock held
> >> (e.g., manual config of an address) or during softirq processing
> >> (e.g., autoconf and address from a router advertisement).
> >>
> >> Currently, ipv6_add_addr calls rcu_read_lock_bh shortly after entry
> >> and does not call unlock until exit, minus the call around the address
> >> validator notifier. Similarly, addrconf_hash_lock is taken after the
> >> validator notifier and held until exit. This forces the allocation of
> >> inet6_ifaddr to always be atomic.
> >>
> >> Refactor ipv6_add_addr as follows:
> >> 1. add an input boolean to discriminate the call path (process context
> >>or softirq). This new flag controls whether the alloc can be done
> >>with GFP_KERNEL or GFP_ATOMIC.
> >>
> >> 2. Move the rcu_read_lock_bh and unlock calls only around functions that
> >>do rcu updates.
> >>
> >> 3. Remove the in6_dev_hold and put added by 3ad7d2468f79f ("Ipvlan should
> >>return an error when an address is already in use."). This was done
> >>presumably because rcu_read_unlock_bh needs to be called before calling
> >>the validator. Since rcu_read_lock is not needed before the validator
> >>runs revert the hold and put added by 3ad7d2468f79f and only do the
> >>hold when setting ifp->idev.
> >>
> >> 4. move duplicate address check and insertion of new address in the global
> >>address hash into a helper. The helper is called after an ifa is
> >>allocated and filled in.
> >>
> >> This allows the ifa for manually configured addresses to be done with
> >> GFP_KERNEL and reduces the overall amount of time with rcu_read_lock held
> >> and hash table spinlock held.
> >>
> >> Signed-off-by: David Ahern 
> > 
> > [...]
> > 
> >> @@ -1073,21 +1085,19 @@ ipv6_add_addr(struct inet6_dev *idev, const struct 
> >> in6_addr *addr,
> >>  
> >>in6_ifa_hold(ifa);
> >>write_unlock(>lock);
> >> -out2:
> >> +
> >>rcu_read_unlock_bh();
> >>  
> >> -  if (likely(err == 0))
> >> -  inet6addr_notifier_call_chain(NETDEV_UP, ifa);
> >> -  else {
> >> +  inet6addr_notifier_call_chain(NETDEV_UP, ifa);
> >> +out:
> >> +  if (unlikely(err < 0)) {
> >> +  if (rt)
> >> +  ip6_rt_put(rt);
> > 
> > I believe 'rt' needs to be set to NULL after addrconf_dst_alloc()
> > fails.
> 
> The above frees rt and the line below frees the ifa and resets the value
> to an error, so after the line above rt is no longer referenced.

Earlier in the code we have:

rt = addrconf_dst_alloc(idev, addr, false);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
goto out;
}

So we end up calling ip6_rt_put() with an error value. I believe it
should be:

rt = addrconf_dst_alloc(idev, addr, false);
if (IS_ERR(rt)) {
err = PTR_ERR(rt);
rt = NULL;
goto out;
}


Re: [PATCH v9 00/20] simplify crypto wait for async op

2017-10-15 Thread Herbert Xu
On Sun, Oct 15, 2017 at 10:19:45AM +0100, Gilad Ben-Yossef wrote:
>
> Changes from v8:
> - Remove the translation of EAGAIN return code to the
>   previous return code of EBUSY for the user space
>   interface of algif as no one seems to rely on it as
>   requested by Herbert Xu.

Sorry, but I forgot to mention that EAGAIN is not a good value
to use because it's used by the network system calls for other
meanings (interrupted by a signal).

So if we stop doing the translation then we also need to pick
a different value, perhaps E2BIG or something similar that have
no current use within the crypto API or network API.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH net-next 1/5] ipv6: addrconf: cleanup locking in ipv6_add_addr

2017-10-15 Thread David Ahern
On 10/15/17 1:50 AM, Ido Schimmel wrote:
> On Fri, Oct 13, 2017 at 04:02:09PM -0700, David Ahern wrote:
>> ipv6_add_addr is called in process context with rtnl lock held
>> (e.g., manual config of an address) or during softirq processing
>> (e.g., autoconf and address from a router advertisement).
>>
>> Currently, ipv6_add_addr calls rcu_read_lock_bh shortly after entry
>> and does not call unlock until exit, minus the call around the address
>> validator notifier. Similarly, addrconf_hash_lock is taken after the
>> validator notifier and held until exit. This forces the allocation of
>> inet6_ifaddr to always be atomic.
>>
>> Refactor ipv6_add_addr as follows:
>> 1. add an input boolean to discriminate the call path (process context
>>or softirq). This new flag controls whether the alloc can be done
>>with GFP_KERNEL or GFP_ATOMIC.
>>
>> 2. Move the rcu_read_lock_bh and unlock calls only around functions that
>>do rcu updates.
>>
>> 3. Remove the in6_dev_hold and put added by 3ad7d2468f79f ("Ipvlan should
>>return an error when an address is already in use."). This was done
>>presumably because rcu_read_unlock_bh needs to be called before calling
>>the validator. Since rcu_read_lock is not needed before the validator
>>runs revert the hold and put added by 3ad7d2468f79f and only do the
>>hold when setting ifp->idev.
>>
>> 4. move duplicate address check and insertion of new address in the global
>>address hash into a helper. The helper is called after an ifa is
>>allocated and filled in.
>>
>> This allows the ifa for manually configured addresses to be done with
>> GFP_KERNEL and reduces the overall amount of time with rcu_read_lock held
>> and hash table spinlock held.
>>
>> Signed-off-by: David Ahern 
> 
> [...]
> 
>> @@ -1073,21 +1085,19 @@ ipv6_add_addr(struct inet6_dev *idev, const struct 
>> in6_addr *addr,
>>  
>>  in6_ifa_hold(ifa);
>>  write_unlock(>lock);
>> -out2:
>> +
>>  rcu_read_unlock_bh();
>>  
>> -if (likely(err == 0))
>> -inet6addr_notifier_call_chain(NETDEV_UP, ifa);
>> -else {
>> +inet6addr_notifier_call_chain(NETDEV_UP, ifa);
>> +out:
>> +if (unlikely(err < 0)) {
>> +if (rt)
>> +ip6_rt_put(rt);
> 
> I believe 'rt' needs to be set to NULL after addrconf_dst_alloc()
> fails.

The above frees rt and the line below frees the ifa and resets the value
to an error, so after the line above rt is no longer referenced.

Taking a look at this again, I think I am missing an idev put in the
error path here.

> 
>>  kfree(ifa);
>> -in6_dev_put(idev);
>>  ifa = ERR_PTR(err);
>>  }
>>  
>>  return ifa;
>> -out:
>> -spin_unlock(_hash_lock);
>> -goto out2;
>>  }



Re: [PATCH net-next v2 2/2] net: ethernet: socionext: add AVE ethernet driver

2017-10-15 Thread Masahiro Yamada
2017-10-13 9:35 GMT+09:00 Kunihiko Hayashi :
> +static int ave_probe(struct platform_device *pdev)
> +{
> +   struct device *dev = >dev;
> +   struct device_node *np = dev->of_node;
> +   u32 ave_id;
> +   struct ave_private *priv;
> +   const struct ave_soc_data *data;
> +   phy_interface_t phy_mode;
> +   struct net_device *ndev;
> +   struct resource *res;
> +   void __iomem *base;
> +   int irq, ret = 0;
> +   char buf[ETHTOOL_FWVERS_LEN];
> +   const void *mac_addr;
> +
> +   data = of_device_get_match_data(dev);
> +   if (WARN_ON(!data))
> +   return -EINVAL;
> +
> +   phy_mode = of_get_phy_mode(np);
> +   if (phy_mode < 0) {
> +   dev_err(dev, "phy-mode not found\n");
> +   return -EINVAL;
> +   }
> +   if ((!phy_interface_mode_is_rgmii(phy_mode)) &&
> +   phy_mode != PHY_INTERFACE_MODE_RMII &&
> +   phy_mode != PHY_INTERFACE_MODE_MII) {
> +   dev_err(dev, "phy-mode is invalid\n");
> +   return -EINVAL;
> +   }
> +
> +   irq = platform_get_irq(pdev, 0);
> +   if (irq < 0) {
> +   dev_err(dev, "IRQ not found\n");
> +   return irq;
> +   }
> +
> +   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +   base = devm_ioremap_resource(dev, res);
> +   if (IS_ERR(base))
> +   return PTR_ERR(base);
> +
> +   /* alloc netdevice */
> +   ndev = alloc_etherdev(sizeof(struct ave_private));
> +   if (!ndev) {
> +   dev_err(dev, "can't allocate ethernet device\n");
> +   return -ENOMEM;
> +   }
> +
> +   ndev->netdev_ops = _netdev_ops;
> +   ndev->ethtool_ops = _ethtool_ops;
> +   SET_NETDEV_DEV(ndev, dev);
> +
> +   /* support hardware checksum */
> +   ndev->features|= (NETIF_F_IP_CSUM | NETIF_F_RXCSUM);
> +   ndev->hw_features |= (NETIF_F_IP_CSUM | NETIF_F_RXCSUM);
> +
> +   ndev->max_mtu = AVE_MAX_ETHFRAME - (ETH_HLEN + ETH_FCS_LEN);
> +
> +   /* get mac address */
> +   mac_addr = of_get_mac_address(np);
> +   if (mac_addr)
> +   ether_addr_copy(ndev->dev_addr, mac_addr);
> +
> +   /* if the mac address is invalid, use random mac address */
> +   if (!is_valid_ether_addr(ndev->dev_addr)) {
> +   eth_hw_addr_random(ndev);
> +   dev_warn(dev, "Using random MAC address: %pM\n",
> +ndev->dev_addr);
> +   }
> +
> +   priv = netdev_priv(ndev);
> +   priv->base = base;
> +   priv->irq = irq;
> +   priv->ndev = ndev;
> +   priv->msg_enable = netif_msg_init(-1, AVE_DEFAULT_MSG_ENABLE);
> +   priv->phy_mode = phy_mode;
> +   priv->data = data;
> +
> +   if (IS_DESC_64BIT(priv)) {
> +   priv->desc_size = AVE_DESC_SIZE_64;
> +   priv->tx.daddr  = AVE_TXDM_64;
> +   priv->rx.daddr  = AVE_RXDM_64;
> +   } else {
> +   priv->desc_size = AVE_DESC_SIZE_32;
> +   priv->tx.daddr  = AVE_TXDM_32;
> +   priv->rx.daddr  = AVE_RXDM_32;
> +   }
> +   priv->tx.ndesc = AVE_NR_TXDESC;
> +   priv->rx.ndesc = AVE_NR_RXDESC;
> +
> +   u64_stats_init(>stats_rx.syncp);
> +   u64_stats_init(>stats_tx.syncp);
> +
> +   /* get clock */

Please remove this super-obvious comment.


> +   priv->clk = clk_get(dev, NULL);

Missing clk_put() in the failure path.

Why don't you use devm?



> +   if (IS_ERR(priv->clk))
> +   priv->clk = NULL;

So, clk is optional, but
you need to check EPROBE_DEFER.




> +   /* get reset */

Remove.


> +   priv->rst = reset_control_get(dev, NULL);
> +   if (IS_ERR(priv->rst))
> +   priv->rst = NULL;


reset_control_get() is deprecated.  Do not use it in a new driver.

Again, missing reset_control_put().   devm?


The reset seems optional (again, ignoring EPROBE_DEFER)
but you did not use reset_control_get_optional, why?

>From your code, this reset is used as shared.


priv->rst = devm_reset_control_get_optional_shared(dev, NULL);
if (IS_ERR(priv->rst))
  return PTR_ERR(priv->rst);








-- 
Best Regards
Masahiro Yamada


Re: Linux 4.12+ memory leak on router with i40e NICs

2017-10-15 Thread Paweł Staszewski

Previously attached graphs was for:

4.14.0-rc4-next-20171012

from git:

git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git


In kernel drivers.
Just tested by replacing cards in server from 8x10G based on 82599 to
02:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 
for 10GbE SFP+ (rev 01)
02:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 
for 10GbE SFP+ (rev 01)
02:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710 
for 10GbE SFP+ (rev 01)
02:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710 
for 10GbE SFP+ (rev 01)
03:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 
for 10GbE SFP+ (rev 02)
03:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 
for 10GbE SFP+ (rev 02)
03:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710 
for 10GbE SFP+ (rev 02)
03:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710 
for 10GbE SFP+ (rev 02)


And with same configuration - have leaking memory somewhere - there is 
no process that can

 ps aux --sort -rss
USER   PID %CPU %MEM    VSZ   RSS TTY  STAT START   TIME COMMAND
root  5103 41.2 31.1 10357508 10242860 ?   Sl   Oct14 1010:33 
/usr/local/sbin/bgpd -d -A 127.0.0.1 -u root -g root -I 
--ignore_warnings -F /usr/local/etc/Quagga.conf -t
root  5094  0.0  0.8 295372 270868 ?   Ss   Oct14   1:26 
/usr/local/sbin/zebra -d -A 127.0.0.1 -u root -g root -I 
--ignore_warnings -F /usr/local/etc/Quagga.conf
root  4356  3.4  0.2  98780 75852 ?    S    Oct14  84:21 
/usr/sbin/snmpd -p /var/run/snmpd.pid -Ln -I -smux
root  3448  0.0  0.0  32172  6204 ?    Ss   Oct14   0:00 
/sbin/udevd --daemon
root  5385  0.0  0.0  61636  5044 ?    Ss   Oct14   0:00 sshd: 
paol [priv]
root  4116  0.0  0.0 346312  4804 ?    Ss   Oct14   0:33 
/usr/sbin/syslog-ng --persist-file /var/lib/syslog-ng/syslog-ng.persist 
--cfgfile /etc/syslog-ng/syslog-ng.conf --pidfile /run/syslog-ng.pid
paol  5390  0.0  0.0  61636  3564 ?    S    Oct14   0:06 sshd: 
paol@pts/1
root  5403  0.0  0.0 709344  3520 ?    Ssl  Oct14   0:26 
/opt/collectd/sbin/collectd

root  5397  0.0  0.0  18280  3288 pts/1    S    Oct14   0:00 -su
root  4384  0.0  0.0  30472  3016 ?    Ss   Oct14   0:00 
/usr/sbin/sshd

paol  5391  0.0  0.0  18180  2884 pts/1    Ss   Oct14   0:00 -bash
root  5394  0.0  0.0  43988  2376 pts/1    S    Oct14   0:00 su -
root 20815  0.0  0.0  17744  2312 pts/1    R+   16:58   0:00 ps aux 
--sort -rss
root  4438  0.0  0.0  28820  2256 ?    S    Oct14   0:00 teamd 
-d -f /etc/teamd.conf
root  4030  0.0  0.0   6976  2024 ?    Ss   Oct14   0:00 mdadm 
--monitor --scan --daemonise --pid-file /var/run/mdadm.pid --syslog
root  4408  0.0  0.0  16768  1884 ?    Ss   Oct14   0:00 
/usr/sbin/cron
root  5357  0.0  0.0 120532  1724 tty6 Ss+  Oct14   0:00 
/sbin/agetty 38400 tty6 linux
root  5352  0.0  0.0 120532  1712 tty1 Ss+  Oct14   0:00 
/sbin/agetty 38400 tty1 linux
root  5356  0.0  0.0 120532  1692 tty5 Ss+  Oct14   0:00 
/sbin/agetty 38400 tty5 linux
root  5353  0.0  0.0 120532  1648 tty2 Ss+  Oct14   0:00 
/sbin/agetty 38400 tty2 linux
root  5355  0.0  0.0 120532  1628 tty4 Ss+  Oct14   0:00 
/sbin/agetty 38400 tty4 linux
root  5354  0.0  0.0 120532  1620 tty3 Ss+  Oct14   0:00 
/sbin/agetty 38400 tty3 linux

root 1  0.0  0.0   4184  1420 ?    Ss   Oct14   0:02 init [3]
root  4115  0.0  0.0  34336   608 ?    S    Oct14   0:00 
supervising syslog-ng

root 2  0.0  0.0  0 0 ?    S    Oct14   0:00 [kthreadd]
root 4  0.0  0.0  0 0 ?    I<   Oct14   0:00 
[kworker/0:0H]
root 6  0.0  0.0  0 0 ?    I<   Oct14   0:00 
[mm_percpu_wq]
root 7  0.8  0.0  0 0 ?    S    Oct14  22:01 
[ksoftirqd/0]

root 8  0.0  0.0  0 0 ?    I    Oct14   1:36 [rcu_sched]
root 9  0.0  0.0  0 0 ?    I    Oct14   0:00 [rcu_bh]
root    10  0.0  0.0  0 0 ?    S    Oct14   0:00 
[migration/0]

root    11  0.0  0.0  0 0 ?    S    Oct14   0:00 [cpuhp/0]
root    12  0.0  0.0  0 0 ?    S    Oct14   0:00 [cpuhp/1]
root    13  0.0  0.0  0 0 ?    S    Oct14   0:00 
[migration/1]
root    14  0.8  0.0  0 0 ?    S    Oct14  21:39 
[ksoftirqd/1]
root    16  0.0  0.0  0 0 ?    I<   Oct14   0:00 
[kworker/1:0H]

root    17  0.0  0.0  0 0 ?    S    Oct14   0:00 [cpuhp/2]
root    18  0.0  0.0  0 0 ?    S    Oct14   0:00 
[migration/2]
root    19  0.8  0.0  0 0 ?    S    Oct14  20:48 
[ksoftirqd/2]
root    21  0.0  0.0  0 0 ?    I<   Oct14   0:00 
[kworker/2:0H]

root    22  0.0  0.0  0 0 ?    S    Oct14   0:00 [cpuhp/3]
root    23 

Re: [PATCH net-next v2 1/2] dt-bindings: net: add DT bindings for Socionext UniPhier AVE

2017-10-15 Thread Masahiro Yamada
2017-10-13 9:35 GMT+09:00 Kunihiko Hayashi :
> DT bindings for the AVE ethernet controller found on Socionext's
> UniPhier platforms.
>
> Signed-off-by: Kunihiko Hayashi 
> Signed-off-by: Jassi Brar 
> ---
>  .../bindings/net/socionext,uniphier-ave4.txt   | 53 
> ++
>  1 file changed, 53 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
>
> diff --git 
> a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt 
> b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
> new file mode 100644
> index 000..25f4d92
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
> @@ -0,0 +1,53 @@
> +* Socionext AVE ethernet controller
> +
> +This describes the devicetree bindings for AVE ethernet controller
> +implemented on Socionext UniPhier SoCs.
> +
> +Required properties:
> + - compatible: Should be
> +   - "socionext,uniphier-pro4-ave4" : for Pro4 SoC
> +   - "socionext,uniphier-pxs2-ave4" : for PXs2 SoC
> +   - "socionext,uniphier-ld20-ave4" : for LD20 SoC
> +   - "socionext,uniphier-ld11-ave4" : for LD11 SoC
> + - reg: Address where registers are mapped and size of region.
> + - interrupts: Should contain the MAC interrupt.
> + - phy-mode: See ethernet.txt in the same directory. Allow to choose
> +   "rgmii", "rmii", or "mii" according to the PHY.
> + - pinctrl-names: List of assigned state names, see pinctrl
> +   binding documentation.
> + - pinctrl-0: List of phandles to configure the GPIO pin,
> +   see pinctrl binding documentation. Choose this appropriately
> +   according to phy-mode.
> +   - <_ether_rgmii> if phy-mode is "rgmii".
> +   - <_ether_rmii> if phy-mode is "rmii".
> +   - <_ether_mii> if phy-mode is "mii".
> + - phy-handle: Should point to the external phy device.
> +   See ethernet.txt file in the same directory.
> + - mdio subnode: Should be device tree subnode with the following required
> +   properties:
> +   - #address-cells: Must be <1>.
> +   - #size-cells: Must be <0>.
> +   - reg: phy ID number, usually a small integer.
> +
> +Optional properties:
> + - local-mac-address: See ethernet.txt in the same directory.
> +
> +Example:
> +
> +   ether: ethernet@6500 {
> +   compatible = "socionext,uniphier-ld20-ave4";
> +   reg = <0x6500 0x8500>;
> +   interrupts = <0 66 4>;
> +   pinctrl-names = "default";
> +   pinctrl-0 = <_ether_rgmii>;
> +   phy-mode = "rgmii";
> +   phy-handle = <>;
> +   local-mac-address = [00 00 00 00 00 00];
> +   mdio {
> +   #address-cells = <1>;
> +   #size-cells = <0>;
> +   ethphy: ethphy@1 {
> +   reg = <1>;
> +   };
> +   };
> +   };
> --
> 2.7.4
>


I found the following code in 2/2.

+ /* get clock */
+ priv->clk = clk_get(dev, NULL);
+ if (IS_ERR(priv->clk))
+ priv->clk = NULL;
+
+ /* get reset */
+ priv->rst = reset_control_get(dev, NULL);
+ if (IS_ERR(priv->rst))
+ priv->rst = NULL;
+


This doc needs to describe "clocks", "resets".


-- 
Best Regards
Masahiro Yamada


Re: [PATCH] ipvs: Fix inappropriate output of procfs

2017-10-15 Thread Julian Anastasov

Hello,

On Sun, 15 Oct 2017, KUWAZAWA Takuya wrote:

> Information about ipvs in different network namespace can be seen via procfs.
> 
> How to reproduce:
> 
>   # ip netns add ns01
>   # ip netns add ns02
>   # ip netns exec ns01 ip a add dev lo 127.0.0.1/8
>   # ip netns exec ns02 ip a add dev lo 127.0.0.1/8
>   # ip netns exec ns01 ipvsadm -A -t 10.1.1.1:80
>   # ip netns exec ns02 ipvsadm -A -t 10.1.1.2:80
> 
> The ipvsadm displays information about its own network namespace only.
> 
>   # ip netns exec ns01 ipvsadm -Ln
>   IP Virtual Server version 1.2.1 (size=4096)
>   Prot LocalAddress:Port Scheduler Flags
> -> RemoteAddress:Port   Forward Weight ActiveConn InActConn
>   TCP  10.1.1.1:80 wlc
> 
>   # ip netns exec ns02 ipvsadm -Ln
>   IP Virtual Server version 1.2.1 (size=4096)
>   Prot LocalAddress:Port Scheduler Flags
> -> RemoteAddress:Port   Forward Weight ActiveConn InActConn
>   TCP  10.1.1.2:80 wlc
> 
> But I can see information about other network namespace via procfs.
> 
>   # ip netns exec ns01 cat /proc/net/ip_vs
>   IP Virtual Server version 1.2.1 (size=4096)
>   Prot LocalAddress:Port Scheduler Flags
> -> RemoteAddress:Port Forward Weight ActiveConn InActConn
>   TCP  0A010101:0050 wlc
>   TCP  0A010102:0050 wlc
> 
>   # ip netns exec ns02 cat /proc/net/ip_vs
>   IP Virtual Server version 1.2.1 (size=4096)
>   Prot LocalAddress:Port Scheduler Flags
> -> RemoteAddress:Port Forward Weight ActiveConn InActConn
>   TCP  0A010102:0050 wlc
> 
> Signed-off-by: KUWAZAWA Takuya 

Looks good to me

Acked-by: Julian Anastasov 

Simon, please apply to ipvs tree.

> ---
>  net/netfilter/ipvs/ip_vs_ctl.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
> index 4f940d7..b3245f9 100644
> --- a/net/netfilter/ipvs/ip_vs_ctl.c
> +++ b/net/netfilter/ipvs/ip_vs_ctl.c
> @@ -2034,12 +2034,16 @@ static int ip_vs_info_seq_show(struct seq_file *seq, 
> void *v)
>   seq_puts(seq,
>"  -> RemoteAddress:Port Forward Weight ActiveConn 
> InActConn\n");
>   } else {
> + struct net *net = seq_file_net(seq);
> + struct netns_ipvs *ipvs = net_ipvs(net);
>   const struct ip_vs_service *svc = v;
>   const struct ip_vs_iter *iter = seq->private;
>   const struct ip_vs_dest *dest;
>   struct ip_vs_scheduler *sched = rcu_dereference(svc->scheduler);
>   char *sched_name = sched ? sched->name : "none";
>  
> + if (svc->ipvs != ipvs)
> + return 0;
>   if (iter->table == ip_vs_svc_table) {
>  #ifdef CONFIG_IP_VS_IPV6
>   if (svc->af == AF_INET6)
> -- 
> 1.8.3.1

Regards

--
Julian Anastasov 


[REGRESSION] rtl8723bs (r8723bs) WiFi stopped working on 4.14 on Chuwi Hi12 (Intel CherryTrail SoC)

2017-10-15 Thread Михаил Новоселов , ШЭМ Думалогия
Hello,
I've found a regression in kernel 4.14, the last tested kernel was the 
yesterday daily build http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/ , it 
worked on 4.12 and 4.13 and now on 4.14 it is able to scan for available 
networks, but is not able to get an IP adress.
I reported a bug to the kernel's bugzilla: 
https://bugzilla.kernel.org/show_bug.cgi?id=197137 
-

Hardware platform: Chuwi Hi12 (intell cherrytrail tablet)
On Linux kernels 4.12 and 4.13 the WiFi does work, starting from 4.14 not 
release builds it does not, including some dily builds of the times somewhere 
near rc1, the rc3 release and the letest daily build of the code state for 
2017-10-04 http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2017-10-05/ 
I will provide logs from 4.14 a bit later, bellow are LOGS FROM 4.13.1 with 
working WiFi

Full information on hardware and logs (including full dmesg) are here: 
https://linux-hardware.org/index.php?probe=01521aa6cf (kernel 4.13 and working 
wifi)

$ dmesg | grep rtl
[    7.899105] RTL8723BS: rtl8723bs v4.3.5.5_12290.20140916_BTCOEX20140507-4E40
[    7.899106] RTL8723BS: rtl8723bs BT-Coex version = BTCOEX20140507-4E40
[   10.570755] rtl8723bs: acquire FW from file:rtlwifi/rtl8723bs_nic.bin

Logs from 4.14 with WiFi not working: 
https://linux-hardware.org/index.php?probe=5c7b00c4cb


Re: [PATCH net-next] ipv6: only update __use and lastusetime once per jiffy at most

2017-10-15 Thread Paolo Abeni
On Fri, 2017-10-13 at 17:09 -0700, Martin KaFai Lau wrote:
> On Fri, Oct 13, 2017 at 10:08:07PM +, Wei Wang wrote:
> > From: Wei Wang 
> > 
> > In order to not dirty the cacheline too often, we try to only update
> > dst->__use and dst->lastusetime at most once per jiffy.
> 
> 
> > As dst->lastusetime is only used by ipv6 garbage collector, it should
> > be good enough time resolution.
> 
> Make sense.
> 
> > And __use is only used in ipv6_route_seq_show() to show how many times a
> > dst has been used. And as __use is not atomic_t right now, it does not
> > show the precise number of usage times anyway. So we think it should be
> > OK to only update it at most once per jiffy.
> 
> If __use is only bumped HZ number of times per second and we can do ~3Mpps 
> now,
> would __use be way off?


It would, but even nowaday such value could not be trusted, due to the
cuncurrent non atomic operation used to update it.

This:

https://marc.info/?l=linux-netdev=150653252930953=2

was an attempt to preserve a more meaningful value for '__use', but it
requires an additional cacheline.

I'm fine with either options.

Paolo



[PATCH] ipvs: Fix inappropriate output of procfs

2017-10-15 Thread KUWAZAWA Takuya
Information about ipvs in different network namespace can be seen via procfs.

How to reproduce:

  # ip netns add ns01
  # ip netns add ns02
  # ip netns exec ns01 ip a add dev lo 127.0.0.1/8
  # ip netns exec ns02 ip a add dev lo 127.0.0.1/8
  # ip netns exec ns01 ipvsadm -A -t 10.1.1.1:80
  # ip netns exec ns02 ipvsadm -A -t 10.1.1.2:80

The ipvsadm displays information about its own network namespace only.

  # ip netns exec ns01 ipvsadm -Ln
  IP Virtual Server version 1.2.1 (size=4096)
  Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port   Forward Weight ActiveConn InActConn
  TCP  10.1.1.1:80 wlc

  # ip netns exec ns02 ipvsadm -Ln
  IP Virtual Server version 1.2.1 (size=4096)
  Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port   Forward Weight ActiveConn InActConn
  TCP  10.1.1.2:80 wlc

But I can see information about other network namespace via procfs.

  # ip netns exec ns01 cat /proc/net/ip_vs
  IP Virtual Server version 1.2.1 (size=4096)
  Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
  TCP  0A010101:0050 wlc
  TCP  0A010102:0050 wlc

  # ip netns exec ns02 cat /proc/net/ip_vs
  IP Virtual Server version 1.2.1 (size=4096)
  Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
  TCP  0A010102:0050 wlc

Signed-off-by: KUWAZAWA Takuya 
---
 net/netfilter/ipvs/ip_vs_ctl.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 4f940d7..b3245f9 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2034,12 +2034,16 @@ static int ip_vs_info_seq_show(struct seq_file *seq, 
void *v)
seq_puts(seq,
 "  -> RemoteAddress:Port Forward Weight ActiveConn 
InActConn\n");
} else {
+   struct net *net = seq_file_net(seq);
+   struct netns_ipvs *ipvs = net_ipvs(net);
const struct ip_vs_service *svc = v;
const struct ip_vs_iter *iter = seq->private;
const struct ip_vs_dest *dest;
struct ip_vs_scheduler *sched = rcu_dereference(svc->scheduler);
char *sched_name = sched ? sched->name : "none";
 
+   if (svc->ipvs != ipvs)
+   return 0;
if (iter->table == ip_vs_svc_table) {
 #ifdef CONFIG_IP_VS_IPV6
if (svc->af == AF_INET6)
-- 
1.8.3.1



[PATCH net 4/6] rtnetlink: bring NETDEV_CHANGEUPPER event process back in rtnetlink_event

2017-10-15 Thread Xin Long
libteam needs this event notification in userspace when dev's master
dev has been changed. After this, the redundant notifications issue
would be fixed in the later patch 'rtnetlink: check DO_SETLINK_NOTIFY
correctly in do_setlink'.

Fixes: b6b36eb23a46 ("rtnetlink: Do not generate notifications for 
NETDEV_CHANGEUPPER event")
Signed-off-by: Xin Long 
---
 net/core/rtnetlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8e44fd5..ab98c1c 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4286,6 +4286,7 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
case NETDEV_BONDING_FAILOVER:
case NETDEV_POST_TYPE_CHANGE:
case NETDEV_NOTIFY_PEERS:
+   case NETDEV_CHANGEUPPER:
case NETDEV_RESEND_IGMP:
case NETDEV_CHANGEINFODATA:
case NETDEV_CHANGE_TX_QUEUE_LEN:
-- 
2.1.0



[PATCH net 5/6] rtnetlink: check DO_SETLINK_NOTIFY correctly in do_setlink

2017-10-15 Thread Xin Long
The check 'status & DO_SETLINK_NOTIFY' in do_setlink doesn't really
work after status & DO_SETLINK_MODIFIED, as:

  DO_SETLINK_MODIFIED 0x1
  DO_SETLINK_NOTIFY 0x3

Considering that notifications are suppposed to be sent only when
status have the flag DO_SETLINK_NOTIFY, the right check would be:

  (status & DO_SETLINK_NOTIFY) == DO_SETLINK_NOTIFY

This would avoid lots of duplicated notifications when setting some
properties of a link.

Fixes: ba9989069f4e ("rtnl/do_setlink(): notify when a netdev is modified")
Signed-off-by: Xin Long 
---
 net/core/rtnetlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index ab98c1c..3e98fb5 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2248,7 +2248,7 @@ static int do_setlink(const struct sk_buff *skb,
 
 errout:
if (status & DO_SETLINK_MODIFIED) {
-   if (status & DO_SETLINK_NOTIFY)
+   if ((status & DO_SETLINK_NOTIFY) == DO_SETLINK_NOTIFY)
netdev_state_change(dev);
 
if (err < 0)
-- 
2.1.0



[PATCH net 6/6] rtnetlink: do not set notification for tx_queue_len in do_setlink

2017-10-15 Thread Xin Long
NETDEV_CHANGE_TX_QUEUE_LEN event process in rtnetlink_event would
send a notification for userspace and tx_queue_len's setting in
do_setlink would trigger NETDEV_CHANGE_TX_QUEUE_LEN.

So it shouldn't set DO_SETLINK_NOTIFY status for this change to
send a notification any more.

Signed-off-by: Xin Long 
---
 net/core/rtnetlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 3e98fb5..a6bcf86 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2093,7 +2093,7 @@ static int do_setlink(const struct sk_buff *skb,
dev->tx_queue_len = orig_len;
goto errout;
}
-   status |= DO_SETLINK_NOTIFY;
+   status |= DO_SETLINK_MODIFIED;
}
}
 
-- 
2.1.0



[PATCH net 1/6] rtnetlink: bring NETDEV_CHANGEMTU event process back in rtnetlink_event

2017-10-15 Thread Xin Long
Commit 085e1a65f04f ("rtnetlink: Do not generate notifications for MTU
events") tried to fix the redundant notifications issue when ip link
set mtu by removing NETDEV_CHANGEMTU event process in rtnetlink_event.

But it also resulted in no notification generated when dev's mtu is
changed via other methods, like:
  'ifconfig eth1 mtu 1400' or 'echo 1400 > /sys/class/net/eth1/mtu'
It would cause users not to be notified by this change.

This patch is to fix it by bringing NETDEV_CHANGEMTU event back into
rtnetlink_event, and the redundant notifications issue will be fixed
in the later patch 'rtnetlink: check DO_SETLINK_NOTIFY correctly in
do_setlink'.

Fixes: 085e1a65f04f ("rtnetlink: Do not generate notifications for MTU events")
Signed-off-by: Xin Long 
---
 net/core/rtnetlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d4bcdcc..72053ed 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4279,6 +4279,7 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
 
switch (event) {
case NETDEV_REBOOT:
+   case NETDEV_CHANGEMTU:
case NETDEV_CHANGEADDR:
case NETDEV_CHANGENAME:
case NETDEV_FEAT_CHANGE:
-- 
2.1.0



[PATCH net 3/6] rtnetlink: bring NETDEV_POST_TYPE_CHANGE event process back in rtnetlink_event

2017-10-15 Thread Xin Long
As I said in patch 'rtnetlink: bring NETDEV_CHANGEMTU event process back
in rtnetlink_event', removing NETDEV_POST_TYPE_CHANGE event was not the
right fix for the redundant notifications issue.

So bring this event process back to rtnetlink_event and the old redundant
notifications issue would be fixed in the later patch 'rtnetlink: check
DO_SETLINK_NOTIFY correctly in do_setlink'.

Fixes: aef091ae58aa ("rtnetlink: Do not generate notifications for 
POST_TYPE_CHANGE event")
Signed-off-by: Xin Long 
---
 net/core/rtnetlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index bf47360..8e44fd5 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4284,6 +4284,7 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
case NETDEV_CHANGENAME:
case NETDEV_FEAT_CHANGE:
case NETDEV_BONDING_FAILOVER:
+   case NETDEV_POST_TYPE_CHANGE:
case NETDEV_NOTIFY_PEERS:
case NETDEV_RESEND_IGMP:
case NETDEV_CHANGEINFODATA:
-- 
2.1.0



[PATCH net 2/6] rtnetlink: bring NETDEV_CHANGE_TX_QUEUE_LEN event process back in rtnetlink_event

2017-10-15 Thread Xin Long
The same fix for changing mtu in the patch 'rtnetlink: bring
NETDEV_CHANGEMTU event process back in rtnetlink_event' is
needed for changing tx_queue_len.

Note that the redundant notifications issue for tx_queue_len
will be fixed in the later patch 'rtnetlink: do not send
notification for tx_queue_len in do_setlink'.

Fixes: 27b3b551d8a7 ("rtnetlink: Do not generate notifications for 
NETDEV_CHANGE_TX_QUEUE_LEN event")
Signed-off-by: Xin Long 
---
 net/core/rtnetlink.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 72053ed..bf47360 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -4287,6 +4287,7 @@ static int rtnetlink_event(struct notifier_block *this, 
unsigned long event, voi
case NETDEV_NOTIFY_PEERS:
case NETDEV_RESEND_IGMP:
case NETDEV_CHANGEINFODATA:
+   case NETDEV_CHANGE_TX_QUEUE_LEN:
rtmsg_ifinfo_event(RTM_NEWLINK, dev, 0, rtnl_get_event(event),
   GFP_KERNEL);
break;
-- 
2.1.0



[PATCH net 0/6] rtnetlink: a bunch of fixes for userspace notifications in changing dev properties

2017-10-15 Thread Xin Long
Whenever any property of a link, address, route, etc. changes by whatever way,
kernel should notify the programs that listen for such events in userspace.

The patchet "rtnetlink: Cleanup user notifications for netdev events" tried to
fix a redundant notifications issue, but it also introduced a side effect.

After that, user notifications could only be sent when changing dev properties
via netlink api. As it removed some events process in rtnetlink_event where
the notifications was sent to users.

It resulted in no notification generated when dev properties are changed via
other ways, like ioctl, sysfs, etc. It may cause some user programs doesn't
work as expected because of the missing notifications.

This patchset will fix it by bringing some of these netdev events back and
also fix the old redundant notifications issue with a proper way.

Xin Long (6):
  rtnetlink: bring NETDEV_CHANGEMTU event process back in
rtnetlink_event
  rtnetlink: bring NETDEV_CHANGE_TX_QUEUE_LEN event process back in
rtnetlink_event
  rtnetlink: bring NETDEV_POST_TYPE_CHANGE event process back in
rtnetlink_event
  rtnetlink: bring NETDEV_CHANGEUPPER event process back in
rtnetlink_event
  rtnetlink: check DO_SETLINK_NOTIFY correctly in do_setlink
  rtnetlink: do not set notification for tx_queue_len in do_setlink

 net/core/rtnetlink.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

-- 
2.1.0



RFC: making cn_proc work in {pid,user} namespaces

2017-10-15 Thread Aleksa Sarai

Hi all,

At the moment, cn_proc is not usable by containers or container 
runtimes. In addition, all connectors have an odd relationship with 
init_net (for example, /proc/net/connectors only exists in init_net). 
There are two main use-cases that would be perfect for cn_proc, which is 
the reason for me pushing this:


First, when adding a process to an existing container, in certain modes 
runc would like to know that process's exit code. But, when joining a 
PID namespace, it is advisable[1] to always double-fork after doing the 
setns(2) to reparent the joining process to the init of the container 
(this causes the SIGCHLD to be received by the container init). It would 
also be useful to be able to monitor the exit code of the init process 
in a container without being its parent. At the moment, cn_proc doesn't 
allow unprivileged users to use it (making it a problem for user 
namespaces and "rootless containers"). In addition, it also doesn't 
allow nested containers to use it, because it requires the process to be 
in init_pid. As a result, runc cannot use cn_proc and relies on SIGCHLD 
(which can only be used if we don't double-fork, or keep around a 
long-running process which is something that runc also cannot do).


Secondly, there are/were some init systems that rely on cn_proc to 
manage service state. From a "it would be neat" perspective, I think it 
would be quite nice if such init systems could be used inside 
containers. But that requires cn_proc to be able to be used as an 
unprivileged user and in a pid namespace other than init_pid.


The /proc/net/connectors thing is quite easily resolved (just make it 
the connector driver perdev and make some small changes to make sure the 
interfaces stay sane inside of a container's network namespace). I'm 
sure that we'll probably have to make some changes to the registration 
API, so that a connector can specify whether they want to be visible to 
non-init_net namespaces.


However, the cn_proc problem is a bit harder to resolve nicely and there 
are quite a few interface questions that would need to be agreed upon. 
The basic idea would be that a process can only get cn_proc events if it 
has ptrace_may_access rights over said process (effectively a forced 
filter -- which would ideally be done send-side but it looks like it 
might have to be done receive-side). This should resolve possible 
concerns about an unprivileged process being able to inspect (fairly 
granular) information about the host. And obviously the pids, uids, and 
gids would all be translated according to the receiving process's user 
namespaces (if it cannot be translated then the message is not 
received). I guess that the translation would be done in the same way as 
SCM_CREDENTIALS (and cgroup.procs files), which is that it's done on the 
receive side not the send side.


My reason for sending this email rather than just writing the patch is 
to see whether anyone has any solid NACKs against the use-case or 
whether there is some fundamental issue that I'm not seeing. If nobody 
objects, I'll be happy to work on this.


[1]: https://lwn.net/Articles/532748/

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/


Re: [PATCH] tests: Remove bashisms (s/source/.)

2017-10-15 Thread Petr Vorel
Hi Mark,

> > --- a/testsuite/tests/ip/route/add_default_route.t
> > +++ b/testsuite/tests/ip/route/add_default_route.t
> > @@ -1,6 +1,6 @@
> > -#!/bin/sh
> > +#!/bin/bash

> Funny - ^^^ choosing bash explicitly while
> removing the bashism?
Eh, this is really silly, sorry. I've sent a patch to revert it back as it was 
unintentional.


> I noticed a couple other files already specified /bin/bash, yet you removed 
> the bashisms. But the above struck me as something that would not seem to 
> have been intended.
I plan to remove all bashisms. Although having explicit dependency on bash 
(i.e. /bin/bash
in shebang), it would be nice have posix compliant shell scripts (as Randy 
Dunlap also
noted). Or at least don't have bashisms in script with /bin/sh shebang.


Kind regards,
Petr


[PATCH iproute2 1/1] tests: Revert back /bin/sh in shebang

2017-10-15 Thread Petr Vorel
This was added by mistake in commit ecd44e68
("tests: Remove bashisms (s/source/.)")

Signed-off-by: Petr Vorel 
---
 testsuite/tests/ip/route/add_default_route.t | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/testsuite/tests/ip/route/add_default_route.t 
b/testsuite/tests/ip/route/add_default_route.t
index 0b566f1f..569ba1f8 100755
--- a/testsuite/tests/ip/route/add_default_route.t
+++ b/testsuite/tests/ip/route/add_default_route.t
@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/bin/sh
 
 . lib/generic.sh
 
-- 
2.14.2



[PATCH v9 02/20] crypto: ccp: use -EAGAIN for transient busy indication

2017-10-15 Thread Gilad Ben-Yossef
Replace -EBUSY with -EAGAIN when reporting transient busy
indication in the absence of backlog.

Signed-off-by: Gilad Ben-Yossef 
Reviewed-by: Gary R Hook 

---

Please squash this patch with the previous one when merging upstream.

 drivers/crypto/ccp/ccp-crypto-main.c | 8 +++-
 drivers/crypto/ccp/ccp-dev.c | 7 +--
 2 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/ccp/ccp-crypto-main.c 
b/drivers/crypto/ccp/ccp-crypto-main.c
index 35a9de7..403ff0a 100644
--- a/drivers/crypto/ccp/ccp-crypto-main.c
+++ b/drivers/crypto/ccp/ccp-crypto-main.c
@@ -222,9 +222,10 @@ static int ccp_crypto_enqueue_cmd(struct ccp_crypto_cmd 
*crypto_cmd)
 
/* Check if the cmd can/should be queued */
if (req_queue.cmd_count >= CCP_CRYPTO_MAX_QLEN) {
-   ret = -EBUSY;
-   if (!(crypto_cmd->cmd->flags & CCP_CMD_MAY_BACKLOG))
+   if (!(crypto_cmd->cmd->flags & CCP_CMD_MAY_BACKLOG)) {
+   ret = -EAGAIN;
goto e_lock;
+   }
}
 
/* Look for an entry with the same tfm.  If there is a cmd
@@ -243,9 +244,6 @@ static int ccp_crypto_enqueue_cmd(struct ccp_crypto_cmd 
*crypto_cmd)
ret = ccp_enqueue_cmd(crypto_cmd->cmd);
if (!ccp_crypto_success(ret))
goto e_lock;/* Error, don't queue it */
-   if ((ret == -EBUSY) &&
-   !(crypto_cmd->cmd->flags & CCP_CMD_MAY_BACKLOG))
-   goto e_lock;/* Not backlogging, don't queue it */
}
 
if (req_queue.cmd_count >= CCP_CRYPTO_MAX_QLEN) {
diff --git a/drivers/crypto/ccp/ccp-dev.c b/drivers/crypto/ccp/ccp-dev.c
index 4e029b1..3d637e3 100644
--- a/drivers/crypto/ccp/ccp-dev.c
+++ b/drivers/crypto/ccp/ccp-dev.c
@@ -292,9 +292,12 @@ int ccp_enqueue_cmd(struct ccp_cmd *cmd)
i = ccp->cmd_q_count;
 
if (ccp->cmd_count >= MAX_CMD_QLEN) {
-   ret = -EBUSY;
-   if (cmd->flags & CCP_CMD_MAY_BACKLOG)
+   if (cmd->flags & CCP_CMD_MAY_BACKLOG) {
+   ret = -EBUSY;
list_add_tail(>entry, >backlog);
+   } else {
+   ret = -EAGAIN;
+   }
} else {
ret = -EINPROGRESS;
ccp->cmd_count++;
-- 
2.7.4



[PATCH v9 00/20] simplify crypto wait for async op

2017-10-15 Thread Gilad Ben-Yossef
Many users of kernel async. crypto services have a pattern of
starting an async. crypto op and than using a completion
to wait for it to end.

This patch set simplifies this common use case in two ways:

First, by separating the return codes of the case where a
request is queued to a backlog due to the provider being
busy (-EBUSY) from the case the request has failed due
to the provider being busy and backlogging is not enabled
(-EAGAIN).

Next, this change is than built on to create a generic API
to wait for a async. crypto operation to complete.

The end result is a smaller code base and an API that is
easier to use and more difficult to get wrong.

The patch set was boot tested on x86_64 and arm64 which
at the very least tests the crypto users via testmgr and
tcrypt but I do note that I do not have access to some
of the HW whose drivers are modified nor do I claim I was
able to test all of the corner cases.

The patch set is based upon linux-next release tagged
next-20171013.

Changes from v8:
- Remove the translation of EAGAIN return code to the
  previous return code of EBUSY for the user space
  interface of algif as no one seems to rely on it as
  requested by Herbert Xu.

Changes from v7:
- Turn -EBUSY to -EAGAIN also in crypto using net
  code which I missed before, as has been pointed
  out by Harsh Jain.

Changes from v6:
- Fix brown paper bag compile error on marvell/cesa
  code.

Changes from v5:
- Remove redundant new line as spotted by Jonathan
  Cameron.
- Reworded dm-verity change commit message to better
  clarify potential issue averted by change as
  pointed out by Mikulas Patocka.

Changes from v4:
- Rebase on top of latest algif changes from Stephan
  Mueller.
- Fix typo in ccp patch title.

Changes from v3:
- Instead of changing the return code to indicate
  backlog queueing, change the return code to indicate
  transient busy state, as suggested by Herbert Xu.

Changes from v2:
- Patch title changed from "introduce crypto wait for
  async op" to better reflect the current state.
- Rebase on top of latest linux-next.
- Add a new return code of -EIOCBQUEUED for backlog
  queueing, as suggested by Herbert Xu.
- Transform more users to the new API.
- Update the drbg change to account for new init as
  indicated by Stephan Muller.

Changes from v1:
- Address review comments from Eric Biggers.
- Separated out bug fixes of existing code and rebase
  on top of that patch set.
- Rename 'ecr' to 'wait' in fscrypto code.
- Split patch introducing the new API from the change
  moving over the algif code which it originated from
  to the new API.
- Inline crypto_wait_req().
- Some code indentation fixes.

Gilad Ben-Yossef (20):
  crypto: change transient busy return code to -EAGAIN
  crypto: ccp: use -EAGAIN for transient busy indication
  net: use -EAGAIN for transient busy indication
  crypto: remove redundant backlog checks on EBUSY
  crypto: marvell/cesa: remove redundant backlog checks on EBUSY
  crypto: introduce crypto wait for async op
  crypto: move algif to generic async completion
  crypto: move pub key to generic async completion
  crypto: move drbg to generic async completion
  crypto: move gcm to generic async completion
  crypto: move testmgr to generic async completion
  fscrypt: move to generic async completion
  dm: move dm-verity to generic async completion
  cifs: move to generic async completion
  ima: move to generic async completion
  crypto: tcrypt: move to generic async completion
  crypto: talitos: move to generic async completion
  crypto: qce: move to generic async completion
  crypto: mediatek: move to generic async completion
  crypto: adapt api sample to use async. op wait

 Documentation/crypto/api-samples.rst |  52 ++---
 crypto/af_alg.c  |  27 -
 crypto/ahash.c   |  12 +--
 crypto/algapi.c  |   6 +-
 crypto/algif_aead.c  |   8 +-
 crypto/algif_hash.c  |  30 +++---
 crypto/algif_skcipher.c  |   9 +-
 crypto/api.c |  13 +++
 crypto/asymmetric_keys/public_key.c  |  28 +
 crypto/cryptd.c  |   4 +-
 crypto/cts.c |   6 +-
 crypto/drbg.c|  36 ++-
 crypto/gcm.c |  32 ++
 crypto/lrw.c |   8 +-
 crypto/rsa-pkcs1pad.c|  16 +--
 crypto/tcrypt.c  |  84 +--
 crypto/testmgr.c | 204 ---
 crypto/xts.c |   8 +-
 drivers/crypto/ccp/ccp-crypto-main.c |   8 +-
 drivers/crypto/ccp/ccp-dev.c |   7 +-
 drivers/crypto/marvell/cesa.c|   3 +-
 drivers/crypto/marvell/cesa.h|   2 +-
 drivers/crypto/mediatek/mtk-aes.c|  31 +-
 drivers/crypto/qce/sha.c |  30 +-
 drivers/crypto/talitos.c |  38 +--
 drivers/md/dm-verity-target.c|  81 

[PATCH v9 03/20] net: use -EAGAIN for transient busy indication

2017-10-15 Thread Gilad Ben-Yossef
Replace -EBUSY with -EAGAIN when handling transient busy
indication in the absence of backlog.

Signed-off-by: Gilad Ben-Yossef 

---

Please squash this patch with the previous one when merging upstream.

 net/ipv4/ah4.c  | 2 +-
 net/ipv4/esp4.c | 2 +-
 net/ipv6/ah6.c  | 2 +-
 net/ipv6/esp6.c | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/ah4.c b/net/ipv4/ah4.c
index 37db44f..049cb0a 100644
--- a/net/ipv4/ah4.c
+++ b/net/ipv4/ah4.c
@@ -240,7 +240,7 @@ static int ah_output(struct xfrm_state *x, struct sk_buff 
*skb)
if (err == -EINPROGRESS)
goto out;
 
-   if (err == -EBUSY)
+   if (err == -EAGAIN)
err = NET_XMIT_DROP;
goto out_free;
}
diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index b00e4a4..ff8e088 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -432,7 +432,7 @@ int esp_output_tail(struct xfrm_state *x, struct sk_buff 
*skb, struct esp_info *
case -EINPROGRESS:
goto error;
 
-   case -EBUSY:
+   case -EAGAIN:
err = NET_XMIT_DROP;
break;
 
diff --git a/net/ipv6/ah6.c b/net/ipv6/ah6.c
index 7802b72..ba0c145 100644
--- a/net/ipv6/ah6.c
+++ b/net/ipv6/ah6.c
@@ -443,7 +443,7 @@ static int ah6_output(struct xfrm_state *x, struct sk_buff 
*skb)
if (err == -EINPROGRESS)
goto out;
 
-   if (err == -EBUSY)
+   if (err == -EAGAIN)
err = NET_XMIT_DROP;
goto out_free;
}
diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 89910e2..1a71ee5 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -396,7 +396,7 @@ int esp6_output_tail(struct xfrm_state *x, struct sk_buff 
*skb, struct esp_info
case -EINPROGRESS:
goto error;
 
-   case -EBUSY:
+   case -EAGAIN:
err = NET_XMIT_DROP;
break;
 
-- 
2.7.4



[PATCH v9 09/20] crypto: move drbg to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
DRBG is starting an async. crypto op and waiting for it complete.
Move it over to generic code doing the same.

The code now also passes CRYPTO_TFM_REQ_MAY_SLEEP flag indicating
crypto request memory allocation may use GFP_KERNEL which should
be perfectly fine as the code is obviously sleeping for the
completion of the request any way.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/drbg.c | 36 +---
 include/crypto/drbg.h |  3 +--
 2 files changed, 10 insertions(+), 29 deletions(-)

diff --git a/crypto/drbg.c b/crypto/drbg.c
index 7001839..4faa278 100644
--- a/crypto/drbg.c
+++ b/crypto/drbg.c
@@ -1651,16 +1651,6 @@ static int drbg_fini_sym_kernel(struct drbg_state *drbg)
return 0;
 }
 
-static void drbg_skcipher_cb(struct crypto_async_request *req, int error)
-{
-   struct drbg_state *drbg = req->data;
-
-   if (error == -EINPROGRESS)
-   return;
-   drbg->ctr_async_err = error;
-   complete(>ctr_completion);
-}
-
 static int drbg_init_sym_kernel(struct drbg_state *drbg)
 {
struct crypto_cipher *tfm;
@@ -1691,7 +1681,7 @@ static int drbg_init_sym_kernel(struct drbg_state *drbg)
return PTR_ERR(sk_tfm);
}
drbg->ctr_handle = sk_tfm;
-   init_completion(>ctr_completion);
+   crypto_init_wait(>ctr_wait);
 
req = skcipher_request_alloc(sk_tfm, GFP_KERNEL);
if (!req) {
@@ -1700,8 +1690,9 @@ static int drbg_init_sym_kernel(struct drbg_state *drbg)
return -ENOMEM;
}
drbg->ctr_req = req;
-   skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-   drbg_skcipher_cb, drbg);
+   skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
+   CRYPTO_TFM_REQ_MAY_SLEEP,
+   crypto_req_done, >ctr_wait);
 
alignmask = crypto_skcipher_alignmask(sk_tfm);
drbg->ctr_null_value_buf = kzalloc(DRBG_CTR_NULL_LEN + alignmask,
@@ -1762,21 +1753,12 @@ static int drbg_kcapi_sym_ctr(struct drbg_state *drbg,
/* Output buffer may not be valid for SGL, use scratchpad */
skcipher_request_set_crypt(drbg->ctr_req, _in, _out,
   cryptlen, drbg->V);
-   ret = crypto_skcipher_encrypt(drbg->ctr_req);
-   switch (ret) {
-   case 0:
-   break;
-   case -EINPROGRESS:
-   case -EBUSY:
-   wait_for_completion(>ctr_completion);
-   if (!drbg->ctr_async_err) {
-   reinit_completion(>ctr_completion);
-   break;
-   }
-   default:
+   ret = crypto_wait_req(crypto_skcipher_encrypt(drbg->ctr_req),
+   >ctr_wait);
+   if (ret)
goto out;
-   }
-   init_completion(>ctr_completion);
+
+   crypto_init_wait(>ctr_wait);
 
memcpy(outbuf, drbg->outscratchpad, cryptlen);
 
diff --git a/include/crypto/drbg.h b/include/crypto/drbg.h
index 22f884c..8f94110 100644
--- a/include/crypto/drbg.h
+++ b/include/crypto/drbg.h
@@ -126,8 +126,7 @@ struct drbg_state {
__u8 *ctr_null_value;   /* CTR mode aligned zero buf */
__u8 *outscratchpadbuf; /* CTR mode output scratchpad */
 __u8 *outscratchpad;   /* CTR mode aligned outbuf */
-   struct completion ctr_completion;   /* CTR mode async handler */
-   int ctr_async_err;  /* CTR mode async error */
+   struct crypto_wait ctr_wait;/* CTR mode async wait obj */
 
bool seeded;/* DRBG fully seeded? */
bool pr;/* Prediction resistance enabled? */
-- 
2.7.4



[PATCH v9 06/20] crypto: introduce crypto wait for async op

2017-10-15 Thread Gilad Ben-Yossef
Invoking a possibly async. crypto op and waiting for completion
while correctly handling backlog processing is a common task
in the crypto API implementation and outside users of it.

This patch adds a generic implementation for doing so in
preparation for using it across the board instead of hand
rolled versions.

Signed-off-by: Gilad Ben-Yossef 
CC: Eric Biggers 
CC: Jonathan Cameron 
---
 crypto/api.c   | 13 +
 include/linux/crypto.h | 40 
 2 files changed, 53 insertions(+)

diff --git a/crypto/api.c b/crypto/api.c
index 941cd4c..2a2479d 100644
--- a/crypto/api.c
+++ b/crypto/api.c
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 LIST_HEAD(crypto_alg_list);
@@ -595,5 +596,17 @@ int crypto_has_alg(const char *name, u32 type, u32 mask)
 }
 EXPORT_SYMBOL_GPL(crypto_has_alg);
 
+void crypto_req_done(struct crypto_async_request *req, int err)
+{
+   struct crypto_wait *wait = req->data;
+
+   if (err == -EINPROGRESS)
+   return;
+
+   wait->err = err;
+   complete(>completion);
+}
+EXPORT_SYMBOL_GPL(crypto_req_done);
+
 MODULE_DESCRIPTION("Cryptographic core API");
 MODULE_LICENSE("GPL");
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 84da997..78508ca 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -24,6 +24,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Autoloaded crypto modules should only use a prefixed name to avoid allowing
@@ -468,6 +469,45 @@ struct crypto_alg {
 } CRYPTO_MINALIGN_ATTR;
 
 /*
+ * A helper struct for waiting for completion of async crypto ops
+ */
+struct crypto_wait {
+   struct completion completion;
+   int err;
+};
+
+/*
+ * Macro for declaring a crypto op async wait object on stack
+ */
+#define DECLARE_CRYPTO_WAIT(_wait) \
+   struct crypto_wait _wait = { \
+   COMPLETION_INITIALIZER_ONSTACK((_wait).completion), 0 }
+
+/*
+ * Async ops completion helper functioons
+ */
+void crypto_req_done(struct crypto_async_request *req, int err);
+
+static inline int crypto_wait_req(int err, struct crypto_wait *wait)
+{
+   switch (err) {
+   case -EINPROGRESS:
+   case -EBUSY:
+   wait_for_completion(>completion);
+   reinit_completion(>completion);
+   err = wait->err;
+   break;
+   };
+
+   return err;
+}
+
+static inline void crypto_init_wait(struct crypto_wait *wait)
+{
+   init_completion(>completion);
+}
+
+/*
  * Algorithm registration interface.
  */
 int crypto_register_alg(struct crypto_alg *alg);
-- 
2.7.4



[PATCH v9 07/20] crypto: move algif to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
algif starts several async crypto ops and waits for their completion.
Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/af_alg.c | 27 ---
 crypto/algif_aead.c |  8 
 crypto/algif_hash.c | 30 ++
 crypto/algif_skcipher.c |  9 -
 include/crypto/if_alg.h | 15 +--
 5 files changed, 23 insertions(+), 66 deletions(-)

diff --git a/crypto/af_alg.c b/crypto/af_alg.c
index 337cf38..85cea9d 100644
--- a/crypto/af_alg.c
+++ b/crypto/af_alg.c
@@ -481,33 +481,6 @@ int af_alg_cmsg_send(struct msghdr *msg, struct 
af_alg_control *con)
 }
 EXPORT_SYMBOL_GPL(af_alg_cmsg_send);
 
-int af_alg_wait_for_completion(int err, struct af_alg_completion *completion)
-{
-   switch (err) {
-   case -EINPROGRESS:
-   case -EBUSY:
-   wait_for_completion(>completion);
-   reinit_completion(>completion);
-   err = completion->err;
-   break;
-   };
-
-   return err;
-}
-EXPORT_SYMBOL_GPL(af_alg_wait_for_completion);
-
-void af_alg_complete(struct crypto_async_request *req, int err)
-{
-   struct af_alg_completion *completion = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   completion->err = err;
-   complete(>completion);
-}
-EXPORT_SYMBOL_GPL(af_alg_complete);
-
 /**
  * af_alg_alloc_tsgl - allocate the TX SGL
  *
diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index 516b38c..aacae08 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -278,11 +278,11 @@ static int _aead_recvmsg(struct socket *sock, struct 
msghdr *msg,
/* Synchronous operation */
aead_request_set_callback(>cra_u.aead_req,
  CRYPTO_TFM_REQ_MAY_BACKLOG,
- af_alg_complete, >completion);
-   err = af_alg_wait_for_completion(ctx->enc ?
+ crypto_req_done, >wait);
+   err = crypto_wait_req(ctx->enc ?
crypto_aead_encrypt(>cra_u.aead_req) :
crypto_aead_decrypt(>cra_u.aead_req),
->completion);
+   >wait);
}
 
/* AIO operation in progress */
@@ -554,7 +554,7 @@ static int aead_accept_parent_nokey(void *private, struct 
sock *sk)
ctx->merge = 0;
ctx->enc = 0;
ctx->aead_assoclen = 0;
-   af_alg_init_completion(>completion);
+   crypto_init_wait(>wait);
 
ask->private = ctx;
 
diff --git a/crypto/algif_hash.c b/crypto/algif_hash.c
index 5e92bd2..76d2e71 100644
--- a/crypto/algif_hash.c
+++ b/crypto/algif_hash.c
@@ -26,7 +26,7 @@ struct hash_ctx {
 
u8 *result;
 
-   struct af_alg_completion completion;
+   struct crypto_wait wait;
 
unsigned int len;
bool more;
@@ -88,8 +88,7 @@ static int hash_sendmsg(struct socket *sock, struct msghdr 
*msg,
if ((msg->msg_flags & MSG_MORE))
hash_free_result(sk, ctx);
 
-   err = af_alg_wait_for_completion(crypto_ahash_init(>req),
-   >completion);
+   err = crypto_wait_req(crypto_ahash_init(>req), >wait);
if (err)
goto unlock;
}
@@ -110,8 +109,8 @@ static int hash_sendmsg(struct socket *sock, struct msghdr 
*msg,
 
ahash_request_set_crypt(>req, ctx->sgl.sg, NULL, len);
 
-   err = af_alg_wait_for_completion(crypto_ahash_update(>req),
->completion);
+   err = crypto_wait_req(crypto_ahash_update(>req),
+ >wait);
af_alg_free_sg(>sgl);
if (err)
goto unlock;
@@ -129,8 +128,8 @@ static int hash_sendmsg(struct socket *sock, struct msghdr 
*msg,
goto unlock;
 
ahash_request_set_crypt(>req, NULL, ctx->result, 0);
-   err = af_alg_wait_for_completion(crypto_ahash_final(>req),
->completion);
+   err = crypto_wait_req(crypto_ahash_final(>req),
+ >wait);
}
 
 unlock:
@@ -171,7 +170,7 @@ static ssize_t hash_sendpage(struct socket *sock, struct 
page *page,
} else {
if (!ctx->more) {
err = crypto_ahash_init(>req);
-   err = af_alg_wait_for_completion(err, >completion);
+   err = crypto_wait_req(err, >wait);
if (err)
goto unlock;
}
@@ -179,7 +178,7 @@ static ssize_t hash_sendpage(struct socket *sock, struct 
page *page,
err = 

[PATCH v9 15/20] ima: move to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
ima starts several async crypto ops and  waits for their completions.
Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
Acked-by: Mimi Zohar 
---
 security/integrity/ima/ima_crypto.c | 56 +++--
 1 file changed, 17 insertions(+), 39 deletions(-)

diff --git a/security/integrity/ima/ima_crypto.c 
b/security/integrity/ima/ima_crypto.c
index a856d8c..9057b16 100644
--- a/security/integrity/ima/ima_crypto.c
+++ b/security/integrity/ima/ima_crypto.c
@@ -27,11 +27,6 @@
 
 #include "ima.h"
 
-struct ahash_completion {
-   struct completion completion;
-   int err;
-};
-
 /* minimum file size for ahash use */
 static unsigned long ima_ahash_minsize;
 module_param_named(ahash_minsize, ima_ahash_minsize, ulong, 0644);
@@ -196,30 +191,13 @@ static void ima_free_atfm(struct crypto_ahash *tfm)
crypto_free_ahash(tfm);
 }
 
-static void ahash_complete(struct crypto_async_request *req, int err)
+static inline int ahash_wait(int err, struct crypto_wait *wait)
 {
-   struct ahash_completion *res = req->data;
 
-   if (err == -EINPROGRESS)
-   return;
-   res->err = err;
-   complete(>completion);
-}
+   err = crypto_wait_req(err, wait);
 
-static int ahash_wait(int err, struct ahash_completion *res)
-{
-   switch (err) {
-   case 0:
-   break;
-   case -EINPROGRESS:
-   case -EBUSY:
-   wait_for_completion(>completion);
-   reinit_completion(>completion);
-   err = res->err;
-   /* fall through */
-   default:
+   if (err)
pr_crit_ratelimited("ahash calculation failed: err: %d\n", err);
-   }
 
return err;
 }
@@ -233,7 +211,7 @@ static int ima_calc_file_hash_atfm(struct file *file,
int rc, read = 0, rbuf_len, active = 0, ahash_rc = 0;
struct ahash_request *req;
struct scatterlist sg[1];
-   struct ahash_completion res;
+   struct crypto_wait wait;
size_t rbuf_size[2];
 
hash->length = crypto_ahash_digestsize(tfm);
@@ -242,12 +220,12 @@ static int ima_calc_file_hash_atfm(struct file *file,
if (!req)
return -ENOMEM;
 
-   init_completion();
+   crypto_init_wait();
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
   CRYPTO_TFM_REQ_MAY_SLEEP,
-  ahash_complete, );
+  crypto_req_done, );
 
-   rc = ahash_wait(crypto_ahash_init(req), );
+   rc = ahash_wait(crypto_ahash_init(req), );
if (rc)
goto out1;
 
@@ -288,7 +266,7 @@ static int ima_calc_file_hash_atfm(struct file *file,
 * read/request, wait for the completion of the
 * previous ahash_update() request.
 */
-   rc = ahash_wait(ahash_rc, );
+   rc = ahash_wait(ahash_rc, );
if (rc)
goto out3;
}
@@ -304,7 +282,7 @@ static int ima_calc_file_hash_atfm(struct file *file,
 * read/request, wait for the completion of the
 * previous ahash_update() request.
 */
-   rc = ahash_wait(ahash_rc, );
+   rc = ahash_wait(ahash_rc, );
if (rc)
goto out3;
}
@@ -318,7 +296,7 @@ static int ima_calc_file_hash_atfm(struct file *file,
active = !active; /* swap buffers, if we use two */
}
/* wait for the last update request to complete */
-   rc = ahash_wait(ahash_rc, );
+   rc = ahash_wait(ahash_rc, );
 out3:
if (read)
file->f_mode &= ~FMODE_READ;
@@ -327,7 +305,7 @@ static int ima_calc_file_hash_atfm(struct file *file,
 out2:
if (!rc) {
ahash_request_set_crypt(req, NULL, hash->digest, 0);
-   rc = ahash_wait(crypto_ahash_final(req), );
+   rc = ahash_wait(crypto_ahash_final(req), );
}
 out1:
ahash_request_free(req);
@@ -537,7 +515,7 @@ static int calc_buffer_ahash_atfm(const void *buf, loff_t 
len,
 {
struct ahash_request *req;
struct scatterlist sg;
-   struct ahash_completion res;
+   struct crypto_wait wait;
int rc, ahash_rc = 0;
 
hash->length = crypto_ahash_digestsize(tfm);
@@ -546,12 +524,12 @@ static int calc_buffer_ahash_atfm(const void *buf, loff_t 
len,
if (!req)
return -ENOMEM;
 
-   init_completion();
+   crypto_init_wait();
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
   CRYPTO_TFM_REQ_MAY_SLEEP,
-  ahash_complete, );
+   

[PATCH v9 11/20] crypto: move testmgr to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
testmgr is starting async. crypto ops and waiting for them to complete.
Move it over to generic code doing the same.

This also provides a test of the generic crypto async. wait code.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/testmgr.c | 204 ++-
 1 file changed, 66 insertions(+), 138 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index dd9c7f1..3996fd5 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -76,11 +76,6 @@ int alg_test(const char *driver, const char *alg, u32 type, 
u32 mask)
 #define ENCRYPT 1
 #define DECRYPT 0
 
-struct tcrypt_result {
-   struct completion completion;
-   int err;
-};
-
 struct aead_test_suite {
struct {
const struct aead_testvec *vecs;
@@ -155,17 +150,6 @@ static void hexdump(unsigned char *buf, unsigned int len)
buf, len, false);
 }
 
-static void tcrypt_complete(struct crypto_async_request *req, int err)
-{
-   struct tcrypt_result *res = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   res->err = err;
-   complete(>completion);
-}
-
 static int testmgr_alloc_buf(char *buf[XBUFSIZE])
 {
int i;
@@ -193,20 +177,10 @@ static void testmgr_free_buf(char *buf[XBUFSIZE])
free_page((unsigned long)buf[i]);
 }
 
-static int wait_async_op(struct tcrypt_result *tr, int ret)
-{
-   if (ret == -EINPROGRESS || ret == -EBUSY) {
-   wait_for_completion(>completion);
-   reinit_completion(>completion);
-   ret = tr->err;
-   }
-   return ret;
-}
-
 static int ahash_partial_update(struct ahash_request **preq,
struct crypto_ahash *tfm, const struct hash_testvec *template,
void *hash_buff, int k, int temp, struct scatterlist *sg,
-   const char *algo, char *result, struct tcrypt_result *tresult)
+   const char *algo, char *result, struct crypto_wait *wait)
 {
char *state;
struct ahash_request *req;
@@ -236,7 +210,7 @@ static int ahash_partial_update(struct ahash_request **preq,
}
ahash_request_set_callback(req,
CRYPTO_TFM_REQ_MAY_BACKLOG,
-   tcrypt_complete, tresult);
+   crypto_req_done, wait);
 
memcpy(hash_buff, template->plaintext + temp,
template->tap[k]);
@@ -247,7 +221,7 @@ static int ahash_partial_update(struct ahash_request **preq,
pr_err("alg: hash: Failed to import() for %s\n", algo);
goto out;
}
-   ret = wait_async_op(tresult, crypto_ahash_update(req));
+   ret = crypto_wait_req(crypto_ahash_update(req), wait);
if (ret)
goto out;
*preq = req;
@@ -272,7 +246,7 @@ static int __test_hash(struct crypto_ahash *tfm,
char *result;
char *key;
struct ahash_request *req;
-   struct tcrypt_result tresult;
+   struct crypto_wait wait;
void *hash_buff;
char *xbuf[XBUFSIZE];
int ret = -ENOMEM;
@@ -286,7 +260,7 @@ static int __test_hash(struct crypto_ahash *tfm,
if (testmgr_alloc_buf(xbuf))
goto out_nobuf;
 
-   init_completion();
+   crypto_init_wait();
 
req = ahash_request_alloc(tfm, GFP_KERNEL);
if (!req) {
@@ -295,7 +269,7 @@ static int __test_hash(struct crypto_ahash *tfm,
goto out_noreq;
}
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-  tcrypt_complete, );
+  crypto_req_done, );
 
j = 0;
for (i = 0; i < tcount; i++) {
@@ -335,26 +309,26 @@ static int __test_hash(struct crypto_ahash *tfm,
 
ahash_request_set_crypt(req, sg, result, template[i].psize);
if (use_digest) {
-   ret = wait_async_op(, crypto_ahash_digest(req));
+   ret = crypto_wait_req(crypto_ahash_digest(req), );
if (ret) {
pr_err("alg: hash: digest failed on test %d "
   "for %s: ret=%d\n", j, algo, -ret);
goto out;
}
} else {
-   ret = wait_async_op(, crypto_ahash_init(req));
+   ret = crypto_wait_req(crypto_ahash_init(req), );
if (ret) {
pr_err("alg: hash: init failed on test %d "
   "for %s: ret=%d\n", j, algo, -ret);
goto out;
}
-   ret = wait_async_op(, crypto_ahash_update(req));
+   ret = crypto_wait_req(crypto_ahash_update(req), );
if (ret) {
pr_err("alg: hash: update failed on test %d "
  

[PATCH v9 13/20] dm: move dm-verity to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
dm-verity is starting async. crypto ops and waiting for them to complete.
Move it over to generic code doing the same.

This also avoids a future potential data coruption bug created
by the use of wait_for_completion_interruptible() without dealing
correctly with an interrupt aborting the wait prior to the
async op finishing, should this code ever move to a context
where signals are not masked.

Signed-off-by: Gilad Ben-Yossef 
CC: Mikulas Patocka 
---
 drivers/md/dm-verity-target.c | 81 +++
 drivers/md/dm-verity.h|  5 ---
 2 files changed, 20 insertions(+), 66 deletions(-)

diff --git a/drivers/md/dm-verity-target.c b/drivers/md/dm-verity-target.c
index bda3cac..811ad28 100644
--- a/drivers/md/dm-verity-target.c
+++ b/drivers/md/dm-verity-target.c
@@ -92,74 +92,33 @@ static sector_t verity_position_at_level(struct dm_verity 
*v, sector_t block,
return block >> (level * v->hash_per_block_bits);
 }
 
-/*
- * Callback function for asynchrnous crypto API completion notification
- */
-static void verity_op_done(struct crypto_async_request *base, int err)
-{
-   struct verity_result *res = (struct verity_result *)base->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   res->err = err;
-   complete(>completion);
-}
-
-/*
- * Wait for async crypto API callback
- */
-static inline int verity_complete_op(struct verity_result *res, int ret)
-{
-   switch (ret) {
-   case 0:
-   break;
-
-   case -EINPROGRESS:
-   case -EBUSY:
-   ret = wait_for_completion_interruptible(>completion);
-   if (!ret)
-   ret = res->err;
-   reinit_completion(>completion);
-   break;
-
-   default:
-   DMERR("verity_wait_hash: crypto op submission failed: %d", ret);
-   }
-
-   if (unlikely(ret < 0))
-   DMERR("verity_wait_hash: crypto op failed: %d", ret);
-
-   return ret;
-}
-
 static int verity_hash_update(struct dm_verity *v, struct ahash_request *req,
const u8 *data, size_t len,
-   struct verity_result *res)
+   struct crypto_wait *wait)
 {
struct scatterlist sg;
 
sg_init_one(, data, len);
ahash_request_set_crypt(req, , NULL, len);
 
-   return verity_complete_op(res, crypto_ahash_update(req));
+   return crypto_wait_req(crypto_ahash_update(req), wait);
 }
 
 /*
  * Wrapper for crypto_ahash_init, which handles verity salting.
  */
 static int verity_hash_init(struct dm_verity *v, struct ahash_request *req,
-   struct verity_result *res)
+   struct crypto_wait *wait)
 {
int r;
 
ahash_request_set_tfm(req, v->tfm);
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
CRYPTO_TFM_REQ_MAY_BACKLOG,
-   verity_op_done, (void *)res);
-   init_completion(>completion);
+   crypto_req_done, (void *)wait);
+   crypto_init_wait(wait);
 
-   r = verity_complete_op(res, crypto_ahash_init(req));
+   r = crypto_wait_req(crypto_ahash_init(req), wait);
 
if (unlikely(r < 0)) {
DMERR("crypto_ahash_init failed: %d", r);
@@ -167,18 +126,18 @@ static int verity_hash_init(struct dm_verity *v, struct 
ahash_request *req,
}
 
if (likely(v->salt_size && (v->version >= 1)))
-   r = verity_hash_update(v, req, v->salt, v->salt_size, res);
+   r = verity_hash_update(v, req, v->salt, v->salt_size, wait);
 
return r;
 }
 
 static int verity_hash_final(struct dm_verity *v, struct ahash_request *req,
-u8 *digest, struct verity_result *res)
+u8 *digest, struct crypto_wait *wait)
 {
int r;
 
if (unlikely(v->salt_size && (!v->version))) {
-   r = verity_hash_update(v, req, v->salt, v->salt_size, res);
+   r = verity_hash_update(v, req, v->salt, v->salt_size, wait);
 
if (r < 0) {
DMERR("verity_hash_final failed updating salt: %d", r);
@@ -187,7 +146,7 @@ static int verity_hash_final(struct dm_verity *v, struct 
ahash_request *req,
}
 
ahash_request_set_crypt(req, NULL, digest, 0);
-   r = verity_complete_op(res, crypto_ahash_final(req));
+   r = crypto_wait_req(crypto_ahash_final(req), wait);
 out:
return r;
 }
@@ -196,17 +155,17 @@ int verity_hash(struct dm_verity *v, struct ahash_request 
*req,
const u8 *data, size_t len, u8 *digest)
 {
int r;
-   struct verity_result res;
+   struct crypto_wait wait;
 
-   r = verity_hash_init(v, req, );
+   r = verity_hash_init(v, req, );
if (unlikely(r < 

[PATCH v9 14/20] cifs: move to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
cifs starts an async. crypto op and waits for their completion.
Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
Acked-by: Pavel Shilovsky 
---
 fs/cifs/smb2ops.c | 30 --
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index bdb963d..e067404 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2087,22 +2087,6 @@ init_sg(struct smb_rqst *rqst, u8 *sign)
return sg;
 }
 
-struct cifs_crypt_result {
-   int err;
-   struct completion completion;
-};
-
-static void cifs_crypt_complete(struct crypto_async_request *req, int err)
-{
-   struct cifs_crypt_result *res = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   res->err = err;
-   complete(>completion);
-}
-
 static int
 smb2_get_enc_key(struct TCP_Server_Info *server, __u64 ses_id, int enc, u8 
*key)
 {
@@ -2143,12 +2127,10 @@ crypt_message(struct TCP_Server_Info *server, struct 
smb_rqst *rqst, int enc)
struct aead_request *req;
char *iv;
unsigned int iv_len;
-   struct cifs_crypt_result result = {0, };
+   DECLARE_CRYPTO_WAIT(wait);
struct crypto_aead *tfm;
unsigned int crypt_len = le32_to_cpu(tr_hdr->OriginalMessageSize);
 
-   init_completion();
-
rc = smb2_get_enc_key(server, tr_hdr->SessionId, enc, key);
if (rc) {
cifs_dbg(VFS, "%s: Could not get %scryption key\n", __func__,
@@ -2208,14 +2190,10 @@ crypt_message(struct TCP_Server_Info *server, struct 
smb_rqst *rqst, int enc)
aead_request_set_ad(req, assoc_data_len);
 
aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
- cifs_crypt_complete, );
+ crypto_req_done, );
 
-   rc = enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req);
-
-   if (rc == -EINPROGRESS || rc == -EBUSY) {
-   wait_for_completion();
-   rc = result.err;
-   }
+   rc = crypto_wait_req(enc ? crypto_aead_encrypt(req)
+   : crypto_aead_decrypt(req), );
 
if (!rc && enc)
memcpy(_hdr->Signature, sign, SMB2_SIGNATURE_SIZE);
-- 
2.7.4



[PATCH v9 12/20] fscrypt: move to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
fscrypt starts several async. crypto ops and waiting for them to
complete. Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
---
 fs/crypto/crypto.c  | 28 
 fs/crypto/fname.c   | 36 ++--
 fs/crypto/fscrypt_private.h | 10 --
 fs/crypto/keyinfo.c | 21 +++--
 4 files changed, 13 insertions(+), 82 deletions(-)

diff --git a/fs/crypto/crypto.c b/fs/crypto/crypto.c
index c7835df..80a3cad 100644
--- a/fs/crypto/crypto.c
+++ b/fs/crypto/crypto.c
@@ -126,21 +126,6 @@ struct fscrypt_ctx *fscrypt_get_ctx(const struct inode 
*inode, gfp_t gfp_flags)
 }
 EXPORT_SYMBOL(fscrypt_get_ctx);
 
-/**
- * page_crypt_complete() - completion callback for page crypto
- * @req: The asynchronous cipher request context
- * @res: The result of the cipher operation
- */
-static void page_crypt_complete(struct crypto_async_request *req, int res)
-{
-   struct fscrypt_completion_result *ecr = req->data;
-
-   if (res == -EINPROGRESS)
-   return;
-   ecr->res = res;
-   complete(>completion);
-}
-
 int fscrypt_do_page_crypto(const struct inode *inode, fscrypt_direction_t rw,
   u64 lblk_num, struct page *src_page,
   struct page *dest_page, unsigned int len,
@@ -151,7 +136,7 @@ int fscrypt_do_page_crypto(const struct inode *inode, 
fscrypt_direction_t rw,
u8 padding[FS_IV_SIZE - sizeof(__le64)];
} iv;
struct skcipher_request *req = NULL;
-   DECLARE_FS_COMPLETION_RESULT(ecr);
+   DECLARE_CRYPTO_WAIT(wait);
struct scatterlist dst, src;
struct fscrypt_info *ci = inode->i_crypt_info;
struct crypto_skcipher *tfm = ci->ci_ctfm;
@@ -179,7 +164,7 @@ int fscrypt_do_page_crypto(const struct inode *inode, 
fscrypt_direction_t rw,
 
skcipher_request_set_callback(
req, CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-   page_crypt_complete, );
+   crypto_req_done, );
 
sg_init_table(, 1);
sg_set_page(, dest_page, len, offs);
@@ -187,14 +172,9 @@ int fscrypt_do_page_crypto(const struct inode *inode, 
fscrypt_direction_t rw,
sg_set_page(, src_page, len, offs);
skcipher_request_set_crypt(req, , , len, );
if (rw == FS_DECRYPT)
-   res = crypto_skcipher_decrypt(req);
+   res = crypto_wait_req(crypto_skcipher_decrypt(req), );
else
-   res = crypto_skcipher_encrypt(req);
-   if (res == -EINPROGRESS || res == -EBUSY) {
-   BUG_ON(req->base.data != );
-   wait_for_completion();
-   res = ecr.res;
-   }
+   res = crypto_wait_req(crypto_skcipher_encrypt(req), );
skcipher_request_free(req);
if (res) {
printk_ratelimited(KERN_ERR
diff --git a/fs/crypto/fname.c b/fs/crypto/fname.c
index ad9f814..a80a0d3 100644
--- a/fs/crypto/fname.c
+++ b/fs/crypto/fname.c
@@ -15,21 +15,6 @@
 #include "fscrypt_private.h"
 
 /**
- * fname_crypt_complete() - completion callback for filename crypto
- * @req: The asynchronous cipher request context
- * @res: The result of the cipher operation
- */
-static void fname_crypt_complete(struct crypto_async_request *req, int res)
-{
-   struct fscrypt_completion_result *ecr = req->data;
-
-   if (res == -EINPROGRESS)
-   return;
-   ecr->res = res;
-   complete(>completion);
-}
-
-/**
  * fname_encrypt() - encrypt a filename
  *
  * The caller must have allocated sufficient memory for the @oname string.
@@ -40,7 +25,7 @@ static int fname_encrypt(struct inode *inode,
const struct qstr *iname, struct fscrypt_str *oname)
 {
struct skcipher_request *req = NULL;
-   DECLARE_FS_COMPLETION_RESULT(ecr);
+   DECLARE_CRYPTO_WAIT(wait);
struct fscrypt_info *ci = inode->i_crypt_info;
struct crypto_skcipher *tfm = ci->ci_ctfm;
int res = 0;
@@ -76,17 +61,12 @@ static int fname_encrypt(struct inode *inode,
}
skcipher_request_set_callback(req,
CRYPTO_TFM_REQ_MAY_BACKLOG | CRYPTO_TFM_REQ_MAY_SLEEP,
-   fname_crypt_complete, );
+   crypto_req_done, );
sg_init_one(, oname->name, cryptlen);
skcipher_request_set_crypt(req, , , cryptlen, iv);
 
/* Do the encryption */
-   res = crypto_skcipher_encrypt(req);
-   if (res == -EINPROGRESS || res == -EBUSY) {
-   /* Request is being completed asynchronously; wait for it */
-   wait_for_completion();
-   res = ecr.res;
-   }
+   res = crypto_wait_req(crypto_skcipher_encrypt(req), );
skcipher_request_free(req);
if (res < 0) {
printk_ratelimited(KERN_ERR
@@ -110,7 +90,7 @@ static int fname_decrypt(struct inode *inode,

[PATCH v9 20/20] crypto: adapt api sample to use async. op wait

2017-10-15 Thread Gilad Ben-Yossef
The code sample is waiting for an async. crypto op completion.
Adapt sample to use the new generic infrastructure to do the same.

This also fixes a possible data coruption bug created by the
use of wait_for_completion_interruptible() without dealing
correctly with an interrupt aborting the wait prior to the
async op finishing.

Signed-off-by: Gilad Ben-Yossef 
---
 Documentation/crypto/api-samples.rst | 52 +++-
 1 file changed, 10 insertions(+), 42 deletions(-)

diff --git a/Documentation/crypto/api-samples.rst 
b/Documentation/crypto/api-samples.rst
index 2531948..006827e 100644
--- a/Documentation/crypto/api-samples.rst
+++ b/Documentation/crypto/api-samples.rst
@@ -7,59 +7,27 @@ Code Example For Symmetric Key Cipher Operation
 ::
 
 
-struct tcrypt_result {
-struct completion completion;
-int err;
-};
-
 /* tie all data structures together */
 struct skcipher_def {
 struct scatterlist sg;
 struct crypto_skcipher *tfm;
 struct skcipher_request *req;
-struct tcrypt_result result;
+struct crypto_wait wait;
 };
 
-/* Callback function */
-static void test_skcipher_cb(struct crypto_async_request *req, int error)
-{
-struct tcrypt_result *result = req->data;
-
-if (error == -EINPROGRESS)
-return;
-result->err = error;
-complete(>completion);
-pr_info("Encryption finished successfully\n");
-}
-
 /* Perform cipher operation */
 static unsigned int test_skcipher_encdec(struct skcipher_def *sk,
  int enc)
 {
-int rc = 0;
+int rc;
 
 if (enc)
-rc = crypto_skcipher_encrypt(sk->req);
+rc = crypto_wait_req(crypto_skcipher_encrypt(sk->req), >wait);
 else
-rc = crypto_skcipher_decrypt(sk->req);
-
-switch (rc) {
-case 0:
-break;
-case -EINPROGRESS:
-case -EBUSY:
-rc = wait_for_completion_interruptible(
->result.completion);
-if (!rc && !sk->result.err) {
-reinit_completion(>result.completion);
-break;
-}
-default:
-pr_info("skcipher encrypt returned with %d result %d\n",
-rc, sk->result.err);
-break;
-}
-init_completion(>result.completion);
+rc = crypto_wait_req(crypto_skcipher_decrypt(sk->req), >wait);
+
+   if (rc)
+   pr_info("skcipher encrypt returned with result %d\n", rc);
 
 return rc;
 }
@@ -89,8 +57,8 @@ Code Example For Symmetric Key Cipher Operation
 }
 
 skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-  test_skcipher_cb,
-  );
+  crypto_req_done,
+  );
 
 /* AES 256 with random key */
 get_random_bytes(, 32);
@@ -122,7 +90,7 @@ Code Example For Symmetric Key Cipher Operation
 /* We encrypt one block */
 sg_init_one(, scratchpad, 16);
 skcipher_request_set_crypt(req, , , 16, ivdata);
-init_completion();
+crypto_init_wait();
 
 /* encrypt data */
 ret = test_skcipher_encdec(, 1);
-- 
2.7.4



[PATCH v9 19/20] crypto: mediatek: move to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
The mediatek driver starts several async crypto ops and waits for their
completions. Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
Acked-by: Ryder Lee 
---
 drivers/crypto/mediatek/mtk-aes.c | 31 +--
 1 file changed, 5 insertions(+), 26 deletions(-)

diff --git a/drivers/crypto/mediatek/mtk-aes.c 
b/drivers/crypto/mediatek/mtk-aes.c
index 32aa587..c2058cf 100644
--- a/drivers/crypto/mediatek/mtk-aes.c
+++ b/drivers/crypto/mediatek/mtk-aes.c
@@ -138,11 +138,6 @@ struct mtk_aes_gcm_ctx {
struct crypto_skcipher *ctr;
 };
 
-struct mtk_aes_gcm_setkey_result {
-   int err;
-   struct completion completion;
-};
-
 struct mtk_aes_drv {
struct list_head dev_list;
/* Device list lock */
@@ -942,17 +937,6 @@ static int mtk_aes_gcm_crypt(struct aead_request *req, u64 
mode)
>base);
 }
 
-static void mtk_gcm_setkey_done(struct crypto_async_request *req, int err)
-{
-   struct mtk_aes_gcm_setkey_result *result = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   result->err = err;
-   complete(>completion);
-}
-
 /*
  * Because of the hardware limitation, we need to pre-calculate key(H)
  * for the GHASH operation. The result of the encryption operation
@@ -968,7 +952,7 @@ static int mtk_aes_gcm_setkey(struct crypto_aead *aead, 
const u8 *key,
u32 hash[4];
u8 iv[8];
 
-   struct mtk_aes_gcm_setkey_result result;
+   struct crypto_wait wait;
 
struct scatterlist sg[1];
struct skcipher_request req;
@@ -1008,22 +992,17 @@ static int mtk_aes_gcm_setkey(struct crypto_aead *aead, 
const u8 *key,
if (!data)
return -ENOMEM;
 
-   init_completion(>result.completion);
+   crypto_init_wait(>wait);
sg_init_one(data->sg, >hash, AES_BLOCK_SIZE);
skcipher_request_set_tfm(>req, ctr);
skcipher_request_set_callback(>req, CRYPTO_TFM_REQ_MAY_SLEEP |
  CRYPTO_TFM_REQ_MAY_BACKLOG,
- mtk_gcm_setkey_done, >result);
+ crypto_req_done, >wait);
skcipher_request_set_crypt(>req, data->sg, data->sg,
   AES_BLOCK_SIZE, data->iv);
 
-   err = crypto_skcipher_encrypt(>req);
-   if (err == -EINPROGRESS || err == -EBUSY) {
-   err = wait_for_completion_interruptible(
-   >result.completion);
-   if (!err)
-   err = data->result.err;
-   }
+   err = crypto_wait_req(crypto_skcipher_encrypt(>req),
+ >wait);
if (err)
goto out;
 
-- 
2.7.4



[PATCH v9 17/20] crypto: talitos: move to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
The talitos driver starts several async crypto ops and  waits for their
completions. Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
---
 drivers/crypto/talitos.c | 38 +-
 1 file changed, 5 insertions(+), 33 deletions(-)

diff --git a/drivers/crypto/talitos.c b/drivers/crypto/talitos.c
index 5bd8191..9c80e0c 100644
--- a/drivers/crypto/talitos.c
+++ b/drivers/crypto/talitos.c
@@ -2160,22 +2160,6 @@ static int ahash_import(struct ahash_request *areq, 
const void *in)
return 0;
 }
 
-struct keyhash_result {
-   struct completion completion;
-   int err;
-};
-
-static void keyhash_complete(struct crypto_async_request *req, int err)
-{
-   struct keyhash_result *res = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   res->err = err;
-   complete(>completion);
-}
-
 static int keyhash(struct crypto_ahash *tfm, const u8 *key, unsigned int 
keylen,
   u8 *hash)
 {
@@ -2183,10 +2167,10 @@ static int keyhash(struct crypto_ahash *tfm, const u8 
*key, unsigned int keylen,
 
struct scatterlist sg[1];
struct ahash_request *req;
-   struct keyhash_result hresult;
+   struct crypto_wait wait;
int ret;
 
-   init_completion();
+   crypto_init_wait();
 
req = ahash_request_alloc(tfm, GFP_KERNEL);
if (!req)
@@ -2195,25 +2179,13 @@ static int keyhash(struct crypto_ahash *tfm, const u8 
*key, unsigned int keylen,
/* Keep tfm keylen == 0 during hash of the long key */
ctx->keylen = 0;
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-  keyhash_complete, );
+  crypto_req_done, );
 
sg_init_one([0], key, keylen);
 
ahash_request_set_crypt(req, sg, hash, keylen);
-   ret = crypto_ahash_digest(req);
-   switch (ret) {
-   case 0:
-   break;
-   case -EINPROGRESS:
-   case -EBUSY:
-   ret = wait_for_completion_interruptible(
-   );
-   if (!ret)
-   ret = hresult.err;
-   break;
-   default:
-   break;
-   }
+   ret = crypto_wait_req(crypto_ahash_digest(req), );
+
ahash_request_free(req);
 
return ret;
-- 
2.7.4



[PATCH v9 18/20] crypto: qce: move to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
The qce driver starts several async crypto ops and  waits for their
completions. Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
---
 drivers/crypto/qce/sha.c | 30 --
 1 file changed, 4 insertions(+), 26 deletions(-)

diff --git a/drivers/crypto/qce/sha.c b/drivers/crypto/qce/sha.c
index 47e114a..53227d7 100644
--- a/drivers/crypto/qce/sha.c
+++ b/drivers/crypto/qce/sha.c
@@ -349,28 +349,12 @@ static int qce_ahash_digest(struct ahash_request *req)
return qce->async_req_enqueue(tmpl->qce, >base);
 }
 
-struct qce_ahash_result {
-   struct completion completion;
-   int error;
-};
-
-static void qce_digest_complete(struct crypto_async_request *req, int error)
-{
-   struct qce_ahash_result *result = req->data;
-
-   if (error == -EINPROGRESS)
-   return;
-
-   result->error = error;
-   complete(>completion);
-}
-
 static int qce_ahash_hmac_setkey(struct crypto_ahash *tfm, const u8 *key,
 unsigned int keylen)
 {
unsigned int digestsize = crypto_ahash_digestsize(tfm);
struct qce_sha_ctx *ctx = crypto_tfm_ctx(>base);
-   struct qce_ahash_result result;
+   struct crypto_wait wait;
struct ahash_request *req;
struct scatterlist sg;
unsigned int blocksize;
@@ -405,9 +389,9 @@ static int qce_ahash_hmac_setkey(struct crypto_ahash *tfm, 
const u8 *key,
goto err_free_ahash;
}
 
-   init_completion();
+   crypto_init_wait();
ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-  qce_digest_complete, );
+  crypto_req_done, );
crypto_ahash_clear_flags(ahash_tfm, ~0);
 
buf = kzalloc(keylen + QCE_MAX_ALIGN_SIZE, GFP_KERNEL);
@@ -420,13 +404,7 @@ static int qce_ahash_hmac_setkey(struct crypto_ahash *tfm, 
const u8 *key,
sg_init_one(, buf, keylen);
ahash_request_set_crypt(req, , ctx->authkey, keylen);
 
-   ret = crypto_ahash_digest(req);
-   if (ret == -EINPROGRESS || ret == -EBUSY) {
-   ret = wait_for_completion_interruptible();
-   if (!ret)
-   ret = result.error;
-   }
-
+   ret = crypto_wait_req(crypto_ahash_digest(req), );
if (ret)
crypto_ahash_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
 
-- 
2.7.4



[PATCH v9 16/20] crypto: tcrypt: move to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
tcrypt starts several async crypto ops and  waits for their completions.
Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/tcrypt.c | 84 +
 1 file changed, 25 insertions(+), 59 deletions(-)

diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index a371c072..7fa7047 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -79,34 +79,11 @@ static char *check[] = {
NULL
 };
 
-struct tcrypt_result {
-   struct completion completion;
-   int err;
-};
-
-static void tcrypt_complete(struct crypto_async_request *req, int err)
-{
-   struct tcrypt_result *res = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   res->err = err;
-   complete(>completion);
-}
-
 static inline int do_one_aead_op(struct aead_request *req, int ret)
 {
-   if (ret == -EINPROGRESS || ret == -EBUSY) {
-   struct tcrypt_result *tr = req->base.data;
+   struct crypto_wait *wait = req->base.data;
 
-   ret = wait_for_completion_interruptible(>completion);
-   if (!ret)
-   ret = tr->err;
-   reinit_completion(>completion);
-   }
-
-   return ret;
+   return crypto_wait_req(ret, wait);
 }
 
 static int test_aead_jiffies(struct aead_request *req, int enc,
@@ -248,7 +225,7 @@ static void test_aead_speed(const char *algo, int enc, 
unsigned int secs,
char *axbuf[XBUFSIZE];
unsigned int *b_size;
unsigned int iv_len;
-   struct tcrypt_result result;
+   struct crypto_wait wait;
 
iv = kzalloc(MAX_IVLEN, GFP_KERNEL);
if (!iv)
@@ -284,7 +261,7 @@ static void test_aead_speed(const char *algo, int enc, 
unsigned int secs,
goto out_notfm;
}
 
-   init_completion();
+   crypto_init_wait();
printk(KERN_INFO "\ntesting speed of %s (%s) %s\n", algo,
get_driver_name(crypto_aead, tfm), e);
 
@@ -296,7 +273,7 @@ static void test_aead_speed(const char *algo, int enc, 
unsigned int secs,
}
 
aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
- tcrypt_complete, );
+ crypto_req_done, );
 
i = 0;
do {
@@ -396,21 +373,16 @@ static void test_hash_sg_init(struct scatterlist *sg)
 
 static inline int do_one_ahash_op(struct ahash_request *req, int ret)
 {
-   if (ret == -EINPROGRESS || ret == -EBUSY) {
-   struct tcrypt_result *tr = req->base.data;
+   struct crypto_wait *wait = req->base.data;
 
-   wait_for_completion(>completion);
-   reinit_completion(>completion);
-   ret = tr->err;
-   }
-   return ret;
+   return crypto_wait_req(ret, wait);
 }
 
 struct test_mb_ahash_data {
struct scatterlist sg[TVMEMSIZE];
char result[64];
struct ahash_request *req;
-   struct tcrypt_result tresult;
+   struct crypto_wait wait;
char *xbuf[XBUFSIZE];
 };
 
@@ -439,7 +411,7 @@ static void test_mb_ahash_speed(const char *algo, unsigned 
int sec,
if (testmgr_alloc_buf(data[i].xbuf))
goto out;
 
-   init_completion([i].tresult.completion);
+   crypto_init_wait([i].wait);
 
data[i].req = ahash_request_alloc(tfm, GFP_KERNEL);
if (!data[i].req) {
@@ -448,8 +420,8 @@ static void test_mb_ahash_speed(const char *algo, unsigned 
int sec,
goto out;
}
 
-   ahash_request_set_callback(data[i].req, 0,
-  tcrypt_complete, [i].tresult);
+   ahash_request_set_callback(data[i].req, 0, crypto_req_done,
+  [i].wait);
test_hash_sg_init(data[i].sg);
}
 
@@ -491,16 +463,16 @@ static void test_mb_ahash_speed(const char *algo, 
unsigned int sec,
if (ret)
break;
 
-   complete([k].tresult.completion);
-   data[k].tresult.err = 0;
+   crypto_req_done([k].req->base, 0);
}
 
for (j = 0; j < k; j++) {
-   struct tcrypt_result *tr = [j].tresult;
+   struct crypto_wait *wait = [j].wait;
+   int wait_ret;
 
-   wait_for_completion(>completion);
-   if (tr->err)
-   ret = tr->err;
+   wait_ret = crypto_wait_req(-EINPROGRESS, wait);
+   if (wait_ret)
+   ret = wait_ret;
}
 
end = get_cycles();
@@ -678,7 +650,7 @@ static void test_ahash_speed_common(const char *algo, 
unsigned int secs,

[PATCH v9 10/20] crypto: move gcm to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
gcm is starting an async. crypto op and waiting for it complete.
Move it over to generic code doing the same.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/gcm.c | 32 ++--
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/crypto/gcm.c b/crypto/gcm.c
index 80cf6cf..8589681 100644
--- a/crypto/gcm.c
+++ b/crypto/gcm.c
@@ -17,7 +17,6 @@
 #include 
 #include 
 #include "internal.h"
-#include 
 #include 
 #include 
 #include 
@@ -79,11 +78,6 @@ struct crypto_gcm_req_priv_ctx {
} u;
 };
 
-struct crypto_gcm_setkey_result {
-   int err;
-   struct completion completion;
-};
-
 static struct {
u8 buf[16];
struct scatterlist sg;
@@ -99,17 +93,6 @@ static inline struct crypto_gcm_req_priv_ctx 
*crypto_gcm_reqctx(
return (void *)PTR_ALIGN((u8 *)aead_request_ctx(req), align + 1);
 }
 
-static void crypto_gcm_setkey_done(struct crypto_async_request *req, int err)
-{
-   struct crypto_gcm_setkey_result *result = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   result->err = err;
-   complete(>completion);
-}
-
 static int crypto_gcm_setkey(struct crypto_aead *aead, const u8 *key,
 unsigned int keylen)
 {
@@ -120,7 +103,7 @@ static int crypto_gcm_setkey(struct crypto_aead *aead, 
const u8 *key,
be128 hash;
u8 iv[16];
 
-   struct crypto_gcm_setkey_result result;
+   struct crypto_wait wait;
 
struct scatterlist sg[1];
struct skcipher_request req;
@@ -141,21 +124,18 @@ static int crypto_gcm_setkey(struct crypto_aead *aead, 
const u8 *key,
if (!data)
return -ENOMEM;
 
-   init_completion(>result.completion);
+   crypto_init_wait(>wait);
sg_init_one(data->sg, >hash, sizeof(data->hash));
skcipher_request_set_tfm(>req, ctr);
skcipher_request_set_callback(>req, CRYPTO_TFM_REQ_MAY_SLEEP |
  CRYPTO_TFM_REQ_MAY_BACKLOG,
- crypto_gcm_setkey_done,
- >result);
+ crypto_req_done,
+ >wait);
skcipher_request_set_crypt(>req, data->sg, data->sg,
   sizeof(data->hash), data->iv);
 
-   err = crypto_skcipher_encrypt(>req);
-   if (err == -EINPROGRESS || err == -EBUSY) {
-   wait_for_completion(>result.completion);
-   err = data->result.err;
-   }
+   err = crypto_wait_req(crypto_skcipher_encrypt(>req),
+   >wait);
 
if (err)
goto out;
-- 
2.7.4



[PATCH v9 08/20] crypto: move pub key to generic async completion

2017-10-15 Thread Gilad Ben-Yossef
public_key_verify_signature() is starting an async crypto op and
waiting for it to complete. Move it over to generic code doing
the same.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/asymmetric_keys/public_key.c | 28 
 1 file changed, 4 insertions(+), 24 deletions(-)

diff --git a/crypto/asymmetric_keys/public_key.c 
b/crypto/asymmetric_keys/public_key.c
index 3cd6e12..d916235 100644
--- a/crypto/asymmetric_keys/public_key.c
+++ b/crypto/asymmetric_keys/public_key.c
@@ -57,29 +57,13 @@ static void public_key_destroy(void *payload0, void 
*payload3)
public_key_signature_free(payload3);
 }
 
-struct public_key_completion {
-   struct completion completion;
-   int err;
-};
-
-static void public_key_verify_done(struct crypto_async_request *req, int err)
-{
-   struct public_key_completion *compl = req->data;
-
-   if (err == -EINPROGRESS)
-   return;
-
-   compl->err = err;
-   complete(>completion);
-}
-
 /*
  * Verify a signature using a public key.
  */
 int public_key_verify_signature(const struct public_key *pkey,
const struct public_key_signature *sig)
 {
-   struct public_key_completion compl;
+   struct crypto_wait cwait;
struct crypto_akcipher *tfm;
struct akcipher_request *req;
struct scatterlist sig_sg, digest_sg;
@@ -131,20 +115,16 @@ int public_key_verify_signature(const struct public_key 
*pkey,
sg_init_one(_sg, output, outlen);
akcipher_request_set_crypt(req, _sg, _sg, sig->s_size,
   outlen);
-   init_completion();
+   crypto_init_wait();
akcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
  CRYPTO_TFM_REQ_MAY_SLEEP,
- public_key_verify_done, );
+ crypto_req_done, );
 
/* Perform the verification calculation.  This doesn't actually do the
 * verification, but rather calculates the hash expected by the
 * signature and returns that to us.
 */
-   ret = crypto_akcipher_verify(req);
-   if ((ret == -EINPROGRESS) || (ret == -EBUSY)) {
-   wait_for_completion();
-   ret = compl.err;
-   }
+   ret = crypto_wait_req(crypto_akcipher_verify(req), );
if (ret < 0)
goto out_free_output;
 
-- 
2.7.4



[PATCH v9 04/20] crypto: remove redundant backlog checks on EBUSY

2017-10-15 Thread Gilad Ben-Yossef
Now that -EBUSY return code only indicates backlog queueing
we can safely remove the now redundant check for the
CRYPTO_TFM_REQ_MAY_BACKLOG flag when -EBUSY is returned.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/ahash.c| 12 +++-
 crypto/cts.c  |  6 ++
 crypto/lrw.c  |  8 ++--
 crypto/rsa-pkcs1pad.c | 16 
 crypto/xts.c  |  8 ++--
 5 files changed, 13 insertions(+), 37 deletions(-)

diff --git a/crypto/ahash.c b/crypto/ahash.c
index 5e8666e..3a35d67 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -334,9 +334,7 @@ static int ahash_op_unaligned(struct ahash_request *req,
return err;
 
err = op(req);
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY && (ahash_request_flags(req) &
-  CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   if (err == -EINPROGRESS || err == -EBUSY)
return err;
 
ahash_restore_req(req, err);
@@ -394,9 +392,7 @@ static int ahash_def_finup_finish1(struct ahash_request 
*req, int err)
req->base.complete = ahash_def_finup_done2;
 
err = crypto_ahash_reqtfm(req)->final(req);
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY && (ahash_request_flags(req) &
-  CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   if (err == -EINPROGRESS || err == -EBUSY)
return err;
 
 out:
@@ -432,9 +428,7 @@ static int ahash_def_finup(struct ahash_request *req)
return err;
 
err = tfm->update(req);
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY && (ahash_request_flags(req) &
-  CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   if (err == -EINPROGRESS || err == -EBUSY)
return err;
 
return ahash_def_finup_finish1(req, err);
diff --git a/crypto/cts.c b/crypto/cts.c
index 243f591..4773c18 100644
--- a/crypto/cts.c
+++ b/crypto/cts.c
@@ -136,8 +136,7 @@ static void crypto_cts_encrypt_done(struct 
crypto_async_request *areq, int err)
goto out;
 
err = cts_cbc_encrypt(req);
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY && req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
+   if (err == -EINPROGRESS || err == -EBUSY)
return;
 
 out:
@@ -229,8 +228,7 @@ static void crypto_cts_decrypt_done(struct 
crypto_async_request *areq, int err)
goto out;
 
err = cts_cbc_decrypt(req);
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY && req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
+   if (err == -EINPROGRESS || err == -EBUSY)
return;
 
 out:
diff --git a/crypto/lrw.c b/crypto/lrw.c
index 92df312..cbbd7c5 100644
--- a/crypto/lrw.c
+++ b/crypto/lrw.c
@@ -328,9 +328,7 @@ static int do_encrypt(struct skcipher_request *req, int err)
  crypto_skcipher_encrypt(subreq) ?:
  post_crypt(req);
 
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY &&
-req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
+   if (err == -EINPROGRESS || err == -EBUSY)
return err;
}
 
@@ -380,9 +378,7 @@ static int do_decrypt(struct skcipher_request *req, int err)
  crypto_skcipher_decrypt(subreq) ?:
  post_crypt(req);
 
-   if (err == -EINPROGRESS ||
-   (err == -EBUSY &&
-req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
+   if (err == -EINPROGRESS || err == -EBUSY)
return err;
}
 
diff --git a/crypto/rsa-pkcs1pad.c b/crypto/rsa-pkcs1pad.c
index 407c64b..2908f93 100644
--- a/crypto/rsa-pkcs1pad.c
+++ b/crypto/rsa-pkcs1pad.c
@@ -279,9 +279,7 @@ static int pkcs1pad_encrypt(struct akcipher_request *req)
   req->dst, ctx->key_size - 1, req->dst_len);
 
err = crypto_akcipher_encrypt(_ctx->child_req);
-   if (err != -EINPROGRESS &&
-   (err != -EBUSY ||
-!(req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   if (err != -EINPROGRESS && err != -EBUSY)
return pkcs1pad_encrypt_sign_complete(req, err);
 
return err;
@@ -383,9 +381,7 @@ static int pkcs1pad_decrypt(struct akcipher_request *req)
   ctx->key_size);
 
err = crypto_akcipher_decrypt(_ctx->child_req);
-   if (err != -EINPROGRESS &&
-   (err != -EBUSY ||
-!(req->base.flags & CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   if (err != -EINPROGRESS && err != -EBUSY)
return pkcs1pad_decrypt_complete(req, err);
 
return err;
@@ -440,9 +436,7 @@ static int pkcs1pad_sign(struct akcipher_request *req)
   req->dst, ctx->key_size - 1, req->dst_len);
 
err = crypto_akcipher_sign(_ctx->child_req);
-   if 

[PATCH v9 05/20] crypto: marvell/cesa: remove redundant backlog checks on EBUSY

2017-10-15 Thread Gilad Ben-Yossef
Now that -EBUSY return code only indicates backlog queueing
we can safely remove the now redundant check for the
CRYPTO_TFM_REQ_MAY_BACKLOG flag when -EBUSY is returned.

Signed-off-by: Gilad Ben-Yossef 
Acked-by: Boris Brezillon 
---
 drivers/crypto/marvell/cesa.c | 3 +--
 drivers/crypto/marvell/cesa.h | 2 +-
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/marvell/cesa.c b/drivers/crypto/marvell/cesa.c
index b657e7c..ff73aa5 100644
--- a/drivers/crypto/marvell/cesa.c
+++ b/drivers/crypto/marvell/cesa.c
@@ -181,8 +181,7 @@ int mv_cesa_queue_req(struct crypto_async_request *req,
spin_lock_bh(>lock);
ret = crypto_enqueue_request(>queue, req);
if ((mv_cesa_req_get_type(creq) == CESA_DMA_REQ) &&
-   (ret == -EINPROGRESS ||
-   (ret == -EBUSY && req->flags & CRYPTO_TFM_REQ_MAY_BACKLOG)))
+   (ret == -EINPROGRESS || ret == -EBUSY))
mv_cesa_tdma_chain(engine, creq);
spin_unlock_bh(>lock);
 
diff --git a/drivers/crypto/marvell/cesa.h b/drivers/crypto/marvell/cesa.h
index b7872f6..63c8457 100644
--- a/drivers/crypto/marvell/cesa.h
+++ b/drivers/crypto/marvell/cesa.h
@@ -763,7 +763,7 @@ static inline int mv_cesa_req_needs_cleanup(struct 
crypto_async_request *req,
 * the backlog and will be processed later. There's no need to
 * clean it up.
 */
-   if (ret == -EBUSY && req->flags & CRYPTO_TFM_REQ_MAY_BACKLOG)
+   if (ret == -EBUSY)
return false;
 
/* Request wasn't queued, we need to clean it up */
-- 
2.7.4



[PATCH v9 01/20] crypto: change transient busy return code to -EAGAIN

2017-10-15 Thread Gilad Ben-Yossef
The crypto API was using the -EBUSY return value to indicate
both a hard failure to submit a crypto operation into a
transformation provider when the latter was busy and the backlog
mechanism was not enabled as well as a notification that the
operation was queued into the backlog when the backlog mechanism
was enabled.

Having the same return code indicate two very different conditions
depending on a flag is both error prone and requires extra runtime
check like the following to discern between the cases:

if (err == -EINPROGRESS ||
(err == -EBUSY && (ahash_request_flags(req) &
   CRYPTO_TFM_REQ_MAY_BACKLOG)))

This patch changes the return code used to indicate a crypto op
failed due to the transformation provider being transiently busy
to -EAGAIN.

Signed-off-by: Gilad Ben-Yossef 
---
 crypto/algapi.c | 6 --
 crypto/cryptd.c | 4 +---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/crypto/algapi.c b/crypto/algapi.c
index aa699ff..916bee3 100644
--- a/crypto/algapi.c
+++ b/crypto/algapi.c
@@ -897,9 +897,11 @@ int crypto_enqueue_request(struct crypto_queue *queue,
int err = -EINPROGRESS;
 
if (unlikely(queue->qlen >= queue->max_qlen)) {
-   err = -EBUSY;
-   if (!(request->flags & CRYPTO_TFM_REQ_MAY_BACKLOG))
+   if (!(request->flags & CRYPTO_TFM_REQ_MAY_BACKLOG)) {
+   err = -EAGAIN;
goto out;
+   }
+   err = -EBUSY;
if (queue->backlog == >list)
queue->backlog = >list;
}
diff --git a/crypto/cryptd.c b/crypto/cryptd.c
index 0508c48..d1dbdce 100644
--- a/crypto/cryptd.c
+++ b/crypto/cryptd.c
@@ -137,16 +137,14 @@ static int cryptd_enqueue_request(struct cryptd_queue 
*queue,
int cpu, err;
struct cryptd_cpu_queue *cpu_queue;
atomic_t *refcnt;
-   bool may_backlog;
 
cpu = get_cpu();
cpu_queue = this_cpu_ptr(queue->cpu_queue);
err = crypto_enqueue_request(_queue->queue, request);
 
refcnt = crypto_tfm_ctx(request->tfm);
-   may_backlog = request->flags & CRYPTO_TFM_REQ_MAY_BACKLOG;
 
-   if (err == -EBUSY && !may_backlog)
+   if (err == -EAGAIN)
goto out_put_cpu;
 
queue_work_on(cpu, kcrypto_wq, _queue->work);
-- 
2.7.4



RE: [PATCH net-next 0/2] Add mqprio hardware offload support in hns3 driver

2017-10-15 Thread Yuval Mintz
> > >> This patchset adds a new hardware offload type in mqprio before
> adding
> > >> mqprio hardware offload support in hns3 driver.

Apparently Dave has already acceptedAmirtha's changes to mqprio:
https://marc.info/?l=linux-netdev=150803219824053=2 
so I guess you need to revise your patchs to align to the new conventions.

> > >
> > > I think one of the biggest issues in tying this to DCB configuration is 
> > > the
> > > non-immediate [and possibly non persistent] configuration.
> > >
> > > Scenario #1:
> > > User is configuring mqprio offloaded with 3 TCs while device is in willing
> > mode.
> > > Would you expect the driver to immediately respond with a success or
> > instead
> > > delay the return until the DCBx negotiation is complete and the
> operational
> > > num of TCs is actually 3?
> >
> > Well, when user requsts the mqprio offloaded by a hardware shared by
> DCB,
> > I expect
> > the user is not using the dcb tool.
> > If user is still using dcb tool, then result is undefined.
> >
> > The scenario you mention maybe can be enforced by setting willing to zero
> > when user
> > is requesting the mqprio offload, and restore the willing bit when unloaded
> > the mqprio
> > offload.
> 
> Sounds a bit harsh but would probably work.
> 
> > But I think the real issue is that dcb and mqprio shares the tc system in 
> > the
> > stack,
> > the problem may be better to be fixed in the stack rather than in the
> driver,
> > as you
> > suggested in the DCB patchset. What do you think?
> 
> What did you have in mind?
> 
> >
> > >
> > > Scenario #2:
> > > Assume user explicitly offloaded mqprio with 3 TCs, but now DCB
> > configuration
> > > has changed on the peer side and 4 TCs is the new negotiated operational
> > value.
> > > Your current driver logic would change the number of TCs underneath
> the
> > user
> > > configuration [and it would actually probably work due to mqprio being a
> > crappy
> > > qdisc]. But was that the user actual intention?
> > > [I think the likely answer in this scenario is 'yes' since the 
> > > alternative is no
> > better.
> > > But I still thought it was worth mentioning]
> >
> > You are right, the problem also have something to do with mqprio and dcb
> > sharing
> > the tc in the stack.
> >
> > Druing testing, when user explicitly offloaded mqprio with 3 TCs, all
> > queue has a default pfifo mqprio attached, after DCB changes the tc num
> to
> > 4,
> > using tc qdisc shows some queue does not have a default pfifo mqprio
> > attached.
> 
> Really? Then what did it show?
> [I assume it has some pfifo attached, and it's an mqprio dump kind of an
> issue]
> 
> >
> > Maybe we can add a callback to notify mqprio the configuration has
> changed.
> >
> 
> Which would do what?
> You already have the notifications available for monitoring using dcbnl logic 
> if
> the
> configuration change [for user]; So user can re-configure whatever it wants.
> But other than dropping all the qdisc configurations and going back to the
> default
> qdiscs, what default action would mqprio be able to do when configuration
> changes
> that actually makes sense?
> 
> > Thanks
> > Yunsheng Lin
> >
> > >
> > > Cheers,
> > > Yuval
> > >
> > >>
> > >> Yunsheng Lin (2):
> > >>   mqprio: Add a new hardware offload type in mqprio
> > >>   net: hns3: Add mqprio hardware offload support in hns3 driver
> > >>
> > >>  drivers/net/ethernet/hisilicon/hns3/hnae3.h|  1 +
> > >>  .../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 23 +++
> > >>  .../net/ethernet/hisilicon/hns3/hns3pf/hns3_enet.c | 46
> > ++-
> > >> ---
> > >>  include/uapi/linux/pkt_sched.h |  1 +
> > >>  4 files changed, 55 insertions(+), 16 deletions(-)
> > >>
> > >> --
> > >> 1.9.1
> > >
> > >
> > >



Re: [PATCH net-next 4/5] mlxsw: spectrum: router: Add support for address validator notifier

2017-10-15 Thread Ido Schimmel
On Fri, Oct 13, 2017 at 04:02:12PM -0700, David Ahern wrote:
> Add support for inetaddr_validator and inet6addr_validator. The
> notifiers provide a means for validating ipv4 and ipv6 addresses
> before the addresses are installed and on failure the error
> is propagated back to the user.
> 
> Signed-off-by: David Ahern 

Reviewed-by: Ido Schimmel 

Thanks!


Re: [PATCH net-next 2/5] net: ipv6: Make inet6addr_validator a blocking notifier

2017-10-15 Thread Ido Schimmel
On Fri, Oct 13, 2017 at 04:02:10PM -0700, David Ahern wrote:
> inet6addr_validator chain was added by commit 3ad7d2468f79f ("Ipvlan
> should return an error when an address is already in use") to allow
> address validation before changes are committed and to be able to
> fail the address change with an error back to the user. The address
> validation is not done for addresses received from router
> advertisements.
> 
> Handling RAs in softirq context is the only reason for the notifier
> chain to be atomic versus blocking. Since the only current user, ipvlan,
> of the validator chain ignores softirq context, the notifier can be made
> blocking and simply not invoked for softirq path.
> 
> The blocking option is needed by spectrum for example to validate
> resources for an adding an address to an interface.
> 
> Signed-off-by: David Ahern 

Thanks for taking care of the in_softirq() check.

Reviewed-by: Ido Schimmel 


Re: [PATCH net-next 1/5] ipv6: addrconf: cleanup locking in ipv6_add_addr

2017-10-15 Thread Ido Schimmel
On Fri, Oct 13, 2017 at 04:02:09PM -0700, David Ahern wrote:
> ipv6_add_addr is called in process context with rtnl lock held
> (e.g., manual config of an address) or during softirq processing
> (e.g., autoconf and address from a router advertisement).
> 
> Currently, ipv6_add_addr calls rcu_read_lock_bh shortly after entry
> and does not call unlock until exit, minus the call around the address
> validator notifier. Similarly, addrconf_hash_lock is taken after the
> validator notifier and held until exit. This forces the allocation of
> inet6_ifaddr to always be atomic.
> 
> Refactor ipv6_add_addr as follows:
> 1. add an input boolean to discriminate the call path (process context
>or softirq). This new flag controls whether the alloc can be done
>with GFP_KERNEL or GFP_ATOMIC.
> 
> 2. Move the rcu_read_lock_bh and unlock calls only around functions that
>do rcu updates.
> 
> 3. Remove the in6_dev_hold and put added by 3ad7d2468f79f ("Ipvlan should
>return an error when an address is already in use."). This was done
>presumably because rcu_read_unlock_bh needs to be called before calling
>the validator. Since rcu_read_lock is not needed before the validator
>runs revert the hold and put added by 3ad7d2468f79f and only do the
>hold when setting ifp->idev.
> 
> 4. move duplicate address check and insertion of new address in the global
>address hash into a helper. The helper is called after an ifa is
>allocated and filled in.
> 
> This allows the ifa for manually configured addresses to be done with
> GFP_KERNEL and reduces the overall amount of time with rcu_read_lock held
> and hash table spinlock held.
> 
> Signed-off-by: David Ahern 

[...]

> @@ -1073,21 +1085,19 @@ ipv6_add_addr(struct inet6_dev *idev, const struct 
> in6_addr *addr,
>  
>   in6_ifa_hold(ifa);
>   write_unlock(>lock);
> -out2:
> +
>   rcu_read_unlock_bh();
>  
> - if (likely(err == 0))
> - inet6addr_notifier_call_chain(NETDEV_UP, ifa);
> - else {
> + inet6addr_notifier_call_chain(NETDEV_UP, ifa);
> +out:
> + if (unlikely(err < 0)) {
> + if (rt)
> + ip6_rt_put(rt);

I believe 'rt' needs to be set to NULL after addrconf_dst_alloc()
fails.

>   kfree(ifa);
> - in6_dev_put(idev);
>   ifa = ERR_PTR(err);
>   }
>  
>   return ifa;
> -out:
> - spin_unlock(_hash_lock);
> - goto out2;
>  }


Re: [patch net-next 06/34] net: core: use dev->ingress_queue instead of tp->q

2017-10-15 Thread Jiri Pirko
Sun, Oct 15, 2017 at 01:18:54AM CEST, dan...@iogearbox.net wrote:
>On 10/13/2017 08:30 AM, Jiri Pirko wrote:
>> Thu, Oct 12, 2017 at 11:45:43PM CEST, dan...@iogearbox.net wrote:
>> > On 10/12/2017 07:17 PM, Jiri Pirko wrote:
>> > > From: Jiri Pirko 
>> > > 
>> > > In sch_handle_egress and sch_handle_ingress, don't use tp->q and use
>> > > dev->ingress_queue which stores the same pointer instead.
>> > > 
>> > > Signed-off-by: Jiri Pirko 
>> > > ---
>> > >net/core/dev.c | 21 +++--
>> > >1 file changed, 15 insertions(+), 6 deletions(-)
>> > > 
>> > > diff --git a/net/core/dev.c b/net/core/dev.c
>> > > index fcddccb..cb9e5e5 100644
>> > > --- a/net/core/dev.c
>> > > +++ b/net/core/dev.c
>> > > @@ -3273,14 +3273,18 @@ EXPORT_SYMBOL(dev_loopback_xmit);
>> > >static struct sk_buff *
>> > >sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device 
>> > > *dev)
>> > >{
>> > > +struct netdev_queue *netdev_queue =
>> > > +rcu_dereference_bh(dev->ingress_queue);
>> > >  struct tcf_proto *cl = rcu_dereference_bh(dev->egress_cl_list);
>> > >  struct tcf_result cl_res;
>> > > +struct Qdisc *q;
>> > > 
>> > > -if (!cl)
>> > > +if (!cl || !netdev_queue)
>> > >  return skb;
>> > > +q = netdev_queue->qdisc;
>> > 
>> > NAK, no additional overhead in the software fast-path of
>> > sch_handle_{ingress,egress}() like this. There are users out there
>> > that use tc in software only, so performance is critical here.
>> 
>> Okay, how else do you suggest I can avoid the need to use tp->q?
>> I was thinking about storing q directly to net_device, which would safe
>> one dereference, resulting in the same amount as current cl->q.
>
>Sorry for late reply, mostly off for few days. netdev struct has different
>cachelines which are hot on rx and tx (see also the location of the two
>lists, ingress_cl_list and egress_cl_list), if you add only one qdisc
>pointer there, then you'd at least penalize one of the two w/ potential
>cache miss. Can we leave it in cl?

Well that is the whole point of this excercise, to remove it from cl.
The thing is, for the shared blocks, cls are shared between multiple
qdisc instances, so cl->q makes no longer any sense.



Re: [pull request][for-next 00/12] Mellanox, mlx5 IPoIB Muli Pkey support 2017-10-11

2017-10-15 Thread Leon Romanovsky
On Sat, Oct 14, 2017 at 05:19:34PM -0400, Doug Ledford wrote:
> On 10/14/2017 2:48 PM, Saeed Mahameed wrote:
> > Hi Dave and Doug,
> >
> > This series includes updates for mlx5 IPoIB offloading driver from Alex
> > and Feras to add the support for Muli Pkey in the mlx5i ipoib offloading 
> > netdev,
> > to be merged into net-next and rdma-next trees.
>
> As far as the two IPoIB patches are concerned, they're fine.
>
> > Doug, I am sorry I couldn't base this on rc2 since the series needs and 
> > conflicts
> > with a fix that was submitted to rc3, so to keep things simple I based it 
> > on rc4,
> > I hope this is ok with you..
>
> No worries, it just means I have to submit it under another branch.  But
> I'm already holding one patch series in a stand alone branch, so no big
> deal.  And, actually, the IPoIB changes are so small they can simply go
> through Dave's tree if you don't have any dependent code in the IPoIB
> driver to submit after this but still in this devel cycle.

I have IPoIB NAPI implementation waiting for this shared code.

Thanks

>
> > Please pull and let me know if there's any problem.
>
> Once I hear that Dave is OK with the net changes, I'm ready to pull (if
> I need to).
>
> --
> Doug Ledford 
> GPG Key ID: B826A3330E572FDD
> Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD
>





signature.asc
Description: PGP signature