Re: [very-RFC 0/8] TSN driver for the kernel

2016-06-15 Thread Richard Cochran
On Tue, Jun 14, 2016 at 10:38:10PM +0200, Henrik Austad wrote:
> Whereas I want to do 
> 
> aplay some_song.wav

Can you please explain how your patches accomplish this?

Thanks,
Richard


Re: [PATCH net-next 00/17] RDS: multiple connection paths for scaling

2016-06-15 Thread David Miller
From: Sowmini Varadhan 
Date: Mon, 13 Jun 2016 09:44:25 -0700

 ...
> This patch series lays down the foundational data-structures to support
> mprds in the kernel. It implements the changes to split up the
> rds_connection structure into a common (to all paths) part,
> and a per-path rds_conn_path. All I/O workqs are driven from
> the rds_conn_path. 
> 
> Note that this patchset does not (yet) actually enable multipathing
> for any of the transports; all transports will continue to use a 
> single path with the refactored data-structures. A subsequent patchset
> will  add the changes to the rds-tcp module to actually use mprds
> in rds-tcp.

Series applied, thank you.


Re: [Patch net-next] net_sched: remove internal use of TC_POLICE_*

2016-06-15 Thread David Miller
From: Cong Wang 
Date: Mon, 13 Jun 2016 10:47:43 -0700

> These should be gone when we removed CONFIG_NET_CLS_POLICE.
> We can not totally remove them since they are exposed
> to userspace.
> 
> Cc: Jamal Hadi Salim 
> Signed-off-by: Cong Wang 

Applied.


Re: [very-RFC 0/8] TSN driver for the kernel

2016-06-15 Thread Richard Cochran
On Tue, Jun 14, 2016 at 10:38:10PM +0200, Henrik Austad wrote:
> Where is your media-application in this?

Um, that *is* a media application.  It plays music on the sound card.

> You only loop the audio from 
> network to the dsp, is the media-application attached to the dsp-device?

Sorry, I thought the old OSS API would be familiar and easy to
understand.  The /dev/dsp is the sound card.

Thanks,
Richard


Re: [net-next PATCH 01/15] net: Combine GENEVE and VXLAN port offload notifiers into single functions

2016-06-15 Thread David Miller
From: Alexander Duyck 
Date: Mon, 13 Jun 2016 19:50:02 -0700

> I'm not going to speculate on what Dave's opinion on this is.  I'll
> wait to hear it from him.
> 
> My concern at this point is that we have several issues.  Specifically
> we have VXLAN-GPE trying to pass itself off as VXLAN when it clearly
> is not, and I know we are going to end up with somebody eventually
> trying to push this feature into the kernel.  I know for a fact there
> is hardware out there that already supports it.  I'm trying to get
> ahead of this and define what the interface is supposed to look like
> myself so that we don't end up with somebody unfamiliar with all this
> trying to push it.  This way we can avoid having some hardware vendor
> on a timeline trying to push it through quick as in the case of i40e,
> or somebody trying to get around it by just hard coding it into their
> driver like occurred with bnxt.
> 
> While I appreciate the opinion, outright refusing to enable the
> existing offloads is counterproductive.  There are customers out there
> that already have this hardware.  There are driver writers out there
> who are going to have to enable these features one way or another.  If
> we want to be obstructionists then I am sure they can just work around
> us and write up out-of-tree drivers and use something like module
> parameters to enable offloads on a specific port.  Most of these
> implementations only seem to support one port anyway.  I just thought
> it might be better to have this figured out in the kernel so that we
> didn't end up creating a bigger mess than needed with each vendor
> going off and doing their own out-of-tree implementation.

My plan is to try and properly balance the two sides of this situation.

Realistically, and Alex is right on this, we shoot ourselves in the
foot by not supporting offloads that exist in hardware now even if
they are not generic.

So I would encourage Alex to keep working on his patch set and to
keep working on the feedback he is given.

Thanks.


Re: [PATCH net-next v3 0/7] vmxnet3: upgrade to version 3

2016-06-15 Thread David Miller
From: Shrikrishna Khare 
Date: Mon, 13 Jun 2016 18:50:00 -0700

> This patchset upgrades vmxnet3 to version 3.

As stated by others, it is completely unacceptable to post so many
patches with little or no commit log message.

You must describe, in full detail, what each and every patch does, why
it is doing so, and how it is doing it.


Re: [PATCH] net: hns: update the dependency

2016-06-15 Thread Yisen Zhuang
Hi David,

I'm really sorry for this.

Because I didn't receive the first two emails, I resent it a few times.

I will pay more attention next time.

Thanks,

Yisen

On 2016/6/15 14:24, David Miller wrote:
> From: Yisen Zhuang 
> Date: Wed, 15 Jun 2016 14:03:33 +0800
> 
>> Hi David,
>>
>> You mean that i send this patch 3 times?
>>
>> I am sorry for this.
>>
>> I don't know why you can receive 3 times. I can only receive an email for 
>> this patch.
> 
> I got three copies, each with a different Date: field.
> 
> patchwork saw all 3 copies as well:
> 
> http://patchwork.ozlabs.org/patch/634452/
> http://patchwork.ozlabs.org/patch/634559/
> http://patchwork.ozlabs.org/patch/634586/
> 



Re: [PATCH 1/3] net: Add MDIO bus driver for the Hisilicon FEMAC

2016-06-15 Thread Li Dongpo


On 2016/6/15 6:27, Rob Herring wrote:
> On Mon, Jun 13, 2016 at 02:07:54PM +0800, Dongpo Li wrote:
>> This patch adds a separate driver for the MDIO interface of the
>> Hisilicon Fast Ethernet MAC.
>>
>> Reviewed-by: Jiancheng Xue 
>> Signed-off-by: Dongpo Li 
>> ---
>>  .../bindings/net/hisilicon-femac-mdio.txt  |  22 +++
> 
> Acked-by: Rob Herring 
> 
Hi Rob,
Thanks for your review.
>>  drivers/net/phy/Kconfig|   8 +
>>  drivers/net/phy/Makefile   |   1 +
>>  drivers/net/phy/mdio-hisi-femac.c  | 165 +
>>  4 files changed, 196 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/net/hisilicon-femac-mdio.txt
>>  create mode 100644 drivers/net/phy/mdio-hisi-femac.c
> 



Re: [very-RFC 0/8] TSN driver for the kernel

2016-06-15 Thread Henrik Austad
On Wed, Jun 15, 2016 at 09:04:41AM +0200, Richard Cochran wrote:
> On Tue, Jun 14, 2016 at 10:38:10PM +0200, Henrik Austad wrote:
> > Whereas I want to do 
> > 
> > aplay some_song.wav
> 
> Can you please explain how your patches accomplish this?

In short:

modprobe tsn
modprobe avb_alsa
mkdir /sys/kernel/config/eth0/link
cd /sys/kernel/config/eth0/link

echo alsa > enabled
aplay -Ddefault:CARD=avb some_song.wav

Likewise on the receiver side, except add 'Listener' to end_station 
attribute

arecord -c2 -r48000 -f S16_LE -Ddefault:CARD=avb > some_recording.wav

I've not had time to fully fix the hw-params for alsa, so some manual 
tweaking of arecord is required.


Again, this is a very early attempt to get something useful done with TSN, 
I know there are rough edges, I know buffer handling and timestamping is 
not finished


Note: if you don't have an intel-card, load tsn in debug-mode and it will 
let you use all NICs present.

modprobe tsn in_debug=1


-- 
Henrik Austad




Re: [very-RFC 0/8] TSN driver for the kernel

2016-06-15 Thread Richard Cochran
On Wed, Jun 15, 2016 at 12:15:24PM +0900, Takashi Sakamoto wrote:
> > On Mon, Jun 13, 2016 at 01:47:13PM +0200, Richard Cochran wrote:
> >> I have seen audio PLL/multiplier chips that will take, for example, a
> >> 10 kHz input and produce your 48 kHz media clock.  With the right HW
> >> design, you can tell your PTP Hardware Clock to produce a 1 PPS,
> >> and you will have a synchronized AVB endpoint.  The software is all
> >> there already.  Somebody should tell the ALSA guys about it.
> 
> Just from my curiosity, could I ask you more explanation for it in ALSA
> side?

(Disclaimer: I really don't know too much about ALSA, except that it is
fairly big and complex ;)

Here is what I think ALSA should provide:

- The DA and AD clocks should appear as attributes of the HW device.

- There should be a method for measuring the DA/AD clock rate with
  respect to both the system time and the PTP Hardware Clock (PHC)
  time.

- There should be a method for adjusting the DA/AD clock rate if
  possible.  If not, then ALSA should fall back to sample rate
  conversion.

- There should be a method to determine the time delay from the point
  when the audio data are enqueued into ALSA until they pass through
  the D/A converter.  If this cannot be known precisely, then the
  library should provide an estimate with an error bound.

- I think some AVB use cases will need to know the time delay from A/D
  until the data are available to the local application.  (Distributed
  microphones?  I'm not too sure about that.)

- If the DA/AD clocks are connected to other clock devices in HW,
  there should be a way to find this out in SW.  For example, if SW
  can see the PTP-PHC-PLL-DA relationship from the above example, then
  it knows how to synchronize the DA clock using the network.

  [ Implementing this point involves other subsystems beyond ALSA.  It
    isn't really necessary for people designing AVB systems, since
    they know their designs, but it would be nice to have for writing
    generic applications that can deal with any kind of HW setup. ]
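
The second point in the list above, measuring the DA/AD clock rate
against a reference clock, comes down to simple arithmetic once two
(frame count, reference time) snapshots are available.  What follows is
only a rough userspace sketch of that arithmetic: struct clk_snapshot
and rate_error_ppb() are made-up names for illustration, not ALSA or
PHC interfaces, and how the snapshots would be obtained is left open.

```c
#include <assert.h>
#include <stdint.h>

/* One observation: frames converted so far and the reference time in ns.
 * (Hypothetical structure, not part of any existing API.) */
struct clk_snapshot {
	uint64_t frames;
	uint64_t ref_ns;
};

/*
 * Deviation of the measured sample rate from the nominal rate, in parts
 * per billion, as seen by the reference clock.  Positive means the audio
 * clock runs fast.  Returns 0 if the two snapshots span no time.
 */
static int64_t rate_error_ppb(struct clk_snapshot a, struct clk_snapshot b,
			      uint32_t nominal_hz)
{
	assert(nominal_hz > 0);

	double dframes = (double)(b.frames - a.frames);
	double dns = (double)(b.ref_ns - a.ref_ns);

	if (dns == 0.0)
		return 0;

	/* Frames a perfect clock would have produced in the same interval. */
	double expected = (double)nominal_hz * dns / 1e9;

	return (int64_t)((dframes - expected) / expected * 1e9);
}
```

Calling it once with system time and once with PHC time as the
reference would give the two measurements the list asks for.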

> In ALSA, sampling rate conversion should be in userspace, not in kernel
> land. In alsa-lib, sampling rate conversion is implemented in shared object.
> When userspace applications start playbacking/capturing, depending on PCM
> node to access, these applications load the shared object and convert PCM
> frames from buffer in userspace to mmapped DMA-buffer, then commit them.

The AVB use case places an additional requirement on the rate
conversion.  You will need to adjust the frequency on the fly, as the
stream is playing.  I would guess that ALSA doesn't have that option?

Thanks,
Richard


Re: [PATCH] mac80211_hwsim: Allow wmediumd to attach to radios created in its netns

2016-06-15 Thread Johannes Berg
I was about to apply this (with a typo fix for "responsile"), but
noticed these messages:

>   printk(KERN_DEBUG "mac80211_hwsim: received a REGISTER, "
>      "switching to wmediumd mode with pid %d\n", info->snd_portid);


> + if (notify->portid == hwsim_net_get_wmediumd(notify->net)) {
>   printk(KERN_INFO "mac80211_hwsim: wmediumd released netlink"
>      " socket, switching to perfect channel medium\n");
> 

I wonder if we can do something better about them? Or perhaps if we
should remove them, so other namespaces won't mess up the kernel log?

johannes


[PATCH net-next V2] tun: introduce tx skb ring

2016-06-15 Thread Jason Wang
We used to queue tx packets in sk_receive_queue. This is less
efficient since it requires spinlocks to synchronize between producer
and consumer.

This patch tries to address this by:

- introduce a new mode which will be only enabled with IFF_TX_ARRAY
  set and switch from sk_receive_queue to a fixed size of skb
  array with 256 entries in this mode.
- introduce a new proto_ops peek_len which is used for peeking the
  skb length.
- implement a tun version of peek_len for vhost_net to use and convert
  vhost_net to use peek_len if possible.
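
For illustration only, the array mode from the first bullet can be
pictured as a fixed-size ring of pointers in which an empty slot holds
NULL, so the producer only needs to look at its own index and the slot
it is about to fill. This is a simplified, single-threaded userspace
sketch of that idea and not the kernel's skb_array: the names
ptr_ring_sketch, ring_produce and ring_consume are hypothetical, and
the locking and memory ordering a real implementation needs are left
out.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 256	/* same entry count as TUN_RING_SIZE in the patch */

struct ptr_ring_sketch {
	void *slot[RING_SIZE];	/* NULL marks an empty slot */
	unsigned int head;	/* next slot the producer fills */
	unsigned int tail;	/* next slot the consumer drains */
};

/* Returns 0 on success, -1 if the ring is full (caller drops the packet). */
static int ring_produce(struct ptr_ring_sketch *r, void *item)
{
	assert(r != NULL);
	if (item == NULL || r->slot[r->head] != NULL)
		return -1;
	r->slot[r->head] = item;
	r->head = (r->head + 1) % RING_SIZE;
	return 0;
}

/* Returns the oldest queued item, or NULL if the ring is empty. */
static void *ring_consume(struct ptr_ring_sketch *r)
{
	assert(r != NULL);
	void *item = r->slot[r->tail];

	if (item != NULL) {
		r->slot[r->tail] = NULL;
		r->tail = (r->tail + 1) % RING_SIZE;
	}
	return item;
}
```

A full ring maps to the "goto drop" path in tun_net_xmit() below, and a
NULL return from the consumer maps to "nothing to read".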

Pktgen test shows about 18% improvement on guest receiving pps for small
buffers:

Before: ~122pps
After : ~144pps

The reasons why I stick to a new mode are:

- though resize is supported by skb array, in multiqueue mode, it's
  not easy to recover from a partial success of queue resizing.
- tx_queue_len is a user visible feature.

Signed-off-by: Jason Wang 
---
- The patch is based on [PATCH v8 0/5] skb_array: array based FIFO for skbs

Changes from V1:
- switch to use skb array instead of a customized circular buffer
- add non-blocking support
- rename .peek to .peek_len
- drop lockless peeking since tests show very minor improvement
---
 drivers/net/tun.c   | 138 
 drivers/vhost/net.c |  16 -
 include/linux/net.h |   1 +
 include/uapi/linux/if_tun.h |   1 +
 4 files changed, 143 insertions(+), 13 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index e16487c..b22e475 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -71,6 +71,7 @@
 #include 
 #include 
 #include 
+#include <linux/skb_array.h>
 
 #include 
 
@@ -130,6 +131,7 @@ struct tap_filter {
 #define MAX_TAP_FLOWS  4096
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
+#define TUN_RING_SIZE 256
 
 struct tun_pcpu_stats {
u64 rx_packets;
@@ -167,6 +169,7 @@ struct tun_file {
};
struct list_head next;
struct tun_struct *detached;
+   struct skb_array tx_array;
 };
 
 struct tun_flow_entry {
@@ -513,8 +516,15 @@ static struct tun_struct *tun_enable_queue(struct tun_file *tfile)
return tun;
 }
 
-static void tun_queue_purge(struct tun_file *tfile)
+static void tun_queue_purge(struct tun_struct *tun, struct tun_file *tfile)
 {
+   struct sk_buff *skb;
+
+   if (tun->flags & IFF_TX_ARRAY) {
+   while ((skb = skb_array_consume(&tfile->tx_array)) != NULL)
+   kfree_skb(skb);
+   }
+
skb_queue_purge(&tfile->sk.sk_receive_queue);
skb_queue_purge(&tfile->sk.sk_error_queue);
 }
@@ -545,7 +555,7 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
synchronize_net();
tun_flow_delete_by_queue(tun, tun->numqueues + 1);
/* Drop read queue */
-   tun_queue_purge(tfile);
+   tun_queue_purge(tun, tfile);
tun_set_real_num_queues(tun);
} else if (tfile->detached && clean) {
tun = tun_enable_queue(tfile);
@@ -560,6 +570,8 @@ static void __tun_detach(struct tun_file *tfile, bool clean)
tun->dev->reg_state == NETREG_REGISTERED)
unregister_netdevice(tun->dev);
}
+   if (tun && tun->flags & IFF_TX_ARRAY)
+   skb_array_cleanup(&tfile->tx_array);
sock_put(&tfile->sk);
}
 }
@@ -596,12 +608,12 @@ static void tun_detach_all(struct net_device *dev)
for (i = 0; i < n; i++) {
tfile = rtnl_dereference(tun->tfiles[i]);
/* Drop read queue */
-   tun_queue_purge(tfile);
+   tun_queue_purge(tun, tfile);
sock_put(&tfile->sk);
}
list_for_each_entry_safe(tfile, tmp, &tun->disabled, next) {
tun_enable_queue(tfile);
-   tun_queue_purge(tfile);
+   tun_queue_purge(tun, tfile);
sock_put(&tfile->sk);
}
BUG_ON(tun->numdisabled != 0);
@@ -642,6 +654,13 @@ static int tun_attach(struct tun_struct *tun, struct file *file, bool skip_filte
if (!err)
goto out;
}
+
+   if (!tfile->detached && tun->flags & IFF_TX_ARRAY &&
+   skb_array_init(&tfile->tx_array, TUN_RING_SIZE, GFP_KERNEL)) {
+   err = -ENOMEM;
+   goto out;
+   }
+
tfile->queue_index = tun->numqueues;
tfile->socket.sk->sk_shutdown &= ~RCV_SHUTDOWN;
rcu_assign_pointer(tfile->tun, tun);
@@ -891,8 +910,13 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
nf_reset(skb);
 
-   /* Enqueue packet */
-   skb_queue_tail(&tfile->socket.sk->sk_receive_queue, skb);
+   if (tun->flags & IFF_TX_ARRAY) {
+   if (skb_array_produce(&tfile->tx_array, skb))
+   goto drop;
+   } else {
+   /* Enqueue packet */
+  

[PATCH] net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUG

2016-06-15 Thread Jason A. Donenfeld
The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG
case was added with 2c94b5373 ("net: Implement net_dbg_ratelimited() for
CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the
usual definition of the dynamic_pr_debug macro, but alter it by adding a
call to "net_ratelimit()" in the if statement. This is, in fact, the
correct approach.

However, while doing this, the author of the commit forgot to surround
fmt by pr_fmt, resulting in unprefixed log messages appearing in the
console. So, this commit adds back the pr_fmt(fmt) invocation, making
net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and
DYNAMIC_DEBUG cases, and bringing parity with the behavior of
dynamic_pr_debug as well.
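
For readers less familiar with the convention: pr_fmt is a macro that a
source file may define to prepend a prefix to every format string, and
it works through compile-time concatenation of string literals. The
userspace sketch below only illustrates that effect. log_raw and
log_prefixed are hypothetical stand-ins for the macro before and after
this patch, and the "net: " prefix is an assumed example.

```c
#include <assert.h>
#include <stdio.h>

/* A file opts in to a prefix by defining pr_fmt before its log calls. */
#define pr_fmt(fmt) "net: " fmt

/* Before the fix: the format string is passed through unchanged. */
#define log_raw(buf, len, fmt, ...) \
	snprintf((buf), (len), fmt, ##__VA_ARGS__)

/* After the fix: the prefix is pasted onto the literal at compile time. */
#define log_prefixed(buf, len, fmt, ...) \
	snprintf((buf), (len), pr_fmt(fmt), ##__VA_ARGS__)

/* "net: " "x" is a single literal of 6 characters plus the NUL byte. */
static_assert(sizeof(pr_fmt("x")) == 7, "prefix is added at compile time");
```

With these definitions, log_raw(buf, sizeof(buf), "drop %d\n", 3)
writes "drop 3", while log_prefixed() with the same arguments writes
"net: drop 3".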

Fixes: 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case")
Signed-off-by: Jason A. Donenfeld 
Cc: Tim Bingham 
---
 include/linux/net.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/net.h b/include/linux/net.h
index 9aa49a0..25aa03b 100644
--- a/include/linux/net.h
+++ b/include/linux/net.h
@@ -251,7 +251,8 @@ do {
\
DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, fmt); \
if (unlikely(descriptor.flags & _DPRINTK_FLAGS_PRINT) &&\
net_ratelimit())\
-   __dynamic_pr_debug(&descriptor, fmt, ##__VA_ARGS__);\
+   __dynamic_pr_debug(&descriptor, pr_fmt(fmt),\
+  ##__VA_ARGS__);  \
 } while (0)
 #elif defined(DEBUG)
 #define net_dbg_ratelimited(fmt, ...)  \
-- 
2.8.4



Re: [PATCH 3/3] net: hisilicon: Add Fast Ethernet MAC driver

2016-06-15 Thread Dongpo Li


On 2016/6/15 5:20, Arnd Bergmann wrote:
> On Tuesday, June 14, 2016 9:17:44 PM CEST Li Dongpo wrote:
>> On 2016/6/13 17:06, Arnd Bergmann wrote:
>>> On Monday, June 13, 2016 2:07:56 PM CEST Dongpo Li wrote:
>>> Your tx function uses BQL to optimize the queue length, and that
>>> is great. You also check xmit reclaim for rx interrupts, so
>>> as long as you have both rx and tx traffic, this should work
>>> great.
>>>
>>> However, I notice that you only have a 'tx fifo empty'
>>> interrupt triggering the napi poll, so I guess on a tx-only
>>> workload you will always end up pushing packets into the
>>> queue until BQL throttles tx, and then get the interrupt
>>> after all packets have been sent, which will cause BQL to
>>> make the queue longer up to the maximum queue size, and that
>>> negates the effect of BQL.
>>>
>>> Is there any way you can get a tx interrupt earlier than
>>> this in order to get a more balanced queue, or is it ok
>>> to just rely on rx packets to come in occasionally, and
>>> just use the tx fifo empty interrupt as a fallback?
>>>
>> In tx direction, there are only two kinds of interrupts, 'tx fifo empty'
>> and 'tx one packet finish'. I didn't use 'tx one packet finish' because
>> it would lead to high hardware interrupts rate. This has been verified in
>> our chips. It's ok to just use tx fifo empty interrupt.
> 
> I'm not convinced by the explanation, I don't think that has anything
> to do with the hardware design, but instead is about the correctness
> of the BQL logic with your driver.
> 
> Maybe your xmit function can do something like
> 
>   if (dql_avail(netdev_get_tx_queue(dev, 0)->dql) < 0)
>   enable per-packet interrupt
>   else
>   use only fifo-empty interrupt
> 
> That way, you don't get a lot of interrupts when the system is
> in a state of packets being received and sent continuously,
> but if you get to the point where your tx queue fills up
> and no rx interrupts arrive, you don't have to wait for it
> to become completely empty before adding new packets, and
> BQL won't keep growing the queue.
> 
Hi Arnd,
Thanks for your advice. It's good advice and I will try to fix it and
test on our chip.
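
As a rough illustration of the suggestion quoted above, the decision
could be isolated in a small helper like the one below. This is a
userspace sketch under assumptions: struct bql_sketch is a simplified
stand-in for the kernel's byte queue limit state, and bql_avail() and
pick_tx_irq_mode() are hypothetical names, not functions from the
driver or from the BQL code.

```c
#include <assert.h>
#include <stddef.h>

enum tx_irq_mode { TX_IRQ_FIFO_EMPTY, TX_IRQ_PER_PACKET };

/* Simplified stand-in for the byte queue limit state of one tx queue. */
struct bql_sketch {
	int limit;	/* bytes currently allowed in flight */
	int queued;	/* bytes handed to hardware and not yet completed */
};

/* Remaining budget; negative once the queue is over its limit. */
static int bql_avail(const struct bql_sketch *q)
{
	assert(q != NULL);
	return q->limit - q->queued;
}

/*
 * Ask for per-packet completion interrupts only while the queue is over
 * its limit, so tx completions arrive before the fifo runs empty.
 */
static enum tx_irq_mode pick_tx_irq_mode(const struct bql_sketch *q)
{
	return bql_avail(q) < 0 ? TX_IRQ_PER_PACKET : TX_IRQ_FIFO_EMPTY;
}
```

The xmit path would call the helper after queuing each packet and
program the interrupt mask accordingly.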

>>>> +priv->phy_mode = of_get_phy_mode(node);
>>>> +if (priv->phy_mode < 0) {
>>>> +dev_err(dev, "not find phy-mode\n");
>>>> +ret = -EINVAL;
>>>> +goto out_disable_clk;
>>>> +}
>>>> +
>>>> +priv->phy_node = of_parse_phandle(node, "phy-handle", 0);
>>>> +if (!priv->phy_node) {
>>>> +dev_err(dev, "not find phy-handle\n");
>>>> +ret = -EINVAL;
>>>> +goto out_disable_clk;
>>>> +}
>>>> +
>>>> +priv->phy = of_phy_connect(ndev, priv->phy_node,
>>>> +   hisi_femac_adjust_link, 0, priv->phy_mode);
>>>> +if (!(priv->phy) || IS_ERR(priv->phy)) {
>>>> +dev_err(dev, "connect to PHY failed!\n");
>>>> +ret = -ENODEV;
>>>> +goto out_phy_node;
>>>> +}
>>>
>>> I wonder if we could generalize this set of three calls, I
>>> get the impression that we duplicate this across several
>>> drivers that shouldn't need to bother with the specific
>>> phy-handle and phy-mode properties.
>>>
>> Some drivers only call 'of_phy_connect' when ndo_open called,
>> some call when driver probed. But 'phy_mode' and 'phy_node' are
>> usually initialized when driver probed.
>> So I think it's not suitable to combine 'of_phy_connect' with
>> 'of_get_phy_mode' and 'of_parse_phandle'.
>> Do you have any more suggestions ?
> 
> My idea was to add another interface that drivers could optionally
> call if they use the logic that you have here, but other drivers
> could keep using the plain of_phy_connect.
> 
> Anyway, this was just an idea, it's not important.
> 
ok, I get your point. I will try to figure out the general interface.
If there is a solution, I'd like to get more review.
>   Arnd
> 

Regards,
Dongpo




Re: [PATCH 3/3] net: hisilicon: Add Fast Ethernet MAC driver

2016-06-15 Thread Dongpo Li


On 2016/6/15 6:31, Rob Herring wrote:
> On Mon, Jun 13, 2016 at 02:07:56PM +0800, Dongpo Li wrote:
>> This patch adds the Hisilicon Fast Ethernet MAC(FEMAC) driver.
>> The FEMAC supports max speed 100Mbps and has been used in many
>> Hisilicon SoC.
>>
>> Reviewed-by: Jiancheng Xue 
>> Signed-off-by: Dongpo Li 
>> ---
>>  .../devicetree/bindings/net/hisilicon-femac.txt|   40 +
>>  drivers/net/ethernet/hisilicon/Kconfig |   12 +
>>  drivers/net/ethernet/hisilicon/Makefile|1 +
>>  drivers/net/ethernet/hisilicon/hisi_femac.c| 1015 
>> 
>>  4 files changed, 1068 insertions(+)
>>  create mode 100644 Documentation/devicetree/bindings/net/hisilicon-femac.txt
>>  create mode 100644 drivers/net/ethernet/hisilicon/hisi_femac.c
>>
>> diff --git a/Documentation/devicetree/bindings/net/hisilicon-femac.txt b/Documentation/devicetree/bindings/net/hisilicon-femac.txt
>> new file mode 100644
>> index 0000000..b953a56
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/net/hisilicon-femac.txt
>> @@ -0,0 +1,40 @@
>> +Hisilicon Fast Ethernet MAC controller
>> +
>> +Required properties:
>> +- compatible: should be "hisilicon,hisi-femac" and one of the following:
> 
> This compatible seems a bit pointless. The following 2 are generic 
> enough.
> 
ok, I will remove this compatible.

>> +* "hisilicon,hisi-femac-v1"
>> +* "hisilicon,hisi-femac-v2"
> 
> SoC specific compatible strings in addition to these please.
> 
ok.

>> +- reg: specifies base physical address(s) and size of the device registers.
>> +  The first region is the MAC core register base and size.
>> +  The second region is the global MAC control register.
>> +- interrupts: should contain the MAC interrupt.
>> +- clocks: clock phandle and specifier pair.
> 
> How many clocks?
> 
Only one clock. Is the following description OK?
- clocks: phandle reference to the MAC main clock

>> +- resets: should contain the phandle to the MAC reset signal(required) and
>> +the PHY reset signal(optional).
>> +- reset-names: should contain the reset signal name "mac_reset"(required)
>> +and "phy_reset"(optional).
>> +- mac-address: see ethernet.txt [1].
>> +- phy-mode: see ethernet.txt [1].
>> +- phy-handle: see ethernet.txt [1].
>> +- hisilicon,phy-reset-delays: triplet of delays if PHY reset signal given.
>> +The 1st cell is reset pre-delay in micro seconds.
>> +The 2nd cell is reset pulse in micro seconds.
>> +The 3rd cell is reset post-delay in micro seconds.
> 
> Add standard unit suffixes.
> 
ok.

>> +
>> +[1] Documentation/devicetree/bindings/net/ethernet.txt
>> +
>> +Example:
>> +hisi_femac: ethernet@10090000 {
>> +compatible = "hisilicon,hisi-femac-v2", "hisilicon,hisi-femac";
>> +reg = <0x10090000 0x1000>,<0x10091300 0x200>;
>> +interrupts = <12>;
>> +clocks = <&crg HI3518EV200_ETH_CLK>;
>> +resets = <&crg 0xec 0>,
>> +<&crg 0xec 3>;
>> +reset-names = "mac_reset",
>> +"phy_reset";
>> +mac-address = [00 00 00 00 00 00];
>> +phy-mode = "mii";
>> +phy-handle = <&phy0>;
>> +hisilicon,phy-reset-delays = <1 2 2>;
>> +};
> 

Regards,
Dongpo




[PATCH net-next] qed: Add Light L2 [ll2] infrastructure

2016-06-15 Thread Yuval Mintz
The qed driver will allow support for several drivers other than
qede, and each of said protocol drivers will require some sort of
L2-like ability to send and receive traffic -
the common use-case would be for control traffic specific for the
protocol that needs to reach user/kernel tools, although other cases
exist as well [some of which are for internal use by firmware].

For that purpose we introduce the Light L2 qed interface - it will provide
a method for drivers other than qede to transmit and receive said packets
from the network. The 'light' in its name is due to its lack of
sophistication in classification and advanced features that exist in the
regular firmware used by qede.

A couple of points of interest:
  - Interface is SKB-based; I.e., protocol drivers would communicate
with qed based on SKBs [as opposed to 'opaque' buffers].

  - This isn't considered datapath, so we're not bothered by issues that
would normally trouble us in regular xmit/receive [prints, locks, etc.].

  - This adds a new Kconfig option - CONFIG_QED_LL2 which would be
selected by non-qede protocol drivers. But as we currently don't have
any such upstream, qed_ll2.c would not compile on any setup. So it was
added as a selectable user-option, a state that will change on the
submission of the first non-qede protocol driver.

Signed-off-by: Yuval Mintz 
---
Hi Dave,

Please consider applying this to 'net-next'.

Thanks,
Yuval
---
 drivers/net/ethernet/qlogic/Kconfig|8 +
 drivers/net/ethernet/qlogic/qed/Makefile   |1 +
 drivers/net/ethernet/qlogic/qed/qed.h  |8 +
 drivers/net/ethernet/qlogic/qed/qed_cxt.c  |2 +
 drivers/net/ethernet/qlogic/qed/qed_dev.c  |  120 +-
 drivers/net/ethernet/qlogic/qed/qed_dev_api.h  |   20 +
 drivers/net/ethernet/qlogic/qed/qed_hsi.h  |  219 +++
 drivers/net/ethernet/qlogic/qed/qed_ll2.c  | 1696 
 drivers/net/ethernet/qlogic/qed/qed_ll2.h  |  289 
 drivers/net/ethernet/qlogic/qed/qed_main.c |   23 +-
 drivers/net/ethernet/qlogic/qed/qed_reg_addr.h |   22 +
 drivers/net/ethernet/qlogic/qed/qed_sp.h   |4 +
 include/linux/qed/common_hsi.h |2 +
 include/linux/qed/qed_if.h |1 +
 include/linux/qed/qed_ll2_if.h |  140 ++
 15 files changed, 2552 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_ll2.c
 create mode 100644 drivers/net/ethernet/qlogic/qed/qed_ll2.h
 create mode 100644 include/linux/qed/qed_ll2_if.h

diff --git a/drivers/net/ethernet/qlogic/Kconfig 
b/drivers/net/ethernet/qlogic/Kconfig
index 680d8c7..ebf1107 100644
--- a/drivers/net/ethernet/qlogic/Kconfig
+++ b/drivers/net/ethernet/qlogic/Kconfig
@@ -98,6 +98,14 @@ config QED
---help---
  This enables the support for ...
 
+config QED_LL2
+   bool "Qlogic QED Light L2 interface"
+   default n
+   depends on QED
+   ---help---
+   This enables support for Light L2 interface which is required
+   by all qed protocol drivers other than qede.
+
 config QED_SRIOV
bool "QLogic QED 25/40/100Gb SR-IOV support"
depends on QED && PCI_IOV
diff --git a/drivers/net/ethernet/qlogic/qed/Makefile 
b/drivers/net/ethernet/qlogic/qed/Makefile
index d1f157e..02f4842 100644
--- a/drivers/net/ethernet/qlogic/qed/Makefile
+++ b/drivers/net/ethernet/qlogic/qed/Makefile
@@ -4,3 +4,4 @@ qed-y := qed_cxt.o qed_dev.o qed_hw.o qed_init_fw_funcs.o 
qed_init_ops.o \
 qed_int.o qed_main.o qed_mcp.o qed_sp_commands.o qed_spq.o qed_l2.o \
 qed_selftest.o qed_dcbx.o
 qed-$(CONFIG_QED_SRIOV) += qed_sriov.o qed_vf.o
+qed-$(CONFIG_QED_LL2) += qed_ll2.o
diff --git a/drivers/net/ethernet/qlogic/qed/qed.h 
b/drivers/net/ethernet/qlogic/qed/qed.h
index 9a63df1..e86f542 100644
--- a/drivers/net/ethernet/qlogic/qed/qed.h
+++ b/drivers/net/ethernet/qlogic/qed/qed.h
@@ -69,6 +69,7 @@ struct qed_sb_info;
 struct qed_sb_attn_info;
 struct qed_cxt_mngr;
 struct qed_sb_sp_info;
+struct qed_ll2_info;
 struct qed_mcp_info;
 
 struct qed_rt_data {
@@ -149,6 +150,7 @@ enum QED_RESOURCES {
QED_MAC,
QED_VLAN,
QED_ILT,
+   QED_LL2_QUEUE,
QED_MAX_RESC,
 };
 
@@ -357,6 +359,8 @@ struct qed_hwfn {
struct qed_sb_attn_info *p_sb_attn;
 
/* Protocol related */
+   boolusing_ll2;
+   struct qed_ll2_info *p_ll2_info;
struct qed_pf_paramspf_params;
 
bool b_rdma_enabled_in_prs;
@@ -542,6 +546,10 @@ struct qed_dev {
} protocol_ops;
void*ops_cookie;
 
+#ifdef CONFIG_QED_LL2
+   struct qed_cb_ll2_info  *ll2;
+   u8  ll2_mac_address[ETH_ALEN];
+#endif
const struct firmware   *firmware;
 };
 
diff --git a/drivers/net/ethernet/qlogic/qed/qed_cxt.c 
b/drivers/net/ethernet/qlogic/qed/qed_cxt.c
index 1c35f37..01cf303 1

Re: [patch net-next] net: hns: add skb_reset_mac_header() after skb being alloc

2016-06-15 Thread Yisen Zhuang
Hi David,

Thanks for your suggestions.

Please see my comments below.

Thanks,

Yisen

On 2016/6/15 13:41, David Miller wrote:
> From: Yisen Zhuang 
> Date: Mon, 13 Jun 2016 20:41:22 +0800
> 
>> From: Kejian Yan 
>>
>> HNS receives a packet without doing anything, but it should call
>> skb_reset_mac_header() to initialize the header before using
>> eth_hdr().
>>
>> Fixes: 0d6b425a3773c3445b0f51b2f333821beaacb619
>> Signed-off-by: Kejian Yan 
>> Signed-off-by: Yisen Zhuang 
> 
> Well, this patch made me look at this function.
> 
> You really shouldn't be filtering packets looped back, that is
> the stack's job.  It shouldn't be happening in the driver.

If we use ping6 to test whether the host is connected to the network,
the CPUs send out NS packets, and these packets are looped back to the
CPUs. If the driver does not drop them, they are passed to the protocol
stack, which concludes that another device has the same address and
that the address is therefore not available. We then see a log message
like "connect: Cannot assign requested address", and the host cannot
connect to the network environment.
That is why we drop looped-back packets in the HNS driver.

> 
> And once you remove that code, this patch here is no longer
> necessary.
> 
> Second of all, unless your card supports every protocol that
> exists in the past, present, and _future_ you cannot set
> skb->ip_summed to CHECKSUM_UNNECESSARY unconditionally like
> that.
> 
> You can only set that for protocols your chip actually supports.

Thanks for your suggestions. I will prepare a new patch to fix it.
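
To make the intended change concrete, the decision could look roughly
like the sketch below. It is written as plain userspace C for
illustration and is not the actual fix. The descriptor layout, the
enum, and the assumption that the hardware verifies TCP and UDP only
are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum csum_state { CSUM_NONE, CSUM_UNNECESSARY };

/* Hypothetical summary of what an rx descriptor reports for one packet. */
struct rx_desc_sketch {
	uint8_t l4_proto;	/* IP protocol number, e.g. 6 = TCP, 17 = UDP */
	bool hw_csum_checked;	/* hardware parsed and verified the L4 checksum */
	bool hw_csum_error;	/* hardware found a checksum mismatch */
};

/* Assumed capability list: this imaginary chip handles TCP and UDP only. */
static bool hw_supports_l4(uint8_t proto)
{
	return proto == 6 || proto == 17;
}

/*
 * Report "checksum unnecessary" only when the hardware actually verified
 * a protocol it understands; everything else is left to the stack.
 */
static enum csum_state rx_csum_state(const struct rx_desc_sketch *d)
{
	assert(d != NULL);
	if (d->hw_csum_checked && !d->hw_csum_error &&
	    hw_supports_l4(d->l4_proto))
		return CSUM_UNNECESSARY;
	return CSUM_NONE;
}
```

In the driver this would translate into setting skb->ip_summed only in
the first branch and leaving it untouched otherwise.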

> 



Re: [PATCH net-next V2] tun: introduce tx skb ring

2016-06-15 Thread kbuild test robot
Hi,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Jason-Wang/tun-introduce-tx-skb-ring/20160615-164041
config: x86_64-randconfig-s2-06151732 (attached as .config)
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All errors (new ones prefixed by >>):

>> drivers/net/tun.c:74:29: fatal error: linux/skb_array.h: No such file or directory
    #include <linux/skb_array.h>
                                ^
   compilation terminated.

vim +74 drivers/net/tun.c

68  #include 
69  #include 
70  #include 
71  #include 
72  #include 
73  #include 
  > 74  #include <linux/skb_array.h>
75  
76  #include 
77  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCHv3 net-next 00/12] 6lowpan: introduce 6lowpan-nd

2016-06-15 Thread YOSHIFUJI Hideaki
Hi,

Alexander Aring wrote:
> Alexander Aring (12):
>   6lowpan: add private neighbour data
>   6lowpan: add 802.15.4 short addr slaac
>   6lowpan: remove ipv6 module request
>   ndisc: add __ndisc_opt_addr_space function
>   ndisc: add __ndisc_opt_addr_data function
>   ndisc: add __ndisc_fill_addr_option function
>   addrconf: put prefix address add in an own function
>   ipv6: introduce neighbour discovery ops
>   ipv6: export several functions
>   6lowpan: introduce 6lowpan-nd
>   6lowpan: add support for getting short address
>   6lowpan: add support for 802.15.4 short addr handling
> 
>  include/linux/netdevice.h |   8 +-
>  include/net/6lowpan.h |  16 +++
>  include/net/addrconf.h|  10 ++
>  include/net/ndisc.h   | 248 +++---
>  net/6lowpan/6lowpan_i.h   |   4 +
>  net/6lowpan/Makefile  |   2 +-
>  net/6lowpan/core.c|  50 -
>  net/6lowpan/debugfs.c |  39 +++
>  net/6lowpan/iphc.c| 167 +++-
>  net/6lowpan/ndisc.c   | 234 +++
>  net/ieee802154/6lowpan/core.c |  12 ++
>  net/ieee802154/6lowpan/tx.c   | 113 +--
>  net/ipv6/addrconf.c   | 218 +
>  net/ipv6/ndisc.c  | 123 +
>  net/ipv6/route.c  |   8 +-
>  15 files changed, 1004 insertions(+), 248 deletions(-)
>  create mode 100644 net/6lowpan/ndisc.c
> 

Looks good to me.

Acked-by: YOSHIFUJI Hideaki 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION


Re: [PATCHv3 net-next 01/12] 6lowpan: add private neighbour data

2016-06-15 Thread YOSHIFUJI Hideaki


Alexander Aring wrote:
> This patch will introduce a 6lowpan neighbour private data. Like the
> interface private data we handle private data for generic 6lowpan and
> for link-layer specific 6lowpan.
> 
> The current first use case is to save the short address for a 802.15.4
> 6lowpan neighbour.
> 
> Cc: David S. Miller 
> Reviewed-by: Stefan Schmidt 
> Signed-off-by: Alexander Aring 

Acked-by: YOSHIFUJI Hideaki 

> ---
>  include/linux/netdevice.h |  3 +--
>  include/net/6lowpan.h | 10 ++
>  net/ieee802154/6lowpan/core.c | 12 
>  3 files changed, 23 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index d101e4d..36e43bd 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1483,8 +1483,7 @@ enum netdev_priv_flags {
>   *   @perm_addr: Permanent hw address
>   *   @addr_assign_type:  Hw address assignment type
>   *   @addr_len:  Hardware address length
> - *   @neigh_priv_len;Used in neigh_alloc(),
> - *   initialized only in atm/clip.c
> + *   @neigh_priv_len:Used in neigh_alloc()
>   *   @dev_id:Used to differentiate devices that share
>   *   the same link layer address
>   *   @dev_port:  Used to differentiate devices that share
> diff --git a/include/net/6lowpan.h b/include/net/6lowpan.h
> index da84cf9..2d9b9d3 100644
> --- a/include/net/6lowpan.h
> +++ b/include/net/6lowpan.h
> @@ -141,6 +141,16 @@ struct lowpan_dev {
>   u8 priv[0] __aligned(sizeof(void *));
>  };
>  
> +struct lowpan_802154_neigh {
> + __le16 short_addr;
> +};
> +
> +static inline
> +struct lowpan_802154_neigh *lowpan_802154_neigh(void *neigh_priv)
> +{
> + return neigh_priv;
> +}
> +
>  static inline
>  struct lowpan_dev *lowpan_dev(const struct net_device *dev)
>  {
> diff --git a/net/ieee802154/6lowpan/core.c b/net/ieee802154/6lowpan/core.c
> index 4e2b308..8c004a0 100644
> --- a/net/ieee802154/6lowpan/core.c
> +++ b/net/ieee802154/6lowpan/core.c
> @@ -81,11 +81,21 @@ static int lowpan_stop(struct net_device *dev)
>   return 0;
>  }
>  
> +static int lowpan_neigh_construct(struct neighbour *n)
> +{
> + struct lowpan_802154_neigh *neigh = lowpan_802154_neigh(neighbour_priv(n));
> +
> + /* default no short_addr is available for a neighbour */
> + neigh->short_addr = cpu_to_le16(IEEE802154_ADDR_SHORT_UNSPEC);
> + return 0;
> +}
> +
>  static const struct net_device_ops lowpan_netdev_ops = {
>   .ndo_init   = lowpan_dev_init,
>   .ndo_start_xmit = lowpan_xmit,
>   .ndo_open   = lowpan_open,
>   .ndo_stop   = lowpan_stop,
> + .ndo_neigh_construct= lowpan_neigh_construct,
>  };
>  
>  static void lowpan_setup(struct net_device *ldev)
> @@ -150,6 +160,8 @@ static int lowpan_newlink(struct net *src_net, struct net_device *ldev,
>   wdev->needed_headroom;
>   ldev->needed_tailroom = wdev->needed_tailroom;
>  
> + ldev->neigh_priv_len = sizeof(struct lowpan_802154_neigh);
> +
>   ret = lowpan_register_netdevice(ldev, LOWPAN_LLTYPE_IEEE802154);
>   if (ret < 0) {
>   dev_put(wdev);
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION


Re: [PATCHv3 net-next 04/12] ndisc: add __ndisc_opt_addr_space function

2016-06-15 Thread YOSHIFUJI Hideaki
Alexander Aring wrote:
> This patch adds __ndisc_opt_addr_space as low-level function for
> ndisc_opt_addr_space which doesn't depend on net_device parameter.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 

Acked-by: YOSHIFUJI Hideaki 

> ---
>  include/net/ndisc.h | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index 2d8edaa..4cee826 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -127,10 +127,15 @@ static inline int ndisc_addr_option_pad(unsigned short type)
>   }
>  }
>  
> +static inline int __ndisc_opt_addr_space(unsigned char addr_len, int pad)
> +{
> + return NDISC_OPT_SPACE(addr_len + pad);
> +}
> +
>  static inline int ndisc_opt_addr_space(struct net_device *dev)
>  {
> - return NDISC_OPT_SPACE(dev->addr_len +
> -ndisc_addr_option_pad(dev->type));
> + return __ndisc_opt_addr_space(dev->addr_len,
> +   ndisc_addr_option_pad(dev->type));
>  }
>  
>  static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
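
As a worked example of what the new helper computes: NDISC_OPT_SPACE(), quoted elsewhere in this thread, adds the two-byte option header to the address length and rounds up to a multiple of 8. The sketch below copies that macro into a user-space file; demo_opt_addr_space is a hypothetical name, and the address lengths in the usage note are illustrative values rather than anything taken from the patch.

```c
#include <assert.h>

/* Copied from include/net/ndisc.h as quoted in this thread. */
#define NDISC_OPT_SPACE(len) (((len) + 2 + 7) & ~7)

/* User-space copy of the helper: it needs only a length and a pad,
 * not a net_device.
 */
static inline int demo_opt_addr_space(unsigned char addr_len, int pad)
{
	return NDISC_OPT_SPACE(addr_len + pad);
}
```

With a 6-byte address and no pad the result is 8; an 8-byte address gives 16; a 2-byte address (such as an 802.15.4 short address) gives 8; a 20-byte address with a pad of 2 gives 24.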


Re: [PATCHv3 net-next 05/12] ndisc: add __ndisc_opt_addr_data function

2016-06-15 Thread YOSHIFUJI Hideaki


Alexander Aring wrote:
> This patch adds __ndisc_opt_addr_data as low-level function for
> ndisc_opt_addr_data which doesn't depend on net_device parameter.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 

Acked-by: YOSHIFUJI Hideaki 

> ---
>  include/net/ndisc.h | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index 4cee826..c8962ad 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -138,17 +138,23 @@ static inline int ndisc_opt_addr_space(struct net_device *dev)
> ndisc_addr_option_pad(dev->type));
>  }
>  
> -static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,
> -   struct net_device *dev)
> +static inline u8 *__ndisc_opt_addr_data(struct nd_opt_hdr *p,
> + unsigned char addr_len, int prepad)
>  {
>   u8 *lladdr = (u8 *)(p + 1);
>   int lladdrlen = p->nd_opt_len << 3;
> - int prepad = ndisc_addr_option_pad(dev->type);
> - if (lladdrlen != ndisc_opt_addr_space(dev))
> + if (lladdrlen != __ndisc_opt_addr_space(addr_len, prepad))
>   return NULL;
>   return lladdr + prepad;
>  }
>  
> +static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,
> +   struct net_device *dev)
> +{
> + return __ndisc_opt_addr_data(p, dev->addr_len,
> +  ndisc_addr_option_pad(dev->type));
> +}
> +
>  static inline u32 ndisc_hashfn(const void *pkey, const struct net_device *dev, __u32 *hash_rnd)
>  {
>   const u32 *p32 = pkey;
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION
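
The length check in __ndisc_opt_addr_data() can be tried in user space. This is a sketch under assumptions: struct demo_nd_opt_hdr is a hypothetical two-byte stand-in for struct nd_opt_hdr (type, then length in units of 8 octets), and demo_opt_addr_data is a hypothetical copy of the helper. It returns a pointer to the link-layer address only when the option length matches what the caller expects for the given address length and pad.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from include/net/ndisc.h as quoted in this thread. */
#define NDISC_OPT_SPACE(len) (((len) + 2 + 7) & ~7)

/* Two-byte option header: type, then length in units of 8 octets. */
struct demo_nd_opt_hdr {
	uint8_t nd_opt_type;
	uint8_t nd_opt_len;
};

static uint8_t *demo_opt_addr_data(struct demo_nd_opt_hdr *p,
				   unsigned char addr_len, int prepad)
{
	uint8_t *lladdr = (uint8_t *)(p + 1);	/* data follows the header */
	int lladdrlen = p->nd_opt_len << 3;	/* option length in bytes */

	/* Reject options whose size does not fit the expected address. */
	if (lladdrlen != NDISC_OPT_SPACE(addr_len + prepad))
		return NULL;
	return lladdr + prepad;
}
```

For an 8-byte option (nd_opt_len of 1) the helper accepts a 6-byte address and returns a pointer two bytes past the start of the option; asking for an 8-byte address on the same option returns NULL.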


Re: [PATCHv3 net-next 06/12] ndisc: add __ndisc_fill_addr_option function

2016-06-15 Thread YOSHIFUJI Hideaki


Alexander Aring wrote:
> This patch adds __ndisc_fill_addr_option as low-level function for
> ndisc_fill_addr_option which doesn't depend on net_device parameter.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 


Acked-by: YOSHIFUJI Hideaki 

> ---
>  net/ipv6/ndisc.c | 14 ++
>  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index c245895..a7b9468 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -150,11 +150,10 @@ struct neigh_table nd_tbl = {
>  };
>  EXPORT_SYMBOL_GPL(nd_tbl);
>  
> -static void ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data)
> +static void __ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data,
> +  int data_len, int pad)
>  {
> - int pad   = ndisc_addr_option_pad(skb->dev->type);
> - int data_len = skb->dev->addr_len;
> - int space = ndisc_opt_addr_space(skb->dev);
> + int space = __ndisc_opt_addr_space(data_len, pad);
>   u8 *opt = skb_put(skb, space);
>  
>   opt[0] = type;
> @@ -172,6 +171,13 @@ static void ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data)
>   memset(opt, 0, space);
>  }
>  
> +static inline void ndisc_fill_addr_option(struct sk_buff *skb, int type,
> +   void *data)
> +{
> + __ndisc_fill_addr_option(skb, type, data, skb->dev->addr_len,
> +  ndisc_addr_option_pad(skb->dev->type));
> +}
> +
>  static struct nd_opt_hdr *ndisc_next_option(struct nd_opt_hdr *cur,
>   struct nd_opt_hdr *end)
>  {
> 

-- 
Hideaki Yoshifuji 
Technical Division, MIRACLE LINUX CORPORATION


Re: [PATCHv3 net-next 07/12] addrconf: put prefix address add in an own function

2016-06-15 Thread YOSHIFUJI Hideaki


Alexander Aring wrote:
> This patch moves the functionality to add a RA PIO prefix generated
> address in an own function. This move prepares to add a hook for
> adding a second address for a second link-layer address. E.g. short
> address for 802.15.4 6LoWPAN.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Reviewed-by: Stefan Schmidt 
> Signed-off-by: Alexander Aring 


Acked-by: YOSHIFUJI Hideaki 


> ---
>  net/ipv6/addrconf.c | 203 
> 
>  1 file changed, 109 insertions(+), 94 deletions(-)
> 
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index beaad49..0ca31e1 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2333,12 +2333,110 @@ static bool is_addr_mode_generate_stable(struct inet6_dev *idev)
>  idev->addr_gen_mode == IN6_ADDR_GEN_MODE_RANDOM;
>  }
>  
> +static int addrconf_prefix_rcv_add_addr(struct net *net,
> + struct net_device *dev,
> + const struct prefix_info *pinfo,
> + struct inet6_dev *in6_dev,
> + const struct in6_addr *addr,
> + int addr_type, u32 addr_flags,
> + bool sllao, bool tokenized,
> + __u32 valid_lft, u32 prefered_lft)
> +{
> + struct inet6_ifaddr *ifp = ipv6_get_ifaddr(net, addr, dev, 1);
> + int create = 0, update_lft = 0;
> +
> + if (!ifp && valid_lft) {
> + int max_addresses = in6_dev->cnf.max_addresses;
> +
> +#ifdef CONFIG_IPV6_OPTIMISTIC_DAD
> + if (in6_dev->cnf.optimistic_dad &&
> + !net->ipv6.devconf_all->forwarding && sllao)
> + addr_flags |= IFA_F_OPTIMISTIC;
> +#endif
> +
> + /* Do not allow to create too much of autoconfigured
> +  * addresses; this would be too easy way to crash kernel.
> +  */
> + if (!max_addresses ||
> + ipv6_count_addresses(in6_dev) < max_addresses)
> + ifp = ipv6_add_addr(in6_dev, addr, NULL,
> + pinfo->prefix_len,
> + addr_type&IPV6_ADDR_SCOPE_MASK,
> + addr_flags, valid_lft,
> + prefered_lft);
> +
> + if (IS_ERR_OR_NULL(ifp))
> + return -1;
> +
> + update_lft = 0;
> + create = 1;
> + spin_lock_bh(&ifp->lock);
> + ifp->flags |= IFA_F_MANAGETEMPADDR;
> + ifp->cstamp = jiffies;
> + ifp->tokenized = tokenized;
> + spin_unlock_bh(&ifp->lock);
> + addrconf_dad_start(ifp);
> + }
> +
> + if (ifp) {
> + u32 flags;
> + unsigned long now;
> + u32 stored_lft;
> +
> + /* update lifetime (RFC2462 5.5.3 e) */
> + spin_lock_bh(&ifp->lock);
> + now = jiffies;
> + if (ifp->valid_lft > (now - ifp->tstamp) / HZ)
> + stored_lft = ifp->valid_lft - (now - ifp->tstamp) / HZ;
> + else
> + stored_lft = 0;
> + if (!update_lft && !create && stored_lft) {
> + const u32 minimum_lft = min_t(u32,
> + stored_lft, MIN_VALID_LIFETIME);
> + valid_lft = max(valid_lft, minimum_lft);
> +
> + /* RFC4862 Section 5.5.3e:
> +  * "Note that the preferred lifetime of the
> +  *  corresponding address is always reset to
> +  *  the Preferred Lifetime in the received
> +  *  Prefix Information option, regardless of
> +  *  whether the valid lifetime is also reset or
> +  *  ignored."
> +  *
> +  * So we should always update prefered_lft here.
> +  */
> + update_lft = 1;
> + }
> +
> + if (update_lft) {
> + ifp->valid_lft = valid_lft;
> + ifp->prefered_lft = prefered_lft;
> + ifp->tstamp = now;
> + flags = ifp->flags;
> + ifp->flags &= ~IFA_F_DEPRECATED;
> + spin_unlock_bh(&ifp->lock);
> +
> + if (!(flags&IFA_F_TENTATIVE))
> + ipv6_ifa_notify(0, ifp);
> + } else
> + spin_unlock_bh(&ifp->lock);
> +
> + manage_tempaddrs(in6_dev, ifp, valid_lft, prefered_lft,
> +  create, now);
> +
> + in6_ifa_put(ifp);
> + addrconf_verify

Re: [PATCHv3 net-next 08/12] ipv6: introduce neighbour discovery ops

2016-06-15 Thread YOSHIFUJI Hideaki/吉藤英明


Alexander Aring wrote:
> This patch introduces neighbour discovery ops callback structure. The
> idea is to separate the handling for 6LoWPAN into the 6lowpan module.
> 
> These callback offers 6lowpan different handling, such as 802.15.4 short
> address handling or RFC6775 (Neighbor Discovery Optimization for IPv6
> over 6LoWPANs).
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 

Acked-by: YOSHIFUJI Hideaki 

> ---
>  include/linux/netdevice.h |   5 ++
>  include/net/ndisc.h   | 197 +-
>  net/ipv6/addrconf.c   |  13 ++-
>  net/ipv6/ndisc.c  | 101 
>  net/ipv6/route.c  |   8 +-
>  5 files changed, 284 insertions(+), 40 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 36e43bd..890158e 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -1456,6 +1456,8 @@ enum netdev_priv_flags {
>   *   @netdev_ops:Includes several pointers to callbacks,
>   *   if one wants to override the ndo_*() functions
>   *   @ethtool_ops:   Management operations
> + *   @ndisc_ops: Includes callbacks for different IPv6 neighbour
> + *   discovery handling. Necessary for e.g. 6LoWPAN.
>   *   @header_ops:Includes callbacks for creating,parsing,caching,etc
>   *   of Layer 2 headers.
>   *
> @@ -1672,6 +1674,9 @@ struct net_device {
>  #ifdef CONFIG_NET_L3_MASTER_DEV
>   const struct l3mdev_ops *l3mdev_ops;
>  #endif
> +#if IS_ENABLED(CONFIG_IPV6)
> + const struct ndisc_ops *ndisc_ops;
> +#endif
>  
>   const struct header_ops *header_ops;
>  
> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index c8962ad..a5e2767 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -58,6 +58,7 @@ struct inet6_dev;
>  struct net_device;
>  struct net_proto_family;
>  struct sk_buff;
> +struct prefix_info;
>  
>  extern struct neigh_table nd_tbl;
>  
> @@ -110,9 +111,182 @@ struct ndisc_options {
>  
>  #define NDISC_OPT_SPACE(len) (((len)+2+7)&~7)
>  
> -struct ndisc_options *ndisc_parse_options(u8 *opt, int opt_len,
> +struct ndisc_options *ndisc_parse_options(const struct net_device *dev,
> +   u8 *opt, int opt_len,
> struct ndisc_options *ndopts);
>  
> +#define NDISC_OPS_REDIRECT_DATA_SPACE 2
> +
> +/*
> + * This structure defines the hooks for IPv6 neighbour discovery.
> + * The following hooks can be defined; unless noted otherwise, they are
> + * optional and can be filled with a null pointer.
> + *
> + * int (*is_useropt)(u8 nd_opt_type):
> + * This function is called when IPv6 decide RA userspace options. if
> + * this function returns 1 then the option given by nd_opt_type will
> + * be handled as userspace option additional to the IPv6 options.
> + *
> + * int (*parse_options)(const struct net_device *dev,
> + *   struct nd_opt_hdr *nd_opt,
> + *   struct ndisc_options *ndopts):
> + * This function is called while parsing ndisc ops and put each position
> + * as pointer into ndopts. If this function return unequal 0, then this
> + * function took care about the ndisc option, if 0 then the IPv6 ndisc
> + * option parser will take care about that option.
> + *
> + * void (*update)(const struct net_device *dev, struct neighbour *n,
> + * u32 flags, u8 icmp6_type,
> + * const struct ndisc_options *ndopts):
> + * This function is called when IPv6 ndisc updates the neighbour cache
> + * entry. Additional options which can be updated may be previously
> + * parsed by parse_opts callback and accessible over ndopts parameter.
> + *
> + * int (*opt_addr_space)(const struct net_device *dev, u8 icmp6_type,
> + *struct neighbour *neigh, u8 *ha_buf,
> + *u8 **ha):
> + * This function is called when the necessary option space will be
> + * calculated before allocating a skb. The parameters neigh, ha_buf
> + * and ha are available on NDISC_REDIRECT messages only.
> + *
> + * void (*fill_addr_option)(const struct net_device *dev,
> + *   struct sk_buff *skb, u8 icmp6_type,
> + *   const u8 *ha):
> + * This function is called when the skb will finally fill the option
> + * fields inside skb. NOTE: this callback should fill the option
> + * fields to the skb which are previously indicated by opt_space
> + * parameter. That means the decision to add such option should
> + * not lost between these two callbacks, e.g. protected by interface
> + * up state.
> + *
> + * void (*prefix_rcv_add_addr)(struct net *net, struct net_device *dev,
> + *  const struct prefix_info *pinfo,
> + *  

Re: [PATCHv3 net-next 09/12] ipv6: export several functions

2016-06-15 Thread YOSHIFUJI Hideaki


Alexander Aring wrote:
> This patch exports some neighbour discovery functions which can be used
> by 6lowpan neighbour discovery ops functionality then.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Signed-off-by: Alexander Aring 

Acked-by: YOSHIFUJI Hideaki 

> ---
>  include/net/addrconf.h |  7 +++
>  include/net/ndisc.h| 12 
>  net/ipv6/addrconf.c| 15 +++
>  net/ipv6/ndisc.c   | 14 +++---
>  4 files changed, 29 insertions(+), 19 deletions(-)
> 
> diff --git a/include/net/addrconf.h b/include/net/addrconf.h
> index b1774eb..9826d3a 100644
> --- a/include/net/addrconf.h
> +++ b/include/net/addrconf.h
> @@ -97,6 +97,13 @@ void addrconf_leave_solict(struct inet6_dev *idev, const struct in6_addr *addr);
>  void addrconf_add_linklocal(struct inet6_dev *idev,
>   const struct in6_addr *addr, u32 flags);
>  
> +int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
> +  const struct prefix_info *pinfo,
> +  struct inet6_dev *in6_dev,
> +  const struct in6_addr *addr, int addr_type,
> +  u32 addr_flags, bool sllao, bool tokenized,
> +  __u32 valid_lft, u32 prefered_lft);
> +
>  static inline int addrconf_ifid_eui48(u8 *eui, struct net_device *dev)
>  {
>   if (dev->addr_len != ETH_ALEN)
> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index a5e2767..3f0f41d 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -53,6 +53,15 @@ enum {
>  
>  #include 
>  
> +/* Set to 3 to get tracing... */
> +#define ND_DEBUG 1
> +
> +#define ND_PRINTK(val, level, fmt, ...)  \
> +do { \
> + if (val <= ND_DEBUG)\
> + net_##level##_ratelimited(fmt, ##__VA_ARGS__);  \
> +} while (0)
> +
>  struct ctl_table;
>  struct inet6_dev;
>  struct net_device;
> @@ -115,6 +124,9 @@ struct ndisc_options *ndisc_parse_options(const struct net_device *dev,
> u8 *opt, int opt_len,
> struct ndisc_options *ndopts);
>  
> +void __ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data,
> +   int data_len, int pad);
> +
>  #define NDISC_OPS_REDIRECT_DATA_SPACE 2
>  
>  /*
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 2d678c0..9c7d660 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -2333,14 +2333,12 @@ static bool is_addr_mode_generate_stable(struct inet6_dev *idev)
>  idev->addr_gen_mode == IN6_ADDR_GEN_MODE_RANDOM;
>  }
>  
> -static int addrconf_prefix_rcv_add_addr(struct net *net,
> - struct net_device *dev,
> - const struct prefix_info *pinfo,
> - struct inet6_dev *in6_dev,
> - const struct in6_addr *addr,
> - int addr_type, u32 addr_flags,
> - bool sllao, bool tokenized,
> - __u32 valid_lft, u32 prefered_lft)
> +int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
> +  const struct prefix_info *pinfo,
> +  struct inet6_dev *in6_dev,
> +  const struct in6_addr *addr, int addr_type,
> +  u32 addr_flags, bool sllao, bool tokenized,
> +  __u32 valid_lft, u32 prefered_lft)
>  {
>   struct inet6_ifaddr *ifp = ipv6_get_ifaddr(net, addr, dev, 1);
>   int create = 0, update_lft = 0;
> @@ -2430,6 +2428,7 @@ static int addrconf_prefix_rcv_add_addr(struct net *net,
>  
>   return 0;
>  }
> +EXPORT_SYMBOL_GPL(addrconf_prefix_rcv_add_addr);
>  
>  void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
>  {
> diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
> index 2f4afd1..fe65cdc 100644
> --- a/net/ipv6/ndisc.c
> +++ b/net/ipv6/ndisc.c
> @@ -73,15 +73,6 @@
>  #include 
>  #include 
>  
> -/* Set to 3 to get tracing... */
> -#define ND_DEBUG 1
> -
> -#define ND_PRINTK(val, level, fmt, ...)  \
> -do { \
> - if (val <= ND_DEBUG)\
> - net_##level##_ratelimited(fmt, ##__VA_ARGS__);  \
> -} while (0)
> -
>  static u32 ndisc_hash(const void *pkey,
> const struct net_device *dev,
> __u32 *hash_rnd);
> @@ -150,8 +141,8 @@ struct neigh_table nd_tbl = {
>  };
>  EXPORT_SYMBOL_GPL(nd_tbl);
>  
> -static void __ndisc

Re: [PATCHv3 net-next 10/12] 6lowpan: introduce 6lowpan-nd

2016-06-15 Thread YOSHIFUJI Hideaki


Alexander Aring wrote:
> This patch introduce different 6lowpan handling for receive and transmit
> NS/NA messages for the ipv6 neighbour discovery. The first use-case is
> for supporting 802.15.4 short addresses inside the option fields and
> handling for RFC6775 6CO option field as userspace option.
> 
> Cc: David S. Miller 
> Cc: Alexey Kuznetsov 
> Cc: James Morris 
> Cc: Hideaki YOSHIFUJI 
> Cc: Patrick McHardy 
> Reviewed-by: Stefan Schmidt 
> Signed-off-by: Alexander Aring 


Acked-by: YOSHIFUJI Hideaki 


> ---
>  include/net/ndisc.h |  18 ++--
>  net/6lowpan/6lowpan_i.h |   4 +
>  net/6lowpan/Makefile|   2 +-
>  net/6lowpan/core.c  |   4 +-
>  net/6lowpan/ndisc.c | 234 
> 
>  5 files changed, 254 insertions(+), 8 deletions(-)
>  create mode 100644 net/6lowpan/ndisc.c
> 
> diff --git a/include/net/ndisc.h b/include/net/ndisc.h
> index 3f0f41d..be1fe228 100644
> --- a/include/net/ndisc.h
> +++ b/include/net/ndisc.h
> @@ -35,6 +35,7 @@ enum {
>   ND_OPT_ROUTE_INFO = 24, /* RFC4191 */
>   ND_OPT_RDNSS = 25,  /* RFC5006 */
>   ND_OPT_DNSSL = 31,  /* RFC6106 */
> + ND_OPT_6CO = 34,/* RFC6775 */
>   __ND_OPT_MAX
>  };
>  
> @@ -109,14 +110,19 @@ struct ndisc_options {
>  #endif
>   struct nd_opt_hdr *nd_useropts;
>   struct nd_opt_hdr *nd_useropts_end;
> +#if IS_ENABLED(CONFIG_IEEE802154_6LOWPAN)
> + struct nd_opt_hdr *nd_802154_opt_array[ND_OPT_TARGET_LL_ADDR + 1];
> +#endif
>  };
>  
> -#define nd_opts_src_lladdr   nd_opt_array[ND_OPT_SOURCE_LL_ADDR]
> -#define nd_opts_tgt_lladdr   nd_opt_array[ND_OPT_TARGET_LL_ADDR]
> -#define nd_opts_pi   nd_opt_array[ND_OPT_PREFIX_INFO]
> -#define nd_opts_pi_end   nd_opt_array[__ND_OPT_PREFIX_INFO_END]
> -#define nd_opts_rh   nd_opt_array[ND_OPT_REDIRECT_HDR]
> -#define nd_opts_mtu  nd_opt_array[ND_OPT_MTU]
> +#define nd_opts_src_lladdr   nd_opt_array[ND_OPT_SOURCE_LL_ADDR]
> +#define nd_opts_tgt_lladdr   nd_opt_array[ND_OPT_TARGET_LL_ADDR]
> +#define nd_opts_pi   nd_opt_array[ND_OPT_PREFIX_INFO]
> +#define nd_opts_pi_end   nd_opt_array[__ND_OPT_PREFIX_INFO_END]
> +#define nd_opts_rh   nd_opt_array[ND_OPT_REDIRECT_HDR]
> +#define nd_opts_mtu  nd_opt_array[ND_OPT_MTU]
> +#define nd_802154_opts_src_lladdr nd_802154_opt_array[ND_OPT_SOURCE_LL_ADDR]
> +#define nd_802154_opts_tgt_lladdr nd_802154_opt_array[ND_OPT_TARGET_LL_ADDR]
>  
>  #define NDISC_OPT_SPACE(len) (((len)+2+7)&~7)
>  
> diff --git a/net/6lowpan/6lowpan_i.h b/net/6lowpan/6lowpan_i.h
> index 97ecc27..a67caee 100644
> --- a/net/6lowpan/6lowpan_i.h
> +++ b/net/6lowpan/6lowpan_i.h
> @@ -12,6 +12,10 @@ static inline bool lowpan_is_ll(const struct net_device *dev,
>   return lowpan_dev(dev)->lltype == lltype;
>  }
>  
> +extern const struct ndisc_ops lowpan_ndisc_ops;
> +
> +int addrconf_ifid_802154_6lowpan(u8 *eui, struct net_device *dev);
> +
>  #ifdef CONFIG_6LOWPAN_DEBUGFS
>  int lowpan_dev_debugfs_init(struct net_device *dev);
>  void lowpan_dev_debugfs_exit(struct net_device *dev);
> diff --git a/net/6lowpan/Makefile b/net/6lowpan/Makefile
> index e44f3bf..12d131a 100644
> --- a/net/6lowpan/Makefile
> +++ b/net/6lowpan/Makefile
> @@ -1,6 +1,6 @@
>  obj-$(CONFIG_6LOWPAN) += 6lowpan.o
>  
> -6lowpan-y := core.o iphc.o nhc.o
> +6lowpan-y := core.o iphc.o nhc.o ndisc.o
>  6lowpan-$(CONFIG_6LOWPAN_DEBUGFS) += debugfs.o
>  
>  #rfc6282 nhcs
> diff --git a/net/6lowpan/core.c b/net/6lowpan/core.c
> index 1c7a42b..5945f7e 100644
> --- a/net/6lowpan/core.c
> +++ b/net/6lowpan/core.c
> @@ -34,6 +34,8 @@ int lowpan_register_netdevice(struct net_device *dev,
>   for (i = 0; i < LOWPAN_IPHC_CTX_TABLE_SIZE; i++)
>   lowpan_dev(dev)->ctx.table[i].id = i;
>  
> + dev->ndisc_ops = &lowpan_ndisc_ops;
> +
>   ret = register_netdevice(dev);
>   if (ret < 0)
>   return ret;
> @@ -73,7 +75,7 @@ void lowpan_unregister_netdev(struct net_device *dev)
>  }
>  EXPORT_SYMBOL(lowpan_unregister_netdev);
>  
> -static int addrconf_ifid_802154_6lowpan(u8 *eui, struct net_device *dev)
> +int addrconf_ifid_802154_6lowpan(u8 *eui, struct net_device *dev)
>  {
>   struct wpan_dev *wpan_dev = lowpan_802154_dev(dev)->wdev->ieee802154_ptr;
>  
> diff --git a/net/6lowpan/ndisc.c b/net/6lowpan/ndisc.c
> new file mode 100644
> index 000..ae1d419
> --- /dev/null
> +++ b/net/6lowpan/ndisc.c
> @@ -0,0 +1,234 @@
> +/* This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2
> + * as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + *

RE: [PATCH] net: Fragment large datagrams even when IP_HDRINCL is set.

2016-06-15 Thread Alan Davey


From: David Miller [mailto:da...@davemloft.net] 
Sent: 08 June 2016 18:26

>> -  The current behaviour is counter-intuitive (fragmentation takes
>> -  place in all other cases) and therefore different to what
>> -  everyone expects.
>
> But it's what all existing applications must expect, and as you have seen in 
> these replies they absolutely do.
>
> You cannot just break things on people like this.

The only case that would break is that where an application relies on the 
existing (documented as a bug) feature of getting an EMSGSIZE return code in 
the case of an over-sized packet.  Applications that perform their own 
fragmentation would be unaffected.

I think that the benefit of the patch, in moving all fragmentation and 
reassembly into the kernel, outweighs the very small chance that applications 
rely on the send of an over-sized packet failing.

What is your thinking on taking the patch?

Regards
Alan




Re: [very-RFC 0/8] TSN driver for the kernel

2016-06-15 Thread Richard Cochran
On Wed, Jun 15, 2016 at 09:04:41AM +0200, Richard Cochran wrote:
> On Tue, Jun 14, 2016 at 10:38:10PM +0200, Henrik Austad wrote:
> > Whereas I want to do 
> > 
> > aplay some_song.wav
> 
> Can you please explain how your patches accomplish this?

Never mind.  Looking back, I found it in patch #7.

Thanks,
Richard


Re: [PATCH net-next 00/10] net_sched: defer skb freeing while changing qdiscs

2016-06-15 Thread Jamal Hadi Salim

On 16-06-13 11:21 PM, Eric Dumazet wrote:

> qdiscs/classes are changed under RTNL protection and often
> while blocking BH and root qdisc spinlock.
>
> When lots of skbs need to be dropped, we free
> them under these locks causing TX/RX freezes,
> and more generally latency spikes.
>
> I saw spikes of 50+ ms on quite fast hardware...
>
> This patch series adds a simple queue protected by RTNL
> where skbs can be placed until RTNL is released.
>
> Note that this might also serve in the future for optional
> reinjection of packets when a qdisc is replaced.



Nice optimization Eric.

cheers,
jamal




Re: [very-RFC 7/8] AVB ALSA - Add ALSA shim for TSN

2016-06-15 Thread Richard Cochran
Now that I understand better...

On Sun, Jun 12, 2016 at 01:01:35AM +0200, Henrik Austad wrote:
> Userspace is supposed to reserve bandwidth, find StreamID etc.
> 
> To use as a Talker:
> 
> mkdir /config/tsn/test/eth0/talker
> cd /config/tsn/test/eth0/talker
> echo 65535 > buffer_size
> echo 08:00:27:08:9f:c3 > remote_mac
> echo 42 > stream_id
> echo alsa > enabled

This is exactly why configfs is the wrong interface.  If you implement
the AVB device in alsa-lib user space, then you can handle the
reservations, configuration, UDP sockets, etc, in a way transparent to
the aplay program.

Heck, if done properly, your layer could discover the AVB nodes in the
network and present each one as a separate device...

Thanks,
Richard




Re: padata - is serial actually serial?

2016-06-15 Thread Steffen Klassert
Hi Jason.

On Tue, Jun 14, 2016 at 11:00:54PM +0200, Jason A. Donenfeld wrote:
> Hi Steffen & Folks,
> 
> I submit a job to padata_do_parallel(). When the parallel() function
> triggers, I do some things, and then call padata_do_serial(). Finally
> the serial() function triggers, where I complete the job (check a
> nonce, etc).
> 
> The padata API is very appealing because not only does it allow for
> parallel computation, but it claims that the serial() functions will
> execute in the order that jobs were originally submitted to
> padata_do_parallel().
> 
> Unfortunately, in practice, I'm pretty sure I'm seeing deviations from
> this. When I submit tons and tons of tasks at rapid speed to
> padata_do_parallel(), it seems like the serial() function isn't being
> called in the exactly the same order that tasks were submitted to
> padata_do_parallel().
> 
> Is this known (expected) behavior? Or have I stumbled upon a potential
> bug that's worthwhile for me to investigate more?

It should return in the same order as the jobs were submitted,
provided that the submitting cpu and the callback cpu are fixed
for all the jobs whose order you want to preserve.  If you submit
jobs from more than one cpu, we cannot know in which order
they are enqueued: the cpu that gets the lock first
has its job in front. Likewise, if you use more than one callback cpu,
we can't know in which order they are dequeued, because the
serial workers are scheduled independently on each cpu.

I use it in crypto/pcrypt.c and I never had problems.


Re: [PATCH net-next V2] tun: introduce tx skb ring

2016-06-15 Thread Jamal Hadi Salim
On 16-06-15 04:38 AM, Jason Wang wrote:
> We used to queue tx packets in sk_receive_queue, this is less
> efficient since it requires spinlocks to synchronize between producer
> and consumer.
> 
> This patch tries to address this by:
> 
> - introduce a new mode which will be only enabled with IFF_TX_ARRAY
>set and switch from sk_receive_queue to a fixed size of skb
>array with 256 entries in this mode.
> - introduce a new proto_ops peek_len which was used for peeking the
>skb length.
> - implement a tun version of peek_len for vhost_net to use and convert
>vhost_net to use peek_len if possible.
> 
> Pktgen test shows about 18% improvement on guest receiving pps for small
> buffers:
> 
> Before: ~122pps
> After : ~144pps
> 

So this is more exercising the skb array improvements. For tun
it would be useful to see general performance numbers on user/kernel
crossing (i.e tun read/write).
If you have the cycles can you run such tests?

cheers,
jamal
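
For readers unfamiliar with the idea behind replacing a spinlock-protected queue with a fixed-size array, here is a minimal user-space sketch. It is not the kernel's skb_array code: all demo_* names are hypothetical, it assumes exactly one producer thread and one consumer thread, and it uses C11 atomics. A NULL slot means "empty", so the producer only writes into empty slots and the consumer only reads from filled ones, and neither side needs a lock shared with the other.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define DEMO_RING_SIZE 256	/* same entry count as mentioned in the patch */

struct demo_ring {
	_Atomic(void *) slots[DEMO_RING_SIZE];	/* NULL means "slot empty" */
	unsigned int head;	/* touched only by the producer */
	unsigned int tail;	/* touched only by the consumer */
};

/* Producer side: returns 0 on success, -1 if the ring is full. */
static int demo_ring_produce(struct demo_ring *r, void *ptr)
{
	if (ptr == NULL)
		return -1;	/* NULL is reserved as the empty marker */
	if (atomic_load_explicit(&r->slots[r->head], memory_order_acquire))
		return -1;	/* consumer has not emptied this slot yet */
	atomic_store_explicit(&r->slots[r->head], ptr, memory_order_release);
	r->head = (r->head + 1) % DEMO_RING_SIZE;
	return 0;
}

/* Consumer side: returns the oldest entry, or NULL if the ring is empty. */
static void *demo_ring_consume(struct demo_ring *r)
{
	void *ptr = atomic_load_explicit(&r->slots[r->tail],
					 memory_order_acquire);

	if (ptr == NULL)
		return NULL;
	atomic_store_explicit(&r->slots[r->tail], NULL, memory_order_release);
	r->tail = (r->tail + 1) % DEMO_RING_SIZE;
	return ptr;
}
```

A zero-initialised struct demo_ring is an empty ring. Entries come out in the order they went in, and the 257th produce on a ring nobody is consuming fails instead of blocking.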





Re: [PATCH net-next V2] tun: introduce tx skb ring

2016-06-15 Thread Jamal Hadi Salim
On 16-06-15 07:52 AM, Jamal Hadi Salim wrote:
> On 16-06-15 04:38 AM, Jason Wang wrote:
>> We used to queue tx packets in sk_receive_queue, this is less
>> efficient since it requires spinlocks to synchronize between producer

> 
> So this is more exercising the skb array improvements. For tun
> it would be useful to see general performance numbers on user/kernel
> crossing (i.e tun read/write).
> If you have the cycles can you run such tests?
> 

Ignore my message - you are running pktgen from a VM towards the host.
So the numbers you posted are what i was interested in.
Thanks for the good work.

cheers,
jamal



[PATCH 12/15] net: davinci_mdio: document missed "ti,am4372-mdio" compat string

2016-06-15 Thread Grygorii Strashko
Document missed "ti,am4372-mdio" compat string used for TI am437x SoC
(am4372.dtsi).

Signed-off-by: Grygorii Strashko 
---
 Documentation/devicetree/bindings/net/davinci-mdio.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/davinci-mdio.txt b/Documentation/devicetree/bindings/net/davinci-mdio.txt
index 0369e25..f2bba50 100644
--- a/Documentation/devicetree/bindings/net/davinci-mdio.txt
+++ b/Documentation/devicetree/bindings/net/davinci-mdio.txt
@@ -2,7 +2,8 @@ TI SoC Davinci/Keystone2 MDIO Controller Device Tree Bindings
 ---
 
 Required properties:
-- compatible   : Should be "ti,davinci_mdio" or "ti,keystone_mdio"
+- compatible   : Should be "ti,davinci_mdio", "ti,keystone_mdio",
+ "ti,am4372-mdio"
 - reg  : physical base address and size of the davinci mdio
  registers map
 - bus_freq : Mdio Bus frequency
-- 
2.8.4



[PATCH 13/15] net: davinci_mdio: introduce "ti,cpsw-mdio" compat string

2016-06-15 Thread Grygorii Strashko
Introduce the "ti,cpsw-mdio" compatible string for Davinci MDIO. It is
required to distinguish the case where MDIO is part of TI CPSW, so that
features supported by TI CPSW (for example, power management) can be
enabled.

Signed-off-by: Grygorii Strashko 
---
 Documentation/devicetree/bindings/net/davinci-mdio.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/davinci-mdio.txt 
b/Documentation/devicetree/bindings/net/davinci-mdio.txt
index f2bba50..a3d6d4b 100644
--- a/Documentation/devicetree/bindings/net/davinci-mdio.txt
+++ b/Documentation/devicetree/bindings/net/davinci-mdio.txt
@@ -3,7 +3,7 @@ TI SoC Davinci/Keystone2 MDIO Controller Device Tree Bindings
 
 Required properties:
 - compatible   : Should be "ti,davinci_mdio", "ti,keystone_mdio",
- "ti,am4372-mdio"
+ "ti,am4372-mdio", "ti,cpsw-mdio"
 - reg  : physical base address and size of the davinci mdio
  registers map
 - bus_freq : Mdio Bus frequency
-- 
2.8.4



[PATCH 01/15] drivers: net: cpsw: fix suspend when all ethX devices are down

2016-06-15 Thread Grygorii Strashko
cpsw_suspend() can trigger an L3 error, after which CPSW stops
functioning, if the system enters suspend while all ethX net devices
are down. In this case CPSW may already be suspended by PM runtime, but
cpsw_suspend() still calls soft_reset_slave() unconditionally and
accesses CPSW registers.

Hence, fix it by moving soft_reset_slave() from cpsw_suspend() to
cpsw_slave_stop(). This way slave ports are reset while CPSW is
active and are in the proper state during suspend.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpsw.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index e6bb0ec..736c77a 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1244,6 +1244,7 @@ static void cpsw_slave_stop(struct cpsw_slave *slave, 
struct cpsw_priv *priv)
slave->phy = NULL;
cpsw_ale_control_set(priv->ale, slave_port,
 ALE_PORT_STATE, ALE_PORT_STATE_DISABLE);
+   soft_reset_slave(slave);
 }
 
 static int cpsw_ndo_open(struct net_device *ndev)
@@ -2558,12 +2559,10 @@ static int cpsw_suspend(struct device *dev)
for (i = 0; i < priv->data.slaves; i++) {
if (netif_running(priv->slaves[i].ndev))
cpsw_ndo_stop(priv->slaves[i].ndev);
-   soft_reset_slave(priv->slaves + i);
}
} else {
if (netif_running(ndev))
cpsw_ndo_stop(ndev);
-   for_each_slave(priv, soft_reset_slave);
}
 
pm_runtime_put_sync(&pdev->dev);
-- 
2.8.4
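
The ordering problem described in this commit message can be illustrated
with a small userspace model. This is only a sketch under stated
assumptions: struct model_cpsw and the model_*() helpers are hypothetical
stand-ins, not driver code, and a "bus fault" here just counts register
accesses made while the block is unpowered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the driver. */
struct model_cpsw {
	bool powered;     /* false once runtime PM has suspended the block */
	bool iface_up;    /* true while the ethX interface is running */
	int  bus_faults;  /* register accesses made while unpowered */
	int  resets;      /* successful slave soft resets */
};

/* Stand-in for soft_reset_slave(): it has to touch a register. */
static void model_soft_reset(struct model_cpsw *c)
{
	assert(c);
	if (!c->powered) {
		c->bus_faults++;  /* on real hardware this would be an L3 error */
		return;
	}
	c->resets++;
}

/* Interface stop: the block is still powered, so resetting here is safe. */
static void model_stop(struct model_cpsw *c)
{
	if (!c->iface_up)
		return;
	model_soft_reset(c);
	c->iface_up = false;
	c->powered = false;   /* last user gone, runtime PM may power down */
}

/* Old flow: system suspend resets the slave unconditionally. */
static void model_suspend_old(struct model_cpsw *c)
{
	c->iface_up = false;
	model_soft_reset(c);  /* faults if the block was already powered down */
	c->powered = false;
}

/* New flow: the reset already happened in model_stop(). */
static void model_suspend_new(struct model_cpsw *c)
{
	model_stop(c);        /* no-op when the interface is already down */
	c->powered = false;
}
```

With the interface already down and the block runtime suspended, the old
flow records one fault, while the new flow performs its single reset at
stop time and records none.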



[PATCH 11/15] drivers: net: davinci_mdio: implement pm runtime auto mode

2016-06-15 Thread Grygorii Strashko
Davinci MDIO is always used as a slave device which services
read/write requests from the MDIO/PHY core. It also doesn't use an IRQ.

As a result, it's possible to relax PM runtime constraints for Davinci
MDIO and enable it on demand, instead of powering it on during probe
and powering it off during removal.

Hence, implement PM runtime autosuspend for Davinci MDIO, but keep it
disabled by default, because Davinci MDIO is integrated in a large set of
TI devices and not all of them are expected to work correctly with RPM
autosuspend enabled:
- expected to work on SoCs where MDIO is defined as part of TI CPSW in DT
(cpsw.c DRA7/am57x, am437x, am335x, dm814x)
- not verified on Keystone 2 and other SoCs where MDIO is used with TI EMAC IP
(davinci_emac.c:  dm6467-emac, am3517-emac, dm816-emac).

Davinci MDIO RPM autosuspend can be enabled through sysfs:
 echo 100 > 
/sys/devices/../48484000.ethernet/48485000.mdio/power/autosuspend_delay_ms

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/davinci_mdio.c | 48 +++---
 1 file changed, 39 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c 
b/drivers/net/ethernet/ti/davinci_mdio.c
index 13f5080..ce3ec42 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -93,6 +93,7 @@ struct davinci_mdio_data {
struct clk  *clk;
struct device   *dev;
struct mii_bus  *bus;
+   boolactive_in_suspend;
unsigned long   access_time; /* jiffies */
/* Indicates that driver shouldn't modify phy_mask in case
 * if MDIO bus is registered from DT.
@@ -141,8 +142,13 @@ static int davinci_mdio_reset(struct mii_bus *bus)
 {
struct davinci_mdio_data *data = bus->priv;
u32 phy_mask, ver;
+   int ret;
 
-   davinci_mdio_enable(data);
+   ret = pm_runtime_get_sync(data->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(data->dev);
+   return ret;
+   }
 
/* wait for scan logic to settle */
msleep(PHY_MAX_ADDR * data->access_time);
@@ -153,7 +159,7 @@ static int davinci_mdio_reset(struct mii_bus *bus)
 (ver >> 8) & 0xff, ver & 0xff);
 
if (data->skip_scan)
-   return 0;
+   goto done;
 
/* get phy mask from the alive register */
phy_mask = __raw_readl(&data->regs->alive);
@@ -168,6 +174,10 @@ static int davinci_mdio_reset(struct mii_bus *bus)
}
data->bus->phy_mask = phy_mask;
 
+done:
+   pm_runtime_mark_last_busy(data->dev);
+   pm_runtime_put_autosuspend(data->dev);
+
return 0;
 }
 
@@ -228,6 +238,12 @@ static int davinci_mdio_read(struct mii_bus *bus, int 
phy_id, int phy_reg)
if (phy_reg & ~PHY_REG_MASK || phy_id & ~PHY_ID_MASK)
return -EINVAL;
 
+   ret = pm_runtime_get_sync(data->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(data->dev);
+   return ret;
+   }
+
reg = (USERACCESS_GO | USERACCESS_READ | (phy_reg << 21) |
   (phy_id << 16));
 
@@ -251,6 +267,8 @@ static int davinci_mdio_read(struct mii_bus *bus, int 
phy_id, int phy_reg)
break;
}
 
+   pm_runtime_mark_last_busy(data->dev);
+   pm_runtime_put_autosuspend(data->dev);
return ret;
 }
 
@@ -264,6 +282,12 @@ static int davinci_mdio_write(struct mii_bus *bus, int 
phy_id,
if (phy_reg & ~PHY_REG_MASK || phy_id & ~PHY_ID_MASK)
return -EINVAL;
 
+   ret = pm_runtime_get_sync(data->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(data->dev);
+   return ret;
+   }
+
reg = (USERACCESS_GO | USERACCESS_WRITE | (phy_reg << 21) |
   (phy_id << 16) | (phy_data & USERACCESS_DATA));
 
@@ -282,7 +306,10 @@ static int davinci_mdio_write(struct mii_bus *bus, int 
phy_id,
break;
}
 
-   return 0;
+   pm_runtime_mark_last_busy(data->dev);
+   pm_runtime_put_autosuspend(data->dev);
+
+   return ret;
 }
 
 #if IS_ENABLED(CONFIG_OF)
@@ -357,8 +384,9 @@ static int davinci_mdio_probe(struct platform_device *pdev)
 
davinci_mdio_init_clk(data);
 
+   pm_runtime_set_autosuspend_delay(&pdev->dev, -1);
+   pm_runtime_use_autosuspend(&pdev->dev);
pm_runtime_enable(&pdev->dev);
-   pm_runtime_get_sync(&pdev->dev);
 
/* register the mii bus
 * Create PHYs from DT only in case if PHY child nodes are explicitly
@@ -387,9 +415,8 @@ static int davinci_mdio_probe(struct platform_device *pdev)
return 0;
 
 bail_out:
-   pm_runtime_put_sync(&pdev->dev);
+   pm_runtime_dont_use_autosuspend(&pdev->dev);
pm_runtime_disable(&pdev->dev);
-
return ret;
 }
 
@@ -400,7 +427,7 @@ static int davinci_mdio_remove(struct platform_device *pdev)
if (data->bus)
mdiobus_unregister(data->bus);
 
-   pm_runtime_put

[PATCH 04/15] drivers: net: cpsw: ethtool: fix accessing to suspended device

2016-06-15 Thread Grygorii Strashko
The CPSW might be suspended by RPM if all ethX interfaces are down,
but it could still be accessible through the ethtool interface. In this
case ethtool operations requiring register access will cause L3 errors
and a CPSW crash.

Hence, fix it by adding RPM get/put calls in the ethtool callbacks which
can access CPSW registers: .set_coalesce(), .get_ethtool_stats(),
.set_pauseparam(), .get_regs()

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpsw.c | 36 +++-
 1 file changed, 35 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index ba81d4e..1ba0c09 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -931,6 +931,13 @@ static int cpsw_set_coalesce(struct net_device *ndev,
u32 prescale = 0;
u32 addnl_dvdr = 1;
u32 coal_intvl = 0;
+   int ret;
+
+   ret = pm_runtime_get_sync(&priv->pdev->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(&priv->pdev->dev);
+   return ret;
+   }
 
coal_intvl = coal->rx_coalesce_usecs;
 
@@ -985,6 +992,8 @@ update_return:
priv->coal_intvl = coal_intvl;
}
 
+   pm_runtime_put(&priv->pdev->dev);
+
return 0;
 }
 
@@ -1022,7 +1031,13 @@ static void cpsw_get_ethtool_stats(struct net_device 
*ndev,
struct cpdma_chan_stats tx_stats;
u32 val;
u8 *p;
-   int i;
+   int i, ret;
+
+   ret = pm_runtime_get_sync(&priv->pdev->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(&priv->pdev->dev);
+   return;
+   }
 
/* Collect Davinci CPDMA stats for Rx and Tx Channel */
cpdma_chan_get_stats(priv->rxch, &rx_stats);
@@ -1049,6 +1064,8 @@ static void cpsw_get_ethtool_stats(struct net_device 
*ndev,
break;
}
}
+
+   pm_runtime_put(&priv->pdev->dev);
 }
 
 static int cpsw_common_res_usage_state(struct cpsw_priv *priv)
@@ -1780,11 +1797,20 @@ static void cpsw_get_regs(struct net_device *ndev,
 {
struct cpsw_priv *priv = netdev_priv(ndev);
u32 *reg = p;
+   int ret;
+
+   ret = pm_runtime_get_sync(&priv->pdev->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(&priv->pdev->dev);
+   return;
+   }
 
/* update CPSW IP version */
regs->version = priv->version;
 
cpsw_ale_dump(priv->ale, reg);
+
+   pm_runtime_put(&priv->pdev->dev);
 }
 
 static void cpsw_get_drvinfo(struct net_device *ndev,
@@ -1902,12 +1928,20 @@ static int cpsw_set_pauseparam(struct net_device *ndev,
 {
struct cpsw_priv *priv = netdev_priv(ndev);
bool link;
+   int ret;
+
+   ret = pm_runtime_get_sync(&priv->pdev->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(&priv->pdev->dev);
+   return ret;
+   }
 
priv->rx_pause = pause->rx_pause ? true : false;
priv->tx_pause = pause->tx_pause ? true : false;
 
for_each_slave(priv, _cpsw_adjust_link, priv, &link);
 
+   pm_runtime_put(&priv->pdev->dev);
return 0;
 }
 
-- 
2.8.4
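
The get/put pattern used in these callbacks relies on the fact that
pm_runtime_get_sync() bumps the usage counter even when the resume fails,
which is why the error path drops it again with pm_runtime_put_noidle().
A minimal userspace sketch of that accounting follows; struct model_dev,
the resume_fails knob and all model_*() functions are hypothetical and
only mimic the counter behaviour, not the real PM core.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of a runtime-PM managed device. */
struct model_dev {
	int  usage;         /* runtime PM usage counter */
	bool active;        /* true while the device is resumed */
	bool resume_fails;  /* test knob: inject a resume failure */
	int  reg_reads;     /* register accesses performed */
};

/* Like pm_runtime_get_sync(): the counter is bumped even on failure. */
static int model_get_sync(struct model_dev *d)
{
	d->usage++;
	if (d->resume_fails)
		return -EIO;
	d->active = true;
	return 0;
}

/* Like pm_runtime_put_noidle(): only drop the reference. */
static void model_put_noidle(struct model_dev *d)
{
	d->usage--;
}

/* Like pm_runtime_put(): drop the reference and allow the device to idle. */
static void model_put(struct model_dev *d)
{
	if (--d->usage == 0)
		d->active = false;
}

/* Shape of a guarded callback such as .get_regs(). */
static int model_get_regs(struct model_dev *d)
{
	int ret = model_get_sync(d);

	if (ret < 0) {
		model_put_noidle(d);  /* keep the counter balanced on failure */
		return ret;
	}

	assert(d->active);            /* registers are only touched while active */
	d->reg_reads++;

	model_put(d);
	return 0;
}
```

In both the success and the failure case the usage counter returns to its
starting value, and the register is read only when the resume succeeded.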



[PATCH 08/15] drivers: net: davinci_mdio: drop suspended and lock fields from mdio_data

2016-06-15 Thread Grygorii Strashko
The Davinci MDIO is not expected to be accessible after its suspend
callbacks have been called:
 - all consumers of Davinci MDIO will stop/disconnect their phys at Device
suspend stage;
 - all phys are expected to be suspended already by PHY/MDIO core;
 - MDIO locking is done by MDIO Bus code.

Hence, it's safe to drop "suspended" and "lock" fields from mdio_data.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/davinci_mdio.c | 30 --
 1 file changed, 30 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c 
b/drivers/net/ethernet/ti/davinci_mdio.c
index 291c42e..b6d0059 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -90,11 +90,9 @@ static const struct mdio_platform_data default_pdata = {
 struct davinci_mdio_data {
struct mdio_platform_data pdata;
struct davinci_mdio_regs __iomem *regs;
-   spinlock_t  lock;
struct clk  *clk;
struct device   *dev;
struct mii_bus  *bus;
-   boolsuspended;
unsigned long   access_time; /* jiffies */
/* Indicates that driver shouldn't modify phy_mask in case
 * if MDIO bus is registered from DT.
@@ -225,13 +223,6 @@ static int davinci_mdio_read(struct mii_bus *bus, int 
phy_id, int phy_reg)
if (phy_reg & ~PHY_REG_MASK || phy_id & ~PHY_ID_MASK)
return -EINVAL;
 
-   spin_lock(&data->lock);
-
-   if (data->suspended) {
-   spin_unlock(&data->lock);
-   return -ENODEV;
-   }
-
reg = (USERACCESS_GO | USERACCESS_READ | (phy_reg << 21) |
   (phy_id << 16));
 
@@ -255,8 +246,6 @@ static int davinci_mdio_read(struct mii_bus *bus, int 
phy_id, int phy_reg)
break;
}
 
-   spin_unlock(&data->lock);
-
return ret;
 }
 
@@ -270,13 +259,6 @@ static int davinci_mdio_write(struct mii_bus *bus, int 
phy_id,
if (phy_reg & ~PHY_REG_MASK || phy_id & ~PHY_ID_MASK)
return -EINVAL;
 
-   spin_lock(&data->lock);
-
-   if (data->suspended) {
-   spin_unlock(&data->lock);
-   return -ENODEV;
-   }
-
reg = (USERACCESS_GO | USERACCESS_WRITE | (phy_reg << 21) |
   (phy_id << 16) | (phy_data & USERACCESS_DATA));
 
@@ -295,8 +277,6 @@ static int davinci_mdio_write(struct mii_bus *bus, int 
phy_id,
break;
}
 
-   spin_unlock(&data->lock);
-
return 0;
 }
 
@@ -364,7 +344,6 @@ static int davinci_mdio_probe(struct platform_device *pdev)
 
dev_set_drvdata(dev, data);
data->dev = dev;
-   spin_lock_init(&data->lock);
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
data->regs = devm_ioremap_resource(dev, res);
@@ -426,17 +405,12 @@ static int davinci_mdio_suspend(struct device *dev)
struct davinci_mdio_data *data = dev_get_drvdata(dev);
u32 ctrl;
 
-   spin_lock(&data->lock);
-
/* shutdown the scan state machine */
ctrl = __raw_readl(&data->regs->control);
ctrl &= ~CONTROL_ENABLE;
__raw_writel(ctrl, &data->regs->control);
wait_for_idle(data);
 
-   data->suspended = true;
-   spin_unlock(&data->lock);
-
/* Select sleep pin state */
pinctrl_pm_select_sleep_state(dev);
 
@@ -450,13 +424,9 @@ static int davinci_mdio_resume(struct device *dev)
/* Select default pin state */
pinctrl_pm_select_default_state(dev);
 
-   spin_lock(&data->lock);
/* restart the scan state machine */
__davinci_mdio_reset(data);
 
-   data->suspended = false;
-   spin_unlock(&data->lock);
-
return 0;
 }
 #endif
-- 
2.8.4



[PATCH 14/15] drivers: net: davinci_mdio: enable pm runtime auto for ti cpsw-mdio

2016-06-15 Thread Grygorii Strashko
Use "ti,cpsw-mdio" to enable PM runtime auto-suspend on supported
platforms, where MDIO is implemented as part of TI CPSW.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/davinci_mdio.c | 45 +-
 1 file changed, 34 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c 
b/drivers/net/ethernet/ti/davinci_mdio.c
index ce3ec42..d1fb734 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -53,6 +53,10 @@
 
 #define DEF_OUT_FREQ   2200000 /* 2.2 MHz */
 
+struct davinci_mdio_of_param {
+   int autosuspend_delay_ms;
+};
+
 struct davinci_mdio_regs {
u32 version;
u32 control;
@@ -332,6 +336,19 @@ static int davinci_mdio_probe_dt(struct mdio_platform_data 
*data,
 }
 #endif
 
+#if IS_ENABLED(CONFIG_OF)
+static const struct davinci_mdio_of_param of_cpsw_mdio_data = {
+   .autosuspend_delay_ms = 100,
+};
+
+static const struct of_device_id davinci_mdio_of_mtable[] = {
+   { .compatible = "ti,davinci_mdio", },
+   { .compatible = "ti,cpsw-mdio", .data = &of_cpsw_mdio_data},
+   { /* sentinel */ },
+};
+MODULE_DEVICE_TABLE(of, davinci_mdio_of_mtable);
+#endif
+
 static int davinci_mdio_probe(struct platform_device *pdev)
 {
struct mdio_platform_data *pdata = dev_get_platdata(&pdev->dev);
@@ -340,6 +357,7 @@ static int davinci_mdio_probe(struct platform_device *pdev)
struct resource *res;
struct phy_device *phy;
int ret, addr;
+   int autosuspend_delay_ms = -1;
 
data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
if (!data)
@@ -352,9 +370,22 @@ static int davinci_mdio_probe(struct platform_device *pdev)
}
 
if (dev->of_node) {
-   if (davinci_mdio_probe_dt(&data->pdata, pdev))
-   data->pdata = default_pdata;
+   const struct of_device_id   *of_id;
+
+   ret = davinci_mdio_probe_dt(data, pdev);
+   if (ret)
+   return ret;
snprintf(data->bus->id, MII_BUS_ID_SIZE, "%s", pdev->name);
+
+   of_id = of_match_device(davinci_mdio_of_mtable, &pdev->dev);
+   if (of_id) {
+   const struct davinci_mdio_of_param *of_mdio_data;
+
+   of_mdio_data = of_id->data;
+   if (of_mdio_data)
+   autosuspend_delay_ms =
+   of_mdio_data->autosuspend_delay_ms;
+   }
} else {
data->pdata = pdata ? (*pdata) : default_pdata;
snprintf(data->bus->id, MII_BUS_ID_SIZE, "%s-%x",
@@ -384,7 +415,7 @@ static int davinci_mdio_probe(struct platform_device *pdev)
 
davinci_mdio_init_clk(data);
 
-   pm_runtime_set_autosuspend_delay(&pdev->dev, -1);
+   pm_runtime_set_autosuspend_delay(&pdev->dev, autosuspend_delay_ms);
pm_runtime_use_autosuspend(&pdev->dev);
pm_runtime_enable(&pdev->dev);
 
@@ -495,14 +526,6 @@ static const struct dev_pm_ops davinci_mdio_pm_ops = {
SET_LATE_SYSTEM_SLEEP_PM_OPS(davinci_mdio_suspend, davinci_mdio_resume)
 };
 
-#if IS_ENABLED(CONFIG_OF)
-static const struct of_device_id davinci_mdio_of_mtable[] = {
-   { .compatible = "ti,davinci_mdio", },
-   { /* sentinel */ },
-};
-MODULE_DEVICE_TABLE(of, davinci_mdio_of_mtable);
-#endif
-
 static struct platform_driver davinci_mdio_driver = {
.driver = {
.name= "davinci_mdio",
-- 
2.8.4
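
How the per-compatible data ends up selecting the autosuspend delay can be
sketched in plain C. The lookup below is a simplified stand-in for
of_match_device(): it assumes the node's compatible strings are tried in
order, most specific first, and that the first table hit wins. The
model_* names are hypothetical; only the compatible strings and the
100 ms / -1 values come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical mirror of the per-compatible parameter block. */
struct model_param {
	int autosuspend_delay_ms;
};

struct model_match {
	const char *compatible;
	const struct model_param *data;
};

static const struct model_param model_cpsw_param = {
	.autosuspend_delay_ms = 100,
};

static const struct model_match model_table[] = {
	{ "ti,davinci_mdio", NULL },
	{ "ti,cpsw-mdio", &model_cpsw_param },
	{ NULL, NULL }, /* sentinel */
};

/* Try the node's compatible strings in order; the first table hit wins. */
static const struct model_match *model_find(const char *const *compat, size_t n)
{
	assert(compat);
	for (size_t i = 0; i < n; i++)
		for (const struct model_match *m = model_table; m->compatible; m++)
			if (strcmp(compat[i], m->compatible) == 0)
				return m;
	return NULL;
}

/* Pick the delay the way probe does: -1 unless the match carries data. */
static int model_delay_ms(const char *const *compat, size_t n)
{
	const struct model_match *m = model_find(compat, n);
	int delay = -1; /* negative delay keeps autosuspend disabled */

	if (m && m->data)
		delay = m->data->autosuspend_delay_ms;
	return delay;
}
```

A node listing "ti,cpsw-mdio" ahead of "ti,davinci_mdio" gets the 100 ms
delay, while a node with only "ti,davinci_mdio" keeps the default of -1.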



[PATCH 10/15] drivers: net: davinci_mdio: add pm runtime callbacks

2016-06-15 Thread Grygorii Strashko
Add PM runtime .runtime_suspend()/.runtime_resume() callbacks and
perform Davinci MDIO enabling/disabling from these callbacks. This
allows reuse of the pm_runtime_force_suspend/resume() APIs during system
suspend and is required for the further implementation of PM runtime
autosuspend.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/davinci_mdio.c | 31 +++
 1 file changed, 27 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c 
b/drivers/net/ethernet/ti/davinci_mdio.c
index b206fd3..13f5080 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -406,8 +406,8 @@ static int davinci_mdio_remove(struct platform_device *pdev)
return 0;
 }
 
-#ifdef CONFIG_PM_SLEEP
-static int davinci_mdio_suspend(struct device *dev)
+#ifdef CONFIG_PM
+static int davinci_mdio_runtime_suspend(struct device *dev)
 {
struct davinci_mdio_data *data = dev_get_drvdata(dev);
u32 ctrl;
@@ -418,6 +418,28 @@ static int davinci_mdio_suspend(struct device *dev)
__raw_writel(ctrl, &data->regs->control);
wait_for_idle(data);
 
+   return 0;
+}
+
+static int davinci_mdio_runtime_resume(struct device *dev)
+{
+   struct davinci_mdio_data *data = dev_get_drvdata(dev);
+
+   davinci_mdio_enable(data);
+   return 0;
+}
+#endif
+
+#ifdef CONFIG_PM_SLEEP
+static int davinci_mdio_suspend(struct device *dev)
+{
+   struct davinci_mdio_data *data = dev_get_drvdata(dev);
+   int ret = 0;
+
+   ret = pm_runtime_force_suspend(dev);
+   if (ret < 0)
+   return ret;
+
/* Select sleep pin state */
pinctrl_pm_select_sleep_state(dev);
 
@@ -431,14 +453,15 @@ static int davinci_mdio_resume(struct device *dev)
/* Select default pin state */
pinctrl_pm_select_default_state(dev);
 
-   /* restart the scan state machine */
-   davinci_mdio_enable(data);
+   pm_runtime_force_resume(dev);
 
return 0;
 }
 #endif
 
 static const struct dev_pm_ops davinci_mdio_pm_ops = {
+   SET_RUNTIME_PM_OPS(davinci_mdio_runtime_suspend,
+  davinci_mdio_runtime_resume, NULL)
SET_LATE_SYSTEM_SLEEP_PM_OPS(davinci_mdio_suspend, davinci_mdio_resume)
 };
 
-- 
2.8.4
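
The reuse of the runtime callback from the system suspend path can be
modelled roughly as below. This is a sketch, not the PM core: it assumes
that a forced suspend only invokes the runtime suspend callback when the
device is not already runtime suspended, and every model_* name is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the MDIO device state. */
struct model_mdio {
	bool rpm_suspended;  /* already runtime suspended? */
	int  hw_disables;    /* times the scan state machine was shut down */
};

/* Stand-in for the .runtime_suspend() callback. */
static int model_runtime_suspend(struct model_mdio *m)
{
	assert(m);
	m->hw_disables++;
	return 0;
}

/* Rough model of a forced suspend: skip devices that are already down. */
static int model_force_suspend(struct model_mdio *m)
{
	int ret;

	if (m->rpm_suspended)
		return 0;

	ret = model_runtime_suspend(m);
	if (ret == 0)
		m->rpm_suspended = true;
	return ret;
}

/* System suspend path: reuse the runtime callback, then handle pins. */
static int model_system_suspend(struct model_mdio *m)
{
	int ret = model_force_suspend(m);

	if (ret < 0)
		return ret;
	/* the sleep pin state would be selected here */
	return 0;
}
```

An active device has its hardware disabled exactly once, and a device that
was already runtime suspended is left untouched.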



[PATCH 15/15] ARM: dts: am335x/am437x/dra7: use new "ti,cpsw-mdio" compat string

2016-06-15 Thread Grygorii Strashko
Add "ti,cpsw-mdio" for am335x/am437x/dra7 SoCs where MDIO is
implemented as part of TI CPSW and, this way, enable PM runtime
autosuspend for the Davinci MDIO driver on these platforms.

Signed-off-by: Grygorii Strashko 
---
 arch/arm/boot/dts/am33xx.dtsi | 2 +-
 arch/arm/boot/dts/am4372.dtsi | 2 +-
 arch/arm/boot/dts/dra7.dtsi   | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm/boot/dts/am33xx.dtsi b/arch/arm/boot/dts/am33xx.dtsi
index 52be48b..7fa9a1d 100644
--- a/arch/arm/boot/dts/am33xx.dtsi
+++ b/arch/arm/boot/dts/am33xx.dtsi
@@ -789,7 +789,7 @@
status = "disabled";
 
davinci_mdio: mdio@4a101000 {
-   compatible = "ti,davinci_mdio";
+   compatible = "ti,cpsw-mdio","ti,davinci_mdio";
#address-cells = <1>;
#size-cells = <0>;
ti,hwmods = "davinci_mdio";
diff --git a/arch/arm/boot/dts/am4372.dtsi b/arch/arm/boot/dts/am4372.dtsi
index 12fcde4..ea76fa7 100644
--- a/arch/arm/boot/dts/am4372.dtsi
+++ b/arch/arm/boot/dts/am4372.dtsi
@@ -636,7 +636,7 @@
syscon = <&scm_conf>;
 
davinci_mdio: mdio@4a101000 {
-   compatible = "ti,am4372-mdio","ti,davinci_mdio";
+   compatible = 
"ti,am4372-mdio","ti,cpsw-mdio","ti,davinci_mdio";
reg = <0x4a101000 0x100>;
#address-cells = <1>;
#size-cells = <0>;
diff --git a/arch/arm/boot/dts/dra7.dtsi b/arch/arm/boot/dts/dra7.dtsi
index e007401..911ef85 100644
--- a/arch/arm/boot/dts/dra7.dtsi
+++ b/arch/arm/boot/dts/dra7.dtsi
@@ -1661,7 +1661,7 @@
status = "disabled";
 
davinci_mdio: mdio@48485000 {
-   compatible = "ti,davinci_mdio";
+   compatible = "ti,cpsw-mdio","ti,davinci_mdio";
#address-cells = <1>;
#size-cells = <0>;
ti,hwmods = "davinci_mdio";
-- 
2.8.4



[PATCH 09/15] drivers: net: davinci_mdio: split reset function on init_clk and enable

2016-06-15 Thread Grygorii Strashko
The Davinci MDIO MDIO_CONTROL.CLKDIV value needs to be calculated only
once, during probe; hence split __davinci_mdio_reset() into
davinci_mdio_init_clk() and davinci_mdio_enable(). Initialize and
save CLKDIV in .probe(), then just use the saved value.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/davinci_mdio.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c 
b/drivers/net/ethernet/ti/davinci_mdio.c
index b6d0059..b206fd3 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -98,9 +98,10 @@ struct davinci_mdio_data {
 * if MDIO bus is registered from DT.
 */
boolskip_scan;
+   u32 clk_div;
 };
 
-static void __davinci_mdio_reset(struct davinci_mdio_data *data)
+static void davinci_mdio_init_clk(struct davinci_mdio_data *data)
 {
u32 mdio_in, div, mdio_out_khz, access_time;
 
@@ -109,9 +110,7 @@ static void __davinci_mdio_reset(struct davinci_mdio_data 
*data)
if (div > CONTROL_MAX_DIV)
div = CONTROL_MAX_DIV;
 
-   /* set enable and clock divider */
-   __raw_writel(div | CONTROL_ENABLE, &data->regs->control);
-
+   data->clk_div = div;
/*
 * One mdio transaction consists of:
 *  32 bits of preamble
@@ -132,12 +131,18 @@ static void __davinci_mdio_reset(struct davinci_mdio_data 
*data)
data->access_time = 1;
 }
 
+static void davinci_mdio_enable(struct davinci_mdio_data *data)
+{
+   /* set enable and clock divider */
+   __raw_writel(data->clk_div | CONTROL_ENABLE, &data->regs->control);
+}
+
 static int davinci_mdio_reset(struct mii_bus *bus)
 {
struct davinci_mdio_data *data = bus->priv;
u32 phy_mask, ver;
 
-   __davinci_mdio_reset(data);
+   davinci_mdio_enable(data);
 
/* wait for scan logic to settle */
msleep(PHY_MAX_ADDR * data->access_time);
@@ -188,7 +193,7 @@ static inline int wait_for_user_access(struct 
davinci_mdio_data *data)
 * operation
 */
dev_warn(data->dev, "resetting idled controller\n");
-   __davinci_mdio_reset(data);
+   davinci_mdio_enable(data);
return -EAGAIN;
}
 
@@ -350,6 +355,8 @@ static int davinci_mdio_probe(struct platform_device *pdev)
if (IS_ERR(data->regs))
return PTR_ERR(data->regs);
 
+   davinci_mdio_init_clk(data);
+
pm_runtime_enable(&pdev->dev);
pm_runtime_get_sync(&pdev->dev);
 
@@ -425,7 +432,7 @@ static int davinci_mdio_resume(struct device *dev)
pinctrl_pm_select_default_state(dev);
 
/* restart the scan state machine */
-   __davinci_mdio_reset(data);
+   davinci_mdio_enable(data);
 
return 0;
 }
-- 
2.8.4
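
The split into "compute once" and "enable many times" can be sketched in
userspace as follows. Several details are assumptions made only for
illustration and are not shown in the diff above: the divider formula
(input clock / bus frequency - 1), the enable bit position and the 16-bit
divider limit. All model_* and MODEL_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_CONTROL_ENABLE  (1u << 30)  /* assumed bit position */
#define MODEL_CONTROL_MAX_DIV 0xffffu     /* assumed divider field limit */

struct model_mdio {
	uint32_t clk_div;    /* cached divider, computed once */
	uint32_t control;    /* stand-in for the control register */
	int      div_calcs;  /* how often the divider was computed */
};

/* Probe-time step: compute and cache the clock divider. */
static void model_init_clk(struct model_mdio *m, uint32_t in_hz, uint32_t bus_hz)
{
	uint32_t div;

	assert(bus_hz != 0);
	div = in_hz / bus_hz;   /* assumed formula, see lead-in */
	if (div > 0)
		div -= 1;
	if (div > MODEL_CONTROL_MAX_DIV)
		div = MODEL_CONTROL_MAX_DIV;

	m->clk_div = div;
	m->div_calcs++;
}

/* Cheap step repeated on reset/resume: write the cached value back. */
static void model_enable(struct model_mdio *m)
{
	m->control = m->clk_div | MODEL_CONTROL_ENABLE;
}
```

With a 100 MHz input clock and a 1 MHz bus frequency the cached divider is
99, and re-enabling the controller after the register content was lost
needs no second calculation.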



[PATCH 06/15] drivers: net: davinci_mdio: do pm runtime initialization later in probe

2016-06-15 Thread Grygorii Strashko
Do PM runtime initialization later in probe. This simplifies error
handling a bit.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/davinci_mdio.c | 15 ++-
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c 
b/drivers/net/ethernet/ti/davinci_mdio.c
index 4e7c9b9..2e19dd1 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -356,14 +356,10 @@ static int davinci_mdio_probe(struct platform_device 
*pdev)
data->bus->parent   = dev;
data->bus->priv = data;
 
-   pm_runtime_enable(&pdev->dev);
-   pm_runtime_get_sync(&pdev->dev);
data->clk = devm_clk_get(dev, "fck");
if (IS_ERR(data->clk)) {
dev_err(dev, "failed to get device clock\n");
-   ret = PTR_ERR(data->clk);
-   data->clk = NULL;
-   goto bail_out;
+   return PTR_ERR(data->clk);
}
 
dev_set_drvdata(dev, data);
@@ -372,10 +368,11 @@ static int davinci_mdio_probe(struct platform_device 
*pdev)
 
res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
data->regs = devm_ioremap_resource(dev, res);
-   if (IS_ERR(data->regs)) {
-   ret = PTR_ERR(data->regs);
-   goto bail_out;
-   }
+   if (IS_ERR(data->regs))
+   return PTR_ERR(data->regs);
+
+   pm_runtime_enable(&pdev->dev);
+   pm_runtime_get_sync(&pdev->dev);
 
/* register the mii bus
 * Create PHYs from DT only in case if PHY child nodes are explicitly
-- 
2.8.4



[PATCH 02/15] drivers: net: cpsw: check return code from pm runtime calls

2016-06-15 Thread Grygorii Strashko
Add missing checks of the return codes from PM runtime get() calls.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpsw.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 736c77a..c76f9db 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1253,7 +1253,11 @@ static int cpsw_ndo_open(struct net_device *ndev)
int i, ret;
u32 reg;
 
-   pm_runtime_get_sync(&priv->pdev->dev);
+   ret = pm_runtime_get_sync(&priv->pdev->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(&priv->pdev->dev);
+   return ret;
+   }
 
if (!cpsw_common_res_usage_state(priv))
cpsw_intr_disable(priv);
@@ -2322,7 +2326,11 @@ static int cpsw_probe(struct platform_device *pdev)
/* Need to enable clocks with runtime PM api to access module
 * registers
 */
-   pm_runtime_get_sync(&pdev->dev);
+   ret = pm_runtime_get_sync(&pdev->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(&pdev->dev);
+   goto clean_runtime_disable_ret;
+   }
priv->version = readl(&priv->regs->id_ver);
pm_runtime_put_sync(&pdev->dev);
 
-- 
2.8.4



[PATCH 05/15] drivers: net: cpsw: ndev: fix accessing to suspended device

2016-06-15 Thread Grygorii Strashko
The CPSW might be suspended by RPM if all ethX interfaces are down,
but it could still be accessible through the net_device_ops interface. In
this case net_device_ops operations requiring register access will
cause L3 errors and a CPSW crash.

Hence, fix it by adding RPM get/put calls in net_device_ops callbacks
which can access CPSW registers: .ndo_set_mac_address(),
.ndo_vlan_rx_add_vid(), .ndo_vlan_rx_kill_vid().

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpsw.c | 33 ++---
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 1ba0c09..591d1c3 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -1633,10 +1633,17 @@ static int cpsw_ndo_set_mac_address(struct net_device 
*ndev, void *p)
struct sockaddr *addr = (struct sockaddr *)p;
int flags = 0;
u16 vid = 0;
+   int ret;
 
if (!is_valid_ether_addr(addr->sa_data))
return -EADDRNOTAVAIL;
 
+   ret = pm_runtime_get_sync(&priv->pdev->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(&priv->pdev->dev);
+   return ret;
+   }
+
if (priv->data.dual_emac) {
vid = priv->slaves[priv->emac_port].port_vlan;
flags = ALE_VLAN;
@@ -1651,6 +1658,8 @@ static int cpsw_ndo_set_mac_address(struct net_device 
*ndev, void *p)
memcpy(ndev->dev_addr, priv->mac_addr, ETH_ALEN);
for_each_slave(priv, cpsw_set_slave_mac, priv);
 
+   pm_runtime_put(&priv->pdev->dev);
+
return 0;
 }
 
@@ -1715,10 +1724,17 @@ static int cpsw_ndo_vlan_rx_add_vid(struct net_device 
*ndev,
__be16 proto, u16 vid)
 {
struct cpsw_priv *priv = netdev_priv(ndev);
+   int ret;
 
if (vid == priv->data.default_vlan)
return 0;
 
+   ret = pm_runtime_get_sync(&priv->pdev->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(&priv->pdev->dev);
+   return ret;
+   }
+
if (priv->data.dual_emac) {
/* In dual EMAC, reserved VLAN id should not be used for
 * creating VLAN interfaces as this can break the dual
@@ -1733,7 +1749,10 @@ static int cpsw_ndo_vlan_rx_add_vid(struct net_device 
*ndev,
}
 
dev_info(priv->dev, "Adding vlanid %d to vlan filter\n", vid);
-   return cpsw_add_vlan_ale_entry(priv, vid);
+   ret = cpsw_add_vlan_ale_entry(priv, vid);
+
+   pm_runtime_put(&priv->pdev->dev);
+   return ret;
 }
 
 static int cpsw_ndo_vlan_rx_kill_vid(struct net_device *ndev,
@@ -1745,6 +1764,12 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device 
*ndev,
if (vid == priv->data.default_vlan)
return 0;
 
+   ret = pm_runtime_get_sync(&priv->pdev->dev);
+   if (ret < 0) {
+   pm_runtime_put_noidle(&priv->pdev->dev);
+   return ret;
+   }
+
if (priv->data.dual_emac) {
int i;
 
@@ -1764,8 +1789,10 @@ static int cpsw_ndo_vlan_rx_kill_vid(struct net_device 
*ndev,
if (ret != 0)
return ret;
 
-   return cpsw_ale_del_mcast(priv->ale, priv->ndev->broadcast,
- 0, ALE_VLAN, vid);
+   ret = cpsw_ale_del_mcast(priv->ale, priv->ndev->broadcast,
+0, ALE_VLAN, vid);
+   pm_runtime_put(&priv->pdev->dev);
+   return ret;
 }
 
 static const struct net_device_ops cpsw_netdev_ops = {
-- 
2.8.4
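
In callbacks with more than one return path, every path taken after the
get has to reach a matching put. The sketch below shows the usual
single-exit idiom in a userspace model; it is not taken from the patch.
struct model_dev, model_get(), model_put() and the two injected step
results are hypothetical and exist only to make the counter visible.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device with only a usage counter. */
struct model_dev {
	int usage;
};

static int model_get(struct model_dev *d)
{
	assert(d);
	d->usage++;
	return 0;
}

static void model_put(struct model_dev *d)
{
	d->usage--;
}

/*
 * Hypothetical operation with two register-touching steps; step1 and
 * step2 are the (injected) results of those steps.
 */
static int model_kill_vid(struct model_dev *d, int step1, int step2)
{
	int ret = model_get(d);

	if (ret < 0) {
		model_put(d);
		return ret;
	}

	ret = step1;
	if (ret)
		goto out;   /* early failure still goes through the put */

	ret = step2;
out:
	model_put(d);
	return ret;
}
```

Whichever step fails, the usage counter is back at its starting value
when the function returns.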



[PATCH 03/15] drivers: net: cpsw: remove pm runtime calls from suspend callbacks

2016-06-15 Thread Grygorii Strashko
PM runtime is properly handled in cpsw_ndo_open/stop(); as a result it
isn't required to duplicate these calls in the .suspend()/.resume()
callbacks. Moreover, it might cause an unnecessary RPM resume of CPSW
during system suspend in the case where it's already suspended because
all ethX interfaces were down before system suspend started.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/cpsw.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c76f9db..ba81d4e 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -2573,8 +2573,6 @@ static int cpsw_suspend(struct device *dev)
cpsw_ndo_stop(ndev);
}
 
-   pm_runtime_put_sync(&pdev->dev);
-
/* Select sleep pin state */
pinctrl_pm_select_sleep_state(&pdev->dev);
 
@@ -2587,8 +2585,6 @@ static int cpsw_resume(struct device *dev)
struct net_device   *ndev = platform_get_drvdata(pdev);
struct cpsw_priv*priv = netdev_priv(ndev);
 
-   pm_runtime_get_sync(&pdev->dev);
-
/* Select default pin state */
pinctrl_pm_select_default_state(&pdev->dev);
 
-- 
2.8.4



[PATCH 07/15] drivers: net: davinci_mdio: remove pm runtime calls from suspend callbacks

2016-06-15 Thread Grygorii Strashko
PM runtime is disabled when the Davinci MDIO .suspend_late() and
.resume_early() callbacks are called. As a result, any PM runtime calls
here will be just a nop and can be removed.

Signed-off-by: Grygorii Strashko 
---
 drivers/net/ethernet/ti/davinci_mdio.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/net/ethernet/ti/davinci_mdio.c 
b/drivers/net/ethernet/ti/davinci_mdio.c
index 2e19dd1..291c42e 100644
--- a/drivers/net/ethernet/ti/davinci_mdio.c
+++ b/drivers/net/ethernet/ti/davinci_mdio.c
@@ -436,7 +436,6 @@ static int davinci_mdio_suspend(struct device *dev)
 
data->suspended = true;
spin_unlock(&data->lock);
-   pm_runtime_put_sync(data->dev);
 
/* Select sleep pin state */
pinctrl_pm_select_sleep_state(dev);
@@ -451,8 +450,6 @@ static int davinci_mdio_resume(struct device *dev)
/* Select default pin state */
pinctrl_pm_select_default_state(dev);
 
-   pm_runtime_get_sync(data->dev);
-
spin_lock(&data->lock);
/* restart the scan state machine */
__davinci_mdio_reset(data);
-- 
2.8.4



[PATCH 00/15] drivers: net: cpsw: improve runtime pm

2016-06-15 Thread Grygorii Strashko
This series is intended to improve runtime PM and allow CPSW to be
RPM suspended when all ethX netdevices are down.

To achieve this goal it is required to relax the runtime PM constraints for
Davinci MDIO, which currently blocks CPSW runtime PM because Davinci MDIO is
always powered on during probe and powered off only when it is removed.
- Patches 6-11 implement PM runtime autosuspend for Davinci MDIO, but keep it
disabled by default, because Davinci MDIO is integrated in a large set of TI
devices and not all of them are verified to work correctly with RPM autosuspend
enabled:
 expected to work on SoCs where MDIO is defined as part of CPSW in DT
 (cpsw.c DRA7/am57x, am437x, am335x)
The CPSW needs to be fixed before RPM suspend can be allowed:
 - Patches 1-5 ensure that CPSW will not cause L3 errors while it is in RPM
   suspended state.

Davinci MDIO RPM autosuspend can be enabled through sysfs:
 echo 100 > 
/sys/devices/../48484000.ethernet/48485000.mdio/power/autosuspend_delay_ms

Patches 12 - 15: introduce new compatible string "ti,cpsw-mdio" which is used
then to enable RPM for am335x/am437x/dra7 SoCs.

Tested on am335x, am437x, am572x and k2g (on k2g with RPM disabled for Davinci 
MDIO)
These changes should not affect on errata i877 implementation on DRA7.

Power measurement on am335x GP EVM:
 Without this series:  547.60 mW total SoC power
 With this series + "ifconfig eth0 down": 477.32 mW Total Soc Power 

Grygorii Strashko (15):
  drivers: net: cpsw: fix suspend when all ethX devices are down
  drivers: net: cpsw: check return code from pm runtime calls
  drivers: net: cpsw: remove pm runtime calls from suspend callbacks
  drivers: net: cpsw: ethtool: fix accessing to suspended device
  drivers: net: cpsw: ndev: fix accessing to suspended device
  drivers: net: davinci_mdio: do pm runtime initialization later in
probe
  drivers: net: davinci_mdio: remove pm runtime calls from suspend
callbacks
  drivers: net: davinci_mdio: drop suspended and lock fields from
mdio_data
  drivers: net: davinci_mdio: split reset function on init_clk and
enable
  drivers: net: davinci_mdio: add pm runtime callbacks
  drivers: net: davinci_mdio: implement pm runtime auto mode
  net: davinci_mdio: document missed "ti,am4372-mdio" compat string
  net: davinci_mdio: introduce "ti,cpsw-mdio" compat string
  drivers: net: davinci_mdio: enable pm runtime auto for ti cpsw-mdio
  ARM: dts: am335x/am437x/dra7: use new "ti,cpsw-mdio" compat string

 .../devicetree/bindings/net/davinci-mdio.txt   |   3 +-
 arch/arm/boot/dts/am33xx.dtsi  |   2 +-
 arch/arm/boot/dts/am4372.dtsi  |   2 +-
 arch/arm/boot/dts/dra7.dtsi|   2 +-
 drivers/net/ethernet/ti/cpsw.c |  88 +--
 drivers/net/ethernet/ti/davinci_mdio.c | 169 +
 6 files changed, 189 insertions(+), 77 deletions(-)

-- 
2.8.4



Re: [very-RFC 7/8] AVB ALSA - Add ALSA shim for TSN

2016-06-15 Thread Henrik Austad
On Wed, Jun 15, 2016 at 01:49:08PM +0200, Richard Cochran wrote:
> Now that I understand better...
> 
> On Sun, Jun 12, 2016 at 01:01:35AM +0200, Henrik Austad wrote:
> > Userspace is supposed to reserve bandwidth, find StreamID etc.
> > 
> > To use as a Talker:
> > 
> > mkdir /config/tsn/test/eth0/talker
> > cd /config/tsn/test/eth0/talker
> > echo 65535 > buffer_size
> > echo 08:00:27:08:9f:c3 > remote_mac
> > echo 42 > stream_id
> > echo alsa > enabled
> 
> This is exactly why configfs is the wrong interface.  If you implement
> the AVB device in alsa-lib user space, then you can handle the
> reservations, configuration, UDP sockets, etc, in a way transparent to
> the aplay program.

And how would v4l2 benefit from this being in alsalib? Should we require 
both V4L and ALSA to implement the same, or should we place it in a common 
place for all.

And what about those systems that want to use TSN but are not
media devices? They should be given a raw socket to send traffic over;
should they also implement something in a library?

So no, here I think configfs is an apt choice.

> Heck, if done properly, your layer could discover the AVB nodes in the
> network and present each one as a separate device...

No, you definitely do not want the kernel to automagically add devices 
whenever something pops up on the network, for this you need userspace to 
be in control. 1722.1 should not be handled in-kernel.


-- 
Henrik Austad




Re: [PATCH] netfilter: fix buffer null termination

2016-06-15 Thread Pablo Neira Ayuso
On Tue, Jun 14, 2016 at 09:52:49PM +0530, Kishan Sandeep wrote:
> Hi Pablo,
> 
> On Tue, Jun 14, 2016 at 8:38 PM, Pablo Neira Ayuso wrote:
> > Cc'ing netfilter-devel.
> >
> > On Tue, Jun 14, 2016 at 07:39:27PM +0530, Kishan Sandeep wrote:
> >> + netdev
> >>
> >> On Sat, Jun 11, 2016 at 10:18 AM, Kishan Sandeep wrote:
> >> > strncpy is generally preferable for non-terminated
> >> > fixed-width strings. For NUL termination, strlcpy
> >> > is preferable.
> >> >
> >> > Signed-off-by: Kishan Sandeep 
> >> > ---
> >> >  net/netfilter/xt_repldata.h |2 +-
> >> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >> >
> >> > diff --git a/net/netfilter/xt_repldata.h b/net/netfilter/xt_repldata.h
> >> > index 8fd3241..a460211 100644
> >> > --- a/net/netfilter/xt_repldata.h
> >> > +++ b/net/netfilter/xt_repldata.h
> >> > @@ -28,7 +28,7 @@
> >> > if (tbl == NULL) \
> >> > return NULL; \
> >> > term = (struct type##_error *)&(((char *)tbl)[term_offset]); \
> >> > -   strncpy(tbl->repl.name, info->name, sizeof(tbl->repl.name)); \
> >> > +   strlcpy(tbl->repl.name, info->name, sizeof(tbl->repl.name)); \
> >
> > I don't think this is actually fixing anything. Tables in x_tables
> > have a known and fixed name that is defined from the kernel side, that
> > is always smaller than the buffer we have there. So are you observing
> > any real problem from there?
> >
> > Thanks.
> 
> I have not observed any real problem. The issue is that strncpy does not
> guarantee that the string is NUL-terminated. With strlcpy the string is
> always terminated properly.

1) Table names are set from the kernel codebase, so they are always
   way smaller than that.

2) We're not expecting the addition of new tables in the future that
   would result in the hypothetical problem that you indicate.

Thanks.


Re: [PATCH net-next 01/10] net_sched: add the ability to defer skb freeing

2016-06-15 Thread Jesper Dangaard Brouer
On Mon, 13 Jun 2016 20:21:50 -0700
Eric Dumazet wrote:

> qdisc are changed under RTNL protection and often
> while blocking BH and root qdisc spinlock.
> 
> When lots of skbs need to be dropped, we free
> them under these locks causing TX/RX freezes,
> and more generally latency spikes.
> 
> This commit adds rtnl_kfree_skbs(), used to queue
> skbs for deferred freeing.
> 
> Actual freeing happens right after RTNL is released,
> with appropriate scheduling points.
> 
> rtnl_qdisc_drop() can also be used in place
> of qdisc_drop() when RTNL is held.
> 
> qdisc_reset_queue() and __qdisc_reset_queue() get
> the new behavior, so standard qdiscs like pfifo, pfifo_fast...
> have their ->reset() method automatically handled.
> 
> Signed-off-by: Eric Dumazet 
> ---
>  include/linux/rtnetlink.h |  5 +++--
>  include/net/sch_generic.h | 16 
>  net/core/rtnetlink.c  | 22 ++
>  net/sched/sch_generic.c   |  2 +-
>  4 files changed, 38 insertions(+), 7 deletions(-)
> 
> diff --git a/include/linux/rtnetlink.h b/include/linux/rtnetlink.h
> index c006cc900c44..2daece8979f7 100644
> --- a/include/linux/rtnetlink.h
> +++ b/include/linux/rtnetlink.h
[...]
> diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
> index d69c4644f8f2..eb49ca24274a 100644
> --- a/net/core/rtnetlink.c
> +++ b/net/core/rtnetlink.c
> @@ -71,9 +71,31 @@ void rtnl_lock(void)
>  }
>  EXPORT_SYMBOL(rtnl_lock);
>  
> +static struct sk_buff *defer_kfree_skb_list;
> +void rtnl_kfree_skbs(struct sk_buff *head, struct sk_buff *tail)
> +{
> + if (head && tail) {
> + tail->next = defer_kfree_skb_list;
> + defer_kfree_skb_list = head;
> + }
> +}
> +EXPORT_SYMBOL(rtnl_kfree_skbs);
> +
>  void __rtnl_unlock(void)
>  {
> + struct sk_buff *head = defer_kfree_skb_list;
> +
> + defer_kfree_skb_list = NULL;
> +
>   mutex_unlock(&rtnl_mutex);
> +
> + while (head) {
> + struct sk_buff *next = head->next;
> +
> + kfree_skb(head);
> + cond_resched();
> + head = next;
> + }
>  }

This looks a lot like kfree_skb_list()

What about bulk free'ing SKBs here?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH] mac80211_hwsim: Allow wmediumd to attach to radios created in its netns

2016-06-15 Thread Martin Willi
> 
> >  printk(KERN_INFO "mac80211_hwsim: wmediumd released netlink"
> >         " socket, switching to perfect channel medium\n");

> I wonder if we can do something better about them? Or perhaps if we
> should remove them, so other namespaces won't mess up the kernel log

This is in fact not very nice, but not specific to hwsim. Any namespace
can mess up the kernel log from different (networking) subsystems. This
has been discussed some time ago [1], but AFAIK there is no real
solution so far.

For this patch I think we have the following options:
 * Keep the printk() messages as proposed
 * Remove those callable from non-initial namespaces completely
 * Suppress them when called from non-initial namespaces
 * Include the associated "netgroup" in the message

I personally would prefer the first option, as this problem is not
specific to hwsim or mac80211, but many subsystems. So we certainly can
add some work-around, but there is not much to gain if other modules
don't.

Regards
Martin


[1]https://lwn.net/Articles/527342/


Re: [PATCH net-next 01/10] net_sched: add the ability to defer skb freeing

2016-06-15 Thread Eric Dumazet
On Wed, Jun 15, 2016 at 5:33 AM, Jesper Dangaard Brouer wrote:
> On Mon, 13 Jun 2016 20:21:50 -0700
>
>
> This looks a lot like kfree_skb_list()
>
> What about bulk free'ing SKBs here?

We are in a very slow path here. Once in a while a potentially huge
qdisc is dismantled.

The important point is to free these skbs outside of any mutex/lock/bh,
not to gain 5% on the actual freeing ;)

Now if you have a use case where these operations happen often enough,
then I believe you have a problem you need to fix!
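
For readers following the thread, the deferred-freeing idea can be sketched
in user space. This is only an illustration under stated assumptions: a
pthread mutex stands in for RTNL, struct node stands in for struct sk_buff,
and the names big_lock, defer_free(), unlock_and_free() and make_chain() are
hypothetical, not kernel API. The point it shows is the one made above:
detach the list while the lock is held, then free only after unlocking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: a singly linked node. */
struct node {
	struct node *next;
	int id;
};

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER;
static struct node *defer_list;	/* protected by big_lock */
static int freed_count;		/* counts nodes freed outside the lock */

/* Queue the chain head..tail for later freeing; caller holds big_lock. */
static void defer_free(struct node *head, struct node *tail)
{
	if (head && tail) {
		tail->next = defer_list;
		defer_list = head;
	}
}

/* Detach the pending list while locked, unlock, then free the nodes. */
static void unlock_and_free(void)
{
	struct node *head = defer_list;

	defer_list = NULL;
	pthread_mutex_unlock(&big_lock);

	while (head) {
		struct node *next = head->next;

		free(head);
		freed_count++;
		head = next;
	}
}

/* Build a chain of n nodes; returns the head and stores the tail. */
static struct node *make_chain(int n, struct node **tail)
{
	struct node *head = NULL;

	*tail = NULL;
	for (int i = 0; i < n; i++) {
		struct node *nd = malloc(sizeof(*nd));

		assert(nd != NULL);
		nd->id = i;
		nd->next = head;
		if (!head)
			*tail = nd;	/* first node created ends the chain */
		head = nd;
	}
	return head;
}
```

A caller would take big_lock, hand its chain to defer_free(), and finish
with unlock_and_free(), so the potentially long free loop never runs with
the lock held.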


Re: [PATCH] netfilter/nflog: nflog-range does not truncate packets

2016-06-15 Thread Pablo Neira Ayuso
On Sun, Jun 12, 2016 at 11:40:57PM -0400, Vishwanath Pai wrote:
> On 06/09/2016 01:57 PM, Vishwanath Pai wrote:
> > On 06/08/2016 08:16 AM, Pablo Neira Ayuso wrote:
> >> Looking again at your code:
> >>
> >> case NFULNL_COPY_PACKET:
> >> -   if (inst->copy_range > skb->len)
> >> +   data_len = inst->copy_range;
> >> +   if (li->u.ulog.copy_len < data_len)
> >> +   data_len = li->u.ulog.copy_len;
> >>
> >> data_len is set to instance's copy_range.
> >>
> >> But then, if the NFLOG rule indicates smaller copy_len, you use this
> >> value. So to my understanding, NFLOG rule prevails over instance's
> >> copy_range in what matters, which is to shrink the copy range.
> >>
>  --nflog-range will not override the per-instance default,
>  the only time it would get preference is when its value is lesser than
>  the per-instance value. If copy_range is lesser than --nflog-range then
>  we retain copy_range.
> 
>  So basically what we are doing is min(copy_range, nflog-range).
>  Just wanted to clarify this, if this is not how it's meant to be
>  please let me know.
> 
>  Also, there is a bug in my patch, li->u.ulog.copy_len can be set to "0"
>  from userspace (if --nflog-range is not specified), so we have to check
>  for this condition before using the value. I will send a V2 of the patch
>  based on your reply.
> >> Currently, li->u.ulog.copy_len is set to "0" by default when not
> >> specified.
> >>
> >> But copy_len = 0 is a valid possibility, so this looks a bit more
> >> tricky to me to fix since we may need to get flags here to know when
> >> this is set.
> >>
> >> Probably something like:
> >>
> >> if (li->flags & NF_LOG_F_COPY_RANGE)
> >> data_len = li->u.ulog.copy_len;
> >> /* Per-instance copy range prevails over global per-rule option. */
> >> if (data_len < inst->copy_range)
> >> data_len = inst->copy_range;
> >> if (data_len > skb->len)
> >> data_len = skb->len;
> >>
> >> Although this would require a bit more code to introduce these flags.
> >>
> >> You will also need a new flag for xt_NFLOG:
> >>
> >> #define XT_NFLOG_COPY_LEN   0x2
> >>
> >> it seems other XT_NFLOG_* flags were already in place.
> >>
> >> Interesting that nobody noticed this for so long BTW.
> > 
> > I tried this out, I added two flags: one for XT_NFLOG to notify the
> > kernel when --nflog-range is set by the user, and another flag for
> > nfnetlink_log to pass this on to nfulnl_log_packet. This design works
> > fine but while testing this I found a problem.
> > 
> > Let's say --nflog-range is set to 200, and the instance copy_range is set
> > to 100. According to the code above the final value of data_len will be
> > 200 so we try to pack 200 bytes into the skb. But somewhere between
> > nfnetlink_log to dumpcap the packet is getting truncated and dumpcap
> > doesn't like this:
> > 
> > $> dumpcap -y NFLOG -i nflog:5 -s 100
> > Capturing on 'nflog:5'
> > File: /tmp/wireshark_pcapng_nflog-5_20160609133531_pi6MrS
> > dumpcap: Error while capturing packets: Message truncated: (got: 228)
> > (nlmsg_len: 320)
> > Please report this to the Wireshark developers.
> > https://bugs.wireshark.org/
> > (This is not a crash; please do not report it as such.)
> > Packets captured: 0
> > Packets received/dropped on interface 'nflog:5': 0/0
> > (pcap:0/dumpcap:0/flushed:0/ps_ifdrop:0) (0.0%)
> > 
> > I'm trying to figure out where the packet is getting truncated.
> > 
> 
> I found where the problem is. This is the userspace code for libpcap:
> 
> do {
> len = recv(handle->fd, handle->buffer, handle->bufsize, 0);
> if (handle->break_loop) {
> handle->break_loop = 0;
> return -2;
> }
> } while ((len == -1) && (errno == EINTR));
> 
>...
> 
> buf = handle->buffer;
> while (len >= NLMSG_SPACE(0)) {
> const struct nlmsghdr *nlh = (const struct nlmsghdr *) buf;
> u_int32_t msg_len;
> nftype_t type = OTHER;
> 
> if (nlh->nlmsg_len < sizeof(struct nlmsghdr) || len <
> nlh->nlmsg_len) {
> snprintf(handle->errbuf, PCAP_ERRBUF_SIZE,
> "Message truncated: (got: %d) (nlmsg_len: %u)", len, nlh->nlmsg_len);
> return -1;
> }
>
> handle->bufsize is only big enough to accommodate the snaplen specified
> by the user in dumpcap. So if we send more data than that then we will
> break userspace. One way around this is to send min(li->u.ulog.copy_len,
> inst->copy_range). If this is OK then I can send a patch for this,
> please suggest.
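
As an illustration of the length selection being discussed, here is a
minimal stand-alone sketch. It is not the nfnetlink_log code: the function
nflog_data_len() and the flag RULE_HAS_COPY_LEN are hypothetical names, and
the sketch assumes the combination of the two ideas in this thread, namely
that a rule value counts only when it was explicitly set (so that 0 stays a
valid value) and that the smaller of the two limits wins.

```c
#include <assert.h>

/* Hypothetical flag: the rule carried an explicit --nflog-range value. */
#define RULE_HAS_COPY_LEN 0x1

/* Pick how many payload bytes to copy to user space. */
static unsigned int nflog_data_len(unsigned int inst_copy_range,
				   unsigned int rule_copy_len,
				   unsigned int rule_flags,
				   unsigned int skb_len)
{
	unsigned int data_len = inst_copy_range;

	/* Only an explicitly set rule value may shrink the range. */
	if ((rule_flags & RULE_HAS_COPY_LEN) && rule_copy_len < data_len)
		data_len = rule_copy_len;

	/* Never copy more than the packet holds. */
	if (data_len > skb_len)
		data_len = skb_len;

	assert(data_len <= inst_copy_range && data_len <= skb_len);
	return data_len;
}
```

With these assumptions the result never exceeds the per-instance
copy_range, which is the property the libpcap buffer sizing above relies
on.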

But nlmsg_len should match len in this.

If we're just sending a part of the packet to userspace, then we
should adjust nlmsg_len to indicate exactly the netlink message length
that 

Re: [very-RFC 7/8] AVB ALSA - Add ALSA shim for TSN

2016-06-15 Thread Richard Cochran
On Wed, Jun 15, 2016 at 02:13:03PM +0200, Henrik Austad wrote:
> On Wed, Jun 15, 2016 at 01:49:08PM +0200, Richard Cochran wrote:
> And how would v4l2 benefit from this being in alsalib? Should we require 
> both V4L and ALSA to implement the same, or should we place it in a common 
> place for all.

I don't require V4L to implement anything.  You were the one wanting
AVB "devices".  Go ahead and do that, but in user space please.  We
don't want to have kernel code that sends arbitrary Layer2 and UDP
media packets.

The example you present of using aplay over the network is a cute
idea, but after all it is a trivial application.  I have nothing
against creating virtual ALSA devices (if done properly), but that
model won't work for more realistic AVB networks.

> And what about those systems that want to use TSN but are not
> media devices? They should be given a raw socket to send traffic over;
> should they also implement something in a library?

A raw socket is the way to do it, yes.

Since TSN is about bandwidth reservation and time triggered Ethernet,
decoupled from the contents of the streams, each new TSN application
will have to see what works best.  If common ground is found, then a
library makes sense to do.

At this point, it is way too early to guess how that will look.  But
media packet formats clearly belong in user space.

> So no, here I think configfs is an apt choice.
> 
> > Heck, if done properly, your layer could discover the AVB nodes in the
> > network and present each one as a separate device...
> 
> No, you definitely do not want the kernel to automagically add devices 
> whenever something pops up on the network, for this you need userspace to 
> be in control. 1722.1 should not be handled in-kernel.

The layer should be in user space.  Alsa-lib *is* user space.

Thanks,
Richard


[PATCH] libertas: Add spinlock to avoid race condition

2016-06-15 Thread Pavel Andrianov
lbs_mac_event_disconnected may free priv->currenttxskb
while lbs_hard_start_xmit accesses it.
The patch adds a spinlock for mutual exclusion.

Tested on OLPC XO-1 (usb8388) and XO-1.5 (sd8686) with v4.7-rc3.

Confirmed that lbs_mac_event_disconnected is being called on the
station when hostapd on access point is given SIGHUP.

Signed-off-by: Pavel 
Tested-by: James Cameron 
---
 drivers/net/wireless/marvell/libertas/cmdresp.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/net/wireless/marvell/libertas/cmdresp.c b/drivers/net/wireless/marvell/libertas/cmdresp.c
index c95bf6d..c67ae07 100644
--- a/drivers/net/wireless/marvell/libertas/cmdresp.c
+++ b/drivers/net/wireless/marvell/libertas/cmdresp.c
@@ -27,6 +27,8 @@
 void lbs_mac_event_disconnected(struct lbs_private *priv,
bool locally_generated)
 {
+   unsigned long flags;
+
if (priv->connect_status != LBS_CONNECTED)
return;
 
@@ -46,9 +48,11 @@ void lbs_mac_event_disconnected(struct lbs_private *priv,
netif_carrier_off(priv->dev);
 
/* Free Tx and Rx packets */
+   spin_lock_irqsave(&priv->driver_lock, flags);
kfree_skb(priv->currenttxskb);
priv->currenttxskb = NULL;
priv->tx_pending_len = 0;
+   spin_unlock_irqrestore(&priv->driver_lock, flags);
 
priv->connect_status = LBS_DISCONNECTED;
 
-- 
1.7.11.7



Re: [PATCH] mac80211_hwsim: Allow wmediumd to attach to radios created in its netns

2016-06-15 Thread Johannes Berg
On Wed, 2016-06-15 at 14:37 +0200, Martin Willi wrote:
> > 
> > 
> > >  printk(KERN_INFO "mac80211_hwsim: wmediumd released netlink"
> > >         " socket, switching to perfect channel medium\n");
> 
> > I wonder if we can do something better about them? Or perhaps if we
> > should remove them, so other namespaces won't mess up the kernel
> > log
> 
> This is in fact not very nice, but not specific to hwsim. Any
> namespace
> can mess up the kernel log from different (networking) subsystems.
> This
> has been discussed some time ago [1], but AFAIK there is no real
> solution so far.
> 
> For this patch I think we have the following options:
>  * Keep the printk() messages as proposed
>  * Remove those callable from non-initial namespaces completely
>  * Suppress them when called from non-initial namespaces
>  * Include the associated "netgroup" in the message
> 
> I personally would prefer the first option, as this problem is not
> specific to hwsim or mac80211, but many subsystems. So we certainly
> can
> add some work-around, but there is not much to gain if other modules
> don't.
> 

Fair enough.

johannes


Re: [PATCH] net: Fragment large datagrams even when IP_HDRINCL is set.

2016-06-15 Thread Hannes Frederic Sowa
On 31.05.2016 20:39, David Miller wrote:
> From: Alan Davey 
> Date: Mon, 23 May 2016 15:23:45 +0100
> 
>> One of the bugs documented in the raw(7) man page is as follows: When the
>> IP_HDRINCL option is set, datagrams will not be fragmented and are limited to
>> the interface MTU.
>>
>> This patch fixes the bug by removing the check for
>> "length > rt->dst.dev->mtu" in raw_send_hdrinc() (net/ipv4/raw.c).
>> Datagrams are no longer limited to the interface MTU size if the
>> IP_HDRINCL option is set, but are fragmented, if necessary, in the same
>> way as all other datagrams.
>>
>> Signed-off-by: Alan Davey 
> 
> This is not a bug, it's a feature and it's how RAW ipv4 sockets have behaved
> for two decades.
> 
> If the user wants to use hdr inclusion, he can send multiple frames and set
> the fragmentation bits appropriately.

Actually this is difficult, as only the kernel is in control of the
IP_ID generator and it should never leak its state to user space.

The probability of IP_ID collisions would increase drastically because
of this, even more so if different programs do this concurrently.

Basically those checks are pretty much unnecessary anyway, if we set raw
sockets by default into inet->pmtudisc IP_PMTUDISC_DO mode, so a user can
easily control the fragmentation on a raw socket themselves based on
already provided infrastructure.

Bye,
Hannes



Re: [PATCH] libertas: Add spinlock to avoid race condition

2016-06-15 Thread Vaishali Thakkar


On Wednesday 15 June 2016 05:04 PM, Pavel Andrianov wrote:
> lbs_mac_event_disconnected may free priv->currenttxskb
> while lbs_hard_start_xmit accesses it.
> The patch adds a spinlock for mutual exclusion.
> 
> Tested on OLPC XO-1 (usb8388) and XO-1.5 (sd8686) with v4.7-rc3.
> 
> Confirmed that lbs_mac_event_disconnected is being called on the
> station when hostapd on access point is given SIGHUP.
> 
> Signed-off-by: Pavel 
> Tested-by: James Cameron 

Looks good to me.

Acked-by: Vaishali Thakkar 

> ---
>  drivers/net/wireless/marvell/libertas/cmdresp.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/drivers/net/wireless/marvell/libertas/cmdresp.c b/drivers/net/wireless/marvell/libertas/cmdresp.c
> index c95bf6d..c67ae07 100644
> --- a/drivers/net/wireless/marvell/libertas/cmdresp.c
> +++ b/drivers/net/wireless/marvell/libertas/cmdresp.c
> @@ -27,6 +27,8 @@
>  void lbs_mac_event_disconnected(struct lbs_private *priv,
>   bool locally_generated)
>  {
> + unsigned long flags;
> +
>   if (priv->connect_status != LBS_CONNECTED)
>   return;
>  
> @@ -46,9 +48,11 @@ void lbs_mac_event_disconnected(struct lbs_private *priv,
>   netif_carrier_off(priv->dev);
>  
>   /* Free Tx and Rx packets */
> + spin_lock_irqsave(&priv->driver_lock, flags);
>   kfree_skb(priv->currenttxskb);
>   priv->currenttxskb = NULL;
>   priv->tx_pending_len = 0;
> + spin_unlock_irqrestore(&priv->driver_lock, flags);
>  
>   priv->connect_status = LBS_DISCONNECTED;
>  
> 

-- 
Vaishali


[PATCH v2 net] gre: fix error handler

2016-06-15 Thread Eric Dumazet
From: Eric Dumazet 

1) gre_parse_header() can be called from gre_err()

   At this point transport header points to ICMP header, not the inner
   header.

2) We can not really change transport header as ipgre_err() will later
   assume transport header still points to ICMP header (using icmp_hdr())

3) pskb_may_pull() logic in gre_parse_header() only really works
   if we are interested in the zone pointed to by skb->data

4) As Jiri explained in commit b7f8fe251e46 ("gre: do not pull header in
   ICMP error processing") we should not pull headers in error handler.

So this fix:

A) changes gre_parse_header() to use skb->data instead of
   skb_transport_header()

B) adds a nhs parameter to gre_parse_header() so that we can skip the
   not pulled IP header from error path.
   This offset is 0 for normal receive path.

C) removes obsolete IPV6 includes
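
To make points A) and B) concrete, here is a small user-space sketch of
parsing a header at "data + nhs". It is a simplification under stated
assumptions, not the kernel function: struct gre_base, parse_gre_at() and
ipv4_hdr_len() are hypothetical names, a plain byte buffer stands in for
skb->data, and a length check stands in for pskb_may_pull().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the GRE base header, in host byte order. */
struct gre_base {
	uint16_t flags;
	uint16_t protocol;
};

/*
 * Parse the 4-byte GRE base header located nhs bytes into buf.
 * Returns 0 and fills *out on success, -1 if the buffer is too short.
 */
static int parse_gre_at(const uint8_t *buf, size_t len, size_t nhs,
			struct gre_base *out)
{
	assert(buf != NULL && out != NULL);

	/* Length check covers the skipped bytes plus the header itself. */
	if (len < nhs + 4)
		return -1;

	out->flags = (uint16_t)((buf[nhs] << 8) | buf[nhs + 1]);
	out->protocol = (uint16_t)((buf[nhs + 2] << 8) | buf[nhs + 3]);
	return 0;
}

/* IPv4 header length in bytes from the IHL nibble, like iph->ihl * 4. */
static size_t ipv4_hdr_len(const uint8_t *buf)
{
	return (size_t)(buf[0] & 0x0f) * 4;
}
```

On the error path the caller would pass ipv4_hdr_len(buf) as nhs, because
the buffer still starts at the IP header; on the normal receive path it
would pass 0.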

Signed-off-by: Eric Dumazet 
Cc: Tom Herbert 
Cc: Maciej Żenczykowski 
Cc: Jiri Benc 
---
 include/net/gre.h|2 +-
 net/ipv4/gre_demux.c |   10 +-
 net/ipv4/ip_gre.c|   12 
 net/ipv6/ip6_gre.c   |2 +-
 4 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/include/net/gre.h b/include/net/gre.h
index 5dce30a6abe3..7a54a31d1d4c 100644
--- a/include/net/gre.h
+++ b/include/net/gre.h
@@ -26,7 +26,7 @@ int gre_del_protocol(const struct gre_protocol *proto, u8 version);
 struct net_device *gretap_fb_dev_create(struct net *net, const char *name,
   u8 name_assign_type);
 int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
-bool *csum_err, __be16 proto);
+bool *csum_err, __be16 proto, int nhs);
 
 static inline int gre_calc_hlen(__be16 o_flags)
 {
diff --git a/net/ipv4/gre_demux.c b/net/ipv4/gre_demux.c
index 4c39f4fd332a..de1d119a4497 100644
--- a/net/ipv4/gre_demux.c
+++ b/net/ipv4/gre_demux.c
@@ -62,26 +62,26 @@ EXPORT_SYMBOL_GPL(gre_del_protocol);
 
 /* Fills in tpi and returns header length to be pulled. */
 int gre_parse_header(struct sk_buff *skb, struct tnl_ptk_info *tpi,
-bool *csum_err, __be16 proto)
+bool *csum_err, __be16 proto, int nhs)
 {
const struct gre_base_hdr *greh;
__be32 *options;
int hdr_len;
 
-   if (unlikely(!pskb_may_pull(skb, sizeof(struct gre_base_hdr))))
+   if (unlikely(!pskb_may_pull(skb, nhs + sizeof(struct gre_base_hdr))))
return -EINVAL;
 
-   greh = (struct gre_base_hdr *)skb_transport_header(skb);
+   greh = (struct gre_base_hdr *)(skb->data + nhs);
if (unlikely(greh->flags & (GRE_VERSION | GRE_ROUTING)))
return -EINVAL;
 
tpi->flags = gre_flags_to_tnl_flags(greh->flags);
hdr_len = gre_calc_hlen(tpi->flags);
 
-   if (!pskb_may_pull(skb, hdr_len))
+   if (!pskb_may_pull(skb, nhs + hdr_len))
return -EINVAL;
 
-   greh = (struct gre_base_hdr *)skb_transport_header(skb);
+   greh = (struct gre_base_hdr *)(skb->data + nhs);
tpi->proto = greh->protocol;
 
options = (__be32 *)(greh + 1);
diff --git a/net/ipv4/ip_gre.c b/net/ipv4/ip_gre.c
index 07c5cf1838d8..1d000af7f561 100644
--- a/net/ipv4/ip_gre.c
+++ b/net/ipv4/ip_gre.c
@@ -49,12 +49,6 @@
 #include 
 #include 
 
-#if IS_ENABLED(CONFIG_IPV6)
-#include 
-#include 
-#include 
-#endif
-
 /*
Problems & solutions

@@ -217,12 +211,14 @@ static void gre_err(struct sk_buff *skb, u32 info)
 * by themselves???
 */
 
+   const struct iphdr *iph = (struct iphdr *)skb->data;
const int type = icmp_hdr(skb)->type;
const int code = icmp_hdr(skb)->code;
struct tnl_ptk_info tpi;
bool csum_err = false;
 
-   if (gre_parse_header(skb, &tpi, &csum_err, htons(ETH_P_IP)) < 0) {
+   if (gre_parse_header(skb, &tpi, &csum_err, htons(ETH_P_IP),
+iph->ihl * 4) < 0) {
if (!csum_err)  /* ignore csum errors. */
return;
}
@@ -338,7 +334,7 @@ static int gre_rcv(struct sk_buff *skb)
}
 #endif
 
-   hdr_len = gre_parse_header(skb, &tpi, &csum_err, htons(ETH_P_IP));
+   hdr_len = gre_parse_header(skb, &tpi, &csum_err, htons(ETH_P_IP), 0);
if (hdr_len < 0)
goto drop;
 
diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index fdc9de276ab1..776d145113e1 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -468,7 +468,7 @@ static int gre_rcv(struct sk_buff *skb)
bool csum_err = false;
int hdr_len;
 
-   hdr_len = gre_parse_header(skb, &tpi, &csum_err, htons(ETH_P_IPV6));
+   hdr_len = gre_parse_header(skb, &tpi, &csum_err, htons(ETH_P_IPV6), 0);
if (hdr_len < 0)
goto drop;
 




[PATCH net] ixgbe: napi_poll must return the work done

2016-06-15 Thread Paolo Abeni
Currently the function ixgbe_poll() returns 0 when it completely cleans
the rx rings, but this fouls budget accounting in core code.
Fix this by returning the actual work done, capped to weight - 1, since
the core doesn't allow returning the full budget when the driver modifies
the napi status.
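
The return-value rule described above can be written down as a tiny
stand-alone function. This is a sketch only: poll_return_value() and its
ring_cleaned argument are hypothetical names for illustration, not part of
ixgbe or the NAPI core.

```c
#include <assert.h>

/* Hypothetical helper: the value a poll routine reports to the core. */
static int poll_return_value(int work_done, int budget, int ring_cleaned)
{
	assert(budget > 0);

	/* Rings not drained: report the full budget to be polled again. */
	if (!ring_cleaned)
		return budget;

	/* Polling completed: report real work, strictly below budget. */
	return work_done < budget - 1 ? work_done : budget - 1;
}
```

The cap matters only in the corner case where the rings were drained
while consuming the whole budget; in every other completed poll the actual
work done is returned unchanged.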

Signed-off-by: Paolo Abeni 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 088c47c..8bebd86 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2887,7 +2887,7 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
if (!test_bit(__IXGBE_DOWN, &adapter->state))
ixgbe_irq_enable_queues(adapter, BIT_ULL(q_vector->v_idx));
 
-   return 0;
+   return min(work_done, budget - 1);
 }
 
 /**
-- 
1.8.3.1



[PATCH 0/5] genirq: threadable IRQ support

2016-06-15 Thread Paolo Abeni
This patch series adds a new genirq interface that allows user space to change
the IRQ mode at runtime, switching to and from the threaded mode.

The configuration is performed on a per-irqaction basis, by writing into the
newly added procfs entry /proc/irq///threaded. Such an entry
is created at IRQ request time, only if CONFIG_IRQ_FORCED_THREADING
is defined.

Upon IRQ creation, the device handling such an IRQ may optionally provide, via
the newly added API irq_set_mode_notifier(), an additional callback to be
notified about IRQ mode changes.
The device can use this callback to configure its internal state to behave
differently in threaded mode and in normal mode if required.

Additional IRQ flags are added to let the device specify some default
aspects of the IRQ thread. The device can request a SCHED_NORMAL scheduling
policy and avoid the affinity setting for the IRQ thread. Both of these
options are beneficial for the first threadable IRQ user.

The initial user for this feature is the networking subsystem; some
infrastructure is added to the network core for this purpose. A new napi field
storing an IRQ thread reference is used to mark a NAPI instance as threaded,
and __napi_schedule is modified to invoke a poll loop directly instead of
raising a softirq when the related NAPI instance is in threaded mode. In
addition, an IRQ_mode_set callback is provided to notify the NAPI instance of
the IRQ mode change.

Each network device driver must be migrated explicitly to leverage the new
infrastructure. In this patch series, the Intel ixgbe driver is updated to
invoke irq_set_mode_notifier() only when using MSI-X IRQs.
This avoids other IRQ events being delayed indefinitely when the rx IRQ is
processed in threaded mode. The default behavior after the driver migration is
unchanged.

Running the rx packet processing inside a conventional kthread is beneficial
for different workloads, since it allows the process scheduler to make good use
of the available resources. With multiqueue NICs, the ksoftirqd design does not
allow any running process to use 100% of a single CPU under relevant network
load, because the softirq poll loop will be scheduled on each CPU.

The above can be experienced in a hypervisor/VMs scenario, when the guest is
under UDP flood. If the hypervisor's NIC has enough rx queues, the guest will
compete with ksoftirqd on each CPU. Moreover, since the ksoftirqd CPU
utilization changes with the ingress traffic, the scheduler tries to migrate the
guest processes towards the CPUs with the highest capacity, further impacting
the guest's ability to process rx packets.

Running the hypervisor rx packet processing inside a migratable kthread allows
the process scheduler to let the guest process[es] fully use a single
core each, migrating some rx threads as required.

The raw numbers, obtained with the netperf UDP_STREAM test, using a tun
device with a noqueue qdisc in the hypervisor, and using random IP addresses
as source in case of multiple flows, are as follows:

            vanilla   threaded
size/flow   kpps      kpps/delta
1/1         824       843/+2%
1/25        736       906/+23%
1/50        752       906/+20%
1/100       772       906/+17%
1/200       741       976/+31%
64/1        829       840/+1%
64/25       711       932/+31%
64/50       780       894/+14%
64/100      754       946/+25%
64/200      714       945/+32%
256/1       702       510/-27%
256/25      724       894/+23%
256/50      739       889/+20%
256/100     798       873/+9%
256/200     812       907/+11%
1400/1      720       727/+1%
1400/25     826       826/0
1400/50     827       833/0
1400/100    820       820/0
1400/200    796       799/0

The guest runs 2 vCPUs, so it's not prone to the userspace livelock issue
recently exposed here: http://thread.gmane.org/gmane.linux.kernel/2218719

There are relevant improvements in all CPU-bound scenarios with multiple flows
and a significant regression with medium-size packets and a single flow. The
latter is due to the increased 'burstiness' of packet processing, which causes
the single socket in the guest to overflow more easily if the receiver
application is scheduled on the same CPU that processes the incoming packets.

The kthread approach should give several new advantages over the softirq
based approach:

* moving into a more DPDK-like busy poll packet processing direction:
we can even use busy polling without the need of a connected UDP or TCP
socket and can leverage busy polling for forwarding setups. This could
very well improve latency and packet throughput without hurting other
processes if the networking stack gets more and more preemptive in the
future.

* possibility to acquire mutexes in the networking processing path: e.g.
we would need that to configure hw_breakpoints if we want to add
watchpoi

[PATCH 4/5] netdev: implement infrastructure for threadable napi irq

2016-06-15 Thread Paolo Abeni
This commit adds the infrastructure needed for threadable
rx interrupts. A reference to the irq thread is used to
mark the threaded irq mode.
In threaded mode the poll loop is invoked directly from
__napi_schedule().
NAPI drivers that want to support threadable irqs
must provide an irq mode change handler that actually sets
napi->thread, and register it after requesting the irq.

Signed-off-by: Paolo Abeni 
Signed-off-by: Hannes Frederic Sowa 
---
 include/linux/netdevice.h |  4 
 net/core/dev.c            | 59 +++
 2 files changed, 63 insertions(+)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d101e4d..5da53be 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -322,6 +322,9 @@ struct napi_struct {
struct list_head    dev_list;
struct hlist_node   napi_hash_node;
unsigned int        napi_id;
+#ifdef CONFIG_IRQ_FORCED_THREADING
+   struct task_struct  *thread;
+#endif
 };
 
 enum {
@@ -330,6 +333,7 @@ enum {
NAPI_STATE_NPSVC,   /* Netpoll - don't dequeue from poll_list */
NAPI_STATE_HASHED,  /* In NAPI hash (busy polling possible) */
NAPI_STATE_NO_BUSY_POLL,/* Do not add in napi_hash, no busy polling */
+   NAPI_STATE_SCHED_THREAD, /* The poll thread is scheduled */
 };
 
 enum gro_result {
diff --git a/net/core/dev.c b/net/core/dev.c
index b148357..40ea1e7 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -93,6 +93,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -3453,10 +3454,68 @@ int netdev_tstamp_prequeue __read_mostly = 1;
 int netdev_budget __read_mostly = 300;
 int weight_p __read_mostly = 64;/* old backlog weight */
 
+#if CONFIG_IRQ_FORCED_THREADING
+static int napi_poll(struct napi_struct *n, struct list_head *repoll);
+
+static void napi_threaded_poll(struct napi_struct *napi)
+{
+   unsigned long time_limit = jiffies + 2;
+   struct list_head dummy_repoll;
+   int budget = netdev_budget;
+   bool again = true;
+
+   if (test_and_set_bit(NAPI_STATE_SCHED_THREAD, &napi->state))
+   return;
+
+   local_irq_enable();
+   INIT_LIST_HEAD(&dummy_repoll);
+
+   while (again) {
+   /* ensure that the poll list is not empty */
+   if (list_empty(&dummy_repoll))
+   list_add(&napi->poll_list, &dummy_repoll);
+
+   budget -= napi_poll(napi, &dummy_repoll);
+
+   if (napi_disable_pending(napi))
+   again = false;
+   else if (!test_bit(NAPI_STATE_SCHED, &napi->state))
+   again = false;
+   else if (kthread_should_stop())
+   again = false;
+
+   if (!again || unlikely(budget <= 0 ||
+  time_after_eq(jiffies, time_limit))) {
+   /* no need to reschedule if we are going to stop */
+   if (again)
+   cond_resched_softirq();
+   time_limit = jiffies + 2;
+   budget = netdev_budget;
+   rcu_bh_qs();
+   __kfree_skb_flush();
+   }
+   }
+
+   clear_bit(NAPI_STATE_SCHED_THREAD, &napi->state);
+   local_irq_disable();
+}
+
+static inline bool napi_is_threaded(struct napi_struct *napi)
+{
+   return current == napi->thread;
+}
+#else
+#define napi_is_threaded(napi) 0
+#endif
+
 /* Called with irq disabled */
 static inline void napi_schedule(struct softnet_data *sd,
 struct napi_struct *napi)
 {
+   if (napi_is_threaded(napi)) {
+   napi_threaded_poll(napi);
+   return;
+   }
list_add_tail(&napi->poll_list, &sd->poll_list);
__raise_softirq_irqoff(NET_RX_SOFTIRQ);
 }
-- 
1.8.3.1



[PATCH 3/5] sched/preempt: cond_resched_softirq() must check for softirq

2016-06-15 Thread Paolo Abeni
Currently cond_resched_softirq() fails to reschedule if there
are pending softirqs but no other running process. This happens
e.g. when receiving an interrupt with local bh disabled.

Reported-by: Eric Dumazet 
Signed-off-by: Paolo Abeni 
Signed-off-by: Hannes Frederic Sowa 
---
 kernel/sched/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 7f2cae4..788625f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4837,7 +4837,8 @@ int __sched __cond_resched_softirq(void)
 {
BUG_ON(!in_softirq());
 
-   if (should_resched(SOFTIRQ_DISABLE_OFFSET)) {
+   if (should_resched(SOFTIRQ_DISABLE_OFFSET) ||
+   local_softirq_pending()) {
local_bh_enable();
preempt_schedule_common();
local_bh_disable();
-- 
1.8.3.1
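
The effect of the one-line change above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions: struct
demo_cpu_state and both helper functions are hypothetical names, not kernel
API. They model "another task wants the CPU" and "a softirq is pending" as
plain booleans and compare the old and the new condition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the per-CPU state the check looks at. */
struct demo_cpu_state {
	bool need_resched;     /* another runnable task wants this CPU */
	bool softirq_pending;  /* a softirq was raised while bh was disabled */
};

/* Old condition: yield only when the scheduler asks for it. */
bool demo_old_should_yield(const struct demo_cpu_state *s)
{
	assert(s != NULL);
	return s->need_resched;
}

/* New condition: also yield when softirqs are pending, so they can run. */
bool demo_new_should_yield(const struct demo_cpu_state *s)
{
	assert(s != NULL);
	return s->need_resched || s->softirq_pending;
}
```

With a pending softirq and no other runnable task, only the new condition
yields, which is the case the commit message describes.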



[PATCH 1/5] genirq: implement support for runtime switch to threaded irqs

2016-06-15 Thread Paolo Abeni
When the IRQ_FORCED_THREADING compile option is enabled, a
new 'threaded' procfs entry is added under the action proc
directory upon irq request. Writing a true value to
that file will cause the underlying action to be reconfigured
in FORCE_THREADED mode.

The reconfiguration is performed by disabling the irq underlying
the current action, and then updating the action struct to the
specified mode, i.e. setting the thread field and the
IRQTF_FORCED_THREAD flag.

If an error occurs before notifying the device, the
irq action is left unmodified.

A device that wants to be notified about irq mode changes
can register a notifier with irq_set_mode_notifier(). Such a
notifier will be invoked in atomic context just after each
irq reconfiguration.

Signed-off-by: Paolo Abeni 
Signed-off-by: Hannes Frederic Sowa 
---
 include/linux/interrupt.h |  15
 kernel/irq/internals.h    |   3 +
 kernel/irq/manage.c       | 197 --
 kernel/irq/proc.c         |  51
 4 files changed, 261 insertions(+), 5 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 9fcabeb..85d3738 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -90,6 +90,7 @@ enum {
 };
 
 typedef irqreturn_t (*irq_handler_t)(int, void *);
+typedef void (*mode_notifier_t)(int, void *, struct task_struct *);
 
 /**
  * struct irqaction - per interrupt action descriptor
@@ -106,6 +107,8 @@ typedef irqreturn_t (*irq_handler_t)(int, void *);
  * @thread_flags:  flags related to @thread
  * @thread_mask:   bitmask for keeping track of @thread activity
  * @dir:   pointer to the proc/irq/NN/name entry
+ * @mode_notifier: callback to notify the device about irq mode change
+ * (threaded vs normal mode)
  */
 struct irqaction {
irq_handler_t   handler;
@@ -121,6 +124,7 @@ struct irqaction {
unsigned long   thread_mask;
const char  *name;
struct proc_dir_entry   *dir;
+   mode_notifier_t mode_notifier;
 } cacheline_internodealigned_in_smp;
 
 extern irqreturn_t no_action(int cpl, void *dev_id);
@@ -212,6 +216,17 @@ extern void irq_wake_thread(unsigned int irq, void *dev_id);
 extern void suspend_device_irqs(void);
 extern void resume_device_irqs(void);
 
+#ifdef CONFIG_IRQ_FORCED_THREADING
+extern int irq_set_mode_notifier(unsigned int irq, void *dev_id,
+mode_notifier_t notifier);
+#else
+static inline int
+irq_set_mode_notifier(unsigned int irq, void *dev_id, mode_notifier_t notifier)
+{
+   return 0;
+}
+#endif
+
 /**
  * struct irq_affinity_notify - context for notification of IRQ affinity changes
  * @irq:   Interrupt to which notification applies
diff --git a/kernel/irq/internals.h b/kernel/irq/internals.h
index 09be2c9..841c714 100644
--- a/kernel/irq/internals.h
+++ b/kernel/irq/internals.h
@@ -105,6 +105,9 @@ static inline void unregister_handler_proc(unsigned int irq,
   struct irqaction *action) { }
 #endif
 
+extern int irq_reconfigure(unsigned int irq, struct irqaction *act,
+  bool threaded);
+
 extern int irq_select_affinity_usr(unsigned int irq, struct cpumask *mask);
 
 extern void irq_set_thread_affinity(struct irq_desc *desc);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index ef0bc02..cce4efd 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -938,8 +938,7 @@ static int irq_thread(void *data)
irqreturn_t (*handler_fn)(struct irq_desc *desc,
struct irqaction *action);
 
-   if (force_irqthreads && test_bit(IRQTF_FORCED_THREAD,
-   &action->thread_flags))
+   if (test_bit(IRQTF_FORCED_THREAD, &action->thread_flags))
handler_fn = irq_forced_thread_fn;
else
handler_fn = irq_thread_fn;
@@ -1052,8 +1051,8 @@ static void irq_release_resources(struct irq_desc *desc)
c->irq_release_resources(d);
 }
 
-static int
-setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
+static struct task_struct *
+create_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
 {
struct task_struct *t;
struct sched_param param = {
@@ -1070,7 +1069,7 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
}
 
if (IS_ERR(t))
-   return PTR_ERR(t);
+   return t;
 
sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
 
@@ -1080,6 +1079,17 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
 * references an already freed task_struct.
 */
get_task_struct(t);
+   return t;
+}
+
+static int
+setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
+{
+   struct task_struct *t = create_irq_thread(new, irq, secondary);
+
+   if (IS_E

[PATCH 2/5] genirq: add flags for controlling the default threaded irq behavior

2016-06-15 Thread Paolo Abeni
A threadable irq can benefit from irq_set_affinity when running
in non-threaded mode, and may prefer running unbound from any CPU when in
threaded mode. Setting the IRQF_TH_NO_AFFINITY flag on irq
registration allows the irq to achieve both behaviors.

A long running threaded irq can starve the system if scheduled under
SCHED_FIFO. Setting the IRQF_TH_SCHED_NORMAL flag on irq registration will
cause the irq thread to run by default under the SCHED_NORMAL scheduler.

Signed-off-by: Paolo Abeni 
Signed-off-by: Hannes Frederic Sowa 
---
 include/linux/interrupt.h |  6 ++
 kernel/irq/manage.c   | 17 +++--
 2 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h
index 85d3738..33c3033 100644
--- a/include/linux/interrupt.h
+++ b/include/linux/interrupt.h
@@ -61,6 +61,10 @@
  *interrupt handler after suspending interrupts. For system
  *wakeup devices users need to implement wakeup detection in
  *their interrupt handlers.
+ * IRQF_TH_SCHED_NORMAL - If the IRQ is threaded, it will use SCHED_NORMAL
+ *instead of the default SCHED_FIFO scheduler
+ * IRQF_TH_NO_AFFINITY - If the IRQ is threaded, the affinity hint will not be
+ *enforced in the IRQ thread
  */
 #define IRQF_SHARED            0x00000080
 #define IRQF_PROBE_SHARED      0x00000100
@@ -74,6 +78,8 @@
 #define IRQF_NO_THREAD         0x00010000
 #define IRQF_EARLY_RESUME      0x00020000
 #define IRQF_COND_SUSPEND      0x00040000
+#define IRQF_TH_SCHED_NORMAL   0x00080000
+#define IRQF_TH_NO_AFFINITY    0x00100000
 
 #define IRQF_TIMER (__IRQF_TIMER | IRQF_NO_SUSPEND | IRQF_NO_THREAD)
 
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index cce4efd..d695e12 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -1055,9 +1055,7 @@ static struct task_struct *
 create_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
 {
struct task_struct *t;
-   struct sched_param param = {
-   .sched_priority = MAX_USER_RT_PRIO/2,
-   };
+   struct sched_param param;
 
if (!secondary) {
t = kthread_create(irq_thread, new, "irq/%d-%s", irq,
@@ -1071,7 +1069,12 @@ create_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
if (IS_ERR(t))
return t;
 
-   sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
+   if (new->flags & IRQF_TH_SCHED_NORMAL) {
+       sched_setscheduler_nocheck(t, SCHED_NORMAL, &param);
+   } else {
+       param.sched_priority = MAX_USER_RT_PRIO/2;
+       sched_setscheduler_nocheck(t, SCHED_FIFO, &param);
+   }
 
/*
 * We keep the reference to the task struct even if
@@ -1100,7 +1103,8 @@ setup_irq_thread(struct irqaction *new, unsigned int irq, bool secondary)
 * correct as we want the thread to move to the cpu(s)
 * on which the requesting code placed the interrupt.
 */
-   set_bit(IRQTF_AFFINITY, &new->thread_flags);
+   if (!(new->flags & IRQF_TH_NO_AFFINITY))
+   set_bit(IRQTF_AFFINITY, &new->thread_flags);
return 0;
 }
 
@@ -1549,7 +1553,8 @@ void __irq_reconfigure_action(struct irq_desc *desc, struct irqaction *action,
 
action->thread = t;
set_bit(IRQTF_FORCED_THREAD, &action->thread_flags);
-   set_bit(IRQTF_AFFINITY, &action->thread_flags);
+   if (!(action->flags & IRQF_TH_NO_AFFINITY))
+   set_bit(IRQTF_AFFINITY, &action->thread_flags);
 
if (!(desc->irq_data.chip->flags & IRQCHIP_ONESHOT_SAFE)) {
/*
-- 
1.8.3.1
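
The policy selection in create_irq_thread() above can be sketched in
isolation. Note that, as posted, param appears to be passed uninitialized on
the SCHED_NORMAL path. The sketch below is a userspace model, not the kernel
code: every DEMO_ constant and demo_ name is hypothetical and the numeric
values are illustrative only. It zero-initializes the priority so that it is
well defined on both paths.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; the values are illustrative, not the kernel's. */
#define DEMO_TH_SCHED_NORMAL   0x1u
#define DEMO_POLICY_NORMAL     0
#define DEMO_POLICY_FIFO       1
#define DEMO_MAX_USER_RT_PRIO  100

struct demo_sched_choice {
	int policy;
	int priority;
};

/* Pick the scheduling policy for an irq thread from its request flags. */
void demo_pick_sched(unsigned int flags, struct demo_sched_choice *out)
{
	assert(out != NULL);

	/* Start from a fully defined state: normal policy, priority 0. */
	out->policy = DEMO_POLICY_NORMAL;
	out->priority = 0;

	/* Without the opt-in flag, keep the historical real-time default. */
	if (!(flags & DEMO_TH_SCHED_NORMAL)) {
		out->policy = DEMO_POLICY_FIFO;
		out->priority = DEMO_MAX_USER_RT_PRIO / 2;
	}
}
```

The design choice illustrated here is that the non-real-time policy is
opt-in per irq, so existing users keep the SCHED_FIFO default.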



[PATCH 5/5] ixgbe: add support for threadable rx irq

2016-06-15 Thread Paolo Abeni
Plug in the threadable irq infrastructure to allow run-time
configuration of rx irqs when MSI-X irqs are used.

Signed-off-by: Paolo Abeni 
Signed-off-by: Hannes Frederic Sowa 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 088c47c..d9a591c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2890,6 +2890,14 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
return 0;
 }
 
+static void ixgbe_irq_mode_notifier(int irq, void *data,
+   struct task_struct *irq_thread)
+{
+   struct ixgbe_q_vector *q_vector = (struct ixgbe_q_vector *)data;
+
+   q_vector->napi.thread = irq_thread;
+}
+
 /**
  * ixgbe_request_msix_irqs - Initialize MSI-X interrupts
  * @adapter: board private structure
@@ -2921,8 +2929,12 @@ static int ixgbe_request_msix_irqs(struct ixgbe_adapter *adapter)
/* skip this unused q_vector */
continue;
}
-   err = request_irq(entry->vector, &ixgbe_msix_clean_rings, 0,
+   err = request_irq(entry->vector, &ixgbe_msix_clean_rings,
+ IRQF_TH_NO_AFFINITY | IRQF_TH_SCHED_NORMAL,
  q_vector->name, q_vector);
+   if (!err)
+   err = irq_set_mode_notifier(entry->vector, q_vector,
+   ixgbe_irq_mode_notifier);
if (err) {
e_err(probe, "request_irq failed for MSIX interrupt "
  "Error: %d\n", err);
-- 
1.8.3.1
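
The driver-side flow in this patch (request the irq, register a mode
notifier, let the notifier record the irq thread in the napi struct) can be
modelled in userspace as follows. This is a rough sketch, not the genirq
implementation: all demo_ types and functions are hypothetical stand-ins for
irqaction, irq_set_mode_notifier() and the reconfiguration path, and the
real code also handles locking and irq disabling, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

struct demo_task { int pid; };

typedef void (*demo_mode_notifier_t)(int irq, void *dev_id,
				     struct demo_task *thread);

/* Hypothetical, much reduced model of an irq action. */
struct demo_action {
	void *dev_id;
	demo_mode_notifier_t mode_notifier;
	struct demo_task *thread;
};

/* Hypothetical model of a driver queue vector holding the napi thread. */
struct demo_q_vector { struct demo_task *napi_thread; };

/* Driver callback: remember the irq thread (NULL means non-threaded). */
void demo_irq_mode_notifier(int irq, void *data, struct demo_task *thread)
{
	struct demo_q_vector *q = data;

	(void)irq;
	q->napi_thread = thread;
}

/* Register the notifier; dev_id must match the one used at request time. */
int demo_set_mode_notifier(struct demo_action *act, void *dev_id,
			   demo_mode_notifier_t fn)
{
	assert(act != NULL);
	if (act->dev_id != dev_id)
		return -1;
	act->mode_notifier = fn;
	return 0;
}

/* Switch mode and tell the device about it. */
void demo_reconfigure(struct demo_action *act, int irq,
		      struct demo_task *thread)
{
	assert(act != NULL);
	act->thread = thread;
	if (act->mode_notifier)
		act->mode_notifier(irq, act->dev_id, thread);
}
```

Passing NULL as the thread models a switch back to non-threaded mode, after
which a check like napi_is_threaded() would be false again.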



[PATCH 1/1] ixgbe: add fiber tranceiver plug/unplug notifier

2016-06-15 Thread zyjzyj2000
From: Zhu Yanjun 

When the fiber transceiver is plugged/unplugged, a netdev notifier is
sent. Userspace tools or the kernel can receive this notifier.

Signed-off-by: Zhu Yanjun 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   24 +++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_type.h |2 ++
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 088c47c..1d8c1ff 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -5635,6 +5635,8 @@ static int ixgbe_sw_init(struct ixgbe_adapter *adapter)
hw->revision_id = pdev->revision;
hw->subsystem_vendor_id = pdev->subsystem_vendor;
hw->subsystem_device_id = pdev->subsystem_device;
+   hw->last_tranceiver_status = IXGBE_NOT_IMPLEMENTED;
+   hw->tranceiver_polltime = 0;
 
/* Set common capability flags and settings */
rss = min_t(int, ixgbe_max_rss_indices(adapter), num_online_cpus());
@@ -7067,7 +7069,27 @@ static void ixgbe_watchdog_subtask(struct ixgbe_adapter *adapter)
 static void ixgbe_sfp_detection_subtask(struct ixgbe_adapter *adapter)
 {
struct ixgbe_hw *hw = &adapter->hw;
-   s32 err;
+   s32 err, status;
+
+   if ((hw->mac.ops.get_media_type(hw) == ixgbe_media_type_fiber) &&
+   time_after(jiffies, hw->tranceiver_polltime)) {
+   status = IXGBE_READ_REG(hw, IXGBE_ESDP) & IXGBE_ESDP_SDP2;
+   if (status != hw->last_tranceiver_status) {
+   unsigned long val;
+
+   if (!status) {
+   hw->phy.sfp_type = ixgbe_sfp_type_not_present;
+   val = NETDEV_FIBER_TRANCEIVER_UNPLUG;
+   } else {
+   val = NETDEV_FIBER_TRANCEIVER_PLUG;
+   }
+   rtnl_lock();
+   call_netdevice_notifiers(val, adapter->netdev);
+   rtnl_unlock();
+   }
+   hw->last_tranceiver_status = status;
+   hw->tranceiver_polltime = jiffies + 3 * HZ;
+   }
 
/* If crosstalk fix enabled verify the SFP+ cage is full */
if (adapter->need_crosstalk_fix) {
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
index da3d835..fe19899 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_type.h
@@ -3525,6 +3525,8 @@ struct ixgbe_hw {
bool    force_full_reset;
bool    allow_unsupported_sfp;
bool    wol_enabled;
+   s32 last_tranceiver_status;
+   unsigned long   tranceiver_polltime;
 };
 
 struct ixgbe_info {
-- 
1.7.9.5



[PATCH net-next 1/1] ixgbe: add fiber tranceiver plug/unplug notifier

2016-06-15 Thread zyjzyj2000

Hi, Don

The latest patch is based on your suggestion.
The changes are as follows:

1. Replace IXGBE_IDENTIFIER with IXGBE_ESDP;
2. Replace the plug interrupt with polling;
3. Report NICs that do not support plugging/unplugging the transceiver as plugged;

The patch works well with the following steps:

 1. boot the host
 2. ip link set eth0 up
 3. unplug the fiber transceiver
 4. a notifier NETDEV_FIBER_TRANCEIVER_UNPLUG is sent
 5. plug the fiber transceiver
 6. a notifier NETDEV_FIBER_TRANCEIVER_PLUG is sent

Any reply is appreciated.

Zhu Yanjun


Re: [PATCH 3/5] sched/preempt: cond_resched_softirq() must check for softirq

2016-06-15 Thread Peter Zijlstra
On Wed, Jun 15, 2016 at 03:42:04PM +0200, Paolo Abeni wrote:
> Currently cond_resched_softirq() fails to reschedule if there
> are pending softirq but no other running process. This happens
> i.e. when receiving an interrupt with local bh disabled.
> 
> Reported-by: Eric Dumazet 
> Signed-off-by: Paolo Abeni 
> Signed-off-by: Hannes Frederic Sowa 

All your patches appear to have this broken SoB chain.

As presented it suggests you wrote the patches, which matches with From,
however it then suggests Hannes collected and sent them onwards, not so
much.

Please correct.


Re: [PATCH 14/15] drivers: net: davinci_mdio: enable pm runtime auto for ti cpsw-mdio

2016-06-15 Thread kbuild test robot
Hi,

[auto build test ERROR on net-next/master]
[also build test ERROR on v4.7-rc3 next-20160615]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Grygorii-Strashko/drivers-net-cpsw-improve-runtime-pm/20160615-200750
config: arm-multi_v7_defconfig (attached as .config)
compiler: arm-linux-gnueabi-gcc (Debian 5.3.1-8) 5.3.1 20160205
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=arm 

All errors (new ones prefixed by >>):

   drivers/net/ethernet/ti/davinci_mdio.c: In function 'davinci_mdio_probe':
>> drivers/net/ethernet/ti/davinci_mdio.c:375:31: error: passing argument 1 of 'davinci_mdio_probe_dt' from incompatible pointer type [-Werror=incompatible-pointer-types]
  ret = davinci_mdio_probe_dt(data, pdev);
  ^
   drivers/net/ethernet/ti/davinci_mdio.c:320:12: note: expected 'struct 
mdio_platform_data *' but argument is of type 'struct davinci_mdio_data *'
static int davinci_mdio_probe_dt(struct mdio_platform_data *data,
   ^
   cc1: some warnings being treated as errors

vim +/davinci_mdio_probe_dt +375 drivers/net/ethernet/ti/davinci_mdio.c

   369  return -ENOMEM;
   370  }
   371  
   372  if (dev->of_node) {
   373  const struct of_device_id   *of_id;
   374  
 > 375  ret = davinci_mdio_probe_dt(data, pdev);
   376  if (ret)
   377  return ret;
   378  snprintf(data->bus->id, MII_BUS_ID_SIZE, "%s", 
pdev->name);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCH 3/5] sched/preempt: cond_resched_softirq() must check for softirq

2016-06-15 Thread Paolo Abeni
On Wed, 2016-06-15 at 15:48 +0200, Peter Zijlstra wrote:
> On Wed, Jun 15, 2016 at 03:42:04PM +0200, Paolo Abeni wrote:
> > Currently cond_resched_softirq() fails to reschedule if there
> > are pending softirq but no other running process. This happens
> > i.e. when receiving an interrupt with local bh disabled.
> > 
> > Reported-by: Eric Dumazet 
> > Signed-off-by: Paolo Abeni 
> > Signed-off-by: Hannes Frederic Sowa 
> 
> All your patches appear to have this broken SoB chain.
> 
> As presented it suggests you wrote the patches, which matches with From,
> however it then suggests Hannes collected and send them onwards, not so
> much.
> 
> Please correct.

My bad. I'll re-submit. The intention was to specify this is joint work
done together with Hannes.

Paolo




Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq

2016-06-15 Thread kbuild test robot
Hi,

[auto build test ERROR on tip/irq/core]
[also build test ERROR on v4.7-rc3 next-20160615]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Paolo-Abeni/genirq-threadable-IRQ-support/20160615-214836
config: cris-etrax-100lx_v2_defconfig (attached as .config)
compiler: cris-linux-gcc (GCC) 4.6.3
reproduce:
wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=cris 

All error/warnings (new ones prefixed by >>):

>> net/core/dev.c:3457:5: warning: "CONFIG_IRQ_FORCED_THREADING" is not defined [-Wundef]
   net/core/dev.c: In function 'napi_schedule':
>> net/core/dev.c:3516:3: error: implicit declaration of function 'napi_threaded_poll' [-Werror=implicit-function-declaration]
   cc1: some warnings being treated as errors

vim +/napi_threaded_poll +3516 net/core/dev.c

  3451  EXPORT_SYMBOL(netdev_max_backlog);
  3452  
  3453  int netdev_tstamp_prequeue __read_mostly = 1;
  3454  int netdev_budget __read_mostly = 300;
  3455  int weight_p __read_mostly = 64;/* old backlog weight */
  3456  
> 3457  #if CONFIG_IRQ_FORCED_THREADING
  3458  static int napi_poll(struct napi_struct *n, struct list_head *repoll);
  3459  
  3460  static void napi_threaded_poll(struct napi_struct *napi)
  3461  {
  3462  unsigned long time_limit = jiffies + 2;
  3463  struct list_head dummy_repoll;
  3464  int budget = netdev_budget;
  3465  bool again = true;
  3466  
  3467  if (test_and_set_bit(NAPI_STATE_SCHED_THREAD, &napi->state))
  3468  return;
  3469  
  3470  local_irq_enable();
  3471  INIT_LIST_HEAD(&dummy_repoll);
  3472  
  3473  while (again) {
  3474  /* ensure that the poll list is not empty */
  3475  if (list_empty(&dummy_repoll))
  3476  list_add(&napi->poll_list, &dummy_repoll);
  3477  
  3478  budget -= napi_poll(napi, &dummy_repoll);
  3479  
  3480  if (napi_disable_pending(napi))
  3481  again = false;
  3482  else if (!test_bit(NAPI_STATE_SCHED, &napi->state))
  3483  again = false;
  3484  else if (kthread_should_stop())
  3485  again = false;
  3486  
  3487  if (!again || unlikely(budget <= 0 ||
  3488 time_after_eq(jiffies, time_limit))) {
  3489  /* no need to reschedule if we are going to stop */
  3490  if (again)
  3491  cond_resched_softirq();
  3492  time_limit = jiffies + 2;
  3493  budget = netdev_budget;
  3494  rcu_bh_qs();
  3495  __kfree_skb_flush();
  3496  }
  3497  }
  3498  
  3499  clear_bit(NAPI_STATE_SCHED_THREAD, &napi->state);
  3500  local_irq_disable();
  3501  }
  3502  
  3503  static inline bool napi_is_threaded(struct napi_struct *napi)
  3504  {
  3505  return current == napi->thread;
  3506  }
  3507  #else
  3508  #define napi_is_threaded(napi) 0
  3509  #endif
  3510  
  3511  /* Called with irq disabled */
  3512  static inline void napi_schedule(struct softnet_data *sd,
  3513   struct napi_struct *napi)
  3514  {
  3515  if (napi_is_threaded(napi)) {
> 3516  napi_threaded_poll(napi);
  3517  return;
  3518  }
  3519  list_add_tail(&napi->poll_list, &sd->poll_list);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
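
Both diagnostics in this report follow from the preprocessor guard: "#if" on
an undefined config symbol triggers -Wundef, and when the option is off the
caller still references a function that only exists inside the guarded
block. The pattern below is one possible way to avoid both, shown as a
standalone userspace sketch. DEMO_FORCED_THREADING and all demo_ names are
hypothetical stand-ins, not the kernel symbols, and this is not necessarily
how the series was eventually fixed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced model of a napi context. */
struct demo_napi {
	const void *thread;  /* irq thread owning this context, or NULL */
	int polled;          /* how many times the threaded poll ran */
};

/* Stand-in for the kernel's "current" task pointer. */
const void *demo_current;

#ifdef DEMO_FORCED_THREADING  /* #ifdef is safe when the macro is undefined */
void demo_threaded_poll(struct demo_napi *n)
{
	n->polled++;
}

bool demo_is_threaded(const struct demo_napi *n)
{
	return n->thread != NULL && demo_current == n->thread;
}
#else
/* Stubs, so that callers still compile when the option is off. */
void demo_threaded_poll(struct demo_napi *n)
{
	(void)n;
}

bool demo_is_threaded(const struct demo_napi *n)
{
	(void)n;
	return false;
}
#endif

/* Returns true if polled in thread context, false to fall back to softirq. */
bool demo_schedule(struct demo_napi *n)
{
	assert(n != NULL);
	if (demo_is_threaded(n)) {
		demo_threaded_poll(n);
		return true;
	}
	return false;
}
```

Built without DEMO_FORCED_THREADING, demo_schedule() always takes the
fallback path and the threaded poll is never entered.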




Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq

2016-06-15 Thread Eric Dumazet
On Wed, Jun 15, 2016 at 6:42 AM, Paolo Abeni  wrote:
> This commit adds the infrastructure needed for threadable
> rx interrupt. A reference to the irq thread is used to
> mark the threaded irq mode.
> In threaded mode the poll loop is invoked directly from
> __napi_schedule().
> napi drivers which want to support threadable irq interrupts
> must provide an irq mode change handler which actually set
> napi->thread and register it after requesting the irq.
>
> Signed-off-by: Paolo Abeni 
> Signed-off-by: Hannes Frederic Sowa 
> ---
>  include/linux/netdevice.h |  4 
>  net/core/dev.c| 59 
> +++
>  2 files changed, 63 insertions(+)
>

I really appreciate the effort, but as I already said this is not going to work.

Many NICs have 2 NAPI contexts per queue, one for TX, one for RX.

Relying on CFS to switch between the two 'threads' you need in the one
vCPU case will add latencies that your 'pure throughput UDP flood' is
not able to detect.

I was waiting for a fix from Andy Lutomirski to be merged before sending
my ksoftirqd fix, which will work and won't bring kernel bloat.


Re: [PATCH 4/5] netdev: implement infrastructure for threadable napi irq

2016-06-15 Thread Eric Dumazet
On Wed, Jun 15, 2016 at 7:17 AM, Eric Dumazet  wrote:
> On Wed, Jun 15, 2016 at 6:42 AM, Paolo Abeni  wrote:
>> This commit adds the infrastructure needed for threadable
>> rx interrupt. A reference to the irq thread is used to
>> mark the threaded irq mode.
>> In threaded mode the poll loop is invoked directly from
>> __napi_schedule().
>> napi drivers which want to support threadable irq interrupts
>> must provide an irq mode change handler which actually set
>> napi->thread and register it after requesting the irq.
>>
>> Signed-off-by: Paolo Abeni 
>> Signed-off-by: Hannes Frederic Sowa 
>> ---
>>  include/linux/netdevice.h |  4 
>>  net/core/dev.c| 59 
>> +++
>>  2 files changed, 63 insertions(+)
>>
>
> I really appreciate the effort, but as I already said this is not going to 
> work.
>
> Many NIC have 2 NAPI contexts per queue, one for TX, one for RX.
>
> Relying on CFS to switch from the two 'threads' you need in the one
> vCPU case will add latencies that your 'pure throughput UDP flood' is
> not able to detect.
>
> I was waiting a fix from Andy Lutomirski to be merged before sending
> my ksoftirqd fix, which will work and wont bring kernel bloat.


Andy's patch was "x86/traps: Don't force in_interrupt() to return true
in IST handlers"


Re: padata - is serial actually serial?

2016-06-15 Thread Gary R Hook

On 06/15/2016 06:52 AM, Steffen Klassert wrote:
> Hi Jason.
>
> On Tue, Jun 14, 2016 at 11:00:54PM +0200, Jason A. Donenfeld wrote:
>> Hi Steffen & Folks,
>>
>> I submit a job to padata_do_parallel(). When the parallel() function
>> triggers, I do some things, and then call padata_do_serial(). Finally
>> the serial() function triggers, where I complete the job (check a
>> nonce, etc).
>>
>> The padata API is very appealing because not only does it allow for
>> parallel computation, but it claims that the serial() functions will
>> execute in the order that jobs were originally submitted to
>> padata_do_parallel().
>>
>> Unfortunately, in practice, I'm pretty sure I'm seeing deviations from
>> this. When I submit tons and tons of tasks at rapid speed to
>> padata_do_parallel(), it seems like the serial() function isn't being
>> called in the exactly the same order that tasks were submitted to
>> padata_do_parallel().
>>
>> Is this known (expected) behavior? Or have I stumbled upon a potential
>> bug that's worthwhile for me to investigate more?
>
> It should return in the same order as the job were submitted,
> given that the submitting cpu and the callback cpu are fixed
> for all the jobs you want to preserve the order.  If you submit
> jobs from more than one cpu, we can not know in which order
> they are enqueued. The cpu that gets the lock as the first
> has its job in front.

Isn't there an element of indeterminacy at the application thread level
(i.e. user space) too? We don't know how the jobs are being submitted, but
unless that is being handled by a single thread in a single process, I
think all bets are off with respect to ordering.

Then again, perhaps I'm not grokking the details here.

> Same if you use more than one callback cpu
> we can't know in which order they are dequeued, because the
> serial workers are scheduled independent on each cpu.
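
The ordering condition discussed above can be illustrated with a generic
reorder window. This is not padata's implementation, only a minimal sketch
of the idea: all demo_ names are hypothetical, a single submitter and a
single serializing context are assumed, and there is no locking. Each job
gets a sequence number at submission; completions may arrive in any order,
but a job is released for serialization only when every earlier job has
completed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_SLOTS 8u

struct demo_reorder {
	bool done[DEMO_SLOTS];     /* done[seq % DEMO_SLOTS]: job seq finished */
	unsigned int next_submit;  /* sequence number for the next new job */
	unsigned int next_serial;  /* next sequence number allowed to serialize */
};

/* Assign a sequence number at submission time (single submitter assumed). */
unsigned int demo_submit(struct demo_reorder *r)
{
	/* The window must not be full, or slots would be reused too early. */
	assert(r->next_submit - r->next_serial < DEMO_SLOTS);
	return r->next_submit++;
}

/*
 * Mark job seq as finished, then release every job that is now in order.
 * Released sequence numbers are written to out; returns how many.
 */
size_t demo_complete(struct demo_reorder *r, unsigned int seq,
		     unsigned int *out, size_t out_cap)
{
	size_t n = 0;

	r->done[seq % DEMO_SLOTS] = true;
	while (n < out_cap && r->next_serial != r->next_submit &&
	       r->done[r->next_serial % DEMO_SLOTS]) {
		r->done[r->next_serial % DEMO_SLOTS] = false;
		out[n++] = r->next_serial++;
	}
	return n;
}
```

The sequence numbers are only meaningful if they are handed out in one
place. With several submitters racing, the number a job receives depends on
who wins the race, which is the indeterminacy both replies point at.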


Re: [PATCH 1/5] genirq: implement support for runtime switch to threaded irqs

2016-06-15 Thread kbuild test robot
Hi,

[auto build test WARNING on tip/irq/core]
[also build test WARNING on v4.7-rc3 next-20160615]
[if your patch is applied to the wrong git tree, please drop us a note to help 
improve the system]

url:
https://github.com/0day-ci/linux/commits/Paolo-Abeni/genirq-threadable-IRQ-support/20160615-214836
reproduce: make htmldocs

All warnings (new ones prefixed by >>):

>> kernel/irq/manage.c:1681: warning: No description found for parameter 'notifier'
>> kernel/irq/manage.c:1681: warning: Excess function parameter 'mode_notifier' description in 'irq_set_mode_notifier'
   kernel/irq/handle.c:1: warning: no structured comments found
--
   lib/crc32.c:148: warning: No description found for parameter 'tab)[256]'
   lib/crc32.c:148: warning: Excess function parameter 'tab' description in 
'crc32_le_generic'
   lib/crc32.c:293: warning: No description found for parameter 'tab)[256]'
   lib/crc32.c:293: warning: Excess function parameter 'tab' description in 
'crc32_be_generic'
   lib/crc32.c:1: warning: no structured comments found
   mm/memory.c:2881: warning: No description found for parameter 'old'
>> kernel/irq/manage.c:1681: warning: No description found for parameter 'notifier'
>> kernel/irq/manage.c:1681: warning: Excess function parameter 'mode_notifier' description in 'irq_set_mode_notifier'

vim +/notifier +1681 kernel/irq/manage.c

  1665  /**
  1666   *  irq_set_mode_notifier - register a mode change notifier
  1667   *  @irq: Interrupt line
  1668   *  @dev_id: The cookie used to identify the irq handler and passed back
  1669   *   to the notifier
  1670   *  @mode_notifier: The callback to be registered
  1671   *
  1672   *  This call registers a callback to notify the device about irq mode
  1673   *  change (threaded/normal mode). Mode change are triggered writing on
  1674   *  the 'threaded' procfs entry.
  1675   *  When running in threaded mode the irq thread task struct will be passed
  1676   *  to the notifer, or NULL elsewhere. It's up to the device update its
  1677   *  internal state accordingly
  1678   */
  1679  int irq_set_mode_notifier(unsigned int irq, void *dev_id,
  1680mode_notifier_t notifier)
> 1681  {
  1682  struct irq_desc *desc = irq_to_desc(irq);
  1683  struct irqaction *action;
  1684  unsigned long flags;
  1685  int ret = -EINVAL;
  1686  
  1687  if (!desc)
  1688  return ret;
  1689  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCH] netfilter/nflog: nflog-range does not truncate packets

2016-06-15 Thread Vishwanath Pai
On 06/15/2016 08:39 AM, Pablo Neira Ayuso wrote:
> But nlmsg_len should match len in this.
> 
> If we're just sending a part of the packet to userspace, then we
> should adjust nlmsg_len to indicate exactly the netlink message length
> that we're sending to userspace.
> 
> Is your patch triggering this nlmsg_len != len?

The value of len here is how many bytes were returned by recv. We do
send the entire nlmsg_len to userspace, but recv cannot copy the full
packet because the buffer is not big enough to hold this. They only
allocate the buffer assuming that the packet won't be bigger than their
snap len, but we send more data than their snap len and they don't
handle this condition well.
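
The mismatch described above (the length claimed in the message header
versus the number of bytes recv actually copied) can be reproduced in
userspace. What follows is only a minimal sketch under stated assumptions:
it uses an AF_UNIX datagram socketpair instead of netlink, and struct
demo_hdr and demo_recv_truncated() are hypothetical names, not part of any
real client or library. It shows that a receiver can spot the short read by
checking MSG_TRUNC in msg_flags after recvmsg().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical stand-in for a netlink-style header: only a length field. */
struct demo_hdr {
	uint32_t msg_len;	/* total message length claimed by the sender */
};

/*
 * Send one datagram of msg_len bytes over a socketpair and receive it into
 * a buffer of buf_len bytes.  Returns 1 if the kernel flagged the datagram
 * as truncated, 0 if it fit, -1 on error.  *claimed is the length found in
 * the header, *got is the number of bytes actually copied to the buffer.
 */
static int demo_recv_truncated(size_t msg_len, size_t buf_len,
			       uint32_t *claimed, ssize_t *got)
{
	unsigned char msg[512] = { 0 }, buf[512];
	struct demo_hdr hdr = { .msg_len = (uint32_t)msg_len };
	struct iovec iov = { .iov_base = buf, .iov_len = buf_len };
	struct msghdr mh = { .msg_iov = &iov, .msg_iovlen = 1 };
	int sv[2], ret = -1;

	if (msg_len < sizeof(hdr) || msg_len > sizeof(msg) ||
	    buf_len < sizeof(hdr) || buf_len > sizeof(buf))
		return -1;
	if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
		return -1;

	memcpy(msg, &hdr, sizeof(hdr));
	if (send(sv[0], msg, msg_len, 0) != (ssize_t)msg_len)
		goto out;

	*got = recvmsg(sv[1], &mh, 0);
	if (*got < 0)
		goto out;

	/* The header always fits, so the claimed length is still readable. */
	memcpy(&hdr, buf, sizeof(hdr));
	*claimed = hdr.msg_len;
	ret = (mh.msg_flags & MSG_TRUNC) ? 1 : 0;
out:
	close(sv[0]);
	close(sv[1]);
	return ret;
}
```

With a 200 byte message and a 64 byte buffer the header still claims 200
while only 64 bytes arrive, which is the nlmsg_len != len situation being
discussed.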


[PATCH 3/4] net-next: mediatek: add IRQ locking

2016-06-15 Thread John Crispin
The code that enables and disables IRQs is missing proper locking. After
adding the IRQ grouping patch and routing the RX and TX IRQs to different
cores we experienced IRQ stalls. Fix this by adding proper locking.
We use a dedicated lock to reduce the latency of the IRQ code.

Signed-off-by: John Crispin 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c |7 +++
 drivers/net/ethernet/mediatek/mtk_eth_soc.h |1 +
 2 files changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index cc38eae..c7da00b 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -300,18 +300,24 @@ static void mtk_mdio_cleanup(struct mtk_eth *eth)
 
 static inline void mtk_irq_disable(struct mtk_eth *eth, u32 mask)
 {
+   unsigned long flags;
u32 val;
 
+   spin_lock_irqsave(&eth->irq_lock, flags);
val = mtk_r32(eth, MTK_QDMA_INT_MASK);
mtk_w32(eth, val & ~mask, MTK_QDMA_INT_MASK);
+   spin_unlock_irqrestore(&eth->irq_lock, flags);
 }
 
 static inline void mtk_irq_enable(struct mtk_eth *eth, u32 mask)
 {
+   unsigned long flags;
u32 val;
 
+   spin_lock_irqsave(&eth->irq_lock, flags);
val = mtk_r32(eth, MTK_QDMA_INT_MASK);
mtk_w32(eth, val | mask, MTK_QDMA_INT_MASK);
+   spin_unlock_irqrestore(&eth->irq_lock, flags);
 }
 
 static int mtk_set_mac_address(struct net_device *dev, void *p)
@@ -1739,6 +1745,7 @@ static int mtk_probe(struct platform_device *pdev)
return PTR_ERR(eth->base);
 
spin_lock_init(&eth->page_lock);
+   spin_lock_init(&eth->irq_lock);
 
eth->ethsys = syscon_regmap_lookup_by_phandle(pdev->dev.of_node,
  "mediatek,ethsys");
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
index a5eb7c6..3159d2a 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h
@@ -373,6 +373,7 @@ struct mtk_eth {
void __iomem*base;
struct reset_control*rstc;
spinlock_t  page_lock;
+   spinlock_t  irq_lock;
struct net_device   dummy_dev;
struct net_device   *netdev[MTK_MAX_DEVS];
struct mtk_mac  *mac[MTK_MAX_DEVS];
-- 
1.7.10.4
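
The reason the mask update needs a lock is that it is a read-modify-write:
two contexts that read the same old value and write back different results
will lose one of the updates. A minimal userspace sketch of the same
pattern, assuming a C11 atomic_flag as a stand-in for the kernel spinlock
and a plain variable as a stand-in for the mask register (struct demo_eth
and the demo_* helpers are hypothetical and only illustrate the idea, they
are not the driver code):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model: one lock guarding one interrupt mask "register". */
struct demo_eth {
	atomic_flag irq_lock;	/* plays the role of eth->irq_lock */
	uint32_t int_mask;	/* plays the role of MTK_QDMA_INT_MASK */
};

static void demo_eth_init(struct demo_eth *eth)
{
	atomic_flag_clear(&eth->irq_lock);
	eth->int_mask = 0;
}

/* Spin until the flag was previously clear, i.e. we now own the lock. */
static void demo_lock(struct demo_eth *eth)
{
	while (atomic_flag_test_and_set_explicit(&eth->irq_lock,
						 memory_order_acquire))
		;
}

static void demo_unlock(struct demo_eth *eth)
{
	atomic_flag_clear_explicit(&eth->irq_lock, memory_order_release);
}

/* Clear bits in the mask; the read and the write happen under the lock. */
static void demo_irq_disable(struct demo_eth *eth, uint32_t mask)
{
	demo_lock(eth);
	eth->int_mask &= ~mask;
	demo_unlock(eth);
}

/* Set bits in the mask; the read and the write happen under the lock. */
static void demo_irq_enable(struct demo_eth *eth, uint32_t mask)
{
	demo_lock(eth);
	eth->int_mask |= mask;
	demo_unlock(eth);
}
```

Unlike spin_lock_irqsave(), this sketch does not disable interrupts; it only
shows why the read and the write must form one critical section.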



[PATCH 0/4] net-next: mediatek: IRQ cleanups, fixes and grouping

2016-06-15 Thread John Crispin
This series contains 2 small code cleanups that are leftovers from the
MIPS support. There is also a small fix that adds proper locking to the
code accessing the IRQ registers. Without this fix we saw deadlocks caused
by the last patch of the series, which adds IRQ grouping. The grouping
feature allows us to use different IRQs for TX and RX. By doing so we can
use affinity to let the SoC handle the IRQs on different cores.

John Crispin (4):
  net-next: mediatek: remove superfluous register reads
  net-next: mediatek: don't use intermediate variables to store IRQ
masks
  net-next: mediatek: add IRQ locking
  net-next: mediatek: add support for IRQ grouping

 drivers/net/ethernet/mediatek/mtk_eth_soc.c |  175 +--
 drivers/net/ethernet/mediatek/mtk_eth_soc.h |   16 ++-
 2 files changed, 122 insertions(+), 69 deletions(-)

-- 
1.7.10.4



[PATCH 1/4] net-next: mediatek: remove superfluous register reads

2016-06-15 Thread John Crispin
The driver was originally written for MIPS based SoCs. These required the
IRQ mask register to be read after writing it to ensure that the content
was actually applied. As this version only works on ARM based SoCs, we can
safely remove the 2 reads as they are no longer required.

Signed-off-by: John Crispin 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c |4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index ebfca7d..b3032f4 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -304,8 +304,6 @@ static inline void mtk_irq_disable(struct mtk_eth *eth, u32 
mask)
 
val = mtk_r32(eth, MTK_QDMA_INT_MASK);
mtk_w32(eth, val & ~mask, MTK_QDMA_INT_MASK);
-   /* flush write */
-   mtk_r32(eth, MTK_QDMA_INT_MASK);
 }
 
 static inline void mtk_irq_enable(struct mtk_eth *eth, u32 mask)
@@ -314,8 +312,6 @@ static inline void mtk_irq_enable(struct mtk_eth *eth, u32 
mask)
 
val = mtk_r32(eth, MTK_QDMA_INT_MASK);
mtk_w32(eth, val | mask, MTK_QDMA_INT_MASK);
-   /* flush write */
-   mtk_r32(eth, MTK_QDMA_INT_MASK);
 }
 
 static int mtk_set_mac_address(struct net_device *dev, void *p)
-- 
1.7.10.4



[PATCH 4/4] net-next: mediatek: add support for IRQ grouping

2016-06-15 Thread John Crispin
The ethernet core has 3 IRQs. Using the IRQ grouping registers we are able
to separate TX and RX IRQs, which allows us to service them on separate
cores. This patch splits the IRQ handler into 2 separate functions, one for
TX and another for RX. The TX housekeeping is split out into its own NAPI
handler.

Signed-off-by: John Crispin 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c |  156 +--
 drivers/net/ethernet/mediatek/mtk_eth_soc.h |   15 ++-
 2 files changed, 111 insertions(+), 60 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index c7da00b..79fdb07 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -874,14 +874,13 @@ release_desc:
return done;
 }
 
-static int mtk_poll_tx(struct mtk_eth *eth, int budget, bool *tx_again)
+static int mtk_poll_tx(struct mtk_eth *eth, int budget)
 {
struct mtk_tx_ring *ring = &eth->tx_ring;
struct mtk_tx_dma *desc;
struct sk_buff *skb;
struct mtk_tx_buf *tx_buf;
-   int total = 0, done = 0;
-   unsigned int bytes = 0;
+   unsigned int bytes = 0, done = 0;
u32 cpu, dma;
static int condition;
int i;
@@ -933,63 +932,82 @@ static int mtk_poll_tx(struct mtk_eth *eth, int budget, 
bool *tx_again)
netdev_completed_queue(eth->netdev[i], done, bytes);
}
 
-   /* read hw index again make sure no new tx packet */
-   if (cpu != dma || cpu != mtk_r32(eth, MTK_QTX_DRX_PTR))
-   *tx_again = true;
-   else
-   mtk_w32(eth, MTK_TX_DONE_INT, MTK_QMTK_INT_STATUS);
-
-   if (!total)
-   return 0;
-
if (mtk_queue_stopped(eth) &&
(atomic_read(&ring->free_count) > ring->thresh))
mtk_wake_queue(eth);
 
-   return total;
+   return done;
 }
 
-static int mtk_poll(struct napi_struct *napi, int budget)
+static void mtk_handle_status_irq(struct mtk_eth *eth)
 {
-   struct mtk_eth *eth = container_of(napi, struct mtk_eth, rx_napi);
-   u32 status, status2, mask;
-   int tx_done, rx_done;
-   bool tx_again = false;
-
-   status = mtk_r32(eth, MTK_QMTK_INT_STATUS);
-   status2 = mtk_r32(eth, MTK_INT_STATUS2);
-   tx_done = 0;
-   rx_done = 0;
-   tx_again = 0;
-
-   if (status & MTK_TX_DONE_INT)
-   tx_done = mtk_poll_tx(eth, budget, &tx_again);
-
-   if (status & MTK_RX_DONE_INT)
-   rx_done = mtk_poll_rx(napi, budget, eth);
+   u32 status2 = mtk_r32(eth, MTK_INT_STATUS2);
 
if (unlikely(status2 & (MTK_GDM1_AF | MTK_GDM2_AF))) {
mtk_stats_update(eth);
mtk_w32(eth, (MTK_GDM1_AF | MTK_GDM2_AF),
MTK_INT_STATUS2);
}
+}
+
+static int mtk_napi_tx(struct napi_struct *napi, int budget)
+{
+   struct mtk_eth *eth = container_of(napi, struct mtk_eth, tx_napi);
+   u32 status, mask;
+   int tx_done = 0;
+
+   mtk_handle_status_irq(eth);
+   mtk_w32(eth, MTK_TX_DONE_INT, MTK_QMTK_INT_STATUS);
+   tx_done = mtk_poll_tx(eth, budget);
+
+   if (unlikely(netif_msg_intr(eth))) {
+   status = mtk_r32(eth, MTK_QMTK_INT_STATUS);
+   mask = mtk_r32(eth, MTK_QDMA_INT_MASK);
+   dev_info(eth->dev,
+"done tx %d, intr 0x%08x/0x%x\n",
+tx_done, status, mask);
+   }
+
+   if (tx_done == budget)
+   return budget;
+
+   status = mtk_r32(eth, MTK_QMTK_INT_STATUS);
+   if (status & MTK_TX_DONE_INT)
+   return budget;
+
+   napi_complete(napi);
+   mtk_irq_enable(eth, MTK_TX_DONE_INT);
+
+   return tx_done;
+}
+
+static int mtk_napi_rx(struct napi_struct *napi, int budget)
+{
+   struct mtk_eth *eth = container_of(napi, struct mtk_eth, rx_napi);
+   u32 status, mask;
+   int rx_done = 0;
+
+   mtk_handle_status_irq(eth);
+   mtk_w32(eth, MTK_RX_DONE_INT, MTK_QMTK_INT_STATUS);
+   rx_done = mtk_poll_rx(napi, budget, eth);
 
if (unlikely(netif_msg_intr(eth))) {
+   status = mtk_r32(eth, MTK_QMTK_INT_STATUS);
mask = mtk_r32(eth, MTK_QDMA_INT_MASK);
-   netdev_info(eth->netdev[0],
-   "done tx %d, rx %d, intr 0x%08x/0x%x\n",
-   tx_done, rx_done, status, mask);
+   dev_info(eth->dev,
+"done rx %d, intr 0x%08x/0x%x\n",
+rx_done, status, mask);
}
 
-   if (tx_again || rx_done == budget)
+   if (rx_done == budget)
return budget;
 
status = mtk_r32(eth, MTK_QMTK_INT_STATUS);
-   if (status & (tx_intr | rx_intr))
+   if (status & MTK_RX_DONE_INT)
return budget;
 
napi_complete(napi);
-   mtk_irq_enable(eth, MTK_RX_DONE_INT | MTK_RX_DONE_INT);
+

[PATCH 2/4] net-next: mediatek: don't use intermediate variables to store IRQ masks

2016-06-15 Thread John Crispin
The code currently uses variables to store and never modify the bit masks
of interrupts. This is legacy code from an early version of the driver
that supported MIPS based SoCs where the IRQ bits depended on the actual
SoC. As the bits are the same for all ARM based SoCs using this driver we
can remove the intermediate variables.

Signed-off-by: John Crispin 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c |   22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index b3032f4..cc38eae 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -775,7 +775,7 @@ drop:
 }
 
 static int mtk_poll_rx(struct napi_struct *napi, int budget,
-  struct mtk_eth *eth, u32 rx_intr)
+  struct mtk_eth *eth)
 {
struct mtk_rx_ring *ring = &eth->rx_ring;
int idx = ring->calc_idx;
@@ -863,7 +863,7 @@ release_desc:
}
 
if (done < budget)
-   mtk_w32(eth, rx_intr, MTK_QMTK_INT_STATUS);
+   mtk_w32(eth, MTK_RX_DONE_INT, MTK_QMTK_INT_STATUS);
 
return done;
 }
@@ -946,28 +946,26 @@ static int mtk_poll_tx(struct mtk_eth *eth, int budget, 
bool *tx_again)
 static int mtk_poll(struct napi_struct *napi, int budget)
 {
struct mtk_eth *eth = container_of(napi, struct mtk_eth, rx_napi);
-   u32 status, status2, mask, tx_intr, rx_intr, status_intr;
+   u32 status, status2, mask;
int tx_done, rx_done;
bool tx_again = false;
 
status = mtk_r32(eth, MTK_QMTK_INT_STATUS);
status2 = mtk_r32(eth, MTK_INT_STATUS2);
-   tx_intr = MTK_TX_DONE_INT;
-   rx_intr = MTK_RX_DONE_INT;
-   status_intr = (MTK_GDM1_AF | MTK_GDM2_AF);
tx_done = 0;
rx_done = 0;
tx_again = 0;
 
-   if (status & tx_intr)
+   if (status & MTK_TX_DONE_INT)
tx_done = mtk_poll_tx(eth, budget, &tx_again);
 
-   if (status & rx_intr)
-   rx_done = mtk_poll_rx(napi, budget, eth, rx_intr);
+   if (status & MTK_RX_DONE_INT)
+   rx_done = mtk_poll_rx(napi, budget, eth);
 
-   if (unlikely(status2 & status_intr)) {
+   if (unlikely(status2 & (MTK_GDM1_AF | MTK_GDM2_AF))) {
mtk_stats_update(eth);
-   mtk_w32(eth, status_intr, MTK_INT_STATUS2);
+   mtk_w32(eth, (MTK_GDM1_AF | MTK_GDM2_AF),
+   MTK_INT_STATUS2);
}
 
if (unlikely(netif_msg_intr(eth))) {
@@ -985,7 +983,7 @@ static int mtk_poll(struct napi_struct *napi, int budget)
return budget;
 
napi_complete(napi);
-   mtk_irq_enable(eth, tx_intr | rx_intr);
+   mtk_irq_enable(eth, MTK_RX_DONE_INT | MTK_RX_DONE_INT);
 
return rx_done;
 }
-- 
1.7.10.4



Re: [PATCHv3 net-next 04/12] ndisc: add __ndisc_opt_addr_space function

2016-06-15 Thread Stefan Schmidt

Hello.

On 14/06/16 13:52, Alexander Aring wrote:

This patch adds __ndisc_opt_addr_space as low-level function for
ndisc_opt_addr_space which doesn't depend on net_device parameter.

Cc: David S. Miller
Cc: Alexey Kuznetsov
Cc: James Morris
Cc: Hideaki YOSHIFUJI
Cc: Patrick McHardy
Signed-off-by: Alexander Aring
---
  include/net/ndisc.h | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 2d8edaa..4cee826 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -127,10 +127,15 @@ static inline int ndisc_addr_option_pad(unsigned short 
type)
}
  }
  
+static inline int __ndisc_opt_addr_space(unsigned char addr_len, int pad)
+{
+   return NDISC_OPT_SPACE(addr_len + pad);
+}
+
  static inline int ndisc_opt_addr_space(struct net_device *dev)
  {
-   return NDISC_OPT_SPACE(dev->addr_len +
-  ndisc_addr_option_pad(dev->type));
+   return __ndisc_opt_addr_space(dev->addr_len,
+ ndisc_addr_option_pad(dev->type));
  }
  
  static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,


Reviewed-by: Stefan Schmidt

regards
Stefan Schmidt
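
For anyone checking the arithmetic behind the new helper: a minimal sketch,
assuming NDISC_OPT_SPACE() adds the 2 byte option header and rounds up to a
multiple of 8, as the macro in include/net/ndisc.h does. The DEMO_ and
demo_ names are made up for this illustration.

```c
#include <assert.h>

/* Assumed to mirror NDISC_OPT_SPACE(): 2 header bytes, rounded up to 8. */
#define DEMO_NDISC_OPT_SPACE(len) (((len) + 2 + 7) & ~7)

/* Same shape as __ndisc_opt_addr_space(): no net_device needed. */
static int demo_opt_addr_space(unsigned char addr_len, int pad)
{
	return DEMO_NDISC_OPT_SPACE(addr_len + pad);
}
```

A 6 byte Ethernet address with no padding needs 8 bytes, an 8 byte address
needs 16, and a 20 byte address with 2 bytes of padding needs 24.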


Re: [PATCHv3 net-next 05/12] ndisc: add __ndisc_opt_addr_data function

2016-06-15 Thread Stefan Schmidt

Hello.

On 14/06/16 13:52, Alexander Aring wrote:

This patch adds __ndisc_opt_addr_data as low-level function for
ndisc_opt_addr_data which doesn't depend on net_device parameter.

Cc: David S. Miller
Cc: Alexey Kuznetsov
Cc: James Morris
Cc: Hideaki YOSHIFUJI
Cc: Patrick McHardy
Signed-off-by: Alexander Aring
---
  include/net/ndisc.h | 14 ++
  1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index 4cee826..c8962ad 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -138,17 +138,23 @@ static inline int ndisc_opt_addr_space(struct net_device 
*dev)
  ndisc_addr_option_pad(dev->type));
  }
  
-static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,
- struct net_device *dev)
+static inline u8 *__ndisc_opt_addr_data(struct nd_opt_hdr *p,
+   unsigned char addr_len, int prepad)
  {
u8 *lladdr = (u8 *)(p + 1);
int lladdrlen = p->nd_opt_len << 3;
-   int prepad = ndisc_addr_option_pad(dev->type);
-   if (lladdrlen != ndisc_opt_addr_space(dev))
+   if (lladdrlen != __ndisc_opt_addr_space(addr_len, prepad))
return NULL;
return lladdr + prepad;
  }
  
+static inline u8 *ndisc_opt_addr_data(struct nd_opt_hdr *p,
+ struct net_device *dev)
+{
+   return __ndisc_opt_addr_data(p, dev->addr_len,
+ndisc_addr_option_pad(dev->type));
+}
+
  static inline u32 ndisc_hashfn(const void *pkey, const struct net_device 
*dev, __u32 *hash_rnd)
  {
const u32 *p32 = pkey;


Reviewed-by: Stefan Schmidt

regards
Stefan Schmidt
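
The length check in the new low-level function can be tried outside the
kernel. This is a rough sketch, assuming the option is laid out as a type
byte, a length byte in units of 8 octets, optional padding and then the
link-layer address; demo_opt_addr_data() and the DEMO_ macro are
hypothetical names and work on a raw byte buffer rather than struct
nd_opt_hdr.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror NDISC_OPT_SPACE(): 2 header bytes, rounded up to 8. */
#define DEMO_NDISC_OPT_SPACE(len) (((len) + 2 + 7) & ~7)

/*
 * opt[0] is the option type, opt[1] the option length in units of 8 bytes.
 * Returns a pointer to the link-layer address, or NULL if the encoded
 * length does not match what addr_len and prepad require.
 */
static const uint8_t *demo_opt_addr_data(const uint8_t *opt,
					 unsigned char addr_len, int prepad)
{
	const uint8_t *lladdr = opt + 2;	/* skip type and length */
	int lladdrlen = opt[1] << 3;		/* length field to bytes */

	if (lladdrlen != DEMO_NDISC_OPT_SPACE(addr_len + prepad))
		return NULL;
	return lladdr + prepad;
}
```

The caller passes addr_len and prepad explicitly, which is what lets the
check run without a net_device.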


Re: [PATCHv3 net-next 06/12] ndisc: add __ndisc_fill_addr_option function

2016-06-15 Thread Stefan Schmidt

Hello.

On 14/06/16 13:52, Alexander Aring wrote:

This patch adds __ndisc_fill_addr_option as low-level function for
ndisc_fill_addr_option which doesn't depend on net_device parameter.

Cc: David S. Miller
Cc: Alexey Kuznetsov
Cc: James Morris
Cc: Hideaki YOSHIFUJI
Cc: Patrick McHardy
Signed-off-by: Alexander Aring
---
  net/ipv6/ndisc.c | 14 ++
  1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index c245895..a7b9468 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -150,11 +150,10 @@ struct neigh_table nd_tbl = {
  };
  EXPORT_SYMBOL_GPL(nd_tbl);
  
-static void ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data)
+static void __ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data,
+int data_len, int pad)
  {
-   int pad   = ndisc_addr_option_pad(skb->dev->type);
-   int data_len = skb->dev->addr_len;
-   int space = ndisc_opt_addr_space(skb->dev);
+   int space = __ndisc_opt_addr_space(data_len, pad);
u8 *opt = skb_put(skb, space);
  
  	opt[0] = type;
@@ -172,6 +171,13 @@ static void ndisc_fill_addr_option(struct sk_buff *skb, 
int type, void *data)
memset(opt, 0, space);
  }
  
+static inline void ndisc_fill_addr_option(struct sk_buff *skb, int type,
+ void *data)
+{
+   __ndisc_fill_addr_option(skb, type, data, skb->dev->addr_len,
+ndisc_addr_option_pad(skb->dev->type));
+}
+
  static struct nd_opt_hdr *ndisc_next_option(struct nd_opt_hdr *cur,
struct nd_opt_hdr *end)
  {


Reviewed-by: Stefan Schmidt

regards
Stefan Schmidt


Re: [PATCHv3 net-next 09/12] ipv6: export several functions

2016-06-15 Thread Stefan Schmidt

Hello.

On 14/06/16 13:52, Alexander Aring wrote:

This patch exports some neighbour discovery functions which can be used
by 6lowpan neighbour discovery ops functionality then.

Cc: David S. Miller 
Cc: Alexey Kuznetsov 
Cc: James Morris 
Cc: Hideaki YOSHIFUJI 
Cc: Patrick McHardy 
Signed-off-by: Alexander Aring 
---
  include/net/addrconf.h |  7 +++
  include/net/ndisc.h| 12 
  net/ipv6/addrconf.c| 15 +++
  net/ipv6/ndisc.c   | 14 +++---
  4 files changed, 29 insertions(+), 19 deletions(-)

diff --git a/include/net/addrconf.h b/include/net/addrconf.h
index b1774eb..9826d3a 100644
--- a/include/net/addrconf.h
+++ b/include/net/addrconf.h
@@ -97,6 +97,13 @@ void addrconf_leave_solict(struct inet6_dev *idev, const 
struct in6_addr *addr);
  void addrconf_add_linklocal(struct inet6_dev *idev,
const struct in6_addr *addr, u32 flags);
  
+int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
+const struct prefix_info *pinfo,
+struct inet6_dev *in6_dev,
+const struct in6_addr *addr, int addr_type,
+u32 addr_flags, bool sllao, bool tokenized,
+__u32 valid_lft, u32 prefered_lft);
+
  static inline int addrconf_ifid_eui48(u8 *eui, struct net_device *dev)
  {
if (dev->addr_len != ETH_ALEN)
diff --git a/include/net/ndisc.h b/include/net/ndisc.h
index a5e2767..3f0f41d 100644
--- a/include/net/ndisc.h
+++ b/include/net/ndisc.h
@@ -53,6 +53,15 @@ enum {
  
  #include 
  
+/* Set to 3 to get tracing... */
+#define ND_DEBUG 1
+
+#define ND_PRINTK(val, level, fmt, ...)\
+do {   \
+   if (val <= ND_DEBUG) \
+   net_##level##_ratelimited(fmt, ##__VA_ARGS__);  \
+} while (0)
+
  struct ctl_table;
  struct inet6_dev;
  struct net_device;
@@ -115,6 +124,9 @@ struct ndisc_options *ndisc_parse_options(const struct 
net_device *dev,
  u8 *opt, int opt_len,
  struct ndisc_options *ndopts);
  
+void __ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data,
+ int data_len, int pad);
+
  #define NDISC_OPS_REDIRECT_DATA_SPACE 2
  
  /*

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2d678c0..9c7d660 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2333,14 +2333,12 @@ static bool is_addr_mode_generate_stable(struct 
inet6_dev *idev)
   idev->addr_gen_mode == IN6_ADDR_GEN_MODE_RANDOM;
  }
  
-static int addrconf_prefix_rcv_add_addr(struct net *net,
-   struct net_device *dev,
-   const struct prefix_info *pinfo,
-   struct inet6_dev *in6_dev,
-   const struct in6_addr *addr,
-   int addr_type, u32 addr_flags,
-   bool sllao, bool tokenized,
-   __u32 valid_lft, u32 prefered_lft)
+int addrconf_prefix_rcv_add_addr(struct net *net, struct net_device *dev,
+const struct prefix_info *pinfo,
+struct inet6_dev *in6_dev,
+const struct in6_addr *addr, int addr_type,
+u32 addr_flags, bool sllao, bool tokenized,
+__u32 valid_lft, u32 prefered_lft)
  {
struct inet6_ifaddr *ifp = ipv6_get_ifaddr(net, addr, dev, 1);
int create = 0, update_lft = 0;
@@ -2430,6 +2428,7 @@ static int addrconf_prefix_rcv_add_addr(struct net *net,
  
  	return 0;
  }
+EXPORT_SYMBOL_GPL(addrconf_prefix_rcv_add_addr);
  
  void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
  {
diff --git a/net/ipv6/ndisc.c b/net/ipv6/ndisc.c
index 2f4afd1..fe65cdc 100644
--- a/net/ipv6/ndisc.c
+++ b/net/ipv6/ndisc.c
@@ -73,15 +73,6 @@
  #include 
  #include 
  
-/* Set to 3 to get tracing... */
-#define ND_DEBUG 1
-
-#define ND_PRINTK(val, level, fmt, ...)\
-do {   \
-   if (val <= ND_DEBUG) \
-   net_##level##_ratelimited(fmt, ##__VA_ARGS__);  \
-} while (0)
-
  static u32 ndisc_hash(const void *pkey,
  const struct net_device *dev,
  __u32 *hash_rnd);
@@ -150,8 +141,8 @@ struct neigh_table nd_tbl = {
  };
  EXPORT_SYMBOL_GPL(nd_tbl);
  
-static void __ndisc_fill_addr_option(struct sk_buff *skb, int type, void *data,
-int data_len, int pad)
+void __ndisc_fill_addr_option(struct s

RE: [PATCH] netfilter/nflog: nflog-range does not truncate packets

2016-06-15 Thread Lubashev, Igor
Vish, Pablo,

I wonder about the value of sending more data than a client is willing to 
consume (setting aside the important fact that the client code crashes due to 
the extra data).

It seems that we should either drop the nflog-range parameter from nflog 
altogether (and just use the len from the client) or allow nflog-range to 
further *restrict* the number of bytes sent to the client.

The "further restrict" logic would make it easier to build iptables rules that 
vary nflog-range based on some match conditions, so a single client would get 
different packet length depending on what rules matched.

Thoughts?

- Igor


-----Original Message-----
From: Vishwanath Pai [mailto:v...@akamai.com] 
Sent: Wednesday, June 15, 2016 10:55 AM
To: Pablo Neira Ayuso 
Cc: ka...@trash.net; kad...@blackhole.kfki.hu; netfilter-de...@vger.kernel.org; 
coret...@netfilter.org; Hunt, Joshua ; 
netdev@vger.kernel.org; pai.vishw...@gmail.com; Lubashev, Igor 

Subject: Re: [PATCH] netfilter/nflog: nflog-range does not truncate packets

On 06/15/2016 08:39 AM, Pablo Neira Ayuso wrote:
> But nlmsg_len should match len in this.
> 
> If we're just sending a part of the packet to userspace, then we 
> should adjust nlmsg_len to indicate exactly the netlink message length 
> that we're sending to userspace.
> 
> Is your patch triggering this nlmsg_len != len?

The value of len here is how many bytes were returned by recv. We do send the 
entire nlmsg_len to userspace, but recv cannot copy the full packet because the 
buffer is not big enough to hold this. They only allocate the buffer assuming 
that the packet won't be bigger than their snap len, but we send more data than 
their snap len and they don't handle this condition well.


Re: [PATCH net] ixgbe: napi_poll must return the work done

2016-06-15 Thread Alexander Duyck
On Wed, Jun 15, 2016 at 6:37 AM, Paolo Abeni  wrote:
> Currently the function ixgbe_poll() returns 0 when it has completely
> cleaned the rx rings, but this fouls budget accounting in core code.
> Fix this by returning the actual work done, capped to weight - 1, since
> the core doesn't allow returning the full budget when the driver modifies
> the napi status
>
> Signed-off-by: Paolo Abeni 

I think the origin of reporting 0 was actually compatibility with some
NAPI code floating around from before the 2.6.24 kernel.

I'd be curious to know how much this is actually fouling things up.
Can you point to any specific issues it was causing?  If you end up
having to submit a v2 for any reason it might be useful if you can
provide the additional details on what actual issue it was causing.

You might also want to look at the other Intel drivers, specifically
ixgbevf and fm10k as I believe we have similar code in those drivers
as well.

Acked-by: Alexander Duyck 


[PATCH 1/2] mlx5: only register devlink when ethernet is available

2016-06-15 Thread Arnd Bergmann
We get a build error with the mlx5 driver when the ethernet
support (CONFIG_MLX5_CORE_EN) is disabled:

drivers/net/ethernet/mellanox/mlx5/core/main.c:1320:22: error: 
'mlx5_devlink_eswitch_mode_set' undeclared here (not in a function)
drivers/net/ethernet/mellanox/mlx5/core/main.c:1321:22: error: 
'mlx5_devlink_eswitch_mode_get' undeclared here (not in a function)
drivers/net/built-in.o:(.rodata+0x25a68): undefined reference to 
`mlx5_devlink_eswitch_mode_get'
drivers/net/built-in.o:(.rodata+0x25a6c): undefined reference to 
`mlx5_devlink_eswitch_mode_set'

There are actually two problems here, but they are closely related,
so I'm addressing them both:

- The header is included under an #ifdef, which is usually a bad idea
  as it hides the function declarations, so we fail to compile even
  if we don't actually use the functions in the end.
- The references to the functions are kept in the object file because
  we don't check whether they are built-in or not.

As we don't want to add any useless #ifdef here, this uses an
IS_ENABLED() check to drop the mlx5_devlink_ops structure when we don't
need it, and to skip the register/unregister step.

Signed-off-by: Arnd Bergmann 
Fixes: f7856daf57b9 ("net/mlx5: Add devlink interface")
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index dc568096b87c..d238e312b123 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -54,9 +54,7 @@
 #include 
 #include "mlx5_core.h"
 #include "fs_core.h"
-#ifdef CONFIG_MLX5_CORE_EN
 #include "eswitch.h"
-#endif
 
 MODULE_AUTHOR("Eli Cohen ");
 MODULE_DESCRIPTION("Mellanox Connect-IB, ConnectX-4 core driver");
@@ -1329,7 +1327,8 @@ static int init_one(struct pci_dev *pdev,
struct mlx5_priv *priv;
int err;
 
-   devlink = devlink_alloc(&mlx5_devlink_ops, sizeof(*dev));
+   devlink = devlink_alloc(IS_ENABLED(CONFIG_MLX5_CORE_EN) ?
+   &mlx5_devlink_ops : NULL, sizeof(*dev));
if (!devlink) {
dev_err(&pdev->dev, "kzalloc failed\n");
return -ENOMEM;
@@ -1372,7 +1371,8 @@ static int init_one(struct pci_dev *pdev,
goto clean_health;
}
 
-   err = devlink_register(devlink, &pdev->dev);
+   if (IS_ENABLED(CONFIG_MLX5_CORE_EN))
+   err = devlink_register(devlink, &pdev->dev);
if (err)
goto clean_load;
 
@@ -1397,7 +1397,8 @@ static void remove_one(struct pci_dev *pdev)
struct devlink *devlink = priv_to_devlink(dev);
struct mlx5_priv *priv = &dev->priv;
 
-   devlink_unregister(devlink);
+   if (IS_ENABLED(CONFIG_MLX5_CORE_EN))
+   devlink_unregister(devlink);
if (mlx5_unload_one(dev, priv)) {
dev_err(&dev->pdev->dev, "mlx5_unload_one failed\n");
mlx5_health_cleanup(dev);
-- 
2.9.0
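
For readers who have not used IS_ENABLED() this way: the point is that the
symbol is always declared and the condition is an ordinary C expression, so
the compiler sees both branches and then drops the dead one. Below is a
small standalone sketch of that idea. The preprocessor trick is modelled on
include/linux/kconfig.h but only handles the built-in (=y) case, and every
DEMO_/demo_ name as well as CONFIG_DEMO_EN and CONFIG_DEMO_OFF is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Evaluates to 1 if the option macro is defined to 1, otherwise to 0. */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define demo_take_second_arg(ignored, val, ...) val
#define demo_is_defined_2(arg1_or_junk) \
	demo_take_second_arg(arg1_or_junk 1, 0, 0)
#define demo_is_defined_1(val) demo_is_defined_2(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) demo_is_defined_1(option)

#define CONFIG_DEMO_EN 1	/* hypothetical option that is switched on */
/* CONFIG_DEMO_OFF is deliberately left undefined. */

struct demo_ops {
	int (*mode_get)(void);
};

static int demo_mode_get(void)
{
	return 42;
}

static const struct demo_ops demo_devlink_ops = {
	.mode_get = demo_mode_get,
};

/* The ops table is referenced unconditionally, no #ifdef needed. */
static const struct demo_ops *demo_pick_ops_en(void)
{
	return DEMO_IS_ENABLED(CONFIG_DEMO_EN) ? &demo_devlink_ops : NULL;
}

static const struct demo_ops *demo_pick_ops_off(void)
{
	return DEMO_IS_ENABLED(CONFIG_DEMO_OFF) ? &demo_devlink_ops : NULL;
}
```

The second function compiles even though its option is not set, which is
the property the #ifdef around the header include was hiding.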



[PATCH 2/2] mlx5: fix 64-bit division on times

2016-06-15 Thread Arnd Bergmann
The mlx5 driver fails to build on 32-bit architectures after some
references to 64-bit divisions got added:

drivers/net/built-in.o: In function `mlx5e_rx_am':
:(.text+0xf88ac): undefined reference to `__aeabi_ldivmod'

The driver even performs three divisions here, and it uses the
obsolete 'struct timespec' that we want to get rid of.

Using ktime_t and ktime_us_delta() replaces one of the divisions
and is mildly more efficient, aside from working across 'settimeofday'
calls and being the right type for the y2038 conversion.

Using a u32 instead of s64 to store the number of microseconds
limits the maximum time to about 71 minutes, but if we exceed that
time, we probably don't care about the result any more for the
purpose of rx coalescing.

For the number of packets, we are taking the difference between
two 'unsigned int', so the result won't ever be greater than that
either.

After those changes, the other two divisions are done as 32-bit
arithmetic operations, which are much faster.

Signed-off-by: Arnd Bergmann 
Fixes: 3841f0b3493b ("net/mlx5e: Support adaptive RX coalescing")
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h   |  2 +-
 drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c | 12 ++--
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 775b8d02a3dc..37df5728323b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -272,7 +272,7 @@ struct mlx5e_rx_am_stats {
 };
 
 struct mlx5e_rx_am_sample {
-   struct timespec time;
+   ktime_t time;
unsigned intpkt_ctr;
u16 event_ctr;
 };
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c 
b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
index cdff5cace4c2..bd0c70220a80 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx_am.c
@@ -267,7 +267,7 @@ static bool mlx5e_am_decision(struct mlx5e_rx_am_stats 
*curr_stats,
 static void mlx5e_am_sample(struct mlx5e_rq *rq,
struct mlx5e_rx_am_sample *s)
 {
-   getnstimeofday(&s->time);
+   s->time  = ktime_get();
s->pkt_ctr   = rq->stats.packets;
s->event_ctr = rq->cq.event_ctr;
 }
@@ -278,17 +278,17 @@ static void mlx5e_am_calc_stats(struct mlx5e_rx_am_sample 
*start,
struct mlx5e_rx_am_sample *end,
struct mlx5e_rx_am_stats *curr_stats)
 {
-   struct timespec time = timespec_sub(end->time, start->time);
-   s64 delta_us = timespec_to_ns(&time) / 1000;
-   s64 npkts = end->pkt_ctr - start->pkt_ctr;
+   /* u32 holds up to 71 minutes, should be enough */
+   u32 delta_us = ktime_us_delta(end->time, start->time);
+   unsigned int npkts = end->pkt_ctr - start->pkt_ctr;
 
if (!delta_us) {
WARN_ONCE(true, "mlx5e_am_calc_stats: delta_us=0\n");
return;
}
 
-   curr_stats->ppms =(npkts * 1000) / delta_us;
-   curr_stats->epms = (MLX5E_AM_NEVENTS * 1000) / delta_us;
+   curr_stats->ppms =(npkts * USEC_PER_MSEC) / delta_us;
+   curr_stats->epms = (MLX5E_AM_NEVENTS * USEC_PER_MSEC) / delta_us;
 }
 
 void mlx5e_rx_am_work(struct work_struct *work)
-- 
2.9.0
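
The "about 71 minutes" figure and the 32-bit rate calculation can be checked
with plain integer arithmetic. A minimal sketch, assuming uint32_t in place
of u32; DEMO_USEC_PER_MSEC and the demo_* functions are hypothetical
stand-ins, not the driver code:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_USEC_PER_MSEC 1000U

/* Longest interval, in whole minutes, that fits in 32 bits of microseconds. */
static uint32_t demo_u32_usec_max_minutes(void)
{
	return UINT32_MAX / (1000000U * 60U);
}

/*
 * Packets per millisecond using only 32-bit operations.  Returns 0 when no
 * time has elapsed instead of dividing by zero.  Note that the product
 * wraps once npkts exceeds UINT32_MAX / 1000 in this sketch.
 */
static uint32_t demo_ppms(uint32_t npkts, uint32_t delta_us)
{
	if (!delta_us)
		return 0;
	return (npkts * DEMO_USEC_PER_MSEC) / delta_us;
}
```

Because both operands are 32 bits wide, the compiler can use a native
divide on 32-bit targets and no libgcc helper such as __aeabi_ldivmod is
needed.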



Re: [PATCH net] ixgbe: napi_poll must return the work done

2016-06-15 Thread Paolo Abeni
On Wed, 2016-06-15 at 08:20 -0700, Alexander Duyck wrote:
> On Wed, Jun 15, 2016 at 6:37 AM, Paolo Abeni  wrote:
> > Currently the function ixgbe_poll() returns 0 when it has completely
> > cleaned the rx rings, but this fouls budget accounting in core code.
> > Fix this by returning the actual work done, capped to weight - 1, since
> > the core doesn't allow returning the full budget when the driver modifies
> > the napi status
> >
> > Signed-off-by: Paolo Abeni 
> 
> I think the origin of reporting 0 was actually compatibility with some
> NAPI code floating around from before the 2.6.24 kernel.
> 
> I'd be curious to know how much this is actually fouling things up.
> Can you point to any specific issues it was causing?  

I noticed this while instrumenting the napi poll loop for another
patch.

The buggy scenario is not easy to reproduce: it needs several NICs
receiving a significant amount of traffic on napi instances scheduled
on the same softirq.

If any of them has the buggy poll() method, the napi_poll() loop
may process (much) more than netdev_budget packets per invocation,
possibly delaying other softirqs more than needed/expected.

The maximum delay will be capped to a couple of jiffies no matter what,
due to the time-based loop end condition, so in the worst possible
scenario (most probably not a real thing), this adds a latency of 2
jiffies -  (~1.8ms on
recent h/w with HZ==1000).

> If you end up
> having to submit a v2 for any reason it might be useful if you can
> provide the additional details on what actual issue it was causing.
> 
> You might also want to look at the other Intel drivers, specifically
> ixgbevf and fm10k as I believe we have similar code in those drivers
> as well.

Thank you for the heads-up. I need to get my hands on that h/w first!

Paolo

> 
> Acked-by: Alexander Duyck 
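
To make the return-value rule from the patch description concrete, here is
a small sketch of the decision only. It assumes the convention stated in
the thread (report the work done, but never the full budget once the driver
has completed NAPI itself); demo_poll_result() is a hypothetical helper and
not code from ixgbe.

```c
#include <assert.h>

/*
 * Value a poll routine would hand back to the core.
 *   all_clean == 0: rings still have work, return the full budget so the
 *                   core polls again.
 *   all_clean != 0: the driver completes NAPI, so report the real work
 *                   done, capped to budget - 1.
 */
static int demo_poll_result(int work_done, int budget, int all_clean)
{
	if (!all_clean)
		return budget;
	return work_done < budget - 1 ? work_done : budget - 1;
}
```

Returning 0 in the all_clean case would hide the packets that were
processed from the budget accounting, which is the effect described above.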




[PATCH v2 03/11] Kbuild: always prefix objtree in LINUXINCLUDE

2016-06-15 Thread Arnd Bergmann
When $(LINUXINCLUDE) is added to the cflags of a target that
normally doesn't have it (e.g. HOSTCFLAGS), each entry in the
list is expanded so that we search both $(objtree) and $(srctree),
which is a bit silly, as we already know which of the two we
want for each entry in LINUXINCLUDE.

Also, a follow-up patch changes the behavior so we only look in
$(srctree) for manually added include paths, and that breaks finding
the generated headers.

This adds an explicit $(objtree) for each tree that we want to
look for generated files.

Signed-off-by: Arnd Bergmann 
---
 Makefile | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Makefile b/Makefile
index 45159861e645..969924783543 100644
--- a/Makefile
+++ b/Makefile
@@ -377,19 +377,19 @@ CFLAGS_KCOV   := $(call cc-option,-fsanitize-coverage=trace-pc,)
 # Use USERINCLUDE when you must reference the UAPI directories only.
 USERINCLUDE    := \
                -I$(srctree)/arch/$(hdr-arch)/include/uapi \
-               -Iarch/$(hdr-arch)/include/generated/uapi \
+               -I$(objtree)/arch/$(hdr-arch)/include/generated/uapi \
                -I$(srctree)/include/uapi \
-               -Iinclude/generated/uapi \
+               -I$(objtree)/include/generated/uapi \
                 -include $(srctree)/include/linux/kconfig.h
 
 # Use LINUXINCLUDE when you must reference the include/ directory.
 # Needed to be compatible with the O= option
 LINUXINCLUDE    := \
                -I$(srctree)/arch/$(hdr-arch)/include \
-               -Iarch/$(hdr-arch)/include/generated/uapi \
-               -Iarch/$(hdr-arch)/include/generated \
+               -I$(objtree)/arch/$(hdr-arch)/include/generated/uapi \
+               -I$(objtree)/arch/$(hdr-arch)/include/generated \
                $(if $(KBUILD_SRC), -I$(srctree)/include) \
-               -Iinclude
+               -I$(objtree)/include
 
 LINUXINCLUDE   += $(filter-out $(LINUXINCLUDE),$(USERINCLUDE))
 
-- 
2.9.0



[PATCH v2 00/11] Kbuild: fix -Wmissing-include-path warnings

2016-06-15 Thread Arnd Bergmann
The -Wmissing-include-dirs warning is enabled at "make W=1" level, and
found a bunch of actual problems in code that adds -I flags for
nonexistent directories. All of these are harmless, but clearly wrong.

Kbuild itself also adds a bunch of extra directories, including in
some cases those outside of the kernel tree (e.g. ../../include),
which can have surprising consequences.

This series fixes all the warnings I found with -Wmissing-include-dirs
enabled on ARM randconfigs and x86 allmodconfig. The non-Kbuild
patches can all be applied independently, while we probably want
the Kbuild stuff to be kept as a series, if we decide to merge them.

I have added my test patch at the end, mainly to see if the Kbuild
bot finds any other warnings on additional architectures.

Arnd

Arnd Bergmann (11):
  Kbuild: don't add ../../ to include path
  Kbuild: avoid duplicate include path
  Kbuild: always prefix objtree in LINUXINCLUDE
  Kbuild: arch: look for generated headers in obtree
  Kbuild: don't add obj tree in additional includes
  ARM: don't include removed directories
  ARM: hide mach-*/ include for ARM_SINGLE_ARMV7M
  drm: amd: remove broken include path
  net: skfb: remove obsolete -I cflag
  rtlwifi: don't add include path for rtl8188ee
  [EXPERIMENTAL] Kbuild: enable -Wmissing-include-dirs by default

 Makefile                                                | 16 ++--
 arch/alpha/boot/Makefile                                |  2 +-
 arch/arm/Makefile                                       |  2 ++
 arch/arm/mach-mvebu/Makefile                            |  3 +--
 arch/arm/mach-realview/Makefile                         |  3 +--
 arch/arm/mach-s5pv210/Makefile                          |  2 +-
 arch/powerpc/boot/Makefile                              |  2 +-
 arch/powerpc/kvm/Makefile                               |  2 +-
 arch/s390/boot/compressed/Makefile                      |  4 ++--
 arch/um/Makefile                                        |  4 ++--
 arch/x86/boot/Makefile                                  |  2 +-
 arch/x86/realmode/rm/Makefile                           |  2 +-
 drivers/gpu/drm/amd/acp/Makefile                        |  2 --
 drivers/net/fddi/skfp/Makefile                          |  2 +-
 drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile |  2 +-
 scripts/Kbuild.include                                  |  2 +-
 scripts/Makefile.lib                                    |  7 ---
 17 files changed, 31 insertions(+), 28 deletions(-)

-- 
2.9.0
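
As an illustration of what the warning checks for, the following sketch
reports -I flags whose directory does not exist. It is not how gcc implements
-Wmissing-include-dirs; the function name missing_include_dirs and the demo
tree are invented for this example, and only the joined "-Idir" spelling is
handled.

```python
import os
import tempfile


def missing_include_dirs(cflags, cwd):
    """Return the -I directories in `cflags` that do not exist.

    Relative paths are resolved against `cwd`, the directory the
    compiler would be run from.
    """
    missing = []
    for flag in cflags:
        if not flag.startswith("-I") or flag == "-I":
            continue                      # not an include path
        path = os.path.join(cwd, flag[2:])
        if not os.path.isdir(path):
            missing.append(flag[2:])
    return missing


def demo():
    """Build a tiny throwaway tree and check a flag list against it."""
    with tempfile.TemporaryDirectory() as tree:
        os.makedirs(os.path.join(tree, "include"))
        flags = ["-Iinclude",
                 "-Idrivers/net/wireless/rtlwifi",   # stale path
                 "-D__CHECK_ENDIAN__"]
        return missing_include_dirs(flags, tree)
```

In the demo tree only include/ exists, so the stale rtlwifi path from patch
10/11 is the one entry that gets reported.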



[PATCH v2 11/11] [EXPERIMENTAL] Kbuild: enable -Wmissing-include-dirs by default

2016-06-15 Thread Arnd Bergmann
I have fixed up all -Wmissing-include-dirs on ARM randconfig builds,
so we could make this the default, but I have not tested this at all
on other architectures.

This enables it anyway, just to see what other warnings we get
when the build bot analyses the branch.

Don't apply (yet).

Signed-off-by: Arnd Bergmann 
---
 Makefile | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/Makefile b/Makefile
index 969924783543..2305cbd61e60 100644
--- a/Makefile
+++ b/Makefile
@@ -781,6 +781,9 @@ endif
 NOSTDINC_FLAGS += -nostdinc -isystem $(shell $(CC) -print-file-name=include)
 CHECKFLAGS += $(NOSTDINC_FLAGS)
 
+# warn about incorrect -I include paths
+KBUILD_CFLAGS += -Wmissing-include-dirs
+
 # warn about C99 declaration after statement
 KBUILD_CFLAGS += $(call cc-option,-Wdeclaration-after-statement,)
 
-- 
2.9.0



[PATCH v2 04/11] Kbuild: arch: look for generated headers in obtree

2016-06-15 Thread Arnd Bergmann
There are very few files that need to add a -I$(obj) flag for the
preprocessor or the assembler. For C files, we always add these for both
the objtree and srctree, but for the other ones we require the Makefile
to add them, and Kbuild then adds them for both trees.

As a preparation for changing the meaning of the -I$(obj) directive to
only refer to the srctree, this changes the affected arch Makefiles
(including the two instances in arch/x86) to use an explicit $(objtree)
prefix where needed, otherwise we won't find the headers any more, as
reported by the kbuild 0day builder.

arch/x86/realmode/rm/realmode.lds.S:75:20: fatal error: pasyms.h: No such file or directory

Signed-off-by: Arnd Bergmann 
---
 arch/alpha/boot/Makefile   | 2 +-
 arch/powerpc/boot/Makefile | 2 +-
 arch/powerpc/kvm/Makefile  | 2 +-
 arch/s390/boot/compressed/Makefile | 4 ++--
 arch/um/Makefile   | 4 ++--
 arch/x86/boot/Makefile | 2 +-
 arch/x86/realmode/rm/Makefile  | 2 +-
 7 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/arch/alpha/boot/Makefile b/arch/alpha/boot/Makefile
index 8399bd0e68e8..0cbe4c59d3ce 100644
--- a/arch/alpha/boot/Makefile
+++ b/arch/alpha/boot/Makefile
@@ -15,7 +15,7 @@ targets   := vmlinux.gz vmlinux \
 OBJSTRIP   := $(obj)/tools/objstrip
 
 HOSTCFLAGS := -Wall -I$(objtree)/usr/include
-BOOTCFLAGS += -I$(obj) -I$(srctree)/$(obj)
+BOOTCFLAGS += -I$(objtree)/$(obj) -I$(srctree)/$(obj)
 
 # SRM bootable image.  Copy to offset 512 of a partition.
 $(obj)/bootimage: $(addprefix $(obj)/tools/,mkbb lxboot bootlx) $(obj)/vmlinux.nh
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 8fe78a3efc92..ad3782610cf1 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -43,7 +43,7 @@ ifeq ($(call cc-option-yn, -fstack-protector),y)
 BOOTCFLAGS += -fno-stack-protector
 endif
 
-BOOTCFLAGS += -I$(obj) -I$(srctree)/$(obj)
+BOOTCFLAGS += -I$(objtree)/$(obj) -I$(srctree)/$(obj)
 
 DTC_FLAGS  ?= -p 1024
 
diff --git a/arch/powerpc/kvm/Makefile b/arch/powerpc/kvm/Makefile
index eba0bea6e032..1f9e5529e692 100644
--- a/arch/powerpc/kvm/Makefile
+++ b/arch/powerpc/kvm/Makefile
@@ -20,7 +20,7 @@ common-objs-y += powerpc.o emulate.o emulate_loadstore.o
 obj-$(CONFIG_KVM_EXIT_TIMING) += timing.o
 obj-$(CONFIG_KVM_BOOK3S_HANDLER) += book3s_exports.o
 
-AFLAGS_booke_interrupts.o := -I$(obj)
+AFLAGS_booke_interrupts.o := -I$(objtree)/$(obj)
 
 kvm-e500-objs := \
$(common-objs-y) \
diff --git a/arch/s390/boot/compressed/Makefile b/arch/s390/boot/compressed/Makefile
index 1dd210347e12..2657a29a2026 100644
--- a/arch/s390/boot/compressed/Makefile
+++ b/arch/s390/boot/compressed/Makefile
@@ -31,10 +31,10 @@ quiet_cmd_sizes = GEN $@
 $(obj)/sizes.h: vmlinux
$(call if_changed,sizes)
 
-AFLAGS_head.o += -I$(obj)
+AFLAGS_head.o += -I$(objtree)/$(obj)
 $(obj)/head.o: $(obj)/sizes.h
 
-CFLAGS_misc.o += -I$(obj)
+CFLAGS_misc.o += -I$(objtree)/$(obj)
 $(obj)/misc.o: $(obj)/sizes.h
 
 OBJCOPYFLAGS_vmlinux.bin :=  -R .comment -S
diff --git a/arch/um/Makefile b/arch/um/Makefile
index e3abe6f3156d..0ca46ededfc7 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -78,8 +78,8 @@ include $(ARCH_DIR)/Makefile-os-$(OS)
 
 KBUILD_CPPFLAGS += -I$(srctree)/$(HOST_DIR)/include \
   -I$(srctree)/$(HOST_DIR)/include/uapi \
-  -I$(HOST_DIR)/include/generated \
-  -I$(HOST_DIR)/include/generated/uapi
+  -I$(objtree)/$(HOST_DIR)/include/generated \
+  -I$(objtree)/$(HOST_DIR)/include/generated/uapi
 
 # -Derrno=kernel_errno - This turns all kernel references to errno into
 # kernel_errno to separate them from the libc errno.  This allows -fno-common
diff --git a/arch/x86/boot/Makefile b/arch/x86/boot/Makefile
index be8e688fa0d4..12ea8f8384f4 100644
--- a/arch/x86/boot/Makefile
+++ b/arch/x86/boot/Makefile
@@ -96,7 +96,7 @@ $(obj)/zoffset.h: $(obj)/compressed/vmlinux FORCE
$(call if_changed,zoffset)
 
 
-AFLAGS_header.o += -I$(obj)
+AFLAGS_header.o += -I$(objtree)/$(obj)
 $(obj)/header.o: $(obj)/zoffset.h
 
 LDFLAGS_setup.elf  := -T
diff --git a/arch/x86/realmode/rm/Makefile b/arch/x86/realmode/rm/Makefile
index c556c5ae8de5..25012abc3409 100644
--- a/arch/x86/realmode/rm/Makefile
+++ b/arch/x86/realmode/rm/Makefile
@@ -48,7 +48,7 @@ targets += realmode.lds
 $(obj)/realmode.lds: $(obj)/pasyms.h
 
 LDFLAGS_realmode.elf := --emit-relocs -T
-CPPFLAGS_realmode.lds += -P -C -I$(obj)
+CPPFLAGS_realmode.lds += -P -C -I$(objtree)/$(obj)
 
 targets += realmode.elf
 $(obj)/realmode.elf: $(obj)/realmode.lds $(REALMODE_OBJS) FORCE
-- 
2.9.0
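
The pasyms.h failure quoted in the patch above can be reproduced in
miniature. This is a sketch, not Kbuild: find_header is a made-up helper that
mimics a first-match search over include directories, and the temporary
"src" and "obj" directories stand in for $(srctree) and $(objtree) in an
out-of-tree build.

```python
import os
import tempfile


def find_header(name, include_dirs):
    """Return the first existing `name` under `include_dirs`, else None."""
    for directory in include_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Return (lookup with srctree only, found with objtree added?)."""
    with tempfile.TemporaryDirectory() as top:
        srctree = os.path.join(top, "src")
        objtree = os.path.join(top, "obj")
        obj = "arch/x86/realmode/rm"
        os.makedirs(os.path.join(srctree, obj))
        os.makedirs(os.path.join(objtree, obj))
        # pasyms.h is generated, so it only exists in the object tree.
        open(os.path.join(objtree, obj, "pasyms.h"), "w").close()

        src_dir = os.path.join(srctree, obj)
        obj_dir = os.path.join(objtree, obj)
        src_only = find_header("pasyms.h", [src_dir])
        both = find_header("pasyms.h", [src_dir, obj_dir])
        return src_only, both is not None
```

Searching only the source directory misses the generated header, which is
why the Makefiles need the explicit object-tree path once -I$(obj) stops
being expanded for both trees.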



[PATCH v2 10/11] rtlwifi: don't add include path for rtl8188ee

2016-06-15 Thread Arnd Bergmann
For rtl8188ee, we pass -Idrivers/net/wireless/rtlwifi/ to gcc,
however that directory no longer exists, so evidently this option
is no longer required here and can be removed to avoid a warning
when building with 'make W=1' or 'gcc -Wmissing-include-dirs'.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile b/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile
index a85419a37651..676e7de27f27 100644
--- a/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile
+++ b/drivers/net/wireless/realtek/rtlwifi/rtl8188ee/Makefile
@@ -12,4 +12,4 @@ rtl8188ee-objs := \
 
 obj-$(CONFIG_RTL8188EE) += rtl8188ee.o
 
-ccflags-y += -Idrivers/net/wireless/rtlwifi -D__CHECK_ENDIAN__
+ccflags-y += -D__CHECK_ENDIAN__
-- 
2.9.0



[PATCH v2 05/11] Kbuild: don't add obj tree in additional includes

2016-06-15 Thread Arnd Bergmann
When building with separate object directories and driver specific
Makefiles that add additional header include paths, Kbuild adjusts
the gcc flags so that we include both the directory in the source
tree and in the object tree.

However, due to another bug I fixed earlier, this did not actually
include the correct directory in the object tree, so we know that
we only really need the source tree here. Also, including the
object tree sometimes causes warnings about nonexistent directories
when the include path only exists in the source.

This changes the logic to only emit the -I argument for the srctree,
not for the object tree. We still need both $(srctree)/$(src) and $(obj)
though, so I'm adding them manually.

Signed-off-by: Arnd Bergmann 
---
 scripts/Kbuild.include | 2 +-
 scripts/Makefile.lib   | 7 ---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index f8b45eb47ed3..15b196fc2f49 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -202,7 +202,7 @@ hdr-inst := -f $(srctree)/scripts/Makefile.headersinst obj
 # Prefix -I with $(srctree) if it is not an absolute path.
 # skip if -I has no parameter
 addtree = $(if $(patsubst -I%,%,$(1)), \
-$(if $(filter-out -I/% -I../%,$(1)),$(patsubst -I%,-I$(srctree)/%,$(1))) $(1))
+$(if $(filter-out -I/% -I./% -I../%,$(1)),$(patsubst -I%,-I$(srctree)/%,$(1)),$(1)))
 
 # Find all -I options and call addtree
 flags = $(foreach o,$($(1)),$(if $(filter -I%,$(o)),$(call addtree,$(o)),$(o)))
diff --git a/scripts/Makefile.lib b/scripts/Makefile.lib
index 76494e15417b..0a07f9014944 100644
--- a/scripts/Makefile.lib
+++ b/scripts/Makefile.lib
@@ -155,9 +155,10 @@ else
 # $(call addtree,-I$(obj)) locates .h files in srctree, from generated .c files
 #   and locates generated .h files
 # FIXME: Replace both with specific CFLAGS* statements in the makefiles
-__c_flags  = $(call addtree,-I$(obj)) $(call flags,_c_flags)
-__a_flags  =  $(call flags,_a_flags)
-__cpp_flags =  $(call flags,_cpp_flags)
+__c_flags  = $(if $(obj),-I$(srctree)/$(src) -I$(obj)) \
+ $(call flags,_c_flags)
+__a_flags  = $(call flags,_a_flags)
+__cpp_flags = $(call flags,_cpp_flags)
 endif
 
 c_flags= -Wp,-MD,$(depfile) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) \
-- 
2.9.0
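
The addtree change in the patch above can be modelled outside of make. The
two functions below are a sketch of my reading of the old and new macro, not
Kbuild code; the names addtree_old and addtree_new and the "/src" value used
for $(srctree) are assumptions made for this example.

```python
def addtree_old(flag, srctree):
    """Old macro: add a srctree-prefixed copy and keep the original flag."""
    path = flag[2:]
    if not path:                       # bare -I: skipped
        return []
    out = []
    if not path.startswith(("/", "../")):
        out.append("-I%s/%s" % (srctree, path))
    out.append(flag)                   # original flag is always kept
    return out


def addtree_new(flag, srctree):
    """New macro: relative paths are emitted for srctree only."""
    path = flag[2:]
    if not path:                       # bare -I: skipped
        return []
    if path.startswith(("/", "./", "../")):
        return [flag]                  # left untouched
    return ["-I%s/%s" % (srctree, path)]
```

For a relative path such as -Idrivers/foo the old version yields two entries
and the new one yields only the srctree entry. Paths starting with ./ are
passed through unchanged by the new version, which, assuming $(objtree)
expands to "." as in the top-level Makefile of that time, is what keeps the
explicit -I$(objtree)/... entries from patches 03 and 04 working.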


