Re: [PATCH 3/7] net: phy: spi_ks8995: add register initialization

2016-02-08 Thread Helmut Buchsbaum

On 02/08/2016 05:38 AM, Florian Fainelli wrote:

On 07/02/2016 14:39, Helmut Buchsbaum wrote:

Since several use cases need to set up at least some basic control
registers, add the ability to configure an array containing such
register initialization values within the platform data of the switch.
Furthermore, expose this capability to the devicetree.

Platform data now contains a pointer to an array and the array length
where each member contains the register to be initialized, the
initialization value and a register mask, since in many use cases there
is only the need to init some bits of a register, e.g. disabling unused
ports.

The devicetree notation adds the property 'settings' to the SPI node of the
ks8995 driver, which is a list of triple values (register, value, mask),
e.g.:

settings = <0x4D 0x08 0x08
 0x5D 0x08 0x08>;
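Each (register, value, mask) triple implies a masked read-modify-write: only the bits set in the mask are taken from the value, and every other bit keeps its current state. A minimal userspace sketch of that semantics, assuming a plain 256-entry array stands in for the switch's SPI-accessible registers; `struct reg_setting`, `apply_masked()` and `apply_settings()` are hypothetical names, not the patch's code:

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of the proposed "settings" property (hypothetical layout). */
struct reg_setting {
	uint8_t reg;   /* register address */
	uint8_t val;   /* value for the bits selected by mask */
	uint8_t mask;  /* which bits of the register to change */
};

/* Masked read-modify-write: take the bits selected by 'mask' from 'val',
 * keep every other bit of 'old'. */
static uint8_t apply_masked(uint8_t old, uint8_t val, uint8_t mask)
{
	return (uint8_t)((old & ~mask) | (val & mask));
}

/* Apply all settings, in order, to a simulated 256-register file. */
static void apply_settings(uint8_t regs[256], const struct reg_setting *s,
			   size_t n)
{
	for (size_t i = 0; i < n; i++)
		regs[s[i].reg] = apply_masked(regs[s[i].reg], s[i].val,
					      s[i].mask);
}
```

With the example above, `0x4D 0x08 0x08` sets only bit 3 of register 0x4D and leaves bits 0-2 and 4-7 untouched.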


You encode way too much knowledge about how to configure the switch in
the Device Tree; that knowledge belongs in the driver. This is very tempting,
because you do not dictate any use case and let people define it based
on their Device Tree source, but at the same time, this is very error
prone and does not provide what a proper device driver needs to be doing
by defining a standard and predictable behavior.

Right now this driver is a PHY driver, but it should be moved to a DSA
driver eventually such that each port is exposed as a network interface,
and you have hooks to power on/off ports based on whether a
corresponding network interface is up/down.
--
Florian

The way I built these initialization settings was inspired by the way it 
is done in the pinctrl subsystem: there you also configure the pin 
functions in a very hardware specific way (dependent on the underlying 
pinctrl hardware). Thus this was just extending a principle we can find 
in other subsystems of the kernel to this driver. Furthermore, the
register interface is already exposed to user space via sysfs,
which, in my opinion, is even more error prone than setting up the
Device Tree carefully.


Nevertheless, I can perfectly understand your point of view. This is
just what I thought when I saw that all registers are accessible from user space!


At the moment I use this driver with a KSZ8795CLX, port 5 directly
connected to a MACB/GEM of a Zynq SoC, with the need to enable the RGMII
internal clock delay (register 0x56, bit 4); otherwise the Zynq
cannot talk to the switch on its RGMII interface (being able to switch
off unused ports is just a nice add-on I use). Using the sysfs
capabilities of this driver might be an alternative, but contradicts our
requirement to set up the network interfaces as fast as possible.
Furthermore, things like IP_PNP or NFS root won't work. But maybe I should
try to move this kind of basic setup to the bootloader; I'll investigate
this idea!


Since I'm not at all (yet) familiar with the DSA subsystem I wonder how 
I could manage setting the clock delay bit with DSA. Would this be a 
driver specific setting or can it be fulfilled within the subsystem?


Since I still want to share my work for the PHY-only driver, is it OK if
I resend the patch series without part 3 to get support for the
KSZ8795? Let's talk about the part 3 functionality and moving the driver
to DSA separately!


BTW, are there any additional links about DSA complementing the kernel
documentation?

Thanks for your comments,
Helmut



Re: [PATCH 3/7] net: phy: spi_ks8995: add register initialization

2016-02-08 Thread Andrew Lunn
> At the moment I use this driver with a KSZ8795CLX, port 5 directly
> connected to a MACB/GEM of a Zynq SOC, with the need to enable the
> RGMII internal clock delay (register 0x56, bit 4), otherwise
> the Zynq cannot talk to the switch on its RGMII interface

Hi Helmut

This is possible with DSA.
Documentation/devicetree/bindings/net/dsa/dsa.txt says you can include
a phy-mode setting. phy-mode is defined in
Documentation/devicetree/bindings/net/ethernet.txt and includes
"rgmii-id", "rgmii-rxid", "rgmii-txid" which control these delays.

Andrew
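A sketch of how a driver might map the RGMII `phy-mode` strings Andrew lists onto RX/TX delay flags. It is a standalone model: in-kernel code would normally obtain a `phy_interface_t` (e.g. `PHY_INTERFACE_MODE_RGMII_ID`) from `of_get_phy_mode()` rather than compare strings, and the helper and struct names here are hypothetical. Which flag should end up setting register 0x56 bit 4 on the KSZ8795, and on which side of the link the delay belongs, are assumptions to check against the datasheet.

```c
#include <stdbool.h>
#include <string.h>

/* Which RGMII clock delays a phy-mode string asks for. */
struct rgmii_delays {
	bool rx;
	bool tx;
};

/* Returns false if 'mode' is not one of the four RGMII phy-mode strings
 * from Documentation/devicetree/bindings/net/ethernet.txt. */
static bool rgmii_delays_from_phy_mode(const char *mode, struct rgmii_delays *d)
{
	if (strcmp(mode, "rgmii") == 0) {
		d->rx = false;
		d->tx = false;
	} else if (strcmp(mode, "rgmii-id") == 0) {
		d->rx = true;
		d->tx = true;
	} else if (strcmp(mode, "rgmii-rxid") == 0) {
		d->rx = true;
		d->tx = false;
	} else if (strcmp(mode, "rgmii-txid") == 0) {
		d->rx = false;
		d->tx = true;
	} else {
		return false;   /* not an RGMII mode */
	}
	return true;
}
```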


Re: [PATCH 4/7] net: phy: spi_ks8995: add support for resetting switch using GPIO

2016-02-08 Thread Andrew Lunn
On Sun, Feb 07, 2016 at 11:39:10PM +0100, Helmut Buchsbaum wrote:
> When using device tree it is no longer possible to reset the PHY at board
> level. Furthermore, doing it in the driver allows powering down the switch
> when it is no longer used.
> 
> The patch introduces a new optional property "reset-gpios" denoting an
> appropriate GPIO handle, e.g.:
> 
> reset-gpios = <&gpio0 46 1>

The 1 here is the active-low flag.

>  
> +	pdata->reset_gpio = of_get_named_gpio(np, "reset-gpios", 0);
> +

Here you don't take any notice of the flags.

>  	/* we have something like:
>  	 * settings = <0x22 0x80 0xF0>;
>  	 *             ^    ^    ^
> @@ -484,6 +489,8 @@ static int ks8995_probe(struct spi_device *spi)
>  	if (!ks->pdata)
>  		return -ENOMEM;
>  
> +	ks->pdata->reset_gpio = -1;
> +
>  	err = ks8995_parse_dt(ks);
>  	if (err) {
>  		dev_err(&ks->spi->dev, "bad data DT data\n");
> @@ -494,6 +501,18 @@ static int ks8995_probe(struct spi_device *spi)
>  	if (!ks->pdata)
>  		ks->pdata = spi->dev.platform_data;
>  
> +	if (ks->pdata && gpio_is_valid(ks->pdata->reset_gpio)) {
> +		err = devm_gpio_request_one(&spi->dev,
> +					    ks->pdata->reset_gpio,
> +					    GPIOF_OUT_INIT_HIGH,
Hard-coded HIGH. You should determine this from the flag.

DSA has the same functionality and does support the flag. You can copy
it from there.
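The polarity handling Andrew asks for boils down to deriving the physical line level from the flags cell instead of hard-coding it. A userspace model, assuming bit 0 of the flags cell is the active-low flag as in the standard GPIO binding (the kernel's name for it is `OF_GPIO_ACTIVE_LOW`; `DT_GPIO_ACTIVE_LOW` and `reset_line_level()` are hypothetical):

```c
#include <stdbool.h>

/* Bit 0 of the DT GPIO flags cell: 1 = active low (hypothetical name). */
#define DT_GPIO_ACTIVE_LOW 0x1u

/* Physical line level (0 or 1) that holds the switch in reset
 * (in_reset == true) or releases it (in_reset == false). */
static int reset_line_level(unsigned int dt_flags, bool in_reset)
{
	bool active_low = (dt_flags & DT_GPIO_ACTIVE_LOW) != 0;

	/* "Asserted" means low on an active-low line, high otherwise. */
	return (in_reset != active_low) ? 1 : 0;
}
```

Releasing reset on an active-low line (`<&gpio0 46 1>`) needs a high level, which is why `GPIOF_OUT_INIT_HIGH` happens to work for that board; with flags `0` (active high) the same hard-coded level would hold the switch in reset. Descriptor-based GPIO consumers (`gpiod_*`) apply the active-low flag automatically, so with that API a driver only deals in logical values.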

 Andrew


[PATCH] tcp: Fix syncookies sysctl default.

2016-02-08 Thread David Miller

Unintentionally the default was changed to zero, fix
that.

Fixes: 12ed8244ed ("ipv4: Namespaceify tcp syncookies sysctl knob")
Signed-off-by: David S. Miller 
---
 net/ipv4/tcp_ipv4.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 11ae706..0d381fa 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -2389,7 +2389,7 @@ static int __net_init tcp_sk_init(struct net *net)
 
 	net->ipv4.sysctl_tcp_syn_retries = TCP_SYN_RETRIES;
 	net->ipv4.sysctl_tcp_synack_retries = TCP_SYNACK_RETRIES;
-	net->ipv4.sysctl_tcp_syncookies = 0;
+	net->ipv4.sysctl_tcp_syncookies = 1;
 	net->ipv4.sysctl_tcp_reordering = TCP_FASTRETRANS_THRESH;
 	net->ipv4.sysctl_tcp_retries1 = TCP_RETR1;
 	net->ipv4.sysctl_tcp_retries2 = TCP_RETR2;
-- 
2.5.0
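For context on why the default matters: per `Documentation/networking/ip-sysctl.txt`, `tcp_syncookies` = 0 never sends syncookies, 1 sends them only when a listener's SYN backlog overflows, and 2 sends them unconditionally. A small model of that decision (the function name is hypothetical; the real check lives in the kernel's SYN handling path):

```c
#include <stdbool.h>

/* Should a syncookie be sent for an incoming SYN? */
static bool tcp_use_syncookie(int sysctl_tcp_syncookies, bool syn_queue_full)
{
	switch (sysctl_tcp_syncookies) {
	case 0:
		return false;            /* disabled: excess SYNs are dropped */
	case 1:
		return syn_queue_full;   /* default: only under backlog overflow */
	case 2:
		return true;             /* always */
	default:
		return false;            /* other values: treated as off in this sketch */
	}
}
```

With the default restored to 1, a listener under a SYN flood falls back to syncookies instead of dropping new connection attempts.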



Re: [PATCH] tcp: Fix syncookies sysctl default.

2016-02-08 Thread Maciej Żenczykowski
Ack.  Feel free to add:

Reported-by: Maciej Żenczykowski 

(via Eric Dumazet)


Re: [RFC v2] iwlwifi: pcie: transmit queue auto-sizing

2016-02-08 Thread Michal Kazior
On 5 February 2016 at 17:47, Dave Taht  wrote:
>> A bursted txop can be as big as 5-10ms. If you consider you want to
>> queue 5-10ms worth of data for *each* station at any given time you
>> obviously introduce a lot of lag. If you have 10 stations you might
>> end up with service period at 10*10ms = 100ms. This gets even worse if
>> you consider MU-MIMO because you need to do an expensive sounding
>> procedure before transmitting. So while SU aggregation can probably
>> still work reasonably well with shorter bursts (1-2ms) MU needs at
>> least 3ms to get *any* gain when compared to SU (which obviously means
>> you want more to actually make MU pay off).
>
> I am not sure where you get these numbers. Got a spreadsheet?

Here's a nice summary on some of it:

  http://chimera.labs.oreilly.com/books/123401739/ch03.html#figure-mac-ampdu

Even if your single A-MPDU can be shorter than a txop you can burst a
few of them if my understanding is correct.

The overhead associated with MU sounding is something I've been told
only. Apparently for MU to pay off you need fairly big bursts. This
implies that the more stations you have to service the less it makes
sense to attempt MU if you consider latency.


> Gradually reducing the maximum sized txop as a function of the number
> of stations makes sense. If you have 10 stations pending delivery and
> reduced the max txop to 1ms, you hurt bandwidth at that instant, but
> by offering more service to more stations, in less time, they will
> converge on a reasonable share of the bandwidth for each, faster[1].
> And I'm sure that the person videoconferencing on a link like that
> would appreciate getting some service inside of a 10ms interval,
> rather than a 100ms.
>
> yes, there's overhead, and that's not the right number, which would
> vary as to g,n,ac and successors.
>
> You will also get more opportunities to use mu-mimo with shorter
> bursts extant and more stations being regularly serviced.
>
> [1] https://www.youtube.com/watch?v=Rb-UnHDw02o at about 13:50

This is my thinking as well, at least for most common use cases.

If you try to optimize for throughput by introducing extra induced
latency you might end up not being able to use aggregation in practice
anyway because you won't be able to start up connections and ask for
enough data - or at least that's what my intuition tells me.

But, like I've mentioned, there's interest in making it possible to
maximize for throughput (regardless of latency). This surely makes
sense for synthetic UDP benchmarks. But does it make sense for any
real-world application? No idea.


>> The rule of thumb is the
>> longer you wait the bigger capacity you can get.
>
> This is not strictly true as the "fountain" of packets is regulated by
> acks on the other side of the link, and ramp up or down as a function
> of service time and loss.

Yes, if you consider real world cases, i.e. TCP, web traffic, etc.
then you're correct.


>> Apparently there's interest in maximizing throughput but it stands in
>> direct opposition of keeping the latency down so I've been thinking
>> how to satisfy both.
>>
>> The current approach ath10k is taking (patches in review [1][2]) is to
>> use mac80211 software queues for per-station queuing, exposing queue
>> state to firmware (it decides where frames should be dequeued from)
>> and making it possible to stop/wake per-station tx subqueue with fake
>> netdev queues. I'm starting to think this is not the right way though
>> because it's inherently hard to control latency and there's a huge
>> memory overhead associated with the fake netdev queues.
>
> What is this overhead?

E.g. if you want to be able to maximize throughput for 50 MU clients
you need to be able to queue, in theory, 50*200 (roughly) frames. This
translates to both huge memory usage and latency *and* renders
(fq_)codel qdisc rather.. moot.


> Applying things like codel tends to dramatically shorten the amount of
> skbs extant...

> modern 802.11ac capable hardware has tons more
> memory...

I don't think it does. QCA988x is able to handle "only" 1424 tx
descriptors (IOW 1500-byte long MSDUs) in the driver-to-firmware tx
queue (it's a flat queue). QCA99x0 is able to handle 2500 if asked
politely.

This is still not enough to satisfy the insane "maximize the
capacity/throughput" expectations though.

You could actually argue it's too much from the bufferbloat problem
point of view anyway and Emmanuel's patch proves it is beneficial to
buffer less in driver depending on the sojourn packet time.
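The back-of-the-envelope numbers in this exchange can be reproduced with simple arithmetic. A sketch assuming 1500-byte MSDUs and plain round-robin service in which each backlogged station gets one full TXOP per round; contention, retries and MU sounding are ignored, and the helper names are hypothetical:

```c
#include <stdint.h>

#define MSDU_BYTES 1500u   /* assumed frame size, as in the thread */

/* Time until a backlogged station is served again if each of 'stations'
 * stations in turn gets a full TXOP of 'txop_us' microseconds. */
static uint32_t service_interval_us(uint32_t stations, uint32_t txop_us)
{
	return stations * txop_us;
}

/* Bytes held in queues for 'frames' frames of MSDU_BYTES each. */
static uint64_t queued_bytes(uint64_t frames)
{
	return frames * MSDU_BYTES;
}
```

That gives 100 ms per round for 10 stations with 10 ms TXOPs versus 10 ms with 1 ms TXOPs; about 15 MB for 50 clients with roughly 200 frames each; about 2.1 MB for QCA988x's 1424 descriptors; and 384 KB for a 256-descriptor queue.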


>> Also fq_codel
>> is less effective with this kind of setup.
>
> fq_codel's principal problems with working with wifi are long and
> documented in the talk above.
>
>> My current thinking is that the entire problem should be solved via
>> (per-AC) qdiscs, e.g. fq_codel. I guess one could use
>> limit/target/interval/quantum knobs to tune it for higher latency of
>> aggregation-oriented Wi-Fi links where long service time (think
>> 100-200ms

Re: [RFC v2] iwlwifi: pcie: transmit queue auto-sizing

2016-02-08 Thread Emmanuel Grumbach
On Fri, Feb 5, 2016 at 10:44 AM, Michal Kazior  wrote:
> On 4 February 2016 at 22:14, Ben Greear  wrote:
>> On 02/04/2016 12:56 PM, Grumbach, Emmanuel wrote:
>>> On 02/04/2016 10:46 PM, Ben Greear wrote:
 On 02/04/2016 12:16 PM, Emmanuel Grumbach wrote:
>
> Like many (all?) WiFi devices, Intel WiFi devices have
> transmit queues which have 256 transmit descriptors
> each and each descriptor corresponds to an MPDU.
> This means that when it is full, the queue contains
> 256 * ~1500 bytes to be transmitted (if we don't have
> A-MSDUs). The purpose of those queues is to have enough
> packets to be ready for transmission so that when the device
> gets an opportunity to transmit (TxOP), it can take as many
> packets as the spec allows and aggregate them into one
> A-MPDU or even several A-MPDUs if we are using bursts.

 I guess this is only really usable if you have exactly one
 peer connected (ie, in station mode)?

 Otherwise, you could have one slow peer and one fast one,
 and then I suspect this would not work so well?
>>>
>>>
>>> Yes. I guess this is one (big) limitation. I guess that what would happen
>>> in this case is that the latency would constantly jitter. But I also
>
> Hmm.. You'd probably need to track per-station packet sojourn time as
> well and make it possible to stop/wake queues per station.

Clearly here comes the difference between the devices you work on, and
the devices I work on. Intel devices are more client oriented. Our AP
mode doesn't handle many clients etc..

>
>
>>> noticed that I could reduce the transmit queue to 130 descriptors
>>> (instead of 256) and still reach maximal throughput because we can
>>> refill the queues quickly enough.
>>> In iwlwifi, we have plans to have one queue for each peer.
>>> This is under development. Not sure when it'll be ready. It also requires
>>> firmware change obviously.
>>
>> Per-peer queues will probably be nice, especially if we can keep the
>> buffer bloat manageable.
>
> Per-station queues sound tricky if you consider bufferbloat.

iwlwifi's A-MPDU model is different from athXk's I guess. In iwlwifi
(the Intel devices really since it is mostly firmware) the firmware
will try to use a TxOP whenever there is data in the queue. Then, once
we have a chance to transmit, we will look at what we have in the
queue and send the biggest aggregates we can (and bursts if allowed).
But we will not defer packets' transmission to get bigger aggregates.
Testing shows that under ideal conditions, we can have enough packets
in the queue to build big aggregates without creating artificial
latency.

>
> To maximize use of airtime (i.e. txop) you need to send big
> aggregates. Since aggregates are per station-tid to maximize
> multi-station performance (in AP mode) you'll need to queue a lot of
> frames, per each station, depending on the chosen tx rate.

Sure.

>
> A bursted txop can be as big as 5-10ms. If you consider you want to
> queue 5-10ms worth of data for *each* station at any given time you
> obviously introduce a lot of lag. If you have 10 stations you might
> end up with service period at 10*10ms = 100ms. This gets even worse if
> you consider MU-MIMO because you need to do an expensive sounding
> procedure before transmitting. So while SU aggregation can probably
> still work reasonably well with shorter bursts (1-2ms) MU needs at
> least 3ms to get *any* gain when compared to SU (which obviously means
> you want more to actually make MU pay off). The rule of thumb is the
> longer you wait the bigger capacity you can get.

I am not an expert about MU-MIMO, but I'll believe you :)
We can chat about this in a few days :)

Queueing frames under good conditions is fine in the sense that it is a
"good queue" (hey Stephen): you need those queues to maximize
throughput because of the bursty nature of WiFi, and the queue "moves"
quickly since you have high throughput, so the sojourn time in
your queue is relatively small. But when the link conditions get
worse you need to reduce the queue length because it doesn't really
help you anyway. This is what my patch aims at fixing.
All this is true when you have a small number of stations...
I understand from your comment that even in ideal conditions you still
need to create a lot of latency to gain TPT. Then there isn't much we
can do without impacting either TPT or latency. Then, there is a real
tradeoff.
I guess that again you are facing a classic AP problem that a station
or an AP with a small number of concurrent associated clients will
likely not have.
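One way to picture "reduce the queue length when the link gets worse" is to adapt a per-queue descriptor limit from the measured sojourn time. This sketch is hypothetical and is not the algorithm of the RFC patch: the target, the bounds (capped at the 256 descriptors per queue mentioned earlier in the thread), and the halve-when-slow / add-8-when-fast policy are illustrative choices.

```c
/* Illustrative constants, not taken from iwlwifi. */
#define QLEN_MIN          32u
#define QLEN_MAX          256u
#define TARGET_SOJOURN_US 5000u

/* Next allowed queue depth given the sojourn time of the last dequeued
 * frame: shrink fast when frames linger, regrow slowly when they don't. */
static unsigned int next_queue_limit(unsigned int limit,
				     unsigned int sojourn_us)
{
	if (sojourn_us > TARGET_SOJOURN_US)
		limit /= 2;   /* multiplicative decrease */
	else
		limit += 8;   /* additive increase */

	if (limit < QLEN_MIN)
		limit = QLEN_MIN;
	if (limit > QLEN_MAX)
		limit = QLEN_MAX;
	return limit;
}
```

The multiplicative decrease reacts quickly when the link slows down, while the small additive increase only lets the queue regrow as long as frames keep leaving within the target.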

All this encourages me in my belief that I should do something in
iwlwifi for iwlwifi and at mac80211's level since there seem to be
very different problems / use cases. But maybe this code can still suit
all those use cases, and we'd just (...) have to give different
parameters to the "algorithm"?

>
> Apparently there's interest in maximizing throughput but it stands in
> d

Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it

2016-02-08 Thread Hannes Frederic Sowa

Hello,

On 06.02.2016 20:40, Jiri Pirko wrote:

Fri, Feb 05, 2016 at 06:38:42PM CET, alexei.starovoi...@gmail.com wrote:

On Fri, Feb 05, 2016 at 11:01:22AM +0100, Hannes Frederic Sowa wrote:


Okay. I see it more as changing the mode of operation of hardware, and thus it
has nothing really to do with networking. If you say you change ethernet to
infiniband it has something to do with networking, sure. But I am fine with
this, I just thought the code size could be reduced quite a lot by adding this
to sysfs. I don't have a strong opinion on this.


there is already a way to change eth/ib via
echo 'eth' > /sys/bus/pci/drivers/mlx4_core/:02:00.0/mlx4_port1

sounds like this is another way to achieve the same?


It is. However the current way is driver-specific, not correct.


Why is driver specific not correct? Actually it is very much a device 
specific thing, isn't it?



For mlx5, we need the same; it cannot be done this way. So devlink is
the correct way to go.


Do two drivers already justify a new complete netlink api? Doesn't this 
create the same problems like netdevice naming problems which needed 
multiple years to become stable in case we have multiple cards or some 
administrator reorders the cards (biosdevorder, systemd/udev issues)? 
Are ports always stable? How can we have a 1:1 relationship with 
ifindexes and how can they be stable? It is impossible to use that in 
scripts?



Why not hide echo/cat in iproute2 instead of adding parallel netlink api?
Or this is for switches instead of nics?
Then why it's not adding to switchdev?


Note this is not specific to switch ASICs. This is for all network devices.


That's actually my fear. The relationship from "devlink-names" to 
ifindexes I didn't understand at all architecturally.


Bye,
Hannes



Re: [RFC v2] iwlwifi: pcie: transmit queue auto-sizing

2016-02-08 Thread Emmanuel Grumbach
On Mon, Feb 8, 2016 at 12:00 PM, Michal Kazior  wrote:
> On 5 February 2016 at 17:47, Dave Taht  wrote:
>>> A bursted txop can be as big as 5-10ms. If you consider you want to
>>> queue 5-10ms worth of data for *each* station at any given time you
>>> obviously introduce a lot of lag. If you have 10 stations you might
>>> end up with service period at 10*10ms = 100ms. This gets even worse if
>>> you consider MU-MIMO because you need to do an expensive sounding
>>> procedure before transmitting. So while SU aggregation can probably
>>> still work reasonably well with shorter bursts (1-2ms) MU needs at
>>> least 3ms to get *any* gain when compared to SU (which obviously means
>>> you want more to actually make MU pay off).
>>
>> I am not sure where you get these numbers. Got a spreadsheet?
>
> Here's a nice summary on some of it:
>
>   
> http://chimera.labs.oreilly.com/books/123401739/ch03.html#figure-mac-ampdu
>
> Even if your single A-MPDU can be shorter than a txop you can burst a
> few of them if my understanding is correct.
>
> The overhead associated with MU sounding is something I've been told
> only. Apparently for MU to pay off you need fairly big bursts. This
> implies that the more stations you have to service the less it makes
> sense to attempt MU if you consider latency.
>
>
>> Gradually reducing the maximum sized txop as a function of the number
>> of stations makes sense. If you have 10 stations pending delivery and
>> reduced the max txop to 1ms, you hurt bandwidth at that instant, but
>> by offering more service to more stations, in less time, they will
>> converge on a reasonable share of the bandwidth for each, faster[1].
>> And I'm sure that the person videoconferencing on a link like that
>> would appreciate getting some service inside of a 10ms interval,
>> rather than a 100ms.
>>
>> yes, there's overhead, and that's not the right number, which would
>> vary as to g,n,ac and successors.
>>
>> You will also get more opportunities to use mu-mimo with shorter
>> bursts extant and more stations being regularly serviced.
>>
>> [1] https://www.youtube.com/watch?v=Rb-UnHDw02o at about 13:50
>
> This is my thinking as well, at least for most common use cases.
>
> If you try to optimize for throughput by introducing extra induced
> latency you might end up not being able to use aggregation in practice
> anyway because you won't be able to start up connections and ask for
> enough data - or at least that's what my intuition tells me.
>
> But, like I've mentioned, there's interest in making it possible to
> maximize for throughput (regardless of latency). This surely makes
> sense for synthetic UDP benchmarks. But does it make sense for any
> real-world application? No idea.
>
>
>>> The rule of thumb is the
>>> longer you wait the bigger capacity you can get.
>>
>> This is not strictly true as the "fountain" of packets is regulated by
>> acks on the other side of the link, and ramp up or down as a function
>> of service time and loss.
>
> Yes, if you consider real world cases, i.e. TCP, web traffic, etc.
> then you're correct.
>
>
>>> Apparently there's interest in maximizing throughput but it stands in
>>> direct opposition of keeping the latency down so I've been thinking
>>> how to satisfy both.
>>>
>>> The current approach ath10k is taking (patches in review [1][2]) is to
>>> use mac80211 software queues for per-station queuing, exposing queue
>>> state to firmware (it decides where frames should be dequeued from)
>>> and making it possible to stop/wake per-station tx subqueue with fake
>>> netdev queues. I'm starting to think this is not the right way though
>>> because it's inherently hard to control latency and there's a huge
>>> memory overhead associated with the fake netdev queues.
>>
>> What is this overhead?
>
> E.g. if you want to be able to maximize throughput for 50 MU clients
> you need to be able to queue, in theory, 50*200 (roughly) frames. This
> translates to both huge memory usage and latency *and* renders
> (fq_)codel qdisc rather.. moot.

OK, now I understand. So yes, the conclusion below (make fq_codel
station-aware) makes a lot of sense.

>
>
>> Applying things like codel tends to dramatically shorten the amount of
>> skbs extant...
>
>> modern 802.11ac capable hardware has tons more
>> memory...
>
> I don't think it does. QCA988x is able to handle "only" 1424 tx
> descriptors (IOW 1500-byte long MSDUs) in the driver-to-firmware tx
> queue (it's a flat queue). QCA99x0 is able to handle 2500 if asked
> politely.

As I said, our design is not flat, which removes the need for the firmware to
classify the packets by station to be able to build aggregates, but
the downside is the number of clients you can service.

>
> This is still not enough to satisfy the insane "maximize the
> capacity/throughput" expectations though.
>
> You could actually argue it's too much from the bufferbloat problem
> point of view anyway and Emmanuel's patch proves it is beneficial to
> buffer less in drive

[PATCH v5] net: ethernet: nb8800: support fixed-link DT node

2016-02-08 Thread Sebastian Frias


Under some circumstances, for example when connecting
to a switch:

https://stackoverflow.com/questions/31046172/device-tree-for-phy-less-connection-to-a-dsa-switch

the ethernet port will not be connected to a PHY.
In that case a "fixed-link" DT node can be used to replace it.

This patch adds support for the "fixed-link" node to the
nb8800 driver.

Signed-off-by: Sebastian Frias 
---
 drivers/net/ethernet/aurora/nb8800.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
index ecc4a33..e1fb071 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -1460,7 +1460,19 @@ static int nb8800_probe(struct platform_device *pdev)
 		goto err_disable_clk;
 	}
 
-	priv->phy_node = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0);
+	if (of_phy_is_fixed_link(pdev->dev.of_node)) {
+		ret = of_phy_register_fixed_link(pdev->dev.of_node);
+		if (ret < 0) {
+			dev_err(&pdev->dev, "bad fixed-link spec\n");
+			goto err_free_bus;
+		}
+		priv->phy_node = of_node_get(pdev->dev.of_node);
+	}
+
+	if (!priv->phy_node)
+		priv->phy_node = of_parse_phandle(pdev->dev.of_node,
+						  "phy-handle", 0);
+
 	if (!priv->phy_node) {
 		dev_err(&pdev->dev, "no PHY specified\n");
 		ret = -ENODEV;
--
2.1.4
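The lookup order the patch introduces (a `fixed-link` description wins, otherwise `phy-handle`, otherwise `-ENODEV`) can be modelled outside the kernel. `struct fake_dt_node` is a hypothetical stand-in for the device-tree node: `has_fixed_link` plays the role of `of_phy_is_fixed_link()`, `fixed_link_ret` the return value of `of_phy_register_fixed_link()`, and returning the MAC node's own name mirrors `of_node_get(pdev->dev.of_node)`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the MAC's device-tree node. */
struct fake_dt_node {
	bool has_fixed_link;     /* a "fixed-link" subnode/property exists */
	int fixed_link_ret;      /* 0 or negative errno from registering it */
	const char *phy_handle;  /* target of "phy-handle", or NULL */
	const char *name;        /* the MAC node itself */
};

/* Same decision order as nb8800_probe() after the patch. */
static int pick_phy_node(const struct fake_dt_node *np, const char **phy)
{
	*phy = NULL;

	if (np->has_fixed_link) {
		if (np->fixed_link_ret < 0)
			return np->fixed_link_ret;   /* "bad fixed-link spec" */
		*phy = np->name;                 /* MAC node doubles as PHY node */
	}

	if (!*phy)
		*phy = np->phy_handle;

	return *phy ? 0 : -ENODEV;               /* "no PHY specified" */
}
```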



Re: [PATCH v3] net: ethernet: support "fixed-link" DT node on nb8800 driver

2016-02-08 Thread Sebastian Frias

On 02/05/2016 04:26 PM, Måns Rullgård wrote:

Sebastian Frias  writes:


On 02/05/2016 04:08 PM, Måns Rullgård wrote:

Sebastian Frias  writes:


On 02/05/2016 03:34 PM, Måns Rullgård wrote:

Sebastian Frias  writes:


Signed-off-by: Sebastian Frias 


Please change the subject to something like "net: ethernet: nb8800:
support fixed-link DT node" and add a comment body.


The subject is pretty explicit for such a simple patch, what else
could I add that wouldn't be unnecessary chat?


It's customary to include a description body even if it's little more
than a restatement of the subject.  Also, while the subject usually only
says _what_ the patch does, the body should additionally state _why_ it
is needed.


I understand, but _why_ it is needed is also obvious in this case; I
mean, without the patch "fixed-link" cannot be used.


Then say so.


Other patches may not be as obvious/simple and thus justify and
require more details.

Anyway, I added "Properly handles the case where the PHY is not connected
to the real MDIO bus". Would that be OK?


Have you read Documentation/SubmittingPatches?  Do so (again) and pay
special attention to section 2 "Describe your changes."


I just sent v5.
If, for whatever reason, you or anybody else thinks that the comment is
not good, would you mind proposing a comment that would make everybody
happy so that the patch goes through?
And if you or anybody else does not want the patch, could you please say 
so as well?


I have to admit this process (sending patches and getting them reviewed)
could benefit from more clarification.
For example, the process could say that at least 2 reviewers must agree
on it (on the comments made to the patch and on the patch itself).
It could also say that reviewers are to not only express their opinion
but also clearly and unequivocally accept or reject.


For instance, right now, it is not clear to me if your comments are 
"nice to have" or "blocking" the patch.

I don't know if the patch is welcome or not, etc.
So I submitted v5, but maybe it was not even necessary, it's hard to 
know where in the submission process we are.


By the way, I know some people like the command line, email, etc. but 
there ought to be other tools better suited for patch review...








---
drivers/net/ethernet/aurora/nb8800.c | 14 +-
1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
index ecc4a33..e1fb071 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -1460,7 +1460,19 @@ static int nb8800_probe(struct platform_device *pdev)
 		goto err_disable_clk;
 	}
 
-	priv->phy_node = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0);
+	if (of_phy_is_fixed_link(pdev->dev.of_node)) {
+		ret = of_phy_register_fixed_link(pdev->dev.of_node);
+		if (ret < 0) {
+			dev_err(&pdev->dev, "bad fixed-link spec\n");
+			goto err_free_bus;
+		}
+		priv->phy_node = of_node_get(pdev->dev.of_node);
+	}
+
+	if (!priv->phy_node)
+		priv->phy_node = of_parse_phandle(pdev->dev.of_node,
+						  "phy-handle", 0);
+
 	if (!priv->phy_node) {
 		dev_err(&pdev->dev, "no PHY specified\n");
 		ret = -ENODEV;
--
2.1.4








Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it

2016-02-08 Thread Jiri Pirko
Mon, Feb 08, 2016 at 11:15:38AM CET, han...@stressinduktion.org wrote:
>Hello,
>
>On 06.02.2016 20:40, Jiri Pirko wrote:
>>Fri, Feb 05, 2016 at 06:38:42PM CET, alexei.starovoi...@gmail.com wrote:
>>>On Fri, Feb 05, 2016 at 11:01:22AM +0100, Hannes Frederic Sowa wrote:

>>>>Okay. I see it more as changing the mode of operation of hardware, and thus it
>>>>has nothing really to do with networking. If you say you change ethernet to
>>>>infiniband it has something to do with networking, sure. But I am fine with
>>>>this, I just thought the code size could be reduced quite a lot by adding this
>>>>to sysfs. I don't have a strong opinion on this.
>>>
>>>there is already a way to change eth/ib via
>>>echo 'eth' > /sys/bus/pci/drivers/mlx4_core/:02:00.0/mlx4_port1
>>>
>>>sounds like this is another way to achieve the same?
>>
>>It is. However the current way is driver-specific, not correct.
>
>Why is driver specific not correct? Actually it is very much a device
>specific thing, isn't it?

Well, adding driver specific sysfs file called "driver_name_port_type"
does not seem correct to me.

>
>>For mlx5, we need the same; it cannot be done this way. So devlink is
>>the correct way to go.
>
>Do two drivers already justify a new complete netlink api? Doesn't this
>create the same problems like netdevice naming problems which needed multiple
>years to become stable in case we have multiple cards or some administrator

The thing is, other drivers would use it as well, but there's no way to
do it :) So vendors have their proprietary configuration utils. The devlink
objective is to avoid those and to introduce a vendor-neutral interface.


>reorders the cards (biosdevorder, systemd/udev issues)? Are ports always
>stable? How can we have a 1:1 relationship with ifindexes and how can they be
>stable? It is impossible to use that in scripts?

The port index is always set up by the driver; ports have stable internal
numbering. The devlink device name is not stable (just like a netdev
name), but it can easily be identified by bus name and device name. I don't
see a reason why udev cannot rename it according to some rules. By the
way, this is very similar to phyX wireless devices.


>
>>>Why not hide echo/cat in iproute2 instead of adding parallel netlink api?
>>>Or this is for switches instead of nics?
>>>Then why it's not adding to switchdev?
>>
>>Note this is not specific to switch ASICs. This is for all network devices.
>
>That's actually my fear. The relationship from "devlink-names" to ifindexes I
>didn't understand at all architecturally.

Again, this is very similar to phyX wireless devices.
I don't understand the reason for your fear :)
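Jiri's point that a devlink instance can be identified by bus name and device name, rather than by a netdev name or ifindex, can be illustrated with a hypothetical helper that builds a `bus/device` handle. The helper name and the `/` separator are assumptions for this sketch; PCI device names take the usual `domain:bus:slot.function` form, e.g. `0000:01:00.0`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: write "<bus_name>/<dev_name>" into buf.
 * Returns 0 on success, -1 on an ambiguous bus name or truncation. */
static int devlink_handle(char *buf, size_t len, const char *bus_name,
			  const char *dev_name)
{
	int n;

	/* A '/' in the bus name would make the handle ambiguous. */
	if (strchr(bus_name, '/'))
		return -1;

	n = snprintf(buf, len, "%s/%s", bus_name, dev_name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because both parts come from the bus topology, the handle stays the same however the netdevs on top of the device are named or renumbered.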


RE: [PATCH net-next 1/2] mpls: packet stats

2016-02-08 Thread David Laight
From: Francois Romieu
> Sent: 06 February 2016 10:59
> > +void mpls_stats_inc_outucastpkts(struct net_device *dev,
> > +				 const struct sk_buff *skb)
> > +{
> > +	struct mpls_dev *mdev;
> > +	struct inet6_dev *in6dev;
> 
> Nit: the scope can be reduced for both variables.

And hiding the definitions of variables in the middle of
functions just makes them harder to find.   

David



[PATCH iproute2 04/21] iplink: bridge: add support for IFLA_BR_GROUP_FWD_MASK

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_GROUP_FWD_MASK attribute
in iproute2 so it can change the group forwarding mask.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 8504be5625fa..fb448f9f863d 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -27,6 +27,7 @@ static void print_explain(FILE *f)
"  [ ageing_time AGEING_TIME ]\n"
"  [ stp_state STP_STATE ]\n"
"  [ priority PRIORITY ]\n"
+   "  [ group_fwd_mask MASK ]\n"
"  [ vlan_filtering VLAN_FILTERING ]\n"
"  [ vlan_protocol VLAN_PROTOCOL ]\n"
"\n"
@@ -110,6 +111,14 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
return -1;
}
addattr16(n, 1024, IFLA_BR_VLAN_PROTOCOL, vlan_proto);
+   } else if (matches(*argv, "group_fwd_mask") == 0) {
+   __u16 fwd_mask;
+
+   NEXT_ARG();
+   if (get_u16(&fwd_mask, *argv, 0))
+   invarg("invalid group_fwd_mask", *argv);
+
+   addattr16(n, 1024, IFLA_BR_GROUP_FWD_MASK, fwd_mask);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -212,6 +221,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_GC_TIMER])
fprintf(f, "gc_timer %llu ",
rta_getattr_u64(tb[IFLA_BR_GC_TIMER]));
+
+   if (tb[IFLA_BR_GROUP_FWD_MASK])
+   fprintf(f, "group_fwd_mask %#x ",
+   rta_getattr_u16(tb[IFLA_BR_GROUP_FWD_MASK]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3
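For reference, group_fwd_mask is a bitmap over the sixteen link-local group addresses 01:80:C2:00:00:00 to 01:80:C2:00:00:0F, where bit n is understood to enable forwarding of frames sent to 01:80:C2:00:00:0n. A minimal sketch of deriving the mask from addresses under that assumption follows; fwd_mask_bit() is a hypothetical helper, not part of iproute2, and the kernel is believed to refuse the low restricted bits (STP, pause, LACP) whatever value is computed here.

```c
#include <stdint.h>

/* Hypothetical helper: return the group_fwd_mask bit for a link-local
 * group address 01:80:C2:00:00:0X, or 0 if the address is outside that
 * range. Assumes bit n corresponds to a last address byte of n. */
static uint16_t fwd_mask_bit(const uint8_t addr[6])
{
	static const uint8_t prefix[5] = { 0x01, 0x80, 0xC2, 0x00, 0x00 };

	for (int i = 0; i < 5; i++)
		if (addr[i] != prefix[i])
			return 0;
	if (addr[5] > 0x0F)
		return 0;
	return (uint16_t)(1u << addr[5]);
}
```

Under that layout, forwarding 802.1X (01:80:C2:00:00:03) and LLDP (01:80:C2:00:00:0E) frames would be requested with group_fwd_mask 0x4008; since get_u16() is called with base 0, the hex form is accepted on the command line.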



[PATCH iproute2 01/21] iplink: bridge: export bridge_id and designated_root

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

Netlink returns the bridge_id and designated_root; we just need to
make them visible.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 25 +
 1 file changed, 25 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 00804093dcb5..6978e58e6b74 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -39,6 +39,15 @@ static void explain(void)
print_explain(stderr);
 }
 
+static void br_dump_bridge_id(const struct ifla_bridge_id *id, char *buf,
+ size_t len)
+{
+   const unsigned char *x = (const unsigned char *)id;
+
+   snprintf(buf, len, "%.2x%.2x.%.2x%.2x%.2x%.2x%.2x%.2x", x[0], x[1],
+x[2], x[3], x[4], x[5], x[6], x[7]);
+}
+
 static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
struct nlmsghdr *n)
 {
@@ -155,6 +164,22 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
ll_proto_n2a(rta_getattr_u16(tb[IFLA_BR_VLAN_PROTOCOL]),
 b1, sizeof(b1)));
}
+
+   if (tb[IFLA_BR_BRIDGE_ID]) {
+   char bridge_id[32];
+
+   br_dump_bridge_id(RTA_DATA(tb[IFLA_BR_BRIDGE_ID]), bridge_id,
+ sizeof(bridge_id));
+   fprintf(f, "bridge_id %s ", bridge_id);
+   }
+
+   if (tb[IFLA_BR_ROOT_ID]) {
+   char root_id[32];
+
+   br_dump_bridge_id(RTA_DATA(tb[IFLA_BR_ROOT_ID]), root_id,
+ sizeof(root_id));
+   fprintf(f, "designated_root %s ", root_id);
+   }
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3
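br_dump_bridge_id() prints the 8-byte identifier as the two priority bytes, a dot, then the six MAC address bytes. A standalone sketch of the same formatting is below; struct bridge_id_sketch is a local stand-in assumed to match the prio[2]/addr[6] layout of struct ifla_bridge_id, defined here so the example does not depend on kernel headers.

```c
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for struct ifla_bridge_id (assumed layout:
 * 2 priority bytes followed by a 6-byte MAC address). */
struct bridge_id_sketch {
	uint8_t prio[2];
	uint8_t addr[6];
};

/* Format the id as "PPPP.MMMMMMMMMMMM", like br_dump_bridge_id(). */
static void format_bridge_id(const struct bridge_id_sketch *id,
			     char *buf, size_t len)
{
	snprintf(buf, len, "%.2x%.2x.%.2x%.2x%.2x%.2x%.2x%.2x",
		 id->prio[0], id->prio[1],
		 id->addr[0], id->addr[1], id->addr[2],
		 id->addr[3], id->addr[4], id->addr[5]);
}
```

With the default priority 0x8000 and MAC address 00:11:22:33:44:55 this yields 8000.001122334455; a 32-byte buffer, as used in the patch, leaves ample room for the 17 characters plus the terminator.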



[PATCH iproute2 21/21] iplink: bridge: add support for netfilter call attributes

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_NF_CALL_(IP|IP6|ARP)TABLES
attributes in iproute2 so it can change their values.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 45 +
 1 file changed, 45 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index b531cb131e7c..05e27fca021b 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -46,6 +46,9 @@ static void print_explain(FILE *f)
"  [ mcast_query_interval QUERY_INTERVAL ]\n"
"  [ mcast_query_response_interval QUERY_RESPONSE_INTERVAL ]\n"
"  [ mcast_startup_query_interval STARTUP_QUERY_INTERVAL ]\n"
+   "  [ nf_call_iptables NF_CALL_IPTABLES ]\n"
+   "  [ nf_call_ip6tables NF_CALL_IP6TABLES ]\n"
+   "  [ nf_call_arptables NF_CALL_ARPTABLES ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -291,6 +294,36 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
}
addattr64(n, 1024, IFLA_BR_MCAST_STARTUP_QUERY_INTVL,
  mcast_startup_query_intvl);
+   } else if (matches(*argv, "nf_call_iptables") == 0) {
+   __u8 nf_call_ipt;
+
+   NEXT_ARG();
+   if (get_u8(&nf_call_ipt, *argv, 0)) {
+   invarg("invalid nf_call_iptables", *argv);
+   return -1;
+   }
+   addattr8(n, 1024, IFLA_BR_NF_CALL_IPTABLES,
+nf_call_ipt);
+   } else if (matches(*argv, "nf_call_ip6tables") == 0) {
+   __u8 nf_call_ip6t;
+
+   NEXT_ARG();
+   if (get_u8(&nf_call_ip6t, *argv, 0)) {
+   invarg("invalid nf_call_ip6tables", *argv);
+   return -1;
+   }
+   addattr8(n, 1024, IFLA_BR_NF_CALL_IP6TABLES,
+nf_call_ip6t);
+   } else if (matches(*argv, "nf_call_arptables") == 0) {
+   __u8 nf_call_arpt;
+
+   NEXT_ARG();
+   if (get_u8(&nf_call_arpt, *argv, 0)) {
+   invarg("invalid nf_call_arptables", *argv);
+   return -1;
+   }
+   addattr8(n, 1024, IFLA_BR_NF_CALL_ARPTABLES,
+nf_call_arpt);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -466,6 +499,18 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_STARTUP_QUERY_INTVL])
fprintf(f, "mcast_startup_query_interval %llu ",
rta_getattr_u64(tb[IFLA_BR_MCAST_STARTUP_QUERY_INTVL]));
+
+   if (tb[IFLA_BR_NF_CALL_IPTABLES])
+   fprintf(f, "nf_call_iptables %u ",
+   rta_getattr_u8(tb[IFLA_BR_NF_CALL_IPTABLES]));
+
+   if (tb[IFLA_BR_NF_CALL_IP6TABLES])
+   fprintf(f, "nf_call_ip6tables %u ",
+   rta_getattr_u8(tb[IFLA_BR_NF_CALL_IP6TABLES]));
+
+   if (tb[IFLA_BR_NF_CALL_ARPTABLES])
+   fprintf(f, "nf_call_arptables %u ",
+   rta_getattr_u8(tb[IFLA_BR_NF_CALL_ARPTABLES]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3
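Because iproute2's matches() accepts the typed word when it is a prefix of the keyword, the order of the else-if chain decides what an abbreviation selects. The sketch below reproduces that comparison as the hypothetical prefix_matches() (a local reimplementation, not linked from iproute2) and walks the three new keywords in the order the patch tests them.

```c
#include <string.h>

/* iproute2-style prefix match: returns 0 when cmd is a (possibly
 * complete) prefix of pattern, non-zero otherwise. */
static int prefix_matches(const char *cmd, const char *pattern)
{
	size_t len = strlen(cmd);

	if (len > strlen(pattern))
		return -1;
	return memcmp(pattern, cmd, len);
}

/* Keywords in the same order as the parser's else-if chain. */
static const char *const nf_opts[] = {
	"nf_call_iptables", "nf_call_ip6tables", "nf_call_arptables",
};

/* Return the first keyword the user's word selects, or NULL. */
static const char *resolve_nf_opt(const char *word)
{
	for (size_t i = 0; i < sizeof(nf_opts) / sizeof(nf_opts[0]); i++)
		if (prefix_matches(word, nf_opts[i]) == 0)
			return nf_opts[i];
	return NULL;
}
```

So nf_call_ip silently selects nf_call_iptables, and at least nf_call_ip6 has to be typed to reach the IPv6 setting.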



[PATCH iproute2 06/21] iplink: bridge: add support for IFLA_BR_VLAN_DEFAULT_PVID

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_VLAN_DEFAULT_PVID
attribute in iproute2 so it can change the default pvid.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 3d343f7649fe..e8bc66b498b3 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -31,6 +31,7 @@ static void print_explain(FILE *f)
"  [ group_address ADDRESS ]\n"
"  [ vlan_filtering VLAN_FILTERING ]\n"
"  [ vlan_protocol VLAN_PROTOCOL ]\n"
+   "  [ vlan_default_pvid VLAN_DEFAULT_PVID ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -129,6 +130,15 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
if (len < 0)
return -1;
addattr_l(n, 1024, IFLA_BR_GROUP_ADDR, llabuf, len);
+   } else if (matches(*argv, "vlan_default_pvid") == 0) {
+   __u16 default_pvid;
+
+   NEXT_ARG();
+   if (get_u16(&default_pvid, *argv, 0))
+   invarg("invalid vlan_default_pvid", *argv);
+
+   addattr16(n, 1024, IFLA_BR_VLAN_DEFAULT_PVID,
+ default_pvid);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -232,6 +242,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
fprintf(f, "gc_timer %llu ",
rta_getattr_u64(tb[IFLA_BR_GC_TIMER]));
 
+   if (tb[IFLA_BR_VLAN_DEFAULT_PVID])
+   fprintf(f, "vlan_default_pvid %u ",
+   rta_getattr_u16(tb[IFLA_BR_VLAN_DEFAULT_PVID]));
+
if (tb[IFLA_BR_GROUP_FWD_MASK])
fprintf(f, "group_fwd_mask %#x ",
rta_getattr_u16(tb[IFLA_BR_GROUP_FWD_MASK]));
-- 
2.4.3
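get_u16() accepts any 16-bit value, while a PVID is a 12-bit VLAN ID. A minimal pre-check follows, assuming the kernel accepts 0 (disable the default PVID) and 1 to 4094 and rejects 4095 and above; default_pvid_ok() and SKETCH_VLAN_VID_MASK are hypothetical names, and the patch itself leaves validation to the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_VLAN_VID_MASK 0x0fff /* assumed 12-bit VID field */

/* Hypothetical pre-check mirroring the assumed kernel rule:
 * 0 disables the default PVID, 1..4094 are usable VLAN IDs. */
static bool default_pvid_ok(uint16_t pvid)
{
	return pvid < SKETCH_VLAN_VID_MASK;
}
```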



[PATCH iproute2 18/21] iplink: bridge: add support for IFLA_BR_MCAST_QUERY_INTVL

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_QUERY_INTVL attribute
in iproute2 so it can change the query interval.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index f896f54aa6f8..c5e1fcc6b9a9 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -43,6 +43,7 @@ static void print_explain(FILE *f)
"  [ mcast_last_member_interval LAST_MEMBER_INTERVAL ]\n"
"  [ mcast_membership_interval MEMBERSHIP_INTERVAL ]\n"
"  [ mcast_querier_interval QUERIER_INTERVAL ]\n"
+   "  [ mcast_query_interval QUERY_INTERVAL ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -255,6 +256,17 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
}
addattr64(n, 1024, IFLA_BR_MCAST_QUERIER_INTVL,
  mcast_querier_intvl);
+   } else if (matches(*argv, "mcast_query_interval") == 0) {
+   __u64 mcast_query_intvl;
+
+   NEXT_ARG();
+   if (get_u64(&mcast_query_intvl, *argv, 0)) {
+   invarg("invalid mcast_query_interval",
+  *argv);
+   return -1;
+   }
+   addattr64(n, 1024, IFLA_BR_MCAST_QUERY_INTVL,
+ mcast_query_intvl);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -418,6 +430,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_QUERIER_INTVL])
fprintf(f, "mcast_querier_interval %llu ",
rta_getattr_u64(tb[IFLA_BR_MCAST_QUERIER_INTVL]));
+
+   if (tb[IFLA_BR_MCAST_QUERY_INTVL])
+   fprintf(f, "mcast_query_interval %llu ",
+   rta_getattr_u64(tb[IFLA_BR_MCAST_QUERY_INTVL]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3
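The interval values are sent to the kernel unscaled. The kernel is understood to interpret them as clock_t (USER_HZ) ticks, typically 100 per second, so mcast_query_interval 12500 would mean 125 seconds. A small conversion sketch, with USER_HZ hard-coded to 100 as an assumption and hypothetical helper names:

```c
#include <stdint.h>

#define SKETCH_USER_HZ 100ULL /* assumed clock_t tick rate */

/* Milliseconds to the clock_t ticks assumed by the bridge interval
 * attributes (rounds down). */
static uint64_t ms_to_clock_t(uint64_t ms)
{
	return ms * SKETCH_USER_HZ / 1000ULL;
}

/* Inverse conversion, for reading values shown by "ip -d link show". */
static uint64_t clock_t_to_ms(uint64_t ticks)
{
	return ticks * 1000ULL / SKETCH_USER_HZ;
}
```

A caller who thinks in seconds would therefore pass ms_to_clock_t(125000), i.e. 12500.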



[PATCH iproute2 17/21] iplink: bridge: add support for IFLA_BR_MCAST_QUERIER_INTVL

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_QUERIER_INTVL
attribute in iproute2 so it can change the querier interval.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 9a6c9418ff0f..f896f54aa6f8 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -42,6 +42,7 @@ static void print_explain(FILE *f)
"  [ mcast_startup_query_count STARTUP_QUERY_COUNT ]\n"
"  [ mcast_last_member_interval LAST_MEMBER_INTERVAL ]\n"
"  [ mcast_membership_interval MEMBERSHIP_INTERVAL ]\n"
+   "  [ mcast_querier_interval QUERIER_INTERVAL ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -243,6 +244,17 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
}
addattr64(n, 1024, IFLA_BR_MCAST_MEMBERSHIP_INTVL,
  mcast_membership_intvl);
+   } else if (matches(*argv, "mcast_querier_interval") == 0) {
+   __u64 mcast_querier_intvl;
+
+   NEXT_ARG();
+   if (get_u64(&mcast_querier_intvl, *argv, 0)) {
+   invarg("invalid mcast_querier_interval",
+  *argv);
+   return -1;
+   }
+   addattr64(n, 1024, IFLA_BR_MCAST_QUERIER_INTVL,
+ mcast_querier_intvl);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -402,6 +414,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_MEMBERSHIP_INTVL])
fprintf(f, "mcast_membership_interval %llu ",
rta_getattr_u64(tb[IFLA_BR_MCAST_MEMBERSHIP_INTVL]));
+
+   if (tb[IFLA_BR_MCAST_QUERIER_INTVL])
+   fprintf(f, "mcast_querier_interval %llu ",
+   rta_getattr_u64(tb[IFLA_BR_MCAST_QUERIER_INTVL]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3
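The querier and membership intervals are not independent knobs in IGMPv3: RFC 3376 (sections 8.4 and 8.5) derives them from the robustness variable, the query interval and the query response interval. A sketch of those two formulas in the same clock_t units, assuming 100 ticks per second; the function names are illustrative only.

```c
#include <stdint.h>

/* RFC 3376 8.4: Group Membership Interval =
 * robustness * query interval + query response interval. */
static uint64_t membership_interval(uint64_t robustness,
				    uint64_t query_intvl,
				    uint64_t query_resp_intvl)
{
	return robustness * query_intvl + query_resp_intvl;
}

/* RFC 3376 8.5: Other Querier Present Interval =
 * robustness * query interval + half the query response interval. */
static uint64_t querier_interval(uint64_t robustness,
				 uint64_t query_intvl,
				 uint64_t query_resp_intvl)
{
	return robustness * query_intvl + query_resp_intvl / 2;
}
```

With robustness 2, a 125 s query interval (12500) and a 10 s response interval (1000) this gives 26000 and 25500 ticks (260 s and 255 s), which appear to match the bridge's defaults.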



[PATCH iproute2 09/21] iplink: bridge: add support for IFLA_BR_MCAST_QUERY_USE_IFADDR

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_QUERY_USE_IFADDR
attribute in iproute2 so it can toggle the multicast_query_use_ifaddr value.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 208d155df440..30f7b8014e2a 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -34,6 +34,7 @@ static void print_explain(FILE *f)
"  [ vlan_default_pvid VLAN_DEFAULT_PVID ]\n"
"  [ mcast_snooping MULTICAST_SNOOPING ]\n"
"  [ mcast_router MULTICAST_ROUTER ]\n"
+   "  [ mcast_query_use_ifaddr MCAST_QUERY_USE_IFADDR ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -157,6 +158,16 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
invarg("invalid mcast_snooping", *argv);
 
addattr8(n, 1024, IFLA_BR_MCAST_SNOOPING, mcast_snoop);
+   } else if (matches(*argv, "mcast_query_use_ifaddr") == 0) {
+   __u8 mcast_qui;
+
+   NEXT_ARG();
+   if (get_u8(&mcast_qui, *argv, 0))
+   invarg("invalid mcast_query_use_ifaddr",
+  *argv);
+
+   addattr8(n, 1024, IFLA_BR_MCAST_QUERY_USE_IFADDR,
+mcast_qui);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -284,6 +295,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_ROUTER])
fprintf(f, "mcast_router %u ",
rta_getattr_u8(tb[IFLA_BR_MCAST_ROUTER]));
+
+   if (tb[IFLA_BR_MCAST_QUERY_USE_IFADDR])
+   fprintf(f, "mcast_query_use_ifaddr %u ",
+   rta_getattr_u8(tb[IFLA_BR_MCAST_QUERY_USE_IFADDR]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 19/21] iplink: bridge: add support for IFLA_BR_MCAST_QUERY_RESPONSE_INTVL

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_QUERY_RESPONSE_INTVL
attribute in iproute2 so it can change the query response interval.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index c5e1fcc6b9a9..063b87142882 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -44,6 +44,7 @@ static void print_explain(FILE *f)
"  [ mcast_membership_interval MEMBERSHIP_INTERVAL ]\n"
"  [ mcast_querier_interval QUERIER_INTERVAL ]\n"
"  [ mcast_query_interval QUERY_INTERVAL ]\n"
+   "  [ mcast_query_response_interval QUERY_RESPONSE_INTERVAL ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -267,6 +268,17 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
}
addattr64(n, 1024, IFLA_BR_MCAST_QUERY_INTVL,
  mcast_query_intvl);
+   } else if (!matches(*argv, "mcast_query_response_interval")) {
+   __u64 mcast_query_resp_intvl;
+
+   NEXT_ARG();
+   if (get_u64(&mcast_query_resp_intvl, *argv, 0)) {
+   invarg("invalid mcast_query_response_interval",
+  *argv);
+   return -1;
+   }
+   addattr64(n, 1024, IFLA_BR_MCAST_QUERY_RESPONSE_INTVL,
+ mcast_query_resp_intvl);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -434,6 +446,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_QUERY_INTVL])
fprintf(f, "mcast_query_interval %llu ",
rta_getattr_u64(tb[IFLA_BR_MCAST_QUERY_INTVL]));
+
+   if (tb[IFLA_BR_MCAST_QUERY_RESPONSE_INTVL])
+   fprintf(f, "mcast_query_response_interval %llu ",
+   rta_getattr_u64(tb[IFLA_BR_MCAST_QUERY_RESPONSE_INTVL]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 20/21] iplink: bridge: add support for IFLA_BR_MCAST_STARTUP_QUERY_INTVL

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_STARTUP_QUERY_INTVL
attribute in iproute2 so it can change the startup query interval.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 063b87142882..b531cb131e7c 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -45,6 +45,7 @@ static void print_explain(FILE *f)
"  [ mcast_querier_interval QUERIER_INTERVAL ]\n"
"  [ mcast_query_interval QUERY_INTERVAL ]\n"
"  [ mcast_query_response_interval QUERY_RESPONSE_INTERVAL ]\n"
+   "  [ mcast_startup_query_interval STARTUP_QUERY_INTERVAL ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -279,6 +280,17 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
}
addattr64(n, 1024, IFLA_BR_MCAST_QUERY_RESPONSE_INTVL,
  mcast_query_resp_intvl);
+   } else if (!matches(*argv, "mcast_startup_query_interval")) {
+   __u64 mcast_startup_query_intvl;
+
+   NEXT_ARG();
+   if (get_u64(&mcast_startup_query_intvl, *argv, 0)) {
+   invarg("invalid mcast_startup_query_interval",
+  *argv);
+   return -1;
+   }
+   addattr64(n, 1024, IFLA_BR_MCAST_STARTUP_QUERY_INTVL,
+ mcast_startup_query_intvl);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -450,6 +462,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_QUERY_RESPONSE_INTVL])
fprintf(f, "mcast_query_response_interval %llu ",
rta_getattr_u64(tb[IFLA_BR_MCAST_QUERY_RESPONSE_INTVL]));
+
+   if (tb[IFLA_BR_MCAST_STARTUP_QUERY_INTVL])
+   fprintf(f, "mcast_startup_query_interval %llu ",
+   rta_getattr_u64(tb[IFLA_BR_MCAST_STARTUP_QUERY_INTVL]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 10/21] iplink: bridge: add support for IFLA_BR_MCAST_QUERIER

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_QUERIER attribute
in iproute2 so it can toggle the mcast querier value.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 30f7b8014e2a..0d5c70312f1d 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -35,6 +35,7 @@ static void print_explain(FILE *f)
"  [ mcast_snooping MULTICAST_SNOOPING ]\n"
"  [ mcast_router MULTICAST_ROUTER ]\n"
"  [ mcast_query_use_ifaddr MCAST_QUERY_USE_IFADDR ]\n"
+   "  [ mcast_querier MULTICAST_QUERIER ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -168,6 +169,14 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
 
addattr8(n, 1024, IFLA_BR_MCAST_QUERY_USE_IFADDR,
 mcast_qui);
+   } else if (matches(*argv, "mcast_querier") == 0) {
+   __u8 mcast_querier;
+
+   NEXT_ARG();
+   if (get_u8(&mcast_querier, *argv, 0))
+   invarg("invalid mcast_querier", *argv);
+
+   addattr8(n, 1024, IFLA_BR_MCAST_QUERIER, mcast_querier);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -299,6 +308,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_QUERY_USE_IFADDR])
fprintf(f, "mcast_query_use_ifaddr %u ",
rta_getattr_u8(tb[IFLA_BR_MCAST_QUERY_USE_IFADDR]));
+
+   if (tb[IFLA_BR_MCAST_QUERIER])
+   fprintf(f, "mcast_querier %u ",
+   rta_getattr_u8(tb[IFLA_BR_MCAST_QUERIER]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 14/21] iplink: bridge: add support for IFLA_BR_MCAST_STARTUP_QUERY_CNT

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_STARTUP_QUERY_CNT
attribute in iproute2 so it can change the startup query count.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index d98b126698e2..d10a9255eb56 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -39,6 +39,7 @@ static void print_explain(FILE *f)
"  [ mcast_hash_elasticity HASH_ELASTICITY ]\n"
"  [ mcast_hash_max HASH_MAX ]\n"
"  [ mcast_last_member_count LAST_MEMBER_COUNT ]\n"
+   "  [ mcast_startup_query_count STARTUP_QUERY_COUNT ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -209,6 +210,16 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
 
addattr32(n, 1024, IFLA_BR_MCAST_LAST_MEMBER_CNT,
  mcast_lmc);
+   } else if (matches(*argv, "mcast_startup_query_count") == 0) {
+   __u32 mcast_sqc;
+
+   NEXT_ARG();
+   if (get_u32(&mcast_sqc, *argv, 0))
+   invarg("invalid mcast_startup_query_count",
+  *argv);
+
+   addattr32(n, 1024, IFLA_BR_MCAST_STARTUP_QUERY_CNT,
+ mcast_sqc);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -356,6 +367,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_LAST_MEMBER_CNT])
fprintf(f, "mcast_last_member_count %u ",
rta_getattr_u32(tb[IFLA_BR_MCAST_LAST_MEMBER_CNT]));
+
+   if (tb[IFLA_BR_MCAST_STARTUP_QUERY_CNT])
+   fprintf(f, "mcast_startup_query_count %u ",
+   rta_getattr_u32(tb[IFLA_BR_MCAST_STARTUP_QUERY_CNT]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 11/21] iplink: bridge: add support for IFLA_BR_MCAST_HASH_ELASTICITY

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_HASH_ELASTICITY
attribute in iproute2 so it can change the hash elasticity value.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 0d5c70312f1d..14aae3e9b0d2 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -36,6 +36,7 @@ static void print_explain(FILE *f)
"  [ mcast_router MULTICAST_ROUTER ]\n"
"  [ mcast_query_use_ifaddr MCAST_QUERY_USE_IFADDR ]\n"
"  [ mcast_querier MULTICAST_QUERIER ]\n"
+   "  [ mcast_hash_elasticity HASH_ELASTICITY ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -177,6 +178,16 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
invarg("invalid mcast_querier", *argv);
 
addattr8(n, 1024, IFLA_BR_MCAST_QUERIER, mcast_querier);
+   } else if (matches(*argv, "mcast_hash_elasticity") == 0) {
+   __u32 mcast_hash_el;
+
+   NEXT_ARG();
+   if (get_u32(&mcast_hash_el, *argv, 0))
+   invarg("invalid mcast_hash_elasticity",
+  *argv);
+
+   addattr32(n, 1024, IFLA_BR_MCAST_HASH_ELASTICITY,
+ mcast_hash_el);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -312,6 +323,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_QUERIER])
fprintf(f, "mcast_querier %u ",
rta_getattr_u8(tb[IFLA_BR_MCAST_QUERIER]));
+
+   if (tb[IFLA_BR_MCAST_HASH_ELASTICITY])
+   fprintf(f, "mcast_hash_elasticity %u ",
+   rta_getattr_u32(tb[IFLA_BR_MCAST_HASH_ELASTICITY]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 15/21] iplink: bridge: add support for IFLA_BR_MCAST_LAST_MEMBER_INTVL

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_LAST_MEMBER_INTVL
attribute in iproute2 so it can change the last member interval.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index d10a9255eb56..f2f52e078dbd 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -40,6 +40,7 @@ static void print_explain(FILE *f)
"  [ mcast_hash_max HASH_MAX ]\n"
"  [ mcast_last_member_count LAST_MEMBER_COUNT ]\n"
"  [ mcast_startup_query_count STARTUP_QUERY_COUNT ]\n"
+   "  [ mcast_last_member_interval LAST_MEMBER_INTERVAL ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -220,6 +221,16 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
 
addattr32(n, 1024, IFLA_BR_MCAST_STARTUP_QUERY_CNT,
  mcast_sqc);
+   } else if (matches(*argv, "mcast_last_member_interval") == 0) {
+   __u64 mcast_last_member_intvl;
+
+   NEXT_ARG();
+   if (get_u64(&mcast_last_member_intvl, *argv, 0))
+   invarg("invalid mcast_last_member_interval",
+  *argv);
+
+   addattr64(n, 1024, IFLA_BR_MCAST_LAST_MEMBER_INTVL,
+ mcast_last_member_intvl);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -371,6 +382,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_STARTUP_QUERY_CNT])
fprintf(f, "mcast_startup_query_count %u ",
rta_getattr_u32(tb[IFLA_BR_MCAST_STARTUP_QUERY_CNT]));
+
+   if (tb[IFLA_BR_MCAST_LAST_MEMBER_INTVL])
+   fprintf(f, "mcast_last_member_interval %llu ",
+   rta_getattr_u64(tb[IFLA_BR_MCAST_LAST_MEMBER_INTVL]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 05/21] iplink: bridge: add support for IFLA_BR_GROUP_ADDR

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_GROUP_ADDR attribute
in iproute2 so it can change the group address.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 19 +++
 1 file changed, 19 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index fb448f9f863d..3d343f7649fe 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -28,6 +28,7 @@ static void print_explain(FILE *f)
"  [ stp_state STP_STATE ]\n"
"  [ priority PRIORITY ]\n"
"  [ group_fwd_mask MASK ]\n"
+   "  [ group_address ADDRESS ]\n"
"  [ vlan_filtering VLAN_FILTERING ]\n"
"  [ vlan_protocol VLAN_PROTOCOL ]\n"
"\n"
@@ -119,6 +120,15 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
invarg("invalid group_fwd_mask", *argv);
 
addattr16(n, 1024, IFLA_BR_GROUP_FWD_MASK, fwd_mask);
+   } else if (matches(*argv, "group_address") == 0) {
+   char llabuf[32];
+   int len;
+
+   NEXT_ARG();
+   len = ll_addr_a2n(llabuf, sizeof(llabuf), *argv);
+   if (len < 0)
+   return -1;
+   addattr_l(n, 1024, IFLA_BR_GROUP_ADDR, llabuf, len);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -225,6 +235,15 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_GROUP_FWD_MASK])
fprintf(f, "group_fwd_mask %#x ",
rta_getattr_u16(tb[IFLA_BR_GROUP_FWD_MASK]));
+
+   if (tb[IFLA_BR_GROUP_ADDR]) {
+   SPRINT_BUF(mac);
+
+   fprintf(f, "group_address %s ",
+   ll_addr_n2a(RTA_DATA(tb[IFLA_BR_GROUP_ADDR]),
+   RTA_PAYLOAD(tb[IFLA_BR_GROUP_ADDR]),
+   1 /*ARPHDR_ETHER*/, mac, sizeof(mac)));
+   }
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3
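ll_addr_a2n() converts the colon-separated string into raw bytes, and the kernel is understood to accept only link-local group addresses of the form 01:80:C2:00:00:0X (and to reject a few reserved values of X on top of that). A standalone sketch of the parse-then-check flow follows; parse_group_addr() uses sscanf() rather than iproute2's parser and, like is_link_local_group(), is a hypothetical helper that checks only the 0X range.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse exactly six colon-separated hex bytes ("aa:bb:cc:dd:ee:ff").
 * The trailing %c catches extra characters after the sixth byte. */
static bool parse_group_addr(const char *s, uint8_t out[6])
{
	unsigned int b[6];
	char extra;

	if (sscanf(s, "%2x:%2x:%2x:%2x:%2x:%2x%c",
		   &b[0], &b[1], &b[2], &b[3], &b[4], &b[5], &extra) != 6)
		return false;
	for (int i = 0; i < 6; i++)
		out[i] = (uint8_t)b[i];
	return true;
}

/* True for 01:80:C2:00:00:00 through 01:80:C2:00:00:0F. */
static bool is_link_local_group(const uint8_t a[6])
{
	return a[0] == 0x01 && a[1] == 0x80 && a[2] == 0xC2 &&
	       a[3] == 0x00 && a[4] == 0x00 && (a[5] & 0xF0) == 0;
}
```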



[PATCH iproute2 12/21] iplink: bridge: add support for IFLA_BR_MCAST_HASH_MAX

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_HASH_MAX attribute
in iproute2 so it can change the maximum hashed entries.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 14aae3e9b0d2..d15bd45dcdf6 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -37,6 +37,7 @@ static void print_explain(FILE *f)
"  [ mcast_query_use_ifaddr MCAST_QUERY_USE_IFADDR ]\n"
"  [ mcast_querier MULTICAST_QUERIER ]\n"
"  [ mcast_hash_elasticity HASH_ELASTICITY ]\n"
+   "  [ mcast_hash_max HASH_MAX ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -188,6 +189,15 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
 
addattr32(n, 1024, IFLA_BR_MCAST_HASH_ELASTICITY,
  mcast_hash_el);
+   } else if (matches(*argv, "mcast_hash_max") == 0) {
+   __u32 mcast_hash_max;
+
+   NEXT_ARG();
+   if (get_u32(&mcast_hash_max, *argv, 0))
+   invarg("invalid mcast_hash_max", *argv);
+
+   addattr32(n, 1024, IFLA_BR_MCAST_HASH_MAX,
+ mcast_hash_max);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -327,6 +337,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_HASH_ELASTICITY])
fprintf(f, "mcast_hash_elasticity %u ",
rta_getattr_u32(tb[IFLA_BR_MCAST_HASH_ELASTICITY]));
+
+   if (tb[IFLA_BR_MCAST_HASH_MAX])
+   fprintf(f, "mcast_hash_max %u ",
+   rta_getattr_u32(tb[IFLA_BR_MCAST_HASH_MAX]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3
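get_u32() accepts any value, but the bridge multicast code of this era is believed to reject an mcast_hash_max that is not a power of two. A hypothetical pre-check, with a helper that suggests the next valid value; neither function exists in iproute2.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when v is a non-zero power of two (exactly one bit set). */
static bool hash_max_ok(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

/* Round v up to the next power of two. Values above 2^31 cannot be
 * rounded within 32 bits, so 0 is returned for them. */
static uint32_t hash_max_round_up(uint32_t v)
{
	uint32_t p = 1;

	if (v > (1u << 31))
		return 0;
	while (p < v)
		p <<= 1;
	return p;
}
```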



[PATCH iproute2 13/21] iplink: bridge: add support for IFLA_BR_MCAST_LAST_MEMBER_CNT

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_LAST_MEMBER_CNT
attribute in iproute2 so it can change the last member count value.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index d15bd45dcdf6..d98b126698e2 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -38,6 +38,7 @@ static void print_explain(FILE *f)
"  [ mcast_querier MULTICAST_QUERIER ]\n"
"  [ mcast_hash_elasticity HASH_ELASTICITY ]\n"
"  [ mcast_hash_max HASH_MAX ]\n"
+   "  [ mcast_last_member_count LAST_MEMBER_COUNT ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -198,6 +199,16 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
 
addattr32(n, 1024, IFLA_BR_MCAST_HASH_MAX,
  mcast_hash_max);
+   } else if (matches(*argv, "mcast_last_member_count") == 0) {
+   __u32 mcast_lmc;
+
+   NEXT_ARG();
+   if (get_u32(&mcast_lmc, *argv, 0))
+   invarg("invalid mcast_last_member_count",
+  *argv);
+
+   addattr32(n, 1024, IFLA_BR_MCAST_LAST_MEMBER_CNT,
+ mcast_lmc);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -341,6 +352,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_HASH_MAX])
fprintf(f, "mcast_hash_max %u ",
rta_getattr_u32(tb[IFLA_BR_MCAST_HASH_MAX]));
+
+   if (tb[IFLA_BR_MCAST_LAST_MEMBER_CNT])
+   fprintf(f, "mcast_last_member_count %u ",
+   rta_getattr_u32(tb[IFLA_BR_MCAST_LAST_MEMBER_CNT]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 08/21] iplink: bridge: add support for IFLA_BR_MCAST_SNOOPING

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_SNOOPING attribute
in iproute2 so it can change the multicast snooping value.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 11577a8994a3..208d155df440 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -32,6 +32,7 @@ static void print_explain(FILE *f)
"  [ vlan_filtering VLAN_FILTERING ]\n"
"  [ vlan_protocol VLAN_PROTOCOL ]\n"
"  [ vlan_default_pvid VLAN_DEFAULT_PVID ]\n"
+   "  [ mcast_snooping MULTICAST_SNOOPING ]\n"
"  [ mcast_router MULTICAST_ROUTER ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
@@ -148,6 +149,14 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
invarg("invalid mcast_router", *argv);
 
addattr8(n, 1024, IFLA_BR_MCAST_ROUTER, mcast_router);
+   } else if (matches(*argv, "mcast_snooping") == 0) {
+   __u8 mcast_snoop;
+
+   NEXT_ARG();
+   if (get_u8(&mcast_snoop, *argv, 0))
+   invarg("invalid mcast_snooping", *argv);
+
+   addattr8(n, 1024, IFLA_BR_MCAST_SNOOPING, mcast_snoop);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -268,6 +277,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
1 /*ARPHDR_ETHER*/, mac, sizeof(mac)));
}
 
+   if (tb[IFLA_BR_MCAST_SNOOPING])
+   fprintf(f, "mcast_snooping %u ",
+   rta_getattr_u8(tb[IFLA_BR_MCAST_SNOOPING]));
+
if (tb[IFLA_BR_MCAST_ROUTER])
fprintf(f, "mcast_router %u ",
rta_getattr_u8(tb[IFLA_BR_MCAST_ROUTER]));
-- 
2.4.3
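Each addattr8()/addattr16()/addattr32()/addattr64() call appends one netlink attribute: a 4-byte header (rta_len, rta_type) followed by the payload, padded to a 4-byte boundary. The sketch below reproduces that size arithmetic with local copies of RTA_ALIGN() and RTA_LENGTH(), redefined under SK_ names so it builds without <linux/rtnetlink.h>. It shows that a one-byte toggle such as mcast_snooping still uses 8 bytes of the message, counted against the 1024-byte maxlen passed to these helpers.

```c
#include <stddef.h>

/* Local mirrors of RTA_ALIGNTO / RTA_ALIGN / RTA_LENGTH; the header
 * size assumes struct rtattr is two 16-bit fields (4 bytes). */
#define SK_RTA_ALIGNTO     ((size_t)4)
#define SK_RTA_ALIGN(len)  (((size_t)(len) + SK_RTA_ALIGNTO - 1) & \
			    ~(SK_RTA_ALIGNTO - 1))
#define SK_RTA_HDRLEN      ((size_t)4)
#define SK_RTA_LENGTH(len) (SK_RTA_ALIGN(SK_RTA_HDRLEN) + (size_t)(len))

/* Value written into rta_len: header plus payload, without padding. */
static size_t attr_len(size_t payload)
{
	return SK_RTA_LENGTH(payload);
}

/* Bytes the attribute actually occupies in the message after padding. */
static size_t attr_space(size_t payload)
{
	return SK_RTA_ALIGN(SK_RTA_LENGTH(payload));
}
```

For comparison, a 6-byte group_address and a 64-bit interval each occupy 12 bytes.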



[PATCH iproute2 02/21] iplink: bridge: export root_(port|path_cost), topology_change and change_detected

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

Netlink already exports these values; we just need to make them visible.

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index 6978e58e6b74..d9a725b0be0f 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -180,6 +180,22 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
  sizeof(root_id));
fprintf(f, "designated_root %s ", root_id);
}
+
+   if (tb[IFLA_BR_ROOT_PORT])
+   fprintf(f, "root_port %u ",
+   rta_getattr_u16(tb[IFLA_BR_ROOT_PORT]));
+
+   if (tb[IFLA_BR_ROOT_PATH_COST])
+   fprintf(f, "root_path_cost %u ",
+   rta_getattr_u32(tb[IFLA_BR_ROOT_PATH_COST]));
+
+   if (tb[IFLA_BR_TOPOLOGY_CHANGE])
+   fprintf(f, "topology_change %u ",
+   rta_getattr_u8(tb[IFLA_BR_TOPOLOGY_CHANGE]));
+
+   if (tb[IFLA_BR_TOPOLOGY_CHANGE_DETECTED])
+   fprintf(f, "topology_change_detected %u ",
+   rta_getattr_u8(tb[IFLA_BR_TOPOLOGY_CHANGE_DETECTED]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 16/21] iplink: bridge: add support for IFLA_BR_MCAST_MEMBERSHIP_INTVL

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_MEMBERSHIP_INTVL
attribute in iproute2 so it can change the membership interval.
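On the request side, addattr64() appends a netlink attribute carrying the 64-bit value. The sketch below shows the assumed wire layout (4-byte rtattr header, 8-byte payload, padding to a 4-byte boundary) using a hypothetical put_attr() helper; it is not iproute2's code, and only the structs and the NLMSG_*/RTA_* macros come from the kernel UAPI headers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Append one attribute at the aligned end of message n, refusing to grow
 * the message past maxlen bytes. Hypothetical helper in the spirit of the
 * addattr_l()/addattr64() calls used by the patch. */
static int put_attr(struct nlmsghdr *n, int maxlen, unsigned short type,
		    const void *data, int alen)
{
	int len = RTA_LENGTH(alen);
	struct rtattr *rta;

	assert(alen >= 0);
	if ((int)(NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len)) > maxlen)
		return -1;
	rta = (struct rtattr *)((char *)n + NLMSG_ALIGN(n->nlmsg_len));
	rta->rta_type = type;
	rta->rta_len = len;
	memcpy(RTA_DATA(rta), data, alen);
	n->nlmsg_len = NLMSG_ALIGN(n->nlmsg_len) + RTA_ALIGN(len);
	return 0;
}

/* A u64 attribute is just an 8-byte payload: rta_len = 4 + 8 = 12. */
static int put_attr_u64(struct nlmsghdr *n, int maxlen, unsigned short type,
			uint64_t v)
{
	return put_attr(n, maxlen, type, &v, sizeof(v));
}
```

The maxlen check mirrors why the patch passes a buffer size (1024) to every addattr call: an attribute that would not fit is rejected rather than written past the buffer.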

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index f2f52e078dbd..9a6c9418ff0f 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -41,6 +41,7 @@ static void print_explain(FILE *f)
"  [ mcast_last_member_count LAST_MEMBER_COUNT ]\n"
"  [ mcast_startup_query_count STARTUP_QUERY_COUNT ]\n"
"  [ mcast_last_member_interval LAST_MEMBER_INTERVAL ]\n"
+   "  [ mcast_membership_interval MEMBERSHIP_INTERVAL ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -231,6 +232,17 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
 
addattr64(n, 1024, IFLA_BR_MCAST_LAST_MEMBER_INTVL,
  mcast_last_member_intvl);
+   } else if (matches(*argv, "mcast_membership_interval") == 0) {
+   __u64 mcast_membership_intvl;
+
+   NEXT_ARG();
+   if (get_u64(&mcast_membership_intvl, *argv, 0)) {
+   invarg("invalid mcast_membership_interval",
+  *argv);
+   return -1;
+   }
+   addattr64(n, 1024, IFLA_BR_MCAST_MEMBERSHIP_INTVL,
+ mcast_membership_intvl);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -386,6 +398,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_MCAST_LAST_MEMBER_INTVL])
fprintf(f, "mcast_last_member_interval %llu ",
rta_getattr_u64(tb[IFLA_BR_MCAST_LAST_MEMBER_INTVL]));
+
+   if (tb[IFLA_BR_MCAST_MEMBERSHIP_INTVL])
+   fprintf(f, "mcast_membership_interval %llu ",
+   rta_getattr_u64(tb[IFLA_BR_MCAST_MEMBERSHIP_INTVL]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 03/21] iplink: bridge: export read-only timers

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

Netlink already provides hello_timer, tcn_timer, topology_change_timer
and gc_timer, so let's make them visible.

Signed-off-by: Nikolay Aleksandrov 
---
 include/utils.h|  1 -
 ip/iplink_bridge.c | 16 
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/include/utils.h b/include/utils.h
index 7310f4e0e5db..f109521a904e 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -175,7 +175,6 @@ static inline __u32 nl_mgrp(__u32 group)
return group ? (1 << (group - 1)) : 0;
 }
 
-
 int print_timestamp(FILE *fp);
 void print_nlmsg_timestamp(FILE *fp, const struct nlmsghdr *n);
 
diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index d9a725b0be0f..8504be5625fa 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -196,6 +196,22 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
if (tb[IFLA_BR_TOPOLOGY_CHANGE_DETECTED])
fprintf(f, "topology_change_detected %u ",
rta_getattr_u8(tb[IFLA_BR_TOPOLOGY_CHANGE_DETECTED]));
+
+   if (tb[IFLA_BR_HELLO_TIMER])
+   fprintf(f, "hello_timer %llu ",
+   rta_getattr_u64(tb[IFLA_BR_HELLO_TIMER]));
+
+   if (tb[IFLA_BR_TCN_TIMER])
+   fprintf(f, "tcn_timer %llu ",
+   rta_getattr_u64(tb[IFLA_BR_TCN_TIMER]));
+
+   if (tb[IFLA_BR_TOPOLOGY_CHANGE_TIMER])
+   fprintf(f, "topology_change_timer %llu ",
+   rta_getattr_u64(tb[IFLA_BR_TOPOLOGY_CHANGE_TIMER]));
+
+   if (tb[IFLA_BR_GC_TIMER])
+   fprintf(f, "gc_timer %llu ",
+   rta_getattr_u64(tb[IFLA_BR_GC_TIMER]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 07/21] iplink: bridge: add support for IFLA_BR_MCAST_ROUTER

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

This patch implements support for the IFLA_BR_MCAST_ROUTER attribute
in iproute2 so it can change the multicast router value.
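Before the new option is packed into a one-byte attribute, its string argument has to be validated. The following is a minimal standalone stand-in for that step; parse_u8() is hypothetical and only assumes that get_u8() rejects non-numeric or out-of-range input, with base 0 as in the get_u8(&mcast_router, *argv, 0) call in the diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse arg as an unsigned 8-bit value. base 0 accepts decimal, 0x-prefixed
 * hex and 0-prefixed octal, matching the get_u8(..., 0) call in the patch.
 * Returns 0 on success; -1 on empty input, trailing junk or a value > 255. */
static int parse_u8(uint8_t *val, const char *arg, int base)
{
	unsigned long res;
	char *end;

	assert(val != NULL);
	if (!arg || !*arg)
		return -1;
	errno = 0;
	res = strtoul(arg, &end, base);
	if (end == arg || *end != '\0' || errno == ERANGE || res > 0xFF)
		return -1;
	*val = (uint8_t)res;
	return 0;
}
```

With this check, `mcast_router 0x2` is accepted as 2, while `mcast_router 300` or `mcast_router 1x` is rejected before any netlink message is built.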

Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bridge.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/ip/iplink_bridge.c b/ip/iplink_bridge.c
index e8bc66b498b3..11577a8994a3 100644
--- a/ip/iplink_bridge.c
+++ b/ip/iplink_bridge.c
@@ -32,6 +32,7 @@ static void print_explain(FILE *f)
"  [ vlan_filtering VLAN_FILTERING ]\n"
"  [ vlan_protocol VLAN_PROTOCOL ]\n"
"  [ vlan_default_pvid VLAN_DEFAULT_PVID ]\n"
+   "  [ mcast_router MULTICAST_ROUTER ]\n"
"\n"
"Where: VLAN_PROTOCOL := { 802.1Q | 802.1ad }\n"
);
@@ -139,6 +140,14 @@ static int bridge_parse_opt(struct link_util *lu, int argc, char **argv,
 
addattr16(n, 1024, IFLA_BR_VLAN_DEFAULT_PVID,
  default_pvid);
+   } else if (matches(*argv, "mcast_router") == 0) {
+   __u8 mcast_router;
+
+   NEXT_ARG();
+   if (get_u8(&mcast_router, *argv, 0))
+   invarg("invalid mcast_router", *argv);
+
+   addattr8(n, 1024, IFLA_BR_MCAST_ROUTER, mcast_router);
} else if (matches(*argv, "help") == 0) {
explain();
return -1;
@@ -258,6 +267,10 @@ static void bridge_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
RTA_PAYLOAD(tb[IFLA_BR_GROUP_ADDR]),
1 /*ARPHDR_ETHER*/, mac, sizeof(mac)));
}
+
+   if (tb[IFLA_BR_MCAST_ROUTER])
+   fprintf(f, "mcast_router %u ",
+   rta_getattr_u8(tb[IFLA_BR_MCAST_ROUTER]));
 }
 
 static void bridge_print_help(struct link_util *lu, int argc, char **argv,
-- 
2.4.3



[PATCH iproute2 00/21] bridge: complete netlink support

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

Hi,
After I added support for all attributes in the kernel, it's time to make
use of them in iproute2. I've tested changing/viewing all of the
attributes. I'll send a separate set to add support for the port
attributes.
For the future, adding a switch to change to "seconds.milliseconds" format
for the timers would be helpful (I've stuck to the currently used clock_t
export to be consistent).
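As a rough illustration of the suggested "seconds.milliseconds" switch, the sketch below converts a timer value exported in clock_t ticks. The format_clock_t() name is hypothetical, and the tick rate is passed in explicitly; on Linux it can be read with sysconf(_SC_CLK_TCK) and is commonly 100, which is an assumption rather than something stated in this series.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Render a clock_t tick count as "seconds.milliseconds" for a given
 * user-visible tick rate hz. ticks * 1000 must not overflow 64 bits. */
static void format_clock_t(char *out, size_t n, uint64_t ticks, unsigned int hz)
{
	uint64_t ms;

	assert(hz > 0);
	ms = ticks * 1000 / hz;
	snprintf(out, n, "%llu.%03llu", (unsigned long long)(ms / 1000),
		 (unsigned long long)(ms % 1000));
}
```

For example, 26000 ticks at 100 Hz renders as "260.000" and 31 ticks as "0.310".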

Cheers,
 Nik

Nikolay Aleksandrov (21):
  iplink: bridge: export bridge_id and designated_root
  iplink: bridge: export root_(port|path_cost), topology_change and
change_detected
  iplink: bridge: export read-only timers
  iplink: bridge: add support for IFLA_BR_GROUP_FWD_MASK
  iplink: bridge: add support for IFLA_BR_GROUP_ADDR
  iplink: bridge: add support for IFLA_BR_VLAN_DEFAULT_PVID
  iplink: bridge: add support for IFLA_BR_MCAST_ROUTER
  iplink: bridge: add support for IFLA_BR_MCAST_SNOOPING
  iplink: bridge: add support for IFLA_BR_MCAST_QUERY_USE_IFADDR
  iplink: bridge: add support for IFLA_BR_MCAST_QUERIER
  iplink: bridge: add support for IFLA_BR_MCAST_HASH_ELASTICITY
  iplink: bridge: add support for IFLA_BR_MCAST_HASH_MAX
  iplink: bridge: add support for IFLA_BR_MCAST_LAST_MEMBER_CNT
  iplink: bridge: add support for IFLA_BR_MCAST_STARTUP_QUERY_CNT
  iplink: bridge: add support for IFLA_BR_MCAST_LAST_MEMBER_INTVL
  iplink: bridge: add support for IFLA_BR_MCAST_MEMBERSHIP_INTVL
  iplink: bridge: add support for IFLA_BR_MCAST_QUERIER_INTVL
  iplink: bridge: add support for IFLA_BR_MCAST_QUERY_INTVL
  iplink: bridge: add support for IFLA_BR_MCAST_QUERY_RESPONSE_INTVL
  iplink: bridge: add support for IFLA_BR_MCAST_STARTUP_QUERY_INTVL
  iplink: bridge: add support for netfilter call attributes

 include/utils.h|   1 -
 ip/iplink_bridge.c | 356 +
 2 files changed, 356 insertions(+), 1 deletion(-)

-- 
2.4.3



[PATCH 6/6] net: thunderx: Alloc higher order pages when pagesize is small

2016-02-08 Thread sunil . kovvuri
From: Sunil Goutham 

Allocate higher order pages when the page size is small; this reduces
the number of calls to the page allocator and the amount of wasted memory.
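To put numbers on the change: with 4 KiB pages, PAGE_ALLOC_COSTLY_ORDER (3 in mainline kernels) means each allocation spans 8 pages, i.e. 32 KiB. The sketch re-implements get_order() in userspace and counts how many receive buffers one allocation can serve; order_for(), bufs_per_alloc() and the 2048-byte buffer length used below are assumptions, not the driver's actual values.

```c
#include <assert.h>

/* Userspace re-implementation of the kernel's get_order(): the smallest
 * order such that (page_size << order) >= size. Illustrative only. */
static int order_for(unsigned long size, unsigned long page_size)
{
	int order = 0;

	assert(size > 0 && page_size > 0);
	while ((page_size << order) < size)
		order++;
	return order;
}

/* How many receive buffers of buf_len bytes one allocation of the given
 * order can hold, i.e. how many buffers share one page-allocator call. */
static unsigned long bufs_per_alloc(unsigned long buf_len,
				    unsigned long page_size, int order)
{
	return (page_size << order) / buf_len;
}
```

With 2048-byte buffers, order 0 on 4 KiB pages serves 2 buffers per allocator call while order 3 serves 16; on 64 KiB-page kernels the patch keeps order 0, since a single page already holds 32.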

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index 50ab6f4..5adb208 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -78,7 +78,7 @@ static void nicvf_free_q_desc_mem(struct nicvf *nic, struct q_desc_mem *dmem)
 static inline int nicvf_alloc_rcv_buffer(struct nicvf *nic, gfp_t gfp,
 u32 buf_len, u64 **rbuf)
 {
-   int order = get_order(buf_len);
+   int order = (PAGE_SIZE <= 4096) ?  PAGE_ALLOC_COSTLY_ORDER : 0;
 
/* Check if request can be accomodated in previous allocated page */
if (nic->rb_page) {
-- 
1.7.1



[PATCH 3/6] net: thunderx: Assign affinity hints to vf's interrupts

2016-02-08 Thread sunil . kovvuri
From: Sunil Goutham 

This affinity hint can be used by the user space irqbalance tool to set
the preferred CPU mask for the IRQs registered by this VF. irqbalance
needs to be in 'exact' mode to set the IRQ affinity to the one
indicated by the affinity hint.
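The CPU choice made in nicvf_set_irq_affinity() can be modeled in userspace. Below, local_spread() is a simplified model of cpumask_local_spread() (wrap the index over the online CPUs, then count CPUs on the device's NUMA node first), and hint_cpu() applies the patch's rule of sending CQ vectors to index qidx + 1 and everything else to index 0. The topology array, the identity mapping from vector to queue index, and both function names are assumptions for illustration.

```c
#include <assert.h>

/* Simplified model of cpumask_local_spread(i, node): wrap i over the
 * online CPUs, then pick the i-th CPU, counting CPUs on `node` first.
 * cpu_node[c] is the NUMA node of online CPU c (assumed topology). */
static int local_spread(int i, int node, const int *cpu_node, int ncpus)
{
	int c;

	assert(ncpus > 0 && i >= 0);
	i %= ncpus;
	for (c = 0; c < ncpus; c++)
		if (cpu_node[c] == node && i-- == 0)
			return c;
	for (c = 0; c < ncpus; c++)
		if (cpu_node[c] != node && i-- == 0)
			return c;
	return -1; /* unreachable when 0 <= i < ncpus */
}

/* CQ vectors (below first_sq_vec) go to index qidx + 1, leaving index 0
 * for RBDR and other interrupts, as in the patch. Treating the vector
 * number as the queue index is a simplifying assumption. */
static int hint_cpu(int vec, int first_sq_vec, int node,
		    const int *cpu_node, int ncpus)
{
	int cpu = (vec < first_sq_vec) ? vec + 1 : 0;

	return local_spread(cpu, node, cpu_node, ncpus);
}
```

In this model, with CPUs 0 and 1 on node 1 and CPUs 2 and 3 on node 0, a device on node 0 gets its "CPU 0" hint on CPU 2: index 0 means the first node-local CPU, not literally CPU 0.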

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/cavium/thunder/nic.h|1 +
 drivers/net/ethernet/cavium/thunder/nicvf_main.c |   37 -
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index 8af363a..00cc915 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -309,6 +309,7 @@ struct nicvf {
struct msix_entry   msix_entries[NIC_VF_MSIX_VECTORS];
charirq_name[NIC_VF_MSIX_VECTORS][20];
boolirq_allocated[NIC_VF_MSIX_VECTORS];
+   cpumask_var_t   affinity_mask[NIC_VF_MSIX_VECTORS];
 
/* VF <-> PF mailbox communication */
boolpf_acked;
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index c6f146c..90ce93e 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -899,6 +899,31 @@ static void nicvf_disable_msix(struct nicvf *nic)
}
 }
 
+static void nicvf_set_irq_affinity(struct nicvf *nic)
+{
+   int vec, cpu;
+   int irqnum;
+
+   for (vec = 0; vec < nic->num_vec; vec++) {
+   if (!nic->irq_allocated[vec])
+   continue;
+
+   if (!zalloc_cpumask_var(&nic->affinity_mask[vec], GFP_KERNEL))
+   return;
+/* CQ interrupts */
+   if (vec < NICVF_INTR_ID_SQ)
+   /* Leave CPU0 for RBDR and other interrupts */
+   cpu = nicvf_netdev_qidx(nic, vec) + 1;
+   else
+   cpu = 0;
+
+   cpumask_set_cpu(cpumask_local_spread(cpu, nic->node),
+   nic->affinity_mask[vec]);
+   irqnum = nic->msix_entries[vec].vector;
+   irq_set_affinity_hint(irqnum, nic->affinity_mask[vec]);
+   }
+}
+
 static int nicvf_register_interrupts(struct nicvf *nic)
 {
int irq, ret = 0;
@@ -944,8 +969,13 @@ static int nicvf_register_interrupts(struct nicvf *nic)
ret = request_irq(nic->msix_entries[irq].vector,
  nicvf_qs_err_intr_handler,
  0, nic->irq_name[irq], nic);
-   if (!ret)
-   nic->irq_allocated[irq] = true;
+   if (ret)
+   goto err;
+
+   nic->irq_allocated[irq] = true;
+
+   /* Set IRQ affinities */
+   nicvf_set_irq_affinity(nic);
 
 err:
if (ret)
@@ -963,6 +993,9 @@ static void nicvf_unregister_interrupts(struct nicvf *nic)
if (!nic->irq_allocated[irq])
continue;
 
+   irq_set_affinity_hint(nic->msix_entries[irq].vector, NULL);
+   free_cpumask_var(nic->affinity_mask[irq]);
+
if (irq < NICVF_INTR_ID_SQ)
free_irq(nic->msix_entries[irq].vector, nic->napi[irq]);
else
-- 
1.7.1



[PATCH 5/6] net: thunderx: bgx: Add log message when setting mac address

2016-02-08 Thread sunil . kovvuri
From: Robert Richter 

Signed-off-by: Robert Richter 
Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/cavium/thunder/thunder_bgx.c |   11 ---
 1 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
index 111835b..cfee496 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
@@ -886,7 +886,8 @@ static void bgx_get_qlm_mode(struct bgx *bgx)
 
 #ifdef CONFIG_ACPI
 
-static int acpi_get_mac_address(struct acpi_device *adev, u8 *dst)
+static int acpi_get_mac_address(struct device *dev, struct acpi_device *adev,
+   u8 *dst)
 {
u8 mac[ETH_ALEN];
int ret;
@@ -897,10 +898,13 @@ static int acpi_get_mac_address(struct acpi_device *adev, u8 *dst)
goto out;
 
if (!is_valid_ether_addr(mac)) {
+   dev_warn(dev, "MAC address invalid: %pM\n", mac);
ret = -EINVAL;
goto out;
}
 
+   dev_info(dev, "MAC address set to: %pM\n", mac);
+
memcpy(dst, mac, ETH_ALEN);
 out:
return ret;
@@ -911,14 +915,15 @@ static acpi_status bgx_acpi_register_phy(acpi_handle handle,
 u32 lvl, void *context, void **rv)
 {
struct bgx *bgx = context;
+   struct device *dev = &bgx->pdev->dev;
struct acpi_device *adev;
 
if (acpi_bus_get_device(handle, &adev))
goto out;
 
-   acpi_get_mac_address(adev, bgx->lmac[bgx->lmac_count].mac);
+   acpi_get_mac_address(dev, adev, bgx->lmac[bgx->lmac_count].mac);
 
-   SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, &bgx->pdev->dev);
+   SET_NETDEV_DEV(&bgx->lmac[bgx->lmac_count].netdev, dev);
 
bgx->lmac[bgx->lmac_count].lmacid = bgx->lmac_count;
 out:
-- 
1.7.1



[PATCH 2/6] net: thunderx: Use napi_schedule_irqoff()

2016-02-08 Thread sunil . kovvuri
From: Sunil Goutham 

napi_schedule() is called from hard IRQ context, so switch to
napi_schedule_irqoff(), which avoids the unneeded calls to
local_irq_save() and local_irq_restore().

Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index 95db6b7..c6f146c 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -828,7 +828,7 @@ static irqreturn_t nicvf_intr_handler(int irq, void *cq_irq)
nicvf_disable_intr(nic, NICVF_INTR_CQ, qidx);
 
/* Schedule NAPI */
-   napi_schedule(&cq_poll->napi);
+   napi_schedule_irqoff(&cq_poll->napi);
 
/* Clear interrupt */
nicvf_clear_intr(nic, NICVF_INTR_CQ, qidx);
-- 
1.7.1



[PATCH 4/6] net: thunderx: bgx: Use standard firmware node infrastructure.

2016-02-08 Thread sunil . kovvuri
From: David Daney 

In the case of OF device tree, the firmware information is attached to
the BGX device structure in the standard manner, so use the firmware
iterators and accessors where possible.

Signed-off-by: David Daney 
Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/cavium/thunder/thunder_bgx.c |   27 +++--
 1 files changed, 14 insertions(+), 13 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
index 9df26c2..111835b 100644
--- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
+++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
@@ -968,26 +968,27 @@ static int bgx_init_acpi_phy(struct bgx *bgx)
 
 static int bgx_init_of_phy(struct bgx *bgx)
 {
-   struct device_node *np;
-   struct device_node *np_child;
+   struct fwnode_handle *fwn;
u8 lmac = 0;
-   char bgx_sel[5];
const char *mac;
 
-   /* Get BGX node from DT */
-   snprintf(bgx_sel, 5, "bgx%d", bgx->bgx_id);
-   np = of_find_node_by_name(NULL, bgx_sel);
-   if (!np)
-   return -ENODEV;
+   device_for_each_child_node(&bgx->pdev->dev, fwn) {
+   struct device_node *phy_np;
+   struct device_node *node = to_of_node(fwn);
+
+   /* If it is not an OF node we cannot handle it yet, so
+* exit the loop.
+*/
+   if (!node)
+   break;
 
-   for_each_child_of_node(np, np_child) {
-   struct device_node *phy_np = of_parse_phandle(np_child,
- "phy-handle", 0);
+   phy_np = of_parse_phandle(node, "phy-handle", 0);
if (!phy_np)
continue;
+
bgx->lmac[lmac].phydev = of_phy_find_device(phy_np);
 
-   mac = of_get_mac_address(np_child);
+   mac = of_get_mac_address(node);
if (mac)
ether_addr_copy(bgx->lmac[lmac].mac, mac);
 
@@ -995,7 +996,7 @@ static int bgx_init_of_phy(struct bgx *bgx)
bgx->lmac[lmac].lmacid = lmac;
lmac++;
if (lmac == MAX_LMAC_PER_BGX) {
-   of_node_put(np_child);
+   of_node_put(node);
break;
}
}
-- 
1.7.1



[PATCH 1/6] net, thunderx: Add TX timeout and RX buffer alloc failure stats.

2016-02-08 Thread sunil . kovvuri
From: Thanneeru Srinivasulu 

When the system is low on atomic memory, too many error messages are logged.
Since this is not a total failure but simply a switch to non-atomic allocation,
it is better to keep a stat.

Also add a stat for resets kicked off by the transmit watchdog timeout.
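Counting failures instead of logging them is cheap, and each counter then becomes visible by name through the stats table. The sketch below models that name/offset table pattern in userspace; DRV_STAT, struct stat_desc and get_stat() are hypothetical, written in the spirit of NICVF_DRV_STAT, whose real definition is not shown in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Cut-down stats struct: field names follow the patch, the rest is a sketch. */
struct drv_stats {
	uint64_t rx_drops;
	uint64_t rcv_buffer_alloc_failures;
	uint64_t tx_timeout;
};

struct stat_desc {
	const char *name;
	size_t offset; /* byte offset of the u64 counter in struct drv_stats */
};

/* Hypothetical table-building macro: stringify the field, record its offset. */
#define DRV_STAT(f) { #f, offsetof(struct drv_stats, f) }

static const struct stat_desc stat_table[] = {
	DRV_STAT(rx_drops),
	DRV_STAT(rcv_buffer_alloc_failures),
	DRV_STAT(tx_timeout),
};

/* Look up a counter by name: returns 0 and fills *out, or -1 if unknown. */
static int get_stat(const struct drv_stats *s, const char *name, uint64_t *out)
{
	size_t i;

	assert(s && name && out);
	for (i = 0; i < sizeof(stat_table) / sizeof(stat_table[0]); i++) {
		if (strcmp(stat_table[i].name, name) == 0) {
			memcpy(out, (const char *)s + stat_table[i].offset,
			       sizeof(*out));
			return 0;
		}
	}
	return -1;
}
```

Adding a new counter then takes one struct field and one table entry, which matches the shape of the nic.h and nicvf_ethtool.c hunks.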

Signed-off-by: Thanneeru Srinivasulu 
Signed-off-by: Sunil Goutham 
---
 drivers/net/ethernet/cavium/thunder/nic.h  |3 +++
 .../net/ethernet/cavium/thunder/nicvf_ethtool.c|2 ++
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |1 +
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c |3 +--
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/cavium/thunder/nic.h b/drivers/net/ethernet/cavium/thunder/nic.h
index 6888288..8af363a 100644
--- a/drivers/net/ethernet/cavium/thunder/nic.h
+++ b/drivers/net/ethernet/cavium/thunder/nic.h
@@ -248,10 +248,13 @@ struct nicvf_drv_stats {
u64 rx_frames_jumbo;
u64 rx_drops;
 
+   u64 rcv_buffer_alloc_failures;
+
/* Tx */
u64 tx_frames_ok;
u64 tx_drops;
u64 tx_tso;
+   u64 tx_timeout;
u64 txq_stop;
u64 txq_wake;
 };
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
index a12b2e3..d2d8ef2 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_ethtool.c
@@ -89,9 +89,11 @@ static const struct nicvf_stat nicvf_drv_stats[] = {
NICVF_DRV_STAT(rx_frames_1518),
NICVF_DRV_STAT(rx_frames_jumbo),
NICVF_DRV_STAT(rx_drops),
+   NICVF_DRV_STAT(rcv_buffer_alloc_failures),
NICVF_DRV_STAT(tx_frames_ok),
NICVF_DRV_STAT(tx_tso),
NICVF_DRV_STAT(tx_drops),
+   NICVF_DRV_STAT(tx_timeout),
NICVF_DRV_STAT(txq_stop),
NICVF_DRV_STAT(txq_wake),
 };
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index c24cb2a..95db6b7 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -1394,6 +1394,7 @@ static void nicvf_tx_timeout(struct net_device *dev)
netdev_warn(dev, "%s: Transmit timed out, resetting\n",
dev->name);
 
+   nic->drv_stats.tx_timeout++;
schedule_work(&nic->reset_task);
 }
 
diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
index d0d1b54..50ab6f4 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_queues.c
@@ -96,8 +96,7 @@ static inline int nicvf_alloc_rcv_buffer(struct nicvf *nic, gfp_t gfp,
nic->rb_page = alloc_pages(gfp | __GFP_COMP | __GFP_NOWARN,
   order);
if (!nic->rb_page) {
-   netdev_err(nic->netdev,
-  "Failed to allocate new rcv buffer\n");
+   nic->drv_stats.rcv_buffer_alloc_failures++;
return -ENOMEM;
}
nic->rb_page_offset = 0;
-- 
1.7.1



[PATCH 0/6] net: thunderx: Setting IRQ affinity hints and other optimizations

2016-02-08 Thread sunil . kovvuri
From: Sunil Goutham 

This patch series contains changes to:
- Add support for virtual function IRQ affinity hints
- Replace napi_schedule() with napi_schedule_irqoff()
- Reduce page allocation overhead by allocating pages
  of higher order when the page size is 4KB.
- Add a couple of stats which help in debugging
- Make some miscellaneous changes to the BGX driver.


David Daney (1):
  net: thunderx: bgx: Use standard firmware node infrastructure.

Robert Richter (1):
  net: thunderx: bgx: Add log message when setting mac address

Sunil Goutham (3):
  net: thunderx: Use napi_schedule_irqoff()
  net: thunderx: Assign affinity hints to vf's interrupts
  net: thunderx: Alloc higher order pages when pagesize is small

Thanneeru Srinivasulu (1):
  net, thunderx: Add TX timeout and RX buffer alloc failure stats.

 drivers/net/ethernet/cavium/thunder/nic.h  |4 ++
 .../net/ethernet/cavium/thunder/nicvf_ethtool.c|2 +
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |   40 ++-
 drivers/net/ethernet/cavium/thunder/nicvf_queues.c |5 +-
 drivers/net/ethernet/cavium/thunder/thunder_bgx.c  |   38 +++
 5 files changed, 67 insertions(+), 22 deletions(-)



Re: [patch net-next RFC 0/6] Introduce devlink interface and first drivers to use it

2016-02-08 Thread Hannes Frederic Sowa

Hi,

On 08.02.2016 11:55, Jiri Pirko wrote:

Mon, Feb 08, 2016 at 11:15:38AM CET, han...@stressinduktion.org wrote:

Hello,

On 06.02.2016 20:40, Jiri Pirko wrote:

Fri, Feb 05, 2016 at 06:38:42PM CET, alexei.starovoi...@gmail.com wrote:

On Fri, Feb 05, 2016 at 11:01:22AM +0100, Hannes Frederic Sowa wrote:


Okay. I see it more as changing mode of operation of hardware and thus has
not really anything to do with networking. If you say you change ethernet to
infiniband it has something to do with networking, sure. But I am fine with
this, I just thought the code size could be reduced by adding this to sysfs
quite a lot. I don't have a strong opinion on this.


there is already a way to change eth/ib via
echo 'eth' > /sys/bus/pci/drivers/mlx4_core/:02:00.0/mlx4_port1

sounds like this is another way to achieve the same?


It is. However the current way is driver-specific, not correct.


Why is driver specific not correct? Actually it is very much a device
specific thing, isn't it?


Well, adding driver specific sysfs file called "driver_name_port_type"
does not seem correct to me.


Why? PHYs are debugged like that? I thought that sysfs in particular is the 
right thing, since it makes sure we can correctly identify a device. The logic 
in devlink_alloc of just incrementing a counter, with the naming 
policy decided at driver registration time, will introduce the same 
problems that identifying devices by interface names had before.



For mlx5, we need the same, and it cannot be done this way. So devlink is
the correct way to go.


Do two drivers already justify a complete new netlink API? Doesn't this
create the same problems as netdevice naming, which needed multiple
years to become stable in case we have multiple cards or some administrator


The thing is, other drivers would use it as well, but there's no way to
do it :) So vendors have their proprietary configuration utils. The devlink
objective is to avoid those by introducing a vendor-neutral interface.


Ok, agreed. But multiple drivers reuse the phy-sysfs routines, too. I 
didn't see this as a problem.


Anyway, I don't care if it is sysfs or something else, I am concerned 
about the atomic_inc_return based identification of those devices.



reorders the cards (biosdevorder, systemd/udev issues)? Are ports always
stable? How can we have a 1:1 relationship with ifindexes and how can they be
stable? Is it impossible to use that in scripts?


The port index is always set up by the driver; ports have stable internal
numbering. The devlink device name is not stable (just like a netdev
name), but the device can be easily identified by bus name and device name.
I don't see a reason why udev cannot rename it according to some rules. By
the way, this is very similar to phyX wireless devices.


Ok, understood. It just seems to be duplication of code with another name.


Why not hide echo/cat in iproute2 instead of adding a parallel netlink API?
Or is this for switches instead of NICs?
Then why is it not added to switchdev?


Note this is not specific to switch ASICs. This is for all network devices.


That's actually my fear. Architecturally, I didn't understand the
relationship from "devlink-names" to ifindexes at all.


Again, this is very similar to phyX wireless devices.
I don't understand the reason for your fear :)


If, as you said, this gets integrated by systemd/udev and will change 
names to stable ones before switching ports (so we don't accidentally 
switch a wrong port) I am all fine. This is basically how net_devices 
are handled.


Then my only argument is that this is too complex, but I can live with that.

Thanks,
Hannes



[net-next PATCH V2 2/3] net: bulk free SKBs that were delay free'ed due to IRQ context

2016-02-08 Thread Jesper Dangaard Brouer
The network stack defers freeing SKBs in case the free happens in IRQ
context or while IRQs are disabled. This happens in __dev_kfree_skb_irq(),
which writes SKBs that were freed during IRQ to the softirq completion
queue (softnet_data.completion_queue).

These SKBs are naturally delayed, and cleaned up during NET_TX_SOFTIRQ
in function net_tx_action(). Take advantage of this and use the SKB
defer and flush API, as we are already in softirq context.

For modern drivers this rarely happens, although most drivers do call
dev_kfree_skb_any(), which detects the situation and calls
__dev_kfree_skb_irq() when needed. This is because netpoll can call
from IRQ context.

Signed-off-by: Alexander Duyck 
Signed-off-by: Jesper Dangaard Brouer 
---
 include/linux/skbuff.h |1 +
 net/core/dev.c |8 +++-
 net/core/skbuff.c  |8 ++--
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 3c8d348223d7..b06ba2e07c89 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2402,6 +2402,7 @@ static inline struct sk_buff *napi_alloc_skb(struct napi_struct *napi,
 void napi_consume_skb(struct sk_buff *skb, int budget);
 
 void __kfree_skb_flush(void);
+void __kfree_skb_defer(struct sk_buff *skb);
 
 /**
  * __dev_alloc_pages - allocate page for network Rx
diff --git a/net/core/dev.c b/net/core/dev.c
index 44384a8c9613..b185d7eaa2e4 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3829,8 +3829,14 @@ static void net_tx_action(struct softirq_action *h)
trace_consume_skb(skb);
else
trace_kfree_skb(skb, net_tx_action);
-   __kfree_skb(skb);
+
+   if (skb->fclone != SKB_FCLONE_UNAVAILABLE)
+   __kfree_skb(skb);
+   else
+   __kfree_skb_defer(skb);
}
+
+   __kfree_skb_flush();
}
 
if (sd->output_queue) {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index e26bb2b1dba4..d278e51789e9 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -767,7 +767,7 @@ void __kfree_skb_flush(void)
}
 }
 
-static void __kfree_skb_defer(struct sk_buff *skb)
+static inline void _kfree_skb_defer(struct sk_buff *skb)
 {
struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 
@@ -789,6 +789,10 @@ static void __kfree_skb_defer(struct sk_buff *skb)
nc->skb_count = 0;
}
 }
+void __kfree_skb_defer(struct sk_buff *skb)
+{
+   _kfree_skb_defer(skb);
+}
 
 void napi_consume_skb(struct sk_buff *skb, int budget)
 {
@@ -814,7 +818,7 @@ void napi_consume_skb(struct sk_buff *skb, int budget)
return;
}
 
-   __kfree_skb_defer(skb);
+   _kfree_skb_defer(skb);
 }
 EXPORT_SYMBOL(napi_consume_skb);
 



[net-next PATCH V2 0/3] net: mitigating kmem_cache free slowpath

2016-02-08 Thread Jesper Dangaard Brouer
This patchset is the first real use case for kmem_cache bulk _free_.
The use of bulk _alloc_ is NOT included in this patchset. The full use
case has previously been posted here [1].

The bulk free side has the largest benefit for the network stack
use case, because the network stack hits the kmem_cache/SLUB
slowpath when freeing SKBs, due to the number of outstanding SKBs.
This is solved by using the new API kmem_cache_free_bulk().

Introduce a new API, napi_consume_skb(), that hides/handles bulk freeing
for the caller. Drivers simply need to use this call when freeing
SKBs in NAPI context, e.g. replacing their calls to dev_kfree_skb() /
dev_consume_skb_any().

Driver ixgbe is the first user of this new API.

[1] http://thread.gmane.org/gmane.linux.network/384302/focus=397373

---

Jesper Dangaard Brouer (3):
  net: bulk free infrastructure for NAPI context, use napi_consume_skb
  net: bulk free SKBs that were delay free'ed due to IRQ context
  ixgbe: bulk free SKBs during TX completion cleanup cycle


 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |6 +-
 include/linux/skbuff.h|4 +
 net/core/dev.c|9 ++-
 net/core/skbuff.c |   87 +++--
 4 files changed, 96 insertions(+), 10 deletions(-)

--


[net-next PATCH V2 1/3] net: bulk free infrastructure for NAPI context, use napi_consume_skb

2016-02-08 Thread Jesper Dangaard Brouer
Discovered that the network stack was hitting the kmem_cache/SLUB
slowpath when freeing SKBs. Doing bulk free with kmem_cache_free_bulk()
can speed up this slowpath.

NAPI context is a bit special; let's take advantage of that for bulk
freeing SKBs.

In NAPI context we are running in softirq, which gives us certain
protection. A softirq can run on several CPUs at once, BUT the
important part is that a softirq will never preempt another softirq
running on the same CPU. This gives us the opportunity to access
per-CPU variables in softirq context.

Extend napi_alloc_cache (which previously contained only a
page_frag_cache) to be a struct with a small array-based stack for
holding SKBs. Introduce an SKB defer and flush API for accessing this.

Introduce napi_consume_skb() as a replacement for e.g.
dev_consume_skb_any() when running in NAPI context. A small trick to
detect whether we are called from netpoll is to check if budget is 0.
In that case, we need to invoke dev_consume_skb_irq().
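The defer/flush scheme can be modeled outside the kernel. In the sketch below, a single struct stands in for the per-CPU napi_alloc_cache, a loop of free() calls stands in for kmem_cache_free_bulk(), and the budget == 0 netpoll path simply frees at once instead of calling dev_consume_skb_irq(). All names are hypothetical, and being single-threaded, the model sidesteps the per-CPU/softirq reasoning the kernel relies on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKB_CACHE_SIZE 64 /* mirrors NAPI_SKB_CACHE_SIZE in the patch */

/* Userspace model of the per-CPU napi_alloc_cache stack: objects are
 * pushed instead of freed one at a time, and released in one bulk call
 * when the stack fills up or when the softirq round ends (flush). */
struct defer_cache {
	size_t count;
	void *objs[SKB_CACHE_SIZE];
	size_t bulk_calls; /* instrumentation for the example */
};

/* Stand-in for kmem_cache_free_bulk(): one call releases many objects. */
static void free_bulk(struct defer_cache *c)
{
	size_t i;

	for (i = 0; i < c->count; i++)
		free(c->objs[i]);
	c->count = 0;
	c->bulk_calls++;
}

/* Model of __kfree_skb_flush(): release whatever is still cached. */
static void cache_flush(struct defer_cache *c)
{
	if (c->count)
		free_bulk(c);
}

/* Model of the defer path: push, and bulk free once the stack is full. */
static void cache_defer(struct defer_cache *c, void *obj)
{
	assert(c->count < SKB_CACHE_SIZE);
	c->objs[c->count++] = obj;
	if (c->count == SKB_CACHE_SIZE)
		free_bulk(c);
}

/* budget == 0 signals a netpoll caller, which must not use the
 * softirq-only cache; this model just frees the object immediately. */
static void consume(struct defer_cache *c, void *obj, int budget)
{
	if (!budget) {
		free(obj);
		return;
	}
	cache_defer(c, obj);
}
```

Freeing 100 objects through consume() costs one bulk call when the stack reaches 64 and one more at the end-of-round flush, instead of 100 individual frees.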

Joint work with Alexander Duyck.

Signed-off-by: Jesper Dangaard Brouer 
Signed-off-by: Alexander Duyck 
---
 include/linux/skbuff.h |3 ++
 net/core/dev.c |1 +
 net/core/skbuff.c  |   83 +---
 3 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 11f935c1a090..3c8d348223d7 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2399,6 +2399,9 @@ static inline struct sk_buff *napi_alloc_skb(struct napi_struct *napi,
 {
return __napi_alloc_skb(napi, length, GFP_ATOMIC);
 }
+void napi_consume_skb(struct sk_buff *skb, int budget);
+
+void __kfree_skb_flush(void);
 
 /**
  * __dev_alloc_pages - allocate page for network Rx
diff --git a/net/core/dev.c b/net/core/dev.c
index 8cba3d852f25..44384a8c9613 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5152,6 +5152,7 @@ static void net_rx_action(struct softirq_action *h)
}
}
 
+   __kfree_skb_flush();
local_irq_disable();
 
list_splice_tail_init(&sd->poll_list, &list);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b2df375ec9c2..e26bb2b1dba4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -347,8 +347,16 @@ struct sk_buff *build_skb(void *data, unsigned int frag_size)
 }
 EXPORT_SYMBOL(build_skb);
 
+#define NAPI_SKB_CACHE_SIZE	64
+
+struct napi_alloc_cache {
+   struct page_frag_cache page;
+   size_t skb_count;
+   void *skb_cache[NAPI_SKB_CACHE_SIZE];
+};
+
 static DEFINE_PER_CPU(struct page_frag_cache, netdev_alloc_cache);
-static DEFINE_PER_CPU(struct page_frag_cache, napi_alloc_cache);
+static DEFINE_PER_CPU(struct napi_alloc_cache, napi_alloc_cache);
 
 static void *__netdev_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 {
@@ -378,9 +386,9 @@ EXPORT_SYMBOL(netdev_alloc_frag);
 
 static void *__napi_alloc_frag(unsigned int fragsz, gfp_t gfp_mask)
 {
-   struct page_frag_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+   struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
 
-   return __alloc_page_frag(nc, fragsz, gfp_mask);
+   return __alloc_page_frag(&nc->page, fragsz, gfp_mask);
 }
 
 void *napi_alloc_frag(unsigned int fragsz)
@@ -474,7 +482,7 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 gfp_t gfp_mask)
 {
-   struct page_frag_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+   struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
struct sk_buff *skb;
void *data;
 
@@ -494,7 +502,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
if (sk_memalloc_socks())
gfp_mask |= __GFP_MEMALLOC;
 
-   data = __alloc_page_frag(nc, len, gfp_mask);
+   data = __alloc_page_frag(&nc->page, len, gfp_mask);
if (unlikely(!data))
return NULL;
 
@@ -505,7 +513,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
}
 
/* use OR instead of assignment to avoid clearing of bits in mask */
-   if (nc->pfmemalloc)
+   if (nc->page.pfmemalloc)
skb->pfmemalloc = 1;
skb->head_frag = 1;
 
@@ -747,6 +755,69 @@ void consume_skb(struct sk_buff *skb)
 }
 EXPORT_SYMBOL(consume_skb);
 
+void __kfree_skb_flush(void)
+{
+   struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+
+   /* flush skb_cache if containing objects */
+   if (nc->skb_count) {
+   kmem_cache_free_bulk(skbuff_head_cache, nc->skb_count,
+                        nc->skb_cache);
+   nc->skb_count = 0;
+   }
+}
+
+static void __kfree_skb_defer(struct sk_buff *skb)
+{
+   struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+
+   /* drop skb->head and call any destructors for packet */
+   skb_release_all(skb);
+
+   /* record skb to C

[net-next PATCH V2 3/3] ixgbe: bulk free SKBs during TX completion cleanup cycle

2016-02-08 Thread Jesper Dangaard Brouer
There is an opportunity to bulk free SKBs during reclaiming of
resources after DMA transmit completes in ixgbe_clean_tx_irq.  Thus,
bulk freeing at this point does not introduce any added latency.

Simply use napi_consume_skb(), which was recently introduced.  The
napi_budget parameter is needed by napi_consume_skb() to detect if it
is called from netpoll.

Benchmarking IPv4-forwarding, on CPU i7-4790K @4.2GHz (no turbo boost)
 Single CPU/flow numbers: before: 1982144 pps ->  after : 2064446 pps
 Improvement: +82302 pps, -20 nanosec, +4.1%
 (SLUB and GCC version 5.1.1 20150618 (Red Hat 5.1.1-4))

Joint work with Alexander Duyck.

Signed-off-by: Alexander Duyck 
Signed-off-by: Jesper Dangaard Brouer 
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index c4003a88bbf6..0c701b8438b6 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -1089,7 +1089,7 @@ static void ixgbe_tx_timeout_reset(struct ixgbe_adapter *adapter)
  * @tx_ring: tx ring to clean
  **/
 static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
-  struct ixgbe_ring *tx_ring)
+  struct ixgbe_ring *tx_ring, int napi_budget)
 {
struct ixgbe_adapter *adapter = q_vector->adapter;
struct ixgbe_tx_buffer *tx_buffer;
@@ -1127,7 +1127,7 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector *q_vector,
total_packets += tx_buffer->gso_segs;
 
/* free the skb */
-   dev_consume_skb_any(tx_buffer->skb);
+   napi_consume_skb(tx_buffer->skb, napi_budget);
 
/* unmap skb header data */
dma_unmap_single(tx_ring->dev,
@@ -2784,7 +2784,7 @@ int ixgbe_poll(struct napi_struct *napi, int budget)
 #endif
 
ixgbe_for_each_ring(ring, q_vector->tx)
-   clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring);
+   clean_complete &= !!ixgbe_clean_tx_irq(q_vector, ring, budget);
 
/* Exit if we are called by netpoll or busy polling is active */
if ((budget <= 0) || !ixgbe_qv_lock_napi(q_vector))



Re: [PATCH v5] net: ethernet: nb8800: support fixed-link DT node

2016-02-08 Thread Måns Rullgård
Sebastian Frias  writes:

> Under some circumstances, for example when connecting
> to a switch:
>
> https://stackoverflow.com/questions/31046172/device-tree-for-phy-less-connection-to-a-dsa-switch
>
> the ethernet port will not be connected to a PHY.
> In that case a "fixed-link" DT node can be used to replace it.
>
> This patch adds support for the "fixed-link" node to the
> nb8800 driver.
>
> Signed-off-by: Sebastian Frias 

Acked-by: Mans Rullgard 

> ---
>  drivers/net/ethernet/aurora/nb8800.c | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/aurora/nb8800.c b/drivers/net/ethernet/aurora/nb8800.c
> index ecc4a33..e1fb071 100644
> --- a/drivers/net/ethernet/aurora/nb8800.c
> +++ b/drivers/net/ethernet/aurora/nb8800.c
> @@ -1460,7 +1460,19 @@ static int nb8800_probe(struct platform_device *pdev)
>   goto err_disable_clk;
>   }
>
> - priv->phy_node = of_parse_phandle(pdev->dev.of_node, "phy-handle", 0);
> + if (of_phy_is_fixed_link(pdev->dev.of_node)) {
> + ret = of_phy_register_fixed_link(pdev->dev.of_node);
> + if (ret < 0) {
> + dev_err(&pdev->dev, "bad fixed-link spec\n");
> + goto err_free_bus;
> + }
> + priv->phy_node = of_node_get(pdev->dev.of_node);
> + }
> +
> + if (!priv->phy_node)
> + priv->phy_node = of_parse_phandle(pdev->dev.of_node,
> +   "phy-handle", 0);
> +
>   if (!priv->phy_node) {
>   dev_err(&pdev->dev, "no PHY specified\n");
>   ret = -ENODEV;
> -- 
> 2.1.4
>

-- 
Måns Rullgård


Re: [PATCH 5/6] net: thunderx: bgx: Add log message when setting mac address

2016-02-08 Thread Sergei Shtylyov

Hello.

On 2/8/2016 3:07 PM, sunil.kovv...@gmail.com wrote:

> From: Robert Richter 
>
> Signed-off-by: Robert Richter 
> Signed-off-by: Sunil Goutham 
> ---
>   drivers/net/ethernet/cavium/thunder/thunder_bgx.c |   11 ---
>   1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> index 111835b..cfee496 100644
> --- a/drivers/net/ethernet/cavium/thunder/thunder_bgx.c
> +++ b/drivers/net/ethernet/cavium/thunder/thunder_bgx.c

[...]

> @@ -897,10 +898,13 @@ static int acpi_get_mac_address(struct acpi_device *adev, u8 *dst)
> goto out;
>
> if (!is_valid_ether_addr(mac)) {
> +   dev_warn(dev, "MAC address invalid: %pM\n", mac);

   dev_er(), maybe?

> ret = -EINVAL;
> goto out;
> }
>
> +   dev_info(dev, "MAC address set to: %pM\n", mac);
> +
> memcpy(dst, mac, ETH_ALEN);
>   out:
> return ret;

[...]

MBR, Sergei



Re: [PATCH 2/2] fm10k: correctly report error when changing number of channels

2016-02-08 Thread Jakub Kicinski
Hi Jacob!

First of all thanks for putting your time into sorting this out,
figuring out what to do with user-set RSS table when queues are
reconfigured was a head scratcher for me as well.

On Fri,  5 Feb 2016 12:30:21 -0800, Jacob Keller wrote:
> +#define FM10K_FLAG_RETA_TABLE_CONFIGURED (u32)(BIT(6))

If we go with your proposal every driver will have to keep track of 
how the RSS table was set and find max value on queue reconfig -
replicating effort and leaving space for diverging behaviour...

Would it be worth considering to place more of this code in the core?


Re: [PATCH v3] net: ethernet: support "fixed-link" DT node on nb8800 driver

2016-02-08 Thread Måns Rullgård
Sebastian Frias  writes:

> On 02/05/2016 04:26 PM, Måns Rullgård wrote:
>> Sebastian Frias  writes:
>>
>>> On 02/05/2016 04:08 PM, Måns Rullgård wrote:
>>>> Sebastian Frias  writes:
>>>>
>>>>> On 02/05/2016 03:34 PM, Måns Rullgård wrote:
>>>>>> Sebastian Frias  writes:
>>>>>>
>>>>>>> Signed-off-by: Sebastian Frias 
>>>>>>
>>>>>> Please change the subject to something like "net: ethernet: nb8800:
>>>>>> support fixed-link DT node" and add a comment body.
>>>>>
>>>>> The subject is pretty explicit for such a simple patch, what else
>>>>> could I add that wouldn't be unnecessary chat?
>>>>
>>>> It's customary to include a description body even if it's little more
>>>> than a restatement of the subject.  Also, while the subject usually only
>>>> says _what_ the patch does, the body should additionally state _why_ it
>>>> is needed.
>>>
>>> I understand, but _why_ it is needed is also obvious in this case; I
>>> mean, without the patch "fixed-link" cannot be used.
>>
>> Then say so.
>>
>>> Other patches may not be as obvious/simple and thus justify and
>>> require more details.
>>>
>>> Anyway, I added "Properly handles the case where the PHY is not connected
>>> to the real MDIO bus" would that be ok?
>>
>> Have you read Documentation/SubmittingPatches?  Do so (again) and pay
>> special attention to section 2 "Describe your changes."
>
> I just sent v5.

Thanks for your patience.

> If for whatever reason, you or anybody else think that the comment is
> not good, would you mind proposing a comment that would make everybody
> happy so that the patch goes thru?
> And if you or anybody else does not want the patch, could you please
> say so as well?
>
> I have to admit this process (sending patches and getting them reviewed)
> could benefit from more clarifications.
> For example, the process could say that at least 2 reviewers must
> agree on it (on the comments made to the patch and on the patch
> itself).
> It could also say that reviewers are to express not only their opinion
> but to clearly and unequivocally accept or reject.
>
> For instance, right now, it is not clear to me if your comments are
> "nice to have" or "blocking" the patch.
> I don't know if the patch is welcome or not, etc.
> So I submitted v5, but maybe it was not even necessary, it's hard to
> know where in the submission process we are.

In this case, it's ultimately up to Dave Miller.  He'll take into
account whatever comments others have made and decide whether he wants
to accept it.

> By the way, I know some people like the command line, email, etc. but
> there ought to be other tools better suited for patch review...

Some kernel subsystems use http://patchwork.ozlabs.org/ to track status
of various patches.

-- 
Måns Rullgård


Re: [PATCH 5/6] net: thunderx: bgx: Add log message when setting mac address

2016-02-08 Thread Bjørn Mork
Sergei Shtylyov  writes:

>dev_er(), maybe?

I like that!

It's often hard to know whether to print something or be quiet.
pr_er(), dev_er(), netdev_er() etc would be the perfect solution to that
problem.


Bjørn


Re: [PATCH 5/6] net: thunderx: bgx: Add log message when setting mac address

2016-02-08 Thread Sergei Shtylyov

On 02/08/2016 04:49 PM, Bjørn Mork wrote:

>>    dev_er(), maybe?
>
> I like that!
>
> It's often hard to know whether to print something or be quiet.
> pr_er(), dev_er(), netdev_er() etc would be the perfect solution to that
> problem.

   :-D
   Sorry for the typo...

> Bjørn

MBR, Sergei



Re: Kernel uapi and glibc header conflicts (was Re: header conflict introduced by change to netfilter_ipv4/ip_tables.h )

2016-02-08 Thread Florian Weimer
On 02/07/2016 12:31 PM, Mikko Rapeli wrote:
> On Thu, Jan 07, 2016 at 10:30:40AM -0800, Stephen Hemminger wrote:
>> On Thu, 7 Jan 2016 07:29:50 +
>> Mikko Rapeli  wrote:
>>
>>> On Wed, Jan 06, 2016 at 09:20:07AM -0800, Stephen Hemminger wrote:
>>>> This commit breaks compilation of iproute2 with net-next.
>>>
>>> Ok, linux/if.h and libc net/if.h have overlapping defines, and this is not
>>> the only one. I saw lots of them in the core dump headers.
>>>
>>> How should we handle them? Another ifndef for IFNAMSIZ into kernel uapi
>>> headers?
>>>
>>> -Mikko
>>
>> Probably need to do the same thing that was done previously for these
>> kinds of conflicts.  This makes linux/if.h adapt to net/if.h
>> being included before it.
> 
> So uapi headers now have a libc-compat.h
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/uapi/linux/libc-compat.h?id=refs/tags/v4.5-rc2
> which tries to detect and fix incompatibilities between Linux kernel and glibc
> headers. Part of the fix is then in the kernel side headers and another part
> should be in glibc headers, but glibc git repo does not include any of these
> fixes yet.
> 
> Has the glibc part of this incompatibility mess been discussed and agreed
> with glibc developers?

I don't remember any recent discussions on libc-alpha, or any bug
reports about this concrete change.

(Redirecting to libc-alpha, which seems the more appropriate list.)

> Many of the conflicts arise from probably old glibc headers which had copied
> out definitions from the Linux kernel side before it could export any headers
> to userspace. I assume that the glibc headers are not allowed to depend on and
> include Linux kernel uapi headers in deployments but maybe the Linux kernel
> headers could be used at glibc compile time to generate needed glibc side
> definitions. That would allow having a single source for definitions like
> IFNAMSIZ 16.

My impression is that this inconsistency isn't the only problem.  The
problems start if application developers need functionality which is
only in kernel-provided headers, but they still need to include glibc
headers at the same time.

> I'm drafting a test, similar to the kernel uapi header compile test
> https://github.com/mcfrisk/linux/blob/headers_test_v05/scripts/headers_compile_test.sh
> for the glibc conflicts too, and of course noticed that also glibc headers
> conflict with each other. With some workarounds I can test compile each kernel
> uapi header against all compiling glibc headers and see the conflicts as
> build failures.

That could be helpful.

I'm not familiar with relevant developer practices.  It seems to me that
from an application developer point of view, kernel headers are updated
a bit more frequently than glibc headers.  This likely pushes the
solution into a certain direction (and may be the rationale behind the
kernel's libc-compat.h).

Florian


Re: Keystone 2 boards boot failure

2016-02-08 Thread Arnd Bergmann
On Friday 05 February 2016 19:11:06 Grygorii Strashko wrote:
> On 02/05/2016 06:18 PM, Arnd Bergmann wrote:
> > On Thursday 04 February 2016 18:25:08 Grygorii Strashko wrote:

> > @@ -1173,7 +1189,8 @@ static int netcp_tx_submit_skb(struct netcp_intf *netcp,
> > }
> >   
> > set_words(&tmp, 1, &desc->packet_info);
> > -   set_words((u32 *)&skb, 1, &desc->pad[0]);
> > +   tmp = (uintptr_t)&skb;
> > +   set_words(&tmp, 1, &desc->pad[0]);
> 
> &skb is virt address and its size is 32bit even when LPAE=y (phys/dma 64 bit)
> so this is an excess conversion to/from u64 ;)
> This is from the first look.

My original patch attempted to fix support for 64-bit CPUs, as no driver
should be written to support only 32-bit CPUs even if you think at this
point that there can never be a 64-bit keystone system.

The half-reverted patch above no longer works correctly for 64-bit CPUs
but it should not actually be wrong on 32-bit CPUs either, unless I'm
missing your point. 

> >   
> > if (tx_pipe->flags & SWITCH_TO_PORT_IN_TAGINFO) {
> > tmp = tx_pipe->switch_to_port;
> > 
> > 
> > I'm sure it's something obvious and stupid in there, but I just can't
> > see it and that is very unsatisfying. Do you see where I am going wrong?
> > Most of all, I want to know it so I don't make the same mistake again
> > when I patch another driver.
> > 
> 
> I'm very sorry, but I'll not be able to test it in the nearest future :(
> What I could do now is update your/my patch as i mentioned in [1]
> and re-send it at the weekend (with your authorship and my signoff).
> Do you agree?
> 
> 
> [1] https://www.mail-archive.com/netdev@vger.kernel.org/msg95831.html

Yes, let's do that in the meantime. I can also make sure that
the driver doesn't build on 64-bit, just in case.

Arnd


Re: [PATCH v3] net: ethernet: support "fixed-link" DT node on nb8800 driver

2016-02-08 Thread Mason
On 08/02/2016 14:37, Måns Rullgård wrote:

> Sebastian Frias wrote:
> 
>> By the way, I know some people like the command line, email, etc. but
>> there ought to be other tools better suited for patch review...
> 
> Some kernel subsystems use http://patchwork.ozlabs.org/ to track status
> of various patches.

There's also a kernel bugzilla, but it may be for actual bugs.
https://bugzilla.kernel.org/



Re: [PATCH 5/6] net: thunderx: bgx: Add log message when setting mac address

2016-02-08 Thread Robert Richter
On 08.02.16 16:30:37, Sergei Shtylyov wrote:
> >@@ -897,10 +898,13 @@ static int acpi_get_mac_address(struct acpi_device *adev, u8 *dst)
> > goto out;
> >
> > if (!is_valid_ether_addr(mac)) {
> >+dev_warn(dev, "MAC address invalid: %pM\n", mac);
> 
>dev_er(), maybe?

Since the driver may continue, my choice was a warning only.

-Robert

> 
> > ret = -EINVAL;
> > goto out;
> > }
> >
> >+dev_info(dev, "MAC address set to: %pM\n", mac);
> >+
> > memcpy(dst, mac, ETH_ALEN);
> >  out:
> > return ret;
> [...]



Re: [PATCH v3] net: ethernet: support "fixed-link" DT node on nb8800 driver

2016-02-08 Thread Sebastian Frias

On 02/08/2016 02:37 PM, Måns Rullgård wrote:
> Sebastian Frias  writes:
>
>> On 02/05/2016 04:26 PM, Måns Rullgård wrote:
>>> Sebastian Frias  writes:
>>>
>>>> On 02/05/2016 04:08 PM, Måns Rullgård wrote:
>>>>> Sebastian Frias  writes:
>>>>>
>>>>>> On 02/05/2016 03:34 PM, Måns Rullgård wrote:
>>>>>>> Sebastian Frias  writes:
>>>>>>>
>>>>>>>> Signed-off-by: Sebastian Frias 
>>>>>>>
>>>>>>> Please change the subject to something like "net: ethernet: nb8800:
>>>>>>> support fixed-link DT node" and add a comment body.
>>>>>>
>>>>>> The subject is pretty explicit for such a simple patch, what else
>>>>>> could I add that wouldn't be unnecessary chat?
>>>>>
>>>>> It's customary to include a description body even if it's little more
>>>>> than a restatement of the subject.  Also, while the subject usually only
>>>>> says _what_ the patch does, the body should additionally state _why_ it
>>>>> is needed.
>>>>
>>>> I understand, but _why_ it is needed is also obvious in this case; I
>>>> mean, without the patch "fixed-link" cannot be used.
>>>
>>> Then say so.
>>>
>>>> Other patches may not be as obvious/simple and thus justify and
>>>> require more details.
>>>>
>>>> Anyway, I added "Properly handles the case where the PHY is not connected
>>>> to the real MDIO bus" would that be ok?
>>>
>>> Have you read Documentation/SubmittingPatches?  Do so (again) and pay
>>> special attention to section 2 "Describe your changes."
>>
>> I just sent v5.
>
> Thanks for your patience.

:-)

>> If for whatever reason, you or anybody else think that the comment is
>> not good, would you mind proposing a comment that would make everybody
>> happy so that the patch goes thru?
>> And if you or anybody else does not want the patch, could you please
>> say so as well?
>>
>> I have to admit this process (sending patches and getting them reviewed)
>> could benefit from more clarifications.
>> For example, the process could say that at least 2 reviewers must
>> agree on it (on the comments made to the patch and on the patch
>> itself).
>> It could also say that reviewers are to express not only their opinion
>> but to clearly and unequivocally accept or reject.
>>
>> For instance, right now, it is not clear to me if your comments are
>> "nice to have" or "blocking" the patch.
>> I don't know if the patch is welcome or not, etc.
>> So I submitted v5, but maybe it was not even necessary, it's hard to
>> know where in the submission process we are.
>
> In this case, it's ultimately up to Dave Miller.  He'll take into
> account whatever comments others have made and decide whether he wants
> to accept it.

Ok, thanks.

>> By the way, I know some people like the command line, email, etc. but
>> there ought to be other tools better suited for patch review...
>
> Some kernel subsystems use http://patchwork.ozlabs.org/ to track status
> of various patches.

Thanks, I see that netdev is part of it, and that the patches are there:

https://patchwork.ozlabs.org/patch/580217/

seems like a slight layer over plain email and mailinglists; I was 
thinking of something more in the line of https://www.gerritcodereview.com/

I believe Google uses Gerrit for Android.
I think Gerrit would probably be too big (and being written in Java, 
using Prolog and other DSLs, implementing its own Git server in Java, 
etc, may make some -or lots?- of kernel developers cry :-) )
However, in Gerrit it is easier to know where in the "review" process we 
are, because people have to explicitly give a score "+/- X" when 
commenting on a patch.
Also, the diff can operate between different versions of the patches 
themselves to see if the inlined comments were addressed.


[PATCH] net: am79c961a: avoid %? in inline assembly

2016-02-08 Thread Arnd Bergmann
The am79c961a.c driver fails to build with clang because of an
unusual inline assembly construct:

drivers/net/ethernet/amd/am79c961a.c:53:7: error: invalid % escape in inline assembly string
 "str%?h%1, [%2]@ NET_RAP\n\t"

The same change was made a decade ago in arch/arm in commit
6a39dd6222dd ("[ARM] 3759/2: Remove uses of %?"), but apparently
some drivers were missed.

Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/amd/am79c961a.c | 64 ++--
 1 file changed, 32 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/amd/am79c961a.c b/drivers/net/ethernet/amd/am79c961a.c
index 87e727b921dc..fcdf5dda448f 100644
--- a/drivers/net/ethernet/amd/am79c961a.c
+++ b/drivers/net/ethernet/amd/am79c961a.c
@@ -50,8 +50,8 @@ static const char version[] =
 static void write_rreg(u_long base, u_int reg, u_int val)
 {
asm volatile(
-   "str%?h %1, [%2]@ NET_RAP\n\t"
-   "str%?h %0, [%2, #-4]   @ NET_RDP"
+   "strh   %1, [%2]@ NET_RAP\n\t"
+   "strh   %0, [%2, #-4]   @ NET_RDP"
:
: "r" (val), "r" (reg), "r" (ISAIO_BASE + 0x0464));
 }
@@ -60,8 +60,8 @@ static inline unsigned short read_rreg(u_long base_addr, u_int reg)
 {
unsigned short v;
asm volatile(
-   "str%?h %1, [%2]@ NET_RAP\n\t"
-   "ldr%?h %0, [%2, #-4]   @ NET_RDP"
+   "strh   %1, [%2]@ NET_RAP\n\t"
+   "ldrh   %0, [%2, #-4]   @ NET_RDP"
: "=r" (v)
: "r" (reg), "r" (ISAIO_BASE + 0x0464));
return v;
@@ -70,8 +70,8 @@ static inline unsigned short read_rreg(u_long base_addr, u_int reg)
 static inline void write_ireg(u_long base, u_int reg, u_int val)
 {
asm volatile(
-   "str%?h %1, [%2]@ NET_RAP\n\t"
-   "str%?h %0, [%2, #8]@ NET_IDP"
+   "strh   %1, [%2]@ NET_RAP\n\t"
+   "strh   %0, [%2, #8]@ NET_IDP"
:
: "r" (val), "r" (reg), "r" (ISAIO_BASE + 0x0464));
 }
@@ -80,8 +80,8 @@ static inline unsigned short read_ireg(u_long base_addr, u_int reg)
 {
u_short v;
asm volatile(
-   "str%?h %1, [%2]@ NAT_RAP\n\t"
-   "ldr%?h %0, [%2, #8]@ NET_IDP\n\t"
+   "strh   %1, [%2]@ NAT_RAP\n\t"
+   "ldrh   %0, [%2, #8]@ NET_IDP\n\t"
: "=r" (v)
: "r" (reg), "r" (ISAIO_BASE + 0x0464));
return v;
@@ -96,7 +96,7 @@ am_writebuffer(struct net_device *dev, u_int offset, unsigned char *buf, unsigne
offset = ISAMEM_BASE + (offset << 1);
length = (length + 1) & ~1;
if ((int)buf & 2) {
-   asm volatile("str%?h  %2, [%0], #4"
+   asm volatile("strh  %2, [%0], #4"
 : "=&r" (offset) : "0" (offset), "r" (buf[0] | (buf[1] << 8)));
buf += 2;
length -= 2;
@@ -104,20 +104,20 @@ am_writebuffer(struct net_device *dev, u_int offset, unsigned char *buf, unsigne
while (length > 8) {
register unsigned int tmp asm("r2"), tmp2 asm("r3");
asm volatile(
-   "ldm%?ia  %0!, {%1, %2}"
+   "ldmia  %0!, {%1, %2}"
: "+r" (buf), "=&r" (tmp), "=&r" (tmp2));
length -= 8;
asm volatile(
-   "str%?h %1, [%0], #4\n\t"
-   "mov%?  %1, %1, lsr #16\n\t"
-   "str%?h %1, [%0], #4\n\t"
-   "str%?h %2, [%0], #4\n\t"
-   "mov%?  %2, %2, lsr #16\n\t"
-   "str%?h %2, [%0], #4"
+   "strh   %1, [%0], #4\n\t"
+   "mov    %1, %1, lsr #16\n\t"
+   "strh   %1, [%0], #4\n\t"
+   "strh   %2, [%0], #4\n\t"
+   "mov    %2, %2, lsr #16\n\t"
+   "strh   %2, [%0], #4"
: "+r" (offset), "=&r" (tmp), "=&r" (tmp2));
}
while (length > 0) {
-   asm volatile("str%?h  %2, [%0], #4"
+   asm volatile("strh  %2, [%0], #4"
 : "=&r" (offset) : "0" (offset), "r" (buf[0] | (buf[1] << 8)));
buf += 2;
length -= 2;
@@ -132,23 +132,23 @@ am_readbuffer(struct net_device *dev, u_int offset, unsigned char *buf, unsigned
if ((int)buf & 2) {
unsigned int tmp;
asm volatile(
-   "ldr%?h %2, [%0], #4\n\t"
-   "str%?b %2, [%1], #1\n\t"
-   "mov%?  %2, %2, lsr #8\n\t"
-   "str%?b %2, [%1], #1"
+   "ldrh   %2, [%0], #4\n\t"
+   "strb   %2, [%1], #1\n\t"
+   "mov    %2, %2, lsr #8\n\t"
+   "strb   %2, [%1], #1"
: "=&r" (offset), "=&r" (buf), "=r" (tmp): "0" (offset), "1" (buf));
length -= 2;
}
while (lengt

Re: [PATCH v3] net: ethernet: support "fixed-link" DT node on nb8800 driver

2016-02-08 Thread Måns Rullgård
Sebastian Frias  writes:

> On 02/08/2016 03:11 PM, Mason wrote:
>> On 08/02/2016 14:37, Måns Rullgård wrote:
>>
>>> Sebastian Frias wrote:
>>>
>>>> By the way, I know some people like the command line, email, etc. but
>>>> there ought to be other tools better suited for patch review...
>>>
>>> Some kernel subsystems use http://patchwork.ozlabs.org/ to track status
>>> of various patches.
>>
>> There's also a kernel bugzilla, but it may be for actual bugs.
>> https://bugzilla.kernel.org/
>>
>
> Thanks, and what would be the definition of a bug?
> I mean, would the issue from this thread qualify?
> Should I have created a bug report before submitting the patch?

No, that is not necessary.  My understanding is that the bugzilla is
more of a place to report a bug you've found but don't have a patch
for.  Patches always go through the mailing lists.

-- 
Måns Rullgård


Re: [PATCH v3] net: ethernet: support "fixed-link" DT node on nb8800 driver

2016-02-08 Thread Sebastian Frias

On 02/08/2016 03:11 PM, Mason wrote:

> On 08/02/2016 14:37, Måns Rullgård wrote:
>
>> Sebastian Frias wrote:
>>
>>> By the way, I know some people like the command line, email, etc. but
>>> there ought to be other tools better suited for patch review...
>>
>> Some kernel subsystems use http://patchwork.ozlabs.org/ to track status
>> of various patches.
>
> There's also a kernel bugzilla, but it may be for actual bugs.
> https://bugzilla.kernel.org/

Thanks, and what would be the definition of a bug?
I mean, would the issue from this thread qualify?
Should I have created a bug report before submitting the patch?


Re: [PATCH v3] net: ethernet: support "fixed-link" DT node on nb8800 driver

2016-02-08 Thread Måns Rullgård
Sebastian Frias  writes:

>>> By the way, I know some people like the command line, email, etc. but
>>> there ought to be other tools better suited for patch review...
>>
>> Some kernel subsystems use http://patchwork.ozlabs.org/ to track status
>> of various patches.
>>
>
> Thanks, I see that netdev is part of it, and that the patches are there:
>
> https://patchwork.ozlabs.org/patch/580217/
>
> seems like a slight layer over plain email and mailinglists; I was
> thinking of something more in the line of
> https://www.gerritcodereview.com/
> I believe Google uses Gerrit for Android.
> I think Gerrit would probably be too big (and being written in Java,
> using Prolog and other DSLs, implementing its own Git server in Java,
> etc, may make some -or lots?- of kernel developers cry :-) )
> However, in Gerrit it is easier to know where in the "review" process
> we are, because people have to explicitly give a score "+/- X" when
> commenting on a patch.
> Also, the diff can operate between different versions of the patches
> themselves to see if the inlined comments were addressed.

Gerrit has some merits, but for seasoned developers it's largely a
nuisance.  It's probably good at keeping junior/undisciplined developers
from doing too much damage by strictly enforcing a cumbersome process.

-- 
Måns Rullgård


[PATCH V6 2/8] test_bitmap: unit tests for lib/bitmap.c

2016-02-08 Thread Kan Liang
From: David Decotigny 

This is mainly testing bitmap construction and conversion to/from u32[]
for now.

Tested:
  qemu i386, x86_64, ppc, ppc64 BE and LE, ARM.

Signed-off-by: David Decotigny 
---
 lib/Kconfig.debug| 8 
 lib/Makefile | 1 +
 tools/testing/selftests/lib/Makefile | 2 +-
 3 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index ecb9e75..f890ee5 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1738,6 +1738,14 @@ config TEST_KSTRTOX
 config TEST_PRINTF
tristate "Test printf() family of functions at runtime"
 
+config TEST_BITMAP
+   tristate "Test bitmap_*() family of functions at runtime"
+   default n
+   help
+ Enable this option to test the bitmap functions at boot.
+
+ If unsure, say N.
+
 config TEST_RHASHTABLE
tristate "Perform selftest on resizable hash table"
default n
diff --git a/lib/Makefile b/lib/Makefile
index a7c26a4..dda4039 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -43,6 +43,7 @@ obj-$(CONFIG_TEST_USER_COPY) += test_user_copy.o
 obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_keys.o
 obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
 obj-$(CONFIG_TEST_PRINTF) += test_printf.o
+obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
 
 ifeq ($(CONFIG_DEBUG_KOBJECT),y)
 CFLAGS_kobject.o += -DDEBUG
diff --git a/tools/testing/selftests/lib/Makefile b/tools/testing/selftests/lib/Makefile
index 47147b9..0836006 100644
--- a/tools/testing/selftests/lib/Makefile
+++ b/tools/testing/selftests/lib/Makefile
@@ -3,6 +3,6 @@
 # No binaries, but make sure arg-less "make" doesn't trigger "run_tests"
 all:
 
-TEST_PROGS := printf.sh
+TEST_PROGS := printf.sh bitmap.sh
 
 include ../lib.mk
-- 
1.8.3.1



[PATCH V4 7/8] i40e/ethtool: support coalesce getting by queue

2016-02-08 Thread Kan Liang
From: Kan Liang 

This patch implements get_per_queue_coalesce for the i40e driver.

Signed-off-by: Kan Liang 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index ed53966..e9ad69a 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1898,6 +1898,12 @@ static int i40e_get_coalesce(struct net_device *netdev,
return __i40e_get_coalesce(netdev, ec, -1);
 }
 
+static int i40e_get_per_queue_coalesce(struct net_device *netdev, int queue,
+  struct ethtool_coalesce *ec)
+{
+   return __i40e_get_coalesce(netdev, ec, queue);
+}
+
 static void i40e_set_itr_per_queue(struct i40e_vsi *vsi,
   struct ethtool_coalesce *ec,
   int queue)
@@ -2874,6 +2880,7 @@ static const struct ethtool_ops i40e_ethtool_ops = {
.get_ts_info= i40e_get_ts_info,
.get_priv_flags = i40e_get_priv_flags,
.set_priv_flags = i40e_set_priv_flags,
+   .get_per_queue_coalesce = i40e_get_per_queue_coalesce,
 };
 
 void i40e_set_ethtool_ops(struct net_device *netdev)
-- 
1.8.3.1



[PATCH V4 8/8] i40e/ethtool: support coalesce setting by queue

2016-02-08 Thread Kan Liang
From: Kan Liang 

This patch implements set_per_queue_coalesce for the i40e driver.

Signed-off-by: Kan Liang 
---
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index e9ad69a..91a2d29 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -2008,6 +2008,12 @@ static int i40e_set_coalesce(struct net_device *netdev,
return __i40e_set_coalesce(netdev, ec, -1);
 }
 
+static int i40e_set_per_queue_coalesce(struct net_device *netdev, int queue,
+  struct ethtool_coalesce *ec)
+{
+   return __i40e_set_coalesce(netdev, ec, queue);
+}
+
 /**
  * i40e_get_rss_hash_opts - Get RSS hash Input Set for each flow type
  * @pf: pointer to the physical function struct
@@ -2881,6 +2887,7 @@ static const struct ethtool_ops i40e_ethtool_ops = {
.get_priv_flags = i40e_get_priv_flags,
.set_priv_flags = i40e_set_priv_flags,
.get_per_queue_coalesce = i40e_get_per_queue_coalesce,
+   .set_per_queue_coalesce = i40e_set_per_queue_coalesce,
 };
 
 void i40e_set_ethtool_ops(struct net_device *netdev)
-- 
1.8.3.1



[PATCH V4 3/8] net/ethtool: introduce a new ioctl for per queue setting

2016-02-08 Thread Kan Liang
From: Kan Liang 

Introduce a new ioctl, ETHTOOL_PERQUEUE, for setting per-queue parameters.
The following patches will enable some SUB_COMMANDs for per-queue
settings.

Signed-off-by: Kan Liang 
---
 include/uapi/linux/ethtool.h | 17 +
 net/core/ethtool.c   | 27 +--
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index b2e1801..c0b5aaa 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -1146,6 +1146,21 @@ enum ethtool_sfeatures_retval_bits {
 #define ETHTOOL_F_WISH  (1 << ETHTOOL_F_WISH__BIT)
 #define ETHTOOL_F_COMPAT(1 << ETHTOOL_F_COMPAT__BIT)
 
+#define MAX_NUM_QUEUE  4096
+
+/**
+ * struct ethtool_per_queue_op - apply sub command to the queues in mask.
+ * @cmd: ETHTOOL_PERQUEUE
+ * @sub_command: the sub command to apply to each queue
+ * @queue_mask: Bitmap of the queues the sub command applies to
+ * @data: A complete command structure following for each of the queues addressed
+ */
+struct ethtool_per_queue_op {
+   __u32   cmd;
+   __u32   sub_command;
+   __u32   queue_mask[DIV_ROUND_UP(MAX_NUM_QUEUE, 32)];
+   char    data[];
+};
 
 /* CMDs currently supported */
 #define ETHTOOL_GSET   0x0001 /* Get settings. */
@@ -1229,6 +1244,8 @@ enum ethtool_sfeatures_retval_bits {
 #define ETHTOOL_STUNABLE   0x0049 /* Set tunable configuration */
 #define ETHTOOL_GPHYSTATS  0x004a /* get PHY-specific statistics */
 
+#define ETHTOOL_PERQUEUE   0x004b /* Set per queue options */
+
 /* compatibility with older code */
 #define SPARC_ETH_GSET ETHTOOL_GSET
 #define SPARC_ETH_SSET ETHTOOL_SSET
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index 453c803..d0f7146 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1823,13 +1823,27 @@ out:
return ret;
 }
 
+static int ethtool_set_per_queue(struct net_device *dev, void __user *useraddr)
+{
+   struct ethtool_per_queue_op per_queue_opt;
+
+   if (copy_from_user(&per_queue_opt, useraddr, sizeof(per_queue_opt)))
+   return -EFAULT;
+
+   switch (per_queue_opt.sub_command) {
+
+   default:
+   return -EOPNOTSUPP;
+   };
+}
+
 /* The main entry point in this file.  Called from net/core/dev_ioctl.c */
 
 int dev_ethtool(struct net *net, struct ifreq *ifr)
 {
struct net_device *dev = __dev_get_by_name(net, ifr->ifr_name);
void __user *useraddr = ifr->ifr_data;
-   u32 ethcmd;
+   u32 ethcmd, sub_cmd;
int rc;
netdev_features_t old_features;
 
@@ -1839,8 +1853,14 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
if (copy_from_user(&ethcmd, useraddr, sizeof(ethcmd)))
return -EFAULT;
 
+   if (ethcmd == ETHTOOL_PERQUEUE) {
+   if (copy_from_user(&sub_cmd, useraddr + sizeof(ethcmd), sizeof(sub_cmd)))
+   return -EFAULT;
+   } else {
+   sub_cmd = ethcmd;
+   }
/* Allow some commands to be done by anyone */
-   switch (ethcmd) {
+   switch (sub_cmd) {
case ETHTOOL_GSET:
case ETHTOOL_GDRVINFO:
case ETHTOOL_GMSGLVL:
@@ -2070,6 +2090,9 @@ int dev_ethtool(struct net *net, struct ifreq *ifr)
case ETHTOOL_GPHYSTATS:
rc = ethtool_get_phy_stats(dev, useraddr);
break;
+   case ETHTOOL_PERQUEUE:
+   rc = ethtool_set_per_queue(dev, useraddr);
+   break;
default:
rc = -EOPNOTSUPP;
}
-- 
1.8.3.1
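The request layout added by this patch can be sketched from userspace as follows. This is a minimal sketch that assumes a local mirror of struct ethtool_per_queue_op. The names struct pq_op_hdr, pq_set_queue(), pq_count_queues() and pq_buffer_size() are hypothetical and do not come from any installed header. With MAX_NUM_QUEUE = 4096 the queue mask is 128 __u32 words. The data area that follows needs one sub-command structure per selected queue.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirror of the proposed uapi structure; the sizes
 * come from this patch (MAX_NUM_QUEUE 4096, 32-bit mask words). */
#define PQ_MAX_NUM_QUEUE 4096
#define PQ_MASK_WORDS ((PQ_MAX_NUM_QUEUE + 31) / 32)

struct pq_op_hdr {
	uint32_t cmd;		/* ETHTOOL_PERQUEUE */
	uint32_t sub_command;	/* e.g. ETHTOOL_GCOALESCE */
	uint32_t queue_mask[PQ_MASK_WORDS];
	/* data[] follows: one sub-command structure per selected queue */
};

/* Select @queue in the mask; returns -1 if it is out of range. */
static int pq_set_queue(struct pq_op_hdr *hdr, unsigned int queue)
{
	if (queue >= PQ_MAX_NUM_QUEUE)
		return -1;
	hdr->queue_mask[queue / 32] |= UINT32_C(1) << (queue % 32);
	return 0;
}

/* Count the selected queues (popcount over all mask words). */
static unsigned int pq_count_queues(const struct pq_op_hdr *hdr)
{
	unsigned int i, n = 0;

	for (i = 0; i < PQ_MASK_WORDS; i++) {
		uint32_t w = hdr->queue_mask[i];

		while (w) {
			w &= w - 1;	/* clear the lowest set bit */
			n++;
		}
	}
	return n;
}

/* Bytes needed for the header plus one @rec_size record per queue. */
static size_t pq_buffer_size(const struct pq_op_hdr *hdr, size_t rec_size)
{
	return sizeof(*hdr) + (size_t)pq_count_queues(hdr) * rec_size;
}
```

With queues 0 and 33 selected and a 100-byte record, the buffer would be 520 + 2 * 100 = 720 bytes.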



[PATCH V4 4/8] net/ethtool: support get coalesce per queue

2016-02-08 Thread Kan Liang
From: Kan Liang 

This patch implements sub command ETHTOOL_GCOALESCE for ioctl
ETHTOOL_PERQUEUE. It introduces an interface get_per_queue_coalesce to
get coalesce of each masked queue from device driver. Then the interrupt
coalescing parameters will be copied back to user space one by one.

Signed-off-by: Kan Liang 
---
 include/linux/ethtool.h | 10 +-
 net/core/ethtool.c  | 34 +-
 2 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 653dc9c..a83566f 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -201,6 +201,13 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 n_rx_rings)
  * @get_module_eeprom: Get the eeprom information from the plug-in module
  * @get_eee: Get Energy-Efficient (EEE) supported and status.
  * @set_eee: Set EEE status (enable/disable) as well as LPI timers.
+ * @get_per_queue_coalesce: Get interrupt coalescing parameters per queue.
+ * It needs to do range check for the input queue number. Only if
+ * neither RX nor TX queue number is in the range, a negative error code
+ * returns. For the case that only RX or only TX is not in the range,
+ * zero should return. But related unavailable fields should be set to ~0,
+ * which indicates RX or TX is not in the range.
+ * Returns a negative error code or zero.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account.  Callers must
@@ -279,7 +286,8 @@ struct ethtool_ops {
   const struct ethtool_tunable *, void *);
int (*set_tunable)(struct net_device *,
   const struct ethtool_tunable *, const void *);
-
+   int (*get_per_queue_coalesce)(struct net_device *, int,
+ struct ethtool_coalesce *);
 
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index d0f7146..ccefbf5 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1823,6 +1823,37 @@ out:
return ret;
 }
 
+static int ethtool_get_per_queue_coalesce(struct net_device *dev,
+ void __user *useraddr,
+ struct ethtool_per_queue_op *per_queue_opt)
+{
+   int bit, ret;
+   DECLARE_BITMAP(queue_mask, MAX_NUM_QUEUE);
+
+   if (!dev->ethtool_ops->get_per_queue_coalesce)
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+
+   bitmap_from_u32array(queue_mask,
+MAX_NUM_QUEUE,
+per_queue_opt->queue_mask,
+DIV_ROUND_UP(MAX_NUM_QUEUE, 32));
+
+   for_each_set_bit(bit, queue_mask, MAX_NUM_QUEUE) {
+   struct ethtool_coalesce coalesce = { .cmd = ETHTOOL_GCOALESCE };
+
+   ret = dev->ethtool_ops->get_per_queue_coalesce(dev, bit, &coalesce);
+   if (ret != 0)
+   return ret;
+   if (copy_to_user(useraddr, &coalesce, sizeof(coalesce)))
+   return -EFAULT;
+   useraddr += sizeof(coalesce);
+   }
+
+   return 0;
+}
+
 static int ethtool_set_per_queue(struct net_device *dev, void __user *useraddr)
 {
struct ethtool_per_queue_op per_queue_opt;
@@ -1831,7 +1862,8 @@ static int ethtool_set_per_queue(struct net_device *dev, void __user *useraddr)
return -EFAULT;
 
switch (per_queue_opt.sub_command) {
-
+   case ETHTOOL_GCOALESCE:
+   return ethtool_get_per_queue_coalesce(dev, useraddr, &per_queue_opt);
default:
return -EOPNOTSUPP;
};
-- 
1.8.3.1
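ethtool_get_per_queue_coalesce() walks the mask with for_each_set_bit() and advances useraddr after each copy. As a result, the i-th structure in the returned data area belongs to the i-th selected queue, in ascending queue order. A small userspace helper for locating one queue's record might look like this. pq_record_index() is a hypothetical name used only for illustration.

```c
#include <stdint.h>

#define PQ_MAX_NUM_QUEUE 4096

/* Return the index of @queue's record in the data area of a per-queue
 * GET reply, or -1 if @queue is not selected in @mask. Records appear in
 * ascending queue order, so the index equals the number of selected
 * queues below @queue. */
static int pq_record_index(const uint32_t *mask, unsigned int queue)
{
	unsigned int q;
	int idx = 0;

	if (queue >= PQ_MAX_NUM_QUEUE ||
	    !(mask[queue / 32] & (UINT32_C(1) << (queue % 32))))
		return -1;

	for (q = 0; q < queue; q++)
		if (mask[q / 32] & (UINT32_C(1) << (q % 32)))
			idx++;
	return idx;
}
```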



[PATCH V6 1/8] lib/bitmap.c: conversion routines to/from u32 array

2016-02-08 Thread Kan Liang
From: David Decotigny 

Aimed at transferring bitmaps to/from user-space in a 32/64-bit agnostic
way.

Tested:
  unit tests (next patch) on qemu i386, x86_64, ppc, ppc64 BE and LE,
  ARM.

Signed-off-by: David Decotigny 
---
 include/linux/bitmap.h |  6 
 lib/bitmap.c   | 86 ++
 2 files changed, 92 insertions(+)

diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h
index 9653fdb..f7dc158 100644
--- a/include/linux/bitmap.h
+++ b/include/linux/bitmap.h
@@ -59,6 +59,8 @@
  * bitmap_find_free_region(bitmap, bits, order)  Find and allocate bit region
  * bitmap_release_region(bitmap, pos, order)   Free specified bit region
  * bitmap_allocate_region(bitmap, pos, order)  Allocate specified bit region
+ * bitmap_from_u32array(dst, nbits, buf, nwords) *dst = *buf (nwords 32b words)
+ * bitmap_to_u32array(buf, nwords, src, nbits) *buf = *src (nwords 32b words)
  */
 
 /*
@@ -163,6 +165,10 @@ extern void bitmap_fold(unsigned long *dst, const unsigned long *orig,
 extern int bitmap_find_free_region(unsigned long *bitmap, unsigned int bits, int order);
 extern void bitmap_release_region(unsigned long *bitmap, unsigned int pos, int order);
 extern int bitmap_allocate_region(unsigned long *bitmap, unsigned int pos, int order);
+extern void bitmap_from_u32array(unsigned long *bitmap, unsigned int nbits,
+const u32 *buf, unsigned int nwords);
+extern void bitmap_to_u32array(u32 *buf, unsigned int nwords,
+  const unsigned long *bitmap, unsigned int nbits);
 #ifdef __BIG_ENDIAN
 extern void bitmap_copy_le(unsigned long *dst, const unsigned long *src, unsigned int nbits);
 #else
diff --git a/lib/bitmap.c b/lib/bitmap.c
index 8148143..e1cc648 100644
--- a/lib/bitmap.c
+++ b/lib/bitmap.c
@@ -12,6 +12,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1060,6 +1062,90 @@ int bitmap_allocate_region(unsigned long *bitmap, unsigned int pos, int order)
 EXPORT_SYMBOL(bitmap_allocate_region);
 
 /**
+ * bitmap_from_u32array - copy the contents of a u32 array of bits to bitmap
+ * @bitmap: array of unsigned longs, the destination bitmap, non NULL
+ * @nbits: number of bits in @bitmap
+ * @buf: array of u32 (in host byte order), the source bitmap, non NULL
+ * @nwords: number of u32 words in @buf
+ *
+ * Copy min(nbits, 32*nwords) bits from @buf to @bitmap; the remaining
+ * bits between 32*nwords and nbits in @bitmap (if any) are cleared. In
+ * the last word of @bitmap, the bits beyond nbits (if any) are kept
+ * unchanged.
+ */
+void bitmap_from_u32array(unsigned long *bitmap, unsigned int nbits,
+ const u32 *buf, unsigned int nwords)
+{
+   unsigned int k;
+   const u32 *src = buf;
+
+   for (k = 0; k < BITS_TO_LONGS(nbits); ++k) {
+   unsigned long part = 0;
+
+   if (nwords) {
+   part = *src++;
+   nwords--;
+   }
+
+#if BITS_PER_LONG == 64
+   if (nwords) {
+   part |= ((unsigned long) *src++) << 32;
+   nwords--;
+   }
+#endif
+
+   if (k < nbits/BITS_PER_LONG)
+   bitmap[k] = part;
+   else {
+   unsigned long mask = BITMAP_LAST_WORD_MASK(nbits);
+
+   bitmap[k] = (bitmap[k] & ~mask) | (part & mask);
+   }
+   }
+}
+EXPORT_SYMBOL(bitmap_from_u32array);
+
+/**
+ * bitmap_to_u32array - copy the contents of bitmap to a u32 array of bits
+ * @buf: array of u32 (in host byte order), the dest bitmap, non NULL
+ * @nwords: number of u32 words in @buf
+ * @bitmap: array of unsigned longs, the source bitmap, non NULL
+ * @nbits: number of bits in @bitmap
+ *
+ * copy min(nbits, 32*nwords) bits from @bitmap to @buf. Remaining
+ * bits after nbits in @buf (if any) are cleared.
+ */
+void bitmap_to_u32array(u32 *buf, unsigned int nwords,
+   const unsigned long *bitmap, unsigned int nbits)
+{
+   unsigned int k = 0;
+   u32 *dst = buf;
+
+   while (nwords) {
+   unsigned long part = 0;
+
+   if (k < BITS_TO_LONGS(nbits)) {
+   part = bitmap[k];
+   if (k >= nbits/BITS_PER_LONG)
+   part &= BITMAP_LAST_WORD_MASK(nbits);
+   k++;
+   }
+
+   *dst++ = part & 0xffffffffUL;
+   nwords--;
+
+#if BITS_PER_LONG == 64
+   if (nwords) {
+   part >>= 32;
+   *dst++ = part & 0xffffffffUL;
+   nwords--;
+   }
+#endif
+   }
+}
+EXPORT_SYMBOL(bitmap_to_u32array);
+
+/**
  * bitmap_copy_le - copy a bitmap, putting the bits into little-endian order.
  * @dst:   destination buffer
  * @src:   bitmap to copy
-- 
1.8.3.1
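The packing rules of bitmap_from_u32array() can be checked in userspace. The model below derives the size of unsigned long at run time instead of using #if BITS_PER_LONG == 64. It is a sketch for illustration, not the lib/bitmap.c code. model_from_u32array() and the BITS_PER_UL and LONGS_FOR macros are hypothetical names.

```c
#include <limits.h>
#include <stdint.h>

#define BITS_PER_UL (sizeof(unsigned long) * CHAR_BIT)
#define LONGS_FOR(nbits) (((nbits) + BITS_PER_UL - 1) / BITS_PER_UL)

/* Userspace model of bitmap_from_u32array(): copy min(nbits, 32*nwords)
 * bits from @buf into @bitmap, clear the remaining bits up to @nbits and
 * keep the bits beyond @nbits in the last long unchanged. */
static void model_from_u32array(unsigned long *bitmap, unsigned int nbits,
				const uint32_t *buf, unsigned int nwords)
{
	unsigned int k, j;

	for (k = 0; k < LONGS_FOR(nbits); k++) {
		unsigned long part = 0;

		/* Pack one (32-bit long) or two (64-bit long) u32 words,
		 * low word first; missing words contribute zeros. */
		for (j = 0; j < BITS_PER_UL / 32 && nwords; j++, nwords--)
			part |= (unsigned long)*buf++ << (32 * j);

		if (k < nbits / BITS_PER_UL) {
			bitmap[k] = part;
		} else {
			/* Partial last long: only bits below nbits change. */
			unsigned long mask =
				~0UL >> (BITS_PER_UL - nbits % BITS_PER_UL);

			bitmap[k] = (bitmap[k] & ~mask) | (part & mask);
		}
	}
}
```

Calling it with nbits = 40 and two source words keeps all 32 bits of the first word and only the low 8 bits of the second. It leaves everything above bit 39 untouched on both 32-bit and 64-bit hosts.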



[PATCH V4 0/8] ethtool per queue parameters support

2016-02-08 Thread Kan Liang
Modern network interface controllers usually support multiple receive
and transmit queues. Each queue may have its own parameters. For
example, Intel XL710/X710 hardware supports per queue interrupt
moderation. However, ethtool currently has no per queue option, so the
user has to set parameters for the whole NIC. This series extends
ethtool to support per queue parameters.

Since support for per queue parameters varies between cards, it is
impossible to address all cards in one patch. This series only
supports per queue coalesce options on i40e driver. The framework used
in the patch can be easily extended to other cards and parameters.

The lib bitmap needs to be extended to facilitate exchanging queue bitmaps
between user space and kernel space. Two patches from David's latest V6
patch series are also included in this series. You may refer to
https://lwn.net/Articles/672517/ for more details.

Changes since V3:
 - Based on David's lib bitmap.
 - ETHTOOL_PERQUEUE should be handled before the containing switch
 - make the rollback code unconditional
 - some minor changes according to Ben's feedback

Changes since V2:
 - Add queue-specific settings for interrupt moderation in i40e

Changes since V1:
 - Checking the sub-command number to determine whether the command
   requires CAP_NET_ADMIN
 - Refine the struct ethtool_per_queue_op and improve the comments
 - Use bitmap functions to parse queue mask
 - Improve comments
 - Add rollback support
 - Correct the way to find the vector for specific queue.

David Decotigny (2):
  lib/bitmap.c: conversion routines to/from u32 array
  test_bitmap: unit tests for lib/bitmap.c

Kan Liang (6):
  net/ethtool: introduce a new ioctl for per queue setting
  net/ethtool: support get coalesce per queue
  net/ethtool: support set coalesce per queue
  i40e: queue-specific settings for interrupt moderation
  i40e/ethtool: support coalesce getting by queue
  i40e/ethtool: support coalesce setting by queue

 drivers/net/ethernet/intel/i40e/i40e.h |   7 --
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  15 ++-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 154 +
 drivers/net/ethernet/intel/i40e/i40e_main.c|  12 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.c|   9 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h|   8 ++
 include/linux/bitmap.h |   6 +
 include/linux/ethtool.h|  18 ++-
 include/uapi/linux/ethtool.h   |  17 +++
 lib/Kconfig.debug  |   8 ++
 lib/Makefile   |   1 +
 lib/bitmap.c   |  86 ++
 net/core/ethtool.c | 119 ++-
 tools/testing/selftests/lib/Makefile   |   2 +-
 14 files changed, 389 insertions(+), 73 deletions(-)

-- 
1.8.3.1



[PATCH V4 5/8] net/ethtool: support set coalesce per queue

2016-02-08 Thread Kan Liang
From: Kan Liang 

This patch implements sub command ETHTOOL_SCOALESCE for ioctl
ETHTOOL_PERQUEUE. It introduces an interface set_per_queue_coalesce to
set the coalesce parameters of each masked queue in the device driver.
The requested coalesce settings for each masked queue are stored in
"data" and copied from userspace.
If setting the coalesce parameters fails for any queue, the values
already applied to earlier queues are rolled back on a best-effort
basis.

Signed-off-by: Kan Liang 
---
 include/linux/ethtool.h |  8 +++
 net/core/ethtool.c  | 60 +
 2 files changed, 68 insertions(+)

diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index a83566f..0c699b8 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -208,6 +208,12 @@ static inline u32 ethtool_rxfh_indir_default(u32 index, u32 n_rx_rings)
  * zero should return. But related unavailable fields should be set to ~0,
  * which indicates RX or TX is not in the range.
  * Returns a negative error code or zero.
+ * @set_per_queue_coalesce: Set interrupt coalescing parameters per queue.
+ * It needs to do range check for the input queue number. Only if
+ * neither RX nor TX queue number is in the range, a negative error code
+ * returns. For the case that only RX or only TX is not in the range,
+ * zero should return. The related unavailable fields should be ignored.
+ * Returns a negative error code or zero.
  *
  * All operations are optional (i.e. the function pointer may be set
  * to %NULL) and callers must take this into account.  Callers must
@@ -288,6 +294,8 @@ struct ethtool_ops {
   const struct ethtool_tunable *, const void *);
int (*get_per_queue_coalesce)(struct net_device *, int,
  struct ethtool_coalesce *);
+   int (*set_per_queue_coalesce)(struct net_device *, int,
+ struct ethtool_coalesce *);
 
 };
 #endif /* _LINUX_ETHTOOL_H */
diff --git a/net/core/ethtool.c b/net/core/ethtool.c
index ccefbf5..933de09 100644
--- a/net/core/ethtool.c
+++ b/net/core/ethtool.c
@@ -1854,6 +1854,64 @@ static int ethtool_get_per_queue_coalesce(struct net_device *dev,
return 0;
 }
 
+static int ethtool_set_per_queue_coalesce(struct net_device *dev,
+ void __user *useraddr,
+ struct ethtool_per_queue_op *per_queue_opt)
+{
+   int bit, i, ret = 0;
+   int queue_num;
+   struct ethtool_coalesce *backup = NULL, *tmp = NULL;
+   DECLARE_BITMAP(queue_mask, MAX_NUM_QUEUE);
+
+   if ((!dev->ethtool_ops->set_per_queue_coalesce) ||
+   (!dev->ethtool_ops->get_per_queue_coalesce))
+   return -EOPNOTSUPP;
+
+   useraddr += sizeof(*per_queue_opt);
+
+   bitmap_from_u32array(queue_mask,
+MAX_NUM_QUEUE,
+per_queue_opt->queue_mask,
+DIV_ROUND_UP(MAX_NUM_QUEUE, 32));
+   queue_num = bitmap_weight(queue_mask, MAX_NUM_QUEUE);
+   tmp = backup = kmalloc_array(queue_num, sizeof(*backup), GFP_KERNEL);
+   if (!backup)
+   return -ENOMEM;
+
+   for_each_set_bit(bit, queue_mask, MAX_NUM_QUEUE) {
+   struct ethtool_coalesce coalesce;
+
+   ret = dev->ethtool_ops->get_per_queue_coalesce(dev, bit, tmp);
+   if (ret != 0)
+   goto roll_back;
+
+   tmp++;
+
+   if (copy_from_user(&coalesce, useraddr, sizeof(coalesce))) {
+   ret = -EFAULT;
+   goto roll_back;
+   }
+
+   ret = dev->ethtool_ops->set_per_queue_coalesce(dev, bit, &coalesce);
+   if (ret != 0)
+   goto roll_back;
+
+   useraddr += sizeof(coalesce);
+   }
+
+roll_back:
+   if (ret != 0) {
+   tmp = backup;
+   for_each_set_bit(i, queue_mask, bit) {
+   dev->ethtool_ops->set_per_queue_coalesce(dev, i, tmp);
+   tmp++;
+   }
+   }
+   kfree(backup);
+
+   return ret;
+}
+
 static int ethtool_set_per_queue(struct net_device *dev, void __user *useraddr)
 {
struct ethtool_per_queue_op per_queue_opt;
@@ -1864,6 +1922,8 @@ static int ethtool_set_per_queue(struct net_device *dev, void __user *useraddr)
switch (per_queue_opt.sub_command) {
case ETHTOOL_GCOALESCE:
return ethtool_get_per_queue_coalesce(dev, useraddr, &per_queue_opt);
+   case ETHTOOL_SCOALESCE:
+   return ethtool_set_per_queue_coalesce(dev, useraddr, &per_queue_opt);
default:
return -EOPNOTSUPP;
};
-- 
1.8.3.1
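The rollback path only works if the backup cursor advances by one element per queue. tmp is a struct ethtool_coalesce pointer, so tmp += sizeof(struct ethtool_coalesce) would skip sizeof(struct ethtool_coalesce) elements and run past the kmalloc_array() allocation. The self-contained model below walks through the backup, apply and rollback sequence. It uses a hypothetical in-memory "driver" in place of the ethtool_ops callbacks: model_get(), model_set() and a 64-queue uint64_t mask are illustrative only.

```c
#include <stdint.h>
#include <stdlib.h>

#define NQ 64

struct model_coal {
	int rx_usecs;
};

static struct model_coal hw[NQ];	/* hypothetical per-queue HW state */

static int model_get(int q, struct model_coal *c)
{
	*c = hw[q];
	return 0;
}

static int model_set(int q, const struct model_coal *c)
{
	if (c->rx_usecs < 0)
		return -22;	/* -EINVAL: lets callers hit the failure path */
	hw[q] = *c;
	return 0;
}

/* Apply vals[i] to the i-th selected queue; on failure, restore the
 * queues already modified from the backup array. */
static int model_set_per_queue(uint64_t mask, const struct model_coal *vals)
{
	struct model_coal *backup, *tmp;
	int q, i, ret = 0;

	/* At most NQ queues can be selected, so NQ entries always suffice. */
	backup = calloc(NQ, sizeof(*backup));
	if (!backup)
		return -12;	/* -ENOMEM */

	tmp = backup;
	for (q = 0; q < NQ; q++) {
		if (!((mask >> q) & 1))
			continue;
		ret = model_get(q, tmp);	/* save the old value first */
		if (ret)
			break;
		tmp++;		/* advance one element, not sizeof() elements */
		ret = model_set(q, vals++);
		if (ret)
			break;
	}

	if (ret) {
		/* Restore queues below the failing one, in the same order. */
		tmp = backup;
		for (i = 0; i < q; i++)
			if ((mask >> i) & 1)
				model_set(i, tmp++);
	}
	free(backup);
	return ret;
}
```

As in the patch, the queue whose set failed is not restored because it was never modified. Errors from the restoring calls are ignored.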



[PATCH V4 6/8] i40e: queue-specific settings for interrupt moderation

2016-02-08 Thread Kan Liang
From: Kan Liang 

For the i40e driver, each vector has its own ITR register. However,
the driver proper has no concept of queue-specific settings; only a
VSI-wide variable is used to store ITR values. This causes problems,
especially when resetting a vector, because queue-specific ITR values
can be lost.
This patch moves rx_itr_setting and tx_itr_setting to i40e_ring so that
the ITR setting is stored for each queue.
i40e_get_coalesce and i40e_set_coalesce are also modified accordingly to
support queue-specific settings. To stay compatible with old ethtool,
if the user doesn't specify a queue number, i40e_get_coalesce returns
queue 0's values, while i40e_set_coalesce applies the values to all queues.

Signed-off-by: Kan Liang 
Acked-by: Shannon Nelson 
---
 drivers/net/ethernet/intel/i40e/i40e.h |   7 --
 drivers/net/ethernet/intel/i40e/i40e_debugfs.c |  15 ++-
 drivers/net/ethernet/intel/i40e/i40e_ethtool.c | 142 -
 drivers/net/ethernet/intel/i40e/i40e_main.c|  12 +--
 drivers/net/ethernet/intel/i40e/i40e_txrx.c|   9 +-
 drivers/net/ethernet/intel/i40e/i40e_txrx.h|   8 ++
 6 files changed, 123 insertions(+), 70 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index 53ed3bd..1ed3d22 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -502,13 +502,6 @@ struct i40e_vsi {
struct i40e_ring **tx_rings;
 
u16 work_limit;
-   /* high bit set means dynamic, use accessor routines to read/write.
-* hardware only supports 2us resolution for the ITR registers.
-* these values always store the USER setting, and must be converted
-* before programming to a register.
-*/
-   u16 rx_itr_setting;
-   u16 tx_itr_setting;
u16 int_rate_limit;  /* value in usecs */
 
u16 rss_table_size; /* HW RSS table size */
diff --git a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
index 10744a6..40d49f4 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_debugfs.c
@@ -533,6 +533,10 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
 "rx_rings[%i]: vsi = %p, q_vector = %p\n",
 i, rx_ring->vsi,
 rx_ring->q_vector);
+   dev_info(&pf->pdev->dev,
+"rx_rings[%i]: rx_itr_setting = %d (%s)\n",
+i, rx_ring->rx_itr_setting,
+ITR_IS_DYNAMIC(rx_ring->rx_itr_setting) ? "dynamic" : "fixed");
}
for (i = 0; i < vsi->num_queue_pairs; i++) {
struct i40e_ring *tx_ring = ACCESS_ONCE(vsi->tx_rings[i]);
@@ -583,14 +587,15 @@ static void i40e_dbg_dump_vsi_seid(struct i40e_pf *pf, int seid)
dev_info(&pf->pdev->dev,
 "tx_rings[%i]: DCB tc = %d\n",
 i, tx_ring->dcb_tc);
+   dev_info(&pf->pdev->dev,
+"tx_rings[%i]: tx_itr_setting = %d (%s)\n",
+i, tx_ring->tx_itr_setting,
+ITR_IS_DYNAMIC(tx_ring->tx_itr_setting) ? "dynamic" : "fixed");
}
rcu_read_unlock();
dev_info(&pf->pdev->dev,
-"work_limit = %d, rx_itr_setting = %d (%s), tx_itr_setting = %d (%s)\n",
-vsi->work_limit, vsi->rx_itr_setting,
-ITR_IS_DYNAMIC(vsi->rx_itr_setting) ? "dynamic" : "fixed",
-vsi->tx_itr_setting,
-ITR_IS_DYNAMIC(vsi->tx_itr_setting) ? "dynamic" : "fixed");
+"work_limit = %d\n",
+vsi->work_limit);
dev_info(&pf->pdev->dev,
 "max_frame = %d, rx_hdr_len = %d, rx_buf_len = %d dtype = %d\n",
 vsi->max_frame, vsi->rx_hdr_len, vsi->rx_buf_len, vsi->dtype);
diff --git a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
index 4549591..ed53966 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_ethtool.c
@@ -1849,23 +1849,37 @@ static int i40e_set_phys_id(struct net_device *netdev,
  * 125us (8000 interrupts per second) == ITR(62)
  */
 
-static int i40e_get_coalesce(struct net_device *netdev,
-struct ethtool_coalesce *ec)
+static int __i40e_get_coalesce(struct net_device *netdev,
+  struct ethtool_coalesce *ec,
+  int queue)
 {
struct i40e_netdev_priv *np = netdev_priv(netdev);
struct i40e_vsi *vsi = np->vsi;
+   struct i40e_pf *pf = vsi->back;
 
ec->tx_max_coalesced_frames_irq = vsi->work_limit;
ec->rx_max_coalesced_frames_irq = vsi->work_limit;
 
-   if (ITR_IS_DYNAMIC(vsi->rx_itr_setting))
+   /* rx and tx

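The compatibility rule for legacy (non-per-queue) requests can be modelled in a few lines. This is a simplified sketch that assumes a single array of queue pairs. struct model_ring, struct model_vsi, model_get_itr() and model_set_itr() are hypothetical names. The sketch ignores the separate RX/TX range handling documented for get_per_queue_coalesce. A negative queue number stands for "no queue specified", matching the -1 that i40e_set_coalesce() passes to __i40e_set_coalesce().

```c
/* Hypothetical, simplified stand-in for per-ring ITR storage: each ring
 * carries its own rx/tx setting instead of the VSI. */
struct model_ring {
	unsigned short rx_itr_setting;
	unsigned short tx_itr_setting;
};

struct model_vsi {
	struct model_ring *rings;
	int num_queue_pairs;
};

/* Get: a negative @queue (legacy request) reports queue 0's values. */
static int model_get_itr(const struct model_vsi *vsi, int queue,
			 unsigned short *rx, unsigned short *tx)
{
	if (queue < 0)
		queue = 0;
	if (queue >= vsi->num_queue_pairs)
		return -22;	/* -EINVAL */
	*rx = vsi->rings[queue].rx_itr_setting;
	*tx = vsi->rings[queue].tx_itr_setting;
	return 0;
}

/* Set: a negative @queue applies the values to every queue pair. */
static int model_set_itr(struct model_vsi *vsi, int queue,
			 unsigned short rx, unsigned short tx)
{
	int i;

	if (queue >= vsi->num_queue_pairs)
		return -22;	/* -EINVAL */
	for (i = 0; i < vsi->num_queue_pairs; i++) {
		if (queue >= 0 && i != queue)
			continue;
		vsi->rings[i].rx_itr_setting = rx;
		vsi->rings[i].tx_itr_setting = tx;
	}
	return 0;
}
```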
Re: [PATCH net] unix: correctly track in-flight fds in sending process user_struct

2016-02-08 Thread David Miller
From: Hannes Frederic Sowa 
Date: Wed,  3 Feb 2016 02:11:03 +0100

> The commit referenced in the Fixes tag incorrectly accounted the number
> of in-flight fds over a unix domain socket to the original opener
> of the file descriptor. This allows another process to arbitrarily
> deplete the original file opener's resource limit for the maximum
> number of open files. Instead, the sending process and its struct cred
> should be credited.
> 
> To do so, we add a reference counted struct user_struct pointer to the
> scm_fp_list and use it to account for the number of inflight unix fds.
> 
> Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets")
> Reported-by: David Herrmann 
> Cc: David Herrmann 
> Cc: Willy Tarreau 
> Cc: Linus Torvalds 
> Suggested-by: Linus Torvalds 
> Signed-off-by: Hannes Frederic Sowa 

Applied, thanks Hannes.


[PATCH 8/9] rfkill: Userspace control for airplane mode

2016-02-08 Thread João Paulo Rechi Vita
Provide an interface for the airplane-mode indicator to be controlled
from userspace. The user has to first acquire the control through
RFKILL_OP_AIRPLANE_MODE_ACQUIRE and keep the fd open for the whole time
it wants to be in control of the indicator. Closing the fd or using
RFKILL_OP_AIRPLANE_MODE_RELEASE restores the default policy.

To change state of the indicator, the RFKILL_OP_AIRPLANE_MODE_CHANGE
operation is used, passing the value on "struct rfkill_event.soft". If
the caller has not acquired the airplane-mode control beforehand, the
operation fails.

Signed-off-by: João Paulo Rechi Vita 
---
 Documentation/rfkill.txt| 10 ++
 include/uapi/linux/rfkill.h |  3 +++
 net/rfkill/core.c   | 47 ++---
 3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt
index b13025a..aa6e014 100644
--- a/Documentation/rfkill.txt
+++ b/Documentation/rfkill.txt
@@ -87,6 +87,7 @@ RFKill provides per-switch LED triggers, which can be used to drive LEDs
 according to the switch state (LED_FULL when blocked, LED_OFF otherwise).
 An airplane-mode indicator LED trigger is also available, which triggers
 LED_FULL when all radios known by RFKill are blocked, and LED_OFF otherwise.
+The airplane-mode indicator LED trigger policy can be overridden by userspace.
 
 
 5. Userspace support
@@ -123,5 +124,14 @@ RFKILL_TYPE
 The contents of these variables corresponds to the "name", "state" and
 "type" sysfs files explained above.
 
+Userspace can also override the default airplane-mode indicator policy through
+/dev/rfkill. Control of the airplane mode indicator has to be acquired first,
+using RFKILL_OP_AIRPLANE_MODE_ACQUIRE, and is only available for one userspace
+application at a time. Closing the fd or using RFKILL_OP_AIRPLANE_MODE_RELEASE
+reverts the airplane-mode indicator back to the default kernel policy and makes
+it available for other applications to take control. Changes to the
+airplane-mode indicator state can be made using RFKILL_OP_AIRPLANE_MODE_CHANGE,
+passing the new value in the 'soft' field of 'struct rfkill_event'.
+
 
 For further details consult Documentation/ABI/stable/sysfs-class-rfkill.
diff --git a/include/uapi/linux/rfkill.h b/include/uapi/linux/rfkill.h
index 2e00dce..9cb999b 100644
--- a/include/uapi/linux/rfkill.h
+++ b/include/uapi/linux/rfkill.h
@@ -67,6 +67,9 @@ enum rfkill_operation {
RFKILL_OP_DEL,
RFKILL_OP_CHANGE,
RFKILL_OP_CHANGE_ALL,
+   RFKILL_OP_AIRPLANE_MODE_ACQUIRE,
+   RFKILL_OP_AIRPLANE_MODE_RELEASE,
+   RFKILL_OP_AIRPLANE_MODE_CHANGE,
 };
 
 /**
diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index fb11547..8067701 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -89,6 +89,7 @@ struct rfkill_data {
struct mutex        mtx;
wait_queue_head_t   read_wait;
bool                input_handler;
+   bool                is_apm_owner;
 };
 
 
@@ -123,7 +124,7 @@ static struct {
 } rfkill_global_states[NUM_RFKILL_TYPES];
 
 static bool rfkill_epo_lock_active;
-
+static bool rfkill_apm_owned;
 
 #ifdef CONFIG_RFKILL_LEDS
 static struct led_trigger rfkill_apm_led_trigger;
@@ -350,7 +351,8 @@ static void rfkill_update_global_state(enum rfkill_type type, bool blocked)
 
for (i = 0; i < NUM_RFKILL_TYPES; i++)
rfkill_global_states[i].cur = blocked;
-   rfkill_apm_led_trigger_event(blocked);
+   if (!rfkill_apm_owned)
+   rfkill_apm_led_trigger_event(blocked);
 }
 
 #ifdef CONFIG_RFKILL_INPUT
@@ -1183,6 +1185,7 @@ static ssize_t rfkill_fop_read(struct file *file, char __user *buf,
 static ssize_t rfkill_fop_write(struct file *file, const char __user *buf,
size_t count, loff_t *pos)
 {
+   struct rfkill_data *data = file->private_data;
struct rfkill *rfkill;
struct rfkill_event ev;
 
@@ -1199,7 +1202,7 @@ static ssize_t rfkill_fop_write(struct file *file, const char __user *buf,
if (copy_from_user(&ev, buf, count))
return -EFAULT;
 
-   if (ev.op != RFKILL_OP_CHANGE && ev.op != RFKILL_OP_CHANGE_ALL)
+   if (ev.op < RFKILL_OP_CHANGE)
return -EINVAL;
 
if (ev.type >= NUM_RFKILL_TYPES)
@@ -1207,6 +1210,34 @@ static ssize_t rfkill_fop_write(struct file *file, const char __user *buf,
 
mutex_lock(&rfkill_global_mutex);
 
+   if (ev.op == RFKILL_OP_AIRPLANE_MODE_ACQUIRE) {
+   if (rfkill_apm_owned && !data->is_apm_owner) {
+   count = -EACCES;
+   } else {
+   rfkill_apm_owned = true;
+   data->is_apm_owner = true;
+   }
+   }
+
+   if (ev.op == RFKILL_OP_AIRPLANE_MODE_RELEASE) {
+   if (rfkill_apm_owned && !data->is_apm_owner) {
+   count = -EACCES;
+   } else {
+   bool state

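The ownership rules described in the commit message can be summarized with a small state model. This is a hypothetical sketch, not rfkill code. apm_acquire(), apm_release(), apm_change() and apm_close() are illustrative names. The sketch assumes that a CHANGE from a non-owner returns -EACCES, because the quoted hunk is cut off before that path.

```c
#include <stdbool.h>

#define MODEL_EACCES 13

static bool apm_owned;		/* mirrors rfkill_apm_owned */
static bool apm_state;		/* indicator value chosen by the owner */

struct model_fd {
	bool is_apm_owner;	/* mirrors rfkill_data.is_apm_owner */
};

static int apm_acquire(struct model_fd *fd)
{
	if (apm_owned && !fd->is_apm_owner)
		return -MODEL_EACCES;	/* someone else holds the control */
	apm_owned = fd->is_apm_owner = true;
	return 0;
}

static int apm_release(struct model_fd *fd)
{
	if (apm_owned && !fd->is_apm_owner)
		return -MODEL_EACCES;
	apm_owned = fd->is_apm_owner = false;	/* back to default policy */
	return 0;
}

static int apm_change(struct model_fd *fd, bool soft)
{
	if (!fd->is_apm_owner)
		return -MODEL_EACCES;	/* assumed error for non-owners */
	apm_state = soft;
	return 0;
}

/* Closing the fd acts as an implicit release by the owner. */
static void apm_close(struct model_fd *fd)
{
	if (fd->is_apm_owner)
		apm_release(fd);
}
```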
[PATCH 1/9] rfkill: Improve documentation language

2016-02-08 Thread João Paulo Rechi Vita
Signed-off-by: João Paulo Rechi Vita 
---
 net/rfkill/core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index a805831..ffbc375 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -282,8 +282,8 @@ static void rfkill_set_block(struct rfkill *rfkill, bool blocked)
spin_lock_irqsave(&rfkill->lock, flags);
if (err) {
/*
-* Failed -- reset status to _prev, this may be different
-* from what set set _PREV to earlier in this function
+* Failed -- reset status to _PREV, which may be different
+* from what we have set _PREV to earlier in this function
 * if rfkill_set_sw_state was invoked.
 */
if (rfkill->state & RFKILL_BLOCK_SW_PREV)
-- 
2.5.0



[PATCH 2/9] rfkill: Remove extra blank line

2016-02-08 Thread João Paulo Rechi Vita
Signed-off-by: João Paulo Rechi Vita 
---
 net/rfkill/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index ffbc375..56d79cb 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -455,7 +455,6 @@ bool rfkill_get_global_sw_state(const enum rfkill_type type)
 }
 #endif
 
-
 bool rfkill_set_hw_state(struct rfkill *rfkill, bool blocked)
 {
unsigned long flags;
-- 
2.5.0



[PATCH 6/9] rfkill: Add documentation about LED triggers

2016-02-08 Thread João Paulo Rechi Vita
Signed-off-by: João Paulo Rechi Vita 
---
 Documentation/rfkill.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt
index 2ee6ef9..1f0c270 100644
--- a/Documentation/rfkill.txt
+++ b/Documentation/rfkill.txt
@@ -83,6 +83,8 @@ rfkill drivers that control devices that can be hard-blocked unless they also
 assign the poll_hw_block() callback (then the rfkill core will poll the
 device). Don't do this unless you cannot get the event in any other way.
 
+RFKill provides per-switch LED triggers, which can be used to drive LEDs
+according to the switch state (LED_FULL when blocked, LED_OFF otherwise).
 
 
 5. Userspace support
-- 
2.5.0



[PATCH] af_unix: Don't set err in unix_stream_read_generic unless there was an error

2016-02-08 Thread Rainer Weikusat
The present unix_stream_read_generic contains various code sequences of
the form

err = -EDISASTER;
if ()
goto out;

This has the unfortunate side effect of possibly causing the error code
to bleed through to the final

out:
return copied ? : err;

and then to be wrongly returned if no data was copied because the caller
didn't supply a data buffer, as demonstrated by the program available at

http://pad.lv/1540731

Change it such that err is only set if an error condition was detected.

Fixes: 3822b5c2fc62
Signed-off-by: Rainer Weikusat 
---
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 49d5093..c1e4dd7 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -2277,13 +2277,15 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state)
size_t size = state->size;
unsigned int last_len;
 
-   err = -EINVAL;
-   if (sk->sk_state != TCP_ESTABLISHED)
+   if (unlikely(sk->sk_state != TCP_ESTABLISHED)) {
+   err = -EINVAL;
goto out;
+   }
 
-   err = -EOPNOTSUPP;
-   if (flags & MSG_OOB)
+   if (unlikely(flags & MSG_OOB)) {
+   err = -EOPNOTSUPP;
goto out;
+   }
 
target = sock_rcvlowat(sk, flags & MSG_WAITALL, size);
timeo = sock_rcvtimeo(sk, noblock);
@@ -2329,9 +2331,11 @@ again:
goto unlock;
 
unix_state_unlock(sk);
-   err = -EAGAIN;
-   if (!timeo)
+   if (!timeo) {
+   err = -EAGAIN;
break;
+   }
+
mutex_unlock(&u->readlock);
 
timeo = unix_stream_data_wait(sk, timeo, last,

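The bleed-through can be reproduced with a minimal model of the read path. read_buggy() and read_fixed() are hypothetical functions that mimic the pattern described above. The kernel's GNU "copied ? : err" is written out as "copied ? copied : err" for portability. A zero-length read with data available returns 0 only in the fixed variant.

```c
#include <errno.h>
#include <stddef.h>

/* Buggy pattern: err is pre-set before the check and never cleared, so a
 * read that copies nothing (e.g. size == 0) returns a stale error. */
static int read_buggy(size_t avail, size_t size, int nonblock)
{
	int copied = 0, err = 0;

	err = -EAGAIN;
	if (avail == 0 && nonblock)
		goto out;
	copied = (int)(avail < size ? avail : size);
out:
	return copied ? copied : err;
}

/* Fixed pattern: err is assigned only on the error path itself. */
static int read_fixed(size_t avail, size_t size, int nonblock)
{
	int copied = 0, err = 0;

	if (avail == 0 && nonblock) {
		err = -EAGAIN;
		goto out;
	}
	copied = (int)(avail < size ? avail : size);
out:
	return copied ? copied : err;
}
```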

[PATCH 0/9] RFKill airplane-mode indicator

2016-02-08 Thread João Paulo Rechi Vita
This series implements an airplane-mode indicator LED trigger, which can be
used by platform drivers. The default policy is to have airplane-mode set when
all the radios known by RFKill are OFF, and unset otherwise. This policy can be
overridden by userspace using the new operations _AIRPLANE_MODE_ACQUIRE,
_AIRPLANE_MODE_RELEASE, and _AIRPLANE_MODE_CHANGE. When the airplane-mode
indicator state changes, userspace gets notifications through the RFKill
control misc device (/dev/rfkill).

The series also contains a few general fixes and improvements to the subsystem.

João Paulo Rechi Vita (9):
  rfkill: Improve documentation language
  rfkill: Remove extra blank line
  rfkill: Point to the correct deprecated doc location
  rfkill: Move "state" sysfs file back to stable
  rfkill: Factor rfkill_global_states[].cur assignments
  rfkill: Add documentation about LED triggers
  rfkill: Create "rfkill-airplane_mode" LED trigger
  rfkill: Userspace control for airplane mode
  rfkill: Notify userspace of airplane-mode state changes

 Documentation/ABI/obsolete/sysfs-class-rfkill |  20 
 Documentation/ABI/stable/sysfs-class-rfkill   |  27 -
 Documentation/rfkill.txt  |  17 +++
 include/uapi/linux/rfkill.h   |   3 +
 net/rfkill/core.c | 144 +-
 5 files changed, 164 insertions(+), 47 deletions(-)
 delete mode 100644 Documentation/ABI/obsolete/sysfs-class-rfkill

-- 
2.5.0
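The policy stated in this cover letter can be expressed as a pure function: the indicator is on only when every radio is blocked, unless userspace has taken control. This is a simplified model of the stated policy, not the series' implementation, which drives the trigger from rfkill_update_global_state(). apm_default() and apm_effective() are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Default policy: on only when every radio is blocked (vacuously on for
 * an empty list in this model). */
static bool apm_default(const bool *blocked, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (!blocked[i])
			return false;
	return true;
}

/* With a userspace owner, the owner's value overrides the default. */
static bool apm_effective(const bool *blocked, size_t n,
			  bool owned, bool owner_value)
{
	return owned ? owner_value : apm_default(blocked, n);
}
```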



[PATCH 5/9] rfkill: Factor rfkill_global_states[].cur assignments

2016-02-08 Thread João Paulo Rechi Vita
Factor all assignments to rfkill_global_states[].cur into a single
function rfkill_update_global_state().

Signed-off-by: João Paulo Rechi Vita 
---
 net/rfkill/core.c | 38 +-
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 56d79cb..8b96869 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -302,6 +302,19 @@ static void rfkill_set_block(struct rfkill *rfkill, bool blocked)
rfkill_event(rfkill);
 }
 
+static void rfkill_update_global_state(enum rfkill_type type, bool blocked)
+{
+   int i;
+
+   if (type != RFKILL_TYPE_ALL) {
+   rfkill_global_states[type].cur = blocked;
+   return;
+   }
+
+   for (i = 0; i < NUM_RFKILL_TYPES; i++)
+   rfkill_global_states[i].cur = blocked;
+}
+
 #ifdef CONFIG_RFKILL_INPUT
 static atomic_t rfkill_input_disabled = ATOMIC_INIT(0);
 
@@ -319,15 +332,7 @@ static void __rfkill_switch_all(const enum rfkill_type type, bool blocked)
 {
struct rfkill *rfkill;
 
-   if (type == RFKILL_TYPE_ALL) {
-   int i;
-
-   for (i = 0; i < NUM_RFKILL_TYPES; i++)
-   rfkill_global_states[i].cur = blocked;
-   } else {
-   rfkill_global_states[type].cur = blocked;
-   }
-
+   rfkill_update_global_state(type, blocked);
list_for_each_entry(rfkill, &rfkill_list, node) {
if (rfkill->type != type && type != RFKILL_TYPE_ALL)
continue;
@@ -1164,15 +1169,8 @@ static ssize_t rfkill_fop_write(struct file *file, const char __user *buf,
 
mutex_lock(&rfkill_global_mutex);
 
-   if (ev.op == RFKILL_OP_CHANGE_ALL) {
-   if (ev.type == RFKILL_TYPE_ALL) {
-   enum rfkill_type i;
-   for (i = 0; i < NUM_RFKILL_TYPES; i++)
-   rfkill_global_states[i].cur = ev.soft;
-   } else {
-   rfkill_global_states[ev.type].cur = ev.soft;
-   }
-   }
+   if (ev.op == RFKILL_OP_CHANGE_ALL)
+   rfkill_update_global_state(ev.type, ev.soft);
 
list_for_each_entry(rfkill, &rfkill_list, node) {
if (rfkill->idx != ev.idx && ev.op != RFKILL_OP_CHANGE_ALL)
@@ -1261,10 +1259,8 @@ static struct miscdevice rfkill_miscdev = {
 static int __init rfkill_init(void)
 {
int error;
-   int i;
 
-   for (i = 0; i < NUM_RFKILL_TYPES; i++)
-   rfkill_global_states[i].cur = !rfkill_default_state;
+   rfkill_update_global_state(RFKILL_TYPE_ALL, !rfkill_default_state);
 
error = class_register(&rfkill_class);
if (error)
-- 
2.5.0
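To check that the new helper is a drop-in replacement for the three open-coded loops, it can be exercised outside the kernel. This sketch copies the helper's logic from the patch into a userspace file; the `enum rfkill_type` here is a reduced stand-in (the real one has more radio types), and only the `RFKILL_TYPE_ALL` sentinel behaviour matters.

```c
#include <stdbool.h>

/* Reduced stand-in for the kernel enum; RFKILL_TYPE_ALL is index 0. */
enum rfkill_type {
	RFKILL_TYPE_ALL = 0,
	RFKILL_TYPE_WLAN,
	RFKILL_TYPE_BLUETOOTH,
	NUM_RFKILL_TYPES,
};

static struct {
	bool cur;
} rfkill_global_states[NUM_RFKILL_TYPES];

/* Same logic as the patch: one type, or every type for RFKILL_TYPE_ALL. */
static void rfkill_update_global_state(enum rfkill_type type, bool blocked)
{
	int i;

	if (type != RFKILL_TYPE_ALL) {
		rfkill_global_states[type].cur = blocked;
		return;
	}

	for (i = 0; i < NUM_RFKILL_TYPES; i++)
		rfkill_global_states[i].cur = blocked;
}
```

Factoring the assignment into one function is also what lets patch 7/9 hook the airplane-mode LED trigger in a single place.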



Re: [PATCH net] ipv6: fix a lockdep splat

2016-02-08 Thread David Miller
From: Eric Dumazet 
Date: Tue, 02 Feb 2016 17:55:01 -0800

> From: Eric Dumazet 
> 
> Silence lockdep false positive about rcu_dereference() being
> used in the wrong context.
> 
> First one should use rcu_dereference_protected() as we own the spinlock.
> 
> Second one should be a normal assignment, as no barrier is needed.
> 
> Fixes: 18367681a10bd ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
> Reported-by: Dave Jones 
> Signed-off-by: Eric Dumazet 

Applied and queued up for -stable, thanks.


Re: [PATCH] net: am79c961a: avoid %? in inline assembly

2016-02-08 Thread Russell King - ARM Linux
On Mon, Feb 08, 2016 at 03:33:42PM +0100, Arnd Bergmann wrote:
> The am79c961a.c driver fails to build with clang because of an
> unusual inline assembly construct:
> 
> drivers/net/ethernet/amd/am79c961a.c:53:7: error: invalid % escape in inline assembly string
>  "str%?h%1, [%2]@ NET_RAP\n\t"
> 
> The same change has been done a decade ago in arch/arm as of
> 6a39dd6222dd ("[ARM] 3759/2: Remove uses of %?"), but apparently
> some drivers were missed.
> 
> Signed-off-by: Arnd Bergmann 

Acked-by: Russell King 

Thanks Arnd.

-- 
RMK's Patch system: http://www.arm.linux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.


Re: [net] bonding: trivial: style fixes

2016-02-08 Thread David Miller
From: Zhang Shengju 
Date: Wed,  3 Feb 2016 02:02:32 +

> remove some redundant brackets, use sizeof(*) instead of sizeof(struct x).
> 
> Signed-off-by: Zhang Shengju 

Applied, thanks.
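The `sizeof(*p)` style mentioned in that patch can be illustrated with a minimal sketch; `struct slave_info` and `alloc_info()` are hypothetical names, not taken from the bonding driver.

```c
#include <stdlib.h>

/* Hypothetical type, only for illustration. */
struct slave_info {
	int id;
	char name[16];
};

/* sizeof(*p) follows the pointer's declared type, so the allocation stays
 * correct if p is later changed to point at a different struct; an explicit
 * sizeof(struct slave_info) would silently go stale. */
static struct slave_info *alloc_info(void)
{
	struct slave_info *p = calloc(1, sizeof(*p));

	return p;
}
```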


[PATCH 4/9] rfkill: Move "state" sysfs file back to stable

2016-02-08 Thread João Paulo Rechi Vita
There is still quite a bit of code using this interface, so we can't
just remove it. Hopefully it will be possible in the future, but since
its scheduled removal date passed 2 years ago, we are better off having
the documentation reflect the current state of things.

Signed-off-by: João Paulo Rechi Vita 
---
 Documentation/ABI/obsolete/sysfs-class-rfkill | 20 
 Documentation/ABI/stable/sysfs-class-rfkill   | 25 ++---
 2 files changed, 22 insertions(+), 23 deletions(-)
 delete mode 100644 Documentation/ABI/obsolete/sysfs-class-rfkill

diff --git a/Documentation/ABI/obsolete/sysfs-class-rfkill b/Documentation/ABI/obsolete/sysfs-class-rfkill
deleted file mode 100644
index e736d14..000
--- a/Documentation/ABI/obsolete/sysfs-class-rfkill
+++ /dev/null
@@ -1,20 +0,0 @@
-rfkill - radio frequency (RF) connector kill switch support
-
-For details to this subsystem look at Documentation/rfkill.txt.
-
-What:  /sys/class/rfkill/rfkill[0-9]+/state
-Date:  09-Jul-2007
-KernelVersion  v2.6.22
-Contact:   linux-wirel...@vger.kernel.org
-Description:   Current state of the transmitter.
-   This file is deprecated and scheduled to be removed in 2014,
-   because its not possible to express the 'soft and hard block'
-   state of the rfkill driver.
-Values:A numeric value.
-   0: RFKILL_STATE_SOFT_BLOCKED
-   transmitter is turned off by software
-   1: RFKILL_STATE_UNBLOCKED
-   transmitter is (potentially) active
-   2: RFKILL_STATE_HARD_BLOCKED
-   transmitter is forced off by something outside of
-   the driver's control.
diff --git a/Documentation/ABI/stable/sysfs-class-rfkill b/Documentation/ABI/stable/sysfs-class-rfkill
index e51571e..e1ba4a1 100644
--- a/Documentation/ABI/stable/sysfs-class-rfkill
+++ b/Documentation/ABI/stable/sysfs-class-rfkill
@@ -5,9 +5,6 @@ For details to this subsystem look at Documentation/rfkill.txt.
 For the deprecated /sys/class/rfkill/*/claim knobs of this interface look in
 Documentation/ABI/removed/sysfs-class-rfkill.
 
-For the deprecated /sys/class/rfkill/*/state knobs of this interface look in
-Documentation/ABI/obsolete/sysfs-class-rfkill.
-
 What:  /sys/class/rfkill
 Date:  09-Jul-2007
 KernelVersion: v2.6.22
@@ -44,6 +41,28 @@ Values:  A numeric value.
1: true
 
 
+What:  /sys/class/rfkill/rfkill[0-9]+/state
+Date:  09-Jul-2007
+KernelVersion  v2.6.22
+Contact:   linux-wirel...@vger.kernel.org
+Description:   Current state of the transmitter.
+   This file was scheduled to be removed in 2014, but due to its
+   large number of users it will be sticking around for a bit
+   longer. Despite it being marked as stable, the newer "hard" and
+   "soft" interfaces should be preferred, since it is not possible
+   to express the 'soft and hard block' state of the rfkill driver
+   through this interface. There will likely be another attempt to
+   remove it in the future.
+Values:A numeric value.
+   0: RFKILL_STATE_SOFT_BLOCKED
+   transmitter is turned off by software
+   1: RFKILL_STATE_UNBLOCKED
+   transmitter is (potentially) active
+   2: RFKILL_STATE_HARD_BLOCKED
+   transmitter is forced off by something outside of
+   the driver's control.
+
+
 What:  /sys/class/rfkill/rfkill[0-9]+/hard
 Date:  12-March-2010
 KernelVersion  v2.6.34
-- 
2.5.0
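The reason the "state" file cannot express both block kinds is easiest to see by collapsing the two bits into its single value. The sketch below is a hypothetical userspace model: the constants mirror the values listed in the ABI text above, and the rule that a hard block takes precedence over a soft block is an assumption about how the kernel derives the value.

```c
#include <stdbool.h>

/* Local copies of the values documented for /sys/class/rfkill/rfkillN/state. */
enum legacy_rfkill_state {
	RFKILL_STATE_SOFT_BLOCKED = 0,
	RFKILL_STATE_UNBLOCKED = 1,
	RFKILL_STATE_HARD_BLOCKED = 2,
};

/* Collapse the independent soft/hard block bits into the single legacy
 * value (assumed precedence: hard over soft). Once a hard block is set,
 * the value no longer says whether a soft block is also present. */
static enum legacy_rfkill_state legacy_state(bool soft, bool hard)
{
	if (hard)
		return RFKILL_STATE_HARD_BLOCKED;
	if (soft)
		return RFKILL_STATE_SOFT_BLOCKED;
	return RFKILL_STATE_UNBLOCKED;
}
```

The "soft" and "hard" files avoid this loss by exposing each bit separately.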



[PATCH 3/9] rfkill: Point to the correct deprecated doc location

2016-02-08 Thread João Paulo Rechi Vita
The "claim" sysfs interface has been removed, so its documentation now
lives in the "removed" folder.

Signed-off-by: João Paulo Rechi Vita 
---
 Documentation/ABI/stable/sysfs-class-rfkill | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Documentation/ABI/stable/sysfs-class-rfkill b/Documentation/ABI/stable/sysfs-class-rfkill
index 097f522..e51571e 100644
--- a/Documentation/ABI/stable/sysfs-class-rfkill
+++ b/Documentation/ABI/stable/sysfs-class-rfkill
@@ -2,8 +2,10 @@ rfkill - radio frequency (RF) connector kill switch support
 
 For details to this subsystem look at Documentation/rfkill.txt.
 
-For the deprecated /sys/class/rfkill/*/state and
-/sys/class/rfkill/*/claim knobs of this interface look in
+For the deprecated /sys/class/rfkill/*/claim knobs of this interface look in
+Documentation/ABI/removed/sysfs-class-rfkill.
+
+For the deprecated /sys/class/rfkill/*/state knobs of this interface look in
 Documentation/ABI/obsolete/sysfs-class-rfkill.
 
 What:  /sys/class/rfkill
-- 
2.5.0



[PATCH 9/9] rfkill: Notify userspace of airplane-mode state changes

2016-02-08 Thread João Paulo Rechi Vita
Signed-off-by: João Paulo Rechi Vita 
---
 Documentation/rfkill.txt |  3 +++
 net/rfkill/core.c| 13 +
 2 files changed, 16 insertions(+)

diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt
index aa6e014..5248812 100644
--- a/Documentation/rfkill.txt
+++ b/Documentation/rfkill.txt
@@ -133,5 +133,8 @@ it available for other applications to take control. Changes to the
 airplane-mode indicator state can be made using RFKILL_OP_AIRPLANE_MODE_CHANGE,
 passing the new value in the 'soft' field of 'struct rfkill_event'.
 
+This same API is also used to provide userspace with notifications of changes
+to airplane-mode indicator state.
+
 
 For further details consult Documentation/ABI/stable/sysfs-class-rfkill.
diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 8067701..abbb8f7 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -131,7 +131,20 @@ static struct led_trigger rfkill_apm_led_trigger;
 
 static void rfkill_apm_led_trigger_event(bool state)
 {
+   struct rfkill_data *data;
+   struct rfkill_int_event *ev;
+
led_trigger_event(&rfkill_apm_led_trigger, state ? LED_FULL : LED_OFF);
+
+   list_for_each_entry(data, &rfkill_fds, list) {
+   ev = kzalloc(sizeof(*ev), GFP_KERNEL);
+   if (!ev)
+   continue;
+   ev->ev.op = RFKILL_OP_AIRPLANE_MODE_CHANGE;
+   ev->ev.soft = state;
+   list_add_tail(&ev->list, &data->events);
+   wake_up_interruptible(&data->read_wait);
+   }
 }
 
 static void rfkill_apm_led_trigger_activate(struct led_classdev *led)
-- 
2.5.0
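On the userspace side, these notifications arrive as ordinary `struct rfkill_event` records on /dev/rfkill. The sketch below is a hypothetical consumer: the struct layout is a local mirror of the one in `<linux/rfkill.h>`, and the `OP_*` numbering assumes the enum order proposed in patch 8/9 (ACQUIRE = 4, RELEASE = 5, CHANGE = 6), which is not in mainline headers. Because the kernel side zero-fills the event (`kzalloc`) and only sets `op` and `soft`, a consumer should filter on `op` rather than on `idx` or `type`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct rfkill_event (8 bytes, naturally aligned). */
struct rfkill_event_v1 {
	uint32_t idx;
	uint8_t  type;
	uint8_t  op;
	uint8_t  soft;
	uint8_t  hard;
};

/* Assumed numbering if the enum from patch 8/9 were applied. */
enum {
	OP_ADD = 0, OP_DEL, OP_CHANGE, OP_CHANGE_ALL,
	OP_AIRPLANE_MODE_ACQUIRE, OP_AIRPLANE_MODE_RELEASE,
	OP_AIRPLANE_MODE_CHANGE,
};

/* buf holds consecutive events, e.g. collected from successive read()
 * calls on /dev/rfkill. Returns the last airplane-mode indicator state
 * seen (0 or 1), or -1 if no such notification is present. */
static int last_airplane_state(const unsigned char *buf, size_t len)
{
	struct rfkill_event_v1 ev;
	int state = -1;
	size_t off;

	for (off = 0; off + sizeof(ev) <= len; off += sizeof(ev)) {
		memcpy(&ev, buf + off, sizeof(ev)); /* avoid unaligned access */
		if (ev.op == OP_AIRPLANE_MODE_CHANGE)
			state = ev.soft;
	}
	return state;
}
```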



[PATCH 7/9] rfkill: Create "rfkill-airplane_mode" LED trigger

2016-02-08 Thread João Paulo Rechi Vita
This creates a new LED trigger to be used by platform drivers as a
default trigger for airplane-mode indicator LEDs.

By default this trigger will fire when RFKILL_OP_CHANGE_ALL is called
for all types (RFKILL_TYPE_ALL), setting the LED brightness to LED_FULL
when changing the state to blocked, and to LED_OFF when changing
the state to unblocked. In the future there will be a mechanism for
userspace to override the default policy, so it can implement its own.

This trigger will be used by the asus-wireless x86 platform driver.

Signed-off-by: João Paulo Rechi Vita 
---
 Documentation/rfkill.txt |  2 ++
 net/rfkill/core.c| 49 +++-
 2 files changed, 50 insertions(+), 1 deletion(-)

diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt
index 1f0c270..b13025a 100644
--- a/Documentation/rfkill.txt
+++ b/Documentation/rfkill.txt
@@ -85,6 +85,8 @@ device). Don't do this unless you cannot get the event in any other way.
 
 RFKill provides per-switch LED triggers, which can be used to drive LEDs
 according to the switch state (LED_FULL when blocked, LED_OFF otherwise).
+An airplane-mode indicator LED trigger is also available, which triggers
+LED_FULL when all radios known by RFKill are blocked, and LED_OFF otherwise.
 
 
 5. Userspace support
diff --git a/net/rfkill/core.c b/net/rfkill/core.c
index 8b96869..fb11547 100644
--- a/net/rfkill/core.c
+++ b/net/rfkill/core.c
@@ -126,6 +126,30 @@ static bool rfkill_epo_lock_active;
 
 
 #ifdef CONFIG_RFKILL_LEDS
+static struct led_trigger rfkill_apm_led_trigger;
+
+static void rfkill_apm_led_trigger_event(bool state)
+{
+   led_trigger_event(&rfkill_apm_led_trigger, state ? LED_FULL : LED_OFF);
+}
+
+static void rfkill_apm_led_trigger_activate(struct led_classdev *led)
+{
+   rfkill_apm_led_trigger_event(!rfkill_default_state);
+}
+
+static int rfkill_apm_led_trigger_register(void)
+{
+   rfkill_apm_led_trigger.name = "rfkill-airplane_mode";
+   rfkill_apm_led_trigger.activate = rfkill_apm_led_trigger_activate;
+   return led_trigger_register(&rfkill_apm_led_trigger);
+}
+
+static void rfkill_apm_led_trigger_unregister(void)
+{
+   led_trigger_unregister(&rfkill_apm_led_trigger);
+}
+
 static void rfkill_led_trigger_event(struct rfkill *rfkill)
 {
struct led_trigger *trigger;
@@ -177,6 +201,19 @@ static void rfkill_led_trigger_unregister(struct rfkill *rfkill)
led_trigger_unregister(&rfkill->led_trigger);
 }
 #else
+static void rfkill_apm_led_trigger_event(bool state)
+{
+}
+
+static int rfkill_apm_led_trigger_register(void)
+{
+   return 0;
+}
+
+static void rfkill_apm_led_trigger_unregister(void)
+{
+}
+
 static void rfkill_led_trigger_event(struct rfkill *rfkill)
 {
 }
@@ -313,6 +350,7 @@ static void rfkill_update_global_state(enum rfkill_type type, bool blocked)
 
for (i = 0; i < NUM_RFKILL_TYPES; i++)
rfkill_global_states[i].cur = blocked;
+   rfkill_apm_led_trigger_event(blocked);
 }
 
 #ifdef CONFIG_RFKILL_INPUT
@@ -1260,15 +1298,22 @@ static int __init rfkill_init(void)
 {
int error;
 
+   error = rfkill_apm_led_trigger_register();
+   if (error)
+   goto out;
+
rfkill_update_global_state(RFKILL_TYPE_ALL, !rfkill_default_state);
 
error = class_register(&rfkill_class);
-   if (error)
+   if (error) {
+   rfkill_apm_led_trigger_unregister();
goto out;
+   }
 
error = misc_register(&rfkill_miscdev);
if (error) {
class_unregister(&rfkill_class);
+   rfkill_apm_led_trigger_unregister();
goto out;
}
 
@@ -1277,6 +1322,7 @@ static int __init rfkill_init(void)
if (error) {
misc_deregister(&rfkill_miscdev);
class_unregister(&rfkill_class);
+   rfkill_apm_led_trigger_unregister();
goto out;
}
 #endif
@@ -1293,5 +1339,6 @@ static void __exit rfkill_exit(void)
 #endif
misc_deregister(&rfkill_miscdev);
class_unregister(&rfkill_class);
+   rfkill_apm_led_trigger_unregister();
 }
 module_exit(rfkill_exit);
-- 
2.5.0
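The error paths added to rfkill_init() repeat the same teardown calls in each branch. A common alternative is a goto ladder that undoes the steps in reverse order, writing each cleanup once. The sketch below is a hypothetical userspace model of that pattern: `step()`, `undo()` and `model_init()` stand in for the real registration calls (named in the comments), and `log_buf` records which teardowns ran.

```c
#include <string.h>

static int fail_at = -1;  /* index of the step that should fail, or -1 */
static char log_buf[64];  /* records teardown order, e.g. "CT" */

static int step(int n) { return n == fail_at ? -1 : 0; }
static void undo(char tag) { strncat(log_buf, &tag, 1); }

static int model_init(void)
{
	int error;

	error = step(0);        /* rfkill_apm_led_trigger_register() */
	if (error)
		goto out;
	error = step(1);        /* class_register() */
	if (error)
		goto err_trigger;
	error = step(2);        /* misc_register() */
	if (error)
		goto err_class;
	error = step(3);        /* input-handler setup under CONFIG_RFKILL_INPUT */
	if (error)
		goto err_misc;
	return 0;

err_misc:
	undo('M');              /* misc_deregister() */
err_class:
	undo('C');              /* class_unregister() */
err_trigger:
	undo('T');              /* rfkill_apm_led_trigger_unregister() */
out:
	return error;
}
```

The teardown sequence matches the patch (for example, a misc_register() failure undoes the class and then the trigger), but adding a future step only requires one new label instead of touching every later error branch.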



Re: [PATCH 8/9] rfkill: Userspace control for airplane mode

2016-02-08 Thread Dan Williams
On Mon, 2016-02-08 at 10:41 -0500, João Paulo Rechi Vita wrote:
> Provide an interface for the airplane-mode indicator be controlled from
> userspace. User has to first acquire the control through
> RFKILL_OP_AIRPLANE_MODE_ACQUIRE and keep the fd open for the whole time
> it wants to be in control of the indicator. Closing the fd or using
> RFKILL_OP_AIRPLANE_MODE_RELEASE restores the default policy.
> 
> To change state of the indicator, the RFKILL_OP_AIRPLANE_MODE_CHANGE
> operation is used, passing the value on "struct rfkill_event.soft". If
> the caller has not acquired the airplane-mode control beforehand, the
> operation fails.

I'd like to clarify a bit, so tell me if I'm correct or not.  Using
RFKILL_OP_AIRPLANE_MODE_CHANGE does not actually change any device
state. It's just an indicator with no relationship to any of the
registered rfkill switches, right?

I wonder if setting RFKILL_OP_AIRPLANE_MODE_CHANGE(true) shouldn't also
softblock all switches, otherwise you can set airplane mode all day
long with RFKILL_OP_AIRPLANE_MODE_CHANGE and it doesn't actually enable
airplane mode at all?

Dan

> Signed-off-by: João Paulo Rechi Vita 
> ---
>  Documentation/rfkill.txt| 10 ++
>  include/uapi/linux/rfkill.h |  3 +++
>  net/rfkill/core.c   | 47 ++---
>  3 files changed, 57 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/rfkill.txt b/Documentation/rfkill.txt
> index b13025a..aa6e014 100644
> --- a/Documentation/rfkill.txt
> +++ b/Documentation/rfkill.txt
> @@ -87,6 +87,7 @@ RFKill provides per-switch LED triggers, which can be used to drive LEDs
>  according to the switch state (LED_FULL when blocked, LED_OFF otherwise).
>  An airplane-mode indicator LED trigger is also available, which triggers
>  LED_FULL when all radios known by RFKill are blocked, and LED_OFF otherwise.
> +The airplane-mode indicator LED trigger policy can be overridden by userspace.
>  
>  
>  5. Userspace support
> @@ -123,5 +124,14 @@ RFKILL_TYPE
>  The contents of these variables corresponds to the "name", "state" and
>  "type" sysfs files explained above.
>  
> +Userspace can also override the default airplane-mode indicator policy through
> +/dev/rfkill. Control of the airplane mode indicator has to be acquired first,
> +using RFKILL_OP_AIRPLANE_MODE_ACQUIRE, and is only available for one userspace
> +application at a time. Closing the fd or using RFKILL_OP_AIRPLANE_MODE_RELEASE
> +reverts the airplane-mode indicator back to the default kernel policy and makes
> +it available for other applications to take control. Changes to the
> +airplane-mode indicator state can be made using RFKILL_OP_AIRPLANE_MODE_CHANGE,
> +passing the new value in the 'soft' field of 'struct rfkill_event'.
> +
>  
>  For further details consult Documentation/ABI/stable/sysfs-class-rfkill.
> diff --git a/include/uapi/linux/rfkill.h b/include/uapi/linux/rfkill.h
> index 2e00dce..9cb999b 100644
> --- a/include/uapi/linux/rfkill.h
> +++ b/include/uapi/linux/rfkill.h
> @@ -67,6 +67,9 @@ enum rfkill_operation {
>   RFKILL_OP_DEL,
>   RFKILL_OP_CHANGE,
>   RFKILL_OP_CHANGE_ALL,
> + RFKILL_OP_AIRPLANE_MODE_ACQUIRE,
> + RFKILL_OP_AIRPLANE_MODE_RELEASE,
> + RFKILL_OP_AIRPLANE_MODE_CHANGE,
>  };
>  
>  /**
> diff --git a/net/rfkill/core.c b/net/rfkill/core.c
> index fb11547..8067701 100644
> --- a/net/rfkill/core.c
> +++ b/net/rfkill/core.c
> @@ -89,6 +89,7 @@ struct rfkill_data {
>   struct mutexmtx;
>   wait_queue_head_t   read_wait;
>   boolinput_handler;
> + boolis_apm_owner;
>  };
>  
>  
> @@ -123,7 +124,7 @@ static struct {
>  } rfkill_global_states[NUM_RFKILL_TYPES];
>  
>  static bool rfkill_epo_lock_active;
> -
> +static bool rfkill_apm_owned;
>  
>  #ifdef CONFIG_RFKILL_LEDS
>  static struct led_trigger rfkill_apm_led_trigger;
> @@ -350,7 +351,8 @@ static void rfkill_update_global_state(enum rfkill_type type, bool blocked)
>  
>   for (i = 0; i < NUM_RFKILL_TYPES; i++)
>   rfkill_global_states[i].cur = blocked;
> - rfkill_apm_led_trigger_event(blocked);
> + if (!rfkill_apm_owned)
> + rfkill_apm_led_trigger_event(blocked);
>  }
>  
>  #ifdef CONFIG_RFKILL_INPUT
> @@ -1183,6 +1185,7 @@ static ssize_t rfkill_fop_read(struct file *file, char __user *buf,
>  static ssize_t rfkill_fop_write(struct file *file, const char __user *buf,
>   size_t count, loff_t *pos)
>  {
> + struct rfkill_data *data = file->private_data;
>   struct rfkill *rfkill;
>   struct rfkill_event ev;
>  
> @@ -1199,7 +1202,7 @@ static ssize_t rfkill_fop_write(struct file *file, const char __user *buf,
>   if (copy_from_user(&ev, buf, count))
>   return -EFAULT;
>  
> - if (ev.op != RFKILL_OP_CHANGE && ev.op != RFKILL_OP_CHANGE_ALL)
> + if (ev.op < RFKILL_OP_CHANGE)
>  
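Dan's question can be made concrete with a small model of the ACQUIRE/CHANGE/RELEASE semantics described in the commit message. This is a hypothetical userspace sketch, not the patch's code: `struct apm_ctl` and the `apm_*` functions are illustrative, an `int` stands in for the owning fd, and the specific error codes (`-EBUSY`, `-EACCES`) and the re-evaluation of the default policy on release are assumptions, since the hunk that implements them is cut off above.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of airplane-mode indicator ownership. */
struct apm_ctl {
	int owner;      /* owning fd, or -1 when the kernel policy applies */
	bool indicator; /* airplane-mode indicator (LED) state */
};

static int apm_acquire(struct apm_ctl *c, int fd)
{
	if (c->owner != -1 && c->owner != fd)
		return -EBUSY;  /* only one userspace owner at a time */
	c->owner = fd;
	return 0;
}

static int apm_change(struct apm_ctl *c, int fd, bool soft)
{
	if (c->owner != fd)
		return -EACCES; /* control must be acquired first */
	c->indicator = soft;    /* only the indicator changes; no radio is blocked */
	return 0;
}

/* RELEASE and close() both hand control back to the kernel policy. */
static void apm_release(struct apm_ctl *c, int fd, bool all_blocked)
{
	if (c->owner != fd)
		return;
	c->owner = -1;
	c->indicator = all_blocked; /* assumed: default policy re-evaluated */
}
```

As the model shows, `apm_change()` touches only the indicator, which is exactly the behaviour Dan is asking about: blocking the radios themselves would still be a separate RFKILL_OP_CHANGE_ALL.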

[PATCH iproute2] iplink: bond_slave: fix ad_actor/partner_oper_port_state output

2016-02-08 Thread Nikolay Aleksandrov
From: Nikolay Aleksandrov 

It seems that I made a mistake when I exported these: instead of a
space at the end I put a newline character, which is wrong and breaks
the single-line output.

Fixes: 7d6bc3b87abad ("bonding: export 3ad actor and partner port state")
Reported-by: Sam Tannous 
Signed-off-by: Nikolay Aleksandrov 
---
 ip/iplink_bond_slave.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/ip/iplink_bond_slave.c b/ip/iplink_bond_slave.c
index 9b569b1daa4e..2f3364ee45a5 100644
--- a/ip/iplink_bond_slave.c
+++ b/ip/iplink_bond_slave.c
@@ -80,11 +80,11 @@ static void bond_slave_print_opt(struct link_util *lu, FILE *f, struct rtattr *t
rta_getattr_u16(tb[IFLA_BOND_SLAVE_AD_AGGREGATOR_ID]));
 
if (tb[IFLA_BOND_SLAVE_AD_ACTOR_OPER_PORT_STATE])
-   fprintf(f, "ad_actor_oper_port_state %d\n",
+   fprintf(f, "ad_actor_oper_port_state %d ",
rta_getattr_u8(tb[IFLA_BOND_SLAVE_AD_ACTOR_OPER_PORT_STATE]));
 
if (tb[IFLA_BOND_SLAVE_AD_PARTNER_OPER_PORT_STATE])
-   fprintf(f, "ad_partner_oper_port_state %d\n",
+   fprintf(f, "ad_partner_oper_port_state %d ",
rta_getattr_u16(tb[IFLA_BOND_SLAVE_AD_PARTNER_OPER_PORT_STATE]));
 }
 
-- 
2.4.3
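The effect of the fix is that each "key value" pair is terminated by a space, so the whole attribute set stays on one line. A hypothetical sketch that formats into a buffer instead of a `FILE *` makes this checkable; `format_port_states()` and its `sep` parameter are illustrative only, not iproute2 code.

```c
#include <stdio.h>
#include <string.h>

/* Format the two port-state attributes the way bond_slave_print_opt()
 * does, with a configurable terminator: ' ' after the fix, '\n' before. */
static int format_port_states(char *buf, size_t len, int actor, int partner,
			      char sep)
{
	return snprintf(buf, len,
			"ad_actor_oper_port_state %d%cad_partner_oper_port_state %d%c",
			actor, sep, partner, sep);
}
```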



Re: [PATCH net-next 1/2] mpls: packet stats

2016-02-08 Thread Robert Shearman

On 06/02/16 10:58, Francois Romieu wrote:

Robert Shearman  :
[...]

diff --git a/net/mpls/Makefile b/net/mpls/Makefile
index 9ca923625016..6fdd61b9eae3 100644
--- a/net/mpls/Makefile
+++ b/net/mpls/Makefile

[...]

@@ -98,6 +94,29 @@ bool mpls_pkt_too_big(const struct sk_buff *skb, unsigned int mtu)
  }
  EXPORT_SYMBOL_GPL(mpls_pkt_too_big);

+void mpls_stats_inc_outucastpkts(struct net_device *dev,
+const struct sk_buff *skb)
+{
+   struct mpls_dev *mdev;
+   struct inet6_dev *in6dev;


Nit: the scope can be reduced for both variables.


I'm happy to change this if this is the recommended style, but David 
Laight's reply suggests some doubt.





+
+   if (skb->protocol == htons(ETH_P_MPLS_UC)) {
+   mdev = mpls_dev_get(dev);
+   if (mdev)
+   MPLS_INC_STATS_LEN(mdev, skb->len,
+  MPLS_IFSTATS_MIB_OUTUCASTPKTS,
+  MPLS_IFSTATS_MIB_OUTOCTETS);
+   } else if (skb->protocol == htons(ETH_P_IP)) {
+   IP_UPD_PO_STATS(dev_net(dev), IPSTATS_MIB_OUT, skb->len);
+   } else if (skb->protocol == htons(ETH_P_IPV6)) {
+   in6dev = __in6_dev_get(dev);
+   if (in6dev)
+   IP6_UPD_PO_STATS(dev_net(dev), in6dev,
+IPSTATS_MIB_OUT, skb->len);
+   }
+}

[...]

diff --git a/net/mpls/internal.h b/net/mpls/internal.h
index 732a5c17e986..b39770ff2307 100644
--- a/net/mpls/internal.h
+++ b/net/mpls/internal.h

[...]

+#define MPLS_INC_STATS(mdev, field)\
+   do {\
+   __typeof__(*(mdev)->stats) *ptr =\
+   raw_cpu_ptr((mdev)->stats);  \
+   local_bh_disable(); \
+   u64_stats_update_begin(&ptr->syncp); \
+   ptr->mib[field]++;   \
+   u64_stats_update_end(&ptr->syncp);   \
+   local_bh_enable();  \
+   } while (0)


I don't get the point of the local_bh_{disable / enable}.

Which kind of locally enabled bh code section do you anticipate these
helpers to run under ?


When a user process sends an IPv4/IPv6 packet destined to a route with 
mpls lwt encap.


Thanks,
Rob
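Rob's answer (process context reaching these helpers via lwt encap) is where the `local_bh_disable()` matters: the per-CPU counters are protected by a `u64_stats_sync` sequence count, and that protocol assumes writers on one CPU never nest. The sketch below is a simplified single-CPU userspace model of the protocol, not the kernel implementation (the real helpers add memory barriers and compile to nothing on 64-bit builds); all names are hypothetical.

```c
#include <stdint.h>

/* Simplified model of a seqcount-protected 64-bit counter. */
struct stats_model {
	unsigned int seq; /* odd while a writer is mid-update */
	uint64_t pkts;
};

static void update_begin(struct stats_model *s) { s->seq++; }
static void update_end(struct stats_model *s)   { s->seq++; }

static unsigned int fetch_begin(const struct stats_model *s) { return s->seq; }
static int fetch_retry(const struct stats_model *s, unsigned int start)
{
	/* Retry if a write was in progress or happened meanwhile. */
	return (start & 1) || s->seq != start;
}

static void stats_inc(struct stats_model *s)
{
	update_begin(s);
	s->pkts++;
	update_end(s);
}

static uint64_t stats_read(const struct stats_model *s)
{
	unsigned int start;
	uint64_t v;

	do {
		start = fetch_begin(s);
		v = s->pkts;
	} while (fetch_retry(s, start));
	return v;
}
```

If a softirq writer interrupts a process-context writer on the same CPU, the two `update_begin()` calls bring the sequence back to an even value mid-update, so a concurrent reader would accept a torn value. Disabling bottom halves around the update prevents that nesting.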


Re: [PATCH 8/9] rfkill: Userspace control for airplane mode

2016-02-08 Thread Marcel Holtmann
Hi Joa Paulo,

> Provide an interface for the airplane-mode indicator be controlled from
> userspace. User has to first acquire the control through
> RFKILL_OP_AIRPLANE_MODE_ACQUIRE and keep the fd open for the whole time
> it wants to be in control of the indicator. Closing the fd or using
> RFKILL_OP_AIRPLANE_MODE_RELEASE restores the default policy.
> 
> To change state of the indicator, the RFKILL_OP_AIRPLANE_MODE_CHANGE
> operation is used, passing the value on "struct rfkill_event.soft". If
> the caller has not acquired the airplane-mode control beforehand, the
> operation fails.

as explained in an earlier response, the concept of Airplane Mode seems to be 
imposing policy into the kernel. Do we really want to have this as a 
kernel-exposed API?

Regards

Marcel



RE: bonding reports interface up with 0 Mbps

2016-02-08 Thread Tantilov, Emil S
>-Original Message-
>From: Jay Vosburgh [mailto:jay.vosbu...@canonical.com]
>Sent: Thursday, February 04, 2016 4:37 PM
>To: Tantilov, Emil S 
>Cc: netdev@vger.kernel.org; go...@cumulusnetworks.com; zhuyj
>; j...@mellanox.com
>Subject: Re: bonding reports interface up with 0 Mbps
>
>Jay Vosburgh  wrote:
>[...]
>>  Thinking about the trace again... Emil: what happens in the
>>trace before this?  Is there ever a call to the ixgbe_get_settings?
>>Does a NETDEV_UP or NETDEV_CHANGE event ever hit the bond_netdev_event
>>function?
>
>   Emil kindly sent me the trace offline, and I think I see what's
>going on.  It looks like the sequence of events is:
>
>bond_enslave ->
>   bond_update_speed_duplex (device is down, thus DUPLEX/SPEED_UNKNOWN)
>   [ do rest of enslavement, start miimon periodic work ]
>
>   [ time passes, device goes carrier up ]
>
>ixgbe_service_task: eth1: NIC Link is Up 10 Gbps ->
>   netif_carrier_on (arranges for NETDEV_CHANGE notifier out of line)
>
>   [ a few microseconds later ]
>
>bond_mii_monitor ->
>   bond_check_dev_link (now is carrier up)
>   bond_miimon_commit ->   (emits "0 Mbps full duplex" message)
>   bond_lower_state_changed ->
>   bond_netdev_event (NETDEV_CHANGELOWERSTATE, is ignored)
>   bond_3ad_handle_link_change (sees DUPLEX/SPEED_UNKNOWN)
>
>   [ a few microseconds later, in response to ixgbe's netif_carrier_on ]
>
>notifier_call_chain ->
>   bond_netdev_event NETDEV_CHANGE ->
>   bond_update_speed_duplex (sees correct SPEED_1/FULL) ->
>   bond_3ad_adapter_speed_duplex_changed (updates 802.3ad)
>
>   Basically, the race is that the periodic bond_mii_monitor is
>squeezing in between the link going up and bonding's update of the speed
>and duplex in response to the NETDEV_CHANGE triggered by the driver's
>netif_carrier_on call.  bonding ends up using the stale duplex and speed
>information obtained at enslavement time.
>
>   I think that, nowadays, the initial speed and duplex will pretty
>much always be UNKNOWN, at least for real Ethernet devices, because it
>will take longer to autoneg than the time between the dev_open and
>bond_update_speed_duplex calls in bond_enslave.
>
>   Adding a case to bond_netdev_event for CHANGELOWERSTATE works
>because it's a synchronous call from bonding.  For purposes of fixing
>this, it's more or less equivalent to calling bond_update_speed_duplex
>from bond_miimon_commit (which is part of a test patch I posted earlier
>today).
>
>   If the above analysis is correct, then I would expect this patch
>to make the problem go away:
>
>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 56b560558884..cabaeb61333d 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -2127,6 +2127,7 @@ static void bond_miimon_commit(struct bonding *bond)
>   continue;
>
>   case BOND_LINK_UP:
>+  bond_update_speed_duplex(slave);
>   bond_set_slave_link_state(slave, BOND_LINK_UP,
> BOND_SLAVE_NOTIFY_NOW);
>   slave->last_link_up = jiffies;

No issues seen over the weekend with this patch. The condition was hit 32 times.

You can add my "tested-by:" when you submit this patch.

Thanks Jay for all your help!
Emil
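The race Jay describes, and why refreshing the speed in bond_miimon_commit() closes it, can be replayed deterministically. This is a hypothetical single-slave model, not bonding code: `struct slave_model`, `update_speed()`, `miimon_commit()` and `run_sequence()` are illustrative names, and `SPEED_UNKNOWN` is a local copy of the ethtool value (-1).

```c
#include <stdbool.h>

#define SPEED_UNKNOWN (-1)

/* Hypothetical model of one slave: the driver's live speed, the carrier,
 * bonding's cached copy, and the value seen when miimon commits. */
struct slave_model {
	int hw_speed;        /* what the driver reports once link is up */
	bool carrier;
	int cached_speed;    /* set by bond_update_speed_duplex() */
	int speed_at_commit; /* what bond_miimon_commit() used */
};

static void update_speed(struct slave_model *s)
{
	s->cached_speed = s->carrier ? s->hw_speed : SPEED_UNKNOWN;
}

static void miimon_commit(struct slave_model *s, bool refresh_first)
{
	if (refresh_first)      /* the proposed fix for BOND_LINK_UP */
		update_speed(s);
	s->speed_at_commit = s->cached_speed;
}

/* Replay: enslave while down, carrier up, miimon commits first, and only
 * then does the NETDEV_CHANGE notifier refresh the speed. */
static int run_sequence(bool with_fix)
{
	struct slave_model s = { 10000, false, 0, 0 };

	update_speed(&s);            /* bond_enslave(): device still down */
	s.carrier = true;            /* driver calls netif_carrier_on() */
	miimon_commit(&s, with_fix); /* miimon wins the race */
	update_speed(&s);            /* NETDEV_CHANGE arrives too late */
	return s.speed_at_commit;
}
```

Without the refresh, the commit sees the stale value sampled at enslave time; with it, the commit reads the live speed regardless of when the notifier runs.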



Re: [PATCH 2/2] fm10k: correctly report error when changing number of channels

2016-02-08 Thread Keller, Jacob E
On Mon, 2016-02-08 at 13:26 +, Jakub Kicinski wrote:
> Hi Jacob!
> 
> First of all thanks for putting your time into sorting this out,
> figuring out what to do with user-set RSS table when queues are
> reconfigured was a head scratcher for me as well.
> 

Yep!

> On Fri,  5 Feb 2016 12:30:21 -0800, Jacob Keller wrote:
> > +#define FM10K_FLAG_RETA_TABLE_CONFIGURED   (u32)(BIT(6))
> 
> If we go with your proposal every driver will have to keep track of 
> how the RSS table was set and find max value on queue reconfig -
> replicating effort and leaving space for diverging behaviour...
> 

in which behavior has already diverged quite significantly, so shoring
that up would be good as well.

> Would it be worth considering to place more of this code in the core?

Yes. I was unsure of how to do this, but I think I have a possible
solution. Since basically all drivers are going to have the same issue,
I think we can just do the check inside net/core/ethtool.c.

At least some of the check can be done inside core ethtool, but I think
we still need a way for the driver to know it is in "default" mode, as the
driver behaves differently in its reset flow depending on whether
the RSS table has been set.

Maybe we can store it as a flag in the netdev structure instead?

I do agree that the queue size reconfig can handle the new minimum
queue value easily.

Regards,
Jake
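The check being discussed (find the maximum queue referenced by a user-configured RSS indirection table and refuse a channel count that would strand entries) is small enough to sketch generically. This is a hypothetical helper, not fm10k or ethtool core code; `user_configured` plays the role of the proposed "RETA configured" flag, and a default table is assumed to be regenerated by the driver instead.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Highest queue index referenced by the indirection table. */
static uint32_t reta_max_queue(const uint32_t *reta, size_t n)
{
	uint32_t max = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (reta[i] > max)
			max = reta[i];
	return max;
}

/* Reject a new channel count that would leave a user-set table pointing
 * at queues that no longer exist. */
static int check_new_channel_count(const uint32_t *reta, size_t n,
				   bool user_configured, uint32_t new_count)
{
	if (!user_configured || n == 0)
		return 0;
	return reta_max_queue(reta, n) < new_count ? 0 : -EINVAL;
}
```

Placing a helper like this in the core would give every driver the same behaviour, which addresses Jakub's concern about divergence; the driver would still need the flag for its own reset flow.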

Re: [PATCH 8/9] rfkill: Userspace control for airplane mode

2016-02-08 Thread Johannes Berg

On 2016-02-08 17:20, Marcel Holtmann wrote:


as explained in an earlier response, the concept Airplane Mode seems
to be imposing policy into the kernel. Do we really want have this as
a kernel exposed API.


This patch is the one that *solves* that problem ... please read it more 
carefully?


johannes


Re: [PATCH/RFC v4 net-next] ravb: Add dma queue interrupt support

2016-02-08 Thread Yoshihiro Kaneko
Hi Sergei-san,

2016-02-08 2:09 GMT+09:00 Sergei Shtylyov :
> Hello.
>
> On 02/07/2016 07:50 PM, Yoshihiro Kaneko wrote:
>
>> I apologize for not responding to you earlier.
>
>
>Absolutely no problem, these reviews/tests take time from my main tasks
> anyway. :-)
>
>
 From: Kazuya Mizuguchi 

 This patch supports the following interrupts.

 - One interrupt for multiple (descriptor, error, management)
 - One interrupt for emac
 - Four interrupts for dma queue (best effort rx/tx, network control
 rx/tx)

 This patch improves the efficiency of the interrupt handling by adding an
 interrupt handler for each of the interrupt sources described above.
 Additionally, it reduces the number of accesses to the
 EthernetAVB IF.

 Signed-off-by: Kazuya Mizuguchi 
 Signed-off-by: Yoshihiro Kaneko 
 ---

 This patch is based on the master branch of David Miller's next
 networking tree.

 v4 [Yoshihiro Kaneko]
 * compile tested only
 * As suggested by Sergei Shtylyov
 drivers/net/ethernet/renesas/ravb.h:
   - make two lines of comment into one line.
   - remove unused definition of xxx_ALL.
 drivers/net/ethernet/renesas/ravb_main.c:
   - remove unrelated change (fix indentation).
   - output warning messages when napi_schedule_prep() fails in
     ravb_dmaq_interrupt() like ravb_interrupt().
   - change the function name from req_irq to hook_irq.
   - fix programming error in hook_irq().
   - do free_irq() for rx_irqs[] and tx_irqs[] for only gen3 in
     out_free_irq label in ravb_open().

 v3 [Yoshihiro Kaneko]
 * compile tested only
 * As suggested by Sergei Shtylyov
 - update changelog
 drivers/net/ethernet/renesas/ravb.h:
   - add comments to the additional registers like CIE
 drivers/net/ethernet/renesas/ravb_main.c:
   - fix the initialization of the interrupt in ravb_dmac_init()
   - revert ravb_error_interrupt() because gen3 code is wrong
   - change the comment "Management" in ravb_multi_interrupt()
   - add a helper function for request_irq() in ravb_open()
   - revert ravb_close() because atomicity is not necessary here
 drivers/net/ethernet/renesas/ravb_ptp.c:
   - revert ravb_ptp_stop() because atomicity is not necessary here

 v2 [Yoshihiro Kaneko]
 * compile tested only
 * As suggested by Sergei Shtylyov
 - add comment to CIE
 - remove comments from CIE bits
 - fix value of TIx_ALL
 - define each bits for CIE, GIE, GID, RIE0, RID0, RIE2, RID2, TIE, TID
 - reversed Christmas tree declaration ordered
 - rename _ravb_emac_interrupt() to ravb_emac_interrupt_unlocked()
 - remove unnecessary clearing of CIE
 - use a bit name corresponding to the target register, RIE0, RIE2, TIE,
   TID, RID2, GID, GIE
>>>
>>>
>>>
>>> As I already noted, the changes made to the original patch are supposed
>>> to be documented above --- (no need to separate diff versions there though).
>>> Either that, or just say that it's your patch, based on Mizuguchi-san's
>>> work (the amount of changes makes that possible, I think).
>>
>>
>> I will record that I made a change to this patch in the commit log of
>> the next version.
>> I don't think that I changed the essence of this patch. I changed
>> various trivial things, or fixed bugs you pointed out.
>
>
>OK, as you wish. But in case this gets too tedious, I'll understand if
> you change the authorship.
>
>>> [...]


 diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c
 index ac43ed9..076f25f 100644
 --- a/drivers/net/ethernet/renesas/ravb_main.c
 +++ b/drivers/net/ethernet/renesas/ravb_main.c
>
>
> [...]
>
>>>
 +
 +   spin_lock(&priv->lock);
 +
 +   ris0 = ravb_read(ndev, RIS0);
 +   ric0 = ravb_read(ndev, RIC0);
 +   tis  = ravb_read(ndev, TIS);
 +   tic  = ravb_read(ndev, TIC);
 +
 +   /* Timestamp updated */
 +   if (tis & TIS_TFUF) {
 +   ravb_write(ndev, TID_TFUD, TID);
>>>
>>>
>>>
>>> Wait, you're supposed to clear the TFUF interrupt, not to disable it!
>>
>>
>> Thanks for finding this bug.
>>
>>> And with that fixed, this interrupt's handler could get factored out into a
>>> function...
>>
>>
>> Isn't this too small to be worth a separate function?
>
>
> I wouldn't say so. But I'd need to count the total LoC, of course...
> perhaps it's indeed not worth it...

I don't see a need to make it a function.
It only clears the interrupt and calls ravb_get_tx_tstamp(), which does the
main processing, and apart from this one there is only one other caller
(ravb_interrupt).

>
> [...]
>

Re: [PATCH 8/9] rfkill: Userspace control for airplane mode

2016-02-08 Thread João Paulo Rechi Vita
Hello Marcel,

On 8 February 2016 at 11:20, Marcel Holtmann  wrote:
> Hi João Paulo,
>
>> Provide an interface for the airplane-mode indicator to be controlled from
>> userspace. The user has to first acquire the control through
>> RFKILL_OP_AIRPLANE_MODE_ACQUIRE and keep the fd open for the whole time
>> it wants to be in control of the indicator. Closing the fd or using
>> RFKILL_OP_AIRPLANE_MODE_RELEASE restores the default policy.
>>
>> To change state of the indicator, the RFKILL_OP_AIRPLANE_MODE_CHANGE
>> operation is used, passing the value in "struct rfkill_event.soft". If
>> the caller has not acquired the airplane-mode control beforehand, the
>> operation fails.
>
> as explained in an earlier response, the concept of Airplane Mode seems to
> impose policy on the kernel. Do we really want to have this as a
> kernel-exposed API?
>

On that very same thread we decided to keep using the "airplane mode"
name both for when having the default policy of "set it when all
radios are off" or when allowing userspace to override the default.
Please see the following message from Johannes
http://www.spinics.net/lists/linux-wireless/msg146069.html and let me
know if that covers your concern.

Thanks!

--
João Paulo Rechi Vita
http://about.me/jprvita
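The ownership rules in the quoted patch description (acquire control, change the indicator only while holding it, release or close to restore the default policy of "set it when all radios are off") can be modelled in plain C. This is a sketch, not the rfkill API: struct apm_model and the apm_* functions are hypothetical, the -1 return stands in for whatever error the kernel would report, and refusing a second acquirer is an assumption the description does not specify.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the proposed airplane-mode control; no kernel headers. */
struct apm_model {
	int owner_fd;        /* -1 while the default policy is in effect */
	bool user_state;     /* value last set by the owner */
	bool all_radios_off; /* input to the default policy */
};

static void apm_init(struct apm_model *m)
{
	m->owner_fd = -1;
	m->user_state = false;
	m->all_radios_off = false;
}

/* ACQUIRE: take control (assumption: a different fd holding it blocks us). */
static int apm_acquire(struct apm_model *m, int fd)
{
	if (m->owner_fd != -1 && m->owner_fd != fd)
		return -1;
	m->owner_fd = fd;
	return 0;
}

/* CHANGE: fails unless this fd acquired control beforehand. */
static int apm_change(struct apm_model *m, int fd, bool soft)
{
	if (fd < 0 || m->owner_fd != fd)
		return -1;
	m->user_state = soft;
	return 0;
}

/* RELEASE, or closing the owning fd: restore the default policy. */
static void apm_release(struct apm_model *m, int fd)
{
	if (m->owner_fd == fd)
		m->owner_fd = -1;
}

/* Indicator: the owner's choice, else "on when all radios are off". */
static bool apm_indicator(const struct apm_model *m)
{
	return m->owner_fd != -1 ? m->user_state : m->all_radios_off;
}
```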


Re: [PATCH v3 net-next 2/2] tcp: Add Redundant Data Bundling (RDB)

2016-02-08 Thread Bendik Rønning Opstad
Sorry guys, I messed up that email by including HTML, and it got
rejected by netdev@vger.kernel.org. I'll resend it properly formatted.

Bendik

On 08/02/16 18:17, Bendik Rønning Opstad wrote:
> Eric, thank you for the feedback!
> 
> On Wed, Feb 3, 2016 at 8:34 PM, Eric Dumazet  wrote:
>> On Wed, 2016-02-03 at 19:17 +0100, Bendik Rønning Opstad wrote:
>>> On Tue, Feb 2, 2016 at 9:35 PM, Eric Dumazet  wrote:
 Really this looks very complicated.
>>>
>>> Can you be more specific?
>>
>> A lot of code added, needing maintenance cost for years to come.
> 
> Yes, that is understandable.
> 
 Why not simply append the new skb content to prior one ?
>>>
>>> It's not clear to me what you mean. At what stage in the output engine
>>> do you refer to?
>>>
>>> We want to avoid modifying the data of the SKBs in the output queue,
>>
>> Why ? We already do that, as I pointed out.
> 
> I suspect that we might be talking past each other. It wasn't clear to
> me that we were discussing how to implement this in a different way.
> 
> The current retrans collapse functionality only merges SKBs that
> contain data that has already been sent and is about to be
> retransmitted.
> 
> This differs significantly from RDB, which combines both already
> transmitted data and unsent data in the same packet without changing
> how the data is stored (and the state tracked) in the output queue.
> Another difference is that RDB includes un-ACKed data that is not
> considered lost.
> 
>>> therefore we allocate a new SKB (This SKB is named rdb_skb in the code).
>>> The header and payload of the first SKB containing data we want to
>>> redundantly transmit is then copied. Then the payload of the SKBs following
>>> next in the output queue is appended onto the rdb_skb. The last payload
>>> that is appended is from the first SKB with unsent data, i.e. the
>>> sk_send_head.
>>>
>>> Would you suggest a different approach?
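The rdb_skb construction described above can be modelled without kernel types. In this sketch struct seg and rdb_bundle() are hypothetical; headers are ignored, sequence numbers are assumed contiguous, and the rule for choosing the first bundled segment (take as many immediately preceding un-ACKed segments as fit in cap bytes) is an assumption rather than the patch's exact logic. What it does show is that the output queue is only read while the bundle is built in a separate buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an skb in the TCP output queue. */
struct seg {
	uint32_t seq;     /* sequence number of the first payload byte */
	size_t len;       /* payload length */
	const char *data; /* payload bytes */
};

/*
 * Bundle q[send_head] (first unsent segment) with as many immediately
 * preceding un-ACKed segments (index >= first_unacked) as fit in cap.
 * The queue is never modified. Returns bytes written to out and stores
 * the bundle's first sequence number in *seq; returns 0 if even the new
 * segment alone does not fit.
 */
static size_t rdb_bundle(const struct seg *q, size_t first_unacked,
			 size_t send_head, size_t cap, char *out,
			 uint32_t *seq)
{
	size_t start = send_head, total = q[send_head].len, off = 0;

	if (total > cap)
		return 0;
	/* Walk backwards while the older segment still fits. */
	while (start > first_unacked && total + q[start - 1].len <= cap) {
		start--;
		total += q[start].len;
	}
	/* Copy forwards so the bytes stay in sequence order. */
	for (size_t i = start; i <= send_head; i++) {
		memcpy(out + off, q[i].data, q[i].len);
		off += q[i].len;
	}
	*seq = q[start].seq;
	return off;
}
```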
>>>
 skb_still_in_host_queue(sk, prior_skb) would also tell you if the skb is
 really available (ie its clone not sitting/waiting in a qdisc on the
 host)
>>>
>>> Where do you suggest this should be used?
>>
>> To detect if appending data to prior skb is possible.
> 
> I see. As the implementation intentionally avoids modifying SKBs in
> the output queue, this was not obvious.
> 
>> If the prior packet is still in qdisc, no change is allowed,
>> and it is fine: RDB should not trigger anyway.
> 
> Actually, whether the data in the prior SKB is on the wire or is still
> on the host (in qdisc/driver queue) is not relevant. RDB always wants
> to redundantly resend the data if there is room in the packet, because
> the previous packet may become lost.
> 
 Note : select_size() always allocate skb with SKB_WITH_OVERHEAD(2048 -
 MAX_TCP_HEADER) available bytes in skb->data.
>>>
>>> Sure, rdb_build_skb() could use this instead of the calculated
>>> bytes_in_rdb_skb.
>>
>> Point is : small packets already have tail room in skb->head
> 
> Yes, I'm aware of that. But we do not allocate new SKBs because we
> think the existing SKBs do not have enough space available. We do it
> to avoid modifications to the SKBs in the output queue.
> 
>> When RDB decides a packet should be merged into the prior one, you can
>> simply copy payload into the tailroom, then free the skb.
>>
>> No skb allocations are needed, only freeing.
> 
> It wasn't clear to me that you suggest a completely different
> implementation approach altogether.
> 
> As I understand you, the approach you suggest is as follows:
> 
> 1. An SKB containing unsent data is processed for transmission (let's
>    call it T_SKB).
> 2. Check if the previous SKB (let's call it P_SKB), containing sent but
>    un-ACKed data, has available (tail) room for the payload contained
>    in T_SKB.
> 3. If there is room in P_SKB:
>    * Copy the unsent data from T_SKB to P_SKB by appending it to the
>      linear data and update sequence numbers.
>    * Remove T_SKB (which contains only the new and unsent data) from
>      the output queue.
>    * Transmit P_SKB, which now contains some already sent data and some
>      unsent data.
> 
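The three steps above can be sketched with a hypothetical linear-buffer stand-in for an skb. struct lin_skb, MODEL_BUF and try_append_to_prior() are invented for illustration; the contiguity check (P_SKB must end where T_SKB starts) is an added assumption, and the sketch deliberately ignores ACK accounting, SACK/retrans state and the skb_still_in_host_queue() check, which are exactly what the downsides listed in this reply concern.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BUF 64 /* hypothetical linear buffer size (data + tailroom) */

/* Hypothetical stand-in for an skb with only a linear data area. */
struct lin_skb {
	uint32_t seq, end_seq; /* payload covers [seq, end_seq) */
	size_t len;            /* bytes used in data[] */
	unsigned char data[MODEL_BUF];
};

static size_t tailroom(const struct lin_skb *s)
{
	return sizeof(s->data) - s->len;
}

/*
 * Steps 2 and 3: if P_SKB has tailroom for T_SKB's payload, append it and
 * extend P_SKB's end sequence. A true return means the caller may unlink
 * and free T_SKB and then transmit P_SKB with old and new data together.
 */
static bool try_append_to_prior(struct lin_skb *p, const struct lin_skb *t)
{
	if (p->end_seq != t->seq || tailroom(p) < t->len)
		return false;
	memcpy(p->data + p->len, t->data, t->len);
	p->len += t->len;
	p->end_seq = t->end_seq;
	return true;
}
```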
> 
> If I have misunderstood, can you please elaborate in detail what you
> mean?
> 
> If this is the approach you suggest, I can think of some potential
> downsides that require further considerations:
> 
> 
> 1) ACK-accounting will work differently
> 
> When the previous SKB (P_SKB) is modified by appending the data of the
> next SKB (T_SKB), what should happen when an incoming ACK
> acknowledges the data that was sent in the original transmission
> (before the SKB was modified), but not the data that was appended
> later? tcp_clean_rtx_queue currently handles partially ACKed SKBs due
> to TSO, in which case the tcp_skb_pcount(skb) > 1. So this function
> would need to be modified to handle this for RDB modified SKBs in the
> queue, where all the data is located in the linear data buffer (no GSO
> segs).
> 
> How should SACK and retrans flags be handled wh

[PATCH v2 2/6] net: phy: spi_ks8995: verify chip and determine revision

2016-02-08 Thread Helmut Buchsbaum
Since the chip variant is now determined by spi_device_id, verify
family and chip id and determine the revision id.

Signed-off-by: Helmut Buchsbaum 
---
 drivers/net/phy/spi_ks8995.c | 118 +--
 1 file changed, 80 insertions(+), 38 deletions(-)

diff --git a/drivers/net/phy/spi_ks8995.c b/drivers/net/phy/spi_ks8995.c
index e848ad9..2803c8e 100644
--- a/drivers/net/phy/spi_ks8995.c
+++ b/drivers/net/phy/spi_ks8995.c
@@ -83,6 +83,8 @@
 
 #define FAMILY_KS8995  0x95
 #define CHIPID_M   0
+#define KS8995_CHIP_ID 0x00
+#define KSZ8864_CHIP_ID 0x01
 
 #define KS8995_CMD_WRITE   0x02U
 #define KS8995_CMD_READ    0x03U
@@ -97,16 +99,22 @@ enum ks8995_chip_variant {
 
 struct ks8995_chip_params {
char *name;
+   int family_id;
+   int chip_id;
int regs_size;
 };
 
 static const struct ks8995_chip_params ks8995_chip[] = {
[ks8995] = {
.name = "KS8995MA",
+   .family_id = FAMILY_KS8995,
+   .chip_id = KS8995_CHIP_ID,
.regs_size = KS8995_REGS_SIZE,
},
[ksz8864] = {
.name = "KSZ8864RMN",
+   .family_id = FAMILY_KS8995,
+   .chip_id = KSZ8864_CHIP_ID,
.regs_size = KSZ8864_REGS_SIZE,
},
 };
@@ -121,6 +129,7 @@ struct ks8995_switch {
struct ks8995_pdata *pdata;
struct bin_attribute regs_attr;
const struct ks8995_chip_params *chip;
+   int revision_id;
 };
 
 static const struct spi_device_id ks8995_id[] = {
@@ -263,6 +272,73 @@ static ssize_t ks8995_registers_write(struct file *filp, struct kobject *kobj,
return ks8995_write(ks8995, buf, off, count);
 }
 
+/* ks8995_get_revision - get chip revision
+ * @ks: pointer to switch instance
+ *
+ * Verify chip family and id and get chip revision.
+ */
+static int ks8995_get_revision(struct ks8995_switch *ks)
+{
+   int err;
+   u8 id0, id1, ksz8864_id;
+
+   /* read family id */
+   err = ks8995_read_reg(ks, KS8995_REG_ID0, &id0);
+   if (err) {
+   err = -EIO;
+   goto err_out;
+   }
+
+   /* verify family id */
+   if (id0 != ks->chip->family_id) {
+   dev_err(&ks->spi->dev, "chip family id mismatch: expected 0x%02x but 0x%02x read\n",
+   ks->chip->family_id, id0);
+   err = -ENODEV;
+   goto err_out;
+   }
+
+   switch (ks->chip->family_id) {
+   case FAMILY_KS8995:
+   /* try reading chip id at CHIP ID1 */
+   err = ks8995_read_reg(ks, KS8995_REG_ID1, &id1);
+   if (err) {
+   err = -EIO;
+   goto err_out;
+   }
+
+   /* verify chip id */
+   if ((get_chip_id(id1) == CHIPID_M) &&
+   (get_chip_id(id1) == ks->chip->chip_id)) {
+   /* KS8995MA */
+   ks->revision_id = get_chip_rev(id1);
+   } else if (get_chip_id(id1) != CHIPID_M) {
+   /* KSZ8864RMN */
+   err = ks8995_read_reg(ks, KS8995_REG_ID1, &ksz8864_id);
+   if (err) {
+   err = -EIO;
+   goto err_out;
+   }
+
+   if ((ksz8864_id & 0x80) &&
+   (ks->chip->chip_id == KSZ8864_CHIP_ID)) {
+   ks->revision_id = get_chip_rev(id1);
+   }
+
+   } else {
+   dev_err(&ks->spi->dev, "unsupported chip id for KS8995 family: 0x%02x\n",
+   id1);
+   err = -ENODEV;
+   }
+   break;
+   default:
+   dev_err(&ks->spi->dev, "unsupported family id: 0x%02x\n", id0);
+   err = -ENODEV;
+   break;
+   }
+err_out:
+   return err;
+}
+
 static const struct bin_attribute ks8995_registers_attr = {
.attr = {
.name   = "registers",
@@ -278,7 +354,6 @@ static int ks8995_probe(struct spi_device *spi)
 {
struct ks8995_switch *ks;
struct ks8995_pdata *pdata;
-   u8  ids[2];
int err;
int variant = spi_get_device_id(spi)->driver_data;
 
@@ -309,39 +384,12 @@ static int ks8995_probe(struct spi_device *spi)
return err;
}
 
-   err = ks8995_read(ks, ids, KS8995_REG_ID0, sizeof(ids));
-   if (err < 0) {
-   dev_err(&spi->dev, "unable to read id registers, err=%d\n",
-   err);
+   err = ks8995_get_revision(ks);
+   if (err)
return err;
-   }
-
-   switch (ids[0]) {
-   case FAMILY_KS8995:
-   break;
-   default:
-   dev_err(&spi->dev, "unknown family 

[PATCH v2 5/6] net: phy: spi_ks8995: add support for MICREL KSZ8795CLX

2016-02-08 Thread Helmut Buchsbaum
Add support for MICREL KSZ8795CLX Integrated 5-Port, 10-/100-Managed
Ethernet Switch with Gigabit GMII/RGMII and MII/RMII interfaces.

Signed-off-by: Helmut Buchsbaum 
---
 drivers/net/phy/spi_ks8995.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/net/phy/spi_ks8995.c b/drivers/net/phy/spi_ks8995.c
index f866786..c2d6c23 100644
--- a/drivers/net/phy/spi_ks8995.c
+++ b/drivers/net/phy/spi_ks8995.c
@@ -77,6 +77,7 @@
 
 #define KS8995_REGS_SIZE   0x80
 #define KSZ8864_REGS_SIZE  0x100
+#define KSZ8795_REGS_SIZE  0x100
 
 #define ID1_CHIPID_M   0xf
 #define ID1_CHIPID_S   4
@@ -85,9 +86,11 @@
 #define ID1_START_SW   1   /* start the switch */
 
 #define FAMILY_KS8995  0x95
+#define FAMILY_KSZ8795 0x87
 #define CHIPID_M   0
 #define KS8995_CHIP_ID 0x00
 #define KSZ8864_CHIP_ID 0x01
+#define KSZ8795_CHIP_ID 0x09
 
 #define KS8995_CMD_WRITE   0x02U
 #define KS8995_CMD_READ    0x03U
@@ -97,6 +100,7 @@
 enum ks8995_chip_variant {
ks8995,
ksz8864,
+   ksz8795,
max_variant
 };
 
@@ -126,6 +130,14 @@ static const struct ks8995_chip_params ks8995_chip[] = {
.addr_width = 8,
.addr_shift = 0,
},
+   [ksz8795] = {
+   .name = "KSZ8795CLX",
+   .family_id = FAMILY_KSZ8795,
+   .chip_id = KSZ8795_CHIP_ID,
+   .regs_size = KSZ8795_REGS_SIZE,
+   .addr_width = 12,
+   .addr_shift = 1,
+   },
 };
 
 struct ks8995_pdata {
@@ -145,6 +157,7 @@ struct ks8995_switch {
 static const struct spi_device_id ks8995_id[] = {
{"ks8995", ks8995},
{"ksz8864", ksz8864},
+   {"ksz8795", ksz8795},
{ }
 };
 MODULE_DEVICE_TABLE(spi, ks8995_id);
@@ -358,6 +371,22 @@ static int ks8995_get_revision(struct ks8995_switch *ks)
err = -ENODEV;
}
break;
+   case FAMILY_KSZ8795:
+   /* try reading chip id at CHIP ID1 */
+   err = ks8995_read_reg(ks, KS8995_REG_ID1, &id1);
+   if (err) {
+   err = -EIO;
+   goto err_out;
+   }
+
+   if (get_chip_id(id1) == ks->chip->chip_id) {
+   ks->revision_id = get_chip_rev(id1);
+   } else {
+   dev_err(&ks->spi->dev, "unsupported chip id for KSZ8795 family: 0x%02x\n",
+   id1);
+   err = -ENODEV;
+   }
+   break;
default:
dev_err(&ks->spi->dev, "unsupported family id: 0x%02x\n", id0);
err = -ENODEV;
-- 
2.1.4
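The family and chip-id checks in ks8995_get_revision() boil down to comparing ID0 with the expected family and the upper nibble of ID1 with the expected chip id. Below is a simplified userspace sketch using the constants from the patches above. get_chip_id() is assumed to be (val >> ID1_CHIPID_S) & ID1_CHIPID_M by analogy with get_chip_rev(), verify_ids() is hypothetical, the register values in the test are made up, and the KSZ8864 path (which does a second register read) is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Values copied from the patches above. */
#define FAMILY_KS8995   0x95
#define FAMILY_KSZ8795  0x87
#define ID1_CHIPID_M    0xf
#define ID1_CHIPID_S    4
#define KS8995_CHIP_ID  0x00
#define KSZ8795_CHIP_ID 0x09

/* Assumed to mirror the driver's get_chip_id() helper. */
static inline uint8_t get_chip_id(uint8_t val)
{
	return (val >> ID1_CHIPID_S) & ID1_CHIPID_M;
}

/*
 * Accept (ID0, ID1) only if family and chip id match what the
 * spi_device_id entry promised. Returns 0 on a match and -1 otherwise
 * (where the driver would return -ENODEV).
 */
static int verify_ids(uint8_t want_family, uint8_t want_chip,
		      uint8_t id0, uint8_t id1)
{
	if (id0 != want_family)
		return -1;
	return get_chip_id(id1) == want_chip ? 0 : -1;
}
```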



[PATCH v2 4/6] net: phy: spi_ks8995: generalize creation of SPI commands

2016-02-08 Thread Helmut Buchsbaum
Prepare the creation of SPI reads and writes for other switch families.
The KS8995 family uses the straightforward
<8bit CMD><8bit ADDR>
sequence.
To be able to support the KSZ8795 family, which uses
<3bit CMD><12bit ADDR><1 bit TR>
make the SPI command creation dependent on the chip variant.

Signed-off-by: Helmut Buchsbaum 
---
 drivers/net/phy/spi_ks8995.c | 46 +---
 1 file changed, 35 insertions(+), 11 deletions(-)

diff --git a/drivers/net/phy/spi_ks8995.c b/drivers/net/phy/spi_ks8995.c
index 04d468f..f866786 100644
--- a/drivers/net/phy/spi_ks8995.c
+++ b/drivers/net/phy/spi_ks8995.c
@@ -105,6 +105,8 @@ struct ks8995_chip_params {
int family_id;
int chip_id;
int regs_size;
+   int addr_width;
+   int addr_shift;
 };
 
 static const struct ks8995_chip_params ks8995_chip[] = {
@@ -113,12 +115,16 @@ static const struct ks8995_chip_params ks8995_chip[] = {
.family_id = FAMILY_KS8995,
.chip_id = KS8995_CHIP_ID,
.regs_size = KS8995_REGS_SIZE,
+   .addr_width = 8,
+   .addr_shift = 0,
},
[ksz8864] = {
.name = "KSZ8864RMN",
.family_id = FAMILY_KS8995,
.chip_id = KSZ8864_CHIP_ID,
.regs_size = KSZ8864_REGS_SIZE,
+   .addr_width = 8,
+   .addr_shift = 0,
},
 };
 
@@ -153,20 +159,44 @@ static inline u8 get_chip_rev(u8 val)
return (val >> ID1_REVISION_S) & ID1_REVISION_M;
 }
 
+/* create_spi_cmd - create a chip specific SPI command header
+ * @ks: pointer to switch instance
+ * @cmd: SPI command for switch
+ * @address: register address for command
+ *
+ * Different chip families use different bit patterns to address the switch's
+ * registers:
+ *
+ * KS8995: 8bit command + 8bit address
+ * KSZ8795: 3bit command + 12bit address + 1bit TR (?)
+ */
+static inline __be16 create_spi_cmd(struct ks8995_switch *ks, int cmd,
+   unsigned address)
+{
+   u16 result = cmd;
+
+   /* make room for address (incl. address shift) */
+   result <<= ks->chip->addr_width + ks->chip->addr_shift;
+   /* add address */
+   result |= address << ks->chip->addr_shift;
+   /* SPI protocol needs big endian */
+   return cpu_to_be16(result);
+}
 /*  */
 static int ks8995_read(struct ks8995_switch *ks, char *buf,
 unsigned offset, size_t count)
 {
-   u8 cmd[2];
+   __be16 cmd;
struct spi_transfer t[2];
struct spi_message m;
int err;
 
+   cmd = create_spi_cmd(ks, KS8995_CMD_READ, offset);
spi_message_init(&m);
 
memset(&t, 0, sizeof(t));
 
-   t[0].tx_buf = cmd;
+   t[0].tx_buf = &cmd;
t[0].len = sizeof(cmd);
spi_message_add_tail(&t[0], &m);
 
@@ -174,9 +204,6 @@ static int ks8995_read(struct ks8995_switch *ks, char *buf,
t[1].len = count;
spi_message_add_tail(&t[1], &m);
 
-   cmd[0] = KS8995_CMD_READ;
-   cmd[1] = offset;
-
mutex_lock(&ks->lock);
err = spi_sync(ks->spi, &m);
mutex_unlock(&ks->lock);
@@ -184,20 +211,20 @@ static int ks8995_read(struct ks8995_switch *ks, char *buf,
return err ? err : count;
 }
 
-
 static int ks8995_write(struct ks8995_switch *ks, char *buf,
 unsigned offset, size_t count)
 {
-   u8 cmd[2];
+   __be16 cmd;
struct spi_transfer t[2];
struct spi_message m;
int err;
 
+   cmd = create_spi_cmd(ks, KS8995_CMD_WRITE, offset);
spi_message_init(&m);
 
memset(&t, 0, sizeof(t));
 
-   t[0].tx_buf = cmd;
+   t[0].tx_buf = &cmd;
t[0].len = sizeof(cmd);
spi_message_add_tail(&t[0], &m);
 
@@ -205,9 +232,6 @@ static int ks8995_write(struct ks8995_switch *ks, char *buf,
t[1].len = count;
spi_message_add_tail(&t[1], &m);
 
-   cmd[0] = KS8995_CMD_WRITE;
-   cmd[1] = offset;
-
mutex_lock(&ks->lock);
err = spi_sync(ks->spi, &m);
mutex_unlock(&ks->lock);
-- 
2.1.4
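A userspace mirror of create_spi_cmd() makes the resulting on-wire bytes concrete. struct cmd_layout and pack_spi_cmd() are hypothetical; the sketch writes the two bytes most-significant first instead of calling cpu_to_be16(), which has the same effect on the transmitted bytes, and, like the driver, it does not mask the address to addr_width.

```c
#include <assert.h>
#include <stdint.h>

#define KS8995_CMD_WRITE 0x02U
#define KS8995_CMD_READ  0x03U

/* Per-family layout, mirroring addr_width/addr_shift in the patch. */
struct cmd_layout {
	unsigned addr_width;
	unsigned addr_shift;
};

/*
 * Pack command and address into 16 bits as create_spi_cmd() does, then
 * emit them most-significant byte first (big endian on the wire).
 */
static void pack_spi_cmd(const struct cmd_layout *l, unsigned cmd,
			 unsigned address, uint8_t out[2])
{
	uint16_t result = (uint16_t)(cmd << (l->addr_width + l->addr_shift));

	result |= (uint16_t)(address << l->addr_shift);
	out[0] = (uint8_t)(result >> 8);
	out[1] = (uint8_t)(result & 0xff);
}
```

With the KSZ8795 layout (addr_width 12, addr_shift 1), the three command bits end up in bits 15 to 13, the 12-bit address in bits 12 to 1, and bit 0 (the TR bit the patch marks with "(?)") stays zero; with the KS8995 layout (8, 0) the result is simply the command byte followed by the address byte.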



[PATCH v2 6/6] dt-bindings: net: ks8995: add bindings documentation for ks8995

2016-02-08 Thread Helmut Buchsbaum
Signed-off-by: Helmut Buchsbaum 
---
 .../devicetree/bindings/net/micrel-ks8995.txt| 20 
 1 file changed, 20 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/micrel-ks8995.txt

diff --git a/Documentation/devicetree/bindings/net/micrel-ks8995.txt b/Documentation/devicetree/bindings/net/micrel-ks8995.txt
new file mode 100644
index 000..7f11ca6
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/micrel-ks8995.txt
@@ -0,0 +1,20 @@
+Micrel KS8995 SPI controlled Ethernet Switch families
+
+Required properties (according to spi-bus.txt):
+- compatible: either "micrel,ks8995", "micrel,ksz8864" or "micrel,ksz8795"
+
+Optional properties:
+- reset-gpios : phandle of gpio that will be used to reset chip during probe
+
+Example:
+
+spi-master {
+   ...
+   ksz8795 {
+   compatible = "micrel,ksz8795";
+
+   reg = <0>;
+   spi-max-frequency = <5000>;
+   reset-gpios = <&gpio0 46 1>;
+   };
+};
-- 
2.1.4


