Re: Crashes in -next due to 'phy: add support for a reset-gpio specification'

2016-05-22 Thread Uwe Kleine-König
Hello Guenter,

On Sun, May 22, 2016 at 06:06:24PM -0700, Guenter Roeck wrote:
> On 05/22/2016 11:21 AM, Uwe Kleine-König wrote:
> >On Sun, May 22, 2016 at 08:41:23AM -0700, Guenter Roeck wrote:
> >>I am not exactly in favor of forcing GPIOLIB to be enabled for every
> >>system with Ethernet support, just because a few of them may require
> >>a gpio based phy reset. The same is true, really, for every other
> >>driver using _optional gpiolib functions.
> >>
> >>Personally, I think that the _optional functions in gpiolib _should_
> >>return no error if gpiolib is not configured. After all, those
> >>gpio pins _are_ supposed to be optional. gpiolib should be enabled
> >>for affected configurations, ie on systems with gpio support which
> >>do need the optional gpio pins. Forcing gpiolib to be enabled even
> >>in systems with no gpio support seems to be a bit heavy-handed.
> >>It just bloats the kernel on such systems with no added benefit.
> >
> >That's wrong. The usage of gpio_get_optional and friends means that the
> >gpio is optional for the *driver*, that is there are devices that make
> >use of said gpio and others don't. For the devices where the gpio is
> >specified its usage is not optional. So it must not be ignored e.g. by
> >GPIOLIB=n configurations.
> 
> Hi Uwe,
> 
> I understand what you are saying, I just have a different perspective.
> From your point of view, you consider it unacceptable if optional GPIO
> pins are made unavailable by changing the configuration to GPIOLIB=n.
> Ok, but that position has number of drawbacks.
> 
> Playing the system this way only works if the optional GPIO pins
> are the only GPIO pins in use by the system. In almost all cases, this
> will not be the case. Disabling GPIOLIB in systems really needing it
> is simply not a realistic option.
> 
> As such, disabling GPIOLIB to "disable" optional GPIO pins is in most cases
> quite heavy-handed. Whoever wants to "disable" those optional pins can simply
> disable them by removing the respective lines from the devicetree file,
> or from the ACPI DSDT. Much easier to do, with less side effects.

Yeah, I agree. I don't see where the idea comes from to disable GPIOLIB
to make a driver work and it seems from your reply that is was mine.
Note I didn't want to imply this. Also I expect that returning NULL in
gpiod_get_optional breaks a few setups and I consider this worse enough
to accept that the driver failes for many devices that would work just
fine with NULL. The reason is that it's easier to debug as the code
fails at a place that is related to the GPIO in question and not at some
obscure point later in time.

> So, in summary, the current approach of mandating that GPIOLIB=y to be able
> to use drivers with optional GPIO pins only has the effect of unnecessarily
> increasing the code size for platforms with no GPIO support. Nothing else.
> It doesn't prevent users from "disabling" optional GPIO pins. There are other
> means to do that, from manipulating the devicetree file to manipulating
> the DSDT to disabling ACPI (yes, that works too).

I don't know about ACPI, so I cannot comment here.

> >enable that part of GPIOLIB such that the _optional variants do the
> >following with GPIOLIB=n:
> >
> > if a GPIO is specified:
> > return -ENOSYS
> > else:
> > return NULL
> >
> 
> Sure. I thought about it, and had a brief look into the gpiolib code. It would
> require a substantial part of the gpiolib code to be enabled, and as mentioned
> it would still be inconsistent as it really only applies to devicetree and
> lookup table based configurations, but not to ACPI.
> 
> The potential gain of ensuring that GPIOLIB is not accidentally disabled
> just doesn't seem to be worth the effort and the cost, at least not to me.
> Any platform which needs GPIOLIB should just have and keep it enabled,
> and I don't really see it as such a big deal to expect users to keep that
> in mind.
> 
> On the other side, I do dislike the notion of enforcing GPIOLIB=y just because
> some driver(s) used by a platform use optional gpio pins.
> 
> I understand that the current approach is what it is. I just don't
> agree with it, and I don't think it is particularly useful or
> beneficial. But it isn't really worth arguing about either, so let's
> just agree to disagree.

That's fine.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


[PATCH] ipaddress: fix build with musl libc

2016-05-22 Thread Kylie McClain
MIN() is defined within sys/param.h.

Signed-off-by: Kylie McClain 
---
 ip/ipaddress.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index 0692fba..df363b0 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
-- 
2.8.2



Re: [RESEND PATCH v7 17/21] IB/hns: Add QP operations support

2016-05-22 Thread Wei Hu (Xavier)

Hi, Doug Ledford
   In hns_roce_cmd_wait and hns_roce_cmd_poll, there are down 
seminal operations,

   So, there are not deadlock and conflict, right?

static int hns_roce_cmd_poll(struct hns_roce_dev *hr_dev, u64 in_param,
 u64 *out_param, unsigned long in_modifier,
 u8 op_modifier, u16 op, unsigned long timeout)
{
struct device *dev = _dev->pdev->dev;
u8 __iomem *hcr = hr_dev->cmd.hcr;
unsigned long end = 0;
u32 status = 0;
int ret;

down(_dev->cmd.poll_sem);

..// 

up (_dev->cmd.poll_sem);
return ret;
}

static int hns_roce_cmd_wait(struct hns_roce_dev *hr_dev, u64 in_param,
 u64 *out_param, unsigned long in_modifier,
 u8 op_modifier, u16 op, unsigned long timeout)
{
struct hns_roce_cmdq *cmd = _dev->cmd;
struct device *dev = _dev->pdev->dev;
struct hns_roce_cmd_context *context;
int ret = 0;

down(>event_sem);

..// 

up (>event_sem);
return ret;
}

int __hns_roce_cmd(struct hns_roce_dev *hr_dev, u64 in_param, u64 *out_param,
   unsigned long in_modifier, u8 op_modifier, u16 op,
   unsigned long timeout)
{
if (hr_dev->cmd.use_events)
return hns_roce_cmd_wait(hr_dev, in_param, out_param,
 in_modifier, op_modifier, op, timeout);
else
return hns_roce_cmd_poll(hr_dev, in_param, out_param,
 in_modifier, op_modifier, op, timeout);
}



Thanks.

Regards
Wei Hu

On 2016/5/14 5:52, Doug Ledford wrote:

On 05/09/2016 11:04 PM, Lijun Ou wrote:

+int __hns_roce_cmd(struct hns_roce_dev *hr_dev, u64 in_param, u64 *out_param,
+  unsigned long in_modifier, u8 op_modifier, u16 op,
+  unsigned long timeout);
+
+/* Invoke a command with no output parameter */
+static inline int hns_roce_cmd(struct hns_roce_dev *hr_dev, u64 in_param,
+  unsigned long in_modifier, u8 op_modifier,
+  u16 op, unsigned long timeout)
+{
+   return __hns_roce_cmd(hr_dev, in_param, NULL, in_modifier,
+ op_modifier, op, timeout);
+}
+
+/* Invoke a command with an output mailbox */
+static inline int hns_roce_cmd_box(struct hns_roce_dev *hr_dev, u64 in_param,
+  u64 out_param, unsigned long in_modifier,
+  u8 op_modifier, u16 op,
+  unsigned long timeout)
+{
+   return __hns_roce_cmd(hr_dev, in_param, _param, in_modifier,
+ op_modifier, op, timeout);
+}

This will make people scratch their head in the future.  You are using
two commands to map to one command without there being any locking
involved.  The typical convention for routine_1() -> __routine_1() is
that the __ version requires that it be called while locked, and the
version without a __ does the locking before calling it.  That way a
used can always know if they aren't currently holding the appropriate
lock, then they6 call routine_1() and if they are, they call
__routine_1() to avoid a deadlock.  I would suggest changing the name of
__hns_roce_cmd to hns_roce_cmd_box and completely remove the existing
hns_roce_cmd_box inline, and then change the hns_roce_cmd() inline to
directly call hns_roce_cmd_box() which will then select between
event/poll command sends.






Re: [RFC] NET: PHY: adds driver for Lantiq PHY11G

2016-05-22 Thread Andrew Lunn
On Sun, May 22, 2016 at 09:33:51PM +0200, Hauke Mehrtens wrote:
> Supports the Lantiq / Intel CHD 11G and 22E PHYs.
> These PHYs are also named PEF 7061, PEF 7071, PEF 7072
> 
> Signed-off-by: John Crispin 
> Signed-off-by: Hauke Mehrtens 
> ---
> 
> This is based on a driver from OpenWrt / LEDE. This is send as a RFC
> because the merge window is open now and it adds a new driver. This
> patch was cleaned up on request of Alexander.
> 
> 
>  .../devicetree/bindings/phy/phy-lanitq.txt | 216 +
>  drivers/net/phy/Kconfig|   6 +
>  drivers/net/phy/Makefile   |   1 +
>  drivers/net/phy/lantiq.c   | 269 
> +
>  4 files changed, 492 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/phy/phy-lanitq.txt
>  create mode 100644 drivers/net/phy/lantiq.c
> 
> diff --git a/Documentation/devicetree/bindings/phy/phy-lanitq.txt 
> b/Documentation/devicetree/bindings/phy/phy-lanitq.txt
> new file mode 100644
> index 000..d9746e8
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/phy/phy-lanitq.txt
> @@ -0,0 +1,216 @@
> +Lanitq PHY binding
> +
> +
> +This devicetree binding controls the lantiq ethernet phys led functionality.

Hi Hauke

You should CC: the device tree list.

> +
> +Example:
> + mdio@0 {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + compatible = "lantiq,xrx200-mdio";
> + phy5: ethernet-phy@5 {
> + reg = <0x1>;
> + compatible = "lantiq,phy11g", 
> "ethernet-phy-ieee802.3-c22";
> + };
> + phy11: ethernet-phy@11 {
> + reg = <0x11>;
> + compatible = "lantiq,phy22f", 
> "ethernet-phy-ieee802.3-c22";
> + lantiq,led2h = <0x00>;
> + lantiq,led2l = <0x03>;
> + };
> + phy12: ethernet-phy@12 {
> + reg = <0x12>;
> + compatible = "lantiq,phy22f", 
> "ethernet-phy-ieee802.3-c22";
> + lantiq,led1h = <0x00>;
> + lantiq,led1l = <0x03>;
> + };
> + phy13: ethernet-phy@13 {
> + reg = <0x13>;
> + compatible = "lantiq,phy22f", 
> "ethernet-phy-ieee802.3-c22";
> + lantiq,led2h = <0x00>;
> + lantiq,led2l = <0x03>;
> + };
> + phy14: ethernet-phy@14 {
> + reg = <0x14>;
> + compatible = "lantiq,phy22f", 
> "ethernet-phy-ieee802.3-c22";
> + lantiq,led1h = <0x00>;
> + lantiq,led1l = <0x03>;
> + };
> + };
> +
> +Register Description
> +
> +
> +LEDCH:
> +
> +Name Hardware Reset Value
> +LEDCH0x00C5
> +
> +| 15 |||||||  8 |
> +=
> +|RES |
> +=
> +
> +|  7 |||||||  0 |
> +=
> +|   FBF   |   SBF   |RES | NACS |
> +=
> +
> +FieldBitsTypeDescription
> +FBC  7:6 RW  Fast Blink Frequency
> + ---
> + 0x0 (00b) F02HZ 2 Hz blinking frequency
> + 0x1 (01b) F04HZ 4 Hz blinking frequency
> + 0x2 (10b) F08HZ 8 Hz blinking frequency
> + 0x3 (11b) F16HZ 16 Hz blinking frequency
> +
> +SBF  5:4 RW  Slow Blink Frequency
> + ---
> + 0x0 (00b) F02HZ 2 Hz blinking frequency
> + 0x1 (01b) F04HZ 4 Hz blinking frequency
> + 0x2 (10b) F08HZ 8 Hz blinking frequency
> + 0x3 (11b) F16HZ 16 Hz blinking frequency
> +
> +NACS 2:0 RW  Inverse of Scan Function
> + ---
> + 0x0 (000b) NONE No Function
> + 0x1 (001b) LINK Complex function enabled when link is up
> + 0x2 (010b) PDOWN Complex function enabled when device 
> is powered-down
> + 0x3 (011b) EEE Complex function enabled when device is 
> in EEE mode
> + 0x4 (100b) ANEG Complex function enabled when 
> auto-negotiation is running
> + 0x5 (101b) ABIST Complex function enabled when analog 
> self-test is running
> + 0x6 (110b) CDIAG Complex function enabled when cable 
> diagnostics are running
> + 0x7 (111b) TEST Complex function enabled when test mode 
> is running
> +
> +LEDCL:

I doubt the device tree maintainers will accept this. You don't
normally put 

[PATCH] net/qlge: Avoids recursive EEH error

2016-05-22 Thread Gavin Shan
One timer, whose handler keeps reading on MMIO register for EEH
core to detect error in time, is started when the PCI device driver
is loaded. MMIO register can't be accessed during PE reset in EEH
recovery. Otherwise, the unexpected recursive error is triggered.
The timer isn't closed that time if the interface isn't brought
up. So the unexpected recursive error is seen during EEH recovery
when the interface is down.

This avoids the unexpected recursive EEH error by closing the timer
in qlge_io_error_detected() before EEH PE reset unconditionally. The
timer is started unconditionally after EEH PE reset in qlge_io_resume().
Also, the timer should be closed unconditionally when the device is
removed from the system permanently in qlge_io_error_detected().

Reported-by: Shriya R. Kulkarni 
Signed-off-by: Gavin Shan 
---
 drivers/net/ethernet/qlogic/qlge/qlge_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c 
b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index 83d7210..fd5d1c9 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -4846,7 +4846,6 @@ static void ql_eeh_close(struct net_device *ndev)
}
 
/* Disabling the timer */
-   del_timer_sync(>timer);
ql_cancel_all_work_sync(qdev);
 
for (i = 0; i < qdev->rss_ring_count; i++)
@@ -4873,6 +4872,7 @@ static pci_ers_result_t qlge_io_error_detected(struct 
pci_dev *pdev,
return PCI_ERS_RESULT_CAN_RECOVER;
case pci_channel_io_frozen:
netif_device_detach(ndev);
+   del_timer_sync(>timer);
if (netif_running(ndev))
ql_eeh_close(ndev);
pci_disable_device(pdev);
@@ -4880,6 +4880,7 @@ static pci_ers_result_t qlge_io_error_detected(struct 
pci_dev *pdev,
case pci_channel_io_perm_failure:
dev_err(>dev,
"%s: pci_channel_io_perm_failure.\n", __func__);
+   del_timer_sync(>timer);
ql_eeh_close(ndev);
set_bit(QL_EEH_FATAL, >flags);
return PCI_ERS_RESULT_DISCONNECT;
-- 
2.1.0



Re: [RESEND PATCH v7 00/21] Add HiSilicon RoCE driver

2016-05-22 Thread Wei Hu (Xavier)

Hi, Doug Ledford

We will modify the license description of the beginning of the code file 
in  patch v8.

Thanks

Regards
Wei Hu

On 2016/5/14 5:09, Doug Ledford wrote:

On 05/09/2016 11:04 PM, Lijun Ou wrote:

The HiSilicon Network Substem is a long term evolution IP which is
supposed to be used in HiSilicon ICT SoCs. HNS (HiSilicon Network
Sybsystem) also has a hardware support of performing RDMA with
RoCEE.
The driver for HiSilicon RoCEE(RoCE Engine) is a platform driver and
will support mulitple versions of SOCs in future. This version of driver
is meant to support Hip06 SoC(which confirms to RoCEEv1 hardware
specifications).

Changes v6 -> v7:
1. modify some type of parameter, use bool replace the original type.
2. add the Signed-off-by signatures in the first patch.
3. delete the improper print sentence in hns_roce_create_eq.

One thing you should be aware of (but which you may not care about,
which is fine) is that if you want your driver to be available as part
of the OpenFabric Alliances OFED distribution, the license on the files
would need to be "GPL or BSD" (see the remainder of the RDMA stack for
examples).  If you don't want that license on your files, I'm fine with
that, and I won't hold up your submission based on the license.  If,
however, you *do* want your code to be shipable as part of OFED, it is
much easier to update the license on the files before you submit them
and third parties modify them.  If you want to change the license later,
you have to get every person that has made any substantive change to the
file to agree to the license change.







Re: [PATCH] ip_tunnel: enclose a code block in macro IS_ENABLED(CONFIG_IPV6)

2016-05-22 Thread Alexander Duyck
On Sun, May 22, 2016 at 3:53 PM, Haishuang Yan
 wrote:
> For ipv6 case, enclose the code block in macro IS_ENABLED(CONFIG_IPV6).
>
> Signed-off-by: Haishuang Yan 
> ---
>  net/ipv4/ip_tunnel.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
> index a69ed94..5f3c8de 100644
> --- a/net/ipv4/ip_tunnel.c
> +++ b/net/ipv4/ip_tunnel.c
> @@ -665,10 +665,13 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct 
> net_device *dev,
> if (skb->protocol == htons(ETH_P_IP)) {
> tos = inner_iph->tos;
> connected = false;
> -   } else if (skb->protocol == htons(ETH_P_IPV6)) {
> +   }
> +#if IS_ENABLED(CONFIG_IPV6)
> +   else if (skb->protocol == htons(ETH_P_IPV6)) {
> tos = ipv6_get_dsfield((const struct ipv6hdr 
> *)inner_iph);
> connected = false;
> }
> +#endif
> }
>
> init_tunnel_flow(, protocol, dst, tnl_params->saddr,
>

You could just place the "#if IS_ENABLED" block before the "} else if
(..) {" piece and the "#endif" before the closing brace and this
becomes much easier to look at.

Also is there any specific reason for needing to wrap this code?  It
isn't clear from your patch description, and from what I can tell the
code that you are wrapping should be defined in all cases since it is
defined in header files anyway.

- Alex


Re: Crashes in -next due to 'phy: add support for a reset-gpio specification'

2016-05-22 Thread Guenter Roeck

On 05/22/2016 11:21 AM, Uwe Kleine-König wrote:

Hello Guenter,

[extending Cc: a bit]

On Sun, May 22, 2016 at 08:41:23AM -0700, Guenter Roeck wrote:

On 05/22/2016 03:10 AM, Uwe Kleine-König wrote:

On Tue, May 17, 2016 at 09:37:11PM -0700, Guenter Roeck wrote:

[9.366256] libphy: ethoc-mdio: probed
[9.367389]  (null): could not attach to PHY
[9.368555]  (null): failed to probe MDIO bus
[9.371540] Unable to handle kernel paging request at virtual address 
001c
[9.371540]  pc = d0320926, ra = 903209d1
[9.375358] Oops: sig: 11 [#1]
[9.376081] PREEMPT
[9.377080] CPU: 0 PID: 1 Comm: swapper Not tainted 4.6.0-next-20160517 #1
[9.378397] task: d7c2c000 ti: d7c3 task.ti: d7c3
[9.379394] a00: 903209d1 d7c31bd0 d7fb5810 0001   
d7f45c00 d7c31bd0
[9.382298] a08:     00060100 d04b0c10 
d7f45dfc d7c31bb0
[9.385732] pc: d0320926, ps: 00060110, depc: 0018, excvaddr: 001c
[9.387061] lbeg: d0322e35, lend: d0322e57 lcount: , sar: 0011
[9.388173]
Stack: d7c31be0 00060700 d7f45c00 d7c31bd0 9021d509 d7c31c30 d7f45c00 
d0485dcc d0485dcc d7fb5810 d7c2c000  d7c31c30 d7f45c00 d025befc
d0485dcc d7c3 d7f45c34 d7c31bf0 9021c985 d7c31c50 d7f45c00 d7f45c34
[9.396652] Call Trace:
[9.397469]  [] __device_release_driver+0x7d/0x98
[9.398869]  [] device_release_driver+0x15/0x20
[9.400247]  [] bus_remove_device+0xc1/0xd4
[9.401569]  [] device_del+0x109/0x15c
[9.402794]  [] phy_mdio_device_remove+0xd/0x18
[9.404124]  [] mdiobus_unregister+0x40/0x5c
[9.405444]  [] ethoc_probe+0x534/0x5b8
[9.406742]  [] platform_drv_probe+0x28/0x48
[9.408122]  [] driver_probe_device+0x101/0x234
[9.409499]  [] __driver_attach+0x7d/0x98
[9.410809]  [] bus_for_each_dev+0x30/0x5c
[9.412104]  [] driver_attach+0x14/0x18
[9.413385]  [] bus_add_driver+0xc9/0x198
[9.414686]  [] driver_register+0x70/0xa0
[9.416001]  [] __platform_driver_register+0x24/0x28
[9.417463]  [] ethoc_driver_init+0x10/0x14
[9.418824]  [] do_one_initcall+0x80/0x1ac
[9.420083]  [] kernel_init_freeable+0x131/0x198
[9.421504]  [] kernel_init+0xc/0xb0
[9.422693]  [] ret_from_kernel_thread+0x8/0xc


Guenter, can you please test if the following patch fixes your setup:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 307f72a0f2e2..efa85fb31574 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1573,14 +1573,14 @@ static int phy_probe(struct device *dev)
int err = 0;
struct gpio_descs *reset_gpios;

-   phydev->drv = phydrv;
-
/* take phy out of reset */
reset_gpios = devm_gpiod_get_array_optional(dev, "reset",
GPIOD_OUT_LOW);
if (IS_ERR(reset_gpios))
return PTR_ERR(reset_gpios);

+   phydev->drv = phydrv;
+
/* Disable the interrupt if the PHY doesn't support it
 * but the interrupt is still a valid one
 */

This doesn't make your ethernet work, but at least the driver should
fail in a clean way.


I tried, but it still fails with exactly the same error, meaning
it still crashes with the same traceback.


This is strange. If probe fails the device should stay unbound and it
shouldn't enter the if block in __device_release_driver at all.


Together with enabling GPIOLIB this should put ethernet in a working
state again.



I am not exactly in favor of forcing GPIOLIB to be enabled for every
system with Ethernet support, just because a few of them may require
a gpio based phy reset. The same is true, really, for every other
driver using _optional gpiolib functions.

Personally, I think that the _optional functions in gpiolib _should_
return no error if gpiolib is not configured. After all, those
gpio pins _are_ supposed to be optional. gpiolib should be enabled
for affected configurations, ie on systems with gpio support which
do need the optional gpio pins. Forcing gpiolib to be enabled even
in systems with no gpio support seems to be a bit heavy-handed.
It just bloats the kernel on such systems with no added benefit.


That's wrong. The usage of gpio_get_optional and friends means that the
gpio is optional for the *driver*, that is there are devices that make
use of said gpio and others don't. For the devices where the gpio is
specified its usage is not optional. So it must not be ignored e.g. by
GPIOLIB=n configurations. IMHO the best would be to unconditionally


Hi Uwe,

I understand what you are saying, I just have a different perspective.
From your point of view, you consider it unacceptable if optional GPIO
pins are made unavailable by changing the configuration to GPIOLIB=n.
Ok, but that position has number of drawbacks.

Playing the system this way only works if the optional GPIO pins
are the only GPIO pins in use by the system. In almost 

[PATCH] ip_tunnel: enclose a code block in macro IS_ENABLED(CONFIG_IPV6)

2016-05-22 Thread Haishuang Yan
For ipv6 case, enclose the code block in macro IS_ENABLED(CONFIG_IPV6).

Signed-off-by: Haishuang Yan 
---
 net/ipv4/ip_tunnel.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index a69ed94..5f3c8de 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -665,10 +665,13 @@ void ip_tunnel_xmit(struct sk_buff *skb, struct 
net_device *dev,
if (skb->protocol == htons(ETH_P_IP)) {
tos = inner_iph->tos;
connected = false;
-   } else if (skb->protocol == htons(ETH_P_IPV6)) {
+   }
+#if IS_ENABLED(CONFIG_IPV6)
+   else if (skb->protocol == htons(ETH_P_IPV6)) {
tos = ipv6_get_dsfield((const struct ipv6hdr 
*)inner_iph);
connected = false;
}
+#endif
}
 
init_tunnel_flow(, protocol, dst, tnl_params->saddr,
-- 
1.8.3.1





Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

2016-05-22 Thread Francois Romieu
Lino Sanfilippo  :
[...]
> --- a/drivers/net/ethernet/arc/emac_main.c
> +++ b/drivers/net/ethernet/arc/emac_main.c
> @@ -159,12 +159,22 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>   unsigned int *txbd_dirty = >txbd_dirty;
>   struct arc_emac_bd *txbd = >txbd[*txbd_dirty];
>   struct buffer_state *tx_buff = >tx_buff[*txbd_dirty];
> - struct sk_buff *skb = tx_buff->skb;
>   unsigned int info = le32_to_cpu(txbd->info);
> + struct sk_buff *skb;
>  

Insert a smp_rmb() here to close one window for an outdated txbd_dirty value
(the "arc_emac_tx_clean wrote txbd_curr and issued smp_wmb" one).

Actually, insert smp_rmb() at the start of arc_emac_tx_clean() as it
does not need to be performed withing the loop and would penalize it.

Given the implicit smp barriers in the non-driver code, I consider
"arc_emac_tx_clean on one CPU does not read latest txbd_dirty value written
by previous arc_emac_tx_clean on different CPU" as utter onanism but
issueing smp_rmb() at the start of arc_emac_tx_clean() nails it as well.

> - if ((info & FOR_EMAC) || !txbd->data || !skb)
> + if (*txbd_dirty == priv->txbd_curr)
>   break;

Ok, it's just the "while (priv->txbd_dirty != priv->txbd_curr) {" loop
in disguise.

>  
> + /* Make sure curr pointer is consistent with info */
> + rmb();
> +
> + info = le32_to_cpu(txbd->info);
> +
> + if (info & FOR_EMAC)
> + break;

With proper ordering + barrier in arc_emac_tx() you can relax it to smp_rmb().

> +
> + skb = tx_buff->skb;
> +
>   if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) {
>   stats->tx_errors++;
>   stats->tx_dropped++;
> @@ -195,8 +205,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>   *txbd_dirty = (*txbd_dirty + 1) % TX_BD_NUM;
>   }
>  
> - /* Ensure that txbd_dirty is visible to tx() before checking
> -  * for queue stopped.
> + /* Ensure that txbd_dirty is visible to tx() and we see the most recent
> +  * value for txbd_curr.
>*/
>   smp_mb();
>  
> @@ -680,35 +690,29 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
> net_device *ndev)
>   dma_unmap_len_set(>tx_buff[*txbd_curr], len, len);
>  
>   priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
> -
> - /* Make sure pointer to data buffer is set */
> - wmb();
> + priv->tx_buff[*txbd_curr].skb = skb;
>  
>   skb_tx_timestamp(skb);
>  
>   *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);

No.

You need dma_wmb() after skb_tx_timestamp() to commit skb_tx_timestamp() [*]
and data = cpu_to_le32(addr).

[*] I doubt anyone want a dma_sync_single_...() here.

>  
> - /* Make sure info word is set */
> + /* 1. Make sure that with respect to tx_clean everything is set up
> +  * properly before we advance txbd_curr.
> +  * 2. Make sure writes to DMA descriptors are completed before we inform
> +  * the hardware.
> +  */
>   wmb();

Yes, either wmb() or smp_wmb() + dma_wmb().

>  
> - priv->tx_buff[*txbd_curr].skb = skb;
> -
>   /* Increment index to point to the next BD */
>   *txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
>  
> - /* Ensure that tx_clean() sees the new txbd_curr before
> -  * checking the queue status. This prevents an unneeded wake
> -  * of the queue in tx_clean().
> + /* Ensure we see the most recent value of txbd_dirty and tx_clean() sees
> +  * the updated value of txbd_curr.
>*/
>   smp_mb();

Nit: s/the most/a/

"a" as in "arc_emac_tx_clean() _is_ racing with arc_emac_tx"

>  
> - if (!arc_emac_tx_avail(priv)) {
> + if (!arc_emac_tx_avail(priv))
>   netif_stop_queue(ndev);
> - /* Refresh tx_dirty */
> - smp_mb();
> - if (arc_emac_tx_avail(priv))
> - netif_start_queue(ndev);
> - }

No.

I may sound like an old record but the revalidation part must be kept.

txbd_dirty may change in the arc_emac_tx_avail.. netif_stop_queue window
(the race requires a different CPU as arc_emac_tx() runs in locally
disabled BH context).

-- 
Ueimor


Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

2016-05-22 Thread Francois Romieu
Lino Sanfilippo  :
> On 21.05.2016 21:47, Francois Romieu wrote:
> > Shuyu Wei  :
> > [...]
> >> diff --git a/drivers/net/ethernet/arc/emac_main.c 
> >> b/drivers/net/ethernet/arc/emac_main.c
> >> index a3a9392..c2447b0 100644
> >> --- a/drivers/net/ethernet/arc/emac_main.c
> >> +++ b/drivers/net/ethernet/arc/emac_main.c
> >> @@ -686,6 +686,9 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
> >> net_device *ndev)
> >>  
> >> skb_tx_timestamp(skb);
> >>  
> >> +   /* Make sure timestamp is set */
> >> +   smp_wmb();
> > 
> > Should be dma_wmb() (see davem's message).
> > 
> > It's completely unrelated to SMP.
> > 
> 
> Its also completely unrelated to dma so I doubt that this is what davem meant.

It's related to dma: nobody wants the device to perform dma from memory
while the CPU is writing timestamp.

The device must enforce a commit of any network buffer memory write before
releasing control, i.e. before writing *info.

See 'git log -p --grep=dma_[rw]mb': it appears several times.

> As far as I understood he was referring to the dma descriptor.

If the wmb() between *info = ... and *txbd_curr = ... is replaced by a
dma_wmb(), the device will see a consistent descriptor when if performs
a DMA read after the CPU wrote into the mailbox (arc_reg_set(..., TXPL_MASK)).

Ok, I agree on this one.

However, it doesn't help with the (SMP) requirement that no CPU sees
the txbd_curr write before *info is written by the local CPU. Afaiui
dma_wmb() is too weak for this part. If we don't want wmb() here,
it will have to be dma_wmb() + smp_wmb().

-- 
Ueimor


Re: [PATCH] net/ethoc: fix null dereference on error exit path

2016-05-22 Thread Colin Ian King
On 22/05/16 20:42, Max Filippov wrote:
> Hi Colin,
> 
> On Sun, May 22, 2016 at 08:08:18PM +0100, Colin King wrote:
>> From: Colin Ian King 
>>
>> priv is assigned to NULL however all the error exit paths to label 'free'
>> dereference priv, causing a null pointer dereference.
>>
>> Examination of the code shows that all error exits via the 'free'
>> label path occur before priv is assigned to netdev_priv(netdev), hence
>> there is no need to call clk_disable_unprepare and so the location of
>> the label should be moved to free_netdev statement to avoid this null
>> dereference on priv.
> 
> This description is a bit inaccurate. Indeed all 'goto free' above the
> 'priv = netdev_priv(netdev);' need to skip 'if (priv->clk)' check, but
> there are two more 'goto free' below that line, and they look correct
> now, but after this patch they'll leave the clock enabled.
> 
Oops, I'll resend a corrected fix tomorrow


[PATCH net] bpf, inode: disallow userns mounts

2016-05-22 Thread Daniel Borkmann
Follow-up to commit e27f4a942a0e ("bpf: Use mount_nodev not mount_ns
to mount the bpf filesystem"), which removes the FS_USERNS_MOUNT flag.

The original idea was to have a per mountns instance instead of a
single global fs instance, but that didn't work out and we had to
switch to mount_nodev() model. The intent of that middle ground was
that we avoid users who don't play nice to create endless instances
of bpf fs which are difficult to control and discover from an admin
point of view, but at the same time it would have allowed us to be
more flexible with regard to namespaces.

Therefore, since we now did the switch to mount_nodev() as a fix
where individual instances are created, we also need to remove userns
mount flag along with it to avoid running into mentioned situation.
I don't expect any breakage at this early point in time with removing
the flag and we can revisit this later should the requirement for
this come up with future users. This and commit e27f4a942a0e have
been split to facilitate tracking should any of them run into the
unlikely case of causing a regression.

Fixes: b2197755b263 ("bpf: add support for persistent maps/progs")
Signed-off-by: Daniel Borkmann 
Acked-by: Hannes Frederic Sowa 
Acked-by: Alexei Starovoitov 
---
 kernel/bpf/inode.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/bpf/inode.c b/kernel/bpf/inode.c
index 04be702..318858e 100644
--- a/kernel/bpf/inode.c
+++ b/kernel/bpf/inode.c
@@ -365,7 +365,6 @@ static struct file_system_type bpf_fs_type = {
.name   = "bpf",
.mount  = bpf_mount,
.kill_sb= kill_litter_super,
-   .fs_flags   = FS_USERNS_MOUNT,
 };
 
 MODULE_ALIAS_FS("bpf");
-- 
1.9.3



Re: [RFC] NET: PHY: adds driver for Lantiq PHY11G

2016-05-22 Thread Hauke Mehrtens
On 05/22/2016 10:30 PM, Mathias Kresin wrote:
> Hi Hauke,
> 
> find my comments in-line.
> 
> Am 22.05.2016 um 21:33 schrieb Hauke Mehrtens:
>> Supports the Lantiq / Intel CHD 11G and 22E PHYs.
>> These PHYs are also named PEF 7061, PEF 7071, PEF 7072
>>
>> Signed-off-by: John Crispin 
>> Signed-off-by: Hauke Mehrtens 
>> ---
>>
>> This is based on a driver from OpenWrt / LEDE. This is send as a RFC
>> because the merge window is open now and it adds a new driver. This
>> patch was cleaned up on request of Alexander.
>>
>>
>>   .../devicetree/bindings/phy/phy-lanitq.txt | 216
>> +
> 
> Looks like a typo in the filename. lantiq != lanitq

Thanks for spotting this, I just copied it from OpenWrt. ;-)

> 
>>   drivers/net/phy/Kconfig|   6 +
>>   drivers/net/phy/Makefile   |   1 +
>>   drivers/net/phy/lantiq.c   | 269
>> +
>>   4 files changed, 492 insertions(+)
>>   create mode 100644 Documentation/devicetree/bindings/phy/phy-lanitq.txt
>>   create mode 100644 drivers/net/phy/lantiq.c
>>
>> diff --git a/Documentation/devicetree/bindings/phy/phy-lanitq.txt
>> b/Documentation/devicetree/bindings/phy/phy-lanitq.txt
>> new file mode 100644
>> index 000..d9746e8
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/phy/phy-lanitq.txt
>> @@ -0,0 +1,216 @@
>> +Lanitq PHY binding
> 
> Same typo as mentioned above.
> 

Will change this too.

.

>> +
>> +static void lantiq_gphy_of_reg_init(struct phy_device *phydev)
>> +{
>> +struct device_node *node = phydev->mdio.dev.of_node;
>> +u32 tmp;
>> +
>> +if (!IS_ENABLED(CONFIG_OF_MDIO))
>> +return;
>> +
>> +/* store the led values if one was passed by the device tree */
>> +if (!of_property_read_u32(node, "lantiq,ledch", ))
>> +phy_write_mmd_indirect(phydev, 0x1e0, MDIO_MMD_VEND2, tmp);
>> +
>> +if (!of_property_read_u32(node, "lantiq,ledcl", ))
>> +phy_write_mmd_indirect(phydev, 0x1e1, MDIO_MMD_VEND2, tmp);
>> +
>> +if (!of_property_read_u32(node, "lantiq,led0h", ))
>> +phy_write_mmd_indirect(phydev, 0x1e2, MDIO_MMD_VEND2, tmp);
>> +
>> +if (!of_property_read_u32(node, "lantiq,led0l", ))
>> +phy_write_mmd_indirect(phydev, 0x1e3, MDIO_MMD_VEND2, tmp);
>> +
>> +if (!of_property_read_u32(node, "lantiq,led1h", ))
>> +phy_write_mmd_indirect(phydev, 0x1e4, MDIO_MMD_VEND2, tmp);
>> +
>> +if (!of_property_read_u32(node, "lantiq,led1l", ))
>> +phy_write_mmd_indirect(phydev, 0x1e5, MDIO_MMD_VEND2, tmp);
>> +
>> +if (!of_property_read_u32(node, "lantiq,led2h", ))
>> +phy_write_mmd_indirect(phydev, 0x1e6, MDIO_MMD_VEND2, tmp);
>> +
>> +if (!of_property_read_u32(node, "lantiq,led2l", ))
>> +phy_write_mmd_indirect(phydev, 0x1e7, MDIO_MMD_VEND2, tmp);
>> +
>> +/* The LED3 is only available in PEF 7072 package. */
>> +if (!of_property_read_u32(node, "lantiq,led3h", ))
>> +phy_write_mmd_indirect(phydev, 0x1e8, MDIO_MMD_VEND2, tmp);
>> +
>> +if (!of_property_read_u32(node, "lantiq,led3l", ))
>> +phy_write_mmd_indirect(phydev, 0x1e9, MDIO_MMD_VEND2, tmp);
>> +}
>> +
>> +static int lantiq_gphy_config_init(struct phy_device *phydev)
>> +{
>> +int err;
>> +
>> +/* Mask all interrupts */
>> +err = phy_write(phydev, MII_VR9_11G_IMASK, 0);
>> +if (err)
>> +return err;
>> +
>> +/* Clear all pending interrupts */
>> +phy_read(phydev, MII_VR9_11G_ISTAT);
>> +
>> +phy_write_mmd_indirect(phydev, 0x1e0, MDIO_MMD_VEND2, 0xc5);
>> +phy_write_mmd_indirect(phydev, 0x1e1, MDIO_MMD_VEND2, 0x67);
> 
> Any specific reason why the complex functions are enabled by default?

This is the same configuration the vendor SDK uses.

>> +phy_write_mmd_indirect(phydev, 0x1e2, MDIO_MMD_VEND2, 0x42);
>> +phy_write_mmd_indirect(phydev, 0x1e3, MDIO_MMD_VEND2, 0x10);
>> +phy_write_mmd_indirect(phydev, 0x1e4, MDIO_MMD_VEND2, 0x70);
>> +phy_write_mmd_indirect(phydev, 0x1e5, MDIO_MMD_VEND2, 0x03);
>> +phy_write_mmd_indirect(phydev, 0x1e6, MDIO_MMD_VEND2, 0x20);
>> +phy_write_mmd_indirect(phydev, 0x1e7, MDIO_MMD_VEND2, 0x00);
>> +phy_write_mmd_indirect(phydev, 0x1e8, MDIO_MMD_VEND2, 0x40);
>> +phy_write_mmd_indirect(phydev, 0x1e9, MDIO_MMD_VEND2, 0x20);
> 
> I would suggest to use the same blink/permanent on configuration for all
> led pins (as it is in LEDE/OpenWrt):
> 
> Constant On: 10/100/1000MBit
> Blink Fast: None
> Blink Slow: None
> Pulse: TX/RX
> 
> I'm aware of only one CPE that uses more than one led for status
> indication. All other have a single led attached to any of the pins.
> 
> This way it's only required to change the default configuration via the
> device tree bindings for the minority of the devices.

Ok, I am not aware on how all the boards are looking like. If most of
the boards only use on led it makes sense 

Re: [RFC] NET: PHY: adds driver for Lantiq PHY11G

2016-05-22 Thread Mathias Kresin

Hi Hauke,

find my comments in-line.

Am 22.05.2016 um 21:33 schrieb Hauke Mehrtens:

Supports the Lantiq / Intel CHD 11G and 22E PHYs.
These PHYs are also named PEF 7061, PEF 7071, PEF 7072

Signed-off-by: John Crispin 
Signed-off-by: Hauke Mehrtens 
---

This is based on a driver from OpenWrt / LEDE. This is send as a RFC
because the merge window is open now and it adds a new driver. This
patch was cleaned up on request of Alexander.


  .../devicetree/bindings/phy/phy-lanitq.txt | 216 +


Looks like a typo in the filename. lantiq != lanitq


  drivers/net/phy/Kconfig|   6 +
  drivers/net/phy/Makefile   |   1 +
  drivers/net/phy/lantiq.c   | 269 +
  4 files changed, 492 insertions(+)
  create mode 100644 Documentation/devicetree/bindings/phy/phy-lanitq.txt
  create mode 100644 drivers/net/phy/lantiq.c

diff --git a/Documentation/devicetree/bindings/phy/phy-lanitq.txt 
b/Documentation/devicetree/bindings/phy/phy-lanitq.txt
new file mode 100644
index 000..d9746e8
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/phy-lanitq.txt
@@ -0,0 +1,216 @@
+Lanitq PHY binding


Same typo as mentioned above.


+
+
+This devicetree binding controls the lantiq ethernet phys led functionality.
+
+Example:
+   mdio@0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "lantiq,xrx200-mdio";
+   phy5: ethernet-phy@5 {
+   reg = <0x1>;
+   compatible = "lantiq,phy11g", 
"ethernet-phy-ieee802.3-c22";
+   };
+   phy11: ethernet-phy@11 {
+   reg = <0x11>;
+   compatible = "lantiq,phy22f", 
"ethernet-phy-ieee802.3-c22";
+   lantiq,led2h = <0x00>;
+   lantiq,led2l = <0x03>;
+   };
+   phy12: ethernet-phy@12 {
+   reg = <0x12>;
+   compatible = "lantiq,phy22f", 
"ethernet-phy-ieee802.3-c22";
+   lantiq,led1h = <0x00>;
+   lantiq,led1l = <0x03>;
+   };
+   phy13: ethernet-phy@13 {
+   reg = <0x13>;
+   compatible = "lantiq,phy22f", 
"ethernet-phy-ieee802.3-c22";
+   lantiq,led2h = <0x00>;
+   lantiq,led2l = <0x03>;
+   };
+   phy14: ethernet-phy@14 {
+   reg = <0x14>;
+   compatible = "lantiq,phy22f", 
"ethernet-phy-ieee802.3-c22";
+   lantiq,led1h = <0x00>;
+   lantiq,led1l = <0x03>;
+   };
+   };
+
+Register Description
+
+
+LEDCH:
+
+Name   Hardware Reset Value
+LEDCH  0x00C5
+
+| 15 |||||||  8 |
+=
+|  RES |
+=
+
+|  7 |||||||  0 |
+=
+|   FBF   |   SBF   |RES | NACS |
+=
+
+Field  BitsTypeDescription
+FBC7:6 RW  Fast Blink Frequency
+   ---
+   0x0 (00b) F02HZ 2 Hz blinking frequency
+   0x1 (01b) F04HZ 4 Hz blinking frequency
+   0x2 (10b) F08HZ 8 Hz blinking frequency
+   0x3 (11b) F16HZ 16 Hz blinking frequency
+
+SBF5:4 RW  Slow Blink Frequency
+   ---
+   0x0 (00b) F02HZ 2 Hz blinking frequency
+   0x1 (01b) F04HZ 4 Hz blinking frequency
+   0x2 (10b) F08HZ 8 Hz blinking frequency
+   0x3 (11b) F16HZ 16 Hz blinking frequency
+
+NACS   2:0 RW  Inverse of Scan Function
+   ---
+   0x0 (000b) NONE No Function
+   0x1 (001b) LINK Complex function enabled when link is up
+   0x2 (010b) PDOWN Complex function enabled when device 
is powered-down
+   0x3 (011b) EEE Complex function enabled when device is 
in EEE mode
+   0x4 (100b) ANEG Complex function enabled when 
auto-negotiation is running
+   0x5 (101b) ABIST Complex function enabled when analog 
self-test is running
+   0x6 (110b) CDIAG Complex function enabled when cable 
diagnostics are running
+   0x7 (111b) TEST Complex function enabled when test mode 
is running
+
+LEDCL:
+
+Name   Hardware Reset Value
+LEDCL  0x0067
+
+| 15 |||||||  8 |
+=
+|  RES 

Re: [PATCH] net/ethoc: fix null dereference on error exit path

2016-05-22 Thread Max Filippov
Hi Colin,

On Sun, May 22, 2016 at 08:08:18PM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> priv is assigned to NULL however all the error exit paths to label 'free'
> dereference priv, causing a null pointer dereference.
> 
> Examination of the code shows that all error exits via the 'free'
> label path occur before priv is assigned to netdev_priv(netdev), hence
> there is no need to call clk_disable_unprepare and so the location of
> the label should be moved to free_netdev statement to avoid this null
> dereference on priv.

This description is a bit inaccurate. Indeed all 'goto free' above the
'priv = netdev_priv(netdev);' need to skip 'if (priv->clk)' check, but
there are two more 'goto free' below that line, and they look correct
now, but after this patch they'll leave the clock enabled.

-- 
Thanks.
-- Max


[RFC] NET: PHY: adds driver for Lantiq PHY11G

2016-05-22 Thread Hauke Mehrtens
Supports the Lantiq / Intel CHD 11G and 22E PHYs.
These PHYs are also named PEF 7061, PEF 7071, PEF 7072

Signed-off-by: John Crispin 
Signed-off-by: Hauke Mehrtens 
---

This is based on a driver from OpenWrt / LEDE. This is send as a RFC
because the merge window is open now and it adds a new driver. This
patch was cleaned up on request of Alexander.


 .../devicetree/bindings/phy/phy-lanitq.txt | 216 +
 drivers/net/phy/Kconfig|   6 +
 drivers/net/phy/Makefile   |   1 +
 drivers/net/phy/lantiq.c   | 269 +
 4 files changed, 492 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/phy/phy-lanitq.txt
 create mode 100644 drivers/net/phy/lantiq.c

diff --git a/Documentation/devicetree/bindings/phy/phy-lanitq.txt 
b/Documentation/devicetree/bindings/phy/phy-lanitq.txt
new file mode 100644
index 000..d9746e8
--- /dev/null
+++ b/Documentation/devicetree/bindings/phy/phy-lanitq.txt
@@ -0,0 +1,216 @@
+Lanitq PHY binding
+
+
+This devicetree binding controls the lantiq ethernet phys led functionality.
+
+Example:
+   mdio@0 {
+   #address-cells = <1>;
+   #size-cells = <0>;
+   compatible = "lantiq,xrx200-mdio";
+   phy5: ethernet-phy@5 {
+   reg = <0x1>;
+   compatible = "lantiq,phy11g", 
"ethernet-phy-ieee802.3-c22";
+   };
+   phy11: ethernet-phy@11 {
+   reg = <0x11>;
+   compatible = "lantiq,phy22f", 
"ethernet-phy-ieee802.3-c22";
+   lantiq,led2h = <0x00>;
+   lantiq,led2l = <0x03>;
+   };
+   phy12: ethernet-phy@12 {
+   reg = <0x12>;
+   compatible = "lantiq,phy22f", 
"ethernet-phy-ieee802.3-c22";
+   lantiq,led1h = <0x00>;
+   lantiq,led1l = <0x03>;
+   };
+   phy13: ethernet-phy@13 {
+   reg = <0x13>;
+   compatible = "lantiq,phy22f", 
"ethernet-phy-ieee802.3-c22";
+   lantiq,led2h = <0x00>;
+   lantiq,led2l = <0x03>;
+   };
+   phy14: ethernet-phy@14 {
+   reg = <0x14>;
+   compatible = "lantiq,phy22f", 
"ethernet-phy-ieee802.3-c22";
+   lantiq,led1h = <0x00>;
+   lantiq,led1l = <0x03>;
+   };
+   };
+
+Register Description
+
+
+LEDCH:
+
+Name   Hardware Reset Value
+LEDCH  0x00C5
+
+| 15 |||||||  8 |
+=
+|  RES |
+=
+
+|  7 |||||||  0 |
+=
+|   FBF   |   SBF   |RES | NACS |
+=
+
+Field  BitsTypeDescription
+FBC7:6 RW  Fast Blink Frequency
+   ---
+   0x0 (00b) F02HZ 2 Hz blinking frequency
+   0x1 (01b) F04HZ 4 Hz blinking frequency
+   0x2 (10b) F08HZ 8 Hz blinking frequency
+   0x3 (11b) F16HZ 16 Hz blinking frequency
+
+SBF5:4 RW  Slow Blink Frequency
+   ---
+   0x0 (00b) F02HZ 2 Hz blinking frequency
+   0x1 (01b) F04HZ 4 Hz blinking frequency
+   0x2 (10b) F08HZ 8 Hz blinking frequency
+   0x3 (11b) F16HZ 16 Hz blinking frequency
+
+NACS   2:0 RW  Inverse of Scan Function
+   ---
+   0x0 (000b) NONE No Function
+   0x1 (001b) LINK Complex function enabled when link is up
+   0x2 (010b) PDOWN Complex function enabled when device 
is powered-down
+   0x3 (011b) EEE Complex function enabled when device is 
in EEE mode
+   0x4 (100b) ANEG Complex function enabled when 
auto-negotiation is running
+   0x5 (101b) ABIST Complex function enabled when analog 
self-test is running
+   0x6 (110b) CDIAG Complex function enabled when cable 
diagnostics are running
+   0x7 (111b) TEST Complex function enabled when test mode 
is running
+
+LEDCL:
+
+Name   Hardware Reset Value
+LEDCL  0x0067
+
+| 15 |||||||  8 |
+=
+|  RES |
+=
+
+|  7 |||||||  0 |
+=
+|RES | SCAN |RES |

[PATCH] net/ethoc: fix null dereference on error exit path

2016-05-22 Thread Colin King
From: Colin Ian King 

priv is assigned to NULL however all the error exit paths to label 'free'
dereference priv, causing a null pointer dereference.

Examination of the code shows that all error exits via the 'free'
label path occur before priv is assigned to netdev_priv(netdev), hence
there is no need to call clk_disable_unprepare and so the location of
the label should be moved to free_netdev statement to avoid this null
dereference on priv.

Fixes issue found by CoverityScan, CID#113260

Signed-off-by: Colin Ian King 
---
 drivers/net/ethernet/ethoc.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/ethoc.c b/drivers/net/ethernet/ethoc.c
index 41b0106..96403a4 100644
--- a/drivers/net/ethernet/ethoc.c
+++ b/drivers/net/ethernet/ethoc.c
@@ -1241,9 +1241,10 @@ error2:
 error:
mdiobus_unregister(priv->mdio);
mdiobus_free(priv->mdio);
-free:
+
if (priv->clk)
clk_disable_unprepare(priv->clk);
+free:
free_netdev(netdev);
 out:
return ret;
-- 
2.8.1



Re: Crashes in -next due to 'phy: add support for a reset-gpio specification'

2016-05-22 Thread Uwe Kleine-König
Hello Guenter,

[extending Cc: a bit]

On Sun, May 22, 2016 at 08:41:23AM -0700, Guenter Roeck wrote:
> On 05/22/2016 03:10 AM, Uwe Kleine-König wrote:
> >On Tue, May 17, 2016 at 09:37:11PM -0700, Guenter Roeck wrote:
> >>[9.366256] libphy: ethoc-mdio: probed
> >>[9.367389]  (null): could not attach to PHY
> >>[9.368555]  (null): failed to probe MDIO bus
> >>[9.371540] Unable to handle kernel paging request at virtual address 
> >>001c
> >>[9.371540]  pc = d0320926, ra = 903209d1
> >>[9.375358] Oops: sig: 11 [#1]
> >>[9.376081] PREEMPT
> >>[9.377080] CPU: 0 PID: 1 Comm: swapper Not tainted 4.6.0-next-20160517 
> >>#1
> >>[9.378397] task: d7c2c000 ti: d7c3 task.ti: d7c3
> >>[9.379394] a00: 903209d1 d7c31bd0 d7fb5810 0001   
> >>d7f45c00 d7c31bd0
> >>[9.382298] a08:     00060100 d04b0c10 
> >>d7f45dfc d7c31bb0
> >>[9.385732] pc: d0320926, ps: 00060110, depc: 0018, excvaddr: 
> >>001c
> >>[9.387061] lbeg: d0322e35, lend: d0322e57 lcount: , sar: 
> >>0011
> >>[9.388173]
> >>Stack: d7c31be0 00060700 d7f45c00 d7c31bd0 9021d509 d7c31c30 d7f45c00 
> >>
> >>d0485dcc d0485dcc d7fb5810 d7c2c000  d7c31c30 d7f45c00 
> >> d025befc
> >>d0485dcc d7c3 d7f45c34 d7c31bf0 9021c985 d7c31c50 d7f45c00 
> >> d7f45c34
> >>[9.396652] Call Trace:
> >>[9.397469]  [] __device_release_driver+0x7d/0x98
> >>[9.398869]  [] device_release_driver+0x15/0x20
> >>[9.400247]  [] bus_remove_device+0xc1/0xd4
> >>[9.401569]  [] device_del+0x109/0x15c
> >>[9.402794]  [] phy_mdio_device_remove+0xd/0x18
> >>[9.404124]  [] mdiobus_unregister+0x40/0x5c
> >>[9.405444]  [] ethoc_probe+0x534/0x5b8
> >>[9.406742]  [] platform_drv_probe+0x28/0x48
> >>[9.408122]  [] driver_probe_device+0x101/0x234
> >>[9.409499]  [] __driver_attach+0x7d/0x98
> >>[9.410809]  [] bus_for_each_dev+0x30/0x5c
> >>[9.412104]  [] driver_attach+0x14/0x18
> >>[9.413385]  [] bus_add_driver+0xc9/0x198
> >>[9.414686]  [] driver_register+0x70/0xa0
> >>[9.416001]  [] __platform_driver_register+0x24/0x28
> >>[9.417463]  [] ethoc_driver_init+0x10/0x14
> >>[9.418824]  [] do_one_initcall+0x80/0x1ac
> >>[9.420083]  [] kernel_init_freeable+0x131/0x198
> >>[9.421504]  [] kernel_init+0xc/0xb0
> >>[9.422693]  [] ret_from_kernel_thread+0x8/0xc
> >
> >Guenter, can you please test if the following patch fixes your setup:
> >
> >diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
> >index 307f72a0f2e2..efa85fb31574 100644
> >--- a/drivers/net/phy/phy_device.c
> >+++ b/drivers/net/phy/phy_device.c
> >@@ -1573,14 +1573,14 @@ static int phy_probe(struct device *dev)
> > int err = 0;
> > struct gpio_descs *reset_gpios;
> >
> >-phydev->drv = phydrv;
> >-
> > /* take phy out of reset */
> > reset_gpios = devm_gpiod_get_array_optional(dev, "reset",
> > GPIOD_OUT_LOW);
> > if (IS_ERR(reset_gpios))
> > return PTR_ERR(reset_gpios);
> >
> >+phydev->drv = phydrv;
> >+
> > /* Disable the interrupt if the PHY doesn't support it
> >  * but the interrupt is still a valid one
> >  */
> >
> >This doesn't make your ethernet work, but at least the driver should
> >fail in a clean way.
> >
> I tried, but it still fails with exactly the same error, meaning
> it still crashes with the same traceback.

This is strange. If probe fails the device should stay unbound and it
shouldn't enter the if block in __device_release_driver at all.

> >Together with enabling GPIOLIB this should put ethernet in a working
> >state again.
> >
> 
> I am not exactly in favor of forcing GPIOLIB to be enabled for every
> system with Ethernet support, just because a few of them may require
> a gpio based phy reset. The same is true, really, for every other
> driver using _optional gpiolib functions.
> 
> Personally, I think that the _optional functions in gpiolib _should_
> return no error if gpiolib is not configured. After all, those
> gpio pins _are_ supposed to be optional. gpiolib should be enabled
> for affected configurations, ie on systems with gpio support which
> do need the optional gpio pins. Forcing gpiolib to be enabled even
> in systems with no gpio support seems to be a bit heavy-handed.
> It just bloats the kernel on such systems with no added benefit.

That's wrong. The usage of gpio_get_optional and friends means that the
gpio is optional for the *driver*, that is there are devices that make
use of said gpio and others don't. For the devices where the gpio is
specified its usage is not optional. So it must not be ignored e.g. by
GPIOLIB=n configurations. IMHO the best would be to unconditionally
enable that part of GPIOLIB such that the _optional variants do the
following with GPIOLIB=n:

if a GPIO is specified:
return -ENOSYS
 

[iproute2 PATCH 1/1] tc fix ife late binding

2016-05-22 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

following late binding didn't work

sudo tc actions add action ife encode \
type 0xDEAD allow mark dst 02:15:15:15:15:15 index 1

Signed-off-by: Jamal Hadi Salim 
---
 tc/m_ife.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tc/m_ife.c b/tc/m_ife.c
index ed01ff7..c8ae04d 100644
--- a/tc/m_ife.c
+++ b/tc/m_ife.c
@@ -182,6 +182,7 @@ static int parse_ife(struct action_util *a, int *argc_p, 
char ***argv_p,
fprintf(stderr, "ife: Illegal \"index\"\n");
return -1;
}
+   ok++;
argc--;
argv++;
}
-- 
1.9.1



Re: [iproute2 1/1] man: tc-ife.8: man page for ife action

2016-05-22 Thread Jamal Hadi Salim

On 16-05-02 07:40 PM, Jamal Hadi Salim wrote:

On 16-05-02 06:10 PM, Stephen Hemminger wrote:

On Sat, 30 Apr 2016 06:58:04 -0400
Jamal Hadi Salim  wrote:


From: Lucas Bates 

Signed-off-by: Lucas Bates 
Signed-off-by: Jamal Hadi Salim 


Still waiting for a you to respond to my earlier suggestions about
tc-ife.



I did respond.




Stephen,
I pulled the latest iproute2 and the IFE manpage is missing.
It is in this email thread - can you please apply it? Trying
to get Lucas to send more; do it just to encourage him ;->

cheers,
jamal



Re: [RFC net-next] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump

2016-05-22 Thread Jamal Hadi Salim

On 16-05-20 01:50 AM, Eric Dumazet wrote:


It wont be inaccurate (no drift), as the writers hold the qdisc
spinlock.

It it the same with say IP/TCP SNMP counters :

When an incoming frame is handled, it might change many SNMP counters,
but an SNMP agent might fetch the whole set of SNMP values in the middle
of the changes.

IpInReceives
IpInDelivers
TcpInSegs
IpExtInOctets
IpExtInNoECTPkts
...

The only 'problem' can be a off-by-one (or off-by-bytes-in-the-packet)
transient error, that all SNMP agents are normally handling just fine.




I think qdisc stats being off by a bit are ok - hence the rcu
suggestion. I would have problems with action stats being off
(where they are used for billing).

cheers,
jamal


[net PATCH 1/1] net sched actions: policer missing timestamp processing

2016-05-22 Thread Jamal Hadi Salim
From: Jamal Hadi Salim 

Policer was not dumping or updating timestamps

Signed-off-by: Jamal Hadi Salim 
---
 include/uapi/linux/pkt_cls.h |  4 +++-
 net/sched/act_police.c   | 10 ++
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index eba5914..f4297c8 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -145,6 +145,8 @@ enum {
TCA_POLICE_PEAKRATE,
TCA_POLICE_AVRATE,
TCA_POLICE_RESULT,
+   TCA_POLICE_TM,
+   TCA_POLICE_PAD,
__TCA_POLICE_MAX
 #define TCA_POLICE_RESULT TCA_POLICE_RESULT
 };
@@ -173,7 +175,7 @@ enum {
TCA_U32_DIVISOR,
TCA_U32_SEL,
TCA_U32_POLICE,
-   TCA_U32_ACT,   
+   TCA_U32_ACT,
TCA_U32_INDEV,
TCA_U32_PCNT,
TCA_U32_MARK,
diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 330f14e..be2c139 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -239,6 +239,9 @@ override:
police->tcfp_t_c = ktime_get_ns();
police->tcf_index = parm->index ? parm->index :
tcf_hash_new_index(tn);
+   police->tcf_tm.install = jiffies;
+   police->tcf_tm.lastuse = jiffies;
+   police->tcf_tm.firstuse = 0;
h = tcf_hash(police->tcf_index, POL_TAB_MASK);
spin_lock_bh(>lock);
hlist_add_head(>tcf_head, >htab[h]);
@@ -268,6 +271,7 @@ static int tcf_act_police(struct sk_buff *skb, const struct 
tc_action *a,
spin_lock(>tcf_lock);
 
bstats_update(>tcf_bstats, skb);
+   tcf_lastuse_update(>tcf_tm);
 
if (police->tcfp_ewma_rate &&
police->tcf_rate_est.bps >= police->tcfp_ewma_rate) {
@@ -327,6 +331,7 @@ tcf_act_police_dump(struct sk_buff *skb, struct tc_action 
*a, int bind, int ref)
.refcnt = police->tcf_refcnt - ref,
.bindcnt = police->tcf_bindcnt - bind,
};
+   struct tcf_t t;
 
if (police->rate_present)
psched_ratecfg_getrate(, >rate);
@@ -340,6 +345,11 @@ tcf_act_police_dump(struct sk_buff *skb, struct tc_action 
*a, int bind, int ref)
if (police->tcfp_ewma_rate &&
nla_put_u32(skb, TCA_POLICE_AVRATE, police->tcfp_ewma_rate))
goto nla_put_failure;
+
+   tcf_tm_dump(, >tcf_tm);
+   if (nla_put_64bit(skb, TCA_POLICE_TM, sizeof(t), , TCA_POLICE_PAD))
+   goto nla_put_failure;
+
return skb->len;
 
 nla_put_failure:
-- 
1.9.1



Re: [net-next PATCH 0/2] net sched small OCD cleanups

2016-05-22 Thread Jamal Hadi Salim

On 16-05-19 02:28 PM, David Miller wrote:

From: Jamal Hadi Salim 
Date: Tue, 17 May 2016 17:19:13 -0400


I have been staring at these small OCD cleanups for a while.
Got cycles to do them as prep for another patch i am working on.
Indentation/stylistic and consistency updates


This series doesn't apply cleanly to the current tree.



Will resend when you open net-next.

cheers,
jamal


Re: [net-next RFC 1/1] net sched actions: introduce timestamp for first time used

2016-05-22 Thread Jamal Hadi Salim

On 16-05-20 01:16 AM, Cong Wang wrote:


One nit: It is better if you can swap 'stm' with 'dtm',
it looks odd to put destination on the right side.



Will make the change for when net-next opens up.

cheers,
jamal


Re: [RFC net-next] net: sched: do not acquire qdisc spinlock in qdisc/class stats dump

2016-05-22 Thread Jamal Hadi Salim

On 16-05-19 08:35 AM, Eric Dumazet wrote:

From: Eric Dumazet 

Large tc dumps (tc -s {qdisc|class} sh dev ethX) done by Google BwE host
agent [1] are problematic at scale :

For each qdisc/class found in the dump, we currently lock the root qdisc
spinlock in order to get stats. Sampling stats every 5 seconds from
thousands of HTB classes is a challenge when the root qdisc spinlock is
under high pressure.



Good stuff.
There are other optimization we could do in such large scale dumps
(such as not dumping something that hasnt been updated)
Could we have changed it to be rcu?


These stats are using u64 or u32 fields, so reading integral values
should not prevent writers from doing concurrent updates if the kernel
arch is a 64bit one.



Meaning it wont work on other archs? is atomic read not dependable
on other setups?



Being able to atomically fetch all counters like packets and bytes sent
at the expense of interfering in fast path (queue and dequeue packets)
is simply not worth the pain, as the values are generally stale after 1
usec.

These lock acquisitions slow down the fast path by 10 to 20 %




Acked-by: Jamal Hadi Salim 

cheers,
jamal


Re: Crashes in -next due to 'phy: add support for a reset-gpio specification'

2016-05-22 Thread Guenter Roeck

Hi Uwe,

On 05/22/2016 03:10 AM, Uwe Kleine-König wrote:

Hello,

On Tue, May 17, 2016 at 09:37:11PM -0700, Guenter Roeck wrote:

[9.366256] libphy: ethoc-mdio: probed
[9.367389]  (null): could not attach to PHY
[9.368555]  (null): failed to probe MDIO bus
[9.371540] Unable to handle kernel paging request at virtual address 
001c
[9.371540]  pc = d0320926, ra = 903209d1
[9.375358] Oops: sig: 11 [#1]
[9.376081] PREEMPT
[9.377080] CPU: 0 PID: 1 Comm: swapper Not tainted 4.6.0-next-20160517 #1
[9.378397] task: d7c2c000 ti: d7c3 task.ti: d7c3
[9.379394] a00: 903209d1 d7c31bd0 d7fb5810 0001   
d7f45c00 d7c31bd0
[9.382298] a08:     00060100 d04b0c10 
d7f45dfc d7c31bb0
[9.385732] pc: d0320926, ps: 00060110, depc: 0018, excvaddr: 001c
[9.387061] lbeg: d0322e35, lend: d0322e57 lcount: , sar: 0011
[9.388173]
Stack: d7c31be0 00060700 d7f45c00 d7c31bd0 9021d509 d7c31c30 d7f45c00 
d0485dcc d0485dcc d7fb5810 d7c2c000  d7c31c30 d7f45c00 d025befc
d0485dcc d7c3 d7f45c34 d7c31bf0 9021c985 d7c31c50 d7f45c00 d7f45c34
[9.396652] Call Trace:
[9.397469]  [] __device_release_driver+0x7d/0x98
[9.398869]  [] device_release_driver+0x15/0x20
[9.400247]  [] bus_remove_device+0xc1/0xd4
[9.401569]  [] device_del+0x109/0x15c
[9.402794]  [] phy_mdio_device_remove+0xd/0x18
[9.404124]  [] mdiobus_unregister+0x40/0x5c
[9.405444]  [] ethoc_probe+0x534/0x5b8
[9.406742]  [] platform_drv_probe+0x28/0x48
[9.408122]  [] driver_probe_device+0x101/0x234
[9.409499]  [] __driver_attach+0x7d/0x98
[9.410809]  [] bus_for_each_dev+0x30/0x5c
[9.412104]  [] driver_attach+0x14/0x18
[9.413385]  [] bus_add_driver+0xc9/0x198
[9.414686]  [] driver_register+0x70/0xa0
[9.416001]  [] __platform_driver_register+0x24/0x28
[9.417463]  [] ethoc_driver_init+0x10/0x14
[9.418824]  [] do_one_initcall+0x80/0x1ac
[9.420083]  [] kernel_init_freeable+0x131/0x198
[9.421504]  [] kernel_init+0xc/0xb0
[9.422693]  [] ret_from_kernel_thread+0x8/0xc


Guenter, can you please test if the following patch fixes your setup:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 307f72a0f2e2..efa85fb31574 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1573,14 +1573,14 @@ static int phy_probe(struct device *dev)
int err = 0;
struct gpio_descs *reset_gpios;

-   phydev->drv = phydrv;
-
/* take phy out of reset */
reset_gpios = devm_gpiod_get_array_optional(dev, "reset",
GPIOD_OUT_LOW);
if (IS_ERR(reset_gpios))
return PTR_ERR(reset_gpios);

+   phydev->drv = phydrv;
+
/* Disable the interrupt if the PHY doesn't support it
 * but the interrupt is still a valid one
 */

This doesn't make your ethernet work, but at least the driver should
fail in a clean way.


I tried, but it still fails with exactly the same error, meaning
it still crashes with the same traceback.


Together with enabling GPIOLIB this should put ethernet in a working
state again.



I am not exactly in favor of forcing GPIOLIB to be enabled for every
system with Ethernet support, just because a few of them may require
a gpio based phy reset. The same is true, really, for every other
driver using _optional gpiolib functions.

Personally, I think that the _optional functions in gpiolib _should_
return no error if gpiolib is not configured. After all, those
gpio pins _are_ supposed to be optional. gpiolib should be enabled
for affected configurations, ie on systems with gpio support which
do need the optional gpio pins. Forcing gpiolib to be enabled even
in systems with no gpio support seems to be a bit heavy-handed.
It just bloats the kernel on such systems with no added benefit.

Thanks,
Guenter



Re: [PATCH v7 net-next 01/16] gso: Remove arbitrary checks for unsupported GSO

2016-05-22 Thread Hannes Frederic Sowa
On 22.05.2016 09:44, Michael S. Tsirkin wrote:
> On Sat, May 21, 2016 at 03:02:30AM +0200, Hannes Frederic Sowa wrote:
>> Hello,
>>
>> On 18.05.2016 18:06, Tom Herbert wrote:
>>> In several gso_segment functions there are checks of gso_type against
>>> a seemingly arbitrary list of SKB_GSO_* flags. This seems like an
>>> attempt to identify unsupported GSO types, but since the stack is
>>> the one that set these GSO types in the first place this seems
>>> unnecessary to do. If a combination isn't valid in the first
>>> place that stack should not allow setting it.
>>>
>>> This is a code simplication especially for add new GSO types.
>>
>> I couldn't still wrap my head around this.
>>
>> I wonder if this is safe in case of if the packet is generated from an
>> untrusted virtual machine over virtio_net?
>>
>> Bye,
>> Hannes
> 
> I'm not sure how you use virtio_net, but neither it nor tun or macvtap
> commonly used as backends for it pass gso flags through
> from untrusted entities.

Sorry for the noise, I see the flags gets sanitized during transport.

Thanks,
Hannes



Re: IPv6 extension header privileges

2016-05-22 Thread Hannes Frederic Sowa
On 22.05.2016 13:56, Sowmini Varadhan wrote:
> 
>>> Tom Herbert wrote:
>>> If you don't mind I'll change this to make specific options are
>>> privileged and not all hbh and destopt. There is talk in IETF about
>>> reinventing IP extensibility within UDP since the kernel APIs don't
>>> allow setting EH. I would like to avoid that :-)
> 
>> On 21.05.2016 19:46, Sowmini Varadhan wrote:
>>> Do you mean this
>>>   http://www.ietf.org/mail-archive/web/spud/current/msg00365.html
> 
> On (05/22/16 03:08), Hannes Frederic Sowa wrote:
>> Hmm, haven't read carefully but isn't that just plain TCP in UDP? I saw
>> extension headers mentioned but haven't grasped why they deem necessary.
> 
> Tom should clarify what he meant, but perhaps he was referring to other
> threads discussing v6 EH. In any case, I dont think the way least-privileges
> for EH are implemented in an OS is directly relevant or causational for
> whether or not the kernel should be bypassed - looks like there are a lot 
> of other drafts floating around, arguing for implementing various tcp/ip
> protocols in uspace and beyond, motivated by various reasons.
> 
> Moving back to the topic here:
> 
>>> Hannes Frederic Sowa wrote:
>> A white list of certain registered IPv6 IANA-options for non-priv whould
> 
>> On 21.05.2016 19:46, Sowmini Varadhan wrote:
>>> Problem is that APIs are not IANA'ed. 
>>> Even RFC 3542 is just Informationaal. 
>>>
>>> And even the classic socket API's that come down from BSD are not 
>>> ietf'ed or iana'ed.
> 
> On (05/22/16 03:08), Hannes Frederic Sowa wrote:
>> Hmm, haven't read carefully but isn't that just plain TCP in UDP? I saw
>> I think I don't completely understand this. IANA is numbering registry
>> and if we have the proper option number allocated we can make sensible
>> decisions and put options on the white list or provide a more complete
>> sensible implementation of the specification in the kernel.
> 
> IANA registers internet protocol (and related) numbers. so, e.g., 
> So, for example, IP_TOS value is not really documented in iana,
> and it ends up being 1 on linux, 3 on freebsd.  Or, to take another example,
> IP_PKTINFO is "8" on linux, 0x1a on solaris and 25 in netbsd.  

Our setsockopts take the option numbers verbatim as they appear on the
wire (struct ipv6_opt_hdr). Thus we only need the numbers like
registered in the ipv6 parameter registry. API doesn't need to change
besides the privilege check. This is enough for Linux.

> but that should not stop the linux kernel (or other OS) from trying 
> to figure out the granularity of the rbac for these options and documenting 
> them in some helpful way for apps.

Just by knowing which option is interpreted in which way, we can do the
decision. I don't want to standardize APIs at all.

Bye,
Hannes



Re: IPv6 extension header privileges

2016-05-22 Thread Sowmini Varadhan

> > Tom Herbert wrote:
> > If you don't mind I'll change this to make specific options are
> > privileged and not all hbh and destopt. There is talk in IETF about
> > reinventing IP extensibility within UDP since the kernel APIs don't
> > allow setting EH. I would like to avoid that :-)

> On 21.05.2016 19:46, Sowmini Varadhan wrote:
> > Do you mean this
> >   http://www.ietf.org/mail-archive/web/spud/current/msg00365.html

On (05/22/16 03:08), Hannes Frederic Sowa wrote:
> Hmm, haven't read carefully but isn't that just plain TCP in UDP? I saw
> extension headers mentioned but haven't grasped why they deem necessary.

Tom should clarify what he meant, but perhaps he was referring to other
threads discussing v6 EH. In any case, I dont think the way least-privileges
for EH are implemented in an OS is directly relevant or causational for
whether or not the kernel should be bypassed - looks like there are a lot 
of other drafts floating around, arguing for implementing various tcp/ip
protocols in uspace and beyond, motivated by various reasons.

Moving back to the topic here:

> > Hannes Frederic Sowa wrote:
>  A white list of certain registered IPv6 IANA-options for non-priv whould

> On 21.05.2016 19:46, Sowmini Varadhan wrote:
> > Problem is that APIs are not IANA'ed. 
> > Even RFC 3542 is just Informationaal. 
> > 
> > And even the classic socket API's that come down from BSD are not 
> > ietf'ed or iana'ed.

On (05/22/16 03:08), Hannes Frederic Sowa wrote:
> Hmm, haven't read carefully but isn't that just plain TCP in UDP? I saw
> I think I don't completely understand this. IANA is numbering registry
> and if we have the proper option number allocated we can make sensible
> decisions and put options on the white list or provide a more complete
> sensible implementation of the specification in the kernel.

IANA registers internet protocol (and related) numbers. so, e.g., 
So, for example, IP_TOS value is not really documented in iana,
and it ends up being 1 on linux, 3 on freebsd.  Or, to take another example,
IP_PKTINFO is "8" on linux, 0x1a on solaris and 25 in netbsd.  

But TOS, and the various code-points (which actually go out 
in the packet, and are needed for proper interop in the network)
are documented in iana/ietf etc.

> E.g. if an option for encapsulation is going to be specified, normal
> users should not be able to set those, like with CALIPSO or some VNI
> inside hop-by-hop options. That should probably be controlled by a
> routing table or a flow matching subsystem, in the kernel.

sure, I completely agree with that. And I strongly suspect that's why
rfc3542 puts down a wildcard "may" - so that some options may be privileged,
others not. Which options are "privileged" (and even the definition 
of "privileged") are entirely up to the OS implementation. (and even *how*
least priviliges/RBAC are implemented, can vary from OS to OS).

> I think it is also in favor of the IETF to get those numbers specified
> and allocated in a proper way, otherwise security won't be manageable at
> all any more.

see above.. Even rfc793 actually does not talk about POSIX APIs
but speaks in generalities, since the focus is on what goes on the wire.
In theory an implementation of a tcp/ip stack does not even have
to use the POSIX socket api, thus how can iana/ietf manadate
specific socket options and numbers, or the rbac model that they
should use? 

but that should not stop the linux kernel (or other OS) from trying 
to figure out the granularity of the rbac for these options and documenting 
them in some helpful way for apps.

--Sowmini


Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

2016-05-22 Thread Lino Sanfilippo
On 22.05.2016 11:17, Shuyu Wei wrote:

> Hi Lino,
> 
> I tested this patch, it still got panic under stress.
> Just wget 2 large files simultaneously and it failed.
> 
> Looks like the problem comes from the if statement in tx_clean().
> I changed your patch to use 
> 
> -   if (info & FOR_EMAC)
> +   if ((info & FOR_EMAC) || !txbd->data || !skb)
> 
> and it worked. 

Thanks for testing. However that extra check for skb not being NULL should not 
be
necessary if the code were correct. The changes I suggested were all about 
having
skb and info consistent with txbd_curr.
But I just realized that there is still a big flaw in the last changes. While
tx() looks correct now (we first set up the descriptor and assign the skb and 
_then_
advance txbd_curr) tx_clean still is not:

We _first_ have to read tx_curr and _then_ read the corresponding descriptor 
and its skb.
(The last patch implemented just the reverse - and thus wrong - order, first 
get skb and 
descriptor and then read tx_curr).

So the patch below hopefully handles also tx_clean correctly. Could you please 
do once more a test
with this one?


> 
> After further test, my patch to barrier timestamp() didn't work.
> Just like the original code in the tree, the emac still got stuck under
> high load, even if I changed the smp_wmb() to dma_wmb(). So the original
> code do have race somewhere.

So to make this clear: with the current code in net-next you still see a 
problem (lockup), right?

> 
> I'm new to kernel development, and still trying to understand how memory
> barrier works

Its an interresting topic and thats what I am trying to understand, too :)


> ... and why Francois' fix worked. Please be patient with me :-).

So which fix(es) exactly work for you and solve your lockup issue?


--- a/drivers/net/ethernet/arc/emac_main.c
+++ b/drivers/net/ethernet/arc/emac_main.c
@@ -159,12 +159,22 @@ static void arc_emac_tx_clean(struct net_device *ndev)
unsigned int *txbd_dirty = >txbd_dirty;
struct arc_emac_bd *txbd = >txbd[*txbd_dirty];
struct buffer_state *tx_buff = >tx_buff[*txbd_dirty];
-   struct sk_buff *skb = tx_buff->skb;
unsigned int info = le32_to_cpu(txbd->info);
+   struct sk_buff *skb;
 
-   if ((info & FOR_EMAC) || !txbd->data || !skb)
+   if (*txbd_dirty == priv->txbd_curr)
break;
 
+   /* Make sure curr pointer is consistent with info */
+   rmb();
+
+   info = le32_to_cpu(txbd->info);
+
+   if (info & FOR_EMAC)
+   break;
+
+   skb = tx_buff->skb;
+
if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) {
stats->tx_errors++;
stats->tx_dropped++;
@@ -195,8 +205,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
*txbd_dirty = (*txbd_dirty + 1) % TX_BD_NUM;
}
 
-   /* Ensure that txbd_dirty is visible to tx() before checking
-* for queue stopped.
+   /* Ensure that txbd_dirty is visible to tx() and we see the most recent
+* value for txbd_curr.
 */
smp_mb();
 
@@ -680,35 +690,29 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
net_device *ndev)
dma_unmap_len_set(>tx_buff[*txbd_curr], len, len);
 
priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
-
-   /* Make sure pointer to data buffer is set */
-   wmb();
+   priv->tx_buff[*txbd_curr].skb = skb;
 
skb_tx_timestamp(skb);
 
*info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
 
-   /* Make sure info word is set */
+   /* 1. Make sure that with respect to tx_clean everything is set up
+* properly before we advance txbd_curr.
+* 2. Make sure writes to DMA descriptors are completed before we inform
+* the hardware.
+*/
wmb();
 
-   priv->tx_buff[*txbd_curr].skb = skb;
-
/* Increment index to point to the next BD */
*txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
 
-   /* Ensure that tx_clean() sees the new txbd_curr before
-* checking the queue status. This prevents an unneeded wake
-* of the queue in tx_clean().
+   /* Ensure we see the most recent value of txbd_dirty and tx_clean() sees
+* the updated value of txbd_curr.
 */
smp_mb();
 
-   if (!arc_emac_tx_avail(priv)) {
+   if (!arc_emac_tx_avail(priv))
netif_stop_queue(ndev);
-   /* Refresh tx_dirty */
-   smp_mb();
-   if (arc_emac_tx_avail(priv))
-   netif_start_queue(ndev);
-   }
 
arc_reg_set(priv, R_STATUS, TXPL_MASK);
 
-- 
1.9.1





Re: Crashes in -next due to 'phy: add support for a reset-gpio specification'

2016-05-22 Thread Uwe Kleine-König
Hello,

On Tue, May 17, 2016 at 09:37:11PM -0700, Guenter Roeck wrote:
> [9.366256] libphy: ethoc-mdio: probed
> [9.367389]  (null): could not attach to PHY
> [9.368555]  (null): failed to probe MDIO bus
> [9.371540] Unable to handle kernel paging request at virtual address 
> 001c
> [9.371540]  pc = d0320926, ra = 903209d1
> [9.375358] Oops: sig: 11 [#1]
> [9.376081] PREEMPT
> [9.377080] CPU: 0 PID: 1 Comm: swapper Not tainted 4.6.0-next-20160517 #1
> [9.378397] task: d7c2c000 ti: d7c3 task.ti: d7c3
> [9.379394] a00: 903209d1 d7c31bd0 d7fb5810 0001   
> d7f45c00 d7c31bd0
> [9.382298] a08:     00060100 d04b0c10 
> d7f45dfc d7c31bb0
> [9.385732] pc: d0320926, ps: 00060110, depc: 0018, excvaddr: 001c
> [9.387061] lbeg: d0322e35, lend: d0322e57 lcount: , sar: 0011
> [9.388173]
> Stack: d7c31be0 00060700 d7f45c00 d7c31bd0 9021d509 d7c31c30 d7f45c00 
>d0485dcc d0485dcc d7fb5810 d7c2c000  d7c31c30 d7f45c00 d025befc
>d0485dcc d7c3 d7f45c34 d7c31bf0 9021c985 d7c31c50 d7f45c00 d7f45c34
> [9.396652] Call Trace:
> [9.397469]  [] __device_release_driver+0x7d/0x98
> [9.398869]  [] device_release_driver+0x15/0x20
> [9.400247]  [] bus_remove_device+0xc1/0xd4
> [9.401569]  [] device_del+0x109/0x15c
> [9.402794]  [] phy_mdio_device_remove+0xd/0x18
> [9.404124]  [] mdiobus_unregister+0x40/0x5c
> [9.405444]  [] ethoc_probe+0x534/0x5b8
> [9.406742]  [] platform_drv_probe+0x28/0x48
> [9.408122]  [] driver_probe_device+0x101/0x234
> [9.409499]  [] __driver_attach+0x7d/0x98
> [9.410809]  [] bus_for_each_dev+0x30/0x5c
> [9.412104]  [] driver_attach+0x14/0x18
> [9.413385]  [] bus_add_driver+0xc9/0x198
> [9.414686]  [] driver_register+0x70/0xa0
> [9.416001]  [] __platform_driver_register+0x24/0x28
> [9.417463]  [] ethoc_driver_init+0x10/0x14
> [9.418824]  [] do_one_initcall+0x80/0x1ac
> [9.420083]  [] kernel_init_freeable+0x131/0x198
> [9.421504]  [] kernel_init+0xc/0xb0
> [9.422693]  [] ret_from_kernel_thread+0x8/0xc

Guenter, can you please test if the following patch fixes your setup:

diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c
index 307f72a0f2e2..efa85fb31574 100644
--- a/drivers/net/phy/phy_device.c
+++ b/drivers/net/phy/phy_device.c
@@ -1573,14 +1573,14 @@ static int phy_probe(struct device *dev)
int err = 0;
struct gpio_descs *reset_gpios;
 
-   phydev->drv = phydrv;
-
/* take phy out of reset */
reset_gpios = devm_gpiod_get_array_optional(dev, "reset",
GPIOD_OUT_LOW);
if (IS_ERR(reset_gpios))
return PTR_ERR(reset_gpios);
 
+   phydev->drv = phydrv;
+
/* Disable the interrupt if the PHY doesn't support it
 * but the interrupt is still a valid one
 */

This doesn't make your ethernet work, but at least the driver should
fail in a clean way.

Together with enabling GPIOLIB this should put ethernet in a working
state again.

Best regards
Uwe

-- 
Pengutronix e.K.   | Uwe Kleine-König|
Industrial Linux Solutions | http://www.pengutronix.de/  |


Re: 4.5.0 on sun7i-a20-olinuxino-lime2: libphy: PHY stmmac-0:ffffffff not found (regression from rc7)

2016-05-22 Thread Marc Zyngier
On Sat, 21 May 2016 17:47:05 +0200
Bert Lindner  wrote:

> Hi,
> 
> On 2016-05-20 12:36, Marc Zyngier wrote:
> > On 20/05/16 11:30, Andre Heider wrote:
> >> Hi,
> >>
> >> On Fri, May 20, 2016 at 10:14 AM, Giuseppe CAVALLARO
> >>  wrote:
> >>> On 5/20/2016 9:56 AM, Marc Zyngier wrote:
> 
>  On 20/05/16 06:44, Andre Heider wrote:
> >
> > Giuseppe, Alexandre, et al.,
> >
> > On Thu, Mar 17, 2016 at 8:52 AM, Marc Zyngier 
> > wrote:
> >>
> >> On Thu, 17 Mar 2016 00:56:40 +0100
> >> Bert Lindner  wrote:
> >>>
> >>> On 2016-03-16 18:42, Marc Zyngier wrote:
> 
>  On 16/03/16 15:10, Bert Lindner wrote:
> >
> > On 2016-03-16 14:10, Andreas Färber wrote:
> >>
> >> Am 16.03.2016 um 13:09 schrieb Robin Murphy:
> >>>
> >>> On 16/03/16 11:39, Marc Zyngier wrote:
> 
>  On 16/03/16 11:19, Bert Lindner wrote:
> >
> > ...
> >
> > For the board sun7i-a20-olinuxino-lime2, there seems to be a
> > problem
> > with the eth0 PHY in mainline kernel 4.5.0 that developed since
> > 4.5.0-rc7. Ethernet does not work, although eth0 is reported:
> >
> > ...
> >
> > [9.767125] NET: Registered protocol family 10
> > [   10.357405] libphy: PHY stmmac-0: not found
> > [   10.362382] eth0: Could not attach to PHY
> > [   10.366557] stmmac_open: Cannot attach to PHY (error: -19)
> >
> > ...
> >>
> >> v4 fixes for 4.5 are here:
> >>
> >> https://patchwork.ozlabs.org/patch/598195/ (revert)
> >> https://patchwork.ozlabs.org/patch/598196/
> >
> > ...
> 
>  Good to know, thanks. Could you also give the potential fix a go (as
>  mentioned by Andreas)? Just to make sure that whatever gets merged
>  next
>  will actually fix the issue.
> >>>
> >>>
> >>> Yes sure, it took a while because I had to travel. Confirmed, the
> >>> v4-for-4.5 fix works well for me, on sun7i-a20-olinuxino-lime2:
> >>>
> >>> root@lime2-079f:~# cat /proc/version
> >>> Linux version 4.5.0-598195-598196-v4 (root@lime2-079f) (gcc version
> >>> 4.9.1 (Ubuntu/Linaro 4.9.1-16ubuntu6) ) #1 SMP Wed Mar 16 16:44:22 UTC
> >>> 2016
> >>>
> >>> dmesg:
> >>> [8.245273] NET: Registered protocol family 10
> >>> [9.297406]  RX IPC Checksum Offload disabled
> >>> [9.297460]  No MAC Management Counters available
> >>> [9.297951] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
> >>> [   16.285658] sun7i-dwmac 1c5.ethernet eth0: Link is Up -
> >>> 1Gbps/Full - flow control rx/tx
> >>> [   16.285798] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
> >>>
> >>> The board is connected to my laptop rather than to a switch, so that
> >>> might be where the flow control message comes from (not sure). Anyway
> >>> ethernet works.
> >>
> >>
> >> Cool, many thanks for taking the time to test and report.
> >>
> >> Hopefully Giuseppe will get this merged quickly enough in mainline, and
> >> it should then trickle into a 4.5-stable release (cc-ing stable on
> >> these patches would probably be a good idea, BTW).
> >
> >
> > stmmac is broken on at least Lime2, BananaPi and Cubieboard2 since
> > v4.5 [0], including all five stable releases :(
> 
> 
>  All the A20 platforms are dead, actually.
> 
> > The v4.5 patches quoted above are already +4 weeks old, could we
> > please get them into stable?
> 
> 
>  For that, the maintainer would have needed to CC stable, which he
>  didn't. I'd expect someone who cares to send these patches to stable.
>  It'd be better if the maintainer would do it himself though.
> >>>
> >>>
> >>> sure, I can send the patches to stable (sorry if I missed to add
> >>> stable ML on CC).
> >>>
> >>> Andre, I have not clear if the train of patches actually fix the
> >>> issue or if you need my support to fix something else. In that case
> >>> I need some input for debugging (e.g. kernel log).
> >>
> >> Bert already confirmed that those two patches fixes stmmac on his
> >> Lime2, so I assume that it fixes the issue for all A20 platforms.
> >>
> >>> let me know, is it enough to re-send the patches only?
> >>
> >> Just a resend with cc:stable :)
> >
> > Not quite. Please read Documentation/stable_kernel_rules.txt, and the
> > section that concerns networking patches (and then consult
> > Documentation/networking/netdev-FAQ.txt which has all the details).
> 
> FWIW, recent 4.6-rc series and 4.6.0 have worked fine for me and the 
> lime2. Had not tried 4.5.x again.

We've already established this. This is a 4.5-specific regression (both
4.4 and 4.6 

Re: [PATCH v2] ethernet:arc: Fix racing of TX ring buffer

2016-05-22 Thread Shuyu Wei
On Sun, May 22, 2016 at 12:58:55AM +0200, Lino Sanfilippo wrote:
> --- a/drivers/net/ethernet/arc/emac_main.c
> +++ b/drivers/net/ethernet/arc/emac_main.c
> @@ -162,7 +162,13 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>   struct sk_buff *skb = tx_buff->skb;
>   unsigned int info = le32_to_cpu(txbd->info);
>  
> - if ((info & FOR_EMAC) || !txbd->data || !skb)
> + if (info & FOR_EMAC)
> + break;
> +
> + /* Make sure curr pointer is consistent with info */
> + rmb();
> +
> + if (*txbd_dirty == priv->txbd_curr)
>   break;
>  
>   if (unlikely(info & (DROP | DEFR | LTCL | UFLO))) {
> @@ -195,8 +201,8 @@ static void arc_emac_tx_clean(struct net_device *ndev)
>   *txbd_dirty = (*txbd_dirty + 1) % TX_BD_NUM;
>   }
>  
> - /* Ensure that txbd_dirty is visible to tx() before checking
> -  * for queue stopped.
> + /* Ensure that txbd_dirty is visible to tx() and we see the most recent
> +  * value for txbd_curr.
>*/
>   smp_mb();
>  
> @@ -680,35 +686,29 @@ static int arc_emac_tx(struct sk_buff *skb, struct 
> net_device *ndev)
>   dma_unmap_len_set(>tx_buff[*txbd_curr], len, len);
>  
>   priv->txbd[*txbd_curr].data = cpu_to_le32(addr);
> -
> - /* Make sure pointer to data buffer is set */
> - wmb();
> + priv->tx_buff[*txbd_curr].skb = skb;
>  
>   skb_tx_timestamp(skb);
>  
>   *info = cpu_to_le32(FOR_EMAC | FIRST_OR_LAST_MASK | len);
>  
> - /* Make sure info word is set */
> + /* 1. Make sure that with respect to tx_clean everything is set up
> +  * properly before we advance txbd_curr.
> +  * 2. Make sure writes to DMA descriptors are completed before we inform
> +  * the hardware.
> +  */
>   wmb();
>  
> - priv->tx_buff[*txbd_curr].skb = skb;
> -
>   /* Increment index to point to the next BD */
>   *txbd_curr = (*txbd_curr + 1) % TX_BD_NUM;
>  
> - /* Ensure that tx_clean() sees the new txbd_curr before
> -  * checking the queue status. This prevents an unneeded wake
> -  * of the queue in tx_clean().
> + /* Ensure we see the most recent value of txbd_dirty and tx_clean() sees
> +  * the updated value of txbd_curr.
>*/
>   smp_mb();
>  
> - if (!arc_emac_tx_avail(priv)) {
> + if (!arc_emac_tx_avail(priv))
>   netif_stop_queue(ndev);
> - /* Refresh tx_dirty */
> - smp_mb();
> - if (arc_emac_tx_avail(priv))
> - netif_start_queue(ndev);
> - }
>  
>   arc_reg_set(priv, R_STATUS, TXPL_MASK);
>  

Hi Lino,

I tested this patch, it still got panic under stress.
Just wget 2 large files simultaneously and it failed.

Looks like the problem comes from the if statement in tx_clean().
I changed your patch to use 

-   if (info & FOR_EMAC)
+   if ((info & FOR_EMAC) || !txbd->data || !skb)

and it worked. 

After further test, my patch to barrier timestamp() didn't work.
Just like the original code in the tree, the emac still got stuck under
high load, even if I changed the smp_wmb() to dma_wmb(). So the original
code do have race somewhere. 

I'm new to kernel development, and still trying to understand how memory
barrier works and why Francois' fix worked. Please be patient with me :-).

---

[  138.501355] Unable to handle kernel NULL pointer dereference at virtual 
address 00a8
[  138.509482] pgd = c0004000
[  138.512200] [00a8] *pgd=
[  138.515850] Internal error: Oops: 5 [#1] SMP ARM
[  138.520476] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0+ #98
[  138.526477] Hardware name: Rockchip (Device Tree)
[  138.531180] task: c0804e00 ti: c080 task.ti: c080
[  138.536588] PC is at __dev_kfree_skb_irq+0xc/0x8c
[  138.541295] LR is at arc_emac_poll+0x94/0x594
[  138.545653] pc : []lr : []psr: 60050113
[  138.545653] sp : c0801d88  ip : c0801da0  fp : c0801d9c
[  138.557118] r10: ee0a  r9 : ee0a1000  r8 : 01f8
[  138.562340] r7 :   r6 : 00fc  r5 : f08d0400  r4 : 03f0
[  138.568861] r3 : 0001  r2 : 0042  r1 : 0001  r0 : 
[  138.575385] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[  138.582515] Control: 10c5387d  Table: 8da4004a  DAC: 0051
[  138.588256] Process swapper/0 (pid: 0, stack limit = 0xc0800210)
[  138.594258] Stack: (0xc0801d88 to 0xc0802000)
[  138.598617] 1d80:   03f0 f08d0400 c0801df4 c0801da0 
c03c5e64 c046f590
[  138.606794] 1da0:  c0171050 c0801ddc ee0a04a8 c016504c 0028 
c0806614 f08d05f8
[  138.614969] 1dc0: ee0a02a8 0080 f0803100 ee0a04a8 c03c5dd0 0028 
012c 623d
[  138.623144] 1de0: c0802100 c0801e18 c0801e54 c0801df8 c0471c04 c03c5ddc 
c0801eb4 c06e5ae8
[  138.631318] 1e00: c08029f4 c08029f4 c083d821 2e86f000 c07486c0 eefb76c0 
c0801e18 c0801e18
[  

Re: [v4.6-rc7-183-g1410b74e4061]

2016-05-22 Thread Sedat Dilek
On 5/16/16, Sedat Dilek  wrote:
> On 5/16/16, Peter Zijlstra  wrote:
>> On Mon, May 16, 2016 at 07:42:35PM +0200, Sedat Dilek wrote:
>>
>>> Unfortunately, I could not reproduce this again with none of my
>>> 183-kernels.
>>> When I first hit a "chain_key collision" issue, it was hard to
>>> redproduce,
>>> so.
>>> Any idea, how I can "force" this?
>>
>> Nope; I wish I knew, that'd be so much easier to work with :/
>>
>> I'm hoping someone will report a reproducer, even something that
>> triggers once every 5-10 runs would be awesome.
>>
>> In any case, like I've explained before, nothing regressed as such, we
>> only added this new warning under DEBUG_LOCKDEP because we want to
>> better understand the condition that triggers it.
>>
>> If it bothers you, just turn off DEBUG_LOCKDEP and know that your kernel
>> is as reliable as it was before. OTOH, if you do keep it on, please
>> let me know if you can (semi) reliably trigger this, as I'd really like
>> to have a better understanding.
>>
>
> OK, I keep checking my logs.
>
> I refreshed your patch Ingo pointed me to.
>
> But it fails like this (on top of Linux v4.6 final)...
> [...]
>   if [ "" = "-pg" ]; then if [ kernel/locking/mutex-debug.o !=
> "scripts/mod/empty.o" ]; then ./scripts/recordmcount
> "kernel/locking/mutex-debug.o"; fi; fi;
>   mycompiler -Wp,-MD,kernel/locking/.lockdep.o.d  -nostdinc -isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.9/include -nostdinc -isystem
> /usr/lib/gcc/x86_64-linux-gnu/4.9/include -I./arch/x86/include
> -Iarch/x86/include/generated/uapi -Iarch/x86/include/generated
> -Iinclude -I./arch/x86/include/uapi -Iarch/x86/include/generated/uapi
> -I./include/uapi -Iinclude/generated/uapi -include
> ./include/linux/kconfig.h -D__KERNEL__ -Wall -Wundef
> -Wstrict-prototypes -Wno-trigraphs -fno-strict-aliasing -fno-common
> -Werror-implicit-function-declaration -Wno-format-security -std=gnu89
> -mno-sse -mno-mmx -mno-sse2 -mno-3dnow -mno-avx -m64 -falign-jumps=1
> -falign-loops=1 -mno-80387 -mno-fp-ret-in-387
> -mpreferred-stack-boundary=3 -mtune=generic -mno-red-zone
> -mcmodel=kernel -funit-at-a-time -maccumulate-outgoing-args
> -DCONFIG_X86_X32_ABI -DCONFIG_AS_CFI=1 -DCONFIG_AS_CFI_SIGNAL_FRAME=1
> -DCONFIG_AS_CFI_SECTIONS=1 -DCONFIG_AS_FXSAVEQ=1 -DCONFIG_AS_SSSE3=1
> -DCONFIG_AS_CRC32=1 -DCONFIG_AS_AVX=1 -DCONFIG_AS_AVX2=1 -pipe
> -Wno-sign-compare -fno-asynchronous-unwind-tables
> -fno-delete-null-pointer-checks -O2 --param=allow-store-data-races=0
> -Wframe-larger-than=1024 -fno-stack-protector
> -Wno-unused-but-set-variable -fno-omit-frame-pointer
> -fno-optimize-sibling-calls -fno-var-tracking-assignments -mfentry
> -DCC_USING_FENTRY -Wdeclaration-after-statement -Wno-pointer-sign
> -fno-strict-overflow -fconserve-stack -Werror=implicit-int
> -Werror=strict-prototypes -Werror=date-time -DCC_HAVE_ASM_GOTO
> -D"KBUILD_STR(s)=#s" -D"KBUILD_BASENAME=KBUILD_STR(lockdep)"
> -D"KBUILD_MODNAME=KBUILD_STR(lockdep)" -c -o
> kernel/locking/.tmp_lockdep.o kernel/locking/lockdep.c
> kernel/locking/lockdep.c: In function 'print_chain_keys_held_locks':
> kernel/locking/lockdep.c:2034:2: error: too few arguments to function
> 'print_chain_key_iteration'
>   print_chain_key_iteration(hlock_next->class_idx, chain_key);
>   ^
> kernel/locking/lockdep.c:2006:12: note: declared here
>  static u64 print_chain_key_iteration(int class_idx, u64 chain_key,
> u64 prev_key)
> ^
> make[4]: *** [kernel/locking/lockdep.o] Error 1
> make[3]: *** [kernel/locking] Error 2
> make[2]: *** [kernel] Error 2
> [...]
>

Is the attached fix correct?

- Sedat -

P.S.: Attached is a refreshed version of your original proposal patch
which does not compile correctly.
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5dc21eb101b0..b771a691b5e8 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -2031,7 +2031,8 @@ print_chain_keys_held_locks(struct task_struct *curr, 
struct held_lock *hlock_ne
print_lock(hlock);
}
 
-   print_chain_key_iteration(hlock_next->class_idx, chain_key);
+   print_chain_key_iteration(hlock_next->class_idx, chain_key,
+ hlock->prev_chain_key);
print_lock(hlock_next);
 }
 


0001-locking-lockdep-Some-more-additional-chain_key-colli.patch
Description: Binary data


[PATCH 32/54] MAINTAINERS: Add file patterns for net device tree bindings

2016-05-22 Thread Geert Uytterhoeven
Submitters of device tree binding documentation may forget to CC
the subsystem maintainer if this is missing.

Signed-off-by: Geert Uytterhoeven 
Cc: David S. Miller 
Cc: netdev@vger.kernel.org
---
Please apply this patch directly if you want to be involved in device
tree binding documentation for your subsystem.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index f296a5763fef0431..7a9f1138131afcf1 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -8004,6 +8004,7 @@ Q:http://patchwork.ozlabs.org/project/netdev/list/
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git
 S: Odd Fixes
+F: Documentation/devicetree/bindings/net/
 F: drivers/net/
 F: include/linux/if_*
 F: include/linux/netdevice.h
-- 
1.9.1



Re: [PATCH v7 net-next 01/16] gso: Remove arbitrary checks for unsupported GSO

2016-05-22 Thread Michael S. Tsirkin
On Wed, May 18, 2016 at 09:06:09AM -0700, Tom Herbert wrote:
> In several gso_segment functions there are checks of gso_type against
> a seemingly arbitrary list of SKB_GSO_* flags. This seems like an
> attempt to identify unsupported GSO types, but since the stack is
> the one that set these GSO types in the first place this seems
> unnecessary to do. If a combination isn't valid in the first
> place that stack should not allow setting it.
> 
> This is a code simplication especially for add new GSO types.
> 
> Signed-off-by: Tom Herbert 


I don't know of instances where gso_flags are passed in
from a VM, so FWIW

Acked-by: Michael S. Tsirkin 


> ---
>  net/ipv4/af_inet.c | 18 --
>  net/ipv4/gre_offload.c | 14 --
>  net/ipv4/tcp_offload.c | 19 ---
>  net/ipv4/udp_offload.c | 10 --
>  net/ipv6/ip6_offload.c | 18 --
>  net/ipv6/udp_offload.c | 13 -
>  net/mpls/mpls_gso.c| 11 +--
>  7 files changed, 1 insertion(+), 102 deletions(-)
> 
> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
> index 2e6e65f..7f08d45 100644
> --- a/net/ipv4/af_inet.c
> +++ b/net/ipv4/af_inet.c
> @@ -1205,24 +1205,6 @@ static struct sk_buff *inet_gso_segment(struct sk_buff 
> *skb,
>   int ihl;
>   int id;
>  
> - if (unlikely(skb_shinfo(skb)->gso_type &
> -  ~(SKB_GSO_TCPV4 |
> -SKB_GSO_UDP |
> -SKB_GSO_DODGY |
> -SKB_GSO_TCP_ECN |
> -SKB_GSO_GRE |
> -SKB_GSO_GRE_CSUM |
> -SKB_GSO_IPIP |
> -SKB_GSO_SIT |
> -SKB_GSO_TCPV6 |
> -SKB_GSO_UDP_TUNNEL |
> -SKB_GSO_UDP_TUNNEL_CSUM |
> -SKB_GSO_TCP_FIXEDID |
> -SKB_GSO_TUNNEL_REMCSUM |
> -SKB_GSO_PARTIAL |
> -0)))
> - goto out;
> -
>   skb_reset_network_header(skb);
>   nhoff = skb_network_header(skb) - skb_mac_header(skb);
>   if (unlikely(!pskb_may_pull(skb, sizeof(*iph
> diff --git a/net/ipv4/gre_offload.c b/net/ipv4/gre_offload.c
> index e88190a..ecd1e09 100644
> --- a/net/ipv4/gre_offload.c
> +++ b/net/ipv4/gre_offload.c
> @@ -26,20 +26,6 @@ static struct sk_buff *gre_gso_segment(struct sk_buff *skb,
>   int gre_offset, outer_hlen;
>   bool need_csum, ufo;
>  
> - if (unlikely(skb_shinfo(skb)->gso_type &
> - ~(SKB_GSO_TCPV4 |
> -   SKB_GSO_TCPV6 |
> -   SKB_GSO_UDP |
> -   SKB_GSO_DODGY |
> -   SKB_GSO_TCP_ECN |
> -   SKB_GSO_TCP_FIXEDID |
> -   SKB_GSO_GRE |
> -   SKB_GSO_GRE_CSUM |
> -   SKB_GSO_IPIP |
> -   SKB_GSO_SIT |
> -   SKB_GSO_PARTIAL)))
> - goto out;
> -
>   if (!skb->encapsulation)
>   goto out;
>  
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 02737b6..5c59649 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -83,25 +83,6 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
>  
>   if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
>   /* Packet is from an untrusted source, reset gso_segs. */
> - int type = skb_shinfo(skb)->gso_type;
> -
> - if (unlikely(type &
> -  ~(SKB_GSO_TCPV4 |
> -SKB_GSO_DODGY |
> -SKB_GSO_TCP_ECN |
> -SKB_GSO_TCP_FIXEDID |
> -SKB_GSO_TCPV6 |
> -SKB_GSO_GRE |
> -SKB_GSO_GRE_CSUM |
> -SKB_GSO_IPIP |
> -SKB_GSO_SIT |
> -SKB_GSO_UDP_TUNNEL |
> -SKB_GSO_UDP_TUNNEL_CSUM |
> -SKB_GSO_TUNNEL_REMCSUM |
> -0) ||
> -  !(type & (SKB_GSO_TCPV4 |
> -SKB_GSO_TCPV6
> - goto out;
>  
>   skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(skb->len, mss);
>  
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 6b7459c..81f253b 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -209,16 +209,6 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff 
> *skb,
>  
>   if (skb_gso_ok(skb, features | NETIF_F_GSO_ROBUST)) {
>   /* Packet is from an untrusted source, reset gso_segs. */
> - int type = skb_shinfo(skb)->gso_type;
> -
> - if (unlikely(type 

Re: [PATCH v7 net-next 01/16] gso: Remove arbitrary checks for unsupported GSO

2016-05-22 Thread Michael S. Tsirkin
On Sat, May 21, 2016 at 03:02:30AM +0200, Hannes Frederic Sowa wrote:
> Hello,
> 
> On 18.05.2016 18:06, Tom Herbert wrote:
> > In several gso_segment functions there are checks of gso_type against
> > a seemingly arbitrary list of SKB_GSO_* flags. This seems like an
> > attempt to identify unsupported GSO types, but since the stack is
> > the one that set these GSO types in the first place this seems
> > unnecessary to do. If a combination isn't valid in the first
> > place that stack should not allow setting it.
> > 
> > This is a code simplication especially for add new GSO types.
> 
> I couldn't still wrap my head around this.
> 
> I wonder if this is safe in case of if the packet is generated from an
> untrusted virtual machine over virtio_net?
> 
> Bye,
> Hannes

I'm not sure how you use virtio_net, but neither it nor tun or macvtap
commonly used as backends for it pass gso flags through
from untrusted entities.

-- 
MST