Re: [PATCH v2 1/2] net, can, ifi: fix "write buffer full" error

2018-02-08 Thread Marc Kleine-Budde
On 02/08/2018 07:47 AM, Heiko Schocher wrote:
> the driver reads in the ISR first the IRQpending register,
> and clears after that in a write *all* bits in it.
> 
> It could happen that the isr register raise bits between
> this 2 register accesses, which leads in lost bits ...
> 
> In case it clears "TX message sent successfully", the driver
> never sends any Tx data, and buffers to userspace run over.
> 
> Fixed this:
> clear only the bits in the IRQpending register, the
> driver had read.
> 
> Signed-off-by: Heiko Schocher 
> Reviewed-by: Marek Vasut 

Applied both to linux-can.

Tnx,
Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] can: m_can: change comparison to bitshift when dealing with a mask

2018-02-08 Thread Marc Kleine-Budde
On 02/06/2018 09:52 AM, Wolfram Sang wrote:
> Due to a typo, the mask was destroyed by a comparison instead of a bit
> shift.
> 
> Reported-by: Geert Uytterhoeven 
> Signed-off-by: Wolfram Sang 
> ---
> Only build tested.

Applied to can. Luckily that define is not used :)

Tnx,
Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] can: m_can: change comparison to bitshift when dealing with a mask

2018-02-08 Thread Marc Kleine-Budde
On 02/06/2018 09:52 AM, Wolfram Sang wrote:
> Due to a typo, the mask was destroyed by a comparison instead of a bit
> shift.
> 
> Reported-by: Geert Uytterhoeven 
> Signed-off-by: Wolfram Sang 

Applied to can.

tnx,
Marc

-- 
Pengutronix e.K.  | Marc Kleine-Budde   |
Industrial Linux Solutions| Phone: +49-231-2826-924 |
Vertretung West/Dortmund  | Fax:   +49-5121-206917- |
Amtsgericht Hildesheim, HRA 2686  | http://www.pengutronix.de   |



signature.asc
Description: OpenPGP digital signature


USB rndis_host - slow download transfers, RX errors

2018-02-08 Thread Tomasz Janowski, Ph.D.
Dear USB developers,

Based on my google research, the problem I experience seems to happen
with some newer smartphones. My test case is Samsung Galaxy S8 (SM-950U1). I am
trying to use USB tethering and everything seems to work as expected (modules
are loaded, Ethernet devices are up and running, dhcp works fine). I can 
connect to
the external world using both LTE or wireless network on the phone.

Now, the problem is that the download speeds are terrible, around 64 KB/s,
while uploads are fast, the order of 15 MB/s. These speeds do not depend
on the wireless service provider: the results are similar when I tether wi-fi.
The USB Ethernet interface on the Linux host reports a lot of receive errors 
(attached:
device_state.txt), while kernel reports bad rndis messages (attached: 
kernel.log.txt).

Windows 10 works great with the same hardware (same PC and same phone), with
uploads and downloads in the order of 150 Mbit/s, which is probably as fast as 
my
wireless network can do. But some people reported issues with older Windows 
drivers too.
Is possible that some newer version of RNDIS protocol is around and Linux 
hasn't updated
its RNDIS module yet?

Best,
Tomasz
[77979.936526] usb 3-9: new high-speed USB device number 7 using xhci_hcd
[77980.089137] usb 3-9: New USB device found, idVendor=04e8, idProduct=6860
[77980.089143] usb 3-9: New USB device strings: Mfr=7, Product=8, SerialNumber=9
[77980.089146] usb 3-9: Product: SAMSUNG_Android
[77980.089148] usb 3-9: Manufacturer: SAMSUNG
[77980.089151] usb 3-9: SerialNumber: 98882945364a434f46
[78019.597479] usb 3-9: USB disconnect, device number 7
[78020.037162] usb 3-9: new high-speed USB device number 8 using xhci_hcd
[78020.187529] usb 3-9: New USB device found, idVendor=04e8, idProduct=6863
[78020.187534] usb 3-9: New USB device strings: Mfr=7, Product=8, SerialNumber=9
[78020.187537] usb 3-9: Product: SAMSUNG_Android
[78020.187539] usb 3-9: Manufacturer: SAMSUNG
[78020.187542] usb 3-9: SerialNumber: 98882945364a434f46
[78020.191979] rndis_host 3-9:1.0 usb0: register 'rndis_host' at 
usb-:00:14.0-9, RNDIS device, 82:70:61:5c:9d:4f
[78020.238719] rndis_host 3-9:1.0 enp0s20u9: renamed from usb0
[78213.555001] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 1/1298/36/1254, 
len 750
[78213.555027] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
-2013127152/-84731529/-93394684/315651350, len 548
[78213.555035] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 1/1298/36/1254, 
len 750
[78213.555243] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
-910707548/1362693399/1098054912/-1158698644, len 2048
[78213.555254] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
453773253/134416/1773410520/1734735583, len 2048
[78213.555258] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
1847997430/-1292764570/-1075768035/-1173448098, len 1644
[78213.580893] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 1/1298/36/1254, 
len 750
[78213.580928] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
1374746855/1872717209/1627961749/1446196293, len 2048
[78213.581007] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
2014317876/-1802985385/1741271276/985318156, len 2048
[78213.581272] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
315472595/1240463481/1199472260/-1304836767, len 346
[78213.606421] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 1/1298/36/1254, 
len 750
[78213.606700] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
1971266818/603214195/-1554415893/513330975, len 548
[78213.678067] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 1/1298/36/1254, 
len 750
[78213.678094] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
1971266818/603214195/-1554415893/513330975, len 548
[78213.755151] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 1/1298/36/1254, 
len 750
[78213.755178] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
-960915119/-1414823780/-1725858181/-1936328995, len 548
[78213.781899] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 1/1298/36/1254, 
len 750
[78213.782511] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
484952970/955066952/-390686949/1110830164, len 548
[78213.832185] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 1/1298/36/1254, 
len 750
[78213.832375] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
-289948698/1791367026/1725404901/949556424, len 2048
[78213.832383] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
2071042143/-2017854031/-1092884323/1270254088, len 2048
[78213.832405] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
-1454388581/-113380006/-2064914356/-1946942086, len 346
[78213.832412] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 1/1298/36/1254, 
len 750
[78213.833306] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
-26919620/597103012/-1245814449/-1439889729, len 548
[78213.858455] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 1/1298/36/1254, 
len 750
[78213.858651] rndis_host 3-9:1.0 enp0s20u9: bad rndis message 
484952970/955066952/-390686949/1110830164, len 548
[78213.958295] rndis_host 3-9:1.0 enp0s20u9: bad 

Re: [PATCH net-queue 3/3] e1000e: Avoid missed interrupts following ICR read.

2018-02-08 Thread Alexander Duyck
On Wed, Feb 7, 2018 at 10:47 PM, Benjamin Poirier  wrote:
> The 82574 specification update errata 12 states that interrupts may be
> missed if ICR is read while INT_ASSERTED is not set. Avoid that problem by
> setting all bits related to events that can trigger the Other interrupt in
> IMS.
>
> The Other interrupt is raised for such events regardless of whether or not
> they are set in IMS. However, only when they are set is the INT_ASSERTED
> bit also set in ICR.
>
> By doing this, we ensure that INT_ASSERTED is always set when we read ICR
> in e1000_msix_other() and steer clear of the errata. This also ensures that
> ICR will automatically be cleared on read, therefore we no longer need to
> clear bits explicitly.
>
> Signed-off-by: Benjamin Poirier 

Acked-by: Alexander Duyck 

> ---
>  drivers/net/ethernet/intel/e1000e/defines.h | 21 -
>  drivers/net/ethernet/intel/e1000e/netdev.c  | 11 ---
>  2 files changed, 24 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/defines.h 
> b/drivers/net/ethernet/intel/e1000e/defines.h
> index afb7ebe20b24..824fd44e25f0 100644
> --- a/drivers/net/ethernet/intel/e1000e/defines.h
> +++ b/drivers/net/ethernet/intel/e1000e/defines.h
> @@ -400,6 +400,10 @@
>  #define E1000_ICR_RXDMT00x0010 /* Rx desc min. threshold (0) */
>  #define E1000_ICR_RXO   0x0040 /* Receiver Overrun */
>  #define E1000_ICR_RXT0  0x0080 /* Rx timer intr (ring 0) */
> +#define E1000_ICR_MDAC  0x0200 /* MDIO Access Complete */
> +#define E1000_ICR_SRPD  0x0001 /* Small Receive Packet Detected 
> */
> +#define E1000_ICR_ACK   0x0002 /* Receive ACK Frame Detected */
> +#define E1000_ICR_MNG   0x0004 /* Manageability Event Detected */
>  #define E1000_ICR_ECCER 0x0040 /* Uncorrectable ECC Error */
>  /* If this bit asserted, the driver should claim the interrupt */
>  #define E1000_ICR_INT_ASSERTED 0x8000
> @@ -407,7 +411,7 @@
>  #define E1000_ICR_RXQ1  0x0020 /* Rx Queue 1 Interrupt */
>  #define E1000_ICR_TXQ0  0x0040 /* Tx Queue 0 Interrupt */
>  #define E1000_ICR_TXQ1  0x0080 /* Tx Queue 1 Interrupt */
> -#define E1000_ICR_OTHER 0x0100 /* Other Interrupts */
> +#define E1000_ICR_OTHER 0x0100 /* Other Interrupt */
>
>  /* PBA ECC Register */
>  #define E1000_PBA_ECC_COUNTER_MASK  0xFFF0 /* ECC counter mask */
> @@ -431,12 +435,27 @@
> E1000_IMS_RXSEQ  |\
> E1000_IMS_LSC)
>
> +/* These are all of the events related to the OTHER interrupt.
> + */
> +#define IMS_OTHER_MASK ( \
> +   E1000_IMS_LSC  | \
> +   E1000_IMS_RXO  | \
> +   E1000_IMS_MDAC | \
> +   E1000_IMS_SRPD | \
> +   E1000_IMS_ACK  | \
> +   E1000_IMS_MNG)
> +
>  /* Interrupt Mask Set */
>  #define E1000_IMS_TXDW  E1000_ICR_TXDW  /* Transmit desc written 
> back */
>  #define E1000_IMS_LSC   E1000_ICR_LSC   /* Link Status Change */
>  #define E1000_IMS_RXSEQ E1000_ICR_RXSEQ /* Rx sequence error */
>  #define E1000_IMS_RXDMT0E1000_ICR_RXDMT0/* Rx desc min. threshold */
> +#define E1000_IMS_RXO   E1000_ICR_RXO   /* Receiver Overrun */
>  #define E1000_IMS_RXT0  E1000_ICR_RXT0  /* Rx timer intr */
> +#define E1000_IMS_MDAC  E1000_ICR_MDAC  /* MDIO Access Complete */
> +#define E1000_IMS_SRPD  E1000_ICR_SRPD  /* Small Receive Packet */
> +#define E1000_IMS_ACK   E1000_ICR_ACK   /* Receive ACK Frame 
> Detected */
> +#define E1000_IMS_MNG   E1000_ICR_MNG   /* Manageability Event */
>  #define E1000_IMS_ECCER E1000_ICR_ECCER /* Uncorrectable ECC Error */
>  #define E1000_IMS_RXQ0  E1000_ICR_RXQ0  /* Rx Queue 0 Interrupt */
>  #define E1000_IMS_RXQ1  E1000_ICR_RXQ1  /* Rx Queue 1 Interrupt */
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 2c9609bee2ae..9fd4050a91ca 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1914,16 +1914,12 @@ static irqreturn_t e1000_msix_other(int 
> __always_unused irq, void *data)
> struct net_device *netdev = data;
> struct e1000_adapter *adapter = netdev_priv(netdev);
> struct e1000_hw *hw = >hw;
> -   u32 icr;
> -
> -   icr = er32(ICR);
> -   ew32(ICR, E1000_ICR_OTHER);
> +   u32 icr = er32(ICR);
>
> if (icr & adapter->eiac_mask)
> ew32(ICS, (icr & adapter->eiac_mask));
>
> if (icr & E1000_ICR_LSC) {
> -   ew32(ICR, E1000_ICR_LSC);
> hw->mac.get_link_status = true;
> /* guard against interrupt when we're going down */
> if (!test_bit(__E1000_DOWN, >state))
> @@ -1931,7 +1927,7 @@ static irqreturn_t e1000_msix_other(int 

Re: [PATCH net-queue 2/3] e1000e: Fix queue interrupt re-raising in Other interrupt.

2018-02-08 Thread Alexander Duyck
On Wed, Feb 7, 2018 at 10:47 PM, Benjamin Poirier  wrote:
> restores the ICS write for rx/tx queue interrupts which was present before
> commit 16ecba59bc33 ("e1000e: Do not read ICR in Other interrupt",
> v4.5-rc1) but was not restored in commit 4aea7a5c5e94 ("e1000e: Avoid
> receiver overrun interrupt bursts", v4.15-rc1).
>
> This re-raises the queue interrupts in case the txq or rxq bits were set in
> ICR and the Other interrupt handler read and cleared ICR before the queue
> interrupt was raised.
>
> Fixes: 4aea7a5c5e94 ("e1000e: Avoid receiver overrun interrupt bursts")
> Signed-off-by: Benjamin Poirier 

Acked-by: Alexander Duyck 

> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 3b36efa6228d..2c9609bee2ae 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1919,6 +1919,9 @@ static irqreturn_t e1000_msix_other(int __always_unused 
> irq, void *data)
> icr = er32(ICR);
> ew32(ICR, E1000_ICR_OTHER);
>
> +   if (icr & adapter->eiac_mask)
> +   ew32(ICS, (icr & adapter->eiac_mask));
> +
> if (icr & E1000_ICR_LSC) {
> ew32(ICR, E1000_ICR_LSC);
> hw->mac.get_link_status = true;
> --
> 2.16.1
>


Re: [PATCH net-queue 1/3] Partial revert "e1000e: Avoid receiver overrun interrupt bursts"

2018-02-08 Thread Alexander Duyck
On Wed, Feb 7, 2018 at 10:47 PM, Benjamin Poirier  wrote:
> This partially reverts commit 4aea7a5c5e940c1723add439f4088844cd26196d.
>
> We keep the fix for the first part of the problem (1) described in the log
> of that commit, that is to read ICR in the other interrupt handler. We
> remove the fix for the second part of the problem (2), Other interrupt
> throttling.
>
> Bursts of "Other" interrupts may once again occur during rxo (receive
> overflow) traffic conditions. This is deemed acceptable in the interest of
> avoiding unforeseen fallout from changes that are not strictly necessary.
> As discussed, the e1000e driver should be in "maintenance mode".
>
> Link: https://www.spinics.net/lists/netdev/msg480675.html
> Signed-off-by: Benjamin Poirier 

Acked-by: Alexander Duyck 

> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 16 ++--
>  1 file changed, 2 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 153ad406c65e..3b36efa6228d 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1915,21 +1915,10 @@ static irqreturn_t e1000_msix_other(int 
> __always_unused irq, void *data)
> struct e1000_adapter *adapter = netdev_priv(netdev);
> struct e1000_hw *hw = >hw;
> u32 icr;
> -   bool enable = true;
>
> icr = er32(ICR);
> ew32(ICR, E1000_ICR_OTHER);
>
> -   if (icr & E1000_ICR_RXO) {
> -   ew32(ICR, E1000_ICR_RXO);
> -   enable = false;
> -   /* napi poll will re-enable Other, make sure it runs */
> -   if (napi_schedule_prep(>napi)) {
> -   adapter->total_rx_bytes = 0;
> -   adapter->total_rx_packets = 0;
> -   __napi_schedule(>napi);
> -   }
> -   }
> if (icr & E1000_ICR_LSC) {
> ew32(ICR, E1000_ICR_LSC);
> hw->mac.get_link_status = true;
> @@ -1938,7 +1927,7 @@ static irqreturn_t e1000_msix_other(int __always_unused 
> irq, void *data)
> mod_timer(>watchdog_timer, jiffies + 1);
> }
>
> -   if (enable && !test_bit(__E1000_DOWN, >state))
> +   if (!test_bit(__E1000_DOWN, >state))
> ew32(IMS, E1000_IMS_OTHER);
>
> return IRQ_HANDLED;
> @@ -2708,8 +2697,7 @@ static int e1000e_poll(struct napi_struct *napi, int 
> weight)
> napi_complete_done(napi, work_done);
> if (!test_bit(__E1000_DOWN, >state)) {
> if (adapter->msix_entries)
> -   ew32(IMS, adapter->rx_ring->ims_val |
> -E1000_IMS_OTHER);
> +   ew32(IMS, adapter->rx_ring->ims_val);
> else
> e1000_irq_enable(adapter);
> }
> --
> 2.16.1
>


Re: [bpf-next V3 PATCH 1/5] bpf: Sync kernel ABI header with tooling header for bpf_common.h

2018-02-08 Thread Arnaldo Carvalho de Melo
Em Thu, Feb 08, 2018 at 12:48:12PM +0100, Jesper Dangaard Brouer escreveu:
> I recently fixed up a lot of commits that forgot to keep the tooling
> headers in sync.  And then I forgot to do the same thing in commit
> cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes"). Let correct
> that before people notice ;-).
> 
> Lawrence did partly fix/sync this for bpf.h in commit d6d4f60c3a09
> ("bpf: add selftest for tcpbpf").
> 
> Fixes: cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes")

We don't consider a bug to forget to update the tooling headers copy of
the files, i.e. its not a strict requirement on kernel developers to
care about tools/ :-)

I, for one, like to get the warning, its an opportunity for me to see
that something changed and that I should pay attention to see if
something needs to be done in the tooling side.

- Arnaldo

> Signed-off-by: Jesper Dangaard Brouer 
> ---
>  tools/include/uapi/linux/bpf_common.h |7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/include/uapi/linux/bpf_common.h 
> b/tools/include/uapi/linux/bpf_common.h
> index 18be90725ab0..ee97668bdadb 100644
> --- a/tools/include/uapi/linux/bpf_common.h
> +++ b/tools/include/uapi/linux/bpf_common.h
> @@ -15,9 +15,10 @@
>  
>  /* ld/ldx fields */
>  #define BPF_SIZE(code)  ((code) & 0x18)
> -#define  BPF_W   0x00
> -#define  BPF_H   0x08
> -#define  BPF_B   0x10
> +#define  BPF_W   0x00 /* 32-bit */
> +#define  BPF_H   0x08 /* 16-bit */
> +#define  BPF_B   0x10 /*  8-bit */
> +/* eBPF  BPF_DW  0x1864-bit */
>  #define BPF_MODE(code)  ((code) & 0xe0)
>  #define  BPF_IMM 0x00
>  #define  BPF_ABS 0x20


Re: [iproute PATCH] ip-route: Propagate errors from parse_one_nh()

2018-02-08 Thread Phil Sutter
Hi,

On Thu, Feb 08, 2018 at 02:26:05PM +0100, Élie Bouttier wrote:
> On 24/01/2018 16:44, Stephen Hemminger wrote:
> > On Wed, 24 Jan 2018 10:19:24 +0100
> > Phil Sutter  wrote:
> >> On Tue, Jan 23, 2018 at 02:44:42PM -0800, Stephen Hemminger wrote:
> >>> The original commit 2f406f2d0b4ef ("ip route: replace exits with returns")
> >>> looks like well intentioned but suspect. Most of the errors in ip route
> >>> indicate real issues where continuing is not a good plan.  
> >>
> >> You're right, the use of invarg() for any other error effectively
> >> prevents what said commit tried to achieve, so my fix is pretty
> >> pointless in that regard. Yet I wonder why we still have 'ip -batch
> >> -force' given that it's not useful. Maybe Élie is able to provide some
> >> details about the use-case said commit tried to fix?
> >>
> >> Meanwhile I'll prepare some patches to address the shortcomings you
> >> mentioned above.
> > 
> > The use case for batch (and force) is that there may be a large set of 
> > routes
> > or qdisc operations where it is ok for some of them to fail because of 
> > responses
> > from the kernel failing.  I don't think batch should ever just continue if 
> > handed
> > invalid syntax for device or address. There are some borderline cases, for 
> > example
> > if a tunnel device could not be created and later steps depend on that name.
> > 
> > Agree, lets get some real data on why the original patch was done.
> 
> If I remember correctly, I made this commit for a batch of "route get"
> and not stop on inexistent routes. But my patch was not limited to this
> use case and I tried to fix others potential situations where we should
> not stop. The change I made in parse_one_nh is not directly related to
> my use care, sorry for the introduced regression, I should have been
> more careful.
> 
> Ihmo, if a tunnel device could not be created, later steps depending on
> it will fail too, but we potentially still want subsequent operations
> (for instance the creation of a second tunnel) to be done. I you don't
> want it, don't use -force or make two batch files. From this point of
> view, Phil patch is better than invarg() but I'm fine with invarg() too.

OK, that 'route get' case seems pretty valid for me. Also, the situation
I was fixing for in the first place was caused by non-existing
interface, which may very well be caused by a previous attempt at
creating it which had failed. So not really a case of "invalid syntax",
either.

I guess the best approach to satisfy all considerations would be to make
a clear distinction between syntax errors and others (including
non-existing interface e.g.) and not allowing for the latter to exit the
program but propagate errors up to a non-zero return code in do_cmd().
Might be quite some work given that '-batch -force' seems to be a rarely
used combination.

I had a quick idea of changing batch mode to fork and exec in order to
avoid the diligent work needed, but a quick comparison between batch
mode and separate ip invocations for about 64k commands each showed a
slowdown of over 20 times, so really not an alternative here.

My guess is that distinguishing between syntax error and "regular" ones
won't be very clear in all cases and trying to maintain it might be
error-prone. So I'd personally just not make that distinction at all but
try to let -force continue at all times, no matter what. Especially
since the user explicitly requested this behaviour.

Stephen, I think this is a maintainer's decision now. :)

Cheers, Phil


Re: possible deadlock in rtnl_lock (4)

2018-02-08 Thread Xin Long
On Thu, Feb 8, 2018 at 9:25 PM, Dmitry Vyukov  wrote:
> On Thu, Feb 8, 2018 at 10:54 AM, Xin Long  wrote:
>> On Thu, Feb 8, 2018 at 6:58 AM, syzbot
>>  wrote:
>>> Hello,
>>>
>>> syzbot hit the following crash on upstream commit
>>> a2e5790d841658485d642196dbb0927303d6c22f (Wed Feb 7 06:15:42 2018 +)
>>> Merge branch 'akpm' (patches from Andrew)
>>>
>>> So far this crash happened 632 times on
>>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master.
>>> C reproducer is attached.
>>> syzkaller reproducer is attached.
>>> Raw console output is attached.
>>> compiler: gcc (GCC) 7.1.1 20170620
>>> .config is attached.
>>>
>>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>>> Reported-by: syzbot+ddde1c7b7ff7442d7...@syzkaller.appspotmail.com
>>> It will help syzbot understand when the bug is fixed. See footer for
>>> details.
>>> If you forward the report, please keep this part and the footer.
>>>
>>>
>>> ==
>>> WARNING: possible circular locking dependency detected
>>> 4.15.0+ #301 Not tainted
>>> --
>>> syzkaller233489/4179 is trying to acquire lock:
>>>  (rtnl_mutex){+.+.}, at: [<48e996fd>] rtnl_lock+0x17/0x20
>>> net/core/rtnetlink.c:74
>>>
>>> but task is already holding lock:
>>>  ([i].mutex){+.+.}, at: [<328553a2>]
>>> xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041
>>>
>>> which lock already depends on the new lock.
>>>
>>>
>>> the existing dependency chain (in reverse order) is:
>>>
>>> -> #2 ([i].mutex){+.+.}:
>>>__mutex_lock_common kernel/locking/mutex.c:756 [inline]
>>>__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
>>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>>>xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041
>>>xt_request_find_table_lock+0x28/0xc0 net/netfilter/x_tables.c:1088
>>>get_info+0x154/0x690 net/ipv6/netfilter/ip6_tables.c:989
>>>do_ipt_get_ctl+0x159/0xac0 net/ipv4/netfilter/ip_tables.c:1699
>>>nf_sockopt net/netfilter/nf_sockopt.c:104 [inline]
>>>nf_getsockopt+0x6a/0xc0 net/netfilter/nf_sockopt.c:122
>>>ip_getsockopt+0x15c/0x220 net/ipv4/ip_sockglue.c:1571
>>>tcp_getsockopt+0x82/0xd0 net/ipv4/tcp.c:3359
>>>sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2934
>>>SYSC_getsockopt net/socket.c:1880 [inline]
>>>SyS_getsockopt+0x178/0x340 net/socket.c:1862
>>>do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
>>>entry_SYSCALL_64_after_hwframe+0x26/0x9b
>>>
>>> -> #1 (sk_lock-AF_INET){+.+.}:
>>>lock_sock_nested+0xc2/0x110 net/core/sock.c:2777
>>>lock_sock include/net/sock.h:1463 [inline]
>>>do_ip_setsockopt.isra.12+0x1d9/0x3210 net/ipv4/ip_sockglue.c:646
>>>ip_setsockopt+0x3a/0xa0 net/ipv4/ip_sockglue.c:1252
>>>udp_setsockopt+0x45/0x80 net/ipv4/udp.c:2401
>>>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975
>>>SYSC_setsockopt net/socket.c:1849 [inline]
>>>SyS_setsockopt+0x189/0x360 net/socket.c:1828
>>>do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
>>>entry_SYSCALL_64_after_hwframe+0x26/0x9b
>>>
>>> -> #0 (rtnl_mutex){+.+.}:
>>>lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
>>>__mutex_lock_common kernel/locking/mutex.c:756 [inline]
>>>__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
>>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>>>rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
>>>unregister_netdevice_notifier+0x91/0x4e0 net/core/dev.c:1673
>>>clusterip_config_entry_put net/ipv4/netfilter/ipt_CLUSTERIP.c:114
>>> [inline]
>>>clusterip_tg_destroy+0x389/0x6e0
>>> net/ipv4/netfilter/ipt_CLUSTERIP.c:518
>>>cleanup_entry+0x218/0x350 net/ipv4/netfilter/ip_tables.c:654
>>>__do_replace+0x79d/0xa50 net/ipv4/netfilter/ip_tables.c:1089
>>>do_replace net/ipv4/netfilter/ip_tables.c:1145 [inline]
>>>do_ipt_set_ctl+0x40f/0x5f0 net/ipv4/netfilter/ip_tables.c:1675
>>>nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
>>>nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
>>>ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1259
>>>tcp_setsockopt+0x82/0xd0 net/ipv4/tcp.c:2905
>>>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975
>>>SYSC_setsockopt net/socket.c:1849 [inline]
>>>SyS_setsockopt+0x189/0x360 net/socket.c:1828
>>>do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
>>>entry_SYSCALL_64_after_hwframe+0x26/0x9b
>>>
>>> other info that might help us debug this:
>>>
>>> Chain exists of:
>>>   rtnl_mutex --> sk_lock-AF_INET --> [i].mutex
>>>
>>>  Possible unsafe locking scenario:
>>>
>>>

Re: [iproute PATCH] ip-route: Propagate errors from parse_one_nh()

2018-02-08 Thread Élie Bouttier
On 24/01/2018 16:44, Stephen Hemminger wrote:
> On Wed, 24 Jan 2018 10:19:24 +0100
> Phil Sutter  wrote:
> 
>> Hi Stephen,
>>
>> On Tue, Jan 23, 2018 at 02:44:42PM -0800, Stephen Hemminger wrote:
>>> On Tue, 23 Jan 2018 17:40:47 +0100
>>> Phil Sutter  wrote:
>>>   
 The following command segfaults if enp0s31f6 does not exist:

 | # ip -6 route add default proto ra metric 20100 \
 |  nexthop via fe80:52:0:2040::1fc dev enp0s31f6 weight 1 \
 |  nexthop via fe80:52:0:2040::1fe dev enp0s31f6 weight 1

 Since the non-zero return code from parse_one_nh() is ignored,
 parse_nexthops() continues iterating over the the same fields in argv
 until buffer space is exhausted and eventually accesses unallocated
 memory.

 Fix this by aborting on error in parse_nexthops() and make
 iproute_modify() fail if parse_nexthops() did.

 Reported-by: Lennart Poettering 
 Fixes: 2f406f2d0b4ef ("ip route: replace exits with returns")
 Signed-off-by: Phil Sutter 
 ---
  ip/iproute.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)

 diff --git a/ip/iproute.c b/ip/iproute.c
 index bf886fda9d761..d7accf57ac8d1 100644
 --- a/ip/iproute.c
 +++ b/ip/iproute.c
 @@ -871,7 +871,8 @@ static int parse_nexthops(struct nlmsghdr *n, struct 
 rtmsg *r,
memset(rtnh, 0, sizeof(*rtnh));
rtnh->rtnh_len = sizeof(*rtnh);
rta->rta_len += rtnh->rtnh_len;
 -  parse_one_nh(n, r, rta, rtnh, , );
 +  if (parse_one_nh(n, r, rta, rtnh, , ) < 0)
 +  return -1;
rtnh = RTNH_NEXT(rtnh);
}
  
 @@ -1318,8 +1319,8 @@ static int iproute_modify(int cmd, unsigned int 
 flags, int argc, char **argv)
addattr_l(, sizeof(req), RTA_METRICS, RTA_DATA(mxrta), 
 RTA_PAYLOAD(mxrta));
}
  
 -  if (nhs_ok)
 -  parse_nexthops(, , argc, argv);
 +  if (nhs_ok && parse_nexthops(, , argc, argv) < 0)
 +  return -1;
  
if (req.r.rtm_family == AF_UNSPEC)
req.r.rtm_family = AF_INET;  
>>>
>>>
>>> The real issue is that handling of invalid device is different than all the 
>>> other
>>> possible semantic errors.
>>>
>>> My recommendations are:
>>> * change bad device to use invarg() which does exit
>>> * make functions that only return 0 void including
>>> parse_one_nh
>>> lwt_parse_encap
>>> get_addr
>>>
>>> Also, it looks like read_family converts any address family it doesn't know 
>>> about to unspec
>>> that is stupid behavior as well.
>>>
>>> The original commit 2f406f2d0b4ef ("ip route: replace exits with returns")
>>> looks like well intentioned but suspect. Most of the errors in ip route
>>> indicate real issues where continuing is not a good plan.  
>>
>> You're right, the use of invarg() for any other error effectively
>> prevents what said commit tried to achieve, so my fix is pretty
>> pointless in that regard. Yet I wonder why we still have 'ip -batch
>> -force' given that it's not useful. Maybe Élie is able to provide some
>> details about the use-case said commit tried to fix?
>>
>> Meanwhile I'll prepare some patches to address the shortcomings you
>> mentioned above.
> 
> The use case for batch (and force) is that there may be a large set of routes
> or qdisc operations where it is ok for some of them to fail because of 
> responses
> from the kernel failing.  I don't think batch should ever just continue if 
> handed
> invalid syntax for device or address. There are some borderline cases, for 
> example
> if a tunnel device could not be created and later steps depend on that name.
> 
> Agree, lets get some real data on why the original patch was done.

If I remember correctly, I made this commit for a batch of "route get"
and not stop on inexistent routes. But my patch was not limited to this
use case and I tried to fix others potential situations where we should
not stop. The change I made in parse_one_nh is not directly related to
my use care, sorry for the introduced regression, I should have been
more careful.

Ihmo, if a tunnel device could not be created, later steps depending on
it will fail too, but we potentially still want subsequent operations
(for instance the creation of a second tunnel) to be done. I you don't
want it, don't use -force or make two batch files. From this point of
view, Phil patch is better than invarg() but I'm fine with invarg() too.


Thank you all and sorry again,

Élie


Re: possible deadlock in rtnl_lock (4)

2018-02-08 Thread Dmitry Vyukov
On Thu, Feb 8, 2018 at 10:54 AM, Xin Long  wrote:
> On Thu, Feb 8, 2018 at 6:58 AM, syzbot
>  wrote:
>> Hello,
>>
>> syzbot hit the following crash on upstream commit
>> a2e5790d841658485d642196dbb0927303d6c22f (Wed Feb 7 06:15:42 2018 +)
>> Merge branch 'akpm' (patches from Andrew)
>>
>> So far this crash happened 632 times on
>> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master.
>> C reproducer is attached.
>> syzkaller reproducer is attached.
>> Raw console output is attached.
>> compiler: gcc (GCC) 7.1.1 20170620
>> .config is attached.
>>
>> IMPORTANT: if you fix the bug, please add the following tag to the commit:
>> Reported-by: syzbot+ddde1c7b7ff7442d7...@syzkaller.appspotmail.com
>> It will help syzbot understand when the bug is fixed. See footer for
>> details.
>> If you forward the report, please keep this part and the footer.
>>
>>
>> ==
>> WARNING: possible circular locking dependency detected
>> 4.15.0+ #301 Not tainted
>> --
>> syzkaller233489/4179 is trying to acquire lock:
>>  (rtnl_mutex){+.+.}, at: [<48e996fd>] rtnl_lock+0x17/0x20
>> net/core/rtnetlink.c:74
>>
>> but task is already holding lock:
>>  ([i].mutex){+.+.}, at: [<328553a2>]
>> xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041
>>
>> which lock already depends on the new lock.
>>
>>
>> the existing dependency chain (in reverse order) is:
>>
>> -> #2 ([i].mutex){+.+.}:
>>__mutex_lock_common kernel/locking/mutex.c:756 [inline]
>>__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>>xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041
>>xt_request_find_table_lock+0x28/0xc0 net/netfilter/x_tables.c:1088
>>get_info+0x154/0x690 net/ipv6/netfilter/ip6_tables.c:989
>>do_ipt_get_ctl+0x159/0xac0 net/ipv4/netfilter/ip_tables.c:1699
>>nf_sockopt net/netfilter/nf_sockopt.c:104 [inline]
>>nf_getsockopt+0x6a/0xc0 net/netfilter/nf_sockopt.c:122
>>ip_getsockopt+0x15c/0x220 net/ipv4/ip_sockglue.c:1571
>>tcp_getsockopt+0x82/0xd0 net/ipv4/tcp.c:3359
>>sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2934
>>SYSC_getsockopt net/socket.c:1880 [inline]
>>SyS_getsockopt+0x178/0x340 net/socket.c:1862
>>do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
>>entry_SYSCALL_64_after_hwframe+0x26/0x9b
>>
>> -> #1 (sk_lock-AF_INET){+.+.}:
>>lock_sock_nested+0xc2/0x110 net/core/sock.c:2777
>>lock_sock include/net/sock.h:1463 [inline]
>>do_ip_setsockopt.isra.12+0x1d9/0x3210 net/ipv4/ip_sockglue.c:646
>>ip_setsockopt+0x3a/0xa0 net/ipv4/ip_sockglue.c:1252
>>udp_setsockopt+0x45/0x80 net/ipv4/udp.c:2401
>>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975
>>SYSC_setsockopt net/socket.c:1849 [inline]
>>SyS_setsockopt+0x189/0x360 net/socket.c:1828
>>do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
>>entry_SYSCALL_64_after_hwframe+0x26/0x9b
>>
>> -> #0 (rtnl_mutex){+.+.}:
>>lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
>>__mutex_lock_common kernel/locking/mutex.c:756 [inline]
>>__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
>>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>>rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
>>unregister_netdevice_notifier+0x91/0x4e0 net/core/dev.c:1673
>>clusterip_config_entry_put net/ipv4/netfilter/ipt_CLUSTERIP.c:114
>> [inline]
>>clusterip_tg_destroy+0x389/0x6e0
>> net/ipv4/netfilter/ipt_CLUSTERIP.c:518
>>cleanup_entry+0x218/0x350 net/ipv4/netfilter/ip_tables.c:654
>>__do_replace+0x79d/0xa50 net/ipv4/netfilter/ip_tables.c:1089
>>do_replace net/ipv4/netfilter/ip_tables.c:1145 [inline]
>>do_ipt_set_ctl+0x40f/0x5f0 net/ipv4/netfilter/ip_tables.c:1675
>>nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
>>nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
>>ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1259
>>tcp_setsockopt+0x82/0xd0 net/ipv4/tcp.c:2905
>>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975
>>SYSC_setsockopt net/socket.c:1849 [inline]
>>SyS_setsockopt+0x189/0x360 net/socket.c:1828
>>do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
>>entry_SYSCALL_64_after_hwframe+0x26/0x9b
>>
>> other info that might help us debug this:
>>
>> Chain exists of:
>>   rtnl_mutex --> sk_lock-AF_INET --> [i].mutex
>>
>>  Possible unsafe locking scenario:
>>
>>CPU0CPU1
>>
>>   lock([i].mutex);
>>lock(sk_lock-AF_INET);
>>  

Re: selftests: net: reuseport_bpf failed intermittently

2018-02-08 Thread Naresh Kamboju
Hi Daniel,

On 8 February 2018 at 18:07, Daniel Borkmann  wrote:
> On 02/08/2018 12:41 PM, Naresh Kamboju wrote:
>> selftests/net/reuseport_bpf FAILED in full run on x86_64 and the
>> independent test execution resulted as PASS.
>>
>> Test failed output log:
>> -
>> Testing too many filters...
>> Testing filters on non-SO_REUSEPORT socket...
>> ./reuseport_bpf : ebpf error: Operation not permitted
>>
>> One more experiment,
>> The same test case "reuseport_bpf" run 10 times in a row.
>> The first run pass and other runs failed.
>>
>> Running tests for 10 times
>> + cd /opt/kselftests/mainline/net
>> + ./reuseport_bpf
> [...]
>>  IPv6 TCP w/ mapped IPv4 
>> Testing EBPF mod 10...
>> Reprograming, testing mod 5...
>> Testing CBPF mod 10...
>> Reprograming, testing mod 5...
>> Testing filter add without bind...
>> SUCCESS
>> + echo PASS
>> PASS
>> + ./reuseport_bpf
>>  IPv4 UDP 
>> Testing EBPF mod 10...
>> Reprograming, testing mod 5...
>> ./reuseport_bpf: ebpf error. log:
>> 0: (bf) r6 = r1
>> 1: (20) r0 = *(u32 *)skb[0]
>> 2: (97) r0 %= 10
>> 3: (95) exit
>> processed 4 insns
>> : Operation not permitted
>> + echo FAIL
>> FAIL
>> + ./reuseport_bpf
>>  IPv4 UDP 
>> Testing EBPF mod 10...
>> Reprograming, testing mod 5...
>> ./reuseport_bpf: ebpf error. log:
>> 0: (bf) r6 = r1
>> 1: (20) r0 = *(u32 *)skb[0]
>> 2: (97) r0 %= 10
>> 3: (95) exit
>> processed 4 insns
>> : Operation not permitted
>> + echo FAIL
>> FAIL
>> + ./reuseport_bpf
>>  IPv4 UDP 
>> Testing EBPF mod 10...
>> Reprograming, testing mod 5...
>> Testing EBPF mod 20...
>> Reprograming, testing mod 10...
>> Testing CBPF mod 10...
>> Reprograming, testing mod 5...
>> Testing CBPF mod 20...
>> Reprograming, testing mod 10...
>> Testing too many filters...
>> Testing filters on non-SO_REUSEPORT socket...
>>  IPv6 UDP 
>> Testing EBPF mod 10...
>> Reprograming, testing mod 5...
>> Testing EBPF mod 20...
>> Reprograming, testing mod 10...
>> Testing CBPF mod 10...
>> Reprograming, testing mod 5...
>> Testing CBPF mod 20...
>> Reprograming, testing mod 10...
>> Testing too many filters...
>> Testing filters on non-SO_REUSEPORT socket...
>>  IPv6 UDP w/ mapped IPv4 
>> Testing EBPF mod 20...
>> Reprograming, testing mod 10...
>> Testing EBPF mod 10...
>> Reprograming, testing mod 5...
>> Testing CBPF mod 10...
>> Reprograming, testing mod 5...
>> Testing CBPF mod 20...
>> Reprograming, testing mod 10...
>>  IPv4 TCP 
>> Testing EBPF mod 10...
>> ./reuseport_bpf: failed to bind send socket: Address already in use
>> + echo FAIL
> [...]
>>
>> Please refer this bug link for more details
>> https://bugs.linaro.org/show_bug.cgi?id=3502
>>
>> We are using x86_64 machine and the rootfs is mounted on NFS.
>> Please guide me in solving this test case failure.
>
> [ +Alexei ]
>
> Below should fix these two issues. (For checking the bpf selftests, we should
> probably adapt similar scheme with ctor/dtor to act more graceful.)
>
>  tools/testing/selftests/net/reuseport_bpf.c | 21 -
>  1 file changed, 20 insertions(+), 1 deletion(-)

Thanks for the quick response and patch.
I will test this patch and share logs.

>
> diff --git a/tools/testing/selftests/net/reuseport_bpf.c 
> b/tools/testing/selftests/net/reuseport_bpf.c
> index 4a82174..cad14cd 100644
> --- a/tools/testing/selftests/net/reuseport_bpf.c
> +++ b/tools/testing/selftests/net/reuseport_bpf.c
> @@ -21,6 +21,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>
>  #ifndef ARRAY_SIZE
> @@ -190,11 +191,14 @@ static void send_from(struct test_params p, uint16_t 
> sport, char *buf,
> struct sockaddr * const saddr = new_any_sockaddr(p.send_family, 
> sport);
> struct sockaddr * const daddr =
> new_loopback_sockaddr(p.send_family, p.recv_port);
> -   const int fd = socket(p.send_family, p.protocol, 0);
> +   const int fd = socket(p.send_family, p.protocol, 0), one = 1;
>
> if (fd < 0)
> error(1, errno, "failed to create send socket");
>
> +   if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, , sizeof(one)))
> +   error(1, errno, "failed to set reuseaddr");
> +
> if (bind(fd, saddr, sockaddr_size()))
> error(1, errno, "failed to bind send socket");
>
> @@ -433,6 +437,21 @@ void enable_fastopen(void)
> }
>  }
>
> +static struct rlimit rlim_old, rlim_new;
> +
> +static  __attribute__((constructor)) void main_ctor(void)
> +{
> +   getrlimit(RLIMIT_MEMLOCK, _old);
> +   rlim_new.rlim_cur = rlim_old.rlim_cur + (1UL << 20);
> +   rlim_new.rlim_max = rlim_old.rlim_max + (1UL << 20);
> +   setrlimit(RLIMIT_MEMLOCK, _new);
> +}
> +
> +static __attribute__((destructor)) void main_dtor(void)
> +{
> +   setrlimit(RLIMIT_MEMLOCK, _old);
> +}
> +
>  int main(void)
>  {
> fprintf(stderr, " IPv4 UDP \n");
> --
> 2.9.5
>

- Naresh


Re: net: phy: question about phy_is_internal for generic-phy

2018-02-08 Thread Andrew Lunn
On Thu, Feb 08, 2018 at 07:09:25PM +0900, Kunihiko Hayashi wrote:
> Hello,
> 
> Is there a way to specify "phy is internal" to generic phy driver,
> that is, to make phy_is_internal() function available?
> 
> I found "phy-is-integrated" DT property in
> Documentation/devicetree/bindings/net/phy.txt, however, it seems
> that the property is no effect for generic phy. And I think that
> the meaning of "integrated" is slightly different from "internal".

Hi Kunihiko

Could you explain the bigger picture. Why do you need this?

Thanks
Andrew



 


Re: Regression for ip6-in-ip4 IPsec tunnel in 4.14.16

2018-02-08 Thread Yves-Alexis Perez
On Wed, 2018-02-07 at 20:46 +0100, Yves-Alexis Perez wrote:
> Maybe. I tried with removing the MTU setting, and I get (on ping again)
> 
> févr. 07 20:44:01 scapa kernel: mtu: 1266
> 
> which means I would get -EINVAL on standards kernels, which is not really good
> either.

Actually after rebooting on the Debian 4.14.17 kernel, with the outter MTU
unset (or set to 1360), I don't get the -EINVAL anymore, so maybe it'd be OK.

Unfortunately I have the feeling that debugging this will be a bit tricky for
other people. MTU tuning for tunnels are always a bit confusing, but having
the kernel return EINVAL on a ping doesn't really help narrowing it down to
that MTU setting. Not sure if some kind of logging could be added to help?

Regards,
-- 
Yves-Alexis

signature.asc
Description: This is a digitally signed message part


Re: selftests: net: reuseport_bpf failed intermittently

2018-02-08 Thread Daniel Borkmann
On 02/08/2018 12:41 PM, Naresh Kamboju wrote:
> selftests/net/reuseport_bpf FAILED in full run on x86_64 and the
> independent test execution resulted as PASS.
> 
> Test failed output log:
> -
> Testing too many filters...
> Testing filters on non-SO_REUSEPORT socket...
> ./reuseport_bpf : ebpf error: Operation not permitted
> 
> One more experiment,
> The same test case "reuseport_bpf" run 10 times in a row.
> The first run pass and other runs failed.
> 
> Running tests for 10 times
> + cd /opt/kselftests/mainline/net
> + ./reuseport_bpf
[...]
>  IPv6 TCP w/ mapped IPv4 
> Testing EBPF mod 10...
> Reprograming, testing mod 5...
> Testing CBPF mod 10...
> Reprograming, testing mod 5...
> Testing filter add without bind...
> SUCCESS
> + echo PASS
> PASS
> + ./reuseport_bpf
>  IPv4 UDP 
> Testing EBPF mod 10...
> Reprograming, testing mod 5...
> ./reuseport_bpf: ebpf error. log:
> 0: (bf) r6 = r1
> 1: (20) r0 = *(u32 *)skb[0]
> 2: (97) r0 %= 10
> 3: (95) exit
> processed 4 insns
> : Operation not permitted
> + echo FAIL
> FAIL
> + ./reuseport_bpf
>  IPv4 UDP 
> Testing EBPF mod 10...
> Reprograming, testing mod 5...
> ./reuseport_bpf: ebpf error. log:
> 0: (bf) r6 = r1
> 1: (20) r0 = *(u32 *)skb[0]
> 2: (97) r0 %= 10
> 3: (95) exit
> processed 4 insns
> : Operation not permitted
> + echo FAIL
> FAIL
> + ./reuseport_bpf
>  IPv4 UDP 
> Testing EBPF mod 10...
> Reprograming, testing mod 5...
> Testing EBPF mod 20...
> Reprograming, testing mod 10...
> Testing CBPF mod 10...
> Reprograming, testing mod 5...
> Testing CBPF mod 20...
> Reprograming, testing mod 10...
> Testing too many filters...
> Testing filters on non-SO_REUSEPORT socket...
>  IPv6 UDP 
> Testing EBPF mod 10...
> Reprograming, testing mod 5...
> Testing EBPF mod 20...
> Reprograming, testing mod 10...
> Testing CBPF mod 10...
> Reprograming, testing mod 5...
> Testing CBPF mod 20...
> Reprograming, testing mod 10...
> Testing too many filters...
> Testing filters on non-SO_REUSEPORT socket...
>  IPv6 UDP w/ mapped IPv4 
> Testing EBPF mod 20...
> Reprograming, testing mod 10...
> Testing EBPF mod 10...
> Reprograming, testing mod 5...
> Testing CBPF mod 10...
> Reprograming, testing mod 5...
> Testing CBPF mod 20...
> Reprograming, testing mod 10...
>  IPv4 TCP 
> Testing EBPF mod 10...
> ./reuseport_bpf: failed to bind send socket: Address already in use
> + echo FAIL
[...]
> 
> Please refer this bug link for more details
> https://bugs.linaro.org/show_bug.cgi?id=3502
> 
> We are using x86_64 machine and the rootfs is mounted on NFS.
> Please guide me in solving this test case failure.

[ +Alexei ]

Below should fix these two issues. (For checking the bpf selftests, we should
probably adapt similar scheme with ctor/dtor to act more graceful.)

 tools/testing/selftests/net/reuseport_bpf.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/tools/testing/selftests/net/reuseport_bpf.c 
b/tools/testing/selftests/net/reuseport_bpf.c
index 4a82174..cad14cd 100644
--- a/tools/testing/selftests/net/reuseport_bpf.c
+++ b/tools/testing/selftests/net/reuseport_bpf.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 

 #ifndef ARRAY_SIZE
@@ -190,11 +191,14 @@ static void send_from(struct test_params p, uint16_t 
sport, char *buf,
struct sockaddr * const saddr = new_any_sockaddr(p.send_family, sport);
struct sockaddr * const daddr =
new_loopback_sockaddr(p.send_family, p.recv_port);
-   const int fd = socket(p.send_family, p.protocol, 0);
+   const int fd = socket(p.send_family, p.protocol, 0), one = 1;

if (fd < 0)
error(1, errno, "failed to create send socket");

+   if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, , sizeof(one)))
+   error(1, errno, "failed to set reuseaddr");
+
if (bind(fd, saddr, sockaddr_size()))
error(1, errno, "failed to bind send socket");

@@ -433,6 +437,21 @@ void enable_fastopen(void)
}
 }

+static struct rlimit rlim_old, rlim_new;
+
+static  __attribute__((constructor)) void main_ctor(void)
+{
+   getrlimit(RLIMIT_MEMLOCK, _old);
+   rlim_new.rlim_cur = rlim_old.rlim_cur + (1UL << 20);
+   rlim_new.rlim_max = rlim_old.rlim_max + (1UL << 20);
+   setrlimit(RLIMIT_MEMLOCK, _new);
+}
+
+static __attribute__((destructor)) void main_dtor(void)
+{
+   setrlimit(RLIMIT_MEMLOCK, _old);
+}
+
 int main(void)
 {
fprintf(stderr, " IPv4 UDP \n");
-- 
2.9.5



Re: [oss-drivers] [PATCH net 5/5] nfp: populate MODULE_VERSION

2018-02-08 Thread Simon Horman
On Wed, Feb 07, 2018 at 08:55:26PM -0800, Jakub Kicinski wrote:
> DKMS and similar out-of-tree module replacement services use
> module version to make sure the out-of-tree software is not
> older than the module shipped with the kernel.  We use the
> kernel version in ethtool -i output, put it into MODULE_VERSION
> as well.
> 
> Reported-by: Jan Gutter 
> Signed-off-by: Jakub Kicinski 
> Reviewed-by: Dirk van der Merwe 

Reviewed-by: Simon Horman 


Re: [oss-drivers] [PATCH net 4/5] nfp: limit the number of TSO segments

2018-02-08 Thread Simon Horman
On Wed, Feb 07, 2018 at 08:55:25PM -0800, Jakub Kicinski wrote:
> Most FWs limit the number of TSO segments a frame can produce
> to 64.  This is for fairness and efficiency (of FW datapath)
> reasons.  If a frame with larger number of segments is submitted
> the FW will drop it.
> 
> Signed-off-by: Jakub Kicinski 

Reviewed-by: Simon Horman 


[RFC][PATCH bpf 2/2] bpf: powerpc64: add JIT support for multi-function programs

2018-02-08 Thread Sandipan Das
This adds support for bpf-to-bpf function calls for the powerpc64
JIT compiler. After a round of the usual JIT passes, the offsets
to callee functions from __bpf_call_base are known. To update the
target addresses for the branch instructions associated with each
BPF_CALL, an extra pass is performed.

Since it is seen that the offsets may be as large as 64 bits for
powerpc64, we use the aux data associated with each caller to get
the correct branch target address rather than using the imm field
of the BPF_CALL instruction.

Signed-off-by: Sandipan Das 
---
 arch/powerpc/net/bpf_jit_comp64.c | 77 ++-
 1 file changed, 67 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/net/bpf_jit_comp64.c 
b/arch/powerpc/net/bpf_jit_comp64.c
index 0a34b0cec7b7..f31f22c8cb0b 100644
--- a/arch/powerpc/net/bpf_jit_comp64.c
+++ b/arch/powerpc/net/bpf_jit_comp64.c
@@ -290,7 +290,7 @@ static void bpf_jit_emit_tail_call(u32 *image, struct 
codegen_context *ctx, u32
 /* Assemble the body code between the prologue & epilogue */
 static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image,
  struct codegen_context *ctx,
- u32 *addrs)
+ u32 *addrs, bool extra_pass)
 {
const struct bpf_insn *insn = fp->insnsi;
int flen = fp->len;
@@ -299,6 +299,10 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
/* Start of epilogue code - will only be valid 2nd pass onwards */
u32 exit_addr = addrs[flen];
 
+   /* List of callee functions - will only be valid for the extra pass */
+   struct bpf_prog **callee = fp->aux->func;
+   u32 callee_cnt = fp->aux->func_cnt, callee_idx = 0;
+
for (i = 0; i < flen; i++) {
u32 code = insn[i].code;
u32 dst_reg = b2p[insn[i].dst_reg];
@@ -746,11 +750,17 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
break;
 
/*
-* Call kernel helper
+* Call kernel helper or bpf function
 */
case BPF_JMP | BPF_CALL:
ctx->seen |= SEEN_FUNC;
-   func = (u8 *) __bpf_call_base + imm;
+   if (insn[i].src_reg == BPF_PSEUDO_CALL && extra_pass)
+   if (callee && callee_idx < callee_cnt)
+   func = (u8 *) 
callee[callee_idx++]->bpf_func;
+   else
+   return -EINVAL;
+   else
+   func = (u8 *) __bpf_call_base + imm;
 
/* Save skb pointer if we need to re-cache skb data */
if ((ctx->seen & SEEN_SKB) &&
@@ -970,6 +980,14 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 
*image,
return 0;
 }
 
+struct powerpc64_jit_data {
+   struct bpf_binary_header *header;
+   u32 *addrs;
+   u8 *image;
+   u32 proglen;
+   struct codegen_context ctx;
+};
+
 struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
 {
u32 proglen;
@@ -977,6 +995,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
u8 *image = NULL;
u32 *code_base;
u32 *addrs;
+   struct powerpc64_jit_data *jit_data;
struct codegen_context cgctx;
int pass;
int flen;
@@ -984,6 +1003,7 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
struct bpf_prog *org_fp = fp;
struct bpf_prog *tmp_fp;
bool bpf_blinded = false;
+   bool extra_pass = false;
 
if (!fp->jit_requested)
return org_fp;
@@ -997,7 +1017,28 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog *fp)
fp = tmp_fp;
}
 
+   jit_data = fp->aux->jit_data;
+   if (!jit_data) {
+   jit_data = kzalloc(sizeof(*jit_data), GFP_KERNEL);
+   if (!jit_data) {
+   fp = org_fp;
+   goto out;
+   }
+   fp->aux->jit_data = jit_data;
+   }
+
flen = fp->len;
+   addrs = jit_data->addrs;
+   if (addrs) {
+   cgctx = jit_data->ctx;
+   image = jit_data->image;
+   bpf_hdr = jit_data->header;
+   proglen = jit_data->proglen;
+   alloclen = proglen + FUNCTION_DESCR_SIZE;
+   extra_pass = true;
+   goto skip_init_ctx;
+   }
+
addrs = kzalloc((flen+1) * sizeof(*addrs), GFP_KERNEL);
if (addrs == NULL) {
fp = org_fp;
@@ -1010,10 +1051,10 @@ struct bpf_prog *bpf_int_jit_compile(struct bpf_prog 
*fp)
cgctx.stack_size = round_up(fp->aux->stack_depth, 16);
 
/* Scouting faux-generate pass 0 */
-   if (bpf_jit_build_body(fp, 0, , addrs)) {
+   if 

[RFC][PATCH bpf 1/2] bpf: allow 64-bit offsets for bpf function calls

2018-02-08 Thread Sandipan Das
The imm field of a bpf_insn is a signed 32-bit integer. For
JIT-ed bpf-to-bpf function calls, it stores the offset from
__bpf_call_base to the start of the callee function.

For some architectures, such as powerpc64, it was found that
this offset may be as large as 64 bits because of which this
cannot be accomodated in the imm field without truncation.

To resolve this, we additionally use the aux data within each
bpf_prog associated with the caller functions to store the
addresses of their respective callees.

Signed-off-by: Sandipan Das 
---
 kernel/bpf/verifier.c | 39 ++-
 1 file changed, 38 insertions(+), 1 deletion(-)

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 5fb69a85d967..52088b4ca02f 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -5282,6 +5282,19 @@ static int jit_subprogs(struct bpf_verifier_env *env)
 * run last pass of JIT
 */
for (i = 0; i <= env->subprog_cnt; i++) {
+   u32 flen = func[i]->len, callee_cnt = 0;
+   struct bpf_prog **callee;
+
+   /* for now assume that the maximum number of bpf function
+* calls that can be made by a caller must be at most the
+* number of bpf instructions in that function
+*/
+   callee = kzalloc(sizeof(func[i]) * flen, GFP_KERNEL);
+   if (!callee) {
+   err = -ENOMEM;
+   goto out_free;
+   }
+
insn = func[i]->insnsi;
for (j = 0; j < func[i]->len; j++, insn++) {
if (insn->code != (BPF_JMP | BPF_CALL) ||
@@ -5292,6 +5305,26 @@ static int jit_subprogs(struct bpf_verifier_env *env)
insn->imm = (u64 (*)(u64, u64, u64, u64, u64))
func[subprog]->bpf_func -
__bpf_call_base;
+
+   /* the offset to the callee from __bpf_call_base
+* may be larger than what the 32 bit integer imm
+* can accomodate which will truncate the higher
+* order bits
+*
+* to avoid this, we additionally utilize the aux
+* data of each caller function for storing the
+* addresses of every callee associated with it
+*/
+   callee[callee_cnt++] = func[subprog];
+   }
+
+   /* free up callee list if no function calls were made */
+   if (!callee_cnt) {
+   kfree(callee);
+   callee = NULL;
+   } else {
+   func[i]->aux->func = callee;
+   func[i]->aux->func_cnt = callee_cnt;
}
}
for (i = 0; i <= env->subprog_cnt; i++) {
@@ -5338,8 +5371,12 @@ static int jit_subprogs(struct bpf_verifier_env *env)
return 0;
 out_free:
for (i = 0; i <= env->subprog_cnt; i++)
-   if (func[i])
+   if (func[i]) {
+   /* cleanup callee list */
+   if (func[i]->aux->func)
+   kfree(func[i]->aux->func);
bpf_jit_free(func[i]);
+   }
kfree(func);
/* cleanup main prog to be interpreted */
prog->jit_requested = 0;
-- 
2.14.3



[bpf-next V3 PATCH 4/5] selftests/bpf: add selftest that use test_libbpf_open

2018-02-08 Thread Jesper Dangaard Brouer
This script test_libbpf.sh will be part of the 'make run_tests'
invocation, but can also be invoked manually in this directory,
and a verbose mode can be enabled via setting the environment
variable $VERBOSE like:

 $ VERBOSE=yes ./test_libbpf.sh

The script contains some tests that are commented out, as they
currently fail.  They are reminders about what we need to improve
for the libbpf loader library.

Signed-off-by: Jesper Dangaard Brouer 
---
 tools/testing/selftests/bpf/Makefile   |   14 +++-
 tools/testing/selftests/bpf/test_libbpf.sh |   49 
 2 files changed, 61 insertions(+), 2 deletions(-)
 create mode 100755 tools/testing/selftests/bpf/test_libbpf.sh

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index de7b85e2e532..8b5667714250 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -13,6 +13,7 @@ endif
 CFLAGS += -Wall -O2 -I$(APIDIR) -I$(LIBDIR) -I$(GENDIR) $(GENFLAGS) 
-I../../../include
 LDLIBS += -lcap -lelf -lrt -lpthread
 
+# Order correspond to 'make run_tests' order
 TEST_GEN_PROGS = test_verifier test_tag test_maps test_lru_map test_lpm_map 
test_progs \
test_align test_verifier_log test_dev_cgroup test_tcpbpf_user
 
@@ -20,17 +21,26 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o 
test_tcp_estats.o test
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o 
sockmap_parse_prog.o \
sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
-   sample_map_ret0.o test_tcpbpf_kern.o test_libbpf_open
+   sample_map_ret0.o test_tcpbpf_kern.o
 
-TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh \
+# Order correspond to 'make run_tests' order
+TEST_PROGS := test_kmod.sh \
+   test_libbpf.sh \
+   test_xdp_redirect.sh \
+   test_xdp_meta.sh \
test_offload.py
 
+# Compile but not part of 'make run_tests'
+TEST_GEN_PROGS_EXTENDED = test_libbpf_open
+
 include ../lib.mk
 
 BPFOBJ := $(OUTPUT)/libbpf.a $(OUTPUT)/cgroup_helpers.c
 
 $(TEST_GEN_PROGS): $(BPFOBJ)
 
+$(TEST_GEN_PROGS_EXTENDED): $(OUTPUT)/libbpf.a
+
 .PHONY: force
 
 # force a rebuild of BPFOBJ when its dependencies are updated
diff --git a/tools/testing/selftests/bpf/test_libbpf.sh 
b/tools/testing/selftests/bpf/test_libbpf.sh
new file mode 100755
index ..d97dc914cd49
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_libbpf.sh
@@ -0,0 +1,49 @@
+#!/bin/sh
+# SPDX-License-Identifier: GPL-2.0
+
+export TESTNAME=test_libbpf
+
+# Determine selftest success via shell exit code
+exit_handler()
+{
+   if (( $? == 0 )); then
+   echo "selftests: $TESTNAME [PASS]";
+   else
+   echo "$TESTNAME: failed at file $LAST_LOADED" 1>&2
+   echo "selftests: $TESTNAME [FAILED]";
+   fi
+}
+
+libbpf_open_file()
+{
+   LAST_LOADED=$1
+   if [ -n "$VERBOSE" ]; then
+   ./test_libbpf_open $1
+   else
+   ./test_libbpf_open --quiet $1
+   fi
+}
+
+# Exit script immediately (well catched by trap handler) if any
+# program/thing exits with a non-zero status.
+set -e
+
+# (Use 'trap -l' to list meaning of numbers)
+trap exit_handler 0 2 3 6 9
+
+libbpf_open_file test_l4lb.o
+
+# TODO: fix libbpf to load noinline functions
+# [warning] libbpf: incorrect bpf_call opcode
+#libbpf_open_file test_l4lb_noinline.o
+
+# TODO: fix test_xdp_meta.c to load with libbpf
+# [warning] libbpf: test_xdp_meta.o doesn't provide kernel version
+#libbpf_open_file test_xdp_meta.o
+
+# TODO: fix libbpf to handle .eh_frame
+# [warning] libbpf: relocation failed: no section(10)
+#libbpf_open_file ../../../../samples/bpf/tracex3_kern.o
+
+# Success
+exit 0



[bpf-next V3 PATCH 3/5] selftests/bpf: add test program for loading BPF ELF files

2018-02-08 Thread Jesper Dangaard Brouer
V2: Moved program into selftests/bpf from tools/libbpf

This program can be used on its own for testing/debugging if a
BPF ELF-object file can be loaded with libbpf (from tools/lib/bpf).

If something is wrong with the ELF object, the program have
a --debug mode that will display the ELF sections and especially
the skipped sections.  This allows for quickly identifying the
problematic ELF section number, which can be corrolated with the
readelf tool.

The program signal error via return codes, and also have
a --quiet mode, which is practical for use in scripts like
selftests/bpf.

Signed-off-by: Jesper Dangaard Brouer 
---
 tools/testing/selftests/bpf/Makefile   |2 
 tools/testing/selftests/bpf/test_libbpf_open.c |  150 
 2 files changed, 151 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/bpf/test_libbpf_open.c

diff --git a/tools/testing/selftests/bpf/Makefile 
b/tools/testing/selftests/bpf/Makefile
index bf05bc5e36e5..de7b85e2e532 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -20,7 +20,7 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o 
test_tcp_estats.o test
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o 
sockmap_parse_prog.o \
sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
-   sample_map_ret0.o test_tcpbpf_kern.o
+   sample_map_ret0.o test_tcpbpf_kern.o test_libbpf_open
 
 TEST_PROGS := test_kmod.sh test_xdp_redirect.sh test_xdp_meta.sh \
test_offload.py
diff --git a/tools/testing/selftests/bpf/test_libbpf_open.c 
b/tools/testing/selftests/bpf/test_libbpf_open.c
new file mode 100644
index ..8fcd1c076add
--- /dev/null
+++ b/tools/testing/selftests/bpf/test_libbpf_open.c
@@ -0,0 +1,150 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Jesper Dangaard Brouer, Red Hat Inc.
+ */
+static const char *__doc__ =
+   "Libbpf test program for loading BPF ELF object files";
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static const struct option long_options[] = {
+   {"help",no_argument,NULL, 'h' },
+   {"debug",   no_argument,NULL, 'D' },
+   {"quiet",   no_argument,NULL, 'q' },
+   {0, 0, NULL,  0 }
+};
+
+static void usage(char *argv[])
+{
+   int i;
+
+   printf("\nDOCUMENTATION:\n%s\n\n", __doc__);
+   printf(" Usage: %s (options-see-below) BPF_FILE\n", argv[0]);
+   printf(" Listing options:\n");
+   for (i = 0; long_options[i].name != 0; i++) {
+   printf(" --%-12s", long_options[i].name);
+   printf(" short-option: -%c",
+  long_options[i].val);
+   printf("\n");
+   }
+   printf("\n");
+}
+
+#define DEFINE_PRINT_FN(name, enabled) \
+static int libbpf_##name(const char *fmt, ...) \
+{  \
+va_list args;  \
+int ret;   \
+   \
+va_start(args, fmt);   \
+   if (enabled) {  \
+   fprintf(stderr, "[" #name "] ");\
+   ret = vfprintf(stderr, fmt, args);  \
+   }   \
+va_end(args);  \
+return ret;\
+}
+DEFINE_PRINT_FN(warning, 1)
+DEFINE_PRINT_FN(info, 1)
+DEFINE_PRINT_FN(debug, 1)
+
+#define EXIT_FAIL_LIBBPF EXIT_FAILURE
+#define EXIT_FAIL_OPTION 2
+
+int test_walk_progs(struct bpf_object *obj, bool verbose)
+{
+   struct bpf_program *prog;
+   int cnt = 0;
+
+   bpf_object__for_each_program(prog, obj) {
+   cnt++;
+   if (verbose)
+   printf("Prog (count:%d) section_name: %s\n", cnt,
+  bpf_program__title(prog, false));
+   }
+   return 0;
+}
+
+int test_walk_maps(struct bpf_object *obj, bool verbose)
+{
+   struct bpf_map *map;
+   int cnt = 0;
+
+   bpf_map__for_each(map, obj) {
+   cnt++;
+   if (verbose)
+   printf("Map (count:%d) name: %s\n", cnt,
+  bpf_map__name(map));
+   }
+   return 0;
+}
+
+int test_open_file(char *filename, bool verbose)
+{
+   struct bpf_object *bpfobj = NULL;
+   long err;
+
+   if (verbose)
+   printf("Open BPF ELF-file with libbpf: %s\n", filename);
+
+   /* Load BPF ELF object file and check for errors */
+   bpfobj = bpf_object__open(filename);
+   err = libbpf_get_error(bpfobj);
+   if (err) {
+   char err_buf[128];
+   

[bpf-next V3 PATCH 5/5] tools/libbpf: handle issues with bpf ELF objects containing .eh_frames

2018-02-08 Thread Jesper Dangaard Brouer
V3: More generic skipping of relo-section (suggested by Daniel)

If clang >= 4.0.1 is missing the option '-target bpf', it will cause
llc/llvm to create two ELF sections for "Exception Frames", with
section names '.eh_frame' and '.rel.eh_frame'.

The BPF ELF loader library libbpf fails when loading files with these
sections.  The other in-kernel BPF ELF loader in samples/bpf/bpf_load.c,
handle this gracefully. And iproute2 loader also seems to work with these
"eh" sections.

The issue in libbpf is caused by bpf_object__elf_collect() skipping
some sections, and later when performing relocation it will be
pointing to a skipped section, as these sections cannot be found by
bpf_object__find_prog_by_idx() in bpf_object__collect_reloc().

This is a general issue that also occurs for other sections, like
debug sections which are also skipped and can have relo section.

As suggested by Daniel.  To avoid keeping state about all skipped
sections, instead perform a direct qlookup in the ELF object.  Lookup
the section that the relo-section points to and check if it contains
executable machine instructions (denoted by the sh_flags
SHF_EXECINSTR).  Use this check to also skip irrelevant relo-sections.

Note, for samples/bpf/ the '-target bpf' parameter to clang cannot be used
due to incompatibility with asm embedded headers, that some of the samples
include. This is explained in more details by Yonghong Song in bpf_devel_QA.

Signed-off-by: Jesper Dangaard Brouer 
---
 tools/lib/bpf/libbpf.c |   26 ++
 1 file changed, 26 insertions(+)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index b4eeaa3ebff5..661147773dcb 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -738,6 +738,24 @@ bpf_object__init_maps(struct bpf_object *obj)
return 0;
 }
 
+static bool section_have_execinstr(struct bpf_object *obj, int idx)
+{
+   Elf_Scn *scn;
+   GElf_Shdr sh;
+
+   scn = elf_getscn(obj->efile.elf, idx);
+   if (!scn)
+   return false;
+
+   if (gelf_getshdr(scn, ) != )
+   return false;
+
+   if (sh.sh_flags & SHF_EXECINSTR)
+   return true;
+
+   return false;
+}
+
 static int bpf_object__elf_collect(struct bpf_object *obj)
 {
Elf *elf = obj->efile.elf;
@@ -821,6 +839,14 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
} else if (sh.sh_type == SHT_REL) {
void *reloc = obj->efile.reloc;
int nr_reloc = obj->efile.nr_reloc + 1;
+   int sec = sh.sh_info; /* points to other section */
+
+   /* Only do relo for section with exec instructions */
+   if (!section_have_execinstr(obj, sec)) {
+   pr_debug("skip relo %s(%d) for section(%d)\n",
+name, idx, sec);
+   continue;
+   }
 
reloc = realloc(reloc,
sizeof(*obj->efile.reloc) * nr_reloc);



[bpf-next V3 PATCH 0/5] tools/libbpf improvements and selftests

2018-02-08 Thread Jesper Dangaard Brouer
While playing with using libbpf for the Suricata project, we had
issues LLVM >= 4.0.1 generating ELF files that could not be loaded
with libbpf (tools/lib/bpf/).

During the troubleshooting phase, I wrote a test program and improved
the debugging output in libbpf.  I turned this into a selftests
program, and it also serves as a code example for libbpf in itself.

I discovered that there are at least three ELF load issues with
libbpf.  I left them as TODO comments in (tools/testing/selftests/bpf)
test_libbpf.sh. I've only fixed the load issue with eh_frames, and
other types of relo-section that does not have exec flags.  We can
work on the other issues later.

---

Jesper Dangaard Brouer (5):
  bpf: Sync kernel ABI header with tooling header for bpf_common.h
  tools/libbpf: improve the pr_debug statements to contain section numbers
  selftests/bpf: add test program for loading BPF ELF files
  selftests/bpf: add selftest that use test_libbpf_open
  tools/libbpf: handle issues with bpf ELF objects containing .eh_frames


 tools/include/uapi/linux/bpf_common.h  |7 +
 tools/lib/bpf/libbpf.c |   51 ++--
 tools/testing/selftests/bpf/Makefile   |   12 ++
 tools/testing/selftests/bpf/test_libbpf.sh |   49 
 tools/testing/selftests/bpf/test_libbpf_open.c |  150 
 5 files changed, 253 insertions(+), 16 deletions(-)
 create mode 100755 tools/testing/selftests/bpf/test_libbpf.sh
 create mode 100644 tools/testing/selftests/bpf/test_libbpf_open.c

--


[bpf-next V3 PATCH 2/5] tools/libbpf: improve the pr_debug statements to contain section numbers

2018-02-08 Thread Jesper Dangaard Brouer
While debugging a bpf ELF loading issue, I needed to correlate the
ELF section number with the failed relocation section reference.
Thus, add section numbers/index to the pr_debug.

In debug mode, also print section that were skipped.  This helped
me identify that a section (.eh_frame) was skipped, and this was
the reason the relocation section (.rel.eh_frame) could not find
that section number.

The section numbers corresponds to the readelf tools Section Headers [Nr].

Signed-off-by: Jesper Dangaard Brouer 
---
 tools/lib/bpf/libbpf.c |   25 +
 1 file changed, 13 insertions(+), 12 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 30c776375118..b4eeaa3ebff5 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -315,8 +315,8 @@ bpf_program__init(void *data, size_t size, char 
*section_name, int idx,
 
prog->section_name = strdup(section_name);
if (!prog->section_name) {
-   pr_warning("failed to alloc name for prog under section %s\n",
-  section_name);
+   pr_warning("failed to alloc name for prog under section(%d) 
%s\n",
+  idx, section_name);
goto errout;
}
 
@@ -759,29 +759,29 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
 
idx++;
if (gelf_getshdr(scn, ) != ) {
-   pr_warning("failed to get section header from %s\n",
-  obj->path);
+   pr_warning("failed to get section(%d) header from %s\n",
+  idx, obj->path);
err = -LIBBPF_ERRNO__FORMAT;
goto out;
}
 
name = elf_strptr(elf, ep->e_shstrndx, sh.sh_name);
if (!name) {
-   pr_warning("failed to get section name from %s\n",
-  obj->path);
+   pr_warning("failed to get section(%d) name from %s\n",
+  idx, obj->path);
err = -LIBBPF_ERRNO__FORMAT;
goto out;
}
 
data = elf_getdata(scn, 0);
if (!data) {
-   pr_warning("failed to get section data from %s(%s)\n",
-  name, obj->path);
+   pr_warning("failed to get section(%d) data from 
%s(%s)\n",
+  idx, name, obj->path);
err = -LIBBPF_ERRNO__FORMAT;
goto out;
}
-   pr_debug("section %s, size %ld, link %d, flags %lx, type=%d\n",
-name, (unsigned long)data->d_size,
+   pr_debug("section(%d) %s, size %ld, link %d, flags %lx, 
type=%d\n",
+idx, name, (unsigned long)data->d_size,
 (int)sh.sh_link, (unsigned long)sh.sh_flags,
 (int)sh.sh_type);
 
@@ -836,6 +836,8 @@ static int bpf_object__elf_collect(struct bpf_object *obj)
obj->efile.reloc[n].shdr = sh;
obj->efile.reloc[n].data = data;
}
+   } else {
+   pr_debug("skip section(%d) %s\n", idx, name);
}
if (err)
goto out;
@@ -1115,8 +1117,7 @@ static int bpf_object__collect_reloc(struct bpf_object 
*obj)
 
prog = bpf_object__find_prog_by_idx(obj, idx);
if (!prog) {
-   pr_warning("relocation failed: no %d section\n",
-  idx);
+   pr_warning("relocation failed: no section(%d)\n", idx);
return -LIBBPF_ERRNO__RELOC;
}
 



[bpf-next V3 PATCH 1/5] bpf: Sync kernel ABI header with tooling header for bpf_common.h

2018-02-08 Thread Jesper Dangaard Brouer
I recently fixed up a lot of commits that forgot to keep the tooling
headers in sync.  And then I forgot to do the same thing in commit
cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes"). Let correct
that before people notice ;-).

Lawrence did partly fix/sync this for bpf.h in commit d6d4f60c3a09
("bpf: add selftest for tcpbpf").

Fixes: cb5f7334d479 ("bpf: add comments to BPF ld/ldx sizes")
Signed-off-by: Jesper Dangaard Brouer 
---
 tools/include/uapi/linux/bpf_common.h |7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/include/uapi/linux/bpf_common.h 
b/tools/include/uapi/linux/bpf_common.h
index 18be90725ab0..ee97668bdadb 100644
--- a/tools/include/uapi/linux/bpf_common.h
+++ b/tools/include/uapi/linux/bpf_common.h
@@ -15,9 +15,10 @@
 
 /* ld/ldx fields */
 #define BPF_SIZE(code)  ((code) & 0x18)
-#defineBPF_W   0x00
-#defineBPF_H   0x08
-#defineBPF_B   0x10
+#defineBPF_W   0x00 /* 32-bit */
+#defineBPF_H   0x08 /* 16-bit */
+#defineBPF_B   0x10 /*  8-bit */
+/* eBPFBPF_DW  0x1864-bit */
 #define BPF_MODE(code)  ((code) & 0xe0)
 #defineBPF_IMM 0x00
 #defineBPF_ABS 0x20



Re: [PATCH net v2] netfilter: drop outermost socket lock in getsockopt()

2018-02-08 Thread Xin Long
On Thu, Feb 8, 2018 at 7:19 PM, Paolo Abeni  wrote:
> The Syzbot reported a possible deadlock in the netfilter area caused by
> rtnl lock, xt lock and socket lock being acquired with a different order
> on different code paths, leading to the following backtrace:
>
> ==
> WARNING: possible circular locking dependency detected
> 4.15.0+ #301 Not tainted
> --
> syzkaller233489/4179 is trying to acquire lock:
>   (rtnl_mutex){+.+.}, at: [<48e996fd>] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:74
>
> but task is already holding lock:
>   ([i].mutex){+.+.}, at: [<328553a2>]
> xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041
>
> which lock already depends on the new lock.
> ===
>
> Since commit 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock
> only in the required scope"), we already acquire the socket lock in
> the innermost scope, where needed. In such commit I forgot to remove
> the outer-most socket lock from the getsockopt() path, this commit
> addresses the issues dropping it now.
>
> v1 -> v2: fix bad subj, added relavant 'fixes' tag
>
> Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice 
> notifiers")
> Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
> Fixes: 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the 
> required scope")
> Reported-by: syzbot+ddde1c7b7ff7442d7...@syzkaller.appspotmail.com
> Suggested-by: Florian Westphal 
> Signed-off-by: Paolo Abeni 
> ---
>  net/ipv4/ip_sockglue.c   |  7 +--
>  net/ipv6/ipv6_sockglue.c | 10 ++
>  2 files changed, 3 insertions(+), 14 deletions(-)
>
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 008be04ac1cc..9c41a0cef1a5 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -1567,10 +1567,7 @@ int ip_getsockopt(struct sock *sk, int level,
> if (get_user(len, optlen))
> return -EFAULT;
>
> -   lock_sock(sk);
> -   err = nf_getsockopt(sk, PF_INET, optname, optval,
> -   );
> -   release_sock(sk);
> +   err = nf_getsockopt(sk, PF_INET, optname, optval, );
> if (err >= 0)
> err = put_user(len, optlen);
> return err;
> @@ -1602,9 +1599,7 @@ int compat_ip_getsockopt(struct sock *sk, int level, 
> int optname,
> if (get_user(len, optlen))
> return -EFAULT;
>
> -   lock_sock(sk);
> err = compat_nf_getsockopt(sk, PF_INET, optname, optval, 
> );
> -   release_sock(sk);
> if (err >= 0)
> err = put_user(len, optlen);
> return err;
> diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> index d78d41fc4b1a..24535169663d 100644
> --- a/net/ipv6/ipv6_sockglue.c
> +++ b/net/ipv6/ipv6_sockglue.c
> @@ -1367,10 +1367,7 @@ int ipv6_getsockopt(struct sock *sk, int level, int 
> optname,
> if (get_user(len, optlen))
> return -EFAULT;
>
> -   lock_sock(sk);
> -   err = nf_getsockopt(sk, PF_INET6, optname, optval,
> -   );
> -   release_sock(sk);
> +   err = nf_getsockopt(sk, PF_INET6, optname, optval, );
> if (err >= 0)
> err = put_user(len, optlen);
> }
> @@ -1409,10 +1406,7 @@ int compat_ipv6_getsockopt(struct sock *sk, int level, 
> int optname,
> if (get_user(len, optlen))
> return -EFAULT;
>
> -   lock_sock(sk);
> -   err = compat_nf_getsockopt(sk, PF_INET6,
> -  optname, optval, );
> -   release_sock(sk);
> +   err = compat_nf_getsockopt(sk, PF_INET6, optname, optval, 
> );
> if (err >= 0)
> err = put_user(len, optlen);
> }
> --
> 2.14.3
>
Reviewed-by: Xin Long 


Re: [Patch net] ipt_CLUSTERIP: fix a race condition of proc file creation

2018-02-08 Thread Xin Long
On Thu, Feb 8, 2018 at 1:59 PM, Cong Wang  wrote:
> There is a race condition between clusterip_config_entry_put()
> and clusterip_config_init(), after we release the spinlock in
> clusterip_config_entry_put(), a new proc file with a same IP could
> be created immediately since it is already removed from the configs
> list, therefore it triggers this warning:
>
> [ cut here ]
> proc_dir_entry 'ipt_CLUSTERIP/172.20.0.170' already registered
> WARNING: CPU: 1 PID: 4152 at fs/proc/generic.c:330 proc_register+0x2a4/0x370 
> fs/proc/generic.c:329
> Kernel panic - not syncing: panic_on_warn set ...
>
> As a quick fix, just move the proc_remove() inside the spinlock.
>
> Reported-by: 
> Fixes: 6c5d5cfbe3c5 ("netfilter: ipt_CLUSTERIP: check duplicate config when 
> initializing")
> Tested-by: Paolo Abeni 
> Cc: Xin Long 
> Cc: Pablo Neira Ayuso 
> Signed-off-by: Cong Wang 
> ---
>  net/ipv4/netfilter/ipt_CLUSTERIP.c | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c 
> b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> index 3a84a60f6b39..1ff72b87a066 100644
> --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
> +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> @@ -107,12 +107,6 @@ clusterip_config_entry_put(struct net *net, struct 
> clusterip_config *c)
>
> local_bh_disable();
> if (refcount_dec_and_lock(>entries, >lock)) {
> -   list_del_rcu(>list);
> -   spin_unlock(>lock);
> -   local_bh_enable();
> -
> -   unregister_netdevice_notifier(>notifier);
> -
> /* In case anyone still accesses the file, the open/close
>  * functions are also incrementing the refcount on their own,
>  * so it's safe to remove the entry even if it's in use. */
> @@ -120,6 +114,12 @@ clusterip_config_entry_put(struct net *net, struct 
> clusterip_config *c)
> if (cn->procdir)
> proc_remove(c->pde);
>  #endif
> +   list_del_rcu(>list);
> +   spin_unlock(>lock);
> +   local_bh_enable();
> +
> +   unregister_netdevice_notifier(>notifier);
> +
> return;
> }
> local_bh_enable();
> --
> 2.13.0
>
Reviewed-by: Xin Long 


[PATCH v2] lockdep: Fix fs_reclaim warning.

2018-02-08 Thread Tetsuo Handa
>From 361d37a7d36978020dfb4c11ec1f4800937ccb68 Mon Sep 17 00:00:00 2001
From: Tetsuo Handa 
Date: Thu, 8 Feb 2018 10:35:35 +0900
Subject: [PATCH v2] lockdep: Fix fs_reclaim warning.

Dave Jones reported fs_reclaim lockdep warnings.

  
  WARNING: possible recursive locking detected
  4.15.0-rc9-backup-debug+ #1 Not tainted
  
  sshd/24800 is trying to acquire lock:
   (fs_reclaim){+.+.}, at: [<84f438c2>] 
fs_reclaim_acquire.part.102+0x5/0x30

  but task is already holding lock:
   (fs_reclaim){+.+.}, at: [<84f438c2>] 
fs_reclaim_acquire.part.102+0x5/0x30

  other info that might help us debug this:
   Possible unsafe locking scenario:

 CPU0
 
lock(fs_reclaim);
lock(fs_reclaim);

   *** DEADLOCK ***

   May be due to missing lock nesting notation

  2 locks held by sshd/24800:
   #0:  (sk_lock-AF_INET6){+.+.}, at: [<1a069652>] tcp_sendmsg+0x19/0x40
   #1:  (fs_reclaim){+.+.}, at: [<84f438c2>] 
fs_reclaim_acquire.part.102+0x5/0x30

  stack backtrace:
  CPU: 3 PID: 24800 Comm: sshd Not tainted 4.15.0-rc9-backup-debug+ #1
  Call Trace:
   dump_stack+0xbc/0x13f
   __lock_acquire+0xa09/0x2040
   lock_acquire+0x12e/0x350
   fs_reclaim_acquire.part.102+0x29/0x30
   kmem_cache_alloc+0x3d/0x2c0
   alloc_extent_state+0xa7/0x410
   __clear_extent_bit+0x3ea/0x570
   try_release_extent_mapping+0x21a/0x260
   __btrfs_releasepage+0xb0/0x1c0
   btrfs_releasepage+0x161/0x170
   try_to_release_page+0x162/0x1c0
   shrink_page_list+0x1d5a/0x2fb0
   shrink_inactive_list+0x451/0x940
   shrink_node_memcg.constprop.88+0x4c9/0x5e0
   shrink_node+0x12d/0x260
   try_to_free_pages+0x418/0xaf0
   __alloc_pages_slowpath+0x976/0x1790
   __alloc_pages_nodemask+0x52c/0x5c0
   new_slab+0x374/0x3f0
   ___slab_alloc.constprop.81+0x47e/0x5a0
   __slab_alloc.constprop.80+0x32/0x60
   __kmalloc_track_caller+0x267/0x310
   __kmalloc_reserve.isra.40+0x29/0x80
   __alloc_skb+0xee/0x390
   sk_stream_alloc_skb+0xb8/0x340
   tcp_sendmsg_locked+0x8e6/0x1d30
   tcp_sendmsg+0x27/0x40
   inet_sendmsg+0xd0/0x310
   sock_write_iter+0x17a/0x240
   __vfs_write+0x2ab/0x380
   vfs_write+0xfb/0x260
   SyS_write+0xb6/0x140
   do_syscall_64+0x1e5/0xc05
   entry_SYSCALL64_slow_path+0x25/0x25

This warning is caused by commit d92a8cfcb37ecd13 ("locking/lockdep: Rework
FS_RECLAIM annotation") which replaced lockdep_set_current_reclaim_state()/
lockdep_clear_current_reclaim_state() in __perform_reclaim() and
lockdep_trace_alloc() in slab_pre_alloc_hook() with fs_reclaim_acquire()/
fs_reclaim_release(). Since __kmalloc_reserve() from __alloc_skb() adds
__GFP_NOMEMALLOC | __GFP_NOWARN to gfp_mask, and all reclaim path simply
propagates __GFP_NOMEMALLOC, fs_reclaim_acquire() in slab_pre_alloc_hook()
is trying to grab the 'fake' lock again when __perform_reclaim() already
grabbed the 'fake' lock.

The

  /* this guy won't enter reclaim */
  if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
  return false;

test which causes slab_pre_alloc_hook() to try to grab the 'fake' lock
was added by commit cf40bd16fdad42c0 ("lockdep: annotate reclaim context
(__GFP_NOFS)"). But that test is outdated because PF_MEMALLOC thread won't
enter reclaim regardless of __GFP_NOMEMALLOC after commit 341ce06f69abfafa
("page allocator: calculate the alloc_flags for allocation only once")
added the PF_MEMALLOC safeguard (

  /* Avoid recursion of direct reclaim */
  if (p->flags & PF_MEMALLOC)
  goto nopage;

in __alloc_pages_slowpath()).

Thus, let's fix outdated test by removing __GFP_NOMEMALLOC test and allow
__need_fs_reclaim() to return false.

Reported-and-tested-by: Dave Jones 
Signed-off-by: Tetsuo Handa 
Cc: Peter Zijlstra 
Cc: Nick Piggin 
---
 mm/page_alloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 81e18ce..19fb76b 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3590,7 +3590,7 @@ static bool __need_fs_reclaim(gfp_t gfp_mask)
return false;
 
/* this guy won't enter reclaim */
-   if ((current->flags & PF_MEMALLOC) && !(gfp_mask & __GFP_NOMEMALLOC))
+   if (current->flags & PF_MEMALLOC)
return false;
 
/* We're only interested __GFP_FS allocations for now */
-- 
1.8.3.1


selftests: net: reuseport_bpf failed intermittently

2018-02-08 Thread Naresh Kamboju
selftests/net/reuseport_bpf FAILED in full run on x86_64 and the
independent test execution resulted as PASS.

Test failed output log:
-
Testing too many filters...
Testing filters on non-SO_REUSEPORT socket...
./reuseport_bpf : ebpf error: Operation not permitted

One more experiment,
The same test case "reuseport_bpf" run 10 times in a row.
The first run pass and other runs failed.

Running tests for 10 times
+ cd /opt/kselftests/mainline/net
+ ./reuseport_bpf
 IPv4 UDP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing EBPF mod 20...
Reprograming, testing mod 10...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 20...
Reprograming, testing mod 10...
Testing too many filters...
Testing filters on non-SO_REUSEPORT socket...
 IPv6 UDP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing EBPF mod 20...
Reprograming, testing mod 10...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 20...
Reprograming, testing mod 10...
Testing too many filters...
Testing filters on non-SO_REUSEPORT socket...
 IPv6 UDP w/ mapped IPv4 
Testing EBPF mod 20...
Reprograming, testing mod 10...
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 20...
Reprograming, testing mod 10...
 IPv4 TCP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing too many filters...
Testing filters on non-SO_REUSEPORT socket...
 IPv6 TCP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing too many filters...
Testing filters on non-SO_REUSEPORT socket...
 IPv6 TCP w/ mapped IPv4 
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing filter add without bind...
SUCCESS
+ echo PASS
PASS
+ ./reuseport_bpf
 IPv4 UDP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
./reuseport_bpf: ebpf error. log:
0: (bf) r6 = r1
1: (20) r0 = *(u32 *)skb[0]
2: (97) r0 %= 10
3: (95) exit
processed 4 insns
: Operation not permitted
+ echo FAIL
FAIL
+ ./reuseport_bpf
 IPv4 UDP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
./reuseport_bpf: ebpf error. log:
0: (bf) r6 = r1
1: (20) r0 = *(u32 *)skb[0]
2: (97) r0 %= 10
3: (95) exit
processed 4 insns
: Operation not permitted
+ echo FAIL
FAIL
+ ./reuseport_bpf
 IPv4 UDP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing EBPF mod 20...
Reprograming, testing mod 10...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 20...
Reprograming, testing mod 10...
Testing too many filters...
Testing filters on non-SO_REUSEPORT socket...
 IPv6 UDP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing EBPF mod 20...
Reprograming, testing mod 10...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 20...
Reprograming, testing mod 10...
Testing too many filters...
Testing filters on non-SO_REUSEPORT socket...
 IPv6 UDP w/ mapped IPv4 
Testing EBPF mod 20...
Reprograming, testing mod 10...
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 20...
Reprograming, testing mod 10...
 IPv4 TCP 
Testing EBPF mod 10...
./reuseport_bpf: failed to bind send socket: Address already in use
+ echo FAIL
FAIL
+ ./reuseport_bpf
 IPv4 UDP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing EBPF mod 20...
Reprograming, testing mod 10...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 20...
Reprograming, testing mod 10...
Testing too many filters...
Testing filters on non-SO_REUSEPORT socket...
 IPv6 UDP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing EBPF mod 20...
Reprograming, testing mod 10...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 20...
Reprograming, testing mod 10...
Testing too many filters...
Testing filters on non-SO_REUSEPORT socket...
 IPv6 UDP w/ mapped IPv4 
Testing EBPF mod 20...
Reprograming, testing mod 10...
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 20...
Reprograming, testing mod 10...
 IPv4 TCP 
Testing EBPF mod 10...
./reuseport_bpf: failed to bind send socket: Address already in use
+ echo FAIL
FAIL
+ ./reuseport_bpf
 IPv4 UDP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing EBPF mod 20...
Reprograming, testing mod 10...
Testing CBPF mod 10...
Reprograming, testing mod 5...
Testing CBPF mod 20...
Reprograming, testing mod 10...
Testing too many filters...
Testing filters on non-SO_REUSEPORT socket...
 IPv6 UDP 
Testing EBPF mod 10...
Reprograming, testing mod 5...
Testing EBPF mod 20...
Reprograming, testing mod 10...
Testing 

[PATCH net v2] netfilter: drop outermost socket lock in getsockopt()

2018-02-08 Thread Paolo Abeni
The Syzbot reported a possible deadlock in the netfilter area caused by
rtnl lock, xt lock and socket lock being acquired with a different order
on different code paths, leading to the following backtrace:

==
WARNING: possible circular locking dependency detected
4.15.0+ #301 Not tainted
--
syzkaller233489/4179 is trying to acquire lock:
  (rtnl_mutex){+.+.}, at: [<48e996fd>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

but task is already holding lock:
  ([i].mutex){+.+.}, at: [<328553a2>]
xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041

which lock already depends on the new lock.
===

Since commit 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock
only in the required scope"), we already acquire the socket lock in
the innermost scope, where needed. In such commit I forgot to remove
the outer-most socket lock from the getsockopt() path, this commit
addresses the issues dropping it now.

v1 -> v2: fix bad subj, added relavant 'fixes' tag

Fixes: 22265a5c3c10 ("netfilter: xt_TEE: resolve oif using netdevice notifiers")
Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
Fixes: 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the 
required scope")
Reported-by: syzbot+ddde1c7b7ff7442d7...@syzkaller.appspotmail.com
Suggested-by: Florian Westphal 
Signed-off-by: Paolo Abeni 
---
 net/ipv4/ip_sockglue.c   |  7 +--
 net/ipv6/ipv6_sockglue.c | 10 ++
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 008be04ac1cc..9c41a0cef1a5 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1567,10 +1567,7 @@ int ip_getsockopt(struct sock *sk, int level,
if (get_user(len, optlen))
return -EFAULT;
 
-   lock_sock(sk);
-   err = nf_getsockopt(sk, PF_INET, optname, optval,
-   );
-   release_sock(sk);
+   err = nf_getsockopt(sk, PF_INET, optname, optval, );
if (err >= 0)
err = put_user(len, optlen);
return err;
@@ -1602,9 +1599,7 @@ int compat_ip_getsockopt(struct sock *sk, int level, int 
optname,
if (get_user(len, optlen))
return -EFAULT;
 
-   lock_sock(sk);
err = compat_nf_getsockopt(sk, PF_INET, optname, optval, );
-   release_sock(sk);
if (err >= 0)
err = put_user(len, optlen);
return err;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index d78d41fc4b1a..24535169663d 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -1367,10 +1367,7 @@ int ipv6_getsockopt(struct sock *sk, int level, int 
optname,
if (get_user(len, optlen))
return -EFAULT;
 
-   lock_sock(sk);
-   err = nf_getsockopt(sk, PF_INET6, optname, optval,
-   );
-   release_sock(sk);
+   err = nf_getsockopt(sk, PF_INET6, optname, optval, );
if (err >= 0)
err = put_user(len, optlen);
}
@@ -1409,10 +1406,7 @@ int compat_ipv6_getsockopt(struct sock *sk, int level, 
int optname,
if (get_user(len, optlen))
return -EFAULT;
 
-   lock_sock(sk);
-   err = compat_nf_getsockopt(sk, PF_INET6,
-  optname, optval, );
-   release_sock(sk);
+   err = compat_nf_getsockopt(sk, PF_INET6, optname, optval, );
if (err >= 0)
err = put_user(len, optlen);
}
-- 
2.14.3



Re: [PATCH bpf 0/6] nfp fix for calls, bpftool completions and doc fixes

2018-02-08 Thread Daniel Borkmann
On 02/08/2018 05:27 AM, Jakub Kicinski wrote:
> Hi!
> 
> First patch in this series fixes applying the relocation to immediate
> load instructions in the NFP JIT.
> 
> The remaining patches come from Quentin.  Small addition to libbpf
> makes sure it recognizes all standard section names.  Makefile in
> bpftool/Documentation is improved to explicitly check for rst2man
> being installed on the system, otherwise we risk installing empty
> files.  Man page for bpftool-map is corrected to include program
> as a potential value for map of programs.
> 
> Last two patches are slightly longer, those update bash completions to
> include this release cycle's additions from Roman.  Maybe the use of
> Fixes tags is slightly frivolous there, but having bash completions
> which don't cover all commands and options could be disruptive to work
> flow for users.

Series applied to bpf, thanks Jakub and Quentin!


Re: [PATCH net] netfilter: on setsockopt() acquire sock lock only in the required scope

2018-02-08 Thread Xin Long
On Thu, Feb 8, 2018 at 6:38 PM, Paolo Abeni  wrote:
> The Syzbot reported a possible deadlock in the netfilter area caused by
> rtnl lock, xt lock and socket lock being acquired with a different order
> on different code paths, leading to the following backtrace:
>
> ==
> WARNING: possible circular locking dependency detected
> 4.15.0+ #301 Not tainted
> --
> syzkaller233489/4179 is trying to acquire lock:
>   (rtnl_mutex){+.+.}, at: [<48e996fd>] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:74
>
> but task is already holding lock:
>   ([i].mutex){+.+.}, at: [<328553a2>]
> xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041
>
> which lock already depends on the new lock.
> ===
>
> Since commit 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock
> only in the required scope"), we already acquire the socket lock in
> the innermost scope, where needed. In such commit I forgot to remove
> the outer-most socket lock from the getsockopt() path, this commit
> addresses the issues dropping it now.
>
> Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
> Fixes: 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the 
> required scope")
> Reported-by: syzbot+ddde1c7b7ff7442d7...@syzkaller.appspotmail.com
> Suggested-by: Florian Westphal 
> Signed-off-by: Paolo Abeni 
> ---
>  net/ipv4/ip_sockglue.c   |  7 +--
>  net/ipv6/ipv6_sockglue.c | 10 ++
>  2 files changed, 3 insertions(+), 14 deletions(-)
>
> diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
> index 008be04ac1cc..9c41a0cef1a5 100644
> --- a/net/ipv4/ip_sockglue.c
> +++ b/net/ipv4/ip_sockglue.c
> @@ -1567,10 +1567,7 @@ int ip_getsockopt(struct sock *sk, int level,
> if (get_user(len, optlen))
> return -EFAULT;
>
> -   lock_sock(sk);
> -   err = nf_getsockopt(sk, PF_INET, optname, optval,
> -   );
> -   release_sock(sk);
> +   err = nf_getsockopt(sk, PF_INET, optname, optval, );
> if (err >= 0)
> err = put_user(len, optlen);
> return err;
> @@ -1602,9 +1599,7 @@ int compat_ip_getsockopt(struct sock *sk, int level, 
> int optname,
> if (get_user(len, optlen))
> return -EFAULT;
>
> -   lock_sock(sk);
> err = compat_nf_getsockopt(sk, PF_INET, optname, optval, 
> );
> -   release_sock(sk);
> if (err >= 0)
> err = put_user(len, optlen);
> return err;
> diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
> index d78d41fc4b1a..24535169663d 100644
> --- a/net/ipv6/ipv6_sockglue.c
> +++ b/net/ipv6/ipv6_sockglue.c
> @@ -1367,10 +1367,7 @@ int ipv6_getsockopt(struct sock *sk, int level, int 
> optname,
> if (get_user(len, optlen))
> return -EFAULT;
>
> -   lock_sock(sk);
> -   err = nf_getsockopt(sk, PF_INET6, optname, optval,
> -   );
> -   release_sock(sk);
> +   err = nf_getsockopt(sk, PF_INET6, optname, optval, );
> if (err >= 0)
> err = put_user(len, optlen);
> }
> @@ -1409,10 +1406,7 @@ int compat_ipv6_getsockopt(struct sock *sk, int level, 
> int optname,
> if (get_user(len, optlen))
> return -EFAULT;
>
> -   lock_sock(sk);
> -   err = compat_nf_getsockopt(sk, PF_INET6,
> -  optname, optval, );
> -   release_sock(sk);
> +   err = compat_nf_getsockopt(sk, PF_INET6, optname, optval, 
> );
> if (err >= 0)
> err = put_user(len, optlen);
> }
> --
> 2.14.3
>
Patch looks good to me, the better way to fix this deadlock,
just the subject should be 'getsockopt' instead.

Thanks.


Re: [PATCH] ath9k: turn on btcoex_enable as default

2018-02-08 Thread Felix Fietkau
On 2018-02-08 06:28, Kai-Heng Feng wrote:
> Without btcoex_enable, WiFi activies make both WiFi and Bluetooth
> unstable if there's a bluetooth connection.
> 
> Enable this option when bt_ant_diversity is disabled.
> 
> BugLink: https://bugs.launchpad.net/bugs/1746164
> Signed-off-by: Kai-Heng Feng 
I think this might cause regressions on devices that don't have
bluetooth. This probably either needs more EEPROM checks, or something
to selectively enable it only on affected platforms.

- Felix


[PATCH iproute2-next] ip/tunnel: Minor cleanups

2018-02-08 Thread Serhey Popovych
Few minor changes to reduce diffs between ip and ipv6 tunnel code:

  1) reduce intendation by one level when adding attributes in gre and
 gre6; reorder addattr*() calls to simplify diff

  2) reorder local variables definition; change their type (e.g. for
 IFLA_LINK) to match ones returned by rta_getattr_*()

  3) move "mode" parameter parsing in link_iptnl.c to the similar
 position as in link_ip6tnl.c

  4) handle "tc" as shortcut for "tclass"/"tos" in link_iptnl.c

  5) add whitespace where required

Signed-off-by: Serhey Popovych 
---
 ip/link_gre.c|   91 +++---
 ip/link_gre6.c   |   83 -
 ip/link_ip6tnl.c |   13 
 ip/link_iptnl.c  |   61 ++--
 ip/link_vti.c|2 +-
 ip/link_vti6.c   |2 +-
 6 files changed, 127 insertions(+), 125 deletions(-)

diff --git a/ip/link_gre.c b/ip/link_gre.c
index b2573a1..69c8deb 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -84,23 +84,23 @@ static int gre_parse_opt(struct link_util *lu, int argc, 
char **argv,
struct rtattr *tb[IFLA_MAX + 1];
struct rtattr *linkinfo[IFLA_INFO_MAX+1];
struct rtattr *greinfo[IFLA_GRE_MAX + 1];
+   int len;
__u16 iflags = 0;
__u16 oflags = 0;
__be32 ikey = 0;
__be32 okey = 0;
unsigned int saddr = 0;
unsigned int daddr = 0;
-   unsigned int link = 0;
__u8 pmtudisc = 1;
-   __u8 ttl = 0;
+   __u8 ignore_df = 0;
__u8 tos = 0;
-   int len;
+   __u8 ttl = 0;
+   __u32 link = 0;
__u16 encaptype = 0;
__u16 encapflags = 0;
__u16 encapsport = 0;
__u16 encapdport = 0;
__u8 metadata = 0;
-   __u8 ignore_df = 0;
__u32 fwmark = 0;
__u32 erspan_idx = 0;
__u8 erspan_ver = 0;
@@ -155,31 +155,34 @@ get_failed:
pmtudisc = rta_getattr_u8(
greinfo[IFLA_GRE_PMTUDISC]);
 
-   if (greinfo[IFLA_GRE_TTL])
-   ttl = rta_getattr_u8(greinfo[IFLA_GRE_TTL]);
+   if (greinfo[IFLA_GRE_IGNORE_DF])
+   ignore_df =
+   !!rta_getattr_u8(greinfo[IFLA_GRE_IGNORE_DF]);
 
if (greinfo[IFLA_GRE_TOS])
tos = rta_getattr_u8(greinfo[IFLA_GRE_TOS]);
 
+   if (greinfo[IFLA_GRE_TTL])
+   ttl = rta_getattr_u8(greinfo[IFLA_GRE_TTL]);
+
if (greinfo[IFLA_GRE_LINK])
link = rta_getattr_u32(greinfo[IFLA_GRE_LINK]);
 
if (greinfo[IFLA_GRE_ENCAP_TYPE])
encaptype = 
rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_TYPE]);
+
if (greinfo[IFLA_GRE_ENCAP_FLAGS])
encapflags = 
rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_FLAGS]);
+
if (greinfo[IFLA_GRE_ENCAP_SPORT])
encapsport = 
rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_SPORT]);
+
if (greinfo[IFLA_GRE_ENCAP_DPORT])
encapdport = 
rta_getattr_u16(greinfo[IFLA_GRE_ENCAP_DPORT]);
 
if (greinfo[IFLA_GRE_COLLECT_METADATA])
metadata = 1;
 
-   if (greinfo[IFLA_GRE_IGNORE_DF])
-   ignore_df =
-   !!rta_getattr_u8(greinfo[IFLA_GRE_IGNORE_DF]);
-
if (greinfo[IFLA_GRE_FWMARK])
fwmark = rta_getattr_u32(greinfo[IFLA_GRE_FWMARK]);
 
@@ -354,49 +357,47 @@ get_failed:
return -1;
}
 
-   if (!metadata) {
-   addattr32(n, 1024, IFLA_GRE_IKEY, ikey);
-   addattr32(n, 1024, IFLA_GRE_OKEY, okey);
-   addattr_l(n, 1024, IFLA_GRE_IFLAGS, , 2);
-   addattr_l(n, 1024, IFLA_GRE_OFLAGS, , 2);
-   addattr_l(n, 1024, IFLA_GRE_LOCAL, , 4);
-   addattr_l(n, 1024, IFLA_GRE_REMOTE, , 4);
-   addattr_l(n, 1024, IFLA_GRE_PMTUDISC, , 1);
-   if (ignore_df)
-   addattr8(n, 1024, IFLA_GRE_IGNORE_DF, ignore_df & 1);
-   if (link)
-   addattr32(n, 1024, IFLA_GRE_LINK, link);
-   addattr_l(n, 1024, IFLA_GRE_TTL, , 1);
-   addattr_l(n, 1024, IFLA_GRE_TOS, , 1);
-   addattr32(n, 1024, IFLA_GRE_FWMARK, fwmark);
-   if (erspan_ver) {
-   addattr8(n, 1024, IFLA_GRE_ERSPAN_VER, erspan_ver);
-   if (erspan_ver == 1 && erspan_idx != 0) {
-   addattr32(n, 1024,
- IFLA_GRE_ERSPAN_INDEX, erspan_idx);
-   } else if (erspan_ver == 2) {
-   addattr8(n, 1024,
-IFLA_GRE_ERSPAN_DIR, erspan_dir);
-  

[PATCH iproute2-next 2/3] gre/gre6: Unify gre_print_help()

2018-02-08 Thread Serhey Popovych
Reduce diff lines between gre and gre6 help printing code.

Use @struct link_util ->id field to print correct link help:
all callers now pass this data structure to gre_print_help().

Get rid of custom print_usage() and usage() functions and use
gre_print_help() directly, return from function on "... type
" instead of exit(2).

Signed-off-by: Serhey Popovych 
---
 ip/link_gre.c  |   73 +--
 ip/link_gre6.c |   74 ++--
 2 files changed, 67 insertions(+), 80 deletions(-)

diff --git a/ip/link_gre.c b/ip/link_gre.c
index b2573a1..e972a10 100644
--- a/ip/link_gre.c
+++ b/ip/link_gre.c
@@ -23,34 +23,38 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-static void print_usage(FILE *f)
+static void gre_print_help(struct link_util *lu, int argc, char **argv, FILE 
*f)
 {
fprintf(f,
-   "Usage: ... { gre | gretap | erspan } [ remote ADDR ]\n"
-   "[ local ADDR ]\n"
-   "[ [i|o]seq ]\n"
-   "[ [i|o]key KEY ]\n"
-   "[ [i|o]csum ]\n"
-   "[ ttl TTL ]\n"
-   "[ tos TOS ]\n"
-   "[ [no]pmtudisc ]\n"
-   "[ [no]ignore-df ]\n"
-   "[ dev PHYS_DEV ]\n"
-   "[ noencap ]\n"
-   "[ encap { fou | gue | none } ]\n"
-   "[ encap-sport PORT ]\n"
-   "[ encap-dport PORT ]\n"
-   "[ [no]encap-csum ]\n"
-   "[ [no]encap-csum6 ]\n"
-   "[ [no]encap-remcsum ]\n"
-   "[ external ]\n"
-   "[ fwmark MARK ]\n"
-   "[ erspan_ver version ]\n"
-   "[ erspan IDX ]\n"
-   "[ erspan_dir { ingress | egress } 
]\n"
-   "[ erspan_hwid hwid ]\n"
-   "[ external ]\n"
+   "Usage: ... %-9s [ remote ADDR ]\n",
+   lu->id
+   );
+   fprintf(f,
+   " [ local ADDR ]\n"
+   " [ [i|o]seq ]\n"
+   " [ [i|o]key KEY ]\n"
+   " [ [i|o]csum ]\n"
+   " [ ttl TTL ]\n"
+   " [ tos TOS ]\n"
+   " [ [no]pmtudisc ]\n"
+   " [ [no]ignore-df ]\n"
+   " [ dev PHYS_DEV ]\n"
+   " [ fwmark MARK ]\n"
+   " [ external ]\n"
+   " [ noencap ]\n"
+   " [ encap { fou | gue | none } ]\n"
+   " [ encap-sport PORT ]\n"
+   " [ encap-dport PORT ]\n"
+   " [ [no]encap-csum ]\n"
+   " [ [no]encap-csum6 ]\n"
+   " [ [no]encap-remcsum ]\n"
+   " [ erspan_ver version ]\n"
+   " [ erspan IDX ]\n"
+   " [ erspan_dir { ingress | egress } ]\n"
+   " [ erspan_hwid hwid ]\n"
"\n"
+   );
+   fprintf(f,
"Where: ADDR := { IP_ADDRESS | any }\n"
"   TOS  := { NUMBER | inherit }\n"
"   TTL  := { 1..255 | inherit }\n"
@@ -59,13 +63,6 @@ static void print_usage(FILE *f)
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
 static int gre_parse_opt(struct link_util *lu, int argc, char **argv,
 struct nlmsghdr *n)
 {
@@ -336,8 +333,10 @@ get_failed:
NEXT_ARG();
if (get_u16(_hwid, *argv, 0))
invarg("invalid erspan hwid\n", *argv);
-   } else
-   usage();
+   } else {
+   gre_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--; argv++;
}
 
@@ -517,12 +516,6 @@ static void gre_print_opt(struct link_util *lu, FILE *f, 

[PATCH iproute2-next 0/3] ip/tunnel: Unify tunnel help message print routines

2018-02-08 Thread Serhey Popovych
To show only relevant diffs of ip and ipv6 variants help message print
routines needs to be unified and improved.

Get rid of print_usage() and usage() wrappers: use single function to
output help message. As side effect we return -1 from parse function
instead of calling exit(2) in case of "... tunnel " is
found.

Additionally we get pointer to @struct link_util and can directly access
->id information to prepare customized help message.

Split calls to fprintf() two group: one that contains format string with
specifiers (thus requiring parameters) and another one that does not.
This helps compiler to optimize calls to fprintf() with fputs() when no
format specifiers in string. Do not use fputs() directly to keep code
formatting nice.

After this series applied following diffs:

  # diff -urN ip/link_gre{,6}.c
  # diff -urN ip/link_vti{,6}.c
  # diff -urN ip/link_ip{,6}tnl.c

in scope of help print routines reduced to necessary minimum.

Tested minimally by compiling and executing "ip link help " and
"ip link add type help" commands. Looks correct.

See individual patch description for more information.

Reviews, commands and suggestions are welcome.

Thanks,
Serhii

Serhey Popovych (3):
  vti/vti6: Unify vti_print_help()
  gre/gre6: Unify gre_print_help()
  iptnl/ip6tnl: Unify iptunnel_print_help()

 ip/link_gre.c|   73 
 ip/link_gre6.c   |   74 +
 ip/link_ip6tnl.c |   58 ++-
 ip/link_iptnl.c  |   88 ++
 ip/link_vti.c|   42 +++---
 ip/link_vti6.c   |   45 +---
 6 files changed, 178 insertions(+), 202 deletions(-)

-- 
1.7.10.4



[PATCH iproute2-next 1/3] vti/vti6: Unify vti_print_help()

2018-02-08 Thread Serhey Popovych
Reduce diff lines between vti and vti6 help printing code.

Use @struct link_util ->id field to print correct link help:
all callers now pass this data structure to vti_print_help().

Get rid of custom print_usage() and usage() functions and use
vti_print_help() directly, return from function on "... type
" instead of exit(2).

While there replace vti6 function prefix with vti to reduce
diff lines too.

Signed-off-by: Serhey Popovych 
---
 ip/link_vti.c  |   42 ++
 ip/link_vti6.c |   45 -
 2 files changed, 38 insertions(+), 49 deletions(-)

diff --git a/ip/link_vti.c b/ip/link_vti.c
index 49b87e9..f128e6b 100644
--- a/ip/link_vti.c
+++ b/ip/link_vti.c
@@ -23,29 +23,27 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-
-static void print_usage(FILE *f)
+static void vti_print_help(struct link_util *lu, int argc, char **argv, FILE 
*f)
 {
fprintf(f,
-   "Usage: ... vti [ remote ADDR ]\n"
-   "   [ local ADDR ]\n"
-   "   [ [i|o]key KEY ]\n"
-   "   [ dev PHYS_DEV ]\n"
-   "   [ fwmark MARK ]\n"
+   "Usage: ... %-4s [ remote ADDR ]\n",
+   lu->id
+   );
+   fprintf(f,
+   "[ local ADDR ]\n"
+   "[ [i|o]key KEY ]\n"
+   "[ dev PHYS_DEV ]\n"
+   "[ fwmark MARK ]\n"
"\n"
-   "Where: ADDR := { IP_ADDRESS }\n"
+   );
+   fprintf(f,
+   "Where: ADDR := { IP%s_ADDRESS }\n"
"   KEY  := { DOTTED_QUAD | NUMBER }\n"
-   "   MARK := { 0x0..0x }\n"
+   "   MARK := { 0x0..0x }\n",
+   ""
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
 static int vti_parse_opt(struct link_util *lu, int argc, char **argv,
 struct nlmsghdr *n)
 {
@@ -147,8 +145,10 @@ get_failed:
NEXT_ARG();
if (get_u32(, *argv, 0))
invarg("invalid fwmark\n", *argv);
-   } else
-   usage();
+   } else {
+   vti_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--; argv++;
}
 
@@ -208,12 +208,6 @@ static void vti_print_opt(struct link_util *lu, FILE *f, 
struct rtattr *tb[])
}
 }
 
-static void vti_print_help(struct link_util *lu, int argc, char **argv,
-   FILE *f)
-{
-   print_usage(f);
-}
-
 struct link_util vti_link_util = {
.id = "vti",
.maxattr = IFLA_VTI_MAX,
diff --git a/ip/link_vti6.c b/ip/link_vti6.c
index d1fbec5..0a888cd 100644
--- a/ip/link_vti6.c
+++ b/ip/link_vti6.c
@@ -24,30 +24,29 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-static void print_usage(FILE *f)
+static void vti_print_help(struct link_util *lu, int argc, char **argv, FILE 
*f)
 {
fprintf(f,
-   "Usage: ... vti6 [ remote ADDR ]\n"
+   "Usage: ... %-4s [ remote ADDR ]\n",
+   lu->id
+   );
+   fprintf(f,
"[ local ADDR ]\n"
"[ [i|o]key KEY ]\n"
"[ dev PHYS_DEV ]\n"
"[ fwmark MARK ]\n"
"\n"
-   "Where: ADDR := { IPV6_ADDRESS }\n"
+   );
+   fprintf(f,
+   "Where: ADDR := { IP%s_ADDRESS }\n"
"   KEY  := { DOTTED_QUAD | NUMBER }\n"
-   "   MARK := { 0x0..0x }\n"
+   "   MARK := { 0x0..0x }\n",
+   "V6"
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
-static int vti6_parse_opt(struct link_util *lu, int argc, char **argv,
- struct nlmsghdr *n)
+static int vti_parse_opt(struct link_util *lu, int argc, char **argv,
+struct nlmsghdr *n)
 {
struct ifinfomsg *ifi = NLMSG_DATA(n);
struct {
@@ -153,8 +152,10 @@ get_failed:
NEXT_ARG();
if (get_u32(, *argv, 0))
invarg("invalid fwmark\n", *argv);
-   } else
-   usage();
+   } else {
+   vti_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--; argv++;
}
 
@@ -169,7 +170,7 @@ get_failed:
return 0;
 }
 
-static void vti6_print_opt(struct link_util *lu, FILE *f, struct rtattr *tb[])
+static void 

[PATCH iproute2-next 3/3] iptnl/ip6tnl: Unify iptunnel_print_help()

2018-02-08 Thread Serhey Popovych
Reduce diff lines between iptnl and ip6tnl help printing code.

Use @struct link_util ->id field to print correct link help:
all callers now pass this data structure to iptunnel_print_help().

Get rid of custom print_usage() and usage() functions and use
iptunnel_print_help() directly, return from function on "... type
" instead of exit(2).

While there replace ip6tunnel function prefix with iptunnel to reduce
diff lines too.

Signed-off-by: Serhey Popovych 
---
 ip/link_ip6tnl.c |   58 ++-
 ip/link_iptnl.c  |   88 ++
 2 files changed, 73 insertions(+), 73 deletions(-)

diff --git a/ip/link_ip6tnl.c b/ip/link_ip6tnl.c
index 91d7d99..54545ad 100644
--- a/ip/link_ip6tnl.c
+++ b/ip/link_ip6tnl.c
@@ -29,20 +29,26 @@
 
 #define DEFAULT_TNL_HOP_LIMIT  (64)
 
-static void print_usage(FILE *f)
+static void iptunnel_print_help(struct link_util *lu, int argc, char **argv,
+   FILE *f)
 {
+   const char *mode;
+
+   fprintf(f,
+   "Usage: ... %-6s [ remote ADDR ]\n",
+   lu->id
+   );
fprintf(f,
-   "Usage: ... ip6tnl [ mode { ip6ip6 | ipip6 | any } ]\n"
-   "  [ remote ADDR ]\n"
"  [ local ADDR ]\n"
-   "  [ dev PHYS_DEV ]\n"
"  [ encaplimit ELIM ]\n"
"  [ hoplimit HLIM ]\n"
"  [ tclass TCLASS ]\n"
"  [ flowlabel FLOWLABEL ]\n"
"  [ dscp inherit ]\n"
-   "  [ fwmark MARK ]\n"
"  [ [no]allow-localremote ]\n"
+   "  [ dev PHYS_DEV ]\n"
+   "  [ fwmark MARK ]\n"
+   "  [ external ]\n"
"  [ noencap ]\n"
"  [ encap { fou | gue | none } ]\n"
"  [ encap-sport PORT ]\n"
@@ -50,8 +56,14 @@ static void print_usage(FILE *f)
"  [ [no]encap-csum ]\n"
"  [ [no]encap-csum6 ]\n"
"  [ [no]encap-remcsum ]\n"
-   "  [ external ]\n"
-   "\n"
+   );
+   mode = "{ ip6ip6 | ipip6 | any }";
+   fprintf(f,
+   "  [ mode %s ]\n"
+   "\n",
+   mode
+   );
+   fprintf(f,
"Where: ADDR  := IPV6_ADDRESS\n"
"   ELIM  := { none | 0..255 }(default=%d)\n"
"   HLIM  := 0..255 (default=%d)\n"
@@ -62,15 +74,8 @@ static void print_usage(FILE *f)
);
 }
 
-static void usage(void) __attribute__((noreturn));
-static void usage(void)
-{
-   print_usage(stderr);
-   exit(-1);
-}
-
-static int ip6tunnel_parse_opt(struct link_util *lu, int argc, char **argv,
-  struct nlmsghdr *n)
+static int iptunnel_parse_opt(struct link_util *lu, int argc, char **argv,
+ struct nlmsghdr *n)
 {
struct ifinfomsg *ifi = NLMSG_DATA(n);
struct {
@@ -304,8 +309,10 @@ get_failed:
encapflags &= ~TUNNEL_ENCAP_FLAG_REMCSUM;
} else if (strcmp(*argv, "external") == 0) {
metadata = 1;
-   } else
-   usage();
+   } else {
+   iptunnel_print_help(lu, argc, argv, stderr);
+   return -1;
+   }
argc--, argv++;
}
 
@@ -331,7 +338,8 @@ get_failed:
return 0;
 }
 
-static void ip6tunnel_print_opt(struct link_util *lu, FILE *f, struct rtattr 
*tb[])
+static void iptunnel_print_opt(struct link_util *lu, FILE *f,
+  struct rtattr *tb[])
 {
char s2[64];
int flags = 0;
@@ -456,16 +464,10 @@ static void ip6tunnel_print_opt(struct link_util *lu, 
FILE *f, struct rtattr *tb
IFLA_IPTUN_ENCAP_DPORT);
 }
 
-static void ip6tunnel_print_help(struct link_util *lu, int argc, char **argv,
-FILE *f)
-{
-   print_usage(f);
-}
-
 struct link_util ip6tnl_link_util = {
.id = "ip6tnl",
.maxattr = IFLA_IPTUN_MAX,
-   .parse_opt = ip6tunnel_parse_opt,
-   .print_opt = ip6tunnel_print_opt,
-   .print_help = ip6tunnel_print_help,
+   .parse_opt = iptunnel_parse_opt,
+   .print_opt = iptunnel_print_opt,
+   .print_help = iptunnel_print_help,
 };
diff --git a/ip/link_iptnl.c b/ip/link_iptnl.c
index 3e653b7..84117ac 100644
--- a/ip/link_iptnl.c
+++ b/ip/link_iptnl.c
@@ -24,49 +24,51 @@
 #include "ip_common.h"
 #include "tunnel.h"
 
-static void 

[PATCH net] netfilter: on setsockopt() acquire sock lock only in the required scope

2018-02-08 Thread Paolo Abeni
The Syzbot reported a possible deadlock in the netfilter area caused by
rtnl lock, xt lock and socket lock being acquired with a different order
on different code paths, leading to the following backtrace:

==
WARNING: possible circular locking dependency detected
4.15.0+ #301 Not tainted
--
syzkaller233489/4179 is trying to acquire lock:
  (rtnl_mutex){+.+.}, at: [<48e996fd>] rtnl_lock+0x17/0x20
net/core/rtnetlink.c:74

but task is already holding lock:
  ([i].mutex){+.+.}, at: [<328553a2>]
xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041

which lock already depends on the new lock.
===

Since commit 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock
only in the required scope"), we already acquire the socket lock in
the innermost scope, where needed. In such commit I forgot to remove
the outer-most socket lock from the getsockopt() path, this commit
addresses the issues dropping it now.

Fixes: 202f59afd441 ("netfilter: ipt_CLUSTERIP: do not hold dev")
Fixes: 3f34cfae1230 ("netfilter: on sockopt() acquire sock lock only in the 
required scope")
Reported-by: syzbot+ddde1c7b7ff7442d7...@syzkaller.appspotmail.com
Suggested-by: Florian Westphal 
Signed-off-by: Paolo Abeni 
---
 net/ipv4/ip_sockglue.c   |  7 +--
 net/ipv6/ipv6_sockglue.c | 10 ++
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 008be04ac1cc..9c41a0cef1a5 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -1567,10 +1567,7 @@ int ip_getsockopt(struct sock *sk, int level,
if (get_user(len, optlen))
return -EFAULT;
 
-   lock_sock(sk);
-   err = nf_getsockopt(sk, PF_INET, optname, optval,
-   );
-   release_sock(sk);
+   err = nf_getsockopt(sk, PF_INET, optname, optval, );
if (err >= 0)
err = put_user(len, optlen);
return err;
@@ -1602,9 +1599,7 @@ int compat_ip_getsockopt(struct sock *sk, int level, int 
optname,
if (get_user(len, optlen))
return -EFAULT;
 
-   lock_sock(sk);
err = compat_nf_getsockopt(sk, PF_INET, optname, optval, );
-   release_sock(sk);
if (err >= 0)
err = put_user(len, optlen);
return err;
diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c
index d78d41fc4b1a..24535169663d 100644
--- a/net/ipv6/ipv6_sockglue.c
+++ b/net/ipv6/ipv6_sockglue.c
@@ -1367,10 +1367,7 @@ int ipv6_getsockopt(struct sock *sk, int level, int 
optname,
if (get_user(len, optlen))
return -EFAULT;
 
-   lock_sock(sk);
-   err = nf_getsockopt(sk, PF_INET6, optname, optval,
-   );
-   release_sock(sk);
+   err = nf_getsockopt(sk, PF_INET6, optname, optval, );
if (err >= 0)
err = put_user(len, optlen);
}
@@ -1409,10 +1406,7 @@ int compat_ipv6_getsockopt(struct sock *sk, int level, 
int optname,
if (get_user(len, optlen))
return -EFAULT;
 
-   lock_sock(sk);
-   err = compat_nf_getsockopt(sk, PF_INET6,
-  optname, optval, );
-   release_sock(sk);
+   err = compat_nf_getsockopt(sk, PF_INET6, optname, optval, );
if (err >= 0)
err = put_user(len, optlen);
}
-- 
2.14.3



RE: [RFC crypto v3 8/9] chtls: Register the ULP

2018-02-08 Thread Vakul Garg


> -Original Message-
> From: linux-crypto-ow...@vger.kernel.org [mailto:linux-crypto-
> ow...@vger.kernel.org] On Behalf Of Atul Gupta
> Sent: Thursday, February 8, 2018 3:56 PM
> To: Dave Watson 
> Cc: s...@queasysnail.net; herb...@gondor.apana.org.au; linux-
> cry...@vger.kernel.org; ganes...@chelsio.co; netdev@vger.kernel.org;
> da...@davemloft.net; Boris Pismenny ; Ilya
> Lesokhin 
> Subject: RE: [RFC crypto v3 8/9] chtls: Register the ULP
> 
> I thought about this and approach below can avoid new ulp type:
> 
> 1. Register Inline TLS driver to net TLS
> 2. enable ethtool -K  tls-hw-record-offload on
> 3. Issue " setsockopt(fd, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) " after 
> Bind,
> this will enable user fetch net_device corresponding to ipaadr bound to
> interface, if dev found is the one registered and record-offload enabled,
> program the sk->sk_prot as required.
 
What happens in case of TLS  clients which do not explicitly call bind() and
rely on kernel to choose an ephemeral port for socket?
Does calling setsockopt after the connection is established fix the problem?

> 4. fallback to SW TLS for any other case, bind to inaddr_any falls in this
> category and need proper handling?
> 
> tls-hw-record-offload is TLS record offload to HW, which does tx/rx and
> record creation Inline.
> 
> enum {
> TLS_BASE_TX,
> TLS_SW_TX,
> TLS_RECORD_HW, /* TLS record processed Inline */
> TLS_NUM_CONFIG,
> };
> 
> -Original Message-
> From: Dave Watson [mailto:davejwat...@fb.com]
> Sent: Wednesday, January 31, 2018 10:14 PM
> To: Atul Gupta 
> Cc: s...@queasysnail.net; herb...@gondor.apana.org.au; linux-
> cry...@vger.kernel.org; ganes...@chelsio.co; netdev@vger.kernel.org;
> da...@davemloft.net; Boris Pismenny ; Ilya
> Lesokhin 
> Subject: Re: [RFC crypto v3 8/9] chtls: Register the ULP
> 
> On 01/31/18 04:14 PM, Atul Gupta wrote:
> >
> >
> > On Tuesday 30 January 2018 10:41 PM, Dave Watson wrote:
> > > On 01/30/18 06:51 AM, Atul Gupta wrote:
> > >
> > > > What I was referring is that passing "tls" ulp type in setsockopt
> > > > may be insufficient to make the decision when multi HW assist
> > > > Inline TLS solution exists.
> > > Setting the ULP doesn't choose HW or SW implementation, I think that
> > > should be done later when setting up crypto with
> > >
> > > setsockopt(SOL_TLS, TLS_TX, struct crypto_info).
> > setsockpot [mentioned above] is quite late for driver to enable HW
> > implementation, we require something as early as tls_init
> > [setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"))], for driver
> > to set HW prot and offload connection beside Inline Tx/Rx.
> > >
> > > Any reason we can't use ethtool to choose HW vs SW implementation,
> > > if available on the device?
> > Thought about it,  the interface index is not available to fetch
> > netdev and caps check to set HW prot eg. bind [prot.hash] --> tls_hash to
> program HW.
> 
> Perhaps this is the part I don't follow - why do you need to override hash and
> check for LISTEN?  I briefly looked through the patch named "CPL handler
> definition", this looks like it is a full TCP offload?
> 
> Yes, this is connection and record layer offload, and the reason I used
> different ulp type, need to see what additional info or check can help setup
> the required sk prot.


RE: [RFC crypto v3 8/9] chtls: Register the ULP

2018-02-08 Thread Atul Gupta
I thought about this and approach below can avoid new ulp type:

1. Register Inline TLS driver to net TLS 
2. enable ethtool -K  tls-hw-record-offload on
3. Issue " setsockopt(fd, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) " after Bind, 
this will enable user fetch net_device corresponding to ipaadr bound to 
interface, if dev found is the one registered and record-offload enabled, 
program the sk->sk_prot as required.
4. fallback to SW TLS for any other case, bind to inaddr_any falls in this 
category and need proper handling?

tls-hw-record-offload is TLS record offload to HW, which does tx/rx and record 
creation Inline.

enum {
TLS_BASE_TX,
TLS_SW_TX,
TLS_RECORD_HW, /* TLS record processed Inline */
TLS_NUM_CONFIG,
};

-Original Message-
From: Dave Watson [mailto:davejwat...@fb.com]
Sent: Wednesday, January 31, 2018 10:14 PM
To: Atul Gupta 
Cc: s...@queasysnail.net; herb...@gondor.apana.org.au; 
linux-cry...@vger.kernel.org; ganes...@chelsio.co; netdev@vger.kernel.org; 
da...@davemloft.net; Boris Pismenny ; Ilya Lesokhin 

Subject: Re: [RFC crypto v3 8/9] chtls: Register the ULP

On 01/31/18 04:14 PM, Atul Gupta wrote:
> 
> 
> On Tuesday 30 January 2018 10:41 PM, Dave Watson wrote:
> > On 01/30/18 06:51 AM, Atul Gupta wrote:
> > 
> > > What I was referring is that passing "tls" ulp type in setsockopt 
> > > may be insufficient to make the decision when multi HW assist 
> > > Inline TLS solution exists.
> > Setting the ULP doesn't choose HW or SW implementation, I think that 
> > should be done later when setting up crypto with
> > 
> > setsockopt(SOL_TLS, TLS_TX, struct crypto_info).
> setsockpot [mentioned above] is quite late for driver to enable HW 
> implementation, we require something as early as tls_init 
> [setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"))], for driver 
> to set HW prot and offload connection beside Inline Tx/Rx.
> > 
> > Any reason we can't use ethtool to choose HW vs SW implementation, 
> > if available on the device?
> Thought about it,  the interface index is not available to fetch 
> netdev and caps check to set HW prot eg. bind [prot.hash] --> tls_hash to 
> program HW.

Perhaps this is the part I don't follow - why do you need to override hash and 
check for LISTEN?  I briefly looked through the patch named "CPL handler 
definition", this looks like it is a full TCP offload?

Yes, this is connection and record layer offload, and the reason I used 
different ulp type, need to see what additional info or check can help setup 
the required sk prot.


net: phy: question about phy_is_internal for generic-phy

2018-02-08 Thread Kunihiko Hayashi
Hello,

Is there a way to specify "phy is internal" to generic phy driver,
that is, to make phy_is_internal() function available?

I found "phy-is-integrated" DT property in
Documentation/devicetree/bindings/net/phy.txt, however, it seems
that the property is no effect for generic phy. And I think that
the meaning of "integrated" is slightly different from "internal".

---
Best Regards,
Kunihiko Hayashi



Re: possible deadlock in rtnl_lock (4)

2018-02-08 Thread Xin Long
On Thu, Feb 8, 2018 at 6:58 AM, syzbot
 wrote:
> Hello,
>
> syzbot hit the following crash on upstream commit
> a2e5790d841658485d642196dbb0927303d6c22f (Wed Feb 7 06:15:42 2018 +)
> Merge branch 'akpm' (patches from Andrew)
>
> So far this crash happened 632 times on
> https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git/master.
> C reproducer is attached.
> syzkaller reproducer is attached.
> Raw console output is attached.
> compiler: gcc (GCC) 7.1.1 20170620
> .config is attached.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: syzbot+ddde1c7b7ff7442d7...@syzkaller.appspotmail.com
> It will help syzbot understand when the bug is fixed. See footer for
> details.
> If you forward the report, please keep this part and the footer.
>
>
> ==
> WARNING: possible circular locking dependency detected
> 4.15.0+ #301 Not tainted
> --
> syzkaller233489/4179 is trying to acquire lock:
>  (rtnl_mutex){+.+.}, at: [<48e996fd>] rtnl_lock+0x17/0x20
> net/core/rtnetlink.c:74
>
> but task is already holding lock:
>  ([i].mutex){+.+.}, at: [<328553a2>]
> xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041
>
> which lock already depends on the new lock.
>
>
> the existing dependency chain (in reverse order) is:
>
> -> #2 ([i].mutex){+.+.}:
>__mutex_lock_common kernel/locking/mutex.c:756 [inline]
>__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>xt_find_table_lock+0x3e/0x3e0 net/netfilter/x_tables.c:1041
>xt_request_find_table_lock+0x28/0xc0 net/netfilter/x_tables.c:1088
>get_info+0x154/0x690 net/ipv6/netfilter/ip6_tables.c:989
>do_ipt_get_ctl+0x159/0xac0 net/ipv4/netfilter/ip_tables.c:1699
>nf_sockopt net/netfilter/nf_sockopt.c:104 [inline]
>nf_getsockopt+0x6a/0xc0 net/netfilter/nf_sockopt.c:122
>ip_getsockopt+0x15c/0x220 net/ipv4/ip_sockglue.c:1571
>tcp_getsockopt+0x82/0xd0 net/ipv4/tcp.c:3359
>sock_common_getsockopt+0x95/0xd0 net/core/sock.c:2934
>SYSC_getsockopt net/socket.c:1880 [inline]
>SyS_getsockopt+0x178/0x340 net/socket.c:1862
>do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
>entry_SYSCALL_64_after_hwframe+0x26/0x9b
>
> -> #1 (sk_lock-AF_INET){+.+.}:
>lock_sock_nested+0xc2/0x110 net/core/sock.c:2777
>lock_sock include/net/sock.h:1463 [inline]
>do_ip_setsockopt.isra.12+0x1d9/0x3210 net/ipv4/ip_sockglue.c:646
>ip_setsockopt+0x3a/0xa0 net/ipv4/ip_sockglue.c:1252
>udp_setsockopt+0x45/0x80 net/ipv4/udp.c:2401
>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975
>SYSC_setsockopt net/socket.c:1849 [inline]
>SyS_setsockopt+0x189/0x360 net/socket.c:1828
>do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
>entry_SYSCALL_64_after_hwframe+0x26/0x9b
>
> -> #0 (rtnl_mutex){+.+.}:
>lock_acquire+0x1d5/0x580 kernel/locking/lockdep.c:3920
>__mutex_lock_common kernel/locking/mutex.c:756 [inline]
>__mutex_lock+0x16f/0x1a80 kernel/locking/mutex.c:893
>mutex_lock_nested+0x16/0x20 kernel/locking/mutex.c:908
>rtnl_lock+0x17/0x20 net/core/rtnetlink.c:74
>unregister_netdevice_notifier+0x91/0x4e0 net/core/dev.c:1673
>clusterip_config_entry_put net/ipv4/netfilter/ipt_CLUSTERIP.c:114
> [inline]
>clusterip_tg_destroy+0x389/0x6e0
> net/ipv4/netfilter/ipt_CLUSTERIP.c:518
>cleanup_entry+0x218/0x350 net/ipv4/netfilter/ip_tables.c:654
>__do_replace+0x79d/0xa50 net/ipv4/netfilter/ip_tables.c:1089
>do_replace net/ipv4/netfilter/ip_tables.c:1145 [inline]
>do_ipt_set_ctl+0x40f/0x5f0 net/ipv4/netfilter/ip_tables.c:1675
>nf_sockopt net/netfilter/nf_sockopt.c:106 [inline]
>nf_setsockopt+0x67/0xc0 net/netfilter/nf_sockopt.c:115
>ip_setsockopt+0x97/0xa0 net/ipv4/ip_sockglue.c:1259
>tcp_setsockopt+0x82/0xd0 net/ipv4/tcp.c:2905
>sock_common_setsockopt+0x95/0xd0 net/core/sock.c:2975
>SYSC_setsockopt net/socket.c:1849 [inline]
>SyS_setsockopt+0x189/0x360 net/socket.c:1828
>do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
>entry_SYSCALL_64_after_hwframe+0x26/0x9b
>
> other info that might help us debug this:
>
> Chain exists of:
>   rtnl_mutex --> sk_lock-AF_INET --> [i].mutex
>
>  Possible unsafe locking scenario:
>
>CPU0CPU1
>
>   lock([i].mutex);
>lock(sk_lock-AF_INET);
>lock([i].mutex);
>   lock(rtnl_mutex);
>
>  *** DEADLOCK ***

It's probably just a warning.
I'm thinking an improment that moves up xt_table_unlock(t) in __do_replace():

+++ 

Re: [suricata PATCH 1/3] suricata/ebpf: take clang -target bpf include issue of stdint.h into account

2018-02-08 Thread Jesper Dangaard Brouer

On Thu, 08 Feb 2018 00:52:09 +0100 Eric Leblond  wrote:

> Hi,
> 
> On Wed, 2018-02-07 at 23:21 +0100, Jesper Dangaard Brouer wrote:
> > From: Jesper Dangaard Brouer 
> > 
> > This patch prepares code before enabling the clang -target bpf.
> > 
> > The clang compiler does not like #include  when
> > using '-target bpf' it will fail with:
> > 
> >  fatal error: 'gnu/stubs-32.h' file not found  
> ...
> > This can be worked around by installing the 32-bit version of
> > glibc-devel.i686 on your distribution.
> > 
> > But the BPF programs does not really need to include stdint.h,
> > if converting:
> >   uint64_t -> __u64
> >   uint32_t -> __u32
> >   uint16_t -> __u16
> >   uint8_t  -> __u8
> > 
> > This patch does this type syntax conversion.  
> 
> There is an issue for system like Debian because they don't have a
> asm/types.h in the include path if the architecture is not defined
> which is the case due to target bpf. This results in:
> 
> clang-5.0 -Wall -Iinclude -O2 \
>   -D__KERNEL__ -D__ASM_SYSREG_H \
>   -target bpf -S -emit-llvm vlan_filter.c -o vlan_filter.ll
> In file included from vlan_filter.c:19:
> In file included from include/linux/bpf.h:11:
> /usr/include/linux/types.h:5:10: fatal error: 'asm/types.h' file not
> found
> #include 
>  ^
> 1 error generated.
> Makefile:523: recipe for target 'vlan_filter.bpf' failed
> 
> To go into details, the Debian package providing the 'asm/typs.h'
> include is the the headers or linux-libc-dev. But this package comes
> with a flavor and thus we have a prefix: 
>  linux-libc-dev:amd64: /usr/include/x86_64-linux-gnu/asm/types.h

Oh, the joy of distro choices.

> "Fun" part here is that if you build a debian package of the via make
> in Linux tree then the linux-libc-dev package is correct.
> 
> So I propose the following patch that fixes the issue for me:
> 
> diff --git a/ebpf/Makefile.am b/ebpf/Makefile.am
> index 89a3304e9..712b05343 100644
> --- a/ebpf/Makefile.am
> +++ b/ebpf/Makefile.am
> @@ -16,6 +16,7 @@ all: $(BPF_TARGETS)
>  $(BPF_TARGETS): %.bpf: %.c
>  #  From C-code to LLVM-IR format suffix .ll (clang -S -emit-llvm)
> ${CLANG} -Wall $(BPF_CFLAGS) -O2 \
> +   -I/usr/include/$(host_cpu)-$(host_os)/ \

Cool solution. These variables originate from configure/automake.

Would it be more technical correct to use(?): $(build_cpu)-$(build_os)

I verified that the variables are the same (notice 'make -p' trick):

$ make -p | egrep '_os'
build_os = linux-gnu
host_os = linux-gnu

$ make -p | egrep '_cpu'
host_cpu = x86_64
build_cpu = x86_64



> -D__KERNEL__ -D__ASM_SYSREG_H \
> -target bpf -S -emit-llvm $< -o ${@:.bpf=.ll}
>  #  From LLVM-IR to BPF-bytecode in ELF-obj file
> 
> Let me know if it is ok for you.

I'm fine with this fix.

I wonder if we should check other distros?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH net-queue 1/3] Partial revert "e1000e: Avoid receiver overrun interrupt bursts"

2018-02-08 Thread Benjamin Poirier
I forgot to mark it as such but this is v2 of the series originally
submitted in this thread:
https://lkml.org/lkml/2018/1/26/93

Changes since v1:
* series rebased to apply over "e1000e: Remove Other from EIAC."
  http://patchwork.ozlabs.org/patch/867833/
  This essentially removes patch 3/3 from the original series.
* dropped [PATCH 2/3] Revert "e1000e: Separate signaling for link
  check/link up". After Alex's feedback, I think that problem needs to
  be addressed differently and I will submit a separate patch for it.
* patch 1 was split into three parts. Instead of manually clearing OTHER
  via a write to icr as in v1, in v2 we make sure that INT_ASSERTED is
  always set via bits for all events related to the Other interrupt in
  IMS.

Benjamin Poirier (3):
  Partial revert "e1000e: Avoid receiver overrun interrupt bursts"
  e1000e: Fix queue interrupt re-raising in Other interrupt.
  e1000e: Avoid missed interrupts following ICR read.

 drivers/net/ethernet/intel/e1000e/defines.h | 21 ++-
 drivers/net/ethernet/intel/e1000e/netdev.c  | 32 +
 2 files changed, 30 insertions(+), 23 deletions(-)

On 2018/02/08 15:47, Benjamin Poirier wrote:
> This partially reverts commit 4aea7a5c5e940c1723add439f4088844cd26196d.
> 
> We keep the fix for the first part of the problem (1) described in the log
> of that commit, that is to read ICR in the other interrupt handler. We
> remove the fix for the second part of the problem (2), Other interrupt
> throttling.
> 
> Bursts of "Other" interrupts may once again occur during rxo (receive
> overflow) traffic conditions. This is deemed acceptable in the interest of
> avoiding unforeseen fallout from changes that are not strictly necessary.
> As discussed, the e1000e driver should be in "maintenance mode".
> 
> Link: https://www.spinics.net/lists/netdev/msg480675.html
> Signed-off-by: Benjamin Poirier 
> ---
>  drivers/net/ethernet/intel/e1000e/netdev.c | 16 ++--
>  1 file changed, 2 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c 
> b/drivers/net/ethernet/intel/e1000e/netdev.c
> index 153ad406c65e..3b36efa6228d 100644
> --- a/drivers/net/ethernet/intel/e1000e/netdev.c
> +++ b/drivers/net/ethernet/intel/e1000e/netdev.c
> @@ -1915,21 +1915,10 @@ static irqreturn_t e1000_msix_other(int 
> __always_unused irq, void *data)
>   struct e1000_adapter *adapter = netdev_priv(netdev);
>   struct e1000_hw *hw = >hw;
>   u32 icr;
> - bool enable = true;
>  
>   icr = er32(ICR);
>   ew32(ICR, E1000_ICR_OTHER);
>  
> - if (icr & E1000_ICR_RXO) {
> - ew32(ICR, E1000_ICR_RXO);
> - enable = false;
> - /* napi poll will re-enable Other, make sure it runs */
> - if (napi_schedule_prep(>napi)) {
> - adapter->total_rx_bytes = 0;
> - adapter->total_rx_packets = 0;
> - __napi_schedule(>napi);
> - }
> - }
>   if (icr & E1000_ICR_LSC) {
>   ew32(ICR, E1000_ICR_LSC);
>   hw->mac.get_link_status = true;
> @@ -1938,7 +1927,7 @@ static irqreturn_t e1000_msix_other(int __always_unused 
> irq, void *data)
>   mod_timer(>watchdog_timer, jiffies + 1);
>   }
>  
> - if (enable && !test_bit(__E1000_DOWN, >state))
> + if (!test_bit(__E1000_DOWN, >state))
>   ew32(IMS, E1000_IMS_OTHER);
>  
>   return IRQ_HANDLED;
> @@ -2708,8 +2697,7 @@ static int e1000e_poll(struct napi_struct *napi, int 
> weight)
>   napi_complete_done(napi, work_done);
>   if (!test_bit(__E1000_DOWN, >state)) {
>   if (adapter->msix_entries)
> - ew32(IMS, adapter->rx_ring->ims_val |
> -  E1000_IMS_OTHER);
> + ew32(IMS, adapter->rx_ring->ims_val);
>   else
>   e1000_irq_enable(adapter);
>   }
> -- 
> 2.16.1
> 


Re: [Patch net] ipt_CLUSTERIP: fix a refcount bug in clusterip_config_find_get()

2018-02-08 Thread Florian Westphal
Cong Wang  wrote:
> In clusterip_config_find_get() we hold RCU read lock so it could
> run concurrently with clusterip_config_entry_put(), as a result,
> the refcnt could go back to 1 from 0, which leads to a double
> list_del()... Just replace refcount_inc() with
> refcount_inc_not_zero(), as for c->refcount.
> 
> Fixes: d73f33b16883 ("netfilter: CLUSTERIP: RCU conversion")
> Cc: Eric Dumazet 
> Cc: Pablo Neira Ayuso 
> Signed-off-by: Cong Wang 
> ---
>  net/ipv4/netfilter/ipt_CLUSTERIP.c | 6 --
>  1 file changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c 
> b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> index 1ff72b87a066..4537b1686c7c 100644
> --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c
> +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c
> @@ -154,8 +154,10 @@ clusterip_config_find_get(struct net *net, __be32 
> clusterip, int entry)
>  #endif
>   if (unlikely(!refcount_inc_not_zero(>refcount)))
>   c = NULL;
> - else if (entry)
> - refcount_inc(>entries);
> + else if (entry) {
> + if (unlikely(!refcount_inc_not_zero(>entries)))

this needs to call clusterip_config_put(c); too, else we leak one
reference.

Other than that this looks good.


<    1   2