date:20180318

Re: [PATCH] json_print: fix print_uint with helper type extensions

2018-03-18 Thread Kevin Darbyshire-Bryant

> On 16 Mar 2018, at 20:34, Stephen Hemminger wrote:
>> 
>> print_uint64(PRINT_ANY, "refcnt", "refcnt %" PRIu64 " ", t->tcm_info)
>> 
>> Signed-off-by: Kevin Darbyshire-Bryant 
> 
> I am fine with this. But since there is no code using it yet, it should
> go to the net-next branch.
> 
> Reviewed-by: Stephen Hemminger 

Existing code is tripping up over the hidden uint -> uint64_t promotion in
print_uint in iproute2 v4.15; that's how I fell over the issue. Should I split
the patch? One patch would fix the uint -> uint64_t promotion and the other
would offer the explicit type length options.

Obviously I now realise that the email header should have iproute2 in it.  
Learning, slowly :-)
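A promotion problem like this shows up when a value is widened to `uint64_t` but is still printed through a conversion written for `unsigned int`, such as `%u`. The sketch below is a minimal illustration that assumes a hypothetical `format_u64()` helper, which is not iproute2's `print_uint64()`. The helper keeps the conversion and the argument type paired by using `PRIu64`, the same way the example call above does:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of iproute2: writes "key value " into buf
 * using a conversion that matches the 64-bit argument. Handing a uint64_t
 * to a plain "%u" conversion in a varargs call is undefined behaviour. */
static int format_u64(char *buf, size_t len, const char *key, uint64_t value)
{
	return snprintf(buf, len, "%s %" PRIu64 " ", key, value);
}
```

The second test value, 2^32, is the smallest value a 32-bit conversion could not represent. It prints correctly only because the format and the type agree.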

Cheers,

Kevin D-B

012C ACB2 28C6 C53E 9775  9123 B3A2 389B 9DE2 334A

Hi beautiful

2018-03-18 Thread Wesley

I hope you are feeling well and that this email finds you in good health. I'm
sorry if I am bothering or disturbing you. I truly apologize for intruding on
your privacy. My name is Wesley. I'm from the States, I'm single and I'm tired
of being lonely. I am a very cheerful, kind and positive person, and I am
currently looking for a relationship in which I feel loved. I hope this will
be a new chapter in my life. Tell me more about yourself, if you don't mind.

I have always valued sincerity and honesty. I hope we can get to know each
other better.

I hope to be in touch soon.

Regards,

Wesley.

Good day friend!!!

2018-03-18 Thread Wesley

I am Wes, from the United States, but currently in Syria on a peacekeeping
mission. I am currently looking for a friendship that will lead to a
relationship in which I feel loved again.

I want to get to know you better, if I may be so bold. I consider myself an
easy-going man.

Please forgive my manners; I am not good with the Internet because that is
not really my field. Here in Syria we are not allowed to go out, which makes
it very boring for me, so I think I need a friend outside to talk to, to keep
me going...

I would love to get to know the "real" you as a friend: your likes, your
dislikes, your interests... what makes you.

My favorite color is blue. My favorite food is BACON; I could easily become a
vegetarian if it wasn't for bacon!!

I hope you can tell me more about your job, your relationship and your
past.

Hoping to hear from you soon.

Wes.

Re: [PATCH net-next 03/12] dt-bindings: net: dwmac-sun8i: Clean up clock delay chain descriptions

2018-03-18 Thread Sergei Shtylyov


Hello!

On 3/17/2018 12:28 PM, Chen-Yu Tsai wrote:


The clock delay chains found in the glue layer for dwmac-sun8i are only
used with RGMII PHYs. They are not intended for non-RGMII PHYs, such as
MII external PHYs or the internal PHY. Also, a recent SoC has a smaller
range of possible values for the delay chain.

This patch reformats the delay chain section of the device tree binding
to make it clear that the delay chains only apply to RGMII PHYs, and
make it easier to add the R40-specific bits later.

Signed-off-by: Chen-Yu Tsai 
---
  Documentation/devicetree/bindings/net/dwmac-sun8i.txt | 11 +++
  1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
index 3d6d5fa0c4d5..b8a3028d6c30 100644
--- a/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
+++ b/Documentation/devicetree/bindings/net/dwmac-sun8i.txt
@@ -28,10 +28,13 @@ Required properties:
- allwinner,sun8i-a83t-system-controller
  
  Optional properties:

-- allwinner,tx-delay-ps: TX clock delay chain value in ps. Range value is 0-700. Default is 0)
-- allwinner,rx-delay-ps: RX clock delay chain value in ps. Range value is 0-3100. Default is 0)
-Both delay properties need to be a multiple of 100. They control the delay for
-external PHY.
+- allwinner,tx-delay-ps: TX clock delay chain value in ps.
+Range is 0-700. Default is 0.
+- allwinner,rx-delay-ps: RX clock delay chain value in ps.
+Range is 0-3100. Default is 0.
+Both delay properties need to be a multiple of 100. They control the
+clock delay for external RGMII PHY. They are do apply to the internal


  s/are do/do not/?


+PHY or external non-RGMII PHYs.
  
  Optional properties for the following compatibles:

- "allwinner,sun8i-h3-emac",


MBR, Sergei
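The binding states these constraints: a TX range of 0-700 ps, an RX range of 0-3100 ps, and values that must be multiples of 100. The sketch below expresses them as a small check. `delay_ps_valid()` is a hypothetical helper, not code from the dwmac-sun8i driver, and the binding does not say how the driver handles invalid values.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check, not the dwmac-sun8i driver code: a delay value in ps
 * is acceptable if it lies within [0, max_ps] and is a multiple of 100.
 * max_ps would be 700 for allwinner,tx-delay-ps and 3100 for
 * allwinner,rx-delay-ps on the SoCs described by this binding. */
static bool delay_ps_valid(unsigned int value_ps, unsigned int max_ps)
{
	return value_ps <= max_ps && value_ps % 100 == 0;
}
```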

linux-next on x60: network manager often complains "network is disabled" after resume

2018-03-18 Thread Pavel Machek

Hi!

With recent linux-next, after resume NetworkManager often claims that
"network is disabled". Sometimes a suspend/resume cycle clears that.

Any ideas? Does it work for you?
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



Re: [PATCH] vmxnet3: fix LRO feature check

2018-03-18 Thread kbuild test robot

Hi Igor,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v4.16-rc4]
[also build test WARNING on next-20180316]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Igor-Pylypiv/vmxnet3-fix-LRO-feature-check/20180318-140725
config: x86_64-randconfig-s3-03181820 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from include/linux/kernel.h:10:0,
                    from include/linux/list.h:9,
                    from include/linux/module.h:9,
                    from drivers/net//vmxnet3/vmxnet3_drv.c:27:
   drivers/net//vmxnet3/vmxnet3_drv.c: In function 'vmxnet3_rq_rx_complete':
   drivers/net//vmxnet3/vmxnet3_drv.c:1474:8: warning: suggest parentheses around operand of '!' or change '&' to '&&' or '!' to '~' [-Wparentheses]
   !adapter->netdev->features & NETIF_F_LRO) {
   ^~~
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> drivers/net//vmxnet3/vmxnet3_drv.c:1473:4: note: in expansion of macro 'if'
   if (!rcd->tcp ||
   ^~
   drivers/net//vmxnet3/vmxnet3_drv.c:1474:8: warning: suggest parentheses around operand of '!' or change '&' to '&&' or '!' to '~' [-Wparentheses]
   !adapter->netdev->features & NETIF_F_LRO) {
   ^~~
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^~~~
>> drivers/net//vmxnet3/vmxnet3_drv.c:1473:4: note: in expansion of macro 'if'
   if (!rcd->tcp ||
   ^~
   drivers/net//vmxnet3/vmxnet3_drv.c:1474:8: warning: suggest parentheses around operand of '!' or change '&' to '&&' or '!' to '~' [-Wparentheses]
   !adapter->netdev->features & NETIF_F_LRO) {
   ^~~
   include/linux/compiler.h:69:16: note: in definition of macro '__trace_if'
  __r = !!(cond); \
   ^~~~
>> drivers/net//vmxnet3/vmxnet3_drv.c:1473:4: note: in expansion of macro 'if'
   if (!rcd->tcp ||
   ^~

vim +/if +1473 drivers/net//vmxnet3/vmxnet3_drv.c

  1255
  1256  static int
  1257  vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
  1258  		       struct vmxnet3_adapter *adapter, int quota)
  1259  {
  1260  	static const u32 rxprod_reg[2] = {
  1261  		VMXNET3_REG_RXPROD, VMXNET3_REG_RXPROD2
  1262  	};
  1263  	u32 num_pkts = 0;
  1264  	bool skip_page_frags = false;
  1265  	struct Vmxnet3_RxCompDesc *rcd;
  1266  	struct vmxnet3_rx_ctx *ctx = &rq->rx_ctx;
  1267  	u16 segCnt = 0, mss = 0;
  1268  #ifdef __BIG_ENDIAN_BITFIELD
  1269  	struct Vmxnet3_RxDesc rxCmdDesc;
  1270  	struct Vmxnet3_RxCompDesc rxComp;
  1271  #endif
  1272  	vmxnet3_getRxComp(rcd, &rq->comp_ring.base[rq->comp_ring.next2proc].rcd,
  1273  			  &rxComp);
  1274  	while (rcd->gen == rq->comp_ring.gen) {
  1275  		struct vmxnet3_rx_buf_info *rbi;
  1276  		struct sk_buff *skb, *new_skb = NULL;
  1277  		struct page *new_page = NULL;
  1278  		dma_addr_t new_dma_addr;
  1279  		int num_to_alloc;
  1280  		struct Vmxnet3_RxDesc *rxd;
  1281  		u32 idx, ring_idx;
  1282  		struct vmxnet3_cmd_ring *ring = NULL;
  1283  		if (num_pkts >= quota) {
  1284  			/* we may stop even before we see the EOP desc of
  1285  			 * the current pkt
  1286  			 */
  1287  			break;
  1288  		}
  1289  		BUG_ON(rcd->rqID != rq->qid && rcd->rqID != rq->qid2 &&
  1290  		       rcd->rqID != rq->dataRingQid);
  1291  		idx = rcd->rxdIdx;
  1292  		ring_idx = VMXNET3_GET_RING_IDX(adapter, rcd->rqID);
  1293  		ring = rq->rx_ring + ring_idx;
  1294  		vmxnet3_getRxDesc(rxd, &rq->rx_ring[ring_idx].base[idx].rxd,
  1295  				  &rxCmdDesc);
  1296  		rbi = rq->buf_info[ring_idx] + idx;
  1297  
  1298
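The warning points at a precedence problem. `!` binds more tightly than `&`, so `!adapter->netdev->features & NETIF_F_LRO` negates the whole feature word before the mask is applied. The sketch below shows the difference using a hypothetical `FEATURE_LRO` bit in place of the real `NETIF_F_LRO`. The parenthesised form is presumably what the "fix LRO feature check" patch intends.

```c
#include <assert.h>
#include <stdbool.h>

#define FEATURE_LRO (1u << 15) /* hypothetical bit, stand-in for NETIF_F_LRO */

/* Buggy form: parsed as (!features) & FEATURE_LRO, i.e. (0 or 1) & bit 15,
 * which is always 0 for any flag other than bit 0. */
static bool lro_off_buggy(unsigned int features)
{
	return !features & FEATURE_LRO;
}

/* Intended form: isolate the bit first, then negate the result. */
static bool lro_off_fixed(unsigned int features)
{
	return !(features & FEATURE_LRO);
}
```

With LRO disabled (`features == 0`), the buggy form still reports "not off". This is why the check silently never triggers.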

Re: [PATCH v2 1/2] sysfs: symlink: export sysfs_create_link_nowarn()

2018-03-18 Thread Greg Kroah-Hartman

On Fri, Mar 16, 2018 at 05:08:34PM -0500, Grygorii Strashko wrote:
> The sysfs_create_link_nowarn() is going to be used in phylib framework in
> subsequent patch which can be built as module. Hence, export
> sysfs_create_link_nowarn() to avoid build errors.
> 
> Cc: Florian Fainelli 
> Cc: Andrew Lunn 
> Fixes: a3995460491d ("net: phy: Relax error checking on sysfs_create_link()")

This specific patch doesn't fix anything, it just _allows_ it to be
fixed in the second patch :)

Anyway, just a nit...

Acked-by: Greg Kroah-Hartman

Re: [PATCH] net: dsa: drop some VLAs in switch.c

2018-03-18 Thread Salvatore Mesoraca

2018-03-14 13:48 GMT+01:00 Salvatore Mesoraca :
> 2018-03-14 12:24 GMT+01:00 David Laight :
>> Isn't using DECLARE_BITMAP() completely OTT when the maximum size is less
>> than the number of bits in a word?
>
> It allocates ceiling(size/8) "unsigned long"s, so yes.

Actually I meant ceiling(size/8/sizeof(unsigned long))
I'm sorry for the typo.

Salvatore
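Salvatore's corrected formula gives the number of `unsigned long` words the bitmap occupies. The sketch below uses a locally defined `bits_to_longs()`, which is not the kernel macro itself, although the kernel's `BITS_TO_LONGS()` performs the same rounding-up division. It shows that a bitmap smaller than one word still costs exactly one `unsigned long`. That is David's point about `DECLARE_BITMAP()` being more machinery than needed here.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in one unsigned long (CHAR_BIT is 8 on Linux). */
#define BITS_PER_ULONG (CHAR_BIT * sizeof(unsigned long))

/* Local re-implementation for illustration: ceiling(nbits / BITS_PER_ULONG),
 * i.e. ceiling(nbits / 8 / sizeof(unsigned long)). */
static size_t bits_to_longs(size_t nbits)
{
	return (nbits + BITS_PER_ULONG - 1) / BITS_PER_ULONG;
}
```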

RE: [PATCH v11 crypto 00/12] Chelsio Inline TLS

2018-03-18 Thread Atul Gupta

Hi Dave/Herbert,

This series is against the crypto tree. Should I submit two patch series:
1. netdev specific changes against net-next tree?
2. crypto changes against crypto tree?

Regards
Atul

-----Original Message-----
From: David Miller [mailto:da...@davemloft.net] 
Sent: Sunday, March 18, 2018 5:33 AM
To: Atul Gupta 
Cc: davejwat...@fb.com; herb...@gondor.apana.org.au; s...@queasysnail.net; 
sbri...@redhat.com; linux-cry...@vger.kernel.org; netdev@vger.kernel.org; 
Ganesh GR 
Subject: Re: [PATCH v11 crypto 00/12] Chelsio Inline TLS

From: Atul Gupta 
Date: Fri, 16 Mar 2018 21:06:22 +0530

> Series for Chelsio Inline TLS driver (chtls)

This series doesn't even come close to applying to the net-next tree, please 
respin.

Thank you.

Re: [PATCH v11 crypto 00/12] Chelsio Inline TLS

2018-03-18 Thread David Miller

From: Atul Gupta 
Date: Sun, 18 Mar 2018 14:30:30 +

> Hi Dave/Herbert,
> 
> This series is against the crypto tree. Should I submit two patch series:
> 1. netdev specific changes against net-next tree?
> 2. crypto changes against crypto tree?

Herbert, is it OK for this entire series to go via net-next?

Thanks!

Re: HW question: i210 vs. BCM5461S over SGMII: no response from PHY to MDIO requests?

2018-03-18 Thread Andrew Lunn

> I'm not getting an ACK from the SFP, probably because I've got the 
> address and offset wrong and because I'd better use indirect access.
> There's some more work awaiting me...

Try address 0x50.

i2cdetect will probe all addresses for you, if you have a standard
Linux i2c bus.

  Andrew
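Andrew's suggestion follows from how SFP modules describe their EEPROM. SFF-8472 places the serial ID at A0h and the diagnostic area at A2h, written in 8-bit notation that includes the read/write bit. Linux I2C tools such as i2cdetect use 7-bit addresses instead. The sketch below shows the conversion; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert an 8-bit (R/W bit included) I2C address, as
 * used in SFF-8472 (A0h, A2h), to the 7-bit form Linux i2c tools expect. */
static uint8_t i2c_8bit_to_7bit(uint8_t addr8)
{
	return (uint8_t)(addr8 >> 1);
}
```

So the SFP ID EEPROM appears at 0x50 and the diagnostics page at 0x51 in i2cdetect output.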

[PATCH net] devlink: Remove redundant free on error path

2018-03-18 Thread Arkadi Sharshevsky

The current code performs an unneeded free. Remove the redundant skb freeing
during the error path.

Fixes: 1555d204e743 ("devlink: Support for pipeline debug (dpipe)")
Signed-off-by: Arkadi Sharshevsky 
---
 net/core/devlink.c | 16 
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index f23e5ed..7917838 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1798,7 +1798,7 @@ static int devlink_dpipe_tables_fill(struct genl_info *info,
if (!nlh) {
err = devlink_dpipe_send_and_alloc_skb(&skb, info);
if (err)
-   goto err_skb_send_alloc;
+   return err;
goto send_done;
}
 
@@ -1807,7 +1807,6 @@ static int devlink_dpipe_tables_fill(struct genl_info *info,
 nla_put_failure:
err = -EMSGSIZE;
 err_table_put:
-err_skb_send_alloc:
genlmsg_cancel(skb, hdr);
nlmsg_free(skb);
return err;
@@ -2073,7 +2072,7 @@ static int devlink_dpipe_entries_fill(struct genl_info *info,
 table->counters_enabled,
 &dump_ctx);
if (err)
-   goto err_entries_dump;
+   return err;
 
 send_done:
nlh = nlmsg_put(dump_ctx.skb, info->snd_portid, info->snd_seq,
@@ -2081,16 +2080,10 @@ static int devlink_dpipe_entries_fill(struct genl_info *info,
if (!nlh) {
err = devlink_dpipe_send_and_alloc_skb(&dump_ctx.skb, info);
if (err)
-   goto err_skb_send_alloc;
+   return err;
goto send_done;
}
return genlmsg_reply(dump_ctx.skb, info);
-
-err_entries_dump:
-err_skb_send_alloc:
-   genlmsg_cancel(dump_ctx.skb, dump_ctx.hdr);
-   nlmsg_free(dump_ctx.skb);
-   return err;
 }
 
 static int devlink_nl_cmd_dpipe_entries_get(struct sk_buff *skb,
@@ -2229,7 +,7 @@ static int devlink_dpipe_headers_fill(struct genl_info *info,
if (!nlh) {
err = devlink_dpipe_send_and_alloc_skb(&skb, info);
if (err)
-   goto err_skb_send_alloc;
+   return err;
goto send_done;
}
return genlmsg_reply(skb, info);
@@ -2237,7 +2230,6 @@ static int devlink_dpipe_headers_fill(struct genl_info *info,
 nla_put_failure:
err = -EMSGSIZE;
 err_table_put:
-err_skb_send_alloc:
genlmsg_cancel(skb, hdr);
nlmsg_free(skb);
return err;
-- 
2.4.11
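The fix rests on buffer ownership. Once the send-and-allocate helper has failed, the caller no longer holds a buffer it may free, so the error path returns directly instead of running `genlmsg_cancel()`/`nlmsg_free()`. The sketch below models that rule in user space. It uses hypothetical names: `send_and_realloc()` stands in for `devlink_dpipe_send_and_alloc_skb()` and `struct msgbuf` stands in for the skb. It also assumes that the helper gives up the old buffer on every path; the patch does not spell this out.

```c
#include <assert.h>
#include <stdlib.h>

struct msgbuf { int len; };	/* hypothetical stand-in for an skb */

static int frees;		/* counts releases so a double free is visible */

static void msgbuf_free(struct msgbuf *b)
{
	if (b) {
		frees++;
		free(b);
	}
}

/* Hypothetical helper: "sends" the current buffer (modelled as releasing
 * it) and allocates a fresh one. The caller never owns the old buffer
 * afterwards; on failure *pb is left NULL. */
static int send_and_realloc(struct msgbuf **pb, int fail_alloc)
{
	msgbuf_free(*pb);
	*pb = fail_alloc ? NULL : calloc(1, sizeof(**pb));
	return *pb ? 0 : -1;
}

/* Caller following the patched pattern: on helper failure, just return. */
static int fill(int fail_alloc)
{
	struct msgbuf *b = calloc(1, sizeof(*b));
	int err;

	if (!b)
		return -1;
	err = send_and_realloc(&b, fail_alloc);
	if (err)
		return err;	/* nothing left to free here */
	msgbuf_free(b);
	return 0;
}
```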

Re: [PATCH v11 crypto 12/12] crypto: chtls - Makefile Kconfig (fwd)

2018-03-18 Thread Julia Lawall

Please check the indentation on line 1655.

thanks,
julia

-- Forwarded message --
Date: Sun, 18 Mar 2018 18:15:36 +0800
From: kbuild test robot 
To: kbu...@01.org
Cc: Julia Lawall 
Subject: Re: [PATCH v11 crypto 12/12] crypto: chtls - Makefile Kconfig

CC: kbuild-...@01.org
In-Reply-To: <1521214661-28928-12-git-send-email-atul.gu...@chelsio.com>
References: <1521214661-28928-12-git-send-email-atul.gu...@chelsio.com>
TO: Atul Gupta 
CC: davejwat...@fb.com, da...@davemloft.net, herb...@gondor.apana.org.au
CC: s...@queasysnail.net, sbri...@redhat.com, linux-cry...@vger.kernel.org, 
netdev@vger.kernel.org, ganes...@chelsio.com

Hi Atul,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on v4.16-rc4]
[cannot apply to next-20180316]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Atul-Gupta/tls-support-for-Inline-tls-record/20180318-162840
config: i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
:: branch date: 2 hours ago
:: commit date: 2 hours ago

All error/warnings (new ones prefixed by >>):

   drivers/crypto/chelsio/chtls/chtls_io.c: In function 'chtls_expansion_size':
>> drivers/crypto/chelsio/chtls/chtls_io.c:457:2: error: expected ',' or ';' before 'int'
 int expnsize, frcn, fraglast, fragsize;
 ^~~
>> drivers/crypto/chelsio/chtls/chtls_io.c:461:3: error: 'fragsize' undeclared (first use in this function); did you mean 'ivs_size'?
  fragsize = hws->mfs;
  ^~~~
  ivs_size
   drivers/crypto/chelsio/chtls/chtls_io.c:461:3: note: each undeclared identifier is reported only once for each function it appears in
>> drivers/crypto/chelsio/chtls/chtls_io.c:465:4: error: 'frcnt' undeclared (first use in this function); did you mean 'pducnt'?
   frcnt = (data_len / fragsize);
   ^
   pducnt
>> drivers/crypto/chelsio/chtls/chtls_io.c:468:4: error: 'expnsize' undeclared (first use in this function); did you mean 'fragsize'?
   expnsize =  frcnt * expppdu;
   ^~~~
   fragsize
>> drivers/crypto/chelsio/chtls/chtls_io.c:480:4: error: 'fraglast' undeclared (first use in this function); did you mean 'rb_last'?
   fraglast = data_len % fragsize;
   ^~~~
   rb_last
   drivers/crypto/chelsio/chtls/chtls_io.c: In function 'peekmsg':
>> drivers/crypto/chelsio/chtls/chtls_io.c:1653:5: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
if (!copied)
^~
   drivers/crypto/chelsio/chtls/chtls_io.c:1655:6: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
 break;
 ^
   drivers/crypto/chelsio/chtls/chtls_io.c: In function 'chtls_expansion_size':
>> drivers/crypto/chelsio/chtls/chtls_io.c:492:1: warning: control reaches end of non-void function [-Wreturn-type]
}
^

coccinelle warnings: (new ones prefixed by >>)

>> drivers/crypto/chelsio/chtls/chtls_io.c:1654:5-22: code aligned with following code on line 1655

# https://github.com/0day-ci/linux/commit/635907fe348f84b525d7ce16ae8f2a9b82c631e3
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 635907fe348f84b525d7ce16ae8f2a9b82c631e3
vim +1654 drivers/crypto/chelsio/chtls/chtls_io.c

8ae18d74 Atul Gupta 2018-03-16  1542
8ae18d74 Atul Gupta 2018-03-16  1543  /*
8ae18d74 Atul Gupta 2018-03-16  1544   * Peek at data in a socket's receive buffer.
8ae18d74 Atul Gupta 2018-03-16  1545   */
8ae18d74 Atul Gupta 2018-03-16  1546  static int peekmsg(struct sock *sk, struct msghdr *msg,
8ae18d74 Atul Gupta 2018-03-16  1547  		   size_t len, int nonblock, int flags)
8ae18d74 Atul Gupta 2018-03-16  1548  {
8ae18d74 Atul Gupta 2018-03-16  1549  	struct tcp_sock *tp = tcp_sk(sk);
8ae18d74 Atul Gupta 2018-03-16  1550  	struct sk_buff *skb;
8ae18d74 Atul Gupta 2018-03-16  1551  	u32 peek_seq, offset;
8ae18d74 Atul Gupta 2018-03-16  1552  	int copied = 0;
8ae18d74 Atul Gupta 2018-03-16  1553  	size_t avail;          /* amount of available data in current skb */
8ae18d74 Atul Gupta 2018-03-16  1554  	long timeo;
8ae18d74 Atul Gupta 2018-03-16  1555
8ae18d74 Atul Gupta 2018-03-16  1556  	lock_sock(sk);
8ae18d74 Atul Gupta 2018-03-16  1557  	timeo = sock_rcvtimeo(sk, nonblock);
8ae18d74 Atul Gupta 2018-03-16  1558  	peek_seq = tp->copied_seq;
8ae18d74 Atul Gupta 2018-03-16  1559
8ae18d74 Atul Gupta 2018-03-16  1560  	do {
8ae18d74 Atul Gupta 2018-03-16  1561
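Both the `-Wmisleading-indentation` warning and Julia's note about line 1655 concern a statement that is indented as if an unbraced `if` guarded it. Without braces, only the first statement after the `if` is conditional, whatever the layout suggests. The actual lines 1653-1655 are not shown in the report, so the sketch below uses made-up functions to illustrate the rule.

```c
#include <assert.h>

/* Real control flow of an unbraced if: only the first statement is
 * conditional. The second line is laid out here as it actually executes,
 * even if the original source indented it under the if. */
static int unbraced_guard(int cond)
{
	int guarded = 0, also_meant_guarded = 0;

	if (cond)
		guarded = 1;
	also_meant_guarded = 1;	/* runs even when cond == 0 */

	return guarded + also_meant_guarded;
}

/* Braces make both statements conditional, matching the indentation. */
static int braced_guard(int cond)
{
	int guarded = 0, also_meant_guarded = 0;

	if (cond) {
		guarded = 1;
		also_meant_guarded = 1;
	}

	return guarded + also_meant_guarded;
}
```

Which fix is right depends on what the author meant. If the second statement should be conditional, add braces. If it should not, un-indent it.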

Re: [PATCH v11 crypto 01/12] tls: support for Inline tls record (fwd)

2018-03-18 Thread Julia Lawall


ctx is dereferenced on line 258 but has been freed on line 229.

julia

-- Forwarded message --
Date: Sun, 18 Mar 2018 18:05:25 +0800
From: kbuild test robot 
To: kbu...@01.org
Cc: Julia Lawall 
Subject: Re: [PATCH v11 crypto 01/12] tls: support for Inline tls record

CC: kbuild-...@01.org
In-Reply-To: <1521214661-28928-1-git-send-email-atul.gu...@chelsio.com>
References: <1521214661-28928-1-git-send-email-atul.gu...@chelsio.com>
TO: Atul Gupta 
CC: davejwat...@fb.com, da...@davemloft.net, herb...@gondor.apana.org.au, 
s...@queasysnail.net, sbri...@redhat.com, linux-cry...@vger.kernel.org, 
netdev@vger.kernel.org, ganes...@chelsio.com

Hi Atul,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on v4.16-rc4]
[cannot apply to next-20180316]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:
https://github.com/0day-ci/linux/commits/Atul-Gupta/tls-support-for-Inline-tls-record/20180318-162840
:: branch date: 2 hours ago
:: commit date: 2 hours ago

>> net/tls/tls_main.c:258:5-8: ERROR: reference preceded by free on line 229

# https://github.com/0day-ci/linux/commit/be47378786b9d9874dfc3ab57504565275c7b3ff
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout be47378786b9d9874dfc3ab57504565275c7b3ff
vim +258 net/tls/tls_main.c

3c4d75591 Dave Watson   2017-06-14  218
3c4d75591 Dave Watson   2017-06-14  219  static void tls_sk_proto_close(struct sock *sk, long timeout)
3c4d75591 Dave Watson   2017-06-14  220  {
3c4d75591 Dave Watson   2017-06-14  221  	struct tls_context *ctx = tls_get_ctx(sk);
3c4d75591 Dave Watson   2017-06-14  222  	long timeo = sock_sndtimeo(sk, 0);
3c4d75591 Dave Watson   2017-06-14  223  	void (*sk_proto_close)(struct sock *sk, long timeout);
3c4d75591 Dave Watson   2017-06-14  224
3c4d75591 Dave Watson   2017-06-14  225  	lock_sock(sk);
ff45d820a Ilya Lesokhin 2017-11-13  226  	sk_proto_close = ctx->sk_proto_close;
ff45d820a Ilya Lesokhin 2017-11-13  227
ff45d820a Ilya Lesokhin 2017-11-13  228  	if (ctx->tx_conf == TLS_BASE_TX) {
ff45d820a Ilya Lesokhin 2017-11-13 @229  		kfree(ctx);
ff45d820a Ilya Lesokhin 2017-11-13  230  		goto skip_tx_cleanup;
ff45d820a Ilya Lesokhin 2017-11-13  231  	}
3c4d75591 Dave Watson   2017-06-14  232
3c4d75591 Dave Watson   2017-06-14  233  	if (!tls_complete_pending_work(sk, ctx, 0, &timeo))
3c4d75591 Dave Watson   2017-06-14  234  		tls_handle_open_record(sk, 0);
3c4d75591 Dave Watson   2017-06-14  235
3c4d75591 Dave Watson   2017-06-14  236  	if (ctx->partially_sent_record) {
3c4d75591 Dave Watson   2017-06-14  237  		struct scatterlist *sg = ctx->partially_sent_record;
3c4d75591 Dave Watson   2017-06-14  238
3c4d75591 Dave Watson   2017-06-14  239  		while (1) {
3c4d75591 Dave Watson   2017-06-14  240  			put_page(sg_page(sg));
3c4d75591 Dave Watson   2017-06-14  241  			sk_mem_uncharge(sk, sg->length);
3c4d75591 Dave Watson   2017-06-14  242
3c4d75591 Dave Watson   2017-06-14  243  			if (sg_is_last(sg))
3c4d75591 Dave Watson   2017-06-14  244  				break;
3c4d75591 Dave Watson   2017-06-14  245  			sg++;
3c4d75591 Dave Watson   2017-06-14  246  		}
3c4d75591 Dave Watson   2017-06-14  247  	}
ff45d820a Ilya Lesokhin 2017-11-13  248
3c4d75591 Dave Watson   2017-06-14  249  	kfree(ctx->rec_seq);
3c4d75591 Dave Watson   2017-06-14  250  	kfree(ctx->iv);
3c4d75591 Dave Watson   2017-06-14  251
ff45d820a Ilya Lesokhin 2017-11-13  252  	if (ctx->tx_conf == TLS_SW_TX)
ff45d820a Ilya Lesokhin 2017-11-13  253  		tls_sw_free_tx_resources(sk);
3c4d75591 Dave Watson   2017-06-14  254
ff45d820a Ilya Lesokhin 2017-11-13  255   skip_tx_cleanup:
3c4d75591 Dave Watson   2017-06-14  256  	release_sock(sk);
3c4d75591 Dave Watson   2017-06-14  257  	sk_proto_close(sk, timeout);
be4737878 Atul Gupta    2018-03-16 @258  	if (ctx->tx_conf == TLS_HW_RECORD)
be4737878 Atul Gupta    2018-03-16  259  		kfree(ctx);
3c4d75591 Dave Watson   2017-06-14  260  }
3c4d75591 Dave Watson   2017-06-14  261

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
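One common way to avoid this kind of use-after-free is to copy the field into a local variable while `ctx` is still valid and test the copy afterwards. The sketch below applies that pattern in user space. Its types and names are hypothetical, and it is not necessarily how the patch was reworked. It also assumes that for the SW case `ctx` is released elsewhere, because the excerpt does not show who frees it on that path.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for TLS_BASE_TX, TLS_SW_TX and TLS_HW_RECORD. */
enum tx_conf_sketch { BASE_TX, SW_TX, HW_RECORD };

struct ctx_sketch {
	enum tx_conf_sketch tx_conf;
};

/* Returns 1 if this function freed ctx, 0 if ownership stays elsewhere.
 * tx_conf is read once, before any free, so the final check never reads
 * through a pointer that may already have been released. */
static int close_sketch(struct ctx_sketch *ctx)
{
	enum tx_conf_sketch tx_conf = ctx->tx_conf;

	if (tx_conf == BASE_TX) {
		free(ctx);
		goto skip_cleanup;
	}

	/* ... per-record cleanup would run here while ctx is valid ... */

skip_cleanup:
	if (tx_conf == HW_RECORD) {
		free(ctx);
		return 1;
	}
	return tx_conf == BASE_TX;
}
```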

Re: [PATCH RFC 1/2] netlink: extend extack so it can carry more than one message

2018-03-18 Thread David Ahern

On 3/16/18 1:23 PM, Marcelo Ricardo Leitner wrote:
> Currently extack can carry only a single message, which is usually the
> error message.
> 
> This limits more verbose error reporting. For example, it cannot carry
> warning messages together with the error message, or 2 warning messages.


The only means for userspace to separate an error message from info or
warnings is the error code in nlmsgerr. If it is non-zero, any extack message
is considered an error; otherwise it is a warning.


> 
> One use case is tc offloading. If a rule fails to be offloaded and also
> fails to be installed in software, only the error from the software
> datapath is reported, although the error from the hardware path would
> also have been welcome.
> 
> This patch extends extack so it can now carry up to 8 messages, and these
> messages may be prefixed similarly to printk/pr_warning, so they can be
> tagged as either warning or error.
> 
> A fixed number of messages is used because supporting a dynamic limit
> seems to be overkill for the moment. Remember that this is not meant to
> be a trace tool, but an error reporting one.
> 
> Signed-off-by: Marcelo Ricardo Leitner 
> ---
>  include/linux/netlink.h  | 50 +---
>  net/netlink/af_netlink.c | 12 +++-
>  2 files changed, 37 insertions(+), 25 deletions(-)
> 
> diff --git a/include/linux/netlink.h b/include/linux/netlink.h
> index f3075d6c7e8229c999ab650537f1e3b11e1f457b..d9780836cf263d4c436d732e9b7a8cde0739ac23 100644
> --- a/include/linux/netlink.h
> +++ b/include/linux/netlink.h
> @@ -71,43 +71,53 @@ netlink_kernel_create(struct net *net, int unit, struct netlink_kernel_cfg *cfg)
>   * @cookie: cookie data to return to userspace (for success)
>   * @cookie_len: actual cookie data length
>   */
> +#define NETLINK_MAX_EXTACK_MSGS 8

8 is way too many. If some change fails because of an error, why would a
single error message not be enough? If it is not an error, why would more
than 1 warning message be needed? (I forget the details of the tc
'skip_sw' use case)


>  struct netlink_ext_ack {
> - const char *_msg;
> + const char *_msg[NETLINK_MAX_EXTACK_MSGS];
>   const struct nlattr *bad_attr;
>   u8 cookie[NETLINK_MAX_COOKIE_LEN];
>   u8 cookie_len;
> + u8 _msg_count;
>  };
>  
> -/* Always use this macro, this allows later putting the
> - * message into a separate section or such for things
> - * like translation or listing all possible messages.
> - * Currently string formatting is not supported (due
> - * to the lack of an output buffer.)
> +/* Always use these macros, this allows later putting
> + * the message into a separate section or such for
> + * things like translation or listing all possible
> + * messages.  Currently string formatting is not
> + * supported (due to the lack of an output buffer.)
>   */
> -#define NL_SET_ERR_MSG(extack, msg) do { \
> - static const char __msg[] = msg;\
> - struct netlink_ext_ack *__extack = (extack);\
> - \
> - if (__extack)   \
> - __extack->_msg = __msg; \
> +#define NL_SET_MSG(extack, msg) do { \
> + static const char __msg[] = msg;\
> + struct netlink_ext_ack *__extack = (extack);\
> + \
> + if (__extack && \
> + !WARN_ON(__extack->_msg_count >= NETLINK_MAX_EXTACK_MSGS))  \
> + __extack->_msg[__extack->_msg_count++] = __msg; \
>  } while (0)
>  
> +#define NL_SET_ERR_MSG(extack, msg)  NL_SET_MSG(extack, msg)
> +#define NL_SET_WARN_MSG(extack, msg) NL_SET_MSG(extack, KERN_WARNING msg)
> +
>  #define NL_SET_ERR_MSG_MOD(extack, msg)  \
>   NL_SET_ERR_MSG((extack), KBUILD_MODNAME ": " msg)
> +#define NL_SET_WARN_MSG_MOD(extack, msg) \
> + NL_SET_WARN_MSG((extack), KBUILD_MODNAME ": " msg)
> +

Adding separate macros for error versus warning is confusing since from
an extack perspective a message is a message and there is no uapi to
separate them.
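The RFC's `NL_SET_MSG()` appends to a fixed-size array and uses `WARN_ON()` to refuse messages beyond `NETLINK_MAX_EXTACK_MSGS`. The sketch below reproduces that bounded append in user space with hypothetical names. It mirrors the proposal under review, whose size David questions above, and does not describe any merged API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MSGS_SKETCH 8	/* the RFC's NETLINK_MAX_EXTACK_MSGS value */

/* Hypothetical user-space stand-in for the proposed struct netlink_ext_ack. */
struct ext_ack_sketch {
	const char *msg[MAX_MSGS_SKETCH];
	unsigned char msg_count;
};

/* Append one message; refuse instead of overflowing once the array is
 * full (the RFC macro performs the same check via WARN_ON()). */
static bool ext_ack_add(struct ext_ack_sketch *ack, const char *m)
{
	if (!ack || ack->msg_count >= MAX_MSGS_SKETCH)
		return false;
	ack->msg[ack->msg_count++] = m;
	return true;
}
```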

Re: [PATCH RFC 0/2] Add support for warnings to extack

2018-03-18 Thread David Ahern

On 3/16/18 1:23 PM, Marcelo Ricardo Leitner wrote:
> Currently we have the limitation that warnings cannot be reported through
> extack, for example when tc flower fails to get offloaded but gets
> installed on the software datapath. The hardware failure is not fatal and
> thus extack is not even shared with the driver, so the error is simply
> omitted from any logging.

If this set ends up moving forward, the above statement needs to be
corrected: extack allows non-error messages to be sent back to the user,
so the above must be talking about some other limitation local to tc.

[PATCH iproute2] treat "default" and "all"/"any" addresses differenty

2018-03-18 Thread Alexander Zubkov

The Debian maintainer found that the basic command:
# ip route flush all
no longer worked as expected, which breaks user scripts and
expectations: it no longer flushed all IPv4 routes.

Recently the behavior of the "default" prefix parameter was corrected. But
at the same time the behavior of "all"/"any" was altered too, because they
were handled in the same branch of the code. As those parameters mean
different things, they need to be treated differently in code too. This
patch reflects the difference.

Also, after the mentioned change, the address parsing code was changed
further and the address family was set explicitly even for "all"/"any"
addresses. That broke the matching conditions further. This patch fixes
that too and restores AF_UNSPEC for "all"/"any" addresses.

Now "default" is treated as a top-level prefix (for example 0.0.0.0/0 in
IPv4) and "all"/"any" always matches anything in exact, root and match
modes.

Reported-by: Luca Boccassi 
Signed-off-by: Alexander Zubkov 
---
 lib/utils.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/lib/utils.c b/lib/utils.c
index 379739d..eba4fa7 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -560,14 +560,23 @@ static int __get_addr_1(inet_prefix *addr, const char *name, int family)
 {
memset(addr, 0, sizeof(*addr));
 
-   if (strcmp(name, "default") == 0 ||
-   strcmp(name, "all") == 0 ||
-   strcmp(name, "any") == 0) {
+   if (strcmp(name, "default") == 0) {
if ((family == AF_DECnet) || (family == AF_MPLS))
return -1;
addr->family = (family != AF_UNSPEC) ? family : AF_INET;
addr->bytelen = af_byte_len(addr->family);
addr->bitlen = -2;
+   addr->flags |= PREFIXLEN_SPECIFIED;
+   return 0;
+   }
+
+   if (strcmp(name, "all") == 0 ||
+   strcmp(name, "any") == 0) {
+   if ((family == AF_DECnet) || (family == AF_MPLS))
+   return -1;
+   addr->family = AF_UNSPEC;
+   addr->bytelen = 0;
+   addr->bitlen = -2;
return 0;
}
 
@@ -695,7 +704,7 @@ int get_prefix_1(inet_prefix *dst, char *arg, int family)
 
bitlen = af_bit_len(dst->family);
 
-   flags = PREFIXLEN_SPECIFIED;
+   flags = 0;
if (slash) {
unsigned int plen;
 
@@ -706,12 +715,11 @@ int get_prefix_1(inet_prefix *dst, char *arg, int family)
if (plen > bitlen)
return -1;
 
+   flags |= PREFIXLEN_SPECIFIED;
bitlen = plen;
} else {
if (dst->bitlen == -2)
bitlen = 0;
-   else
-   flags = 0;
}
 
dst->flags |= flags;
-- 
1.9.1
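The distinction the patch draws can be modelled outside iproute2. "default" becomes a real zero-length prefix in a concrete family, while "all"/"any" remain family-less wildcards that match any route. The sketch below is simplified and uses hypothetical names (`struct prefix_sketch`, `parse_keyword()`, `matches_family()`). It is not the `lib/utils.c` code: it stores a bit length of 0 directly rather than the -2 sentinel, and it ignores the DECnet/MPLS rejection.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>

#define PREFIX_FLAG_LEN_SPECIFIED 0x1u	/* stand-in for PREFIXLEN_SPECIFIED */

/* Simplified, hypothetical stand-in for iproute2's inet_prefix. */
struct prefix_sketch {
	int family;
	int bitlen;
	unsigned int flags;
};

/* Returns 0 if name is one of the keywords, 1 otherwise (a caller would
 * then parse a literal address instead). */
static int parse_keyword(struct prefix_sketch *p, const char *name, int family)
{
	memset(p, 0, sizeof(*p));

	if (strcmp(name, "default") == 0) {
		/* A real prefix of length 0 in one family, e.g. 0.0.0.0/0. */
		p->family = (family != AF_UNSPEC) ? family : AF_INET;
		p->bitlen = 0;
		p->flags |= PREFIX_FLAG_LEN_SPECIFIED;
		return 0;
	}
	if (strcmp(name, "all") == 0 || strcmp(name, "any") == 0) {
		/* A wildcard: no family, so it matches routes of any family. */
		p->family = AF_UNSPEC;
		p->bitlen = 0;
		return 0;
	}
	return 1;
}

static int matches_family(const struct prefix_sketch *p, int route_family)
{
	return p->family == AF_UNSPEC || p->family == route_family;
}
```

In this model, `ip route flush all` keeps matching every route, while `default` with no explicit family narrows to IPv4.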

Re: [PATCH iproute2] treat "default" and "all"/"any" addresses differenty

2018-03-18 Thread Alexander Zubkov

This version of the patch is for master now.

18.03.2018, 17:50, "Alexander Zubkov" :
> Debian maintainer found that basic command:
> # ip route flush all
> No longer worked as expected which breaks user scripts and
> expectations. It no longer flushed all IPv4 routes.
>
> Recently behavior of "default" prefix parameter was corrected. But at
> the same time behavior of "all"/"any" was altered too, because they
> were the same branch of the code. As those parameters mean different,
> they need to be treated differently in code too. This patch reflects
> the difference.
>
> Also after mentioned change, address parsing code was changed more
> and address family was set explicitly even for "all"/"any" addresses.
> And that broke matching conditions further. This patch fixes that too
> and returns AF_UNSPEC to "all"/"any" address.
>
> Now "default" is treated as top-level prefix (for example 0.0.0.0/0 in
> IPv4) and "all"/"any" always matches anything in exact, root and match
> modes.
>
> Reported-by: Luca Boccassi 
> Signed-off-by: Alexander Zubkov 
> ---
>  lib/utils.c | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)
>
> diff --git a/lib/utils.c b/lib/utils.c
> index 379739d..eba4fa7 100644
> --- a/lib/utils.c
> +++ b/lib/utils.c
> @@ -560,14 +560,23 @@ static int __get_addr_1(inet_prefix *addr, const char 
> *name, int family)
>  {
>  memset(addr, 0, sizeof(*addr));
>
> - if (strcmp(name, "default") == 0 ||
> - strcmp(name, "all") == 0 ||
> - strcmp(name, "any") == 0) {
> + if (strcmp(name, "default") == 0) {
>  if ((family == AF_DECnet) || (family == AF_MPLS))
>  return -1;
>  addr->family = (family != AF_UNSPEC) ? family : AF_INET;
>  addr->bytelen = af_byte_len(addr->family);
>  addr->bitlen = -2;
> + addr->flags |= PREFIXLEN_SPECIFIED;
> + return 0;
> + }
> +
> + if (strcmp(name, "all") == 0 ||
> + strcmp(name, "any") == 0) {
> + if ((family == AF_DECnet) || (family == AF_MPLS))
> + return -1;
> + addr->family = AF_UNSPEC;
> + addr->bytelen = 0;
> + addr->bitlen = -2;
>  return 0;
>  }
>
> @@ -695,7 +704,7 @@ int get_prefix_1(inet_prefix *dst, char *arg, int family)
>
>  bitlen = af_bit_len(dst->family);
>
> - flags = PREFIXLEN_SPECIFIED;
> + flags = 0;
>  if (slash) {
>  unsigned int plen;
>
> @@ -706,12 +715,11 @@ int get_prefix_1(inet_prefix *dst, char *arg, int family)
>  if (plen > bitlen)
>  return -1;
>
> + flags |= PREFIXLEN_SPECIFIED;
>  bitlen = plen;
>  } else {
>  if (dst->bitlen == -2)
>  bitlen = 0;
> - else
> - flags = 0;
>  }
>
>  dst->flags |= flags;
> --
> 1.9.1
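To make the "default" versus "all"/"any" distinction concrete, here is a minimal sketch of the two keyword branches and an exact-mode match. It is not iproute2's code: struct sketch_prefix, SKETCH_PREFIXLEN_SPECIFIED, sketch_parse_keyword() and sketch_exact_match() are hypothetical, only IPv4 is modeled, and root/match modes are left out. "all"/"any" becomes an AF_UNSPEC wildcard, while "default" becomes 0.0.0.0/0 with its prefix length marked as specified.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>   /* AF_UNSPEC, AF_INET */

#define SKETCH_PREFIXLEN_SPECIFIED 0x1   /* hypothetical flag */

/* Hypothetical, IPv4-only stand-in for iproute2's inet_prefix. */
struct sketch_prefix {
	int family;          /* AF_UNSPEC means "no family": a wildcard */
	int bitlen;          /* prefix length in bits */
	uint32_t addr;       /* IPv4 address, host byte order */
	unsigned int flags;
};

/* Parse only the keywords discussed in the patch; returns 0 on success. */
static int sketch_parse_keyword(struct sketch_prefix *p, const char *name)
{
	memset(p, 0, sizeof(*p));
	if (strcmp(name, "default") == 0) {
		/* "default": the top-level prefix 0.0.0.0/0, length explicit */
		p->family = AF_INET;
		p->bitlen = 0;
		p->flags |= SKETCH_PREFIXLEN_SPECIFIED;
		return 0;
	}
	if (strcmp(name, "all") == 0 || strcmp(name, "any") == 0) {
		/* "all"/"any": leave the family unspecified */
		p->family = AF_UNSPEC;
		return 0;
	}
	return -1;
}

/* Exact-mode match: does selector sel select the route prefix r? */
static bool sketch_exact_match(const struct sketch_prefix *sel,
			       const struct sketch_prefix *r)
{
	if (sel->family == AF_UNSPEC)
		return true;   /* wildcard matches every route */
	return sel->family == r->family &&
	       sel->bitlen == r->bitlen &&
	       sel->addr == r->addr;
}
```

In this sketch, the "all" selector matches both a 10.0.0.0/24 route and the default route, while "default" matches only the default route, which mirrors the split the patch makes in __get_addr_1().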

Re: [PATCH net-next 00/10 v2] selftests: pmtu: Add further vti/vti6 MTU and PMTU tests

2018-03-18 Thread David Ahern

On 3/16/18 7:31 PM, Stefano Brivio wrote:
> Patches 5/10 to 10/10 add tests to verify default MTU assignment
> for vti4 and vti6 interfaces, to check that MTU values set on new
> link and link changes are properly taken and validated, and to
> verify PMTU exceptions on vti4 interfaces.
> 
> Patch 1/10 reverses function return codes as suggested by David
> Ahern.
> 
> Patch 2/10 fixes the helper to fetch exceptions MTU to run in the
> passed namespace.
> 
> Patches 3/10 and 4/10 are preparation work to make it easier to
> introduce those tests.
> 
> v2: Reverse return codes, and make output prettier in 4/9 by
> using padded printf, test descriptions and buffered error
> strings. Remove accidental output to /dev/kmsg from 10/10
> (was 9/9).

Pulling pmtu.sh from net-next, I see log messages on the console:

[  890.741117] ip netns exec ns-kFUQx6 ip link set vti6_a type vti6 remote fc00:1001::0 local fc00:1001::0
[  890.837569] ip netns exec ns-kFUQx6 ip link set vti6_a mtu 1280 type vti6 remote fc00:1000::0 local fc00:1000::0

Re: [PATCH RFC 0/2] Add support for warnings to extack

2018-03-18 Thread Marcelo Ricardo Leitner

On Fri, Mar 16, 2018 at 03:05:18PM -0700, Jakub Kicinski wrote:
> CC: David Ahern 
> 
> On Fri, 16 Mar 2018 16:23:08 -0300, Marcelo Ricardo Leitner wrote:
> > Currently we have the limitation that warnings cannot be reported through
> > extack. For example, when tc flower failed to get offloaded but got
> > installed on software datapath. The hardware failure is not fatal and
> > thus extack is not even shared with the driver, so the error is simply
> > omitted from any logging.
> > 
> > The idea here is to allow such kind of warnings to get through and be
> > available for the sysadmin or the tool managing such commands (like Open
> > vSwitch), so that if this happens, we will have such log message in a
> > file later.
> > 
> > The first patch extends extack to support more than one message and with
> > different log level (currently only error and warning). The second
> > shares extack with the drivers regardless of skip_sw.
> > 
> > The iproute patch also follows.
> > 
> > This kernel change is backward compatible with older iproute because
> > iproute will only process the last message, which should be the error
> > one in case of failure, or a warning if it succeeded.
> > 
> > The iproute change is compatible with older kernels because it will find
> > only one message to be processed and will handle it properly.
> > 
> > With these patches, this is now possible:
> > # tc qdisc add dev p7p1 ingress
> > # tc filter add dev p7p1 parent : protocol ip prio 1 flower \
> > src_mac ec:13:db:00:00:00 dst_mac ec:14:c2:00:00:00 \
> > src_ip 56.0.0.0 dst_ip 55.0.0.0 action drop
> > Warning: TC offload is disabled on net device.
> > # echo $?
> > 0
> 
> IMHO this set does more and less than is required to solve the
> problem.  
> 
> The way I understand it is we don't want HW offload errors/warnings to
> be printed to unsuspecting users who didn't specify any skip_* flags.
> What carries the message and whether it's explicitly marked as warning
> or error does not change the fact that users of the SW fwd path may not
> want to be bothered by offload warnings.

Fair enough. We can then have a 'tc -v' option to enable this more
verbose logging.

> 
> There may well be value in the ability to report multiple messages.  But
> for opt-in warning messages I would be leaning towards:
> 
> diff --git a/include/net/pkt_cls.h b/include/net/pkt_cls.h
> index e828d31be5dae0ae8c69016dfde50379296484aa..7cec393bb47974b48a6d510b8aa84534a7a98594 100644
> --- a/include/net/pkt_cls.h
> +++ b/include/net/pkt_cls.h
> @@ -705,8 +705,7 @@ tc_cls_common_offload_init(struct tc_cls_common_offload *cls_common,
>   cls_common->chain_index = tp->chain->index;
>   cls_common->protocol = tp->protocol;
>   cls_common->prio = tp->prio;
> - if (tc_skip_sw(flags))
> + if (tc_skip_sw(flags) || flags & TCA_CLS_FLAGS_OFFLOAD_VERBOSE)
>   cls_common->extack = extack;
>  }
>  
>  enum tc_fl_command {
> 
> That is admittedly quite conservative.  Esp. in case of flower, cls_bpf
> is used in SW far more than HW, not to mention qdisc offload (although
> flag would be different there)!

Yeah, or something more generic, as a general -v / --verbose option.

  M.
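A rough userspace model of the opt-in idea from Jakub's diff: the extack is handed to the driver only when the filter is hardware-only (skip_sw) or verbose offload reporting was requested; otherwise the driver sees NULL and its message is dropped, the same way NL_SET_ERR_MSG() does nothing with a NULL extack. The SKIP_HW/SKIP_SW bit values follow the uapi TCA_CLS_FLAGS_* definitions; the verbose bit, the struct and all sketch_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_FLAG_SKIP_HW         (1u << 0)  /* as TCA_CLS_FLAGS_SKIP_HW */
#define SKETCH_FLAG_SKIP_SW         (1u << 1)  /* as TCA_CLS_FLAGS_SKIP_SW */
#define SKETCH_FLAG_OFFLOAD_VERBOSE (1u << 4)  /* hypothetical opt-in bit */

/* Hypothetical single-message extack stand-in. */
struct sketch_extack {
	const char *msg;
};

/* Share the extack with the driver only if the user will care about
 * offload messages: a HW-only rule, or an explicit verbose request. */
static struct sketch_extack *
sketch_driver_extack(uint32_t flags, struct sketch_extack *extack)
{
	if ((flags & SKETCH_FLAG_SKIP_SW) ||
	    (flags & SKETCH_FLAG_OFFLOAD_VERBOSE))
		return extack;
	return NULL;
}

/* Driver-side setter in the style of NL_SET_ERR_MSG(): a NULL extack
 * means nobody is listening, so the message is silently dropped. */
static void sketch_set_msg(struct sketch_extack *extack, const char *msg)
{
	if (extack)
		extack->msg = msg;
}
```

A plain software-path user (no flags, or only skip_hw) therefore never sees offload warnings, which is the conservative behavior Jakub describes.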

Re: [PATCH net-next 00/10 v2] selftests: pmtu: Add further vti/vti6 MTU and PMTU tests

2018-03-18 Thread Stefano Brivio

On Sun, 18 Mar 2018 11:31:10 -0600
David Ahern  wrote:

> On 3/16/18 7:31 PM, Stefano Brivio wrote:
> > Patches 5/10 to 10/10 add tests to verify default MTU assignment
> > for vti4 and vti6 interfaces, to check that MTU values set on new
> > link and link changes are properly taken and validated, and to
> > verify PMTU exceptions on vti4 interfaces.
> > 
> > Patch 1/10 reverses function return codes as suggested by David
> > Ahern.
> > 
> > Patch 2/10 fixes the helper to fetch exceptions MTU to run in the
> > passed namespace.
> > 
> > Patches 3/10 and 4/10 are preparation work to make it easier to
> > introduce those tests.
> > 
> > v2: Reverse return codes, and make output prettier in 4/9 by
> > using padded printf, test descriptions and buffered error
> > strings. Remove accidental output to /dev/kmsg from 10/10
> > (was 9/9).  
> 
> pulling pmtu.sh from net-next I see log messages on console:
> 
> [  890.741117] ip netns exec ns-kFUQx6 ip link set vti6_a type vti6 remote fc00:1001::0 local fc00:1001::0
> [  890.837569] ip netns exec ns-kFUQx6 ip link set vti6_a mtu 1280 type vti6 remote fc00:1000::0 local fc00:1000::0

Thanks for checking. I accidentally left two more prints to kernel log
in 10/10, I'll send another patch to remove them.

-- 
Stefano

Re: [PATCH RFC 1/2] netlink: extend extack so it can carry more than one message

2018-03-18 Thread Marcelo Ricardo Leitner

On Sun, Mar 18, 2018 at 10:11:20AM -0600, David Ahern wrote:
> On 3/16/18 1:23 PM, Marcelo Ricardo Leitner wrote:
> > Currently extack can carry only a single message, which is usually the
> > error message.
> > 
> > This imposes a limitation on a more verbose error reporting. For
> > example, it's not able to carry warning messages together with the error
> > message, or 2 warning messages.
> 
> 
> The only means for userspace to separate an error message from info or
> warnings is the error in nlmsgerr. If it is non-zero, any extack message is
> considered an error; otherwise it is a warning.

I don't see your point here.

The proposed patch extends what you said to:
- include warnings on error reports
- allow more than 1 message

With the proposed patch, if nlmsgerr is 0 all messages are considered
as warnings. If it's non-zero, some may be marked as warnings.

AFAICT it is not far from what you described, and it still honours the
main knob, nlmsgerr.
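The classification rule discussed here can be written as a small helper. This is a hypothetical sketch, not iproute2's or the kernel's code: nlmsgerr_error stands for the error field of struct nlmsgerr, and tagged_warning stands for a per-message warning tag under the proposed scheme. A zero error makes every message a warning; a non-zero error makes untagged messages errors.

```c
#include <stdbool.h>

/* Hypothetical helper: label one extack message for display. */
static const char *sketch_extack_label(int nlmsgerr_error, bool tagged_warning)
{
	/* Success: messages can only be warnings (the current rule). */
	if (nlmsgerr_error == 0)
		return "Warning";
	/* Failure: untagged messages are errors, as with today's single
	 * message; the proposal lets some of them be marked as warnings. */
	return tagged_warning ? "Warning" : "Error";
}
```

With tagged_warning always false, the helper reduces to the existing single-message rule, which is the compatibility point made above.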

> 
> 
> > 
> > One use case is when dealing with tc offloading. If offloading fails,
> > and installing in software also fails, only the error about the
> > software datapath is reported, although the error from the hardware
> > path would also have been welcome.
> > 
> > This patch extends extack so it now can carry up to 8 messages, and these
> > messages may be prefixed similarly to printk/pr_warning so that they
> > can be tagged either as warning or error.
> > 
> > A fixed number of messages is used because supporting a dynamic limit
> > seems to be overkill for the moment. Remember that this is not meant to
> > be a trace tool, but an error reporting one.
> > 
> > Signed-off-by: Marcelo Ricardo Leitner 
> > ---
> >  include/linux/netlink.h  | 50 +---
> >  net/netlink/af_netlink.c | 12 +++-
> >  2 files changed, 37 insertions(+), 25 deletions(-)
> > 
> > diff --git a/include/linux/netlink.h b/include/linux/netlink.h
> > index f3075d6c7e8229c999ab650537f1e3b11e1f457b..d9780836cf263d4c436d732e9b7a8cde0739ac23 100644
> > --- a/include/linux/netlink.h
> > +++ b/include/linux/netlink.h
> > @@ -71,43 +71,53 @@ netlink_kernel_create(struct net *net, int unit, struct netlink_kernel_cfg *cfg)
> >   * @cookie: cookie data to return to userspace (for success)
> >   * @cookie_len: actual cookie data length
> >   */
> > +#define NETLINK_MAX_EXTACK_MSGS 8
> 
> 8 is way too many. If some change fails because of an error, why would a

Ok. I'm fine with 4 for now.

> single error message not be enough? If it is not an error, why is more
> than 1 warning message not enough? (I forget the details of the tc
> 'skip_sw' use case)

Because 1 message assumes you have a simple and linear code path being
executed and that it aborts on the first error it encounters.

You could, for example, report several bad arguments at once instead
of having the user fix & retry until he gets it all right.

You can also have situations like:
Warning: option foo is useless with option bar specified.
(as in: the rule may not work as you intended)
Error: failed to allocate a new node.

The goal is to be able to deliver more information to the
user/software using netlink and be able to log it, for later analysis.

If you have a look at mlx5/core/en_tc.c, there are several
printk(KERN_WARNING ...) or pr_warn() calls that should be returned
via extack and not to the kernel dmesg. That's just one domain, and it is
not aware of whether there is already a message stored in extack or
whether there will be another one later.
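A minimal userspace sketch of the fixed-size message array the RFC proposes, assuming the limit of 4 agreed above. The struct, the level enum and the helpers are hypothetical stand-ins for struct netlink_ext_ack and the NL_SET_MSG() macro, whose full definition is not shown here. sketch_last_msg() models what an older, single-message parser would see.

```c
#include <stddef.h>

#define SKETCH_MAX_MSGS 4   /* limit agreed in this thread (RFC had 8) */

enum sketch_level { SKETCH_WARN, SKETCH_ERR };

/* Hypothetical stand-in for the extended netlink_ext_ack. */
struct sketch_ext_ack {
	const char *msg[SKETCH_MAX_MSGS];
	enum sketch_level level[SKETCH_MAX_MSGS];
	unsigned char msg_count;
};

/* Append a tagged message; once the array is full, further messages
 * are dropped and -1 is returned. */
static int sketch_add_msg(struct sketch_ext_ack *ack, enum sketch_level lvl,
			  const char *msg)
{
	if (!ack || ack->msg_count >= SKETCH_MAX_MSGS)
		return -1;
	ack->msg[ack->msg_count] = msg;
	ack->level[ack->msg_count] = lvl;
	ack->msg_count++;
	return 0;
}

/* What an older parser that handles only one message would report. */
static const char *sketch_last_msg(const struct sketch_ext_ack *ack)
{
	return ack->msg_count ? ack->msg[ack->msg_count - 1] : NULL;
}
```

One consequence of this simple drop-on-overflow policy: if the array is already full of warnings, a final error is dropped and the last message an older parser sees is a warning. Keeping the compatibility argument from the cover letter would need something like a slot reserved for the error.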

> 
> 
> >  struct netlink_ext_ack {
> > -   const char *_msg;
> > +   const char *_msg[NETLINK_MAX_EXTACK_MSGS];
> > const struct nlattr *bad_attr;
> > u8 cookie[NETLINK_MAX_COOKIE_LEN];
> > u8 cookie_len;
> > +   u8 _msg_count;
> >  };
> >  
> > -/* Always use this macro, this allows later putting the
> > - * message into a separate section or such for things
> > - * like translation or listing all possible messages.
> > - * Currently string formatting is not supported (due
> > - * to the lack of an output buffer.)
> > +/* Always use these macros, this allows later putting
> > + * the message into a separate section or such for
> > + * things like translation or listing all possible
> > + * messages.  Currently string formatting is not
> > + * supported (due to the lack of an output buffer.)
> >   */
> > -#define NL_SET_ERR_MSG(extack, msg) do {   \
> > -   static const char __msg[] = msg;\
> > -   struct netlink_ext_ack *__extack = (extack);\
> > -   \
> > -   if (__extack)   \
> > -   __extack->_msg = __msg; \
> > +#define NL_SET_MSG(extack, msg) do {   \
> > +   static const char __msg[] = msg;\
> > +   struct netlink_ext_ack *__extack = (extack);\
> > +

Re: [PATCH RFC 0/2] Add support for warnings to extack

2018-03-18 Thread Marcelo Ricardo Leitner

On Sun, Mar 18, 2018 at 10:11:52AM -0600, David Ahern wrote:
> On 3/16/18 1:23 PM, Marcelo Ricardo Leitner wrote:
> > Currently we have the limitation that warnings cannot be reported through
> > extack. For example, when tc flower failed to get offloaded but got
> > installed on software datapath. The hardware failure is not fatal and
> > thus extack is not even shared with the driver, so the error is simply
> > omitted from any logging.
> 
> If this set ends up moving forward, the above statement needs to be
> corrected: extack allows non-error messages to be sent back to the user,
> so the above must be talking about some other limitation local to tc.

Right.

Thanks,
Marcelo

[PATCH net-next] net: dsa: mv88e6xxx: Fix missing register lock in serdes_get_stats

2018-03-18 Thread Florian Fainelli

We can hit the register lock not held assertion with the following path:

[   34.170631] mv88e6085 0.1:00: Switch registers lock not held!
[   34.176510] CPU: 0 PID: 950 Comm: ethtool Not tainted 4.16.0-rc4 #143
[   34.182985] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
[   34.189519] Backtrace:
[   34.192033] [<8010c4b4>] (dump_backtrace) from [<8010c788>] (show_stack+0x20/0x24)
[   34.199680]  r6:9f5dc010 r5:0011 r4:9f5dc010 r3:
[   34.205434] [<8010c768>] (show_stack) from [<80679d38>] (dump_stack+0x24/0x28)
[   34.212719] [<80679d14>] (dump_stack) from [<804844a8>] (mv88e6xxx_read+0x70/0x7c)
[   34.220376] [<80484438>] (mv88e6xxx_read) from [<804870dc>] (mv88e6xxx_port_get_cmode+0x34/0x4c)
[   34.229257]  r5:a09cd128 r4:9ee31d07
[   34.232880] [<804870a8>] (mv88e6xxx_port_get_cmode) from [<80487e6c>] (mv88e6352_port_has_serdes+0x24/0x64)
[   34.242690]  r4:9f5dc010
[   34.245309] [<80487e48>] (mv88e6352_port_has_serdes) from [<804880b8>] (mv88e6352_serdes_get_stats+0x28/0x12c)
[   34.255389]  r4:0001
[   34.257973] [<80488090>] (mv88e6352_serdes_get_stats) from [<804811e8>] (mv88e6xxx_get_ethtool_stats+0xb0/0xc0)
[   34.268156]  r10: r9: r8: r7:a09cd020 r6:0001 r5:9f5dc01c
[   34.276052]  r4:9f5dc010
[   34.278631] [<80481138>] (mv88e6xxx_get_ethtool_stats) from [<8064f740>] (dsa_slave_get_ethtool_stats+0xbc/0xc4)

mv88e6xxx_get_ethtool_stats() calls mv88e6xxx_get_stats(), which calls both
chip->info->ops->stats_get_stats(), which holds the register lock, and
chip->info->ops->serdes_get_stats(), which does not. Run
chip->info->ops->serdes_get_stats() with the register lock held to
avoid such assertions.

Fixes: 436fe17d273b ("net: dsa: mv88e6xxx: Allow the SERDES interfaces to have statistics")
Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/mv88e6xxx/chip.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c
index bd3ee84770c7..c22c18ed2de3 100644
--- a/drivers/net/dsa/mv88e6xxx/chip.c
+++ b/drivers/net/dsa/mv88e6xxx/chip.c
@@ -849,7 +849,9 @@ static void mv88e6xxx_get_stats(struct mv88e6xxx_chip *chip, int port,
 
if (chip->info->ops->serdes_get_stats) {
data += count;
+   mutex_lock(&chip->reg_lock);
chip->info->ops->serdes_get_stats(chip, port, data);
+   mutex_unlock(&chip->reg_lock);
}
 }
 
-- 
2.14.1
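The locking rule behind this fix can be modeled in userspace. This is a sketch, not driver code: struct sketch_chip, the reg_lock_held flag (standing in for the driver's "lock not held" assertion) and the sketch_* helpers are hypothetical, and a pthread mutex plays the role of chip->reg_lock. Every register read asserts that the lock is held, so both stats paths must take it, which is what the patch adds for serdes_get_stats().

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mv88e6xxx_chip. */
struct sketch_chip {
	pthread_mutex_t reg_lock;
	bool reg_lock_held;   /* mimics the "lock not held" check */
	int reads;
};

static void sketch_reg_lock(struct sketch_chip *chip)
{
	pthread_mutex_lock(&chip->reg_lock);
	chip->reg_lock_held = true;
}

static void sketch_reg_unlock(struct sketch_chip *chip)
{
	chip->reg_lock_held = false;
	pthread_mutex_unlock(&chip->reg_lock);
}

/* Every register access asserts that the caller holds reg_lock. */
static int sketch_read(struct sketch_chip *chip)
{
	assert(chip->reg_lock_held);
	return ++chip->reads;
}

/* Both stats helpers touch registers, so both run under the lock. */
static void sketch_get_stats(struct sketch_chip *chip)
{
	sketch_reg_lock(chip);
	sketch_read(chip);       /* models stats_get_stats() */
	sketch_reg_unlock(chip);

	sketch_reg_lock(chip);   /* the lock this patch adds */
	sketch_read(chip);       /* models serdes_get_stats() */
	sketch_reg_unlock(chip);
}
```

Dropping the second lock/unlock pair makes the assertion in sketch_read() fire, which is the userspace analogue of the backtrace above.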

Re: [PATCH RFC 0/2] Add support for warnings to extack

2018-03-18 Thread Marcelo Ricardo Leitner

On Sun, Mar 18, 2018 at 10:11:52AM -0600, David Ahern wrote:
> On 3/16/18 1:23 PM, Marcelo Ricardo Leitner wrote:
> > Currently we have the limitation that warnings cannot be reported through
> > extack. For example, when tc flower failed to get offloaded but got
> > installed on software datapath. The hardware failure is not fatal and
> > thus extack is not even shared with the driver, so the error is simply
> > omitted from any logging.
> 
> If this set ends up moving forward, the above statement needs to be
> corrected: extack allows non-error messages to be sent back to the user,
> so the above must be talking about some other limitation local to tc.

I'll split this patchset into two:
- pass extack to drivers when doing tc offload (such as flower and
  others) and allow reporting of warnings in such cases (according to
  an opt-in flag, as discussed in the other subthread).

- allow more than one message

The 1st may lead to the 2nd, but right now it's more of a supposition,
as there is no actual user for it yet.

[PATCH 1/2] brcmfmac: add new dt entries for SG SDIO settings

2018-03-18 Thread Alexey Roslyakov

There are 3 fields in the SDIO settings (quirks) to work around some of
the SG SDIO host particularities, i.e. higher alignment requirements for
SG items.
All the code was written a long time ago, but there is no way to change the
driver behavior without patching the kernel.
Add the missing devicetree entries.

Signed-off-by: Alexey Roslyakov 
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c
index aee6e5937c41..0718ca09a40d 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c
@@ -31,6 +31,7 @@ void brcmf_of_probe(struct device *dev, enum brcmf_bus_type bus_type,
int irq;
u32 irqf;
u32 val;
+   u16 align;
 
if (!np || bus_type != BRCMF_BUSTYPE_SDIO ||
!of_device_is_compatible(np, "brcm,bcm4329-fmac"))
@@ -39,6 +40,15 @@ void brcmf_of_probe(struct device *dev, enum brcmf_bus_type bus_type,
if (of_property_read_u32(np, "brcm,drive-strength", &val) == 0)
sdio->drive_strength = val;
 
+   sdio->broken_sg_support =
+   of_property_read_bool(np, "brcm,broken-sg-support");
+
+   if (of_property_read_u16(np, "brcm,sd-head-align", &align) == 0)
+   sdio->sd_head_align = align;
+
+   if (of_property_read_u16(np, "brcm,sd-sgentry-align", &align) == 0)
+   sdio->sd_sgentry_align = align;
+
/* make sure there are interrupts defined in the node */
if (!of_find_property(np, "interrupts", NULL))
return;
-- 
2.16.1
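The patch only exposes the values; how the driver applies sd_head_align and sd_sgentry_align is not shown here. As a rough illustration, assuming both are non-zero powers of two, these hypothetical helpers show the kind of checks such settings imply: the buffer start must be a multiple of the head alignment, and each scatterlist entry length is padded up to the entry alignment.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Round len up to a multiple of align (assumed: non-zero power of two). */
static size_t sketch_align_up(size_t len, uint16_t align)
{
	return (len + align - 1) & ~((size_t)align - 1);
}

/* True if a buffer starting at 'start' satisfies the head alignment
 * (assumed: non-zero power of two). */
static bool sketch_head_aligned(uintptr_t start, uint16_t head_align)
{
	return (start & ((uintptr_t)head_align - 1)) == 0;
}
```

For example, with a hypothetical sd-sgentry-align of 64, a 100-byte entry would be padded to 128 bytes.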

[PATCH 2/2] dt: bindings: add new dt entries for brcmfmac

2018-03-18 Thread Alexey Roslyakov

In case the host has higher alignment requirements for SG items, allow
setting device-specific alignments for scatterlist items.

Signed-off-by: Alexey Roslyakov 
---
 Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt b/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
index 86602f264dce..187b8c1b52a7 100644
--- a/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
+++ b/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
@@ -17,6 +17,11 @@ Optional properties:
When not specified the device will use in-band SDIO interrupts.
  - interrupt-names : name of the out-of-band interrupt, which must be set
to "host-wake".
+ - brcm,broken-sg-support : boolean flag to indicate that the SDIO host
+   controller has higher align requirement than 32 bytes for each
+   scatterlist item.
+ - brcm,sd-head-align : alignment requirement for start of data buffer.
+ - brcm,sd-sgentry-align : length alignment requirement for each sg entry.
 
 Example:
 
-- 
2.16.1

Re: [PATCH v2 10/21] lightnvm: Remove depends on HAS_DMA in case of platform dependency

2018-03-18 Thread Matias Bjørling


On 03/16/2018 02:51 PM, Geert Uytterhoeven wrote:

Remove dependencies on HAS_DMA where a Kconfig symbol depends on another
symbol that implies HAS_DMA, and, optionally, on "|| COMPILE_TEST".
In most cases this other symbol is an architecture or platform specific
symbol, or PCI.

Generic symbols and drivers without platform dependencies keep their
dependencies on HAS_DMA, to prevent compiling subsystems or drivers that
cannot work anyway.

This simplifies the dependencies, and allows for better compile-testing.

Notes:
   - FSL_FMAN keeps its dependency on HAS_DMA, as it calls set_dma_ops(),
 which does not exist if HAS_DMA=n (Do we need a dummy? The use of
 set_dma_ops() in this driver is questionable),
   - SND_SOC_LPASS_IPQ806X and SND_SOC_LPASS_PLATFORM lose their
 dependency on HAS_DMA, as they are selected from
 SND_SOC_APQ8016_SBC.

Signed-off-by: Geert Uytterhoeven 
Reviewed-by: Mark Brown 
Acked-by: Robin Murphy 
---
v2:
   - Add Reviewed-by, Acked-by,
   - Drop RFC state,
   - Split per subsystem.
---
  drivers/lightnvm/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/lightnvm/Kconfig b/drivers/lightnvm/Kconfig
index 10c08982185a572f..9c03f35d9df113c6 100644
--- a/drivers/lightnvm/Kconfig
+++ b/drivers/lightnvm/Kconfig
@@ -4,7 +4,7 @@
  
  menuconfig NVM

bool "Open-Channel SSD target support"
-   depends on BLOCK && HAS_DMA && PCI
+   depends on BLOCK && PCI
select BLK_DEV_NVME
help
  Say Y here to get to enable Open-channel SSDs.



Looks good.

Reviewed-by: Matias Bjørling

[PATCH net-next 0/4] net: dsa: Plug in PHYLINK support

2018-03-18 Thread Florian Fainelli

Hi all,

This patch series adds PHYLINK support to DSA which is necessary to support more
complex PHY and pluggable modules setups.

Patch series can be found here:

https://github.com/ffainelli/linux/commits/dsa-phylink

This was tested on:

- dsa-loop
- bcm_sf2
- mv88e6xxx
- b53

With a variety of test cases:
- internal & external MDIO PHYs
- MoCA with link notification through interrupt/MMIO register
- built-in PHYs
- ifconfig up/down for several cycles works
- bind/unbind of the drivers

And everything should still work as expected. Please be aware of the following:

- switch drivers (like bcm_sf2) which may have user-facing network ports using
  fixed links would need to implement phylink_mac_ops to remain functional.
  PHYLINK does not create a phy_device for fixed links, therefore our
  call to adjust_link() from phylink_mac_link_{up,down} would not call
  into the driver. This *should not* affect CPU/DSA ports, which are configured
  through adjust_link() but have no network devices.

- support for SFP/SFF is now possible, but switch drivers will still need some
  modifications to properly support those, including, but not limited to using
  the correct binding information. This will be submitted on top of this series.

Russell, we could theoretically eliminate patch 3 and resolve this within DSA
entirely by keeping a per-port phy_interface_t (we did that before). This is
not a big change if we have to; let me know if you feel this is cleaner. I
was initially considering passing a phylink_link_state reference to
mac_link_{up,down}, but only a couple of fields are valid during link_down, so I
ended up passing the phy_interface_t value we need instead. This is
necessary for switch drivers which have different types of port interfaces (see
the bcm_sf2 documentation in tree).

Thank you!

Florian Fainelli (4):
  net: dsa: Eliminate dsa_slave_get_link()
  net: phy: phylink: Provide PHY interface to mac_link_{up,down}
  net: dsa: Plug in PHYLINK support
  net: dsa: bcm_sf2: Implement phylink_mac_ops

 drivers/net/dsa/bcm_sf2.c | 190 +
 drivers/net/ethernet/marvell/mvneta.c |   4 +-
 drivers/net/phy/phylink.c |   6 +-
 include/linux/phylink.h   |  10 +-
 include/net/dsa.h |  27 ++-
 net/dsa/Kconfig   |   2 +-
 net/dsa/dsa_priv.h|   9 -
 net/dsa/slave.c   | 304 --
 8 files changed, 340 insertions(+), 212 deletions(-)

-- 
2.14.1

[PATCH net-next 1/4] net: dsa: Eliminate dsa_slave_get_link()

2018-03-18 Thread Florian Fainelli

Since we use PHYLIB to manage the per-port link indication, this will
also be reflected correctly in the network device's carrier state, so we
can use ethtool_op_get_link() instead.

Signed-off-by: Florian Fainelli 
---
 net/dsa/slave.c | 12 +---
 1 file changed, 1 insertion(+), 11 deletions(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 18561af7a8f1..9714e8b002d3 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -498,16 +498,6 @@ dsa_slave_get_regs(struct net_device *dev, struct ethtool_regs *regs, void *_p)
ds->ops->get_regs(ds, dp->index, regs, _p);
 }
 
-static u32 dsa_slave_get_link(struct net_device *dev)
-{
-   if (!dev->phydev)
-   return -ENODEV;
-
-   genphy_update_link(dev->phydev);
-
-   return dev->phydev->link;
-}
-
 static int dsa_slave_get_eeprom_len(struct net_device *dev)
 {
struct dsa_port *dp = dsa_slave_to_port(dev);
@@ -981,7 +971,7 @@ static const struct ethtool_ops dsa_slave_ethtool_ops = {
.get_regs_len   = dsa_slave_get_regs_len,
.get_regs   = dsa_slave_get_regs,
.nway_reset = phy_ethtool_nway_reset,
-   .get_link   = dsa_slave_get_link,
+   .get_link   = ethtool_op_get_link,
.get_eeprom_len = dsa_slave_get_eeprom_len,
.get_eeprom = dsa_slave_get_eeprom,
.set_eeprom = dsa_slave_set_eeprom,
-- 
2.14.1
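A side note on the removed helper: its return type was u32, so its -ENODEV path returned a large non-zero value rather than 0, whereas ethtool_op_get_link() reports the carrier state as 1 or 0. The sketch below reproduces only those return-value shapes with hypothetical names; it is not the DSA or ethtool code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Shape of the removed dsa_slave_get_link(): u32 return, -ENODEV when
 * there is no PHY. Converting -ENODEV to uint32_t yields a large
 * non-zero value. */
static uint32_t sketch_old_get_link(bool have_phy, bool phy_link)
{
	if (!have_phy)
		return -ENODEV;
	return phy_link;
}

/* Shape of ethtool_op_get_link(): derive the answer from carrier state. */
static uint32_t sketch_carrier_get_link(bool carrier_ok)
{
	return carrier_ok ? 1 : 0;
}
```

A caller that treats any non-zero value as "link up" would therefore have reported a PHY-less port as up with the old helper.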

[PATCH net-next 2/4] net: phy: phylink: Provide PHY interface to mac_link_{up,down}

2018-03-18 Thread Florian Fainelli

In preparation for having DSA transition entirely to PHYLINK, we need to pass a
PHY interface type to the mac_link_{up,down} callbacks because we may have to
make decisions on that (e.g. turning RGMII interfaces on/off). We do not pass
an entire phylink_link_state because not all parameters (pause, duplex etc.) are
defined when the link is down, only link and interface are.

Update mvneta accordingly since it currently implements phylink_mac_ops.
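A minimal model of why the interface argument helps, using hypothetical stand-ins (sketch_if_t, struct sketch_mac_ops, struct sketch_port) rather than the real phy_interface_t and struct phylink_mac_ops: the callbacks can decide whether to touch an RGMII-specific block on both link up and link down, even though pause, duplex and similar parameters are not defined while the link is down.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down stand-in for phy_interface_t. */
typedef enum {
	SKETCH_IF_MII,
	SKETCH_IF_RGMII,
} sketch_if_t;

/* Callback shapes follow the patch: the interface is passed both ways. */
struct sketch_mac_ops {
	void (*mac_link_down)(void *priv, unsigned int mode, sketch_if_t interface);
	void (*mac_link_up)(void *priv, unsigned int mode, sketch_if_t interface);
};

/* Hypothetical per-port MAC state. */
struct sketch_port {
	bool rgmii_enabled;
};

static void sketch_link_down(void *priv, unsigned int mode, sketch_if_t interface)
{
	struct sketch_port *p = priv;

	(void)mode;
	if (interface == SKETCH_IF_RGMII)
		p->rgmii_enabled = false;   /* only RGMII ports touch this block */
}

static void sketch_link_up(void *priv, unsigned int mode, sketch_if_t interface)
{
	struct sketch_port *p = priv;

	(void)mode;
	if (interface == SKETCH_IF_RGMII)
		p->rgmii_enabled = true;
}

static const struct sketch_mac_ops sketch_ops = {
	.mac_link_down = sketch_link_down,
	.mac_link_up   = sketch_link_up,
};
```

Passing only the interface, instead of a whole link state, keeps the link-down callback from depending on fields that are not valid at that point.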

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/marvell/mvneta.c |  4 +++-
 drivers/net/phy/phylink.c |  6 +-
 include/linux/phylink.h   | 10 --
 3 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 25e9a551cc8c..60de9b8d62c2 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3396,7 +3396,8 @@ static void mvneta_set_eee(struct mvneta_port *pp, bool enable)
mvreg_write(pp, MVNETA_LPI_CTRL_1, lpi_ctl1);
 }
 
-static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode)
+static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode,
+phy_interface_t interface)
 {
struct mvneta_port *pp = netdev_priv(ndev);
u32 val;
@@ -3415,6 +3416,7 @@ static void mvneta_mac_link_down(struct net_device *ndev, unsigned int mode)
 }
 
 static void mvneta_mac_link_up(struct net_device *ndev, unsigned int mode,
+  phy_interface_t interface,
   struct phy_device *phy)
 {
struct mvneta_port *pp = netdev_priv(ndev);
diff --git a/drivers/net/phy/phylink.c b/drivers/net/phy/phylink.c
index 51a011a349fe..cef3c1356a8c 100644
--- a/drivers/net/phy/phylink.c
+++ b/drivers/net/phy/phylink.c
@@ -423,8 +423,10 @@ static void phylink_resolve(struct work_struct *w)
if (pl->phylink_disable_state) {
pl->mac_link_dropped = false;
link_state.link = false;
+   link_state.interface = pl->phy_state.interface;
} else if (pl->mac_link_dropped) {
link_state.link = false;
+   link_state.interface = pl->phy_state.interface;
} else {
switch (pl->link_an_mode) {
case MLO_AN_PHY:
@@ -470,10 +472,12 @@ static void phylink_resolve(struct work_struct *w)
if (link_state.link != netif_carrier_ok(ndev)) {
if (!link_state.link) {
netif_carrier_off(ndev);
-   pl->ops->mac_link_down(ndev, pl->link_an_mode);
+   pl->ops->mac_link_down(ndev, pl->link_an_mode,
+  pl->phy_state.interface);
netdev_info(ndev, "Link is Down\n");
} else {
pl->ops->mac_link_up(ndev, pl->link_an_mode,
+pl->phy_state.interface,
 pl->phydev);
 
netif_carrier_on(ndev);
diff --git a/include/linux/phylink.h b/include/linux/phylink.h
index bd137c273d38..f29a40947de9 100644
--- a/include/linux/phylink.h
+++ b/include/linux/phylink.h
@@ -73,8 +73,10 @@ struct phylink_mac_ops {
void (*mac_config)(struct net_device *ndev, unsigned int mode,
   const struct phylink_link_state *state);
void (*mac_an_restart)(struct net_device *ndev);
-   void (*mac_link_down)(struct net_device *ndev, unsigned int mode);
+   void (*mac_link_down)(struct net_device *ndev, unsigned int mode,
+ phy_interface_t interface);
void (*mac_link_up)(struct net_device *ndev, unsigned int mode,
+   phy_interface_t interface,
struct phy_device *phy);
 };
 
@@ -161,17 +163,20 @@ void mac_an_restart(struct net_device *ndev);
  * mac_link_down() - take the link down
  * @ndev: a pointer to a &struct net_device for the MAC.
  * @mode: link autonegotiation mode
+ * @interface: link &typedef phy_interface_t mode
  *
  * If @mode is not an in-band negotiation mode (as defined by
  * phylink_autoneg_inband()), force the link down and disable any
  * Energy Efficient Ethernet MAC configuration.
  */
-void mac_link_down(struct net_device *ndev, unsigned int mode);
+void mac_link_down(struct net_device *ndev, unsigned int mode,
+  phy_interface_t interface);
 
 /**
  * mac_link_up() - allow the link to come up
  * @ndev: a pointer to a &struct net_device for the MAC.
  * @mode: link autonegotiation mode
+ * @interface: link &typedef phy_interface_t mode
  * @phy: any attached phy
  *
  * If @mode is not an in-band negotiation mode (as defined by
@@ -180,6 +185,7 @@ void mac_link_down(struct net_device *ndev, unsigned int mode);
  * phy_init_eee() and perform appropriate MAC configuration

[PATCH net-next 4/4] net: dsa: bcm_sf2: Implement phylink_mac_ops

2018-03-18 Thread Florian Fainelli

Make the bcm_sf2 driver implement phylink_mac_ops since it needs to
support a wide variety of network interfaces: internal & external MDIO
PHYs, fixed PHYs, MoCA with MMIO link status.

A large amount of what needs to be done already exists under
bcm_sf2_sw_adjust_link() so we are essentially breaking this down into
the necessary operations for PHYLINK to work: mac_config, mac_link_up,
mac_link_down and validate. We can now entirely get rid of most of what
fixed_link_update() provided because only the link information is actually
necessary. We still have to force DUPLEX_FULL for legacy Device Tree bindings
that did not specify that before.
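The validate step boils down to masking link modes by interface type. In this simplified sketch, the bit values, the sk_if_t enum and sketch_validate() are hypothetical, a plain uint32_t replaces the ethtool link-mode bitmaps, and the Autoneg, port and pause bits are omitted. As in bcm_sf2_sw_validate(), MII and reverse MII are limited to 10/100, the other accepted interfaces also get gigabit, and an unsupported interface clears everything (SGMII is used here only as an example of an interface outside the accepted list).

```c
#include <stdint.h>

/* Hypothetical link-mode bits. */
enum {
	SK_10HD   = 1u << 0,
	SK_10FD   = 1u << 1,
	SK_100HD  = 1u << 2,
	SK_100FD  = 1u << 3,
	SK_1000HD = 1u << 4,
	SK_1000FD = 1u << 5,
};

typedef enum {
	SK_IF_MII, SK_IF_REVMII, SK_IF_GMII, SK_IF_RGMII,
	SK_IF_INTERNAL, SK_IF_MOCA, SK_IF_SGMII,
} sk_if_t;

/* Return the subset of 'supported' valid for the interface; 0 means
 * the interface is not supported at all. */
static uint32_t sketch_validate(sk_if_t intf, uint32_t supported)
{
	uint32_t mask = SK_10HD | SK_10FD | SK_100HD | SK_100FD;

	switch (intf) {
	case SK_IF_MII:
	case SK_IF_REVMII:
		break;                          /* no gigabit on (reverse) MII */
	case SK_IF_GMII:
	case SK_IF_RGMII:
	case SK_IF_INTERNAL:
	case SK_IF_MOCA:
		mask |= SK_1000HD | SK_1000FD;  /* gigabit, half duplex included */
		break;
	default:
		return 0;                       /* unsupported interface */
	}
	return supported & mask;
}
```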

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c | 186 +-
 1 file changed, 118 insertions(+), 68 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 726d75a61795..b4d5fdcf4183 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -306,7 +306,8 @@ static int bcm_sf2_sw_mdio_write(struct mii_bus *bus, int addr, int regnum,
 
 static irqreturn_t bcm_sf2_switch_0_isr(int irq, void *dev_id)
 {
-   struct bcm_sf2_priv *priv = dev_id;
+   struct dsa_switch *ds = dev_id;
+   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
 
priv->irq0_stat = intrl2_0_readl(priv, INTRL2_CPU_STATUS) &
~priv->irq0_mask;
@@ -317,16 +318,21 @@ static irqreturn_t bcm_sf2_switch_0_isr(int irq, void *dev_id)
 
 static irqreturn_t bcm_sf2_switch_1_isr(int irq, void *dev_id)
 {
-   struct bcm_sf2_priv *priv = dev_id;
+   struct dsa_switch *ds = dev_id;
+   struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
 
priv->irq1_stat = intrl2_1_readl(priv, INTRL2_CPU_STATUS) &
~priv->irq1_mask;
intrl2_1_writel(priv, priv->irq1_stat, INTRL2_CPU_CLEAR);
 
-   if (priv->irq1_stat & P_LINK_UP_IRQ(P7_IRQ_OFF))
-   priv->port_sts[7].link = 1;
-   if (priv->irq1_stat & P_LINK_DOWN_IRQ(P7_IRQ_OFF))
-   priv->port_sts[7].link = 0;
+   if (priv->irq1_stat & P_LINK_UP_IRQ(P7_IRQ_OFF)) {
+   priv->port_sts[7].link = true;
+   dsa_port_phylink_mac_change(ds, 7, true);
+   }
+   if (priv->irq1_stat & P_LINK_DOWN_IRQ(P7_IRQ_OFF)) {
+   priv->port_sts[7].link = false;
+   dsa_port_phylink_mac_change(ds, 7, false);
+   }
 
return IRQ_HANDLED;
 }
@@ -473,13 +479,56 @@ static u32 bcm_sf2_sw_get_phy_flags(struct dsa_switch *ds, int port)
return priv->hw_params.gphy_rev;
 }
 
-static void bcm_sf2_sw_adjust_link(struct dsa_switch *ds, int port,
-  struct phy_device *phydev)
+static void bcm_sf2_sw_validate(struct dsa_switch *ds, int port,
+   unsigned long *supported,
+   struct phylink_link_state *state)
+{
+   __ETHTOOL_DECLARE_LINK_MODE_MASK(mask) = { 0, };
+
+   if (!phy_interface_mode_is_rgmii(state->interface) &&
+   state->interface != PHY_INTERFACE_MODE_MII &&
+   state->interface != PHY_INTERFACE_MODE_REVMII &&
+   state->interface != PHY_INTERFACE_MODE_GMII &&
+   state->interface != PHY_INTERFACE_MODE_INTERNAL &&
+   state->interface != PHY_INTERFACE_MODE_MOCA) {
+   bitmap_zero(supported, __ETHTOOL_LINK_MODE_MASK_NBITS);
+   dev_err(ds->dev,
+   "Unsupported interface: %d\n", state->interface);
+   return;
+   }
+
+   /* Allow all the expected bits */
+   phylink_set(mask, Autoneg);
+   phylink_set_port_modes(mask);
+   phylink_set(mask, Pause);
+   phylink_set(mask, Asym_Pause);
+
+   /* With the exclusion of MII and Reverse MII, we support Gigabit,
+* including Half duplex
+*/
+   if (state->interface != PHY_INTERFACE_MODE_MII &&
+   state->interface != PHY_INTERFACE_MODE_REVMII) {
+   phylink_set(mask, 1000baseT_Full);
+   phylink_set(mask, 1000baseT_Half);
+   }
+
+   phylink_set(mask, 10baseT_Half);
+   phylink_set(mask, 10baseT_Full);
+   phylink_set(mask, 100baseT_Half);
+   phylink_set(mask, 100baseT_Full);
+
+   bitmap_and(supported, supported, mask,
+  __ETHTOOL_LINK_MODE_MASK_NBITS);
+   bitmap_and(state->advertising, state->advertising, mask,
+  __ETHTOOL_LINK_MODE_MASK_NBITS);
+}
+
+static void bcm_sf2_sw_mac_config(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ const struct phylink_link_state *state)
 {
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
-   struct ethtool_eee *p = &priv->dev->ports[port].eee;
u32 id_mode_dis = 0, port_mode;
-   const char *str = NULL;
u32 reg, offset;
 
if (priv->type == BCM7445_DEVICE_ID)
@@ -487,62 +536,48 @
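
The link-mode filtering done by bcm_sf2_sw_validate() above can be approximated in user space as follows. This is only a sketch under stated assumptions: demo_iface, the LM_* bits and demo_validate() are hypothetical names, a plain unsigned mask replaces the kernel's ethtool link-mode bitmap and phylink_set(), and the port-mode bits set by phylink_set_port_modes() are left out.

```c
#include <assert.h>

/* Hypothetical stand-ins for the phy_interface_t values involved. */
enum demo_iface {
	IF_MII, IF_REVMII, IF_GMII, IF_RGMII, IF_INTERNAL, IF_MOCA,
	IF_SGMII,	/* an interface the validate logic rejects */
};

/* Hypothetical link-mode bits, one per ethtool mode used above. */
enum {
	LM_10_HALF   = 1u << 0,
	LM_10_FULL   = 1u << 1,
	LM_100_HALF  = 1u << 2,
	LM_100_FULL  = 1u << 3,
	LM_1000_HALF = 1u << 4,
	LM_1000_FULL = 1u << 5,
	LM_AUTONEG   = 1u << 6,
	LM_PAUSE     = 1u << 7,
	LM_ASYM      = 1u << 8,
};

/* Return 'supported' restricted to what the port can do on 'iface'. */
unsigned int demo_validate(enum demo_iface iface, unsigned int supported)
{
	unsigned int mask;

	switch (iface) {
	case IF_MII: case IF_REVMII: case IF_GMII:
	case IF_RGMII: case IF_INTERNAL: case IF_MOCA:
		break;
	default:
		return 0;	/* like bitmap_zero() on an unsupported interface */
	}

	mask = LM_AUTONEG | LM_PAUSE | LM_ASYM |
	       LM_10_HALF | LM_10_FULL | LM_100_HALF | LM_100_FULL;

	/* Gigabit (full and half duplex) on everything except MII/RevMII. */
	if (iface != IF_MII && iface != IF_REVMII)
		mask |= LM_1000_HALF | LM_1000_FULL;

	return supported & mask;	/* like bitmap_and() */
}
```

For example, demo_validate(IF_MII, ...) drops the two gigabit bits, while an interface outside the accepted list yields an empty mask, matching the bitmap_zero() branch.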

[PATCH net-next 3/4] net: dsa: Plug in PHYLINK support

2018-03-18 Thread Florian Fainelli

Add support for PHYLINK within the DSA subsystem in order to support more
complex devices such as pluggable (SFP) and non-pluggable (SFF) modules, 10G
PHYs, and traditional PHYs. Using PHYLINK allows us to drop some amount of
complexity we had while probing fixed and non-fixed PHYs using Device Tree.

Because PHYLINK separates the Ethernet MAC/port configuration into different
stages, we let switch drivers implement those, and for now, we maintain
functionality by calling dsa_slave_adjust_link() during
phylink_mac_link_{up,down}, which provides semantically equivalent steps.

Drivers willing to take advantage of PHYLINK should implement the phylink_mac_*
operations that DSA wraps.

We cannot quite remove the adjust_link() callback just yet, because a number of
drivers rely on it for configuring their "CPU" and "DSA" ports; this is still
done in dsa_port_setup_phy_of() and dsa_port_fixed_link_register_of().

Drivers that utilize fixed links for user-facing ports (e.g. bcm_sf2) will need
to implement phylink_mac_ops from now on to preserve functionality, since
PHYLINK *does not* create a phy_device instance for fixed links.
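
A rough sketch of the "prefer the new PHYLINK operation, otherwise fall back to adjust_link()" dispatch described above, reduced to plain C. struct demo_switch_ops, demo_mac_link_up() and the counters are hypothetical; the real DSA wrapper also receives the link mode, interface and phy_device, and this is not the code from net/dsa/slave.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct dsa_switch_ops. */
struct demo_switch_ops {
	void (*adjust_link)(int port);		/* legacy PHYLIB hook */
	void (*phylink_mac_link_up)(int port);	/* new PHYLINK hook */
};

/* Call the PHYLINK op if the driver has one, else fall back to the
 * legacy adjust_link(). Returns 1 if a callback ran, 0 otherwise. */
int demo_mac_link_up(const struct demo_switch_ops *ops, int port)
{
	if (ops->phylink_mac_link_up) {
		ops->phylink_mac_link_up(port);
		return 1;
	}
	if (ops->adjust_link) {
		ops->adjust_link(port);
		return 1;
	}
	return 0;
}

/* Counters that record which path was taken. */
int demo_legacy_calls;
int demo_phylink_calls;

void demo_legacy(int port)  { (void)port; demo_legacy_calls++; }
void demo_phylink(int port) { (void)port; demo_phylink_calls++; }
```

An unconverted driver keeps working through adjust_link(), while a converted one never reaches the legacy hook.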

Signed-off-by: Florian Fainelli 
---
 drivers/net/dsa/bcm_sf2.c |  12 +-
 include/net/dsa.h |  27 -
 net/dsa/Kconfig   |   2 +-
 net/dsa/dsa_priv.h|   9 --
 net/dsa/slave.c   | 300 --
 5 files changed, 213 insertions(+), 137 deletions(-)

diff --git a/drivers/net/dsa/bcm_sf2.c b/drivers/net/dsa/bcm_sf2.c
index 0378eded31f2..726d75a61795 100644
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -15,7 +15,7 @@
 #include 
 #include 
 #include 
-#include 
+#include 
 #include 
 #include 
 #include 
@@ -563,7 +563,7 @@ static void bcm_sf2_sw_adjust_link(struct dsa_switch *ds, int port,
 }
 
 static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
-struct fixed_phy_status *status)
+struct phylink_link_state *status)
 {
struct bcm_sf2_priv *priv = bcm_sf2_to_priv(ds);
u32 duplex, pause, offset;
@@ -611,13 +611,11 @@ static void bcm_sf2_sw_fixed_link_update(struct dsa_switch *ds, int port,
core_writel(priv, reg, offset);
 
if ((pause & (1 << port)) &&
-   (pause & (1 << (port + PAUSESTS_TX_PAUSE_SHIFT)))) {
-   status->asym_pause = 1;
-   status->pause = 1;
-   }
+   (pause & (1 << (port + PAUSESTS_TX_PAUSE_SHIFT))))
+   status->pause |= MLO_PAUSE_TX;
 
if (pause & (1 << port))
-   status->pause = 1;
+   status->pause |= MLO_PAUSE_TXRX_MASK;
 }
 
 static void bcm_sf2_enable_acb(struct dsa_switch *ds)
diff --git a/include/net/dsa.h b/include/net/dsa.h
index 60fb4ec8ba61..5399bed88df8 100644
--- a/include/net/dsa.h
+++ b/include/net/dsa.h
@@ -20,12 +20,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
 struct tc_action;
 struct phy_device;
-struct fixed_phy_status;
+struct phylink_link_state;
 
 enum dsa_tag_protocol {
DSA_TAG_PROTO_NONE = 0,
@@ -199,6 +200,7 @@ struct dsa_port {
u8  stp_state;
struct net_device   *bridge_dev;
struct devlink_port devlink_port;
+   struct phylink  *pl;
/*
 * Original copy of the master netdev ethtool_ops
 */
@@ -350,8 +352,27 @@ struct dsa_switch_ops {
 */
void(*adjust_link)(struct dsa_switch *ds, int port,
struct phy_device *phydev);
+   /*
+* PHYLINK integration
+*/
+   void(*phylink_validate)(struct dsa_switch *ds, int port,
+   unsigned long *supported,
+   struct phylink_link_state *state);
+   int (*phylink_mac_link_state)(struct dsa_switch *ds, int port,
+ struct phylink_link_state *state);
+   void(*phylink_mac_config)(struct dsa_switch *ds, int port,
+ unsigned int mode,
+ const struct phylink_link_state *state);
+   void(*phylink_mac_an_restart)(struct dsa_switch *ds, int port);
+   void(*phylink_mac_link_down)(struct dsa_switch *ds, int port,
+unsigned int mode,
+phy_interface_t interface);
+   void(*phylink_mac_link_up)(struct dsa_switch *ds, int port,
+  unsigned int mode,
+  phy_interface_t interface,
+  struct phy_device *phydev);
void(*fixed_link_update)(struct dsa_switch *ds, int port,
-   struct fixed_phy_status *st);
+struct phylink_link_state *state);
 
/*

Re: [PATCH net-next] net: dsa: mv88e6xxx: Fix missing register lock in serdes_get_stats

2018-03-18 Thread Andrew Lunn

On Sun, Mar 18, 2018 at 11:23:05AM -0700, Florian Fainelli wrote:
> We can hit the register lock not held assertion with the following path:
> 
> [   34.170631] mv88e6085 0.1:00: Switch registers lock not held!
> [   34.176510] CPU: 0 PID: 950 Comm: ethtool Not tainted 4.16.0-rc4 #143
> [   34.182985] Hardware name: Freescale Vybrid VF5xx/VF6xx (Device Tree)
> [   34.189519] Backtrace:
> [   34.192033] [<8010c4b4>] (dump_backtrace) from [<8010c788>] (show_stack+0x20/0x24)
> [   34.199680]  r6:9f5dc010 r5:0011 r4:9f5dc010 r3:
> [   34.205434] [<8010c768>] (show_stack) from [<80679d38>] (dump_stack+0x24/0x28)
> [   34.212719] [<80679d14>] (dump_stack) from [<804844a8>] (mv88e6xxx_read+0x70/0x7c)
> [   34.220376] [<80484438>] (mv88e6xxx_read) from [<804870dc>] (mv88e6xxx_port_get_cmode+0x34/0x4c)
> [   34.229257]  r5:a09cd128 r4:9ee31d07
> [   34.232880] [<804870a8>] (mv88e6xxx_port_get_cmode) from [<80487e6c>] (mv88e6352_port_has_serdes+0x24/0x64)
> [   34.242690]  r4:9f5dc010
> [   34.245309] [<80487e48>] (mv88e6352_port_has_serdes) from [<804880b8>] (mv88e6352_serdes_get_stats+0x28/0x12c)
> [   34.255389]  r4:0001
> [   34.257973] [<80488090>] (mv88e6352_serdes_get_stats) from [<804811e8>] (mv88e6xxx_get_ethtool_stats+0xb0/0xc0)
> [   34.268156]  r10: r9: r8: r7:a09cd020 r6:0001 r5:9f5dc01c
> [   34.276052]  r4:9f5dc010
> [   34.278631] [<80481138>] (mv88e6xxx_get_ethtool_stats) from [<8064f740>] (dsa_slave_get_ethtool_stats+0xbc/0xc4)
> 
> mv88e6xxx_get_ethtool_stats() calls mv88e6xxx_get_stats(), which calls both
> chip->info->ops->stats_get_stats(), which holds the register lock, and
> chip->info->ops->serdes_get_stats(), which does not. Run
> chip->info->ops->serdes_get_stats() with the register lock held as well to
> avoid such assertions.
> 
> Fixes: 436fe17d273b ("net: dsa: mv88e6xxx: Allow the SERDES interfaces to have statistics")
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Yes, I have the same patch in my backlog of patches.

Andrew
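
To illustrate the locking problem in the quoted patch, here is a toy model in which a flag stands in for the register mutex. demo_chip, demo_reg_read() and the two demo_get_stats_*() variants are hypothetical; only the call ordering is meant to mirror the description (switch statistics under the lock, SERDES statistics outside it before the fix).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical chip model: 'locked' plays the role of the register
 * mutex, 'violations' counts how often the "Switch registers lock not
 * held!" check would fire. */
struct demo_chip {
	bool locked;
	int violations;
};

int demo_reg_read(struct demo_chip *chip)
{
	if (!chip->locked)
		chip->violations++;
	return 0;
}

void demo_lock(struct demo_chip *chip)
{
	assert(!chip->locked);
	chip->locked = true;
}

void demo_unlock(struct demo_chip *chip)
{
	assert(chip->locked);
	chip->locked = false;
}

/* Before the fix: SERDES statistics are read outside the lock. */
void demo_get_stats_old(struct demo_chip *chip)
{
	demo_lock(chip);
	demo_reg_read(chip);	/* stats_get_stats() */
	demo_unlock(chip);
	demo_reg_read(chip);	/* serdes_get_stats() -> port_get_cmode() */
}

/* After the fix: SERDES statistics are read with the lock held too. */
void demo_get_stats_new(struct demo_chip *chip)
{
	demo_lock(chip);
	demo_reg_read(chip);	/* stats_get_stats() */
	demo_unlock(chip);

	demo_lock(chip);
	demo_reg_read(chip);	/* serdes_get_stats() */
	demo_unlock(chip);
}
```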

Re: [PATCH net-next 1/4] net: dsa: Eliminate dsa_slave_get_link()

2018-03-18 Thread Andrew Lunn

On Sun, Mar 18, 2018 at 11:52:43AM -0700, Florian Fainelli wrote:
> Since we use PHYLIB to manage the per-port link indication, this will
> also be reflected correctly in the network device's carrier state, so we
> can use ethtool_op_get_link() instead.
> 
> Signed-off-by: Florian Fainelli 

Reviewed-by: Andrew Lunn 

Andrew

Re: [PATCH net-next 3/4] net: dsa: Plug in PHYLINK support

2018-03-18 Thread Andrew Lunn

> +static int dsa_slave_nway_reset(struct net_device *dev)
> +{
> + struct dsa_port *dp = dsa_slave_to_port(dev);
> +
> + return phylink_ethtool_nway_reset(dp->pl);
> +}

Hi Florian

I've seen in one of Russell's trees a patch to put a phylink into
net_device. That would make a generic slave_nway_reset() possible, and
a few others as well. Maybe it makes sense to pull in that patch?

  Andrew

[PATCH net] net: fec: Fix unbalanced PM runtime calls

2018-03-18 Thread Florian Fainelli

When unbinding/removing the driver and then binding it again, we run into the following warnings:

[  259.655198] fec 400d1000.ethernet: 400d1000.ethernet supply phy not found, using dummy regulator
[  259.665065] fec 400d1000.ethernet: Unbalanced pm_runtime_enable!
[  259.672770] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Invalid MAC address: 00:00:00:00:00:00
[  259.683062] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Using random MAC address: f2:3e:93:b7:29:c1
[  259.696239] libphy: fec_enet_mii_bus: probed
[  259.696239] libphy: fec_enet_mii_bus: probed

Avoid these warnings by balancing the runtime PM calls during fec_drv_remove().
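
A small model of why the extra pm_runtime_disable() matters, assuming the runtime PM core keeps a disable depth that starts at 1 and warns in pm_runtime_enable() when that depth is already 0. demo_dev and the demo_* functions are hypothetical stand-ins, and demo_probe() does not reproduce fec_probe()'s exact runtime PM sequence.

```c
#include <assert.h>

/* Hypothetical model of a device's runtime PM bookkeeping. */
struct demo_dev {
	int disable_depth;	/* starts at 1: runtime PM disabled */
	int usage_count;
	int warnings;		/* "Unbalanced pm_runtime_enable!" */
};

void demo_pm_runtime_enable(struct demo_dev *d)
{
	if (d->disable_depth > 0)
		d->disable_depth--;
	else
		d->warnings++;
}

void demo_pm_runtime_disable(struct demo_dev *d) { d->disable_depth++; }
void demo_pm_runtime_get(struct demo_dev *d)     { d->usage_count++; }

void demo_pm_runtime_put(struct demo_dev *d)
{
	assert(d->usage_count > 0);
	d->usage_count--;
}

void demo_probe(struct demo_dev *d)
{
	demo_pm_runtime_get(d);
	demo_pm_runtime_enable(d);
}

/* 'balanced' selects the behavior with the two calls the patch adds. */
void demo_remove(struct demo_dev *d, int balanced)
{
	if (balanced) {
		demo_pm_runtime_put(d);
		demo_pm_runtime_disable(d);
	}
}
```

Probing, removing without the put/disable pair and probing again produces one warning; with the pair in place the second probe is clean.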

Signed-off-by: Florian Fainelli 
---
 drivers/net/ethernet/freescale/fec_main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 7a7f3a42b2aa..d4604bc8eb5b 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -3600,6 +3600,8 @@ fec_drv_remove(struct platform_device *pdev)
fec_enet_mii_remove(fep);
if (fep->reg_phy)
regulator_disable(fep->reg_phy);
+   pm_runtime_put(&pdev->dev);
+   pm_runtime_disable(&pdev->dev);
if (of_phy_is_fixed_link(np))
of_phy_deregister_fixed_link(np);
of_node_put(fep->phy_node);
-- 
2.14.1

[bpf-next PATCH v3 00/18] bpf,sockmap: sendmsg/sendfile ULP

2018-03-18 Thread John Fastabend

This series adds a BPF hook for sendmsg and sendfile by using
the ULP infrastructure and sockmap. A simple pseudocode example
would be:

  // load the programs
  bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG,
&obj, &msg_prog);

  // lookup the sockmap
  bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map");

  // get fd for sockmap
  map_fd_msg = bpf_map__fd(bpf_map_msg);

  // attach program to sockmap
  bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0);

  // Add a socket 'fd' to sockmap at location 'i'
  bpf_map_update_elem(map_fd_msg, &i, &fd, BPF_ANY);

  
After the above snippet any socket attached to the map would run
msg_prog on sendmsg and sendfile system calls.

Three additional helpers are added: bpf_msg_apply_bytes(),
bpf_msg_cork_bytes(), and bpf_msg_pull_data(). With
bpf_msg_apply_bytes(), BPF programs can tell the infrastructure how
many bytes the given verdict should apply to. There are two cases.
First, if a BPF program applies a verdict to fewer bytes than are in
the current sendmsg/sendfile msg, the verdict is applied to the
first N bytes of the message and the BPF program is run again with
data pointers recalculated to byte N+1. Second, if the BPF program
applies a verdict to more bytes than the current sendmsg or sendfile
system call carries, the infrastructure caches the verdict and
applies it to future sendmsg/sendfile calls until the byte limit is
reached. This avoids the overhead of running BPF programs on large
payloads.

The helper bpf_msg_cork_bytes() handles a different case, where
a BPF program cannot reach a verdict on a msg until it receives
more bytes AND the program doesn't want to forward the packet
until it is known to be "good". An example is a user (albeit
probably a naive one) that sends an N-byte header in 1-byte system
calls. The BPF program can call bpf_msg_cork_bytes() with the
byte count required to reach a verdict, and the program will then
only be called again once N bytes have been received.

The last helper added in this series is bpf_msg_pull_data(). It
is used to pull data in for modification or reading. Similar to
how skb_pull_data() works, msg_pull_data can be used to access data
not in the initial (data_start, data_end) range. For sendpage()
calls this is needed if any data is accessed, because the BPF
sendpage hook initializes the data_start and data_end pointers to
zero. We do this because sendpage data is shared with the user
and can be modified during or after the BPF verdict, possibly
invalidating any verdict the BPF program decides. For sendmsg
the data is already copied by the sendmsg BPF infrastructure, so
we only copy the data if the user requests a data range that is
not already linearized. This happens if the user requests larger
blocks of data that are not in a single scatterlist element. The
common case seems to be accessing headers, which normally are
in the first scatterlist element and already linearized.
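
The "copy only when the requested range is not already linearized" rule can be sketched as a range check over scatterlist element lengths. demo_range_in_one_elem() is a hypothetical helper working on an array of lengths rather than a real struct scatterlist ring.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true when [start, end) lies entirely inside one element, i.e.
 * no linearizing copy would be needed. 'lens' holds the length of each
 * scatterlist element in order. */
bool demo_range_in_one_elem(const size_t *lens, size_t n,
			    size_t start, size_t end)
{
	size_t off = 0;

	assert(start <= end);
	for (size_t i = 0; i < n; i++) {
		size_t elem_end = off + lens[i];

		/* The element holding 'start' must also hold 'end'. */
		if (start >= off && start < elem_end)
			return end <= elem_end;
		off = elem_end;
	}
	return false;	/* 'start' is beyond the last element */
}
```

A header read such as [0, 20) in a 100-byte first element needs no copy, while [90, 120) straddles two elements and would.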

For more examples please review the sample program. There are
examples for all the actions and helpers there.

Patches 1-8 implement the above sockmap/BPF infrastructure. The
remaining patches flesh out some minimal selftests and the sample
sockmap program. The sockmap sample program is the main vehicle
for testing this infrastructure and will be moved into selftests
shortly. The final patch in this series is a simple shell script
to run a set of tests. These are the tests I run after any changes
to sockmap. The next task on the list after this series is to
push those into selftests so we can avoid manual testing.

A couple of notes on future items in the pipeline:

  0. move sample sockmap programs into selftests (noted above)
  1. add additional support for TCP flags; most are ignored now.
  2. add a Documentation/bpf/sockmap file with these details
  3. support stacked ULP types to allow this and ktls to cooperate
  4. Ingress flag support; redirect only supports egress here. The
     other redirect helpers support ingress and egress flags.
  5. add optimizations, I cut a few optimizations here in the
 first iteration of the code for later study/implementation

-v3 updates
  : u32 data pointers in msg_md changed to void *
  : page_address NULL check and flag verification in msg_pull_data
  : remove old note in commit msg that is no longer relevant
  : remove enum sk_msg_action; it's not used anywhere
  : fixup test_verifier W -> DW insn to account for data pointers
  : unintentionally dropped a smap_stop_tx() call in sockmap.c

I propagated the ACKs forward because above changes were small
one/two line changes.

-v2 updates (discussion):

Dave noticed that the sendpage call was previously (in v1) running
on the data directly. This allowed users to potentially modify
the data after or during the BPF program. However, doing a copy
automatically, even if the data is not accessed, has a measurable
performance impact. So we added another helper modeled after
the existing skb_pull_data() helper to allow users to se

[bpf-next PATCH v3 03/18] net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG

2018-03-18 Thread John Fastabend

When calling do_tcp_sendpages() from within the kernel, and we know the
data has no references from the user side, we can omit the
SKBTX_SHARED_FRAG flag. This patch adds an internal flag,
MSG_NO_SHARED_FRAGS, that can be used to omit setting SKBTX_SHARED_FRAG.

The flag is not exposed to userspace because the sendpage call from
the splice logic masks out all bits except MSG_MORE.
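
The effect of the new flag on do_tcp_sendpages(), and why the splice path cannot pass it through, fits in a few lines. All DEMO_* constants are placeholders chosen for this sketch (not asserted to match the kernel headers), and demo_splice_flags() models the MSG_MORE-only masking described above rather than the actual splice code.

```c
#include <assert.h>

/* Placeholder flag values for the sketch only. */
#define DEMO_MSG_MORE			0x8000
#define DEMO_MSG_NO_SHARED_FRAGS	0x80000
#define DEMO_SKBTX_SHARED_FRAG		0x1

/* Model of the splice path: only MSG_MORE survives into sendpage. */
int demo_splice_flags(int user_flags)
{
	return user_flags & DEMO_MSG_MORE;
}

/* Model of the new check in do_tcp_sendpages(): mark the frags as
 * shared unless the in-kernel caller says they are not. */
unsigned int demo_tx_flags(int flags)
{
	unsigned int tx_flags = 0;

	if (!(flags & DEMO_MSG_NO_SHARED_FRAGS))
		tx_flags |= DEMO_SKBTX_SHARED_FRAG;
	return tx_flags;
}
```

A user who tries to set the internal flag through splice still gets shared frags, because the flag is masked off before sendpage sees it.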

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 include/linux/socket.h |1 +
 net/ipv4/tcp.c |4 +++-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/socket.h b/include/linux/socket.h
index 1ce1f76..60e0148 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -287,6 +287,7 @@ struct ucred {
 #define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
 #define MSG_BATCH  0x40000 /* sendmmsg(): more messages coming */
 #define MSG_EOF MSG_FIN
+#define MSG_NO_SHARED_FRAGS 0x80000 /* sendpage() internal : page frags are not shared */
 
 #define MSG_ZEROCOPY   0x4000000   /* Use user data in kernel path */
 #define MSG_FASTOPEN   0x20000000  /* Send data in TCP SYN */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index fb350f7..f90ec24 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -994,7 +994,9 @@ ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset,
get_page(page);
skb_fill_page_desc(skb, i, page, offset, copy);
}
-   skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
+
+   if (!(flags & MSG_NO_SHARED_FRAGS))
+   skb_shinfo(skb)->tx_flags |= SKBTX_SHARED_FRAG;
 
skb->len += copy;
skb->data_len += copy;

[bpf-next PATCH v3 02/18] sockmap: convert refcnt to an atomic refcnt

2018-03-18 Thread John Fastabend

The sockmap refcnt up until now has been wrapped in the
sk_callback_lock(), so it has not needed any locking of its
own. The counter itself tracks the lifetime of the psock object.
Sockets in a sockmap have a lifetime that is independent of the
map they are part of. This is possible because a single socket may
be in multiple maps. When this happens we can only release the
psock data associated with the socket when the refcnt reaches
zero. There are three possible paths that decrement the reference
when a sock is deleted. First, through the normal sockmap process,
the user deletes the socket from the map. Second, the map is removed
and all sockets in the map are removed; this delete path is similar
to case 1. The third case is an asynchronous socket event, such as
closing the socket; this last case handles removing sockets that are
no longer available. For completeness, although inc does not pose any
problems in this patch series, the inc case only happens when a psock
is added to a map.

Next we plan to add another socket prog type to handle policy and
monitoring on the TX path. When we do this, however, we will need to
keep a reference count open across the sendmsg/sendpage call, and
holding the sk_callback_lock() here (on every send) seems less than
ideal; it may also sleep in cases where we hit memory pressure.
Instead of dealing with these issues in some clever way, simply make
the reference counting a refcount_t type and do proper atomic ops.
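
A user-space analogue of the refcount_t conversion, using C11 atomics as a stand-in: the caller whose decrement takes the count from 1 to 0 is the only one that tears the psock down, which is the guarantee refcount_dec_and_test() provides. demo_psock and its helpers are hypothetical, and unlike the kernel's refcount_t this sketch does not saturate on overflow.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical psock with an atomic reference count. */
struct demo_psock {
	atomic_uint refcnt;
	bool destroyed;
};

void demo_psock_init(struct demo_psock *p)
{
	atomic_init(&p->refcnt, 1);	/* like refcount_set(&refcnt, 1) */
	p->destroyed = false;
}

void demo_psock_get(struct demo_psock *p)
{
	atomic_fetch_add(&p->refcnt, 1);	/* like refcount_inc() */
}

/* Like refcount_dec_and_test(): only the caller that drops the last
 * reference performs the teardown. */
void demo_psock_put(struct demo_psock *p)
{
	unsigned int old = atomic_fetch_sub(&p->refcnt, 1);

	assert(old > 0);
	if (old == 1)
		p->destroyed = true;	/* tcp_cleanup_ulp(), call_rcu_sched(), ... */
}
```

A socket that sits in two maps takes two references; removing it from the first map leaves the psock alive, and only the second removal frees it.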

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 kernel/bpf/sockmap.c |   23 ++-
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index a927e89..051b2242 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -62,8 +62,7 @@ struct smap_psock_map_entry {
 
 struct smap_psock {
struct rcu_head rcu;
-   /* refcnt is used inside sk_callback_lock */
-   u32 refcnt;
+   refcount_t refcnt;
 
/* datapath variables */
struct sk_buff_head rxqueue;
@@ -373,15 +372,13 @@ static void smap_destroy_psock(struct rcu_head *rcu)
 
 static void smap_release_sock(struct smap_psock *psock, struct sock *sock)
 {
-   psock->refcnt--;
-   if (psock->refcnt)
-   return;
-
-   tcp_cleanup_ulp(sock);
-   smap_stop_sock(psock, sock);
-   clear_bit(SMAP_TX_RUNNING, &psock->state);
-   rcu_assign_sk_user_data(sock, NULL);
-   call_rcu_sched(&psock->rcu, smap_destroy_psock);
+   if (refcount_dec_and_test(&psock->refcnt)) {
+   tcp_cleanup_ulp(sock);
+   smap_stop_sock(psock, sock);
+   clear_bit(SMAP_TX_RUNNING, &psock->state);
+   rcu_assign_sk_user_data(sock, NULL);
+   call_rcu_sched(&psock->rcu, smap_destroy_psock);
+   }
 }
 
 static int smap_parse_func_strparser(struct strparser *strp,
@@ -511,7 +508,7 @@ static struct smap_psock *smap_init_psock(struct sock *sock,
INIT_WORK(&psock->tx_work, smap_tx_work);
INIT_WORK(&psock->gc_work, smap_gc_work);
INIT_LIST_HEAD(&psock->maps);
-   psock->refcnt = 1;
+   refcount_set(&psock->refcnt, 1);
 
rcu_assign_sk_user_data(sock, psock);
sock_hold(sock);
@@ -772,7 +769,7 @@ static int sock_map_ctx_update_elem(struct bpf_sock_ops_kern *skops,
err = -EBUSY;
goto out_progs;
}
-   psock->refcnt++;
+   refcount_inc(&psock->refcnt);
} else {
psock = smap_init_psock(sock, stab);
if (IS_ERR(psock)) {

[bpf-next PATCH v3 04/18] net: generalize sk_alloc_sg to work with scatterlist rings

2018-03-18 Thread John Fastabend

The current implementation of sk_alloc_sg expects scatterlist to always
start at entry 0 and complete at entry MAX_SKB_FRAGS.

Future patches will want to support starting at an arbitrary offset into
the scatterlist, so add an additional sg_start parameter and then default
to the current values in the TLS code paths.
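
The new index handling in sk_alloc_sg() treats the scatterlist as a ring: advance, wrap at MAX_SKB_FRAGS, and report -ENOSPC when the write index comes back around to sg_start. The sketch below isolates just that index arithmetic; DEMO_MAX_FRAGS is a deliberately small stand-in for MAX_SKB_FRAGS, and demo_ring_alloc() is a hypothetical helper that does no page-fragment allocation or coalescing.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_MAX_FRAGS 4	/* small stand-in for MAX_SKB_FRAGS */

/* Fill up to 'n' entries starting at *curr, using the same advance /
 * wrap / full-ring checks as the patched sk_alloc_sg(). Returns 0 or
 * -ENOSPC; *curr is updated either way. */
int demo_ring_alloc(int start, int *curr, int n)
{
	int sg_curr = *curr;
	int rc = 0;

	while (n-- > 0) {
		/* entry sg_curr would be filled here */
		sg_curr++;
		if (sg_curr == DEMO_MAX_FRAGS)
			sg_curr = 0;
		if (sg_curr == start) {
			rc = -ENOSPC;	/* ring is full */
			break;
		}
	}
	*curr = sg_curr;
	return rc;
}
```

With start == 0, as in the TLS callers, this reports -ENOSPC once all DEMO_MAX_FRAGS entries are used; a non-zero start lets the write index wrap past the end of the array.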

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 include/net/sock.h |2 +-
 net/core/sock.c|   27 ---
 net/tls/tls_sw.c   |4 ++--
 3 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 447150c..b7c75e0 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2142,7 +2142,7 @@ static inline struct page_frag *sk_page_frag(struct sock *sk)
 bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag);
 
 int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
-   int *sg_num_elem, unsigned int *sg_size,
+   int sg_start, int *sg_curr, unsigned int *sg_size,
int first_coalesce);
 
 /*
diff --git a/net/core/sock.c b/net/core/sock.c
index f68dff0..4f92c29 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2240,19 +2240,20 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
 EXPORT_SYMBOL(sk_page_frag_refill);
 
 int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
-   int *sg_num_elem, unsigned int *sg_size,
+   int sg_start, int *sg_curr_index, unsigned int *sg_curr_size,
int first_coalesce)
 {
+   int sg_curr = *sg_curr_index, use = 0, rc = 0;
+   unsigned int size = *sg_curr_size;
struct page_frag *pfrag;
-   unsigned int size = *sg_size;
-   int num_elem = *sg_num_elem, use = 0, rc = 0;
struct scatterlist *sge;
-   unsigned int orig_offset;
 
len -= size;
pfrag = sk_page_frag(sk);
 
while (len > 0) {
+   unsigned int orig_offset;
+
if (!sk_page_frag_refill(sk, pfrag)) {
rc = -ENOMEM;
goto out;
@@ -2270,17 +2271,21 @@ int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
orig_offset = pfrag->offset;
pfrag->offset += use;
 
-   sge = sg + num_elem - 1;
-   if (num_elem > first_coalesce && sg_page(sg) == pfrag->page &&
+   sge = sg + sg_curr - 1;
+   if (sg_curr > first_coalesce && sg_page(sg) == pfrag->page &&
sg->offset + sg->length == orig_offset) {
sg->length += use;
} else {
-   sge++;
+   sge = sg + sg_curr;
sg_unmark_end(sge);
sg_set_page(sge, pfrag->page, use, orig_offset);
get_page(pfrag->page);
-   ++num_elem;
-   if (num_elem == MAX_SKB_FRAGS) {
+   sg_curr++;
+
+   if (sg_curr == MAX_SKB_FRAGS)
+   sg_curr = 0;
+
+   if (sg_curr == sg_start) {
rc = -ENOSPC;
break;
}
@@ -2289,8 +2294,8 @@ int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
len -= use;
}
 out:
-   *sg_size = size;
-   *sg_num_elem = num_elem;
+   *sg_curr_size = size;
+   *sg_curr_index = sg_curr;
return rc;
 }
 EXPORT_SYMBOL(sk_alloc_sg);
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 0fc8a24..057a558 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -94,7 +94,7 @@ static int alloc_encrypted_sg(struct sock *sk, int len)
int rc = 0;
 
rc = sk_alloc_sg(sk, len,
-ctx->sg_encrypted_data,
+ctx->sg_encrypted_data, 0,
 &ctx->sg_encrypted_num_elem,
 &ctx->sg_encrypted_size, 0);
 
@@ -107,7 +107,7 @@ static int alloc_plaintext_sg(struct sock *sk, int len)
struct tls_sw_context *ctx = tls_sw_ctx(tls_ctx);
int rc = 0;
 
-   rc = sk_alloc_sg(sk, len, ctx->sg_plaintext_data,
+   rc = sk_alloc_sg(sk, len, ctx->sg_plaintext_data, 0,
 &ctx->sg_plaintext_num_elem, &ctx->sg_plaintext_size,
 tls_ctx->pending_open_record_frags);

[bpf-next PATCH v3 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper

2018-03-18 Thread John Fastabend

A single sendmsg or sendfile system call can contain multiple logical
messages that a BPF program may want to read and apply a verdict to.
But, without an apply_bytes helper, any verdict on the data applies to
all bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
care to read the first N bytes of a msg. If the payload is large, say
MBs or even GBs, setting up and calling the BPF program repeatedly for
all bytes, even though the verdict is already known, creates
unnecessary overhead.

To allow BPF programs to control how many bytes a given verdict
applies to, we implement a bpf_msg_apply_bytes() helper. When called
from within a BPF program, this sets a counter, internal to the
BPF infrastructure, that applies the last verdict to the next N
bytes. If N is smaller than the current data being processed
from a sendmsg/sendfile call, the first N bytes will be sent and
the BPF program will be re-run with start_data pointing to byte N+1.
If N is larger than the current data being processed, the
BPF verdict will be applied to multiple sendmsg/sendfile calls
until N bytes are consumed.

Note: if a socket closes with a non-zero apply_bytes counter, this
is not a problem, because data is not being buffered for N bytes
and is sent as it is received.
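
A simplified model of the apply_bytes accounting described above, counting how often the BPF program would run for a series of sends when every run sets apply_bytes to the same value. demo_prog_runs() is hypothetical; it ignores corking, redirects and the real sk_msg_buff structures, and treats apply == 0 as "helper not called", so each verdict covers exactly one send.

```c
#include <assert.h>
#include <stddef.h>

/* Count BPF program runs for the given send sizes when each run sets
 * apply_bytes to 'apply' (0 = helper not used). */
unsigned int demo_prog_runs(const size_t *sends, size_t nsends, size_t apply)
{
	unsigned int runs = 0;
	size_t budget = 0;	/* bytes still covered by the cached verdict */

	for (size_t i = 0; i < nsends; i++) {
		size_t left = sends[i];

		while (left > 0) {
			if (budget == 0) {
				runs++;	/* run the BPF program */
				budget = apply ? apply : left;
			}
			/* consume as much as the current verdict covers */
			size_t used = left < budget ? left : budget;

			left -= used;
			budget -= used;
		}
	}
	return runs;
}
```

With three 100-byte sends, apply == 250 needs only two program runs because the cached verdict carries across sendmsg calls, and apply == 40 needs eight, one per 40-byte slice of the 300-byte stream.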

Signed-off-by: John Fastabend 
---
 include/uapi/linux/bpf.h |3 ++-
 net/core/filter.c|   16 
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ef52953..a557a2a 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -791,7 +791,8 @@ struct bpf_stack_build_id {
FN(getsockopt), \
FN(override_return),\
FN(sock_ops_cb_flags_set),  \
-   FN(msg_redirect_map),
+   FN(msg_redirect_map),   \
+   FN(msg_apply_bytes),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 2b6c475..17d6775 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1928,6 +1928,20 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
.arg4_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_msg_apply_bytes, struct sk_msg_buff *, msg, u32, bytes)
+{
+   msg->apply_bytes = bytes;
+   return 0;
+}
+
+static const struct bpf_func_proto bpf_msg_apply_bytes_proto = {
+   .func   = bpf_msg_apply_bytes,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_ANYTHING,
+};
+
 BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
 {
return task_get_classid(skb);
@@ -3634,6 +3648,8 @@ static const struct bpf_func_proto *sk_msg_func_proto(enum bpf_func_id func_id)
switch (func_id) {
case BPF_FUNC_msg_redirect_map:
return &bpf_msg_redirect_map_proto;
+   case BPF_FUNC_msg_apply_bytes:
+   return &bpf_msg_apply_bytes_proto;
default:
return bpf_base_func_proto(func_id);
}

[bpf-next PATCH v3 07/18] bpf: sockmap, add msg_cork_bytes() helper

2018-03-18 Thread John Fastabend

In the case where we need a specific number of bytes before a
verdict can be assigned, even if the data spans multiple sendmsg
or sendfile calls, the BPF program may use msg_cork_bytes().

The extreme case is a user who calls sendmsg repeatedly with
1-byte msg segments. Obviously, this is bad for performance but
is still valid. If the BPF program needs N bytes to validate
a header, it can use msg_cork_bytes to specify N bytes, and the
BPF program will not be called again until N bytes have been
accumulated. The infrastructure will attempt to coalesce data
if possible, so in many cases (most of my use cases, at least) the
data will be in a single scatterlist element with data pointers
pointing to the start/end of the element. However, this is dependent
on available memory, so it is not guaranteed. BPF programs must
therefore validate data pointer ranges, but this is the case anyway
to convince the verifier the accesses are valid.
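
Corking can be modeled as accumulating send sizes until the requested byte count is reached. demo_first_run_after() is a hypothetical helper that only tracks byte counts; it does not model the coalescing into scatterlist elements mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Return the 1-based index of the send after which at least 'cork'
 * bytes are buffered (when the BPF program would run again), or 0 if
 * that never happens within 'nsends' sends. */
size_t demo_first_run_after(const size_t *sends, size_t nsends, size_t cork)
{
	size_t buffered = 0;

	for (size_t i = 0; i < nsends; i++) {
		buffered += sends[i];
		if (buffered >= cork)
			return i + 1;
	}
	return 0;
}
```

With 1-byte sends and a cork of 8 bytes, the program first runs after the eighth send.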

Signed-off-by: John Fastabend 
---
 include/uapi/linux/bpf.h |3 ++-
 net/core/filter.c|   16 
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a557a2a..1765cfb 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -792,7 +792,8 @@ struct bpf_stack_build_id {
FN(override_return),\
FN(sock_ops_cb_flags_set),  \
FN(msg_redirect_map),   \
-   FN(msg_apply_bytes),
+   FN(msg_apply_bytes),\
+   FN(msg_cork_bytes),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 17d6775..0c9daf6 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1942,6 +1942,20 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
.arg2_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_2(bpf_msg_cork_bytes, struct sk_msg_buff *, msg, u32, bytes)
+{
+   msg->cork_bytes = bytes;
+   return 0;
+}
+
+static const struct bpf_func_proto bpf_msg_cork_bytes_proto = {
+   .func   = bpf_msg_cork_bytes,
+   .gpl_only   = false,
+   .ret_type   = RET_INTEGER,
+   .arg1_type  = ARG_PTR_TO_CTX,
+   .arg2_type  = ARG_ANYTHING,
+};
+
 BPF_CALL_1(bpf_get_cgroup_classid, const struct sk_buff *, skb)
 {
return task_get_classid(skb);
@@ -3650,6 +3664,8 @@ static const struct bpf_func_proto *sk_msg_func_proto(enum bpf_func_id func_id)
return &bpf_msg_redirect_map_proto;
case BPF_FUNC_msg_apply_bytes:
return &bpf_msg_apply_bytes_proto;
+   case BPF_FUNC_msg_cork_bytes:
+   return &bpf_msg_cork_bytes_proto;
default:
return bpf_base_func_proto(func_id);
}

[bpf-next PATCH v3 13/18] bpf: sockmap sample, add data verification option

2018-03-18 Thread John Fastabend

To verify that data is not being dropped or corrupted, this adds an
option to verify test patterns on recv.
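
The data test relies on a running unsigned char counter that the sender writes and the receiver checks, wrapping at 256. A standalone sketch of that scheme, with hypothetical demo_fill()/demo_verify() helpers, shows why it is independent of how the stream is split across iovecs or recv calls: the counter simply continues from one buffer to the next.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fill 'buf' with a running byte counter, continuing from *k; the
 * unsigned char counter wraps from 255 back to 0. */
void demo_fill(unsigned char *buf, size_t len, unsigned char *k)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = (*k)++;
}

/* Check that 'buf' continues the same counter from *k. Returns false
 * at the first mismatch, which indicates dropped or corrupted data. */
bool demo_verify(const unsigned char *buf, size_t len, unsigned char *k)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != (*k)++)
			return false;
	return true;
}
```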

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 samples/sockmap/sockmap_user.c |  118 
 1 file changed, 84 insertions(+), 34 deletions(-)

diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index ec624a8..8017ad7a 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -68,6 +68,7 @@
{"iov_count",   required_argument,  NULL, 'i' },
{"length",  required_argument,  NULL, 'l' },
{"test",required_argument,  NULL, 't' },
+   {"data_test",   no_argument,NULL, 'd' },
{"txmsg",   no_argument,&txmsg_pass,  1  },
{"txmsg_noisy", no_argument,&txmsg_noisy, 1  },
{"txmsg_redir", no_argument,&txmsg_redir, 1  },
@@ -208,45 +209,49 @@ struct msg_stats {
 static int msg_loop_sendpage(int fd, int iov_length, int cnt,
 struct msg_stats *s)
 {
-   off_t offset = 0;
+   unsigned char k = 0;
FILE *file;
int i, fp;
 
file = fopen(".sendpage_tst.tmp", "w+");
-   fseek(file, iov_length * cnt, SEEK_CUR);
-   fprintf(file, "A");
+   for (i = 0; i < iov_length * cnt; i++, k++)
+   fwrite(&k, sizeof(char), 1, file);
+   fflush(file);
fseek(file, 0, SEEK_SET);
+   fclose(file);
 
-   fp = fileno(file);
+   fp = open(".sendpage_tst.tmp", O_RDONLY);
clock_gettime(CLOCK_MONOTONIC, &s->start);
for (i = 0; i < cnt; i++) {
-   int sent = sendfile(fd, fp, &offset, iov_length);
+   int sent = sendfile(fd, fp, NULL, iov_length);
 
if (sent < 0) {
perror("send loop error:");
-   fclose(file);
+   close(fp);
return sent;
}
s->bytes_sent += sent;
}
clock_gettime(CLOCK_MONOTONIC, &s->end);
-   fclose(file);
+   close(fp);
return 0;
 }
 
 static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
-   struct msg_stats *s, bool tx)
+   struct msg_stats *s, bool tx, bool data_test)
 {
struct msghdr msg = {0};
int err, i, flags = MSG_NOSIGNAL;
struct iovec *iov;
+   unsigned char k;
 
iov = calloc(iov_count, sizeof(struct iovec));
if (!iov)
return errno;
 
+   k = 0;
for (i = 0; i < iov_count; i++) {
-   char *d = calloc(iov_length, sizeof(char));
+   unsigned char *d = calloc(iov_length, sizeof(char));
 
if (!d) {
fprintf(stderr, "iov_count %i/%i OOM\n", i, iov_count);
@@ -254,10 +259,18 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
}
iov[i].iov_base = d;
iov[i].iov_len = iov_length;
+
+   if (data_test && tx) {
+   int j;
+
+   for (j = 0; j < iov_length; j++)
+   d[j] = k++;
+   }
}
 
msg.msg_iov = iov;
msg.msg_iovlen = iov_count;
+   k = 0;
 
if (tx) {
clock_gettime(CLOCK_MONOTONIC, &s->start);
@@ -311,6 +324,26 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
}
 
s->bytes_recvd += recv;
+
+   if (data_test) {
+   int j;
+
+   for (i = 0; i < msg.msg_iovlen; i++) {
+   unsigned char *d = iov[i].iov_base;
+
+   for (j = 0;
+j < iov[i].iov_len && recv; j++) {
+   if (d[j] != k++) {
+   errno = -EIO;
+   fprintf(stderr,
+   "detected data corruption @iov[%i]:%i %02x != %02x, %02x ?= %02x\n",
+   i, j, d[j], k - 1, d[j+1], k + 1);
+   goto out_errno;
+   }
+   recv--;
+   }
+   }
+   }
}
clock_gettime(CLOCK_MONOTONIC, &s->end);
}
@@ -338,8 +371,15 @@ static inline float recvdBps(struct msg_stats s)
return s.bytes_recvd / (s.end.tv_sec - s.start.tv_sec);
 }
 
+struct sockmap_options {
+   int verbose;
+   bool base;
+   bo

[bpf-next PATCH v3 15/18] bpf: sockmap sample support for bpf_msg_cork_bytes()

2018-03-18 Thread John Fastabend

Add sample application support for the bpf_msg_cork_bytes helper. This
lets the user specify how many bytes must accumulate before each
verdict is applied.

Similar to the apply_bytes() tests, these can be run as a stand-alone
test when used without other options, or inline with other tests by
using the txmsg_cork option along with any of the basic tests: txmsg,
txmsg_redir, txmsg_drop.
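To make the cork semantics concrete, below is a plain userspace C model (not BPF code) of the behaviour documented for bpf_msg_cork_bytes(): the verdict is held back until at least the corked number of bytes has accumulated. The struct cork_state type and cork_feed() function are hypothetical names for this sketch; it assumes the program re-arms the same cork value on every run, and it ignores the memory limits and error paths the kernel has to handle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a corked sk_msg socket; not a kernel structure. */
struct cork_state {
	size_t cork_bytes;     /* threshold passed to bpf_msg_cork_bytes() */
	size_t pending;        /* bytes held back while corked */
	unsigned int verdicts; /* times the verdict would be applied */
};

/* Feed one send of 'len' bytes into the model. Returns the number of bytes
 * handed to the verdict by this call, or 0 while the data stays corked.
 * Assumes the program re-arms the same cork value on every run. */
static size_t cork_feed(struct cork_state *st, size_t len)
{
	size_t released;

	assert(st != NULL);
	st->pending += len;
	if (st->pending < st->cork_bytes)
		return 0;              /* threshold not reached yet */

	released = st->pending;    /* threshold reached: one verdict */
	st->pending = 0;
	st->verdicts++;
	return released;
}
```

In this model, 1-byte sends with a cork value of 100 produce one verdict per 100 bytes instead of one per send.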

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 samples/sockmap/sockmap_kern.c|   53 +
 samples/sockmap/sockmap_user.c|   19 ++
 tools/include/uapi/linux/bpf.h|3 +-
 tools/testing/selftests/bpf/bpf_helpers.h |2 +
 4 files changed, 68 insertions(+), 9 deletions(-)

diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 205ec36..7352267 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -64,6 +64,13 @@ struct bpf_map_def SEC("maps") sock_apply_bytes = {
.max_entries = 1
 };
 
+struct bpf_map_def SEC("maps") sock_cork_bytes = {
+   .type = BPF_MAP_TYPE_ARRAY,
+   .key_size = sizeof(int),
+   .value_size = sizeof(int),
+   .max_entries = 1
+};
+
 SEC("sk_skb1")
 int bpf_prog1(struct __sk_buff *skb)
 {
@@ -135,6 +142,9 @@ int bpf_prog4(struct sk_msg_md *msg)
bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
if (bytes)
bpf_msg_apply_bytes(msg, *bytes);
+   bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+   if (bytes)
+   bpf_msg_cork_bytes(msg, *bytes);
return SK_PASS;
 }
 
@@ -143,13 +153,16 @@ int bpf_prog5(struct sk_msg_md *msg)
 {
void *data_end = (void *)(long) msg->data_end;
void *data = (void *)(long) msg->data;
-   int *bytes, err = 0, zero = 0;
+   int *bytes, err1 = -1, err2 = -1, zero = 0;
 
bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
if (bytes)
-   err = bpf_msg_apply_bytes(msg, *bytes);
-   bpf_printk("sk_msg2: data length %i err %i\n",
-  (__u64)data_end - (__u64)data, err);
+   err1 = bpf_msg_apply_bytes(msg, *bytes);
+   bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+   if (bytes)
+   err2 = bpf_msg_cork_bytes(msg, *bytes);
+   bpf_printk("sk_msg2: data length %i err1 %i err2 %i\n",
+  (__u64)data_end - (__u64)data, err1, err2);
return SK_PASS;
 }
 
@@ -163,6 +176,9 @@ int bpf_prog6(struct sk_msg_md *msg)
bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
if (bytes)
bpf_msg_apply_bytes(msg, *bytes);
+   bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+   if (bytes)
+   bpf_msg_cork_bytes(msg, *bytes);
return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
@@ -171,13 +187,17 @@ int bpf_prog7(struct sk_msg_md *msg)
 {
void *data_end = (void *)(long) msg->data_end;
void *data = (void *)(long) msg->data;
-   int *bytes, err = 0, zero = 0;
+   int *bytes, err1 = 0, err2 = 0, zero = 0;
 
bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
if (bytes)
-   err = bpf_msg_apply_bytes(msg, *bytes);
-   bpf_printk("sk_msg3: redirect(%iB) err=%i\n",
-  (__u64)data_end - (__u64)data, err);
+   err1 = bpf_msg_apply_bytes(msg, *bytes);
+   bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+   if (bytes)
+   err2 = bpf_msg_cork_bytes(msg, *bytes);
+
+   bpf_printk("sk_msg3: redirect(%iB) err1=%i err2=%i\n",
+  (__u64)data_end - (__u64)data, err1, err2);
return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
@@ -198,5 +218,22 @@ int bpf_prog8(struct sk_msg_md *msg)
}
return SK_PASS;
 }
+SEC("sk_msg6")
+int bpf_prog9(struct sk_msg_md *msg)
+{
+   void *data_end = (void *)(long) msg->data_end;
+   void *data = (void *)(long) msg->data;
+   int ret = 0, *bytes, zero = 0;
+
+   bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+   if (bytes) {
+   if (((__u64)data_end - (__u64)data) >= *bytes)
+   return SK_PASS;
+   ret = bpf_msg_cork_bytes(msg, *bytes);
+   if (ret)
+   return SK_DROP;
+   }
+   return SK_PASS;
+}
 
 char _license[] SEC("license") = "GPL";
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 41774ec..4e0a3d8 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -60,6 +60,7 @@
 int txmsg_redir;
 int txmsg_redir_noisy;
 int txmsg_apply;
+int txmsg_cork;
 
 static const struct option long_options[] = {
{"help",no_argument,NULL, 'h' },
@@ -75,6 +76,7 @@
{"txmsg_redir", no_argument,&txmsg_redir, 1  },
{"txmsg_redir_noisy",   no_argument,&txmsg_redir_noisy, 1},
{"txmsg_apply", req

[bpf-next PATCH v3 18/18] bpf: sockmap test script

2018-03-18 Thread John Fastabend

This adds the test script I am currently using to validate
the latest sockmap changes. Sockmap will shortly be ported
to selftests, and these tests will then be run from the
infrastructure there. Until then, add the script here so we have
a coverage checklist when porting to selftests.

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 samples/sockmap/sockmap_test.sh |  450 +++
 1 file changed, 450 insertions(+)
 create mode 100755 samples/sockmap/sockmap_test.sh

diff --git a/samples/sockmap/sockmap_test.sh b/samples/sockmap/sockmap_test.sh
new file mode 100755
index 000..6d8cc40
--- /dev/null
+++ b/samples/sockmap/sockmap_test.sh
@@ -0,0 +1,450 @@
+#Test a bunch of positive cases to verify basic functionality
+for prog in "--txmsg" "--txmsg_redir" "--txmsg_drop"; do
+for t in "sendmsg" "sendpage"; do
+for r in 1 10 100; do
+   for i in 1 10 100; do
+   for l in 1 10 100; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+   done
+   done
+done
+done
+done
+
+#Test max iov
+t="sendmsg"
+r=1
+i=1024
+l=1
+prog="--txmsg"
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+sleep 2
+prog="--txmsg_redir"
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+
+# Test max iov with 1k send
+
+t="sendmsg"
+r=1
+i=1024
+l=1024
+prog="--txmsg"
+
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+sleep 2
+prog="--txmsg_redir"
+TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+echo $TEST
+$TEST
+sleep 2
+
+# Test apply with 1B
+r=1
+i=1024
+l=1024
+prog="--txmsg_apply 1"
+
+for t in "sendmsg" "sendpage"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+# Test apply with larger value than send
+r=1
+i=8
+l=1024
+prog="--txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+# Test apply with apply that never reaches limit
+r=1024
+i=1
+l=1
+prog="--txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+# Test apply and redirect with 1B
+r=1
+i=1024
+l=1024
+prog="--txmsg_redir --txmsg_apply 1"
+
+for t in "sendmsg" "sendpage"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+# Test apply and redirect with larger value than send
+r=1
+i=8
+l=1024
+prog="--txmsg_redir --txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+# Test apply and redirect with apply that never reaches limit
+r=1024
+i=1
+l=1
+prog="--txmsg_apply 2048"
+
+for t in "sendmsg" "sendpage"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+# Test cork with 1B not really useful but test it anyways
+r=1
+i=1024
+l=1024
+prog="--txmsg_cork 1"
+
+for t in "sendpage" "sendmsg"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+# Test cork with a more reasonable 100B
+r=1
+i=1000
+l=1000
+prog="--txmsg_cork 100"
+
+for t in "sendpage" "sendmsg"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+# Test cork with larger value than send
+r=1
+i=8
+l=1024
+prog="--txmsg_cork 2048"
+
+for t in "sendpage" "sendmsg"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+# Test cork with cork that never reaches limit
+r=1024
+i=1
+l=1
+prog="--txmsg_cork 2048"
+
+for t in "sendpage" "sendmsg"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+r=1
+i=1024
+l=1024
+prog="--txmsg_redir --txmsg_cork 1"
+
+for t in "sendpage" "sendmsg"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+# Test cork with a more reasonable 100B
+r=1
+i=1000
+l=1000
+prog="--txmsg_redir --txmsg_cork 100"
+
+for t in "sendpage" "sendmsg"; do
+   TEST="./sockmap --cgroup /mnt/cgroup2/ -t $t -r $r -i $i -l $l $prog"
+   echo $TEST
+   $TEST
+   sleep 2
+done
+
+# Test cork with larger value than send
+r=1
+i=8
+l=1024
+prog="--t

[bpf-next PATCH v3 12/18] bpf: sockmap sample, add sendfile test

2018-03-18 Thread John Fastabend

To exercise the TX ULP sendpage implementation, we need a test that
does a sendfile(). Add a sendfile test option here.
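A minimal standalone sketch of a sendfile() round trip, assuming Linux, where sendfile() can write from a regular file into a socket. It fills a temporary file with a counting byte pattern, pushes it through one end of an AF_UNIX socketpair and checks the pattern on the other end. The helper name sendfile_roundtrip() and the use of mkstemp() are choices made for this sketch; no sockmap, ULP or BPF program is involved, so it only exercises the plain sendfile() path.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <unistd.h>

/* Write 'len' counting bytes (0, 1, ..., 255, 0, ...) to a temporary file,
 * sendfile() them into one end of a socketpair and verify them on the other
 * end. 'len' is kept small so all data fits in the socket buffer before the
 * reader runs. Returns 0 on success, -1 on error or data mismatch. */
static int sendfile_roundtrip(size_t len)
{
	char path[] = "/tmp/sendpage_tstXXXXXX";
	unsigned char k = 0, buf[256];
	int sv[2], fd, ret = -1;
	off_t off = 0;
	size_t i, got = 0;

	assert(len > 0 && len <= 4096);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);                  /* file goes away once fd is closed */

	for (i = 0; i < len; i++, k++)
		if (write(fd, &k, 1) != 1)
			goto out_fd;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		goto out_fd;

	/* With an explicit offset, sendfile() advances 'off' itself and leaves
	 * the file position alone, so no rewind is needed after writing. */
	while ((size_t)off < len) {
		ssize_t n = sendfile(sv[0], fd, &off, len - (size_t)off);

		if (n <= 0)
			goto out_sock;
	}

	for (k = 0; got < len; ) {
		size_t want = len - got < sizeof(buf) ? len - got : sizeof(buf);
		ssize_t n = read(sv[1], buf, want);

		if (n <= 0)
			goto out_sock;
		for (i = 0; i < (size_t)n; i++, k++)
			if (buf[i] != k)
				goto out_sock; /* data corruption */
		got += (size_t)n;
	}
	ret = 0;
out_sock:
	close(sv[0]);
	close(sv[1]);
out_fd:
	close(fd);
	return ret;
}
```

Checking a counting pattern rather than just a byte count catches reordered or stale data, not only short transfers.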

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 samples/sockmap/sockmap_user.c |   70 ++--
 1 file changed, 60 insertions(+), 10 deletions(-)

diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index bbfe3a2..ec624a8 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -67,10 +68,10 @@
{"iov_count",   required_argument,  NULL, 'i' },
{"length",  required_argument,  NULL, 'l' },
{"test",required_argument,  NULL, 't' },
-   {"txmsg",   no_argument,&txmsg_pass,  1  },
-   {"txmsg_noisy", no_argument,&txmsg_noisy, 1  },
-   {"txmsg_redir", no_argument,&txmsg_redir, 1  },
-   {"txmsg_redir_noisy",   no_argument,&txmsg_redir_noisy, 1},
+   {"txmsg",   no_argument,&txmsg_pass,  1  },
+   {"txmsg_noisy", no_argument,&txmsg_noisy, 1  },
+   {"txmsg_redir", no_argument,&txmsg_redir, 1  },
+   {"txmsg_redir_noisy",   no_argument,&txmsg_redir_noisy, 1},
{0, 0, NULL, 0 }
 };
 
@@ -204,6 +205,35 @@ struct msg_stats {
struct timespec end;
 };
 
+static int msg_loop_sendpage(int fd, int iov_length, int cnt,
+struct msg_stats *s)
+{
+   off_t offset = 0;
+   FILE *file;
+   int i, fp;
+
+   file = fopen(".sendpage_tst.tmp", "w+");
+   fseek(file, iov_length * cnt, SEEK_CUR);
+   fprintf(file, "A");
+   fseek(file, 0, SEEK_SET);
+
+   fp = fileno(file);
+   clock_gettime(CLOCK_MONOTONIC, &s->start);
+   for (i = 0; i < cnt; i++) {
+   int sent = sendfile(fd, fp, &offset, iov_length);
+
+   if (sent < 0) {
+   perror("send loop error:");
+   fclose(file);
+   return sent;
+   }
+   s->bytes_sent += sent;
+   }
+   clock_gettime(CLOCK_MONOTONIC, &s->end);
+   fclose(file);
+   return 0;
+}
+
 static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
struct msg_stats *s, bool tx)
 {
@@ -309,7 +339,7 @@ static inline float recvdBps(struct msg_stats s)
 }
 
 static int sendmsg_test(int iov_count, int iov_buf, int cnt,
-   int verbose, bool base)
+   int verbose, bool base, bool sendpage)
 {
float sent_Bps = 0, recvd_Bps = 0;
int rx_fd, txpid, rxpid, err = 0;
@@ -325,6 +355,8 @@ static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 
rxpid = fork();
if (rxpid == 0) {
+   if (sendpage)
+   iov_count = 1;
err = msg_loop(rx_fd, iov_count, iov_buf, cnt, &s, false);
if (err)
fprintf(stderr,
@@ -348,7 +380,11 @@ static int sendmsg_test(int iov_count, int iov_buf, int cnt,
 
txpid = fork();
if (txpid == 0) {
-   err = msg_loop(c1, iov_count, iov_buf, cnt, &s, true);
+   if (sendpage)
+   err = msg_loop_sendpage(c1, iov_buf, cnt, &s);
+   else
+   err = msg_loop(c1, iov_count, iov_buf, cnt, &s, true);
+
if (err)
fprintf(stderr,
"msg_loop_tx: iov_count %i iov_buf %i cnt %i err %i\n",
@@ -452,6 +488,8 @@ enum {
PING_PONG,
SENDMSG,
BASE,
+   BASE_SENDPAGE,
+   SENDPAGE,
 };
 
 int main(int argc, char **argv)
@@ -494,6 +532,10 @@ int main(int argc, char **argv)
test = SENDMSG;
} else if (strcmp(optarg, "base") == 0) {
test = BASE;
+   } else if (strcmp(optarg, "base_sendpage") == 0) {
+   test = BASE_SENDPAGE;
+   } else if (strcmp(optarg, "sendpage") == 0) {
+   test = SENDPAGE;
} else {
usage(argv);
return -1;
@@ -533,7 +575,7 @@ int main(int argc, char **argv)
}
 
/* If base test skip BPF setup */
-   if (test == BASE)
+   if (test == BASE || test == BASE_SENDPAGE)
goto run;
 
/* Attach programs to sockmap */
@@ -599,7 +641,7 @@ int main(int argc, char **argv)
err, strerror(errno));
return err;
}
-   if (test == SENDMSG)
+   if (txmsg_redir || txmsg_redir_noisy)
redir_fd = c2;
else

[bpf-next PATCH v3 09/18] bpf: add map tests for BPF_PROG_TYPE_SK_MSG

2018-03-18 Thread John Fastabend

Add map tests that attach BPF_PROG_TYPE_SK_MSG programs to a sockmap.

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 tools/include/uapi/linux/bpf.h |   10 
 tools/testing/selftests/bpf/Makefile   |3 +
 tools/testing/selftests/bpf/bpf_helpers.h  |2 +
 tools/testing/selftests/bpf/sockmap_parse_prog.c   |   15 +
 tools/testing/selftests/bpf/sockmap_tcp_msg_prog.c |   33 
 tools/testing/selftests/bpf/sockmap_verdict_prog.c |7 +++
 tools/testing/selftests/bpf/test_maps.c|   55 +++-
 7 files changed, 118 insertions(+), 7 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/sockmap_tcp_msg_prog.c

diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 1944d0a..13d9c59 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -133,6 +133,7 @@ enum bpf_prog_type {
BPF_PROG_TYPE_SOCK_OPS,
BPF_PROG_TYPE_SK_SKB,
BPF_PROG_TYPE_CGROUP_DEVICE,
+   BPF_PROG_TYPE_SK_MSG,
 };
 
 enum bpf_attach_type {
@@ -143,6 +144,7 @@ enum bpf_attach_type {
BPF_SK_SKB_STREAM_PARSER,
BPF_SK_SKB_STREAM_VERDICT,
BPF_CGROUP_DEVICE,
+   BPF_SK_MSG_VERDICT,
__MAX_BPF_ATTACH_TYPE
 };
 
@@ -941,6 +943,14 @@ enum sk_action {
SK_PASS,
 };
 
+/* user accessible metadata for SK_MSG packet hook, new fields must
+ * be added to the end of this structure
+ */
+struct sk_msg_md {
+   void *data;
+   void *data_end;
+};
+
 #define BPF_TAG_SIZE   8
 
 struct bpf_prog_info {
diff --git a/tools/testing/selftests/bpf/Makefile b/tools/testing/selftests/bpf/Makefile
index b0d29fd..f35fb02 100644
--- a/tools/testing/selftests/bpf/Makefile
+++ b/tools/testing/selftests/bpf/Makefile
@@ -29,7 +29,8 @@ TEST_GEN_FILES = test_pkt_access.o test_xdp.o test_l4lb.o test_tcp_estats.o test
test_pkt_md_access.o test_xdp_redirect.o test_xdp_meta.o sockmap_parse_prog.o \
sockmap_verdict_prog.o dev_cgroup.o sample_ret0.o test_tracepoint.o \
test_l4lb_noinline.o test_xdp_noinline.o test_stacktrace_map.o \
-   sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o
+   sample_map_ret0.o test_tcpbpf_kern.o test_stacktrace_build_id.o \
+   sockmap_tcp_msg_prog.o
 
 # Order correspond to 'make run_tests' order
 TEST_PROGS := test_kmod.sh \
diff --git a/tools/testing/selftests/bpf/bpf_helpers.h b/tools/testing/selftests/bpf/bpf_helpers.h
index dde2c11..1558fe8 100644
--- a/tools/testing/selftests/bpf/bpf_helpers.h
+++ b/tools/testing/selftests/bpf/bpf_helpers.h
@@ -123,6 +123,8 @@ static int (*bpf_skb_under_cgroup)(void *ctx, void *map, int index) =
(void *) BPF_FUNC_skb_under_cgroup;
 static int (*bpf_skb_change_head)(void *, int len, int flags) =
(void *) BPF_FUNC_skb_change_head;
+static int (*bpf_skb_pull_data)(void *, int len) =
+   (void *) BPF_FUNC_skb_pull_data;
 
 /* Scan the ARCH passed in from ARCH env variable (see Makefile) */
 #if defined(__TARGET_ARCH_x86)
diff --git a/tools/testing/selftests/bpf/sockmap_parse_prog.c b/tools/testing/selftests/bpf/sockmap_parse_prog.c
index a1dec2b..0f92858 100644
--- a/tools/testing/selftests/bpf/sockmap_parse_prog.c
+++ b/tools/testing/selftests/bpf/sockmap_parse_prog.c
@@ -20,14 +20,25 @@ int bpf_prog1(struct __sk_buff *skb)
__u32 lport = skb->local_port;
__u32 rport = skb->remote_port;
__u8 *d = data;
+   __u32 len = (__u32) data_end - (__u32) data;
+   int err;
 
-   if (data + 10 > data_end)
-   return skb->len;
+   if (data + 10 > data_end) {
+   err = bpf_skb_pull_data(skb, 10);
+   if (err)
+   return SK_DROP;
+
+   data_end = (void *)(long)skb->data_end;
+   data = (void *)(long)skb->data;
+   if (data + 10 > data_end)
+   return SK_DROP;
+   }
 
/* This write/read is a bit pointless but tests the verifier and
 * strparser handler for read/write pkt data and access into sk
 * fields.
 */
+   d = data;
d[7] = 1;
return skb->len;
 }
diff --git a/tools/testing/selftests/bpf/sockmap_tcp_msg_prog.c b/tools/testing/selftests/bpf/sockmap_tcp_msg_prog.c
new file mode 100644
index 000..12a7b5c
--- /dev/null
+++ b/tools/testing/selftests/bpf/sockmap_tcp_msg_prog.c
@@ -0,0 +1,33 @@
+#include 
+#include "bpf_helpers.h"
+#include "bpf_util.h"
+#include "bpf_endian.h"
+
+int _version SEC("version") = 1;
+
+#define bpf_printk(fmt, ...)   \
+({ \
+  char ____fmt[] = fmt;\
+  bpf_trace_printk(____fmt, sizeof(____fmt),   \
+   ##__VA_ARGS__); \
+})
+
+SEC("sk_msg1")
+int bpf_prog1(struct sk_msg_md *msg)
+{
+   void *data

[bpf-next PATCH v3 14/18] bpf: sockmap, add sample option to test apply_bytes helper

2018-03-18 Thread John Fastabend

This adds an option to test the apply_bytes helper. The option lets
the user specify, as an integer on the command line, how many bytes
each verdict should apply to.

When the option is set, a map entry is populated with the byte count
supplied by the user, and the selected program (--txmsg or
--txmsg_redir) reads that value and sets the apply bytes accordingly.
If no other option is given, a default --txmsg_apply program is run
instead. That program drops pkts if the bytes map lookup or the apply
helper fails. This is useful to verify that the map lookup and apply
helper are working, since any failure becomes a hard error.
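As a rough illustration of what the apply value controls, here is a userspace C model (not BPF code) of the documented bpf_msg_apply_bytes() behaviour: the last verdict covers the next N bytes, the program runs again once those bytes are used up, and a budget larger than one send carries over to later sends. apply_bytes_runs() is a hypothetical helper that only counts how many times the verdict program would run, assuming the program sets the same apply value on every run.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: count verdict program runs for 'nsends' sends of
 * 'len' bytes each, assuming every run calls
 * bpf_msg_apply_bytes(msg, apply) with the same value. */
static unsigned int apply_bytes_runs(size_t nsends, size_t len, size_t apply)
{
	size_t budget = 0;        /* bytes still covered by the last verdict */
	unsigned int runs = 0;
	size_t s;

	assert(apply > 0);
	for (s = 0; s < nsends; s++) {
		size_t left = len;    /* bytes of this send not yet covered */

		while (left > 0) {
			size_t used;

			if (budget == 0) {    /* no verdict in force: run program */
				runs++;
				budget = apply;
			}
			used = left < budget ? left : budget;
			budget -= used;       /* leftover budget carries to next send */
			left -= used;
		}
	}
	return runs;
}
```

For example, a single 8192-byte send with an apply value of 2048 gives four runs, while 1024 one-byte sends with the same value give just one.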

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 samples/sockmap/sockmap_kern.c|   54 ++---
 samples/sockmap/sockmap_user.c|   19 ++
 tools/testing/selftests/bpf/bpf_helpers.h |3 +-
 3 files changed, 68 insertions(+), 8 deletions(-)

diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 75edb2f..205ec36 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -57,6 +57,13 @@ struct bpf_map_def SEC("maps") sock_map_redir = {
.max_entries = 1,
 };
 
+struct bpf_map_def SEC("maps") sock_apply_bytes = {
+   .type = BPF_MAP_TYPE_ARRAY,
+   .key_size = sizeof(int),
+   .value_size = sizeof(int),
+   .max_entries = 1
+};
+
 SEC("sk_skb1")
 int bpf_prog1(struct __sk_buff *skb)
 {
@@ -123,6 +130,11 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
 SEC("sk_msg1")
 int bpf_prog4(struct sk_msg_md *msg)
 {
+   int *bytes, zero = 0;
+
+   bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+   if (bytes)
+   bpf_msg_apply_bytes(msg, *bytes);
return SK_PASS;
 }
 
@@ -131,8 +143,13 @@ int bpf_prog5(struct sk_msg_md *msg)
 {
void *data_end = (void *)(long) msg->data_end;
void *data = (void *)(long) msg->data;
+   int *bytes, err = 0, zero = 0;
 
-   bpf_printk("sk_msg2: data length %i\n", (__u32)data_end - (__u32)data);
+   bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+   if (bytes)
+   err = bpf_msg_apply_bytes(msg, *bytes);
+   bpf_printk("sk_msg2: data length %i err %i\n",
+  (__u64)data_end - (__u64)data, err);
return SK_PASS;
 }
 
@@ -141,9 +158,12 @@ int bpf_prog6(struct sk_msg_md *msg)
 {
void *data_end = (void *)(long) msg->data_end;
void *data = (void *)(long) msg->data;
-   int ret = 0;
+   int *bytes, zero = 0;
 
-   return bpf_msg_redirect_map(msg, &sock_map_redir, ret, 0);
+   bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+   if (bytes)
+   bpf_msg_apply_bytes(msg, *bytes);
+   return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
 SEC("sk_msg4")
@@ -151,10 +171,32 @@ int bpf_prog7(struct sk_msg_md *msg)
 {
void *data_end = (void *)(long) msg->data_end;
void *data = (void *)(long) msg->data;
-   int ret = 0;
+   int *bytes, err = 0, zero = 0;
+
+   bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+   if (bytes)
+   err = bpf_msg_apply_bytes(msg, *bytes);
+   bpf_printk("sk_msg3: redirect(%iB) err=%i\n",
+  (__u64)data_end - (__u64)data, err);
+   return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
+}
 
-   bpf_printk("sk_msg3: redirect(%iB)\n", (__u32)data_end - (__u32)data);
-   return bpf_msg_redirect_map(msg, &sock_map_redir, ret, 0);
+SEC("sk_msg5")
+int bpf_prog8(struct sk_msg_md *msg)
+{
+   void *data_end = (void *)(long) msg->data_end;
+   void *data = (void *)(long) msg->data;
+   int ret = 0, *bytes, zero = 0;
+
+   bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+   if (bytes) {
+   ret = bpf_msg_apply_bytes(msg, *bytes);
+   if (ret)
+   return SK_DROP;
+   } else {
+   return SK_DROP;
+   }
+   return SK_PASS;
 }
 
 char _license[] SEC("license") = "GPL";
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 8017ad7a..41774ec 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -59,6 +59,7 @@
 int txmsg_noisy;
 int txmsg_redir;
 int txmsg_redir_noisy;
+int txmsg_apply;
 
 static const struct option long_options[] = {
{"help",no_argument,NULL, 'h' },
@@ -73,6 +74,7 @@
{"txmsg_noisy", no_argument,&txmsg_noisy, 1  },
{"txmsg_redir", no_argument,&txmsg_redir, 1  },
{"txmsg_redir_noisy",   no_argument,&txmsg_redir_noisy, 1},
+   {"txmsg_apply", required_argument,  NULL, 'a'},
{0, 0, NULL, 0 }
 };
 
@@ -546,7 +548,9 @@ int main(int argc, char **argv)
while ((opt = getopt_long(argc, argv, ":dhvc:r:i:l:t:",
  long_options, &longindex)) != -1) {
switch (opt) {
-   /* Cgroup configuration */
+

[bpf-next PATCH v3 16/18] bpf: sockmap add SK_DROP tests

2018-03-18 Thread John Fastabend

Add tests for SK_DROP.

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 samples/sockmap/sockmap_kern.c |   15 ++
 samples/sockmap/sockmap_user.c |   62 ++--
 2 files changed, 61 insertions(+), 16 deletions(-)

diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 7352267..8b6c34c 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -236,4 +236,19 @@ int bpf_prog9(struct sk_msg_md *msg)
return SK_PASS;
 }
 
+SEC("sk_msg7")
+int bpf_prog10(struct sk_msg_md *msg)
+{
+   int *bytes, zero = 0;
+
+   bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
+   if (bytes)
+   bpf_msg_apply_bytes(msg, *bytes);
+   bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
+   if (bytes)
+   bpf_msg_cork_bytes(msg, *bytes);
+   return SK_DROP;
+}
+
+
 char _license[] SEC("license") = "GPL";
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 4e0a3d8..52c4ed7 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -59,6 +59,7 @@
 int txmsg_noisy;
 int txmsg_redir;
 int txmsg_redir_noisy;
+int txmsg_drop;
 int txmsg_apply;
 int txmsg_cork;
 
@@ -75,6 +76,7 @@
{"txmsg_noisy", no_argument,&txmsg_noisy, 1  },
{"txmsg_redir", no_argument,&txmsg_redir, 1  },
{"txmsg_redir_noisy",   no_argument,&txmsg_redir_noisy, 1},
+   {"txmsg_drop",  no_argument,&txmsg_drop, 1 },
{"txmsg_apply", required_argument,  NULL, 'a'},
{"txmsg_cork",  required_argument,  NULL, 'k'},
{0, 0, NULL, 0 }
@@ -210,9 +212,19 @@ struct msg_stats {
struct timespec end;
 };
 
+struct sockmap_options {
+   int verbose;
+   bool base;
+   bool sendpage;
+   bool data_test;
+   bool drop_expected;
+};
+
 static int msg_loop_sendpage(int fd, int iov_length, int cnt,
-struct msg_stats *s)
+struct msg_stats *s,
+struct sockmap_options *opt)
 {
+   bool drop = opt->drop_expected;
unsigned char k = 0;
FILE *file;
int i, fp;
@@ -229,12 +241,18 @@ static int msg_loop_sendpage(int fd, int iov_length, int cnt,
for (i = 0; i < cnt; i++) {
int sent = sendfile(fd, fp, NULL, iov_length);
 
-   if (sent < 0) {
+   if (!drop && sent < 0) {
perror("send loop error:");
close(fp);
return sent;
+   } else if (drop && sent >= 0) {
+   printf("sendpage loop error expected: %i\n", sent);
+   close(fp);
+   return -EIO;
}
-   s->bytes_sent += sent;
+
+   if (sent > 0)
+   s->bytes_sent += sent;
}
clock_gettime(CLOCK_MONOTONIC, &s->end);
close(fp);
@@ -242,12 +260,15 @@ static int msg_loop_sendpage(int fd, int iov_length, int cnt,
 }
 
 static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
-   struct msg_stats *s, bool tx, bool data_test)
+   struct msg_stats *s, bool tx,
+   struct sockmap_options *opt)
 {
struct msghdr msg = {0};
int err, i, flags = MSG_NOSIGNAL;
struct iovec *iov;
unsigned char k;
+   bool data_test = opt->data_test;
+   bool drop = opt->drop_expected;
 
iov = calloc(iov_count, sizeof(struct iovec));
if (!iov)
@@ -281,11 +302,16 @@ static int msg_loop(int fd, int iov_count, int iov_length, int cnt,
for (i = 0; i < cnt; i++) {
int sent = sendmsg(fd, &msg, flags);
 
-   if (sent < 0) {
+   if (!drop && sent < 0) {
perror("send loop error:");
goto out_errno;
+   } else if (drop && sent >= 0) {
+   printf("send loop error expected: %i\n", sent);
+   errno = -EIO;
+   goto out_errno;
}
-   s->bytes_sent += sent;
+   if (sent > 0)
+   s->bytes_sent += sent;
}
clock_gettime(CLOCK_MONOTONIC, &s->end);
} else {
@@ -375,13 +401,6 @@ static inline float recvdBps(struct msg_stats s)
return s.bytes_recvd / (s.end.tv_sec - s.start.tv_sec);
 }
 
-struct sockmap_options {
-   int verbose;
-   bool base;
-   bool sendpage;
-   bool data_test;
-};
-
 static int sendmsg_test(int iov_count, int iov_buf, int cnt,
struct sockmap_options *opt)
 {
@@ -399,10 +418,13 @@ static int sendmsg_test(int iov_count, int iov_buf

[bpf-next PATCH v3 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data

2018-03-18 Thread John Fastabend

This implements a BPF ULP layer to allow policy enforcement and
monitoring at the socket layer. To support this, a new program type,
BPF_PROG_TYPE_SK_MSG, is used to run the policy at the
sendmsg/sendpage hook. To attach the policy to sockets, a sockmap is
used with a new program attach type, BPF_SK_MSG_VERDICT.

As with previous sockmap usage, when a sock is added to a sockmap via
a map update and the map has a BPF_SK_MSG_VERDICT program attached,
the BPF ULP layer is created on the socket. The attached
BPF_PROG_TYPE_SK_MSG program is then run for every msg in the sendmsg
case and for every page/offset in the sendpage case.

BPF_PROG_TYPE_SK_MSG Semantics/API:

BPF_PROG_TYPE_SK_MSG supports only two return codes, SK_PASS and
SK_DROP. Returning SK_DROP frees the copied data in the sendmsg case
and leaves the data untouched in the sendpage case. In both cases
-EACCES is returned to the user. Returning SK_PASS allows the msg to
be sent.

In the sendmsg case, data is copied into kernel space buffers before
running the BPF program. The kernel space buffers are stored in a
scatterlist object where each element is a kernel memory buffer.
Some effort is made to coalesce data from the sendmsg call here.
For example, a sendmsg call with many one-byte iov entries will
likely be pushed into a single entry. The BPF program is run with
data pointers (start/end) pointing to the first sg element.
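The coalescing described above can be pictured with a simplified userspace C sketch: iovecs are copied into fixed-size buffers, and the last buffer is topped up before a new one is opened, so a sendmsg() with many one-byte iov entries ends up in one entry. struct sketch_sg, the 4096-byte buffer size and the 16-entry limit are assumptions for illustration only; the kernel works with page fragments and struct scatterlist rather than flat arrays.

```c
#include <assert.h>
#include <string.h>
#include <sys/uio.h>

#define SKETCH_CHUNK_SIZE  4096 /* assumed capacity of one buffer (a page) */
#define SKETCH_MAX_ENTRIES 16   /* hypothetical ring size for this sketch */

/* Hypothetical stand-in for the scatterlist of copied kernel buffers. */
struct sketch_sg {
	unsigned char buf[SKETCH_MAX_ENTRIES][SKETCH_CHUNK_SIZE];
	size_t len[SKETCH_MAX_ENTRIES]; /* bytes used in each entry */
	int nents;                      /* entries in use */
};

/* Copy every iovec into 'sg', topping up the last entry before opening a
 * new one, so many small iovecs share a single entry. Returns the number
 * of bytes copied, or -1 if the entries run out. */
static long sketch_copy_iov(struct sketch_sg *sg, const struct iovec *iov,
			    int iovcnt)
{
	long copied = 0;
	int i;

	assert(sg != NULL && iovcnt >= 0);
	for (i = 0; i < iovcnt; i++) {
		const unsigned char *src = iov[i].iov_base;
		size_t left = iov[i].iov_len;

		while (left > 0) {
			size_t room, n, *used;

			if (sg->nents == 0 ||
			    sg->len[sg->nents - 1] == SKETCH_CHUNK_SIZE) {
				if (sg->nents == SKETCH_MAX_ENTRIES)
					return -1;
				sg->len[sg->nents++] = 0;   /* open a new entry */
			}
			used = &sg->len[sg->nents - 1];
			room = SKETCH_CHUNK_SIZE - *used;
			n = left < room ? left : room;
			memcpy(sg->buf[sg->nents - 1] + *used, src, n);
			*used += n;
			src += n;
			left -= n;
			copied += (long)n;
		}
	}
	return copied;
}
```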

In the sendpage case, data is not copied. We opt not to copy the
data by default here because the BPF infrastructure knows neither
which bytes will be needed nor when they will be needed, so copying
all bytes may be wasteful. Because of this, the initial start/end
data pointers are (0,0), meaning no data can be read or written.
This avoids reading data that may be modified by the user. A new
helper, added later in this series, can be used if reading and
writing the data is needed. The helper call does a copy by default
so that the page is exclusively owned by the BPF call.

The verdict from the BPF_PROG_TYPE_SK_MSG program applies to the
entire msg in the sendmsg() case and to the entire page/offset in the
sendpage case. This avoids ambiguity about how to handle mixed return
codes in the sendmsg case. Again, a helper is added later in the
series for when a verdict needs to apply to multiple system calls
and/or to only part of the message currently being processed.

The helper msg_redirect_map() can be used to select the socket to
send the data on. It is used in the same way as the existing redirect
helpers and allows policy to redirect msgs.

A simple pseudo-code example:

The basic logic to attach a program to a socket is as follows:

  // load the programs
  bpf_prog_load(SOCKMAP_TCP_MSG_PROG, BPF_PROG_TYPE_SK_MSG,
&obj, &msg_prog);

  // lookup the sockmap
  bpf_map_msg = bpf_object__find_map_by_name(obj, "my_sock_map");

  // get fd for sockmap
  map_fd_msg = bpf_map__fd(bpf_map_msg);

  // attach program to sockmap
  bpf_prog_attach(msg_prog, map_fd_msg, BPF_SK_MSG_VERDICT, 0);

Adding sockets to the map is done in the normal way:

  // Add a socket 'fd' to sockmap at location 'i'
  bpf_map_update_elem(map_fd_msg, &i, &fd, BPF_ANY);

After the above, any socket added to "my_sock_map", in this case
'fd', will run the BPF msg verdict program (msg_prog) on every
sendmsg and sendpage system call.

For a complete example see BPF selftests or sockmap samples.

Implementation notes:

It seemed simplest, to me at least, to use a refcnt to ensure the
psock is not lost across the sendmsg copy into the sg, the BPF program
running on the data in sg_data, and the final pass to the TCP stack.
Performance testing may reveal a better method that avoids the refcnt
cost, but for now we use the simpler method.
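A minimal userspace sketch of the get/put pattern described here, using C11 atomics: a reference is only taken while the count is still non-zero, and whoever drops the last reference frees the object. struct sketch_psock and its helpers are hypothetical stand-ins, not the kernel's psock structure or its refcount_t API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for per-socket psock state; not the kernel struct. */
struct sketch_psock {
	atomic_int refcnt;
};

/* Allocate with one reference, standing in for the map's reference. */
static struct sketch_psock *sketch_psock_new(void)
{
	struct sketch_psock *p = malloc(sizeof(*p));

	if (p)
		atomic_init(&p->refcnt, 1);
	return p;
}

/* Take a reference only while the object is still live (refcnt > 0),
 * similar in spirit to the kernel's refcount_inc_not_zero(). */
static bool sketch_psock_get(struct sketch_psock *p)
{
	int old = atomic_load(&p->refcnt);

	/* On failure the CAS reloads 'old', so the loop re-checks liveness. */
	while (old > 0)
		if (atomic_compare_exchange_weak(&p->refcnt, &old, old + 1))
			return true;
	return false;
}

/* Drop a reference; whoever drops the last one frees the object.
 * Returns true if this call freed it. */
static bool sketch_psock_put(struct sketch_psock *p)
{
	assert(atomic_load(&p->refcnt) > 0);
	if (atomic_fetch_sub(&p->refcnt, 1) == 1) {
		free(p);
		return true;
	}
	return false;
}
```

With this pattern the map's reference can be dropped while a sendmsg() still holds its own, and the object is freed only when that sendmsg() finishes.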

Another item that will come after basic support is in place is
support for the MSG_MORE flag. At the moment we call sendpages even if
the MSG_MORE flag is set. An enhancement would be to collect the
pages into a larger scatterlist and pass them down the stack. Notice
that bpf_tcp_sendmsg() could support this with some additional state
saved across sendmsg calls. I built the code so that this can be
supported without refactoring work. Other features TBD include
ZEROCOPY and TCP_RECV_QUEUE/TCP_NO_QUEUE support. These will follow
the initial series shortly.

Future work could improve the size limits on the scatterlist rings
used here. Currently, we use MAX_SKB_FRAGS simply because it was
already being used in the TLS case. Future work could extend the
kernel sk APIs to tune this depending on workload. This is a trade-off
between memory usage and throughput performance.

Signed-off-by: John Fastabend 
---
 include/linux/bpf.h   |1 
 include/linux/bpf_types.h |1 
 include/linux/filter.h|   17 +
 include/uapi/linux/bpf.h  |   22 +
 kernel/bpf/sockmap.c  |  712 -
 kernel/bpf/syscall.c  |   14 +
 kernel/bpf/verifier.c |5 
 net/core/filter.c |  106 +++
 8 files changed, 857 insertions(

[bpf-next PATCH v3 01/18] sock: make static tls function alloc_sg generic sock helper

2018-03-18 Thread John Fastabend

The TLS ULP module builds scatterlists from a sock using
sk_page_frag_refill(). This is going to be useful for other ULPs,
so move it into the sock file for more general use.

In the process, remove a useless goto at the end of the while loop.
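A simplified userspace model of the carving and coalescing idea behind sk_alloc_sg(): each request takes bytes from the current page fragment, and if the new piece starts exactly where the most recently added entry ends on the same page (and more than first_coalesce entries exist), that entry is extended instead of a new one being added. The sketch_* types, the 4096-byte page and the 17-entry limit standing in for MAX_SKB_FRAGS are assumptions; memory accounting, page refcounts and refill failures are left out.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_MAX_ELEMS 17     /* assumed stand-in for MAX_SKB_FRAGS */

struct sketch_sge {             /* hypothetical stand-in for a scatterlist entry */
	int page;                   /* page id */
	unsigned int offset, length;
};

struct sketch_frag {            /* hypothetical stand-in for struct page_frag */
	int page;
	unsigned int offset;
};

/* Carve 'len' bytes from 'pf' into 'sg'. The last entry is extended when the
 * new piece is contiguous with it (same page, adjacent offset) and
 * *num_elem > first_coalesce. Returns 0, or -1 if the entry limit is hit. */
static int sketch_alloc_sg(struct sketch_frag *pf, struct sketch_sge *sg,
			   int *num_elem, unsigned int len, int first_coalesce)
{
	while (len > 0) {
		unsigned int use, orig_offset;
		struct sketch_sge *last;

		if (pf->offset == SKETCH_PAGE_SIZE) {   /* "refill": fresh page */
			pf->page++;
			pf->offset = 0;
		}
		use = len < SKETCH_PAGE_SIZE - pf->offset ?
		      len : SKETCH_PAGE_SIZE - pf->offset;
		orig_offset = pf->offset;
		pf->offset += use;

		last = *num_elem > 0 ? &sg[*num_elem - 1] : NULL;
		if (last && *num_elem > first_coalesce && last->page == pf->page &&
		    last->offset + last->length == orig_offset) {
			last->length += use;               /* coalesce */
		} else {
			if (*num_elem == SKETCH_MAX_ELEMS)
				return -1;
			sg[*num_elem] = (struct sketch_sge){ pf->page, orig_offset, use };
			(*num_elem)++;
		}
		len -= use;
	}
	return 0;
}
```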

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 include/net/sock.h |4 +++
 net/core/sock.c|   56 ++
 net/tls/tls_sw.c   |   69 +---
 3 files changed, 67 insertions(+), 62 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index b962458..447150c 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -2141,6 +2141,10 @@ static inline struct page_frag *sk_page_frag(struct sock *sk)
 
 bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag);
 
+int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
+   int *sg_num_elem, unsigned int *sg_size,
+   int first_coalesce);
+
 /*
  * Default write policy as shown to user space via poll/select/SIGIO
  */
diff --git a/net/core/sock.c b/net/core/sock.c
index 27f218b..f68dff0 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2239,6 +2239,62 @@ bool sk_page_frag_refill(struct sock *sk, struct page_frag *pfrag)
 }
 EXPORT_SYMBOL(sk_page_frag_refill);
 
+int sk_alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
+   int *sg_num_elem, unsigned int *sg_size,
+   int first_coalesce)
+{
+   struct page_frag *pfrag;
+   unsigned int size = *sg_size;
+   int num_elem = *sg_num_elem, use = 0, rc = 0;
+   struct scatterlist *sge;
+   unsigned int orig_offset;
+
+   len -= size;
+   pfrag = sk_page_frag(sk);
+
+   while (len > 0) {
+   if (!sk_page_frag_refill(sk, pfrag)) {
+   rc = -ENOMEM;
+   goto out;
+   }
+
+   use = min_t(int, len, pfrag->size - pfrag->offset);
+
+   if (!sk_wmem_schedule(sk, use)) {
+   rc = -ENOMEM;
+   goto out;
+   }
+
+   sk_mem_charge(sk, use);
+   size += use;
+   orig_offset = pfrag->offset;
+   pfrag->offset += use;
+
+   sge = sg + num_elem - 1;
+   if (num_elem > first_coalesce && sg_page(sg) == pfrag->page &&
+   sg->offset + sg->length == orig_offset) {
+   sg->length += use;
+   } else {
+   sge++;
+   sg_unmark_end(sge);
+   sg_set_page(sge, pfrag->page, use, orig_offset);
+   get_page(pfrag->page);
+   ++num_elem;
+   if (num_elem == MAX_SKB_FRAGS) {
+   rc = -ENOSPC;
+   break;
+   }
+   }
+
+   len -= use;
+   }
+out:
+   *sg_size = size;
+   *sg_num_elem = num_elem;
+   return rc;
+}
+EXPORT_SYMBOL(sk_alloc_sg);
+
 static void __lock_sock(struct sock *sk)
__releases(&sk->sk_lock.slock)
__acquires(&sk->sk_lock.slock)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index f26376e..0fc8a24 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -87,71 +87,16 @@ static void trim_both_sgl(struct sock *sk, int target_size)
target_size);
 }
 
-static int alloc_sg(struct sock *sk, int len, struct scatterlist *sg,
-   int *sg_num_elem, unsigned int *sg_size,
-   int first_coalesce)
-{
-   struct page_frag *pfrag;
-   unsigned int size = *sg_size;
-   int num_elem = *sg_num_elem, use = 0, rc = 0;
-   struct scatterlist *sge;
-   unsigned int orig_offset;
-
-   len -= size;
-   pfrag = sk_page_frag(sk);
-
-   while (len > 0) {
-   if (!sk_page_frag_refill(sk, pfrag)) {
-   rc = -ENOMEM;
-   goto out;
-   }
-
-   use = min_t(int, len, pfrag->size - pfrag->offset);
-
-   if (!sk_wmem_schedule(sk, use)) {
-   rc = -ENOMEM;
-   goto out;
-   }
-
-   sk_mem_charge(sk, use);
-   size += use;
-   orig_offset = pfrag->offset;
-   pfrag->offset += use;
-
-   sge = sg + num_elem - 1;
-   if (num_elem > first_coalesce && sg_page(sg) == pfrag->page &&
-   sg->offset + sg->length == orig_offset) {
-   sg->length += use;
-   } else {
-   sge++;
-   sg_unmark_end(sge);
-   sg_set_page(sge, pfrag->page, use, orig_offset);
-   get_page(pfrag->page);
-   ++num_elem;
-   if (num_elem == MAX_SKB_FRAGS) {
-   rc = -ENOSPC;

[bpf-next PATCH v3 11/18] bpf: sockmap sample, add option to attach SK_MSG program

2018-03-18 Thread John Fastabend

Add an option to the sockmap sample to use SK_MSG program types.

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 samples/bpf/bpf_load.c                    |    8 +++
 samples/sockmap/sockmap_kern.c            |   52 +++
 samples/sockmap/sockmap_user.c            |   67 ++---
 tools/include/uapi/linux/bpf.h            |   13 +-
 tools/lib/bpf/libbpf.c                    |    1 
 tools/testing/selftests/bpf/bpf_helpers.h |    3 +
 6 files changed, 135 insertions(+), 9 deletions(-)

diff --git a/samples/bpf/bpf_load.c b/samples/bpf/bpf_load.c
index 69806d7..b1a310c 100644
--- a/samples/bpf/bpf_load.c
+++ b/samples/bpf/bpf_load.c
@@ -67,6 +67,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
bool is_cgroup_sk = strncmp(event, "cgroup/sock", 11) == 0;
bool is_sockops = strncmp(event, "sockops", 7) == 0;
bool is_sk_skb = strncmp(event, "sk_skb", 6) == 0;
+   bool is_sk_msg = strncmp(event, "sk_msg", 6) == 0;
size_t insns_cnt = size / sizeof(struct bpf_insn);
enum bpf_prog_type prog_type;
char buf[256];
@@ -96,6 +97,8 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
prog_type = BPF_PROG_TYPE_SOCK_OPS;
} else if (is_sk_skb) {
prog_type = BPF_PROG_TYPE_SK_SKB;
+   } else if (is_sk_msg) {
+   prog_type = BPF_PROG_TYPE_SK_MSG;
} else {
printf("Unknown event '%s'\n", event);
return -1;
@@ -113,7 +116,7 @@ static int load_and_attach(const char *event, struct bpf_insn *prog, int size)
if (is_xdp || is_perf_event || is_cgroup_skb || is_cgroup_sk)
return 0;
 
-   if (is_socket || is_sockops || is_sk_skb) {
+   if (is_socket || is_sockops || is_sk_skb || is_sk_msg) {
if (is_socket)
event += 6;
else
@@ -589,7 +592,8 @@ static int do_load_bpf_file(const char *path, fixup_map_cb fixup_map)
memcmp(shname, "socket", 6) == 0 ||
memcmp(shname, "cgroup/", 7) == 0 ||
memcmp(shname, "sockops", 7) == 0 ||
-   memcmp(shname, "sk_skb", 6) == 0) {
+   memcmp(shname, "sk_skb", 6) == 0 ||
+   memcmp(shname, "sk_msg", 6) == 0) {
ret = load_and_attach(shname, data->d_buf,
  data->d_size);
if (ret != 0)
diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 52b0053..75edb2f 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -43,6 +43,20 @@ struct bpf_map_def SEC("maps") sock_map = {
.max_entries = 20,
 };
 
+struct bpf_map_def SEC("maps") sock_map_txmsg = {
+   .type = BPF_MAP_TYPE_SOCKMAP,
+   .key_size = sizeof(int),
+   .value_size = sizeof(int),
+   .max_entries = 20,
+};
+
+struct bpf_map_def SEC("maps") sock_map_redir = {
+   .type = BPF_MAP_TYPE_SOCKMAP,
+   .key_size = sizeof(int),
+   .value_size = sizeof(int),
+   .max_entries = 1,
+};
+
 SEC("sk_skb1")
 int bpf_prog1(struct __sk_buff *skb)
 {
@@ -105,4 +119,42 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
 
return 0;
 }
+
+SEC("sk_msg1")
+int bpf_prog4(struct sk_msg_md *msg)
+{
+   return SK_PASS;
+}
+
+SEC("sk_msg2")
+int bpf_prog5(struct sk_msg_md *msg)
+{
+   void *data_end = (void *)(long) msg->data_end;
+   void *data = (void *)(long) msg->data;
+
+   bpf_printk("sk_msg2: data length %i\n", (__u32)data_end - (__u32)data);
+   return SK_PASS;
+}
+
+SEC("sk_msg3")
+int bpf_prog6(struct sk_msg_md *msg)
+{
+   void *data_end = (void *)(long) msg->data_end;
+   void *data = (void *)(long) msg->data;
+   int ret = 0;
+
+   return bpf_msg_redirect_map(msg, &sock_map_redir, ret, 0);
+}
+
+SEC("sk_msg4")
+int bpf_prog7(struct sk_msg_md *msg)
+{
+   void *data_end = (void *)(long) msg->data_end;
+   void *data = (void *)(long) msg->data;
+   int ret = 0;
+
+   bpf_printk("sk_msg3: redirect(%iB)\n", (__u32)data_end - (__u32)data);
+   return bpf_msg_redirect_map(msg, &sock_map_redir, ret, 0);
+}
+
 char _license[] SEC("license") = "GPL";
diff --git a/samples/sockmap/sockmap_user.c b/samples/sockmap/sockmap_user.c
index 95a54a8..bbfe3a2 100644
--- a/samples/sockmap/sockmap_user.c
+++ b/samples/sockmap/sockmap_user.c
@@ -54,6 +54,11 @@
 /* global sockets */
 int s1, s2, c1, c2, p1, p2;
 
+int txmsg_pass;
+int txmsg_noisy;
+int txmsg_redir;
+int txmsg_redir_noisy;
+
 static const struct option long_options[] = {
{"help",no_argument,NULL, 'h' },
{"cgroup",  required_argument,  NULL, 'c' },
@@ -62,6 +67,10 @@
{"iov_count",   required_argument,  NULL, 'i' },
{"length",  required_argument,  NULL, 'l' },
{"tes

[bpf-next PATCH v3 10/18] bpf: add verifier tests for BPF_PROG_TYPE_SK_MSG

2018-03-18 Thread John Fastabend

Test reads and writes for BPF_PROG_TYPE_SK_MSG.

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 tools/testing/selftests/bpf/test_verifier.c |   54 +++
 1 file changed, 54 insertions(+)

diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 86d7ff4..3e7718b 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -1597,6 +1597,60 @@ struct test_val {
.prog_type = BPF_PROG_TYPE_SK_SKB,
},
{
+   "direct packet read for SK_MSG",
+   .insns = {
+   BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1,
+   offsetof(struct sk_msg_md, data)),
+   BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_1,
+   offsetof(struct sk_msg_md, data_end)),
+   BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+   BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+   BPF_LDX_MEM(BPF_B, BPF_REG_0, BPF_REG_2, 0),
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .result = ACCEPT,
+   .prog_type = BPF_PROG_TYPE_SK_MSG,
+   },
+   {
+   "direct packet write for SK_MSG",
+   .insns = {
+   BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1,
+   offsetof(struct sk_msg_md, data)),
+   BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_1,
+   offsetof(struct sk_msg_md, data_end)),
+   BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+   BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 1),
+   BPF_STX_MEM(BPF_B, BPF_REG_2, BPF_REG_2, 0),
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .result = ACCEPT,
+   .prog_type = BPF_PROG_TYPE_SK_MSG,
+   },
+   {
+   "overlapping checks for direct packet access SK_MSG",
+   .insns = {
+   BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_1,
+   offsetof(struct sk_msg_md, data)),
+   BPF_LDX_MEM(BPF_DW, BPF_REG_3, BPF_REG_1,
+   offsetof(struct sk_msg_md, data_end)),
+   BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 8),
+   BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 4),
+   BPF_MOV64_REG(BPF_REG_1, BPF_REG_2),
+   BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6),
+   BPF_JMP_REG(BPF_JGT, BPF_REG_1, BPF_REG_3, 1),
+   BPF_LDX_MEM(BPF_H, BPF_REG_0, BPF_REG_2, 6),
+   BPF_MOV64_IMM(BPF_REG_0, 0),
+   BPF_EXIT_INSN(),
+   },
+   .result = ACCEPT,
+   .prog_type = BPF_PROG_TYPE_SK_MSG,
+   },
+   {
"check skb->mark is not writeable by sockets",
.insns = {
BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_1,

[bpf-next PATCH v3 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data

2018-03-18 Thread John Fastabend

Currently, if a bpf sk msg program is run, the program
can only parse data that the (start,end) pointers already
cover. For sendmsg hooks this is likely the first
scatterlist element. For sendpage this will be the range
(0,0) because the data is shared with userspace and, by
default, we want to avoid allowing userspace to modify
data while (or after) the BPF verdict is being decided.

To support pulling in additional bytes for parsing, use
a new helper bpf_msg_pull_data(msg, start, end, flags),
which works similarly to the tc cls logic. This helper
attempts to point the data start pointer 'start' bytes
into the msg and the data end pointer 'end' bytes into
the msg.

After basic sanity checks to ensure 'start' <= 'end' and
'end' <= msg_length, there are a few cases we need to
handle.

First, the sendmsg hook has already copied the data from
userspace and has exclusive access to it. Therefore, a
copy is not necessary for exclusivity, although one may
still be required to linearize the data. After finding
the scatterlist element containing the 'start' offset
byte, there are two cases. In the first, the range
(start,end) is entirely contained in that sg element and
is already linear, so all that is needed is to update
the data pointers; no allocation or copy is needed. In
the second, (start,end) crosses sg element boundaries.
In this case we allocate a block of size 'end - start'
and copy the data to linearize it.
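A userspace model of the lookup step may help. The model walks the sg ring from sg_start, wrapping at the ring size, finds the element that holds byte 'start', and then decides whether a copy is needed. This sketch rests on several assumptions. struct msg_ring, RING_SLOTS, find_start_elem() and pull_needs_copy() are hypothetical names, and 'shared' stands in for sg_copy[]. The ring is assumed to be non-empty, and the containment test uses the element's message offset rather than reproducing the patch's exact check.

```c
#include <assert.h>
#include <stdbool.h>

#define RING_SLOTS 8	/* hypothetical; the patch uses MAX_SKB_FRAGS */

/* Hypothetical ring model: slots [sg_start, sg_end) hold data, with wraparound. */
struct msg_ring {
	unsigned int len[RING_SLOTS];
	bool shared[RING_SLOTS];	/* models sg_copy[]: page still shared */
	int sg_start, sg_end;
};

/*
 * Return the slot holding message byte 'start' and store the message offset
 * at which that slot begins in *elem_off. Returns -1 if 'start' is past the
 * data in the ring.
 */
static int find_start_elem(const struct msg_ring *m, unsigned int start,
			   unsigned int *elem_off)
{
	unsigned int off = 0;
	int i = m->sg_start;

	do {
		if (start < off + m->len[i]) {
			*elem_off = off;
			return i;
		}
		off += m->len[i];
		i = (i + 1) % RING_SLOTS;	/* wrap like the sg ring */
	} while (i != m->sg_end);
	return -1;
}

/* -1: invalid range, 0: just move the pointers, 1: allocate and copy. */
static int pull_needs_copy(const struct msg_ring *m, unsigned int start,
			   unsigned int end)
{
	unsigned int elem_off;
	int i = find_start_elem(m, start, &elem_off);

	if (i < 0)
		return -1;
	return m->shared[i] || end > elem_off + m->len[i];
}
```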

Next, the sendpage hook has not copied any data in its
initial state, so the data pointers are (0,0). In this
case we handle it similarly to the sendmsg case above,
except that the allocation/copy must always happen. Then,
when sending the data, there are possibly three memory
regions that need to be sent: (0, start - 1), (start, end),
and (end + 1, msg_length). This is required to ensure any
writes by the BPF program are correctly transmitted.
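The region list above uses inclusive endpoints, while the helper computes bytes = end - start, which treats 'end' as exclusive. The three regions can be sketched with half-open ranges throughout, as shown below. struct range, split_regions() and region_total() are hypothetical names, and the sketch assumes start <= end <= msg_len.

```c
#include <assert.h>

/* Half-open byte range [begin, end); begin == end means empty. */
struct range {
	unsigned int begin, end;
};

/* Split a msg_len-byte message around the pulled window [start, end). */
static void split_regions(unsigned int msg_len, unsigned int start,
			  unsigned int end, struct range out[3])
{
	out[0] = (struct range){ 0, start };		/* untouched prefix */
	out[1] = (struct range){ start, end };		/* linearized window */
	out[2] = (struct range){ end, msg_len };	/* untouched suffix */
}

/* Total bytes covered by the three regions; equals msg_len. */
static unsigned int region_total(const struct range r[3])
{
	unsigned int total = 0;

	for (int i = 0; i < 3; i++)
		total += r[i].end - r[i].begin;
	return total;
}
```

With half-open ranges, the three lengths always add up to the message length, with no off-by-one adjustments at the boundaries.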

Lastly, this operation invalidates any previous data
checks, so BPF programs will have to revalidate pointers
after making this BPF call.

Signed-off-by: John Fastabend 
---
 include/uapi/linux/bpf.h |    3 +
 net/core/filter.c        |  135 ++
 2 files changed, 136 insertions(+), 2 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 1765cfb..18b7c51 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -793,7 +793,8 @@ struct bpf_stack_build_id {
FN(sock_ops_cb_flags_set),  \
FN(msg_redirect_map),   \
FN(msg_apply_bytes),\
-   FN(msg_cork_bytes),
+   FN(msg_cork_bytes), \
+   FN(msg_pull_data),
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
diff --git a/net/core/filter.c b/net/core/filter.c
index 0c9daf6..c86f03f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1956,6 +1956,136 @@ struct sock *do_msg_redirect_map(struct sk_msg_buff *msg)
.arg2_type  = ARG_ANYTHING,
 };
 
+BPF_CALL_4(bpf_msg_pull_data,
+  struct sk_msg_buff *, msg, u32, start, u32, end, u64, flags)
+{
+   unsigned int len = 0, offset = 0, copy = 0;
+   struct scatterlist *sg = msg->sg_data;
+   int first_sg, last_sg, i, shift;
+   unsigned char *p, *to, *from;
+   int bytes = end - start;
+   struct page *page;
+
+   if (unlikely(flags || end <= start))
+   return -EINVAL;
+
+   /* First find the starting scatterlist element */
+   i = msg->sg_start;
+   do {
+   len = sg[i].length;
+   offset += len;
+   if (start < offset + len)
+   break;
+   i++;
+   if (i == MAX_SKB_FRAGS)
+   i = 0;
+   } while (i != msg->sg_end);
+
+   if (unlikely(start >= offset + len))
+   return -EINVAL;
+
+   if (!msg->sg_copy[i] && bytes <= len)
+   goto out;
+
+   first_sg = i;
+
+   /* At this point we need to linearize multiple scatterlist
+* elements or a single shared page. Either way we need to
+* copy into a linear buffer exclusively owned by BPF. Then
+* place the buffer in the scatterlist and fixup the original
+* entries by removing the entries now in the linear buffer
+* and shifting the remaining entries. For now we do not try
+* to copy partial entries to avoid complexity of running out
+* of sg_entry slots. The downside is reading a single byte
+* will copy the entire sg entry.
+*/
+   do {
+   copy += sg[i].length;
+   i++;
+   if (i == MAX_SKB_FRAGS)
+   i = 0;
+   if (bytes < copy)
+   break;
+   } while (i != msg->sg_end);
+   last_sg = i;
+
+   if (unlikely(copy < end - start))
+   return -EINVAL;
+
+   page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC, get_order(copy));
+   if (unlikely(!page))
+   return -

[bpf-next PATCH v3 17/18] bpf: sockmap sample test for bpf_msg_pull_data

2018-03-18 Thread John Fastabend

This adds an option to test the msg_pull_data helper. It
uses two options, txmsg_start and txmsg_end, to let the user
specify the start and end bytes to pull.

The options can be used with the txmsg_apply and txmsg_cork
options as well as with any of the basic tests (txmsg,
txmsg_redir and txmsg_drop, plus their noisy variants) to run
pull_data inline with those tests. By giving the user direct
control over the variables, we can easily do negative as well
as positive testing.

Signed-off-by: John Fastabend 
Acked-by: David S. Miller 
---
 samples/sockmap/sockmap_kern.c            |   79 -
 samples/sockmap/sockmap_user.c            |   32 
 tools/include/uapi/linux/bpf.h            |    3 +
 tools/testing/selftests/bpf/bpf_helpers.h |    2 +
 4 files changed, 101 insertions(+), 15 deletions(-)

diff --git a/samples/sockmap/sockmap_kern.c b/samples/sockmap/sockmap_kern.c
index 8b6c34c..9ad5ba7 100644
--- a/samples/sockmap/sockmap_kern.c
+++ b/samples/sockmap/sockmap_kern.c
@@ -71,6 +71,14 @@ struct bpf_map_def SEC("maps") sock_cork_bytes = {
.max_entries = 1
 };
 
+struct bpf_map_def SEC("maps") sock_pull_bytes = {
+   .type = BPF_MAP_TYPE_ARRAY,
+   .key_size = sizeof(int),
+   .value_size = sizeof(int),
+   .max_entries = 2
+};
+
+
 SEC("sk_skb1")
 int bpf_prog1(struct __sk_buff *skb)
 {
@@ -137,7 +145,8 @@ int bpf_sockmap(struct bpf_sock_ops *skops)
 SEC("sk_msg1")
 int bpf_prog4(struct sk_msg_md *msg)
 {
-   int *bytes, zero = 0;
+   int *bytes, zero = 0, one = 1;
+   int *start, *end;
 
bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
if (bytes)
@@ -145,15 +154,18 @@ int bpf_prog4(struct sk_msg_md *msg)
bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
if (bytes)
bpf_msg_cork_bytes(msg, *bytes);
+   start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+   end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+   if (start && end)
+   bpf_msg_pull_data(msg, *start, *end, 0);
return SK_PASS;
 }
 
 SEC("sk_msg2")
 int bpf_prog5(struct sk_msg_md *msg)
 {
-   void *data_end = (void *)(long) msg->data_end;
-   void *data = (void *)(long) msg->data;
-   int *bytes, err1 = -1, err2 = -1, zero = 0;
+   int err1 = -1, err2 = -1, zero = 0, one = 1;
+   int *bytes, *start, *end, len1, len2;
 
bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
if (bytes)
@@ -161,17 +173,32 @@ int bpf_prog5(struct sk_msg_md *msg)
bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
if (bytes)
err2 = bpf_msg_cork_bytes(msg, *bytes);
+   len1 = (__u64)msg->data_end - (__u64)msg->data;
+   start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+   end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+   if (start && end) {
+   int err;
+
+   bpf_printk("sk_msg2: pull(%i:%i)\n",
+  start ? *start : 0, end ? *end : 0);
+   err = bpf_msg_pull_data(msg, *start, *end, 0);
+   if (err)
+   bpf_printk("sk_msg2: pull_data err %i\n",
+  err);
+   len2 = (__u64)msg->data_end - (__u64)msg->data;
+   bpf_printk("sk_msg2: length update %i->%i\n",
+  len1, len2);
+   }
bpf_printk("sk_msg2: data length %i err1 %i err2 %i\n",
-  (__u64)data_end - (__u64)data, err1, err2);
+  len1, err1, err2);
return SK_PASS;
 }
 
 SEC("sk_msg3")
 int bpf_prog6(struct sk_msg_md *msg)
 {
-   void *data_end = (void *)(long) msg->data_end;
-   void *data = (void *)(long) msg->data;
-   int *bytes, zero = 0;
+   int *bytes, zero = 0, one = 1;
+   int *start, *end;
 
bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
if (bytes)
@@ -179,15 +206,18 @@ int bpf_prog6(struct sk_msg_md *msg)
bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
if (bytes)
bpf_msg_cork_bytes(msg, *bytes);
+   start = bpf_map_lookup_elem(&sock_pull_bytes, &zero);
+   end = bpf_map_lookup_elem(&sock_pull_bytes, &one);
+   if (start && end)
+   bpf_msg_pull_data(msg, *start, *end, 0);
return bpf_msg_redirect_map(msg, &sock_map_redir, zero, 0);
 }
 
 SEC("sk_msg4")
 int bpf_prog7(struct sk_msg_md *msg)
 {
-   void *data_end = (void *)(long) msg->data_end;
-   void *data = (void *)(long) msg->data;
-   int *bytes, err1 = 0, err2 = 0, zero = 0;
+   int err1 = 0, err2 = 0, zero = 0, one = 1;
+   int *bytes, *start, *end, len1, len2;
 
bytes = bpf_map_lookup_elem(&sock_apply_bytes, &zero);
if (bytes)
@@ -195,9 +225,24 @@ int bpf_prog7(struct sk_msg_md *msg)
bytes = bpf_map_lookup_elem(&sock_cork_bytes, &zero);
if (bytes)
err2 = bpf_msg_cork_bytes(msg, *bytes);
-
+   len1 = (__u64)msg

Re: [PATCH 1/2] brcmfmac: add new dt entries for SG SDIO settings

2018-03-18 Thread Andrew Lunn

> + if (of_property_read_u16(np, "brcm,sd-head-align", &align) == 0)
> + sdio->sd_head_align = align;

Hi Alexey

I think you can make this:

of_property_read_u16(np, "brcm,sd-head-align", &sdio->sd_head_align);

of_property_read_u16() should not touch the destination variable if
the property does not exist, or if there is an error.

Andrew
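The suggestion relies on the error-path contract described above: the destination is written only on success, so a default stored beforehand survives a missing property. The mock below illustrates that contract in userspace. mock_read_u16() and struct mock_prop are hypothetical. They only cover the missing-property case, not the full set of of_property_read_u16() return codes.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a device-tree property. */
struct mock_prop {
	const char *name;
	uint16_t value;
};

/* Write *out only when the property is found; otherwise leave it alone. */
static int mock_read_u16(const struct mock_prop *props, size_t n,
			 const char *name, uint16_t *out)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(props[i].name, name) == 0) {
			*out = props[i].value;
			return 0;
		}
	}
	return -EINVAL;		/* missing: *out keeps its previous value */
}
```

Passing &sdio->sd_head_align directly, as suggested, only type-checks cleanly if that field is itself a u16.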

Re: [bpf-next PATCH v3 05/18] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data

2018-03-18 Thread David Miller

From: John Fastabend 
Date: Sun, 18 Mar 2018 12:57:10 -0700

> This implements a BPF ULP layer to allow policy enforcement and
> monitoring at the socket layer.
 ...
> Signed-off-by: John Fastabend 

Acked-by: David S. Miller

Re: [bpf-next PATCH v3 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data

2018-03-18 Thread David Miller

From: John Fastabend 
Date: Sun, 18 Mar 2018 12:57:25 -0700

> Currently, if a bpf sk msg program is run the program
> can only parse data that the (start,end) pointers already
> consumed. For sendmsg hooks this is likely the first
> scatterlist element. For sendpage this will be the range
> (0,0) because the data is shared with userspace and by
> default we want to avoid allowing userspace to modify
> data while (or after) BPF verdict is being decided.
> 
> To support pulling in additional bytes for parsing use
> a new helper bpf_sk_msg_pull(start, end, flags) which
> works similar to cls tc logic. This helper will attempt
> to point the data start pointer at 'start' bytes offset
> into msg and data end pointer at 'end' bytes offset into
> message.
 ...
> Signed-off-by: John Fastabend 

Acked-by: David S. Miller

Re: [bpf-next PATCH v3 07/18] bpf: sockmap, add msg_cork_bytes() helper

2018-03-18 Thread David Miller

From: John Fastabend 
Date: Sun, 18 Mar 2018 12:57:20 -0700

> In the case where we need a specific number of bytes before a
> verdict can be assigned, even if the data spans multiple sendmsg
> or sendfile calls, the BPF program may use msg_cork_bytes().
> 
> In the extreme case, a user can call sendmsg repeatedly with
> 1-byte msg segments. Obviously, this is bad for performance but
> is still valid. If the BPF program needs N bytes to validate
> a header, it can use msg_cork_bytes to specify N bytes and the
> BPF program will not be called again until N bytes have been
> accumulated. The infrastructure will attempt to coalesce data
> if possible, so in many cases (most of my use cases at least) the
> data will be in a single scatterlist element with data pointers
> pointing to the start/end of the element. However, this depends
> on available memory, so it is not guaranteed. So BPF programs must
> validate data pointer ranges, but they have to do this anyway to
> convince the verifier the accesses are valid.
> 
> Signed-off-by: John Fastabend 

Acked-by: David S. Miller
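The corking behavior described above can be modeled as a byte counter that suppresses program runs until enough data has accumulated. This is a simplified sketch. struct cork_model and cork_model_send() are hypothetical, and 'need' stands for the value a previous program run passed to msg_cork_bytes(). The model ignores data coalescing and memory limits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: hold data until 'need' bytes are queued. */
struct cork_model {
	size_t need;	/* bytes requested via the cork helper; 0 = not corked */
	size_t queued;	/* bytes accumulated so far */
	int verdicts;	/* number of times the modeled program ran */
};

/* Feed one sendmsg worth of bytes; returns 1 if the program ran. */
static int cork_model_send(struct cork_model *c, size_t len)
{
	c->queued += len;
	if (c->need && c->queued < c->need)
		return 0;	/* keep accumulating, no program run */
	c->verdicts++;
	c->queued = 0;		/* data handed on after the verdict */
	c->need = 0;		/* cork satisfied */
	return 1;
}
```

Sending 1-byte segments with need = 4 gives three silent sends followed by a single program run on the fourth.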

Re: [bpf-next PATCH v3 06/18] bpf: sockmap, add bpf_msg_apply_bytes() helper

2018-03-18 Thread David Miller

From: John Fastabend 
Date: Sun, 18 Mar 2018 12:57:15 -0700

> A single sendmsg or sendfile system call can contain multiple logical
> messages that a BPF program may want to read and apply a verdict to. But,
> without an apply_bytes helper, any verdict on the data applies to all
> bytes in the sendmsg/sendfile. Alternatively, a BPF program may only
> care to read the first N bytes of a msg. If the payload is large, say
> MBs or even GBs, setting up and calling the BPF program repeatedly for
> all bytes, even though the verdict is already known, creates
> unnecessary overhead.
> 
> To allow BPF programs to control how many bytes a given verdict
> applies to, we implement a bpf_msg_apply_bytes() helper. When called
> from within a BPF program, this sets a counter, internal to the
> BPF infrastructure, that applies the last verdict to the next N
> bytes. If N is smaller than the current data being processed
> from a sendmsg/sendfile call, the first N bytes will be sent and
> the BPF program will be re-run with start_data pointing to the N+1
> byte. If N is larger than the current data being processed, the
> BPF verdict will be applied to multiple sendmsg/sendfile calls
> until N bytes are consumed.
> 
> Note: if a socket closes with a non-zero apply_bytes counter, this
> is not a problem because data is not being buffered for N bytes
> and is sent as it is received.
> 
> Signed-off-by: John Fastabend 

Acked-by: David S. Miller
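Likewise, the apply_bytes counter can be sketched in userspace to show how often the program would run. struct apply_model and apply_model_send() are hypothetical. 'apply_bytes' stands for the value the modeled program passes to bpf_msg_apply_bytes() on each run, and 0 means the verdict covers only the rest of the current send.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the apply_bytes counter. */
struct apply_model {
	size_t apply_bytes;	/* N the modeled program requests each run */
	size_t apply_left;	/* bytes the cached verdict still covers */
	int runs;		/* number of program runs */
};

/* Process one sendmsg/sendfile of 'len' bytes. */
static void apply_model_send(struct apply_model *a, size_t len)
{
	while (len) {
		if (!a->apply_left) {
			a->runs++;	/* program runs and sets a new verdict */
			a->apply_left = a->apply_bytes ? a->apply_bytes : len;
		}
		size_t n = len < a->apply_left ? len : a->apply_left;

		a->apply_left -= n;	/* n bytes take the cached verdict */
		len -= n;
	}
}
```

A small N re-runs the program within one send. A large N lets one verdict span several sends, which matches the two cases described in the patch.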

Re: [PATCH net] net: fec: Fix unbalanced PM runtime calls

2018-03-18 Thread David Miller

From: Florian Fainelli 
Date: Sun, 18 Mar 2018 12:49:51 -0700

> When unbinding/removing the driver, we will run into the following warnings:
> 
> [  259.655198] fec 400d1000.ethernet: 400d1000.ethernet supply phy not found, using dummy regulator
> [  259.665065] fec 400d1000.ethernet: Unbalanced pm_runtime_enable!
> [  259.672770] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Invalid MAC address: 00:00:00:00:00:00
> [  259.683062] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Using random MAC address: f2:3e:93:b7:29:c1
> [  259.696239] libphy: fec_enet_mii_bus: probed
> 
> Avoid these warnings by balancing the runtime PM calls during fec_drv_remove().
> 
> Signed-off-by: Florian Fainelli 

Applied, thank you.

Queue this up for -stable?

Re: [PATCH net-next] net: dsa: mv88e6xxx: Fix missing register lock in serdes_get_stats

2018-03-18 Thread David Miller

From: Florian Fainelli 
Date: Sun, 18 Mar 2018 11:23:05 -0700

> We can hit the register lock not held assertion with the following path:
...
> mv88e6xxx_get_ethtool_stats() calls mv88e6xxx_get_stats() which calls both
> chip->info->ops->stats_get_stats(), which holds the register lock, and
> chip->info->ops->serdes_get_stats(), which does not. Run
> chip->info->ops->serdes_get_stats() with the register lock held to
> avoid such assertions.
> 
> Fixes: 436fe17d273b ("net: dsa: mv88e6xxx: Allow the SERDES interfaces to have statistics")
> Signed-off-by: Florian Fainelli 

Applied, thanks Florian.
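The locking rule in this fix can be illustrated with a single-threaded model. A flag stands in for the register lock, and register accessors count any calls made without it. All names are hypothetical. Taking the lock once around both callbacks is just one possible arrangement and is not necessarily how the applied patch does it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of the chip register lock. */
struct chip_model {
	bool reg_locked;
	int assert_failures;	/* accesses made without the lock */
};

static void reg_lock(struct chip_model *c)   { c->reg_locked = true; }
static void reg_unlock(struct chip_model *c) { c->reg_locked = false; }

/* Stand-in for the "register lock not held" assertion. */
static void reg_access(struct chip_model *c)
{
	if (!c->reg_locked)
		c->assert_failures++;
}

static void stats_get_stats(struct chip_model *c)  { reg_access(c); }
static void serdes_get_stats(struct chip_model *c) { reg_access(c); }

/* Both callbacks touch registers, so both run inside the locked section. */
static void get_stats(struct chip_model *c)
{
	reg_lock(c);
	stats_get_stats(c);
	serdes_get_stats(c);
	reg_unlock(c);
}
```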

Re: [rds-devel] [PATCH RFC RFC] rds: Use NETDEV_UNREGISTER in rds_tcp_dev_event() (then kill NETDEV_UNREGISTER_FINAL)

2018-03-18 Thread Sowmini Varadhan

On (03/18/18 00:55), Kirill Tkhai wrote:
> 
> I just want to make rds not use NETDEV_UNREGISTER_FINAL. If there is
> another solution to do that, I'm not against that.

The patch below takes care of this. I've done some preliminary testing,
and I'll send it upstream after doing additional self-review/testing.
Please also take a look, if you can, to see if I missed something.

Thanks for the input,

--Sowmini
---patch follows

diff --git a/net/rds/tcp.c b/net/rds/tcp.c
index 08ea9cd..87c2643 100644
--- a/net/rds/tcp.c
+++ b/net/rds/tcp.c
@@ -485,40 +485,6 @@ static __net_init int rds_tcp_init_net(struct net *net)
return err;
 }
 
-static void __net_exit rds_tcp_exit_net(struct net *net)
-{
-   struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
-
-   if (rtn->rds_tcp_sysctl)
-   unregister_net_sysctl_table(rtn->rds_tcp_sysctl);
-
-   if (net != &init_net && rtn->ctl_table)
-   kfree(rtn->ctl_table);
-
-   /* If rds_tcp_exit_net() is called as a result of netns deletion,
-* the rds_tcp_kill_sock() device notifier would already have cleaned
-* up the listen socket, thus there is no work to do in this function.
-*
-* If rds_tcp_exit_net() is called as a result of module unload,
-* i.e., due to rds_tcp_exit() -> unregister_pernet_subsys(), then
-* we do need to clean up the listen socket here.
-*/
-   if (rtn->rds_tcp_listen_sock) {
-   struct socket *lsock = rtn->rds_tcp_listen_sock;
-
-   rtn->rds_tcp_listen_sock = NULL;
-   rds_tcp_listen_stop(lsock, &rtn->rds_tcp_accept_w);
-   }
-}
-
-static struct pernet_operations rds_tcp_net_ops = {
-   .init = rds_tcp_init_net,
-   .exit = rds_tcp_exit_net,
-   .id = &rds_tcp_netid,
-   .size = sizeof(struct rds_tcp_net),
-   .async = true,
-};
-
 static void rds_tcp_kill_sock(struct net *net)
 {
struct rds_tcp_connection *tc, *_tc;
@@ -546,40 +512,38 @@ static void rds_tcp_kill_sock(struct net *net)
rds_conn_destroy(tc->t_cpath->cp_conn);
 }
 
-void *rds_tcp_listen_sock_def_readable(struct net *net)
+static void __net_exit rds_tcp_exit_net(struct net *net)
 {
struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
-   struct socket *lsock = rtn->rds_tcp_listen_sock;
 
-   if (!lsock)
-   return NULL;
+   rds_tcp_kill_sock(net);
 
-   return lsock->sk->sk_user_data;
+   if (rtn->rds_tcp_sysctl)
+   unregister_net_sysctl_table(rtn->rds_tcp_sysctl);
+
+   if (net != &init_net && rtn->ctl_table)
+   kfree(rtn->ctl_table);
 }
 
-static int rds_tcp_dev_event(struct notifier_block *this,
-unsigned long event, void *ptr)
+static struct pernet_operations rds_tcp_net_ops = {
+   .init = rds_tcp_init_net,
+   .exit = rds_tcp_exit_net,
+   .id = &rds_tcp_netid,
+   .size = sizeof(struct rds_tcp_net),
+   .async = true,
+};
+
+void *rds_tcp_listen_sock_def_readable(struct net *net)
 {
-   struct net_device *dev = netdev_notifier_info_to_dev(ptr);
+   struct rds_tcp_net *rtn = net_generic(net, rds_tcp_netid);
+   struct socket *lsock = rtn->rds_tcp_listen_sock;
 
-   /* rds-tcp registers as a pernet subys, so the ->exit will only
-* get invoked after network acitivity has quiesced. We need to
-* clean up all sockets  to quiesce network activity, and use
-* the unregistration of the per-net loopback device as a trigger
-* to start that cleanup.
-*/
-   if (event == NETDEV_UNREGISTER_FINAL &&
-   dev->ifindex == LOOPBACK_IFINDEX)
-   rds_tcp_kill_sock(dev_net(dev));
+   if (!lsock)
+   return NULL;
 
-   return NOTIFY_DONE;
+   return lsock->sk->sk_user_data;
 }
 
-static struct notifier_block rds_tcp_dev_notifier = {
-   .notifier_call= rds_tcp_dev_event,
-   .priority = -10, /* must be called after other network notifiers */
-};
-
 /* when sysctl is used to modify some kernel socket parameters,this
  * function  resets the RDS connections in that netns  so that we can
  * restart with new parameters.  The assumption is that such reset
@@ -625,9 +589,7 @@ static void rds_tcp_exit(void)
rds_tcp_set_unloading();
synchronize_rcu();
rds_info_deregister_func(RDS_INFO_TCP_SOCKETS, rds_tcp_tc_info);
-   unregister_pernet_subsys(&rds_tcp_net_ops);
-   if (unregister_netdevice_notifier(&rds_tcp_dev_notifier))
-   pr_warn("could not unregister rds_tcp_dev_notifier\n");
+   unregister_pernet_device(&rds_tcp_net_ops);
rds_tcp_destroy_conns();
rds_trans_unregister(&rds_tcp_transport);
rds_tcp_recv_exit();
@@ -651,24 +613,17 @@ static int rds_tcp_init(void)
if (ret)
goto out_slab;
 
-   ret = register_pernet_s

Re: [PATCH net-next 0/3] Automatic PHY interrupts

2018-03-18 Thread David Miller

From: Andrew Lunn 
Date: Sat, 17 Mar 2018 20:32:02 +0100

> Now that the mv88e6xxx driver either installs an interrupt handler, or
> polls for interrupts, it is possible to always handle PHY interrupts,
> rather than have phylib perform the polling. This speeds up detection
> of link changes and reduces the load on the MDIO bus, which is
> beneficial for PTP.

Series applied, thanks Andrew.

[PATCH net-next] selftests: pmtu: Drop prints to kernel log from pmtu_vti6_link_change_mtu

2018-03-18 Thread Stefano Brivio

Reported-by: David Ahern 
Fixes: 1fad59ea1c34 ("selftests: pmtu: Add pmtu_vti6_link_change_mtu test")
Signed-off-by: Stefano Brivio 
---
 tools/testing/selftests/net/pmtu.sh | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/testing/selftests/net/pmtu.sh b/tools/testing/selftests/net/pmtu.sh
index 92197c05bac4..1e428781a625 100755
--- a/tools/testing/selftests/net/pmtu.sh
+++ b/tools/testing/selftests/net/pmtu.sh
@@ -421,7 +421,6 @@ test_pmtu_vti6_link_change_mtu() {
 
# Move to another device with different MTU, without passing MTU, check
# MTU is adjusted
-   echo "${ns_a} ip link set vti6_a type vti6 remote ${dummy6_1_addr} local ${dummy6_1_addr}" > /dev/kmsg
${ns_a} ip link set vti6_a type vti6 remote ${dummy6_1_addr} local ${dummy6_1_addr}
mtu="$(link_get_mtu "${ns_a}" vti6_a)"
if [ ${mtu} -ne $((3000 - 40)) ]; then
@@ -430,7 +429,6 @@ test_pmtu_vti6_link_change_mtu() {
fi
 
# Move it back, passing MTU, check MTU is not overridden
-   echo "${ns_a} ip link set vti6_a mtu 1280 type vti6 remote ${dummy6_0_addr} local ${dummy6_0_addr}" > /dev/kmsg
${ns_a} ip link set vti6_a mtu 1280 type vti6 remote ${dummy6_0_addr} local ${dummy6_0_addr}
mtu="$(link_get_mtu "${ns_a}" vti6_a)"
if [ ${mtu} -ne 1280 ]; then
-- 
2.15.1

Re: [PATCH net-next] selftests: pmtu: Drop prints to kernel log from pmtu_vti6_link_change_mtu

2018-03-18 Thread David Miller

From: Stefano Brivio 
Date: Sun, 18 Mar 2018 21:58:12 +0100

> Reported-by: David Ahern 
> Fixes: 1fad59ea1c34 ("selftests: pmtu: Add pmtu_vti6_link_change_mtu test")
> Signed-off-by: Stefano Brivio 

Applied, thanks Stefano.

Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-18 Thread Rasmus Villemoes

On 2018-03-17 19:52, Linus Torvalds wrote:
> On Sat, Mar 17, 2018 at 12:27 AM, Kees Cook  wrote:
>>
>> Unfortunately my 4.4 test fails quickly:
>>
>> ./include/linux/jiffies.h: In function ‘jiffies_delta_to_clock_t’:
>> ./include/linux/jiffies.h:444: error: first argument to
>> ‘__builtin_choose_expr’ not a constant
> 
> Ok, so it really looks like that same "__builtin_constant_p() doesn't
> return a constant".
> 
> Which is really odd, but there you have it.

Not really. We do rely on builtin_constant_p not being folded too
quickly to a 0/1 answer, so that gcc still generates good code even if
the argument is only known to be constant at a late(r) optimization
stage (through inlining and all). So unlike types_compatible_p, which
can obviously be answered early during parsing, builtin_constant_p is
most of the time a yes/no/maybe/it's complicated thing. Sure, when the
argument is just a literal or perhaps even any kind of ICE, gcc can fold
it to "yes", and I think it does (though the details of when and if gcc
does that can obviously be very version-dependent, which may be some of
what we've seen). But when it's not that obvious, gcc leaves it in the
undetermined state. That's not good enough for builtin_choose_expr,
because even the type of the resulting expression depends on that first
argument, so that really must be resolved early.

So to have some kind of builtin_constant_p control a
builtin_choose_expr, it would need to be a "builtin_ice_p" or
"builtin_obviously_constant_p" that would always be folded to 0/1 as
part of evaluating ICEs.

So I don't think there's any way around creating a separate macro for
use with compile-time constants.
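The constraint can be shown with a user-space sketch, assuming GCC or Clang with GNU extensions. It avoids __builtin_constant_p() entirely. The is_constexpr() helper is the null-pointer-constant trick that the kernel later adopted as __is_constexpr(); it is not proposed in this message. Every name below is hypothetical. The point is that __builtin_choose_expr() only accepts a condition resolved at parse time, and a genuine ICE test supplies one.

```c
#include <assert.h>

/*
 * 1 if x is an integer constant expression (ICE), 0 otherwise.
 * For an ICE, (void *)((long)(x) * 0l) is a null pointer constant, so the
 * conditional has type int * and sizeof(*...) == sizeof(int). Otherwise it
 * has type void *, and GNU C defines sizeof(void) as 1. sizeof does not
 * evaluate its operand, so side effects in x never run here.
 */
#define is_constexpr(x) \
	(sizeof(int) == sizeof(*(8 ? ((void *)((long)(x) * 0l)) : (int *)8)))

/* Expands each argument twice, but folds to an ICE for constant inputs. */
#define trivial_max(x, y) ((x) > (y) ? (x) : (y))

/* Out-of-line fallback: each argument is evaluated exactly once. */
static inline int int_max(int x, int y)
{
	return x > y ? x : y;
}

/* The condition is itself an ICE, as __builtin_choose_expr() requires. */
#define safe_max(x, y)                                            \
	__builtin_choose_expr(is_constexpr(x) && is_constexpr(y), \
			      trivial_max(x, y), int_max(x, y))

/* File-scope array bounds must be ICEs, so this compiles without a VLA. */
static int table[safe_max(3, 8)];

void safe_max_demo(void)
{
	int n = 2;
	int r;

	assert(sizeof(table) / sizeof(table[0]) == 8);
	assert(is_constexpr(42) && !is_constexpr(n));

	/* n is not constant, so int_max() is chosen and n++ runs once. */
	r = safe_max(n++, 1);
	assert(r == 2 && n == 3);
}
```

With constant arguments, safe_max() still folds to an ICE and can size an array. With a non-constant argument, it falls back to a function call that evaluates each argument exactly once.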

Rasmus

Re: [PATCH 1/5] mtd: Initialize ->fail_addr early in mtd_erase()

2018-03-18 Thread Boris Brezillon

On Mon, 12 Feb 2018 22:03:07 +0100
Boris Brezillon  wrote:

> mtd_erase() can return an error before ->fail_addr is initialized to
> MTD_FAIL_ADDR_UNKNOWN. Move this initialization at the very beginning
> of the function.

Applied the patchset after addressing Miquel's comments.

> 
> Signed-off-by: Boris Brezillon 
> ---
>  drivers/mtd/mtdcore.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
> index a1c94526fb88..c87859ff338b 100644
> --- a/drivers/mtd/mtdcore.c
> +++ b/drivers/mtd/mtdcore.c
> @@ -953,6 +953,8 @@ EXPORT_SYMBOL_GPL(__put_mtd_device);
>   */
>  int mtd_erase(struct mtd_info *mtd, struct erase_info *instr)
>  {
> + instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN;
> +
>   if (!mtd->erasesize || !mtd->_erase)
>   return -ENOTSUPP;
>  
> @@ -961,7 +963,6 @@ int mtd_erase(struct mtd_info *mtd, struct erase_info *instr)
>   if (!(mtd->flags & MTD_WRITEABLE))
>   return -EROFS;
>  
> - instr->fail_addr = MTD_FAIL_ADDR_UNKNOWN;
>   if (!instr->len) {
>   instr->state = MTD_ERASE_DONE;
>   mtd_erase_callback(instr);



-- 
Boris Brezillon, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com
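The MTD patch restores a simple invariant: an output field that the caller may inspect after a failure has to be set before the first early return. Here is a minimal user-space sketch of it. erase_model(), struct erase_req and FAIL_ADDR_UNKNOWN are hypothetical stand-ins for mtd_erase(), struct erase_info and MTD_FAIL_ADDR_UNKNOWN.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for MTD_FAIL_ADDR_UNKNOWN. */
#define FAIL_ADDR_UNKNOWN (-1LL)

struct erase_req {
	long long fail_addr;      /* where the erase failed, if known */
	unsigned long long len;
};

struct dev_model {
	unsigned int erasesize;
	int writeable;
};

static int erase_model(const struct dev_model *d, struct erase_req *req)
{
	/* Set the output field first, so every return path leaves it defined. */
	req->fail_addr = FAIL_ADDR_UNKNOWN;

	if (!d->erasesize)
		return -EOPNOTSUPP;
	if (!d->writeable)
		return -EROFS;
	if (!req->len)
		return 0;

	/* A real driver would erase here and record fail_addr on error. */
	return 0;
}

void erase_demo(void)
{
	struct dev_model no_erase = { .erasesize = 0, .writeable = 1 };
	struct dev_model read_only = { .erasesize = 4096, .writeable = 0 };
	struct erase_req req = { .fail_addr = 0x1234, .len = 4096 };

	assert(erase_model(&no_erase, &req) == -EOPNOTSUPP);
	assert(req.fail_addr == FAIL_ADDR_UNKNOWN);

	req.fail_addr = 0x1234;
	assert(erase_model(&read_only, &req) == -EROFS);
	assert(req.fail_addr == FAIL_ADDR_UNKNOWN);
}
```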

Re: [PATCH net-next 02/12] clk: sunxi-ng: r40: export a regmap to access the GMAC register

2018-03-18 Thread Maxime Ripard

On Sat, Mar 17, 2018 at 05:28:47PM +0800, Chen-Yu Tsai wrote:
> From: Icenowy Zheng 
> 
> On the R40 SoC, the GMAC configuration register, which lives in the
> syscon part on A64/A83T/H3/H5, is located in the CCU instead.
>
> Export a regmap of the CCU.
>
> Read access is not restricted, but only the GMAC register is allowed
> to be written.
> 
> Signed-off-by: Icenowy Zheng 
> Signed-off-by: Chen-Yu Tsai 

Gah, this is crazy. I'm really starting to regret letting that syscon
in in the first place...

And I'm not really looking forward to the time when SCPI et al. will be
mature and we'll have the clock controller completely outside of our
control.

Maxime

-- 
Maxime Ripard, Bootlin (formerly Free Electrons)
Embedded Linux and Kernel engineering
https://bootlin.com


signature.asc
Description: PGP signature

Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-18 Thread Linus Torvalds

On Sun, Mar 18, 2018 at 2:13 PM, Rasmus Villemoes
 wrote:
> On 2018-03-17 19:52, Linus Torvalds wrote:
>>
>> Ok, so it really looks like that same "__builtin_constant_p() doesn't
>> return a constant".
>>
>> Which is really odd, but there you have it.
>
> Not really. We do rely on builtin_constant_p not being folded too
> quickly to a 0/1 answer, so that gcc still generates good code even if
> the argument is only known to be constant at a late(r) optimization
> stage (through inlining and all).

Hmm. That makes sense. It just doesn't work for our case when we
really want to choose the expression based on side effects or not.

> So unlike types_compatible_p, which
> can obviously be answered early during parsing, builtin_constant_p is
> most of the time a yes/no/maybe/it's complicated thing.

The silly thing is, the thing we *really* want to know _is_ actually
easily accessible during the early parsing, exactly like
__builtin_compatible_p(): it's not really that we care about the
expressions being constant, as much as the "can this have side
effects" question.

We only really use __builtin_constant_p() as a (bad) approximation of
that in this case, since it's the best we can do.

So the real use would be to say "can we use the simple direct macro
that just happens to also fold into a nice integer constant
expression" for __builtin_choose_expr().

I tried to find something like that, but it really doesn't exist, even
though I would actually have expected it to be a somewhat common
concern for macro writers: write a macro that works in any arbitrary
environment.

I guess array sizes are really the only true limiting environment
(static initializers is another one).

How annoying. Odd that newer gcc's seem to do so much better (ie gcc-5
seems to be fine). So _something_ must have changed.

 Linus

Re: [PATCH net] net: fec: Fix unbalanced PM runtime calls

2018-03-18 Thread Florian Fainelli



On 03/18/2018 01:35 PM, David Miller wrote:
> From: Florian Fainelli 
> Date: Sun, 18 Mar 2018 12:49:51 -0700
> 
>> When unbinding/removing the driver, we will run into the following warnings:
>>
>> [  259.655198] fec 400d1000.ethernet: 400d1000.ethernet supply phy not found, using dummy regulator
>> [  259.665065] fec 400d1000.ethernet: Unbalanced pm_runtime_enable!
>> [  259.672770] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Invalid MAC address: 00:00:00:00:00:00
>> [  259.683062] fec 400d1000.ethernet (unnamed net_device) (uninitialized): Using random MAC address: f2:3e:93:b7:29:c1
>> [  259.696239] libphy: fec_enet_mii_bus: probed
>>
>> Avoid these warnings by balancing the runtime PM calls during fec_drv_remove().
>>
>> Signed-off-by: Florian Fainelli 
> 
> Applied, thank you.
> 
> Queue this up for -stable?

I would be inclined to say yes, it was not exactly easy to track down a
set of commits, since this was changed quite a bit.
-- 
Florian
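The following toy model shows why the "Unbalanced pm_runtime_enable!" warning appears on rebind. It assumes that pm_runtime_enable() warns when the device's runtime-PM disable depth is already zero, and that the depth starts at 1. The fec patch body is not quoted above, so this does not show which calls the patch adds to fec_drv_remove(). struct pm_model and rebind_cycle() are hypothetical.

```c
#include <assert.h>

/* Models the runtime-PM disable depth; runtime PM starts disabled once. */
struct pm_model {
	int disable_depth;
	int unbalanced_warnings;
};

static void pm_enable_model(struct pm_model *pm)
{
	if (pm->disable_depth > 0)
		pm->disable_depth--;
	else
		pm->unbalanced_warnings++; /* "Unbalanced pm_runtime_enable!" */
}

static void pm_disable_model(struct pm_model *pm)
{
	pm->disable_depth++;
}

/* One probe/remove/probe cycle; balanced selects whether remove()
 * undoes the enable done in probe(). Returns the warning count. */
int rebind_cycle(int balanced)
{
	struct pm_model pm = { .disable_depth = 1, .unbalanced_warnings = 0 };

	pm_enable_model(&pm);          /* first probe */
	if (balanced)
		pm_disable_model(&pm); /* remove */
	pm_enable_model(&pm);          /* second probe after rebind */
	return pm.unbalanced_warnings;
}
```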

Re: [PATCH net-next 2/5] cxgb4: Add support to initialise/read SRQ entries

2018-03-18 Thread Stefano Brivio

On Sat, 17 Mar 2018 12:52:26 +0530
Raju Rangoju  wrote:

> +struct srq_data *t4_init_srq(int srq_size)
> +{
> + struct srq_data *s;
> +
> + s = kzalloc(sizeof(*s), GFP_KERNEL | __GFP_NOWARN);
> + if (!s)
> + s = vzalloc(sizeof(*s));
> + if (!s)
> + return NULL;

I guess you could use kvzalloc() here.

> [...]
>
> +++ b/drivers/net/ethernet/chelsio/cxgb4/srq.h
>
> [...]
>
> +enum {
> + SRQ_WAIT_TO = (HZ * 5),
> +};

Why not #define? Am I missing something?

-- 
Stefano

Re: [PATCH v11 crypto 06/12] crypto: chtls - structure and macro for Inline TLS

2018-03-18 Thread Sabrina Dubroca

2018-03-16, 21:07:35 +0530, Atul Gupta wrote:
[...]
> +#define SOCK_INLINE (31)
[...]

> +static inline int csk_flag(const struct sock *sk, enum csk_flags flag)
> +{
> + struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
> +
> + if (!sock_flag(sk, SOCK_INLINE))
> + return 0;
> + return test_bit(flag, &csk->flags);
> +}

Should drivers really start defining their own socket flags?


> +static inline void set_queue(struct sk_buff *skb,
> +  unsigned int queue, const struct sock *sk)
> +{
> + skb->queue_mapping = queue;
> +}

That's skb_set_queue_mapping(), no need to define your own.

-- 
Sabrina

Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-18 Thread Rasmus Villemoes

On 2018-03-18 22:33, Linus Torvalds wrote:
> On Sun, Mar 18, 2018 at 2:13 PM, Rasmus Villemoes
>  wrote:
>> On 2018-03-17 19:52, Linus Torvalds wrote:
>>>
>>> Ok, so it really looks like that same "__builtin_constant_p() doesn't
>>> return a constant".
>>>
>>> Which is really odd, but there you have it.
>>
>> Not really. We do rely on builtin_constant_p not being folded too
>> quickly to a 0/1 answer, so that gcc still generates good code even if
>> the argument is only known to be constant at a late(r) optimization
>> stage (through inlining and all).
> 
> Hmm. That makes sense. It just doesn't work for our case when we
> really want to choose the expression based on side effects or not.
> 
>> So unlike types_compatible_p, which
>> can obviously be answered early during parsing, builtin_constant_p is
>> most of the time a yes/no/maybe/it's complicated thing.
> 
> The silly thing is, the thing we *really* want to know _is_ actually
> easily accessible during the early parsing, exactly like
> __builtin_compatible_p(): it's not really that we care about the
> expressions being constant, as much as the "can this have side
> effects" question.

OK, I missed where this was made about side effects of x and y, but I
suppose the idea was to use

  no_side_effects(x) && no_side_effects(y) ? trivial_max(x, y) :
old_max(x, y)

or the same thing spelled with b_c_e? Yes, I think that would work, if
we indeed had that way of checking an expression.

> We only really use __builtin_constant_p() as a (bad) approximation of
> that in this case, since it's the best we can do.

I don't think you should parenthesize bad, rather capitalize it. ({x++;
0;}) is constant in the eyes of __builtin_constant_p, but not
side-effect free. Sure, that's very contrived, but I can easily imagine
some max(f(foo), -1) call where f is sometimes an external function, but
for other .configs it's a static inline that always returns 0, but still
has some non-trivial side-effect before that. And this would all depend
on which optimizations gcc applies before it decides to evaluate
builtin_constant_p, so could be some fun debugging. Good thing that that
didn't work out...

> So the real use would be to say "can we use the simple direct macro
> that just happens to also fold into a nice integer constant
> expression" for __builtin_choose_expr().
> 
> I tried to find something like that, but it really doesn't exist, even
> though I would actually have expected it to be a somewhat common
> concern for macro writers: write a macro that works in any arbitrary
> environment.

Yeah, I think the closest is a five year old suggestion
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57612) to add a
__builtin_assert_no_side_effects, that could be used in macros that for
some reason cannot be implemented without evaluating some argument
multiple times. It would be more useful to have the predicate version,
which one could always turn into a build bug version. But we have
neither, unfortunately.

Rasmus

[PATCH v2] net: ethernet: arc: Fix a potential memory leak if an optional regulator is deferred

2018-03-18 Thread Christophe JAILLET

If the optional regulator is deferred, we must release some resources.
They will be re-allocated when the probe function is called again.

Fixes: 6eacf31139bf ("ethernet: arc: Add support for Rockchip SoC layer device tree bindings")
Signed-off-by: Christophe JAILLET 
---
v2: v1 did not compile because of an erroneous variable name. s/ret/err/
---
 drivers/net/ethernet/arc/emac_rockchip.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/arc/emac_rockchip.c b/drivers/net/ethernet/arc/emac_rockchip.c
index 16f9bee992fe..8ee9dfd0e363 100644
--- a/drivers/net/ethernet/arc/emac_rockchip.c
+++ b/drivers/net/ethernet/arc/emac_rockchip.c
@@ -169,8 +169,10 @@ static int emac_rockchip_probe(struct platform_device *pdev)
/* Optional regulator for PHY */
priv->regulator = devm_regulator_get_optional(dev, "phy");
if (IS_ERR(priv->regulator)) {
-   if (PTR_ERR(priv->regulator) == -EPROBE_DEFER)
-   return -EPROBE_DEFER;
+   if (PTR_ERR(priv->regulator) == -EPROBE_DEFER) {
+   err = -EPROBE_DEFER;
+   goto out_clk_disable;
+   }
dev_err(dev, "no regulator found\n");
priv->regulator = NULL;
}
-- 
2.14.1
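This fix relies on goto-based unwinding, shown here as a minimal user-space sketch. probe_model() and the clock counter are hypothetical, and MODEL_EPROBE_DEFER stands in for the kernel-internal EPROBE_DEFER. Only the clock is modeled; the real out_clk_disable path in emac_rockchip_probe() may release more.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel-internal EPROBE_DEFER errno. */
#define MODEL_EPROBE_DEFER 517

static int clk_enabled; /* models the reference clock's enable count */

static int clk_enable_model(void)
{
	clk_enabled++;
	return 0;
}

static void clk_disable_model(void)
{
	clk_enabled--;
}

/* Simplified probe: enable the clock, then look up an optional regulator. */
static int probe_model(bool regulator_defers)
{
	int err;

	err = clk_enable_model();
	if (err)
		return err;

	if (regulator_defers) {
		/* A bare "return -EPROBE_DEFER" here would leave the clock
		 * enabled across the deferred retry. */
		err = -MODEL_EPROBE_DEFER;
		goto out_clk_disable;
	}

	return 0;

out_clk_disable:
	clk_disable_model();
	return err;
}

void probe_demo(void)
{
	/* First attempt defers; the clock must be back to disabled. */
	assert(probe_model(true) == -MODEL_EPROBE_DEFER);
	assert(clk_enabled == 0);

	/* The retry succeeds and holds exactly one enable. */
	assert(probe_model(false) == 0);
	assert(clk_enabled == 1);
}
```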

Re: [PATCH v5 0/2] Remove false-positive VLAs when using max()

2018-03-18 Thread Linus Torvalds

On Sun, Mar 18, 2018 at 3:59 PM, Rasmus Villemoes
 wrote:
>
> OK, I missed where this was made about side effects of x and y

We never made it explicit, since all we really cared about in the end
is the constantness.

But yes:

> but I suppose the idea was to use
>
>   no_side_effects(x) && no_side_effects(y) ? trivial_max(x, y) :
> old_max(x, y)

Exactly. Except with __builtin_choose_expr(), because we need the end
result to be seen as an integer constant expression, so that we can
then use it as an array size. So it needs that early parse-time
resolution.

>> We only really use __builtin_constant_p() as a (bad) approximation of
>> that in this case, since it's the best we can do.
>
> I don't think you should parenthesize bad, rather capitalize it. ({x++;
> 0;}) is constant in the eyes of __builtin_constant_p, but not
> side-effect free.

Hmm. Yeah, but we probably don't much care for the kernel.

For example, we do things like this:

   #define __swab64(x) \
           (__builtin_constant_p((__u64)(x)) ? \
           ___constant_swab64(x) : \
           __fswab64(x))

where that "___constant_swab64()" very much uses the same argument
over and over.

And we do that for related reasons - we really want to do the constant
folding at build time for some cases, and this was the only sane way
to do it. Eg code like

return (addr & htonl(0xff00)) == htonl(0x7f00);

wants to do the right thing, and long ago gcc didn't have builtins for
byte swapping, so we had to just do nasty horrible macros that DTRT
for constants.
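Here is a 32-bit user-space sketch of the same constant-or-function dispatch, assuming GCC or Clang for __builtin_constant_p() and __builtin_bswap32(). constant_swab32(), runtime_swab32() and swab32_sketch() are hypothetical names modeled on the kernel's helpers. On a little-endian host, htonl() performs exactly this byte swap.

```c
#include <assert.h>
#include <stdint.h>

/* Pure arithmetic that repeats x four times, so constant input folds to a
 * constant (usable in static initializers). */
#define constant_swab32(x) ((uint32_t)(                      \
	(((uint32_t)(x) & (uint32_t)0x000000ffUL) << 24) |   \
	(((uint32_t)(x) & (uint32_t)0x0000ff00UL) <<  8) |   \
	(((uint32_t)(x) & (uint32_t)0x00ff0000UL) >>  8) |   \
	(((uint32_t)(x) & (uint32_t)0xff000000UL) >> 24)))

/* Runtime path: evaluates its argument once. */
static inline uint32_t runtime_swab32(uint32_t x)
{
	return __builtin_bswap32(x);
}

/* Same shape as __swab64(): fold when gcc can prove x constant. */
#define swab32_sketch(x)                          \
	(__builtin_constant_p((uint32_t)(x)) ?    \
	 constant_swab32(x) :                     \
	 runtime_swab32(x))

/* On a little-endian host, this equals htonl(0x7f000000). */
static const uint32_t swapped_loopback = constant_swab32(0x7f000000);

void swab_demo(uint32_t runtime_value)
{
	assert(swapped_loopback == 0x0000007fu);
	assert(swab32_sketch(0x12345678u) == 0x78563412u);
	/* Either branch must produce the same result. */
	assert(swab32_sketch(runtime_value) == constant_swab32(runtime_value));
}
```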

But since the kernel is standalone, we don't need to really worry
about the *generic* case, we just need to worry about our own macros,
and if somebody does that example you show I guess we'll just have to
shun them ;)

Of course, our own macros are often macros from hell, exactly because
they often contain a lot of type-checking and/or type-(size)-based
polymorphism. But then we actually *want* gcc to simplify things, and
avoid side effects, just have potentially very complex expressions.

But we basically never have those kinds of intentionally evil macros, so
we are willing to live with a bad substitute.

But yes, it would be better to have some more control over things, and
actually have a way to say "if X is a constant integer expression, do
transformation Y, otherwise call function y()".

Actually sparse started out with the idea in the background that it
could become not just a checker, but a "transformation tool". Partly
because of long gone historical issues (ie gcc people used to be very
anti-plugin due to licensing issues).

Of course, a more integrated and powerful preprocessor language is
almost always what we *really* wanted, but traditionally "powerful
preprocessor" has always meant "completely incomprehensible and badly
integrated preprocessor".

"cpp" may be a horrid language, but it's there and it's fast (when
integrated with the front-end, like everybody does now)

But sadly, cpp is really bad for the above kind of "if argument is
constant" kind of tricks. I suspect we'd use it a lot otherwise.

>> So the real use would be to say "can we use the simple direct macro
>> that just happens to also fold into a nice integer constant
>> expression" for __builtin_choose_expr().
>>
>> I tried to find something like that, but it really doesn't exist, even
>> though I would actually have expected it to be a somewhat common
>> concern for macro writers: write a macro that works in any arbitrary
>> environment.
>
> Yeah, I think the closest is a five year old suggestion
> (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=57612) to add a
> __builtin_assert_no_side_effects, that could be used in macros that for
> some reason cannot be implemented without evaluating some argument
> multiple times. It would be more useful to have the predicate version,
> which one could always turn into a build bug version. But we have
> neither, unfortunately.

Yeah, and since we're in the situation that *new* gcc versions work
for us anyway, and we only have issues with older gcc's (that sadly
people still use), even if there was a new cool feature we couldn't
use it.

Oh well. Thanks for the background.

Linus

[PATCH net 7/7] net/sched: fix idr leak in the error path of tcf_skbmod_init()

2018-03-18 Thread Davide Caratti

tcf_skbmod_init() can fail after the idr has been successfully reserved.
When this happens, every subsequent attempt to configure skbmod rules
using the same idr value will systematically fail with -ENOSPC, unless
the first attempt was done using the 'replace' keyword:

 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action skbmod swap mac index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in tcf_skbmod_init(), ensuring that tcf_idr_release() is called
on the error path when the idr has been reserved, but not yet inserted.
Also, don't test 'ovr' in the error path, to prevent a failed 'replace' from
implicitly becoming a 'delete' that leaks a refcount in the act_skbmod module:

 # rmmod act_skbmod; modprobe act_skbmod
 # tc action add action skbmod swap mac index 100
 # tc action add action skbmod swap mac continue index 100
 RTNETLINK answers: File exists
 We have an error talking to the kernel
 # tc action replace action skbmod swap mac continue index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action list action skbmod
 #
 # rmmod  act_skbmod
 rmmod: ERROR: Module act_skbmod is in use

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Signed-off-by: Davide Caratti 
---
 net/sched/act_skbmod.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_skbmod.c b/net/sched/act_skbmod.c
index d09565d6433e..7b0700f52b50 100644
--- a/net/sched/act_skbmod.c
+++ b/net/sched/act_skbmod.c
@@ -152,7 +152,7 @@ static int tcf_skbmod_init(struct net *net, struct nlattr *nla,
ASSERT_RTNL();
p = kzalloc(sizeof(struct tcf_skbmod_params), GFP_KERNEL);
if (unlikely(!p)) {
-   if (ovr)
+   if (ret == ACT_P_CREATED)
tcf_idr_release(*a, bind);
return -ENOMEM;
}
-- 
2.14.3
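The following toy user-space model compares the two error-path tests, for illustration only. It assumes, as the commit messages describe, that an index is first reserved and only published once init() succeeds. It also assumes that the replace path holds no extra reference when the allocation fails. The slot table, action_init() and its flags are hypothetical, and the model does not track the module refcount leak.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define ACT_P_CREATED 1 /* mirrors the "newly created" return code */
#define NSLOTS 8

/* FREE: index unused; RESERVED: index taken but nothing published yet;
 * INSERTED: a live action that lookups can find. */
enum slot_state { FREE, RESERVED, INSERTED };

struct slot {
	enum slot_state state;
	int refcnt;
};

static struct slot slots[NSLOTS];

/* Drop one reference; the index becomes reusable when the last one goes. */
static void slot_release(int i)
{
	if (--slots[i].refcnt == 0)
		slots[i].state = FREE;
}

/* ovr models 'replace', alloc_fails a later kzalloc() failure, and fixed
 * selects the patched error-path test. */
static int action_init(int i, bool ovr, bool alloc_fails, bool fixed)
{
	int ret;

	if (slots[i].state == INSERTED) {
		if (!ovr)
			return -EEXIST;
		ret = 0;                /* existing action, no new reference */
	} else {
		if (slots[i].state != FREE)
			return -ENOSPC; /* still reserved by a failed add */
		slots[i].state = RESERVED;
		slots[i].refcnt = 1;
		ret = ACT_P_CREATED;
	}

	if (alloc_fails) {
		if (fixed ? ret == ACT_P_CREATED : ovr)
			slot_release(i);
		return -ENOMEM;
	}

	if (ret == ACT_P_CREATED)
		slots[i].state = INSERTED;
	return ret;
}

void action_init_demo(void)
{
	/* Old test: a failed add leaves index 0 reserved forever. */
	assert(action_init(0, false, true, false) == -ENOMEM);
	assert(action_init(0, false, false, false) == -ENOSPC);

	/* Old test: a failed replace deletes the action at index 1. */
	assert(action_init(1, false, false, false) == ACT_P_CREATED);
	assert(action_init(1, true, true, false) == -ENOMEM);
	assert(slots[1].state == FREE);

	/* New test: a failed add gives index 2 back. */
	assert(action_init(2, false, true, true) == -ENOMEM);
	assert(action_init(2, false, false, true) == ACT_P_CREATED);

	/* New test: a failed replace leaves index 3 untouched. */
	assert(action_init(3, false, false, true) == ACT_P_CREATED);
	assert(action_init(3, true, true, true) == -ENOMEM);
	assert(slots[3].state == INSERTED && slots[3].refcnt == 1);
}
```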

[PATCH net 4/7] net/sched: fix idr leak in the error path of tcp_pedit_init()

2018-03-18 Thread Davide Caratti

tcf_pedit_init() can fail to allocate 'keys' after the idr has been
successfully reserved. When this happens, subsequent attempts to configure
a pedit rule using the same idr value systematically fail with -ENOSPC:

 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action pedit munge ip ttl set 63 index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in the error path of tcf_pedit_init(), calling
tcf_idr_release() in place of tcf_idr_cleanup().

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Signed-off-by: Davide Caratti 
---
 net/sched/act_pedit.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_pedit.c b/net/sched/act_pedit.c
index 349beaffb29e..fef08835f26d 100644
--- a/net/sched/act_pedit.c
+++ b/net/sched/act_pedit.c
@@ -176,7 +176,7 @@ static int tcf_pedit_init(struct net *net, struct nlattr *nla,
p = to_pedit(*a);
keys = kmalloc(ksize, GFP_KERNEL);
if (keys == NULL) {
-   tcf_idr_cleanup(*a, est);
+   tcf_idr_release(*a, bind);
kfree(keys_ex);
return -ENOMEM;
}
-- 
2.14.3

[PATCH net 2/7] net/sched: fix idr leak in the error path of tcf_simp_init()

2018-03-18 Thread Davide Caratti

If the kernel fails to duplicate 'sdata', creation of a new action fails
with -ENOMEM. However, subsequent attempts to install the same action
using the same value of 'index' systematically fail with -ENOSPC, and
that value of 'index' is no longer usable by act_simple until
act_simple.ko is removed and reinserted with rmmod / insmod:

 # tc actions add action simple sdata hello index 100
 # tc actions list action simple

action order 0: Simple 
 index 100 ref 1 bind 0
 # tc actions flush action simple
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc actions flush action simple
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc actions add action simple sdata hello index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in the error path of tcf_simp_init(), calling tcf_idr_release()
in place of tcf_idr_cleanup().

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Suggested-by: Cong Wang 
Signed-off-by: Davide Caratti 
---
 net/sched/act_simple.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_simple.c b/net/sched/act_simple.c
index 425eac11f6da..b1f38063ada0 100644
--- a/net/sched/act_simple.c
+++ b/net/sched/act_simple.c
@@ -121,7 +121,7 @@ static int tcf_simp_init(struct net *net, struct nlattr *nla,
d = to_defact(*a);
ret = alloc_defdata(d, defdata);
if (ret < 0) {
-   tcf_idr_cleanup(*a, est);
+   tcf_idr_release(*a, bind);
return ret;
}
d->tcf_action = parm->action;
-- 
2.14.3

[PATCH net 1/7] net/sched: fix idr leak on the error path of tcf_bpf_init()

2018-03-18 Thread Davide Caratti

When the following command sequence is entered:

 # tc action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100
 RTNETLINK answers: Invalid argument
 We have an error talking to the kernel
 # tc action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel

act_bpf correctly refuses to install the first TC rule, because 31 is not
a valid instruction. However, it also refuses to install the second TC
rule, even though the BPF code is correct. Furthermore, it is no longer
possible to install any other rule with the same value of 'index' until
the act_bpf module is unloaded and reloaded. After the idr has been
reserved, call tcf_idr_release() instead of tcf_idr_cleanup() to fix this
issue.

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Signed-off-by: Davide Caratti 
---
 net/sched/act_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index b3f2c15affa7..9d2cabf1dc7e 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -352,7 +352,7 @@ static int tcf_bpf_init(struct net *net, struct nlattr *nla,
return res;
 out:
if (res == ACT_P_CREATED)
-   tcf_idr_cleanup(*act, est);
+   tcf_idr_release(*act, bind);
 
return ret;
 }
-- 
2.14.3

[PATCH net 6/7] net/sched: fix idr leak in the error path of tcf_vlan_init()

2018-03-18 Thread Davide Caratti

tcf_vlan_init() can fail after the idr has been successfully reserved.
When this happens, every subsequent attempt to configure vlan rules using
the same idr value will systematically fail with -ENOSPC, unless the first
attempt was done using the 'replace' keyword.

 # tc action add action vlan pop index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action vlan pop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action vlan pop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in tcf_vlan_init(), ensuring that tcf_idr_release() is called on
the error path when the idr has been reserved, but not yet inserted. Also,
don't test 'ovr' in the error path, to prevent a failed 'replace' from
implicitly becoming a 'delete' that leaks a refcount in the act_vlan module:

 # rmmod act_vlan; modprobe act_vlan
 # tc action add action vlan push id 5 index 100
 # tc action replace action vlan push id 7 index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action list action vlan
 #
 # rmmod act_vlan
 rmmod: ERROR: Module act_vlan is in use

Fixes: 4c5b9d9642c8 ("act_vlan: VLAN action rewrite to use RCU lock/unlock and update")
Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")

Signed-off-by: Davide Caratti 
---
 net/sched/act_vlan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_vlan.c b/net/sched/act_vlan.c
index c2914e9a4a6f..c49cb61adedf 100644
--- a/net/sched/act_vlan.c
+++ b/net/sched/act_vlan.c
@@ -195,7 +195,7 @@ static int tcf_vlan_init(struct net *net, struct nlattr *nla,
ASSERT_RTNL();
p = kzalloc(sizeof(*p), GFP_KERNEL);
if (!p) {
-   if (ovr)
+   if (ret == ACT_P_CREATED)
tcf_idr_release(*a, bind);
return -ENOMEM;
}
-- 
2.14.3

[PATCH net 5/7] net/sched: fix idr leak in the error path of __tcf_ipt_init()

2018-03-18 Thread Davide Caratti

__tcf_ipt_init() can fail after the idr has been successfully reserved.
When this happens, subsequent attempts to configure xt/ipt rules using
the same idr value systematically fail with -ENOSPC:

 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING
 target:  LOG level warning prefix "test1" index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 Command "(null)" is unknown, try "tc actions help".
 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING
 target:  LOG level warning prefix "test1" index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 Command "(null)" is unknown, try "tc actions help".
 # tc action add action xt -j LOG --log-prefix test1 index 100
 tablename: mangle hook: NF_IP_POST_ROUTING
 target:  LOG level warning prefix "test1" index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 ...

Fix this in the error path of __tcf_ipt_init(), calling tcf_idr_release()
in place of tcf_idr_cleanup(). Since tcf_ipt_release() can now be called
when tcfi_t is NULL, we also need to protect calls to ipt_destroy_target()
to avoid a NULL pointer dereference.

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Signed-off-by: Davide Caratti 
---
 net/sched/act_ipt.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/sched/act_ipt.c b/net/sched/act_ipt.c
index 06e380ae0928..71b618144803 100644
--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -80,9 +80,12 @@ static void ipt_destroy_target(struct xt_entry_target *t)
 static void tcf_ipt_release(struct tc_action *a)
 {
struct tcf_ipt *ipt = to_ipt(a);
-   ipt_destroy_target(ipt->tcfi_t);
+
+   if (ipt->tcfi_t) {
+   ipt_destroy_target(ipt->tcfi_t);
+   kfree(ipt->tcfi_t);
+   }
kfree(ipt->tcfi_tname);
-   kfree(ipt->tcfi_t);
 }
 
 static const struct nla_policy ipt_policy[TCA_IPT_MAX + 1] = {
@@ -187,7 +190,7 @@ static int __tcf_ipt_init(struct net *net, unsigned int id, struct nlattr *nla,
kfree(tname);
 err1:
if (ret == ACT_P_CREATED)
-   tcf_idr_cleanup(*a, est);
+   tcf_idr_release(*a, bind);
return err;
 }
 
-- 
2.14.3

[PATCH net 3/7] net/sched: fix idr leak in the error path of tcf_act_police_init()

2018-03-18 Thread Davide Caratti

tcf_act_police_init() can fail after the idr has been successfully
reserved (e.g., qdisc_get_rtab() may return NULL). When this happens,
subsequent attempts to configure a police rule using the same idr value
systematically fail with -ENOSPC:

 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: Cannot allocate memory
 We have an error talking to the kernel
 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: No space left on device
 We have an error talking to the kernel
 # tc action add action police rate 1000 burst 1000 drop index 100
 RTNETLINK answers: No space left on device
 ...

Fix this in the error path of tcf_act_police_init(), calling
tcf_idr_release() in place of tcf_idr_cleanup().

Fixes: 65a206c01e8e ("net/sched: Change act_api and act_xxx modules to use IDR")
Signed-off-by: Davide Caratti 
---
 net/sched/act_police.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/act_police.c b/net/sched/act_police.c
index 95d3c9097b25..faebf82b99f1 100644
--- a/net/sched/act_police.c
+++ b/net/sched/act_police.c
@@ -194,7 +194,7 @@ static int tcf_act_police_init(struct net *net, struct nlattr *nla,
qdisc_put_rtab(P_tab);
qdisc_put_rtab(R_tab);
if (ret == ACT_P_CREATED)
-   tcf_idr_cleanup(*a, est);
+   tcf_idr_release(*a, bind);
return err;
 }
 
-- 
2.14.3

[PATCH net 0/7] fix idr leak in actions

2018-03-18 Thread Davide Caratti

This series fixes situations where a temporary failure to install a TC
action makes it permanently impossible to reuse the configured 'index'.

Thanks to Cong Wang for the initial review.

Davide Caratti (7):
  net/sched: fix idr leak on the error path of tcf_bpf_init()
  net/sched: fix idr leak in the error path of tcf_simp_init()
  net/sched: fix idr leak in the error path of tcf_act_police_init()
  net/sched: fix idr leak in the error path of tcp_pedit_init()
  net/sched: fix idr leak in the error path of __tcf_ipt_init()
  net/sched: fix idr leak in the error path of tcf_vlan_init()
  net/sched: fix idr leak in the error path of tcf_skbmod_init()

 net/sched/act_bpf.c| 2 +-
 net/sched/act_ipt.c| 9 ++---
 net/sched/act_pedit.c  | 2 +-
 net/sched/act_police.c | 2 +-
 net/sched/act_simple.c | 2 +-
 net/sched/act_skbmod.c | 2 +-
 net/sched/act_vlan.c   | 2 +-
 7 files changed, 12 insertions(+), 9 deletions(-)

-- 
2.14.3

Re: [PATCH net-next 0/4] net: dsa: Plug in PHYLINK support

2018-03-18 Thread Florian Fainelli



On 03/18/2018 11:52 AM, Florian Fainelli wrote:
> Hi all,
> 
> This patch series adds PHYLINK support to DSA which is necessary to support
> more complex PHY and pluggable modules setups.
> 
> Patch series can be found here:
> 
> https://github.com/ffainelli/linux/commits/dsa-phylink
> 
> This was tested on:
> 
> - dsa-loop
> - bcm_sf2
> - mv88e6xxx
> - b53
> 
> With a variety of test cases:
> - internal & external MDIO PHYs
> - MoCA with link notification through interrupt/MMIO register
> - built-in PHYs
> - ifconfig up/down for several cycles works
> - bind/unbind of the drivers
> 
> And everything should still work as expected. Please be aware of the following:
> 
> - switch drivers (like bcm_sf2) which may have user-facing network ports using
>   fixed links would need to implement phylink_mac_ops to remain functional.
>   PHYLINK does not create a phy_device for fixed links, therefore our
>   call to adjust_link() from phylink_mac_link_{up,down} would not be calling
>   into the driver. This *should not* affect CPU/DSA ports which are configured
>   through adjust_link() but have no network devices.
> 
> - support for SFP/SFF is now possible, but switch drivers will still need some
>   modifications to properly support those, including, but not limited to using
>   the correct binding information. This will be submitted on top of this
>   series.
> 
> Russell, we could theoretically eliminate patch 3 and resolve this entirely
> within DSA by keeping a per-port phy_interface_t (as we did before). This is
> not a big change if we have to; let me know if you feel that is cleaner.
> I was initially considering passing a phylink_link_state reference to
> mac_link_{up,down}, but only a couple of fields are valid during link_down,
> so I ended up passing the phy_interface_t value we need instead. This is
> necessary for switch drivers which have different types of port interfaces
> (see the bcm_sf2 documentation in tree).

I think I will proceed differently for v2:

- introduce DSA phylink_mac_ops in dsa_switch_ops, such that drivers can
define those as preliminary commits, those won't be used by
net/dsa/slave.c just yet though

- have all relevant drivers implement phylink_mac_ops such that the
plumbing is there and functional

- switch net/dsa/slave.c to using PHYLINK

That way, we should avoid any breakage in between and have an "atomic"
switch between PHYLIB and PHYLINK.

> 
> Thank you!
> 
> Florian Fainelli (4):
>   net: dsa: Eliminate dsa_slave_get_link()
>   net: phy: phylink: Provide PHY interface to mac_link_{up,down}
>   net: dsa: Plug in PHYLINK support
>   net: dsa: bcm_sf2: Implement phylink_mac_ops
> 
>  drivers/net/dsa/bcm_sf2.c             | 190 +
>  drivers/net/ethernet/marvell/mvneta.c |   4 +-
>  drivers/net/phy/phylink.c             |   6 +-
>  include/linux/phylink.h               |  10 +-
>  include/net/dsa.h                     |  27 ++-
>  net/dsa/Kconfig                       |   2 +-
>  net/dsa/dsa_priv.h                    |   9 -
>  net/dsa/slave.c                       | 304 --
>  8 files changed, 340 insertions(+), 212 deletions(-)
> 

-- 
Florian

Re: [PATCH net] net: fec: Fix unbalanced PM runtime calls

2018-03-18 Thread David Miller

From: Florian Fainelli 
Date: Sun, 18 Mar 2018 14:42:22 -0700

> On 03/18/2018 01:35 PM, David Miller wrote:
>> Queue this up for -stable?
> 
> I would be inclined to say yes, it was not exactly easy to track down a
> set of commits, since this was changed quite a bit.

Ok, queued up.

[PATCH net-next] tc-testing: add selftests for 'bpf' action

2018-03-18 Thread Davide Caratti

Test d959: Add cBPF action with valid bytecode
Test f84a: Add cBPF action with invalid bytecode
Test e939: Add eBPF action with valid object-file
Test d819: Replace cBPF bytecode and action control
Test 6ae3: Delete cBPF action
Test 3e0d: List cBPF actions
Test 55ce: Flush BPF actions
Test ccc3: Add cBPF action with duplicate index

Signed-off-by: Davide Caratti 
---
 .../selftests/tc-testing/tc-tests/actions/bpf.json | 215 +
 1 file changed, 215 insertions(+)
 create mode 100644 tools/testing/selftests/tc-testing/tc-tests/actions/bpf.json

diff --git a/tools/testing/selftests/tc-testing/tc-tests/actions/bpf.json b/tools/testing/selftests/tc-testing/tc-tests/actions/bpf.json
new file mode 100644
index ..0295a63dd0c8
--- /dev/null
+++ b/tools/testing/selftests/tc-testing/tc-tests/actions/bpf.json
@@ -0,0 +1,215 @@
+[
+{
+"id": "d959",
+"name": "Add cBPF action with valid bytecode",
+"category": [
+"actions",
+"bpf"
+],
+"setup": [
+[
+"$TC action flush action bpf",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 100",
+"expExitCode": "0",
+"verifyCmd": "$TC action get action bpf index 100",
+"matchPattern": "action order [0-9]*: bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' default-action pipe.*index 100 ref",
+"matchCount": "1",
+"teardown": [
+"$TC action flush action bpf"
+]
+},
+{
+"id": "f84a",
+"name": "Add cBPF action with invalid bytecode",
+"category": [
+"actions",
+"bpf"
+],
+"setup": [
+[
+"$TC actions flush action bpf",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC action add action bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' index 100",
+"expExitCode": "255",
+"verifyCmd": "$TC action get action bpf index 100",
+"matchPattern": "action order [0-9]*: bpf bytecode '4,40 0 0 12,31 0 1 2048,6 0 0 262144,6 0 0 0' default-action pipe.*index 100 ref",
+"matchCount": "0",
+"teardown": [
+"$TC actions flush action bpf"
+]
+},
+{
+"id": "e939",
+"name": "Add eBPF action with valid object-file",
+"category": [
+"actions",
+"bpf"
+],
+"setup": [
+"printf '#include \nchar l[] __attribute__((section(\"license\"),used))=\"GPL\"; __attribute__((section(\"action\"),used)) int m(struct __sk_buff *s) { return 2; }' | clang -O2 -x c -c - -target bpf -o _b.o",
+[
+"$TC action flush action bpf",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC action add action bpf object-file _b.o index 667",
+"expExitCode": "0",
+"verifyCmd": "$TC action get action bpf index 667",
+"matchPattern": "action order [0-9]*: bpf _b.o:\\[action\\] id [0-9]* tag 3b185187f1855c4c default-action pipe.*index 667 ref",
+"matchCount": "1",
+"teardown": [
+"$TC action flush action bpf",
+"rm -f _b.o"
+]
+},
+{
+"id": "d819",
+"name": "Replace cBPF bytecode and action control",
+"category": [
+"actions",
+"bpf"
+],
+"setup": [
+[
+"$TC actions flush action bpf",
+0,
+1,
+255
+],
+[
+"$TC action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 555",
+0,
+1,
+255
+]
+],
+"cmdUnderTest": "$TC action replace action bpf bytecode '4,40 0 0 12,21 0 1 2054,6 0 0 262144,6 0 0 0' drop index 555",
+"expExitCode": "0",
+"verifyCmd": "$TC action get action bpf index 555",
+"matchPattern": "action order [0-9]*: bpf bytecode '4,40 0 0 12,21 0 1 2054,6 0 0 262144,6 0 0 0' default-action drop.*index 555 ref",
+"matchCount": "1",
+"teardown": [
+"$TC action flush action bpf"
+]
+},
+{
+"id": "6ae3",
+"name": "Delete cBPF action ",
+"category": [
+"actions",
+"bpf"
+],
+"setup": [
+[
+"$TC actions flush action bpf",
+0,
+1,
+255
+],
+[
+"$TC action add action bpf bytecode '4,40 0 0 12,21 0 1 2048,6 0 0 262144,6 0 0 0' index 444",
+0,
+1,
+

linux-next: manual merge of the net-next tree with the syscalls tree

2018-03-18 Thread Stephen Rothwell

Hi all,

Today's linux-next merge of the net-next tree got a conflict in:

  include/linux/socket.h

between commits:

  06947f40e3e9 ("net: socket: add __sys_recvfrom() helper; remove in-kernel call to syscall")
  bd4053a762c6 ("net: socket: move check for forbid_cmsg_compat to __sys_...msg()")

from the syscalls tree and commit:

  d8d211a2a0c3 ("net: Make extern and export get_net_ns()")

from the net-next tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc include/linux/socket.h
index e2b6bd4fe977,1ce1f768a58c..
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@@ -356,30 -352,7 +356,31 @@@ extern long __sys_sendmsg(int fd, struc
  extern int __sys_recvmmsg(int fd, struct mmsghdr __user *mmsg, unsigned int vlen,
  unsigned int flags, struct timespec *timeout);
  extern int __sys_sendmmsg(int fd, struct mmsghdr __user *mmsg,
 -unsigned int vlen, unsigned int flags);
 +unsigned int vlen, unsigned int flags,
 +bool forbid_cmsg_compat);
 +
 +/* helpers which do the actual work for syscalls */
 +extern int __sys_recvfrom(int fd, void __user *ubuf, size_t size,
 +unsigned int flags, struct sockaddr __user *addr,
 +int __user *addr_len);
 +extern int __sys_sendto(int fd, void __user *buff, size_t len,
 +  unsigned int flags, struct sockaddr __user *addr,
 +  int addr_len);
 +extern int __sys_accept4(int fd, struct sockaddr __user *upeer_sockaddr,
 +   int __user *upeer_addrlen, int flags);
 +extern int __sys_socket(int family, int type, int protocol);
 +extern int __sys_bind(int fd, struct sockaddr __user *umyaddr, int addrlen);
 +extern int __sys_connect(int fd, struct sockaddr __user *uservaddr,
 +   int addrlen);
 +extern int __sys_listen(int fd, int backlog);
 +extern int __sys_getsockname(int fd, struct sockaddr __user *usockaddr,
 +   int __user *usockaddr_len);
 +extern int __sys_getpeername(int fd, struct sockaddr __user *usockaddr,
 +   int __user *usockaddr_len);
 +extern int __sys_socketpair(int family, int type, int protocol,
 +  int __user *usockvec);
 +extern int __sys_shutdown(int fd, int how);
 +
  
+ extern struct ns_common *get_net_ns(struct ns_common *ns);
  #endif /* _LINUX_SOCKET_H */



[PATCH net-next v2 2/2] dt: bindings: add new dt entries for brcmfmac

2018-03-18 Thread Alexey Roslyakov

If the host has higher alignment requirements for SG items, allow setting
device-specific alignments for scatterlist items.

Signed-off-by: Alexey Roslyakov 
---
 Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt b/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
index 86602f264dce..187b8c1b52a7 100644
--- a/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
+++ b/Documentation/devicetree/bindings/net/wireless/brcm,bcm43xx-fmac.txt
@@ -17,6 +17,11 @@ Optional properties:
When not specified the device will use in-band SDIO interrupts.
  - interrupt-names : name of the out-of-band interrupt, which must be set
to "host-wake".
+ - brcm,broken-sg-support : boolean flag to indicate that the SDIO host
+   controller has a higher alignment requirement than 32 bytes for each
+   scatterlist item.
+ - brcm,sd-head-align : alignment requirement for start of data buffer.
+ - brcm,sd-sgentry-align : length alignment requirement for each sg entry.
 
 Example:
 
-- 
2.16.1

[PATCH net-next v2 1/2] brcmfmac: add new dt entries for SG SDIO settings

2018-03-18 Thread Alexey Roslyakov

There are three fields in the SDIO settings (quirks) that work around some
SG SDIO host particularities, i.e. higher alignment requirements for SG
items. The code for them was written long ago, but there is no way to
change the driver's behavior without patching the kernel.
Add the missing devicetree entries.

Signed-off-by: Alexey Roslyakov 
---
 drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c
index aee6e5937c41..14135752b659 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/of.c
@@ -30,14 +30,20 @@ void brcmf_of_probe(struct device *dev, enum brcmf_bus_type bus_type,
struct device_node *np = dev->of_node;
int irq;
u32 irqf;
-   u32 val;
 
if (!np || bus_type != BRCMF_BUSTYPE_SDIO ||
!of_device_is_compatible(np, "brcm,bcm4329-fmac"))
return;
 
-   if (of_property_read_u32(np, "brcm,drive-strength", &val) == 0)
-   sdio->drive_strength = val;
+   of_property_read_u32(np, "brcm,drive-strength", &sdio->drive_strength);
+
+   sdio->broken_sg_support =
+   of_property_read_bool(np, "brcm,broken-sg-support");
+
+   of_property_read_u16(np, "brcm,sd-head-align", &sdio->sd_head_align);
+
+   of_property_read_u16(np, "brcm,sd-sgentry-align",
+&sdio->sd_sgentry_align);
 
/* make sure there are interrupts defined in the node */
if (!of_find_property(np, "interrupts", NULL))
-- 
2.16.1

[PATCH net-next v2 0/2] brcmfmac: add new dt entries for SG SDIO settings

2018-03-18 Thread Alexey Roslyakov

Changes in v2: don't check the of_property_read_*() return values, since
these helpers leave the output value unchanged if the property is not found.
Suggested by Andrew Lunn.

Re: [PATCH v3 18/18] infiniband: cxgb4: Eliminate duplicate barriers on weakly-ordered archs

2018-03-18 Thread Jason Gunthorpe

On Sat, Mar 17, 2018 at 02:30:10PM -0400, Sinan Kaya wrote:

> Somebody also has to take a task and work very hard to get rid of
> __raw_writeX() APIs in drivers/net directory. It looked like a very common
> practice though it clearly violates multiarch portability concerns Jason and
> Deve highlighted.

When you posted your list I thought most of the hits were in what I'd
think of as 'one-arch drivers', e.g. an IRQ controller or clock driver or
something. Some might have a reason for it (e.g. avoiding the byte swap),
some may be a holdover from before writel_relaxed existed, and some may
just be cargo-cult behavior.

It is the obviously multi-arch drivers that probably need some
attention.

Jason

Re: [PATCH v11 crypto 06/12] crypto: chtls - structure and macro for Inline TLS

2018-03-18 Thread Atul Gupta



On 3/19/2018 4:23 AM, Sabrina Dubroca wrote:
> 2018-03-16, 21:07:35 +0530, Atul Gupta wrote:
> [...]
>> +#define SOCK_INLINE (31)
> [...]
>
>> +static inline int csk_flag(const struct sock *sk, enum csk_flags flag)
>> +{
>> +struct chtls_sock *csk = rcu_dereference_sk_user_data(sk);
>> +
>> +if (!sock_flag(sk, SOCK_INLINE))
>> +return 0;
>> +return test_bit(flag, &csk->flags);
>> +}
> Should drivers really start defining their own socket flags?
This is for connections in Inline mode once they have transitioned to HW; I
will re-check whether we can avoid it. Thanks
>
>
>> +static inline void set_queue(struct sk_buff *skb,
>> + unsigned int queue, const struct sock *sk)
>> +{
>> +skb->queue_mapping = queue;
>> +}
> That's skb_set_queue_mapping(), no need to define your own.
Yes, we can avoid the redefinition. Thank you.
>

Re: [PATCH v11 crypto 00/12] Chelsio Inline TLS

2018-03-18 Thread Atul Gupta



On 3/18/2018 8:06 PM, David Miller wrote:
> From: Atul Gupta 
> Date: Sun, 18 Mar 2018 14:30:30 +
>
>> Hi Dave/Herbert,
>>
>> This series is against crypto tree, should I submit two patch series:
>> 1. netdev specific changes against net-next tree?
>> 2. crypto changes against crypto tree?
> Herbert, is it OK for this entire series to go via net-next?
Once Herbert confirms, I will create the series against net-next.
> Thanks!

[PATCH RFC v2 net-next 04/21] net/ipv6: Pass net to fib6_update_sernum

2018-03-18 Thread David Ahern

Pass the net namespace to fib6_update_sernum(). It cannot be marked const
because fib6_new_sernum() changes ipv6.fib6_sernum.

Signed-off-by: David Ahern 
---
 include/net/ip6_fib.h |  2 +-
 net/ipv6/ip6_fib.c|  3 +--
 net/ipv6/route.c  | 10 +-
 3 files changed, 7 insertions(+), 8 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 5e86fd9dc857..f0aaf1c8f1a8 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -408,7 +408,7 @@ void __net_exit fib6_notifier_exit(struct net *net);
 unsigned int fib6_tables_seq_read(struct net *net);
 int fib6_tables_dump(struct net *net, struct notifier_block *nb);
 
-void fib6_update_sernum(struct rt6_info *rt);
+void fib6_update_sernum(struct net *net, struct rt6_info *rt);
 void fib6_update_sernum_upto_root(struct net *net, struct rt6_info *rt);
 
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 2f995e9e3050..29a9e835faac 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -105,9 +105,8 @@ enum {
FIB6_NO_SERNUM_CHANGE = 0,
 };
 
-void fib6_update_sernum(struct rt6_info *rt)
+void fib6_update_sernum(struct net *net, struct rt6_info *rt)
 {
-   struct net *net = dev_net(rt->dst.dev);
struct fib6_node *fn;
 
fn = rcu_dereference_protected(rt->rt6i_node,
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 939d122e71b4..8e4f0995e95a 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -1349,7 +1349,7 @@ static int rt6_insert_exception(struct rt6_info *nrt,
/* Update fn->fn_sernum to invalidate all cached dst */
if (!err) {
spin_lock_bh(&ort->rt6i_table->tb6_lock);
-   fib6_update_sernum(ort);
+   fib6_update_sernum(net, ort);
spin_unlock_bh(&ort->rt6i_table->tb6_lock);
fib6_force_start_gc(net);
}
@@ -3733,11 +3733,11 @@ void rt6_multipath_rebalance(struct rt6_info *rt)
 static int fib6_ifup(struct rt6_info *rt, void *p_arg)
 {
const struct arg_netdev_event *arg = p_arg;
-   const struct net *net = dev_net(arg->dev);
+   struct net *net = dev_net(arg->dev);
 
if (rt != net->ipv6.ip6_null_entry && rt->dst.dev == arg->dev) {
rt->rt6i_nh_flags &= ~arg->nh_flags;
-   fib6_update_sernum_upto_root(dev_net(rt->dst.dev), rt);
+   fib6_update_sernum_upto_root(net, rt);
rt6_multipath_rebalance(rt);
}
 
@@ -3816,7 +3816,7 @@ static int fib6_ifdown(struct rt6_info *rt, void *p_arg)
 {
const struct arg_netdev_event *arg = p_arg;
const struct net_device *dev = arg->dev;
-   const struct net *net = dev_net(dev);
+   struct net *net = dev_net(dev);
 
if (rt == net->ipv6.ip6_null_entry)
return 0;
@@ -3839,7 +3839,7 @@ static int fib6_ifdown(struct rt6_info *rt, void *p_arg)
}
rt6_multipath_nh_flags_set(rt, dev, RTNH_F_DEAD |
   RTNH_F_LINKDOWN);
-   fib6_update_sernum(rt);
+   fib6_update_sernum(net, rt);
rt6_multipath_rebalance(rt);
}
return -2;
-- 
2.11.0

[PATCH RFC v2 net-next 02/21] net: Handle null dst in rtnl_put_cacheinfo

2018-03-18 Thread David Ahern

IPv6 route expiry times need to be kept in dumps of FIB entries. Update
rtnl_put_cacheinfo() to allow dst to be NULL, in which case rta_cacheinfo
only contains non-dst data.

Signed-off-by: David Ahern 
---
 net/core/rtnetlink.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 87079eaa871b..33d6ee808155 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -780,13 +780,15 @@ int rtnl_put_cacheinfo(struct sk_buff *skb, struct dst_entry *dst, u32 id,
   long expires, u32 error)
 {
struct rta_cacheinfo ci = {
-   .rta_lastuse = jiffies_delta_to_clock_t(jiffies - dst->lastuse),
-   .rta_used = dst->__use,
-   .rta_clntref = atomic_read(&(dst->__refcnt)),
.rta_error = error,
.rta_id =  id,
};
 
+   if (dst) {
+   ci.rta_lastuse = jiffies_delta_to_clock_t(jiffies - dst->lastuse);
+   ci.rta_used = dst->__use;
+   ci.rta_clntref = atomic_read(&dst->__refcnt);
+   }
if (expires) {
unsigned long clock;
 
-- 
2.11.0

[PATCH RFC v2 net-next 07/21] net/ipv6: Save route type in rt6_info

2018-03-18 Thread David Ahern

The RTN_ type for IPv6 FIB entries is currently embedded in rt6i_flags
and dst.error. Since dst is going to be removed, it can no longer be
relied on for FIB dumps, so save the route type as fib6_type.

Signed-off-by: David Ahern 
---
 include/net/ip6_fib.h |  1 +
 net/ipv6/addrconf.c   |  2 ++
 net/ipv6/route.c  | 46 --
 3 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index f0aaf1c8f1a8..0165820bbafb 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -174,6 +174,7 @@ struct rt6_info {
int rt6i_nh_weight;
unsigned short  rt6i_nfheader_len;
u8  rt6i_protocol;
+   u8  fib6_type;
u8  exception_bucket_flushed:1,
should_flush:1,
unused:6;
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 2b0aa6d7eb17..2313d74ccf46 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -2343,6 +2343,7 @@ addrconf_prefix_route(struct in6_addr *pfx, int plen, struct net_device *dev,
.fc_flags = RTF_UP | flags,
.fc_nlinfo.nl_net = dev_net(dev),
.fc_protocol = RTPROT_KERNEL,
+   .fc_type = RTN_UNICAST,
};
 
cfg.fc_dst = *pfx;
@@ -2406,6 +2407,7 @@ static void addrconf_add_mroute(struct net_device *dev)
.fc_ifindex = dev->ifindex,
.fc_dst_len = 8,
.fc_flags = RTF_UP,
+   .fc_type = RTN_UNICAST,
.fc_nlinfo.nl_net = dev_net(dev),
};
 
diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 662798496720..8c3b26c42bd4 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -307,6 +307,7 @@ static const struct rt6_info ip6_null_entry_template = {
.rt6i_protocol  = RTPROT_KERNEL,
.rt6i_metric= ~(u32) 0,
.rt6i_ref   = ATOMIC_INIT(1),
+   .fib6_type  = RTN_UNREACHABLE,
 };
 
 #ifdef CONFIG_IPV6_MULTIPLE_TABLES
@@ -324,6 +325,7 @@ static const struct rt6_info ip6_prohibit_entry_template = {
.rt6i_protocol  = RTPROT_KERNEL,
.rt6i_metric= ~(u32) 0,
.rt6i_ref   = ATOMIC_INIT(1),
+   .fib6_type  = RTN_PROHIBIT,
 };
 
 static const struct rt6_info ip6_blk_hole_entry_template = {
@@ -339,6 +341,7 @@ static const struct rt6_info ip6_blk_hole_entry_template = {
.rt6i_protocol  = RTPROT_KERNEL,
.rt6i_metric= ~(u32) 0,
.rt6i_ref   = ATOMIC_INIT(1),
+   .fib6_type  = RTN_BLACKHOLE,
 };
 
 #endif
@@ -2756,6 +2759,11 @@ static struct rt6_info *ip6_route_info_create(struct fib6_config *cfg,
goto out;
}
 
+   if (cfg->fc_type > RTN_MAX) {
+   NL_SET_ERR_MSG(extack, "Invalid route type");
+   goto out;
+   }
+
if (cfg->fc_dst_len > 128) {
NL_SET_ERR_MSG(extack, "Invalid prefix length");
goto out;
@@ -2868,6 +2876,8 @@ static struct rt6_info *ip6_route_info_create(struct fib6_config *cfg,
rt->rt6i_metric = cfg->fc_metric;
rt->rt6i_nh_weight = 1;
 
+   rt->fib6_type = cfg->fc_type;
+
/* We cannot add true routes via loopback here,
   they would result in kernel looping; promote them to reject routes
 */
@@ -3302,6 +3312,7 @@ static struct rt6_info *rt6_add_route_info(struct net *net,
.fc_flags   = RTF_GATEWAY | RTF_ADDRCONF | RTF_ROUTEINFO |
  RTF_UP | RTF_PREF(pref),
.fc_protocol = RTPROT_RA,
+   .fc_type = RTN_UNICAST,
.fc_nlinfo.portid = 0,
.fc_nlinfo.nlh = NULL,
.fc_nlinfo.nl_net = net,
@@ -3358,6 +3369,7 @@ struct rt6_info *rt6_add_dflt_router(struct net *net,
.fc_flags   = RTF_GATEWAY | RTF_ADDRCONF | RTF_DEFAULT |
  RTF_UP | RTF_EXPIRES | RTF_PREF(pref),
.fc_protocol = RTPROT_RA,
+   .fc_type = RTN_UNICAST,
.fc_nlinfo.portid = 0,
.fc_nlinfo.nlh = NULL,
.fc_nlinfo.nl_net = net,
@@ -3433,6 +3445,7 @@ static void rtmsg_to_fib6_config(struct net *net,
cfg->fc_dst_len = rtmsg->rtmsg_dst_len;
cfg->fc_src_len = rtmsg->rtmsg_src_len;
cfg->fc_flags = rtmsg->rtmsg_flags;
+   cfg->fc_type = rtmsg->rtmsg_type;
 
cfg->fc_nlinfo.nl_net = net;
 
@@ -3553,10 +3566,13 @@ struct rt6_info *addrconf_dst_alloc(struct net *net,
 
rt->rt6i_protocol = RTPROT_KERNEL;
rt->rt6i_flags = RTF_UP | RTF_NONEXTHOP;
-   if (anycast)
+   if (anycast) {
+   rt->fib6_type = RTN_ANYCAST;
rt->rt6i_flags |= RTF_ANYCAST;
-   else
+   } else {
+

[PATCH RFC v2 net-next 11/21] net/ipv6: move expires into rt6_info

2018-03-18 Thread David Ahern

Add expires to rt6_info for FIB entries, and add fib6 helpers to
manage it. Data path use of dst.expires remains.

The transition is fairly straightforward: when working with fib entries,
rt->dst.expires is just rt->expires, rt6_clean_expires is replaced with
fib6_clean_expires, rt6_set_expires becomes fib6_set_expires, and
rt6_check_expired becomes fib6_check_expired, where the fib6 versions
are added by this patch.

Signed-off-by: David Ahern 
---
 include/net/ip6_fib.h | 27 +++
 net/ipv6/addrconf.c   |  6 +++---
 net/ipv6/ip6_fib.c|  8 
 net/ipv6/ndisc.c  |  2 +-
 net/ipv6/route.c  | 20 +++-
 5 files changed, 42 insertions(+), 21 deletions(-)

diff --git a/include/net/ip6_fib.h b/include/net/ip6_fib.h
index 1f8dc9d12abb..c73b985734f5 100644
--- a/include/net/ip6_fib.h
+++ b/include/net/ip6_fib.h
@@ -179,6 +179,7 @@ struct rt6_info {
should_flush:1,
unused:6;
 
+   unsigned long   expires;
struct dst_metrics  *fib6_metrics;
 #define fib6_pmtu  fib6_metrics->metrics[RTAX_MTU-1]
struct fib6_nh  fib6_nh;
@@ -197,6 +198,26 @@ static inline struct inet6_dev *ip6_dst_idev(struct dst_entry *dst)
return ((struct rt6_info *)dst)->rt6i_idev;
 }
 
+static inline void fib6_clean_expires(struct rt6_info *f6i)
+{
+   f6i->rt6i_flags &= ~RTF_EXPIRES;
+   f6i->expires = 0;
+}
+
+static inline void fib6_set_expires(struct rt6_info *f6i,
+   unsigned long expires)
+{
+   f6i->expires = expires;
+   f6i->rt6i_flags |= RTF_EXPIRES;
+}
+
+static inline bool fib6_check_expired(const struct rt6_info *f6i)
+{
+   if (f6i->rt6i_flags & RTF_EXPIRES)
+   return time_after(jiffies, f6i->expires);
+   return false;
+}
+
 static inline void rt6_clean_expires(struct rt6_info *rt)
 {
rt->rt6i_flags &= ~RTF_EXPIRES;
@@ -211,11 +232,9 @@ static inline void rt6_set_expires(struct rt6_info *rt, unsigned long expires)
 
 static inline void rt6_update_expires(struct rt6_info *rt0, int timeout)
 {
-   struct rt6_info *rt;
+   if (!(rt0->rt6i_flags & RTF_EXPIRES) && rt0->from)
+   rt0->dst.expires = rt0->from->expires;
 
-   for (rt = rt0; rt && !(rt->rt6i_flags & RTF_EXPIRES); rt = rt->from);
-   if (rt && rt != rt0)
-   rt0->dst.expires = rt->dst.expires;
dst_set_expires(&rt0->dst, timeout);
rt0->rt6i_flags |= RTF_EXPIRES;
 }
diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 7d92c6e48d2e..23834864adb5 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1202,7 +1202,7 @@ cleanup_prefix_route(struct inet6_ifaddr *ifp, unsigned long expires, bool del_r
ip6_del_rt(dev_net(ifp->idev->dev), rt);
else {
if (!(rt->rt6i_flags & RTF_EXPIRES))
-   rt6_set_expires(rt, expires);
+   fib6_set_expires(rt, expires);
ip6_rt_put(rt);
}
}
@@ -2685,9 +2685,9 @@ void addrconf_prefix_rcv(struct net_device *dev, u8 *opt, int len, bool sllao)
rt = NULL;
} else if (addrconf_finite_timeout(rt_expires)) {
/* not infinity */
-   rt6_set_expires(rt, jiffies + rt_expires);
+   fib6_set_expires(rt, jiffies + rt_expires);
} else {
-   rt6_clean_expires(rt);
+   fib6_clean_expires(rt);
}
} else if (valid_lft) {
clock_t expires = 0;
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index f3f284c3a486..70eca9cb551f 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -905,9 +905,9 @@ static int fib6_add_rt2node(struct fib6_node *fn, struct rt6_info *rt,
if (!(iter->rt6i_flags & RTF_EXPIRES))
return -EEXIST;
if (!(rt->rt6i_flags & RTF_EXPIRES))
-   rt6_clean_expires(iter);
+   fib6_clean_expires(iter);
else
-   rt6_set_expires(iter, rt->dst.expires);
+   fib6_set_expires(iter, rt->expires);
fib6_metric_set(iter, RTAX_MTU, rt->fib6_pmtu);
return -EEXIST;
}
@@ -1994,8 +1994,8 @@ static int fib6_age(struct rt6_info *rt, void *arg)
 *  Routes are expired even if they are in use.
 */
 
-   if (rt->rt6i_flags & RTF_EXPIRES && rt->dst.expires) {
-   if (t
