RE: [net-next v6 1/2] net/tls: Use socket data_ready callback on record availability

2018-07-28 Thread Vakul Garg



> -----Original Message-----
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Sunday, July 29, 2018 11:48 AM
> To: Vakul Garg 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; davejwat...@fb.com
> Subject: Re: [net-next v6 1/2] net/tls: Use socket data_ready callback on
> record availability
> 
> From: Vakul Garg 
> Date: Sun, 29 Jul 2018 06:01:29 +
> 
> > Could you please correct me if my counter-reasoning behind changing
> > the socket callback is wrong?
> 
> Ok, after studying the code a bit I agree with your analysis.

Thanks.
Kindly advise if I need to resubmit/rebase the patch.
 



Re: [net-next v6 1/2] net/tls: Use socket data_ready callback on record availability

2018-07-28 Thread David Miller
From: Vakul Garg 
Date: Sun, 29 Jul 2018 06:01:29 +

> Could you please correct me if my counter-reasoning behind changing
> the socket callback is wrong?

Ok, after studying the code a bit I agree with your analysis.


Re: [PATCH][net-next] openvswitch: eliminate cpu_used_mask from sw_flow

2018-07-28 Thread David Miller
From: Li RongQing 
Date: Fri, 27 Jul 2018 16:03:57 +0800

> The size of struct cpumask varies with CONFIG_NR_CPUS. Some configs set
> CONFIG_NR_CPUS very large, e.g. 5120, in which case struct cpumask takes
> 640 bytes; with thousands of flows, this takes a lot of
> memory.
> 
> cpu_used_mask has two purposes:
> 1: The code assumes the first CPU is cpu0, which may not be true; now use
>    cpumask_first(cpu_possible_mask).
> 2: It reduces the iteration when getting/clearing statistics; but that
>    is not a hot path, so use for_each_possible_cpu.
> 
> Signed-off-by: Zhang Yu 
> Signed-off-by: Li RongQing 

This seems to completely undo the optimization done by:

commit c4b2bf6b4a35348fe6d1eb06928eb68d7b9d99a9
Author: Tonghao Zhang 
Date:   Mon Jul 17 23:28:06 2017 -0700

openvswitch: Optimize operations for OvS flow_stats.

And in that commit message it states clearly that flow_free()
performance matters, and that the iteration over cpu_possible_mask in
the for() loop is the problem.

At a minimum, we can't apply this unless you explain why the
above performance issue won't be reintroduced by your change.

Thank you.


RE: [net-next v6 1/2] net/tls: Use socket data_ready callback on record availability

2018-07-28 Thread Vakul Garg
Hi David

Could you please correct me if my counter-reasoning behind changing the socket 
callback is wrong?

Thanks & Regards

Vakul

> -----Original Message-----
> From: Vakul Garg
> Sent: Wednesday, July 25, 2018 11:22 AM
> To: David Miller 
> Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com; davejwat...@fb.com
> Subject: RE: [net-next v6 1/2] net/tls: Use socket data_ready callback on
> record availability
> 
> 
> 
> > -----Original Message-----
> > From: David Miller [mailto:da...@davemloft.net]
> > Sent: Wednesday, July 25, 2018 1:43 AM
> > To: Vakul Garg 
> > Cc: netdev@vger.kernel.org; bor...@mellanox.com;
> avia...@mellanox.com;
> > davejwat...@fb.com
> > Subject: Re: [net-next v6 1/2] net/tls: Use socket data_ready callback
> > on record availability
> >
> > From: Vakul Garg 
> > Date: Tue, 24 Jul 2018 15:44:02 +0530
> >
> > > On receipt of a complete tls record, use socket's saved data_ready
> > > callback instead of state_change callback.
> > >
> > > Signed-off-by: Vakul Garg 
> >
> > I don't think this is correct.
> >
> > Here, the stream parser has given us a complete TLS record.
> >
> > But we haven't decrypted this packet yet.  It sits on the stream
> > parser's queue to be processed by tls_sw_recvmsg(), not the saved
> > socket's receive queue.
> 
> I understand that at this point in the code, the TLS record is still queued in
> encrypted state. But the decryption happens inline when tls_sw_recvmsg()
> gets invoked.
> So it should be ok to notify the waiting context about the availability of
> data as soon as we have collected a full TLS record.
> 
> For new data availability notification, the sk_data_ready callback should be
> more appropriate. It points to sock_def_readable(), which wakes up
> specifically for the EPOLLIN event.
> 
> This is in contrast to the socket callback sk_state_change, which points to
> sock_def_wakeup(), which issues a wakeup unconditionally (without an event
> mask).


Re: [PATCH net-next v2 0/2] tls: Fix improper revert in zerocopy_from_iter

2018-07-28 Thread David Miller
From: Doron Roberts-Kedes 
Date: Thu, 26 Jul 2018 07:59:34 -0700

> This series fixes the improper iov_iter_revert introduced in 
> "tls: Fix zerocopy_from_iter iov handling". 
> 
> Changes from v1:
> - call iov_iter_revert inside zerocopy_from_iter

Series applied, thank you.


Re: [PATCH net] tcp_bbr: fix bw probing to raise in-flight data for very small BDPs

2018-07-28 Thread David Miller
From: Neal Cardwell 
Date: Fri, 27 Jul 2018 17:19:12 -0400

> For some very small BDPs (with just a few packets) there was a
> quantization effect where the target number of packets in flight
> during the super-unity-gain (1.25x) phase of gain cycling was
> implicitly truncated to a number of packets no larger than the normal
> unity-gain (1.0x) phase of gain cycling. This meant that in multi-flow
> scenarios some flows could get stuck with a lower bandwidth, because
> they did not push enough packets inflight to discover that there was
> more bandwidth available. This was really only an issue in multi-flow
> LAN scenarios, where RTTs and BDPs are low enough for this to be an
> issue.
> 
> This fix ensures that gain cycling can raise inflight for small BDPs
> by ensuring that in PROBE_BW mode target inflight values with a
> super-unity gain are always greater than inflight values with a gain
> <= 1. Importantly, this applies whether the inflight value is
> calculated for use as a cwnd value, or as a target inflight value for
> the end of the super-unity phase in bbr_is_next_cycle_phase() (both
> need to be bigger to ensure we can probe with more packets in flight
> reliably).
> 
> This is a candidate fix for stable releases.
> 
> Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
> Signed-off-by: Neal Cardwell 
> Acked-by: Yuchung Cheng 
> Acked-by: Soheil Hassas Yeganeh 
> Acked-by: Priyaranjan Jha 
> Reviewed-by: Eric Dumazet 

Applied and queued up for -stable, thank you.


Re: [PATCH net-next v4 3/4] net/tc: introduce TC_ACT_REINSERT.

2018-07-28 Thread kbuild test robot
Hi Paolo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Paolo-Abeni/TC-refactor-act_mirred-packets-re-injection/20180729-102154
config: x86_64-randconfig-u0-07291027 (attached as .config)
compiler: gcc-5 (Debian 5.5.0-3) 5.4.1 20171010
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All warnings (new ones prefixed by >>):

   In file included from arch/x86/include/asm/current.h:5:0,
from include/linux/sched.h:12,
from include/linux/uaccess.h:5,
from net/core/dev.c:75:
   net/core/dev.c: In function 'netif_receive_generic_xdp':
   net/core/dev.c:4255:28: error: 'struct sk_buff' has no member named 'tc_redirected'
 if (skb_cloned(skb) || skb->tc_redirected)
   ^
   include/linux/compiler.h:58:30: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
>> net/core/dev.c:4255:2: note: in expansion of macro 'if'
 if (skb_cloned(skb) || skb->tc_redirected)
 ^
   net/core/dev.c:4255:28: error: 'struct sk_buff' has no member named 'tc_redirected'
 if (skb_cloned(skb) || skb->tc_redirected)
   ^
   include/linux/compiler.h:58:42: note: in definition of macro '__trace_if'
 if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
 ^
>> net/core/dev.c:4255:2: note: in expansion of macro 'if'
 if (skb_cloned(skb) || skb->tc_redirected)
 ^
   net/core/dev.c:4255:28: error: 'struct sk_buff' has no member named 'tc_redirected'
 if (skb_cloned(skb) || skb->tc_redirected)
   ^
   include/linux/compiler.h:69:16: note: in definition of macro '__trace_if'
  __r = !!(cond); \
   ^
>> net/core/dev.c:4255:2: note: in expansion of macro 'if'
 if (skb_cloned(skb) || skb->tc_redirected)
 ^

vim +/if +4255 net/core/dev.c

  4241  
  4242  static u32 netif_receive_generic_xdp(struct sk_buff *skb,
  4243   struct xdp_buff *xdp,
  4244   struct bpf_prog *xdp_prog)
  4245  {
  4246  struct netdev_rx_queue *rxqueue;
  4247  void *orig_data, *orig_data_end;
  4248  u32 metalen, act = XDP_DROP;
  4249  int hlen, off;
  4250  u32 mac_len;
  4251  
  4252  /* Reinjected packets coming from act_mirred or similar should
  4253   * not get XDP generic processing.
  4254   */
> 4255  if (skb_cloned(skb) || skb->tc_redirected)
  4256  return XDP_PASS;
  4257  
  4258  /* XDP packets must be linear and must have sufficient headroom
  4259   * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
  4260   * native XDP provides, thus we need to do it here as well.
  4261   */
  4262  if (skb_is_nonlinear(skb) ||
  4263  skb_headroom(skb) < XDP_PACKET_HEADROOM) {
  4264  int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb);
  4265  int troom = skb->tail + skb->data_len - skb->end;
  4266  
  4267  /* In case we have to go down the path and also linearize,
  4268   * then lets do the pskb_expand_head() work just once here.
  4269   */
  4270  if (pskb_expand_head(skb,
  4271   hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0,
  4272   troom > 0 ? troom + 128 : 0, GFP_ATOMIC))
  4273  goto do_drop;
  4274  if (skb_linearize(skb))
  4275  goto do_drop;
  4276  }
  4277  
  4278  /* The XDP program wants to see the packet starting at the MAC
  4279   * header.
  4280   */
  4281  mac_len = skb->data - skb_mac_header(skb);
  4282  hlen = skb_headlen(skb) + mac_len;
  4283  xdp->data = skb->data - mac_len;
  4284  xdp->data_meta = xdp->data;
  4285  xdp->data_end = xdp->data + hlen;
  4286  xdp->data_hard_start = skb->data - skb_headroom(skb);
  4287  orig_data_end = xdp->data_end;
  4288  orig_data = xdp->data;
  4289  
  4290  rxqueue = netif_get_rxqueue(skb);
  4291  xdp->rxq = &rxqueue->xdp_rxq;
  4292  
  4293  act = bpf_prog_run_xdp(xdp_prog, xdp);
  4294  
  4295  off = xdp->data - orig_data;
  4296  if (off > 0)
  4297  __skb_pull(skb, off);
  4298  else if (off < 0)
  4299  __skb_push(skb, -off);
  4300  skb->mac_header += off;
  4301  
  4302  /* check if bpf_xdp_adjust_tail was used. it can only "shrink"
  4303   * pckt.
  4304

Re: pull-request: bpf 2018-07-28

2018-07-28 Thread David Miller
From: Daniel Borkmann 
Date: Sun, 29 Jul 2018 04:22:24 +0200

> The following pull-request contains BPF updates for your *net* tree.
> 
> The main changes are:
> 
> 1) API fixes for libbpf's BTF mapping of map key/value types in order
>to make them compatible with iproute2's BPF_ANNOTATE_KV_PAIR()
>markings, from Martin.
> 
> 2) Fix AF_XDP to not report POLLIN prematurely by using the non-cached
>consumer pointer of the RX queue, from Björn.
> 
> 3) Fix __xdp_return() to check for NULL pointer after the rhashtable
>lookup that retrieves the allocator object, from Taehee.
> 
> 4) Fix x86-32 JIT to adjust ebp register in prologue and epilogue
>by 4 bytes which got removed from overall stack usage, from Wang.
> 
> 5) Fix bpf_skb_load_bytes_relative() length check to use actual
>packet length, from Daniel.
> 
> 6) Fix uninitialized return code in libbpf bpf_perf_event_read_simple()
>handler, from Thomas.
> 
> Please consider pulling these changes from:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Pulled, thanks!


Re: [PATCH net-next v4 3/4] net/tc: introduce TC_ACT_REINSERT.

2018-07-28 Thread kbuild test robot
Hi Paolo,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Paolo-Abeni/TC-refactor-act_mirred-packets-re-injection/20180729-102154
config: i386-randconfig-n0-201829 (attached as .config)
compiler: gcc-7 (Debian 7.3.0-16) 7.3.0
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net//core/dev.c: In function 'netif_receive_generic_xdp':
>> net//core/dev.c:4255:28: error: 'struct sk_buff' has no member named 'tc_redirected'
 if (skb_cloned(skb) || skb->tc_redirected)
   ^~

vim +4255 net//core/dev.c

  4241  
  4242  static u32 netif_receive_generic_xdp(struct sk_buff *skb,
  4243   struct xdp_buff *xdp,
  4244   struct bpf_prog *xdp_prog)
  4245  {
  4246  struct netdev_rx_queue *rxqueue;
  4247  void *orig_data, *orig_data_end;
  4248  u32 metalen, act = XDP_DROP;
  4249  int hlen, off;
  4250  u32 mac_len;
  4251  
  4252  /* Reinjected packets coming from act_mirred or similar should
  4253   * not get XDP generic processing.
  4254   */
> 4255  if (skb_cloned(skb) || skb->tc_redirected)
  4256  return XDP_PASS;
  4257  
  4258  /* XDP packets must be linear and must have sufficient headroom
  4259   * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
  4260   * native XDP provides, thus we need to do it here as well.
  4261   */
  4262  if (skb_is_nonlinear(skb) ||
  4263  skb_headroom(skb) < XDP_PACKET_HEADROOM) {
  4264  int hroom = XDP_PACKET_HEADROOM - skb_headroom(skb);
  4265  int troom = skb->tail + skb->data_len - skb->end;
  4266  
  4267  /* In case we have to go down the path and also linearize,
  4268   * then lets do the pskb_expand_head() work just once here.
  4269   */
  4270  if (pskb_expand_head(skb,
  4271   hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0,
  4272   troom > 0 ? troom + 128 : 0, GFP_ATOMIC))
  4273  goto do_drop;
  4274  if (skb_linearize(skb))
  4275  goto do_drop;
  4276  }
  4277  
  4278  /* The XDP program wants to see the packet starting at the MAC
  4279   * header.
  4280   */
  4281  mac_len = skb->data - skb_mac_header(skb);
  4282  hlen = skb_headlen(skb) + mac_len;
  4283  xdp->data = skb->data - mac_len;
  4284  xdp->data_meta = xdp->data;
  4285  xdp->data_end = xdp->data + hlen;
  4286  xdp->data_hard_start = skb->data - skb_headroom(skb);
  4287  orig_data_end = xdp->data_end;
  4288  orig_data = xdp->data;
  4289  
  4290  rxqueue = netif_get_rxqueue(skb);
  4291  xdp->rxq = &rxqueue->xdp_rxq;
  4292  
  4293  act = bpf_prog_run_xdp(xdp_prog, xdp);
  4294  
  4295  off = xdp->data - orig_data;
  4296  if (off > 0)
  4297  __skb_pull(skb, off);
  4298  else if (off < 0)
  4299  __skb_push(skb, -off);
  4300  skb->mac_header += off;
  4301  
  4302  /* check if bpf_xdp_adjust_tail was used. it can only "shrink"
  4303   * pckt.
  4304   */
  4305  off = orig_data_end - xdp->data_end;
  4306  if (off != 0) {
  4307  skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
  4308  skb->len -= off;
  4309  
  4310  }
  4311  
  4312  switch (act) {
  4313  case XDP_REDIRECT:
  4314  case XDP_TX:
  4315  __skb_push(skb, mac_len);
  4316  break;
  4317  case XDP_PASS:
  4318  metalen = xdp->data - xdp->data_meta;
  4319  if (metalen)
  4320  skb_metadata_set(skb, metalen);
  4321  break;
  4322  default:
  4323  bpf_warn_invalid_xdp_action(act);
  4324  /* fall through */
  4325  case XDP_ABORTED:
  4326  trace_xdp_exception(skb->dev, xdp_prog, act);
  4327  /* fall through */
  4328  case XDP_DROP:
  4329  do_drop:
  4330  kfree_skb(skb);
  4331  break;
  4332  }
  4333  
  4334  return act;
  4335  }
  4336  

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCH net-next v4 4/4] act_mirred: use TC_ACT_REINSERT when possible

2018-07-28 Thread kbuild test robot
Hi Paolo,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Paolo-Abeni/TC-refactor-act_mirred-packets-re-injection/20180729-102154
config: xtensa-allyesconfig (attached as .config)
compiler: xtensa-linux-gcc (GCC) 8.1.0
reproduce:
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
GCC_VERSION=8.1.0 make.cross ARCH=xtensa 

Note: it may well be a FALSE warning. FWIW you are at least aware of it now.
http://gcc.gnu.org/wiki/Better_Uninitialized_Warnings

All warnings (new ones prefixed by >>):

   net/sched/act_mirred.c: In function 'tcf_mirred':
>> net/sched/act_mirred.c:268:6: warning: 'is_redirect' may be used uninitialized in this function [-Wmaybe-uninitialized]
  if (is_redirect)
 ^

vim +/is_redirect +268 net/sched/act_mirred.c

   182  
   183  static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a,
   184  struct tcf_result *res)
   185  {
   186  struct tcf_mirred *m = to_mirred(a);
   187  struct sk_buff *skb2 = skb;
   188  bool m_mac_header_xmit;
   189  struct net_device *dev;
   190  int retval, err = 0;
   191  bool use_reinsert;
   192  bool want_ingress;
   193  bool is_redirect;
   194  int m_eaction;
   195  int mac_len;
   196  
   197  tcf_lastuse_update(&m->tcf_tm);
   198  bstats_cpu_update(this_cpu_ptr(m->common.cpu_bstats), skb);
   199  
   200  m_mac_header_xmit = READ_ONCE(m->tcfm_mac_header_xmit);
   201  m_eaction = READ_ONCE(m->tcfm_eaction);
   202  retval = READ_ONCE(m->tcf_action);
   203  dev = rcu_dereference_bh(m->tcfm_dev);
   204  if (unlikely(!dev)) {
   205  pr_notice_once("tc mirred: target device is gone\n");
   206  goto out;
   207  }
   208  
   209  if (unlikely(!(dev->flags & IFF_UP))) {
   210  net_notice_ratelimited("tc mirred to Houston: device %s is down\n",
   211 dev->name);
   212  goto out;
   213  }
   214  
   215  /* we could easily avoid the clone only if called by ingress and clsact;
   216   * since we can't easily detect the clsact caller, skip clone only for
   217   * ingress - that covers the TC S/W datapath.
   218   */
   219  is_redirect = tcf_mirred_is_act_redirect(m_eaction);
   220  use_reinsert = skb_at_tc_ingress(skb) && is_redirect &&
   221 tcf_mirred_can_reinsert(retval);
   222  if (!use_reinsert) {
   223  skb2 = skb_clone(skb, GFP_ATOMIC);
   224  if (!skb2)
   225  goto out;
   226  }
   227  
   228  /* If action's target direction differs than filter's direction,
   229   * and devices expect a mac header on xmit, then mac push/pull is
   230   * needed.
   231   */
   232  want_ingress = tcf_mirred_act_wants_ingress(m_eaction);
   233  if (skb_at_tc_ingress(skb) != want_ingress && m_mac_header_xmit) {
   234  if (!skb_at_tc_ingress(skb)) {
   235  /* caught at egress, act ingress: pull mac */
   236  mac_len = skb_network_header(skb) - skb_mac_header(skb);
   237  skb_pull_rcsum(skb2, mac_len);
   238  } else {
   239  /* caught at ingress, act egress: push mac */
   240  skb_push_rcsum(skb2, skb->mac_len);
   241  }
   242  }
   243  
   244  skb2->skb_iif = skb->dev->ifindex;
   245  skb2->dev = dev;
   246  
   247  /* mirror is always swallowed */
   248  if (is_redirect) {
   249  skb2->tc_redirected = 1;
   250  skb2->tc_from_ingress = skb2->tc_at_ingress;
   251  
   252  /* let's the caller reinsert the packet, if possible */
   253  if (use_reinsert) {
   254  res->ingress = want_ingress;
   255  res->qstats = this_cpu_ptr(m->common.cpu_qstats);
   256  return TC_ACT_REINSERT;
   257  }
   258  }
   259  
   260  if (!want_ingress)
   261  err = dev_queue_xmit(skb2);
   262  else
   263  err = netif_receive_skb(skb2);
   264  
   265  if (err) {
   266  out:
   267  qstats_overlimit_inc(this_cpu_ptr(m->common.cpu_qstats));
 > 268  if (is_redirect)
   269  retval = TC_ACT_SHOT;
   270  

pull-request: bpf 2018-07-28

2018-07-28 Thread Daniel Borkmann
Hi David,

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) API fixes for libbpf's BTF mapping of map key/value types in order
   to make them compatible with iproute2's BPF_ANNOTATE_KV_PAIR()
   markings, from Martin.

2) Fix AF_XDP to not report POLLIN prematurely by using the non-cached
   consumer pointer of the RX queue, from Björn.

3) Fix __xdp_return() to check for NULL pointer after the rhashtable
   lookup that retrieves the allocator object, from Taehee.

4) Fix x86-32 JIT to adjust ebp register in prologue and epilogue
   by 4 bytes which got removed from overall stack usage, from Wang.

5) Fix bpf_skb_load_bytes_relative() length check to use actual
   packet length, from Daniel.

6) Fix uninitialized return code in libbpf bpf_perf_event_read_simple()
   handler, from Thomas.

Please consider pulling these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git

Thanks a lot!



The following changes since commit 1a4f14bab1868b443f0dd3c55b689a478f82e72e:

  Merge branch 'tcp-robust-ooo' (2018-07-23 12:01:48 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git 

for you to fetch changes up to 71eb5255f55bdb484d35ff7c9a1803f453dfbf82:

  bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog() (2018-07-28 21:23:24 +0200)


Björn Töpel (1):
  xsk: fix poll/POLLIN premature returns

Daniel Borkmann (2):
  Merge branch 'bpf-annotate-kv-pair'
  bpf: fix bpf_skb_load_bytes_relative pkt length check

Martin KaFai Lau (5):
  bpf: btf: Ensure the member->offset is in the right order
  bpf: btf: Sync uapi btf.h to tools
  bpf: Replace [u]int32_t and [u]int64_t in libbpf
  bpf: Introduce BPF_ANNOTATE_KV_PAIR
  bpf: btf: Use exact btf value_size match in map_check_btf()

Taehee Yoo (2):
  xdp: add NULL pointer check in __xdp_return()
  bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog()

Thomas Richter (1):
  perf build: Build error in libbpf missing initialization

Wang YanQing (1):
  bpf, x32: Fix regression caused by commit 24dea04767e6

 arch/x86/net/bpf_jit_comp32.c|   8 +-
 kernel/bpf/arraymap.c|   2 +-
 kernel/bpf/btf.c |  14 +++-
 net/core/filter.c|  12 +--
 net/core/lwt_bpf.c   |   2 +-
 net/core/xdp.c   |   3 +-
 net/xdp/xsk_queue.h  |   2 +-
 tools/include/uapi/linux/btf.h   |   2 +-
 tools/lib/bpf/btf.c  |  39 +
 tools/lib/bpf/btf.h  |  10 ++-
 tools/lib/bpf/libbpf.c   |  87 ++--
 tools/lib/bpf/libbpf.h   |   4 +-
 tools/testing/selftests/bpf/bpf_helpers.h|   9 +++
 tools/testing/selftests/bpf/test_btf.c   | 114 ++-
 tools/testing/selftests/bpf/test_btf_haskv.c |   7 +-
 15 files changed, 225 insertions(+), 90 deletions(-)


Re: [PATCH net] ipv4: remove BUG_ON() from fib_compute_spec_dst

2018-07-28 Thread David Miller
From: Lorenzo Bianconi 
Date: Fri, 27 Jul 2018 18:15:46 +0200

> Remove BUG_ON() from the fib_compute_spec_dst routine and check the
> in_dev pointer during flowi4 data structure initialization.
> The fib_compute_spec_dst routine can run concurrently with device removal,
> where the ip_ptr net_device pointer is set to NULL. This can happen
> if userspace enables pkt info on a UDP rx socket and the device
> is removed while traffic is flowing.
> 
> Fixes: 35ebf65e851c ("ipv4: Create and use fib_compute_spec_dst() helper")
> Signed-off-by: Lorenzo Bianconi 

Applied and queued up for -stable, thank you.


Re: [PATCH net] enic: handle mtu change for vf properly

2018-07-28 Thread David Miller
From: Govindarajulu Varadarajan 
Date: Fri, 27 Jul 2018 11:19:29 -0700

> When the driver gets a notification for an MTU change, it does not handle it
> for all RQs; it handles only RQ[0].
> 
> The fix is to use the enic_change_mtu() interface to change the MTU for the VF.
> 
> Signed-off-by: Govindarajulu Varadarajan 

Applied, thank you.


[PATCH net] openvswitch: meter: Fix setting meter id for new entries

2018-07-28 Thread Justin Pettit
The meter code would create an entry for each new meter.  However, it
would not set the meter id in the new entry, so every meter would appear
to have a meter id of zero.  This commit properly sets the meter id when
adding the entry.

Fixes: 96fbc13d7e77 ("openvswitch: Add meter infrastructure")
Signed-off-by: Justin Pettit 
Cc: Andy Zhou 
---
 net/openvswitch/meter.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/openvswitch/meter.c b/net/openvswitch/meter.c
index b891a91577f8..c038e021a591 100644
--- a/net/openvswitch/meter.c
+++ b/net/openvswitch/meter.c
@@ -211,6 +211,7 @@ static struct dp_meter *dp_meter_create(struct nlattr **a)
if (!meter)
return ERR_PTR(-ENOMEM);
 
+   meter->id = nla_get_u32(a[OVS_METER_ATTR_ID]);
meter->used = div_u64(ktime_get_ns(), 1000 * 1000);
meter->kbps = a[OVS_METER_ATTR_KBPS] ? 1 : 0;
meter->keep_stats = !a[OVS_METER_ATTR_CLEAR];
@@ -280,6 +281,10 @@ static int ovs_meter_cmd_set(struct sk_buff *skb, struct genl_info *info)
u32 meter_id;
bool failed;
 
+   if (!a[OVS_METER_ATTR_ID]) {
+   return -ENODEV;
+   }
+
meter = dp_meter_create(a);
if (IS_ERR_OR_NULL(meter))
return PTR_ERR(meter);
@@ -298,11 +303,6 @@ static int ovs_meter_cmd_set(struct sk_buff *skb, struct genl_info *info)
goto exit_unlock;
}
 
-   if (!a[OVS_METER_ATTR_ID]) {
-   err = -ENODEV;
-   goto exit_unlock;
-   }
-
meter_id = nla_get_u32(a[OVS_METER_ATTR_ID]);
 
/* Cannot fail after this. */
-- 
2.17.1



Re: [PATCH net] nfp: flower: fix port metadata conversion bug

2018-07-28 Thread David Miller
From: Jakub Kicinski 
Date: Fri, 27 Jul 2018 20:56:52 -0700

> From: John Hurley 
> 
> Function nfp_flower_repr_get_type_and_port expects an enum nfp_repr_type
> return value but, if the repr type is unknown, returns a value of type
> enum nfp_flower_cmsg_port_type.  This means that if the FW encodes the port
> ID in a way the driver does not understand, then instead of dropping the
> frame the driver may attribute it to a physical port (uplink), provided the
> port number is less than the physical port count.
> 
> Fix this and ensure a net_device of NULL is returned if the repr cannot
> be determined.
> 
> Fixes: 1025351a88a4 ("nfp: add flower app")
> Signed-off-by: John Hurley 
> Signed-off-by: Jakub Kicinski 
> ---
> This is low impact and unlikely, but the fix is also trivial, so either
> net or net-next works.

Applied to 'net', thanks.


[PATCH iproute2-next] Add tc(8) userspace support for SKB Priority qdisc

2018-07-28 Thread Nishanth Devarajan
sch_skbprio is a qdisc that prioritizes packets according to their skb->priority
field. Under congestion, it drops already-enqueued lower priority packets to
make space available for higher priority packets. Skbprio was conceived as a
solution for denial-of-service defenses that need to route packets with
different priorities as a means to overcome DoS attacks.

Signed-off-by: Nishanth Devarajan 
Reviewed-by: Michel Machado 
---
 include/uapi/linux/pkt_sched.h |  7 
 man/man8/tc-skbprio.8  | 70 
 tc/Makefile|  1 +
 tc/q_skbprio.c | 81 ++
 4 files changed, 159 insertions(+)
 create mode 100644 man/man8/tc-skbprio.8
 create mode 100644 tc/q_skbprio.c

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 37b5096..81af99e 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -124,6 +124,12 @@ struct tc_fifo_qopt {
__u32   limit;  /* Queue length: bytes for bfifo, packets for pfifo */
 };
 
+/* SKBPRIO section */
+
+struct tc_skbprio_qopt {
+   __u32   limit;  /* Queue length in packets. */
+};
+
 /* PRIO section */
 
 #define TCQ_PRIO_BANDS 16
@@ -256,6 +262,7 @@ struct tc_red_qopt {
 #define TC_RED_ECN 1
 #define TC_RED_HARDDROP2
 #define TC_RED_ADAPTATIVE  4
+#define TC_RED_OFFLOADED   8
 };
 
 struct tc_red_xstats {
diff --git a/man/man8/tc-skbprio.8 b/man/man8/tc-skbprio.8
new file mode 100644
index 000..ae4f9e1
--- /dev/null
+++ b/man/man8/tc-skbprio.8
@@ -0,0 +1,70 @@
+.TH SKBPRIO 8 "27 July 2018" "iproute2" "Linux"
+.SH NAME
+skbprio \- SKB Priority Queue
+
+.SH SYNOPSIS
+.B tc qdisc ... add skbprio
+.B [ limit
+packets
+.B ]
+
+.SH DESCRIPTION
+SKB Priority Queue is a queueing discipline intended to prioritize
+the most important packets during a denial-of-service (
+.B DoS
+) attack. The priority of a packet is given by
+.B skb->priority
+, where a higher value places the packet closer to the exit of the queue. When
+the queue is full, the lowest priority packet in the queue is dropped to make
+room for the packet to be added if it has higher priority. If the packet to be
+added has lower priority than all packets in the queue, it is dropped.
+
+Without SKB priority queue, queue length limits must be imposed
+on individual sub-queues, and there is no straightforward way to enforce
+a global queue length limit across all priorities. SKBprio queue enforces a
+global queue length limit while not restricting the lengths of individual
+sub-queues.
+
+SKB Priority Queue is agnostic to how
+.B skb->priority
+is assigned. A typical use case is to copy
+the 6-bit DS field of IPv4 and IPv6 packets using
+.BR tc-skbedit (8)
+. If
+.B skb->priority
+is greater than or equal to 64, the priority is assumed to be 63.
+Priorities less than 64 are taken at face value.
+
+SKB Priority Queue enables routers to locally decide which
+packets to drop under a DoS attack.
+Priorities should be assigned to packets such that the more a source
+behaves as expected, the higher the priority of its packets,
+so sources have an incentive to play by the rules.
+
+.SH ALGORITHM
+
+Skbprio maintains 64 lists (priorities go from 0 to 63).
+When a packet is enqueued, it gets inserted at the
+.B tail
+of its priority list. When a packet needs to be sent out to the network, it is
+taken from the head of the highest priority list. When the queue is full,
+the packet at the tail of the lowest priority list is dropped to serve the
+ingress packet if it is of higher priority; otherwise, the ingress packet is
+dropped. This algorithm allocates as much bandwidth as possible to high
+priority packets, while only servicing low priority packets when
+there is enough bandwidth.
+
+.SH PARAMETERS
+.TP
+limit
+Maximum queue size specified in packets. It defaults to 64.
+The range for this parameter is [0, UINT32_MAX].
+
+.SH SEE ALSO
+.BR tc-prio (8),
+.BR tc-skbedit (8)
+
+.SH AUTHORS
+Nishanth Devarajan , Michel Machado 
+
+This manpage maintained by Bert Hubert 
diff --git a/tc/Makefile b/tc/Makefile
index dfd0026..7646105 100644
--- a/tc/Makefile
+++ b/tc/Makefile
@@ -71,6 +71,7 @@ TCMODULES += q_clsact.o
 TCMODULES += e_bpf.o
 TCMODULES += f_matchall.o
 TCMODULES += q_cbs.o
+TCMODULES += q_skbprio.o
 
 TCSO :=
 ifeq ($(TC_CONFIG_ATM),y)
diff --git a/tc/q_skbprio.c b/tc/q_skbprio.c
new file mode 100644
index 000..a2a5077
--- /dev/null
+++ b/tc/q_skbprio.c
@@ -0,0 +1,81 @@
+/*
+ * q_skbprio.c SKB PRIORITY QUEUE.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * Authors:Nishanth Devarajan, 
+ *
+ */
+
+#include 
+#include 
+#include 
+#inclu

Re: [PATCH bpf] bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog()

2018-07-28 Thread Daniel Borkmann
On 07/28/2018 05:28 PM, Taehee Yoo wrote:
> bpf_parse_prog() is called under rcu_read_lock(),
> so GFP_KERNEL allocations are not allowed in bpf_parse_prog().
> 
> [51015.579396] =
> [51015.579418] WARNING: suspicious RCU usage
> [51015.579444] 4.18.0-rc6+ #208 Not tainted
> [51015.579464] -
> [51015.579488] ./include/linux/rcupdate.h:303 Illegal context switch in RCU read-side critical section!
> [51015.579510] other info that might help us debug this:
> [51015.579532] rcu_scheduler_active = 2, debug_locks = 1
> [51015.579556] 2 locks held by ip/1861:
> [51015.579577]  #0: a8c12fd1 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x2e0/0x910
> [51015.579711]  #1: bf815f8e (rcu_read_lock){}, at: lwtunnel_build_state+0x96/0x390
> [51015.579842] stack backtrace:
> [51015.579869] CPU: 0 PID: 1861 Comm: ip Not tainted 4.18.0-rc6+ #208
> [51015.579891] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
> [51015.579911] Call Trace:
> [51015.579950]  dump_stack+0x74/0xbb
> [51015.58]  ___might_sleep+0x16b/0x3a0
> [51015.580047]  __kmalloc_track_caller+0x220/0x380
> [51015.580077]  kmemdup+0x1c/0x40
> [51015.580077]  bpf_parse_prog+0x10e/0x230
> [51015.580164]  ? kasan_kmalloc+0xa0/0xd0
> [51015.580164]  ? bpf_destroy_state+0x30/0x30
> [51015.580164]  ? bpf_build_state+0xe2/0x3e0
> [51015.580164]  bpf_build_state+0x1bb/0x3e0
> [51015.580164]  ? bpf_parse_prog+0x230/0x230
> [51015.580164]  ? lock_is_held_type+0x123/0x1a0
> [51015.580164]  lwtunnel_build_state+0x1aa/0x390
> [51015.580164]  fib_create_info+0x1579/0x33d0
> [51015.580164]  ? sched_clock_local+0xe2/0x150
> [51015.580164]  ? fib_info_update_nh_saddr+0x1f0/0x1f0
> [51015.580164]  ? sched_clock_local+0xe2/0x150
> [51015.580164]  fib_table_insert+0x201/0x1990
> [51015.580164]  ? lock_downgrade+0x610/0x610
> [51015.580164]  ? fib_table_lookup+0x1920/0x1920
> [51015.580164]  ? lwtunnel_valid_encap_type.part.6+0xcb/0x3a0
> [51015.580164]  ? rtm_to_fib_config+0x637/0xbd0
> [51015.580164]  inet_rtm_newroute+0xed/0x1b0
> [51015.580164]  ? rtm_to_fib_config+0xbd0/0xbd0
> [51015.580164]  rtnetlink_rcv_msg+0x331/0x910
> [ ... ]
> 
> Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
> Signed-off-by: Taehee Yoo 

Applied to bpf, thanks Taehee!


Re: [PATCH bpf] tools/bpftool: fix a percpu_array map dump problem

2018-07-28 Thread Daniel Borkmann
On 07/28/2018 01:11 AM, Yonghong Song wrote:
> I hit the following problem when I tried to use bpftool
> to dump a percpu array.
> 
>   $ sudo ./bpftool map show
>   61: percpu_array  name stub  flags 0x0
> key 4B  value 4B  max_entries 1  memlock 4096B
>   ...
>   $ sudo ./bpftool map dump id 61
>   bpftool: malloc.c:2406: sysmalloc: Assertion
>   `(old_top == initial_top (av) && old_size == 0) || \
>((unsigned long) (old_size) >= MINSIZE && \
>prev_inuse (old_top) && \
>((unsigned long) old_end & (pagesize - 1)) == 0)'
>   failed.
>   Aborted
> 
> Further debugging revealed that this is due to
> miscommunication between bpftool and kernel.
> For example, take the above percpu_array with a value size of 4B.
> The map info returned to user space reports a value size of 4B.
> 
> In bpftool, the values array for lookup is allocated like:
>info->value_size * get_possible_cpus() = 4 * get_possible_cpus()
> In kernel (kernel/bpf/syscall.c), the values array size is
> rounded up to multiple of 8.
>round_up(map->value_size, 8) * num_possible_cpus()
>= 8 * num_possible_cpus()
> So when the kernel copies the values to the user buffer, it
> writes beyond the end of that buffer.
> 
> This patch fixes the issue by allocating and stepping through
> the percpu map value array properly in bpftool.
> 
> Fixes: 71bb428fe2c19 ("tools: bpf: add bpftool")
> Signed-off-by: Yonghong Song 
> ---
>  tools/bpf/bpftool/map.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/bpf/bpftool/map.c b/tools/bpf/bpftool/map.c
> index 0ee3ba479d87..92bc55f98c4c 100644
> --- a/tools/bpf/bpftool/map.c
> +++ b/tools/bpf/bpftool/map.c
> @@ -35,6 +35,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -91,7 +92,8 @@ static bool map_is_map_of_progs(__u32 type)
>  static void *alloc_value(struct bpf_map_info *info)
>  {
>   if (map_is_per_cpu(info->type))
> - return malloc(info->value_size * get_possible_cpus());
> + return malloc(round_up(info->value_size, 8) *
> +   get_possible_cpus());
>   else
>   return malloc(info->value_size);
>  }
> @@ -273,9 +275,10 @@ static void print_entry_json(struct bpf_map_info *info, unsigned char *key,
>   do_dump_btf(&d, info, key, value);
>   }
>   } else {
> - unsigned int i, n;
> + unsigned int i, n, step;
>  
>   n = get_possible_cpus();
> + step = round_up(info->value_size, 8);
>  
>   jsonw_name(json_wtr, "key");
>   print_hex_data_json(key, info->key_size);
> @@ -288,7 +291,7 @@ static void print_entry_json(struct bpf_map_info *info, unsigned char *key,
>   jsonw_int_field(json_wtr, "cpu", i);
>  
>   jsonw_name(json_wtr, "value");
> - print_hex_data_json(value + i * info->value_size,
> + print_hex_data_json(value + i * step,
>   info->value_size);
>  
>   jsonw_end_object(json_wtr);

Fix looks correct to me, but you would also need the same fix in 
print_entry_plain(), no?

Thanks,
Daniel


Re: [PATCH][net-next] openvswitch: eliminate cpu_used_mask from sw_flow

2018-07-28 Thread Pravin Shelar
On Fri, Jul 27, 2018 at 1:03 AM, Li RongQing  wrote:
> The size of struct cpumask varies with CONFIG_NR_CPUS. On some configs
> CONFIG_NR_CPUS is very large, like 5120, so struct cpumask takes
> 640 bytes; with thousands of flows, this takes a lot of
> memory
>
I am fine with removing cpumask bitmap from flow struct.

> cpu_used_mask has two purposes
> 1: It assumes the first CPU is cpu0, which may not be true; now use
>    cpumask_first(cpu_possible_mask)

I am not sure about this; most systems would have CPU zero, so why is
this change done in this patch? It adds the overhead of calculating the
first CPU when updating stats in the fast path.

> 2: when getting/clearing statistics, it reduces the iteration; but that
>    is not a hot path, so use for_each_possible_cpu
>


[PATCH bpf] bpf: fix bpf_skb_load_bytes_relative pkt length check

2018-07-28 Thread Daniel Borkmann
The len > skb_headlen(skb) check cannot be used as an upper bound for
the packet length, since it has no relation to the full linear packet
length when filtering is used from upper layers (e.g. in the case of
reuseport BPF programs): by then, skb->data and skb->len have already
been mangled by __skb_pull() and others.

Fixes: 4e1ec56cdc59 ("bpf: add skb_load_bytes_relative helper")
Signed-off-by: Daniel Borkmann 
Acked-by: Martin KaFai Lau 
---
 net/core/filter.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index 06da770..9dfd145 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -1712,24 +1712,26 @@ static const struct bpf_func_proto bpf_skb_load_bytes_proto = {
 BPF_CALL_5(bpf_skb_load_bytes_relative, const struct sk_buff *, skb,
   u32, offset, void *, to, u32, len, u32, start_header)
 {
+   u8 *end = skb_tail_pointer(skb);
+   u8 *net = skb_network_header(skb);
+   u8 *mac = skb_mac_header(skb);
u8 *ptr;
 
-   if (unlikely(offset > 0x || len > skb_headlen(skb)))
+   if (unlikely(offset > 0x || len > (end - mac)))
goto err_clear;
 
switch (start_header) {
case BPF_HDR_START_MAC:
-   ptr = skb_mac_header(skb) + offset;
+   ptr = mac + offset;
break;
case BPF_HDR_START_NET:
-   ptr = skb_network_header(skb) + offset;
+   ptr = net + offset;
break;
default:
goto err_clear;
}
 
-   if (likely(ptr >= skb_mac_header(skb) &&
-  ptr + len <= skb_tail_pointer(skb))) {
+   if (likely(ptr >= mac && ptr + len <= end)) {
memcpy(to, ptr, len);
return 0;
}
-- 
2.9.5



Re: [patch net-next] net: sched: don't dump chains only held by actions

2018-07-28 Thread Cong Wang
On Sat, Jul 28, 2018 at 10:20 AM Cong Wang  wrote:
>
> On Fri, Jul 27, 2018 at 12:47 AM Jiri Pirko  wrote:
> >
> > From: Jiri Pirko 
> >
> > In case a chain is empty and not explicitly created by a user,
> > such chain should not exist. The only exception is if there is
> > an action "goto chain" pointing to it. In that case, don't show the
> > chain in the dump. Track the chain references held by actions and
> > use them to find out if a chain should or should not be shown
> > in chain dump.
> >
> > Signed-off-by: Jiri Pirko 
>
> Looks reasonable to me.
>
> Acked-by: Cong Wang 

Hold on...

If you increase the refcnt for a zombie chain on the NEWCHAIN path,
then it becomes a non-zombie; this makes sense. However, if the
action_refcnt gets increased again when another action uses it,
does it become a zombie again because refcnt == action_refcnt??


Re: [patch net-next] net: sched: don't dump chains only held by actions

2018-07-28 Thread Cong Wang
On Fri, Jul 27, 2018 at 12:47 AM Jiri Pirko  wrote:
>
> From: Jiri Pirko 
>
> In case a chain is empty and not explicitly created by a user,
> such chain should not exist. The only exception is if there is
> an action "goto chain" pointing to it. In that case, don't show the
> chain in the dump. Track the chain references held by actions and
> use them to find out if a chain should or should not be shown
> in chain dump.
>
> Signed-off-by: Jiri Pirko 

Looks reasonable to me.

Acked-by: Cong Wang 


Re: [net-next 10/16] net/mlx5: Support PCIe buffer congestion handling via Devlink

2018-07-28 Thread Bjorn Helgaas
On Thu, Jul 26, 2018 at 07:00:20AM -0700, Alexander Duyck wrote:
> On Thu, Jul 26, 2018 at 12:14 AM, Jiri Pirko  wrote:
> > Thu, Jul 26, 2018 at 02:43:59AM CEST, jakub.kicin...@netronome.com wrote:
> >>On Wed, 25 Jul 2018 08:23:26 -0700, Alexander Duyck wrote:
> >>> On Wed, Jul 25, 2018 at 5:31 AM, Eran Ben Elisha wrote:
> >>> > On 7/24/2018 10:51 PM, Jakub Kicinski wrote:
> >>>  The devlink params haven't been upstream even for a full cycle and
> >>>  already you guys are starting to use them to configure standard
> >>>  features like queuing.
> >>> >>>
> >>> >>> We developed the devlink params in order to support non-standard
> >>> >>> configuration only. And for non-standard, there are generic and vendor
> >>> >>> specific options.
> >>> >>
> >>> >> I thought it was developed for performing non-standard and possibly
> >>> >> vendor specific configuration.  Look at DEVLINK_PARAM_GENERIC_* for
> >>> >> examples of well justified generic options for which we have no
> >>> >> other API.  The vendor mlx4 options look fairly vendor specific if you
> >>> >> ask me, too.
> >>> >>
> >>> >> Configuring queuing has an API.  The question is it acceptable to enter
> >>> >> into the risky territory of controlling offloads via devlink parameters
> >>> >> or would we rather make vendors take the time and effort to model
> >>> >> things to (a subset) of existing APIs.  The HW never fits the APIs
> >>> >> perfectly.
> >>> >
> >>> > I understand what you meant here, I would like to highlight that this
> >>> > mechanism was not meant to handle SRIOV, Representors, etc.
> >>> > The vendor specific configuration suggested here is to handle a congestion
> >>> > state in a Multi Host environment (which includes PF and multiple VFs per
> >>> > host), where one host is not aware of the other hosts, and each is running
> >>> > on its own pci/driver. It is a device working mode configuration.
> >>> >
> >>> > This couldn't fit into any existing API, thus creating this vendor specific
> >>> > unique API is needed.
> >>>
> >>> If we are just going to start creating devlink interfaces for every
> >>> one-off option a device wants to add why did we even bother with
> >>> trying to prevent drivers from using sysfs? This just feels like we
> >>> are back to the same arguments we had back in the day with it.
> >>>
> >>> I feel like the bigger question here is if devlink is how we are going
> >>> to deal with all PCIe related features going forward, or should we
> >>> start looking at creating a new interface/tool for PCI/PCIe related
> >>> features? My concern is that we have already had features such as DMA
> >>> Coalescing that didn't really fit into anything and now we are
> >>> starting to see other things related to DMA and PCIe bus credits. I'm
> >>> wondering if we shouldn't start looking at a tool/interface to
> >>> configure all the PCIe related features such as interrupts, error
> >>> reporting, DMA configuration, power management, etc. Maybe we could
> >>> even look at sharing it across subsystems and include things like
> >>> storage, graphics, and other subsystems in the conversation.
> >>
> >>Agreed, for actual PCIe configuration (i.e. not ECN marking) we do need
> >>to build up an API.  Sharing it across subsystems would be very cool!

I read the thread (starting at [1], for anybody else coming in late)
and I see this has something to do with "configuring outbound PCIe
buffers", but I haven't seen the connection to PCIe protocol or
features, i.e., I can't connect this to anything in the PCIe spec.

Can somebody help me understand how the PCI core is relevant?  If
there's some connection with a feature defined by PCIe, or if it
affects the PCIe transaction protocol somehow, I'm definitely
interested in this.  But if this only affects the data transferred
over PCIe, i.e., the data payloads of PCIe TLP packets, then I'm not
sure why the PCI core should care.

> > I wonder how come there isn't such an API in place already. Or is there?
> > If there is not, do you have any idea what it should look like? Should it be
> > an extension of the existing PCI uapi or something completely new?
> > It would be probably good to loop some PCI people in...
> 
> The closest thing I can think of in terms of answering your questions
> as to why we haven't seen anything like that would be setpci.
> Basically with that tool you can go through the PCI configuration
> space and update any piece you want. The problem is it can have
> effects on the driver and I don't recall there ever being any sort of
> notification mechanism added to make a driver aware of configuration
> updates.

setpci is a development and debugging tool, not something we should
use as the standard way of configuring things.  Use of setpci should
probably taint the kernel because the PCI core configures features
like MPS, ASPM, AER, etc., based on the assumption that nobody else is
changing things in PCI config space.

> As far as the interface I 

[PATCH bpf] bpf: use GFP_ATOMIC instead of GFP_KERNEL in bpf_parse_prog()

2018-07-28 Thread Taehee Yoo
bpf_parse_prog() is called under rcu_read_lock(), so GFP_KERNEL is
not allowed in bpf_parse_prog().

[51015.579396] =
[51015.579418] WARNING: suspicious RCU usage
[51015.579444] 4.18.0-rc6+ #208 Not tainted
[51015.579464] -
[51015.579488] ./include/linux/rcupdate.h:303 Illegal context switch in RCU read-side critical section!
[51015.579510] other info that might help us debug this:
[51015.579532] rcu_scheduler_active = 2, debug_locks = 1
[51015.579556] 2 locks held by ip/1861:
[51015.579577]  #0: a8c12fd1 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x2e0/0x910
[51015.579711]  #1: bf815f8e (rcu_read_lock){}, at: lwtunnel_build_state+0x96/0x390
[51015.579842] stack backtrace:
[51015.579869] CPU: 0 PID: 1861 Comm: ip Not tainted 4.18.0-rc6+ #208
[51015.579891] Hardware name: To be filled by O.E.M. To be filled by O.E.M./Aptio CRB, BIOS 5.6.5 07/08/2015
[51015.579911] Call Trace:
[51015.579950]  dump_stack+0x74/0xbb
[51015.58]  ___might_sleep+0x16b/0x3a0
[51015.580047]  __kmalloc_track_caller+0x220/0x380
[51015.580077]  kmemdup+0x1c/0x40
[51015.580077]  bpf_parse_prog+0x10e/0x230
[51015.580164]  ? kasan_kmalloc+0xa0/0xd0
[51015.580164]  ? bpf_destroy_state+0x30/0x30
[51015.580164]  ? bpf_build_state+0xe2/0x3e0
[51015.580164]  bpf_build_state+0x1bb/0x3e0
[51015.580164]  ? bpf_parse_prog+0x230/0x230
[51015.580164]  ? lock_is_held_type+0x123/0x1a0
[51015.580164]  lwtunnel_build_state+0x1aa/0x390
[51015.580164]  fib_create_info+0x1579/0x33d0
[51015.580164]  ? sched_clock_local+0xe2/0x150
[51015.580164]  ? fib_info_update_nh_saddr+0x1f0/0x1f0
[51015.580164]  ? sched_clock_local+0xe2/0x150
[51015.580164]  fib_table_insert+0x201/0x1990
[51015.580164]  ? lock_downgrade+0x610/0x610
[51015.580164]  ? fib_table_lookup+0x1920/0x1920
[51015.580164]  ? lwtunnel_valid_encap_type.part.6+0xcb/0x3a0
[51015.580164]  ? rtm_to_fib_config+0x637/0xbd0
[51015.580164]  inet_rtm_newroute+0xed/0x1b0
[51015.580164]  ? rtm_to_fib_config+0xbd0/0xbd0
[51015.580164]  rtnetlink_rcv_msg+0x331/0x910
[ ... ]

Fixes: 3a0af8fd61f9 ("bpf: BPF for lightweight tunnel infrastructure")
Signed-off-by: Taehee Yoo 
---
 net/core/lwt_bpf.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/lwt_bpf.c b/net/core/lwt_bpf.c
index e7e626f..e450985 100644
--- a/net/core/lwt_bpf.c
+++ b/net/core/lwt_bpf.c
@@ -217,7 +217,7 @@ static int bpf_parse_prog(struct nlattr *attr, struct bpf_lwt_prog *prog,
if (!tb[LWT_BPF_PROG_FD] || !tb[LWT_BPF_PROG_NAME])
return -EINVAL;
 
-   prog->name = nla_memdup(tb[LWT_BPF_PROG_NAME], GFP_KERNEL);
+   prog->name = nla_memdup(tb[LWT_BPF_PROG_NAME], GFP_ATOMIC);
if (!prog->name)
return -ENOMEM;
 
-- 
2.9.3



[no subject]

2018-07-28 Thread Andrew Martinez


Do you need a loan of any kind? If yes, email us now for more info.
