[PATCH v1 3/3] net: skb_queue_purge(): lock/unlock the queue only once

2017-09-08 Thread Michael Witten
Thanks for your input, Eric Dumazet and Stephen Hemminger; based on
your observations, this version of the patch implements a very
lightweight purging of the queue.

To apply this patch, save this email to:

  /path/to/email

and then run:

  git am --scissors /path/to/email

You may also fetch this patch from GitHub:

  git checkout -b test 5969d1bb3082b41eba8fd2c826559abe38ccb6df
  git pull https://github.com/mfwitten/linux.git net/tcp-ip/01-cleanup/02

Sincerely,
Michael Witten

8<8<8<8<8<8<8<8<8<8<8<8<8<

Hitherto, the queue's lock has been locked/unlocked every time
an item is dequeued; this seems not only inefficient, but also
incorrect, as the whole point of `skb_queue_purge()' is to clear
the queue, presumably without giving any other thread a chance to
manipulate the queue in the interim.

With this commit, the queue's lock is locked/unlocked only once
when `skb_queue_purge()' is called, and in a way that disables
the IRQs for only a minimal amount of time.

This is achieved by atomically re-initializing the queue (thereby
clearing it), and then freeing each of the items as though it were
enqueued in a private queue that doesn't require locking.

Signed-off-by: Michael Witten 
---
 net/core/skbuff.c | 26 ++
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 68065d7d383f..bd26b0bde784 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2825,18 +2825,28 @@ struct sk_buff *skb_dequeue_tail(struct sk_buff_head 
*list)
 EXPORT_SYMBOL(skb_dequeue_tail);
 
 /**
- * skb_queue_purge - empty a list
- * @list: list to empty
+ * skb_queue_purge - empty a queue
+ * @q: the queue to empty
  *
- * Delete all buffers on an &sk_buff list. Each buffer is removed from
- * the list and one reference dropped. This function takes the list
- * lock and is atomic with respect to other list locking functions.
+ * Dequeue and free each socket buffer that is in @q.
+ *
+ * This function is atomic with respect to other queue-locking functions.
  */
-void skb_queue_purge(struct sk_buff_head *list)
+void skb_queue_purge(struct sk_buff_head *q)
 {
-   struct sk_buff *skb;
-   while ((skb = skb_dequeue(list)) != NULL)
+   unsigned long flags;
+   struct sk_buff *skb, *next, *head = (struct sk_buff *)q;
+
+   spin_lock_irqsave(&q->lock, flags);
+   skb = q->next;
+   __skb_queue_head_init(q);
+   spin_unlock_irqrestore(&q->lock, flags);
+
+   while (skb != head) {
+   next = skb->next;
kfree_skb(skb);
+   skb = next;
+   }
 }
 EXPORT_SYMBOL(skb_queue_purge);
 
-- 
2.14.1



Re: [PATCH net-next v8] openvswitch: enable NSH support

2017-09-08 Thread kbuild test robot
Hi Yi,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Yi-Yang/openvswitch-enable-NSH-support/20170909-124643
config: i386-randconfig-a1-201736 (attached as .config)
compiler: gcc-5 (Debian 5.4.1-2) 5.4.1 20160904
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All errors (new ones prefixed by >>):

   net/openvswitch/actions.o: In function `do_execute_actions':
>> actions.c:(.text+0x19a7): undefined reference to `skb_push_nsh'
>> actions.c:(.text+0x19ee): undefined reference to `skb_pop_nsh'

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation


.config.gz
Description: application/gzip


Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-08 Thread David Miller
From: 严海双 
Date: Sat, 9 Sep 2017 13:09:57 +0800

> 
> 
>> On 2017年9月9日, at 下午12:35, Cong Wang  wrote:
>> 
>> On Fri, Sep 8, 2017 at 6:25 PM, 严海双  
>> wrote:
>>> 
>>> 
 On 2017年9月9日, at 上午6:13, Cong Wang  wrote:
 
 On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
  wrote:
> Different namespace application might require different maximal number
> of TCP sockets independently of the host.
 
 So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
 in a whole system, right? This just makes OOM easier to trigger.
 
>>> 
>>> From my understanding, before the patch, we had N * 
>>> net->ipv4.sysctl_tcp_max_orphans,
>>> and after the patch, we could have ns1.sysctl_tcp_max_orphans + 
>>> ns2.sysctl_tcp_max_orphans
>>> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.
>> 
>> Nope, by N I mean the number of containers. Before your patch, the limit
>> is global, after your patch it is per container.
>> 
> 
> Yeah, for example, if there is N containers, before the patch, I mean the 
> limit is:
> 
>   N * net->ipv4.sysctl_tcp_max_orphans
> 
> After the patch, the limit is:
> 
>   ns1. net->ipv4.sysctl_tcp_max_orphans + ns2. 
> net->ipv4.sysctl_tcp_max_orphans + …

Not true.

Please remove "N" from your equation of the current situation.

"sysctl_tcp_max_orphans" applies to entire system, it is a global limit,
comparing one limit against all orphans in the system, there is no N.


Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-08 Thread 严海双


> On 2017年9月9日, at 下午12:35, Cong Wang  wrote:
> 
> On Fri, Sep 8, 2017 at 6:25 PM, 严海双  wrote:
>> 
>> 
>>> On 2017年9月9日, at 上午6:13, Cong Wang  wrote:
>>> 
>>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>>>  wrote:
 Different namespace application might require different maximal number
 of TCP sockets independently of the host.
>>> 
>>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>>> in a whole system, right? This just makes OOM easier to trigger.
>>> 
>> 
>> From my understanding, before the patch, we had N * 
>> net->ipv4.sysctl_tcp_max_orphans,
>> and after the patch, we could have ns1.sysctl_tcp_max_orphans + 
>> ns2.sysctl_tcp_max_orphans
>> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.
> 
> Nope, by N I mean the number of containers. Before your patch, the limit
> is global, after your patch it is per container.
> 

Yeah, for example, if there is N containers, before the patch, I mean the limit 
is:

N * net->ipv4.sysctl_tcp_max_orphans

After the patch, the limit is:

ns1. net->ipv4.sysctl_tcp_max_orphans + ns2. 
net->ipv4.sysctl_tcp_max_orphans + …






Re: [PATCH net-next 1/1] sched: Use __qdisc_drop instead of kfree_skb in sch_prio and sch_qfq

2017-09-08 Thread David Miller
From: gfree.w...@vip.163.com
Date: Mon,  4 Sep 2017 14:21:12 +0800

> From: Gao Feng 
> 
> The commit 520ac30f4551 ("net_sched: drop packets after root qdisc lock
> is released) made a big change of tc for performance. There are two points
> left in sch_prio and sch_qfq which are not changed with that commit. Now
> enhance them now with __qdisc_drop.
> 
> Signed-off-by: Gao Feng 

Thanks for catching these missed cases, applied.


[GIT] Networking

2017-09-08 Thread David Miller

The iwlwifi firmware compat fix is in here as well as some other
stuff:

1) Fix request socket leak introduced by BPF deadlock fix, from Eric
   Dumazet.

2) Fix VLAN handling with TXQs in mac80211, from Johannes Berg.

3) Missing __qdisc_drop conversions in prio and qfq schedulers, from
   Gao Feng.

4) Use after free in netlink nlk groups handling, from Xin Long.

5) Handle MTU update properly in ipv6 gre tunnels, from Xin Long.

6) Fix leak of ipv6 fib tables on netns teardown, from Sabrina Dubroca
   with follow-on fix from Eric Dumazet.

7) Need RCU and preemption disabled during generic XDP data patch, from
   John Fastabend.

Please pull, thanks a lot!

The following changes since commit 80cee03bf1d626db0278271b505d7f5febb37bba:

  Merge branch 'linus' of 
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 (2017-09-06 
15:17:17 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git 

for you to fetch changes up to 9beb8bedb05c5f3a353dba62b8fa7cbbb9ec685e:

  bpf: make error reporting in bpf_warn_invalid_xdp_action more clear 
(2017-09-08 21:13:09 -0700)


Arnd Bergmann (1):
  isdn: isdnloop: fix logic error in isdnloop_sendbuf

Avraham Stern (1):
  mac80211: flush hw_roc_start work before cancelling the ROC

Baruch Siach (5):
  dt-binding: phy: don't confuse with Ethernet phy properties
  dt-bindings: net: don't confuse with generic PHY property
  dt-bindings: add SFF vendor prefix
  dt-binding: net: sfp binding documentation
  net: phy: sfp: rename dt properties to match the binding

Beni Lev (1):
  mac80211_hwsim: Use proper TX power

Chunho Lee (1):
  mac80211: Fix null pointer dereference with iTXQ support

Daniel Borkmann (2):
  bpf: don't select potentially stale ri->map from buggy xdp progs
  bpf: make error reporting in bpf_warn_invalid_xdp_action more clear

David S. Miller (4):
  Merge tag 'mac80211-for-davem-2017-09-07' of 
git://git.kernel.org/.../jberg/mac80211
  Merge tag 'wireless-drivers-for-davem-2017-09-08' of 
git://git.kernel.org/.../kvalo/wireless-drivers
  Merge git://git.kernel.org/.../pablo/nf
  Merge branch 'xdp-bpf-fixes'

Emmanuel Grumbach (1):
  cfg80211: honor NL80211_RRF_NO_HT40{MINUS,PLUS}

Eric Dumazet (2):
  tcp: fix a request socket leak
  ipv6: fix typo in fib6_net_exit()

Florian Fainelli (1):
  Revert "mdio_bus: Remove unneeded gpiod NULL check"

Florian Westphal (5):
  netfilter: nf_nat: don't bug when mapping already exists
  netfilter: xtables: add scheduling opportunity in get_counters
  netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to 
rhashtable"
  netfilter: nat: use keyed locks
  netfilter: core: remove erroneous warn_on

Gao Feng (1):
  sched: Use __qdisc_drop instead of kfree_skb in sch_prio and sch_qfq

Haishuang Yan (2):
  ip_tunnel: fix setting ttl and tos value in collect_md mode
  ip6_tunnel: fix setting hop_limit value for ipv6 tunnel

Håkon Bugge (1):
  rds: Fix incorrect statistics counting

Ian W MORRISON (1):
  brcmfmac: feature check for multi-scheduled scan fails on bcm4345 devices

Igor Mitsyanko (1):
  nl80211: look for HT/VHT capabilities in beacon's tail

Ilan peer (1):
  mac80211: Complete ampdu work schedule during session tear down

Ivan Khoronzhuk (1):
  net: ethernet: ti: netcp_core: no need in netif_napi_del

Jiri Pirko (1):
  net: sched: fix memleak for chain zero

Johannes Berg (3):
  mac80211: fix VLAN handling with TXQs
  mac80211: agg-tx: call drv_wake_tx_queue in proper context
  mac80211: fix deadlock in driver-managed RX BA session start

John Fastabend (3):
  net: rcu lock and preempt disable missing around generic xdp
  bpf: add support for sockmap detach programs
  bpf: devmap, use cond_resched instead of cpu_relax

Kees Cook (1):
  net: tulip: Constify tulip_tbl

Kleber Sacilotto de Souza (1):
  tipc: remove unnecessary call to dev_net()

Larry Finger (2):
  rtlwifi: btcoexist: Fix breakage of ant_sel for rtl8723be
  rtlwifi: btcoexist: Fix antenna selection code

Liad Kaufman (1):
  mac80211: add MESH IE in the correct order

Luca Coelho (1):
  iwlwifi: mvm: only send LEDS_CMD when the FW supports it

Marcelo Ricardo Leitner (1):
  sctp: fix missing wake ups in some situations

Mathieu Malaterre (1):
  davicom: Display proper debug level up to 6

Paolo Abeni (1):
  udp: drop head states only when all skb references are gone

Sabrina Dubroca (1):
  ipv6: fix memory leak with multiple tables during netns destruction

Sharon Dvir (1):
  mac80211: shorten debug prints using ht_dbg() to avoid warning

Simon Dinkin (1):
  mac80211: fix incorrect assignment of reassoc value

Vishwanath Pai (1):
  netfilter: xt_hashlimit: fix build error caused by 64bit division

Xin Long (

Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-08 Thread Cong Wang
On Fri, Sep 8, 2017 at 6:25 PM, 严海双  wrote:
>
>
>> On 2017年9月9日, at 上午6:13, Cong Wang  wrote:
>>
>> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>>  wrote:
>>> Different namespace application might require different maximal number
>>> of TCP sockets independently of the host.
>>
>> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
>> in a whole system, right? This just makes OOM easier to trigger.
>>
>
> From my understanding, before the patch, we had N * 
> net->ipv4.sysctl_tcp_max_orphans,
> and after the patch, we could have ns1.sysctl_tcp_max_orphans + 
> ns2.sysctl_tcp_max_orphans
> + ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.

Nope, by N I mean the number of containers. Before your patch, the limit
is global, after your patch it is per container.


Re: [net PATCH 0/3] Fixes for XDP/BPF

2017-09-08 Thread David Miller
From: John Fastabend 
Date: Fri, 08 Sep 2017 14:00:05 -0700

> The following fixes, UAPI updates, and small improvement,
> 
> i. XDP needs to be called inside RCU with preempt disabled.
> 
> ii. Not strictly a bug fix but we have an attach command in the
> sockmap UAPI already to avoid having a single kernel released with
> only the attach and not the detach I'm pushing this into net branch.
> Its early in the RC cycle so I think this is OK (not ideal but better
> than supporting a UAPI with a missing detach forever).
> 
> iii. Final patch replace cpu_relax with cond_resched in devmap.

Series applied, thanks John.


Re: [PATCH net] bpf: make error reporting in bpf_warn_invalid_xdp_action more clear

2017-09-08 Thread David Miller
From: Daniel Borkmann 
Date: Sat,  9 Sep 2017 01:40:35 +0200

> Differ between illegal XDP action code and just driver
> unsupported one to provide better feedback when we throw
> a one-time warning here. Reason is that with 814abfabef3c
> ("xdp: add bpf_redirect helper function") not all drivers
> support the new XDP return code yet and thus they will
> fall into their 'default' case when checking for return
> codes after program return, which then triggers a
> bpf_warn_invalid_xdp_action() stating that the return
> code is illegal, but from XDP perspective it's not.
> 
> I decided not to place something like a XDP_ACT_MAX define
> into uapi i) given we don't have this either for all other
> program types, ii) future action codes could have further
> encoding there, which would render such define unsuitable
> and we wouldn't be able to rip it out again, and iii) we
> rarely add new action codes.
> 
> Signed-off-by: Daniel Borkmann 
> Acked-by: Alexei Starovoitov 

Applied.


Re: [PATCH net] Revert "mdio_bus: Remove unneeded gpiod NULL check"

2017-09-08 Thread David Miller
From: Florian Fainelli 
Date: Fri,  8 Sep 2017 15:38:07 -0700

> This reverts commit 95b80bf3db03c2bf572a357cf74b9a6aefef0a4a ("mdio_bus:
> Remove unneeded gpiod NULL check"), this commit assumed that GPIOLIB
> checks for NULL descriptors, so it's safe to drop them, but it is not
> when CONFIG_GPIOLIB is disabled in the kernel. If we do call
> gpiod_set_value_cansleep() on a GPIO descriptor we will issue warnings
> coming from the inline stubs declared in include/linux/gpio/consumer.h.
> 
> Fixes: 95b80bf3db03 ("mdio_bus: Remove unneeded gpiod NULL check")
> Reported-by: Woojung Huh 
> Signed-off-by: Florian Fainelli 

Applied.


Re: [PATCH net] phy: mvebu-cp110: checking for NULL instead of IS_ERR()

2017-09-08 Thread David Miller
From: Dan Carpenter 
Date: Fri, 8 Sep 2017 13:31:37 +0300

> devm_ioremap_resource() never returns NULL, it only returns error
> pointers so this test needs to be changed.
> 
> Fixes: d0438bd6aa09 ("phy: add the mvebu cp110 comphy driver")
> Signed-off-by: Dan Carpenter 
> ---
> This driver apparently is going through the net tree, but netdev isn't
> listed as handling it in MAINTAINERS.  Kishon, do you know what's up
> with that?

Yeah let's sort this out before I apply this fix to my tree.


Re: [PATCH] net: ethernet: ti: netcp_core: no need in netif_napi_del

2017-09-08 Thread David Miller
From: Ivan Khoronzhuk 
Date: Thu,  7 Sep 2017 18:32:30 +0300

> Don't remove rx_napi specifically just before free_netdev(),
> it's supposed to be done in it and is confusing w/o tx_napi deletion.
> 
> Signed-off-by: Ivan Khoronzhuk 

Applied.


Re: [PATCH] net: tulip: Constify tulip_tbl

2017-09-08 Thread David Miller
From: Kees Cook 
Date: Thu, 7 Sep 2017 12:35:14 -0700

> It looks like all users of tulip_tbl are reads, so mark this table
> as read-only.
> 
> $ git grep tulip_tbl  # edited to avoid line-wraps...
> interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ...
> interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs&~RxPollInt, ...
> interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ...
> interrupt.c: iowrite32(tulip_tbl[tp->chip_id].valid_intrs | TimerInt,
> pnic.c:  iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7);
> tulip.h: extern struct tulip_chip_table tulip_tbl[];
> tulip_core.c:struct tulip_chip_table tulip_tbl[] = {
> tulip_core.c:iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR5);
> tulip_core.c:iowrite32(tulip_tbl[tp->chip_id].valid_intrs, ioaddr + CSR7);
> tulip_core.c:setup_timer(&tp->timer, tulip_tbl[tp->chip_id].media_timer,
> tulip_core.c:const char *chip_name = tulip_tbl[chip_idx].chip_name;
> tulip_core.c:if (pci_resource_len (pdev, 0) < tulip_tbl[chip_idx].io_size)
> tulip_core.c:ioaddr =  pci_iomap(..., tulip_tbl[chip_idx].io_size);
> tulip_core.c:tp->flags = tulip_tbl[chip_idx].flags;
> tulip_core.c:setup_timer(&tp->timer, tulip_tbl[tp->chip_id].media_timer,
> tulip_core.c:INIT_WORK(&tp->media_work, tulip_tbl[tp->chip_id].media_task);
> 
> Cc: "David S. Miller" 
> Cc: Jarod Wilson 
> Cc: "Gustavo A. R. Silva" 
> Cc: netdev@vger.kernel.org
> Cc: linux-par...@vger.kernel.org
> Signed-off-by: Kees Cook 

Applied.


Re: [PATCH] drivers: net: Display proper debug level up to 6

2017-09-08 Thread David Miller
From: Mathieu Malaterre 
Date: Thu,  7 Sep 2017 13:24:20 +0200

> This will make it explicit some messages are of the form:
> dm9000_dbg(db, 5, ...
> 
> Signed-off-by: Mathieu Malaterre 

Applied.


Re: [V2 PATCH net-next 0/2] Fixes for XDP_REDIRECT map

2017-09-08 Thread David Miller
From: Jesper Dangaard Brouer 
Date: Thu, 07 Sep 2017 14:33:08 +0200

> This my V2 of catching XDP_REDIRECT and bpf_redirect_map() API usage
> that can potentially crash the kernel.  Addressed Daniels feedback in
> patch01, and added patch02 which catch and cleanup dangling map
> pointers.
> 
> I know John and Daniel are working on a more long-term solution, of
> recording the bpf_prog pointer together with the map pointer.  I just
> wanted to propose these fixes as a stop-gap to the potential crashes.

Jesper if these are still relevant, please resubmit against the 'net'
tree, thanks!


Re: [PATCH net v4 3/3] net: phy: sfp: rename dt properties to match the binding

2017-09-08 Thread David Miller
From: Baruch Siach 
Date: Thu,  7 Sep 2017 12:25:50 +0300

> Make the Rx rate select control gpio property name match the documented
> binding. This would make the addition of 'rate-select1-gpios' for SFP+
> support more natural.
> 
> Also, make the MOD-DEF0 gpio property name match the documentation.
> 
> Signed-off-by: Baruch Siach 

Applied.


Re: [PATCH net v4 1/3] dt-bindings: add SFF vendor prefix

2017-09-08 Thread David Miller
From: Baruch Siach 
Date: Thu,  7 Sep 2017 12:25:48 +0300

> Acked-by: Rob Herring 
> Signed-off-by: Baruch Siach 

Applied.


Re: [PATCH net v4 2/3] dt-binding: net: sfp binding documentation

2017-09-08 Thread David Miller
From: Baruch Siach 
Date: Thu,  7 Sep 2017 12:25:49 +0300

> Add device-tree binding documentation SFP transceivers. Support for SFP
> transceivers has been recently introduced (drivers/net/phy/sfp.c).
> 
> Signed-off-by: Baruch Siach 

Applied.


Re: [PATCH] dt-bindings: net: don't confuse with generic PHY property

2017-09-08 Thread David Miller
From: Baruch Siach 
Date: Thu,  7 Sep 2017 11:09:59 +0300

> This complements commit 9a94b3a4bd (dt-binding: phy: don't confuse with
> Ethernet phy properties).
> 
> The generic PHY 'phys' property sometime appears in the same node with
> the Ethernet PHY 'phy' or 'phy-handle' properties. Add a warning in
> ethernet.txt to reduce confusion.
> 
> Signed-off-by: Baruch Siach 

Applied.


Re: [PATCH 1/2] ip_tunnel: fix setting ttl and tos value in collect_md mode

2017-09-08 Thread David Miller
From: Haishuang Yan 
Date: Thu,  7 Sep 2017 14:08:34 +0800

> ttl and tos variables are declared and assigned, but are not used in
> iptunnel_xmit() function.
> 
> Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
> Cc: Alexei Starovoitov 
> Signed-off-by: Haishuang Yan 

Applied.


Re: [PATCH 2/2] ip6_tunnel: fix setting hop_limit value for ipv6 tunnel

2017-09-08 Thread David Miller
From: Haishuang Yan 
Date: Thu,  7 Sep 2017 14:08:35 +0800

> Similar to vxlan/geneve tunnel, if hop_limit is zero, it should fall
> back to ip6_dst_hoplimt().
> 
> Signed-off-by: Haishuang Yan 

Applied.


Re: [PATCH net-next 3/3] net: phy: realtek: add RTL8201F phy-id and functions

2017-09-08 Thread Jassi Brar
On 9 September 2017 at 00:21, Florian Fainelli  wrote:
> On 09/08/2017 06:02 AM, Kunihiko Hayashi wrote:
>> From: Jassi Brar 
>>
>> Add RTL8201F phy-id and the related functions to the driver.
>>
>> The original patch is as follows:
>> https://patchwork.kernel.org/patch/2538341/
>>
>> Signed-off-by: Jongsung Kim 
>> Signed-off-by: Jassi Brar 
>> Signed-off-by: Kunihiko Hayashi 
>> ---
>>  drivers/net/phy/realtek.c | 45 +
>>  1 file changed, 45 insertions(+)
>>
>> diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
>> index 9cbe645..d9974ce 100644
>> --- a/drivers/net/phy/realtek.c
>> +++ b/drivers/net/phy/realtek.c
>> @@ -29,10 +29,23 @@
>>  #define RTL8211F_PAGE_SELECT 0x1f
>>  #define RTL8211F_TX_DELAY0x100
>>
>> +#define RTL8201F_ISR 0x1e
>> +#define RTL8201F_PAGE_SELECT 0x1f
>
> We have a page select register define for the RTL8211F right above, so
> surely we can make that a common definition?
>
That is just for the sake of consistency.
I mean RTL8211 wouldn't look neat among everything else RTL8201.

Also the page-select offsets just _happen_ to be same value...
RTL8211E_INER_LINK_STATUS and RTL8211F_INER_LINK_STATUS are very
different.

>> +#define RTL8201F_IER 0x13
>> +
>>  MODULE_DESCRIPTION("Realtek PHY driver");
>>  MODULE_AUTHOR("Johnson Leung");
>>  MODULE_LICENSE("GPL");
>>
>> +static int rtl8201_ack_interrupt(struct phy_device *phydev)
>> +{
>> + int err;
>> +
>> + err = phy_read(phydev, RTL8201F_ISR);
>> +
>> + return (err < 0) ? err : 0;
>> +}
>> +
>>  static int rtl821x_ack_interrupt(struct phy_device *phydev)
>>  {
>>   int err;
>> @@ -54,6 +67,25 @@ static int rtl8211f_ack_interrupt(struct phy_device 
>> *phydev)
>>   return (err < 0) ? err : 0;
>>  }
>>
>> +static int rtl8201_config_intr(struct phy_device *phydev)
>> +{
>> + int err;
>> +
>> + /* switch to page 7 */
>> + phy_write(phydev, RTL8201F_PAGE_SELECT, 0x7);
>> +
>> + if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
>> + err = phy_write(phydev, RTL8201F_IER,
>> + BIT(13) | BIT(12) | BIT(11));
>
> Can you detail what bits 11, 12 and 13 do? Do they correspond to link,
> duplex and pause changes by any chance?
>
Sorry no idea. The datasheet would say, and other functions too use
such magic values.

>> + else
>> + err = phy_write(phydev, RTL8201F_IER, 0);
>> +
>> + /* restore to default page 0 */
>> + phy_write(phydev, RTL8201F_PAGE_SELECT, 0x0);
>> +
>> + return err;
>> +}
>> +
>
> Other than that, LGTM:
>
> Reviewed-by: Florian Fainelli 
>
Thank you.


Re: [PATCH 1/2] ip_tunnel: fix setting ttl and tos value in collect_md mode

2017-09-08 Thread Alexei Starovoitov

On 9/6/17 11:08 PM, Haishuang Yan wrote:

ttl and tos variables are declared and assigned, but are not used in
iptunnel_xmit() function.

Fixes: cfc7381b3002 ("ip_tunnel: add collect_md mode to IPIP tunnel")
Cc: Alexei Starovoitov 
Signed-off-by: Haishuang Yan 
---
 net/ipv4/ip_tunnel.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/ip_tunnel.c b/net/ipv4/ip_tunnel.c
index 129d1a3..e1856bf 100644
--- a/net/ipv4/ip_tunnel.c
+++ b/net/ipv4/ip_tunnel.c
@@ -618,8 +618,8 @@ void ip_md_tunnel_xmit(struct sk_buff *skb, struct 
net_device *dev, u8 proto)
ip_rt_put(rt);
goto tx_dropped;
}
-   iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, proto, key->tos,
- key->ttl, df, !net_eq(tunnel->net, dev_net(dev)));
+   iptunnel_xmit(NULL, rt, skb, fl4.saddr, fl4.daddr, proto, tos, ttl,
+ df, !net_eq(tunnel->net, dev_net(dev)));


indeed. good catch. thanks
Acked-by: Alexei Starovoitov 



RE: [PATCH RFC 3/5] Add KSZ8795 switch driver

2017-09-08 Thread Tristram.Ha
> -Original Message-
> From: Pavel Machek [mailto:pa...@ucw.cz]
> Sent: Friday, September 08, 2017 2:58 PM
> To: Tristram Ha - C24268
> Cc: and...@lunn.ch; muva...@gmail.com; nathan.leigh.con...@gmail.com;
> vivien.dide...@savoirfairelinux.com; f.faine...@gmail.com;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org; Woojung Huh -
> C21699
> Subject: Re: [PATCH RFC 3/5] Add KSZ8795 switch driver
> 
> Hi!
> 
> > > > +   default:
> > > > +   processed = false;
> > > > +   break;
> > > > +   }
> > > > +   if (processed)
> > > > +   *val = data;
> > > > +}
> > >
> > > Similar code will be needed by other drivers, right?
> >
> > Although KSZ8795 and KSZ8895 may use the same code, the other
> > chips will have different code.
> 
> Ok, please make sure code is shared between these two.

The exact function probably cannot be shared between KSZ8795
and KSZ8895 drivers because although the register constants look
the same but their exact locations are different in the 2 chips.



Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-08 Thread 严海双


> On 2017年9月9日, at 上午6:13, Cong Wang  wrote:
> 
> On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
>  wrote:
>> Different namespace application might require different maximal number
>> of TCP sockets independently of the host.
> 
> So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
> in a whole system, right? This just makes OOM easier to trigger.
> 

>From my understanding, before the patch, we had N * 
>net->ipv4.sysctl_tcp_max_orphans,
and after the patch, we could have ns1.sysctl_tcp_max_orphans + 
ns2.sysctl_tcp_max_orphans
+ ns3.sysctl_tcp_max_orphans, is that right? Thanks for your reviewing.



RE: [PATCH RFC 0/6] Modify KSZ9477 DSA driver in preparation to add other KSZ switch drivers.

2017-09-08 Thread Tristram.Ha
> -Original Message-
> From: Pavel Machek [mailto:pa...@ucw.cz]
> Sent: Friday, September 08, 2017 1:54 AM
> To: Tristram Ha - C24268
> Cc: and...@lunn.ch; muva...@gmail.com; nathan.leigh.con...@gmail.com;
> vivien.dide...@savoirfairelinux.com; f.faine...@gmail.com;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org; Woojung Huh -
> C21699
> Subject: Re: [PATCH RFC 0/6] Modify KSZ9477 DSA driver in preparation to
> add other KSZ switch drivers.
> 
> Hi!
> 
> > From: Tristram Ha 
> >
> > This series of patches is to modify the original KSZ9477 DSA driver so that
> other KSZ switch drivers can be added and use the common code.
> >
> 
> Please wrap the lines from time to time...
> 
> 
> > This patch set is against net-next.
> >
> >  drivers/net/dsa/microchip/Makefile |2 +-
> >  drivers/net/dsa/microchip/ksz9477.c| 1317
> 
> 
> We already have ksz_9477_reg.h. So should this be ksz_9477.c for
> consistency?

The product name is KSZ9477 and other switches are also like KSZ,
so I would prefer to have no separation between KSZ and the product
number.  I think the file ksz_9477_reg.h was named that way because
the other files were named ksz_common.c and ksz_spi.c.  If need to
we can change the file name.



Re: [PATCH] net: bonding: Fix transmit load balancing in balance-alb mode if specified by sysfs

2017-09-08 Thread महेश बंडेवार
On Fri, Sep 8, 2017 at 7:30 AM, Nikolay Aleksandrov
 wrote:
> On 08/09/17 17:17, Kosuke Tatsukawa wrote:
>> Hi,
>>
>>> On 08/09/17 13:10, Nikolay Aleksandrov wrote:
 On 08/09/17 05:06, Kosuke Tatsukawa wrote:
> Hi,
>
>> On  7.09.2017 01:47, Kosuke Tatsukawa wrote:
>>> Commit cbf5ecb30560 ("net: bonding: Fix transmit load balancing in
>>> balance-alb mode") tried to fix transmit dynamic load balancing in
>>> balance-alb mode, which wasn't working after commit 8b426dc54cf4
>>> ("bonding: remove hardcoded value").
>>>
>>> It turned out that my previous patch only fixed the case when
>>> balance-alb was specified as bonding module parameter, and not when
>>> balance-alb mode was set using /sys/class/net/*/bonding/mode (the most
>>> common usage).  In the latter case, tlb_dynamic_lb was set up according
>>> to the default mode of the bonding interface, which happens to be
>>> balance-rr.
>>>
>>> This additional patch addresses this issue by setting up tlb_dynamic_lb
>>> to 1 if "mode" is set to balance-alb through the sysfs interface.
>>>
>>> I didn't add code to change tlb_balance_lb back to the default value for
>>> other modes, because "mode" is usually set up only once during
>>> initialization, and it's not worthwhile to change the static variable
>>> bonding_defaults in bond_main.c to a global variable just for this
>>> purpose.
>>>
>>> Commit 8b426dc54cf4 also changes the value of tlb_dynamic_lb for
>>> balance-tlb mode if it is set up using the sysfs interface.  I didn't
>>> change that behavior, because the value of tlb_balance_lb can be changed
>>> using the sysfs interface for balance-tlb, and I didn't like changing
>>> the default value back and forth for balance-tlb.
>>>
>>> As for balance-alb, /sys/class/net/*/bonding/tlb_balance_lb cannot be
>>> written to.  However, I think balance-alb with tlb_dynamic_lb set to 0
>>> is not an intended usage, so there is little use making it writable at
>>> this moment.
>>>
>>> Fixes: 8b426dc54cf4 ("bonding: remove hardcoded value")
>>> Reported-by: Reinis Rozitis 
>>> Signed-off-by: Kosuke Tatsukawa 
>>> Cc: sta...@vger.kernel.org  # v4.12+
>>> ---
>>>  drivers/net/bonding/bond_options.c |3 +++
>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>
>>
>> I don't believe this to be the right solution, hardcoding it like this
>> changes user-visible behaviour. The issue is that if someone configured
>> it to be 0 in tlb mode, suddenly it will become 1 and will silently
>> override their config if they switch the mode to alb. Also it robs users
>> from their choice.
>>
>> If you think this should be settable in ALB mode, then IMO you should
>> edit tlb_dynamic_lb option's unsuppmodes and allow it to be set in ALB.
>> That would also be consistent with how it's handled in TLB mode.
>
> No, I don't think tlb_dynamic_lb should be settable in balance-alb at
> this point.  All the current commits regarding tlb_dynamic_lb are for
> balance-tlb mode, so I don't think balance-alb with tlb_dynamic_lb set
> to 0 is an intended usage.
>
>
>> Going back and looking at your previous fix I'd argue that it is also
>> wrong, you should've removed the mode check altogether to return the
>> original behaviour where the dynamic_lb is set to 1 (enabled) by
>> default and then ALB mode would've had it, of course that would've left
>> the case of setting it to 0 in TLB mode and switching to ALB, but that
>> is a different issue.
>
> Maybe balance-alb shouldn't be dependent on tlb_dynamic_lb.
> tlb_dynamic_lb is referenced in the following functions.
>
>  + bond_do_alb_xmit()  -- Used by both balance-tlb and balance-alb
>  + bond_tlb_xmit()  -- Only used by balance-tlb
>  + bond_open()  -- Used by both balance-tlb and balance-alb
>  + bond_check_params()  -- Used during module initialization
>  + bond_fill_info()  -- Used to get/set value
>  + bond_option_tlb_dynamic_lb_set()  -- Used to get/set value
>  + bonding_show_tlb_dynamic_lb()  -- Used to get/set value
>  + bond_is_nondyn_tlb()  -- Only referenced if balance-tlb mode
>
> The following untested patch adds code to make balance-alb work as if
> tlb_dynamic_lb==1 for the functions which affect balance-alb mode.  It
> also reverts my previous patch.
>
> What do you think about this approach?
> ---
> Kosuke TATSUKAWA  | 1st Platform Software Division
>   | NEC Solution Innovators
>   | ta...@ab.jp.nec.com
>

 Logically the approach looks good, that being said it adds unnecessary 
 tests in
 the fast path, why not just something like the patch below ? That only 
 leaves the
 problem if it is zeroed in TLB and switched 

[PATCH net] bpf: make error reporting in bpf_warn_invalid_xdp_action more clear

2017-09-08 Thread Daniel Borkmann
Differ between illegal XDP action code and just driver
unsupported one to provide better feedback when we throw
a one-time warning here. Reason is that with 814abfabef3c
("xdp: add bpf_redirect helper function") not all drivers
support the new XDP return code yet and thus they will
fall into their 'default' case when checking for return
codes after program return, which then triggers a
bpf_warn_invalid_xdp_action() stating that the return
code is illegal, but from XDP perspective it's not.

I decided not to place something like a XDP_ACT_MAX define
into uapi i) given we don't have this either for all other
program types, ii) future action codes could have further
encoding there, which would render such define unsuitable
and we wouldn't be able to rip it out again, and iii) we
rarely add new action codes.

Signed-off-by: Daniel Borkmann 
Acked-by: Alexei Starovoitov 
---
 include/uapi/linux/bpf.h | 4 ++--
 net/core/filter.c| 6 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index ba848b7..43ab5c4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -766,8 +766,8 @@ struct bpf_sock {
 
 /* User return codes for XDP prog type.
  * A valid XDP program must return one of these defined values. All other
- * return codes are reserved for future use. Unknown return codes will result
- * in packet drop.
+ * return codes are reserved for future use. Unknown return codes will
+ * result in packet drops and a warning via bpf_warn_invalid_xdp_action().
  */
 enum xdp_action {
XDP_ABORTED = 0,
diff --git a/net/core/filter.c b/net/core/filter.c
index 0848df2..adac4eb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3609,7 +3609,11 @@ static bool xdp_is_valid_access(int off, int size,
 
 void bpf_warn_invalid_xdp_action(u32 act)
 {
-   WARN_ONCE(1, "Illegal XDP return value %u, expect packet loss\n", act);
+   const u32 act_max = XDP_REDIRECT;
+
+   WARN_ONCE(1, "%s XDP return value %u, expect packet loss!\n",
+ act > act_max ? "Illegal" : "Driver unsupported",
+ act);
 }
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
-- 
1.9.3



Re: [PATCH net] Revert "mdio_bus: Remove unneeded gpiod NULL check"

2017-09-08 Thread Linus Walleij
On Sat, Sep 9, 2017 at 1:18 AM, Florian Fainelli  wrote:
> On 09/08/2017 04:13 PM, Linus Walleij wrote:
>> On Sat, Sep 9, 2017 at 12:38 AM, Florian Fainelli  
>> wrote:
>>
>>> This reverts commit 95b80bf3db03c2bf572a357cf74b9a6aefef0a4a ("mdio_bus:
>>> Remove unneeded gpiod NULL check"), this commit assumed that GPIOLIB
>>> checks for NULL descriptors, so it's safe to drop them, but it is not
>>> when CONFIG_GPIOLIB is disabled in the kernel. If we do call
>>> gpiod_set_value_cansleep() on a GPIO descriptor we will issue warnings
>>> coming from the inline stubs declared in include/linux/gpio/consumer.h.
>>>
>>> Fixes: 95b80bf3db03 ("mdio_bus: Remove unneeded gpiod NULL check")
>>> Reported-by: Woojung Huh 
>>> Signed-off-by: Florian Fainelli 
>>
>> Yeah I guess you don't wanna have these messages spewing
>> in the console. :/
>>
>> But what about simply doing this:
>
> That would create another dependency which really does not need to be
> there as it really prevents you from testing more configurations,
> including but not limited to having CONFIG_GPIOLIB=n w/
> CONFIG_MDIO_DEVICE=[ym].
>
> The GPIOLIB=n inline stubs have a "contract" with the code calling them
> that is fairly clear, which is what this revert is leveraging.

Ayeah the contract is something like "you can call us if compiled out,
but then you get warnings".

Yeah NULL checks does the trick as well as #ifdef:in in that sense.
It's no big deal so if you prefer this:
Acked-by: Linus Walleij 

Yours,
Linus Walleij


Re: [PATCH net] Revert "mdio_bus: Remove unneeded gpiod NULL check"

2017-09-08 Thread Florian Fainelli
On 09/08/2017 04:13 PM, Linus Walleij wrote:
> On Sat, Sep 9, 2017 at 12:38 AM, Florian Fainelli  
> wrote:
> 
>> This reverts commit 95b80bf3db03c2bf572a357cf74b9a6aefef0a4a ("mdio_bus:
>> Remove unneeded gpiod NULL check"), this commit assumed that GPIOLIB
>> checks for NULL descriptors, so it's safe to drop them, but it is not
>> when CONFIG_GPIOLIB is disabled in the kernel. If we do call
>> gpiod_set_value_cansleep() on a GPIO descriptor we will issue warnings
>> coming from the inline stubs declared in include/linux/gpio/consumer.h.
>>
>> Fixes: 95b80bf3db03 ("mdio_bus: Remove unneeded gpiod NULL check")
>> Reported-by: Woojung Huh 
>> Signed-off-by: Florian Fainelli 
> 
> Yeah I guess you don't wanna have these messages spewing
> in the console. :/
> 
> But what about simply doing this:

That would create another dependency which really does not need to be
there as it really prevents you from testing more configurations,
including but not limited to having CONFIG_GPIOLIB=n w/
CONFIG_MDIO_DEVICE=[ym].

The GPIOLIB=n inline stubs have a "contract" with the code calling them
that is fairly clear, which is what this revert is leveraging.

Instead of doing what you suggest, we could also encapsulate the
offending code within an #IS_ENABLED(CONFIG_GPIOLIB) block, which would
be virtually the same thing in terms of object code generated, but would
make it clear there is a functional dependency, I would personally be
keener on that approach than having a select or depends.

Thanks

> 
> 
> From 150e4f3c1f227c9a4c31bbb48d44220e25b792ec Mon Sep 17 00:00:00 2001
> From: Linus Walleij 
> Date: Sat, 9 Sep 2017 01:07:22 +0200
> Subject: [PATCH] net: phy: mdio_bus: mandate gpiolib
> 
> The core mdio bus unconditionally uses gpiolib APIs to handle
> reset lines which results in runtime errors when it is compiled
> out. It is not supposed to be possible to stub out gpiolib
> like this, the error messages from the stubs are a sign that
> something is wrong.
> 
> So just select it. Fix some unneeded #includes while we're
> at it.>
> Signed-off-by: Linus Walleij 
> ---
>  drivers/net/phy/Kconfig| 1 +
>  drivers/net/phy/mdio_bus.c | 2 --
>  2 files changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
> index a9d16a3af514..b6c5bb9941c3 100644
> --- a/drivers/net/phy/Kconfig
> +++ b/drivers/net/phy/Kconfig
> @@ -4,6 +4,7 @@
> 
>  menuconfig MDIO_DEVICE
>  tristate "MDIO bus device drivers"
> +select GPIOLIB
>  help
>MDIO devices and driver infrastructure code.
> 
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index b6f9fa670168..9c60f87b7209 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -22,11 +22,9 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
> 


-- 
Florian


Re: [PATCH net-next] mdio_bus: Remove unneeded gpiod NULL check

2017-09-08 Thread Linus Walleij
On Thu, Sep 7, 2017 at 11:39 PM, Florian Fainelli  wrote:

> I think it means CONFIG_GPIOLIB=n in the kernel because it's not needed,
> yet you run code (like drivers/net/phy/mdio_bus.c) that unconditionally
> calls into GPIOLIB and attempts to configure a given GPIO if available.
> This thread is actually what prompted me to write this email:

Yeah I think GPIOLIB should simply be selected in KConfig then.

Sent an inline patch to do that in the other reply.

Maybe I should simply delete the stubs if they confuse
people.

It is possible to select GPIOLIB on any platform these days
and get the right APIs, this used to not be the case but I fixed
it a while back.

Alternatively we can depend on GPIOLIB but it's simpler to just
select it I think, it's like a library this code uses and so that's
it.

Yours,
Linus Walleij


Re: [PATCH net] Revert "mdio_bus: Remove unneeded gpiod NULL check"

2017-09-08 Thread Linus Walleij
On Sat, Sep 9, 2017 at 12:38 AM, Florian Fainelli  wrote:

> This reverts commit 95b80bf3db03c2bf572a357cf74b9a6aefef0a4a ("mdio_bus:
> Remove unneeded gpiod NULL check"), this commit assumed that GPIOLIB
> checks for NULL descriptors, so it's safe to drop them, but it is not
> when CONFIG_GPIOLIB is disabled in the kernel. If we do call
> gpiod_set_value_cansleep() on a GPIO descriptor we will issue warnings
> coming from the inline stubs declared in include/linux/gpio/consumer.h.
>
> Fixes: 95b80bf3db03 ("mdio_bus: Remove unneeded gpiod NULL check")
> Reported-by: Woojung Huh 
> Signed-off-by: Florian Fainelli 

Yeah I guess you don't wanna have these messages spewing
in the console. :/

But what about simply doing this:


>From 150e4f3c1f227c9a4c31bbb48d44220e25b792ec Mon Sep 17 00:00:00 2001
From: Linus Walleij 
Date: Sat, 9 Sep 2017 01:07:22 +0200
Subject: [PATCH] net: phy: mdio_bus: mandate gpiolib

The core mdio bus unconditionally uses gpiolib APIs to handle
reset lines which results in runtime errors when it is compiled
out. It is not supposed to be possible to stub out gpiolib
like this, the error messages from the stubs are a sign that
something is wrong.

So just select it. Fix some unneeded #includes while we're
at it.

Signed-off-by: Linus Walleij 
---
 drivers/net/phy/Kconfig| 1 +
 drivers/net/phy/mdio_bus.c | 2 --
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/phy/Kconfig b/drivers/net/phy/Kconfig
index a9d16a3af514..b6c5bb9941c3 100644
--- a/drivers/net/phy/Kconfig
+++ b/drivers/net/phy/Kconfig
@@ -4,6 +4,7 @@

 menuconfig MDIO_DEVICE
 tristate "MDIO bus device drivers"
+select GPIOLIB
 help
   MDIO devices and driver infrastructure code.

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index b6f9fa670168..9c60f87b7209 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -22,11 +22,9 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
-- 
2.13.5

Yours,
Linus Walleij


Re: [PATCH net] ipv6: fix typo in fib6_net_exit()

2017-09-08 Thread Sabrina Dubroca
2017-09-08, 15:48:47 -0700, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.
> 
> Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns 
> destruction")
> Signed-off-by: Eric Dumazet 

Ouch, yes :(
Thanks for the fix.

Acked-by: Sabrina Dubroca 

-- 
Sabrina


Re: [PATCH net] ipv6: fix typo in fib6_net_exit()

2017-09-08 Thread David Miller
From: Eric Dumazet 
Date: Fri, 08 Sep 2017 15:48:47 -0700

> From: Eric Dumazet 
> 
> IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.
> 
> Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns 
> destruction")
> Signed-off-by: Eric Dumazet 

Applied.


Re: [PATCH net] tcp: fix a request socket leak

2017-09-08 Thread David Miller
From: Alexei Starovoitov 
Date: Fri, 8 Sep 2017 15:59:23 -0700

> On Fri, Sep 08, 2017 at 12:44:47PM -0700, Eric Dumazet wrote:
>> From: Eric Dumazet 
>> 
>> While the cited commit fixed a possible deadlock, it added a leak
>> of the request socket, since reqsk_put() must be called if the BPF
>> filter decided the ACK packet must be dropped.
>> 
>> Fixes: d624d276d1dd ("tcp: fix possible deadlock in TCP stack vs BPF filter")
>> Signed-off-by: Eric Dumazet 
>> ---
>> David, this is a stable candidate (linux-4.13 has this bug)
> 
> Thanks for the fix
> Acked-by: Alexei Starovoitov 
> .. since the fix makes sense to me and as a reminder not to forget
> to backport it as well :)

Applied and queued up for -stable.


Re: [PATCH net] tcp: fix a request socket leak

2017-09-08 Thread Alexei Starovoitov
On Fri, Sep 08, 2017 at 12:44:47PM -0700, Eric Dumazet wrote:
> From: Eric Dumazet 
> 
> While the cited commit fixed a possible deadlock, it added a leak
> of the request socket, since reqsk_put() must be called if the BPF
> filter decided the ACK packet must be dropped.
> 
> Fixes: d624d276d1dd ("tcp: fix possible deadlock in TCP stack vs BPF filter")
> Signed-off-by: Eric Dumazet 
> ---
> David, this is a stable candidate (linux-4.13 has this bug)

Thanks for the fix
Acked-by: Alexei Starovoitov 
.. since the fix makes sense to me and as a reminder not to forget
to backport it as well :)



[PATCH net] ipv6: fix typo in fib6_net_exit()

2017-09-08 Thread Eric Dumazet
From: Eric Dumazet 

IPv6 FIB should use FIB6_TABLE_HASHSZ, not FIB_TABLE_HASHSZ.

Fixes: ba1cc08d9488 ("ipv6: fix memory leak with multiple tables during netns 
destruction")
Signed-off-by: Eric Dumazet 
---
 net/ipv6/ip6_fib.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
index 
8280172c806ca47c5ca4aef723d405a0a8d7df2a..e5308d7cbd75c4fb67861082382122d445cf5b74
 100644
--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -2033,7 +2033,7 @@ static void fib6_net_exit(struct net *net)
rt6_ifdown(net, NULL);
del_timer_sync(&net->ipv6.ip6_fib_timer);
 
-   for (i = 0; i < FIB_TABLE_HASHSZ; i++) {
+   for (i = 0; i < FIB6_TABLE_HASHSZ; i++) {
struct hlist_head *head = &net->ipv6.fib_table_hash[i];
struct hlist_node *tmp;
struct fib6_table *tb;




Re: [net PATCH 3/3] bpf: devmap, use cond_resched instead of cpu_relax

2017-09-08 Thread Alexei Starovoitov

On 9/8/17 2:01 PM, John Fastabend wrote:

Be a bit more friendly about waiting for flush bits to complete.
Replace the cpu_relax() with a cond_resched().

Suggested-by: Daniel Borkmann 
Acked-by: Daniel Borkmann 
Signed-off-by: John Fastabend 


unlike patch 1 and 2, this one could have waited till net-next opens,
but I don't mind now. lgtm
Acked-by: Alexei Starovoitov 



Re: [net PATCH 2/3] bpf: add support for sockmap detach programs

2017-09-08 Thread Alexei Starovoitov

On 9/8/17 2:00 PM, John Fastabend wrote:

The bpf map sockmap supports adding programs via attach commands. This
patch adds the detach command to keep the API symmetric and allow
users to remove previously added programs. Otherwise the user would
have to delete the map and re-add it to get in this state.

This also adds a series of additional tests to capture detach operation
and also attaching/detaching invalid prog types.

API note: socks will run (or not run) programs depending on the state
of the map at the time the sock is added. We do not for example walk
the map and remove programs from previously attached socks.

Acked-by: Daniel Borkmann 
Signed-off-by: John Fastabend 


Nice clean patch. Thx
Acked-by: Alexei Starovoitov 


Re: [net PATCH 1/3] net: rcu lock and preempt disable missing around generic xdp

2017-09-08 Thread Alexei Starovoitov

On 9/8/17 2:00 PM, John Fastabend wrote:

do_xdp_generic must be called inside rcu critical section with preempt
disabled to ensure BPF programs are valid and per-cpu variables used
for redirect operations are consistent. This patch ensures this is true
and fixes the splat below.

The netif_receive_skb_internal() code path is now broken into two rcu
critical sections. I decided it was better to limit the preempt_enable/disable
block to just the xdp static key portion and the fallout is more
rcu_read_lock/unlock calls. Seems like the best option to me.

[  607.596901] =
[  607.596906] WARNING: suspicious RCU usage
[  607.596912] 4.13.0-rc4+ #570 Not tainted
[  607.596917] -
[  607.596923] net/core/dev.c:3948 suspicious rcu_dereference_check() usage!
[  607.596927]
[  607.596927] other info that might help us debug this:
[  607.596927]
[  607.596933]
[  607.596933] rcu_scheduler_active = 2, debug_locks = 1
[  607.596938] 2 locks held by pool/14624:
[  607.596943]  #0:  (rcu_read_lock_bh){..}, at: [] 
ip_finish_output2+0x14d/0x890
[  607.596973]  #1:  (rcu_read_lock_bh){..}, at: [] 
__dev_queue_xmit+0x14a/0xfd0
[  607.597000]
[  607.597000] stack backtrace:
[  607.597006] CPU: 5 PID: 14624 Comm: pool Not tainted 4.13.0-rc4+ #570
[  607.597011] Hardware name: Dell Inc. Precision Tower 5810/0HHV7N, BIOS A17 
03/01/2017
[  607.597016] Call Trace:
[  607.597027]  dump_stack+0x67/0x92
[  607.597040]  lockdep_rcu_suspicious+0xdd/0x110
[  607.597054]  do_xdp_generic+0x313/0xa50
[  607.597068]  ? time_hardirqs_on+0x5b/0x150
[  607.597076]  ? mark_held_locks+0x6b/0xc0
[  607.597088]  ? netdev_pick_tx+0x150/0x150
[  607.597117]  netif_rx_internal+0x205/0x3f0
[  607.597127]  ? do_xdp_generic+0xa50/0xa50
[  607.597144]  ? lock_downgrade+0x2b0/0x2b0
[  607.597158]  ? __lock_is_held+0x93/0x100
[  607.597187]  netif_rx+0x119/0x190
[  607.597202]  loopback_xmit+0xfd/0x1b0
[  607.597214]  dev_hard_start_xmit+0x127/0x4e0

Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Fixes: b5cdae3291f7 ("net: Generic XDP")
Acked-by: Daniel Borkmann 
Signed-off-by: John Fastabend 


argh, so it's due to virtual devices and loopback.
Not pretty, but have to agree I don't see another way of fixing it.

Acked-by: Alexei Starovoitov 




[PATCH net] Revert "mdio_bus: Remove unneeded gpiod NULL check"

2017-09-08 Thread Florian Fainelli
This reverts commit 95b80bf3db03c2bf572a357cf74b9a6aefef0a4a ("mdio_bus:
Remove unneeded gpiod NULL check"), this commit assumed that GPIOLIB
checks for NULL descriptors, so it's safe to drop them, but it is not
when CONFIG_GPIOLIB is disabled in the kernel. If we do call
gpiod_set_value_cansleep() on a GPIO descriptor we will issue warnings
coming from the inline stubs declared in include/linux/gpio/consumer.h.

Fixes: 95b80bf3db03 ("mdio_bus: Remove unneeded gpiod NULL check")
Reported-by: Woojung Huh 
Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/mdio_bus.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index b6f9fa670168..2df7b62c1a36 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -399,7 +399,8 @@ int __mdiobus_register(struct mii_bus *bus, struct module 
*owner)
}
 
/* Put PHYs in RESET to save power */
-   gpiod_set_value_cansleep(bus->reset_gpiod, 1);
+   if (bus->reset_gpiod)
+   gpiod_set_value_cansleep(bus->reset_gpiod, 1);
 
device_del(&bus->dev);
return err;
@@ -424,7 +425,8 @@ void mdiobus_unregister(struct mii_bus *bus)
}
 
/* Put PHYs in RESET to save power */
-   gpiod_set_value_cansleep(bus->reset_gpiod, 1);
+   if (bus->reset_gpiod)
+   gpiod_set_value_cansleep(bus->reset_gpiod, 1);
 
device_del(&bus->dev);
 }
-- 
2.9.3



Re: [PATCH] ipv4: Namespaceify tcp_max_orphans knob

2017-09-08 Thread Cong Wang
On Wed, Sep 6, 2017 at 8:10 PM, Haishuang Yan
 wrote:
> Different namespace application might require different maximal number
> of TCP sockets independently of the host.

So after your patch we could have N * net->ipv4.sysctl_tcp_max_orphans
in a whole system, right? This just makes OOM easier to trigger.


Re: [PATCH RFC 3/5] Add KSZ8795 switch driver

2017-09-08 Thread Pavel Machek
Hi!

> > > + default:
> > > + processed = false;
> > > + break;
> > > + }
> > > + if (processed)
> > > + *val = data;
> > > +}
> > 
> > Similar code will be needed by other drivers, right?
> 
> Although KSZ8795 and KSZ8895 may use the same code, the other
> chips will have different code.

Ok, please make sure code is shared between these two.

Thansk,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH net] netlink: fix an use-after-free issue for nlk groups

2017-09-08 Thread Cong Wang
On Tue, Sep 5, 2017 at 8:47 PM, Xin Long  wrote:
> ChunYu found a netlink use-after-free issue by syzkaller:
>
> [28448.842981] BUG: KASAN: use-after-free in __nla_put+0x37/0x40 at addr 
> 8807185e2378
> [28448.969918] Call Trace:
> [...]
> [28449.117207]  __nla_put+0x37/0x40
> [28449.132027]  nla_put+0xf5/0x130
> [28449.146261]  sk_diag_fill.isra.4.constprop.5+0x5a0/0x750 [netlink_diag]
> [28449.176608]  __netlink_diag_dump+0x25a/0x700 [netlink_diag]
> [28449.202215]  netlink_diag_dump+0x176/0x240 [netlink_diag]
> [28449.226834]  netlink_dump+0x488/0xbb0
> [28449.298014]  __netlink_dump_start+0x4e8/0x760
> [28449.317924]  netlink_diag_handler_dump+0x261/0x340 [netlink_diag]
> [28449.413414]  sock_diag_rcv_msg+0x207/0x390
> [28449.432409]  netlink_rcv_skb+0x149/0x380
> [28449.467647]  sock_diag_rcv+0x2d/0x40
> [28449.484362]  netlink_unicast+0x562/0x7b0
> [28449.564790]  netlink_sendmsg+0xaa8/0xe60
> [28449.661510]  sock_sendmsg+0xcf/0x110
> [28449.865631]  __sys_sendmsg+0xf3/0x240
> [28450.000964]  SyS_sendmsg+0x32/0x50
> [28450.016969]  do_syscall_64+0x25c/0x6c0
> [28450.154439]  entry_SYSCALL64_slow_path+0x25/0x25
>
> It was caused by no protection between nlk groups' free in netlink_release
> and nlk groups' accessing in sk_diag_dump_groups. The similar issue also
> exists in netlink_seq_show().
>
> This patch is to defer nlk groups' free in deferred_put_nlk_sk.

This looks odd too, at least not complete.

The netlink sock itself is protected by RCU to speed up
the lookup path, but not necessarily nlk->groups, at least
I don't see rcu_dereference() in sk_diag_dump_groups().
And netlink_realloc_groups() needs fix too, right? Otherwise
krealloc() could reallocate a brand new memory and
existing readers will crash too?

I am afraid you need more work to make nlk->groups
RCU friendly. RCU is not just about call_rcu(), both
readers and writers need to use proper RCU API.


Re: [PATCH RFC 5/5] Add KSZ8795 SPI driver

2017-09-08 Thread Pavel Machek
Hi!

> > Please format according to CodingStyle. (Not only this.)
> > 
> > And this will be common for more drivers. Can it go to a header file
> > and be included...?
> > 
> 
> Sorry about the formatting.  It seems my e-mail system needs to be checked
> to make sure it does not auto-format the contents again.
> 
> About the SPI access functions they are the same for each driver except the
> low level ksz_spi_read_reg and ksz_spi_write_reg.  The dev_io_ops structure
> should contain only those 2 and ksz_spi_get and ksz_spi_set.
> 
> But that requires changing ksz_spi.c.  The idea was to keep the code of
> KSZ9477 driver with little change as possible while introducing another 
> driver.

So we change ksz_spi.c. Goal is to have clean code.

Thanks,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Pavel Machek
Hi!

> There will be 5 drivers to support these devices:
> 
> ksz9477.c - KSZ9893/KSZ9897/KSZ9567/KSZ9566/KSZ9477
> ksz8795.c - KSZ8795/KSZ8795/KSZ8765
> ksz8895.c - KSZ8895/KSZ8864

Could we see the 8895 driver, please?

> Out of topic I have a question to ask the community regarding the DSA
> port creation:
> 
> Port 1 can be specified using the reg parameter specifying 0; port 2, 1;
> and so on.  The KSZ8794 is a variant of KSZ8795 with port 4 disabled.
> So
> lan1 = 0, lan2 = 1, lan3 = 2, cpu = 4.
> But our company Marketing does not want to promote that fact but treat
> KSZ8794 as a distinct product.
> So
> lan1 = 0, lan2 = 1, lan3 = 2, cpu = 3.
> Is this okay to hide this information inside the driver?  This is manageable
> for KSZ8794 but for KSZ8864 the first port is disabled:
> lan1 = 1, lan2 = 2, lan3 = 3, cpu = 4.
> 
> I am not sure whether DSA has or will have a way to display the port
> mapping to regular users.

Kernel is not a place to play marketing games, and people reading the
kernel sources are not targets of your marketing department.

Please let us just see the underlying hardware, as is.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


[PATCH iproute2 1/3] bridge: isolate vlans parsing code in a separate API

2017-09-08 Thread Roman Mashak
IFLA_BRIDGE_VLAN_INFO parsing logic will be used in link and vlan
processing code, so it makes sense to move it in the separate function.

Signed-off-by: Roman Mashak 
---
 bridge/br_common.h |   1 +
 bridge/vlan.c  | 145 +++--
 2 files changed, 76 insertions(+), 70 deletions(-)

diff --git a/bridge/br_common.h b/bridge/br_common.h
index c649e7d..01447dd 100644
--- a/bridge/br_common.h
+++ b/bridge/br_common.h
@@ -4,6 +4,7 @@
 #define MDB_RTR_RTA(r) \
((struct rtattr *)(((char *)(r)) + RTA_ALIGN(sizeof(__u32
 
+extern void print_vlan_info(FILE *fp, struct rtattr *tb, int ifindex);
 extern int print_linkinfo(const struct sockaddr_nl *who,
  struct nlmsghdr *n,
  void *arg);
diff --git a/bridge/vlan.c b/bridge/vlan.c
index ebcdace..fc361ae 100644
--- a/bridge/vlan.c
+++ b/bridge/vlan.c
@@ -189,7 +189,6 @@ static int print_vlan(const struct sockaddr_nl *who,
struct ifinfomsg *ifm = NLMSG_DATA(n);
int len = n->nlmsg_len;
struct rtattr *tb[IFLA_MAX+1];
-   bool vlan_flags = false;
 
if (n->nlmsg_type != RTM_NEWLINK) {
fprintf(stderr, "Not RTM_NEWLINK: %08x %08x %08x\n",
@@ -218,75 +217,7 @@ static int print_vlan(const struct sockaddr_nl *who,
ll_index_to_name(ifm->ifi_index));
return 0;
} else {
-   struct rtattr *i, *list = tb[IFLA_AF_SPEC];
-   int rem = RTA_PAYLOAD(list);
-   __u16 last_vid_start = 0;
-
-   if (!filter_vlan)
-   print_vlan_port(fp, ifm->ifi_index);
-
-   for (i = RTA_DATA(list); RTA_OK(i, rem); i = RTA_NEXT(i, rem)) {
-   struct bridge_vlan_info *vinfo;
-   int vcheck_ret;
-
-   if (i->rta_type != IFLA_BRIDGE_VLAN_INFO)
-   continue;
-
-   vinfo = RTA_DATA(i);
-
-   if (!(vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END))
-   last_vid_start = vinfo->vid;
-   vcheck_ret = filter_vlan_check(vinfo);
-   if (vcheck_ret == -1)
-   break;
-   else if (vcheck_ret == 0)
-   continue;
-
-   if (filter_vlan)
-   print_vlan_port(fp, ifm->ifi_index);
-   if (jw_global) {
-   jsonw_start_object(jw_global);
-   jsonw_uint_field(jw_global, "vlan",
-last_vid_start);
-   if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN)
-   continue;
-   } else {
-   fprintf(fp, "\t %hu", last_vid_start);
-   }
-   if (last_vid_start != vinfo->vid) {
-   if (jw_global)
-   jsonw_uint_field(jw_global, "vlanEnd",
-vinfo->vid);
-   else
-   fprintf(fp, "-%hu", vinfo->vid);
-   }
-   if (vinfo->flags & BRIDGE_VLAN_INFO_PVID) {
-   if (jw_global) {
-   
start_json_vlan_flags_array(&vlan_flags);
-   jsonw_string(jw_global, "PVID");
-   } else {
-   fprintf(fp, " PVID");
-   }
-   }
-   if (vinfo->flags & BRIDGE_VLAN_INFO_UNTAGGED) {
-   if (jw_global) {
-   
start_json_vlan_flags_array(&vlan_flags);
-   jsonw_string(jw_global,
-"Egress Untagged");
-   } else {
-   fprintf(fp, " Egress Untagged");
-   }
-   }
-   if (jw_global && vlan_flags) {
-   jsonw_end_array(jw_global);
-   vlan_flags = false;
-   }
-
-   if (jw_global)
-   jsonw_end_object(jw_global);
-   else
-   fprintf(fp, "\n");
-   }
+   print_vlan_info(fp, tb[IFLA_AF_SPEC], ifm->ifi_index);
}
if (!filter_vlan) {
if (jw_global)
@@ -470,6 +401,80 @@ static int vlan_show(int argc, char **argv)
return 0;
 }
 
+void print_vlan_info(FILE *f

[PATCH iproute2 3/3] bridge: request vlans along with link information

2017-09-08 Thread Roman Mashak
Signed-off-by: Roman Mashak 
---
 bridge/link.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/bridge/link.c b/bridge/link.c
index 60200f1..9e4206f 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -461,9 +461,19 @@ static int brlink_show(int argc, char **argv)
}
}
 
-   if (rtnl_wilddump_request(&rth, PF_BRIDGE, RTM_GETLINK) < 0) {
-   perror("Cannon send dump request");
-   exit(1);
+   if (show_details) {
+   if (rtnl_wilddump_req_filter(&rth, PF_BRIDGE, RTM_GETLINK,
+(compress_vlans ?
+ RTEXT_FILTER_BRVLAN_COMPRESSED :
+ RTEXT_FILTER_BRVLAN)) < 0) {
+   perror("Cannon send dump request");
+   exit(1);
+   }
+   } else {
+   if (rtnl_wilddump_request(&rth, PF_BRIDGE, RTM_GETLINK) < 0) {
+   perror("Cannon send dump request");
+   exit(1);
+   }
}
 
if (rtnl_dump_filter(&rth, print_linkinfo, stdout) < 0) {
-- 
1.9.1



[PATCH iproute2 2/3] bridge: dump vlan table information for link

2017-09-08 Thread Roman Mashak
Kernel also reports vlans a port is member of, so print it. Since vlan
table can be quite large, dump it only when detailed information is
requested.

Signed-off-by: Roman Mashak 
---
 bridge/link.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/bridge/link.c b/bridge/link.c
index 93472ad..60200f1 100644
--- a/bridge/link.c
+++ b/bridge/link.c
@@ -213,6 +213,13 @@ int print_linkinfo(const struct sockaddr_nl *who,
 
if (aftb[IFLA_BRIDGE_MODE])
print_hwmode(fp, 
rta_getattr_u16(aftb[IFLA_BRIDGE_MODE]));
+   if (show_details) {
+   if (aftb[IFLA_BRIDGE_VLAN_INFO]) {
+   fprintf(fp, "\n");
+   print_vlan_info(fp, tb[IFLA_AF_SPEC],
+   ifi->ifi_index);
+   }
+   }
}
 
fprintf(fp, "\n");
-- 
1.9.1



[PATCH iproute2 0/3] Process IFLA_BRIDGE_VLAN_INFO tlv

2017-09-08 Thread Roman Mashak
Process IFLA_BRIDGE_VLAN_INFO attribute nested in IFLA_AF_SPEC, which
contains bridge vlan table per port passed in link events.

Roman Mashak (3):
  bridge: isolate vlans  parsing code in a separate API
  bridge: dump vlan table information for link
  bridge: request vlans along with link information

 bridge/br_common.h |   1 +
 bridge/link.c  |  23 +++--
 bridge/vlan.c  | 145 +++--
 3 files changed, 96 insertions(+), 73 deletions(-)

-- 
1.9.1



Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Pavel Machek
On Fri 2017-09-08 21:01:22, Andrew Lunn wrote:
> > > So i would suggest one driver supporting all the different devices.
> > 
> > There will be 5 drivers to support these devices:
> > 
> > ksz9477.c - KSZ9893/KSZ9897/KSZ9567/KSZ9566/KSZ9477
> > ksz8795.c - KSZ8795/KSZ8795/KSZ8765
> > ksz8895.c - KSZ8895/KSZ8864
> > ksz8863.c - KSZ8863/KSZ8873
> > ksz8463.c - KSZ8463
> > 
> > These chips have different SPI access mechanisms, MIB counter reading,
> > and register set.  These can be combined into one single driver using
> > function pointers, at least for ksz8795/ksz8895/ksz8863/ksz8463.  My
> > only concern is the memory footprint.  The customer may not want a
> > big driver to cover all the switches while only one is used.
> 
> If memory footprint is your problem, make it a compile time choice
> which devices are supported within the one driver. In practice, you
> will find most distributions just enable them all.

I have to side with Tristram here. The register layouts are so
different that single driver does not make sense.

What could make sense is single function, compiled 5 times, based on
different includes; same source code but 5 different binaries.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


signature.asc
Description: Digital signature


Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

2017-09-08 Thread David Miller
From: Eduardo Valentin 
Date: Fri, 8 Sep 2017 11:55:21 -0700

> I agree. I would prefer to understand here what is the technical
> reason not to accept these patches other than "use other system
> calls".

I explained this, let me reiterate:


And most importantly, letting the kernel have flexibility in this area
is absolutely essential for various forms of optimizations and power
savings.



Re: [PATCH iproute2] tc: fq: support low_rate_threshold attribute

2017-09-08 Thread Soheil Hassas Yeganeh
On Fri, Sep 8, 2017 at 5:12 PM, Eric Dumazet  wrote:
> From: Eric Dumazet 
>
> TCA_FQ_LOW_RATE_THRESHOLD sch_fq attribute was added in linux-4.9
>
> Tested:
>
> lpaa5:/tmp# tc -qd add dev eth1 root fq
> lpaa5:/tmp# tc -s qd sh dev eth1
> qdisc fq 8003: root refcnt 5 limit 1p flow_limit 1000p buckets 4096 \
>  orphan_mask 4095 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 
> 3648 \
>  initial_quantum 18240 low_rate_threshold 550Kbit refill_delay 40.0ms
>  Sent 62139 bytes 395 pkt (dropped 0, overlimits 0 requeues 0)
>  backlog 0b 0p requeues 0
>   116 flows (114 inactive, 0 throttled)
>   1 gc, 0 highprio, 0 throttled
>
> lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 50 -r 300,300 -o 
> P99_LATENCY
> 99th Percentile Latency Microseconds
> 7081
>
> lpaa5:/tmp# tc qd replace dev eth1 root fq low_rate_threshold 10Mbit
> lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 50 -r 300,300 -o 
> P99_LATENCY
> 99th Percentile Latency Microseconds
> 858
>
>
> Signed-off-by: Eric Dumazet 

Acked-by: Soheil Hassas Yeganeh 

Thank you, Eric!


[PATCH iproute2] tc: fq: support low_rate_threshold attribute

2017-09-08 Thread Eric Dumazet
From: Eric Dumazet 

TCA_FQ_LOW_RATE_THRESHOLD sch_fq attribute was added in linux-4.9

Tested:

lpaa5:/tmp# tc -qd add dev eth1 root fq
lpaa5:/tmp# tc -s qd sh dev eth1
qdisc fq 8003: root refcnt 5 limit 1p flow_limit 1000p buckets 4096 \
 orphan_mask 4095 bands 3 priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 quantum 3648 \
 initial_quantum 18240 low_rate_threshold 550Kbit refill_delay 40.0ms
 Sent 62139 bytes 395 pkt (dropped 0, overlimits 0 requeues 0)
 backlog 0b 0p requeues 0
  116 flows (114 inactive, 0 throttled)
  1 gc, 0 highprio, 0 throttled

lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 50 -r 300,300 -o 
P99_LATENCY
99th Percentile Latency Microseconds
7081

lpaa5:/tmp# tc qd replace dev eth1 root fq low_rate_threshold 10Mbit
lpaa5:/tmp# ./netperf -H lpaa6 -t TCP_RR -l10 -- -q 50 -r 300,300 -o 
P99_LATENCY
99th Percentile Latency Microseconds
858


Signed-off-by: Eric Dumazet 
---
 tc/q_fq.c |   20 
 1 file changed, 20 insertions(+)

diff --git a/tc/q_fq.c b/tc/q_fq.c
index c9efbfc4..45b2ffd9 100644
--- a/tc/q_fq.c
+++ b/tc/q_fq.c
@@ -55,6 +55,7 @@ static void explain(void)
fprintf(stderr, "  [ quantum BYTES ] [ initial_quantum 
BYTES ]\n");
fprintf(stderr, "  [ maxrate RATE  ] [ buckets NUMBER ]\n");
fprintf(stderr, "  [ [no]pacing ] [ refill_delay TIME ]\n");
+   fprintf(stderr, "  [ low_rate_threshold RATE ]\n");
fprintf(stderr, "  [ orphan_mask MASK]\n");
 }
 
@@ -79,6 +80,7 @@ static int fq_parse_opt(struct qdisc_util *qu, int argc, char 
**argv,
unsigned int initial_quantum;
unsigned int buckets = 0;
unsigned int maxrate;
+   unsigned int low_rate_threshold;
unsigned int defrate;
unsigned int refill_delay;
unsigned int orphan_mask;
@@ -90,6 +92,7 @@ static int fq_parse_opt(struct qdisc_util *qu, int argc, char 
**argv,
bool set_defrate = false;
bool set_refill_delay = false;
bool set_orphan_mask = false;
+   bool set_low_rate_threshold = false;
int pacing = -1;
struct rtattr *tail;
 
@@ -121,6 +124,13 @@ static int fq_parse_opt(struct qdisc_util *qu, int argc, 
char **argv,
return -1;
}
set_maxrate = true;
+   } else if (strcmp(*argv, "low_rate_threshold") == 0) {
+   NEXT_ARG();
+   if (get_rate(&low_rate_threshold, *argv)) {
+   fprintf(stderr, "Illegal 
\"low_rate_threshold\"\n");
+   return -1;
+   }
+   set_low_rate_threshold = true;
} else if (strcmp(*argv, "defrate") == 0) {
NEXT_ARG();
if (get_rate(&defrate, *argv)) {
@@ -196,6 +206,9 @@ static int fq_parse_opt(struct qdisc_util *qu, int argc, 
char **argv,
if (set_maxrate)
addattr_l(n, 1024, TCA_FQ_FLOW_MAX_RATE,
  &maxrate, sizeof(maxrate));
+   if (set_low_rate_threshold)
+   addattr_l(n, 1024, TCA_FQ_LOW_RATE_THRESHOLD,
+ &low_rate_threshold, sizeof(low_rate_threshold));
if (set_defrate)
addattr_l(n, 1024, TCA_FQ_FLOW_DEFAULT_RATE,
  &defrate, sizeof(defrate));
@@ -276,6 +289,13 @@ static int fq_print_opt(struct qdisc_util *qu, FILE *f, 
struct rtattr *opt)
if (rate != 0)
fprintf(f, "defrate %s ", sprint_rate(rate, b1));
}
+   if (tb[TCA_FQ_LOW_RATE_THRESHOLD] &&
+   RTA_PAYLOAD(tb[TCA_FQ_LOW_RATE_THRESHOLD]) >= sizeof(__u32)) {
+   rate = rta_getattr_u32(tb[TCA_FQ_LOW_RATE_THRESHOLD]);
+
+   if (rate != 0)
+   fprintf(f, "low_rate_threshold %s ", sprint_rate(rate, 
b1));
+   }
if (tb[TCA_FQ_FLOW_REFILL_DELAY] &&
RTA_PAYLOAD(tb[TCA_FQ_FLOW_REFILL_DELAY]) >= sizeof(__u32)) {
refill_delay = rta_getattr_u32(tb[TCA_FQ_FLOW_REFILL_DELAY]);




[net PATCH 3/3] bpf: devmap, use cond_resched instead of cpu_relax

2017-09-08 Thread John Fastabend
Be a bit more friendly about waiting for flush bits to complete.
Replace the cpu_relax() with a cond_resched().

Suggested-by: Daniel Borkmann 
Acked-by: Daniel Borkmann 
Signed-off-by: John Fastabend 
---
 kernel/bpf/devmap.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index ecf9f99..959c9a0 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -159,7 +159,7 @@ static void dev_map_free(struct bpf_map *map)
unsigned long *bitmap = per_cpu_ptr(dtab->flush_needed, cpu);
 
while (!bitmap_empty(bitmap, dtab->map.max_entries))
-   cpu_relax();
+   cond_resched();
}
 
for (i = 0; i < dtab->map.max_entries; i++) {



[net PATCH 1/3] net: rcu lock and preempt disable missing around generic xdp

2017-09-08 Thread John Fastabend
do_xdp_generic must be called inside rcu critical section with preempt
disabled to ensure BPF programs are valid and per-cpu variables used
for redirect operations are consistent. This patch ensures this is true
and fixes the splat below.

The netif_receive_skb_internal() code path is now broken into two rcu
critical sections. I decided it was better to limit the preempt_enable/disable
block to just the xdp static key portion and the fallout is more
rcu_read_lock/unlock calls. Seems like the best option to me.

[  607.596901] =
[  607.596906] WARNING: suspicious RCU usage
[  607.596912] 4.13.0-rc4+ #570 Not tainted
[  607.596917] -
[  607.596923] net/core/dev.c:3948 suspicious rcu_dereference_check() usage!
[  607.596927]
[  607.596927] other info that might help us debug this:
[  607.596927]
[  607.596933]
[  607.596933] rcu_scheduler_active = 2, debug_locks = 1
[  607.596938] 2 locks held by pool/14624:
[  607.596943]  #0:  (rcu_read_lock_bh){..}, at: [] 
ip_finish_output2+0x14d/0x890
[  607.596973]  #1:  (rcu_read_lock_bh){..}, at: [] 
__dev_queue_xmit+0x14a/0xfd0
[  607.597000]
[  607.597000] stack backtrace:
[  607.597006] CPU: 5 PID: 14624 Comm: pool Not tainted 4.13.0-rc4+ #570
[  607.597011] Hardware name: Dell Inc. Precision Tower 5810/0HHV7N, BIOS A17 
03/01/2017
[  607.597016] Call Trace:
[  607.597027]  dump_stack+0x67/0x92
[  607.597040]  lockdep_rcu_suspicious+0xdd/0x110
[  607.597054]  do_xdp_generic+0x313/0xa50
[  607.597068]  ? time_hardirqs_on+0x5b/0x150
[  607.597076]  ? mark_held_locks+0x6b/0xc0
[  607.597088]  ? netdev_pick_tx+0x150/0x150
[  607.597117]  netif_rx_internal+0x205/0x3f0
[  607.597127]  ? do_xdp_generic+0xa50/0xa50
[  607.597144]  ? lock_downgrade+0x2b0/0x2b0
[  607.597158]  ? __lock_is_held+0x93/0x100
[  607.597187]  netif_rx+0x119/0x190
[  607.597202]  loopback_xmit+0xfd/0x1b0
[  607.597214]  dev_hard_start_xmit+0x127/0x4e0

Fixes: d445516966dc ("net: xdp: support xdp generic on virtual devices")
Fixes: b5cdae3291f7 ("net: Generic XDP")
Acked-by: Daniel Borkmann 
Signed-off-by: John Fastabend 
---
 net/core/dev.c |   25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 6f845e4..fb766d9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3981,8 +3981,13 @@ static int netif_rx_internal(struct sk_buff *skb)
trace_netif_rx(skb);
 
if (static_key_false(&generic_xdp_needed)) {
-   int ret = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog),
-skb);
+   int ret;
+
+   preempt_disable();
+   rcu_read_lock();
+   ret = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
+   rcu_read_unlock();
+   preempt_enable();
 
/* Consider XDP consuming the packet a success from
 * the netdev point of view we do not want to count
@@ -4500,18 +4505,20 @@ static int netif_receive_skb_internal(struct sk_buff 
*skb)
if (skb_defer_rx_timestamp(skb))
return NET_RX_SUCCESS;
 
-   rcu_read_lock();
-
if (static_key_false(&generic_xdp_needed)) {
-   int ret = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog),
-skb);
+   int ret;
 
-   if (ret != XDP_PASS) {
-   rcu_read_unlock();
+   preempt_disable();
+   rcu_read_lock();
+   ret = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
+   rcu_read_unlock();
+   preempt_enable();
+
+   if (ret != XDP_PASS)
return NET_RX_DROP;
-   }
}
 
+   rcu_read_lock();
 #ifdef CONFIG_RPS
if (static_key_false(&rps_needed)) {
struct rps_dev_flow voidflow, *rflow = &voidflow;



[net PATCH 2/3] bpf: add support for sockmap detach programs

2017-09-08 Thread John Fastabend
The bpf map sockmap supports adding programs via attach commands. This
patch adds the detach command to keep the API symmetric and allow
users to remove previously added programs. Otherwise the user would
have to delete the map and re-add it to get in this state.

This also adds a series of additional tests to capture detach operation
and also attaching/detaching invalid prog types.

API note: socks will run (or not run) programs depending on the state
of the map at the time the sock is added. We do not for example walk
the map and remove programs from previously attached socks.

Acked-by: Daniel Borkmann 
Signed-off-by: John Fastabend 
---
 include/linux/bpf.h |8 ++---
 kernel/bpf/sockmap.c|2 +
 kernel/bpf/syscall.c|   27 ++--
 tools/testing/selftests/bpf/test_maps.c |   51 ++-
 4 files changed, 72 insertions(+), 16 deletions(-)

diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index c2cb1b5..8390859 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -385,16 +385,16 @@ static inline void __dev_map_flush(struct bpf_map *map)
 
 #if defined(CONFIG_STREAM_PARSER) && defined(CONFIG_BPF_SYSCALL)
 struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 key);
-int sock_map_attach_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type);
+int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type);
 #else
 static inline struct sock  *__sock_map_lookup_elem(struct bpf_map *map, u32 
key)
 {
return NULL;
 }
 
-static inline int sock_map_attach_prog(struct bpf_map *map,
-  struct bpf_prog *prog,
-  u32 type)
+static inline int sock_map_prog(struct bpf_map *map,
+   struct bpf_prog *prog,
+   u32 type)
 {
return -EOPNOTSUPP;
 }
diff --git a/kernel/bpf/sockmap.c b/kernel/bpf/sockmap.c
index f6ffde9..6424ce0 100644
--- a/kernel/bpf/sockmap.c
+++ b/kernel/bpf/sockmap.c
@@ -792,7 +792,7 @@ static int sock_map_ctx_update_elem(struct 
bpf_sock_ops_kern *skops,
return err;
 }
 
-int sock_map_attach_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type)
+int sock_map_prog(struct bpf_map *map, struct bpf_prog *prog, u32 type)
 {
struct bpf_stab *stab = container_of(map, struct bpf_stab, map);
struct bpf_prog *orig;
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index 70ad8e2..cb17e1c 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -1096,10 +1096,10 @@ static int bpf_obj_get(const union bpf_attr *attr)
 
 #define BPF_PROG_ATTACH_LAST_FIELD attach_flags
 
-static int sockmap_get_from_fd(const union bpf_attr *attr)
+static int sockmap_get_from_fd(const union bpf_attr *attr, bool attach)
 {
+   struct bpf_prog *prog = NULL;
int ufd = attr->target_fd;
-   struct bpf_prog *prog;
struct bpf_map *map;
struct fd f;
int err;
@@ -1109,16 +1109,20 @@ static int sockmap_get_from_fd(const union bpf_attr 
*attr)
if (IS_ERR(map))
return PTR_ERR(map);
 
-   prog = bpf_prog_get_type(attr->attach_bpf_fd, BPF_PROG_TYPE_SK_SKB);
-   if (IS_ERR(prog)) {
-   fdput(f);
-   return PTR_ERR(prog);
+   if (attach) {
+   prog = bpf_prog_get_type(attr->attach_bpf_fd,
+BPF_PROG_TYPE_SK_SKB);
+   if (IS_ERR(prog)) {
+   fdput(f);
+   return PTR_ERR(prog);
+   }
}
 
-   err = sock_map_attach_prog(map, prog, attr->attach_type);
+   err = sock_map_prog(map, prog, attr->attach_type);
if (err) {
fdput(f);
-   bpf_prog_put(prog);
+   if (prog)
+   bpf_prog_put(prog);
return err;
}
 
@@ -1155,7 +1159,7 @@ static int bpf_prog_attach(const union bpf_attr *attr)
break;
case BPF_SK_SKB_STREAM_PARSER:
case BPF_SK_SKB_STREAM_VERDICT:
-   return sockmap_get_from_fd(attr);
+   return sockmap_get_from_fd(attr, true);
default:
return -EINVAL;
}
@@ -1204,7 +1208,10 @@ static int bpf_prog_detach(const union bpf_attr *attr)
ret = cgroup_bpf_update(cgrp, NULL, attr->attach_type, false);
cgroup_put(cgrp);
break;
-
+   case BPF_SK_SKB_STREAM_PARSER:
+   case BPF_SK_SKB_STREAM_VERDICT:
+   ret = sockmap_get_from_fd(attr, false);
+   break;
default:
return -EINVAL;
}
diff --git a/tools/testing/selftests/bpf/test_maps.c 
b/tools/testing/selftests/bpf/test_maps.c
index 4acc772..fe3a443 100644
--- a/tools/testing/selftests/bpf/test_maps.c
+++ b/tools/testing/selftests/bpf/test_maps.c
@@ -558,7 +558,7 @@ static void test_sockmap(int ta

[net PATCH 0/3] Fixes for XDP/BPF

2017-09-08 Thread John Fastabend
The following fixes, UAPI updates, and small improvement,

i. XDP needs to be called inside RCU with preempt disabled.

ii. Not strictly a bug fix but we have an attach command in the
sockmap UAPI already to avoid having a single kernel released with
only the attach and not the detach I'm pushing this into net branch.
Its early in the RC cycle so I think this is OK (not ideal but better
than supporting a UAPI with a missing detach forever).

iii. Final patch replace cpu_relax with cond_resched in devmap.

---

John Fastabend (3):
  net: rcu lock and preempt disable missing around generic xdp
  bpf: add support for sockmap detach programs
  bpf: devmap, use cond_resched instead of cpu_relax


 include/linux/bpf.h |8 ++---
 kernel/bpf/devmap.c |2 +
 kernel/bpf/sockmap.c|2 +
 kernel/bpf/syscall.c|   27 ++--
 net/core/dev.c  |   25 ++-
 tools/testing/selftests/bpf/test_maps.c |   51 ++-
 6 files changed, 89 insertions(+), 26 deletions(-)

--
Signature


Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Florian Fainelli
On 09/08/2017 01:07 PM, tristram...@microchip.com wrote:
>> -Original Message-
>> From: Florian Fainelli [mailto:f.faine...@gmail.com]
>> Sent: Friday, September 08, 2017 12:54 PM
>> To: Tristram Ha - C24268; muva...@gmail.com
>> Cc: and...@lunn.ch; pa...@ucw.cz; nathan.leigh.con...@gmail.com;
>> vivien.dide...@savoirfairelinux.com; netdev@vger.kernel.org; linux-
>> ker...@vger.kernel.org; Woojung Huh - C21699
>> Subject: Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new
>> drivers can be added
>>
>> On 09/08/2017 12:48 PM, tristram...@microchip.com wrote:
 -Original Message-
 From: Maxim Uvarov [mailto:muva...@gmail.com]
 Sent: Friday, September 08, 2017 12:00 PM
 To: Florian Fainelli
 Cc: Tristram Ha - C24268; Andrew Lunn; Pavel Machek; Nathan Conrad; Vivien
 Didelot; netdev; linux-ker...@vger.kernel.org; Woojung Huh - C21699
 Subject: Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that
>> new
 drivers can be added

 2017-09-08 21:48 GMT+03:00 Florian Fainelli :
> On 09/07/2017 02:11 PM, tristram...@microchip.com wrote:
>> From: Tristram Ha 
>>
>> Add other KSZ switches support so that patch check does not complain.
>>
>> Signed-off-by: Tristram Ha 
>> ---
>>  Documentation/devicetree/bindings/net/dsa/ksz.txt | 117
>> --
>>  1 file changed, 62 insertions(+), 55 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt
>> b/Documentation/devicetree/bindings/net/dsa/ksz.txt
>> index 0ab8b39..34af0e0 100644
>> --- a/Documentation/devicetree/bindings/net/dsa/ksz.txt
>> +++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt
>> @@ -3,8 +3,15 @@ Microchip KSZ Series Ethernet switches
>>
>>  Required properties:
>>
>> -- compatible: For external switch chips, compatible string must be
>> exactly one
>> -  of: "microchip,ksz9477"
>> +- compatible: Should be "microchip,ksz9477" for KSZ9477 chip,
>> +   "microchip,ksz8795" for KSZ8795 chip,
>> +   "microchip,ksz8794" for KSZ8794 chip,
>> +   "microchip,ksz8765" for KSZ8765 chip,
>> +   "microchip,ksz8895" for KSZ8895 chip,
>> +   "microchip,ksz8864" for KSZ8864 chip,
>> +   "microchip,ksz8873" for KSZ8873 chip,
>> +   "microchip,ksz8863" for KSZ8863 chip,
>> +   "microchip,ksz8463" for KSZ8463 chip
>


 Tristram, does any of this devices support chaining?

 Maxim.
>>>
>>> They do not if you mean daisy-chaining the switches together.
>>
>> Marvell tags allow you to specify both a port and switch index
>> destination after setting up an appropriate routing table, I am assuming
>> this is not supported.
>>
>> What happens though if I connect two KSZ switches ones to another say:
>>
>> eth0
>>  -> KSZ8463
>>  -> KSZ8463
>>
>> Will the first switch terminates KSZ tag if it sees two tags
>> encapsulated in another, something like this:
>>
>> | MAC DA | MAC SA |  payload | Inner KSZ tag | Inner KSZ tag | FCS |
>>
> 
> In theory it is doable by adding more tags and remember which port
> is connected to the cpu port of another switch, but there is no switch
> forwarding and everything is handled by software.

Fair enough.

> 
>>>
>>> There is always a problem that once tail tagging mode is enabled
>>> sending a frame through the MAC without going through the DSA
>>> layer will cause the frame to be dropped.
>>
>> Yes, once the master network device is used for DSA, it is still usable
>> directly by e.g: applications, but it won't do anything if the switch is
>> configured such that it drops ingressing frames not having the proper
>> tag. We documented that here:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Docume
>> ntation/networking/dsa/dsa.txt#n275
>> --
> 
> As the DSA was developed for the Marvell switches I assumed they can
> forward frame without a tag.

Once you use tags with your switch, eth0/master netdev/conduit interface
no longer has a purpose as a normal interface because we create per-port
network devices. That means if you want to send packets towards Port 0
you use the proper network interface. If you create a bridge, you will
use brX as the network device to send/receive packets from, in all
cases, the packets originate from the CPU and the frame ingresses the
switch with the proper information within the tag to target the vector
of ports.

If you have tags enabled and you use eth0 to send packets towards the
switch, there are not that many options, either you have the proper tag,
and the switch will forward to the right port which can be the CPU port
itself if you want, but why do that?

When tags are not enabled (e.g: b53) that is slightly different,
eth0/master/conduit remains usable as a normal interface would, but
unless you start adding VLAN tags, you cann

RE: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Tristram.Ha
> -Original Message-
> From: Florian Fainelli [mailto:f.faine...@gmail.com]
> Sent: Friday, September 08, 2017 12:54 PM
> To: Tristram Ha - C24268; muva...@gmail.com
> Cc: and...@lunn.ch; pa...@ucw.cz; nathan.leigh.con...@gmail.com;
> vivien.dide...@savoirfairelinux.com; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; Woojung Huh - C21699
> Subject: Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new
> drivers can be added
> 
> On 09/08/2017 12:48 PM, tristram...@microchip.com wrote:
> >> -Original Message-
> >> From: Maxim Uvarov [mailto:muva...@gmail.com]
> >> Sent: Friday, September 08, 2017 12:00 PM
> >> To: Florian Fainelli
> >> Cc: Tristram Ha - C24268; Andrew Lunn; Pavel Machek; Nathan Conrad; Vivien
> >> Didelot; netdev; linux-ker...@vger.kernel.org; Woojung Huh - C21699
> >> Subject: Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that
> new
> >> drivers can be added
> >>
> >> 2017-09-08 21:48 GMT+03:00 Florian Fainelli :
> >>> On 09/07/2017 02:11 PM, tristram...@microchip.com wrote:
>  From: Tristram Ha 
> 
>  Add other KSZ switches support so that patch check does not complain.
> 
>  Signed-off-by: Tristram Ha 
>  ---
>   Documentation/devicetree/bindings/net/dsa/ksz.txt | 117
>  --
>   1 file changed, 62 insertions(+), 55 deletions(-)
> 
>  diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt
>  b/Documentation/devicetree/bindings/net/dsa/ksz.txt
>  index 0ab8b39..34af0e0 100644
>  --- a/Documentation/devicetree/bindings/net/dsa/ksz.txt
>  +++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt
>  @@ -3,8 +3,15 @@ Microchip KSZ Series Ethernet switches
> 
>   Required properties:
> 
>  -- compatible: For external switch chips, compatible string must be
>  exactly one
>  -  of: "microchip,ksz9477"
>  +- compatible: Should be "microchip,ksz9477" for KSZ9477 chip,
>  +   "microchip,ksz8795" for KSZ8795 chip,
>  +   "microchip,ksz8794" for KSZ8794 chip,
>  +   "microchip,ksz8765" for KSZ8765 chip,
>  +   "microchip,ksz8895" for KSZ8895 chip,
>  +   "microchip,ksz8864" for KSZ8864 chip,
>  +   "microchip,ksz8873" for KSZ8873 chip,
>  +   "microchip,ksz8863" for KSZ8863 chip,
>  +   "microchip,ksz8463" for KSZ8463 chip
> >>>
> >>
> >>
> >> Tristram, does any of this devices support chaining?
> >>
> >> Maxim.
> >
> > They do not if you mean daisy-chaining the switches together.
> 
> Marvell tags allow you to specify both a port and switch index
> destination after setting up an appropriate routing table, I am assuming
> this is not supported.
> 
> What happens though if I connect two KSZ switches ones to another say:
> 
> eth0
>   -> KSZ8463
>   -> KSZ8463
> 
> Will the first switch terminates KSZ tag if it sees two tags
> encapsulated in another, something like this:
> 
> | MAC DA | MAC SA |  payload | Inner KSZ tag | Inner KSZ tag | FCS |
> 

In theory it is doable by adding more tags and remember which port
is connected to the cpu port of another switch, but there is no switch
forwarding and everything is handled by software.

> >
> > There is always a problem that once tail tagging mode is enabled
> > sending a frame through the MAC without going through the DSA
> > layer will cause the frame to be dropped.
> 
> Yes, once the master network device is used for DSA, it is still usable
> directly by e.g: applications, but it won't do anything if the switch is
> configured such that it drops ingressing frames not having the proper
> tag. We documented that here:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Docume
> ntation/networking/dsa/dsa.txt#n275
> --

As the DSA was developed for the Marvell switches I assumed they can
forward frame without a tag.



RE: [PATCH net-next] mdio_bus: Remove unneeded gpiod NULL check

2017-09-08 Thread Woojung.Huh
> >> I think it means CONFIG_GPIOLIB=n in the kernel because it's not needed,
> >> yet you run code (like drivers/net/phy/mdio_bus.c) that unconditionally
> >> calls into GPIOLIB and attempts to configure a given GPIO if available.
> > Yes. I'm facing issue on PC which won't need GPIOLIB as default.
> > Warning message goes away when GPIOLIB is enabled, and fortunately,
> > Ubuntu default config has it.
> > So, it may not be seen by many users when with full/default configuration.
> 
> Woojung, I suppose you are also getting a warning from
> gpiod_set_value_cansleep() done in mdiobus_unregister() right? With
> CONFIG_GPIOLIB=n devm_gpiod_get_optional() returns NULL, which we
> don't
> check as an error, on purpose however we still call
> gpiod_set_value_cansleep() on a NULL GPIO descriptor, so the following
> should do:
> 
> diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
> index b6f9fa670168..67dbb7c26840 100644
> --- a/drivers/net/phy/mdio_bus.c
> +++ b/drivers/net/phy/mdio_bus.c
> @@ -424,7 +424,8 @@ void mdiobus_unregister(struct mii_bus *bus)
> }
> 
> /* Put PHYs in RESET to save power */
> -   gpiod_set_value_cansleep(bus->reset_gpiod, 1);
> +   if (!IS_ERR_OR_NULL(bus->reset_gpiod))
> +   gpiod_set_value_cansleep(bus->reset_gpiod, 1);
> 
> device_del(&bus->dev);
>  }
Hi Florian,

Thanks for the patch. I'm avoiding warning with CONFIG_GPIOLIB=y for now.
I'm curious that there is a final conclusion to resolve this issue,
either with CONFIG_GPIOLIB=y or something else.

- Woojung


Re: [PATCH net-next] mdio_bus: Remove unneeded gpiod NULL check

2017-09-08 Thread Florian Fainelli
On 09/07/2017 02:51 PM, woojung@microchip.com wrote:
> If someone is using GPIO descriptors with GPIO disabled, i.e. calling
> gpiod_get() and friends, this is very likely to be a bug, and what
> the driver wants to do is:
>
>  depends on GPIOLIB
>
> or
>
>  select GPIOLIB
>
> in Kconfig. The whole optional thing is mainly a leftover from when it
> was possible to have a local implementation of the GPIOLIB API in
> some custom header file, noone sane should be doing that anymore,
> and if they do, they can very well face the warnings.
>
> If someone is facing a lot of WARN_ON() messages to this, it is a clear
> indication that they need to fix their Kconfig and in that case it is 
> proper.
 Linus & Andrew,

 I knew that it is already in David's pulling request.
 Configuring GPIOLIB is the right solution  even if platform doesn't use it?
>>>
>>> I guess?
>>>
>>> "Platform doesn't use it" what does that mean?
>>>
>>> Does it mean it does not call the
>>> APIs of the GPIOLIB, does it mean it doesn't have a GPIO driver
>>> at probe (but may have one by having it probed from a module)
>>> or does it mean the platform can never have it?
>>
>> I think it means CONFIG_GPIOLIB=n in the kernel because it's not needed,
>> yet you run code (like drivers/net/phy/mdio_bus.c) that unconditionally
>> calls into GPIOLIB and attempts to configure a given GPIO if available.
> Yes. I'm facing issue on PC which won't need GPIOLIB as default.
> Warning message goes away when GPIOLIB is enabled, and fortunately,
> Ubuntu default config has it.
> So, it may not be seen by many users when with full/default configuration.

Woojung, I suppose you are also getting a warning from
gpiod_set_value_cansleep() done in mdiobus_unregister() right? With
CONFIG_GPIOLIB=n devm_gpiod_get_optional() returns NULL, which we don't
check as an error, on purpose however we still call
gpiod_set_value_cansleep() on a NULL GPIO descriptor, so the following
should do:

diff --git a/drivers/net/phy/mdio_bus.c b/drivers/net/phy/mdio_bus.c
index b6f9fa670168..67dbb7c26840 100644
--- a/drivers/net/phy/mdio_bus.c
+++ b/drivers/net/phy/mdio_bus.c
@@ -424,7 +424,8 @@ void mdiobus_unregister(struct mii_bus *bus)
}

/* Put PHYs in RESET to save power */
-   gpiod_set_value_cansleep(bus->reset_gpiod, 1);
+   if (!IS_ERR_OR_NULL(bus->reset_gpiod))
+   gpiod_set_value_cansleep(bus->reset_gpiod, 1);

device_del(&bus->dev);
 }


> 
>> This thread is actually what prompted me to write this email:
>>
>> https://lkml.org/lkml/2017/9/2/3
> Thanks for the link.
> 
> 


-- 
Florian


Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Florian Fainelli
On 09/08/2017 12:48 PM, tristram...@microchip.com wrote:
>> -Original Message-
>> From: Maxim Uvarov [mailto:muva...@gmail.com]
>> Sent: Friday, September 08, 2017 12:00 PM
>> To: Florian Fainelli
>> Cc: Tristram Ha - C24268; Andrew Lunn; Pavel Machek; Nathan Conrad; Vivien
>> Didelot; netdev; linux-ker...@vger.kernel.org; Woojung Huh - C21699
>> Subject: Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new
>> drivers can be added
>>
>> 2017-09-08 21:48 GMT+03:00 Florian Fainelli :
>>> On 09/07/2017 02:11 PM, tristram...@microchip.com wrote:
 From: Tristram Ha 

 Add other KSZ switches support so that patch check does not complain.

 Signed-off-by: Tristram Ha 
 ---
  Documentation/devicetree/bindings/net/dsa/ksz.txt | 117
 --
  1 file changed, 62 insertions(+), 55 deletions(-)

 diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt
 b/Documentation/devicetree/bindings/net/dsa/ksz.txt
 index 0ab8b39..34af0e0 100644
 --- a/Documentation/devicetree/bindings/net/dsa/ksz.txt
 +++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt
 @@ -3,8 +3,15 @@ Microchip KSZ Series Ethernet switches

  Required properties:

 -- compatible: For external switch chips, compatible string must be
 exactly one
 -  of: "microchip,ksz9477"
 +- compatible: Should be "microchip,ksz9477" for KSZ9477 chip,
 +   "microchip,ksz8795" for KSZ8795 chip,
 +   "microchip,ksz8794" for KSZ8794 chip,
 +   "microchip,ksz8765" for KSZ8765 chip,
 +   "microchip,ksz8895" for KSZ8895 chip,
 +   "microchip,ksz8864" for KSZ8864 chip,
 +   "microchip,ksz8873" for KSZ8873 chip,
 +   "microchip,ksz8863" for KSZ8863 chip,
 +   "microchip,ksz8463" for KSZ8463 chip
>>>
>>
>>
>> Tristram, does any of this devices support chaining?
>>
>> Maxim.
> 
> They do not if you mean daisy-chaining the switches together.

Marvell tags allow you to specify both a port and switch index
destination after setting up an appropriate routing table, I am assuming
this is not supported.

What happens though if I connect two KSZ switches ones to another say:

eth0
-> KSZ8463
-> KSZ8463

Will the first switch terminates KSZ tag if it sees two tags
encapsulated in another, something like this:

| MAC DA | MAC SA |  payload | Inner KSZ tag | Inner KSZ tag | FCS |

> 
> There is always a problem that once tail tagging mode is enabled
> sending a frame through the MAC without going through the DSA
> layer will cause the frame to be dropped.

Yes, once the master network device is used for DSA, it is still usable
directly by e.g: applications, but it won't do anything if the switch is
configured such that it drops ingressing frames not having the proper
tag. We documented that here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/dsa/dsa.txt#n275
-- 
Florian


Re: Use after free in __dst_destroy_metrics_generic

2017-09-08 Thread Subash Abhinov Kasiviswanathan

On 2017-09-08 10:10, Cong Wang wrote:

On Thu, Sep 7, 2017 at 5:52 PM, Subash Abhinov Kasiviswanathan
 wrote:

We are seeing a possible use after free in ip6_dst_destroy.

It appears as if memory of the __DST_METRICS_PTR(old) was freed in 
some path

and allocated
to ion driver. ion driver has also freed it. Finally the memory is 
freed by

the
fib gc and crashes since it is already deallocated.


Does the attach (compile-only) patch help anything?

From my _quick_ glance, it seems we miss the refcnt'ing
right in __dst_destroy_metrics_generic().

Thanks!


Hi Cong

Thanks for patch. I'll try this out.

--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


RE: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Tristram.Ha
> -Original Message-
> From: Maxim Uvarov [mailto:muva...@gmail.com]
> Sent: Friday, September 08, 2017 12:00 PM
> To: Florian Fainelli
> Cc: Tristram Ha - C24268; Andrew Lunn; Pavel Machek; Nathan Conrad; Vivien
> Didelot; netdev; linux-ker...@vger.kernel.org; Woojung Huh - C21699
> Subject: Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new
> drivers can be added
> 
> 2017-09-08 21:48 GMT+03:00 Florian Fainelli :
> > On 09/07/2017 02:11 PM, tristram...@microchip.com wrote:
> >> From: Tristram Ha 
> >>
> >> Add other KSZ switches support so that patch check does not complain.
> >>
> >> Signed-off-by: Tristram Ha 
> >> ---
> >>  Documentation/devicetree/bindings/net/dsa/ksz.txt | 117
> >> --
> >>  1 file changed, 62 insertions(+), 55 deletions(-)
> >>
> >> diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt
> >> b/Documentation/devicetree/bindings/net/dsa/ksz.txt
> >> index 0ab8b39..34af0e0 100644
> >> --- a/Documentation/devicetree/bindings/net/dsa/ksz.txt
> >> +++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt
> >> @@ -3,8 +3,15 @@ Microchip KSZ Series Ethernet switches
> >>
> >>  Required properties:
> >>
> >> -- compatible: For external switch chips, compatible string must be
> >> exactly one
> >> -  of: "microchip,ksz9477"
> >> +- compatible: Should be "microchip,ksz9477" for KSZ9477 chip,
> >> +   "microchip,ksz8795" for KSZ8795 chip,
> >> +   "microchip,ksz8794" for KSZ8794 chip,
> >> +   "microchip,ksz8765" for KSZ8765 chip,
> >> +   "microchip,ksz8895" for KSZ8895 chip,
> >> +   "microchip,ksz8864" for KSZ8864 chip,
> >> +   "microchip,ksz8873" for KSZ8873 chip,
> >> +   "microchip,ksz8863" for KSZ8863 chip,
> >> +   "microchip,ksz8463" for KSZ8463 chip
> >
> 
> 
> Tristram, does any of this devices support chaining?
> 
> Maxim.

They do not if you mean daisy-chaining the switches together.

There is always a problem that once tail tagging mode is enabled
sending a frame through the MAC without going through the DSA
layer will cause the frame to be dropped.



Re: [PATCH net] net: qualcomm: rmnet: Fix a double free

2017-09-08 Thread Subash Abhinov Kasiviswanathan

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
index 557c9bf1a469..0335fce54201 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -95,10 +95,8 @@ struct sk_buff *rmnet_map_deaggregate(struct sk_buff 
*skb)

skb_pull(skb, packet_len);

/* Some hardware can send us empty frames. Catch them */
-   if (ntohs(maph->pkt_len) == 0) {
-   kfree_skb(skb);
+   if (ntohs(maph->pkt_len) == 0)
return NULL;
-   }

return skbn;
 }


Thanks for the patch. This is fixing the double free, but is leaking the 
new skb skbn

created. Perhaps we should add the check earlier -

diff --git a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c 
b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c

index 557c9bf..86b8c75 100644
--- a/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
+++ b/drivers/net/ethernet/qualcomm/rmnet/rmnet_map_data.c
@@ -84,6 +84,10 @@ struct sk_buff *rmnet_map_deaggregate(struct sk_buff 
*skb)

if (((int)skb->len - (int)packet_len) < 0)
return NULL;

+   /* Some hardware can send us empty frames. Catch them */
+   if (ntohs(maph->pkt_len) == 0)
+   return NULL;
+
skbn = alloc_skb(packet_len + RMNET_MAP_DEAGGR_SPACING, 
GFP_ATOMIC);

if (!skbn)
return NULL;
@@ -94,11 +98,5 @@ struct sk_buff *rmnet_map_deaggregate(struct sk_buff 
*skb)

memcpy(skbn->data, skb->data, packet_len);
skb_pull(skb, packet_len);

-   /* Some hardware can send us empty frames. Catch them */
-   if (ntohs(maph->pkt_len) == 0) {
-   kfree_skb(skb);
-   return NULL;
-   }
-
return skbn;
 }


--
Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a 
Linux Foundation Collaborative Project


[PATCH net] tcp: fix a request socket leak

2017-09-08 Thread Eric Dumazet
From: Eric Dumazet 

While the cited commit fixed a possible deadlock, it added a leak
of the request socket, since reqsk_put() must be called if the BPF
filter decided the ACK packet must be dropped.

Fixes: d624d276d1dd ("tcp: fix possible deadlock in TCP stack vs BPF filter")
Signed-off-by: Eric Dumazet 
---
David, this is a stable candidate (linux-4.13 has this bug)

 net/ipv4/tcp_ipv4.c |6 +++---
 net/ipv6/tcp_ipv6.c |6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 
a63486afa7a7e7b4dce88b65bc27cfa872a3ba2f..d9416b5162bc1bdd1acd34fcb4da21cb6b62d0ae
 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -1669,9 +1669,9 @@ int tcp_v4_rcv(struct sk_buff *skb)
 */
sock_hold(sk);
refcounted = true;
-   if (tcp_filter(sk, skb))
-   goto discard_and_relse;
-   nsk = tcp_check_req(sk, skb, req, false);
+   nsk = NULL;
+   if (!tcp_filter(sk, skb))
+   nsk = tcp_check_req(sk, skb, req, false);
if (!nsk) {
reqsk_put(req);
goto discard_and_relse;
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 
38f76d8b231e3e5923ad72d05b8d320b6aeb850f..64d94afa427f81fd7a505b1cfd1c0be66c273810
 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1460,9 +1460,9 @@ static int tcp_v6_rcv(struct sk_buff *skb)
}
sock_hold(sk);
refcounted = true;
-   if (tcp_filter(sk, skb))
-   goto discard_and_relse;
-   nsk = tcp_check_req(sk, skb, req, false);
+   nsk = NULL;
+   if (!tcp_filter(sk, skb))
+   nsk = tcp_check_req(sk, skb, req, false);
if (!nsk) {
reqsk_put(req);
goto discard_and_relse;




Re: [PATCH net] netlink: access nlk groups safely in netlink bind and getname

2017-09-08 Thread Cong Wang
On Tue, Sep 5, 2017 at 8:53 PM, Xin Long  wrote:
> Now there is no lock protecting nlk ngroups/groups' accessing in
> netlink bind and getname. It's safe from nlk groups' setting in
> netlink_release, but not from netlink_realloc_groups called by
> netlink_setsockopt.
>
> netlink_lock_table is needed in both netlink bind and getname when
> accessing nlk groups.

This looks very odd.

netlink_lock_table() should be protecting nl_table, why
it also protects nlk->groups?? For me it looks like you
need lock_sock() instead.


Re: [PATCH net-next 2/3] net: ethernet: socionext: add AVE ethernet driver

2017-09-08 Thread Florian Fainelli
On 09/08/2017 06:02 AM, Kunihiko Hayashi wrote:
> The UniPhier platform from Socionext provides the AVE ethernet
> controller that includes MAC and MDIO bus supporting RGMII/RMII
> modes. The controller is named AVE.
> 
> Signed-off-by: Kunihiko Hayashi 
> Signed-off-by: Jassi Brar 
> ---
>  drivers/net/ethernet/Kconfig |1 +
>  drivers/net/ethernet/Makefile|1 +
>  drivers/net/ethernet/socionext/Kconfig   |   22 +
>  drivers/net/ethernet/socionext/Makefile  |4 +
>  drivers/net/ethernet/socionext/sni_ave.c | 1618 
> ++
>  5 files changed, 1646 insertions(+)
>  create mode 100644 drivers/net/ethernet/socionext/Kconfig
>  create mode 100644 drivers/net/ethernet/socionext/Makefile
>  create mode 100644 drivers/net/ethernet/socionext/sni_ave.c
> 
> diff --git a/drivers/net/ethernet/Kconfig b/drivers/net/ethernet/Kconfig
> index c604213..d50519e 100644
> --- a/drivers/net/ethernet/Kconfig
> +++ b/drivers/net/ethernet/Kconfig
> @@ -170,6 +170,7 @@ source "drivers/net/ethernet/sis/Kconfig"
>  source "drivers/net/ethernet/sfc/Kconfig"
>  source "drivers/net/ethernet/sgi/Kconfig"
>  source "drivers/net/ethernet/smsc/Kconfig"
> +source "drivers/net/ethernet/socionext/Kconfig"
>  source "drivers/net/ethernet/stmicro/Kconfig"
>  source "drivers/net/ethernet/sun/Kconfig"
>  source "drivers/net/ethernet/tehuti/Kconfig"
> diff --git a/drivers/net/ethernet/Makefile b/drivers/net/ethernet/Makefile
> index a0a03d4..9f55b36 100644
> --- a/drivers/net/ethernet/Makefile
> +++ b/drivers/net/ethernet/Makefile
> @@ -81,6 +81,7 @@ obj-$(CONFIG_SFC) += sfc/
>  obj-$(CONFIG_SFC_FALCON) += sfc/falcon/
>  obj-$(CONFIG_NET_VENDOR_SGI) += sgi/
>  obj-$(CONFIG_NET_VENDOR_SMSC) += smsc/
> +obj-$(CONFIG_NET_VENDOR_SOCIONEXT) += socionext/
>  obj-$(CONFIG_NET_VENDOR_STMICRO) += stmicro/
>  obj-$(CONFIG_NET_VENDOR_SUN) += sun/
>  obj-$(CONFIG_NET_VENDOR_TEHUTI) += tehuti/
> diff --git a/drivers/net/ethernet/socionext/Kconfig 
> b/drivers/net/ethernet/socionext/Kconfig
> new file mode 100644
> index 000..788f26f
> --- /dev/null
> +++ b/drivers/net/ethernet/socionext/Kconfig
> @@ -0,0 +1,22 @@
> +config NET_VENDOR_SOCIONEXT
> + bool "Socionext ethernet drivers"
> + default y
> + ---help---
> +   Option to select ethernet drivers for Socionext platforms.
> +
> +   Note that the answer to this question doesn't directly affect the
> +   kernel: saying N will just cause the configurator to skip all
> +   the questions about Agere devices. If you say Y, you will be asked
> +   for your specific card in the following questions.
> +
> +if NET_VENDOR_SOCIONEXT
> +
> +config SNI_AVE
> + tristate "Socionext AVE ethernet support"
> + depends on (ARCH_UNIPHIER || COMPILE_TEST) && OF
> + select PHYLIB
> + ---help---
> +   Driver for gigabit ethernet MACs, called AVE, in the
> +   Socionext UniPhier family.
> +
> +endif #NET_VENDOR_SOCIONEXT
> diff --git a/drivers/net/ethernet/socionext/Makefile 
> b/drivers/net/ethernet/socionext/Makefile
> new file mode 100644
> index 000..0356341
> --- /dev/null
> +++ b/drivers/net/ethernet/socionext/Makefile
> @@ -0,0 +1,4 @@
> +#
> +# Makefile for all ethernet ip drivers on Socionext platforms
> +#
> +obj-$(CONFIG_SNI_AVE) += sni_ave.o
> diff --git a/drivers/net/ethernet/socionext/sni_ave.c 
> b/drivers/net/ethernet/socionext/sni_ave.c
> new file mode 100644
> index 000..c870777
> --- /dev/null
> +++ b/drivers/net/ethernet/socionext/sni_ave.c
> @@ -0,0 +1,1618 @@
> +/**
> + * sni_ave.c - Socionext UniPhier AVE ethernet driver
> + *
> + * Copyright 2014 Panasonic Corporation
> + * Copyright 2015-2017 Socionext Inc.
> + *
> + * This program is free software: you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2  of
> + * the License as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +/* General Register Group */
> +#define AVE_IDR  0x0 /* ID */
> +#define AVE_VR   0x4 /* Version */
> +#define AVE_GRR  0x8 /* Global Reset */
> +#define AVE_CFGR 0xc /* Configuration */
> +
> +/* Interrupt Register Group */
> +#define AVE_GIMR 0x100   /* Global Interrupt Mask */
> +#define AVE_GISR 0x104   /* Global Interrupt Status */
> +
> +/* MAC Register Group */
> +#define AVE_TXCR 0x200   /* TX Setup */
> +#define AVE_RXCR 0x204   /* RX Setup */
> +#define AVE_RXMAC1R  0x208   /* MAC address (lower) */
> +#define AVE_RXMAC2R  0x20c   /

Re: ipset losing entries on its own

2017-09-08 Thread Akshat Kakkar
Any more information needed?


Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

2017-09-08 Thread Eric Dumazet
On Fri, 2017-09-08 at 11:55 -0700, Eduardo Valentin wrote:
> Hello,
> 
> On Fri, Sep 08, 2017 at 10:26:45AM -0700, David Miller wrote:
> > From: David Woodhouse 
> > Date: Fri, 08 Sep 2017 18:23:22 +0100
> > 
> > > I don't know that anyone's ever tried saying "show me the chapter
> and
> > > verse of the documentation"
> > 
> > Do you know why I brought this up?  Because the person I am replying
> > to told me that the syscall documentation should have suggested this
> > or that.
> > 
> > That's why.
> 
> :-) My intention was for sure not to upset anybody.
> 
> Just to reiterate, the point of patch is simple, there was a change in
> behavior in the system call from one kernel version to the other. As I
> mentioned, I agree that the userspace could use other means to achieve
> the same, but still the system call behavior has changed.
> 
> > 
> > So let's concentrate on the other aspects of my reply, ok?
> 
> I agree. I would prefer to understand here what is the technical
> reason not to accept these patches other than "use other system
> calls".

So if we need to replace all 'legacy' timers to high resolution timer,
because some application was _relying_ on jiffies being kind of precise,
maybe it is better to revert the change done on legacy timers.

Or continue the migration and make them use high res internally.

select() and poll() are the standard way to have precise timeouts,
it is silly we have to maintain a timeout handling in the datagram fast
path.





Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Florian Fainelli
On 09/08/2017 12:01 PM, Andrew Lunn wrote:
>>> So i would suggest one driver supporting all the different devices.
>>
>> There will be 5 drivers to support these devices:
>>
>> ksz9477.c - KSZ9893/KSZ9897/KSZ9567/KSZ9566/KSZ9477
>> ksz8795.c - KSZ8795/KSZ8795/KSZ8765
>> ksz8895.c - KSZ8895/KSZ8864
>> ksz8863.c - KSZ8863/KSZ8873
>> ksz8463.c - KSZ8463
>>
>> These chips have different SPI access mechanisms, MIB counter reading,
>> and register set.  These can be combined into one single driver using
>> function pointers, at least for ksz8795/ksz8895/ksz8863/ksz8463.  My
>> only concern is the memory footprint.  The customer may not want a
>> big driver to cover all the switches while only one is used.
> 
> If memory footprint is your problem, make it a compile time choice
> which devices are supported within the one driver. In practice, you
> will find most distributions just enable them all.
>  
>> Out of topic I have a question to ask the community regarding the DSA
>> port creation:
>>
>> Port 1 can be specified using the reg parameter specifying 0; port 2, 1;
>> and so on.  The KSZ8794 is a variant of KSZ8795 with port 4 disabled.
>> So
>> lan1 = 0, lan2 = 1, lan3 = 2, cpu = 4.
>> But our company Marketing does not want to promote that fact but treat
>> KSZ8794 as a distinct product.
>> So
>> lan1 = 0, lan2 = 1, lan3 = 2, cpu = 3.
>> Is this okay to hide this information inside the driver?  This is manageable
>> for KSZ8794 but for KSZ8864 the first port is disabled:
>> lan1 = 1, lan2 = 2, lan3 = 3, cpu = 4.
>>
>> I am not sure whether DSA has or will have a way to display the port
>> mapping to regular users.
> 
> So the port number to name is determined in the device tree. It
> depends on the board how the ports are named. I always suggest going
> by the label on the box. clearfog is a good example of this, 0 - lan5,
> 1 - lan4, etc.
> 
> So for the KSZ8864, ensure reg = 0 causes an error.
> 
> Does the hardware force which port is used for the CPU? I've boards
> with Marvell devices where the CPU is port 0, or port 6, or port
> 7. The hardware does not care. So we don't force anything.
> 
> As for KSZ8794 vs KSZ8795, is there different IDs in the silicon?  It
> is these IDs you are using, not the compatible string, to determine
> how the drive the silicon. You can trust the ID in the
> silicon. Anything else can be wrong.

You can't always trust the silicon to report the right revision, just
like marketing people in semiconductors tend to like to play tricks and
pretend that the same chip is the same, but it is not quite, and a new
tape-out was not possible, so it has the same ID, including the
revision. This is of course not what should be done, but it happens all
the time.

Describe in Device Tree with a proper compatible string, which may
end-up being treated the same way by the driver, and if it happens that
the silicon is not wrong, and can be differentiated, then you can always
resort to that to warn the user the wrong compatible was specified, or
that this is not going to work, or anything really.
-- 
Florian


Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Andrew Lunn
> > So i would suggest one driver supporting all the different devices.
> 
> There will be 5 drivers to support these devices:
> 
> ksz9477.c - KSZ9893/KSZ9897/KSZ9567/KSZ9566/KSZ9477
> ksz8795.c - KSZ8795/KSZ8795/KSZ8765
> ksz8895.c - KSZ8895/KSZ8864
> ksz8863.c - KSZ8863/KSZ8873
> ksz8463.c - KSZ8463
> 
> These chips have different SPI access mechanisms, MIB counter reading,
> and register set.  These can be combined into one single driver using
> function pointers, at least for ksz8795/ksz8895/ksz8863/ksz8463.  My
> only concern is the memory footprint.  The customer may not want a
> big driver to cover all the switches while only one is used.

If memory footprint is your problem, make it a compile time choice
which devices are supported within the one driver. In practice, you
will find most distributions just enable them all.
 
> Out of topic I have a question to ask the community regarding the DSA
> port creation:
> 
> Port 1 can be specified using the reg parameter specifying 0; port 2, 1;
> and so on.  The KSZ8794 is a variant of KSZ8795 with port 4 disabled.
> So
> lan1 = 0, lan2 = 1, lan3 = 2, cpu = 4.
> But our company Marketing does not want to promote that fact but treat
> KSZ8794 as a distinct product.
> So
> lan1 = 0, lan2 = 1, lan3 = 2, cpu = 3.
> Is this okay to hide this information inside the driver?  This is manageable
> for KSZ8794 but for KSZ8864 the first port is disabled:
> lan1 = 1, lan2 = 2, lan3 = 3, cpu = 4.
> 
> I am not sure whether DSA has or will have a way to display the port
> mapping to regular users.

So the port number to name is determined in the device tree. It
depends on the board how the ports are named. I always suggest going
by the label on the box. clearfog is a good example of this, 0 - lan5,
1 - lan4, etc.

So for the KSZ8864, ensure reg = 0 causes an error.

Does the hardware force which port is used for the CPU? I've boards
with Marvell devices where the CPU is port 0, or port 6, or port
7. The hardware does not care. So we don't force anything.

As for KSZ8794 vs KSZ8795, is there different IDs in the silicon?  It
is these IDs you are using, not the compatible string, to determine
how the drive the silicon. You can trust the ID in the
silicon. Anything else can be wrong.

 Andrew


Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Maxim Uvarov
2017-09-08 21:48 GMT+03:00 Florian Fainelli :
> On 09/07/2017 02:11 PM, tristram...@microchip.com wrote:
>> From: Tristram Ha 
>>
>> Add other KSZ switches support so that patch check does not complain.
>>
>> Signed-off-by: Tristram Ha 
>> ---
>>  Documentation/devicetree/bindings/net/dsa/ksz.txt | 117 
>> --
>>  1 file changed, 62 insertions(+), 55 deletions(-)
>>
>> diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt 
>> b/Documentation/devicetree/bindings/net/dsa/ksz.txt
>> index 0ab8b39..34af0e0 100644
>> --- a/Documentation/devicetree/bindings/net/dsa/ksz.txt
>> +++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt
>> @@ -3,8 +3,15 @@ Microchip KSZ Series Ethernet switches
>>
>>  Required properties:
>>
>> -- compatible: For external switch chips, compatible string must be exactly 
>> one
>> -  of: "microchip,ksz9477"
>> +- compatible: Should be "microchip,ksz9477" for KSZ9477 chip,
>> +   "microchip,ksz8795" for KSZ8795 chip,
>> +   "microchip,ksz8794" for KSZ8794 chip,
>> +   "microchip,ksz8765" for KSZ8765 chip,
>> +   "microchip,ksz8895" for KSZ8895 chip,
>> +   "microchip,ksz8864" for KSZ8864 chip,
>> +   "microchip,ksz8873" for KSZ8873 chip,
>> +   "microchip,ksz8863" for KSZ8863 chip,
>> +   "microchip,ksz8463" for KSZ8463 chip
>


Tristram, does any of this devices support chaining?

Maxim.

> It becomes pretty obvious there is a 1 to 1 mapping between the
> compatible name and what you should be using it for so specifying
> ksz8795 for KSZ8795 chip is really redundant.
>
> You could just list all compatible strings that you support and change
> the verbiage to be:
>
> compatible: Should be one of:
> "microchip,ksz9477"
> ...
> "microchip,ksz8463"
>>
>>  See Documentation/devicetree/bindings/dsa/dsa.txt for a list of additional  
>> required and optional properties.
>> @@ -13,60 +20,60 @@ Examples:
>>
>>  Ethernet switch connected via SPI to the host, CPU port wired to eth0:
>>
>> - eth0: ethernet@10001000 {
>> - fixed-link {
>> - speed = <1000>;
>> - full-duplex;
>> - };
>> - };
>> + eth0: ethernet@10001000 {
>> + fixed-link {
>> + speed = <1000>;
>> + full-duplex;
>> + };
>> + };
>
> This is a good clean up, but it would probably belong in a separate
> patch that you would do before adding compatible strings for instance.
>
>>
>> - spi1: spi@f8008000 {
>> - pinctrl-0 = <&pinctrl_spi_ksz>;
>> - cs-gpios = <&pioC 25 0>;
>> - id = <1>;
>> - status = "okay";
>> + spi1: spi@f8008000 {
>> + cs-gpios = <&pioC 25 0>;
>> + id = <1>;
>> + status = "okay";
>>
>> - ksz9477: ksz9477@0 {
>> - compatible = 
>> "microchip,ksz9477";
>> - reg = <0>;
>> + ksz9477: ksz9477@0 {
>> + compatible = "microchip,ksz9477";
>> + reg = <0>;
>>
>> - 
>> spi-max-frequency = <4400>;
>> - spi-cpha;
>> - spi-cpol;
>> + spi-max-frequency = <4400>;
>> + spi-cpha;
>> + spi-cpol;
>> +
>> + status = "okay";
>> + ports {
>> + #address-cells = <1>;
>> + #size-cells = <0>;
>> + port@0 {
>> + reg = <0>;
>> + label = "lan1";
>> + };
>> + port@1 {
>> + reg = <1>;
>> + label = "lan2";
>> + };
>> + port@2 {
>> + reg = <2>;
>> + label = "lan3";
>> + };
>> + port@3 {
>> + reg = <3>;
>> + label = "lan4";
>> + };
>> + port@4 {
>> +  

Re: [PATCH v2 0/2] enable hires timer to timeout datagram socket

2017-09-08 Thread Eduardo Valentin
Hello,

On Fri, Sep 08, 2017 at 10:26:45AM -0700, David Miller wrote:
> From: David Woodhouse 
> Date: Fri, 08 Sep 2017 18:23:22 +0100
> 
> > I don't know that anyone's ever tried saying "show me the chapter and
> > verse of the documentation"
> 
> Do you know why I brought this up?  Because the person I am replying
> to told me that the syscall documentation should have suggested this
> or that.
> 
> That's why.

:-) My intention was for sure not to upset anybody.

Just to reiterate, the point of patch is simple, there was a change in behavior 
in the system call from one kernel version to the other. As I mentioned, I 
agree that the userspace could use other means to achieve the same, but still 
the system call behavior has changed.

> 
> So let's concentrate on the other aspects of my reply, ok?

I agree. I would prefer to understand here what is the technical reason not to 
accept these patches other than "use other system calls".


-- 
All the best,
Eduardo Valentin


Re: [PATCH net-next 1/3] dt-bindings: net: add DT bindings for Socionext UniPhier AVE

2017-09-08 Thread Florian Fainelli
On 09/08/2017 06:02 AM, Kunihiko Hayashi wrote:
> DT bindings for the AVE ethernet controller found on Socionext's
> UniPhier platforms.
> 
> Signed-off-by: Kunihiko Hayashi 
> Signed-off-by: Jassi Brar 
> ---
>  .../bindings/net/socionext,uniphier-ave4.txt   | 44 
> ++
>  1 file changed, 44 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
> 
> diff --git 
> a/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt 
> b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
> new file mode 100644
> index 000..57ae96d
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/net/socionext,uniphier-ave4.txt
> @@ -0,0 +1,44 @@
> +* Socionext AVE ethernet controller
> +
> +This describes the devicetree bindings for AVE ethernet controller
> +implemented on Socionext UniPhier SoCs.
> +
> +Required properties:
> + - compatible:  Should be "socionext,uniphier-ave4"
> + - reg: Address where registers are mapped and size of region.
> + - interrupts: IRQ common for mac and phy interrupts.
> + - phy-mode: See ethernet.txt in the same directory.
> + - #address-cells: Must be <1>.
> + - #size-cells: Must be <0>.
> + - Node, with 'reg' property, for each PHY on the MDIO bus.
> +
> +Optional properties:
> + - socionext,desc-bits: 32/64 descriptor size. Default 32.

32 bits, this has an unit. Why is this configurable through Device Tree
and if this is describing the HW, as it should, why is not that inferred
from the compatible string?

> + - local-mac-address: See ethernet.txt in the same directory.
> + - pinctrl-names: List of assigned state names, see pinctrl
> + binding documentation.
> + - pinctrl-0: List of phandles to configure the GPIO pin used
> + as interrupt line, see also generic and your platform
> + specific pinctrl binding documentation.
> + - socionext,internal-phy-interrupt: Boolean to denote if the
> + PHY interrupt is internally handled by the MAC.

Interesting, why do you need to declare this as opposed to also
determining this entirely from the compatible string of the Ethernet MAC?

> +
> +
> +Example:
> +
> + eth: ethernet@6500 {
> + compatible = "socionext,uniphier-ave4";
> + reg = <0x6500 0x8500>;
> + interrupts = <0 66 4>;
> + pinctrl-names = "default";
> + pinctrl-0 = <&pinctrl_ether_rgmii>;
> + phy-mode = "rgmii";
> + socionext,desc-bits = <64>;
> + local-mac-address = [00 00 00 00 00 00];
> +
> + #address-cells = <1>;
> + #size-cells = <0>;

There is no phy-handle property, yet there clearly seems to be one
attached, and I think Andrew now also responded to that in the driver
patch, this needs fixing.

> + ethphy: ethphy@1 {
> + reg = <1>;
> + };
> + };
> 


-- 
Florian


Re: [PATCH net-next 3/3] net: phy: realtek: add RTL8201F phy-id and functions

2017-09-08 Thread Florian Fainelli
On 09/08/2017 06:02 AM, Kunihiko Hayashi wrote:
> From: Jassi Brar 
> 
> Add RTL8201F phy-id and the related functions to the driver.
> 
> The original patch is as follows:
> https://patchwork.kernel.org/patch/2538341/
> 
> Signed-off-by: Jongsung Kim 
> Signed-off-by: Jassi Brar 
> Signed-off-by: Kunihiko Hayashi 
> ---
>  drivers/net/phy/realtek.c | 45 +
>  1 file changed, 45 insertions(+)
> 
> diff --git a/drivers/net/phy/realtek.c b/drivers/net/phy/realtek.c
> index 9cbe645..d9974ce 100644
> --- a/drivers/net/phy/realtek.c
> +++ b/drivers/net/phy/realtek.c
> @@ -29,10 +29,23 @@
>  #define RTL8211F_PAGE_SELECT 0x1f
>  #define RTL8211F_TX_DELAY0x100
>  
> +#define RTL8201F_ISR 0x1e
> +#define RTL8201F_PAGE_SELECT 0x1f

We have a page select register define for the RTL8211F right above, so
surely we can make that a common definition?

> +#define RTL8201F_IER 0x13
> +
>  MODULE_DESCRIPTION("Realtek PHY driver");
>  MODULE_AUTHOR("Johnson Leung");
>  MODULE_LICENSE("GPL");
>  
> +static int rtl8201_ack_interrupt(struct phy_device *phydev)
> +{
> + int err;
> +
> + err = phy_read(phydev, RTL8201F_ISR);
> +
> + return (err < 0) ? err : 0;
> +}
> +
>  static int rtl821x_ack_interrupt(struct phy_device *phydev)
>  {
>   int err;
> @@ -54,6 +67,25 @@ static int rtl8211f_ack_interrupt(struct phy_device 
> *phydev)
>   return (err < 0) ? err : 0;
>  }
>  
> +static int rtl8201_config_intr(struct phy_device *phydev)
> +{
> + int err;
> +
> + /* switch to page 7 */
> + phy_write(phydev, RTL8201F_PAGE_SELECT, 0x7);
> +
> + if (phydev->interrupts == PHY_INTERRUPT_ENABLED)
> + err = phy_write(phydev, RTL8201F_IER,
> + BIT(13) | BIT(12) | BIT(11));

Can you detail what bits 11, 12 and 13 do? Do they correspond to link,
duplex and pause changes by any chance?

> + else
> + err = phy_write(phydev, RTL8201F_IER, 0);
> +
> + /* restore to default page 0 */
> + phy_write(phydev, RTL8201F_PAGE_SELECT, 0x0);
> +
> + return err;
> +}
> +

Other than that, LGTM:

Reviewed-by: Florian Fainelli 

>  static int rtl8211b_config_intr(struct phy_device *phydev)
>  {
>   int err;
> @@ -129,6 +161,18 @@ static struct phy_driver realtek_drvs[] = {
>   .config_aneg= &genphy_config_aneg,
>   .read_status= &genphy_read_status,
>   }, {
> + .phy_id = 0x001cc816,
> + .name   = "RTL8201F 10/100Mbps Ethernet",
> + .phy_id_mask= 0x001f,
> + .features   = PHY_BASIC_FEATURES,
> + .flags  = PHY_HAS_INTERRUPT,
> + .config_aneg= &genphy_config_aneg,
> + .read_status= &genphy_read_status,
> + .ack_interrupt  = &rtl8201_ack_interrupt,
> + .config_intr= &rtl8201_config_intr,
> + .suspend= genphy_suspend,
> + .resume = genphy_resume,
> + }, {
>   .phy_id = 0x001cc912,
>   .name   = "RTL8211B Gigabit Ethernet",
>   .phy_id_mask= 0x001f,
> @@ -181,6 +225,7 @@ static struct phy_driver realtek_drvs[] = {
>  module_phy_driver(realtek_drvs);
>  
>  static struct mdio_device_id __maybe_unused realtek_tbl[] = {
> + { 0x001cc816, 0x001f },
>   { 0x001cc912, 0x001f },
>   { 0x001cc914, 0x001f },
>   { 0x001cc915, 0x001f },
> 


-- 
Florian


Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Florian Fainelli
On 09/07/2017 02:11 PM, tristram...@microchip.com wrote:
> From: Tristram Ha 
> 
> Add other KSZ switches support so that patch check does not complain.
> 
> Signed-off-by: Tristram Ha 
> ---
>  Documentation/devicetree/bindings/net/dsa/ksz.txt | 117 
> --
>  1 file changed, 62 insertions(+), 55 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/dsa/ksz.txt 
> b/Documentation/devicetree/bindings/net/dsa/ksz.txt
> index 0ab8b39..34af0e0 100644
> --- a/Documentation/devicetree/bindings/net/dsa/ksz.txt
> +++ b/Documentation/devicetree/bindings/net/dsa/ksz.txt
> @@ -3,8 +3,15 @@ Microchip KSZ Series Ethernet switches
>  
>  Required properties:
>  
> -- compatible: For external switch chips, compatible string must be exactly 
> one
> -  of: "microchip,ksz9477"
> +- compatible: Should be "microchip,ksz9477" for KSZ9477 chip,
> +   "microchip,ksz8795" for KSZ8795 chip,
> +   "microchip,ksz8794" for KSZ8794 chip,
> +   "microchip,ksz8765" for KSZ8765 chip,
> +   "microchip,ksz8895" for KSZ8895 chip,
> +   "microchip,ksz8864" for KSZ8864 chip,
> +   "microchip,ksz8873" for KSZ8873 chip,
> +   "microchip,ksz8863" for KSZ8863 chip,
> +   "microchip,ksz8463" for KSZ8463 chip

It becomes pretty obvious there is a 1 to 1 mapping between the
compatible name and what you should be using it for so specifying
ksz8795 for KSZ8795 chip is really redundant.

You could just list all compatible strings that you support and change
the verbiage to be:

compatible: Should be one of:
"microchip,ksz9477"
...
"microchip,ksz8463"
>  
>  See Documentation/devicetree/bindings/dsa/dsa.txt for a list of additional  
> required and optional properties.
> @@ -13,60 +20,60 @@ Examples:
>  
>  Ethernet switch connected via SPI to the host, CPU port wired to eth0:
>  
> - eth0: ethernet@10001000 {
> - fixed-link {
> - speed = <1000>;
> - full-duplex;
> - };
> - };
> + eth0: ethernet@10001000 {
> + fixed-link {
> + speed = <1000>;
> + full-duplex;
> + };
> + };

This is a good clean up, but it would probably belong in a separate
patch that you would do before adding compatible strings for instance.

>  
> - spi1: spi@f8008000 {
> - pinctrl-0 = <&pinctrl_spi_ksz>;
> - cs-gpios = <&pioC 25 0>;
> - id = <1>;
> - status = "okay";
> + spi1: spi@f8008000 {
> + cs-gpios = <&pioC 25 0>;
> + id = <1>;
> + status = "okay";
>  
> - ksz9477: ksz9477@0 {
> - compatible = 
> "microchip,ksz9477";
> - reg = <0>;
> + ksz9477: ksz9477@0 {
> + compatible = "microchip,ksz9477";
> + reg = <0>;
>  
> - 
> spi-max-frequency = <4400>;
> - spi-cpha;
> - spi-cpol;
> + spi-max-frequency = <4400>;
> + spi-cpha;
> + spi-cpol;
> +
> + status = "okay";
> + ports {
> + #address-cells = <1>;
> + #size-cells = <0>;
> + port@0 {
> + reg = <0>;
> + label = "lan1";
> + };
> + port@1 {
> + reg = <1>;
> + label = "lan2";
> + };
> + port@2 {
> + reg = <2>;
> + label = "lan3";
> + };
> + port@3 {
> + reg = <3>;
> + label = "lan4";
> + };
> + port@4 {
> + reg = <4>;
> + label = "lan5";
> + };
> + port@5 {
> + reg = <5>;
> +   

Re: WARNING: CPU: 2 PID: 4277 at lib/refcount.c:186

2017-09-08 Thread Eric Dumazet
On Fri, 2017-09-08 at 10:21 -0700, Cong Wang wrote:
> (Cc'ing netdev)
> 
> On Fri, Sep 8, 2017 at 5:59 AM, Shankara Pailoor  wrote:
> > Hi,
> >
> > I found a warning while fuzzing with Syzkaller on linux 4.13-rc7 on
> > x86_64. The full stack trace is below:
> >
> > WARNING: CPU: 2 PID: 4277 at lib/refcount.c:186
> > refcount_sub_and_test+0x167/0x1b0 lib/refcount.c:186
> > Kernel panic - not syncing: panic_on_warn set ...
> >
> > CPU: 2 PID: 4277 Comm: syz-executor0 Not tainted 4.13.0-rc7 #3
> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> > Ubuntu-1.8.2-1ubuntu1 04/01/2014
> > Call Trace:
> >  
> >  __dump_stack lib/dump_stack.c:16 [inline]
> >  dump_stack+0xf7/0x1aa lib/dump_stack.c:52
> >  panic+0x1ae/0x3a7 kernel/panic.c:180
> >  __warn+0x1c4/0x1d9 kernel/panic.c:541
> >  report_bug+0x211/0x2d0 lib/bug.c:183
> >  fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
> >  do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
> >  do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
> >  do_error_trap+0x118/0x340 arch/x86/kernel/traps.c:310
> >  do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
> >  invalid_op+0x18/0x20 arch/x86/entry/entry_64.S:846
> > RIP: 0010:refcount_sub_and_test+0x167/0x1b0 lib/refcount.c:186
> > RSP: 0018:88006e006b60 EFLAGS: 00010286
> > RAX: 0026 RBX:  RCX: 
> > RDX: 0026 RSI: 11000dc00d2c RDI: ed000dc00d60
> > RBP: 88006e006bf0 R08: 0001 R09: 
> > R10:  R11:  R12: 11000dc00d6d
> > R13:  R14: 0001 R15: 88006ce9d340
> >  refcount_dec_and_test+0x1a/0x20 lib/refcount.c:211
> >  reqsk_put+0x71/0x2b0 include/net/request_sock.h:123
> >  tcp_v4_rcv+0x259e/0x2e20 net/ipv4/tcp_ipv4.c:1729
> >  ip_local_deliver_finish+0x2e2/0xba0 net/ipv4/ip_input.c:216
> >  NF_HOOK include/linux/netfilter.h:248 [inline]
> >  ip_local_deliver+0x1ce/0x6d0 net/ipv4/ip_input.c:257
> >  dst_input include/net/dst.h:477 [inline]
> >  ip_rcv_finish+0x8db/0x19c0 net/ipv4/ip_input.c:397
> >  NF_HOOK include/linux/netfilter.h:248 [inline]
> >  ip_rcv+0xc3f/0x17d0 net/ipv4/ip_input.c:488
> >  __netif_receive_skb_core+0x1fb7/0x31f0 net/core/dev.c:4298
> >  __netif_receive_skb+0x2c/0x1b0 net/core/dev.c:4336
> >  process_backlog+0x1c5/0x6d0 net/core/dev.c:5102
> >  napi_poll net/core/dev.c:5499 [inline]
> >  net_rx_action+0x6d3/0x14a0 net/core/dev.c:5565
> >  __do_softirq+0x2cb/0xb2d kernel/softirq.c:284
> >  do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:898
> >  
> >  do_softirq.part.16+0x63/0x80 kernel/softirq.c:328
> >  do_softirq kernel/softirq.c:176 [inline]
> >  __local_bh_enable_ip+0x84/0x90 kernel/softirq.c:181
> >  local_bh_enable include/linux/bottom_half.h:31 [inline]
> >  rcu_read_unlock_bh include/linux/rcupdate.h:705 [inline]
> >  ip_finish_output2+0x8ad/0x1360 net/ipv4/ip_output.c:231
> >  ip_finish_output+0x74e/0xb80 net/ipv4/ip_output.c:317
> >  NF_HOOK_COND include/linux/netfilter.h:237 [inline]
> >  ip_output+0x1cc/0x850 net/ipv4/ip_output.c:405
> >  dst_output include/net/dst.h:471 [inline]
> >  ip_local_out+0x95/0x160 net/ipv4/ip_output.c:124
> >  ip_queue_xmit+0x8c6/0x1810 net/ipv4/ip_output.c:504
> >  tcp_transmit_skb+0x1963/0x3320 net/ipv4/tcp_output.c:1123
> >  tcp_send_ack.part.35+0x38c/0x620 net/ipv4/tcp_output.c:3575
> >  tcp_send_ack+0x49/0x60 net/ipv4/tcp_output.c:3545
> >  tcp_rcv_synsent_state_process net/ipv4/tcp_input.c:5795 [inline]
> >  tcp_rcv_state_process+0x4876/0x4b60 net/ipv4/tcp_input.c:5930
> >  tcp_v4_do_rcv+0x58a/0x820 net/ipv4/tcp_ipv4.c:1483
> >  sk_backlog_rcv include/net/sock.h:907 [inline]
> >  __release_sock+0x124/0x360 net/core/sock.c:2223
> >  release_sock+0xa4/0x2a0 net/core/sock.c:2715
> >  inet_wait_for_connect net/ipv4/af_inet.c:557 [inline]
> >  __inet_stream_connect+0x671/0xf00 net/ipv4/af_inet.c:643
> >  inet_stream_connect+0x58/0xa0 net/ipv4/af_inet.c:682
> >  SYSC_connect+0x204/0x470 net/socket.c:1628
> >  SyS_connect+0x24/0x30 net/socket.c:1609
> >  entry_SYSCALL_64_fastpath+0x18/0xad
> > RIP: 0033:0x451e59
> > RSP: 002b:7f474843fc08 EFLAGS: 0216 ORIG_RAX: 002a
> > RAX: ffda RBX: 00718000 RCX: 00451e59
> > RDX: 0010 RSI: 20002000 RDI: 0007
> > RBP: 0046 R08:  R09: 
> > R10:  R11: 0216 R12: 
> > R13: 7ffc040a0f8f R14: 7f47484409c0 R15: 
> >
> >
> >
> >
> > I found that the following program is able to reproduce the warning:
> >
> >
> > Pastebin: https://pastebin.com/B75BdYKz
> >
> > Here are my configs: https://pastebin.com/zRYCXbak
> >
> > Regards,
> > Shankara
> >

Sweet, thanks for the report, I will have a look.

It seems one reqsk_put(req); is missing, but that would lead to a memory
leak, not a double reqsk_put(req) :/

( Same fix is needed for IPv6 )

diff --git a/net/ipv4/tcp_i

Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Florian Fainelli
On 09/08/2017 11:40 AM, tristram...@microchip.com wrote:
>> -Original Message-
>> From: Andrew Lunn [mailto:and...@lunn.ch]
>> Sent: Friday, September 08, 2017 7:12 AM
>> To: Maxim Uvarov
>> Cc: Tristram Ha - C24268; Pavel Machek; Nathan Conrad; Vivien Didelot; 
>> Florian
>> Fainelli; netdev; linux-ker...@vger.kernel.org; Woojung Huh - C21699
>> Subject: Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new
>> drivers can be added
>>
>> On Fri, Sep 08, 2017 at 04:32:35PM +0300, Maxim Uvarov wrote:
>>> 2017-09-08 0:54 GMT+03:00 Andrew Lunn :
> -- compatible: For external switch chips, compatible string must be 
> exactly
>> one
> -  of: "microchip,ksz9477"
> +- compatible: Should be "microchip,ksz9477" for KSZ9477 chip,
> +   "microchip,ksz8795" for KSZ8795 chip,
> +   "microchip,ksz8794" for KSZ8794 chip,
> +   "microchip,ksz8765" for KSZ8765 chip,
> +   "microchip,ksz8895" for KSZ8895 chip,
> +   "microchip,ksz8864" for KSZ8864 chip,
> +   "microchip,ksz8873" for KSZ8873 chip,
> +   "microchip,ksz8863" for KSZ8863 chip,
> +   "microchip,ksz8463" for KSZ8463 chip

>>>
>>> all that chips have the same spi access to get chip id on probe(). I
>>> prefer common microship,ksz-spi rather than somebody will always
>>> maintain that list.
>>
>> The Marvell DSA driver is similar. The compatibility string tells you
>> enough to go find the switch ID in the switch itself.
>>
>> I suppose this comes down to, is there going to be one SPI driver for
>> all the devices, or lots of drivers? In general, DSA has one driver
>> for lots of devices. The mv88e6xxx supports around 25 devices. The b53
>> has around 17, etc.
>>
>> So i would suggest one driver supporting all the different devices.
>>
> 
> There will be 5 drivers to support these devices:
> 
> ksz9477.c - KSZ9893/KSZ9897/KSZ9567/KSZ9566/KSZ9477
> ksz8795.c - KSZ8795/KSZ8795/KSZ8765
> ksz8895.c - KSZ8895/KSZ8864
> ksz8863.c - KSZ8863/KSZ8873
> ksz8463.c - KSZ8463
> 
> These chips have different SPI access mechanisms, MIB counter reading,
> and register set.  These can be combined into one single driver using
> function pointers, at least for ksz8795/ksz8895/ksz8863/ksz8463.  My
> only concern is the memory footprint.  The customer may not want a
> big driver to cover all the switches while only one is used.
> 
> Out of topic I have a question to ask the community regarding the DSA
> port creation:
> 
> Port 1 can be specified using the reg parameter specifying 0; port 2, 1;
> and so on.  The KSZ8794 is a variant of KSZ8795 with port 4 disabled.
> So
> lan1 = 0, lan2 = 1, lan3 = 2, cpu = 4.
> But our company Marketing does not want to promote that fact but treat
> KSZ8794 as a distinct product.
> So
> lan1 = 0, lan2 = 1, lan3 = 2, cpu = 3.
> Is this okay to hide this information inside the driver?  This is manageable
> for KSZ8794 but for KSZ8864 the first port is disabled:
> lan1 = 1, lan2 = 2, lan3 = 3, cpu = 4.

What dictates all of that is ultimately Device Tree because it defines
the port mapping, including where the CPU port is. Once your driver
knows which chip it is "talking to" you could always have the driver
issue warnings and indicate that the Device Tree is malformed if the
user-specified port mapping does not match what the HW supports.

> 
> I am not sure whether DSA has or will have a way to display the port
> mapping to regular users.

DSA does display the port mapping of user facing ports under the
standard sysfs attribute /sys/class/net/*/phys_port_name. CPU port
mapping is not displayed because there is no CPU-port network device
(intentionally).
-- 
Florian


RE: [PATCH RFC] Update documentation for KSZ DSA drivers so that new drivers can be added

2017-09-08 Thread Tristram.Ha
> -Original Message-
> From: Andrew Lunn [mailto:and...@lunn.ch]
> Sent: Friday, September 08, 2017 7:12 AM
> To: Maxim Uvarov
> Cc: Tristram Ha - C24268; Pavel Machek; Nathan Conrad; Vivien Didelot; Florian
> Fainelli; netdev; linux-ker...@vger.kernel.org; Woojung Huh - C21699
> Subject: Re: [PATCH RFC] Update documentation for KSZ DSA drivers so that new
> drivers can be added
> 
> On Fri, Sep 08, 2017 at 04:32:35PM +0300, Maxim Uvarov wrote:
> > 2017-09-08 0:54 GMT+03:00 Andrew Lunn :
> > >> -- compatible: For external switch chips, compatible string must be 
> > >> exactly
> one
> > >> -  of: "microchip,ksz9477"
> > >> +- compatible: Should be "microchip,ksz9477" for KSZ9477 chip,
> > >> +   "microchip,ksz8795" for KSZ8795 chip,
> > >> +   "microchip,ksz8794" for KSZ8794 chip,
> > >> +   "microchip,ksz8765" for KSZ8765 chip,
> > >> +   "microchip,ksz8895" for KSZ8895 chip,
> > >> +   "microchip,ksz8864" for KSZ8864 chip,
> > >> +   "microchip,ksz8873" for KSZ8873 chip,
> > >> +   "microchip,ksz8863" for KSZ8863 chip,
> > >> +   "microchip,ksz8463" for KSZ8463 chip
> > >
> >
> > all that chips have the same spi access to get chip id on probe(). I
> > prefer common microship,ksz-spi rather than somebody will always
> > maintain that list.
> 
> The Marvell DSA driver is similar. The compatibility string tells you
> enough to go find the switch ID in the switch itself.
> 
> I suppose this comes down to, is there going to be one SPI driver for
> all the devices, or lots of drivers? In general, DSA has one driver
> for lots of devices. The mv88e6xxx supports around 25 devices. The b53
> has around 17, etc.
> 
> So i would suggest one driver supporting all the different devices.
> 

There will be 5 drivers to support these devices:

ksz9477.c - KSZ9893/KSZ9897/KSZ9567/KSZ9566/KSZ9477
ksz8795.c - KSZ8795/KSZ8795/KSZ8765
ksz8895.c - KSZ8895/KSZ8864
ksz8863.c - KSZ8863/KSZ8873
ksz8463.c - KSZ8463

These chips have different SPI access mechanisms, MIB counter reading,
and register set.  These can be combined into one single driver using
function pointers, at least for ksz8795/ksz8895/ksz8863/ksz8463.  My
only concern is the memory footprint.  The customer may not want a
big driver to cover all the switches while only one is used.

Out of topic I have a question to ask the community regarding the DSA
port creation:

Port 1 can be specified using the reg parameter specifying 0; port 2, 1;
and so on.  The KSZ8794 is a variant of KSZ8795 with port 4 disabled.
So
lan1 = 0, lan2 = 1, lan3 = 2, cpu = 4.
But our company Marketing does not want to promote that fact but treat
KSZ8794 as a distinct product.
So
lan1 = 0, lan2 = 1, lan3 = 2, cpu = 3.
Is this okay to hide this information inside the driver?  This is manageable
for KSZ8794 but for KSZ8864 the first port is disabled:
lan1 = 1, lan2 = 2, lan3 = 3, cpu = 4.

I am not sure whether DSA has or will have a way to display the port
mapping to regular users.



Re: [PATCH RFC 5/5] Add KSZ8795 SPI driver

2017-09-08 Thread Andrew Lunn
> Sorry about the formatting.  It seems my e-mail system needs to be checked
> to make sure it does not auto-format the contents again.

I've never seen issues like this with git send-email. Please use
it. Email problems generally happen with the client, not the
backend. What client did you use to send these patches?

> About the SPI access functions they are the same for each driver except the
> low level ksz_spi_read_reg and ksz_spi_write_reg.  The dev_io_ops structure
> should contain only those 2 and ksz_spi_get and ksz_spi_set.
> 
> But that requires changing ksz_spi.c.  The idea was to keep the code of
> KSZ9477 driver with little change as possible while introducing another 
> driver.

Maintainability is always the primary goal. If you need to change the
KSZ9477 code to make the whole more maintainable, do so. Just make it
lots of small, easy to review, obviously correct changes.

 Andrew


RE: [PATCH RFC 3/5] Add KSZ8795 switch driver

2017-09-08 Thread Woojung.Huh
> > > > @@ -0,0 +1,2066 @@
> > > > +/*
> > > > + * Microchip KSZ8795 switch driver
> > > > + *
> > > > + * Copyright (C) 2017 Microchip Technology Inc.
> > > > + * Tristram Ha 
> > > > + *
> > > > + * Permission to use, copy, modify, and/or distribute this software for
> any
> > > > + * purpose with or without fee is hereby granted, provided that the
> above
> > > > + * copyright notice and this permission notice appear in all copies.
> > > > + *
> > > > + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS
> ALL
> > > WARRANTIES
> > > > + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED
> WARRANTIES
> > > OF
> > > > + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR
> BE
> > > LIABLE FOR
> > > > + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR
> ANY
> > > DAMAGES
> > > > + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS,
> WHETHER
> > > IN AN
> > > > + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
> > > ARISING OUT OF
> > > > + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
> SOFTWARE.
> > > > + */
> > >
> > > This is not exactly GPL, right? But tagging below says it is
> > > GPL. Please fix one.
> > >
> >
> > This boilerplate paragraph was copied from the KSZ9477 driver, although I
> did
> > wonder why this was used.
> 
> Hi Tristram
> 
> Please can you talk to your legal people and see if this can be
> replaced with the standard GPL text?
This should be replaced to GPL. These text copied from drivers/net/dsa/b53/*.
Will submit patches of drivers/net/dsa/microchip/*

- Woojung


Re: [PATCH 0/9] Netfilter/IPVS fixes for net

2017-09-08 Thread David Miller
From: Pablo Neira Ayuso 
Date: Fri,  8 Sep 2017 19:45:39 +0200

> The following patchset contains Netfilter/IPVS fixes for your net tree,
> they are:
> 
> 1) Fix SCTP connection setup when IPVS module is loaded and any scheduler
>is registered, from Xin Long.
> 
> 2) Don't create a SCTP connection from SCTP ABORT packets, also from
>Xin Long.
> 
> 3) WARN_ON() and drop packet, instead of BUG_ON() races when calling
>nf_nat_setup_info(). This is specifically a longstanding problem
>when br_netfilter with conntrack support is in place, patch from
>Florian Westphal.
> 
> 4) Avoid softlock splats via iptables-restore, also from Florian.
> 
> 5) Revert NAT hashtable conversion to rhashtable, semantics of rhlist
>are different from our simple NAT hashtable, this has been causing
>problems in the recent Linux kernel releases. From Florian.
> 
> 6) Add per-bucket spinlock for NAT hashtable, so at least we restore
>one of the benefits we got from the previous rhashtable conversion.
> 
> 7) Fix incorrect hashtable size in memory allocation in xt_hashlimit,
>from Zhizhou Tian.
> 
> 8) Fix build/link problems with hashlimit and 32-bit arches, to address
>recent fallout from a new hashlimit mode, from Vishwanath Pai.

Pulled, thanks Pablo.


Re: [PATCH RFC 3/5] Add KSZ8795 switch driver

2017-09-08 Thread Andrew Lunn
> > > @@ -0,0 +1,2066 @@
> > > +/*
> > > + * Microchip KSZ8795 switch driver
> > > + *
> > > + * Copyright (C) 2017 Microchip Technology Inc.
> > > + *   Tristram Ha 
> > > + *
> > > + * Permission to use, copy, modify, and/or distribute this software for 
> > > any
> > > + * purpose with or without fee is hereby granted, provided that the above
> > > + * copyright notice and this permission notice appear in all copies.
> > > + *
> > > + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
> > WARRANTIES
> > > + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES
> > OF
> > > + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE
> > LIABLE FOR
> > > + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY
> > DAMAGES
> > > + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
> > IN AN
> > > + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
> > ARISING OUT OF
> > > + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> > > + */
> > 
> > This is not exactly GPL, right? But tagging below says it is
> > GPL. Please fix one.
> > 
> 
> This boilerplate paragraph was copied from the KSZ9477 driver, although I did
> wonder why this was used.

Hi Tristram

Please can you talk to your legal people and see if this can be
replaced with the standard GPL text?

> > > + for (timeout = 1; timeout > 0; timeout--) {
> > > + ksz_read8(dev, REG_IND_MIB_CHECK, &check);
> > > +
> > > + if (check & MIB_COUNTER_VALID) {
> > > + ksz_read32(dev, REG_IND_DATA_LO, &data);
> > > + if (addr < 2) {
> > > + u64 total;
> > > +
> > > + total = check & MIB_TOTAL_BYTES_H;
> > > + total <<= 32;
> > > + *cnt += total;
> > > + *cnt += data;
> > > + if (check & MIB_COUNTER_OVERFLOW) {
> > > + total = MIB_TOTAL_BYTES_H + 1;
> > > + total <<= 32;
> > > + *cnt += total;
> > > + }
> > > + } else {
> > > + if (check & MIB_COUNTER_OVERFLOW)
> > > + *cnt += MIB_PACKET_DROPPED + 1;
> > > + *cnt += data & MIB_PACKET_DROPPED;
> > > + }
> > > + break;
> > > + }
> > > + }
> > 
> > Why do you need a loop here? This is quite strange code. (And you have
> > similar strangeness elsewhere. Please fix.)
> > 
> 
> The MIB_COUNTER_VALID bit may be invalid on first read, although in slow
> SPI speed it never happens.  The timeout value should be increased to 2.

Maybe timeout is the wrong name? There is nothing to do with time
here.
 
> > > +static int valid_dyn_entry(struct ksz_device *dev, u8 *data)
> > > +{
> > > + int timeout = 100;
> > > +
> > > + do {
> > > + ksz_read8(dev, REG_IND_DATA_CHECK, data);
> > > + timeout--;
> > > + } while ((*data & DYNAMIC_MAC_TABLE_NOT_READY) && timeout);
> > > +
> > > + /* Entry is not ready for accessing. */
> > > + if (*data & DYNAMIC_MAC_TABLE_NOT_READY) {
> > > + return 1;
> > > + /* Entry is ready for accessing. */
> > > + } else {
> > > + ksz_read8(dev, REG_IND_DATA_8, data);
> > > +
> > > + /* There is no valid entry in the table. */
> > > + if (*data & DYNAMIC_MAC_TABLE_MAC_EMPTY)
> > > + return 2;
> > > + }
> > > + return 0;
> > > +}
> > 
> > Normal calling convention is 0 / -ERROR, not 0,1,2.
> >
> 
> This is an internal function that is not returning any error.  It just reports
> different conditions so the calling function decides what to do.

Still, best practice is to use standard error codes.
  
Andrew


Re: [PATCH net] ipv6: fix memory leak with multiple tables during netns destruction

2017-09-08 Thread Cong Wang
On Fri, Sep 8, 2017 at 1:26 AM, Sabrina Dubroca  wrote:
> fib6_net_exit only frees the main and local tables. If another table was
> created with fib6_alloc_table, we leak it when the netns is destroyed.
>
> Fix this in the same way ip_fib_net_exit cleans up tables, by walking
> through the whole hashtable of fib6_table's. We can get rid of the
> special cases for local and main, since they're also part of the
> hashtable.
>
> Reproducer:
> ip netns add x
> ip -net x -6 rule add from 6003:1::/64 table 100
> ip netns del x
>
> Reported-by: Jianlin Shi 
> Fixes: 58f09b78b730 ("[NETNS][IPV6] ip6_fib - make it per network namespace")
> Signed-off-by: Sabrina Dubroca 
> ---
>  net/ipv6/ip6_fib.c | 25 +++--
>  1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index a3b5c163325f..8280172c806c 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -191,6 +191,12 @@ void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
>  }
>  EXPORT_SYMBOL_GPL(rt6_free_pcpu);
>
> +static void fib6_free_table(struct fib6_table *table)
> +{
> +   inetpeer_invalidate_tree(&table->tb6_peers);
> +   kfree(table);
> +}
> +
>  static void fib6_link_table(struct net *net, struct fib6_table *tb)
>  {
> unsigned int h;
> @@ -2022,15 +2028,22 @@ static int __net_init fib6_net_init(struct net *net)
>
>  static void fib6_net_exit(struct net *net)
>  {
> +   unsigned int i;
> +
> rt6_ifdown(net, NULL);
> del_timer_sync(&net->ipv6.ip6_fib_timer);
>
> -#ifdef CONFIG_IPV6_MULTIPLE_TABLES
> -   inetpeer_invalidate_tree(&net->ipv6.fib6_local_tbl->tb6_peers);
> -   kfree(net->ipv6.fib6_local_tbl);
> -#endif
> -   inetpeer_invalidate_tree(&net->ipv6.fib6_main_tbl->tb6_peers);
> -   kfree(net->ipv6.fib6_main_tbl);
> +   for (i = 0; i < FIB_TABLE_HASHSZ; i++) {
> +   struct hlist_head *head = &net->ipv6.fib_table_hash[i];
> +   struct hlist_node *tmp;
> +   struct fib6_table *tb;
> +
> +   hlist_for_each_entry_safe(tb, tmp, head, tb6_hlist) {
> +   hlist_del(&tb->tb6_hlist);

hlist_del_rcu() ?


> +   fib6_free_table(tb);

kfree_rcu() in fib6_free_table()?


> +   }
> +   }
> +
> kfree(net->ipv6.fib_table_hash);
> kfree(net->ipv6.rt6_stats);
> fib6_notifier_exit(net);
> --
> 2.14.1
>


Re: [PATCH net] ipv6: fix memory leak with multiple tables during netns destruction

2017-09-08 Thread Eric Dumazet
On Fri, 2017-09-08 at 10:26 +0200, Sabrina Dubroca wrote:
> fib6_net_exit only frees the main and local tables. If another table was
> created with fib6_alloc_table, we leak it when the netns is destroyed.
> 
> Fix this in the same way ip_fib_net_exit cleans up tables, by walking
> through the whole hashtable of fib6_table's. We can get rid of the
> special cases for local and main, since they're also part of the
> hashtable.
> 
> Reproducer:
> ip netns add x
> ip -net x -6 rule add from 6003:1::/64 table 100
> ip netns del x
> 
> Reported-by: Jianlin Shi 
> Fixes: 58f09b78b730 ("[NETNS][IPV6] ip6_fib - make it per network namespace")
> Signed-off-by: Sabrina Dubroca 
> ---
>  net/ipv6/ip6_fib.c | 25 +++--
>  1 file changed, 19 insertions(+), 6 deletions(-)
> 
> diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c
> index a3b5c163325f..8280172c806c 100644
> --- a/net/ipv6/ip6_fib.c
> +++ b/net/ipv6/ip6_fib.c
> @@ -191,6 +191,12 @@ void rt6_free_pcpu(struct rt6_info *non_pcpu_rt)
>  }
>  EXPORT_SYMBOL_GPL(rt6_free_pcpu);
>  
> +static void fib6_free_table(struct fib6_table *table)
> +{
> + inetpeer_invalidate_tree(&table->tb6_peers);
> + kfree(table);
> +}
> +
>  static void fib6_link_table(struct net *net, struct fib6_table *tb)
>  {
>   unsigned int h;
> @@ -2022,15 +2028,22 @@ static int __net_init fib6_net_init(struct net *net)
>  
>  static void fib6_net_exit(struct net *net)
>  {
> + unsigned int i;
> +
>   rt6_ifdown(net, NULL);
>   del_timer_sync(&net->ipv6.ip6_fib_timer);
>  
> -#ifdef CONFIG_IPV6_MULTIPLE_TABLES
> - inetpeer_invalidate_tree(&net->ipv6.fib6_local_tbl->tb6_peers);
> - kfree(net->ipv6.fib6_local_tbl);
> -#endif
> - inetpeer_invalidate_tree(&net->ipv6.fib6_main_tbl->tb6_peers);
> - kfree(net->ipv6.fib6_main_tbl);
> + for (i = 0; i < FIB_TABLE_HASHSZ; i++) {

should not this be FIB6_TABLE_HASHSZ ?

> + struct hlist_head *head = &net->ipv6.fib_table_hash[i];
> + struct hlist_node *tmp;
> + struct fib6_table *tb;
> +
> + hlist_for_each_entry_safe(tb, tmp, head, tb6_hlist) {
> + hlist_del(&tb->tb6_hlist);
> + fib6_free_table(tb);
> + }
> + }
> +
>   kfree(net->ipv6.fib_table_hash);
>   kfree(net->ipv6.rt6_stats);
>   fib6_notifier_exit(net);




RE: [PATCH RFC 3/5] Add KSZ8795 switch driver

2017-09-08 Thread Tristram.Ha
> -Original Message-
> From: Pavel Machek [mailto:pa...@ucw.cz]
> Sent: Friday, September 08, 2017 2:19 AM
> To: Tristram Ha - C24268
> Cc: and...@lunn.ch; muva...@gmail.com; nathan.leigh.con...@gmail.com;
> vivien.dide...@savoirfairelinux.com; f.faine...@gmail.com;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org; Woojung Huh - C21699
> Subject: Re: [PATCH RFC 3/5] Add KSZ8795 switch driver
> 
> On Thu 2017-09-07 21:17:16, tristram...@microchip.com wrote:
> > From: Tristram Ha 
> >
> > Add KSZ8795 switch support with function code.
> 
> English? "Add KSZ8795 switch support." ?
> 
> > Signed-off-by: Tristram Ha 
> > ---
> > diff --git a/drivers/net/dsa/microchip/ksz8795.c
> b/drivers/net/dsa/microchip/ksz8795.c
> > new file mode 100644
> > index 000..e4d4e6a
> > --- /dev/null
> > +++ b/drivers/net/dsa/microchip/ksz8795.c
> > @@ -0,0 +1,2066 @@
> > +/*
> > + * Microchip KSZ8795 switch driver
> > + *
> > + * Copyright (C) 2017 Microchip Technology Inc.
> > + * Tristram Ha 
> > + *
> > + * Permission to use, copy, modify, and/or distribute this software for any
> > + * purpose with or without fee is hereby granted, provided that the above
> > + * copyright notice and this permission notice appear in all copies.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL
> WARRANTIES
> > + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES
> OF
> > + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE
> LIABLE FOR
> > + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY
> DAMAGES
> > + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
> IN AN
> > + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION,
> ARISING OUT OF
> > + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
> > + */
> 
> This is not exactly GPL, right? But tagging below says it is
> GPL. Please fix one.
> 

This boilerplate paragraph was copied from the KSZ9477 driver, although I did
wonder why this was used.

> > +static int ksz_reset_switch(struct ksz_device *dev)
> > +{
> > +   /* reset switch */
> > +   ksz_write8(dev, REG_POWER_MANAGEMENT_1,
> > +  SW_SOFTWARE_POWER_DOWN <<
> SW_POWER_MANAGEMENT_MODE_S);
> > +   ksz_write8(dev, REG_POWER_MANAGEMENT_1, 0);
> > +
> > +   return 0;
> > +}
> 
> This is going to be same in other drivers, right?
> 

The switch reset is different in each chip, although KSZ8795 and
KSZ8895 use the same mechanism.

> > +
> > +   for (timeout = 1; timeout > 0; timeout--) {
> > +   ksz_read8(dev, REG_IND_MIB_CHECK, &check);
> > +
> > +   if (check & MIB_COUNTER_VALID) {
> > +   ksz_read32(dev, REG_IND_DATA_LO, &data);
> > +   if (addr < 2) {
> > +   u64 total;
> > +
> > +   total = check & MIB_TOTAL_BYTES_H;
> > +   total <<= 32;
> > +   *cnt += total;
> > +   *cnt += data;
> > +   if (check & MIB_COUNTER_OVERFLOW) {
> > +   total = MIB_TOTAL_BYTES_H + 1;
> > +   total <<= 32;
> > +   *cnt += total;
> > +   }
> > +   } else {
> > +   if (check & MIB_COUNTER_OVERFLOW)
> > +   *cnt += MIB_PACKET_DROPPED + 1;
> > +   *cnt += data & MIB_PACKET_DROPPED;
> > +   }
> > +   break;
> > +   }
> > +   }
> 
> Why do you need a loop here? This is quite strange code. (And you have
> similar strangeness elsewhere. Please fix.)
> 

The MIB_COUNTER_VALID bit may be invalid on first read, although in slow
SPI speed it never happens.  The timeout value should be increased to 2.

> > +static int valid_dyn_entry(struct ksz_device *dev, u8 *data)
> > +{
> > +   int timeout = 100;
> > +
> > +   do {
> > +   ksz_read8(dev, REG_IND_DATA_CHECK, data);
> > +   timeout--;
> > +   } while ((*data & DYNAMIC_MAC_TABLE_NOT_READY) && timeout);
> > +
> > +   /* Entry is not ready for accessing. */
> > +   if (*data & DYNAMIC_MAC_TABLE_NOT_READY) {
> > +   return 1;
> > +   /* Entry is ready for accessing. */
> > +   } else {
> > +   ksz_read8(dev, REG_IND_DATA_8, data);
> > +
> > +   /* There is no valid entry in the table. */
> > +   if (*data & DYNAMIC_MAC_TABLE_MAC_EMPTY)
> > +   return 2;
> > +   }
> > +   return 0;
> > +}
> 
> Normal calling convention is 0 / -ERROR, not 0,1,2.
>

This is an internal function that is not returning any error.  It just reports
different conditions so the calling function decides what to do.
 
> > +static void ksz_r_phy(struct ksz_device *dev, u16 phy, u16 reg, u16 *val)
> > +{
> > +   struct ksz_port *port;
> > +   u8 ctrl;
> > +   u8 restart;
> > +   u8 link;
> > +   u8 speed;
> > +   u8 for

Re: [netfilter-core] [PATCH] netfilter: nat: constify rhashtable_params

2017-09-08 Thread Pablo Neira Ayuso
On Fri, Sep 08, 2017 at 01:46:30PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Aug 30, 2017 at 05:18:04PM +0530, Arvind Yadav wrote:
> > rhashtable_params are not supposed to change at runtime. All
> > Functions rhashtable_* working with const rhashtable_params
> > provided by . So mark the non-const structs
> > as const.
> 
> Applied to nf, thanks.

Actually, I tossed this. NAT rhashtable has been reverted, so no need
for this constification anymore.


[PATCH 3/9] netfilter: nf_nat: don't bug when mapping already exists

2017-09-08 Thread Pablo Neira Ayuso
From: Florian Westphal 

It seems preferrable to limp along if we have a conflicting mapping,
its certainly better than a BUG().

Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_nat_core.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 40573aa6c133..dc3519cc7209 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -416,7 +416,9 @@ nf_nat_setup_info(struct nf_conn *ct,
 
WARN_ON(maniptype != NF_NAT_MANIP_SRC &&
maniptype != NF_NAT_MANIP_DST);
-   BUG_ON(nf_nat_initialized(ct, maniptype));
+
+   if (WARN_ON(nf_nat_initialized(ct, maniptype)))
+   return NF_DROP;
 
/* What we've got will look like inverse of reply. Normally
 * this is what is in the conntrack, except for prior
-- 
2.1.4



[PATCH 0/9] Netfilter/IPVS fixes for net

2017-09-08 Thread Pablo Neira Ayuso
Hi David,

The following patchset contains Netfilter/IPVS fixes for your net tree,
they are:

1) Fix SCTP connection setup when IPVS module is loaded and any scheduler
   is registered, from Xin Long.

2) Don't create a SCTP connection from SCTP ABORT packets, also from
   Xin Long.

3) WARN_ON() and drop packet, instead of BUG_ON() races when calling
   nf_nat_setup_info(). This is specifically a longstanding problem
   when br_netfilter with conntrack support is in place, patch from
   Florian Westphal.

4) Avoid softlock splats via iptables-restore, also from Florian.

5) Revert NAT hashtable conversion to rhashtable, semantics of rhlist
   are different from our simple NAT hashtable, this has been causing
   problems in the recent Linux kernel releases. From Florian.

6) Add per-bucket spinlock for NAT hashtable, so at least we restore
   one of the benefits we got from the previous rhashtable conversion.

7) Fix incorrect hashtable size in memory allocation in xt_hashlimit,
   from Zhizhou Tian.

8) Fix build/link problems with hashlimit and 32-bit arches, to address
   recent fallout from a new hashlimit mode, from Vishwanath Pai.

You can pull these changes from:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git

Thanks!



The following changes since commit 8e0deed92406d93ae0365cb8a6134db5721e7aca:

  tipc: remove unnecessary call to dev_net() (2017-09-06 21:25:52 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git HEAD

for you to fetch changes up to 90c4ae4e2c1da9f1eaf846136861af43d4c1ff34:

  netfilter: xt_hashlimit: fix build error caused by 64bit division (2017-09-08 
18:55:53 +0200)


Florian Westphal (5):
  netfilter: nf_nat: don't bug when mapping already exists
  netfilter: xtables: add scheduling opportunity in get_counters
  netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to 
rhashtable"
  netfilter: nat: use keyed locks
  netfilter: core: remove erroneous warn_on

Vishwanath Pai (1):
  netfilter: xt_hashlimit: fix build error caused by 64bit division

Xin Long (2):
  netfilter: ipvs: fix the issue that sctp_conn_schedule drops non-INIT 
packet
  netfilter: ipvs: do not create conn for ABORT packet in sctp_conn_schedule

Zhizhou Tian (1):
  netfilter: xt_hashlimit: alloc hashtable with right size

 include/net/netfilter/nf_conntrack.h  |   3 +-
 include/net/netfilter/nf_nat.h|   1 -
 net/ipv4/netfilter/arp_tables.c   |   1 +
 net/ipv4/netfilter/ip_tables.c|   1 +
 net/ipv6/netfilter/ip6_tables.c   |   1 +
 net/netfilter/core.c  |   2 +-
 net/netfilter/ipvs/ip_vs_proto_sctp.c |   8 +-
 net/netfilter/nf_nat_core.c   | 146 --
 net/netfilter/xt_hashlimit.c  |  16 ++--
 9 files changed, 88 insertions(+), 91 deletions(-)


[PATCH 2/9] netfilter: ipvs: do not create conn for ABORT packet in sctp_conn_schedule

2017-09-08 Thread Pablo Neira Ayuso
From: Xin Long 

There's no reason for ipvs to create a conn for an ABORT packet
even if sysctl_sloppy_sctp is set.

This patch is to accept it without creating a conn, just as ipvs
does for tcp's RST packet.

Signed-off-by: Xin Long 
Signed-off-by: Simon Horman 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/ipvs/ip_vs_proto_sctp.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c 
b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index 81f08198b125..57c8ee66491e 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -25,7 +25,8 @@ sctp_conn_schedule(struct netns_ipvs *ipvs, int af, struct 
sk_buff *skb,
sch = skb_header_pointer(skb, iph->len + sizeof(_sctph),
 sizeof(_schunkh), &_schunkh);
if (sch) {
-   if (!(sysctl_sloppy_sctp(ipvs) ||
+   if (sch->type == SCTP_CID_ABORT ||
+   !(sysctl_sloppy_sctp(ipvs) ||
  sch->type == SCTP_CID_INIT))
return 1;
ports = &sh->source;
-- 
2.1.4



[PATCH 5/9] netfilter: nat: Revert "netfilter: nat: convert nat bysrc hash to rhashtable"

2017-09-08 Thread Pablo Neira Ayuso
From: Florian Westphal 

This reverts commit 870190a9ec9075205c0fa795a09fa931694a3ff1.

It was not a good idea. The custom hash table was a much better
fit for this purpose.

A fast lookup is not essential, in fact for most cases there is no lookup
at all because original tuple is not taken and can be used as-is.
What needs to be fast is insertion and deletion.

rhlist removal however requires a rhlist walk.
We can have thousands of entries in such a list if source port/addresses
are reused for multiple flows, if this happens removal requests are so
expensive that deletions of a few thousand flows can take several
seconds(!).

The advantages that we got from rhashtable are:
1) table auto-sizing
2) multiple locks

1) would be nice to have, but it is not essential as we have at
most one lookup per new flow, so even a million flows in the bysource
table are not a problem compared to current deletion cost.
2) is easy to add to custom hash table.

I tried to add hlist_node to rhlist to speed up rhltable_remove but this
isn't doable without changing semantics.  rhltable_remove_fast will
check that the to-be-deleted object is part of the table and that
requires a list walk that we want to avoid.

Furthermore, using hlist_node increases size of struct rhlist_head, which
in turn increases nf_conn size.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196821
Reported-by: Ivan Babrou 
Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 include/net/netfilter/nf_conntrack.h |   3 +-
 include/net/netfilter/nf_nat.h   |   1 -
 net/netfilter/nf_nat_core.c  | 130 ++-
 3 files changed, 54 insertions(+), 80 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h 
b/include/net/netfilter/nf_conntrack.h
index fdc9c64a1c94..8f3bd30511de 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -17,7 +17,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include 
 #include 
@@ -77,7 +76,7 @@ struct nf_conn {
possible_net_t ct_net;
 
 #if IS_ENABLED(CONFIG_NF_NAT)
-   struct rhlist_head nat_bysource;
+   struct hlist_node   nat_bysource;
 #endif
/* all members below initialized via memset */
u8 __nfct_init_offset[0];
diff --git a/include/net/netfilter/nf_nat.h b/include/net/netfilter/nf_nat.h
index 05c82a1a4267..b71701302e61 100644
--- a/include/net/netfilter/nf_nat.h
+++ b/include/net/netfilter/nf_nat.h
@@ -1,6 +1,5 @@
 #ifndef _NF_NAT_H
 #define _NF_NAT_H
-#include 
 #include 
 #include 
 #include 
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index dc3519cc7209..f090419f5f97 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -30,19 +30,17 @@
 #include 
 #include 
 
+static DEFINE_SPINLOCK(nf_nat_lock);
+
 static DEFINE_MUTEX(nf_nat_proto_mutex);
 static const struct nf_nat_l3proto __rcu *nf_nat_l3protos[NFPROTO_NUMPROTO]
__read_mostly;
 static const struct nf_nat_l4proto __rcu **nf_nat_l4protos[NFPROTO_NUMPROTO]
__read_mostly;
 
-struct nf_nat_conn_key {
-   const struct net *net;
-   const struct nf_conntrack_tuple *tuple;
-   const struct nf_conntrack_zone *zone;
-};
-
-static struct rhltable nf_nat_bysource_table;
+static struct hlist_head *nf_nat_bysource __read_mostly;
+static unsigned int nf_nat_htable_size __read_mostly;
+static unsigned int nf_nat_hash_rnd __read_mostly;
 
 inline const struct nf_nat_l3proto *
 __nf_nat_l3proto_find(u8 family)
@@ -118,17 +116,19 @@ int nf_xfrm_me_harder(struct net *net, struct sk_buff 
*skb, unsigned int family)
 EXPORT_SYMBOL(nf_xfrm_me_harder);
 #endif /* CONFIG_XFRM */
 
-static u32 nf_nat_bysource_hash(const void *data, u32 len, u32 seed)
+/* We keep an extra hash for each conntrack, for fast searching. */
+static unsigned int
+hash_by_src(const struct net *n, const struct nf_conntrack_tuple *tuple)
 {
-   const struct nf_conntrack_tuple *t;
-   const struct nf_conn *ct = data;
+   unsigned int hash;
+
+   get_random_once(&nf_nat_hash_rnd, sizeof(nf_nat_hash_rnd));
 
-   t = &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple;
/* Original src, to ensure we map it consistently if poss. */
+   hash = jhash2((u32 *)&tuple->src, sizeof(tuple->src) / sizeof(u32),
+ tuple->dst.protonum ^ nf_nat_hash_rnd ^ net_hash_mix(n));
 
-   seed ^= net_hash_mix(nf_ct_net(ct));
-   return jhash2((const u32 *)&t->src, sizeof(t->src) / sizeof(u32),
- t->dst.protonum ^ seed);
+   return reciprocal_scale(hash, nf_nat_htable_size);
 }
 
 /* Is this tuple already taken? (not by us) */
@@ -184,28 +184,6 @@ same_src(const struct nf_conn *ct,
t->src.u.all == tuple->src.u.all);
 }
 
-static int nf_nat_bysource_cmp(struct rhashtable_compare_arg *arg,
-  const void *obj)
-{
-   const struc

[PATCH 1/9] netfilter: ipvs: fix the issue that sctp_conn_schedule drops non-INIT packet

2017-09-08 Thread Pablo Neira Ayuso
From: Xin Long 

Commit 5e26b1b3abce ("ipvs: support scheduling inverse and icmp SCTP
packets") changed to check packet type early. It introduced a side
effect: if it's not a INIT packet, ports will be set as  NULL, and
the packet will be dropped later.

It caused that sctp couldn't create connection when ipvs module is
loaded and any scheduler is registered on server.

Li Shuang reproduced it by running the cmds on sctp server:
  # ipvsadm -A -t 1.1.1.1:80 -s rr
  # ipvsadm -D -t 1.1.1.1:80
then the server could't work any more.

This patch is to return 1 when it's not an INIT packet. It means ipvs
will accept it without creating a conn for it, just like what it does
for tcp.

Fixes: 5e26b1b3abce ("ipvs: support scheduling inverse and icmp SCTP packets")
Reported-by: Li Shuang 
Signed-off-by: Xin Long 
Signed-off-by: Simon Horman 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/ipvs/ip_vs_proto_sctp.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_proto_sctp.c 
b/net/netfilter/ipvs/ip_vs_proto_sctp.c
index e1efa446b305..81f08198b125 100644
--- a/net/netfilter/ipvs/ip_vs_proto_sctp.c
+++ b/net/netfilter/ipvs/ip_vs_proto_sctp.c
@@ -24,9 +24,12 @@ sctp_conn_schedule(struct netns_ipvs *ipvs, int af, struct 
sk_buff *skb,
if (sh) {
sch = skb_header_pointer(skb, iph->len + sizeof(_sctph),
 sizeof(_schunkh), &_schunkh);
-   if (sch && (sch->type == SCTP_CID_INIT ||
-   sysctl_sloppy_sctp(ipvs)))
+   if (sch) {
+   if (!(sysctl_sloppy_sctp(ipvs) ||
+ sch->type == SCTP_CID_INIT))
+   return 1;
ports = &sh->source;
+   }
}
} else {
ports = skb_header_pointer(
-- 
2.1.4



[PATCH 6/9] netfilter: nat: use keyed locks

2017-09-08 Thread Pablo Neira Ayuso
From: Florian Westphal 

no need to serialize on a single lock, we can partition the table and
add/delete in parallel to different slots.
This restores one of the advantages that got lost with the rhlist
revert.

Cc: Ivan Babrou 
Signed-off-by: Florian Westphal 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/nf_nat_core.c | 36 
 1 file changed, 24 insertions(+), 12 deletions(-)

diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index f090419f5f97..f393a7086025 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -30,7 +30,7 @@
 #include 
 #include 
 
-static DEFINE_SPINLOCK(nf_nat_lock);
+static spinlock_t nf_nat_locks[CONNTRACK_LOCKS];
 
 static DEFINE_MUTEX(nf_nat_proto_mutex);
 static const struct nf_nat_l3proto __rcu *nf_nat_l3protos[NFPROTO_NUMPROTO]
@@ -425,13 +425,15 @@ nf_nat_setup_info(struct nf_conn *ct,
 
if (maniptype == NF_NAT_MANIP_SRC) {
unsigned int srchash;
+   spinlock_t *lock;
 
srchash = hash_by_src(net,
  &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
-   spin_lock_bh(&nf_nat_lock);
+   lock = &nf_nat_locks[srchash % ARRAY_SIZE(nf_nat_locks)];
+   spin_lock_bh(lock);
hlist_add_head_rcu(&ct->nat_bysource,
   &nf_nat_bysource[srchash]);
-   spin_unlock_bh(&nf_nat_lock);
+   spin_unlock_bh(lock);
}
 
/* It's done. */
@@ -525,6 +527,16 @@ static int nf_nat_proto_remove(struct nf_conn *i, void 
*data)
return i->status & IPS_NAT_MASK ? 1 : 0;
 }
 
+static void __nf_nat_cleanup_conntrack(struct nf_conn *ct)
+{
+   unsigned int h;
+
+   h = hash_by_src(nf_ct_net(ct), 
&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
+   spin_lock_bh(&nf_nat_locks[h % ARRAY_SIZE(nf_nat_locks)]);
+   hlist_del_rcu(&ct->nat_bysource);
+   spin_unlock_bh(&nf_nat_locks[h % ARRAY_SIZE(nf_nat_locks)]);
+}
+
 static int nf_nat_proto_clean(struct nf_conn *ct, void *data)
 {
if (nf_nat_proto_remove(ct, data))
@@ -540,9 +552,7 @@ static int nf_nat_proto_clean(struct nf_conn *ct, void 
*data)
 * will delete entry from already-freed table.
 */
clear_bit(IPS_SRC_NAT_DONE_BIT, &ct->status);
-   spin_lock_bh(&nf_nat_lock);
-   hlist_del_rcu(&ct->nat_bysource);
-   spin_unlock_bh(&nf_nat_lock);
+   __nf_nat_cleanup_conntrack(ct);
 
/* don't delete conntrack.  Although that would make things a lot
 * simpler, we'd end up flushing all conntracks on nat rmmod.
@@ -670,11 +680,8 @@ EXPORT_SYMBOL_GPL(nf_nat_l3proto_unregister);
 /* No one using conntrack by the time this called. */
 static void nf_nat_cleanup_conntrack(struct nf_conn *ct)
 {
-   if (ct->status & IPS_SRC_NAT_DONE) {
-   spin_lock_bh(&nf_nat_lock);
-   hlist_del_rcu(&ct->nat_bysource);
-   spin_unlock_bh(&nf_nat_lock);
-   }
+   if (ct->status & IPS_SRC_NAT_DONE)
+   __nf_nat_cleanup_conntrack(ct);
 }
 
 static struct nf_ct_ext_type nat_extend __read_mostly = {
@@ -796,10 +803,12 @@ static struct nf_ct_helper_expectfn follow_master_nat = {
 
 static int __init nf_nat_init(void)
 {
-   int ret;
+   int ret, i;
 
/* Leave them the same for the moment. */
nf_nat_htable_size = nf_conntrack_htable_size;
+   if (nf_nat_htable_size < ARRAY_SIZE(nf_nat_locks))
+   nf_nat_htable_size = ARRAY_SIZE(nf_nat_locks);
 
nf_nat_bysource = nf_ct_alloc_hashtable(&nf_nat_htable_size, 0);
if (!nf_nat_bysource)
@@ -812,6 +821,9 @@ static int __init nf_nat_init(void)
return ret;
}
 
+   for (i = 0; i < ARRAY_SIZE(nf_nat_locks); i++)
+   spin_lock_init(&nf_nat_locks[i]);
+
nf_ct_helper_expectfn_register(&follow_master_nat);
 
BUG_ON(nfnetlink_parse_nat_setup_hook != NULL);
-- 
2.1.4



[PATCH 9/9] netfilter: xt_hashlimit: fix build error caused by 64bit division

2017-09-08 Thread Pablo Neira Ayuso
From: Vishwanath Pai 

64bit division causes build/link errors on 32bit architectures. It
prints out error messages like:

ERROR: "__aeabi_uldivmod" [net/netfilter/xt_hashlimit.ko] undefined!

The value of avg passed through by userspace in BYTE mode cannot exceed
U32_MAX. Which means 64bit division in user2rate_bytes is unnecessary.
To fix this I have changed the type of param 'user' to u32.

Since anything greater than U32_MAX is an invalid input we error out in
hashlimit_mt_check_common() when this is the case.

Changes in v2:
Making return type as u32 would cause an overflow for small
values of 'user' (for example 2, 3 etc). To avoid this I bumped up
'r' to u64 again as well as the return type. This is OK since the
variable that stores the result is u64. We still avoid 64bit
division here since 'user' is u32.

Fixes: bea74641e378 ("netfilter: xt_hashlimit: add rate match mode")
Signed-off-by: Vishwanath Pai 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/xt_hashlimit.c | 12 +++-
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 962ea4a63d9f..5da8746f7b88 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Harald Welte ");
@@ -527,12 +528,12 @@ static u64 user2rate(u64 user)
}
 }
 
-static u64 user2rate_bytes(u64 user)
+static u64 user2rate_bytes(u32 user)
 {
u64 r;
 
-   r = user ? 0xULL / user : 0xULL;
-   r = (r - 1) << 4;
+   r = user ? U32_MAX / user : U32_MAX;
+   r = (r - 1) << XT_HASHLIMIT_BYTE_SHIFT;
return r;
 }
 
@@ -588,7 +589,8 @@ static void rateinfo_init(struct dsthash_ent *dh,
dh->rateinfo.prev_window = 0;
dh->rateinfo.current_rate = 0;
if (hinfo->cfg.mode & XT_HASHLIMIT_BYTES) {
-   dh->rateinfo.rate = user2rate_bytes(hinfo->cfg.avg);
+   dh->rateinfo.rate =
+   user2rate_bytes((u32)hinfo->cfg.avg);
if (hinfo->cfg.burst)
dh->rateinfo.burst =
hinfo->cfg.burst * dh->rateinfo.rate;
@@ -870,7 +872,7 @@ static int hashlimit_mt_check_common(const struct 
xt_mtchk_param *par,
 
/* Check for overflow. */
if (revision >= 3 && cfg->mode & XT_HASHLIMIT_RATE_MATCH) {
-   if (cfg->avg == 0) {
+   if (cfg->avg == 0 || cfg->avg > U32_MAX) {
pr_info("hashlimit invalid rate\n");
return -ERANGE;
}
-- 
2.1.4



[PATCH 4/9] netfilter: xtables: add scheduling opportunity in get_counters

2017-09-08 Thread Pablo Neira Ayuso
From: Florian Westphal 

There are reports about spurious softlockups during iptables-restore, a
backtrace i saw points at get_counters -- it uses a sequence lock and also
has unbounded restart loop.

Signed-off-by: Florian Westphal 
Acked-by: Eric Dumazet 
Signed-off-by: Pablo Neira Ayuso 
---
 net/ipv4/netfilter/arp_tables.c | 1 +
 net/ipv4/netfilter/ip_tables.c  | 1 +
 net/ipv6/netfilter/ip6_tables.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index e04457198f93..9e2770fd00be 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -629,6 +629,7 @@ static void get_counters(const struct xt_table_info *t,
 
ADD_COUNTER(counters[i], bcnt, pcnt);
++i;
+   cond_resched();
}
}
 }
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 576cba2b57e9..39286e543ee6 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -776,6 +776,7 @@ get_counters(const struct xt_table_info *t,
 
ADD_COUNTER(counters[i], bcnt, pcnt);
++i; /* macro does multi eval of i */
+   cond_resched();
}
}
 }
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 54b1e75eded1..01bd3ee5ebc6 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -795,6 +795,7 @@ get_counters(const struct xt_table_info *t,
 
ADD_COUNTER(counters[i], bcnt, pcnt);
++i;
+   cond_resched();
}
}
 }
-- 
2.1.4



[PATCH 7/9] netfilter: core: remove erroneous warn_on

2017-09-08 Thread Pablo Neira Ayuso
From: Florian Westphal 

kernel test robot reported:

WARNING: CPU: 0 PID: 1244 at net/netfilter/core.c:218 
__nf_hook_entries_try_shrink+0x49/0xcd
[..]

After allowing batching in nf_unregister_net_hooks its possible that an earlier
call to __nf_hook_entries_try_shrink already compacted the list.
If this happens we don't need to do anything.

Fixes: d3ad2c17b4047 ("netfilter: core: batch nf_unregister_net_hooks 
synchronize_net calls")
Reported-by: kernel test robot 
Signed-off-by: Florian Westphal 
Acked-by: Aaron Conole 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/netfilter/core.c b/net/netfilter/core.c
index 04fe25abc5f6..52cd2901a097 100644
--- a/net/netfilter/core.c
+++ b/net/netfilter/core.c
@@ -215,7 +215,7 @@ static void *__nf_hook_entries_try_shrink(struct 
nf_hook_entries __rcu **pp)
if (skip == hook_entries)
goto out_assign;
 
-   if (WARN_ON(skip == 0))
+   if (skip == 0)
return NULL;
 
hook_entries -= skip;
-- 
2.1.4



Re: [PATCH] can: check for null sk before deferencing it via the call to sock_net

2017-09-08 Thread Oliver Hartkopp



On 09/08/2017 05:02 PM, Colin King wrote:

From: Colin Ian King 

The assignment of net via call sock_net will dereference sk. This
is performed before a sanity null check on sk, so there could be
a potential null dereference on the sock_net call if sk is null.
Fix this by assigning net after the sk null check. Also replace
the sk == NULL with the more usual !sk idiom.

Detected by CoverityScan CID#1431862 ("Dereference before null check")

Fixes: 384317ef4187 ("can: network namespace support for CAN_BCM protocol")
Signed-off-by: Colin Ian King 


Acked-by: Oliver Hartkopp 


Thanks Collin!


---
  net/can/bcm.c | 5 +++--
  1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/can/bcm.c b/net/can/bcm.c
index 47a8748d953a..a3791674b8ce 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1493,13 +1493,14 @@ static int bcm_init(struct sock *sk)
  static int bcm_release(struct socket *sock)
  {
struct sock *sk = sock->sk;
-   struct net *net = sock_net(sk);
+   struct net *net;
struct bcm_sock *bo;
struct bcm_op *op, *next;
  
-	if (sk == NULL)

+   if (!sk)
return 0;
  
+	net = sock_net(sk);

bo = bcm_sk(sk);
  
  	/* remove bcm_ops, timer, rx_unregister(), etc. */




[PATCH 8/9] netfilter: xt_hashlimit: alloc hashtable with right size

2017-09-08 Thread Pablo Neira Ayuso
From: Zhizhou Tian 

struct xt_byteslimit_htable used hlist_head, but memory allocation is
done through sizeof(struct list_head).

Signed-off-by: Zhizhou Tian 
Signed-off-by: Pablo Neira Ayuso 
---
 net/netfilter/xt_hashlimit.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/xt_hashlimit.c b/net/netfilter/xt_hashlimit.c
index 10d48234f5f4..962ea4a63d9f 100644
--- a/net/netfilter/xt_hashlimit.c
+++ b/net/netfilter/xt_hashlimit.c
@@ -279,7 +279,7 @@ static int htable_create(struct net *net, struct 
hashlimit_cfg3 *cfg,
size = cfg->size;
} else {
size = (totalram_pages << PAGE_SHIFT) / 16384 /
-  sizeof(struct list_head);
+  sizeof(struct hlist_head);
if (totalram_pages > 1024 * 1024 * 1024 / PAGE_SIZE)
size = 8192;
if (size < 16)
@@ -287,7 +287,7 @@ static int htable_create(struct net *net, struct 
hashlimit_cfg3 *cfg,
}
/* FIXME: don't use vmalloc() here or anywhere else -HW */
hinfo = vmalloc(sizeof(struct xt_hashlimit_htable) +
-   sizeof(struct list_head) * size);
+   sizeof(struct hlist_head) * size);
if (hinfo == NULL)
return -ENOMEM;
*out_hinfo = hinfo;
-- 
2.1.4



  1   2   3   >