Re: [PATCH v2 2/4] can: fixed-transceiver: Add documentation for CAN fixed transceiver bindings

2017-08-01 Thread Kurt Van Dijck
> Hi Kurt,
> 
> On 07/28/2017 09:41 PM, Kurt Van Dijck wrote:
> 
> >>The transceiver is an analog device that needs to support faster
> >>switching frequency (FETs) including minimizing delay to support CAN-FD
> >>ie higher bitrate. From the transceiver perspective the bits for
> >>"arbitration" and "data" look exactly the same. Since it can't
> >>differentiate between the two (at the physical layer) then the actual
> >>limit isn't specific to which part/type of the CAN message is being
> >>sent. Rather its just a single overall max bitrate limit.
> >
> >I must disagree here.
> >The transceiver is an analog device that performs 2 functions:
> >propagate tx bits to CAN wire, and propagate CAN wire state
> >(dominant/recessive) to rx bits.
> >
> >I'll rephrase the above explanation to fit your argument:
> >During arbitration, both directions are required, and needs to propagate
> >within 1 bit time. The transceiver doesn't know, it just performs to
> >best effort.
> >During data, the round-trip timing requirement of layer2 is relaxed.
> >The transceiver still doesn't know, it still performs to best effort.
> >Due to the relaxed round-trip timing requirement, the same transceiver
> >can suddenly allow higher bitrates. The transceiver didn't change, the
> >requirement did change.
> >This is what I meant earlier with "layer2 has been adapted to circumvent
> >layer1 limitations"
> >

> I talked to our CAN transceiver & CAN physical layer specialist who was
> involved in the ISO activities too.
> 
> We just need ONE value: max-bitrate
> 
> The CAN transceiver is qualified to provide this bitrate. That's it.
> There's nothing special with the arbitration bitrate - we don't even need to
> outline any CAN FD capability.
> 
> The other things like 'loop delay compensation' are done in the CAN
> controller. The better the transceiver gets the bits 'in shape' the higher
> it can be qualified. But from the SoC/Controller/Linux view we only need the
> max-bitrate value to double check with the CAN controller's bitrate
> configuration request (which was Franklin's intention).

I bet your physical layer specialist understands the details way better
than I do :-)

1 value it is then.

Kind regards,
Kurt
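
The conclusion of this thread (one max-bitrate value, checked against the controller's bitrate configuration request) can be sketched as below. This is only an illustration: the struct, the function name, and the convention that 0 means "no limit known" are assumptions, not the binding or driver code from the patch series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request type; field names are assumptions for illustration. */
struct can_bitrate_request {
	uint32_t arbitration_bps; /* nominal (arbitration phase) bitrate */
	uint32_t data_bps;        /* CAN FD data phase bitrate, 0 for classic CAN */
};

/*
 * The transceiver cannot tell arbitration bits from data bits, so both
 * phases are checked against the same single limit.
 * Assumption: max_bitrate_bps == 0 means "no limit known".
 */
static bool can_request_within_transceiver_limit(
		const struct can_bitrate_request *req, uint32_t max_bitrate_bps)
{
	if (max_bitrate_bps == 0)
		return true;
	if (req->arbitration_bps > max_bitrate_bps)
		return false;
	if (req->data_bps > max_bitrate_bps)
		return false;
	return true;
}
```

A request of 500 kbit/s arbitration and 2 Mbit/s data would pass against a 2 Mbit/s transceiver and fail against a 1 Mbit/s one.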



Re: [RFC net-next] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-01 Thread Eric Dumazet
On Mon, 2017-07-31 at 19:57 -0700, Shaohua Li wrote:
> On Mon, Jul 31, 2017 at 04:10:07PM -0700, Stephen Hemminger wrote:
> > On Mon, 31 Jul 2017 10:18:57 -0700
> > Shaohua Li  wrote:
> > 
> > > From: Shaohua Li 
> > > 
> > > In a syn flooding test, the fib6_table rwlock is a significant
> > > > bottleneck. Converting the rwlock to RCU sounds straightforward,
> > > > but is very challenging, if it is possible at all. A percpu spinlock is
> > > > quite trivial for this problem since updating the routing table is a rare
> > > event. In my test, the server receives around 1.5 Mpps in syn flooding
> > > test without the patch in a dual sockets and 56-CPU system. With the
> > > patch, the server receives around 3.8Mpps, and perf report doesn't show
> > > the locking issue.
> > > 
> > > Cc: Wei Wang 
> > 
> > You just reinvented brlock...
> 
> you mean lglock? It has been removed from kernel.
>  
> > RCU is not that hard, why not do it right?
> 
> Maybe. But I don't think that's a reason not to do the percpu lock now.
> This is a simple change; if some smart guys find a way to use RCU, we can
> easily remove this.

Make sure to test this on a 256 cpu host, dealing with ICMP messages a
lot.

percpu locks do not scale. This hack was okay last decade, sure, but it
is no longer a good hack.

I would rather focus on the RCU work, Wei is actively working on it.





Re: [PATCH net 1/3] tcp: introduce tcp_rto_delta_us() helper for xmit timer fix

2017-08-01 Thread Eric Dumazet
On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
> Pure refactor. This helper will be required in the xmit timer fix
> later in the patch series. (Because the TLP logic will want to make
> this calculation.)
> 
> Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Nandita Dukkipati 
> ---
>  include/net/tcp.h| 10 ++
>  net/ipv4/tcp_input.c |  5 +
>  2 files changed, 11 insertions(+), 4 deletions(-)

Acked-by: Eric Dumazet 




Re: [RFC] net: make net.core.{r,w}mem_{default,max} namespaced

2017-08-01 Thread Eric Dumazet
On Tue, 2017-08-01 at 02:17 -0400, Hannes Frederic Sowa wrote:

> We do account rmem as well as wmem allocated memory to the appropriate
> mem_cgs. In theory this should be okay.

Last time I checked, rmem was not memcg ready yet.

Can you describe the details ?

Thanks.




Re: [PATCH net 2/3] tcp: enable xmit timer fix by having TLP use time when RTO should fire

2017-08-01 Thread Eric Dumazet
On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
> Have tcp_schedule_loss_probe() base the TLP scheduling decision
> on when the RTO *should* fire. This is to enable the upcoming xmit
> timer fix in this series, where tcp_schedule_loss_probe() cannot
> assume that the last timer installed was an RTO timer (because we are
> no longer doing the "rearm RTO, rearm RTO, rearm TLP" dance on every
> ACK). So tcp_schedule_loss_probe() must independently figure out when
> an RTO would want to fire.
> 
> In the new TLP implementation following in this series, we cannot
> assume that icsk_timeout was set based on an RTO; after processing a
> cumulative ACK the icsk_timeout we see can be from a previous TLP or
> RTO. So we need to independently recalculate the RTO time (instead of
> reading it out of icsk_timeout). Removing this dependency on the
> nature of icsk_timeout makes things a little easier to reason about
> anyway.
> 
> Note that the old and new code should be equivalent, since they are
> both saying: "if the RTO is in the future, but at an earlier time than
> the normal TLP time, then set the TLP timer to fire when the RTO would
> have fired".
> 
> Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Nandita Dukkipati 
> ---
>  net/ipv4/tcp_output.c | 12 
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> index 2f1588bf73da..0ae6b5d176c0 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -2377,8 +2377,8 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>  {
>   struct inet_connection_sock *icsk = inet_csk(sk);
>   struct tcp_sock *tp = tcp_sk(sk);
> - u32 timeout, tlp_time_stamp, rto_time_stamp;
>   u32 rtt = usecs_to_jiffies(tp->srtt_us >> 3);
> + u32 timeout, rto_delta_us;
>  
>   /* No consecutive loss probes. */
>   if (WARN_ON(icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)) {
> @@ -2418,13 +2418,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>   timeout = max_t(u32, timeout, msecs_to_jiffies(10));
>  
>   /* If RTO is shorter, just schedule TLP in its place. */

I have a hard time reading this comment.

We are here trying to arm a timer based on TLP.

If RTO is shorter, we'll arm the timer based on RTO instead of TLP.

Is "If RTO is shorter, just schedule TLP in its place." really correct ?

I suggest we reword the comment or simply get rid of it now the code is
more obvious.

> - tlp_time_stamp = tcp_jiffies32 + timeout;
> - rto_time_stamp = (u32)inet_csk(sk)->icsk_timeout;
> - if ((s32)(tlp_time_stamp - rto_time_stamp) > 0) {
> - s32 delta = rto_time_stamp - tcp_jiffies32;
> - if (delta > 0)
> - timeout = delta;
> - }
> + rto_delta_us = tcp_rto_delta_us(sk);  /* How far in future is RTO? */
> + if (rto_delta_us > 0)
> + timeout = min_t(u32, timeout, usecs_to_jiffies(rto_delta_us));
>  
>   inet_csk_reset_xmit_timer(sk, ICSK_TIME_LOSS_PROBE, timeout,
> TCP_RTO_MAX);

Acked-by: Eric Dumazet 
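
The timer choice discussed in this message (arm the timer for the TLP timeout unless the RTO would fire sooner, in which case use the RTO time) can be sketched as a small standalone function. The function name and the signed delta type are assumptions for illustration; the kernel code works on jiffies and uses tcp_rto_delta_us() and usecs_to_jiffies(), which are not reproduced here.

```c
#include <stdint.h>

/*
 * Both arguments are in the same time unit.
 * rto_delta is how far in the future the RTO should fire; a value <= 0
 * means the RTO is already due, and the TLP timeout is left unchanged,
 * mirroring the "if (rto_delta_us > 0)" test in the quoted patch.
 */
static uint32_t loss_probe_timeout(uint32_t tlp_timeout, int64_t rto_delta)
{
	if (rto_delta > 0 && (uint64_t)rto_delta < tlp_timeout)
		return (uint32_t)rto_delta;
	return tlp_timeout;
}
```

Read this way, a comment along the lines of "If the RTO would fire sooner, use the RTO time for the probe timer" may describe the behaviour more directly than the existing wording.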




Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed

2017-08-01 Thread Eric Dumazet
On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
> Fix a TCP loss recovery performance bug raised recently on the netdev
> list, in two threads:
> 
> (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
> (ii) July 26, 2017: netdev thread:
>  "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
>  outstanding TLP retransmission"

Acked-by: Eric Dumazet 




RE: [PATCH] netfilter: fix stringop-overflow warning with UBSAN

2017-08-01 Thread David Laight
From: Arnd Bergmann
> Sent: 31 July 2017 11:09
> Using gcc-7 with UBSAN enabled, we get this false-positive warning:
> 
> net/netfilter/ipset/ip_set_core.c: In function 'ip_set_sockfn_get':
> net/netfilter/ipset/ip_set_core.c:1998:3: error: 'strncpy' writing 32 bytes into a region of size 2 overflows the destination [-Werror=stringop-overflow=]
>strncpy(req_get->set.name, set ? set->name : "",
>^~~~
> sizeof(req_get->set.name));
> ~~
> 
> This seems completely bogus, and I could not find a nice workaround.
> To work around it in a less elegant way, I change the ?: operator
> into an if()/else() construct.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  net/netfilter/ipset/ip_set_core.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
> index e495b5e484b1..d7ebb021003b 100644
> --- a/net/netfilter/ipset/ip_set_core.c
> +++ b/net/netfilter/ipset/ip_set_core.c
> @@ -1995,8 +1995,12 @@ ip_set_sockfn_get(struct sock *sk, int optval, void __user *user, int *len)
>   }
>   nfnl_lock(NFNL_SUBSYS_IPSET);
>   set = ip_set(inst, req_get->set.index);
> - strncpy(req_get->set.name, set ? set->name : "",
> - IPSET_MAXNAMELEN);
> + if (set)
> + strncpy(req_get->set.name, set->name,
> + sizeof(req_get->set.name));
> + else
> + memset(req_get->set.name, '\0',
> +sizeof(req_get->set.name));

If you use strncpy() here, the compiler might optimise the code
back to 'how it was before'.

Or, maybe an explicit temporary: 'const char *name = set ? set->name : "";'

David
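
The explicit-temporary idea above can be sketched outside the kernel as follows. The struct and function names are hypothetical stand-ins, and NAME_LEN is set to 32 only because the quoted warning mentions a 32-byte write. Whether this form actually silences the gcc-7 warning is not verified here.

```c
#include <string.h>

#define NAME_LEN 32 /* stand-in for IPSET_MAXNAMELEN; value assumed */

/* Hypothetical stand-in for the set object; only the name matters here. */
struct demo_set {
	char name[NAME_LEN];
};

/* Copy the set name, or an empty name when there is no set. */
static void copy_set_name(char *dst, const struct demo_set *set)
{
	/* Resolve the ?: into a named temporary before calling strncpy(). */
	const char *name = set ? set->name : "";

	/* strncpy() zero-pads dst when name is shorter than NAME_LEN. */
	strncpy(dst, name, NAME_LEN);
}
```

Because strncpy() pads the remainder with NUL bytes, the empty-string case clears the whole buffer, which matches what the memset() branch in the patch does.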



Re: [RFC] net: make net.core.{r,w}mem_{default,max} namespaced

2017-08-01 Thread Hannes Frederic Sowa
On Tue, Aug 1, 2017, at 09:18, Eric Dumazet wrote:
> On Tue, 2017-08-01 at 02:17 -0400, Hannes Frederic Sowa wrote:
> 
> > We do account rmem as well as wmem allocated memory to the appropriate
> > mem_cgs. In theory this should be okay.
> 
> Last time I checked, rmem was not memcg ready yet.
> 
> Can you describe the details ?

As long as our packets pass __sock_queue_rcv_skb (what we do with udp,
raw, packet) we call down to sk_rmem_schedule, which in
__sk_mem_schedule -> sk_mem_raise_allocated will charge the mem_cg or
suppress the allocation.

For tcp, as I remember from the last discussion, we simply drop packets
and don't handle that very sensibly, albeit we should not get out of the
mem_cg limits.

tcp_try_rmem_schedule in front of the ofo as well as the data queue
should make sure of that. Did I overlook anything?

OTOH, I am not too fond of the change. I just think that it shouldn't
deplete memory but rather stall connections.

Thanks,
Hannes


pull request: bluetooth-next 2017-08-01

2017-08-01 Thread Johan Hedberg
Hi Dave,

Here's our first batch of Bluetooth patches for the 4.14 kernel:

 - Several new USB IDs for the btusb driver
 - Memory leak fix in btusb driver
 - Cleanups & fixes to hci_nokia, hci_serdev and hci_bcm drivers
 - Fixed cleanup path in mrf24j40 (802.15.4) driver probe function
 - A few other smaller cleanups & fixes to drivers

Please let me know if there are any issues pulling. Thanks.

Johan

---
The following changes since commit 04d8980b4a9ca178be1c703467f2ed4ac0800e90:

  cxgb4: Update register ranges of T4/T5/T6 adapters (2017-07-19 22:27:03 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next.git for-upstream

for you to fetch changes up to d829b9e230f4138fb6194e854e1bb46f737f1c3d:

  Bluetooth: btusb: add ID for LiteOn 04ca:3016 (2017-08-01 08:28:31 +0300)


Brian Norris (1):
  Bluetooth: btusb: add ID for LiteOn 04ca:3016

Christophe JAILLET (1):
  mrf24j40: Fix en error handling path in 'mrf24j40_probe()'

Dan Carpenter (1):
  Bluetooth: btrtl: Fix a error code in rtl_load_config()

Derek Robson (1):
  Bluetooth: Style fix - align block comments

Dmitry Tunin (1):
  Bluetooth: btusb: Add support of all Foxconn (105b) Broadcom devices

Gustavo A. R. Silva (1):
  Bluetooth: btwilink: remove unnecessary static in bt_ti_probe()

Ian Molton (5):
  Bluetooth: hci_nokia: prevent crash on module removal
  Bluetooth: hci_nokia: remove duplicate call to pm_runtime_disable()
  Bluetooth: hci_serdev: Introduce hci_uart_unregister_device()
  Bluetooth: hci_nokia: Use new hci_uart_unregister_device() function
  Bluetooth: hci_ll: Use new hci_uart_unregister_device() function

Jeffy Chen (1):
  Bluetooth: btusb: Fix memory leak in play_deferred

Joan Jani (1):
  Bluetooth: btqca: Fixed a coding style error

Leif Liddy (1):
  Bluetooth: btusb: fix QCA Rome suspend/resume

Loic Poulain (2):
  Bluetooth: hci_bcm: Make bcm_request_irq fail if no IRQ resource
  Bluetooth: hci_uart: Fix uninitialized alignment value

Marcel Holtmann (1):
  Bluetooth: hci_nokia: select BT_BCM for btbcm_set_bdaddr()

 drivers/bluetooth/Kconfig |  1 +
 drivers/bluetooth/ath3k.c |  3 ++-
 drivers/bluetooth/bt3c_cs.c   |  8 ---
 drivers/bluetooth/btmrvl_sdio.c   |  6 --
 drivers/bluetooth/btqca.c |  2 +-
 drivers/bluetooth/btrtl.c |  2 ++
 drivers/bluetooth/btsdio.c|  3 ++-
 drivers/bluetooth/btuart_cs.c |  8 ---
 drivers/bluetooth/btusb.c | 44 ---
 drivers/bluetooth/btwilink.c  |  8 +++
 drivers/bluetooth/hci_bcm.c   | 30 +-
 drivers/bluetooth/hci_h4.c|  2 +-
 drivers/bluetooth/hci_ldisc.c |  3 ++-
 drivers/bluetooth/hci_ll.c| 11 +++---
 drivers/bluetooth/hci_nokia.c | 10 +
 drivers/bluetooth/hci_serdev.c| 13 
 drivers/bluetooth/hci_uart.h  |  1 +
 drivers/net/ieee802154/mrf24j40.c |  3 ++-
 18 files changed, 101 insertions(+), 57 deletions(-)




[PATCH net-next 1/7] esp4: Support RX checksum with crypto offload

2017-08-01 Thread ilant
From: Ilan Tayari 

Keep the device's reported ip_summed indication in case crypto
was offloaded by the device. Subtract the csum values of the
stripped parts (esp header+iv, esp trailer+auth_data) to keep
value correct.

Note: CHECKSUM_COMPLETE should be indicated only if skb->csum
has the post-decryption offload csum value.

Signed-off-by: Ariel Levkovich 
Signed-off-by: Ilan Tayari 
---
 net/ipv4/esp4.c | 14 +++---
 net/ipv4/esp4_offload.c |  4 +++-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/ipv4/esp4.c b/net/ipv4/esp4.c
index 0cbee0a666ff..741acd7b9646 100644
--- a/net/ipv4/esp4.c
+++ b/net/ipv4/esp4.c
@@ -510,7 +510,8 @@ int esp_input_done2(struct sk_buff *skb, int err)
int elen = skb->len - hlen;
int ihl;
u8 nexthdr[2];
-   int padlen;
+   int padlen, trimlen;
+   __wsum csumdiff;
 
if (!xo || (xo && !(xo->flags & CRYPTO_DONE)))
kfree(ESP_SKB_CB(skb)->tmp);
@@ -568,8 +569,15 @@ int esp_input_done2(struct sk_buff *skb, int err)
skb->ip_summed = CHECKSUM_UNNECESSARY;
}
 
-   pskb_trim(skb, skb->len - alen - padlen - 2);
-   __skb_pull(skb, hlen);
+   trimlen = alen + padlen + 2;
+   if (skb->ip_summed == CHECKSUM_COMPLETE) {
+   csumdiff = skb_checksum(skb, skb->len - trimlen, trimlen, 0);
+   skb->csum = csum_block_sub(skb->csum, csumdiff,
+  skb->len - trimlen);
+   }
+   pskb_trim(skb, skb->len - trimlen);
+
+   skb_pull_rcsum(skb, hlen);
if (x->props.mode == XFRM_MODE_TUNNEL)
skb_reset_transport_header(skb);
else
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
index e0666016a764..05831dea00f4 100644
--- a/net/ipv4/esp4_offload.c
+++ b/net/ipv4/esp4_offload.c
@@ -182,11 +182,13 @@ static struct sk_buff *esp4_gso_segment(struct sk_buff *skb,
 static int esp_input_tail(struct xfrm_state *x, struct sk_buff *skb)
 {
struct crypto_aead *aead = x->data;
+   struct xfrm_offload *xo = xfrm_offload(skb);
 
if (!pskb_may_pull(skb, sizeof(struct ip_esp_hdr) + 
crypto_aead_ivsize(aead)))
return -EINVAL;
 
-   skb->ip_summed = CHECKSUM_NONE;
+   if (!(xo->flags & CRYPTO_DONE))
+   skb->ip_summed = CHECKSUM_NONE;
 
return esp_input_done2(skb, 0);
 }
-- 
2.11.0
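
The adjustment described in this patch (subtracting the checksum of the stripped trailer from the full-packet checksum) relies on ones' complement arithmetic. Below is a minimal user-space sketch of that arithmetic. The helper names are made up for illustration, and it assumes the removed part starts at an even byte offset; the kernel's csum_block_sub() also handles odd offsets, which this sketch does not.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Ones' complement sum over big-endian 16-bit words.
 * An odd trailing byte is padded with zero. Intended for short buffers.
 */
static uint16_t csum16(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];
	if (i < len)
		sum += (uint32_t)data[i] << 8;
	return fold16(sum);
}

/* Subtract a partial sum: in ones' complement, a - b is a + ~b. */
static uint16_t csum16_sub(uint16_t total, uint16_t part)
{
	return fold16((uint32_t)total + (uint16_t)~part);
}
```

With an 8-byte buffer 12 34 56 78 9a bc de f0, subtracting the sum of the last four bytes from the sum of all eight gives the sum of the first four, so the remaining data does not have to be re-summed. Note that ones' complement has two representations of zero (0x0000 and 0xffff), so a plain equality check is only safe when the expected sum is non-zero.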



[PATCH net-next 2/7] esp6: Support RX checksum with crypto offload

2017-08-01 Thread ilant
From: Ilan Tayari 

Keep the device's reported ip_summed indication in case crypto
was offloaded by the device. Subtract the csum values of the
stripped parts (esp header+iv, esp trailer+auth_data) to keep
value correct.

Note: CHECKSUM_COMPLETE should be indicated only if skb->csum
has the post-decryption offload csum value.

Signed-off-by: Ariel Levkovich 
Signed-off-by: Ilan Tayari 
---
 net/ipv6/esp6.c | 14 +++---
 net/ipv6/esp6_offload.c |  4 +++-
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 9ed35473dcb5..0ca1db62e381 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -470,7 +470,8 @@ int esp6_input_done2(struct sk_buff *skb, int err)
int hlen = sizeof(struct ip_esp_hdr) + crypto_aead_ivsize(aead);
int elen = skb->len - hlen;
int hdr_len = skb_network_header_len(skb);
-   int padlen;
+   int padlen, trimlen;
+   __wsum csumdiff;
u8 nexthdr[2];
 
if (!xo || (xo && !(xo->flags & CRYPTO_DONE)))
@@ -492,8 +493,15 @@ int esp6_input_done2(struct sk_buff *skb, int err)
 
/* ... check padding bits here. Silly. :-) */
 
-   pskb_trim(skb, skb->len - alen - padlen - 2);
-   __skb_pull(skb, hlen);
+   trimlen = alen + padlen + 2;
+   if (skb->ip_summed == CHECKSUM_COMPLETE) {
+   csumdiff = skb_checksum(skb, skb->len - trimlen, trimlen, 0);
+   skb->csum = csum_block_sub(skb->csum, csumdiff,
+  skb->len - trimlen);
+   }
+   pskb_trim(skb, skb->len - trimlen);
+
+   skb_pull_rcsum(skb, hlen);
if (x->props.mode == XFRM_MODE_TUNNEL)
skb_reset_transport_header(skb);
else
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
index f02f131f6435..eec3add177fe 100644
--- a/net/ipv6/esp6_offload.c
+++ b/net/ipv6/esp6_offload.c
@@ -209,11 +209,13 @@ static struct sk_buff *esp6_gso_segment(struct sk_buff *skb,
 static int esp6_input_tail(struct xfrm_state *x, struct sk_buff *skb)
 {
struct crypto_aead *aead = x->data;
+   struct xfrm_offload *xo = xfrm_offload(skb);
 
if (!pskb_may_pull(skb, sizeof(struct ip_esp_hdr) + 
crypto_aead_ivsize(aead)))
return -EINVAL;
 
-   skb->ip_summed = CHECKSUM_NONE;
+   if (!(xo->flags & CRYPTO_DONE))
+   skb->ip_summed = CHECKSUM_NONE;
 
return esp6_input_done2(skb, 0);
 }
-- 
2.11.0



[PATCH] Adding Agile-SD TCP module and modifying Kconfig and Makefile to configure the kernel for this new module.

2017-08-01 Thread mohamedalrshah
From: Mohamed Alrshah 

---
 net/ipv4/Kconfig   |  15 
 net/ipv4/Makefile  |   1 +
 net/ipv4/tcp_agilesd.c | 221 +
 3 files changed, 237 insertions(+)
 create mode 100755 net/ipv4/tcp_agilesd.c

diff --git a/net/ipv4/Kconfig b/net/ipv4/Kconfig
index 91a25579..22d824b1 100644
--- a/net/ipv4/Kconfig
+++ b/net/ipv4/Kconfig
@@ -677,6 +677,17 @@ config TCP_CONG_BBR
bufferbloat, policers, or AQM schemes that do not provide a delay
signal. It requires the fq ("Fair Queue") pacing packet scheduler.
 
+config TCP_CONG_AGILESD
+tristate "Agile-SD Congestion control"
+default n
+---help---
+
+This is version 1.0 of Agile-SD TCP. It is a sender-side-only algorithm.
+It contributes the Agility Factor (AF) to shorten the epoch time
+and to make TCP independent from RTT. AF reduces the sensitivity
+to packet losses, which in turn allows Agile-SD to achieve better throughput
+over high-speed networks.
+
 choice
prompt "Default TCP congestion control"
default DEFAULT_CUBIC
@@ -713,6 +724,9 @@ choice
 
config DEFAULT_BBR
bool "BBR" if TCP_CONG_BBR=y
+   
+config DEFAULT_AGILESD
+   bool "AGILESD" if TCP_CONG_AGILESD=y
 
config DEFAULT_RENO
bool "Reno"
@@ -738,6 +752,7 @@ config DEFAULT_TCP_CONG
default "dctcp" if DEFAULT_DCTCP
default "cdg" if DEFAULT_CDG
default "bbr" if DEFAULT_BBR
+   default "agilesd" if DEFAULT_AGILESD
default "cubic"
 
 config TCP_MD5SIG
diff --git a/net/ipv4/Makefile b/net/ipv4/Makefile
index f83de23a..33d398b5 100644
--- a/net/ipv4/Makefile
+++ b/net/ipv4/Makefile
@@ -44,6 +44,7 @@ obj-$(CONFIG_INET_UDP_DIAG) += udp_diag.o
 obj-$(CONFIG_INET_RAW_DIAG) += raw_diag.o
 obj-$(CONFIG_NET_TCPPROBE) += tcp_probe.o
 obj-$(CONFIG_TCP_CONG_BBR) += tcp_bbr.o
+obj-$(CONFIG_TCP_CONG_AGILESD) += tcp_agilesd.o
 obj-$(CONFIG_TCP_CONG_BIC) += tcp_bic.o
 obj-$(CONFIG_TCP_CONG_CDG) += tcp_cdg.o
 obj-$(CONFIG_TCP_CONG_CUBIC) += tcp_cubic.o
diff --git a/net/ipv4/tcp_agilesd.c b/net/ipv4/tcp_agilesd.c
new file mode 100755
index ..fd040ff2
--- /dev/null
+++ b/net/ipv4/tcp_agilesd.c
@@ -0,0 +1,221 @@
+/* agilesd is a Loss-Based Congestion Control Algorithm for TCP v1.0.
+ * agilesd has been created by Mohamed A. Alrshah,
+ * at Faculty of Computer Science and Information Technology,
+ * Universiti Putra Malaysia.
+ * agilesd is based on the article, which is published in 2015 as below:
+ * 
+ * Alrshah, M.A., Othman, M., Ali, B. and Hanapi, Z.M., 2015. 
+ * Agile-SD: a Linux-based TCP congestion control algorithm for supporting
+ * high-speed and short-distance networks.
+ * Journal of Network and Computer Applications, 55, pp.181-190. 
+ */
+
+/* These includes are very important to operate the algorithm under NS2. */
+//#define NS_PROTOCOL "tcp_agilesd.c"
+//#include "../ns-linux-c.h"
+//#include "../ns-linux-util.h"
+//#include 
+/* These includes are very important to operate the algorithm under NS2. */
+
+/* These includes are very important to operate the algorithm under Linux OS. */
+#include 
+#include 
+#include 
+#include 
+//#include // optional
+//#include  // optional
+/* These includes are very important to operate the algorithm under Linux OS. */
+
+#define SCALE   1000   /* Scale factor to avoid fractions */
+#define Double_SCALE 100   /* Double_SCALE must be equal to SCALE^2 */
+#define beta 900   /* beta for multiplicative decrease */
+
+static int initial_ssthresh __read_mostly;
+//static int beta __read_mostly = 900; /*the initial value of beta is equal to 90%*/
+
+module_param(initial_ssthresh, int, 0644);
+MODULE_PARM_DESC(initial_ssthresh, "initial value of slow start threshold");
+//module_param(beta, int, 0444);
+//MODULE_PARM_DESC(beta, "beta for multiplicative decrease");
+
+/* agilesd Parameters */
+struct agilesdtcp {
+   u32 loss_cwnd;  // congestion window at last loss.
+   u32 frac_tracer;// This is to trace the fractions of the increment.
+   u32 degraded_loss_cwnd; // loss_cwnd after degradation.
+   enum dystate {SS=0, CA=1} agilesd_tcp_status;
+};
+
+static inline void agilesdtcp_reset(struct sock *sk)
+{
+   /*After timeout loss cntRTT and baseRTT must be reset to the initial values as below */
+}
+
+/* This function is called after the first acknowledgment is received and
+ * before the congestion control algorithm is called for the first time. If the
+ * congestion control algorithm has private data, it should initialize its
+ * private data here. */
+static void agilesdtcp_init(struct sock *sk)
+{
+   struct agilesdtcp *ca = inet_csk_ca(sk);
+
+   // If the value of initial_ssthresh is not set, snd_ssthresh will be initialized by a large value.
+   if (initi

[PATCH net-next 3/7] xfrm6: Fix CHECKSUM_COMPLETE after IPv6 header push

2017-08-01 Thread ilant
From: Yossi Kuperman 

xfrm6_transport_finish rebuilds the IPv6 header based on the
original one and pushes it back without fixing skb->csum.
Therefore, CHECKSUM_COMPLETE is no longer valid and the packet
gets dropped.

Fix skb->csum by calling skb_postpush_rcsum.

Note: A valid IPv4 header has checksum 0, unlike IPv6. Thus,
the change is not needed in the sibling xfrm4_transport_finish
function.

Signed-off-by: Yossi Kuperman 
Signed-off-by: Ilan Tayari 
---
 net/ipv6/xfrm6_input.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/xfrm6_input.c b/net/ipv6/xfrm6_input.c
index 3ef5d913e7a3..f95943a13abc 100644
--- a/net/ipv6/xfrm6_input.c
+++ b/net/ipv6/xfrm6_input.c
@@ -34,6 +34,7 @@ EXPORT_SYMBOL(xfrm6_rcv_spi);
 int xfrm6_transport_finish(struct sk_buff *skb, int async)
 {
struct xfrm_offload *xo = xfrm_offload(skb);
+   int nhlen = skb->data - skb_network_header(skb);
 
skb_network_header(skb)[IP6CB(skb)->nhoff] =
XFRM_MODE_SKB_CB(skb)->protocol;
@@ -43,8 +44,9 @@ int xfrm6_transport_finish(struct sk_buff *skb, int async)
return 1;
 #endif
 
-   __skb_push(skb, skb->data - skb_network_header(skb));
+   __skb_push(skb, nhlen);
ipv6_hdr(skb)->payload_len = htons(skb->len - sizeof(struct ipv6hdr));
+   skb_postpush_rcsum(skb, skb_network_header(skb), nhlen);
 
if (xo && (xo->flags & XFRM_GRO)) {
skb_mac_header_rebuild(skb);
-- 
2.11.0



[PATCH net-next 4/7] esp6: Fix RX checksum after header pull

2017-08-01 Thread ilant
From: Yossi Kuperman 

Both ip6_input_finish (non-GRO) and esp6_gro_receive (GRO) strip
the IPv6 header without adjusting skb->csum accordingly. As a
result CHECKSUM_COMPLETE breaks and "hw csum failure" is written
to the kernel log by netdev_rx_csum_fault (dev.c).

Fix skb->csum by subtracting the checksum value of the pulled IPv6
header using a call to skb_postpull_rcsum.

This affects both transport and tunnel modes.

Note that the fix occurs far from the place that the header was
pulled. This is based on existing code, see:
ipv6_srh_rcv() in exthdrs.c and rawv6_rcv() in raw.c

Signed-off-by: Yossi Kuperman 
Signed-off-by: Ilan Tayari 
---
 net/ipv6/esp6.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/ipv6/esp6.c b/net/ipv6/esp6.c
index 0ca1db62e381..74bde202eb9a 100644
--- a/net/ipv6/esp6.c
+++ b/net/ipv6/esp6.c
@@ -495,6 +495,8 @@ int esp6_input_done2(struct sk_buff *skb, int err)
 
trimlen = alen + padlen + 2;
if (skb->ip_summed == CHECKSUM_COMPLETE) {
+   skb_postpull_rcsum(skb, skb_network_header(skb),
+  skb_network_header_len(skb));
csumdiff = skb_checksum(skb, skb->len - trimlen, trimlen, 0);
skb->csum = csum_block_sub(skb->csum, csumdiff,
   skb->len - trimlen);
-- 
2.11.0



[PATCH net-next 0/7] IPSec offload improvements

2017-08-01 Thread ilant
From: Ilan Tayari 

Hi Steffen,

This patchset introduces several improvements to IPSec offload.
We would like to see these merged in 4.14.

Patches 1-4 add RX checksum offload support.
This gives a big performance boost.
These patches have been submitted before but were not merged.
Note that patches 1-2 changed slightly with a call to skb_pull_rcsum.

Patch 5 adds automatic loading of XFRM offload modules, but only
if crypto-offload is explicitly requested by the user.
This avoids issues in the field where the user forgets to load the
module manually and so crypto-offload does not happen.

Patch 6 fixes the leftover xfrm_offload in RX SKBs.
This solves some issues with forwarding.

Patch 7 allows IPSec GSO on local sockets, with or without
crypto-offload.
This also gives a large performance boost.

Thanks,
Ilan.

Ilan Tayari (4):
  esp4: Support RX checksum with crypto offload
  esp6: Support RX checksum with crypto offload
  xfrm: Auto-load xfrm offload modules
  xfrm: Clear RX SKB secpath xfrm_offload

Steffen Klassert (1):
  net: Allow IPsec GSO for local sockets

Yossi Kuperman (2):
  xfrm6: Fix CHECKSUM_COMPLETE after IPv6 header push
  esp6: Fix RX checksum after header pull

 include/net/xfrm.h  | 23 ++-
 net/core/sock.c |  2 +-
 net/ipv4/esp4.c | 14 +++---
 net/ipv4/esp4_offload.c |  5 -
 net/ipv6/esp6.c | 16 +---
 net/ipv6/esp6_offload.c |  5 -
 net/ipv6/xfrm6_input.c  |  4 +++-
 net/xfrm/xfrm_device.c  |  2 +-
 net/xfrm/xfrm_input.c   |  2 ++
 net/xfrm/xfrm_state.c   | 16 
 net/xfrm/xfrm_user.c|  2 +-
 11 files changed, 74 insertions(+), 17 deletions(-)

-- 
2.11.0



[PATCH net-next 5/7] xfrm: Auto-load xfrm offload modules

2017-08-01 Thread ilant
From: Ilan Tayari 

IPSec crypto offload depends on the protocol-specific
offload module (such as esp_offload.ko).

When the user installs an SA with crypto-offload, load
the offload module automatically, in the same way
that the protocol module is loaded (such as esp.ko)

Signed-off-by: Ilan Tayari 
---
 include/net/xfrm.h  |  4 +++-
 net/ipv4/esp4_offload.c |  1 +
 net/ipv6/esp6_offload.c |  1 +
 net/xfrm/xfrm_device.c  |  2 +-
 net/xfrm/xfrm_state.c   | 16 
 net/xfrm/xfrm_user.c|  2 +-
 6 files changed, 19 insertions(+), 7 deletions(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index afb4929d7232..5a360100136c 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -43,6 +43,8 @@
MODULE_ALIAS("xfrm-mode-" __stringify(family) "-" __stringify(encap))
 #define MODULE_ALIAS_XFRM_TYPE(family, proto) \
MODULE_ALIAS("xfrm-type-" __stringify(family) "-" __stringify(proto))
+#define MODULE_ALIAS_XFRM_OFFLOAD_TYPE(family, proto) \
+   MODULE_ALIAS("xfrm-offload-" __stringify(family) "-" __stringify(proto))
 
 #ifdef CONFIG_XFRM_STATISTICS
 #define XFRM_INC_STATS(net, field) SNMP_INC_STATS((net)->mib.xfrm_statistics, field)
@@ -1558,7 +1560,7 @@ void xfrm_spd_getinfo(struct net *net, struct xfrmk_spdinfo *si);
 u32 xfrm_replay_seqhi(struct xfrm_state *x, __be32 net_seq);
 int xfrm_init_replay(struct xfrm_state *x);
 int xfrm_state_mtu(struct xfrm_state *x, int mtu);
-int __xfrm_init_state(struct xfrm_state *x, bool init_replay);
+int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload);
 int xfrm_init_state(struct xfrm_state *x);
 int xfrm_prepare_input(struct xfrm_state *x, struct sk_buff *skb);
 int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 spi, int encap_type);
diff --git a/net/ipv4/esp4_offload.c b/net/ipv4/esp4_offload.c
index 05831dea00f4..aca1c85f0795 100644
--- a/net/ipv4/esp4_offload.c
+++ b/net/ipv4/esp4_offload.c
@@ -305,3 +305,4 @@ module_init(esp4_offload_init);
 module_exit(esp4_offload_exit);
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Steffen Klassert ");
+MODULE_ALIAS_XFRM_OFFLOAD_TYPE(AF_INET, XFRM_PROTO_ESP);
diff --git a/net/ipv6/esp6_offload.c b/net/ipv6/esp6_offload.c
index eec3add177fe..8d4e2ba9163d 100644
--- a/net/ipv6/esp6_offload.c
+++ b/net/ipv6/esp6_offload.c
@@ -334,3 +334,4 @@ module_init(esp6_offload_init);
 module_exit(esp6_offload_exit);
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Steffen Klassert ");
+MODULE_ALIAS_XFRM_OFFLOAD_TYPE(AF_INET6, XFRM_PROTO_ESP);
diff --git a/net/xfrm/xfrm_device.c b/net/xfrm/xfrm_device.c
index 5cd7a244e88d..1904127f5fb8 100644
--- a/net/xfrm/xfrm_device.c
+++ b/net/xfrm/xfrm_device.c
@@ -63,7 +63,7 @@ int xfrm_dev_state_add(struct net *net, struct xfrm_state *x,
xfrm_address_t *daddr;
 
if (!x->type_offload)
-   return 0;
+   return -EINVAL;
 
/* We don't yet support UDP encapsulation, TFC padding and ESN. */
if (x->encap || x->tfcpad || (x->props.flags & XFRM_STATE_ESN))
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 82cbbce69b79..a41e2ef789c0 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -296,12 +296,14 @@ int xfrm_unregister_type_offload(const struct xfrm_type_offload *type,
 }
 EXPORT_SYMBOL(xfrm_unregister_type_offload);
 
-static const struct xfrm_type_offload *xfrm_get_type_offload(u8 proto, unsigned short family)
+static const struct xfrm_type_offload *
+xfrm_get_type_offload(u8 proto, unsigned short family, bool try_load)
 {
struct xfrm_state_afinfo *afinfo;
const struct xfrm_type_offload **typemap;
const struct xfrm_type_offload *type;
 
+retry:
afinfo = xfrm_state_get_afinfo(family);
if (unlikely(afinfo == NULL))
return NULL;
@@ -311,6 +313,12 @@ static const struct xfrm_type_offload *xfrm_get_type_offload(u8 proto, unsigned
if ((type && !try_module_get(type->owner)))
type = NULL;
 
+   if (!type && try_load) {
+   request_module("xfrm-offload-%d-%d", family, proto);
+   try_load = 0;
+   goto retry;
+   }
+
rcu_read_unlock();
return type;
 }
@@ -2165,7 +2173,7 @@ int xfrm_state_mtu(struct xfrm_state *x, int mtu)
return mtu - x->props.header_len;
 }
 
-int __xfrm_init_state(struct xfrm_state *x, bool init_replay)
+int __xfrm_init_state(struct xfrm_state *x, bool init_replay, bool offload)
 {
struct xfrm_state_afinfo *afinfo;
struct xfrm_mode *inner_mode;
@@ -2230,7 +2238,7 @@ int __xfrm_init_state(struct xfrm_state *x, bool 
init_replay)
if (x->type == NULL)
goto error;
 
-   x->type_offload = xfrm_get_type_offload(x->id.proto, family);
+   x->type_offload = xfrm_get_type_offload(x->id.proto, family, offload);
 
err = x->type->init_state(x);
if (err)
@@ -2258,7 +2266,7 @@ EXPORT_SYMBOL(__xfrm_init_state);
 
 int xfrm_init_state(struct xfrm_state *x)
 

[PATCH net-next 7/7] net: Allow IPsec GSO for local sockets

2017-08-01 Thread ilant
From: Steffen Klassert 

This patch allows local sockets to make use of XFRM GSO code path.

Signed-off-by: Steffen Klassert 
Signed-off-by: Ilan Tayari 
---
 include/net/xfrm.h | 19 +++
 net/core/sock.c|  2 +-
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/xfrm.h b/include/net/xfrm.h
index 5a360100136c..18d7de34a5c3 100644
--- a/include/net/xfrm.h
+++ b/include/net/xfrm.h
@@ -1858,6 +1858,20 @@ int xfrm_dev_state_add(struct net *net, struct 
xfrm_state *x,
   struct xfrm_user_offload *xuo);
 bool xfrm_dev_offload_ok(struct sk_buff *skb, struct xfrm_state *x);
 
+static inline bool xfrm_dst_offload_ok(struct dst_entry *dst)
+{
+   struct xfrm_state *x = dst->xfrm;
+
+   if (!x || !x->type_offload)
+   return false;
+
+   if (x->xso.offload_handle && (x->xso.dev == dst->path->dev) &&
+   !dst->child->xfrm)
+   return true;
+
+   return false;
+}
+
 static inline void xfrm_dev_state_delete(struct xfrm_state *x)
 {
struct xfrm_state_offload *xso = &x->xso;
@@ -1900,6 +1914,11 @@ static inline bool xfrm_dev_offload_ok(struct sk_buff 
*skb, struct xfrm_state *x
 {
return false;
 }
+
+static inline bool xfrm_dst_offload_ok(struct dst_entry *dst)
+{
+   return false;
+}
 #endif
 
 static inline int xfrm_mark_get(struct nlattr **attrs, struct xfrm_mark *m)
diff --git a/net/core/sock.c b/net/core/sock.c
index ac2a404c73eb..e4b45d027d8b 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1757,7 +1757,7 @@ void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
sk->sk_route_caps |= NETIF_F_GSO_SOFTWARE;
sk->sk_route_caps &= ~sk->sk_route_nocaps;
if (sk_can_gso(sk)) {
-   if (dst->header_len) {
+   if (dst->header_len && !xfrm_dst_offload_ok(dst)) {
sk->sk_route_caps &= ~NETIF_F_GSO_MASK;
} else {
sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
-- 
2.11.0



[PATCH net-next 6/7] xfrm: Clear RX SKB secpath xfrm_offload

2017-08-01 Thread ilant
From: Ilan Tayari 

If an incoming packet undergoes XFRM crypto-offload, its secpath is
filled with xfrm_offload struct denoting offload information.

If the SKB is then forwarded to a device which supports crypto-
offload, the stack wrongfully attempts to offload it (even though
the output SA may not exist on the device) due to the leftover
secpath xo.

Clear the ingress xo by zeroizing secpath->olen just before
delivering the decapsulated packet to the network stack.

Fixes: d77e38e612a0 ("xfrm: Add an IPsec hardware offloading API")
Signed-off-by: Ilan Tayari 
---
 net/xfrm/xfrm_input.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 923205e279f7..f07eec59dcae 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -424,6 +424,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
nf_reset(skb);
 
if (decaps) {
+   skb->sp->olen = 0;
skb_dst_drop(skb);
gro_cells_receive(&gro_cells, skb);
return 0;
@@ -434,6 +435,7 @@ int xfrm_input(struct sk_buff *skb, int nexthdr, __be32 
spi, int encap_type)
 
err = x->inner_mode->afinfo->transport_finish(skb, xfrm_gro || 
async);
if (xfrm_gro) {
+   skb->sp->olen = 0;
skb_dst_drop(skb);
gro_cells_receive(&gro_cells, skb);
return err;
-- 
2.11.0



[PATCH v3] ss: Enclose IPv6 address in brackets

2017-08-01 Thread Florian Lehner
This updated patch adds support to the ss tool for the RFC 2732 IPv6
address format, which encloses the address in brackets.
It implements Phil Sutter's suggestion to return an additional value
indicating whether an address was resolved to a hostname.

Signed-off-by: Lehner Florian 
---
 include/utils.h | 10 +++---
 lib/utils.c | 11 +++
 misc/ss.c   | 20 +++-
 3 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/include/utils.h b/include/utils.h
index 6080b96..ffacb49 100644
--- a/include/utils.h
+++ b/include/utils.h
@@ -114,9 +114,13 @@ int addr64_n2a(__u64 addr, char *buff, size_t len);
 int af_bit_len(int af);
 int af_byte_len(int af);

-const char *format_host_r(int af, int len, const void *addr,
-  char *buf, int buflen);
-const char *format_host(int af, int lne, const void *addr);
+const char *format_host_rb(int af, int len, const void *addr,
+  char *buf, int buflen, bool *resolved);
+#define format_host_r(af, len, addr, buf, buflen) \
+   format_host_rb(af, len, addr, buf, buflen, NULL)
+const char *format_host_b(int af, int lne, const void *addr, bool
*resolved);
+#define format_host(af, lne, addr) \
+   format_host_b(af, lne, addr, NULL)
 #define format_host_rta(af, rta) \
format_host(af, RTA_PAYLOAD(rta), RTA_DATA(rta))
 const char *rt_addr_n2a_r(int af, int len, const void *addr,
diff --git a/lib/utils.c b/lib/utils.c
index 9aa3219..42c3bf5 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -898,8 +898,8 @@ static const char *resolve_address(const void *addr,
int len, int af)
 }
 #endif

-const char *format_host_r(int af, int len, const void *addr,
-   char *buf, int buflen)
+const char *format_host_rb(int af, int len, const void *addr,
+   char *buf, int buflen, bool *resolved)
 {
 #ifdef RESOLVE_HOSTNAMES
if (resolve_hosts) {
@@ -909,17 +909,20 @@ const char *format_host_r(int af, int len, const
void *addr,

if (len > 0 &&
(n = resolve_address(addr, len, af)) != NULL)
+   {
+   *resolved = true;
return n;
+   }
}
 #endif
return rt_addr_n2a_r(af, len, addr, buf, buflen);
 }

-const char *format_host(int af, int len, const void *addr)
+const char *format_host_b(int af, int len, const void *addr, bool
*resolved)
 {
static char buf[256];

-   return format_host_r(af, len, addr, buf, 256);
+   return format_host_rb(af, len, addr, buf, 256, resolved);
 }


diff --git a/misc/ss.c b/misc/ss.c
index 12763c9..d37bd1d 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1046,20 +1046,30 @@ do_numeric:

 static void inet_addr_print(const inet_prefix *a, int port, unsigned
int ifindex)
 {
-   char buf[1024];
-   const char *ap = buf;
+   char b1[1024], b2[1024];
+   const char *ap = b1;
int est_len = addr_width;
const char *ifname = NULL;
+   bool resolved = false;

if (a->family == AF_INET) {
if (a->data[0] == 0) {
-   buf[0] = '*';
-   buf[1] = 0;
+   b1[0] = '*';
+   b1[1] = 0;
} else {
ap = format_host(AF_INET, 4, a->data);
}
} else {
-   ap = format_host(a->family, 16, a->data);
+   if (a->family == AF_INET6) {
+   ap = format_host_b(a->family, 16, a->data, &resolved);
+   if (!resolved)
+   {
+   sprintf(b2, "[%s]", ap);
+   ap = b2;
+   }
+   } else {
+   ap = format_host(a->family, 16, a->data);
+   }
est_len = strlen(ap);
if (est_len <= addr_width)
est_len = addr_width;
-- 
2.9.4
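
The bracket logic in this patch can be sketched in isolation. The following is a minimal illustration, not the iproute2 code: demo_format_host() and demo_bracket_ipv6() are hypothetical stand-ins, and the "resolved" decision is passed in by the caller rather than coming from a real resolver. The sketch also guards against a NULL out-parameter, since the format_host() and format_host_r() compatibility macros in the patch pass NULL for it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for format_host_rb(): copies the text into buf
 * and reports whether it was "resolved" to a hostname. The out-parameter
 * may be NULL, as it is when called through the compatibility macros. */
static const char *demo_format_host(const char *text, bool is_hostname,
                                    char *buf, size_t buflen, bool *resolved)
{
    assert(buf != NULL && buflen > 0);
    snprintf(buf, buflen, "%s", text);
    if (resolved)               /* guard: callers may pass NULL */
        *resolved = is_hostname;
    return buf;
}

/* Wrap a numeric IPv6 address in brackets; leave hostnames untouched. */
static const char *demo_bracket_ipv6(const char *text, bool is_hostname,
                                     char *out, size_t outlen)
{
    char tmp[256];
    bool resolved = false;
    const char *ap = demo_format_host(text, is_hostname,
                                      tmp, sizeof(tmp), &resolved);

    if (resolved)
        snprintf(out, outlen, "%s", ap);
    else
        snprintf(out, outlen, "[%s]", ap);
    return out;
}
```

Calling demo_bracket_ipv6("::1", false, ...) yields the bracketed form, while a resolved name is passed through unchanged.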


Re: [PATCH v2] ravb: add wake-on-lan support via magic packet

2017-08-01 Thread Niklas Söderlund
Hi Sergei,

Thanks for your feedback!

On 2017-07-31 22:47:12 +0300, Sergei Shtylyov wrote:
> Hello!
> 
> On 07/30/2017 05:06 PM, Niklas Söderlund wrote:
> 
> > WoL is enabled in the suspend callback by setting MagicPacket detection
> > and disabling all interrupts except MagicPacket. In the resume path the
> > driver needs to reset the hardware to rearm the WoL logic; this prevents
> > the driver from simply restoring the registers and taking advantage of
> > the fact that ravb was not suspended to reduce resume time. To reset the
> > hardware the driver closes the device, sets it in reset mode and reopens
> > the device just like it would do in a normal suspend/resume scenario
> > without WoL enabled, but it both closes and opens the device in the
> > resume callback since the device needs to be reset for WoL to work.
> 
> > One quirk needed for WoL is that the module clock needs to be prevented
> > from being switched off by Runtime PM. To keep the clock alive the
> > suspend callback needs to call clk_enable() directly to increase the
> > usage count of the clock. Then when Runtime PM decreases the clock usage
> > count it won't reach 0 and be switched off.
> > 
> > Signed-off-by: Niklas Söderlund 
> [...]
> > diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
> > b/drivers/net/ethernet/renesas/ravb_main.c
> > index 5931e859876c2aee..3d399f85417a83cf 100644
> > --- a/drivers/net/ethernet/renesas/ravb_main.c
> > +++ b/drivers/net/ethernet/renesas/ravb_main.c
> [...]
> > @@ -2179,6 +2270,32 @@ static int __maybe_unused ravb_resume(struct device 
> > *dev)
> > struct ravb_private *priv = netdev_priv(ndev);
> > int ret = 0;
> > +   /* Reduce the usecount of the clock to zero and then
> > +* restore it to its original value. This is done to force
> > +* the clock to be re-enabled which is a workaround
> > +* for renesas-cpg-mssr driver which do not enable clocks
> > +* when resuming from PSCI suspend/resume.
> > +*
> > +* Without this workaround the driver fails to communicate
> > +* with the hardware if WoL was enabled when the system
> > +* entered PSCI suspend. This is due to that if WoL is enabled
> > +* we explicitly keep the clock from being turned off when
> > +* suspending, but in PSCI sleep power is cut so the clock
> > +* is disabled anyhow, the clock driver is not aware of this
> > +* so the clock is not turned back on when resuming.
> > +*
> > +* TODO: once the renesas-cpg-mssr suspend/resume is working
> > +*   this clock dance should be removed.
> > +*/
> > +   clk_disable(priv->clk);
> > +   clk_disable(priv->clk);
> > +   clk_enable(priv->clk);
> > +   clk_enable(priv->clk);
> 
>After a small chat with Niklas, it became clear that this dance should be
> behind the *if* that follows. I'd also like to see this workaround as a
> separate patch since the issue it addresses is R-Car gen3 specific (and thus
> can be reverted once the CPG/MSSR driver is fixed)...

Thanks for spotting my mistake! I will fix this and break out the 
workaround in a separate patch and send a v3.

> 
> > +
> > +   /* If WoL is enabled set reset mode to rearm the WoL logic */
> > +   if (priv->wol_enabled)
> > +   ravb_write(ndev, CCC_OPC_RESET, CCC);
> > +
> > /* All register have been reset to default values.
> >  * Restore all registers which where setup at probe time and
> >  * reopen device if it was running before system suspended.
> > @@ -2202,6 +2319,11 @@ static int __maybe_unused ravb_resume(struct device 
> > *dev)
> > ravb_write(ndev, priv->desc_bat_dma, DBAT);
> > if (netif_running(ndev)) {
> > +   if (priv->wol_enabled) {
> > +   ret = ravb_wol_restore(ndev);
> > +   if (ret)
> > +   return ret;
> > +   }
> > ret = ravb_open(ndev);
> 
>Hm, perhaps worth calling sh_eth_open() outside sh_eth_wol_restore() as 
> well?

Worth looking at, but this is outside the scope of this series as it 
touches the sh_eth driver :-)

> 
> MBR, Sergei

-- 
Regards,
Niklas Söderlund


Re: [PATCH net-next RFC 0/6] Configure cloud filters in i40e via tc/flower classifier

2017-08-01 Thread Jamal Hadi Salim

On 17-07-31 08:36 PM, Amritha Nambiar wrote:

This patch series enables configuring cloud filters in i40e
using the tc/flower classifier. The only tc-filter action
supported is to redirect packets to a traffic class on the
same device. The tc/mirred:redirect action is extended to
accept a traffic class to achieve this.

The cloud filters are added for a VSI and are cleaned up when
the VSI is deleted. The filters that match on L4 ports need
enhanced admin queue functions with big buffer support for
extended general fields in Add/Remove Cloud filters command.

Example:
# tc qdisc add dev eth0 ingress

# ethtool -K eth0 hw-tc-offload on

# tc filter add dev eth0 protocol ip parent : prio 1 flower\
   dst_ip 192.168.1.1/32 ip_proto udp dst_port 22\
   skip_sw indev eth0 action mirred ingress redirect dev eth0 tc 1



I think "queue 1" sounds better than "tc 1".
"tc" is  already a keyword in a few places (even within that declaration
above).

cheers,
jamal


[PATCH v3 2/2] ravb: add workaround for clock when resuming with WoL enabled

2017-08-01 Thread Niklas Söderlund
The renesas-cpg-mssr clock driver is not yet aware of PSCI sleep where
power is cut to the SoC. When resuming from this state with WoL enabled
the enable count of the ravb clock is 1 and the clock driver thinks the
clock is already on when PM core enables the clock and increments the
enable count to 2. This will result in the ravb driver failing to talk
to the hardware since the module clock is off. Work around this by
forcing the enable count to 0 and then back to 2 when resuming with WoL
enabled.

This workaround should be reverted once the renesas-cpg-mssr clock
driver becomes aware of this PSCI sleep behavior.

Signed-off-by: Niklas Söderlund 
---
 drivers/net/ethernet/renesas/ravb_main.c | 27 +--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index 6d10db1b51468031..fdf30bfa403bf416 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -2270,9 +2270,32 @@ static int __maybe_unused ravb_resume(struct device *dev)
struct ravb_private *priv = netdev_priv(ndev);
int ret = 0;
 
-   /* If WoL is enabled set reset mode to rearm the WoL logic */
-   if (priv->wol_enabled)
+   if (priv->wol_enabled) {
+   /* Reduce the usecount of the clock to zero and then
+* restore it to its original value. This is done to force
+* the clock to be re-enabled which is a workaround
+* for renesas-cpg-mssr driver which do not enable clocks
+* when resuming from PSCI suspend/resume.
+*
+* Without this workaround the driver fails to communicate
+* with the hardware if WoL was enabled when the system
+* entered PSCI suspend. This is due to that if WoL is enabled
+* we explicitly keep the clock from being turned off when
+* suspending, but in PSCI sleep power is cut so the clock
+* is disabled anyhow, the clock driver is not aware of this
+* so the clock is not turned back on when resuming.
+*
+* TODO: once the renesas-cpg-mssr suspend/resume is working
+*   this clock dance should be removed.
+*/
+   clk_disable(priv->clk);
+   clk_disable(priv->clk);
+   clk_enable(priv->clk);
+   clk_enable(priv->clk);
+
+   /* Set reset mode to rearm the WoL logic */
ravb_write(ndev, CCC_OPC_RESET, CCC);
+   }
 
/* All register have been reset to default values.
 * Restore all registers which where setup at probe time and
-- 
2.13.3



[PATCH v3 1/2] ravb: add wake-on-lan support via magic packet

2017-08-01 Thread Niklas Söderlund
WoL is enabled in the suspend callback by setting MagicPacket detection
and disabling all interrupts except MagicPacket. In the resume path the
driver needs to reset the hardware to rearm the WoL logic; this prevents
the driver from simply restoring the registers and taking advantage of
the fact that ravb was not suspended to reduce resume time. To reset the
hardware the driver closes the device, sets it in reset mode and reopens
the device just like it would do in a normal suspend/resume scenario
without WoL enabled, but it both closes and opens the device in the
resume callback since the device needs to be reset for WoL to work.

One quirk needed for WoL is that the module clock needs to be prevented
from being switched off by Runtime PM. To keep the clock alive the
suspend callback needs to call clk_enable() directly to increase the
usage count of the clock. Then when Runtime PM decreases the clock usage
count it won't reach 0 and be switched off.

Signed-off-by: Niklas Söderlund 
---
 drivers/net/ethernet/renesas/ravb.h  |   2 +
 drivers/net/ethernet/renesas/ravb_main.c | 108 +--
 2 files changed, 106 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/renesas/ravb.h 
b/drivers/net/ethernet/renesas/ravb.h
index 0525bd696d5d02e5..96a27b00c90e212a 100644
--- a/drivers/net/ethernet/renesas/ravb.h
+++ b/drivers/net/ethernet/renesas/ravb.h
@@ -991,6 +991,7 @@ struct ravb_private {
struct net_device *ndev;
struct platform_device *pdev;
void __iomem *addr;
+   struct clk *clk;
struct mdiobb_ctrl mdiobb;
u32 num_rx_ring[NUM_RX_QUEUE];
u32 num_tx_ring[NUM_TX_QUEUE];
@@ -1033,6 +1034,7 @@ struct ravb_private {
 
unsigned no_avb_link:1;
unsigned avb_link_active_low:1;
+   unsigned wol_enabled:1;
 };
 
 static inline u32 ravb_read(struct net_device *ndev, enum ravb_reg reg)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c 
b/drivers/net/ethernet/renesas/ravb_main.c
index 5931e859876c2aee..6d10db1b51468031 100644
--- a/drivers/net/ethernet/renesas/ravb_main.c
+++ b/drivers/net/ethernet/renesas/ravb_main.c
@@ -680,6 +680,9 @@ static void ravb_emac_interrupt_unlocked(struct net_device 
*ndev)
 
ecsr = ravb_read(ndev, ECSR);
ravb_write(ndev, ecsr, ECSR);   /* clear interrupt */
+
+   if (ecsr & ECSR_MPD)
+   pm_wakeup_event(&priv->pdev->dev, 0);
if (ecsr & ECSR_ICD)
ndev->stats.tx_carrier_errors++;
if (ecsr & ECSR_LCHNG) {
@@ -1330,6 +1333,33 @@ static int ravb_get_ts_info(struct net_device *ndev,
return 0;
 }
 
+static void ravb_get_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
+{
+   struct ravb_private *priv = netdev_priv(ndev);
+
+   wol->supported = 0;
+   wol->wolopts = 0;
+
+   if (priv->clk) {
+   wol->supported = WAKE_MAGIC;
+   wol->wolopts = priv->wol_enabled ? WAKE_MAGIC : 0;
+   }
+}
+
+static int ravb_set_wol(struct net_device *ndev, struct ethtool_wolinfo *wol)
+{
+   struct ravb_private *priv = netdev_priv(ndev);
+
+   if (!priv->clk || wol->wolopts & ~WAKE_MAGIC)
+   return -EOPNOTSUPP;
+
+   priv->wol_enabled = !!(wol->wolopts & WAKE_MAGIC);
+
+   device_set_wakeup_enable(&priv->pdev->dev, priv->wol_enabled);
+
+   return 0;
+}
+
 static const struct ethtool_ops ravb_ethtool_ops = {
.nway_reset = ravb_nway_reset,
.get_msglevel   = ravb_get_msglevel,
@@ -1343,6 +1373,8 @@ static const struct ethtool_ops ravb_ethtool_ops = {
.get_ts_info= ravb_get_ts_info,
.get_link_ksettings = ravb_get_link_ksettings,
.set_link_ksettings = ravb_set_link_ksettings,
+   .get_wol= ravb_get_wol,
+   .set_wol= ravb_set_wol,
 };
 
 static inline int ravb_hook_irq(unsigned int irq, irq_handler_t handler,
@@ -2041,6 +2073,11 @@ static int ravb_probe(struct platform_device *pdev)
 
priv->chip_id = chip_id;
 
+   /* Get clock, if not found that's OK but Wake-On-Lan is unavailable */
+   priv->clk = devm_clk_get(&pdev->dev, NULL);
+   if (IS_ERR(priv->clk))
+   priv->clk = NULL;
+
/* Set function */
ndev->netdev_ops = &ravb_netdev_ops;
ndev->ethtool_ops = &ravb_ethtool_ops;
@@ -2107,6 +2144,9 @@ static int ravb_probe(struct platform_device *pdev)
if (error)
goto out_napi_del;
 
+   if (priv->clk)
+   device_set_wakeup_capable(&pdev->dev, 1);
+
/* Print device information */
netdev_info(ndev, "Base address at %#x, %pM, IRQ %d.\n",
(u32)ndev->base_addr, ndev->dev_addr, ndev->irq);
@@ -2160,15 +2200,66 @@ static int ravb_remove(struct platform_device *pdev)
return 0;
 }
 
+static int ravb_wol_setup(struct net_device *ndev)
+{
+   struct ravb_private *priv = netdev_priv(ndev);
+
+   /* Disable interrup

[PATCH v3 0/2] ravb: add wake-on-lan support via magic packet

2017-08-01 Thread Niklas Söderlund
WoL is enabled in the suspend callback by setting MagicPacket detection
and disabling all interrupts except MagicPacket. In the resume path the
driver needs to reset the hardware to rearm the WoL logic; this prevents
the driver from simply restoring the registers and taking advantage of
the fact that ravb was not suspended to reduce resume time. To reset the
hardware the driver closes the device, sets it in reset mode and reopens
the device just like it would do in a normal suspend/resume scenario
without WoL enabled, but it both closes and opens the device in the
resume callback since the device needs to be reset for WoL to work.

One quirk needed for WoL is that the module clock needs to be prevented
from being switched off by Runtime PM. To keep the clock alive the
suspend callback needs to call clk_enable() directly to increase the
usage count of the clock. Then when Runtime PM decreases the clock usage
count it won't reach 0 and be switched off.

Changes since v2
- Only do the clock dance to work around PSCI sleep when resuming if WoL
  is enabled. This was a bug in v2 which resulted in a WARN if resuming 
  from PSCI sleep with WoL disabled, thanks Sergei for pointing this 
  out!
- Break out clock dance workaround in separate patch to make it easier 
  to revert once a fix is upstream for the clock driver as suggested by 
  Sergei.

Changes since v1
- Fix issue where the device would fail to resume from PSCI suspend if WoL
  was enabled, reported by Geert. The fault was that the clock driver
  thinks the clock is on, but PSCI has disabled it. Added a workaround
  for this in the ravb driver which can be removed once the clock driver is
  aware of the PSCI behavior.
- Only try to restore from WoL wake-up if netif is running. Since this
  is a condition for enabling WoL in the first place, this was a bug in v1.

Niklas Söderlund (2):
  ravb: add wake-on-lan support via magic packet
  ravb: add workaround for clock when resuming with WoL enabled

 drivers/net/ethernet/renesas/ravb.h  |   2 +
 drivers/net/ethernet/renesas/ravb_main.c | 131 ++-
 2 files changed, 129 insertions(+), 4 deletions(-)

-- 
2.13.3



Re: [PATCH 1/6] [net-next]net: sched: act_mirred: Extend redirect action to accept a traffic class

2017-08-01 Thread Jamal Hadi Salim

On 17-07-31 08:37 PM, Amritha Nambiar wrote:

The Mirred/redirect action is extended to forward to a traffic
class on the device. The traffic class index needs to be
provided in addition to the device's ifindex.

Example:
# tc filter add dev eth0 protocol ip parent : prio 1 flower\
   dst_ip 192.168.1.1/32 ip_proto udp dst_port 22\
   skip_sw indev eth0 action mirred ingress redirect dev eth0 tc 1

Signed-off-by: Amritha Nambiar 
---
  include/net/tc_act/tc_mirred.h|7 +++
  include/uapi/linux/tc_act/tc_mirred.h |5 +
  net/sched/act_mirred.c|   17 +
  3 files changed, 29 insertions(+)

diff --git a/include/net/tc_act/tc_mirred.h b/include/net/tc_act/tc_mirred.h
index 604bc31..60058c4 100644
--- a/include/net/tc_act/tc_mirred.h
+++ b/include/net/tc_act/tc_mirred.h
@@ -9,6 +9,8 @@ struct tcf_mirred {
int tcfm_eaction;
int tcfm_ifindex;
booltcfm_mac_header_xmit;
+   u8  tcfm_tc;
+   u32 flags;
struct net_device __rcu *tcfm_dev;
struct list_headtcfm_list;
  };
@@ -37,4 +39,9 @@ static inline int tcf_mirred_ifindex(const struct tc_action 
*a)
return to_mirred(a)->tcfm_ifindex;
  }
  
+static inline int tcf_mirred_tc(const struct tc_action *a)

+{
+   return to_mirred(a)->tcfm_tc;
+}
+
  #endif /* __NET_TC_MIR_H */
diff --git a/include/uapi/linux/tc_act/tc_mirred.h 
b/include/uapi/linux/tc_act/tc_mirred.h
index 3d7a2b3..8ff4d76 100644
--- a/include/uapi/linux/tc_act/tc_mirred.h
+++ b/include/uapi/linux/tc_act/tc_mirred.h
@@ -9,6 +9,10 @@
  #define TCA_EGRESS_MIRROR 2 /* mirror packet to EGRESS */
  #define TCA_INGRESS_REDIR 3  /* packet redirect to INGRESS*/
  #define TCA_INGRESS_MIRROR 4 /* mirror packet to INGRESS */
+
+#define MIRRED_F_TC_MAP0x1
+#define MIRRED_TC_MAP_MAX  0x10


Assuming this is the max number of queues?
Where does this upper bound come from? Is it a spec
or an Intel thing? If spec, mentioning which
spec and section would be useful.

cheers,
jamal


[PATCH net 0/2] lan78xx: Fixes to lan78xx driver

2017-08-01 Thread Nisar.Sayed
From: Nisar Sayed 

This patch series is for the lan78xx driver.

These patches fix potential issues in the lan78xx driver.

Nisar Sayed (2):
  USB fast connect/disconnect crash fix
  Fix to handle hard_header_len update

 drivers/net/usb/lan78xx.c | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

-- 
1.9.1


[PATCH net 1/2] lan78xx: USB fast connect/disconnect crash fix

2017-08-01 Thread Nisar.Sayed
From: Nisar Sayed 

USB fast connect/disconnect crash fix

When USB is plugged and unplugged at a fast rate, the failure of
lan78xx_mdio_init() in lan78xx_bind() is not handled.
Whenever lan78xx_mdio_init() fails, dev->mdiobus is freed; however,
lan78xx_bind() does not treat this as an error, and lan78xx_probe()
proceeds with further initialization, which leads to a system hang/crash.
Also, when register_netdev() fails, netdev is freed without calling
lan78xx_unbind().
Handling these failure cases properly fixes the system crash/hang issue.

Signed-off-by: Nisar Sayed 
---
 drivers/net/usb/lan78xx.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 5833f7e..8ef3639 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2858,13 +2858,13 @@ static int lan78xx_bind(struct lan78xx_net *dev, struct 
usb_interface *intf)
/* Init all registers */
ret = lan78xx_reset(dev);
 
-   lan78xx_mdio_init(dev);
+   ret = lan78xx_mdio_init(dev);
 
dev->net->flags |= IFF_MULTICAST;
 
pdata->wol = WAKE_MAGIC;
 
-   return 0;
+   return ret;
 }
 
 static void lan78xx_unbind(struct lan78xx_net *dev, struct usb_interface *intf)
@@ -3525,11 +3525,11 @@ static int lan78xx_probe(struct usb_interface *intf,
udev = interface_to_usbdev(intf);
udev = usb_get_dev(udev);
 
-   ret = -ENOMEM;
netdev = alloc_etherdev(sizeof(struct lan78xx_net));
if (!netdev) {
-   dev_err(&intf->dev, "Error: OOM\n");
-   goto out1;
+   dev_err(&intf->dev, "Error: OOM\n");
+   ret = -ENOMEM;
+   goto out1;
}
 
/* netdev_printk() needs this */
@@ -3610,7 +3610,7 @@ static int lan78xx_probe(struct usb_interface *intf,
ret = register_netdev(netdev);
if (ret != 0) {
netif_err(dev, probe, netdev, "couldn't register the device\n");
-   goto out2;
+   goto out3;
}
 
usb_set_intfdata(intf, dev);
-- 
1.9.1


[PATCH net 2/2] lan78xx: Fix to handle hard_header_len update

2017-08-01 Thread Nisar.Sayed
From: Nisar Sayed 

Fix to handle hard_header_len update

When an ifconfig up/down sequence is initiated, hard_header_len
is incremented again on each ifconfig up/down sequence, which
leads to an invalid hard_header_len. Moving the update to lan78xx_bind,
so that hard_header_len is updated only once, addresses the issue.

Signed-off-by: Nisar Sayed 
---
 drivers/net/usb/lan78xx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/usb/lan78xx.c b/drivers/net/usb/lan78xx.c
index 8ef3639..b99a7fb 100644
--- a/drivers/net/usb/lan78xx.c
+++ b/drivers/net/usb/lan78xx.c
@@ -2367,9 +2367,6 @@ static int lan78xx_reset(struct lan78xx_net *dev)
/* Init LTM */
lan78xx_init_ltm(dev);
 
-   dev->net->hard_header_len += TX_OVERHEAD;
-   dev->hard_mtu = dev->net->mtu + dev->net->hard_header_len;
-
if (dev->udev->speed == USB_SPEED_SUPER) {
buf = DEFAULT_BURST_CAP_SIZE / SS_USB_PKT_SIZE;
dev->rx_urb_size = DEFAULT_BURST_CAP_SIZE;
@@ -2855,6 +2852,9 @@ static int lan78xx_bind(struct lan78xx_net *dev, struct 
usb_interface *intf)
return ret;
}
 
+   dev->net->hard_header_len += TX_OVERHEAD;
+   dev->hard_mtu = dev->net->mtu + dev->net->hard_header_len;
+
/* Init all registers */
ret = lan78xx_reset(dev);
 
-- 
1.9.1


Re: [PATCH 1/6] [net-next]net: sched: act_mirred: Extend redirect action to accept a traffic class

2017-08-01 Thread Jamal Hadi Salim

On 17-07-31 08:37 PM, Amritha Nambiar wrote:

The Mirred/redirect action is extended to forward to a traffic
class on the device. The traffic class index needs to be
provided in addition to the device's ifindex.

Example:
# tc filter add dev eth0 protocol ip parent : prio 1 flower\
   dst_ip 192.168.1.1/32 ip_proto udp dst_port 22\
   skip_sw indev eth0 action mirred ingress redirect dev eth0 tc 1

Signed-off-by: Amritha Nambiar 
---
  include/net/tc_act/tc_mirred.h|7 +++
  include/uapi/linux/tc_act/tc_mirred.h |5 +
  net/sched/act_mirred.c|   17 +
  3 files changed, 29 insertions(+)

diff --git a/include/net/tc_act/tc_mirred.h b/include/net/tc_act/tc_mirred.h
index 604bc31..60058c4 100644
--- a/include/net/tc_act/tc_mirred.h
+++ b/include/net/tc_act/tc_mirred.h
@@ -9,6 +9,8 @@ struct tcf_mirred {
int tcfm_eaction;
int tcfm_ifindex;
booltcfm_mac_header_xmit;
+   u8  tcfm_tc;
+   u32 flags;
struct net_device __rcu *tcfm_dev;
struct list_headtcfm_list;
  };
@@ -37,4 +39,9 @@ static inline int tcf_mirred_ifindex(const struct tc_action 
*a)
return to_mirred(a)->tcfm_ifindex;
  }
  
+static inline int tcf_mirred_tc(const struct tc_action *a)

+{
+   return to_mirred(a)->tcfm_tc;
+}
+
  #endif /* __NET_TC_MIR_H */
diff --git a/include/uapi/linux/tc_act/tc_mirred.h 
b/include/uapi/linux/tc_act/tc_mirred.h
index 3d7a2b3..8ff4d76 100644
--- a/include/uapi/linux/tc_act/tc_mirred.h
+++ b/include/uapi/linux/tc_act/tc_mirred.h
@@ -9,6 +9,10 @@
  #define TCA_EGRESS_MIRROR 2 /* mirror packet to EGRESS */
  #define TCA_INGRESS_REDIR 3  /* packet redirect to INGRESS*/
  #define TCA_INGRESS_MIRROR 4 /* mirror packet to INGRESS */
+
+#define MIRRED_F_TC_MAP0x1
+#define MIRRED_TC_MAP_MAX  0x10
+#define MIRRED_TC_MAP_MASK 0xF
  
  struct tc_mirred {

tc_gen;
@@ -21,6 +25,7 @@ enum {
TCA_MIRRED_TM,
TCA_MIRRED_PARMS,
TCA_MIRRED_PAD,
+   TCA_MIRRED_TC_MAP,
__TCA_MIRRED_MAX
  };
  #define TCA_MIRRED_MAX (__TCA_MIRRED_MAX - 1)
diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
index 1b5549a..f9801de 100644
--- a/net/sched/act_mirred.c
+++ b/net/sched/act_mirred.c
@@ -67,6 +67,7 @@ static void tcf_mirred_release(struct tc_action *a, int bind)
  
  static const struct nla_policy mirred_policy[TCA_MIRRED_MAX + 1] = {

[TCA_MIRRED_PARMS]  = { .len = sizeof(struct tc_mirred) },
+   [TCA_MIRRED_TC_MAP] = { .type = NLA_U8 },
  };
  
  static unsigned int mirred_net_id;

@@ -83,6 +84,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr 
*nla,
struct tcf_mirred *m;
struct net_device *dev;
bool exists = false;
+   u8 *tc_map = NULL;
+   u32 flags = 0;
int ret;
  
  	if (nla == NULL)

@@ -92,6 +95,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr 
*nla,
return ret;
if (tb[TCA_MIRRED_PARMS] == NULL)
return -EINVAL;
+
+   if (tb[TCA_MIRRED_TC_MAP]) {
+   tc_map = nla_data(tb[TCA_MIRRED_TC_MAP]);
+   if (*tc_map >= MIRRED_TC_MAP_MAX)
+   return -EINVAL;
+   flags |= MIRRED_F_TC_MAP;




+   }
+
parm = nla_data(tb[TCA_MIRRED_PARMS]);
  
  	exists = tcf_hash_check(tn, parm->index, a, bind);

@@ -139,6 +150,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr 
*nla,
ASSERT_RTNL();
m->tcf_action = parm->action;
m->tcfm_eaction = parm->eaction;
+   m->flags = flags;
if (dev != NULL) {
m->tcfm_ifindex = parm->ifindex;
if (ret != ACT_P_CREATED)
@@ -146,6 +158,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr 
*nla,
dev_hold(dev);
rcu_assign_pointer(m->tcfm_dev, dev);
m->tcfm_mac_header_xmit = mac_header_xmit;
+   if (flags & MIRRED_F_TC_MAP)
+   m->tcfm_tc = *tc_map & MIRRED_TC_MAP_MASK;
}
  

Is the mask a hardware limit? I don't know how these queues are
allocated - I am assuming each of these "tc queues" maps to an rx
DMA ring?


if (ret == ACT_P_CREATED) {
@@ -259,6 +273,9 @@ static int tcf_mirred_dump(struct sk_buff *skb, struct 
tc_action *a, int bind,
  
  	if (nla_put(skb, TCA_MIRRED_PARMS, sizeof(opt), &opt))

goto nla_put_failure;
+   if ((m->flags & MIRRED_F_TC_MAP) &&
+   nla_put_u8(skb, TCA_MIRRED_TC_MAP, m->tcfm_tc))
+   goto nla_put_failure;



If you have m->tcfm_tc then I don't think you need the flags; so I would
remove it from the struct altogether.

cheers,
jamal


Re: [PATCH net] mcs7780: Silence uninitialized variable warning

2017-08-01 Thread Dan Carpenter
On Mon, Jul 31, 2017 at 10:37:16AM -0700, David Miller wrote:
> From: Dan Carpenter 
> Date: Mon, 31 Jul 2017 10:41:40 +0300
> 
> > On Sat, Jul 29, 2017 at 11:28:55PM -0700, David Miller wrote:
> >> From: Dan Carpenter 
> >> Date: Fri, 28 Jul 2017 17:45:11 +0300
> >> 
> >> > -__u16 rval;
> >> > +__u16 rval = -1;
> >> 
> >> Fixing a bogus warning by assigning a signed constant to an
> >> unsigned variable doesn't really make me all that happy.
> >> 
> >> I don't think I'll apply this, sorry.
> > 
> > There's no guarantee that small kmallocs will always succeed in future
> > kernels so it's not *totally* bogus.
> 
> Perhaps the burden of initializing the value belongs in
> mcs_get_reg(), and you can set it properly to 0x
> instead of -1.
> 
> Ok?

Sure.  I will resend.
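
A rough sketch of the approach agreed on above: the register-read helper gives its output a defined value on every path, so callers need no initializer of their own. fake_get_reg(), REG_FALLBACK and the "fail" parameter are hypothetical stand-ins, not the mcs7780 driver code, and the fallback value is an arbitrary placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REG_FALLBACK 0u   /* placeholder; the real value is the maintainer's call */

/*
 * Hypothetical stand-in for a register read that can fail.
 * `fail` simulates an allocation or transfer error.
 */
static int fake_get_reg(int fail, uint16_t *val)
{
	assert(val != NULL);
	*val = REG_FALLBACK;      /* defined on every path, including errors */
	if (fail)
		return -1;
	*val = 0x1234;            /* pretend this came from the device */
	return 0;
}
```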

thanks,
dan carpenter


Re: [PATCH 3/6] [net-next]net: i40e: Extend set switch config command to accept cloud filter mode

2017-08-01 Thread Jamal Hadi Salim

On 17-07-31 08:37 PM, Amritha Nambiar wrote:

Add definitions for L4 filters and switch modes based on cloud filters
modes and extend the set switch config command to include the
additional cloud filter mode.



"Cloud filters"? Seriously? Is "enterprise" the next one? ;->

cheers,
jamal


[PATCH 0/1 v3 nf-next] constify nf_conntrack_l3/4proto parameters

2017-08-01 Thread Julia Lawall
When a nf_conntrack_l3/4proto parameter is not on the left hand side
of an assignment, its address is not taken, and it is not passed to a
function that may modify its fields, then it can be declared as const.

This change is useful from a documentation point of view, and can
possibly facilitate making some nf_conntrack_l4proto structures const
subsequently.
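
As a small illustration of the criteria above (a sketch with a made-up struct, not the real nf_conntrack types): a function that only reads through its pointer parameter can take it as const, while one that assigns through it cannot.

```c
#include <assert.h>
#include <string.h>

/* Made-up stand-in for a protocol descriptor. */
struct proto_sketch {
	const char *name;
	unsigned int users;
};

/* Only reads fields and never passes p on to a modifier: const is fine. */
static size_t proto_name_len(const struct proto_sketch *p)
{
	assert(p && p->name);
	return strlen(p->name);
}

/* A field appears on the left-hand side of an update: p must stay non-const. */
static void proto_get(struct proto_sketch *p)
{
	p->users++;
}
```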

Done with the help of Coccinelle.  The following semantic patch shows
the nf_conntrack_l3proto case.

// 
virtual update_results
virtual after_start

@initialize:ocaml@
@@

let unsafe = Hashtbl.create 101

let is_unsafe f = Hashtbl.mem unsafe f

let changed = ref false


(* The next three rules relate to the fact that we do not know the type of
void * variables.  Fortunately this is only needed on the first iteration,
but it still means that the whole kernel will end up being considered. *)

@has depends on !after_start && !update_results@
identifier f,x;
position p;
@@

f@p(...,struct nf_conntrack_l3proto *x,...) { ... }

@hasa depends on !after_start@
identifier f,x;
position p;
@@

f@p(...,struct nf_conntrack_l3proto *x[],...) { ... }

@others depends on !after_start && !update_results@
position p != {has.p,hasa.p};
identifier f,x;
@@

f@p(...,void *x,...) { ... }

@script:ocaml@
f << others.f;
@@

changed := true;
Hashtbl.add unsafe f ()


@fpb depends on !update_results disable optional_qualifier, drop_cast exists@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x,fld;
identifier bad : script:ocaml() { is_unsafe(bad) };
assignment operator aop;
expression e;
local idexpression fp;
type T;
@@

f(...,struct nf_conntrack_l3proto *x,...)
{
...
(
  return x;
|
  (<+...x...+>) aop e
|
  e aop x
|
  (T)x
|
  &(<+...x...+>)
|
  bad(...,x,...)
|
  fp(...,x,...)
|
  (<+...e->fld...+>)(...,x,...)
)
...when any
 }

@script:ocaml@
f << fpb.f;
@@

changed := true;
Printf.eprintf "%s is unsafe\n" f;
Hashtbl.add unsafe f ()

@fpba depends on !update_results disable optional_qualifier, drop_cast exists@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x,fld;
identifier bad : script:ocaml() { is_unsafe(bad) };
assignment operator aop;
expression e;
local idexpression fp;
type T;
@@

f(...,struct nf_conntrack_l3proto *x[],...)
{
...
(
  return \(x\|x[...]\);
|
  (<+...x...+>) aop e
|
  e aop \(x\|x[...]\)
|
  (T)\(x\|x[...]\)
|
  &(<+...x...+>)
|
  bad(...,\(x\|x[...]\),...)
|
  fp(...,\(x\|x[...]\),...)
|
  (<+...e->fld...+>)(...,\(x\|x[...]\),...)
)
... when any
 }

@script:ocaml@
f << fpba.f;
@@

changed := true;
Printf.eprintf "%s is unsafe\n" f;
Hashtbl.add unsafe f ()

@finalize:ocaml depends on !update_results@
tbls << merge.unsafe;
c << merge.changed;
@@

List.iter
(fun t ->
  Hashtbl.iter
(fun k v ->
  if not (Hashtbl.mem unsafe k) then Hashtbl.add unsafe k ()) t)
tbls;
changed := false;
let changed = List.exists (fun x -> !x) c in
let it = new iteration() in
it#add_virtual_rule After_start;
(if not changed
then it#add_virtual_rule Update_results);
it#register()

@depends on update_results disable optional_qualifier@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x;
@@

f(...,
+ const
  struct nf_conntrack_l3proto *x,...) { ... }

@depends on update_results disable optional_qualifier@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x;
type T;
@@

T f(...,
+ const
  struct nf_conntrack_l3proto *x,...);

@depends on update_results disable optional_qualifier@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x;
@@

f(...,
  struct nf_conntrack_l3proto *
+ const
  x[],...) { ... }

@depends on update_results disable optional_qualifier@
identifier f : script:ocaml() { not(is_unsafe(f)) };
identifier x;
type T;
@@

T f(...,
  struct nf_conntrack_l3proto *
+ const
  x[],...);
// 

---

v3:

Rebased against nf-next.  Some functions, such as
nf_ct_l3proto_pernet_register, are no longer defined, so they are no longer
updated.

 include/net/netfilter/nf_conntrack_l4proto.h |   14 +++---
 include/net/netfilter/nf_conntrack_timeout.h |2 +-
 net/netfilter/nf_conntrack_core.c|8 
 net/netfilter/nf_conntrack_netlink.c |6 +++---
 net/netfilter/nf_conntrack_proto.c   |   24 
 net/netfilter/nfnetlink_cttimeout.c  |5 +++--
 6 files changed, 30 insertions(+), 29 deletions(-)


[PATCH 1/1 v3 nf-next] netfilter: constify nf_conntrack_l3/4proto parameters

2017-08-01 Thread Julia Lawall
When a nf_conntrack_l3/4proto parameter is not on the left hand side
of an assignment, its address is not taken, and it is not passed to a
function that may modify its fields, then it can be declared as const.

This change is useful from a documentation point of view, and can
possibly facilitate making some nf_conntrack_l3/4proto structures const
subsequently.

Done with the help of Coccinelle.

Signed-off-by: Julia Lawall 

---

v3:

Rebased against nf-next.  Some functions, such as
nf_ct_l3proto_pernet_register, are no longer defined, so they are no longer
updated.

 include/net/netfilter/nf_conntrack_l4proto.h |   14 +++---
 include/net/netfilter/nf_conntrack_timeout.h |2 +-
 net/netfilter/nf_conntrack_core.c|8 
 net/netfilter/nf_conntrack_netlink.c |6 +++---
 net/netfilter/nf_conntrack_proto.c   |   24 
 net/netfilter/nfnetlink_cttimeout.c  |5 +++--
 6 files changed, 30 insertions(+), 29 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack_l4proto.h 
b/include/net/netfilter/nf_conntrack_l4proto.h
index 7032e04..b6e27ca 100644
--- a/include/net/netfilter/nf_conntrack_l4proto.h
+++ b/include/net/netfilter/nf_conntrack_l4proto.h
@@ -125,23 +125,23 @@ struct nf_conntrack_l4proto 
*__nf_ct_l4proto_find(u_int16_t l3proto,
 
 struct nf_conntrack_l4proto *nf_ct_l4proto_find_get(u_int16_t l3proto,
u_int8_t l4proto);
-void nf_ct_l4proto_put(struct nf_conntrack_l4proto *p);
+void nf_ct_l4proto_put(const struct nf_conntrack_l4proto *p);
 
 /* Protocol pernet registration. */
 int nf_ct_l4proto_pernet_register_one(struct net *net,
- struct nf_conntrack_l4proto *proto);
+   const struct nf_conntrack_l4proto *proto);
 void nf_ct_l4proto_pernet_unregister_one(struct net *net,
-struct nf_conntrack_l4proto *proto);
+   const struct nf_conntrack_l4proto *proto);
 int nf_ct_l4proto_pernet_register(struct net *net,
- struct nf_conntrack_l4proto *proto[],
+ struct nf_conntrack_l4proto *const proto[],
  unsigned int num_proto);
 void nf_ct_l4proto_pernet_unregister(struct net *net,
-struct nf_conntrack_l4proto *proto[],
-unsigned int num_proto);
+   struct nf_conntrack_l4proto *const proto[],
+   unsigned int num_proto);
 
 /* Protocol global registration. */
 int nf_ct_l4proto_register_one(struct nf_conntrack_l4proto *proto);
-void nf_ct_l4proto_unregister_one(struct nf_conntrack_l4proto *proto);
+void nf_ct_l4proto_unregister_one(const struct nf_conntrack_l4proto *proto);
 int nf_ct_l4proto_register(struct nf_conntrack_l4proto *proto[],
   unsigned int num_proto);
 void nf_ct_l4proto_unregister(struct nf_conntrack_l4proto *proto[],
diff --git a/include/net/netfilter/nf_conntrack_timeout.h 
b/include/net/netfilter/nf_conntrack_timeout.h
index d40b893..b222957 100644
--- a/include/net/netfilter/nf_conntrack_timeout.h
+++ b/include/net/netfilter/nf_conntrack_timeout.h
@@ -68,7 +68,7 @@ struct nf_conn_timeout *nf_ct_timeout_ext_add(struct nf_conn 
*ct,
 
 static inline unsigned int *
 nf_ct_timeout_lookup(struct net *net, struct nf_conn *ct,
-struct nf_conntrack_l4proto *l4proto)
+const struct nf_conntrack_l4proto *l4proto)
 {
 #ifdef CONFIG_NF_CONNTRACK_TIMEOUT
struct nf_conn_timeout *timeout_ext;
diff --git a/net/netfilter/nf_conntrack_core.c 
b/net/netfilter/nf_conntrack_core.c
index 2bc4991..f2f00ea 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -1176,8 +1176,8 @@ void nf_conntrack_free(struct nf_conn *ct)
 static noinline struct nf_conntrack_tuple_hash *
 init_conntrack(struct net *net, struct nf_conn *tmpl,
   const struct nf_conntrack_tuple *tuple,
-  struct nf_conntrack_l3proto *l3proto,
-  struct nf_conntrack_l4proto *l4proto,
+  const struct nf_conntrack_l3proto *l3proto,
+  const struct nf_conntrack_l4proto *l4proto,
   struct sk_buff *skb,
   unsigned int dataoff, u32 hash)
 {
@@ -1288,8 +1288,8 @@ void nf_conntrack_free(struct nf_conn *ct)
  unsigned int dataoff,
  u_int16_t l3num,
  u_int8_t protonum,
- struct nf_conntrack_l3proto *l3proto,
- struct nf_conntrack_l4proto *l4proto)
+ const struct nf_conntrack_l3proto *l3proto,
+ const struct nf_conntrack_l4proto *l4proto)
 {
const struct nf_conntrack_zone *zone;
struct nf_conntrack_tuple tuple;
diff --git a/net/netfilter/nf_conntrack_netlink.c 

Re: [PATCH 6/6] [net-next]net: i40e: Enable cloud filters in i40e via tc/flower classifier

2017-08-01 Thread Jamal Hadi Salim

On 17-07-31 08:38 PM, Amritha Nambiar wrote:

This patch enables tc-flower based hardware offloads. tc/flower
filter provided by the kernel is configured as driver specific
cloud filter. The patch implements functions and admin queue
commands needed to support cloud filters in the driver and
adds cloud filters to configure these tc-flower filters.

The only action supported is to redirect packets to a traffic class
on the same device.

# tc qdisc add dev eth0 ingress
# ethtool -K eth0 hw-tc-offload on

# tc filter add dev eth0 protocol ip parent :\
   prio 1 flower dst_mac 3c:fd:fe:a0:d6:70 skip_sw indev eth0\
   action mirred ingress redirect dev eth0 tc 0



Out of curiosity - did you need to say "indev eth0" there?
Also: is it possible to add an skbmark? For example, something like
these, which direct two flows to the same queue but with different
skb marks:

# tc filter add dev eth0 protocol ip parent : \
  prio 2 flower dst_ip 192.168.3.5/32 \
  ip_proto udp dst_port 2a skip_sw \
  action skbedit mark 11 \
  action mirred ingress redirect dev eth0 tcqueue 1

# tc filter add dev eth0 protocol ip parent : \
prio 1 flower dst_mac 3c:fd:fe:a0:d6:70 skip_sw \
action skbedit mark 12 \
action mirred ingress redirect dev eth0 tcqueue 1

cheers,
jamal


Re: [PATCH net-next v12 0/4] net sched actions: improve dump performance

2017-08-01 Thread Jamal Hadi Salim

On 17-07-31 11:54 PM, Stephen Hemminger wrote:

On Mon, 31 Jul 2017 08:06:42 -0400
Jamal Hadi Salim  wrote:



[..]

Please cleanup and resubmit for net-next.



Will do.


The header files have been updated in iproute2 net-next branch.



When does net-next show up? I noticed some changes - for example, Jiri's
multi-table changes - are not in the tree (I believe they were submitted
as part of net-next).


It is not clear to me that the new code is backward compatible.
Will new versions of tc work on old kernels and vice versa?



AFAIK, and as far as I have tested, it is.



Also, no #ifdef's


Those will go away. The intention was to test things which will be
rejected (in case some other app in the future uses this feature).

cheers,
jamal




RE: [PATCH] ss: Enclose IPv6 address in brackets

2017-08-01 Thread David Laight
From: Florian Lehner
> Sent: 29 July 2017 13:29
> This patch adds support for RFC2732 IPv6 address format with brackets
> for the tool ss. So output for ss changes from
> 2a00:1450:400a:804::200e:443 to [2a00:1450:400a:804::200e]:443 for IPv6
> addresses with attached port number.
> 
> Signed-off-by: Lehner Florian 
> ---
>  misc/ss.c | 6 +-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/misc/ss.c b/misc/ss.c
> index 12763c9..db39c93 100644
> --- a/misc/ss.c
> +++ b/misc/ss.c
> @@ -1059,7 +1059,11 @@ static void inet_addr_print(const inet_prefix *a,
> int port, unsigned int ifindex
>   ap = format_host(AF_INET, 4, a->data);
>   }
>   } else {
> - ap = format_host(a->family, 16, a->data);
> + if (a->family == AF_INET6) {
> + sprintf(buf, "[%s]", format_host(a->family, 16, 
> a->data));
> + } else {
> + ap = format_host(a->family, 16, a->data);
> + }
>   est_len = strlen(ap);
...

There are some strange things going on with global variables if this works
at all.
The text form of the address is in buf[] in one path and *ap in the other.

One option might be to call format_host() then use strchr(ap, ':')
to add [] if the string contains any ':'.
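
That option could look roughly like the sketch below. bracket_if_colon() is a hypothetical helper, not code from ss; it takes the already-formatted host string and writes it to a caller-supplied buffer, adding brackets only when the text contains a ':'.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Copy `ap` into `out`, wrapped in [] if it contains a ':'.
 * Returns the length snprintf() reports (what would be written
 * given enough space), so callers can detect truncation.
 */
static int bracket_if_colon(const char *ap, char *out, size_t outlen)
{
	assert(ap && out && outlen > 0);
	if (strchr(ap, ':'))
		return snprintf(out, outlen, "[%s]", ap);
	return snprintf(out, outlen, "%s", ap);
}
```

Both paths leave the text in the same buffer, which avoids the buf[] versus *ap split noted above.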

David



Re: [PATCH 1/6] [net-next]net: sched: act_mirred: Extend redirect action to accept a traffic class

2017-08-01 Thread Jiri Pirko
Tue, Aug 01, 2017 at 02:37:37AM CEST, amritha.namb...@intel.com wrote:
>The Mirred/redirect action is extended to forward to a traffic
>class on the device. The traffic class index needs to be
>provided in addition to the device's ifindex.
>
>Example:
># tc filter add dev eth0 protocol ip parent : prio 1 flower\
>  dst_ip 192.168.1.1/32 ip_proto udp dst_port 22\
>  skip_sw indev eth0 action mirred ingress redirect dev eth0 tc 1

You need to make sure that the current offloaders will refuse to add
this rule, not just silently ignore the tc value.
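
In sketch form, the point is that a driver which cannot honour the traffic class should reject the rule outright. The struct and function below are hypothetical, not an existing offload API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of a redirect rule as seen by an offloading driver. */
struct redirect_rule_sketch {
	bool has_tc;          /* user supplied "tc N" */
	unsigned char tc;
};

/* Returns 0 if the rule can be offloaded, a negative errno otherwise. */
static int offload_check(const struct redirect_rule_sketch *r,
			 bool drv_supports_tc)
{
	assert(r != NULL);
	if (r->has_tc && !drv_supports_tc)
		return -EOPNOTSUPP;   /* refuse instead of silently dropping tc */
	return 0;
}
```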


>
>Signed-off-by: Amritha Nambiar 
>---
> include/net/tc_act/tc_mirred.h|7 +++
> include/uapi/linux/tc_act/tc_mirred.h |5 +
> net/sched/act_mirred.c|   17 +
> 3 files changed, 29 insertions(+)
>
>diff --git a/include/net/tc_act/tc_mirred.h b/include/net/tc_act/tc_mirred.h
>index 604bc31..60058c4 100644
>--- a/include/net/tc_act/tc_mirred.h
>+++ b/include/net/tc_act/tc_mirred.h
>@@ -9,6 +9,8 @@ struct tcf_mirred {
>   int tcfm_eaction;
>   int tcfm_ifindex;
>   bool tcfm_mac_header_xmit;
>+  u8  tcfm_tc;
>+  u32 flags;
>   struct net_device __rcu *tcfm_dev;
>   struct list_head tcfm_list;
> };
>@@ -37,4 +39,9 @@ static inline int tcf_mirred_ifindex(const struct tc_action 
>*a)
>   return to_mirred(a)->tcfm_ifindex;
> }
> 
>+static inline int tcf_mirred_tc(const struct tc_action *a)
>+{
>+  return to_mirred(a)->tcfm_tc;
>+}
>+
> #endif /* __NET_TC_MIR_H */
>diff --git a/include/uapi/linux/tc_act/tc_mirred.h 
>b/include/uapi/linux/tc_act/tc_mirred.h
>index 3d7a2b3..8ff4d76 100644
>--- a/include/uapi/linux/tc_act/tc_mirred.h
>+++ b/include/uapi/linux/tc_act/tc_mirred.h
>@@ -9,6 +9,10 @@
> #define TCA_EGRESS_MIRROR 2 /* mirror packet to EGRESS */
> #define TCA_INGRESS_REDIR 3  /* packet redirect to INGRESS*/
> #define TCA_INGRESS_MIRROR 4 /* mirror packet to INGRESS */
>+
>+#define MIRRED_F_TC_MAP   0x1
>+#define MIRRED_TC_MAP_MAX 0x10
>+#define MIRRED_TC_MAP_MASK 0xF

I'm completely lost. Why do you have these values here? And in fact one
of them twice?
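
For what it is worth, the MAX and MASK values encode the same limit, which a small check makes visible. This is a standalone sketch using the numbers from the quoted hunk; tc_accept() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define TC_MAP_MAX  0x10   /* MIRRED_TC_MAP_MAX in the patch */
#define TC_MAP_MASK 0xF    /* MIRRED_TC_MAP_MASK in the patch */

/* Returns the stored class, or -1 where the patch returns -EINVAL. */
static int tc_accept(uint8_t v)
{
	if (v >= TC_MAP_MAX)
		return -1;
	/* After the range check the mask cannot change the value. */
	assert((v & TC_MAP_MASK) == v);
	return v & TC_MAP_MASK;
}
```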


>   
>   
> struct tc_mirred {
>   tc_gen;
>@@ -21,6 +25,7 @@ enum {
>   TCA_MIRRED_TM,
>   TCA_MIRRED_PARMS,
>   TCA_MIRRED_PAD,
>+  TCA_MIRRED_TC_MAP,
>   __TCA_MIRRED_MAX
> };
> #define TCA_MIRRED_MAX (__TCA_MIRRED_MAX - 1)
>diff --git a/net/sched/act_mirred.c b/net/sched/act_mirred.c
>index 1b5549a..f9801de 100644
>--- a/net/sched/act_mirred.c
>+++ b/net/sched/act_mirred.c
>@@ -67,6 +67,7 @@ static void tcf_mirred_release(struct tc_action *a, int bind)
> 
> static const struct nla_policy mirred_policy[TCA_MIRRED_MAX + 1] = {
>   [TCA_MIRRED_PARMS]  = { .len = sizeof(struct tc_mirred) },
>+  [TCA_MIRRED_TC_MAP] = { .type = NLA_U8 },
> };
> 
> static unsigned int mirred_net_id;
>@@ -83,6 +84,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr 
>*nla,
>   struct tcf_mirred *m;
>   struct net_device *dev;
>   bool exists = false;
>+  u8 *tc_map = NULL;
>+  u32 flags = 0;
>   int ret;
> 
>   if (nla == NULL)
>@@ -92,6 +95,14 @@ static int tcf_mirred_init(struct net *net, struct nlattr 
>*nla,
>   return ret;
>   if (tb[TCA_MIRRED_PARMS] == NULL)
>   return -EINVAL;
>+
>+  if (tb[TCA_MIRRED_TC_MAP]) {
>+  tc_map = nla_data(tb[TCA_MIRRED_TC_MAP]);
>+  if (*tc_map >= MIRRED_TC_MAP_MAX)
>+  return -EINVAL;
>+  flags |= MIRRED_F_TC_MAP;
>+  }
>+
>   parm = nla_data(tb[TCA_MIRRED_PARMS]);
> 
>   exists = tcf_hash_check(tn, parm->index, a, bind);
>@@ -139,6 +150,7 @@ static int tcf_mirred_init(struct net *net, struct nlattr 
>*nla,
>   ASSERT_RTNL();
>   m->tcf_action = parm->action;
>   m->tcfm_eaction = parm->eaction;
>+  m->flags = flags;
>   if (dev != NULL) {
>   m->tcfm_ifindex = parm->ifindex;
>   if (ret != ACT_P_CREATED)
>@@ -146,6 +158,8 @@ static int tcf_mirred_init(struct net *net, struct nlattr 
>*nla,
>   dev_hold(dev);
>   rcu_assign_pointer(m->tcfm_dev, dev);
>   m->tcfm_mac_header_xmit = mac_header_xmit;
>+  if (flags & MIRRED_F_TC_MAP)
>+  m->tcfm_tc = *tc_map & MIRRED_TC_MAP_MASK;
>   }
> 
>   if (ret == ACT_P_CREATED) {
>@@ -259,6 +273,9 @@ static int tcf_mirred_dump(struct sk_buff *skb, struct 
>tc_action *a, int bind,
> 
>   if (nla_put(skb, TCA_MIRRED_PARMS, sizeof(opt), &opt))
>   goto nla_put_failure;
>+  if ((m->flags & MIRRED_F_TC_MAP) &&
>+  nla_put_u8(skb, TCA_MIRRED_TC_MAP, m->tcfm_tc))
>+  goto nla_put_failure;
> 
>   tcf_tm_dump(&t, &m->tcf_tm);
>   if (nla_put_64bit(s

[PATCH nf-next] netfilter: constify nf_loginfo structures

2017-08-01 Thread Julia Lawall
The nf_loginfo structures are only passed as the seventh argument to
nf_log_trace, which is declared as const or stored in a local const
variable.  Thus the nf_loginfo structures themselves can be const.

Done with the help of Coccinelle.

// 
@r disable optional_qualifier@
identifier i;
position p;
@@
static struct nf_loginfo i@p = { ... };

@ok1@
identifier r.i;
expression list[6] es;
position p;
@@
 nf_log_trace(es,&i@p,...)

@ok2@
identifier r.i;
const struct nf_loginfo *e;
position p;
@@
 e = &i@p

@bad@
position p != {r.p,ok1.p,ok2.p};
identifier r.i;
struct nf_loginfo e;
@@
e@i@p

@depends on !bad disable optional_qualifier@
identifier r.i;
@@
static
+const
 struct nf_loginfo i = { ... };
// 

Signed-off-by: Julia Lawall 

---
 net/ipv4/netfilter/ip_tables.c   |2 +-
 net/ipv4/netfilter/nf_log_arp.c  |2 +-
 net/ipv4/netfilter/nf_log_ipv4.c |2 +-
 net/ipv6/netfilter/ip6_tables.c  |2 +-
 net/ipv6/netfilter/nf_log_ipv6.c |2 +-
 net/netfilter/nf_tables_core.c   |2 +-
 net/netfilter/nfnetlink_log.c|2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/net/netfilter/nf_tables_core.c b/net/netfilter/nf_tables_core.c
index c5bab08..dfd0bf3 100644
--- a/net/netfilter/nf_tables_core.c
+++ b/net/netfilter/nf_tables_core.c
@@ -29,7 +29,7 @@
[NFT_TRACETYPE_RULE]= "rule",
 };
 
-static struct nf_loginfo trace_loginfo = {
+static const struct nf_loginfo trace_loginfo = {
.type = NF_LOG_TYPE_LOG,
.u = {
.log = {
diff --git a/net/netfilter/nfnetlink_log.c b/net/netfilter/nfnetlink_log.c
index c684ba9..cad6498 100644
--- a/net/netfilter/nfnetlink_log.c
+++ b/net/netfilter/nfnetlink_log.c
@@ -606,7 +606,7 @@ static void nfulnl_instance_free_rcu(struct rcu_head *head)
return -1;
 }
 
-static struct nf_loginfo default_loginfo = {
+static const struct nf_loginfo default_loginfo = {
.type = NF_LOG_TYPE_ULOG,
.u = {
.ulog = {
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index 2a55a40..96a289c 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -151,7 +151,7 @@ enum nf_ip_trace_comments {
[NF_IP_TRACE_COMMENT_POLICY]= "policy",
 };
 
-static struct nf_loginfo trace_loginfo = {
+static const struct nf_loginfo trace_loginfo = {
.type = NF_LOG_TYPE_LOG,
.u = {
.log = {
diff --git a/net/ipv4/netfilter/nf_log_arp.c b/net/ipv4/netfilter/nf_log_arp.c
index 2f3895d..df5c2a2 100644
--- a/net/ipv4/netfilter/nf_log_arp.c
+++ b/net/ipv4/netfilter/nf_log_arp.c
@@ -25,7 +25,7 @@
 #include 
 #include 
 
-static struct nf_loginfo default_loginfo = {
+static const struct nf_loginfo default_loginfo = {
.type   = NF_LOG_TYPE_LOG,
.u = {
.log = {
diff --git a/net/ipv4/netfilter/nf_log_ipv4.c b/net/ipv4/netfilter/nf_log_ipv4.c
index c83a996..4388de0 100644
--- a/net/ipv4/netfilter/nf_log_ipv4.c
+++ b/net/ipv4/netfilter/nf_log_ipv4.c
@@ -24,7 +24,7 @@
 #include 
 #include 
 
-static struct nf_loginfo default_loginfo = {
+static const struct nf_loginfo default_loginfo = {
.type   = NF_LOG_TYPE_LOG,
.u = {
.log = {
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 1f90644..9f66449 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -176,7 +176,7 @@ enum nf_ip_trace_comments {
[NF_IP6_TRACE_COMMENT_POLICY]   = "policy",
 };
 
-static struct nf_loginfo trace_loginfo = {
+static const struct nf_loginfo trace_loginfo = {
.type = NF_LOG_TYPE_LOG,
.u = {
.log = {
diff --git a/net/ipv6/netfilter/nf_log_ipv6.c b/net/ipv6/netfilter/nf_log_ipv6.c
index 97c7242..b397a8f 100644
--- a/net/ipv6/netfilter/nf_log_ipv6.c
+++ b/net/ipv6/netfilter/nf_log_ipv6.c
@@ -25,7 +25,7 @@
 #include 
 #include 
 
-static struct nf_loginfo default_loginfo = {
+static const struct nf_loginfo default_loginfo = {
.type   = NF_LOG_TYPE_LOG,
.u = {
.log = {



[PATCH v2 net-next 1/3] net: dsa: lan9303: Refactor lan9303_xxx_packet_processing()

2017-08-01 Thread Egil Hjelmeland
Make lan9303_enable_packet_processing() and
lan9303_disable_packet_processing() take the port number (0,1,2) as
parameter instead of the port offset, because other functions in the
module pass port numbers and because this enables simplifications in
the following patch.

Also replace the constant 0x400 with LAN9303_SWITCH_PORT_REG().
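
To make the addressing concrete, here is a standalone sketch of the macro from this patch. The register number REG0_EXAMPLE is an arbitrary illustrative value, not taken from the datasheet, and port_reg() is a hypothetical wrapper.

```c
#include <assert.h>

/* Same shape as the macro added by the patch: port blocks are 0x400 apart. */
#define SWITCH_PORT_REG(port, reg0) (0x400 * (port) + (reg0))

#define REG0_EXAMPLE 0x401   /* arbitrary port-0 register number */

/* Port number (0, 1, 2) in, switch register number out. */
static unsigned int port_reg(unsigned int port)
{
	assert(port < 3);
	return SWITCH_PORT_REG(port, REG0_EXAMPLE);
}
```

With these numbers, ports 0, 1 and 2 map to 0x401, 0x801 and 0xc01.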

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 62 ++
 1 file changed, 33 insertions(+), 29 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 8e430d1ee297..4c514d3b9f68 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -159,9 +159,7 @@
 # define LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT1 (BIT(9) | BIT(8))
 # define LAN9303_BM_EGRSS_PORT_TYPE_SPECIAL_TAG_PORT0 (BIT(1) | BIT(0))
 
-#define LAN9303_PORT_0_OFFSET 0x400
-#define LAN9303_PORT_1_OFFSET 0x800
-#define LAN9303_PORT_2_OFFSET 0xc00
+#define LAN9303_SWITCH_PORT_REG(port, reg0) (0x400 * (port) + (reg0))
 
 /* the built-in PHYs are of type LAN911X */
 #define MII_LAN911X_SPECIAL_MODES 0x12
@@ -428,6 +426,13 @@ static int lan9303_read_switch_reg(struct lan9303 *chip, 
u16 regnum, u32 *val)
return ret;
 }
 
+static int lan9303_write_switch_port(
+   struct lan9303 *chip, unsigned int port, u16 regnum, u32 val)
+{
+   return lan9303_write_switch_reg(
+   chip, LAN9303_SWITCH_PORT_REG(port, regnum), val);
+}
+
 static int lan9303_detect_phy_setup(struct lan9303 *chip)
 {
int reg;
@@ -458,24 +463,23 @@ static int lan9303_detect_phy_setup(struct lan9303 *chip)
return 0;
 }
 
-#define LAN9303_MAC_RX_CFG_OFFS (LAN9303_MAC_RX_CFG_0 - LAN9303_PORT_0_OFFSET)
-#define LAN9303_MAC_TX_CFG_OFFS (LAN9303_MAC_TX_CFG_0 - LAN9303_PORT_0_OFFSET)
-
 static int lan9303_disable_packet_processing(struct lan9303 *chip,
 unsigned int port)
 {
int ret;
 
/* disable RX, but keep register reset default values else */
-   ret = lan9303_write_switch_reg(chip, LAN9303_MAC_RX_CFG_OFFS + port,
-  LAN9303_MAC_RX_CFG_X_REJECT_MAC_TYPES);
+   ret = lan9303_write_switch_port(
+   chip, port, LAN9303_MAC_RX_CFG_0,
+   LAN9303_MAC_RX_CFG_X_REJECT_MAC_TYPES);
if (ret)
return ret;
 
/* disable TX, but keep register reset default values else */
-   return lan9303_write_switch_reg(chip, LAN9303_MAC_TX_CFG_OFFS + port,
-   LAN9303_MAC_TX_CFG_X_TX_IFG_CONFIG_DEFAULT |
-   LAN9303_MAC_TX_CFG_X_TX_PAD_ENABLE);
+   return lan9303_write_switch_port(
+   chip, port, LAN9303_MAC_TX_CFG_0,
+   LAN9303_MAC_TX_CFG_X_TX_IFG_CONFIG_DEFAULT |
+   LAN9303_MAC_TX_CFG_X_TX_PAD_ENABLE);
 }
 
 static int lan9303_enable_packet_processing(struct lan9303 *chip,
@@ -484,17 +488,19 @@ static int lan9303_enable_packet_processing(struct 
lan9303 *chip,
int ret;
 
/* enable RX and keep register reset default values else */
-   ret = lan9303_write_switch_reg(chip, LAN9303_MAC_RX_CFG_OFFS + port,
-  LAN9303_MAC_RX_CFG_X_REJECT_MAC_TYPES |
-  LAN9303_MAC_RX_CFG_X_RX_ENABLE);
+   ret = lan9303_write_switch_port(
+   chip, port, LAN9303_MAC_RX_CFG_0,
+   LAN9303_MAC_RX_CFG_X_REJECT_MAC_TYPES |
+   LAN9303_MAC_RX_CFG_X_RX_ENABLE);
if (ret)
return ret;
 
/* enable TX and keep register reset default values else */
-   return lan9303_write_switch_reg(chip, LAN9303_MAC_TX_CFG_OFFS + port,
-   LAN9303_MAC_TX_CFG_X_TX_IFG_CONFIG_DEFAULT |
-   LAN9303_MAC_TX_CFG_X_TX_PAD_ENABLE |
-   LAN9303_MAC_TX_CFG_X_TX_ENABLE);
+   return lan9303_write_switch_port(
+   chip, port, LAN9303_MAC_TX_CFG_0,
+   LAN9303_MAC_TX_CFG_X_TX_IFG_CONFIG_DEFAULT |
+   LAN9303_MAC_TX_CFG_X_TX_PAD_ENABLE |
+   LAN9303_MAC_TX_CFG_X_TX_ENABLE);
 }
 
 /* We want a special working switch:
@@ -558,13 +564,13 @@ static int lan9303_disable_processing(struct lan9303 
*chip)
 {
int ret;
 
-   ret = lan9303_disable_packet_processing(chip, LAN9303_PORT_0_OFFSET);
+   ret = lan9303_disable_packet_processing(chip, 0);
if (ret)
return ret;
-   ret = lan9303_disable_packet_processing(chip, LAN9303_PORT_1_OFFSET);
+   ret = lan9303_disable_packet_processing(chip, 1);
if (ret)
return ret;
-   return lan9303_disable_packet_processing(chip, LAN9303_PORT_2_OFFSET);
+   return lan9303_disable_packet_processing(chip, 2);
 }
 
 static int lan9303_check_device(struct lan9303 *c

[PATCH v2 net-next 3/3] net: dsa: lan9303: Simplify lan9303_xxx_packet_processing() usage

2017-08-01 Thread Egil Hjelmeland
Simplify usage of lan9303_enable_packet_processing() and
lan9303_disable_packet_processing().
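
The resulting pattern, sketched standalone: loop over the ports and stop at the first error. disable_port() is a hypothetical stub standing in for lan9303_disable_packet_processing(), and fail_port exists only to simulate a failure.

```c
#include <assert.h>

#define NUM_PORTS 3   /* LAN9303_NUM_PORTS in the series */

static int fail_port = -1;   /* test knob: port whose disable should fail */
static int disabled_count;   /* how many ports were disabled successfully */

/* Hypothetical stub: returns 0 on success, negative on error. */
static int disable_port(int port)
{
	assert(port >= 0 && port < NUM_PORTS);
	if (port == fail_port)
		return -5;
	disabled_count++;
	return 0;
}

/* Stop processing on all ports; propagate the first error unchanged. */
static int disable_all(void)
{
	int p;

	for (p = 0; p < NUM_PORTS; p++) {
		int ret = disable_port(p);

		if (ret)
			return ret;
	}
	return 0;
}
```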

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 2a3c6bf473dd..4da580c43751 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -564,15 +564,16 @@ static int lan9303_handle_reset(struct lan9303 *chip)
 /* stop processing packets for all ports */
 static int lan9303_disable_processing(struct lan9303 *chip)
 {
-   int ret;
+   int p;
 
-   ret = lan9303_disable_packet_processing(chip, 0);
-   if (ret)
-   return ret;
-   ret = lan9303_disable_packet_processing(chip, 1);
-   if (ret)
-   return ret;
-   return lan9303_disable_packet_processing(chip, 2);
+   for (p = 0; p < LAN9303_NUM_PORTS; p++) {
+   int ret = lan9303_disable_packet_processing(chip, p);
+
+   if (ret)
+   return ret;
+   }
+
+   return 0;
 }
 
 static int lan9303_check_device(struct lan9303 *chip)
@@ -765,7 +766,6 @@ static int lan9303_port_enable(struct dsa_switch *ds, int 
port,
/* enable internal packet processing */
switch (port) {
case 1:
-   return lan9303_enable_packet_processing(chip, port);
case 2:
return lan9303_enable_packet_processing(chip, port);
default:
@@ -784,13 +784,9 @@ static void lan9303_port_disable(struct dsa_switch *ds, 
int port,
/* disable internal packet processing */
switch (port) {
case 1:
-   lan9303_disable_packet_processing(chip, port);
-   lan9303_phy_write(ds, chip->phy_addr_sel_strap + 1,
- MII_BMCR, BMCR_PDOWN);
-   break;
case 2:
lan9303_disable_packet_processing(chip, port);
-   lan9303_phy_write(ds, chip->phy_addr_sel_strap + 2,
+   lan9303_phy_write(ds, chip->phy_addr_sel_strap + port,
  MII_BMCR, BMCR_PDOWN);
break;
default:
-- 
2.11.0



[PATCH v2 net-next 0/3] Refactor lan9303_xxx_packet_processing

2017-08-01 Thread Egil Hjelmeland
This series is purely non-functional. It changes
lan9303_enable_packet_processing() and
lan9303_disable_packet_processing() to take the port number (0,1,2) as
parameter instead of the port offset. This aligns them with
other functions in the module, and makes it possible to simplify the code.

First patch: Change lan9303_xxx_packet_processing parameter:
 - Pass port number (0,1,2) as parameter.
 - Introduced lan9303_write_switch_port() 
 - Plus replaced a constant 0x400 with LAN9303_SWITCH_PORT_REG()

Second patch: Introduce LAN9303_NUM_PORTS=3, used in next patch.

Third patch: Simplify lan9303_xxx_packet_processing usage.

Comments welcome!

Changes v1 -> v2:
 - introduced lan9303_write_switch_port() in first patch
 - inserted LAN9303_NUM_PORTS patch
 - Use LAN9303_NUM_PORTS in last patch. Plus whitespace change.  

Egil Hjelmeland (3):
  net: dsa: lan9303: Refactor lan9303_xxx_packet_processing()
  net: dsa: lan9303: define LAN9303_NUM_PORTS 3
  net: dsa: lan9303: Simplify lan9303_xxx_packet_processing() usage

 drivers/net/dsa/lan9303-core.c | 78 ++
 1 file changed, 40 insertions(+), 38 deletions(-)

-- 
2.11.0



[PATCH v2 net-next 2/3] net: dsa: lan9303: define LAN9303_NUM_PORTS 3

2017-08-01 Thread Egil Hjelmeland
Will be used instead of '3' in upcoming patches.

Signed-off-by: Egil Hjelmeland 
---
 drivers/net/dsa/lan9303-core.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/dsa/lan9303-core.c b/drivers/net/dsa/lan9303-core.c
index 4c514d3b9f68..2a3c6bf473dd 100644
--- a/drivers/net/dsa/lan9303-core.c
+++ b/drivers/net/dsa/lan9303-core.c
@@ -20,6 +20,8 @@
 
 #include "lan9303.h"
 
+#define LAN9303_NUM_PORTS 3
+
 /* 13.2 System Control and Status Registers
  * Multiply register number by 4 to get address offset.
  */
-- 
2.11.0



Re: [PATCH v2 net-next 2/3] net: dsa: lan9303: define LAN9303_NUM_PORTS 3

2017-08-01 Thread Juergen Borleis
Hi Egil,

On Tuesday 01 August 2017 13:14:38 Egil Hjelmeland wrote:
> Will be used instead of '3' in upcoming patches.
>
> Signed-off-by: Egil Hjelmeland 
> ---
>  drivers/net/dsa/lan9303-core.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/dsa/lan9303-core.c
> b/drivers/net/dsa/lan9303-core.c index 4c514d3b9f68..2a3c6bf473dd 100644
> --- a/drivers/net/dsa/lan9303-core.c
> +++ b/drivers/net/dsa/lan9303-core.c
> @@ -20,6 +20,8 @@
>
>  #include "lan9303.h"
>
> +#define LAN9303_NUM_PORTS 3
> +
>  /* 13.2 System Control and Status Registers
>   * Multiply register number by 4 to get address offset.
>   */

Maybe we should put this macro into a shared location because
in "net/dsa/tag_lan9303.c" there is already a
"#define LAN9303_MAX_PORTS 3".

jb

-- 
Pengutronix e.K.                             | Juergen Borleis             |
Industrial Linux Solutions                   | http://www.pengutronix.de/  |


[PATCH net-next] net: bcmgenet: drop COMPILE_TEST dependency

2017-08-01 Thread Arnd Bergmann
The last patch added the dependency on 'OF && HAS_IOMEM' but left
COMPILE_TEST as an alternative, which kind of defeats the purpose
of adding the dependency, as we still get randconfig build warnings:

warning: (NET_DSA_BCM_SF2 && BCMGENET) selects MDIO_BCM_UNIMAC which has unmet 
direct dependencies (NETDEVICES && MDIO_BUS && HAS_IOMEM && OF_MDIO)

For compile-testing purposes, we don't really need this anyway,
as CONFIG_OF can be enabled on all architectures, and HAS_IOMEM
is present on all architectures we do meaningful compile-testing on
(the exception being arch/um).

This makes both OF and HAS_IOMEM hard dependencies.

Fixes: 5af74bb4fcf8 ("net: bcmgenet: Add dependency on HAS_IOMEM && OF")
Signed-off-by: Arnd Bergmann 
---
 drivers/net/ethernet/broadcom/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/broadcom/Kconfig b/drivers/net/ethernet/broadcom/Kconfig
index 45775399cab6..1456cb18f830 100644
--- a/drivers/net/ethernet/broadcom/Kconfig
+++ b/drivers/net/ethernet/broadcom/Kconfig
@@ -61,7 +61,7 @@ config BCM63XX_ENET
 
 config BCMGENET
 	tristate "Broadcom GENET internal MAC support"
-	depends on (OF && HAS_IOMEM) || COMPILE_TEST
+	depends on OF && HAS_IOMEM
 	select MII
 	select PHYLIB
 	select FIXED_PHY
-- 
2.9.0



RE: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed

2017-08-01 Thread maowenan


> -----Original Message-----
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Neal Cardwell
> Sent: Tuesday, August 01, 2017 10:58 AM
> To: David Miller
> Cc: netdev@vger.kernel.org; Neal Cardwell; Yuchung Cheng; Nandita Dukkipati
> Subject: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data
> ACKed/SACKed
> 
> Fix a TCP loss recovery performance bug raised recently on the netdev list, in
> two threads:
> 
> (i)  July 26, 2017: netdev thread "TCP fast retransmit issues"
> (ii) July 26, 2017: netdev thread:
>  "[PATCH V2 net-next] TLP: Don't reschedule PTO when there's one
>  outstanding TLP retransmission"
> 
> The basic problem is that incoming TCP packets that did not indicate forward
> progress could cause the xmit timer (TLP or RTO) to be rearmed and pushed
> back in time. In certain corner cases this could result in the following
> problems noted in these threads:
> 
>  - Repeated ACKs coming in with bogus SACKs corrupted by middleboxes
>could cause TCP to repeatedly schedule TLPs forever. We kept
>sending TLPs after every ~200ms, which elicited bogus SACKs, which
>caused more TLPs, ad infinitum; we never fired an RTO to fill in
>the holes.
> 
>  - Incoming data segments could, in some cases, cause us to reschedule
>our RTO or TLP timer further out in time, for no good reason. This
>could cause repeated inbound data to result in stalls in outbound
>data, in the presence of packet loss.
> 
> This commit fixes these bugs by changing the TLP and RTO ACK processing to:
> 
>  (a) Only reschedule the xmit timer once per ACK.
> 
>  (b) Only reschedule the xmit timer if tcp_clean_rtx_queue() deems the
>  ACK indicates sufficient forward progress (a packet was
>  cumulatively ACKed, or we got a SACK for a packet that was sent
>  before the most recent retransmit of the write queue head).
> 
> This brings us back into closer compliance with the RFCs, since, as the
> comment for tcp_rearm_rto() notes, we should only restart the RTO timer
> after forward progress on the connection. Previously we were restarting the
> xmit timer even in these cases where there was no forward progress.
> 
> As a side benefit, this commit simplifies and speeds up the TCP timer arming
> logic. We had been calling inet_csk_reset_xmit_timer() three times on normal
> ACKs that cumulatively acknowledged some data:
> 
> 1) Once near the top of tcp_ack() to switch from TLP timer to RTO:
> if (icsk->icsk_pending == ICSK_TIME_LOSS_PROBE)
>tcp_rearm_rto(sk);
> 
> 2) Once in tcp_clean_rtx_queue(), to update the RTO:
> if (flag & FLAG_ACKED) {
>tcp_rearm_rto(sk);
> 
> 3) Once in tcp_ack() after tcp_fastretrans_alert() to switch from RTO
>to TLP:
> if (icsk->icsk_pending == ICSK_TIME_RETRANS)
>tcp_schedule_loss_probe(sk);
> 
> This commit, by only rescheduling the xmit timer once per ACK, simplifies the
> code and reduces CPU overhead.
> 
> This commit was tested in an A/B test with Google web server traffic. SNMP
> stats and request latency metrics were within noise levels, substantiating
> that for normal web traffic patterns this is a rare issue. This commit was
> also tested with packetdrill tests to verify that it fixes the timer behavior
> in the corner cases discussed in the netdev threads mentioned above.
> 
> This patch is a bug fix patch intended to be queued for -stable releases.
> 
> Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
> Reported-by: Klavs Klavsen 
> Reported-by: Mao Wenan 
> Signed-off-by: Neal Cardwell 
> Signed-off-by: Yuchung Cheng 
> Signed-off-by: Nandita Dukkipati 
> ---
>  net/ipv4/tcp_input.c  | 25 -
> net/ipv4/tcp_output.c |  9 -
>  2 files changed, 16 insertions(+), 18 deletions(-)
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
> index 345febf0a46e..3e777cfbba56 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -107,6 +107,7 @@ int sysctl_tcp_invalid_ratelimit __read_mostly = HZ/2;
>  #define FLAG_ORIG_SACK_ACKED	0x200 /* Never retransmitted data are (s)acked  */
>  #define FLAG_SND_UNA_ADVANCED	0x400 /* Snd_una was changed (!= FLAG_DATA_ACKED) */
>  #define FLAG_DSACKING_ACK	0x800 /* SACK blocks contained D-SACK info */
> +#define FLAG_SET_XMIT_TIMER	0x1000 /* Set TLP or RTO timer */
>  #define FLAG_SACK_RENEGING	0x2000 /* snd_una advanced to a sacked seq */
>  #define FLAG_UPDATE_TS_RECENT	0x4000 /* tcp_replace_ts_recent() */
>  #define FLAG_NO_CHALLENGE_ACK	0x8000 /* do not call tcp_send_challenge_ack()  */
> @@ -3016,6 +3017,13 @@ void tcp_rearm_rto(struct sock *sk)
>  	}
>  }
> 
> +/* Try to schedule a loss probe; if that doesn't work, then schedule an RTO. */
> +static void tcp_set_xmit_timer(struct sock *sk)
> +{
> +	if (!tcp_schedule_loss_probe(sk))
> +		tcp_rearm_rto(sk);
> +}
> +
>  /* If

Re: [PATCH v2 net-next 2/3] net: dsa: lan9303: define LAN9303_NUM_PORTS 3

2017-08-01 Thread Egil Hjelmeland

On 01. aug. 2017 13:49, Juergen Borleis wrote:
> Hi Egil,
>
> On Tuesday 01 August 2017 13:14:38 Egil Hjelmeland wrote:
>> Will be used instead of '3' in upcoming patches.
>>
>> +#define LAN9303_NUM_PORTS 3
>> +
>
> Maybe we should put this macro into a shared location because
> in "net/dsa/tag_lan9303.c" there is already a "#define LAN9303_MAX_PORTS
> 3".
>
> jb

Is there any suitable shared location for such driver specific
definitions?
I could change the name to LAN9303_MAX_PORTS so it is the same.
Rhymes better with DSA_MAX_PORTS too.

Let's hear what others think.

Egil



Re: [PATCH v3 1/2] ravb: add wake-on-lan support via magic packet

2017-08-01 Thread Sergei Shtylyov

Hello!

On 08/01/2017 01:14 PM, Niklas Söderlund wrote:

> WoL is enabled in the suspend callback by setting MagicPacket detection
> and disabling all interrupts except MagicPacket. In the resume path the
> driver needs to reset the hardware to rearm the WoL logic. This prevents
> the driver from simply restoring the registers and from taking advantage
> of the fact that ravb was not suspended to reduce resume time. To reset the
> hardware the driver closes the device, sets it in reset mode and reopens
> the device just like it would do in a normal suspend/resume scenario
> without WoL enabled, but it both closes and opens the device in the
> resume callback since the device needs to be reset for WoL to work.
>
> One quirk needed for WoL is that the module clock needs to be prevented
> from being switched off by Runtime PM. To keep the clock alive the
> suspend callback needs to call clk_enable() directly to increase the
> usage count of the clock. Then when Runtime PM decreases the clock usage
> count it won't reach 0 and be switched off.
>
> Signed-off-by: Niklas Söderlund

[...]

Acked-by: Sergei Shtylyov 

MBR, Sergei


Re: [PATCH v3 2/2] ravb: add workaround for clock when resuming with WoL enabled

2017-08-01 Thread Sergei Shtylyov

On 08/01/2017 01:14 PM, Niklas Söderlund wrote:

> The renesas-cpg-mssr clock driver is not yet aware of PSCI sleep where
> power is cut to the SoC. When resuming from this state with WoL enabled
> the enable count of the ravb clock is 1 and the clock driver thinks the
> clock is already on when PM core enables the clock and increments the
> enable count to 2. This will result in the ravb driver failing to talk
> to the hardware since the module clock is off. Work around this by
> forcing the enable count to 0 and then back to 2 when resuming with WoL
> enabled.
>
> This workaround should be reverted once the renesas-cpg-mssr clock
> driver becomes aware of this PSCI sleep behavior.
>
> Signed-off-by: Niklas Söderlund

Acked-by: Sergei Shtylyov 

MBR, Sergei


Re: [PATCH v2 net-next 2/3] net: dsa: lan9303: define LAN9303_NUM_PORTS 3

2017-08-01 Thread Andrew Lunn
On Tue, Aug 01, 2017 at 02:31:44PM +0200, Egil Hjelmeland wrote:
> On 01. aug. 2017 13:49, Juergen Borleis wrote:
> >Hi Egil,
> >
> >On Tuesday 01 August 2017 13:14:38 Egil Hjelmeland wrote:
> >>Will be used instead of '3' in upcoming patches.
> >>
> >>
> >>+#define LAN9303_NUM_PORTS 3
> >>+
> >
> >Maybe we should put this macro into a shared location because
> >in "net/dsa/tag_lan9303.c" there is already a "#define LAN9303_MAX_PORTS
> >3".
> >
> >jb
> >
> 
> Is there any suitable shared location for such driver specific
> definitions?
> I could change the name to LAN9303_MAX_PORTS so it is the same.
> Rhymes better with DSA_MAX_PORTS too.

Hi Egil, Juergen

The other tag drivers do:

if (source_port >= ds->num_ports || !ds->ports[source_port].netdev)
return NULL;

or just

if (!ds->ports[port].netdev)
return NULL;

The first version is the safest, since a malicious switch could return
port 42, and you are accessing way off the end of ds->ports[]. It does
however require you call dsa_switch_alloc() with the correct number of
ports.
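
To make the difference between the two checks concrete, here is a minimal user-space sketch. The struct layouts and the demo_ names are simplified stand-ins invented for this example, not the real struct dsa_switch; only the shape of the check follows the first snippet above.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (assumptions). */
struct demo_port {
	void *netdev;		/* NULL when the port has no net device */
};

struct demo_switch {
	size_t num_ports;	/* number of valid entries in ports[] */
	struct demo_port *ports;
};

/*
 * Return the net device for source_port, or NULL when the index is
 * out of range or the port is unused. The range check comes first, so
 * ports[] is never indexed with an untrusted value.
 */
static void *demo_port_netdev(const struct demo_switch *ds, size_t source_port)
{
	if (source_port >= ds->num_ports || !ds->ports[source_port].netdev)
		return NULL;

	return ds->ports[source_port].netdev;
}
```

With a three-port switch, a tag claiming port 42 is rejected by the first half of the condition, before ports[] is indexed.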

 Andrew


Re: [PATCH v2 net-next 1/3] net: dsa: lan9303: Refactor lan9303_xxx_packet_processing()

2017-08-01 Thread Andrew Lunn
> @@ -704,7 +710,7 @@ static void lan9303_get_ethtool_stats(struct dsa_switch 
> *ds, int port,
>   unsigned int u, poff;
>   int ret;
>  
> - poff = port * 0x400;
> + poff = LAN9303_SWITCH_PORT_REG(port, 0);
>  
>   for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
>   ret = lan9303_read_switch_reg(chip,

So the actual code is:

	for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
		ret = lan9303_read_switch_reg(chip,
					      lan9303_mib[u].offset + poff,
					      &reg);

Could this be written as

	for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
		ret = lan9303_read_switch_port(chip, port,
					       lan9303_mib[u].offset, &reg);

It is then clear you are reading the statistics from a port register.
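
A rough sketch of what such a port-relative helper could look like, for illustration only. The macro body is an assumption derived from the "port * 0x400" expression that the patch replaces; demo_read_switch_reg() is a stub standing in for lan9303_read_switch_reg(), and the register file is a plain array so the sketch runs in user space.

```c
#include <stdint.h>

/* Assumed layout: each port's register block is 0x400 apart. */
#define DEMO_SWITCH_PORT_REG(port, reg0)	(0x400 * (port) + (reg0))

/* Stub register file so the sketch runs in user space. */
static uint32_t demo_regs[0x1000];

static int demo_read_switch_reg(uint16_t regnum, uint32_t *val)
{
	if (regnum >= sizeof(demo_regs) / sizeof(demo_regs[0]))
		return -1;
	*val = demo_regs[regnum];
	return 0;
}

/* Port-relative read: the caller passes the port and the port-0 offset. */
static int demo_read_switch_port(int port, uint16_t regnum, uint32_t *val)
{
	return demo_read_switch_reg(DEMO_SWITCH_PORT_REG(port, regnum), val);
}
```

The caller no longer carries a precomputed per-port offset around; the address arithmetic lives in one place.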

   Andrew


[PATCH net 4/4] net/mlx4_core: Fixes missing capability bit in flags2 capability dump

2017-08-01 Thread Tariq Toukan
From: Jack Morgenstein 

The cited commit introduced the following new enum value in file
include/linux/mlx4/device.h:

QUERY_DEV_CAP_DIAG_RPRT_PER_PORT

However, it failed to introduce a corresponding entry in function
dump_dev_cap_flags2() for outputting a line in the message log
when this capability bit is set.

The change here fixes that omission.

Fixes: c7c122ed67e4 ("net/mlx4: Add diagnostic counters capability bit")
Reported-by: Mukesh Kacker 
Signed-off-by: Jack Morgenstein 
Signed-off-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/fw.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c 
b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 7c9502bae1cc..041c0ed65929 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -159,6 +159,7 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 
flags)
[32] = "Loopback source checks support",
[33] = "RoCEv2 support",
[34] = "DMFS Sniffer support (UC & MC)",
+   [35] = "Diag counters per port",
[36] = "QinQ VST mode support",
[37] = "sl to vl mapping table change event support",
};
-- 
1.8.3.1



[PATCH net 1/4] net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) support

2017-08-01 Thread Tariq Toukan
From: Inbar Karmy 

Currently when WoL is supported but disabled, ethtool reports:
"Supports Wake-on: d".
Fix the indication of WoL support, so that the indication
remains "g" all the time if the NIC supports WoL.

Tested:
As expected, when the NIC supports WoL, ethtool reports:
Supports Wake-on: g
Wake-on: d
When the NIC doesn't support WoL, ethtool reports:
Supports Wake-on: d
Wake-on: d

Fixes: 14c07b1358ed ("mlx4: Wake on LAN support")
Signed-off-by: Inbar Karmy 
Signed-off-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 15 ---
 drivers/net/ethernet/mellanox/mlx4/fw.c |  4 
 drivers/net/ethernet/mellanox/mlx4/fw.h |  1 +
 drivers/net/ethernet/mellanox/mlx4/main.c   |  2 ++
 include/linux/mlx4/device.h |  1 +
 5 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c 
b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
index c751a1d434ad..3d4e4a5d00d1 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_ethtool.c
@@ -223,6 +223,7 @@ static void mlx4_en_get_wol(struct net_device *netdev,
struct ethtool_wolinfo *wol)
 {
struct mlx4_en_priv *priv = netdev_priv(netdev);
+   struct mlx4_caps *caps = &priv->mdev->dev->caps;
int err = 0;
u64 config = 0;
u64 mask;
@@ -235,24 +236,24 @@ static void mlx4_en_get_wol(struct net_device *netdev,
mask = (priv->port == 1) ? MLX4_DEV_CAP_FLAG_WOL_PORT1 :
MLX4_DEV_CAP_FLAG_WOL_PORT2;
 
-   if (!(priv->mdev->dev->caps.flags & mask)) {
+   if (!(caps->flags & mask)) {
wol->supported = 0;
wol->wolopts = 0;
return;
}
 
+   if (caps->wol_port[priv->port])
+   wol->supported = WAKE_MAGIC;
+   else
+   wol->supported = 0;
+
err = mlx4_wol_read(priv->mdev->dev, &config, priv->port);
if (err) {
en_err(priv, "Failed to get WoL information\n");
return;
}
 
-   if (config & MLX4_EN_WOL_MAGIC)
-   wol->supported = WAKE_MAGIC;
-   else
-   wol->supported = 0;
-
-   if (config & MLX4_EN_WOL_ENABLED)
+   if ((config & MLX4_EN_WOL_ENABLED) && (config & MLX4_EN_WOL_MAGIC))
wol->wolopts = WAKE_MAGIC;
else
wol->wolopts = 0;
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c 
b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 37e84a59e751..c165f16623a9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -764,6 +764,7 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct 
mlx4_dev_cap *dev_cap)
 #define QUERY_DEV_CAP_CQ_TS_SUPPORT_OFFSET 0x3e
 #define QUERY_DEV_CAP_MAX_PKEY_OFFSET  0x3f
 #define QUERY_DEV_CAP_EXT_FLAGS_OFFSET 0x40
+#define QUERY_DEV_CAP_WOL_OFFSET   0x43
 #define QUERY_DEV_CAP_FLAGS_OFFSET 0x44
 #define QUERY_DEV_CAP_RSVD_UAR_OFFSET  0x48
 #define QUERY_DEV_CAP_UAR_SZ_OFFSET0x49
@@ -920,6 +921,9 @@ int mlx4_QUERY_DEV_CAP(struct mlx4_dev *dev, struct 
mlx4_dev_cap *dev_cap)
MLX4_GET(ext_flags, outbox, QUERY_DEV_CAP_EXT_FLAGS_OFFSET);
MLX4_GET(flags, outbox, QUERY_DEV_CAP_FLAGS_OFFSET);
dev_cap->flags = flags | (u64)ext_flags << 32;
+   MLX4_GET(field, outbox, QUERY_DEV_CAP_WOL_OFFSET);
+   dev_cap->wol_port[1] = !!(field & 0x20);
+   dev_cap->wol_port[2] = !!(field & 0x40);
MLX4_GET(field, outbox, QUERY_DEV_CAP_RSVD_UAR_OFFSET);
dev_cap->reserved_uars = field >> 4;
MLX4_GET(field, outbox, QUERY_DEV_CAP_UAR_SZ_OFFSET);
diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.h 
b/drivers/net/ethernet/mellanox/mlx4/fw.h
index 5343a0599253..b52ba01aa486 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.h
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.h
@@ -129,6 +129,7 @@ struct mlx4_dev_cap {
u32 dmfs_high_rate_qpn_range;
struct mlx4_rate_limit_caps rl_caps;
struct mlx4_port_cap port_cap[MLX4_MAX_PORTS + 1];
+   bool wol_port[MLX4_MAX_PORTS + 1];
 };
 
 struct mlx4_func_cap {
diff --git a/drivers/net/ethernet/mellanox/mlx4/main.c 
b/drivers/net/ethernet/mellanox/mlx4/main.c
index a27c9c13a36e..09b9bc17bce9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/main.c
+++ b/drivers/net/ethernet/mellanox/mlx4/main.c
@@ -424,6 +424,8 @@ static int mlx4_dev_cap(struct mlx4_dev *dev, struct 
mlx4_dev_cap *dev_cap)
dev->caps.stat_rate_support  = dev_cap->stat_rate_support;
dev->caps.max_gso_sz = dev_cap->max_gso_sz;
dev->caps.max_rss_tbl_sz = dev_cap->max_rss_tbl_sz;
+   dev->caps.wol_port[1]  = dev_cap->wol_port[1];
+   dev->caps.wol_port[2]  = dev_cap->wol_port[2];
 
/* Save uar page shift */
if (!mlx4_is_

[PATCH net 2/4] net/mlx4_core: Fix sl_to_vl_change bit offset in flags2 dump

2017-08-01 Thread Tariq Toukan
From: Jack Morgenstein 

The index value in function dump_dev_cap_flags2() for outputting
"sl to vl mapping table change event support" needs to be
consistent with the value of the enumerated constant
MLX4_DEV_CAP_FLAG2_SL_TO_VL_CHANGE_EVENT defined in file
include/linux/mlx4/device.h

The change here restores that consistency.

Fixes: fd10ed8e6f42 ("IB/mlx4: Fix possible vl/sl field mismatch in LRH header 
in QP1 packets")
Reported-by: Mukesh Kacker 
Signed-off-by: Jack Morgenstein 
Signed-off-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/fw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c 
b/drivers/net/ethernet/mellanox/mlx4/fw.c
index c165f16623a9..009dd03466d6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -160,7 +160,7 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 
flags)
[33] = "RoCEv2 support",
[34] = "DMFS Sniffer support (UC & MC)",
[35] = "QinQ VST mode support",
-   [36] = "sl to vl mapping table change event support"
+   [37] = "sl to vl mapping table change event support",
};
int i;
 
-- 
1.8.3.1



[PATCH net 3/4] net/mlx4_core: Fix namespace misalignment in QinQ VST support commit

2017-08-01 Thread Tariq Toukan
From: Jack Morgenstein 

The cited commit introduced the following new enum value in file
include/linux/mlx4/device.h:

MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP

However the value of MLX4_DEV_CAP_FLAG2_SVLAN_BY_QP needs to stay
consistent with the value used in another namespace in
function dump_dev_cap_flags2(), which is manually kept in sync.
The change here restores that consistency.

Fixes: 7c3d21c8153c ("net/mlx4_core: Preparation for VF vlan protocol 802.1ad")
Reported-by: Mukesh Kacker 
Signed-off-by: Jack Morgenstein 
Signed-off-by: Tariq Toukan 
---
 drivers/net/ethernet/mellanox/mlx4/fw.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/fw.c 
b/drivers/net/ethernet/mellanox/mlx4/fw.c
index 009dd03466d6..7c9502bae1cc 100644
--- a/drivers/net/ethernet/mellanox/mlx4/fw.c
+++ b/drivers/net/ethernet/mellanox/mlx4/fw.c
@@ -159,7 +159,7 @@ static void dump_dev_cap_flags2(struct mlx4_dev *dev, u64 
flags)
[32] = "Loopback source checks support",
[33] = "RoCEv2 support",
[34] = "DMFS Sniffer support (UC & MC)",
-   [35] = "QinQ VST mode support",
+   [36] = "QinQ VST mode support",
[37] = "sl to vl mapping table change event support",
};
int i;
-- 
1.8.3.1



Re: [PATCH v2 net-next 2/3] net: dsa: lan9303: define LAN9303_NUM_PORTS 3

2017-08-01 Thread Egil Hjelmeland

On 01. aug. 2017 15:27, Andrew Lunn wrote:
> On Tue, Aug 01, 2017 at 02:31:44PM +0200, Egil Hjelmeland wrote:
>> On 01. aug. 2017 13:49, Juergen Borleis wrote:
>>> Hi Egil,
>>>
>>> On Tuesday 01 August 2017 13:14:38 Egil Hjelmeland wrote:
>>>> Will be used instead of '3' in upcoming patches.
>>>>
>>>> +#define LAN9303_NUM_PORTS 3
>>>> +
>>>
>>> Maybe we should put this macro into a shared location because
>>> in "net/dsa/tag_lan9303.c" there is already a "#define LAN9303_MAX_PORTS
>>> 3".
>>>
>>> jb
>>
>> Is there any suitable shared location for such driver specific
>> definitions?
>> I could change the name to LAN9303_MAX_PORTS so it is the same.
>> Rhymes better with DSA_MAX_PORTS too.
>
> Hi Egil, Juergen
>
> The other tag drivers do:
>
> 	if (source_port >= ds->num_ports || !ds->ports[source_port].netdev)
> 		return NULL;
>
> or just
>
> 	if (!ds->ports[port].netdev)
> 		return NULL;
>
> The first version is the safest, since a malicious switch could return
> port 42, and you are accessing way off the end of ds->ports[]. It does
> however require you call dsa_switch_alloc() with the correct number of
> ports.

Sounds like a plan for a later patch, when changing to
dsa_switch_alloc(LAN9303_NUM_PORTS)

>   Andrew



Egil


[PATCH net 0/4] mlx4 misc fixes

2017-08-01 Thread Tariq Toukan
Hi Dave,

This patchset contains misc bug fixes from the team
to the mlx4 Core and Eth drivers.

Patch 1 by Inbar fixes a wrong ethtool indication for Wake-on-LAN.
The other 3 patches by Jack add a missing capability description,
and fix the off-by-1 misalignment of the capability descriptions
that follow it.

Series generated against net commit:
cc75f8514db6 samples/bpf: fix bpf tunnel cleanup

Thanks,
Tariq.


Inbar Karmy (1):
  net/mlx4_en: Fix wrong indication of Wake-on-LAN (WoL) support

Jack Morgenstein (3):
  net/mlx4_core: Fix sl_to_vl_change bit offset in flags2 dump
  net/mlx4_core: Fix namespace misalignment in QinQ VST support commit
  net/mlx4_core: Fixes missing capability bit in flags2 capability dump

 drivers/net/ethernet/mellanox/mlx4/en_ethtool.c | 15 ---
 drivers/net/ethernet/mellanox/mlx4/fw.c |  9 +++--
 drivers/net/ethernet/mellanox/mlx4/fw.h |  1 +
 drivers/net/ethernet/mellanox/mlx4/main.c   |  2 ++
 include/linux/mlx4/device.h |  1 +
 5 files changed, 19 insertions(+), 9 deletions(-)

-- 
1.8.3.1
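
As an illustration of why the array index has to track the enum bit position, here is a small stand-alone sketch. The table entries are copied from the hunks in this series; the lookup helper and all demo_ names are invented for the example and are not the driver's dump_dev_cap_flags2().

```c
#include <stdint.h>
#include <stddef.h>

/* Entries taken from the hunks in this series; gaps stay NULL. */
static const char *const demo_flags2_names[] = {
	[32] = "Loopback source checks support",
	[33] = "RoCEv2 support",
	[34] = "DMFS Sniffer support (UC & MC)",
	[35] = "Diag counters per port",
	[36] = "QinQ VST mode support",
	[37] = "sl to vl mapping table change event support",
};

/* Return the description for capability bit "bit" if set, else NULL. */
static const char *demo_flag2_name(uint64_t flags, unsigned int bit)
{
	size_t n = sizeof(demo_flags2_names) / sizeof(demo_flags2_names[0]);

	/* bit < n also guarantees the shift below stays within 64 bits. */
	if (bit >= n || !(flags & (1ULL << bit)))
		return NULL;

	return demo_flags2_names[bit];
}
```

Because the table is indexed by bit number, a description stored one slot too low is printed for the wrong capability, which is the off-by-one that patches 2/4 and 3/4 correct.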



Re: [PATCH v2 net-next 1/3] net: dsa: lan9303: Refactor lan9303_xxx_packet_processing()

2017-08-01 Thread Egil Hjelmeland

On 01. aug. 2017 15:39, Andrew Lunn wrote:
>> @@ -704,7 +710,7 @@ static void lan9303_get_ethtool_stats(struct dsa_switch *ds, int port,
>>  	unsigned int u, poff;
>>  	int ret;
>>
>> -	poff = port * 0x400;
>> +	poff = LAN9303_SWITCH_PORT_REG(port, 0);
>>
>>  	for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
>>  		ret = lan9303_read_switch_reg(chip,
>
> So the actual code is:
>
> 	for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
> 		ret = lan9303_read_switch_reg(chip,
> 					      lan9303_mib[u].offset + poff,
> 					      &reg);
>
> Could this be written as
>
> 	for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
> 		ret = lan9303_read_switch_port(chip, port,
> 					       lan9303_mib[u].offset, &reg);
>
> It is then clear you are reading the statistics from a port register.
>
> Andrew



Yes it can. Since it is (insignificantly) less efficient, I
chose not to touch it. But I can do it if you like.

Egil


Re: [PATCH v2 net-next 1/3] net: dsa: lan9303: Refactor lan9303_xxx_packet_processing()

2017-08-01 Thread Andrew Lunn
On Tue, Aug 01, 2017 at 03:50:14PM +0200, Egil Hjelmeland wrote:
> On 01. aug. 2017 15:39, Andrew Lunn wrote:
> >>@@ -704,7 +710,7 @@ static void lan9303_get_ethtool_stats(struct dsa_switch 
> >>*ds, int port,
> >>unsigned int u, poff;
> >>int ret;
> >>-   poff = port * 0x400;
> >>+   poff = LAN9303_SWITCH_PORT_REG(port, 0);
> >>for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
> >>ret = lan9303_read_switch_reg(chip,
> >
> >So the actual code is:
> >
> > 	for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
> > 		ret = lan9303_read_switch_reg(chip,
> > 					      lan9303_mib[u].offset + poff,
> > 					      &reg);
> >
> >Could this be written as
> >
> > 	for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
> > 		ret = lan9303_read_switch_port(chip, port,
> > 					       lan9303_mib[u].offset, &reg);
> >
> >It is then clear you are reading the statistics from a port register.
> >
> >Andrew
> >
> 
> Yes it can. Since it is (insignificantly) less efficient, I
> chose not to touch it. But I can do it if you like.

I doubt it is less efficient. The compiler has seen
lan9303_read_switch_port() and will probably inline it.  So what the
optimiser gets to see is probably the same in both cases.

Try generating the assembler listing in both cases, and compare them

make drivers/net/dsa/lan9303-core.lst

 Andrew


[PATCH net-next] tcp: tcp_data_queue() cleanup

2017-08-01 Thread Eric Dumazet
From: Eric Dumazet 

Commit c13ee2a4f03f ("tcp: reindent two spots after prequeue removal")
removed code in tcp_data_queue().

We can go a little farther, removing an always true test,
and removing initializers for fragstolen and eaten variables.

Signed-off-by: Eric Dumazet 
Cc: Florian Westphal 
---
 net/ipv4/tcp_input.c |   15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index af0a98d54b627a90d90fb9f1f2a600277c620559..df670d7ed98df1c108dc654cf446344ff9fdd4be 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4564,8 +4564,8 @@ int tcp_send_rcvq(struct sock *sk, struct msghdr *msg, 
size_t size)
 static void tcp_data_queue(struct sock *sk, struct sk_buff *skb)
 {
struct tcp_sock *tp = tcp_sk(sk);
-   bool fragstolen = false;
-   int eaten = -1;
+   bool fragstolen;
+   int eaten;
 
if (TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq) {
__kfree_skb(skb);
@@ -4588,12 +4588,11 @@ static void tcp_data_queue(struct sock *sk, struct 
sk_buff *skb)
 
/* Ok. In sequence. In window. */
 queue_and_out:
-   if (eaten < 0) {
-   if (skb_queue_len(&sk->sk_receive_queue) == 0)
-   sk_forced_mem_schedule(sk, skb->truesize);
-   else if (tcp_try_rmem_schedule(sk, skb, skb->truesize))
-   goto drop;
-   }
+   if (skb_queue_len(&sk->sk_receive_queue) == 0)
+   sk_forced_mem_schedule(sk, skb->truesize);
+   else if (tcp_try_rmem_schedule(sk, skb, skb->truesize))
+   goto drop;
+
eaten = tcp_queue_rcv(sk, skb, 0, &fragstolen);
tcp_rcv_nxt_update(tp, TCP_SKB_CB(skb)->end_seq);
if (skb->len)




Re: [PATCH RFC, iproute2] tc/mirred: Extend the mirred/redirect action to accept additional traffic class parameter

2017-08-01 Thread Roman Mashak
Amritha Nambiar  writes:

[...]

> @@ -72,6 +73,8 @@ parse_direction(struct action_util *a, int *argc_p, char 
> ***argv_p,
>   struct tc_mirred p = {};
>   struct rtattr *tail;
>   char d[16] = {};
> + __u32 flags = 0;
> + __u8 tc;
>  
>   while (argc > 0) {
>  
> @@ -142,6 +145,18 @@ parse_direction(struct action_util *a, int *argc_p, char 
> ***argv_p,
>   argc--;
>   argv++;
>  
> + if ((argc > 0) && (matches(*argv, "tc") == 0)) {
> + NEXT_ARG();
> + tc = atoi(*argv);

Probably better to use strtol() instead: somebody may want to specify the
index in hex, and it also allows stronger error checking.

> + if (tc >= MIRRED_TC_MAP_MAX) {
> + fprintf(stderr, "Invalid TC index\n");
> + return -1;
> + }
> + flags |= MIRRED_F_TC_MAP;
> + ok++;
> + argc--;
> + argv++;
> + }
>   break;
>  
>   }
> @@ -193,6 +208,9 @@ parse_direction(struct action_util *a, int *argc_p, char 
> ***argv_p,
>   tail = NLMSG_TAIL(n);
>   addattr_l(n, MAX_MSG, tca_id, NULL, 0);
>   addattr_l(n, MAX_MSG, TCA_MIRRED_PARMS, &p, sizeof(p));
> + if (flags & MIRRED_F_TC_MAP)
> + addattr_l(n, MAX_MSG, TCA_MIRRED_TC_MAP,
> +   &tc, sizeof(tc));
>   tail->rta_len = (void *) NLMSG_TAIL(n) - (void *) tail;
>  
>   *argc_p = argc;
> @@ -248,6 +266,7 @@ print_mirred(struct action_util *au, FILE * f, struct 
> rtattr *arg)
>   struct tc_mirred *p;
>   struct rtattr *tb[TCA_MIRRED_MAX + 1];
>   const char *dev;
> + __u8 *tc;
>  
>   if (arg == NULL)
>   return -1;
> @@ -273,6 +292,11 @@ print_mirred(struct action_util *au, FILE * f, struct 
> rtattr *arg)
>   fprintf(f, "mirred (%s to device %s)", mirred_n2a(p->eaction), dev);
>   print_action_control(f, " ", p->action, "");
>  

> + if (tb[TCA_MIRRED_TC_MAP]) {
> + tc = RTA_DATA(tb[TCA_MIRRED_TC_MAP]);
> + fprintf(f, " tc %d", *tc);

'tc' is declared as __u8 so format should be %u

> + }
> +
>   fprintf(f, "\n ");
>   fprintf(f, "\tindex %u ref %d bind %d", p->index, p->refcnt,
>   p->bindcnt);
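
One possible shape for the strtol()-based parsing suggested above, as a sketch only. DEMO_TC_MAP_MAX is a placeholder value (the real MIRRED_TC_MAP_MAX is not shown in this thread) and demo_parse_tc() is a hypothetical helper, not an existing iproute2 function.

```c
#include <errno.h>
#include <stdlib.h>

#define DEMO_TC_MAP_MAX 16	/* placeholder limit, not the real constant */

/*
 * Parse a traffic class index. Base 0 lets strtol() accept decimal,
 * octal ("010") and hex ("0x8") input. Returns 0 on success and stores
 * the value in *tc; returns -1 on malformed or out-of-range input.
 */
static int demo_parse_tc(const char *arg, unsigned char *tc)
{
	char *end;
	long val;

	if (!arg || !*arg)
		return -1;

	errno = 0;
	val = strtol(arg, &end, 0);
	if (errno || *end != '\0')
		return -1;		/* overflow or trailing garbage */
	if (val < 0 || val >= DEMO_TC_MAP_MAX)
		return -1;		/* range check on the full-width value */

	*tc = (unsigned char)val;
	return 0;
}
```

The range check runs on the long before the value is narrowed. With atoi() assigned straight into a __u8, an input such as 300 would be truncated to 8 bits before the comparison ever sees it.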


Re: [PATCH net-next 01/11] net: dsa: make EEE ops optional

2017-08-01 Thread Andrew Lunn
Hi Vivien

> @@ -646,38 +646,42 @@ static int dsa_slave_set_eee(struct net_device *dev, 
> struct ethtool_eee *e)
>  {
>   struct dsa_slave_priv *p = netdev_priv(dev);
>   struct dsa_switch *ds = p->dp->ds;
> - int ret;
> + int err = -ENODEV;
>  
> - if (!ds->ops->set_eee)
> - return -EOPNOTSUPP;
> + if (ds->ops->set_eee) {
> + err = ds->ops->set_eee(ds, p->dp->index, p->phy, e);
> + if (err)
> + return err;
> + }
>  
> - ret = ds->ops->set_eee(ds, p->dp->index, p->phy, e);
> - if (ret)
> - return ret;
> + if (p->phy) {
> + err = phy_ethtool_set_eee(p->phy, e);
> + if (err)
> + return err;

I don't think you need this if (err). You unconditionally return err
as you exit the function.

> + }
>  
> - if (p->phy)
> - ret = phy_ethtool_set_eee(p->phy, e);
> -
> - return ret;
> + return err;




>  }
>  
>  static int dsa_slave_get_eee(struct net_device *dev, struct ethtool_eee *e)
>  {
>   struct dsa_slave_priv *p = netdev_priv(dev);
>   struct dsa_switch *ds = p->dp->ds;
> - int ret;
> + int err = -ENODEV;
>  
> - if (!ds->ops->get_eee)
> - return -EOPNOTSUPP;
> + if (ds->ops->get_eee) {
> + err = ds->ops->get_eee(ds, p->dp->index, e);
> + if (err)
> + return err;
> + }
>  
> - ret = ds->ops->get_eee(ds, p->dp->index, e);
> - if (ret)
> - return ret;
> + if (p->phy) {
> + err = phy_ethtool_get_eee(p->phy, e);
> + if (err)
> + return err;

Same here.

> + }
>  
> - if (p->phy)
> - ret = phy_ethtool_get_eee(p->phy, e);
> -
> - return ret;
> + return err;
>  }
>  
>  #ifdef CONFIG_NET_POLL_CONTROLLER
> -- 
> 2.13.3
> 
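
The control flow under review can be reduced to the following stand-alone sketch. The callback type and the demo_ stubs are invented for illustration and only mimic the order of the two calls; they are not the DSA or phylib API.

```c
#include <errno.h>
#include <stddef.h>

typedef int (*demo_op)(void);

/* Example stubs standing in for a switch op and a PHY op. */
static int demo_ok(void)   { return 0; }
static int demo_fail(void) { return -EIO; }

/*
 * Start from -ENODEV, then let the switch op and the PHY op overwrite
 * it. The result of the last step is returned directly, so it needs no
 * separate "if (err) return err;" in front of the final return.
 */
static int demo_set_eee(demo_op switch_op, demo_op phy_op)
{
	int err = -ENODEV;

	if (switch_op) {
		err = switch_op();
		if (err)
			return err;
	}

	if (phy_op)
		err = phy_op();

	return err;
}
```

With neither op present the function reports -ENODEV; otherwise it returns the first failure, or 0 when every step that is present succeeds.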


Re: [PATCH v3] ss: Enclose IPv6 address in brackets

2017-08-01 Thread Phil Sutter
On Tue, Aug 01, 2017 at 12:05:13PM +0200, Florian Lehner wrote:
[...]
> @@ -114,9 +114,13 @@ int addr64_n2a(__u64 addr, char *buff, size_t len);
>  int af_bit_len(int af);
>  int af_byte_len(int af);
> 
> -const char *format_host_r(int af, int len, const void *addr,
> -char *buf, int buflen);
> -const char *format_host(int af, int lne, const void *addr);
> +const char *format_host_rb(int af, int len, const void *addr,
> +char *buf, int buflen, bool *resolved);
> +#define format_host_r(af, len, addr, buf, buflen) \
> + format_host_rb(af, len, addr, buf, buflen, NULL)
> +const char *format_host_b(int af, int lne, const void *addr, bool
> *resolved);
> +#define format_host(af, lne, addr) \
> + format_host_b(af, lne, addr, NULL)
>  #define format_host_rta(af, rta) \
>   format_host(af, RTA_PAYLOAD(rta), RTA_DATA(rta))
>  const char *rt_addr_n2a_r(int af, int len, const void *addr,
> diff --git a/lib/utils.c b/lib/utils.c
> index 9aa3219..42c3bf5 100644
> --- a/lib/utils.c
> +++ b/lib/utils.c
> @@ -898,8 +898,8 @@ static const char *resolve_address(const void *addr,
> int len, int af)
>  }
>  #endif
> 
> -const char *format_host_r(int af, int len, const void *addr,
> - char *buf, int buflen)
> +const char *format_host_rb(int af, int len, const void *addr,
> + char *buf, int buflen, bool *resolved)
>  {
>  #ifdef RESOLVE_HOSTNAMES
>   if (resolve_hosts) {
> @@ -909,17 +909,20 @@ const char *format_host_r(int af, int len, const
> void *addr,
> 
>   if (len > 0 &&
>   (n = resolve_address(addr, len, af)) != NULL)
> + {
> + *resolved = true;
>   return n;
> + }
>   }
>  #endif
>   return rt_addr_n2a_r(af, len, addr, buf, buflen);
>  }

Did you test that? I guess calling format_host() will lead to a
NULL pointer dereference, since the macro passes NULL for 'resolved'.

Cheers, Phil
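
The concern can be reproduced with a small stand-alone sketch. The resolver stub and the demo_ names are invented for this example; the point is only that an optional out-parameter has to be checked before it is written, because the format_host() and format_host_r() macros in the patch pass NULL for it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stub resolver (assumption): knows exactly one address. */
static const char *demo_resolve(const char *addr)
{
	return strcmp(addr, "::1") == 0 ? "localhost" : NULL;
}

/*
 * Format a host, optionally reporting whether it was resolved.
 * "resolved" may be NULL when the caller does not care.
 */
static const char *demo_format_host(const char *addr, bool *resolved)
{
	const char *n = demo_resolve(addr);

	if (resolved)
		*resolved = (n != NULL);

	return n ? n : addr;
}
```

Writing the flag on both paths also means the caller never reads an uninitialized bool when resolution fails.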


RE: [PATCH net-next 00/10] net: l3mdev: Support for sockets bound to enslaved device

2017-08-01 Thread David Laight
From: David Ahern
> Sent: 01 August 2017 04:13
...
> Existing code for socket lookups already pass in 6+ arguments. Rather
> than add another for the enslaved device index, the existing lookups
> are converted to use a new sk_lookup struct. From there, the enslaved
> device index becomes another element of the struct.
> 
> Patch 1 introduces sk_lookup struct and helper.

I guess that socket lookup happens quite often!
Passing the lookup parameters in a structure might have a
measurable negative effect on performance - especially if the
structure isn't passed through to other functions.

Have you made any performance measurements?

David



[PATCH net] xfrm: fix null pointer dereference on state and tmpl sort

2017-08-01 Thread Koichiro Den
Creating sub policy that matches the same outer flow as main policy does
leads to a null pointer dereference if the outer mode's family is ipv4.
For userspace compatibility, this patch just eliminates the crash i.e.,
does not introduce any new sorting rule, which would fruitlessly affect
all but the aforementioned case.

Signed-off-by: Koichiro Den 
---
 net/xfrm/xfrm_state.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index 6c0956d10db6..a792effdb0b5 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -1620,6 +1620,7 @@ int
 xfrm_tmpl_sort(struct xfrm_tmpl **dst, struct xfrm_tmpl **src, int n,
   unsigned short family, struct net *net)
 {
+   int i;
int err = 0;
struct xfrm_state_afinfo *afinfo = xfrm_state_get_afinfo(family);
if (!afinfo)
@@ -1628,6 +1629,9 @@ xfrm_tmpl_sort(struct xfrm_tmpl **dst, struct xfrm_tmpl 
**src, int n,
spin_lock_bh(&net->xfrm.xfrm_state_lock); /*FIXME*/
if (afinfo->tmpl_sort)
err = afinfo->tmpl_sort(dst, src, n);
+   else
+   for (i = 0; i < n; i++)
+   dst[i] = src[i];
spin_unlock_bh(&net->xfrm.xfrm_state_lock);
rcu_read_unlock();
return err;
@@ -1638,6 +1642,7 @@ int
 xfrm_state_sort(struct xfrm_state **dst, struct xfrm_state **src, int n,
unsigned short family)
 {
+   int i;
int err = 0;
struct xfrm_state_afinfo *afinfo = xfrm_state_get_afinfo(family);
struct net *net = xs_net(*src);
@@ -1648,6 +1653,9 @@ xfrm_state_sort(struct xfrm_state **dst, struct xfrm_state **src, int n,
spin_lock_bh(&net->xfrm.xfrm_state_lock);
if (afinfo->state_sort)
err = afinfo->state_sort(dst, src, n);
+   else
+   for (i = 0; i < n; i++)
+   dst[i] = src[i];
spin_unlock_bh(&net->xfrm.xfrm_state_lock);
rcu_read_unlock();
return err;
-- 
2.9.4




Re: [PATCH net 3/3] tcp: fix xmit timer to only be reset if data ACKed/SACKed

2017-08-01 Thread Neal Cardwell
On Tue, Aug 1, 2017 at 8:20 AM, maowenan  wrote:
> > + /* If needed, reset TLP/RTO timer; RACK may later override this. */
> [Mao Wenan] I have a question about RACK: if there is no RACK feature
> in an older kernel version, what clears the FLAG_SET_XMIT_TIMER flag?

In the comment, "this" is referring to the xmit timer. The comment is
referring to the fact that RACK (or similarly
tcp_pause_early_retransmit() in earlier kernels) can override the
value of the xmit timer set in tcp_set_xmit_timer(). The comment is
not claiming that RACK clears the FLAG_SET_XMIT_TIMER bit in "flag".

Note that "flag" is an argument to tcp_ack(). And while processing a
given incoming ACK, the callers of tcp_ack() pass in either 0 or one
or more constant bit flags. So FLAG_SET_XMIT_TIMER and all the other
bit flags are implicitly set to zero at the beginning of processing
each ACK.

neal
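
Editor's sketch of the point made above, not kernel code: because "flag" is passed by value and each caller starts from 0 or a few constant bits, nothing needs to clear a bit such as FLAG_SET_XMIT_TIMER between ACKs. The names ex_ack, ex_conn and the EX_FLAG_* values below are hypothetical and chosen only for illustration; the real tcp_ack() is far more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits; the real kernel values differ. */
#define EX_FLAG_DATA_ACKED      0x01
#define EX_FLAG_SET_XMIT_TIMER  0x02

struct ex_conn {
	int timer_rearm_count;	/* how many times the xmit timer was re-armed */
};

/*
 * Process one ACK. "flag" is a by-value argument, so any bits set while
 * handling this ACK are gone once the function returns; the next call
 * starts again from whatever the caller passes in.
 */
static int ex_ack(struct ex_conn *c, int flag, bool new_data_acked)
{
	assert(c != NULL);

	if (new_data_acked)
		flag |= EX_FLAG_DATA_ACKED | EX_FLAG_SET_XMIT_TIMER;

	if (flag & EX_FLAG_SET_XMIT_TIMER)
		c->timer_rearm_count++;

	return flag;
}
```

Calling ex_ack() twice with flag == 0, first with new data ACKed and then without, re-arms the timer only on the first call; no explicit "clear" step is involved.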


Re: [PATCH net-next 10/11] net: dsa: mv88e6xxx: remove EEE support

2017-08-01 Thread Andrew Lunn
On Mon, Jul 31, 2017 at 06:17:18PM -0400, Vivien Didelot wrote:
> The PHY's EEE settings are already accessed by the DSA layer through the
> Marvell PHY driver and there is nothing to be done for switch's MACs.

I'm confused, or missing something. Does not patch #1 mean that if the
DSA driver does not have a set_eee function, we always return -ENODEV
in slave.c?

There might be nothing to configure here, but some of the switches do
support EEE. So we need at least a NOP set_eee. Better still it should
return -ENODEV for those switches which don't actually support EEE,
and 0 for those that do?

Andrew


Re: [PATCH net-next 00/11] net: dsa: rework EEE support

2017-08-01 Thread Andrew Lunn
On Mon, Jul 31, 2017 at 06:17:08PM -0400, Vivien Didelot wrote:
> EEE implies configuring the port's PHY and MAC of both ends of the wire.
> 
> The current EEE support in DSA mixes PHY and MAC configuration, which is
> bad because PHYs must be configured through a proper PHY driver. The DSA
> switch operations for EEE are only meant for configuring the port's MAC,
> which are integrated in the Ethernet switch device.
> 
> This patchset fixes the EEE support in qca8k driver, makes the DSA layer
> call phy_init_eee for all drivers, and remove the EEE support from the
> mv88e6xxx driver since the Marvell PHY driver should be enough for it.

Hi Vivien

Thanks for working on this. I like the general direction this takes,
moving the repeated code into slave.c

   Andrew


Re: [PATCH RFC 08/13] phylink: add phylink infrastructure

2017-08-01 Thread Andrew Lunn
On Tue, Jul 25, 2017 at 03:03:13PM +0100, Russell King wrote:
> The link between the ethernet MAC and its PHY has become more complex
> as the interface evolves.  This is especially true with serdes links,
> where part of the PHY is effectively integrated into the MAC.
> 
> Serdes links can be connected to a variety of devices, including SFF
> modules soldered down onto the board with the MAC, a SFP cage with
> a hotpluggable SFP module which may contain a PHY or directly modulate
> the serdes signals onto optical media with or without a PHY, or even
> a classical PHY connection.
> 
> Moreover, the negotiation information on serdes links comes in two
> varieties - SGMII mode, where the PHY provides its speed/duplex/flow
> control information to the MAC, and 1000base-X mode where both ends
> exchange their abilities and each resolve the link capabilities.
> 
> This means we need a more flexible means to support these arrangements,
> particularly with the hotpluggable nature of SFP, where the PHY can
> be attached or detached after the network device has been brought up.
> 
> Ethtool information can come from multiple sources:
> - we may have a PHY operating in either SGMII or 1000base-X mode, in
>   which case we take ethtool/mii data directly from the PHY.
> - we may have an optical SFP module without a PHY, with the MAC
>   operating in 1000base-X mode - the ethtool/mii data needs to come
>   from the MAC.
> - we may have a copper SFP module with a PHY which can't be accessed,
>   which means we need to take ethtool/mii data from the MAC.
> 
> Phylink aims to solve this by providing an intermediary between the
> MAC and PHY, providing a safe way for PHYs to be hotplugged, and
> allowing a SFP driver to reconfigure the serdes connection.
> 
> Phylink also takes over support of fixed link connections, where the
> speed/duplex/flow control are fixed, but link status may be controlled
> by a GPIO signal.  By avoiding the fixed-phy implementation, phylink
> can provide a faster response to link events: fixed-phy has to wait for
> phylib to operate its state machine, which can take several seconds.
> In comparison, phylink takes milliseconds.
> 
> Signed-off-by: Russell King 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH RFC 09/13] sfp: add sfp-bus to bridge between network devices and sfp cages

2017-08-01 Thread Andrew Lunn
On Tue, Jul 25, 2017 at 03:03:18PM +0100, Russell King wrote:
> Signed-off-by: Russell King 

Reviewed-by: Andrew Lunn 

Andrew


Re: [PATCH net 2/3] tcp: enable xmit timer fix by having TLP use time when RTO should fire

2017-08-01 Thread Neal Cardwell
On Tue, Aug 1, 2017 at 3:22 AM, Eric Dumazet  wrote:
> On Mon, 2017-07-31 at 22:58 -0400, Neal Cardwell wrote:
>> @@ -2418,13 +2418,9 @@ bool tcp_schedule_loss_probe(struct sock *sk)
>>   timeout = max_t(u32, timeout, msecs_to_jiffies(10));
>>
>>   /* If RTO is shorter, just schedule TLP in its place. */
>
> I have a hard time reading this comment.
>
> We are here trying to arm a timer based on TLP.
>
> If RTO is shorter, we'll arm the timer based on RTO instead of TLP.
>
> Is "If RTO is shorter, just schedule TLP in its place." really correct ?
>
> I suggest we reword the comment or simply get rid of it now the code is
> more obvious.

OK, how about:

  /* If the RTO formula yields an earlier time, then use that time. */

We can also add a reference to the RACK/TLP Internet Draft at the top
of tcp_schedule_loss_probe().

Whatever wording we decide on, I am happy to send a patch for net-next
once this fix is merged into net-next.

neal


Re: [PATCH RFC 00/13] phylink and sfp support

2017-08-01 Thread Andrew Lunn
On Tue, Jul 25, 2017 at 03:01:39PM +0100, Russell King - ARM Linux wrote:
> Hi,
> 
> This patch series introduces generic support for SFP sockets found on
> various Marvell based platforms.  The idea here is to provide common
> SFP socket support which can be re-used by network drivers as
> appropriate, rather than each network driver having to re-implement
> SFP socket support.

There is a lot of code here, and I'm not really going to understand it
until I use it. I have a couple of boards with SFFs connected to
switches, so I will spend some time over the next month or so to make
DSA use phylink. As is usual, we can sort out any issues as we go
along.

David, if it still applies cleanly, can you add it to net-next?

   Thanks
Andrew


Re: [PATCH] ss: Enclose IPv6 address in brackets

2017-08-01 Thread Stephen Hemminger
On Tue, 1 Aug 2017 11:11:03 +
David Laight  wrote:

> From: Florian Lehner
> > Sent: 29 July 2017 13:29
> > This patch adds support for RFC2732 IPv6 address format with brackets
> > for the tool ss. So output for ss changes from
> > 2a00:1450:400a:804::200e:443 to [2a00:1450:400a:804::200e]:443 for IPv6
> > addresses with attached port number.
> > 
> > Signed-off-by: Lehner Florian 
> > ---
> >  misc/ss.c | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/misc/ss.c b/misc/ss.c
> > index 12763c9..db39c93 100644
> > --- a/misc/ss.c
> > +++ b/misc/ss.c
> > @@ -1059,7 +1059,11 @@ static void inet_addr_print(const inet_prefix *a, int port, unsigned int ifindex
> > ap = format_host(AF_INET, 4, a->data);
> > }
> > } else {
> > -   ap = format_host(a->family, 16, a->data);
> > +   if (a->family == AF_INET6) {
> > +   sprintf(buf, "[%s]", format_host(a->family, 16, a->data));
> > +   } else {
> > +   ap = format_host(a->family, 16, a->data);
> > +   }
> > est_len = strlen(ap);  
> ...
> 
> There are some strange things going on with global variables if this works at all.
> The text form of the address is in buf[] in one path and *ap in the other.
> 
> One option might be to call format_host() then use strchr(ap, ':')
> to add [] if the string contains any ':'.
> 
>   David
> 

That sounds like a better solution.

Also, what about IN6ADDR_ANY?
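
Editor's sketch of the strchr() idea suggested above. This is only an illustration under stated assumptions: ex_format_addr_port is a hypothetical helper, not a function in ss or lib/utils.c, and it takes the already formatted host string (for example what format_host() returned) plus the port as text.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write "host:port" into buf, wrapping host in [] when it contains a ':'
 * (that is, when it looks like a numeric IPv6 address).
 * Returns the snprintf() result. Hypothetical helper, not part of ss.
 */
static int ex_format_addr_port(char *buf, size_t len,
			       const char *host, const char *port)
{
	assert(buf && host && port);

	if (strchr(host, ':'))
		return snprintf(buf, len, "[%s]:%s", host, port);
	return snprintf(buf, len, "%s:%s", host, port);
}
```

With this rule a resolved host name or an IPv4 address is left alone, since neither contains a ':'. The numeric form of IN6ADDR_ANY ("::") does contain one and would be bracketed; whether that is the desired output for the wildcard address is exactly the open question raised above.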


Re: [PATCH v2 net-next 1/3] net: dsa: lan9303: Refactor lan9303_xxx_packet_processing()

2017-08-01 Thread Egil Hjelmeland

On 01. aug. 2017 16:02, Andrew Lunn wrote:
> On Tue, Aug 01, 2017 at 03:50:14PM +0200, Egil Hjelmeland wrote:
>> On 01. aug. 2017 15:39, Andrew Lunn wrote:
>>>> @@ -704,7 +710,7 @@ static void lan9303_get_ethtool_stats(struct dsa_switch *ds, int port,
>>>> unsigned int u, poff;
>>>> int ret;
>>>> -   poff = port * 0x400;
>>>> +   poff = LAN9303_SWITCH_PORT_REG(port, 0);
>>>> for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
>>>> ret = lan9303_read_switch_reg(chip,
>>>
>>> So the actual code is:
>>>
>>> for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
>>>     ret = lan9303_read_switch_reg(chip,
>>>                                   lan9303_mib[u].offset + poff,
>>>                                   &reg);
>>>
>>> Could this be written as
>>>
>>> for (u = 0; u < ARRAY_SIZE(lan9303_mib); u++) {
>>>     ret = lan9303_read_switch_port(chip, port, lan9303_mib[u].offset, &reg);
>>>
>>> It is then clear you are reading the statistics from a port register.
>>>
>>> Andrew
>>
>> Yes it can. Since it is (insignificantly) less efficient, I
>> chose not to touch it. But I can do it if you like.
>
> I doubt it is less efficient. The compiler has seen
> lan9303_read_switch_port() and will probably inline it.  So what the
> optimiser gets to see is probably the same in both cases.
>
> Try generating the assembler listing in both cases, and compare them
>
> make drivers/net/dsa/lan9303-core.lst
>
>   Andrew



Thanks for the tip about generating an assembler listing; it can be
useful another time. But in this case I trust you :-)
And in this case it does not really matter, because it's not in the
data path.

I did try to look at the listing. But I did not quite understand it.
Looks like it is doing both inlining and unrolling.

Anyway, you just decide how you like to have it in this series.

Egil





Re: [RFC net-next] net ipv6: convert fib6_table rwlock to a percpu lock

2017-08-01 Thread Stephen Hemminger
On Mon, 31 Jul 2017 19:57:04 -0700
Shaohua Li  wrote:

> On Mon, Jul 31, 2017 at 04:10:07PM -0700, Stephen Hemminger wrote:
> > On Mon, 31 Jul 2017 10:18:57 -0700
> > Shaohua Li  wrote:
> >   
> > > From: Shaohua Li 
> > > 
> > > In a syn flooding test, the fib6_table rwlock is a significant
> > > bottleneck. Converting the rwlock to RCU sounds straightforward,
> > > but is very challenging, if it is possible at all. A percpu spinlock is
> > > quite trivial for this problem since updating the routing table is a rare
> > > event. In my test, the server receives around 1.5 Mpps in the syn flooding
> > > test without the patch on a dual-socket, 56-CPU system. With the
> > > patch, the server receives around 3.8 Mpps, and perf report doesn't show
> > > the locking issue.
> > > 
> > > Cc: Wei Wang   
> > 
> > You just reinvented brlock...

It was a long time ago (2.4) that brlock came in
 https://lwn.net/Articles/378781/

I removed it in 2.5.64 or so.



Re: [PATCH net-next 00/10] net: l3mdev: Support for sockets bound to enslaved device

2017-08-01 Thread David Ahern
On 8/1/17 8:15 AM, David Laight wrote:
> From: David Ahern
>> Sent: 01 August 2017 04:13
> ...
>> Existing code for socket lookups already pass in 6+ arguments. Rather
>> than add another for the enslaved device index, the existing lookups
>> are converted to use a new sk_lookup struct. From there, the enslaved
>> device index becomes another element of the struct.
>>
>> Patch 1 introduces sk_lookup struct and helper.
> 
> I guess that socket lookup happens quite often!
> Passing the lookup parameters in a structure might have a
> measurable negative effect on performance - especially if the
> structure isn't passed through to other functions.
> 
> Have you made any performance measurements?


Before patches:

IPv4
   Test   TCP_RR   23769.42 23862.59 23867.69  sum 71499.70  avg  23833
   Test  TCP_CRR    8649.29  8650.94  8661.24  sum 25961.47  avg   8653
   Test   UDP_RR   26935.38 26813.30 26747.88  sum 80496.56  avg  26832

IPv6
   Test   TCP_RR   24708.10 24629.43 24593.75  sum 73931.28  avg  24643
   Test  TCP_CRR    8432.82  8489.26  8474.82  sum 25396.90  avg   8465
   Test   UDP_RR   23607.57 23722.37 23713.80  sum 71043.74  avg  23681


After patches:

IPv4
   Test   TCP_RR   24204.41 23993.05 24129.18  sum 72326.64  avg  24108
   Test  TCP_CRR    8690.31  8630.12  8620.88  sum 25941.31  avg   8647
   Test   UDP_RR   26653.26 26725.76 26587.70  sum 79966.72  avg  26655

IPv6
   Test   TCP_RR   24807.54 24698.30 24849.84  sum 74355.68  avg  24785
   Test  TCP_CRR    8573.22  8640.02  8624.09  sum 25837.33  avg   8612
   Test   UDP_RR   23800.14 23747.01 23814.94  sum 71362.09  avg  23787

The middle columns are the results of each 30-second run; the sum and the
average of the three runs are at the end.
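
Editor's sketch for reading the numbers above: the "avg" column is the mean of the three runs, and a before/after comparison can be expressed as a relative change in percent. The helper names ex_mean and ex_rel_change_pct are hypothetical, and this is plain arithmetic on the posted figures rather than any additional measurement.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n samples. */
static double ex_mean(const double *v, size_t n)
{
	double sum = 0.0;
	size_t i;

	assert(v && n > 0);
	for (i = 0; i < n; i++)
		sum += v[i];
	return sum / (double)n;
}

/* Relative change from "before" to "after", in percent. */
static double ex_rel_change_pct(double before, double after)
{
	assert(before != 0.0);
	return (after - before) / before * 100.0;
}
```

For example, feeding the three IPv4 TCP_RR runs from each table into ex_mean() and the two means into ex_rel_change_pct() gives a difference of roughly one percent between the two sets of runs.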



Re: [PATCH] Adding Agile-SD TCP module and modifying Kconfig and Makefile to configure the kernel for this new module to configure the kernel for this new module.

2017-08-01 Thread Stephen Hemminger
On Tue,  1 Aug 2017 17:50:05 +0800
mohamedalrshah  wrote:

> From: Mohamed Alrshah 
> 

Please add more background on Agile-SD in the email commit message.


> +static struct tcp_congestion_ops agilesdtcp __read_mostly = {
> + .init   = agilesdtcp_init,
> + .ssthresh   = agilesdtcp_recalc_ssthresh,   //REQUIRED
> + .cong_avoid = agilesdtcp_cong_avoid,//REQUIRED
> + .set_state  = agilesdtcp_state,
> + .undo_cwnd  = agilesdtcp_undo_cwnd,
> + .pkts_acked = agilesdtcp_acked,
> + .owner  = THIS_MODULE,
> + .name   = "agilesd",//REQUIRED
> + //.min_cwnd = agilesdtcp_min_cwnd,  //NOT REQUIRED
> +};

Your code must match kernel coding style. See Documentation for more
info and use the checkpatch.pl checking script.


RE: [PATCH RFC, iproute2] tc/mirred: Extend the mirred/redirect action to accept additional traffic class parameter

2017-08-01 Thread David Laight
From: Stephen Hemminger
> Sent: 01 August 2017 04:52
> On Mon, 31 Jul 2017 17:40:50 -0700
> Amritha Nambiar  wrote:
> The concept is fine, but the code looks different than the rest which
> is never a good sign.
> 
> 
> > +   if ((argc > 0) && (matches(*argv, "tc") == 0)) {
> 
> Extra () are unnecessary in compound conditional.
> 
> > +   tc = atoi(*argv);
> 
> Prefer using strtoul since it has better error handling than atoi()
> 
> > +   argc--;
> > +   argv++;
> > +   }
> 
> 
> Use NEXT_ARG() construct like rest of the code.

Why bother faffing about with argc at all?
The argument list terminates when *argv == NULL.

David
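
Editor's sketch combining the two review comments above: parse the number with strtoul() and full error checking instead of atoi(), and walk argv until the terminating NULL instead of tracking argc. The helpers ex_parse_ulong and ex_find_tc are hypothetical, and strcmp() stands in for iproute2's matches(), which is an assumption made only to keep the example self-contained.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse a base-10 unsigned value; returns 0 on success, -1 on any error. */
static int ex_parse_ulong(const char *s, unsigned long *out)
{
	char *end;

	assert(s && out);
	/* strtoul() itself accepts leading spaces and signs; reject them here. */
	if (!isdigit((unsigned char)*s))
		return -1;
	errno = 0;
	*out = strtoul(s, &end, 10);
	/* Fail on overflow (ERANGE) or on trailing garbage such as "5x". */
	if (errno != 0 || *end != '\0')
		return -1;
	return 0;
}

/*
 * Look for "tc <number>" in a NULL-terminated argv, without using argc.
 * Returns 0 and stores the value if found and valid, -1 otherwise.
 */
static int ex_find_tc(char **argv, unsigned long *tc)
{
	assert(argv && tc);

	for (; *argv; argv++) {
		if (strcmp(*argv, "tc") == 0 && argv[1])
			return ex_parse_ulong(argv[1], tc);
	}
	return -1;
}
```

Unlike atoi(), which silently returns 0 for input it cannot parse, this version can tell "tc 0" apart from "tc foo".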



[PATCH iproute2 v2] ip: change flag names to an array

2017-08-01 Thread Stephen Hemminger
For most of the address flags, use a table of values rather
than open coding every value.  This allows for easier inevitable
expansion of flags.

This also fixes the missing stable-privacy flag.

Signed-off-by: Stephen Hemminger 
---
v2 - use name/value table rather than assuming order
 handle negative masks "-deprecated"

 ip/ipaddress.c | 184 -
 1 file changed, 89 insertions(+), 95 deletions(-)

diff --git a/ip/ipaddress.c b/ip/ipaddress.c
index cf8ef8186f52..4d37c5e04507 100644
--- a/ip/ipaddress.c
+++ b/ip/ipaddress.c
@@ -1012,14 +1012,95 @@ static unsigned int get_ifa_flags(struct ifaddrmsg *ifa,
ifa->ifa_flags;
 }
 
+/* Mapping from argument to address flag mask */
+struct {
+   const char *name;
+   unsigned long value;
+} ifa_flag_names[] = {
+   { "secondary",  IFA_F_SECONDARY },
+   { "temporary",  IFA_F_SECONDARY },
+   { "nodad",  IFA_F_NODAD },
+   { "optimistic", IFA_F_OPTIMISTIC },
+   { "dadfailed",  IFA_F_DADFAILED },
+   { "home",   IFA_F_HOMEADDRESS },
+   { "deprecated", IFA_F_DEPRECATED },
+   { "tentative",  IFA_F_TENTATIVE },
+   { "permanent",  IFA_F_PERMANENT },
+   { "mngtmpaddr", IFA_F_MANAGETEMPADDR },
+   { "noprefixroute",  IFA_F_NOPREFIXROUTE },
+   { "autojoin",   IFA_F_MCAUTOJOIN },
+   { "stable-privacy", IFA_F_STABLE_PRIVACY },
+};
+
+static void print_ifa_flags(FILE *fp, const struct ifaddrmsg *ifa,
+   unsigned int flags)
+{
+   unsigned int i;
+
+   for (i = 0; i < ARRAY_SIZE(ifa_flag_names); i++) {
+   unsigned long mask = ifa_flag_names[i].value;
+
+   if (mask == IFA_F_PERMANENT) {
+   if (!(flags & mask))
+   fprintf(fp, "dynamic ");
+   } else if (flags & mask) {
+   if (mask == IFA_F_SECONDARY &&
+   ifa->ifa_family == AF_INET6)
+   fprintf(fp, "temporary ");
+   else
+   fprintf(fp, "%s ", ifa_flag_names[i].name);
+   }
+
+   flags &= ~mask;
+   }
+
+   if (flags)
+   fprintf(fp, "flags %02x ", flags);
+
+}
+
+static int get_filter(const char *arg)
+{
+   unsigned int i;
+
+   /* Special cases */
+   if (strcmp(arg, "dynamic") == 0) {
+   filter.flags &= ~IFA_F_PERMANENT;
+   filter.flagmask |= IFA_F_PERMANENT;
+   } else if (strcmp(arg, "primary") == 0) {
+   filter.flags &= ~IFA_F_SECONDARY;
+   filter.flagmask |= IFA_F_SECONDARY;
+   } else if (*arg == '-') {
+   for (i = 0; i < ARRAY_SIZE(ifa_flag_names); i++) {
+   if (strcmp(arg + 1, ifa_flag_names[i].name))
+   continue;
+
+   filter.flags &= ~ifa_flag_names[i].value;
+   filter.flagmask |= ifa_flag_names[i].value;
+   return 0;
+   }
+
+   return -1;
+   } else {
+   for (i = 0; i < ARRAY_SIZE(ifa_flag_names); i++) {
+   if (strcmp(arg, ifa_flag_names[i].name))
+   continue;
+   filter.flags |= ifa_flag_names[i].value;
+   filter.flagmask |= ifa_flag_names[i].value;
+   return 0;
+   }
+   return -1;
+   }
+
+   return 0;
+}
+
 int print_addrinfo(const struct sockaddr_nl *who, struct nlmsghdr *n,
   void *arg)
 {
FILE *fp = arg;
struct ifaddrmsg *ifa = NLMSG_DATA(n);
int len = n->nlmsg_len;
-   int deprecated = 0;
-   /* Use local copy of ifa_flags to not interfere with filtering code */
unsigned int ifa_flags;
struct rtattr *rta_tb[IFA_MAX+1];
 
@@ -1144,52 +1225,9 @@ int print_addrinfo(const struct sockaddr_nl *who, struct nlmsghdr *n,
  rta_tb[IFA_ANYCAST]));
}
fprintf(fp, "scope %s ", rtnl_rtscope_n2a(ifa->ifa_scope, b1, sizeof(b1)));
-   if (ifa_flags & IFA_F_SECONDARY) {
-   ifa_flags &= ~IFA_F_SECONDARY;
-   if (ifa->ifa_family == AF_INET6)
-   fprintf(fp, "temporary ");
-   else
-   fprintf(fp, "secondary ");
-   }
-   if (ifa_flags & IFA_F_TENTATIVE) {
-   ifa_flags &= ~IFA_F_TENTATIVE;
-   fprintf(fp, "tentative ");
-   }
-   if (ifa_flags & IFA_F_DEPRECATED) {
-   ifa_flags &= ~IFA_F_DEPRECATED;
-   deprecated = 1;
-   fprintf(fp, "deprecated ");
-   }
-   if (ifa_flags & IFA_F_HOMEADDRESS) {
-   ifa_flags &

Re: [PATCH net-next 10/11] net: dsa: mv88e6xxx: remove EEE support

2017-08-01 Thread Vivien Didelot
Hi Andrew,

Andrew Lunn  writes:

> On Mon, Jul 31, 2017 at 06:17:18PM -0400, Vivien Didelot wrote:
>> The PHY's EEE settings are already accessed by the DSA layer through the
>> Marvell PHY driver and there is nothing to be done for switch's MACs.
>
> I'm confused, or missing something. Does not patch #1 mean that if the
> DSA driver does not have a set_eee function, we always return -ENODEV
> in slave.c?

If there is a PHY, phy_init_eee (if eee_enabled is true) and
phy_ethtool_set_eee are called. If there is a .set_eee op, it is
called. If both are absent, -ENODEV is returned.

> There might be nothing to configure here, but some of the switches do
> support EEE. So we need at least a NOP set_eee. Better still it should
> return -ENODEV for those switches which don't actually support EEE,
> and 0 for those that do?

As I explain in a commit message, I didn't want to make the EEE ops
mandatory, because it makes it impossible for the DSA layer to
distinguish whether the driver did not update the ethtool_eee structure
because there is nothing to do on the port's MAC side (e.g. mv88e6xxx or
qca8k) or if it returned EEE disabled. To avoid confusion, I preferred to
make the ops optional, making the phy_* calls enough in the first case.

That being said, if you don't share this point of view and prefer to
define an inline dsa_set_eee_noop() function, I don't mind, since this
allows the DSA layer to make the distinction.


Thanks,

Vivien
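
Editor's sketch of the dispatch behaviour described in the message above: a PHY and the MAC-side .set_eee op are each optional, and -ENODEV is returned only when both are absent. All types and names here (ex_eee, ex_phy, ex_port, ex_slave_set_eee) are hypothetical stand-ins, the counters merely mark where phy_init_eee() and phy_ethtool_set_eee() would be called, and the ordering of the MAC and PHY steps is an assumption; the patchset itself is the authority.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures discussed above. */
struct ex_eee { bool eee_enabled; };
struct ex_phy { int init_calls; int set_calls; };
struct ex_port {
	struct ex_phy *phy;	/* NULL if no PHY is attached */
	int (*set_eee)(struct ex_port *p, struct ex_eee *e); /* optional MAC op */
};

static int ex_slave_set_eee(struct ex_port *p, struct ex_eee *e)
{
	int err;

	assert(p && e);

	/* Neither a PHY nor a MAC op: nothing can handle the request. */
	if (!p->phy && !p->set_eee)
		return -ENODEV;

	/* Optional MAC-side configuration. */
	if (p->set_eee) {
		err = p->set_eee(p, e);
		if (err)
			return err;
	}

	/* PHY-side configuration goes through the PHY driver. */
	if (p->phy) {
		if (e->eee_enabled)
			p->phy->init_calls++;	/* stands in for phy_init_eee() */
		p->phy->set_calls++;		/* stands in for phy_ethtool_set_eee() */
	}
	return 0;
}
```

The sketch also shows the ambiguity under discussion: with an optional op, a NULL set_eee cannot distinguish "the MAC supports EEE and needs no configuration" from "the MAC does not support EEE".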


[PATCH iproute2] netns: make /var/run/netns bind-mount recursive

2017-08-01 Thread Casey Callendrello
When ip netns {add|delete} is first run, it bind-mounts /var/run/netns
on top of itself, then marks it as shared. However, if there are already
bind-mounts in the directory from other tools, these would not be
propagated. Fix this by recursively bind-mounting.

Signed-off-by: Casey Callendrello 
---
 ip/ipnetns.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ip/ipnetns.c b/ip/ipnetns.c
index 198e9de8..9ee1fe6a 100644
--- a/ip/ipnetns.c
+++ b/ip/ipnetns.c
@@ -636,7 +636,7 @@ static int netns_add(int argc, char **argv)
}
 
/* Upgrade NETNS_RUN_DIR to a mount point */
-   if (mount(NETNS_RUN_DIR, NETNS_RUN_DIR, "none", MS_BIND, NULL)) {
+   if (mount(NETNS_RUN_DIR, NETNS_RUN_DIR, "none", MS_BIND | MS_REC, NULL)) {
fprintf(stderr, "mount --bind %s %s failed: %s\n",
NETNS_RUN_DIR, NETNS_RUN_DIR, strerror(errno));
return -1;
-- 
2.13.3



[PATCH 2/2] usb: qmi_wwan: add D-Link DWM-222 device ID

2017-08-01 Thread Hector Martin
Cc: sta...@vger.kernel.org
Signed-off-by: Hector Martin 
---
 drivers/net/usb/qmi_wwan.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/usb/qmi_wwan.c b/drivers/net/usb/qmi_wwan.c
index 5894e3c9468f..ff6f39fe6c00 100644
--- a/drivers/net/usb/qmi_wwan.c
+++ b/drivers/net/usb/qmi_wwan.c
@@ -1175,6 +1175,7 @@ static const struct usb_device_id products[] = {
{QMI_FIXED_INTF(0x19d2, 0x1428, 2)},/* Telewell TW-LTE 4G v2 */
{QMI_FIXED_INTF(0x19d2, 0x2002, 4)},/* ZTE (Vodafone) K3765-Z */
{QMI_FIXED_INTF(0x2001, 0x7e19, 4)},/* D-Link DWM-221 B1 */
+   {QMI_FIXED_INTF(0x2001, 0x7e35, 4)},/* D-Link DWM-222 */
{QMI_FIXED_INTF(0x0f3d, 0x68a2, 8)},/* Sierra Wireless MC7700 */
{QMI_FIXED_INTF(0x114f, 0x68a2, 8)},/* Sierra Wireless MC7750 */
{QMI_FIXED_INTF(0x1199, 0x68a2, 8)},/* Sierra Wireless MC7710 in QMI mode */
-- 
2.13.3



Re: [PATCH 1/2] ipv6: constify inet6_protocol structures

2017-08-01 Thread David Ahern
On 7/31/17 11:59 PM, Julia Lawall wrote:
>> This change breaks the kernel if one of these sysctls are changed:
>> tcp_early_demux, udp_early_demux
> 
> The other patch in the series has the same problem and should be dropped
> too.
> 
> julia

Julia: are you going to send a revert patch? Right now I have to do that
manually before launching test scripts.


Re: [PATCH net-next 10/11] net: dsa: mv88e6xxx: remove EEE support

2017-08-01 Thread Andrew Lunn
On Tue, Aug 01, 2017 at 11:36:13AM -0400, Vivien Didelot wrote:
> Hi Andrew,
> 
> Andrew Lunn  writes:
> 
> > On Mon, Jul 31, 2017 at 06:17:18PM -0400, Vivien Didelot wrote:
> >> The PHY's EEE settings are already accessed by the DSA layer through the
> >> Marvell PHY driver and there is nothing to be done for switch's MACs.
> >
> > I'm confused, or missing something. Does not patch #1 mean that if the
> > DSA driver does not have a set_eee function, we always return -ENODEV
> > in slave.c?
> 
> If there is a PHY, phy_init_eee (if eee_enabled is true) and
> phy_ethtool_set_eee is called. If there is a .set_eee op, it is
> called. If both are absent, -ENODEV is returned.

O.K., I don't think that is correct. EEE should only be enabled if both
the MAC and the PHY support it. We need some way for the MAC to
indicate it does not support EEE.

If set_eee is optional the meaning of a NULL pointer is that the MAC
does support EEE. So for mv88e6060, lan9303, microchip and mt7530
which currently don't support EEE, you need to add a set_eee which
returns -ENODEV.

Having to implement the op to say you don't implement the feature just
seems wrong.

  Andrew


Re: [PATCH 1/2] ipv6: constify inet6_protocol structures

2017-08-01 Thread Julia Lawall


On Tue, 1 Aug 2017, David Ahern wrote:

> On 7/31/17 11:59 PM, Julia Lawall wrote:
> >> This change breaks the kernel if one of these sysctls are changed:
> >> tcp_early_demux, udp_early_demux
> >
> > The other patch in the series has the same problem and should be dropped
> > too.
> >
> > julia
>
> Julia: are you going to send a revert patch? Right now I have to do that
> manually before launching test scripts.

Sorry, I didn't know it was applied.  I can send it.

julia


Re: [PATCH v3] ss: Enclose IPv6 address in brackets

2017-08-01 Thread Florian Lehner


On 08/01/2017 04:11 PM, Phil Sutter wrote:
> On Tue, Aug 01, 2017 at 12:05:13PM +0200, Florian Lehner wrote:
> [...]
>> @@ -114,9 +114,13 @@ int addr64_n2a(__u64 addr, char *buff, size_t len);
>>  int af_bit_len(int af);
>>  int af_byte_len(int af);
>>
>> -const char *format_host_r(int af, int len, const void *addr,
>> -   char *buf, int buflen);
>> -const char *format_host(int af, int lne, const void *addr);
>> +const char *format_host_rb(int af, int len, const void *addr,
>> +   char *buf, int buflen, bool *resolved);
>> +#define format_host_r(af, len, addr, buf, buflen) \
>> +format_host_rb(af, len, addr, buf, buflen, NULL)
>> +const char *format_host_b(int af, int lne, const void *addr, bool *resolved);
>> +#define format_host(af, lne, addr) \
>> +format_host_b(af, lne, addr, NULL)
>>  #define format_host_rta(af, rta) \
>>  format_host(af, RTA_PAYLOAD(rta), RTA_DATA(rta))
>>  const char *rt_addr_n2a_r(int af, int len, const void *addr,
>> diff --git a/lib/utils.c b/lib/utils.c
>> index 9aa3219..42c3bf5 100644
>> --- a/lib/utils.c
>> +++ b/lib/utils.c
>> @@ -898,8 +898,8 @@ static const char *resolve_address(const void *addr, int len, int af)
>>  }
>>  #endif
>>
>> -const char *format_host_r(int af, int len, const void *addr,
>> -char *buf, int buflen)
>> +const char *format_host_rb(int af, int len, const void *addr,
>> +char *buf, int buflen, bool *resolved)
>>  {
>>  #ifdef RESOLVE_HOSTNAMES
>>  if (resolve_hosts) {
>> @@ -909,17 +909,20 @@ const char *format_host_r(int af, int len, const void *addr,
>>
>>  if (len > 0 &&
>>  (n = resolve_address(addr, len, af)) != NULL)
>> +{
>> +*resolved = true;
>>  return n;
>> +}
>>  }
>>  #endif
>>  return rt_addr_n2a_r(af, len, addr, buf, buflen);
>>  }
> 
> Did you test that? I guess calling format_host() will lead to
> dereference of a NULL pointer.

Yes, I did. And it just worked.
David Laight suggested using strchr(). I will try that instead of
changing things in lib/*.


Re: [PATCH net-next] net: bcmgenet: drop COMPILE_TEST dependency

2017-08-01 Thread Florian Fainelli
On 08/01/2017 04:50 AM, Arnd Bergmann wrote:
> The last patch added the dependency on 'OF && HAS_IOMEM' but left
> COMPILE_TEST as an alternative, which kind of defeats the purpose
> of adding the dependency, we still get randconfig build warnings:
> 
> warning: (NET_DSA_BCM_SF2 && BCMGENET) selects MDIO_BCM_UNIMAC which has unmet direct dependencies (NETDEVICES && MDIO_BUS && HAS_IOMEM && OF_MDIO)
> 
> For compile-testing purposes, we don't really need this anyway,
> as CONFIG_OF can be enabled on all architectures, and HAS_IOMEM
> is present on all architectures we do meaningful compile-testing on
> (the exception being arch/um).
> 
> This makes both OF and HAS_IOMEM hard dependencies.
> 
> Fixes: 5af74bb4fcf8 ("net: bcmgenet: Add dependency on HAS_IOMEM && OF")
> Signed-off-by: Arnd Bergmann 

Reviewed-by: Florian Fainelli 

Thanks!
-- 
Florian


Re: [PATCH net-next 10/11] net: dsa: mv88e6xxx: remove EEE support

2017-08-01 Thread Florian Fainelli
On 08/01/2017 09:06 AM, Andrew Lunn wrote:
> On Tue, Aug 01, 2017 at 11:36:13AM -0400, Vivien Didelot wrote:
>> Hi Andrew,
>>
>> Andrew Lunn  writes:
>>
>>> On Mon, Jul 31, 2017 at 06:17:18PM -0400, Vivien Didelot wrote:
>>>> The PHY's EEE settings are already accessed by the DSA layer through the
>>>> Marvell PHY driver and there is nothing to be done for switch's MACs.
>>>
>>> I'm confused, or missing something. Does not patch #1 mean that if the
>>> DSA driver does not have a set_eee function, we always return -ENODEV
>>> in slave.c?
>>
>> If there is a PHY, phy_init_eee (if eee_enabled is true) and
>> phy_ethtool_set_eee is called. If there is a .set_eee op, it is
>> called. If both are absent, -ENODEV is returned.
> 
> O.K, i don't think that is correct. EEE should only be enabled if both
> the MAC and the PHY supports it. We need some way for the MAC to
> indicate it does not support EEE.

If the MAC does not support EEE but the PHY does I think you can still
allow EEE to be advertised and enabled, you just won't have the MAC be
able to leverage the power savings that EEE brings. AFAICT this is still
a valid mode whereby the PHY is put in a lower power mode, just not the
whole transmit path (MAC + PHY).

> 
> If set_eee is optional the meaning of a NULL pointer is that the MAC
> does support EEE. So for mv88e6060, lan9303, microchip and mt7530
> which currently don't support EEE, you need to add a set_eee which
> returns -ENODEV.
> 
> Having to implement the op to say you don't implement the feature just
> seems wrong.

If it is truly optional then we can either request drivers to implement
it and return something agreed upon, or just not implementing it means
we might be able to do EEE only at the PHY level.
-- 
Florian


[iproute PATCH] Really fix get_addr() and get_prefix() error messages

2017-08-01 Thread Phil Sutter
Both functions take the desired address family as a parameter. So using
that to notify the user what address family was expected is correct,
unlike using dst->family which will tell the user only what address
family was specified.

The situation which commit 334af76143368 tried to fix was when 'ip'
would accept addresses from multiple families. In that case, the family
parameter is set to AF_UNSPEC so that get_addr_1() may accept any valid
address.

This patch introduces a wrapper around family_name() which returns the
string "any valid" for AF_UNSPEC instead of the three question marks
unsuitable for use in error messages.

Tests for AF_UNSPEC:

| # ip a a 256.10.166.1/24 dev d0
| Error: any valid prefix is expected rather than "256.10.166.1/24".

| # ip neighbor add proxy 2001:db8::g dev d0
| Error: any valid address is expected rather than "2001:db8::g".

Tests for explicit address family:

| # ip -6 addrlabel add prefix 1.1.1.1/24 label 123
| Error: inet6 prefix is expected rather than "1.1.1.1/24".

| # ip -4 addrlabel add prefix dead:beef::1/24 label 123
| Error: inet prefix is expected rather than "dead:beef::1/24".

Reported-by: Jaroslav Aster 
Fixes: 334af76143368 ("fix get_addr() and get_prefix() error messages")
Signed-off-by: Phil Sutter 
---
 lib/utils.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/lib/utils.c b/lib/utils.c
index 9aa3219c5547d..9143ed2284870 100644
--- a/lib/utils.c
+++ b/lib/utils.c
@@ -616,12 +616,19 @@ done:
return err;
 }
 
+static const char *family_name_verbose(int family)
+{
+   if (family == AF_UNSPEC)
+   return "any valid";
+   return family_name(family);
+}
+
 int get_addr(inet_prefix *dst, const char *arg, int family)
 {
if (get_addr_1(dst, arg, family)) {
fprintf(stderr,
"Error: %s address is expected rather than \"%s\".\n",
-   family_name(dst->family), arg);
+   family_name_verbose(family), arg);
exit(1);
}
return 0;
@@ -639,7 +646,7 @@ int get_prefix(inet_prefix *dst, char *arg, int family)
if (get_prefix_1(dst, arg, family)) {
fprintf(stderr,
"Error: %s prefix is expected rather than \"%s\".\n",
-   family_name(dst->family), arg);
+   family_name_verbose(family), arg);
exit(1);
}
return 0;
-- 
2.13.1
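
The wrapper in the patch above can be tried outside iproute2 with a small stand-alone sketch. Everything named *_sketch here is an assumption made for illustration: family_name_sketch() only stands in for the real family_name() and knows two families, and format_addr_error() is a hypothetical helper that builds the message in a buffer instead of printing it and calling exit(1) the way get_addr() does.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>	/* AF_UNSPEC, AF_INET, AF_INET6 */

/* Stand-in for iproute2's family_name(); covers two families only. */
const char *family_name_sketch(int family)
{
	switch (family) {
	case AF_INET:
		return "inet";
	case AF_INET6:
		return "inet6";
	default:
		return "???";
	}
}

/* Same idea as family_name_verbose() in the patch. */
const char *family_name_verbose_sketch(int family)
{
	if (family == AF_UNSPEC)
		return "any valid";
	return family_name_sketch(family);
}

/* Hypothetical helper: format the error text using the requested family. */
int format_addr_error(char *buf, size_t len, int family, const char *arg)
{
	assert(buf != NULL && arg != NULL);
	return snprintf(buf, len,
			"Error: %s address is expected rather than \"%s\".",
			family_name_verbose_sketch(family), arg);
}
```

Calling format_addr_error() with AF_UNSPEC yields a message starting with "Error: any valid address", while AF_INET6 yields "Error: inet6 address", matching the two groups of tests in the commit message.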



Re: [PATCH net-next 10/11] net: dsa: mv88e6xxx: remove EEE support

2017-08-01 Thread Vivien Didelot
Hi Andrew,

Andrew Lunn  writes:

>> >> The PHY's EEE settings are already accessed by the DSA layer through the
>> >> Marvell PHY driver and there is nothing to be done for switch's MACs.
>> >
>> > I'm confused, or missing something. Does not patch #1 mean that if the
>> > DSA driver does not have a set_eee function, we always return -ENODEV
>> > in slave.c?
>> 
>> If there is a PHY, phy_init_eee (if eee_enabled is true) and
>> phy_ethtool_set_eee is called. If there is a .set_eee op, it is
>> called. If both are absent, -ENODEV is returned.
>
> O.K, i don't think that is correct. EEE should only be enabled if both
> the MAC and the PHY supports it. We need some way for the MAC to
> indicate it does not support EEE.
>
> If set_eee is optional the meaning of a NULL pointer is that the MAC
> does support EEE. So for mv88e6060, lan9303, microchip and mt7530
> which currently don't support EEE, you need to add a set_eee which
> returns -ENODEV.
>
> Having to implement the op to say you don't implement the feature just
> seems wrong.

Agreed, above I simply described how this patchset currently behaves.

I suggested in the previous mail to define a DSA noop so that the driver
can indicate that its MACs support EEE, even though there's nothing to
do (and the DSA layer can learn about that):

	static inline int dsa_set_mac_eee_noop(struct dsa_switch *ds,
					       int port,
					       struct ethtool_eee *e)
	{
		dev_dbg(ds->dev, "nothing to do for port %d's MAC\n", port);
		return 0;
	}

(and the corresponding dsa_get_mac_eee_noop(), of course.)

The second option is to keep it simple (KISS) and let each driver define
its own noop, but as I explained, that is confusing, especially for the
get operation.


Thanks,

Vivien
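
The semantics discussed above (a missing MAC op means "no EEE support", and a noop op means "supported, nothing to configure") can be sketched in plain C. The structures and names below are hypothetical, simplified stand-ins and not the real DSA API; the PHY side is reduced to a boolean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; not the real DSA structures. */
struct sketch_port {
	bool has_phy;
	bool eee_enabled;
};

struct sketch_switch_ops {
	/* NULL means "the MAC does not support EEE". */
	int (*set_mac_eee)(struct sketch_port *port, bool enable);
};

/* Noop for MACs that support EEE but need no configuration. */
int sketch_set_mac_eee_noop(struct sketch_port *port, bool enable)
{
	(void)port;
	(void)enable;
	return 0;
}

/* EEE is only configured when both the MAC and the PHY take part. */
int sketch_set_eee(const struct sketch_switch_ops *ops,
		   struct sketch_port *port, bool enable)
{
	int err;

	assert(ops != NULL && port != NULL);

	if (!ops->set_mac_eee || !port->has_phy)
		return -ENODEV;

	err = ops->set_mac_eee(port, enable);
	if (err)
		return err;

	port->eee_enabled = enable;	/* stands in for the PHY-level call */
	return 0;
}
```

With this shape, a driver whose MAC cannot do EEE simply leaves the op unset and gets -ENODEV, and a driver with nothing to program points the op at the shared noop.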


[RFC PATCH v2 0/2] nb8800 suspend/resume support

2017-08-01 Thread Mason
Hello,

I need suspend/resume support in the nb8800 driver.
On tango platforms, suspend loses all context (MMIO registers).
To make the task easy, we just close the device on suspend,
and open it again on resume. This requires properly resetting
the HW on resume.

Patch 1 moves all the HW init to nb8800_init()
Patch 2 adds suspend/resume support

Regards.


[RFC PATCH v2 1/2] net: ethernet: nb8800: Reset HW block in ndo_open

2017-08-01 Thread Mason
Move all HW initializations to nb8800_init.
This provides the basis for suspend/resume support.
---
 drivers/net/ethernet/aurora/nb8800.c | 50 +---
 drivers/net/ethernet/aurora/nb8800.h |  1 +
 2 files changed, 25 insertions(+), 26 deletions(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index e94159507847..aa18ea25d91f 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -39,6 +39,7 @@
 
 #include "nb8800.h"
 
+static void nb8800_init(struct net_device *dev);
 static void nb8800_tx_done(struct net_device *dev);
 static int nb8800_dma_stop(struct net_device *dev);
 
@@ -957,6 +958,8 @@ static int nb8800_open(struct net_device *dev)
struct phy_device *phydev;
int err;
 
+   nb8800_init(dev);
+
/* clear any pending interrupts */
nb8800_writel(priv, NB8800_RXC_SR, 0xf);
nb8800_writel(priv, NB8800_TXC_SR, 0xf);
@@ -1246,11 +1249,6 @@ static int nb8800_hw_init(struct net_device *dev)
nb8800_writeb(priv, NB8800_PQ1, val >> 8);
nb8800_writeb(priv, NB8800_PQ2, val & 0xff);
 
-   /* Auto-negotiate by default */
-   priv->pause_aneg = true;
-   priv->pause_rx = true;
-   priv->pause_tx = true;
-
nb8800_mc_init(dev, 0);
 
return 0;
@@ -1350,6 +1348,20 @@ static const struct of_device_id nb8800_dt_ids[] = {
 };
 MODULE_DEVICE_TABLE(of, nb8800_dt_ids);
 
+static void nb8800_init(struct net_device *dev)
+{
+   struct nb8800_priv *priv = netdev_priv(dev);
+   const struct nb8800_ops *ops = priv->ops;
+
+   if (ops && ops->reset)
+   ops->reset(dev);
+   nb8800_hw_init(dev);
+   if (ops && ops->init)
+   ops->init(dev);
+   nb8800_update_mac_addr(dev);
+   priv->speed = 0;
+}
+
 static int nb8800_probe(struct platform_device *pdev)
 {
const struct of_device_id *match;
@@ -1389,6 +1401,7 @@ static int nb8800_probe(struct platform_device *pdev)
 
priv = netdev_priv(dev);
priv->base = base;
+   priv->ops = ops;
 
priv->phy_mode = of_get_phy_mode(pdev->dev.of_node);
if (priv->phy_mode < 0)
@@ -1407,12 +1420,6 @@ static int nb8800_probe(struct platform_device *pdev)
 
spin_lock_init(&priv->tx_lock);
 
-   if (ops && ops->reset) {
-   ret = ops->reset(dev);
-   if (ret)
-   goto err_disable_clk;
-   }
-
bus = devm_mdiobus_alloc(&pdev->dev);
if (!bus) {
ret = -ENOMEM;
@@ -1454,21 +1461,16 @@ static int nb8800_probe(struct platform_device *pdev)
 
priv->mii_bus = bus;
 
-   ret = nb8800_hw_init(dev);
-   if (ret)
-   goto err_deregister_fixed_link;
-
-   if (ops && ops->init) {
-   ret = ops->init(dev);
-   if (ret)
-   goto err_deregister_fixed_link;
-   }
-
dev->netdev_ops = &nb8800_netdev_ops;
dev->ethtool_ops = &nb8800_ethtool_ops;
dev->flags |= IFF_MULTICAST;
dev->irq = irq;
 
+   /* Auto-negotiate by default */
+   priv->pause_aneg = true;
+   priv->pause_rx = true;
+   priv->pause_tx = true;
+
mac = of_get_mac_address(pdev->dev.of_node);
if (mac)
ether_addr_copy(dev->dev_addr, mac);
@@ -1476,14 +1478,12 @@ static int nb8800_probe(struct platform_device *pdev)
if (!is_valid_ether_addr(dev->dev_addr))
eth_hw_addr_random(dev);
 
-   nb8800_update_mac_addr(dev);
-
netif_carrier_off(dev);
 
ret = register_netdev(dev);
if (ret) {
netdev_err(dev, "failed to register netdev\n");
-   goto err_free_dma;
+   goto err_deregister_fixed_link;
}
 
netif_napi_add(dev, &priv->napi, nb8800_poll, NAPI_POLL_WEIGHT);
@@ -1492,8 +1492,6 @@ static int nb8800_probe(struct platform_device *pdev)
 
return 0;
 
-err_free_dma:
-   nb8800_dma_free(dev);
 err_deregister_fixed_link:
if (of_phy_is_fixed_link(pdev->dev.of_node))
of_phy_deregister_fixed_link(pdev->dev.of_node);
diff --git a/drivers/net/ethernet/aurora/nb8800.h 
b/drivers/net/ethernet/aurora/nb8800.h
index 6ec4a956e1e5..d5f4481a2c7b 100644
--- a/drivers/net/ethernet/aurora/nb8800.h
+++ b/drivers/net/ethernet/aurora/nb8800.h
@@ -305,6 +305,7 @@ struct nb8800_priv {
dma_addr_t  tx_desc_dma;
 
struct clk  *clk;
+   const struct nb8800_ops *ops;
 };
 
 struct nb8800_ops {
-- 
2.11.0


[RFC PATCH v2 2/2] net: ethernet: nb8800: Add suspend/resume support

2017-08-01 Thread Mason
Wrappers around nb8800_stop and nb8800_open.
---
 drivers/net/ethernet/aurora/nb8800.c | 23 ++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aurora/nb8800.c 
b/drivers/net/ethernet/aurora/nb8800.c
index aa18ea25d91f..607064a6d7a1 100644
--- a/drivers/net/ethernet/aurora/nb8800.c
+++ b/drivers/net/ethernet/aurora/nb8800.c
@@ -1012,7 +1012,6 @@ static int nb8800_stop(struct net_device *dev)
netif_stop_queue(dev);
napi_disable(&priv->napi);
 
-   nb8800_dma_stop(dev);
nb8800_mac_rx(dev, false);
nb8800_mac_tx(dev, false);
 
@@ -1526,6 +1525,26 @@ static int nb8800_remove(struct platform_device *pdev)
return 0;
 }
 
+static int nb8800_suspend(struct platform_device *pdev, pm_message_t state)
+{
+   struct net_device *dev = platform_get_drvdata(pdev);
+
+   if (netif_running(dev))
+   nb8800_stop(dev);
+
+   return 0;
+}
+
+static int nb8800_resume(struct platform_device *pdev)
+{
+   struct net_device *dev = platform_get_drvdata(pdev);
+
+   if (netif_running(dev))
+   nb8800_open(dev);
+
+   return 0;
+}
+
 static struct platform_driver nb8800_driver = {
.driver = {
.name   = "nb8800",
@@ -1533,6 +1552,8 @@ static struct platform_driver nb8800_driver = {
},
.probe  = nb8800_probe,
.remove = nb8800_remove,
+   .suspend = nb8800_suspend,
+   .resume = nb8800_resume,
 };
 
 module_platform_driver(nb8800_driver);
-- 
2.11.0
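
The close-on-suspend, reopen-on-resume approach of this series can be modelled in user space. The sketch below is a hypothetical model, not driver code: struct sketch_dev, its register array and all sketch_* functions are assumptions, and the loss of MMIO context is simulated with memset().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_NREGS 4

/* Hypothetical model of a device whose registers are lost on suspend. */
struct sketch_dev {
	uint32_t regs[SKETCH_NREGS];	/* stands in for MMIO context */
	bool running;			/* interface is up */
	bool hw_active;			/* MAC/DMA currently enabled */
};

/* Reprogram every register from scratch, so open works after a reset. */
void sketch_hw_init(struct sketch_dev *dev)
{
	for (int i = 0; i < SKETCH_NREGS; i++)
		dev->regs[i] = 0x100u + (uint32_t)i;
}

int sketch_open(struct sketch_dev *dev)
{
	sketch_hw_init(dev);
	dev->hw_active = true;
	return 0;
}

void sketch_stop(struct sketch_dev *dev)
{
	dev->hw_active = false;
}

int sketch_suspend(struct sketch_dev *dev)
{
	assert(dev != NULL);
	if (dev->running)
		sketch_stop(dev);
	/* Model the platform dropping all register context. */
	memset(dev->regs, 0, sizeof(dev->regs));
	return 0;
}

int sketch_resume(struct sketch_dev *dev)
{
	assert(dev != NULL);
	if (dev->running)
		return sketch_open(dev);	/* propagate a failed open */
	return 0;
}
```

One deliberate difference from the RFC patch: sketch_resume() returns the result of the open call instead of discarding it, which may be worth considering for nb8800_resume() as well.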


[PATCH 0/2] Revert "ipv6: constify inet6_protocol structures"

2017-08-01 Thread Julia Lawall
inet6_add_protocol and inet6_del_protocol include casts that remove the
effect of the const annotation on their parameter, leading to possible
runtime crashes.
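
A generic illustration of the hazard, as a sketch only: the names below are hypothetical and this is not the kernel code. A registration function accepts a const pointer, casts the qualifier away when storing it, and a later writer modifies the object through the stored pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, cut-down protocol descriptor. */
struct sketch_protocol {
	int (*handler)(int);
	int flags;
};

#define SKETCH_MAX_PROTOS 4
static struct sketch_protocol *sketch_protos[SKETCH_MAX_PROTOS];

/* The const parameter promises callers the object is never modified... */
int sketch_add_protocol(const struct sketch_protocol *prot, unsigned int num)
{
	assert(prot != NULL);
	if (num >= SKETCH_MAX_PROTOS || sketch_protos[num])
		return -1;
	/* ...but this cast drops the qualifier when storing the pointer. */
	sketch_protos[num] = (struct sketch_protocol *)prot;
	return 0;
}

/* A later writer (e.g. a runtime tunable) modifies the registered object. */
void sketch_set_flags(unsigned int num, int flags)
{
	if (num < SKETCH_MAX_PROTOS && sketch_protos[num])
		sketch_protos[num]->flags = flags;
}
```

This is only safe while every registered object is writable. If a caller declares its descriptor const, the compiler may place it in read-only memory, and the write in sketch_set_flags() becomes undefined behaviour that can fault at run time; hence the revert to non-const structures.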


Re: [PATCH v6 net-next] net: systemport: Support 64bit statistics

2017-08-01 Thread Florian Fainelli
On 07/31/2017 06:18 PM, Jianming.qiao wrote:
> When using Broadcom Systemport device in 32bit Platform, ifconfig can
> only report up to 4G tx,rx status, which will be wrapped to 0 when the
> number of incoming or outgoing packets exceeds 4G, only taking
> around 2 hours in busy network environment (such as streaming).
> Therefore, it makes hard for network diagnostic tool to get reliable
> statistical result, so the patch is used to add 64bit support for
> Broadcom Systemport device in 32bit Platform.

Almost there; can you turn on lock debugging and then try, e.g.,
modprobe bcmsysport:

<4>[   17.836361] CPU: 3 PID: 1328 Comm: modprobe Not tainted
4.13.0-rc1-00560-g67f9849fc4f9 #300
<4>[   17.844760] Hardware name: Broadcom STB (Flattened Device Tree)
<4>[   17.850744] [] (unwind_backtrace) from []
(show_stack+0x10/0x14)
<4>[   17.858555] [] (show_stack) from []
(dump_stack+0xb0/0xdc)
<4>[   17.865838] [] (dump_stack) from []
(register_lock_class+0x200/0x5ec)
<4>[   17.874075] [] (register_lock_class) from []
(__lock_acquire+0x9c/0x19c4)
<4>[   17.882664] [] (__lock_acquire) from []
(lock_acquire+0xd0/0x294)
<4>[   17.890594] [] (lock_acquire) from []
(bcm_sysport_get_stats64+0xe0/0x1b4 [bcmsysport])
<4>[   17.900440] [] (bcm_sysport_get_stats64 [bcmsysport])
from [] (dev_get_stats+0x38/0xac)
<4>[   17.910263] [] (dev_get_stats) from []
(rtnl_fill_stats+0x38/0x118)
<4>[   17.918330] [] (rtnl_fill_stats) from []
(rtnl_fill_ifinfo+0x4cc/0xdac)
<4>[   17.926749] [] (rtnl_fill_ifinfo) from []
(rtmsg_ifinfo_build_skb+0x70/0xdc)
<4>[   17.935591] [] (rtmsg_ifinfo_build_skb) from
[] (rtmsg_ifinfo_event.part.6+0x14/0x44)
<4>[   17.945221] [] (rtmsg_ifinfo_event.part.6) from
[] (rtmsg_ifinfo+0x20/0x28)
<4>[   17.953990] [] (rtmsg_ifinfo) from []
(register_netdevice+0x558/0x684)
<4>[   17.962311] [] (register_netdevice) from []
(register_netdev+0x14/0x24)
<4>[   17.970739] [] (register_netdev) from []
(bcm_sysport_probe+0x2f0/0x42c [bcmsysport])
<4>[   17.980384] [] (bcm_sysport_probe [bcmsysport]) from
[] (platform_drv_probe+0x4c/0xac)
<4>[   17.990100] [] (platform_drv_probe) from []
(driver_probe_device+0x2b8/0x468)
<4>[   17.999030] [] (driver_probe_device) from []
(__driver_attach+0xec/0x128)
<4>[   18.007613] [] (__driver_attach) from []
(bus_for_each_dev+0x74/0xb8)
<4>[   18.015844] [] (bus_for_each_dev) from []
(bus_add_driver+0x1bc/0x270)
<4>[   18.024158] [] (bus_add_driver) from []
(driver_register+0x78/0xf8)
<4>[   18.032218] [] (driver_register) from []
(do_one_initcall+0x50/0x190)
<4>[   18.040446] [] (do_one_initcall) from []
(do_init_module+0x64/0x1f8)
<4>[   18.048584] [] (do_init_module) from []
(load_module+0x1eb4/0x27f0)
<4>[   18.056641] [] (load_module) from []
(SyS_init_module+0x128/0x19c)
<4>[   18.064623] [] (SyS_init_module) from []
(ret_fast_syscall+0x0/0x1c)
<6>[   18.072890] brcm-systemport f04a.ethernet: Broadcom
SYSTEMPORTv 1.00 at 0xf0ca8000 (irqs: 64, 65, TXQs: 32, RXQs: 1)

You might also want to check the output of ethtool -S, because
tx_bytes and tx_packets are now all zeroes.

I know we are not supposed to include netdev stats within ethtool, but
since it was done that way and ethtool is a user-facing ABI, this needs
to keep working.

Thanks!
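
For reference, the wrap-around that motivates the quoted patch can be reproduced with plain integer arithmetic. This is a sketch with hypothetical helper names; the packet rate used in the note below is an assumed figure chosen only to match the "around 2 hours" in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Add n to a 32-bit counter; unsigned arithmetic wraps modulo 2^32. */
uint32_t sketch_add32(uint32_t counter, uint32_t n)
{
	return counter + n;
}

/* The same addition on a 64-bit counter keeps counting past 4G. */
uint64_t sketch_add64(uint64_t counter, uint32_t n)
{
	return counter + n;
}

/* Whole seconds until a 32-bit counter wraps at a constant event rate. */
uint64_t sketch_seconds_to_wrap32(uint64_t rate_per_sec)
{
	assert(rate_per_sec > 0);
	return (UINT64_C(1) << 32) / rate_per_sec;
}
```

At an assumed 600000 packets per second, sketch_seconds_to_wrap32() gives 7158 seconds, i.e. just under two hours.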

> 
> Signed-off-by: Jianming.qiao 
> ---
>  drivers/net/ethernet/broadcom/bcmsysport.c | 68 
> --
>  drivers/net/ethernet/broadcom/bcmsysport.h |  9 +++-
>  2 files changed, 52 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/net/ethernet/broadcom/bcmsysport.c 
> b/drivers/net/ethernet/broadcom/bcmsysport.c
> index 5333601..bb3cc7a 100644
> --- a/drivers/net/ethernet/broadcom/bcmsysport.c
> +++ b/drivers/net/ethernet/broadcom/bcmsysport.c
> @@ -662,6 +662,7 @@ static int bcm_sysport_alloc_rx_bufs(struct 
> bcm_sysport_priv *priv)
>  static unsigned int bcm_sysport_desc_rx(struct bcm_sysport_priv *priv,
>   unsigned int budget)
>  {
> + struct bcm_sysport_stats *stats64 = &priv->stats64;
>   struct net_device *ndev = priv->netdev;
>   unsigned int processed = 0, to_process;
>   struct bcm_sysport_cb *cb;
> @@ -765,6 +766,10 @@ static unsigned int bcm_sysport_desc_rx(struct 
> bcm_sysport_priv *priv,
>   skb->protocol = eth_type_trans(skb, ndev);
>   ndev->stats.rx_packets++;
>   ndev->stats.rx_bytes += len;
> + u64_stats_update_begin(&stats64->syncp);
> + stats64->rx_packets++;
> + stats64->rx_bytes += len;
> + u64_stats_update_end(&stats64->syncp);
>  
>   napi_gro_receive(&priv->napi, skb);
>  next:
> @@ -787,17 +792,15 @@ static void bcm_sysport_tx_reclaim_one(struct 
> bcm_sysport_tx_ring *ring,
>   struct device *kdev = &priv->pdev->dev;
>  
>   if (cb->skb) {
> - ring->bytes += cb->skb->len;
>   *bytes_compl += cb->skb->len;
>   dma_unmap_single(kdev, dma_unmap_addr(cb, dma_addr),
> 

[PATCH 2/2] Revert "l2tp: constify inet6_protocol structures"

2017-08-01 Thread Julia Lawall
This reverts commit d04916a48ad4a3db892b664fa9c3a2a693c378ad.

inet6_add_protocol and inet6_del_protocol include casts that remove the
effect of the const annotation on their parameter, leading to possible
runtime crashes.

Reported-by: Eric Dumazet 
Signed-off-by: Julia Lawall 

---

 net/l2tp/l2tp_ip6.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/l2tp/l2tp_ip6.c b/net/l2tp/l2tp_ip6.c
index d2efcd9..88b397c 100644
--- a/net/l2tp/l2tp_ip6.c
+++ b/net/l2tp/l2tp_ip6.c
@@ -788,7 +788,7 @@ static int l2tp_ip6_recvmsg(struct sock *sk, struct msghdr 
*msg, size_t len,
.ops= &l2tp_ip6_ops,
 };
 
-static const struct inet6_protocol l2tp_ip6_protocol = {
+static struct inet6_protocol l2tp_ip6_protocol __read_mostly = {
.handler= l2tp_ip6_recv,
 };
 


[PATCH 1/2] Revert "ipv6: constify inet6_protocol structures"

2017-08-01 Thread Julia Lawall
This reverts commit 3a3a4e3054137c5ff5d4d306ec834f6d25d7f95b.

inet6_add_protocol and inet6_del_protocol include casts that remove the
effect of the const annotation on their parameter, leading to possible
runtime crashes.

Reported-by: Eric Dumazet 
Signed-off-by: Julia Lawall 

---

 net/ipv6/ip6_gre.c  |2 +-
 net/ipv6/tcp_ipv6.c |2 +-
 net/ipv6/udp.c  |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/ipv6/ip6_gre.c b/net/ipv6/ip6_gre.c
index 33865d6..67ff2aa 100644
--- a/net/ipv6/ip6_gre.c
+++ b/net/ipv6/ip6_gre.c
@@ -1080,7 +1080,7 @@ static void ip6gre_fb_tunnel_init(struct net_device *dev)
 }
 
 
-static const struct inet6_protocol ip6gre_protocol = {
+static struct inet6_protocol ip6gre_protocol __read_mostly = {
.handler = gre_rcv,
.err_handler = ip6gre_err,
.flags   = INET6_PROTO_NOPOLICY|INET6_PROTO_FINAL,
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index 39ee8e7..ced5dcf 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1944,7 +1944,7 @@ struct proto tcpv6_prot = {
.diag_destroy   = tcp_abort,
 };
 
-static const struct inet6_protocol tcpv6_protocol = {
+static struct inet6_protocol tcpv6_protocol = {
.early_demux=   tcp_v6_early_demux,
.early_demux_handler =  tcp_v6_early_demux,
.handler=   tcp_v6_rcv,
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 7e6d7f5..98fe456 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1457,7 +1457,7 @@ int compat_udpv6_getsockopt(struct sock *sk, int level, 
int optname,
 }
 #endif
 
-static const struct inet6_protocol udpv6_protocol = {
+static struct inet6_protocol udpv6_protocol = {
.early_demux=   udp_v6_early_demux,
.early_demux_handler =  udp_v6_early_demux,
.handler=   udpv6_rcv,


[PATCH v4] ss: Enclose IPv6 address in brackets

2017-08-01 Thread Florian Lehner
This updated patch makes ss print IPv6 addresses in the bracketed
format described in RFC 2732.

Following the advice by David Laight I used strchr().
Also, IN6ADDR_ANY and INADDR_ANY are now both printed as "*".


Signed-off-by: Lehner Florian 
---
 misc/ss.c | 28 +---
 1 file changed, 17 insertions(+), 11 deletions(-)

diff --git a/misc/ss.c b/misc/ss.c
index 12763c9..d40ad00 100644
--- a/misc/ss.c
+++ b/misc/ss.c
@@ -1046,25 +1046,31 @@ do_numeric:

 static void inet_addr_print(const inet_prefix *a, int port, unsigned
int ifindex)
 {
-   char buf[1024];
+   char buf[1024], buf2[1024];
const char *ap = buf;
+   char *c = NULL;
int est_len = addr_width;
const char *ifname = NULL;

-   if (a->family == AF_INET) {
-   if (a->data[0] == 0) {
+   if (a->data[0] == 0) {
buf[0] = '*';
buf[1] = 0;
-   } else {
+   } else {
+   if (a->family == AF_INET) {
ap = format_host(AF_INET, 4, a->data);
+   } else {
+   ap = format_host(a->family, 16, a->data);
+   c = strchr(ap, ':');
+   if (c != NULL && a->family == AF_INET6) {
+   sprintf(buf2, "[%s]", ap);
+   ap = buf2;
+   }
+   est_len = strlen(ap);
+   if (est_len <= addr_width)
+   est_len = addr_width;
+   else
+   est_len = addr_width + 
((est_len-addr_width+3)/4)*4;
}
-   } else {
-   ap = format_host(a->family, 16, a->data);
-   est_len = strlen(ap);
-   if (est_len <= addr_width)
-   est_len = addr_width;
-   else
-   est_len = addr_width + ((est_len-addr_width+3)/4)*4;
}

if (ifindex) {
-- 
2.9.4
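
A stand-alone sketch of the bracketing and width logic, under stated assumptions: sketch_format_addr() and sketch_est_len() are hypothetical helpers, not functions from ss, and the sketch uses snprintf() with a length check where the patch uses sprintf() into a fixed buffer. The strchr() test is kept so that a string without ':' (for example a resolved host name) is left unbracketed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>	/* AF_INET, AF_INET6 */

/*
 * Copy host into out, wrapping it in brackets when it is an IPv6 literal
 * (family AF_INET6 and the text contains ':'). Returns 0 on success,
 * -1 if out is too small.
 */
int sketch_format_addr(char *out, size_t len, int family, const char *host)
{
	int n;

	assert(out != NULL && host != NULL);

	if (family == AF_INET6 && strchr(host, ':'))
		n = snprintf(out, len, "[%s]", host);
	else
		n = snprintf(out, len, "%s", host);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Column width: addr_width, grown in steps of 4 for longer strings. */
int sketch_est_len(int text_len, int addr_width)
{
	if (text_len <= addr_width)
		return addr_width;
	return addr_width + ((text_len - addr_width + 3) / 4) * 4;
}
```

Note that the two added bracket characters count towards the length passed to the width calculation, which is why the patch calls strlen() after the brackets are applied.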



Re: [PATCH 0/2] Revert "ipv6: constify inet6_protocol structures"

2017-08-01 Thread David Miller
From: Julia Lawall 
Date: Tue,  1 Aug 2017 18:27:27 +0200

> inet6_add_protocol and inet6_del_protocol include casts that remove the
> effect of the const annotation on their parameter, leading to possible
> runtime crashes.

Series applied, thanks for following up on this.


Re: [PATCH net-next 4/6] tcp: remove header prediction

2017-08-01 Thread Neal Cardwell
On Sat, Jul 29, 2017 at 9:57 PM, Florian Westphal  wrote:
> @@ -5519,11 +5347,10 @@ void tcp_finish_connect(struct sock *sk, struct 
> sk_buff *skb)
> if (sock_flag(sk, SOCK_KEEPOPEN))
> inet_csk_reset_keepalive_timer(sk, keepalive_time_when(tp));
>
> -   if (!tp->rx_opt.snd_wscale)
> -   __tcp_fast_path_on(tp, tp->snd_wnd);
> -   else
> -   tp->pred_flags = 0;
> -
> +   if (!sock_flag(sk, SOCK_DEAD)) {
> +   sk->sk_state_change(sk);
> +   sk_wake_async(sk, SOCK_WAKE_IO, POLL_OUT);
> +   }
>  }
>

This patch hunk seems like it introduces a minor bug. It seems like
after this change, the sk_state_change() and sk_wake_async() calls for
a completed active connection happen twice: once in this new spot
inside tcp_finish_connect() and once in the existing code in
tcp_rcv_synsent_state_process() immediately after it calls
tcp_finish_connect().

I would vote for removing this new code snippet and retaining the old
one, in case there are subtle interactions with
tcp_rcv_fastopen_synack(), which happens in between the new wake-up
location and the old wake-up location.

neal


[iproute PATCH] iplink: Notify user if EEXIST error might be spurious

2017-08-01 Thread Phil Sutter
Back in the days when RTM_NEWLINK wasn't yet implemented, people had to
rely upon kernel modules to create (virtual) interfaces for them. The
number of those was usually defined via module parameter, and a sane
default value was chosen. Now that iproute2 allows users to instantiate
new interfaces at will, this is no longer required - though for
backwards compatibility reasons, we're stuck with both methods which
collide at the point when one tries to create an interface with a
standard name for a type which exists in a kernel module: The kernel
will load the module, which instantiates the interface and the following
RTM_NEWLINK request will fail since the interface exists already. For
instance:

| # lsmod | grep dummy
| # ip link show | grep dummy0
| # ip link add dummy0 type dummy
| RTNETLINK answers: File exists
| # ip link show | grep -c dummy0
| 1

There is no race-free solution in userspace for this dilemma as far as I
can tell, so try to detect whether a user might have run into this and
notify that the given error message might be irrelevant.

Signed-off-by: Phil Sutter 
---
 ip/iplink.c | 35 ++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/ip/iplink.c b/ip/iplink.c
index 5aff2fde38dae..f94fa96668d21 100644
--- a/ip/iplink.c
+++ b/ip/iplink.c
@@ -867,6 +867,33 @@ int iplink_parse(int argc, char **argv, struct iplink_req 
*req,
return ret - argc;
 }
 
+static bool link_name_is_standard(const char *type, const char *name)
+{
+   const struct {
+   const char *type;
+   const char *name;
+   } standard_links[] = {
+   { "dummy", "dummy0" },
+   { "ifb", "ifb0" },
+   { "ifb", "ifb1" },
+   { "bond", "bond0" },
+   { "ipip", "tunl0" },
+   { "gre", "gre0" },
+   { "gretap", "gretap0" },
+   { "vti", "ip_vti0" },
+   { "sit", "sit0" },
+   };
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(standard_links); i++) {
+   if (strcmp(type, standard_links[i].type) ||
+   strcmp(name, standard_links[i].name))
+   continue;
+   return true;
+   }
+   return false;
+}
+
 static int iplink_modify(int cmd, unsigned int flags, int argc, char **argv)
 {
int len;
@@ -1007,8 +1034,14 @@ static int iplink_modify(int cmd, unsigned int flags, 
int argc, char **argv)
return -1;
}
 
-   if (rtnl_talk(&rth, &req.n, NULL, 0) < 0)
+   if (rtnl_talk(&rth, &req.n, NULL, 0) < 0) {
+   if (errno == EEXIST &&
+   flags & NLM_F_CREATE &&
+   link_name_is_standard(type, name))
+   fprintf(stderr,
+   "Note: Kernel might have created interface upon 
module load.\n");
return -2;
+   }
 
return 0;
 }
-- 
2.13.1
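
The condition guarding the new note can be isolated into a small predicate for experimentation. This is a sketch under assumptions: the sketch_* names are hypothetical, the table is shortened to three entries, and SKETCH_F_CREATE is defined locally on the assumption that it mirrors the value of NLM_F_CREATE.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_F_CREATE 0x400	/* assumed to mirror NLM_F_CREATE */

struct sketch_link {
	const char *type;
	const char *name;
};

/* Shortened version of the table in the patch. */
static const struct sketch_link sketch_standard_links[] = {
	{ "dummy", "dummy0" },
	{ "ifb", "ifb0" },
	{ "sit", "sit0" },
};

/* True if (type, name) is an interface a module creates on load. */
bool sketch_name_is_standard(const char *type, const char *name)
{
	size_t n = sizeof(sketch_standard_links) /
		   sizeof(sketch_standard_links[0]);

	assert(type != NULL && name != NULL);

	for (size_t i = 0; i < n; i++) {
		if (!strcmp(type, sketch_standard_links[i].type) &&
		    !strcmp(name, sketch_standard_links[i].name))
			return true;
	}
	return false;
}

/* Only hint at a spurious error for EEXIST on a create request. */
bool sketch_eexist_maybe_spurious(int err, unsigned int flags,
				  const char *type, const char *name)
{
	return err == EEXIST && (flags & SKETCH_F_CREATE) &&
	       sketch_name_is_standard(type, name);
}
```

The sketch asserts that type and name are non-NULL; whether both are always set at the call site in iplink_modify() is not something this sketch verifies.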


