Your payment Released

2016-09-17 Thread First Mile Consulting Inc.
Hello

We want to let you know that the record available to us states that some time ago,
you were contacted by some people who said they wanted to wire money into
your account, to be shared between you at an agreed ratio. You opened communication
with the consulting company then, but after some time you cut off communication
when you were asked to pay a certain amount of money to facilitate the transfer
of the said funds to your account. That transaction is real. It is not a scam.

They want to pay you now. Do reply so that we can tell you the steps to follow and
you will get your money within the next few weeks.

Waiting for your reply



John Kulpinski
CEO
First Mile Consulting Inc.


Re: [PATCH v2 2/2] openvswitch: use percpu flow stats

2016-09-17 Thread pravin shelar
On Thu, Sep 15, 2016 at 3:11 PM, Thadeu Lima de Souza Cascardo
 wrote:
> Instead of using flow stats per NUMA node, use them per CPU. When using
> megaflows, the stats lock can be a bottleneck in scalability.
>
> On a E5-2690 12-core system, usual throughput went from ~4Mpps to
> ~15Mpps when forwarding between two 40GbE ports with a single flow
> configured on the datapath.
>
> This has been tested on a system with possible CPUs 0-7,16-23. After
> module removal, there was no corruption of the slab cache.
>
> Signed-off-by: Thadeu Lima de Souza Cascardo 
> Cc: pravin shelar 

Looks good.

Acked-by: Pravin B Shelar 
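
(For context, the per-CPU counting pattern being acked above looks roughly
like the sketch below.  This is a minimal illustration with made-up names,
not the actual openvswitch code; the real flow stats also keep a per-entry
lock and extra fields such as the "used" timestamp.)

#include <linux/types.h>
#include <linux/errno.h>
#include <linux/percpu.h>
#include <linux/cpumask.h>

/* One counter block per possible CPU; illustrative names only. */
struct example_flow_stats {
	u64 packets;
	u64 bytes;
};

static struct example_flow_stats __percpu *example_stats;

static int example_stats_init(void)
{
	example_stats = alloc_percpu(struct example_flow_stats);
	return example_stats ? 0 : -ENOMEM;
}

/* Fast path: each CPU bumps only its own counters, so there is no shared
 * stats lock to contend on (the datapath runs this with preemption
 * already disabled, e.g. in softirq context).
 */
static void example_stats_update(unsigned int len)
{
	struct example_flow_stats *s = this_cpu_ptr(example_stats);

	s->packets++;
	s->bytes += len;
}

/* Slow path: a reader sums the per-CPU copies. */
static void example_stats_read(u64 *packets, u64 *bytes)
{
	int cpu;

	*packets = *bytes = 0;
	for_each_possible_cpu(cpu) {
		struct example_flow_stats *s = per_cpu_ptr(example_stats, cpu);

		*packets += s->packets;
		*bytes += s->bytes;
	}
}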


Re: [PATCH v2 1/2] openvswitch: fix flow stats accounting when node 0 is not possible

2016-09-17 Thread pravin shelar
On Thu, Sep 15, 2016 at 3:11 PM, Thadeu Lima de Souza Cascardo
 wrote:
> On a system where only node 1 is possible, all statistics are going to be
> accounted on node 0, as it will have a single writer.
>
> However, when getting and clearing the statistics, node 0 is not going
> to be considered, as it's not a possible node.
>
> Tested that statistics are not zero on a system with only node 1
> possible. Also compile-tested with CONFIG_NUMA off.
>
> Signed-off-by: Thadeu Lima de Souza Cascardo 

Acked-by: Pravin B Shelar 
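
(To illustrate the pitfall being fixed: writers fall back to bucket 0, but a
reader that only walks the possible-node mask never sees bucket 0 on a
machine where node 0 is not a possible node.  A rough sketch with
hypothetical names follows; it is not the actual openvswitch code.)

#include <linux/types.h>
#include <linux/nodemask.h>

struct example_bucket {
	u64 count;
};

static u64 example_read_stats(struct example_bucket **buckets)
{
	u64 total = 0;
	int node;

	/* Buggy form: for_each_node() iterates the possible-node mask,
	 * so bucket 0 is silently skipped when node 0 is not possible.
	 */
	for_each_node(node)
		if (buckets[node])
			total += buckets[node]->count;

	/* Fixed form: account bucket 0 as well, since a single writer
	 * always lands there.
	 */
	if (!node_possible(0) && buckets[0])
		total += buckets[0]->count;

	return total;
}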


Re: [PATCH net] MAINTAINERS: Gary Zambrano's email is bouncing

2016-09-17 Thread Joe Perches
On Sat, 2016-09-17 at 16:38 -0700, Florian Fainelli wrote:
> 2016-09-17 16:23 GMT-07:00 Joe Perches :
> > On Sat, 2016-09-17 at 16:17 -0700, Florian Fainelli wrote:
> > > The list does not accept public subscribers, so this is the correct
> > > entry to use.
> > Then M: is definitely _not_ the correct entry for this
> > and it should be:
> > L: bcm-kernel-feedback-l...@broadcom.com (subscribers-only)

> Olof indicated otherwise, so who is right here?
> https://www.spinics.net/lists/arm-kernel/msg512572.html

> prompting this patch series:
> http://linux-arm-kernel.infradead.narkive.com/CRyvGOKd/patch-maintainers-change-l-to-m-for-broadcom-arm-soc-entries

This hasn't been applied to -next, and I looked at the 
existing L: entries.

No worries here either if it's an exploder and not a mailing list.

Pity it's not called something like bcm-linux-driv...@broadcom.com
if it's really an exploder.


Re: [PATCH net] MAINTAINERS: Gary Zambrano's email is bouncing

2016-09-17 Thread Michael Chan
On Sat, Sep 17, 2016 at 4:17 PM, Florian Fainelli  wrote:
> 2016-09-17 15:51 GMT-07:00 Joe Perches :
>> On Sat, 2016-09-17 at 15:27 -0700, Florian Fainelli wrote:
>>> Gary has not been with Broadcom for some time now, replace his address
>>> with the internal mailing-list used for other entries.
>>>
>>> > Signed-off-by: Florian Fainelli 
>>> ---
>>> Michael,
>>>
>>> Since this is an old driver, not sure who could step up as a maintainer
>>> for b44?
>> []
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>> []
>>> @@ -2500,8 +2500,8 @@ S:  Supported
>>
>>> F:kernel/bpf/
>>>
>>>  BROADCOM B44 10/100 ETHERNET DRIVER
>>> -M:   Gary Zambrano 
>>>  L:   netdev@vger.kernel.org
>>> +M:   bcm-kernel-feedback-l...@broadcom.com
>>>  S:   Supported
>>>  F:   drivers/net/ethernet/broadcom/b44.*
>>
>> Without an actual maintainer, this should really be
>> orphan and not supported.
>
> I would like to hear from Michael before concluding that
>

I worked on this NIC more than 10 years ago.  Last time I checked, I
don't have this NIC anymore after moving offices several times.

I don't mind being the maintainer if no one else more suitable, with
access to the hardware, wants to do it.


Re: [PATCH net] MAINTAINERS: Gary Zambrano's email is bouncing

2016-09-17 Thread Florian Fainelli


2016-09-17 16:23 GMT-07:00 Joe Perches :
> On Sat, 2016-09-17 at 16:17 -0700, Florian Fainelli wrote:
>> 2016-09-17 15:51 GMT-07:00 Joe Perches :
> []
>> > Without an actual maintainer, this should really be
>> > orphan and not supported.
>> I would like to hear from Michael before concluding that
>
> No worries.
>
>> > And the M: bcm-kernel-feedback-list@ should be L:
>> The list does not accept public subscribers, so this is the correct
>> entry to use.
>
> Then M: is definitely _not_ the correct entry for this
> and it should be:
>
> L: bcm-kernel-feedback-l...@broadcom.com (subscribers-only)

Olof indicated otherwise, so who is right here?

https://www.spinics.net/lists/arm-kernel/msg512572.html

prompting this patch series:

http://linux-arm-kernel.infradead.narkive.com/CRyvGOKd/patch-maintainers-change-l-to-m-for-broadcom-arm-soc-entries
-- 
Florian


[PATCH net-next 00/11] rxrpc: Tracepoint addition and improvement

2016-09-17 Thread David Howells

Here is a set of patches that add some more tracepoints and improve a couple
of existing ones.  New additions include:

 (1) Connection refcount tracking.

 (2) Client connection state machine tracking.

 (3) Tx and Rx packet lifecycle.

 (4) ACK reception and transmission.

 (5) recvmsg processing.

Updates include:

 (1) Print the symbolic packet name in the Rx packet tracepoint.

 (2) Additional call refcount trace events.

 (3) Improvements to sk_buff tracking with AF_RXRPC.

In addition:

 (1) Config option to inject packet loss during both transmission and
 reception.

 (2) Removal of some printks.

This series needs to be applied on top of the previously posted fixes.

The patches can be found here also:


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-rewrite-20160917-2

David
---
David Howells (11):
  rxrpc: Print the packet type name in the Rx packet trace
  rxrpc: Add some additional call tracing
  rxrpc: Add connection tracepoint and client conn state tracepoint
  rxrpc: Add a tracepoint to follow the life of a packet in the Tx buffer
  rxrpc: Add a tracepoint to log received ACK packets
  rxrpc: Add a tracepoint to log ACK transmission
  rxrpc: Add a tracepoint to follow packets in the Rx buffer
  rxrpc: Add a tracepoint to follow what recvmsg does
  rxrpc: Remove printks from rxrpc_recvmsg_data() to fix uninit var
  rxrpc: Improve skb tracing
  rxrpc: Add config to inject packet loss


 include/trace/events/rxrpc.h |  226 --
 net/rxrpc/Kconfig|7 +
 net/rxrpc/af_rxrpc.c |5 +
 net/rxrpc/ar-internal.h  |  159 +++---
 net/rxrpc/call_accept.c  |7 +
 net/rxrpc/call_event.c   |8 +
 net/rxrpc/call_object.c  |   31 --
 net/rxrpc/conn_client.c  |   82 +++
 net/rxrpc/conn_event.c   |   11 +-
 net/rxrpc/conn_object.c  |   72 +
 net/rxrpc/conn_service.c |4 +
 net/rxrpc/input.c|   31 --
 net/rxrpc/local_event.c  |4 -
 net/rxrpc/misc.c |   81 +++
 net/rxrpc/output.c   |   20 +++-
 net/rxrpc/peer_event.c   |   10 +-
 net/rxrpc/recvmsg.c  |   60 ---
 net/rxrpc/sendmsg.c  |   19 ++--
 net/rxrpc/skbuff.c   |   53 +++---
 19 files changed, 740 insertions(+), 150 deletions(-)



[PATCH net-next 02/11] rxrpc: Add some additional call tracing

2016-09-17 Thread David Howells
Add additional call tracepoint points for noting call-connected,
call-released and connection-failed events.

Also fix one tracepoint that was using an integer instead of the
corresponding enum value as the point type.

Signed-off-by: David Howells 
---

 net/rxrpc/ar-internal.h |3 +++
 net/rxrpc/call_object.c |   18 ++
 2 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 0f6fafa2c271..4a73c20d9436 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -539,6 +539,8 @@ enum rxrpc_call_trace {
rxrpc_call_queued,
rxrpc_call_queued_ref,
rxrpc_call_seen,
+   rxrpc_call_connected,
+   rxrpc_call_release,
rxrpc_call_got,
rxrpc_call_got_userid,
rxrpc_call_got_kernel,
@@ -546,6 +548,7 @@ enum rxrpc_call_trace {
rxrpc_call_put_userid,
rxrpc_call_put_kernel,
rxrpc_call_put_noqueue,
+   rxrpc_call_error,
rxrpc_call__nr_trace
 };
 
diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 23f5a5f58282..0df9d1af8edb 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -53,6 +53,8 @@ const char rxrpc_call_traces[rxrpc_call__nr_trace][4] = {
[rxrpc_call_new_service]= "NWs",
[rxrpc_call_queued] = "QUE",
[rxrpc_call_queued_ref] = "QUR",
+   [rxrpc_call_connected]  = "CON",
+   [rxrpc_call_release]= "RLS",
[rxrpc_call_seen]   = "SEE",
[rxrpc_call_got]= "GOT",
[rxrpc_call_got_userid] = "Gus",
@@ -61,6 +63,7 @@ const char rxrpc_call_traces[rxrpc_call__nr_trace][4] = {
[rxrpc_call_put_userid] = "Pus",
[rxrpc_call_put_kernel] = "Pke",
[rxrpc_call_put_noqueue]= "PNQ",
+   [rxrpc_call_error]  = "*E*",
 };
 
 struct kmem_cache *rxrpc_call_jar;
@@ -222,8 +225,8 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock 
*rx,
return call;
}
 
-   trace_rxrpc_call(call, 0, atomic_read(&call->usage), here,
-(const void *)user_call_ID);
+   trace_rxrpc_call(call, rxrpc_call_new_client, atomic_read(&call->usage),
+here, (const void *)user_call_ID);
 
/* Publish the call, even though it is incompletely set up as yet */
write_lock(&rx->call_lock);
@@ -263,6 +266,9 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock 
*rx,
if (ret < 0)
goto error;
 
+   trace_rxrpc_call(call, rxrpc_call_connected, atomic_read(&call->usage),
+here, ERR_PTR(ret));
+
spin_lock_bh(&call->conn->params.peer->lock);
hlist_add_head(&call->error_link,
   &call->conn->params.peer->error_targets);
@@ -287,6 +293,8 @@ error_dup_user_ID:
 error:
__rxrpc_set_call_completion(call, RXRPC_CALL_LOCAL_ERROR,
RX_CALL_DEAD, ret);
+   trace_rxrpc_call(call, rxrpc_call_error, atomic_read(&call->usage),
+here, ERR_PTR(ret));
rxrpc_release_call(rx, call);
rxrpc_put_call(call, rxrpc_call_put);
_leave(" = %d", ret);
@@ -396,15 +404,17 @@ void rxrpc_get_call(struct rxrpc_call *call, enum 
rxrpc_call_trace op)
  */
 void rxrpc_release_call(struct rxrpc_sock *rx, struct rxrpc_call *call)
 {
+   const void *here = __builtin_return_address(0);
struct rxrpc_connection *conn = call->conn;
bool put = false;
int i;
 
_enter("{%d,%d}", call->debug_id, atomic_read(>usage));
 
-   ASSERTCMP(call->state, ==, RXRPC_CALL_COMPLETE);
+   trace_rxrpc_call(call, rxrpc_call_release, atomic_read(&call->usage),
+here, (const void *)call->flags);
 
-   rxrpc_see_call(call);
+   ASSERTCMP(call->state, ==, RXRPC_CALL_COMPLETE);
 
spin_lock_bh(&call->lock);
if (test_and_set_bit(RXRPC_CALL_RELEASED, &call->flags))



[PATCH net-next 04/11] rxrpc: Add a tracepoint to follow the life of a packet in the Tx buffer

2016-09-17 Thread David Howells
Add a tracepoint to follow the insertion of a packet into the transmit
buffer, its transmission and its rotation out of the buffer.

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |   26 ++
 net/rxrpc/ar-internal.h  |   12 
 net/rxrpc/input.c|2 ++
 net/rxrpc/misc.c |9 +
 net/rxrpc/sendmsg.c  |9 -
 5 files changed, 57 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index c0c496c83f31..ffc74b3e5b76 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -208,6 +208,32 @@ TRACE_EVENT(rxrpc_abort,
  __entry->abort_code, __entry->error, __entry->why)
);
 
+TRACE_EVENT(rxrpc_transmit,
+   TP_PROTO(struct rxrpc_call *call, enum rxrpc_transmit_trace why),
+
+   TP_ARGS(call, why),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_call *,call)
+   __field(enum rxrpc_transmit_trace,  why )
+   __field(rxrpc_seq_t,tx_hard_ack )
+   __field(rxrpc_seq_t,tx_top  )
+),
+
+   TP_fast_assign(
+   __entry->call = call;
+   __entry->why = why;
+   __entry->tx_hard_ack = call->tx_hard_ack;
+   __entry->tx_top = call->tx_top;
+  ),
+
+   TP_printk("c=%p %s f=%08x n=%u",
+ __entry->call,
+ rxrpc_transmit_traces[__entry->why],
+ __entry->tx_hard_ack + 1,
+ __entry->tx_top - __entry->tx_hard_ack)
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 6ca40eea3022..afa5dcc05fe0 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -593,6 +593,18 @@ enum rxrpc_call_trace {
 
 extern const char rxrpc_call_traces[rxrpc_call__nr_trace][4];
 
+enum rxrpc_transmit_trace {
+   rxrpc_transmit_wait,
+   rxrpc_transmit_queue,
+   rxrpc_transmit_queue_reqack,
+   rxrpc_transmit_queue_last,
+   rxrpc_transmit_rotate,
+   rxrpc_transmit_end,
+   rxrpc_transmit__nr_trace
+};
+
+extern const char rxrpc_transmit_traces[rxrpc_transmit__nr_trace][4];
+
 extern const char *const rxrpc_pkts[];
 extern const char *rxrpc_acks(u8 reason);
 
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index c1f83d22f9b7..c7eb5104e91a 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -59,6 +59,7 @@ static void rxrpc_rotate_tx_window(struct rxrpc_call *call, 
rxrpc_seq_t to)
 
spin_unlock(&call->lock);
 
+   trace_rxrpc_transmit(call, rxrpc_transmit_rotate);
wake_up(&call->waitq);
 
while (list) {
@@ -107,6 +108,7 @@ static bool rxrpc_end_tx_phase(struct rxrpc_call *call, 
const char *abort_why)
}
 
write_unlock(&call->state_lock);
+   trace_rxrpc_transmit(call, rxrpc_transmit_end);
_leave(" = ok");
return true;
 }
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index 598064d3bdd2..dca89995f03e 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -132,3 +132,12 @@ const char rxrpc_client_traces[rxrpc_client__nr_trace][7] 
= {
[rxrpc_client_to_waiting]   = "->Wait",
[rxrpc_client_uncount]  = "Uncoun",
 };
+
+const char rxrpc_transmit_traces[rxrpc_transmit__nr_trace][4] = {
+   [rxrpc_transmit_wait]   = "WAI",
+   [rxrpc_transmit_queue]  = "QUE",
+   [rxrpc_transmit_queue_reqack]   = "QRA",
+   [rxrpc_transmit_queue_last] = "QLS",
+   [rxrpc_transmit_rotate] = "ROT",
+   [rxrpc_transmit_end]= "END",
+};
diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index 8bfddf4e338c..28d8f73cf11d 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -56,6 +56,7 @@ static int rxrpc_wait_for_tx_window(struct rxrpc_sock *rx,
break;
}
 
+   trace_rxrpc_transmit(call, rxrpc_transmit_wait);
release_sock(&rx->sk);
*timeo = schedule_timeout(*timeo);
lock_sock(&rx->sk);
@@ -104,8 +105,14 @@ static void rxrpc_queue_packet(struct rxrpc_call *call, 
struct sk_buff *skb,
smp_wmb();
call->rxtx_buffer[ix] = skb;
call->tx_top = seq;
-   if (last)
+   if (last) {
set_bit(RXRPC_CALL_TX_LAST, &call->flags);
+   trace_rxrpc_transmit(call, rxrpc_transmit_queue_last);
+   } else if (sp->hdr.flags & RXRPC_REQUEST_ACK) {
+   trace_rxrpc_transmit(call, rxrpc_transmit_queue_reqack);
+   } else {
+   trace_rxrpc_transmit(call, rxrpc_transmit_queue);
+   }
 
if (last || call->state == 

[PATCH net-next 11/11] rxrpc: Add config to inject packet loss

2016-09-17 Thread David Howells
Add a configuration option to inject packet loss by discarding
approximately every 8th packet received and approximately every 8th DATA
packet transmitted.

Note that no locking is used, but it shouldn't really matter.

Signed-off-by: David Howells 
---

 net/rxrpc/Kconfig  |7 +++
 net/rxrpc/input.c  |8 
 net/rxrpc/output.c |9 +
 3 files changed, 24 insertions(+)

diff --git a/net/rxrpc/Kconfig b/net/rxrpc/Kconfig
index 13396c74b5c1..86f8853a038c 100644
--- a/net/rxrpc/Kconfig
+++ b/net/rxrpc/Kconfig
@@ -26,6 +26,13 @@ config AF_RXRPC_IPV6
  Say Y here to allow AF_RXRPC to use IPV6 UDP as well as IPV4 UDP as
  its network transport.
 
+config AF_RXRPC_INJECT_LOSS
+   bool "Inject packet loss into RxRPC packet stream"
+   depends on AF_RXRPC
+   help
+ Say Y here to inject packet loss by discarding some received and some
+ transmitted packets.
+
 
 config AF_RXRPC_DEBUG
bool "RxRPC dynamic debugging"
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 84bb16d47b85..7ac1edf3aac7 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -712,6 +712,14 @@ void rxrpc_data_ready(struct sock *udp_sk)
skb_orphan(skb);
sp = rxrpc_skb(skb);
 
+   if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) {
+   static int lose;
+   if ((lose++ & 7) == 7) {
+   rxrpc_lose_skb(skb, rxrpc_skb_rx_lost);
+   return;
+   }
+   }
+
_net("Rx UDP packet from %08x:%04hu",
 ntohl(ip_hdr(skb)->saddr), ntohs(udp_hdr(skb)->source));
 
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index a2cad5ce7416..16e18a94ffa6 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -225,6 +225,15 @@ int rxrpc_send_data_packet(struct rxrpc_connection *conn, 
struct sk_buff *skb)
msg.msg_controllen = 0;
msg.msg_flags = 0;
 
+   if (IS_ENABLED(CONFIG_AF_RXRPC_INJECT_LOSS)) {
+   static int lose;
+   if ((lose++ & 7) == 7) {
+   rxrpc_lose_skb(skb, rxrpc_skb_tx_lost);
+   _leave(" = 0 [lose]");
+   return 0;
+   }
+   }
+
/* send the packet with the don't fragment bit set if we currently
 * think it's small enough */
if (skb->len - sizeof(struct rxrpc_wire_header) < 
conn->params.peer->maxdata) {



[PATCH net-next 05/11] rxrpc: Add a tracepoint to log received ACK packets

2016-09-17 Thread David Howells
Add a tracepoint to log information from received ACK packets.

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |   26 ++
 net/rxrpc/input.c|2 ++
 2 files changed, 28 insertions(+)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index ffc74b3e5b76..2b19f3fa5174 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -234,6 +234,32 @@ TRACE_EVENT(rxrpc_transmit,
  __entry->tx_top - __entry->tx_hard_ack)
);
 
+TRACE_EVENT(rxrpc_rx_ack,
+   TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t first, u8 reason, u8 
n_acks),
+
+   TP_ARGS(call, first, reason, n_acks),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_call *,call)
+   __field(rxrpc_seq_t,first   )
+   __field(u8, reason  )
+   __field(u8, n_acks  )
+),
+
+   TP_fast_assign(
+   __entry->call = call;
+   __entry->first = first;
+   __entry->reason = reason;
+   __entry->n_acks = n_acks;
+  ),
+
+   TP_printk("c=%p %s f=%08x n=%u",
+ __entry->call,
+ rxrpc_acks(__entry->reason),
+ __entry->first,
+ __entry->n_acks)
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index c7eb5104e91a..7b18ca124978 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -440,6 +440,8 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct 
sk_buff *skb,
hard_ack = first_soft_ack - 1;
nr_acks = buf.ack.nAcks;
 
+   trace_rxrpc_rx_ack(call, first_soft_ack, buf.ack.reason, nr_acks);
+
_proto("Rx ACK %%%u { m=%hu f=#%u p=#%u s=%%%u r=%s n=%u }",
   sp->hdr.serial,
   ntohs(buf.ack.maxSkew),



[PATCH net-next 08/11] rxrpc: Add a tracepoint to follow what recvmsg does

2016-09-17 Thread David Howells
Add a tracepoint to follow what recvmsg does within AF_RXRPC.

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |   34 ++
 net/rxrpc/ar-internal.h  |   17 +
 net/rxrpc/misc.c |   14 ++
 net/rxrpc/recvmsg.c  |   34 ++
 4 files changed, 91 insertions(+), 8 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 7dd5f0188681..58732202e9f0 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -323,6 +323,40 @@ TRACE_EVENT(rxrpc_receive,
  __entry->top)
);
 
+TRACE_EVENT(rxrpc_recvmsg,
+   TP_PROTO(struct rxrpc_call *call, enum rxrpc_recvmsg_trace why,
+rxrpc_seq_t seq, unsigned int offset, unsigned int len,
+int ret),
+
+   TP_ARGS(call, why, seq, offset, len, ret),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_call *,call)
+   __field(enum rxrpc_recvmsg_trace,   why )
+   __field(rxrpc_seq_t,seq )
+   __field(unsigned int,   offset  )
+   __field(unsigned int,   len )
+   __field(int,ret )
+),
+
+   TP_fast_assign(
+   __entry->call = call;
+   __entry->why = why;
+   __entry->seq = seq;
+   __entry->offset = offset;
+   __entry->len = len;
+   __entry->ret = ret;
+  ),
+
+   TP_printk("c=%p %s q=%08x o=%u l=%u ret=%d",
+ __entry->call,
+ rxrpc_recvmsg_traces[__entry->why],
+ __entry->seq,
+ __entry->offset,
+ __entry->len,
+ __entry->ret)
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index e5d2f2fb8e41..a17341d2df3d 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -617,6 +617,23 @@ enum rxrpc_receive_trace {
 
 extern const char rxrpc_receive_traces[rxrpc_receive__nr_trace][4];
 
+enum rxrpc_recvmsg_trace {
+   rxrpc_recvmsg_enter,
+   rxrpc_recvmsg_wait,
+   rxrpc_recvmsg_dequeue,
+   rxrpc_recvmsg_hole,
+   rxrpc_recvmsg_next,
+   rxrpc_recvmsg_cont,
+   rxrpc_recvmsg_full,
+   rxrpc_recvmsg_data_return,
+   rxrpc_recvmsg_terminal,
+   rxrpc_recvmsg_to_be_accepted,
+   rxrpc_recvmsg_return,
+   rxrpc_recvmsg__nr_trace
+};
+
+extern const char rxrpc_recvmsg_traces[rxrpc_recvmsg__nr_trace][5];
+
 extern const char *const rxrpc_pkts[];
 extern const char *rxrpc_acks(u8 reason);
 
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index db5f1d54fc90..c7065d893d1e 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -150,3 +150,17 @@ const char 
rxrpc_receive_traces[rxrpc_receive__nr_trace][4] = {
[rxrpc_receive_rotate]  = "ROT",
[rxrpc_receive_end] = "END",
 };
+
+const char rxrpc_recvmsg_traces[rxrpc_recvmsg__nr_trace][5] = {
+   [rxrpc_recvmsg_enter]   = "ENTR",
+   [rxrpc_recvmsg_wait]= "WAIT",
+   [rxrpc_recvmsg_dequeue] = "DEQU",
+   [rxrpc_recvmsg_hole]= "HOLE",
+   [rxrpc_recvmsg_next]= "NEXT",
+   [rxrpc_recvmsg_cont]= "CONT",
+   [rxrpc_recvmsg_full]= "FULL",
+   [rxrpc_recvmsg_data_return] = "DATA",
+   [rxrpc_recvmsg_terminal]= "TERM",
+   [rxrpc_recvmsg_to_be_accepted]  = "TBAC",
+   [rxrpc_recvmsg_return]  = "RETN",
+};
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 22d51087c580..b62a08151895 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -94,6 +94,8 @@ static int rxrpc_recvmsg_term(struct rxrpc_call *call, struct 
msghdr *msg)
break;
}
 
+   trace_rxrpc_recvmsg(call, rxrpc_recvmsg_terminal, call->rx_hard_ack,
+   call->rx_pkt_offset, call->rx_pkt_len, ret);
return ret;
 }
 
@@ -124,6 +126,7 @@ static int rxrpc_recvmsg_new_call(struct rxrpc_sock *rx,
write_unlock(&rx->call_lock);
}
 
+   trace_rxrpc_recvmsg(call, rxrpc_recvmsg_to_be_accepted, 1, 0, 0, ret);
return ret;
 }
 
@@ -310,8 +313,11 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct 
rxrpc_call *call,
for (seq = hard_ack + 1; before_eq(seq, top); seq++) {
ix = seq & RXRPC_RXTX_BUFF_MASK;
skb = call->rxtx_buffer[ix];
-   if (!skb)
+   if (!skb) {
+   

[PATCH net-next 10/11] rxrpc: Improve skb tracing

2016-09-17 Thread David Howells
Improve sk_buff tracing within AF_RXRPC by the following means:

 (1) Use an enum to note the event type rather than plain integers and use
 an array of event names rather than a big multi ?: list.

 (2) Distinguish Rx from Tx packets and account them separately.  This
 requires the call phase to be tracked so that we know what we might
 find in rxtx_buffer[].

 (3) Add a parameter to rxrpc_{new,see,get,free}_skb() to indicate the
 event type.

 (4) A pair of 'rotate' events are added to indicate packets that are about
 to be rotated out of the Rx and Tx windows.

 (5) A pair of 'lost' events are added, along with rxrpc_lose_skb() for
 packet loss injection recording.

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |   12 +++---
 net/rxrpc/af_rxrpc.c |5 ++--
 net/rxrpc/ar-internal.h  |   33 ++
 net/rxrpc/call_event.c   |8 +++---
 net/rxrpc/call_object.c  |   11 ++---
 net/rxrpc/conn_event.c   |6 ++---
 net/rxrpc/input.c|   13 ++
 net/rxrpc/local_event.c  |4 ++-
 net/rxrpc/misc.c |   18 ++
 net/rxrpc/output.c   |4 ++-
 net/rxrpc/peer_event.c   |   10 
 net/rxrpc/recvmsg.c  |7 +++---
 net/rxrpc/sendmsg.c  |   10 
 net/rxrpc/skbuff.c   |   53 ++
 14 files changed, 131 insertions(+), 63 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 58732202e9f0..75a5d8bf50e1 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -107,14 +107,14 @@ TRACE_EVENT(rxrpc_call,
);
 
 TRACE_EVENT(rxrpc_skb,
-   TP_PROTO(struct sk_buff *skb, int op, int usage, int mod_count,
-const void *where),
+   TP_PROTO(struct sk_buff *skb, enum rxrpc_skb_trace op,
+int usage, int mod_count, const void *where),
 
TP_ARGS(skb, op, usage, mod_count, where),
 
TP_STRUCT__entry(
__field(struct sk_buff *,   skb )
-   __field(int,op  )
+   __field(enum rxrpc_skb_trace,   op  )
__field(int,usage   )
__field(int,mod_count   )
__field(const void *,   where   )
@@ -130,11 +130,7 @@ TRACE_EVENT(rxrpc_skb,
 
TP_printk("s=%p %s u=%d m=%d p=%pSR",
  __entry->skb,
- (__entry->op == 0 ? "NEW" :
-  __entry->op == 1 ? "SEE" :
-  __entry->op == 2 ? "GET" :
-  __entry->op == 3 ? "FRE" :
-  "PUR"),
+ rxrpc_skb_traces[__entry->op],
  __entry->usage,
  __entry->mod_count,
  __entry->where)
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 09f81befc705..8dbf7bed2cc4 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -45,7 +45,7 @@ u32 rxrpc_epoch;
 atomic_t rxrpc_debug_id;
 
 /* count of skbs currently in use */
-atomic_t rxrpc_n_skbs;
+atomic_t rxrpc_n_tx_skbs, rxrpc_n_rx_skbs;
 
 struct workqueue_struct *rxrpc_workqueue;
 
@@ -867,7 +867,8 @@ static void __exit af_rxrpc_exit(void)
proto_unregister(&rxrpc_proto);
rxrpc_destroy_all_calls();
rxrpc_destroy_all_connections();
-   ASSERTCMP(atomic_read(&rxrpc_n_skbs), ==, 0);
+   ASSERTCMP(atomic_read(&rxrpc_n_tx_skbs), ==, 0);
+   ASSERTCMP(atomic_read(&rxrpc_n_rx_skbs), ==, 0);
rxrpc_destroy_all_locals();
 
remove_proc_entry("rxrpc_conns", init_net.proc_net);
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index a17341d2df3d..034f525f2235 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -520,6 +520,7 @@ struct rxrpc_call {
rxrpc_seq_t rx_expect_next; /* Expected next packet 
sequence number */
u8  rx_winsize; /* Size of Rx window */
u8  tx_winsize; /* Maximum size of Tx window */
+   bool    tx_phase;   /* T if transmission phase, F 
if receive phase */
u8  nr_jumbo_bad;   /* Number of jumbo 
dups/exceeds-windows */
 
/* receive-phase ACK management */
@@ -534,6 +535,27 @@ struct rxrpc_call {
rxrpc_serial_t  acks_latest;/* serial number of latest ACK 
received */
 };
 
+enum rxrpc_skb_trace {
+   rxrpc_skb_rx_cleaned,
+   rxrpc_skb_rx_freed,
+   rxrpc_skb_rx_got,
+   rxrpc_skb_rx_lost,
+   rxrpc_skb_rx_received,
+   rxrpc_skb_rx_rotated,
+   rxrpc_skb_rx_purged,
+   rxrpc_skb_rx_seen,
+   rxrpc_skb_tx_cleaned,
+   

[PATCH net-next 09/11] rxrpc: Remove printks from rxrpc_recvmsg_data() to fix uninit var

2016-09-17 Thread David Howells
Remove _enter/_debug/_leave calls from rxrpc_recvmsg_data() of which one
uses an uninitialised variable.

Signed-off-by: David Howells 
---

 net/rxrpc/recvmsg.c |8 
 1 file changed, 8 deletions(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index b62a08151895..79e65668bc58 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -296,8 +296,6 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct 
rxrpc_call *call,
unsigned int rx_pkt_offset, rx_pkt_len;
int ix, copy, ret = -EAGAIN, ret2;
 
-   _enter("");
-
rx_pkt_offset = call->rx_pkt_offset;
rx_pkt_len = call->rx_pkt_len;
 
@@ -343,8 +341,6 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct 
rxrpc_call *call,
trace_rxrpc_recvmsg(call, rxrpc_recvmsg_cont, seq,
rx_pkt_offset, rx_pkt_len, 0);
}
-   _debug("recvmsg %x DATA #%u { %d, %d }",
-  sp->hdr.callNumber, seq, rx_pkt_offset, rx_pkt_len);
 
/* We have to handle short, empty and used-up DATA packets. */
remain = len - *_offset;
@@ -360,8 +356,6 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct 
rxrpc_call *call,
}
 
/* handle piecemeal consumption of data packets */
-   _debug("copied %d @%zu", copy, *_offset);
-
rx_pkt_offset += copy;
rx_pkt_len -= copy;
*_offset += copy;
@@ -370,7 +364,6 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct 
rxrpc_call *call,
if (rx_pkt_len > 0) {
trace_rxrpc_recvmsg(call, rxrpc_recvmsg_full, seq,
rx_pkt_offset, rx_pkt_len, 0);
-   _debug("buffer full");
ASSERTCMP(*_offset, ==, len);
ret = 0;
break;
@@ -398,7 +391,6 @@ out:
 done:
trace_rxrpc_recvmsg(call, rxrpc_recvmsg_data_return, seq,
rx_pkt_offset, rx_pkt_len, ret);
-   _leave(" = %d [%u/%u]", ret, seq, top);
return ret;
 }
 



[PATCH net-next 06/11] rxrpc: Add a tracepoint to log ACK transmission

2016-09-17 Thread David Howells
Add a tracepoint to log information about ACK transmission.

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |   30 ++
 net/rxrpc/conn_event.c   |3 +++
 net/rxrpc/output.c   |7 ++-
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 2b19f3fa5174..d545d692ae22 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -260,6 +260,36 @@ TRACE_EVENT(rxrpc_rx_ack,
  __entry->n_acks)
);
 
+TRACE_EVENT(rxrpc_tx_ack,
+   TP_PROTO(struct rxrpc_call *call, rxrpc_seq_t first,
+rxrpc_serial_t serial, u8 reason, u8 n_acks),
+
+   TP_ARGS(call, first, serial, reason, n_acks),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_call *,call)
+   __field(rxrpc_seq_t,first   )
+   __field(rxrpc_serial_t, serial  )
+   __field(u8, reason  )
+   __field(u8, n_acks  )
+),
+
+   TP_fast_assign(
+   __entry->call = call;
+   __entry->first = first;
+   __entry->serial = serial;
+   __entry->reason = reason;
+   __entry->n_acks = n_acks;
+  ),
+
+   TP_printk("c=%p %s f=%08x r=%08x n=%u",
+ __entry->call,
+ rxrpc_acks(__entry->reason),
+ __entry->first,
+ __entry->serial,
+ __entry->n_acks)
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/conn_event.c b/net/rxrpc/conn_event.c
index a43f4c94a88d..9b19c51831aa 100644
--- a/net/rxrpc/conn_event.c
+++ b/net/rxrpc/conn_event.c
@@ -98,6 +98,9 @@ static void rxrpc_conn_retransmit_call(struct 
rxrpc_connection *conn,
pkt.info.rwind  = htonl(rxrpc_rx_window_size);
pkt.info.jumbo_max  = htonl(rxrpc_rx_jumbo_max);
len += sizeof(pkt.ack) + sizeof(pkt.info);
+
+   trace_rxrpc_tx_ack(NULL, chan->last_seq, 0,
+  RXRPC_ACK_DUPLICATE, 0);
break;
}
 
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 0b21ed859de7..2c9daeadce87 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -38,12 +38,14 @@ struct rxrpc_pkt_buffer {
 static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
 struct rxrpc_pkt_buffer *pkt)
 {
+   rxrpc_serial_t serial;
rxrpc_seq_t hard_ack, top, seq;
int ix;
u32 mtu, jmax;
u8 *ackp = pkt->acks;
 
/* Barrier against rxrpc_input_data(). */
+   serial = call->ackr_serial;
hard_ack = READ_ONCE(call->rx_hard_ack);
top = smp_load_acquire(&call->rx_top);
 
@@ -51,7 +53,7 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
pkt->ack.maxSkew= htons(call->ackr_skew);
pkt->ack.firstPacket= htonl(hard_ack + 1);
pkt->ack.previousPacket = htonl(call->ackr_prev_seq);
-   pkt->ack.serial = htonl(call->ackr_serial);
+   pkt->ack.serial = htonl(serial);
pkt->ack.reason = call->ackr_reason;
pkt->ack.nAcks  = top - hard_ack;
 
@@ -75,6 +77,9 @@ static size_t rxrpc_fill_out_ack(struct rxrpc_call *call,
pkt->ackinfo.rwind  = htonl(call->rx_winsize);
pkt->ackinfo.jumbo_max  = htonl(jmax);
 
+   trace_rxrpc_tx_ack(call, hard_ack + 1, serial, call->ackr_reason,
+  top - hard_ack);
+
*ackp++ = 0;
*ackp++ = 0;
*ackp++ = 0;



Re: [PATCH net] MAINTAINERS: Gary Zambrano's email is bouncing

2016-09-17 Thread Joe Perches
On Sat, 2016-09-17 at 16:17 -0700, Florian Fainelli wrote:
> 2016-09-17 15:51 GMT-07:00 Joe Perches :
[]
> > Without an actual maintainer, this should really be
> > orphan and not supported.
> I would like to hear from Michael before concluding that

No worries.

> > And the M: bcm-kernel-feedback-list@ should be L:
> The list does not accept public subscribers, so this is the correct
> entry to use.

Then M: is definitely _not_ the correct entry for this
and it should be:

L: bcm-kernel-feedback-l...@broadcom.com (subscribers-only)



[PATCH net-next 03/11] rxrpc: Add connection tracepoint and client conn state tracepoint

2016-09-17 Thread David Howells
Add a pair of tracepoints, one to track rxrpc_connection struct ref
counting and the other to track the client connection cache state.

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |   60 +++
 net/rxrpc/ar-internal.h  |   76 +--
 net/rxrpc/call_accept.c  |4 ++
 net/rxrpc/call_object.c  |2 -
 net/rxrpc/conn_client.c  |   82 +-
 net/rxrpc/conn_event.c   |2 +
 net/rxrpc/conn_object.c  |   72 +++--
 net/rxrpc/conn_service.c |4 ++
 net/rxrpc/misc.c |   31 
 9 files changed, 274 insertions(+), 59 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index 0a30c673509c..c0c496c83f31 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -16,6 +16,66 @@
 
 #include 
 
+TRACE_EVENT(rxrpc_conn,
+   TP_PROTO(struct rxrpc_connection *conn, enum rxrpc_conn_trace op,
+int usage, const void *where),
+
+   TP_ARGS(conn, op, usage, where),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_connection *,  conn)
+   __field(int,op  )
+   __field(int,usage   )
+   __field(const void *,   where   )
+),
+
+   TP_fast_assign(
+   __entry->conn = conn;
+   __entry->op = op;
+   __entry->usage = usage;
+   __entry->where = where;
+  ),
+
+   TP_printk("C=%p %s u=%d sp=%pSR",
+ __entry->conn,
+ rxrpc_conn_traces[__entry->op],
+ __entry->usage,
+ __entry->where)
+   );
+
+TRACE_EVENT(rxrpc_client,
+   TP_PROTO(struct rxrpc_connection *conn, int channel,
+enum rxrpc_client_trace op),
+
+   TP_ARGS(conn, channel, op),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_connection *,  conn)
+   __field(u32,cid )
+   __field(int,channel )
+   __field(int,usage   )
+   __field(enum rxrpc_client_trace,op  )
+   __field(enum rxrpc_conn_cache_state, cs )
+),
+
+   TP_fast_assign(
+   __entry->conn = conn;
+   __entry->channel = channel;
+   __entry->usage = atomic_read(&conn->usage);
+   __entry->op = op;
+   __entry->cid = conn->proto.cid;
+   __entry->cs = conn->cache_state;
+  ),
+
+   TP_printk("C=%p h=%2d %s %s i=%08x u=%d",
+ __entry->conn,
+ __entry->channel,
+ rxrpc_client_traces[__entry->op],
+ rxrpc_conn_cache_states[__entry->cs],
+ __entry->cid,
+ __entry->usage)
+   );
+
 TRACE_EVENT(rxrpc_call,
TP_PROTO(struct rxrpc_call *call, enum rxrpc_call_trace op,
 int usage, const void *where, const void *aux),
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 4a73c20d9436..6ca40eea3022 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -314,6 +314,7 @@ enum rxrpc_conn_cache_state {
RXRPC_CONN_CLIENT_ACTIVE,   /* Conn is on active list, doing calls 
*/
RXRPC_CONN_CLIENT_CULLED,   /* Conn is culled and delisted, doing 
calls */
RXRPC_CONN_CLIENT_IDLE, /* Conn is on idle list, doing mostly 
nothing */
+   RXRPC_CONN__NR_CACHE_STATES
 };
 
 /*
@@ -533,6 +534,44 @@ struct rxrpc_call {
rxrpc_serial_t  acks_latest;/* serial number of latest ACK 
received */
 };
 
+enum rxrpc_conn_trace {
+   rxrpc_conn_new_client,
+   rxrpc_conn_new_service,
+   rxrpc_conn_queued,
+   rxrpc_conn_seen,
+   rxrpc_conn_got,
+   rxrpc_conn_put_client,
+   rxrpc_conn_put_service,
+   rxrpc_conn__nr_trace
+};
+
+extern const char rxrpc_conn_traces[rxrpc_conn__nr_trace][4];
+
+enum rxrpc_client_trace {
+   rxrpc_client_activate_chans,
+   rxrpc_client_alloc,
+   rxrpc_client_chan_activate,
+   rxrpc_client_chan_disconnect,
+   rxrpc_client_chan_pass,
+   rxrpc_client_chan_unstarted,
+   rxrpc_client_cleanup,
+   rxrpc_client_count,
+   rxrpc_client_discard,
+   rxrpc_client_duplicate,
+   rxrpc_client_exposed,
+   rxrpc_client_replace,
+   rxrpc_client_to_active,
+   

[PATCH net-next 10/14] rxrpc: Fix the parsing of soft-ACKs

2016-09-17 Thread David Howells
The soft-ACK parser doesn't increment the pointer into the soft-ACK list,
resulting in the first ACK/NACK value being applied to all the relevant
packets in the Tx queue.  This has the potential to miss retransmissions
and cause excessive retransmissions.

Fix this by incrementing the pointer.

Signed-off-by: David Howells 
---

 net/rxrpc/input.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index f0d9115b9b7e..c1f83d22f9b7 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -384,7 +384,7 @@ static void rxrpc_input_soft_acks(struct rxrpc_call *call, 
u8 *acks,
 
for (; nr_acks > 0; nr_acks--, seq++) {
ix = seq & RXRPC_RXTX_BUFF_MASK;
-   switch (*acks) {
+   switch (*acks++) {
case RXRPC_ACK_TYPE_ACK:
call->rxtx_annotations[ix] = RXRPC_TX_ANNO_ACK;
break;



[PATCH net-next 07/11] rxrpc: Add a tracepoint to follow packets in the Rx buffer

2016-09-17 Thread David Howells
Add a tracepoint to follow the life of packets that get added to a call's
receive buffer.

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |   33 +
 net/rxrpc/ar-internal.h  |   12 
 net/rxrpc/call_accept.c  |3 +++
 net/rxrpc/input.c|6 +-
 net/rxrpc/misc.c |9 +
 net/rxrpc/recvmsg.c  |   11 +++
 6 files changed, 73 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index d545d692ae22..7dd5f0188681 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -290,6 +290,39 @@ TRACE_EVENT(rxrpc_tx_ack,
  __entry->n_acks)
);
 
+TRACE_EVENT(rxrpc_receive,
+   TP_PROTO(struct rxrpc_call *call, enum rxrpc_receive_trace why,
+rxrpc_serial_t serial, rxrpc_seq_t seq),
+
+   TP_ARGS(call, why, serial, seq),
+
+   TP_STRUCT__entry(
+   __field(struct rxrpc_call *,call)
+   __field(enum rxrpc_receive_trace,   why )
+   __field(rxrpc_serial_t, serial  )
+   __field(rxrpc_seq_t,seq )
+   __field(rxrpc_seq_t,hard_ack)
+   __field(rxrpc_seq_t,top )
+),
+
+   TP_fast_assign(
+   __entry->call = call;
+   __entry->why = why;
+   __entry->serial = serial;
+   __entry->seq = seq;
+   __entry->hard_ack = call->rx_hard_ack;
+   __entry->top = call->rx_top;
+  ),
+
+   TP_printk("c=%p %s r=%08x q=%08x w=%08x-%08x",
+ __entry->call,
+ rxrpc_receive_traces[__entry->why],
+ __entry->serial,
+ __entry->seq,
+ __entry->hard_ack,
+ __entry->top)
+   );
+
 #endif /* _TRACE_RXRPC_H */
 
 /* This part must be outside protection */
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index afa5dcc05fe0..e5d2f2fb8e41 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -605,6 +605,18 @@ enum rxrpc_transmit_trace {
 
 extern const char rxrpc_transmit_traces[rxrpc_transmit__nr_trace][4];
 
+enum rxrpc_receive_trace {
+   rxrpc_receive_incoming,
+   rxrpc_receive_queue,
+   rxrpc_receive_queue_last,
+   rxrpc_receive_front,
+   rxrpc_receive_rotate,
+   rxrpc_receive_end,
+   rxrpc_receive__nr_trace
+};
+
+extern const char rxrpc_receive_traces[rxrpc_receive__nr_trace][4];
+
 extern const char *const rxrpc_pkts[];
 extern const char *rxrpc_acks(u8 reason);
 
diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 3e474508ba75..a8d39d7cf42c 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -367,6 +367,9 @@ found_service:
goto out;
}
 
+   trace_rxrpc_receive(call, rxrpc_receive_incoming,
+   sp->hdr.serial, sp->hdr.seq);
+
/* Make the call live. */
rxrpc_incoming_call(rx, call, skb);
conn = call->conn;
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 7b18ca124978..b690220533c6 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -284,8 +284,12 @@ next_subpacket:
call->rxtx_buffer[ix] = skb;
if (after(seq, call->rx_top))
smp_store_release(&call->rx_top, seq);
-   if (flags & RXRPC_LAST_PACKET)
+   if (flags & RXRPC_LAST_PACKET) {
set_bit(RXRPC_CALL_RX_LAST, &call->flags);
+   trace_rxrpc_receive(call, rxrpc_receive_queue_last, serial, 
seq);
+   } else {
+   trace_rxrpc_receive(call, rxrpc_receive_queue, serial, seq);
+   }
queued = true;
 
if (after_eq(seq, call->rx_expect_next)) {
diff --git a/net/rxrpc/misc.c b/net/rxrpc/misc.c
index dca89995f03e..db5f1d54fc90 100644
--- a/net/rxrpc/misc.c
+++ b/net/rxrpc/misc.c
@@ -141,3 +141,12 @@ const char 
rxrpc_transmit_traces[rxrpc_transmit__nr_trace][4] = {
[rxrpc_transmit_rotate] = "ROT",
[rxrpc_transmit_end]= "END",
 };
+
+const char rxrpc_receive_traces[rxrpc_receive__nr_trace][4] = {
+   [rxrpc_receive_incoming]= "INC",
+   [rxrpc_receive_queue]   = "QUE",
+   [rxrpc_receive_queue_last]  = "QLS",
+   [rxrpc_receive_front]   = "FRN",
+   [rxrpc_receive_rotate]  = "ROT",
+   [rxrpc_receive_end] = "END",
+};
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 8b8d7e14f800..22d51087c580 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -134,6 +134,7 @@ static void rxrpc_end_rx_phase(struct rxrpc_call 

[PATCH net-next 05/14] rxrpc: Record calls that need to be accepted

2016-09-17 Thread David Howells
Record calls that need to be accepted using sk_acceptq_added() otherwise
the backlog counter goes negative because sk_acceptq_removed() is called.
This causes the preallocator to malfunction.

Calls that are preaccepted by AFS within the kernel aren't affected by
this.

Signed-off-by: David Howells 
---

 net/rxrpc/call_accept.c |2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/rxrpc/call_accept.c b/net/rxrpc/call_accept.c
index 26c293ef98eb..323b8da50163 100644
--- a/net/rxrpc/call_accept.c
+++ b/net/rxrpc/call_accept.c
@@ -369,6 +369,8 @@ found_service:
 
if (rx->notify_new_call)
rx->notify_new_call(&rx->sk, call, call->user_call_ID);
+   else
+   sk_acceptq_added(&rx->sk);
 
spin_lock(&conn->state_lock);
switch (conn->state) {



[PATCH net-next 02/14] rxrpc: Move the check of rx_pkt_offset from rxrpc_locate_data() to caller

2016-09-17 Thread David Howells
Move the check of rx_pkt_offset from rxrpc_locate_data() to the caller,
rxrpc_recvmsg_data(), so that it's more clear what's going on there.

Signed-off-by: David Howells 
---

 net/rxrpc/recvmsg.c |9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index a284205b8ecf..0d085f5cf1bf 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -240,9 +240,6 @@ static int rxrpc_locate_data(struct rxrpc_call *call, 
struct sk_buff *skb,
int ret;
u8 annotation = *_annotation;
 
-   if (offset > 0)
-   return 0;
-
/* Locate the subpacket */
offset = sp->offset;
len = skb->len - sp->offset;
@@ -303,8 +300,10 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct 
rxrpc_call *call,
if (msg)
sock_recv_timestamp(msg, sock->sk, skb);
 
-   ret = rxrpc_locate_data(call, skb, &call->rxtx_annotations[ix],
-   &rx_pkt_offset, &rx_pkt_len);
+   if (rx_pkt_offset == 0)
+   ret = rxrpc_locate_data(call, skb,
+   &call->rxtx_annotations[ix],
+   &rx_pkt_offset, &rx_pkt_len);
_debug("recvmsg %x DATA #%u { %d, %d }",
   sp->hdr.callNumber, seq, rx_pkt_offset, rx_pkt_len);
 



[PATCH net-next 01/11] rxrpc: Print the packet type name in the Rx packet trace

2016-09-17 Thread David Howells
Print a symbolic packet type name for each valid received packet in the
trace output, not just a number.

Signed-off-by: David Howells 
---

 include/trace/events/rxrpc.h |5 +++--
 net/rxrpc/ar-internal.h  |6 +++---
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/rxrpc.h b/include/trace/events/rxrpc.h
index ea3b10ed91a8..0a30c673509c 100644
--- a/include/trace/events/rxrpc.h
+++ b/include/trace/events/rxrpc.h
@@ -93,11 +93,12 @@ TRACE_EVENT(rxrpc_rx_packet,
memcpy(&__entry->hdr, >hdr, sizeof(__entry->hdr));
   ),
 
-   TP_printk("%08x:%08x:%08x:%04x %08x %08x %02x %02x",
+   TP_printk("%08x:%08x:%08x:%04x %08x %08x %02x %02x %s",
  __entry->hdr.epoch, __entry->hdr.cid,
  __entry->hdr.callNumber, __entry->hdr.serviceId,
  __entry->hdr.serial, __entry->hdr.seq,
- __entry->hdr.type, __entry->hdr.flags)
+ __entry->hdr.type, __entry->hdr.flags,
+ __entry->hdr.type <= 15 ? rxrpc_pkts[__entry->hdr.type] : 
"?UNK")
);
 
 TRACE_EVENT(rxrpc_rx_done,
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index e78c40b37db5..0f6fafa2c271 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -551,6 +551,9 @@ enum rxrpc_call_trace {
 
 extern const char rxrpc_call_traces[rxrpc_call__nr_trace][4];
 
+extern const char *const rxrpc_pkts[];
+extern const char *rxrpc_acks(u8 reason);
+
 #include 
 
 /*
@@ -851,11 +854,8 @@ extern unsigned int rxrpc_rx_mtu;
 extern unsigned int rxrpc_rx_jumbo_max;
 extern unsigned int rxrpc_resend_timeout;
 
-extern const char *const rxrpc_pkts[];
 extern const s8 rxrpc_ack_priority[];
 
-extern const char *rxrpc_acks(u8 reason);
-
 /*
  * output.c
  */



[PATCH net-next 11/14] rxrpc: Fix retransmission algorithm

2016-09-17 Thread David Howells
Make the retransmission algorithm use for-loops instead of do-loops and
move the counter increments into the for-statement increment slots.

Though the do-loops are slightly more efficient since there will be at least
one pass through each loop, the counter increments are harder to get
right as the continue-statements skip them.

Without this, if there are any positive acks within the loop, the do-loop
will cycle forever because the counter increment is never done.
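
A generic illustration of the difference (simplified names, not the patched
rxrpc code): in the do-loop form the continue-statement jumps over the
trailing increment, so the loop can spin forever on an already-acked
sequence number, whereas the for-loop keeps the increment in the loop header
where continue cannot skip it.

	/* Fragile do-loop form: "continue" skips the seq++ at the bottom. */
	seq = cursor + 1;
	do {
		if (annotations[seq & MASK] == ANNO_ACK)
			continue;	/* seq never advances -> endless loop */
		retransmit(seq);
		seq++;
	} while (before_eq(seq, top));

	/* for-loop form: the increment runs even when "continue" is taken. */
	for (seq = cursor + 1; before_eq(seq, top); seq++) {
		if (annotations[seq & MASK] == ANNO_ACK)
			continue;
		retransmit(seq);
	}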

Signed-off-by: David Howells 
---

 net/rxrpc/call_event.c |   12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 9367c3be31eb..f0cabc48a1b7 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -163,8 +163,7 @@ static void rxrpc_resend(struct rxrpc_call *call)
 */
now = jiffies;
resend_at = now + rxrpc_resend_timeout;
-   seq = cursor + 1;
-   do {
+   for (seq = cursor + 1; before_eq(seq, top); seq++) {
ix = seq & RXRPC_RXTX_BUFF_MASK;
annotation = call->rxtx_annotations[ix];
if (annotation == RXRPC_TX_ANNO_ACK)
@@ -184,8 +183,7 @@ static void rxrpc_resend(struct rxrpc_call *call)
 
/* Okay, we need to retransmit a packet. */
call->rxtx_annotations[ix] = RXRPC_TX_ANNO_RETRANS;
-   seq++;
-   } while (before_eq(seq, top));
+   }
 
call->resend_at = resend_at;
 
@@ -194,8 +192,7 @@ static void rxrpc_resend(struct rxrpc_call *call)
 * lock is dropped, it may clear some of the retransmission markers for
 * packets that it soft-ACKs.
 */
-   seq = cursor + 1;
-   do {
+   for (seq = cursor + 1; before_eq(seq, top); seq++) {
ix = seq & RXRPC_RXTX_BUFF_MASK;
annotation = call->rxtx_annotations[ix];
if (annotation != RXRPC_TX_ANNO_RETRANS)
@@ -237,8 +234,7 @@ static void rxrpc_resend(struct rxrpc_call *call)
 
if (after(call->tx_hard_ack, seq))
seq = call->tx_hard_ack;
-   seq++;
-   } while (before_eq(seq, top));
+   }
 
 out_unlock:
spin_unlock_bh(&call->lock);



[PATCH net-next 08/14] rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()

2016-09-17 Thread David Howells
Call rxrpc_release_call() on getting an error in rxrpc_new_client_call()
rather than trying to do the cleanup ourselves.  This isn't a problem,
provided we set RXRPC_CALL_HAS_USERID only if we actually add the call to
the calls tree, as the cleanup code fragments that would otherwise cause
problems are conditional.

Without this, we miss some of the cleanup.

Signed-off-by: David Howells 
---

 net/rxrpc/call_object.c |   36 
 1 file changed, 12 insertions(+), 24 deletions(-)

diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index b0ffbd9664e6..23f5a5f58282 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -226,9 +226,6 @@ struct rxrpc_call *rxrpc_new_client_call(struct rxrpc_sock 
*rx,
 (const void *)user_call_ID);
 
/* Publish the call, even though it is incompletely set up as yet */
-   call->user_call_ID = user_call_ID;
-   __set_bit(RXRPC_CALL_HAS_USERID, &call->flags);
-
write_lock(&rx->call_lock);
 
pp = &rx->calls.rb_node;
@@ -242,10 +239,12 @@ struct rxrpc_call *rxrpc_new_client_call(struct 
rxrpc_sock *rx,
else if (user_call_ID > xcall->user_call_ID)
pp = &(*pp)->rb_right;
else
-   goto found_user_ID_now_present;
+   goto error_dup_user_ID;
}
 
rcu_assign_pointer(call->socket, rx);
+   call->user_call_ID = user_call_ID;
+   __set_bit(RXRPC_CALL_HAS_USERID, &call->flags);
rxrpc_get_call(call, rxrpc_call_got_userid);
rb_link_node(&call->sock_node, parent, pp);
rb_insert_color(&call->sock_node, &rx->calls);
@@ -276,33 +275,22 @@ struct rxrpc_call *rxrpc_new_client_call(struct 
rxrpc_sock *rx,
_leave(" = %p [new]", call);
return call;
 
-error:
-   write_lock(&rx->call_lock);
-   rb_erase(&call->sock_node, &rx->calls);
-   write_unlock(&rx->call_lock);
-   rxrpc_put_call(call, rxrpc_call_put_userid);
-
-   write_lock(&rxrpc_call_lock);
-   list_del_init(&call->link);
-   write_unlock(&rxrpc_call_lock);
-
-error_out:
-   __rxrpc_set_call_completion(call, RXRPC_CALL_LOCAL_ERROR,
-   RX_CALL_DEAD, ret);
-   set_bit(RXRPC_CALL_RELEASED, &call->flags);
-   rxrpc_put_call(call, rxrpc_call_put);
-   _leave(" = %d", ret);
-   return ERR_PTR(ret);
-
/* We unexpectedly found the user ID in the list after taking
 * the call_lock.  This shouldn't happen unless the user races
 * with itself and tries to add the same user ID twice at the
 * same time in different threads.
 */
-found_user_ID_now_present:
+error_dup_user_ID:
write_unlock(&rx->call_lock);
ret = -EEXIST;
-   goto error_out;
+
+error:
+   __rxrpc_set_call_completion(call, RXRPC_CALL_LOCAL_ERROR,
+   RX_CALL_DEAD, ret);
+   rxrpc_release_call(rx, call);
+   rxrpc_put_call(call, rxrpc_call_put);
+   _leave(" = %d", ret);
+   return ERR_PTR(ret);
 }
 
 /*



[PATCH net-next 06/14] rxrpc: Purge the to_be_accepted queue on socket release

2016-09-17 Thread David Howells
Purge the queue of to_be_accepted calls on socket release.  Note that
purging sock_calls doesn't release the ref owned by to_be_accepted.

Probably the sock_calls list is redundant given purges of the recvmsg_q,
the to_be_accepted queue and the calls tree.

Signed-off-by: David Howells 
---

 net/rxrpc/call_object.c |   10 ++
 1 file changed, 10 insertions(+)

diff --git a/net/rxrpc/call_object.c b/net/rxrpc/call_object.c
index 22f9b0d1a138..b0ffbd9664e6 100644
--- a/net/rxrpc/call_object.c
+++ b/net/rxrpc/call_object.c
@@ -476,6 +476,16 @@ void rxrpc_release_calls_on_socket(struct rxrpc_sock *rx)
 
_enter("%p", rx);
 
+   while (!list_empty(&rx->to_be_accepted)) {
+   call = list_entry(rx->to_be_accepted.next,
+ struct rxrpc_call, accept_link);
+   list_del(&call->accept_link);
+   rxrpc_abort_call("SKR", call, 0, RX_CALL_DEAD, ECONNRESET);
+   rxrpc_send_call_packet(call, RXRPC_PACKET_TYPE_ABORT);
+   rxrpc_release_call(rx, call);
+   rxrpc_put_call(call, rxrpc_call_put);
+   }
+
while (!list_empty(&rx->sock_calls)) {
call = list_entry(rx->sock_calls.next,
  struct rxrpc_call, sock_link);



[PATCH net-next 04/14] rxrpc: Fix handling of the last packet in rxrpc_recvmsg_data()

2016-09-17 Thread David Howells
The code for determining the last packet in rxrpc_recvmsg_data() has been
using the RXRPC_CALL_RX_LAST flag to determine if the rx_top pointer points
to the last packet or not.  This isn't a good idea, however, as the input
code may be running simultaneously on another CPU and that sets the flag
*before* updating the top pointer.

Fix this by the following means:

 (1) Restrict the use of RXRPC_CALL_RX_LAST to the input routines only.
 There's otherwise a synchronisation problem between detecting the flag
 and checking tx_top.  This could probably be dealt with by appropriate
 application of memory barriers, but there's a simpler way.

 (2) Set RXRPC_CALL_RX_LAST after setting rx_top.

 (3) Make rxrpc_rotate_rx_window() consult the flags header field of the
 DATA packet it's about to discard to see if that was the last packet.
 Use this as the basis for ending the Rx phase.  This shouldn't be a
 problem because the recvmsg side of things is guaranteed to see the
 packets in order.

 (4) Make rxrpc_recvmsg_data() return 1 to indicate the end of the data if:

 (a) the packet it has just processed is marked as RXRPC_LAST_PACKET

 (b) the call's Rx phase has been ended.

Signed-off-by: David Howells 
---

 net/rxrpc/input.c   |4 +++-
 net/rxrpc/recvmsg.c |   49 +
 2 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index 75af0bd316c7..f0d9115b9b7e 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -238,7 +238,7 @@ next_subpacket:
len = RXRPC_JUMBO_DATALEN;
 
if (flags & RXRPC_LAST_PACKET) {
-   if (test_and_set_bit(RXRPC_CALL_RX_LAST, &call->flags) &&
+   if (test_bit(RXRPC_CALL_RX_LAST, &call->flags) &&
seq != call->rx_top)
return rxrpc_proto_abort("LSN", call, seq);
} else {
@@ -282,6 +282,8 @@ next_subpacket:
call->rxtx_buffer[ix] = skb;
if (after(seq, call->rx_top))
smp_store_release(>rx_top, seq);
+   if (flags & RXRPC_LAST_PACKET)
+   set_bit(RXRPC_CALL_RX_LAST, &call->flags);
queued = true;
 
if (after_eq(seq, call->rx_expect_next)) {
diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 1edf2cf62cc5..8b8d7e14f800 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -134,6 +134,8 @@ static void rxrpc_end_rx_phase(struct rxrpc_call *call)
 {
_enter("%d,%s", call->debug_id, rxrpc_call_states[call->state]);
 
+   ASSERTCMP(call->rx_hard_ack, ==, call->rx_top);
+
if (call->state == RXRPC_CALL_CLIENT_RECV_REPLY) {
rxrpc_propose_ACK(call, RXRPC_ACK_IDLE, 0, 0, true, false);
rxrpc_send_call_packet(call, RXRPC_PACKET_TYPE_ACK);
@@ -163,8 +165,10 @@ static void rxrpc_end_rx_phase(struct rxrpc_call *call)
  */
 static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
 {
+   struct rxrpc_skb_priv *sp;
struct sk_buff *skb;
rxrpc_seq_t hard_ack, top;
+   u8 flags;
int ix;
 
_enter("%d", call->debug_id);
@@ -177,6 +181,8 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
ix = hard_ack & RXRPC_RXTX_BUFF_MASK;
skb = call->rxtx_buffer[ix];
rxrpc_see_skb(skb);
+   sp = rxrpc_skb(skb);
+   flags = sp->hdr.flags;
call->rxtx_buffer[ix] = NULL;
call->rxtx_annotations[ix] = 0;
/* Barrier against rxrpc_input_data(). */
@@ -184,8 +190,8 @@ static void rxrpc_rotate_rx_window(struct rxrpc_call *call)
 
rxrpc_free_skb(skb);
 
-   _debug("%u,%u,%lx", hard_ack, top, call->flags);
-   if (hard_ack == top && test_bit(RXRPC_CALL_RX_LAST, &call->flags))
+   _debug("%u,%u,%02x", hard_ack, top, flags);
+   if (flags & RXRPC_LAST_PACKET)
rxrpc_end_rx_phase(call);
 }
 
@@ -278,13 +284,19 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct 
rxrpc_call *call,
size_t remain;
bool last;
unsigned int rx_pkt_offset, rx_pkt_len;
-   int ix, copy, ret = 0;
+   int ix, copy, ret = -EAGAIN, ret2;
 
_enter("");
 
rx_pkt_offset = call->rx_pkt_offset;
rx_pkt_len = call->rx_pkt_len;
 
+   if (call->state >= RXRPC_CALL_SERVER_ACK_REQUEST) {
+   seq = call->rx_hard_ack;
+   ret = 1;
+   goto done;
+   }
+
/* Barriers against rxrpc_input_data(). */
hard_ack = call->rx_hard_ack;
top = smp_load_acquire(&call->rx_top);
@@ -301,11 +313,13 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct 
rxrpc_call *call,
sock_recv_timestamp(msg, sock->sk, skb);
 
if (rx_pkt_offset == 0) {
-   ret = rxrpc_locate_data(call, skb,
-   &call->rxtx_annotations[ix],
-   &rx_pkt_offset, 

[PATCH net-next 12/14] rxrpc: Don't transmit an ACK if there's no reason set

2016-09-17 Thread David Howells
Don't transmit an ACK if call->ackr_reason is unset.  There's the
possibility of a race between recvmsg() sending an ACK and the background
processing thread trying to send the same one.

Signed-off-by: David Howells 
---

 net/rxrpc/output.c |5 +
 1 file changed, 5 insertions(+)

diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index 06a9aca739d1..aa0507214b31 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -137,6 +137,11 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 
type)
switch (type) {
case RXRPC_PACKET_TYPE_ACK:
spin_lock_bh(&call->lock);
+   if (!call->ackr_reason) {
+   spin_unlock_bh(&call->lock);
+   ret = 0;
+   goto out;
+   }
n = rxrpc_fill_out_ack(call, pkt);
call->ackr_reason = 0;
 



Re: [PATCH net] MAINTAINERS: Gary Zambrano's email is bouncing

2016-09-17 Thread Florian Fainelli
2016-09-17 15:51 GMT-07:00 Joe Perches :
> On Sat, 2016-09-17 at 15:27 -0700, Florian Fainelli wrote:
>> Gary has not been with Broadcom for some time now, replace his address
>> with the internal mailing-list used for other entries.
>>
>> > Signed-off-by: Florian Fainelli 
>> ---
>> Michael,
>>
>> Since this is an old driver, not sure who could step up as a maintainer
>> for b44?
> []
>> diff --git a/MAINTAINERS b/MAINTAINERS
> []
>> @@ -2500,8 +2500,8 @@ S:  Supported
>
>> F:kernel/bpf/
>>
>>  BROADCOM B44 10/100 ETHERNET DRIVER
>> -M:   Gary Zambrano 
>>  L:   netdev@vger.kernel.org
>> +M:   bcm-kernel-feedback-l...@broadcom.com
>>  S:   Supported
>>  F:   drivers/net/ethernet/broadcom/b44.*
>
> Without an actual maintainer, this should really be
> orphan and not supported.

I would like to hear from Michael before concluding that

>
> And the M: bcm-kernel-feedback-list@ should be L:

The list does not accept public subscribers, so this is the correct
entry to use.

>
> BCM4401 NICs are essentially from 2002.
>
> Does anyone really use these any longer with a
> current distribution or kernel version?

This NIC is also embedded inside BCM47xx/BCM53xx which is still
getting active support from Rafal and Hauke.
-- 
Florian


[PATCH net-next 14/14] rxrpc: Fix the basic transmit DATA packet content size at 1412 bytes

2016-09-17 Thread David Howells
Fix the basic transmit DATA packet content size at 1412 bytes so that they
can be arbitrarily assembled into jumbo packets.

In the future, I'm thinking of moving to keeping a jumbo packet header at
the beginning of each packet in the Tx queue and creating the packet header
on the spot when kernel_sendmsg() is invoked.  That way, jumbo packets can
be assembled on the spur of the moment for (re-)transmission.

Signed-off-by: David Howells 
---

 net/rxrpc/sendmsg.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/sendmsg.c b/net/rxrpc/sendmsg.c
index cba236575073..8bfddf4e338c 100644
--- a/net/rxrpc/sendmsg.c
+++ b/net/rxrpc/sendmsg.c
@@ -214,7 +214,7 @@ static int rxrpc_send_data(struct rxrpc_sock *rx,
goto maybe_error;
}
 
-   max = call->conn->params.peer->maxdata;
+   max = RXRPC_JUMBO_DATALEN;
max -= call->conn->security_size;
max &= ~(call->conn->size_align - 1UL);
 



[PATCH net-next 09/14] rxrpc: Fix unexposed client conn release

2016-09-17 Thread David Howells
If the last call on a client connection is released after the connection has
had a bunch of calls allocated but before any DATA packets are sent (so
that it's not yet marked RXRPC_CONN_EXPOSED), an assertion failure will trigger in
rxrpc_disconnect_client_call().

af_rxrpc: Assertion failed - 1(0x1) >= 2(0x2) is false
[ cut here ]
kernel BUG at ../net/rxrpc/conn_client.c:753!

This is because it's expecting the conn to have been exposed and to have 2
or more refs - but this isn't necessarily the case.

Simply remove the assertion.  This allows the conn to be moved into the
inactive state and deleted if it isn't resurrected before the final put is
called.

Signed-off-by: David Howells 
---

 net/rxrpc/conn_client.c |1 -
 1 file changed, 1 deletion(-)

diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 5a675c43cace..226bc910e556 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -721,7 +721,6 @@ void rxrpc_disconnect_client_call(struct rxrpc_call *call)
}
 
ASSERTCMP(rcu_access_pointer(chan->call), ==, call);
-   ASSERTCMP(atomic_read(&conn->usage), >=, 2);
 
/* If a client call was exposed to the world, we save the result for
 * retransmission.



[PATCH net-next 13/14] rxrpc: Be consistent about switch value in rxrpc_send_call_packet()

2016-09-17 Thread David Howells
rxrpc_send_call_packet() should use type in both its switch-statements
rather than using pkt->whdr.type.  This might give the compiler an easier
job of uninitialised variable checking.

Signed-off-by: David Howells 
---

 net/rxrpc/output.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index aa0507214b31..0b21ed859de7 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -182,7 +182,7 @@ int rxrpc_send_call_packet(struct rxrpc_call *call, u8 type)
 &msg, iov, ioc, len);
 
if (ret < 0 && call->state < RXRPC_CALL_COMPLETE) {
-   switch (pkt->whdr.type) {
+   switch (type) {
case RXRPC_PACKET_TYPE_ACK:
rxrpc_propose_ACK(call, pkt->ack.reason,
  ntohs(pkt->ack.maxSkew),



[PATCH net-next 07/14] rxrpc: Fix the putting of client connections

2016-09-17 Thread David Howells
In rxrpc_put_one_client_conn(), if a connection has RXRPC_CONN_COUNTED set
on it, then it's accounted for in rxrpc_nr_client_conns and may be on
various lists - and this is cleaned up correctly.

However, if the connection doesn't have RXRPC_CONN_COUNTED set on it, then
the put routine returns rather than just skipping the extra bit of cleanup.

Fix this by making the extra bit of clean up conditional instead and always
killing off the connection.

This manifests itself as connections with a zero usage count hanging around
in /proc/net/rxrpc_conns because the connection was allocated, but discarded,
due to a race with another process that set up a parallel connection, which
was then shared instead.

Signed-off-by: David Howells 
---

 net/rxrpc/conn_client.c |   28 +---
 1 file changed, 13 insertions(+), 15 deletions(-)

diff --git a/net/rxrpc/conn_client.c b/net/rxrpc/conn_client.c
index 9344a8416ceb..5a675c43cace 100644
--- a/net/rxrpc/conn_client.c
+++ b/net/rxrpc/conn_client.c
@@ -818,7 +818,7 @@ idle_connection:
 static struct rxrpc_connection *
 rxrpc_put_one_client_conn(struct rxrpc_connection *conn)
 {
-   struct rxrpc_connection *next;
+   struct rxrpc_connection *next = NULL;
struct rxrpc_local *local = conn->params.local;
unsigned int nr_conns;
 
@@ -834,24 +834,22 @@ rxrpc_put_one_client_conn(struct rxrpc_connection *conn)
 
ASSERTCMP(conn->cache_state, ==, RXRPC_CONN_CLIENT_INACTIVE);
 
-   if (!test_bit(RXRPC_CONN_COUNTED, &conn->flags))
-   return NULL;
-
-   spin_lock(&rxrpc_client_conn_cache_lock);
-   nr_conns = --rxrpc_nr_client_conns;
+   if (test_bit(RXRPC_CONN_COUNTED, &conn->flags)) {
+   spin_lock(&rxrpc_client_conn_cache_lock);
+   nr_conns = --rxrpc_nr_client_conns;
+
+   if (nr_conns < rxrpc_max_client_connections &&
+   !list_empty(&rxrpc_waiting_client_conns)) {
+   next = list_entry(rxrpc_waiting_client_conns.next,
+ struct rxrpc_connection, cache_link);
+   rxrpc_get_connection(next);
+   rxrpc_activate_conn(next);
+   }
 
-   next = NULL;
-   if (nr_conns < rxrpc_max_client_connections &&
-   !list_empty(&rxrpc_waiting_client_conns)) {
-   next = list_entry(rxrpc_waiting_client_conns.next,
- struct rxrpc_connection, cache_link);
-   rxrpc_get_connection(next);
-   rxrpc_activate_conn(next);
+   spin_unlock(&rxrpc_client_conn_cache_lock);
}
 
-   spin_unlock(&rxrpc_client_conn_cache_lock);
rxrpc_kill_connection(conn);
-
if (next)
rxrpc_activate_channels(next);
 



[PATCH net-next 00/14] rxrpc: Fixes & miscellany

2016-09-17 Thread David Howells

Here are some more AF_RXRPC fix patches, along with a couple of miscellaneous
changes.  Fixes include:

 (1) Make RxRPC IPv6 support conditional on IPv6 being available.

 (2) Move the condition check in rxrpc_locate_data() into the caller and
 check the error return.

 (3) Fix the detection of the last received packet in recvmsg.

 (4) Account calls that need acceptance and clean up any unaccepted ones if
 the socket gets closed.

 (5) Fix the cleanup of client connections.

 (6) Fix the soft-ACK parsing and the retransmission of packets based on
 those ACKs.

 (7) Suppress transmission of an ACK when there's no pending ACK to
 transmit because another thread stole it.

And some miscellany:

 (8) Whitespace removal.

 (9) Switch-value consistency in rxrpc_send_call_packet().

(10) Fix the basic transmission packet size to allow for spur-of-the-moment
 jumbo DATA packet production.


The patches can be found here also (non-terminally on the branch):


http://git.kernel.org/cgit/linux/kernel/git/dhowells/linux-fs.git/log/?h=rxrpc-rewrite

Tagged thusly:

git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git
rxrpc-rewrite-20160917-1

David
---
David Howells (14):
  rxrpc: Remove some whitespace.
  rxrpc: Move the check of rx_pkt_offset from rxrpc_locate_data() to caller
  rxrpc: Check the return value of rxrpc_locate_data()
  rxrpc: Fix handling of the last packet in rxrpc_recvmsg_data()
  rxrpc: Record calls that need to be accepted
  rxrpc: Purge the to_be_accepted queue on socket release
  rxrpc: Fix the putting of client connections
  rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()
  rxrpc: Fix unexposed client conn release
  rxrpc: Fix the parsing of soft-ACKs
  rxrpc: Fix retransmission algorithm
  rxrpc: Don't transmit an ACK if there's no reason set
  rxrpc: Be consistent about switch value in rxrpc_send_call_packet()
  rxrpc: Fix the basic transmit DATA packet content size at 1412 bytes


 net/rxrpc/call_accept.c |2 ++
 net/rxrpc/call_event.c  |   14 
 net/rxrpc/call_object.c |   46 -
 net/rxrpc/conn_client.c |   29 --
 net/rxrpc/input.c   |6 -
 net/rxrpc/output.c  |7 +-
 net/rxrpc/recvmsg.c |   53 ---
 net/rxrpc/sendmsg.c |2 +-
 8 files changed, 89 insertions(+), 70 deletions(-)



[PATCH net-next 01/14] rxrpc: Remove some whitespace.

2016-09-17 Thread David Howells
Remove a tab that's on a line that should otherwise be blank.

Signed-off-by: David Howells 
---

 net/rxrpc/call_event.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/rxrpc/call_event.c b/net/rxrpc/call_event.c
index 61432049869b..9367c3be31eb 100644
--- a/net/rxrpc/call_event.c
+++ b/net/rxrpc/call_event.c
@@ -31,7 +31,7 @@ static void rxrpc_set_timer(struct rxrpc_call *call)
_enter("{%ld,%ld,%ld:%ld}",
   call->ack_at - now, call->resend_at - now, call->expire_at - now,
   call->timer.expires - now);
-   
+
read_lock_bh(&call->state_lock);
 
if (call->state < RXRPC_CALL_COMPLETE) {



[PATCH net-next 03/14] rxrpc: Check the return value of rxrpc_locate_data()

2016-09-17 Thread David Howells
Check the return value of rxrpc_locate_data() in rxrpc_recvmsg_data().

Signed-off-by: David Howells 
---

 net/rxrpc/recvmsg.c |5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/net/rxrpc/recvmsg.c b/net/rxrpc/recvmsg.c
index 0d085f5cf1bf..1edf2cf62cc5 100644
--- a/net/rxrpc/recvmsg.c
+++ b/net/rxrpc/recvmsg.c
@@ -300,10 +300,13 @@ static int rxrpc_recvmsg_data(struct socket *sock, struct 
rxrpc_call *call,
if (msg)
sock_recv_timestamp(msg, sock->sk, skb);
 
-   if (rx_pkt_offset == 0)
+   if (rx_pkt_offset == 0) {
ret = rxrpc_locate_data(call, skb,
&call->rxtx_annotations[ix],
&rx_pkt_offset, &rx_pkt_len);
+   if (ret < 0)
+   goto out;
+   }
_debug("recvmsg %x DATA #%u { %d, %d }",
   sp->hdr.callNumber, seq, rx_pkt_offset, rx_pkt_len);
 



[PATCH net-next 1/5] pie: use qdisc_dequeue_head wrapper

2016-09-17 Thread Florian Westphal
Doesn't change generated code.

Signed-off-by: Florian Westphal 
---
 net/sched/sch_pie.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index a570b0b..d976d74 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -511,7 +511,7 @@ static int pie_dump_stats(struct Qdisc *sch, struct 
gnet_dump *d)
 static struct sk_buff *pie_qdisc_dequeue(struct Qdisc *sch)
 {
struct sk_buff *skb;
-   skb = __qdisc_dequeue_head(sch, &sch->q);
+   skb = qdisc_dequeue_head(sch);
 
if (!skb)
return NULL;
-- 
2.7.3



[PATCH net-next 0/5] sched: convert queues to single-linked list

2016-09-17 Thread Florian Westphal
During Netfilter Workshop 2016 Eric Dumazet pointed out that qdisc
schedulers use doubly-linked lists, even though a single-linked list
would be enough.

The double-linked skb lists incur one extra write on enqueue/dequeue
operations (to change ->prev pointer of next list elem).

This series converts qdiscs to a single-linked version; the list head
maintains pointers to the first skb (for dequeue) and the last skb (for enqueue).

Most qdiscs don't queue at all and instead use a leaf qdisc (typically
pfifo_fast) so only a few schedulers needed changes.

I briefly tested netem and htb and they seemed fine.

UDP_STREAM netperf with 64 byte packets via veth+pfifo_fast shows
a small (~2%) improvement.
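
To make the saved write concrete, here is a rough sketch (illustrative only,
paraphrasing what patch 5 does with the new qdisc_skb_head): enqueueing onto a
head/tail singly-linked list only writes the old tail's ->next and the tail
pointer, whereas the doubly-linked sk_buff_head also has to update the
neighbouring element's ->prev.

	static void example_enqueue(struct qdisc_skb_head *qh, struct sk_buff *skb)
	{
		skb->next = NULL;
		if (qh->tail)
			qh->tail->next = skb;	/* link after current tail */
		else
			qh->head = skb;		/* list was empty */
		qh->tail = skb;
		qh->qlen++;
	}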

Florian Westphal (5):
  pie: use qdisc_dequeue_head wrapper
  sched: don't use skb queue helpers
  sched: remove qdisc arg from __qdisc_dequeue_head
  sched: replace __skb_dequeue with __qdisc_dequeue_head
  sched: add and use qdisc_skb_head helpers

 include/net/sch_generic.h |   72 +++---
 net/sched/sch_codel.c |4 +-
 net/sched/sch_fifo.c  |4 +-
 net/sched/sch_generic.c   |   30 +++
 net/sched/sch_htb.c   |   24 ---
 net/sched/sch_netem.c |   20 +---
 net/sched/sch_pie.c   |4 +-
 7 files changed, 115 insertions(+), 43 deletions(-)



[PATCH net-next 4/5] sched: replace __skb_dequeue with __qdisc_dequeue_head

2016-09-17 Thread Florian Westphal
After the previous patch these functions are identical.
Replace __skb_dequeue in qdiscs with __qdisc_dequeue_head.

The next patch will then make __qdisc_dequeue_head handle
a single-linked list instead of a struct sk_buff_head argument.

Doesn't change generated code.

Signed-off-by: Florian Westphal 
---
 net/sched/sch_codel.c | 4 ++--
 net/sched/sch_netem.c | 2 +-
 net/sched/sch_pie.c   | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_codel.c b/net/sched/sch_codel.c
index 4002df3..5bfa79e 100644
--- a/net/sched/sch_codel.c
+++ b/net/sched/sch_codel.c
@@ -69,7 +69,7 @@ struct codel_sched_data {
 static struct sk_buff *dequeue_func(struct codel_vars *vars, void *ctx)
 {
struct Qdisc *sch = ctx;
-   struct sk_buff *skb = __skb_dequeue(&sch->q);
+   struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
 
if (skb)
sch->qstats.backlog -= qdisc_pkt_len(skb);
@@ -172,7 +172,7 @@ static int codel_change(struct Qdisc *sch, struct nlattr 
*opt)
 
qlen = sch->q.qlen;
while (sch->q.qlen > sch->limit) {
-   struct sk_buff *skb = __skb_dequeue(&sch->q);
+   struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
 
dropped += qdisc_pkt_len(skb);
qdisc_qstats_backlog_dec(sch, skb);
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index 1832d77..0a964b3 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -587,7 +587,7 @@ static struct sk_buff *netem_dequeue(struct Qdisc *sch)
struct rb_node *p;
 
 tfifo_dequeue:
-   skb = __skb_dequeue(&sch->q);
+   skb = __qdisc_dequeue_head(&sch->q);
if (skb) {
qdisc_qstats_backlog_dec(sch, skb);
 deliver:
diff --git a/net/sched/sch_pie.c b/net/sched/sch_pie.c
index d976d74..5c3a99d 100644
--- a/net/sched/sch_pie.c
+++ b/net/sched/sch_pie.c
@@ -231,7 +231,7 @@ static int pie_change(struct Qdisc *sch, struct nlattr *opt)
/* Drop excess packets if new limit is lower */
qlen = sch->q.qlen;
while (sch->q.qlen > sch->limit) {
-   struct sk_buff *skb = __skb_dequeue(&sch->q);
+   struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
 
dropped += qdisc_pkt_len(skb);
qdisc_qstats_backlog_dec(sch, skb);
-- 
2.7.3



[PATCH net-next 5/5] sched: add and use qdisc_skb_head helpers

2016-09-17 Thread Florian Westphal
This change replaces the sk_buff_head struct in Qdiscs with a new qdisc_skb_head.

It's similar to the sk_buff_head API, but does not use skb->prev pointers.

Qdiscs will commonly enqueue at the tail of a list and dequeue at the head.
While sk_buff_head works fine for this, enqueue/dequeue also needs to
adjust the prev pointer of the next element.

The ->prev pointer is not required for qdiscs so we can just leave
it undefined and avoid one cacheline write access for en/dequeue.

Suggested-by: Eric Dumazet 
Signed-off-by: Florian Westphal 
---
 include/net/sch_generic.h | 63 ++-
 net/sched/sch_generic.c   | 21 
 net/sched/sch_htb.c   | 24 +++---
 net/sched/sch_netem.c | 14 +--
 4 files changed, 94 insertions(+), 28 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 0741ed4..e6aa0a2 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -36,6 +36,14 @@ struct qdisc_size_table {
u16 data[];
 };
 
+/* similar to sk_buff_head, but skb->prev pointer is undefined. */
+struct qdisc_skb_head {
+   struct sk_buff  *head;
+   struct sk_buff  *tail;
+   __u32   qlen;
+   spinlock_t  lock;
+};
+
 struct Qdisc {
int (*enqueue)(struct sk_buff *skb,
   struct Qdisc *sch,
@@ -76,7 +84,7 @@ struct Qdisc {
 * For performance sake on SMP, we put highly modified fields at the end
 */
struct sk_buff  *gso_skb ____cacheline_aligned_in_smp;
-   struct sk_buff_head q;
+   struct qdisc_skb_head   q;
struct gnet_stats_basic_packed bstats;
seqcount_t  running;
struct gnet_stats_queue qstats;
@@ -600,10 +608,27 @@ static inline void qdisc_qstats_overlimit(struct Qdisc 
*sch)
sch->qstats.overlimits++;
 }
 
+static inline void qdisc_skb_head_init(struct qdisc_skb_head *qh)
+{
+   qh->head = NULL;
+   qh->tail = NULL;
+   qh->qlen = 0;
+}
+
 static inline int __qdisc_enqueue_tail(struct sk_buff *skb, struct Qdisc *sch,
-  struct sk_buff_head *list)
+  struct qdisc_skb_head *qh)
 {
-   __skb_queue_tail(list, skb);
+   struct sk_buff *last = qh->tail;
+
+   if (last) {
+   skb->next = NULL;
+   last->next = skb;
+   qh->tail = skb;
+   } else {
+   qh->tail = skb;
+   qh->head = skb;
+   }
+   qh->qlen++;
qdisc_qstats_backlog_inc(sch, skb);
 
return NET_XMIT_SUCCESS;
@@ -614,9 +639,17 @@ static inline int qdisc_enqueue_tail(struct sk_buff *skb, 
struct Qdisc *sch)
return __qdisc_enqueue_tail(skb, sch, &sch->q);
 }
 
-static inline struct sk_buff *__qdisc_dequeue_head(struct sk_buff_head *list)
+static inline struct sk_buff *__qdisc_dequeue_head(struct qdisc_skb_head *qh)
 {
-   struct sk_buff *skb = __skb_dequeue(list);
+   struct sk_buff *skb = qh->head;
+
+   if (likely(skb != NULL)) {
+   qh->head = skb->next;
+   qh->qlen--;
+   if (qh->head == NULL)
+   qh->tail = NULL;
+   skb->next = NULL;
+   }
 
return skb;
 }
@@ -643,10 +676,10 @@ static inline void __qdisc_drop(struct sk_buff *skb, 
struct sk_buff **to_free)
 }
 
 static inline unsigned int __qdisc_queue_drop_head(struct Qdisc *sch,
-  struct sk_buff_head *list,
+  struct qdisc_skb_head *qh,
   struct sk_buff **to_free)
 {
-   struct sk_buff *skb = __skb_dequeue(list);
+   struct sk_buff *skb = __qdisc_dequeue_head(qh);
 
if (likely(skb != NULL)) {
unsigned int len = qdisc_pkt_len(skb);
@@ -667,7 +700,9 @@ static inline unsigned int qdisc_queue_drop_head(struct 
Qdisc *sch,
 
 static inline struct sk_buff *qdisc_peek_head(struct Qdisc *sch)
 {
-   return skb_peek(&sch->q);
+   const struct qdisc_skb_head *qh = &sch->q;
+
+   return qh->head;
 }
 
 /* generic pseudo peek method for non-work-conserving qdisc */
@@ -702,15 +737,19 @@ static inline struct sk_buff *qdisc_dequeue_peeked(struct 
Qdisc *sch)
return skb;
 }
 
-static inline void __qdisc_reset_queue(struct sk_buff_head *list)
+static inline void __qdisc_reset_queue(struct qdisc_skb_head *qh)
 {
/*
 * We do not know the backlog in bytes of this list, it
 * is up to the caller to correct it
 */
-   if (!skb_queue_empty(list)) {
-   rtnl_kfree_skbs(list->next, list->prev);
-   __skb_queue_head_init(list);
+   ASSERT_RTNL();
+   if (qh->qlen) {
+   rtnl_kfree_skbs(qh->head, qh->tail);
+
+   qh->head = 

[PATCH net-next 3/5] sched: remove qdisc arg from __qdisc_dequeue_head

2016-09-17 Thread Florian Westphal
Moves qdisc stat accounting to qdisc_dequeue_head.

The only direct caller of the __qdisc_dequeue_head version open-codes
this now.

This allows us to later use __qdisc_dequeue_head as a replacement
of __skb_dequeue() (which operates on sk_buff_head list).

Signed-off-by: Florian Westphal 
---
 include/net/sch_generic.h | 15 ---
 net/sched/sch_generic.c   |  7 ++-
 2 files changed, 14 insertions(+), 8 deletions(-)

diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 52a2015..0741ed4 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -614,11 +614,17 @@ static inline int qdisc_enqueue_tail(struct sk_buff *skb, 
struct Qdisc *sch)
return __qdisc_enqueue_tail(skb, sch, &sch->q);
 }
 
-static inline struct sk_buff *__qdisc_dequeue_head(struct Qdisc *sch,
-  struct sk_buff_head *list)
+static inline struct sk_buff *__qdisc_dequeue_head(struct sk_buff_head *list)
 {
struct sk_buff *skb = __skb_dequeue(list);
 
+   return skb;
+}
+
+static inline struct sk_buff *qdisc_dequeue_head(struct Qdisc *sch)
+{
+   struct sk_buff *skb = __qdisc_dequeue_head(&sch->q);
+
if (likely(skb != NULL)) {
qdisc_qstats_backlog_dec(sch, skb);
qdisc_bstats_update(sch, skb);
@@ -627,11 +633,6 @@ static inline struct sk_buff *__qdisc_dequeue_head(struct 
Qdisc *sch,
return skb;
 }
 
-static inline struct sk_buff *qdisc_dequeue_head(struct Qdisc *sch)
-{
-   return __qdisc_dequeue_head(sch, &sch->q);
-}
-
 /* Instead of calling kfree_skb() while root qdisc lock is held,
  * queue the skb for future freeing at end of __dev_xmit_skb()
  */
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 5e63bf6..73877d9 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -506,7 +506,12 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc 
*qdisc)
 
if (likely(band >= 0)) {
struct sk_buff_head *list = band2list(priv, band);
-   struct sk_buff *skb = __qdisc_dequeue_head(qdisc, list);
+   struct sk_buff *skb = __qdisc_dequeue_head(list);
+
+   if (likely(skb != NULL)) {
+   qdisc_qstats_backlog_dec(qdisc, skb);
+   qdisc_bstats_update(qdisc, skb);
+   }
 
qdisc->q.qlen--;
if (skb_queue_empty(list))
-- 
2.7.3



[PATCH net-next 2/5] sched: don't use skb queue helpers

2016-09-17 Thread Florian Westphal
A followup change will replace the sk_buff_head in the qdisc
struct with a slightly different list.

Use of the sk_buff_head helpers will thus cause compiler
warnings.

Open-code these accesses in an extra change to ease review.

Signed-off-by: Florian Westphal 
---
 net/sched/sch_fifo.c| 4 ++--
 net/sched/sch_generic.c | 2 +-
 net/sched/sch_netem.c   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/sched/sch_fifo.c b/net/sched/sch_fifo.c
index baeed6a..1e37247 100644
--- a/net/sched/sch_fifo.c
+++ b/net/sched/sch_fifo.c
@@ -31,7 +31,7 @@ static int bfifo_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
 static int pfifo_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 struct sk_buff **to_free)
 {
-   if (likely(skb_queue_len(&sch->q) < sch->limit))
+   if (likely(sch->q.qlen < sch->limit))
return qdisc_enqueue_tail(skb, sch);
 
return qdisc_drop(skb, sch, to_free);
@@ -42,7 +42,7 @@ static int pfifo_tail_enqueue(struct sk_buff *skb, struct 
Qdisc *sch,
 {
unsigned int prev_backlog;
 
-   if (likely(skb_queue_len(&sch->q) < sch->limit))
+   if (likely(sch->q.qlen < sch->limit))
return qdisc_enqueue_tail(skb, sch);
 
prev_backlog = sch->qstats.backlog;
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 0d21b56..5e63bf6 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -486,7 +486,7 @@ static inline struct sk_buff_head *band2list(struct 
pfifo_fast_priv *priv,
 static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc,
  struct sk_buff **to_free)
 {
-   if (skb_queue_len(&qdisc->q) < qdisc_dev(qdisc)->tx_queue_len) {
+   if (qdisc->q.qlen < qdisc_dev(qdisc)->tx_queue_len) {
int band = prio2band[skb->priority & TC_PRIO_MAX];
struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
struct sk_buff_head *list = band2list(priv, band);
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c
index aaaf021..1832d77 100644
--- a/net/sched/sch_netem.c
+++ b/net/sched/sch_netem.c
@@ -502,7 +502,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
1<<(prandom_u32() % 8);
}
 
-   if (unlikely(skb_queue_len(&sch->q) >= sch->limit))
+   if (unlikely(sch->q.qlen >= sch->limit))
return qdisc_drop(skb, sch, to_free);
 
qdisc_qstats_backlog_inc(sch, skb);
@@ -522,7 +522,7 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
if (q->rate) {
struct sk_buff *last;
 
-   if (!skb_queue_empty(&sch->q))
+   if (sch->q.qlen)
last = skb_peek_tail(&sch->q);
else
last = netem_rb_to_skb(rb_last(&q->t_root));
-- 
2.7.3



Re: [PATCH net] MAINTAINERS: Gary Zambrano's email is bouncing

2016-09-17 Thread Joe Perches
On Sat, 2016-09-17 at 15:27 -0700, Florian Fainelli wrote:
> Gary has not been with Broadcom for some time now, replace his address
> with the internal mailing-list used for other entries.
> 
> > Signed-off-by: Florian Fainelli 
> --- 
> Michael,
> 
> Since this is an old driver, not sure who could step up as a maintainer
> for b44?
[]
> diff --git a/MAINTAINERS b/MAINTAINERS
[]
> @@ -2500,8 +2500,8 @@ S:  Supported

> F:kernel/bpf/
>  
>  BROADCOM B44 10/100 ETHERNET DRIVER
> -M:   Gary Zambrano 
>  L:   netdev@vger.kernel.org
> +M:   bcm-kernel-feedback-l...@broadcom.com
>  S:   Supported
>  F:   drivers/net/ethernet/broadcom/b44.*

Without an actual maintainer, this should really be
orphan and not supported.

And the M: bcm-kernel-feedback-list@ should be L:

BCM4401 NICs are essentially from 2002.

Does anyone really use these any longer with a
current distribution or kernel version?



[PATCH net] MAINTAINERS: Gary Zambrano's email is bouncing

2016-09-17 Thread Florian Fainelli
Gary has not been with Broadcom for some time now, replace his address
with the internal mailing-list used for other entries.

Signed-off-by: Florian Fainelli 
---
Michael,

Since this is an old driver, not sure who could step up as a maintainer
for b44?

 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index a5e1270dfbf1..dffc3bca17ee 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2500,8 +2500,8 @@ S:Supported
 F: kernel/bpf/
 
 BROADCOM B44 10/100 ETHERNET DRIVER
-M: Gary Zambrano 
 L: netdev@vger.kernel.org
+M: bcm-kernel-feedback-l...@broadcom.com
 S: Supported
 F: drivers/net/ethernet/broadcom/b44.*
 
-- 
2.7.4



[PATCH 2/2] net: ethernet: broadcom: b44: use new api ethtool_{get|set}_link_ksettings

2016-09-17 Thread Philippe Reynes
The ethtool api {get|set}_settings is deprecated.
We move this driver to the new api {get|set}_link_ksettings.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/broadcom/b44.c |   98 +++
 1 files changed, 54 insertions(+), 44 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/b44.c 
b/drivers/net/ethernet/broadcom/b44.c
index 936f06f..17aa33c 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -1832,58 +1832,65 @@ static int b44_nway_reset(struct net_device *dev)
return r;
 }
 
-static int b44_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int b44_get_link_ksettings(struct net_device *dev,
+ struct ethtool_link_ksettings *cmd)
 {
struct b44 *bp = netdev_priv(dev);
+   u32 supported, advertising;
 
if (bp->flags & B44_FLAG_EXTERNAL_PHY) {
BUG_ON(!dev->phydev);
-   return phy_ethtool_gset(dev->phydev, cmd);
+   return phy_ethtool_ksettings_get(dev->phydev, cmd);
}
 
-   cmd->supported = (SUPPORTED_Autoneg);
-   cmd->supported |= (SUPPORTED_100baseT_Half |
- SUPPORTED_100baseT_Full |
- SUPPORTED_10baseT_Half |
- SUPPORTED_10baseT_Full |
- SUPPORTED_MII);
+   supported = (SUPPORTED_Autoneg);
+   supported |= (SUPPORTED_100baseT_Half |
+ SUPPORTED_100baseT_Full |
+ SUPPORTED_10baseT_Half |
+ SUPPORTED_10baseT_Full |
+ SUPPORTED_MII);
 
-   cmd->advertising = 0;
+   advertising = 0;
if (bp->flags & B44_FLAG_ADV_10HALF)
-   cmd->advertising |= ADVERTISED_10baseT_Half;
+   advertising |= ADVERTISED_10baseT_Half;
if (bp->flags & B44_FLAG_ADV_10FULL)
-   cmd->advertising |= ADVERTISED_10baseT_Full;
+   advertising |= ADVERTISED_10baseT_Full;
if (bp->flags & B44_FLAG_ADV_100HALF)
-   cmd->advertising |= ADVERTISED_100baseT_Half;
+   advertising |= ADVERTISED_100baseT_Half;
if (bp->flags & B44_FLAG_ADV_100FULL)
-   cmd->advertising |= ADVERTISED_100baseT_Full;
-   cmd->advertising |= ADVERTISED_Pause | ADVERTISED_Asym_Pause;
-   ethtool_cmd_speed_set(cmd, ((bp->flags & B44_FLAG_100_BASE_T) ?
-   SPEED_100 : SPEED_10));
-   cmd->duplex = (bp->flags & B44_FLAG_FULL_DUPLEX) ?
+   advertising |= ADVERTISED_100baseT_Full;
+   advertising |= ADVERTISED_Pause | ADVERTISED_Asym_Pause;
+   cmd->base.speed = (bp->flags & B44_FLAG_100_BASE_T) ?
+   SPEED_100 : SPEED_10;
+   cmd->base.duplex = (bp->flags & B44_FLAG_FULL_DUPLEX) ?
DUPLEX_FULL : DUPLEX_HALF;
-   cmd->port = 0;
-   cmd->phy_address = bp->phy_addr;
-   cmd->transceiver = (bp->flags & B44_FLAG_EXTERNAL_PHY) ?
-   XCVR_EXTERNAL : XCVR_INTERNAL;
-   cmd->autoneg = (bp->flags & B44_FLAG_FORCE_LINK) ?
+   cmd->base.port = 0;
+   cmd->base.phy_address = bp->phy_addr;
+   cmd->base.autoneg = (bp->flags & B44_FLAG_FORCE_LINK) ?
AUTONEG_DISABLE : AUTONEG_ENABLE;
-   if (cmd->autoneg == AUTONEG_ENABLE)
-   cmd->advertising |= ADVERTISED_Autoneg;
+   if (cmd->base.autoneg == AUTONEG_ENABLE)
+   advertising |= ADVERTISED_Autoneg;
+
+   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.supported,
+   supported);
+   ethtool_convert_legacy_u32_to_link_mode(cmd->link_modes.advertising,
+   advertising);
+
if (!netif_running(dev)){
-   ethtool_cmd_speed_set(cmd, 0);
-   cmd->duplex = 0xff;
+   cmd->base.speed = 0;
+   cmd->base.duplex = 0xff;
}
-   cmd->maxtxpkt = 0;
-   cmd->maxrxpkt = 0;
+
return 0;
 }
 
-static int b44_set_settings(struct net_device *dev, struct ethtool_cmd *cmd)
+static int b44_set_link_ksettings(struct net_device *dev,
+ const struct ethtool_link_ksettings *cmd)
 {
struct b44 *bp = netdev_priv(dev);
u32 speed;
int ret;
+   u32 advertising;
 
if (bp->flags & B44_FLAG_EXTERNAL_PHY) {
BUG_ON(!dev->phydev);
@@ -1891,31 +1898,34 @@ static int b44_set_settings(struct net_device *dev, 
struct ethtool_cmd *cmd)
if (netif_running(dev))
b44_setup_phy(bp);
 
-   ret = phy_ethtool_sset(dev->phydev, cmd);
+   ret = phy_ethtool_ksettings_set(dev->phydev, cmd);
 
spin_unlock_irq(&bp->lock);
 
return ret;
}
 
-   speed = ethtool_cmd_speed(cmd);
+   speed = cmd->base.speed;
+
+   

[PATCH 1/2] net: ethernet: broadcom: b44: use phydev from struct net_device

2016-09-17 Thread Philippe Reynes
The private structure contains a pointer to phydev, but the structure
net_device already contains such a pointer. So we can remove the
phydev pointer from the private structure, and update the driver to use
the one contained in struct net_device.

Signed-off-by: Philippe Reynes 
---
 drivers/net/ethernet/broadcom/b44.c |   22 +++---
 drivers/net/ethernet/broadcom/b44.h |1 -
 2 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/b44.c 
b/drivers/net/ethernet/broadcom/b44.c
index 74f0a37..936f06f 100644
--- a/drivers/net/ethernet/broadcom/b44.c
+++ b/drivers/net/ethernet/broadcom/b44.c
@@ -1486,7 +1486,7 @@ static int b44_open(struct net_device *dev)
b44_enable_ints(bp);
 
if (bp->flags & B44_FLAG_EXTERNAL_PHY)
-   phy_start(bp->phydev);
+   phy_start(dev->phydev);
 
netif_start_queue(dev);
 out:
@@ -1651,7 +1651,7 @@ static int b44_close(struct net_device *dev)
netif_stop_queue(dev);
 
if (bp->flags & B44_FLAG_EXTERNAL_PHY)
-   phy_stop(bp->phydev);
+   phy_stop(dev->phydev);
 
napi_disable(&bp->napi);
 
@@ -1837,8 +1837,8 @@ static int b44_get_settings(struct net_device *dev, 
struct ethtool_cmd *cmd)
struct b44 *bp = netdev_priv(dev);
 
if (bp->flags & B44_FLAG_EXTERNAL_PHY) {
-   BUG_ON(!bp->phydev);
-   return phy_ethtool_gset(bp->phydev, cmd);
+   BUG_ON(!dev->phydev);
+   return phy_ethtool_gset(dev->phydev, cmd);
}
 
cmd->supported = (SUPPORTED_Autoneg);
@@ -1886,12 +1886,12 @@ static int b44_set_settings(struct net_device *dev, 
struct ethtool_cmd *cmd)
int ret;
 
if (bp->flags & B44_FLAG_EXTERNAL_PHY) {
-   BUG_ON(!bp->phydev);
+   BUG_ON(!dev->phydev);
spin_lock_irq(&bp->lock);
if (netif_running(dev))
b44_setup_phy(bp);
 
-   ret = phy_ethtool_sset(bp->phydev, cmd);
+   ret = phy_ethtool_sset(dev->phydev, cmd);
 
spin_unlock_irq(&bp->lock);
 
@@ -2137,8 +2137,8 @@ static int b44_ioctl(struct net_device *dev, struct ifreq 
*ifr, int cmd)
 
spin_lock_irq(&bp->lock);
if (bp->flags & B44_FLAG_EXTERNAL_PHY) {
-   BUG_ON(!bp->phydev);
-   err = phy_mii_ioctl(bp->phydev, ifr, cmd);
+   BUG_ON(!dev->phydev);
+   err = phy_mii_ioctl(dev->phydev, ifr, cmd);
} else {
err = generic_mii_ioctl(&bp->mii_if, if_mii(ifr), cmd, NULL);
}
@@ -2206,7 +2206,7 @@ static const struct net_device_ops b44_netdev_ops = {
 static void b44_adjust_link(struct net_device *dev)
 {
struct b44 *bp = netdev_priv(dev);
-   struct phy_device *phydev = bp->phydev;
+   struct phy_device *phydev = dev->phydev;
bool status_changed = 0;
 
BUG_ON(!phydev);
@@ -2303,7 +2303,6 @@ static int b44_register_phy_one(struct b44 *bp)
  SUPPORTED_MII);
phydev->advertising = phydev->supported;
 
-   bp->phydev = phydev;
bp->old_link = 0;
bp->phy_addr = phydev->mdio.addr;
 
@@ -2323,9 +2322,10 @@ err_out:
 
 static void b44_unregister_phy_one(struct b44 *bp)
 {
+   struct net_device *dev = bp->dev;
struct mii_bus *mii_bus = bp->mii_bus;
 
-   phy_disconnect(bp->phydev);
+   phy_disconnect(dev->phydev);
mdiobus_unregister(mii_bus);
mdiobus_free(mii_bus);
 }
diff --git a/drivers/net/ethernet/broadcom/b44.h 
b/drivers/net/ethernet/broadcom/b44.h
index 65d88d7..89d2cf3 100644
--- a/drivers/net/ethernet/broadcom/b44.h
+++ b/drivers/net/ethernet/broadcom/b44.h
@@ -404,7 +404,6 @@ struct b44 {
u32 tx_pending;
u8  phy_addr;
u8  force_copybreak;
-   struct phy_device   *phydev;
struct mii_bus  *mii_bus;
int old_link;
struct mii_if_info  mii_if;
-- 
1.7.4.4



Re: stmmac/RTL8211F/Meson GXBB: TX throughput problems

2016-09-17 Thread André Roth

Hi all,

I have an odroid c2 board which shows this issue. No data is
transmitted or received after a moment of intense tx traffic. Copying a
1GB file via scp from the board triggers it repeatedly.

The board has a stmmac - user ID: 0x11, Synopsys ID: 0x37.

When switching the network to 100Mb/s the copying does
not seem to trigger the issue.

I've attached the ethtool statistics before and after the problem.

Thanks for your help, 

 André



> Hi Alexandre,
> 
> On Mon, Sep 12, 2016 at 6:37 PM, Alexandre Torgue
>  wrote:
> > Which Synopsys IP version do you use ?  
> found this in a dmesg log:
> [1.504784] stmmac - user ID: 0x11, Synopsys ID: 0x37
> [1.509785]  Ring mode enabled
> [1.512796]  DMA HW capability register supported
> [1.517286]  Normal descriptors
> [1.520565]  RX Checksum Offload Engine supported
> [1.525219]  COE Type 2
> [1.527638]  TX Checksum insertion supported
> [1.531862]  Wake-Up On Lan supported
> [1.535483]  Enable RX Mitigation via HW Watchdog Timer
> [1.543851] libphy: stmmac: probed
> [1.544025] eth0: PHY ID 001cc916 at 0 IRQ POLL (stmmac-0:00) active
> [1.550321] eth0: PHY ID 001cc916 at 7 IRQ POLL (stmmac-0:07)
> 
> >> Gbit ethernet on my device is provided by a Realtek RTL8211F RGMII
> >> PHY. Similar issues were reported in #linux-amlogic by a user with
> >> an Odroid C2 board (= similar hardware).
> >>
> >> The symptoms are:
> >> Receiving data is plenty fast (I can max out my internet connection
> >> easily, and with iperf3 I get ~900Mbit/s).
> >> Transmitting data from the device is unfortunately very slow,
> >> traffic sometimes even stalls completely.
> >>
> >> I have attached the iperf results and the output of
> >> /sys/kernel/debug/stmmaceth/eth0/descriptors_status.
> >> Below you can find the ifconfig, netstat and stmmac dma_cap info
> >> (*after* I ran all tests).
> >>
> >> The "involved parties" are:
> >> - Meson GXBB specific network configuration registers (I have have
> >> double-checked them with the reference drivers: everything seems
> >> fine here)
> >> - stmmac: it seems that nobody else has reported these kind of
> >> issues so far, however I'd still like to hear where I should
> >> enable some debugging bits to rule out any stmmac bug  
> >
> >
> > On my side, I just tested on the same "kind" of system:
> > -SYNOPSYS GMAC 3.7
> > -RTL8211EG as PHY
> >
> > With I perf, I reach:
> > -RX: 932 Mbps
> > -TX: 820Mbps
> >
> > Can you check ethtool -S eth0 (most precisely "MMC"counter and
> > errors) ? Which kernel version do you use ?  
> I am using a 4.8.0-rc4 kernel, based on Kevin's "integration" branch:
> [0] Unfortunately I don't have access to my device in the next few
> days, but I'll keep you updated once I have the ethtool output.
> 
> 
> Thanks for your time
> Regards,
> Martin
> 
> 
> [0]
> https://git.kernel.org/cgit/linux/kernel/git/khilman/linux-amlogic.git/log/?h=v4.8/integ
> 





Re: [PATCH v2 net-next 07/16] tcp: track data delivery rate for a TCP connection

2016-09-17 Thread Eric Dumazet
On Sat, Sep 17, 2016 at 12:04 PM, kbuild test robot  wrote:
> Hi Yuchung,
>
> [auto build test ERROR on net-next/master]
>
> url:
> https://github.com/0day-ci/linux/commits/Neal-Cardwell/tcp-BBR-congestion-control-algorithm/20160918-014058
> config: x86_64-randconfig-s2-09180225 (attached as .config)
> compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
>
> All error/warnings (new ones prefixed by >>):
>
>net/ipv4/tcp_input.c: In function 'tcp_ack':
>>> net/ipv4/tcp_input.c:3559: error: unknown field 'v64' specified in 
>>> initializer
>>> net/ipv4/tcp_input.c:3559: warning: missing braces around initializer
>net/ipv4/tcp_input.c:3559: warning: (near initialization for 
> 'rs.prior_mstamp.')
>
> vim +/v64 +3559 net/ipv4/tcp_input.c
>
>   3553  /* This routine deals with incoming acks, but not outgoing ones. */
>   3554  static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int 
> flag)
>   3555  {
>   3556  struct inet_connection_sock *icsk = inet_csk(sk);
>   3557  struct tcp_sock *tp = tcp_sk(sk);
>   3558  struct tcp_sacktag_state sack_state;
>> 3559  struct rate_sample rs = { .prior_mstamp.v64 = 0, 
>> .prior_delivered = 0 };
>

Arg, silly compilers out there.

We can omit prior_mstamp , as the compiler will zero fields anyway

struct rate_sample rs = { .prior_delivered = 0 };
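
For context, a minimal sketch of the construct involved (struct layout recalled
from include/linux/skbuff.h of that era, so treat it as an assumption): the
64-bit value sits inside an anonymous union, which gcc 4.4 cannot name through
a designated initializer.

	struct skb_mstamp {
		union {
			u64 v64;
			struct {
				u32 stamp_us;
				u32 stamp_jiffies;
			};
		};
	};

	/* gcc 4.4: "unknown field 'v64' specified in initializer" */
	struct rate_sample bad  = { .prior_mstamp.v64 = 0, .prior_delivered = 0 };
	/* accepted everywhere; remaining fields are zeroed implicitly */
	struct rate_sample good = { .prior_delivered = 0 };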


Re: [PATCH v2 net-next 07/16] tcp: track data delivery rate for a TCP connection

2016-09-17 Thread kbuild test robot
Hi Yuchung,

[auto build test ERROR on net-next/master]

url:
https://github.com/0day-ci/linux/commits/Neal-Cardwell/tcp-BBR-congestion-control-algorithm/20160918-014058
config: x86_64-randconfig-s2-09180225 (attached as .config)
compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
reproduce:
# save the attached .config to linux build tree
make ARCH=x86_64 

All error/warnings (new ones prefixed by >>):

   net/ipv4/tcp_input.c: In function 'tcp_ack':
>> net/ipv4/tcp_input.c:3559: error: unknown field 'v64' specified in 
>> initializer
>> net/ipv4/tcp_input.c:3559: warning: missing braces around initializer
   net/ipv4/tcp_input.c:3559: warning: (near initialization for 
'rs.prior_mstamp.')

vim +/v64 +3559 net/ipv4/tcp_input.c

  3553  /* This routine deals with incoming acks, but not outgoing ones. */
  3554  static int tcp_ack(struct sock *sk, const struct sk_buff *skb, int flag)
  3555  {
  3556  struct inet_connection_sock *icsk = inet_csk(sk);
  3557  struct tcp_sock *tp = tcp_sk(sk);
  3558  struct tcp_sacktag_state sack_state;
> 3559  struct rate_sample rs = { .prior_mstamp.v64 = 0, 
> .prior_delivered = 0 };
  3560  u32 prior_snd_una = tp->snd_una;
  3561  u32 ack_seq = TCP_SKB_CB(skb)->seq;
  3562  u32 ack = TCP_SKB_CB(skb)->ack_seq;

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation




[PATCH v2 net-next 10/16] tcp: allow congestion control module to request TSO skb segment count

2016-09-17 Thread Neal Cardwell
Add the tso_segs_goal() function in tcp_congestion_ops to allow the
congestion control module to specify the number of segments that
should be in a TSO skb sent by tcp_write_xmit() and
tcp_xmit_retransmit_queue(). The congestion control module can either
request a particular number of segments in TSO skb that we transmit,
or return 0 if it doesn't care.

This allows the upcoming BBR congestion control module to select small
TSO skb sizes if the module detects that the bottleneck bandwidth is
very low, or that the connection is policed to a low rate.
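
As a rough illustration (not from the series; "example_ca" and its field are
hypothetical), a congestion control module could implement the hook like this
and leave everything else to tcp_tso_autosize():

	static u32 example_tso_segs_goal(struct sock *sk)
	{
		const struct example_ca *ca = inet_csk_ca(sk);	/* hypothetical private state */

		/* Ask for tiny TSO bursts on slow paths; returning 0 means
		 * "no preference, let the stack autosize". */
		return ca->bw_is_low ? 2 : 0;
	}

The function would then be wired up through the new .tso_segs_goal member of
the module's struct tcp_congestion_ops.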

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/net/tcp.h |  2 ++
 net/ipv4/tcp_output.c | 15 +--
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index a69ed7f..f8f581f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -913,6 +913,8 @@ struct tcp_congestion_ops {
u32  (*undo_cwnd)(struct sock *sk);
/* hook for packet ack accounting (optional) */
void (*pkts_acked)(struct sock *sk, const struct ack_sample *sample);
+   /* suggest number of segments for each skb to transmit (optional) */
+   u32 (*tso_segs_goal)(struct sock *sk);
/* get info for inet_diag (optional) */
size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
   union tcp_cc_info *info);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e02c8eb..0137956 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1566,6 +1566,17 @@ static u32 tcp_tso_autosize(const struct sock *sk, 
unsigned int mss_now)
return min_t(u32, segs, sk->sk_gso_max_segs);
 }
 
+/* Return the number of segments we want in the skb we are transmitting.
+ * See if congestion control module wants to decide; otherwise, autosize.
+ */
+static u32 tcp_tso_segs(struct sock *sk, unsigned int mss_now)
+{
+   const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
+   u32 tso_segs = ca_ops->tso_segs_goal ? ca_ops->tso_segs_goal(sk) : 0;
+
+   return tso_segs ? : tcp_tso_autosize(sk, mss_now);
+}
+
 /* Returns the portion of skb which can be sent right away */
 static unsigned int tcp_mss_split_point(const struct sock *sk,
const struct sk_buff *skb,
@@ -2061,7 +2072,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int 
mss_now, int nonagle,
}
}
 
-   max_segs = tcp_tso_autosize(sk, mss_now);
+   max_segs = tcp_tso_segs(sk, mss_now);
while ((skb = tcp_send_head(sk))) {
unsigned int limit;
 
@@ -2778,7 +2789,7 @@ void tcp_xmit_retransmit_queue(struct sock *sk)
last_lost = tp->snd_una;
}
 
-   max_segs = tcp_tso_autosize(sk, tcp_current_mss(sk));
+   max_segs = tcp_tso_segs(sk, tcp_current_mss(sk));
tcp_for_write_queue_from(skb, sk) {
__u8 sacked;
int segs;
-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 net-next 14/16] tcp: new CC hook to set sending rate with rate_sample in any CA state

2016-09-17 Thread Neal Cardwell
From: Yuchung Cheng 

This commit introduces an optional new "omnipotent" hook,
cong_control(), for congestion control modules. The cong_control()
function is called at the end of processing an ACK (i.e., after
updating sequence numbers, the SACK scoreboard, and loss
detection). At that moment we have precise delivery rate information
the congestion control module can use to control the sending behavior
(using cwnd, TSO skb size, and pacing rate) in any CA state.

This function can also be used by a congestion control that prefers
not to use the default cwnd reduction approach (i.e., the PRR
algorithm) during CA_Recovery to control the cwnd and sending rate
during loss recovery.

We take advantage of the fact that recent changes defer the
retransmission or transmission of new data (e.g. by F-RTO) in recovery
until the new tcp_cong_control() function is run.

With this commit, we only run tcp_update_pacing_rate() if the
congestion control is not using this new API. New congestion controls
which use the new API do not want the TCP stack to run the default
pacing rate calculation and overwrite whatever pacing rate they have
chosen at initialization time.
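
A minimal sketch of what such a hook might look like (illustrative only; a real
module would filter the samples rather than act on each one directly):

	static void example_cong_control(struct sock *sk, const struct rate_sample *rs)
	{
		struct tcp_sock *tp = tcp_sk(sk);
		u64 bw;

		if (rs->delivered <= 0 || rs->interval_us <= 0)
			return;		/* no usable sample on this ACK */

		/* packets delivered over the sample interval -> bytes/sec */
		bw = (u64)rs->delivered * tp->mss_cache * USEC_PER_SEC;
		do_div(bw, rs->interval_us);

		/* derive cwnd/pacing from the (filtered) estimate */
		sk->sk_pacing_rate = min_t(u64, bw, sk->sk_max_pacing_rate);
	}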

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/net/tcp.h|  4 
 net/ipv4/tcp_cong.c  |  2 +-
 net/ipv4/tcp_input.c | 17 ++---
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 1aa9628..f83b7f2 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -919,6 +919,10 @@ struct tcp_congestion_ops {
u32 (*tso_segs_goal)(struct sock *sk);
/* returns the multiplier used in tcp_sndbuf_expand (optional) */
u32 (*sndbuf_expand)(struct sock *sk);
+   /* call when packets are delivered to update cwnd and pacing rate,
+* after all the ca_state processing. (optional)
+*/
+   void (*cong_control)(struct sock *sk, const struct rate_sample *rs);
/* get info for inet_diag (optional) */
size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
   union tcp_cc_info *info);
diff --git a/net/ipv4/tcp_cong.c b/net/ipv4/tcp_cong.c
index 882caa4..1294af4 100644
--- a/net/ipv4/tcp_cong.c
+++ b/net/ipv4/tcp_cong.c
@@ -69,7 +69,7 @@ int tcp_register_congestion_control(struct tcp_congestion_ops 
*ca)
int ret = 0;
 
/* all algorithms must implement ssthresh and cong_avoid ops */
-   if (!ca->ssthresh || !ca->cong_avoid) {
+   if (!ca->ssthresh || !(ca->cong_avoid || ca->cong_control)) {
pr_err("%s does not implement required ops\n", ca->name);
return -EINVAL;
}
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index a134e66..931fe32 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2536,6 +2536,9 @@ static inline void tcp_end_cwnd_reduction(struct sock *sk)
 {
struct tcp_sock *tp = tcp_sk(sk);
 
+   if (inet_csk(sk)->icsk_ca_ops->cong_control)
+   return;
+
/* Reset cwnd to ssthresh in CWR or Recovery (unless it's undone) */
if (inet_csk(sk)->icsk_ca_state == TCP_CA_CWR ||
(tp->undo_marker && tp->snd_ssthresh < TCP_INFINITE_SSTHRESH)) {
@@ -3312,8 +3315,15 @@ static inline bool tcp_may_raise_cwnd(const struct sock 
*sk, const int flag)
  * information. All transmission or retransmission are delayed afterwards.
  */
 static void tcp_cong_control(struct sock *sk, u32 ack, u32 acked_sacked,
-int flag)
+int flag, const struct rate_sample *rs)
 {
+   const struct inet_connection_sock *icsk = inet_csk(sk);
+
+   if (icsk->icsk_ca_ops->cong_control) {
+   icsk->icsk_ca_ops->cong_control(sk, rs);
+   return;
+   }
+
if (tcp_in_cwnd_reduction(sk)) {
/* Reduce cwnd if state mandates */
tcp_cwnd_reduction(sk, acked_sacked, flag);
@@ -3683,7 +3693,7 @@ static int tcp_ack(struct sock *sk, const struct sk_buff 
*skb, int flag)
delivered = tp->delivered - delivered;  /* freshly ACKed or SACKed */
lost = tp->lost - lost; /* freshly marked lost */
tcp_rate_gen(sk, delivered, lost, &now, &rs);
-   tcp_cong_control(sk, ack, delivered, flag);
+   tcp_cong_control(sk, ack, delivered, flag, &rs);
tcp_xmit_recovery(sk, rexmit);
return 1;
 
@@ -5981,7 +5991,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff 
*skb)
} else
tcp_init_metrics(sk);
 
-   tcp_update_pacing_rate(sk);
+   if (!inet_csk(sk)->icsk_ca_ops->cong_control)
+   

[PATCH v2 net-next 05/16] tcp: switch back to proper tcp_skb_cb size check in tcp_init()

2016-09-17 Thread Neal Cardwell
From: Eric Dumazet 

Revert to the tcp_skb_cb size check that tcp_init() had before commit
b4772ef879a8 ("net: use common macro for assering skb->cb[] available
size in protocol families"). As related commit 744d5a3e9fe2 ("net:
move skb->dropcount to skb->cb[]") explains, the
sock_skb_cb_check_size() mechanism was added to ensure that there is
space for dropcount, "for protocol families using it". But TCP is not
a protocol using dropcount, so tcp_init() doesn't need to provision
space for dropcount in the skb->cb[], and thus we can revert to the
older form of the tcp_skb_cb size check. Doing so allows TCP to use 4
more bytes of the skb->cb[] space.

Fixes: b4772ef879a8 ("net: use common macro for assering skb->cb[] available 
size in protocol families")
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
---
 net/ipv4/tcp.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 5b0b49c..53798e1 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3244,11 +3244,12 @@ static void __init tcp_init_mem(void)
 
 void __init tcp_init(void)
 {
-   unsigned long limit;
int max_rshare, max_wshare, cnt;
+   unsigned long limit;
+   struct sk_buff *skb;
unsigned int i;
 
-   sock_skb_cb_check_size(sizeof(struct tcp_skb_cb));
+   BUILD_BUG_ON(sizeof(struct tcp_skb_cb) > sizeof(skb->cb));
 
percpu_counter_init(&tcp_sockets_allocated, 0, GFP_KERNEL);
percpu_counter_init(&tcp_orphan_count, 0, GFP_KERNEL);
-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 net-next 07/16] tcp: track data delivery rate for a TCP connection

2016-09-17 Thread Neal Cardwell
From: Yuchung Cheng 

This patch generates data delivery rate (throughput) samples on a
per-ACK basis. These rate samples can be used by congestion control
modules, and specifically will be used by TCP BBR in later patches in
this series.

Key state:

tp->delivered: Tracks the total number of data packets (original or not)
   delivered so far. This is an already-existing field.

tp->delivered_mstamp: the last time tp->delivered was updated.

Algorithm:

A rate sample is calculated as (d1 - d0)/(t1 - t0) on a per-ACK basis:

  d1: the current tp->delivered after processing the ACK
  t1: the current time after processing the ACK

  d0: the prior tp->delivered when the acked skb was transmitted
  t0: the prior tp->delivered_mstamp when the acked skb was transmitted

When an skb is transmitted, we snapshot d0 and t0 in its control
block in tcp_rate_skb_sent().

When an ACK arrives, it may SACK and ACK some skbs. For each SACKed
or ACKed skb, tcp_rate_skb_delivered() updates the rate_sample struct
to reflect the latest (d0, t0).

Finally, tcp_rate_gen() generates a rate sample by storing
(d1 - d0) in rs->delivered and (t1 - t0) in rs->interval_us.

One caveat: if an skb was sent with no packets in flight, then
tp->delivered_mstamp may be either invalid (if the connection is
starting) or outdated (if the connection was idle). In that case,
we'll re-stamp tp->delivered_mstamp.

At first glance it seems t0 should always be the time when an skb was
transmitted, but actually this could over-estimate the rate due to
phase mismatch between transmit and ACK events. To track the delivery
rate, we ensure that if packets are in flight then t0 and t1 are
times at which packets were marked delivered.

If the initial and final RTTs are different then one may be corrupted
by some sort of noise. The noise we see most often is sending gaps
caused by delayed, compressed, or stretched acks. This either affects
both RTTs equally or artificially reduces the final RTT. We approach
this by recording the info we need to compute the initial RTT
(duration of the "send phase" of the window) when we recorded the
associated inflight. Then, for a filter to avoid bandwidth
overestimates, we generalize the per-sample bandwidth computation
from:

bw = delivered / ack_phase_rtt

to the following:

bw = delivered / max(send_phase_rtt, ack_phase_rtt)

In large-scale experiments, this filtering approach incorporating
send_phase_rtt is effective at avoiding bandwidth overestimates due to
ACK compression or stretched ACKs.
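
A rough worked example (numbers invented purely for illustration): suppose the
skb just ACKed was sent when tp->delivered was d0 = 100 packets at
t0 = 1,000,000 us, and after processing the ACK tp->delivered is d1 = 110 at
t1 = 1,010,000 us.  The sample is then (110 - 100) packets over
max(send_phase_rtt, 10,000 us); with 1448-byte segments and a 10 ms interval
that works out to about 1.45 MB/s (~11.6 Mbit/s).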

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/linux/tcp.h   |   2 +
 include/net/tcp.h |  35 +++-
 net/ipv4/Makefile |   2 +-
 net/ipv4/tcp_input.c  |  46 +++-
 net/ipv4/tcp_output.c |   4 ++
 net/ipv4/tcp_rate.c   | 149 ++
 6 files changed, 222 insertions(+), 16 deletions(-)
 create mode 100644 net/ipv4/tcp_rate.c

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 38590fb..c50e6ae 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -268,6 +268,8 @@ struct tcp_sock {
u32 prr_out;/* Total number of pkts sent during Recovery. */
u32 delivered;  /* Total data packets delivered incl. rexmits */
u32 lost;   /* Total data packets lost incl. rexmits */
+   struct skb_mstamp first_tx_mstamp;  /* start of window send phase */
+   struct skb_mstamp delivered_mstamp; /* time we reached "delivered" */
 
u32 rcv_wnd;/* Current receiver window  */
u32 write_seq;  /* Tail(+1) of data held in tcp send buffer */
diff --git a/include/net/tcp.h b/include/net/tcp.h
index 2f1648a..b261c89 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -763,8 +763,14 @@ struct tcp_skb_cb {
__u32   ack_seq;/* Sequence number ACK'd*/
union {
struct {
-   /* There is space for up to 20 bytes */
+   /* There is space for up to 24 bytes */
__u32 in_flight;/* Bytes in flight when packet sent */
+   /* pkts S/ACKed so far upon tx of skb, incl retrans: */
+   __u32 delivered;
+   /* start of send pipeline phase */
+   struct skb_mstamp first_tx_mstamp;
+   /* when we reached the "delivered" count */
+   struct skb_mstamp delivered_mstamp;
} tx;   /* only used for outgoing skbs */
union {
struct inet_skb_parmh4;
@@ -860,6 +866,26 @@ struct ack_sample {
 

[PATCH v2 net-next 04/16] net_sched: sch_fq: add low_rate_threshold parameter

2016-09-17 Thread Neal Cardwell
From: Eric Dumazet 

This commit adds to the fq module a low_rate_threshold parameter to
insert a delay after all packets if the socket requests a pacing rate
below the threshold.

This helps achieve more precise control of the sending rate with
low-rate paths, especially policers. The basic issue is that if a
congestion control module detects a policer at a certain rate, it may
want fq to be able to shape to that policed rate. That way the sender
can avoid policer drops by having the packets arrive at the policer at
or just under the policed rate.

The default threshold of 550Kbps was chosen analytically so that for
policers or links at 500Kbps or 512Kbps fq would very likely invoke
this mechanism, even if the pacing rate was briefly slightly above the
available bandwidth. This value was then empirically validated with
two years of production testing on YouTube video servers.
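
A back-of-the-envelope example of the effect (numbers are illustrative): with
the dequeue-side arithmetic below, a socket paced at 500 kbit/s
(62,500 bytes/s) sending 1514-byte frames gets a gap of roughly
1514 / 62500 s ~= 24 ms inserted after every packet, since below the threshold
the flow's credit is reset so pacing is applied per packet rather than per
quantum.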

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/uapi/linux/pkt_sched.h |  2 ++
 net/sched/sch_fq.c | 22 +++---
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/include/uapi/linux/pkt_sched.h b/include/uapi/linux/pkt_sched.h
index 2382eed..f8e39db 100644
--- a/include/uapi/linux/pkt_sched.h
+++ b/include/uapi/linux/pkt_sched.h
@@ -792,6 +792,8 @@ enum {
 
TCA_FQ_ORPHAN_MASK, /* mask applied to orphaned skb hashes */
 
+   TCA_FQ_LOW_RATE_THRESHOLD, /* per packet delay under this rate */
+
__TCA_FQ_MAX
 };
 
diff --git a/net/sched/sch_fq.c b/net/sched/sch_fq.c
index e5458b9..40ad4fc 100644
--- a/net/sched/sch_fq.c
+++ b/net/sched/sch_fq.c
@@ -94,6 +94,7 @@ struct fq_sched_data {
u32 flow_max_rate;  /* optional max rate per flow */
u32 flow_plimit;/* max packets per flow */
u32 orphan_mask;/* mask for orphaned skb */
+   u32 low_rate_threshold;
struct rb_root  *fq_root;
u8  rate_enable;
u8  fq_trees_log;
@@ -433,7 +434,7 @@ static struct sk_buff *fq_dequeue(struct Qdisc *sch)
struct fq_flow_head *head;
struct sk_buff *skb;
struct fq_flow *f;
-   u32 rate;
+   u32 rate, plen;
 
skb = fq_dequeue_head(sch, &q->internal);
if (skb)
@@ -482,7 +483,7 @@ begin:
prefetch(&skb->end);
f->credit -= qdisc_pkt_len(skb);
 
-   if (f->credit > 0 || !q->rate_enable)
+   if (!q->rate_enable)
goto out;
 
/* Do not pace locally generated ack packets */
@@ -493,8 +494,15 @@ begin:
if (skb->sk)
rate = min(skb->sk->sk_pacing_rate, rate);
 
+   if (rate <= q->low_rate_threshold) {
+   f->credit = 0;
+   plen = qdisc_pkt_len(skb);
+   } else {
+   plen = max(qdisc_pkt_len(skb), q->quantum);
+   if (f->credit > 0)
+   goto out;
+   }
if (rate != ~0U) {
-   u32 plen = max(qdisc_pkt_len(skb), q->quantum);
u64 len = (u64)plen * NSEC_PER_SEC;
 
if (likely(rate))
@@ -662,6 +670,7 @@ static const struct nla_policy fq_policy[TCA_FQ_MAX + 1] = {
[TCA_FQ_FLOW_MAX_RATE]  = { .type = NLA_U32 },
[TCA_FQ_BUCKETS_LOG]= { .type = NLA_U32 },
[TCA_FQ_FLOW_REFILL_DELAY]  = { .type = NLA_U32 },
+   [TCA_FQ_LOW_RATE_THRESHOLD] = { .type = NLA_U32 },
 };
 
 static int fq_change(struct Qdisc *sch, struct nlattr *opt)
@@ -716,6 +725,10 @@ static int fq_change(struct Qdisc *sch, struct nlattr *opt)
if (tb[TCA_FQ_FLOW_MAX_RATE])
q->flow_max_rate = nla_get_u32(tb[TCA_FQ_FLOW_MAX_RATE]);
 
+   if (tb[TCA_FQ_LOW_RATE_THRESHOLD])
+   q->low_rate_threshold =
+   nla_get_u32(tb[TCA_FQ_LOW_RATE_THRESHOLD]);
+
if (tb[TCA_FQ_RATE_ENABLE]) {
u32 enable = nla_get_u32(tb[TCA_FQ_RATE_ENABLE]);
 
@@ -781,6 +794,7 @@ static int fq_init(struct Qdisc *sch, struct nlattr *opt)
q->fq_root  = NULL;
q->fq_trees_log = ilog2(1024);
q->orphan_mask  = 1024 - 1;
+   q->low_rate_threshold   = 550000 / 8;
qdisc_watchdog_init(&q->watchdog, sch);
 
if (opt)
@@ -811,6 +825,8 @@ static int fq_dump(struct Qdisc *sch, struct sk_buff *skb)
nla_put_u32(skb, TCA_FQ_FLOW_REFILL_DELAY,
jiffies_to_usecs(q->flow_refill_delay)) ||
nla_put_u32(skb, TCA_FQ_ORPHAN_MASK, q->orphan_mask) ||
+   nla_put_u32(skb, TCA_FQ_LOW_RATE_THRESHOLD,
+   q->low_rate_threshold) ||
nla_put_u32(skb, TCA_FQ_BUCKETS_LOG, 

[PATCH v2 net-next 03/16] tcp: use windowed min filter library for TCP min_rtt estimation

2016-09-17 Thread Neal Cardwell
Refactor the TCP min_rtt code to reuse the new win_minmax library in
lib/win_minmax.c to simplify the TCP code.

This is a pure refactor: the functionality is exactly the same. We
just moved the windowed min code to make TCP easier to read and
maintain, and to allow other parts of the kernel to use the windowed
min/max filter code.
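
As a rough sketch of the resulting call pattern (not a literal copy of
the patched function), the per-sample update in tcp_update_rtt_min()
reduces to one library call, with tcp_min_rtt() reading the result back
via minmax_get():

  /* window length is the min_rtt window, in jiffies */
  minmax_running_min(&tp->rtt_min, sysctl_tcp_min_rtt_wlen * HZ,
                     tcp_time_stamp, rtt_us ? : jiffies_to_usecs(1));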

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/linux/tcp.h  |  5 ++--
 include/net/tcp.h|  2 +-
 net/ipv4/tcp.c   |  2 +-
 net/ipv4/tcp_input.c | 64 
 net/ipv4/tcp_minisocks.c |  2 +-
 5 files changed, 10 insertions(+), 65 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index c723a46..6433cc8 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -19,6 +19,7 @@
 
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -234,9 +235,7 @@ struct tcp_sock {
u32 mdev_max_us;/* maximal mdev for the last rtt period */
u32 rttvar_us;  /* smoothed mdev_max*/
u32 rtt_seq;/* sequence number to update rttvar */
-   struct rtt_meas {
-   u32 rtt, ts;/* RTT in usec and sampling time in jiffies. */
-   } rtt_min[3];
+   struct  minmax rtt_min;
 
u32 packets_out;/* Packets which are "in flight"*/
u32 retrans_out;/* Retransmitted packets out*/
diff --git a/include/net/tcp.h b/include/net/tcp.h
index fdfbedd..2f1648a 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -671,7 +671,7 @@ static inline bool tcp_ca_dst_locked(const struct dst_entry 
*dst)
 /* Minimum RTT in usec. ~0 means not available. */
 static inline u32 tcp_min_rtt(const struct tcp_sock *tp)
 {
-   return tp->rtt_min[0].rtt;
+   return minmax_get(&tp->rtt_min);
 }
 
 /* Compute the actual receive window we are currently advertising.
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index a13fcb3..5b0b49c 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -387,7 +387,7 @@ void tcp_init_sock(struct sock *sk)
 
icsk->icsk_rto = TCP_TIMEOUT_INIT;
tp->mdev_us = jiffies_to_usecs(TCP_TIMEOUT_INIT);
-   tp->rtt_min[0].rtt = ~0U;
+   minmax_reset(&tp->rtt_min, tcp_time_stamp, ~0U);
 
/* So many TCP implementations out there (incorrectly) count the
 * initial SYN frame in their delayed-ACK and congestion control
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 70b892d..ac5b38f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -2879,67 +2879,13 @@ static void tcp_fastretrans_alert(struct sock *sk, 
const int acked,
*rexmit = REXMIT_LOST;
 }
 
-/* Kathleen Nichols' algorithm for tracking the minimum value of
- * a data stream over some fixed time interval. (E.g., the minimum
- * RTT over the past five minutes.) It uses constant space and constant
- * time per update yet almost always delivers the same minimum as an
- * implementation that has to keep all the data in the window.
- *
- * The algorithm keeps track of the best, 2nd best & 3rd best min
- * values, maintaining an invariant that the measurement time of the
- * n'th best >= n-1'th best. It also makes sure that the three values
- * are widely separated in the time window since that bounds the worse
- * case error when that data is monotonically increasing over the window.
- *
- * Upon getting a new min, we can forget everything earlier because it
- * has no value - the new min is <= everything else in the window by
- * definition and it's the most recent. So we restart fresh on every new min
- * and overwrites 2nd & 3rd choices. The same property holds for 2nd & 3rd
- * best.
- */
 static void tcp_update_rtt_min(struct sock *sk, u32 rtt_us)
 {
-   const u32 now = tcp_time_stamp, wlen = sysctl_tcp_min_rtt_wlen * HZ;
-   struct rtt_meas *m = tcp_sk(sk)->rtt_min;
-   struct rtt_meas rttm = {
-   .rtt = likely(rtt_us) ? rtt_us : jiffies_to_usecs(1),
-   .ts = now,
-   };
-   u32 elapsed;
-
-   /* Check if the new measurement updates the 1st, 2nd, or 3rd choices */
-   if (unlikely(rttm.rtt <= m[0].rtt))
-   m[0] = m[1] = m[2] = rttm;
-   else if (rttm.rtt <= m[1].rtt)
-   m[1] = m[2] = rttm;
-   else if (rttm.rtt <= m[2].rtt)
-   m[2] = rttm;
-
-   elapsed = now - m[0].ts;
-   if (unlikely(elapsed > wlen)) {
-   /* Passed entire window without a new min so make 2nd choice
-* the new min & 3rd choice the new 2nd. So forth and so on.
-*/
-   m[0] = m[1];
-   m[1] = m[2];
-   m[2] = rttm;
-   if 

[PATCH v2 net-next 12/16] tcp: export tcp_mss_to_mtu() for congestion control modules

2016-09-17 Thread Neal Cardwell
Export tcp_mss_to_mtu(), so that congestion control modules can use
this to help calculate a pacing rate.
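
A hedged sketch of the intended use (bw_pkts_per_sec is a hypothetical
per-module bandwidth estimate; everything else is existing TCP/socket
state): a module that tracks bandwidth in packets per second can
convert it to a byte-based pacing rate roughly like this:

  u64 rate = (u64)bw_pkts_per_sec *
             tcp_mss_to_mtu(sk, tcp_sk(sk)->mss_cache);

  sk->sk_pacing_rate = min_t(u64, rate, sk->sk_max_pacing_rate);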

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 net/ipv4/tcp_output.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0bf3d48..7d025a7 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1362,6 +1362,7 @@ int tcp_mss_to_mtu(struct sock *sk, int mss)
}
return mtu;
 }
+EXPORT_SYMBOL(tcp_mss_to_mtu);
 
 /* MTU probing init per socket */
 void tcp_mtup_init(struct sock *sk)
-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 net-next 02/16] lib/win_minmax: windowed min or max estimator

2016-09-17 Thread Neal Cardwell
This commit introduces a generic library to estimate either the min or
max value of a time-varying variable over a recent time window. This
is code originally from Kathleen Nichols. The current form of the code
is from Van Jacobson.

A single struct minmax_sample will track the estimated windowed-max
value of the series if you call minmax_running_max() or the estimated
windowed-min value of the series if you call minmax_running_min().

Nearly equivalent code is already in place for minimum RTT estimation
in the TCP stack. This commit extracts that code and generalizes it to
handle both min and max. Moving the code here reduces the footprint
and complexity of the TCP code base and makes the filter generally
available for other parts of the codebase, including an upcoming TCP
congestion control module.

This library works well for time series where the measurements are
smoothly increasing or decreasing.
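
A minimal usage sketch for the max case (win, now and bw_sample are
placeholders supplied by the caller):

  struct minmax bw;

  minmax_reset(&bw, now, 0);                     /* start tracking   */
  minmax_running_max(&bw, win, now, bw_sample);  /* per measurement  */
  cur_bw = minmax_get(&bw);                      /* windowed maximum */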

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/linux/win_minmax.h | 37 +
 lib/Makefile   |  2 +-
 lib/win_minmax.c   | 98 ++
 3 files changed, 136 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/win_minmax.h
 create mode 100644 lib/win_minmax.c

diff --git a/include/linux/win_minmax.h b/include/linux/win_minmax.h
new file mode 100644
index 000..5656960
--- /dev/null
+++ b/include/linux/win_minmax.h
@@ -0,0 +1,37 @@
+/**
+ * lib/minmax.c: windowed min/max tracker by Kathleen Nichols.
+ *
+ */
+#ifndef MINMAX_H
+#define MINMAX_H
+
+#include 
+
+/* A single data point for our parameterized min-max tracker */
+struct minmax_sample {
+   u32 t;  /* time measurement was taken */
+   u32 v;  /* value measured */
+};
+
+/* State for the parameterized min-max tracker */
+struct minmax {
+   struct minmax_sample s[3];
+};
+
+static inline u32 minmax_get(const struct minmax *m)
+{
+   return m->s[0].v;
+}
+
+static inline u32 minmax_reset(struct minmax *m, u32 t, u32 meas)
+{
+   struct minmax_sample val = { .t = t, .v = meas };
+
+   m->s[2] = m->s[1] = m->s[0] = val;
+   return m->s[0].v;
+}
+
+u32 minmax_running_max(struct minmax *m, u32 win, u32 t, u32 meas);
+u32 minmax_running_min(struct minmax *m, u32 win, u32 t, u32 meas);
+
+#endif
diff --git a/lib/Makefile b/lib/Makefile
index 5dc77a8..df747e5 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
 flex_proportions.o ratelimit.o show_mem.o \
 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o
+earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
diff --git a/lib/win_minmax.c b/lib/win_minmax.c
new file mode 100644
index 000..c8420d4
--- /dev/null
+++ b/lib/win_minmax.c
@@ -0,0 +1,98 @@
+/**
+ * lib/minmax.c: windowed min/max tracker
+ *
+ * Kathleen Nichols' algorithm for tracking the minimum (or maximum)
+ * value of a data stream over some fixed time interval.  (E.g.,
+ * the minimum RTT over the past five minutes.) It uses constant
+ * space and constant time per update yet almost always delivers
+ * the same minimum as an implementation that has to keep all the
+ * data in the window.
+ *
+ * The algorithm keeps track of the best, 2nd best & 3rd best min
+ * values, maintaining an invariant that the measurement time of
+ * the n'th best >= n-1'th best. It also makes sure that the three
+ * values are widely separated in the time window since that bounds
+ * the worse case error when that data is monotonically increasing
+ * over the window.
+ *
+ * Upon getting a new min, we can forget everything earlier because
+ * it has no value - the new min is <= everything else in the window
+ * by definition and it's the most recent. So we restart fresh on
+ * every new min and overwrites 2nd & 3rd choices. The same property
+ * holds for 2nd & 3rd best.
+ */
+#include 
+#include 
+
+/* As time advances, update the 1st, 2nd, and 3rd choices. */
+static u32 minmax_subwin_update(struct minmax *m, u32 win,
+   const struct minmax_sample *val)
+{
+   u32 dt = val->t - m->s[0].t;
+
+   if (unlikely(dt > win)) {
+   /*
+* Passed entire window without a new val so make 2nd
+* choice the new val & 3rd choice the new 2nd choice.
+* we may have to iterate this since our 2nd choice
+* may also be outside the window (we checked on entry
+* that the third 

[PATCH v2 net-next 13/16] tcp: allow congestion control to expand send buffer differently

2016-09-17 Thread Neal Cardwell
From: Yuchung Cheng 

Currently the TCP send buffer expands to twice cwnd, in order to allow
limited transmits in the CA_Recovery state. This assumes that cwnd
does not increase in the CA_Recovery.

For some congestion control algorithms, like the upcoming BBR module,
if the losses in recovery do not indicate congestion then we may
continue to raise cwnd multiplicatively in recovery. In such cases the
current multiplier will falsely limit the sending rate, much as if it
were limited by the application.

This commit adds an optional congestion control callback to use a
different multiplier to expand the TCP send buffer. For congestion
control modules that do not specify this callback, TCP continues to
use the previous default of 2.
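
A hedged sketch of what an opting-in module could look like (the 3x
factor is only illustrative, not something this patch mandates); the
function would be wired up via the module's .sndbuf_expand member:

  static u32 example_sndbuf_expand(struct sock *sk)
  {
          return 3;       /* allow ~3x cwnd worth of send buffer */
  }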

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/net/tcp.h| 2 ++
 net/ipv4/tcp_input.c | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index 3492041..1aa9628 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -917,6 +917,8 @@ struct tcp_congestion_ops {
void (*pkts_acked)(struct sock *sk, const struct ack_sample *sample);
/* suggest number of segments for each skb to transmit (optional) */
u32 (*tso_segs_goal)(struct sock *sk);
+   /* returns the multiplier used in tcp_sndbuf_expand (optional) */
+   u32 (*sndbuf_expand)(struct sock *sk);
/* get info for inet_diag (optional) */
size_t (*get_info)(struct sock *sk, u32 ext, int *attr,
   union tcp_cc_info *info);
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index df26af0..a134e66 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -289,6 +289,7 @@ static bool tcp_ecn_rcv_ecn_echo(const struct tcp_sock *tp, 
const struct tcphdr
 static void tcp_sndbuf_expand(struct sock *sk)
 {
const struct tcp_sock *tp = tcp_sk(sk);
+   const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
int sndmem, per_mss;
u32 nr_segs;
 
@@ -309,7 +310,8 @@ static void tcp_sndbuf_expand(struct sock *sk)
 * Cubic needs 1.7 factor, rounded to 2 to include
 * extra cushion (application might react slowly to POLLOUT)
 */
-   sndmem = 2 * nr_segs * per_mss;
+   sndmem = ca_ops->sndbuf_expand ? ca_ops->sndbuf_expand(sk) : 2;
+   sndmem *= nr_segs * per_mss;
 
if (sk->sk_sndbuf < sndmem)
sk->sk_sndbuf = min(sndmem, sysctl_tcp_wmem[2]);
-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 net-next 09/16] tcp: export data delivery rate

2016-09-17 Thread Neal Cardwell
From: Yuchung Cheng 

This commit exports two new fields in struct tcp_info:

  tcpi_delivery_rate: The most recent goodput, as measured by
tcp_rate_gen(). If the socket is limited by the sending
application (e.g., no data to send), it reports the highest
measurement instead of the most recent. The unit is bytes per
second (like other rate fields in tcp_info).

  tcpi_delivery_rate_app_limited: A boolean indicating if the goodput
was measured when the socket's throughput was limited by the
sending application.

This delivery rate information can be useful for applications that
want to know the current throughput the TCP connection is seeing,
e.g. adaptive bitrate video streaming. It can also be very useful for
debugging or troubleshooting.
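
A minimal userspace sketch of reading the new fields (includes, socket
setup and error handling are omitted; it assumes a uapi tcp.h new
enough to carry the two fields):

  struct tcp_info ti;
  socklen_t len = sizeof(ti);

  if (getsockopt(fd, IPPROTO_TCP, TCP_INFO, &ti, &len) == 0)
          printf("delivery rate: %llu B/s%s\n",
                 (unsigned long long)ti.tcpi_delivery_rate,
                 ti.tcpi_delivery_rate_app_limited ?
                 " (app limited)" : "");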

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/linux/tcp.h  |  5 -
 include/uapi/linux/tcp.h |  3 +++
 net/ipv4/tcp.c   | 11 ++-
 net/ipv4/tcp_rate.c  | 12 +++-
 4 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index fdcd00f..a17ae7b 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -213,7 +213,8 @@ struct tcp_sock {
u8 reord;/* reordering detected */
} rack;
u16 advmss; /* Advertised MSS   */
-   u8  unused;
+   u8  rate_app_limited:1,  /* rate_{delivered,interval_us} limited? */
+   unused:7;
u8  nonagle : 4,/* Disable Nagle algorithm? */
thin_lto: 1,/* Use linear timeouts for thin streams */
thin_dupack : 1,/* Fast retransmit on first dupack  */
@@ -271,6 +272,8 @@ struct tcp_sock {
u32 app_limited;/* limited until "delivered" reaches this val */
struct skb_mstamp first_tx_mstamp;  /* start of window send phase */
struct skb_mstamp delivered_mstamp; /* time we reached "delivered" */
+   u32 rate_delivered;/* saved rate sample: packets delivered */
+   u32 rate_interval_us;  /* saved rate sample: time elapsed */
 
u32 rcv_wnd;/* Current receiver window  */
u32 write_seq;  /* Tail(+1) of data held in tcp send buffer */
diff --git a/include/uapi/linux/tcp.h b/include/uapi/linux/tcp.h
index 482898f..73ac0db 100644
--- a/include/uapi/linux/tcp.h
+++ b/include/uapi/linux/tcp.h
@@ -167,6 +167,7 @@ struct tcp_info {
__u8    tcpi_backoff;
__u8    tcpi_options;
__u8    tcpi_snd_wscale : 4, tcpi_rcv_wscale : 4;
+   __u8    tcpi_delivery_rate_app_limited:1;
 
__u32   tcpi_rto;
__u32   tcpi_ato;
@@ -211,6 +212,8 @@ struct tcp_info {
__u32   tcpi_min_rtt;
__u32   tcpi_data_segs_in;  /* RFC4898 tcpEStatsDataSegsIn */
__u32   tcpi_data_segs_out; /* RFC4898 tcpEStatsDataSegsOut */
+
+   __u64   tcpi_delivery_rate;
 };
 
 /* for TCP_MD5SIG socket option */
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0327a44..46b05b2 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2695,7 +2695,7 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
 {
const struct tcp_sock *tp = tcp_sk(sk); /* iff sk_type == SOCK_STREAM */
const struct inet_connection_sock *icsk = inet_csk(sk);
-   u32 now = tcp_time_stamp;
+   u32 now = tcp_time_stamp, intv;
unsigned int start;
int notsent_bytes;
u64 rate64;
@@ -2785,6 +2785,15 @@ void tcp_get_info(struct sock *sk, struct tcp_info *info)
info->tcpi_min_rtt = tcp_min_rtt(tp);
info->tcpi_data_segs_in = tp->data_segs_in;
info->tcpi_data_segs_out = tp->data_segs_out;
+
+   info->tcpi_delivery_rate_app_limited = tp->rate_app_limited ? 1 : 0;
+   rate = READ_ONCE(tp->rate_delivered);
+   intv = READ_ONCE(tp->rate_interval_us);
+   if (rate && intv) {
+   rate64 = (u64)rate * tp->mss_cache * USEC_PER_SEC;
+   do_div(rate64, intv);
+   put_unaligned(rate64, &info->tcpi_delivery_rate);
+   }
 }
 EXPORT_SYMBOL_GPL(tcp_get_info);
 
diff --git a/net/ipv4/tcp_rate.c b/net/ipv4/tcp_rate.c
index 52ff84b..9be1581 100644
--- a/net/ipv4/tcp_rate.c
+++ b/net/ipv4/tcp_rate.c
@@ -149,12 +149,22 @@ void tcp_rate_gen(struct sock *sk, u32 delivered, u32 
lost,
 * for connections suffer heavy or prolonged losses.
 */
if (unlikely(rs->interval_us < tcp_min_rtt(tp))) {
-   rs->interval_us = -1;
if (!rs->is_retrans)
pr_debug("tcp rate: %ld %d %u %u %u\n",
 rs->interval_us, rs->delivered,

[PATCH v2 net-next 15/16] tcp: increase ICSK_CA_PRIV_SIZE from 64 bytes to 88

2016-09-17 Thread Neal Cardwell
The TCP CUBIC module already uses 64 bytes.
The upcoming TCP BBR module uses 88 bytes.
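
A congestion control module can guard against outgrowing this area at
build time; a one-line sketch for a hypothetical private struct, placed
e.g. in the module's registration path:

  BUILD_BUG_ON(sizeof(struct example_ca_priv) > ICSK_CA_PRIV_SIZE);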

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/net/inet_connection_sock.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/inet_connection_sock.h 
b/include/net/inet_connection_sock.h
index 49dcad4..197a30d 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -134,8 +134,8 @@ struct inet_connection_sock {
} icsk_mtup;
u32   icsk_user_timeout;
 
-   u64   icsk_ca_priv[64 / sizeof(u64)];
-#define ICSK_CA_PRIV_SIZE  (8 * sizeof(u64))
+   u64   icsk_ca_priv[88 / sizeof(u64)];
+#define ICSK_CA_PRIV_SIZE  (11 * sizeof(u64))
 };
 
 #define ICSK_TIME_RETRANS  1   /* Retransmit timer */
-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 net-next 11/16] tcp: export tcp_tso_autosize() and parameterize minimum number of TSO segments

2016-09-17 Thread Neal Cardwell
To allow congestion control modules to use the default TSO auto-sizing
algorithm as one of the ingredients in their own decision about TSO sizing:

1) Export tcp_tso_autosize() so that CC modules can use it.

2) Change tcp_tso_autosize() to allow callers to specify a minimum
   number of segments per TSO skb, in case the congestion control
   module has a different notion of the best floor for TSO skbs for
   the connection right now. For very low-rate paths or policed
   connections it can be appropriate to use smaller TSO skbs.
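
As a sketch of (1) and (2) together (the example_ prefix is a
hypothetical module name and the floor of 2 segments is only
illustrative), a congestion control module's tso_segs_goal hook could
delegate to the exported helper with its own minimum:

  static u32 example_tso_segs_goal(struct sock *sk)
  {
          /* use a smaller floor than sysctl_tcp_min_tso_segs
           * for low-rate or policed paths
           */
          return tcp_tso_autosize(sk, tcp_sk(sk)->mss_cache, 2);
  }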

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/net/tcp.h | 2 ++
 net/ipv4/tcp_output.c | 9 ++---
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index f8f581f..3492041 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -533,6 +533,8 @@ __u32 cookie_v6_init_sequence(const struct sk_buff *skb, 
__u16 *mss);
 #endif
 /* tcp_output.c */
 
+u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
+int min_tso_segs);
 void __tcp_push_pending_frames(struct sock *sk, unsigned int cur_mss,
   int nonagle);
 bool tcp_may_send_now(struct sock *sk);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 0137956..0bf3d48 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1549,7 +1549,8 @@ static bool tcp_nagle_check(bool partial, const struct 
tcp_sock *tp,
 /* Return how many segs we'd like on a TSO packet,
  * to send one TSO packet per ms
  */
-static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now)
+u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
+int min_tso_segs)
 {
u32 bytes, segs;
 
@@ -1561,10 +1562,11 @@ static u32 tcp_tso_autosize(const struct sock *sk, 
unsigned int mss_now)
 * This preserves ACK clocking and is consistent
 * with tcp_tso_should_defer() heuristic.
 */
-   segs = max_t(u32, bytes / mss_now, sysctl_tcp_min_tso_segs);
+   segs = max_t(u32, bytes / mss_now, min_tso_segs);
 
return min_t(u32, segs, sk->sk_gso_max_segs);
 }
+EXPORT_SYMBOL(tcp_tso_autosize);
 
 /* Return the number of segments we want in the skb we are transmitting.
  * See if congestion control module wants to decide; otherwise, autosize.
@@ -1574,7 +1576,8 @@ static u32 tcp_tso_segs(struct sock *sk, unsigned int 
mss_now)
const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
u32 tso_segs = ca_ops->tso_segs_goal ? ca_ops->tso_segs_goal(sk) : 0;
 
-   return tso_segs ? : tcp_tso_autosize(sk, mss_now);
+   return tso_segs ? :
+   tcp_tso_autosize(sk, mss_now, sysctl_tcp_min_tso_segs);
 }
 
 /* Returns the portion of skb which can be sent right away */
-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 net-next 16/16] tcp_bbr: add BBR congestion control

2016-09-17 Thread Neal Cardwell
This commit implements a new TCP congestion control algorithm: BBR
(Bottleneck Bandwidth and RTT). A detailed description of BBR will be
published in ACM Queue, Vol. 14 No. 5, September-October 2016, as
"BBR: Congestion-Based Congestion Control".

BBR has significantly increased throughput and reduced latency for
connections on Google's internal backbone networks and google.com and
YouTube Web servers.

BBR requires only changes on the sender side, not in the network or
the receiver side. Thus it can be incrementally deployed on today's
Internet, or in datacenters.

The Internet has predominantly used loss-based congestion control
(largely Reno or CUBIC) since the 1980s, relying on packet loss as the
signal to slow down. While this worked well for many years, loss-based
congestion control is unfortunately out-dated in today's networks. On
today's Internet, loss-based congestion control causes the infamous
bufferbloat problem, often causing seconds of needless queuing delay,
since it fills the bloated buffers in many last-mile links. On today's
high-speed long-haul links using commodity switches with shallow
buffers, loss-based congestion control has abysmal throughput because
it over-reacts to losses caused by transient traffic bursts.

In 1981 Kleinrock and Gale showed that the optimal operating point for
a network maximizes delivered bandwidth while minimizing delay and
loss, not only for single connections but for the network as a
whole. Finding that optimal operating point has been elusive, since
any single network measurement is ambiguous: network measurements are
the result of both bandwidth and propagation delay, and those two
cannot be measured simultaneously.

While it is impossible to disambiguate any single bandwidth or RTT
measurement, a connection's behavior over time tells a clearer
story. BBR uses a measurement strategy designed to resolve this
ambiguity. It combines these measurements with a robust servo loop
using recent control systems advances to implement a distributed
congestion control algorithm that reacts to actual congestion, not
packet loss or transient queue delay, and is designed to converge with
high probability to a point near the optimal operating point.

In a nutshell, BBR creates an explicit model of the network pipe by
sequentially probing the bottleneck bandwidth and RTT. On the arrival
of each ACK, BBR derives the current delivery rate of the last round
trip, and feeds it through a windowed max-filter to estimate the
bottleneck bandwidth. Conversely it uses a windowed min-filter to
estimate the round trip propagation delay. The max-filtered bandwidth
and min-filtered RTT estimates form BBR's model of the network pipe.

Using its model, BBR sets control parameters to govern sending
behavior. The primary control is the pacing rate: BBR applies a gain
multiplier to transmit faster or slower than the observed bottleneck
bandwidth. The conventional congestion window (cwnd) is now the
secondary control; the cwnd is set to a small multiple of the
estimated BDP (bandwidth-delay product) in order to allow full
utilization and bandwidth probing while bounding the potential amount
of queue at the bottleneck.
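
Schematically (all identifiers below are illustrative, not the module's
actual code; the gains are dimensionless factors around 1):

  bw  = windowed_max(delivery_rate_samples);   /* bottleneck bandwidth */
  rtt = windowed_min(rtt_samples);             /* propagation RTT      */

  pacing_rate = pacing_gain * bw;              /* primary control      */
  cwnd        = cwnd_gain * bw * rtt;          /* small BDP multiple   */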

When a BBR connection starts, it enters STARTUP mode and applies a
high gain to perform an exponential search to quickly probe the
bottleneck bandwidth (doubling its sending rate each round trip, like
slow start). However, instead of continuing until it fills up the
buffer (i.e. a loss), or until delay or ACK spacing reaches some
threshold (like Hystart), it uses its model of the pipe to estimate
when that pipe is full: it estimates the pipe is full when it notices
the estimated bandwidth has stopped growing. At that point it exits
STARTUP and enters DRAIN mode, where it reduces its pacing rate to
drain the queue it estimates it has created.

Then BBR enters steady state. In steady state, PROBE_BW mode cycles
between first pacing faster to probe for more bandwidth, then pacing
slower to drain any queue that was created if no more bandwidth was
available, and then cruising at the estimated bandwidth to utilize the
pipe without creating excess queue. Occasionally, on an as-needed
basis, it sends significantly slower to probe for RTT (PROBE_RTT
mode).

Our long-term goal is to improve the congestion control algorithms
used on the Internet. We are hopeful that BBR can help advance the
efforts toward this goal, and motivate the community to do further
research.

Test results, performance evaluations, feedback, and BBR-related
discussions are very welcome in the public e-mail list for BBR:

  https://groups.google.com/forum/#!forum/bbr-dev
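
For anyone testing, BBR is selected like any other congestion control
module; it relies on pacing, which at this point is provided by the fq
qdisc, so a typical host-wide setup looks like (values are examples):

  sysctl -w net.core.default_qdisc=fq
  sysctl -w net.ipv4.tcp_congestion_control=bbr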

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/uapi/linux/inet_diag.h |  13 

[PATCH v2 net-next 06/16] tcp: count packets marked lost for a TCP connection

2016-09-17 Thread Neal Cardwell
Count the number of packets that a TCP connection marks lost.

Congestion control modules can use this loss rate information for more
intelligent decisions about how fast to send.

Specifically, this is used in TCP BBR policer detection. BBR uses a
high packet loss rate as one signal in its policer detection and
policer bandwidth estimation algorithm.

The BBR policer detection algorithm cannot simply track retransmits,
because a retransmit can be (and often is) an indicator of packets
lost long, long ago. This is particularly true in a long CA_Loss
period that repairs the initial massive losses when a policer kicks
in.
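
A consumer-side sketch (the prior_lost bookkeeping is hypothetical, not
part of this patch): a congestion control module can snapshot tp->lost
across a measurement interval to get a loss count that is not skewed by
ancient retransmits:

  u32 lost = tcp_sk(sk)->lost;
  u32 lost_in_interval = lost - ca->prior_lost;   /* u32 wrap-safe */

  ca->prior_lost = lost;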

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/linux/tcp.h  |  1 +
 net/ipv4/tcp_input.c | 25 -
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index 6433cc8..38590fb 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -267,6 +267,7 @@ struct tcp_sock {
 * receiver in Recovery. */
u32 prr_out;/* Total number of pkts sent during Recovery. */
u32 delivered;  /* Total data packets delivered incl. rexmits */
+   u32 lost;   /* Total data packets lost incl. rexmits */
 
u32 rcv_wnd;/* Current receiver window  */
u32 write_seq;  /* Tail(+1) of data held in tcp send buffer */
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ac5b38f..024b579 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -899,12 +899,29 @@ static void tcp_verify_retransmit_hint(struct tcp_sock 
*tp, struct sk_buff *skb)
tp->retransmit_high = TCP_SKB_CB(skb)->end_seq;
 }
 
+/* Sum the number of packets on the wire we have marked as lost.
+ * There are two cases we care about here:
+ * a) Packet hasn't been marked lost (nor retransmitted),
+ *and this is the first loss.
+ * b) Packet has been marked both lost and retransmitted,
+ *and this means we think it was lost again.
+ */
+static void tcp_sum_lost(struct tcp_sock *tp, struct sk_buff *skb)
+{
+   __u8 sacked = TCP_SKB_CB(skb)->sacked;
+
+   if (!(sacked & TCPCB_LOST) ||
+   ((sacked & TCPCB_LOST) && (sacked & TCPCB_SACKED_RETRANS)))
+   tp->lost += tcp_skb_pcount(skb);
+}
+
 static void tcp_skb_mark_lost(struct tcp_sock *tp, struct sk_buff *skb)
 {
if (!(TCP_SKB_CB(skb)->sacked & (TCPCB_LOST|TCPCB_SACKED_ACKED))) {
tcp_verify_retransmit_hint(tp, skb);
 
tp->lost_out += tcp_skb_pcount(skb);
+   tcp_sum_lost(tp, skb);
TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
}
 }
@@ -913,6 +930,7 @@ void tcp_skb_mark_lost_uncond_verify(struct tcp_sock *tp, 
struct sk_buff *skb)
 {
tcp_verify_retransmit_hint(tp, skb);
 
+   tcp_sum_lost(tp, skb);
if (!(TCP_SKB_CB(skb)->sacked & (TCPCB_LOST|TCPCB_SACKED_ACKED))) {
tp->lost_out += tcp_skb_pcount(skb);
TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
@@ -1890,6 +1908,7 @@ void tcp_enter_loss(struct sock *sk)
struct sk_buff *skb;
bool new_recovery = icsk->icsk_ca_state < TCP_CA_Recovery;
bool is_reneg;  /* is receiver reneging on SACKs? */
+   bool mark_lost;
 
/* Reduce ssthresh if it has not yet been made inside this window. */
if (icsk->icsk_ca_state <= TCP_CA_Disorder ||
@@ -1923,8 +1942,12 @@ void tcp_enter_loss(struct sock *sk)
if (skb == tcp_send_head(sk))
break;
 
+   mark_lost = (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) ||
+is_reneg);
+   if (mark_lost)
+   tcp_sum_lost(tp, skb);
TCP_SKB_CB(skb)->sacked &= (~TCPCB_TAGBITS)|TCPCB_SACKED_ACKED;
-   if (!(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED) || is_reneg) {
+   if (mark_lost) {
TCP_SKB_CB(skb)->sacked &= ~TCPCB_SACKED_ACKED;
TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
tp->lost_out += tcp_skb_pcount(skb);
-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 net-next 08/16] tcp: track application-limited rate samples

2016-09-17 Thread Neal Cardwell
From: Soheil Hassas Yeganeh 

This commit adds code to track whether the delivery rate represented
by each rate_sample was limited by the application.

Upon each transmit, we store in the is_app_limited field in the skb a
boolean bit indicating whether there is a known "bubble in the pipe":
a point in the rate sample interval where the sender was
application-limited, and did not transmit even though the cwnd and
pacing rate allowed it.

This logic marks the flow app-limited on a write if *all* of the
following are true:

  1) There is less than 1 MSS of unsent data in the write queue
 available to transmit.

  2) There is no packet in the sender's queues (e.g. in fq or the NIC
 tx queue).

  3) The connection is not limited by cwnd.

  4) There are no lost packets to retransmit.

The tcp_rate_check_app_limited() code in tcp_rate.c determines whether
the connection is application-limited at the moment. If the flow is
application-limited, it sets the tp->app_limited field. If the flow is
application-limited then that means there is effectively a "bubble" of
silence in the pipe now, and this silence will be reflected in a lower
bandwidth sample for any rate samples from now until we get an ACK
indicating this bubble has exited the pipe: specifically, until we get
an ACK for the next packet we transmit.

When we send every skb we record in scb->tx.is_app_limited whether the
resulting rate sample will be application-limited.

The code in tcp_rate_gen() checks to see when it is safe to mark all
known application-limited bubbles of silence as having exited the
pipe. It does this by checking to see when the delivered count moves
past the tp->app_limited marker. At this point it zeroes the
tp->app_limited marker, as all known bubbles are out of the pipe.

We make room for the tx.is_app_limited bit in the skb by borrowing a
bit from the in_flight field used by NV to record the number of bytes
in flight. The receive window in the TCP header is 16 bits, and the
max receive window scaling shift factor is 14 (RFC 1323). So the max
receive window offered by the TCP protocol is 2^(16+14) = 2^30. So we
only need 30 bits for the tx.in_flight used by NV.
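
A consumer-side sketch (illustrative only; bw_filter, win, now and
sample_bw are placeholders): a bandwidth estimator built on these rate
samples would typically accept app-limited samples only if they still
raise the estimate, since such samples may understate the available
bandwidth:

  if (!rs->is_app_limited || sample_bw >= minmax_get(&bw_filter))
          minmax_running_max(&bw_filter, win, now, sample_bw);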

Signed-off-by: Van Jacobson 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Nandita Dukkipati 
Signed-off-by: Eric Dumazet 
Signed-off-by: Soheil Hassas Yeganeh 
---
 include/linux/tcp.h  |  1 +
 include/net/tcp.h|  6 +-
 net/ipv4/tcp.c   |  8 
 net/ipv4/tcp_minisocks.c |  3 +++
 net/ipv4/tcp_rate.c  | 29 -
 5 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/include/linux/tcp.h b/include/linux/tcp.h
index c50e6ae..fdcd00f 100644
--- a/include/linux/tcp.h
+++ b/include/linux/tcp.h
@@ -268,6 +268,7 @@ struct tcp_sock {
u32 prr_out;/* Total number of pkts sent during Recovery. */
u32 delivered;  /* Total data packets delivered incl. rexmits */
u32 lost;   /* Total data packets lost incl. rexmits */
+   u32 app_limited;/* limited until "delivered" reaches this val */
struct skb_mstamp first_tx_mstamp;  /* start of window send phase */
struct skb_mstamp delivered_mstamp; /* time we reached "delivered" */
 
diff --git a/include/net/tcp.h b/include/net/tcp.h
index b261c89..a69ed7f 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -764,7 +764,9 @@ struct tcp_skb_cb {
union {
struct {
/* There is space for up to 24 bytes */
-   __u32 in_flight;/* Bytes in flight when packet sent */
+   __u32 in_flight:30,/* Bytes in flight at transmit */
+ is_app_limited:1, /* cwnd not fully used? */
+ unused:1;
/* pkts S/ACKed so far upon tx of skb, incl retrans: */
__u32 delivered;
/* start of send pipeline phase */
@@ -883,6 +885,7 @@ struct rate_sample {
int  losses;/* number of packets marked lost upon ACK */
u32  acked_sacked;  /* number of packets newly (S)ACKed upon ACK */
u32  prior_in_flight;   /* in flight before this ACK */
+   bool is_app_limited;/* is sample from packet with bubble in pipe? */
bool is_retrans;/* is sample from retransmission? */
 };
 
@@ -978,6 +981,7 @@ void tcp_rate_skb_delivered(struct sock *sk, struct sk_buff 
*skb,
struct rate_sample *rs);
 void tcp_rate_gen(struct sock *sk, u32 delivered, u32 lost,
  struct skb_mstamp *now, struct rate_sample *rs);
+void tcp_rate_check_app_limited(struct sock *sk);
 
 /* These functions determine how the current flow behaves in respect of SACK
  * handling. SACK is 

[PATCH v2 net-next 01/16] tcp: cdg: rename struct minmax in tcp_cdg.c to avoid a naming conflict

2016-09-17 Thread Neal Cardwell
From: Soheil Hassas Yeganeh 

The upcoming change "lib/win_minmax: windowed min or max estimator"
introduces a struct called minmax, which is then included in
include/linux/tcp.h in the upcoming change "tcp: use windowed min
filter library for TCP min_rtt estimation". This would create a
compilation error for tcp_cdg.c, which defines its own minmax
struct. To avoid this naming conflict (and potentially others in the
future), this commit renames the version used in tcp_cdg.c to
cdg_minmax.

Signed-off-by: Soheil Hassas Yeganeh 
Signed-off-by: Neal Cardwell 
Signed-off-by: Yuchung Cheng 
Signed-off-by: Eric Dumazet 
Cc: Kenneth Klette Jonassen 
---
 net/ipv4/tcp_cdg.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/ipv4/tcp_cdg.c b/net/ipv4/tcp_cdg.c
index 03725b2..35b2803 100644
--- a/net/ipv4/tcp_cdg.c
+++ b/net/ipv4/tcp_cdg.c
@@ -56,7 +56,7 @@ MODULE_PARM_DESC(use_shadow, "use shadow window heuristic");
 module_param(use_tolerance, bool, 0644);
 MODULE_PARM_DESC(use_tolerance, "use loss tolerance heuristic");
 
-struct minmax {
+struct cdg_minmax {
union {
struct {
s32 min;
@@ -74,10 +74,10 @@ enum cdg_state {
 };
 
 struct cdg {
-   struct minmax rtt;
-   struct minmax rtt_prev;
-   struct minmax *gradients;
-   struct minmax gsum;
+   struct cdg_minmax rtt;
+   struct cdg_minmax rtt_prev;
+   struct cdg_minmax *gradients;
+   struct cdg_minmax gsum;
bool gfilled;
u8  tail;
u8  state;
@@ -353,7 +353,7 @@ static void tcp_cdg_cwnd_event(struct sock *sk, const enum 
tcp_ca_event ev)
 {
struct cdg *ca = inet_csk_ca(sk);
struct tcp_sock *tp = tcp_sk(sk);
-   struct minmax *gradients;
+   struct cdg_minmax *gradients;
 
switch (ev) {
case CA_EVENT_CWND_RESTART:
-- 
2.8.0.rc3.226.g39d4020



[PATCH v2 net-next 00/16] tcp: BBR congestion control algorithm

2016-09-17 Thread Neal Cardwell
tcp: BBR congestion control algorithm

This patch series implements a new TCP congestion control algorithm:
BBR (Bottleneck Bandwidth and RTT). A paper with a detailed
description of BBR will be published in ACM Queue, September-October
2016, as "BBR: Congestion-Based Congestion Control". BBR is widely
deployed in production at Google.

The patch series starts with a set of supporting infrastructure
changes, including a few that extend the congestion control
framework. The last patch adds BBR as a TCP congestion control
module. Please see individual patches for the details.

- v1 -> v2: fix issues caught by build bots:
 - fix "tcp: export data delivery rate" to use rate64 instead of rate,
   so there is a 64-bit numerator for the do_div call
 - fix conflicting definitions for minmax caused by
   "tcp: use windowed min filter library for TCP min_rtt estimation"
   with a new commit:
   tcp: cdg: rename struct minmax in tcp_cdg.c to avoid a naming conflict
 - fix warning about the use of __packed in
   "tcp: track data delivery rate for a TCP connection",
   which involves the addition of a new commit:
   tcp: switch back to proper tcp_skb_cb size check in tcp_init()  

Eric Dumazet (2):
  net_sched: sch_fq: add low_rate_threshold parameter
  tcp: switch back to proper tcp_skb_cb size check in tcp_init()

Neal Cardwell (8):
  lib/win_minmax: windowed min or max estimator
  tcp: use windowed min filter library for TCP min_rtt estimation
  tcp: count packets marked lost for a TCP connection
  tcp: allow congestion control module to request TSO skb segment count
  tcp: export tcp_tso_autosize() and parameterize minimum number of TSO
segments
  tcp: export tcp_mss_to_mtu() for congestion control modules
  tcp: increase ICSK_CA_PRIV_SIZE from 64 bytes to 88
  tcp_bbr: add BBR congestion control

Soheil Hassas Yeganeh (2):
  tcp: cdg: rename struct minmax in tcp_cdg.c to avoid a naming conflict
  tcp: track application-limited rate samples

Yuchung Cheng (4):
  tcp: track data delivery rate for a TCP connection
  tcp: export data delivery rate
  tcp: allow congestion control to expand send buffer differently
  tcp: new CC hook to set sending rate with rate_sample in any CA state

 include/linux/tcp.h|  14 +-
 include/linux/win_minmax.h |  37 ++
 include/net/inet_connection_sock.h |   4 +-
 include/net/tcp.h  |  53 ++-
 include/uapi/linux/inet_diag.h |  13 +
 include/uapi/linux/pkt_sched.h |   2 +
 include/uapi/linux/tcp.h   |   3 +
 lib/Makefile   |   2 +-
 lib/win_minmax.c   |  98 +
 net/ipv4/Kconfig   |  18 +
 net/ipv4/Makefile  |   3 +-
 net/ipv4/tcp.c |  26 +-
 net/ipv4/tcp_bbr.c | 875 +
 net/ipv4/tcp_cdg.c |  12 +-
 net/ipv4/tcp_cong.c|   2 +-
 net/ipv4/tcp_input.c   | 154 +++
 net/ipv4/tcp_minisocks.c   |   5 +-
 net/ipv4/tcp_output.c  |  27 +-
 net/ipv4/tcp_rate.c| 186 
 net/sched/sch_fq.c |  22 +-
 20 files changed, 1449 insertions(+), 107 deletions(-)
 create mode 100644 include/linux/win_minmax.h
 create mode 100644 lib/win_minmax.c
 create mode 100644 net/ipv4/tcp_bbr.c
 create mode 100644 net/ipv4/tcp_rate.c

-- 
2.8.0.rc3.226.g39d4020



Re: [RFC PATCH 9/9] ethernet: sun8i-emac: add pm_runtime support

2016-09-17 Thread Florian Fainelli
On 09/14/2016 07:03 AM, LABBE Corentin wrote:
> On Mon, Sep 12, 2016 at 10:44:51PM +0200, Maxime Ripard wrote:
>>> +static int __maybe_unused sun8i_emac_resume(struct platform_device *pdev)
>>> +{
>>> +   struct net_device *ndev = platform_get_drvdata(pdev);
>>> +   struct sun8i_emac_priv *priv = netdev_priv(ndev);
>>> +
>>> +   phy_start(ndev->phydev);
>>> +
>>> +   sun8i_emac_start_tx(ndev);
>>> +   sun8i_emac_start_rx(ndev);
>>> +
>>> +   if (netif_running(ndev))
>>> +   netif_device_attach(ndev);
>>> +
>>> +   netif_start_queue(ndev);
>>> +
>>> +   napi_enable(&priv->napi);
>>> +
>>> +   return 0;
>>> +}
>>
>> The main idea behind the runtime PM hooks is that they bring the
>> device to a working state and shuts it down when it's not needed
>> anymore.
>>
> 
> I expect that the first part (all pm_runtime_xxx) of the patch bring that.
> When the interface is not opened:
> cat /sys/devices/platform/soc/1c3.ethernet/power/runtime_status 
> suspended

If your interface is not open, it should be in a low power state; only
when it gets opened (which means it is used) should you make it
functional. That's pretty much the same thing as the runtime PM
reference count usage here.

I don't see a lot of value for using runtime_pm_* hooks here except
calling into the existing suspend/resume functions that you have defined
already, but then again, the code should be modular enough already in
the driver.

Runtime PM for network devices cannot be used as efficiently as it can
with a host-initiated bus/controller, because the device needs to be
able to receive packets without the host waking it up. So, with the
exception of MDIO (which is host initiated) and packet transmission
(then again, I would not want to wait N ms to bring the interface into
a state where it can transmit packets; that's terrible for latency),
everything else is pretty much impossible to fully suspend due to its
asynchronous nature.
-- 
Florian


[PATCH net-next v3 0/3] net: ethernet: mediatek: add HW LRO functions

2016-09-17 Thread Nelson Chang
The series adds hardware large receive offload (LRO) functions and the
ethtool functions to configure RX flows of HW LRO.

changes since v3:
- Respin the patch against the newer driver
- Move the dts description of hwlro to optional properties

changes since v2:
- Add ndo_fix_features to prevent NETIF_F_LRO from being turned off while an RX flow is programmed
- Rephrase the dts property as a capability indicating whether the hardware supports LRO

changes since v1:
- Add HW LRO support
- Add ethtool hooks to set LRO RX flows

Nelson Chang (3):
  net: ethernet: mediatek: add HW LRO functions of PDMA RX rings
  net: ethernet: mediatek: add ethtool functions to configure RX flows
of HW LRO
  net: ethernet: mediatek: add dts configuration to enable HW LRO

 .../devicetree/bindings/net/mediatek-net.txt   |   2 +
 drivers/net/ethernet/mediatek/mtk_eth_soc.c| 433 +++--
 drivers/net/ethernet/mediatek/mtk_eth_soc.h|  75 +++-
 3 files changed, 485 insertions(+), 25 deletions(-)

-- 
1.9.1



[PATCH -next] cxgb4: Fix return value check in cfg_queues_uld()

2016-09-17 Thread Wei Yongjun
From: Wei Yongjun 

Fix the return value check which was testing the wrong variable
in cfg_queues_uld().

Fixes: 94cdb8bb993a ("cxgb4: Add support for dynamic allocation of
resources for ULD")
Signed-off-by: Wei Yongjun 
---
 drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c 
b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
index 5d402ba..4d1de62 100644
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_uld.c
@@ -245,7 +245,7 @@ int cfg_queues_uld(struct adapter *adap, unsigned int 
uld_type,
}
 
rxq_info->rspq_id = kcalloc(nrxq, sizeof(unsigned short), GFP_KERNEL);
-   if (!rxq_info->uldrxq) {
+   if (!rxq_info->rspq_id) {
kfree(rxq_info->uldrxq);
kfree(rxq_info);
return -ENOMEM;



[PATCH net-next v3 3/3] net: ethernet: mediatek: add the dts property to set if the HW supports LRO

2016-09-17 Thread Nelson Chang
Add the dts property describing whether the hardware supports LRO.

Signed-off-by: Nelson Chang 
---
 Documentation/devicetree/bindings/net/mediatek-net.txt | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt 
b/Documentation/devicetree/bindings/net/mediatek-net.txt
index 32eaaca..6103e55 100644
--- a/Documentation/devicetree/bindings/net/mediatek-net.txt
+++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
@@ -24,7 +24,7 @@ Required properties:
 Optional properties:
 - interrupt-parent: Should be the phandle for the interrupt controller
   that services interrupts for this device
-
+- mediatek,hwlro: the capability if the hardware supports LRO functions
 
 * Ethernet MAC node
 
@@ -51,6 +51,7 @@ eth: ethernet@1b10 {
reset-names = "eth";
	mediatek,ethsys = <&ethsys>;
	mediatek,pctl = <&syscfg_pctl_a>;
+   mediatek,hwlro;
#address-cells = <1>;
#size-cells = <0>;
 
-- 
1.9.1



[PATCH net-next v3 2/3] net: ethernet: mediatek: add ethtool functions to configure RX flows of HW LRO

2016-09-17 Thread Nelson Chang
The code adds ethtool functions to set RX flows for HW LRO. Because the
HW LRO hardware can only recognize the destination IP of TCP/IP RX flows,
the ethtool command to add HW LRO flow is as below:
ethtool -N [devname] flow-type tcp4 dst-ip [ip_addr] loc [0~1]

Also, because the hardware can set four destination IPs in total, each
GMAC (GMAC1/GMAC2) can set at most two IPs.
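
For instance (device name and address are examples), a flow to
192.168.1.100 could be installed in slot 0 and removed again with:

  ethtool -N eth0 flow-type tcp4 dst-ip 192.168.1.100 loc 0
  ethtool -N eth0 delete 0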

Signed-off-by: Nelson Chang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 236 
 1 file changed, 236 insertions(+)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 18600cb..481f360 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1357,6 +1357,182 @@ static void mtk_hwlro_rx_uninit(struct mtk_eth *eth)
mtk_w32(eth, 0, MTK_PDMA_LRO_CTRL_DW0);
 }
 
+static void mtk_hwlro_val_ipaddr(struct mtk_eth *eth, int idx, __be32 ip)
+{
+   u32 reg_val;
+
+   reg_val = mtk_r32(eth, MTK_LRO_CTRL_DW2_CFG(idx));
+
+   /* invalidate the IP setting */
+   mtk_w32(eth, (reg_val & ~MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(idx));
+
+   mtk_w32(eth, ip, MTK_LRO_DIP_DW0_CFG(idx));
+
+   /* validate the IP setting */
+   mtk_w32(eth, (reg_val | MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(idx));
+}
+
+static void mtk_hwlro_inval_ipaddr(struct mtk_eth *eth, int idx)
+{
+   u32 reg_val;
+
+   reg_val = mtk_r32(eth, MTK_LRO_CTRL_DW2_CFG(idx));
+
+   /* invalidate the IP setting */
+   mtk_w32(eth, (reg_val & ~MTK_RING_MYIP_VLD), MTK_LRO_CTRL_DW2_CFG(idx));
+
+   mtk_w32(eth, 0, MTK_LRO_DIP_DW0_CFG(idx));
+}
+
+static int mtk_hwlro_get_ip_cnt(struct mtk_mac *mac)
+{
+   int cnt = 0;
+   int i;
+
+   for (i = 0; i < MTK_MAX_LRO_IP_CNT; i++) {
+   if (mac->hwlro_ip[i])
+   cnt++;
+   }
+
+   return cnt;
+}
+
+static int mtk_hwlro_add_ipaddr(struct net_device *dev,
+   struct ethtool_rxnfc *cmd)
+{
+   struct ethtool_rx_flow_spec *fsp =
+   (struct ethtool_rx_flow_spec *)&cmd->fs;
+   struct mtk_mac *mac = netdev_priv(dev);
+   struct mtk_eth *eth = mac->hw;
+   int hwlro_idx;
+
+   if ((fsp->flow_type != TCP_V4_FLOW) ||
+   (!fsp->h_u.tcp_ip4_spec.ip4dst) ||
+   (fsp->location > 1))
+   return -EINVAL;
+
+   mac->hwlro_ip[fsp->location] = htonl(fsp->h_u.tcp_ip4_spec.ip4dst);
+   hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + fsp->location;
+
+   mac->hwlro_ip_cnt = mtk_hwlro_get_ip_cnt(mac);
+
+   mtk_hwlro_val_ipaddr(eth, hwlro_idx, mac->hwlro_ip[fsp->location]);
+
+   return 0;
+}
+
+static int mtk_hwlro_del_ipaddr(struct net_device *dev,
+   struct ethtool_rxnfc *cmd)
+{
+   struct ethtool_rx_flow_spec *fsp =
+   (struct ethtool_rx_flow_spec *)&cmd->fs;
+   struct mtk_mac *mac = netdev_priv(dev);
+   struct mtk_eth *eth = mac->hw;
+   int hwlro_idx;
+
+   if (fsp->location > 1)
+   return -EINVAL;
+
+   mac->hwlro_ip[fsp->location] = 0;
+   hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + fsp->location;
+
+   mac->hwlro_ip_cnt = mtk_hwlro_get_ip_cnt(mac);
+
+   mtk_hwlro_inval_ipaddr(eth, hwlro_idx);
+
+   return 0;
+}
+
+static void mtk_hwlro_netdev_disable(struct net_device *dev)
+{
+   struct mtk_mac *mac = netdev_priv(dev);
+   struct mtk_eth *eth = mac->hw;
+   int i, hwlro_idx;
+
+   for (i = 0; i < MTK_MAX_LRO_IP_CNT; i++) {
+   mac->hwlro_ip[i] = 0;
+   hwlro_idx = (mac->id * MTK_MAX_LRO_IP_CNT) + i;
+
+   mtk_hwlro_inval_ipaddr(eth, hwlro_idx);
+   }
+
+   mac->hwlro_ip_cnt = 0;
+}
+
+static int mtk_hwlro_get_fdir_entry(struct net_device *dev,
+   struct ethtool_rxnfc *cmd)
+{
+   struct mtk_mac *mac = netdev_priv(dev);
+   struct ethtool_rx_flow_spec *fsp =
+   (struct ethtool_rx_flow_spec *)&cmd->fs;
+
+   /* only tcp dst ipv4 is meaningful, others are meaningless */
+   fsp->flow_type = TCP_V4_FLOW;
+   fsp->h_u.tcp_ip4_spec.ip4dst = ntohl(mac->hwlro_ip[fsp->location]);
+   fsp->m_u.tcp_ip4_spec.ip4dst = 0;
+
+   fsp->h_u.tcp_ip4_spec.ip4src = 0;
+   fsp->m_u.tcp_ip4_spec.ip4src = 0x;
+   fsp->h_u.tcp_ip4_spec.psrc = 0;
+   fsp->m_u.tcp_ip4_spec.psrc = 0x;
+   fsp->h_u.tcp_ip4_spec.pdst = 0;
+   fsp->m_u.tcp_ip4_spec.pdst = 0x;
+   fsp->h_u.tcp_ip4_spec.tos = 0;
+   fsp->m_u.tcp_ip4_spec.tos = 0xff;
+
+   return 0;
+}
+
+static int mtk_hwlro_get_fdir_all(struct net_device *dev,
+ struct ethtool_rxnfc *cmd,
+ u32 *rule_locs)
+{
+   struct mtk_mac *mac = netdev_priv(dev);
+   int cnt = 0;
+   int i;
+
+   for (i = 0; i < 

[PATCH net-next v3 1/3] net: ethernet: mediatek: add HW LRO functions of PDMA RX rings

2016-09-17 Thread Nelson Chang
The code adds the hardware large receive offload (LRO) functions as below:
1) PDMA has total four RX rings that one is the normal ring, and others can
   be configured as LRO rings.
2) Only TCP/IP RX flows can be offloaded. The hardware can set at most
   four IP addresses; if the destination IP of an RX flow matches one of
   them, that flow is a candidate for offload.
3) At most three RX flows can be offloaded, and each flow is mapped to
   one RX ring.
4) If there are more than three candidate RX flows, the hardware can
   choose three of them by throughput comparison results.

Signed-off-by: Nelson Chang 
---
 drivers/net/ethernet/mediatek/mtk_eth_soc.c | 215 +---
 drivers/net/ethernet/mediatek/mtk_eth_soc.h |  75 +-
 2 files changed, 265 insertions(+), 25 deletions(-)

diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c 
b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
index 522fe8d..18600cb 100644
--- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -820,11 +820,51 @@ drop:
return NETDEV_TX_OK;
 }
 
+static struct mtk_rx_ring *mtk_get_rx_ring(struct mtk_eth *eth)
+{
+   int i;
+   struct mtk_rx_ring *ring;
+   int idx;
+
+   if (!eth->hwlro)
+   return &eth->rx_ring[0];
+
+   for (i = 0; i < MTK_MAX_RX_RING_NUM; i++) {
+   ring = &eth->rx_ring[i];
+   idx = NEXT_RX_DESP_IDX(ring->calc_idx, ring->dma_size);
+   if (ring->dma[idx].rxd2 & RX_DMA_DONE) {
+   ring->calc_idx_update = true;
+   return ring;
+   }
+   }
+
+   return NULL;
+}
+
+static void mtk_update_rx_cpu_idx(struct mtk_eth *eth)
+{
+   struct mtk_rx_ring *ring;
+   int i;
+
+   if (!eth->hwlro) {
+   ring = &eth->rx_ring[0];
+   mtk_w32(eth, ring->calc_idx, ring->crx_idx_reg);
+   } else {
+   for (i = 0; i < MTK_MAX_RX_RING_NUM; i++) {
+   ring = &eth->rx_ring[i];
+   if (ring->calc_idx_update) {
+   ring->calc_idx_update = false;
+   mtk_w32(eth, ring->calc_idx, ring->crx_idx_reg);
+   }
+   }
+   }
+}
+
 static int mtk_poll_rx(struct napi_struct *napi, int budget,
   struct mtk_eth *eth)
 {
-   struct mtk_rx_ring *ring = &eth->rx_ring;
-   int idx = ring->calc_idx;
+   struct mtk_rx_ring *ring;
+   int idx;
struct sk_buff *skb;
u8 *data, *new_data;
struct mtk_rx_dma *rxd, trxd;
@@ -836,7 +876,11 @@ static int mtk_poll_rx(struct napi_struct *napi, int 
budget,
dma_addr_t dma_addr;
int mac = 0;
 
-   idx = NEXT_RX_DESP_IDX(idx);
+   ring = mtk_get_rx_ring(eth);
+   if (unlikely(!ring))
+   goto rx_done;
+
+   idx = NEXT_RX_DESP_IDX(ring->calc_idx, ring->dma_size);
+   rxd = &ring->dma[idx];
data = ring->data[idx];
 
@@ -907,12 +951,13 @@ release_desc:
done++;
}
 
+rx_done:
if (done) {
/* make sure that all changes to the dma ring are flushed before
 * we continue
 */
wmb();
-   mtk_w32(eth, ring->calc_idx, MTK_PRX_CRX_IDX0);
+   mtk_update_rx_cpu_idx(eth);
}
 
return done;
@@ -1135,32 +1180,41 @@ static void mtk_tx_clean(struct mtk_eth *eth)
}
 }
 
-static int mtk_rx_alloc(struct mtk_eth *eth)
+static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag)
 {
-   struct mtk_rx_ring *ring = &eth->rx_ring;
+   struct mtk_rx_ring *ring = &eth->rx_ring[ring_no];
+   int rx_data_len, rx_dma_size;
int i;
 
-   ring->frag_size = mtk_max_frag_size(ETH_DATA_LEN);
+   if (rx_flag == MTK_RX_FLAGS_HWLRO) {
+   rx_data_len = MTK_MAX_LRO_RX_LENGTH;
+   rx_dma_size = MTK_HW_LRO_DMA_SIZE;
+   } else {
+   rx_data_len = ETH_DATA_LEN;
+   rx_dma_size = MTK_DMA_SIZE;
+   }
+
+   ring->frag_size = mtk_max_frag_size(rx_data_len);
ring->buf_size = mtk_max_buf_size(ring->frag_size);
-   ring->data = kcalloc(MTK_DMA_SIZE, sizeof(*ring->data),
+   ring->data = kcalloc(rx_dma_size, sizeof(*ring->data),
 GFP_KERNEL);
if (!ring->data)
return -ENOMEM;
 
-   for (i = 0; i < MTK_DMA_SIZE; i++) {
+   for (i = 0; i < rx_dma_size; i++) {
ring->data[i] = netdev_alloc_frag(ring->frag_size);
if (!ring->data[i])
return -ENOMEM;
}
 
ring->dma = dma_alloc_coherent(eth->dev,
-  MTK_DMA_SIZE * sizeof(*ring->dma),
+  rx_dma_size * sizeof(*ring->dma),
 

[PATCH net-next v2 0/3] net: ethernet: mediatek: add HW LRO functions

2016-09-17 Thread Nelson Chang
The series adds hardware large receive offload (LRO) functions and the
ethtool functions to configure RX flows of HW LRO.

changes since v3:
- Respin the patch against the newer driver
- Move the dts description of hwlro to optional properties

changes since v2:
- Add ndo_fix_features to prevent NETIF_F_LRO from being turned off while an RX flow is programmed
- Rephrase the dts property as a capability indicating whether the hardware supports LRO

changes since v1:
- Add HW LRO support
- Add ethtool hooks to set LRO RX flows

Nelson Chang (3):
  net: ethernet: mediatek: add HW LRO functions of PDMA RX rings
  net: ethernet: mediatek: add ethtool functions to configure RX flows
of HW LRO
  net: ethernet: mediatek: add dts configuration to enable HW LRO

 .../devicetree/bindings/net/mediatek-net.txt   |   2 +
 drivers/net/ethernet/mediatek/mtk_eth_soc.c| 433 +++--
 drivers/net/ethernet/mediatek/mtk_eth_soc.h|  75 +++-
 3 files changed, 485 insertions(+), 25 deletions(-)

-- 
1.9.1



RE: [PATCH net-next v2 0/3] net: ethernet: mediatek: add HW LRO functions

2016-09-17 Thread Nelson Chang
Thanks David!
I'll respin the patch and submit the newer version.

-Original Message-
From: David Miller [mailto:da...@davemloft.net] 
Sent: Saturday, September 17, 2016 9:46 PM
To: Nelson Chang (張家祥)
Cc: j...@phrozen.org; f.faine...@gmail.com; n...@openwrt.org;
netdev@vger.kernel.org; linux-media...@lists.infradead.org;
nelsonch...@gmail.com
Subject: Re: [PATCH net-next v2 0/3] net: ethernet: mediatek: add HW LRO
functions

From: Nelson Chang 
Date: Wed, 14 Sep 2016 13:58:56 +0800

> The series add the large receive offload (LRO) functions by hardware 
> and the ethtool functions to configure RX flows of HW LRO.
> 
> changes since v2:
> - Add ndo_fix_features to prevent NETIF_F_LRO off while RX flow is 
> programmed
> - Rephrase the dts property is a capability if the hardware supports 
> LRO
> 
> changes since v1:
> - Add HW LRO support
> - Add ethtool hooks to set LRO RX flows

This doesn't apply cleanly to net-next.




[net PATCH V3] mlx4: fix XDP_TX is acting like XDP_PASS on TX ring full

2016-09-17 Thread Jesper Dangaard Brouer
The XDP_TX action can fail transmitting the frame in case the TX ring
is full or port is down.  In case of TX failure it should drop the
frame, and not as now call 'break' which is the same as XDP_PASS.

Fixes: 9ecc2d86171a ("net/mlx4_en: add xdp forwarding and data write support")
Signed-off-by: Jesper Dangaard Brouer 

---
Is this goto label inside a switch case too ugly?
Note, this fix has nothing to do with the page-refcnt bug I reported.

 drivers/net/ethernet/mellanox/mlx4/en_rx.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c 
b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 2040dad8611d..9eadda431965 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -906,11 +906,12 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct 
mlx4_en_cq *cq, int bud
length, tx_index,
&doorbell_pending))
goto consumed;
-   break;
+   goto xdp_drop; /* Drop on xmit failure */
default:
bpf_warn_invalid_xdp_action(act);
case XDP_ABORTED:
case XDP_DROP:
+   xdp_drop:
if (mlx4_en_rx_recycle(ring, frags))
goto consumed;
goto next;




Re: [net PATCH] mlx4: fix XDP_TX is acting like XDP_PASS on TX ring full

2016-09-17 Thread Jesper Dangaard Brouer
On Fri, 16 Sep 2016 13:43:50 -0700
Brenden Blanco  wrote:

> On Fri, Sep 16, 2016 at 10:36:12PM +0200, Jesper Dangaard Brouer wrote:
> > The XDP_TX action can fail transmitting the frame in case the TX ring
> > is full or port is down.  In case of TX failure it should drop the
> > frame, and not as now call 'break' which is the same as XDP_PASS.
> > 
> > Fixes: 9ecc2d86171a ("net/mlx4_en: add xdp forwarding and data write 
> > support")
> > Signed-off-by: Jesper Dangaard Brouer   
> 
> You could in theory have also tried to recycle the page instead of
> dropping it, but that's probably not worth optimizing when tx is backed
> up, as you'll only save a handful of page_put's. The code to do so
> wouldn't have been pretty.

Yes, we could (and perhaps should) recycle the page instead. But as you
also mention, it would not look pretty. I'll send a V3, as XDP's primary
concern is performance.

> Reviewed-by: Brenden Blanco 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  Author of http://www.iptv-analyzer.org
  LinkedIn: http://www.linkedin.com/in/brouer


Re: [PATCH net-next 0/4] ip_tunnel: add collect_md mode to IPv4/IPv6 tunnels

2016-09-17 Thread David Miller
From: Alexei Starovoitov 
Date: Thu, 15 Sep 2016 13:00:28 -0700

> Similar to geneve, vxlan, gre tunnels implement 'collect metadata' mode
> in ipip, ipip6, ip6ip6 tunnels.

Series applied, thanks.


Re: [PATCH 0/3] constify net_device_ops structures

2016-09-17 Thread David Miller
From: Julia Lawall 
Date: Thu, 15 Sep 2016 22:23:23 +0200

> Constify net_device_ops structures.

All applied, thanks.


Re: [PATCH net-next] net: vrf: Remove RT_FL_TOS

2016-09-17 Thread David Miller
From: David Ahern 
Date: Thu, 15 Sep 2016 10:13:47 -0700

> No longer used after d66f6c0a8f3c0 ("net: ipv4: Remove l3mdev_get_saddr")
> 
> Signed-off-by: David Ahern 

Applied.


Re: [PATCH net-next] net: l3mdev: Remove netif_index_is_l3_master

2016-09-17 Thread David Miller
From: David Ahern 
Date: Thu, 15 Sep 2016 10:18:45 -0700

> No longer used after e0d56fdd73422 ("net: l3mdev: remove redundant calls")
> 
> Signed-off-by: David Ahern 

Applied.


Re: [PATCH] llc: switch type to bool as the timeout is only tested versus 0

2016-09-17 Thread David Miller
From: Alan 
Date: Thu, 15 Sep 2016 18:51:25 +0100

> (As asked by Dave in Februrary)
> 
> Signed-off-by: Alan Cox 

Applied.


Re: [PATCH net-next] tcp: prepare skbs for better sack shifting

2016-09-17 Thread David Miller
From: Eric Dumazet 
Date: Thu, 15 Sep 2016 09:33:02 -0700

> From: Eric Dumazet 
> 
> With large BDP TCP flows and lossy networks, it is very important
> to keep a low number of skbs in the write queue.
> 
> RACK and SACK processing can perform a linear scan of it.
> 
> We should avoid putting any payload in skb->head, so that SACK
> shifting can be done if needed.
> 
> With this patch, we allow to pack ~0.5 MB per skb instead of
> the 64KB initially cooked at tcp_sendmsg() time.
> 
> This gives a reduction of number of skbs in write queue by eight.
> tcp_rack_detect_loss() likes this.
> 
> We still allow payload in skb->head for first skb put in the queue,
> to not impact RPC workloads.
> 
> Signed-off-by: Eric Dumazet 
> Cc: Yuchung Cheng 

Applied.


Re: [PATCH net] sctp: fix SSN comparision

2016-09-17 Thread David Miller
From: Marcelo Ricardo Leitner 
Date: Thu, 15 Sep 2016 15:02:38 -0300

> This function actually operates on u32 yet its parameters were declared
> as u16, causing integer truncation upon calling.
> 
> Note in patch context that ADDIP_SERIAL_SIGN_BIT is already 32 bits.
> 
> Signed-off-by: Marcelo Ricardo Leitner 

Applied.
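
A standalone illustration of the truncation described above (not sctp
code): passing u32 values through u16 parameters silently drops the high
bits, so two different serial numbers can wrongly compare equal.

#include <stdio.h>
#include <stdint.h>

static int cmp16(uint16_t a, uint16_t b) { return a == b; }	/* buggy: truncates */
static int cmp32(uint32_t a, uint32_t b) { return a == b; }	/* correct width */

int main(void)
{
	uint32_t x = 0x00010000, y = 0x00020000;	/* differ only in the high bits */

	printf("u16 compare: %d (wrongly equal)\n", cmp16(x, y));
	printf("u32 compare: %d\n", cmp32(x, y));
	return 0;
}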


Re: [PATCH] irda: Free skb on irda_accept error path.

2016-09-17 Thread David Miller
From: Phil Turnbull 
Date: Thu, 15 Sep 2016 12:41:44 -0400

> skb is not freed if newsk is NULL. Rework the error path so free_skb is
> unconditionally called on function exit.
> 
> Fixes: c3ea9fa27413 ("[IrDA] af_irda: IRDA_ASSERT cleanups")
> Signed-off-by: Phil Turnbull 

Applied.
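
A generic sketch of the reworked error path described above (illustrative
names, not the actual af_irda.c code): every branch converges on one exit
label where the skb is freed unconditionally, so the newsk == NULL case
no longer leaks it.

static int example_accept(struct sk_buff *skb, struct sock *newsk)
{
	int err = -EINVAL;

	if (!newsk)
		goto out;		/* previously returned here and leaked skb */

	/* ... normal accept handling using skb ... */
	err = 0;
out:
	kfree_skb(skb);			/* always freed on function exit */
	return err;
}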


Re: [PATCH net] tcp: fix overflow in __tcp_retransmit_skb()

2016-09-17 Thread David Miller
From: Eric Dumazet 
Date: Thu, 15 Sep 2016 08:12:33 -0700

> From: Eric Dumazet 
> 
> If a TCP socket gets a large write queue, an overflow can happen
> in a test in __tcp_retransmit_skb() preventing all retransmits.
> 
> The flow then stalls and resets after timeouts.
> 
> Tested:
> 
> sysctl -w net.core.wmem_max=10
> netperf -H dest -- -s 10
> 
> Signed-off-by: Eric Dumazet 

Applied.


Re: [PATCH net] net: avoid sk_forward_alloc overflows

2016-09-17 Thread David Miller
From: Eric Dumazet 
Date: Thu, 15 Sep 2016 08:48:46 -0700

> From: Eric Dumazet 
> 
> A malicious TCP receiver, sending SACK, can force the sender to split
> skbs in write queue and increase its memory usage.
> 
> Then, when socket is closed and its write queue purged, we might
> overflow sk_forward_alloc (It becomes negative)
> 
> sk_mem_reclaim() does nothing in this case, and more than 2GB
> are leaked from TCP perspective (tcp_memory_allocated is not changed)
> 
> Then warnings trigger from inet_sock_destruct() and
> sk_stream_kill_queues() seeing a not zero sk_forward_alloc
> 
> All TCP stack can be stuck because TCP is under memory pressure.
> 
> A simple fix is to preemptively reclaim from sk_mem_uncharge().
> 
> This makes sure a socket wont have more than 2 MB forward allocated,
> after burst and idle period.
> 
> Signed-off-by: Eric Dumazet 

Applied.
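
A sketch of the preemptive reclaim described above.  The 2 MB threshold
follows the commit text; hooking it into sk_mem_uncharge() and calling
__sk_mem_reclaim() are assumptions about where such a change would live,
not a quote of the committed code.

static inline void sk_mem_uncharge(struct sock *sk, int size)
{
	if (!sk_has_account(sk))
		return;

	sk->sk_forward_alloc += size;

	/* If a burst left a lot of forward-allocated memory behind, give it
	 * back now instead of letting sk_forward_alloc grow without bound.
	 */
	if (unlikely(sk->sk_forward_alloc >= 1 << 21))
		__sk_mem_reclaim(sk, 1 << 20);
}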


Re: [PATCH v2] xen-netback: fix error handling on netback_probe()

2016-09-17 Thread David Miller
From: Filipe Manco 
Date: Thu, 15 Sep 2016 17:10:46 +0200

> In case of error during netback_probe() (e.g. an entry missing on the
> xenstore) netback_remove() is called on the new device, which will set
> the device backend state to XenbusStateClosed by calling
> set_backend_state(). However, the backend state wasn't initialized by
> netback_probe() at this point, which will cause an invalid transaction
> and set_backend_state() to BUG().
> 
> Initialize the backend state at the beginning of netback_probe() to
> XenbusStateInitialising, and create two new valid state transitions on
> set_backend_state(), from XenbusStateInitialising to XenbusStateClosed,
> and from XenbusStateInitialising to XenbusStateInitWait.
> 
> Signed-off-by: Filipe Manco 

Applied, thanks.


Re: pull-request: wireless-drivers-next 2016-09-15

2016-09-17 Thread David Miller
From: Kalle Valo 
Date: Thu, 15 Sep 2016 18:09:21 +0300

> here's the first pull request for 4.9. The ones I want to point out are
> the FIELD_PREP() and FIELD_GET() macros added to bitfield.h, which are
> reviewed by Linus, and make it possible to remove util.h from mt7601u.
> 
> Also we have new HW support to various drivers and other smaller
> features, the signed tag below contains more information. And I pulled
> my ath-current (uses older net tree as the baseline) branch to fix a
> conflict in ath10k.
> 
> Once again the diffstat from git request-pull was wrong. I fixed it by
> manually copying the diffstat from a test pull against net-next, so
> everything should be ok. But please let me know if there are any
> problems.

Pulled, thanks Kalle.


Re: [PATCH net-next 0/3] mlx5e Order-0 pages for Striding RQ

2016-09-17 Thread David Miller
From: Tariq Toukan 
Date: Thu, 15 Sep 2016 16:08:35 +0300

> In this series, we refactor our Striding RQ receive-flow to always use
> fragmented WQEs (Work Queue Elements) using order-0 pages, omitting the
> flow that allocates and splits high-order pages which would fragment
> and deplete high-order pages in the system.
> 
> The first patch gives a slight degradation, but opens the opportunity
> to using a simple page-cache mechanism of a fair size.
> The page-cache, implemented in patch 3, not only closes the performance
> gap but even gives a gain.
> In patch 2 we re-organize the code to better manage the calls for
> alloc/de-alloc pages in the RX flow.
> 
> Series generated against net-next commit:
> bed806cb266e "Merge branch 'mlxsw-ethtool'"

Series applied, thanks.
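
A rough sketch of the kind of per-ring page cache described in the cover
letter: a small fixed-size stack of recycled pages consulted before
falling back to the page allocator.  Names and the cache size are
illustrative assumptions, not mlx5e's actual implementation.

#define RX_PAGE_CACHE_SIZE	256

struct rx_page_cache {
	unsigned int	count;
	struct page	*pages[RX_PAGE_CACHE_SIZE];
};

/* Called on RX completion when the page can be reused. */
static bool rx_cache_put(struct rx_page_cache *c, struct page *p)
{
	if (c->count == RX_PAGE_CACHE_SIZE)
		return false;		/* full: caller releases the page */
	c->pages[c->count++] = p;
	return true;
}

/* Called when refilling the RX ring. */
static struct page *rx_cache_get(struct rx_page_cache *c)
{
	if (!c->count)
		return NULL;		/* miss: fall back to alloc_page() */
	return c->pages[--c->count];
}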


Re: [PATCH net-next v2 0/3] net: ethernet: mediatek: add HW LRO functions

2016-09-17 Thread David Miller
From: Nelson Chang 
Date: Wed, 14 Sep 2016 13:58:56 +0800

> The series add the large receive offload (LRO) functions by hardware and
> the ethtool functions to configure RX flows of HW LRO.
> 
> changes since v2:
> - Add ndo_fix_features to prevent NETIF_F_LRO off while RX flow is programmed
> - Rephrase the dts property is a capability if the hardware supports LRO
> 
> changes since v1:
> - Add HW LRO support
> - Add ethtool hooks to set LRO RX flows

This doesn't apply cleanly to net-next.


Re: [PATCH net-next 2/2] net sched ife action: Introduce skb tcindex metadata encap decap

2016-09-17 Thread David Miller
From: Jamal Hadi Salim 
Date: Thu, 15 Sep 2016 06:49:54 -0400

> +static int __init ifetc_index_init_module(void)
> +{
> + pr_emerg("Loaded IFE tc_index\n");
 ...
> +static void __exit ifetc_index_cleanup_module(void)
> +{
> + pr_emerg("Unloaded IFE tc_index\n");

This looks like some leftover debugging, please remove.


Re: [RFC PATCH 9/9] ethernet: sun8i-emac: add pm_runtime support

2016-09-17 Thread Maxime Ripard
On Wed, Sep 14, 2016 at 04:03:04PM +0200, LABBE Corentin wrote:
> > > +static int __maybe_unused sun8i_emac_suspend(struct platform_device 
> > > *pdev, pm_message_t state)
> > > +{
> > > + struct net_device *ndev = platform_get_drvdata(pdev);
> > > + struct sun8i_emac_priv *priv = netdev_priv(ndev);
> > > +
> > > + napi_disable(&priv->napi);
> > > +
> > > + if (netif_running(ndev))
> > > + netif_device_detach(ndev);
> > > +
> > > + sun8i_emac_stop_tx(ndev);
> > > + sun8i_emac_stop_rx(ndev);
> > > +
> > > + sun8i_emac_rx_clean(ndev);
> > > + sun8i_emac_tx_clean(ndev);
> > > +
> > > + phy_stop(ndev->phydev);
> > > +
> > > + return 0;
> > > +}
> > > +
> > > +static int __maybe_unused sun8i_emac_resume(struct platform_device *pdev)
> > > +{
> > > + struct net_device *ndev = platform_get_drvdata(pdev);
> > > + struct sun8i_emac_priv *priv = netdev_priv(ndev);
> > > +
> > > + phy_start(ndev->phydev);
> > > +
> > > + sun8i_emac_start_tx(ndev);
> > > + sun8i_emac_start_rx(ndev);
> > > +
> > > + if (netif_running(ndev))
> > > + netif_device_attach(ndev);
> > > +
> > > + netif_start_queue(ndev);
> > > +
> > > + napi_enable(&priv->napi);
> > > +
> > > + return 0;
> > > +}
> > 
> > The main idea behind the runtime PM hooks is that they bring the
> > device to a working state and shuts it down when it's not needed
> > anymore.

Indeed.

> I expect that the first part (all pm_runtime_xxx) of the patch bring that.
> When the interface is not opened:
> cat /sys/devices/platform/soc/1c3.ethernet/power/runtime_status 
> suspended
> 
> > However, they shouldn't be called when the device is still in used, so
> > all the mangling with NAPI, the phy and so on is irrelevant here, but
> > the clocks, resets, for example, are.
> > 
> 
> I do the same as other ethernet driver for suspend/resume.

suspend / resume are used when you put the whole system into suspend,
and bring it back.

runtime PM only kicks in when the device itself is not used anymore. It
makes sense to do what you're doing here on system suspend; it doesn't
make any sense when the system is not suspended but the device is.
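
A sketch of the split being described, assuming a clock field name and
dev_pm_ops wiring purely for illustration (not the posted sun8i-emac
code): the runtime PM callbacks only gate clocks while the device is
unused; netif/NAPI/phy handling belongs to the system sleep callbacks,
which take a struct device rather than a platform_device.

static int sun8i_emac_runtime_suspend(struct device *dev)
{
	struct net_device *ndev = dev_get_drvdata(dev);
	struct sun8i_emac_priv *priv = netdev_priv(ndev);

	clk_disable_unprepare(priv->ahb_clk);	/* device unused: just cut clocks */
	return 0;
}

static int sun8i_emac_runtime_resume(struct device *dev)
{
	struct net_device *ndev = dev_get_drvdata(dev);
	struct sun8i_emac_priv *priv = netdev_priv(ndev);

	return clk_prepare_enable(priv->ahb_clk);
}

static const struct dev_pm_ops sun8i_emac_pm_ops = {
	SET_RUNTIME_PM_OPS(sun8i_emac_runtime_suspend,
			   sun8i_emac_runtime_resume, NULL)
};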

> > >  static const struct of_device_id sun8i_emac_of_match_table[] = {
> > >   { .compatible = "allwinner,sun8i-a83t-emac",
> > > .data = &emac_variant_a83t },
> > > @@ -2246,6 +2302,8 @@ static struct platform_driver sun8i_emac_driver = {
> > >   .name   = "sun8i-emac",
> > >   .of_match_table = sun8i_emac_of_match_table,
> > >   },
> > > + .suspend= sun8i_emac_suspend,
> > > + .resume = sun8i_emac_resume,
> > 
> > These are not the runtime PM hooks. How did you test that?
> > 
> 
> Anyway I didnt test suspend/resume so I will remove it until I
> successfully found how to hibernate my board.

So you submit code you never tested? That's usually a recipe for
disaster.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


signature.asc
Description: PGP signature


Re: [PATCH net-next 05/14] tcp: track data delivery rate for a TCP connection

2016-09-17 Thread Neal Cardwell
On Fri, Sep 16, 2016 at 5:38 PM, kbuild test robot  wrote:
> Hi Yuchung,
>
> [auto build test WARNING on net-next/master]
> All warnings (new ones prefixed by >>):
>
>In file included from net/ipv4/route.c:103:0:
>>> include/net/tcp.h:769:11: warning: 'packed' attribute ignored for field of 
>>> type 'struct skb_mstamp' [-Wattributes]
>struct skb_mstamp first_tx_mstamp __packed;
>   ^~
>include/net/tcp.h:771:11: warning: 'packed' attribute ignored for field of 
> type 'struct skb_mstamp' [-Wattributes]
>struct skb_mstamp delivered_mstamp __packed;
>   ^~

We have a fix for this, and we'll post it with the v2 series.

thanks,
neal


Re: [patch net-next v10 2/3] net: core: Add offload stats to if_stats_msg

2016-09-17 Thread Nikolay Aleksandrov

> On Sep 16, 2016, at 4:05 PM, Jiri Pirko  wrote:
> 
> From: Nogah Frankel 
> 
> Add a nested attribute of offload stats to if_stats_msg
> named IFLA_STATS_LINK_OFFLOAD_XSTATS.
> Under it, add SW stats, meaning stats only per packets that went via
> slowpath to the cpu, named IFLA_OFFLOAD_XSTATS_CPU_HIT.
> 
> Signed-off-by: Nogah Frankel 
> Signed-off-by: Jiri Pirko 
> ---
> include/uapi/linux/if_link.h |   9 
> net/core/rtnetlink.c | 111 +--
> 2 files changed, 116 insertions(+), 4 deletions(-)
> 
> 

Acked-by: Nikolay Aleksandrov 




Re: [patch net-next v10 1/3] netdevice: Add offload statistics ndo

2016-09-17 Thread Nikolay Aleksandrov

> On Sep 16, 2016, at 4:05 PM, Jiri Pirko  wrote:
> 
> From: Nogah Frankel 
> 
> Add a new ndo to return statistics for offloaded operation.
> Since there can be many different offloaded operations with many
> stats types, the ndo gets an attribute id by which it knows which
> stats are wanted. The ndo also gets a void pointer to be cast according
> to the attribute id.
> 
> Signed-off-by: Nogah Frankel 
> Signed-off-by: Jiri Pirko 
> ---
> include/linux/netdevice.h | 12 
> 1 file changed, 12 insertions(+)
> 

Reviewed-by: Nikolay Aleksandrov 
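
A driver-side sketch of the ndo described above.  The prototype follows
the description (attribute id plus an opaque pointer); the assumption
that IFLA_OFFLOAD_XSTATS_CPU_HIT carries a struct rtnl_link_stats64 is
taken from the if_stats_msg patch earlier in this series, and the foo_*
names are purely illustrative.

static int foo_get_offload_stats(int attr_id, const struct net_device *dev,
				 void *attr_data)
{
	const struct foo_priv *priv = netdev_priv(dev);

	switch (attr_id) {
	case IFLA_OFFLOAD_XSTATS_CPU_HIT:
		/* stats for packets that took the slow path to the CPU */
		memcpy(attr_data, &priv->cpu_hit_stats,
		       sizeof(struct rtnl_link_stats64));
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}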




Re: [PATCH net-next v2 3/5] cxgb4: add parser to translate u32 filters to internal spec

2016-09-17 Thread Rahul Lakkireddy
On Thursday, September 09/15/16, 2016 at 07:27:24 -0700, John Fastabend wrote:
> On 16-09-13 04:42 AM, Rahul Lakkireddy wrote:
> > Parse information sent by u32 into internal filter specification.
> > Add support for parsing several fields in IPv4, IPv6, TCP, and UDP.
> > 
> > Signed-off-by: Rahul Lakkireddy 
> > Signed-off-by: Hariprasad Shenai 
> > ---
> 
> Looks good to me. Also curious if you would find it worthwhile to
> have a cls_u32 mode that starts at L2 instead of the IP header? The
> use case would be to use cls_u32 with various encapsulation protocols
> in front of the IP header.
> 
> Reviewed-by: John Fastabend 

Thanks for the review, John.  Yes, we are also looking into getting u32
to start from the L2 header, in order to allow matching on encapsulation
protocols in front of the IP header.

In addition, our hardware also keeps per-filter statistics, such as the
number of times a filter has been hit and the number of bytes that hit
it.  Hence, we are also looking into exposing these per-filter stats
via u32.

Thanks,
Rahul


Re: [PATCH net-next] rxrpc: Make IPv6 support conditional on CONFIG_IPV6

2016-09-17 Thread David Miller
From: David Howells 
Date: Sat, 17 Sep 2016 07:26:01 +0100

> Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
> This is then made conditional on CONFIG_IPV6.
> 
> Without this, the following can be seen:
> 
>net/built-in.o: In function `rxrpc_init_peer':
>>> peer_object.c:(.text+0x18c3c8): undefined reference to 
>>> `ip6_route_output_flags'
> 
> Reported-by: kbuild test robot 
> Signed-off-by: David Howells 

Applied.


[PATCH net-next] rxrpc: Make IPv6 support conditional on CONFIG_IPV6

2016-09-17 Thread David Howells
Add CONFIG_AF_RXRPC_IPV6 and make the IPv6 support code conditional on it.
This is then made conditional on CONFIG_IPV6.

Without this, the following can be seen:

   net/built-in.o: In function `rxrpc_init_peer':
>> peer_object.c:(.text+0x18c3c8): undefined reference to 
>> `ip6_route_output_flags'

Reported-by: kbuild test robot 
Signed-off-by: David Howells 
---

 net/rxrpc/Kconfig|7 +++
 net/rxrpc/af_rxrpc.c |7 ++-
 net/rxrpc/conn_object.c  |2 ++
 net/rxrpc/local_object.c |2 ++
 net/rxrpc/output.c   |2 ++
 net/rxrpc/peer_event.c   |4 +++-
 net/rxrpc/peer_object.c  |   10 ++
 net/rxrpc/utils.c|2 ++
 8 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/net/rxrpc/Kconfig b/net/rxrpc/Kconfig
index 784c53163b7b..13396c74b5c1 100644
--- a/net/rxrpc/Kconfig
+++ b/net/rxrpc/Kconfig
@@ -19,6 +19,13 @@ config AF_RXRPC
 
  See Documentation/networking/rxrpc.txt.
 
+config AF_RXRPC_IPV6
+   bool "IPv6 support for RxRPC"
+   depends on (IPV6 = m && AF_RXRPC = m) || (IPV6 = y && AF_RXRPC)
+   help
+ Say Y here to allow AF_RXRPC to use IPV6 UDP as well as IPV4 UDP as
+ its network transport.
+
 
 config AF_RXRPC_DEBUG
bool "RxRPC dynamic debugging"
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index f61f7b2d1ca4..09f81befc705 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -109,12 +109,14 @@ static int rxrpc_validate_address(struct rxrpc_sock *rx,
tail = offsetof(struct sockaddr_rxrpc, transport.sin.__pad);
break;
 
+#ifdef CONFIG_AF_RXRPC_IPV6
case AF_INET6:
if (srx->transport_len < sizeof(struct sockaddr_in6))
return -EINVAL;
tail = offsetof(struct sockaddr_rxrpc, transport) +
sizeof(struct sockaddr_in6);
break;
+#endif
 
default:
return -EAFNOSUPPORT;
@@ -413,9 +415,11 @@ static int rxrpc_sendmsg(struct socket *sock, struct 
msghdr *m, size_t len)
case AF_INET:
rx->srx.transport_len = sizeof(struct sockaddr_in);
break;
+#ifdef CONFIG_AF_RXRPC_IPV6
case AF_INET6:
rx->srx.transport_len = sizeof(struct sockaddr_in6);
break;
+#endif
default:
ret = -EAFNOSUPPORT;
goto error_unlock;
@@ -570,7 +574,8 @@ static int rxrpc_create(struct net *net, struct socket 
*sock, int protocol,
return -EAFNOSUPPORT;
 
/* we support transport protocol UDP/UDP6 only */
-   if (protocol != PF_INET && protocol != PF_INET6)
+   if (protocol != PF_INET &&
+   IS_ENABLED(CONFIG_AF_RXRPC_IPV6) && protocol != PF_INET6)
return -EPROTONOSUPPORT;
 
if (sock->type != SOCK_DGRAM)
diff --git a/net/rxrpc/conn_object.c b/net/rxrpc/conn_object.c
index c0ddba787fd4..bb1f29280aea 100644
--- a/net/rxrpc/conn_object.c
+++ b/net/rxrpc/conn_object.c
@@ -134,6 +134,7 @@ struct rxrpc_connection *rxrpc_find_connection_rcu(struct 
rxrpc_local *local,
srx.transport.sin.sin_addr.s_addr)
goto not_found;
break;
+#ifdef CONFIG_AF_RXRPC_IPV6
case AF_INET6:
if (peer->srx.transport.sin6.sin6_port !=
srx.transport.sin6.sin6_port ||
@@ -142,6 +143,7 @@ struct rxrpc_connection *rxrpc_find_connection_rcu(struct 
rxrpc_local *local,
   sizeof(struct in6_addr)) != 0)
goto not_found;
break;
+#endif
default:
BUG();
}
diff --git a/net/rxrpc/local_object.c b/net/rxrpc/local_object.c
index f5b9bb0d3f98..e3fad80b0795 100644
--- a/net/rxrpc/local_object.c
+++ b/net/rxrpc/local_object.c
@@ -58,6 +58,7 @@ static long rxrpc_local_cmp_key(const struct rxrpc_local 
*local,
memcmp(&local->srx.transport.sin.sin_addr,
   &srx->transport.sin.sin_addr,
   sizeof(struct in_addr));
+#ifdef CONFIG_AF_RXRPC_IPV6
case AF_INET6:
/* If the choice of UDP6 port is left up to the transport, then
 * the endpoint record doesn't match.
@@ -67,6 +68,7 @@ static long rxrpc_local_cmp_key(const struct rxrpc_local 
*local,
memcmp(&local->srx.transport.sin6.sin6_addr,
   &srx->transport.sin6.sin6_addr,
   sizeof(struct in6_addr));
+#endif
default:
BUG();
}
diff --git a/net/rxrpc/output.c b/net/rxrpc/output.c
index d7cd87f17f0d..06a9aca739d1 100644
--- a/net/rxrpc/output.c
+++ b/net/rxrpc/output.c
@@ -259,6 +259,7 @@