Re: [Cake] [PATCH net-next 02/14] net_sched: cake: implement lockless cake_dump()

2024-04-17 Thread Toke Høiland-Jørgensen via Cake
Simon Horman  writes:

> + Toke Høiland-Jørgensen 
>   cake@lists.bufferbloat.net

Thanks!

> On Mon, Apr 15, 2024 at 01:20:42PM +, Eric Dumazet wrote:
>> Instead of relying on RTNL, cake_dump() can use READ_ONCE()
>> annotations, paired with WRITE_ONCE() ones in cake_change().
>> 
>> Signed-off-by: Eric Dumazet 

Just to be sure I understand this correctly: the idea is that with
READ/WRITE_ONCE annotations, we can dump the qdisc options without
taking the RTNL lock. This means that a dump not be consistent across a
concurrent reconfig that changes multiple parameters, but each parameter
will be either the new or the old value. Right?

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next v2] net: sched: cake: Optimize the number of function calls and branches in heap construction

2024-04-09 Thread Toke Høiland-Jørgensen via Cake
Kuan-Wei Chiu  writes:

> When constructing a heap, heapify operations are required on all
> non-leaf nodes. Thus, determining the index of the first non-leaf node
> is crucial. In a heap, the left child's index of node i is 2 * i + 1
> and the right child's index is 2 * i + 2. Node CAKE_MAX_TINS *
> CAKE_QUEUES / 2 has its left and right children at indexes
> CAKE_MAX_TINS * CAKE_QUEUES + 1 and CAKE_MAX_TINS * CAKE_QUEUES + 2,
> respectively, which are beyond the heap's range, indicating it as a
> leaf node. Conversely, node CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1 has a
> left child at index CAKE_MAX_TINS * CAKE_QUEUES - 1, confirming its
> non-leaf status. The loop should start from it since it's not a leaf
> node.
>
> By starting the loop from CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1, we
> minimize function calls and branch condition evaluations. This
> adjustment theoretically reduces two function calls (one for
> cake_heapify() and another for cake_heap_get_backlog()) and five branch
> evaluations (one for iterating all non-leaf nodes, one within
> cake_heapify()'s while loop, and three more within the while loop
> with if conditions).
>
> Signed-off-by: Kuan-Wei Chiu 

Acked-by: Toke Høiland-Jørgensen 
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next] net: sched: cake: Optimize number of calls to cake_heapify()

2024-04-08 Thread Toke Høiland-Jørgensen via Cake
Kuan-Wei Chiu  writes:

> On Sun, Apr 07, 2024 at 06:10:04PM +0200, Toke Høiland-Jørgensen wrote:
>> Kuan-Wei Chiu  writes:
>> 
>> > Improve the max-heap construction process by reducing unnecessary
>> > heapify operations. Specifically, adjust the starting condition from
>> > n / 2 to n / 2 - 1 in the loop that iterates over all non-leaf
>> > elements.
>> 
>> Please add an explanation for why this change is correct, and why it is
>> beneficial. "Improve" and "unnecessary" is way too implicit.
>> 
>> pw-bot: cr
>
> For correctness:
> To build a heap, we need to perform heapify operations on all non-leaf
> nodes, so we need to find the index of the first non-leaf node. In a
> heap, the index of node i, the left child's index is 2 * i + 1, and the
> right child's index is 2 * i + 2. The left and right children of node
> CAKE_MAX_TINS * CAKE_QUEUES / 2 are at indexes CAKE_MAX_TINS *
> CAKE_QUEUES + 1 and CAKE_MAX_TINS * CAKE_QUEUES + 2, respectively. Both
> children's indexes are beyond the range of the heap, indicating that
> CAKE_MAX_TINS * CAKE_QUEUES / 2 is a leaf node. The left child of node
> CAKE_MAX_TINS * CAKE_QUEUES / 2 - 1 is at index CAKE_MAX_TINS *
> CAKE_QUEUES - 1, and the right child is at index CAKE_MAX_TINS *
> CAKE_QUEUES. Therefore, we know the left child exists, but the right
> child does not. Since it's not a leaf node, the loop should start from
> it.
>
> For benefit:
> We can reduce 2 function calls (one for cake_heapify() and another for
> cake_heap_get_backlog()) and decrease 5 branch condition evaluations
> (one for iterating through all non-leaf nodes, one inside the while
> loop of cake_heapify(), and three more inside the while loop with if
> conditions). The only added operation is an extra subtraction.
>
> If you're satisfied with the explanation above, I can attempt to
> rewrite the commit message and send the v2 patch.

Yes, sounds reasonable. Did you measure any real-world performance
benefit, or is this purely a theoretical optimisation? Either way,
please indicate this in the updated patch description.

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next] net: sched: cake: Optimize number of calls to cake_heapify()

2024-04-07 Thread Toke Høiland-Jørgensen via Cake
Kuan-Wei Chiu  writes:

> Improve the max-heap construction process by reducing unnecessary
> heapify operations. Specifically, adjust the starting condition from
> n / 2 to n / 2 - 1 in the loop that iterates over all non-leaf
> elements.

Please add an explanation for why this change is correct, and why it is
beneficial. "Improve" and "unnecessary" is way too implicit.

pw-bot: cr
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake's ack-filter vs GSO

2024-02-23 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> On Sun, Feb 18, 2024 at 8:37 AM Toke Høiland-Jørgensen  wrote:
>>
>> Dave Taht via Cake  writes:
>>
>> > It has been years since I looked at cake's code.
>> >
>> > Does anyone remember why we do not ack-filter a gso-split?
>>
>> Because a GSO packet cannot be a pure ACK, so it wouldn't be filtered
>> anyway...
>
> But a GRO packet can, and most likely IS a pure ack packet train that
> could and should be thinned. I think. Yes?

Erm, no, because those would have header differences and so wouldn't be
combined into a single GRO packet...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake's ack-filter vs GSO

2024-02-18 Thread Toke Høiland-Jørgensen via Cake
Dave Taht via Cake  writes:

> It has been years since I looked at cake's code.
>
> Does anyone remember why we do not ack-filter a gso-split?

Because a GSO packet cannot be a pure ACK, so it wouldn't be filtered
anyway...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH 0/3] net/sched: Load modules via alias

2023-12-06 Thread Toke Høiland-Jørgensen via Cake
Michal Koutný  writes:

> On Wed, Dec 06, 2023 at 02:28:57PM -0800, Stephen Hemminger 
>  wrote:
>> It is not clear to me what this patchset is trying to fix.
>> Autoloading happens now, but it does depend on the name not alias.
>
> There are some more details in the thread of v1 [1] [2].
> Does it clarify?

Yes, but this should be explained clearly in the commit message
(including the reason why this is useful, in the follow-up to [1]).

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Fwd: been a while since I did a kernel patch

2023-07-29 Thread Toke Høiland-Jørgensen via Cake
Dave Taht via Cake  writes:

> I have been meaning to make this patch use UINT_MAX and submit to the
> kernel for a long time. Anyone want to give it a spin?

There's tooling to ease submission these days:

https://people.kernel.org/monsieuricon/sending-a-kernel-patch-with-b4-part-1

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Bloat] Two questions re high speed congestionmanagement anddatagram protocols

2023-06-28 Thread Toke Høiland-Jørgensen via Cake
"David P. Reed via Bloat"  writes:

> (One such nightmare can be seen in LKML... Search for
> dpr...@deepplum.com patch emails. I tried hard, was worn down, then
> gave up, since I found a way to avoid the bug, in virtualization code
> on x86, and gave up on getting it fixed after a year. Life is too
> short.

Went looking, since I think it's important to learn from such process
failures (and you're certainly not the first to opine that getting
patches into the kernel is challenging).

I'm assuming you're referring to this series, right?

https://lore.kernel.org/all/20200704203809.76391-4-dpr...@deepplum.com/

Which, to me, looks like it was pretty close to being accepted; another
revision would probably have made the cut...

...In fact it seems those patches were later resurrected by Sean as part
of this series, six months later:

https://lore.kernel.org/all/20201231002702.2223707-1-sea...@google.com/

One of the patches retained your authorship and made it into the kernel
in this commit:

https://git.kernel.org/torvalds/c/5364a305


So wouldn't necessarily call that a complete failure :) Seems the main
process issue you hit here was a subsystem that was too resource
constrained on the review side, which does sadly happen. The kernel
process tends to heavily favour convenience of reviewers for the same
reason (which is one of the reasons it can be off-putting to outsiders,
so it's a bit of a chicken and egg situation...)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] big tcp

2023-02-23 Thread Toke Høiland-Jørgensen via Cake
Jonathan Morton via Cake  writes:

>> On 23 Feb, 2023, at 3:35 am, Dave Taht via Cake  
>> wrote:
>> 
>> does this break cake?
>> 
>> https://lore.kernel.org/netdev/de811bf3-e2d8-f727-72bc-c8a754a9d...@tessares.net/T/
>
> It looks like they've included patches *to* Cake to handle anticipated
> breakage.  This is one of the great things about having it upstreamed.

Yup, indeed. In this case they did so without Cc'ing the maintainer,
though :(

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] flow dissection vs encapsulated traffic?

2023-02-05 Thread Toke Høiland-Jørgensen via Cake
Dave Taht via Cake  writes:

> In looking how the code has morphed since I last looked at it, I found
> myself staring at this bit...
>
> skb_flow_dissect_flow_keys(skb, &keys,
>FLOW_DISSECTOR_F_STOP_AT_FLOW_LABEL);
>
> // so we have delved deeply into the packet at this point... finding
> various encapsulations...
>
> then we get to:
>
> /* Don't use the SKB hash if we change the lookup keys from conntrack 
> */
> if (nat_enabled && cake_update_flowkeys(&keys, skb))
> use_skbhash = false;
>
> This leverages skb_protocol(), which as best as I can tell just peers
> into the *vlan headers*,
> not deeper into the packet...
>
> Then we proceed merrily into the update_flowkeys code thinking it is
> the outer type
> (ipv4), not the inner, then dissect away, using a v4 union...
>
> Am I reading this wrong? Please tell me I am reading this wrong...

I don't follow. What's the bug you are seeing, specifically?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net, v2 1/3] net: sched: cake: fix null pointer access issue when cake_init() fails

2022-10-18 Thread Toke Høiland-Jørgensen via Cake
Zhengchao Shao  writes:

> When the default qdisc is cake, if the qdisc of dev_queue fails to be
> inited during mqprio_init(), cake_reset() is invoked to clear
> resources. In this case, the tins is NULL, and it will cause gpf issue.
>
> The process is as follows:
> qdisc_create_dflt()
>   cake_init()
>   q->tins = kvcalloc(...)--->failed, q->tins is NULL
>   ...
>   qdisc_put()
>   ...
>   cake_reset()
>   ...
>   cake_dequeue_one()
>   b = &q->tins[...]   --->q->tins is NULL
>
> The following is the Call Trace information:
> general protection fault, probably for non-canonical address
> 0xdc00:  [#1] PREEMPT SMP KASAN
> KASAN: null-ptr-deref in range [0x-0x0007]
> RIP: 0010:cake_dequeue_one+0xc9/0x3c0
> Call Trace:
> 
> cake_reset+0xb1/0x140
> qdisc_reset+0xed/0x6f0
> qdisc_destroy+0x82/0x4c0
> qdisc_put+0x9e/0xb0
> qdisc_create_dflt+0x2c3/0x4a0
> mqprio_init+0xa71/0x1760
> qdisc_create+0x3eb/0x1000
> tc_modify_qdisc+0x408/0x1720
> rtnetlink_rcv_msg+0x38e/0xac0
> netlink_rcv_skb+0x12d/0x3a0
> netlink_unicast+0x4a2/0x740
> netlink_sendmsg+0x826/0xcc0
> sock_sendmsg+0xc5/0x100
> sys_sendmsg+0x583/0x690
> ___sys_sendmsg+0xe8/0x160
> __sys_sendmsg+0xbf/0x160
> do_syscall_64+0x35/0x80
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
> RIP: 0033:0x7f89e5122d04
> 
>
> Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) 
> qdisc")
> Signed-off-by: Zhengchao Shao 

Acked-by: Toke Høiland-Jørgensen 
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next, v3 01/18] net/sched: sch_api: add helper for tc qdisc walker stats dump

2022-09-21 Thread Toke Høiland-Jørgensen via Cake
Zhengchao Shao  writes:

> The walk implementation of most qdisc class modules is basically the
> same. That is, the values of count and skip are checked first. If
> count is greater than or equal to skip, the registered fn function is
> executed. Otherwise, increase the value of count. So we can reconstruct
> them.
>
> Signed-off-by: Zhengchao Shao 

Acked-by: Toke Høiland-Jørgensen 

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next, v3 02/18] net/sched: use tc_qdisc_stats_dump() in qdisc

2022-09-21 Thread Toke Høiland-Jørgensen via Cake
Zhengchao Shao  writes:

> use tc_qdisc_stats_dump() in qdisc.
>
> Signed-off-by: Zhengchao Shao 
> Reviewed-by: Victor Nogueira 
> Tested-by: Victor Nogueira 

For sch_cake:

Acked-by: Toke Høiland-Jørgensen 

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next, v2 01/18] net/sched: sch_api: add helper for tc qdisc walker stats dump

2022-09-20 Thread Toke Høiland-Jørgensen via Cake
Zhengchao Shao  writes:

> The walk implementation of most qdisc class modules is basically the
> same. That is, the values of count and skip are checked first. If
> count is greater than or equal to skip, the registered fn function is
> executed. Otherwise, increase the value of count. So we can reconstruct
> them.
>
> Signed-off-by: Zhengchao Shao 
> ---
>  include/net/pkt_sched.h | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
> index 29f65632ebc5..243e8b0cb7ea 100644
> --- a/include/net/pkt_sched.h
> +++ b/include/net/pkt_sched.h
> @@ -222,4 +222,17 @@ static inline struct tc_skb_cb *tc_skb_cb(const struct 
> sk_buff *skb)
>   return cb;
>  }
>  
> +static inline bool tc_qdisc_stats_dump(struct Qdisc *sch,
> +struct qdisc_walker *arg,
> +unsigned long cl)
> +{
> + if (arg->count >= arg->skip && arg->fn(sch, cl, arg) < 0) {

Seems a bit confusing that tc_qdisc_stats_dump() reverses the order of
the 'cl' and 'arg' parameters relative to the callback?

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net] sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb

2022-09-01 Thread Toke Høiland-Jørgensen via Cake
Jakub Kicinski  writes:

> On Thu, 01 Sep 2022 00:13:24 +0200 Toke Høiland-Jørgensen wrote:
>> Ah, crossed streams (just sent v2[0]).
>
> Sorry about that, traveling knocked out my sense of time and I kept
> thinking it's Thursday, and the discussion happened yesterday :S

Haha, OK, no worries :)

>> Hmm, okay, so as noted in the changelog to v2, just this patch will
>> break htb+cake (because htb will now skip htb_activate()); do you prefer
>> that I send a follow-up to fix HTB in this mode, or to revert this and
>> apply the fix to sfb in v2 instead?
>
> Reverted. Let's review v2 as if v1 was not applied.

SGTM!

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net] sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb

2022-08-31 Thread Toke Høiland-Jørgensen via Cake
patchwork-bot+netdev...@kernel.org writes:

> Hello:
>
> This patch was applied to netdev/net.git (master)
> by Jakub Kicinski :
>
> On Wed, 31 Aug 2022 11:21:03 +0200 you wrote:
>> When the GSO splitting feature of sch_cake is enabled, GSO superpackets
>> will be broken up and the resulting segments enqueued in place of the
>> original skb. In this case, CAKE calls consume_skb() on the original skb,
>> but still returns NET_XMIT_SUCCESS. This can confuse parent qdiscs into
>> assuming the original skb still exists, when it really has been freed. Fix
>> this by adding the __NET_XMIT_STOLEN flag to the return value in this case.
>> 
>> [...]
>
> Here is the summary with links:
>   - [net] sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb
> https://git.kernel.org/netdev/net/c/90fabae8a2c2

Ah, crossed streams (just sent v2[0]).

Hmm, okay, so as noted in the changelog to v2, just this patch will
break htb+cake (because htb will now skip htb_activate()); do you prefer
that I send a follow-up to fix HTB in this mode, or to revert this and
apply the fix to sfb in v2 instead?

-Toke


[0] https://lore.kernel.org/r/20220831215219.499563-1-t...@toke.dk
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net] sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb

2022-08-31 Thread Toke Høiland-Jørgensen via Cake
Eric Dumazet  writes:

> On Wed, Aug 31, 2022 at 2:25 AM Toke Høiland-Jørgensen  wrote:
>>
>> When the GSO splitting feature of sch_cake is enabled, GSO superpackets
>> will be broken up and the resulting segments enqueued in place of the
>> original skb. In this case, CAKE calls consume_skb() on the original skb,
>> but still returns NET_XMIT_SUCCESS. This can confuse parent qdiscs into
>> assuming the original skb still exists, when it really has been freed. Fix
>> this by adding the __NET_XMIT_STOLEN flag to the return value in this case.
>>
>
> I think you forgot to give credits to the team who discovered this issue.
>
> Something like this
>
> Reported-by: zdi-disclosu...@trendmicro.com # ZDI-CAN-18231

Ah, right; apologies, will respin!

It also looks like fixing it this way will actually break other things
(most notably sch_cake as a child of sch_htb), so will send a different
patch as v2...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net] sch_cake: Return __NET_XMIT_STOLEN when consuming enqueued skb

2022-08-31 Thread Toke Høiland-Jørgensen via Cake
When the GSO splitting feature of sch_cake is enabled, GSO superpackets
will be broken up and the resulting segments enqueued in place of the
original skb. In this case, CAKE calls consume_skb() on the original skb,
but still returns NET_XMIT_SUCCESS. This can confuse parent qdiscs into
assuming the original skb still exists, when it really has been freed. Fix
this by adding the __NET_XMIT_STOLEN flag to the return value in this case.

Fixes: 0c850344d388 ("sch_cake: Conditionally split GSO segments")
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index a43a58a73d09..a04928082e4a 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1713,6 +1713,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
}
idx--;
flow = &b->flows[idx];
+   ret = NET_XMIT_SUCCESS;
 
/* ensure shaper state isn't stale */
if (!b->tin_backlog) {
@@ -1771,6 +1772,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
 
qdisc_tree_reduce_backlog(sch, 1-numsegs, len-slen);
consume_skb(skb);
+   ret |= __NET_XMIT_STOLEN;
} else {
/* not splitting */
cobalt_set_enqueue_time(skb, now);
@@ -1904,7 +1906,7 @@ static s32 cake_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
}
b->drop_overlimit += dropped;
}
-   return NET_XMIT_SUCCESS;
+   return ret;
 }
 
 static struct sk_buff *cake_dequeue_one(struct Qdisc *sch)
-- 
2.37.2

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next, v2] net: sched: remove redundant NULL check in change hook function

2022-08-29 Thread Toke Høiland-Jørgensen via Cake
Zhengchao Shao  writes:

> Currently, the change function can be called by two ways. The one way is
> that qdisc_change() will call it. Before calling change function,
> qdisc_change() ensures tca[TCA_OPTIONS] is not empty. The other way is
> that .init() will call it. The opt parameter is also checked before
> calling change function in .init(). Therefore, it's no need to check the
> input parameter opt in change function.
>
> Signed-off-by: Zhengchao Shao 

for sch_cake:

Acked-by: Toke Høiland-Jørgensen 
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] CAKE saves someone's job during the pandemic...

2022-08-25 Thread Toke Høiland-Jørgensen via Cake
I figured y'all would find this an uplifting story - I certainly did!

https://forum.openwrt.org/t/nftables-and-qos-in-2021/112013/530

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next] sch_cake: Take into account guideline DEF/DGSIC/36 from French Administration

2022-03-28 Thread Toke Høiland-Jørgensen
Christophe Leroy  writes:

> Le 28/03/2022 à 17:43, Toke Høiland-Jørgensen a écrit :
>> Christophe Leroy  writes:
>> 
>>> French Administration has written a guideline that defines additional
>>> DSCP values for use in its networks.
>> 
>> Huh, that's interesting!
>> 
>>> Add new CAKE diffserv tables to take those new values into account
>>> and add CONFIG_NET_SCH_CAKE_DGSIC to select those tables instead of
>>> the default ones.
>> 
>> ...however I don't think we should be including something this
>> special-purpose into the qdisc kernel code, and certainly we shouldn't
>> have a config option that changes the meaning of the existing diffserv
>> keywords!
>
> By the way it doesn't really change meaning. Just enhances it indeed. 
> Because they are not changing the meaning of existing DSCP codes, just 
> adding new ones.

Ah, OK, but then it's even easier, you can just add rules for the new
values using the userspace options and let cake select the rest using
the existing code.

>> Rather, this is something that is best specified from userspace; and in
>> fact Cake already has no less than two different ways to do this: the
>> 'fwmark' option, and setting the skb->priority field. Have you tried
>> using those?
>
> No I have not. In fact I'm just discovering the subject after some 
> people told me "we are contractually required to apply this guideline, 
> please make sure Linux kernel supports it".
>
> I'll give 'fwmark' and/or skb->priority a try. Is there any 
> Documentation on that somewhere ?

Both are documented in the tc-cake man page.

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next] sch_cake: Take into account guideline DEF/DGSIC/36 from French Administration

2022-03-28 Thread Toke Høiland-Jørgensen via Cake
Christophe Leroy  writes:

> French Administration has written a guideline that defines additional
> DSCP values for use in its networks.

Huh, that's interesting!

> Add new CAKE diffserv tables to take those new values into account
> and add CONFIG_NET_SCH_CAKE_DGSIC to select those tables instead of
> the default ones.

...however I don't think we should be including something this
special-purpose into the qdisc kernel code, and certainly we shouldn't
have a config option that changes the meaning of the existing diffserv
keywords!

Rather, this is something that is best specified from userspace; and in
fact Cake already has no less than two different ways to do this: the
'fwmark' option, and setting the skb->priority field. Have you tried
using those?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] libreqos adds pping integration

2022-03-14 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> https://github.com/rchac/LibreQoS

Cool! Would be an excellent use case for the BPF-based pping (to avoid
the overhead of libpcap)...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake ingress switch

2022-02-26 Thread Toke Høiland-Jørgensen via Cake
Jim Geo  writes:

> Hello,
>
> Was it the function of ingress argument in cake?
> Checking online man pages, I couldn't find an explanation.
> Does it make sense to use it on an ifb device? (sending ingress
> traffic through it)

Yes, that's exactly what it's meant for. The option changes cake's
accounting so that packets that are dropped are accounted as part of the
effective bandwidth of the link (since on ingress, even if they're
dropped they have already traversed the bottleneck link). This should
help keep the bottleneck under tighter control...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net] sch_cake: diffserv8 CS1 should be bulk

2022-01-25 Thread Toke Høiland-Jørgensen
Matt Johnston  writes:

> The CS1 priority (index 0x08) was changed from 0 to 1 when LE (index
> 0x01) was added. This looks unintentional, it doesn't match the
> docs and CS1 shouldn't be the same tin as AF1x

Hmm, Kevin, any comments?

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Cerowrt-devel] 10gige and 2.5gige

2021-12-17 Thread Toke Høiland-Jørgensen via Cake
Sebastian Moeller  writes:

> Mmmh, I guess our approach at traffic shaping does not scale well at those 
> speeds. Maybe this could be fixed with larger batching?
>
> I think it might be worth trying to switch to simple.qos/fq_codel and
> set a somewhat larger burst/quantum duration in defaults.sh, then
> disable BQL on the NIC and configure a beefy txqueuelen on the
> interface. This might help making SQM limp along to higher rates. If
> that would actually work, we could try to see whether we can make cake
> learn coarser batching ("bursts" and/or quantum) at high rates (but I
> did not check what cake does internally, it might already do this,
> @jonathan?)

I fear the problem is rather the lack of multithreading. I have a fairly
beefy 8-core ARM box for my main router these days, and even that can't
forward at a gigabit on a single core. There was a bug in the HW
configuration so all traffic was sent to a single core, which resulted
in 50+ms of bloat and traffic capping out way short of a gigabit. Now
that it's fixed and traffic is mixed over all eight cores I have smooth
sailing and no bloat. Thankfully I don't need to shape, so I'm just
running straight mq+fq_codel on the physical interface

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] tossing acks into the background queue

2021-11-23 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> On Tue, Nov 23, 2021 at 2:39 AM Toke Høiland-Jørgensen  wrote:
>>
>> Sebastian Moeller  writes:
>>
>> > Hi Dave,
>> >
>> > On 23 November 2021 08:32:06 CET, Dave Taht  wrote:
>> >>The context of my question is basically this:
>> >>
>> >>Is cake baked? Is it done?
>> >
>> > How about per MAC address fairness (useful for ISPs and to treat
>> > IPv4/6 equally)?
>> >
>> > How about configurable number of queues (again helpful for ISPs)?
>>
>> FWIW I don't think CAKE is the right thing for ISPs, except in a
>> deployment where there's a single CAKE instance per customer. For
>> anything else (i.e., a single shaper that handles multiple customers),
>> you really need hierarchical policy enforcement like in a traditional
>> HTB configuration. And retrofitting this on top of CAKE is going to
>> conflict with the existing functionality, so it probably has to be a
>> separate qdisc anyway.
>
> What progress has been made on breaking the HTB locks in the last few
> years?

None. Don't see that happening any time soon; just the simple pfifo_fast
qdisc is uncovering all kinds of bugs when running in lockless mode.

Jesper basically solved the contention issue by partitioning the traffic
and running multiple instances:
https://github.com/xdp-project/xdp-cpumap-tc

Doesn't work for bandwidth sharing across instances, though, so it
solves the ISP "separate rates per customer" case, but not the CAKE
"shape a single link" case.

> Given the enormous number of hw tx/rx queues we see today (64+ on
> 10gbit), trying to charge off
> bandwidth per queue in a cake-derived shaper and protecting the merge
> with rcu seemed plausible...

Yeah, that was what I was going to try, but it turned out to be
decidedly non-trivial to make sch_cake itself mq-aware, so I gave up. My
hope is that this will be possible once we get sch_bpf, so we can just
have separate instances but they can share a single atomic var for the
bandwidth sync...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] tossing acks into the background queue

2021-11-23 Thread Toke Høiland-Jørgensen via Cake
>> And either way, CAKE is still going to be limited by being single core
>> only, and fixing that requires some serious surgery that I seem to
>> recall looking into and giving up at some point :(
>
>   That is sad, and pretty much rules out that I could make some
>   progress in that direction. The next level is shaping at ~1Gbps,
>   even though faster access links become available, like 8.5/10
>   Gbps (XGS-PON is nominally 10G, but after FEC only ~8.5 Gbps
>   actually are usable) or for a lucky few even 25 Gbps ...

Yup. FWIW my hope is that we'll be able to do something about this using
XDP, eventually... :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] tossing acks into the background queue

2021-11-23 Thread Toke Høiland-Jørgensen via Cake
Sebastian Moeller  writes:

> Hi Dave,
>
> On 23 November 2021 08:32:06 CET, Dave Taht  wrote:
>>The context of my question is basically this:
>>
>>Is cake baked? Is it done?
>
> How about per MAC address fairness (useful for ISPs and to treat
> IPv4/6 equally)?
>
> How about configurable number of queues (again helpful for ISPs)?

FWIW I don't think CAKE is the right thing for ISPs, except in a
deployment where there's a single CAKE instance per customer. For
anything else (i.e., a single shaper that handles multiple customers),
you really need hierarchical policy enforcement like in a traditional
HTB configuration. And retrofitting this on top of CAKE is going to
conflict with the existing functionality, so it probably has to be a
separate qdisc anyway.

> IMHO cake works pretty well, with the biggest issue being its CPU
> demands. As far as I understand however, that is caused by the shaper
> component and there low latency and throughput are in direct
> competition, if we want to lower the CPU latency demands we need to
> allow for bigger buffers that keep the link busy even if cake itself
> is not scheduled as precisely as we would desire or as e.g. BQL
> requires.

Yes, as link speed increases, batching needs to increase to keep up.
This does not *have* to impact latency, as the faster link should keep
the granularity constant in the time domain. So experimenting with doing
this dynamically in CAKE might be worthwhile, but probably not trivial.

And either way, CAKE is still going to be limited by being single core
only, and fixing that requires some serious surgery that I seem to
recall looking into and giving up at some point :(

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Ecn-sane] l4s kernel submission

2021-10-19 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> my more specific question was gro. On gro assembly is the dscp/ecn
> header examined?

Yes, and only packets with the same value get aggregated:
https://elixir.bootlin.com/linux/latest/source/net/ipv6/ip6_offload.c#L263

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Ecn-sane] l4s kernel submission

2021-10-18 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> An open question for me:
>
> What happens when a GSO packet is marked? Do all the packets get the
> marking, or just the first?

When segmenting a GSO packet, the IP header is copied to all segments.
The code has grown quite complex over the years, but it's easier to see
in the original submission from 2006:

https://lore.kernel.org/all/20060622081400.gc22...@gondor.apana.org.au/

Which means all packets in a GSO segment will end up marked on the
wire...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Ecn-sane] l4s kernel submission

2021-10-14 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> On Thu, Oct 14, 2021 at 2:44 PM Toke Høiland-Jørgensen  wrote:
>>
>> Dave Taht  writes:
>>
>> > weirdly enough, my gmail account has not received anything from netdev
>> > since oct 11.
>>
>> You're not alone in that:
>> https://lore.kernel.org/netdev/20211014112718.6aed7...@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/T/#t
>
> ok. One of these years I'll go back to running my own email server
> full time.

You can also subscribe to Linux lists by importing the mails from Lore,
as one of the replies in the thread above pointed out. Been meaning to
switch to that myself, but haven't gotten around to it yet...

>> > yes, i think fq_codel will be better, and even the proposed
>> > too-shallow threshold will make for less of a dent on the internet.
>> >
>> > still... I do wish I'd seen this earlier.
>>
>> Earlier? You forwarded the patch hours after it was posted...
>
> I have a daily search for fq_codel, bufferbloat, etc. I have noticed
> lately that some mailing list traffic from us is being indexed again.
> I wish I knew why our lists were not indexed by google.
>
> Anyway, lacking being on that thread, it's currently impossible to
> reply.

The Lore page contains instructions for various ways of replying even
without having the original email message in your mailbox. It's at the
bottom... 

> That said, the l4s fq_codel patch is intrinsically fair, which is
> vastly superior to the dualpi approach.

Yup, I agree it's better, but I don't like baking in the ECT(1)
semantics to UAPI. I suggested a filter-based approach which I'm
currently discussing with Eric on that thread as you might have noticed :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Ecn-sane] l4s kernel submission

2021-10-14 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> weirdly enough, my gmail account has not received anything from netdev
> since oct 11.

You're not alone in that:
https://lore.kernel.org/netdev/20211014112718.6aed7...@kicinski-fedora-pc1c0hjn.dhcp.thefacebook.com/T/#t

> yes, i think fq_codel will be better, and even the proposed
> too-shallow threshold will make for less of a dent on the internet.
>
> still... I do wish I'd seen this earlier.

Earlier? You forwarded the patch hours after it was posted...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] slightly negative deficits

2021-10-08 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> I have sometimes thought that allowing the deficit to go negative
> (either one quantum or one mtu) might
> result in slightly better service times, or tcp, or aqm behavior,
> given tcp's requirement for two back to back packets
> to release an ack.

Why? Isn't that just equivalent to raising the quantum?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net] sch_cake: fix srchost/dsthost hashing mode

2021-08-16 Thread Toke Høiland-Jørgensen
When adding support for using the skb->hash value as the flow hash in CAKE,
I accidentally introduced a logic error that broke the host-only isolation
modes of CAKE (srchost and dsthost keywords). Specifically, the flow_hash
variable should stay initialised to 0 in cake_hash() in pure host-based
hashing mode. Add a check for this before using the skb->hash value as
flow_hash.

Fixes: b0c19ed6088a ("sch_cake: Take advantage of skb->hash where appropriate")
Reported-by: Pete Heist 
Tested-by: Pete Heist 
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 951542843cab..28af8b1e1bb1 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -720,7 +720,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct 
sk_buff *skb,
 skip_hash:
if (flow_override)
flow_hash = flow_override - 1;
-   else if (use_skbhash)
+   else if (use_skbhash && (flow_mode & CAKE_FLOW_FLOWS))
flow_hash = skb->hash;
if (host_override) {
dsthost_hash = host_override - 1;
-- 
2.32.0

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake srchost/dsthost stopped working?

2021-08-05 Thread Toke Høiland-Jørgensen via Cake
Pete Heist  writes:

> As for triple-isolate, I added a test Jon suggested:
>
> "I would recommend a W configuration of flows
>
> one of the three hosts sends to or from both of the two hosts, while
> the other two send to or from only one each
>
> this should result in the two hosts getting equal throughput each, and
> equal throughput in each of their two flows"
>
> This is the last test in the output, and I think the behavior looks
> correct for both the unpatched and patched versions:
>
> https://www.heistp.net/downloads/cake-hostfair/cake-hostfair-unpatched.txt
>
> https://www.heistp.net/downloads/cake-hostfair/cake-hostfair-patched.txt
>
> Thanks guys, and that, I believe, is finally it for now... :)

Awesome, thank you both! I'll send a proper patch upstream :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake srchost/dsthost stopped working?

2021-08-04 Thread Toke Høiland-Jørgensen via Cake
Jonathan Morton  writes:

> On Wed, 4 Aug 2021 at 14:14, Toke Høiland-Jørgensen via Cake
>  wrote:
>>
>> Pete Heist  writes:
>>
>> > One more tip, reverting this commit seems to fix it:
>> >
>> > https://github.com/torvalds/linux/commit/b0c19ed6088ab41dd2a727b60594b7297c15d6ce
>>
>> Ah, I think I see what the problem is; could you please try the patch
>> below?
>>
>> -Toke
>>
>> diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
>> index 951542843cab..a83c4d4326da 100644
>> --- a/net/sched/sch_cake.c
>> +++ b/net/sched/sch_cake.c
>> @@ -720,7 +720,7 @@ static u32 cake_hash(struct cake_tin_data *q, const 
>> struct sk_buff *skb,
>>  skip_hash:
>> if (flow_override)
>> flow_hash = flow_override - 1;
>> -   else if (use_skbhash)
>> +   else if (use_skbhash && flow_mode & CAKE_FLOW_FLOWS)
>> flow_hash = skb->hash;
>> if (host_override) {
>> dsthost_hash = host_override - 1;
>
> Good catch - I was going to have to wade in and remind myself how this
> lump of code worked.  But shouldn't the masking operation be in
> brackets, to make the precedence explicit?

Well, seeing as I introduced the bug, I think it's only fair that I fix
it as well ;)

I don't mind adding parenthesis; can do so when submitting a patch,
after Pete confirms that this fixes the issue...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] cake srchost/dsthost stopped working?

2021-08-04 Thread Toke Høiland-Jørgensen via Cake
Pete Heist  writes:

> One more tip, reverting this commit seems to fix it:
>
> https://github.com/torvalds/linux/commit/b0c19ed6088ab41dd2a727b60594b7297c15d6ce

Ah, I think I see what the problem is; could you please try the patch
below?

-Toke

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 951542843cab..a83c4d4326da 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -720,7 +720,7 @@ static u32 cake_hash(struct cake_tin_data *q, const struct 
sk_buff *skb,
 skip_hash:
if (flow_override)
flow_hash = flow_override - 1;
-   else if (use_skbhash)
+   else if (use_skbhash && flow_mode & CAKE_FLOW_FLOWS)
flow_hash = skb->hash;
if (host_override) {
dsthost_hash = host_override - 1;
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Cerowrt-devel] Fwd: [RFC net-next] net: extend netdev features

2021-07-11 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> One thing somewhat related to this was finally expanding the space
> available for the tc and iptables functionality for
> things like hashing and actions etc from 16 bits to 32. That is
> something of a fork lift upgrade, but... 64k queues is not
> enough in some cases, nor is 64k possible users in libreqos. thoughts

skb->hash is 32 bits already. There is hardware out there that does
64-bit hashing, promising to hash packets in a way that basically
guarantees the hash to be a globally unique flow identifier. Supporting
*that* in the kernel would need some work, though... :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Bloat] Little's Law mea culpa, but not invalidating my main point

2021-07-09 Thread Toke Høiland-Jørgensen via Cake
"Holland, Jake via Bloat"  writes:

> Hi David,
>
> That’s an interesting point, and I think you’re right that packet
> arrival is poorly modeled as a Poisson process, because in practice
> packet transmissions are very rarely unrelated to other packet
> transmissions.
>
> But now you’ve got me wondering what the right approach is. Do you
> have any advice for how to improve this kind of modeling?

I actually tried my hand at finding something better for my master's
thesis and came across something called a Markov-Modulated Poisson
Process (MMPP/D/1 queue)[0]. It looked promising, but unfortunately I
failed to make it produce any useful predictions. Most likely this was
as much a result of my own failings as a queueing theorist as it was the
fault of the model (I was in way over my head by the time I got to that
model); so I figured I'd mention it here in case anyone more qualified
would have any opinion on it.

I did manage to get the Linux kernel to produce queueing behaviour that
resembled that of a standard M/M/1 queue (if you squint a bit); all you
have to do is to use a traffic generator that emits packets with the
distribution the model assumes... :)

The full thesis is still available[1] for the perusal of morbidly curious.

-Toke

[0] https://www.sciencedirect.com/science/article/abs/pii/016653169390035S
[1] https://rucforsk.ruc.dk/ws/files/57613884/thesis-final.pdf
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Cake memory usage after 1TB

2021-06-24 Thread Toke Høiland-Jørgensen
Pete Heist  writes:

> I'm using Cake on an EdgeRouter-X at a site that does around 50-
> 100GB/day, and we're quite happy with it so far. Egress and ingress
> stats are below after ~1TB download and ~100GB upload.
>
> I use "memlimit 8M", and noticed that it's reporting 8389696 bytes used
> on egress, which is at the maximum (slightly over actually). Around 6M
> is used on ingress. Should I then raise these limits, or is it normal
> for it to settle in around the maximum over time?

That stat is the maximum memory ever used, not the current usage (that
is reported by "backlog 0b 0p"). So if your queue ever overflows it'll
be at its maximum.

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Cake memory usage after 1TB

2021-06-24 Thread Toke Høiland-Jørgensen via Cake
Sebastian Moeller  writes:

>> On Jun 24, 2021, at 15:59, Toke Høiland-Jørgensen  wrote:
>> 
>> Pete Heist  writes:
>> 
>>> I'm using Cake on an EdgeRouter-X at a site that does around 50-
>>> 100GB/day, and we're quite happy with it so far. Egress and ingress
>>> stats are below after ~1TB download and ~100GB upload.
>>> 
>>> I use "memlimit 8M", and noticed that it's reporting 8389696 bytes used
>>> on egress, which is at the maximum (slightly over actually). Around 6M
>>> is used on ingress. Should I then raise these limits, or is it normal
>>> for it to settle in around the maximum over time?
>> 
>> That stat is the maximum memory ever used, not the current usage (that
>> is reported by "backlog 0b 0p"). So if your queue ever overflows it'll
>> be at its maximum.
>
>   Is there an easy way to reset that, short of tearing down the
>   qdisc set-up and building it up again?

Nope, 'fraid not :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] customizing Cake's isolation with ipsets, tc-flow and eBPF

2021-06-06 Thread Toke Høiland-Jørgensen via Cake
Pete Heist  writes:

> On Sun, 2021-06-06 at 21:59 +0200, Toke Høiland-Jørgensen wrote:
>> Pete Heist  writes:
>> 
>> > I've always wanted a way to customize Cake's host and flow isolation
>> > in
>> > a way that would be usable e.g. for small ISPs, and this is what I
>> > came
>> > up with:
>> > 
>> > https://github.com/heistp/cake-custom-isolation
>> > 
>> > ipsets are used to set the skb priority or mark, then tc-flow or a
>> > simple eBPF classifier is used in a child filter of cake to get the
>> > major and minor class IDs set, which override the host and flow
>> > hashes.
>> 
>> Very cool! Awesome to see the customisation options being used for
>> something neat like this! :)
>> 
>> > To show it in action, the cakeiso.sh script sets up a netns
>> > environment
>> > and runs competition between two "subscribers" and three flows, two
>> > TCP
>> > flows and one unresponsive UDP flow. Several configurations are run
>> > to
>> > show what is and isn't possible.
>> > 
>> > If anyone knows of a simpler way than eBPF to get both the major and
>> > minor class ID set from ipsets, I'd like to hear it, but the included
>> > classifiers are at least very simple one-liners...
>> 
>> Well, you could go the other way? Instead of ipset, just do the
>> classification in eBPF and use a BPF map to store the IP addresses.
>> There's even an LPM map type, so you can use arbitrary prefix lengths
>> for each class (or not, and just use a hashmap)...
>
> True that, I started something like that at some point:
>
> https://github.com/heistp/tc-users/
>
> but I think I got a little overzealous with it. I'm not sure if/when
> I'll get back to that, but the ipset solution seems to be "good enough"
> for what I (and my ISP) needs. I'm glad you slipped the tc filter
> overrides in before Cake went out the door. :)

Yes! I would love to take credit for the idea, but this came from
the netdev review. I agree though, great that it ended up in there!

> This doesn't do away with the possible need for a full-blown ISP qdisc
> one day, with configurable subscriber tiers, handling of higher loads,
> etc, but at least it's something for the little guys.

Yeah; I have something in the pipeline that will hopefully end up being
useable for that, but may take a little while to get there :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] customizing Cake's isolation with ipsets, tc-flow and eBPF

2021-06-06 Thread Toke Høiland-Jørgensen via Cake
Pete Heist  writes:

> I've always wanted a way to customize Cake's host and flow isolation in
> a way that would be usable e.g. for small ISPs, and this is what I came
> up with:
>
> https://github.com/heistp/cake-custom-isolation
>
> ipsets are used to set the skb priority or mark, then tc-flow or a
> simple eBPF classifier is used in a child filter of cake to get the
> major and minor class IDs set, which override the host and flow hashes.

Very cool! Awesome to see the customisation options being used for
something neat like this! :)

> To show it in action, the cakeiso.sh script sets up a netns environment
> and runs competition between two "subscribers" and three flows, two TCP
> flows and one unresponsive UDP flow. Several configurations are run to
> show what is and isn't possible.
>
> If anyone knows of a simpler way than eBPF to get both the major and
> minor class ID set from ipsets, I'd like to hear it, but the included
> classifiers are at least very simple one-liners...

Well, you could go the other way? Instead of ipset, just do the
classification in eBPF and use a BPF map to store the IP addresses.
There's even an LPM map type, so you can use arbitrary prefix lengths
for each class (or not, and just use a hashmap)...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Flent-users] centos 8 and cake and flent

2021-05-06 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> Currently centos (and I assume redhat) is at 4.18. Cake went into 4.19
> so I assume the next major
> redhat/centos releases will have it.

Note that the RHEL kernel version number is basically a complete
fabrication; it's the version that the kernel was forked from, something
like 30% of all commits are backported for each new upstream release,
without changing the RHEL-kernel version number.

Which means that all the Cake out-of-tree kernel version compatibility
stuff is not going to work, because that works based on the kernel
version number...

> Is there a yum/rpm expert in the house? flent does not appear to be
> packaged up for this (?),

It's in Fedora: https://src.fedoraproject.org/rpms/flent - should be
fairly straight-forward to add it to EPEL as well, but thus far no one
has requested it... :)

> neither is netperf or irtt. Is there a repo I could use?

netperf is blocked on licensing: 
https://bugzilla.redhat.com/show_bug.cgi?id=1729939

Since the re-licensing there may be a chance, but not sure what the
procedure is when there's not a release with the new license.

As for irtt, that should be pretty straight-forward to package.

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Cerowrt-devel] merlin gains cake support

2021-03-03 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> https://github.com/dtaht/tc-adv/issues/27
>
> I don't have any of the hardware supported by this firmware lying
> around handy... can anyone test?

I believe the original reported already tested and confirmed the fix,
actually. I have a patch pending for upstream iproute2 which should show
up on netdev as soon as vger unclogs...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Bloat] Fwd: [Galene] Dave on bufferbloat and jitter at 8pm CET Tuesday 23

2021-02-26 Thread Toke Høiland-Jørgensen via Cake
Taraldsen Erik  writes:

> This is getting LTE/5G spesific.  Not sure if it belongs on the list.
> Let us know if we are generating noise.

I for one think it's fascinating - carry on! :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Bloat] Fwd: [Galene] Dave on bufferbloat and jitter at 8pm CET Tuesday 23

2021-02-26 Thread Toke Høiland-Jørgensen via Cake
Nils Andreas Svee  writes:

> On 2/25/21 11:30 AM, Toke Høiland-Jørgensen wrote:
>
>> Ah, wireguard doesn't have XDP support, so that's likely not going to
>> work; and if you run it on the physical interface, even if you didn't
>> get driver errors, the tool would just see the encrypted packets which
>> is not terribly helpful (it parses TCP timestamps to match
>> incoming/outgoing packets and compute the RTT).
>
> I figured that might be the case. Yes I would've disabled the VPN if I 
> didn't get driver errors.
> I changed the network interface to use an emulated Intel E1000 tonight, 
> and if I bypass the VPN it works as it should.

Right, awesome!

>> I guess we should be more flexible about which hooks we support, so it
>> can also be used on devices with no XDP support. Adding in Simon, who is
>> writing the code; I think he is focused on getting a couple of other
>> features done first, but this could go on the TODO list :)
>
> It's not like I'm in a hurry, and I'd probably need some time to figure 
> out how to tweak the CAKE parameters correctly with this anyway ;)
>
> Speaking of, isn't one of the challenges with solutions like these that 
> it's hard to tell when conditions have improved and allow for more 
> throughput? At least that's what I remember being the issue when I 
> tested CAKE's autorate-ingress back in the day.

Yeah, there would have to be some kind of probing to discover when the
bandwidth goes up (maybe something like what BBR does?). Working out the
details of this is still in the future, this is all just loose plans
that I'll try to get back to once we have the measurement tool working
reasonably well. Input and experiments welcome, of course!

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Bloat] Fwd: [Galene] Dave on bufferbloat and jitter at 8pm CET Tuesday 23

2021-02-25 Thread Toke Høiland-Jørgensen via Cake
"Nils Andreas Svee"  writes:

> I ran it on my router though, which has a decent amount of TCP flows running 
> at any given time.
> It's all going over a wg tunnel though, that might be wonky for all I
> know.

Ah, wireguard doesn't have XDP support, so that's likely not going to
work; and if you run it on the physical interface, even if you didn't
get driver errors, the tool would just see the encrypted packets which
is not terribly helpful (it parses TCP timestamps to match
incoming/outgoing packets and compute the RTT).

I guess we should be more flexible about which hooks we support, so it
can also be used on devices with no XDP support. Adding in Simon, who is
writing the code; I think he is focused on getting a couple of other
features done first, but this could go on the TODO list :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Bloat] Fwd: [Galene] Dave on bufferbloat and jitter at 8pm CET Tuesday 23

2021-02-24 Thread Toke Høiland-Jørgensen via Cake


On 24 February 2021 23:49:48 CET, Nils Andreas Svee  wrote:
>I'll look into pping. Admittedly I'm quite ignorant about BPF, so I'll
>likely blunder about for a bit, but hey, got it to compile - *and* run,
>but I didn't get any output other than the messages from clean_map.
>Dunno if I did something wrong, I'll look at it again tomorrow.


It monitors TCP traffic, so it'll only show something if you have TCP flows 
going while you run it...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Bloat] Fwd: [Galene] Dave on bufferbloat and jitter at 8pm CET Tuesday 23

2021-02-24 Thread Toke Høiland-Jørgensen via Cake
Taraldsen Erik  writes:

> Disclamer: I'm working on the Fixed Wireless products for Telenor
> (Zyxel NR7101 outdoor wall mounted unit). Not the Mobile Broadband
> products. We are working with Zyxel and Qualcom to try and implement
> an upstream queue which adapts to available radio resources. To much
> NDA so can't really disclose anything useful. Lets just say we are
> aware of the issues and are actively working to try and improve the
> situation - but don't hold your breath for a sollution.
>
> What sort of HW are you running your LTE on?
>
> Do you have a subscription with rate limitations? The PGW (router
> which enforces the limit) is a lot more latency friendly than if you
> are radio limited. So it may be beneficial to have a "slow"
> subscription rather than "free speed" then it comes to latency. Slow
> meaning lower subscrption rate than radio rate.

Ah, this is lovely! "How do I get my internet to be faster?" "Just buy a
slower connection!" :D

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Bloat] Fwd: [Galene] Dave on bufferbloat and jitter at 8pm CET Tuesday 23

2021-02-24 Thread Toke Høiland-Jørgensen via Cake
Dave Taht  writes:

> wow, that is (predictably) miserable, even with cake. The only
> solution that is going to
> work is to somehow actively monitor your link quality and adjust cake
> to suit. Or we can start trying to use kathie's passive ping tools.

We have a PhD student working on a BPF-based implementation of pping:
https://github.com/xdp-project/bpf-examples/tree/master/pping

My hope is that this can end up being an always-on thing that runs on
the router and can be used to adjust the CAKE parameters as latency
spikes.

There are still a few rough edges on the implementation (most notably
the data output can become quite high), but it should otherwise be
usable, so feel free to take it for a spin. Needs a fairly recent LLVM
(10+ IIRC) to compile the BPF parts.

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Enforcing video quality question

2021-02-20 Thread Toke Høiland-Jørgensen via Cake
John Sager  writes:

> PS does the cake list allow attachments? It will be a small zip file.

It'll likely end up in the moderation queue (don't recall the size
limit), but I can manually approve it from there...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Enforcing video quality question

2021-02-20 Thread Toke Høiland-Jørgensen via Cake
John Sager  writes:

> You will need to specify the hosts explicitly, unless you can live with them 
> all sharing one bandwidth class. In that case if you have more than one 
> using bandwidth they would share the bandwidth in that class equally. I 
> assume from your original post that you want each host to be limited in 
> bandwidth to a specific value, but to do that you need a class for each host 
> in the ingress HTB.

Just do enough classes that you can cover the whole IP space? At least
for IPv4 that's trivial; for IPv6 you'll probably need to hash and hope
that there are not too many collisions...

> What you probably need is a scheduler that has a limit per flow up to
> an overall ceiling beyond which it shares equally. I'm not aware that
> any of the schedulers do anything like that.

If you use FQ-CoDel as the leaf qdisc in HTB you'll get flow scheduling
to each host. There won't be a per-flow *limit*, but you'll get nice
scheduling of all flows going towards each host.

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Enforcing video quality question

2021-02-18 Thread Toke Høiland-Jørgensen via Cake
Peter Lepeska  writes:

> A user on the OpenWrt forum suggested hashlimit rules supported by
> iptables. How does that idea sound to you?

That will result in a cliff-edge policer (i.e., as soon as a device goes
over its limits it will see every packet get dropped). This doesn't
interact too well with the burstiness of TCP, so you'll likely get
erratic behaviour of the traffic if you do that. Doing the same thing
with HTB means the router will queue+shape each class (and with FQ-CoDel
on the leaves, you'll get a nice AQM behaviour as well), so that will be
smoother and less prone to bloat :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Enforcing video quality question

2021-02-18 Thread Toke Høiland-Jørgensen via Cake
Peter Lepeska  writes:

> Not sure this is the right forum but I am using Cake on my OpenWRT router
> so I thought I would cross-post this question (
> https://forum.openwrt.org/t/use-sqm-to-configure-video-quality/88984) here.

CAKE doesn't have any facilities to do per-device shaping like you're
asking, no. You could use fwmark to always put traffic into a lower
priority bin, but that still only limits in the aggregate, it's not a
per-device thing.

What you'd need is something like an HTB policy that puts each internal
IP into its own bin and caps the bandwidth of each. You could write a
.qos script and integrate it into sqm-scripts, but I don't think there's
anything already in there that would do what you want either...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] NLA_F_NESTED is missing

2020-11-04 Thread Toke Høiland-Jørgensen via Cake
Dean Scarff  writes:

>  On Tue, 03 Nov 2020 12:00:55 +0100, Toke Høiland-Jørgensen wrote:
>> Dean Scarff  writes:
>>
>>>  On Mon, 02 Nov 2020 13:37:00 +0100, Toke wrote:
 Dean Scarff  writes:

>  Hi,
>
>  I've been happily running the out-of-tree sch_cake on my 
> Raspberry
> Pi
>  since 2015.  However, I recently upgraded my kernel (to 5.4.72 
> from
>  Raspbian's raspberrypi-kernel 1.20201022-1), which comes with the
>  sch_cake in mainline.  Now, when running:
>
>sudo /sbin/tc qdisc add dev ppp0 root cake
>
>  I get the error:
>
>Error: NLA_F_NESTED is missing.
>
>  I get this error with the sch_cake in mainline, and also with
> sch_cake
>  built out-of-tree.  I also get the error with both Debian's
> iproute2
>  5.9.0-1 (built myself via debian/rules) and "tc" from dtaht's
> tc-adv
>  repo.
>
>  Any ideas on what this error means and how to fix it?

 I just tried building a 5.4.72 kernel and couldn't reproduce this, 
 so
 it
 seems it's a fault with the raspberry pi kernel; I guess opening a
 bug
 against that would be the way to go?

 As for what's actually causing this, I couldn't find anything 
 obvious
 that touches this code in the qdisc layer; but I suppose it has
 something to do with the core qdisc netlink parsing code?

 -Toke
>>>
>>>  Thanks for the data point.
>>>
>>>  For the record, the relevant kernel source is:
>>>  
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/net/netlink.h?h=v5.4.72#n1143
>>>  and the Pi branch:
>>>  
>>> https://github.com/raspberrypi/linux/blob/raspberrypi-kernel_1.20201022-1/include/net/netlink.h#L1143
>>>
>>>  It seems very unlikely that the Pi folks are patching the netlink
>>>  stuff, so I don't think I'll get much traction there unless I can 
>>> call
>>>  out something specifically wrong with their patchset.
>>
>> Well, something odd is certainly going on. The error message you're
>> quoting comes form a part of the netlink parsing code (in the kernel)
>> that shouldn't even be hit by the qdisc addition: NLA_F_NESTED 
>> parsing
>> is only enabled in 'strict' validation mode, which is not used for
>> qdiscs.
>>
>> So IDK, maybe a compiler issue or a bit that gets set wrong 
>> somewhere?
>> Bisecting the kernel may be the only option here, I don't think 
>> you're
>> going to find anything in userspace...
>
>  Yeah, I came to the same conclusion.  I verified the userspace was sane 
>  via gdb (see earlier post), and I also read through the sch_api.c and 
>  nlattr.c kernel code and it sure looks impossible for the strict 
>  validation to be getting hit.
>
>  Safe to say this was random corruption: I downgraded the kernel, things 
>  worked as expected, then I upgraded back to the 5.4.72 and it worked 
>  too!  Interestingly, the problem persisted across reboots (so it wasn't 
>  just RAM corruption), and all the kernel files also matched their "dpkg" 
>  MD5s (so it wasn't like the binaries were obviously corrupt on disk).  
>  I've replaced the Pi's microSD card just to be safe, though... kernel 
>  corruption is scary.

Ugh, Heisenbugs are the worst! Great to hear you managed to resolve it,
though :)

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] NLA_F_NESTED is missing

2020-11-03 Thread Toke Høiland-Jørgensen via Cake
Dean Scarff  writes:

>  On Mon, 02 Nov 2020 13:37:00 +0100, Toke wrote:
>> Dean Scarff  writes:
>>
>>>  Hi,
>>>
>>>  I've been happily running the out-of-tree sch_cake on my Raspberry 
>>> Pi
>>>  since 2015.  However, I recently upgraded my kernel (to 5.4.72 from
>>>  Raspbian's raspberrypi-kernel 1.20201022-1), which comes with the
>>>  sch_cake in mainline.  Now, when running:
>>>
>>>sudo /sbin/tc qdisc add dev ppp0 root cake
>>>
>>>  I get the error:
>>>
>>>Error: NLA_F_NESTED is missing.
>>>
>>>  I get this error with the sch_cake in mainline, and also with 
>>> sch_cake
>>>  built out-of-tree.  I also get the error with both Debian's 
>>> iproute2
>>>  5.9.0-1 (built myself via debian/rules) and "tc" from dtaht's 
>>> tc-adv
>>>  repo.
>>>
>>>  Any ideas on what this error means and how to fix it?
>>
>> I just tried building a 5.4.72 kernel and couldn't reproduce this, so 
>> it
>> seems it's a fault with the raspberry pi kernel; I guess opening a 
>> bug
>> against that would be the way to go?
>>
>> As for what's actually causing this, I couldn't find anything obvious
>> that touches this code in the qdisc layer; but I suppose it has
>> something to do with the core qdisc netlink parsing code?
>>
>> -Toke
>
>  Thanks for the data point.
>
>  For the record, the relevant kernel source is:
>  
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/include/net/netlink.h?h=v5.4.72#n1143
>  and the Pi branch:
>  
> https://github.com/raspberrypi/linux/blob/raspberrypi-kernel_1.20201022-1/include/net/netlink.h#L1143
>
>  It seems very unlikely that the Pi folks are patching the netlink 
>  stuff, so I don't think I'll get much traction there unless I can call 
>  out something specifically wrong with their patchset.

Well, something odd is certainly going on. The error message you're
quoting comes form a part of the netlink parsing code (in the kernel)
that shouldn't even be hit by the qdisc addition: NLA_F_NESTED parsing
is only enabled in 'strict' validation mode, which is not used for
qdiscs.

So IDK, maybe a compiler issue or a bit that gets set wrong somewhere?
Bisecting the kernel may be the only option here, I don't think you're
going to find anything in userspace...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] NLA_F_NESTED is missing

2020-11-02 Thread Toke Høiland-Jørgensen via Cake
Dean Scarff  writes:

>  Hi,
>
>  I've been happily running the out-of-tree sch_cake on my Raspberry Pi 
>  since 2015.  However, I recently upgraded my kernel (to 5.4.72 from 
>  Raspbian's raspberrypi-kernel 1.20201022-1), which comes with the 
>  sch_cake in mainline.  Now, when running:
>
>sudo /sbin/tc qdisc add dev ppp0 root cake
>
>  I get the error:
>
>Error: NLA_F_NESTED is missing.
>
>  I get this error with the sch_cake in mainline, and also with sch_cake 
>  built out-of-tree.  I also get the error with both Debian's iproute2 
>  5.9.0-1 (built myself via debian/rules) and "tc" from dtaht's tc-adv 
>  repo.
>
>  Any ideas on what this error means and how to fix it?

I just tried building a 5.4.72 kernel and couldn't reproduce this, so it
seems it's a fault with the raspberry pi kernel; I guess opening a bug
against that would be the way to go?

As for what's actually causing this, I couldn't find anything obvious
that touches this code in the qdisc layer; but I suppose it has
something to do with the core qdisc netlink parsing code?

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Cake, low speed ADSL & fwmark

2020-07-28 Thread Toke Høiland-Jørgensen via Cake
Jim Geo  writes:

>>
>> > On 28 Jul, 2020, at 12:41 am, Jim Geo  wrote:
>> >
>> > Thank you for all the efforts you have done to make internet usable.
>> >
>> > I currently use htb & fq_codel in my low speed ADSL 6Mbps downlink/1 Mbps 
>> > uplink. I use fwmark to control both uplink and downlink with good results 
>> > in terms of bandwidth allocation. Streaming video is chopping bulk traffic 
>> > successfully.
>> >
>> > Is setting up cake worth the effort at such low speeds? Would it reduce 
>> > latency?
>>
>> Cake has a better-quality shaper than HTB does, and a more sophisticated 
>> flow-isolation scheme than fq_codel does.  These tend to matter more at low 
>> speeds, not less.  It's also generally easier to set up than a compound 
>> qdisc scheme.
>>
>> > Regarding fwmark can you please elaborate more on the calculations 
>> > performed? Man page is not that helpful.
>> >
>> > My understanding is this:
>> >
>> > I use 1,2,3,4 as marks of traffic.
>> > If I set the mask to 0xff[..] the marks will remain unchanged. Then 
>> > right shifting will occur for the unset bits, so they will land on tins
>> > 1,1,3,1
>> >
>> > Can you please correct me? If logical and performed between mask and mark 
>> > value?
>>
>> Since there's only a few "tins" at a time used in Cake, and the fwmark is a 
>> direct mapping into those tins, a narrow mask is probably safer to use than 
>> a wide one.  The reason for the mask is so you can encode several values 
>> into different parts of the mark value.  The shift is simply to move the 
>> field covered by the mask to the low end of the word, so that it is useful 
>> to Cake.
>>
>> For your use case, a mask of 0xF will be completely sufficient.  It would 
>> allow you to specify mark values of 1-15, to map directly in the first 15 
>> tins used by Cake, or a mark value of 0 to fall back to Cake's default 
>> Diffserv handling.  None of Cake's tin setups use more than 8 tins, and most 
>> use fewer.
>>
>>  - Jonathan Morton
>>
>
> Thanks for the info! I've noticed that by using 0xF, marks 1-4 become
> tins 0-3. Tin 0 is special? I assumed it's for bulk traffic. I use
> diffserv8.

Nah, it's just that the fwmark uses 1-indexed tin numbers (because a
mark of 0 is the same as 'unset').

The code in cake_select_tin() that handles the mark is literally just this:

else if (mark && mark <= q->tin_cnt)
tin = q->tin_order[mark - 1];

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] quantum configuration

2020-07-24 Thread Toke Høiland-Jørgensen via Cake
Luca Muscariello  writes:

> Is there a reason why in cake the quantum cannot be configured to a
> different value like in fq_codel?

I think this was mostly to be as no-knob as possible; so the quantum is
auto-scaled with the tin bandwidths, instead of being configurable.

Jonathan can probably expand on this...

-Toke
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net v2] vlan: consolidate VLAN parsing code and limit max parsing depth

2020-07-07 Thread Toke Høiland-Jørgensen
Toshiaki pointed out that we now have two very similar functions to extract
the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
that the unbounded parsing loop makes it possible for maliciously crafted
packets to loop through potentially hundreds of tags.

Fix both of these issues by consolidating the two parsing functions and
limiting the VLAN tag parsing to a max depth of 8 tags. As part of this,
switch over __vlan_get_protocol() to use skb_header_pointer() instead of
pskb_may_pull(), to avoid the possible side effects of the latter and keep
the skb pointer 'const' through all the parsing functions.

v2:
- Use limit of 8 tags instead of 32 (matching XMIT_RECURSION_LIMIT)

Reported-by: Toshiaki Makita 
Reported-by: Daniel Borkmann 
Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in the 
presence of VLANs")
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/linux/if_vlan.h | 57 -
 1 file changed, 22 insertions(+), 35 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 427a5b8597c2..41a518336673 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -25,6 +25,8 @@
 #define VLAN_ETH_DATA_LEN  1500/* Max. octets in payload*/
 #define VLAN_ETH_FRAME_LEN 1518/* Max. octets in frame sans FCS */
 
+#define VLAN_MAX_DEPTH 8   /* Max. number of nested VLAN tags 
parsed */
+
 /*
  * struct vlan_hdr - vlan header
  * @h_vlan_TCI: priority and VLAN ID
@@ -308,34 +310,6 @@ static inline bool eth_type_vlan(__be16 ethertype)
}
 }
 
-/* A getter for the SKB protocol field which will handle VLAN tags consistently
- * whether VLAN acceleration is enabled or not.
- */
-static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
-{
-   unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
-   __be16 proto = skb->protocol;
-
-   if (!skip_vlan)
-   /* VLAN acceleration strips the VLAN header from the skb and
-* moves it to skb->vlan_proto
-*/
-   return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
-
-   while (eth_type_vlan(proto)) {
-   struct vlan_hdr vhdr, *vh;
-
-   vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
-   if (!vh)
-   break;
-
-   proto = vh->h_vlan_encapsulated_proto;
-   offset += sizeof(vhdr);
-   }
-
-   return proto;
-}
-
 static inline bool vlan_hw_offload_capable(netdev_features_t features,
   __be16 proto)
 {
@@ -605,10 +579,10 @@ static inline int vlan_get_tag(const struct sk_buff *skb, 
u16 *vlan_tci)
  * Returns the EtherType of the packet, regardless of whether it is
  * vlan encapsulated (normal or hardware accelerated) or not.
  */
-static inline __be16 __vlan_get_protocol(struct sk_buff *skb, __be16 type,
+static inline __be16 __vlan_get_protocol(const struct sk_buff *skb, __be16 
type,
 int *depth)
 {
-   unsigned int vlan_depth = skb->mac_len;
+   unsigned int vlan_depth = skb->mac_len, parse_depth = VLAN_MAX_DEPTH;
 
/* if type is 802.1Q/AD then the header should already be
 * present at mac_len - VLAN_HLEN (if mac_len > 0), or at
@@ -623,13 +597,12 @@ static inline __be16 __vlan_get_protocol(struct sk_buff 
*skb, __be16 type,
vlan_depth = ETH_HLEN;
}
do {
-   struct vlan_hdr *vh;
+   struct vlan_hdr vhdr, *vh;
 
-   if (unlikely(!pskb_may_pull(skb,
-   vlan_depth + VLAN_HLEN)))
+   vh = skb_header_pointer(skb, vlan_depth, sizeof(vhdr), 
&vhdr);
+   if (unlikely(!vh || !--parse_depth))
return 0;
 
-   vh = (struct vlan_hdr *)(skb->data + vlan_depth);
type = vh->h_vlan_encapsulated_proto;
vlan_depth += VLAN_HLEN;
} while (eth_type_vlan(type));
@@ -648,11 +621,25 @@ static inline __be16 __vlan_get_protocol(struct sk_buff 
*skb, __be16 type,
  * Returns the EtherType of the packet, regardless of whether it is
  * vlan encapsulated (normal or hardware accelerated) or not.
  */
-static inline __be16 vlan_get_protocol(struct sk_buff *skb)
+static inline __be16 vlan_get_protocol(const struct sk_buff *skb)
 {
return __vlan_get_protocol(skb, skb->protocol, NULL);
 }
 
+/* A getter for the SKB protocol field which will handle VLAN tags consistently
+ * whether VLAN acceleration is enabled or not.
+ */
+static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
+{
+   if (!skip_vlan)
+   /*

Re: [Cake] [PATCH net] vlan: consolidate VLAN parsing code and limit max parsing depth

2020-07-07 Thread Toke Høiland-Jørgensen
Toshiaki Makita  writes:

> On 2020/07/06 21:29, Toke Høiland-Jørgensen wrote:
>> Toshiaki pointed out that we now have two very similar functions to extract
>> the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
>> that the unbounded parsing loop makes it possible for maliciously crafted
>> packets to loop through potentially hundreds of tags.
>> 
>> Fix both of these issues by consolidating the two parsing functions and
>> limiting the VLAN tag parsing to an arbitrarily-chosen, but hopefully
>> conservative, max depth of 32 tags. As part of this, switch over
>> __vlan_get_protocol() to use skb_header_pointer() instead of
>> pskb_may_pull(), to avoid the possible side effects of the latter and keep
>> the skb pointer 'const' through all the parsing functions.
>> 
>> Reported-by: Toshiaki Makita 
>> Reported-by: Daniel Borkmann 
>> Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in 
>> the presence of VLANs")
>> Signed-off-by: Toke Høiland-Jørgensen 
>> ---
> ...
>> @@ -623,13 +597,12 @@ static inline __be16 __vlan_get_protocol(struct 
>> sk_buff *skb, __be16 type,
>>  vlan_depth = ETH_HLEN;
>>  }
>>  do {
>> -struct vlan_hdr *vh;
>> +struct vlan_hdr vhdr, *vh;
>>   
>> -if (unlikely(!pskb_may_pull(skb,
>> -vlan_depth + VLAN_HLEN)))
>> +vh = skb_header_pointer(skb, vlan_depth, sizeof(vhdr), 
>> &vhdr);
>
> Some drivers which use vlan_get_protocol to get IP protocol for checksum 
> offload discards
> packets when it cannot get the protocol.
> I guess for such users this function should try to get protocol even if it is 
> not in skb header?
> I'm not sure such a case can happen, but since you care about this, you know 
> real cases where
> vlan tag can be in skb frags?

skb_header_pointer() will still succeed in reading the data, it'll just
do so by copying it into the buffer on the stack (vhdr) instead of
moving the SKB data itself around...

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net] vlan: consolidate VLAN parsing code and limit max parsing depth

2020-07-07 Thread Toke Høiland-Jørgensen
Toshiaki Makita  writes:

> On 2020/07/07 7:44, Toke Høiland-Jørgensen wrote:
>> Daniel Borkmann  writes:
>>> On 7/6/20 2:29 PM, Toke Høiland-Jørgensen wrote:
>>>> Toshiaki pointed out that we now have two very similar functions to extract
>>>> the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
>>>> that the unbounded parsing loop makes it possible for maliciously crafted
>>>> packets to loop through potentially hundreds of tags.
>>>>
>>>> Fix both of these issues by consolidating the two parsing functions and
>>>> limiting the VLAN tag parsing to an arbitrarily-chosen, but hopefully
>>>> conservative, max depth of 32 tags. As part of this, switch over
>>>> __vlan_get_protocol() to use skb_header_pointer() instead of
>>>> pskb_may_pull(), to avoid the possible side effects of the latter and keep
>>>> the skb pointer 'const' through all the parsing functions.
>>>>
>>>> Reported-by: Toshiaki Makita 
>>>> Reported-by: Daniel Borkmann 
>>>> Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in 
>>>> the presence of VLANs")
>>>> Signed-off-by: Toke Høiland-Jørgensen 
>>>> ---
>>>>include/linux/if_vlan.h | 57 -
>>>>1 file changed, 22 insertions(+), 35 deletions(-)
>>>>
>>>> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
>>>> index 427a5b8597c2..855d16192e6a 100644
>>>> --- a/include/linux/if_vlan.h
>>>> +++ b/include/linux/if_vlan.h
>>>> @@ -25,6 +25,8 @@
>>>>#define VLAN_ETH_DATA_LEN   1500/* Max. octets in payload   
>>>>  */
>>>>#define VLAN_ETH_FRAME_LEN  1518/* Max. octets in frame sans 
>>>> FCS */
>>>>
>>>> +#define VLAN_MAX_DEPTH32  /* Max. number of nested VLAN 
>>>> tags parsed */
>>>> +
>>>
>>> Any insight on limits of nesting wrt QinQ, maybe from spec side?
>> 
>> Don't think so. Wikipedia says this:
>> 
>>   802.1ad is upward compatible with 802.1Q. Although 802.1ad is limited
>>   to two tags, there is no ceiling on the standard limiting a single
>>   frame to more than two tags, allowing for growth in the protocol. In
>>   practice Service Provider topologies often anticipate and utilize
>>   frames having more than two tags.
>> 
>>> Why not 8 as max, for example (I'd probably even consider a depth like
>>> this as utterly broken setup ..)?
>> 
>> I originally went with 8, but chickened out after seeing how many places
>> call the parsing function. While I do agree that eight tags is... somewhat
>> excessive... I was trying to make absolutely sure no one would hit this
>> limit in normal use. See also https://xkcd.com/1172/ :)
>
> Considering that XMIT_RECURSION_LIMIT is 8, I also think 8 is sufficient.

Alright, fair enough, I'll send a v2 with a limit of 8 :)

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net] vlan: consolidate VLAN parsing code and limit max parsing depth

2020-07-06 Thread Toke Høiland-Jørgensen
Daniel Borkmann  writes:

> On 7/6/20 2:29 PM, Toke Høiland-Jørgensen wrote:
>> Toshiaki pointed out that we now have two very similar functions to extract
>> the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
>> that the unbounded parsing loop makes it possible for maliciously crafted
>> packets to loop through potentially hundreds of tags.
>> 
>> Fix both of these issues by consolidating the two parsing functions and
>> limiting the VLAN tag parsing to an arbitrarily-chosen, but hopefully
>> conservative, max depth of 32 tags. As part of this, switch over
>> __vlan_get_protocol() to use skb_header_pointer() instead of
>> pskb_may_pull(), to avoid the possible side effects of the latter and keep
>> the skb pointer 'const' through all the parsing functions.
>> 
>> Reported-by: Toshiaki Makita 
>> Reported-by: Daniel Borkmann 
>> Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in 
>> the presence of VLANs")
>> Signed-off-by: Toke Høiland-Jørgensen 
>> ---
>>   include/linux/if_vlan.h | 57 -
>>   1 file changed, 22 insertions(+), 35 deletions(-)
>> 
>> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
>> index 427a5b8597c2..855d16192e6a 100644
>> --- a/include/linux/if_vlan.h
>> +++ b/include/linux/if_vlan.h
>> @@ -25,6 +25,8 @@
>>   #define VLAN_ETH_DATA_LEN  1500/* Max. octets in payload*/
>>   #define VLAN_ETH_FRAME_LEN 1518/* Max. octets in frame sans FCS */
>>   
>> +#define VLAN_MAX_DEPTH  32  /* Max. number of nested VLAN 
>> tags parsed */
>> +
>
> Any insight on limits of nesting wrt QinQ, maybe from spec side?

Don't think so. Wikipedia says this:

 802.1ad is upward compatible with 802.1Q. Although 802.1ad is limited
 to two tags, there is no ceiling on the standard limiting a single
 frame to more than two tags, allowing for growth in the protocol. In
 practice Service Provider topologies often anticipate and utilize
 frames having more than two tags.

> Why not 8 as max, for example (I'd probably even consider a depth like
> this as utterly broken setup ..)?

I originally went with 8, but chickened out after seeing how many places
call the parsing function. While I do agree that eight tags is... somewhat
excessive... I was trying to make absolutely sure no one would hit this
limit in normal use. See also https://xkcd.com/1172/ :)

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net] vlan: consolidate VLAN parsing code and limit max parsing depth

2020-07-06 Thread Toke Høiland-Jørgensen
Toshiaki pointed out that we now have two very similar functions to extract
the L3 protocol number in the presence of VLAN tags. And Daniel pointed out
that the unbounded parsing loop makes it possible for maliciously crafted
packets to loop through potentially hundreds of tags.

Fix both of these issues by consolidating the two parsing functions and
limiting the VLAN tag parsing to an arbitrarily-chosen, but hopefully
conservative, max depth of 32 tags. As part of this, switch over
__vlan_get_protocol() to use skb_header_pointer() instead of
pskb_may_pull(), to avoid the possible side effects of the latter and keep
the skb pointer 'const' through all the parsing functions.

Reported-by: Toshiaki Makita 
Reported-by: Daniel Borkmann 
Fixes: d7bf2ebebc2b ("sched: consistently handle layer3 header accesses in the 
presence of VLANs")
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/linux/if_vlan.h | 57 -
 1 file changed, 22 insertions(+), 35 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index 427a5b8597c2..855d16192e6a 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -25,6 +25,8 @@
 #define VLAN_ETH_DATA_LEN  1500/* Max. octets in payload*/
 #define VLAN_ETH_FRAME_LEN 1518/* Max. octets in frame sans FCS */
 
+#define VLAN_MAX_DEPTH 32  /* Max. number of nested VLAN tags 
parsed */
+
 /*
  * struct vlan_hdr - vlan header
  * @h_vlan_TCI: priority and VLAN ID
@@ -308,34 +310,6 @@ static inline bool eth_type_vlan(__be16 ethertype)
}
 }
 
-/* A getter for the SKB protocol field which will handle VLAN tags consistently
- * whether VLAN acceleration is enabled or not.
- */
-static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
-{
-   unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
-   __be16 proto = skb->protocol;
-
-   if (!skip_vlan)
-   /* VLAN acceleration strips the VLAN header from the skb and
-* moves it to skb->vlan_proto
-*/
-   return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
-
-   while (eth_type_vlan(proto)) {
-   struct vlan_hdr vhdr, *vh;
-
-   vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
-   if (!vh)
-   break;
-
-   proto = vh->h_vlan_encapsulated_proto;
-   offset += sizeof(vhdr);
-   }
-
-   return proto;
-}
-
 static inline bool vlan_hw_offload_capable(netdev_features_t features,
   __be16 proto)
 {
@@ -605,10 +579,10 @@ static inline int vlan_get_tag(const struct sk_buff *skb, 
u16 *vlan_tci)
  * Returns the EtherType of the packet, regardless of whether it is
  * vlan encapsulated (normal or hardware accelerated) or not.
  */
-static inline __be16 __vlan_get_protocol(struct sk_buff *skb, __be16 type,
+static inline __be16 __vlan_get_protocol(const struct sk_buff *skb, __be16 
type,
 int *depth)
 {
-   unsigned int vlan_depth = skb->mac_len;
+   unsigned int vlan_depth = skb->mac_len, parse_depth = VLAN_MAX_DEPTH;
 
/* if type is 802.1Q/AD then the header should already be
 * present at mac_len - VLAN_HLEN (if mac_len > 0), or at
@@ -623,13 +597,12 @@ static inline __be16 __vlan_get_protocol(struct sk_buff 
*skb, __be16 type,
vlan_depth = ETH_HLEN;
}
do {
-   struct vlan_hdr *vh;
+   struct vlan_hdr vhdr, *vh;
 
-   if (unlikely(!pskb_may_pull(skb,
-   vlan_depth + VLAN_HLEN)))
+   vh = skb_header_pointer(skb, vlan_depth, sizeof(vhdr), 
&vhdr);
+   if (unlikely(!vh || !--parse_depth))
return 0;
 
-   vh = (struct vlan_hdr *)(skb->data + vlan_depth);
type = vh->h_vlan_encapsulated_proto;
vlan_depth += VLAN_HLEN;
} while (eth_type_vlan(type));
@@ -648,11 +621,25 @@ static inline __be16 __vlan_get_protocol(struct sk_buff 
*skb, __be16 type,
  * Returns the EtherType of the packet, regardless of whether it is
  * vlan encapsulated (normal or hardware accelerated) or not.
  */
-static inline __be16 vlan_get_protocol(struct sk_buff *skb)
+static inline __be16 vlan_get_protocol(const struct sk_buff *skb)
 {
return __vlan_get_protocol(skb, skb->protocol, NULL);
 }
 
+/* A getter for the SKB protocol field which will handle VLAN tags consistently
+ * whether VLAN acceleration is enabled or not.
+ */
+static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
+{
+   if (!skip_vlan)
+   /* VLAN acceleration strips the V

Re: [Cake] [PATCH net v3] sched: consistently handle layer3 header accesses in the presence of VLANs

2020-07-06 Thread Toke Høiland-Jørgensen
Toshiaki Makita  writes:

> On 2020/07/04 20:33, Toke Høiland-Jørgensen wrote:
>> Toshiaki Makita  writes:
>>> On 2020/07/04 5:26, Toke Høiland-Jørgensen wrote:
>>> ...
>>>> +/* A getter for the SKB protocol field which will handle VLAN tags 
>>>> consistently
>>>> + * whether VLAN acceleration is enabled or not.
>>>> + */
>>>> +static inline __be16 skb_protocol(const struct sk_buff *skb, bool 
>>>> skip_vlan)
>>>> +{
>>>> +  unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
>>>> +  __be16 proto = skb->protocol;
>>>> +
>>>> +  if (!skip_vlan)
>>>> +  /* VLAN acceleration strips the VLAN header from the skb and
>>>> +   * moves it to skb->vlan_proto
>>>> +   */
>>>> +  return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
>>>> +
>>>> +  while (eth_type_vlan(proto)) {
>>>> +  struct vlan_hdr vhdr, *vh;
>>>> +
>>>> +  vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
>>>> +  if (!vh)
>>>> +  break;
>>>> +
>>>> +  proto = vh->h_vlan_encapsulated_proto;
>>>> +  offset += sizeof(vhdr);
>>>> +  }
>>>
>>> Why don't you use __vlan_get_protocol() here? It looks quite similar.
>>> Is there any problem with using that?
>> 
>> TBH, I completely missed that helper. It seems to have side effects,
>> though (pskb_may_pull()), which is one of the things the original patch
>> to sch_cake that initiated all of this was trying to avoid.
>
> Sorry for not completely following the discussion...
> Pulling data is wrong for cake or other schedulers?

This was not explicit in the current thread, but the reason I started
looking into this in the first place was a pull request on the
out-of-tree version of sch_cake that noticed that there are drivers that
will allocate SKBs in such a way that accessing the packet header causes
it to be reallocated: https://github.com/dtaht/sch_cake/pull/136

I'm not entirely positive that this applies to just reading the header
through pskb_may_pull(), or if it was only on skb_try_make_writable();
but in any case it seems to me that it's better for a helper like
__vlan_get_protocol() to not have side effects.

>> I guess I could just fix that, though, and switch __vlan_get_protocol()
>> over to using skb_header_pointer(). Will send a follow-up to do that.
>> 
>> Any opinion on whether it's a good idea to limit the max parse depth
>> while I'm at it (see Daniel's reply)?
>
> The logic was originally introduced by skb_network_protocol() back in
> v3.10, and I have never heard of security report about that. But yes,
> I guess it potentially can be used for DoS attack.

Right, I'll add the limit as well, then :)

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net v3] sched: consistently handle layer3 header accesses in the presence of VLANs

2020-07-04 Thread Toke Høiland-Jørgensen
Toshiaki Makita  writes:

> On 2020/07/04 5:26, Toke Høiland-Jørgensen wrote:
> ...
>> +/* A getter for the SKB protocol field which will handle VLAN tags 
>> consistently
>> + * whether VLAN acceleration is enabled or not.
>> + */
>> +static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
>> +{
>> +unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
>> +__be16 proto = skb->protocol;
>> +
>> +if (!skip_vlan)
>> +/* VLAN acceleration strips the VLAN header from the skb and
>> + * moves it to skb->vlan_proto
>> + */
>> +return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
>> +
>> +while (eth_type_vlan(proto)) {
>> +struct vlan_hdr vhdr, *vh;
>> +
>> +vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
>> +if (!vh)
>> +break;
>> +
>> +proto = vh->h_vlan_encapsulated_proto;
>> +offset += sizeof(vhdr);
>> +}
>
> Why don't you use __vlan_get_protocol() here? It looks quite similar.
> Is there any problem with using that?

TBH, I completely missed that helper. It seems to have side effects,
though (pskb_may_pull()), which is one of the things the original patch
to sch_cake that initiated all of this was trying to avoid.

I guess I could just fix that, though, and switch __vlan_get_protocol()
over to using skb_header_pointer(). Will send a follow-up to do that.

Any opinion on whether it's a good idea to limit the max parse depth
while I'm at it (see Daniel's reply)?

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net v3] sched: consistently handle layer3 header accesses in the presence of VLANs

2020-07-04 Thread Toke Høiland-Jørgensen
Daniel Borkmann  writes:

> On 7/3/20 10:26 PM, Toke Høiland-Jørgensen wrote:
>> There are a couple of places in net/sched/ that check skb->protocol and act
>> on the value there. However, in the presence of VLAN tags, the value stored
>> in skb->protocol can be inconsistent based on whether VLAN acceleration is
>> enabled. The commit quoted in the Fixes tag below fixed the users of
>> skb->protocol to use a helper that will always see the VLAN ethertype.
>> 
>> However, most of the callers don't actually handle the VLAN ethertype, but
>> expect to find the IP header type in the protocol field. This means that
>> things like changing the ECN field, or parsing diffserv values, stops
>> working if there's a VLAN tag, or if there are multiple nested VLAN
>> tags (QinQ).
>> 
>> To fix this, change the helper to take an argument that indicates whether
>> the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
>> make sure to skip all of them, so behaviour is consistent even in QinQ
>> mode.
>> 
>> To make the helper usable from the ECN code, move it to if_vlan.h instead
>> of pkt_sched.h.
>> 
>> v3:
>> - Remove empty lines
>> - Move vlan variable definitions inside loop in skb_protocol()
>> - Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
>>bpf_skb_ecn_set_ce()
>> 
>> v2:
>> - Use eth_type_vlan() helper in skb_protocol()
>> - Also fix code that reads skb->protocol directly
>> - Change a couple of 'if/else if' statements to switch constructs to avoid
>>calling the helper twice
>> 
>> Reported-by: Ilya Ponetayev 
>> Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of 
>> accelerated vlan path")
>> Signed-off-by: Toke Høiland-Jørgensen 
>> ---
>>   include/linux/if_vlan.h  | 28 
>>   include/net/inet_ecn.h   | 25 +
>>   include/net/pkt_sched.h  | 11 ---
>>   net/core/filter.c| 10 +++---
>>   net/sched/act_connmark.c |  9 ++---
>>   net/sched/act_csum.c |  2 +-
>>   net/sched/act_ct.c   |  9 -
>>   net/sched/act_ctinfo.c   |  9 ++---
>>   net/sched/act_mpls.c |  2 +-
>>   net/sched/act_skbedit.c  |  2 +-
>>   net/sched/cls_api.c  |  2 +-
>>   net/sched/cls_flow.c |  8 
>>   net/sched/cls_flower.c   |  2 +-
>>   net/sched/em_ipset.c |  2 +-
>>   net/sched/em_ipt.c   |  2 +-
>>   net/sched/em_meta.c  |  2 +-
>>   net/sched/sch_cake.c |  4 ++--
>>   net/sched/sch_dsmark.c   |  6 +++---
>>   net/sched/sch_teql.c |  2 +-
>>   19 files changed, 86 insertions(+), 51 deletions(-)
>> 
>> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
>> index b05e855f1ddd..427a5b8597c2 100644
>> --- a/include/linux/if_vlan.h
>> +++ b/include/linux/if_vlan.h
>> @@ -308,6 +308,34 @@ static inline bool eth_type_vlan(__be16 ethertype)
>>  }
>>   }
>>   
>> +/* A getter for the SKB protocol field which will handle VLAN tags 
>> consistently
>> + * whether VLAN acceleration is enabled or not.
>> + */
>> +static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
>> +{
>> +unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
>> +__be16 proto = skb->protocol;
>> +
>> +if (!skip_vlan)
>> +/* VLAN acceleration strips the VLAN header from the skb and
>> + * moves it to skb->vlan_proto
>> + */
>> +return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
>> +
>> +while (eth_type_vlan(proto)) {
>> +struct vlan_hdr vhdr, *vh;
>> +
>> +vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
>> +if (!vh)
>> +break;
>> +
>> +proto = vh->h_vlan_encapsulated_proto;
>> +offset += sizeof(vhdr);
>> +}
>
> Hm, why is the while loop 'unbounded'? Does it even make sense to have
> a packet with hundreds of vlan hdrs in there what you'd end up
> walking? What if an attacker crafts a max sized packet with only
> vlan_hdr forcing exorbitant looping in fast-path here (e.g. via
> af_packet)?

Hmm, I guess you're right that could theoretically happen. But on the
other hand, a lot of drivers seem to be cheerfully calling
vlan_get_protocol() on incoming packets, which doesn't have a limit on
the depth 

[Cake] [PATCH net v3] sched: consistently handle layer3 header accesses in the presence of VLANs

2020-07-03 Thread Toke Høiland-Jørgensen
There are a couple of places in net/sched/ that check skb->protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb->protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb->protocol to use a helper that will always see the VLAN ethertype.

However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).

To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.

To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.

v3:
- Remove empty lines
- Move vlan variable definitions inside loop in skb_protocol()
- Also use skb_protocol() helper in IP{,6}_ECN_decapsulate() and
  bpf_skb_ecn_set_ce()

v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb->protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
  calling the helper twice

Reported-by: Ilya Ponetayev 
Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated 
vlan path")
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/linux/if_vlan.h  | 28 
 include/net/inet_ecn.h   | 25 +
 include/net/pkt_sched.h  | 11 ---
 net/core/filter.c| 10 +++---
 net/sched/act_connmark.c |  9 ++---
 net/sched/act_csum.c |  2 +-
 net/sched/act_ct.c   |  9 -
 net/sched/act_ctinfo.c   |  9 ++---
 net/sched/act_mpls.c |  2 +-
 net/sched/act_skbedit.c  |  2 +-
 net/sched/cls_api.c  |  2 +-
 net/sched/cls_flow.c |  8 
 net/sched/cls_flower.c   |  2 +-
 net/sched/em_ipset.c |  2 +-
 net/sched/em_ipt.c   |  2 +-
 net/sched/em_meta.c  |  2 +-
 net/sched/sch_cake.c |  4 ++--
 net/sched/sch_dsmark.c   |  6 +++---
 net/sched/sch_teql.c |  2 +-
 19 files changed, 86 insertions(+), 51 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index b05e855f1ddd..427a5b8597c2 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -308,6 +308,34 @@ static inline bool eth_type_vlan(__be16 ethertype)
}
 }
 
+/* A getter for the SKB protocol field which will handle VLAN tags consistently
+ * whether VLAN acceleration is enabled or not.
+ */
+static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
+{
+   unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
+   __be16 proto = skb->protocol;
+
+   if (!skip_vlan)
+   /* VLAN acceleration strips the VLAN header from the skb and
+* moves it to skb->vlan_proto
+*/
+   return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
+
+   while (eth_type_vlan(proto)) {
+   struct vlan_hdr vhdr, *vh;
+
+   vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
+   if (!vh)
+   break;
+
+   proto = vh->h_vlan_encapsulated_proto;
+   offset += sizeof(vhdr);
+   }
+
+   return proto;
+}
+
 static inline bool vlan_hw_offload_capable(netdev_features_t features,
   __be16 proto)
 {
diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
index 0f0d1efe06dd..e1eaf1780288 100644
--- a/include/net/inet_ecn.h
+++ b/include/net/inet_ecn.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -172,7 +173,7 @@ static inline void ipv6_copy_dscp(unsigned int dscp, struct 
ipv6hdr *inner)
 
 static inline int INET_ECN_set_ce(struct sk_buff *skb)
 {
-   switch (skb->protocol) {
+   switch (skb_protocol(skb, true)) {
case cpu_to_be16(ETH_P_IP):
if (skb_network_header(skb) + sizeof(struct iphdr) <=
skb_tail_pointer(skb))
@@ -191,7 +192,7 @@ static inline int INET_ECN_set_ce(struct sk_buff *skb)
 
 static inline int INET_ECN_set_ect1(struct sk_buff *skb)
 {
-   switch (skb->protocol) {
+   switch (skb_protocol(skb, true)) {
case cpu_to_be16(ETH_P_IP):
if (skb_network_header(skb) + sizeof(struct iphdr) <=
skb_tail_pointer(skb))
@@ -272,12 +273,16 @@ static inline int IP_ECN_decapsulate(const struct iphdr 
*oiph,
 {
__u8 inner;
 
-   if (skb->protocol == htons(ETH_P_IP))
+   switch (skb_protocol(skb, true)) {
+   case htons(ETH_P_IP):
inner = ip_hdr(skb)->tos;
-   else if (skb->p

Re: [Cake] [PATCH net v2] sched: consistently handle layer3 header accesses in the presence of VLANs

2020-07-03 Thread Toke Høiland-Jørgensen
Roman Mashak  writes:

> Toke Høiland-Jørgensen  writes:
>
>
> [...]
>
>> +/* A getter for the SKB protocol field which will handle VLAN tags 
>> consistently
>> + * whether VLAN acceleration is enabled or not.
>> + */
>> +static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
>> +{
>> +unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
>> +__be16 proto = skb->protocol;
>> +struct vlan_hdr vhdr, *vh;
>
> Nit: you could move vhdr and *vh definitions inside the while loop,
> because of their inner scope use.

Sure, can do.

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net v2] sched: consistently handle layer3 header accesses in the presence of VLANs

2020-07-03 Thread Toke Høiland-Jørgensen
Cong Wang  writes:

> On Fri, Jul 3, 2020 at 8:22 AM Toke Høiland-Jørgensen  wrote:
>> diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
>> index b05e855f1ddd..d0c1cb0d264d 100644
>> --- a/include/linux/if_vlan.h
>> +++ b/include/linux/if_vlan.h
>> @@ -308,6 +308,35 @@ static inline bool eth_type_vlan(__be16 ethertype)
>> }
>>  }
>>
>> +/* A getter for the SKB protocol field which will handle VLAN tags 
>> consistently
>> + * whether VLAN acceleration is enabled or not.
>> + */
>> +static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
>> +{
>> +   unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
>> +   __be16 proto = skb->protocol;
>> +   struct vlan_hdr vhdr, *vh;
>> +
>> +   if (!skip_vlan)
>> +   /* VLAN acceleration strips the VLAN header from the skb and
>> +* moves it to skb->vlan_proto
>> +*/
>> +   return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
>> +
>> +   while (eth_type_vlan(proto)) {
>> +   vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
>> +   if (!vh)
>> +   break;
>> +
>> +   proto = vh->h_vlan_encapsulated_proto;
>> +   offset += sizeof(vhdr);
>> +   }
>> +
>> +   return proto;
>> +}
>> +
>> +
>> +
>
> Just nit: too many newlines here. Please run checkpatch.pl.

Hmm, I did run checkpatch, but seems it only complains about multiple
newlines when run with --strict. Will fix, thanks! :)

>>  static inline bool vlan_hw_offload_capable(netdev_features_t features,
>>__be16 proto)
>>  {
>> diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
>> index 0f0d1efe06dd..82763ba597f2 100644
>> --- a/include/net/inet_ecn.h
>> +++ b/include/net/inet_ecn.h
>> @@ -4,6 +4,7 @@
>>
>>  #include 
>>  #include 
>> +#include 
>>
>>  #include 
>>  #include 
>> @@ -172,7 +173,7 @@ static inline void ipv6_copy_dscp(unsigned int dscp, 
>> struct ipv6hdr *inner)
>>
>>  static inline int INET_ECN_set_ce(struct sk_buff *skb)
>>  {
>> -   switch (skb->protocol) {
>> +   switch (skb_protocol(skb, true)) {
>> case cpu_to_be16(ETH_P_IP):
>> if (skb_network_header(skb) + sizeof(struct iphdr) <=
>> skb_tail_pointer(skb))
>> @@ -191,7 +192,7 @@ static inline int INET_ECN_set_ce(struct sk_buff *skb)
>>
>>  static inline int INET_ECN_set_ect1(struct sk_buff *skb)
>>  {
>> -   switch (skb->protocol) {
>> +   switch (skb_protocol(skb, true)) {
>> case cpu_to_be16(ETH_P_IP):
>> if (skb_network_header(skb) + sizeof(struct iphdr) <=
>> skb_tail_pointer(skb))
>
> These two helpers are called by non-net_sched too, are you sure
> your change is correct for them too?
>
> For example, IP6_ECN_decapsulate() uses skb->protocol then calls
> INET_ECN_decapsulate() which calls the above, after your change
> they use skb_protocol(). This looks inconsistent to me.

Good point. I'll change IP{,6}_ECN_decapsulate() to also use
skb_protocol().

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net v2] sched: consistently handle layer3 header accesses in the presence of VLANs

2020-07-03 Thread Toke Høiland-Jørgensen
There are a couple of places in net/sched/ that check skb->protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb->protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb->protocol to use a helper that will always see the VLAN ethertype.

However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).

To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.

To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.

v2:
- Use eth_type_vlan() helper in skb_protocol()
- Also fix code that reads skb->protocol directly
- Change a couple of 'if/else if' statements to switch constructs to avoid
  calling the helper twice

Reported-by: Ilya Ponetayev 
Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated 
vlan path")
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/linux/if_vlan.h  | 29 +
 include/net/inet_ecn.h   |  5 +++--
 include/net/pkt_sched.h  | 11 ---
 net/sched/act_connmark.c |  9 ++---
 net/sched/act_csum.c |  2 +-
 net/sched/act_ct.c   |  9 -
 net/sched/act_ctinfo.c   |  9 ++---
 net/sched/act_mpls.c |  2 +-
 net/sched/act_skbedit.c  |  2 +-
 net/sched/cls_api.c  |  2 +-
 net/sched/cls_flow.c |  8 
 net/sched/cls_flower.c   |  2 +-
 net/sched/em_ipset.c |  2 +-
 net/sched/em_ipt.c   |  2 +-
 net/sched/em_meta.c  |  2 +-
 net/sched/sch_cake.c |  4 ++--
 net/sched/sch_dsmark.c   |  6 +++---
 net/sched/sch_teql.c |  2 +-
 18 files changed, 66 insertions(+), 42 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index b05e855f1ddd..d0c1cb0d264d 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -308,6 +308,35 @@ static inline bool eth_type_vlan(__be16 ethertype)
}
 }
 
+/* A getter for the SKB protocol field which will handle VLAN tags consistently
+ * whether VLAN acceleration is enabled or not.
+ */
+static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
+{
+   unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
+   __be16 proto = skb->protocol;
+   struct vlan_hdr vhdr, *vh;
+
+   if (!skip_vlan)
+   /* VLAN acceleration strips the VLAN header from the skb and
+* moves it to skb->vlan_proto
+*/
+   return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
+
+   while (eth_type_vlan(proto)) {
+   vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
+   if (!vh)
+   break;
+
+   proto = vh->h_vlan_encapsulated_proto;
+   offset += sizeof(vhdr);
+   }
+
+   return proto;
+}
+
+
+
 static inline bool vlan_hw_offload_capable(netdev_features_t features,
   __be16 proto)
 {
diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
index 0f0d1efe06dd..82763ba597f2 100644
--- a/include/net/inet_ecn.h
+++ b/include/net/inet_ecn.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -172,7 +173,7 @@ static inline void ipv6_copy_dscp(unsigned int dscp, struct 
ipv6hdr *inner)
 
 static inline int INET_ECN_set_ce(struct sk_buff *skb)
 {
-   switch (skb->protocol) {
+   switch (skb_protocol(skb, true)) {
case cpu_to_be16(ETH_P_IP):
if (skb_network_header(skb) + sizeof(struct iphdr) <=
skb_tail_pointer(skb))
@@ -191,7 +192,7 @@ static inline int INET_ECN_set_ce(struct sk_buff *skb)
 
 static inline int INET_ECN_set_ect1(struct sk_buff *skb)
 {
-   switch (skb->protocol) {
+   switch (skb_protocol(skb, true)) {
case cpu_to_be16(ETH_P_IP):
if (skb_network_header(skb) + sizeof(struct iphdr) <=
skb_tail_pointer(skb))
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 9092e697059e..ac8c890a2657 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -136,17 +136,6 @@ static inline void qdisc_run(struct Qdisc *q)
}
 }
 
-static inline __be16 tc_skb_protocol(const struct sk_buff *skb)
-{
-   /* We need to take extra care in case the skb came via
-* vlan accelerated path. In that case, use skb->vlan_proto
-* as the original vlan header was already stripped.
-*/
-   if (skb_vlan_

Re: [Cake] [PATCH net] sched: consistently handle layer3 header accesses in the presence of VLANs

2020-07-03 Thread Toke Høiland-Jørgensen
Davide Caratti  writes:

> hello Toke,
>
> thanks for answering!
>
> On Fri, 2020-07-03 at 14:05 +0200, Toke Høiland-Jørgensen wrote:
>>   while (proto == htons(ETH_P_8021Q) || proto == htons(ETH_P_8021AD)) {
>
> maybe this line be shortened, since if_vlan.h has [1]:
>
> while (eth_type_vlan(proto)) {
>   ...
> }

Good point, missed that! Will fix and send a v2.

> If I read well, the biggest change from functional point of view is that
> now qdiscs can set the ECN bit also on non-accelerated VLAN packets and
> QinQ-tagged packets, if the IP header is the outer-most header after VLAN;
> and the same applies to almost all net/sched former users of skb->protocol 
> or tc_skb_protocol().

Yup, that's the idea.

> Question (sorry in advance because it might be a dumb one :) ):
>
> do you know why cls_flower, act_ct, act_mpls and act_connmark keep reading
> skb->protocol? is that intentional?

Hmm, no not really. I only checked for calls to tc_skb_protocol(), not
for direct uses of skb->protocol. Will fix those as well :)

> (for act_mpls that doesn't look intentional, and probably the result is
> that the BOS bit is not set correctly if someone tries to push/pop a label
> for a non-accelerated or QinQ packet. But I didn't try it experimentally
> :) )

Hmm, you're certainly right that the MPLS code should use the helper to
get consistent use between accelerated/non-accelerated VLAN usage. But I
don't know enough about MPLS to judge whether it should be skipping the
VLAN tags or not. Sounds like you're saying the right thing is to skip
the VLAN tags there as well?

Looking at the others, it looks like act_connmark and act_ct both ought
to skip VLAN tags, while act_flower should probably keep it, since it
seems it has a VLAN match type. Or?

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net] sched: consistently handle layer3 header accesses in the presence of VLANs

2020-07-03 Thread Toke Høiland-Jørgensen
There are a couple of places in net/sched/ that check skb->protocol and act
on the value there. However, in the presence of VLAN tags, the value stored
in skb->protocol can be inconsistent based on whether VLAN acceleration is
enabled. The commit quoted in the Fixes tag below fixed the users of
skb->protocol to use a helper that will always see the VLAN ethertype.

However, most of the callers don't actually handle the VLAN ethertype, but
expect to find the IP header type in the protocol field. This means that
things like changing the ECN field, or parsing diffserv values, stops
working if there's a VLAN tag, or if there are multiple nested VLAN
tags (QinQ).

To fix this, change the helper to take an argument that indicates whether
the caller wants to skip the VLAN tags or not. When skipping VLAN tags, we
make sure to skip all of them, so behaviour is consistent even in QinQ
mode.

To make the helper usable from the ECN code, move it to if_vlan.h instead
of pkt_sched.h.

Reported-by: Ilya Ponetayev 
Fixes: d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated 
vlan path")
Signed-off-by: Toke Høiland-Jørgensen 
---
 include/linux/if_vlan.h | 27 +++
 include/net/inet_ecn.h  |  5 +++--
 include/net/pkt_sched.h | 11 ---
 net/sched/act_csum.c|  2 +-
 net/sched/act_ctinfo.c  |  4 ++--
 net/sched/act_skbedit.c |  2 +-
 net/sched/cls_api.c |  2 +-
 net/sched/cls_flow.c|  8 
 net/sched/em_ipset.c|  2 +-
 net/sched/em_ipt.c  |  2 +-
 net/sched/em_meta.c |  2 +-
 net/sched/sch_cake.c|  4 ++--
 net/sched/sch_dsmark.c  |  6 +++---
 net/sched/sch_teql.c|  2 +-
 14 files changed, 48 insertions(+), 31 deletions(-)

diff --git a/include/linux/if_vlan.h b/include/linux/if_vlan.h
index b05e855f1ddd..761aae890b2d 100644
--- a/include/linux/if_vlan.h
+++ b/include/linux/if_vlan.h
@@ -78,6 +78,33 @@ static inline bool is_vlan_dev(const struct net_device *dev)
 #define skb_vlan_tag_get_cfi(__skb)(!!((__skb)->vlan_tci & VLAN_CFI_MASK))
 #define skb_vlan_tag_get_prio(__skb)   (((__skb)->vlan_tci & VLAN_PRIO_MASK) 
>> VLAN_PRIO_SHIFT)
 
+/* A getter for the SKB protocol field which will handle VLAN tags consistently
+ * whether VLAN acceleration is enabled or not.
+ */
+static inline __be16 skb_protocol(const struct sk_buff *skb, bool skip_vlan)
+{
+   unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
+   __be16 proto = skb->protocol;
+   struct vlan_hdr vhdr, *vh;
+
+   if (!skip_vlan)
+   /* VLAN acceleration strips the VLAN header from the skb and
+* moves it to skb->vlan_proto
+*/
+   return skb_vlan_tag_present(skb) ? skb->vlan_proto : proto;
+
+   while (proto == htons(ETH_P_8021Q) || proto == htons(ETH_P_8021AD)) {
+   vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
+   if (!vh)
+   break;
+
+   proto = vh->h_vlan_encapsulated_proto;
+   offset += sizeof(vhdr);
+   }
+
+   return proto;
+}
+
 static inline int vlan_get_rx_ctag_filter_info(struct net_device *dev)
 {
ASSERT_RTNL();
diff --git a/include/net/inet_ecn.h b/include/net/inet_ecn.h
index 0f0d1efe06dd..82763ba597f2 100644
--- a/include/net/inet_ecn.h
+++ b/include/net/inet_ecn.h
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -172,7 +173,7 @@ static inline void ipv6_copy_dscp(unsigned int dscp, struct 
ipv6hdr *inner)
 
 static inline int INET_ECN_set_ce(struct sk_buff *skb)
 {
-   switch (skb->protocol) {
+   switch (skb_protocol(skb, true)) {
case cpu_to_be16(ETH_P_IP):
if (skb_network_header(skb) + sizeof(struct iphdr) <=
skb_tail_pointer(skb))
@@ -191,7 +192,7 @@ static inline int INET_ECN_set_ce(struct sk_buff *skb)
 
 static inline int INET_ECN_set_ect1(struct sk_buff *skb)
 {
-   switch (skb->protocol) {
+   switch (skb_protocol(skb, true)) {
case cpu_to_be16(ETH_P_IP):
if (skb_network_header(skb) + sizeof(struct iphdr) <=
skb_tail_pointer(skb))
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index 9092e697059e..ac8c890a2657 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -136,17 +136,6 @@ static inline void qdisc_run(struct Qdisc *q)
}
 }
 
-static inline __be16 tc_skb_protocol(const struct sk_buff *skb)
-{
-   /* We need to take extra care in case the skb came via
-* vlan accelerated path. In that case, use skb->vlan_proto
-* as the original vlan header was already stripped.
-*/
-   if (skb_vlan_tag_present(skb))
-   return skb->vlan_proto;
-   return skb->protocol;
-}
-
 /* Calculate maximal size of packet seen by hard_start_xmit
routine of this device.
  */
diff --

Re: [Cake] [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags

2020-06-29 Thread Toke Høiland-Jørgensen
Davide Caratti  writes:

> hi Toke,
>
> thanks for answering.
>
> On Fri, 2020-06-26 at 14:52 +0200, Toke Høiland-Jørgensen wrote:
>> Davide Caratti  writes:
>
> [...]
>
>> > 
>> > >  I guess I can trying going through them all and figuring out if
>> > > there's a more generic solution.
>> > 
>> > For sch_cake, I think that the qdisc shouldn't look at the IP header when
>> > it schedules packets having a VLAN tag.
>> > 
>> > Probably, when tc_skb_protocol() returns ETH_P_8021Q or ETH_P_8021AD, we
>> > should look at the VLAN priority (PCP) bits (and that's something that
>> > cake doesn't do currently - but I have a small patch in my stash that
>> > implements this: please let me know if you are interested in seeing it :)
>> > ).
>> > 
>> > Then, to ensure that the IP precedence is respected, even with different
>> > VLAN tags, users should explicitly configure TC filters that "map" the
>> > DSCP value to a PCP value. This would ensure that configured priority is
>> > respected by the scheduler, and would also be flexible enough to allow
>> > different "mappings".
>> 
>> I think you have this the wrong way around :)
>> 
>> I.e., classifying based on VLAN priority is even more esoteric than
>> using diffserv markings,
>
> is it so uncommon? I knew that almost every wifi card did something
> similar with 802.11 'access categories'. More generally, I'm not sure if
> it's ok to ignore any QoS information present in the L2 header. Anyway,
>
>> so that should not be the default. Making it
>> the default would also make the behaviour change for the same traffic if
>> there's a VLAN tag present, which is bound to confuse people. I suppose
>> it could be an option, but not really sure that's needed, since as you
>> say you could just implement it with your own TC filters...
>
> you caught me :) ,
>
> I wrote that patch in my stash to fix cake on my home router, where voice
> and data are encapsulated in IP over PPPoE over VLANs, and different
> services go over different VLAN ids (one VLAN dedicated for voice, the
> other one for data) [1]. The quickest thing I did was: to prioritize
> packets having VLAN id equal to 1035.
>
> Now that I look at cake code again (where again means: after almost 1
> year) it would be probably better to assign skb->priority using flower +
> act_skbedit, and then prioritize in the qdisc: if I read the code well,
> this would avoid voice and data falling into the same traffic class (that
> was my original problem).
>
> please note: I didn't try this patch - but I think that even with this
> code I would have voice and data mixed together, because there is PPPoE
> between VLAN and IP.
>
>> > Sure, my proposal does not cover the problem of mangling the CE bit
>> > inside VLAN-tagged packets, i.e. if we should understand if qdiscs
>> > should allow it or not.
>> 
>> Hmm, yeah, that's the rub, isn't it? I think this is related to this
>> commit, which first introduced tc_skb_protocol():
>> 
>> d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan 
>> path")
>> 
>> That commit at least made the behaviour consistent across
>> accelerated/non-accelerated VLANs. However, the patch description
>> asserts that 'tc code .. expects vlan protocol type [in skb->protocol]'.
>> Looking at the various callers, I'm not actually sure that's true, in
>> the sense that most of the callers don't handle VLAN ethertypes at all,
>> but expects to find an IP header. This is the case for at least:
>> 
>> - act_ctinfo
>> - act_skbedit
>> - cls_flow
>> - em_ipset
>> - em_ipt
>> - sch_cake
>> - sch_dsmark
>
> sure, I'm not saying it's not possible to look inside IP headers. What I
> understood from Cong's replies [2], and he sort-of convinced me, was: when
> I have IP and one or more VLAN tags, no matter whether it is accelerated
> or not, it should be sufficient to access the IP header daisy-chaining
> 'act_vlan pop actions' -> access to the IP header -> ' act_vlan push
> actions (in the reversed order).
>
> oh well, that's still not sufficient in my home router because of PPPoE. I
> should practice with cls_bpf more seriously :-) 
>
> or write act_pppoe.c :D
>
>> In fact the only caller that explicitly handles a VLAN ethertype seems
>> to be act_csum; and that handles it in a way that also just sk

Re: [Cake] [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags

2020-06-26 Thread Toke Høiland-Jørgensen
Davide Caratti  writes:

> hello,
>
> my 2 cents:
>
> On Thu, 2020-06-25 at 21:53 +0200, Toke Høiland-Jørgensen wrote:
>> I think it depends a little on the use case; some callers actually care
>> about the VLAN tags themselves and handle that specially (e.g.,
>> act_csum).
>
> I remember taht something similar was discussed about 1 year ago [1].

Ah, thank you for the pointer!

>> Whereas others (e.g., sch_dsmark) probably will have the same
>> issue.
>
> I'd say that the issue "propagates" to all qdiscs that mangle the ECN-CE
> bit (i.e., calling INET_ECN_set_ce() [2]), most notably all the RED
> variants and "codel/fq_codel".

Yeah, I think we should fix INET_ECN_set_ce() instead of re-implementing
it in CAKE. See below, though.

>>  I guess I can trying going through them all and figuring out if
>> there's a more generic solution.
>
> For sch_cake, I think that the qdisc shouldn't look at the IP header when
> it schedules packets having a VLAN tag.
>
> Probably, when tc_skb_protocol() returns ETH_P_8021Q or ETH_P_8021AD, we
> should look at the VLAN priority (PCP) bits (and that's something that
> cake doesn't do currently - but I have a small patch in my stash that
> implements this: please let me know if you are interested in seeing it :)
> ).
>
> Then, to ensure that the IP precedence is respected, even with different
> VLAN tags, users should explicitly configure TC filters that "map" the
> DSCP value to a PCP value. This would ensure that configured priority is
> respected by the scheduler, and would also be flexible enough to allow
> different "mappings".

I think you have this the wrong way around :)

I.e., classifying based on VLAN priority is even more esoteric than
using diffserv markings, so that should not be the default. Making it
the default would also make the behaviour change for the same traffic if
there's a VLAN tag present, which is bound to confuse people. I suppose
it could be an option, but not really sure that's needed, since as you
say you could just implement it with your own TC filters...

> Sure, my proposal does not cover the problem of mangling the CE bit
> inside VLAN-tagged packets, i.e. if we should understand if qdiscs
> should allow it or not.

Hmm, yeah, that's the rub, isn't it? I think this is related to this
commit, which first introduced tc_skb_protocol():

d8b9605d2697 ("net: sched: fix skb->protocol use in case of accelerated vlan 
path")

That commit at least made the behaviour consistent across
accelerated/non-accelerated VLANs. However, the patch description
asserts that 'tc code .. expects vlan protocol type [in skb->protocol]'.
Looking at the various callers, I'm not actually sure that's true, in
the sense that most of the callers don't handle VLAN ethertypes at all,
but expects to find an IP header. This is the case for at least:

- act_ctinfo
- act_skbedit
- cls_flow
- em_ipset
- em_ipt
- sch_cake
- sch_dsmark

In fact the only caller that explicitly handles a VLAN ethertype seems
to be act_csum; and that handles it in a way that also just skips the
VLAN headers, albeit by skb_pull()'ing the header.

cls_api, em_meta and sch_teql don't explicitly handle it; but returning
the VLAN ethertypes to those does seem to make sense, since they just
pass the value somewhere else.

So my suggestion would be to add a new helper that skips the VLAN tags
and finds the L3 ethertype (i.e., basically cake_skb_proto() as
introduced in this patch), then switch all the callers listed above, as
well as the INET_ECN_set_ce() over to using that. Maybe something like
'skb_l3_protocol()' which could go into skbuff.h itself, so the ECN code
can also find it?

Any objections to this? It's not actually clear to me how the discussion
you quoted above landed; but this will at least make things consistent
across all the different actions/etc.

Adding in Jiri and Jamal as well since they were involved in the patch I
quoted above.

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH RESEND net-next] sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling

2020-06-25 Thread Toke Høiland-Jørgensen
From: Kevin Darbyshire-Bryant 

Change tin mapping on diffserv3, 4 & 8 for LE PHB support, in essence
making LE a member of the Bulk tin.

Bulk has the least priority and minimum of 1/16th total bandwidth in the
face of higher priority traffic.

NB: Diffserv 3 & 4 swap tin 0 & 1 priorities from the default order as
found in diffserv8, in case anyone is wondering why it looks a bit odd.

Signed-off-by: Kevin Darbyshire-Bryant 
[ reword commit message slightly ]
Signed-off-by: Toke Høiland-Jørgensen 
---
Resending this as standalone after splitting out the rest of the series to -net

 net/sched/sch_cake.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 60f8ae578819..cae367515804 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -312,8 +312,8 @@ static const u8 precedence[] = {
 };
 
 static const u8 diffserv8[] = {
-   2, 5, 1, 2, 4, 2, 2, 2,
-   0, 2, 1, 2, 1, 2, 1, 2,
+   2, 0, 1, 2, 4, 2, 2, 2,
+   1, 2, 1, 2, 1, 2, 1, 2,
5, 2, 4, 2, 4, 2, 4, 2,
3, 2, 3, 2, 3, 2, 3, 2,
6, 2, 3, 2, 3, 2, 3, 2,
@@ -323,7 +323,7 @@ static const u8 diffserv8[] = {
 };
 
 static const u8 diffserv4[] = {
-   0, 2, 0, 0, 2, 0, 0, 0,
+   0, 1, 0, 0, 2, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0,
2, 0, 2, 0, 2, 0, 2, 0,
2, 0, 2, 0, 2, 0, 2, 0,
@@ -334,7 +334,7 @@ static const u8 diffserv4[] = {
 };
 
 static const u8 diffserv3[] = {
-   0, 0, 0, 0, 2, 0, 0, 0,
+   0, 1, 0, 0, 2, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
-- 
2.27.0

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net 3/3] sch_cake: fix a few style nits

2020-06-25 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

I spotted a few nits when comparing the in-tree version of sch_cake with
the out-of-tree one: A redundant error variable declaration shadowing an
outer declaration, and an indentation alignment issue. Fix both of these.

Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) 
qdisc")
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 094d6e652deb..ca813697728e 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -2715,7 +2715,7 @@ static int cake_init(struct Qdisc *sch, struct nlattr 
*opt,
qdisc_watchdog_init(&q->watchdog, sch);
 
if (opt) {
-   int err = cake_change(sch, opt, extack);
+   err = cake_change(sch, opt, extack);
 
if (err)
return err;
@@ -3032,7 +3032,7 @@ static int cake_dump_class_stats(struct Qdisc *sch, 
unsigned long cl,
PUT_STAT_S32(BLUE_TIMER_US,
 ktime_to_us(
 ktime_sub(now,
-flow->cvars.blue_timer)));
+  
flow->cvars.blue_timer)));
}
if (flow->cvars.dropping) {
PUT_STAT_S32(DROP_NEXT_US,

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net 0/3] sched: A couple of fixes for sch_cake

2020-06-25 Thread Toke Høiland-Jørgensen
Hi Dave

This series contains a couple of fixes for diffserv handling in sch_cake that
provide a nice speedup (with a somewhat pedantic nit fix tacked on to the end).

Not quite sure about whether this should go to stable; it does provide a nice
speedup, but it's not strictly a fix in the "correctness" sense. I lean towards
including this in stable as well, since our most important consumer of that
(OpenWrt) is likely to backport the series anyway.

-Toke

---

Ilya Ponetayev (1):
  sch_cake: don't try to reallocate or unshare skb unconditionally

Toke Høiland-Jørgensen (2):
  sch_cake: don't call diffserv parsing code when it is not needed
  sch_cake: fix a few style nits


 net/sched/sch_cake.c | 17 +++--
 1 file changed, 11 insertions(+), 6 deletions(-)

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net 1/3] sch_cake: don't try to reallocate or unshare skb unconditionally

2020-06-25 Thread Toke Høiland-Jørgensen
From: Ilya Ponetayev 

cake_handle_diffserv() tries to linearize mac and network header parts of
skb and to make it writable unconditionally. In some cases it leads to full
skb reallocation, which reduces throughput and increases CPU load. Some
measurements of IPv4 forward + NAPT on MIPS router with 580 MHz single-core
CPU was conducted. It appears that on kernel 4.9 skb_try_make_writable()
reallocates skb, if skb was allocated in ethernet driver via so-called
'build skb' method from page cache (it was discovered by strange increase
of kmalloc-2048 slab at first).

Obtain DSCP value via read-only skb_header_pointer() call, and leave
linearization only for DSCP bleaching or ECN CE setting. And, as an
additional optimisation, skip diffserv parsing entirely if it is not needed
by the current configuration.

Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before 
changing DSCP bits")
Signed-off-by: Ilya Ponetayev 
[ fix a few style issues, reflow commit message ]
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c |   41 ++---
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 60f8ae578819..cae006bef565 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1553,30 +1553,49 @@ static unsigned int cake_drop(struct Qdisc *sch, struct 
sk_buff **to_free)
 
 static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
 {
-   int wlen = skb_network_offset(skb);
+   const int offset = skb_network_offset(skb);
+   u16 *buf, buf_;
u8 dscp;
 
switch (tc_skb_protocol(skb)) {
case htons(ETH_P_IP):
-   wlen += sizeof(struct iphdr);
-   if (!pskb_may_pull(skb, wlen) ||
-   skb_try_make_writable(skb, wlen))
+   buf = skb_header_pointer(skb, offset, sizeof(buf_), &buf_);
+   if (unlikely(!buf))
return 0;
 
-   dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
-   if (wash && dscp)
+   /* ToS is in the second byte of iphdr */
+   dscp = ipv4_get_dsfield((struct iphdr *)buf) >> 2;
+
+   if (wash && dscp) {
+   const int wlen = offset + sizeof(struct iphdr);
+
+   if (!pskb_may_pull(skb, wlen) ||
+   skb_try_make_writable(skb, wlen))
+   return 0;
+
ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, 0);
+   }
+
return dscp;
 
case htons(ETH_P_IPV6):
-   wlen += sizeof(struct ipv6hdr);
-   if (!pskb_may_pull(skb, wlen) ||
-   skb_try_make_writable(skb, wlen))
+   buf = skb_header_pointer(skb, offset, sizeof(buf_), &buf_);
+   if (unlikely(!buf))
return 0;
 
-   dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
-   if (wash && dscp)
+   /* Traffic class is in the first and second bytes of ipv6hdr */
+   dscp = ipv6_get_dsfield((struct ipv6hdr *)buf) >> 2;
+
+   if (wash && dscp) {
+   const int wlen = offset + sizeof(struct ipv6hdr);
+
+   if (!pskb_may_pull(skb, wlen) ||
+   skb_try_make_writable(skb, wlen))
+   return 0;
+
ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, 0);
+   }
+
return dscp;
 
case htons(ETH_P_ARP):

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net 2/3] sch_cake: don't call diffserv parsing code when it is not needed

2020-06-25 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

As a further optimisation of the diffserv parsing codepath, we can skip it
entirely if CAKE is configured to neither use diffserv-based
classification, nor to zero out the diffserv bits.

Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before 
changing DSCP bits")
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c |   13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index cae006bef565..094d6e652deb 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1551,7 +1551,7 @@ static unsigned int cake_drop(struct Qdisc *sch, struct 
sk_buff **to_free)
return idx + (tin << 16);
 }
 
-static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
+static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
 {
const int offset = skb_network_offset(skb);
u16 *buf, buf_;
@@ -1612,14 +1612,17 @@ static struct cake_tin_data *cake_select_tin(struct 
Qdisc *sch,
 {
struct cake_sched_data *q = qdisc_priv(sch);
u32 tin, mark;
+   bool wash;
u8 dscp;
 
/* Tin selection: Default to diffserv-based selection, allow overriding
-* using firewall marks or skb->priority.
+* using firewall marks or skb->priority. Call DSCP parsing early if
+* wash is enabled, otherwise defer to below to skip unneeded parsing.
 */
-   dscp = cake_handle_diffserv(skb,
-   q->rate_flags & CAKE_FLAG_WASH);
mark = (skb->mark & q->fwmark_mask) >> q->fwmark_shft;
+   wash = !!(q->rate_flags & CAKE_FLAG_WASH);
+   if (wash)
+   dscp = cake_handle_diffserv(skb, wash);
 
if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
tin = 0;
@@ -1633,6 +1636,8 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc 
*sch,
tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
 
else {
+   if (!wash)
+   dscp = cake_handle_diffserv(skb, wash);
tin = q->tin_index[dscp];
 
if (unlikely(tin >= q->tin_cnt))

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags

2020-06-25 Thread Toke Høiland-Jørgensen
David Miller  writes:

> From: Toke Høiland-Jørgensen 
> Date: Thu, 25 Jun 2020 13:55:03 +0200
>
>> From: Ilya Ponetayev 
>> 
>> CAKE was using the return value of tc_skb_protocol() and expecting it to be
>> the IP protocol type. This can fail in the presence of QinQ VLAN tags,
>> making CAKE unable to handle ECN marking and diffserv parsing in this case.
>> Fix this by implementing our own version of tc_skb_protocol(), which will
>> use skb->protocol directly, but also parse and skip over any VLAN tags and
>> return the inner protocol number instead.
>> 
>> Also fix CE marking by implementing a version of INET_ECN_set_ce() that
>> uses the same parsing routine.
>> 
>> Fixes: ea82511518f4 ("sch_cake: Add NAT awareness to packet classifier")
>> Fixes: b2100cc56fca ("sch_cake: Use tc_skb_protocol() helper for getting 
>> packet protocol")
>> Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) 
>> qdisc")
>> Signed-off-by: Ilya Ponetayev 
>> [ squash original two patches, rewrite commit message ]
>> Signed-off-by: Toke Høiland-Jørgensen 
>
> First, this is a bug fix and should probably be steered to 'net'.
>
> Also, other users of tc_skb_protocol() are almost certainly hitting a
> similar problem aren't they?  Maybe fix this generically.

I think it depends a little on the use case; some callers actually care
about the VLAN tags themselves and handle that specially (e.g.,
act_csum). Whereas others (e.g., sch_dsmark) probably will have the same
issue. I guess I can trying going through them all and figuring out if
there's a more generic solution.

I'll split out the diffserv parsing fixes and send those for your net
tree straight away, then circle back to this one...

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake

2020-06-25 Thread Toke Høiland-Jørgensen
David Miller  writes:

> From: Toke Høiland-Jørgensen 
> Date: Thu, 25 Jun 2020 13:55:02 +0200
>
>> The first three patches in the series are candidates for inclusion into 
>> stable.
>
> Stable candidates, ie. fixes, should target 'net' not 'net-next'.

Right, sure, can do; I was just being lazy and putting everything in one
series :)

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net-next 3/5] sch_cake: don't call diffserv parsing code when it is not needed

2020-06-25 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

As a further optimisation of the diffserv parsing codepath, we can skip it
entirely if CAKE is neither configured to use diffserv-based
classification, nor to zero out the diffserv bits.

Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before 
changing DSCP bits")
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c |   13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index cebcc36755ac..958523c777be 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1597,7 +1597,7 @@ static unsigned int cake_drop(struct Qdisc *sch, struct 
sk_buff **to_free)
return idx + (tin << 16);
 }
 
-static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
+static u8 cake_handle_diffserv(struct sk_buff *skb, bool wash)
 {
const int offset = skb_network_offset(skb);
u16 *buf, buf_;
@@ -1658,14 +1658,17 @@ static struct cake_tin_data *cake_select_tin(struct 
Qdisc *sch,
 {
struct cake_sched_data *q = qdisc_priv(sch);
u32 tin, mark;
+   bool wash;
u8 dscp;
 
/* Tin selection: Default to diffserv-based selection, allow overriding
-* using firewall marks or skb->priority.
+* using firewall marks or skb->priority. Call DSCP parsing early if
+* wash is enabled, otherwise defer to below to skip unneeded parsing.
 */
-   dscp = cake_handle_diffserv(skb,
-   q->rate_flags & CAKE_FLAG_WASH);
mark = (skb->mark & q->fwmark_mask) >> q->fwmark_shft;
+   wash = !!(q->rate_flags & CAKE_FLAG_WASH);
+   if (wash)
+   dscp = cake_handle_diffserv(skb, wash);
 
if (q->tin_mode == CAKE_DIFFSERV_BESTEFFORT)
tin = 0;
@@ -1679,6 +1682,8 @@ static struct cake_tin_data *cake_select_tin(struct Qdisc 
*sch,
tin = q->tin_order[TC_H_MIN(skb->priority) - 1];
 
else {
+   if (!wash)
+   dscp = cake_handle_diffserv(skb, wash);
tin = q->tin_index[dscp];
 
if (unlikely(tin >= q->tin_cnt))

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net-next 0/5] sched: A series of fixes and optimisations for sch_cake

2020-06-25 Thread Toke Høiland-Jørgensen
Hi Dave

This series contains a number of fixes and optimisations for sch_cake that we've
accumulated in the out-of-tree version. See the individual patch descriptions
for more details.

The first three patches in the series are candidates for inclusion into stable.

-Toke

---

Ilya Ponetayev (2):
  sch_cake: fix IP protocol handling in the presence of VLAN tags
  sch_cake: don't try to reallocate or unshare skb unconditionally

Kevin Darbyshire-Bryant (1):
  sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling

Toke Høiland-Jørgensen (2):
  sch_cake: don't call diffserv parsing code when it is not needed
  sch_cake: fix a few style nits


 net/sched/sch_cake.c | 66 ++--
 1 file changed, 45 insertions(+), 21 deletions(-)

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net-next 2/5] sch_cake: don't try to reallocate or unshare skb unconditionally

2020-06-25 Thread Toke Høiland-Jørgensen
From: Ilya Ponetayev 

cake_handle_diffserv() tries to linearize mac and network header parts of
skb and to make it writable unconditionally. In some cases it leads to full
skb reallocation, which reduces throughput and increases CPU load. Some
measurements of IPv4 forward + NAPT on MIPS router with 580 MHz single-core
CPU was conducted. It appears that on kernel 4.9 skb_try_make_writable()
reallocates skb, if skb was allocated in ethernet driver via so-called
'build skb' method from page cache (it was discovered by strange increase
of kmalloc-2048 slab at first).

Obtain DSCP value via read-only skb_header_pointer() call, and leave
linearization only for DSCP bleaching or ECN CE setting. And, as an
additional optimisation, skip diffserv parsing entirely if it is not needed
by the current configuration.

Fixes: c87b4ecdbe8d ("sch_cake: Make sure we can write the IP header before 
changing DSCP bits")
Signed-off-by: Ilya Ponetayev 
[ fix a few style issues, reflow commit message ]
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c |   41 ++---
 1 file changed, 30 insertions(+), 11 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 0f594d88a957..cebcc36755ac 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -1599,30 +1599,49 @@ static unsigned int cake_drop(struct Qdisc *sch, struct 
sk_buff **to_free)
 
 static u8 cake_handle_diffserv(struct sk_buff *skb, u16 wash)
 {
-   int wlen = skb_network_offset(skb);
+   const int offset = skb_network_offset(skb);
+   u16 *buf, buf_;
u8 dscp;
 
switch (cake_skb_proto(skb)) {
case htons(ETH_P_IP):
-   wlen += sizeof(struct iphdr);
-   if (!pskb_may_pull(skb, wlen) ||
-   skb_try_make_writable(skb, wlen))
+   buf = skb_header_pointer(skb, offset, sizeof(buf_), &buf_);
+   if (unlikely(!buf))
return 0;
 
-   dscp = ipv4_get_dsfield(ip_hdr(skb)) >> 2;
-   if (wash && dscp)
+   /* ToS is in the second byte of iphdr */
+   dscp = ipv4_get_dsfield((struct iphdr *)buf) >> 2;
+
+   if (wash && dscp) {
+   const int wlen = offset + sizeof(struct iphdr);
+
+   if (!pskb_may_pull(skb, wlen) ||
+   skb_try_make_writable(skb, wlen))
+   return 0;
+
ipv4_change_dsfield(ip_hdr(skb), INET_ECN_MASK, 0);
+   }
+
return dscp;
 
case htons(ETH_P_IPV6):
-   wlen += sizeof(struct ipv6hdr);
-   if (!pskb_may_pull(skb, wlen) ||
-   skb_try_make_writable(skb, wlen))
+   buf = skb_header_pointer(skb, offset, sizeof(buf_), &buf_);
+   if (unlikely(!buf))
return 0;
 
-   dscp = ipv6_get_dsfield(ipv6_hdr(skb)) >> 2;
-   if (wash && dscp)
+   /* Traffic class is in the first and second bytes of ipv6hdr */
+   dscp = ipv6_get_dsfield((struct ipv6hdr *)buf) >> 2;
+
+   if (wash && dscp) {
+   const int wlen = offset + sizeof(struct ipv6hdr);
+
+   if (!pskb_may_pull(skb, wlen) ||
+   skb_try_make_writable(skb, wlen))
+   return 0;
+
ipv6_change_dsfield(ipv6_hdr(skb), INET_ECN_MASK, 0);
+   }
+
return dscp;
 
case htons(ETH_P_ARP):

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net-next 1/5] sch_cake: fix IP protocol handling in the presence of VLAN tags

2020-06-25 Thread Toke Høiland-Jørgensen
From: Ilya Ponetayev 

CAKE was using the return value of tc_skb_protocol() and expecting it to be
the IP protocol type. This can fail in the presence of QinQ VLAN tags,
making CAKE unable to handle ECN marking and diffserv parsing in this case.
Fix this by implementing our own version of tc_skb_protocol(), which will
use skb->protocol directly, but also parse and skip over any VLAN tags and
return the inner protocol number instead.

Also fix CE marking by implementing a version of INET_ECN_set_ce() that
uses the same parsing routine.

Fixes: ea82511518f4 ("sch_cake: Add NAT awareness to packet classifier")
Fixes: b2100cc56fca ("sch_cake: Use tc_skb_protocol() helper for getting packet 
protocol")
Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) 
qdisc")
Signed-off-by: Ilya Ponetayev 
[ squash original two patches, rewrite commit message ]
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c |   52 +++---
 1 file changed, 49 insertions(+), 3 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 60f8ae578819..0f594d88a957 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -497,6 +497,52 @@ static bool cobalt_queue_empty(struct cobalt_vars *vars,
return down;
 }
 
+static __be16 cake_skb_proto(const struct sk_buff *skb)
+{
+   unsigned int offset = skb_mac_offset(skb) + sizeof(struct ethhdr);
+   __be16 proto = skb->protocol;
+   struct vlan_hdr vhdr, *vh;
+
+   while (proto == htons(ETH_P_8021Q) || proto == htons(ETH_P_8021AD)) {
+   vh = skb_header_pointer(skb, offset, sizeof(vhdr), &vhdr);
+   if (!vh)
+   break;
+
+   proto = vh->h_vlan_encapsulated_proto;
+   offset += sizeof(vhdr);
+   }
+
+   return proto;
+}
+
+static int cake_set_ce(struct sk_buff *skb)
+{
+   int wlen = skb_network_offset(skb);
+
+   switch (cake_skb_proto(skb)) {
+   case htons(ETH_P_IP):
+   wlen += sizeof(struct iphdr);
+   if (!pskb_may_pull(skb, wlen) ||
+   skb_try_make_writable(skb, wlen))
+   return 0;
+
+   return IP_ECN_set_ce(ip_hdr(skb));
+
+   case htons(ETH_P_IPV6):
+   wlen += sizeof(struct ipv6hdr);
+   if (!pskb_may_pull(skb, wlen) ||
+   skb_try_make_writable(skb, wlen))
+   return 0;
+
+   return IP6_ECN_set_ce(skb, ipv6_hdr(skb));
+
+   default:
+   return 0;
+   }
+
+   return 0;
+}
+
 /* Call this with a freshly dequeued packet for possible congestion marking.
  * Returns true as an instruction to drop the packet, false for delivery.
  */
@@ -549,7 +595,7 @@ static bool cobalt_should_drop(struct cobalt_vars *vars,
 
if (next_due && vars->dropping) {
/* Use ECN mark if possible, otherwise drop */
-   drop = !(vars->ecn_marked = INET_ECN_set_ce(skb));
+   drop = !(vars->ecn_marked = cake_set_ce(skb));
 
vars->count++;
if (!vars->count)
@@ -592,7 +638,7 @@ static bool cake_update_flowkeys(struct flow_keys *keys,
bool rev = !skb->_nfct, upd = false;
__be32 ip;
 
-   if (tc_skb_protocol(skb) != htons(ETH_P_IP))
+   if (cake_skb_proto(skb) != htons(ETH_P_IP))
return false;
 
if (!nf_ct_get_tuple_skb(&tuple, skb))
@@ -1556,7 +1602,7 @@ static u8 cake_handle_diffserv(struct sk_buff *skb, u16 
wash)
int wlen = skb_network_offset(skb);
u8 dscp;
 
-   switch (tc_skb_protocol(skb)) {
+   switch (cake_skb_proto(skb)) {
case htons(ETH_P_IP):
wlen += sizeof(struct iphdr);
if (!pskb_may_pull(skb, wlen) ||

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net-next 5/5] sch_cake: fix a few style nits

2020-06-25 Thread Toke Høiland-Jørgensen
From: Toke Høiland-Jørgensen 

I spotted a few nits when comparing the in-tree version of sch_cake with
the out-of-tree one: A redundant error variable declaration shadowing an
outer declaration, and an indentation alignment issue. Fix both of these to
minimise the delta.

Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 78a702a4e1d4..e075913b2fd7 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -2761,7 +2761,7 @@ static int cake_init(struct Qdisc *sch, struct nlattr 
*opt,
qdisc_watchdog_init(&q->watchdog, sch);
 
if (opt) {
-   int err = cake_change(sch, opt, extack);
+   err = cake_change(sch, opt, extack);
 
if (err)
return err;
@@ -3078,7 +3078,7 @@ static int cake_dump_class_stats(struct Qdisc *sch, 
unsigned long cl,
PUT_STAT_S32(BLUE_TIMER_US,
 ktime_to_us(
 ktime_sub(now,
-flow->cvars.blue_timer)));
+  
flow->cvars.blue_timer)));
}
if (flow->cvars.dropping) {
PUT_STAT_S32(DROP_NEXT_US,

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net-next 4/5] sch_cake: add RFC 8622 LE PHB support to CAKE diffserv handling

2020-06-25 Thread Toke Høiland-Jørgensen
From: Kevin Darbyshire-Bryant 

Change tin mapping on diffserv3, 4 & 8 for LE PHB support, in essence
making LE a member of the Bulk tin.

Bulk has the least priority and minimum of 1/16th total bandwidth in the
face of higher priority traffic.

NB: Diffserv 3 & 4 swap tin 0 & 1 priorities from the default order as
found in diffserv8, in case anyone is wondering why it looks a bit odd.

Signed-off-by: Kevin Darbyshire-Bryant 
[ reword commit message slightly ]
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 958523c777be..78a702a4e1d4 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -312,8 +312,8 @@ static const u8 precedence[] = {
 };
 
 static const u8 diffserv8[] = {
-   2, 5, 1, 2, 4, 2, 2, 2,
-   0, 2, 1, 2, 1, 2, 1, 2,
+   2, 0, 1, 2, 4, 2, 2, 2,
+   1, 2, 1, 2, 1, 2, 1, 2,
5, 2, 4, 2, 4, 2, 4, 2,
3, 2, 3, 2, 3, 2, 3, 2,
6, 2, 3, 2, 3, 2, 3, 2,
@@ -323,7 +323,7 @@ static const u8 diffserv8[] = {
 };
 
 static const u8 diffserv4[] = {
-   0, 2, 0, 0, 2, 0, 0, 0,
+   0, 1, 0, 0, 2, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0,
2, 0, 2, 0, 2, 0, 2, 0,
2, 0, 2, 0, 2, 0, 2, 0,
@@ -334,7 +334,7 @@ static const u8 diffserv4[] = {
 };
 
 static const u8 diffserv3[] = {
-   0, 0, 0, 0, 2, 0, 0, 0,
+   0, 1, 0, 0, 2, 0, 0, 0,
1, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [CAKE] Rate is much lower than expected - CPU load is higher than expected

2020-06-23 Thread Toke Høiland-Jørgensen
Jose Blanquicet  writes:

> Hi Toke,
>
> Thanks for your reply.
>
> On Mon, Jun 22, 2020 at 5:47 PM Toke Høiland-Jørgensen  
> wrote:
>> > We have an embedded system with limited CPU resources that acts as a
>> > gateway to provide Internet access from LTE to a private USB-NCM
>> > network (And also to a Wi-Fi private network but we will work on it
>> > later). Our problem is that the bandwidth on LTE and USB link is
>> > higher than what the system is able to handle thus it reaches 100% of
>> > CPU load when we perform a simple speed test from a device on the
>> > private network.
>>
>> What speeds were you getting without shaping?
>
> Between 35 and 40Mbps.
>
>> > Therefore, we want to limit the bandwidth to avoid system getting
>> > saturated in such use-case. To do so, we thought to use the CAKE on
>> > the USB interface. For instance, we tried:
>> >
>> > tc qdisc replace root dev eth0 cake bandwidth 20mbit ethernet
>> > internet flowblind nonat besteffort nowash
>> >
>> > It worked correctly and the maximum rate was limited but there are two
>> > things that are worrying us:
>> >
>> > 1) The maximum rate reached after applying CAKE was in between 12Mbps
>> > and 15Mbps which is quite lower than the 20Mbps we are configuring, we
>> > were expecting around 18-19. Why? Is there something in the parameters
>> > we are doing wrong? Please take into account that our goal is to limit
>> > the rate but adding as little CPU load as possible.
>>
>> Hmm, are you actually running out of CPU? I.e., is the CPU pegged at
>> 100% when you hit this limit? What kind of platform are you running on?
>> And what kernel and CAKE versions are you using?
>
> I checked the CPU with top and there is still free CPU to be used. We
> also tried with lower values like 10 and it is again far away from the
> configured limit.
>
> We have just a percentage of an ARM Cortex A7 (1.2GHz) because the
> rest is reserved for modem. We are now trying to optimize all the
> applications in the system but LTE<->WIFI/USB data transfer is indeed
> the
> use-case that puts our system in crisis.
>
> The kernel version is 3.18 and for we are using the latest commit on
> master branch (9d79e2b) for CAKE. In case, we could change CAKE but
> not the kernel version, at most some specific patches.

Right, well if you're not running out of CPU I guess it could be a
timing issue. The CAKE shaper relies on accurate timestamps and the
qdisc watchdog timer to schedule the transmission of packets. A loaded
system can simply miss deadlines, which would be consistent with the
behaviour you're seeing.

In fact, when looking into this a bit more, I came across this commit
that seemed to observe the same behaviour in sch_fq:
https://git.kernel.org/torvalds/c/fefa569a9d4b

So I guess we could try to do something similar in CAKE.

Could you please post the output of 'tc -s qdisc' after a test run? That
should give some indication on how much the shaper is throttling...

>> > 2) The CPU load added by CAKE was not negligible for our system. In
>> > fact, we compared the CPU load when limitation was done by CAKE and by
>> > the device on the private network, e.g. curl tool with parameter
>> > "--limit-rate". As a result, we found that the CPU load when using
>> > CAKE was 30%. Is there any way to make it lighter with a different
>> > configuration?
>>
>> No, you've already turned off most of the features that might incur
>> overhead, so I don't think there's anything more you can do
>> configuration-wise to improve CPU load. Shaping does tend to use up a
>> lot of CPU, so it's not too surprising you run into issues here.
>
> Could you please help us to identify which one is still active? We
> thought we had already turned off all the features not needed to apply
> a limitation with a single queue (Besteffor mode).

Well the only thing more you can turn off by configuration is the shaper
itself :)

>> We did recently get a pull request whose author states that he was
>> seeing a 1/3 improvement in performance from it. See:
>> https://github.com/dtaht/sch_cake/pull/136
>>
>> You could try this; if your ingress network device driver has the same
>> issue with skbs being allocated in smaller bits, you may see a similar
>> increase with this patch. For a quick test you could also just try
>> commenting out the call to cake_handle_diffserv() entirely since you're
>> running in besteffort mode anyway :)
>
> Interesting. We will try this, we commented out the call to
> cake_handle_diffserv() as you said and just to be sure, we also
> applied the 2nd commit of the PR. I will be back soon with news.

OK, great!

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [CAKE] Rate is much lower than expected - CPU load is higher than expected

2020-06-22 Thread Toke Høiland-Jørgensen
Jose Blanquicet  writes:

> Hi everyone,
>
> We have an embedded system with limited CPU resources that acts as a
> gateway to provide Internet access from LTE to a private USB-NCM
> network (And also to a Wi-Fi private network but we will work on it
> later). Our problem is that the bandwidth on LTE and USB link is
> higher than what the system is able to handle thus it reaches 100% of
> CPU load when we perform a simple speed test from a device on the
> private network.

What speeds were you getting without shaping?

> Therefore, we want to limit the bandwidth to avoid system getting
> saturated in such use-case. To do so, we thought to use the CAKE on
> the USB interface. For instance, we tried:
>
> tc qdisc replace root dev eth0 cake bandwidth 20mbit ethernet
> internet flowblind nonat besteffort nowash
>
> It worked correctly and the maximum rate was limited but there are two
> things that are worrying us:
>
> 1) The maximum rate reached after applying CAKE was in between 12Mbps
> and 15Mbps which is quite lower than the 20Mbps we are configuring, we
> were expecting around 18-19. Why? Is there something in the parameters
> we are doing wrong? Please take into account that our goal is to limit
> the rate but adding as little CPU load as possible.

Hmm, are you actually running out of CPU? I.e., is the CPU pegged at
100% when you hit this limit? What kind of platform are you running on?
And what kernel and CAKE versions are you using?

> 2) The CPU load added by CAKE was not negligible for our system. In
> fact, we compared the CPU load when limitation was done by CAKE and by
> the device on the private network, e.g. curl tool with parameter
> "--limit-rate". As a result, we found that the CPU load when using
> CAKE was 30%. Is there any way to make it lighter with a different
> configuration?

No, you've already turned off most of the features that might incur
overhead, so I don't think there's anything more you can do
configuration-wise to improve CPU load. Shaping does tend to use up a
lot of CPU, so it's not too surprising you run into issues here.

We did recently get a pull request whose author states that he was
seeing a 1/3 improvement in performance from it. See:
https://github.com/dtaht/sch_cake/pull/136

You could try this; if your ingress network device driver has the same
issue with skbs being allocated in smaller bits, you may see a similar
increase with this patch. For a quick test you could also just try
commenting out the call to cake_handle_diffserv() entirely since you're
running in besteffort mode anyway :)

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [Ecn-sane] Fwd: [tsvwg] Fwd: Working Group Last Call: QUIC protocol drafts

2020-06-10 Thread Toke Høiland-Jørgensen via Cake
--- Begin Message ---
Dave Taht  writes:

> I am happy to see quic in last call. there are a ton of interoperble
> implementations now.

And in related news, this RFC of pacing from userspace was posted on
netdev yesterday:

https://lore.kernel.org/netdev/20200609140934.110785-1-willemdebruijn.ker...@gmail.com/T/

-Toke
--- End Message ---
___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] [PATCH net] sch_cake: Take advantage of skb->hash where appropriate

2020-05-29 Thread Toke Høiland-Jørgensen
Jakub Kicinski  writes:

> On Fri, 29 May 2020 14:43:44 +0200 Toke Høiland-Jørgensen wrote:
>> + * enabled there's another check below after doing the conntrack lookup.
>> +  */
>
> nit: alignment

Ah, right, seems I forget to hit  after adding that end marker.

Davem, can I get you to fix that when applying, or should I send a v2?

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


[Cake] [PATCH net] sch_cake: Take advantage of skb->hash where appropriate

2020-05-29 Thread Toke Høiland-Jørgensen
While the other fq-based qdiscs take advantage of skb->hash and doesn't
recompute it if it is already set, sch_cake does not.

This was a deliberate choice because sch_cake hashes various parts of the
packet header to support its advanced flow isolation modes. However,
foregoing the use of skb->hash entirely loses a few important benefits:

- When skb->hash is set by hardware, a few CPU cycles can be saved by not
  hashing again in software.

- Tunnel encapsulations will generally preserve the value of skb->hash from
  before the encapsulation, which allows flow-based qdiscs to distinguish
  between flows even though the outer packet header no longer has flow
  information.

It turns out that we can preserve these desirable properties in many cases,
while still supporting the advanced flow isolation properties of sch_cake.
This patch does so by reusing the skb->hash value as the flow_hash part of
the hashing procedure in cake_hash() only in the following conditions:

- If the skb->hash is marked as covering the flow headers (skb->l4_hash is
  set)

AND

- NAT header rewriting is either disabled, or did not change any values
  used for hashing. The latter is important to match local-origin packets
  such as those of a tunnel endpoint.

The immediate motivation for fixing this was the recent patch to WireGuard
to preserve the skb->hash on encapsulation. As such, this is also what I
tested against; with this patch, added latency under load for competing
flows drops from ~8 ms to sub-1ms on an RRUL test over a WireGuard tunnel
going through a virtual link shaped to 1Gbps using sch_cake. This matches
the results we saw with a similar setup using sch_fq_codel when testing the
WireGuard patch.

Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) 
qdisc")
Signed-off-by: Toke Høiland-Jørgensen 
---
 net/sched/sch_cake.c | 65 ++--
 1 file changed, 51 insertions(+), 14 deletions(-)

diff --git a/net/sched/sch_cake.c b/net/sched/sch_cake.c
index 1496e87cd07b..2a704b25dc6a 100644
--- a/net/sched/sch_cake.c
+++ b/net/sched/sch_cake.c
@@ -584,26 +584,48 @@ static bool cobalt_should_drop(struct cobalt_vars *vars,
return drop;
 }
 
-static void cake_update_flowkeys(struct flow_keys *keys,
+static bool cake_update_flowkeys(struct flow_keys *keys,
 const struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
struct nf_conntrack_tuple tuple = {};
-   bool rev = !skb->_nfct;
+   bool rev = !skb->_nfct, upd = false;
+   __be32 ip;
 
if (tc_skb_protocol(skb) != htons(ETH_P_IP))
-   return;
+   return false;
 
if (!nf_ct_get_tuple_skb(&tuple, skb))
-   return;
+   return false;
 
-   keys->addrs.v4addrs.src = rev ? tuple.dst.u3.ip : tuple.src.u3.ip;
-   keys->addrs.v4addrs.dst = rev ? tuple.src.u3.ip : tuple.dst.u3.ip;
+   ip = rev ? tuple.dst.u3.ip : tuple.src.u3.ip;
+   if (ip != keys->addrs.v4addrs.src) {
+   keys->addrs.v4addrs.src = ip;
+   upd = true;
+   }
+   ip = rev ? tuple.src.u3.ip : tuple.dst.u3.ip;
+   if (ip != keys->addrs.v4addrs.dst) {
+   keys->addrs.v4addrs.dst = ip;
+   upd = true;
+   }
 
if (keys->ports.ports) {
-   keys->ports.src = rev ? tuple.dst.u.all : tuple.src.u.all;
-   keys->ports.dst = rev ? tuple.src.u.all : tuple.dst.u.all;
+   __be16 port;
+
+   port = rev ? tuple.dst.u.all : tuple.src.u.all;
+   if (port != keys->ports.src) {
+   keys->ports.src = port;
+   upd = true;
+   }
+   port = rev ? tuple.src.u.all : tuple.dst.u.all;
+   if (port != keys->ports.dst) {
+   port = keys->ports.dst;
+   upd = true;
+   }
}
+   return upd;
+#else
+   return false;
 #endif
 }
 
@@ -624,23 +646,36 @@ static bool cake_ddst(int flow_mode)
 static u32 cake_hash(struct cake_tin_data *q, const struct sk_buff *skb,
 int flow_mode, u16 flow_override, u16 host_override)
 {
+   bool hash_flows = (!flow_override && !!(flow_mode & CAKE_FLOW_FLOWS));
+   bool hash_hosts = (!host_override && !!(flow_mode & CAKE_FLOW_HOSTS));
+   bool nat_enabled = !!(flow_mode & CAKE_FLOW_NAT_FLAG);
u32 flow_hash = 0, srchost_hash = 0, dsthost_hash = 0;
u16 reduced_hash, srchost_idx, dsthost_idx;
struct flow_keys keys, host_keys;
+   bool use_skbhash = skb->l4_hash;
 
if (unlikely(flow_mode == CAKE_FLOW_NONE))
return 0;
 
-   /* If both overrides are set we can skip packet dissection entirely */
-   if ((flow_override || !(flow_mode & CAKE_FLOW_FLOWS)

Re: [Cake] [Make-wifi-fast] [Bloat] dslreports is no longer free

2020-05-27 Thread Toke Høiland-Jørgensen
Matthew Ford  writes:

> What's the bufferbloat verdict on https://speed.cloudflare.com/ ?

Huh, didn't know about that. Seems they're measuring the latency before
the download test, though, so no bufferbloat numbers. If anyone knows
someone at Cloudflare we could try to bug to get this fixed, that would
be awesome!

Their FAQ links to https://www.speedcheck.org/ for "troubleshooting
tips". And of course that page doesn't seem to mention latency or
bufferbloat at all :(

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Query on ACK

2020-05-25 Thread Toke Høiland-Jørgensen
Jonathan Morton  writes:

>> On 25 May, 2020, at 8:17 am, Avakash bhat  wrote:
>> 
>> We had another query we would like to resolve. We wanted to verify the 
>> working of ack filter in ns-3, 
>> so we decided to replicate the Fig 6 graph in the CAKE 
>> paper(https://ieeexplore.ieee.org/document/8475045). 
>> While trying to build the topology we realized that we do not know the 
>> number of packets or bytes sent from 
>> the source to the destination for each of the TCP connections ( We are 
>> assuming it is a point to point connection with 4 TCP flows). 
>> 
>> Could we get a bit more details about how the experiment was conducted?
>
> I believe this was conducted using the RRUL test in Flent.  This opens
> four saturating TCP flows in each direction, and also sends a small
> amount of latency measuring traffic.  On this occasion I don't think
> we added any simulated path delays, and only imposed the quoted
> asymmetric bandwidth limits (30Mbps down, 1Mbps up).

See https://www.cs.kau.se/tohojo/cake/ - the link to the data files near
the bottom of that page also contains the Flent batch file and setup
scripts used to run the whole thing.

(And there's no explicit "number of bytes sent", but rather the flows
are capacity-seeking flows running for a limited *time*).

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Is target a command-line option?

2020-05-22 Thread Toke Høiland-Jørgensen
"Aarti Nandagiri. via Cake"  writes:

> From: "Aarti Nandagiri." 
> Subject: Is target a command-line option?
> To: cake@lists.bufferbloat.net
> Date: Thu, 21 May 2020 17:12:00 +0530
>
> Hello,
>
> Can I use a different 'target' value by passing it as a command-line option
> for the cake qdisc?

Not directly. You can use one of the presents (datacentre, lan, metro,
regional, internet, oceanic, satellite, interplanetary) to select a
preset target/interval setting, or you can use 'rtt X' to set the CoDel,
in which case 'target' will be set to interval/20.

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


Re: [Cake] Query on ACK

2020-05-08 Thread Toke Høiland-Jørgensen
Dave Taht  writes:

> On Thu, May 7, 2020 at 11:36 PM Avakash bhat  wrote:
>>
>> Ok thanks so much for the clarifications.
>> That cleared it up quite a bit.
>
> I note that there was something really subtle that could have been
> done to improve cake's ack handling, and for all I know
> it actually happened in the final codebase.
>
> so, please, go forth and duplicate the existing implementation, and
> ignore me, cause looking at this hairy code gives me a
> headache.
>
> anyway, to try and describe what I thought I saw an interaction with
> the scheduler back in the day.
>
> The ack-filter runs, deleting all but one packet from the ack queue,
> and delivers that.
> the scheduler runs, serves a bunch of other flows, then returns to the
> ack queue, which has accumulated a couple more packets,
> the ack-filter runs, deleting all but one packet from the ack queue,
> and delivers that, but doesn't exhaust its qauntum

The ACK filter runs on enqueue, so if a queue has only ACKs in it, it
will never accumulate anything in the first place...

-Toke

___
Cake mailing list
Cake@lists.bufferbloat.net
https://lists.bufferbloat.net/listinfo/cake


  1   2   3   4   5   6   7   >