Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-16 Thread Jamal Hadi Salim

Sorry I dropped this.

On 14/05/18 10:08 AM, Michel Machado wrote:

On 09/05/18 01:37 PM, Michel Machado wrote:




A simplified description of what DSprio is meant to do is as follows: 
when a link is overloaded at a router, DSprio makes this router drop the 
packets of lower priority.


Makes sense. Any priority based work-conserving scheduler will work
fine. The only small difference you have with prio qdisc is you
drop an enqueued low prio packet to make room for a new higher prio
packet. Can you look at pfifo_head_drop qdisc to see if it suffices? It
may not: in such a case, I would suggest a hybrid between
pfifo_head_drop and pfifo_fast for the new qdisc.
[Cong has suggested to write a classful qdisc but it may be sufficient
to just replicate what pfifo_fast does since it tracks virtual queues]

These priorities are assigned by Gatekeeper 
in such a way that well behaving sources are favored (Theorem 4.1 of the 
Portcullis paper pointed out in my previous email). Moreover, attackers 
cannot do much better than well behaving sources (Theorem 4.2). This 
description is simplified because it omits many other components of 
Gatekeeper that affect the packets that go to DSprio.




I am sorry - I have no access to this document so I don't know what these
theorems are. I understand your requirements. 1) You are looking to use
priority identifiers to select queues. 2) You want to prioritize
treatment of favorably tagged packets. The enqueueing will drop
lower priority packets to make space for higher priority under
congestion. Did I miss anything?
For #1 my suggestion is to use skbmod to set the priority tag.
For #2 if you didn't have to drop at enqueue time you could have
used any of the existing priority favoring qdiscs which recognize
skb->priority. Otherwise, as I suggested above, look at
pfifo_fast/pfifo_head_drop.

Like you, I'm all in for less code. If someone can instruct us on how to 
accomplish the same thing that our patch is doing, we would be happy to 
withdraw it. We have submitted this patch because we want to lower the 
bar to deploy Gatekeeper as much as possible, and requiring network 
operators willing to deploy Gatekeeper to keep patching the kernel is an 
operational burden.




So I would suggest you keep this real simple - especially if you want to
go backwards in kernels. For existing kernels you can implement the
basic policies of what you need by using prio qdisc with a combination
of a classifier that knows how to match on dsfield (trivial to do with
u32) and skbedit action to tag the skb->priority. Then let prio qdisc
use the priomap to select the queue.
If you must drop enqueued low prio packets then you may need the new
qdisc. And to optimize, you will need the skbmod change.
I really think it is a bad idea to encapsulate the classifier in the
qdisc.



Look at the priomap or prio2band arrangement on prio qdisc
or pfifo_fast qdisc. You take an skbprio as an index into the array
and retrieve a queue to enqueue to. The size of the array is 16.
In the past this was based IIRC on ip precedence + 1 bit. Those map
similarly to DS fields (class selectors, assured forwarding etc). So
no need to even increase the array beyond current 16.


What application is this change supposed to enable or help? I think this 
change should be left for when one can explain the need for it.




I meant that you should take a look at the prio map. It is an array of
size 16 used by the implicit skb->priority classifier (prio, pfifo_fast etc).
A packet's skb priority is used as an index into this array and from the
result a queue is selected to put the packet onto.

The map of this array can be configured from user space. I was saying
earlier that it may be tempting to make a size 64 array to map the
possible dsfields - in practise that has never been pragmatic (so 16 was
sufficient).
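
The lookup described above can be sketched in plain userspace C. This is only an illustration, not kernel code: the table holds what I believe is the default priority-to-band map of pfifo_fast, and band_for_priority() is a hypothetical helper name.

```c
#include <stdint.h>

#define PRIO_MAP_SIZE 16 /* the map has 16 entries, as noted above */

/* Assumed default map; band 0 is the one served first. */
static const uint8_t prio2band[PRIO_MAP_SIZE] = {
    1, 2, 2, 2, 1, 2, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1
};

/*
 * Hypothetical helper: mask the 32-bit priority down to the map size
 * and use the result as an index to pick a band.
 */
static unsigned int band_for_priority(uint32_t skb_priority)
{
    return prio2band[skb_priority & (PRIO_MAP_SIZE - 1)];
}
```

Because the priority is masked before the lookup, any 32-bit value lands on one of the 16 entries, which is why the array does not need to grow with the range of skb->priority.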


cheers,
jamal


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-14 Thread Michel Machado

On 09/05/18 01:37 PM, Michel Machado wrote:

On 05/09/2018 10:43 AM, Jamal Hadi Salim wrote:

On 08/05/18 10:27 PM, Cong Wang wrote:
On Tue, May 8, 2018 at 6:29 AM, Jamal Hadi Salim  
wrote:




I like the suggestion of extending skbmod to mark skbprio based on ds. 
Given that DSprio would no longer depend on the DS field, would you 
have a name suggestion for this new queue discipline since the name 
"prio" is currently in use?




Not sure what to call it.
My struggle is still with the intended end goal of the qdisc.
It looks like prio qdisc except for the enqueue part which attempts
to use a shared global queue size for all prios. I would have
pointed to other approaches which use global priority queue pool
which do early congestion detection like RED or variants like GRED but
those use average values of the queue lengths not instantaneous values 
such as you do.

I am tempted to say - based on my current understanding - that you don't
need a new qdisc; rather you need to map your dsfields to skbprio
(via skbmod) and stick with prio qdisc. I also think the skbmod
mapping is useful regardless of this need.


A simplified description of what DSprio is meant to do is as follows: 
when a link is overloaded at a router, DSprio makes this router drop the 
packets of lower priority. These priorities are assigned by Gatekeeper 
in such a way that well behaving sources are favored (Theorem 4.1 of the 
Portcullis paper pointed out in my previous email). Moreover, attackers 
cannot do much better than well behaving sources (Theorem 4.2). This 
description is simplified because it omits many other components of 
Gatekeeper that affect the packets that go to DSprio.


Like you, I'm all in for less code. If someone can instruct us on how to 
accomplish the same thing that our patch is doing, we would be happy to 
withdraw it. We have submitted this patch because we want to lower the 
bar to deploy Gatekeeper as much as possible, and requiring network 
operators willing to deploy Gatekeeper to keep patching the kernel is an 
operational burden.


What should be the range of priorities that this new queue discipline 
would accept? skb->priority is of type __u32, but supporting 2^32 
priorities would require too large of an array to index packets by 
priority; the DS field is only 6 bits long. Do you have a use case in 
mind to guide us here?




Look at the priomap or prio2band arrangement on prio qdisc
or pfifo_fast qdisc. You take an skbprio as an index into the array
and retrieve a queue to enqueue to. The size of the array is 16.
In the past this was based IIRC on ip precedence + 1 bit. Those map
similarly to DS fields (class selectors, assured forwarding etc). So
no need to even increase the array beyond current 16.


What application is this change supposed to enable or help? I think this 
change should be left for when one can explain the need for it.



2) Dropping already enqueued packets will not work well for
local feedback (__NET_XMIT_BYPASS return code is about the
packet that has been dropped from earlier enqueueing because
it is lower priority - it does not signify anything about the
current skb, which actually just got enqueued).
Perhaps (off the top of my head) an option is to always enqueue packets
on high priority when their limit is exceeded as long as lower prio has
some space. That means you'd have to increment low prio accounting if their
space is used.


I don't understand the point you are making here. Could you develop it 
further?




Sorry - I meant NET_XMIT_CN.
If you drop an already enqueued packet, it makes sense to signal that
using NET_XMIT_CN. This does not make sense for forwarded packets but
it does for locally sourced packets.


Thank you for bringing this detail to our attention; we've overlooked 
the return code NET_XMIT_CN. We'll adopt it when the queue is full and 
the lowest priority packet in the queue is being dropped to make room 
for the higher-priority, incoming packet.
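
A minimal userspace sketch of that enqueue decision follows. It is not the patch's code: per-priority counters stand in for real packet queues, the XMIT_* names only mirror the kernel's NET_XMIT_* codes, and MAX_PRIORITY, struct prio_sched and the helper names are assumptions made for the illustration.

```c
#include <stdint.h>
#include <string.h>

#define MAX_PRIORITY 64 /* assumption: one slot per 6-bit priority */

/* Local stand-ins for the kernel's NET_XMIT_* return codes. */
enum xmit_code { XMIT_SUCCESS = 0, XMIT_DROP = 1, XMIT_CN = 2 };

struct prio_sched {
    uint32_t qlen[MAX_PRIORITY]; /* packets queued per priority */
    uint32_t total;              /* packets queued overall */
    uint32_t limit;              /* shared limit for all priorities */
};

static void sched_init(struct prio_sched *q, uint32_t limit)
{
    memset(q, 0, sizeof(*q));
    q->limit = limit;
}

/* Lowest priority that currently holds a packet, or -1 if empty. */
static int lowest_queued(const struct prio_sched *q)
{
    for (int p = 0; p < MAX_PRIORITY; p++)
        if (q->qlen[p] > 0)
            return p;
    return -1;
}

static enum xmit_code sched_enqueue(struct prio_sched *q, uint32_t prio)
{
    if (prio >= MAX_PRIORITY)
        prio = MAX_PRIORITY - 1; /* clamp out-of-range priorities */

    if (q->total < q->limit) {   /* room left: plain enqueue */
        q->qlen[prio]++;
        q->total++;
        return XMIT_SUCCESS;
    }

    int low = lowest_queued(q);
    if (low < 0 || prio <= (uint32_t)low)
        return XMIT_DROP;        /* incoming packet is the one dropped */

    q->qlen[low]--;              /* evict one lowest-priority packet */
    q->qlen[prio]++;             /* total is unchanged */
    return XMIT_CN;              /* enqueued, but an older packet was lost */
}
```

The point of the third return value is that the caller's packet was accepted while another, lower priority one was dropped, which is different from both a clean enqueue and a drop of the incoming packet.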


[ ]'s
Michel Machado


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-12 Thread Jamal Hadi Salim

Sorry for the latency..

On 09/05/18 01:37 PM, Michel Machado wrote:

On 05/09/2018 10:43 AM, Jamal Hadi Salim wrote:

On 08/05/18 10:27 PM, Cong Wang wrote:
On Tue, May 8, 2018 at 6:29 AM, Jamal Hadi Salim  
wrote:




I like the suggestion of extending skbmod to mark skbprio based on ds. 
Given that DSprio would no longer depend on the DS field, would you have 
a name suggestion for this new queue discipline since the name "prio" is 
currently in use?




Not sure what to call it.
My struggle is still with the intended end goal of the qdisc.
It looks like prio qdisc except for the enqueue part which attempts
to use a shared global queue size for all prios. I would have
pointed to other approaches which use global priority queue pool
which do early congestion detection like RED or variants like GRED but
those use average values of the queue lengths not instantaneous values 
such as you do.

I am tempted to say - based on my current understanding - that you don't
need a new qdisc; rather you need to map your dsfields to skbprio
(via skbmod) and stick with prio qdisc. I also think the skbmod
mapping is useful regardless of this need.

What should be the range of priorities that this new queue discipline 
would accept? skb->priority is of type __u32, but supporting 2^32 
priorities would require too large of an array to index packets by 
priority; the DS field is only 6 bits long. Do you have a use case in 
mind to guide us here?




Look at the priomap or prio2band arrangement on prio qdisc
or pfifo_fast qdisc. You take an skbprio as an index into the array
and retrieve a queue to enqueue to. The size of the array is 16.
In the past this was based IIRC on ip precedence + 1 bit. Those map
similarly to DS fields (class selectors, assured forwarding etc). So
no need to even increase the array beyond current 16.


I find the cleverness in changing the highest/low prios confusing.
It looks error-prone (I guess that is why there is a BUG check)
To the authors: Is there a document/paper on the theory of this thing
as to why no explicit queues are "faster"?


The priority orientation in GKprio is due to two factors: failing safe 
and elegance. If zero were the highest priority, any operational mistake 
that leads unclassified packets through GKprio would potentially 
disrupt the system. We are humans, we'll make mistakes. The elegance 
aspect comes from the fact that the assigned priority is not massaged to 
fit the DS field. We find it helpful while inspecting packets on the wire.


The reason for us to avoid explicit queues in GKprio, which could change 
the behavior within a given priority, is to closely abide by the 
expected behavior assumed to prove Theorem 4.1 in the paper "Portcullis: 
Protecting Connection Setup from Denial-of-Capability Attacks":


https://dl.acm.org/citation.cfm?id=1282413



The paper seems to be behind a paywall. Googling didn't help.
My concern is still the science behind this; if you had written up
some test setup which shows how you concluded this was a better
approach at DoS prevention and showed some numbers, it would have
greatly helped clarify.


1) I agree that using multiple queues as in prio qdisc would make it
more manageable; does not necessarily need to be classful if you
use implicit skbprio classification. i.e. on enqueue use a priority
map to select a queue; on dequeue always dequeue from highest prio
until it has no more packets to send.


In my reply to Cong, I point out that there is a technical limitation in 
the interface of queue disciplines that prevents GKprio from having explicit 
sub-queues:


https://www.mail-archive.com/netdev@vger.kernel.org/msg234201.html


2) Dropping already enqueued packets will not work well for
local feedback (__NET_XMIT_BYPASS return code is about the
packet that has been dropped from earlier enqueueing because
it is lower priority - it does not signify anything about the
current skb, which actually just got enqueued).
Perhaps (off the top of my head) an option is to always enqueue packets
on high priority when their limit is exceeded as long as lower prio has
some space. That means you'd have to increment low prio accounting if their
space is used.


I don't understand the point you are making here. Could you develop it 
further?




Sorry - I meant NET_XMIT_CN.
If you drop an already enqueued packet, it makes sense to signal that
using NET_XMIT_CN. This does not make sense for forwarded packets but
it does for locally sourced packets.

cheers,
jamal


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-10 Thread Michel Machado

On 05/10/2018 01:38 PM, Cong Wang wrote:

On Wed, May 9, 2018 at 7:09 AM, Michel Machado  wrote:

On 05/08/2018 10:24 PM, Cong Wang wrote:


On Tue, May 8, 2018 at 5:59 AM, Michel Machado 
wrote:


Overall it looks good to me, just one thing below:


+struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
+   .id =   "gkprio",
+   .priv_size  =   sizeof(struct gkprio_sched_data),
+   .enqueue=   gkprio_enqueue,
+   .dequeue=   gkprio_dequeue,
+   .peek   =   qdisc_peek_dequeued,
+   .init   =   gkprio_init,
+   .reset  =   gkprio_reset,
+   .change =   gkprio_change,
+   .dump   =   gkprio_dump,
+   .destroy=   gkprio_destroy,
+   .owner  =   THIS_MODULE,
+};




You probably want to add Qdisc_class_ops here so that you can
dump the stats of each internal queue.




Hi Cong,

 In the production scenario we are targeting, this priority queue must be
classless; being classful would only bloat the code for us. I don't see
making this queue classful as a problem per se, but I suggest leaving it as
a future improvement for when someone can come up with a useful scenario
for it.




Take a look at sch_prio, it is fairly simple since your internal
queues are just an array... Per-queue stats are quite useful
in production, we definitely want to observe which queues are
full which are not.



DSprio cannot add Qdisc_class_ops without a rewrite of other queue
disciplines, which doesn't seem desirable. Since the method cops->leaf is
required (see register_qdisc()), we would need to replace the array struct
sk_buff_head qdiscs[GKPRIO_MAX_PRIORITY] in struct gkprio_sched_data with
the array struct Qdisc *queues[GKPRIO_MAX_PRIORITY] to be able to return a
Qdisc in dsprio_leaf(). The problem with this change is that Qdisc does not
have a method to dequeue from its tail. This new method may not even make
sense in other queue disciplines. But without this method, gkprio_enqueue()
cannot drop the lowest priority packet when the queue is full and an
incoming packet has higher priority.


Sorry for giving you a bad example. Take a look at sch_fq_codel instead,
it returns NULL for ->leaf() and maps its internal flows to classes.

I thought sch_prio uses internal qdiscs, but I was wrong, as you noticed
it actually exposes them to user via classes.

My point was never to make it classful; I just want to expose the useful stats,
like how fq_codel dumps its internal flows.




Nevertheless, I see your point on being able to observe the distribution of
queued packets per priority. A solution for that would be to add the array
__u32 qlen[GKPRIO_MAX_PRIORITY] in struct tc_gkprio_qopt. This solution even
avoids adding overhead in the critical paths of DSprio. Do you see a better
solution?


I believe you can return NULL for ->leaf() and don't need to worry about
->graft() either. ;)


Thank you for pointing sch_fq_codel out. We'll follow its example.

[ ]'s
Michel Machado


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-10 Thread Cong Wang
On Wed, May 9, 2018 at 7:09 AM, Michel Machado  wrote:
> On 05/08/2018 10:24 PM, Cong Wang wrote:
>>
>> On Tue, May 8, 2018 at 5:59 AM, Michel Machado 
>> wrote:
>
> Overall it looks good to me, just one thing below:
>
>> +struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
>> +   .id =   "gkprio",
>> +   .priv_size  =   sizeof(struct gkprio_sched_data),
>> +   .enqueue=   gkprio_enqueue,
>> +   .dequeue=   gkprio_dequeue,
>> +   .peek   =   qdisc_peek_dequeued,
>> +   .init   =   gkprio_init,
>> +   .reset  =   gkprio_reset,
>> +   .change =   gkprio_change,
>> +   .dump   =   gkprio_dump,
>> +   .destroy=   gkprio_destroy,
>> +   .owner  =   THIS_MODULE,
>> +};
>
>
>
> You probably want to add Qdisc_class_ops here so that you can
> dump the stats of each internal queue.
>>>
>>>
>>>
>>> Hi Cong,
>>>
>>> In the production scenario we are targeting, this priority queue must be
>>> classless; being classful would only bloat the code for us. I don't see
>>> making this queue classful as a problem per se, but I suggest leaving it as
>>> a future improvement for when someone can come up with a useful scenario
>>> for it.
>>
>>
>>
>> Take a look at sch_prio, it is fairly simple since your internal
>> queues are just an array... Per-queue stats are quite useful
>> in production, we definitely want to observe which queues are
>> full which are not.
>>
>
> DSprio cannot add Qdisc_class_ops without a rewrite of other queue
> disciplines, which doesn't seem desirable. Since the method cops->leaf is
> required (see register_qdisc()), we would need to replace the array struct
> sk_buff_head qdiscs[GKPRIO_MAX_PRIORITY] in struct gkprio_sched_data with
> the array struct Qdisc *queues[GKPRIO_MAX_PRIORITY] to be able to return a
> Qdisc in dsprio_leaf(). The problem with this change is that Qdisc does not
> have a method to dequeue from its tail. This new method may not even make
> sense in other queue disciplines. But without this method, gkprio_enqueue()
> cannot drop the lowest priority packet when the queue is full and an
> incoming packet has higher priority.

Sorry for giving you a bad example. Take a look at sch_fq_codel instead,
it returns NULL for ->leaf() and maps its internal flows to classes.

I thought sch_prio uses internal qdiscs, but I was wrong, as you noticed
it actually exposes them to user via classes.

My point was never to make it classful; I just want to expose the useful stats,
like how fq_codel dumps its internal flows.


>
> Nevertheless, I see your point on being able to observe the distribution of
> queued packets per priority. A solution for that would be to add the array
> __u32 qlen[GKPRIO_MAX_PRIORITY] in struct tc_gkprio_qopt. This solution even
> avoids adding overhead in the critical paths of DSprio. Do you see a better
> solution?

I believe you can return NULL for ->leaf() and don't need to worry about
->graft() either. ;)


>
> By the way, I've used GKPRIO_MAX_PRIORITY and other names that include
> "gkprio" above to reflect the version 1 of this patch that we are
> discussing. We will rename these identifiers for version 2 of this patch to
> replace "gkprio" with "dsprio".
>

Sounds good.

Thanks.


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-09 Thread Michel Machado

On 05/09/2018 10:43 AM, Jamal Hadi Salim wrote:

On 08/05/18 10:27 PM, Cong Wang wrote:
On Tue, May 8, 2018 at 6:29 AM, Jamal Hadi Salim  
wrote:

Have you considered using skb->prio instead of peeking into the packet
header.
Also have you looked at the dsmark qdisc?



dsmark modifies ds fields, while this one just maps ds fields into
different queues.



Yeah, I was thinking more of re-using it for the purpose of mapping to
queues - but would require a lot more work.

once skbprio is set by something[1] then this qdisc could be used by
other subsystems (8021q, sockets etc); so i would argue for removal
of the embedded classification and instead maybe writing a simple
extension to skbmod to mark skbprio based on ds.


I like the suggestion of extending skbmod to mark skbprio based on ds. 
Given that DSprio would no longer depend on the DS field, would you have 
a name suggestion for this new queue discipline since the name "prio" is 
currently in use?


What should be the range of priorities that this new queue discipline 
would accept? skb->priority is of type __u32, but supporting 2^32 
priorities would require too large of an array to index packets by 
priority; the DS field is only 6 bits long. Do you have a use case in 
mind to guide us here?
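
For reference, the 6-bit value in question is the upper part of the IPv4 TOS byte or of the IPv6 traffic class. The sketch below reads it from the first two bytes of a raw IP header in userspace C; dscp_from_ip_header() is a hypothetical name, and this is not how the patch or the kernel helpers are written.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the DSCP (0..63) of an IPv4 or IPv6
 * header, or -1 if the buffer is too short or the version is unknown.
 */
static int dscp_from_ip_header(const uint8_t *hdr, size_t len)
{
    if (len < 2)
        return -1;

    switch (hdr[0] >> 4) { /* IP version is the top nibble of byte 0 */
    case 4:
        /* IPv4: byte 1 is the DS field; DSCP is its upper 6 bits. */
        return hdr[1] >> 2;
    case 6: {
        /* IPv6: traffic class straddles bytes 0 and 1. */
        uint8_t tclass = (uint8_t)(((hdr[0] & 0x0f) << 4) | (hdr[1] >> 4));
        return tclass >> 2;
    }
    default:
        return -1;
    }
}
```

Whatever the protocol, the result fits in 0..63, so an array indexed by this value needs 64 entries at most.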



I find the cleverness in changing the highest/low prios confusing.
It looks error-prone (I guess that is why there is a BUG check)
To the authors: Is there a document/paper on the theory of this thing
as to why no explicit queues are "faster"?


The priority orientation in GKprio is due to two factors: failing safe 
and elegance. If zero were the highest priority, any operational mistake 
that leads unclassified packets through GKprio would potentially 
disrupt the system. We are humans, we'll make mistakes. The elegance 
aspect comes from the fact that the assigned priority is not massaged to 
fit the DS field. We find it helpful while inspecting packets on the wire.


The reason for us to avoid explicit queues in GKprio, which could change 
the behavior within a given priority, is to closely abide by the 
expected behavior assumed to prove Theorem 4.1 in the paper "Portcullis: 
Protecting Connection Setup from Denial-of-Capability Attacks":


https://dl.acm.org/citation.cfm?id=1282413


1) I agree that using multiple queues as in prio qdisc would make it
more manageable; does not necessarily need to be classful if you
use implicit skbprio classification. i.e. on enqueue use a priority
map to select a queue; on dequeue always dequeue from highest prio
until it has no more packets to send.


In my reply to Cong, I point out that there is a technical limitation 
in the interface of queue disciplines that prevents GKprio from having 
explicit sub-queues:


https://www.mail-archive.com/netdev@vger.kernel.org/msg234201.html


2) Dropping already enqueued packets will not work well for
local feedback (__NET_XMIT_BYPASS return code is about the
packet that has been dropped from earlier enqueueing because
it is lower priority - it does not signify anything about the
current skb, which actually just got enqueued).
Perhaps (off the top of my head) an option is to always enqueue packets
on high priority when their limit is exceeded as long as lower prio has
some space. That means you'd have to increment low prio accounting if their
space is used.


I don't understand the point you are making here. Could you develop it 
further?


[ ]'s
Michel Machado


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-09 Thread Jamal Hadi Salim

On 08/05/18 10:27 PM, Cong Wang wrote:

On Tue, May 8, 2018 at 6:29 AM, Jamal Hadi Salim  wrote:

Have you considered using skb->prio instead of peeking into the packet
header.
Also have you looked at the dsmark qdisc?



dsmark modifies ds fields, while this one just maps ds fields into
different queues.



Yeah, I was thinking more of re-using it for the purpose of mapping to
queues - but would require a lot more work.

once skbprio is set by something[1] then this qdisc could be used by
other subsystems (8021q, sockets etc); so i would argue for removal
of the embedded classification and instead maybe writing a simple
extension to skbmod to mark skbprio based on ds.

I find the cleverness in changing the highest/low prios confusing.
It looks error-prone (I guess that is why there is a BUG check)
To the authors: Is there a document/paper on the theory of this thing
as to why no explicit queues are "faster"?

Some other feedback:

1) I agree that using multiple queues as in prio qdisc would make it
more manageable; does not necessarily need to be classful if you
use implicit skbprio classification. i.e. on enqueue use a priority
map to select a queue; on dequeue always dequeue from highest prio
until it has no more packets to send.

2) Dropping already enqueued packets will not work well for
local feedback (__NET_XMIT_BYPASS return code is about the
packet that has been dropped from earlier enqueueing because
it is lower priority - it does not signify anything about the
current skb, which actually just got enqueued).
Perhaps (off the top of my head) an option is to always enqueue packets
on high priority when their limit is exceeded as long as lower prio has
some space. That means you'd have to increment low prio accounting if their
space is used.
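
Point 1) above (pick a queue on enqueue, always serve the highest priority queue first on dequeue) could look roughly like the userspace sketch below. It assumes the band was already chosen from skb->priority by a priority map, uses small fixed-size rings of non-negative packet ids instead of skbs, and every name in it (struct strict_prio, sp_enqueue(), sp_dequeue()) is made up for the example.

```c
#define NUM_BANDS 3 /* assumption: three bands, band 0 served first */
#define BAND_CAP  8 /* assumption: tiny per-band capacity for the demo */

struct band {
    int pkts[BAND_CAP];  /* ring buffer of packet ids */
    unsigned int head;   /* index of the oldest packet */
    unsigned int len;    /* number of packets queued */
};

struct strict_prio {
    struct band bands[NUM_BANDS];
};

/* Append a packet id to the given band; -1 if invalid or full. */
static int sp_enqueue(struct strict_prio *q, unsigned int band, int pkt_id)
{
    if (band >= NUM_BANDS)
        return -1;

    struct band *b = &q->bands[band];
    if (b->len == BAND_CAP)
        return -1;

    b->pkts[(b->head + b->len) % BAND_CAP] = pkt_id;
    b->len++;
    return 0;
}

/* Return the oldest packet of the first non-empty band, or -1. */
static int sp_dequeue(struct strict_prio *q)
{
    for (unsigned int i = 0; i < NUM_BANDS; i++) {
        struct band *b = &q->bands[i];
        if (b->len == 0)
            continue; /* this band is drained, try the next one */

        int id = b->pkts[b->head];
        b->head = (b->head + 1) % BAND_CAP;
        b->len--;
        return id;
    }
    return -1;
}
```

A lower band is only reached once every band above it is empty, so ordering inside a band stays FIFO while ordering across bands is strict.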

cheers,
jamal

[1] something like:
tc filter add match all ip action skbmod inheritdsfield
tc filter add match all ip6 action skbmod inheritdsfield

inheritdsfield maps ds to skb->priority


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-09 Thread Michel Machado

On 05/08/2018 10:24 PM, Cong Wang wrote:

On Tue, May 8, 2018 at 5:59 AM, Michel Machado  wrote:

Overall it looks good to me, just one thing below:


+struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
+   .id =   "gkprio",
+   .priv_size  =   sizeof(struct gkprio_sched_data),
+   .enqueue=   gkprio_enqueue,
+   .dequeue=   gkprio_dequeue,
+   .peek   =   qdisc_peek_dequeued,
+   .init   =   gkprio_init,
+   .reset  =   gkprio_reset,
+   .change =   gkprio_change,
+   .dump   =   gkprio_dump,
+   .destroy=   gkprio_destroy,
+   .owner  =   THIS_MODULE,
+};



You probably want to add Qdisc_class_ops here so that you can
dump the stats of each internal queue.



Hi Cong,

In the production scenario we are targeting, this priority queue must be
classless; being classful would only bloat the code for us. I don't see
making this queue classful as a problem per se, but I suggest leaving it as
a future improvement for when someone can come up with a useful scenario for
it.



Take a look at sch_prio, it is fairly simple since your internal
queues are just an array... Per-queue stats are quite useful
in production, we definitely want to observe which queues are
full which are not.



DSprio cannot add Qdisc_class_ops without a rewrite of other queue 
disciplines, which doesn't seem desirable. Since the method cops->leaf 
is required (see register_qdisc()), we would need to replace the array 
struct sk_buff_head qdiscs[GKPRIO_MAX_PRIORITY] in struct 
gkprio_sched_data with the array struct Qdisc 
*queues[GKPRIO_MAX_PRIORITY] to be able to return a Qdisc in 
dsprio_leaf(). The problem with this change is that Qdisc does not have 
a method to dequeue from its tail. This new method may not even make 
sense in other queue disciplines. But without this method, 
gkprio_enqueue() cannot drop the lowest priority packet when the queue 
is full and an incoming packet has higher priority.


Nevertheless, I see your point on being able to observe the distribution 
of queued packets per priority. A solution for that would be to add the 
array __u32 qlen[GKPRIO_MAX_PRIORITY] in struct tc_gkprio_qopt. This 
solution even avoids adding overhead in the critical paths of DSprio. Do 
you see a better solution?
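
To make the proposal concrete, here is a rough userspace sketch of a dump step that copies per-priority queue lengths into an options struct. The struct layouts are stand-ins and not the real tc_gkprio_qopt or gkprio_sched_data; demo_queue only models the qlen counter that a struct sk_buff_head keeps, and all demo_* names are hypothetical.

```c
#include <stdint.h>

#define DEMO_MAX_PRIORITY 64 /* assumption: one queue per 6-bit priority */

/* Stand-in for struct sk_buff_head; only the length counter is modeled. */
struct demo_queue {
    uint32_t qlen;
};

/* Stand-in for the scheduler's private data. */
struct demo_sched_data {
    struct demo_queue qdiscs[DEMO_MAX_PRIORITY];
    uint32_t max_limit;
};

/* Stand-in for the options struct reported to user space. */
struct demo_qopt {
    uint32_t limit;
    uint32_t qlen[DEMO_MAX_PRIORITY];
};

/*
 * Copy the counters only when a dump is requested; enqueue and dequeue
 * do no extra bookkeeping for this.
 */
static void demo_dump(const struct demo_sched_data *q, struct demo_qopt *opt)
{
    opt->limit = q->max_limit;
    for (int prio = 0; prio < DEMO_MAX_PRIORITY; prio++)
        opt->qlen[prio] = q->qdiscs[prio].qlen;
}
```

Since each queue already tracks its own length, the snapshot costs one pass over the array at dump time and nothing on the packet path.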


By the way, I've used GKPRIO_MAX_PRIORITY and other names that include 
"gkprio" above to reflect the version 1 of this patch that we are 
discussing. We will rename these identifiers for version 2 of this patch 
to replace "gkprio" with "dsprio".


[ ]'s
Michel Machado


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-08 Thread Cong Wang
On Tue, May 8, 2018 at 6:29 AM, Jamal Hadi Salim  wrote:
> Have you considered using skb->prio instead of peeking into the packet
> header.
> Also have you looked at the dsmark qdisc?
>

dsmark modifies ds fields, while this one just maps ds fields into
different queues.


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-08 Thread Cong Wang
On Tue, May 8, 2018 at 5:59 AM, Michel Machado  wrote:
>>> Overall it looks good to me, just one thing below:
>>>
 +struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
 +   .id =   "gkprio",
 +   .priv_size  =   sizeof(struct gkprio_sched_data),
 +   .enqueue=   gkprio_enqueue,
 +   .dequeue=   gkprio_dequeue,
 +   .peek   =   qdisc_peek_dequeued,
 +   .init   =   gkprio_init,
 +   .reset  =   gkprio_reset,
 +   .change =   gkprio_change,
 +   .dump   =   gkprio_dump,
 +   .destroy=   gkprio_destroy,
 +   .owner  =   THIS_MODULE,
 +};
>>>
>>>
>>> You probably want to add Qdisc_class_ops here so that you can
>>> dump the stats of each internal queue.
>
>
> Hi Cong,
>
>In the production scenario we are targeting, this priority queue must be
> classless; being classful would only bloat the code for us. I don't see
> making this queue classful as a problem per se, but I suggest leaving it as
> a future improvement for when someone can come up with a useful scenario for
> it.


Take a look at sch_prio, it is fairly simple since your internal
queues are just an array... Per-queue stats are quite useful
in production, we definitely want to observe which queues are
full which are not.


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-08 Thread Michel Machado

On 05/08/2018 09:29 AM, Jamal Hadi Salim wrote:

On 08/05/18 08:59 AM, Michel Machado wrote:

Overall it looks good to me, just one thing below:


+struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
+   .id =   "gkprio",
+   .priv_size  =   sizeof(struct gkprio_sched_data),
+   .enqueue    =   gkprio_enqueue,
+   .dequeue    =   gkprio_dequeue,
+   .peek   =   qdisc_peek_dequeued,
+   .init   =   gkprio_init,
+   .reset  =   gkprio_reset,
+   .change =   gkprio_change,
+   .dump   =   gkprio_dump,
+   .destroy    =   gkprio_destroy,
+   .owner  =   THIS_MODULE,
+};


You probably want to add Qdisc_class_ops here so that you can
dump the stats of each internal queue.


Hi Cong,

    In the production scenario we are targeting, this priority queue 
must be classless; being classful would only bloat the code for us. I 
don't see making this queue classful as a problem per se, but I 
suggest leaving it as a future improvement for when someone can come 
up with a useful scenario for it.


I am actually struggling with this whole thing.
Have you considered using skb->prio instead of peeking into the packet
header.
Also have you looked at the dsmark qdisc?


As far as I know, skb->priority (skb->prio has been renamed) is unassigned 
for packets that come from the network. DSprio, adopting Cong's name 
suggestion, is most useful "merging" packets that come from different 
network interfaces.


Had we relied on DSmark to mark skb->tc_index with the DS field, we 
would have forced anyone using DSprio to use DSmark. This may sound like a 
good idea, but DSmark always requires writable socket buffers while 
setting skb->tc_index with the DS field of the packet (see 
dsmark_enqueue()), which means that the kernel may drop high priority 
packets instead of low priority packets due to memory pressure.


[ ]'s
Michel Machado


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-08 Thread Michel Machado

Overall it looks good to me, just one thing below:


+struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
+   .id =   "gkprio",
+   .priv_size  =   sizeof(struct gkprio_sched_data),
+   .enqueue=   gkprio_enqueue,
+   .dequeue=   gkprio_dequeue,
+   .peek   =   qdisc_peek_dequeued,
+   .init   =   gkprio_init,
+   .reset  =   gkprio_reset,
+   .change =   gkprio_change,
+   .dump   =   gkprio_dump,
+   .destroy=   gkprio_destroy,
+   .owner  =   THIS_MODULE,
+};


You probably want to add Qdisc_class_ops here so that you can
dump the stats of each internal queue.


Hi Cong,

   In the production scenario we are targeting, this priority queue 
must be classless; being classful would only bloat the code for us. I 
don't see making this queue classful as a problem per se, but I suggest 
leaving it as a future improvement for when someone can come up with a 
useful scenario for it.


[ ]'s
Michel Machado


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-08 Thread Jamal Hadi Salim

On 08/05/18 08:59 AM, Michel Machado wrote:

Overall it looks good to me, just one thing below:


+struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
+   .id =   "gkprio",
+   .priv_size  =   sizeof(struct gkprio_sched_data),
+   .enqueue    =   gkprio_enqueue,
+   .dequeue    =   gkprio_dequeue,
+   .peek   =   qdisc_peek_dequeued,
+   .init   =   gkprio_init,
+   .reset  =   gkprio_reset,
+   .change =   gkprio_change,
+   .dump   =   gkprio_dump,
+   .destroy    =   gkprio_destroy,
+   .owner  =   THIS_MODULE,
+};


You probably want to add Qdisc_class_ops here so that you can
dump the stats of each internal queue.


Hi Cong,

    In the production scenario we are targeting, this priority queue 
must be classless; being classful would only bloat the code for us. I 
don't see making this queue classful as a problem per se, but I suggest 
leaving it as a future improvement for when someone can come up with a 
useful scenario for it.


I am actually struggling with this whole thing.
Have you considered using skb->prio instead of peeking into the packet
header.
Also have you looked at the dsmark qdisc?

cheers,
jamal


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-08 Thread Nishanth Devarajan
On Mon, May 07, 2018 at 10:24:51PM -0700, Cong Wang wrote:
> On Mon, May 7, 2018 at 2:36 AM, Nishanth Devarajan  wrote:
> > net/sched: add gkprio scheduler
> >
> > Gkprio (Gatekeeper Priority Queue) is a queueing discipline that prioritizes
> > IPv4 and IPv6 packets according to their DSCP field. Although Gkprio can be
> > employed in any QoS scenario in which a higher DSCP field means a higher
> > priority packet, Gkprio was conceived as a solution for denial-of-service
> > defenses that need to route packets with different priorities.
> 
> 
> Can we give it a better name? "Gatekeeper" is meaningless if we read
> it alone, it ties to your Gatekeeper project which is more than just this
> kernel module. Maybe "DS Priority Queue"?
> 

Yes, we should be able to come up with a better name, we'll work on it.

> Overall it looks good to me, just one thing below:
> 
> > +struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
> > +   .id =   "gkprio",
> > +   .priv_size  =   sizeof(struct gkprio_sched_data),
> > +   .enqueue=   gkprio_enqueue,
> > +   .dequeue=   gkprio_dequeue,
> > +   .peek   =   qdisc_peek_dequeued,
> > +   .init   =   gkprio_init,
> > +   .reset  =   gkprio_reset,
> > +   .change =   gkprio_change,
> > +   .dump   =   gkprio_dump,
> > +   .destroy=   gkprio_destroy,
> > +   .owner  =   THIS_MODULE,
> > +};
> 
> You probably want to add Qdisc_class_ops here so that you can
> dump the stats of each internal queue.

Alright, will make some changes and send in a v2.

Thanks,
Nishanth


Re: [PATCH net-next] net:sched: add gkprio scheduler

2018-05-07 Thread Cong Wang
On Mon, May 7, 2018 at 2:36 AM, Nishanth Devarajan  wrote:
> net/sched: add gkprio scheduler
>
> Gkprio (Gatekeeper Priority Queue) is a queueing discipline that prioritizes
> IPv4 and IPv6 packets according to their DSCP field. Although Gkprio can be
> employed in any QoS scenario in which a higher DSCP field means a higher
> priority packet, Gkprio was conceived as a solution for denial-of-service
> defenses that need to route packets with different priorities.


Can we give it a better name? "Gatekeeper" is meaningless if we read
it alone, it ties to your Gatekeeper project which is more than just this
kernel module. Maybe "DS Priority Queue"?

Overall it looks good to me, just one thing below:

> +struct Qdisc_ops gkprio_qdisc_ops __read_mostly = {
> +   .id =   "gkprio",
> +   .priv_size  =   sizeof(struct gkprio_sched_data),
> +   .enqueue=   gkprio_enqueue,
> +   .dequeue=   gkprio_dequeue,
> +   .peek   =   qdisc_peek_dequeued,
> +   .init   =   gkprio_init,
> +   .reset  =   gkprio_reset,
> +   .change =   gkprio_change,
> +   .dump   =   gkprio_dump,
> +   .destroy=   gkprio_destroy,
> +   .owner  =   THIS_MODULE,
> +};

You probably want to add Qdisc_class_ops here so that you can
dump the stats of each internal queue.