Re: [RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)

2016-07-07 Thread Eric Dumazet
On Thu, 2016-07-07 at 18:32 +0200, Jiri Kosina wrote:
> On Thu, 7 Jul 2016, Eric Dumazet wrote:
> 
> > > @@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, 
> > > struct sk_buff *skb,
> > >  {
> > >   int ret = 0, q_idx = *q_idx_p;
> > >   struct Qdisc *q;
> > > + int b;
> > >  
> > >   if (!root)
> > >   return 0;
> > > @@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, 
> > > struct sk_buff *skb,
> > >   goto done;
> > >   q_idx++;
> > >   }
> > > - list_for_each_entry(q, >list, list) {
> > > + hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
> > >   if (q_idx < s_q_idx) {
> > >   q_idx++;
> > >   continue;
> > > @@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, 
> > > struct sk_buff *skb,
> > >  int *t_p, int s_t)
> > >  {
> > >   struct Qdisc *q;
> > > + int b;
> > >  
> > >   if (!root)
> > >   return 0;
> > > @@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, 
> > > struct sk_buff *skb,
> > >   if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
> > >   return -1;
> > >  
> > > - list_for_each_entry(q, >list, list) {
> > > + hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, b, q, hash) {
> > >   if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
> > >   return -1;
> > >   }
> > 
> > 
> > Not sure why you used the rcu version here, but the non rcu version in
> > tc_dump_qdisc_root()
> 
> Good catch.
> 
> Actually even the current code is odd in this regard -- 
> qdisc_match_from_root() uses RCU iterator,

Because it can be run from qdisc enqueue() dequeue(), not holding RTNL.

>  while tc_dump_*() use the 
> non-RCU one; addition and deletion is performed using RCU primitives.

It really is protected by RTNL (qdiscs can not change during the dump)

> 
> I haven't got my head around this yet; if it's correct at all, it'd at 
> least deserve a comment somewhere.
> 
> I'll respin v2 of the patch (there is also a conflict on HASH_SIZE 
> definition in ip6_tunnel.c, ip6_gre.c and sit.c due to hashtable.h include 
> in netdevice.h that needs to be resolved as well) that'd make RCU usage 
> consistent.
> 
> Any other objections/comments? I was namely curious about any opinions 
> regarding the hashtable size.

Well, this is the tricky part, but rhashtable would mean way more
changes...






Re: [RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)

2016-07-07 Thread Jiri Kosina
On Thu, 7 Jul 2016, Eric Dumazet wrote:

> > @@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, 
> > struct sk_buff *skb,
> >  {
> > int ret = 0, q_idx = *q_idx_p;
> > struct Qdisc *q;
> > +   int b;
> >  
> > if (!root)
> > return 0;
> > @@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, 
> > struct sk_buff *skb,
> > goto done;
> > q_idx++;
> > }
> > -   list_for_each_entry(q, >list, list) {
> > +   hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
> > if (q_idx < s_q_idx) {
> > q_idx++;
> > continue;
> > @@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, 
> > struct sk_buff *skb,
> >int *t_p, int s_t)
> >  {
> > struct Qdisc *q;
> > +   int b;
> >  
> > if (!root)
> > return 0;
> > @@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, 
> > struct sk_buff *skb,
> > if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
> > return -1;
> >  
> > -   list_for_each_entry(q, >list, list) {
> > +   hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, b, q, hash) {
> > if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
> > return -1;
> > }
> 
> 
> Not sure why you used the rcu version here, but the non rcu version in
> tc_dump_qdisc_root()

Good catch.

Actually even the current code is odd in this regard -- 
qdisc_match_from_root() uses RCU iterator, while tc_dump_*() use the 
non-RCU one; addition and deletion is performed using RCU primitives.

I haven't got my head around this yet; if it's correct at all, it'd at 
least deserve a comment somewhere.

I'll respin v2 of the patch (there is also a conflict on HASH_SIZE 
definition in ip6_tunnel.c, ip6_gre.c and sit.c due to hashtable.h include 
in netdevice.h that needs to be resolved as well) that'd make RCU usage 
consistent.

Any other objections/comments? I was namely curious about any opinions 
regarding the hashtable size.

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: [RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)

2016-07-07 Thread Eric Dumazet
On Thu, 2016-07-07 at 11:04 +0200, Jiri Kosina wrote:

> 
> 
> From: Jiri Kosina 
> Subject: [PATCH] net: sched: convert qdisc linked list to hashtable
> 
> Convert the per-device linked list into a hashtable. The primary motivation
> for this change is that currently, we're not tracking all the qdiscs in
> hierarchy (e.g. excluding default qdiscs), as the lookup performed over the
> linked list by qdisc_match_from_root() is rather expensive.

...

>   }
> @@ -1440,6 +1441,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, 
> struct sk_buff *skb,
>  {
>   int ret = 0, q_idx = *q_idx_p;
>   struct Qdisc *q;
> + int b;
>  
>   if (!root)
>   return 0;
> @@ -1454,7 +1456,7 @@ static int tc_dump_qdisc_root(struct Qdisc *root, 
> struct sk_buff *skb,
>   goto done;
>   q_idx++;
>   }
> - list_for_each_entry(q, >list, list) {
> + hash_for_each(qdisc_dev(root)->qdisc_hash, b, q, hash) {
>   if (q_idx < s_q_idx) {
>   q_idx++;
>   continue;
> @@ -1771,6 +1773,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, 
> struct sk_buff *skb,
>  int *t_p, int s_t)
>  {
>   struct Qdisc *q;
> + int b;
>  
>   if (!root)
>   return 0;
> @@ -1778,7 +1781,7 @@ static int tc_dump_tclass_root(struct Qdisc *root, 
> struct sk_buff *skb,
>   if (tc_dump_tclass_qdisc(root, skb, tcm, cb, t_p, s_t) < 0)
>   return -1;
>  
> - list_for_each_entry(q, >list, list) {
> + hash_for_each_rcu(qdisc_dev(root)->qdisc_hash, b, q, hash) {
>   if (tc_dump_tclass_qdisc(q, skb, tcm, cb, t_p, s_t) < 0)
>   return -1;
>   }


Not sure why you used the rcu version here, but the non rcu version in
tc_dump_qdisc_root()

Thanks.





[RFC PATCH] net: sched: convert qdisc linked list to hashtable (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)

2016-07-07 Thread Jiri Kosina
On Fri, 15 Apr 2016, Eric Dumazet wrote:

> Anyway, we probably need to improve our ability to understand qdisc 
> hierarchies. Having some hidden qdiscs is the real problem here.
> 
> We need to add some hash table so that qdisc_match_from_root() does not 
> have to scan hundred of qdiscs.

So how about something like the patch below? I already have preliminary 
patches on top which unhide the default qdiscs, but let's make this one 
step after the other.

Thanks.




From: Jiri Kosina 
Subject: [PATCH] net: sched: convert qdisc linked list to hashtable

Convert the per-device linked list into a hashtable. The primary motivation
for this change is that currently, we're not tracking all the qdiscs in
hierarchy (e.g. excluding default qdiscs), as the lookup performed over the
linked list by qdisc_match_from_root() is rather expensive.

The ultimate goal is to get rid of hidden qdiscs completely, which will bring
much more determinism in user experience.

Signed-off-by: Jiri Kosina 
---
 include/linux/netdevice.h |  2 ++
 include/net/pkt_sched.h   |  4 ++--
 include/net/sch_generic.h |  2 +-
 net/core/dev.c|  1 +
 net/sched/sch_api.c   | 23 +--
 net/sched/sch_generic.c   |  6 +++---
 net/sched/sch_mq.c|  2 +-
 net/sched/sch_mqprio.c|  2 +-
 8 files changed, 24 insertions(+), 18 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index f45929c..630838e 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct netpoll_info;
 struct device;
@@ -1778,6 +1779,7 @@ struct net_device {
unsigned intnum_tx_queues;
unsigned intreal_num_tx_queues;
struct Qdisc*qdisc;
+   DECLARE_HASHTABLE   (qdisc_hash, 16);
unsigned long   tx_queue_len;
spinlock_t  tx_global_lock;
int watchdog_timeo;
diff --git a/include/net/pkt_sched.h b/include/net/pkt_sched.h
index fea53f4..8ba11b4 100644
--- a/include/net/pkt_sched.h
+++ b/include/net/pkt_sched.h
@@ -90,8 +90,8 @@ int unregister_qdisc(struct Qdisc_ops *qops);
 void qdisc_get_default(char *id, size_t len);
 int qdisc_set_default(const char *id);
 
-void qdisc_list_add(struct Qdisc *q);
-void qdisc_list_del(struct Qdisc *q);
+void qdisc_hash_add(struct Qdisc *q);
+void qdisc_hash_del(struct Qdisc *q);
 struct Qdisc *qdisc_lookup(struct net_device *dev, u32 handle);
 struct Qdisc *qdisc_lookup_class(struct net_device *dev, u32 handle);
 struct qdisc_rate_table *qdisc_get_rtab(struct tc_ratespec *r,
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h
index 62d5531..26f5cb3 100644
--- a/include/net/sch_generic.h
+++ b/include/net/sch_generic.h
@@ -67,7 +67,7 @@ struct Qdisc {
u32 limit;
const struct Qdisc_ops  *ops;
struct qdisc_size_table __rcu *stab;
-   struct list_headlist;
+   struct hlist_node   hash;
u32 handle;
u32 parent;
int (*reshape_fail)(struct sk_buff *skb,
diff --git a/net/core/dev.c b/net/core/dev.c
index 904ff43..edc1617 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -7511,6 +7511,7 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, 
const char *name,
INIT_LIST_HEAD(>all_adj_list.lower);
INIT_LIST_HEAD(>ptype_all);
INIT_LIST_HEAD(>ptype_specific);
+   hash_init(dev->qdisc_hash);
dev->priv_flags = IFF_XMIT_DST_RELEASE | IFF_XMIT_DST_RELEASE_PERM;
setup(dev);
 
diff --git a/net/sched/sch_api.c b/net/sched/sch_api.c
index ddf047d..82953cb 100644
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -29,6 +29,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -265,33 +266,33 @@ static struct Qdisc *qdisc_match_from_root(struct Qdisc 
*root, u32 handle)
root->handle == handle)
return root;
 
-   list_for_each_entry_rcu(q, >list, list) {
+   hash_for_each_possible_rcu(qdisc_dev(root)->qdisc_hash, q, hash, 
handle) {
if (q->handle == handle)
return q;
}
return NULL;
 }
 
-void qdisc_list_add(struct Qdisc *q)
+void qdisc_hash_add(struct Qdisc *q)
 {
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
struct Qdisc *root = qdisc_dev(q)->qdisc;
 
WARN_ON_ONCE(root == _qdisc);
ASSERT_RTNL();
-   list_add_tail_rcu(>list, >list);
+   hash_add_rcu(qdisc_dev(q)->qdisc_hash, >hash, q->handle);
}
 }
-EXPORT_SYMBOL(qdisc_list_add);
+EXPORT_SYMBOL(qdisc_hash_add);
 
-void qdisc_list_del(struct Qdisc *q)
+void qdisc_hash_del(struct Qdisc *q)
 {
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {

Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-06-28 Thread Jiri Kosina
On Tue, 28 Jun 2016, Cong Wang wrote:

> > BTW, I've started to actually work on fixing this, and I've noticed that
> > TBF behavior actually violates what's stated in pfifo_fast manpage:
> >
> > ==
> > Whenever  an  interface is created, the pfifo_fast qdisc is
> > automatically used as a queue. If another qdisc is
> > attached, it preempts the default pfifo_fast, which automatically
> > returns to function when an  existing  qdisc is detached.
> >
> > In this sense this qdisc is magic, and unlike other qdiscs.
> > ==
> 
> It is out of date, now default qdisc can be set to any other qdisc
> via /proc. Also, probably due to historical reasons, we don't have
> a unified default default qdisc, some uses bfifo, some uses pfifo,
> we may break some existing script if we change that.

While I do understand that reasoning, I'd argue that unpredictable and 
unexpected behavior of TBF causing systems with non-working networking is 
much more likely than any userspace having hard dependency on the fact 
that default (*) qdisc for TBF is noop.

(*) where 'default upon creation' != 'default when reset'

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-06-28 Thread Cong Wang
On Tue, Jun 28, 2016 at 8:19 AM, Jiri Kosina  wrote:
> BTW, I've started to actually work on fixing this, and I've noticed that
> TBF behavior actually violates what's stated in pfifo_fast manpage:
>
> ==
> Whenever  an  interface is created, the pfifo_fast qdisc is
> automatically used as a queue. If another qdisc is
> attached, it preempts the default pfifo_fast, which automatically
> returns to function when an  existing  qdisc is detached.
>
> In this sense this qdisc is magic, and unlike other qdiscs.
> ==

It is out of date, now default qdisc can be set to any other qdisc
via /proc. Also, probably due to historical reasons, we don't have
a unified default default qdisc, some uses bfifo, some uses pfifo,
we may break some existing script if we change that.


[RFC PATCH] sch_tbf: avoid silent fallback to noop_qdisc (was Re: Deleting child qdisc doesn't reset parent to default qdisc?)

2016-06-28 Thread Jiri Kosina
On Fri, 15 Apr 2016, Eric Dumazet wrote:

> Then you need to save the initial qdisc (bfifo for TBF) in a special
> place, to make sure the delete operation is guaranteed to succeed.
> 
> Or fail the delete if the bfifo can not be allocated.
> 
> I can tell that determinism if far more interesting than usability for
> some users occasionally playing with tc.
> 
> Surely the silent fallback to noop_qdisc is wrong.

So before we go further and fix the fact that we actually do have hidden 
qdiscs (by refactoring qdisc_match_from_root() and friends), I'd still 
like to bring the patch below up for consideration.

Thanks.




From: Jiri Kosina 
Subject: [PATCH] sch_tbf: avoid silent fallback to noop_qdisc

TBF started its life as a classless qdisc with a single builtin FIFO queue
which was being shaped.

When it got later turned into classful qdisc, it was written in a way that
the fallback qdisc was noop_qdisc, which produces bad user experience (delete
of last manually added class doesn't reset it to initial default, but renders
networking unusable instead).

Switch the default fallback to bfifo; this also mimics how the other guys
(HTB, HFSC, CBQ, ...) are behaving.

Signed-off-by: Jiri Kosina 
---
 net/sched/sch_tbf.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/net/sched/sch_tbf.c b/net/sched/sch_tbf.c
index 3161e49..b06dffe 100644
--- a/net/sched/sch_tbf.c
+++ b/net/sched/sch_tbf.c
@@ -508,8 +508,12 @@ static int tbf_graft(struct Qdisc *sch, unsigned long arg, 
struct Qdisc *new,
 {
struct tbf_sched_data *q = qdisc_priv(sch);
 
-   if (new == NULL)
-   new = _qdisc;
+   if (new == NULL) {
+   /* reset to default qdisc */
+   new = qdisc_create_dflt(sch->dev_queue, _qdisc_ops, 
sch->parent);
+   if (!new)
+   return -ENOBUFS;
+   }
 
*old = qdisc_replace(sch, new, >qdisc);
return 0;
-- 
Jiri Kosina
SUSE Labs



Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-06-28 Thread Jiri Kosina
On Fri, 15 Apr 2016, Eric Dumazet wrote:

> > TBF is probably a bad example because it started life as a classless 
> > qdisc. There was only one built-in fifo queue that was shaped. Then 
> > someone made it classful and changed this behavior. To me it sounds 
> > reasonable to have the default behavior restored. At minimal 
> > consistency.
> 
> 
> Then you need to save the initial qdisc (bfifo for TBF) in a special
> place, to make sure the delete operation is guaranteed to succeed.
> 
> Or fail the delete if the bfifo can not be allocated.
> 
> I can tell that determinism if far more interesting than usability for
> some users occasionally playing with tc.

BTW, I've started to actually work on fixing this, and I've noticed that 
TBF behavior actually violates what's stated in pfifo_fast manpage:

==
Whenever  an  interface is created, the pfifo_fast qdisc is 
automatically used as a queue. If another qdisc is
attached, it preempts the default pfifo_fast, which automatically 
returns to function when an  existing  qdisc is detached.

In this sense this qdisc is magic, and unlike other qdiscs.
==

-- 
Jiri Kosina
SUSE Labs



Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-15 Thread David Miller
From: Eric Dumazet 
Date: Fri, 15 Apr 2016 07:58:48 -0700

> Having some hidden qdiscs is the real problem here.

+1


Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-15 Thread Eric Dumazet
On Fri, 2016-04-15 at 08:42 -0400, Jamal Hadi Salim wrote:
> On 16-04-14 01:49 PM, Eric Dumazet wrote:
> 
> > And what would be the chosen behavior ?
> >
> 
> TBF is probably a bad example because it started life as
> a classless qdisc. There was only one built-in fifo queue
> that was shaped. Then someone made it classful and changed
> this behavior. To me it sounds reasonable to have the
> default behavior restored. At minimal consistency.


Then you need to save the initial qdisc (bfifo for TBF) in a special
place, to make sure the delete operation is guaranteed to succeed.

Or fail the delete if the bfifo can not be allocated.

I can tell that determinism if far more interesting than usability for
some users occasionally playing with tc.

Surely the silent fallback to noop_qdisc is wrong.

Anyway, we probably need to improve our ability to understand qdisc
hierarchies. Having some hidden qdiscs is the real problem here.

We need to add some hash table so that qdisc_match_from_root() does not
have to scan hundred of qdiscs.







Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-15 Thread Jamal Hadi Salim

On 16-04-14 01:49 PM, Eric Dumazet wrote:


And what would be the chosen behavior ?



TBF is probably a bad example because it started life as
a classless qdisc. There was only one built-in fifo queue
that was shaped. Then someone made it classful and changed
this behavior. To me it sounds reasonable to have the
default behavior restored. At minimal consistency.


Relying on TBF installing a bfifo for you at delete would be hazardous.

For example CBQ got it differently than HFSC

If qdisc_create_dflt() fails in CBQ, we fail the 'delete', while HFSC
falls back to noop_qdisc, without warning the user :(

At least always using noop_qdisc is consistent. No magic there.

Doing 'unification' right now would break existing scripts.

This is too late, I am afraid.



Sigh. So rant:
IMO, we should let any new APIs and API updates stay longer
in discussion. Or better mark them as unstable for sometime.
The excuse that "it is out in the wild therefore cant be changed"
is harmful because the timeline is "forever" whereas
patches are applied after a short period of posting
and discussions and sometimes not involving the right people.
It is like having a jury issuing a death sentence after 1 week of
deliberation. You cant take it back after execution.

cheers,
jamal


Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Eric Dumazet
On Thu, 2016-04-14 at 18:08 +0200, Jiri Kosina wrote:
> On Thu, 14 Apr 2016, Phil Sutter wrote:
> 
> > > > I've came across the behavior where adding a child qdisc and then 
> > > > deleting 
> > > > it again makes the networking dysfunctional (I guess that's because all 
> > > > of 
> > > > a sudden there is absolutely no working qdisc on the device, although 
> > > > there originally was a default one in the parent).
> > > > 
> > > > In a nutshell, is this expected behavior or bug?
> > > 
> > > This is the expected behavior.
> > 
> > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> > one upon deletion instead of noop_qdisc, hence I would describe
> > the situation using the words 'inconsistent' and 'accident' rather than
> > 'expected'. :)
> 
> Would a patch that'd unify this in a sense that all qdiscs would assign 
> the default one upon deletion acceptable?
> 

And what would be the chosen behavior ?

Relying on TBF installing a bfifo for you at delete would be hazardous.

For example CBQ got it differently than HFSC

If qdisc_create_dflt() fails in CBQ, we fail the 'delete', while HFSC
falls back to noop_qdisc, without warning the user :(

At least always using noop_qdisc is consistent. No magic there.

Doing 'unification' right now would break existing scripts.

This is too late, I am afraid.




Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Eric Dumazet
On Thu, 2016-04-14 at 18:22 +0200, Phil Sutter wrote:

> And those being invisible can be overridden using 'tc qd add', right?
> AFAIR they're not listed because they don't properly register, so the
> system doesn't care to override them. In this case we could change all
> classful qdiscs to restore the default qdisc if a leaf qdisc is being
> deleted instead of noop (which is probably not what the user wants
> anyway).

Even if they properly register, they are not visible.

Take a look at
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=95dc19299f741c986227ec33e23cbf9b3321f812

for some context.

When a default pfifo is created on say a HTB class, you do not see it by
default in a dump.

If you have 100 HTB classes, HTB created 100 pfifo just fine, and it
works, unless an admin tries to delete them maybe ;)





Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Phil Sutter
On Thu, Apr 14, 2016 at 08:44:40AM -0700, Eric Dumazet wrote:
> On Thu, 2016-04-14 at 17:34 +0200, Jiri Kosina wrote:
> > On Thu, 14 Apr 2016, Phil Sutter wrote:
> > 
> > > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> > > one upon deletion instead of noop_qdisc, hence I would describe
> > > the situation using the words 'inconsistent' and 'accident' rather than
> > > 'expected'. :)
> > 
> > Exactly. I'd again like to stress the fact that this configuration works:
> > 
> > jikos:~ # tc qdisc show
> > qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 
> > 1.0ms 
> > 
> > and this (after performing add/delete operation) doesn't:
> > 
> > jikos:~ # tc qdisc show
> > qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 
> > 1.0ms 
> > 
> > It's hard to spot a difference (hint: there is none).
> 
> This is because some qdisc are not visible in the dump.

And those being invisible can be overridden using 'tc qd add', right?
AFAIR they're not listed because they don't properly register, so the
system doesn't care to override them. In this case we could change all
classful qdiscs to restore the default qdisc if a leaf qdisc is being
deleted instead of noop (which is probably not what the user wants
anyway).

Cheers, Phil


Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Jiri Kosina
On Thu, 14 Apr 2016, Phil Sutter wrote:

> > > I've came across the behavior where adding a child qdisc and then 
> > > deleting 
> > > it again makes the networking dysfunctional (I guess that's because all 
> > > of 
> > > a sudden there is absolutely no working qdisc on the device, although 
> > > there originally was a default one in the parent).
> > > 
> > > In a nutshell, is this expected behavior or bug?
> > 
> > This is the expected behavior.
> 
> OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> one upon deletion instead of noop_qdisc, hence I would describe
> the situation using the words 'inconsistent' and 'accident' rather than
> 'expected'. :)

Would a patch that'd unify this in a sense that all qdiscs would assign 
the default one upon deletion acceptable?

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Eric Dumazet
On Thu, 2016-04-14 at 17:34 +0200, Jiri Kosina wrote:
> On Thu, 14 Apr 2016, Phil Sutter wrote:
> 
> > OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> > one upon deletion instead of noop_qdisc, hence I would describe
> > the situation using the words 'inconsistent' and 'accident' rather than
> > 'expected'. :)
> 
> Exactly. I'd again like to stress the fact that this configuration works:
> 
>   jikos:~ # tc qdisc show
>   qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 
> 1.0ms 
> 
> and this (after performing add/delete operation) doesn't:
> 
>   jikos:~ # tc qdisc show
>   qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 
> 1.0ms 
> 
> It's hard to spot a difference (hint: there is none).

This is because some qdisc are not visible in the dump.


qdisc_list_add() uses a single list, so adding too much stuff in it
could slow down fast path (qdisc_lookup(), called from
qdisc_tree_reduce_backlog())






Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Jiri Kosina
On Thu, 14 Apr 2016, Phil Sutter wrote:

> OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
> one upon deletion instead of noop_qdisc, hence I would describe
> the situation using the words 'inconsistent' and 'accident' rather than
> 'expected'. :)

Exactly. I'd again like to stress the fact that this configuration works:

jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 
1.0ms 

and this (after performing add/delete operation) doesn't:

jikos:~ # tc qdisc show
qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 
1.0ms 

It's hard to spot a difference (hint: there is none).

Thanks,

-- 
Jiri Kosina
SUSE Labs



Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Phil Sutter
On Thu, Apr 14, 2016 at 08:01:39AM -0700, Eric Dumazet wrote:
> On Thu, 2016-04-14 at 16:44 +0200, Jiri Kosina wrote:
> > Hi,
> > 
> > I've came across the behavior where adding a child qdisc and then deleting 
> > it again makes the networking dysfunctional (I guess that's because all of 
> > a sudden there is absolutely no working qdisc on the device, although 
> > there originally was a default one in the parent).
> > 
> > In a nutshell, is this expected behavior or bug?
> 
> This is the expected behavior.

OTOH some qdiscs (CBQ, DRR, DSMARK, HFSC, HTB, QFQ) assign the default
one upon deletion instead of noop_qdisc, hence I would describe
the situation using the words 'inconsistent' and 'accident' rather than
'expected'. :)

Anyhow, the problem with skilled admins is they accept quirks too easily
and just build their scripts around them - the same scripts we have to
keep compatible to then.

Cheers, Phil


Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Eric Dumazet
On Thu, 2016-04-14 at 16:44 +0200, Jiri Kosina wrote:
> Hi,
> 
> I've came across the behavior where adding a child qdisc and then deleting 
> it again makes the networking dysfunctional (I guess that's because all of 
> a sudden there is absolutely no working qdisc on the device, although 
> there originally was a default one in the parent).
> 
> In a nutshell, is this expected behavior or bug?

This is the expected behavior.

If the kernel was suddenly doing a 'replace' when you ask a delete,
then the scripts doing a delete , than a add would break.

tc users are skilled admins ;)




Re: Deleting child qdisc doesn't reset parent to default qdisc?

2016-04-14 Thread Jiri Kosina
On Thu, 14 Apr 2016, Jiri Kosina wrote:

> In a nutshell, is this expected behavior or bug?

Just to clarify what seems to suggest to me that this is rather a bug that 
needs to be fixed (but apparently one that has been there for quite a long 
time) can be demonstrated by this:

> 
> =
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 

The above configuration works.

> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.59 ms
> 
> jikos:~ # tc qdisc add dev eth0 parent 10:1 sfq
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 
> qdisc sfq 8008: dev eth0 parent 10:1 limit 127p quantum 1514b depth 127 
> divisor 1024 
> 
> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
> 64 bytes from info.nix.cz (195.47.235.3): icmp_seq=1 ttl=89 time=1.67 ms
> 
> jikos:~ # tc qdisc del dev eth0 parent 10:1 sfq
> jikos:~ # tc qdisc show
> qdisc tbf 10: dev eth0 root refcnt 2 rate 800Mbit burst 131000b lat 1.0ms 

The above configuration doesn't although it's identical to the working one 
at the beginning.

> jikos:~ # ping -c 1 nix.cz | head -2
> PING nix.cz (195.47.235.3) 56(84) bytes of data.
>   [ ... nothing happens ... ]
> ^C

-- 
Jiri Kosina
SUSE Labs