(Changing $subject, as the discussion is moving to a completely different topic)

On Thu, Sep 27, 2018 at 3:19 PM Eric Dumazet <eric.duma...@gmail.com> wrote:
>
>
>
> On 09/27/2018 02:36 PM, Cong Wang wrote:
>
> > I don't understand what you mean by changing ip command, you must
> > mean tc command, but still, I have no idea about how restarting failed
> > syscall could be related to my patch and why we need to restart anything
> > here. If the refcnt goes to 0, it will never come back, retrying won't help
> > anything.
> >
>
> Yep, tc command it is.
>
> I was not especially commenting your patch (replacing an english message
> by another does not seem very big deal), but the fact that the code right
> there seems to be prepared for parallel changes.
>
> But using RCU lookups in control path will lead to occasional failures
> that most user space tools would not expect.
>

I already discussed this with Vlad at the beginning of his RTNL
removal patches, and we both agreed that some locking is still needed;
it is not completely lockless. Take a look at the tc action code now:
two spinlocks are still needed even after we remove the RTNL there.


> Lets assume two tasks are launching "tc qdisc replace dev eth0 root XXX"
> in whatever order/parallelism.
>
> Both should succeed, after/before major RTNL->other_locking_mechanism


Yes, it is never going to be completely lockless.


>
> Control paths are usually using a mutex or a spinlock so that they never
> hit a 0-refcount at all.


For dev->qdisc, sure, we should already hold a refcnt, so it can't be gone.

For qdisc_lookup_rcu(), it could be that the refcnt goes to 0 before we
remove the qdisc from the hashtable, right? call_rcu() is only called
after refcnt==0, so the RCU read lock can't help here: it only delays the
free, it doesn't keep the refcnt above 0.
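
To illustrate the window I mean, here is a minimal userspace sketch in
C11. It is not the kernel code: struct obj, obj_lookup(),
obj_get_not_zero() and obj_put() are hypothetical names made up for this
example, the "hashed" flag stands in for hashtable membership, and
obj_get_not_zero() only mimics the idea behind the kernel's
refcount_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object found via a lookup table. */
struct obj {
	atomic_int refcnt;
	bool hashed;	/* true while the object is still in the table */
};

/* Take a reference only if the count has not already dropped to 0. */
static bool obj_get_not_zero(struct obj *o)
{
	int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure, 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true if this was the last one. */
static bool obj_put(struct obj *o)
{
	int old = atomic_fetch_sub(&o->refcnt, 1);

	assert(old > 0);
	return old == 1;
}

/*
 * Lockless-style lookup: the object can still be in the table while its
 * refcnt is already 0, so the lookup has to be able to fail.
 */
static struct obj *obj_lookup(struct obj *o)
{
	if (!o->hashed)
		return NULL;
	return obj_get_not_zero(o) ? o : NULL;
}
```

In this sketch, a lookup that runs after the last obj_put() but before
the object is unhashed returns NULL, and retrying the lookup on the same
object cannot change that.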

Thanks.
