On 07/16, Jan Kara wrote:
>
> On Wed 15-07-15 20:19:20, Oleg Nesterov wrote:
> >
> > Perhaps it makes to merge other 2 patches from Dave first? (those which
> > change __sb_start/end_write to rely on RCU). Afaics these changes are
> > straightforward and correct. Although I'd suggest to use preempt_disable()
> > and synchronize_sched() instead. I will be happy to (try to) make this
> > conversion on top of his changes.
> >
> > Because I do not want to delay the performance improvements and I do not
> > know when exactly I'll send the next version: I need to finish the previous
> > discussion about rcu_sync first. And the necessary changes in fs/super.c
> > depend on whether percpu_rw_semaphore will have rcu_sync or not (not too
> > much, only destroy_super() depends, but still).
> >
> > And of course, I am worried that I missed something and percpu_rw_semaphore
> > can't work for some reason. The code in fs/super.c looks simple, but it
> > seems that filesystems do the "strange" things with lockdep at least.
>
> So Dave's patches would go in only in the next merge window anyway so we
> still have like two-three weeks to decide which patchset to take.

OK, good.

> If you
> think it will take you longer,

Hopefully not.

> then merging Dave's patches makes some sense
> although I personally don't think the issue is so important that we have to
> fix it ASAP and eventual delay of one more release would be OK for me.

OK. I will try to do this in any case, I just wanted to say that I can
equally do this on top of Dave's patches.

To remind, I need to finish the discussion about percpu_rw_semaphore
and rcu_sync, then I'll try to make v2.

And. The biggest problem is lockdep. Everything else looks really simple
although of course I could miss something. And not only because the
filesystems abuse lockdep and thus we need some cleanups first. Unless
I am totally confused fs/super.c needs some cleanups (and fixes) too,
with or without this conversion. Say, acquire_freeze_lock() logic does
do not look right:

        - wait_event(.frozen < level) without rwsem_acquire_read() is
          just wrong from lockdep perspective. If we are going to deadlock
          because the caller is buggy, lockdep can't warn us.

        - __sb_start_write() can race with thaw_super() + freeze_super(),
          and after "goto retry" the 2nd  acquire_freeze_lock() is wrong.

        - The "tell lockdep we are doing trylock" hack doesn't look nice.

          I think this is correct, but this logic should be more explicit.
          Yes, the recursive read_lock() is fine if we hold the lock on
          higher level. But we do not need to fool lockdep. If we can not
          deadlock in this case (and I agree we can't) we should simply use
          wait == F consistently.

Something like this (not even compiled tested):

        static int
        do_sb_start_write(struct super_block *sb, int level, bool wait, 
unsigned long ip)
        {

                if (wait)
                        rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 
0, ip);
        retry:
                if (unlikely(sb->s_writers.frozen >= level)) {
                        if (!wait)
                                return 0;
                        wait_event(sb->s_writers.wait_unfrozen,
                                   sb->s_writers.frozen < level);
                }

                percpu_counter_inc(&sb->s_writers.counter[level-1]);
                /*
                 * Make sure counter is updated before we check for frozen.
                 * freeze_super() first sets frozen and then checks the counter.
                 */
                smp_mb();
                if (unlikely(sb->s_writers.frozen >= level)) {
                        __sb_end_write(sb, level);
                        goto retry;
                }

                if (!wait)
                        rwsem_acquire_read(&sb->s_writers.lock_map[level-1], 0, 
1, ip);
                return 1;
        }

        /*
         * This is an internal function, please use 
sb_start_{write,pagefault,intwrite}
         * instead.
         */
        int __sb_start_write(struct super_block *sb, int level, bool wait)
        {
                bool cantbelocked = false;
                int ret;

        #ifdef CONFIG_LOCKDEP
                /*
                 * We want lockdep to tell us about possible deadlocks with 
freezing but
                 * it's it bit tricky to properly instrument it. Getting a 
freeze protection
                 * works as getting a read lock but there are subtle problems. 
XFS for example
                 * gets freeze protection on internal level twice in some 
cases, which is OK
                 * only because we already hold a freeze protection also on 
higher level. Due
                 * to these cases we have to use wait == F (trylock mode) which 
must not fail.
                 */
                if (wait) {
                        int i;

                        for (i = 0; i < level - 1; i++)
                                if (lock_is_held(&sb->s_writers.lock_map[i])) {
                                        cantbelocked = true;
                                        break;
                                }
                }
        #endif
                ret = do_sb_start_write(sb, level, wait && !cantbelocked, 
_RET_IP_);
                WARN_ON(cantbelocked & !ret);
                return ret;
        }

This should not generate the additional code if CONFIG_LOCKDEP=n and
After this patch it will be trivial to convert __sb_start_write(), but
we need some more cleanups. And perhaps I'll send some changes (like
above) separately, because again, I think they make sense in any case.

In short: I'll try to make v2 asap, but this is all I can promise ;)

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to