Hello,

On (12/14/17 10:11), Tejun Heo wrote:
> Hey, Steven.
> 
> On Thu, Dec 14, 2017 at 12:55:06PM -0500, Steven Rostedt wrote:
> > Yes! Please create a reproducer, because I still don't believe there is
> > one. And it's all hand waving until there's an actual report that we can
> > lock up the system with my approach.
> 
> Yeah, will do, but out of curiosity, Sergey and I already described
> what the root problem was and you didn't really seem to take that.  Is
> that because the explanation didn't make sense to you or us
> misunderstanding what your code does?

I second _everything_ that Tejun has said.


Steven, your approach works ONLY when we have the following preconditions:

 a) there is a CPU that is calling printk() from the 'safe' (non-atomic,
    etc) context

        what does guarantee that? what happens if there is NO non-atomic
        CPU or that non-atomic CPU simply misses the console_owner != false
        point? are we going to conclude

        "if printk() doesn't work for you, it's because you are holding it
        wrong"?


        what if that non-atomic CPU does not call printk(), but instead
        it does console_lock()/console_unlock()? why is there no handoff?

        CPU0                            CPU1 ~ CPU10
                                        in atomic contexts [!]. ping-ponging
                                        console_sem ownership to each other,
                                        while what they really need to do is
                                        to simply up() and let CPU0 handle it.
                                        printk
        console_lock()
         schedule()
                                        ...
                                        printk
                                        printk
                                        ...
                                        printk
                                        printk

                                        up()

        // woken up
        console_unlock()

        why do we make an emphasis on fixing vprintk_printk()?


 b) non-atomic CPU sees console_owner set (which is set for a very short
    period of time)

        again. what if that non-atomic CPU does not see console_owner?
        "don't use printk()"?

 c) the task that is looping in console_unlock() sees non-atomic CPU when
    console_owner is set.


IOW, we need to have


   the right CPU (a) at the very right moment (b && c) doing the very right
   thing.


   * and the "very right moment" is tiny and additionally depends
     on a foreign CPU [the one that is looping in console_unlock()].
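
to make the (a) && (b) && (c) dependency concrete, here is a toy model in C.
it is NOT kernel code: the struct, its fields and handoff_happens() are
hypothetical names made up for illustration, and the real timing is reduced
to plain booleans. it only shows that the handoff needs all of the
conditions at once.

```c
#include <stdbool.h>

/* Hypothetical toy model of the preconditions above; not kernel code. */
struct waiter_state {
	bool non_atomic;        /* (a) the CPU is in a 'safe' context */
	bool calls_printk;      /* (a) it enters via printk(), not console_lock() */
	bool sees_owner_set;    /* (b) it observes console_owner while it is set */
	bool owner_sees_waiter; /* (c) the console_unlock() looper notices it */
};

/* The handoff takes place only when every precondition holds together. */
static bool handoff_happens(const struct waiter_state *w)
{
	return w->non_atomic && w->calls_printk &&
	       w->sees_owner_set && w->owner_sees_waiter;
}
```

in this model a non-atomic CPU that sleeps in console_lock() instead of
calling printk() (calls_printk == false) never gets the handoff, no matter
what the other fields say.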



a simple question - how is that going to work for everyone? are we
"fixing" a small fraction of possible use-cases?



Steven, I thought we reached an agreement [**] that the solution we should
be working on is a combination of printk_kthread and console_sem hand
off. Simply because it adds the missing "there is a non-atomic CPU wishing
to console_unlock()" thing.

        lkml.kernel.org/r/[email protected]

        https://marc.info/?l=linux-kernel&m=151011840830776&w=2
        https://marc.info/?l=linux-kernel&m=151015141407368&w=2
        https://marc.info/?l=linux-kernel&m=151018900919386&w=2
        https://marc.info/?l=linux-kernel&m=151019815721161&w=2
        https://marc.info/?l=linux-kernel&m=151020275921953&w=2
**      https://marc.info/?l=linux-kernel&m=151020404622181&w=2
**      https://marc.info/?l=linux-kernel&m=151020565222469&w=2


what am I missing?

        -ss
