Thank you for your clarification, and let me explain what I was
talking about before providing my last code version.

>>>>        kmsg_dump(KMSG_DUMP_PANIC);
>>>> +      panic_flush_to_console();
>
> This is wrong. kmsg_dump() flushes the messages into the registered
> dumpers, e.g. pstore. It handles only messages in the main
> log buffer.
>
> printk_safe_flush_on_panic() must be called before. It moves any
> pending messages from the temporary per-CPU buffers into the main
> log buffer so that they are visible for the dumpers.

Ok, this is very clear.

>>> Why?
>>
>> Here, we are supposed to push the pending messages, as I could verify
>> that they don't reach the console until the console_unblank(). So I
>> wanted to push them with panic_flush_to_console() before the possible
>> upcoming __crash_kexec() call.
>
> I do not see how the ordering of printk_safe_flush_on_panic()
> and kmsg_dump() would change anything for the upcoming
> __crash_kexec() call.

I was wrong to remove the call to printk_safe_flush_on_panic() before
kmsg_dump() as it was in panic_flush_to_console().

The call panic_flush_to_console() idea here was to push any messages
that could have been submitted (even by any dumper) to the console
before the possible upcoming __crash_kexec() call. Otherwise they could
be discarded (if the crash messages can't be retrieved).

And that's the reason why I've changed this code in my previous version
to force a call to panic_flush_to_console() before the __crash_kexec().
Which seems to be too insecure as far as I understand it now.


In your previous email you also indicate:

>> . With the use of "crashkernel", a Fatal MCE injection shows only the
>>     injection message
>>
>>     [   80.811708] mce: Machine check injector initialized
>>     [   92.298755] mce: Triggering MCE exception on CPU 0
>>     [   92.299362] Disabling lock debugging due to kernel taint
>>
>>        No other messages is displayed and the system reboots immediately.
>
> But you could find the messages in the crashdump. Aren't you?

Yes, but _only_ with the latest kexec-tools-2.0.21.git version !
[My installed kexec-tools.x86_64 2.0.20-34.0.4.el8_3 doesn't seem to
work, and I guess there is a separate problem that I will investigate
later.]


>
> It works this way by "design". The idea is the following:
>

That's a rather disarming statement :)

I'm sure we can agree that getting the information directly (and of
course safely) from the running kernel console would be a better option,
instead of having a 2 steps process to get the crash reason (even when
it works) as it adds a risk not to get this crash reason at all. And I
know for a fact that Cloud people do want to know why a platform goes
away ;)


> Taking any locks from NMI context might lead to a deadlock.
> Re-initializing the locks might lead to deadlock as well
> because of possible double unlock. Ignoring the locks might
> lead to problems either.
>
> A compromise is needed:
>
> 1. crashdump disabled
>
>    console_flush_on_panic() is called. It tries hard to get the
>    messages on the console because it is the only chance.
>
>    It does console_trylock(). It is called after
>    bust_spinlocks(1) so that even the console-specific locks
>    are taken only with trylock, see oops_in_progress.
>
>    BTW: There are people that do not like this because there
>    is still a risk of a deadlock. Some code paths
>    take locks without checking oops_in_progress.
>    For these people, more reliable reboot is more
>    important because they want to have the system
>    back ASAP (cloud people).
>
>
> 2. crashdump enabled:
>
>   Only printk_safe_flush_on_panic() is called. It does the best effort
>   to flush messages from the per-CPU buffers into the main log buffer
>   so that they can be found easily in the core.
>
>   It is pretty reliable. It should not be needed at all once the new
>   lockless ringbuffer gets fully integrated,
>
>   It does not try to flush the messages to the console. Getting
>   the crash dump is more important than risking a deadlock with
>   consoles.

Thanks again for the good clarification. I did not realize that calling
console_flush_on_panic(CONSOLE_FLUSH_PENDING) from panic() before our
__crash_kexec() was not safe enough.

So here is a new version of a (minimal) fix trying to follow your
suggestions:
(And you'll probably remove the added calls to
printk_safe_flush_on_panic() when integrating the new lockless
ringbuffer)

What do you think of this simplified possibility ?:
[If this fix is correct, I'll submit it as a v2 fix, with an updated comment and Notes]

--- kernel/panic.c      2021-01-11 20:33:25.268047057 -0500
***************
*** 166,171 ****
--- 166,180 ----
                ftrace_dump(DUMP_ALL);
  }

+ /*
+  * Force flush messages to the console.
+  */
+ static void panic_flush_to_console(void)
+ {
+       printk_safe_flush_on_panic();
+       console_flush_on_panic(CONSOLE_FLUSH_PENDING);
+ }
+
  /**
   *    panic - halt the system
   *    @fmt: The text string to print
***************
*** 284,291 ****
         *
         * Bypass the panic_cpu check and call __crash_kexec directly.
         */
!       if (_crash_kexec_post_notifiers)
                __crash_kexec(NULL);

  #ifdef CONFIG_VT
        unblank_screen();
--- 293,302 ----
         *
         * Bypass the panic_cpu check and call __crash_kexec directly.
         */
!       if (_crash_kexec_post_notifiers) {
!               printk_safe_flush_on_panic();
                __crash_kexec(NULL);
+       }

  #ifdef CONFIG_VT
        unblank_screen();
***************
*** 298,304 ****
         * buffer.  Try to acquire the lock then release it regardless of the
         * result.  The release will also print the buffers out.  Locks debug
         * should be disabled to avoid reporting bad unlock balance when
!        * panic() is not being callled from OOPS.
         */
        debug_locks_off();
        console_flush_on_panic(CONSOLE_FLUSH_PENDING);
--- 309,315 ----
         * buffer.  Try to acquire the lock then release it regardless of the
         * result.  The release will also print the buffers out.  Locks debug
         * should be disabled to avoid reporting bad unlock balance when
!        * panic() is not being called from OOPS.
         */
        debug_locks_off();
        console_flush_on_panic(CONSOLE_FLUSH_PENDING);
***************
*** 314,319 ****
--- 325,331 ----
                 * We can't use the "normal" timers since we just panicked.
                 */
                pr_emerg("Rebooting in %d seconds..\n", panic_timeout);
+               panic_flush_to_console();

                for (i = 0; i < panic_timeout * 1000; i += PANIC_TIMER_STEP) {
                        touch_nmi_watchdog();
***************
*** 347,352 ****
--- 359,365 ----
        disabled_wait();
  #endif
        pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
+       panic_flush_to_console();

        /* Do not scroll important messages printed above */
        suppress_printk = 1;

Reply via email to