Re: [question] panic() during reboot -f (reboot syscall)
On Tue, Mar 12, 2019 at 04:29:25PM -0500, Eric W. Biederman wrote:
> I wonder if there is an easy way to get the scheduler to not schedule
> userspace processes once the reboot system call has started. That
> sounds like the simple way to avoid this kind of confusion.

That sounds like adding code to a hotpath that is 'never' used.
Re: [question] panic() during reboot -f (reboot syscall)
Linus Torvalds writes:

> On Wed, Mar 6, 2019 at 5:29 AM Petr Mladek wrote:
>>
>> I wonder if it is "normal" to get panic() when the system is rebooted
>> using "reboot -f". It looks a bit weird to me.
>
> No, a panic is never normal (except possibly for test modules etc, of course).
>
>> Now, "reboot -f" just calls the reboot() syscall. I do not see
>> anything that would stop processes.
>
> There isn't supposed to be anything. It's meant for "things are
> screwed up, just reboot *now* without doing anything else".
>
> The "reboot now" is basically meant to be a poor man's power cycle.
>
>> But it shuts down devices very early, via:
>>
>>   + kernel_restart()
>>     + kernel_restart_prepare()
>>       + blocking_notifier_call_chain(_notifier_list, SYS_RESTART, cmd);
>>       + device_shutdown()
>
> The problem is that there are conflicting goals here, and the kernel
> doesn't even *know* if this is supposed to be a normal clean reboot,
> or a "reboot -f" that just shuts down everything.
>
> On a nice clean reboot (where init has shut everything down) we
> obviously _do_ want to shut devices down etc. Quite often you need to
> do it just to make sure they come up nicely again (because the
> firmware isn't even always re-initializing things properly on a soft
> reboot).
>
> But on a "reboot -f", user space _hasn't_ cleaned up, and just wants
> things to reboot. But the kernel doesn't really know. It just gets the
> reboot system call in both cases.
>
>> In other words, it looks like the panic() is possible by design.
>> But it looks a bit weird. Any opinion?
>
> It's definitely not "by design", but it might be unavoidable in this case.
>
> Of course, "unavoidable" is relative. There could be workarounds that
> are reasonably ok in practice.
>
> Like having the filesystem panic code see "oh, system_state isn't
> SYSTEM_RUNNING, so I shouldn't be panicking".

I wonder if there is an easy way to get the scheduler to not schedule
userspace processes once the reboot system call has started. That
sounds like the simple way to avoid this kind of confusion.

Eric
Re: [question] panic() during reboot -f (reboot syscall)
On Wed, Mar 6, 2019 at 5:29 AM Petr Mladek wrote:
>
> I wonder if it is "normal" to get panic() when the system is rebooted
> using "reboot -f". It looks a bit weird to me.

No, a panic is never normal (except possibly for test modules etc, of course).

> Now, "reboot -f" just calls the reboot() syscall. I do not see
> anything that would stop processes.

There isn't supposed to be anything. It's meant for "things are
screwed up, just reboot *now* without doing anything else".

The "reboot now" is basically meant to be a poor man's power cycle.

> But it shuts down devices very early, via:
>
>   + kernel_restart()
>     + kernel_restart_prepare()
>       + blocking_notifier_call_chain(_notifier_list, SYS_RESTART, cmd);
>       + device_shutdown()

The problem is that there are conflicting goals here, and the kernel
doesn't even *know* if this is supposed to be a normal clean reboot,
or a "reboot -f" that just shuts down everything.

On a nice clean reboot (where init has shut everything down) we
obviously _do_ want to shut devices down etc. Quite often you need to
do it just to make sure they come up nicely again (because the
firmware isn't even always re-initializing things properly on a soft
reboot).

But on a "reboot -f", user space _hasn't_ cleaned up, and just wants
things to reboot. But the kernel doesn't really know. It just gets the
reboot system call in both cases.

> In other words, it looks like the panic() is possible by design.
> But it looks a bit weird. Any opinion?

It's definitely not "by design", but it might be unavoidable in this case.

Of course, "unavoidable" is relative. There could be workarounds that
are reasonably ok in practice.

Like having the filesystem panic code see "oh, system_state isn't
SYSTEM_RUNNING, so I shouldn't be panicking".

Linus
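For illustration only, the workaround suggested above could be modelled roughly as follows. This is a user-space sketch, not kernel or ext4 code: the enum is a stand-in for the kernel's system_state values, and fs_error_should_panic() is a hypothetical helper name invented for this example.

```c
#include <stdbool.h>

/*
 * User-space stand-in for the kernel's system_state values.
 * Only the two states this sketch needs are modelled (assumption).
 */
enum model_system_state {
	MODEL_SYSTEM_RUNNING,
	MODEL_SYSTEM_RESTART,
};

/*
 * Hypothetical helper, not an existing kernel function: decide whether
 * a filesystem error should escalate to panic().
 *
 * errors_panic mirrors the "errors=panic" mount option. The panic is
 * suppressed whenever the system is no longer in the running state,
 * e.g. because a restart has already begun shutting devices down.
 */
static bool fs_error_should_panic(enum model_system_state state,
				  bool errors_panic)
{
	if (!errors_panic)
		return false;

	return state == MODEL_SYSTEM_RUNNING;
}
```

With this check, an I/O error seen after device_shutdown() would be reported as an ordinary error instead of escalating to a panic, while errors during normal operation would still honor "errors=panic".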
[question] panic() during reboot -f (reboot syscall)
Hello,

I wonder if it is "normal" to get panic() when the system is rebooted
using "reboot -f". It looks a bit weird to me.

In our case, the panic() was triggered from the code of an ext4
filesystem that was mounted with "errors=panic":

crash> bt
PID: 3984  TASK: 887db1f6c180  CPU: 32  COMMAND: "bash"
 #0 [887e637bf9a8] machine_kexec at 81059c5c
 #1 [887e637bf9f8] __crash_kexec at 81119e0a
 #2 [887e637bfab8] panic at 81193c31
 #3 [887e637bfb30] ext4_handle_error at a0229faa [ext4]
 #4 [887e637bfb40] __ext4_error_inode at a022a12a [ext4]
 #5 [887e637bfbe0] __ext4_get_inode_loc at a02096a5 [ext4]
 #6 [887e637bfc40] ext4_iget at a020c028 [ext4]
 #7 [887e637bfcc0] ext4_lookup at a0216ca0 [ext4]
 #8 [887e637bfce8] lookup_real at 81218e3f
 #9 [887e637bfd00] __lookup_hash at 8121916f
#10 [887e637bfd20] walk_component at 8121b50f
#11 [887e637bfd70] path_lookupat at 8121ca30
#12 [887e637bfd98] filename_lookup at 8121e58c
#13 [887e637bfe98] vfs_fstatat at 81214549
#14 [887e637bfed8] SYSC_newstat at 812149ca
#15 [887e637bff50] entry_SYSCALL_64_fastpath at 8161de61
    RIP: 7f9db8d3ebe5  RSP: 7ffda081cf68  RFLAGS: 0246
    RAX: ffda  RBX:  RCX: 7f9db8d3ebe5
    RDX: 013c7fa0  RSI: 013c7fa0  RDI: 013c7f40
    RBP: 7f9db943bee0  R8: 013c7f40  R9: 000b
    R10: 7af2c337  R11: 0246  R12: 013c7fa0
    R13: 013c7fa0  R14: 0008  R15: 013c7f80
    ORIG_RAX: 0004  CS: 0033  SS: 002b

Now, "reboot -f" just calls the reboot() syscall. I do not see
anything that would stop processes. It does not even stop the other
CPUs, on purpose; see commit cf7df378aa4ff7da ("reboot: migrate
shutdown/reboot to boot cpu").

But it shuts down devices very early, via:

  + kernel_restart()
    + kernel_restart_prepare()
      + blocking_notifier_call_chain(_notifier_list, SYS_RESTART, cmd);
      + device_shutdown()

As a result, processes are still running. Filesystem code returns
errors because the underlying disk device was removed, and that
triggers panic() because of the "errors=panic" mount option.

My understanding is that userspace is responsible for a "clean" reboot.
The "reboot" command normally stops services, kills processes, syncs
disks, and unmounts filesystems before it calls the reboot() syscall.

In other words, it looks like the panic() is possible by design.
But it looks a bit weird. Any opinion?

Best Regards,
Petr
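To make the difference between the two paths concrete, here is a small user-space sketch of the sequence described above. It only plans the steps and performs none of them. The step names and plan_reboot() are hypothetical, chosen for this illustration, and are not taken from the source of reboot(8) or any init system.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names, following the description in the message. */
enum reboot_step {
	STEP_STOP_SERVICES,
	STEP_KILL_PROCESSES,
	STEP_SYNC_DISKS,
	STEP_UMOUNT_FILESYSTEMS,
	STEP_REBOOT_SYSCALL,
};

/*
 * Fill out[] with the steps user space would take, in order, and
 * return how many were written (never more than cap).
 *
 * force == true models "reboot -f": the reboot() syscall is issued
 * directly, so processes are still running when the kernel shuts
 * devices down. force == false models the clean path.
 */
static size_t plan_reboot(bool force, enum reboot_step *out, size_t cap)
{
	static const enum reboot_step clean[] = {
		STEP_STOP_SERVICES,
		STEP_KILL_PROCESSES,
		STEP_SYNC_DISKS,
		STEP_UMOUNT_FILESYSTEMS,
		STEP_REBOOT_SYSCALL,
	};
	size_t n = 0;

	if (force) {
		if (cap > 0)
			out[n++] = STEP_REBOOT_SYSCALL;
		return n;
	}

	for (size_t i = 0; i < sizeof(clean) / sizeof(clean[0]) && n < cap; i++)
		out[n++] = clean[i];

	return n;
}
```

In the forced plan nothing precedes the syscall, which is why a process can still be inside a filesystem lookup when the disk goes away.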