On 1/25/16, Jeff Merkey <linux....@gmail.com> wrote:
> On 1/25/16, Jeff Merkey <linux....@gmail.com> wrote:
>> On 1/24/16, Jeff Merkey <linux....@gmail.com> wrote:
>>> On 1/24/16, Jeff Merkey <linux....@gmail.com> wrote:
>>>> If I single step with any of the kgdb, kdb, or mdb kernel debuggers
>>>> over a sysret instruction anywhere in the OS, the system hard hangs
>>>> in smp_call_function_single after the debugger releases the system
>>>> and it resumes normal operation. The specific place the kernel
>>>> hangs is in the loop below. The softlockup detector will
>>>> periodically detect this condition when it occurs, but not always;
>>>> most of the time the system is just hung and unresponsive.
>>>>
>>>> (2)> u smp_call_function_single+d8
>>>> <<<< hard hang in this loop with EDX=3
>>>> 0xffffffff810fce48 8B55E0          mov  edx,DWORD PTR [rbp-32]=0xCE037DC0
>>>> 0xffffffff810fce4b 83E201          and  edx,0x1
>>>> 0xffffffff810fce4e 75F6            jne  smp_call_function_single+0xd6 (0xffffffff810fce46) (up)  <<<<<
>>>> 0xffffffff810fce50 EBC3            jmp  smp_call_function_single+0xa5 (0xffffffff810fce15) (up)
>>>> 0xffffffff810fce52 8B05E08EC700    mov  eax,[oops_in_progress]=0x0
>>>> 0xffffffff810fce58 85C0            test eax,eax
>>>> 0xffffffff810fce5a 7585            jne  smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
>>>> 0xffffffff810fce5c 803D8E0C9D0000  cmp  [__warned.20610]=0x00,0x0
>>>> 0xffffffff810fce63 0F8578FFFFFF    jne  smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
>>>> 0xffffffff810fce69 BE24010000      mov  esi,0x124
>>>> 0xffffffff810fce6e 48C7C796B08C81  mov  rdi,0xffffffff818cb096
>>>> 0xffffffff810fce75 894DBC          mov  DWORD PTR [rbp-68]=0x0,ecx
>>>> 0xffffffff810fce78 488955C0        mov  QWORD PTR [rbp-64]=0xFFFFFFFFFFFFFF10,rdx
>>>> 0xffffffff810fce7c E8FF21F8FF      call warn_slowpath_null
>>>> 0xffffffff810fce81 C605690C9D0001  mov  [__warned.20610]=0x00,0x1
>>>> 0xffffffff810fce88 8B4DBC          mov  ecx,DWORD PTR [rbp-68]=0x0
>>>> 0xffffffff810fce8b 488B55C0        mov  rdx,QWORD PTR [rbp-64]=0xFFFFFFFFFFFFFF10
>>>> 0xffffffff810fce8f E94DFFFFFF      jmp  smp_call_function_single+0x71 (0xffffffff810fcde1) (up)
>>>> 0xffffffff810fce94 E8A71EF8FF      call __stack_chk_fail
>>>> 0xffffffff810fce99 0F1F8000000000  nop  DWORD PTR [rax]=0x0
>>>> (2)> g
>>>>
>>>> The stack backtrace when the bug occurs is:
>>>>
>>>> smp_call_function_single+0xd8
>>>> unmap_page_range+0x613
>>>> flush_tlb_func+0x0
>>>> smp_call_function_many+0x215
>>>> native_flush_tlb_others+0x118
>>>> flush_tlb_mm_range+0x61
>>>> tlb_flush_mmu_tlbonly+0x6b
>>>> tlb_finish_mmu+0x14
>>>> unmap_region+0xe2
>>>> vma_rb_erase+0x10f
>>>> do_munmap+0x217
>>>> vm_munmap+0x41
>>>> SyS_munmap+0x22
>>>> entry_SYSCALL_64_fastpath+0x12
>>>>
>>>> I traced through this code a number of times during normal operation
>>>> without triggering the bug, to get a feel for what this loop
>>>> normally sees in EDX, and it sees EDX=0 in every case I observed
>>>> except when this bug occurs.
>>>>
>>>> The exact C code this maps to, from objdump of kernel/smp.o, is:
>>>>
>>>>  469: e8 62 fe ff ff  callq 2d0 <generic_exec_single>
>>>>  46e: 8b 55 e0        mov   -0x20(%rbp),%edx
>>>>  * previous function call. For multi-cpu calls its even more interesting
>>>>  * as we'll have to ensure no other cpu is observing our csd.
>>>>  */
>>>> static void csd_lock_wait(struct call_single_data *csd)
>>>> {
>>>>     while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK)
>>>>  471: 83 e2 01        and   $0x1,%edx
>>>>  474: 74 cf           je    445 <smp_call_function_single+0xa5>
>>>>  476: f3 90           pause                                       <<<<<<<<<<
>>>>  478: 8b 55 e0        mov   -0x20(%rbp),%edx
>>>>  47b: 83 e2 01        and   $0x1,%edx
>>>>  47e: 75 f6           jne   476 <smp_call_function_single+0xd6>   <<<<<<<<<<<
>>>>  480: eb c3           jmp   445 <smp_call_function_single+0xa5>
>>>>  * Can deadlock when called with interrupts disabled.
>>>>  * We allow cpu's that are not yet online though, as no one else can
>>>>  * send smp call function interrupt to this cpu and as such deadlocks
>>>>  * can't happen.
>>>>  */
>>>> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>>>>
>>>> Each time this bug occurs, csd->flags is always set to a value of 3
>>>> and never changes. When the system is just running normally, it is
>>>> 0 the rest of the time. Setting EDX=0 from the debugger console
>>>> clears the hang condition and the system seems to recover, except
>>>> that the console reports this error when you attempt to load
>>>> programs, indicating the system's ability to load shared objects is
>>>> fritzed:
>>>>
>>>> #
>>>> # ls -l
>>>> /lib64/libc.so.6 version GLI not found   << this error and no shared objects will load
>>>> #
>>>> #
>>>>
>>>> Jeff
>>>>
>>>
>>> I am running down a trace of the MSR values for swapgs. Looks like
>>> it got nested somewhere down in the entry_64 code. If so, then this
>>> is just a symptom and not the sickness.
>>>
>>> Jeff
>>>
>>
>> I got to the bottom of this one, and it's related to the function
>> loop described above locking up the system when code is allowed to
>> run on one processor but not the others, and one of them is trying
>> to send an IPI.
>>
>> You might want to consider some sort of timeout logic for that
>> function, or the ability for it to recover. The problem is caused by
>> holding all the processors except the target processor being traced,
>> and letting the target run for a trace session while the other
>> processors are held suspended. If I release all the processors each
>> time I step with the target processor, the problem goes away. If I
>> hold the processors while stepping over a section of code, this
>> lockup occurs and the processor the bug occurs on apparently never
>> recovers.
>>
>> In the process of tracking down this bug I traced all of the swapgs
>> usage in all of the exception handlers, and it works well. The check
>> in paranoid_entry and paranoid_exit seems to catch the cases where
>> the debugger is stepping, or has stepped, over a swapgs instruction
>> pointing to a gs in userspace, for both int1 and int3 exceptions,
>> and it recovers correctly through paranoid_entry and paranoid_exit
>> with the correct gs and MSR value in all the cases I traced.
>>
>> I coded around this issue, but it's probably a bug to use that while
>> loop without some way to determine that a processor has gone away
>> temporarily. The solution is to let the processors run in between
>> each and every int1 trap and not hold a processor in focus mode;
>> then this code snippet that locks up in smp.c does not get hit.
>>
>> Jeff
>>
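Following up on my own timeout suggestion above: this is roughly the
kind of thing I have in mind for csd_lock_wait. A sketch only -- the
function name, the 5 second threshold, and the warning text are all
made up, and it is written as if inside kernel/smp.c (where
CSD_FLAG_LOCK and the 4.x-era struct call_single_data live). Instead
of spinning forever on CSD_FLAG_LOCK, it gives up after a while and
warns, so a CPU that a debugger is holding cannot wedge the box
silently:

#include <linux/smp.h>       /* already pulled in by kernel/smp.c */
#include <linux/jiffies.h>

/* Sketch: csd_lock_wait() with a way out of the spin. */
static void csd_lock_wait_timeout(struct call_single_data *csd)
{
	unsigned long deadline = jiffies + 5 * HZ;	/* arbitrary */

	while (smp_load_acquire(&csd->flags) & CSD_FLAG_LOCK) {
		if (time_after(jiffies, deadline)) {
			/* The target cpu never unlocked our csd. */
			WARN_ONCE(1, "csd: target cpu not responding\n");
			return;	/* or retry / re-send the IPI here */
		}
		cpu_relax();
	}
}

Whether to just warn, retry, or re-send the IPI on timeout is a policy
question; the point is only that the wait should be able to end.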
>
> I guess Linus has already been trying to fix this problem, since he
> was the last person to touch that code. Looks like there is another
> case where csd_unlock does not get called or is delayed.
>
> https://github.com/torvalds/linux/commit/8053871d0f7f67c7efb7f226ef031f78877d6625
>
> I think I'll work on this one some more and see if I can fix it. It
> still shows up after stepping through a sysret even with my changes;
> it's just harder to trigger. It crashes at exactly the same place in
> the stack trace from the same source, so there's a clue to start
> with.
>
> Jeff
>
This bug is confirmed to be an int1 recursion that happens because the
sysret code in entry_64.S allows a return to userspace with the trap
flag set. The last person to try to fix this was Ingo, with this
commit:

https://github.com/torvalds/linux/commit/4d7321381e5c7102a3d3faf0a0a0035a09619612

for this reported problem:

http://lkml.iu.edu/hypermail/linux/kernel/1504.0/00594.html

I verified that I am seeing the same problem here; it also exposes a
bug in Linus's code in smp.c. I have that fixed as well -- use a
spinlock and don't try to guess what order the hardware will post bus
transactions for memory reads and writes. That code in smp.c is
busted on some systems.

I have a fix for this problem, and Ingo needs to review this code,
because I have set the trap flag and watched it cycle through this
code in entry_64.S while the flag was set, and still end up executing
a sysret when it's not supposed to, instead of taking the other
return path.

Jeff
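P.S. To be concrete about "use a spinlock": something along these
lines, which is a sketch and not my actual patch -- the lock name is
made up, and a single global lock like this costs scalability, but it
takes the memory-ordering guesswork out of the csd flag handling:

static DEFINE_SPINLOCK(csd_flags_lock);	/* hypothetical */

static void csd_lock_locked(struct call_single_data *csd)
{
	spin_lock(&csd_flags_lock);
	while (csd->flags & CSD_FLAG_LOCK) {
		/* Drop the lock so the owner can clear the flag. */
		spin_unlock(&csd_flags_lock);
		cpu_relax();
		spin_lock(&csd_flags_lock);
	}
	csd->flags |= CSD_FLAG_LOCK;
	spin_unlock(&csd_flags_lock);
}

static void csd_unlock_locked(struct call_single_data *csd)
{
	spin_lock(&csd_flags_lock);
	csd->flags &= ~CSD_FLAG_LOCK;
	spin_unlock(&csd_flags_lock);
}

The spinlock's own acquire/release semantics then order the flag
reads and writes, instead of hand-rolled smp_load_acquire() /
smp_store_release() pairs.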
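P.P.S. For the entry_64.S side, this is the shape of the check I am
talking about, written in C for clarity -- the real change would be a
couple of asm instructions testing the saved flags in %r11 before the
sysret, and the function name here is hypothetical:

#include <asm/ptrace.h>
#include <asm/processor-flags.h>

/* Sketch: is the opportunistic sysret path safe for these regs? */
static bool sysret_flags_ok(struct pt_regs *regs)
{
	/*
	 * sysret reloads EFLAGS from r11. If the trap flag (or the
	 * resume flag) is set in the saved image, take the slower
	 * iret return path instead so the single-step #DB is
	 * delivered with the kernel in a sane state.
	 */
	return !(regs->flags & (X86_EFLAGS_TF | X86_EFLAGS_RF));
}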