I am trying again on my original setup (not the simplified defconfig that I provided later).
I can now see how this is related to Greg's commit.
Here is the stack trace at the time of the error:
https://pasteboard.co/9QEmhZJFvIHC.png

This is what I think happens.

nxtask_assign_pid() calls kmm_free(g_pidhash). Right after freeing it,
it should reassign g_pidhash to pidhash. However, kmm_free() uses a
semaphore, and when the free is complete it posts that semaphore.
nxsem_post() internally calls nxsem_checkholder() to perform the new
check. This leads to a call to nxsched_get_tcb(), which tries to access
g_pidhash.

But g_pidhash has already been deallocated at this point and has not yet
been reassigned, so it points to garbage. KASAN is right to complain.

On Wed, Apr 5, 2023 at 12:24 AM Gregory Nutt <spudan...@gmail.com> wrote:
>
> On 4/4/2023 2:43 PM, Fotis Panagiotopoulos wrote:
> > Sorry, maybe it was a bad example.
> >
> > Here is a much more minimal config that you can run directly.
> > https://pastebin.com/x775E7iF
> >
> > For me, it crashes almost immediately after starting.
> >
> > Trying again with 4ff4562401401a3a86c74cb2bda9a1a2b8d94e6d and it moves
> > along.
> >
> Okay, I was able to replicate the error with that configuration, but I
> don't know what it means since I don't understand the kazan stuff:
>
> ...
> ostest_main: setenv(Variable3, BadValue2, FALSE)
> show_variable: Variable=Variable1 has value=GoodValue1
> show_variable: Variable=Variable2 has value=GoodValue2
> show_variable: Variable=Variable3 has value=GoodValue3
> kasan_report: kasan detected a read access error, address at
> 0x7ffff3db52c8, size is 8
>
> Breakpoint 1, _assert (
> filename=0x555555576863 <syslog+159> "\220H\201\304",
> <incomplete sequence \350>, linenum=0,
> msg=0x7ffff3de8930 "\001") at misc/assert.c:423
> 423 {
> (gdb) gt
> Undefined command: "gt". Try "help".
> (gdb) bt
> #0 _assert (filename=0x555555576863 <syslog+159> "\220H\201\304",
> <incomplete sequence \350>,
> linenum=0, msg=0x7ffff3de8930 "\001") at misc/assert.c:423
> #1 0x000055555557053e in __assert (filename=0x5555555bd6d0
> "kasan/kasan.c", linenum=114,
> msg=0x5555555bd735 "panic") at assert/lib_assert.c:36
> #2 0x000055555557798e in kasan_report (addr=0x7ffff3db52c8, size=8,
> is_write=false)
> at kasan/kasan.c:114
> #3 0x0000555555577f84 in __asan_loadN_noabort (addr=0x7ffff3db52c8,
> size=8) at kasan/kasan.c:307
> #4 0x000055555557802b in __asan_load8_noabort (addr=0x7ffff3db52c8)
> at kasan/kasan.c:331
> #5 0x000055555555d6e8 in nxsched_get_tcb (pid=3) at
> sched/sched_gettcb.c:79
> #6 0x000055555555a258 in nxsem_checkholder (sem=0x7ffff3db5000) at
> semaphore/sem_holder.c:1106
> #7 0x000055555556eafe in nxsem_post (sem=0x7ffff3db5000) at
> semaphore/sem_post.c:85
> #8 0x000055555556edfb in sem_post (sem=0x7ffff3db5000) at
> semaphore/sem_post.c:256
> #9 0x0000555555570ad5 in nxmutex_unlock (mutex=0x7ffff3db5000) at
> misc/lib_mutex.c:340
> #10 0x000055555557899b in mm_unlock (heap=0x7ffff3db5000) at
> mm_heap/mm_lock.c:117
>
> The call to mm_unlock() is thing that kicks of the assertion check.
> nxsem_checkholder() is the assertion check
>
> (gdb) up
> #1 0x000055555557053e in __assert (filename=0x5555555bd6d0
> "kasan/kasan.c", linenum=114,
>
> msg=0x5555555bd735 "panic") at assert/lib_assert.c:36
> 36 _assert(filename, linenum, msg);
> (gdb) up
> #2 0x000055555557798e in kasan_report (addr=0x7ffff3db52c8, size=8,
> is_write=false)
> at kasan/kasan.c:114
> 114 PANIC();
> (gdb) up
> #3 0x0000555555577f84 in __asan_loadN_noabort (addr=0x7ffff3db52c8,
> size=8) at kasan/kasan.c:307
> 307 kasan_report(addr, size, false);
> (gdb) up
> #4 0x000055555557802b in __asan_load8_noabort (addr=0x7ffff3db52c8)
> at kasan/kasan.c:331
> 331 __asan_loadN_noabort(addr, 8);
>
> The following is the logic that triggered __asan_loadN_noabort().
> But I don't see any problem. Could this be a false alarm?
>
> (gdb) up
> #5 0x000055555555d6e8 in nxsched_get_tcb (pid=3) at
> sched/sched_gettcb.c:79
> 79 if (g_pidhash[hash_ndx] != NULL && pid ==
> g_pidhash[hash_ndx]->pid)
> (gdb) p hash_ndx
> $8 = 3
> (gdb) p pid
> $9 = 3
> (gdb) p g_pidhash[hash_ndx]->pid
> $10 = 3
>
> (gdb) up
> #6 0x000055555555a258 in nxsem_checkholder (sem=0x7ffff3db5000) at
> semaphore/sem_holder.c:1106
> 1106 htcb = nxsched_get_tcb(tid);
> (gdb) p tid
> $11 = 3
>
> Can you explain what happened when __asan_loadN_noabort was called.
> That is the error. As far as I can tell, everything else looks good to me.
>
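
To illustrate the ordering problem I described at the top, here is a
minimal stand-alone sketch. It is not NuttX code: g_table, unlock_hook(),
sim_free(), grow_table() and run_scenario() are hypothetical stand-ins for
g_pidhash, the holder check, kmm_free() and nxtask_assign_pid(). The sketch
never reads freed memory. It only records whether the global still points
at the block being released when the "unlock" step runs. I am assuming that
publishing the new table before freeing the old one is one way to avoid the
stale access; I have not verified this against the actual sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only, not NuttX code. */

struct tcb
{
  int pid;
};

static struct tcb **g_table;   /* plays the role of g_pidhash */
static size_t g_table_size;
static bool g_stale_lookup;    /* set if a lookup would hit the freed table */

/* Plays the role of the holder check that runs when the heap lock is
 * posted. It dereferences nothing; it only records whether the global
 * table pointer still refers to the block being released.
 */

static void unlock_hook(const void *released)
{
  if ((const void *)g_table == released)
    {
      g_stale_lookup = true;
    }
}

/* Plays the role of kmm_free(). In the real sequence the block is already
 * released when the lock is posted; here the hook runs first so that the
 * sketch itself never touches freed memory.
 */

static void sim_free(void *mem)
{
  unlock_hook(mem);
  free(mem);
}

/* Plays the role of the table growth in nxtask_assign_pid(). */

static void grow_table(bool publish_first)
{
  size_t new_size = g_table_size * 2;
  struct tcb **new_table = calloc(new_size, sizeof(*new_table));
  struct tcb **old_table = g_table;

  assert(new_table != NULL);
  memcpy(new_table, old_table, g_table_size * sizeof(*new_table));

  if (publish_first)
    {
      g_table = new_table;   /* assumed safe order: publish, then free */
      sim_free(old_table);
    }
  else
    {
      sim_free(old_table);   /* order I described: free, then publish */
      g_table = new_table;
    }

  g_table_size = new_size;
}

/* Returns true if the hook saw the global pointing at the freed table. */

static bool run_scenario(bool publish_first)
{
  g_table_size = 4;
  g_table = calloc(g_table_size, sizeof(*g_table));
  assert(g_table != NULL);
  g_stale_lookup = false;

  grow_table(publish_first);

  free(g_table);
  g_table = NULL;
  return g_stale_lookup;
}
```

Calling run_scenario(false) reports a stale lookup, while
run_scenario(true) does not. The only difference between the two is
whether the global is updated before or after the old table is released.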