I am trying again on my original setup (not the simplified defconfig that I
provided later).

I can now see how this is related to Greg's commit.

Here is the stack trace at the time of the error:
https://pasteboard.co/9QEmhZJFvIHC.png


This is what I think happens.

nxtask_assign_pid() calls kmm_free(g_pidhash). Right after freeing it, it is
supposed to set g_pidhash to the new table, pidhash.
kmm_free(), however, uses a semaphore. When the free is complete, it posts
the semaphore.
nxsem_post() internally calls nxsem_checkholder() to perform the new check.
This leads to a call to nxsched_get_tcb(), which tries to access g_pidhash.

But g_pidhash has already been deallocated at this point, so it still points
to the freed memory.

KASAN is right to complain.
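
To make the ordering problem concrete, here is a minimal standalone sketch.
It is not the NuttX code: struct table, g_table, lookup_hook(), mock_free()
and the two grow_*() functions are hypothetical stand-ins that I made up for
illustration. mock_free() only marks the block as dead instead of really
freeing it, so the sketch itself never touches freed memory. The second
function shows one possible ordering that avoids the stale read, assuming
the new table is fully set up before the old one is freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; none of this is NuttX code. */

struct table
{
  bool live;      /* false once the block has been "freed" */
  int  slots[4];  /* placeholder payload */
};

static struct table *g_table;  /* plays the role of g_pidhash */
static int g_stale_reads;      /* lookups that hit a dead table */

/* Models the lookup reached from the semaphore post (nxsched_get_tcb() in
 * the trace): it dereferences whatever g_table points at right now.
 */

static void lookup_hook(void)
{
  if (!g_table->live)
    {
      g_stale_reads++;
    }
}

/* Models a free() that ends by releasing a lock, which runs the lookup.
 * The block is only marked dead, never actually released.
 */

static void mock_free(struct table *t)
{
  t->live = false;
  lookup_hook();
}

/* The order described above: free the old table, then update the global. */

static int grow_free_first(struct table *oldtab, struct table *newtab)
{
  assert(oldtab != NULL && newtab != NULL);
  oldtab->live = true;
  newtab->live = true;
  g_stale_reads = 0;
  g_table = oldtab;

  mock_free(g_table);  /* the hook still sees the old table */
  g_table = newtab;
  return g_stale_reads;
}

/* A possible alternative: publish the new table, then free the old one. */

static int grow_swap_first(struct table *oldtab, struct table *newtab)
{
  assert(oldtab != NULL && newtab != NULL);
  oldtab->live = true;
  newtab->live = true;
  g_stale_reads = 0;
  g_table = oldtab;

  g_table = newtab;    /* the hook will see the new table */
  mock_free(oldtab);
  return g_stale_reads;
}
```

With two tables a and b, grow_free_first(&a, &b) counts one stale read,
while grow_swap_first(&a, &b) counts none.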

On Wed, Apr 5, 2023 at 12:24 AM Gregory Nutt <spudan...@gmail.com> wrote:

>
> On 4/4/2023 2:43 PM, Fotis Panagiotopoulos wrote:
> > Sorry, maybe it was a bad example.
> >
> > Here is a much more minimal config that you can run directly.
> > https://pastebin.com/x775E7iF
> >
> > For me, it crashes almost immediately after starting.
> >
> > Trying again with 4ff4562401401a3a86c74cb2bda9a1a2b8d94e6d and it moves
> > along.
> >
> Okay, I was able to replicate the error with that configuration, but I
> don't know what it means since I don't understand the KASAN stuff:
>
>     ...
>     ostest_main: setenv(Variable3, BadValue2, FALSE)
>     show_variable: Variable=Variable1 has value=GoodValue1
>     show_variable: Variable=Variable2 has value=GoodValue2
>     show_variable: Variable=Variable3 has value=GoodValue3
>     kasan_report: kasan detected a read access error, address at
>     0x7ffff3db52c8, size is 8
>
>     Breakpoint 1, _assert (
>          filename=0x555555576863 <syslog+159> "\220H\201\304",
>     <incomplete sequence \350>, linenum=0,
>          msg=0x7ffff3de8930 "\001") at misc/assert.c:423
>     423     {
>     (gdb) gt
>     Undefined command: "gt".  Try "help".
>     (gdb) bt
>     #0  _assert (filename=0x555555576863 <syslog+159> "\220H\201\304",
>     <incomplete sequence \350>,
>          linenum=0, msg=0x7ffff3de8930 "\001") at misc/assert.c:423
>     #1  0x000055555557053e in __assert (filename=0x5555555bd6d0
>     "kasan/kasan.c", linenum=114,
>          msg=0x5555555bd735 "panic") at assert/lib_assert.c:36
>     #2  0x000055555557798e in kasan_report (addr=0x7ffff3db52c8, size=8,
>     is_write=false)
>          at kasan/kasan.c:114
>     #3  0x0000555555577f84 in __asan_loadN_noabort (addr=0x7ffff3db52c8,
>     size=8) at kasan/kasan.c:307
>     #4  0x000055555557802b in __asan_load8_noabort (addr=0x7ffff3db52c8)
>     at kasan/kasan.c:331
>     #5  0x000055555555d6e8 in nxsched_get_tcb (pid=3) at
>     sched/sched_gettcb.c:79
>     #6  0x000055555555a258 in nxsem_checkholder (sem=0x7ffff3db5000) at
>     semaphore/sem_holder.c:1106
>     #7  0x000055555556eafe in nxsem_post (sem=0x7ffff3db5000) at
>     semaphore/sem_post.c:85
>     #8  0x000055555556edfb in sem_post (sem=0x7ffff3db5000) at
>     semaphore/sem_post.c:256
>     #9  0x0000555555570ad5 in nxmutex_unlock (mutex=0x7ffff3db5000) at
>     misc/lib_mutex.c:340
>     #10 0x000055555557899b in mm_unlock (heap=0x7ffff3db5000) at
>     mm_heap/mm_lock.c:117
>
> The call to mm_unlock() is the thing that kicks off the assertion check.
> nxsem_checkholder() is the assertion check.
>
>     (gdb) up
>     #1  0x000055555557053e in __assert (filename=0x5555555bd6d0
>     "kasan/kasan.c", linenum=114,
>
>          msg=0x5555555bd735 "panic") at assert/lib_assert.c:36
>     36        _assert(filename, linenum, msg);
>     (gdb) up
>     #2  0x000055555557798e in kasan_report (addr=0x7ffff3db52c8, size=8,
>     is_write=false)
>          at kasan/kasan.c:114
>     114           PANIC();
>     (gdb) up
>     #3  0x0000555555577f84 in __asan_loadN_noabort (addr=0x7ffff3db52c8,
>     size=8) at kasan/kasan.c:307
>     307           kasan_report(addr, size, false);
>     (gdb) up
>     #4  0x000055555557802b in __asan_load8_noabort (addr=0x7ffff3db52c8)
>     at kasan/kasan.c:331
>     331       __asan_loadN_noabort(addr, 8);
>
> The following is the logic that triggered __asan_loadN_noabort(). But I
> don't see any problem.  Could this be a false alarm?
>
>     (gdb) up
>     #5  0x000055555555d6e8 in nxsched_get_tcb (pid=3) at
>     sched/sched_gettcb.c:79
>     79            if (g_pidhash[hash_ndx] != NULL && pid ==
>     g_pidhash[hash_ndx]->pid)
>     (gdb) p hash_ndx
>     $8 = 3
>     (gdb) p pid
>     $9 = 3
>     (gdb) p g_pidhash[hash_ndx]->pid
>     $10 = 3
>
>     (gdb) up
>     #6  0x000055555555a258 in nxsem_checkholder (sem=0x7ffff3db5000) at
>     semaphore/sem_holder.c:1106
>     1106      htcb = nxsched_get_tcb(tid);
>     (gdb) p tid
>     $11 = 3
>
> Can you explain what happened when __asan_loadN_noabort() was called?
> That is the error.  As far as I can tell, everything else looks good to me.
>
