Re: gdbstub initial code, v5

2010-08-24 Thread Oleg Nesterov
On 08/23, Oleg Nesterov wrote:

 However. I spent all Monday trying to resolve the new bug, and
 so far I do not understand what happens. Extremely hard to reproduce,
 and the kernel just hangs silently, without any message.
 So far I suspect the problem is in utrace.c, but this time I am not sure.

Solved. This was a scheduler bug fixed in 2.6.35, but I used 2.6.34.
This is really funny. This bug (PF_STARTING lockup) was found and
fixed by me and Peter.

Oh. But I hit yet another problem, BUG_ON() in __utrace_engine_release().
Again, it is not reproducible, I saw it only once in dmesg and I do
not even know for sure what I was doing.

I'll continue tomorrow, but if I can't quickly resolve
this problem I am going to ignore it for now. This time I think
ugdb is wrong.

Oleg.



Re: gdbstub initial code, v5

2010-08-24 Thread Roland McGrath
 When the main thread exits, gdbserver still exposes it to gdb as
 a running process. It is visible via info threads, you can switch
 to this thread, $Tp or $Hx result in OK as if this thread is alive.
 gdbserver even pretends that $vCont;x:DEAD_THREAD works, although
 this thread obviously can never report anything.

This is sort of consistent with the kernel treatment.  The main thread
stays around as a zombie, acting as a moniker for the whole process.  But
indeed that is not actually useful for any thread-granularity control or
information (well, there is the dead thread's usage stats, but that's all).
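
For readers unfamiliar with the packets named in the quoted text ($Tp, $Hx, the OK reply), here is a minimal sketch of how the gdb remote protocol frames a payload: a leading '$', the payload, a '#', and a two-digit hex checksum that is the modulo-256 sum of the payload bytes. The helper names rsp_checksum and rsp_frame are hypothetical, chosen for illustration; they are not taken from gdbserver or ugdb.

```c
#include <stdio.h>
#include <string.h>

/* Modulo-256 sum of the payload bytes, as used by the gdb remote protocol. */
static unsigned char rsp_checksum(const char *payload)
{
    unsigned char sum = 0;

    while (*payload)
        sum += (unsigned char)*payload++;
    return sum;
}

/*
 * Frame a payload as "$<payload>#<two hex digits>".
 * Returns the packet length, or -1 if buf is too small.
 * Hypothetical helper: escaping of '$', '#' and '}' inside the
 * payload is deliberately left out of this sketch.
 */
static int rsp_frame(char *buf, size_t size, const char *payload)
{
    int n = snprintf(buf, size, "$%s#%02x", payload, rsp_checksum(payload));

    return (n < 0 || (size_t)n >= size) ? -1 : n;
}
```

With this framing, the OK reply that gdbserver sends for the dead thread goes over the wire as $OK#9a.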

 I don't think this is really right. This just confuses the user, and
 imho this should be considered a minor bug.

I tend to agree, but don't think it's a big issue either way, really.

 ugdb doesn't do this. If the main thread exits - it exits like any
 other thread. I played with gdb, it seems to handle this case fine.

Sounds good to me!

   - The exit code (Wxx) can be wrong in mt-case.
 
 The problem is, ->report_death can't safely access
 ->group_exit_code with kernel < 2.6.35. This is
 solvable.

Don't even worry about it.  If there is something trivial to do that makes
it better for earlier kernels, then go ahead.  But if the easy thing to do
gives correct results on >=2.6.35 and racily wrong or random results on
older kernels, then we can just live with that.


Thanks,
Roland