Re: [lldb-dev] Too many open files

2015-10-07 Thread Adrian McCarthy via lldb-dev
Adding a printing destructor to threading.Event seems to aggravate timing
problems, causing several tests to fail to make their inferiors, which
seemingly keeps us below the open file limit.  That aside, the destructor
did fire many hundreds of times, so there's no general problem preventing
all or even most of those events from being cleaned up.
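
As a rough illustration of the kind of check described above, here is a
minimal sketch that counts how many event objects are actually finalized.
It is not the patch used in the test runner: TrackedEvent and
create_and_drop_events are hypothetical names, and the sketch assumes
Python 3, where threading.Event is a class that can be subclassed (in
Python 2.7 threading.Event is a factory function around threading._Event).

```python
import gc
import threading


class TrackedEvent(threading.Event):
    """Hypothetical Event subclass that counts finalized instances."""

    destroyed = 0

    def __del__(self):
        # Runs when the last reference to the event goes away.
        TrackedEvent.destroyed += 1


def create_and_drop_events(count):
    """Create `count` events and let every one of them go out of scope."""
    for _ in range(count):
        event = TrackedEvent()
        event.set()


create_and_drop_events(100)
gc.collect()  # also sweep up anything that was only reachable through a cycle
print("events finalized:", TrackedEvent.destroyed)
```

If the count printed is far below the number of events created, something
is still holding references to them on the Python side.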

The event objects that I'm seeing with the Sysinternals tools are likely
Windows Events that Python creates to facilitate the interprocess
communication.

I'm looking at the ProcessDriver lifetimes now.
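
One way to check such lifetimes without touching the class itself is to
track instances through weak references.  The sketch below is only a
model: FakeProcessDriver and call_with_timeout_sketch are hypothetical
stand-ins, not the real ProcessDriver or dosep.py code.

```python
import gc
import weakref


class FakeProcessDriver(object):
    """Hypothetical stand-in for the test runner's ProcessDriver."""

    def __init__(self):
        self.results = ("passed", 0)


# Weak references do not keep the drivers alive, so anything still in
# this set after a run is being retained by some other reference.
live_drivers = weakref.WeakSet()


def call_with_timeout_sketch():
    """Create a driver, collect its results, and drop the driver."""
    driver = FakeProcessDriver()
    live_drivers.add(driver)
    return driver.results


for _ in range(50):
    call_with_timeout_sketch()

gc.collect()
print("drivers still alive:", len(live_drivers))
```

A nonzero count after all inferiors have finished would point at an
unintended reference outliving the call.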

On Tue, Oct 6, 2015 at 9:54 AM, Todd Fiala  wrote:

> Okay.
>
> A promising avenue might be to look at how Windows cleans up the
> threading.Event objects.  Chasing that thread might yield why the events
> are not going away (assuming those are the events that are lingering on
> your end).  One thing you could consider doing is patching in a replacement
> destructor for the threading.Event and print something when it fires off,
> verifying that they're really going away from the Python side.  If they're
> not, perhaps there's a retain bloat issue where we're not getting rid of
> some python objects due to some unintended references living beyond
> expectations.
>
> The dosep.py call_with_timeout method drives the child process operation
> chain.  That thing creates a ProcessDriver and collects the results from it
> when done.  Everything within the ProcessDriver (including the event)
> should be cleaned up by the time the call_with_timeout() call wraps up as
> there shouldn't be any references outstanding.  It might also be worth you
> adding a destructor to the ProcessDriver to make sure that's going away,
> one per Python test inferior executed.
>
> On Tue, Oct 6, 2015 at 9:48 AM, Adrian McCarthy 
> wrote:
>
>> Python 2.7.10 made no difference.  I'm dealing with other issues this
>> afternoon, so I'll probably return to this on Wednesday.  It's not critical
>> since there are workarounds.
>>
>> On Tue, Oct 6, 2015 at 9:41 AM, Todd Fiala  wrote:
>>
>>>
>>>
>>> On Mon, Oct 5, 2015 at 3:58 PM, Adrian McCarthy 
>>> wrote:
>>>
>>>> Different tools are giving me different numbers.
>>>>
>>>> At the time of the error, Windbg says there are about 2000 open
>>>> handles, most of them are Event handles, not File handles.  That's higher
>>>> than I'd expect, but not really concerning.


>>> Ah, that's useful.  I am using events (python threading.Event).  These
>>> don't afford any clean up mechanisms on them, so I assume these go away
>>> when the Python objects that hold them go away.
>>>
>>>
>>>> Process Explorer, however, shows ~20k open handles per Python process
>>>> running dotest.exe.  It also says that about 2000 of those are the
>>>> process's "own handles."  I'm researching to see what that means.  I
>>>> suspect it means that the process has ~18k handles to objects owned
>>>> by another process and 2k that it actually owns.
>>>>
>>>> I found this Stack Overflow post, which suggests it may be an
>>>> interaction between using Python subprocess in a loop and having those
>>>> subprocesses work with files that are still open in the parent process, but
>>>> I don't entirely understand the answer:
>>>>
>>>>
>>>> http://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files


>>> Hmm I'll read through that.
>>>
>>>
>>>> It might be a problem with Python subprocess that's been fixed in a
>>>> newer version.  I'm going to try upgrading from Python 2.7.9 to 2.7.10 to
>>>> see if that makes a difference.


>>> Okay, we're on 2.7.10 on latest OS X.  I *think* I'm using Python 2.7.6
>>> on Ubuntu 14.04.  Checking now... (yes, 2.7.6 on 14.04).  Ubuntu 15.10 beta
>>> 1 is using Python 2.7.10.
>>>
>>> Seems reasonable to check that out.  Let me know what you find out!
>>>
>>> -Todd
>>>
>>>
>>>> On Mon, Oct 5, 2015 at 12:02 PM, Todd Fiala
>>>> wrote:
>>>>
>>>>> It's possible.  However, I was monitoring actual open files during the
>>>>> course of the run (i.e. what the kernel thought was open for the master
>>>>> driver process, which is the only place that makes sense to see leaks
>>>>> accumulate) in both threading and threading-pool (on OS X), and I saw only
>>>>> the handful of file handles that I'd expect to be open - pipes
>>>>> (stdout,stderr,stdin) from the main test runner to the inferior test
>>>>> runners, the shared libraries loaded as part of the test runner, and (in
>>>>> my case, but probably not yours for the configuration), the tcp sockets for
>>>>> gathering the test events.  There was no growth, and I didn't see things
>>>>> hanging around longer than I'd expect.
>>>>>
>>>>> The SysInternals process viewer tool is great for this kind of thing -
>>>>> glad you're using it.  Once you find out which file handles are getting
>>>>> leaked and where they came from, we can probably figure out which part

Re: [lldb-dev] Thread resumes with stale signal after executing InferiorCallMmap

2015-10-07 Thread Eugene Birukov via lldb-dev
Even on Linux, the call to InferiorCallMmap does not fail consistently; in
many cases it survives. I just happen to have a 100% repro on this specific
breakpoint in my specific problem. I.e. the burden of investigation is on me,
since I cannot share my program.

But I am not looking at this SIG_ILL yet. Whatever the problem with mmap is,
the client must not carry this signal past expression evaluation. I.e. I
believe that we can construct an arbitrary function that raises a signal, call
it from expression evaluation, and then a continue would fail. I suspect that
this problem might apply to any POSIX platform.

As it turned out, my initial analysis was incorrect. m_resume_signal is
calculated from StopInfo::m_value (now I wonder why we need two fields for
that). And after the mmap call, m_stop_info on the thread is null. So, my
current theory is that there is an event with SIG_ILL that is stuck in the
broadcaster and is picked up and processed much later.

> Subject: Re: [lldb-dev] Thread resumes with stale signal after executing 
> InferiorCallMmap
> From: jing...@apple.com
> Date: Wed, 7 Oct 2015 15:08:18 -0700
> CC: lldb-dev@lists.llvm.org
> To: eugen...@hotmail.com
> 
> Does it only happen for InferiorCallMmap, or does an expression evaluation 
> that crashes in general set a bad signal on resume?  I don't see this 
> behavior in either case on OS X, so it may be something in the Linux support. 
>  Be interesting to figure out why it behaves this way on Linux, so whatever 
> we do we're implementing it consistently.
> 
> Jim
> 
> 
> 
> > On Oct 7, 2015, at 12:03 PM, Eugene Birukov via lldb-dev 
> >  wrote:
> > 
> > Hi,
> >  
> > I am using the LLDB 3.7.0 C++ API. My program stops at a certain breakpoint,
> > and if I call SBFrame::EvaluateExpression() there, then when I let it go it
> > terminates with SIG_ILL on an innocent thread. I dug into this, and
> > there seem to be two independent problems; this mail is about the
> > second one.
> >  
> > • EvaluateExpression() calls Process::CanJIT(), which in turn executes
> > mmap() on the inferior. This mmap gets SIG_ILL because execution starts at
> > an address that is 2 bytes before the very first mmap instruction. I am
> > still looking into why the LLDB server decided to do that - I am pretty
> > sure that the client asked to set the program counter to the correct value.
> > • So, the thread execution terminates and the signal is recorded on
> > Thread::m_resume_signal. This field is not cleared during
> > Thread::RestoreThreadStateFromCheckpoint() and fires when I resume the
> > program after the breakpoint.
> >  
> > So, what would be the best way to deal with the situation? Should I add 
> > "resume signal" field to ThreadStateCheckpoint? Or would StopInfo be a 
> > better place for that? Or something else?
> >  
> > Thanks,
> > Eugene
> > ___
> > lldb-dev mailing list
> > lldb-dev@lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev
> 


Re: [lldb-dev] Thread resumes with stale signal after executing InferiorCallMmap

2015-10-07 Thread Jim Ingham via lldb-dev

> On Oct 7, 2015, at 4:06 PM, Eugene Birukov  wrote:
> 
> Even on Linux call to InferiorCallMmap does not fail consistently. In many 
> cases it survives. I just happened to have 100% repro on this specific 
> breakpoint in my specific problem. I.e. the burden of investigation is on me, 
> since I cannot share my program. 
> 
> But I am not looking at this SIG_ILL yet. Whatever the problem is with mmap - 
> the client must not carry this signal past expression evaluation. I.e. I 
> believe that we can construct any arbitrary function that causes signal, call 
> it from evaluate expression, and then continue would fail. I suspect that 
> this problem might be applicable to any POSIX platform.

It doesn't happen on OS X, though when it comes to signal handling in the 
debugger OS X is an odd fish...

> 
> As it turned out, my initial analysis was incorrect. m_resume_signal is 
> calculated from StopInfo::m_value (now I wonder why do we need two fields for 
> that?).

The signal that you stop with is not necessarily the one you are going to 
resume with.  For instance, if you use "process handle SIG_SOMESIG -p 0" to 
tell lldb not to propagate the signal, then the resume signal will be nothing, 
even though the stop signal is SIG_SOMESIG.
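
The distinction can be modelled in a few lines.  This is only a toy
sketch of the idea, not LLDB's implementation: resume_signal_for and the
policy table are hypothetical, and signals are represented by plain
strings.

```python
def resume_signal_for(stop_signal, pass_policy):
    """Return the signal to deliver on resume, or None to suppress it.

    pass_policy maps a signal name to True (pass it on to the inferior)
    or False (do not propagate it).  Unknown signals default to True in
    this model.
    """
    if stop_signal is None:
        return None
    if pass_policy.get(stop_signal, True):
        return stop_signal
    return None


# Hypothetical policy: SIG_SOMESIG has been configured not to propagate.
policy = {"SIG_SOMESIG": False, "SIG_ILL": True}

print(resume_signal_for("SIG_SOMESIG", policy))  # suppressed on resume
print(resume_signal_for("SIG_ILL", policy))      # passed through on resume
```

So the stop signal is an observation, while the resume signal is a
decision derived from it and from the handling policy.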

> And after mmap call, m_stop_info on the thread is null. So, my current theory 
> is that there is an event with SIG_ILL that is stuck in the broadcaster and 
> is picked up and processed much later.

When the expression evaluation completes, the StopInfo from the last "natural" 
stop should be put back in place in the thread.  After all, if you hit a 
breakpoint, run an expression, then ask why that thread stopped, you want to 
see "hit a breakpoint" not "ran a function call".  Sounds like that is failing 
somehow.
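
The intended save/restore behavior, including the resume signal that the
original mail suggests adding to the checkpoint, can be sketched with a
toy model.  ToyThread and evaluate_expression below are hypothetical and
do not mirror LLDB's actual classes; they only show the expected
invariant.

```python
class ToyThread(object):
    """Toy model of the per-thread state touched by an expression."""

    def __init__(self):
        self.stop_info = "breakpoint 1.1"
        self.resume_signal = None

    def checkpoint(self):
        # Save everything an expression evaluation might overwrite,
        # including the resume signal.
        return {"stop_info": self.stop_info,
                "resume_signal": self.resume_signal}

    def restore(self, saved):
        self.stop_info = saved["stop_info"]
        self.resume_signal = saved["resume_signal"]


def evaluate_expression(thread, crashes):
    """Run a pretend function call, then put the natural stop state back."""
    saved = thread.checkpoint()
    try:
        if crashes:
            thread.stop_info = "signal SIG_ILL"
            thread.resume_signal = "SIG_ILL"
        else:
            thread.stop_info = "function call completed"
    finally:
        thread.restore(saved)


thread = ToyThread()
evaluate_expression(thread, crashes=True)
print(thread.stop_info, thread.resume_signal)
```

In this model the thread reports the breakpoint stop again after the
expression, and no signal from the crashed call is left to fire on resume.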

Jim




[lldb-dev] Thread resumes with stale signal after executing InferiorCallMmap

2015-10-07 Thread Eugene Birukov via lldb-dev
Hi,
 
I am using the LLDB 3.7.0 C++ API. My program stops at a certain breakpoint,
and if I call SBFrame::EvaluateExpression() there, then when I let it go it
terminates with SIG_ILL on an innocent thread. I dug into this, and there seem
to be two independent problems; this mail is about the second one.

• EvaluateExpression() calls Process::CanJIT(), which in turn executes mmap()
on the inferior. This mmap gets SIG_ILL because execution starts at an address
that is 2 bytes before the very first mmap instruction. I am still looking
into why the LLDB server decided to do that - I am pretty sure that the client
asked to set the program counter to the correct value.
• So, the thread execution terminates and the signal is recorded on
Thread::m_resume_signal. This field is not cleared during
Thread::RestoreThreadStateFromCheckpoint() and fires when I resume the program
after the breakpoint.

So, what would be the best way to deal with the situation? Should I add a
"resume signal" field to ThreadStateCheckpoint? Or would StopInfo be a better
place for that? Or something else?
 
Thanks,
Eugene