Re: [lldb-dev] Too many open files

2015-10-07  Adrian McCarthy via lldb-dev
Adding a printing destructor to threading.Event seems to aggravate timing
problems, causing several tests to fail to launch their inferiors, and that
side effect seemingly keeps us below the open-file limit.  That aside, the
destructor did fire many hundreds of times, so there's no general problem
preventing all, or even most, of those events from being cleaned up.
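
One caveat on that experiment: in Python 2, an object whose class defines
__del__ and sits in a reference cycle is never collected; it lands in
gc.garbage instead.  A quick check at the end of a run -- somewhere in
dosep.py, say -- would show whether any events are stuck that way:

import gc

gc.collect()
stuck = [o for o in gc.garbage if 'Event' in type(o).__name__]
print('uncollectable: %d objects, %d of them event-like'
      % (len(gc.garbage), len(stuck)))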

The event objects that I'm seeing with the Sysinternals tools are likely
Windows Events that Python creates to facilitate interprocess
communication.

I'm looking at the ProcessDriver lifetimes now.

On Tue, Oct 6, 2015 at 9:54 AM, Todd Fiala  wrote:

> Okay.
>
> A promising avenue might be to look at how Windows cleans up the
> threading.Event objects.  Chasing that thread might yield why the events
> are not going away (assuming those are the events that are lingering on
> your end).  One thing you could consider doing is patching in a replacement
> destructor for the threading.Event and print something when it fires off,
> verifying that they're really going away from the Python side.  If they're
> not, perhaps there's a retain bloat issue where we're not getting rid of
> some python objects due to some unintended references living beyond
> expectations.
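>
> Something like this is what I have in mind (untested sketch; in Python 2,
> threading.Event() is a factory function, so the patch goes on the
> underlying threading._Event class):
>
> import threading
>
> _orig_del = getattr(threading._Event, '__del__', None)
>
> def _noisy_del(self):
>     print('Event %x destroyed' % id(self))
>     if _orig_del is not None:
>         _orig_del(self)
>
> threading._Event.__del__ = _noisy_del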
>
> The dosep.py call_with_timeout method drives the child process operation
> chain.  That thing creates a ProcessDriver and collects the results from it
> when done.  Everything within the ProcessDriver (including the event)
> should be cleaned up by the time the call_with_timeout() call wraps up as
> there shouldn't be any references outstanding.  It might also be worth you
> adding a destructor to the ProcessDriver to make sure that's going away,
> one per Python test inferior executed.
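>
> E.g., temporarily dropping something like this into the ProcessDriver
> class in dosep.py (sketch):
>
>     def __del__(self):
>         # Debug aid: expect exactly one of these per test inferior run.
>         print('ProcessDriver %x destroyed' % id(self))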
>
> On Tue, Oct 6, 2015 at 9:48 AM, Adrian McCarthy 
> wrote:
>
>> Python 2.7.10 made no difference.  I'm dealing with other issues this
>> afternoon, so I'll probably return to this on Wednesday.  It's not critical
>> since there are workarounds.
>>
>> On Tue, Oct 6, 2015 at 9:41 AM, Todd Fiala  wrote:
>>
>>>
>>>
>>> On Mon, Oct 5, 2015 at 3:58 PM, Adrian McCarthy 
>>> wrote:
>>>
 Different tools are giving me different numbers.

 At the time of the error, Windbg says there are about 2000 open
 handles, most of them are Event handles, not File handles.  That's higher
 than I'd expect, but not really concerning.


>>> Ah, that's useful.  I am using events (python threading.Event).  These
>>> don't expose any explicit clean-up mechanism, so I assume the underlying
>>> handles go away when the Python objects that hold them go away.
>>>
>>>
 Process Explorer, however, shows ~20k open handles per Python process
 running dotest.exe.  It also says that about 2000 of those are the
 process's "own handles."  I'm researching to see what that means.  I
 suspect it means that the process has ~18k handles to objects owned
 by another process and ~2k that it actually owns.

 I found this Stack Overflow post, which suggests it may be an
 interaction with using Python subprocess in a loop and having those
 subprocesses work with files that are still open in the parent process, but
 I don't entirely understand the answer:


 http://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files


>>> Hmm I'll read through that.
>>>
>>>
 It might be a problem with Python subprocess that's been fixed in a
 newer version.  I'm going to try upgrading from Python 2.7.9 to 2.7.10 to
 see if that makes a difference.


>>> Okay, we're on 2.7.10 on latest OS X.  I *think* I'm using Python 2.7.6
>>> on Ubuntu 14.04.  Checking now... (yes, 2.7.6 on 14.04).  Ubuntu 15.10 beta
>>> 1 is using Python 2.7.10.
>>>
>>> Seems reasonable to check that out.  Let me know what you find out!
>>>
>>> -Todd
>>>
>>>
 On Mon, Oct 5, 2015 at 12:02 PM, Todd Fiala 
 wrote:

> It's possible.  However, I was monitoring actual open files during the
> course of the run (i.e. what the kernel thought was open for the master
> driver process, which is the only place that makes sense to see leaks
> accumulate) in both threading and threading-pool (on OS X), and I saw only
> the handful of file handles that I'd expect to be open - pipes
> (stdout,stderr,stdin) from the main test runner to the inferior test
> runners, the shared libraries loaded as part of the test runner, and (in my
> case, but probably not yours for the configuration), the tcp sockets for
> gathering the test events.  There was no growth, and I didn't see things
> hanging around longer than I'd expect.
>
> The SysInternals process viewer tool is great for this kind of thing -
> glad you're using it.  Once you find out which file handles are getting
> leaked and where they came from, we can probably figure out which part of
> the implementation is leaking it.

Re: [lldb-dev] Too many open files

2015-10-05  Todd Fiala via lldb-dev
Interesting, okay..

This does appear to be an accumulation issue.  You made it most of the way
through before the issue hit.  I suspect we're leaking file handles.  It
probably doesn't hit the per-process limit on multiprocessing because the
leaked files get spread across more processes.

(All speculation, but it does fit the results.)
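
If the ceiling being hit is the Microsoft C runtime's cap on open stdio
streams (512 by default, raisable to 2048), a stopgap worth trying is
something like this early in the runner (sketch; I'm not certain that's
the cap in play here):

import ctypes

# _setmaxstdio raises the CRT's stdio stream cap (default 512, max 2048).
# It has to target the CRT that Python itself is linked against
# (msvcr90 for CPython 2.7, msvcr90d for a python_d debug build),
# not the system msvcrt.dll.
crt = ctypes.CDLL('msvcr90')
crt._setmaxstdio(2048)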

I'll see if I can look into what's there - if we've got an obvious leak,
I'll take care of it.

On Mon, Oct 5, 2015 at 9:58 AM, Adrian McCarthy  wrote:

> Thanks for the ideas.
>
> With `--test-runner-name threading-pool`, I get too many open files.
>
> With `--test-runner-name multiprocessing-pool`, the suite runs fine.
>
> My machine has 40 logical cores.
>
> With `--threads=20`:  SUCCESS (and perhaps _faster_).
>
> With `--threads=30`:  SUCCESS.
>
> With `--threads=36`:  SUCCESS.
>
> With `--threads=38`:  TOO MANY OPEN FILES.
>
> So we're right at the edge.  I'll keep investigating.
>
>
> On Fri, Oct 2, 2015 at 5:38 PM, Todd Fiala  wrote:
>
>> (swapped out the lldb list for the newer one)
>>
>> On Fri, Oct 2, 2015 at 5:37 PM, Todd Fiala  wrote:
>>
>>> Hmm, sounds suspicious.
>>>
>>> Can you try running the tests with two options and see if you get
>>> different results?
>>>
>>> # should be equivalent to the default on Windows, thus should match
>>> your above results.  This one uses a thread per worker queue.
>>> --test-runner-name threading-pool
>>>
>>> # should use a different test runner.  This one uses a process per
>>> worker queue.
>>> --test-runner-name multiprocessing-pool
>>>
>>> Aside from that, it seems like the total number of open files is
>>> exceeding some process/system maximum, which sounds like (maybe) we're
>>> leaking files somewhere.  Not enough info yet to guess where that might be
>>> coming from, but maybe a part of the test runner isn't closing files
>>> somewhere.
>>>
>>> The other thing you can try is reducing the total number of threads,
>>> with:
>>> --threads {some-number-lower-than-your-total-number-of-logical-cores}
>>>
>>> in the event that your machine has a mongo number of logical cores, and
>>> perhaps it is trying to do too much.  (In that case, the
>>> multiprocessing-pool runner might actually help).
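>>>
>>> For example, combining both suggestions (paths taken from your earlier
>>> log; adjust for your tree):
>>>
>>> python dotest.py --executable D:/src/llvm/build/ninja/bin/lldb.exe --test-runner-name multiprocessing-pool --threads 20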
>>>
>>> Thanks!
>>>
>>> -Todd
>>>
>>> On Fri, Oct 2, 2015 at 5:31 PM, Adrian McCarthy 
>>> wrote:
>>>
 When running LLDB tests on Windows, I started getting a "too many open
 files" error from Python.  I used git bisect to narrow it down to this
 revision:

 http://llvm.org/viewvc/llvm-project?view=revision&revision=249182

 The error output is:

 Command invoked: D:\src\Python-2.7.9\PCbuild\python_d.exe
 D:\src\llvm\llvm\tools\lldb\test\dotest.py -q --arch=i686 --executable
 D:/src/llvm/build/ninja/bin/lldb.exe -s
 D:/src/llvm/build/ninja/lldb-test-traces -u CXXFLAGS -u CFLAGS
 --enable-crash-dialog -C D:\src\llvm\build\ninja_release\bin\clang.exe
 --inferior -p TestRecursiveTypes.py D:\src\llvm\llvm\tools\lldb\test
 --event-add-entries worker_index=7:int
 384 out of 400 test suites processed - TestRecursiveTypes.py
 Traceback (most recent call last):
   File "D:/src/llvm/llvm/tools/lldb/test/dotest.py", line 1457, in
 
   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 1355, in main
   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 968, in
 walk_and_invoke
   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 1095, in
 
   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 889, in
 threading_test_runner_pool
   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 774, in
 map_async_run_loop
   File "D:\src\Python-2.7.9\Lib\multiprocessing\pool.py", line 558, in
 get
 OSError: [Errno 24] Too many open files
 [77809 refs]
 ninja: build stopped: subcommand failed.


 Any clue what might have caused this or what can be done to fix it?

 It's Friday afternoon, so there's no urgency from my perspective.  I'll
 probably get back to this on Monday morning.

 Thanks,
 Adrian McCarthy

>>>
>>>
>>>
>>> --
>>> -Todd
>>>
>>
>>
>>
>> --
>> -Todd
>>
>
>


-- 
-Todd


Re: [lldb-dev] Too many open files

2015-10-05  Adrian McCarthy via lldb-dev
Different tools are giving me different numbers.

At the time of the error, Windbg says there are about 2000 open handles,
most of them are Event handles, not File handles.  That's higher than I'd
expect, but not really concerning.

Process Explorer, however, shows ~20k open handles per Python process
running dotest.exe.  It also says that about 2000 of those are the
process's "own handles."  I'm researching to see what that means.  I
suspect it means that the process has ~18k handles to objects owned
by another process and ~2k that it actually owns.

I found this Stack Overflow post, which suggests it may be an interaction
with using Python subprocess in a loop and having those subprocesses work
with files that are still open in the parent process, but I don't entirely
understand the answer:

http://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files
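
If that's what's going on here, the usual mitigation is to keep the
parent's handles out of each child via close_fds, e.g. (sketch; the
command below is a hypothetical stand-in for an inferior's argv):

import subprocess

cmd = ['python', 'dotest.py', '--inferior']  # hypothetical child command
# close_fds keeps the parent's open handles from leaking into the child.
# Caveat: on Windows under Python 2, close_fds=True cannot be combined
# with stdin/stdout/stderr redirection, which may be exactly why the test
# runner doesn't use it.
child = subprocess.Popen(cmd, close_fds=True)
child.wait()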

It might be a problem with Python subprocess that's been fixed in a newer
version.  I'm going to try upgrading from Python 2.7.9 to 2.7.10 to see if
that makes a difference.

On Mon, Oct 5, 2015 at 12:02 PM, Todd Fiala  wrote:

> It's possible.  However, I was monitoring actual open files during the
> course of the run (i.e. what the kernel thought was open for the master
> driver process, which is the only place that makes sense to see leaks
> accumulate) in both threading and threading-pool (on OS X), and I saw only
> the handful of file handles that I'd expect to be open - pipes
> (stdout,stderr,stdin) from the main test runner to the inferior test
> runners, the shared libraries loaded as part of the test runner, and (in my
> case, but probably not yours for the configuration), the tcp sockets for
> gathering the test events.  There was no growth, and I didn't see things
> hanging around longer than I'd expect.
>
> The SysInternals process viewer tool is great for this kind of thing -
> glad you're using it.  Once you find out which file handles are getting
> leaked and where they came from, we can probably figure out which part of
> the implementation is leaking it.  I don't *expect* it to be on our side
> given that it's not showing up on a POSIX-y system, but maybe it really is,
> just not in the form of a file handle on the POSIX side.  I should have a
> look at the memory growth...
>
> On Mon, Oct 5, 2015 at 11:41 AM, Adrian McCarthy 
> wrote:
>
>> I'm poking around with some SysInternals tools.  Over the course of a test
>> run, there are about 602k opens (CreateFiles) and 405k
>> closes (CloseFiles) system-wide.
>>
>> I'm looking for a way to stop it once the error happens, so I can see how
>> many files each process has open.  As it stands, the OS cleans up once the
>> error is hit.
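>>
>> One idea (sketch): wrap whatever call dies with EMFILE so the process
>> parks before Python tears everything down, e.g.:
>>
>> import errno
>>
>> def run_with_handle_pause(fn):
>>     # Park on EMFILE so handles stay visible in Process Explorer.
>>     try:
>>         return fn()
>>     except OSError as e:
>>         if e.errno == errno.EMFILE:
>>             raw_input('EMFILE hit -- inspect handles, then press Enter')
>>         raise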
>>
>> I wonder if it's not a matter of actually leaking open file handles but
>> that the closes are happening too late so that we cross the threshold
>> shortly before the test runner would have shut everything down.
>>
>> On Mon, Oct 5, 2015 at 11:32 AM, Todd Fiala  wrote:
>>
>>> On OS X, I'm also not seeing growth in the --test-runner-name
>>> threading-pool (the one you were using on Windows).
>>>
>>> Perhaps you can dig into if you're experiencing some kind of file leak
>>> on Windows.  It's possible you're hitting a platform-specific leak?  I
>>> recall Ed Maste hitting a FreeBSD-only leak in one or more of the python
>>> 2.7.x releases.
>>>
>>> On Mon, Oct 5, 2015 at 11:26 AM, Todd Fiala 
>>> wrote:
>>>
 Hmm, on OS X the file handles seem to be well behaved on the
 --test-runner-name threading.  I'm not seeing any file handle growth beyond
 the file handles I expect to be open.

 I'll see if the threading-pool behaves differently.  (That is similar
 to threading but uses the multiprocessing.pool mechanism, at the expense of
 me not being able to catch Ctrl-C at all).

 It's possible the pool is introducing some leakage at the file level.

 On Mon, Oct 5, 2015 at 11:20 AM, Todd Fiala 
 wrote:

> Interesting, okay..
>
> This does appear to be an accumulation issue.  You made it most of the
> way through before the issue hit.  I suspect we're leaking file handles.
> It probably doesn't hit the per-process limit on multiprocessing because
> the leaked files get spread across more processes.
>
> (All speculation, but it does fit the results.)
>
> I'll see if I can look into what's there - if we've got an obvious
> leak, I'll take care of it.
>
> On Mon, Oct 5, 2015 at 9:58 AM, Adrian McCarthy 
> wrote:
>
>> Thanks for the ideas.
>>
>> With `--test-runner-name threading-pool`, I get too many open files.
>>
>> With `--test-runner-name multiprocessing-pool`, the suite runs fine.
>>
>> My machine has 40 logical cores.
>>
>> With `--threads=20`:  SUCCESS (and perhaps _faster_).

Re: [lldb-dev] Too many open files

2015-10-05  Adrian McCarthy via lldb-dev
Thanks for the ideas.

With `--test-runner-name threading-pool`, I get too many open files.

With `--test-runner-name multiprocessing-pool`, the suite runs fine.

My machine has 40 logical cores.

With `--threads=20`:  SUCCESS (and perhaps _faster_).

With `--threads=30`:  SUCCESS.

With `--threads=36`:  SUCCESS.

With `--threads=38`:  TOO MANY OPEN FILES.

So we're right at the edge.  I'll keep investigating.


On Fri, Oct 2, 2015 at 5:38 PM, Todd Fiala  wrote:

> (swapped out the lldb list for the newer one)
>
> On Fri, Oct 2, 2015 at 5:37 PM, Todd Fiala  wrote:
>
>> Hmm, sounds suspicious.
>>
>> Can you try running the tests with two options and see if you get
>> different results?
>>
>> # should be equivalent to the default on Windows, thus should match your
>> above results.  This one uses a thread per worker queue.
>> --test-runner-name threading-pool
>>
>> # should use a different test runner.  This one uses a process per worker
>> queue.
>> --test-runner-name multiprocessing-pool
>>
>> Aside from that, it seems like the total number of open files is
>> exceeding some process/system maximum, which sounds like (maybe) we're
>> leaking files somewhere.  Not enough info yet to guess where that might be
>> coming from, but maybe a part of the test runner isn't closing files
>> somewhere.
>>
>> The other thing you can try is reducing the total number of threads, with:
>> --threads {some-number-lower-than-your-total-number-of-logical-cores}
>>
>> in the event that your machine has a mongo number of logical cores, and
>> perhaps it is trying to do too much.  (In that case, the
>> multiprocessing-pool runner might actually help).
>>
>> Thanks!
>>
>> -Todd
>>
>> On Fri, Oct 2, 2015 at 5:31 PM, Adrian McCarthy 
>> wrote:
>>
>>> When running LLDB tests on Windows, I started getting a "too many open
>>> files" error from Python.  I used git bisect to narrow it down to this
>>> revision:
>>>
>>> http://llvm.org/viewvc/llvm-project?view=revision&revision=249182
>>>
>>> The error output is:
>>>
>>> Command invoked: D:\src\Python-2.7.9\PCbuild\python_d.exe
>>> D:\src\llvm\llvm\tools\lldb\test\dotest.py -q --arch=i686 --executable
>>> D:/src/llvm/build/ninja/bin/lldb.exe -s
>>> D:/src/llvm/build/ninja/lldb-test-traces -u CXXFLAGS -u CFLAGS
>>> --enable-crash-dialog -C D:\src\llvm\build\ninja_release\bin\clang.exe
>>> --inferior -p TestRecursiveTypes.py D:\src\llvm\llvm\tools\lldb\test
>>> --event-add-entries worker_index=7:int
>>> 384 out of 400 test suites processed - TestRecursiveTypes.py
>>>   Traceback (most recent call last):
>>>   File "D:/src/llvm/llvm/tools/lldb/test/dotest.py", line 1457, in
>>> 
>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 1355, in main
>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 968, in
>>> walk_and_invoke
>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 1095, in
>>> 
>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 889, in
>>> threading_test_runner_pool
>>>   File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 774, in
>>> map_async_run_loop
>>>   File "D:\src\Python-2.7.9\Lib\multiprocessing\pool.py", line 558, in
>>> get
>>> OSError: [Errno 24] Too many open files
>>> [77809 refs]
>>> ninja: build stopped: subcommand failed.
>>>
>>>
>>> Any clue what might have caused this or what can be done to fix it?
>>>
>>> It's Friday afternoon, so there's no urgency from my perspective.  I'll
>>> probably get back to this on Monday morning.
>>>
>>> Thanks,
>>> Adrian McCarthy
>>>
>>
>>
>>
>> --
>> -Todd
>>
>
>
>
> --
> -Todd
>