On Mon, Oct 5, 2015 at 3:58 PM, Adrian McCarthy <amcca...@google.com> wrote:
> Different tools are giving me different numbers. > > At the time of the error, Windbg says there are about 2000 open handles, > most of them are Event handles, not File handles. That's higher than I'd > expect, but not really concerning. > > Ah, that's useful. I am using events (python threading.Event). These don't afford any clean up mechanisms on them, so I assume these go away when the Python objects that hold them go away. > Process Explorer, however, shows ~20k open handles per Python process > running dotest.exe. It also says that about 2000 of those are the > process's "own handles." I'm researching to see what that means. I > suspect it means that the process has about ~18k handles to objects owned > by another process and 2k of ones that it actually owns. > > I found this Stack Overflow post, which suggests is may be an interaction > with using Python subprocess in a loop and having those subprocesses work > with files that are still open in the parent process, but I don't entirely > understand the answer: > > > http://stackoverflow.com/questions/16526783/python-subprocess-too-many-open-files > > Hmm I'll read through that. > It might be a problem with Python subprocess that's been fixed in a newer > version. I'm going to try upgrading from Python 2.7.9 to 2.7.10 to see if > that makes a difference. > > Okay, we're on 2.7.10 on latest OS X. I *think* I'm using Python 2.7.6 on Ubuntu 14.04. Checking now... (yes, 2.7.6 on 14.04). Ubuntu 15.10 beta 1 is using Python 2.7.10. Seems reasonable to check that out. Let me know what you find out! -Todd > On Mon, Oct 5, 2015 at 12:02 PM, Todd Fiala <todd.fi...@gmail.com> wrote: > >> It's possible. However, I was monitoring actual open files during the >> course of the run (i.e. what the kernel thought was open for the master >> driver process, which is the only place that makes sense to see leaks >> accumulate) in both threading and threading-pool (on OS X), and I saw only >> the handful of file handles that I'd expect to be open - pipes >> (stdout,stderr,stdin) from the main test runner to the inferior test >> runners, the shared libraries loaded as part of the test runner, and (in my >> case, but probably not yours for the configuration), the tcp sockets for >> gathering the test events. There was no growth, and I didn't see things >> hanging around longer than I'd expect. >> >> The SysInternals process viewer tool is great for this kind of thing - >> glad you're using it. Once you find out which file handles are getting >> leaked and where they came from, we can probably figure out which part of >> the implementation is leaking it. I don't *expect* it to be on our side >> given that it's not showing up on a POSIX-y system, but maybe it really is >> but isn't in the form of a file handle on the POSIX side. I should have a >> look at the memory growth... >> >> On Mon, Oct 5, 2015 at 11:41 AM, Adrian McCarthy <amcca...@google.com> >> wrote: >> >>> I'm poking around with some SysInternals tools. Over the course of test >>> run, there are about 602k opens (CreateFiles) and 405k >>> closes (CloseFiles) system-wide. >>> >>> I'm looking for a way to stop it once the error happens, so I can see >>> how many files each process has open. As it stands, the OS cleans up once >>> the error is hit. >>> >>> I wonder if it's not a matter of actually leaking open file handles but >>> that the closes are happening too late so that we cross the threshold >>> shortly before the test runner would have shut everything down. >>> >>> On Mon, Oct 5, 2015 at 11:32 AM, Todd Fiala <todd.fi...@gmail.com> >>> wrote: >>> >>>> On OS X, I'm also not seeing growth in the --test-runner-name >>>> threading-pool (the one you were using on Windows). >>>> >>>> Perhaps you can dig into if you're experiencing some kind of file leak >>>> on Windows. It's possible you're hitting a platform-specific leak? I >>>> recall Ed Maste hitting a FreeBSD-only leak in one or more of the python >>>> 2.7.x releases. >>>> >>>> On Mon, Oct 5, 2015 at 11:26 AM, Todd Fiala <todd.fi...@gmail.com> >>>> wrote: >>>> >>>>> Hmm, on OS X the file handles seem to be well behaved on the >>>>> --test-runner-name threading. I'm not seeing any file handle growth >>>>> beyond >>>>> the file handles I expect to be open. >>>>> >>>>> I'll see if the threading-pool behaves differently. (That is similar >>>>> to threading but uses the multiprocessing.pool mechanism, at the expense >>>>> of >>>>> me not being able to catch Ctrl-C at all). >>>>> >>>>> It's possible the pool is introducing some leakage at the file level. >>>>> >>>>> On Mon, Oct 5, 2015 at 11:20 AM, Todd Fiala <todd.fi...@gmail.com> >>>>> wrote: >>>>> >>>>>> Interesting, okay.. >>>>>> >>>>>> This does appear to be an accumulation issue. You made it most of >>>>>> the way through before the issue hit. I suspect we're leaking file >>>>>> handles. It probably doesn't hit the per-process limit on >>>>>> multiprocessing >>>>>> because the leaked files get spread across more processes. >>>>>> >>>>>> (All speculation but does fit the results). >>>>>> >>>>>> I'll see if I can look into what's there - if we've got an obvious >>>>>> leak, I'll take care of it. >>>>>> >>>>>> On Mon, Oct 5, 2015 at 9:58 AM, Adrian McCarthy <amcca...@google.com> >>>>>> wrote: >>>>>> >>>>>>> Thanks for the ideas. >>>>>>> >>>>>>> With `--test-runner-name threading-pool`, I get too many open files. >>>>>>> >>>>>>> With `--test-runner-name multiprocessing-pool`, the suite runs fine. >>>>>>> >>>>>>> My machine has 40 logical cores. >>>>>>> >>>>>>> With `--threads=20`: SUCCESS (and perhaps _faster_). >>>>>>> >>>>>>> With `--threads=30`: SUCCESS. >>>>>>> >>>>>>> With `--threads=36`: SUCCESS. >>>>>>> >>>>>>> With `--threads=38`: TOO MANY OPEN FILES. >>>>>>> >>>>>>> So we're right at the edge. I'll keep investigating. >>>>>>> >>>>>>> So it seems we're on the bleeding edge. >>>>>>> >>>>>>> >>>>>>> On Fri, Oct 2, 2015 at 5:38 PM, Todd Fiala <todd.fi...@gmail.com> >>>>>>> wrote: >>>>>>> >>>>>>>> (swapped out the lldb list for the newer one) >>>>>>>> >>>>>>>> On Fri, Oct 2, 2015 at 5:37 PM, Todd Fiala <todd.fi...@gmail.com> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> Hmm, sounds suspicious. >>>>>>>>> >>>>>>>>> Can you try running the tests with two options and see if you get >>>>>>>>> different results? >>>>>>>>> >>>>>>>>> # should be equivalent for the default on Windows, thus should >>>>>>>>> match your above results. This one uses a thread per worker queue. >>>>>>>>> --test-runner-name threading-pool >>>>>>>>> >>>>>>>>> # should use a different test runner. This one uses a process per >>>>>>>>> worker queue. >>>>>>>>> --test-runner-name multiprocessing-pool >>>>>>>>> >>>>>>>>> Aside from that, it seems like the total number of open files is >>>>>>>>> exceeding some process/system maximum, which sounds like (maybe) we're >>>>>>>>> leaking files somewhere. Not enough info yet to guess where that >>>>>>>>> might be >>>>>>>>> coming in from, but maybe a part of the test runner isn't closing >>>>>>>>> files >>>>>>>>> somewhere. >>>>>>>>> >>>>>>>>> The other thing you can try is reducing the total number of >>>>>>>>> threads, with: >>>>>>>>> --threads >>>>>>>>> {some-number-lower-than-your-total-number-of-logical-cores} >>>>>>>>> >>>>>>>>> in the event that your machine has a mongo number of logical >>>>>>>>> cores, and perhaps it is trying to do too much. (In that case, the >>>>>>>>> multiprocessing-pool runner might actually help). >>>>>>>>> >>>>>>>>> Thanks! >>>>>>>>> >>>>>>>>> -Todd >>>>>>>>> >>>>>>>>> On Fri, Oct 2, 2015 at 5:31 PM, Adrian McCarthy < >>>>>>>>> amcca...@google.com> wrote: >>>>>>>>> >>>>>>>>>> When running LLDB tests on Windows, I started getting a "too many >>>>>>>>>> open files" error from Python. I used git bisect to narrow it down >>>>>>>>>> to this >>>>>>>>>> revision: >>>>>>>>>> >>>>>>>>>> http://llvm.org/viewvc/llvm-project?view=revision&revision=249182 >>>>>>>>>> >>>>>>>>>> The error output is: >>>>>>>>>> >>>>>>>>>> Command invoked: D:\src\Python-2.7.9\PCbuild\python_d.exe >>>>>>>>>> D:\src\llvm\llvm\tools\lldb\test\dotest.py -q --arch=i686 >>>>>>>>>> --executable >>>>>>>>>> D:/src/llvm/build/ninja/bin/lldb.exe -s >>>>>>>>>> D:/src/llvm/build/ninja/lldb-test-traces -u CXXFLAGS -u CFLAGS >>>>>>>>>> --enable-crash-dialog -C >>>>>>>>>> D:\src\llvm\build\ninja_release\bin\clang.exe >>>>>>>>>> --inferior -p TestRecursiveTypes.py D:\src\llvm\llvm\tools\lldb\test >>>>>>>>>> --event-add-entries worker_index=7:int >>>>>>>>>> 384 out of 400 test suites processed - TestRecursiveTypes.py >>>>>>>>>> Traceback (most recent call last): >>>>>>>>>> File "D:/src/llvm/llvm/tools/lldb/test/dotest.py", line 1457, >>>>>>>>>> in <module> >>>>>>>>>> File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 1355, in >>>>>>>>>> main >>>>>>>>>> File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 968, in >>>>>>>>>> walk_and_invoke >>>>>>>>>> File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 1095, in >>>>>>>>>> <lambda> >>>>>>>>>> File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 889, in >>>>>>>>>> threading_test_runner_pool >>>>>>>>>> File "D:\src\llvm\llvm\tools\lldb\test\dosep.py", line 774, in >>>>>>>>>> map_async_run_loop >>>>>>>>>> File "D:\src\Python-2.7.9\Lib\multiprocessing\pool.py", line >>>>>>>>>> 558, in get >>>>>>>>>> OSError: [Errno 24] Too many open files >>>>>>>>>> [77809 refs] >>>>>>>>>> ninja: build stopped: subcommand failed. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Any clue what might have caused this or what can be done to fix >>>>>>>>>> it? >>>>>>>>>> >>>>>>>>>> It's Friday afternoon, so there's no urgency from my >>>>>>>>>> perspective. I'll probably get back to this on Monday morning. >>>>>>>>>> >>>>>>>>>> Thanks, >>>>>>>>>> Adrian McCarthy >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> -- >>>>>>>>> -Todd >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> -Todd >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> -- >>>>>> -Todd >>>>>> >>>>> >>>>> >>>>> >>>>> -- >>>>> -Todd >>>>> >>>> >>>> >>>> >>>> -- >>>> -Todd >>>> >>> >>> >> >> >> -- >> -Todd >> > > -- -Todd
_______________________________________________ lldb-dev mailing list lldb-dev@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-dev