The pluggable method would at least allow everyone to continue working until someone has time to dig into what's wrong with multiprocess on Windows.
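As a rough illustration only (not a patch against dosep.py), a pluggable runner could look something like the sketch below. Every name here (run_one_test, RUNNER_STRATEGIES, run_tests) is hypothetical, and run_one_test is just a stand-in for a methodized entry point into dotest.py, which would still have to be written.

```python
import multiprocessing
from multiprocessing.pool import ThreadPool


def run_one_test(test_name):
    # Hypothetical stand-in for a methodized dotest.py entry point.
    # A real version would run the test file and report its result.
    return (test_name, "pass")


def run_with_multiprocessing(test_names, num_workers):
    # One worker process per slot; each worker calls the entry point
    # directly, so no extra subprocess exec is needed for isolation.
    pool = multiprocessing.Pool(num_workers)
    try:
        return pool.map(run_one_test, test_names)
    finally:
        pool.close()
        pool.join()


def run_with_threads(test_names, num_workers):
    # Same interface, but workers are threads in a single process,
    # so results come back in memory and one debugger attach suffices.
    pool = ThreadPool(num_workers)
    try:
        return pool.map(run_one_test, test_names)
    finally:
        pool.close()
        pool.join()


def run_serially(test_names, num_workers):
    # No parallelism at all; handy when debugging the entry point.
    return [run_one_test(name) for name in test_names]


# The "plug" point: a platform or a command-line flag picks one key.
RUNNER_STRATEGIES = {
    "multiprocessing": run_with_multiprocessing,
    "threading": run_with_threads,
    "serial": run_serially,
}


def run_tests(test_names, strategy="multiprocessing", num_workers=2):
    return RUNNER_STRATEGIES[strategy](test_names, num_workers)


if __name__ == "__main__":
    # e.g. Windows could select "threading" until multiprocess is sorted out.
    print(run_tests(["TestA.py", "TestB.py"], strategy="threading"))
```

With something along these lines, each platform could default to whichever strategy works there, and the rest of the runner would not need to know which one was picked.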

On Fri, Sep 4, 2015 at 9:56 PM Todd Fiala <todd.fi...@gmail.com> wrote:

> On Fri, Sep 4, 2015 at 5:40 PM, Zachary Turner <ztur...@google.com> wrote:
>
>>
>>
>> On Fri, Sep 4, 2015 at 5:10 PM Todd Fiala <todd.fi...@gmail.com> wrote:
>>
>>> tfiala added a comment.
>>>
>>> In http://reviews.llvm.org/D12651#240480, @zturner wrote:
>>>
>>> > Tried out this patch, unfortunately I'm seeing the same thing. The very
>>> > first call to worker.join() is never returning.
>>> >
>>> > It's unfortunate that it's so hard to debug this stuff, do you have any
>>> > suggestions for how I can try to nail down what the child dotest instance
>>> > is actually doing? I wonder if it's blocking somewhere in its script, or
>>> > if this is some quirk of the multiprocessing library's dynamic invocation /
>>> > whatever magic it does.
>>> >
>>> > How much of an effort would it be to make the switch to threads now? The
>>> > main thing we'd have to do is get rid of all of the globals in dotest, and
>>> > make a DoTest class or something.
>>>
>>>
>>> It's a bit more work than I want to take on right now. I think we
>>> really may want to keep the multiprocessing and just not exec out to
>>> dotest.py for a third-ish time for each inferior.
>>>
>>
>> Just to clarify, are you saying we may want to keep multiprocessing over
>> threads even if you can solve the exec problem? Any particular reason?
>>
>
> Yes, you understood me correctly.
>
> Prior to me getting into it, dosep.py was designed to isolate each test
> into its own process (via the subprocess exec call) so that each test
> directory or file got its own lldb processor and there was process-level
> isolation, less contention on the Python global interpreter lock, etc.
>
> Then, when Steve Pucci and later I got to making it multithreaded, we
> wrapped the exec call in an "import threading" style thread pool. That
> maintained the process isolation property by having each thread just do an
> exec (i.e. multiple execs in parallel). Except, this didn't work on
> MacOSX. The exec calls grab the Python GIL on OS X (and not anywhere else as
> far as I could find). But multithreading + exec is a valid option for
> everything not OS X.
>
> The way I solved it to work for everyone was to drop the "import
> threading" approach and switch to the "import multiprocessing" approach.
> This worked everywhere, including on OS X (although with a few hiccups
> initially, as it exposed occasional hangs at the time with what looked like
> socket handling under Darwin). What I failed to see in my haste was that I
> then had two levels of fork/exec-like behavior (i.e. we had two process
> firewalls where we only needed one, at the cost of an extra exec): the
> multiprocessing works by effectively forking/creating a new process that is
> now isolated. But then we turn around and still create a subprocess to
> exec out to dotest.py.
>
> What I'm suggesting in the near future is if we stick with the
> multiprocessing approach, and eliminate the subprocess exec and instead
> just have the multiprocess worker call directly into a methodized entry
> point in dotest.py, we can skip the subprocess call within the multiprocess
> worker. It is already isolated and a separate process, so it is already
> fulfilling the isolation requirement. And it reduces the doubled processes
> created. And it works on OS X in addition to everywhere else. It does
> become more difficult to debug, but then again the majority of the logic is
> in dotest.py and can be debugged --no-multiprocess (or with logging).
>
> This is all separate somewhat from the Ctrl-C issue, but it is the
> backstory on what I'm referring to with the parallel test runner.
>
> Completely as an aside, I did ask Greg Clayton to see if he can poke into
> why OS X is hitting the Python GIL on execs in "import threading"-style
> execs from multiple threads. But assuming nothing magic changes there and
> it wasn't easily solved (I tried and failed after several attempts to
> diagnose last year), I'd prefer to keep a strategy that is the same unless
> there's a decent win on the execution front.
>
> That all said, I'm starting to think a pluggable strategy for the actual
> mechanic of the parallel test run might end up being best anyway since I'd
> really like the Ctrl-C working and I'm not able to diagnose what's
> happening on the Windows scenario.
>
>
>> Multi-threaded is much easier to debug, for starters, because you can
>> just attach your debugger to a single process. It also solves a lot of
>> race conditions and makes output processing easier (not to mention higher
>> performance), because you don't even need a way to have the sub-processes
>> communicate their results back to the parent because the results are just
>> in memory. Stick them in a synchronized queue and the parent can just
>> process it. So it would probably even speed up the test runner.
>>
>> I think if there's not a very good reason to keep multiprocessing around,
>> we should aim for a threaded approach. My understanding is that lit
>> already does this, so there's no fundamental reason it shouldn't work
>> correctly on MacOSX, just have to solve the exec problem like you mentioned.
>>
>>
>>
>
>
> --
> -Todd
>
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits