That'll also let me set it up so Greg can poke around with the threading version on OS X.
On Fri, Sep 4, 2015 at 10:04 PM, Todd Fiala <todd.fi...@gmail.com> wrote: > Yep, I'm thinking that's right. > > On Fri, Sep 4, 2015 at 10:02 PM, Zachary Turner <ztur...@google.com> > wrote: > >> The pluggable method would at least allow everyone to continue working >> until someone has time to dig into what's wrong with multiprocess on Windows >> >> On Fri, Sep 4, 2015 at 9:56 PM Todd Fiala <todd.fi...@gmail.com> wrote: >> >>> On Fri, Sep 4, 2015 at 5:40 PM, Zachary Turner <ztur...@google.com> >>> wrote: >>> >>>> >>>> >>>> On Fri, Sep 4, 2015 at 5:10 PM Todd Fiala <todd.fi...@gmail.com> wrote: >>>> >>>>> tfiala added a comment. >>>>> >>>>> In http://reviews.llvm.org/D12651#240480, @zturner wrote: >>>>> >>>>> > Tried out this patch, unfortunately I'm seeing the same thing. The >>>>> very >>>>> > first call to worker.join() is never returning. >>>>> > >>>>> > It's unfortunate that it's so hard to debug this stuff, do you have >>>>> any >>>>> > suggestions for how I can try to nail down what the child dotest >>>>> instance >>>>> > is actually doing? I wonder if it's blocking somewhere in its >>>>> script, or >>>>> > if this is some quirk of the multiprocessing library's dynamic >>>>> invocation / >>>>> > whatever magic is does. >>>>> > >>>>> > How much of an effort would it be to make the switch to threads >>>>> now? The >>>>> > main thing we'd have to do is get rid of all of the globals in >>>>> dotest, and >>>>> > make a DoTest class or something. >>>>> >>>>> >>>>> It's a bit more work than I want to take on right now. I think we >>>>> really may want to keep the multiprocessing and just not exec out to >>>>> dotest.py for a third-ish time for each inferior. >>>>> >>>> >>>> Just to clarify, are you saying we may want to keep multiprocessing >>>> over threads even if you can solve the exec problem? Any particular >>>> reason? >>>> >>> >>> Yes, you understood me correctly. >>> >>> Prior to me getting into it, dosep.py was designed to isolate each test >>> into its own process (via the subprocess exec call) so that each test >>> directory or file got its own lldb processor and there was process-level >>> isolation, less contention on the Python global interpreter lock, etc. >>> >>> Then, when Steve Pucci and later I got to making it multithreaded, we >>> wrapped the exec call in a "import threading" style thread pool. That >>> maintained the process isolation property by having each thread just do an >>> exec (i.e. multiple execs in parallel). Except, this didn't work on >>> MacOSX. The exec calls grab the Python GIL on OS X (and not anywhere as as >>> far as I could find). But multithreading + exec is a valid option for >>> everything not OS X. >>> >>> The way I solved it to work for everyone was to drop the "import >>> threading" approach and switch to the "import multiprocessing" approach. >>> This worked everywhere, including on OS X (although with a few hiccups >>> initially, as it exposed occasional hangs at the time with what looked like >>> socket handling under Darwin). What I failed to see in my haste was that I >>> then had two levels of fork/exec-like behavior (i.e. we had two process >>> firewalls where we only needed one, at the cost of an extra exec): the >>> multiprocessing works by effectively forking/creating a new process that is >>> now isolated. But then we turn around and still create a subprocess to >>> exec out to dotest.py. >>> >>> What I'm suggesting in the near future is if we stick with the >>> multiprocessing approach, and eliminate the subprocess exec and instead >>> just have the multiprocess worker call directly into a methodized entry >>> point in dotest.py, we can skip the subprocess call within the multiprocess >>> worker. It is already isolated and a separate process, so it is already >>> fulfilling the isolation requirement. And it reduces the doubled processes >>> created. And it works on OS X in addition to everywhere else. It does >>> become more difficult to debug, but then again the majority of the logic is >>> in dotest.py and can be debugged --no-multiprocess (or with logging). >>> >>> This is all separate somewhat from the Ctrl-C issue, but it is the >>> backstory on what I'm referring to with the parallel test runner. >>> >>> Completely as an aside, I did ask Greg Clayton to see if he can poke >>> into why OS X is hitting the Python GIL on execs in "import >>> threading"-style execs from multiple threads. But assuming nothing magic >>> changes there and it wasn't easily solved (I tried and failed after several >>> attempts to diagnose last year), I'd prefer to keep a strategy that is the >>> same unless there's a decent win on the execution front. >>> >>> That all said, I'm starting to think a pluggable strategy for the actual >>> mechanic of the parallel test run might end up being best anyway since I'd >>> really like the Ctrl-C working and I'm not able to diagnose what's >>> happening on the Windows scenario. >>> >>> >>>> Multi-threaded is much easier to debug, for starters, because you can >>>> just attach your debugger to a single process. It also solves a lot of >>>> race conditions and makes output processing easier (not to mention higher >>>> performance), because you don't even need a way to have the sub-processes >>>> communicate their results back to the parent because the results are just >>>> in memory. stick them in a synchronized queue and the parent can just >>>> process it. So it would probably even speed up the test runner. >>>> >>>> I think if there's not a very good reason to keep multiprocessing >>>> around, we should aim for a threaded approach. My understanding is that >>>> lit already does this, so there's no fundamental reason it shouldn't work >>>> correctly on MacOSX, just have to solve the exec problem like you >>>> mentioned. >>>> >>>> >>>> >>> >>> >>> -- >>> -Todd >>> >> > > > -- > -Todd > -- -Todd
_______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits