Yep, I'm thinking that's right. On Fri, Sep 4, 2015 at 10:02 PM, Zachary Turner <ztur...@google.com> wrote:
> The pluggable method would at least allow everyone to continue working > until someone has time to dig into what's wrong with multiprocess on Windows > > On Fri, Sep 4, 2015 at 9:56 PM Todd Fiala <todd.fi...@gmail.com> wrote: > >> On Fri, Sep 4, 2015 at 5:40 PM, Zachary Turner <ztur...@google.com> >> wrote: >> >>> >>> >>> On Fri, Sep 4, 2015 at 5:10 PM Todd Fiala <todd.fi...@gmail.com> wrote: >>> >>>> tfiala added a comment. >>>> >>>> In http://reviews.llvm.org/D12651#240480, @zturner wrote: >>>> >>>> > Tried out this patch, unfortunately I'm seeing the same thing. The >>>> very >>>> > first call to worker.join() is never returning. >>>> > >>>> > It's unfortunate that it's so hard to debug this stuff, do you have >>>> any >>>> > suggestions for how I can try to nail down what the child dotest >>>> instance >>>> > is actually doing? I wonder if it's blocking somewhere in its >>>> script, or >>>> > if this is some quirk of the multiprocessing library's dynamic >>>> invocation / >>>> > whatever magic is does. >>>> > >>>> > How much of an effort would it be to make the switch to threads now? >>>> The >>>> > main thing we'd have to do is get rid of all of the globals in >>>> dotest, and >>>> > make a DoTest class or something. >>>> >>>> >>>> It's a bit more work than I want to take on right now. I think we >>>> really may want to keep the multiprocessing and just not exec out to >>>> dotest.py for a third-ish time for each inferior. >>>> >>> >>> Just to clarify, are you saying we may want to keep multiprocessing over >>> threads even if you can solve the exec problem? Any particular reason? >>> >> >> Yes, you understood me correctly. >> >> Prior to me getting into it, dosep.py was designed to isolate each test >> into its own process (via the subprocess exec call) so that each test >> directory or file got its own lldb processor and there was process-level >> isolation, less contention on the Python global interpreter lock, etc. >> >> Then, when Steve Pucci and later I got to making it multithreaded, we >> wrapped the exec call in a "import threading" style thread pool. That >> maintained the process isolation property by having each thread just do an >> exec (i.e. multiple execs in parallel). Except, this didn't work on >> MacOSX. The exec calls grab the Python GIL on OS X (and not anywhere as as >> far as I could find). But multithreading + exec is a valid option for >> everything not OS X. >> >> The way I solved it to work for everyone was to drop the "import >> threading" approach and switch to the "import multiprocessing" approach. >> This worked everywhere, including on OS X (although with a few hiccups >> initially, as it exposed occasional hangs at the time with what looked like >> socket handling under Darwin). What I failed to see in my haste was that I >> then had two levels of fork/exec-like behavior (i.e. we had two process >> firewalls where we only needed one, at the cost of an extra exec): the >> multiprocessing works by effectively forking/creating a new process that is >> now isolated. But then we turn around and still create a subprocess to >> exec out to dotest.py. >> >> What I'm suggesting in the near future is if we stick with the >> multiprocessing approach, and eliminate the subprocess exec and instead >> just have the multiprocess worker call directly into a methodized entry >> point in dotest.py, we can skip the subprocess call within the multiprocess >> worker. It is already isolated and a separate process, so it is already >> fulfilling the isolation requirement. And it reduces the doubled processes >> created. And it works on OS X in addition to everywhere else. It does >> become more difficult to debug, but then again the majority of the logic is >> in dotest.py and can be debugged --no-multiprocess (or with logging). >> >> This is all separate somewhat from the Ctrl-C issue, but it is the >> backstory on what I'm referring to with the parallel test runner. >> >> Completely as an aside, I did ask Greg Clayton to see if he can poke into >> why OS X is hitting the Python GIL on execs in "import threading"-style >> execs from multiple threads. But assuming nothing magic changes there and >> it wasn't easily solved (I tried and failed after several attempts to >> diagnose last year), I'd prefer to keep a strategy that is the same unless >> there's a decent win on the execution front. >> >> That all said, I'm starting to think a pluggable strategy for the actual >> mechanic of the parallel test run might end up being best anyway since I'd >> really like the Ctrl-C working and I'm not able to diagnose what's >> happening on the Windows scenario. >> >> >>> Multi-threaded is much easier to debug, for starters, because you can >>> just attach your debugger to a single process. It also solves a lot of >>> race conditions and makes output processing easier (not to mention higher >>> performance), because you don't even need a way to have the sub-processes >>> communicate their results back to the parent because the results are just >>> in memory. stick them in a synchronized queue and the parent can just >>> process it. So it would probably even speed up the test runner. >>> >>> I think if there's not a very good reason to keep multiprocessing >>> around, we should aim for a threaded approach. My understanding is that >>> lit already does this, so there's no fundamental reason it shouldn't work >>> correctly on MacOSX, just have to solve the exec problem like you mentioned. >>> >>> >>> >> >> >> -- >> -Todd >> > -- -Todd
_______________________________________________ lldb-commits mailing list lldb-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits