The pluggable method would at least allow everyone to continue working
until someone has time to dig into what's wrong with multiprocess on Windows.
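As a rough illustration of what "pluggable" could mean here, the sketch below keeps a small registry of runner strategies and picks one by name. Every name in it (run_one, STRATEGIES, run_tests, the strategy keys) is hypothetical and not taken from dosep.py or dotest.py; run_one is only a placeholder for running a single test directory or file.

```python
import multiprocessing
from concurrent.futures import ThreadPoolExecutor


def run_one(test_name):
    # Hypothetical stand-in for running one test directory or file.
    return (test_name, "pass")


def run_serial(test_names, workers):
    # No parallelism at all: the easiest variant to debug.
    return [run_one(name) for name in test_names]


def run_threaded(test_names, workers):
    # One process, several threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, test_names))


def run_multiprocess(test_names, workers):
    # Several worker processes; run_one must be importable by the workers.
    with multiprocessing.Pool(processes=workers) as pool:
        return pool.map(run_one, test_names)


# Hypothetical registry: a command-line option could select the key.
STRATEGIES = {
    "serial": run_serial,
    "threading": run_threaded,
    "multiprocessing": run_multiprocess,
}


def run_tests(test_names, strategy="multiprocessing", workers=4):
    try:
        runner = STRATEGIES[strategy]
    except KeyError:
        raise ValueError("unknown strategy: %r" % strategy)
    return runner(test_names, workers)
```

With something along these lines, a platform where one mechanism hangs could fall back to another, e.g. run_tests(names, strategy="threading"), without touching the rest of the runner.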

On Fri, Sep 4, 2015 at 9:56 PM Todd Fiala <todd.fi...@gmail.com> wrote:

> On Fri, Sep 4, 2015 at 5:40 PM, Zachary Turner <ztur...@google.com> wrote:
>
>>
>>
>> On Fri, Sep 4, 2015 at 5:10 PM Todd Fiala <todd.fi...@gmail.com> wrote:
>>
>>> tfiala added a comment.
>>>
>>> In http://reviews.llvm.org/D12651#240480, @zturner wrote:
>>>
>>> > Tried out this patch; unfortunately I'm seeing the same thing.  The
>>> > very first call to worker.join() never returns.
>>> >
>>> > It's unfortunate that it's so hard to debug this stuff.  Do you have any
>>> > suggestions for how I can try to nail down what the child dotest
>>> > instance is actually doing?  I wonder if it's blocking somewhere in its
>>> > script, or if this is some quirk of the multiprocessing library's
>>> > dynamic invocation / whatever magic it does.
>>> >
>>> > How much of an effort would it be to make the switch to threads now?
>>> > The main thing we'd have to do is get rid of all of the globals in
>>> > dotest, and make a DoTest class or something.
>>>
>>>
>>> It's a bit more work than I want to take on right now.  I think we
>>> really may want to keep the multiprocessing and just not exec out to
>>> dotest.py for a third-ish time for each inferior.
>>>
>>
>> Just to clarify, are you saying we may want to keep multiprocessing over
>> threads even if you can solve the exec problem?  Any particular reason?
>>
>
> Yes, you understood me correctly.
>
> Prior to me getting into it, dosep.py was designed to isolate each test
> into its own process (via the subprocess exec call) so that each test
> directory or file got its own lldb process and there was process-level
> isolation, less contention on the Python global interpreter lock, etc.
>
> Then, when Steve Pucci and later I got to making it multithreaded, we
> wrapped the exec call in an "import threading"-style thread pool.  That
> maintained the process isolation property by having each thread just do an
> exec (i.e. multiple execs in parallel).  Except, this didn't work on
> MacOSX.  The exec calls grab the Python GIL on OS X (and not anywhere else,
> as far as I could find).  But multithreading + exec is a valid option for
> everything not OS X.
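For readers following along, the "thread pool wrapping an exec" arrangement described above can be sketched roughly as below. This is not the dosep.py code: it uses concurrent.futures.ThreadPoolExecutor instead of a hand-built "import threading" pool, and the child command is a placeholder that only prints a line where the real runner would launch dotest.py. The names run_in_subprocess and run_all are made up for the example.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_in_subprocess(test_name):
    # Placeholder child command; the real runner would launch dotest.py here.
    command = [
        sys.executable,
        "-c",
        "import sys; print('ran ' + sys.argv[1])",
        test_name,
    ]
    completed = subprocess.run(command, capture_output=True, text=True)
    return test_name, completed.returncode, completed.stdout.strip()


def run_all(test_names, workers=4):
    # Each thread blocks on its own child process, so several children run
    # in parallel while every test still gets process-level isolation.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_in_subprocess, test_names))
```

The threads here do almost no Python work themselves; they mostly wait on the child processes.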
>
> The way I solved it to work for everyone was to drop the "import
> threading" approach and switch to the "import multiprocessing" approach.
> This worked everywhere, including on OS X (although with a few hiccups
> initially, as it exposed occasional hangs at the time with what looked like
> socket handling under Darwin).  What I failed to see in my haste was that I
> then had two levels of fork/exec-like behavior (i.e. we had two process
> firewalls where we only needed one, at the cost of an extra exec): the
> multiprocessing works by effectively forking/creating a new process that is
> now isolated.  But then we turn around and still create a subprocess to
> exec out to dotest.py.
>
> What I'm suggesting for the near future is that we stick with the
> multiprocessing approach, eliminate the subprocess exec, and instead
> have the multiprocess worker call directly into a methodized entry
> point in dotest.py.  The worker is already a separate, isolated process,
> so it already fulfills the isolation requirement.  This also removes the
> doubled process creation, and it works on OS X in addition to everywhere
> else.  It does become more difficult to debug, but then again the majority
> of the logic is in dotest.py and can be debugged with --no-multiprocess
> (or with logging).
>
> This is all separate somewhat from the Ctrl-C issue, but it is the
> backstory on what I'm referring to with the parallel test runner.
>
> Completely as an aside, I did ask Greg Clayton to see if he can poke into
> why OS X is hitting the Python GIL on execs in "import threading"-style
> execs from multiple threads.  But assuming nothing magic changes there and
> it isn't easily solved (I tried and failed after several attempts to
> diagnose last year), I'd prefer to keep a strategy that is the same on all
> platforms unless there's a decent win on the execution front.
>
> That all said, I'm starting to think a pluggable strategy for the actual
> mechanic of the parallel test run might end up being best anyway since I'd
> really like the Ctrl-C working and I'm not able to diagnose what's
> happening on the Windows scenario.
>
>
>>   Multi-threaded is much easier to debug, for starters, because you can
>> just attach your debugger to a single process.  It also solves a lot of
>> race conditions and makes output processing easier (not to mention higher
>> performance), because you don't even need a way to have the sub-processes
>> communicate their results back to the parent: the results are just
>> in memory.  Stick them in a synchronized queue and the parent can just
>> process them.  So it would probably even speed up the test runner.
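The synchronized-queue idea can be sketched as follows, assuming tests can run in-process. The worker body is a placeholder, and the names worker and run_all are invented for the example; queue.Queue is the standard library's thread-safe queue.

```python
import queue
import threading


def worker(test_name, results):
    # Placeholder for running one test in-process; the result goes straight
    # into the shared queue, with no inter-process communication involved.
    results.put((test_name, "pass"))


def run_all(test_names):
    results = queue.Queue()  # thread-safe, so workers need no extra locking
    threads = [
        threading.Thread(target=worker, args=(name, results))
        for name in test_names
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    # All workers have finished, so draining until empty is safe here.
    collected = []
    while not results.empty():
        collected.append(results.get())
    return collected
```

Results arrive in completion order rather than submission order, so the parent should not rely on their ordering.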
>>
>> I think if there's not a very good reason to keep multiprocessing around,
>> we should aim for a threaded approach.  My understanding is that lit
>> already does this, so there's no fundamental reason it shouldn't work
>> correctly on MacOSX; we just have to solve the exec problem like you
>> mentioned.
>>
>>
>>
>
>
> --
> -Todd
>
_______________________________________________
lldb-commits mailing list
lldb-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/lldb-commits
