By "coroutines library", are you referring to the one you wrote in your
gist?

https://gist.github.com/mbeckerle/312474bac9bee9102438c160890b6539

It would be nice if batching were an option, at least so we could test
it and see if there is a difference.

Also, although we're not trying to go faster by overlapping, perhaps
this is something we might want to consider? Is there a reason to not
parallelize the SAX thread filling up the queue and the unparse thread
reading from that queue? I guess if one thread is much faster than the
other then there's really not much benefit and one thread might just
spin waiting for the other to read/write an event? Does your coroutine
library do something to prevent this from happening?
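My understanding is that a blocking queue itself wouldn't spin:
ArrayBlockingQueue's put() and take() park the waiting thread until the
other side catches up. Here's a minimal sketch of the two-thread handoff
as I picture it (all names hypothetical, not Daffodil's actual API):

```scala
import java.util.concurrent.ArrayBlockingQueue

// Hypothetical stand-in for the data passed per SAX event.
final case class Event(name: String, isEnd: Boolean = false)

object QueueHandoff {
  // A producer thread fills the queue; the calling thread consumes.
  // put() and take() park a thread that has to wait -- they don't spin --
  // so a fast producer simply blocks once the queue is full.
  def run(nEvents: Int, capacity: Int = 10): Int = {
    val queue = new ArrayBlockingQueue[Event](capacity)
    val producer = new Thread(() => {
      for (i <- 1 to nEvents) queue.put(Event(s"event-$i")) // blocks when full
      queue.put(Event("end", isEnd = true))                 // sentinel: no more events
    })
    producer.start()
    var count = 0
    var e = queue.take() // blocks when empty
    while (!e.isEnd) { count += 1; e = queue.take() }
    producer.join()
    count
  }
}
```

So a mismatch in speed should cost blocking, not busy-waiting, though
each block/unblock is still a context switch, which is where the
overhead you measured presumably comes from.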


On 9/16/20 10:00 AM, Beckerle, Mike wrote:
> The point of the coroutines library is that doing something "as simple as" 
> just an array blocking queue, etc. with threads is always problematic.
> 
> Also, an important point. The objective here is "no parallelism". We're not 
> trying to go faster by overlapping things. We're just trying to change stacks 
> so we can run two different stack contexts.  Ideally this would all be a 
> single thread with stack switching. JVMs just don't have that.
> 
> I think the coroutines library is pretty simple to use, and could be adapted 
> to batch up requests to reduce overhead if we want.
> 
> 
> ________________________________
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Wednesday, September 16, 2020 8:12 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
> 
> As I recall, the libraries that use things like annotations end up
> changing the return types of all the callers, which ends up leaking into
> the API and changing it, so I don't think any of those solutions will
> work.
> 
> I think we have to use Threads, where the main thread is the caller
> using the SAX API, and when unparse is called we spawn off the actual
> unparse in a new thread. And there's some data structure shared between
> these threads that contains event information.
> 
> I think it really just comes down to which of the various
> implementations to use. I'm not too familiar with Mike's Coroutine
> class. Mike, can you maybe discuss what advantages this has over say
> just spawning a thread and sharing something like an ArrayBlockingQueue
> to pass event information between the threads? This seems like the
> simplest option, and allows tuning the size of the queue, which should
> allow batching of events and minimize context switching between threads.
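To make the batching idea concrete, here's a rough sketch of a consumer
that blocks for one event and then drains whatever else is already
queued in a single call (assuming java.util.concurrent.ArrayBlockingQueue;
this is illustrative, not actual Daffodil code):

```scala
import java.util.concurrent.ArrayBlockingQueue

object BatchedTake {
  // Block for the first event, then drain whatever else is queued, so the
  // consumer crosses the queue's lock once per batch instead of once per event.
  def takeBatch[T](queue: ArrayBlockingQueue[T], maxBatch: Int): List[T] = {
    val buf = new java.util.ArrayList[T](maxBatch)
    buf.add(queue.take())            // blocks until at least one event exists
    queue.drainTo(buf, maxBatch - 1) // non-blocking grab of the rest
    val b = List.newBuilder[T]
    val it = buf.iterator()
    while (it.hasNext) b += it.next()
    b.result()
  }
}
```

With something like this, the queue capacity bounds memory while
maxBatch controls how much work each thread wakeup amortizes.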
> 
> - Steve
> 
> On 9/15/20 10:15 PM, Olabusayo Kilo wrote:
>> I don't think we came to a conclusion on which path we should take. If I
>> understand correctly, our options seem to be between the Thread-based
>> Coroutine library (#3; which has a bit of overhead) and the
>> Continuations library (#2; which is not yet supported for 2.13 and
>> requires the suspendable annotation). I wanted to check in to see if
>> there was a preferred one that I could focus my effort on?
>>
>> On 4/24/20 9:28 AM, Beckerle, Mike wrote:
>>> A further thought on this. The overhead difference between
>>> continuations and threads was 1 to 4 (roughly).
>>>
>>> If you add real workload to what happens on either side of that
>>> producer-consumer relationship, I bet this difference disappears into
>>> the noise, not because it becomes more efficient due to less
>>> contention, but because it's such a tiny fraction of the actual work
>>> being done.
>>>
>>> I have a copy of the Thread-based coroutines library in a separate
>>> sandbox, so if you want it I'll get it over to you so you don't have
>>> to dig for it.
>>> ________________________________
>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>> Sent: Friday, April 24, 2020 8:53 AM
>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>>>
>>> That's really informative and confirms the intuition that using threads
>>> really hurts performance when all you need is a stack switch.
>>>
>>> In this case reducing contention should reduce total work, but that
>>> depends on how carefully the queue is implemented. If it is a single
>>> lock it may not matter.
>>>
>>> We actually don't care about going faster through parallelism because we
>>> should assume the machine is already saturated with work. We want to
>>> reduce total amount of work done.
>>>
>>>
>>>
>>>
>>> ________________________________
>>> From: Steve Lawrence <slawre...@apache.org>
>>> Sent: Friday, April 24, 2020 8:02:37 AM
>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>>>
>>> I decided to look at performance of three potential options to see if
>>> that would rule anything out. I looked at 1) coroutines 2) continuations
>>> 3) threads with BlockingQueue. For each of these, I modified the gist to
>>> remove printlns and use a different producer-consumer model (which
>>> makes it very straightforward to swap in other alternatives to test).
>>> So everything is the same except for how the SAX content handler
>>> interacts with the custom InfosetInputter. For the performance numbers
>>> below, I created enough "events" in a loop so that the rate of events
>>> remained roughly the same as I increased the number of events.
>>>
>>> 1) coroutines
>>>
>>> It turns out the coroutines library has a limitation where the
>>> yieldval() call must be directly inside the coroutine{} block. This is
>>> basically a non-starter for us, since the entire unparse call needs to
>>> be a coroutine, and the yieldval call happens way down the stack. So not
>>> only does this not have any active development, it functionally won't
>>> even work for us.
>>>
>>> 2) continuations
>>>
>>> 16.50 million events per second
>>>
>>> 3) thread with BlockingQueue
>>>
>>> I think this is similar to the Coroutine library you wrote for Daffodil
>>> (though it looks like it's been removed; we can probably find it in the
>>> git history if we want). This runs the unparse method in a thread and
>>> has a blocking queue that the producer pushes to and the consumer takes
>>> from. I tested with different queue sizes to see how that affects
>>> performance:
>>>
>>>     size  rate
>>>        1  0.14 million events per second
>>>       10  1.36 million events per second
>>>      100  3.18 million events per second
>>>     1000  3.16 million events per second
>>>   100000  3.09 million events per second
>>>
>>> So this BlockingQueue approach is quite a bit slower, and definitely
>>> requires batching events to be somewhat performant. I guess this
>>> slowness makes sense, as this approach creates a thread for the unparse,
>>> has different threads blocking on this queue, and also creates a bunch
>>> of event objects to put in the queue (the continuation approach just
>>> mutates state so no extra objects are needed). It is possible that this
>>> isn't an accurate test, since the producer is going crazy fast: I'm
>>> just incrementing a Long in each loop iteration. In the real world, the
>>> producer is going to be parsing XML or something, so it won't be as fast.
>>> Perhaps if the producer were actually slower there would be less thread
>>> contention, actually allowing for more parallel work?
>>>
>>>
>>> On 4/23/20 5:41 PM, Beckerle, Mike wrote:
>>>> I am pretty worried about the @suspendable annotation. The way this
>>>> shift/reset stuff works is it modifies the scala compiler to do
>>>> something called continuation passing style. aka CPS.
>>>>
>>>> I'd be ok if that was isolated to just a segment of the code. Maybe
>>>> there is some natural way to do that?
>>>>
>>>> But it seems to me that all code on the pathway from where a reset
>>>> block is entered to where a shift is called, all of it has to
>>>> propagate this @suspendable behavior and be compiled by way of this
>>>> CPS plug in. That looks ok for the tiny toy examples, but for a giant
>>>> code base like Daffodil runtime1 unparser, .... that seems fragile,
>>>> potentially has impact on debugging, memory allocation, and
>>>> performance of the code, and,... well given the lack of enthusiastic
>>>> support for shift/reset I think it is risky.
>>>>
>>>> The only other option I can think of is to spawn a separate thread,
>>>> allow true concurrency in a producer-consumer model.
>>>>
>>>> We already have a Coroutines library you may recall. We're not using
>>>> it in the code base now, and it's fairly high-overhead as it is a
>>>> depth-1 queue, so it is constantly switching threads. It might have
>>>> better performance characteristics if the switching was reduced to
>>>> once every 100 events or similar. Streaming behavior does not have to
>>>> switch from events to pull at granularity 1 event per pull, it can be
>>>> much coarser than that to push overhead down.
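Mike's coarser-granularity point could look something like this: the
producer buffers N events and hands the whole batch across a depth-1
queue (SynchronousQueue here), so the threads switch once per batch
rather than once per event. A hypothetical sketch, not the actual
Coroutines library:

```scala
import java.util.concurrent.SynchronousQueue
import scala.collection.mutable.ArrayBuffer

// Buffer batchSize events on the producer side; hand the whole batch
// across the depth-1 queue so a thread switch happens once per batch.
final class BatchingProducer[T](batchSize: Int, queue: SynchronousQueue[List[T]]) {
  private val buf = new ArrayBuffer[T](batchSize)
  def emit(event: T): Unit = {
    buf += event
    if (buf.size == batchSize) flush()
  }
  // Must also be called at end-of-input to push any partial batch.
  def flush(): Unit = if (buf.nonEmpty) {
    queue.put(buf.toList) // blocks until the consumer takes the batch
    buf.clear()
  }
}
```

The consumer then pulls List[T] batches and iterates each one locally,
which should cut the switch count by roughly a factor of batchSize.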
>>>>
>>>> The limiting thing here really seems to be the JVM. Java virtual
>>>> machines simply don't support the concept of co-routines in any
>>>> sensible manner.
>>>>
>>>> There are also some coroutine-style libraries for Java that depend on
>>>> byte-code modification. I suspect those have a similar issue to the
>>>> CPS transformation, i.e., all the code on the way to a suspension
>>>> requires the byte code modification, but I may be wrong.
>>>>
>>>> ________________________________
>>>> From: Steve Lawrence <slawre...@apache.org>
>>>> Sent: Thursday, April 23, 2020 11:21 AM
>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>>>>
>>>> Thanks Mike! Continuations seems like a better alternative, at least
>>>> from a support point of view. Though, it's a little concerning that no
>>>> one is really stepping up to port it to 2.13, but I don't think we're in
>>>> any rush to get to 2.13. And I personally find the reset/shift concept a
>>>> bit harder to wrap my head around than the co-routine resume/yield, but
>>>> ultimately it's not too bad.
>>>>
>>>> To see how it would work with our DataProcessor/InfosetInputter, I
>>>> forked and updated your gist to include things like InfosetInputters,
>>>> DataProcessor, ContentHandler, etc. and added a bunch of println's and
>>>> comments to make sure things were behaving the way I thought they
>>>> should.
>>>>
>>>> https://gist.github.com/stevedlawrence/5e16081f4690448de6131af02daacea9
>>>>
>>>> I think it came out pretty straightforward. I also modified this so that
>>>> there isn't as much back and forth between hasNext/next like I have in
>>>> the current proposal. The only time we go back to the
>>>> ContentHandler/producer is when next() is called, and we only go back to
>>>> the InfosetInputter/consumer when a complete event is found, including
>>>> hasNext.
>>>>
>>>> I do have one concern with this approach. Scala required the
>>>> @suspendable annotation on the unparse() method of the DataProcessor and
>>>> on the next() method of the InfosetInputter for both the abstract class
>>>> and concrete SAX implementation. I'm not sure if that annotation causes
>>>> any problems when not used inside a reset block (i.e. old API style), or
>>>> if that annotation will end up cascading throughout the codebase. Seems
>>>> like there's a possibility for that to happen. Maybe I just need to
>>>> reorganize the code a bit, but it's not clear to me how.
>>>>
>>>>
>>>> On 4/22/20 7:18 PM, Beckerle, Mike wrote:
>>>>> scala continuations is supported on 2.11 and 2.12, but work in
>>>>> progress for 2.13. The main web page for it says it is looking for a
>>>>> lead developer, and without that Typesafe/Lightbend is doing bare
>>>>> minimum maintenance.
>>>>>
>>>>> A producer/consumer idiom like what we need is easily expressed
>>>>> using this shift/reset thing.
>>>>>
>>>>> Here's a gist that does a control turnaround from a handler to a
>>>>> pull-oriented while loop. It took me a bit of research to get the
>>>>> build.sbt right so this would "just work".
>>>>>
>>>>> https://gist.github.com/mbeckerle/4c1d8f8c365958ef7d01bf770fa6317c
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>>>> Sent: Wednesday, April 22, 2020 5:01 PM
>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>> Subject: Re: Daffodil SAX API Proposal
>>>>>
>>>>> Another possibility is scala-async, which I think can do what we want.
>>>>> ________________________________
>>>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>>>> Sent: Wednesday, April 22, 2020 4:34 PM
>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>> Subject: Re: Daffodil SAX API Proposal
>>>>>
>>>>> The alternative is probably scala.util.continuations aka "shift and
>>>>> reset".
>>>>>
>>>>> It's much harder to understand and use, but at least it's in the
>>>>> standard library so is supported. (I think.)
>>>>>
>>>>> ________________________________
>>>>> From: Steve Lawrence <slawre...@apache.org>
>>>>> Sent: Wednesday, April 22, 2020 3:40 PM
>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>> Subject: Re: Daffodil SAX API Proposal
>>>>>
>>>>> I responded.
>>>>>
>>>>> I checked the license to make sure it's compatible (BSD-3), but I
>>>>> didn't
>>>>> actually check what versions of Scala it works with.
>>>>>
>>>>> Looks like it is only published for 2.11, and the repo hasn't been
>>>>> updated for at least 3 years. There is a 2.12.x branch in their repo,
>>>>> but it too hasn't been updated in a long time. We might have to see how
>>>>> much effort it would take to update that library, or perhaps find
>>>>> another library.
>>>>>
>>>>>
>>>>> On 4/22/20 3:28 PM, Beckerle, Mike wrote:
>>>>>> I reviewed this and added a comment about the only significant
>>>>>> issue, which I think just boils down to trying to keep the
>>>>>> coroutining back and forth as simple as possible.
>>>>>>
>>>>>> Another thought: Is the scala coroutines library supported in 2.11
>>>>>> and 2.12 (and 2.13 for being future-safe?)
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: Steve Lawrence <slawre...@apache.org>
>>>>>> Sent: Wednesday, April 22, 2020 1:06 PM
>>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>>> Subject: Daffodil SAX API Proposal
>>>>>>
>>>>>> I've added a proposal to add a SAX API support to Daffodil.
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+SAX+API
>>>>>>
>>>>>>
>>>>>> Many libraries and applications already support SAX, so this should
>>>>>> provide a means for more seamless integration into different
>>>>>> toolsuites,
>>>>>> opening up the places where Daffodil could be easily integrated.
>>>>>>
>>>>>> SAX is also generally viewed as having a lower memory overhead, though
>>>>>> this does not attempt to solve the memory issues related to
>>>>>> Daffodil and
>>>>>> the internal infoset representation. This essentially just adds a SAX
>>>>>> compatible API around our existing API. Other changes are needed to
>>>>>> reduce our memory overhead and truly support a streaming model.
>>>>>>
>>>>>> - Steve
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
> 
> 
