By "coroutines library", you're talking about the one on your gist that you wrote?
https://gist.github.com/mbeckerle/312474bac9bee9102438c160890b6539

It would be nice if batching were an option, at least so we could test it
and see if there is a difference.

Also, although we're not trying to go faster by overlapping, perhaps this
is something we might want to consider? Is there a reason not to
parallelize the SAX thread filling up the queue and the unparse thread
reading from that queue? I guess if one thread is much faster than the
other then there's really not much benefit, and one thread might just spin
waiting for the other to read/write an event? Does your coroutine library
do something to prevent this from happening?

On 9/16/20 10:00 AM, Beckerle, Mike wrote:
> The point of the coroutines library is that doing something "as simple as"
> just an array blocking queue, etc. with threads is always problematic.
>
> Also, an important point. The objective here is "no parallelism". We're not
> trying to go faster by overlapping things. We're just trying to change stacks
> so we can run two different stack contexts. Ideally this would all be a
> single thread with stack switching. JVMs just don't have that.
>
> I think the coroutines library is pretty simple to use, and could be adapted
> to batch up requests to reduce overhead if we want.
>
>
> ________________________________
> From: Steve Lawrence <slawre...@apache.org>
> Sent: Wednesday, September 16, 2020 8:12 AM
> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>
> As I recall, the libraries that use things like annotations end up
> changing the return types of all the callers, which ends up leaking
> into the API and changing it, so I don't think any of those solutions will
> work.
>
> I think we have to use Threads, where the main thread is the caller
> using the SAX API, and when unparse is called we spawn off the actual
> unparse in a new thread.
> And there's some data structure shared between
> these threads that contains event information.
>
> I think it really just comes down to which of the various
> implementations to use. I'm not too familiar with Mike's Coroutine
> class. Mike, can you maybe discuss what advantages this has over say
> just spawning a thread and sharing something like an ArrayBlockingQueue
> to pass event information between the threads? This seems like the
> simplest option, and allows tuning the size of the queue, which should
> allow batching of events and minimize context switching between threads.
>
> - Steve
>
> On 9/15/20 10:15 PM, Olabusayo Kilo wrote:
>> I don't think we came to a conclusion on which path we should take. If I
>> understand correctly, our options seem to be between the Thread-based
>> Coroutine library (#3; which has a bit of overhead) and the
>> Continuations library (#2; which is not yet supported for 2.13 and
>> requires the suspendable annotation). I wanted to check in to see if
>> there was a preferred one that I could focus my effort on?
>>
>> On 4/24/20 9:28 AM, Beckerle, Mike wrote:
>>> A further thought on this. The overhead difference between
>>> continuations and threads was 1 to 4 (roughly).
>>>
>>> If you add real workload to what happens on either side of that
>>> producer-consumer relationship, I bet this difference disappears into
>>> the noise, not because it becomes more efficient due to less
>>> contention, but because it's such a tiny fraction of the actual work
>>> being done.
>>>
>>> The Thread-based coroutines library, I have a copy of in a separate
>>> sandbox, so if you want to grab that I'll get it over to you so you
>>> don't have to dig for it.
>>> ________________________________
>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>> Sent: Friday, April 24, 2020 8:53 AM
>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>>>
>>> That's really informative and confirms the intuition that using threads
>>> really hurts performance when all you need is a stack switch.
>>>
>>> In this case reducing contention should reduce total work, but that
>>> depends on how carefully the queue is implemented. If it is a single
>>> lock it may not matter.
>>>
>>> We actually don't care about going faster through parallelism because we
>>> should assume the machine is already saturated with work. We want to
>>> reduce the total amount of work done.
>>>
>>> ________________________________
>>> From: Steve Lawrence <slawre...@apache.org>
>>> Sent: Friday, April 24, 2020 8:02:37 AM
>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>>>
>>> I decided to look at performance of three potential options to see if
>>> that would rule anything out. I looked at 1) coroutines 2) continuations
>>> 3) threads with BlockingQueue. For each of these, I modified the gist to
>>> remove printlns and use a different producer consumer model (which is
>>> actually very straightforward if we come across other alternatives to
>>> test). So everything is the same except for how the SAX content handler
>>> interacts with the custom InfosetInputter. For the performance numbers
>>> below, I created enough "events" in a loop so that the rate of events
>>> remained roughly the same as I increased the number of events.
>>>
>>> 1) coroutines
>>>
>>> It turns out the coroutines library has a limitation where the
>>> yieldval() call must be directly inside the coroutine{} block. This is
>>> basically a non-starter for us, since the entire unparse call needs to
>>> be a coroutine, and the yieldval call happens way down the stack.
>>> So not
>>> only does this not have any active development, it functionally won't
>>> even work for us.
>>>
>>> 2) continuations
>>>
>>> 16.50 million events per second
>>>
>>> 3) thread with BlockingQueue
>>>
>>> I think this is similar to the Coroutine library you wrote for Daffodil
>>> (though it looks like it's been removed, we can probably find it in the
>>> git history if we want). This runs the unparse method in a thread and
>>> has a blocking queue that the producer pushes to and the consumer takes
>>> from. I tested with different queue sizes to see how that affects
>>> performance:
>>>
>>> size      rate
>>> 1         0.14 million events per second
>>> 10        1.36 million events per second
>>> 100       3.18 million events per second
>>> 1000      3.16 million events per second
>>> 100000    3.09 million events per second
>>>
>>> So this BlockingQueue approach is quite a bit slower, and definitely
>>> requires batching events to be somewhat performant. I guess this
>>> slowness makes sense as this approach creates a thread for the unparse,
>>> has different threads blocking on this queue, and also creates a bunch
>>> of event objects to put in the queue (the continuation approach just
>>> mutates state so no extra objects are needed). It is possible that this
>>> isn't an accurate test since the producer is going crazy fast since I'm
>>> just incrementing a Long in each loop iteration. In the real world, the
>>> producer is going to be parsing XML or something, so won't be as fast.
>>> Perhaps if the producer was actually slower there would be less thread
>>> contention and actually allow for more parallel work?
>>>
>>>
>>> On 4/23/20 5:41 PM, Beckerle, Mike wrote:
>>>> I am pretty worried about the @suspendable annotation. The way this
>>>> shift/reset stuff works is it modifies the scala compiler to do
>>>> something called continuation passing style, aka CPS.
>>>>
>>>> I'd be ok if that was isolated to just a segment of the code. Maybe
>>>> there is some natural way to do that?
>>>>
>>>> But it seems to me that all code on the pathway from where a reset
>>>> block is entered to where a shift is called, all of it has to
>>>> propagate this @suspendable behavior and be compiled by way of this
>>>> CPS plug in. That looks ok for the tiny toy examples, but for a giant
>>>> code base like Daffodil runtime1 unparser, .... that seems fragile,
>>>> potentially has impact on debugging, memory allocation, and
>>>> performance of the code, and,... well given the lack of enthusiastic
>>>> support for shift/reset I think it is risky.
>>>>
>>>> The only other option I can think of is to spawn a separate thread,
>>>> allow true concurrency in a producer-consumer model.
>>>>
>>>> We already have a Coroutines library you may recall. We're not using
>>>> it in the code base now, and it's fairly high-overhead as it is a
>>>> depth 1 queue, so is constantly switching threads. It might have
>>>> better performance characteristics if the switching was reduced to
>>>> once every 100 events or similar. Streaming behavior does not have to
>>>> switch from events to pull at granularity 1 event per pull, it can be
>>>> much coarser than that to push overhead down.
>>>>
>>>> The limiting thing here really seems to be the JVM. Java virtual
>>>> machines simply don't support the concept of co-routines in any
>>>> sensible manner.
>>>>
>>>> There are also some coroutine-style libraries for Java that depend on
>>>> byte-code modification. I suspect those have a similar issue to the
>>>> CPS transformation, i.e., all the code on the way to a suspension
>>>> requires the byte code modification, but I may be wrong.
>>>>
>>>> ________________________________
>>>> From: Steve Lawrence <slawre...@apache.org>
>>>> Sent: Thursday, April 23, 2020 11:21 AM
>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>> Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
>>>>
>>>> Thanks Mike!
>>>> Continuations seems like a better alternative, at least
>>>> from a support point of view. Though, it's a little concerning that no
>>>> one is really stepping up to port it to 2.13, but I don't think we're in
>>>> any rush to get to 2.13. And I personally find the reset/shift concept a
>>>> bit harder to wrap my head around than the co-routine resume/yield, but
>>>> ultimately it's not too bad.
>>>>
>>>> To see how it would work with our DataProcessor/InfosetInputter, I
>>>> forked and updated your gist to include things like InfosetInputters,
>>>> DataProcessor, ContentHandler, etc. and added a bunch of println's and
>>>> comments to make sure things were behaving the way I thought they
>>>> should.
>>>>
>>>> https://gist.github.com/stevedlawrence/5e16081f4690448de6131af02daacea9
>>>>
>>>> I think it came out pretty straightforward. I also modified this so that
>>>> there isn't as much back and forth between hasNext/next like I have in
>>>> the current proposal. The only time we go back to the
>>>> ContentHandler/producer is when next() is called, and we only go back to
>>>> the InfosetInputter/consumer when a complete event is found, including
>>>> hasNext.
>>>>
>>>> I do have one concern with this approach. Scala required the
>>>> @suspendable annotation on the unparse() method of the DataProcessor and
>>>> on the next() method of the InfosetInputter for both the abstract class
>>>> and concrete SAX implementation. I'm not sure if that annotation causes
>>>> any problems when not used inside a reset block (i.e. old API style), or
>>>> if that annotation will end up cascading throughout the codebase. Seems
>>>> like there's a possibility for that to happen. Maybe I just need to
>>>> reorganize the code a bit, but it's not clear to me how.
>>>>
>>>>
>>>> On 4/22/20 7:18 PM, Beckerle, Mike wrote:
>>>>> scala continuations is supported on 2.11 and 2.12, but is a work in
>>>>> progress for 2.13.
>>>>> The main web page for it says it is looking for a
>>>>> lead developer and without that Typesafe/Lightbend is doing bare
>>>>> minimum maintenance.
>>>>>
>>>>> A producer/consumer idiom like what we need is easily expressed
>>>>> using this shift/reset thing.
>>>>>
>>>>> Here's a gist that does a control turnaround from a handler to a
>>>>> pull-oriented while loop. Took me a bit of research to get the
>>>>> build.sbt right so this would "just work"
>>>>>
>>>>> https://gist.github.com/mbeckerle/4c1d8f8c365958ef7d01bf770fa6317c
>>>>>
>>>>>
>>>>> ________________________________
>>>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>>>> Sent: Wednesday, April 22, 2020 5:01 PM
>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>> Subject: Re: Daffodil SAX API Proposal
>>>>>
>>>>> Another possibility is scala-async which I think can do what we want.
>>>>> ________________________________
>>>>> From: Beckerle, Mike <mbecke...@tresys.com>
>>>>> Sent: Wednesday, April 22, 2020 4:34 PM
>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>> Subject: Re: Daffodil SAX API Proposal
>>>>>
>>>>> The alternative is probably scala.util.continuations aka "shift and
>>>>> reset".
>>>>>
>>>>> It's much harder to understand and use, but at least it's in the
>>>>> standard library so is supported. (I think.)
>>>>>
>>>>> ________________________________
>>>>> From: Steve Lawrence <slawre...@apache.org>
>>>>> Sent: Wednesday, April 22, 2020 3:40 PM
>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>> Subject: Re: Daffodil SAX API Proposal
>>>>>
>>>>> I responded.
>>>>>
>>>>> I checked the license to make sure it's compatible (BSD-3), but I didn't
>>>>> actually check what versions of Scala it works with.
>>>>>
>>>>> Looks like it is only published for 2.11, and the repo hasn't been
>>>>> updated for at least 3 years. There is a 2.12.x branch in their repo,
>>>>> but it too hasn't been updated in a long time.
>>>>> We might have to see how
>>>>> much effort it would take to update that library, or perhaps find
>>>>> another library.
>>>>>
>>>>>
>>>>> On 4/22/20 3:28 PM, Beckerle, Mike wrote:
>>>>>> I reviewed this and added a comment about the only significant
>>>>>> issue, which I think just boils down to trying to keep the
>>>>>> coroutining back and forth as simple as possible.
>>>>>>
>>>>>> Another thought: Is the scala coroutines library supported in 2.11
>>>>>> and 2.12 (and 2.13 for being future-safe?)
>>>>>>
>>>>>>
>>>>>> ________________________________
>>>>>> From: Steve Lawrence <slawre...@apache.org>
>>>>>> Sent: Wednesday, April 22, 2020 1:06 PM
>>>>>> To: dev@daffodil.apache.org <dev@daffodil.apache.org>
>>>>>> Subject: Daffodil SAX API Proposal
>>>>>>
>>>>>> I've added a proposal to add SAX API support to Daffodil.
>>>>>>
>>>>>> https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+SAX+API
>>>>>>
>>>>>>
>>>>>> Many libraries and applications already support SAX, so this should
>>>>>> provide a means for more seamless integration into different
>>>>>> toolsuites, opening up the places where Daffodil could be easily
>>>>>> integrated.
>>>>>>
>>>>>> SAX is also generally viewed as having a lower memory overhead, though
>>>>>> this does not attempt to solve the memory issues related to
>>>>>> Daffodil and the internal infoset representation. This essentially
>>>>>> just adds a SAX compatible API around our existing API. Other changes
>>>>>> are needed to reduce our memory overhead and truly support a
>>>>>> streaming model.
>>>>>>
>>>>>> - Steve
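
For anyone picking this thread up later, here is a minimal sketch of the
"spawn a thread and share an ArrayBlockingQueue" handoff discussed above,
written in plain Java so it needs only the JDK. Everything in it is an
assumption made for illustration: the class name QueueHandoffSketch, the
run method, the Long "events", and the -1 end-of-stream sentinel are all
hypothetical, and none of it is Daffodil code or Mike's Coroutine class.
The calling thread plays the SAX/ContentHandler side, the spawned thread
plays the unparse side, and the queue capacity is the knob that was varied
in the measurements quoted above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch only; names and event type are not from Daffodil.
final class QueueHandoffSketch {
    // Sentinel marking the end of the event stream (an assumption of this sketch).
    private static final long END = -1L;

    // Sends the events 0..eventCount-1 from the calling thread through a bounded
    // queue to a consumer thread and returns the sum the consumer observed.
    static long run(int eventCount, int queueCapacity) {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(queueCapacity);
        long[] sum = new long[1];

        // Consumer: stands in for the unparse call pulling events one at a time.
        Thread consumer = new Thread(() -> {
            try {
                for (long ev = queue.take(); ev != END; ev = queue.take()) {
                    sum[0] += ev;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "unparse");
        consumer.start();

        // Producer: stands in for the SAX callbacks pushing events.
        try {
            for (long i = 0; i < eventCount; i++) {
                queue.put(i); // blocks while the queue is full
            }
            queue.put(END);
            consumer.join(); // join() makes the consumer's write to sum[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted during handoff", e);
        }
        return sum[0];
    }

    public static void main(String[] args) {
        for (int capacity : new int[] {1, 10, 100}) {
            System.out.println("capacity " + capacity + " -> sum " + run(1000, capacity));
        }
    }
}
```

The capacity only bounds how far the producer can run ahead of the
consumer. With capacity 1 the two threads hand off on nearly every event,
which is the constant-switching case described earlier in the thread. The
sketch does no timing, so it says nothing about the rates quoted above.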