A further thought on this. The overhead difference between continuations and
threads was 1 to 4 (roughly).
If you add real workload to what happens on either side of that
producer-consumer relationship, I bet this difference disappears into the
noise, not because it becomes more efficient due to less contention, but
because it's such a tiny fraction of the actual work being done.
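
A back-of-the-envelope sketch of that argument, using the rates reported further down in this thread (16.50 million events per second for continuations, 3.18 million for the best BlockingQueue size). The per-event workload figures are hypothetical, chosen only to show the trend; the class and method names are made up for illustration.

```java
public class OverheadEstimate {
    // Nanoseconds spent per event at a given rate (events per second).
    static double nanosPerEvent(double eventsPerSecond) {
        return 1e9 / eventsPerSecond;
    }

    // Ratio of total per-event cost (threads vs. continuations) once each
    // event also carries workNanos of real work. workNanos is a hypothetical
    // figure, not something measured in this thread.
    static double slowdown(double workNanos) {
        double continuations = nanosPerEvent(16.50e6); // rate reported in this thread
        double threads = nanosPerEvent(3.18e6);        // best BlockingQueue rate reported
        return (workNanos + threads) / (workNanos + continuations);
    }

    public static void main(String[] args) {
        for (double work : new double[] {0, 100, 1_000, 10_000}) {
            System.out.printf("work=%6.0f ns  slowdown=%.2fx%n", work, slowdown(work));
        }
    }
}
```

With no real work the ratio is just the ratio of the two rates. If each event carried something like 10 microseconds of parsing or unparsing, the same handoff overhead would add only a few percent to the total.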
I have a copy of the thread-based coroutines library in a separate sandbox, so
if you want it I'll send it over and you won't have to dig for
it.
________________________________
From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Friday, April 24, 2020 8:53 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
That's really informative and confirms intuition that using threads really
hurts performance when all you need is a stack switch.
In this case reducing contention should reduce total work, but that depends on
how carefully the queue is implemented. If it is a single lock it may not
matter.
We actually don't care about going faster through parallelism because we should assume
the machine is already saturated with work. We want to reduce total amount of
work done.
________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Friday, April 24, 2020 8:02:37 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
I decided to look at performance of three potential options to see if
that would rule anything out. I looked at 1) coroutines 2) continuations
3) threads with BlockingQueue. For each of these, I modified the gist to
remove printlns and use a different producer consumer model (which is
actually very straightforward in case we come across other alternatives to
test). So everything is the same except for how the SAX content handler
interacts with the custom InfosetInputter. For the performance numbers
below, I created enough "events" in a loop so that rate of events
remained roughly the same as I increased the number of events.
1) coroutines
It turns out the coroutines library has a limitation where the
yieldval() call must be directly inside the coroutine{} block. This is
basically a non-starter for us, since the entire unparse call needs to
be a coroutine, and the yieldval call happens way down the stack. So not
only does this not have any active development, it functionally won't
even work for us.
2) continuations
16.50 million events per second
3) thread with BlockingQueue
I think this is similar to the Coroutine library you wrote for Daffodil
(though it looks like it's been removed, we can probably find it in the
git history if we want). This runs the unparse method in a thread and
has a blocking queue that the producer pushes to and the consumer takes
from. I tested with different queue sizes to see how that affects
performance:
size      rate
1         0.14 million events per second
10        1.36 million events per second
100       3.18 million events per second
1000      3.16 million events per second
100000    3.09 million events per second
So this BlockingQueue approach is quite a bit slower, and definitely
requires batching events to be somewhat performant. I guess this
slowness makes sense as this approach creates a thread for the unparse,
has different threads blocking on this queue, and also creates a bunch
of event objects to put in the queue (the continuation approach just
mutates state so no extra objects are needed). It is possible that this
isn't an accurate test since the producer is going crazy fast since I'm
just incrementing a Long in each loop iteration. In the real world, the
producer is going to be parsing XML or something, so won't be as fast.
Perhaps if the producer was actually slower there would be less thread
contention and actually allow for more parallel work?
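
For readers who want to try something similar, here is a minimal sketch of the thread-plus-BlockingQueue handoff described above. It is not the code that produced the numbers in this message: the class name, the sentinel value, and the use of ArrayBlockingQueue are all assumptions, and the consumer thread only counts events where the real one would run unparse.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueHandoff {
    // Sentinel marking end of stream; an arbitrary choice for this sketch.
    private static final Long END = Long.MIN_VALUE;

    // Pushes `count` events through a bounded queue of the given capacity and
    // returns how many events the consumer thread received.
    static long run(long count, int capacity) {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(capacity);
        long[] received = {0};

        // Stand-in for the thread that would run unparse and pull events.
        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Long event = queue.take();      // blocks while the queue is empty
                    if (event.equals(END)) break;
                    received[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            // Stand-in for the SAX content handler: just counts up.
            for (long i = 0; i < count; i++) {
                queue.put(i);                       // blocks while the queue is full
            }
            queue.put(END);
            consumer.join();                        // also makes received[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received[0];
    }

    public static void main(String[] args) {
        long count = 1_000_000;
        for (int capacity : new int[] {1, 10, 100, 1000}) {
            long start = System.nanoTime();
            long received = run(count, capacity);
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("capacity=%d received=%d rate=%.2f million events/s%n",
                    capacity, received, count / seconds / 1e6);
        }
    }
}
```

Rates from a loop like this depend heavily on the machine and on JIT warm-up, so treat any numbers it prints as rough.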
On 4/23/20 5:41 PM, Beckerle, Mike wrote:
I am pretty worried about the @suspendable annotation. The way this shift/reset
stuff works is it modifies the Scala compiler to do something called
continuation passing style, aka CPS.
I'd be ok if that was isolated to just a segment of the code. Maybe there is
some natural way to do that?
But it seems to me that all code on the pathway from where a reset block is
entered to where a shift is called, all of it has to propagate this
@suspendable behavior and be compiled by way of this CPS plug in. That looks ok
for the tiny toy examples, but for a giant code base like the Daffodil runtime1
unparser, that seems fragile and potentially has an impact on debugging, memory
allocation, and performance of the code. Given the lack of
enthusiastic support for shift/reset, I think it is risky.
The only other option I can think of is to spawn a separate thread, allow true
concurrency in a producer-consumer model.
We already have a Coroutines library you may recall. We're not using it in the
code base now, and it's fairly high-overhead as it is a depth 1 queue, so is
constantly switching threads. It might have better performance characteristics
if the switching was reduced to once every 100 events or similar. Streaming
behavior does not have to switch from events to pull at granularity 1 event per
pull, it can be much coarser than that to push overhead down.
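
One way to picture the coarser switching suggested above, as a sketch only: the producer collects events into a batch and hands the whole batch across a depth-1 queue, so the threads trade control once per batch instead of once per event. This is not the existing Daffodil Coroutines library; the class name, the empty-list end marker, and the stats array are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BatchedHandoff {
    // Sends `count` events in batches of `batchSize` across a depth-1 queue.
    // Returns {events received, batches received} as seen by the consumer.
    static long[] run(long count, int batchSize) {
        BlockingQueue<List<Long>> queue = new ArrayBlockingQueue<>(1);
        long[] stats = {0, 0};

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    List<Long> received = queue.take();
                    if (received.isEmpty()) break;   // empty batch marks end of stream
                    stats[0] += received.size();
                    stats[1]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            List<Long> pending = new ArrayList<>(batchSize);
            for (long i = 0; i < count; i++) {
                pending.add(i);
                if (pending.size() == batchSize) {
                    queue.put(pending);              // one handoff per full batch
                    pending = new ArrayList<>(batchSize);
                }
            }
            if (!pending.isEmpty()) {
                queue.put(pending);                  // flush the final partial batch
            }
            queue.put(new ArrayList<Long>());        // end-of-stream marker
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return stats;
    }

    public static void main(String[] args) {
        long[] stats = run(1_000_000, 100);
        System.out.println(stats[0] + " events in " + stats[1] + " handoffs");
    }
}
```

The trade-off is latency: the consumer sees nothing until a batch fills, so the batch size bounds how far the producer can run ahead.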
The limiting thing here really seems to be the JVM. Java virtual machines
simply don't support the concept of co-routines in any sensible manner.
There are also some coroutine-style libraries for Java that depend on byte-code
modification. I suspect those have a similar issue to the CPS transformation,
i.e., all the code on the way to a suspension requires the byte code
modification, but I may be wrong.
________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Thursday, April 23, 2020 11:21 AM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Coroutines - was Re: Daffodil SAX API Proposal
Thanks Mike! Continuations seem like a better alternative, at least
from a support point of view. Though, it's a little concerning that no
one is really stepping up to port it to 2.13, but I don't think we're in
any rush to get to 2.13. And I personally find the reset/shift concept a
bit harder to wrap my head around than the co-routine resume/yield, but
ultimately it's not too bad.
To see how it would work with our DataProcessor/InfosetInputter, I
forked and updated your gist to include things like InfosetInputters,
DataProcessor, ContentHandler, etc. and added a bunch of println's and
comments to make sure things were behaving the way I thought they should.
https://gist.github.com/stevedlawrence/5e16081f4690448de6131af02daacea9
I think it came out pretty straightforward. I also modified this so that
there isn't as much back and forth between hasNext/next like I have in
the current proposal. The only time we go back to the
ContentHandler/producer is when next() is called, and we only go back to
the InfosetInputter/consumer when a complete event is found, including
hasNext.
I do have one concern with this approach. Scala required the
@suspendable annotation on the unparse() method of the DataProcessor and
on the next() method of the InfosetInputter for both the abstract class
and concrete SAX implementation. I'm not sure if that annotation causes
any problems when not used inside a reset block (i.e. old API style), or
if that annotation will end up cascading throughout the codebase. Seems
like there's a possibility for that to happen. Maybe I just need to
reorganize the code a bit, but it's not clear to me how.
On 4/22/20 7:18 PM, Beckerle, Mike wrote:
Scala continuations is supported on 2.11 and 2.12, but is a work in progress for
2.13. The main web page for it says it is looking for a lead developer and
without that Typesafe/Lightbend is doing bare minimum maintenance.
A producer/consumer idiom like what we need is easily expressed using this
shift/reset thing.
Here's a gist that does a control turnaround from a handler to a pull-oriented while
loop. Took me a bit of research to get the build.sbt right so this would "just
work".
https://gist.github.com/mbeckerle/4c1d8f8c365958ef7d01bf770fa6317c
________________________________
From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Wednesday, April 22, 2020 5:01 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Daffodil SAX API Proposal
Another possibility is scala-async, which I think can do what we want.
________________________________
From: Beckerle, Mike <mbecke...@tresys.com>
Sent: Wednesday, April 22, 2020 4:34 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Daffodil SAX API Proposal
The alternative is probably scala.util.continuations aka "shift and reset".
It's much harder to understand and use, but at least it's in the standard
library so is supported. (I think.)
________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Wednesday, April 22, 2020 3:40 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Re: Daffodil SAX API Proposal
I responded.
I checked the license to make sure it's compatible (BSD-3), but I didn't
actually check what versions of Scala it works with.
Looks like it is only published for 2.11, and the repo hasn't been
updated for at least 3 years. There is a 2.12.x branch in their repo,
but it too hasn't been updated in a long time. We might have to see how
much effort it would take to update that library, or perhaps find
another library.
On 4/22/20 3:28 PM, Beckerle, Mike wrote:
I reviewed this and added a comment about the only significant issue, which I
think just boils down to trying to keep the coroutining back and forth as
simple as possible.
Another thought: Is the scala coroutines library supported in 2.11 and 2.12
(and 2.13 for being future-safe?)
________________________________
From: Steve Lawrence <slawre...@apache.org>
Sent: Wednesday, April 22, 2020 1:06 PM
To: dev@daffodil.apache.org <dev@daffodil.apache.org>
Subject: Daffodil SAX API Proposal
I've added a proposal to add SAX API support to Daffodil.
https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+SAX+API
Many libraries and applications already support SAX, so this should
provide a means for more seamless integration into different toolsuites,
opening up the places where Daffodil could be easily integrated.
SAX is also generally viewed as having a lower memory overhead, though
this does not attempt to solve the memory issues related to Daffodil and
the internal infoset representation. This essentially just adds a SAX
compatible API around our existing API. Other changes are needed to
reduce our memory overhead and truly support a streaming model.
- Steve