[fonc] Any thoughts on "Disruptor" pattern for high-throughput, low latency concurrency?

2012-07-13 Thread Josh Gargus
I hope this may be of general interest, but I'm personally interested in Dan's 
thoughts on whether Disruptors might be a suitable "compilation target" for 
Nile.  My intuition is that it may be the right way for Nile to run efficiently 
on commonly available multicore processors (i.e. by minimizing 
branch-misprediction and memory contention, and optimizing cache behavior).

I'm referring to:
http://code.google.com/p/disruptor/

Google will turn up plenty more references, but the StackOverflow topic is 
worthwhile:
http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work

One limitation I noticed is that it appears to work only for simple linear 
dataflows: each pipeline stage consumes and produces a single value.  This is 
a consequence of each entry in the ring buffer having a fixed size (each entry 
is a data structure containing the "scratch space" necessary to record the 
input/output of each dataflow stage).

This seems to be a serious limitation, since Nile allows you to easily express 
dataflows with more complicated topologies.  For example, to draw a filled 
shape bounded by a sequence of Bézier curves, you might write a Nile program 
that recursively splits curves until they are "small enough" or "straight 
enough" to be approximated by linear segments, and then rasterizes these 
segments to produce pixels to be shaded by downstream Nile elements.  The 
problem is that you don't know how many outputs will be produced by each 
pipeline stage.
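
Here's the kind of thing I mean, sketched in plain C rather than Nile (the 
flatness test and the names are just illustrative): one cubic Bézier goes in, 
and the number of segments that come out depends entirely on the curve's shape.

    #include <math.h>

    typedef struct { double x, y; } Pt;

    static Pt mid(Pt a, Pt b)      { return (Pt){ (a.x + b.x) / 2, (a.y + b.y) / 2 }; }
    static double dist(Pt a, Pt b) { return hypot(b.x - a.x, b.y - a.y); }

    /* "Straight enough": the control polygon is barely longer than the chord. */
    static int flat_enough(Pt p0, Pt p1, Pt p2, Pt p3)
    {
        double polygon = dist(p0, p1) + dist(p1, p2) + dist(p2, p3);
        return polygon - dist(p0, p3) < 0.01;
    }

    /* emit() stands in for "send a segment downstream"; how many times it gets
       called is data-dependent, which is exactly the problem for fixed slots. */
    static void subdivide(Pt p0, Pt p1, Pt p2, Pt p3, void (*emit)(Pt, Pt))
    {
        if (flat_enough(p0, p1, p2, p3)) {
            emit(p0, p3);
            return;
        }
        Pt q1 = mid(p0, p1), m = mid(p1, p2), r2 = mid(p2, p3);
        Pt q2 = mid(q1, m),  r1 = mid(m, r2);
        Pt q3 = mid(q2, r1);                   /* de Casteljau split at t = 0.5 */
        subdivide(p0, q1, q2, q3, emit);
        subdivide(q3, r1, r2, p3, emit);
    }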

A solution that occurred to me this morning is to use multiple ring buffers.  
Linear subgraphs of the dataflow (i.e. chained sequences of 
one-input-to-one-output elements) can fit into a single ring buffer, but 
elements that produce a varying number of outputs would write to a different 
ring buffer (or to multiple ring buffers, if an element can produce multiple 
types of output).  This would be extremely cumbersome to program manually, but not if
you compile down to it from Nile.
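
In other words, something like this (again just a C sketch with invented names, 
and ignoring the sequencing/waiting machinery a real Disruptor needs): the 
curve-splitting element consumes fixed-size curve entries from one ring and 
claims a fresh fixed-size slot in a separate downstream ring for every segment 
it emits.

    typedef struct { double pts[8]; } CurveEntry;         /* one cubic curve  */
    typedef struct { double x0, y0, x1, y1; } EdgeEntry;  /* one line segment */

    typedef struct { CurveEntry *slots; long size, cursor; } CurveRing;
    typedef struct { EdgeEntry  *slots; long size, cursor; } EdgeRing;

    /* Claim the next fixed-size slot in the edge ring (sizes are powers of two;
       a real implementation would also wait for slow downstream consumers).    */
    static EdgeEntry *edge_claim(EdgeRing *r)
    {
        return &r->slots[r->cursor++ & (r->size - 1)];
    }

    /* One curve in, N segments out: every output gets its own slot in the edge
       ring, so the fixed entry size of the curve ring is never a problem.      */
    static void split_stage(const CurveRing *in, EdgeRing *out, long seq)
    {
        const CurveEntry *curve = &in->slots[seq & (in->size - 1)];
        /* ...subdivide *curve as sketched above, calling edge_claim(out)
           once per emitted segment and filling in the EdgeEntry... */
        (void)curve;
    }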

I don't understand the Nile C runtime very well, so it's possible that it's 
already doing something analogous to this (or even smarter).

Thoughts?

Cheers,
Josh
___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc


Re: [fonc] Any thoughts on "Disruptor" pattern for high-throughput, low latency concurrency?

2012-07-13 Thread Dan Amelang
Hi Josh, good to hear from you again. Hope Google is treating you well :)

"The Disruptor" looks to me like it could be an interesting approach
to optimizing inter-thread communication. Their technical paper is
very readable. For their problem domain, it makes a lot of sense.

Regarding the Nile C runtime, inter-thread communication is currently
not a bottleneck. So even if you created a C-based disruptor for Nile,
or generated Java code from Nile (in order to use theirs, which is
Java-based), I'm pretty sure you wouldn't see a performance
improvement.

The Nile runtime currently does several things to keep inter-thread
communication/contention down. It uses queues of fixed-size buffers
for batched stream data.  Once a Nile process gets hold of a buffer
for reading/writing, it has exclusive access to it, so no
synchronization is needed during reading/writing. Once the process is
done with the buffer, it might contend with another process when
queueing the buffer, but this is unlikely. Queueing is a relatively
infrequent occurrence (compared to computation and stream data
reading/writing), plus the queues are kept on a per-process basis
(there are many more Nile processes than OS-level threads running),
which scales well.
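
If it helps, the shape of it is roughly this; just an illustrative sketch, not
the actual runtime code (the names are made up here, and the real queueing
doesn't necessarily use a plain mutex):

    #include <pthread.h>

    #define BUFFER_CAPACITY 128

    typedef struct Buffer {
        struct Buffer *next;
        int            count;                    /* values currently held   */
        float          data[BUFFER_CAPACITY];    /* batched stream elements */
    } Buffer;

    typedef struct {
        pthread_mutex_t lock;   /* touched only when (de)queueing whole buffers;
                                   assume it was set up with pthread_mutex_init */
        Buffer         *head, *tail;
    } Queue;

    /* While a process holds a Buffer, it reads/writes with no synchronization
       at all (the caller hands the buffer off once count hits capacity).      */
    static void buffer_push(Buffer *b, float v)
    {
        b->data[b->count++] = v;
    }

    /* Handing a full buffer to a consumer's queue is the only contended step,
       and it happens once per BUFFER_CAPACITY values, not once per value.     */
    static void queue_put(Queue *q, Buffer *b)
    {
        pthread_mutex_lock(&q->lock);
        b->next = NULL;
        if (q->tail) q->tail->next = b; else q->head = b;
        q->tail = b;
        pthread_mutex_unlock(&q->lock);
    }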

On top of that, the buffers (and other managed memory objects) are
laid out carefully to avoid false sharing (of cache lines).
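
Concretely, the layout trick is the standard one (this is a generic
illustration, not the actual structs in the runtime): pad and align anything
that different threads update independently, so neighboring objects never end
up sharing a cache line.

    #include <stdalign.h>

    #define CACHE_LINE 64   /* typical cache line size on current x86 parts */

    typedef struct {
        alignas(CACHE_LINE) long counter;     /* each object starts a new line */
        char pad[CACHE_LINE - sizeof(long)];  /* ...and occupies it completely */
    } PerThreadCounter;

    /* counters[0] and counters[1] now live on different cache lines, so one
       thread bumping its counter never invalidates the other thread's line.  */
    static PerThreadCounter counters[2];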

The runtime also is designed to minimize L1 cache misses, more on that
if there is interest.

I apologize that, as usual with my stuff, none of the Nile C runtime is
documented (and there aren't even program comments). But I'm
currently writing all this up for my dissertation, which will be
available in case these details are of interest to anyone.

Dan



Re: [fonc] Any thoughts on "Disruptor" pattern for high-throughput, low latency concurrency?

2012-07-13 Thread Daniel Gackle
< The runtime also is designed to minimize L1 cache misses, more on that if
there is interest. >

Interested!





Re: [fonc] Any thoughts on "Disruptor" pattern for high-throughput, low latency concurrency?

2012-07-13 Thread Kevin Jones
< The runtime also is designed to minimize L1 cache misses, more on that if 
there is interest. >

Seconded!


Re: [fonc] Any thoughts on "Disruptor" pattern for high-throughput, low latency concurrency?

2012-07-14 Thread Josh Gargus

On Jul 13, 2012, at 3:32 PM, Dan Amelang wrote:

> Hi Josh, good to hear from you again. Hope Google is treating you well :)
> 
> "The Disruptor" looks to me like it could be an interesting approach
> to optimizing inter-thread communication. Their technical paper is
> very readable. For their problem domain, it makes a lot of sense.
> 
> Regarding the Nile C runtime, inter-thread communication is currently
> not a bottleneck. So even if you created a C-based disruptor for Nile,
> or generated Java code from Nile (in order to use theirs, which is
> Java-based), I'm pretty sure you wouldn't see a performance
> improvement.
> 

From what you describe below, probably not.


> The Nile runtime currently does several things to keep inter-thread
> communication/contention down. It uses queues of fixed-size buffers
> for batched stream data.  Once a Nile process gets hold of a buffer
> for reading/writing, it has exclusive access to it, so no
> synchronization is needed during reading/writing. Once the process is
> done with the buffer, it might contend with another process when
> queueing the buffer, but this is unlikely. Queueing is a relatively
> infrequent occurrence (compared to computation and stream data
> reading/writing), plus the queues are kept on a per-process basis
> (there are many more Nile processes than OS-level threads running),
> which scales well.
> 
> On top of that, the buffers (and other managed memory objects) are
> laid out carefully to avoid false sharing (of cache lines).
> 
> The runtime also is designed to minimize L1 cache misses, more on that
> if there is interest.

I'm interested in all of it :-)


> 
> I apologize that, as usual with my stuff, none of the Nile C runtime is
> documented (and there aren't even program comments). But I'm
> currently writing all this up for my dissertation, which will be
> available in case these details are of interest to anyone.

Good luck with the writing.  Let me know if you plan to visit the Bay Area; it 
would be great to have lunch or dinner.

Cheers,
Josh


