[fonc] Any thoughts on Disruptor pattern for high-throughput, low latency concurrency?

2012-07-13 Thread Josh Gargus
I hope this may be of general interest, but I'm personally interested on Dan's 
thoughts on whether Disruptors might be a suitable compilation target for 
Nile.  My intuition is that it may be the right way for Nile to run efficiently 
on commonly available multicore processors (i.e. by minimizing 
branch-misprediction and memory contention, and optimizing cache behavior).

I'm referring to:
http://code.google.com/p/disruptor/

Google will turn up plenty more references, but the StackOverflow topic is 
worthwhile:
http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work

One limitation that I saw was it appears to work only for simple linear 
dataflows:, each pipeline stage consumes and produces a single value.  This is 
a consequence of each entry in the ring-buffer having a fixed size (each entry 
is a data structure containing the scratch space necessary to record the 
input/output of each dataflow stage.

This seems to be a serious limitation, since Nile allows you to easily express 
dataflows with more complicated topologies.  For example, to draw a filled 
shape bounded by a sequence of bezier curves, you might write a Nile program 
that recursively splits curves until they are small enough or straight 
enough to be approximated by linear segments, and then to rasterize these to 
produce pixels to be shaded by downstream Nile elements.  The problem is that 
you don't know how many outputs will be produced by each pipeline stage.

A solution that occurred to me this morning is to use multiple ring buffers.  
Linear subgraphs of the dataflow (i.e. chained sequences of 
one-input-to-one-output elements) can fit into a single ring buffer, but 
elements that produce a varying number of outputs would output to a different 
ring buffer (or multiple ring buffers if it can produce multiple types of 
output).  This would be extremely cumbersome to program manually, but not if 
you compile down to it from Nile.

I don't understand the Nile C runtime very well, so it's possible that it's 
already doing something analogous to this (or even smarter).

Thoughts?

Cheers,
Josh___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc


Re: [fonc] Any thoughts on Disruptor pattern for high-throughput, low latency concurrency?

2012-07-13 Thread Dan Amelang
Hi Josh, good to hear from you again. Hope Google is treating you well :)

The Disruptor looks to me like it could be an interesting approach
to optimizing inter-thread communication. Their technical paper is
very readable. For their problem domain, it makes a lot of sense.

Regarding the Nile C runtime, inter-thread communication is currently
not a bottleneck. So even if you created a C-based disruptor for Nile,
or generated Java code from Nile (in order to use theirs, which is
Java-based), I'm pretty sure you wouldn't see a performance
improvement.

The Nile runtime currently does several things to keep inter-thread
communication/contention down. It uses queues of fixed-sized buffers
for batched stream data.  Once a Nile process gets a hold of a buffer
for reading/writing, it has exclusive access to it, so no
synchronization is needed during reading/writing. Once the process is
done with the buffer, then it might contend with another when queueing
the buffer, but this is unlikely. Queuing is a relatively infrequent
occurrence (compared to computation and stream data read/writing),
plus the queues are kept on a per-process basis (there are many more
Nile processes than OS-level threads running) which scales well.

On top of that, the buffers (and other managed memory objects) are
laid out carefully to avoid false sharing (of cache lines).

The runtime also is designed to minimize L1 cache misses, more on that
if there is interest.

I apologize that as usual with my stuff, none of the Nile C runtime is
documented (and there aren't even program comments). But, I'm
currently writing all this up for my dissertation, which will be
available in case these details are of interest to anyone.

Dan

On Fri, Jul 13, 2012 at 8:46 AM, Josh Gargus j...@schwa.ca wrote:
 I hope this may be of general interest, but I'm personally interested on
 Dan's thoughts on whether Disruptors might be a suitable compilation
 target for Nile.  My intuition is that it may be the right way for Nile to
 run efficiently on commonly available multicore processors (i.e. by
 minimizing branch-misprediction and memory contention, and optimizing cache
 behavior).

 I'm referring to:

 http://code.google.com/p/disruptor/


 Google will turn up plenty more references, but the StackOverflow topic is
 worthwhile:
 http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work

 One limitation that I saw was it appears to work only for simple linear
 dataflows:, each pipeline stage consumes and produces a single value.  This
 is a consequence of each entry in the ring-buffer having a fixed size (each
 entry is a data structure containing the scratch space necessary to record
 the input/output of each dataflow stage.

 This seems to be a serious limitation, since Nile allows you to easily
 express dataflows with more complicated topologies.  For example, to draw a
 filled shape bounded by a sequence of bezier curves, you might write a Nile
 program that recursively splits curves until they are small enough or
 straight enough to be approximated by linear segments, and then to
 rasterize these to produce pixels to be shaded by downstream Nile elements.
 The problem is that you don't know how many outputs will be produced by each
 pipeline stage.

 A solution that occurred to me this morning is to use multiple ring buffers.
 Linear subgraphs of the dataflow (i.e. chained sequences of
 one-input-to-one-output elements) can fit into a single ring buffer, but
 elements that produce a varying number of outputs would output to a
 different ring buffer (or multiple ring buffers if it can produce multiple
 types of output).  This would be extremely cumbersome to program manually,
 but not if you compile down to it from Nile.

 I don't understand the Nile C runtime very well, so it's possible that it's
 already doing something analogous to this (or even smarter).

 Thoughts?

 Cheers,
 Josh
___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc


Re: [fonc] Any thoughts on Disruptor pattern for high-throughput, low latency concurrency?

2012-07-13 Thread Daniel Gackle
 The runtime also is designed to minimize L1 cache misses, more on that if
there is interest. 

Interested!



On Fri, Jul 13, 2012 at 4:32 PM, Dan Amelang daniel.amel...@gmail.comwrote:

 Hi Josh, good to hear from you again. Hope Google is treating you well :)

 The Disruptor looks to me like it could be an interesting approach
 to optimizing inter-thread communication. Their technical paper is
 very readable. For their problem domain, it makes a lot of sense.

 Regarding the Nile C runtime, inter-thread communication is currently
 not a bottleneck. So even if you created a C-based disruptor for Nile,
 or generated Java code from Nile (in order to use theirs, which is
 Java-based), I'm pretty sure you wouldn't see a performance
 improvement.

 The Nile runtime currently does several things to keep inter-thread
 communication/contention down. It uses queues of fixed-sized buffers
 for batched stream data.  Once a Nile process gets a hold of a buffer
 for reading/writing, it has exclusive access to it, so no
 synchronization is needed during reading/writing. Once the process is
 done with the buffer, then it might contend with another when queueing
 the buffer, but this is unlikely. Queuing is a relatively infrequent
 occurrence (compared to computation and stream data read/writing),
 plus the queues are kept on a per-process basis (there are many more
 Nile processes than OS-level threads running) which scales well.

 On top of that, the buffers (and other managed memory objects) are
 laid out carefully to avoid false sharing (of cache lines).

 The runtime also is designed to minimize L1 cache misses, more on that
 if there is interest.

 I apologize that as usual with my stuff, none of the Nile C runtime is
 documented (and there aren't even program comments). But, I'm
 currently writing all this up for my dissertation, which will be
 available in case these details are of interest to anyone.

 Dan

 On Fri, Jul 13, 2012 at 8:46 AM, Josh Gargus j...@schwa.ca wrote:
  I hope this may be of general interest, but I'm personally interested on
  Dan's thoughts on whether Disruptors might be a suitable compilation
  target for Nile.  My intuition is that it may be the right way for Nile
 to
  run efficiently on commonly available multicore processors (i.e. by
  minimizing branch-misprediction and memory contention, and optimizing
 cache
  behavior).
 
  I'm referring to:
 
  http://code.google.com/p/disruptor/
 
 
  Google will turn up plenty more references, but the StackOverflow topic
 is
  worthwhile:
 
 http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work
 
  One limitation that I saw was it appears to work only for simple linear
  dataflows:, each pipeline stage consumes and produces a single value.
  This
  is a consequence of each entry in the ring-buffer having a fixed size
 (each
  entry is a data structure containing the scratch space necessary to
 record
  the input/output of each dataflow stage.
 
  This seems to be a serious limitation, since Nile allows you to easily
  express dataflows with more complicated topologies.  For example, to
 draw a
  filled shape bounded by a sequence of bezier curves, you might write a
 Nile
  program that recursively splits curves until they are small enough or
  straight enough to be approximated by linear segments, and then to
  rasterize these to produce pixels to be shaded by downstream Nile
 elements.
  The problem is that you don't know how many outputs will be produced by
 each
  pipeline stage.
 
  A solution that occurred to me this morning is to use multiple ring
 buffers.
  Linear subgraphs of the dataflow (i.e. chained sequences of
  one-input-to-one-output elements) can fit into a single ring buffer, but
  elements that produce a varying number of outputs would output to a
  different ring buffer (or multiple ring buffers if it can produce
 multiple
  types of output).  This would be extremely cumbersome to program
 manually,
  but not if you compile down to it from Nile.
 
  I don't understand the Nile C runtime very well, so it's possible that
 it's
  already doing something analogous to this (or even smarter).
 
  Thoughts?
 
  Cheers,
  Josh
 ___
 fonc mailing list
 fonc@vpri.org
 http://vpri.org/mailman/listinfo/fonc

___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc