[fonc] Any thoughts on "Disruptor" pattern for high-throughput, low latency concurrency?
I hope this may be of general interest, but I'm personally interested on Dan's thoughts on whether Disruptors might be a suitable "compilation target" for Nile. My intuition is that it may be the right way for Nile to run efficiently on commonly available multicore processors (i.e. by minimizing branch-misprediction and memory contention, and optimizing cache behavior). I'm referring to: http://code.google.com/p/disruptor/ Google will turn up plenty more references, but the StackOverflow topic is worthwhile: http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work One limitation that I saw was it appears to work only for simple linear dataflows:, each pipeline stage consumes and produces a single value. This is a consequence of each entry in the ring-buffer having a fixed size (each entry is a data structure containing the "scratch space" necessary to record the input/output of each dataflow stage. This seems to be a serious limitation, since Nile allows you to easily express dataflows with more complicated topologies. For example, to draw a filled shape bounded by a sequence of bezier curves, you might write a Nile program that recursively splits curves until they are "small enough" or "straight enough" to be approximated by linear segments, and then to rasterize these to produce pixels to be shaded by downstream Nile elements. The problem is that you don't know how many outputs will be produced by each pipeline stage. A solution that occurred to me this morning is to use multiple ring buffers. Linear subgraphs of the dataflow (i.e. chained sequences of one-input-to-one-output elements) can fit into a single ring buffer, but elements that produce a varying number of outputs would output to a different ring buffer (or multiple ring buffers if it can produce multiple types of output). This would be extremely cumbersome to program manually, but not if you compile down to it from Nile. I don't understand the Nile C runtime very well, so it's possible that it's already doing something analogous to this (or even smarter). Thoughts? Cheers, Josh___ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc
Re: [fonc] Any thoughts on "Disruptor" pattern for high-throughput, low latency concurrency?
Hi Josh, good to hear from you again. Hope Google is treating you well :) "The Disruptor" looks to me like it could be an interesting approach to optimizing inter-thread communication. Their technical paper is very readable. For their problem domain, it makes a lot of sense. Regarding the Nile C runtime, inter-thread communication is currently not a bottleneck. So even if you created a C-based disruptor for Nile, or generated Java code from Nile (in order to use theirs, which is Java-based), I'm pretty sure you wouldn't see a performance improvement. The Nile runtime currently does several things to keep inter-thread communication/contention down. It uses queues of fixed-sized buffers for batched stream data. Once a Nile process gets a hold of a buffer for reading/writing, it has exclusive access to it, so no synchronization is needed during reading/writing. Once the process is done with the buffer, then it might contend with another when queueing the buffer, but this is unlikely. Queuing is a relatively infrequent occurrence (compared to computation and stream data read/writing), plus the queues are kept on a per-process basis (there are many more Nile processes than OS-level threads running) which scales well. On top of that, the buffers (and other managed memory objects) are laid out carefully to avoid false sharing (of cache lines). The runtime also is designed to minimize L1 cache misses, more on that if there is interest. I apologize that as usual with my stuff, none of the Nile C runtime is documented (and there aren't even program comments). But, I'm currently writing all this up for my dissertation, which will be available in case these details are of interest to anyone. Dan On Fri, Jul 13, 2012 at 8:46 AM, Josh Gargus wrote: > I hope this may be of general interest, but I'm personally interested on > Dan's thoughts on whether Disruptors might be a suitable "compilation > target" for Nile. My intuition is that it may be the right way for Nile to > run efficiently on commonly available multicore processors (i.e. by > minimizing branch-misprediction and memory contention, and optimizing cache > behavior). > > I'm referring to: > > http://code.google.com/p/disruptor/ > > > Google will turn up plenty more references, but the StackOverflow topic is > worthwhile: > http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work > > One limitation that I saw was it appears to work only for simple linear > dataflows:, each pipeline stage consumes and produces a single value. This > is a consequence of each entry in the ring-buffer having a fixed size (each > entry is a data structure containing the "scratch space" necessary to record > the input/output of each dataflow stage. > > This seems to be a serious limitation, since Nile allows you to easily > express dataflows with more complicated topologies. For example, to draw a > filled shape bounded by a sequence of bezier curves, you might write a Nile > program that recursively splits curves until they are "small enough" or > "straight enough" to be approximated by linear segments, and then to > rasterize these to produce pixels to be shaded by downstream Nile elements. > The problem is that you don't know how many outputs will be produced by each > pipeline stage. > > A solution that occurred to me this morning is to use multiple ring buffers. > Linear subgraphs of the dataflow (i.e. chained sequences of > one-input-to-one-output elements) can fit into a single ring buffer, but > elements that produce a varying number of outputs would output to a > different ring buffer (or multiple ring buffers if it can produce multiple > types of output). This would be extremely cumbersome to program manually, > but not if you compile down to it from Nile. > > I don't understand the Nile C runtime very well, so it's possible that it's > already doing something analogous to this (or even smarter). > > Thoughts? > > Cheers, > Josh ___ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc
Re: [fonc] Any thoughts on "Disruptor" pattern for high-throughput, low latency concurrency?
< The runtime also is designed to minimize L1 cache misses, more on that if there is interest. > Interested! On Fri, Jul 13, 2012 at 4:32 PM, Dan Amelang wrote: > Hi Josh, good to hear from you again. Hope Google is treating you well :) > > "The Disruptor" looks to me like it could be an interesting approach > to optimizing inter-thread communication. Their technical paper is > very readable. For their problem domain, it makes a lot of sense. > > Regarding the Nile C runtime, inter-thread communication is currently > not a bottleneck. So even if you created a C-based disruptor for Nile, > or generated Java code from Nile (in order to use theirs, which is > Java-based), I'm pretty sure you wouldn't see a performance > improvement. > > The Nile runtime currently does several things to keep inter-thread > communication/contention down. It uses queues of fixed-sized buffers > for batched stream data. Once a Nile process gets a hold of a buffer > for reading/writing, it has exclusive access to it, so no > synchronization is needed during reading/writing. Once the process is > done with the buffer, then it might contend with another when queueing > the buffer, but this is unlikely. Queuing is a relatively infrequent > occurrence (compared to computation and stream data read/writing), > plus the queues are kept on a per-process basis (there are many more > Nile processes than OS-level threads running) which scales well. > > On top of that, the buffers (and other managed memory objects) are > laid out carefully to avoid false sharing (of cache lines). > > The runtime also is designed to minimize L1 cache misses, more on that > if there is interest. > > I apologize that as usual with my stuff, none of the Nile C runtime is > documented (and there aren't even program comments). But, I'm > currently writing all this up for my dissertation, which will be > available in case these details are of interest to anyone. > > Dan > > On Fri, Jul 13, 2012 at 8:46 AM, Josh Gargus wrote: > > I hope this may be of general interest, but I'm personally interested on > > Dan's thoughts on whether Disruptors might be a suitable "compilation > > target" for Nile. My intuition is that it may be the right way for Nile > to > > run efficiently on commonly available multicore processors (i.e. by > > minimizing branch-misprediction and memory contention, and optimizing > cache > > behavior). > > > > I'm referring to: > > > > http://code.google.com/p/disruptor/ > > > > > > Google will turn up plenty more references, but the StackOverflow topic > is > > worthwhile: > > > http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work > > > > One limitation that I saw was it appears to work only for simple linear > > dataflows:, each pipeline stage consumes and produces a single value. > This > > is a consequence of each entry in the ring-buffer having a fixed size > (each > > entry is a data structure containing the "scratch space" necessary to > record > > the input/output of each dataflow stage. > > > > This seems to be a serious limitation, since Nile allows you to easily > > express dataflows with more complicated topologies. For example, to > draw a > > filled shape bounded by a sequence of bezier curves, you might write a > Nile > > program that recursively splits curves until they are "small enough" or > > "straight enough" to be approximated by linear segments, and then to > > rasterize these to produce pixels to be shaded by downstream Nile > elements. > > The problem is that you don't know how many outputs will be produced by > each > > pipeline stage. > > > > A solution that occurred to me this morning is to use multiple ring > buffers. > > Linear subgraphs of the dataflow (i.e. chained sequences of > > one-input-to-one-output elements) can fit into a single ring buffer, but > > elements that produce a varying number of outputs would output to a > > different ring buffer (or multiple ring buffers if it can produce > multiple > > types of output). This would be extremely cumbersome to program > manually, > > but not if you compile down to it from Nile. > > > > I don't understand the Nile C runtime very well, so it's possible that > it's > > already doing something analogous to this (or even smarter). > > > > Thoughts? > > > > Cheers, > > Josh > ___ > fonc mailing list > fonc@vpri.org > http://vpri.org/mailman/listinfo/fonc > ___ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc
Re: [fonc] Any thoughts on "Disruptor" pattern for high-throughput, low latency concurrency?
< The runtime also is designed to minimize L1 cache misses, more on that if there is interest. > Seconded! From: Daniel Gackle To: Fundamentals of New Computing Sent: Friday, July 13, 2012 4:21 PM Subject: Re: [fonc] Any thoughts on "Disruptor" pattern for high-throughput, low latency concurrency? < The runtime also is designed to minimize L1 cache misses, more on that if there is interest. > Interested! On Fri, Jul 13, 2012 at 4:32 PM, Dan Amelang wrote: Hi Josh, good to hear from you again. Hope Google is treating you well :) > >"The Disruptor" looks to me like it could be an interesting approach >to optimizing inter-thread communication. Their technical paper is >very readable. For their problem domain, it makes a lot of sense. > >Regarding the Nile C runtime, inter-thread communication is currently >not a bottleneck. So even if you created a C-based disruptor for Nile, >or generated Java code from Nile (in order to use theirs, which is >Java-based), I'm pretty sure you wouldn't see a performance >improvement. > >The Nile runtime currently does several things to keep inter-thread >communication/contention down. It uses queues of fixed-sized buffers >for batched stream data. Once a Nile process gets a hold of a buffer >for reading/writing, it has exclusive access to it, so no >synchronization is needed during reading/writing. Once the process is >done with the buffer, then it might contend with another when queueing >the buffer, but this is unlikely. Queuing is a relatively infrequent >occurrence (compared to computation and stream data read/writing), >plus the queues are kept on a per-process basis (there are many more >Nile processes than OS-level threads running) which scales well. > >On top of that, the buffers (and other managed memory objects) are >laid out carefully to avoid false sharing (of cache lines). > >The runtime also is designed to minimize L1 cache misses, more on that >if there is interest. > >I apologize that as usual with my stuff, none of the Nile C runtime is >documented (and there aren't even program comments). But, I'm >currently writing all this up for my dissertation, which will be >available in case these details are of interest to anyone. > >Dan > > >On Fri, Jul 13, 2012 at 8:46 AM, Josh Gargus wrote: >> I hope this may be of general interest, but I'm personally interested on >> Dan's thoughts on whether Disruptors might be a suitable "compilation >> target" for Nile. My intuition is that it may be the right way for Nile to >> run efficiently on commonly available multicore processors (i.e. by >> minimizing branch-misprediction and memory contention, and optimizing cache >> behavior). >> >> I'm referring to: >> >> http://code.google.com/p/disruptor/ >> >> >> Google will turn up plenty more references, but the StackOverflow topic is >> worthwhile: >> http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work >> >> One limitation that I saw was it appears to work only for simple linear >> dataflows:, each pipeline stage consumes and produces a single value. This >> is a consequence of each entry in the ring-buffer having a fixed size (each >> entry is a data structure containing the "scratch space" necessary to record >> the input/output of each dataflow stage. >> >> This seems to be a serious limitation, since Nile allows you to easily >> express dataflows with more complicated topologies. For example, to draw a >> filled shape bounded by a sequence of bezier curves, you might write a Nile >> program that recursively splits curves until they are "small enough" or >> "straight enough" to be approximated by linear segments, and then to >> rasterize these to produce pixels to be shaded by downstream Nile elements. >> The problem is that you don't know how many outputs will be produced by each >> pipeline stage. >> >> A solution that occurred to me this morning is to use multiple ring buffers. >> Linear subgraphs of the dataflow (i.e. chained sequences of >> one-input-to-one-output elements) can fit into a single ring buffer, but >> elements that produce a varying number of outputs would output to a >> different ring buffer (or multiple ring buffers if it can produce multiple >> types of output). This would be extremely cumbersome to program manually, >> but not if you compile down to it from Nile. >> >> I don't understand the Nile C runtime very well, so it's possible that it's >> already doing something analogous to this (or even smarter). >> >> Thoughts? >> >> Cheers, >> Josh >___ >fonc mailing list >fonc@vpri.org >http://vpri.org/mailman/listinfo/fonc > ___ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc___ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc
Re: [fonc] Any thoughts on "Disruptor" pattern for high-throughput, low latency concurrency?
On Jul 13, 2012, at 3:32 PM, Dan Amelang wrote: > Hi Josh, good to hear from you again. Hope Google is treating you well :) > > "The Disruptor" looks to me like it could be an interesting approach > to optimizing inter-thread communication. Their technical paper is > very readable. For their problem domain, it makes a lot of sense. > > Regarding the Nile C runtime, inter-thread communication is currently > not a bottleneck. So even if you created a C-based disruptor for Nile, > or generated Java code from Nile (in order to use theirs, which is > Java-based), I'm pretty sure you wouldn't see a performance > improvement. > >From what you describe below, probably not. > The Nile runtime currently does several things to keep inter-thread > communication/contention down. It uses queues of fixed-sized buffers > for batched stream data. Once a Nile process gets a hold of a buffer > for reading/writing, it has exclusive access to it, so no > synchronization is needed during reading/writing. Once the process is > done with the buffer, then it might contend with another when queueing > the buffer, but this is unlikely. Queuing is a relatively infrequent > occurrence (compared to computation and stream data read/writing), > plus the queues are kept on a per-process basis (there are many more > Nile processes than OS-level threads running) which scales well. > > On top of that, the buffers (and other managed memory objects) are > laid out carefully to avoid false sharing (of cache lines). > > The runtime also is designed to minimize L1 cache misses, more on that > if there is interest. I'm interested in all of it :-) > > I apologize that as usual with my stuff, none of the Nile C runtime is > documented (and there aren't even program comments). But, I'm > currently writing all this up for my dissertation, which will be > available in case these details are of interest to anyone. Good luck with the writing. Let me know if you plan to visit the Bay area; it would be great to have lunch or dinner. Cheers, Josh > > Dan > > On Fri, Jul 13, 2012 at 8:46 AM, Josh Gargus wrote: >> I hope this may be of general interest, but I'm personally interested on >> Dan's thoughts on whether Disruptors might be a suitable "compilation >> target" for Nile. My intuition is that it may be the right way for Nile to >> run efficiently on commonly available multicore processors (i.e. by >> minimizing branch-misprediction and memory contention, and optimizing cache >> behavior). >> >> I'm referring to: >> >> http://code.google.com/p/disruptor/ >> >> >> Google will turn up plenty more references, but the StackOverflow topic is >> worthwhile: >> http://stackoverflow.com/questions/6559308/how-does-lmaxs-disruptor-pattern-work >> >> One limitation that I saw was it appears to work only for simple linear >> dataflows:, each pipeline stage consumes and produces a single value. This >> is a consequence of each entry in the ring-buffer having a fixed size (each >> entry is a data structure containing the "scratch space" necessary to record >> the input/output of each dataflow stage. >> >> This seems to be a serious limitation, since Nile allows you to easily >> express dataflows with more complicated topologies. For example, to draw a >> filled shape bounded by a sequence of bezier curves, you might write a Nile >> program that recursively splits curves until they are "small enough" or >> "straight enough" to be approximated by linear segments, and then to >> rasterize these to produce pixels to be shaded by downstream Nile elements. >> The problem is that you don't know how many outputs will be produced by each >> pipeline stage. >> >> A solution that occurred to me this morning is to use multiple ring buffers. >> Linear subgraphs of the dataflow (i.e. chained sequences of >> one-input-to-one-output elements) can fit into a single ring buffer, but >> elements that produce a varying number of outputs would output to a >> different ring buffer (or multiple ring buffers if it can produce multiple >> types of output). This would be extremely cumbersome to program manually, >> but not if you compile down to it from Nile. >> >> I don't understand the Nile C runtime very well, so it's possible that it's >> already doing something analogous to this (or even smarter). >> >> Thoughts? >> >> Cheers, >> Josh ___ fonc mailing list fonc@vpri.org http://vpri.org/mailman/listinfo/fonc