A small update: I've gotten a resizable version of my disruptor
implementation working, and the performance looks pretty good so far.
I still have a few loose ends to tie up before I push out the changes.
I hope to have the updated code on GitHub within a couple of
weeks, depending on how much time I find to work on it.
On Sat, Nov 9, 2013 at 2:13 PM, Simon Ruggier <[email protected]> wrote:
Hi all, I've tentatively come up with a design that would allow
the sender to reallocate the buffer as necessary, with very little
added performance cost. The sending side would bear the cost of
reallocation, and there would be an extra test that receivers
would have to make every time they process an item (no extra
atomic operations needed). However, it may be a few weeks or more
before I have a working implementation to demonstrate, so I
figured it might be worthwhile to mention now that I'll be working
on this.
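To make the receiver-side cost concrete, here's a rough sketch of the
shape of the check. The names are made up and this isn't the actual code
from my repo (nor exactly the dialect of Rust it's written against), and
the atomic publication of slots is left out entirely so that the extra
branch is easy to see:

    use std::sync::Arc;

    // Each slot either carries a payload, is still unpublished, or tells the
    // consumer that the sender has switched to a larger buffer.
    enum Slot<T> {
        Empty,
        Value(T),
        MovedTo(Arc<Buffer<T>>),
    }

    struct Buffer<T> {
        // Capacity is a power of two so that indexing by sequence is a mask.
        slots: Vec<Slot<T>>,
    }

    struct Consumer<T> {
        buffer: Arc<Buffer<T>>,
        sequence: usize,
    }

    impl<T: Clone> Consumer<T> {
        // In the real thing this would wait on the sender's cursor; here we
        // only show the extra per-item test that the resizable design adds.
        fn take(&mut self) -> Option<T> {
            loop {
                let mask = self.buffer.slots.len() - 1;
                let next_buffer = match &self.buffer.slots[self.sequence & mask] {
                    Slot::Value(v) => {
                        self.sequence += 1;
                        return Some(v.clone());
                    }
                    // This branch is the only added cost on the fast path;
                    // following the redirect happens once per reallocation.
                    Slot::MovedTo(next) => next.clone(),
                    Slot::Empty => return None, // nothing published yet
                };
                self.buffer = next_buffer;
            }
        }
    }

    fn main() {
        // Simulated sender: a 4-slot buffer filled up, so the sender built an
        // 8-slot buffer, copied the old items over, and left a redirect.
        let new_buf = Arc::new(Buffer {
            slots: vec![
                Slot::Value(1), Slot::Value(2), Slot::Value(3), Slot::Value(4),
                Slot::Empty, Slot::Empty, Slot::Empty, Slot::Empty,
            ],
        });
        let old_buf = Arc::new(Buffer {
            slots: vec![
                Slot::Value(1), Slot::Value(2),
                Slot::MovedTo(new_buf), Slot::Empty,
            ],
        });
        let mut consumer = Consumer { buffer: old_buf, sequence: 0 };
        while let Some(x) = consumer.take() {
            println!("received {}", x);
        }
    }

The point is that a consumer that never observes a reallocation pays
nothing beyond one extra branch per item.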
Also, I think it would be interesting to investigate doing
something like the Linux kernel's deadlock detection[1], but
generalized to apply to bounded queues, and implemented as a
static check. I know little about this, but even so, I can see how
it would be an enormous amount of work. On the other hand, I would
have thought the same thing about the memory safety rules that
Rust enforces. I'm hopeful that this will eventually be possible
as well.
[1] https://www.kernel.org/doc/Documentation/lockdep-design.txt
On Wed, Oct 30, 2013 at 12:55 AM, Simon Ruggier <[email protected]> wrote:
On Tue, Oct 29, 2013 at 3:30 PM, Brian Anderson <[email protected]> wrote:
On 10/28/2013 10:02 PM, Simon Ruggier wrote:
Greetings fellow Rustians!
First of all, thanks for working on such a great
language. I really like the clean syntax, increased
safety, separation of data from function definitions, and
freedom from having to declare duplicate method
prototypes in header files.
I've been working on an alternate way to communicate
between tasks in Rust, following the same approach as the
LMAX Disruptor.[1] I'm hoping to eventually offer a
superset of the functionality in the pipes API, and
replace it as the default communication mechanism
between tasks. Just as with concurrency in general, my
main motivation in implementing this is to improve
performance. For more information about the disruptor
approach, there's a lot of material linked from their
home page, in a variety of formats.
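For anyone who would rather see the shape of the idea than dig through
those links: the core of the approach is a pre-allocated ring buffer
indexed by ever-increasing sequence numbers, instead of a linked queue
with per-item allocation. A toy single-sender, single-receiver,
busy-waiting version might look something like the sketch below (made-up
names, not the code from my repo; a real implementation also does things
like batching and cache-line padding of the counters):

    use std::cell::UnsafeCell;
    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::Arc;
    use std::thread;

    const SIZE: usize = 8; // power of two, so sequence % SIZE compiles to a mask

    struct Ring {
        slots: [UnsafeCell<u64>; SIZE],
        // Sequence of the next item the sender will publish; anything below
        // this is safe for the receiver to read.
        published: AtomicUsize,
        // Sequence of the next item the receiver will read; the sender must
        // never get more than SIZE items ahead of it.
        consumed: AtomicUsize,
    }

    // Sound because each slot is written only by the sender and read only by
    // the receiver, with the two counters ordering those accesses.
    unsafe impl Sync for Ring {}

    impl Ring {
        fn new() -> Ring {
            Ring {
                slots: Default::default(),
                published: AtomicUsize::new(0),
                consumed: AtomicUsize::new(0),
            }
        }

        // Sender: spin while the buffer is full, write the slot, then publish
        // it by advancing the cursor with a release store.
        fn send(&self, value: u64) {
            let seq = self.published.load(Ordering::Relaxed);
            while seq - self.consumed.load(Ordering::Acquire) >= SIZE {
                std::hint::spin_loop();
            }
            unsafe { *self.slots[seq % SIZE].get() = value; }
            self.published.store(seq + 1, Ordering::Release);
        }

        // Receiver: spin until the sender's cursor passes our sequence, read
        // the slot, then hand it back by advancing our own cursor.
        fn recv(&self) -> u64 {
            let seq = self.consumed.load(Ordering::Relaxed);
            while self.published.load(Ordering::Acquire) <= seq {
                std::hint::spin_loop();
            }
            let value = unsafe { *self.slots[seq % SIZE].get() };
            self.consumed.store(seq + 1, Ordering::Release);
            value
        }
    }

    fn main() {
        let ring = Arc::new(Ring::new());
        let r = ring.clone();
        let receiver = thread::spawn(move || (0..1000).map(|_| r.recv()).sum::<u64>());
        for i in 0..1000u64 {
            ring.send(i);
        }
        println!("sum = {}", receiver.join().unwrap());
    }

Because the storage is reused and the counters are the only shared
mutable state, the common case is one load and one store per side per
item, which is where the throughput difference against pipes comes from.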
This is really exciting work. Thanks for pursuing it. I've
been interested in exploring something like Disruptor in
Rust. The current channel types in Rust are indeed slow,
and fixing them is the topic of
https://github.com/mozilla/rust/issues/8568.
I'll start paying attention to that. The Morrison & Afek 2013
paper looks like something I should read.
This is my first major contribution of new functionality
to an open-source project, so I didn't want to discuss it
in advance until I had a working system to demonstrate. I
currently have a very basic proof of concept that
achieves almost two orders of magnitude better
performance than the pipes API. On my hardware[2], I
currently see throughput of about 27 million items per
second when synchronizing with a double-checked wait
condition protocol between sender and receivers, 80+
million items with no blocking (i.e. busy waiting), and
anywhere from 240,000 to 600,000 when using pipes. The
LMAX Disruptor library gets up to 110 million items per
second on the same hardware (using busy waiting and
yielding), so there's definitely still room for
significant improvement.
Those are awesome results!
Thanks! When I first brought it up, it was getting about 14
million with the busy waiting. Minimizing the number of atomic
operations (even with relaxed memory ordering) makes a big
difference in performance. The 2/3 drop in performance with
the blocking wait strategy comes from merely doing a
read-modify-write operation on every send (it currently uses
an atomic swap; I haven't experimented with others yet). To be
fair, the only result I can take credit for is the blocking
algorithm. The other ideas are straight from the original
disruptor.
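To spell out what I mean by the double-checked protocol: the receiver
only goes to sleep after announcing itself and then re-checking the
cursor, and the sender only touches the lock when a receiver has
actually announced, so the steady-state cost on the send side is that
one swap. The sketch below is a simplified version of the idea, not my
actual code: the names are made up, it assumes one sender and one
receiver, and it uses the strongest memory ordering everywhere to keep
the reasoning obvious, whereas the real code tries to weaken those
orderings.

    use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
    use std::sync::{Arc, Condvar, Mutex};
    use std::{thread, time::Duration};

    struct Cursor {
        // Sequence of the next unpublished slot.
        published: AtomicUsize,
        // Set by a receiver that is about to block; cleared by the sender.
        waiter_present: AtomicBool,
        lock: Mutex<()>,
        condvar: Condvar,
    }

    impl Cursor {
        // Sender side: publish, then do one atomic swap. The mutex and
        // condvar are only touched if a receiver announced that it is
        // waiting, so the uncontended send path stays cheap.
        fn publish(&self, next: usize) {
            self.published.store(next, Ordering::SeqCst);
            if self.waiter_present.swap(false, Ordering::SeqCst) {
                let _guard = self.lock.lock().unwrap();
                self.condvar.notify_all();
            }
        }

        // Receiver side: block until the cursor reaches `wanted`. The second
        // check of the cursor, made after setting the flag, closes the race
        // where the sender publishes between our first check and our sleep.
        fn wait_for(&self, wanted: usize) -> usize {
            loop {
                let seq = self.published.load(Ordering::SeqCst);
                if seq >= wanted {
                    return seq;
                }
                let guard = self.lock.lock().unwrap();
                self.waiter_present.store(true, Ordering::SeqCst);
                let seq = self.published.load(Ordering::SeqCst); // double check
                if seq >= wanted {
                    return seq;
                }
                // Spurious wakeups are fine: we simply loop and re-check.
                drop(self.condvar.wait(guard).unwrap());
            }
        }
    }

    fn main() {
        let cursor = Arc::new(Cursor {
            published: AtomicUsize::new(0),
            waiter_present: AtomicBool::new(false),
            lock: Mutex::new(()),
            condvar: Condvar::new(),
        });
        let c = cursor.clone();
        let receiver = thread::spawn(move || c.wait_for(1));
        thread::sleep(Duration::from_millis(10)); // give the receiver time to block
        cursor.publish(1);
        println!("receiver woke at sequence {}", receiver.join().unwrap());
    }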
I've put the code up on GitHub (I'm using rustc from
master).[3] Currently, single and multi-stage pipelines
of receivers are supported, while many features are
missing, like multiple concurrent senders, multiple
concurrent receivers, or mutation of the items as they
pass through the pipeline. However, given what I have so
far, now is probably the right time to start soliciting
feedback and advice. I'm looking for review,
suggestions/constructive criticism, and guidance about
contributing this to the Rust codebase.
I'm not deeply familiar with Disruptor, but I believe that
it uses bounded queues. My general feeling thus far is
that the general 'go-to' channel type should not be a
bounded queue that blocks the sender when full, because of
the potential for unexpected deadlocks. I
could be convinced otherwise though if it's just not
possible to have reasonably fast unbounded channels. Note
that I don't think it's critical for the general-purpose
channel to be as fast as possible - it's more important to
be convenient.
Yes, it does. I'm divided on this, because unbounded queues
can also lead to memory exhaustion and added latency, but I
suspect that for many use cases, you're right. For
performance-critical code, I think there's probably no argument: if a
queue is too large, it starts causing latency problems (like
with bufferbloat). A queue that accepts an unlimited number of
items is like an API that doesn't let the caller know about
errors. The caller needs to know that there's a large queue,
and adjust its behaviour. Because of this, I doubt any
performance-critical application would find it truly optimal
to use unbounded queues. My opinion on this is strongly
influenced by this post:
http://mechanical-sympathy.blogspot.co.uk/2012/05/apply-back-pressure-when-overloaded.html
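To make the analogy concrete: with a bounded channel, the sender can
actually observe that the consumer has fallen behind and react, which is
exactly the information an unbounded queue hides. The snippet below is
purely an illustration of that point, written against a bounded
sync_channel/try_send style API rather than against pipes or my library:

    use std::sync::mpsc::{sync_channel, TrySendError};

    fn main() {
        // A bounded channel surfaces "the queue is full" to the sender the
        // same way an error return surfaces a failure: the caller can drop
        // the item, slow down, or block deliberately, instead of letting an
        // invisible backlog grow.
        let (tx, rx) = sync_channel::<u32>(4);

        for i in 0..10 {
            match tx.try_send(i) {
                Ok(()) => {}
                Err(TrySendError::Full(item)) => {
                    // Back-pressure point: a real application would shed load
                    // or throttle here; an unbounded queue never reaches this.
                    println!("queue full, dropping {}", item);
                }
                Err(TrySendError::Disconnected(_)) => break,
            }
        }

        while let Ok(item) = rx.try_recv() {
            println!("received {}", item);
        }
    }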
For general usage, though, I need to do more research. Any
application where latency is relevant really should be
designed to deal with back-pressure from queues, but there may
be some batch job style use cases where, as you say, it isn't
worth the extra effort. On the other hand, it's relevant to
think about how deadlocks occur, and decide whether or not
it's reasonable for developers to expect to be able to write
the kinds of code that trigger them. I'll look into this and see what I come up with.
If there were some general way to mitigate the deadlock issue
within the runtime, it would also solve this problem.
As a last resort, I suspect that I could probably figure out a
way to have the sender resize the buffer when it fills, copy
the elements over, and then switch the consumers over to the
larger buffer. I don't know if I could do it without affecting
the fast path on the receiver side.
Please keep working on this. I'm excited to see your results.
I appreciate the encouragement :)
Thanks,
Simon
[1] http://lmax-exchange.github.io/disruptor/
[2] A 2.66GHz Intel P8800 CPU running in a Thinkpad T500
on Linux x86_64
[3] https://github.com/sruggier/rust-disruptor