Re: [rust-dev] Faster communication between tasks

Brian Anderson Tue, 04 Feb 2014 14:02:20 -0800

Thanks for the update! Once it's pushed a writeup about the design andperformance might be well-receieved on r/rust.


On 01/28/2014 06:05 PM, Simon Ruggier wrote:

A small update: I've gotten a resizable version of my disruptorimplementation working, and the performance looks pretty good so far.I still have a few loose ends to tie up before I push out the changes.I should have the updated code on GitHub hopefully within a couple ofweeks, depending on how much time I find to work on it.

On Sat, Nov 9, 2013 at 2:13 PM, Simon Ruggier <simo...@gmail.com<mailto:simo...@gmail.com>> wrote:


    Hi all, I've tentatively come up with a design that would allow
    the sender to reallocate the buffer as necessary, with very little
    added performance cost. The sending side would bear the cost of
    reallocation, and there would be an extra test that receivers
    would have to make every time they process an item (no extra
    atomic operations needed). However, it may be a few weeks or more
    before I have a working implementation to demonstrate, so I
    figured it might be worthwhile to mention now that I'll be working
    on this.

    Also, I think it would be interesting to investigate doing
    something like the Linux kernel's deadlock detection[1], but
    generalized to apply to bounded queues, and implemented as a
    static check. I know little about this, but even so, I can see how
    it would be an enormous amount of work. On the other hand, I would
    have thought the same thing about the memory safety rules that
    Rust enforces. I'm hopeful that this will eventually be possible
    as well.

    [1] https://www.kernel.org/doc/Documentation/lockdep-design.txt


    On Wed, Oct 30, 2013 at 12:55 AM, Simon Ruggier <simo...@gmail.com
    <mailto:simo...@gmail.com>> wrote:

        On Tue, Oct 29, 2013 at 3:30 PM, Brian Anderson
        <bander...@mozilla.com <mailto:bander...@mozilla.com>> wrote:

            On 10/28/2013 10:02 PM, Simon Ruggier wrote:

            Greetings fellow Rustians!

            First of all, thanks for working on such a great
            language. I really like the clean syntax, increased
            safety, separation of data from function definitions, and
            freedom from having to declare duplicate method
            prototypes in header files.

            I've been working on an alternate way to communicate
            between tasks in Rust, following the same approach as the
            LMAX Disruptor.[1] I'm hoping to eventually offer a
            superset of the functionality in the pipes API, and
            replace them as the default communication mechanism
            between tasks. Just as with concurrency in general, my
            main motivation in implementing this is to improve
            performance. For more information about the disruptor
            approach, there's a lot of information linked from their
            home page, in a variety of formats.


            This is really exciting work. Thanks for pursuing it. I've
            been interested in exploring something like Disruptor in
            Rust. The current channel types in Rust are indeed slow,
            and fixing them is the topic of
            https://github.com/mozilla/rust/issues/8568.

        I'll start paying attention to that. The Morrison & Afek 2013
        paper looks like something I should read.


            This is my first major contribution of new functionality
            to an open-source project, so I didn't want to discuss it
            in advance until I had a working system to demonstrate. I
            currently have a very basic proof of concept that
            achieves almost two orders of magnitude better
            performance than the pipes API. On my hardware[2], I
            currently see throughput of about 27 million items per
            second when synchronizing with a double-checked wait
            condition protocol between sender and receivers, 80+
            million items with no blocking (i.e. busy waiting), and
            anywhere from 240,000 to 600,000 when using pipes. The
            LMAX Disruptor library gets up to 110 million items per
            second on the same hardware (using busy waiting and
            yielding), so there's definitely still room for
            significant improvement.


            Those are awesome results!


        Thanks! When I first brought it up, it was getting about 14
        million with the busy waiting. Minimizing the number of atomic
        operations (even with relaxed memory ordering) makes a big
        difference in performance. The 2/3 drop in performance with
        the blocking wait strategy comes from merely doing a
        read-modify-write operation on every send (it currently uses
        atomic swap, I haven't experimented with others yet). To be
        fair, the only result I can take credit for is the blocking
        algorithm. The other ideas are straight from the original
        disruptor.

            I've put the code up on GitHub (I'm using rustc from
            master).[3] Currently, single and multi-stage pipelines
            of receivers are supported, while many features are
            missing, like multiple concurrent senders, multiple
            concurrent receivers, or mutation of the items as they
            pass through the pipeline. However, given what I have so
            far, now is probably the right time to start soliciting
            feedback and advice. I'm looking for review,
            suggestions/constructive criticism, and guidance about
            contributing this to the Rust codebase.

I'm not deeply familiar with Disruptor, but I believe that
it uses bounded queues. My general feeling thus far is
that, as the general 'go-to' channel type, people should
not be using bounded queues that block the sender when
full because of the potential for unexpected deadlocks. I
could be convinced otherwise though if it's just not
possible to have reasonably fast unbounded channels. Note
that I don't think it's critical for the general-purpose
channel to be as fast as possible - it's more important to
be convenient.

Yes, it does. I'm divided on this, because unbounded queues
can also lead to memory exhaustion and added latency, but I
suspect that for many use cases, you're right. For performance
critical code, I think there's probably no argument: if a
queue is too large, it starts causing latency problems (like
with bufferbloat). A queue that accepts an unlimited number of
items is like an API that doesn't let the caller know about
errors. The caller needs to know that there's a large queue,
and adjust its behaviour. Because of this, I doubt any
performance-critical application would find it truly optimal
to use unbounded queues. My opinion on this is strongly
influenced by this post:

http://mechanical-sympathy.blogspot.co.uk/2012/05/apply-back-pressure-when-overloaded.html

For general usage, though, I need to do more research. Any
application where latency is relevant really should be
designed to deal with back-pressure from queues, but there may
be some batch job style use cases where, as you say, it isn't
worth the extra effort. On the other hand, it's relevant to
think about how deadlocks occur, and decide whether or not
it's reasonable for developers to expect to be able to do
those things. I'll look into this and see what I come up with.

If there were some general way to mitigate the deadlock issue
within the runtime, it would also solve this problem.

As a last resort, I suspect that I could probably figure out a
way to have the sender resize the buffer when it fills, copy
the elements over, and then switch the consumers over to the
larger buffer. I don't know if I could do it without affecting
the fast path on the receiver side.

Please keep working on this. I'm excited to see your results.

I appreciate the encouragement :)


            Thanks,
            Simon

            [1] http://lmax-exchange.github.io/disruptor/
            [2] A 2.66GHz Intel P8800 CPU running in a Thinkpad T500
            on Linux x86_64
            [3] https://github.com/sruggier/rust-disruptor


            _______________________________________________
            Rust-dev mailing list
            Rust-dev@mozilla.org  <mailto:Rust-dev@mozilla.org>
            https://mail.mozilla.org/listinfo/rust-dev



            _______________________________________________
            Rust-dev mailing list
            Rust-dev@mozilla.org <mailto:Rust-dev@mozilla.org>
            https://mail.mozilla.org/listinfo/rust-dev

_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev

Re: [rust-dev] Faster communication between tasks

Reply via email to