Thanks for the update! Once it's pushed, a writeup about the design and performance might be well-received on r/rust.

On 01/28/2014 06:05 PM, Simon Ruggier wrote:
A small update: I've gotten a resizable version of my disruptor implementation working, and the performance looks pretty good so far. I still have a few loose ends to tie up before I push out the changes, but I hope to have the updated code on GitHub within a couple of weeks, depending on how much time I find to work on it.


On Sat, Nov 9, 2013 at 2:13 PM, Simon Ruggier <simo...@gmail.com> wrote:

    Hi all, I've tentatively come up with a design that would allow
    the sender to reallocate the buffer as necessary, with very little
    added performance cost. The sending side would bear the cost of
    reallocation, and there would be an extra test that receivers
    would have to make every time they process an item (no extra
    atomic operations needed). However, it may be a few weeks or more
    before I have a working implementation to demonstrate, so I
    figured it might be worthwhile to mention now that I'll be working
    on this.
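
    To give a feel for the shape of it, here is a rough sketch of one
    way such a per-item test could work (illustrative names and
    current Rust syntax, not my actual design, which avoids even the
    extra atomic load):

    use std::sync::atomic::{AtomicUsize, Ordering};
    use std::sync::{Arc, Mutex};

    // Hypothetical shared state: the sender bumps `generation` after
    // swapping in a larger buffer; receivers compare it to a cached
    // copy on every item, which is a load and a branch rather than an
    // atomic read-modify-write.
    struct Shared<T> {
        generation: AtomicUsize,
        current: Mutex<Arc<Vec<T>>>, // slow path: fetch the new buffer
    }

    struct Receiver<T> {
        shared: Arc<Shared<T>>,
        cached_generation: usize,
        cached_buffer: Arc<Vec<T>>,
    }

    impl<T> Receiver<T> {
        fn buffer(&mut self) -> &[T] {
            // The extra per-item test on the fast path.
            let g = self.shared.generation.load(Ordering::Acquire);
            if g != self.cached_generation {
                // Slow path, taken once per reallocation.
                self.cached_buffer =
                    self.shared.current.lock().unwrap().clone();
                self.cached_generation = g;
            }
            &self.cached_buffer
        }
    }

    The point is that the fast path stays a load and a compare; the
    mutex is only touched once per reallocation.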

    Also, I think it would be interesting to investigate doing
    something like the Linux kernel's deadlock detection[1], but
    generalized to apply to bounded queues, and implemented as a
    static check. I know little about this, but even so, I can see how
    it would be an enormous amount of work. On the other hand, I would
    have thought the same thing about the memory safety rules that
    Rust enforces. I'm hopeful that this will eventually be possible
    as well.

    [1] https://www.kernel.org/doc/Documentation/lockdep-design.txt
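
    For context, the heart of lockdep is a graph of "held while
    acquiring" edges between lock classes, with any cycle reported as
    a potential deadlock. Generalized to bounded queues, the analogous
    check would look for a cycle of tasks, each blocked sending into a
    queue whose consumer is itself blocked. As a toy illustration of
    just the cycle check (hypothetical, and nothing like a real static
    analysis):

    use std::collections::{HashMap, HashSet};

    // Record "X waits on Y" edges between resources (locks, or full
    // bounded queues) and flag any cycle, since a cycle means every
    // party involved is waiting on another forever.
    fn has_cycle<'a>(edges: &HashMap<&'a str, Vec<&'a str>>) -> bool {
        fn visit<'a>(node: &'a str,
                     edges: &HashMap<&'a str, Vec<&'a str>>,
                     on_path: &mut HashSet<&'a str>) -> bool {
            if !on_path.insert(node) {
                return true; // back edge: a deadlock-prone cycle
            }
            for &next in edges.get(node).into_iter().flatten() {
                if visit(next, edges, on_path) {
                    return true;
                }
            }
            on_path.remove(node);
            false
        }
        edges.keys().any(|&n| visit(n, edges, &mut HashSet::new()))
    }

    fn main() {
        // Task 1 blocks sending into queue_b while queue_a is full,
        // and task 2 does the reverse: the cycle a -> b -> a is
        // exactly the situation to detect.
        let mut edges = HashMap::new();
        edges.insert("queue_a", vec!["queue_b"]);
        edges.insert("queue_b", vec!["queue_a"]);
        assert!(has_cycle(&edges));
    }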


    On Wed, Oct 30, 2013 at 12:55 AM, Simon Ruggier
    <simo...@gmail.com> wrote:

        On Tue, Oct 29, 2013 at 3:30 PM, Brian Anderson
        <bander...@mozilla.com> wrote:

            On 10/28/2013 10:02 PM, Simon Ruggier wrote:
            Greetings fellow Rustians!

            First of all, thanks for working on such a great
            language. I really like the clean syntax, increased
            safety, separation of data from function definitions, and
            freedom from having to declare duplicate method
            prototypes in header files.

            I've been working on an alternate way to communicate
            between tasks in Rust, following the same approach as the
            LMAX Disruptor.[1] I'm hoping to eventually offer a
            superset of the functionality in the pipes API, and
            replace it as the default communication mechanism
            between tasks. Just as with concurrency in general, my
            main motivation in implementing this is to improve
            performance. For more information about the disruptor
            approach, there's a lot of information linked from their
            home page, in a variety of formats.
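
            For anyone who'd rather read code than slides, here is the
            essence of the approach boiled down to a single-producer,
            single-consumer toy (illustrative only; my implementation
            differs): each side owns a monotonically increasing
            sequence counter and only reads the other side's counter
            to decide when it has to wait.

            use std::cell::UnsafeCell;
            use std::sync::atomic::{AtomicI64, Ordering};

            const SIZE: usize = 1024; // power of two: index = seq & (SIZE - 1)

            struct Ring<T> {
                slots: Box<[UnsafeCell<Option<T>>]>,
                published: AtomicI64, // last sequence the sender made visible
                consumed: AtomicI64,  // last sequence the receiver released
            }

            // Sound only for exactly one sender and one receiver thread.
            unsafe impl<T: Send> Sync for Ring<T> {}

            impl<T> Ring<T> {
                fn new() -> Ring<T> {
                    Ring {
                        slots: (0..SIZE).map(|_| UnsafeCell::new(None)).collect(),
                        published: AtomicI64::new(-1),
                        consumed: AtomicI64::new(-1),
                    }
                }

                fn send(&self, item: T) {
                    let seq = self.published.load(Ordering::Relaxed) + 1;
                    // Busy-wait until the slot being reused is free.
                    while seq - self.consumed.load(Ordering::Acquire) > SIZE as i64 {
                        std::hint::spin_loop();
                    }
                    unsafe {
                        *self.slots[seq as usize & (SIZE - 1)].get() = Some(item);
                    }
                    self.published.store(seq, Ordering::Release);
                }

                fn recv(&self) -> T {
                    let seq = self.consumed.load(Ordering::Relaxed) + 1;
                    // Busy-wait until the sender publishes our sequence.
                    while self.published.load(Ordering::Acquire) < seq {
                        std::hint::spin_loop();
                    }
                    let item = unsafe {
                        (*self.slots[seq as usize & (SIZE - 1)].get()).take()
                    };
                    self.consumed.store(seq, Ordering::Release);
                    item.expect("slot published but empty")
                }
            }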

            This is really exciting work. Thanks for pursuing it. I've
            been interested in exploring something like Disruptor in
            Rust. The current channel types in Rust are indeed slow,
            and fixing them is the topic of
            https://github.com/mozilla/rust/issues/8568.

        I'll start paying attention to that. The Morrison & Afek 2013
        paper looks like something I should read.



            This is my first major contribution of new functionality
            to an open-source project, so I didn't want to discuss it
            in advance until I had a working system to demonstrate. I
            currently have a very basic proof of concept that
            achieves almost two orders of magnitude better
            performance than the pipes API. On my hardware[2], I
            currently see throughput of about 27 million items per
            second when synchronizing with a double-checked wait
            condition protocol between sender and receivers, 80+
            million items with no blocking (i.e. busy waiting), and
            anywhere from 240,000 to 600,000 items per second when
            using pipes. The
            LMAX Disruptor library gets up to 110 million items per
            second on the same hardware (using busy waiting and
            yielding), so there's definitely still room for
            significant improvement.

            Those are awesome results!


        Thanks! When I first brought it up, it was getting about 14
        million with the busy waiting. Minimizing the number of atomic
        operations (even with relaxed memory ordering) makes a big
        difference in performance. The 2/3 drop in performance with
        the blocking wait strategy comes from merely doing a
        read-modify-write operation on every send (it currently uses
        an atomic swap; I haven't experimented with others yet). To be
        fair, the only result I can take credit for is the blocking
        algorithm. The other ideas are straight from the original
        disruptor.
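
        In outline, the blocking algorithm has roughly this shape
        (illustrative names, memory-ordering subtleties elided; the
        real code differs in detail):

        use std::sync::atomic::{AtomicBool, Ordering};
        use std::sync::{Condvar, Mutex};

        // Double-checked blocking wait: the receiver announces intent
        // to sleep, re-checks the condition, and only then parks; the
        // sender pays one atomic swap per publish to learn whether a
        // wakeup is owed. That swap is where the throughput goes.
        struct BlockingWait {
            sleeping: AtomicBool,
            lock: Mutex<()>,
            cond: Condvar,
        }

        impl BlockingWait {
            // Sender side, called after publishing a new item.
            fn signal(&self) {
                if self.sleeping.swap(false, Ordering::AcqRel) {
                    let _g = self.lock.lock().unwrap();
                    self.cond.notify_all();
                }
            }

            // Receiver side; `ready` re-checks for published items.
            fn wait_until<F: Fn() -> bool>(&self, ready: F) {
                while !ready() {
                    let guard = self.lock.lock().unwrap();
                    self.sleeping.store(true, Ordering::SeqCst);
                    // Double check after announcing intent: a publish
                    // racing with us will see `sleeping` set and take
                    // the lock we hold before signalling. A stale flag
                    // merely costs the sender one extra wakeup.
                    if ready() {
                        break;
                    }
                    let _guard = self.cond.wait(guard).unwrap();
                }
            }
        }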


            I've put the code up on GitHub (I'm using rustc from
            master).[3] Currently, single and multi-stage pipelines
            of receivers are supported, while many features are
            missing, like multiple concurrent senders, multiple
            concurrent receivers, or mutation of the items as they
            pass through the pipeline. However, given what I have so
            far, now is probably the right time to start soliciting
            feedback and advice. I'm looking for review,
            suggestions/constructive criticism, and guidance about
            contributing this to the Rust codebase.

            I'm not deeply familiar with Disruptor, but I believe that
            it uses bounded queues. My general feeling thus far is
            that the general 'go-to' channel type should not be a
            bounded queue that blocks the sender when full, because of
            the potential for unexpected deadlocks. I
            could be convinced otherwise though if it's just not
            possible to have reasonably fast unbounded channels. Note
            that I don't think it's critical for the general-purpose
            channel to be as fast as possible - it's more important to
            be convenient.


        Yes, it does. I'm divided on this, because unbounded queues
        can also lead to memory exhaustion and added latency, but I
        suspect that for many use cases, you're right. For
        performance-critical code, I think there's probably no
        argument: if a queue is too large, it starts causing latency
        problems (as with bufferbloat). A queue that accepts an
        unlimited number of
        items is like an API that doesn't let the caller know about
        errors. The caller needs to know that there's a large queue,
        and adjust its behaviour. Because of this, I doubt any
        performance-critical application would find it truly optimal
        to use unbounded queues. My opinion on this is strongly
        influenced by this post:
        http://mechanical-sympathy.blogspot.co.uk/2012/05/apply-back-pressure-when-overloaded.html
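
        As a small self-contained illustration of back-pressure (using
        the standard library's sync_channel, unrelated to my
        implementation), note how overload surfaces at the call site,
        where the caller can react:

        use std::sync::mpsc::{sync_channel, TrySendError};

        fn main() {
            // A bound of 2: the queue reports overload instead of
            // silently absorbing it.
            let (tx, rx) = sync_channel::<u32>(2);
            for i in 0..4 {
                match tx.try_send(i) {
                    Ok(()) => println!("queued {}", i),
                    // Back-pressure surfaces here: the caller can shed
                    // load, block, or degrade, instead of letting the
                    // queue (and latency) grow without bound.
                    Err(TrySendError::Full(n)) => {
                        println!("overloaded, shedding {}", n)
                    }
                    Err(TrySendError::Disconnected(_)) => break,
                }
            }
            while let Ok(i) = rx.try_recv() {
                println!("received {}", i);
            }
        }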

        For general usage, though, I need to do more research. Any
        application where latency is relevant really should be
        designed to deal with back-pressure from queues, but there may
        be some batch job style use cases where, as you say, it isn't
        worth the extra effort. On the other hand, it's worth
        thinking about how such deadlocks actually occur, and deciding
        whether the usage patterns that cause them are ones developers
        can reasonably expect to work. I'll look into this and see
        what I come up with.

        If there were some general way to mitigate the deadlock issue
        within the runtime, it would also solve this problem.

        As a last resort, I suspect that I could probably figure out a
        way to have the sender resize the buffer when it fills, copy
        the elements over, and then switch the consumers over to the
        larger buffer. I don't know if I could do it without affecting
        the fast path on the receiver side.

            Please keep working on this. I'm excited to see your results.


        I appreciate the encouragement :)



            Thanks,
            Simon

            [1] http://lmax-exchange.github.io/disruptor/
            [2] A 2.66GHz Intel P8800 CPU running in a Thinkpad T500
            on Linux x86_64
            [3] https://github.com/sruggier/rust-disruptor


_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
