The situation may not be as dire as you think. The runtime is still in a state
of flux, and don't forget that in a single summer the entire runtime was
rewritten in Rust and entirely redesigned. I personally still think that M:N is
a viable model for various applications, and it seems especially unfortunate to
remove everything just because it's not tailored to all use cases.

Rust made an explicit design decision early on to pursue lightweight/green
tasks, with the understanding that the strategy had drawbacks. Using libuv as
the backend for driving I/O was likewise an explicit decision with known
drawbacks.

That being said, I do not believe that all is lost. I don't believe that the
Rust standard library as it is today can support *every* use case, but it's
getting to the point where it comes pretty close. In the recent redesign of the
I/O implementation, all I/O was abstracted behind trait objects with a
synchronous interface. This I/O interface is implemented in librustuv by
talking to the Rust scheduler under the hood. Additionally, in pull #10457, I'm
starting to add support for a native implementation of this I/O interface. The
great boon of this strategy is that the std::io primitives have no idea whether
their underlying implementation is native and blocking or libuv and
asynchronous. The exact same Rust code works for one as for the other.
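As a rough illustration, here's what that shape of abstraction looks like. This
is a minimal sketch in modern Rust syntax; the `Stream` trait and the type
names are hypothetical, not the actual std::io API, and a real backend would
issue blocking syscalls or libuv calls rather than use an in-memory buffer:

```rust
// Callers program against a synchronous trait object; only the backend
// decides whether "blocking" means an OS syscall or parking a green task.
trait Stream {
    fn read(&mut self, buf: &mut [u8]) -> usize;
    fn write(&mut self, buf: &[u8]);
}

// Stand-in for a "native" backend; a real one would call read(2)/write(2).
// Stubbed with an in-memory buffer so the sketch is runnable.
struct NativeStream { data: Vec<u8> }

impl Stream for NativeStream {
    fn read(&mut self, buf: &mut [u8]) -> usize {
        let n = self.data.len().min(buf.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data.drain(..n);
        n
    }
    fn write(&mut self, buf: &[u8]) {
        self.data.extend_from_slice(buf);
    }
}

// Client code is backend-agnostic: it never learns whether the underlying
// I/O blocks an OS thread or yields the task to an event loop.
fn echo(stream: &mut dyn Stream) -> usize {
    let mut buf = [0u8; 16];
    let n = stream.read(&mut buf);
    stream.write(&buf[..n]);
    n
}

fn main() {
    let mut s = NativeStream { data: b"hello".to_vec() };
    let n = echo(&mut s);
    println!("echoed {} bytes", n);
}
```

The key property is that `echo` compiles and runs unchanged regardless of
which backend ends up linked into the program.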

I personally don't see why the same strategy shouldn't work for the task model
as well. When you link a program to the librustuv crate, then you're choosing to
have a runtime with M:N scheduling and asynchronous I/O. Perhaps, though, if you
didn't link to librustuv, you would get 1:1 scheduling with blocking I/O. You
would still have all the benefits of the standard library's communication
primitives, spawning primitives, I/O, task-local storage, etc. The only
difference is that everything would be powered by OS-level threads instead of
Rust-level green tasks.

I would very much like to see a standard library which supports this
abstraction, and I believe that it is very realistically possible. Right now we
have an EventLoop interface which defines how I/O is performed and serves as
the abstraction between asynchronous I/O and blocking I/O. It sounds like we
need an analogous, more formalized Scheduler interface which abstracts M:N
scheduling vs. 1:1 scheduling in the same way.
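For the 1:1 side, a minimal sketch of what such a Scheduler interface could
look like (the names are hypothetical and the syntax is modern Rust; the
native implementation just delegates to std::thread, while an M:N
implementation in librustuv would instead park and resume green tasks on its
event loop):

```rust
// Hypothetical Scheduler interface: the standard library's spawning
// primitives would call through this trait rather than assuming any
// particular runtime.
trait Scheduler {
    // Spawn a new unit of concurrency running `f`.
    fn spawn(&mut self, f: Box<dyn FnOnce() + Send>);
    // Yield control: to the OS for 1:1, to the task scheduler for M:N.
    fn yield_now(&mut self);
}

// 1:1 implementation: every "task" is a plain OS thread.
struct NativeScheduler;

impl Scheduler for NativeScheduler {
    fn spawn(&mut self, f: Box<dyn FnOnce() + Send>) {
        std::thread::spawn(f);
    }
    fn yield_now(&mut self) {
        std::thread::yield_now();
    }
}

fn main() {
    let (tx, rx) = std::sync::mpsc::channel();
    let mut sched = NativeScheduler;
    // The spawning code doesn't know or care whether this runs on an OS
    // thread or a green task; only the linked implementation decides.
    sched.spawn(Box::new(move || tx.send(42u32).unwrap()));
    assert_eq!(rx.recv().unwrap(), 42);
}
```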

The main goal of all of this would be to allow the exact same Rust code to work
in both M:N and 1:1 environments. This ability would allow authors to
specialize their code for the task at hand. Those writing web servers would be
sure to link to librustuv, while those writing command-line utilities would
simply omit it. Additionally, as a library author, I don't really care which
implementation you're using. I can write a MySQL database driver, and then you,
as a consumer of my library, decide whether my network calls block or not.

This is a fairly new concept to me (I haven't thought much about it before), but
it sounds like it may be the right way forward for addressing your concerns
without compromising too much existing functionality. There would certainly be
plenty of work to do in this realm, and I'm not sure if this goal would block
the 1.0 milestone or not. Ideally, this would be a completely
backwards-compatible change, but there would perhaps be unintended consequences.
As always, this would need plenty of discussion to see whether this is even a
reasonable strategy to take.


On Wed, Nov 13, 2013 at 2:45 AM, Daniel Micay <danielmi...@gmail.com> wrote:
> Before getting right into the gritty details about why I think we should think
> about a path away from M:N scheduling, I'll go over the details of the
> concurrency model we currently use.
>
> Rust uses a user-mode scheduler to cooperatively schedule many tasks onto OS
> threads. Due to the lack of preemption, tasks need to manually yield control
> back to the scheduler. Performing I/O with the standard library will block the
> *task*, but yield control back to the scheduler until the I/O is completed.
>
> The scheduler manages a thread pool where the unit of work is a task rather
> than a queue of closures to be executed or data to be passed to a function. A
> task consists of a stack, register context and task-local storage much like an
> OS thread.
>
> In the world of high-performance computing, this is a proven model for
> maximizing throughput for CPU-bound tasks. By abandoning preemption, there's
> zero overhead from context switches. For socket servers with only negligible
> server-side computations the avoidance of context switching is a boon for
> scalability and predictable performance.
>
> # Lightweight?
>
> Rust's tasks are often called *lightweight*, but at least on Linux the only
> optimization is the lack of preemption. Since segmented stacks have been
> dropped, the resident/virtual memory usage will be identical.
>
> # Spawning performance
>
> An OS thread can actually spawn nearly as fast as a Rust task on a system with
> one CPU. On a multi-core system, there's a high chance of the new thread being
> spawned on a different CPU resulting in a performance loss.
>
> Sample C program, if you need to see it to believe it:
>
> ```
> #include <pthread.h>
> #include <stddef.h>
>
> static const size_t n_thread = 100000;
>
> static void *foo(void *arg) {
>     return arg;
> }
>
> int main(void) {
>     for (size_t i = 0; i < n_thread; i++) {
>         pthread_attr_t attr;
>         /* pthread functions return an error number (not -1) on failure */
>         if (pthread_attr_init(&attr) != 0) {
>             return 1;
>         }
>         if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0) {
>             return 1;
>         }
>         pthread_t thread;
>         if (pthread_create(&thread, &attr, foo, NULL) != 0) {
>             return 1;
>         }
>         pthread_attr_destroy(&attr);
>     }
>     pthread_exit(NULL);
> }
> ```
>
> Sample Rust program:
>
> ```
> fn main() {
>     for _ in range(0, 100000) {
>         do spawn {
>         }
>     }
> }
> ```
>
> For both programs, I get around 0.9s consistently when pinned to a core. The
> Rust version drops to 1.1s when not pinned and the OS thread one to about 2s.
> It drops further when asked to allocate 8MiB stacks like C is doing, and will
> drop more when it has to do `mmap` and `mprotect` calls like the pthread API.
>
> # Asynchronous I/O
>
> Rust's requirements for asynchronous I/O would be filled well by direct usage
> of IOCP on Windows. However, Linux only has solid support for non-blocking
> sockets because file operations usually just retrieve a result from cache and
> do not truly have to block. This results in libuv being significantly slower
> than blocking I/O for most common cases for the sake of scalable socket
> servers.
>
> On modern systems with flash memory, including mobile, there is a *consistent*
> and relatively small worst-case latency for accessing data on the disk so
> blocking is essentially a non-issue. Memory mapped I/O is also an incredibly
> important feature for I/O performance, and there's almost no reason to use
> traditional I/O on 64-bit. However, it's a no-go with M:N scheduling because
> the page faults block the thread.
>
> # Overview
>
> Advantages:
>
> * lack of preemptive/fair scheduling, leading to higher throughput
> * very fast context switches to other tasks on the same scheduler thread
>
> Disadvantages:
>
> * lack of preemptive/fair scheduling (lower-level model)
> * poor profiler/debugger support
> * async I/O stack is much slower for the common case; for example stat is 35x
>   slower when run in a loop for an mlocate-like utility
> * true blocking code will still block a scheduler thread
> * most existing libraries use blocking I/O and OS threads
> * cannot directly use fast and easy to use linker-supported thread-local data
> * many existing libraries rely on thread-local storage, so there's a need to
>   be wary of hidden yields in Rust function calls and it's very difficult to
>   expose a safe interface to these libraries
> * every level of a CPU architecture adding registers needs explicit support
>   from Rust, and it must be selected at runtime when not targeting a specific
>   CPU (this is currently not done correctly)
>
> # User-mode scheduling
>
> Windows 7 introduced user-mode scheduling[1] to replace fibers on 64-bit.
> Google implemented the same thing for Linux (perhaps even before Windows 7 was
> released), and plans on pushing for it upstream.[2] The linked video does a
> better job of covering this than I can.
>
> User-mode scheduling provides a 1:1 threading model including full support for
> normal thread-local data and existing debuggers/profilers. It can yield to the
> scheduler on system calls and page faults. The operating system is responsible
> for details like context switching, so a large maintenance/portability burden
> is dealt with. It narrows down the above disadvantage list to just the point
> about not having preemptive/fair scheduling and doesn't introduce any new 
> ones.
>
> I hope this is where concurrency is headed, and I hope Rust doesn't miss this
> boat by concentrating too much on libuv. I think it would allow us to simply
> drop support for pseudo-blocking I/O in the Go style and ignore asynchronous
> I/O and non-blocking sockets in the standard library. It may be useful to have
> the scheduler use them, but it wouldn't be essential.
>
> [1] 
> http://msdn.microsoft.com/en-us/library/windows/desktop/dd627187(v=vs.85).aspx
> [2] http://www.youtube.com/watch?v=KXuZi9aeGTw
> _______________________________________________
> Rust-dev mailing list
> Rust-dev@mozilla.org
> https://mail.mozilla.org/listinfo/rust-dev