Great! Glad to hear that the plan is to support both use cases by eventually
abstracting over M:N and 1:1. That will go a long way towards my later
experiments with ETL (extract, transform, load) under various OSes and
multi-core architectures using parallel task flow (I still don't know what I
will call the flow type... lol)

On Wed, Nov 13, 2013 at 11:50 PM, Brian Anderson <[email protected]> wrote:

> Thanks for the great reply, Alex. This is the approach we are going to
> take. Rust is not going to move away from green threads; the plan is to
> support both use cases in the standard library.
>
>
> On 11/13/2013 10:32 AM, Alex Crichton wrote:
>
>> The situation may not be as dire as you think. The runtime is still in a
>> state of flux, and don't forget that in one summer the entire runtime was
>> rewritten in Rust and entirely redesigned. I personally still think that
>> M:N is a viable model for various applications, and it seems especially
>> unfortunate to remove everything just because it's not tailored to all use
>> cases.
>>
>> Rust made an explicit design decision early on to pursue lightweight/green
>> tasks, and it was made with the understanding that there were drawbacks to
>> the strategy. Using libuv as a backend for driving I/O was also an explicit
>> decision with known drawbacks.
>>
>> That being said, I do not believe that all is lost. I don't believe that
>> the Rust standard library as-is today can support *every* use case, but
>> it's getting to a point where it can get pretty close. In the recent
>> redesign of the I/O implementation, all I/O was abstracted behind trait
>> objects that are synchronous in their interface. This I/O interface is all
>> implemented in librustuv by talking to the Rust scheduler under the hood.
>> Additionally, in pull request #10457, I'm starting to add support for a
>> native implementation of this I/O interface. The great boon of this
>> strategy is that the std::io primitives have no idea whether their
>> underlying implementation is native and blocking or libuv and asynchronous.
>> The exact same Rust code works for one as it does for the other.
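>>
>> To make the shape of that abstraction concrete, here is a minimal C sketch
>> (the names and the function-pointer encoding are invented for illustration;
>> the real interface is Rust trait objects): callers program against a
>> synchronous interface and never learn which backend sits behind it.
>>
>> ```
>> #include <stddef.h>
>> #include <stdint.h>
>> #include <unistd.h>
>>
>> /* Hypothetical synchronous stream interface: a C stand-in for the
>>  * trait objects described above. Callers never learn whether the
>>  * backend blocks natively or parks a green task inside libuv. */
>> typedef struct stream stream;
>> struct stream {
>>     long (*write)(stream *self, const void *buf, size_t len);
>>     void *state; /* an fd for the native backend; a uv handle otherwise */
>> };
>>
>> /* Native blocking backend: state is a plain file descriptor. */
>> static long native_write(stream *self, const void *buf, size_t len) {
>>     return (long)write((int)(intptr_t)self->state, buf, len);
>> }
>>
>> /* Caller code is identical regardless of which backend was linked in. */
>> static void greet(stream *s) {
>>     static const char msg[] = "hello from an abstract stream\n";
>>     s->write(s, msg, sizeof msg - 1);
>> }
>>
>> int main(void) {
>>     stream out = { native_write, (void *)(intptr_t)STDOUT_FILENO };
>>     greet(&out);
>>     return 0;
>> }
>> ```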
>>
>> I personally don't see why the same strategy shouldn't work for the task
>> model as well. When you link a program to the librustuv crate, you're
>> choosing to have a runtime with M:N scheduling and asynchronous I/O.
>> Perhaps, though, if you didn't link to librustuv, you would get 1:1
>> scheduling with blocking I/O. You would still have all the benefits of the
>> standard library's communication primitives, spawning primitives, I/O,
>> task-local storage, etc. The only difference is that everything would be
>> powered by OS-level threads instead of Rust-level green tasks.
>>
>> I would very much like to see a standard library which supports this
>> abstraction, and I believe that it is very realistically possible. Right
>> now we have an EventLoop interface, which defines how I/O is performed and
>> serves as the abstraction between asynchronous I/O and blocking I/O. It
>> sounds like we need a more formalized Scheduler interface which abstracts
>> M:N scheduling vs. 1:1 scheduling.
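>>
>> As a rough sketch of what such an interface could look like, again in C
>> function-pointer terms with invented names: a Scheduler vtable with a 1:1
>> backend, where an M:N runtime would supply different entries behind the
>> same shape.
>>
>> ```
>> #include <pthread.h>
>> #include <sched.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> /* Hypothetical Scheduler interface, by analogy with the EventLoop
>>  * abstraction above (all names invented for illustration). */
>> typedef struct {
>>     int  (*spawn)(void (*task)(void *), void *arg);
>>     void (*yield_now)(void);
>> } scheduler;
>>
>> struct shim_arg { void (*task)(void *); void *arg; };
>>
>> static void *shim(void *p) {
>>     struct shim_arg a = *(struct shim_arg *)p;
>>     free(p);
>>     a.task(a.arg);
>>     return NULL;
>> }
>>
>> /* 1:1 backend: every task is a detached OS thread, and yielding
>>  * defers to the kernel scheduler. */
>> static int native_spawn(void (*task)(void *), void *arg) {
>>     struct shim_arg *a = malloc(sizeof *a);
>>     if (a == NULL)
>>         return -1;
>>     a->task = task;
>>     a->arg = arg;
>>     pthread_attr_t attr;
>>     pthread_attr_init(&attr);
>>     pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
>>     pthread_t t;
>>     if (pthread_create(&t, &attr, shim, a) != 0) {
>>         free(a);
>>         return -1;
>>     }
>>     return 0;
>> }
>>
>> static void native_yield(void) { sched_yield(); }
>>
>> static void hello(void *arg) { printf("task: %s\n", (const char *)arg); }
>>
>> int main(void) {
>>     scheduler sched = { native_spawn, native_yield };
>>     sched.spawn(hello, (void *)"running 1:1");
>>     sched.yield_now();
>>     pthread_exit(NULL); /* let the detached task finish before exiting */
>> }
>> ```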
>>
>> The main goal of all of this would be to allow the exact same Rust code to
>> work in both M:N and 1:1 environments. This ability would allow authors to
>> specialize their code for the task at hand. Those writing web servers would
>> be sure to link to librustuv, but those writing command-line utilities
>> would simply omit it. Additionally, as a library author, I don't really
>> care which implementation you're using. I can write a mysql database
>> driver, and then you, as a consumer of my library, decide whether my
>> network calls are blocking or not.
>>
>> This is a fairly new concept to me (I haven't thought much about it
>> before), but it sounds like it may be the right way forward to address your
>> concerns without compromising too much existing functionality. There would
>> certainly be plenty of work to do in this realm, and I'm not sure whether
>> this goal would block the 1.0 milestone or not. Ideally, this would be a
>> completely backwards-compatible change, but there would perhaps be
>> unintended consequences. As always, this would need plenty of discussion to
>> see whether this is even a reasonable strategy to take.
>>
>>
>> On Wed, Nov 13, 2013 at 2:45 AM, Daniel Micay <[email protected]> wrote:
>>
>>> Before getting right into the gritty details of why I think we should
>>> consider a path away from M:N scheduling, I'll go over the details of the
>>> concurrency model we currently use.
>>>
>>> Rust uses a user-mode scheduler to cooperatively schedule many tasks onto
>>> OS threads. Due to the lack of preemption, tasks need to manually yield
>>> control back to the scheduler. Performing I/O with the standard library
>>> will block the *task*, but yield control back to the scheduler until the
>>> I/O is completed.
>>>
>>> The scheduler manages a thread pool where the unit of work is a task,
>>> rather than a queue of closures to be executed or data to be passed to a
>>> function. A task consists of a stack, a register context and task-local
>>> storage, much like an OS thread.
>>>
>>> In the world of high-performance computing, this is a proven model for
>>> maximizing throughput for CPU-bound tasks. By abandoning preemption,
>>> there's zero overhead from context switches. For socket servers with only
>>> negligible server-side computation, the avoidance of context switching is
>>> a boon for scalability and predictable performance.
>>>
>>> # Lightweight?
>>>
>>> Rust's tasks are often called *lightweight*, but at least on Linux the
>>> only optimization is the lack of preemption. Since segmented stacks have
>>> been dropped, the resident/virtual memory usage will be identical.
>>>
>>> # Spawning performance
>>>
>>> An OS thread can actually be spawned nearly as fast as a Rust task on a
>>> system with one CPU. On a multi-core system, there's a high chance of the
>>> new thread being spawned on a different CPU, resulting in a performance
>>> loss.
>>>
>>> Sample C program, if you need to see it to believe it:
>>>
>>> ```
>>> #include <pthread.h>
>>> #include <stddef.h>
>>>
>>> static const size_t n_thread = 100000;
>>>
>>> static void *foo(void *arg) {
>>>     return arg;
>>> }
>>>
>>> int main(void) {
>>>     for (size_t i = 0; i < n_thread; i++) {
>>>         pthread_attr_t attr;
>>>         /* pthread functions return an error number on failure, not -1 */
>>>         if (pthread_attr_init(&attr) != 0) {
>>>             return 1;
>>>         }
>>>         /* detached threads free their resources on exit, so no join */
>>>         if (pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED) != 0) {
>>>             return 1;
>>>         }
>>>         pthread_t thread;
>>>         if (pthread_create(&thread, &attr, foo, NULL) != 0) {
>>>             return 1;
>>>         }
>>>     }
>>>     /* exit the main thread without tearing down the process */
>>>     pthread_exit(NULL);
>>> }
>>> ```
>>>
>>> Sample Rust program:
>>>
>>> ```
>>> fn main() {
>>>     // spawn 100,000 tasks that do no work and exit immediately
>>>     for _ in range(0, 100000) {
>>>         do spawn {
>>>         }
>>>     }
>>> }
>>> ```
>>>
>>> For both programs, I get around 0.9s consistently when pinned to a core.
>>> The Rust version slows to 1.1s when not pinned, and the OS thread one to
>>> about 2s. The Rust version slows further when asked to allocate 8MiB
>>> stacks as the C program does, and would slow more still if it had to make
>>> the same `mmap` and `mprotect` calls as the pthread API.
>>>
>>> # Asynchronous I/O
>>>
>>> Rust's requirements for asynchronous I/O would be filled well by direct
>>> usage of IOCP on Windows. However, Linux only has solid support for
>>> non-blocking sockets, because file operations usually just retrieve a
>>> result from cache and do not truly have to block. This results in libuv
>>> being significantly slower than blocking I/O for the most common cases, a
>>> price paid for the sake of scalable socket servers.
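>>>
>>> The blocking side of that comparison is trivial to reproduce. A minimal
>>> sketch of a stat-heavy loop (the path and iteration count are arbitrary):
>>>
>>> ```
>>> #include <stdio.h>
>>> #include <sys/stat.h>
>>>
>>> /* Blocking baseline for a stat-heavy workload: with a warm dentry and
>>>  * inode cache, each call returns almost immediately, so routing it
>>>  * through an async I/O stack only adds overhead. */
>>> int main(void) {
>>>     struct stat st;
>>>     long hits = 0;
>>>     for (long i = 0; i < 100000; i++) {
>>>         if (stat("/etc/passwd", &st) == 0) /* arbitrary, likely-cached path */
>>>             hits++;
>>>     }
>>>     printf("%ld successful stat() calls\n", hits);
>>>     return 0;
>>> }
>>> ```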
>>>
>>> On modern systems with flash memory, including mobile, there is a
>>> *consistent* and relatively small worst-case latency for accessing data on
>>> the disk, so blocking is essentially a non-issue. Memory-mapped I/O is
>>> also an incredibly important feature for I/O performance, and there's
>>> almost no reason to use traditional I/O on 64-bit. However, it's a no-go
>>> with M:N scheduling because the page faults block the whole scheduler
>>> thread.
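>>>
>>> For example, a minimal memory-mapped read (the file path is arbitrary):
>>> every first touch of a page can fault, and under M:N each such fault
>>> stalls the scheduler thread and every green task multiplexed onto it.
>>>
>>> ```
>>> #include <fcntl.h>
>>> #include <stdio.h>
>>> #include <sys/mman.h>
>>> #include <sys/stat.h>
>>> #include <unistd.h>
>>>
>>> int main(void) {
>>>     int fd = open("/etc/passwd", O_RDONLY); /* arbitrary example file */
>>>     if (fd < 0)
>>>         return 1;
>>>     struct stat st;
>>>     if (fstat(fd, &st) != 0 || st.st_size == 0) {
>>>         close(fd);
>>>         return 1;
>>>     }
>>>     char *data = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
>>>     if (data == MAP_FAILED) {
>>>         close(fd);
>>>         return 1;
>>>     }
>>>     long newlines = 0;
>>>     for (off_t i = 0; i < st.st_size; i++) /* each page touch may fault */
>>>         if (data[i] == '\n')
>>>             newlines++;
>>>     printf("%ld lines\n", newlines);
>>>     munmap(data, st.st_size);
>>>     close(fd);
>>>     return 0;
>>> }
>>> ```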
>>>
>>> # Overview
>>>
>>> Advantages:
>>>
>>> * lack of preemptive/fair scheduling, leading to higher throughput
>>> * very fast context switches to other tasks on the same scheduler thread
>>>
>>> Disadvantages:
>>>
>>> * lack of preemptive/fair scheduling (a lower-level model)
>>> * poor profiler/debugger support
>>> * the async I/O stack is much slower for the common case; for example,
>>>   stat is 35x slower when run in a loop for an mlocate-like utility
>>> * truly blocking code will still block a scheduler thread
>>> * most existing libraries use blocking I/O and OS threads
>>> * no direct use of the fast, easy-to-use linker-supported thread-local
>>>   data (see the sketch after this list)
>>> * many existing libraries rely on thread-local storage, so there's a need
>>>   to be wary of hidden yields in Rust function calls, and it's very
>>>   difficult to expose a safe interface to these libraries
>>> * every CPU architecture revision that adds registers needs explicit
>>>   support from Rust, and the right context-switch code must be selected at
>>>   runtime when not targeting a specific CPU (this is currently not done
>>>   correctly)
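>>>
>>> The thread-local point is easy to see in C (a minimal sketch; the names
>>> are arbitrary): `__thread` data is resolved through the TLS ABI with no
>>> locking or hashing, but it is pinned to the OS thread, so a task that
>>> migrates between scheduler threads mid-call can read a stale slot.
>>>
>>> ```
>>> #include <pthread.h>
>>> #include <stdint.h>
>>> #include <stdio.h>
>>>
>>> /* Linker-supported thread-local data: one slot per OS thread. */
>>> static __thread int request_id = 0;
>>>
>>> static void *worker(void *arg) {
>>>     request_id = (int)(intptr_t)arg;  /* private to this thread */
>>>     printf("thread sees request_id = %d\n", request_id);
>>>     return NULL;
>>> }
>>>
>>> int main(void) {
>>>     pthread_t a, b;
>>>     pthread_create(&a, NULL, worker, (void *)(intptr_t)1);
>>>     pthread_create(&b, NULL, worker, (void *)(intptr_t)2);
>>>     pthread_join(a, NULL);
>>>     pthread_join(b, NULL);
>>>     /* main's slot was never written, so it still holds 0 */
>>>     printf("main still sees request_id = %d\n", request_id);
>>>     return 0;
>>> }
>>> ```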
>>>
>>> # User-mode scheduling
>>>
>>> Windows 7 introduced user-mode scheduling[1] to replace fibers on 64-bit.
>>> Google implemented the same thing for Linux (perhaps even before Windows 7
>>> was released), and plans on pushing for it upstream.[2] The linked video
>>> does a better job of covering this than I can.
>>>
>>> User-mode scheduling provides a 1:1 threading model, including full
>>> support for normal thread-local data and existing debuggers/profilers. It
>>> can yield to the scheduler on system calls and page faults. The operating
>>> system handles details like context switching, so a large
>>> maintenance/portability burden is taken off the language runtime. It
>>> narrows the disadvantage list above down to just the point about not
>>> having preemptive/fair scheduling, and doesn't introduce any new ones.
>>>
>>> I hope this is where concurrency is headed, and I hope Rust doesn't miss
>>> the boat by concentrating too much on libuv. I think it would allow us to
>>> simply drop support for pseudo-blocking I/O in the Go style and to ignore
>>> asynchronous I/O and non-blocking sockets in the standard library. It may
>>> be useful to have the scheduler use them internally, but it wouldn't be
>>> essential.
>>>
>>> [1] http://msdn.microsoft.com/en-us/library/windows/desktop/dd627187(v=vs.85).aspx
>>> [2] http://www.youtube.com/watch?v=KXuZi9aeGTw



-- 
-Thad