Re: [rust-dev] net::tcp::TcpSocket slow?

Brian Anderson Thu, 20 Dec 2012 21:18:18 -0800

On 12/20/2012 08:17 PM, Patrick Walton wrote:

I just profiled this. Some thoughts:


Thanks.

On 12/20/12 9:12 PM, Brian Anderson wrote:
First, stack switching. Switching between Rust and C code has bad
performance due to bad branch prediction. Some workloads can spend
10% of their time stalling in the stack switch.
This didn't seem too high, actually. It should only be ~20,000 stack switches (read and write) if we solve the following issue:
Second, working with uv involves sending a bunch of little work units to
a dedicated uv task. This is because callbacks from uv into Rust *must
not fail* or the runtime will crash. Where typical uv code runs directly
in the event callbacks, Rust dispatches most or all of that work to
other tasks. This imposes significant context switching and locking
overhead.
This is actually the problem. If you're using a nonblocking I/O library (libuv) for a fundamentally blocking workload (sending lots of requests to redis and blocking on the response for each one), *and* you're multiplexing userland green threads on top of it, then you're going to get significantly worse performance than you would if you had used a blocking I/O setup. We can make some of the performance differential up by switching uv over to pipes, and maybe we can play dirty tricks like having the main thread spin on the read lock so that we don't have to fall into the scheduler to punt it awake, but I still don't see any way we will make up the 10x performance difference for this particular use case without a fundamental change to the architecture. Work stealing doesn't seem to be a viable solution here since the uv task really needs to be one-task-per-thread.

I'm still optimistic, though I agree that the throughput of using pipes and tasks to create synchronous interfaces on top of uv must be lower than if using uv as intended, particularly if you multiplex data across many tasks.

In the best case you essentially just want to be able to quickly transfer large buffers from the client task to the I/O task, or the other way. In a dual-core setup pipes should be able to do that very fast, rarely yielding to the (Rust) scheduler. I'm assuming that we can do that without copying buffers to and from the uv loop, and I also assume that the current implementation requires copies. You'll still pay the cost of dealing with the uv async callback (don't recall exactly what it's called but it's how you wake up the loop from another thread), so it's going to be slower, but as the general purpose I/O subsystem I think it will be the best choice because it doesn't block your task, so you can write task-oriented code and it will do the right thing.
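To make the "transfer buffers without copying" idea concrete, here is a rough sketch. It is not the runtime's implementation: it uses present-day Rust's std::sync::mpsc channels and OS threads as stand-ins for pipes and tasks, and the names WriteRequest, spawn_io_thread and write_all are made up for illustration. The point is only that sending an owned buffer moves it to the I/O side, so the bytes themselves are never copied.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical request handed from a client thread to the I/O thread.
/// The buffer is owned, so sending the request moves it instead of copying it.
struct WriteRequest {
    buf: Vec<u8>,
    /// Channel on which the I/O thread returns the buffer and a byte count.
    done: mpsc::Sender<(Vec<u8>, usize)>,
}

/// Spawn a stand-in "I/O thread". A real one would drive an event loop;
/// this one only counts the bytes and hands the buffer straight back.
fn spawn_io_thread() -> (mpsc::Sender<WriteRequest>, thread::JoinHandle<()>) {
    let (tx, rx) = mpsc::channel::<WriteRequest>();
    let handle = thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for req in rx {
            let n = req.buf.len();
            // Ignore the error if the client stopped waiting for the reply.
            let _ = req.done.send((req.buf, n));
        }
    });
    (tx, handle)
}

/// Client side: give the buffer away, then block until the I/O thread
/// reports completion and returns the same allocation.
fn write_all(io: &mpsc::Sender<WriteRequest>, buf: Vec<u8>) -> (Vec<u8>, usize) {
    let (done_tx, done_rx) = mpsc::channel();
    io.send(WriteRequest { buf, done: done_tx })
        .expect("I/O thread has exited");
    done_rx.recv().expect("I/O thread dropped the request")
}
```

A caller would create the I/O thread once with spawn_io_thread(), call write_all(&io, buffer) for each request, and drop the sender to shut the thread down. The caller still blocks on the reply, which is the synchronous-interface-over-async shape discussed above; only the copy is avoided, not the wakeup cost.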

Maybe the best thing is just to make the choice of nonblocking versus blocking I/O a choice that tasks can make on an individual basis. It's a footgun to be sure; if you use blocking I/O you run the risk of starving other tasks on the same scheduler to death, so perhaps we should restrict this mode to schedulers with 1:1 scheduling. But this would be in line with the general principle that we've been following that the choice of 1:1 and M:N scheduling should be left to the user, because there are performance advantages and disadvantages to each mode.

I agree that we need multiple strategies, ideally that implement common traits.
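As a sketch of what "multiple strategies behind common traits" could look like (written in present-day Rust, with every name here hypothetical and not an existing or proposed API): one trait, one implementation that blocks the calling thread, and one that forwards the data to a dedicated I/O thread over a channel. Threads again stand in for tasks.

```rust
use std::io::{self, Write};
use std::sync::mpsc;
use std::thread;

/// Hypothetical common interface that both strategies implement.
trait ByteSink {
    fn send_bytes(&mut self, data: &[u8]) -> io::Result<usize>;
}

/// Strategy 1: blocking write performed directly on the calling thread.
struct Blocking<W: Write>(W);

impl<W: Write> ByteSink for Blocking<W> {
    fn send_bytes(&mut self, data: &[u8]) -> io::Result<usize> {
        self.0.write_all(data)?;
        Ok(data.len())
    }
}

/// Strategy 2: forward the data to a dedicated I/O thread over a channel,
/// so the caller never blocks on the underlying writer.
struct Offloaded {
    tx: mpsc::Sender<Vec<u8>>,
}

impl Offloaded {
    /// Spawns the I/O thread. Joining the handle returns the writer.
    fn spawn<W: Write + Send + 'static>(mut writer: W) -> (Offloaded, thread::JoinHandle<W>) {
        let (tx, rx) = mpsc::channel::<Vec<u8>>();
        let handle = thread::spawn(move || {
            // Runs until every sender is dropped or a write fails.
            for chunk in rx {
                if writer.write_all(&chunk).is_err() {
                    break;
                }
            }
            writer
        });
        (Offloaded { tx }, handle)
    }
}

impl ByteSink for Offloaded {
    fn send_bytes(&mut self, data: &[u8]) -> io::Result<usize> {
        // Copies the slice into an owned buffer so it can cross threads.
        self.tx
            .send(data.to_vec())
            .map_err(|_| io::Error::new(io::ErrorKind::BrokenPipe, "I/O thread has exited"))?;
        Ok(data.len())
    }
}

/// Client code is written once against the trait, whichever strategy is used.
fn ping(sink: &mut dyn ByteSink) -> io::Result<usize> {
    sink.send_bytes(b"PING\r\n")
}
```

The tradeoff shows up in the two impls: Blocking reports the real result of the write but ties up the caller, while Offloaded returns as soon as the data is queued and only learns about a failure on a later call.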

In order to have competitive performance in high-throughput applications we need to use uv as intended and run our code inside the uv callbacks. Right now there is no high-level API for this, so I think that when we revisit this stuff one of the first things we need to do is add those modules and get them solid and fast. Then rebuild everything on top of that.


-Brian
_______________________________________________
Rust-dev mailing list
Rust-dev@mozilla.org
https://mail.mozilla.org/listinfo/rust-dev
