GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/1907
[WIP] [SPARK-2468] Netty-based block server / client module

This is a rough draft, but it can't be much worse than the old Netty module, since the old one didn't really work :)

Compared with the old Netty module, this one features:
- It appears to work :) The old one didn't have a frame decoder and only worked for blocks of very small size
- Basically a rewrite of the old Netty module, which was introduced about 1.5 years ago but turned off by default
- SPARK-2941: option to specify nio vs oio vs epoll for the channel/transport. By default, epoll is used on Linux and nio on other platforms
- SPARK-2943: options to specify send and receive buffer sizes for users who want to do hyper tuning
- SPARK-2942: I/O errors are reported from server to client (the protocol uses a negative length to indicate an error)
- SPARK-2940: fetching multiple blocks in a single request to reduce syscalls
- SPARK-2959: clients share a single thread pool
- SPARK-2990: use PooledByteBufAllocator to reduce GC (basically a Netty-managed pool of buffers, modeled on jemalloc)

Compared with the existing communication manager, this one features:
- IMO it is substantially easier to understand
- Should create much less garbage because of SPARK-2990 (PooledByteBufAllocator)
- Don't quote me on this, but I think a lot fewer syscalls
- Zero-copy send on the server for on-disk blocks
- One-copy receive (thanks to a frame decoder)

TODOs before it can fully replace the existing ConnectionManager, if that will ever happen (most of them can probably be done in separate PRs, since this module needs to be turned on explicitly):
- [ ] Lots of test cases
- [ ] Performance analysis
- [ ] Support client connection reuse so we don't need to keep opening new connections
- [ ] Support serving non-disk blocks
- [ ] Support SASL authentication

Thanks to @coderplay for pair coding with me on a Sunday.
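The error-reporting convention from SPARK-2942 (a negative frame length marks an error payload rather than block data) can be sketched in plain Java with `ByteBuffer`. This is an illustrative sketch of the idea, not the PR's actual classes or wire format:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of a length-prefixed frame where a negative
 *  length signals an error message instead of block data. */
public class FrameSketch {

    // Encode a successful block response: [length][payload]
    static ByteBuffer encodeBlock(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length);   // positive length => data frame
        buf.put(payload);
        buf.flip();
        return buf;
    }

    // Encode an error response: [-length][UTF-8 error message]
    static ByteBuffer encodeError(String message) {
        byte[] msg = message.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + msg.length);
        buf.putInt(-msg.length);      // negative length => error frame
        buf.put(msg);
        buf.flip();
        return buf;
    }

    // Decode a frame: the sign of the length field tells the client
    // whether it received data or a server-side error.
    static String decode(ByteBuffer buf) {
        int len = buf.getInt();
        byte[] body = new byte[Math.abs(len)];
        buf.get(body);
        return (len >= 0)
            ? "DATA:" + body.length + " bytes"
            : "ERROR:" + new String(body, StandardCharsets.UTF_8);
    }
}
```

The advantage of this encoding is that the client's frame decoder needs no extra status byte: one `int` header carries both the frame size and the success/failure signal.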
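The multi-block fetch in SPARK-2940 amounts to packing several block IDs into one request frame so that one round trip (and far fewer syscalls) covers a whole batch. A hypothetical encoding, not the PR's actual wire format, could look like:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batched-fetch request: [count] then [len][id bytes] per block. */
public class MultiBlockRequest {

    static ByteBuffer encode(List<String> blockIds) {
        int size = 4;
        List<byte[]> encoded = new ArrayList<>();
        for (String id : blockIds) {
            byte[] b = id.getBytes(StandardCharsets.UTF_8);
            encoded.add(b);
            size += 4 + b.length;
        }
        ByteBuffer buf = ByteBuffer.allocate(size);
        buf.putInt(encoded.size());       // number of blocks requested
        for (byte[] b : encoded) {
            buf.putInt(b.length);         // length-prefix each block ID
            buf.put(b);
        }
        buf.flip();
        return buf;
    }

    static List<String> decode(ByteBuffer buf) {
        int count = buf.getInt();
        List<String> ids = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            byte[] b = new byte[buf.getInt()];
            buf.get(b);
            ids.add(new String(b, StandardCharsets.UTF_8));
        }
        return ids;
    }
}
```

A reduce task fetching many shuffle blocks from one executor would then issue a single request instead of one connection/write per block.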
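The zero-copy send for on-disk blocks rests on the kernel transferring file bytes to the socket without copying them through user space; in the JDK this is `FileChannel.transferTo` (which Netty's `FileRegion` abstraction wraps). A minimal stdlib-only sketch, with illustrative names:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Sketch of a zero-copy disk-to-channel transfer via transferTo. */
public class ZeroCopySend {

    // Returns the number of bytes sent. transferTo may send fewer bytes
    // than requested per call, so loop until the whole file is written.
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ)) {
            long pos = 0;
            long size = src.size();
            while (pos < size) {
                pos += src.transferTo(pos, size - pos, target);
            }
            return pos;
        }
    }
}
```

When the target is a socket channel, the JVM can use `sendfile(2)` under the hood, so serving a large on-disk shuffle block never copies its bytes into the JVM heap.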
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark netty

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1907.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1907

----

commit cc56c138eb13e5cabbda2e888a066f140438bd42
Author: Reynold Xin <r...@apache.org>
Date: 2014-08-11T08:03:55Z

    Basic skeleton.

commit f6263e0766b87541a9d654c4a426b86527980685
Author: Reynold Xin <r...@apache.org>
Date: 2014-08-12T06:52:40Z

    New Netty implementation.

commit 9bc17fe5cc71e9002d081bbf847c94961b5a41fc
Author: Reynold Xin <r...@apache.org>
Date: 2014-08-12T06:59:30Z

    Completed protocol documentation.

commit 60bff09dd8f6984a9af1c334532cfba845aef96c
Author: Reynold Xin <r...@apache.org>
Date: 2014-08-12T08:05:21Z

    Connected the new netty network module with rest of Spark.

commit b292701702c57b41b728c2ea74fb7d68f391f106
Author: Reynold Xin <r...@apache.org>
Date: 2014-08-12T18:09:47Z

    Made everything work with proper reference counting.