GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/1907

    [WIP] [SPARK-2468] Netty based block server / client module

    This is a rough draft, but it can't be much worse than the old Netty module 
since the old one didn't really work :)
    
    Compared with the old Netty module, this one features:
    - It appears to work :) The old one didn't have a frame decoder and only 
worked for very small blocks
    - Basically a rewrite of the old Netty module, which was introduced about 
1.5 years ago but turned off by default
    - SPARK-2941: option to specify nio vs oio vs epoll for channel/transport. 
By default epoll is used on Linux, and nio is used on other platforms
    - SPARK-2943: options to specify send and receive buffer sizes for users 
who want to do fine-grained tuning
    - SPARK-2942: I/O errors are reported from the server to the client (the 
protocol uses a negative length to indicate an error)
    - SPARK-2940: fetching multiple blocks in a single request to reduce 
syscalls
    - SPARK-2959: clients share a single thread pool
    - SPARK-2990: use PooledByteBufAllocator to reduce GC pressure (basically 
a Netty-managed pool of buffers based on jemalloc)
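The error-reporting convention from SPARK-2942 can be sketched in isolation: a length-prefixed frame where a negative length marks an error frame whose body is the error message. This is a hypothetical stand-alone sketch using plain `java.nio`, not the actual Spark/Netty code; all names here are illustrative.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch of the wire convention described above: each frame begins with a
// 4-byte length field; a negative length marks an error frame whose body is
// the UTF-8 error message. Class and method names are hypothetical.
public class FrameSketch {

    // Encode a successful block response: [length][payload].
    static ByteBuffer encodeBlock(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length);
        buf.putInt(payload.length).put(payload);
        buf.flip();
        return buf;
    }

    // Encode an error response: [-messageLength][utf8 message].
    static ByteBuffer encodeError(String message) {
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(4 + bytes.length);
        buf.putInt(-bytes.length).put(bytes);
        buf.flip();
        return buf;
    }

    // Decode one frame: returns the payload, or throws with the server-side
    // error message if the length field is negative.
    static byte[] decode(ByteBuffer buf) {
        int len = buf.getInt();
        byte[] body = new byte[Math.abs(len)];
        buf.get(body);
        if (len < 0) {
            throw new RuntimeException(new String(body, StandardCharsets.UTF_8));
        }
        return body;
    }

    public static void main(String[] args) {
        byte[] ok = decode(encodeBlock("block data".getBytes(StandardCharsets.UTF_8)));
        System.out.println(new String(ok, StandardCharsets.UTF_8));
        try {
            decode(encodeError("block not found"));
        } catch (RuntimeException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

The single length field doing double duty keeps the happy path and the error path on the same framing, so the client's frame decoder never needs a separate error channel.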
    
    Compared with the existing communication manager, this one features:
    - IMO it is substantially easier to understand
    - Should create much less garbage because of SPARK-2990 
(PooledByteBufAllocator)
    - don't quote me on this, but I think it makes far fewer syscalls
    - zero-copy send for the server for on-disk blocks
    - one-copy receive (due to a frame decoder)
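The zero-copy send mentioned above boils down to `FileChannel.transferTo`, which hands a file's bytes to the destination channel without staging them in a user-space buffer (when the destination is a socket, this maps to sendfile on Linux). Netty wraps the same mechanism in `DefaultFileRegion`; this stand-alone sketch uses the JDK API directly, with illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of zero-copy file transfer: FileChannel.transferTo moves bytes
// kernel-side when the target is a socket channel. Names are hypothetical.
public class ZeroCopySketch {

    static long sendFile(Path path, WritableByteChannel target) throws Exception {
        try (FileChannel src = FileChannel.open(path, StandardOpenOption.READ)) {
            long size = src.size();
            long written = 0;
            // transferTo may transfer fewer bytes than requested, so loop.
            while (written < size) {
                written += src.transferTo(written, size - written, target);
            }
            return written;
        }
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("block", ".bin");
        Files.write(tmp, "hello block".getBytes());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long n = sendFile(tmp, Channels.newChannel(out));
        System.out.println(n + " bytes: " + out);
        Files.delete(tmp);
    }
}
```

Note that transferring to an in-memory channel (as in `main`) falls back to an ordinary copy; the zero-copy benefit applies when the target is a socket, which is exactly the on-disk-block-to-network path the server uses.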
    
    TODOs before it can fully replace the existing ConnectionManager, if that 
ever happens (most of them can probably be done in separate PRs since this 
needs to be turned on explicitly):
    - [ ] Lots of test cases
    - [ ] Performance analysis
    - [ ] Support client connection reuse so we don't need to keep opening new 
connections
    - [ ] Support serving non-disk blocks
    - [ ] Support SASL authentication
    
    
    Thanks to @coderplay for peer coding with me on a Sunday.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark netty

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1907.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1907
    
----
commit cc56c138eb13e5cabbda2e888a066f140438bd42
Author: Reynold Xin <r...@apache.org>
Date:   2014-08-11T08:03:55Z

    Basic skeleton.

commit f6263e0766b87541a9d654c4a426b86527980685
Author: Reynold Xin <r...@apache.org>
Date:   2014-08-12T06:52:40Z

    New Netty implementation.

commit 9bc17fe5cc71e9002d081bbf847c94961b5a41fc
Author: Reynold Xin <r...@apache.org>
Date:   2014-08-12T06:59:30Z

    Completed protocol documentation.

commit 60bff09dd8f6984a9af1c334532cfba845aef96c
Author: Reynold Xin <r...@apache.org>
Date:   2014-08-12T08:05:21Z

    Connected the new netty network module with rest of Spark.

commit b292701702c57b41b728c2ea74fb7d68f391f106
Author: Reynold Xin <r...@apache.org>
Date:   2014-08-12T18:09:47Z

    Made everything work with proper reference counting.

----

