[ 
https://issues.apache.org/jira/browse/HDFS-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263601#comment-15263601
 ] 

Duo Zhang commented on HDFS-223:
--------------------------------

In HBASE-14790 we implemented {{FanOutOneBlockAsyncDFSOutput}}, which is based 
on Netty. It performs much better than the default {{DFSOutputStream}} for the 
WAL in HBase.

We plan to move this code into HDFS, since that is where it belongs. Of 
course, this is not a simple copy-paste job: the implementation in HBase is not 
suitable for general use.

I will come back later, starting with a proposal for the asynchronous API.

Thanks.

> Asynchronous IO Handling in Hadoop and HDFS
> -------------------------------------------
>
>                 Key: HDFS-223
>                 URL: https://issues.apache.org/jira/browse/HDFS-223
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Raghu Angadi
>         Attachments: GrizzlyEchoServer.patch, MinaEchoServer.patch
>
>
> I think Hadoop needs utilities or a framework to make it simpler to deal 
> with generic asynchronous IO in Hadoop.
> Example use case:
> It's been a long-standing problem that the DataNode takes too many threads 
> for data transfers. Each write operation takes up 2 threads at each of the 
> datanodes and each read operation takes one, irrespective of how much 
> activity is on the sockets. The kinds of load that HDFS serves have been 
> expanding quite fast and HDFS should handle these varied loads better. If 
> there were a framework for non-blocking IO, the read and write pipeline state 
> machines could be implemented with async events on a fixed number of threads. 
> A generic utility is better since it could be used in other places like 
> DFSClient. DFSClient currently creates 2 extra threads for each file it has 
> open for writing.
> Initially I started writing a primitive "selector", then tried to see if such 
> a facility already exists. [Apache MINA|http://mina.apache.org] seemed to do 
> exactly this. My impression after looking at the interface and examples is 
> that it does not give the kind of control we might prefer or need. The first 
> use case I was thinking of implementing using MINA was to replace the 
> "response handlers" in DataNode. The response handlers are simpler since they 
> don't involve disk I/O. I [asked on the MINA user 
> list|http://www.nabble.com/Async-events-with-existing-NIO-sockets.-td18640767.html],
>  but it looks like it cannot be done, I think mainly because the sockets are 
> already created.
> Essentially what I have in mind is similar to MINA, except that reading from 
> and writing to the sockets is done by the event handlers. The lowest layer 
> essentially invokes selectors and then invokes event handlers on a single 
> thread or on multiple threads. Each event handler is expected to do some 
> non-blocking work. We would of course have utility handler implementations 
> that do read, write, accept etc., which are useful for simple processing.
> Sam Pullara mentioned that [xSockets|http://xsocket.sourceforge.net/] is more 
> flexible. It is under GPL.
> Are there other such implementations we should look at?
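For readers unfamiliar with the design described in the quoted issue, the idea of a lowest layer that runs a selector and dispatches to handlers that do the socket I/O themselves can be sketched with plain {{java.nio}}. This is only an illustration under stated assumptions: {{EventLoopSketch}}, {{EventHandler}}, {{AcceptHandler}}, {{EchoHandler}}, {{runLoop}} and {{roundTrip}} are hypothetical names, and the code is not taken from HDFS, HBase, MINA or Netty.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.concurrent.atomic.AtomicBoolean;

final class EventLoopSketch {
    /** Hypothetical handler contract: the handler itself performs the socket I/O. */
    interface EventHandler { void handle(SelectionKey key) throws IOException; }

    /** Utility handler: accepts a connection and attaches an echo handler to it. */
    static final class AcceptHandler implements EventHandler {
        @Override public void handle(SelectionKey key) throws IOException {
            SocketChannel ch = ((ServerSocketChannel) key.channel()).accept();
            if (ch == null) return;               // nothing to accept after all
            ch.configureBlocking(false);
            ch.register(key.selector(), SelectionKey.OP_READ, new EchoHandler());
        }
    }

    /** Utility handler: reads whatever is available and writes it back. */
    static final class EchoHandler implements EventHandler {
        private final ByteBuffer buf = ByteBuffer.allocate(4096);
        @Override public void handle(SelectionKey key) throws IOException {
            SocketChannel ch = (SocketChannel) key.channel();
            if (key.isReadable()) {
                if (ch.read(buf) < 0) { ch.close(); return; } // peer closed; close cancels the key
                buf.flip();
            }
            ch.write(buf);                        // non-blocking, may write only part
            if (buf.hasRemaining()) {
                key.interestOps(SelectionKey.OP_WRITE); // finish the write later
            } else {
                buf.clear();
                key.interestOps(SelectionKey.OP_READ);
            }
        }
    }

    /** Lowest layer: one thread runs the selector and dispatches ready keys. */
    static void runLoop(Selector selector, AtomicBoolean running) throws IOException {
        while (running.get()) {
            selector.select();                    // returns on readiness or wakeup()
            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                SelectionKey key = it.next();
                it.remove();
                if (!key.isValid()) continue;
                try {
                    ((EventHandler) key.attachment()).handle(key);
                } catch (IOException e) {
                    key.channel().close();        // drop only the failing connection
                }
            }
        }
    }

    /** Demo: echoes msg through a loopback connection served by a single loop thread. */
    static String roundTrip(String msg) throws IOException, InterruptedException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT, new AcceptHandler());
            AtomicBoolean running = new AtomicBoolean(true);
            Thread loop = new Thread(() -> {
                try { runLoop(selector, running); }
                catch (IOException e) { throw new UncheckedIOException(e); }
            });
            loop.start();
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                byte[] out = msg.getBytes(StandardCharsets.UTF_8);
                client.write(ByteBuffer.wrap(out));
                ByteBuffer in = ByteBuffer.allocate(out.length);
                while (in.hasRemaining() && client.read(in) >= 0) { /* blocking client read */ }
                return new String(in.array(), 0, in.position(), StandardCharsets.UTF_8);
            } finally {
                running.set(false);
                selector.wakeup();                // unblock select() so the loop can exit
                loop.join();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping"));
    }
}
```

In this sketch every connection shares the one loop thread, so the thread count stays fixed regardless of how many sockets are open. A real pipeline handler would additionally have to hand disk I/O to other threads, which is left out here.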


