[ https://issues.apache.org/jira/browse/HDFS-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15263601#comment-15263601 ]
Duo Zhang commented on HDFS-223:
--------------------------------

In HBASE-14790 we have implemented a {{FanOutOneBlockAsyncDFSOutput}} based on netty. For the WAL in HBase it performs much better than the default {{DFSOutputStream}}. We plan to move this code into HDFS, since that is where it belongs. Of course this is not simple copy-and-paste work: the implementation in HBase is not suitable for general use. I will come back later with a proposal for the asynchronous API first. Thanks.

> Asynchronous IO Handling in Hadoop and HDFS
> -------------------------------------------
>
>                 Key: HDFS-223
>                 URL: https://issues.apache.org/jira/browse/HDFS-223
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Raghu Angadi
>         Attachments: GrizzlyEchoServer.patch, MinaEchoServer.patch
>
>
> I think Hadoop needs utilities or a framework that make it simpler to deal
> with generic asynchronous IO in Hadoop.
> Example use case:
> It has been a long-standing problem that the DataNode takes too many threads
> for data transfers. Each write operation takes up 2 threads at each of the
> datanodes, and each read operation takes one, irrespective of how much
> activity is on the sockets. The kinds of load that HDFS serves have been
> expanding quite fast, and HDFS should handle these varied loads better. If
> there were a framework for non-blocking IO, the read and write pipeline state
> machines could be implemented with async events on a fixed number of threads.
> A generic utility is better, since it could also be used in other places like
> DFSClient. DFSClient currently creates 2 extra threads for each file it has
> open for writing.
> Initially I started writing a primitive "selector", then tried to see whether
> such a facility already exists. [Apache MINA|http://mina.apache.org] seemed to
> do exactly this. My impression after looking at the interface and examples is
> that it does not give the kind of control we might prefer or need.
> The first use case I was thinking of implementing with MINA was replacing the
> "response handlers" in the DataNode. The response handlers are simpler, since
> they don't involve disk I/O.
> I [asked on the MINA user list|http://www.nabble.com/Async-events-with-existing-NIO-sockets.-td18640767.html],
> but it looks like this cannot be done, I think mainly because the sockets are
> already created.
> Essentially, what I have in mind is similar to MINA, except that reads and
> writes on the sockets are done by the event handlers. The lowest layer
> essentially runs the selectors and invokes the event handlers on a single
> thread or on multiple threads. Each event handler is expected to do some
> non-blocking work. We would of course have utility handler implementations
> that do read, write, accept, etc., which are useful for simple processing.
> Sam Pullara mentioned that [xSockets|http://xsocket.sourceforge.net/] is more
> flexible. It is licensed under the GPL.
> Are there other such implementations we should look at?
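The lowest layer described in the issue can be sketched with plain java.nio. This is a minimal sketch, assuming a design where the selector loop only dispatches readiness events and the handlers perform the channel I/O themselves. EventHandler, EventLoop and ReadHandler are hypothetical names chosen for illustration. They are not classes in Hadoop, MINA or netty. Note that `register` accepts a channel that was created elsewhere, such as an already-connected `SocketChannel`. That is the "existing sockets" situation the MINA question was about.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical callback: the handler itself does the read/write/accept and
// must only do non-blocking work so it never stalls the loop thread.
interface EventHandler {
    void onReady(SelectionKey key) throws IOException;
}

// Hypothetical single-threaded event loop over one java.nio Selector.
final class EventLoop implements AutoCloseable {
    private final Selector selector;

    EventLoop() throws IOException {
        selector = Selector.open();
    }

    // Registers a channel created elsewhere (e.g. an already-connected
    // SocketChannel); the handler is stored as the key's attachment.
    SelectionKey register(SelectableChannel ch, int ops, EventHandler handler)
            throws IOException {
        ch.configureBlocking(false); // only non-blocking channels can be registered
        return ch.register(selector, ops, handler);
    }

    // Waits up to timeoutMillis, then dispatches every ready key to its handler.
    // Returns the number of handlers invoked.
    int runOnce(long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        int dispatched = 0;
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // the selected-key set is not cleared automatically
            if (!key.isValid()) {
                continue;
            }
            ((EventHandler) key.attachment()).onReady(key);
            dispatched++;
        }
        return dispatched;
    }

    @Override
    public void close() throws IOException {
        selector.close();
    }
}

// Hypothetical utility handler: drains the bytes currently available, never blocks.
final class ReadHandler implements EventHandler {
    private final ByteBuffer buf = ByteBuffer.allocate(4096);
    final StringBuilder received = new StringBuilder();

    @Override
    public void onReady(SelectionKey key) throws IOException {
        ReadableByteChannel ch = (ReadableByteChannel) key.channel();
        int n;
        while ((n = ch.read(buf)) > 0) { // 0 means nothing more to read right now
            buf.flip();
            received.append(StandardCharsets.US_ASCII.decode(buf).toString());
            buf.clear();
        }
        if (n < 0) { // end of stream: stop watching and release the channel
            key.cancel();
            ch.close();
        }
    }
}
```

Every registered channel is serviced by whichever thread calls `runOnce`, so the thread count stays fixed no matter how many sockets are open. One possible way to get the "multiple threads" variant is to run several such loops, each owning its own selector; that part is an assumption and is not shown here. Write and accept handlers would follow the same pattern with `OP_WRITE` and `OP_ACCEPT`.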