[jira] Commented: (HADOOP-3856) Asynchronous IO Handling in Hadoop and HDFS

Doug Cutting (JIRA) Wed, 08 Oct 2008 14:36:14 -0700

    [ 
https://issues.apache.org/jira/browse/HADOOP-3856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12638100#action_12638100
 ]


Doug Cutting commented on HADOOP-3856:
--------------------------------------

I'm not sure we need "streaming" RPCs or whether buffer-by-buffer calls are 
sufficient.  My hunch would be to try buffer-by-buffer calls first.  Support 
for output parameters is probably required to avoid extra buffer copies, but 
one could first try it without even that to see if we're in the ballpark.

As for async, I don't follow your question.  It would seem simpler to me to add 
an async call pattern to RPC than to build a new async stack.  Perhaps we could 
signal it in a protocol by declaring a method that returns a SelectionKey.  
Such methods would return as soon as their parameters are written, and the 
return value could be used to listen for the response.
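
To make the SelectionKey idea concrete, here is a minimal sketch of how a protocol 
might mark a method as asynchronous purely through its return type.  The names 
AsyncRpcSketch, BlockProtocol, getBlockLength and readBlockAsync are hypothetical 
and are not part of Hadoop's RPC code; only java.nio.channels.SelectionKey and the 
reflection calls are real APIs.  The sketch shows only how an RPC layer could 
detect the convention, not how it would write parameters or deliver the response.

```java
import java.lang.reflect.Method;
import java.nio.channels.SelectionKey;

class AsyncRpcSketch {
    /** Hypothetical protocol that mixes a blocking call and an async call. */
    interface BlockProtocol {
        // Ordinary RPC: the caller blocks until the response arrives.
        long getBlockLength(long blockId);

        // Async RPC (assumed convention): the call would return once its
        // parameters are written; the caller listens on the returned key.
        SelectionKey readBlockAsync(long blockId);
    }

    /** A method is treated as async if it is declared to return a SelectionKey. */
    static boolean isAsync(Method method) {
        return SelectionKey.class.isAssignableFrom(method.getReturnType());
    }

    /** Convenience lookup by name, so callers need not handle checked exceptions. */
    static boolean isAsync(Class<?> protocol, String name, Class<?>... params) {
        try {
            return isAsync(protocol.getMethod(name, params));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("no such protocol method: " + name, e);
        }
    }
}
```

An RPC client proxy could consult isAsync when a method is invoked and choose 
between the existing blocking path and a path that returns immediately.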

None of the RPC systems we might switch to today supports such features yet, to 
my knowledge.  So, we could:
 - wait until they do and we've switched to one of them;
 - add these features to our RPC system today, start using them, and then port 
what we've done to a different RPC system if and when we switch; or
 - add these features ourselves to a different RPC system now, then port HDFS 
to use that system.

The second of these appeals to me, since I think we'll both learn more about 
what features we need and solve our immediate needs sooner.  Much of the work 
would probably not be wasted: I suspect that, whatever RPC system we use 
long-term, we will need to implement some low-level transport features, since 
our requirements are more extreme and specific than most folks'.


> Asynchronous IO Handling in Hadoop and HDFS
> -------------------------------------------
>
>                 Key: HADOOP-3856
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3856
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs, io
>            Reporter: Raghu Angadi
>            Assignee: Raghu Angadi
>         Attachments: GrizzlyEchoServer.patch, MinaEchoServer.patch
>
>
> I think Hadoop needs utilities or a framework to make it simpler to deal with 
> generic asynchronous IO in Hadoop.
> Example use case:
> It's been a long-standing problem that DataNode takes too many threads for 
> data transfers. Each write operation takes up 2 threads at each of the 
> datanodes and each read operation takes one irrespective of how much activity 
> is on the sockets. The kinds of load that HDFS serves have been expanding 
> quite fast and HDFS should handle these varied loads better. If there is a 
> framework for non-blocking IO, read and write pipeline state machines could 
> be implemented with async events on a fixed number of threads. 
> A generic utility is better since it could be used in other places like 
> DFSClient. DFSClient currently creates 2 extra threads for each file it has 
> open for writing.
> Initially I started writing a primitive "selector", then tried to see if such 
> a facility already exists. [Apache MINA|http://mina.apache.org] seemed to do 
> exactly this. My impression after looking at the interface and examples is 
> that it does not give the kind of control we might prefer or need. The first 
> use case I was thinking of implementing using MINA was to replace "response 
> handlers" in DataNode. The response handlers are simpler since they don't 
> involve disk I/O. I [asked on the MINA user 
> list|http://www.nabble.com/Async-events-with-existing-NIO-sockets.-td18640767.html], 
> but it looks like it cannot be done, I think mainly because the sockets are 
> already created.
> Essentially what I have in mind is similar to MINA, except that read and 
> write of the sockets is done by the event handlers. The lowest layer 
> essentially invokes selectors and then invokes event handlers on a single 
> thread or on multiple threads. Each event handler is expected to do some 
> non-blocking work. We would of course have utility handler implementations 
> that do read, write, accept, etc., and are useful for simple processing.
> Sam Pullara mentioned that [xSockets|http://xsocket.sourceforge.net/] is more 
> flexible. It is under GPL.
> Are there other such implementations we should look at?
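
The lowest layer described in the quoted issue (a selector that dispatches to 
event handlers, which do the socket reads and writes themselves) might look 
roughly like the sketch below.  EventLoopSketch, EventHandler, register, runOnce 
and demo are hypothetical names, not existing Hadoop or MINA classes.  It runs 
on a single thread and uses a java.nio.channels.Pipe in place of a real socket 
so that it stays self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class EventLoopSketch {
    /** Hypothetical handler: does some non-blocking work when its key is ready. */
    interface EventHandler {
        void handle(SelectionKey key) throws IOException;
    }

    private final Selector selector;

    EventLoopSketch() throws IOException {
        selector = Selector.open();
    }

    /** Registers an already-created channel; the handler rides along as the attachment. */
    SelectionKey register(SelectableChannel channel, int ops, EventHandler handler)
            throws IOException {
        channel.configureBlocking(false);
        return channel.register(selector, ops, handler);
    }

    /** One pass of the loop: select ready keys, then let each handler do the IO. */
    int runOnce(long timeoutMillis) throws IOException {
        selector.select(timeoutMillis);
        int dispatched = 0;
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove();
            if (key.isValid()) {
                ((EventHandler) key.attachment()).handle(key);
                dispatched++;
            }
        }
        return dispatched;
    }

    /** Writes a few bytes into a pipe and returns what the read handler received. */
    static String demo() {
        try {
            EventLoopSketch loop = new EventLoopSketch();
            Pipe pipe = Pipe.open();
            StringBuilder received = new StringBuilder();
            loop.register(pipe.source(), SelectionKey.OP_READ, key -> {
                // The handler, not the framework, performs the read.
                ByteBuffer buf = ByteBuffer.allocate(64);
                int n = ((Pipe.SourceChannel) key.channel()).read(buf);
                if (n > 0) {
                    buf.flip();
                    received.append(StandardCharsets.UTF_8.decode(buf));
                }
            });
            pipe.sink().write(ByteBuffer.wrap("ping".getBytes(StandardCharsets.UTF_8)));
            loop.runOnce(1000);
            pipe.sink().close();
            pipe.source().close();
            loop.selector.close();
            return received.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because register accepts any existing SelectableChannel, this shape would not 
have the "sockets are already created" limitation mentioned above; dispatching 
handlers onto multiple threads is left out of the sketch.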

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
