DFSClient  writes : DataStreamer thread can be removed
------------------------------------------------------

                 Key: HADOOP-3325
                 URL: https://issues.apache.org/jira/browse/HADOOP-3325
             Project: Hadoop Core
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.16.0
            Reporter: Raghu Angadi



When a client writes data to DFS, DFSClient keeps two threads for each open
file:
- DataStreamer thread: writes the data to the DataNodes (as 64k packets)
- ResponseProcessor: receives acks from the DataNodes and detects related
errors.

I think the job of the DataStreamer can be done inside the user's write()
(i.e. in the user thread), so in the normal case there will be one less
thread. When there is an error in the write pipeline, all the un-acked
packets need to be resent. In that case, ResponseProcessor can always create
a temporary thread to send these packets.
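
A minimal sketch of this idea, not the actual DFSClient code. Every name here
(InlineStreamerSketch, Pipeline, ackReceived, pipelineFailed) is hypothetical,
and the Pipeline interface stands in for the socket to the first DataNode.
Unlike a real client, it sends a short final packet instead of buffering it.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch: none of these names come from the real DFSClient.
class InlineStreamerSketch {
    static final int PACKET_SIZE = 64 * 1024; // 64k packets, as described above

    /** Stand-in (assumption) for the connection to the DataNode pipeline. */
    interface Pipeline {
        void send(long seqno, byte[] data);
    }

    private static final class Packet {
        final long seqno;
        final byte[] data;
        Packet(long seqno, byte[] data) { this.seqno = seqno; this.data = data; }
    }

    private final Pipeline pipeline;
    private final Deque<Packet> unacked = new ArrayDeque<>(); // sent, not yet acked
    private long nextSeqno = 0;

    InlineStreamerSketch(Pipeline pipeline) { this.pipeline = pipeline; }

    /** Normal case: packets are sent in the caller's thread, no DataStreamer. */
    synchronized void write(byte[] buf) {
        for (int off = 0; off < buf.length; off += PACKET_SIZE) {
            int end = Math.min(buf.length, off + PACKET_SIZE);
            Packet p = new Packet(nextSeqno++, Arrays.copyOfRange(buf, off, end));
            unacked.addLast(p);              // remember it until the ack arrives
            pipeline.send(p.seqno, p.data);  // sent inline, in the user thread
        }
    }

    /** Called by the ResponseProcessor-like thread for each ack. */
    synchronized void ackReceived(long seqno) {
        while (!unacked.isEmpty() && unacked.peekFirst().seqno <= seqno) {
            unacked.removeFirst();
        }
    }

    /** Error case: a temporary thread resends every un-acked packet. */
    Thread pipelineFailed() {
        Thread resender = new Thread(() -> {
            synchronized (this) { // blocks concurrent write() calls while resending
                for (Packet p : unacked) {
                    pipeline.send(p.seqno, p.data);
                }
            }
        }, "resend-unacked");
        resender.start();
        return resender;
    }

    synchronized int unackedCount() { return unacked.size(); }

    public static void main(String[] args) throws InterruptedException {
        InlineStreamerSketch out = new InlineStreamerSketch((seqno, data) ->
            System.out.println("sent packet " + seqno + " (" + data.length + " bytes)"));
        out.write(new byte[PACKET_SIZE + 10]); // two packets, sent by this thread
        out.ackReceived(0);                    // first packet acked
        out.pipelineFailed().join();           // only packet 1 is resent
    }
}
```

One consequence of this design, at least in the sketch: write() now blocks for
as long as the pipeline takes to accept a packet, and a resend in progress
holds off further writes until it finishes.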
 
In the future, the acks for multiple pipelines can be handled by a common 
thread (at least in the default case where sockets are non-blocking). 
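
A rough sketch of what a common ack thread could look like, assuming
java.nio's Selector is used to multiplex the non-blocking channels. The class
and method names are hypothetical, an ack is modeled as a bare 8-byte sequence
number, and java.nio.channels.Pipe stands in for the DataNode sockets so the
example is self-contained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch: one thread reads acks for several write pipelines.
class SharedAckReaderSketch {
    private static final class State {
        final String name;                             // which pipeline this is
        final ByteBuffer buf = ByteBuffer.allocate(8); // one ack = one long (assumption)
        State(String name) { this.name = name; }
    }

    /** Runs in the calling thread until expectedAcks acks have been read. */
    static Map<String, Long> readAcks(Map<String, Pipe.SourceChannel> pipelines,
                                      int expectedAcks) throws IOException {
        Map<String, Long> lastAck = new HashMap<>();
        try (Selector selector = Selector.open()) {
            for (Map.Entry<String, Pipe.SourceChannel> e : pipelines.entrySet()) {
                e.getValue().configureBlocking(false);
                e.getValue().register(selector, SelectionKey.OP_READ, new State(e.getKey()));
            }
            int seen = 0;
            int open = pipelines.size();
            while (seen < expectedAcks && open > 0) {
                selector.select(); // wait until some pipeline has ack bytes
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    Pipe.SourceChannel ch = (Pipe.SourceChannel) key.channel();
                    State st = (State) key.attachment();
                    int n;
                    while ((n = ch.read(st.buf)) > 0) {
                        if (!st.buf.hasRemaining()) { // a full 8-byte ack arrived
                            st.buf.flip();
                            lastAck.put(st.name, st.buf.getLong());
                            st.buf.clear();
                            seen++;
                        }
                    }
                    if (n < 0) { // peer closed this pipeline
                        key.cancel();
                        open--;
                    }
                }
            }
        }
        return lastAck;
    }

    private static void writeAck(Pipe p, long seqno) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(8);
        buf.putLong(seqno);
        buf.flip();
        while (buf.hasRemaining()) {
            p.sink().write(buf);
        }
    }

    /** Two simulated pipelines, acks read by a single select loop. */
    static Map<String, Long> demo() {
        try {
            Pipe a = Pipe.open();
            Pipe b = Pipe.open();
            try {
                writeAck(a, 1L);
                writeAck(a, 2L);
                writeAck(b, 7L);
                Map<String, Pipe.SourceChannel> pipelines = new HashMap<>();
                pipelines.put("file-a", a.source());
                pipelines.put("file-b", b.source());
                return readAcks(pipelines, 3);
            } finally {
                a.sink().close(); a.source().close();
                b.sink().close(); b.source().close();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Here each open file costs only a registered channel and a small buffer rather
than its own ResponseProcessor thread; error detection and recovery are left
out of the sketch.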

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
