[
https://issues.apache.org/jira/browse/HADOOP-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12468720
]
dhruba borthakur commented on HADOOP-922:
-----------------------------------------
I agree with your comments. The amount of data buffered on the receiving side of
the TCP connection may depend on the transfer latency and on the amount of
memory available to the sender and the receiver.
By default, the TCP send window size is usually 128KB and the receive window
size is 4MB. I propose changing the above patch to trigger the optimization
only if the skip length is <= 128KB.
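
A minimal sketch of the proposed check follows. It is not the code in
smallreadseek3.patch: the class, method, and constant names are hypothetical,
and a plain java.io.InputStream stands in for the datanode socket stream. It
only illustrates the idea that a forward seek of at most 128KB within the
current block could be served by discarding bytes from the open stream rather
than re-opening the connection.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; these names do not come from the actual patch.
class SmallSeekSketch {
    // Assumed threshold, taken from the 128KB figure proposed in the comment.
    static final long MAX_SKIP_BYTES = 128 * 1024;

    /**
     * Returns true when a seek from currentPos to targetPos is a forward move
     * of at most MAX_SKIP_BYTES that stays inside the current block, so the
     * already-open stream could be reused.
     */
    static boolean canSkipInsteadOfReopen(long currentPos, long targetPos,
                                          long blockEndExclusive) {
        long distance = targetPos - currentPos;
        return distance >= 0
            && distance <= MAX_SKIP_BYTES
            && targetPos < blockEndExclusive;
    }

    /**
     * Discards exactly n bytes from the stream. Returns false if the stream
     * ends first, in which case the caller would fall back to re-opening.
     */
    static boolean skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                // skip() may legitimately return 0; read one byte to detect EOF.
                if (in.read() < 0) {
                    return false;
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
        return true;
    }
}
```

A caller would first test canSkipInsteadOfReopen() and, only if it returns
true, call skipFully() with the seek distance; any other seek would take the
existing path of closing the connection and re-opening it at the new offset.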
> Optimize small reads and seeks
> ------------------------------
>
> Key: HADOOP-922
> URL: https://issues.apache.org/jira/browse/HADOOP-922
> Project: Hadoop
> Issue Type: Bug
> Components: dfs
> Affects Versions: 0.10.1
> Reporter: dhruba borthakur
> Assigned To: dhruba borthakur
> Attachments: smallreadseek3.patch
>
>
> A seek on a DFSInputStream causes the next read to re-open the socket
> connection to the datanode and fetch the remainder of the block all over
> again. This is not optimal.
> A small read followed by a small positive seek could reuse the data
> already fetched from the datanode as part of the previous read.