[ 
https://issues.apache.org/jira/browse/HDFS-814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12794446#action_12794446
 ] 

dhruba borthakur commented on HDFS-814:
---------------------------------------

> but it has the following issue: Applications that don't care about very 
> accurate file lengths will pay the cost for files

This happens only if the file is being written at the moment somebody else 
calls getFileStatus on it. That should never be the case for the most typical 
application that runs on HDFS: a map-reduce job.

> Cost of ls -r of a dir (say MR output dir) can go up when some of the files 
> in the subtree are open for writing.

I suspect that this is not a typical use case. The MR-job output directory will 
typically be empty until the job is committed, at which point all files are 
renamed from the tmp directory into the output directory.
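
As a rough illustration of the commit-by-rename pattern described here, the 
sketch below uses the local filesystem through java.nio.file as a stand-in for 
HDFS. The class name CommitByRename, the method runJob, and the directory names 
_tmp and out are hypothetical and are not taken from Hadoop. It shows only that 
a listing of the output directory sees no files, and therefore no files open 
for writing, until the renames happen.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Toy model only: local directories stand in for HDFS directories.
public class CommitByRename {

    // Counts the direct children of a directory.
    static long countEntries(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries.count();
        }
    }

    // Writes `tasks` part files into a temporary directory, then "commits"
    // by renaming each file into the output directory.
    // Returns {entries in out before commit, entries in out after commit}.
    public static long[] runJob(int tasks) {
        try {
            Path root = Files.createTempDirectory("toy-job");
            Path tmp = Files.createDirectory(root.resolve("_tmp"));
            Path out = Files.createDirectory(root.resolve("out"));

            // All writing happens under the temporary directory.
            for (int i = 0; i < tasks; i++) {
                byte[] data = ("record " + i + "\n").getBytes(StandardCharsets.UTF_8);
                Files.write(tmp.resolve("part-" + i), data);
            }
            long beforeCommit = countEntries(out);

            // Commit: move the finished files into the output directory.
            for (int i = 0; i < tasks; i++) {
                Files.move(tmp.resolve("part-" + i), out.resolve("part-" + i));
            }
            long afterCommit = countEntries(out);

            return new long[] {beforeCommit, afterCommit};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] counts = runJob(3);
        System.out.println("before commit: " + counts[0]
                + ", after commit: " + counts[1]);
    }
}
```

With three tasks, the output directory holds zero entries before the commit and 
three afterwards, so a recursive listing of it never meets a file that is still 
being written.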

I am fine with this patch because it does not introduce a new 
FileSystem/FileContext API.



> Add an api to get the visible length of a DFSDataInputStream.
> -------------------------------------------------------------
>
>                 Key: HDFS-814
>                 URL: https://issues.apache.org/jira/browse/HDFS-814
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs client
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Tsz Wo (Nicholas), SZE
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: h814_20091221.patch, h814_20091221_0.21.patch
>
>
> Hflush guarantees that the bytes written before are visible to the new 
> readers.  However, there is no way to get the length of the visible bytes.  
> The visible length is useful in some applications like SequenceFile.
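
The distinction drawn in the issue description between bytes written and bytes 
visible can be sketched with a small in-memory model. This is a simplification 
and does not use any HDFS classes: the class VisibleLengthModel and all of its 
methods are hypothetical names chosen for illustration, and the model assumes 
that unflushed bytes are never visible, whereas the description only 
guarantees that bytes written before an hflush are visible to new readers.

```java
// Toy model only: tracks how many bytes a writer has produced and how many
// of them a new reader would be guaranteed to see after an hflush.
public class VisibleLengthModel {
    private long writtenLength = 0;  // bytes handed to the writer so far
    private long visibleLength = 0;  // bytes made visible by the last hflush

    // Appends data on the writer side; nothing becomes visible yet.
    public void write(byte[] data) {
        writtenLength += data.length;
    }

    // Makes every byte written so far visible to new readers.
    public void hflush() {
        visibleLength = writtenLength;
    }

    // What a new reader could rely on in this model.
    public long getVisibleLength() {
        return visibleLength;
    }

    public long getWrittenLength() {
        return writtenLength;
    }

    public static void main(String[] args) {
        VisibleLengthModel file = new VisibleLengthModel();
        file.write(new byte[10]);
        file.hflush();
        file.write(new byte[5]);   // written but not yet flushed
        System.out.println("written=" + file.getWrittenLength()
                + " visible=" + file.getVisibleLength());
    }
}
```

In this model a reader such as one scanning a SequenceFile would stop at the 
visible length (10) rather than the written length (15).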

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
