[jira] Commented: (HADOOP-2657) Enhancements to DFSClient to support flushing data at any point in time

Doug Cutting (JIRA) Thu, 28 Feb 2008 13:00:49 -0800

    [ https://issues.apache.org/jira/browse/HADOOP-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12573441#action_12573441 ]


Doug Cutting commented on HADOOP-2657:
--------------------------------------

This is a bug in ChecksumFileSystem: if you call flush, it should flush the 
checksum stream too.  But we perhaps don't have to fix that bug in this issue.

Changing flush() to throw an exception on all but HDFS (as proposed above) 
would not be good.  This issue should improve flush() for HDFS, and not break 
it for all other filesystems.

Filing a separate issue to improve flush for ChecksumFileSystem would be good.  
There are two ways to do it.  One is what I suggested above: have 
FSOutputStream implement Seekable, but only implement seek() in the local 
filesystem.  The other is to leave the FSOutputStream API alone and have 
ChecksumFileSystem throw an exception, when more output is written after a 
flush, if the underlying FSOutputStream implementation doesn't implement 
Seekable.  In either case, RawLocalFileSystem would implement Seekable for its 
FSOutputStream implementation, and ChecksumFileSystem could use this to rewind 
checksum output when data is appended after a flush().
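
To make the second option concrete, here is a rough sketch in plain Java.  It 
does not use the real Hadoop classes: SeekableOutput, SeekableBuffer and 
ChecksummedOutput are hypothetical names invented for illustration, and the 
4-byte CRC32 per chunk is an assumption, not the actual checksum file format.  
The idea is only that flush() writes a provisional checksum for a partial 
chunk, and a later write either rewinds the checksum stream or fails.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical stand-in for the Seekable capability discussed above.
interface SeekableOutput {
    void seek(long pos) throws IOException;
}

// Hypothetical in-memory seekable stream, playing the role of a local-file stream.
class SeekableBuffer extends OutputStream implements SeekableOutput {
    private byte[] buf = new byte[64];
    private int pos = 0;
    private int len = 0;

    @Override
    public void write(int b) {
        if (pos == buf.length) buf = Arrays.copyOf(buf, buf.length * 2);
        buf[pos++] = (byte) b;
        len = Math.max(len, pos);
    }

    @Override
    public void seek(long p) throws IOException {
        if (p < 0 || p > len) throw new IOException("seek out of range: " + p);
        pos = (int) p;
    }

    byte[] toByteArray() { return Arrays.copyOf(buf, len); }
}

// Writes data to one stream and one CRC32 per bytesPerSum bytes to another.
class ChecksummedOutput extends OutputStream {
    private final OutputStream data;
    private final OutputStream sums;
    private final int bytesPerSum;
    private final CRC32 crc = new CRC32();
    private int inChunk = 0;                // bytes in the current partial chunk
    private long sumPos = 0;                // where the current chunk's checksum belongs
    private boolean partialSumWritten = false;

    ChecksummedOutput(OutputStream data, OutputStream sums, int bytesPerSum) {
        this.data = data;
        this.sums = sums;
        this.bytesPerSum = bytesPerSum;
    }

    @Override
    public void write(int b) throws IOException {
        if (partialSumWritten) {
            // Data appended after a mid-chunk flush: the provisional checksum
            // must be overwritten, which requires a seekable checksum stream.
            if (!(sums instanceof SeekableOutput)) {
                throw new IOException("cannot write after flush: checksum stream is not seekable");
            }
            ((SeekableOutput) sums).seek(sumPos);
            partialSumWritten = false;
        }
        data.write(b);
        crc.update(b);
        if (++inChunk == bytesPerSum) {
            writeSum();
            sumPos += 4;
            crc.reset();
            inChunk = 0;
        }
    }

    @Override
    public void flush() throws IOException {
        if (inChunk > 0 && !partialSumWritten) {
            writeSum();                     // provisional checksum for the partial chunk
            partialSumWritten = true;
        }
        data.flush();
        sums.flush();                       // flush the checksum stream too
    }

    private void writeSum() throws IOException {
        int v = (int) crc.getValue();
        sums.write(v >>> 24);
        sums.write(v >>> 16);
        sums.write(v >>> 8);
        sums.write(v);
    }
}
```

With a SeekableBuffer as the checksum stream, writing two bytes, flushing, and 
writing two more leaves a single checksum covering all four bytes.  With a 
non-seekable checksum stream, the write after the flush throws IOException 
instead of silently leaving a stale checksum behind.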

> Enhancements to DFSClient to support flushing data at any point in time
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-2657
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2657
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: flush.patch, flush2.patch, flush3.patch
>
>
> The HDFS Append Design (HADOOP-1700) requires that there be a public API to 
> flush data written to an HDFS file that can be invoked by an application. This 
> API (popularly referred to as fflush(OutputStream)) will ensure that data 
> written to the DFSOutputStream is flushed to datanodes and any required 
> metadata is persisted on Namenode.
> This API has to handle the case when the client decides to flush after 
> writing data that is not an exact multiple of io.bytes.per.checksum.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
