[ 
https://issues.apache.org/jira/browse/HDFS-11435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15878952#comment-15878952
 ] 

Manoj Govindassamy commented on HDFS-11435:
-------------------------------------------

Thanks [~linyiqun] for the reference to HDFS-11194. Will take a look.

[~raviprak], 
I get your points. Under normal circumstances, we will not need _near_ 
realtime lengths of OPEN_FOR_WRITE files. If at all needed, as Jing pointed 
out, there are already provisions for clients to reach out to the DataNodes 
directly to find the latest length of a file being written. The intention 
here is not to check the progress of a slow writer. I believe the current 
LeaseManager soft/hard lease limits are good enough for tackling very slow 
writer issues.
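To illustrate the existing provision mentioned above, a minimal client-side 
sketch (not part of any patch here; it assumes a running HDFS cluster and an 
open-for-write file at the given path):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataInputStream;

public class VisibleLength {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (HdfsDataInputStream in =
             (HdfsDataInputStream) fs.open(new Path(args[0]))) {
      // getVisibleLength() reflects bytes acknowledged by the last
      // block's DataNode pipeline, so for an under-construction file
      // it can be ahead of what the NameNode reports via
      // getFileStatus().getLen().
      System.out.println("Visible length: " + in.getVisibleLength());
    }
  }
}
```

This is why clients that truly need fresh lengths already have a path that 
bypasses the NameNode's stale view.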

The intention of this jira is to close a gap in HDFS Snapshots w.r.t open 
files. Like many other metadata-only operations, HDFS Snapshots are NameNode 
only operations, and thereby the file lengths captured are only as good as 
what is available in the NN at Snapshot time. So, for files that are open and 
being written, the NN lags the latest file length by as much as a block size, 
and thereby the open files captured in Snapshots have incorrect lengths. The 
current behavior of HDFS Snapshots is to let these open files in the Snapshots 
also grow/shrink just like the original file, and finalize them only after the 
open file is closed. Thus HDFS Snapshots are not truly _read-only_ w.r.t open 
files. HDFS-11402 attempts to close this gap and make HDFS Snapshots truly 
read-only by freezing these open files in Snapshots via a metadata copy. To 
make the design proposed in the above jira more reliable, we need the NN to 
learn the lengths of open files more frequently than in the current model. 
More discussions on this are available in HDFS-11402. Please let me know if 
you need more details.
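For reference, the one mechanism a client has today to force the NN's length 
up to date is hsync with UPDATE_LENGTH. A hedged sketch (assumes a running 
HDFS cluster; the path argument is illustrative):

```java
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

public class SyncLength {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    try (HdfsDataOutputStream out =
             (HdfsDataOutputStream) fs.create(new Path(args[0]))) {
      out.write("some bytes".getBytes("UTF-8"));
      // A plain hsync() persists the data on the DataNodes but does
      // not tell the NameNode the new length; UPDATE_LENGTH makes the
      // DataNode report the length, so the NN's getFileStatus() view
      // catches up to the bytes written so far.
      out.hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
    }
  }
}
```

The snapshot gap exists precisely because this is opt-in and per-client; the 
NN cannot rely on writers calling it.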




> NameNode should track open for write files lengths more frequent than on 
> newer block allocations
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11435
>                 URL: https://issues.apache.org/jira/browse/HDFS-11435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>
> *Problem:*
> Currently the length of an open-for-write / under construction file is 
> updated on the NameNode only when:
> # Block boundary: On block boundaries, upon allocation of a new Block, the 
> NameNode gets to know the file growth and the file length catches up.
> # hsync(SyncFlag.UPDATE_LENGTH): When client apps invoke hsync on the write 
> stream with this special flag, the DataNodes send an incremental block 
> report with the latest file length, which the NameNode uses to update its 
> metadata.
> # First hflush() on a new Block: When client apps do the first hflush() on 
> each new Block, the DataNodes notify the NameNode about the latest file 
> length.
> # Output stream close: Forces the DataNodes to update the NameNode about 
> the file length after data persistence and proper acknowledgements in the 
> pipeline.
> So, lengths of open-for-write files on the NameNode usually lag well behind 
> the length seen by the DN/client. It is highly preferable for the NameNode 
> not to lag file lengths by the order of a Block size for under-construction 
> files, and to have a more frequent, scalable update mechanism for these 
> open file lengths.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
