[ https://issues.apache.org/jira/browse/HDFS-15972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17320069#comment-17320069 ]
Steve Loughran commented on HDFS-15972:
---------------------------------------

Is the data being lost data which had already been persisted via Syncable.hsync()? There are no guarantees about when data written to HDFS is persisted until that call is made.

I suspect that in your test setup the appendToFile client hasn't actually called hflush() or hsync() yet, so all its data is still held in buffers in the client: nothing has been persisted in HDFS.

There's also the issue that the NN metadata lags, even after an hsync()/hflush() call: a file may be longer than the length you get from getFileStatus() until the file is closed, or until the write completes an entire block and the NN is updated.

https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/outputstream.md

That may be a cause of the problem: even if the DNs have the data, if the length of the file in the NN has not yet been updated, only the old length is known, so the newer data will be missed.

> Fedbalance only copies data partially when there's existing opened file
> -----------------------------------------------------------------------
>
>                 Key: HDFS-15972
>                 URL: https://issues.apache.org/jira/browse/HDFS-15972
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Felix N
>            Priority: Major
>
> If there are opened files when fedbalance is run and data is being written to
> these files, fedbalance might skip the newly written data.
> Steps to recreate the issue:
> # Create a dummy file /test/file with some data: {{echo "start" | hdfs dfs
> -appendToFile /test/file}}
> # Start writing to the file: {{hdfs dfs -appendToFile /test/file}} but do
> not stop writing
> # Run fedbalance: {{hadoop fedbalance submit hdfs://ns1/test
> hdfs://ns2/test}}
> # Write something to the file while fedbalance is running, "end" for
> example, then stop writing
> # After fedbalance is done, {{hdfs://ns2/test/file}} should only contain
> "start" while {{hdfs://ns1/user/hadoop/.Trash/Current/test/file}} contains
> "start\nend"
> Fedbalance is run with default configs and arguments so no diff should happen.
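
To make the stale-length explanation in the comment concrete, here is a minimal toy model in plain Java. It is not HDFS code and does not use the Hadoop API: the class OpenFileModel and all of its methods are hypothetical names made up for this illustration. It also assumes a copier that reads only as many bytes as the NameNode-reported length, which is a simplification of the suspected behaviour and not a statement about how fedbalance is actually implemented.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

/** Toy model of one open file. Hypothetical; only mimics the visibility rules described above. */
class OpenFileModel {
    // Bytes written by the client but not yet flushed anywhere.
    private final ByteArrayOutputStream clientBuffer = new ByteArrayOutputStream();
    // Bytes that have reached the (modelled) DataNodes.
    private final ByteArrayOutputStream dataNodeBytes = new ByteArrayOutputStream();
    // File length as recorded by the (modelled) NameNode.
    private long nameNodeLength = 0;

    /** Client write: data only lands in the client-side buffer. */
    void write(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        clientBuffer.write(b, 0, b.length);
    }

    /** Models hflush()/hsync(): data reaches the DataNodes, NameNode length is NOT updated. */
    void hflush() {
        byte[] b = clientBuffer.toByteArray();
        dataNodeBytes.write(b, 0, b.length);
        clientBuffer.reset();
    }

    /** Models close(): flush remaining data, then the NameNode learns the final length. */
    void close() {
        hflush();
        nameNodeLength = dataNodeBytes.size();
    }

    /** Models getFileStatus().getLen(): the NameNode's view of the length. */
    long reportedLength() {
        return nameNodeLength;
    }

    /** Assumed copier behaviour: read only up to the NameNode-reported length. */
    String copyUsingReportedLength() {
        return new String(dataNodeBytes.toByteArray(), 0, (int) nameNodeLength,
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        OpenFileModel file = new OpenFileModel();

        // Step 1 of the report: create the file with "start" and close it.
        file.write("start\n");
        file.close();

        // Steps 2 and 4: append "end" while the file stays open.
        file.write("end\n");
        System.out.println("buffered only : " + file.copyUsingReportedLength().trim());

        // Even after a flush, the reported length still describes the old contents.
        file.hflush();
        System.out.println("after hflush  : length=" + file.reportedLength());

        // Only closing the file makes the new length, and so the new data, visible.
        file.close();
        System.out.println("after close   : length=" + file.reportedLength());
    }
}
```

In this model a copy taken at any point before close() sees only "start", whether or not the writer has flushed, which matches the symptom in the report. Whether that is what actually happens in fedbalance would need to be confirmed against the real code.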