[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479499#comment-13479499 ]

stack commented on HBASE-6758:

That would explain it (I missed that it was a move... it's been at least a week since I reviewed patches... forgive me).

> [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
> ---
>
>          Key: HBASE-6758
>          URL: https://issues.apache.org/jira/browse/HBASE-6758
>      Project: HBase
>   Issue Type: Bug
>     Reporter: Devaraj Das
>     Assignee: Devaraj Das
>     Priority: Critical
>      Fix For: 0.94.3, 0.96.0
>
>  Attachments: 6758-0.94.txt, 6758-1-0.92.patch, 6758-2-0.92.patch, 6758-trunk-1.patch, 6758-trunk-2.patch, 6758-trunk-3.patch, 6758-trunk-4.patch, TEST-org.apache.hadoop.hbase.replication.TestReplication.xml
>
>
> I have seen cases where the replication-executor would lose data to replicate since the file hasn't been closed yet. Upon closing, the new data becomes visible. Before that happens the ZK node shouldn't be deleted in ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made in ReplicationSource.processEndOfFile as well (currentPath related).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
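The rule stated in the issue description (do not delete the ZK node for a log until that log has been closed) can be sketched roughly as follows. This is only an illustration, not the patch attached to the issue: the class `ClosedBeforeCleanupSketch`, its methods, and the in-memory sets standing in for ZK nodes are all hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustration only: in-memory stand-ins for the per-log ZK nodes, not HBase code. */
public class ClosedBeforeCleanupSketch {
    private final Set<String> trackedLogs = new LinkedHashSet<>();
    private final Set<String> closedLogs = new LinkedHashSet<>();

    /** Start tracking a log, as a new ZK node would. */
    public void track(String log) {
        trackedLogs.add(log);
    }

    /** Record that the writer has closed the log, so all of its data is visible. */
    public void markClosed(String log) {
        closedLogs.add(log);
    }

    /**
     * Called when the reader reaches end-of-file on a log.
     * The tracking entry is dropped only if the log is known to be closed;
     * otherwise it is kept so that data appearing on close can still be read.
     *
     * @return true if the entry was removed, false if it was kept
     */
    public boolean finishLog(String log) {
        if (!closedLogs.contains(log)) {
            return false;
        }
        return trackedLogs.remove(log);
    }

    /** Whether the log still has a tracking entry. */
    public boolean isTracked(String log) {
        return trackedLogs.contains(log);
    }

    public static void main(String[] args) {
        ClosedBeforeCleanupSketch sketch = new ClosedBeforeCleanupSketch();
        sketch.track("log.1");
        System.out.println(sketch.finishLog("log.1")); // log is still open
        sketch.markClosed("log.1");
        System.out.println(sketch.finishLog("log.1")); // log has been closed
    }
}
```

In this sketch, reaching end-of-file on an open log is not treated as success; the entry survives until the close has been observed.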
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479491#comment-13479491 ]

Lars Hofhansl commented on HBASE-6758:

What diff are you looking at? The diff in HLog moves a block of code around.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479484#comment-13479484 ]

stack commented on HBASE-6758:

Better include it then when you apply to 0.94 (I can't see a diff... not even white space)
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479479#comment-13479479 ]

Lars Hofhansl commented on HBASE-6758:

@Stack: Possibly, that's what the 0.96 patch does too :)
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479474#comment-13479474 ]

stack commented on HBASE-6758:

Is that a non-change in Index: src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java?

Good by me committing to 0.94.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479471#comment-13479471 ]

Jean-Daniel Cryans commented on HBASE-6758:

+1 if it's tested on a cluster.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479450#comment-13479450 ]

Lars Hofhansl commented on HBASE-6758:

any objections/concerns with committing this to 0.94?
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478925#comment-13478925 ]

Hudson commented on HBASE-6758:

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #226 (See [https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/226/])
HBASE-6758 [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file (Devaraj Das via JD) (Revision 1399517)

Result = FAILURE
jdcryans :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java

> [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
> ---
>
>          Key: HBASE-6758
>          URL: https://issues.apache.org/jira/browse/HBASE-6758
>      Project: HBase
>   Issue Type: Bug
>     Reporter: Devaraj Das
>     Assignee: Devaraj Das
>     Priority: Critical
>      Fix For: 0.96.0
>
>  Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 6758-trunk-1.patch, 6758-trunk-2.patch, 6758-trunk-3.patch, 6758-trunk-4.patch, TEST-org.apache.hadoop.hbase.replication.TestReplication.xml
>
>
> I have seen cases where the replication-executor would lose data to replicate since the file hasn't been closed yet. Upon closing, the new data becomes visible. Before that happens the ZK node shouldn't be deleted in ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made in ReplicationSource.processEndOfFile as well (currentPath related).
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478732#comment-13478732 ]

Hudson commented on HBASE-6758:

Integrated in HBase-TRUNK #3455 (See [https://builds.apache.org/job/HBase-TRUNK/3455/])
HBASE-6758 [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file (Devaraj Das via JD) (Revision 1399517)

Result = FAILURE
jdcryans :
Files :
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* /hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* /hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478714#comment-13478714 ]

Devaraj Das commented on HBASE-6758:

This should mostly be applicable on 0.94 straightaway..
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478695#comment-13478695 ]

Lars Hofhansl commented on HBASE-6758:

0.94? Looks like a good fix to backport.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478677#comment-13478677 ]

Devaraj Das commented on HBASE-6758:

Thanks, [~jdcryans], for the reviews. Party time :-)
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476313#comment-13476313 ]

Devaraj Das commented on HBASE-6758:

IMO the last patch is good to go. Is there anything pending from my end on this issue?
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473701#comment-13473701 ]

Hadoop QA commented on HBASE-6758:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12548635/6758-trunk-4.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 81 warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3030//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3030//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3030//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3030//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3030//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3030//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3030//console

This message is automatically generated.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473670#comment-13473670 ]

Devaraj Das commented on HBASE-6758:

bq. I disagree. Right now we add the log in ZK under postLogRoll() and createWriterInstance will run before that so the file should exist at least.

Ah! and Ooops! I forgot about the fact that I changed the code to have preLogRoll not be ignored in the replication handler. Sorry, all the time I was thinking about the change in the placement of the call to postLogRoll.. So yes, it could happen that the logfile is up in ZK before the file exists but it appears (as we just discussed in the previous comments) that the issue would take care of itself (the RS that picks this file would dump it after some retries)...
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473647#comment-13473647 ]

Jean-Daniel Cryans commented on HBASE-6758:

bq. The lines of code that I moved are to do with postLogRoll which happens after the sequence that you are talking about. This problem exists with/without this patch.

I disagree. Right now we add the log in ZK under postLogRoll() and createWriterInstance will run before that so the file should exist at least.

bq. I think the RS that picks this queue up will dump the file after a couple of retries

Yeah the fact that it's the last file and that the multiplier would go to the max and that it's a recovered queue should take care of that.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473593#comment-13473593 ]

Devaraj Das commented on HBASE-6758:

[~jdcryans], this sequence of events could happen currently too, isn't it? The lines of code that I moved are to do with postLogRoll which happens after the sequence that you are talking about. This problem exists with/without this patch.

bq. You end up with a log tracked in ZK that doesn't exist. This RS's queue will be recovered by another RS that will eventually try to read from that non-existing file. My concern is how we're going to treat that file.

To answer your question, I think the RS that picks this queue up will dump the file after a couple of retries (since the file doesn't exist and will never show up in the recovered logs directory).
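The "dump the file after a couple of retries" behaviour described in this comment might look roughly like the sketch below. It is a simplified illustration under stated assumptions, not the ReplicationSource implementation: `MissingLogRetrySketch`, `openOrDump`, and the `maxMultiplier` parameter are hypothetical names, and the existence check is passed in as a predicate instead of querying HDFS.

```java
import java.util.function.Predicate;

/** Illustration only: a bounded retry loop for a log that may never appear. */
public class MissingLogRetrySketch {

    /**
     * Tries to find the log up to maxMultiplier times.
     * The existence check is a hypothetical stand-in for a filesystem lookup.
     *
     * @return true if the log became readable, false if it should be dumped
     */
    public static boolean openOrDump(String log, Predicate<String> exists, int maxMultiplier) {
        for (int multiplier = 1; multiplier <= maxMultiplier; multiplier++) {
            if (exists.test(log)) {
                return true;
            }
            // A real reader would sleep for (base sleep * multiplier) here.
        }
        // The multiplier reached its maximum and the file never showed up.
        return false;
    }

    public static void main(String[] args) {
        boolean opened = openOrDump("log.2", name -> false, 3);
        System.out.println(opened ? "opened" : "dumped");
    }
}
```

The point of the bound is that a queue entry for a file that was never created cannot block the recovered queue forever.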
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473580#comment-13473580 ]

Jean-Daniel Cryans commented on HBASE-6758:

bq. please let me know if I missed something or misunderstood your concern

Consider this scenario. First this runs:

bq. Path newPath = computeFilename();

Then with your patch we add this file in ZK during:

bq. i.preLogRoll(oldPath, newPath);

Now let's say HDFS becomes unavailable or the RS fails and never gets to this line:

bq. HLog.Writer nextWriter = this.createWriterInstance(fs, newPath, conf);

You end up with a log tracked in ZK that doesn't exist. This RS's queue will be recovered by another RS that will eventually try to read from that non-existing file. My concern is how we're going to treat that file.
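The failure window in this scenario can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class, field, and method names below are hypothetical, the two sets stand in for the ZK replication queue and for HDFS, and none of it is HBase code. It only shows how registering the log before creating its writer can leave a tracked log with no file behind it.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical in-memory model of the roll sequence discussed above; not HBase code.
class OrphanedLogSketch {
    // Stands in for the replication queue kept in ZK.
    final Set<String> trackedInZk = new LinkedHashSet<>();
    // Stands in for the log files that actually exist on HDFS.
    final Set<String> filesOnHdfs = new LinkedHashSet<>();

    // Same order as in the comment: register the new log first, then create its writer.
    void rollWriter(String newPath, boolean hdfsAvailable) {
        trackedInZk.add(newPath); // the preLogRoll step: the log is now tracked
        if (!hdfsAvailable) {
            // The writer is never created, so the file never appears.
            throw new IllegalStateException("could not create writer for " + newPath);
        }
        filesOnHdfs.add(newPath); // the writer was created: the file now exists
    }

    // Logs that a recovering region server would be told to read but could not find.
    Set<String> trackedButMissing() {
        Set<String> missing = new LinkedHashSet<>(trackedInZk);
        missing.removeAll(filesOnHdfs);
        return missing;
    }
}
```

In this model, a roll that fails between the two steps leaves the new path in `trackedButMissing()`, which is the entry whose handling by the recovering RS is being asked about.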
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473566#comment-13473566 ]

Devaraj Das commented on HBASE-6758:

bq. Ah I see, I didn't fully grok the new preRoll/postRoll dance in my first review. That's clever.

Cool. Thanks for taking a pass at this.

bq. Will the recovered queue hang or will it abandon that HLog? FWIW there's another jira regarding that problem but this could be a new failure case.

The change made to the placement of the postLogRoll call in the patch will not affect recovered queues. It will only affect files that the RS in question is creating itself. The changes in ReplicationSource.java will only take effect for non-recovered files (there is a check _!this.queueRecovered_ before setting _currentWALisBeingWrittenTo_ to true). So I think we are covered (please let me know if I missed something or misunderstood your concern).

I'll submit a patch shortly with the nits pointed out by [~te...@apache.org] fixed.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473419#comment-13473419 ]

Jean-Daniel Cryans commented on HBASE-6758:

Ah I see, I didn't fully grok the new preRoll/postRoll dance in my first review. That's clever.

My one last concern before committing would be what happens when we are able to compute a new HLog name and put it up in ZK, but then fail to create the HLog and the RS dies. Will the recovered queue hang or will it abandon that HLog? FWIW there's another jira regarding that problem but this could be a new failure case.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473005#comment-13473005 ]

Devaraj Das commented on HBASE-6758:

Hey [~jdcryans], this patch doesn't change that behavior at all (the new log is put up in ZK before the log is written to, and the code blocks talking to ZK). This patch only changes the postLogRoll placement, and that deterministically ensures the previous log file is really closed before the new log is enqueued for replication. The code changes in the replicator thread (ReplicationSource.java) make sure that an entire iteration of the loop "sees" a closed log file at least once (and hence take care of the problem reported in the jira).
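The idea of letting a whole iteration "see" a closed file before declaring success can be sketched as follows. This is a minimal illustration, not the patch: only the flag name `fileInUse` comes from the review snippets in this thread, while the `Wal` class, the `drain` method, and the position array are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one replicator pass over a log file; not HBase code.
class ClosedFileCheckSketch {
    // Stand-in for a WAL file that can still gain entries until it is closed.
    static class Wal {
        final List<String> entries = new ArrayList<>();
        boolean closed = false;
    }

    // Ships everything readable from position[0] onward.
    // Returns true only when it is safe to move on to the next file.
    static boolean drain(Wal wal, int[] position, List<String> shipped) {
        // Sample the in-use flag before reading, so the whole pass is judged against it.
        boolean fileInUse = !wal.closed;
        while (position[0] < wal.entries.size()) {
            shipped.add(wal.entries.get(position[0]));
            position[0]++;
        }
        // Reaching the end of a file that was open at the start of the pass proves
        // nothing: more entries may still become visible, so the caller must retry.
        return !fileInUse;
    }
}
```

A pass that starts while the file is open always reports "not done", so the file is given up only after a pass that began with it already closed.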
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472982#comment-13472982 ]

Jean-Daniel Cryans commented on HBASE-6758:

The last time I played around with postLogRoll in HBASE-3515, I found that we must ensure that we have that log up in ZK before we start writing to it, because it would be possible for writers to append and at the same time not be able to add the log in ZK because the session timed out. The current code blocks talking to ZK.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469930#comment-13469930 ]

Hadoop QA commented on HBASE-6758:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12547851/6758-trunk-3.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 3 new or modified tests.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 81 warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/3010//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3010//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3010//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3010//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3010//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/3010//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/3010//console

This message is automatically generated.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469850#comment-13469850 ]

Devaraj Das commented on HBASE-6758:

Thanks, [~te...@apache.org], for looking. I will incorporate your comments in the next version of the patch (once I hear back from [~jdcryans] and/or [~stack]).
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469841#comment-13469841 ]

Ted Yu commented on HBASE-6758:

Thanks for your continued effort, Devaraj.

{code}
+  void prelogRoll(Path newLog) throws IOException {
{code}
I think the 'l' of 'log' should be capitalized. Same here:
{code}
+  void postlogRoll(Path newLog) throws IOException {
{code}
nit: since the following line is modified, please add space after if:
{code}
-if(readAllEntriesToReplicateOrNextFile()) {
+if(readAllEntriesToReplicateOrNextFile(fileInUse)) {
{code}
Please add javadoc for the new parameter:
{code}
   /**
    * Do the shipping logic
    */
-  protected void shipEdits() {
+  protected void shipEdits(boolean fileInUse) {
{code}
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469063#comment-13469063 ]

Hadoop QA commented on HBASE-6758:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12547636/6758-trunk-2.patch
against trunk revision .

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop2.0{color}. The patch compiles against the hadoop 2.0 profile.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 83 warning messages.

{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.

{color:red}-1 findbugs{color}. The patch appears to introduce 5 new Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint
org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient
org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
org.apache.hadoop.hbase.regionserver.TestAtomicOperation

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2999//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2999//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2999//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2999//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2999//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2999//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2999//console

This message is automatically generated.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469009#comment-13469009 ]

Devaraj Das commented on HBASE-6758:

In case it is not clear why the enqueueing of the new WAL file is being delayed: the problem described in this jira happens because the new WAL file is enqueued too early (before the last WAL file is closed).
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468986#comment-13468986 ]

Devaraj Das commented on HBASE-6758:

In the trunk case, I think something better can be done (and the interface changes can be avoided). Replication.postLogRoll could enqueue the new path in the ReplicationSource's queue. Replication.preLogRoll would do everything else (creating ZK entries, etc.) except enqueuing the path in the queue.

postLogRoll is currently called before the writer is reset (to _nextWriter_) in FSHLog.rollWriter. I propose that it be called after the writer is reset. That, in my opinion, is a more precise place for calling postLogRoll. Thoughts?
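The proposed split of work between the two hooks might look roughly like the sketch below. It is a simplified model, not FSHLog or Replication: the hook names follow the thread, but the listener interface, the string paths, the step labels, and the in-memory lists are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the proposed hook placement; not HBase code.
class RollOrderSketch {
    interface RollListener {
        void preLogRoll(String oldPath, String newPath);
        void postLogRoll(String oldPath, String newPath);
    }

    static class ReplicationListener implements RollListener {
        final List<String> zkTracked = new ArrayList<>();   // stands in for the ZK entries
        final List<String> sourceQueue = new ArrayList<>(); // stands in for the source's queue

        // Everything except the enqueue happens before the new log is written to.
        public void preLogRoll(String oldPath, String newPath) {
            zkTracked.add(newPath);
        }

        // The enqueue is deferred until the old writer is closed and replaced.
        public void postLogRoll(String oldPath, String newPath) {
            sourceQueue.add(newPath);
        }
    }

    // Returns the order of the steps so the placement of postLogRoll is visible.
    static List<String> rollWriter(RollListener listener, String oldPath, String newPath) {
        List<String> steps = new ArrayList<>();
        listener.preLogRoll(oldPath, newPath);
        steps.add("preLogRoll");
        steps.add("createNextWriter");
        steps.add("closeOldWriterAndResetToNextWriter");
        listener.postLogRoll(oldPath, newPath); // proposed: after the writer is reset
        steps.add("postLogRoll");
        return steps;
    }
}
```

With this ordering, the replicator cannot learn about the new file through its queue until the previous file has been closed, which is the property the proposal is after.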
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468800#comment-13468800 ]

Devaraj Das commented on HBASE-6758:

bq. Can we not pass down RegionServerServices? Can we pass a narrow Interface instead?

I think we can (I can pull out the getWAL() method from the interface RegionServerServices into a new interface and have RegionServerServices extend that). But in that case we will still pass two instances of HRS (as pointed out by JD earlier). Thinking about it, though, that probably makes the downstream methods' abstractions cleaner (when compared with the approach of having them accept a fat interface).
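Pulling getWAL() out into a narrow interface could be sketched like this. Only the names RegionServerServices and getWAL() come from the thread; the `WalProvider` name, the `Wal` class, and the placeholder method are hypothetical, and the real interfaces have different members.

```java
// Hypothetical sketch of the narrow-interface idea; not the real HBase types.
class NarrowInterfaceSketch {
    // Stand-in for the WAL object that the replication code needs to query.
    static class Wal {
        final String name;
        Wal(String name) { this.name = name; }
    }

    // The narrow interface: the only capability replication needs here.
    interface WalProvider {
        Wal getWAL();
    }

    // The fat interface keeps its other methods and simply extends the narrow one.
    interface RegionServerServices extends WalProvider {
        boolean placeholderForOtherServices(); // stands in for the many unrelated methods
    }

    // Downstream code depends on the narrow type only.
    static class ReplicationSource {
        private final WalProvider walProvider;

        ReplicationSource(WalProvider walProvider) {
            this.walProvider = walProvider;
        }

        String currentWalName() {
            return walProvider.getWAL().name;
        }
    }
}
```

Because the downstream class accepts the narrow type, a test can hand it a one-line lambda instead of a full region server.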
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468762#comment-13468762 ]

stack commented on HBASE-6758:

Can we not pass down RegionServerServices? Can we pass a narrow Interface instead?
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468751#comment-13468751 ]

Devaraj Das commented on HBASE-6758:

Thanks, [~jdcryans], for looking at the patch. Actually, upon looking at the RegionServerServices interface closely, I see that it extends the Server interface. So the problem you pointed out could be addressed by making the affected constructors and methods (the ones that I changed to take the new RegionServerServices argument) accept only RegionServerServices instead of Server/Stoppable instances. Will submit a patch soon. Hope that will look better.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468728#comment-13468728 ]

Jean-Daniel Cryans commented on HBASE-6758:

I really don't like that we have to pass down another instance of HRS (through RegionServerServices). The fact that we're now doing this:
{code}
-new Replication(this, this.fs, logdir, oldLogDir): null;
+new Replication(this, this.fs, logdir, oldLogDir, this): null;
{code}
is making me sad. Also it leaks all over the code. It seems to me that there should be another way to handle this just in ReplicationSource.

At the moment I'd be +1 for commit only to trunk, and on commit this logging will need to be cleaned up:
{code}
LOG.info("File " + getCurrentPath() + " in use");
{code}
Is that ok with you [~devaraj]?
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465644#comment-13465644 ]

Hadoop QA commented on HBASE-6758:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12546989/6758-trunk-1.patch
  against trunk revision .

    +1 @author. The patch does not contain any @author tags.

    +1 tests included. The patch appears to include 9 new or modified tests.

    +1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.

    -1 javadoc. The javadoc tool appears to have generated 149 warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed these unit tests:
                  org.apache.hadoop.hbase.client.TestFromClientSide

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2959//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2959//console

This message is automatically generated.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462074#comment-13462074 ]

Devaraj Das commented on HBASE-6758:
------------------------------------

[~jdcryans] could you please have a look at the recent patch. Thanks!
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460664#comment-13460664 ]

Devaraj Das commented on HBASE-6758:
------------------------------------

[~stack] I have already responded to Ted's comment. In summary, the problem is that the log-splitter couldn't complete its work soon enough, and hence the file wasn't moved to .oldlogs soon enough. The replicator used up its maxRetries attempts and gave up. So this is a different issue (and may be solved by increasing the value of maxRetries in the config).
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460648#comment-13460648 ]

stack commented on HBASE-6758:
------------------------------

[~devaraj] What do you think of Ted's comment above, boss?

[~jdcryans] Any comment on this patch?
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457998#comment-13457998 ]

Devaraj Das commented on HBASE-6758:
------------------------------------

[~yuzhih...@gmail.com] Hey, thanks for taking the patch for a spin. Talk about races! Here it seems like the splitter didn't complete within the expected time, and the replication didn't happen for some data. Here are the relevant log snippets (look for "considering dumping", where the file got dropped before the splitter completed). But in this case, the issue can be addressed by increasing the number of retries (which is already configurable). The patch attached here doesn't attempt to solve this problem.

{noformat}
2012-09-17 18:13:03,665 WARN [ReplicationExecutor-0.replicationSource,2-sea-lab-0,41831,1347930742751] regionserver.ReplicationSource(555): 2-sea-lab-0,41831,1347930742751 Got:
java.io.IOException: File from recovered queue is nowhere to be found
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:537)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:304)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://localhost:41196/user/hduser/hbase/.oldlogs/sea-lab-0%2C41831%2C1347930742751.1347930771911
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:517)
    at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:796)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:58)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:166)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:689)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:503)
    ... 1 more
2012-09-17 18:13:03,665 WARN [ReplicationExecutor-0.replicationSource,2-sea-lab-0,41831,1347930742751] regionserver.ReplicationSource(559): Waited too long for this file, considering dumping
2012-09-17 18:13:03,665 INFO [ReplicationExecutor-0.replicationSource,2-sea-lab-0,41831,1347930742751] regionserver.ReplicationSourceManager(365): Done with the recovered queue 2-sea-lab-0,41831,1347930742751
2012-09-17 18:13:04,305 DEBUG [main-EventThread] wal.HLogSplitter(657): Archived processed log hdfs://localhost:41196/user/hduser/hbase/.logs/sea-lab-0,41831,1347930742751-splitting/sea-lab-0%2C41831%2C1347930742751.1347930771911 to hdfs://localhost:41196/user/hduser/hbase/.oldlogs/sea-lab-0%2C41831%2C1347930742751.1347930771911
2012-09-17 18:13:04,306 INFO [main-EventThread] master.SplitLogManager(392): Done splitting /1/splitlog/hdfs%3A%2F%2Flocalhost%3A41196%2Fuser%2Fhduser%2Fhbase%2F.logs%2Fsea-lab-0%2C41831%2C1347930742751-splitting%2Fsea-lab-0%252C41831%252C1347930742751.1347930771911
{noformat}
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457526#comment-13457526 ]

Ted Yu commented on HBASE-6758:
-------------------------------

@Devaraj:
I tried your patch v2 and I still got:

{code}
queueFailover(org.apache.hadoop.hbase.replication.TestReplication) Time elapsed: 86.817 sec <<< FAILURE!
java.lang.AssertionError: Waited too much time for queueFailover replication. Waited 41973ms.
    at org.junit.Assert.fail(Assert.java:93)
    at org.apache.hadoop.hbase.replication.TestReplication.queueFailover(TestReplication.java:666)
{code}

I will attach some test output momentarily.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457502#comment-13457502 ]

Devaraj Das commented on HBASE-6758:
------------------------------------

bq. Otherwise, I love the fact that you are figuring bugs and fixes in replication just using the test. Painful I'd imagine. Great work.

Thanks, Stack. Yes, I have burnt some midnight oil on these issues. Fun though.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457437#comment-13457437 ]

Devaraj Das commented on HBASE-6758:
------------------------------------

bq. I see, all that double-negation (eg !fileNotInUse) confused me

Sorry about that. I'll see if I can change it to single negation :-)

bq. So in layman's terms, your patch short circuits all the checks to change the current path if we know for sure that the file we are replicating from is being written to. The side effect is that we won't quit the current file unless it has aged right?

Yes ..

bq. FWIW that might not be totally true, at least in 0.94 HLog.postLogRoll is called before HLog.cleanupCurrentWriter which does issue a sync().

I don't get this, JD. Could you please clarify a bit more? Given the fact that the currentPath would be updated only after the call to cleanupCurrentWriter, I don't see a difference in the behavior between 0.92 and 0.94... (maybe I am missing something though).
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457342#comment-13457342 ]

Jean-Daniel Cryans commented on HBASE-6758:
-------------------------------------------

I see, all that double-negation (e.g. !fileNotInUse) confused me :)

So in layman's terms, your patch short-circuits all the checks to change the current path if we know for sure that the file we are replicating from is being written to. The side effect is that we won't quit the current file unless it has aged, right?

bq. The replication executor is always trailing, and so when the HLog guy says that a path is not in use (being written to), it seems to me a fact that it indeed is not being written to and any writes that ever happened was in the past.

FWIW that might not be totally true: at least in 0.94, HLog.postLogRoll is called before HLog.cleanupCurrentWriter, which does issue a sync().
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457296#comment-13457296 ]

Devaraj Das commented on HBASE-6758:
------------------------------------

[~jdcryans] Thanks for looking. Responses below.

bq. My understanding of this patch is that it reduces the race condition but it still leaves a small window eg you can take the "fileNotInUse" snapshot, get "false", and the moment after that the log could roll. If this is correct, I'm not sure it's worth the added complexity.

I don't think there is ever that window. The replication executor thread picks up a path that the LogRoller puts in the replicator's queue BEFORE the log roll happens (and the HLog constructor puts the first path before the replication executor starts). The replication executor is always trailing, and so when the HLog guy says that a path is not in use (being written to), it seems to me a fact that it indeed is not being written to and any writes that ever happened were in the past. Also note that the currentPath is reset AFTER a log roll, which is kind of delayed.

bq. It seems to me this is a case where we'd need to lock HLog.cacheFlushLock for the time we read the log to be 100% sure log rolling doesn't happen. This has multiple side effects like delaying flushes and log rolls for a few ms while replication is reading the log. It would also require having a way to get to the WAL from ReplicationSource.

Yeah, I tried my best to avoid taking that crucial lock!

bq. Anyways, one solution I can think of that doesn't involve leaking HRS into replication would be giving the log a "second chance". Basically if you get an EOF, flip the secondChance bit. If it's on then you don't get rid of that log yet. Reset the bit when you loop back to read, now if there was new data added you should get it else go to the next log.

I considered some variant of this. However, I gave it up and took a more conservative approach: make sure that the replication-executor thread gets at least one pass at a CLOSED file. All other solutions seemed incomplete to me and prone to races...

[~stack] forgot to answer one of your previous questions.

bq. Should currentFilePath be an atomic reference so all threads see the changes when they happen?

I think volatile suffices for the use case here.
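The conservative approach described in this comment (give the replication executor at least one pass over a file that was already closed when the pass began) could be sketched roughly as follows. This is an illustration only and not the code from the attached patches: `ClosedFilePass`, `readOnce`, and the use of a plain `String` for the path are hypothetical names chosen for the example; only `isFileInUse` comes from the discussion.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Predicate;

/** Hypothetical helper; not part of HBase. */
final class ClosedFilePass {
    private ClosedFilePass() {}

    /**
     * Performs one read pass and reports whether the file may be declared done.
     *
     * @param isFileInUse     assumed check: true while the writer still has the file open
     * @param path            the log file being replicated
     * @param readToEndOfFile reads all visible entries; returns true if end of file was reached
     */
    static boolean readOnce(Predicate<String> isFileInUse,
                            String path,
                            BooleanSupplier readToEndOfFile) {
        // Take the snapshot BEFORE reading. If the file was already closed here,
        // everything ever written to it is visible to the read that follows.
        boolean closedBeforeRead = !isFileInUse.test(path);
        boolean reachedEndOfFile = readToEndOfFile.getAsBoolean();
        // An EOF on a file that was still open proves nothing; keep the file.
        return closedBeforeRead && reachedEndOfFile;
    }
}
```

The ordering is the point of the sketch: checking the in-use state after the read would allow data appended between the read and the check to be missed.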
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457194#comment-13457194 ]

Jean-Daniel Cryans commented on HBASE-6758:
-------------------------------------------

My understanding of this patch is that it reduces the race condition but still leaves a small window, e.g. you can take the "fileNotInUse" snapshot, get "false", and the moment after that the log could roll. If this is correct, I'm not sure it's worth the added complexity.

It seems to me this is a case where we'd need to lock HLog.cacheFlushLock for the time we read the log to be 100% sure log rolling doesn't happen. This has multiple side effects, like delaying flushes and log rolls for a few ms while replication is reading the log. It would also require having a way to get to the WAL from ReplicationSource.

While I'm thinking about this, it just occurred to me that when we read a log that's not being written to, we don't need the open/close file dance since the new data is already available. Possible optimization here!

Anyways, one solution I can think of that doesn't involve leaking HRS into replication would be giving the log a "second chance". Basically, if you get an EOF, flip the secondChance bit. If it's on, then you don't get rid of that log yet. Reset the bit when you loop back to read; now if there was new data added you should get it, else go to the next log.
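One possible reading of the "second chance" idea above, as a minimal sketch. Nothing here is taken from the HBase code base: `EntrySource`, `SecondChanceReader`, and `readAndDecide` are hypothetical names, and log entries are modelled as plain strings.

```java
import java.util.List;

/** Hypothetical stand-in for a log reader; not an HBase API. */
interface EntrySource {
    /** Returns the entries that became visible since the last call (possibly none). */
    List<String> readNewEntries();
}

/** Sketch: a log is abandoned only after two consecutive empty reads. */
final class SecondChanceReader {
    private boolean secondChance = false;

    /**
     * Reads once from the source and decides what to do with the current log.
     *
     * @return true if the caller should move on to the next log
     */
    boolean readAndDecide(EntrySource source, List<String> shipped) {
        List<String> entries = source.readNewEntries();
        if (!entries.isEmpty()) {
            shipped.addAll(entries);
            secondChance = false; // new data arrived: reset the bit
            return false;
        }
        if (!secondChance) {
            secondChance = true;  // first EOF: keep the log and loop back once more
            return false;
        }
        secondChance = false;     // second EOF in a row: go to the next log
        return true;
    }
}
```

As noted later in the thread, this narrows the window rather than closing it: data that becomes visible only after the second empty read would still be missed.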
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457156#comment-13457156 ]

Devaraj Das commented on HBASE-6758:
------------------------------------

[~zhi...@ebaysf.com] Not sure why you got a compilation error. Will look.

[~stack] Thanks for the detailed comments. Here are the responses.

bq. Rather than change all new Replication invocations to take a null, why not override the Replication constructor? Your patch would be smaller.

I had considered that, but it didn't seem that adding a new constructor is justified in the long run. There probably are no consumers of the constructor outside HBase, and adding another constructor means new code to take care of. So although it makes the patch bigger, I think it's okay.

bq. Could there be issues with isFileInUse in multithreaded context? Should currentFilePath be an atomic reference so all threads see the changes when they happen? Do you think this an issue?

There shouldn't be any multithreading issues here. Each ReplicationExecutor thread has its own copy of everything (including currentFilePath), and the getters/setters are in the same thread context.

bq. Do we have to pass in an HRegionServer instance into ReplicationSourceManager? Can it be one of the Interfaces Server or RegionServerServices? Or looking at why you need it, you want it because you want to get at HLog instance. Can we not pass this? Or better, an Interface that has isFileInUse on it?

Yes, I tried to pass the HLog instance to Replication's constructor call within HRegionServer. But the code is kind of tangled up. HRegionServer instantiates a Replication object (in setupWALAndReplication). HLog is instantiated in instantiateHLog, and the constructor of HLog invokes rollWriter. If the Replication object was not registered prior to the rollWriter call, things don't work (which means the Replication object needs to be constructed first, but the HLog instance is not available yet). I tried fixing it but then I ran into other issues... But yeah, I like the interface idea. Will try to refactor the code in that respect.
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457097#comment-13457097 ]

stack commented on HBASE-6758:
------------------------------

Rather than change all new Replication invocations to take a null, why not override the Replication constructor? Your patch would be smaller.

Could there be issues with isFileInUse in a multithreaded context? Should currentFilePath be an atomic reference so all threads see the changes when they happen? Do you think this is an issue?

Do we have to pass an HRegionServer instance into ReplicationSourceManager? Can it be one of the Interfaces Server or RegionServerServices? Or looking at why you need it, you want it because you want to get at the HLog instance. Can we not pass this? Or better, an Interface that has isFileInUse on it? Currently, you are passing an HRegionServer instance to ReplicationSourceManager, to which is added a public method that exposes the HRegionServer instance, on which we invoke the getWAL method to call isFileInUse. We're adding a bit of tangle.

Otherwise, I love the fact that you are figuring bugs and fixes in replication just using the test. Painful I'd imagine. Great work.
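The suggestion above of handing replication a narrow interface that only exposes isFileInUse, instead of the whole HRegionServer, might look something like the sketch below. All type names (`WalFileStatus`, `FakeWal`, `ReplicationReader`) are hypothetical and the path is a plain `String` to keep the example self-contained; only the method name `isFileInUse` comes from the discussion.

```java
/** Hypothetical narrow interface: the only thing replication needs from the WAL. */
interface WalFileStatus {
    /** Returns true while the given log file is still being written to. */
    boolean isFileInUse(String path);
}

/** Stand-in for the WAL; it only tracks which file is currently open for writes. */
final class FakeWal implements WalFileStatus {
    // volatile so that a roll done by one thread is visible to reader threads
    private volatile String currentPath;

    FakeWal(String initialPath) {
        this.currentPath = initialPath;
    }

    /** Simulates a log roll: the old file is closed and a new one becomes current. */
    void roll(String newPath) {
        this.currentPath = newPath;
    }

    @Override
    public boolean isFileInUse(String path) {
        return path.equals(currentPath);
    }
}

/** Stand-in for the replication side; it depends on the interface only. */
final class ReplicationReader {
    private final WalFileStatus status;

    ReplicationReader(WalFileStatus status) {
        this.status = status;
    }

    /** A file may be declared finished only once the writer has let go of it. */
    boolean mayFinish(String path) {
        return !status.isFileInUse(path);
    }
}
```

With this shape the replication code never sees the server object, which addresses the "leak" concern. It does not by itself resolve the construction-order problem raised in the reply (Replication being created before the HLog exists).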
[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file
[ https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457068#comment-13457068 ]

Ted Yu commented on HBASE-6758:
-------------------------------

@Devaraj:
Thanks for your effort.

I got the following at compilation time:

{code}
[ERROR] /home/hduser/92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:[317,11] readAllEntriesToReplicateOrNextFile(boolean) in org.apache.hadoop.hbase.replication.regionserver.ReplicationSource cannot be applied to ()
{code}

Do you see a similar error?