[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479499#comment-13479499
 ] 

stack commented on HBASE-6758:
--

That would explain it (I missed that it was a move... it's been at least a week 
since I reviewed patches... forgive me).

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
>Priority: Critical
> Fix For: 0.94.3, 0.96.0
>
> Attachments: 6758-0.94.txt, 6758-1-0.92.patch, 6758-2-0.92.patch, 
> 6758-trunk-1.patch, 6758-trunk-2.patch, 6758-trunk-3.patch, 
> 6758-trunk-4.patch, 
> TEST-org.apache.hadoop.hbase.replication.TestReplication.xml
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).
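
The rule in the description above (do not drop the tracking entry for a log until that log is closed and fully read) can be sketched roughly as follows. This is a minimal illustration, not the actual patch: TrackedLog and ReplicationQueueSketch are hypothetical stand-ins, and the in-memory queue only plays the role of the ZK node handled in ReplicationSourceManager.logPositionAndCleanOldLogs.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical stand-in for a tracked log file; not an HBase class. */
final class TrackedLog {
    final String name;
    boolean closed;      // set once the writer has closed the file
    long visibleLength;  // bytes a reader can currently see
    long readPosition;   // bytes the replicator has shipped so far

    TrackedLog(String name) { this.name = name; }
}

/** Hypothetical in-memory queue standing in for the per-log ZK nodes. */
final class ReplicationQueueSketch {
    private final Deque<TrackedLog> queue = new ArrayDeque<>();

    void track(TrackedLog log) { queue.addLast(log); }

    int size() { return queue.size(); }

    /**
     * Called when the reader hits end-of-file on the head of the queue.
     * Returns true only if the head entry was removed.
     */
    boolean onEndOfFile() {
        TrackedLog head = queue.peekFirst();
        if (head == null) return false;
        // An open file may still gain visible data when it is closed,
        // so it has to stay tracked until then.
        if (!head.closed) return false;
        // Closed, but not everything that is visible has been shipped yet.
        if (head.readPosition < head.visibleLength) return false;
        queue.removeFirst();
        return true;
    }
}
```

With this shape, an end-of-file seen on a still-open file is treated as "nothing more to read yet" rather than as success on that file.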

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-18 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479491#comment-13479491
 ] 

Lars Hofhansl commented on HBASE-6758:
--

What diff are you looking at? The diff in HLog moves a block of code around.


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479484#comment-13479484
 ] 

stack commented on HBASE-6758:
--

Better include it then when you apply to 0.94 (I can't see a diff... not even 
white space)


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-18 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479479#comment-13479479
 ] 

Lars Hofhansl commented on HBASE-6758:
--

@Stack: Possibly, that's what the 0.96 patch does too :)


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-18 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479474#comment-13479474
 ] 

stack commented on HBASE-6758:
--

Is that a non-change in Index: 
src/main/java/org/apache/hadoop/hbase/regionserver/wal/HLog.java?

Good by me committing to 0.94.


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-18 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479471#comment-13479471
 ] 

Jean-Daniel Cryans commented on HBASE-6758:
---

+1 if it's tested on a cluster.


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-18 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479450#comment-13479450
 ] 

Lars Hofhansl commented on HBASE-6758:
--

Any objections/concerns with committing this to 0.94?


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-18 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478925#comment-13478925
 ] 

Hudson commented on HBASE-6758:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #226 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/226/])
HBASE-6758  [replication] The replication-executor should make sure the file
that it is replicating is closed before declaring success on that
file (Devaraj Das via JD) (Revision 1399517)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java


> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 
> 6758-trunk-1.patch, 6758-trunk-2.patch, 6758-trunk-3.patch, 
> 6758-trunk-4.patch, 
> TEST-org.apache.hadoop.hbase.replication.TestReplication.xml
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-17 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478732#comment-13478732
 ] 

Hudson commented on HBASE-6758:
---

Integrated in HBase-TRUNK #3455 (See 
[https://builds.apache.org/job/HBase-TRUNK/3455/])
HBASE-6758  [replication] The replication-executor should make sure the file
that it is replicating is closed before declaring success on that
file (Devaraj Das via JD) (Revision 1399517)

 Result = FAILURE
jdcryans : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSourceManager.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/replication/regionserver/TestReplicationSourceManager.java



[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-17 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478714#comment-13478714
 ] 

Devaraj Das commented on HBASE-6758:


This should mostly be applicable on 0.94 straightaway.


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-17 Thread Lars Hofhansl (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478695#comment-13478695
 ] 

Lars Hofhansl commented on HBASE-6758:
--

0.94? Looks like a good fix to backport.


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-17 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478677#comment-13478677
 ] 

Devaraj Das commented on HBASE-6758:


Thanks, [~jdcryans], for the reviews. Party time :-)


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-15 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13476313#comment-13476313
 ] 

Devaraj Das commented on HBASE-6758:


IMO the last patch is good to go. Is there anything pending from my end on this 
issue?


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-10 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473701#comment-13473701
 ] 

Hadoop QA commented on HBASE-6758:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12548635/6758-trunk-4.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

-1 javadoc.  The javadoc tool appears to have generated 81 warning messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 5 new Findbugs (version 1.3.9) 
warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

-1 core tests.  The patch failed these unit tests:
  org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3030//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3030//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3030//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3030//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3030//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3030//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3030//console

This message is automatically generated.


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-10 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473670#comment-13473670
 ] 

Devaraj Das commented on HBASE-6758:


bq. I disagree. Right now we add the log in ZK under postLogRoll() and 
createWriterInstance will run before that so the file should exist at least.

Ah! and Ooops! I forgot about the fact that I changed the code to have 
preLogRoll not be ignored in the replication handler. Sorry, all the time I was 
thinking about the change in the placement of the call to postLogRoll.. So yes, 
it could happen that the logfile is up in ZK before the file exists but it 
appears (as we just discussed in the previous comments) that the issue would 
take care of itself (the RS that picks this file would dump it after some 
retries)...
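
The ordering question in this comment can be modeled with a small sketch. Everything here is hypothetical (the class name, the two sets standing in for the ZK queue and for HDFS, and the boolean flags); it only contrasts registering the new log before the file is created (preLogRoll-style) with registering it afterwards (postLogRoll-style) when the creation step fails.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical model of a log roll; not HBase code. */
final class LogRollOrderingSketch {
    final Set<String> trackedInZk = new LinkedHashSet<>();  // stand-in for the ZK queue
    final Set<String> filesOnDfs = new LinkedHashSet<>();   // stand-in for HDFS

    /**
     * Rolls to newPath. registerBeforeCreate selects preLogRoll-style tracking;
     * createFails simulates HDFS being unavailable or the RS dying mid-roll.
     * Returns true if the roll completed.
     */
    boolean roll(String newPath, boolean registerBeforeCreate, boolean createFails) {
        if (registerBeforeCreate) {
            trackedInZk.add(newPath);   // preLogRoll-style: tracked before the file exists
        }
        if (createFails) {
            return false;               // the writer is never created
        }
        filesOnDfs.add(newPath);        // writer created, file now exists
        if (!registerBeforeCreate) {
            trackedInZk.add(newPath);   // postLogRoll-style: tracked only after creation
        }
        return true;
    }

    /** Entries that are tracked but have no backing file. */
    Set<String> dangling() {
        Set<String> result = new LinkedHashSet<>(trackedInZk);
        result.removeAll(filesOnDfs);
        return result;
    }
}
```

In this model only the register-before-create ordering can leave a dangling entry, which is the case the thread expects the recovering RS to drop after some retries.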


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-10 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473647#comment-13473647
 ] 

Jean-Daniel Cryans commented on HBASE-6758:
---

bq. The lines of code that I moved are to do with postLogRoll which happens 
after the sequence that you are talking about. This problem exists with/without 
this patch.

I disagree. Right now we add the log in ZK under postLogRoll() and 
createWriterInstance will run before that so the file should exist at least.

bq. I think the RS that picks this queue up will dump the file after a couple 
of retries

Yeah the fact that it's the last file and that the multiplier would go to the 
max and that it's a recovered queue should take care of that.
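
A rough sketch of the give-up condition described above, under the assumption that the three factors named in this comment (last file in the queue, backoff multiplier at its maximum, recovered queue) are all required together. MissingLogRetrySketch and its parameters are hypothetical names for illustration, not the ReplicationSource implementation.

```java
/** Hypothetical retry policy for a tracked log that cannot be found; not HBase code. */
final class MissingLogRetrySketch {
    private final int maxMultiplier;  // assumed cap on the backoff multiplier
    private int multiplier = 1;

    MissingLogRetrySketch(int maxMultiplier) {
        this.maxMultiplier = maxMultiplier;
    }

    /**
     * Called each time opening the log fails because the file does not exist.
     * Returns true when the caller should drop the file from the queue.
     */
    boolean shouldGiveUp(boolean recoveredQueue, boolean lastFileInQueue) {
        if (recoveredQueue && lastFileInQueue && multiplier >= maxMultiplier) {
            return true;
        }
        if (multiplier < maxMultiplier) {
            multiplier++;  // back off a little longer before the next attempt
        }
        return false;
    }

    int multiplier() {
        return multiplier;
    }
}
```

A live (non-recovered) queue never gives up in this sketch, since the missing file may simply not have been created yet.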


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-10 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473593#comment-13473593
 ] 

Devaraj Das commented on HBASE-6758:


[~jdcryans], this sequence of events could happen currently too, couldn't it? 
The lines of code that I moved have to do with postLogRoll, which happens after 
the sequence that you are talking about. This problem exists with or without 
this patch.

bq. You end up with a log tracked in ZK that doesn't exist. This RS's queue 
will be recovered by another RS that will eventually try to read from that 
non-existing file. My concern is how we're going to treat that file.

To answer your question, I think the RS that picks this queue up will dump the 
file after a couple of retries (since the file doesn't exist and will never 
show up in the recovered logs directory).
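
A minimal sketch of that retry-then-abandon behaviour, for illustration only. 
The class, the method, and the existence check are hypothetical and are not 
taken from ReplicationSource; the retry count is likewise an assumption.

```java
import java.util.function.Predicate;

// Hypothetical sketch, not HBase code: a recovered queue gives up on a log
// that never appears after a bounded number of existence checks.
class RecoveredQueueSketch {
    /**
     * Returns true if the log at 'path' should be abandoned, i.e. it was not
     * found in any of 'maxRetries' checks. 'exists' stands in for looking in
     * the log and recovered-logs directories.
     */
    static boolean shouldAbandon(String path, Predicate<String> exists, int maxRetries) {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            if (exists.test(path)) {
                return false; // the file showed up, keep replicating from it
            }
        }
        return true; // never showed up, drop it from the queue
    }
}
```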

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-10 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473580#comment-13473580
 ] 

Jean-Daniel Cryans commented on HBASE-6758:
---

bq. please let me know if I missed something or misunderstood your concern

Consider this scenario. First this runs:

bq. Path newPath = computeFilename();

Then with your patch we add this file in ZK during:

bq. i.preLogRoll(oldPath, newPath);

Now let's say HDFS becomes unavailable or the RS fails and never gets to this 
line:

bq. HLog.Writer nextWriter = this.createWriterInstance(fs, newPath, conf);

You end up with a log tracked in ZK that doesn't exist. This RS's queue will be 
recovered by another RS that will eventually try to read from that non-existing 
file. My concern is how we're going to treat that file.
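
The scenario can be sketched as follows. This is only an illustration with 
hypothetical stand-in classes: two in-memory sets replace ZK and HDFS, and 
the method only mirrors the order of the three quoted steps.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch, not HBase code: shows how registering the new log in
// ZK before the file is created can leave an entry with no backing file.
class RollOrderingSketch {
    final Set<String> trackedInZk = new HashSet<>();  // stands in for the replication queue znodes
    final Set<String> filesOnHdfs = new HashSet<>();  // stands in for files that actually exist

    /** Rolls to newPath; failBeforeCreate simulates HDFS or the RS failing before the writer exists. */
    void rollWriter(String newPath, boolean failBeforeCreate) {
        trackedInZk.add(newPath);                     // step 2: preLogRoll adds the log in ZK
        if (failBeforeCreate) {
            throw new IllegalStateException("simulated failure before createWriterInstance");
        }
        filesOnHdfs.add(newPath);                     // step 3: createWriterInstance creates the file
    }

    /** Logs tracked in ZK that have no file behind them. */
    Set<String> orphans() {
        Set<String> result = new HashSet<>(trackedInZk);
        result.removeAll(filesOnHdfs);
        return result;
    }
}
```

After a failed roll, orphans() contains the new path: that is the entry a 
recovering RS would later try, and fail, to read.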

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-10 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473566#comment-13473566
 ] 

Devaraj Das commented on HBASE-6758:


bq. Ah I see, I didn't fully grok the new preRoll/postRoll dance in my first 
review. That's clever.

Cool. Thanks for taking a pass at this.

bq. Will the recovered queue hang or will it abandon that HLog? FWIW there's 
another jira regarding that problem but this could be a new failure case.

The change to the placement of the postLogRoll call in the patch will not 
affect recovered queues. It will only affect files that the RS in question is 
creating itself. The changes in ReplicationSource.java will only take effect 
for non-recovered files (there is a check _!this.queueRecovered_ before setting 
_currentWALisBeingWrittenTo_ to true). So I think we are covered (please let 
me know if I missed something or misunderstood your concern).

I'll submit a patch shortly with the nits pointed out by [~te...@apache.org] 
fixed.

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-10 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473419#comment-13473419
 ] 

Jean-Daniel Cryans commented on HBASE-6758:
---

Ah I see, I didn't fully grok the new preRoll/postRoll dance in my first 
review. That's clever.

My one last concern before committing would be what happens when we are able to 
compute a new HLog name and put it up in ZK, but then fail to create the HLog 
and the RS dies. Will the recovered queue hang or will it abandon that HLog? 
FWIW there's another jira regarding that problem but this could be a new 
failure case.

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-09 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13473005#comment-13473005
 ] 

Devaraj Das commented on HBASE-6758:


Hey [~jdcryans], this patch doesn't change that behavior at all (the new log is 
still put up in ZK before it is written to, and the code still blocks talking 
to ZK). This patch only changes the postLogRoll placement, which 
deterministically ensures the previous log file is really closed before the new 
log is enqueued for replication. The code changes in the replicator thread 
(ReplicationSource.java) make sure that at least one entire iteration of the 
loop "sees" a closed log file (and hence take care of the problem reported in 
the jira).
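
A rough sketch of that loop rule, under the assumption that the in-use flag is 
sampled before each read pass. Apart from the _fileInUse_ name, which appears 
in the patch, the class and methods are hypothetical and not from 
ReplicationSource.java.

```java
// Hypothetical sketch, not HBase code: the source may only move past a log
// once a whole read pass has run against an already-closed file.
class ReplicatorLoopSketch {
    /**
     * fileInUse is sampled BEFORE the read pass starts, so a false value
     * means the entire pass saw a closed file and could not have missed
     * data that only becomes visible on close.
     */
    static boolean mayAdvance(boolean fileInUse, boolean reachedEndOfFile) {
        return reachedEndOfFile && !fileInUse;
    }

    /**
     * Returns how many passes over one log are needed before advancing, or -1
     * if every pass started while the file was still being written. Each
     * pass is assumed to reach the end of the currently visible data.
     */
    static int passesUntilAdvance(boolean[] inUseAtPassStart) {
        for (int pass = 0; pass < inUseAtPassStart.length; pass++) {
            if (mayAdvance(inUseAtPassStart[pass], true)) {
                return pass + 1;
            }
        }
        return -1;
    }
}
```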

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-09 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13472982#comment-13472982
 ] 

Jean-Daniel Cryans commented on HBASE-6758:
---

The last time I played around with postLogRoll in HBASE-3515, I found that we 
must ensure that we have that log up in ZK before we start writing to it, 
because otherwise it would be possible for writers to append while we are 
unable to add the log in ZK because the session timed out. 

The current code blocks talking to ZK.


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-04 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469930#comment-13469930
 ] 

Hadoop QA commented on HBASE-6758:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12547851/6758-trunk-3.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 3 new 
or modified tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
81 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3010//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3010//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3010//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3010//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3010//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3010//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3010//console

This message is automatically generated.

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-04 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469850#comment-13469850
 ] 

Devaraj Das commented on HBASE-6758:


Thanks, [~te...@apache.org] for looking. I will incorporate your comments in 
the next version of the patch (once I hear back from [~jdcryans] and/or 
[~stack]).

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-04 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469841#comment-13469841
 ] 

Ted Yu commented on HBASE-6758:
---

Thanks for your continued effort, Devaraj.
{code}
+  void prelogRoll(Path newLog) throws IOException {
{code}
I think the 'l' of 'log' should be capitalized.
Same here:
{code}
+  void postlogRoll(Path newLog) throws IOException {
{code}
nit: since the following line is modified, please add a space after 'if':
{code}
-if(readAllEntriesToReplicateOrNextFile()) {
+if(readAllEntriesToReplicateOrNextFile(fileInUse)) {
{code}
Please add javadoc for the new parameter:
{code}
   /**
    * Do the shipping logic
    */
-  protected void shipEdits() {
+  protected void shipEdits(boolean fileInUse) {
{code}

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-03 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469063#comment-13469063
 ] 

Hadoop QA commented on HBASE-6758:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12547636/6758-trunk-2.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
83 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.coprocessor.TestRowProcessorEndpoint
   org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient
   org.apache.hadoop.hbase.coprocessor.TestRegionServerCoprocessorExceptionWithAbort
   org.apache.hadoop.hbase.regionserver.TestAtomicOperation

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2999//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2999//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2999//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2999//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2999//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2999//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2999//console

This message is automatically generated.

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-03 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13469009#comment-13469009
 ] 

Devaraj Das commented on HBASE-6758:


In case it is not clear why the patch delays enqueueing the new WAL file: the 
problem described in this jira happens because the new WAL file is enqueued too 
early (before the previous WAL file is closed).

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-03 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468986#comment-13468986
 ] 

Devaraj Das commented on HBASE-6758:


In the trunk case, I think something better can be done (and the interface 
changes can be avoided). Replication.postLogRoll could enqueue the new path in 
the ReplicationSource's queue. Replication.preLogRoll would do everything else 
(creating ZK entries, etc.) except enqueuing the path. 

The postLogRoll is currently called before the writer is reset (to 
_nextWriter_) in FSHLog.rollWriter. I propose that it be called after the 
writer is reset. That, in my opinion, is a more precise place for calling 
postLogRoll.
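
A sketch of the proposed split, for illustration only. The class below is a 
hypothetical stand-in: in-memory collections replace ZK and the source queue, 
and the event list exists only to make the ordering visible. It assumes the 
previous writer is closed as part of resetting the writer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch, not HBase code: preLogRoll records the new log in ZK,
// postLogRoll enqueues it, and postLogRoll runs only after the writer reset.
class RollHooksSketch {
    final List<String> zkEntries = new ArrayList<>();     // stands in for ZK queue entries
    final Queue<String> sourceQueue = new ArrayDeque<>(); // stands in for the ReplicationSource queue
    final List<String> events = new ArrayList<>();        // ordering trace for this sketch

    void preLogRoll(String newPath) {
        zkEntries.add(newPath);
        events.add("zk:" + newPath);
    }

    void postLogRoll(String newPath) {
        sourceQueue.add(newPath);
        events.add("enqueue:" + newPath);
    }

    void rollWriter(String oldPath, String newPath) {
        preLogRoll(newPath);             // ZK entry exists before any write to the new log
        events.add("create:" + newPath); // new writer created
        events.add("close:" + oldPath);  // writer reset: old log closed (assumed here)
        postLogRoll(newPath);            // new log enqueued only after the old one is closed
    }
}
```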

Thoughts?

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-03 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468800#comment-13468800
 ] 

Devaraj Das commented on HBASE-6758:


bq. Can we not pass down RegionServerServices? Can we pass a narrow Interface 
instead?

I think we can (I can pull the getWAL() method out of the RegionServerServices 
interface into a new interface and have RegionServerServices extend that). In 
that case we would still pass two instances of HRS (as pointed out by JD 
earlier). But thinking about it, that probably makes the downstream methods' 
abstractions cleaner (compared with having them accept a fat interface).
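
Roughly what that could look like. Every type name below is hypothetical 
(suffixed Sketch, or invented outright like WalAccess); only the getWAL() 
method name comes from RegionServerServices.

```java
// Hypothetical sketch, not HBase code: getWAL() moves into a narrow interface
// that the fat interface extends, so downstream code asks only for what it uses.
interface WalSketch {}

/** Narrow interface carrying only WAL access (name invented for this sketch). */
interface WalAccess {
    WalSketch getWAL();
}

/** Stand-in for the Server interface. */
interface ServerSketch {
    boolean isStopped();
}

/** Stand-in for RegionServerServices: still offers everything, via extension. */
interface RegionServerServicesSketch extends ServerSketch, WalAccess {}

/** Downstream class that depends on the narrow interface only. */
class ReplicationSourceSketch {
    private final WalAccess walAccess;

    ReplicationSourceSketch(WalAccess walAccess) {
        this.walAccess = walAccess;
    }

    WalSketch currentWal() {
        return walAccess.getWAL();
    }
}
```

The same region server object can still be handed in, since it implements 
the fat interface, but the constructor signature no longer exposes it all.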

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-03 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468762#comment-13468762
 ] 

stack commented on HBASE-6758:
--

Can we not pass down RegionServerServices?  Can we pass a narrow Interface 
instead?



[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-03 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468751#comment-13468751
 ] 

Devaraj Das commented on HBASE-6758:


Thanks, [~jdcryans], for looking at the patch. Actually, upon looking at the 
RegionServerServices interface closely, I see that it extends the Server 
interface. So the problem you pointed out could be addressed by making the 
affected constructors and methods (the ones that I changed to take the new 
RegionServerServices argument) accept only RegionServerServices instead of 
separate Server/Stoppable instances.

Will submit a patch soon. Hope that will look better.

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-10-03 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13468728#comment-13468728
 ] 

Jean-Daniel Cryans commented on HBASE-6758:
---

I really don't like that we have to pass down another instance of HRS (through 
RegionServerServices). The fact that we're now doing this:

{code}
-new Replication(this, this.fs, logdir, oldLogDir): null;
+new Replication(this, this.fs, logdir, oldLogDir, this): null;
{code}

is making me sad. Also it leaks all over the code. It seems to me that there 
should be another way to handle this just in ReplicationSource.

At the moment I'd be +1 for commit only to trunk and on commit this logging 
will need to be cleaned up:

{code}
LOG.info("File " + getCurrentPath() + " in use");
{code}

Is that ok with you, [~devaraj]?

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
>Priority: Critical
> Fix For: 0.96.0
>
> Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 
> 6758-trunk-1.patch, 
> TEST-org.apache.hadoop.hbase.replication.TestReplication.xml
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-28 Thread Hadoop QA (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465644#comment-13465644
 ] 

Hadoop QA commented on HBASE-6758:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12546989/6758-trunk-1.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 9 new or modified tests.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

-1 javadoc.  The javadoc tool appears to have generated 149 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 10 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestFromClientSide

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2959//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2959//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2959//console

This message is automatically generated.

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Fix For: 0.96.0
>
> Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 
> 6758-trunk-1.patch, 
> TEST-org.apache.hadoop.hbase.replication.TestReplication.xml
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-24 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462074#comment-13462074
 ] 

Devaraj Das commented on HBASE-6758:


[~jdcryans] could you please have a look at the recent patch. Thanks!

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 
> TEST-org.apache.hadoop.hbase.replication.TestReplication.xml
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-21 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460664#comment-13460664
 ] 

Devaraj Das commented on HBASE-6758:


[~stack] I have already responded to Ted's comment. In summary, the problem is 
that the log-splitter couldn't complete its work soon enough, and hence the 
file wasn't moved to .oldlogs soon enough. The replicator exhausted its 
maxRetries and gave up. So this is a different issue (and may be solved by 
increasing the value of maxRetries in the config).
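
To illustrate why a larger retry budget helps here, a minimal sketch of a 
bounded wait loop. RetrySketch, waitForFile and the linear backoff are 
assumptions made for the example, not the actual ReplicationSource code:

```java
import java.util.function.BooleanSupplier;

// Hypothetical sketch: poll for an archived log a bounded number of times.
final class RetrySketch {
    /**
     * Checks fileExists up to maxRetries times, sleeping sleepMs * attempt
     * between checks. Returns true if the file showed up, false if the
     * retry budget ran out (the "considering dumping" case in the log).
     */
    static boolean waitForFile(BooleanSupplier fileExists, int maxRetries, long sleepMs) {
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            if (fileExists.getAsBoolean()) {
                return true;
            }
            if (attempt < maxRetries) {
                try {
                    Thread.sleep(sleepMs * attempt);
                } catch (InterruptedException e) {
                    // Preserve the interrupt and stop waiting.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

If the splitter archives the file only on the third check, a budget of two 
gives up while a budget of five finds the file.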

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 
> TEST-org.apache.hadoop.hbase.replication.TestReplication.xml
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-21 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13460648#comment-13460648
 ] 

stack commented on HBASE-6758:
--

[~devaraj] What do you think of Ted's comment above, boss?

[~jdcryans] Any comment on this patch?

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 
> TEST-org.apache.hadoop.hbase.replication.TestReplication.xml
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-18 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457998#comment-13457998
 ] 

Devaraj Das commented on HBASE-6758:


[~yuzhih...@gmail.com] Hey thanks for taking the patch for a spin.

Talk about races! Here it seems like the splitter didn't complete within the 
expected time, and the replication didn't happen for some data. 

Here are the relevant log snippets (look for "considering dumping" where the 
file got dropped before the splitter completed). But in this case, the issue 
can be addressed by increasing the number of retries (which is already 
configurable). The patch attached here doesn't attempt to solve this problem.

{noformat}

2012-09-17 18:13:03,665 WARN  [ReplicationExecutor-0.replicationSource,2-sea-lab-0,41831,1347930742751] regionserver.ReplicationSource(555): 2-sea-lab-0,41831,1347930742751 Got:
java.io.IOException: File from recovered queue is nowhere to be found
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:537)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:304)
Caused by: java.io.FileNotFoundException: File does not exist: hdfs://localhost:41196/user/hduser/hbase/.oldlogs/sea-lab-0%2C41831%2C1347930742751.1347930771911
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:517)
    at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:796)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1475)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1470)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:58)
    at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:166)
    at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:689)
    at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:503)
    ... 1 more

2012-09-17 18:13:03,665 WARN  [ReplicationExecutor-0.replicationSource,2-sea-lab-0,41831,1347930742751] regionserver.ReplicationSource(559): Waited too long for this file, considering dumping

2012-09-17 18:13:03,665 INFO  [ReplicationExecutor-0.replicationSource,2-sea-lab-0,41831,1347930742751] regionserver.ReplicationSourceManager(365): Done with the recovered queue 2-sea-lab-0,41831,1347930742751

2012-09-17 18:13:04,305 DEBUG [main-EventThread] wal.HLogSplitter(657): Archived processed log hdfs://localhost:41196/user/hduser/hbase/.logs/sea-lab-0,41831,1347930742751-splitting/sea-lab-0%2C41831%2C1347930742751.1347930771911 to hdfs://localhost:41196/user/hduser/hbase/.oldlogs/sea-lab-0%2C41831%2C1347930742751.1347930771911

2012-09-17 18:13:04,306 INFO  [main-EventThread] master.SplitLogManager(392): Done splitting /1/splitlog/hdfs%3A%2F%2Flocalhost%3A41196%2Fuser%2Fhduser%2Fhbase%2F.logs%2Fsea-lab-0%2C41831%2C1347930742751-splitting%2Fsea-lab-0%252C41831%252C1347930742751.1347930771911

{noformat}

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 
> TEST-org.apache.hadoop.hbase.replication.TestReplication.xml
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-17 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457526#comment-13457526
 ] 

Ted Yu commented on HBASE-6758:
---

@Devaraj:
I tried your patch v2 and I still got:
{code}
queueFailover(org.apache.hadoop.hbase.replication.TestReplication)  Time elapsed: 86.817 sec  <<< FAILURE!
java.lang.AssertionError: Waited too much time for queueFailover replication. Waited 41973ms.
  at org.junit.Assert.fail(Assert.java:93)
  at org.apache.hadoop.hbase.replication.TestReplication.queueFailover(TestReplication.java:666)
{code}
I will attach some test output momentarily.

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-17 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457502#comment-13457502
 ] 

Devaraj Das commented on HBASE-6758:


bq. Otherwise, I love the fact that you are figuring bugs and fixes in 
replication just using the test. Painful I'd imagine. Great work.

Thanks, Stack. Yes, I have burnt some midnight oil on these issues. Fun though.

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-17 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457437#comment-13457437
 ] 

Devaraj Das commented on HBASE-6758:


bq. I see, all that double-negation (eg !fileNotInUse) confused me

Sorry about that. I'll see if I can change it to single negation :-)

bq. So in layman's terms, your patch short circuits all the checks to change 
the current path if we know for sure that the file we are replicating from is 
being written to. The side effect is that we won't quit the current file unless 
it has aged right?

Yes .. 

bq. FWIW that might not be totally true, at least in 0.94 HLog.postLogRoll is 
called before HLog.cleanupCurrentWriter which does issue a sync().

I don't get this, JD. Could you please clarify a bit more? Given the fact that 
the currentPath would be updated only after the call to cleanupCurrentWriter, I 
don't see a difference in the behavior between 0.92 and 0.94... (maybe I am 
missing something though).

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-17 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457342#comment-13457342
 ] 

Jean-Daniel Cryans commented on HBASE-6758:
---

I see, all that double-negation (eg !fileNotInUse) confused me :)

So in layman's terms, your patch short circuits all the checks to change the 
current path if we know for sure that the file we are replicating from is being 
written to. The side effect is that we won't quit the current file unless it 
has aged right? 

bq. The replication executor is always trailing, and so when the HLog guy says 
that a path is not in use (being written to), it seems to me a fact that it 
indeed is not being written to and any writes that ever happened was in the 
past.

FWIW that might not be totally true, at least in 0.94 HLog.postLogRoll is 
called before HLog.cleanupCurrentWriter which does issue a sync().



> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-17 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457296#comment-13457296
 ] 

Devaraj Das commented on HBASE-6758:


[~jdcryans] Thanks for looking. Responses below.

bq. My understanding of this patch is that it reduces the race condition but it 
still leaves a small window eg you can take the "fileNotInUse" snapshot, get 
"false", and the moment after that the log could roll. If this is correct, I'm 
not sure it's worth the added complexity.

I don't think there is ever that window. The replication executor thread picks 
up a path that the LogRoller puts in the replicator's queue BEFORE the log roll 
happens (and the HLog constructor puts the first path before the replication 
executor starts). The replication executor is always trailing, and so when the 
HLog guy says that a path is not in use (being written to), it seems to me a 
fact that it indeed is not being written to and any writes that ever happened 
was in the past. Also note that the currentPath is reset AFTER a log roll, 
which is kind of delayed..

bq. It seems to me this is a case where we'd need to lock HLog.cacheFlushLock 
for the time we read the log to be 100% sure log rolling doesn't happen. This 
has multiple side effects like delaying flushes and log rolls for a few ms 
while replication is reading the log. It would also require having a way to get 
to the WAL from ReplicationSource.

Yeah, I tried my best to avoid taking that crucial lock!

bq. Anyways, one solution I can think of that doesn't involve leaking HRS into 
replication would be giving the log a "second chance". Basically if you get an 
EOF, flip the secondChance bit. If it's on then you don't get rid of that log 
yet. Reset the bit when you loop back to read, now if there was new data added 
you should get it else go to the next log.

I considered some variant of this. However, I gave it up and took a more 
conservative approach - make sure that the replication-executor thread gets at 
least one pass at a CLOSED file. All other solutions seemed incomplete to me 
and prone to races...
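
A toy model of that conservative rule, assuming (as the issue description 
says) that appended data becomes visible to readers only on close. WalSketch 
and ReaderSketch are hypothetical names and this is not the patch itself; 
the point is that the in-use flag is sampled before the read, so only a pass 
that started on a closed file lets the reader move on:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy WAL: appended entries become visible only on close().
final class WalSketch {
    private final List<String> visible = new ArrayList<>();
    private final List<String> buffered = new ArrayList<>();
    private boolean open = true;

    void append(String entry) { buffered.add(entry); }

    void close() {
        visible.addAll(buffered);
        buffered.clear();
        open = false;
    }

    boolean isInUse() { return open; }

    List<String> readFrom(int position) {
        return new ArrayList<>(visible.subList(position, visible.size()));
    }
}

// Hypothetical reader that only leaves a log after a pass over a CLOSED file.
final class ReaderSketch {
    private int position = 0;
    final List<String> shipped = new ArrayList<>();

    /** Runs one read pass; returns true when it is safe to move to the next log. */
    boolean pass(WalSketch wal) {
        boolean inUse = wal.isInUse();            // snapshot BEFORE reading
        List<String> batch = wal.readFrom(position);
        shipped.addAll(batch);
        position += batch.size();
        return !inUse;                            // a pass over an open file never counts
    }
}
```

In this model a pass made while the file is open ships nothing and refuses to 
advance; the pass after close() ships the pending entry and then advances.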

[~stack] forgot to answer one of your previous questions.
bq. Should currentFilePath be an atomic reference so all threads see the 
changes when they happen?

I think volatile suffices for the use case here.
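
For reference, a small sketch of the difference between the two options, 
with hypothetical field and method names. A volatile field makes plain reads 
and writes visible across threads; an AtomicReference additionally offers 
atomic check-then-update via compareAndSet:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical holder showing both options side by side.
final class CurrentPathSketch {
    // Enough when one thread writes and others only need to see the latest value.
    private volatile String currentPath;

    // Needed only if several threads must update based on the current value.
    private final AtomicReference<String> currentPathRef = new AtomicReference<>();

    void setPath(String path) { currentPath = path; }

    String getPath() { return currentPath; }

    /** Atomically swaps in next only if the reference still equals expected (compared with ==). */
    boolean advanceIfStill(String expected, String next) {
        return currentPathRef.compareAndSet(expected, next);
    }

    String getPathRef() { return currentPathRef.get(); }
}
```

If the path is only ever set by one thread and read by another, the volatile 
form covers the visibility concern raised in the question.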

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-17 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457194#comment-13457194
 ] 

Jean-Daniel Cryans commented on HBASE-6758:
---

My understanding of this patch is that it reduces the race condition but it 
still leaves a small window eg you can take the "fileNotInUse" snapshot, get 
"false", and the moment after that the log could roll. If this is correct, I'm 
not sure it's worth the added complexity.

It seems to me this is a case where we'd need to lock HLog.cacheFlushLock for 
the time we read the log to be 100% sure log rolling doesn't happen. This has 
multiple side effects like delaying flushes and log rolls for a few ms while 
replication is reading the log. It would also require having a way to get to 
the WAL from ReplicationSource.
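
A rough sketch of what that locking approach would look like, assuming a 
single lock shared by the roller and the replication reader. LockedReadSketch 
and its methods are hypothetical, and rollLock merely stands in for 
HLog.cacheFlushLock:

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Supplier;

// Hypothetical sketch: the reader and the log roller share one lock.
final class LockedReadSketch {
    private final ReentrantLock rollLock = new ReentrantLock();

    /** The roller takes the lock, so it waits while a read is in progress. */
    void rollLog(Runnable doRoll) {
        rollLock.lock();
        try {
            doRoll.run();
        } finally {
            rollLock.unlock();
        }
    }

    /** The reader holds the same lock for the duration of the read. */
    <T> T readWhileRollBlocked(Supplier<T> read) {
        rollLock.lock();
        try {
            return read.get();
        } finally {
            rollLock.unlock();
        }
    }

    boolean isHeldByCurrentThread() {
        return rollLock.isHeldByCurrentThread();
    }
}
```

The cost is visible in the shape of the code: every roll (and, in the real 
system, every flush that needs the lock) stalls for as long as a read runs.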

While I'm thinking about this, it just occurred to me that when we 
read a log that's not being written to then we don't need the open/close file 
dance since the new data is already available. Possible optimization 
here!

Anyways, one solution I can think of that doesn't involve leaking HRS into 
replication would be giving the log a "second chance". Basically if you get an 
EOF, flip the secondChance bit. If it's on then you don't get rid of that log 
yet. Reset the bit when you loop back to read, now if there was new data added 
you should get it else go to the next log.
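
One possible reading of the "second chance" idea as code. SecondChanceSketch 
and shouldAdvance are hypothetical names, and the exact point at which the 
bit is reset is an interpretation of the description above:

```java
// Hypothetical sketch of the "second chance" bit.
final class SecondChanceSketch {
    private boolean secondChance = false;

    /**
     * Called after each read pass over the current log with the number of
     * entries that pass returned. Returns true when the reader should give
     * up on this log and move to the next one.
     */
    boolean shouldAdvance(int entriesRead) {
        if (entriesRead > 0) {
            // New data showed up, so the log was not finished: reset the bit.
            secondChance = false;
            return false;
        }
        if (!secondChance) {
            // First EOF: keep the log and loop back for one more read.
            secondChance = true;
            return false;
        }
        // Second consecutive EOF with no new data: move on.
        secondChance = false;
        return true;
    }
}
```

Two empty passes in a row release the log; any pass that returns data in 
between restarts the count.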

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-17 Thread Devaraj Das (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457156#comment-13457156
 ] 

Devaraj Das commented on HBASE-6758:


[~zhi...@ebaysf.com] Not sure why you got a compilation error. Will look..

[~stack] Thanks for the detailed comments. Here are the responses.

bq. Rather than change all new Replication invocations to take a null, why not 
override the Replication constructor? Your patch would be smaller.

I had considered that, but adding a new constructor didn't seem justified in 
the long run. There are probably no consumers of the constructor outside 
HBase, and another constructor means more code to maintain. So although it 
makes the patch bigger, I think it's okay.

bq. Could there be issues with isFileInUse in multithreaded context? Should 
currentFilePath be an atomic reference so all threads see the changes when they 
happen? Do you think this an issue?

There shouldn't be any multithreading issues here. Each ReplicationExecutor 
thread has its own copy of everything (including currentFilePath), and the 
getters/setters are in the same thread context.

bq. Do we have to pass in an HRegionServer instance into 
ReplicationSourceManager? Can it be one of the Interfaces Server or 
RegionServerServices? Or looking at why you need it, you want it because you 
want to get at HLog instance. Can we not pass this? Or better, an Interface 
that has isFileInUse on it?

Yes, I tried to pass the HLog instance to Replication's constructor call within 
HRegionServer. But the code is kind of tangled up. HRegionServer instantiates a 
Replication object (in setupWALAndReplication). HLog is instantiated in 
instantiateHLog, and the constructor of HLog invokes rollWriter. If the 
Replication object was not registered prior to rollWriter call, things don't 
work (which means the Replication object needs to be constructed first but the 
HLog instance is not available yet). I tried fixing it but then I ran into 
other issues...

But yeah, I like the interface idea. Will try to refactor the code in that 
respect.

> [replication] The replication-executor should make sure the file that it is 
> replicating is closed before declaring success on that file
> ---
>
> Key: HBASE-6758
> URL: https://issues.apache.org/jira/browse/HBASE-6758
> Project: HBase
>  Issue Type: Bug
>Reporter: Devaraj Das
>Assignee: Devaraj Das
> Attachments: 6758-1-0.92.patch
>
>
> I have seen cases where the replication-executor would lose data to replicate 
> since the file hasn't been closed yet. Upon closing, the new data becomes 
> visible. Before that happens the ZK node shouldn't be deleted in 
> ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
> in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-17 Thread stack (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457097#comment-13457097
 ] 

stack commented on HBASE-6758:
--

Rather than change all new Replication invocations to take a null, why not 
overload the Replication constructor?  Your patch would be smaller.

Could there be issues with isFileInUse in a multithreaded context?  Should 
currentFilePath be an atomic reference so all threads see the changes when they 
happen?  Do you think this is an issue?
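
To make the visibility question concrete, here is a minimal sketch of what 
holding the path in an atomic reference could look like. It is not taken from 
the patch: CurrentFileTracker and setCurrentFilePath are hypothetical names, 
and the path is a plain String rather than a Hadoop Path so that the example is 
self-contained.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for the part of the log that tracks the file being written. */
class CurrentFileTracker {
    // Published through an AtomicReference so that a reader thread observes
    // the most recent path set by the thread that rolls the log.
    private final AtomicReference<String> currentFilePath = new AtomicReference<>();

    /** Called by the writer thread when the log rolls (null means no file is open). */
    void setCurrentFilePath(String path) {
        currentFilePath.set(path);
    }

    /** Called by replication threads: true if the given path is still being written. */
    boolean isFileInUse(String path) {
        String current = currentFilePath.get();
        return current != null && current.equals(path);
    }
}
```

For a single reference that is only set and read, a volatile field gives the 
same visibility guarantee; an AtomicReference mainly pays off if compare-and-set 
is needed later.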

Do we have to pass in an HRegionServer instance into ReplicationSourceManager?  
Can it be one of the Interfaces Server or RegionServerServices?  Or looking at 
why you need it, you want it because you want to get at HLog instance.  Can we 
not pass this?  Or better, an Interface that has isFileInUse on it?

Currently, you are passing an HRegionServer instance to 
ReplicationSourceManager and adding a public method that exposes it, so that 
we can invoke getWAL on it and then call isFileInUse.  That adds a bit of 
tangle.
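
A minimal sketch of the narrower dependency suggested above, assuming a 
single-method interface. FileInUseChecker, SourceManagerSketch and canCleanUp 
are hypothetical names for illustration, not existing HBase classes or methods, 
and paths are plain strings to keep the example self-contained.

```java
/** Hypothetical narrow interface: the only thing replication needs from the log. */
interface FileInUseChecker {
    boolean isFileInUse(String path);
}

/** Hypothetical, simplified stand-in for ReplicationSourceManager. */
class SourceManagerSketch {
    private final FileInUseChecker checker;

    // Depends on the one-method interface only, not on the whole region server.
    SourceManagerSketch(FileInUseChecker checker) {
        this.checker = checker;
    }

    /** A log may only be declared done and cleaned up once the writer has closed it. */
    boolean canCleanUp(String logPath) {
        return !checker.isFileInUse(logPath);
    }
}
```

With this shape a test can pass a lambda such as path -> path.equals("log.2") 
instead of standing up a region server, and the manager no longer needs a 
public accessor for the server instance.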

Otherwise, I love the fact that you are figuring out bugs and fixes in 
replication just using the test.  Painful, I'd imagine.  Great work.





[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

2012-09-17 Thread Ted Yu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13457068#comment-13457068
 ] 

Ted Yu commented on HBASE-6758:
---

@Devaraj:
Thanks for your effort.
I got the following at compilation time:
{code}
[ERROR] 
/home/hduser/92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:[317,11]
 readAllEntriesToReplicateOrNextFile(boolean) in 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource cannot be 
applied to ()
{code}
Do you see a similar error?


