[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841012#comment-16841012 ]

Hudson commented on HBASE-20305:

Results for branch branch-2
[build #1894 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1894/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1894//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1894//console].

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1894//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(x) {color:red}-1 client integration test{color}
-- Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/branch-2/1894//artifact/output-integration/hadoop-2.log]. (note that this means we didn't run on Hadoop 3)

> Add option to SyncTable that skip deletes on target cluster
> ---
>
> Key: HBASE-20305
> URL: https://issues.apache.org/jira/browse/HBASE-20305
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 2.0.0-alpha-4
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 1.4.10, 2.3.0, 2.1.5, 2.2.1
>
> Attachments: 0001-HBASE-20305.master.001.patch, HBASE-20305.branch-1.001.patch, HBASE-20305.branch-2.001.patch, HBASE-20305.master.002.patch
>
>
> We had a situation where two clusters with active-active replication got out of sync, but both had data that should be kept. The tables in question never have data deleted, but ingestion had happened on the two different clusters, and some rows had even been updated.
> In this scenario, a cell that is present on only one of the clusters should not be deleted, but replayed on the other. Also, for cells with the same identifier but different values, the most recent value should be kept.
> The current version of SyncTable would not be applicable here, because it would simply copy the whole state from source to target, losing any additional rows that exist only in the target, as well as cell values that received the most recent update. This could be solved by adding an option to skip deletes for SyncTable. This way, the additional cells not present on the source would still be kept. For cells with the same identifier but different values, it would just perform a Put for the cell version from the source, but client scans would still fetch the most recent timestamp.
> I'm attaching a patch with this additional option shortly. Please share your thoughts.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
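
Editorial note: the behaviour proposed in the issue description above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions, not HBase or SyncTable code. A table is modelled as a plain dictionary keyed by (row, column, timestamp), `sync`, `read_latest`, and the `do_deletes` parameter are hypothetical names chosen for illustration, and real HBase deletes (tombstones) and version limits are ignored.

```python
from typing import Dict, Optional, Tuple

# Toy stand-in for an HBase table: (row, column, timestamp) -> value.
CellKey = Tuple[str, str, int]
Table = Dict[CellKey, bytes]


def sync(source: Table, target: Table, do_deletes: bool = True) -> Table:
    """Return the target's contents after a one-way sync from source.

    `do_deletes` is a hypothetical name for the option proposed in the issue.
    """
    result = dict(target)
    # Put every source cell that is missing from, or differs on, the target.
    for key, value in source.items():
        if result.get(key) != value:
            result[key] = value
    # Default behaviour: drop target cells that the source does not have.
    if do_deletes:
        for key in list(result):
            if key not in source:
                del result[key]
    return result


def read_latest(table: Table, row: str, column: str) -> Optional[bytes]:
    """Mimic a default read: return the value with the highest timestamp."""
    versions = [(ts, value) for (r, c, ts), value in table.items()
                if r == row and c == column]
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


source = {("r1", "cf:a", 100): b"older", ("r2", "cf:a", 100): b"source-only"}
target = {("r1", "cf:a", 200): b"newer", ("r3", "cf:a", 150): b"target-only"}

full = sync(source, target)                      # deletes enabled
merged = sync(source, target, do_deletes=False)  # deletes skipped

print(read_latest(full, "r1", "cf:a"), read_latest(full, "r3", "cf:a"))
print(read_latest(merged, "r1", "cf:a"), read_latest(merged, "r3", "cf:a"))
```

With deletes enabled, the modelled target ends up identical to the source, so the newer value of `r1` and the target-only row `r3` are lost. With deletes skipped, `r3` survives, the source-only row `r2` is added, and a read of `r1` still returns the newer value because the older source version is merely stored alongside it.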
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840084#comment-16840084 ]

Hudson commented on HBASE-20305:

Results for branch branch-2.2
[build #260 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/260/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/260//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/260//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/260//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(x) {color:red}-1 client integration test{color}
-- Failed when running client tests on top of Hadoop 2. [see log for details|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.2/260//artifact/output-integration/hadoop-2.log]. (note that this means we didn't run on Hadoop 3)
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16840048#comment-16840048 ]

Hudson commented on HBASE-20305:

Results for branch branch-2.1
[build #1143 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1143/]: (x) *{color:red}-1 overall{color}*

details (if available):

(/) {color:green}+1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1143//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1143//JDK8_Nightly_Build_Report_(Hadoop2)/]

(x) {color:red}-1 jdk8 hadoop3 checks{color}
-- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-2.1/1143//JDK8_Nightly_Build_Report_(Hadoop3)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.

(/) {color:green}+1 client integration test{color}
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839987#comment-16839987 ]

Hudson commented on HBASE-20305:

Results for branch branch-1.4
[build #796 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/796/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/796//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/796//JDK7_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- Something went wrong running this stage, please [check relevant console output|https://builds.apache.org/job/HBase%20Nightly/job/branch-1.4/796//console].

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839960#comment-16839960 ]

Hudson commented on HBASE-20305:

Results for branch branch-1
[build #824 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/824/]: (x) *{color:red}-1 overall{color}*

details (if available):

(x) {color:red}-1 general checks{color}
-- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/824//General_Nightly_Build_Report/]

(x) {color:red}-1 jdk7 checks{color}
-- For more information [see jdk7 report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/824//JDK7_Nightly_Build_Report/]

(x) {color:red}-1 jdk8 hadoop2 checks{color}
-- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/branch-1/824//JDK8_Nightly_Build_Report_(Hadoop2)/]

(/) {color:green}+1 source release artifact{color}
-- See build output for details.
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16839829#comment-16839829 ]

Andrew Purtell commented on HBASE-20305:

Thanks, committing now

> Add option to SyncTable that skip deletes on target cluster
> ---
>
> Key: HBASE-20305
> URL: https://issues.apache.org/jira/browse/HBASE-20305
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 2.0.0-alpha-4
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: 0001-HBASE-20305.master.001.patch, HBASE-20305.branch-1.001.patch, HBASE-20305.branch-2.001.patch, HBASE-20305.master.002.patch
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838620#comment-16838620 ]

HBase QA commented on HBASE-20305:
--

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 44s{color} | {color:blue} Docker mode activated. {color} |
|| || || || {color:brown} Prechecks {color} ||
| {color:blue}0{color} | {color:blue} findbugs {color} | {color:blue} 0m 1s{color} | {color:blue} Findbugs executables are not available. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s{color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s{color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s{color} | {color:green} The patch appears to include 1 new or modified test files. {color} |
|| || || || {color:brown} branch-1 Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 52s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 43s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 23s{color} | {color:green} branch-1 passed {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 45s{color} | {color:green} branch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 27s{color} | {color:green} branch-1 passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 36s{color} | {color:green} branch-1 passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 41s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 38s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 38s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 39s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 39s{color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 22s{color} | {color:red} hbase-server: The patch generated 2 new + 10 unchanged - 4 fixed = 12 total (was 14) {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s{color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} shadedjars {color} | {color:green} 2m 50s{color} | {color:green} patch has no errors when building our shaded downstream artifacts. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 1m 42s{color} | {color:green} Patch does not cause any errors with Hadoop 2.7.4. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 26s{color} | {color:green} the patch passed with JDK v1.8.0_212 {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 35s{color} | {color:green} the patch passed with JDK v1.7.0_222 {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:red}-1{color} | {color:red} unit {color} | {color:red}127m 7s{color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 29s{color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black}147m 38s{color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.coprocessor.TestMetaTableMetrics |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce base: https://builds.apache.org/job/PreCommit-HBASE-Build/296/artifact/patchprocess/Dockerfile |
| JIRA Issue | HBASE-20305 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12968547/HBASE-20305.branch-1.001.patch |
| Optional Tests | dupname asflicense javac javadoc unit fi
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16838512#comment-16838512 ]

Wellington Chevreuil commented on HBASE-20305:
--

Hi [~apurtell], sorry for the late reply. I found a surge in demand for this feature on earlier releases, then realised it had only gone into the master branch. I have attached patches for branch-2 and branch-1.

> Add option to SyncTable that skip deletes on target cluster
> ---
>
> Key: HBASE-20305
> URL: https://issues.apache.org/jira/browse/HBASE-20305
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 2.0.0-alpha-4
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: 0001-HBASE-20305.master.001.patch, HBASE-20305.branch-1.001.patch, HBASE-20305.branch-2.001.patch, HBASE-20305.master.002.patch
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815617#comment-16815617 ]

Andrew Purtell commented on HBASE-20305:

Any progress here? Or unschedule it? Or close it?

> Add option to SyncTable that skip deletes on target cluster
> ---
>
> Key: HBASE-20305
> URL: https://issues.apache.org/jira/browse/HBASE-20305
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 2.0.0-alpha-4
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.3.0
>
> Attachments: 0001-HBASE-20305.master.001.patch, HBASE-20305.master.002.patch
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16542248#comment-16542248 ]

Andrew Purtell commented on HBASE-20305:

No concerns here

> Add option to SyncTable that skip deletes on target cluster
> ---
>
> Key: HBASE-20305
> URL: https://issues.apache.org/jira/browse/HBASE-20305
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 2.0.0-alpha-4
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: 0001-HBASE-20305.master.001.patch, HBASE-20305.master.002.patch
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541892#comment-16541892 ]

Sean Busbey commented on HBASE-20305:
-

tentatively adding to scope of next minor releases. If I don't hear an objection I'll backport this later this week.

> Add option to SyncTable that skip deletes on target cluster
> ---
>
> Key: HBASE-20305
> URL: https://issues.apache.org/jira/browse/HBASE-20305
> Project: HBase
> Issue Type: Improvement
> Components: mapreduce
> Affects Versions: 2.0.0-alpha-4
> Reporter: Wellington Chevreuil
> Assignee: Wellington Chevreuil
> Priority: Minor
> Fix For: 3.0.0, 1.5.0, 2.2.0
>
> Attachments: 0001-HBASE-20305.master.001.patch, HBASE-20305.master.002.patch
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430073#comment-16430073 ] Hudson commented on HBASE-20305: Results for branch HBASE-19064 [build #90 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90/]: (x) *{color:red}-1 overall{color}* details (if available): (x) {color:red}-1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/HBASE-19064/90//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Add option to SyncTable that skip deletes on target cluster > --- > > Key: HBASE-20305 > URL: https://issues.apache.org/jira/browse/HBASE-20305 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0-alpha-4 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Fix For: 3.0.0 > > Attachments: 0001-HBASE-20305.master.001.patch, > HBASE-20305.master.002.patch > > > We had a situation where two clusters with active-active replication got out > of sync, but both had data that should be kept. The tables in question never > have data deleted, but ingestion had happened on the two different clusters, > some rows had been even updated. > In this scenario, a cell that is present in one of the table clusters should > not be deleted, but replayed on the other. Also, for cells with same > identifier but different values, the most recent value should be kept. 
> Current version of SyncTable would not be applicable here, because it would > simply copy the whole state from source to target, then losing any additional > rows that might be only in target, as well as cell values that got most > recent update. This could be solved by adding an option to skip deletes for > SyncTable. This way, the additional cells not present on source would still > be kept. For cells with same identifier but different values, it would just > perform a Put for the cell version from source, but client scans would still > fetch the most recent timestamp. > I'm attaching a patch with this additional option shortly. Please share your > thoughts. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
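The behaviour described in the issue above can be illustrated with a small, self-contained Java sketch. It is a toy in-memory model, not the HBase client API and not the attached patch: the class `VersionedCells` and its methods are hypothetical names chosen for illustration. It assumes each cell keeps multiple timestamped versions and that a default read returns the version with the highest timestamp. Under those assumptions, replaying source cells as Puts without any deletes keeps target-only cells and does not shadow a newer target value.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of a versioned cell store (hypothetical; NOT the HBase API). */
final class VersionedCells {
    // cell identifier (e.g. "row/family:qualifier") -> (timestamp -> value)
    private final Map<String, TreeMap<Long, String>> cells = new TreeMap<>();

    /** Stores one version of a cell; an existing version with the same timestamp is overwritten. */
    void put(String cellId, long timestamp, String value) {
        cells.computeIfAbsent(cellId, k -> new TreeMap<>()).put(timestamp, value);
    }

    /** Models a default client read: the value with the highest timestamp, or null if absent. */
    String latest(String cellId) {
        TreeMap<Long, String> versions = cells.get(cellId);
        return versions == null ? null : versions.lastEntry().getValue();
    }

    /** Replays every version found in source into this store as Puts; never deletes anything. */
    void putAllFrom(VersionedCells source) {
        source.cells.forEach((cellId, versions) ->
            versions.forEach((timestamp, value) -> put(cellId, timestamp, value)));
    }

    public static void main(String[] args) {
        VersionedCells source = new VersionedCells();
        VersionedCells target = new VersionedCells();

        source.put("row1/cf:q", 100L, "old");          // older version of a shared cell
        source.put("row2/cf:q", 100L, "source-only");  // missing on target
        target.put("row1/cf:q", 200L, "new");          // more recent update on target
        target.put("row3/cf:q", 150L, "target-only");  // missing on source

        target.putAllFrom(source);                      // sync with deletes skipped

        System.out.println(target.latest("row1/cf:q")); // new
        System.out.println(target.latest("row2/cf:q")); // source-only
        System.out.println(target.latest("row3/cf:q")); // target-only
    }
}
```

A sync that also issued deletes would remove `row3` from the target in this model, which is the data loss the issue is trying to avoid.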
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426890#comment-16426890 ] Hudson commented on HBASE-20305: Results for branch master [build #284 on builds.a.o|https://builds.apache.org/job/HBase%20Nightly/job/master/284/]: (x) *{color:red}-1 overall{color}* details (if available): (/) {color:green}+1 general checks{color} -- For more information [see general report|https://builds.apache.org/job/HBase%20Nightly/job/master/284//General_Nightly_Build_Report/] (x) {color:red}-1 jdk8 hadoop2 checks{color} -- For more information [see jdk8 (hadoop2) report|https://builds.apache.org/job/HBase%20Nightly/job/master/284//JDK8_Nightly_Build_Report_(Hadoop2)/] (x) {color:red}-1 jdk8 hadoop3 checks{color} -- For more information [see jdk8 (hadoop3) report|https://builds.apache.org/job/HBase%20Nightly/job/master/284//JDK8_Nightly_Build_Report_(Hadoop3)/] (/) {color:green}+1 source release artifact{color} -- See build output for details. > Add option to SyncTable that skip deletes on target cluster > --- > > Key: HBASE-20305 > URL: https://issues.apache.org/jira/browse/HBASE-20305 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0-alpha-4 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Fix For: 3.0.0 > > Attachments: 0001-HBASE-20305.master.001.patch, > HBASE-20305.master.002.patch
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16426639#comment-16426639 ] Wellington Chevreuil commented on HBASE-20305: -- Thanks for reviewing it [~yuzhih...@gmail.com] [~davelatham] ! > Add option to SyncTable that skip deletes on target cluster > --- > > Key: HBASE-20305 > URL: https://issues.apache.org/jira/browse/HBASE-20305 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0-alpha-4 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Fix For: 3.0.0 > > Attachments: 0001-HBASE-20305.master.001.patch, > HBASE-20305.master.002.patch
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425932#comment-16425932 ] Ted Yu commented on HBASE-20305: {code} [ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:2.5.2:install (default-install) on project hbase-thrift: Failed to install metadata org.apache.hbase:hbase-thrift:3.0.0-SNAPSHOT/maven-metadata.xml: Could not parse metadata /home/jenkins/.m2/repository/org/apache/hbase/hbase-thrift/3.0.0-SNAPSHOT/maven-metadata-local.xml: in epilog non whitespace content is not allowed but got / (position: END_TAG seen ...\n/... @25:2) -> [Help 1] {code} The above occurred in other QA runs too - not related to the patch. > Add option to SyncTable that skip deletes on target cluster > --- > > Key: HBASE-20305 > URL: https://issues.apache.org/jira/browse/HBASE-20305 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0-alpha-4 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Fix For: 3.0.0 > > Attachments: 0001-HBASE-20305.master.001.patch, > HBASE-20305.master.002.patch
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16425900#comment-16425900 ] Dave Latham commented on HBASE-20305: - +1 assuming the hadoopcheck errors are worked out. > Add option to SyncTable that skip deletes on target cluster > --- > > Key: HBASE-20305 > URL: https://issues.apache.org/jira/browse/HBASE-20305 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0-alpha-4 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Attachments: 0001-HBASE-20305.master.001.patch, > HBASE-20305.master.002.patch
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424548#comment-16424548 ] Ted Yu commented on HBASE-20305: lgtm > Add option to SyncTable that skip deletes on target cluster > --- > > Key: HBASE-20305 > URL: https://issues.apache.org/jira/browse/HBASE-20305 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0-alpha-4 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Attachments: 0001-HBASE-20305.master.001.patch, > HBASE-20305.master.002.patch
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424450#comment-16424450 ]

Hadoop QA commented on HBASE-20305:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 10s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 4m 49s | master passed |
| +1 | compile | 0m 28s | master passed |
| +1 | checkstyle | 0m 16s | master passed |
| +1 | shadedjars | 4m 59s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 0m 38s | master passed |
| +1 | javadoc | 0m 15s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 4m 48s | the patch passed |
| +1 | compile | 0m 30s | the patch passed |
| +1 | javac | 0m 30s | the patch passed |
| +1 | checkstyle | 0m 16s | hbase-mapreduce: The patch generated 0 new + 7 unchanged - 1 fixed = 7 total (was 8) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 4m 37s | patch has no errors when building our shaded downstream artifacts. |
| -1 | hadoopcheck | 7m 2s | The patch causes 10 errors with Hadoop v2.6.5. |
| -1 | hadoopcheck | 9m 11s | The patch causes 10 errors with Hadoop v2.7.4. |
| -1 | hadoopcheck | 11m 35s | The patch causes 10 errors with Hadoop v3.0.0. |
| +1 | findbugs | 0m 46s | the patch passed |
| +1 | javadoc | 0m 13s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 13m 20s | hbase-mapreduce in the patch passed. |
| +1 | asflicense | 0m 10s | The patch does not generate ASF License warnings. |
| | | 43m 14s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20305 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12917413/HBASE-20305.master.002.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 129c4a6a4fc9 4.4.0-43-generic #63-Ubuntu SMP Wed Oct 12 13:48:03 UTC 2016 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / bf29a1fee9 |
| maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
| hadoopcheck | https://builds.apache.org/j
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424382#comment-16424382 ] Wellington Chevreuil commented on HBASE-20305: -- Added new patch with the suggested changes and additional tests for doPuts. > Add option to SyncTable that skip deletes on target cluster > --- > > Key: HBASE-20305 > URL: https://issues.apache.org/jira/browse/HBASE-20305 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0-alpha-4 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Attachments: 0001-HBASE-20305.master.001.patch, > HBASE-20305.master.002.patch > > > We had a situation where two clusters with active-active replication got out > of sync, but both had data that should be kept. The tables in question never > have data deleted, but ingestion had happened on the two different clusters, > some rows had been even updated. > In this scenario, a cell that is present in one of the table clusters should > not be deleted, but replayed on the other. Also, for cells with same > identifier but different values, the most recent value should be kept. > Current version of SyncTable would not be applicable here, because it would > simply copy the whole state from source to target, then losing any additional > rows that might be only in target, as well as cell values that got most > recent update. This could be solved by adding an option to skip deletes for > SyncTable. This way, the additional cells not present on source would still > be kept. For cells with same identifier but different values, it would just > perform a Put for the cell version from source, but client scans would still > fetch the most recent timestamp. > I'm attaching a patch with this additional option shortly. Please share your > thoughts. > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16424058#comment-16424058 ] Wellington Chevreuil commented on HBASE-20305: -- Thanks for the review and suggestions [~davelatham]! Indeed, *doDeletes* sounds more intuitive. I'm gonna work on this and the additional *doPuts,* together with reviews for checkstyle violations. > Add option to SyncTable that skip deletes on target cluster > --- > > Key: HBASE-20305 > URL: https://issues.apache.org/jira/browse/HBASE-20305 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0-alpha-4 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Attachments: 0001-HBASE-20305.master.001.patch
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420735#comment-16420735 ]

Hadoop QA commented on HBASE-20305:
---

| (x) *-1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 16s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | hbaseanti | 0m 0s | Patch does not have any anti-patterns. |
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 5m 35s | master passed |
| +1 | compile | 0m 35s | master passed |
| +1 | checkstyle | 0m 18s | master passed |
| +1 | shadedjars | 5m 38s | branch has no errors when building our shaded downstream artifacts. |
| +1 | findbugs | 0m 41s | master passed |
| +1 | javadoc | 0m 15s | master passed |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 5m 23s | the patch passed |
| +1 | compile | 0m 34s | the patch passed |
| +1 | javac | 0m 34s | the patch passed |
| -1 | checkstyle | 0m 17s | hbase-mapreduce: The patch generated 1 new + 7 unchanged - 1 fixed = 8 total (was 8) |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 | shadedjars | 5m 7s | patch has no errors when building our shaded downstream artifacts. |
| +1 | hadoopcheck | 22m 14s | Patch does not cause any errors with Hadoop 2.6.5 2.7.4 or 3.0.0. |
| +1 | findbugs | 0m 50s | the patch passed |
| +1 | javadoc | 0m 16s | the patch passed |
|| || || || Other Tests ||
| +1 | unit | 17m 30s | hbase-mapreduce in the patch passed. |
| +1 | asflicense | 0m 11s | The patch does not generate ASF License warnings. |
| | | 60m 34s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:d8b550f |
| JIRA Issue | HBASE-20305 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12916692/0001-HBASE-20305.master.001.patch |
| Optional Tests | asflicense javac javadoc unit findbugs shadedjars hadoopcheck hbaseanti checkstyle compile |
| uname | Linux 50812b01c7f5 4.4.0-98-generic #121-Ubuntu SMP Tue Oct 10 14:24:03 UTC 2017 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / d57001ee2d |
| maven | version: Apache Maven 3.5.3 (3383c37e1f9e9b3bc3df5050c29c8aff9f295297; 2018-02-24T19:49:05Z) |
| Default Java | 1.8.0_162 |
| findbugs | v3.1.0-RC3 |
| checkstyle | https://builds.apache.org/job/PreCommit-HBASE-Build/12245/artifact/patchprocess/diff-checkstyle-hbase-mapreduce.txt |
| Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/12245/testReport/ |
| Max. process+thread count | 4987 (vs. ulimit of 1) |
| modules | C: hbase-mapreduce U: hbase-mapreduce |
|
[jira] [Commented] (HBASE-20305) Add option to SyncTable that skip deletes on target cluster
[ https://issues.apache.org/jira/browse/HBASE-20305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16420663#comment-16420663 ] Dave Latham commented on HBASE-20305: - Thanks for the patch, Wellington. Looks like it also fixes a nasty bug where dryRun doesn't actually appear to work. I hope that did not bite you in using the tool. I think I'd prefer calling the option something like doDeletes (default true) rather than insertsOnly (default false). Could also have a similar option for doPuts to allow people to do deletes but not puts if they preferred. +0.9 as is. I'm going to hit the Submit Patch button to try to get Hadoop QA to take a pass. > Add option to SyncTable that skip deletes on target cluster > --- > > Key: HBASE-20305 > URL: https://issues.apache.org/jira/browse/HBASE-20305 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Affects Versions: 2.0.0-alpha-4 >Reporter: Wellington Chevreuil >Assignee: Wellington Chevreuil >Priority: Minor > Attachments: 0001-HBASE-20305.master.001.patch
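The two switches suggested in this comment, doDeletes and doPuts (both defaulting to true), can be sketched as a pair of independent filters over the operations a sync would otherwise issue. The Java below is a hypothetical toy model and not code from the patch: the class `SyncPlanSketch`, its `plan` method, and the simplification of each table to a flat key-to-value map (no timestamps, no column families) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the doPuts/doDeletes switches (hypothetical; not the SyncTable implementation). */
final class SyncPlanSketch {

    /** Lists the operations needed to make target match source, filtered by the two switches. */
    static List<String> plan(Map<String, String> source, Map<String, String> target,
                             boolean doPuts, boolean doDeletes) {
        List<String> ops = new ArrayList<>();
        // A cell that is missing on target, or holds a different value, needs a Put.
        for (Map.Entry<String, String> cell : source.entrySet()) {
            if (doPuts && !cell.getValue().equals(target.get(cell.getKey()))) {
                ops.add("PUT " + cell.getKey() + "=" + cell.getValue());
            }
        }
        // A cell that exists only on target needs a Delete, unless deletes are switched off.
        for (String key : target.keySet()) {
            if (doDeletes && !source.containsKey(key)) {
                ops.add("DELETE " + key);
            }
        }
        return ops;
    }

    public static void main(String[] args) {
        Map<String, String> source = new TreeMap<>();
        source.put("r1", "a");
        source.put("r2", "b");
        Map<String, String> target = new TreeMap<>();
        target.put("r2", "x");
        target.put("r3", "c");

        System.out.println(plan(source, target, true, true));   // [PUT r1=a, PUT r2=b, DELETE r3]
        System.out.println(plan(source, target, true, false));  // [PUT r1=a, PUT r2=b]
        System.out.println(plan(source, target, false, true));  // [DELETE r3]
    }
}
```

Keeping the two flags independent, rather than a single insertsOnly flag, lets the same code path express all three modes mentioned in the comment: full sync, puts only, and deletes only.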