[jira] [Commented] (HBASE-15557) document SyncTable in ref guide
[ https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677952#comment-16677952 ]

Hadoop QA commented on HBASE-15557:
---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 3m 21s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 7m 16s | master passed |
| 0 | refguide | 5m 11s | branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 4m 33s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| 0 | refguide | 4m 47s | patch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. |
|| || || || Other Tests ||
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 25m 44s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-15557 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12947209/HBASE-15557.master.002.patch |
| Optional Tests | dupname asflicense refguide |
| uname | Linux 3ce3fcc68a65 4.4.0-137-generic #163-Ubuntu SMP Mon Sep 24 13:14:43 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 6d46b8d256 |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| refguide | https://builds.apache.org/job/PreCommit-HBASE-Build/14980/artifact/patchprocess/branch-site/book.html |
| refguide | https://builds.apache.org/job/PreCommit-HBASE-Build/14980/artifact/patchprocess/patch-site/book.html |
| Max. process+thread count | 93 (vs. ulimit of 1) |
| modules | C: . U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14980/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.

> document SyncTable in ref guide
> ---
>
> Key: HBASE-15557
> URL: https://issues.apache.org/jira/browse/HBASE-15557
> Project: HBase
> Issue Type: Bug
> Components: documentation
> Affects Versions: 1.2.0
> Reporter: Sean Busbey
> Assignee: Wellington Chevreuil
> Priority: Critical
> Attachments: HBASE-15557.master.001.patch, HBASE-15557.master.002.patch
>
> The docs for SyncTable are insufficient. Brief description from [~davelatham] HBASE-13639 comment:
> {quote}
> Sorry for the lack of better documentation, Abhishek Soni. Thanks for bringing it up. I'll try to provide a better explanation. You may have already seen it, but if not, the design doc linked in the description above may also give you some better clues as to how it should be used.
>
> Briefly, the feature is intended to start with a pair of tables in remote clusters that are already substantially similar and make them identical by comparing hashes of the data and copying only the diffs instead of having to copy the entire table. So it is targeted at a very specific use case (with some work it could generalize to cover things like CopyTable and VerifyReplication but it's not there yet). To use it, you choose one table to be the "source", and the other table is the "target". After the process is complete the target table should end up being identical to the source table.
>
> In the source table's cluster, run org.apache.hadoop.hbase.mapreduce.HashTable and pass it the name of the source table and an output directory in HDFS. HashTable will scan the source table, break the data up into row key ranges (d
[jira] [Commented] (HBASE-15557) document SyncTable in ref guide
[ https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677913#comment-16677913 ]

Wellington Chevreuil commented on HBASE-15557:
--

Thanks for noticing that, [~busbey]. I have submitted a new patch file with the email info corrected.

> document SyncTable in ref guide
> ---
>
> Key: HBASE-15557
> URL: https://issues.apache.org/jira/browse/HBASE-15557
> Project: HBase
> Issue Type: Bug
> Components: documentation
> Affects Versions: 1.2.0
> Reporter: Sean Busbey
> Assignee: Wellington Chevreuil
> Priority: Critical
> Attachments: HBASE-15557.master.001.patch, HBASE-15557.master.002.patch
>
> The docs for SyncTable are insufficient. Brief description from [~davelatham] HBASE-13639 comment:
> {quote}
> Sorry for the lack of better documentation, Abhishek Soni. Thanks for bringing it up. I'll try to provide a better explanation. You may have already seen it, but if not, the design doc linked in the description above may also give you some better clues as to how it should be used.
>
> Briefly, the feature is intended to start with a pair of tables in remote clusters that are already substantially similar and make them identical by comparing hashes of the data and copying only the diffs instead of having to copy the entire table. So it is targeted at a very specific use case (with some work it could generalize to cover things like CopyTable and VerifyReplication but it's not there yet). To use it, you choose one table to be the "source", and the other table is the "target". After the process is complete the target table should end up being identical to the source table.
>
> In the source table's cluster, run org.apache.hadoop.hbase.mapreduce.HashTable and pass it the name of the source table and an output directory in HDFS. HashTable will scan the source table, break the data up into row key ranges (default of 8kB per range) and produce a hash of the data for each range.
>
> Make the hashes available to the target cluster - I'd recommend using DistCp to copy it across.
>
> In the target table's cluster, run org.apache.hadoop.hbase.mapreduce.SyncTable and pass it the directory where you put the hashes, and the names of the source and destination tables. You will likely also need to specify the source table's ZK quorum via the --sourcezkcluster option. SyncTable will then read the hash information, and compute the hashes of the same row ranges for the target table. For any row range where the hash fails to match, it will open a remote scanner to the source table, read the data for that range, and do Puts and Deletes to the target table to update it to match the source.
>
> I hope that clarifies it a bit. Let me know if you need a hand. If anyone wants to work on getting some documentation into the book, I can try to write some more but would love a hand on turning it into an actual book patch.
> {quote}

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
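
The hash-and-compare idea in the quoted description can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the HBase implementation: tables are plain Python dicts, the range size is approximated by key and value string lengths, SHA-256 is an arbitrary choice of hash for the illustration, and the names `hash_table`, `sync_table`, `rows_in`, and `digest` are hypothetical helpers.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

Table = Dict[str, str]  # toy stand-in for a table: row key -> value
RangeHash = Tuple[str, Optional[str], str]  # (start key, stop key or None, digest)


def rows_in(table: Table, start: str, stop: Optional[str]) -> Table:
    """Rows with start <= key < stop; stop=None means 'to the end of the table'."""
    return {k: v for k, v in table.items() if k >= start and (stop is None or k < stop)}


def digest(rows: Table) -> str:
    """Hash rows in key order so equal contents always give equal digests."""
    h = hashlib.sha256()
    for key in sorted(rows):
        h.update(key.encode() + b"\x00" + rows[key].encode() + b"\x00")
    return h.hexdigest()


def hash_table(source: Table, batch_bytes: int = 8000) -> List[RangeHash]:
    """Split the source into row key ranges of about batch_bytes and hash each one."""
    starts, size = [""], 0  # the first range starts at the empty key
    for key in sorted(source):
        if size >= batch_bytes:
            starts.append(key)
            size = 0
        size += len(key) + len(source[key])
    stops: List[Optional[str]] = [*starts[1:], None]
    return [(s, e, digest(rows_in(source, s, e))) for s, e in zip(starts, stops)]


def sync_table(source: Table, target: Table, hashes: List[RangeHash]) -> Tuple[int, int]:
    """Make target match source, touching only ranges whose hashes differ."""
    puts = deletes = 0
    for start, stop, source_digest in hashes:
        target_rows = rows_in(target, start, stop)
        if digest(target_rows) == source_digest:
            continue  # range already identical: nothing is read from the source
        source_rows = rows_in(source, start, stop)  # stands in for the remote scan
        for key, value in source_rows.items():
            if target_rows.get(key) != value:
                target[key] = value  # Put
                puts += 1
        for key in target_rows:
            if key not in source_rows:
                del target[key]  # Delete
                deletes += 1
    return puts, deletes


if __name__ == "__main__":
    src = {"row1": "a", "row2": "b", "row3": "c", "row4": "d"}
    tgt = {"row1": "a", "row2": "X", "row4": "d", "row9": "z"}
    print(sync_table(src, tgt, hash_table(src, batch_bytes=10)), tgt == src)
```

The point of the design is that only the per-range digests cross between clusters up front; row data from the source is read only for ranges whose digests disagree.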
[jira] [Commented] (HBASE-15557) document SyncTable in ref guide
[ https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16677001#comment-16677001 ]

Sean Busbey commented on HBASE-15557:
-

looks great. I'm all set to push this. [~wchevreuil] is the email address on the current patch the one you want attribution to go to?
[jira] [Commented] (HBASE-15557) document SyncTable in ref guide
[ https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16676973#comment-16676973 ]

Hadoop QA commented on HBASE-15557:
---

| (/) *+1 overall* |

|| Vote || Subsystem || Runtime || Comment ||
| 0 | reexec | 0m 19s | Docker mode activated. |
|| || || || Prechecks ||
| +1 | @author | 0m 0s | The patch does not contain any @author tags. |
|| || || || master Compile Tests ||
| +1 | mvninstall | 6m 9s | master passed |
| 0 | refguide | 6m 0s | branch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. |
|| || || || Patch Compile Tests ||
| +1 | mvninstall | 6m 1s | the patch passed |
| +1 | whitespace | 0m 0s | The patch has no whitespace issues. |
| 0 | refguide | 5m 44s | patch has no errors when building the reference guide. See footer for rendered docs, which you should manually inspect. |
|| || || || Other Tests ||
| +1 | asflicense | 0m 18s | The patch does not generate ASF License warnings. |
| | | 24m 52s | |

|| Subsystem || Report/Notes ||
| Docker | Client=17.05.0-ce Server=17.05.0-ce Image:yetus/hbase:b002b0b |
| JIRA Issue | HBASE-15557 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12945650/HBASE-15557.master.001.patch |
| Optional Tests | dupname asflicense refguide |
| uname | Linux f7729664dd37 4.4.0-138-generic #164-Ubuntu SMP Tue Oct 2 17:16:02 UTC 2018 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build/component/dev-support/hbase-personality.sh |
| git revision | master / 86cbbdea9e |
| maven | version: Apache Maven 3.5.4 (1edded0938998edf8bf061f1ceb3cfdeccf443fe; 2018-06-17T18:33:14Z) |
| refguide | https://builds.apache.org/job/PreCommit-HBASE-Build/14962/artifact/patchprocess/branch-site/book.html |
| refguide | https://builds.apache.org/job/PreCommit-HBASE-Build/14962/artifact/patchprocess/patch-site/book.html |
| Max. process+thread count | 93 (vs. ulimit of 1) |
| modules | C: . U: . |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/14962/console |
| Powered by | Apache Yetus 0.8.0 http://yetus.apache.org |

This message was automatically generated.
[jira] [Commented] (HBASE-15557) document SyncTable in ref guide
[ https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664224#comment-16664224 ]

Wellington Chevreuil commented on HBASE-15557:
--

Attached a patch with my proposed description of HashTable/SyncTable for the ref guide. Please review and let me know if you have any suggestions.
[jira] [Commented] (HBASE-15557) document SyncTable in ref guide
[ https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437596#comment-16437596 ]

Roland Teague commented on HBASE-15557:
---

[~davelatham] Can you add documentation to the HBase Ref Guide on how to use this MR tool? This has been open for 2 years now.
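
As a rough illustration of how the tool is used, the three steps described in the issue (HashTable on the source cluster, DistCp to move the hashes, SyncTable on the target cluster) can be written out as command lines. The sketch below only assembles and prints the commands; it does not run them. The class names and the `--sourcezkcluster` option come from the description in this thread. The table names, HDFS paths, ZooKeeper quorum string, and the helper function names are hypothetical placeholders, and the argument order follows the wording of the description, so check it against the usage output of each tool before relying on it.

```python
import shlex
from typing import List

HASH_TABLE = "org.apache.hadoop.hbase.mapreduce.HashTable"
SYNC_TABLE = "org.apache.hadoop.hbase.mapreduce.SyncTable"


def hash_table_cmd(source_table: str, hash_dir: str) -> List[str]:
    """Step 1, on the source cluster: hash row key ranges of the source table."""
    return ["hbase", HASH_TABLE, source_table, hash_dir]


def distcp_cmd(source_hash_dir: str, target_hash_dir: str) -> List[str]:
    """Step 2: copy the hash output so the target cluster can read it."""
    return ["hadoop", "distcp", source_hash_dir, target_hash_dir]


def sync_table_cmd(hash_dir: str, source_table: str, target_table: str,
                   source_zk: str) -> List[str]:
    """Step 3, on the target cluster: compare hashes and copy only the diffs."""
    return ["hbase", SYNC_TABLE, f"--sourcezkcluster={source_zk}",
            hash_dir, source_table, target_table]


if __name__ == "__main__":
    # Every value below is a made-up placeholder, not taken from the issue.
    steps = [
        hash_table_cmd("tableA", "/hashes/tableA"),
        distcp_cmd("hdfs://source-nn:8020/hashes/tableA",
                   "hdfs://target-nn:8020/hashes/tableA"),
        sync_table_cmd("hdfs://target-nn:8020/hashes/tableA", "tableA", "tableA",
                       "zk1.example.com,zk2.example.com:2181:/hbase"),
    ]
    for step in steps:
        print(shlex.join(step))  # shlex.join requires Python 3.8 or newer
```

Building each command as an argument list rather than a single string keeps quoting explicit if the commands are later passed to `subprocess.run`.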