[ https://issues.apache.org/jira/browse/HBASE-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437596#comment-16437596 ]
Roland Teague commented on HBASE-15557:
---------------------------------------

[~davelatham] Can you add documentation for this MR tool to the HBase Ref Guide on how to use the tool? This has been open for 2 years now.

> document SyncTable in ref guide
> -------------------------------
>
>                 Key: HBASE-15557
>                 URL: https://issues.apache.org/jira/browse/HBASE-15557
>             Project: HBase
>          Issue Type: Bug
>          Components: documentation
>    Affects Versions: 1.2.0
>            Reporter: Sean Busbey
>            Priority: Critical
>
> The docs for SyncTable are insufficient. Brief description from a [~davelatham] comment on HBASE-13639:
> {quote}
> Sorry for the lack of better documentation, Abhishek Soni. Thanks for bringing it up. I'll try to provide a better explanation. You may have already seen it, but if not, the design doc linked in the description above may also give you some better clues as to how it should be used.
> Briefly, the feature is intended to start with a pair of tables in remote clusters that are already substantially similar and make them identical by comparing hashes of the data and copying only the diffs, instead of having to copy the entire table. So it is targeted at a very specific use case (with some work it could generalize to cover things like CopyTable and VerifyReplication, but it's not there yet). To use it, you choose one table to be the "source", and the other table is the "target". After the process is complete, the target table should end up being identical to the source table.
> In the source table's cluster, run org.apache.hadoop.hbase.mapreduce.HashTable and pass it the name of the source table and an output directory in HDFS. HashTable will scan the source table, break the data up into row key ranges (default of 8kB per range), and produce a hash of the data for each range.
> Make the hashes available to the target cluster - I'd recommend using DistCp to copy them across.
> In the target table's cluster, run org.apache.hadoop.hbase.mapreduce.SyncTable and pass it the directory where you put the hashes, and the names of the source and target tables. You will likely also need to specify the source table's ZK quorum via the --sourcezkcluster option. SyncTable will then read the hash information, and compute the hashes of the same row ranges for the target table. For any row range where the hash fails to match, it will open a remote scanner to the source table, read the data for that range, and do Puts and Deletes to the target table to update it to match the source.
> I hope that clarifies it a bit. Let me know if you need a hand. If anyone wants to work on getting some documentation into the book, I can try to write some more, but would love a hand on turning it into an actual book patch.
> {quote}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
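The workflow quoted above can be sketched as a command sequence. This is an illustrative sketch, not tested against a live cluster: the table name, HDFS paths, NameNode hosts, and ZooKeeper quorum are all hypothetical, and the exact option names should be checked against `hbase org.apache.hadoop.hbase.mapreduce.HashTable --help` on your version.

```shell
# Step 1 - on the SOURCE cluster: scan the source table, split it into
# row key ranges, and write a hash per range into an HDFS directory.
# (table name and output path are hypothetical)
hbase org.apache.hadoop.hbase.mapreduce.HashTable \
  my_table /hashes/my_table

# Step 2 - make the hashes available to the target cluster,
# e.g. with DistCp (NameNode hosts are hypothetical):
hadoop distcp hdfs://source-nn:8020/hashes/my_table \
  hdfs://target-nn:8020/hashes/my_table

# Step 3 - on the TARGET cluster: compare hashes for the same row ranges
# and Put/Delete only the ranges that differ. The source cluster's ZK
# quorum is passed with --sourcezkcluster (quorum string is hypothetical).
hbase org.apache.hadoop.hbase.mapreduce.SyncTable \
  --sourcezkcluster=source-zk1,source-zk2,source-zk3:2181:/hbase \
  /hashes/my_table my_table my_table
```

SyncTable also accepts a `--dryrun` option in recent HBase versions, which reports the differing ranges without writing to the target table; running a dry run first is a reasonable sanity check before letting it mutate the target.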