[ https://issues.apache.org/jira/browse/HBASE-17871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958766#comment-15958766 ]
Tomu Tsuruhara commented on HBASE-17871:
----------------------------------------

[~yangzhe1991] Thanks!
Yes, the patch applies cleanly to branch-1.

> scan#setBatch(int) call leads wrong result of VerifyReplication
> ---------------------------------------------------------------
>
>                 Key: HBASE-17871
>                 URL: https://issues.apache.org/jira/browse/HBASE-17871
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Tomu Tsuruhara
>            Assignee: Tomu Tsuruhara
>            Priority: Minor
>         Attachments: after.png, beforethepatch.png,
> HBASE-17871.master.001.patch, HBASE-17871.master.002.patch,
> HBASE-17871.master.003.patch, HBASE-17871.master.003.patch,
> HBASE-17871.master.004.patch
>
>
> The VerifyReplication tool printed weird logs.
> {noformat}
> 2017-04-03 23:30:50,252 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: CONTENT_DIFFERENT_ROWS, rowkey=a00001001930000
> 2017-04-03 23:30:50,280 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: ONLY_IN_PEER_TABLE_ROWS, rowkey=a00001001930000
> 2017-04-03 23:30:50,387 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: CONTENT_DIFFERENT_ROWS, rowkey=a00001003850000
> 2017-04-03 23:30:50,414 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: ONLY_IN_PEER_TABLE_ROWS, rowkey=a00001003850000
> 2017-04-03 23:30:50,480 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: CONTENT_DIFFERENT_ROWS, rowkey=a00001005320000
> 2017-04-03 23:30:50,508 ERROR [main] org.apache.hadoop.hbase.mapreduce.replication.VerifyReplication: ONLY_IN_PEER_TABLE_ROWS, rowkey=a00001005320000
> {noformat}
> Here, each bad row was marked as both {{CONTENT_DIFFERENT_ROWS}} and
> {{ONLY_IN_PEER_TABLE_ROWS}}.
> This should never happen, so I took a look at the code and found a
> scan.setBatch call.
> {code}
>   @Override
>   public void map(ImmutableBytesWritable row, final Result value,
>       Context context)
>       throws IOException {
>     if (replicatedScanner == null) {
>       ...
>       final Scan scan = new Scan();
>       scan.setBatch(batch);
> {code}
> As stated in HBASE-16376, a {{scan#setBatch(int)}} call implicitly allows
> scan results to be partial.
> Since {{VerifyReplication}} assumes each {{scanner.next()}} call returns an
> entire row, partial results break the compare logic.
> We should avoid the setBatch call here.
> Thanks to RPC chunking (explained in this blog post:
> https://blogs.apache.org/hbase/entry/scan_improvements_in_hbase_1),
> I think that is safe and acceptable.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
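The double reporting described in the quoted issue can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions, not the VerifyReplication code: the class PartialResultDemo, its Row type, and the scan/verify/demo methods are hypothetical names, cells are plain strings, and the comparison is assumed to be a merge-style walk in which the source side delivers whole rows while the peer scanner returns at most `batch` cells per result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified, hypothetical model; none of these names are HBase classes.
class PartialResultDemo {
    /** One scanner result: a row key plus the cells carried by this result. */
    static final class Row {
        final String key;
        final List<String> cells;
        Row(String key, List<String> cells) { this.key = key; this.cells = cells; }
    }

    /** Mimics a batched scan: each row is split into results of at most `batch` cells. */
    static List<Row> scan(List<Row> table, int batch) {
        List<Row> out = new ArrayList<>();
        for (Row r : table) {
            if (batch <= 0) { out.add(r); continue; } // no batch limit: whole row
            for (int i = 0; i < r.cells.size(); i += batch) {
                int end = Math.min(i + batch, r.cells.size());
                out.add(new Row(r.key, new ArrayList<>(r.cells.subList(i, end))));
            }
        }
        return out;
    }

    /** Merge-style comparison that assumes every result is a complete row. */
    static List<String> verify(List<Row> source, List<Row> peer) {
        List<String> log = new ArrayList<>();
        Iterator<Row> it = peer.iterator();
        Row current = it.hasNext() ? it.next() : null;
        for (Row value : source) {
            while (true) {
                if (current == null) { // peer exhausted
                    log.add("ONLY_IN_SOURCE_TABLE_ROWS, rowkey=" + value.key);
                    break;
                }
                int cmp = value.key.compareTo(current.key);
                if (cmp == 0) { // same row key: compare contents, then advance peer
                    if (!value.cells.equals(current.cells)) {
                        log.add("CONTENT_DIFFERENT_ROWS, rowkey=" + value.key);
                    }
                    current = it.hasNext() ? it.next() : null;
                    break;
                } else if (cmp < 0) { // source row is missing on the peer
                    log.add("ONLY_IN_SOURCE_TABLE_ROWS, rowkey=" + value.key);
                    break;
                } else { // peer result sorts before the source row
                    log.add("ONLY_IN_PEER_TABLE_ROWS, rowkey=" + current.key);
                    current = it.hasNext() ? it.next() : null;
                }
            }
        }
        while (current != null) { // leftover peer results
            log.add("ONLY_IN_PEER_TABLE_ROWS, rowkey=" + current.key);
            current = it.hasNext() ? it.next() : null;
        }
        return log;
    }

    /** Compares a two-row table with itself, batching only the peer-side scan. */
    static List<String> demo(int peerBatch) {
        List<Row> table = Arrays.asList(
            new Row("a1", Arrays.asList("c1", "c2", "c3")),
            new Row("a2", Arrays.asList("c1")));
        return verify(scan(table, -1), scan(table, peerBatch));
    }

    public static void main(String[] args) {
        demo(2).forEach(System.out::println);
    }
}
```

With a peer batch of 2, row a1 (three cells) comes back as two partial results. The first partial matches the row key but not the content, and the second is then compared against the next source row and reported as peer-only, so the same row key appears under both counters even though the tables are identical. With no batch limit (demo(-1)) the model reports nothing.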