[ https://issues.apache.org/jira/browse/HBASE-20769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16527098#comment-16527098 ]
Jingyun Tian commented on HBASE-20769:
--------------------------------------

[~apurtell] Sorry for the delay. Patch uploaded.

> getSplits() has an out-of-bounds problem in TableSnapshotInputFormatImpl
> -----------------------------------------------------------------------
>
>                 Key: HBASE-20769
>                 URL: https://issues.apache.org/jira/browse/HBASE-20769
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 1.4.0, 2.0.0
>            Reporter: Jingyun Tian
>            Assignee: Jingyun Tian
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: HBASE-20769.branch-1.001.patch,
> HBASE-20769.master.001.patch, HBASE-20769.master.002.patch,
> HBASE-20769.master.003.patch, HBASE-20769.master.004.patch
>
>
> When numSplits > 1, getSplits may create a split whose start row is smaller
> than the start row of the user-specified scan, or whose stop row is larger
> than the scan's stop row.
> {code}
> // sp holds the split points across the region's key range; inclusive = true
> // means the region's start and end keys are included.
> byte[][] sp = sa.split(hri.getStartKey(), hri.getEndKey(), numSplits, true);
> for (int i = 0; i < sp.length - 1; i++) {
>   if (PrivateCellUtil.overlappingKeys(scan.getStartRow(), scan.getStopRow(), sp[i], sp[i + 1])) {
>     List<String> hosts =
>         calculateLocationsForInputSplit(conf, htd, hri, tableDir, localityEnabled);
>     Scan boundedScan = new Scan(scan);
>     // Problem: sp[i] and sp[i + 1] are used as-is, so the bounded scan can
>     // start before scan.getStartRow() or stop after scan.getStopRow().
>     boundedScan.setStartRow(sp[i]);
>     boundedScan.setStopRow(sp[i + 1]);
>     splits.add(new InputSplit(htd, hri, hosts, boundedScan, restoreDir));
>   }
> }
> {code}
> Since the keys are split over the region's range rather than the scan's range,
> when sp[i] < scan.getStartRow() or sp[i + 1] > scan.getStopRow(), the bounded
> scan covers rows outside the user-defined scan range.
> The fix should be simple: clamp each bounded scan to the user's scan range.
> {code}
> // Use the larger of the two start rows and the smaller of the two stop rows.
> boundedScan.setStartRow(
>     Bytes.compareTo(scan.getStartRow(), sp[i]) > 0 ? scan.getStartRow() : sp[i]);
> boundedScan.setStopRow(
>     Bytes.compareTo(scan.getStopRow(), sp[i + 1]) < 0 ? scan.getStopRow() : sp[i + 1]);
> {code}
> I will also try to add unit tests that catch this problem.
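A minimal, dependency-free sketch of the clamping, under stated assumptions: `SplitBoundsSketch`, `compare`, `overlaps`, `clampStart` and `clampStop` are hypothetical names, `compare` stands in for `Bytes.compareTo`, `overlaps` stands in for `PrivateCellUtil.overlappingKeys`, and the split points are made up rather than produced by a real `SplitAlgorithm`. It also guards one edge case worth checking in the proposed fix: HBase treats an empty stop row as "scan to the end of the table", and `Bytes.compareTo` orders an empty array before any non-empty key. The plain ternary would therefore pick the empty (unbounded) scan stop row over `sp[i + 1]`, and likewise pick an empty `sp[i + 1]` over a real scan stop row. The start side needs no guard, because an empty start row already sorts first and loses the "larger of the two" comparison.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch of the split clamping discussed in HBASE-20769.
 * No HBase classes are used: compare() mimics Bytes.compareTo and
 * overlaps() mimics PrivateCellUtil.overlappingKeys. As in HBase, an empty
 * byte array means "unbounded" (table start for a start row, table end for a stop row).
 */
public class SplitBoundsSketch {

  /** Lexicographic comparison of unsigned bytes; a proper prefix sorts first. */
  public static int compare(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  /** True if the half-open ranges [start1, end1) and [start2, end2) share a row key. */
  public static boolean overlaps(byte[] start1, byte[] end1, byte[] start2, byte[] end2) {
    boolean start1BeforeEnd2 = end2.length == 0 || compare(start1, end2) < 0;
    boolean start2BeforeEnd1 = end1.length == 0 || compare(start2, end1) < 0;
    return start1BeforeEnd2 && start2BeforeEnd1;
  }

  /** Larger of the two start rows; an empty start row sorts first, so it never wins. */
  public static byte[] clampStart(byte[] scanStart, byte[] splitStart) {
    return compare(scanStart, splitStart) > 0 ? scanStart : splitStart;
  }

  /** Smaller of the two stop rows; an empty stop row means "no limit", so it must not win. */
  public static byte[] clampStop(byte[] scanStop, byte[] splitStop) {
    if (scanStop.length == 0) {
      return splitStop;
    }
    if (splitStop.length == 0) {
      return scanStop;
    }
    return compare(scanStop, splitStop) < 0 ? scanStop : splitStop;
  }

  private static byte[] key(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  private static String show(byte[] k) {
    return k.length == 0 ? "<unbounded>" : new String(k, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Hypothetical split points for one region, standing in for sa.split(...).
    byte[][] sp = {key("a"), key("d"), key("g"), key("z")};
    byte[] scanStart = key("c");
    byte[] scanStop = key("e");

    for (int i = 0; i < sp.length - 1; i++) {
      if (overlaps(scanStart, scanStop, sp[i], sp[i + 1])) {
        System.out.println("split [" + show(sp[i]) + ", " + show(sp[i + 1]) + ") -> scan ["
            + show(clampStart(scanStart, sp[i])) + ", "
            + show(clampStop(scanStop, sp[i + 1])) + ")");
      }
    }

    // Unbounded scan stop row: the plain ternary keeps the empty array ("to table end").
    byte[] openStop = new byte[0];
    byte[] naive = compare(openStop, sp[1]) < 0 ? openStop : sp[1];
    System.out.println("naive stop: " + show(naive)
        + ", guarded stop: " + show(clampStop(openStop, sp[1])));
  }
}
```

For a scan of `[c, e)`, running `main` prints the clamped ranges `[c, d)` and `[d, e)` (the split `[g, z)` does not overlap and is skipped), then shows the naive stop row staying `<unbounded>` while the guarded one is `d`.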