It is a SkipScanFilter bug？

程磊 Tue, 28 Feb 2017 03:41:33 -0800

When I work on PHOENIX-1193 to make skip scan work with reverse scan and read 
the code of SkipScanFilter.navigate method, it seems that 
SkipScanFilter.navigate method has a bug, please help me see if I am right.



See following simple unit test first,the rowKey is composed of three PInteger 
columns,and the slots of SkipScanFilter are:
[[[1 - 4]], [5, 7], [[9 - 10]]],
when SkipScanFilter.filterKeyValue method is invoked on a KeyValue whose rowKey 
is [2,7,11], obviously SkipScanFilter.filterKeyValue
returns ReturnCode.SEEK_NEXT_USING_HINT and SkipScanFilter.getNextCellHint 
returns [3,5,9], but unfortunately, SkipScanFilter.getNextCellHint actually 
returns [2,8,5,9] , a very strange value, following unit tests failed.


@Test public void testNavigate() { RowKeySchemaBuilder builder = new 
RowKeySchemaBuilder(3); for(int i=0;i<3;i++) { builder.addField( new PDatum() { 
@Override public boolean isNullable() { return false; } @Override public 
PDataType getDataType() { return PInteger.INSTANCE; } @Override public Integer 
getMaxLength() { return PInteger.INSTANCE.getMaxLength(null); } @Override 
public Integer getScale() { return PInteger.INSTANCE.getScale(null); } 
@Override public SortOrder getSortOrder() { return SortOrder.getDefault(); } }, 
false, SortOrder.getDefault()); } List<List<KeyRange>> 
rowKeyColumnRangesList=Arrays.asList( Arrays.asList( 
PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(1), true, 
PInteger.INSTANCE.toBytes(4), true)), Arrays.asList( 
KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(5)), 
KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(7))), Arrays.asList( 
PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(9), true, 
PInteger.INSTANCE.toBytes(10), true)) ); SkipScanFilter skipScanFilter=new 
SkipScanFilter(rowKeyColumnRangesList, builder.build()); 
System.out.println(skipScanFilter); byte[] rowKey=ByteUtil.concat( 
PInteger.INSTANCE.toBytes(2), PInteger.INSTANCE.toBytes(7), 
PInteger.INSTANCE.toBytes(11)); KeyValue 
keyValue=KeyValue.createFirstOnRow(rowKey); ReturnCode 
returnCode=skipScanFilter.filterKeyValue(keyValue); assertTrue(returnCode == 
ReturnCode.SEEK_NEXT_USING_HINT); Cell 
nextCellHint=skipScanFilter.getNextCellHint(keyValue); 
assertTrue(Bytes.toStringBinary(CellUtil.cloneRow(nextCellHint)).equals("\\x80\\x00\\x00\\x03\\x80\\x00\\x00\\x05\\x80\\x00\\x00\\x09"));
 }


Let us see what's wrong, first column of rowKey [2,7,11 ] is 2, which is in 
SkipScanFilter's first slot range [1-4], so position[0] is 0 and we go to the 
second column 7, which match the second range [7] of SkipScanFilter's second 
slot [5, 7],so position[1] is 1 and we go to the third column 11, which is 
bigger than the third slot range [9 - 10],so position[2] is 0 and we begin to 
backtrack to second column, because the second range [7] of SkipScanFilter's 
second slot is singleKey and there is no more range,so position[1] is 0 and we 
continue to backtrack to first column, because the first slot range [1-4] is 
not singleKey so we stop backtracking at first column.


Now the problem comes, in following line 448 of SkipScanFilter.navigate 
method,SkipScanFilter.setStartKey method is invoked,first copy rowKey columns 
before ptr to SkipScanFilter.startKey, because now ptr still point to the third 
column, so copy the first and second columns to 
SkipScanFilter.startKey,SkipScanFilter.startKey is [2,7] after this step , then 
setStartKey method copy the lower bound SkipScanFilter.slots from j+1 column, 
accoring to SkipScanFilter.position array,now j is 0, and both position[1] and 
position[2] are 0,so SkipScanFilter.startKey becomes [2,7,5,9], and in 
following line 457, ByteUtil.nextKey is invoked on [2,7], [2,7] is incremented 
to [2,8], finally SkipScanFilter.startKey is [2,8,5,9].


| 448 int currentLength = setStartKey(ptr, minOffset, j+1, nSlots, false); 449 
// From here on, we use startKey as our buffer (resetting minOffset and 
maxOffset) 450 // We've copied the part of the current key above that we need 
into startKey 451 // Reinitialize the iterator to be positioned at previous 
slot position 452 minOffset = 0; 453 maxOffset = startKeyLength; 454 
schema.iterator(startKey, minOffset, maxOffset, ptr, 
ScanUtil.getRowKeyPosition(slotSpan, j)+1); 455 // Do nextKey after setting the 
accessor b/c otherwise the null byte may have 456 // been incremented causing 
us not to find it 457 ByteUtil.nextKey(startKey, currentLength); |


From above analysis, we can see the second column is repeatedly copied to 
SkipScanFilter.startKey, copy 7 to SkipScanFilter.startKey is error ,before 
SkipScanFilter.setStartKey method is invoked in line 448, we should first move 
ptr to j+1, that is to say ,following code might be inserted before line 448:


schema.reposition( ptr, ScanUtil.getRowKeyPosition(slotSpan, i), 
ScanUtil.getRowKeyPosition(slotSpan, j+1), minOffset, maxOffset, slotSpan[j+1]);


Another problem is why the existing SkipScanFilter unit tests are ok? That is 
because in all existing unit test cases, backtrack column is just before 
current column which ptr points to, i.e. i==j+1, and this problem may not 
affect the final query result , but obviously, it may lead to the erroneous 
startKey and cause the Skip Scan inefficient.

It is a SkipScanFilter bug？

Reply via email to