[ https://issues.apache.org/jira/browse/PHOENIX-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892629#comment-15892629 ]
chenglei edited comment on PHOENIX-3705 at 3/2/17 5:24 PM: ----------------------------------------------------------- I write following unit test for Row Value Constructor(RVC), ,the rowKey is composed of four PInteger columns,and the slots of SkipScanFilter are: [ [[(1,2) - (3,4)]], [5, 7], [[9 - 10]] ], the first slot is RVC,when SkipScanFilter.filterKeyValue method is invoked on a KeyValue whose rowKey is [2,3,7,11], SkipScanFilter.filterKeyValue should return ReturnCode.SEEK_NEXT_USING_HINT and SkipScanFilter.getNextCellHint should return [2,4,5,9]. {code} @Test public void testRVCSkipScanFilter() throws Exception { RowKeySchemaBuilder builder = new RowKeySchemaBuilder(4); for(int i=0;i<4;i++) { builder.addField( new PDatum() { @Override public boolean isNullable() { return false; } @Override public PDataType getDataType() { return PInteger.INSTANCE; } @Override public Integer getMaxLength() { return PInteger.INSTANCE.getMaxLength(null); } @Override public Integer getScale() { return PInteger.INSTANCE.getScale(null); } @Override public SortOrder getSortOrder() { return SortOrder.getDefault(); } }, false, SortOrder.getDefault()); } List<List<RowKeyRange>> rowKeyColumnRangesList=Arrays.asList( Arrays.asList( RowKeyRange.createRowKeyRange( ByteUtil.concat(PInteger.INSTANCE.toBytes(1),PInteger.INSTANCE.toBytes(2)), true, ByteUtil.concat(PInteger.INSTANCE.toBytes(3),PInteger.INSTANCE.toBytes(4)), true)), Arrays.asList( RowKeyRange.createRowKeyRange(PInteger.INSTANCE.toBytes(5)), RowKeyRange.createRowKeyRange(PInteger.INSTANCE.toBytes(7))), Arrays.asList( PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(9), true, PInteger.INSTANCE.toBytes(10), true)) ); SkipScanFilter skipScanFilter=new SkipScanFilter(rowKeyColumnRangesList, new int[]{1,0,0},builder.build()); byte[] rowKey=ByteUtil.concat( PInteger.INSTANCE.toBytes(2), PInteger.INSTANCE.toBytes(3), PInteger.INSTANCE.toBytes(7), PInteger.INSTANCE.toBytes(11)); KeyValue keyValue=KeyValue.createFirstOnRow(rowKey); ReturnCode returnCode=skipScanFilter.filterKeyValue(keyValue); assertTrue(returnCode == ReturnCode.SEEK_NEXT_USING_HINT); Cell nextCellHint=skipScanFilter.getNextCellHint(keyValue); assertTrue(Arrays.equals( CellUtil.cloneRow(nextCellHint), ByteUtil.concat( PInteger.INSTANCE.toBytes(2), PInteger.INSTANCE.toBytes(4), PInteger.INSTANCE.toBytes(5), PInteger.INSTANCE.toBytes(9)) )); } {code} Take the above unit test for example, we can not use ScanUtil.getRowKeyPosition(slotSpan, j+1) in line 454: {code} 454 schema.iterator(startKey, minOffset, maxOffset, ptr, ScanUtil.getRowKeyPosition(slotSpan, j)+1); {code} That is because the intention of line 454 is to make ptr point to the first rowKeyColumn of j slot,in my unit test, j is 0, and slot 0 has two rowKey Columns: rowKey column 0 and rowKey column 1 ,so line 454 is to make ptr point to rowKey column 0. If we use ScanUtil.getRowKeyPosition(slotSpan, j+1) in line 454, we make ptr point to the last rowKeyColumn of j slot, in my unit test, make ptr point to rowKey column 1.Obviously, we should make ptr point to rowKey column 0, not rowKey column 1. It may be more clear if we see the following code of RowKeySchema.iterator method, it actually make the ptr point to rowKey column which index is {{position-1}}, not {{position}}.BTW, the method name "iterator" seems misleading and meaningless,maybe we could rename it as "moveToColumnBefore" , etc. {code} public Boolean iterator(byte[] src, int srcOffset, int srcLength, ImmutableBytesWritable ptr, int position) { Boolean hasValue = null; ptr.set(src, srcOffset, 0); int maxOffset = srcOffset + srcLength; for (int i = 0; i < position; i++) { hasValue = next(ptr, i, maxOffset); } return hasValue; } {code} However, there is another problem in line 454 for RVC, after RowKeySchema.iterator method is called,ptr only point to rowKey column 0, but we expect ptr pointing to rowKey Column 0 and rowKey 1,that is to say, we expect the value of ptr is (2,3) ,but actually ptr is only 2, that is because RowKeySchema.iterator method does not conside RVC as RowKeySchema.reposition method and RowKeySchema.next method, take following RowKeySchema.reposition method as example,it invoke readExtraFields method with extraSpan parameter for RVC. {code} public Boolean reposition(ImmutableBytesWritable ptr, int oldPosition, int newPosition, int minOffset, int maxOffset, int extraSpan) { Boolean returnValue = reposition(ptr, oldPosition, newPosition, minOffset, maxOffset); readExtraFields(ptr, newPosition + 1, maxOffset, extraSpan); return returnValue; } {code} so line 454 should be as following code,with additional slotSpan[j] parameter: {code} schema.iterator(startKey, minOffset, maxOffset, ptr, ScanUtil.getRowKeyPosition(slotSpan, j)+1, slotSpan[j]); {code} I will make more RVC tests and update my patch. was (Author: comnetwork): I write following unit test for Row Value Constructor(RVC), ,the rowKey is composed of four PInteger columns,and the slots of SkipScanFilter are: [ [[(1,2) - (3,4)]], [5, 7], [[9 - 10]] ], the first slot is RVC,when SkipScanFilter.filterKeyValue method is invoked on a KeyValue whose rowKey is [2,3,7,11], SkipScanFilter.filterKeyValue should return ReturnCode.SEEK_NEXT_USING_HINT and SkipScanFilter.getNextCellHint should return [2,4,5,9]. {code} @Test public void testRVCSkipScanFilter() throws Exception { RowKeySchemaBuilder builder = new RowKeySchemaBuilder(4); for(int i=0;i<4;i++) { builder.addField( new PDatum() { @Override public boolean isNullable() { return false; } @Override public PDataType getDataType() { return PInteger.INSTANCE; } @Override public Integer getMaxLength() { return PInteger.INSTANCE.getMaxLength(null); } @Override public Integer getScale() { return PInteger.INSTANCE.getScale(null); } @Override public SortOrder getSortOrder() { return SortOrder.getDefault(); } }, false, SortOrder.getDefault()); } List<List<RowKeyRange>> rowKeyColumnRangesList=Arrays.asList( Arrays.asList( RowKeyRange.createRowKeyRange( ByteUtil.concat(PInteger.INSTANCE.toBytes(1),PInteger.INSTANCE.toBytes(2)), true, ByteUtil.concat(PInteger.INSTANCE.toBytes(3),PInteger.INSTANCE.toBytes(4)), true)), Arrays.asList( RowKeyRange.createRowKeyRange(PInteger.INSTANCE.toBytes(5)), RowKeyRange.createRowKeyRange(PInteger.INSTANCE.toBytes(7))), Arrays.asList( PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(9), true, PInteger.INSTANCE.toBytes(10), true)) ); SkipScanFilter skipScanFilter=new SkipScanFilter(rowKeyColumnRangesList, new int[]{1,0,0},builder.build()); byte[] rowKey=ByteUtil.concat( PInteger.INSTANCE.toBytes(2), PInteger.INSTANCE.toBytes(3), PInteger.INSTANCE.toBytes(7), PInteger.INSTANCE.toBytes(11)); KeyValue keyValue=KeyValue.createFirstOnRow(rowKey); ReturnCode returnCode=skipScanFilter.filterKeyValue(keyValue); assertTrue(returnCode == ReturnCode.SEEK_NEXT_USING_HINT); Cell nextCellHint=skipScanFilter.getNextCellHint(keyValue); assertTrue(Arrays.equals( CellUtil.cloneRow(nextCellHint), ByteUtil.concat( PInteger.INSTANCE.toBytes(2), PInteger.INSTANCE.toBytes(4), PInteger.INSTANCE.toBytes(5), PInteger.INSTANCE.toBytes(9)) )); } {code} Take the above unit test for example, we can not use ScanUtil.getRowKeyPosition(slotSpan, j+1) in line 454: {code} 454 schema.iterator(startKey, minOffset, maxOffset, ptr, ScanUtil.getRowKeyPosition(slotSpan, j)+1); {code} That is because the intention of line 454 is to make ptr point to the first rowKeyColumn of j slot,in my unit test, j is 0, and slot 0 has two rowKey Columns: rowKey column 0 and rowKey column 1 ,so line 454 is to make ptr point to rowKey column 0. If we use ScanUtil.getRowKeyPosition(slotSpan, j+1) in line 454, we make ptr point to the last rowKeyColumn of j slot, in my unit test, make ptr point to rowKey column 1.Obviously, we should make ptr point to rowKey column 0, not rowKey column 1. It may be more clear if we see the following code of RowKeySchema.iterator method, it actually make the ptr point to rowKey column which index is {{position-1}}, not {{position}}.BTW, the method name "iterator" seems misleading and meaningless,maybe we could rename it as "moveToColumnBefore" , etc. {code} public Boolean iterator(byte[] src, int srcOffset, int srcLength, ImmutableBytesWritable ptr, int position) { Boolean hasValue = null; ptr.set(src, srcOffset, 0); int maxOffset = srcOffset + srcLength; for (int i = 0; i < position; i++) { hasValue = next(ptr, i, maxOffset); } return hasValue; } {code} However, there is another problem in line 454 for RVC, after RowKeySchema.iterator method is called,ptr only point to rowKey column 0, but we expect ptr pointing to rowKey Column 0 and rowKey 1,that is to say, we expect the value of ptr is (2,3) ,but actually ptr is only 2, that is because RowKeySchema.iterator method does not conside RVC as RowKeySchema.reposition method and RowKeySchema.next method, take following RowKeySchema.reposition method as example,it invoke readExtraFields method with extraSpan parameter for RVC. {code} public Boolean reposition(ImmutableBytesWritable ptr, int oldPosition, int newPosition, int minOffset, int maxOffset, int extraSpan) { Boolean returnValue = reposition(ptr, oldPosition, newPosition, minOffset, maxOffset); readExtraFields(ptr, newPosition + 1, maxOffset, extraSpan); return returnValue; } {code} so line 454 should be as following code,with additional slotSpan[j] parameter: {code} schema.iterator(startKey, minOffset, maxOffset, ptr, ScanUtil.getRowKeyPosition(slotSpan, j)+1, slotSpan[j]); {code} I will make more RVC tests and update my patch. > SkipScanFilter may repeatedly copy rowKey Columns to startKey > ------------------------------------------------------------- > > Key: PHOENIX-3705 > URL: https://issues.apache.org/jira/browse/PHOENIX-3705 > Project: Phoenix > Issue Type: Bug > Affects Versions: 4.9.0 > Reporter: chenglei > Priority: Blocker > Fix For: 4.10.0 > > Attachments: PHOENIX-3705_v1.patch > > > See following simple unit test first,the rowKey is composed of three PInteger > columns,and the slots of SkipScanFilter are: > [ [[1 - 4]], [5, 7], [[9 - 10]] ], > When SkipScanFilter.filterKeyValue method is invoked on a KeyValue whose > rowKey is [2,7,11], obviously SkipScanFilter.filterKeyValue > returns ReturnCode.SEEK_NEXT_USING_HINT and SkipScanFilter.getNextCellHint > returns [3,5,9], but unfortunately, SkipScanFilter.getNextCellHint actually > returns [2,8,5,9] , a very strange value, the unit tests failed. > {code} > @Test > public void testNavigate() { > RowKeySchemaBuilder builder = new RowKeySchemaBuilder(3); > for(int i=0;i<3;i++) { > builder.addField( > new PDatum() { > @Override > public boolean isNullable() { > return false; > } > @Override > public PDataType getDataType() { > return PInteger.INSTANCE; > } > @Override > public Integer getMaxLength() { > return PInteger.INSTANCE.getMaxLength(null); > } > @Override > public Integer getScale() { > return PInteger.INSTANCE.getScale(null); > } > @Override > public SortOrder getSortOrder() { > return SortOrder.getDefault(); > } > }, false, SortOrder.getDefault()); > } > > List<List<KeyRange>> rowKeyColumnRangesList=Arrays.asList( > Arrays.asList( > > PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(1), true, > PInteger.INSTANCE.toBytes(4), true)), > Arrays.asList( > KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(5)), > KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(7))), > Arrays.asList( > > PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(9), true, > PInteger.INSTANCE.toBytes(10), true)) > ); > > SkipScanFilter skipScanFilter=new > SkipScanFilter(rowKeyColumnRangesList, builder.build()); > > System.out.println(skipScanFilter); > > byte[] rowKey=ByteUtil.concat( > PInteger.INSTANCE.toBytes(2), > PInteger.INSTANCE.toBytes(7), > PInteger.INSTANCE.toBytes(11)); > KeyValue keyValue=KeyValue.createFirstOnRow(rowKey); > ReturnCode returnCode=skipScanFilter.filterKeyValue(keyValue); > assertTrue(returnCode == ReturnCode.SEEK_NEXT_USING_HINT); > Cell nextCellHint=skipScanFilter.getNextCellHint(keyValue); > > assertTrue(Bytes.toStringBinary(CellUtil.cloneRow(nextCellHint)).equals( > "\\x80\\x00\\x00\\x03\\x80\\x00\\x00\\x05\\x80\\x00\\x00\\x09")); > } > {code} > Let us see what's wrong, first column of rowKey [2,7,11 ] is 2, which is in > SkipScanFilter's first slot range [1-4], so position[0] is 0 and we go to the > second column 7, which match the second range [7] of SkipScanFilter's second > slot [5, 7],so position[1] is 1 and we go to the third column 11, which is > bigger than the third slot range [9 - 10],so position[2] is 0 and the > {{SkipScanFilter.ptr}} which points to current column still stay on the third > column. Now we begin to backtrack to second column, because the second range > [7] of SkipScanFilter's second slot is singleKey and there is no more > range,so position[1] is 0 and we continue to backtrack to first column, > because the first slot range [1-4] is not singleKey so we stop backtracking > at first column. > Now the problem comes, in following line 448 of {{SkipScanFilter.navigate}} > method,{{SkipScanFilter.setStartKey}} method is invoked,first copy rowKey > columns before {{SkipScanFilter.ptr}} to {{SkipScanFilter.startKey}}, because > now {{SkipScanFilter.ptr}} still point to the third column, so copy the first > and second columns to {{SkipScanFilter.startKey}},{{SkipScanFilter.startKey}} > is [2,7] after this step , then setStartKey method copy the lower bound > {{SkipScanFilter.slots}} from {{j+1}} column, accoring to > {{SkipScanFilter.position}} array,because now j is 0, and both position[1] > and position[2] are 0, so {{SkipScanFilter.startKey}} becomes [2,7,5,9], and > in following line 457, {{ByteUtil.nextKey}} is invoked on [2,7], [2,7] is > incremented to [2,8], finally {{SkipScanFilter.startKey}} is [2,8,5,9]. > {code} > 448 int currentLength = setStartKey(ptr, minOffset, j+1, > nSlots, false); > 449 // From here on, we use startKey as our buffer > (resetting minOffset and maxOffset) > 450 // We've copied the part of the current key above that > we need into startKey > 451 // Reinitialize the iterator to be positioned at > previous slot position > 452 minOffset = 0; > 453 maxOffset = startKeyLength; > 454 schema.iterator(startKey, minOffset, maxOffset, ptr, > ScanUtil.getRowKeyPosition(slotSpan, j)+1); > 455 // Do nextKey after setting the accessor b/c otherwise > the null byte may have > 456 // been incremented causing us not to find it > 457 ByteUtil.nextKey(startKey, currentLength); > {code} > From above analysis, we can see the second column is repeatedly copied to > {{SkipScanFilter.startKey}}, copy 7 to {{SkipScanFilter.startKey}} is error > ,before {{SkipScanFilter.setStartKey}} method is invoked in line 448, we > should first move {{SkipScanFilter.ptr}} to {{j+1}}, that is to say > ,following code might be inserted before line 448: > {code} > schema.reposition( > ptr, > ScanUtil.getRowKeyPosition(slotSpan, i), > ScanUtil.getRowKeyPosition(slotSpan, j+1), > minOffset, > maxOffset, > slotSpan[j+1]); > {code} > Another problem is why the existing SkipScanFilter unit tests are ok? That > is because in all existing unit test cases, backtrack column is just before > current column which ptr points to, i.e. i==j+1, so the unit tests are ok. > This issue may not affect the final query result , but obviously, it may lead > to the erroneous startKey and cause the Skip Scan inefficient. -- This message was sent by Atlassian JIRA (v6.3.15#6346)