[ https://issues.apache.org/jira/browse/PHOENIX-7580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Viraj Jasani updated PHOENIX-7580:
----------------------------------
Fix Version/s: 5.3.0
> Data in last salt bucket is not being scanned for range scan
> ------------------------------------------------------------
>
> Key: PHOENIX-7580
> URL: https://issues.apache.org/jira/browse/PHOENIX-7580
> Project: Phoenix
> Issue Type: Bug
> Reporter: Sanjeet Malhotra
> Assignee: Sanjeet Malhotra
> Priority: Major
> Fix For: 5.3.0
>
>
> Steps to reproduce:
> * Run DDL:
> ** CREATE TABLE IF NOT EXISTS TABLE1 (
> PK1 CHAR(7) NOT NULL,
> PK2 CHAR(7) NOT NULL,
> PK3 DECIMAL NOT NULL,
> PK4 CHAR(32) NOT NULL,
> COL1 VARCHAR,
> COL2 VARCHAR,
> CONSTRAINT PK PRIMARY KEY (
> PK1,
> PK2,
> PK3,
> PK4
> )
> ) VERSIONS=1, MULTI_TENANT=true, REPLICATION_SCOPE=0, SALT_BUCKETS=20,
> UPDATE_CACHE_FREQUENCY=172800000;
> * Add data to the table and verify via an HBase scan that some rows went to
> the last salt bucket.
> ** Add only enough data that no region split happens. You should have 20
> regions, since the salt bucket count is 20.
> ** Add data such that the first 3 PK columns have the values 'PK_VAL1',
> 'PK_VAL2' and 1743478459000 for all rows, and only the last PK column differs
> between rows.
> * Run the range query:
> ** {{select count(\*) from TABLE1 where PK1 = 'PK_VAL1' AND PK3 =
> 1743478459000 AND PK2 = 'PK_VAL2';}}
> ** Note down the row count returned by the above query.
> * Run a scan on HBase from the shell:
> ** Sample scan for salt bucket `\x00`: `scan "TABLE1", \{VERSIONS => 1,
> COLUMNS => "0:_0", ROWPREFIXFILTER => "\x00PK_VAL1PK_VAL2\xC7\x02K#O.[\x00"}`
> ** Run the above scan for every salt bucket from `\x00` to `\x13` and note
> down the row count for each. The sum should equal the count returned by the
> Phoenix query above.
> * So far, so good: Phoenix is able to scan the rows of the last salt bucket
> from HBase.
> * Now add 3 rows to the second last salt bucket, `\x12`, such that the row
> key prefix (constructed from the first 3 PK columns) of these rows is greater
> than `\x12PK_VAL1PK_VAL2\xC7\x02K#O.[\x00`.
> * Of the 3 newly added rows, take the second one (in lexicographic order) as
> the split key and split the region corresponding to the second last salt
> bucket.
> * Run the same Phoenix range query again. This time the row count will be
> lower than before, and the difference will equal the number of rows in the
> last salt bucket (`\x13`).
> * So the rows are present in HBase, but Phoenix is not scanning them.
>
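The per-bucket HBase shell scans in the steps above can be generated instead of typed by hand. The sketch below is a hypothetical helper (the class and method names are illustrative, not part of Phoenix or HBase). It assumes the row key suffix from the sample `\x00` scan and SALT_BUCKETS=20, so it prints one command per salt byte from `\x00` to `\x13`.

```java
import java.util.ArrayList;
import java.util.List;

public class SaltBucketScans {
    // Row key bytes after the salt byte, copied from the sample scan above.
    // Kept in HBase shell escape notation, so the backslashes are literal.
    static final String PREFIX_AFTER_SALT = "PK_VAL1PK_VAL2\\xC7\\x02K#O.[\\x00";

    // Builds one HBase shell scan command per salt bucket, \x00 .. \x(n-1).
    static List<String> scanCommands(String table, int saltBuckets) {
        List<String> commands = new ArrayList<>();
        for (int bucket = 0; bucket < saltBuckets; bucket++) {
            // The salt byte is the bucket number written as a two-digit \xNN escape.
            String saltByte = String.format("\\x%02X", bucket);
            commands.add(String.format(
                "scan \"%s\", {VERSIONS => 1, COLUMNS => \"0:_0\", ROWPREFIXFILTER => \"%s%s\"}",
                table, saltByte, PREFIX_AFTER_SALT));
        }
        return commands;
    }

    public static void main(String[] args) {
        // SALT_BUCKETS=20 in the DDL, so the last bucket is \x13.
        for (String cmd : scanCommands("TABLE1", 20)) {
            System.out.println(cmd);
        }
    }
}
```

Pasting the printed commands into the HBase shell and adding up the row counts they report gives the per-bucket total that the steps compare against the Phoenix count.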
> Root cause:
> * Please go through the steps to reproduce above first; they make the root
> cause easier to follow.
> * To get the region locations, we go through this code:
> [#L1048-L1064|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1048-L1064].
> Here we get the region locations for all 20 regions, as expected, so there is
> no bug here.
> * Here:
> [#L1245|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1245],
> we iterate over the region locations one by one and build a scan for each
> region location.
> ** The end key of the previous region becomes the start key used to build the
> scan for the next region
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1268]).
> The end key used to build the scan for the last region is empty, as per
> [this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1156].
> ** So, for a range scan over a salted table, the start key of the scan over
> the last region is the end key of the region corresponding to the second last
> salt bucket.
> * Next, we call {{intersectScan}}
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/iterate/BaseResultIterators.java#L1245])
> to get the scan for the last region.
> ** In the {{intersectScan}} function definition
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L263]),
> the {{originalStartKey}} for the last region is the end key of the region
> corresponding to the second last salt bucket, and {{originalStopKey}} is an
> empty byte array.
> ** Suppose the following condition is satisfied:
> *** The region corresponding to the second last salt bucket is followed by at
> least one more region belonging to the same (second last) bucket.
> ** This makes {{originalStartKey}} have the same first byte as the second last
> salt bucket.
> ** Because of this, we enter this if block
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L286]),
> and we enter it because {{Bytes.compareTo(originalStopKey, nextBucketStart)
> <= 0}} is satisfied.
> ** Suppose the following condition is also satisfied:
> *** Create a byte array from the end key of the region corresponding to the
> second last salt bucket, i.e. {{scanStartKey}}, excluding the first byte.
> Call it {{b1}}.
> *** Create a byte array from the row key prefix in the WHERE clause of the
> range scan, excluding the first byte. Call it {{b2}}.
> *** In a byte comparison, {{b1}} > {{b2}}.
> ** This condition takes us into this if block
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L385]).
> Note that since this is a range scan on a salted table, {{scanKeyOffset}},
> and thus both {{scanStartKeyOffset}} and {{scanStopKeyOffset}}, will be 1.
> ** Because {{b1}} > {{b2}}, no scan is created for the last salt bucket, and
> the last bucket is never scanned.
> * The bug appears to be:
> ** {{Bytes.compareTo(originalStopKey, nextBucketStart) <= 0}}
> ([link|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L283]).
> This check succeeded in the root cause analysis above, but it should not have,
> because {{originalStopKey}} was an empty byte array. An empty stop key must be
> handled as a special case: it means the largest possible stop key (no upper
> bound).
> ** So we should not enter the above if block merely because
> {{Bytes.compareTo(originalStopKey, nextBucketStart) <= 0}} succeeds when
> {{originalStopKey}} is an empty byte array. Instead, we should first hit
> [this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L294]
> line.
> ** Then, in the next iteration of the while loop, we would enter the same if
> block, but this time because the {{lastBucket}} boolean variable is true, and
> the first bytes of {{wrkStartKey}} and {{nextByteBucket}} would be the same.
> In short, during a range scan over a salted table, hitting
> [this|https://github.com/apache/phoenix/blob/7a5965887f679bc229c1424e84951202a5ab27b7/phoenix-core-client/src/main/java/org/apache/phoenix/compile/ScanRanges.java#L286]
> line while {{wrkStartKey}} and {{originalStopKey}} belong to different salt
> buckets is wrong.
>
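The ordering problem behind the suspected bug can be shown in isolation. The sketch below is not Phoenix code: it uses {{java.util.Arrays.compareUnsigned}} (Java 9+), which orders byte arrays the same unsigned, lexicographic way HBase's {{Bytes.compareTo}} does (an empty array sorts first). It assumes that {{nextBucketStart}} is the one-byte key of the last salt bucket, {{\x13}}. The class and method names are hypothetical, and the "fixed" variant is just one possible way to treat an empty stop key as unbounded, not the change committed for this issue.

```java
import java.util.Arrays;

public class EmptyStopKeyCheck {
    // Unsigned lexicographic comparison where a strict prefix sorts first,
    // so an empty array compares as smaller than every non-empty key.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    // The check as described in the report: true for an empty stop key,
    // because the empty array sorts before any bucket start.
    static boolean stopKeyBeforeNextBucketAsWritten(byte[] stopKey, byte[] nextBucketStart) {
        return compare(stopKey, nextBucketStart) <= 0;
    }

    // Hypothetical corrected check: an empty stop key means "no upper bound",
    // so it can never be at or before the start of the next bucket.
    static boolean stopKeyBeforeNextBucketFixed(byte[] stopKey, byte[] nextBucketStart) {
        return stopKey.length > 0 && compare(stopKey, nextBucketStart) <= 0;
    }

    public static void main(String[] args) {
        byte[] emptyStopKey = new byte[0];          // stop key used for the last region
        byte[] nextBucketStart = new byte[] {0x13}; // assumed start of the last salt bucket
        System.out.println(stopKeyBeforeNextBucketAsWritten(emptyStopKey, nextBucketStart)); // true
        System.out.println(stopKeyBeforeNextBucketFixed(emptyStopKey, nextBucketStart));     // false
    }
}
```

With a non-empty stop key such as {{\x12...}} both variants agree; they differ only for the empty stop key, which is exactly the case the last region hits.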
--
This message was sent by Atlassian Jira
(v8.20.10#820010)