[
https://issues.apache.org/jira/browse/PHOENIX-7619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Sanjeet Malhotra updated PHOENIX-7619:
--------------------------------------
Affects Version/s: 5.2.1
5.2.0
5.2.2
5.3
5.2.3
> Excess HFiles are being read to look for more than required column versions
> ---------------------------------------------------------------------------
>
> Key: PHOENIX-7619
> URL: https://issues.apache.org/jira/browse/PHOENIX-7619
> Project: Phoenix
> Issue Type: Bug
> Affects Versions: 5.2.0, 5.2.1, 5.2.2, 5.3, 5.2.3
> Reporter: Sanjeet Malhotra
> Assignee: Sanjeet Malhotra
> Priority: Major
>
> Steps to reproduce:
> * Create table with one column family.
> {code:java}
> CREATE TABLE TEST.HBASE_READS (
>     ID1 VARCHAR NOT NULL,
>     ID2 VARCHAR,
>     VAL1 VARCHAR
>     CONSTRAINT PK PRIMARY KEY (ID1)
> ) BLOOMFILTER = NONE;{code}
> * Write some data to the table and flush it, so that there is at least one
> HFile. (During my testing I ensured there were 3 HFiles per region.)
> * Write some more data to the table, but this time don't flush, so that this
> data stays in the memstore.
> * Query a single row that exists only in the memstore, not in any HFile, so
> that the row should come purely from the memstore without needing to read
> any HFile.
> Expected: the queried row comes from the memstore and there is no need to
> read any HFile.
> Actual: the memstore along with all the HFiles was scanned to return the
> Result to the client.
>
> Reason:
> In HBase, when the [StoreScanner is
> initialized|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L266],
> we go for a lazy seek because the Scan object coming from Phoenix specifies
> the column qualifiers to be queried. If the StoreFile on which we are doing
> the lazy seek has no deleteFamily or deleteFamilyVersion markers, then [this
> line|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java#L438]
> is hit. The same happens for every StoreFileScanner, while the head of the
> memstore scanner (SegmentScanner) sits at the first column of the given row.
> Next, [this
> line|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ScanQueryMatcher.java#L192]
> is hit until the memstore scanner becomes the top-most scanner in the
> priority queue of all the scanners: 3 StoreFile scanners and 1 memstore
> scanner. Once the memstore scanner is the top-most scanner, the first queried
> column is read from the memstore and [this line is
> hit|https://github.com/apache/hbase/blob/b21ba71f73881336345fd5dd7d647910b3058e05/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/ExplicitColumnTracker.java#L167]
> after a successful column match. Here, if {{maxVersions}} versions have
> already been found, we skip to the next column, which again is read from the
> memstore. But if {{maxVersions}} versions have not yet been found, we go on
> to read the next version, i.e. the next cell, which leads to scanning all the
> StoreFiles. For "User" scans {{maxVersions}} should have been {{1}} for us,
> so we should have skipped to the next column once we found the latest version
> of the current column in the memstore. But for "User" scans {{maxVersions}}
> is {{INT_MAX}} for us, leading to reading all the StoreFiles. We should have
> [hit this
> line|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L746]
> but we end up [hitting this
> line|https://github.com/apache/hbase/blob/efa228ef446c0e63bbe2915a48d3324efab79ccc/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java#L704].
> {{maxVersions}} is {{INT_MAX}} for us because we override it
> [here|https://github.com/apache/phoenix/blob/9cb48832a7e9b9a972d682535179ab6a2fd0cb16/phoenix-core-server/src/main/java/org/apache/phoenix/coprocessor/BaseScannerRegionObserver.java#L432-L435].
> The {{preStoreScannerOpen}} hook is called for "User" scans, so we are
> penalizing all "User" scans.
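The version-counting decision described above can be sketched in a few lines of plain Java (a simplified illustration, not HBase's actual ExplicitColumnTracker code; the names {{checkVersions}} and {{MatchCode}} echo HBase's but the logic here is condensed):

```java
// Simplified sketch of the version-tracking decision described above:
// once a matched column has yielded maxVersions versions, the scanner can
// seek straight to the next column instead of reading further cells for
// the same column (which would pull in every StoreFile).
class VersionTrackerSketch {
    enum MatchCode { INCLUDE, SEEK_NEXT_COL }

    static MatchCode checkVersions(int versionsFoundSoFar, int maxVersions) {
        // After including the current cell, have we collected enough versions?
        int count = versionsFoundSoFar + 1;
        return count >= maxVersions ? MatchCode.SEEK_NEXT_COL : MatchCode.INCLUDE;
    }

    public static void main(String[] args) {
        // With maxVersions = 1 (what a "User" scan should use), the very
        // first version found in the memstore lets us skip to the next column:
        System.out.println(checkVersions(0, 1));                  // SEEK_NEXT_COL
        // With maxVersions = INT_MAX (what the Phoenix override produces),
        // we keep asking for more versions of the same column:
        System.out.println(checkVersions(0, Integer.MAX_VALUE));  // INCLUDE
    }
}
```

With {{maxVersions = 1}} the tracker answers SEEK_NEXT_COL as soon as the memstore supplies the latest version; with {{INT_MAX}} it answers INCLUDE, which is what drags every StoreFile into the read path.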
>
> Fix for the preStoreScannerOpen() hook:
> * Don't override MIN_VERSIONS and VERSIONS.
> * Set TTL to {{Long.MAX_VALUE}} instead of {{HConstants.FOREVER}}. This is
> needed because {{HConstants.FOREVER}} is INT_MAX and the TTL overridden as
> part of ScanOptions is interpreted in milliseconds by HBase. INT_MAX ms is
> equivalent to a little less than 25 days, so HBase would treat even the
> latest version of a column qualifier as expired if it is older than ~25
> days. This can cause rows to expire partially. Currently, rows are not
> expiring partially because we set MIN_VERSIONS in this hook to INT_MAX; once
> we stop overriding MIN_VERSIONS, we need to set TTL to Long.MAX_VALUE, as
> TTL's data type is long. Verified this via an IT.
> * Continue overriding {{KeepDeletedCells}} to {{TTL}}. If we stop doing
> this, SCN queries get impacted. Scenario: KeepDeletedCells is left as
> {{FALSE}}. Say at timestamp T1 I write a row and at T2 > T1 I delete the
> row. Now suppose I set my SCN to a timestamp between T1 and T2; the
> expectation is that I should see the inserted row, but I won't, because to
> see past delete markers when a custom time range is specified in the scan,
> KeepDeletedCells must be set to a value other than {{FALSE}}. I verified
> this via an IT.
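The TTL arithmetic behind the second bullet is easy to check standalone (nothing HBase-specific here; the constant is simply Integer.MAX_VALUE, which is what HConstants.FOREVER holds):

```java
import java.util.concurrent.TimeUnit;

// Standalone check of the TTL arithmetic above: HConstants.FOREVER is
// Integer.MAX_VALUE, but a TTL set via ScanOptions is interpreted in
// milliseconds, so "forever" collapses to a little under 25 days.
class TtlSketch {
    public static void main(String[] args) {
        long foreverMs = Integer.MAX_VALUE;               // 2147483647 ms
        System.out.println(TimeUnit.MILLISECONDS.toDays(foreverMs)); // 24
        // Long.MAX_VALUE ms is on the order of 292 million years,
        // which is effectively forever for a TTL:
        System.out.println(TimeUnit.MILLISECONDS.toDays(Long.MAX_VALUE) / 365);
    }
}
```

So a cell older than ~24.8 days would be treated as expired under the INT_MAX TTL, which is why Long.MAX_VALUE is the right sentinel once MIN_VERSIONS is no longer being overridden.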
--
This message was sent by Atlassian Jira
(v8.20.10#820010)