Himanshu Gwalani created HBASE-29864:
----------------------------------------
Summary: Standardize KeyValueScanner interface and all
implementations to LimitedPrivate
Key: HBASE-29864
URL: https://issues.apache.org/jira/browse/HBASE-29864
Project: HBase
Issue Type: New Feature
Components: API, regionserver, Scanners
Reporter: Himanshu Gwalani
Assignee: Himanshu Gwalani
Fix For: 2.7.0, 3.0.0-beta-2
*Goal:* Introduce a mechanism to track and expose the specific HFiles involved
in a scan operation.
*Use-case:* This is essential for client-side validation that the right set of
files was scanned (when a source of truth is available, for example the
snapshot data manifest during snapshot-based scans), for debugging
performance-related issues, and for analyzing data access patterns.
*Proposed API:* Add {{Set<Path> getScannerInitializedFiles()}} to the
{{KeyValueScanner}} interface.
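A minimal sketch of what the proposed method could look like on a leaf scanner. It does not use the real HBase classes: {{FileTrackingScanner}}, {{SingleFileScanner}} and {{InMemoryScanner}} are hypothetical stand-ins for {{KeyValueScanner}}, {{StoreFileScanner}} and the memstore-backed scanners, and {{java.nio.file.Path}} is assumed in place of Hadoop's {{Path}} so the example compiles without Hadoop on the classpath.

```java
import java.nio.file.Path;
import java.util.Collections;
import java.util.Set;

// Hypothetical stand-in for KeyValueScanner; only the proposed method is shown.
interface FileTrackingScanner {
    /** Files this scanner was initialized with; never null. */
    Set<Path> getScannerInitializedFiles();
}

// Leaf scanner backed by exactly one file (analogous to StoreFileScanner).
final class SingleFileScanner implements FileTrackingScanner {
    private final Path file;

    SingleFileScanner(Path file) {
        this.file = file;
    }

    @Override
    public Set<Path> getScannerInitializedFiles() {
        // Immutable one-element set holding the backing file's path.
        return Collections.singleton(file);
    }
}

// Leaf scanner over in-memory data (analogous to SegmentScanner): no files involved.
final class InMemoryScanner implements FileTrackingScanner {
    @Override
    public Set<Path> getScannerInitializedFiles() {
        return Collections.emptySet();
    }
}
```

Returning immutable sets from the leaves keeps callers from accidentally mutating scanner state; whether the actual patch does this is not stated in this issue.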
*Implementation Details*
* *Capturing the list of files when the scanner is initialized.*
** Leaf Scanners
*** StoreFileScanner: Returns a singleton set containing the path of the
associated {{HFile}}.
*** SnapshotSegmentScanner / CollectionBackedScanner / SegmentScanner: Returns
an empty set.
** Composite Scanners
*** StoreScanner & ReversedStoreScanner: Aggregates files from all active
{{StoreFileScanners}}
*** KeyValueHeap & ReversedKeyValueHeap: Aggregates files from its internal
priority queue of scanners.
** Abstract Scanners
*** NonLazyKeyValueScanner / NonReversedNonLazyKeyValueScanner: Returns an
empty set.
* *Exposing via RegionScanner & TableSnapshotRecordReader*
** RegionScanner: Aggregates files from all underlying StoreScanners
** TableSnapshotRecordReader: Proxies the call through ClientSideRegionScanner
to allow MapReduce jobs to access this for snapshot-based scans.
** Note: Also
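
The aggregation described for the composite scanners and for RegionScanner can be sketched as a union over child scanners. This is an illustration only, not the patch: {{FileTrackingScanner}} and {{CompositeScanner}} are hypothetical names, {{java.nio.file.Path}} stands in for Hadoop's {{Path}}, the file paths are made up, and the choice of an unmodifiable insertion-ordered set is an assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for KeyValueScanner; only the proposed method is shown.
interface FileTrackingScanner {
    Set<Path> getScannerInitializedFiles();
}

// Composite scanner (analogous to StoreScanner, KeyValueHeap or RegionScanner):
// reports the union of the files reported by its children.
final class CompositeScanner implements FileTrackingScanner {
    private final List<? extends FileTrackingScanner> children;

    CompositeScanner(List<? extends FileTrackingScanner> children) {
        this.children = children;
    }

    @Override
    public Set<Path> getScannerInitializedFiles() {
        // A set de-duplicates files reported by more than one child.
        Set<Path> files = new LinkedHashSet<>();
        for (FileTrackingScanner child : children) {
            files.addAll(child.getScannerInitializedFiles());
        }
        return Collections.unmodifiableSet(files);
    }
}

final class AggregationDemo {
    public static void main(String[] args) {
        // Leaves: two file-backed scanners and one in-memory scanner (illustrative paths).
        FileTrackingScanner hfileA = () -> Collections.singleton(Paths.get("/t1/r1/cf1/hfileA"));
        FileTrackingScanner hfileB = () -> Collections.singleton(Paths.get("/t1/r1/cf2/hfileB"));
        FileTrackingScanner memstore = Collections::emptySet;

        // One composite per store, then a region-level composite over the stores.
        FileTrackingScanner store1 = new CompositeScanner(Arrays.asList(hfileA, memstore));
        FileTrackingScanner store2 = new CompositeScanner(Arrays.asList(hfileB));
        FileTrackingScanner region = new CompositeScanner(Arrays.asList(store1, store2));

        System.out.println(region.getScannerInitializedFiles());
    }
}
```

A client with a source of truth, such as a snapshot manifest, could compare its expected set of files against the set returned at the region level.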
--
This message was sent by Atlassian Jira
(v8.20.10#820010)