Himanshu Gwalani created HBASE-29864:
----------------------------------------

             Summary: Standardize KeyValueScanner interface and all 
implementations to LimitedPrivate
                 Key: HBASE-29864
                 URL: https://issues.apache.org/jira/browse/HBASE-29864
             Project: HBase
          Issue Type: New Feature
          Components: API, regionserver, Scanners
            Reporter: Himanshu Gwalani
            Assignee: Himanshu Gwalani
             Fix For: 2.7.0, 3.0.0-beta-2


*Goal:* Introduce a mechanism to track and expose the specific HFiles involved 
in a scan operation.

*Use-case:* This is essential for client-side validation that the right set 
of files was scanned (when a source of truth is available, for example the 
snapshot data manifest during snapshot-based scans), for debugging 
performance issues, and for analyzing data access patterns.

*Proposed API:* Add {{Set<Path> getScannerInitializedFiles()}} to the 
{{KeyValueScanner}} interface.
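
A minimal sketch of what the proposed method could look like, not the actual 
patch. Assumptions: {{KeyValueScannerSketch}}, {{MemoryBackedScanner}} and 
{{FileBackedScanner}} are hypothetical stand-ins for the real HBase types, and 
{{java.nio.file.Path}} stands in for Hadoop's {{org.apache.hadoop.fs.Path}} so 
that the example compiles without HBase on the classpath.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.Set;

public class ScannerFilesApiSketch {

  /** Hypothetical, reduced stand-in for KeyValueScanner: only the proposed method. */
  interface KeyValueScannerSketch {
    /** Files this scanner was initialized with; empty if it reads no files. */
    Set<Path> getScannerInitializedFiles();
  }

  /** Hypothetical memory-backed scanner: it is backed by no file. */
  static final class MemoryBackedScanner implements KeyValueScannerSketch {
    @Override
    public Set<Path> getScannerInitializedFiles() {
      return Collections.emptySet();
    }
  }

  /** Hypothetical file-backed scanner: reports exactly one file. */
  static final class FileBackedScanner implements KeyValueScannerSketch {
    private final Path file;

    FileBackedScanner(Path file) {
      this.file = file;
    }

    @Override
    public Set<Path> getScannerInitializedFiles() {
      return Collections.singleton(file);
    }
  }

  public static void main(String[] args) {
    // The path below is made up for illustration only.
    KeyValueScannerSketch onDisk = new FileBackedScanner(Paths.get("/data/cf/hfile-1"));
    KeyValueScannerSketch inMemory = new MemoryBackedScanner();
    System.out.println("file-backed scanner:   " + onDisk.getScannerInitializedFiles());
    System.out.println("memory-backed scanner: " + inMemory.getScannerInitializedFiles());
  }
}
```

Returning an empty set rather than {{null}} for scanners that touch no files 
lets callers aggregate results without null checks.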

*Implementation Details*
 * *Capturing the list of files when the scanner is initialized.*
 ** Leaf Scanners
 *** StoreFileScanner: Returns a singleton set containing the path of the 
associated {{HFile}}.
 *** SnapshotSegmentScanner / CollectionBackedScanner / SegmentScanner: Returns 
empty set.
 ** Composite Scanners
 *** StoreScanner & ReversedStoreScanner: Aggregates files from all active 
{{StoreFileScanner}} instances.
 *** KeyValueHeap & ReversedKeyValueHeap: Aggregates files from its internal 
priority queue of scanners.
 ** Abstract Scanners
 *** NonLazyKeyValueScanner / NonReversedNonLazyKeyValueScanner: Returns empty 
set.
 * *Exposing via RegionScanner & TableSnapshotRecordReader*
 ** RegionScanner: Aggregates files from all underlying StoreScanners
 ** TableSnapshotRecordReader: Proxies the call through ClientSideRegionScanner 
to allow MapReduce jobs to access this for snapshot-based scans.
 ** Note: Also 
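
The leaf/composite split described above, together with the client-side 
validation from the use-case, can be sketched roughly as follows. This is an 
illustration under assumptions, not the proposed implementation: 
{{FileTrackingScanner}}, {{CompositeScanner}}, {{fileLeaf}}, {{memoryLeaf}} 
and {{missingFiles}} are hypothetical names, all paths are made up, and 
{{java.nio.file.Path}} stands in for Hadoop's {{Path}}.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class ScannerFileAggregationSketch {

  /** Hypothetical stand-in for the part of KeyValueScanner relevant here. */
  interface FileTrackingScanner {
    Set<Path> getScannerInitializedFiles();
  }

  /** Composite scanner (store, heap or region level): union of its children's files. */
  static final class CompositeScanner implements FileTrackingScanner {
    private final List<FileTrackingScanner> children;

    CompositeScanner(List<FileTrackingScanner> children) {
      this.children = new ArrayList<>(children);
    }

    @Override
    public Set<Path> getScannerInitializedFiles() {
      Set<Path> files = new LinkedHashSet<>();
      for (FileTrackingScanner child : children) {
        files.addAll(child.getScannerInitializedFiles());
      }
      return files;
    }
  }

  /** Leaf backed by a single file: reports a singleton set. */
  static FileTrackingScanner fileLeaf(String file) {
    Path path = Paths.get(file);
    return () -> Collections.singleton(path);
  }

  /** Leaf backed by memory only: reports an empty set. */
  static FileTrackingScanner memoryLeaf() {
    return () -> Collections.emptySet();
  }

  /** Client-side check: expected files (e.g. from a manifest) that were not scanned. */
  static Set<Path> missingFiles(Set<Path> expected, FileTrackingScanner scanner) {
    Set<Path> missing = new LinkedHashSet<>(expected);
    missing.removeAll(scanner.getScannerInitializedFiles());
    return missing;
  }

  public static void main(String[] args) {
    // Two stores, each with file-backed leaves and one memory-backed leaf.
    FileTrackingScanner storeA = new CompositeScanner(
        Arrays.asList(fileLeaf("/t/r/cfA/f1"), fileLeaf("/t/r/cfA/f2"), memoryLeaf()));
    FileTrackingScanner storeB = new CompositeScanner(
        Arrays.asList(fileLeaf("/t/r/cfB/f3"), memoryLeaf()));
    // Region-level scanner aggregates across its store-level scanners.
    FileTrackingScanner region = new CompositeScanner(Arrays.asList(storeA, storeB));

    Set<Path> expected = new LinkedHashSet<>(Arrays.asList(
        Paths.get("/t/r/cfA/f1"), Paths.get("/t/r/cfA/f2"),
        Paths.get("/t/r/cfB/f3"), Paths.get("/t/r/cfB/f4")));

    System.out.println("scanned: " + region.getScannerInitializedFiles());
    System.out.println("missing: " + missingFiles(expected, region));
  }
}
```

Because every level implements the same method, a caller at the top of the 
scanner tree does not need to know how the tree is composed to collect the 
full set of files.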



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
