Re: [PR] HBASE-29112 Apply KeyOnlyFilter to RowCounter [hbase]

via GitHub Wed, 05 Feb 2025 19:10:39 -0800


junegunn commented on code in PR #6666:
URL: https://github.com/apache/hbase/pull/6666#discussion_r1944032580



##########
hbase-mapreduce/src/main/java/org/apache/hadoop/hbase/mapreduce/RowCounter.java:
##########
@@ -311,24 +314,40 @@ private static List<MultiRowRangeFilter.RowRange> 
parseRowRangeParameter(String
 
   /**
    * Sets filter {@link FilterBase} to the {@link Scan} instance. If provided 
rowRangeList contains
-   * more than one element, method sets filter which is instance of {@link 
MultiRowRangeFilter}.
-   * Otherwise, method sets filter which is instance of {@link 
FirstKeyOnlyFilter}. If rowRangeList
-   * contains exactly one element, startRow and stopRow are set to the scan.
+   * more than one element, method sets filter which is instance of {@link 
MultiRowRangeFilter}. If
+   * rowRangeList contains exactly one element, startRow and stopRow are set 
to the scan. Also,
+   * method may apply {@link FirstKeyOnlyFilter} an {@link KeyOnlyFilter} for 
better performance.
    */
   private static void setScanFilter(Scan scan, 
List<MultiRowRangeFilter.RowRange> rowRangeList,
     boolean countDeleteMarkers) {
-    final int size = rowRangeList == null ? 0 : rowRangeList.size();
-    // all cells will be needed if --countDeleteMarkers flag is set, hence, 
skipping filter
-    if (size <= 1 && !countDeleteMarkers) {
-      scan.setFilter(new FirstKeyOnlyFilter());
+    List<Filter> filters = new ArrayList<>();
+
+    // Apply filters for better performance.
+    if (!countDeleteMarkers) {
+      // We only need one cell per row, unless --countDeleteMarkers flag is 
set.
+      filters.add(new FirstKeyOnlyFilter());
+
+      // We're not interested in values. Use KeyOnlyFilter.
+      // NOTE: Logically, KeyOnlyFilter should be okay for 
--countDeleteMarkers, because it should
+      // only empty the values, but currently, having a filter changes the 
behavior of a raw scan.
+      // So we don't use it. See {@link 
UserScanQueryMatcher#mergeFilterResponse} for more details.

Review Comment:
   This was a little strange to me. But well, it is how it is.
   
   ```ruby
   create 't', 'd'
   
   put 't', 'r', 'd:foo', 'bar'
   
   now = Time.now.to_i * 1000
   delete 't', 'r', 'd:foo'
   delete 't', 'r', 'd:foo', now
   deleteall 't', 'r'
   deleteall 't', 'r', 'd:foo'
   deleteall 't', 'r', 'd:foo', now
   
   scan 't'
     # ROW  COLUMN+CELL
     # 0 row(s)
   scan 't', RAW => true
     # ROW  COLUMN+CELL
     #  r column=d:, timestamp=2025-02-06T12:02:23.136, type=DeleteFamily
     #  r column=d:foo, timestamp=2025-02-06T12:02:23.176, type=DeleteColumn
     #  r column=d:foo, timestamp=2025-02-06T12:02:23, type=DeleteColumn
     #  r column=d:foo, timestamp=2025-02-06T12:02:23, type=Delete
     #  r column=d:foo, timestamp=2025-02-06T12:02:21.837, type=Delete
     #  r column=d:foo, timestamp=2025-02-06T12:02:21.837, value=bar
     # 1 row(s)
   scan 't', RAW => true, FILTER => 'KeyOnlyFilter()'
     # ROW  COLUMN+CELL
     #  r column=d:, timestamp=2025-02-06T12:02:23.136, type=DeleteFamily
     #  r column=d:foo, timestamp=2025-02-06T12:02:23.176, type=DeleteColumn
     # 1 row(s)
   scan 't', RAW => true, FILTER => 'KeyOnlyFilter()', VERSIONS => 100
     # ROW  COLUMN+CELL
     #  r column=d:, timestamp=2025-02-06T12:02:23.136, type=DeleteFamily
     #  r column=d:foo, timestamp=2025-02-06T12:02:23.176, type=DeleteColumn
     #  r column=d:foo, timestamp=2025-02-06T12:02:23, type=DeleteColumn
     #  r column=d:foo, timestamp=2025-02-06T12:02:23, type=Delete
     #  r column=d:foo, timestamp=2025-02-06T12:02:21.837, type=Delete
     #  r column=d:foo, timestamp=2025-02-06T12:02:21.837, value=
     # 1 row(s)
   ```
   
   The thing is, `UserScanQueryMatcher#mergeFilterResponse` alters the 
MatchCode. `INCLUDE` + `INCLUDE from filter` becomes `SEEK_NEXT_COL` as shown 
below.
   
   <img width="607" alt="image" 
src="https://github.com/user-attachments/assets/68547274-ee44-4ed3-ac4a-7432e1dfcda7";
 />
   
   
https://github.com/apache/hbase/blob/85a8b54213ddd3220df59b6ecdf295c9f891c46f/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/querymatcher/UserScanQueryMatcher.java#L164-L166
   
   This probably doesn't matter. Typical use cases shouldn't be affected. It's 
just something odd I happened to notice.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] HBASE-29112 Apply KeyOnlyFilter to RowCounter [hbase]

Reply via email to