Jackie-Jiang commented on code in PR #15350:
URL: https://github.com/apache/pinot/pull/15350#discussion_r2125209005


##########
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/BloomFilterSegmentPruner.java:
##########
@@ -189,26 +190,26 @@ private List<IndexSegment> prefetch(List<IndexSegment> 
segments, QueryContext qu
   }
 
   private boolean pruneSegmentWithFetchContext(IndexSegment segment, 
FetchContext fetchContext, FilterContext filter,
-      Map<String, DataSource> dataSourceCache, ValueCache cachedValues) {
+      Map<String, DataSource> dataSourceCache, ValueCache cachedValues, 
QueryContext queryContext) {
     if (fetchContext == null) {
-      return pruneSegment(segment, filter, dataSourceCache, cachedValues);
+      return pruneSegment(segment, filter, dataSourceCache, cachedValues, 
queryContext);

Review Comment:
   Virtual data source should never have bloom filter. We can remove `assert 
dataSource != null;` and return `false` when data source doesn't exist



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/SegmentPreloadUtils.java:
##########
@@ -191,4 +199,33 @@ private static Map<String, SegmentZKMetadata> 
getSegmentsZKMetadata(String table
         .forEach(m -> segmentMetadataMap.put(m.getSegmentName(), m));
     return segmentMetadataMap;
   }
+
+    /**

Review Comment:
   Format is wrong



##########
pinot-core/src/test/java/org/apache/pinot/core/query/pruner/BloomFilterSegmentPrunerTest.java:
##########
@@ -101,7 +101,7 @@ public void testQueryTimeoutOnPruning()
       throws IOException {
     IndexSegment indexSegment = mockIndexSegment(new String[]{"1.0", "2.0", 
"3.0", "5.0", "7.0", "21.0"});
     DataSource dataSource = mock(DataSource.class);
-    when(indexSegment.getDataSource("column")).thenReturn(dataSource);
+    when(indexSegment.getDataSource("column", null)).thenReturn(dataSource);

Review Comment:
   We should never pass in `null` schema



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/IndexSegment.java:
##########
@@ -62,7 +62,7 @@ public interface IndexSegment {
   Set<String> getPhysicalColumnNames();
 
   /// Returns the [DataSource] for the given column.
-  /// TODO: Revisit all usage of this method to support virtual [DataSource].
+  /// Keeping this around for testing and usage during non query execution 
code paths.

Review Comment:
   This API is useful when column is expected to exist



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/SegmentPreloadUtils.java:
##########
@@ -191,4 +199,33 @@ private static Map<String, SegmentZKMetadata> 
getSegmentsZKMetadata(String table
         .forEach(m -> segmentMetadataMap.put(m.getSegmentName(), m));
     return segmentMetadataMap;
   }
+
+    /**
+     * Get a virtual data source for the given column in the table.
+     *
+     *
+     * @param column Column name for which the data source is needed
+     * @param totalDocCount Total document count for the column
+     * @return DataSource for the column
+     */
+  public static DataSource getVirtualDataSource(Schema tableSchema, String 
column, int totalDocCount) {

Review Comment:
   This method doesn't belong to this util class. The virtual `DataSource` is 
created at query time, but not load time. Consider adding a new util class for 
it



##########
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/SegmentPrunerService.java:
##########
@@ -105,7 +106,8 @@ public List<IndexSegment> prune(List<IndexSegment> 
segments, QueryContext query,
   public List<IndexSegment> prune(List<IndexSegment> segments, QueryContext 
query, SegmentPrunerStatistics stats,
       @Nullable ExecutorService executorService) {
     try (InvocationScope scope = 
Tracing.getTracer().createScope(SegmentPrunerService.class)) {
-      segments = removeInvalidSegments(segments, query, stats);
+      segments = segments.stream()
+              .filter(segment -> 
!SegmentPrunerService.isEmptySegment(segment)).collect(Collectors.toList());

Review Comment:
   Keep the existing logic and keep all non-empty segments, which comes with 
lower overhead



##########
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/IndexSegment.java:
##########
@@ -76,10 +76,9 @@ default DataSource getDataSource(String column) {
   /// Returns the [DataSource] for the given column, or creates a virtual one 
if it doesn't exist. The passed in
   /// [Schema] should be the latest schema of the table, not the one from 
[SegmentMetadata], and should contain the
   /// asked column.
-  /// TODO: Add support for virtual [DataSource].
-  default DataSource getDataSource(String column, Schema schema) {
-    return getDataSource(column);
-  }
+ default DataSource getDataSource(String column, Schema schema) {

Review Comment:
   You can remove the default impl



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to