[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7243: Broker Performance Improvement when there are 10k+ segments

GitBox Tue, 03 Aug 2021 12:30:46 -0700


Jackie-Jiang commented on a change in pull request #7243:
URL: https://github.com/apache/pinot/pull/7243#discussion_r682021900




##########
File path: 
pinot-broker/src/main/java/org/apache/pinot/broker/routing/segmentpruner/EmptySegmentPruner.java
##########
@@ -123,6 +123,9 @@ public synchronized void refreshSegment(String segment) {
    */
   @Override
   public Set<String> prune(BrokerRequest brokerRequest, Set<String> segments) {
+    if (_emptySegments.isEmpty()) {

Review comment:
       +1

##########
File path: 
pinot-broker/src/main/java/org/apache/pinot/broker/routing/segmentpruner/PartitionSegmentPruner.java
##########
@@ -163,48 +181,44 @@ public synchronized void refreshSegment(String segment) {
       if (filterExpression == null) {
         return segments;
       }
-      Set<String> selectedSegments = new HashSet<>();
-      for (String segment : segments) {
-        PartitionInfo partitionInfo = _partitionInfoMap.get(segment);
-        if (partitionInfo == null || partitionInfo == INVALID_PARTITION_INFO 
|| isPartitionMatch(filterExpression,
-            partitionInfo)) {
-          selectedSegments.add(segment);
-        }
-      }
-      return selectedSegments;
+      return pruneSegments((partitionInfo, cachedPartitionFunction) -> 
isPartitionMatch(filterExpression,

Review comment:
       (Critical) This algorithm does not prune based on the input segments, 
and will ignore the results for other pruners. Also, we need to keep the 
segments with `null` or `INVALID_PARTITION_INFO` because these segments might 
not be partitioned

##########
File path: 
pinot-broker/src/main/java/org/apache/pinot/broker/routing/segmentpruner/PartitionSegmentPruner.java
##########
@@ -163,48 +181,44 @@ public synchronized void refreshSegment(String segment) {
       if (filterExpression == null) {
         return segments;
       }
-      Set<String> selectedSegments = new HashSet<>();
-      for (String segment : segments) {
-        PartitionInfo partitionInfo = _partitionInfoMap.get(segment);
-        if (partitionInfo == null || partitionInfo == INVALID_PARTITION_INFO 
|| isPartitionMatch(filterExpression,
-            partitionInfo)) {
-          selectedSegments.add(segment);
-        }
-      }
-      return selectedSegments;
+      return pruneSegments((partitionInfo, cachedPartitionFunction) -> 
isPartitionMatch(filterExpression,
+        partitionInfo, cachedPartitionFunction));
     } else {
       // PQL
       FilterQueryTree filterQueryTree = 
RequestUtils.generateFilterQueryTree(brokerRequest);
       if (filterQueryTree == null) {
         return segments;
       }
-      Set<String> selectedSegments = new HashSet<>();
-      for (String segment : segments) {
-        PartitionInfo partitionInfo = _partitionInfoMap.get(segment);
-        if (partitionInfo == null || partitionInfo == INVALID_PARTITION_INFO 
|| isPartitionMatch(filterQueryTree,
-            partitionInfo)) {
-          selectedSegments.add(segment);
-        }
+      return pruneSegments((partitionInfo, cachedPartitionFunction) -> 
isPartitionMatch(filterQueryTree, partitionInfo, cachedPartitionFunction));
+    }
+  }
+
+  private Set<String> 
pruneSegments(java.util.function.BiFunction<PartitionInfo, 
CachedPartitionFunction,  Boolean> partitionMatchLambda) {
+    Set<String> selectedSegments = new HashSet<>();
+    CachedPartitionFunction cachedPartitionFunction = new 
CachedPartitionFunction();

Review comment:
       Why are we creating a cache per query? If we just need to cache the 
partition id, `Map<PartitionFunction, Map<String, Integer>>` should be enough




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [pinot] Jackie-Jiang commented on a change in pull request #7243: Broker Performance Improvement when there are 10k+ segments

Reply via email to