klsince commented on a change in pull request #7243:
URL: https://github.com/apache/pinot/pull/7243#discussion_r687408548



##########
File path: 
pinot-broker/src/main/java/org/apache/pinot/broker/routing/segmentselector/SelectedSegments.java
##########
@@ -0,0 +1,97 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.broker.routing.segmentselector;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.util.Set;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import org.apache.commons.codec.digest.DigestUtils;
+
+/**
+ * The SelectedSegments has a strict (Sha256 for now) hash for all the 
segments in it.
+ * It can be used to quickly compare whether the two segment sets are 
identical.
+ * It will be helpful for high QPS queries for tables with lots of segments
+ */
+public class SelectedSegments {
+  private final Set<String> _segments;
+  private final String _segmentHash;
+  private final boolean _hasHash;
+  public static final String EMPTY_HASH = "";
+
+  /**
+   * Note that computing Hash involves sorting, so it is O(N*LogN) complexity
+   * @param segments the segments
+   * @param computeHash whether we compute the hash for quick comparison.
+   *                    Note that computing Hash involves sorting, so it is 
O(N*LogN) complexity
+   */
+  public SelectedSegments(Set<String> segments, boolean computeHash){
+      this(segments, computeHash ? computeHash(segments) : EMPTY_HASH);
+  }
+
+  public SelectedSegments(Set<String> segments, String segmentHash) {
+    _segments = segments;
+    _segmentHash = segmentHash;
+    _hasHash = !segmentHash.equals(EMPTY_HASH);
+  }
+
+  /**
+   * Gets the segment set
+   * @return the set of all segments
+   */
+  public Set<String> getSegments() {
+    return _segments;
+  }
+
+  /**
+   * Gets the checksum of all the segments in it. Mostly used for quick 
comparison of sets
+   * @return The string representation of hash
+   */
+  public String getSegmentHash() {
+    return _segmentHash;
+  }
+
+  /**
+   * Gets whether the hash is valid and usable
+   * @return a flag that whether the hash is valid
+   */
+  public boolean hasHash() {
+    return _hasHash;
+  }
+
+  /**
+   * Compute the hash checksum for all segments
+   * @param segments the set for all segments
+   * @return string digest of byte arrays
+   */
+  public static String computeHash(Set<String> segments) {
+    SortedSet<String> sorted = new TreeSet<>(segments);
+    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

Review comment:
       did computeHash show up in cpu profile? I wonder if there is need to 
cache outputStream to reuse (its underlying byte[])

##########
File path: 
pinot-broker/src/main/java/org/apache/pinot/broker/routing/segmentselector/SelectedSegments.java
##########
@@ -0,0 +1,97 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.broker.routing.segmentselector;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.util.Set;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import org.apache.commons.codec.digest.DigestUtils;
+
+/**
+ * The SelectedSegments has a strict (Sha256 for now) hash for all the 
segments in it.
+ * It can be used to quickly compare whether the two segment sets are 
identical.
+ * It will be helpful for high QPS queries for tables with lots of segments
+ */
+public class SelectedSegments {
+  private final Set<String> _segments;
+  private final String _segmentHash;
+  private final boolean _hasHash;
+  public static final String EMPTY_HASH = "";
+
+  /**
+   * Note that computing Hash involves sorting, so it is O(N*LogN) complexity
+   * @param segments the segments
+   * @param computeHash whether we compute the hash for quick comparison.
+   *                    Note that computing Hash involves sorting, so it is 
O(N*LogN) complexity
+   */
+  public SelectedSegments(Set<String> segments, boolean computeHash){
+      this(segments, computeHash ? computeHash(segments) : EMPTY_HASH);
+  }
+
+  public SelectedSegments(Set<String> segments, String segmentHash) {
+    _segments = segments;
+    _segmentHash = segmentHash;
+    _hasHash = !segmentHash.equals(EMPTY_HASH);
+  }
+
+  /**
+   * Gets the segment set
+   * @return the set of all segments
+   */
+  public Set<String> getSegments() {
+    return _segments;
+  }
+
+  /**
+   * Gets the checksum of all the segments in it. Mostly used for quick 
comparison of sets
+   * @return The string representation of hash
+   */
+  public String getSegmentHash() {
+    return _segmentHash;
+  }
+
+  /**
+   * Gets whether the hash is valid and usable
+   * @return a flag that whether the hash is valid
+   */
+  public boolean hasHash() {
+    return _hasHash;
+  }
+
+  /**
+   * Compute the hash checksum for all segments
+   * @param segments the set for all segments
+   * @return string digest of byte arrays
+   */
+  public static String computeHash(Set<String> segments) {
+    SortedSet<String> sorted = new TreeSet<>(segments);
+    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
+    try {
+    for (String segment: sorted) {

Review comment:
       nit: format

##########
File path: 
pinot-broker/src/main/java/org/apache/pinot/broker/routing/segmentselector/SelectedSegments.java
##########
@@ -0,0 +1,97 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.broker.routing.segmentselector;
+
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.util.Set;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import org.apache.commons.codec.digest.DigestUtils;
+
+/**
+ * The SelectedSegments has a strict (Sha256 for now) hash for all the 
segments in it.
+ * It can be used to quickly compare whether the two segment sets are 
identical.
+ * It will be helpful for high QPS queries for tables with lots of segments
+ */
+public class SelectedSegments {
+  private final Set<String> _segments;
+  private final String _segmentHash;
+  private final boolean _hasHash;
+  public static final String EMPTY_HASH = "";

Review comment:
       nit: put `public static final ...` before object fields

##########
File path: 
pinot-broker/src/main/java/org/apache/pinot/broker/routing/segmentpruner/PartitionSegmentPruner.java
##########
@@ -163,48 +190,64 @@ public synchronized void refreshSegment(String segment) {
       if (filterExpression == null) {
         return segments;
       }
-      Set<String> selectedSegments = new HashSet<>();
-      for (String segment : segments) {
-        PartitionInfo partitionInfo = _partitionInfoMap.get(segment);
-        if (partitionInfo == null || partitionInfo == INVALID_PARTITION_INFO 
|| isPartitionMatch(filterExpression,
-            partitionInfo)) {
-          selectedSegments.add(segment);
-        }
-      }
-      return selectedSegments;
+      return pruneSegments(segments, (partitionInfo, cachedPartitionFunction) 
-> isPartitionMatch(filterExpression,
+        partitionInfo, cachedPartitionFunction));
     } else {
       // PQL
       FilterQueryTree filterQueryTree = 
RequestUtils.generateFilterQueryTree(brokerRequest);
       if (filterQueryTree == null) {
         return segments;
       }
-      Set<String> selectedSegments = new HashSet<>();
-      for (String segment : segments) {
+      return pruneSegments(segments, (partitionInfo, cachedPartitionFunction) 
-> isPartitionMatch(filterQueryTree, partitionInfo, cachedPartitionFunction));
+    }
+  }
+
+  private SelectedSegments pruneSegments(SelectedSegments segments, 
java.util.function.BiFunction<PartitionInfo, CachedPartitionFunction,  Boolean> 
partitionMatchLambda) {

Review comment:
       why not import BiFunction? 

##########
File path: 
pinot-broker/src/main/java/org/apache/pinot/broker/routing/segmentpruner/PartitionSegmentPruner.java
##########
@@ -278,5 +321,41 @@ private boolean isPartitionMatch(FilterQueryTree 
filterQueryTree, PartitionInfo
       _partitionFunction = partitionFunction;
       _partitions = partitions;
     }
+    @Override
+    public boolean equals(Object o) {

Review comment:
       nit: 
   1. add {} even if there is one line in the if-body
   2. the 2nd if can be simplified as `if (!(o instanceof PartitionInfo)) {..}` 

##########
File path: 
pinot-broker/src/main/java/org/apache/pinot/broker/routing/RoutingManager.java
##########
@@ -570,14 +571,14 @@ void refreshSegment(String segment) {
     }
 
     InstanceSelector.SelectionResult calculateRouting(BrokerRequest 
brokerRequest) {
-      Set<String> selectedSegments = _segmentSelector.select(brokerRequest);
-      if (!selectedSegments.isEmpty()) {
+      SelectedSegments selectedSegments = 
_segmentSelector.select(brokerRequest);
+      if (!selectedSegments.getSegments().isEmpty()) {

Review comment:
       nit: consider make isEmpty() a method on SelectedSegments

##########
File path: 
pinot-segment-spi/src/main/java/org/apache/pinot/segment/spi/partition/MurmurPartitionFunction.java
##########
@@ -104,4 +105,18 @@ int murmur2(final byte[] data) {
 
     return h;
   }
+
+
+  @Override
+  public boolean equals(Object o) {
+    if (this == o) return true;
+    if (o == null || ! (o instanceof MurmurPartitionFunction)) return false;

Review comment:
       same here. could be simplified a bit




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to