ccaominh commented on a change in pull request #8925: Parallel indexing single 
dim partitions
URL: https://github.com/apache/incubator-druid/pull/8925#discussion_r355104107
 
 

 ##########
 File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/distribution/PartitionBoundaries.java
 ##########
 @@ -20,28 +20,48 @@
 package org.apache.druid.indexing.common.task.batch.parallel.distribution;
 
 import com.google.common.collect.ForwardingList;
-import com.google.common.collect.ImmutableList;
 
 import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
 import java.util.List;
+import java.util.stream.Collectors;
 
 /**
- * Convenience wrapper to make code more readable.
+ * List of range partition boundaries.
  */
-public class Partitions extends ForwardingList<String> implements List<String>
+public class PartitionBoundaries extends ForwardingList<String> implements 
List<String>
 {
   private final List<String> delegate;
 
   // For jackson
   @SuppressWarnings("unused")
-  private Partitions()
+  private PartitionBoundaries()
   {
     delegate = new ArrayList<>();
   }
 
-  public Partitions(String... partitions)
+  /**
+   * @param partitions Elements corresponding to evenly-spaced fractional 
ranks of the distribution
+   */
+  public PartitionBoundaries(String... partitions)
   {
-    delegate = ImmutableList.copyOf(partitions);
+    if (partitions.length == 0) {
+      delegate = Collections.emptyList();
+      return;
+    }
+
+    List<String> partitionBoundaries = Arrays.stream(partitions)
+                                             .distinct()
+                                             
.collect(Collectors.toCollection(ArrayList::new));
+
+    // First partition starts with null (see StringPartitionChunk.isStart())
+    partitionBoundaries.set(0, null);
+
+    // Last partition ends with null (see StringPartitionChunk.isEnd())
+    partitionBoundaries.add(null);
 
 Review comment:
   The last partition will never be empty because it'll have at least one row 
with the max value.
   
   Previously, I had logic to combine it with the second-to-last partition when 
it was small:
   https://github.com/apache/incubator-druid/pull/8925#discussion_r350539718
   
   If it's still desired to not have that logic to decide whether to combine it 
or not, then it needs to either always be combined or never be combined.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to