ccaominh commented on a change in pull request #8925: Parallel indexing single dim partitions URL: https://github.com/apache/incubator-druid/pull/8925#discussion_r355104107
########## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/distribution/PartitionBoundaries.java ########## @@ -20,28 +20,48 @@ package org.apache.druid.indexing.common.task.batch.parallel.distribution; import com.google.common.collect.ForwardingList; -import com.google.common.collect.ImmutableList; import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; import java.util.List; +import java.util.stream.Collectors; /** - * Convenience wrapper to make code more readable. + * List of range partition boundaries. */ -public class Partitions extends ForwardingList<String> implements List<String> +public class PartitionBoundaries extends ForwardingList<String> implements List<String> { private final List<String> delegate; // For jackson @SuppressWarnings("unused") - private Partitions() + private PartitionBoundaries() { delegate = new ArrayList<>(); } - public Partitions(String... partitions) + /** + * @param partitions Elements corresponding to evenly-spaced fractional ranks of the distribution + */ + public PartitionBoundaries(String... partitions) { - delegate = ImmutableList.copyOf(partitions); + if (partitions.length == 0) { + delegate = Collections.emptyList(); + return; + } + + List<String> partitionBoundaries = Arrays.stream(partitions) + .distinct() + .collect(Collectors.toCollection(ArrayList::new)); + + // First partition starts with null (see StringPartitionChunk.isStart()) + partitionBoundaries.set(0, null); + + // Last partition ends with null (see StringPartitionChunk.isEnd()) + partitionBoundaries.add(null); Review comment: The last partition will never be empty because it'll have at least one row with the max value. Previously, I had logic to combine it with the second-to-last partition when it was small: https://github.com/apache/incubator-druid/pull/8925#discussion_r350539718 If it's still desired to not have that logic to decide whether to combine it or not, then it needs to either always be combined or never be combined. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org