[GitHub] [incubator-druid] clintropolis commented on a change in pull request #7185: Avoid many unnecessary materializations of collections of 'all segments in cluster' cardinality

GitBox Tue, 05 Mar 2019 23:33:43 -0800

clintropolis commented on a change in pull request #7185: Avoid many 
unnecessary materializations of collections of 'all segments in cluster' 
cardinality
URL: https://github.com/apache/incubator-druid/pull/7185#discussion_r262811871


 ##########
 File path: server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java
 ##########
 @@ -94,10 +93,29 @@
 @ManageLifecycle
 public class DruidCoordinator
 {
-  public static Comparator<DataSegment> SEGMENT_COMPARATOR = Ordering.from(Comparators.intervalsByEndThenStart())
-                                                                     .onResultOf(DataSegment::getInterval)
-                                                                     .compound(Ordering.<DataSegment>natural())
-                                                                     .reverse();
+  /**
+   * This comparator orders "freshest" segments first, i.e., segments with the most recent intervals.
+   *
+   * It is used in historical nodes' {@link LoadQueuePeon}s to make historicals load more recent segments first.
+   *
+   * It is also used in {@link DruidCoordinatorRuntimeParams} for {@link
+   * DruidCoordinatorRuntimeParams#getAvailableSegments()} - a collection of segments to be considered during some
+   * coordinator run for different {@link DruidCoordinatorHelper}s. The order matters only for {@link
+   * DruidCoordinatorRuleRunner}, which tries to apply the rules while iterating the segments in the order imposed by
+   * this comparator. In {@link LoadRule} the throttling limit may be hit (via {@link ReplicationThrottler}; see
+   * {@link CoordinatorDynamicConfig#getReplicationThrottleLimit()}). So before we potentially hit this limit, we want
+   * to schedule loading the more recent segments (among all of those that need to be loaded).
+   *
+   * In both {@link LoadQueuePeon}s and {@link DruidCoordinatorRuleRunner}, we want to load more recent segments first
+   * because presumably they are queried more often and contain more important data for users, so if the Druid
+   * cluster has availability problems and is struggling to make all segments available immediately, at least we try
+   * to make more "important" (more recent) segments available as soon as possible.
+   */
 
 Review comment:
   Thanks for javadoc :+1:
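
   For readers following along, here is a rough sketch of the ordering the
   javadoc describes, using only java.util. It is not the Druid code: the
   Segment class, the FRESHEST_FIRST name, and the pickToLoad helper are
   hypothetical stand-ins, the id tie-breaker only approximates the natural
   ordering of DataSegment, and the fixed "limit" is a simplification of
   the throttle, not how ReplicationThrottler actually works.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class FreshestFirstSketch
{
  /** Hypothetical stand-in for DataSegment: an id plus interval bounds in epoch millis. */
  static final class Segment
  {
    final String id;
    final long startMillis;
    final long endMillis;

    Segment(String id, long startMillis, long endMillis)
    {
      this.id = id;
      this.startMillis = startMillis;
      this.endMillis = endMillis;
    }
  }

  /** Compare by interval end, then start, then id as a tie-breaker; reversed so the newest sorts first. */
  static final Comparator<Segment> FRESHEST_FIRST =
      Comparator.<Segment>comparingLong(s -> s.endMillis)
                .thenComparingLong(s -> s.startMillis)
                .thenComparing(s -> s.id)
                .reversed();

  /** Returns the ids of at most {@code limit} segments, freshest first (the limit mimics a throttle). */
  static List<String> pickToLoad(List<Segment> segments, int limit)
  {
    List<Segment> sorted = new ArrayList<>(segments);
    sorted.sort(FRESHEST_FIRST);
    List<String> ids = new ArrayList<>();
    for (Segment s : sorted.subList(0, Math.min(limit, sorted.size()))) {
      ids.add(s.id);
    }
    return ids;
  }

  public static void main(String[] args)
  {
    List<Segment> segments = Arrays.asList(
        new Segment("2019-03-01", 100L, 200L),
        new Segment("2019-03-03", 300L, 400L),
        new Segment("2019-03-02", 200L, 300L)
    );
    // With a limit of 2, only the two most recent segments are scheduled.
    System.out.println(pickToLoad(segments, 2)); // prints [2019-03-03, 2019-03-02]
  }
}
```

   The point of sorting before applying the limit is that whatever gets cut
   off by the limit is the oldest data, which matches the intent stated in
   the javadoc.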

