adarshsanjeev commented on code in PR #16804:
URL: https://github.com/apache/druid/pull/16804#discussion_r1700096766


##########
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/WindowOperatorQueryKit.java:
##########
@@ -328,4 +321,29 @@ private ShuffleSpec 
findShuffleSpecForNextWindow(List<OperatorFactory> operatorF
 
     return new HashShuffleSpec(new ClusterBy(keyColsOfWindow, 0), 
maxWorkerCount);
   }
+
+  /**
+   * Override the shuffle spec of the last stage based on the shuffling 
required by the first window stage.
+   * @param queryId
+   * @param dataSourcePlan
+   * @param shuffleSpec
+   * @return
+   */
+  private QueryDefinitionBuilder makeQueryDefinitionBuilder(String queryId, 
DataSourcePlan dataSourcePlan, ShuffleSpec shuffleSpec)
+  {
+    final QueryDefinitionBuilder queryDefBuilder = 
QueryDefinition.builder(queryId);
+    int previousStageNumber = 
dataSourcePlan.getSubQueryDefBuilder().get().build().getFinalStageDefinition().getStageNumber();
+    for (final StageDefinition stageDef : 
dataSourcePlan.getSubQueryDefBuilder().get().build().getStageDefinitions()) {
+      if (stageDef.getStageNumber() == previousStageNumber) {
+        RowSignature rowSignature = QueryKitUtils.sortableSignature(
+            stageDef.getSignature(),
+            shuffleSpec.clusterBy().getColumns()
+        );
+        
queryDefBuilder.add(StageDefinition.builder(stageDef).shuffleSpec(shuffleSpec).signature(rowSignature));

Review Comment:
   Currently, it is entirely possible that the final old shuffle spec contains 
non null shuffling. When creating the previous stages, it is the windowQueryKit 
which gives hints on what final it wants the shuffling to be based on 
`resultShuffleSpecFactory` parameter. Currently, this is 
`globalSortWithMaxPartitionCount` (if this was a normal subquery, we would want 
it to be running on as many workers as that stage allows).
   
   I think it might be better to change the resultShuffleSpec we pass from 
windowQueryKit, and then do an assert on the shuffle spec. 
   
   @kgyrtkirk Do you know of any example query which might cause issues if we 
change the shuffle spec? I have thought about it a bit, but I can't think of 
one. Since it is window function which is reading the results of this shuffle, 
and it does not care about the ordering (I'm assuming this is the case since we 
want to change it to mixShuffleSpec), there should not be an issue with this 
change, from what I can tell, but I might have to think more about this.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to