adarshsanjeev commented on code in PR #16804: URL: https://github.com/apache/druid/pull/16804#discussion_r1700096766
########## extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/querykit/WindowOperatorQueryKit.java: ########## @@ -328,4 +321,29 @@ private ShuffleSpec findShuffleSpecForNextWindow(List<OperatorFactory> operatorF return new HashShuffleSpec(new ClusterBy(keyColsOfWindow, 0), maxWorkerCount); } + + /** + * Override the shuffle spec of the last stage based on the shuffling required by the first window stage. + * @param queryId + * @param dataSourcePlan + * @param shuffleSpec + * @return + */ + private QueryDefinitionBuilder makeQueryDefinitionBuilder(String queryId, DataSourcePlan dataSourcePlan, ShuffleSpec shuffleSpec) + { + final QueryDefinitionBuilder queryDefBuilder = QueryDefinition.builder(queryId); + int previousStageNumber = dataSourcePlan.getSubQueryDefBuilder().get().build().getFinalStageDefinition().getStageNumber(); + for (final StageDefinition stageDef : dataSourcePlan.getSubQueryDefBuilder().get().build().getStageDefinitions()) { + if (stageDef.getStageNumber() == previousStageNumber) { + RowSignature rowSignature = QueryKitUtils.sortableSignature( + stageDef.getSignature(), + shuffleSpec.clusterBy().getColumns() + ); + queryDefBuilder.add(StageDefinition.builder(stageDef).shuffleSpec(shuffleSpec).signature(rowSignature)); Review Comment: Currently, it is entirely possible that the final old shuffle spec contains non null shuffling. When creating the previous stages, it is the windowQueryKit which gives hints on what final it wants the shuffling to be based on `resultShuffleSpecFactory` parameter. Currently, this is `globalSortWithMaxPartitionCount` (if this was a normal subquery, we would want it to be running on as many workers as that stage allows). I think it might be better to change the resultShuffleSpec we pass from windowQueryKit, and then do an assert on the shuffle spec. @kgyrtkirk Do you know of any example query which might cause issues if we change the shuffle spec? I have thought about it a bit, but I can't think of one. Since it is window function which is reading the results of this shuffle, and it does not care about the ordering (I'm assuming this is the case since we want to change it to mixShuffleSpec), there should not be an issue with this change, from what I can tell, but I might have to think more about this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org