gianm opened a new issue #8091: groupBy with subtotalsSpec doesn't fully group each set URL: https://github.com/apache/incubator-druid/issues/8091 These two queries (if posed in GroupByQueryRunnerTest) return different results, but I think they should return the same results. The only difference between query 1 and query 2 is that query 2 has an additional dimension that is _not_ referenced in the subtotalsSpec. I believe the issue is that `GroupByStrategyV2.processSubtotalsSpec` calls `mergeResults` on rows returned by `GroupByRowProcessor.getRowsFromGrouper` for each subtotal dimension list, but this isn't enough to fully group them. The result rows from the grouper are sorted based on the original dimension set, and `mergeResults` only merges adjacent rows. i.e. Imagine you have two dimensions and they each have values A, B, and C. The original grouper might have rows like this: AA AB BA BB Passing "dimsToInclude" with just the second dim would yield these rows from the Grouper: A B A B Which cannot be merged by `mergeResults`. I can think of a couple of fixes: 1. Re-sorting the Grouper rows each time they are pulled out. However, not all Groupers support re-sorting (e.g. SpillingGrouper cannot change its sort order once it has spilled to disk) so it might not always be possible. 2. Using _two_ Groupers (with two merge buffers), one to store the initially grouped rows and one to store subtotal groupings. The act of adding results to the subtotal Grouper will properly and fully group them. Takes more memory but should work in all cases. Query 1: ```java GroupByQuery query = makeQueryBuilder() .setDataSource(QueryRunnerTestHelper.dataSource) .setQuerySegmentSpec(QueryRunnerTestHelper.firstToThird) .setDimensions( ImmutableList.of( new DefaultDimensionSpec("market", "market") ) ) .setAggregatorSpecs(Collections.singletonList(QueryRunnerTestHelper.rowsCount)) .setGranularity(QueryRunnerTestHelper.allGran) .setSubtotalsSpec(ImmutableList.of(ImmutableList.of("market"))) .build(); ``` Query 2: ```java GroupByQuery query = makeQueryBuilder() .setDataSource(QueryRunnerTestHelper.dataSource) .setQuerySegmentSpec(QueryRunnerTestHelper.firstToThird) .setDimensions( ImmutableList.of( new DefaultDimensionSpec("quality", "quality"), new DefaultDimensionSpec("market", "market") ) ) .setAggregatorSpecs(Collections.singletonList(QueryRunnerTestHelper.rowsCount)) .setGranularity(QueryRunnerTestHelper.allGran) .setSubtotalsSpec(ImmutableList.of(ImmutableList.of("market"))) .build(); ``` /cc @himanshug
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org