gianm opened a new issue #8091: groupBy with subtotalsSpec doesn't fully group 
each set
URL: https://github.com/apache/incubator-druid/issues/8091
 
 
   These two queries (if posed in GroupByQueryRunnerTest) return different 
results, but I think they should return the same results. The only difference 
between query 1 and query 2 is that query 2 has an additional dimension that is 
_not_ referenced in the subtotalsSpec.
   
   I believe the issue is that `GroupByStrategyV2.processSubtotalsSpec` calls 
`mergeResults` on rows returned by `GroupByRowProcessor.getRowsFromGrouper` for 
each subtotal dimension list, but this isn't enough to fully group them. The 
result rows from the grouper are sorted based on the original dimension set, 
and `mergeResults` only merges adjacent rows.
   
   i.e. Imagine you have two dimensions and they each have values A, B, and C. 
The original grouper might have rows like this:
   
   AA
   AB
   BA
   BB
   
   Passing "dimsToInclude" with just the second dim would yield these rows from 
the Grouper:
   
   A
   B
   A
   B
   
   Which cannot be merged by `mergeResults`.
   
   I can think of a couple of fixes:
   
   1. Re-sorting the Grouper rows each time they are pulled out. However, not 
all Groupers support re-sorting (e.g. SpillingGrouper cannot change its sort 
order once it has spilled to disk) so it might not always be possible.
   2. Using _two_ Groupers (with two merge buffers), one to store the initially 
grouped rows and one to store subtotal groupings. The act of adding results to 
the subtotal Grouper will properly and fully group them. Takes more memory but 
should work in all cases.
   
   Query 1:
   
   ```java
       GroupByQuery query = makeQueryBuilder()
           .setDataSource(QueryRunnerTestHelper.dataSource)
           .setQuerySegmentSpec(QueryRunnerTestHelper.firstToThird)
           .setDimensions(
               ImmutableList.of(
                   new DefaultDimensionSpec("market", "market")
               )
           )
           
.setAggregatorSpecs(Collections.singletonList(QueryRunnerTestHelper.rowsCount))
           .setGranularity(QueryRunnerTestHelper.allGran)
           .setSubtotalsSpec(ImmutableList.of(ImmutableList.of("market")))
           .build();
   ```
   
   Query 2:
   
   ```java
       GroupByQuery query = makeQueryBuilder()
           .setDataSource(QueryRunnerTestHelper.dataSource)
           .setQuerySegmentSpec(QueryRunnerTestHelper.firstToThird)
           .setDimensions(
               ImmutableList.of(
                   new DefaultDimensionSpec("quality", "quality"),
                   new DefaultDimensionSpec("market", "market")
               )
           )
           
.setAggregatorSpecs(Collections.singletonList(QueryRunnerTestHelper.rowsCount))
           .setGranularity(QueryRunnerTestHelper.allGran)
           .setSubtotalsSpec(ImmutableList.of(ImmutableList.of("market")))
           .build();
   ```
   
   /cc @himanshug 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

Reply via email to