superhawk610 opened a new issue, #12616:
URL: https://github.com/apache/druid/issues/12616
### Affected Version
`2021.11.1-iap`
### Description
When providing multiple sub-groups to `subtotalsSpec`, the Druid docs
recommend using the `grouping` aggregator to differentiate between results. The
`grouping` aggregator reports, for each dimension in a given list, whether or
not that dimension is used in a given sub-group's totals. `subtotalsSpec`,
however, accepts any `outputName`, not just dimension names. Take this example:
```jsonc
{
  "queryType": "groupBy",
  "granularity": "all",
  // .. (snip) ..
  "subtotalsSpec": [["a"], ["b"]],
  "aggregations": [
    {
      "type": "grouping",
      "name": "__grouping__",
      "groupings": ["a", "b"]
    }
  ],
  "dimensions": [
    {
      "type": "lookup",
      "dimension": "id",
      "outputName": "a",
      "lookup": {
        "type": "map",
        "map": { "1": "foo", "2": "foo", "3": "bar" }
      }
    },
    {
      "type": "lookup",
      "dimension": "id",
      "outputName": "b",
      "lookup": {
        "type": "map",
        // importantly, `id=2` is in a different sub-group depending on whether
        // we're grouping by `a` or `b` (even though the base dimension, `id`,
        // is the same in each case)
        "map": { "1": "X", "2": "Y", "3": "Z" }
      }
    }
  ]
}
```
I would expect this query to return results that look like this:
```jsonc
[
  // omitting the timestamp/version/event wrapper, but you get the idea
  {
    "__grouping__": 0b01,
    "a": "foo",
    "b": null,
    "views": 2 // some metric, doesn't really matter
  },
  {
    "__grouping__": 0b01,
    "a": "bar",
    "b": null,
    "views": 1
  },
  {
    "__grouping__": 0b10,
    "a": null,
    "b": "X",
    "views": 1
  },
  {
    "__grouping__": 0b10,
    "a": null,
    "b": "Y",
    "views": 2
  }
]
```
```
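To make the expected bitmask concrete, here is a minimal Python sketch of how a client could decode a `__grouping__` value. It assumes the convention used in the expected results above: one bit per entry in `groupings`, first entry in the most significant bit, `0` meaning "grouped on this name". The helper name `decode_grouping` is hypothetical and not part of Druid.

```python
def decode_grouping(value, groupings):
    """Return the names in `groupings` whose bit is 0 (i.e. grouped on) in `value`.

    Assumption: the first name in `groupings` maps to the most significant bit.
    """
    n = len(groupings)
    included = []
    for i, name in enumerate(groupings):
        bit = (value >> (n - 1 - i)) & 1
        if bit == 0:
            included.append(name)
    return included


print(decode_grouping(0b01, ["a", "b"]))  # rows grouped by `a`
print(decode_grouping(0b10, ["a", "b"]))  # rows grouped by `b`
print(decode_grouping(0b11, ["a", "b"]))  # neither name recognized as grouped on
```

With the expected output, `0b01` would identify the rows grouped by `a` and `0b10` the rows grouped by `b`.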
However, `__grouping__` is `0b11` for all 4 results, because the "dimensions"
`a` and `b` aren't used in any result (they're not dimensions, they're the
output names of lookups). If I provide `id` to the `grouping` aggregator
instead, its corresponding bit in the output is correctly `0` for all rows,
since `id` is used to generate both the `a` and `b` values. That isn't helpful,
though, as I cannot tell which results are grouped by `a` and which are grouped
by `b`.
I propose that the `grouping` aggregator allow specifying `outputName`
instead of dimension name, to align 1:1 with how `subtotalsSpec` works and
allow for differentiating between output sub-groups in cases like the one
illustrated above.
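The difference between the reported and the proposed behavior can be sketched with a toy Python model. This is not Druid's implementation; it only assumes, as described in this issue, that the aggregator currently compares `groupings` against base dimension names, and shows what would change if it compared against `outputName` instead. The function `grouping_id` and its `match_key` parameter are hypothetical.

```python
def grouping_id(groupings, subtotal, dimension_specs, match_key):
    """Toy model: bit is 0 if the name counts as grouped on, else 1.

    `match_key` selects which field of a dimension spec the names in
    `groupings` are compared against ("dimension" or "outputName").
    """
    # Names considered "in use" for this sub-group, under the chosen matching rule.
    used = {
        spec[match_key]
        for spec in dimension_specs
        if spec["outputName"] in subtotal
    }
    value = 0
    for name in groupings:
        value = (value << 1) | (0 if name in used else 1)
    return value


specs = [
    {"dimension": "id", "outputName": "a"},
    {"dimension": "id", "outputName": "b"},
]

# Reported behavior: matching on the base dimension name.
print(bin(grouping_id(["a", "b"], ["a"], specs, "dimension")))   # 0b11
print(bin(grouping_id(["id"], ["a"], specs, "dimension")))       # 0b0

# Proposed behavior: matching on outputName.
print(bin(grouping_id(["a", "b"], ["a"], specs, "outputName")))  # 0b1
print(bin(grouping_id(["a", "b"], ["b"], specs, "outputName")))  # 0b10
```

Under the `outputName` rule, the two sub-groups produce distinct values (`0b01` and `0b10`), which is what a client needs to tell them apart.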