Question on GroupBy query results merging process

Jisoo Kim Thu, 19 Jul 2018 14:43:01 -0700

Hi all,

I am currently working on a project that uses Druid's QueryRunner and other
druid-processing classes. It uses Druid's own classes to calculate query
results. I have been testing large GroupBy queries (using v2), and it seems
that parallel combining threads for GroupBy queries are only enabled at the
historical level. As far as I can tell, parallel combining is only invoked
from GroupByStrategyV2.mergeRunners()
<https://github.com/apache/incubator-druid/blob/druid-0.12.1/processing/src/main/java/io/druid/query/groupby/strategy/GroupByStrategyV2.java#L335>
which is only called by GroupByQueryRunnerFactory.mergeRunners() on
historicals.


Are GroupByMergingQueryRunnerV2 and the parallel combining threads meant only
for computing and merging per-segment results, or can they also be used at
the broker level? I changed my project's logic from calling
queryToolChest.mergeResults() on a MergeSequence (created from a list of
per-segment/per-server sequences) to calling
queryToolChest.mergeResults() on queryRunnerFactory.mergeRunners() (where
each runner returns a deserialized result sequence), and that seems to have
substantially reduced both computation time and failures for very heavy
GroupBy queries. Or is this just a coincidence? Should there be no
performance difference in how GroupBy results are merged, so that the only
difference comes from parallelizing the deserialization of result
sequences from sub-queries?
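
To make the comparison concrete, here is a minimal standalone sketch of the
two shapes of merging I am asking about. It does not use any Druid classes:
MergeSketch, combineInto, mergeSequentially and mergeWithPool are
hypothetical names, each Supplier stands in for one sub-query result that
still has to be deserialized, and summing longs per key stands in for the
real aggregation. It is only an illustration of the idea, not how Druid
implements either path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical illustration only; none of these names exist in Druid.
class MergeSketch {
    // Fold one partial result into the accumulator, summing values per group key.
    static void combineInto(Map<String, Long> acc, Map<String, Long> partial) {
        for (Map.Entry<String, Long> e : partial.entrySet()) {
            acc.merge(e.getKey(), e.getValue(), Long::sum);
        }
    }

    // Shape 1: every source is materialized and merged on the calling thread.
    static Map<String, Long> mergeSequentially(List<Supplier<Map<String, Long>>> sources) {
        Map<String, Long> acc = new TreeMap<>();
        for (Supplier<Map<String, Long>> source : sources) {
            combineInto(acc, source.get());
        }
        return acc;
    }

    // Shape 2: sources are materialized concurrently in a thread pool;
    // the final combine still happens on the calling thread.
    static Map<String, Long> mergeWithPool(List<Supplier<Map<String, Long>>> sources,
                                           int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, Long>>> futures = new ArrayList<>();
            for (Supplier<Map<String, Long>> source : sources) {
                Callable<Map<String, Long>> task = source::get;
                futures.add(pool.submit(task));
            }
            Map<String, Long> acc = new TreeMap<>();
            for (Future<Map<String, Long>> future : futures) {
                combineInto(acc, future.get());
            }
            return acc;
        } finally {
            pool.shutdown();
        }
    }
}
```

Both methods return the same merged result; in this sketch the only thing
that differs is whether the per-source work (source.get()) runs in parallel,
which is the effect I am trying to separate from the merge itself.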

Thanks,
Jisoo
