Hi Druid devs! I’m investigating Druid for an analytical workload, and I think it would be a great fit for my data and use case. One thing I’m stuck on right now is data modelling.
I’ll use the somewhat classic “Blog Post” example to illustrate. Assume a Blog Entry may have many associated “comments” and many “reactions”, with both quantities unknown in advance. What is the best way to model this? The example flattenSpecs I’ve seen for array handling seem to indicate that the size of the array must be known and constant. Does that rule out modelling the Blog Entry above as a single row for peak performance?

One natural way to model this in an RDBMS would be a table for each of Blog Entries, Comments, and Reactions, then performing joins as needed. But am I correct in assuming that joins ought to be avoided in Druid?

Thanks in advance,
Evan
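P.S. For concreteness, here’s the style of flattenSpec I’ve been seeing in array-handling examples (field names here are hypothetical, just for illustration). Each array element is pulled out by a fixed index, which is what makes me think the array size has to be known up front:

```json
"flattenSpec": {
  "useFieldDiscovery": true,
  "fields": [
    { "type": "path", "name": "comment0_text",  "expr": "$.comments[0].text" },
    { "type": "path", "name": "comment1_text",  "expr": "$.comments[1].text" },
    { "type": "path", "name": "reaction0_type", "expr": "$.reactions[0].type" }
  ]
}
```

If there’s a way to express “all elements of `comments`” without enumerating indices like this, that would answer a big part of my question.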