Hi Druid devs! I’m investigating Druid for an analytical workload, and I think it would be a great fit for my data and use case. One thing I’m stuck on right now is data modelling.
I’ll use the somewhat classic “Blog Post” example to illustrate. Assume a Blog Entry may have many associated “comments” and many “reactions”, with both quantities unknown in advance. What is the best way to model this? The example flattenSpecs I’ve seen for array handling seem to indicate that the size of the array must be known and constant. Does that rule out modelling the Blog Entry above as a single row for peak performance?

One natural way to model this in an RDBMS would be a table for each of Blog Entries, Comments, and Reactions, then performing joins as needed. But am I correct in assuming that joins ought to be avoided in Druid?

Thanks in advance,
Evan
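P.S. For concreteness, here’s the style of flattenSpec I’ve been seeing in array-handling examples (field names here are hypothetical, just for illustration). Each array element is pulled out by a fixed index, which is what makes me think the array size has to be known up front:

```json
"flattenSpec": {
  "useFieldDiscovery": true,
  "fields": [
    { "type": "path", "name": "comment0_text",  "expr": "$.comments[0].text" },
    { "type": "path", "name": "comment1_text",  "expr": "$.comments[1].text" },
    { "type": "path", "name": "reaction0_type", "expr": "$.reactions[0].type" }
  ]
}
```

If there’s a way to express “all elements of `comments`” without enumerating indices like this, that would answer a big part of my question.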