#general
@wrbriggs: Sorry to be a never-ending fount of questions, folks… is it expected / necessary to create a rangeIndex on dateTime fields, or are those automatically indexed efficiently? Likewise, should I add dateTime fields to the noDictionaryColumns list?
@mayanks: What's your time granularity?
@mayanks: Typically, you don't need to set an explicit index on dateTime fields, since segments can still be pruned using their time-range metadata.
@wrbriggs: My base timestamp is epoch milliseconds. I’m playing around with deriving an hourly-grain field for pre-aggregating in a star-tree index, but based on my current prototype, that seems like it might be premature optimization.
@mayanks: A general recommendation is to sort on the primary key (or on a dimension that appears in most queries), and to add the minimal number of inverted indexes needed to get reasonable selectivity across your query set.
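A minimal sketch of that recommendation expressed as a Pinot `tableIndexConfig` fragment. The column names (`customerId` as the most-queried dimension, `country` and `deviceType` as the other filters) are hypothetical placeholders, and the epoch-millis time column is deliberately left without a range index, relying on segment metadata for time pruning as described above.

```python
import json

# Hypothetical column names, for illustration only.
SORT_COLUMN = "customerId"                       # primary key / most-queried dimension
INVERTED_INDEX_COLUMNS = ["country", "deviceType"]  # few extra filter columns


def build_index_config(sort_column, inverted_columns):
    """Return a tableIndexConfig fragment with one sorted column and a
    minimal set of inverted indexes. No rangeIndexColumns entry is added
    for the time column; segment time metadata handles pruning."""
    # An inverted index on the sorted column adds nothing, so drop it.
    inverted = [c for c in inverted_columns if c != sort_column]
    return {
        "tableIndexConfig": {
            "sortedColumn": [sort_column],
            "invertedIndexColumns": inverted,
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_index_config(SORT_COLUMN, INVERTED_INDEX_COLUMNS), indent=2))
```

Filtering the sort column out of the inverted-index list mirrors the point made later in the thread that an inverted index on the sorted column is not used.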
@wrbriggs: :thumbsup:
@wrbriggs: Thank you. I’m also looking to partition the incoming data on a dimension that is almost always used as a selective filter in the WHERE clause, and to use broker-side partition pruning to avoid scanning unnecessary segments. I’m not sure whether that will actually help, or whether forcing data locality like that will create a bottleneck.
@wrbriggs: I’m using that same dimension as my sort key, but right now sorting on it isn’t particularly useful, because its values show up in every segment.
@mayanks: Oh yeah, partitioning is definitely a good idea.
@mayanks: In our use cases, we typically partition and sort on the same dimension.
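A sketch of a table config fragment that partitions and sorts on the same column and turns on broker-side partition pruning. This assumes a Pinot version whose `routing` section accepts `segmentPrunerTypes` (older releases configured routing differently), and assumes the ingestion stream is partitioned with the same function and partition count. The column name and partition count in the usage line are hypothetical.

```python
import json


def build_partitioned_config(column, num_partitions, function_name="Murmur"):
    """Return a config fragment that sorts and partitions on `column` and
    enables partition-based segment pruning on the broker."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    return {
        "tableIndexConfig": {
            # Sorting on the partition column keeps matching rows
            # contiguous within each segment.
            "sortedColumn": [column],
            "segmentPartitionConfig": {
                "columnPartitionMap": {
                    column: {
                        "functionName": function_name,
                        "numPartitions": num_partitions,
                    }
                }
            },
        },
        # Lets the broker skip segments whose partition cannot contain
        # the value filtered on in the WHERE clause.
        "routing": {"segmentPrunerTypes": ["partition"]},
    }


if __name__ == "__main__":
    # Hypothetical column and partition count.
    print(json.dumps(build_partitioned_config("customerId", 8), indent=2))
```

With both settings on one column, a query filtering on a single value touches only one partition's segments, and within those segments the sorted column narrows the scan to a contiguous row range.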
@wrbriggs: Ok, that makes me feel better, as that was my plan.
@mayanks: Good plan, I'd say.
@wrbriggs: Another stupid question - should I create an inverted index on the sort column, or is that unnecessary?
@mayanks: That is unnecessary; it won't be used. In fact, segment generation may simply ignore it and not create the index.
@wrbriggs: Perfect, thank you. It seemed like it would be unnecessary, but I’ve seen stranger things, and the docs, while great for an incubating project, were a little unclear. I would love to volunteer to keep notes while I’m doing this, and maybe propose some updates to the docs if that would be helpful.
@mayanks: That would be really awesome; we would really appreciate your help in improving our docs.
@g.kishore: @wrbriggs you might find this video useful
@g.kishore: it talks about all the indexing techniques and when to use what.
@wrbriggs: That’s awesome, thank you
#discuss-validation
@chinmay.cerebro: @mayanks @ssubrama @snlee: not sure if you've had time to review the table config validation schema created by @mohammedgalalen056:
@chinmay.cerebro: Please review when you get some time
@chinmay.cerebro: I think there are cases where this might break things. For example, the schema has `"replication": { "type": "string" }`, but I've seen configs that use `"replication": 3`, which the validation flags because it expects a string.
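One possible relaxation is to widen the rule to `"replication": { "type": ["string", "integer"] }`, since JSON Schema allows `type` to be a list of alternatives. The sketch below is a simplified, hypothetical stand-in for a schema validator (not the `jsonschema` library) that only evaluates the `type` keyword, to show which values each form accepts.

```python
# Simplified mapping from JSON Schema primitive type names to Python types.
_JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
    "null": type(None),
}


def matches_schema_type(value, schema_type):
    """Return True if `value` satisfies a JSON Schema "type" keyword, which
    may be a single type name or a list of alternative names."""
    names = [schema_type] if isinstance(schema_type, str) else list(schema_type)
    for name in names:
        # bool subclasses int in Python, but JSON Schema never treats
        # true/false as integers or numbers.
        if isinstance(value, bool) and name in ("integer", "number"):
            continue
        if isinstance(value, _JSON_TYPES[name]):
            return True
    return False


if __name__ == "__main__":
    print(matches_schema_type(3, "string"))                 # current rule
    print(matches_schema_type(3, ["string", "integer"]))    # widened rule
    print(matches_schema_type("3", ["string", "integer"]))  # strings still pass
```

Under the current rule, `3` is rejected; under the widened rule, both `3` and `"3"` pass, while a boolean such as `true` is still rejected.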
