#general


@wrbriggs: Sorry to be a never-ending fount of questions, folks… is it expected / necessary to create a rangeIndex on dateTime fields, or are those automatically indexed efficiently? Likewise, should I add dateTime fields to the noDictionaryColumns list?
  @mayanks: What's your time granularity?
  @mayanks: Typically, we don't need to set explicit indexing for dateTime fields, as we can still prune segments based on metadata.
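The metadata-based pruning mentioned above can be pictured with a small sketch. This is not Pinot's implementation; `SegmentMeta` and `prune_by_time` are hypothetical names, and the only assumption is that each segment records the minimum and maximum value of its time column.

```python
from dataclasses import dataclass


@dataclass
class SegmentMeta:
    # Hypothetical stand-in for per-segment metadata (min/max of the time column).
    name: str
    min_time_ms: int
    max_time_ms: int


def prune_by_time(segments, query_start_ms, query_end_ms):
    """Keep only segments whose [min, max] range overlaps the inclusive query range."""
    return [
        s for s in segments
        if s.max_time_ms >= query_start_ms and s.min_time_ms <= query_end_ms
    ]


segments = [
    SegmentMeta("seg_0", 0, 3_599_999),
    SegmentMeta("seg_1", 3_600_000, 7_199_999),
    SegmentMeta("seg_2", 7_200_000, 10_799_999),
]

# A time filter that only touches the second and third hour skips seg_0 entirely.
kept = prune_by_time(segments, 4_000_000, 8_000_000)
print([s.name for s in kept])
```

Segments that cannot match the time filter are skipped without consulting any index, which is why an explicit index on the time column is often not needed.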
  @wrbriggs: My base timestamp is epoch milliseconds, I am playing around with deriving an hourly grain field for pre-aggregating in a star-tree index, but based on my current prototype, that seems like it might be premature optimization
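Deriving an hourly grain from epoch milliseconds is plain integer arithmetic. A minimal sketch, assuming the timestamp is a non-negative epoch-millisecond integer; the function names are illustrative and not taken from Pinot:

```python
MILLIS_PER_HOUR = 60 * 60 * 1000


def to_epoch_hours(epoch_millis: int) -> int:
    """Number of whole hours since the epoch (the hourly bucket id)."""
    return epoch_millis // MILLIS_PER_HOUR


def hour_start_millis(epoch_millis: int) -> int:
    """Epoch milliseconds truncated to the start of the containing hour."""
    return to_epoch_hours(epoch_millis) * MILLIS_PER_HOUR


ts = 1_600_000_000_000
print(to_epoch_hours(ts), hour_start_millis(ts))
```

Either form works as a derived column for pre-aggregation: the bucket id is more compact, while the truncated timestamp stays in the same unit as the base column.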
  @mayanks: A general recommendation is to sort on the primary key (or a dimension that appears in most queries), and to add a minimal number of inverted indexes to get reasonable selectivity across your query set.
  @wrbriggs: :thumbsup:
  @wrbriggs: Thank you. I am also looking to partition the incoming data based on a dimension that is almost always used selectively in the WHERE clause, and use broker-side partition pruning to minimize scanning unnecessary segments - I’m not sure if that will actually help, or if forcing data locality like that will bottleneck things.
  @wrbriggs: I am using that same dimension as my sort key, but right now, it’s not particularly useful to sort on it, because it shows up in all segments
  @mayanks: Oh yeah, partitioning is definitely a good idea.
  @mayanks: In our use cases, we typically have the partitioning as well as sorting on the same dimension
  @wrbriggs: Ok, that makes me feel better, as that was my plan.
  @mayanks: Good plan, I'd say.
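The plan discussed above (partition and sort on the same dimension, with broker-side partition pruning) might look roughly like the table config fragment below, built as a Python dict and printed as JSON. The column name `accountId` and the partition count are made up for illustration, and the key names reflect my understanding of the Pinot table config format, so check them against the docs for your version.

```python
import json

# Hypothetical dimension that appears in almost every WHERE clause.
PARTITION_COLUMN = "accountId"

table_config_fragment = {
    "tableIndexConfig": {
        # Sort on the same dimension that the data is partitioned on.
        "sortedColumn": [PARTITION_COLUMN],
        "segmentPartitionConfig": {
            "columnPartitionMap": {
                # Assumed values: should match how the incoming stream is partitioned.
                PARTITION_COLUMN: {"functionName": "Murmur", "numPartitions": 8}
            }
        },
    },
    # Lets the broker skip segments whose partition cannot match the filter.
    "routing": {"segmentPrunerTypes": ["partition"]},
}

print(json.dumps(table_config_fragment, indent=2))
```

Note that the sort column is not listed under any inverted index setting, in line with the advice in this thread.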
  @wrbriggs: Another stupid question - should I create an inverted index on the sort column, or is that unnecessary?
  @mayanks: That is unnecessary; it won't be used. In fact, segment generation might just ignore it and not create the index.
  @wrbriggs: Perfect, thank you. It seemed like it would be unnecessary, but I’ve seen stranger things, and the docs, while great for an incubating project, were a little unclear - I would love to volunteer to keep notes while I’m doing this, and maybe propose some updates to the docs if that would be helpful.
  @mayanks: That would be really awesome, would really appreciate your help in improving our docs.
  @g.kishore: @wrbriggs you might find this video useful
  @g.kishore: it talks about all the indexing techniques and when to use what.
  @wrbriggs: That’s awesome, thank you

#discuss-validation


@chinmay.cerebro: @mayanks @ssubrama @snlee: not sure if you got time to review the table config validation schema created by @mohammedgalalen056:
@chinmay.cerebro: Please review when you get some time
@chinmay.cerebro: I think there are cases where this might break things. For example, the schema has `"replication": { "type": "string" }`, but I've seen configs that use `"replication": 3`, and those get flagged by the validation since we expect a string.
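One possible way to avoid the false positive is to let the schema accept either type, which JSON Schema expresses as a list under `type`. The sketch below is a hand-rolled check for illustration only, not the validator under review; `REPLICATION_SCHEMA`, `matches_type`, and `is_valid` are hypothetical names.

```python
# Relaxed rule: accept both "replication": "3" and "replication": 3.
REPLICATION_SCHEMA = {"type": ["string", "integer"]}


def matches_type(value, type_name):
    """Check a value against a single JSON Schema type name (only two supported here)."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    raise ValueError(f"unsupported type in this sketch: {type_name}")


def is_valid(value, schema):
    """Return True if the value matches any of the types the schema allows."""
    allowed = schema["type"]
    if isinstance(allowed, str):
        allowed = [allowed]
    return any(matches_type(value, t) for t in allowed)


print(is_valid(3, {"type": "string"}))    # the strict rule rejects the integer form
print(is_valid(3, REPLICATION_SCHEMA))    # the relaxed rule accepts it
print(is_valid("3", REPLICATION_SCHEMA))  # the string form still passes
```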