Would you mind to share more detail about how you indexing these aggregations and how your query will convert to ES API?
BTW, does this similar to Druid doing? >Multiple indexing is what we take advantage of. ES, by default indexes on >all fields of a document. We store a multidimensional aggregation as an ES >document whose fields are the various dimensions and metrics associated >with the aggregation. Best Regards! --------------------- Luke Han On Sat, Dec 12, 2015 at 3:05 AM, Sarnath <[email protected]> wrote: > >>>> Sorted indexes are a viable approach to OLAP storage — Druid[1] does > it, and so does SAP HANA. The idea is that if you sort and compress your > data it becomes very compact, so you can do very fast scans. So fast that > you don’t need to pre-aggregate it. > > Yes, the problem (which I think you have covered below) is that you can > only sort on a column of interest... And you can sort again on other > columns among all rows where the first column has the same value.... But > then, if you were to filter by second column - you will still need to scan > entire table. Very similar to the analogy in our blog.(search for all > English words whose second letter is 'a') > And, as your filtering query becomes complex, it becomes very difficult. I > believe Druid is optimized for time series analytics (how much by minute, > hour, day etc..). Not sure about multidimensional aggregations... > > >>>> Elasticsearch is an index but it is not an OLAP index - their use case > does not call for compressing numeric data, and they optimize for point > lookups rather than scans. > > We use ES only to serve pre-aggregated cube data and not to index the raw > data to produce OLAP cubes. > > >>>>> The best OLAP indexes are able to combine multiple indexes. E.g. take > two not-very-selective conditions and make a selective condition. The > poorer ones can only use one index, so to get coverage you need to build > more indexes. > > Can you elaborate on Not-so-selective condition? I am a bit lost on the > context. > > Multiple indexing is what we take advantage of. ES, by default indexes on > all fields of a document. We store a multidimensional aggregation as an ES > document whose fields are the various dimensions and metrics associated > with the aggregation. Thus the cube can be sliced and diced on any > dimension and filtered on metrics as well.. And again, this indexing is > completely different from indexing on raw data or table data. We are > dealing with data cubes here. > > Best, > Sarnath >
