Sorted indexes are a viable approach to OLAP storage: Druid [1] does it, and so does SAP HANA. The idea is that if you sort and compress your data, it becomes very compact, so you can scan it very fast. So fast that you don't need to pre-aggregate it.
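A minimal sketch of the sort-and-compress idea, assuming a single sort key and plain run-length encoding. The helper names `rle_encode` and `sum_where` are hypothetical and are not the API of Druid, SAP HANA, or any real system; real engines use more elaborate encodings. Sorting collapses the key column into one run per distinct value, and a filter on that key becomes one contiguous slice, so the aggregate is computed at query time rather than pre-aggregated.

```python
from bisect import bisect_left
from typing import List, Tuple

# Hypothetical helpers for illustration only; not any real system's API.
Run = Tuple[str, int, int]  # (value, start row, run length)


def rle_encode(sorted_column: List[str]) -> List[Run]:
    """Run-length encode a column that is already sorted.

    Sorting puts equal values next to each other, so the column
    collapses into one run per distinct value.
    """
    runs: List[Run] = []
    for i, value in enumerate(sorted_column):
        if runs and runs[-1][0] == value:
            v, start, length = runs[-1]
            runs[-1] = (v, start, length + 1)
        else:
            runs.append((value, i, 1))
    return runs


def sum_where(runs: List[Run], metric: List[float], value: str) -> float:
    """Compute SUM(metric) WHERE key = value by scanning raw rows.

    Binary search over the runs finds the single contiguous slice of
    matching rows; nothing is pre-aggregated.
    """
    keys = [run[0] for run in runs]  # sorted, because the column was sorted
    i = bisect_left(keys, value)
    if i == len(keys) or keys[i] != value:
        return 0.0
    _, start, length = runs[i]
    return sum(metric[start:start + length])


# Example data: sort rows by the key column before encoding.
rows = [("US", 5.0), ("FR", 2.0), ("US", 1.0), ("DE", 4.0), ("FR", 3.0)]
rows.sort(key=lambda r: r[0])
country = [r[0] for r in rows]
revenue = [r[1] for r in rows]

runs = rle_encode(country)
print(runs)                              # [('DE', 0, 1), ('FR', 1, 2), ('US', 3, 2)]
print(sum_where(runs, revenue, "FR"))    # 5.0
```

The benefit depends on sorting by the column you filter on: a filter on some other column would not get contiguous matching rows from this particular ordering.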
OLAP indexes require that the attributes you want to partition on are present in the table, so they cannot handle joins (i.e. star schemas). For a traditional OLAP star schema, I'd go for aggregation. I suspect that they don't handle updates very well, but I may be wrong.

Elasticsearch is an index, but it is not an OLAP index: its use case does not call for compressing numeric data, and it optimizes for point lookups rather than scans.

The best OLAP indexes are able to combine multiple indexes, e.g. take two conditions that are not very selective and combine them into one that is. The poorer ones can only use one index at a time, so to get coverage you need to build more indexes.

I am still pondering the relationship between index-based OLAP and aggregate-based OLAP. My hunch is that the ideal would be a hybrid system, using aggregates for high-level queries and indexes for time-series-like queries, queries over a narrow time period, and queries on recent data.

Julian

[1] http://druid.io/

> On Dec 11, 2015, at 9:39 AM, Sarnath wrote:
>
> By default, ES indexes on all fields.. And that includes metric as well :)
>
> Check this:
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
>
> BTW... How do you do that in HBase today?
