Re: Group by + where clause

Julian Hyde Fri, 11 Dec 2015 10:34:33 -0800

Sorted indexes are a viable approach to OLAP storage — Druid[1] does it, and so 
does SAP HANA. The idea is that if you sort and compress your data it becomes 
very compact, so you can do very fast scans. So fast that you don’t need to 
pre-aggregate it.
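
To make the sort-and-compress point concrete, here is a minimal sketch in 
Python. It assumes run-length encoding as the compression scheme, which is 
only one of several options and not necessarily what Druid or HANA use 
internally. The names rle_encode and count_where are made up for this 
illustration.

```python
from itertools import groupby


def rle_encode(sorted_values):
    """Collapse a sorted column into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(sorted_values)]


def count_where(runs, predicate):
    """Count matching rows by visiting one entry per run, not one per row."""
    return sum(length for value, length in runs if predicate(value))


# Hypothetical dimension column; sorting puts equal values next to each other.
country = sorted(["US", "DE", "US", "FR", "US", "DE", "US", "FR"])
runs = rle_encode(country)
us_rows = count_where(runs, lambda v: v == "US")
```

Eight rows collapse to three runs here, so the scan touches three entries 
instead of eight. On a real low-cardinality column the ratio is far larger.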

OLAP indexes require that the attributes you want to partition on are present 
in the table, and therefore they cannot handle joins (i.e. star schemas). So, 
for a traditional OLAP star schema, I’d go for aggregation.

I suspect that they don’t handle updates too well, but I may be wrong.

Elasticsearch is an index but it is not an OLAP index - its use case does not 
call for compressing numeric data, and it optimizes for point lookups rather 
than scans.

The best OLAP indexes are able to combine multiple indexes. E.g. they take two 
not-very-selective conditions and combine them into one selective condition. 
The poorer ones can only use one index per query, so to get coverage you need 
to build more indexes.
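
One common way to combine indexes is to keep a bitmap per value and AND the 
bitmaps together. The sketch below assumes that approach and uses plain Python 
integers as bitsets. The helper names and the sample columns are hypothetical, 
and a real engine would use compressed bitmaps rather than raw integers.

```python
def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i has it."""
    index = {}
    for row_id, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def row_ids(bitmap):
    """Expand a bitmap back into a sorted list of row ids."""
    ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return ids


# Two hypothetical dimension columns over the same six rows.
country = ["US", "DE", "US", "FR", "US", "DE"]
device = ["mobile", "mobile", "desktop", "mobile", "mobile", "desktop"]

by_country = build_bitmap_index(country)
by_device = build_bitmap_index(device)

# Each condition alone matches three or four rows; the AND matches two.
matches = row_ids(by_country["US"] & by_device["mobile"])
```

Neither condition is selective on its own, but the intersection narrows the 
scan to rows 0 and 4 without needing a composite index on (country, device).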

I am still pondering the relationship between index-based OLAP and 
aggregate-based OLAP. My hunch is that the ideal would be a hybrid system, 
using aggregates for high-level queries and using indexes for 
time-series-like queries, queries over a narrow time period, and queries on 
recent data.

Julian

[1] http://druid.io/

> On Dec 11, 2015, at 9:39 AM, Sarnath <[email protected]> wrote:
> 
> By default, ES indexes on all fields.. And that includes metric as well :)
> 
> Check this:
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html
> 
> BTW... How do you do that in HBase today?
