Re: Group by + where clause

Luke Han Fri, 11 Dec 2015 16:14:00 -0800

Would you mind sharing more detail about how you index these
aggregations and how your queries are translated into ES API calls?


BTW, is this similar to what Druid does?


>Multiple indexing is what we take advantage of. ES, by default, indexes
>all fields of a document. We store a multidimensional aggregation as an ES
>document whose fields are the various dimensions and metrics associated
>with the aggregation.
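
As a rough illustration of the scheme quoted above (a sketch only, not the
actual implementation): one pre-aggregated cube cell could be stored as a
single document, and a slice with a metric filter could be expressed as a
bool/filter query in the ES query DSL. The field names (country, product,
day, order_count, revenue) and the helper build_slice_query are hypothetical.

```python
import json

# Hypothetical pre-aggregated cube cell: one document per combination of
# dimension values. All field names are illustrative assumptions.
cube_cell = {
    "country": "US",        # dimension
    "product": "widget",    # dimension
    "day": "2015-12-11",    # dimension
    "order_count": 42,      # metric
    "revenue": 1234.5,      # metric
}


def build_slice_query(dimension_filters, metric_ranges):
    """Build a query DSL body that filters on exact dimension values
    (term clauses) and on metric bounds (range clauses)."""
    clauses = [{"term": {field: value}}
               for field, value in sorted(dimension_filters.items())]
    clauses += [{"range": {field: bounds}}
                for field, bounds in sorted(metric_ranges.items())]
    return {"query": {"bool": {"filter": clauses}}}


# Slice on one dimension and filter on one metric.
query = build_slice_query({"country": "US"}, {"revenue": {"gte": 1000}})
print(json.dumps(query, indent=2))
```

The resulting dictionary would be sent as the body of a search request
against the index holding the cube documents.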


Best Regards!
---------------------

Luke Han

On Sat, Dec 12, 2015 at 3:05 AM, Sarnath <[email protected]> wrote:

> >>>> Sorted indexes are a viable approach to OLAP storage — Druid[1] does
> it, and so does SAP HANA. The idea is that if you sort and compress your
> data it becomes very compact, so you can do very fast scans. So fast that
> you don’t need to pre-aggregate it.
>
> Yes, the problem (which I think you have covered below) is that you can
> only sort on one column of interest. You can sort again on other columns
> among all rows where the first column has the same value, but if you then
> filter by the second column you will still need to scan the entire table.
> This is very similar to the analogy in our blog (search for all English
> words whose second letter is 'a').
> As your filtering query becomes more complex, this becomes very difficult.
> I believe Druid is optimized for time series analytics (how much by minute,
> hour, day, etc.). I am not sure about multidimensional aggregations.
>
> >>>> Elasticsearch is an index but it is not an OLAP index - their use case
> does not call for compressing numeric data, and they optimize for point
> lookups rather than scans.
>
> We use ES only to serve pre-aggregated cube data and not to index the raw
> data to produce OLAP cubes.
>
> >>>>> The best OLAP indexes are able to combine multiple indexes. E.g. take
> two not-very-selective conditions and make a selective condition. The
> poorer ones can only use one index, so to get coverage you need to build
> more indexes.
>
> Can you elaborate on what you mean by a "not-very-selective condition"?
> I am a bit lost on the context.
>
> Multiple indexing is what we take advantage of. ES, by default, indexes
> all fields of a document. We store a multidimensional aggregation as an ES
> document whose fields are the various dimensions and metrics associated
> with the aggregation. Thus the cube can be sliced and diced on any
> dimension and filtered on metrics as well. Again, this indexing is
> completely different from indexing raw data or table data. We are
> dealing with data cubes here.
>
> Best,
> Sarnath
>
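
As a rough illustration of the sorted-index limitation described in the
quoted message: with rows sorted by (first column, second column), a filter
on the first column can binary-search to a contiguous range, while a filter
on the second column alone has to touch every row. The sample rows and the
function names filter_first and filter_second are made up for this sketch.

```python
from bisect import bisect_left, bisect_right

# Rows sorted by (country, product); the data is purely illustrative.
rows = sorted([
    ("de", "book"), ("us", "book"), ("us", "pen"),
    ("fr", "pen"), ("de", "pen"),
])
first_col = [r[0] for r in rows]


def filter_first(value):
    """Filter on the leading sort column: binary search, then read only
    the contiguous matching range. Returns (matches, rows_touched)."""
    lo = bisect_left(first_col, value)
    hi = bisect_right(first_col, value)
    return rows[lo:hi], hi - lo


def filter_second(value):
    """Filter on the second column only: the sort order does not help,
    so every row is examined. Returns (matches, rows_touched)."""
    matches = [r for r in rows if r[1] == value]
    return matches, len(rows)


print(filter_first("us"))
print(filter_second("pen"))
```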
