Why don't they pre-aggregate only the standard report set, and compute the 'custom reports' at runtime on top of column-oriented storage, say Bigtable? As you said, a custom report selects at most 5 dimensions at a time, so IMHO 'column families' in Bigtable can help scan less data in practice.
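A toy sketch of the idea follows. It assumes a table stored column by column, and the column names and data are hypothetical. This is plain Python, not Bigtable's API. A custom report that groups by the selected dimensions only has to read those columns plus the metric, and every other column is skipped.

```python
from collections import defaultdict

# Hypothetical toy table stored column by column: one list per column.
# In Bigtable, related columns would be grouped into column families on disk;
# here each list simply stands in for data the scan can skip entirely.
COLUMNS = {
    "country": ["US", "US", "DE", "DE", "US"],
    "browser": ["Chrome", "Firefox", "Chrome", "Chrome", "Chrome"],
    "device": ["desktop", "mobile", "desktop", "mobile", "desktop"],
    "landing_page": ["/", "/a", "/", "/b", "/"],
    "visits": [1, 1, 1, 1, 1],
}

def custom_report(columns, dimensions, metric):
    """Sum `metric` grouped by `dimensions`, reading only the columns involved."""
    columns_read = list(dimensions) + [metric]
    scanned = {name: columns[name] for name in columns_read}  # all other columns are never touched
    totals = defaultdict(int)
    for row in range(len(scanned[metric])):
        key = tuple(scanned[d][row] for d in dimensions)
        totals[key] += scanned[metric][row]
    return dict(totals), columns_read

totals, columns_read = custom_report(COLUMNS, ["country", "browser"], "visits")
print(totals)        # {('US', 'Chrome'): 2, ('US', 'Firefox'): 1, ('DE', 'Chrome'): 2}
print(columns_read)  # ['country', 'browser', 'visits']
```

With 5 dimensions out of a few dozen, the scan reads a handful of columns rather than the whole row. Only the standard reports would still need pre-aggregated tables.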

On Wed, Nov 14, 2012 at 1:25 AM, Asaf Mesika wrote:
> Interesting.
> Analytics offers drilling up to 5 dimensions in depth - your choice of them
> out of a few dozen. That's quite a lot of combinations for them to
> pre-aggregate, so it seems they would pay a heavy storage penalty for such
> pre-calculation.
> Regarding large data sets - when you are using the app you are focused on
> one domain, so the data set is only as large as that site's traffic. As I
> understand it, they have 20k-50k machines, so I thought they could disperse
> the data across them and run Dremel on top of this data. They can optimize
> by doing some first-level aggregations over all sorts of dimensions, and
> then run Dremel on top of that, which makes the data set at least 10x
> smaller.
>
> Asaf
>
> On 13 Nov 2012, at 17:51, David Gruzman wrote:
>
>> As far as I know, it is not. It is heavy sampling and pre-calculation.
>> If you process large data sets, the result of the aggregation will also
>> be large - something Dremel is not intended to support. It is designed
>> to build small derivatives over large datasets.
>> David
>>
>> On Tue, Nov 13, 2012 at 5:36 PM, Mesika, Asaf wrote:
>>
>>> Hi,
>>>
>>> Do you know if Google Analytics is powered by Dremel?
>>>
>>> Thanks,
>>>
>>> Asaf
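Asaf's point about the number of combinations can be put into rough numbers. This sketch assumes that "a few dozen" means 30 to 50 selectable dimensions, which is an illustrative guess and not a figure from the thread. It also assumes that each distinct set of up to 5 dimensions would need its own pre-aggregated table. Drill-down order is ignored, because a group-by over the same set of dimensions gives the same totals.

```python
from math import comb

def cuboids_up_to(n_dims, max_depth=5):
    """Count the distinct dimension sets of size 1..max_depth, i.e. the
    pre-aggregated tables needed to cover every possible report."""
    return sum(comb(n_dims, k) for k in range(1, max_depth + 1))

# Assumption: 30 and 50 are illustrative values for "a few dozen" dimensions.
for n in (30, 50):
    print(n, comb(n, 5), cuboids_up_to(n))
# 30 142506 174436
# 50 2118760 2369935
```

These counts cover only how many tables would exist, and they apply per site. Each table's size also grows with the cardinality of its dimensions, which is why pre-aggregating every custom combination looks expensive.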
