Why don't they pre-aggregate only the standard report set, and compute
custom reports at runtime on top of column-oriented storage, say
Bigtable? As you said, a custom report selects at most 5 dimensions at a
time, so IMHO Bigtable's column families could help scan less data in
practice.
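
A minimal sketch of the idea, assuming a toy in-memory column store (a dict mapping each column name to a list of values) as a stand-in for Bigtable column families; this is not Bigtable's API, and the names `hits` and `custom_report` and the columns are hypothetical. The point is that a custom report over a few selected dimensions only reads those columns plus the metric column, never the whole row.

```python
from collections import defaultdict

# Hypothetical toy column store: each column is stored separately,
# loosely analogous to putting dimensions in separate column families.
hits = {
    "country": ["US", "US", "IL", "IL", "US"],
    "browser": ["Chrome", "Firefox", "Chrome", "Chrome", "Chrome"],
    "device": ["desktop", "mobile", "desktop", "mobile", "desktop"],
    "pageviews": [3, 1, 2, 5, 4],
}


def custom_report(store, dimensions, metric):
    """Group by the selected dimensions and sum the metric.

    Only the requested dimension columns and the metric column are
    read; every other column in the store is never touched.
    """
    columns = [store[d] for d in dimensions]  # read only what we need
    totals = defaultdict(int)
    for i, value in enumerate(store[metric]):
        key = tuple(col[i] for col in columns)
        totals[key] += value
    return dict(totals)


print(custom_report(hits, ["country", "browser"], "pageviews"))
# {('US', 'Chrome'): 7, ('US', 'Firefox'): 1, ('IL', 'Chrome'): 7}
```

Nothing is pre-computed here, so storage stays at one copy of the raw data; the trade-off is that every custom report pays for a scan at query time.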

On Wed, Nov 14, 2012 at 1:25 AM, Asaf Mesika <[email protected]> wrote:
> Interesting.
> Analytics lets you drill down up to 5 dimensions deep, chosen by you
> out of a few dozen. That's quite a lot of combinations for them to
> pre-aggregate, so it seems they would pay a heavy storage penalty for
> such pre-calculation.
> Regarding large data sets: when you are using the app you are focused
> on one domain, so the data set is only as large as that site's traffic.
> As I understand it, they have 20k-50k machines, so I thought they could
> spread the data across them and run Dremel on top of it. They could
> optimize by doing some first-level aggregations across all sorts of
> dimensions and then running Dremel on top of that, which would make
> the data set at least 10x smaller.
>
> Asaf
>
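To put a rough number on "quite a lot of combinations": assuming a report can group by any subset of 1 to 5 dimensions, and that "a few dozen" means somewhere around 30 to 50 dimensions (an illustrative guess, not a figure from the thread), the number of distinct aggregates (cuboids) to pre-compute is the sum of C(n, k) for k = 1..5. The helper name `cuboid_count` is hypothetical.

```python
from math import comb


def cuboid_count(num_dimensions, max_depth=5):
    """Number of distinct dimension subsets of size 1..max_depth.

    Drill-down order does not change an aggregate, so subsets (not
    permutations) are what would need to be pre-computed.
    """
    return sum(comb(num_dimensions, k) for k in range(1, max_depth + 1))


# 30 and 50 are illustrative guesses for "a few dozen" dimensions.
for n in (30, 50):
    print(n, cuboid_count(n))
# 30 174436
# 50 2369935
```

This only counts the tables; the actual storage penalty also depends on each cuboid's size, which grows with the cardinality of the dimensions it combines.
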
> On 13 Nov 2012, at 17:51, David Gruzman <[email protected]> wrote:
>
>> As far as I know, it is not. It relies on heavy sampling and pre-calculation.
>> If you process large data sets, the result of the aggregation will also
>> be large, which is something Dremel is not intended to support. It is
>> designed to build a small derivative over a large dataset.
>> David
>>
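The thread does not say how Google Analytics actually samples, so the following is only a generic illustration of why sampling keeps queries cheap and the result small: keep each row with probability `sample_rate` and scale the sampled sum by `1 / sample_rate` (a Horvitz-Thompson style estimator for Bernoulli sampling). The function name and the synthetic data are hypothetical.

```python
import random


def estimate_total(values, sample_rate, seed=0):
    """Estimate sum(values) from a Bernoulli sample of the rows.

    Each row is kept independently with probability sample_rate; the
    sampled sum is then scaled up by 1 / sample_rate.
    """
    rng = random.Random(seed)
    sampled = sum(v for v in values if rng.random() < sample_rate)
    return sampled / sample_rate


pageviews = [1] * 1_000_000  # hypothetical: one row per pageview
exact = sum(pageviews)
approx = estimate_total(pageviews, sample_rate=0.01)  # reads ~1% of rows
print(exact, round(approx))
```

The estimate varies with the seed; with about 10,000 sampled rows its relative error here is typically around 1%, while the output stays a single number regardless of input size.
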
>> On Tue, Nov 13, 2012 at 5:36 PM, Mesika, Asaf <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> Do you know if Google Analytics is powered by Dremel?
>>>
>>> Thanks,
>>>
>>> Asaf
>>>
>>>
>
