But we're talking about more than 30 dimensions, some with very high
cardinality, in every available order. That's a huge storage penalty
to pay.
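
To put a rough number on that, here is a minimal sketch. It assumes exactly 30 dimensions and drill-downs 1 to 5 levels deep; both figures are illustrative, the function names are made up for this example, and it ignores cardinality, which multiplies the row count further.

```python
import math


def ordered_paths(num_dimensions: int, max_depth: int) -> int:
    """Count drill-down paths where the order of dimensions matters."""
    return sum(math.perm(num_dimensions, depth) for depth in range(1, max_depth + 1))


def unordered_sets(num_dimensions: int, max_depth: int) -> int:
    """Count dimension sets when the order of dimensions is ignored."""
    return sum(math.comb(num_dimensions, depth) for depth in range(1, max_depth + 1))


if __name__ == "__main__":
    # Assumed figures: 30 dimensions, up to 5 levels of drill-down.
    print("ordered paths: ", ordered_paths(30, 5))
    print("unordered sets:", unordered_sets(30, 5))
```

Even before cardinality is taken into account, that is the number of distinct aggregate tables that would have to be kept per site.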

On 15 Nov 2012, at 07:47, Xun Zhou <[email protected]> wrote:

> Why don't they pre-aggregate only the standard report set, and compute
> the 'custom report' at runtime on top of column-oriented storage, say
> Bigtable? As you said, they only select 5 dimensions at a time
> in a custom report. IMHO, 'column families' in Bigtable can help to scan
> less data in practice.
>
> On Wed, Nov 14, 2012 at 1:25 AM, Asaf Mesika <[email protected]> wrote:
>> Interesting.
>> Analytics offers drilling down up to 5 dimensions deep - your choice of them
>> out of a few dozen. That's quite a lot of combinations for them to
>> pre-aggregate. So it seems they would pay a heavy storage penalty for such
>> pre-calculation.
>> Regarding large data sets - when you are using the app you are focused on one
>> domain. So the data set is only as large as that site's traffic. As I understand
>> it, they have 20k-50k machines, so I thought they could disperse the data across
>> them and run Dremel on top of it. They can optimize by doing some first-level
>> aggregations across all sorts of dimensions, and then run Dremel on top of that,
>> which makes the data set smaller by 10x at the very least.
>>
>> Asaf
>>
>> On 13 Nov 2012, at 17:51, David Gruzman <[email protected]> wrote:
>>
>>> As far as I know, it is not. It relies on heavy sampling and pre-calculation.
>>> If you process large data sets, the result of the aggregation will
>>> also be large - something Dremel is not intended to support. It is designed
>>> to build small derivatives over large datasets.
>>> David
>>>
>>> On Tue, Nov 13, 2012 at 5:36 PM, Mesika, Asaf <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Do you know if Google Analytics is powered by Dremel?
>>>>
>>>> Thanks,
>>>>
>>>> Asaf
>>
