But we're talking about more than 30 dimensions, some with very high cardinality, in every available order. That's a huge storage penalty to pay.
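
To put a rough number on that, here is a back-of-envelope sketch. It assumes exactly 30 dimensions (the thread only says "more than 30") and drill-downs of up to 5 levels, as mentioned in the quoted messages below. The function names are made up for illustration, and the count covers only dimension combinations; it says nothing about cardinality, which would multiply the number of stored rows further.

```python
from math import comb, perm  # both available since Python 3.8


def ordered_drilldowns(n_dims: int, max_depth: int) -> int:
    """Count drill-down paths where the order of dimensions matters."""
    return sum(perm(n_dims, depth) for depth in range(1, max_depth + 1))


def unordered_selections(n_dims: int, max_depth: int) -> int:
    """Count dimension sets where the order is ignored."""
    return sum(comb(n_dims, depth) for depth in range(1, max_depth + 1))


if __name__ == "__main__":
    n_dims, max_depth = 30, 5  # assumed values, see the note above
    print("ordered paths:  ", ordered_drilldowns(n_dims, max_depth))
    print("unordered sets: ", unordered_selections(n_dims, max_depth))
```

With these assumed inputs that is 17,783,700 ordered paths versus 174,436 unordered sets, so roughly a 100x difference before cardinality even enters the picture.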
On 15 Nov 2012, at 07:47, Xun Zhou <[email protected]> wrote:

> Why don't they pre-aggregate only the standard report set, and compute
> the 'custom report' at runtime on column-store storage, say
> Bigtable? As you said, they select only 5 dimensions at a time
> in a custom report. IMHO, 'column families' in Bigtable can help scan
> less data in practice.
>
> On Wed, Nov 14, 2012 at 1:25 AM, Asaf Mesika <[email protected]> wrote:
>> Interesting.
>> Analytics offers drilling up to 5 dimensions in depth - your choice of them
>> out of a few tens. That's quite a lot of combinations for them to
>> pre-aggregate. So it seems they will pay a heavy storage penalty for such
>> pre-calculation.
>> Regarding large data sets - when you are using the app you are focused on one
>> domain. So the data set is as large as the site traffic. As I understand it,
>> they have 20k-50k machines, so I thought they could disperse the data across
>> them and run Dremel on top of this data. They can optimize by doing some
>> first-level aggregations in all sorts of dimensions, and then run Dremel on
>> top of that, which makes the data set at least 10x smaller.
>>
>> Asaf
>>
>> On 13 Nov 2012, at 17:51, David Gruzman <[email protected]> wrote:
>>
>>> As far as I know, it is not. It is heavy sampling and pre-calculations.
>>> If you process large data sets, the result of the aggregation will
>>> also be large - something Dremel is not intended to support. It is designed
>>> to build a small derivative over a large dataset.
>>> David
>>>
>>> On Tue, Nov 13, 2012 at 5:36 PM, Mesika, Asaf <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Do you know if Google Analytics is powered by Dremel?
>>>>
>>>> Thanks,
>>>>
>>>> Asaf
