Re: Understanding the cube building process

Vaibhav Taro Sun, 08 May 2016 05:38:51 -0700

Thanks a lot for the clarification, I'll tune my Kylin setup accordingly.
On 08-May-2016 5:19 PM, "Li Yang" <liy...@apache.org> wrote:


> Many things affect cube build speed. From a workload point of view, it's
> your data size and cube definition. From a capacity point of view, it's the
> size and available resources of your Hadoop cluster. Finally, there are many
> tuning options for the MR jobs. Checking whether the Hive table splits fed to
> the mappers are balanced is the starting point.
>
> Building multiple segments in parallel is no problem in theory. It's just for
> simplicity at the moment that they go in sequence.
>
> On Wed, May 4, 2016 at 3:09 PM, Vaibhav Taro <vaibhavtar...@gmail.com>
> wrote:
>
> > I am also waiting for the document on Streaming cubes, glad to hear that
> > it's in progress.
> >
> > The talk that you gave is very insightful. I still have a few questions
> > regarding the cube build process; it would be really helpful if you could
> > clear them up.
> >
> > - The cube build process sometimes takes a long time. How can we optimize
> > it? In my case, I don't have hierarchical or derived dimensions, so there
> > is not much scope to optimize as per this doc:
> > http://kylin.apache.org/docs15/howto/howto_optimize_cubes.html
> >
> > - I tried a cube refresh when there was no new data in that cube
> > segment, and the build still took around 6 minutes. So it looks like
> > there is scope to optimize the cube build process in such cases. In a
> > nutshell, what are the factors affecting cube build time?
> >
> > - Is it possible to refresh multiple cube segments in parallel?
> >
> > Thanks in advance.
> >
> >
> >
> > On Wed, May 4, 2016 at 11:43 AM, Li Yang <liy...@apache.org> wrote:
> >
> > > Shaofeng is working on a document about Kafka and streaming cubing. Let's
> > > wait.
> > >
> > > On Tue, May 3, 2016 at 11:26 PM, Nick Dimiduk <ndimi...@apache.org>
> > > wrote:
> > >
> > > > Very nice talk, thank you. That helped put many things into context for
> > > > me. I will resume my study of the code for understanding engine
> > > > implementation details.
> > > >
> > > > One final question -- is there a doc for getting started with the
> > > > experimental Kafka integration?
> > > >
> > > > Thanks,
> > > > Nick
> > > >
> > > > On Tue, May 3, 2016 at 2:45 AM, Li Yang <liy...@apache.org> wrote:
> > > >
> > > > > It's complicated. As of Kylin 1.5, there are two flavors of cubing
> > > > > algorithm. The talk below covers them a bit. There's no comprehensive
> > > > > document at the moment.
> > > > >
> > > > > https://www.youtube.com/watch?v=n74zvLmIgF0
> > > > >
> > > > >
> > > > > On Tue, May 3, 2016 at 7:52 AM, Nick Dimiduk <ndimi...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi there,
> > > > > >
> > > > > > I'm curious to understand how Kylin goes about building cubes. I've
> > > > > > deployed it on a single-node cluster and played around with the
> > > > > > sample cube [0]. Now I'm looking through the Kylin server log and the
> > > > > > code in 'engine-mr'. I'm not finding much in the way of docs in the
> > > > > > source code though :(
> > > > > >
> > > > > > Is there any presentation, blog post, &c. that gives an overview of
> > > > > > these internals? I did find [1] but I'm looking to descend another
> > > > > > level. I'm curious about the various steps involved (looks like it ran
> > > > > > 18 "steps" and 10 MR jobs) and what they're doing. I'm also curious
> > > > > > about the schema design for the data model in HBase.
> > > > > >
> > > > > > Thanks in advance!
> > > > > > -n
> > > > > >
> > > > > > [0]: http://kylin.apache.org/docs15/tutorial/kylin_sample.html
> > > > > > [1]: http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Regards,
> > VaibhaV
> >
>
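
For anyone following the "are the mapper splits balanced" suggestion in Li Yang's reply, here is a rough sketch of one way to check it. It is not something Kylin provides: it assumes you have already collected the per-mapper input sizes (for example from the MR job counters) into a plain list, and the names `split_skew` and `is_balanced` and the 2.0 threshold are made up for illustration.

```python
from statistics import mean


def split_skew(split_bytes):
    """Return max/mean of the mapper input sizes; 1.0 means perfectly balanced."""
    if not split_bytes:
        raise ValueError("need at least one split size")
    avg = mean(split_bytes)
    if avg == 0:
        # All splits are empty, so there is nothing to be skewed.
        return 1.0
    return max(split_bytes) / avg


def is_balanced(split_bytes, threshold=2.0):
    """Hypothetical rule of thumb: the largest split is at most `threshold` times the mean."""
    return split_skew(split_bytes) <= threshold
```

A high ratio means one mapper is reading far more data than the rest, so the whole step waits on that mapper. The right threshold depends on the cluster; 2.0 is only a placeholder.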
