Thanks a lot for the clarification, I'll tune my Kylin setup accordingly.

On 08-May-2016 5:19 PM, "Li Yang" <liy...@apache.org> wrote:
> Many things affect cube build speed. From a workload point of view, it's
> your data size and cube definition. From a capacity point of view, it's the
> size and available resources of your Hadoop cluster. Finally, there are many
> tuning options for MR jobs. Checking whether the Hive table is split evenly
> across mappers is the starting point.
>
> Building multiple segments in parallel is no problem in theory. It's just for
> simplicity at the moment that they go in sequence.
>
> On Wed, May 4, 2016 at 3:09 PM, Vaibhav Taro <vaibhavtar...@gmail.com> wrote:
>
> > I am also waiting for the document on streaming cubes, glad to hear that
> > it's in progress.
> >
> > The talk that you gave is very insightful. I still have a few doubts
> > regarding the cube build process; it would be really helpful if you could
> > clear them up.
> >
> > - The cube build process sometimes takes a long time. How can we optimize
> >   it? In my case, I don't have hierarchical dimensions or derived
> >   dimensions, so there is not much scope to optimize as per this doc:
> >   http://kylin.apache.org/docs15/howto/howto_optimize_cubes.html
> >
> > - I tried doing a cube refresh when there was no new data in that cube
> >   segment, and the cube build process still took around 6 minutes. So it
> >   looks like there is scope to optimize the cube build process in such
> >   cases. In a nutshell, what are the factors affecting cube build time?
> >
> > - Is it possible to run a cube refresh for multiple cube segments in
> >   parallel?
> >
> > Thanks in advance.
> >
> > On Wed, May 4, 2016 at 11:43 AM, Li Yang <liy...@apache.org> wrote:
> >
> > > Shaofeng is working on a document about Kafka and streaming cubing.
> > > Let's wait.
> > >
> > > On Tue, May 3, 2016 at 11:26 PM, Nick Dimiduk <ndimi...@apache.org> wrote:
> > >
> > > > Very nice talk, thank you. That helped put many things into context
> > > > for me. I will resume my study of the code for understanding engine
> > > > implementation details.
> > > >
> > > > One final question -- is there a doc for getting started with the
> > > > experimental Kafka integration?
> > > >
> > > > Thanks,
> > > > Nick
> > > >
> > > > On Tue, May 3, 2016 at 2:45 AM, Li Yang <liy...@apache.org> wrote:
> > > >
> > > > > It's complicated. As of Kylin 1.5, there are two flavors of cubing
> > > > > algorithm. The talk below covers them a bit. There's no
> > > > > comprehensive document at the moment.
> > > > >
> > > > > https://www.youtube.com/watch?v=n74zvLmIgF0
> > > > >
> > > > > On Tue, May 3, 2016 at 7:52 AM, Nick Dimiduk <ndimi...@apache.org> wrote:
> > > > >
> > > > > > Hi there,
> > > > > >
> > > > > > I'm curious to understand how Kylin goes about building cubes.
> > > > > > I've deployed it on a single-node cluster and played around with
> > > > > > the sample cube [0]. Now I'm looking through the Kylin server log
> > > > > > and the code in 'engine-mr'. I'm not finding much in the way of
> > > > > > docs in the source code though :(
> > > > > >
> > > > > > Is there any presentation, blog post, &c that gives an overview
> > > > > > of these internals? I did find [1] but I'm looking to descend
> > > > > > another level. I'm curious about the various steps involved
> > > > > > (looks like it ran 18 "steps" and 10 MR jobs) and what they're
> > > > > > doing. I'm also curious about the schema design for the data
> > > > > > model in HBase.
> > > > > >
> > > > > > Thanks in advance!
> > > > > > -n
> > > > > >
> > > > > > [0]: http://kylin.apache.org/docs15/tutorial/kylin_sample.html
> > > > > > [1]: http://www.slideshare.net/YangLi43/design-cube-in-apache-kylin
> >
> > --
> > Regards,
> > VaibhaV