Re: Incremental builds assumptions and clarifications

Abhilash L L Fri, 25 Dec 2015 00:37:09 -0800

Thanks for the clarifications, Luke and Li Yang.

Please find my comments / questions inline.


>    Is there a document explaining the assumptions for incremental builds?
>> *Luke: I'm afraid there's no such doc yet. What exactly do you mean by
>> "assumption": the code-level implementation, or how to optimize?*
--> Not at the code / implementation level; more at the feature level, like
the point Li Yang shared about a timestamp (TS) column that differentiates
old data from new. Also, how Kylin breaks the data up into segments, and how
a user can rebuild some of those segments.
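To make Li Yang's single requirement concrete, here is a minimal Python sketch (not Kylin code) of how a date/timestamp column lets an incremental build pick out only the new rows: each build covers a half-open range [start, end), so rows already covered by an earlier segment are never read again. The `FactRow` type, its fields and `rows_for_segment` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class FactRow:
    # Hypothetical fact-table row: the partition (date) column, one dimension, one measure.
    txn_date: date
    category_id: int
    quantity: int


def rows_for_segment(facts, start, end):
    """Return the fact rows an incremental build over [start, end) would read.

    The date column is the only thing separating "old" rows (already in
    earlier segments) from "new" rows (picked up by this build).
    """
    return [r for r in facts if start <= r.txn_date < end]


facts = [
    FactRow(date(2015, 12, 23), 20, 5),
    FactRow(date(2015, 12, 24), 20, 7),
    FactRow(date(2015, 12, 24), 30, 2),
]

# The first build covers Dec 23; the incremental build then covers Dec 24 only.
seg1 = rows_for_segment(facts, date(2015, 12, 23), date(2015, 12, 24))
seg2 = rows_for_segment(facts, date(2015, 12, 24), date(2015, 12, 25))
print(len(seg1), len(seg2))  # 1 2
```

Because the ranges are half-open and adjacent, no row lands in two segments, which is what makes it safe to build segments independently.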


>    Do we allow 'updates' on facts?
> 1) Because of some typo, the quantity came in as 100 instead of 10. What is
> the suggested approach to handle this?
>> So you want to refresh a piece of data that has already been built. Yes,
>> that's doable. Kylin cuts the cube into segments by time period. You can
>> refresh (or rebuild) a segment without impacting the rest.
--> a) How does Kylin cut the data (initial / incremental) into segments?
Does one day become one segment?
b) When new data comes in, does Kylin automatically figure out which
segments to rebuild?
c) How can we rebuild only part of the data / a single segment via the
REST API?
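The behaviour described in the quoted reply (segments by time period, each refreshable on its own) can be modelled with a toy Python sketch. It assumes one segment per day purely for illustration; `build_segment` and `refresh` are hypothetical names, not Kylin's API or storage layout. Note that nothing is detected automatically here: the caller fixes the source row and chooses which segment to refresh.

```python
from collections import defaultdict
from datetime import date, timedelta


def build_segment(facts, start, end):
    """Aggregate SUM(quantity) by category for fact rows with start <= day < end.

    `facts` is a list of (day, category_id, quantity) tuples; the returned dict
    stands in for one segment's pre-computed aggregates.
    """
    totals = defaultdict(int)
    for day, category_id, quantity in facts:
        if start <= day < end:
            totals[category_id] += quantity
    return dict(totals)


def refresh(cube, facts, start):
    """Rebuild only the one-day segment starting at `start`; other segments stay as they are."""
    cube[start] = build_segment(facts, start, start + timedelta(days=1))


d1, d2 = date(2015, 12, 23), date(2015, 12, 24)
facts = [(d1, 20, 5), (d2, 20, 100)]  # the 100 on d2 is a typo; it should be 10

# Initial builds: one segment per day (an illustrative choice, not a Kylin rule).
cube = {d: build_segment(facts, d, d + timedelta(days=1)) for d in (d1, d2)}

# Correct the source row first, then refresh only the affected day's segment.
facts[1] = (d2, 20, 10)
refresh(cube, facts, d2)
print(cube[d1], cube[d2])  # {20: 5} {20: 10}
```

A query spanning both days then sums the per-segment aggregates (5 + 10 = 15), without the d1 segment ever being recomputed.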


>> Luke: Do you mean data model changes? Then you have to disable that
>> cube, purge its data and refine it, then rebuild it.
--> No, only data changes, not model changes. For model changes, as I now
understand it, we have to rebuild the full cube.


>    How do we support deletes in fact / dimension tables?
>
>> *Luke: Deletes in the fact table are fine, but deletes in a dimension
>> need care; they will probably require a rebuild.*
--> Let's say that for a time period T1-T2 there were 100 records earlier;
now, due to deletions, there should be only 98 for the same period. How do
we trigger the deletion of those 2 records? Is the approach to populate all
98 records in the fact table and then ask Kylin to rebuild T1-T2?
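If the answer is "fix the source table, then refresh the T1-T2 segment", the REST call might be assembled as below. This is a hedged sketch: the endpoint `PUT /kylin/api/cubes/{cubeName}/rebuild` and the body fields `startTime`, `endTime` (epoch milliseconds) and `buildType` (`BUILD`, `MERGE`, `REFRESH`) are my reading of the Kylin 1.x REST API docs and may differ in other versions, so verify them first. The host, cube name and dates are hypothetical, and the code only builds the request; it does not send it.

```python
import json
from datetime import datetime, timezone


def epoch_millis(dt):
    """Convert a naive datetime (treated as UTC here, an assumption) to epoch milliseconds."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)


def refresh_request(base_url, cube_name, t1, t2):
    """Assemble (method, url, json_body) to refresh the segment covering [t1, t2).

    Assumption: path and field names follow the Kylin 1.x REST API docs;
    check them against the documentation for your Kylin version.
    """
    url = f"{base_url}/kylin/api/cubes/{cube_name}/rebuild"
    body = {
        "startTime": epoch_millis(t1),
        "endTime": epoch_millis(t2),
        "buildType": "REFRESH",  # BUILD adds a new segment; REFRESH rebuilds an existing one
    }
    return "PUT", url, json.dumps(body)


# Hypothetical host, cube name and T1-T2 range (one day).
method, url, body = refresh_request(
    "http://localhost:7070", "sales_cube",
    datetime(2015, 12, 1), datetime(2015, 12, 2),
)
print(method, url)
print(body)
```

Presumably the range has to match an existing segment's boundaries for a refresh to be accepted; once the 98 corrected rows are in the source table, the refreshed segment is recomputed from them and the 2 deleted records simply no longer contribute.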

Regards,
Abhilash

On Fri, Dec 25, 2015 at 7:35 AM, Li Yang <liy...@apache.org> wrote:

> Hmm, I don't think Luke fully answered all the questions. Here are my
> additions.
>
> >    Is there a document explaining the assumptions for incremental builds?
> The only assumption (or requirement) is that there is a date or timestamp
> column on the fact table that distinguishes the old data from the new.
>
> >    Do we allow 'updates' on facts?
> > 1) Because of some typo, the quantity came in as 100 instead of 10. What
> > is the suggested approach to handle this?
> So you want to refresh a piece of data that has already been built. Yes,
> that's doable. Kylin cuts the cube into segments by time period. You can
> refresh (or rebuild) a segment without impacting the rest.
>
> > 2) Let's say the value for dimension 1 was d1 in the fact table. Now it
> > got updated to d2 for the same dimension. How does Kylin 'deduct' from
> > the aggregation for d1 in all cuboids and 'accumulate' for d2 in all
> > cuboids?
> >
> >    How do we support Slowly Changing Dimensions (SCD), types 2 and 3?
> By design, Kylin remembers the data as of the point it's built. So you may
> build a daily segment on day T with category set C in the lookup table;
> then on day T+1 the category lookup table is updated to C~, and you build
> a T+1 daily segment with that. If you now query the cube, it will report
> categories from both C and C~. More precisely, Kylin will return C for
> day T transactions and C~ for day T+1 transactions.
>
> If what you want is to reflect C~ in historic data, then the earlier
> segments have to be rebuilt.
>
> On Thu, Dec 24, 2015 at 10:59 PM, Luke Han <luke...@gmail.com> wrote:
>
> > Hi Abhilash,
> >     Please refer to below comments inline.
> >
> >     Thanks.
> >
> >
> > Best Regards!
> > ---------------------
> >
> > Luke Han
> >
> > On Thu, Dec 10, 2015 at 2:28 PM, Abhilash L L <abhil...@infoworks.io>
> > wrote:
> >
> > > Hello,
> > >
> > >    Is there a document explaining the assumptions for incremental
> > > builds?
> > > *Luke: I'm afraid there's no such doc yet. What exactly do you mean
> > > by "assumption": the code-level implementation, or how to optimize?*
> >
> >
> >
> > >
> > >    Is it purely additive? Let's say category id is one of my row key
> > > components. I had 10 products in category id 20. Now I got a new
> > > product for the same category; would it add up? Would distinct count
> > > also be fine?
> > >
> > *      Luke: Kylin handles such a case very well; it will add up to 11.
> > Distinct count works too, but its result is approximate.*
> >
> > >
> > >    Do we allow 'updates' on facts?
> > > 1) Because of some typo, the quantity came in as 100 instead of 10.
> > > What is the suggested approach to handle this?
> > >
> >        Luke: Do you mean data model changes? Then you have to disable
> > that cube, purge its data and refine it, then rebuild it.
> >
> > > 2) Let's say the value for dimension 1 was d1 in the fact table. Now
> > > it got updated to d2 for the same dimension. How does Kylin 'deduct'
> > > from the aggregation for d1 in all cuboids and 'accumulate' for d2 in
> > > all cuboids?
> > >
> > >    How do we support Slowly Changing Dimensions (SCD), types 2 and 3?
> > >
> > *      Luke: Kylin does not support SCD very well yet.*
> >
> > >
> > >    How do we support deletes in fact / dimension tables?
> > >
> > *      Luke: Deletes in the fact table are fine, but deletes in a
> > dimension need care; they will probably require a rebuild.*
> >
> > >
> > >
> > >    If there's already a document explaining this, it would help us
> > > and a lot of other people.
> > >
> > >
> > > Regards,
> > > Abhilash
> > >
> >
>
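Li Yang's description of how lookup changes show up (C for day T transactions, C~ for day T+1) amounts to each segment joining the lookup table as it stood at build time. The toy Python model below assumes a simple product-to-category lookup; `build_daily_segment` and the sample data are hypothetical and are not Kylin internals.

```python
def build_daily_segment(fact_rows, lookup):
    """Join each fact row to the lookup table *as it is right now* and aggregate.

    fact_rows: list of (product_id, quantity); lookup: dict product_id -> category.
    The returned dict is a snapshot: later lookup changes do not alter it.
    """
    totals = {}
    for product_id, quantity in fact_rows:
        category = lookup[product_id]
        totals[category] = totals.get(category, 0) + quantity
    return totals


lookup = {1: "Books"}  # category set C on day T
cube = {"T": build_daily_segment([(1, 3)], lookup)}

lookup[1] = "E-books"  # day T+1: the lookup table is updated to C~
cube["T+1"] = build_daily_segment([(1, 4)], lookup)
print(cube)  # {'T': {'Books': 3}, 'T+1': {'E-books': 4}}

# To reflect C~ in historic data too, the day-T segment has to be rebuilt.
cube["T"] = build_daily_segment([(1, 3)], lookup)
print(cube)  # {'T': {'E-books': 3}, 'T+1': {'E-books': 4}}
```

The first print shows both category versions side by side, which is the "report categories from both C and C~" behaviour; only the explicit rebuild of the older segment makes history consistent with C~.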

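Luke's answer on additivity can be illustrated with a small Python sketch: plain counts merge across segments by addition (the 10 existing products plus 1 new one give 11), but a distinct count cannot be summed, because the same value may appear in several segments. This sketch uses exact sets to show the merge; as far as I know Kylin instead keeps a mergeable approximate structure (HyperLogLog-based), which is why its distinct count is approximate. The function names and data are hypothetical.

```python
def additive_count(segment_counts):
    """COUNT/SUM measures merge across segments by simple addition."""
    return sum(segment_counts)


def merged_distinct(segment_sets):
    """COUNT DISTINCT must merge per-segment state (exact sets here) before counting."""
    merged = set()
    for s in segment_sets:
        merged |= s
    return len(merged)


# Category 20: 10 products in the existing segment, 1 new product in the new one.
print(additive_count([10, 1]))  # 11

# Distinct products sold per daily segment; "p2" was sold on both days.
day1 = {"p1", "p2", "p3"}
day2 = {"p2", "p4"}
print(len(day1) + len(day2))  # 5, wrong: "p2" is counted twice
print(merged_distinct([day1, day2]))  # 4
```

Exact sets grow with the number of distinct values, which is the usual reason to trade exactness for a fixed-size sketch that can still be merged segment by segment.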