> a) How does kylin cut the data (initial / incremental) into segments ?
> Does one day become one segment ?  b) When new data comes, does it
> automatically figure out which segments to rebuild ?  c) How to rebuild
> only part of the data / segment via Rest API.

The user tells Kylin how to cut segments. Every build request must carry a
start date and an end date, which define the time boundary of the segment.
Segments never overlap (normally) and should be contiguous. Say you have 3
segments: [T1, T2), [T2, T3), [T3, T4).
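
To make the half-open notation concrete, here is a minimal Python sketch that
checks a list of segments for gaps and overlaps. It is only an illustration:
check_segments and the dates chosen for T1..T4 are made up for this example
and are not part of Kylin.

```python
from datetime import date

def check_segments(segments):
    """Return a list of problems (gaps or overlaps) between consecutive segments.

    Each segment is a half-open interval [start, end): start is included,
    end is excluded, so [T1, T2) and [T2, T3) touch without overlapping.
    """
    problems = []
    ordered = sorted(segments)
    for (s1, e1), (s2, e2) in zip(ordered, ordered[1:]):
        if s2 < e1:
            problems.append(f"overlap: [{s1}, {e1}) and [{s2}, {e2})")
        elif s2 > e1:
            problems.append(f"gap between {e1} and {s2}")
    return problems

# Hypothetical daily boundaries, chosen only for illustration.
T1, T2, T3, T4 = date(2015, 12, 1), date(2015, 12, 2), date(2015, 12, 3), date(2015, 12, 4)
segments = [(T1, T2), (T2, T3), (T3, T4)]
print(check_segments(segments))  # prints [] because the segments are contiguous
```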

When your data changes, be it record addition, removal, or update, you need
to refresh (or rebuild) the related segments. E.g. if 2 records on T2 were
removed, you should refresh the [T2, T3) segment.
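
As a rough sketch of that workflow in Python: find the segments that contain
the changed records, then prepare one refresh request per segment. The helper
names are hypothetical. The endpoint path (/kylin/api/cubes/{cube}/rebuild),
the JSON fields (startTime, endTime, buildType) and the cube name are
assumptions on my side; please check them against the REST API docs of your
Kylin version. The code only builds the request and sends nothing.

```python
import json
from datetime import datetime, timezone

def to_millis(dt):
    """Convert a naive datetime, interpreted as UTC, to epoch milliseconds."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp() * 1000)

def segments_to_refresh(segments, changed_times):
    """Return the segments [start, end) that contain at least one changed record."""
    return [(s, e) for (s, e) in segments
            if any(s <= t < e for t in changed_times)]

def refresh_request(cube_name, start, end):
    """Build (method, path, body) for refreshing one segment.

    ASSUMPTION: the path and field names below are not taken from this
    thread; verify them against your Kylin version before use.
    """
    path = f"/kylin/api/cubes/{cube_name}/rebuild"
    body = {
        "startTime": to_millis(start),
        "endTime": to_millis(end),
        "buildType": "REFRESH",
    }
    return "PUT", path, json.dumps(body)

# Hypothetical daily boundaries, chosen only for illustration.
T1, T2, T3, T4 = (datetime(2015, 12, d) for d in (1, 2, 3, 4))
segments = [(T1, T2), (T2, T3), (T3, T4)]

# Two records dated T2 were removed from the fact table.
for start, end in segments_to_refresh(segments, [T2, T2]):
    print(refresh_request("my_cube", start, end))  # "my_cube" is a placeholder
```

Only the [T2, T3) segment is selected here, because T2 is included in
[T2, T3) and excluded from [T1, T2).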

On Fri, Dec 25, 2015 at 4:36 PM, Abhilash L L <[email protected]> wrote:

> Thanks for the clarification Luke, Li Yang.
>
> Please find my comments / questions inline
>
> >    Is there a document explaining the assumptions for incremental builds.
> >> *Luke: I'm afraid there's no such doc yet. what's exactly "assumption"
> you
> >> are looking for, to know the code level implementation or how to
> optimize?*
> --> Not at code / implementation level. More at a feature level. Like the
> one shared by Li Yang regarding a TS column for differentiation. Also, on
> how it breaks it up into segments and how a user can rebuild part of the
> segments
>
>
> >    Do we allow 'updates' on a facts ?
> > 1) Because of some typo the quantity came in as 100 instead of 10. What
> is
> > the suggested approach to handle this.
> >>So you want to refresh a built piece of data. And yes, that's doable.
> Kylin
> >>cut cube into segments by time period. You can refresh (or rebuild) a
> >>segment without impacting the rests.
> --> a) How does kylin cut the data (initial / incremental) into segments ?
> Does one day become one segment ?  b) When new data comes, does it
> automatically figure out which segments to rebuild ?  c) How to rebuild
> only part of the data / segment via Rest API.
>
>
> >> Luke: Do you mean data model changes? Then you have to disable that
> >>cube, purge data and refine it, the rebuild it.
> --> No only data changes, not model changes. For model as now I understand
> we have to rebuild the full cube.
>
>
> >    How to support deletes in fact / dimension ?
> >
> >>*      Luke: delete in fact table is fine, but in dimension should be
> >>careful, properly it will require rebuild.*
> Lets for a time period T1-T2 there were 100 records earlier, now due to
> deletion it should be only 98 for the same time period. How to trigger
> delete of the 2 records ?  Is it to populate all 98 records in facttable
> and then ask kylin to rebuild for T1-T2 ?
>
>
> Regards,
> Abhilash
>
> On Fri, Dec 25, 2015 at 7:35 AM, Li Yang <[email protected]> wrote:
>
> > Em.. don't think Luke has all the questions fully answered. My additions.
> >
> > >    Is there a document explaining the assumptions for incremental
> builds.
> > The only assumption (or requirement) is that there is date or timestamp
> > column on the fact table that distinguishes the old from the new.
> >
> > >    Do we allow 'updates' on a facts ?
> > > 1) Because of some typo the quantity came in as 100 instead of 10. What
> > is
> > > the suggested approach to handle this.
> > So you want to refresh a built piece of data. And yes, that's doable.
> Kylin
> > cut cube into segments by time period. You can refresh (or rebuild) a
> > segment without impacting the rests.
> >
> > > 2) Lets say the the value for dimension 1 was d1 in the facttable. Now
> it
> > > got updated to d2 for the same dimension. How does it 'deduct' from the
> > > aggregation for d1 for all cuboids and 'accumulate' for d2 in all
> > cuboids.
> > >
> > >    How to support Slowly Changing Dimensions (SCD). Support for type 2
> > and
> > > type 3.
> > The design is Kylin remembers data at the point it's built. So you may
> > build a daily segment on T day with category set C in lookup table; then
> on
> > T+1 day, the category lookup table is updated into C~, and with that
> build
> > a T+1 daily segment. Now if you query the cube, it will report categories
> > including both C and C~. More precisely Kylin will return C for T day
> > transactions and C~ for T+1 transactions.
> >
> > If what you want is to reflect C~ in historic data, then earlier segments
> > have to be rebuild.
> >
> > On Thu, Dec 24, 2015 at 10:59 PM, Luke Han <[email protected]> wrote:
> >
> > > Hi Abhilash,
> > >     Please refer to below comments inline.
> > >
> > >     Thanks.
> > >
> > >
> > > Best Regards!
> > > ---------------------
> > >
> > > Luke Han
> > >
> > > On Thu, Dec 10, 2015 at 2:28 PM, Abhilash L L <[email protected]>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > >    Is there a document explaining the assumptions for incremental
> > builds.
> > > > *Luke: I'm afraid there's no such doc yet. what's exactly
> "assumption"
> > > you
> > > > are looking for, to know the code level implementation or how to
> > > optimize?*
> > >
> > >
> > >
> > > >
> > > >    Is it purely additive ? Lets say category id is one my row key
> > > > components. I had 10 products on category id 20. Now I got a new
> > product
> > > > for same category would it add up. Would distinct count also be fine
> ?
> > > >
> > > *      Luke:  Kylin performs very well for such case, it will add up to
> > 21,
> > > also for distinct count, but the result of distinct count is
> > > approximately.*
> > >
> > > >
> > > >    Do we allow 'updates' on a facts ?
> > > > 1) Because of some typo the quantity came in as 100 instead of 10.
> What
> > > is
> > > > the suggested approach to handle this.
> > > >
> > >        Luke: Do you mean data model changes? Then you have to disable
> > that
> > > cube, purge data and refine it, the rebuild it.
> > >
> > > 2) Lets say the the value for dimension 1 was d1 in the facttable. Now
> it
> > > > got updated to d2 for the same dimension. How does it 'deduct' from
> the
> > > > aggregation for d1 for all cuboids and 'accumulate' for d2 in all
> > > cuboids.
> > > >
> > > >    How to support Slowly Changing Dimensions (SCD). Support for type
> 2
> > > and
> > > > type 3.
> > > >
> > > *      Luke: Kylin does not support SCD very well yet.*
> > >
> > > >
> > > >    How to support deletes in fact / dimension ?
> > > >
> > > *      Luke: delete in fact table is fine, but in dimension should be
> > > careful, properly it will require rebuild.*
> > >
> > > >
> > > >
> > > >    If theres a document explaining already, it would help us and a
> lot
> > of
> > > > people.
> > > >
> > > >
> > > > Regards,
> > > > Abhilash
> > > >
> > >
> >
>
