Thanks for the clarification, Luke and Li Yang. Please find my comments / questions inline.
> Is there a document explaining the assumptions for incremental builds.
>> *Luke: I'm afraid there's no such doc yet. What exactly is the "assumption" you
>> are looking for: the code-level implementation, or how to optimize?*

--> Not at the code / implementation level, more at a feature level, like the point Li Yang shared about a timestamp (TS) column for differentiation. Also, how Kylin breaks the data up into segments and how a user can rebuild only some of the segments.

> Do we allow 'updates' on facts?
> 1) Because of some typo, the quantity came in as 100 instead of 10. What is
> the suggested approach to handle this?
>> So you want to refresh a built piece of data. And yes, that's doable. Kylin
>> cuts the cube into segments by time period. You can refresh (or rebuild) a
>> segment without impacting the rest.

--> a) How does Kylin cut the data (initial / incremental) into segments? Does one day become one segment?
b) When new data comes in, does Kylin automatically figure out which segments to rebuild?
c) How do we rebuild only part of the data (a segment) via the REST API?

>> Luke: Do you mean data model changes? Then you have to disable that
>> cube, purge the data and refine it, then rebuild it.

--> No, only data changes, not model changes. For model changes, as I now understand it, we have to rebuild the full cube.

> How to support deletes in fact / dimension?
>
>> *Luke: delete in fact table is fine, but in dimension you should be
>> careful, probably it will require a rebuild.*

Let's say for a time period T1-T2 there were 100 records earlier; now, due to deletion, there should be only 98 for the same time period. How do we trigger the delete of those 2 records? Do we populate all 98 records in the fact table and then ask Kylin to rebuild for T1-T2?

Regards,
Abhilash

On Fri, Dec 25, 2015 at 7:35 AM, Li Yang <liy...@apache.org> wrote:

> Em.. don't think Luke has all the questions fully answered. My additions.
>
> > Is there a document explaining the assumptions for incremental builds.
> The only assumption (or requirement) is that there is a date or timestamp
> column on the fact table that distinguishes the old data from the new.
>
> > Do we allow 'updates' on facts?
> > 1) Because of some typo, the quantity came in as 100 instead of 10. What is
> > the suggested approach to handle this?
> So you want to refresh a built piece of data. And yes, that's doable. Kylin
> cuts the cube into segments by time period. You can refresh (or rebuild) a
> segment without impacting the rest.
>
> > 2) Let's say the value for dimension 1 was d1 in the fact table. Now it
> > got updated to d2 for the same dimension. How does it 'deduct' from the
> > aggregation for d1 in all cuboids and 'accumulate' for d2 in all cuboids?
> >
> > How to support Slowly Changing Dimensions (SCD)? Support for type 2 and
> > type 3.
> The design is that Kylin remembers data as of the point it's built. So you may
> build a daily segment on day T with category set C in the lookup table; then on
> day T+1, the category lookup table is updated to C~, and with that you build
> a T+1 daily segment. Now if you query the cube, it will report categories
> from both C and C~. More precisely, Kylin will return C for day T
> transactions and C~ for day T+1 transactions.
>
> If what you want is to reflect C~ in historic data, then the earlier segments
> have to be rebuilt.
>
> On Thu, Dec 24, 2015 at 10:59 PM, Luke Han <luke...@gmail.com> wrote:
>
> > Hi Abhilash,
> > Please refer to the comments inline below.
> >
> > Thanks.
> >
> > Best Regards!
> > ---------------------
> >
> > Luke Han
> >
> > On Thu, Dec 10, 2015 at 2:28 PM, Abhilash L L <abhil...@infoworks.io>
> > wrote:
> >
> > > Hello,
> > >
> > > Is there a document explaining the assumptions for incremental builds.
> > > *Luke: I'm afraid there's no such doc yet. What exactly is the "assumption"
> > > you are looking for: the code-level implementation, or how to optimize?*
> > >
> > > Is it purely additive?
> > > Let's say category id is one of my row key components. I had 10 products
> > > in category id 20. Now I got a new product for the same category; would it
> > > add up? Would distinct count also be fine?
> >
> > *Luke: Kylin performs very well for such a case, it will add up to 21,
> > also for distinct count, but the result of distinct count is
> > approximate.*
> >
> > > Do we allow 'updates' on facts?
> > > 1) Because of some typo, the quantity came in as 100 instead of 10. What is
> > > the suggested approach to handle this?
> >
> > Luke: Do you mean data model changes? Then you have to disable that
> > cube, purge the data and refine it, then rebuild it.
> >
> > > 2) Let's say the value for dimension 1 was d1 in the fact table. Now it
> > > got updated to d2 for the same dimension. How does it 'deduct' from the
> > > aggregation for d1 in all cuboids and 'accumulate' for d2 in all cuboids?
> > >
> > > How to support Slowly Changing Dimensions (SCD)? Support for type 2 and
> > > type 3.
> >
> > *Luke: Kylin does not support SCD very well yet.*
> >
> > > How to support deletes in fact / dimension?
> >
> > *Luke: delete in fact table is fine, but in dimension you should be
> > careful, probably it will require a rebuild.*
> >
> > > If there's a document explaining this already, it would help us and a lot
> > > of people.
> > >
> > > Regards,
> > > Abhilash
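A toy Python sketch of the segment model Li Yang and Luke describe: fact rows are partitioned into segments by a date column, each segment stores pre-aggregated results, and refreshing one segment recomputes it from the corrected source rows while the other segments are reused unchanged. It also mirrors the approach Abhilash proposes for deletes (fix the source table, then rebuild T1-T2). All names (`build_segment`, `refresh_segment`, `query_total`) and the data are hypothetical; this is not Kylin's implementation, and it does not detect changed segments automatically, since the caller names the range to refresh.

```python
from collections import defaultdict
from datetime import date

# Toy model of time-partitioned segments; hypothetical names, not Kylin code.
# Each segment covers [start, end) and stores, per category,
# [SUM(qty), COUNT(*)] pre-aggregated from the source fact rows.


def build_segment(fact_rows, start, end):
    """Aggregate every source row whose date falls in [start, end)."""
    agg = defaultdict(lambda: [0, 0])  # category -> [sum_qty, row_count]
    for row in fact_rows:
        if start <= row["dt"] < end:
            agg[row["category"]][0] += row["qty"]
            agg[row["category"]][1] += 1
    return {"start": start, "end": end, "agg": dict(agg)}


def refresh_segment(segments, fact_rows, start, end):
    """Rebuild only the segment whose range is exactly [start, end)."""
    return [build_segment(fact_rows, start, end)
            if (seg["start"], seg["end"]) == (start, end) else seg
            for seg in segments]


def query_total(segments, category):
    """Answer SUM(qty), COUNT(*) by adding the per-segment aggregates."""
    qty = sum(seg["agg"].get(category, [0, 0])[0] for seg in segments)
    cnt = sum(seg["agg"].get(category, [0, 0])[1] for seg in segments)
    return qty, cnt


D1, D2, D3 = date(2015, 12, 1), date(2015, 12, 2), date(2015, 12, 3)
source = [
    {"dt": D1, "category": 20, "qty": 5},
    {"dt": D2, "category": 20, "qty": 100},  # typo: should have been 10
    {"dt": D2, "category": 20, "qty": 7},    # row later deleted at the source
]
segments = [build_segment(source, D1, D2), build_segment(source, D2, D3)]
print(query_total(segments, 20))  # (112, 3)

# Fix the source table first, then refresh only the [D2, D3) segment.
source[1]["qty"] = 10
del source[2]
segments = refresh_segment(segments, source, D2, D3)
print(query_total(segments, 20))  # (15, 2)
```

Because every query adds up per-segment aggregates, a wrong or deleted row only affects the segment whose time range contains it, which is why a refresh can be scoped to that range.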
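For question c), a minimal sketch of preparing a segment refresh over HTTP, assuming the rebuild endpoint described in Kylin's REST API documentation (`PUT /kylin/api/cubes/{cubeName}/rebuild` with `startTime` and `endTime` in epoch milliseconds and `buildType` set to `REFRESH`). The host, the cube name `sales_cube`, the credentials, and the helper names are placeholders, and treating the dates as UTC is an assumption; check the endpoint and fields against the Kylin version you run. The request is only built here, not sent.

```python
import base64
import json
import urllib.request
from datetime import datetime, timezone


def epoch_millis(dt):
    """Naive datetime, interpreted as UTC (an assumption), to epoch milliseconds."""
    return int(dt.replace(tzinfo=timezone.utc).timestamp()) * 1000


def refresh_request(base_url, cube, start, end, user, password):
    """Build (but do not send) a REFRESH request for the range [start, end).

    Assumes the rebuild endpoint and JSON fields from Kylin's REST API docs.
    """
    body = json.dumps({
        "startTime": epoch_millis(start),
        "endTime": epoch_millis(end),
        "buildType": "REFRESH",  # BUILD and MERGE are the other documented types
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{base_url}/kylin/api/cubes/{cube}/rebuild",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )


# Placeholder host, cube name and credentials.
req = refresh_request("http://localhost:7070", "sales_cube",
                      datetime(2015, 12, 1), datetime(2015, 12, 2),
                      "ADMIN", "KYLIN")
# urllib.request.urlopen(req) would actually submit the refresh job.
```

Keeping the request construction separate from sending makes it easy to log or inspect the exact time range before triggering a rebuild on a real cluster.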
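To make Li Yang's point about point-in-time lookups concrete, a toy sketch in which each daily segment resolves categories through the lookup table as it exists when the segment is built. After the lookup changes from C to C~, a query returns C for day T and C~ for day T+1, until the day-T segment is rebuilt. The function names and sample data are hypothetical, not Kylin internals.

```python
from collections import Counter


def build_daily_segment(fact_rows, day, lookup):
    """Aggregate one day's facts, joined with the lookup table as of build time.

    Category names are resolved now and frozen into the segment, so a later
    change to the lookup table does not affect this segment.
    """
    agg = Counter()
    for row in fact_rows:
        if row["day"] == day:
            agg[lookup[row["product"]]] += row["qty"]
    return agg


def query_by_category(segments):
    """Combine all segments; Counter.update adds the per-category sums."""
    total = Counter()
    for seg in segments:
        total.update(seg)
    return dict(total)


facts = [{"day": "T", "product": "p1", "qty": 3},
         {"day": "T+1", "product": "p1", "qty": 4}]
lookup_c = {"p1": "Toys"}         # category set C, used when building day T
lookup_c_tilde = {"p1": "Games"}  # updated set C~, used when building day T+1

seg_t = build_daily_segment(facts, "T", lookup_c)
seg_t1 = build_daily_segment(facts, "T+1", lookup_c_tilde)
print(query_by_category([seg_t, seg_t1]))  # {'Toys': 3, 'Games': 4}

# To reflect C~ in historic data, the day-T segment must be rebuilt.
seg_t = build_daily_segment(facts, "T", lookup_c_tilde)
print(query_by_category([seg_t, seg_t1]))  # {'Games': 7}
```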
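Luke's answer that counts add up across incremental builds while distinct count is approximate can be illustrated with a small sketch: SUM and COUNT measures are additive across segments, but summing per-segment distinct counts double-counts values that appear in more than one segment, so each segment must keep a mergeable summary of the values it saw. Kylin keeps a HyperLogLog sketch for this, which is why its distinct counts are approximate; this sketch swaps in exact Python sets as a stand-in, and the helper names and data are hypothetical.

```python
# Hypothetical helpers; exact sets stand in for Kylin's HyperLogLog sketches.


def merge_sum(segment_values):
    """SUM / COUNT measures are additive: add the per-segment values."""
    return sum(segment_values)


def naive_distinct(segment_sets):
    """Wrong for COUNT DISTINCT: adds per-segment distinct counts."""
    return sum(len(values) for values in segment_sets)


def merged_distinct(segment_sets):
    """Correct: union the per-segment value sets, then count."""
    merged = set()
    for values in segment_sets:
        merged |= values
    return len(merged)


# Category 20 (hypothetical data): row counts and product ids per daily segment.
day1_rows, day1_products = 10, {f"p{i}" for i in range(1, 11)}  # p1..p10
day2_rows, day2_products = 2, {"p3", "p11"}  # one repeat, one new product

print(merge_sum([day1_rows, day2_rows]))                # 12 rows in total
print(naive_distinct([day1_products, day2_products]))   # 12, p3 counted twice
print(merged_distinct([day1_products, day2_products]))  # 11 distinct products
```

A HyperLogLog sketch supports the same union-style merge in fixed memory, trading exactness for size, which matches Luke's note that the distinct count result is approximate.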