Hi WILL,
Q: What are the compute/storage constraints on this?
A: In theory, computation and storage are unlimited.

Q: Where is the data stored?
A: If you use the kylin4_on_cloud branch, the data is stored on S3. If you run Kylin on a local cluster, the data is stored on HDFS.

Q: Which nodes is the computation occurring on?
A: Kylin 4 uses Spark as its computation engine, so the computation occurs on the nodes that run Spark.
By the way, the kylin4_on_cloud branch has an architecture diagram that may interest you:
https://github.com/apache/kylin/tree/kylin4_on_cloud#architecture

Q: If we have a large number of dimensions, what part of the cloud-based Kylin needs to be increased?
A: If your scenario has a large number of dimensions, I suggest splitting them into a fact table and multiple lookup tables. Then you can add computation nodes to help Kylin 4 build and query on the cloud.

--
Best regards.
Tengting Xu

At 2022-10-12 02:21:13, "Will Glass-Husain" <wgl...@forio.com> wrote:
>Thank you -- very helpful.
>
>Regarding limits on the number of dimensions. What are the
>compute/storage constraints on this? For a given query:
>* Where is the data stored
>* Which nodes is the computation occurring on?
>
>I am trying to figure out -- if we have a large number of dimensions, what
>part of the cloud based kylin needs to be increased (I'm doing the setup
>from the kylin4_on_cloud branch)
>
>Thanks, WILL
>
>On Tue, Oct 11, 2022 at 1:20 AM Xiaoxiang Yu <x...@apache.org> wrote:
>
>> 1) The criteria for filtering (e.g. selecting sex='male') and grouping (e.g.
>> group by state) should be dimensions - is this correct?
>> Yes. Besides, Kylin has a limit of at most 63 dimensions. But you should
>> be aware of 'The Curse of Dimensionality'.
>>
>> 2.1) Items that I would like to sum should be measures, is that right?
>> Yes.
>>
>> 2.2) Is there a limit to the number of measures?
>> No, there is no such limit.
>>
>> 3) Does Kylin support sum(expression)?
>> From the MySQL doc
>> https://dev.mysql.com/doc/refman/5.7/en/aggregate-functions.html#function_sum
>> we know MySQL supports it.
>> Kylin should support it in Kylin 3.x and in the future version 5.x.
>> Unfortunately, Kylin 4.x does not support sum(expression), and Kylin 4.x
>> is the version you are using.
>>
>> 4) Does Kylin support MEDIAN?
>>
>> Yes, Kylin should support it, but I didn't test it. In fact, Kylin has a
>> PERCENTILE measure, and I think the 50th percentile is equal to MEDIAN, am I
>> right?
>>
>> --
>> Best wishes to you!
>> From: Xiaoxiang Yu
>>
>> At 2022-10-11 14:03:14, "Will Glass-Husain" <wgl...@forio.com> wrote:
>> >Hi,
>> >
>> >Thanks for the recent help as I set up my first Kylin system. I have a
>> >question regarding proper design of a cube to run some
>> >demographic queries. I want to make this accessible in a webapp, with
>> >reasonable response time.
>> >
>> >I have a CSV file with about 80 columns on sex, race, state, age, internet
>> >access, job, etc.
>> >
>> >Can you advise regarding proper cube design?
>> >
>> >1) The criteria for filtering (e.g. selecting sex='male') and grouping
>> >(e.g. group by state) should be dimensions - is this correct?
>> >
>> >2) Items that I would like to sum should be measures, is that right? Is
>> >there a limit to the number of measures? I want to report out up to 300
>> >different measures aggregated by the dimensions.
>> >
>> >3)
>> >In MySQL, I am querying for different values like this
>> >
>> >select SUM((married=1) * weight) as MARRIED_1, SUM((married=2) * weight) as
>> >MARRIED_2 from data group by state;
>> >
>> >This returns the total number of weighted records for records where married
>> >is 1 and where married is 2.
>> >
>> >Question - is there a way to do this in the Kylin query? Or do I need to
>> >pre-compute my weights and create columns MARRIED_1 and MARRIED_2 in the
>> >source data, then sum it in Kylin?
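
A note on question 3: since Kylin 4.x does not support sum(expression), one option is the pre-computation described in the question, done on the CSV before it is loaded. Below is a minimal Python sketch of that step, not a tested Kylin workflow. It assumes the source columns are named married and weight as in the MySQL query; the helper name add_weighted_indicators and the sample rows are made up for illustration.

```python
import csv
import io


def add_weighted_indicators(rows, column, values, weight_column="weight"):
    """Yield a copy of each row with one extra column per category value.

    For a value v, the new column <COLUMN>_<v> holds the row's weight when
    row[column] equals v and 0.0 otherwise, mirroring (column = v) * weight.
    """
    for row in rows:
        out = dict(row)
        weight = float(row[weight_column])
        for v in values:
            # csv.DictReader yields strings, so compare against str(v).
            out[f"{column.upper()}_{v}"] = weight if row[column] == str(v) else 0.0
        yield out


# Made-up sample data standing in for the real CSV file.
sample = io.StringIO("state,married,weight\nCA,1,1.5\nCA,2,2.0\nNY,1,0.5\n")
enriched = list(add_weighted_indicators(csv.DictReader(sample), "married", [1, 2]))

for row in enriched:
    print(row["state"], row["MARRIED_1"], row["MARRIED_2"])
```

With columns like these in the source data, MARRIED_1 and MARRIED_2 can be defined as ordinary SUM measures, and the query becomes a plain sum of each column grouped by state.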
>> >
>> >4) This is a tricky one. Does Kylin support MEDIAN? In MySQL, there's no
>> >MEDIAN function but we can calculate it by counting all the records, then
>> >selecting the record at an offset of half the records. I want to
>> >calculate "median" (not mean) for age and some other variables.
>> >
>> >Thanks for any tips.
>> >
>> >Best, WILL
>> >
>> >--
>> >William Glass-Husain /forio | +1 (415) 440 7500 x802 | forio.com
>> ><http://www.forio.com/>
>>
>
>--
>William Glass-Husain /forio | +1 (415) 440 7500 x802 | forio.com
><http://www.forio.com/>
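
A note on question 4 in the thread: the exact median and the exact 50th percentile are the same value, which can be checked outside Kylin with a few lines of Python. This is only a sketch using the standard library on made-up ages; it says nothing about how Kylin computes its PERCENTILE measure. If an engine's percentile is an approximation, its result can differ slightly from the exact median.

```python
import statistics

ages = [23, 35, 41, 52, 67, 30]  # made-up sample values

# Exact median: the mean of the two middle values for an even-length list.
exact_median = statistics.median(ages)

# quantiles(n=100) returns the 99 cut points between percentiles,
# so index 49 is the 50th percentile.
p50 = statistics.quantiles(ages, n=100, method="inclusive")[49]

print(exact_median, p50)
```

Comparing a percentile result from Kylin against an exact value like this on a small extract of the data would be one way to test the measure before relying on it.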