Hi WILL,

Q: What are the compute/storage constraints on this?
A: In theory, neither computation nor storage is limited.


Q: Where is the data stored?
A: If you use the kylin4_on_cloud branch, the data is stored on S3. If you run 
Kylin on a local cluster, the data is stored on HDFS.


Q: Which nodes are the computation occurring on?
A: Kylin 4 uses Spark as its computation engine, so the computation occurs on 
the nodes that run Spark. By the way, the kylin4_on_cloud branch has an 
architecture diagram that may interest you:
https://github.com/apache/kylin/tree/kylin4_on_cloud#architecture


Q: If we have a large number of dimensions, what part of the cloud-based Kylin 
needs to be increased?
A: If your scenario has a large number of dimensions, I suggest splitting them 
into a fact table and multiple lookup tables. Then you only need to add new 
computation nodes to help Kylin 4 build and query on the cloud.
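
To give a rough sense of why the number of dimensions drives the build cost, here is a minimal sketch. It assumes a full cube with no pruning, where every subset of the dimensions is one cuboid, so N dimensions give 2^N cuboids. The function name is only for illustration; a real cube design usually prunes most of these combinations.

```python
def full_cube_cuboids(num_dimensions: int) -> int:
    """Number of cuboids in an unpruned cube: one per subset of dimensions."""
    if num_dimensions < 0:
        raise ValueError("num_dimensions must be non-negative")
    return 2 ** num_dimensions


if __name__ == "__main__":
    # Each extra dimension doubles the number of combinations to pre-compute.
    for n in (5, 10, 20, 30):
        print(f"{n} dimensions -> {full_cube_cuboids(n)} cuboids")
```

Since the count doubles with each added dimension, it is usually worth reducing the dimensions that are combined before adding nodes.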



--

Best regards.
Tengting Xu





At 2022-10-12 02:21:13, "Will Glass-Husain" <wgl...@forio.com> wrote:
>Thank you -- very helpful.
>
>Regarding limits on the number of dimensions.    What are the
>compute/storage constraints on this?  For a given query:
>* Where is the data stored
>* Which nodes is the computation occurring on?
>
>I am trying to figure out -- if we have a large number of dimensions, what
>part of the cloud based kylin  needs to be increased (I'm doing the setup
>from the kylin4_on_cloud branch)
>
>Thanks, WILL
>
>On Tue, Oct 11, 2022 at 1:20 AM Xiaoxiang Yu <x...@apache.org> wrote:
>
>> 1) The criteria for filtering (e.g. selecting sex='male') and grouping (e.g.
>> group by state) should be dimensions - is this correct?
>> Yes. Note also that Kylin has a limit of at most 63 dimensions, and you
>> should be aware of 'The Curse of Dimensionality'.
>>
>> 2.1) Items that I would like to sum should be measures, is that right?
>> Yes.
>>
>> 2.2) Is there a limit to the number of measures?
>> No, there is no such limit.
>>
>> 3) Does Kylin support sum(expression)?
>> From the MySQL documentation
>> https://dev.mysql.com/doc/refman/5.7/en/aggregate-functions.html#function_sum
>> we know that MySQL supports it.
>> As for Kylin, it should be supported in Kylin 3.x and in the future
>> version 5.x. Unfortunately, Kylin 4.x, which is the version you are using,
>> does not support sum(expression).
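
Since sum(expression) is not available in Kylin 4.x, the pre-computation option mentioned in the original question could look roughly like the sketch below. It assumes the source CSV has columns named state, married and weight, as in the MySQL query. The helper name add_weighted_indicators and the sample rows are made up for illustration.

```python
import csv
import io


def add_weighted_indicators(rows, column, values, weight_column="weight"):
    """Yield each row with one extra column per value, e.g. MARRIED_1.

    The new column holds the row's weight when row[column] equals the value
    and 0.0 otherwise, so a plain SUM over it matches
    SUM((married=1) * weight) from the MySQL query.
    """
    for row in rows:
        out = dict(row)
        weight = float(row[weight_column])
        for value in values:
            name = f"{column.upper()}_{value}"
            out[name] = weight if row[column] == str(value) else 0.0
        yield out


# Hypothetical sample data standing in for the real CSV file.
sample = io.StringIO("state,married,weight\nCA,1,1.5\nCA,2,2.0\nNY,1,0.5\n")
rows = list(add_weighted_indicators(csv.DictReader(sample), "married", [1, 2]))

if __name__ == "__main__":
    for row in rows:
        print(row["state"], row["MARRIED_1"], row["MARRIED_2"])
```

The new MARRIED_1 and MARRIED_2 columns can then be defined as ordinary SUM measures and grouped by state.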
>>
>> 4) Does Kylin support MEDIAN?
>>
>> Yes, Kylin should support it, but I haven't tested it. In fact, Kylin has a
>> PERCENTILE measure, and I think the 50th percentile is equal to the MEDIAN,
>> am I right?
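
As a quick check of the "50th percentile equals median" idea, here is a small sketch in plain Python. It assumes an exact percentile with linear interpolation between ranks. The percentile function below is written for illustration and is not Kylin's implementation; an engine that computes percentiles approximately may return slightly different values.

```python
import statistics


def percentile(values, p):
    """Exact p-th percentile (0 to 100), interpolating linearly between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must not be empty")
    rank = (len(ordered) - 1) * p / 100.0
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical ages, with an even and an odd number of records.
ages_even = [23, 35, 41, 52, 67, 29]
ages_odd = [23, 35, 41, 52, 67]

if __name__ == "__main__":
    for ages in (ages_even, ages_odd):
        print(percentile(ages, 50), statistics.median(ages))
```

For both samples the two values agree, which supports using a 50th-percentile measure where a median is wanted.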
>>
>> --
>> *Best wishes to you ! *
>> *From :**Xiaoxiang Yu*
>>
>>
>>
>> At 2022-10-11 14:03:14, "Will Glass-Husain" <wgl...@forio.com> wrote:
>> >Hi,
>> >
>> >Thanks for the recent help as I set up my first Kylin system.   I have a
>> >question regarding proper design of a cube to run some
>> >demographic queries.   I want to make this accessible in a webapp, with
>> >reasonable response time.
>> >
>> >I have a CSV file with about 80 columns on sex, race, state, age, internet
>> >access, job, etc.
>> >
>> >Can you advise regarding proper cube design?
>> >
>> >1) The criteria for filtering (e.g. selecting sex='male') and grouping
>> >(e.g. group by state) should be dimensions - is this correct?
>> >
>> >2) Items that I would like to sum should be measures, is that right?   Is
>> >there a limit to the number of measures?  I want to report out up to 300
>> >different measures aggregated by the dimensions.
>> >
>> >3)
>> >In MySQL, I am querying for different values like this
>> >
>> >select SUM((married=1) * weight) as MARRIED_1, SUM((married=2) * weight) as
>> >MARRIED_2 from data group by state;
>> >
>> >This returns the total number of weighted records for records where married
>> >is 1 and where married is 2.
>> >
>> >Question - is there a way to do this in the Kylin query?    Or do I need to
>> >pre-compute my weights and create columns MARRIED_1 and MARRIED_2 in the
>> >source data, then sum it in Kylin.
>> >
>> >4) This is a tricky one.  Does Kylin support MEDIAN?   In MySQL, there's no
>> >MEDIAN function but we can calculate it by counting all the records, then
>> >selecting the record at an offset of half the records.   I want to
>> >calculate "median" (not mean) for age and some other variables.
>> >
>> >Thanks for any tips.
>> >
>> >Best, WILL
>> >
>> >--
>> >William Glass-Husain   /forio  |  +1 (415) 440 7500 x802  |  forio.com
>> ><http://www.forio.com/>
>>
>>
>
>-- 
>William Glass-Husain   /forio  |  +1 (415) 440 7500 x802  |  forio.com
><http://www.forio.com/>
