Congratulations, hope you will make good use of the ability of Kylin 5 for your use cases.
------------------------ With warm regard Xiaoxiang Yu On Thu, Nov 2, 2023 at 10:50 AM Nam Đỗ Duy <[email protected]> wrote: > The query is too fast, less than a second, can you make it a little bit > slower so that I can see it clearly 😀😀 > [image: image.png] > > On Thu, Nov 2, 2023 at 9:32 AM Nam Đỗ Duy <[email protected]> wrote: > >> Thank you Xiaoxiang for the guideline. Will definitely read it carefully. >> Kindly help the following questions: >> >> 1. Computed column >> >> I created a “computed column” and add it to dimensions (among other >> dimensions) >> >> When I use query to select the computed column it returned error >> >> 2. Datatype optimization: will you think that the int be better than >> string for key join columns? >> >> Please advise >> >> >> On Wed, 1 Nov 2023 at 17:32 Xiaoxiang Yu <[email protected]> wrote: >> >>> Yes, that is almost correct. >>> >>> If you have a lot of complex queries, and you want to using Kylin 5 to >>> accelerate them, the recommended steps of mine are as follows: >>> >>> 1. You analyse all queries and collect all join relation/pattern. >>> 2. You create Models for each specific join relation/pattern, with the >>> join >>> relation you find in above step. >>> 3. You analyse and collect dimensions and measures from all queries, and >>> add them to the corresponding Model. >>> 4. You build segments of all Models with proper data range. >>> 5. You turned off the pushdown switch, and sent all queries to Kylin. If >>> there are some queries which failed, fix them. >>> Here are some common situations. >>> 5.1 Join relation/pattern is not matched >>> 5.2 If the join relation is matched, the Model might not contain >>> every >>> column that your query needs, please check kylin.query.log with keyword ' >>> unmatched'. >>> 6. (Optional) If you find some of your queries do not exactly match with >>> your Index(your query on [colA, colB], but your index contains more >>> columns >>> than colA and colB), you can add some aggregate groups(or smaller Table >>> Index) to optimize the query performance. >>> >>> >>> >>> ------------------------ >>> With warm regard >>> Xiaoxiang Yu >>> >>> >>> >>> On Wed, Nov 1, 2023 at 5:57 PM Nam Đỗ Duy <[email protected]> >>> wrote: >>> >>> > Thank you Xiaoxiang, I nearly got to the point. >>> > >>> > So can I interpret that: 1 model equal (~) to a set of Joins of >>> (Dim/Fact) >>> > table, that is to say we need to create several models according to >>> > multiple kinds of joins queries? >>> > >>> > Best regards >>> > >>> > On Wed, Nov 1, 2023 at 4:50 PM Xiaoxiang Yu <[email protected]> wrote: >>> > >>> >> Have you ever tried to analyse the reason why your query can not hit >>> >> Model 'sample_ssb'? >>> >> It is because the join relation of your query is not suitable for the >>> >> join relation/pattern of Model 'sample_ssb'. >>> >> >>> >> Your query used a join relation/pattern like: A inner join B. >>> >> But the Model 'sample_ssb' used a join relation/pattern like : A inner >>> >> join B inner join C. >>> >> >>> >> If you are familiar with the definition of Inner join, you may know >>> that >>> >> the >>> >> relation/pattern 'A inner join B inner join C' will have a chance >>> >> to lose some rows when compared to pattern 'A inner join B'. >>> >> So the Model 'sample_ssb' will be excluded to serve your query. >>> >> >>> >> That is to say, you need to create a new model that is similar to >>> Model >>> >> 'sample_ssb', >>> >> but with additional tables removed. >>> >> >>> >> >>> >> >>> >> ------------------------ >>> >> With warm regard >>> >> Xiaoxiang Yu >>> >> >>> >> >>> >> >>> >> On Wed, Nov 1, 2023 at 5:21 PM Nam Đỗ Duy <[email protected]> >>> wrote: >>> >> >>> >>> Hi Xiaoxiang, >>> >>> >>> >>> Thank you very much >>> >>> >>> >>> I have clearer picture of Kylin already thanks to your explanation. >>> >>> >>> >>> Now back to the sample project of SSB in attached photo, when I run >>> this >>> >>> query with push_down option OFF, why the OLAP error appears, and in >>> such >>> >>> case, how to create a new cube for this query? >>> >>> >>> >>> [image: image.png] >>> >>> >>> >>> On Wed, Nov 1, 2023 at 3:49 PM Xiaoxiang Yu <[email protected]> wrote: >>> >>> >>> >>>> Here is some of my explanation and it may not be perfect. >>> >>>> Segment in Kylin is part of model/cube pre-computed data, in most >>> >>>> cases, divided by date column. >>> >>>> >>> >>>> Here is some difference between Segment and Snapshot. >>> >>>> Segment, whose source data comes from one fact table joins some >>> dimension >>> >>>> tables with 'specific date range', is 'precomputed', and will >>> accelerate >>> >>>> complex query. >>> >>>> Snapshot, whose source data comes from one specific dimension table >>> without >>> >>>> specific date range, is "not precomputed", and can join with >>> segments >>> >>>> at runtime . >>> >>>> >>> >>>> - https://kylin.apache.org/5.0/docs/snapshot/snapshot_management >>> >>>> - >>> >>>> >>> https://kylin.apache.org/5.0/docs/modeling/load_data/segment_operation_settings/intro >>> >>>> >>> >>>> ------------------------ >>> >>>> With warm regard >>> >>>> Xiaoxiang Yu >>> >>>> >>> >>>> >>> >>>> >>> >>>> On Wed, Nov 1, 2023 at 3:53 PM Nam Đỗ Duy <[email protected]> wrote: >>> >>>> >>> >>>>> Thank you again, very smart of you to automatically select cube >>> for a >>> >>>>> certain query. Sorry If I ask too much: Is the concept of Segment >>> in Kylin >>> >>>>> model similar to Slice-and-Dice concept of Cube, what is the >>> different >>> >>>>> between Kylin Segment and Kylin Snapshot? >>> >>>>> >>> >>>>> PS. I sent you the log files for your help in investigating why my >>> >>>>> cube has not been used. >>> >>>>> >>> >>>>> On Wed, Nov 1, 2023 at 2:36 PM Xiaoxiang Yu <[email protected]> >>> wrote: >>> >>>>> >>> >>>>>> I guess there is a misunderstanding from your sentences. >>> >>>>>> >>> >>>>>> -- 'I need to select Cube from a combo box below the query window' >>> >>>>>> It is not right to use 'need', that combo box is for some specific >>> >>>>>> cases(for example, Kylin did not choose a cube which is the most >>> >>>>>> efficient), not the most cases. >>> >>>>>> In most cases(both for Kylin 4 and Kylin 5), you don't need to >>> select >>> >>>>>> a Cube in the combo box, Kylin will do the choice for you. >>> >>>>>> >>> >>>>>> ------------------------ >>> >>>>>> With warm regard >>> >>>>>> Xiaoxiang Yu >>> >>>>>> >>> >>>>>> >>> >>>>>> >>> >>>>>> On Wed, Nov 1, 2023 at 3:24 PM Nam Đỗ Duy <[email protected] >>> > >>> >>>>>> wrote: >>> >>>>>> >>> >>>>>>> Hi Xiaoxiang, sorry if I made you confused (Anyway, it is just a >>> >>>>>>> question of a beginner) >>> >>>>>>> >>> >>>>>>> "obviously" means "clearly" >>> >>>>>>> >>> >>>>>>> because I need to select Cube from a combo box below the query >>> window >>> >>>>>>> >>> >>>>>>> Thank you very much >>> >>>>>>> >>> >>>>>>> On Wed, Nov 1, 2023 at 2:20 PM Xiaoxiang Yu <[email protected]> >>> wrote: >>> >>>>>>> >>> >>>>>>>> From my side, I cannot understand why you say Kylin 4 is 'very >>> >>>>>>>> obviously'. Can you give an example? >>> >>>>>>>> From the source code, the basic logic of choosing the right >>> >>>>>>>> cube/model are similar. >>> >>>>>>>> ------------------------ >>> >>>>>>>> With warm regard >>> >>>>>>>> Xiaoxiang Yu >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> >>> >>>>>>>> On Wed, Nov 1, 2023 at 3:01 PM Nam Đỗ Duy <[email protected]> >>> wrote: >>> >>>>>>>> >>> >>>>>>>>> Thank you for your kind reply, please answer 1 more question >>> about >>> >>>>>>>>> version 5: >>> >>>>>>>>> >>> >>>>>>>>> In version 4.x we run query against a Cube very obviously, but >>> in >>> >>>>>>>>> version 5, the cube usage is a implication socan you advise: >>> for a given >>> >>>>>>>>> query, which model will be used, which index (cube) will be >>> used for this >>> >>>>>>>>> query? >>> >>>>>>>>> >>> >>>>>>>>> Thank you >>> >>>>>>>>> >>> >>>>>>>>> On Wed, Nov 1, 2023 at 1:42 PM Xiaoxiang Yu <[email protected]> >>> >>>>>>>>> wrote: >>> >>>>>>>>> >>> >>>>>>>>>> 1. How do I measure the size of the index (cube) in version 5? >>> >>>>>>>>>> You can check storage of specific Indexes from the Index >>> page. >>> >>>>>>>>>> >>> >>>>>>>>>> >>> https://kylin.apache.org/5.0/docs/modeling/model_design/aggregation_group#view-aggregate-index >>> >>>>>>>>>> or >>> >>>>>>>>>> >>> https://kylin.apache.org/5.0/assets/images/index_1-6ad3f55183d4ed61962359d9408ba192.png >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> 2. How to create the cardinality for each column? >>> >>>>>>>>>> You should check this link : >>> >>>>>>>>>> https://kylin.apache.org/5.0/docs/datasource/data_sampling/ . >>> >>>>>>>>>> >>> >>>>>>>>>> 3. In your default project sample named SSB project, you have >>> >>>>>>>>>> only 4 simple aggregate group index and no table index as in >>> attached file >>> >>>>>>>>>> so what is the best strategy to select index for our OLAP? >>> >>>>>>>>>> 1. There does exist a 'Base Table Index' by default >>> >>>>>>>>>> actually, its id is 20000000001. >>> >>>>>>>>>> 2. I think it is a good question and Kylin 5 lacks such a >>> >>>>>>>>>> guide for better modeling. You are free to ask your question >>> to >>> >>>>>>>>>> mailing list and I will try to reply. >>> >>>>>>>>>> >>> >>>>>>>>>> ------------------------ >>> >>>>>>>>>> With warm regard >>> >>>>>>>>>> Xiaoxiang Yu >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> >>> >>>>>>>>>> On Wed, Nov 1, 2023 at 2:12 PM Xiaoxiang Yu <[email protected]> >>> >>>>>>>>>> wrote: >>> >>>>>>>>>> >>> >>>>>>>>>>> OK, I didn't read all the mail history so I misunderstand the >>> >>>>>>>>>>> situation. Looks like you need to analyse >>> >>>>>>>>>>> the cause why the query didn't hit the cube correctly. >>> >>>>>>>>>>> >>> >>>>>>>>>>> Please generate query diagnosis package and send it to me >>> >>>>>>>>>>> privately. I will analyse the query log. >>> >>>>>>>>>>> You can refer to the following steps in screenshots. >>> >>>>>>>>>>> >>> >>>>>>>>>>> [image: image.png] >>> >>>>>>>>>>> >>> >>>>>>>>>>> If the screenshots are not displaying correctly, please read >>> >>>>>>>>>>> this guide : >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> https://kylin.apache.org/5.0/docs/operations/system-operation/diagnosis/#generate-query-diagnosis-package-in-web-ui >>> >>>>>>>>>>> >>> >>>>>>>>>>> By the way, you need to analyse the cause by reading >>> >>>>>>>>>>> kylin.query.log, not the kylin.log, >>> >>>>>>>>>>> refer to >>> >>>>>>>>>>> https://kylin.apache.org/5.0/docs/operations/logs/system_log >>> >>>>>>>>>>> >>> >>>>>>>>>>> ------------------------ >>> >>>>>>>>>>> With warm regard >>> >>>>>>>>>>> Xiaoxiang Yu >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> >>> >>>>>>>>>>> On Wed, Nov 1, 2023 at 12:18 PM Nam Đỗ Duy <[email protected]> >>> >>>>>>>>>>> wrote: >>> >>>>>>>>>>> >>> >>>>>>>>>>>> Thank you Xiaoxiang for your advice. As my title email >>> shown, I >>> >>>>>>>>>>>> guessed that the OLAP functionalities has not been >>> correctly set up in my >>> >>>>>>>>>>>> computer. >>> >>>>>>>>>>>> >>> >>>>>>>>>>>> The evidence about it is that: when I disable the Pushdown >>> >>>>>>>>>>>> option box to use solely the precomputation cube only, it >>> showed following >>> >>>>>>>>>>>> error: Please kindly advise how to properly build the OLAP >>> >>>>>>>>>>>> >>> >>>>>>>>>>>> LIMIT 500": No realization found for OLAPContext, >>> MODEL_UNMATCHED_JOIN, >>> rel#2240:KapTableScan.OLAP.[](table=[VNEVENT_HIVE_DWH_400MILLION_ROWS, >>> FACTUSEREVENT],ctx=0@null,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, >>> 12, 13, 14, 15, 16, 17, 18, 19, 20]) >>> >>>>>>>>>>>> >>> >>>>>>>>>>>> >>> >>>>>>>>>>>> >>> >>>>>>>>>>>> On Wed, Nov 1, 2023 at 10:40 AM Xiaoxiang Yu < >>> [email protected]> >>> >>>>>>>>>>>> wrote: >>> >>>>>>>>>>>> >>> >>>>>>>>>>>>> Hi, >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>> Yesterday, I tried to see if query pushdown functions >>> work >>> >>>>>>>>>>>>> well in the Kylin5 docker, and all of my queries return >>> proper responses . >>> >>>>>>>>>>>>> After checking your logs from Shaofeng, I found these >>> >>>>>>>>>>>>> error messages repeated many times: >>> >>>>>>>>>>>>> 1. 'java.io.IOException: All datanodes >>> >>>>>>>>>>>>> DatanodeInfoWithStorage[127.0.0.1:9866 >>> ,DS-5093899b-06c7-4386-95d5-6fc271d92b52,DISK] >>> >>>>>>>>>>>>> are bad. Aborting...' >>> >>>>>>>>>>>>> 2. 'curator.ConnectionState : Connection timed out for >>> >>>>>>>>>>>>> connection string (localhost:2181) and timeout (15000) / >>> elapsed (41794) >>> >>>>>>>>>>>>> org.apache.curator.CuratorConnectionLossException: >>> >>>>>>>>>>>>> KeeperErrorCode = ConnectionLoss' >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>> I guess the root cause is that the container didn't not >>> >>>>>>>>>>>>> have enough resources. I found you query on a table called >>> >>>>>>>>>>>>> 'XXX_hive_dwh_400million_rows', looks like you gave a >>> complex query on a >>> >>>>>>>>>>>>> table which contains 400 million rows? >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>> Since I am the uploader of kylin5 's docker image, I >>> want >>> >>>>>>>>>>>>> to give some explainment. Kylin5 docker is not a place for >>> performance >>> >>>>>>>>>>>>> benchmarks, it is only for demonstration. It is only >>> allocated with very >>> >>>>>>>>>>>>> little resources(8G memory) if you are using the default >>> command from >>> >>>>>>>>>>>>> docker hub page. Before I uploaded my image, I only tested >>> my image using >>> >>>>>>>>>>>>> the ssb dataset, which the biggest table only contains >>> about 60k rows. If >>> >>>>>>>>>>>>> you are using a larger dataset and complexer queries, you >>> have to scale the >>> >>>>>>>>>>>>> resource properly. Try querying tables which contain not >>> more than 100k >>> >>>>>>>>>>>>> rows by default. >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>> Here are some tips which may help you to check if the >>> >>>>>>>>>>>>> daemon service is in health status and >>> resources(particularly disk space) >>> >>>>>>>>>>>>> is configured properly. >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>> 1. Checking HDFS 's web ui( >>> >>>>>>>>>>>>> http://localhost:9870/dfshealth.html#tab-datanode ) to >>> >>>>>>>>>>>>> confirm whether HDFS service is in 'In service' status. >>> >>>>>>>>>>>>> 2. Checking Datanode 's log in >>> >>>>>>>>>>>>> >>> `/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log`, check if >>> >>>>>>>>>>>>> there is any error message. Like: cat >>> >>>>>>>>>>>>> >>> /opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log | grep ERROR >>> >>>>>>>>>>>>> | wc -l >>> >>>>>>>>>>>>> 3. Checking if your docker engine is configured with >>> >>>>>>>>>>>>> enough disk space, if you are using Docker Desktop like >>> me,please go to >>> >>>>>>>>>>>>> "Settings" - "Resources" - "Advanced", make sure you have >>> allocated 40GB+ >>> >>>>>>>>>>>>> disk space to the docker container. >>> >>>>>>>>>>>>> 4. Checking the available disk space of your container >>> by >>> >>>>>>>>>>>>> `df -h`, make sure the 'Use%' of 'overlay' is less than >>> 60% . >>> >>>>>>>>>>>>> 5. Checking the load average/ cpu usage/ jvm gc. Make >>> sure >>> >>>>>>>>>>>>> these metrics are not really high when you send a query. >>> >>>>>>>>>>>>> ------------------------ >>> >>>>>>>>>>>>> With warm regard >>> >>>>>>>>>>>>> Xiaoxiang Yu >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>> On Tue, Oct 31, 2023 at 5:13 PM Nam Đỗ Duy >>> >>>>>>>>>>>>> <[email protected]> wrote: >>> >>>>>>>>>>>>> >>> >>>>>>>>>>>>>> Hi ShaoFeng >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> Thank you very much for your valuable feedback >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> I saw the application to be there (if I see it right) as >>> in >>> >>>>>>>>>>>>>> the attachment photo. Kindly advise so that I can run >>> this query on OLAP. >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> PS. I sent you the log file in private. >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> [image: image.png] >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>> On Tue, Oct 31, 2023 at 3:11 PM ShaoFeng Shi < >>> >>>>>>>>>>>>>> [email protected]> wrote: >>> >>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> Can you provide the messages in logs/kylin.log when >>> >>>>>>>>>>>>>>> executing the SQL? and you can also check the Spark UI >>> from yarn resource >>> >>>>>>>>>>>>>>> manager (there should be one running application called >>> Spardar, which is >>> >>>>>>>>>>>>>>> Kylin's backend spark application). If the application >>> is not there, it may >>> >>>>>>>>>>>>>>> indicates the yarn doesn't have resource to startup it. >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> Best regards, >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> Shaofeng Shi 史少锋 >>> >>>>>>>>>>>>>>> Apache Kylin PMC, >>> >>>>>>>>>>>>>>> Apache Incubator PMC, >>> >>>>>>>>>>>>>>> Email: [email protected] >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> Apache Kylin FAQ: >>> >>>>>>>>>>>>>>> https://kylin.apache.org/docs/gettingstarted/faq.html >>> >>>>>>>>>>>>>>> Join Kylin user mail group: >>> [email protected] >>> >>>>>>>>>>>>>>> Join Kylin dev mail group: >>> [email protected] >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> Nam Đỗ Duy <[email protected]> 于2023年10月31日周二 10:35写道: >>> >>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> Dear Sir/Madam, >>> >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> I have a fact with 500million rows then I build model, >>> >>>>>>>>>>>>>>>> index according to the website help. >>> >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> I chose full incremental because this is the first >>> times I >>> >>>>>>>>>>>>>>>> load data >>> >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> I create both index types Aggregate group index, table >>> >>>>>>>>>>>>>>>> index as photo attached. >>> >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> But the query always failed after timeout of 300 >>> seconds (I >>> >>>>>>>>>>>>>>>> run in docker), I dont want to increase the value of >>> 300 seconds because I >>> >>>>>>>>>>>>>>>> wish the OLAP can run within 1 minutes (is that >>> possible?) >>> >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> It seems that the OLAP function in indexing not working >>> to >>> >>>>>>>>>>>>>>>> speedup the query by precomputed cube. >>> >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> Can you advise to check whether the index did really >>> work? >>> >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> It is quite urgent task for me so prompt response is >>> highly >>> >>>>>>>>>>>>>>>> appreciated. >>> >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>>> Thank you very much >>> >>>>>>>>>>>>>>>> >>> >>>>>>>>>>>>>>> >>> >>
