Have you tried to analyse why your query cannot hit the model 'sample_ssb'? The reason is that the join relation of your query does not match the join relation/pattern of the model 'sample_ssb'.
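As a rough illustration of why two inner-join patterns are not interchangeable, here is a minimal sketch using Python's built-in sqlite3 module. The tables a, b and c and their rows are hypothetical (a stands for a fact table, b and c for dimension tables); they are not taken from the SSB sample project, and this does not reproduce Kylin's own matching logic.

```python
import sqlite3

# Hypothetical tables: 'a' is a fact table, 'b' and 'c' are dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, b_id INTEGER, c_id INTEGER);
    CREATE TABLE b (id INTEGER);
    CREATE TABLE c (id INTEGER);
    INSERT INTO a VALUES (1, 10, 100), (2, 10, 200), (3, 20, 300);
    INSERT INTO b VALUES (10), (20);
    INSERT INTO c VALUES (100), (200);  -- no row matches a.c_id = 300
""")

def count_rows(sql):
    """Run a COUNT(*) query and return the single number it produces."""
    return conn.execute(sql).fetchone()[0]

# Join pattern of the query: A inner join B.
query_rows = count_rows(
    "SELECT COUNT(*) FROM a JOIN b ON a.b_id = b.id"
)

# Join pattern of the model: A inner join B inner join C.
model_rows = count_rows(
    "SELECT COUNT(*) FROM a JOIN b ON a.b_id = b.id JOIN c ON a.c_id = c.id"
)

print("A join B        :", query_rows)
print("A join B join C :", model_rows)
```

In this toy data the fact row with c_id = 300 has no partner in c, so the three-table join returns fewer rows than the two-table join. Data precomputed on the three-table pattern therefore cannot safely answer a query written against the two-table pattern.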
Your query uses a join relation/pattern like: A inner join B. But the model 'sample_ssb' uses a join relation/pattern like: A inner join B inner join C. If you are familiar with the definition of inner join, you will know that the pattern 'A inner join B inner join C' may lose some rows compared to the pattern 'A inner join B'. So the model 'sample_ssb' is excluded from serving your query. That is to say, you need to create a new model that is similar to the model 'sample_ssb', but with the additional tables removed.

------------------------
With warm regard
Xiaoxiang Yu

On Wed, Nov 1, 2023 at 5:21 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang,
>
> Thank you very much
>
> I have a clearer picture of Kylin already thanks to your explanation.
>
> Now back to the sample project of SSB in the attached photo: when I run this query with the push_down option OFF, why does the OLAP error appear, and in such a case, how do I create a new cube for this query?
>
> [image: image.png]
>
> On Wed, Nov 1, 2023 at 3:49 PM Xiaoxiang Yu <x...@apache.org> wrote:
>
>> Here is some of my explanation and it may not be perfect.
>> A Segment in Kylin is a part of the model/cube pre-computed data, in most cases divided by a date column.
>>
>> Here are some differences between Segment and Snapshot.
>> Segment, whose source data comes from one fact table joined with some dimension tables over a 'specific date range', is 'precomputed', and will accelerate complex queries.
>> Snapshot, whose source data comes from one specific dimension table without a specific date range, is 'not precomputed', and can join with segments at runtime.
>>
>> - https://kylin.apache.org/5.0/docs/snapshot/snapshot_management
>> - https://kylin.apache.org/5.0/docs/modeling/load_data/segment_operation_settings/intro
>>
>> ------------------------
>> With warm regard
>> Xiaoxiang Yu
>>
>> On Wed, Nov 1, 2023 at 3:53 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>
>>> Thank you again, very smart of you to automatically select a cube for a certain query. Sorry if I ask too much: is the concept of Segment in a Kylin model similar to the Slice-and-Dice concept of a cube, and what is the difference between Kylin Segment and Kylin Snapshot?
>>>
>>> PS. I sent you the log files for your help in investigating why my cube has not been used.
>>>
>>> On Wed, Nov 1, 2023 at 2:36 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>
>>>> I guess there is a misunderstanding in your sentences.
>>>>
>>>> -- 'I need to select Cube from a combo box below the query window'
>>>> It is not right to use 'need'. That combo box is for some specific cases (for example, Kylin did not choose the most efficient cube), not for most cases.
>>>> In most cases (both for Kylin 4 and Kylin 5), you don't need to select a cube in the combo box; Kylin will make the choice for you.
>>>>
>>>> ------------------------
>>>> With warm regard
>>>> Xiaoxiang Yu
>>>>
>>>> On Wed, Nov 1, 2023 at 3:24 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>
>>>>> Hi Xiaoxiang, sorry if I confused you (anyway, it is just a beginner's question)
>>>>>
>>>>> "obviously" means "clearly"
>>>>>
>>>>> because I need to select Cube from a combo box below the query window
>>>>>
>>>>> Thank you very much
>>>>>
>>>>> On Wed, Nov 1, 2023 at 2:20 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>
>>>>>> From my side, I cannot understand why you say Kylin 4 is 'very obviously'. Can you give an example?
>>>>>> From the source code, the basic logic of choosing the right cube/model is similar.
>>>>>>
>>>>>> ------------------------
>>>>>> With warm regard
>>>>>> Xiaoxiang Yu
>>>>>>
>>>>>> On Wed, Nov 1, 2023 at 3:01 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>>>>>
>>>>>>> Thank you for your kind reply, please answer 1 more question about version 5:
>>>>>>>
>>>>>>> In version 4.x we run a query against a cube very obviously, but in version 5 the cube usage is implicit, so can you advise: for a given query, which model will be used, and which index (cube) will be used for this query?
>>>>>>>
>>>>>>> Thank you
>>>>>>>
>>>>>>> On Wed, Nov 1, 2023 at 1:42 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>>
>>>>>>>> 1. How do I measure the size of the index (cube) in version 5?
>>>>>>>>    You can check the storage of specific indexes from the Index page.
>>>>>>>>    https://kylin.apache.org/5.0/docs/modeling/model_design/aggregation_group#view-aggregate-index
>>>>>>>>    or
>>>>>>>>    https://kylin.apache.org/5.0/assets/images/index_1-6ad3f55183d4ed61962359d9408ba192.png
>>>>>>>>
>>>>>>>> 2. How to create the cardinality for each column?
>>>>>>>>    You should check this link: https://kylin.apache.org/5.0/docs/datasource/data_sampling/ .
>>>>>>>>
>>>>>>>> 3. In your default project sample named SSB project, you have only 4 simple aggregate group indexes and no table index, as in the attached file, so what is the best strategy to select indexes for our OLAP?
>>>>>>>>    1. There actually does exist a 'Base Table Index' by default; its id is 20000000001.
>>>>>>>>    2. I think it is a good question, and Kylin 5 lacks such a guide for better modeling. You are free to ask your question on the mailing list and I will try to reply.
>>>>>>>> >>>>>>>> ------------------------ >>>>>>>> With warm regard >>>>>>>> Xiaoxiang Yu >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Wed, Nov 1, 2023 at 2:12 PM Xiaoxiang Yu <x...@apache.org> >>>>>>>> wrote: >>>>>>>> >>>>>>>>> OK, I didn't read all the mail history so I misunderstand the >>>>>>>>> situation. Looks like you need to analyse >>>>>>>>> the cause why the query didn't hit the cube correctly. >>>>>>>>> >>>>>>>>> Please generate query diagnosis package and send it to me >>>>>>>>> privately. I will analyse the query log. >>>>>>>>> You can refer to the following steps in screenshots. >>>>>>>>> >>>>>>>>> [image: image.png] >>>>>>>>> >>>>>>>>> If the screenshots are not displaying correctly, please read this >>>>>>>>> guide : >>>>>>>>> >>>>>>>>> https://kylin.apache.org/5.0/docs/operations/system-operation/diagnosis/#generate-query-diagnosis-package-in-web-ui >>>>>>>>> >>>>>>>>> By the way, you need to analyse the cause by reading >>>>>>>>> kylin.query.log, not the kylin.log, >>>>>>>>> refer to >>>>>>>>> https://kylin.apache.org/5.0/docs/operations/logs/system_log >>>>>>>>> >>>>>>>>> ------------------------ >>>>>>>>> With warm regard >>>>>>>>> Xiaoxiang Yu >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Wed, Nov 1, 2023 at 12:18 PM Nam Đỗ Duy <na...@vnpay.vn> wrote: >>>>>>>>> >>>>>>>>>> Thank you Xiaoxiang for your advice. As my title email shown, I >>>>>>>>>> guessed that the OLAP functionalities has not been correctly set up >>>>>>>>>> in my >>>>>>>>>> computer. 
>>>>>>>>>>
>>>>>>>>>> The evidence for this is that when I disable the Pushdown option box to use only the precomputed cube, it shows the following error. Please kindly advise how to properly build the OLAP.
>>>>>>>>>>
>>>>>>>>>> LIMIT 500": No realization found for OLAPContext, MODEL_UNMATCHED_JOIN, rel#2240:KapTableScan.OLAP.[](table=[VNEVENT_HIVE_DWH_400MILLION_ROWS, FACTUSEREVENT],ctx=0@null,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
>>>>>>>>>>
>>>>>>>>>> On Wed, Nov 1, 2023 at 10:40 AM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Yesterday, I tried to see if the query pushdown functions work well in the Kylin5 docker, and all of my queries returned proper responses.
>>>>>>>>>>> After checking your logs from Shaofeng, I found these error messages repeated many times:
>>>>>>>>>>> 1. 'java.io.IOException: All datanodes DatanodeInfoWithStorage[127.0.0.1:9866,DS-5093899b-06c7-4386-95d5-6fc271d92b52,DISK] are bad. Aborting...'
>>>>>>>>>>> 2. 'curator.ConnectionState : Connection timed out for connection string (localhost:2181) and timeout (15000) / elapsed (41794) org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss'
>>>>>>>>>>>
>>>>>>>>>>> I guess the root cause is that the container did not have enough resources. I found you queried a table called 'XXX_hive_dwh_400million_rows'; it looks like you ran a complex query on a table which contains 400 million rows?
>>>>>>>>>>>
>>>>>>>>>>> Since I am the uploader of the Kylin5 docker image, I want to give some explanation. The Kylin5 docker is not a place for performance benchmarks; it is only for demonstration.
>>>>>>>>>>> It is only allocated very little resources (8G memory) if you are using the default command from the Docker Hub page. Before I uploaded my image, I only tested it using the ssb dataset, in which the biggest table only contains about 60k rows. If you are using a larger dataset and more complex queries, you have to scale the resources properly. By default, try querying tables which contain no more than 100k rows.
>>>>>>>>>>>
>>>>>>>>>>> Here are some tips which may help you to check whether the daemon services are healthy and whether resources (particularly disk space) are configured properly.
>>>>>>>>>>>
>>>>>>>>>>> 1. Check the HDFS web UI ( http://localhost:9870/dfshealth.html#tab-datanode ) to confirm whether the HDFS service is in 'In service' status.
>>>>>>>>>>> 2. Check the Datanode log in `/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log` to see if there is any error message. For example: `cat /opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log | grep ERROR | wc -l`
>>>>>>>>>>> 3. Check that your docker engine is configured with enough disk space. If you are using Docker Desktop like me, please go to "Settings" - "Resources" - "Advanced" and make sure you have allocated 40GB+ disk space to the docker container.
>>>>>>>>>>> 4. Check the available disk space of your container with `df -h`, and make sure the 'Use%' of 'overlay' is less than 60%.
>>>>>>>>>>> 5. Check the load average / CPU usage / JVM GC. Make sure these metrics are not really high when you send a query.
>>>>>>>>>>>
>>>>>>>>>>> ------------------------
>>>>>>>>>>> With warm regard
>>>>>>>>>>> Xiaoxiang Yu
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 31, 2023 at 5:13 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi ShaoFeng
>>>>>>>>>>>>
>>>>>>>>>>>> Thank you very much for your valuable feedback
>>>>>>>>>>>>
>>>>>>>>>>>> I saw the application there (if I see it right), as in the attached photo. Kindly advise so that I can run this query on OLAP.
>>>>>>>>>>>>
>>>>>>>>>>>> PS. I sent you the log file in private.
>>>>>>>>>>>>
>>>>>>>>>>>> [image: image.png]
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 31, 2023 at 3:11 PM ShaoFeng Shi <shaofeng...@apache.org> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Can you provide the messages in logs/kylin.log when executing the SQL? You can also check the Spark UI from the YARN resource manager (there should be one running application called Sparder, which is Kylin's backend Spark application). If the application is not there, it may indicate that YARN doesn't have the resources to start it up.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Shaofeng Shi 史少锋
>>>>>>>>>>>>> Apache Kylin PMC,
>>>>>>>>>>>>> Apache Incubator PMC,
>>>>>>>>>>>>> Email: shaofeng...@apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
>>>>>>>>>>>>> Join Kylin user mail group: user-subscr...@kylin.apache.org
>>>>>>>>>>>>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Oct 31, 2023 at 10:35, Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Dear Sir/Madam,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have a fact table with 500million rows, then I built the model and index according to the website help.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I chose full incremental because this is the first time I load data
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I created both index types, aggregate group index and table index, as in the attached photo.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> But the query always fails after the timeout of 300 seconds (I run in docker). I don't want to increase the value of 300 seconds because I wish the OLAP could run within 1 minute (is that possible?)
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It seems that the OLAP function in indexing is not working to speed up the query with the precomputed cube.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Can you advise how to check whether the index really worked?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is quite an urgent task for me, so a prompt response is highly appreciated.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thank you very much