The query is too fast, less than a second, can you make it a little bit slower so that I can see it clearly 😀😀 [image: image.png]
On Thu, Nov 2, 2023 at 9:32 AM Nam Đỗ Duy <na...@vnpay.vn> wrote: > Thank you Xiaoxiang for the guideline. Will definitely read it carefully. > Kindly help the following questions: > > 1. Computed column > > I created a “computed column” and add it to dimensions (among other > dimensions) > > When I use query to select the computed column it returned error > > 2. Datatype optimization: will you think that the int be better than > string for key join columns? > > Please advise > > > On Wed, 1 Nov 2023 at 17:32 Xiaoxiang Yu <x...@apache.org> wrote: > >> Yes, that is almost correct. >> >> If you have a lot of complex queries, and you want to using Kylin 5 to >> accelerate them, the recommended steps of mine are as follows: >> >> 1. You analyse all queries and collect all join relation/pattern. >> 2. You create Models for each specific join relation/pattern, with the >> join >> relation you find in above step. >> 3. You analyse and collect dimensions and measures from all queries, and >> add them to the corresponding Model. >> 4. You build segments of all Models with proper data range. >> 5. You turned off the pushdown switch, and sent all queries to Kylin. If >> there are some queries which failed, fix them. >> Here are some common situations. >> 5.1 Join relation/pattern is not matched >> 5.2 If the join relation is matched, the Model might not contain every >> column that your query needs, please check kylin.query.log with keyword ' >> unmatched'. >> 6. (Optional) If you find some of your queries do not exactly match with >> your Index(your query on [colA, colB], but your index contains more >> columns >> than colA and colB), you can add some aggregate groups(or smaller Table >> Index) to optimize the query performance. >> >> >> >> ------------------------ >> With warm regard >> Xiaoxiang Yu >> >> >> >> On Wed, Nov 1, 2023 at 5:57 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote: >> >> > Thank you Xiaoxiang, I nearly got to the point. >> > >> > So can I interpret that: 1 model equal (~) to a set of Joins of >> (Dim/Fact) >> > table, that is to say we need to create several models according to >> > multiple kinds of joins queries? >> > >> > Best regards >> > >> > On Wed, Nov 1, 2023 at 4:50 PM Xiaoxiang Yu <x...@apache.org> wrote: >> > >> >> Have you ever tried to analyse the reason why your query can not hit >> >> Model 'sample_ssb'? >> >> It is because the join relation of your query is not suitable for the >> >> join relation/pattern of Model 'sample_ssb'. >> >> >> >> Your query used a join relation/pattern like: A inner join B. >> >> But the Model 'sample_ssb' used a join relation/pattern like : A inner >> >> join B inner join C. >> >> >> >> If you are familiar with the definition of Inner join, you may know >> that >> >> the >> >> relation/pattern 'A inner join B inner join C' will have a chance >> >> to lose some rows when compared to pattern 'A inner join B'. >> >> So the Model 'sample_ssb' will be excluded to serve your query. >> >> >> >> That is to say, you need to create a new model that is similar to Model >> >> 'sample_ssb', >> >> but with additional tables removed. >> >> >> >> >> >> >> >> ------------------------ >> >> With warm regard >> >> Xiaoxiang Yu >> >> >> >> >> >> >> >> On Wed, Nov 1, 2023 at 5:21 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> >> wrote: >> >> >> >>> Hi Xiaoxiang, >> >>> >> >>> Thank you very much >> >>> >> >>> I have clearer picture of Kylin already thanks to your explanation. >> >>> >> >>> Now back to the sample project of SSB in attached photo, when I run >> this >> >>> query with push_down option OFF, why the OLAP error appears, and in >> such >> >>> case, how to create a new cube for this query? >> >>> >> >>> [image: image.png] >> >>> >> >>> On Wed, Nov 1, 2023 at 3:49 PM Xiaoxiang Yu <x...@apache.org> wrote: >> >>> >> >>>> Here is some of my explanation and it may not be perfect. >> >>>> Segment in Kylin is part of model/cube pre-computed data, in most >> >>>> cases, divided by date column. >> >>>> >> >>>> Here is some difference between Segment and Snapshot. >> >>>> Segment, whose source data comes from one fact table joins some >> dimension >> >>>> tables with 'specific date range', is 'precomputed', and will >> accelerate >> >>>> complex query. >> >>>> Snapshot, whose source data comes from one specific dimension table >> without >> >>>> specific date range, is "not precomputed", and can join with segments >> >>>> at runtime . >> >>>> >> >>>> - https://kylin.apache.org/5.0/docs/snapshot/snapshot_management >> >>>> - >> >>>> >> https://kylin.apache.org/5.0/docs/modeling/load_data/segment_operation_settings/intro >> >>>> >> >>>> ------------------------ >> >>>> With warm regard >> >>>> Xiaoxiang Yu >> >>>> >> >>>> >> >>>> >> >>>> On Wed, Nov 1, 2023 at 3:53 PM Nam Đỗ Duy <na...@vnpay.vn> wrote: >> >>>> >> >>>>> Thank you again, very smart of you to automatically select cube for >> a >> >>>>> certain query. Sorry If I ask too much: Is the concept of Segment >> in Kylin >> >>>>> model similar to Slice-and-Dice concept of Cube, what is the >> different >> >>>>> between Kylin Segment and Kylin Snapshot? >> >>>>> >> >>>>> PS. I sent you the log files for your help in investigating why my >> >>>>> cube has not been used. >> >>>>> >> >>>>> On Wed, Nov 1, 2023 at 2:36 PM Xiaoxiang Yu <x...@apache.org> >> wrote: >> >>>>> >> >>>>>> I guess there is a misunderstanding from your sentences. >> >>>>>> >> >>>>>> -- 'I need to select Cube from a combo box below the query window' >> >>>>>> It is not right to use 'need', that combo box is for some specific >> >>>>>> cases(for example, Kylin did not choose a cube which is the most >> >>>>>> efficient), not the most cases. >> >>>>>> In most cases(both for Kylin 4 and Kylin 5), you don't need to >> select >> >>>>>> a Cube in the combo box, Kylin will do the choice for you. >> >>>>>> >> >>>>>> ------------------------ >> >>>>>> With warm regard >> >>>>>> Xiaoxiang Yu >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> On Wed, Nov 1, 2023 at 3:24 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> >> >>>>>> wrote: >> >>>>>> >> >>>>>>> Hi Xiaoxiang, sorry if I made you confused (Anyway, it is just a >> >>>>>>> question of a beginner) >> >>>>>>> >> >>>>>>> "obviously" means "clearly" >> >>>>>>> >> >>>>>>> because I need to select Cube from a combo box below the query >> window >> >>>>>>> >> >>>>>>> Thank you very much >> >>>>>>> >> >>>>>>> On Wed, Nov 1, 2023 at 2:20 PM Xiaoxiang Yu <x...@apache.org> >> wrote: >> >>>>>>> >> >>>>>>>> From my side, I cannot understand why you say Kylin 4 is 'very >> >>>>>>>> obviously'. Can you give an example? >> >>>>>>>> From the source code, the basic logic of choosing the right >> >>>>>>>> cube/model are similar. >> >>>>>>>> ------------------------ >> >>>>>>>> With warm regard >> >>>>>>>> Xiaoxiang Yu >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> On Wed, Nov 1, 2023 at 3:01 PM Nam Đỗ Duy <na...@vnpay.vn> >> wrote: >> >>>>>>>> >> >>>>>>>>> Thank you for your kind reply, please answer 1 more question >> about >> >>>>>>>>> version 5: >> >>>>>>>>> >> >>>>>>>>> In version 4.x we run query against a Cube very obviously, but >> in >> >>>>>>>>> version 5, the cube usage is a implication socan you advise: >> for a given >> >>>>>>>>> query, which model will be used, which index (cube) will be >> used for this >> >>>>>>>>> query? >> >>>>>>>>> >> >>>>>>>>> Thank you >> >>>>>>>>> >> >>>>>>>>> On Wed, Nov 1, 2023 at 1:42 PM Xiaoxiang Yu <x...@apache.org> >> >>>>>>>>> wrote: >> >>>>>>>>> >> >>>>>>>>>> 1. How do I measure the size of the index (cube) in version 5? >> >>>>>>>>>> You can check storage of specific Indexes from the Index >> page. >> >>>>>>>>>> >> >>>>>>>>>> >> https://kylin.apache.org/5.0/docs/modeling/model_design/aggregation_group#view-aggregate-index >> >>>>>>>>>> or >> >>>>>>>>>> >> https://kylin.apache.org/5.0/assets/images/index_1-6ad3f55183d4ed61962359d9408ba192.png >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> 2. How to create the cardinality for each column? >> >>>>>>>>>> You should check this link : >> >>>>>>>>>> https://kylin.apache.org/5.0/docs/datasource/data_sampling/ . >> >>>>>>>>>> >> >>>>>>>>>> 3. In your default project sample named SSB project, you have >> >>>>>>>>>> only 4 simple aggregate group index and no table index as in >> attached file >> >>>>>>>>>> so what is the best strategy to select index for our OLAP? >> >>>>>>>>>> 1. There does exist a 'Base Table Index' by default >> >>>>>>>>>> actually, its id is 20000000001. >> >>>>>>>>>> 2. I think it is a good question and Kylin 5 lacks such a >> >>>>>>>>>> guide for better modeling. You are free to ask your question to >> >>>>>>>>>> mailing list and I will try to reply. >> >>>>>>>>>> >> >>>>>>>>>> ------------------------ >> >>>>>>>>>> With warm regard >> >>>>>>>>>> Xiaoxiang Yu >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> On Wed, Nov 1, 2023 at 2:12 PM Xiaoxiang Yu <x...@apache.org> >> >>>>>>>>>> wrote: >> >>>>>>>>>> >> >>>>>>>>>>> OK, I didn't read all the mail history so I misunderstand the >> >>>>>>>>>>> situation. Looks like you need to analyse >> >>>>>>>>>>> the cause why the query didn't hit the cube correctly. >> >>>>>>>>>>> >> >>>>>>>>>>> Please generate query diagnosis package and send it to me >> >>>>>>>>>>> privately. I will analyse the query log. >> >>>>>>>>>>> You can refer to the following steps in screenshots. >> >>>>>>>>>>> >> >>>>>>>>>>> [image: image.png] >> >>>>>>>>>>> >> >>>>>>>>>>> If the screenshots are not displaying correctly, please read >> >>>>>>>>>>> this guide : >> >>>>>>>>>>> >> >>>>>>>>>>> >> https://kylin.apache.org/5.0/docs/operations/system-operation/diagnosis/#generate-query-diagnosis-package-in-web-ui >> >>>>>>>>>>> >> >>>>>>>>>>> By the way, you need to analyse the cause by reading >> >>>>>>>>>>> kylin.query.log, not the kylin.log, >> >>>>>>>>>>> refer to >> >>>>>>>>>>> https://kylin.apache.org/5.0/docs/operations/logs/system_log >> >>>>>>>>>>> >> >>>>>>>>>>> ------------------------ >> >>>>>>>>>>> With warm regard >> >>>>>>>>>>> Xiaoxiang Yu >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> On Wed, Nov 1, 2023 at 12:18 PM Nam Đỗ Duy <na...@vnpay.vn> >> >>>>>>>>>>> wrote: >> >>>>>>>>>>> >> >>>>>>>>>>>> Thank you Xiaoxiang for your advice. As my title email >> shown, I >> >>>>>>>>>>>> guessed that the OLAP functionalities has not been correctly >> set up in my >> >>>>>>>>>>>> computer. >> >>>>>>>>>>>> >> >>>>>>>>>>>> The evidence about it is that: when I disable the Pushdown >> >>>>>>>>>>>> option box to use solely the precomputation cube only, it >> showed following >> >>>>>>>>>>>> error: Please kindly advise how to properly build the OLAP >> >>>>>>>>>>>> >> >>>>>>>>>>>> LIMIT 500": No realization found for OLAPContext, >> MODEL_UNMATCHED_JOIN, >> rel#2240:KapTableScan.OLAP.[](table=[VNEVENT_HIVE_DWH_400MILLION_ROWS, >> FACTUSEREVENT],ctx=0@null,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, >> 12, 13, 14, 15, 16, 17, 18, 19, 20]) >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> >> >>>>>>>>>>>> On Wed, Nov 1, 2023 at 10:40 AM Xiaoxiang Yu < >> x...@apache.org> >> >>>>>>>>>>>> wrote: >> >>>>>>>>>>>> >> >>>>>>>>>>>>> Hi, >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Yesterday, I tried to see if query pushdown functions >> work >> >>>>>>>>>>>>> well in the Kylin5 docker, and all of my queries return >> proper responses . >> >>>>>>>>>>>>> After checking your logs from Shaofeng, I found these >> >>>>>>>>>>>>> error messages repeated many times: >> >>>>>>>>>>>>> 1. 'java.io.IOException: All datanodes >> >>>>>>>>>>>>> DatanodeInfoWithStorage[127.0.0.1:9866 >> ,DS-5093899b-06c7-4386-95d5-6fc271d92b52,DISK] >> >>>>>>>>>>>>> are bad. Aborting...' >> >>>>>>>>>>>>> 2. 'curator.ConnectionState : Connection timed out for >> >>>>>>>>>>>>> connection string (localhost:2181) and timeout (15000) / >> elapsed (41794) >> >>>>>>>>>>>>> org.apache.curator.CuratorConnectionLossException: >> >>>>>>>>>>>>> KeeperErrorCode = ConnectionLoss' >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> I guess the root cause is that the container didn't not >> >>>>>>>>>>>>> have enough resources. I found you query on a table called >> >>>>>>>>>>>>> 'XXX_hive_dwh_400million_rows', looks like you gave a >> complex query on a >> >>>>>>>>>>>>> table which contains 400 million rows? >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Since I am the uploader of kylin5 's docker image, I >> want >> >>>>>>>>>>>>> to give some explainment. Kylin5 docker is not a place for >> performance >> >>>>>>>>>>>>> benchmarks, it is only for demonstration. It is only >> allocated with very >> >>>>>>>>>>>>> little resources(8G memory) if you are using the default >> command from >> >>>>>>>>>>>>> docker hub page. Before I uploaded my image, I only tested >> my image using >> >>>>>>>>>>>>> the ssb dataset, which the biggest table only contains >> about 60k rows. If >> >>>>>>>>>>>>> you are using a larger dataset and complexer queries, you >> have to scale the >> >>>>>>>>>>>>> resource properly. Try querying tables which contain not >> more than 100k >> >>>>>>>>>>>>> rows by default. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> Here are some tips which may help you to check if the >> >>>>>>>>>>>>> daemon service is in health status and >> resources(particularly disk space) >> >>>>>>>>>>>>> is configured properly. >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> 1. Checking HDFS 's web ui( >> >>>>>>>>>>>>> http://localhost:9870/dfshealth.html#tab-datanode ) to >> >>>>>>>>>>>>> confirm whether HDFS service is in 'In service' status. >> >>>>>>>>>>>>> 2. Checking Datanode 's log in >> >>>>>>>>>>>>> >> `/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log`, check if >> >>>>>>>>>>>>> there is any error message. Like: cat >> >>>>>>>>>>>>> >> /opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log | grep ERROR >> >>>>>>>>>>>>> | wc -l >> >>>>>>>>>>>>> 3. Checking if your docker engine is configured with >> >>>>>>>>>>>>> enough disk space, if you are using Docker Desktop like >> me,please go to >> >>>>>>>>>>>>> "Settings" - "Resources" - "Advanced", make sure you have >> allocated 40GB+ >> >>>>>>>>>>>>> disk space to the docker container. >> >>>>>>>>>>>>> 4. Checking the available disk space of your container >> by >> >>>>>>>>>>>>> `df -h`, make sure the 'Use%' of 'overlay' is less than 60% >> . >> >>>>>>>>>>>>> 5. Checking the load average/ cpu usage/ jvm gc. Make >> sure >> >>>>>>>>>>>>> these metrics are not really high when you send a query. >> >>>>>>>>>>>>> ------------------------ >> >>>>>>>>>>>>> With warm regard >> >>>>>>>>>>>>> Xiaoxiang Yu >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> >> >>>>>>>>>>>>> On Tue, Oct 31, 2023 at 5:13 PM Nam Đỗ Duy >> >>>>>>>>>>>>> <na...@vnpay.vn.invalid> wrote: >> >>>>>>>>>>>>> >> >>>>>>>>>>>>>> Hi ShaoFeng >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> Thank you very much for your valuable feedback >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> I saw the application to be there (if I see it right) as in >> >>>>>>>>>>>>>> the attachment photo. Kindly advise so that I can run this >> query on OLAP. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> PS. I sent you the log file in private. >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> [image: image.png] >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>> On Tue, Oct 31, 2023 at 3:11 PM ShaoFeng Shi < >> >>>>>>>>>>>>>> shaofeng...@apache.org> wrote: >> >>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Can you provide the messages in logs/kylin.log when >> >>>>>>>>>>>>>>> executing the SQL? and you can also check the Spark UI >> from yarn resource >> >>>>>>>>>>>>>>> manager (there should be one running application called >> Spardar, which is >> >>>>>>>>>>>>>>> Kylin's backend spark application). If the application is >> not there, it may >> >>>>>>>>>>>>>>> indicates the yarn doesn't have resource to startup it. >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Best regards, >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Shaofeng Shi 史少锋 >> >>>>>>>>>>>>>>> Apache Kylin PMC, >> >>>>>>>>>>>>>>> Apache Incubator PMC, >> >>>>>>>>>>>>>>> Email: shaofeng...@apache.org >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Apache Kylin FAQ: >> >>>>>>>>>>>>>>> https://kylin.apache.org/docs/gettingstarted/faq.html >> >>>>>>>>>>>>>>> Join Kylin user mail group: >> user-subscr...@kylin.apache.org >> >>>>>>>>>>>>>>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> Nam Đỗ Duy <na...@vnpay.vn> 于2023年10月31日周二 10:35写道: >> >>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> Dear Sir/Madam, >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> I have a fact with 500million rows then I build model, >> >>>>>>>>>>>>>>>> index according to the website help. >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> I chose full incremental because this is the first times >> I >> >>>>>>>>>>>>>>>> load data >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> I create both index types Aggregate group index, table >> >>>>>>>>>>>>>>>> index as photo attached. >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> But the query always failed after timeout of 300 seconds >> (I >> >>>>>>>>>>>>>>>> run in docker), I dont want to increase the value of 300 >> seconds because I >> >>>>>>>>>>>>>>>> wish the OLAP can run within 1 minutes (is that >> possible?) >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> It seems that the OLAP function in indexing not working >> to >> >>>>>>>>>>>>>>>> speedup the query by precomputed cube. >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> Can you advise to check whether the index did really >> work? >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> It is quite urgent task for me so prompt response is >> highly >> >>>>>>>>>>>>>>>> appreciated. >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>>> Thank you very much >> >>>>>>>>>>>>>>>> >> >>>>>>>>>>>>>>> >> >