Hi Xiaoxiang, I would like to provide information about computed column error in attachment file. Kindly help so that I can query computed_column.
Thank you On Thu, Nov 2, 2023 at 9:58 AM Xiaoxiang Yu <x...@apache.org> wrote: > Hi, for the first question,you don't provide any detail for analysis, > please send me your query diagnostic package which includes your metadata, > query, and logs. > > For the second question, I am not sure at the moment. > > > ------------------------ > With warm regard > Xiaoxiang Yu > > > > On Thu, Nov 2, 2023 at 10:33 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote: > >> Thank you Xiaoxiang for the guideline. Will definitely read it carefully. >> Kindly help the following questions: >> >> 1. Computed column >> >> I created a “computed column” and add it to dimensions (among other >> dimensions) >> >> When I use query to select the computed column it returned error >> >> 2. Datatype optimization: will you think that the int be better than >> string >> for key join columns? >> >> Please advise >> >> >> On Wed, 1 Nov 2023 at 17:32 Xiaoxiang Yu <x...@apache.org> wrote: >> >> > Yes, that is almost correct. >> > >> > If you have a lot of complex queries, and you want to using Kylin 5 to >> > accelerate them, the recommended steps of mine are as follows: >> > >> > 1. You analyse all queries and collect all join relation/pattern. >> > 2. You create Models for each specific join relation/pattern, with the >> join >> > relation you find in above step. >> > 3. You analyse and collect dimensions and measures from all queries, and >> > add them to the corresponding Model. >> > 4. You build segments of all Models with proper data range. >> > 5. You turned off the pushdown switch, and sent all queries to Kylin. If >> > there are some queries which failed, fix them. >> > Here are some common situations. >> > 5.1 Join relation/pattern is not matched >> > 5.2 If the join relation is matched, the Model might not contain >> every >> > column that your query needs, please check kylin.query.log with keyword >> ' >> > unmatched'. >> > 6. (Optional) If you find some of your queries do not exactly match with >> > your Index(your query on [colA, colB], but your index contains more >> columns >> > than colA and colB), you can add some aggregate groups(or smaller Table >> > Index) to optimize the query performance. >> > >> > >> > >> > ------------------------ >> > With warm regard >> > Xiaoxiang Yu >> > >> > >> > >> > On Wed, Nov 1, 2023 at 5:57 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> >> wrote: >> > >> > > Thank you Xiaoxiang, I nearly got to the point. >> > > >> > > So can I interpret that: 1 model equal (~) to a set of Joins of >> > (Dim/Fact) >> > > table, that is to say we need to create several models according to >> > > multiple kinds of joins queries? >> > > >> > > Best regards >> > > >> > > On Wed, Nov 1, 2023 at 4:50 PM Xiaoxiang Yu <x...@apache.org> wrote: >> > > >> > >> Have you ever tried to analyse the reason why your query can not hit >> > >> Model 'sample_ssb'? >> > >> It is because the join relation of your query is not suitable for the >> > >> join relation/pattern of Model 'sample_ssb'. >> > >> >> > >> Your query used a join relation/pattern like: A inner join B. >> > >> But the Model 'sample_ssb' used a join relation/pattern like : A >> inner >> > >> join B inner join C. >> > >> >> > >> If you are familiar with the definition of Inner join, you may know >> that >> > >> the >> > >> relation/pattern 'A inner join B inner join C' will have a chance >> > >> to lose some rows when compared to pattern 'A inner join B'. >> > >> So the Model 'sample_ssb' will be excluded to serve your query. >> > >> >> > >> That is to say, you need to create a new model that is similar to >> Model >> > >> 'sample_ssb', >> > >> but with additional tables removed. >> > >> >> > >> >> > >> >> > >> ------------------------ >> > >> With warm regard >> > >> Xiaoxiang Yu >> > >> >> > >> >> > >> >> > >> On Wed, Nov 1, 2023 at 5:21 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> >> > wrote: >> > >> >> > >>> Hi Xiaoxiang, >> > >>> >> > >>> Thank you very much >> > >>> >> > >>> I have clearer picture of Kylin already thanks to your explanation. >> > >>> >> > >>> Now back to the sample project of SSB in attached photo, when I run >> > this >> > >>> query with push_down option OFF, why the OLAP error appears, and in >> > such >> > >>> case, how to create a new cube for this query? >> > >>> >> > >>> [image: image.png] >> > >>> >> > >>> On Wed, Nov 1, 2023 at 3:49 PM Xiaoxiang Yu <x...@apache.org> >> wrote: >> > >>> >> > >>>> Here is some of my explanation and it may not be perfect. >> > >>>> Segment in Kylin is part of model/cube pre-computed data, in most >> > >>>> cases, divided by date column. >> > >>>> >> > >>>> Here is some difference between Segment and Snapshot. >> > >>>> Segment, whose source data comes from one fact table joins some >> > dimension >> > >>>> tables with 'specific date range', is 'precomputed', and will >> > accelerate >> > >>>> complex query. >> > >>>> Snapshot, whose source data comes from one specific dimension table >> > without >> > >>>> specific date range, is "not precomputed", and can join with >> segments >> > >>>> at runtime . >> > >>>> >> > >>>> - https://kylin.apache.org/5.0/docs/snapshot/snapshot_management >> > >>>> - >> > >>>> >> > >> https://kylin.apache.org/5.0/docs/modeling/load_data/segment_operation_settings/intro >> > >>>> >> > >>>> ------------------------ >> > >>>> With warm regard >> > >>>> Xiaoxiang Yu >> > >>>> >> > >>>> >> > >>>> >> > >>>> On Wed, Nov 1, 2023 at 3:53 PM Nam Đỗ Duy <na...@vnpay.vn> wrote: >> > >>>> >> > >>>>> Thank you again, very smart of you to automatically select cube >> for a >> > >>>>> certain query. Sorry If I ask too much: Is the concept of Segment >> in >> > Kylin >> > >>>>> model similar to Slice-and-Dice concept of Cube, what is the >> > different >> > >>>>> between Kylin Segment and Kylin Snapshot? >> > >>>>> >> > >>>>> PS. I sent you the log files for your help in investigating why my >> > >>>>> cube has not been used. >> > >>>>> >> > >>>>> On Wed, Nov 1, 2023 at 2:36 PM Xiaoxiang Yu <x...@apache.org> >> wrote: >> > >>>>> >> > >>>>>> I guess there is a misunderstanding from your sentences. >> > >>>>>> >> > >>>>>> -- 'I need to select Cube from a combo box below the query >> window' >> > >>>>>> It is not right to use 'need', that combo box is for some >> specific >> > >>>>>> cases(for example, Kylin did not choose a cube which is the most >> > >>>>>> efficient), not the most cases. >> > >>>>>> In most cases(both for Kylin 4 and Kylin 5), you don't need to >> > select >> > >>>>>> a Cube in the combo box, Kylin will do the choice for you. >> > >>>>>> >> > >>>>>> ------------------------ >> > >>>>>> With warm regard >> > >>>>>> Xiaoxiang Yu >> > >>>>>> >> > >>>>>> >> > >>>>>> >> > >>>>>> On Wed, Nov 1, 2023 at 3:24 PM Nam Đỗ Duy <na...@vnpay.vn.invalid >> > >> > >>>>>> wrote: >> > >>>>>> >> > >>>>>>> Hi Xiaoxiang, sorry if I made you confused (Anyway, it is just a >> > >>>>>>> question of a beginner) >> > >>>>>>> >> > >>>>>>> "obviously" means "clearly" >> > >>>>>>> >> > >>>>>>> because I need to select Cube from a combo box below the query >> > window >> > >>>>>>> >> > >>>>>>> Thank you very much >> > >>>>>>> >> > >>>>>>> On Wed, Nov 1, 2023 at 2:20 PM Xiaoxiang Yu <x...@apache.org> >> > wrote: >> > >>>>>>> >> > >>>>>>>> From my side, I cannot understand why you say Kylin 4 is 'very >> > >>>>>>>> obviously'. Can you give an example? >> > >>>>>>>> From the source code, the basic logic of choosing the right >> > >>>>>>>> cube/model are similar. >> > >>>>>>>> ------------------------ >> > >>>>>>>> With warm regard >> > >>>>>>>> Xiaoxiang Yu >> > >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> >> > >>>>>>>> On Wed, Nov 1, 2023 at 3:01 PM Nam Đỗ Duy <na...@vnpay.vn> >> wrote: >> > >>>>>>>> >> > >>>>>>>>> Thank you for your kind reply, please answer 1 more question >> > about >> > >>>>>>>>> version 5: >> > >>>>>>>>> >> > >>>>>>>>> In version 4.x we run query against a Cube very obviously, >> but in >> > >>>>>>>>> version 5, the cube usage is a implication socan you advise: >> for >> > a given >> > >>>>>>>>> query, which model will be used, which index (cube) will be >> used >> > for this >> > >>>>>>>>> query? >> > >>>>>>>>> >> > >>>>>>>>> Thank you >> > >>>>>>>>> >> > >>>>>>>>> On Wed, Nov 1, 2023 at 1:42 PM Xiaoxiang Yu <x...@apache.org> >> > >>>>>>>>> wrote: >> > >>>>>>>>> >> > >>>>>>>>>> 1. How do I measure the size of the index (cube) in version >> 5? >> > >>>>>>>>>> You can check storage of specific Indexes from the Index >> > page. >> > >>>>>>>>>> >> > >>>>>>>>>> >> > >> https://kylin.apache.org/5.0/docs/modeling/model_design/aggregation_group#view-aggregate-index >> > >>>>>>>>>> or >> > >>>>>>>>>> >> > >> https://kylin.apache.org/5.0/assets/images/index_1-6ad3f55183d4ed61962359d9408ba192.png >> > >>>>>>>>>> >> > >>>>>>>>>> >> > >>>>>>>>>> 2. How to create the cardinality for each column? >> > >>>>>>>>>> You should check this link : >> > >>>>>>>>>> https://kylin.apache.org/5.0/docs/datasource/data_sampling/ >> . >> > >>>>>>>>>> >> > >>>>>>>>>> 3. In your default project sample named SSB project, you have >> > >>>>>>>>>> only 4 simple aggregate group index and no table index as in >> > attached file >> > >>>>>>>>>> so what is the best strategy to select index for our OLAP? >> > >>>>>>>>>> 1. There does exist a 'Base Table Index' by default >> > >>>>>>>>>> actually, its id is 20000000001. >> > >>>>>>>>>> 2. I think it is a good question and Kylin 5 lacks such a >> > >>>>>>>>>> guide for better modeling. You are free to ask your question >> to >> > >>>>>>>>>> mailing list and I will try to reply. >> > >>>>>>>>>> >> > >>>>>>>>>> ------------------------ >> > >>>>>>>>>> With warm regard >> > >>>>>>>>>> Xiaoxiang Yu >> > >>>>>>>>>> >> > >>>>>>>>>> >> > >>>>>>>>>> >> > >>>>>>>>>> On Wed, Nov 1, 2023 at 2:12 PM Xiaoxiang Yu <x...@apache.org >> > >> > >>>>>>>>>> wrote: >> > >>>>>>>>>> >> > >>>>>>>>>>> OK, I didn't read all the mail history so I misunderstand >> the >> > >>>>>>>>>>> situation. Looks like you need to analyse >> > >>>>>>>>>>> the cause why the query didn't hit the cube correctly. >> > >>>>>>>>>>> >> > >>>>>>>>>>> Please generate query diagnosis package and send it to me >> > >>>>>>>>>>> privately. I will analyse the query log. >> > >>>>>>>>>>> You can refer to the following steps in screenshots. >> > >>>>>>>>>>> >> > >>>>>>>>>>> [image: image.png] >> > >>>>>>>>>>> >> > >>>>>>>>>>> If the screenshots are not displaying correctly, please read >> > >>>>>>>>>>> this guide : >> > >>>>>>>>>>> >> > >>>>>>>>>>> >> > >> https://kylin.apache.org/5.0/docs/operations/system-operation/diagnosis/#generate-query-diagnosis-package-in-web-ui >> > >>>>>>>>>>> >> > >>>>>>>>>>> By the way, you need to analyse the cause by reading >> > >>>>>>>>>>> kylin.query.log, not the kylin.log, >> > >>>>>>>>>>> refer to >> > >>>>>>>>>>> >> https://kylin.apache.org/5.0/docs/operations/logs/system_log >> > >>>>>>>>>>> >> > >>>>>>>>>>> ------------------------ >> > >>>>>>>>>>> With warm regard >> > >>>>>>>>>>> Xiaoxiang Yu >> > >>>>>>>>>>> >> > >>>>>>>>>>> >> > >>>>>>>>>>> >> > >>>>>>>>>>> On Wed, Nov 1, 2023 at 12:18 PM Nam Đỗ Duy <na...@vnpay.vn> >> > >>>>>>>>>>> wrote: >> > >>>>>>>>>>> >> > >>>>>>>>>>>> Thank you Xiaoxiang for your advice. As my title email >> shown, >> > I >> > >>>>>>>>>>>> guessed that the OLAP functionalities has not been >> correctly >> > set up in my >> > >>>>>>>>>>>> computer. >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> The evidence about it is that: when I disable the Pushdown >> > >>>>>>>>>>>> option box to use solely the precomputation cube only, it >> > showed following >> > >>>>>>>>>>>> error: Please kindly advise how to properly build the OLAP >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> LIMIT 500": No realization found for OLAPContext, >> > MODEL_UNMATCHED_JOIN, >> > rel#2240:KapTableScan.OLAP.[](table=[VNEVENT_HIVE_DWH_400MILLION_ROWS, >> > FACTUSEREVENT],ctx=0@null,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, >> > 12, 13, 14, 15, 16, 17, 18, 19, 20]) >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> >> > >>>>>>>>>>>> On Wed, Nov 1, 2023 at 10:40 AM Xiaoxiang Yu < >> x...@apache.org >> > > >> > >>>>>>>>>>>> wrote: >> > >>>>>>>>>>>> >> > >>>>>>>>>>>>> Hi, >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> Yesterday, I tried to see if query pushdown functions >> > work >> > >>>>>>>>>>>>> well in the Kylin5 docker, and all of my queries return >> > proper responses . >> > >>>>>>>>>>>>> After checking your logs from Shaofeng, I found these >> > >>>>>>>>>>>>> error messages repeated many times: >> > >>>>>>>>>>>>> 1. 'java.io.IOException: All datanodes >> > >>>>>>>>>>>>> DatanodeInfoWithStorage[127.0.0.1:9866 >> > ,DS-5093899b-06c7-4386-95d5-6fc271d92b52,DISK] >> > >>>>>>>>>>>>> are bad. Aborting...' >> > >>>>>>>>>>>>> 2. 'curator.ConnectionState : Connection timed out for >> > >>>>>>>>>>>>> connection string (localhost:2181) and timeout (15000) / >> > elapsed (41794) >> > >>>>>>>>>>>>> org.apache.curator.CuratorConnectionLossException: >> > >>>>>>>>>>>>> KeeperErrorCode = ConnectionLoss' >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> I guess the root cause is that the container didn't >> not >> > >>>>>>>>>>>>> have enough resources. I found you query on a table called >> > >>>>>>>>>>>>> 'XXX_hive_dwh_400million_rows', looks like you gave a >> > complex query on a >> > >>>>>>>>>>>>> table which contains 400 million rows? >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> Since I am the uploader of kylin5 's docker image, I >> want >> > >>>>>>>>>>>>> to give some explainment. Kylin5 docker is not a place for >> > performance >> > >>>>>>>>>>>>> benchmarks, it is only for demonstration. It is only >> > allocated with very >> > >>>>>>>>>>>>> little resources(8G memory) if you are using the default >> > command from >> > >>>>>>>>>>>>> docker hub page. Before I uploaded my image, I only tested >> > my image using >> > >>>>>>>>>>>>> the ssb dataset, which the biggest table only contains >> about >> > 60k rows. If >> > >>>>>>>>>>>>> you are using a larger dataset and complexer queries, you >> > have to scale the >> > >>>>>>>>>>>>> resource properly. Try querying tables which contain not >> > more than 100k >> > >>>>>>>>>>>>> rows by default. >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> Here are some tips which may help you to check if the >> > >>>>>>>>>>>>> daemon service is in health status and >> > resources(particularly disk space) >> > >>>>>>>>>>>>> is configured properly. >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> 1. Checking HDFS 's web ui( >> > >>>>>>>>>>>>> http://localhost:9870/dfshealth.html#tab-datanode ) to >> > >>>>>>>>>>>>> confirm whether HDFS service is in 'In service' status. >> > >>>>>>>>>>>>> 2. Checking Datanode 's log in >> > >>>>>>>>>>>>> >> > `/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log`, check >> if >> > >>>>>>>>>>>>> there is any error message. Like: cat >> > >>>>>>>>>>>>> >> > /opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log | grep >> ERROR >> > >>>>>>>>>>>>> | wc -l >> > >>>>>>>>>>>>> 3. Checking if your docker engine is configured with >> > >>>>>>>>>>>>> enough disk space, if you are using Docker Desktop like >> > me,please go to >> > >>>>>>>>>>>>> "Settings" - "Resources" - "Advanced", make sure you have >> > allocated 40GB+ >> > >>>>>>>>>>>>> disk space to the docker container. >> > >>>>>>>>>>>>> 4. Checking the available disk space of your >> container by >> > >>>>>>>>>>>>> `df -h`, make sure the 'Use%' of 'overlay' is less than >> 60% . >> > >>>>>>>>>>>>> 5. Checking the load average/ cpu usage/ jvm gc. Make >> > sure >> > >>>>>>>>>>>>> these metrics are not really high when you send a query. >> > >>>>>>>>>>>>> ------------------------ >> > >>>>>>>>>>>>> With warm regard >> > >>>>>>>>>>>>> Xiaoxiang Yu >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>> On Tue, Oct 31, 2023 at 5:13 PM Nam Đỗ Duy >> > >>>>>>>>>>>>> <na...@vnpay.vn.invalid> wrote: >> > >>>>>>>>>>>>> >> > >>>>>>>>>>>>>> Hi ShaoFeng >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> Thank you very much for your valuable feedback >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> I saw the application to be there (if I see it right) as >> in >> > >>>>>>>>>>>>>> the attachment photo. Kindly advise so that I can run >> this >> > query on OLAP. >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> PS. I sent you the log file in private. >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> [image: image.png] >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>> On Tue, Oct 31, 2023 at 3:11 PM ShaoFeng Shi < >> > >>>>>>>>>>>>>> shaofeng...@apache.org> wrote: >> > >>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> Can you provide the messages in logs/kylin.log when >> > >>>>>>>>>>>>>>> executing the SQL? and you can also check the Spark UI >> > from yarn resource >> > >>>>>>>>>>>>>>> manager (there should be one running application called >> > Spardar, which is >> > >>>>>>>>>>>>>>> Kylin's backend spark application). If the application >> is >> > not there, it may >> > >>>>>>>>>>>>>>> indicates the yarn doesn't have resource to startup it. >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> Best regards, >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> Shaofeng Shi 史少锋 >> > >>>>>>>>>>>>>>> Apache Kylin PMC, >> > >>>>>>>>>>>>>>> Apache Incubator PMC, >> > >>>>>>>>>>>>>>> Email: shaofeng...@apache.org >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> Apache Kylin FAQ: >> > >>>>>>>>>>>>>>> https://kylin.apache.org/docs/gettingstarted/faq.html >> > >>>>>>>>>>>>>>> Join Kylin user mail group: >> > user-subscr...@kylin.apache.org >> > >>>>>>>>>>>>>>> Join Kylin dev mail group: >> dev-subscr...@kylin.apache.org >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> Nam Đỗ Duy <na...@vnpay.vn> 于2023年10月31日周二 10:35写道: >> > >>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> Dear Sir/Madam, >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> I have a fact with 500million rows then I build model, >> > >>>>>>>>>>>>>>>> index according to the website help. >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> I chose full incremental because this is the first >> times I >> > >>>>>>>>>>>>>>>> load data >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> I create both index types Aggregate group index, table >> > >>>>>>>>>>>>>>>> index as photo attached. >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> But the query always failed after timeout of 300 >> seconds >> > (I >> > >>>>>>>>>>>>>>>> run in docker), I dont want to increase the value of >> 300 >> > seconds because I >> > >>>>>>>>>>>>>>>> wish the OLAP can run within 1 minutes (is that >> possible?) >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> It seems that the OLAP function in indexing not >> working to >> > >>>>>>>>>>>>>>>> speedup the query by precomputed cube. >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> Can you advise to check whether the index did really >> work? >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> It is quite urgent task for me so prompt response is >> > highly >> > >>>>>>>>>>>>>>>> appreciated. >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>>> Thank you very much >> > >>>>>>>>>>>>>>>> >> > >>>>>>>>>>>>>>> >> > >> >