Hi, for the first question, you did not provide enough detail for analysis. Please send me your query diagnostic package, which includes your metadata, query, and logs.
For the second question, I am not sure at the moment.

------------------------
With warm regard
Xiaoxiang Yu

On Thu, Nov 2, 2023 at 10:33 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you Xiaoxiang for the guideline. I will definitely read it carefully.
> Kindly help with the following questions:
>
> 1. Computed column
>
> I created a "computed column" and added it to the dimensions (among other
> dimensions).
>
> When I ran a query selecting the computed column, it returned an error.
>
> 2. Datatype optimization: do you think int would be better than string
> for join key columns?
>
> Please advise
>
>
> On Wed, 1 Nov 2023 at 17:32 Xiaoxiang Yu <x...@apache.org> wrote:
>
> > Yes, that is almost correct.
> >
> > If you have a lot of complex queries and you want to use Kylin 5 to
> > accelerate them, the steps I recommend are as follows:
> >
> > 1. Analyse all queries and collect all join relations/patterns.
> > 2. Create a Model for each specific join relation/pattern, using the
> > join relation you found in the step above.
> > 3. Analyse and collect dimensions and measures from all queries, and
> > add them to the corresponding Model.
> > 4. Build segments of all Models with a proper data range.
> > 5. Turn off the pushdown switch and send all queries to Kylin. If
> > some queries fail, fix them.
> > Here are some common situations.
> > 5.1 The join relation/pattern is not matched.
> > 5.2 If the join relation is matched, the Model might not contain every
> > column that your query needs; please check kylin.query.log for the
> > keyword 'unmatched'.
> > 6. (Optional) If you find that some of your queries do not exactly match
> > your Index (your query is on [colA, colB], but your index contains more
> > columns than colA and colB), you can add some aggregate groups (or a
> > smaller Table Index) to optimize the query performance.
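
Situation 5.1 in the steps above (join relation/pattern not matched) can be illustrated with a small Python sketch. This is only an illustration, not Kylin's matching code: the tables A, B and C, their rows, and the `inner_join` helper are all made up for this example. It shows that data precomputed as 'A inner join B inner join C' may contain fewer rows than a query on 'A inner join B' expects, which is why such a model cannot be used to answer that query.

```python
# Toy tables with made-up data; each row is a dict.
A = [{"a_id": 1, "b_id": 10}, {"a_id": 2, "b_id": 20}]
B = [{"b_id": 10, "c_id": 100}, {"b_id": 20, "c_id": 999}]
C = [{"c_id": 100}]  # there is no row with c_id 999


def inner_join(left, right, key):
    """Hypothetical helper: naive inner join of two lists of dicts on `key`."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# Rows the query needs: A inner join B.
query_rows = inner_join(A, B, "b_id")

# Rows a wider model would hold: A inner join B inner join C.
model_rows = inner_join(query_rows, C, "c_id")

# The wider join drops the row whose c_id has no match in C.
print(len(query_rows), len(model_rows))
```

Because the extra inner join can silently drop rows, answering the query from the wider model could return wrong results, so a separate model with only the tables the query joins is needed.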
> >
> > ------------------------
> > With warm regard
> > Xiaoxiang Yu
> >
> > On Wed, Nov 1, 2023 at 5:57 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> >
> > > Thank you Xiaoxiang, I have nearly got the point.
> > >
> > > So can I interpret it this way: one model roughly corresponds to one set
> > > of joins over the (Dim/Fact) tables, that is to say, we need to create
> > > several models according to the different kinds of join queries?
> > >
> > > Best regards
> > >
> > > On Wed, Nov 1, 2023 at 4:50 PM Xiaoxiang Yu <x...@apache.org> wrote:
> > >
> > >> Have you tried to analyse why your query cannot hit
> > >> Model 'sample_ssb'?
> > >> It is because the join relation of your query does not match the
> > >> join relation/pattern of Model 'sample_ssb'.
> > >>
> > >> Your query used a join relation/pattern like: A inner join B.
> > >> But the Model 'sample_ssb' used a join relation/pattern like: A inner
> > >> join B inner join C.
> > >>
> > >> If you are familiar with the definition of inner join, you may know that
> > >> the relation/pattern 'A inner join B inner join C' can
> > >> lose some rows compared to the pattern 'A inner join B'.
> > >> So the Model 'sample_ssb' is excluded from serving your query.
> > >>
> > >> That is to say, you need to create a new model that is similar to Model
> > >> 'sample_ssb', but with the additional tables removed.
> > >>
> > >> ------------------------
> > >> With warm regard
> > >> Xiaoxiang Yu
> > >>
> > >> On Wed, Nov 1, 2023 at 5:21 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > >>
> > >>> Hi Xiaoxiang,
> > >>>
> > >>> Thank you very much
> > >>>
> > >>> I already have a clearer picture of Kylin thanks to your explanation.
> > >>>
> > >>> Now, back to the SSB sample project in the attached photo: when I run
> > >>> this query with the push_down option OFF, why does the OLAP error appear,
> > >>> and in that case, how do I create a new cube for this query?
> > >>>
> > >>> [image: image.png]
> > >>>
> > >>> On Wed, Nov 1, 2023 at 3:49 PM Xiaoxiang Yu <x...@apache.org> wrote:
> > >>>
> > >>>> Here is my explanation; it may not be perfect.
> > >>>> A Segment in Kylin is a part of a model's/cube's precomputed data, in
> > >>>> most cases divided by a date column.
> > >>>>
> > >>>> Here are some differences between Segment and Snapshot.
> > >>>> A Segment, whose source data comes from one fact table joined with some
> > >>>> dimension tables over a 'specific date range', is 'precomputed' and will
> > >>>> accelerate complex queries.
> > >>>> A Snapshot, whose source data comes from one specific dimension table
> > >>>> without a specific date range, is 'not precomputed' and can join with
> > >>>> segments at runtime.
> > >>>>
> > >>>> - https://kylin.apache.org/5.0/docs/snapshot/snapshot_management
> > >>>> - https://kylin.apache.org/5.0/docs/modeling/load_data/segment_operation_settings/intro
> > >>>>
> > >>>> ------------------------
> > >>>> With warm regard
> > >>>> Xiaoxiang Yu
> > >>>>
> > >>>> On Wed, Nov 1, 2023 at 3:53 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > >>>>
> > >>>>> Thank you again, very smart of you to automatically select the cube for
> > >>>>> a certain query. Sorry if I ask too much: is the concept of Segment in a
> > >>>>> Kylin model similar to the Slice-and-Dice concept of a Cube, and what is
> > >>>>> the difference between a Kylin Segment and a Kylin Snapshot?
> > >>>>>
> > >>>>> PS. I sent you the log files for your help in investigating why my
> > >>>>> cube has not been used.
> > >>>>>
> > >>>>> On Wed, Nov 1, 2023 at 2:36 PM Xiaoxiang Yu <x...@apache.org> wrote:
> > >>>>>
> > >>>>>> I guess there is a misunderstanding in what you wrote.
> > >>>>>>
> > >>>>>> -- 'I need to select Cube from a combo box below the query window'
> > >>>>>> 'Need' is not the right word: that combo box is for some specific
> > >>>>>> cases (for example, when Kylin did not choose the most
> > >>>>>> efficient cube), not for most cases.
> > >>>>>> In most cases (both for Kylin 4 and Kylin 5), you don't need to select
> > >>>>>> a Cube in the combo box; Kylin will make the choice for you.
> > >>>>>>
> > >>>>>> ------------------------
> > >>>>>> With warm regard
> > >>>>>> Xiaoxiang Yu
> > >>>>>>
> > >>>>>> On Wed, Nov 1, 2023 at 3:24 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > >>>>>>
> > >>>>>>> Hi Xiaoxiang, sorry if I confused you (anyway, it is just a
> > >>>>>>> beginner's question).
> > >>>>>>>
> > >>>>>>> "obviously" means "clearly"
> > >>>>>>>
> > >>>>>>> because I need to select Cube from a combo box below the query window
> > >>>>>>>
> > >>>>>>> Thank you very much
> > >>>>>>>
> > >>>>>>> On Wed, Nov 1, 2023 at 2:20 PM Xiaoxiang Yu <x...@apache.org> wrote:
> > >>>>>>>
> > >>>>>>>> From my side, I cannot understand why you say Kylin 4 is 'very
> > >>>>>>>> obviously'. Can you give an example?
> > >>>>>>>> From the source code, the basic logic for choosing the right
> > >>>>>>>> cube/model is similar.
> > >>>>>>>> ------------------------
> > >>>>>>>> With warm regard
> > >>>>>>>> Xiaoxiang Yu
> > >>>>>>>>
> > >>>>>>>> On Wed, Nov 1, 2023 at 3:01 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > >>>>>>>>
> > >>>>>>>>> Thank you for your kind reply. Please answer one more question about
> > >>>>>>>>> version 5:
> > >>>>>>>>>
> > >>>>>>>>> In version 4.x we run a query against a Cube very obviously, but in
> > >>>>>>>>> version 5 the cube usage is implicit, so can you advise: for a given
> > >>>>>>>>> query, which model will be used, and which index (cube) will be used
> > >>>>>>>>> for this query?
> > >>>>>>>>>
> > >>>>>>>>> Thank you
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Nov 1, 2023 at 1:42 PM Xiaoxiang Yu <x...@apache.org> wrote:
> > >>>>>>>>>
> > >>>>>>>>>> 1. How do I measure the size of the index (cube) in version 5?
> > >>>>>>>>>>    You can check the storage of specific Indexes on the Index page.
> > >>>>>>>>>>
> > >>>>>>>>>>    https://kylin.apache.org/5.0/docs/modeling/model_design/aggregation_group#view-aggregate-index
> > >>>>>>>>>>    or
> > >>>>>>>>>>    https://kylin.apache.org/5.0/assets/images/index_1-6ad3f55183d4ed61962359d9408ba192.png
> > >>>>>>>>>>
> > >>>>>>>>>> 2. How to create the cardinality for each column?
> > >>>>>>>>>>    You should check this link:
> > >>>>>>>>>>    https://kylin.apache.org/5.0/docs/datasource/data_sampling/ .
> > >>>>>>>>>>
> > >>>>>>>>>> 3. In your default sample project named SSB, you have
> > >>>>>>>>>> only 4 simple aggregate group indexes and no table index, as in the
> > >>>>>>>>>> attached file, so what is the best strategy to select an index for our OLAP?
> > >>>>>>>>>>    1. A 'Base Table Index' actually does exist by default;
> > >>>>>>>>>>    its id is 20000000001.
> > >>>>>>>>>>    2. I think it is a good question, and Kylin 5 lacks such a
> > >>>>>>>>>>    guide for better modeling. You are free to ask your questions on
> > >>>>>>>>>>    the mailing list and I will try to reply.
> > >>>>>>>>>>
> > >>>>>>>>>> ------------------------
> > >>>>>>>>>> With warm regard
> > >>>>>>>>>> Xiaoxiang Yu
> > >>>>>>>>>>
> > >>>>>>>>>> On Wed, Nov 1, 2023 at 2:12 PM Xiaoxiang Yu <x...@apache.org> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>>> OK, I didn't read all the mail history, so I misunderstood the
> > >>>>>>>>>>> situation. It looks like you need to analyse
> > >>>>>>>>>>> why the query didn't hit the cube correctly.
> > >>>>>>>>>>>
> > >>>>>>>>>>> Please generate a query diagnosis package and send it to me
> > >>>>>>>>>>> privately. I will analyse the query log.
> > >>>>>>>>>>> You can refer to the following steps in the screenshots.
> > >>>>>>>>>>>
> > >>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>
> > >>>>>>>>>>> If the screenshots are not displaying correctly, please read
> > >>>>>>>>>>> this guide:
> > >>>>>>>>>>>
> > >>>>>>>>>>> https://kylin.apache.org/5.0/docs/operations/system-operation/diagnosis/#generate-query-diagnosis-package-in-web-ui
> > >>>>>>>>>>>
> > >>>>>>>>>>> By the way, you need to analyse the cause by reading
> > >>>>>>>>>>> kylin.query.log, not kylin.log;
> > >>>>>>>>>>> refer to
> > >>>>>>>>>>> https://kylin.apache.org/5.0/docs/operations/logs/system_log
> > >>>>>>>>>>>
> > >>>>>>>>>>> ------------------------
> > >>>>>>>>>>> With warm regard
> > >>>>>>>>>>> Xiaoxiang Yu
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Nov 1, 2023 at 12:18 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Thank you Xiaoxiang for your advice. As my email title shows, I
> > >>>>>>>>>>>> guessed that the OLAP functionality has not been correctly set up
> > >>>>>>>>>>>> on my computer.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> The evidence is this: when I disable the Pushdown
> > >>>>>>>>>>>> option box to use only the precomputed cube, it
> > >>>>>>>>>>>> shows the following error. Please kindly advise how to properly
> > >>>>>>>>>>>> build the OLAP.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> LIMIT 500": No realization found for OLAPContext, MODEL_UNMATCHED_JOIN, rel#2240:KapTableScan.OLAP.[](table=[VNEVENT_HIVE_DWH_400MILLION_ROWS, FACTUSEREVENT],ctx=0@null,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Wed, Nov 1, 2023 at 10:40 AM Xiaoxiang Yu <x...@apache.org> wrote:
> > >>>>>>>>>>>>
> > >>>>>>>>>>>>> Hi,
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Yesterday, I tried to see if the query pushdown functions work
> > >>>>>>>>>>>>> well in the Kylin5 docker, and all of my queries returned proper
> > >>>>>>>>>>>>> responses.
> > >>>>>>>>>>>>> After checking your logs from Shaofeng, I found these
> > >>>>>>>>>>>>> error messages repeated many times:
> > >>>>>>>>>>>>> 1. 'java.io.IOException: All datanodes
> > >>>>>>>>>>>>> DatanodeInfoWithStorage[127.0.0.1:9866,DS-5093899b-06c7-4386-95d5-6fc271d92b52,DISK]
> > >>>>>>>>>>>>> are bad. Aborting...'
> > >>>>>>>>>>>>> 2. 'curator.ConnectionState : Connection timed out for
> > >>>>>>>>>>>>> connection string (localhost:2181) and timeout (15000) / elapsed (41794)
> > >>>>>>>>>>>>> org.apache.curator.CuratorConnectionLossException:
> > >>>>>>>>>>>>> KeeperErrorCode = ConnectionLoss'
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I guess the root cause is that the container did not
> > >>>>>>>>>>>>> have enough resources. I found you queried a table called
> > >>>>>>>>>>>>> 'XXX_hive_dwh_400million_rows'; it looks like you ran a complex
> > >>>>>>>>>>>>> query on a table which contains 400 million rows?
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Since I am the uploader of the Kylin5 docker image, I want
> > >>>>>>>>>>>>> to give some explanation. The Kylin5 docker is not a place for
> > >>>>>>>>>>>>> performance benchmarks; it is only for demonstration. It is allocated
> > >>>>>>>>>>>>> very few resources (8G memory) if you use the default command from the
> > >>>>>>>>>>>>> docker hub page. Before I uploaded my image, I only tested it using
> > >>>>>>>>>>>>> the ssb dataset, in which the biggest table contains only about 60k
> > >>>>>>>>>>>>> rows. If you are using a larger dataset and more complex queries, you
> > >>>>>>>>>>>>> have to scale the resources properly. By default, try querying tables
> > >>>>>>>>>>>>> which contain no more than 100k rows.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Here are some tips which may help you check whether the
> > >>>>>>>>>>>>> daemon services are healthy and whether resources (particularly disk
> > >>>>>>>>>>>>> space) are configured properly.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> 1. Check the HDFS web UI (
> > >>>>>>>>>>>>> http://localhost:9870/dfshealth.html#tab-datanode ) to
> > >>>>>>>>>>>>> confirm whether the HDFS service is in 'In service' status.
> > >>>>>>>>>>>>> 2. Check the Datanode log in
> > >>>>>>>>>>>>> `/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log` and
> > >>>>>>>>>>>>> see whether there is any error message. For example:
> > >>>>>>>>>>>>> cat /opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log | grep ERROR | wc -l
> > >>>>>>>>>>>>> 3. Check that your docker engine is configured with
> > >>>>>>>>>>>>> enough disk space. If you are using Docker Desktop like me, please go
> > >>>>>>>>>>>>> to "Settings" - "Resources" - "Advanced" and make sure you have
> > >>>>>>>>>>>>> allocated 40GB+ of disk space to the docker container.
> > >>>>>>>>>>>>> 4. Check the available disk space of your container with
> > >>>>>>>>>>>>> `df -h`; make sure the 'Use%' of 'overlay' is less than 60%.
> > >>>>>>>>>>>>> 5. Check the load average / CPU usage / JVM GC. Make sure
> > >>>>>>>>>>>>> these metrics are not really high when you send a query.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> ------------------------
> > >>>>>>>>>>>>> With warm regard
> > >>>>>>>>>>>>> Xiaoxiang Yu
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Tue, Oct 31, 2023 at 5:13 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hi ShaoFeng
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Thank you very much for your valuable feedback
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I saw the application there (if I see it right), as in
> > >>>>>>>>>>>>>> the attached photo. Kindly advise so that I can run this query on OLAP.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> PS. I sent you the log file in private.
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> [image: image.png]
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Tue, Oct 31, 2023 at 3:11 PM ShaoFeng Shi <shaofeng...@apache.org> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Can you provide the messages in logs/kylin.log from when
> > >>>>>>>>>>>>>>> executing the SQL? You can also check the Spark UI from the YARN
> > >>>>>>>>>>>>>>> resource manager (there should be one running application called
> > >>>>>>>>>>>>>>> Spardar, which is Kylin's backend spark application). If the
> > >>>>>>>>>>>>>>> application is not there, it may indicate that YARN doesn't have
> > >>>>>>>>>>>>>>> the resources to start it.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Best regards,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Shaofeng Shi 史少锋
> > >>>>>>>>>>>>>>> Apache Kylin PMC,
> > >>>>>>>>>>>>>>> Apache Incubator PMC,
> > >>>>>>>>>>>>>>> Email: shaofeng...@apache.org
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Apache Kylin FAQ:
> > >>>>>>>>>>>>>>> https://kylin.apache.org/docs/gettingstarted/faq.html
> > >>>>>>>>>>>>>>> Join Kylin user mail group: user-subscr...@kylin.apache.org
> > >>>>>>>>>>>>>>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> On Tue, Oct 31, 2023 at 10:35, Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Dear Sir/Madam,
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I have a fact table with 500 million rows, and I built the model
> > >>>>>>>>>>>>>>>> and index according to the website help.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I chose full incremental because this is the first time I
> > >>>>>>>>>>>>>>>> load data.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> I created both index types, aggregate group index and table
> > >>>>>>>>>>>>>>>> index, as in the attached photo.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> But the query always fails after the timeout of 300 seconds (I
> > >>>>>>>>>>>>>>>> run in docker). I don't want to increase the 300-second value
> > >>>>>>>>>>>>>>>> because I wish the OLAP query to run within 1 minute (is that
> > >>>>>>>>>>>>>>>> possible?)
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> It seems that the OLAP indexing is not working to
> > >>>>>>>>>>>>>>>> speed up the query with the precomputed cube.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Can you advise how to check whether the index really worked?
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> It is quite an urgent task for me, so a prompt response is highly
> > >>>>>>>>>>>>>>>> appreciated.
> > >>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>> Thank you very much