Hi Xiaoxiang,

Thank you, as always, for the kind guidance on automating the process.

I looked at the segment management API, but I am not sure how to generate a new segment through it. Could you please elaborate a bit?

Moreover, when I query only the fact table, how can I define a model that contains the fact table only?

Best regards

On Thu, Nov 2, 2023 at 1:21 PM Xiaoxiang Yu <x...@apache.org> wrote:

> 1. How can I automate the daily index build for the newest data?
>
> I guess your team or a teammate manages an ETL pipeline (Jenkins, DolphinScheduler, etc.). You can call Kylin through its REST API from that pipeline. Here is the link:
> https://kylin.apache.org/5.0/docs/restapi/segment_management_api
>
> 2. Can I apply the same automated process to near-real-time or real-time data?
>
> There will be a latency of about 10 minutes to 2 hours in most cases; it depends on how fast the index build job completes.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu

On Thu, Nov 2, 2023 at 2:10 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang,
>
> My new data is not within an existing date range, and I need to do this daily.
>
> 1. How can I automate the daily index build for the newest data?
> 2. Can I apply the same automated process to near-real-time or real-time data (loading real-time data from Hive into a new index/segment)?
>
> Thank you very much for your help.

On Thu, 2 Nov 2023 at 12:58 Xiaoxiang Yu <x...@apache.org> wrote:

> If the new data's date range is covered by a segment in your model, you should refresh your existing segment. Refer to:
> https://kylin.apache.org/5.0/docs/modeling/load_data/segment_operation_settings/intro#segment-operation
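Both the refresh-or-create decision here and the daily automation asked about at the top of the thread come down to sending a time range to the segment API from a pipeline job. The sketch below only prepares such a request; it does not send it. The endpoint path and the body fields (`project`, `start`, `end` as millisecond timestamps) are assumptions to verify against the segment management API page linked above, and the function names, host, project and model names are hypothetical. It also assumes the partition day is cut in UTC, which may not match your server's time zone.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Dict, Tuple


def daily_segment_range(day: date) -> Tuple[int, int]:
    """Return the half-open range [start, end) of `day` in epoch milliseconds (UTC)."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


def build_segment_request(base_url: str, project: str, model: str, day: date) -> Dict:
    """Describe the HTTP request a scheduler job would send for one day of data.

    ASSUMPTION: the path and field names below follow the segment management
    API page; check them there before using this in a real pipeline.
    """
    start_ms, end_ms = daily_segment_range(day)
    return {
        "method": "POST",
        "url": f"{base_url}/kylin/api/models/{model}/segments",  # assumed path
        "json": {"project": project, "start": str(start_ms), "end": str(end_ms)},
    }


if __name__ == "__main__":
    # Hypothetical host, project and model names, for illustration only.
    yesterday = date.today() - timedelta(days=1)
    print(build_segment_request("http://localhost:7070", "my_project", "my_model", yesterday))
```

A Jenkins or DolphinScheduler task could run this once a day after the Hive load finishes and pass the resulting dictionary to whatever HTTP client and authentication the pipeline already uses.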
>
> If not, create a new segment and build the index. Refer to:
> https://kylin.apache.org/5.0/docs/modeling/load_data/by_date

On Thu, Nov 2, 2023 at 11:57 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you Xiaoxiang, I still have one question:
>
> When new data is added to the Hive data source, how is it reflected in the cube (index) and in the query results?

On Thu, Nov 2, 2023 at 10:00 AM Xiaoxiang Yu <x...@apache.org> wrote:

> Congratulations. I hope you will make good use of Kylin 5 for your use cases.

On Thu, Nov 2, 2023 at 10:50 AM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> The query is too fast, less than a second. Can you make it a little bit slower so that I can see it clearly? 😀😀
>
> [image: image.png]

On Thu, Nov 2, 2023 at 9:32 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:

> Thank you Xiaoxiang for the guidelines. I will definitely read them carefully. Kindly help with the following questions:
>
> 1. Computed column
>
> I created a "computed column" and added it to the dimensions (among other dimensions). When I select the computed column in a query, it returns an error.
>
> 2. Data type optimization: do you think int would be better than string for join key columns?
>
> Please advise.

On Wed, 1 Nov 2023 at 17:32 Xiaoxiang Yu <x...@apache.org> wrote:

> Yes, that is almost correct.
>
> If you have a lot of complex queries and you want to use Kylin 5 to accelerate them, my recommended steps are as follows:
>
> 1. Analyse all queries and collect all join relations/patterns.
> 2. Create a Model for each specific join relation/pattern found in the step above.
> 3. Analyse and collect dimensions and measures from all queries, and add them to the corresponding Model.
> 4. Build segments of all Models with a proper data range.
> 5. Turn off the pushdown switch and send all queries to Kylin. If some queries fail, fix them. Here are some common situations:
>    5.1 The join relation/pattern is not matched.
>    5.2 The join relation is matched, but the Model might not contain every column that your query needs. Please check kylin.query.log for the keyword 'unmatched'.
> 6. (Optional) If you find that some of your queries do not exactly match your Index (your query is on [colA, colB], but your index contains more columns than colA and colB), you can add some aggregate groups (or a smaller Table Index) to optimize query performance.

On Wed, Nov 1, 2023 at 5:57 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Thank you Xiaoxiang, I nearly got to the point.
>
> So can I interpret it this way: one model corresponds (roughly) to one set of joins between dimension and fact tables, which means we need to create several models for the different kinds of join queries?
>
> Best regards

On Wed, Nov 1, 2023 at 4:50 PM Xiaoxiang Yu <x...@apache.org> wrote:

> Have you tried to analyse why your query cannot hit Model 'sample_ssb'? It is because the join relation of your query does not match the join relation/pattern of Model 'sample_ssb'.
>
> Your query used a join relation/pattern like: A inner join B.
> But the Model 'sample_ssb' used a join relation/pattern like: A inner join B inner join C.
>
> If you are familiar with the definition of inner join, you may know that the pattern 'A inner join B inner join C' can lose some rows compared to the pattern 'A inner join B'. So the Model 'sample_ssb' is excluded from serving your query.
>
> That is to say, you need to create a new model that is similar to Model 'sample_ssb', but with the additional tables removed.
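The row-loss argument above can be checked on a toy example. This is only an illustrative sketch with made-up tables A, B and C (not the SSB tables) and a hypothetical `inner_join` helper; it assumes each dimension key is unique, so keeping the matching fact rows gives the same row count as a real inner join.

```python
# Toy tables: A plays the fact table, B and C play dimension tables.
A = [
    {"id": 1, "b_key": 10, "c_key": 100},
    {"id": 2, "b_key": 20, "c_key": 999},  # c_key 999 has no match in C
]
B = [{"b_key": 10}, {"b_key": 20}]
C = [{"c_key": 100}]


def inner_join(left, right, key):
    """Keep the rows of `left` whose `key` value also appears in `right`.

    With unique keys on the right side this yields the same rows as an
    inner join, which is all we need to compare row counts.
    """
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] in right_keys]


a_join_b = inner_join(A, B, "b_key")
a_join_b_join_c = inner_join(a_join_b, C, "c_key")

if __name__ == "__main__":
    print("A inner join B:", [row["id"] for row in a_join_b])
    print("A inner join B inner join C:", [row["id"] for row in a_join_b_join_c])
```

The fact row with id 2 survives the join with B but is dropped by the extra join with C, so data precomputed for 'A inner join B inner join C' cannot safely answer a query on 'A inner join B'.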

On Wed, Nov 1, 2023 at 5:21 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang,
>
> Thank you very much. I already have a clearer picture of Kylin thanks to your explanation.
>
> Now back to the SSB sample project in the attached photo: when I run this query with the pushdown option OFF, why does the OLAP error appear, and in that case, how do I create a new cube for this query?
>
> [image: image.png]

On Wed, Nov 1, 2023 at 3:49 PM Xiaoxiang Yu <x...@apache.org> wrote:

> Here is my explanation; it may not be perfect.
>
> A Segment in Kylin is a part of a model's/cube's precomputed data, in most cases divided by a date column.
>
> Here are some differences between Segment and Snapshot:
>
> - Segment: its source data comes from one fact table joined with some dimension tables over a 'specific date range'. It is 'precomputed' and accelerates complex queries.
> - Snapshot: its source data comes from one specific dimension table without a specific date range. It is 'not precomputed' and can be joined with segments at runtime.
>
> - https://kylin.apache.org/5.0/docs/snapshot/snapshot_management
> - https://kylin.apache.org/5.0/docs/modeling/load_data/segment_operation_settings/intro

On Wed, Nov 1, 2023 at 3:53 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:

> Thank you again. It is very smart of you to select the cube for a given query automatically. Sorry if I ask too much: is the concept of a Segment in a Kylin model similar to the slice-and-dice concept of a cube? And what is the difference between a Kylin Segment and a Kylin Snapshot?
>
> PS. I sent you the log files to help you investigate why my cube has not been used.

On Wed, Nov 1, 2023 at 2:36 PM Xiaoxiang Yu <x...@apache.org> wrote:

> I guess there is a misunderstanding in your sentences.
>
> -- 'I need to select Cube from a combo box below the query window'
>
> 'Need' is not the right word. That combo box is for some specific cases (for example, when Kylin did not choose the most efficient cube), not for most cases. In most cases (for both Kylin 4 and Kylin 5), you don't need to select a cube in the combo box; Kylin will make the choice for you.

On Wed, Nov 1, 2023 at 3:24 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi Xiaoxiang, sorry if I confused you (anyway, it is just a beginner's question).
>
> By "obviously" I mean "clearly", because I need to select a cube from a combo box below the query window.
>
> Thank you very much

On Wed, Nov 1, 2023 at 2:20 PM Xiaoxiang Yu <x...@apache.org> wrote:

> From my side, I cannot understand why you say Kylin 4 is 'very obviously'. Can you give an example?
> Judging from the source code, the basic logic for choosing the right cube/model is similar.

On Wed, Nov 1, 2023 at 3:01 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:

> Thank you for your kind reply. Please answer one more question about version 5:
>
> In version 4.x we run a query against a cube very obviously, but in version 5 the cube usage is implicit. So can you advise: for a given query, which model will be used, and which index (cube) will be used?
>
> Thank you

On Wed, Nov 1, 2023 at 1:42 PM Xiaoxiang Yu <x...@apache.org> wrote:

> 1. How do I measure the size of the index (cube) in version 5?
>
> You can check the storage of specific indexes on the Index page:
> https://kylin.apache.org/5.0/docs/modeling/model_design/aggregation_group#view-aggregate-index
> or
> https://kylin.apache.org/5.0/assets/images/index_1-6ad3f55183d4ed61962359d9408ba192.png
>
> 2. How to create the cardinality for each column?
>
> You should check this link:
> https://kylin.apache.org/5.0/docs/datasource/data_sampling/
>
> 3. In your default sample project named SSB, you have only 4 simple aggregate group indexes and no table index, as in the attached file. So what is the best strategy to select indexes for our OLAP?
>
> 1. A 'Base Table Index' does actually exist by default; its id is 20000000001.
> 2. I think it is a good question, and Kylin 5 lacks such a guide for better modeling. You are free to ask your questions on the mailing list and I will try to reply.

On Wed, Nov 1, 2023 at 2:12 PM Xiaoxiang Yu <x...@apache.org> wrote:

> OK, I didn't read all the mail history, so I misunderstood the situation. It looks like you need to analyse why the query didn't hit the cube correctly.
>
> Please generate a query diagnosis package and send it to me privately. I will analyse the query log. You can refer to the steps in the following screenshots.
>
> [image: image.png]
>
> If the screenshots are not displayed correctly, please read this guide:
> https://kylin.apache.org/5.0/docs/operations/system-operation/diagnosis/#generate-query-diagnosis-package-in-web-ui
>
> By the way, you need to analyse the cause by reading kylin.query.log, not kylin.log. Refer to:
> https://kylin.apache.org/5.0/docs/operations/logs/system_log

On Wed, Nov 1, 2023 at 12:18 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:

> Thank you Xiaoxiang for your advice. As my email title shows, I guessed that the OLAP functionality has not been set up correctly on my computer.
>
> The evidence is this: when I disable the Pushdown option so that only the precomputed cube is used, the following error is shown. Please kindly advise how to build the OLAP properly.
>
> LIMIT 500": No realization found for OLAPContext, MODEL_UNMATCHED_JOIN, rel#2240:KapTableScan.OLAP.[](table=[VNEVENT_HIVE_DWH_400MILLION_ROWS, FACTUSEREVENT],ctx=0@null,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])

On Wed, Nov 1, 2023 at 10:40 AM Xiaoxiang Yu <x...@apache.org> wrote:

> Hi,
>
> Yesterday, I checked whether the query pushdown functions work well in the Kylin 5 docker image, and all of my queries returned proper responses. After checking your logs from Shaofeng, I found these error messages repeated many times:
>
> 1. 'java.io.IOException: All datanodes DatanodeInfoWithStorage[127.0.0.1:9866,DS-5093899b-06c7-4386-95d5-6fc271d92b52,DISK] are bad. Aborting...'
> 2. 'curator.ConnectionState : Connection timed out for connection string (localhost:2181) and timeout (15000) / elapsed (41794) org.apache.curator.CuratorConnectionLossException: KeeperErrorCode = ConnectionLoss'
>
> I guess the root cause is that the container did not have enough resources. I found that you query a table called 'XXX_hive_dwh_400million_rows'; it looks like you ran a complex query on a table that contains 400 million rows?
>
> Since I am the uploader of the Kylin 5 docker image, I want to give some explanation. The Kylin 5 docker image is not a place for performance benchmarks; it is only for demonstration. It is allocated very few resources (8 GB of memory) if you use the default command from the Docker Hub page. Before I uploaded the image, I tested it only with the SSB dataset, whose biggest table contains only about 60k rows. If you use a larger dataset and more complex queries, you have to scale the resources accordingly. By default, try querying tables that contain no more than 100k rows.
>
> Here are some tips that may help you check whether the daemon services are healthy and whether resources (particularly disk space) are configured properly.
>
> 1. Check the HDFS web UI (http://localhost:9870/dfshealth.html#tab-datanode) to confirm that the HDFS service is in 'In service' status.
> 2. Check the Datanode log at `/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log` for error messages, for example: cat /opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log | grep ERROR | wc -l
> 3. Check that your Docker engine is configured with enough disk space. If you are using Docker Desktop like me, go to "Settings" - "Resources" - "Advanced" and make sure you have allocated 40GB+ of disk space to the docker container.
> 4. Check the available disk space of your container with `df -h`, and make sure the 'Use%' of 'overlay' is less than 60%.
> 5. Check the load average, CPU usage and JVM GC. Make sure these metrics are not very high when you send a query.
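Tips 2 and 4 above can be scripted so that they run before each test query. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, the log path is the one quoted in tip 2, and the percentage is computed as used/total, which can differ slightly from the 'Use%' column that `df -h` prints.

```python
import shutil


def count_error_lines(log_path: str) -> int:
    """Count lines containing 'ERROR', like `grep ERROR <file> | wc -l`."""
    with open(log_path, errors="replace") as log_file:
        return sum(1 for line in log_file if "ERROR" in line)


def disk_use_percent(path: str = "/") -> float:
    """Return the used share of the filesystem holding `path`, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


if __name__ == "__main__":
    # Path taken from tip 2; it only exists inside the Kylin 5 container.
    datanode_log = "/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log"
    try:
        print("ERROR lines in datanode log:", count_error_lines(datanode_log))
    except FileNotFoundError:
        print("datanode log not found; run this inside the container")
    print("disk use of / in percent:", round(disk_use_percent("/"), 1))
```

Run inside the container, a non-zero error count or a disk use above the 60% mentioned in tip 4 would point to the resource problems described earlier in this message.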

On Tue, Oct 31, 2023 at 5:13 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:

> Hi ShaoFeng,
>
> Thank you very much for your valuable feedback.
>
> I see the application there (if I read it right), as in the attached photo. Kindly advise so that I can run this query on OLAP.
>
> PS. I sent you the log file in private.
>
> [image: image.png]

On Tue, Oct 31, 2023 at 3:11 PM ShaoFeng Shi <shaofeng...@apache.org> wrote:

> Can you provide the messages in logs/kylin.log from when the SQL is executed? You can also check the Spark UI from the YARN resource manager (there should be one running application called Sparder, which is Kylin's backend Spark application). If the application is not there, it may indicate that YARN doesn't have the resources to start it.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC,
> Apache Incubator PMC,
> Email: shaofeng...@apache.org
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: user-subscr...@kylin.apache.org
> Join Kylin dev mail group: dev-subscr...@kylin.apache.org

On Tue, Oct 31, 2023 at 10:35 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:

> Dear Sir/Madam,
>
> I have a fact table with 500 million rows, and I built the model and index according to the website help.
>
> I chose full incremental because this is the first time I load data.
>
> I created both index types, aggregate group index and table index, as in the attached photo.
>
> But the query always fails after the timeout of 300 seconds (I run it in docker). I don't want to increase the 300-second value because I want the OLAP query to finish within 1 minute (is that possible?).
>
> It seems that the indexing is not working to speed up the query with the precomputed cube.
>
> Can you advise how to check whether the index really worked?
>
> This is quite an urgent task for me, so a prompt response is highly appreciated.
>
> Thank you very much