Re: OLAP functionalities in Kylin 5.0 seems not yet working for me

Nam Đỗ Duy Wed, 01 Nov 2023 00:53:38 -0700

Thank you again, very smart of you to automatically select the cube for a
certain query. Sorry if I ask too much: is the concept of a Segment in a Kylin
model similar to the slice-and-dice concept of a cube, and what is the
difference between a Kylin Segment and a Kylin Snapshot?


PS. I sent you the log files for your help in investigating why my cube has
not been used.

On Wed, Nov 1, 2023 at 2:36 PM Xiaoxiang Yu <x...@apache.org> wrote:

> I guess there is a misunderstanding in your sentences.
>
> -- 'I need to select Cube from a combo box below the query window'
> 'Need' is not the right word here: that combo box is for some specific
> cases (for example, when Kylin did not choose the most efficient cube),
> not for most cases.
> In most cases (for both Kylin 4 and Kylin 5), you don't need to select a
> cube in the combo box; Kylin will make the choice for you.
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Wed, Nov 1, 2023 at 3:24 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
>> Hi Xiaoxiang, sorry if I confused you (anyway, it is just a beginner's
>> question)
>>
>> "obviously" means "clearly"
>>
>> because I need to select Cube from a combo box below the query window
>>
>> Thank you very much
>>
>> On Wed, Nov 1, 2023 at 2:20 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>
>>> From my side, I cannot understand why you say Kylin 4 is 'very obviously'.
>>> Can you give an example?
>>> From the source code, the basic logic of choosing the right cube/model
>>> is similar in both versions.
>>> ------------------------
>>> With warm regard
>>> Xiaoxiang Yu
>>>
>>>
>>>
>>> On Wed, Nov 1, 2023 at 3:01 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>>
>>>> Thank you for your kind reply, please answer 1 more question about
>>>> version 5:
>>>>
>>>> In version 4.x we run a query against a cube very obviously, but in
>>>> version 5 the cube usage is implicit, so can you advise: for a given
>>>> query, which model and which index (cube) will be used?
>>>>
>>>> Thank you
>>>>
>>>> On Wed, Nov 1, 2023 at 1:42 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>
>>>>> 1. How do I measure the size of the index (cube) in version 5?
>>>>>    You can check storage of specific Indexes from the Index page.
>>>>>
>>>>> https://kylin.apache.org/5.0/docs/modeling/model_design/aggregation_group#view-aggregate-index
>>>>> or
>>>>> https://kylin.apache.org/5.0/assets/images/index_1-6ad3f55183d4ed61962359d9408ba192.png
>>>>>
>>>>>
>>>>> 2. How to create the cardinality for each column?
>>>>>    You should check this link :
>>>>> https://kylin.apache.org/5.0/docs/datasource/data_sampling/ .
>>>>>
>>>>> 3. In your default project sample named SSB project, you have only 4
>>>>> simple aggregate group index and no table index as in attached file
>>>>> so what is the best strategy to select index for our OLAP?
>>>>>     1. Actually, a 'Base Table Index' does exist by default;
>>>>> its id is 20000000001.
>>>>>     2. I think it is a good question, and Kylin 5 lacks such a guide
>>>>> for better modeling. You are free to ask your questions on the
>>>>> mailing list and I will try to reply.
>>>>>
>>>>> ------------------------
>>>>> With warm regard
>>>>> Xiaoxiang Yu
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Nov 1, 2023 at 2:12 PM Xiaoxiang Yu <x...@apache.org> wrote:
>>>>>
>>>>>> OK, I didn't read all the mail history, so I misunderstood the
>>>>>> situation. It looks like you need to analyse
>>>>>> why the query didn't hit the cube correctly.
>>>>>>
>>>>>> Please generate query diagnosis package and send it to me privately.
>>>>>> I will analyse the query log.
>>>>>> You can refer to the following steps in screenshots.
>>>>>>
>>>>>> [image: image.png]
>>>>>>
>>>>>> If the screenshots are not displaying correctly, please read this
>>>>>> guide :
>>>>>>
>>>>>> https://kylin.apache.org/5.0/docs/operations/system-operation/diagnosis/#generate-query-diagnosis-package-in-web-ui
>>>>>>
>>>>>> By the way, you need to analyse the cause by reading kylin.query.log,
>>>>>> not the kylin.log,
>>>>>> refer to https://kylin.apache.org/5.0/docs/operations/logs/system_log
>>>>>>
>>>>>> ------------------------
>>>>>> With warm regard
>>>>>> Xiaoxiang Yu
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Nov 1, 2023 at 12:18 PM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>>>>>
>>>>>>> Thank you Xiaoxiang for your advice. As the title of my email shows, I
>>>>>>> guessed that the OLAP functionality has not been set up correctly on my
>>>>>>> computer.
>>>>>>>
>>>>>>> The evidence is that when I disable the Pushdown option
>>>>>>> to use only the precomputed cube, the query shows the following error.
>>>>>>> Please kindly advise how to properly build the OLAP
>>>>>>>
>>>>>>> LIMIT 500": No realization found for OLAPContext, MODEL_UNMATCHED_JOIN, 
>>>>>>> rel#2240:KapTableScan.OLAP.[](table=[VNEVENT_HIVE_DWH_400MILLION_ROWS, 
>>>>>>> FACTUSEREVENT],ctx=0@null,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 
>>>>>>> 12, 13, 14, 15, 16, 17, 18, 19, 20])
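
A minimal sketch for pulling the reason code and table out of errors like the one quoted above when scanning a query log. The regular expression is inferred from this single message, so the exact format in other Kylin versions is an assumption, and `find_misses` and `Miss` are hypothetical helper names, not part of Kylin. Going by its name, MODEL_UNMATCHED_JOIN suggests that the joins in the query do not match the joins defined in the model.

```python
import re
from typing import Iterator, NamedTuple

# Pattern inferred from the one error message quoted in this thread
# (assumption: other Kylin versions may word or wrap it differently).
NO_REALIZATION = re.compile(
    r"No realization found for OLAPContext,\s*(?P<reason>[A-Z_]+),"
    r".*?table=\[(?P<table>[^\]]+)\]",
    re.DOTALL,  # the message may be wrapped over several lines
)


class Miss(NamedTuple):
    """One query that found no matching model/index (hypothetical type)."""
    reason: str  # e.g. MODEL_UNMATCHED_JOIN
    table: str   # database and table joined with a dot


def find_misses(log_text: str) -> Iterator[Miss]:
    """Yield the reason code and table for each 'No realization found' error."""
    for m in NO_REALIZATION.finditer(log_text):
        # "DB, \nTABLE" -> "DB.TABLE"
        table = ".".join(part.strip() for part in m.group("table").split(","))
        yield Miss(m.group("reason"), table)
```

For example, the text of kylin.query.log (path depends on your installation) could be read with `pathlib.Path(...).read_text()` and passed to `find_misses`; counting the reason codes then shows which kind of mismatch is most common.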
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Nov 1, 2023 at 10:40 AM Xiaoxiang Yu <x...@apache.org>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>>     Yesterday, I tried to see if query pushdown functions work well
>>>>>>>> in the Kylin5 docker, and all of my queries return proper responses .
>>>>>>>>     After checking your logs from Shaofeng, I found these error
>>>>>>>> messages repeated many times:
>>>>>>>>     1. 'java.io.IOException: All datanodes DatanodeInfoWithStorage[
>>>>>>>> 127.0.0.1:9866,DS-5093899b-06c7-4386-95d5-6fc271d92b52,DISK] are
>>>>>>>> bad. Aborting...'
>>>>>>>>     2. 'curator.ConnectionState : Connection timed out for
>>>>>>>> connection string (localhost:2181) and timeout (15000) / elapsed (41794)
>>>>>>>> org.apache.curator.CuratorConnectionLossException: KeeperErrorCode
>>>>>>>> = ConnectionLoss'
>>>>>>>>
>>>>>>>>     I guess the root cause is that the container did not have
>>>>>>>> enough resources. I found that you queried a table called
>>>>>>>> 'XXX_hive_dwh_400million_rows'; it looks like you ran a complex query
>>>>>>>> on a table which contains 400 million rows?
>>>>>>>>
>>>>>>>>     Since I am the uploader of the Kylin5 docker image, I want to
>>>>>>>> give some explanation. The Kylin5 docker image is not a place for
>>>>>>>> performance benchmarks; it is only for demonstration. It is allocated
>>>>>>>> very few resources (8G memory) if you are using the default command
>>>>>>>> from the docker hub page. Before I uploaded my image, I only tested it
>>>>>>>> using the ssb dataset, in which the biggest table contains only about
>>>>>>>> 60k rows. If you are using a larger dataset and more complex queries,
>>>>>>>> you have to scale the resources properly. By default, try querying
>>>>>>>> tables which contain no more than 100k rows.
>>>>>>>>
>>>>>>>>     Here are some tips which may help you check whether the daemon
>>>>>>>> services are healthy and resources (particularly disk space) are
>>>>>>>> configured properly.
>>>>>>>>
>>>>>>>>     1. Check the HDFS web UI (
>>>>>>>> http://localhost:9870/dfshealth.html#tab-datanode ) to confirm
>>>>>>>> that the HDFS service is in 'In service' status.
>>>>>>>>     2. Check the Datanode log at
>>>>>>>> `/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log` for
>>>>>>>> error messages, for example:
>>>>>>>> `grep ERROR /opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log | wc -l`
>>>>>>>>     3. Check that your docker engine is configured with enough
>>>>>>>> disk space. If you are using Docker Desktop like me, please go to
>>>>>>>> "Settings" - "Resources" - "Advanced" and make sure you have allocated
>>>>>>>> 40GB+ of disk space to the docker container.
>>>>>>>>     4. Check the available disk space of your container with `df
>>>>>>>> -h`, and make sure the 'Use%' of 'overlay' is less than 60%.
>>>>>>>>     5. Check the load average, CPU usage and JVM GC. Make sure
>>>>>>>> these metrics are not very high when you send a query.
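
Tips 2 and 4 above can be sketched as a small script to run inside the container. The log path and the 60% threshold come from the message above; the function names are hypothetical, and the percentage is computed as used / (used + available), which should be close to the 'Use%' column of `df -h` but is not guaranteed to match it exactly.

```python
import shutil
from pathlib import Path

# Path taken from the message above; adjust it for your own container.
DATANODE_LOG = "/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log"
MAX_USE_PERCENT = 60.0  # threshold suggested in tip 4


def count_error_lines(log_path: str) -> int:
    """Count lines containing 'ERROR', like `grep ERROR file | wc -l`."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        return sum(1 for line in f if "ERROR" in line)


def disk_use_percent(path: str = "/") -> float:
    """Approximate the 'Use%' column of `df -h` for the filesystem at path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / (usage.used + usage.free)


if __name__ == "__main__":
    if Path(DATANODE_LOG).exists():
        print(f"ERROR lines in datanode log: {count_error_lines(DATANODE_LOG)}")
    else:
        print(f"datanode log not found at {DATANODE_LOG}")

    pct = disk_use_percent("/")
    status = "OK" if pct < MAX_USE_PERCENT else "too full"
    print(f"disk usage of /: {pct:.1f}% ({status})")
```

A non-zero error count is only a hint to open the log and read the messages; it does not say which errors matter.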
>>>>>>>> ------------------------
>>>>>>>> With warm regard
>>>>>>>> Xiaoxiang Yu
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, Oct 31, 2023 at 5:13 PM Nam Đỗ Duy <na...@vnpay.vn.invalid>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi ShaoFeng
>>>>>>>>>
>>>>>>>>> Thank you very much for your valuable feedback
>>>>>>>>>
>>>>>>>>> The application seems to be there (if I see it right), as in the
>>>>>>>>> attached photo. Kindly advise so that I can run this query on OLAP.
>>>>>>>>>
>>>>>>>>> PS. I sent you the log file in private.
>>>>>>>>>
>>>>>>>>> [image: image.png]
>>>>>>>>>
>>>>>>>>> On Tue, Oct 31, 2023 at 3:11 PM ShaoFeng Shi <
>>>>>>>>> shaofeng...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>> Can you provide the messages in logs/kylin.log from when you execute the
>>>>>>>>>> SQL? You can also check the Spark UI from the YARN resource manager
>>>>>>>>>> (there should be one running application called Sparder, which is
>>>>>>>>>> Kylin's backend Spark application). If the application is not there, it
>>>>>>>>>> may indicate that YARN doesn't have the resources to start it.
>>>>>>>>>>
>>>>>>>>>> Best regards,
>>>>>>>>>>
>>>>>>>>>> Shaofeng Shi 史少锋
>>>>>>>>>> Apache Kylin PMC,
>>>>>>>>>> Apache Incubator PMC,
>>>>>>>>>> Email: shaofeng...@apache.org
>>>>>>>>>>
>>>>>>>>>> Apache Kylin FAQ:
>>>>>>>>>> https://kylin.apache.org/docs/gettingstarted/faq.html
>>>>>>>>>> Join Kylin user mail group: user-subscr...@kylin.apache.org
>>>>>>>>>> Join Kylin dev mail group: dev-subscr...@kylin.apache.org
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, Oct 31, 2023 at 10:35 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
>>>>>>>>>>
>>>>>>>>>>> Dear Sir/Madam,
>>>>>>>>>>>
>>>>>>>>>>> I have a fact table with 500 million rows, and I built the model and
>>>>>>>>>>> index according to the website help.
>>>>>>>>>>>
>>>>>>>>>>> I chose full incremental because this is the first time I load
>>>>>>>>>>> data.
>>>>>>>>>>>
>>>>>>>>>>> I created both index types, aggregate group index and table index, as
>>>>>>>>>>> in the attached photo.
>>>>>>>>>>>
>>>>>>>>>>> But the query always fails after the timeout of 300 seconds (I run
>>>>>>>>>>> in docker). I don't want to increase the 300-second value because I
>>>>>>>>>>> wish the OLAP query could run within 1 minute (is that possible?)
>>>>>>>>>>>
>>>>>>>>>>> It seems that the OLAP indexing is not working to
>>>>>>>>>>> speed up the query with the precomputed cube.
>>>>>>>>>>>
>>>>>>>>>>> Can you advise how to check whether the index really worked?
>>>>>>>>>>>
>>>>>>>>>>> It is quite an urgent task for me, so a prompt response is highly
>>>>>>>>>>> appreciated.
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much
>>>>>>>>>>>
>>>>>>>>>>
