Hi Xiaoxiang,

Thank you so much as always for the kind guideline about automation of the
process

I looked at the segment management API and not sure how to generate new
segment from API, could you please elaborate a bit?

More over, when I query the sole Fact table only, how can I define model
with fact table only?

Best regards

On Thu, Nov 2, 2023 at 1:21 PM Xiaoxiang Yu <x...@apache.org> wrote:

> 1. How can I automate the build index daily for newest data?
>     I guess your team/teammate will manage a ETL pipeline(Jenkins,
> DolphinScheduler etc),
>     you may call Kylin by a rest api in your pipeline, here is the link:
>     https://kylin.apache.org/5.0/docs/restapi/segment_management_api
>
> 2. Can I apply the above automate process for near realtime or realtime
> data?
>     There will be a latency of about 10 min to 2 hours in most cases, it
> depends on how fast
>     the build index job is completed.
>
>
> ------------------------
> With warm regard
> Xiaoxiang Yu
>
>
>
> On Thu, Nov 2, 2023 at 2:10 PM Nam Đỗ Duy <na...@vnpay.vn.invalid> wrote:
>
> > Hi Xiaoxiang
> >
> > My case is not in date range and I need to do daily.
> >
> > 1. How can I automate the build index daily for newest data?
> >
> > 2. Can I apply the above automate process for near realtime or realtime
> > data (load realtime data from Hive into new index/segment)
> >
> > Thank you very much for your help
> >
> >
> > On Thu, 2 Nov 2023 at 12:58 Xiaoxiang Yu <x...@apache.org> wrote:
> >
> > > If the new data 's date range is covered by a segment in your model,
> you
> > > should refresh your existing segment, refer to :
> > >
> > >
> > >
> >
> https://kylin.apache.org/5.0/docs/modeling/load_data/segment_operation_settings/intro#segment-operation
> > > .
> > >
> > > If not, create a new segment and build index, refer to :
> > >   https://kylin.apache.org/5.0/docs/modeling/load_data/by_date
> > >
> > > ------------------------
> > > With warm regard
> > > Xiaoxiang Yu
> > >
> > >
> > >
> > > On Thu, Nov 2, 2023 at 11:57 AM Nam Đỗ Duy <na...@vnpay.vn.invalid>
> > wrote:
> > >
> > > > Thank you XiangXiao, I  still have 1 question as follows:
> > > >
> > > > When the Hive Datasource to be added with new data, how to reflect
> > those
> > > in
> > > > Cube (index) and query result?
> > > >
> > > >
> > > > On Thu, Nov 2, 2023 at 10:00 AM Xiaoxiang Yu <x...@apache.org>
> wrote:
> > > >
> > > > > Congratulations, hope you will make good use of the ability of
> Kylin
> > 5
> > > > for
> > > > > your use cases.
> > > > >
> > > > >
> > > > > ------------------------
> > > > > With warm regard
> > > > > Xiaoxiang Yu
> > > > >
> > > > >
> > > > >
> > > > > On Thu, Nov 2, 2023 at 10:50 AM Nam Đỗ Duy <na...@vnpay.vn.invalid
> >
> > > > wrote:
> > > > >
> > > > >> The query is too fast, less than a second, can you make it a
> little
> > > bit
> > > > >> slower so that I can see it clearly 😀😀
> > > > >> [image: image.png]
> > > > >>
> > > > >> On Thu, Nov 2, 2023 at 9:32 AM Nam Đỗ Duy <na...@vnpay.vn> wrote:
> > > > >>
> > > > >>> Thank you Xiaoxiang for the guideline. Will definitely read
> > > > >>> it carefully. Kindly help the following questions:
> > > > >>>
> > > > >>> 1. Computed column
> > > > >>>
> > > > >>> I created a “computed column” and add it to dimensions (among
> other
> > > > >>> dimensions)
> > > > >>>
> > > > >>> When I use query to select the computed column it returned error
> > > > >>>
> > > > >>> 2. Datatype optimization: will you think that the int be better
> > than
> > > > >>> string for key join columns?
> > > > >>>
> > > > >>> Please advise
> > > > >>>
> > > > >>>
> > > > >>> On Wed, 1 Nov 2023 at 17:32 Xiaoxiang Yu <x...@apache.org>
> wrote:
> > > > >>>
> > > > >>>> Yes, that is almost correct.
> > > > >>>>
> > > > >>>> If you have a lot of complex queries, and you want to using
> Kylin
> > 5
> > > to
> > > > >>>> accelerate them, the recommended steps of mine are as follows:
> > > > >>>>
> > > > >>>> 1. You analyse all queries and collect all join
> relation/pattern.
> > > > >>>> 2. You create Models for each specific join relation/pattern,
> with
> > > the
> > > > >>>> join
> > > > >>>> relation you find in above step.
> > > > >>>> 3. You analyse and collect dimensions and measures from all
> > queries,
> > > > and
> > > > >>>> add them to the corresponding Model.
> > > > >>>> 4. You build segments of all Models with proper data range.
> > > > >>>> 5. You turned off the pushdown switch, and sent all queries to
> > > Kylin.
> > > > If
> > > > >>>> there are some queries which failed, fix them.
> > > > >>>>     Here are some common situations.
> > > > >>>>     5.1 Join relation/pattern is not matched
> > > > >>>>     5.2 If the join relation is matched, the Model might not
> > contain
> > > > >>>> every
> > > > >>>> column that your query needs, please check kylin.query.log with
> > > > keyword
> > > > >>>> '
> > > > >>>> unmatched'.
> > > > >>>> 6. (Optional) If you find some of your queries do not exactly
> > match
> > > > with
> > > > >>>> your Index(your query on [colA, colB], but your index contains
> > more
> > > > >>>> columns
> > > > >>>> than colA and colB), you can add some aggregate groups(or
> smaller
> > > > Table
> > > > >>>> Index) to optimize the query performance.
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> ------------------------
> > > > >>>> With warm regard
> > > > >>>> Xiaoxiang Yu
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>>> On Wed, Nov 1, 2023 at 5:57 PM Nam Đỗ Duy
> <na...@vnpay.vn.invalid
> > >
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>> > Thank you Xiaoxiang, I nearly got to the point.
> > > > >>>> >
> > > > >>>> > So can I interpret that: 1 model equal (~) to a set of Joins
> of
> > > > >>>> (Dim/Fact)
> > > > >>>> > table, that is to say we need to create several models
> according
> > > to
> > > > >>>> > multiple kinds of joins queries?
> > > > >>>> >
> > > > >>>> > Best regards
> > > > >>>> >
> > > > >>>> > On Wed, Nov 1, 2023 at 4:50 PM Xiaoxiang Yu <x...@apache.org>
> > > > wrote:
> > > > >>>> >
> > > > >>>> >> Have you ever tried to analyse the reason why your query can
> > not
> > > > hit
> > > > >>>> >> Model 'sample_ssb'?
> > > > >>>> >> It is because the join relation of your query is not suitable
> > for
> > > > the
> > > > >>>> >> join relation/pattern of  Model 'sample_ssb'.
> > > > >>>> >>
> > > > >>>> >> Your query used a join relation/pattern like: A inner join B.
> > > > >>>> >> But the Model 'sample_ssb' used a join relation/pattern like
> :
> > A
> > > > >>>> inner
> > > > >>>> >> join B inner join C.
> > > > >>>> >>
> > > > >>>> >> If you are familiar with the definition of Inner join, you
> may
> > > know
> > > > >>>> that
> > > > >>>> >> the
> > > > >>>> >> relation/pattern 'A inner join B inner join C' will have a
> > chance
> > > > >>>> >> to lose some rows when compared to pattern 'A inner join B'.
> > > > >>>> >> So the Model 'sample_ssb' will be excluded to serve your
> query.
> > > > >>>> >>
> > > > >>>> >> That is to say, you need to create a new model that is
> similar
> > to
> > > > >>>> Model
> > > > >>>> >> 'sample_ssb',
> > > > >>>> >>  but with additional tables removed.
> > > > >>>> >>
> > > > >>>> >>
> > > > >>>> >>
> > > > >>>> >> ------------------------
> > > > >>>> >> With warm regard
> > > > >>>> >> Xiaoxiang Yu
> > > > >>>> >>
> > > > >>>> >>
> > > > >>>> >>
> > > > >>>> >> On Wed, Nov 1, 2023 at 5:21 PM Nam Đỗ Duy
> > <na...@vnpay.vn.invalid
> > > >
> > > > >>>> wrote:
> > > > >>>> >>
> > > > >>>> >>> Hi Xiaoxiang,
> > > > >>>> >>>
> > > > >>>> >>> Thank you very much
> > > > >>>> >>>
> > > > >>>> >>> I have clearer picture of Kylin already thanks to your
> > > > explanation.
> > > > >>>> >>>
> > > > >>>> >>> Now back to the sample project of SSB in attached photo,
> when
> > I
> > > > run
> > > > >>>> this
> > > > >>>> >>> query with push_down option OFF, why the OLAP error appears,
> > and
> > > > in
> > > > >>>> such
> > > > >>>> >>> case, how to create a new cube for this query?
> > > > >>>> >>>
> > > > >>>> >>> [image: image.png]
> > > > >>>> >>>
> > > > >>>> >>> On Wed, Nov 1, 2023 at 3:49 PM Xiaoxiang Yu <
> x...@apache.org>
> > > > >>>> wrote:
> > > > >>>> >>>
> > > > >>>> >>>> Here is some of my explanation and it may not be perfect.
> > > > >>>> >>>> Segment in Kylin is part of model/cube pre-computed data,
> in
> > > most
> > > > >>>> >>>> cases, divided by date column.
> > > > >>>> >>>>
> > > > >>>> >>>> Here is some difference between Segment and Snapshot.
> > > > >>>> >>>> Segment, whose source data comes from one fact table joins
> > some
> > > > >>>> dimension
> > > > >>>> >>>> tables with 'specific date range', is 'precomputed', and
> will
> > > > >>>> accelerate
> > > > >>>> >>>> complex query.
> > > > >>>> >>>> Snapshot, whose source data comes from one specific
> dimension
> > > > >>>> table without
> > > > >>>> >>>> specific date range, is "not precomputed", and can join
> with
> > > > >>>> segments
> > > > >>>> >>>> at runtime .
> > > > >>>> >>>>
> > > > >>>> >>>> -
> > > https://kylin.apache.org/5.0/docs/snapshot/snapshot_management
> > > > >>>> >>>> -
> > > > >>>> >>>>
> > > > >>>>
> > > >
> > >
> >
> https://kylin.apache.org/5.0/docs/modeling/load_data/segment_operation_settings/intro
> > > > >>>> >>>>
> > > > >>>> >>>> ------------------------
> > > > >>>> >>>> With warm regard
> > > > >>>> >>>> Xiaoxiang Yu
> > > > >>>> >>>>
> > > > >>>> >>>>
> > > > >>>> >>>>
> > > > >>>> >>>> On Wed, Nov 1, 2023 at 3:53 PM Nam Đỗ Duy <na...@vnpay.vn>
> > > > wrote:
> > > > >>>> >>>>
> > > > >>>> >>>>> Thank you again, very smart of you to automatically select
> > > cube
> > > > >>>> for a
> > > > >>>> >>>>> certain query. Sorry If I ask too much: Is the concept of
> > > > Segment
> > > > >>>> in Kylin
> > > > >>>> >>>>> model similar to Slice-and-Dice concept of Cube, what is
> the
> > > > >>>> different
> > > > >>>> >>>>> between Kylin Segment and Kylin Snapshot?
> > > > >>>> >>>>>
> > > > >>>> >>>>> PS. I sent you the log files for your help in
> investigating
> > > why
> > > > my
> > > > >>>> >>>>> cube has not been used.
> > > > >>>> >>>>>
> > > > >>>> >>>>> On Wed, Nov 1, 2023 at 2:36 PM Xiaoxiang Yu <
> > x...@apache.org>
> > > > >>>> wrote:
> > > > >>>> >>>>>
> > > > >>>> >>>>>> I guess there is a misunderstanding from your sentences.
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> -- 'I need to select Cube from a combo box below the
> query
> > > > >>>> window'
> > > > >>>> >>>>>> It is not right to use 'need', that combo box is for some
> > > > >>>> specific
> > > > >>>> >>>>>> cases(for example, Kylin did not choose a cube which is
> the
> > > > most
> > > > >>>> >>>>>> efficient), not the most cases.
> > > > >>>> >>>>>> In most cases(both for Kylin 4 and Kylin 5), you don't
> need
> > > to
> > > > >>>> select
> > > > >>>> >>>>>> a Cube in the combo box, Kylin will do the choice for
> you.
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> ------------------------
> > > > >>>> >>>>>> With warm regard
> > > > >>>> >>>>>> Xiaoxiang Yu
> > > > >>>> >>>>>>
> > > > >>>> >>>>>>
> > > > >>>> >>>>>>
> > > > >>>> >>>>>> On Wed, Nov 1, 2023 at 3:24 PM Nam Đỗ Duy
> > > > <na...@vnpay.vn.invalid
> > > > >>>> >
> > > > >>>> >>>>>> wrote:
> > > > >>>> >>>>>>
> > > > >>>> >>>>>>> Hi Xiaoxiang, sorry if I made you confused (Anyway, it
> is
> > > > just a
> > > > >>>> >>>>>>> question of a beginner)
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>> "obviously" means "clearly"
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>> because I need to select Cube from a combo box below the
> > > query
> > > > >>>> window
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>> Thank you very much
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>> On Wed, Nov 1, 2023 at 2:20 PM Xiaoxiang Yu <
> > > x...@apache.org>
> > > > >>>> wrote:
> > > > >>>> >>>>>>>
> > > > >>>> >>>>>>>> From my side, I cannot understand why you say Kylin 4
> is
> > > > 'very
> > > > >>>> >>>>>>>> obviously'. Can you give an example?
> > > > >>>> >>>>>>>> From the source code, the basic logic of choosing the
> > right
> > > > >>>> >>>>>>>> cube/model are similar.
> > > > >>>> >>>>>>>> ------------------------
> > > > >>>> >>>>>>>> With warm regard
> > > > >>>> >>>>>>>> Xiaoxiang Yu
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>> On Wed, Nov 1, 2023 at 3:01 PM Nam Đỗ Duy <
> > na...@vnpay.vn>
> > > > >>>> wrote:
> > > > >>>> >>>>>>>>
> > > > >>>> >>>>>>>>> Thank you for your kind reply, please answer 1 more
> > > question
> > > > >>>> about
> > > > >>>> >>>>>>>>> version 5:
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> In version 4.x we run query against a Cube very
> > obviously,
> > > > >>>> but in
> > > > >>>> >>>>>>>>> version 5, the cube usage is a implication socan you
> > > advise:
> > > > >>>> for a given
> > > > >>>> >>>>>>>>> query, which model will be used, which index (cube)
> will
> > > be
> > > > >>>> used for this
> > > > >>>> >>>>>>>>> query?
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> Thank you
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>> On Wed, Nov 1, 2023 at 1:42 PM Xiaoxiang Yu <
> > > > x...@apache.org>
> > > > >>>> >>>>>>>>> wrote:
> > > > >>>> >>>>>>>>>
> > > > >>>> >>>>>>>>>> 1. How do I measure the size of the index (cube) in
> > > version
> > > > >>>> 5?
> > > > >>>> >>>>>>>>>>    You can check storage of specific Indexes from the
> > > Index
> > > > >>>> page.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>>
> > > > >>>>
> > > >
> > >
> >
> https://kylin.apache.org/5.0/docs/modeling/model_design/aggregation_group#view-aggregate-index
> > > > >>>> >>>>>>>>>> or
> > > > >>>> >>>>>>>>>>
> > > > >>>>
> > > >
> > >
> >
> https://kylin.apache.org/5.0/assets/images/index_1-6ad3f55183d4ed61962359d9408ba192.png
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> 2. How to create the cardinality for each column?
> > > > >>>> >>>>>>>>>>    You should check this link :
> > > > >>>> >>>>>>>>>>
> > > > https://kylin.apache.org/5.0/docs/datasource/data_sampling/
> > > > >>>> .
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> 3. In your default project sample named SSB project,
> > you
> > > > have
> > > > >>>> >>>>>>>>>> only 4 simple aggregate group index and no table
> index
> > as
> > > > in
> > > > >>>> attached file
> > > > >>>> >>>>>>>>>> so what is the best strategy to select index for our
> > > OLAP?
> > > > >>>> >>>>>>>>>>     1. There does exist a 'Base Table Index'  by
> > default
> > > > >>>> >>>>>>>>>> actually, its id is 20000000001.
> > > > >>>> >>>>>>>>>>     2. I think it is a good question and Kylin 5
> lacks
> > > > such a
> > > > >>>> >>>>>>>>>> guide for better modeling. You are free to ask your
> > > > question
> > > > >>>> to
> > > > >>>> >>>>>>>>>> mailing list and I will try to reply.
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> ------------------------
> > > > >>>> >>>>>>>>>> With warm regard
> > > > >>>> >>>>>>>>>> Xiaoxiang Yu
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>> On Wed, Nov 1, 2023 at 2:12 PM Xiaoxiang Yu <
> > > > x...@apache.org
> > > > >>>> >
> > > > >>>> >>>>>>>>>> wrote:
> > > > >>>> >>>>>>>>>>
> > > > >>>> >>>>>>>>>>> OK, I didn't read all the mail history so I
> > > misunderstand
> > > > >>>> the
> > > > >>>> >>>>>>>>>>> situation. Looks like you need to analyse
> > > > >>>> >>>>>>>>>>> the cause why the query didn't hit the cube
> correctly.
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> Please generate query diagnosis package and send it
> to
> > > me
> > > > >>>> >>>>>>>>>>> privately. I will analyse the query log.
> > > > >>>> >>>>>>>>>>> You can refer to the following steps in screenshots.
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> [image: image.png]
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> If the screenshots are not displaying correctly,
> > please
> > > > read
> > > > >>>> >>>>>>>>>>> this guide :
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>
> > > > >>>>
> > > >
> > >
> >
> https://kylin.apache.org/5.0/docs/operations/system-operation/diagnosis/#generate-query-diagnosis-package-in-web-ui
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> By the way, you need to analyse the cause by reading
> > > > >>>> >>>>>>>>>>> kylin.query.log, not the kylin.log,
> > > > >>>> >>>>>>>>>>> refer to
> > > > >>>> >>>>>>>>>>>
> > > > >>>> https://kylin.apache.org/5.0/docs/operations/logs/system_log
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> ------------------------
> > > > >>>> >>>>>>>>>>> With warm regard
> > > > >>>> >>>>>>>>>>> Xiaoxiang Yu
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>> On Wed, Nov 1, 2023 at 12:18 PM Nam Đỗ Duy <
> > > > na...@vnpay.vn>
> > > > >>>> >>>>>>>>>>> wrote:
> > > > >>>> >>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> Thank you Xiaoxiang for your advice. As my title
> > email
> > > > >>>> shown, I
> > > > >>>> >>>>>>>>>>>> guessed that the OLAP functionalities has not been
> > > > >>>> correctly set up in my
> > > > >>>> >>>>>>>>>>>> computer.
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> The evidence about it is that: when I disable the
> > > > Pushdown
> > > > >>>> >>>>>>>>>>>> option box to use solely the precomputation cube
> > only,
> > > it
> > > > >>>> showed following
> > > > >>>> >>>>>>>>>>>> error: Please kindly advise how to properly build
> the
> > > > OLAP
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> LIMIT 500": No realization found for OLAPContext,
> > > > >>>> MODEL_UNMATCHED_JOIN,
> > > > >>>>
> > > rel#2240:KapTableScan.OLAP.[](table=[VNEVENT_HIVE_DWH_400MILLION_ROWS,
> > > > >>>> FACTUSEREVENT],ctx=0@null,fields=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
> > 10,
> > > > >>>> 11, 12, 13, 14, 15, 16, 17, 18, 19, 20])
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>> On Wed, Nov 1, 2023 at 10:40 AM Xiaoxiang Yu <
> > > > >>>> x...@apache.org>
> > > > >>>> >>>>>>>>>>>> wrote:
> > > > >>>> >>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> Hi,
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>     Yesterday, I tried to see if query pushdown
> > > > functions
> > > > >>>> work
> > > > >>>> >>>>>>>>>>>>> well in the Kylin5 docker, and all of my queries
> > > return
> > > > >>>> proper responses .
> > > > >>>> >>>>>>>>>>>>>     After checking your logs from Shaofeng, I
> found
> > > > these
> > > > >>>> >>>>>>>>>>>>> error messages repeated many times:
> > > > >>>> >>>>>>>>>>>>>     1. 'java.io.IOException: All datanodes
> > > > >>>> >>>>>>>>>>>>> DatanodeInfoWithStorage[127.0.0.1:9866
> > > > >>>> ,DS-5093899b-06c7-4386-95d5-6fc271d92b52,DISK]
> > > > >>>> >>>>>>>>>>>>> are bad. Aborting...'
> > > > >>>> >>>>>>>>>>>>>     2. 'curator.ConnectionState : Connection timed
> > out
> > > > for
> > > > >>>> >>>>>>>>>>>>> connection string (localhost:2181) and timeout
> > > (15000) /
> > > > >>>> elapsed (41794)
> > > > >>>> >>>>>>>>>>>>> org.apache.curator.CuratorConnectionLossException:
> > > > >>>> >>>>>>>>>>>>> KeeperErrorCode = ConnectionLoss'
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>     I guess the root cause is that the container
> > > didn't
> > > > >>>> not
> > > > >>>> >>>>>>>>>>>>> have enough resources. I found you query on a
> table
> > > > called
> > > > >>>> >>>>>>>>>>>>> 'XXX_hive_dwh_400million_rows', looks like you
> gave
> > a
> > > > >>>> complex query on a
> > > > >>>> >>>>>>>>>>>>> table which contains 400 million rows?
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>     Since I am the uploader of kylin5 's docker
> > > image, I
> > > > >>>> want
> > > > >>>> >>>>>>>>>>>>> to give some explainment. Kylin5 docker is not a
> > place
> > > > >>>> for performance
> > > > >>>> >>>>>>>>>>>>> benchmarks, it is only for demonstration. It is
> only
> > > > >>>> allocated with very
> > > > >>>> >>>>>>>>>>>>> little resources(8G memory) if you are using the
> > > default
> > > > >>>> command from
> > > > >>>> >>>>>>>>>>>>> docker hub page. Before I uploaded my image, I
> only
> > > > >>>> tested my image using
> > > > >>>> >>>>>>>>>>>>> the ssb dataset, which the biggest table only
> > contains
> > > > >>>> about 60k rows. If
> > > > >>>> >>>>>>>>>>>>> you are using a larger dataset and complexer
> > queries,
> > > > you
> > > > >>>> have to scale the
> > > > >>>> >>>>>>>>>>>>> resource properly. Try querying tables which
> contain
> > > not
> > > > >>>> more than 100k
> > > > >>>> >>>>>>>>>>>>> rows by default.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>     Here are some tips which may help you to check
> > if
> > > > the
> > > > >>>> >>>>>>>>>>>>> daemon service is in health status and
> > > > >>>> resources(particularly disk space)
> > > > >>>> >>>>>>>>>>>>> is configured properly.
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>     1. Checking HDFS 's web ui(
> > > > >>>> >>>>>>>>>>>>> http://localhost:9870/dfshealth.html#tab-datanode
> )
> > > to
> > > > >>>> >>>>>>>>>>>>> confirm whether HDFS service is in 'In service'
> > > status.
> > > > >>>> >>>>>>>>>>>>>     2. Checking Datanode 's log in
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>>
> `/opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log`,
> > > > check if
> > > > >>>> >>>>>>>>>>>>> there is any error message. Like: cat
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> /opt/hadoop-3.2.1/logs/hadoop-root-datanode-Kylin5-Machine.log |
> > > grep
> > > > ERROR
> > > > >>>> >>>>>>>>>>>>> | wc -l
> > > > >>>> >>>>>>>>>>>>>     3. Checking if your docker engine is
> configured
> > > with
> > > > >>>> >>>>>>>>>>>>> enough disk space, if you are using Docker Desktop
> > > like
> > > > >>>> me,please go to
> > > > >>>> >>>>>>>>>>>>> "Settings" - "Resources" - "Advanced", make sure
> you
> > > > have
> > > > >>>> allocated 40GB+
> > > > >>>> >>>>>>>>>>>>> disk space to the docker container.
> > > > >>>> >>>>>>>>>>>>>     4. Checking the available disk space of your
> > > > >>>> container by
> > > > >>>> >>>>>>>>>>>>> `df -h`, make sure the 'Use%' of 'overlay' is less
> > > than
> > > > >>>> 60% .
> > > > >>>> >>>>>>>>>>>>>     5. Checking the load average/ cpu usage/ jvm
> gc.
> > > > Make
> > > > >>>> sure
> > > > >>>> >>>>>>>>>>>>> these metrics are not really high when you send a
> > > query.
> > > > >>>> >>>>>>>>>>>>> ------------------------
> > > > >>>> >>>>>>>>>>>>> With warm regard
> > > > >>>> >>>>>>>>>>>>> Xiaoxiang Yu
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>> On Tue, Oct 31, 2023 at 5:13 PM Nam Đỗ Duy
> > > > >>>> >>>>>>>>>>>>> <na...@vnpay.vn.invalid> wrote:
> > > > >>>> >>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>> Hi ShaoFeng
> > > > >>>> >>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>> Thank you very much for your valuable feedback
> > > > >>>> >>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>> I saw the application to be there (if I see it
> > right)
> > > > as
> > > > >>>> in
> > > > >>>> >>>>>>>>>>>>>> the attachment photo. Kindly advise so that I can
> > run
> > > > >>>> this query on OLAP.
> > > > >>>> >>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>> PS. I sent you the log file in private.
> > > > >>>> >>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>> [image: image.png]
> > > > >>>> >>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>> On Tue, Oct 31, 2023 at 3:11 PM ShaoFeng Shi <
> > > > >>>> >>>>>>>>>>>>>> shaofeng...@apache.org> wrote:
> > > > >>>> >>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>> Can you provide the messages in logs/kylin.log
> > when
> > > > >>>> >>>>>>>>>>>>>>> executing the SQL? and you can also check the
> > Spark
> > > UI
> > > > >>>> from yarn resource
> > > > >>>> >>>>>>>>>>>>>>> manager (there should be one running application
> > > > called
> > > > >>>> Spardar, which is
> > > > >>>> >>>>>>>>>>>>>>> Kylin's backend spark application). If the
> > > application
> > > > >>>> is not there, it may
> > > > >>>> >>>>>>>>>>>>>>> indicates the yarn doesn't have resource to
> > startup
> > > > it.
> > > > >>>> >>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>> Best regards,
> > > > >>>> >>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>> Shaofeng Shi 史少锋
> > > > >>>> >>>>>>>>>>>>>>> Apache Kylin PMC,
> > > > >>>> >>>>>>>>>>>>>>> Apache Incubator PMC,
> > > > >>>> >>>>>>>>>>>>>>> Email: shaofeng...@apache.org
> > > > >>>> >>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>> Apache Kylin FAQ:
> > > > >>>> >>>>>>>>>>>>>>>
> > > https://kylin.apache.org/docs/gettingstarted/faq.html
> > > > >>>> >>>>>>>>>>>>>>> Join Kylin user mail group:
> > > > >>>> user-subscr...@kylin.apache.org
> > > > >>>> >>>>>>>>>>>>>>> Join Kylin dev mail group:
> > > > >>>> dev-subscr...@kylin.apache.org
> > > > >>>> >>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>> Nam Đỗ Duy <na...@vnpay.vn> 于2023年10月31日周二
> > 10:35写道:
> > > > >>>> >>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>> Dear Sir/Madam,
> > > > >>>> >>>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>> I have a fact with 500million rows then I build
> > > > model,
> > > > >>>> >>>>>>>>>>>>>>>> index according to the website help.
> > > > >>>> >>>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>> I chose full incremental because this is the
> > first
> > > > >>>> times I
> > > > >>>> >>>>>>>>>>>>>>>> load data
> > > > >>>> >>>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>> I create both index types Aggregate group
> index,
> > > > table
> > > > >>>> >>>>>>>>>>>>>>>> index as photo attached.
> > > > >>>> >>>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>> But the query always failed after timeout of
> 300
> > > > >>>> seconds (I
> > > > >>>> >>>>>>>>>>>>>>>> run in docker), I dont want to increase the
> value
> > > of
> > > > >>>> 300 seconds because I
> > > > >>>> >>>>>>>>>>>>>>>> wish the OLAP can run within 1 minutes (is that
> > > > >>>> possible?)
> > > > >>>> >>>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>> It seems that the OLAP function in indexing not
> > > > >>>> working to
> > > > >>>> >>>>>>>>>>>>>>>> speedup the query by precomputed cube.
> > > > >>>> >>>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>> Can you advise to check whether the index did
> > > really
> > > > >>>> work?
> > > > >>>> >>>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>> It is quite urgent task for me so prompt
> response
> > > is
> > > > >>>> highly
> > > > >>>> >>>>>>>>>>>>>>>> appreciated.
> > > > >>>> >>>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>> Thank you very much
> > > > >>>> >>>>>>>>>>>>>>>>
> > > > >>>> >>>>>>>>>>>>>>>
> > > > >>>>
> > > > >>>
> > > >
> > >
> >
>

Reply via email to