Hello Sonu,

In Kylin 5, you can add multiple raw indexes to a data model. For each raw
index, you can choose the columns you want, specify the column sequence (put
the filter columns ahead of the non-filter columns), and specify a "ShardBy"
column (which should be a high-cardinality column; it is used to distribute
the data into different shards/buckets/files). Kylin will then sort,
distribute and index the data in the way you specified, which benefits query
performance.
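
To make the idea in the paragraph above concrete, here is a small toy sketch in
Python. It is not Kylin's implementation; the shard count, the modulo rule, the
column names (user_id, event_day) and the helper functions are all assumptions
made up for illustration. It only shows why an equality filter on the "ShardBy"
column, or a range filter on the leading sort column, lets the engine skip
files.

```python
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_of(value, num_shards=NUM_SHARDS):
    """Assign a ShardBy value to a shard (toy rule: integer modulo)."""
    return value % num_shards


def build_index(rows, shard_by, sort_by):
    """Distribute rows into shards by `shard_by`, sort each shard by `sort_by`,
    and record the min/max of the sort column for each shard file."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row[shard_by])].append(row)
    files = {}
    for shard_id, shard_rows in shards.items():
        shard_rows.sort(key=lambda r: r[sort_by])
        files[shard_id] = {
            "rows": shard_rows,
            "min": shard_rows[0][sort_by],
            "max": shard_rows[-1][sort_by],
        }
    return files


def files_to_scan(files, shard_by_value=None, sort_range=None):
    """Return the shard ids that a filter still has to read."""
    selected = []
    for shard_id, f in sorted(files.items()):
        if shard_by_value is not None and shard_of(shard_by_value) != shard_id:
            continue  # equality filter on the ShardBy column: other shards cannot match
        if sort_range is not None:
            lo, hi = sort_range
            if f["max"] < lo or f["min"] > hi:
                continue  # min/max shows no row in this file can match
        selected.append(shard_id)
    return selected


# Hypothetical data: 100 rows with made-up columns.
rows = [{"user_id": i, "event_day": i % 7} for i in range(100)]
files = build_index(rows, shard_by="user_id", sort_by="event_day")

print(files_to_scan(files))                     # no usable filter: all files are read
print(files_to_scan(files, shard_by_value=42))  # filter on ShardBy column: one file is read
```

A filter on a column that is neither the "ShardBy" column nor near the front of
the sort order gets no help from this layout, which is why one index cannot
serve filters on all 118 columns equally well.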

For the case you mentioned, "a table with 118 columns, and a filter query
can run on any column", I think it is difficult to achieve sub-second
performance with a reasonable amount of resources. Some suggestions:

1) build the data incrementally into multiple segments, instead of one
whole build; by doing so, Kylin will have a segment-level index, which may
help with high-level pruning;
2) add and build multiple raw indexes, but there is no need to add too many,
as they will take up a lot of space;
3) in each raw index, carefully select the column sequence and the "ShardBy"
column, so that Spark can do as much file pruning as it can.
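
As a rough illustration of suggestion 1, the toy Python sketch below shows how
segment-level pruning can work in principle: if each segment covers a known
date range, a query with a date filter only needs the segments that overlap
it. The segment names, the monthly ranges and the prune_segments helper are
hypothetical and are not Kylin APIs.

```python
from datetime import date

# Hypothetical segments: each covers a half-open date range [start, end).
segments = [
    ("seg_2023_06", date(2023, 6, 1), date(2023, 7, 1)),
    ("seg_2023_07", date(2023, 7, 1), date(2023, 8, 1)),
    ("seg_2023_08", date(2023, 8, 1), date(2023, 9, 1)),
]


def prune_segments(segments, query_start, query_end):
    """Keep only the segments whose range overlaps [query_start, query_end)."""
    return [
        name
        for name, start, end in segments
        if start < query_end and query_start < end
    ]


# A query filtered to a few days in July touches a single segment.
print(prune_segments(segments, date(2023, 7, 15), date(2023, 7, 20)))
```

This only helps queries that filter on the column the segments are
partitioned by; other filters still rely on the raw index layout within each
segment.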

Other developers may have further input. Hope this helps.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC,
Apache Incubator PMC,
Email: shaofeng...@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscr...@kylin.apache.org
Join Kylin dev mail group: dev-subscr...@kylin.apache.org




Singh Sonu <sonusingh.javat...@gmail.com> wrote on Mon, Aug 21, 2023 at 16:33:

> Hi Kylin Team,
>
> How to use the inverted and sorted index in Kylin 5?
> Against aggregation data, queries are working fast, but when I am trying to
> run the query against raw data or a search query, Kylin 5 is not performing
> fast. I created a model against a table with 118 columns, and a filter
> query can run on any column.
>
> Please suggest.
>
>
>
>
>  You can reach me out at
>  Mb. No- 7092292112
>  Email- sonusingh.javat...@gmail.com
>
>  with regards,
>  Sonu Kumar Singh
>
