Hi ShaoFeng,

Thanks for your valuable suggestions. I will apply them to the model and let the Kylin team know about the improvement.
Email- [email protected]

with regards,
Sonu Kumar Singh

On Mon, Aug 21, 2023 at 2:48 PM ShaoFeng Shi <[email protected]> wrote:
> Hello Sonu,
>
> In Kylin5, you can add multiple raw indexes to a data model. For each raw
> index, you can add the columns you want, specify the column sequence (put
> filter columns ahead of non-filter columns), and specify a "ShardBy" column
> (this should be a high-cardinality column; it is used to distribute data
> into different shards/buckets/files). Kylin will then sort, distribute and
> index the data the way you specified, which benefits query performance.
>
> For the case you mentioned, "a table with 118 columns, and a filter query
> can run on any column", I think it is difficult to achieve sub-second
> performance with reasonable resources. Some suggestions:
>
> 1) Build the data incrementally into multiple segments instead of in one
> full build; Kylin then keeps segment-level indexes, which can help with
> high-level pruning.
> 2) Add and build multiple raw indexes, but don't add too many, as they
> take up a lot of space.
> 3) In each raw index, carefully choose the column sequence and the "ShardBy"
> column, so that Spark can prune as many files as possible.
>
> Other developers may have further input. Hope this helps.
>
> Best regards,
>
> Shaofeng Shi 史少锋
> Apache Kylin PMC,
> Apache Incubator PMC,
> Email: [email protected]
>
> Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
> Join Kylin user mail group: [email protected]
> Join Kylin dev mail group: [email protected]
>
>
> On Mon, Aug 21, 2023 at 16:33, Singh Sonu <[email protected]> wrote:
>
> > Hi Kylin Team,
> >
> > How do I use the inverted and sorted indexes in Kylin 5?
> > Queries against aggregated data run fast, but when I run a query against
> > raw data, or a search query, Kylin 5 is slow.
> > I created a model on a table with 118 columns, and a filter query can run
> > on any column.
> >
> > Please suggest.
> >
> > You can reach me at
> > Mb. No- 7092292112
> > Email- [email protected]
> >
> > with regards,
> > Sonu Kumar Singh
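To make the pruning idea in the suggestions above concrete, here is a toy Python model. It is NOT Kylin's storage format or API; `build_raw_index`, `query_eq`, `rows_per_file` and the sample columns are hypothetical names chosen for illustration. It assumes rows are hash-distributed by the ShardBy column, sorted within each shard by the index's column sequence, cut into files, and that each file keeps min/max statistics per indexed column. An equality filter on the ShardBy column then reads only one shard, a filter on the leading sort column skips files whose min/max range excludes the value, and a filter on a column outside the index has to read everything. Segment-level pruning (suggestion 1) works on the same min/max principle, one level higher.

```python
from dataclasses import dataclass
from zlib import crc32


@dataclass
class DataFile:
    """One hypothetical data file of a raw index: rows plus per-column min/max."""
    shard: int
    rows: list
    stats: dict  # column name -> (min, max) over the rows in this file


def shard_of(value, num_shards):
    # Stable hash, so equal ShardBy values always land in the same shard.
    return crc32(str(value).encode("utf-8")) % num_shards


def build_raw_index(rows, column_sequence, shard_by, num_shards, rows_per_file):
    """Distribute rows by `shard_by`, sort each shard by `column_sequence`,
    then cut each shard into files and record min/max per indexed column."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[shard_of(row[shard_by], num_shards)].append(row)

    files = []
    for shard_id, shard_rows in enumerate(shards):
        # Sorting clusters equal values of the leading column into few files.
        shard_rows.sort(key=lambda r: tuple(r[c] for c in column_sequence))
        for start in range(0, len(shard_rows), rows_per_file):
            chunk = shard_rows[start:start + rows_per_file]
            stats = {c: (min(r[c] for r in chunk), max(r[c] for r in chunk))
                     for c in column_sequence}
            files.append(DataFile(shard_id, chunk, stats))
    return files


def query_eq(files, column, value, shard_by, num_shards):
    """Return (matching rows, files actually read) for `column == value`."""
    candidates = files
    if column == shard_by:
        # Shard pruning: only one shard can hold this value.
        target = shard_of(value, num_shards)
        candidates = [f for f in files if f.shard == target]
    read, matches = [], []
    for f in candidates:
        lo, hi = f.stats.get(column, (None, None))
        if lo is not None and not (lo <= value <= hi):
            continue  # min/max pruning: this file cannot contain the value
        read.append(f)
        matches.extend(r for r in f.rows if r[column] == value)
    return matches, read


# Hypothetical sample data: "user_id" is the high-cardinality ShardBy column,
# "city" is the leading filter column in the index's column sequence.
rows = [{"user_id": i % 200, "city": f"c{i % 10}", "amount": i} for i in range(1000)]
files = build_raw_index(rows, ["city", "user_id"], shard_by="user_id",
                        num_shards=4, rows_per_file=25)

for col, val in [("city", "c3"), ("user_id", 42), ("amount", 7)]:
    found, read = query_eq(files, col, val, "user_id", 4)
    print(f"{col} == {val!r}: {len(found)} rows, read {len(read)} of {len(files)} files")
```

The filter on `amount`, which is not part of the index, reads every file, which mirrors why a 118-column table filtered on arbitrary columns is hard to serve quickly: only the columns placed early in some raw index, or chosen as ShardBy, benefit from pruning.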
