[ https://issues.apache.org/jira/browse/KYLIN-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guangyuan Feng resolved KYLIN-5564.
-----------------------------------
    Resolution: Fixed

> Introduce Bloom Filter to optimize data scanning based on Spark
> ---------------------------------------------------------------
>
>                 Key: KYLIN-5564
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5564
>             Project: Kylin
>          Issue Type: Improvement
>          Components: Query Engine
>    Affects Versions: 5.0-alpha
>            Reporter: Guangyuan Feng
>            Assignee: Guangyuan Feng
>            Priority: Major
>             Fix For: 5.0-beta
>
>         Attachments: RowGroup BloomFilter 场景介绍和性能测试.pdf
>
>
> Currently, all the data generated by Kylin is saved as *Parquet* files
> through Spark, but Kylin has not made full use of Parquet's features
> when scanning data. Among them, BloomFilter deserves particular emphasis,
> because it's the most common tool for helping *READERs* skip useless data.
> Therefore, we introduced an approach that builds *BloomFilter* automatically,
> conditionally and smartly when constructing segments, on the desired columns,
> chosen mainly according to the query history.
> With BloomFilter in place, Spark shows a good performance improvement
> in most cases.
>
> _For benchmarks and performance tests, please read the attached PDF, which
> is the test report on SSB._
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
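
For readers unfamiliar with the idea, here is a minimal sketch of how a per-row-group Bloom filter lets a reader skip data for an equality predicate. It is not Kylin's or Parquet's implementation: the class `BloomFilter`, the helpers `build_filters` and `row_groups_to_scan`, the filter size, and the sample values are all hypothetical and chosen only for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: never a false negative, sometimes a false positive."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        return all(self.bits[pos] for pos in self._positions(value))


def build_filters(row_groups):
    """Build one filter per row group (here, a row group is a list of column values)."""
    filters = []
    for values in row_groups:
        bf = BloomFilter()
        for v in values:
            bf.add(v)
        filters.append(bf)
    return filters


def row_groups_to_scan(filters, wanted):
    """Return indexes of row groups that may hold `wanted`; the rest can be skipped."""
    return [i for i, bf in enumerate(filters) if bf.might_contain(wanted)]


# Hypothetical column values, split into three row groups.
row_groups = [
    ["cust-001", "cust-002"],
    ["cust-101", "cust-102"],
    ["cust-201", "cust-202"],
]
filters = build_filters(row_groups)
candidates = row_groups_to_scan(filters, "cust-102")
```

The row group that really holds the value is always among `candidates`; other row groups appear only on a false positive, which is why the filter can prune scans without ever dropping matching rows. In Spark, Parquet Bloom filters can be switched on per column through writer options such as `parquet.bloom.filter.enabled#<column>`; whether the change in this issue relies on that option is not stated here.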