Guangyuan Feng created KYLIN-5564:
-------------------------------------
Summary: Introduce Bloom Filter to optimize data scanning based on
Spark
Key: KYLIN-5564
URL: https://issues.apache.org/jira/browse/KYLIN-5564
Project: Kylin
Issue Type: Improvement
Components: Query Engine
Affects Versions: 5.0-alpha
Reporter: Guangyuan Feng
Assignee: Guangyuan Feng
Fix For: 5.0-alpha
Currently, all data generated by Kylin is saved as Parquet files through
Spark, but Kylin does not make full use of Parquet's features when scanning
data. Among those features, the Bloom filter deserves particular attention,
because it is a common tool that lets readers skip data that cannot match a
query.
Therefore, we introduce an approach that builds Bloom filters automatically,
conditionally and smartly when constructing segments, on the desired columns,
chosen in particular according to the query history.
With Bloom filters in place, Spark should see a good performance improvement
in most cases.
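To illustrate why a Bloom filter lets a reader skip data, here is a minimal,
self-contained sketch. It is not Kylin's or Parquet's implementation (Parquet
uses split-block Bloom filters with xxHash); the class and method names
(RowGroupBloomSketch, rowGroupsToScan) are hypothetical and exist only for
this example. It assumes one filter per row group for a single column and an
equality predicate.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Toy Bloom filter for one column of one row group (illustration only). */
class RowGroupBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    RowGroupBloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // FNV-1a-style hash over the UTF-8 bytes, with a seed mixed into the basis.
    private static int hash(String value, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return h;
    }

    // Double hashing: the i-th bit position is derived from two base hashes.
    private int index(String value, int i) {
        int h1 = hash(value, 0);
        int h2 = hash(value, 0x9E3779B9);
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Records a column value while the row group is being written. */
    void add(String value) {
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(value, i));
        }
    }

    /** false means the value is definitely absent; true means it may be present. */
    boolean mightContain(String value) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(value, i))) {
                return false;
            }
        }
        return true;
    }

    /** Counts the row groups a reader would still scan for "column = literal". */
    static int rowGroupsToScan(List<RowGroupBloomSketch> filters, String literal) {
        int toScan = 0;
        for (RowGroupBloomSketch filter : filters) {
            if (filter.mightContain(literal)) {
                toScan++;
            }
        }
        return toScan;
    }

    public static void main(String[] args) {
        RowGroupBloomSketch groupA = new RowGroupBloomSketch(1024, 3);
        RowGroupBloomSketch groupB = new RowGroupBloomSketch(1024, 3);
        groupA.add("beijing");
        groupB.add("shanghai");
        int toScan = rowGroupsToScan(Arrays.asList(groupA, groupB), "beijing");
        System.out.println("row groups to scan: " + toScan + " of 2");
    }
}
```

A filter never reports a stored value as absent, so skipping a row group on a
negative answer is always safe. False positives only cost an unnecessary scan,
which is why the filter size and the choice of columns matter.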