Hrongrong Cao created KYLIN-5704:
------------------------------------
Summary: For an 'in' condition query on a non-time partition column,
when the data type of the values in the 'in' condition is inconsistent with that
of the non-time partition column, the segment pruner fails, resulting in a full
segment scan
Key: KYLIN-5704
URL: https://issues.apache.org/jira/browse/KYLIN-5704
Project: Kylin
Issue Type: Bug
Affects Versions: 5.0-alpha
Reporter: Hrongrong Cao
Fix For: 5.0-beta
The queried column is not a time partition column but an ordinary dimension
column, and its filter condition has the form col in (x1, x2, ...). In this
case, because the types of col and x1 do not match, the condition is
automatically converted to cast(col as string) in (x1, x2, ...). FilePruner
then reports an error because
org.apache.spark.sql.execution.datasource.FilePruner#convertCastFilter does not
handle 'in'.
Explanation: the purpose of the convertCastFilter method is to remove the cast,
so that the filter condition can be matched when
DataSourceStrategy.translateFilter is called, and the segments can then be
filtered. However, convertCastFilter currently does not process the 'in'
condition, so translateFilter cannot match it and returns an empty result, and
the query throws an error.
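
The mechanism described above can be illustrated with a small toy model. This
is only a sketch: the node classes (Attr, Cast, EqualTo, In) and the functions
convert_cast_filter and translate_filter are hypothetical stand-ins written for
this example, not Spark's or Kylin's actual classes or signatures. The
handle_in flag is likewise an assumption, used only to compare the behaviour
with and without 'in' handling.

```python
from dataclasses import dataclass
from typing import Tuple


# Toy expression nodes (hypothetical; not Spark Catalyst classes).
@dataclass(frozen=True)
class Attr:
    name: str


@dataclass(frozen=True)
class Cast:
    child: object
    to_type: str


@dataclass(frozen=True)
class EqualTo:
    left: object
    value: object


@dataclass(frozen=True)
class In:
    value: object
    items: Tuple[object, ...]


def strip_cast(expr):
    """Unwrap any Cast nodes that surround an expression."""
    while isinstance(expr, Cast):
        expr = expr.child
    return expr


def convert_cast_filter(expr, handle_in):
    """Remove casts around the column side of a predicate.

    With handle_in=False only EqualTo is rewritten, which models the
    reported gap; with handle_in=True the In predicate is rewritten too.
    """
    if isinstance(expr, EqualTo):
        return EqualTo(strip_cast(expr.left), expr.value)
    if handle_in and isinstance(expr, In):
        return In(strip_cast(expr.value), expr.items)
    return expr


def translate_filter(expr):
    """Return a pushable (op, column, value) tuple, or None if no match.

    Only predicates applied directly to a column are matched, so a
    predicate still wrapped in a Cast yields None.
    """
    if isinstance(expr, EqualTo) and isinstance(expr.left, Attr):
        return ("=", expr.left.name, expr.value)
    if isinstance(expr, In) and isinstance(expr.value, Attr):
        return ("in", expr.value.name, expr.items)
    return None


# cast(col as string) in ('1', '2')
predicate = In(Cast(Attr("col"), "string"), ("1", "2"))

before = translate_filter(convert_cast_filter(predicate, handle_in=False))
after = translate_filter(convert_cast_filter(predicate, handle_in=True))

print(before)  # None
print(after)   # ('in', 'col', ('1', '2'))
```

In this model the cast is simply dropped. A real fix would presumably also
have to make the literal values comparable with the column's actual type,
which the sketch does not attempt.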
In addition: if the column is a time partition column, an error reported here
does not matter, because in an earlier step the Calcite file pruner has already
completed segment pruning on the time partition column.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)