[jira] [Created] (KYLIN-5704) For ‘in’ condition query of non-time partition columns, when the data type of the value in 'in' condition is inconsistent with that of the non-time partition column, the segment pruner fails, resulting in full Segment scanning

Hrongrong Cao (Jira) Tue, 14 Nov 2023 22:51:31 -0800

Hrongrong Cao created KYLIN-5704:
------------------------------------

             Summary: For ‘in’ condition query of non-time partition columns, 
when the data type of the value in 'in' condition is inconsistent with that of 
the non-time partition column, the segment pruner fails, resulting in full 
Segment scanning
                 Key: KYLIN-5704
                 URL: https://issues.apache.org/jira/browse/KYLIN-5704
             Project: Kylin
          Issue Type: Bug
    Affects Versions: 5.0-alpha
            Reporter: Hrongrong Cao
             Fix For: 5.0-beta



The queried column is a non-time partition column (an ordinary dimension 
column), and the filter condition on it has the form col in (x1, x2, ...). 
Because the types of col and x1 do not match, the condition is automatically 
converted to cast(col as string) in (x1, x2, ...). In this case FilePruner 
reports an error, because 
org.apache.spark.sql.execution.datasource.FilePruner#convertCastFilter does not 
handle in.

To explain: the purpose of the convertCastFilter method is to remove the cast, 
so that the filter condition can be matched when calling 
DataSourceStrategy.translateFilter, after which the Segments can be filtered. 
However, convertCastFilter currently does not process the in condition, so 
translateFilter cannot match it and returns an empty result, and the pruner 
reports the error described above.
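
The following is a minimal sketch of the mechanism described above, not the 
Kylin or Spark code. It models the expression tree with hypothetical toy 
classes (Column, Cast, EqualTo, In) and a hypothetical translate function 
standing in for translateFilter. It assumes, purely for illustration, that 
the existing cast removal covers only equality; the issue does not say which 
operators are currently handled. A real fix would also have to reconcile the 
literal types with the column type, which this sketch ignores.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


# Toy stand-ins for expression nodes (hypothetical, not Spark classes).
@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Cast:
    child: Column
    to_type: str


@dataclass(frozen=True)
class EqualTo:
    left: Union[Column, Cast]
    value: object


@dataclass(frozen=True)
class In:
    left: Union[Column, Cast]
    values: Tuple[object, ...]


def strip_cast_without_in(expr):
    """Models the reported gap: the cast is removed for EqualTo only (assumed)."""
    if isinstance(expr, EqualTo) and isinstance(expr.left, Cast):
        return EqualTo(expr.left.child, expr.value)
    return expr  # an In over a Cast passes through unchanged


def strip_cast_with_in(expr):
    """Same as above, but also removes the cast under an In predicate."""
    if isinstance(expr, In) and isinstance(expr.left, Cast):
        return In(expr.left.child, expr.values)
    return strip_cast_without_in(expr)


def translate(expr) -> Optional[str]:
    """Stand-in for translateFilter: only bare-column predicates translate."""
    if isinstance(expr, EqualTo) and isinstance(expr.left, Column):
        return f"{expr.left.name} = {expr.value!r}"
    if isinstance(expr, In) and isinstance(expr.left, Column):
        return f"{expr.left.name} IN {list(expr.values)!r}"
    return None  # nothing usable for Segment pruning


# cast(col as string) in ('1', '2'), as produced by the type mismatch
query_filter = In(Cast(Column("col"), "string"), ("1", "2"))

before = translate(strip_cast_without_in(query_filter))
after = translate(strip_cast_with_in(query_filter))
```

In this model, before is None because the cast is still wrapped around the 
column, while after is a translated predicate that a pruner could evaluate 
against Segment metadata.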

In addition: for a time partition column, an error at this point does not 
matter, because in an earlier step the Calcite file pruner has already 
completed Segment pruning on the time partition column.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
