[ https://issues.apache.org/jira/browse/SPARK-35688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17359786#comment-17359786 ]
todd.chen commented on SPARK-35688:
-----------------------------------

== Physical Plan ==
Project [get_json_object(json#14, $.uid) AS uid#20, name_array#21]
+- Generate explode(explode_name_array(name#13)), [json#14], false, [name_array#21]
   +- Filter ((((isnotnull(name#13) AND NOT (name#13 = ?)) AND (cast(get_json_object(json#14, $.uid) as int) > 0)) AND (size(explode_name_array(name#13), true) > 0)) AND isnotnull(explode_name_array(name#13)))
      +- *(1) ColumnarToRow
         +- FileScan parquet [name#13,json#14] Batched: true, DataFilters: [isnotnull(name#13), NOT (name#13 = ?), (cast(get_json_object(json#14, $.uid) as int) > 0), (size..., Format: Parquet, Location: InMemoryFileIndex[file:/tmp/tb_eliminate_bad_case_data], PartitionFilters: [], PushedFilters: [IsNotNull(name), Not(EqualTo(name,?))], ReadSchema: struct<name:string,json:string>

According to this plan, the filter that removes the invalid value "?" runs before the explode. However, because spark.sql.subexpressionElimination.enabled is true, Spark calls explode_name_array before the rows are filtered.

> GeneratePredicate eliminate will fail in some case
> ---------------------------------------------------
>
>                 Key: SPARK-35688
>                 URL: https://issues.apache.org/jira/browse/SPARK-35688
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.1
>            Reporter: todd.chen
>            Priority: Major
>
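
The ordering problem described in the comment can be illustrated with a toy model in plain Python. This is not Spark code. It assumes a hypothetical explode_name_array that raises on the invalid value, and it models subexpression elimination as computing the repeated UDF call once, before any other conjunct of the filter is checked. Spark's actual code generation may differ in detail.

```python
INVALID = "?"  # stand-in for the invalid name value shown in the plan

calls = []  # records every argument the UDF is invoked with


def explode_name_array(name):
    """Hypothetical UDF: splits a name on commas, fails on the invalid value."""
    calls.append(name)
    if name == INVALID:
        raise ValueError("invalid name reached the UDF")
    return name.split(",")


def filter_short_circuit(name):
    # Conjuncts are evaluated left to right, so the UDF only runs
    # for rows that already passed the cheap guards on `name`.
    return (
        name is not None
        and name != INVALID
        and len(explode_name_array(name)) > 0
        and explode_name_array(name) is not None
    )


def filter_with_eager_cse(name):
    # Toy model of subexpression elimination (an assumption, see lead-in):
    # the repeated UDF call is computed once, up front, before any guard.
    shared = explode_name_array(name)
    return (
        name is not None
        and name != INVALID
        and len(shared) > 0
        and shared is not None
    )
```

In this model, filter_short_circuit("?") returns False without ever calling the UDF, while filter_with_eager_cse("?") raises ValueError because the shared call happens before the guard on name is checked.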