Chetan Bhat created SPARK-35835:
-----------------------------------

             Summary: Select filter query on table with struct complex type fails
                 Key: SPARK-35835
                 URL: https://issues.apache.org/jira/browse/SPARK-35835
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.1.1
         Environment: Spark 3.1.1
            Reporter: Chetan Bhat
[Steps]:-
From Spark beeline, create a Parquet or ORC table having struct complex type data, load data into the table, and execute the select filter query shown below under [Actual Issue].

0: jdbc:hive2://vm2:22550/> create table Struct_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_INT_DOUBLE_STRING_DATE struct<ID:int,SALARY:double,COUNTRY:STRING,CHECK_DATE:string>, CARD_COUNT int, DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) stored as parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.161 seconds)

0: jdbc:hive2://vm2:22550/> LOAD DATA INPATH 'hdfs://hacluster/chetan/Struct.csv' OVERWRITE INTO TABLE Struct_com;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.09 seconds)
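The failure does not appear to depend on the loaded CSV contents, only on the query shape: the same struct fields are referenced more than once, and with differing case (COUNTRY vs. Country), in the GROUP BY and ORDER BY. A hypothetical, self-contained reproduction sketch along those lines (table name and inserted values are illustrative, not from this report; a single INSERT replaces the HDFS CSV load) would be:

-- Hypothetical minimal repro sketch (names and values are illustrative):
-- one struct column, with repeated / case-varying nested-field references
-- in GROUP BY and ORDER BY.
create table struct_repro (s struct<ID:int,COUNTRY:string,CHECK_DATE:string>) stored as parquet;
insert into struct_repro values (named_struct('ID', 5701, 'COUNTRY', 'IN', 'CHECK_DATE', '2021-06-01'));
SELECT s.COUNTRY, s.CHECK_DATE, s.CHECK_DATE, s.Country, SUM(s.ID) AS Sum
FROM (select * from struct_repro) SUB_QRY
WHERE s.ID > 5700
GROUP BY s.COUNTRY, s.CHECK_DATE, s.CHECK_DATE, s.Country
ORDER BY s.COUNTRY asc, s.CHECK_DATE asc, s.CHECK_DATE asc, s.Country asc;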
[Actual Issue]:- Select filter query on table with struct complex type fails.

0: jdbc:hive2://vm2:22550/> SELECT struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country, SUM(struct_int_double_string_date.id) AS Sum FROM (select * from Struct_com) SUB_QRY WHERE struct_int_double_string_date.id > 5700 GROUP BY struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country ORDER BY struct_int_double_string_date.COUNTRY asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.Country asc;
Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
+- *(2) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, Country#139899, Sum#139877L])
   +- Exchange hashpartitioning(_gen_alias_139928#139928, _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
      +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
         +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS _gen_alias_139931#139931]
            +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
               +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], PartitionFilters: [], PushedFilters: [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: struct<STRUCT_INT_DOUBLE_STRING_DATE:struct<ID:int,COUNTRY:string,CHECK_DATE:string>>

    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:396)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:281)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:281)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:295)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
+- *(2) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, Country#139899, Sum#139877L])
   +- Exchange hashpartitioning(_gen_alias_139928#139928, _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
      +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
         +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS _gen_alias_139931#139931]
            +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
               +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], PartitionFilters: [], PushedFilters: [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: struct<STRUCT_INT_DOUBLE_STRING_DATE:struct<ID:int,COUNTRY:string,CHECK_DATE:string>>

    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:163)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452)
    at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496)
    at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:746)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:321)
    at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:387)
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3706)
    at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2968)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3697)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:170)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:91)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:777)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3695)
    at org.apache.spark.sql.Dataset.collect(Dataset.scala:2968)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteProxyStatementOperation.processResults(SparkExecuteProxyStatementOperation.scala:221)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:365)
    ... 16 more
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: _gen_alias_139930#139930
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:317)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:317)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:322)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:407)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:405)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:322)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:306)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.immutable.List.map(List.scala:298)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.generateResultFunction(HashAggregateExec.scala:554)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:741)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:148)
    at org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:95)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:90)
    at org.apache.spark.sql.execution.CodegenSupport.produce$(WholeStageCodegenExec.scala:90)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.produce(HashAggregateExec.scala:47)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:655)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:718)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:118)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:118)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.shuffleDependency$lzycompute(ShuffleExchangeExec.scala:151)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.shuffleDependency(ShuffleExchangeExec.scala:149)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.$anonfun$doExecute$1(ShuffleExchangeExec.scala:166)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    ... 47 more
Caused by: java.lang.RuntimeException: Couldn't find _gen_alias_139930#139930 in [_gen_alias_139928#139928,_gen_alias_139929#139929,sum(cast(_gen_alias_139931#139931 as bigint))#139901L]
    at scala.sys.package$.error(package.scala:30)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.$anonfun$applyOrElse$1(BoundAttribute.scala:81)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    ... 90 more (state=,code=0)

[Expected Result]:- Select filter query on table with struct complex type should succeed.
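[Observation]:- The attribute that fails to bind, _gen_alias_139930#139930, is the second alias the Project generates for STRUCT_INT_DOUBLE_STRING_DATE.COUNTRY (the first being _gen_alias_139928#139928), apparently because the field is referenced twice with different case; the HashAggregate output only carries _gen_alias_139928 and _gen_alias_139929, so binding the duplicated ORDER BY expression fails. The _gen_alias_* attributes are introduced by nested column pruning, so a hypothetical, untested mitigation (not a fix) may be to disable nested schema pruning for the session, at the cost of reading the whole struct column from Parquet:

-- Hypothetical workaround sketch (untested): disabling nested schema pruning
-- should stop the optimizer from introducing _gen_alias_* attributes.
set spark.sql.optimizer.nestedSchemaPruning.enabled=false;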