[ https://issues.apache.org/jira/browse/SPARK-35835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366386#comment-17366386 ]
pavithra ramachandran commented on SPARK-35835:
-----------------------------------------------

I shall raise a PR soon.

Select filter query on table with struct complex type fails
------------------------------------------------------------

                 Key: SPARK-35835
                 URL: https://issues.apache.org/jira/browse/SPARK-35835
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.1.1
         Environment: Spark 3.1.1
            Reporter: Chetan Bhat
            Priority: Minor

[Steps]:
From Spark beeline, create a Parquet or ORC table that has struct complex-type data, load data into the table, and execute a select filter query.

0: jdbc:hive2://vm2:22550/> create table Struct_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_INT_DOUBLE_STRING_DATE struct<ID:int,SALARY:double,COUNTRY:STRING,CHECK_DATE:string>, CARD_COUNT int, DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) stored as parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.161 seconds)

0: jdbc:hive2://vm2:22550/> LOAD DATA INPATH 'hdfs://hacluster/chetan/Struct.csv' OVERWRITE INTO TABLE Struct_com;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.09 seconds)

0: jdbc:hive2://vm2:22550/> SELECT struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country, SUM(struct_int_double_string_date.id) AS Sum FROM (select * from Struct_com) SUB_QRY WHERE struct_int_double_string_date.id > 5700 GROUP BY struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country ORDER BY struct_int_double_string_date.COUNTRY asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.Country asc;
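For anyone who wants to reproduce this without the original Struct.csv, a minimal self-contained sketch follows. The struct_repro table name and the inserted sample row are invented for illustration; only the shape of the failing query (a filter on a struct field plus the same struct fields referenced twice in the select, GROUP BY, and ORDER BY lists) is taken from the report, and the sketch has not been verified against 3.1.1:

create table struct_repro (
  s struct<ID:int, SALARY:double, COUNTRY:string, CHECK_DATE:string>
) stored as parquet;

-- named_struct builds the struct value inline, avoiding the CSV load step
insert into struct_repro values
  (named_struct('ID', 5701, 'SALARY', cast(100.0 as double), 'COUNTRY', 'IN', 'CHECK_DATE', '2021-06-01'));

-- the duplicated s.COUNTRY / s.CHECK_DATE references mirror the failing query
SELECT s.COUNTRY, s.CHECK_DATE, s.CHECK_DATE, s.COUNTRY, SUM(s.ID) AS Sum
FROM struct_repro
WHERE s.ID > 5700
GROUP BY s.COUNTRY, s.CHECK_DATE, s.CHECK_DATE, s.COUNTRY
ORDER BY s.COUNTRY asc, s.CHECK_DATE asc;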
[Actual Issue]: Executing the above select filter query on the table with the struct complex type fails:

Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
+- *(2) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, Country#139899, Sum#139877L])
   +- Exchange hashpartitioning(_gen_alias_139928#139928, _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
      +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
         +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS _gen_alias_139931#139931]
            +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
               +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], PartitionFilters: [], PushedFilters: [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: struct<STRUCT_INT_DOUBLE_STRING_DATE:struct<ID:int,COUNTRY:string,CHECK_DATE:string>>

    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:396)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:281)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:281)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:295)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
+- *(2) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, Country#139899, Sum#139877L])
   +- Exchange hashpartitioning(_gen_alias_139928#139928, _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
      +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
         +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS _gen_alias_139931#139931]
            +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
               +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], PartitionFilters: [], PushedFilters: [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: struct<STRUCT_INT_DOUBLE_STRING_DATE:struct<ID:int,COUNTRY:string,CHECK_DATE:string>>

    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:163)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452)
    at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496)
    at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:746)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:321)
    at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:387)
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3706)
    at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2968)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3697)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:170)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:91)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:777)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3695)
    at org.apache.spark.sql.Dataset.collect(Dataset.scala:2968)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteProxyStatementOperation.processResults(SparkExecuteProxyStatementOperation.scala:221)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:365)
    ... 16 more
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: _gen_alias_139930#139930
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:317)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:317)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:322)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:407)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:405)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:322)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:306)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.immutable.List.map(List.scala:298)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.generateResultFunction(HashAggregateExec.scala:554)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:741)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:148)
    at org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:95)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:90)
    at org.apache.spark.sql.execution.CodegenSupport.produce$(WholeStageCodegenExec.scala:90)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.produce(HashAggregateExec.scala:47)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:655)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:718)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:118)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:118)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.shuffleDependency$lzycompute(ShuffleExchangeExec.scala:151)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.shuffleDependency(ShuffleExchangeExec.scala:149)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.$anonfun$doExecute$1(ShuffleExchangeExec.scala:166)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    ... 47 more
Caused by: java.lang.RuntimeException: Couldn't find _gen_alias_139930#139930 in [_gen_alias_139928#139928,_gen_alias_139929#139929,sum(cast(_gen_alias_139931#139931 as bigint))#139901L]
    at scala.sys.package$.error(package.scala:30)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.$anonfun$applyOrElse$1(BoundAttribute.scala:81)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    ... 90 more (state=,code=0)

[Expected Result]: The select filter query on a table with a struct complex type should succeed.
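A note on what the trace suggests: the pruned ReadSchema (only ID, COUNTRY, and CHECK_DATE survive inside the struct; SALARY is gone) and the generated _gen_alias_* projections point at nested schema pruning / nested column aliasing. The aggregate keeps only two aliases for the four struct-field references, yet the result expressions still look for the dropped _gen_alias_139930, which is exactly the "Couldn't find" failure above. Two unverified things to try while the fix is pending; both are assumptions to validate against a real cluster, not confirmed workarounds:

-- 1) disable nested schema pruning for the session, then re-run the query
set spark.sql.optimizer.nestedSchemaPruning.enabled=false;

-- 2) or drop the duplicated struct-field references from the query
SELECT struct_int_double_string_date.COUNTRY,
       struct_int_double_string_date.CHECK_DATE,
       SUM(struct_int_double_string_date.id) AS Sum
FROM Struct_com
WHERE struct_int_double_string_date.id > 5700
GROUP BY struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE
ORDER BY struct_int_double_string_date.COUNTRY asc, struct_int_double_string_date.CHECK_DATE asc;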