[ https://issues.apache.org/jira/browse/SPARK-35835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17366386#comment-17366386 ]
pavithra ramachandran commented on SPARK-35835:
-----------------------------------------------

I shall raise a PR soon.

Select filter query on table with struct complex type fails
------------------------------------------------------------

                 Key: SPARK-35835
                 URL: https://issues.apache.org/jira/browse/SPARK-35835
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.1.1
         Environment: Spark 3.1.1
            Reporter: Chetan Bhat
            Priority: Minor

[Steps]:
From Spark beeline, create a Parquet or ORC table that has struct complex-type data, load data into the table, and execute a select filter query.

0: jdbc:hive2://vm2:22550/> create table Struct_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_INT_DOUBLE_STRING_DATE struct<ID:int,SALARY:double,COUNTRY:STRING,CHECK_DATE:string>, CARD_COUNT int, DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) stored as parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.161 seconds)

0: jdbc:hive2://vm2:22550/> LOAD DATA INPATH 'hdfs://hacluster/chetan/Struct.csv' OVERWRITE INTO TABLE Struct_com;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.09 seconds)

0: jdbc:hive2://vm2:22550/> SELECT struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country, SUM(struct_int_double_string_date.id) AS Sum FROM (select * from Struct_com) SUB_QRY WHERE struct_int_double_string_date.id > 5700 GROUP BY struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country ORDER BY struct_int_double_string_date.COUNTRY asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.Country asc;
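For anyone who wants to reproduce this without the original Struct.csv, a minimal self-contained sketch follows. The struct_repro table name and the inserted sample row are invented for illustration; only the shape of the failing query (a filter on a struct field plus the same struct fields referenced twice in the select, GROUP BY, and ORDER BY lists) is taken from the report, and the sketch has not been verified against 3.1.1:

create table struct_repro (
  s struct<ID:int, SALARY:double, COUNTRY:string, CHECK_DATE:string>
) stored as parquet;

-- named_struct builds the struct value inline, avoiding the CSV load step
insert into struct_repro values
  (named_struct('ID', 5701, 'SALARY', cast(100.0 as double), 'COUNTRY', 'IN', 'CHECK_DATE', '2021-06-01'));

-- the duplicated s.COUNTRY / s.CHECK_DATE references mirror the failing query
SELECT s.COUNTRY, s.CHECK_DATE, s.CHECK_DATE, s.COUNTRY, SUM(s.ID) AS Sum
FROM struct_repro
WHERE s.ID > 5700
GROUP BY s.COUNTRY, s.CHECK_DATE, s.CHECK_DATE, s.COUNTRY
ORDER BY s.COUNTRY asc, s.CHECK_DATE asc;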
[Actual Issue]: Executing the above select filter query on the table with the struct complex type fails:

Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
+- *(2) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, Country#139899, Sum#139877L])
   +- Exchange hashpartitioning(_gen_alias_139928#139928, _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
      +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
         +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS _gen_alias_139931#139931]
            +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
               +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], PartitionFilters: [], PushedFilters: [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: struct<STRUCT_INT_DOUBLE_STRING_DATE:struct<ID:int,COUNTRY:string,CHECK_DATE:string>>

    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:396)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:281)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:281)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:295)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
+- *(2) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, Country#139899, Sum#139877L])
   +- Exchange hashpartitioning(_gen_alias_139928#139928, _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
      +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
         +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS _gen_alias_139931#139931]
            +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
               +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], PartitionFilters: [], PushedFilters: [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: struct<STRUCT_INT_DOUBLE_STRING_DATE:struct<ID:int,COUNTRY:string,CHECK_DATE:string>>

    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:163)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452)
    at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496)
    at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:746)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:321)
    at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:387)
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3706)
    at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2968)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3697)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:170)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:91)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:777)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3695)
    at org.apache.spark.sql.Dataset.collect(Dataset.scala:2968)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteProxyStatementOperation.processResults(SparkExecuteProxyStatementOperation.scala:221)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:365)
    ... 16 more
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: _gen_alias_139930#139930
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:317)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:317)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:322)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:407)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:405)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:322)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:306)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.immutable.List.map(List.scala:298)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.generateResultFunction(HashAggregateExec.scala:554)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:741)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:148)
    at org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:95)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:90)
    at org.apache.spark.sql.execution.CodegenSupport.produce$(WholeStageCodegenExec.scala:90)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.produce(HashAggregateExec.scala:47)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:655)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:718)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:118)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:118)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.shuffleDependency$lzycompute(ShuffleExchangeExec.scala:151)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.shuffleDependency(ShuffleExchangeExec.scala:149)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.$anonfun$doExecute$1(ShuffleExchangeExec.scala:166)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    ... 47 more
Caused by: java.lang.RuntimeException: Couldn't find _gen_alias_139930#139930 in [_gen_alias_139928#139928,_gen_alias_139929#139929,sum(cast(_gen_alias_139931#139931 as bigint))#139901L]
    at scala.sys.package$.error(package.scala:30)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.$anonfun$applyOrElse$1(BoundAttribute.scala:81)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    ... 90 more (state=,code=0)

[Expected Result]: The select filter query on a table with a struct complex type should succeed.
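A note on what the trace suggests: the pruned ReadSchema (only ID, COUNTRY, and CHECK_DATE survive inside the struct; SALARY is gone) and the generated _gen_alias_* projections point at nested schema pruning / nested column aliasing. The aggregate keeps only two aliases for the four struct-field references, yet the result expressions still look for the dropped _gen_alias_139930, which is exactly the "Couldn't find" failure above. Two unverified things to try while the fix is pending; both are assumptions to validate against a real cluster, not confirmed workarounds:

-- 1) disable nested schema pruning for the session, then re-run the query
set spark.sql.optimizer.nestedSchemaPruning.enabled=false;

-- 2) or drop the duplicated struct-field references from the query
SELECT struct_int_double_string_date.COUNTRY,
       struct_int_double_string_date.CHECK_DATE,
       SUM(struct_int_double_string_date.id) AS Sum
FROM Struct_com
WHERE struct_int_double_string_date.id > 5700
GROUP BY struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE
ORDER BY struct_int_double_string_date.COUNTRY asc, struct_int_double_string_date.CHECK_DATE asc;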