Chetan Bhat created SPARK-35835:
-----------------------------------

             Summary: Select filter query on table with struct complex type fails
                 Key: SPARK-35835
                 URL: https://issues.apache.org/jira/browse/SPARK-35835
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 3.1.1
         Environment: Spark 3.1.1
            Reporter: Chetan Bhat
[Steps]:-
From Spark beeline, create a Parquet or ORC table having struct complex type data, load data into the table, and execute the select filter query shown below under [Actual Issue].

0: jdbc:hive2://vm2:22550/> create table Struct_com (CUST_ID string, YEAR int, MONTH int, AGE int, GENDER string, EDUCATED string, IS_MARRIED string, STRUCT_INT_DOUBLE_STRING_DATE struct<ID:int,SALARY:double,COUNTRY:STRING,CHECK_DATE:string>, CARD_COUNT int, DEBIT_COUNT int, CREDIT_COUNT int, DEPOSIT double, HQ_DEPOSIT double) stored as parquet;
+---------+
| Result  |
+---------+
+---------+
No rows selected (0.161 seconds)

0: jdbc:hive2://vm2:22550/> LOAD DATA INPATH 'hdfs://hacluster/chetan/Struct.csv' OVERWRITE INTO TABLE Struct_com;
+---------+
| Result  |
+---------+
+---------+
No rows selected (1.09 seconds)
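The failure does not appear to depend on the loaded CSV contents, only on the query shape: the same struct fields are referenced more than once, and with differing case (COUNTRY vs. Country), in the GROUP BY and ORDER BY. A hypothetical, self-contained reproduction sketch along those lines (table name and inserted values are illustrative, not from this report; a single INSERT replaces the HDFS CSV load) would be:

-- Hypothetical minimal repro sketch (names and values are illustrative):
-- one struct column, with repeated / case-varying nested-field references
-- in GROUP BY and ORDER BY.
create table struct_repro (s struct<ID:int,COUNTRY:string,CHECK_DATE:string>) stored as parquet;
insert into struct_repro values (named_struct('ID', 5701, 'COUNTRY', 'IN', 'CHECK_DATE', '2021-06-01'));
SELECT s.COUNTRY, s.CHECK_DATE, s.CHECK_DATE, s.Country, SUM(s.ID) AS Sum
FROM (select * from struct_repro) SUB_QRY
WHERE s.ID > 5700
GROUP BY s.COUNTRY, s.CHECK_DATE, s.CHECK_DATE, s.Country
ORDER BY s.COUNTRY asc, s.CHECK_DATE asc, s.CHECK_DATE asc, s.Country asc;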
[Actual Issue]:- Select filter query on table with struct complex type fails.

0: jdbc:hive2://vm2:22550/> SELECT struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country, SUM(struct_int_double_string_date.id) AS Sum FROM (select * from Struct_com) SUB_QRY WHERE struct_int_double_string_date.id > 5700 GROUP BY struct_int_double_string_date.COUNTRY, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.CHECK_DATE, struct_int_double_string_date.Country ORDER BY struct_int_double_string_date.COUNTRY asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.CHECK_DATE asc, struct_int_double_string_date.Country asc;
Error: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
+- *(2) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, Country#139899, Sum#139877L])
   +- Exchange hashpartitioning(_gen_alias_139928#139928, _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
      +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
         +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS _gen_alias_139931#139931]
            +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
               +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], PartitionFilters: [], PushedFilters: [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: struct<STRUCT_INT_DOUBLE_STRING_DATE:struct<ID:int,COUNTRY:string,CHECK_DATE:string>>

    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:396)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.$anonfun$run$3(SparkExecuteStatementOperation.scala:281)
    at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties(SparkOperation.scala:78)
    at org.apache.spark.sql.hive.thriftserver.SparkOperation.withLocalProperties$(SparkOperation.scala:62)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.withLocalProperties(SparkExecuteStatementOperation.scala:46)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:281)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2$$anon$3.run(SparkExecuteStatementOperation.scala:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1761)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation$$anon$2.run(SparkExecuteStatementOperation.scala:295)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange rangepartitioning(COUNTRY#139896 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, CHECK_DATE#139897 ASC NULLS FIRST, COUNTRY#139896 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [id=#17161]
+- *(2) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[sum(cast(_gen_alias_139931#139931 as bigint))], output=[COUNTRY#139896, CHECK_DATE#139897, CHECK_DATE#139898, Country#139899, Sum#139877L])
   +- Exchange hashpartitioning(_gen_alias_139928#139928, _gen_alias_139929#139929, 200), ENSURE_REQUIREMENTS, [id=#17157]
      +- *(1) HashAggregate(keys=[_gen_alias_139928#139928, _gen_alias_139929#139929], functions=[partial_sum(cast(_gen_alias_139931#139931 as bigint))], output=[_gen_alias_139928#139928, _gen_alias_139929#139929, sum#139934L])
         +- *(1) Project [STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139928#139928, STRUCT_INT_DOUBLE_STRING_DATE#139885.CHECK_DATE AS _gen_alias_139929#139929, STRUCT_INT_DOUBLE_STRING_DATE#139885.COUNTRY AS _gen_alias_139930#139930, STRUCT_INT_DOUBLE_STRING_DATE#139885.ID AS _gen_alias_139931#139931]
            +- *(1) Filter (isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885) AND (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700))
               +- FileScan parquet default.struct_com[STRUCT_INT_DOUBLE_STRING_DATE#139885] Batched: false, DataFilters: [isnotnull(STRUCT_INT_DOUBLE_STRING_DATE#139885), (STRUCT_INT_DOUBLE_STRING_DATE#139885.ID > 5700)], Format: Parquet, Location: InMemoryFileIndex[hdfs://hacluster/user/hive/warehouse/struct_com], PartitionFilters: [], PushedFilters: [IsNotNull(STRUCT_INT_DOUBLE_STRING_DATE), GreaterThan(STRUCT_INT_DOUBLE_STRING_DATE.ID,5700)], ReadSchema: struct<STRUCT_INT_DOUBLE_STRING_DATE:struct<ID:int,COUNTRY:string,CHECK_DATE:string>>

    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:163)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.InputAdapter.inputRDD(WholeStageCodegenExec.scala:525)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs(WholeStageCodegenExec.scala:453)
    at org.apache.spark.sql.execution.InputRDDCodegen.inputRDDs$(WholeStageCodegenExec.scala:452)
    at org.apache.spark.sql.execution.InputAdapter.inputRDDs(WholeStageCodegenExec.scala:496)
    at org.apache.spark.sql.execution.SortExec.inputRDDs(SortExec.scala:132)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:746)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:321)
    at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:387)
    at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3706)
    at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2968)
    at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3697)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:108)
    at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:170)
    at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:91)
    at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:777)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3695)
    at org.apache.spark.sql.Dataset.collect(Dataset.scala:2968)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteProxyStatementOperation.processResults(SparkExecuteProxyStatementOperation.scala:221)
    at org.apache.spark.sql.hive.thriftserver.SparkExecuteStatementOperation.org$apache$spark$sql$hive$thriftserver$SparkExecuteStatementOperation$$execute(SparkExecuteStatementOperation.scala:365)
    ... 16 more
Caused by: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding attribute, tree: _gen_alias_139930#139930
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:317)
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:73)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:317)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:322)
    at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:407)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:243)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:405)
    at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:358)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:322)
    at org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:306)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238)
    at scala.collection.immutable.List.foreach(List.scala:392)
    at scala.collection.TraversableLike.map(TraversableLike.scala:238)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:231)
    at scala.collection.immutable.List.map(List.scala:298)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.generateResultFunction(HashAggregateExec.scala:554)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduceWithKeys(HashAggregateExec.scala:741)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.doProduce(HashAggregateExec.scala:148)
    at org.apache.spark.sql.execution.CodegenSupport.$anonfun$produce$1(WholeStageCodegenExec.scala:95)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.CodegenSupport.produce(WholeStageCodegenExec.scala:90)
    at org.apache.spark.sql.execution.CodegenSupport.produce$(WholeStageCodegenExec.scala:90)
    at org.apache.spark.sql.execution.aggregate.HashAggregateExec.produce(HashAggregateExec.scala:47)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doCodeGen(WholeStageCodegenExec.scala:655)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:718)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
    at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD$lzycompute(ShuffleExchangeExec.scala:118)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.inputRDD(ShuffleExchangeExec.scala:118)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.shuffleDependency$lzycompute(ShuffleExchangeExec.scala:151)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.shuffleDependency(ShuffleExchangeExec.scala:149)
    at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.$anonfun$doExecute$1(ShuffleExchangeExec.scala:166)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    ... 47 more
Caused by: java.lang.RuntimeException: Couldn't find _gen_alias_139930#139930 in [_gen_alias_139928#139928,_gen_alias_139929#139929,sum(cast(_gen_alias_139931#139931 as bigint))#139901L]
    at scala.sys.package$.error(package.scala:30)
    at org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.$anonfun$applyOrElse$1(BoundAttribute.scala:81)
    at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:52)
    ... 90 more (state=,code=0)

[Expected Result]:- Select filter query on table with struct complex type should succeed.
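[Observation]:- The attribute that fails to bind, _gen_alias_139930#139930, is the second alias the Project generates for STRUCT_INT_DOUBLE_STRING_DATE.COUNTRY (the first being _gen_alias_139928#139928), apparently because the field is referenced twice with different case; the HashAggregate output only carries _gen_alias_139928 and _gen_alias_139929, so binding the duplicated ORDER BY expression fails. The _gen_alias_* attributes are introduced by nested column pruning, so a hypothetical, untested mitigation (not a fix) may be to disable nested schema pruning for the session, at the cost of reading the whole struct column from Parquet:

-- Hypothetical workaround sketch (untested): disabling nested schema pruning
-- should stop the optimizer from introducing _gen_alias_* attributes.
set spark.sql.optimizer.nestedSchemaPruning.enabled=false;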