[ https://issues.apache.org/jira/browse/SPARK-20744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16032679#comment-16032679 ]
Bogdan Raducanu edited comment on SPARK-20744 at 6/1/17 9:19 AM:
-----------------------------------------------------------------

array() generally requires all of its elements to have the same type. Casts are added automatically, but that is not always possible:
{code}
sql("select array(now(), 1)").show
{code}
{code}
org.apache.spark.sql.AnalysisException: cannot resolve 'array(current_timestamp(), 1)' due to data type mismatch: input to function array should all be the same type, but it's [timestamp, int]; line 1 pos 7;
{code}


was (Author: bograd):
Array generally needs all components to be same type. Casts are added automatically but it's not always possible:
```sql("select array(now(), 1)").show```
```org.apache.spark.sql.AnalysisException: cannot resolve 'array(current_timestamp(), 1)' due to data type mismatch: input to function array should all be the same type, but it's [timestamp, int]; line 1 pos 7;```

> Predicates with multiple columns do not work
> --------------------------------------------
>
>                 Key: SPARK-20744
>                 URL: https://issues.apache.org/jira/browse/SPARK-20744
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Bogdan Raducanu
>
> The following code reproduces the problem:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").where("(a,b) in ((1,1))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', `a`, 'b', `b`) IN (named_struct('col1', 1, 'col2', 1)))' due to data type mismatch: Arguments must be same type; line 1 pos 6;
> 'Filter named_struct(a, a#42L, b, b#43L) IN (named_struct(col1, 1, col2, 1))
> +- Project [id#39L AS a#42L, id#39L AS b#43L]
>    +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Similarly, it does not work from SQL either, although other SQL databases support this syntax:
> {code}
> scala> spark.range(10).selectExpr("id as a", "id as b").createOrReplaceTempView("tab1")
> scala> sql("select * from tab1 where (a,b) in ((1,1), (2,2))").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', tab1.`a`, 'b', tab1.`b`) IN (named_struct('col1', 1, 'col2', 1), named_struct('col1', 2, 'col2', 2)))' due to data type mismatch: Arguments must be same type; line 1 pos 31;
> 'Project [*]
> +- 'Filter named_struct(a, a#50L, b, b#51L) IN (named_struct(col1, 1, col2, 1),named_struct(col1, 2, col2, 2))
>    +- SubqueryAlias tab1
>       +- Project [id#47L AS a#50L, id#47L AS b#51L]
>          +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Other examples:
> {code}
> scala> sql("select * from tab1 where (a,b) =(1,1)").show
> org.apache.spark.sql.AnalysisException: cannot resolve '(named_struct('a', tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' due to data type mismatch: differing types in '(named_struct('a', tab1.`a`, 'b', tab1.`b`) = named_struct('col1', 1, 'col2', 1))' (struct<a:bigint,b:bigint> and struct<col1:int,col2:int>).; line 1 pos 25;
> 'Project [*]
> +- 'Filter (named_struct(a, a#50L, b, b#51L) = named_struct(col1, 1, col2, 1))
>    +- SubqueryAlias tab1
>       +- Project [id#47L AS a#50L, id#47L AS b#51L]
>          +- Range (0, 10, step=1, splits=Some(1))
> {code}
> Expressions such as (1,1) are apparently parsed as structs, and the struct types then do not match: the column side is struct<a:bigint,b:bigint>, while the literal side is struct<col1:int,col2:int>. Perhaps they should be arrays.
> The following code works:
> {code}
> sql("select * from tab1 where array(a,b) in (array(1,1),array(2,2))").show
> {code}
> This also works, but requires the cast:
> {code}
> sql("select * from tab1 where (a,b) in (named_struct('a', cast(1 as bigint), 'b', cast(1 as bigint)))").show
> {code}
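
Until the struct comparison is resolved, one way to sidestep it is to expand the multi-column predicate into per-column equality checks, so that no struct is built at all. The following is only a sketch and is not part of the issue: the object name MultiColumnInSketch and the helper tupleIn are hypothetical, and it assumes the columns are non-null integral columns, since IN and a chain of OR/AND can differ when nulls are involved.

```scala
import org.apache.spark.sql.SparkSession

object MultiColumnInSketch {
  // Hypothetical helper (not a Spark API): rewrites
  //   (a, b) IN ((1,1), (2,2))
  // as
  //   (`a` = 1 AND `b` = 1) OR (`a` = 2 AND `b` = 2)
  // so that only scalar comparisons reach the analyzer.
  def tupleIn(cols: Seq[String], tuples: Seq[Seq[Long]]): String = {
    require(tuples.nonEmpty, "at least one tuple is required")
    tuples.map { t =>
      require(t.length == cols.length, "tuple arity must match the column list")
      cols.zip(t).map { case (c, v) => s"`$c` = $v" }.mkString("(", " AND ", ")")
    }.mkString(" OR ")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("SPARK-20744-sketch")
      .getOrCreate()

    // Same data as in the issue: two bigint columns a and b.
    val tab1 = spark.range(10).selectExpr("id as a", "id as b")

    // Scalar comparisons between bigint columns and int literals are coerced
    // automatically, so no explicit cast is needed here.
    tab1.where(tupleIn(Seq("a", "b"), Seq(Seq(1L, 1L), Seq(2L, 2L)))).show()

    spark.stop()
  }
}
```

This trades the compact tuple syntax for a longer predicate, but it avoids both the named_struct cast and the array() same-type requirement described above.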