MIK created SPARK-25051:
---------------------------

             Summary: where clause on dataset gives AnalysisException
                 Key: SPARK-25051
                 URL: https://issues.apache.org/jira/browse/SPARK-25051
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 2.3.0
            Reporter: MIK


*schemas:*
df1 => id, ts
df2 => id, name, country

*code:*

val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull)
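
A self-contained reproduction sketch (only the column names come from the schemas above; the row values and the local SparkSession setup are assumptions for illustration):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("SPARK-25051").getOrCreate()
import spark.implicits._

// Column names follow the schemas above; the row values are illustrative.
val df1 = Seq((1, "2018-08-01"), (2, "2018-08-02")).toDF("id", "ts")
val df2 = Seq((1, "a", "x")).toDF("id", "name", "country")

// Fails with the AnalysisException below on 2.3.0; works on 2.2.2.
val df = df1.join(df2, Seq("id"), "left_outer").where(df2("id").isNull)
df.show()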

*error:*

org.apache.spark.sql.AnalysisException: Resolved attribute(s) id#0 missing from
xx#15,xx#9L,id#5,xx#6,xx#11,xx#14,xx#13,xx#12,xx#7,xx#16,xx#10,xx#8L in
operator !Filter isnull(id#0). Attribute(s) with the same name appear in the
operation: id. Please check if the right attribute(s) are used.;;

    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.failAnalysis(CheckAnalysis.scala:41)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.failAnalysis(Analyzer.scala:91)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:289)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:80)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:127)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:80)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:91)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:104)
    at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
    at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:172)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:178)
    at org.apache.spark.sql.Dataset$.apply(Dataset.scala:65)
    at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3300)
    at org.apache.spark.sql.Dataset.filter(Dataset.scala:1458)
    at org.apache.spark.sql.Dataset.where(Dataset.scala:1486)

The same code works fine in Spark 2.2.2, so this appears to be a regression in 2.3.0.
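
If the intent is to keep the df1 rows that have no match in df2 (an assumption about the intent, not a confirmed fix), a possible workaround on 2.3.0 is to use a left_anti join, or to keep the left_outer join and filter on a df2-only column resolved against the joined result:

// Workaround sketch, assuming anti-join semantics are what the original query is after.
val noMatch = df1.join(df2, Seq("id"), "left_anti")

// Alternatively, keep the left_outer join and test a column that exists only in df2,
// resolving it against the joined Dataset rather than against df2 itself.
val joined = df1.join(df2, Seq("id"), "left_outer")
val noMatch2 = joined.where(joined("name").isNull)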


