[ https://issues.apache.org/jira/browse/SPARK-18065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen resolved SPARK-18065. ------------------------------- Resolution: Not A Problem > Spark 2 allows filter/where on columns not in current schema > ------------------------------------------------------------ > > Key: SPARK-18065 > URL: https://issues.apache.org/jira/browse/SPARK-18065 > Project: Spark > Issue Type: Bug > Affects Versions: 2.0.0, 2.0.1 > Reporter: Matthew Scruggs > Priority: Minor > > I noticed in Spark 2 (unlike 1.6) it's possible to use filter/where on a > DataFrame that previously had a column, but no longer has it in its schema > due to a select() operation. > In Spark 1.6.2, in spark-shell, we see that an exception is thrown when > attempting to filter/where using the selected-out column: > {code:title=Spark 1.6.2} > Welcome to > ____ __ > / __/__ ___ _____/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 1.6.2 > /_/ > Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60) > Type in expressions to have them evaluated. > Type :help for more information. > Spark context available as sc. > SQL context available as sqlContext. > scala> val df1 = sqlContext.createDataFrame(sc.parallelize(Seq((1, "one"), > (2, "two")))).selectExpr("_1 as id", "_2 as word") > df1: org.apache.spark.sql.DataFrame = [id: int, word: string] > scala> df1.show() > +---+----+ > | id|word| > +---+----+ > | 1| one| > | 2| two| > +---+----+ > scala> val df2 = df1.select("id") > df2: org.apache.spark.sql.DataFrame = [id: int] > scala> df2.printSchema() > root > |-- id: integer (nullable = false) > scala> df2.where("word = 'one'").show() > org.apache.spark.sql.AnalysisException: cannot resolve 'word' given input > columns: [id]; > {code} > However in Spark 2.0.0 and 2.0.1, we see that the same filter/where succeeds > (no AnalysisException) and seems to filter out data as if the column remains: > {code:title=Spark 2.0.1} > Welcome to > ____ __ > / __/__ ___ _____/ /__ > _\ \/ _ \/ _ `/ __/ '_/ > /___/ .__/\_,_/_/ /_/\_\ version 2.0.1 > /_/ > > Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_60) > Type in expressions to have them evaluated. > Type :help for more information. > scala> val df1 = sc.parallelize(Seq((1, "one"), (2, > "two"))).toDF().selectExpr("_1 as id", "_2 as word") > df1: org.apache.spark.sql.DataFrame = [id: int, word: string] > scala> df1.show() > +---+----+ > | id|word| > +---+----+ > | 1| one| > | 2| two| > +---+----+ > scala> val df2 = df1.select("id") > df2: org.apache.spark.sql.DataFrame = [id: int] > scala> df2.printSchema() > root > |-- id: integer (nullable = false) > scala> df2.where("word = 'one'").show() > +---+ > | id| > +---+ > | 1| > +---+ > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org