[ https://issues.apache.org/jira/browse/SPARK-30421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022719#comment-17022719 ]
Tobias Hermann commented on SPARK-30421:
----------------------------------------

[~dongjoon] No, that's different. To make it equivalent, you'd have to change your example to the following:
{quote}import pandas as pd
df = pd.DataFrame(data={'foo': [0, 1], 'bar': ["a", "b"]})
df2 = df.drop(columns=["bar"])
df2[df2["bar"] == "a"]
{quote}
And that correctly results in
{quote}KeyError: 'bar'
{quote}
In Spark, however, the following code works without error:
{quote}val df = Seq((0, "a"), (1, "b")).toDF("foo", "bar")
val df2 = df.drop("bar")
df2.where($"bar" === "a").show
{quote}

> Dropped columns still available for filtering
> ---------------------------------------------
>
>                 Key: SPARK-30421
>                 URL: https://issues.apache.org/jira/browse/SPARK-30421
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>            Reporter: Tobias Hermann
>            Priority: Minor
>
> The following minimal example:
> {quote}val df = Seq((0, "a"), (1, "b")).toDF("foo", "bar")
> df.select("foo").where($"bar" === "a").show
> df.drop("bar").where($"bar" === "a").show
> {quote}
> should result in an error like the following:
> {quote}org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given input columns: [foo];
> {quote}
> However, it does not; instead, it runs without error, as if the column "bar" still existed.