[ https://issues.apache.org/jira/browse/SPARK-30421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022900#comment-17022900 ]
Tobias Hermann commented on SPARK-30421:
----------------------------------------

[~dongjoon] I'm glad we are aligned now. :) For future reference: the original Pandas example

{quote}df.drop(columns=["col1"]).loc[df["col1"] == 1]{quote}

accesses the (unnamed) dataframe resulting from the drop call by row index (loc). This would even work (though it would not be very meaningful) with a completely independent dataframe used for the filtering:

{quote}df_foo = pd.DataFrame(data={'foo': [0, 1]})
df_bar = pd.DataFrame(data={'bar': ["a", "b"]})
df_bar.loc[df_foo["foo"] == 1]{quote}

> Dropped columns still available for filtering
> ---------------------------------------------
>
>                 Key: SPARK-30421
>                 URL: https://issues.apache.org/jira/browse/SPARK-30421
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>            Reporter: Tobias Hermann
>            Priority: Minor
>
> The following minimal example:
> {quote}val df = Seq((0, "a"), (1, "b")).toDF("foo", "bar")
> df.select("foo").where($"bar" === "a").show
> df.drop("bar").where($"bar" === "a").show
> {quote}
> should result in an error like the following:
> {quote}org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given input columns: [foo];
> {quote}
> However, it does not; instead it runs without error, as if the column "bar" still existed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
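As an editor's note on the pandas side of the comparison: the reason the pandas idiom works is that boolean indexing with .loc aligns the mask Series with the target frame by row index, so the mask may come from a frame that no longer has (or never had) the filtering column. A minimal runnable sketch, using the column and variable names from the examples above:

```python
import pandas as pd

# Filter on a column that was just dropped: the mask df["col1"] == 1 is
# evaluated against the ORIGINAL df, then aligned to the result of drop()
# by row index, so the dropped column is no longer needed at .loc time.
df = pd.DataFrame({"col1": [0, 1], "col2": ["a", "b"]})
filtered = df.drop(columns=["col1"]).loc[df["col1"] == 1]
print(filtered)  # one row (index 1) with col2 == "b"

# The same mechanism lets a mask built from a totally independent
# dataframe select rows in another, purely by matching index labels.
df_foo = pd.DataFrame(data={"foo": [0, 1]})
df_bar = pd.DataFrame(data={"bar": ["a", "b"]})
selected = df_bar.loc[df_foo["foo"] == 1]
print(selected)  # the row labeled 1, i.e. bar == "b"
```

This is exactly why the pandas example is not evidence one way or the other about Spark: in pandas the mask is computed eagerly from the original frame, whereas in Spark the dropped column arguably should fail to resolve in the later filter, which is the point of this ticket.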