[ https://issues.apache.org/jira/browse/SPARK-30421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17022900#comment-17022900 ]

Tobias Hermann commented on SPARK-30421:
----------------------------------------

[~dongjoon] I'm glad we are aligned now. :)

For future reference:

The original Pandas example
{quote}df.drop(columns=["col1"]).loc[df["col1"] == 1]
{quote}
selects rows from the (unnamed) dataframe returned by the drop call via a boolean mask passed to loc. This would even work (though it would not be very meaningful) with a mask derived from a completely independent dataframe:
{quote}import pandas as pd

df_foo = pd.DataFrame(data={'foo': [0, 1]})
df_bar = pd.DataFrame(data={'bar': ["a", "b"]})
df_bar.loc[df_foo["foo"] == 1]
{quote}
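For completeness, a self-contained sketch of the drop-then-filter pattern (the dataframe and the "col2" column below are only illustrative, not taken from the original report). It shows why the pattern runs without error in Pandas: the boolean mask is computed against the original dataframe, not against the dropped one.
{quote}import pandas as pd

# Illustrative dataframe, just to make the snippet runnable.
df = pd.DataFrame(data={'col1': [1, 2], 'col2': ["x", "y"]})

# The mask df["col1"] == 1 is evaluated against the original df,
# so dropping 'col1' from the left-hand dataframe causes no error.
result = df.drop(columns=["col1"]).loc[df["col1"] == 1]
print(result)  # row(s) of df (without 'col1') where col1 == 1
{quote}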

> Dropped columns still available for filtering
> ---------------------------------------------
>
>                 Key: SPARK-30421
>                 URL: https://issues.apache.org/jira/browse/SPARK-30421
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.4
>            Reporter: Tobias Hermann
>            Priority: Minor
>
> The following minimal example:
> {quote}val df = Seq((0, "a"), (1, "b")).toDF("foo", "bar")
> df.select("foo").where($"bar" === "a").show
> df.drop("bar").where($"bar" === "a").show
> {quote}
> should result in an error like the following:
> {quote}org.apache.spark.sql.AnalysisException: cannot resolve '`bar`' given 
> input columns: [foo];
> {quote}
> However, it does not; instead the query runs without error, as if the column 
> "bar" still existed.


