[ https://issues.apache.org/jira/browse/SPARK-23945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16433316#comment-16433316 ]
Nicholas Chammas commented on SPARK-23945:
------------------------------------------

I always looked at DataFrames and SQL as two different interfaces to the same underlying logical model, so I just assumed that the vision was for them to be equally powerful. Is that not the case?

So in the grand scheme of things I'd expect DataFrames to be able to do everything that SQL can and vice versa, but for the narrow purposes of this ticket I'm just interested in {{IN}} and {{NOT IN}}.

> Column.isin() should accept a single-column DataFrame as input
> --------------------------------------------------------------
>
>                 Key: SPARK-23945
>                 URL: https://issues.apache.org/jira/browse/SPARK-23945
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.3.0
>            Reporter: Nicholas Chammas
>            Priority: Minor
>
> In SQL you can filter rows based on the result of a subquery:
> {code:java}
> SELECT *
> FROM table1
> WHERE name NOT IN (
>     SELECT name
>     FROM table2
> );{code}
> In the Spark DataFrame API, the equivalent would probably look like this:
> {code:java}
> (table1
>     .where(
>         ~col('name').isin(
>             table2.select('name')
>         )
>     )
> ){code}
> However, .isin() currently [only accepts a local list of
> values|http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.Column.isin].
> I imagine making this enhancement would happen as part of a larger effort to
> support correlated subqueries in the DataFrame API.
> Or perhaps there is no plan to support this style of query in the DataFrame
> API, and queries like this should instead be written in a different way? How
> would we write a query like the one I have above in the DataFrame API,
> without needing to collect values locally for the NOT IN filter?