You can use a left anti join instead. isin accepts a Python list of values,
not a column from another DataFrame.
On Mon, Jan 24, 2022 at 01:38 Bitfox wrote:
> >>> df.show(3)
> +----+-----+
> |word|count|
> +----+-----+
> |  on|    1|
> | dec|    1|
> |2020|    1|
> +----+-----+
> only showing top 3 rows
>
> >>> df2.show(3)
> +--------+-----+
> |stopword|count|
> +--------+-----+
> |    able|    1|
> |   about|    1|
> |   above|    1|
> +--------+-----+
> only showing top 3 rows
>
> >>> df3=df.filter(~col("word").isin(df2.stopword ))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/opt/spark/python/pyspark/sql/dataframe.py", line 1733, in filter
>     jdf = self._jdf.filter(condition._jc)
>   File "/opt/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1310, in __call__
>   File "/opt/spark/python/pyspark/sql/utils.py", line 117, in deco
>     raise converted from None
> pyspark.sql.utils.AnalysisException: Resolved attribute(s) stopword#4 missing from word#0,count#1L in operator !Filter NOT word#0 IN (stopword#4).;
> !Filter NOT word#0 IN (stopword#4)
> +- LogicalRDD [word#0, count#1L], false
>
> The filter method doesn't work here. Maybe I need a join between the two
> DataFrames? What's the syntax for this?
>
> Thank you and regards,
> Bitfox
--
Gary Liu