>>> df.show(3)
+----+-----+
|word|count|
+----+-----+
|  on|    1|
| dec|    1|
|2020|    1|
+----+-----+
only showing top 3 rows

>>> df2.show(3)
+--------+-----+
|stopword|count|
+--------+-----+
|    able|    1|
|   about|    1|
|   above|    1|
+--------+-----+
only showing top 3 rows

>>> df3=df.filter(~col("word").isin(df2.stopword ))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/sql/dataframe.py", line 1733, in filter
    jdf = self._jdf.filter(condition._jc)
  File "/opt/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1310, in __call__
  File "/opt/spark/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: Resolved attribute(s) stopword#4 missing from word#0,count#1L in operator !Filter NOT word#0 IN (stopword#4).;
!Filter NOT word#0 IN (stopword#4)
+- LogicalRDD [word#0, count#1L], false

The filter method doesn't work here. Maybe I need a join between the two DataFrames? What's the syntax for this?

Thank you and regards,
Bitfox