>>> df.show(3)
+----+-----+
|word|count|
+----+-----+
|  on|    1|
| dec|    1|
|2020|    1|
+----+-----+
only showing top 3 rows


>>> df2.show(3)
+--------+-----+
|stopword|count|
+--------+-----+
|    able|    1|
|   about|    1|
|   above|    1|
+--------+-----+
only showing top 3 rows


>>> df3 = df.filter(~col("word").isin(df2.stopword))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/python/pyspark/sql/dataframe.py", line 1733, in filter
    jdf = self._jdf.filter(condition._jc)
  File "/opt/spark/python/lib/py4j-0.10.9.2-src.zip/py4j/java_gateway.py", line 1310, in __call__
  File "/opt/spark/python/pyspark/sql/utils.py", line 117, in deco
    raise converted from None
pyspark.sql.utils.AnalysisException: Resolved attribute(s) stopword#4 missing from word#0,count#1L in operator !Filter NOT word#0 IN (stopword#4).;
!Filter NOT word#0 IN (stopword#4)
+- LogicalRDD [word#0, count#1L], false





The filter method doesn't work here.

Maybe I need a join between the two DataFrames?

What's the syntax for that?



Thank you and regards,

Bitfox
