[ https://issues.apache.org/jira/browse/SPARK-3581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Shawn Guo updated SPARK-3581:
-----------------------------
    Description:
Construct a RDD of dictionaries(dictRDD),
try to use the RDD API, RDD.distinct() or RDD.subtract().
{code:title=PySpark RDD API|borderStyle=solid}
dictRDD = sc.parallelize(({'MOVIE_ID': 1, 'MOVIE_NAME': 'Lord of the Rings','MOVIE_DIRECTOR': 'Peter Jackson'},{'MOVIE_ID': 2, 'MOVIE_NAME': 'King King', 'MOVIE_DIRECTOR': 'Peter Jackson'},{'MOVIE_ID': 2, 'MOVIE_NAME': 'King King', 'MOVIE_DIRECTOR': 'Peter Jackson'}))
dictRDD.distinct().collect()
dictRDD.subtract(dictRDD).collect()
{code}
An error occurred while calling,
TypeError: unhashable type: 'dict'
I'm not sure if it is a bug or expected results.

  was:
Construct a RDD of dictionaries(dictRDD),
try to use the RDD API, RDD.distinct() or RDD.subtract().
dictRDD = sc.parallelize(({'MOVIE_ID': 1, 'MOVIE_NAME': 'Lord of the Rings','MOVIE_DIRECTOR': 'Peter Jackson'},{'MOVIE_ID': 2, 'MOVIE_NAME': 'King King', 'MOVIE_DIRECTOR': 'Peter Jackson'},{'MOVIE_ID': 2, 'MOVIE_NAME': 'King King', 'MOVIE_DIRECTOR': 'Peter Jackson'}))
dictRDD.distinct().collect()
dictRDD.subtract(dictRDD).collect()
An error occurred while calling,
TypeError: unhashable type: 'dict'
I'm not sure if it is a bug or expected results.


> RDD API(distinct/subtract) does not work for RDD of Dictionaries
> ----------------------------------------------------------------
>
>                 Key: SPARK-3581
>                 URL: https://issues.apache.org/jira/browse/SPARK-3581
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 1.1.0
>         Environment: Spark 1.0 1.1
> JDK 1.6
>            Reporter: Shawn Guo
>            Priority: Minor
>
> Construct a RDD of dictionaries(dictRDD),
> try to use the RDD API, RDD.distinct() or RDD.subtract().
> {code:title=PySpark RDD API|borderStyle=solid}
> dictRDD = sc.parallelize(({'MOVIE_ID': 1, 'MOVIE_NAME': 'Lord of the Rings','MOVIE_DIRECTOR': 'Peter Jackson'},{'MOVIE_ID': 2, 'MOVIE_NAME': 'King King', 'MOVIE_DIRECTOR': 'Peter Jackson'},{'MOVIE_ID': 2, 'MOVIE_NAME': 'King King', 'MOVIE_DIRECTOR': 'Peter Jackson'}))
> dictRDD.distinct().collect()
> dictRDD.subtract(dictRDD).collect()
> {code}
> An error occurred while calling,
> TypeError: unhashable type: 'dict'
> I'm not sure if it is a bug or expected results.
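
For context on the TypeError above: Python dicts are mutable and cannot be hashed, and distinct() and subtract() appear to need hashable elements. The following is only a minimal sketch of one possible workaround, not a confirmed fix for this issue. It assumes flat dictionaries with string keys and hashable values, and it uses a plain Python list in place of the RDD so that it runs without Spark. The helper name to_hashable and the variable otherRDD are hypothetical.

```python
# Sketch only: a plain list stands in for dictRDD so this runs without Spark.
movies = [
    {'MOVIE_ID': 1, 'MOVIE_NAME': 'Lord of the Rings', 'MOVIE_DIRECTOR': 'Peter Jackson'},
    {'MOVIE_ID': 2, 'MOVIE_NAME': 'King King', 'MOVIE_DIRECTOR': 'Peter Jackson'},
    {'MOVIE_ID': 2, 'MOVIE_NAME': 'King King', 'MOVIE_DIRECTOR': 'Peter Jackson'},
]


def to_hashable(record):
    """Hypothetical helper: turn a flat dict into a sorted tuple of (key, value) pairs."""
    return tuple(sorted(record.items()))


# A dict cannot be hashed, which matches the TypeError in the report.
try:
    hash(movies[0])
    dict_is_hashable = True
except TypeError:
    dict_is_hashable = False

# Deduplicate through the hashable form, then rebuild the dicts.
unique_movies = [dict(key) for key in sorted(set(map(to_hashable, movies)))]

# Assumed PySpark equivalent (not executed here):
#   dictRDD.map(to_hashable).distinct().map(dict).collect()
#   dictRDD.map(to_hashable).subtract(otherRDD.map(to_hashable)).map(dict).collect()
```

Sorting the items makes the tuple independent of key order, so two dicts with equal contents map to the same key. Nested dicts or list values would still be unhashable and would need a deeper conversion.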