Thanks for replying so fast! Sorry it was not clear; my code is:

    joined_dataset = show_channel.join(show_views)
For your knowledge, the first lines are:

    joined_dataset.take(4)
    Out[93]:
    [(u'PostModern_Cooking', (u'DEF', 1038)),
     (u'PostModern_Cooking', (u'DEF', 415)),
     (u'PostModern_Cooking', (u'DEF', 100)),
     (u'PostModern_Cooking', (u'DEF', 597))]

I would like to sum the views per show for channel = "ABC".

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-work-with-a-joined-rdd-in-pyspark-tp25510p25512.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
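One possible approach, given that each joined record has the shape `(show, (channel, views))` as in the `take(4)` output above: filter on the channel, keep the view counts, and sum per show key. A minimal sketch follows; the sample rows (`Show_A`, `Show_B`) are made up for illustration, since the real data only shows `DEF` rows, and the PySpark pipeline is shown in comments with the same logic run in plain Python below it.

```python
from collections import defaultdict

# Hypothetical sample records in the same (show, (channel, views)) shape
# as joined_dataset; Show_A and Show_B are invented for illustration.
joined = [
    (u'PostModern_Cooking', (u'DEF', 1038)),
    (u'Show_A', (u'ABC', 500)),
    (u'Show_A', (u'ABC', 250)),
    (u'Show_B', (u'ABC', 100)),
]

# In PySpark the same logic would be (untested sketch):
#   views_per_show = (joined_dataset
#       .filter(lambda kv: kv[1][0] == 'ABC')   # keep only channel ABC
#       .map(lambda kv: (kv[0], kv[1][1]))      # -> (show, views)
#       .reduceByKey(lambda a, b: a + b))       # sum views per show
#   views_per_show.collect()

# The equivalent pipeline in plain Python:
totals = defaultdict(int)
for show, (channel, views) in joined:
    if channel == u'ABC':
        totals[show] += views

print(dict(totals))
```

With the sample rows above this yields 750 views for `Show_A` and 100 for `Show_B`; the `PostModern_Cooking` row is dropped by the channel filter.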