Hi. Couldn't you filter down to just the ABC records, map each record into a (show, view count) pair keyed by show, and then do a reduceByKey to sum up the views?
Something like this in Scala:

    // filter for the channel, then make a new pair (show, view count)
    val myAnswer = joined_dataset
      .filter( _._2._1 == "ABC" )
      .map( x => (x._1, x._2._2) )
      .reduceByKey( (a, b) => a + b )

This should give you an RDD with one record per show and the summed view count, but only for shows on ABC, right?

--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/How-to-work-with-a-joined-rdd-in-pyspark-tp25510p25514.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
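Since the original thread is about PySpark, here is the same filter -> map -> reduceByKey logic sketched in plain Python over an in-memory list, just to show what each step produces. The records and show names are made up for illustration; the shape (show, (channel, views)) is assumed to match the joined RDD.

    from collections import defaultdict

    # hypothetical joined records: (show, (channel, views))
    joined = [
        ("Show1", ("ABC", 100)),
        ("Show2", ("CBS", 50)),
        ("Show1", ("ABC", 25)),
        ("Show3", ("ABC", 10)),
    ]

    # filter: keep only the ABC records
    abc_only = [rec for rec in joined if rec[1][0] == "ABC"]

    # map: (show, (channel, views)) -> (show, views)
    pairs = [(show, views) for show, (_, views) in abc_only]

    # reduceByKey: sum the views per show
    totals = defaultdict(int)
    for show, views in pairs:
        totals[show] += views

    print(dict(totals))  # {'Show1': 125, 'Show3': 10}

On a real RDD the same steps would be the filter/map/reduceByKey calls themselves; this is just the per-record logic written out.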