Ahhh, I get it, thanks!!
I did not know that we could use a "double index":
x[0] points at the show, x[1][0] at the channel, and x[1][1] at the views.
I feel terribly noob.
Thank you all :)
Hi.
Your code is like this, right?

joined_dataset = show_channel.join(show_views)
joined_dataset.take(4)

Well, the result of .take(4) is a plain Python list, not an RDD, so it does not support any RDD operations. If you kept working with that result instead of the RDD itself, could that be the problem?
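A quick plain-Python illustration of that point (no Spark needed; the sample row is made up to match the (show, (channel, views)) shape of the joined data):

```python
# Rows shaped like take(4)'s output: (show, (channel, views)).
# The values here are invented sample data.
rows = [(u'PostModern_Cooking', (u'DEF', 1038))]

# A plain list has no RDD methods such as .filter or .reduceByKey:
print(hasattr(rows, "filter"))  # prints False

try:
    rows.filter(lambda kv: kv[1][0] == "ABC")
except AttributeError as err:
    print("not an RDD:", err)
```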
Otherwise, more code is needed to tell what's going on.
Thanks for replying so fast!
Sorry, it was not clear. My code is:

joined_dataset = show_channel.join(show_views)

FYI, the first rows are:

joined_dataset.take(4)
Out[93]:
[(u'PostModern_Cooking', (u'DEF', 1038)),
 (u'PostModern_Cooking', (u'DEF', 415)),
 (u'PostModern_Cooking', (u'DEF', ...
Hi.
Can't you do a filter to get only the ABC shows, map that into a keyed pair of (show, view count), and then do a reduceByKey to sum up the views?
Something like this in Scala:

// filter for the channel, then build a new pair (show, view count)
val myAnswer = joined_dataset
  .filter(_._2._1 == "ABC")
  .map(t => (t._1, t._2._2))
  .reduceByKey(_ + _)
Yes, that's what I am trying to do, but I don't manage to "point" at the channel field to filter on "ABC", and then in the map step to keep only the shows and views.
In Scala you do it with (_._2._1 == "ABC") and (_._1, _._2._2), but I can't find the right syntax to do the same in Python :(
Can't you just access the tuple elements by index, like with [0] and [1]?
http://www.tutorialspoint.com/python/python_tuples.htm
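To spell that out, a minimal sketch: the Scala underscores become Python lambdas over tuple indices. The show and channel values below are invented sample data, and the commented-out RDD lines assume a joined_dataset of (show, (channel, views)) pairs like the one earlier in the thread.

```python
# Rows shaped like the joined RDD: (show, (channel, views)).
# Sample values are invented for illustration.
records = [
    (u'PostModern_Cooking', (u'DEF', 1038)),
    (u'Morning_Show', (u'ABC', 500)),
    (u'Morning_Show', (u'ABC', 250)),
]

is_abc = lambda kv: kv[1][0] == "ABC"     # Scala: _._2._1 == "ABC"
to_pair = lambda kv: (kv[0], kv[1][1])    # Scala: (_._1, _._2._2)

# Plain-Python check of the indexing logic:
pairs = [to_pair(kv) for kv in records if is_abc(kv)]
print(pairs)  # the two ABC rows as (show, views) pairs

# The same lambdas plug straight into the RDD API:
#   joined_dataset.filter(is_abc) \
#                 .map(to_pair) \
#                 .reduceByKey(lambda a, b: a + b)
```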
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-work-with-a-joined-rdd-in-pyspark-tp25510p25517.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.