Maybe this version is easier to use:

plist.mapValues((v) => (if (v > 0) 1 else 0, 1))
     .reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2))

It has similar behavior to combineByKey(), and will be faster than the
groupByKey() version.
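(A minimal sketch of the suggested pipeline, emulating mapValues and reduceByKey on a plain Scala List so the per-key arithmetic can be checked without a SparkContext. The object name `RatioSketch` is hypothetical, and the sample data is taken from later in the thread.)

```scala
object RatioSketch {
  // Emulated mapValues + reduceByKey: each value becomes
  // (1 if nonzero else 0, 1), then both components are summed per key.
  def nonZeroAndTotal(data: List[(String, Int)]): Map[String, (Int, Int)] =
    data
      .map { case (k, v) => (k, (if (v > 0) 1 else 0, 1)) } // mapValues step
      .groupBy(_._1)                                        // shuffle by key
      .map { case (k, pairs) =>                             // reduceByKey step
        (k, pairs.map(_._2).reduce((x, y) => (x._1 + y._1, x._2 + y._2)))
      }

  def main(args: Array[String]): Unit = {
    val plist = List(("LAX", 6), ("LAX", 0), ("LAX", 7),
                     ("SFO", 0), ("SFO", 0), ("SFO", 9))
    // Per key: (count of nonzero values, total count)
    println(nonZeroAndTotal(plist).toList.sortBy(_._1))
  }
}
```

On the sample data this yields (2, 3) for LAX and (1, 3) for SFO, from which the ratio of nonzero values is just `_1.toFloat / _2`.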
On Thu, Oct 9, 2014 at 9:28 PM, HARIPRIYA AYYALASOMAYAJULA wrote:
Thank you guys!
It was very helpful and now I understand it better.
On Fri, Oct 10, 2014 at 1:38 AM, Davies Liu dav...@databricks.com wrote:
You have a typo in your code at `var acc:`, and the map from opPart1
to opPart2 looks like a no-op, but those aren't the problem, I think.
It sounds like you intend the first element of each pair to be a count
of nonzero values, but you initialize the first element of the pair to
v, not 1, in your `v =>` function.
If you just want the ratio of positive to all values per key (if I'm
reading right), this works:

val reduced = input.groupByKey().map(grp =>
  grp._2.filter(v => v > 0).size.toFloat / grp._2.size)
reduced.foreach(println)
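(A plain-Scala equivalent of the groupByKey() version, for checking the ratios on the sample data without Spark. Note it uses the keys in the result, whereas the map() form above keeps only the ratios; the object name `GroupRatioSketch` is hypothetical.)

```scala
object GroupRatioSketch {
  // Emulated groupByKey + ratio: group all values per key, then divide
  // the count of positive values by the group size.
  def ratios(data: List[(String, Int)]): Map[String, Float] =
    data.groupBy(_._1).map { case (k, vs) =>
      val values = vs.map(_._2)
      (k, values.count(_ > 0).toFloat / values.size)
    }

  def main(args: Array[String]): Unit = {
    val newlist = List(("LAX", 6), ("LAX", 0), ("LAX", 7),
                       ("SFO", 0), ("SFO", 0), ("SFO", 9))
    // LAX has 2 of 3 nonzero values, SFO has 1 of 3.
    ratios(newlist).toList.sortBy(_._1).foreach(println)
  }
}
```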
I don't think you need reduceByKey or combineByKey as you're not doing
anything where the
Hello Sean,
Thank you, but changing from v to 1 doesn't help me either.
I am trying to count the number of non-zero values using the first
accumulator.
val newlist = List(("LAX", 6), ("LAX", 0), ("LAX", 7),
                   ("SFO", 0), ("SFO", 0), ("SFO", 9))
val plist = sc.parallelize(newlist)
val part1 = plist.combineByKey(
Sean,
Thank you. It works. But I am still confused about the function. Can you
kindly throw some light on it?
I was going through the example mentioned in
https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html
Is there any better source through which I can learn
It's the exact same reason you wrote:
(acc: (Int, Int), v) => (if (v > 0) acc._1 + 1 else acc._1, acc._2 + 1),
right? The first function establishes an initial value for a count.
The value is either (0,1) or (1,1) depending on whether the value is 0
or not.
You're otherwise using the method just
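(A sketch of the three combineByKey() functions being discussed, applied per key with a plain foldLeft instead of Spark, so the (0,1)/(1,1) seeding behavior can be observed directly. The object name `CombineSketch` is hypothetical.)

```scala
object CombineSketch {
  // createCombiner seeds the (nonZero, total) pair from the first value
  // seen for a key: (1, 1) if v > 0, (0, 1) otherwise.
  val createCombiner: Int => (Int, Int) =
    v => (if (v > 0) 1 else 0, 1)

  // mergeValue folds each further value into the accumulator.
  val mergeValue: ((Int, Int), Int) => (Int, Int) =
    (acc, v) => (if (v > 0) acc._1 + 1 else acc._1, acc._2 + 1)

  // mergeCombiners combines per-partition accumulators (trivial here,
  // since the emulation is single-partition).
  val mergeCombiners: ((Int, Int), (Int, Int)) => (Int, Int) =
    (a, b) => (a._1 + b._1, a._2 + b._2)

  // Per key: seed with the first value, then fold in the rest.
  def combinePerKey(data: List[(String, Int)]): Map[String, (Int, Int)] =
    data.groupBy(_._1).map { case (k, vs) =>
      val values = vs.map(_._2)
      (k, values.tail.foldLeft(createCombiner(values.head))(mergeValue))
    }
}
```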