Re: Help with using combineByKey

2014-10-10 Thread Davies Liu
Maybe this version is easier to use: plist.mapValues((v) => (if (v > 0) 1 else 0, 1)).reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2)) It behaves similarly to combineByKey() and will be faster than the groupByKey() version. On Thu, Oct 9, 2014 at 9:28 PM, HARIPRIYA AYYALASOMAYAJULA
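The mapValues/reduceByKey idea in this message can be tried without a cluster. The following is a minimal sketch using plain Scala collections rather than Spark: `map` stands in for `mapValues`, and `groupBy` followed by `reduce` stands in for `reduceByKey`. The sample data is the list posted elsewhere in the thread; the names `tagged`, `counts`, and `ratios` are illustrative and not from the original messages.

```scala
// Sample data from the thread: (airport, value) pairs.
val plist: List[(String, Int)] =
  List(("LAX", 6), ("LAX", 0), ("LAX", 7), ("SFO", 0), ("SFO", 0), ("SFO", 9))

// Step 1 (stand-in for mapValues): turn each value into
// (1 if the value is positive else 0, 1).
val tagged: List[(String, (Int, Int))] =
  plist.map { case (k, v) => (k, (if (v > 0) 1 else 0, 1)) }

// Step 2 (stand-in for reduceByKey): sum the pairs element-wise per key,
// giving (count of positive values, count of all values).
val counts: Map[String, (Int, Int)] =
  tagged.groupBy(_._1).map { case (k, kvs) =>
    k -> kvs.map(_._2).reduce((x, y) => (x._1 + y._1, x._2 + y._2))
  }

// Ratio of positive values to all values for each key.
val ratios: Map[String, Double] =
  counts.map { case (k, (pos, total)) => k -> pos.toDouble / total }
```

On a real RDD the two steps would be the `mapValues` and `reduceByKey` calls quoted in the message; the local version only mirrors their per-key arithmetic.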

Re: Help with using combineByKey

2014-10-10 Thread HARIPRIYA AYYALASOMAYAJULA
Thank you guys! It was very helpful and now I understand it better. On Fri, Oct 10, 2014 at 1:38 AM, Davies Liu dav...@databricks.com wrote: Maybe this version is easier to use: plist.mapValues((v) => (if (v > 0) 1 else 0, 1)).reduceByKey((x, y) => (x._1 + y._1, x._2 + y._2)) It has

Re: Help with using combineByKey

2014-10-09 Thread Sean Owen
You have a typo in your code at var acc:, and the map from opPart1 to opPart2 looks like a no-op, but those aren't the problem I think. It sounds like you intend the first element of each pair to be a count of nonzero values, but you initialize the first element of the pair to v, not 1, in v =

Re: Help with using combineByKey

2014-10-09 Thread Yana Kadiyska
If you just want the ratio of positive to all values per key (if I'm reading this right), this works: val reduced = input.groupByKey().map(grp => grp._2.filter(v => v > 0).size.toFloat / grp._2.size) reduced.foreach(println) I don't think you need reduceByKey or combineByKey as you're not doing anything where the

Re: Help with using combineByKey

2014-10-09 Thread HARIPRIYA AYYALASOMAYAJULA
Hello Sean, Thank you, but changing from v to 1 doesn't help me either. I am trying to count the number of non-zero values using the first accumulator. val newlist = List(("LAX",6), ("LAX",0), ("LAX",7), ("SFO",0), ("SFO",0), ("SFO",9)) val plist = sc.parallelize(newlist) val part1 = plist.combineByKey(

Re: Help with using combineByKey

2014-10-09 Thread HARIPRIYA AYYALASOMAYAJULA
Sean, Thank you. It works. But I am still confused about the function. Could you kindly shed some light on it? I was going through the example mentioned in https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html Is there any better source through which I can learn

Re: Help with using combineByKey

2014-10-09 Thread Sean Owen
It's the exact same reason you wrote: (acc: (Int, Int), v) => ( if(v > 0) acc._1 + 1 else acc._1, acc._2 + 1), right? The first function establishes an initial value for a count. The value is either (0,1) or (1,1) depending on whether the value is 0 or not. You're otherwise using the method just
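To make the roles of the three functions concrete, here is a minimal local sketch of how combineByKey uses them. It is an assumption-laden stand-in, not Spark's implementation: `combineByKeyLocal` is a hypothetical helper written for this illustration, and the split of the thread's sample data into two "partitions" is arbitrary. The first function (createCombiner) runs on the first value seen for a key within a partition, the second (mergeValue) folds in later values from the same partition, and the third (mergeCombiners) merges results across partitions.

```scala
// Hypothetical local stand-in for combineByKey (not Spark's code):
// fold each "partition" separately, then merge the per-partition results.
def combineByKeyLocal[K, V, C](
    partitions: Seq[Seq[(K, V)]],
    createCombiner: V => C,
    mergeValue: (C, V) => C,
    mergeCombiners: (C, C) => C): Map[K, C] = {
  val perPartition: Seq[Map[K, C]] = partitions.map { part =>
    part.foldLeft(Map.empty[K, C]) { case (acc, (k, v)) =>
      acc.get(k) match {
        case Some(c) => acc.updated(k, mergeValue(c, v))  // key already seen in this partition
        case None    => acc.updated(k, createCombiner(v)) // first value for this key here
      }
    }
  }
  // Merge the combiners produced by the different partitions.
  perPartition.foldLeft(Map.empty[K, C]) { (merged, m) =>
    m.foldLeft(merged) { case (acc, (k, c)) =>
      acc.updated(k, acc.get(k).map(mergeCombiners(_, c)).getOrElse(c))
    }
  }
}

// The thread's sample data, split arbitrarily into two partitions.
val partitions: Seq[Seq[(String, Int)]] = Seq(
  Seq(("LAX", 6), ("LAX", 0), ("SFO", 0)),
  Seq(("LAX", 7), ("SFO", 0), ("SFO", 9)))

// (count of positive values, count of all values) per key.
val result: Map[String, (Int, Int)] = combineByKeyLocal[String, Int, (Int, Int)](
  partitions,
  v => (if (v > 0) 1 else 0, 1),                               // initial (0,1) or (1,1)
  (acc, v) => (if (v > 0) acc._1 + 1 else acc._1, acc._2 + 1), // fold in one more value
  (a, b) => (a._1 + b._1, a._2 + b._2))                        // merge two partial counts
```

If the first function returned (v, 1) instead, the first value of each key would be added to the count as-is rather than counted once, which is the initialization issue raised earlier in the thread.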