On 5/12/2016 10:01 PM, Holden Karau wrote:
This is not the expected behavior; can you post the code where you are running into this?


Hello, thanks for replying!

Below is the function I took out from the code.
def converter(rdd: RDD[(String, JsValue)], param: String): RDD[(String, Int)] = {
  // I am breaking this down into named steps for readability and ease of
  // future optimization; as a first attempt at solving this problem I am
  // concerned with accuracy rather than performance and prettiness ;)
  // r1 keeps only the records whose "field1"/"field2" value matches param
  val r1: RDD[(String, JsValue)] =
    rdd.filter(x => (x._2 \ "field1" \ "field2").as[String].replace("\"", "") == param.replace("\"", ""))
  // r2 maps each record to a List of fields (A1-Z1) with associated counts;
  // remapFields returns a List[(String, Int)]
  val r2: RDD[List[(String, Int)]] = r1.map(x => remapFields(x._2 \ "extra"))
  // r3 flattens the lists to enable grouping
  val r3: RDD[(String, Int)] = r2.flatMap(x => x)
  // now we can group by entity
  val r4: RDD[(String, Iterable[(String, Int)])] = r3.groupBy(_._1)
  // and produce a mapping of entity -> summed counts
  val r5: RDD[(String, Int)] = r4.map { case (k, vs) => (k, vs.map(_._2).sum) }
  // return the result
  r5
}
If I call the above function and then collectAsMap on the returned RDD, I get an empty Map(). If I copy/paste the same code into the caller, I get the properly filled-in Map.
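For what it's worth, the r2 -> r5 part of the pipeline can be sanity-checked outside Spark on plain Scala collections, since flatMap/groupBy/map behave the same way there (a sketch; the field names and counts below are made up, standing in for remapFields output):

```scala
// Plain-Scala model of steps r2 -> r5: lists of (field, count) pairs
// are flattened, grouped by field, and the counts summed per field.
val r2 = List(
  List(("A1", 1), ("B1", 2)),  // remapFields output for one record
  List(("A1", 3), ("C1", 1))   // remapFields output for another record
)
val r3 = r2.flatten                                        // flatMap(x => x)
val r4 = r3.groupBy(_._1)                                  // group by entity
val r5 = r4.map { case (k, vs) => (k, vs.map(_._2).sum) }  // sum counts

println(r5.toList.sortBy(_._1))  // List((A1,4), (B1,2), (C1,1))
```

If the logic checks out locally like this, the emptiness of the collected Map points at something about how the function is invoked (e.g. the filter matching nothing for the param actually passed in) rather than at the transformation chain itself.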

I am fairly new to Spark and Scala, so please excuse any inefficiencies; my priority was to solve the problem in an obvious, correct way and worry about making it pretty later.

Thanks!

On Thursday, May 12, 2016, Dood@ODDO <oddodao...@gmail.com> wrote:
Hello all,

I have been programming for years but this has me baffled.

I have an RDD[(String, Int)] that I return from a function after extensive manipulation of an initial RDD of a different type. When I return this RDD and call .collectAsMap() on it from the caller, I get an empty Map().

If I copy and paste the code from the function into the caller (the exact same code), produce the same RDD, and call collectAsMap() on it, I get the Map with all the expected information in it.

What gives?

Does Spark defy programming principles or am I crazy? ;-)

Thanks!

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



--
Cell : 425-233-8271

