On 5/12/2016 10:01 PM, Holden Karau wrote:
This is not the expected behavior; can you maybe post
the code where you are running into this?
Hello, thanks for replying!
Below is the function I took out from the code.
def converter(rdd: RDD[(String, JsValue)], param: String): RDD[(String, Int)] = {
  // I am breaking this down for future readability and ease of optimization;
  // as a first attempt at solving this problem, I am not concerned with
  // performance and pretty, more with accuracy ;)

  // r1 will be an RDD containing only the "param" method of selection
  val r1: RDD[(String, JsValue)] =
    rdd.filter(x => (x._2 \ "field1" \ "field2").as[String].replace("\"", "") == param.replace("\"", ""))
  // r2 will be an RDD of Lists of fields (A1-Z1) with associated counts
  // (remapFields returns a List[(String, Int)])
  val r2: RDD[List[(String, Int)]] = r1.map(x => remapFields(x._2 \ "extra"))
  // r3 will be flattened to enable grouping
  val r3: RDD[(String, Int)] = r2.flatMap(x => x)
  // now we can group by entity
  val r4: RDD[(String, Iterable[(String, Int)])] = r3.groupBy(x => x._1)
  // and produce a mapping of entity -> count pairs
  val r5: RDD[(String, Int)] = r4.map(x => (x._1, x._2.map(y => y._2).sum))
  // return the result
  r5
}
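FWIW, the flatten/group/sum part of the pipeline can be sanity-checked on plain Scala collections, no Spark needed. A minimal sketch, with hypothetical (entity, count) data standing in for whatever remapFields actually returns:

```scala
// Hypothetical stand-in for r2: each element is the List[(String, Int)]
// that remapFields would produce for one record.
val r2 = List(
  List(("a", 1), ("b", 2)),
  List(("a", 3))
)
// Same steps as the function, on collections instead of RDDs:
val r3 = r2.flatten                                        // flatMap(x => x)
val r4 = r3.groupBy(_._1)                                  // group by entity
val r5 = r4.map { case (k, vs) => (k, vs.map(_._2).sum) }  // entity -> total
// r5 == Map("a" -> 4, "b" -> 2)
```

If this produces the expected totals, the transformation logic itself is fine and the problem more likely sits in the filter or in how param reaches the function. (Once it works, r3.reduceByKey(_ + _) on the real RDD computes the same result in one step.)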
If I call the above function and then collectAsMap on the returned
RDD, I get an empty Map(). If I copy/paste this code into the
caller, I get the properly filled-in Map.
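One thing worth ruling out first is the quote-stripping comparison in the filter, since an empty r1 would make every downstream RDD empty too. A minimal check of that comparison on plain strings (no Spark), using a hypothetical field value:

```scala
// Hypothetical JSON field value as it might come back with surrounding
// quotes, compared against a bare param the way the filter does it.
val fieldValue = "\"method1\""
val param      = "method1"
val matches = fieldValue.replace("\"", "") == param.replace("\"", "")
// matches is true: stripping quotes makes the two sides equal
```

If r1.count() comes back zero on the real data, the actual field value and param differ in some other way (whitespace, case, a different path), not just in quoting.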
I am fairly new to Spark and Scala so excuse any inefficiencies - my
priority was to be able to solve the problem in an obvious and
correct way and worry about making it pretty later.
Thanks!
On Thursday, May 12, 2016, Dood@ODDO wrote:
Hello all,
I have been programming for years but this has me baffled.
I have an RDD[(String,Int)] that I return from a function after
extensive manipulation of an initial RDD of a different type.
When I return this RDD and initiate the .collectAsMap() on it
from the caller, I get an empty Map().
If I copy and paste the code from the function into the caller
(same exact code) and produce the same RDD and call
collectAsMap() on it, I get the Map with all the expected
information in it.
What gives?
Does Spark defy programming principles or am I crazy? ;-)
Thanks!
-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org
--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau