Re: Confused - returning RDDs from functions

2016-05-13 Thread Dood

  
  
On 5/12/2016 10:01 PM, Holden Karau wrote:
This is not the expected behavior, can you maybe post
  the code where you are running into this?
  


Hello, thanks for replying!

Below is the function I took out from the code.

def converter(rdd: RDD[(String, JsValue)], param:String): RDD[(String, Int)] = {
  // I am breaking this down for future readability and ease of optimization
  // as a first attempt at solving this problem, I am not concerned with performance
  // and pretty, more with accuracy ;)
  // r1 will be an RDD containing only the "param" method of selection
  val r1:RDD[(String,JsValue)] = rdd.filter(x => (x._2 \ "field1" \ "field2").as[String].replace("\"","") == param.replace("\"",""))
  // r2 will be an RDD of Lists of fields (A1-Z1) with associated counts
  // remapFields returns a List[(String,Int)]
  val r2:RDD[List[(String,Int)]] = r1.map(x => remapFields(x._2 \ "extra"))
  // r3 will be flattened to enable grouping
  val r3:RDD[(String,Int)] = r2.flatMap(x => x)
  // now we can group by entity
  val r4:RDD[(String,Iterable[(String,Int)])] = r3.groupBy(x => x._1)
  // and produce a mapping of entity -> count pairs
  val r5:RDD[(String,Int)] = r4.map(x => (x._1, x._2.map(y => y._2).sum))
  // return the result
  r5
}

If I call on the above function and collectAsMap on the returned
RDD, I get an empty Map(). If I copy/paste this code into the
caller, I get the properly filled in Map.

I am fairly new to Spark and Scala so excuse any inefficiencies - my
priority was to be able to solve the problem in an obvious and
correct way and worry about making it pretty later. 

Thanks!

On Thursday, May 12, 2016, Dood@ODDO 
  wrote:
  Hello all,

I have been programming for years but this has me baffled.

I have an RDD[(String,Int)] that I return from a function after
extensive manipulation of an initial RDD of a different type.
When I return this RDD and initiate the .collectAsMap() on it
from the caller, I get an empty Map().

If I copy and paste the code from the function into the caller
(same exact code) and produce the same RDD and call
collectAsMap() on it, I get the Map with all the expected
information in it.

What gives?

Does Spark defy programming principles or am I crazy? ;-)

Thanks!

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

  
  
  
  -- 
  

  
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau
  

  
  


  


-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Re: Confused - returning RDDs from functions

2016-05-12 Thread Holden Karau
This is not the expected behavior, can you maybe post the code where you
are running into this?

On Thursday, May 12, 2016, Dood@ODDO  wrote:

> Hello all,
>
> I have been programming for years but this has me baffled.
>
> I have an RDD[(String,Int)] that I return from a function after extensive
> manipulation of an initial RDD of a different type. When I return this RDD
> and initiate the .collectAsMap() on it from the caller, I get an empty
> Map().
>
> If I copy and paste the code from the function into the caller (same exact
> code) and produce the same RDD and call collectAsMap() on it, I get the Map
> with all the expected information in it.
>
> What gives?
>
> Does Spark defy programming principles or am I crazy? ;-)
>
> Thanks!
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>

-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Confused - returning RDDs from functions

2016-05-12 Thread Dood

Hello all,

I have been programming for years but this has me baffled.

I have an RDD[(String,Int)] that I return from a function after 
extensive manipulation of an initial RDD of a different type. When I 
return this RDD and initiate the .collectAsMap() on it from the caller, 
I get an empty Map().


If I copy and paste the code from the function into the caller (same 
exact code) and produce the same RDD and call collectAsMap() on it, I 
get the Map with all the expected information in it.


What gives?

Does Spark defy programming principles or am I crazy? ;-)

Thanks!

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org