Re: [Spark] Working with JavaPairRDD from Scala

2017-07-22 Thread Lukasz Tracewski
Hi - and my thanks to you and Gerard. Only the late hour can explain 
how I could possibly have missed this.

Cheers!
Lukasz

On 22/07/2017 10:48, yohann jardin wrote:

Hello Lukasz,


You can just:

val pairRdd = javapairrdd.rdd


Then pairRdd will be of type RDD[(K, V)], with K being 
com.vividsolutions.jts.geom.Polygon, and V being 
java.util.HashSet[com.vividsolutions.jts.geom.Polygon]
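A minimal sketch of that route, assuming Spark 2.x on Scala 2.11/2.12 (hence scala.collection.JavaConverters) and JTS on the classpath. The names IntersectionSketch and intersectionAreas are made up for illustration; the body of the map is the computation from the original question.

```scala
import java.util.{HashSet => JHashSet}

import scala.collection.JavaConverters._
import scala.collection.mutable

import com.vividsolutions.jts.geom.Polygon
import org.apache.spark.api.java.JavaPairRDD
import org.apache.spark.rdd.RDD

object IntersectionSketch {
  // Hypothetical helper: unwrap the Java wrapper and continue with the Scala API.
  def intersectionAreas(
      javaPairRdd: JavaPairRDD[Polygon, JHashSet[Polygon]]
  ): RDD[(Polygon, mutable.Set[Double])] = {
    // `rdd` exposes the underlying RDD[(Polygon, JHashSet[Polygon])].
    val pairRdd: RDD[(Polygon, JHashSet[Polygon])] = javaPairRdd.rdd

    // Each element is a single tuple, so destructure it with `case`.
    pairRdd.map { case (polygon, others) =>
      // asScala wraps the java.util.HashSet as a mutable.Set without copying.
      (polygon, others.asScala.map(other => polygon.intersection(other).getArea))
    }
  }
}
```

Note that the result is a Set, so two neighbours with exactly the same intersection area collapse into one entry; convert with toSeq before the inner map if every area should be kept.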



If you really want to continue with Java objects:

val calculateIntersection = new Function2[Polygon, HashSet[Polygon], 
scala.collection.mutable.Set[Double]]() {}

and in the curly braces, overriding the call function.
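A hedged sketch of the Java-object route. As far as I can tell, JavaPairRDD.map takes the single-argument org.apache.spark.api.java.function.Function applied to a Tuple2, not Function2, so the sketch implements that interface; please verify against your Spark version. It assumes Spark 2.x, Scala 2.11/2.12 and JTS; JavaFunctionSketch, Areas and intersectionAreas are illustrative names only.

```scala
import java.util.{HashSet => JHashSet}

import scala.collection.JavaConverters._
import scala.collection.mutable

import com.vividsolutions.jts.geom.Polygon
import org.apache.spark.api.java.{JavaPairRDD, JavaRDD}
import org.apache.spark.api.java.function.{Function => JFunction}

object JavaFunctionSketch {
  // Illustrative alias for the result type of one map step.
  type Areas = (Polygon, mutable.Set[Double])

  // Anonymous implementation of the Java interface; the only method to override is `call`.
  val calculateIntersection: JFunction[(Polygon, JHashSet[Polygon]), Areas] =
    new JFunction[(Polygon, JHashSet[Polygon]), Areas] {
      override def call(pair: (Polygon, JHashSet[Polygon])): Areas = {
        val (polygon, others) = pair
        (polygon, others.asScala.map(other => polygon.intersection(other).getArea))
      }
    }

  // Stays within the Java API: the result is a JavaRDD, not a Scala RDD.
  def intersectionAreas(
      javaPairRdd: JavaPairRDD[Polygon, JHashSet[Polygon]]
  ): JavaRDD[Areas] =
    javaPairRdd.map(calculateIntersection)
}
```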


Another solution would be to use lambda (I do not code much in scala and I'm 
definitely not sure this works, but I expect it to, so you'd have to test it):

javapairrdd.map((polygon: Polygon, hash: HashSet[Polygon]) => (polygon, 
hash.asScala.map(polygon.intersection(_).getArea)))
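One caveat worth testing: in Scala 2, map hands each element over as a single pair, so a two-parameter function usually has to be adapted with .tupled or a `case` pattern. The sketch below shows both forms on a plain Seq, with no Spark involved; the same shape applies to a Scala RDD of pairs. The overlap function is a hypothetical stand-in for polygon.intersection(other).getArea, and Scala 2.11/2.12 is assumed for JavaConverters.

```scala
import java.util.{HashSet => JHashSet}

import scala.collection.JavaConverters._
import scala.collection.mutable

object TupledSketch {
  // Hypothetical stand-in for polygon.intersection(other).getArea.
  def overlap(a: Int, b: Int): Double = math.min(a, b).toDouble

  // Two-parameter function, shaped like calculate_intersection in the question.
  def calculateOverlaps(key: Int, others: JHashSet[Int]): (Int, mutable.Set[Double]) =
    (key, others.asScala.map(other => overlap(key, other)))

  def main(args: Array[String]): Unit = {
    val others = new JHashSet[Int]()
    others.add(2)
    others.add(5)
    val pairs = Seq((3, others))

    // Option 1: turn the two-parameter function into one that accepts a pair.
    val viaTupled = pairs.map((calculateOverlaps _).tupled)

    // Option 2: destructure the pair with a pattern-matching lambda.
    val viaCase = pairs.map { case (key, os) => calculateOverlaps(key, os) }

    assert(viaTupled == viaCase)
  }
}
```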

________
From: Lukasz Tracewski 
<mailto:lukasz.tracew...@outlook.com>
Sent: Saturday, 22 July 2017 00:18
To: user@spark.apache.org<mailto:user@spark.apache.org>
Subject: [Spark] Working with JavaPairRDD from Scala


Hi,

I would like to call a method on a JavaPairRDD from Scala and I am not sure how 
to write a function for "map". I am using a third-party library that uses 
Spark for geospatial computations, and it returns some of its results 
through the Java API. I'd welcome a hint on how to write a function for "map" 
that JavaPairRDD will accept.

Here's a signature:
org.apache.spark.api.java.JavaPairRDD[com.vividsolutions.jts.geom.Polygon,java.util.HashSet[com.vividsolutions.jts.geom.Polygon]]
 = org.apache.spark.api.java.JavaPairRDD

Normally I would write something like this:

def calculate_intersection(polygon: Polygon, hashSet: HashSet[Polygon]) = {
  (polygon, hashSet.asScala.map(polygon.intersection(_).getArea))
}

javapairrdd.map(calculate_intersection)


... but it will complain that it's not a Java Function.

My first thought was to implement the interface, i.e.:


class PairRDDWrapper extends 
org.apache.spark.api.java.function.Function2[Polygon, HashSet[Polygon], 
(Polygon, scala.collection.mutable.Set[Double])]
{
  override def call(polygon: Polygon, hashSet: HashSet[Polygon]): (Polygon, 
scala.collection.mutable.Set[Double]) = {
    (polygon, hashSet.asScala.map(polygon.intersection(_).getArea))
  }
}




I am not sure, though, how to use it, or whether it makes any sense in the first 
place. It should be simple; it's just that my Java / Scala is a little rusty.


Cheers,
Lucas


