Hi! 
I'm working with the new Spark 2 Datasets API. PR:
https://github.com/geotrellis/geotrellis/pull/1675

The idea is to use Dataset[(K, V)] and, for example, to join two
datasets by their key of type K.
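
Concretely, the goal is something like this (just a sketch; SpatialKey and
Tile are the usual GeoTrellis types, and joinWith is the typed join on
Spark 2 Datasets):

import geotrellis.raster.Tile
import geotrellis.spark.SpatialKey
import org.apache.spark.sql.Dataset

// Two tile layers keyed by SpatialKey, joined on the key
// ("_1" is the first field of the tuple).
val left: Dataset[(SpatialKey, Tile)] = ???
val right: Dataset[(SpatialKey, Tile)] = ???
val joined: Dataset[((SpatialKey, Tile), (SpatialKey, Tile))] =
  left.joinWith(right, left("_1") === right("_1"))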
The first problem was that there are no Encoders for custom types (that
are not Products), so the workaround was to use Kryo:
https://github.com/pomadchin/geotrellis/blob/4f417f3c5e99eacf2ca57b4e8405047d556beda0/spark/src/main/scala/geotrellis/spark/KryoEncoderImplicits.scala
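
The gist of that workaround is roughly the following (a sketch, not the
exact contents of that file; Encoders.kryo and Encoders.tuple are the
stock Spark helpers):

import scala.reflect.ClassTag
import org.apache.spark.sql.{Encoder, Encoders}

trait KryoEncoderImplicits {
  // Fall back to Kryo serialization for types Spark can't derive an
  // ExpressionEncoder for (e.g. Tile, which is not a Product).
  implicit def kryoEncoder[T: ClassTag]: Encoder[T] = Encoders.kryo[T]

  // Combine per-field encoders so that Dataset[(K, V)] compiles.
  implicit def tuple2Encoder[A: Encoder, B: Encoder]: Encoder[(A, B)] =
    Encoders.tuple(implicitly[Encoder[A]], implicitly[Encoder[B]])
}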

But this has a limitation: we can't join on the K type (I suppose because
Spark represents everything as byte blobs when using the Kryo encoder):

def combineValues[R: ClassTag](other: Dataset[(K, V)])(f: (V, V) => R): Dataset[(K, R)] =
  self.toDF("_1", "_2").alias("self")
    .join(other.toDF("_1", "_2").alias("other"), $"self._1" === $"other._1")
    .as[(K, (V, V))]
    .mapValues { case (tile1, tile2) => f(tile1, tile2) }
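
One direction I've been considering (a sketch I haven't verified end to
end): keep only the value Kryo-encoded and give the key a real
ExpressionEncoder, so the join compares actual key fields rather than
serialized bytes. SpatialKey is a plain case class, so Encoders.product
can handle it:

import scala.reflect.ClassTag
import org.apache.spark.sql.{Encoder, Encoders}
import geotrellis.spark.SpatialKey

object SpatialKeyEncoders {
  // The key becomes a real struct column (col, row) while the value stays
  // a Kryo byte blob, so $"self._1" === $"other._1" compares actual fields.
  implicit def spatialKeyTupleEncoder[V: ClassTag]: Encoder[(SpatialKey, V)] =
    Encoders.tuple(Encoders.product[SpatialKey], Encoders.kryo[V])
}

But that only helps when K is a concrete Product, which defeats keeping
K generic.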

What is the correct solution? Preserving the typed K matters, as it is a
geospatial key.