Hi! 
I'm working with the new Spark 2 Datasets API. PR:
https://github.com/geotrellis/geotrellis/pull/1675

The idea is to use Dataset[(K, V)] and, for example, to join two
datasets by their key of type K.
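
Concretely, the goal is something like this (just a sketch; SpatialKey and
Tile are the usual GeoTrellis types, and joinWith is the typed join on
Spark 2 Datasets):

import geotrellis.raster.Tile
import geotrellis.spark.SpatialKey
import org.apache.spark.sql.Dataset

// Two tile layers keyed by SpatialKey, joined on the key
// ("_1" is the first field of the tuple).
val left: Dataset[(SpatialKey, Tile)] = ???
val right: Dataset[(SpatialKey, Tile)] = ???
val joined: Dataset[((SpatialKey, Tile), (SpatialKey, Tile))] =
  left.joinWith(right, left("_1") === right("_1"))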
The first problem was that there are no Encoders for custom types (that
are not Products), so the workaround was to use Kryo:
https://github.com/pomadchin/geotrellis/blob/4f417f3c5e99eacf2ca57b4e8405047d556beda0/spark/src/main/scala/geotrellis/spark/KryoEncoderImplicits.scala
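
The gist of that workaround is roughly the following (a sketch, not the
exact contents of that file; Encoders.kryo and Encoders.tuple are the
stock Spark helpers):

import scala.reflect.ClassTag
import org.apache.spark.sql.{Encoder, Encoders}

trait KryoEncoderImplicits {
  // Fall back to Kryo serialization for types Spark can't derive an
  // ExpressionEncoder for (e.g. Tile, which is not a Product).
  implicit def kryoEncoder[T: ClassTag]: Encoder[T] = Encoders.kryo[T]

  // Combine per-field encoders so that Dataset[(K, V)] compiles.
  implicit def tuple2Encoder[A: Encoder, B: Encoder]: Encoder[(A, B)] =
    Encoders.tuple(implicitly[Encoder[A]], implicitly[Encoder[B]])
}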

But this has a limitation: we can't join on the K type (I suppose because
Spark represents everything as byte blobs when using the Kryo encoder):

def combineValues[R: ClassTag](other: Dataset[(K, V)])(f: (V, V) => R): Dataset[(K, R)] =
  self.toDF("_1", "_2").alias("self")
    .join(other.toDF("_1", "_2").alias("other"), $"self._1" === $"other._1")
    .as[(K, (V, V))]
    .mapValues { case (tile1, tile2) => f(tile1, tile2) }
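
One direction I've been considering (a sketch I haven't verified end to
end): keep only the value Kryo-encoded and give the key a real
ExpressionEncoder, so the join compares actual key fields rather than
serialized bytes. SpatialKey is a plain case class, so Encoders.product
can handle it:

import scala.reflect.ClassTag
import org.apache.spark.sql.{Encoder, Encoders}
import geotrellis.spark.SpatialKey

object SpatialKeyEncoders {
  // The key becomes a real struct column (col, row) while the value stays
  // a Kryo byte blob, so $"self._1" === $"other._1" compares actual fields.
  implicit def spatialKeyTupleEncoder[V: ClassTag]: Encoder[(SpatialKey, V)] =
    Encoders.tuple(Encoders.product[SpatialKey], Encoders.kryo[V])
}

But that only helps when K is a concrete Product, which defeats keeping
K generic.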

What is the correct solution? Preserving the typed K matters, as it is a
geospatial key.