Hi! I'm working with the new Spark 2 Datasets API. PR: https://github.com/geotrellis/geotrellis/pull/1675
The idea is to use Dataset[(K, V)] and, for example, to join by the key of type K. The first problem was that there are no Encoders for custom types (non-Products), so the workaround was to use Kryo: https://github.com/pomadchin/geotrellis/blob/4f417f3c5e99eacf2ca57b4e8405047d556beda0/spark/src/main/scala/geotrellis/spark/KryoEncoderImplicits.scala (a simplified sketch of that fallback is included at the end of this message).

But this has a limitation: we can't join on this K type (I suppose because Spark represents everything as opaque byte blobs when using the Kryo encoder). The join I currently do looks like this, with the fully typed version I would like to write sketched at the end of this message:

  def combineValues[R: ClassTag](other: Dataset[(K, V)])(f: (V, V) => R): Dataset[(K, R)] =
    self.toDF("_1", "_2").alias("self")
      .join(other.toDF("_1", "_2").alias("other"), $"self._1" === $"other._1")
      .as[(K, (V, V))]
      .mapValues { case (tile1, tile2) => f(tile1, tile2) }

What is the correct solution? K is important, as it is a geospatial key.
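
For reference, the Kryo workaround boils down to an implicit fallback encoder along these lines (a simplified sketch, not the exact contents of the linked file):

  import scala.reflect.ClassTag
  import org.apache.spark.sql.{Encoder, Encoders}

  // Fall back to a Kryo-serialized binary encoder for any type A that has no
  // built-in Encoder, e.g. a non-Product geospatial key or a raster tile.
  implicit def kryoEncoder[A](implicit ct: ClassTag[A]): Encoder[A] =
    Encoders.kryo[A](ct)

The practical effect is that K is stored as serialized bytes rather than as columns Spark can reason about, which is what I suspect breaks the join.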
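
And this is roughly the fully typed join I would like to be able to express (a hypothetical sketch: left and right stand for two Dataset[(K, V)] values, f for the combine function, and the final map needs an Encoder[(K, R)] in scope):

  // Pair up records whose keys match, then combine the two values with f.
  // With Kryo-encoded keys the equality condition ends up comparing
  // serialized bytes, which I suspect is where this falls apart.
  val joined: Dataset[((K, V), (K, V))] =
    left.joinWith(right, left("_1") === right("_1"))

  val combined: Dataset[(K, R)] =
    joined.map { case ((k, v1), (_, v2)) => (k, f(v1, v2)) }

joinWith keeps the result typed as Dataset[((K, V), (K, V))], which is what I want, but the join condition still has to be expressed over columns.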