you can still use it as a Dataset[Set[X]], and all transformations should
work correctly.

however, because kryo serializes each Set into a single binary column,
dataset.schema will show BinaryType and dataset.show will show raw bytes
(unfortunately).

for example:

scala> implicit def setEncoder[X]: Encoder[Set[X]] = Encoders.kryo[Set[X]]
setEncoder: [X]=> org.apache.spark.sql.Encoder[Set[X]]

scala> val x = Seq(Set(1,2,3)).toDS
x: org.apache.spark.sql.Dataset[scala.collection.immutable.Set[Int]] =
[value: binary]

scala> x.map(_ + 4).collect
res17: Array[scala.collection.immutable.Set[Int]] = Array(Set(1, 2, 3, 4))

scala> x.show
+--------------------+
|               value|
+--------------------+
|[2A 01 03 02 02 0...|
+--------------------+


scala> x.schema
res19: org.apache.spark.sql.types.StructType =
StructType(StructField(value,BinaryType,true))
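
if a readable schema matters more than keeping Set as the column type, one
possible alternative (a sketch, not something i have tested against your
aggregation) is to store the values as a Seq, which has a built-in encoder,
and convert to Set only inside the transformation. the object name, app name
and local master below are assumptions made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

// hypothetical standalone sketch; the names here are illustrative only
object SetAsSeqSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")        // assumption: local run for the example
      .appName("set-as-seq")
      .getOrCreate()
    import spark.implicits._

    // Seq[Int] has a built-in encoder, so no kryo encoder is needed here
    // and the column is stored as an array rather than as binary
    val ds = Seq(Set(1, 2, 3)).map(_.toSeq).toDS()
    ds.printSchema()

    // convert to Set for the set operation, then back to Seq for storage
    val out = ds.map(xs => (xs.toSet + 4).toSeq.sorted)
    out.show()

    spark.stop()
  }
}
```

the trade-off is that set semantics (no duplicates) are no longer enforced by
the column type, so the conversion has to happen wherever they matter. the
same idea would apply to a case class field, e.g. Seq[(Long, Long)] instead of
Set[(Long, Long)].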


On Wed, Feb 1, 2017 at 12:03 PM, Jerry Lam <chiling...@gmail.com> wrote:

> Hi Koert,
>
> Thanks for the tips. I tried to do that but the column's type is now
> Binary. Do I get the Set[X] back in the Dataset?
>
> Best Regards,
>
> Jerry
>
>
> On Tue, Jan 31, 2017 at 8:04 PM, Koert Kuipers <ko...@tresata.com> wrote:
>
>> set is currently not supported. you can use kryo encoder. there is no
>> other work around that i know of.
>>
>> import org.apache.spark.sql.{ Encoder, Encoders }
>> implicit def setEncoder[X]: Encoder[Set[X]] = Encoders.kryo[Set[X]]
>>
>> On Tue, Jan 31, 2017 at 7:33 PM, Jerry Lam <chiling...@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> I got an exception like the following, when I tried to implement a user
>>> defined aggregation function.
>>>
>>>  Exception in thread "main" java.lang.UnsupportedOperationException: No
>>> Encoder found for Set[(scala.Long, scala.Long)]
>>>
>>> The Set[(Long, Long)] is a field in the case class which is the output
>>> type for the aggregation.
>>>
>>> Is there a workaround for this?
>>>
>>> Best Regards,
>>>
>>> Jerry
>>>
>>
>>
>
