Xiangrui, is there a timeline for when UDTs will become a public API? I'm currently using them to support Java 8's ZonedDateTime.
On Tue, May 19, 2015 at 3:14 PM Xiangrui Meng <men...@gmail.com> wrote:

> (Note that UDT is not a public API yet.)
>
> On Thu, May 7, 2015 at 7:11 AM, wjur <wojtek.jurc...@gmail.com> wrote:
> > Hi all!
> >
> > I'm using Spark 1.3.0 and I'm struggling with the definition of a new type
> > for a project I'm working on. I've created a case class Person(name: String)
> > and now I'm trying to make Spark able to serialize and deserialize this
> > type. I made a couple of attempts, but none of them worked 100%
> > (there were issues either in serialization or deserialization).
> >
> > This is my class and the corresponding UDT:
> >
> > @SQLUserDefinedType(udt = classOf[PersonUDT])
> > case class Person(name: String)
> >
> > class PersonUDT extends UserDefinedType[Person] {
> >   override def sqlType: DataType = StructType(Seq(StructField("name", StringType)))
> >
> >   override def serialize(obj: Any): Seq[Any] = {

This should return a Row instance instead of Seq[Any], because the
sqlType is a struct type.
> >     obj match {
> >       case c: Person =>
> >         Seq(c.name)
> >     }
> >   }
> >
> >   override def userClass: Class[Person] = classOf[Person]
> >
> >   override def deserialize(datum: Any): Person = {
> >     datum match {
> >       case values: Seq[_] =>
> >         assert(values.length == 1)
> >         Person(values.head.asInstanceOf[String])
> >       case values: util.ArrayList[_] =>
> >         Person(values.get(0).asInstanceOf[String])
> >     }
> >   }
> >
> >   // In another attempt I was creating an RDD of Seq with manually serialized data,
> >   // and I had to override equals because two DFs with the same type weren't
> >   // actually equal:
> >   // StructField(person,...types.PersonUDT@a096ac3)
> >   // StructField(person,...types.PersonUDT@613fd937)
> >   def canEqual(other: Any): Boolean = other.isInstanceOf[PersonUDT]
> >
> >   override def equals(other: Any): Boolean = other match {
> >     case that: PersonUDT => true
> >     case _ => false
> >   }
> >
> >   override def hashCode(): Int = 1
> > }
> >
> > This is how I create an RDD of Person and then try to create a DataFrame:
> >
> > val rdd = sparkContext.parallelize((1 to 100).map(i => Person(i.toString)))
> > val sparkDataFrame = sqlContext.createDataFrame(rdd)
> >
> > The second line throws an exception:
> >
> > java.lang.ClassCastException: ....types.PersonUDT cannot be cast to
> > org.apache.spark.sql.types.StructType
> >   at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:316)
> >
> > I looked into the code in SQLContext.scala, and it seems that the code
> > requires the UDT to extend StructType, but in fact it extends
> > UserDefinedType, which extends DataType directly.
> > I'm not sure whether this is a bug or I just don't know how to use UDTs.
> >
> > Do you have any suggestions on how to solve this? I based my UDT on
> > ExamplePointUDT, but it seems to be incorrect. Is there a working
> > example of a UDT?
> >
> > Thank you in advance for the reply!
> > wjur
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/User-Defined-Type-UDT-tp22796.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
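[Following Xiangrui's note, a sketch of the UDT with serialize returning a Row to match the struct sqlType. This is against the internal (non-public) UDT API as it looked around Spark 1.3/1.5 and is untested; treat it as an illustration, not a guaranteed fix.]

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

@SQLUserDefinedType(udt = classOf[PersonUDT])
case class Person(name: String)

class PersonUDT extends UserDefinedType[Person] {
  override def sqlType: DataType = StructType(Seq(StructField("name", StringType)))

  // Return a Row, mirroring the struct sqlType, instead of a Seq[Any].
  override def serialize(obj: Any): Any = obj match {
    case p: Person => Row(p.name)
  }

  // The internal representation arrives back as a Row with one string field.
  override def deserialize(datum: Any): Person = datum match {
    case row: Row => Person(row.getString(0))
  }

  override def userClass: Class[Person] = classOf[Person]
}
```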