Probably in 1.5. I made a JIRA for it: https://issues.apache.org/jira/browse/SPARK-7768. You can watch that JIRA (and vote). -Xiangrui
On Wed, May 20, 2015 at 11:03 AM, Justin Uang <justin.u...@gmail.com> wrote:
> Xiangrui, is there a timeline for when UDTs will become a public API? I'm
> currently using them to support Java 8's ZonedDateTime.
>
> On Tue, May 19, 2015 at 3:14 PM Xiangrui Meng <men...@gmail.com> wrote:
>>
>> (Note that UDT is not a public API yet.)
>>
>> On Thu, May 7, 2015 at 7:11 AM, wjur <wojtek.jurc...@gmail.com> wrote:
>> > Hi all!
>> >
>> > I'm using Spark 1.3.0 and I'm struggling with the definition of a new
>> > type for a project I'm working on. I've created a case class
>> > Person(name: String) and now I'm trying to make Spark able to serialize
>> > and deserialize it. I made a couple of attempts, but none of them worked
>> > completely (there were issues either in serialization or
>> > deserialization).
>> >
>> > This is my class and the corresponding UDT:
>> >
>> > @SQLUserDefinedType(udt = classOf[PersonUDT])
>> > case class Person(name: String)
>> >
>> > class PersonUDT extends UserDefinedType[Person] {
>> >   override def sqlType: DataType =
>> >     StructType(Seq(StructField("name", StringType)))
>> >
>> >   override def serialize(obj: Any): Seq[Any] = {
>>
>> This should return a Row instance instead of Seq[Any], because the
>> sqlType is a struct type.
>>
>> >     obj match {
>> >       case c: Person =>
>> >         Seq(c.name)
>> >     }
>> >   }
>> >
>> >   override def userClass: Class[Person] = classOf[Person]
>> >
>> >   override def deserialize(datum: Any): Person = {
>> >     datum match {
>> >       case values: Seq[_] =>
>> >         assert(values.length == 1)
>> >         Person(values.head.asInstanceOf[String])
>> >       case values: util.ArrayList[_] =>
>> >         Person(values.get(0).asInstanceOf[String])
>> >     }
>> >   }
>> >
>> >   // In some other attempt I was creating an RDD of Seq with manually
>> >   // serialized data, and I had to override equals because two DFs with
>> >   // the same type weren't actually equal:
>> >   //   StructField(person,...types.PersonUDT@a096ac3)
>> >   //   StructField(person,...types.PersonUDT@613fd937)
>> >   def canEqual(other: Any): Boolean = other.isInstanceOf[PersonUDT]
>> >
>> >   override def equals(other: Any): Boolean = other match {
>> >     case that: PersonUDT => true
>> >     case _ => false
>> >   }
>> >
>> >   override def hashCode(): Int = 1
>> > }
>> >
>> > This is how I create an RDD of Person and then try to create a
>> > DataFrame:
>> >
>> > val rdd = sparkContext.parallelize((1 to 100).map(i => Person(i.toString)))
>> > val sparkDataFrame = sqlContext.createDataFrame(rdd)
>> >
>> > The second line throws an exception:
>> >
>> > java.lang.ClassCastException: ....types.PersonUDT cannot be cast to
>> > org.apache.spark.sql.types.StructType
>> >   at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:316)
>> >
>> > I looked into the code in SQLContext.scala and it seems to require the
>> > UDT to extend StructType, but in fact it extends UserDefinedType, which
>> > extends DataType directly.
>> > I'm not sure whether this is a bug or I just don't know how to use UDTs.
>> >
>> > Do you have any suggestions on how to solve this? I based my UDT on
>> > ExamplePointUDT, but it seems to be incorrect. Is there a working
>> > example for UDTs?
>> >
>> > Thank you in advance for the reply!
>> > wjur
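
Putting the suggestion above together: here is a minimal, untested sketch of what the corrected UDT could look like against the 1.3-era (still non-public) API, with serialize returning a Row as Xiangrui suggests. The Tuple1 wrapper in the usage lines is an assumed workaround, since createDataFrame expects the top-level type to be a struct rather than the UDT itself.

import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

@SQLUserDefinedType(udt = classOf[PersonUDT])
case class Person(name: String)

class PersonUDT extends UserDefinedType[Person] {
  // Catalyst representation: a one-field struct.
  override def sqlType: DataType =
    StructType(Seq(StructField("name", StringType, nullable = false)))

  // Because sqlType is a StructType, serialize must produce a Row,
  // not a Seq[Any].
  override def serialize(obj: Any): Row = obj match {
    case p: Person => Row(p.name)
  }

  // Symmetrically, the datum coming back for a struct type is a Row.
  override def deserialize(datum: Any): Person = datum match {
    case row: Row =>
      require(row.length == 1, s"expected 1 field, got ${row.length}")
      Person(row.getString(0))
  }

  override def userClass: Class[Person] = classOf[Person]

  // All PersonUDT instances describe the same type, so make them compare
  // equal (this is what the equals/hashCode overrides in the original
  // post were working around).
  override def equals(other: Any): Boolean = other.isInstanceOf[PersonUDT]
  override def hashCode(): Int = classOf[PersonUDT].getName.hashCode
}

// Usage sketch (assumption): wrap Person in a Tuple1 so the DataFrame's
// top-level type is a one-column struct holding the UDT, which should
// avoid the ClassCastException from using the UDT itself as the schema.
val rdd = sparkContext.parallelize((1 to 100).map(i => Person(i.toString)))
val df = sqlContext.createDataFrame(rdd.map(Tuple1(_))).toDF("person")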