Xiangrui, is there a timeline for when UDTs will become a public API? I'm currently using them to support Java 8's ZonedDateTime.
On Tue, May 19, 2015 at 3:14 PM Xiangrui Meng <men...@gmail.com> wrote:

> (Note that UDT is not a public API yet.)
>
> On Thu, May 7, 2015 at 7:11 AM, wjur <wojtek.jurc...@gmail.com> wrote:
> > Hi all!
> >
> > I'm using Spark 1.3.0 and I'm struggling with the definition of a new type
> > for a project I'm working on. I've created a case class Person(name: String)
> > and now I'm trying to make Spark able to serialize and deserialize this
> > type. I made a couple of attempts, but none of them worked 100%
> > (there were issues either in serialization or deserialization).
> >
> > This is my class and the corresponding UDT:
> >
> > @SQLUserDefinedType(udt = classOf[PersonUDT])
> > case class Person(name: String)
> >
> > class PersonUDT extends UserDefinedType[Person] {
> >   override def sqlType: DataType = StructType(Seq(StructField("name", StringType)))
> >
> >   override def serialize(obj: Any): Seq[Any] = {

This should return a Row instance instead of Seq[Any], because the
sqlType is a struct type.
> >     obj match {
> >       case c: Person =>
> >         Seq(c.name)
> >     }
> >   }
> >
> >   override def userClass: Class[Person] = classOf[Person]
> >
> >   override def deserialize(datum: Any): Person = {
> >     datum match {
> >       case values: Seq[_] =>
> >         assert(values.length == 1)
> >         Person(values.head.asInstanceOf[String])
> >       case values: util.ArrayList[_] =>
> >         Person(values.get(0).asInstanceOf[String])
> >     }
> >   }
> >
> >   // In another attempt I was creating an RDD of Seq with manually serialized data,
> >   // and I had to override equals because two DFs with the same type weren't
> >   // actually equal:
> >   // StructField(person,...types.PersonUDT@a096ac3)
> >   // StructField(person,...types.PersonUDT@613fd937)
> >   def canEqual(other: Any): Boolean = other.isInstanceOf[PersonUDT]
> >
> >   override def equals(other: Any): Boolean = other match {
> >     case that: PersonUDT => true
> >     case _ => false
> >   }
> >
> >   override def hashCode(): Int = 1
> > }
> >
> > This is how I create an RDD of Person and then try to create a DataFrame:
> >
> > val rdd = sparkContext.parallelize((1 to 100).map(i => Person(i.toString)))
> > val sparkDataFrame = sqlContext.createDataFrame(rdd)
> >
> > The second line throws an exception:
> >
> > java.lang.ClassCastException: ....types.PersonUDT cannot be cast to
> > org.apache.spark.sql.types.StructType
> >   at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:316)
> >
> > I looked into the code in SQLContext.scala, and it seems that the code
> > requires the UDT to extend StructType, but in fact it extends
> > UserDefinedType, which extends DataType directly.
> > I'm not sure whether this is a bug or I just don't know how to use UDTs.
> >
> > Do you have any suggestions on how to solve this? I based my UDT on
> > ExamplePointUDT, but it seems to be incorrect. Is there a working
> > example of a UDT?
> >
> > Thank you in advance for the reply!
> > wjur
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/User-Defined-Type-UDT-tp22796.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> > For additional commands, e-mail: user-h...@spark.apache.org
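[Following Xiangrui's note, a sketch of the UDT with serialize returning a Row to match the struct sqlType. This is against the internal (non-public) UDT API as it looked around Spark 1.3/1.5 and is untested; treat it as an illustration, not a guaranteed fix.]

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._

@SQLUserDefinedType(udt = classOf[PersonUDT])
case class Person(name: String)

class PersonUDT extends UserDefinedType[Person] {
  override def sqlType: DataType = StructType(Seq(StructField("name", StringType)))

  // Return a Row, mirroring the struct sqlType, instead of a Seq[Any].
  override def serialize(obj: Any): Any = obj match {
    case p: Person => Row(p.name)
  }

  // The internal representation arrives back as a Row with one string field.
  override def deserialize(datum: Any): Person = datum match {
    case row: Row => Person(row.getString(0))
  }

  override def userClass: Class[Person] = classOf[Person]
}
```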