[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16592128#comment-16592128 ]
Alexander edited comment on SPARK-7768 at 8/24/18 8:22 PM:
-----------------------------------------------------------

I've also noticed some idiosyncrasies in the className -> class lookup when it comes to path-dependent types. If you have a path-dependent type Foo.Bar, e.g.:

{code:java}
object Foo {
  case object Bar {
    val name: String = "Bar"
  }
}
{code}

it sometimes expects Foo.Bar, e.g. here:

{code:java}
// org.apache.spark.sql.catalyst.ScalaReflection
case t if UDTRegistration.exists(getClassNameFromType(t)) // t evaluates to Foo.Bar
{code}

...but other times expects Foo$Bar, e.g. here:

{code:java}
// org.apache.spark.sql.catalyst.encoders.RowEncoder
case udt: UserDefinedType[_] =>
  ...
  UDTRegistration.getUDTFor(udt.userClass.getName).getOrElse {
{code}

So the only way I found to get the path-dependent type to work properly was to register both variants:

{code:java}
UDTRegistration.register(classOf[Foo.Bar].getName.replace("$", "."), classOf[BarUDT].getName)
UDTRegistration.register(classOf[Foo.Bar].getName, classOf[BarUDT].getName)
{code}

All of this is well and good, but I still can't find a way to query object fields of this UDT in SQL, e.g.:

{code:java}
case class UsingBar(id: Int, value: Bar.type)

Seq(UsingBar(1, Bar), UsingBar(2, Bar)).toDS.select($"value.name" === "Bar")
{code}

This gives the following error:

{code:java}
Can't extract value from value#3: need struct type but got bar;
{code}

Perhaps because the Bar UDT does not have the right kind of data type, i.e. it's not a struct:

{code:java}
StructType(StructField(id,IntegerType,false), StructField(e,org.apache.spark.BarUDT@61a2aeb7,true))
{code}

Isn't the whole point of giving these UDTs a StructType variable to make them queryable? Is there something I'm missing here?
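For reference, the BarUDT used in the registrations above would look roughly like this. This is only a sketch: UserDefinedType is still private[spark], so it assumes the class is compiled inside an org.apache.spark package, and the struct-shaped sqlType is exactly the part that, per the error above, Spark does not surface for field extraction:

{code:java}
package org.apache.spark

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.types._
import org.apache.spark.unsafe.types.UTF8String

// Sketch of a UDT for the path-dependent case object Foo.Bar.
class BarUDT extends UserDefinedType[Foo.Bar.type] {

  // Declared as a struct in the hope that $"value.name" becomes queryable;
  // as shown above, Spark still reports the column's type as the UDT itself.
  override def sqlType: DataType =
    StructType(Seq(StructField("name", StringType, nullable = false)))

  // Serialize the singleton into a one-field internal row.
  override def serialize(obj: Foo.Bar.type): InternalRow =
    InternalRow(UTF8String.fromString(obj.name))

  // There is only one possible value, so deserialization ignores the datum.
  override def deserialize(datum: Any): Foo.Bar.type = Foo.Bar

  // getClass is used because classOf cannot be applied to a singleton type.
  override def userClass: Class[Foo.Bar.type] =
    Foo.Bar.getClass.asInstanceOf[Class[Foo.Bar.type]]
}
{code}

Note that for a case object the runtime class name from getClass carries a trailing $ (Foo$Bar$), which is yet another variant the registration dance above has to contend with.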
> Make user-defined type (UDT) API public
> ---------------------------------------
>
> Key: SPARK-7768
> URL: https://issues.apache.org/jira/browse/SPARK-7768
> Project: Spark
> Issue Type: New Feature
> Components: SQL
> Reporter: Xiangrui Meng
> Priority: Critical
>
> As the demand for UDTs increases beyond sparse/dense vectors in MLlib, it would be nice to make the UDT API public in 1.5.