[ https://issues.apache.org/jira/browse/SPARK-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Patrick Wendell updated SPARK-3572:
-----------------------------------
    Summary: Internal API for User-Defined Types  (was: Support register UserType in SQL)

> Internal API for User-Defined Types
> -----------------------------------
>
>                 Key: SPARK-3572
>                 URL: https://issues.apache.org/jira/browse/SPARK-3572
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>            Reporter: Xiangrui Meng
>            Assignee: Joseph K. Bradley
>
> If a user knows how to map a class to a struct type in Spark SQL, they should
> be able to register this mapping through sqlContext so that Spark SQL can
> figure out the schema automatically.
> {code}
> trait RowSerializer[T] {
>   def dataType: StructType
>   def serialize(obj: T): Row
>   def deserialize(row: Row): T
> }
> // Proposed method on sqlContext:
> def registerUserType[T](clazz: Class[T], serializer: Class[_ <: RowSerializer[T]]): Unit
> {code}
> In sqlContext, we can maintain a class-to-serializer map and use it for
> conversion. The serializer class can be embedded into the metadata, so when
> `select` is called, we know we want to deserialize the result.
> {code}
> sqlContext.registerUserType(classOf[Vector], classOf[VectorRowSerializer])
> val points: RDD[LabeledPoint] = ...
> val features: RDD[Vector] = points.select('features).map { case Row(v: Vector) => v }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
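
For illustration only, the class-to-serializer map described in the issue could be sketched roughly as follows. This is not Spark's implementation: `Row`, `StructType`, and `Field` are simplified stand-ins so the snippet runs without Spark on the classpath, and `UserTypeRegistry`, `serializerFor`, `Point2D`, and `Point2DSerializer` are hypothetical names introduced here. The sketch also registers a serializer instance keyed by a `ClassTag` rather than passing two `classOf[...]` arguments, which is a simplifying assumption.

```scala
import scala.collection.mutable
import scala.reflect.ClassTag

// Simplified stand-ins for Spark SQL's schema and row types (assumptions).
final case class Field(name: String, typeName: String)
final case class StructType(fields: Seq[Field])
final case class Row(values: Seq[Any])

// The trait proposed in the issue description.
trait RowSerializer[T] {
  def dataType: StructType
  def serialize(obj: T): Row
  def deserialize(row: Row): T
}

// Hypothetical registry holding the class-to-serializer map.
class UserTypeRegistry {
  private val serializers = mutable.Map.empty[Class[_], RowSerializer[_]]

  // Store the serializer under the runtime class of T.
  def registerUserType[T](serializer: RowSerializer[T])(implicit tag: ClassTag[T]): Unit =
    serializers(tag.runtimeClass) = serializer

  // Look up the serializer registered for T, if any.
  def serializerFor[T](implicit tag: ClassTag[T]): Option[RowSerializer[T]] =
    serializers.get(tag.runtimeClass).map(_.asInstanceOf[RowSerializer[T]])
}

// Hypothetical user type and its mapping to a two-field struct.
final case class Point2D(x: Double, y: Double)

object Point2DSerializer extends RowSerializer[Point2D] {
  val dataType: StructType =
    StructType(Seq(Field("x", "double"), Field("y", "double")))

  def serialize(p: Point2D): Row = Row(Seq(p.x, p.y))

  def deserialize(row: Row): Point2D = row.values match {
    case Seq(x: Double, y: Double) => Point2D(x, y)
    case other => throw new IllegalArgumentException(s"Unexpected row: $other")
  }
}
```

With these definitions, calling `registerUserType(Point2DSerializer)` on a `UserTypeRegistry` and then `serializerFor[Point2D]` returns the serializer, which converts a `Point2D` to a `Row` and back. A lookup for an unregistered type returns `None`.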