John Muller created SPARK-11003:
-----------------------------------

             Summary: Allowing UserDefinedTypes to extend primitives
                 Key: SPARK-11003
                 URL: https://issues.apache.org/jira/browse/SPARK-11003
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 1.5.1, 1.5.0
            Reporter: John Muller
            Priority: Minor

Currently, the classes and constructors of all the primitive DataTypes (of StructFields) are private:

https://github.com/apache/spark/tree/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types

This means that even for simple String-based UDTs, users always have to implement serialize() and deserialize(). UDTs for something as simple as a Northwind database (products, orders, customers) would be very useful for pattern matching / validation. For example:

import org.apache.spark.sql.types._

@SQLUserDefinedType(udt = classOf[ProductNameUDT])
case class ProductName(name: String) extends StringType with Validator {
  import scala.util.matching.Regex

  // Must be a Regex (note the .r) to be usable as an extractor below
  private val pattern: Regex = """[A-Z][A-Za-z]*""".r

  def validate(): Boolean = {
    name match {
      case pattern(_*) => true
      case _ => false
    }
  }
}

class ProductNameUDT extends UserDefinedType[ProductName] {

  // No need for this; ProductName is a StringType so we know how to serialize
  override def serialize(p: Any): Any = {
    p match {
      case p: ProductName => Seq(p.name)
    }
  }

  // Not sure why this override is needed at all; can't we always get this simply by the UDT type param?
  override def userClass: Class[ProductName] = classOf[ProductName]

  // Instead of the below, just infer the StructField name via reflection of the wrapper class' name
  override def sqlType: DataType = StructType(Seq(StructField("ProductName", StringType)))

  // Still needed.
  override def deserialize(datum: Any): ProductName = {
    datum match {
      case values: Seq[_] =>
        assert(values.length == 1)
        ProductName(values.head.asInstanceOf[String])
    }
  }
}

This would simplify the process of creating "primitive extension" UDTs down to just 2 steps:

1. An annotated case class that extends a primitive DataType
2. The UDT itself, which just needs a deserializer
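
To make the intent concrete without depending on Spark internals, here is a minimal, Spark-free sketch of the same idea. It is only an illustration under stated assumptions: the Validator trait is a stand-in (the report does not define it), and WrapperCodec is a hypothetical helper, not part of any Spark API. It shows how serialize/deserialize for a single-value wrapper could be derived from a wrap/unwrap pair, so that a UDT author would only have to supply those two functions.

```scala
import scala.util.matching.Regex

// Assumption: a stand-in for the Validator trait referenced in the report.
trait Validator {
  def validate(): Boolean
}

// Plain wrapper around a String; it does not extend any Spark type.
final case class ProductName(name: String) extends Validator {
  // True only when the whole name matches the pattern.
  def validate(): Boolean = name match {
    case ProductName.Pattern(_*) => true
    case _ => false
  }
}

object ProductName {
  // Same pattern as in the report: one uppercase letter, then letters only.
  val Pattern: Regex = """[A-Z][A-Za-z]*""".r
}

// Hypothetical helper (not a Spark class): converts between a wrapper type T
// and a one-element Seq holding its underlying primitive value P.
class WrapperCodec[T, P](wrap: P => T, unwrap: T => P) {
  def serialize(value: T): Seq[P] = Seq(unwrap(value))

  def deserialize(datum: Seq[P]): T = {
    require(datum.length == 1, s"expected exactly one value, got ${datum.length}")
    wrap(datum.head)
  }
}

// The only per-type code needed: how to wrap and unwrap the String.
object ProductNameCodec
  extends WrapperCodec[ProductName, String](s => ProductName(s), _.name)
```

With something along these lines, ProductNameCodec.serialize(ProductName("Chai")) yields a one-element Seq that deserialize turns back into the same ProductName, and validation stays on the wrapper class itself.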