Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r363111332
##########
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/types/TestUDT.scala
##########
@@ -58,4 +63,22 @@ object TestUDT {

     override def equals(other: Any): Boolean = other.isInstanceOf[MyDenseVectorUDT]
   }
+
+  private[sql] class MyXMLGregorianCalendarUDT extends UserDefinedType[XMLGregorianCalendar] {
+    override def sqlType: DataType = TimestampType
+
+    override def serialize(obj: XMLGregorianCalendar): Any =
+      obj.toGregorianCalendar.getTimeInMillis * 1000
+
+    override def deserialize(datum: Any): XMLGregorianCalendar = {
+      val calendar = new GregorianCalendar
+      calendar.setTimeInMillis(datum.asInstanceOf[Long])
+      DatatypeFactory.newInstance.newXMLGregorianCalendar(calendar)
+    }
+
+    override def userClass: Class[XMLGregorianCalendar] = classOf[XMLGregorianCalendar]
+
+    // By setting this to a timestamp, we lose the information about the udt
+    override private[sql] def jsonValue: JValue = "timestamp"

Review comment:
That is correct. When we serialize it, it will use:

```scala
override def serialize(obj: XMLGregorianCalendar): Any =
  obj.toGregorianCalendar.getTimeInMillis * 1000
```

This writes the value as a timestamp, stored as the number of microseconds since the epoch 1970-01-01T00:00:00.000000Z (UTC+00:00). The output of `serialize` should match the representation of the `sqlType`: because we don't store any reference to the UDT, another Spark session will simply decode the column as a `TimestampType`, so the decoder of `TimestampType` will be used. I know that this isn't trivial, but it is very powerful and makes UDTs much more flexible when writing ETL jobs, since you can use your own types directly with the Dataset API.
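To make the unit conversion in the comment above concrete, here is a minimal sketch of the millisecond-to-microsecond round trip using only JDK classes. The object `TimestampRoundTrip` and its methods `toMicros` and `fromMicros` are hypothetical names chosen for illustration and are not part of the pull request. The sketch assumes the stored value is microseconds since the epoch, so it divides by 1000 on the way back; the quoted hunk instead passes the stored `Long` straight to `setTimeInMillis`.

```scala
import java.util.{GregorianCalendar, TimeZone}
import javax.xml.datatype.{DatatypeFactory, XMLGregorianCalendar}

// Hypothetical helper, not part of the pull request.
object TimestampRoundTrip {
  // Assumption: the storage format is microseconds since the Unix epoch,
  // so the millisecond value is scaled up by 1000.
  def toMicros(obj: XMLGregorianCalendar): Long =
    obj.toGregorianCalendar.getTimeInMillis * 1000L

  // Assumed inverse of toMicros: scale back to milliseconds before
  // building the calendar. floorDiv keeps pre-epoch values consistent.
  def fromMicros(micros: Long): XMLGregorianCalendar = {
    val calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    calendar.setTimeInMillis(Math.floorDiv(micros, 1000L))
    DatatypeFactory.newInstance().newXMLGregorianCalendar(calendar)
  }

  def main(args: Array[String]): Unit = {
    val original = DatatypeFactory.newInstance()
      .newXMLGregorianCalendar("2020-01-01T12:34:56.789Z")
    val micros = toMicros(original)
    val restored = fromMicros(micros)
    // Compare the instants rather than the calendar objects themselves.
    val sameInstant =
      restored.toGregorianCalendar.getTimeInMillis ==
        original.toGregorianCalendar.getTimeInMillis
    println(s"micros=$micros sameInstant=$sameInstant")
  }
}
```

Because `XMLGregorianCalendar` is converted through milliseconds here, any precision finer than a millisecond is not preserved by this sketch.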