Fokko commented on a change in pull request #26644: [SPARK-30004][SQL] Allow merge UserDefinedType into a native DataType
URL: https://github.com/apache/spark/pull/26644#discussion_r363111332
 
 

 ##########
 File path: sql/catalyst/src/test/scala/org/apache/spark/sql/types/TestUDT.scala
 ##########
 @@ -58,4 +63,22 @@ object TestUDT {
 
     override def equals(other: Any): Boolean = other.isInstanceOf[MyDenseVectorUDT]
   }
+
+  private[sql] class MyXMLGregorianCalendarUDT extends UserDefinedType[XMLGregorianCalendar] {
+    override def sqlType: DataType = TimestampType
+
+    override def serialize(obj: XMLGregorianCalendar): Any =
+      obj.toGregorianCalendar.getTimeInMillis * 1000
+
+    override def deserialize(datum: Any): XMLGregorianCalendar = {
+      val calendar = new GregorianCalendar
+      calendar.setTimeInMillis(datum.asInstanceOf[Long] / 1000) // micros -> millis
+      DatatypeFactory.newInstance.newXMLGregorianCalendar(calendar)
+    }
+
+    override def userClass: Class[XMLGregorianCalendar] = classOf[XMLGregorianCalendar]
+
+    // By setting this to a timestamp, we lose the information about the udt
+    override private[sql] def jsonValue: JValue = "timestamp"
 
 Review comment:
   That is correct. So when we serialize it, it will use:
   ```scala
       override def serialize(obj: XMLGregorianCalendar): Any =
         obj.toGregorianCalendar.getTimeInMillis * 1000
   ```
   This writes it as a timestamp, stored as the number of microseconds since the epoch 1970-01-01T00:00:00.000000Z (UTC+00:00).
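To make the unit concrete, here is a minimal JDK-only sketch (no Spark dependency) of the conversion and its inverse. `MicrosEncoding`, `toMicros` and `fromMicros` are hypothetical names used for illustration only; the point is that the inverse must divide by 1000 before calling `setTimeInMillis`.

```scala
import java.util.{GregorianCalendar, TimeZone}
import javax.xml.datatype.{DatatypeFactory, XMLGregorianCalendar}

// Hypothetical helper mirroring the conversion used by MyXMLGregorianCalendarUDT.
object MicrosEncoding {
  // Epoch milliseconds from the calendar, scaled up to epoch microseconds,
  // the unit that TimestampType uses internally.
  def toMicros(obj: XMLGregorianCalendar): Long =
    obj.toGregorianCalendar.getTimeInMillis * 1000

  // Inverse: scale microseconds back down to milliseconds before handing the
  // value to the calendar (floorDiv rounds pre-1970 values downwards).
  def fromMicros(micros: Long): XMLGregorianCalendar = {
    val calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"))
    calendar.setTimeInMillis(Math.floorDiv(micros, 1000L))
    DatatypeFactory.newInstance.newXMLGregorianCalendar(calendar)
  }
}
```

For example, `1970-01-01T00:00:01.000Z` is one second after the epoch, so `toMicros` yields `1000000`, and `fromMicros(1000000L)` gives back a calendar at 1000 ms.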
   
   The serialized form has to match the internal representation of the `sqlType` (for `TimestampType`, a `Long` of microseconds). Because we don't store any reference to the UDT, another Spark session will simply read the column as a `TimestampType`, and the `TimestampType` decoder will be used.
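A toy model of that write/read path may help; it is not Spark code, and `TimestampRoundTrip`, `writeWithUdt` and `readAsTimestamp` are hypothetical names. The writer only persists the `Long` from `serialize`; the reader, which has no knowledge of the UDT, interprets that `Long` as microseconds since the epoch, which is the contract of `TimestampType`.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit
import javax.xml.datatype.XMLGregorianCalendar

// Toy model of the round trip (not Spark code; names are hypothetical).
object TimestampRoundTrip {
  // Writer: the UDT's serialize produces the only value that is persisted;
  // the schema records just "timestamp", with no reference to the UDT.
  def writeWithUdt(obj: XMLGregorianCalendar): Long =
    obj.toGregorianCalendar.getTimeInMillis * 1000

  // Reader: a session that knows nothing about the UDT treats the Long as a
  // plain TimestampType value, i.e. microseconds since 1970-01-01T00:00:00Z.
  def readAsTimestamp(micros: Long): Instant =
    Instant.EPOCH.plus(micros, ChronoUnit.MICROS)
}
```

Writing `2020-01-02T03:04:05.678Z` through `writeWithUdt` and reading it back with `readAsTimestamp` yields the same instant. If `serialize` emitted milliseconds instead, the reader would place every value 1000 times closer to the epoch, which is why the two representations must agree.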
   
   I know this isn't trivial, but it is very powerful: it makes UDTs much more flexible when writing ETL jobs, since you can use your own types directly with the Dataset API.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
