pang-wu commented on code in PR #41498:
URL: https://github.com/apache/spark/pull/41498#discussion_r1222252120


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(
           updater.setLong(ordinal, micros + TimeUnit.NANOSECONDS.toMicros(nanoSeconds))
 
       case (MESSAGE, StringType)
-          if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
+        if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
         (updater, ordinal, value) =>
           // Convert 'Any' protobuf message to JSON string.
           val jsonStr = jsonPrinter.print(value.asInstanceOf[DynamicMessage])
           updater.set(ordinal, UTF8String.fromString(jsonStr))
 
+      // Handle well known wrapper types. We unpack the value field instead of keeping

Review Comment:
   > Better for Spark to preserve the same information, right?
   
   I would say no. The issue is that converting these types to a struct erases the original type information: all the user sees is a struct, and that struct could just as well be a custom message defined by the user rather than a wrapper type. In that case we give the user no additional information for deciding whether special action needs to be taken, and we push the burden of understanding the original schema onto the data consumer. Remember that the data consumer may not have the original schema at hand.
   The idea behind wrapper types is that they are well-defined structures with a special meaning, so parsers can use this type information to recover presence information and produce data in a more intuitive shape. That is why libraries like jsonpb handle them specially.
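   A minimal Scala sketch of the behavior argued for above. It is not the PR's implementation: `Msg` is a hypothetical stand-in for a protobuf message (not protobuf-java's `DynamicMessage`), `WrapperUnpacking.convert` is a hypothetical helper, and the struct is modelled as a plain `Map`. Only the wrapper full names come from `google/protobuf/wrappers.proto`.

```scala
// Hypothetical, simplified stand-in for a protobuf message: its full type
// name plus its fields. This is NOT protobuf-java's DynamicMessage.
final case class Msg(fullName: String, fields: Map[String, Any])

object WrapperUnpacking {
  // Full names of the well-known wrapper types from
  // google/protobuf/wrappers.proto; each has a single field named "value".
  val WrapperTypes: Set[String] = Set(
    "google.protobuf.DoubleValue", "google.protobuf.FloatValue",
    "google.protobuf.Int64Value", "google.protobuf.UInt64Value",
    "google.protobuf.Int32Value", "google.protobuf.UInt32Value",
    "google.protobuf.BoolValue", "google.protobuf.StringValue",
    "google.protobuf.BytesValue")

  // Converts an optional message-typed field to a Spark-like cell value:
  // - field not set: null, which carries the presence information
  // - wrapper type: unpacked to its inner "value" (assumed populated here)
  // - any other message: kept as a struct, modelled as the field map
  def convert(field: Option[Msg]): Any = field match {
    case None => null
    case Some(m) if WrapperTypes.contains(m.fullName) => m.fields("value")
    case Some(m) => m.fields
  }
}
```

   With this choice, an `Int32Value` holding 5 becomes the integer 5, an unset wrapper becomes null, and a user-defined message such as the hypothetical `com.example.MyInt` with a single `value` field stays a struct. Had both been kept as structs, each would surface as `{value: 5}`, and a consumer without the original schema could not tell them apart.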



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

