justaparth commented on code in PR #41498:
URL: https://github.com/apache/spark/pull/41498#discussion_r1223202896


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -247,12 +247,86 @@ private[sql] class ProtobufDeserializer(
           updater.setLong(ordinal, micros + 
TimeUnit.NANOSECONDS.toMicros(nanoSeconds))
 
       case (MESSAGE, StringType)
-          if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
+        if protoType.getMessageType.getFullName == "google.protobuf.Any" =>
         (updater, ordinal, value) =>
           // Convert 'Any' protobuf message to JSON string.
           val jsonStr = jsonPrinter.print(value.asInstanceOf[DynamicMessage])
           updater.set(ordinal, UTF8String.fromString(jsonStr))
 
+      // Handle well known wrapper types. We unpack the value field instead of 
keeping

Review Comment:
   Also one point that may not have been clear, but these wrapper types for 
primitives are not inherently useful as "messages"; rather they exist for the 
purpose of distinguishing between empty and 0 in proto3, before the existence 
of optional.
   
   As an example
   
   ```
   syntax = "proto3";
   
   message A {
     int64 val = 1;
   }
   ```
   and then
   
   ```
   A()
   A(0)
   ```
   
   are not distinguishable from one another because default values are [not 
serialized](https://protobuf.dev/programming-guides/field_presence/#presence-in-tag-value-stream-wire-format-serialization)
 to the wire for primitives in proto3.
   
   But if you had
   ```
   syntax = "proto3";
   
   message A {
     google.protobuf.Int64Value val = 1;
   }
   ```
   then
   ```
   A()
   A(Int64Value(0))
   ```
   
   Now become distinguishable from one another.
   
   Basically it's not like it's useful to just nest primitives inside a message 
for the sake of it, the purpose is simply to disambiguate in that case above 
(and you're right that `optional` in newer versions of protos basically solves 
this problem).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to