Parth Upadhyay created SPARK-44001:
--------------------------------------

             Summary: Improve parsing of well known wrapper types
                 Key: SPARK-44001
                 URL: https://issues.apache.org/jira/browse/SPARK-44001
             Project: Spark
          Issue Type: Improvement
          Components: Protobuf
    Affects Versions: 3.4.0
            Reporter: Parth Upadhyay


Under `com.google.protobuf`, there are well-known wrapper types for 
primitives, defined in 
[wrappers.proto|https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto].
They are useful for distinguishing an absent primitive field from one set to 
its default value, and for use within `google.protobuf.Any` types. These 
types are:

{code}
DoubleValue
FloatValue
Int64Value
UInt64Value
Int32Value
UInt32Value
BoolValue
StringValue
BytesValue
{code}

Currently, when we deserialize these from a serialized protobuf into a Spark 
struct, we expand them as if they were ordinary messages. Concretely, if we have

{code}
syntax = "proto3";

import "google/protobuf/wrappers.proto";

message WktExample {
  google.protobuf.BoolValue bool_val = 1;
  google.protobuf.Int32Value int32_val = 2;
}
{code}

And a message like
{code}
WktExample(true, 100)
{code}

Then the behavior today is to deserialize this as:
{code}
{"bool_val": {"value": true}, "int32_val": {"value": 100}}
{code}

This is awkward to work with and not in the spirit of the wrapper types, 
so it would be nicer to deserialize it as

{code}
{"bool_val": true, "int32_val": 100}
{code}
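
To make the proposed mapping concrete, here is a minimal sketch in plain 
Python of the transformation between the two shapes above. It is only an 
illustration, not the Spark implementation: `unwrap_wrappers` and the 
`field_types` mapping (a stand-in for the message descriptor) are hypothetical 
names, and an absent wrapper is assumed to be represented as `None`.

```python
from typing import Any, Dict

# Fully qualified names of the wrapper messages defined in wrappers.proto.
WRAPPER_TYPES = frozenset({
    "google.protobuf.DoubleValue",
    "google.protobuf.FloatValue",
    "google.protobuf.Int64Value",
    "google.protobuf.UInt64Value",
    "google.protobuf.Int32Value",
    "google.protobuf.UInt32Value",
    "google.protobuf.BoolValue",
    "google.protobuf.StringValue",
    "google.protobuf.BytesValue",
})


def unwrap_wrappers(record: Dict[str, Any], field_types: Dict[str, str]) -> Dict[str, Any]:
    """Collapse {"value": x} to x for fields whose declared type is a wrapper.

    `field_types` is a hypothetical stand-in for the message descriptor: it
    maps each field name to the fully qualified name of its message type.
    """
    result: Dict[str, Any] = {}
    for name, value in record.items():
        if field_types.get(name) in WRAPPER_TYPES and isinstance(value, dict):
            # Wrapper message that is present: keep only the inner primitive.
            result[name] = value["value"]
        else:
            # Non-wrapper fields and absent wrappers (None) pass through as-is.
            result[name] = value
    return result


if __name__ == "__main__":
    types = {
        "bool_val": "google.protobuf.BoolValue",
        "int32_val": "google.protobuf.Int32Value",
    }
    expanded = {"bool_val": {"value": True}, "int32_val": {"value": 100}}
    print(unwrap_wrappers(expanded, types))
    # prints {'bool_val': True, 'int32_val': 100}
```

Passing an absent wrapper through as `None` keeps the distinction between an 
unset field and one set to its default value, which is the reason the 
wrapper types exist.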

This is also the behavior of other popular deserialization libraries, including 
the Java protobuf util 
[JsonFormat|https://github.com/protocolbuffers/protobuf/blob/main/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L904-L914]
 and Go's 
[jsonpb|https://github.com/gogo/protobuf/blob/master/jsonpb/jsonpb.go#L207-L214].

So for consistency with other libraries and improved usability, I propose we 
deserialize the well-known wrapper types in this way.


