[ 
https://issues.apache.org/jira/browse/SPARK-44001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44001.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 43767
[https://github.com/apache/spark/pull/43767]

> Improve parsing of well known wrapper types
> -------------------------------------------
>
>                 Key: SPARK-44001
>                 URL: https://issues.apache.org/jira/browse/SPARK-44001
>             Project: Spark
>          Issue Type: Improvement
>          Components: Protobuf
>    Affects Versions: 3.4.0
>            Reporter: Parth Upadhyay
>            Assignee: Parth Upadhyay
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> `com.google.protobuf` defines several well-known wrapper types for 
> primitives (see 
> [wrappers.proto|https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto]).
> They are useful for distinguishing an absent primitive field from one set to 
> its default value, and for use within `google.protobuf.Any` types. These 
> types are:
> {code}
> DoubleValue
> FloatValue
> Int64Value
> UInt64Value
> Int32Value
> UInt32Value
> BoolValue
> StringValue
> BytesValue
> {code}
> Currently, when we deserialize these from a serialized protobuf into a Spark 
> struct, we expand them as if they were normal messages. Concretely, if we have
> {code}
> syntax = "proto3";
> import "google/protobuf/wrappers.proto";
> message WktExample {
>   google.protobuf.BoolValue bool_val = 1;
>   google.protobuf.Int32Value int32_val = 2;
> }
> {code}
> And a message like
> {code}
> WktExample(true, 100)
> {code}
> Then the behavior today is to deserialize this as:
> {code}
> {"bool_val": {"value": true}, "int32_val": {"value": 100}}
> {code}
> This is difficult to work with and not in the spirit of the wrapper 
> types, so it would be better to deserialize it as:
> {code}
> {"bool_val": true, "int32_val": 100}
> {code}
> This is also the behavior of other popular deserialization libraries, 
> including the Java protobuf util 
> [JsonFormat|https://github.com/protocolbuffers/protobuf/blob/main/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L904-L914]
>  and Go's 
> [jsonpb|https://github.com/gogo/protobuf/blob/master/jsonpb/jsonpb.go#L207-L214].
> For consistency with these libraries and improved usability, I propose that 
> we deserialize the well-known wrapper types in this way. 
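
The proposed flattening can be illustrated with a minimal Python sketch over plain dictionaries. This is not the Spark implementation and does not use the Spark or protobuf APIs. The function name `unwrap_wrappers` and the `field_types` mapping are hypothetical, introduced only to show the before and after shapes from the description. The sketch also assumes an unset wrapper field is represented as `None`, which keeps "absent" distinguishable from "set to the default value".

```python
# Illustrative sketch only: plain dicts stand in for deserialized structs.
# `unwrap_wrappers` and `field_types` are hypothetical names, not Spark APIs.

WRAPPER_TYPES = {
    "google.protobuf.DoubleValue",
    "google.protobuf.FloatValue",
    "google.protobuf.Int64Value",
    "google.protobuf.UInt64Value",
    "google.protobuf.Int32Value",
    "google.protobuf.UInt32Value",
    "google.protobuf.BoolValue",
    "google.protobuf.StringValue",
    "google.protobuf.BytesValue",
}


def unwrap_wrappers(record, field_types):
    """Replace {"value": x} with x for every field declared as a wrapper type."""
    result = {}
    for name, value in record.items():
        is_wrapper = field_types.get(name) in WRAPPER_TYPES
        if is_wrapper and value is not None:
            # Wrapper message that was set: expose the inner primitive.
            result[name] = value.get("value")
        else:
            # Non-wrapper field, or an unset wrapper (assumed to be None).
            result[name] = value
    return result


# Field types for the WktExample message from the description.
field_types = {
    "bool_val": "google.protobuf.BoolValue",
    "int32_val": "google.protobuf.Int32Value",
}

expanded = {"bool_val": {"value": True}, "int32_val": {"value": 100}}
flattened = unwrap_wrappers(expanded, field_types)
print(flattened)
```

With this representation, an unset `bool_val` stays `None` while an explicitly set `int32_val` of 0 becomes `0`, so the distinction the wrapper types exist for is preserved after flattening.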



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
