[ https://issues.apache.org/jira/browse/SPARK-44001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-44001.
----------------------------------
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 43767
[https://github.com/apache/spark/pull/43767]

> Improve parsing of well known wrapper types
> -------------------------------------------
>
>                 Key: SPARK-44001
>                 URL: https://issues.apache.org/jira/browse/SPARK-44001
>             Project: Spark
>          Issue Type: Improvement
>          Components: Protobuf
>    Affects Versions: 3.4.0
>            Reporter: Parth Upadhyay
>            Assignee: Parth Upadhyay
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Under `com.google.protobuf`, there are some well-known wrapper types for
> primitives (defined in
> [wrappers.proto|https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto]),
> useful for distinguishing between the absence of a primitive field and its
> default value, as well as for use within `google.protobuf.Any` types. These
> types are:
> {code}
> DoubleValue
> FloatValue
> Int64Value
> UInt64Value
> Int32Value
> UInt32Value
> BoolValue
> StringValue
> BytesValue
> {code}
> Currently, when we deserialize these from a serialized protobuf into a Spark
> struct, we expand them as if they were normal messages. Concretely, if we have
> {code}
> syntax = "proto3";
> import "google/protobuf/wrappers.proto";
> message WktExample {
>   google.protobuf.BoolValue bool_val = 1;
>   google.protobuf.Int32Value int32_val = 2;
> }
> {code}
> And a message like
> {code}
> WktExample(true, 100)
> {code}
> Then the behavior today is to deserialize this as:
> {code}
> {"bool_val": {"value": true}, "int32_val": {"value": 100}}
> {code}
> This is quite difficult to work with and not in the spirit of the wrapper
> types, so it would be nice to deserialize it as
> {code}
> {"bool_val": true, "int32_val": 100}
> {code}
> This is also the behavior of other popular deserialization libraries,
> including the Java protobuf util
> [JsonFormat|https://github.com/protocolbuffers/protobuf/blob/main/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L904-L914]
> and Go's
> [jsonpb|https://github.com/gogo/protobuf/blob/master/jsonpb/jsonpb.go#L207-L214].
> So for consistency with other libraries and improved usability, I propose we
> deserialize the well-known wrapper types in this way.
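
For illustration only, the difference between the two shapes described in the issue can be sketched in plain Python. This is not the code from the pull request and does not use Spark or protobuf APIs; the helper `collapse_wrappers` and the `WRAPPER_FIELDS` set are hypothetical names, and a real reader would presumably determine the wrapper fields from the message descriptor instead of a hand-written set.

```python
from typing import Any, Dict, Set

# Hypothetical: the fields of WktExample that use well-known wrapper types.
# A real implementation would read this from the protobuf descriptor.
WRAPPER_FIELDS: Set[str] = {"bool_val", "int32_val"}


def collapse_wrappers(record: Dict[str, Any], wrapper_fields: Set[str]) -> Dict[str, Any]:
    """Turn {"field": {"value": x}} into {"field": x} for wrapper fields only."""
    collapsed: Dict[str, Any] = {}
    for name, field in record.items():
        if name in wrapper_fields and isinstance(field, dict):
            # Expanded wrapper struct: keep only the wrapped primitive.
            collapsed[name] = field.get("value")
        else:
            # Non-wrapper field, or a wrapper that is absent (None): keep as-is.
            collapsed[name] = field
    return collapsed


# The expanded form from the issue description.
expanded = {"bool_val": {"value": True}, "int32_val": {"value": 100}}
print(collapse_wrappers(expanded, WRAPPER_FIELDS))
```

In this sketch an absent wrapper stays `None` while a wrapper holding a default such as `0` collapses to `0`, which is the absence-versus-default distinction the wrapper types exist to preserve.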