Parth Upadhyay created SPARK-44001:
--------------------------------------

             Summary: Improve parsing of well known wrapper types
                 Key: SPARK-44001
                 URL: https://issues.apache.org/jira/browse/SPARK-44001
             Project: Spark
          Issue Type: Improvement
          Components: Protobuf
    Affects Versions: 3.4.0
            Reporter: Parth Upadhyay

Under `com.google.protobuf`, there are some well-known wrapper types for primitives (see [wrappers.proto|https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/wrappers.proto]). They are useful for distinguishing between the absence of a primitive field and its default value, as well as for use within `google.protobuf.Any` types. These types are:

{code}
DoubleValue
FloatValue
Int64Value
UInt64Value
Int32Value
UInt32Value
BoolValue
StringValue
BytesValue
{code}

Currently, when we deserialize these from a serialized protobuf into a Spark struct, we expand them as if they were normal messages. Concretely, if we have

{code}
syntax = "proto3";

import "google/protobuf/wrappers.proto";

message WktExample {
  google.protobuf.BoolValue bool_val = 1;
  google.protobuf.Int32Value int32_val = 2;
}
{code}

and a message like

{code}
WktExample(true, 100)
{code}

then the behavior today is to deserialize this as:

{code}
{"bool_val": {"value": true}, "int32_val": {"value": 100}}
{code}

This is quite difficult to work with and not in the spirit of the wrapper types, so it would be nice to deserialize it as

{code}
{"bool_val": true, "int32_val": 100}
{code}

This is also how other popular deserialization libraries behave, including the Java protobuf util [JsonFormat|https://github.com/protocolbuffers/protobuf/blob/main/java/util/src/main/java/com/google/protobuf/util/JsonFormat.java#L904-L914] and Go's [jsonpb|https://github.com/gogo/protobuf/blob/master/jsonpb/jsonpb.go#L207-L214].

So, for consistency with other libraries and improved usability, I propose we deserialize the well-known wrapper types in this way.
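
To make the proposed mapping concrete, here is a minimal Python sketch of the unwrapping step applied to plain dictionaries. It is not Spark's implementation: `WRAPPER_TYPES`, `unwrap_wrappers`, and the `field_types` mapping are hypothetical names introduced only for illustration, and an actual change would presumably work from the protobuf descriptors and the Spark schema rather than from dictionaries.

```python
# Fully qualified names of the wrapper messages defined in wrappers.proto.
WRAPPER_TYPES = frozenset({
    "google.protobuf.DoubleValue",
    "google.protobuf.FloatValue",
    "google.protobuf.Int64Value",
    "google.protobuf.UInt64Value",
    "google.protobuf.Int32Value",
    "google.protobuf.UInt32Value",
    "google.protobuf.BoolValue",
    "google.protobuf.StringValue",
    "google.protobuf.BytesValue",
})


def unwrap_wrappers(record, field_types):
    """Collapse {"value": x} structs into x for wrapper-typed fields.

    `record` is a dict shaped like today's expanded output.
    `field_types` (an assumption of this sketch) maps each field name to
    either a fully qualified message type name, or to a nested dict of
    the same shape for ordinary nested messages.
    """
    result = {}
    for name, value in record.items():
        field_type = field_types.get(name)
        is_wrapper = isinstance(field_type, str) and field_type in WRAPPER_TYPES
        if is_wrapper and isinstance(value, dict):
            # Wrapper message: keep only the wrapped primitive.
            result[name] = value.get("value")
        elif isinstance(field_type, dict) and isinstance(value, dict):
            # Ordinary nested message: recurse into its fields.
            result[name] = unwrap_wrappers(value, field_type)
        else:
            # Plain field, or an unset wrapper (None): leave untouched.
            result[name] = value
    return result


wkt_example_types = {
    "bool_val": "google.protobuf.BoolValue",
    "int32_val": "google.protobuf.Int32Value",
}
expanded = {"bool_val": {"value": True}, "int32_val": {"value": 100}}
flattened = unwrap_wrappers(expanded, wkt_example_types)
print(flattened)
```

In this sketch an unset wrapper field stays null while a wrapper holding the default value becomes that value, so the distinction the wrapper types exist for is preserved after flattening.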