[ https://issues.apache.org/jira/browse/SPARK-43051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717147#comment-17717147 ]
Nikita Awasthi commented on SPARK-43051:
----------------------------------------

User 'justaparth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40686

> Allow materializing zero values when deserializing protobuf messages
> --------------------------------------------------------------------
>
>                 Key: SPARK-43051
>                 URL: https://issues.apache.org/jira/browse/SPARK-43051
>             Project: Spark
>          Issue Type: Improvement
>          Components: Protobuf
>    Affects Versions: 3.4.0
>            Reporter: Parth Upadhyay
>            Priority: Major
>
> Currently, when deserializing protobufs using {{from_protobuf}}, fields
> that are not explicitly present in the serialized message are deserialized as
> {{null}} in the resulting struct. (In proto3, this also includes fields that
> have been explicitly set to their zero value, as the two cases are not
> distinguishable in the serialized format.
> [https://protobuf.dev/programming-guides/field_presence/])
> For example, given a message format like
> {code:java}
> syntax = "proto3";
> message SearchRequest {
>   string query = 1;
>   int32 page_number = 2;
>   int32 result_per_page = 3;
> }
> {code}
> and an example message like
> {code:python}
> SearchRequest(query = "", page_number = 10)
> {code}
> the result from calling {{from_protobuf}} on the serialized form of the above
> message would be
> {code:json}
> {"query": null, "page_number": 10, "result_per_page": null}
> {code}
> In proto3, all fields are considered optional and have default values
> ([https://protobuf.dev/programming-guides/proto3/#default]), and reader
> clients in some languages (e.g. Go, Scala) will fill in that default value
> when reading the protobuf. It could be useful to make this configurable so
> that zero values can optionally be materialized if desired.
> Concretely, in the example above, we might want to deserialize it instead as
> {code:json}
> {"query": "", "page_number": 10, "result_per_page": 0}
> {code}
> In this ticket I propose implementing a way to get the above functionality.
> In the linked PR, I've done it by adding an option, {{materializeZeroValues}},
> that can be passed to the options map in the {{from_protobuf}} function to
> enable this behavior. However, I'd love any feedback on whether I've understood
> the problem correctly and whether the implementation makes sense.
>
> PR: https://github.com/apache/spark/pull/40686
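
For readers who want to see the two behaviors side by side, here is a minimal
pure-Python sketch of the idea described in the ticket. It does not use Spark
or the protobuf library, and it is not the implementation from the linked PR.
The names SEARCH_REQUEST_SCHEMA, PROTO3_ZERO_VALUES, deserialize, and the
materialize_zero_values parameter are all hypothetical, and the "wire" message
is modeled as a plain dict holding only the fields that were actually
serialized.

```python
from typing import Any, Dict

# Hypothetical description of the SearchRequest message from the ticket:
# field name -> proto3 scalar type name.
SEARCH_REQUEST_SCHEMA: Dict[str, str] = {
    "query": "string",
    "page_number": "int32",
    "result_per_page": "int32",
}

# proto3 default (zero) values for a handful of scalar types.
PROTO3_ZERO_VALUES: Dict[str, Any] = {
    "string": "",
    "bytes": b"",
    "bool": False,
    "int32": 0,
    "int64": 0,
    "float": 0.0,
    "double": 0.0,
}


def deserialize(
    present_fields: Dict[str, Any],
    schema: Dict[str, str],
    materialize_zero_values: bool = False,
) -> Dict[str, Any]:
    """Build a row from the fields present on the wire.

    Absent fields become None by default, or the proto3 zero value for
    their type when materialize_zero_values is True.
    """
    row: Dict[str, Any] = {}
    for name, proto_type in schema.items():
        if name in present_fields:
            row[name] = present_fields[name]
        elif materialize_zero_values:
            row[name] = PROTO3_ZERO_VALUES[proto_type]
        else:
            row[name] = None
    return row


# SearchRequest(query = "", page_number = 10): in proto3 the empty string is
# not written to the wire, so only page_number is present after serialization.
on_the_wire = {"page_number": 10}

nulls_row = deserialize(on_the_wire, SEARCH_REQUEST_SCHEMA)
zeros_row = deserialize(on_the_wire, SEARCH_REQUEST_SCHEMA, materialize_zero_values=True)
```

With the flag off, nulls_row matches the first JSON result in the ticket
(absent fields are None); with the flag on, zeros_row matches the proposed
result (absent fields take their zero value).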