[ 
https://issues.apache.org/jira/browse/SPARK-43051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17717147#comment-17717147
 ] 

Nikita Awasthi commented on SPARK-43051:
----------------------------------------

User 'justaparth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40686

> Allow materializing zero values when deserializing protobuf messages
> --------------------------------------------------------------------
>
>                 Key: SPARK-43051
>                 URL: https://issues.apache.org/jira/browse/SPARK-43051
>             Project: Spark
>          Issue Type: Improvement
>          Components: Protobuf
>    Affects Versions: 3.4.0
>            Reporter: Parth Upadhyay
>            Priority: Major
>
> Currently, when deserializing protobufs using {{from_protobuf}}, fields 
> that are not explicitly present in the serialized message are deserialized as 
> {{null}} in the resulting struct. (In proto3, this also includes fields that 
> have been explicitly set to their zero value, because an explicit zero value 
> is indistinguishable from an unset field in the serialized format; see 
> [https://protobuf.dev/programming-guides/field_presence/].)
> For example, given a message format like
> {code:java}
> syntax = "proto3";
> message SearchRequest {
>   string query = 1;
>   int32 page_number = 2;
>   int32 result_per_page = 3;
> }
> {code}
> and an example message like
> {code:python}
> SearchRequest(query = "", page_number = 10)
> {code}
> the result from calling {{from_protobuf}} on the serialized form of the above 
> message would be
> {code:json}
> {"query": null, "page_number": 10, "result_per_page": null}
> {code}
> In proto3, all fields are considered optional and have default values 
> ([https://protobuf.dev/programming-guides/proto3/#default]), and reader 
> clients in some languages (e.g. Go, Scala) will fill in that default value 
> when reading the protobuf. It could be useful to make this configurable so 
> that zero values can be materialized when desired.
> Concretely, in the example above, we might want to deserialize it instead as
> {code:json}
> {"query": "", "page_number": 10, "result_per_page": 0}
> {code}
> In this ticket I propose implementing a way to get the above functionality. 
> In the linked PR, I've done it by adding an option, {{materializeZeroValues}}, 
> that can be passed to the options map in the {{from_protobuf}} function to 
> enable this behavior. However, I'd love any feedback on whether I've 
> understood the problem correctly and whether the implementation makes sense.
>  
> PR: https://github.com/apache/spark/pull/40686
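
The difference between the two outputs quoted above can be sketched in plain Python. This is only an illustration of the semantics, not Spark's implementation: the helper to_row, the schema dict, and the zero-value table are all hypothetical names, and the materialize_zero_values parameter merely mirrors the option name proposed in the description.

```python
# Illustrative sketch only; none of these names come from Spark.

# Proto3 zero values for a few scalar types.
PROTO3_ZERO_VALUES = {
    "string": "",
    "int32": 0,
    "bool": False,
    "double": 0.0,
    "bytes": b"",
}

# Field name -> proto3 scalar type, mirroring SearchRequest in the description.
SEARCH_REQUEST_SCHEMA = {
    "query": "string",
    "page_number": "int32",
    "result_per_page": "int32",
}


def to_row(present_fields, schema, materialize_zero_values=False):
    """Build an output row from the fields actually present on the wire.

    Absent fields become None by default, or the type's zero value when
    materialize_zero_values is True.
    """
    row = {}
    for name, proto_type in schema.items():
        if name in present_fields:
            row[name] = present_fields[name]
        elif materialize_zero_values:
            row[name] = PROTO3_ZERO_VALUES[proto_type]
        else:
            row[name] = None
    return row


# SearchRequest(query = "", page_number = 10): in proto3 the empty string is
# a zero value and is not serialized, so only page_number is on the wire.
on_wire = {"page_number": 10}

print(to_row(on_wire, SEARCH_REQUEST_SCHEMA))
print(to_row(on_wire, SEARCH_REQUEST_SCHEMA, materialize_zero_values=True))
```

The first call corresponds to the current behavior (absent fields become null) and the second to the proposed opt-in behavior (absent fields take the zero value for their type).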



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
