[
https://issues.apache.org/jira/browse/SPARK-53347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035873#comment-18035873
]
Gurpal SINGH commented on SPARK-53347:
--------------------------------------
[~gurwls223] [~cloud_fan] Hello,
This issue is quite simple and should not take much time to review. No one has
given any update on this Jira since it was created.
I believe I have followed the Spark guidelines by opening this ticket.
Do you know what else needs to be done for someone to look at this issue?
Many thanks & regards,
Gurpal
> Spark from_protobuf() incorrectly deserializes "false" boolean values as null
> -----------------------------------------------------------------------------
>
> Key: SPARK-53347
> URL: https://issues.apache.org/jira/browse/SPARK-53347
> Project: Spark
> Issue Type: Bug
> Components: Protobuf
> Affects Versions: 3.5.0, 3.5.1, 3.5.2, 4.0.0
> Environment: Scala 2.13,
> Spark 3.5.2
> JDK 17
> Maven 3.9.11
> Reporter: Gurpal SINGH
> Priority: Major
> Labels: Correctness, Deserialize, boolean, correctness,
> data-loss, false, from_protobuf, null, protobuf, pull-request-available, spark
> Original Estimate: 96h
> Remaining Estimate: 96h
>
> *Problem*
> When deserializing a Protobuf message using `from_protobuf()` in Spark,
> boolean fields with the value `false` are incorrectly deserialized as `null`.
> This leads to incorrect data in the resulting DataFrame and breaks semantic
> expectations.
> *Reproduction*
> Given a Protobuf message like the following (using
> `google.protobuf.BoolValue`):
>
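> {code:java}
> syntax = "proto3";
> // BoolValue is defined in the well-known wrappers file.
> import "google/protobuf/wrappers.proto";
> message Example {
>   google.protobuf.BoolValue is_active = 1;
> }
> {code}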
>
>
> and a message in which is_active is explicitly set to false,
> _from_protobuf()_ returns null instead of false.
>
> *Root Cause*
> In _ProtobufDeserializer.scala_, the logic for deserializing
> google.protobuf.BoolValue relies on the getFieldValue() method, which uses
> the following condition:
>
> {code:java}
> if (field.isRepeated || record.hasField(field) || field.hasDefaultValue ||
>     (!field.hasPresence && this.emitDefaultValues)) {
>   record.getField(field)
> } else {
>   null
> }
> {code}
>
> However, for BoolValue, even when the inner value is explicitly set to false,
> record.hasField(field) returns false: the field is present and set to false,
> yet it is treated as unset. As a result, _getFieldValue()_ returns null
> instead of false.
>
>
>
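For anyone reviewing the root cause above, here is a minimal standalone sketch of the quoted condition. It is a plain Java model, not Spark or protobuf code: `FieldModel`, `GetFieldValueSketch`, and the `recordHasField` parameter are hypothetical stand-ins for the descriptor flags and for `record.hasField(field)`. It assumes, as the report states, that `hasField` returns false when the stored value is false.

```java
// Hypothetical stand-in for the descriptor flags read by the quoted condition.
final class FieldModel {
    final boolean repeated;
    final boolean hasDefaultValue;
    final boolean hasPresence;

    FieldModel(boolean repeated, boolean hasDefaultValue, boolean hasPresence) {
        this.repeated = repeated;
        this.hasDefaultValue = hasDefaultValue;
        this.hasPresence = hasPresence;
    }
}

final class GetFieldValueSketch {
    // Mirrors the four-clause condition quoted in the issue.
    // recordHasField models record.hasField(field) (assumption: it is false
    // when the stored boolean is false, as the report describes).
    static Object getFieldValue(FieldModel field, boolean recordHasField,
                                Object storedValue, boolean emitDefaultValues) {
        if (field.repeated || recordHasField || field.hasDefaultValue
                || (!field.hasPresence && emitDefaultValues)) {
            return storedValue;
        }
        return null;
    }

    public static void main(String[] args) {
        // Singular field, no explicit default, no presence tracking.
        FieldModel noPresence = new FieldModel(false, false, false);
        // Same field, but modeled with presence tracking.
        FieldModel withPresence = new FieldModel(false, false, true);

        // No clause holds, so the stored false is dropped.
        System.out.println(getFieldValue(noPresence, false, Boolean.FALSE, false)); // prints null
        // The fourth clause holds, so the stored false is returned.
        System.out.println(getFieldValue(noPresence, false, Boolean.FALSE, true)); // prints false
        // With presence tracking the fourth clause cannot hold.
        System.out.println(getFieldValue(withPresence, false, Boolean.FALSE, true)); // prints null
    }
}
```

In this model a stored false survives only when at least one of the four clauses is true. Which flags actually apply to BoolValue inside Spark is not shown here and would need to be checked against the real descriptors.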