[ https://issues.apache.org/jira/browse/SPARK-46275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raghu Angadi updated SPARK-46275:
---------------------------------
    Description:
Consider a protobuf with two fields {{message Person { string name = 1; int32 id = 2; } }}
 * The struct returned by {{from_protobuf("Person")}} looks like this:
 ** STRUCT<name STRING, id INT>
 * If the underlying binary record fails to deserialize, it results in an exception and the query fails.
 * But if the option {{mode}} is set to {{PERMISSIVE}}, malformed records are tolerated and {{null}} is returned.
 ** {*}BUT{*}: The returned struct looks like this: \{"name": null, "id": null}
 *** This is not convenient for the user.
 *** *Ideally,* {{from_protobuf()}} *should return* {{null}}*.*
 ** {{from_protobuf()}} borrowed the current behavior from the {{from_avro()}} implementation. It is not clear what the motivation was.

I think we should update the implementation to return {{null}} rather than a struct with null fields inside.
> Protobuf: Permissive mode should return null rather than struct with null fields
> --------------------------------------------------------------------------------
>
>                 Key: SPARK-46275
>                 URL: https://issues.apache.org/jira/browse/SPARK-46275
>             Project: Spark
>          Issue Type: Bug
>          Components: Protobuf, Structured Streaming
>    Affects Versions: 3.5.0
>            Reporter: Raghu Angadi
>            Priority: Major
>             Fix For: 4.0.0, 3.5.1
>
>
> Consider a protobuf with two fields {{message Person { string name = 1; int32 id = 2; } }}
>  * The struct returned by {{from_protobuf("Person")}} looks like this:
>  ** STRUCT<name STRING, id INT>
>  * If the underlying binary record fails to deserialize, it results in an exception and the query fails.
>  * But if the option {{mode}} is set to {{PERMISSIVE}}, malformed records are tolerated and {{null}} is returned.
>  ** {*}BUT{*}: The returned struct looks like this: \{"name": null, "id": null}
>  *** This is not convenient for the user.
>  *** *Ideally,* {{from_protobuf()}} *should return* {{null}}*.*
>  ** {{from_protobuf()}} borrowed the current behavior from the {{from_avro()}} implementation. It is not clear what the motivation was.
> I think we should update the implementation to return {{null}} rather than a struct with null fields inside.
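
To make the difference between the two permissive behaviors concrete, here is a minimal plain-Python sketch. It does not use Spark or protobuf: {{parse_person}}, {{decode_current}} and {{decode_proposed}} are hypothetical names, and the comma-separated byte format is an assumed stand-in for a real serialized {{Person}} message. It only models the contrast described in this issue.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]
FIELDS = ("name", "id")


def parse_person(raw: bytes) -> Row:
    """Hypothetical stand-in for protobuf decoding: expects b"<name>,<id>"."""
    name, id_text = raw.decode("utf-8").split(",")
    return {"name": name, "id": int(id_text)}


def decode_current(raw: bytes, mode: str = "FAILFAST") -> Optional[Row]:
    """Models the behavior described in the issue: a struct whose fields are all null."""
    try:
        return parse_person(raw)
    except ValueError:  # also covers UnicodeDecodeError, a ValueError subclass
        if mode == "PERMISSIVE":
            return {field: None for field in FIELDS}
        raise


def decode_proposed(raw: bytes, mode: str = "FAILFAST") -> Optional[Row]:
    """Models the proposed behavior: the whole value is null."""
    try:
        return parse_person(raw)
    except ValueError:
        if mode == "PERMISSIVE":
            return None
        raise


records = [b"Ann,7", b"\xff\xfe", b"Bob,9"]

# Current behavior: a malformed record can only be spotted by checking every field.
current = [decode_current(r, "PERMISSIVE") for r in records]
bad_current = [row for row in current if all(v is None for v in row.values())]

# Proposed behavior: a single null check is enough.
proposed = [decode_proposed(r, "PERMISSIVE") for r in records]
good_proposed = [row for row in proposed if row is not None]
```

In this toy model, a malformed record under the current behavior looks the same as a row whose fields all happen to be null, whereas the proposed behavior lets a single null check separate the two cases.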