[
https://issues.apache.org/jira/browse/GOBBLIN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ahmed Abdul Hamid updated GOBBLIN-933:
--------------------------------------
Summary: Handle null strings when converting JSON arrays to Avro (was:
Handle null strings when converting JSON arrays and maps to Avro)
> Handle null strings when converting JSON arrays to Avro
> -------------------------------------------------------
>
> Key: GOBBLIN-933
> URL: https://issues.apache.org/jira/browse/GOBBLIN-933
> Project: Apache Gobblin
> Issue Type: Bug
> Reporter: Ahmed Abdul Hamid
> Priority: Major
>
> Converting a JSON array of strings that contains a {{null}} element to Avro
> (using {{JsonElementConversionFactory}} or
> {{JsonElementConversionWithAvroSchemaFactory}}) fails.
> For instance, running
> {{JsonRecordAvroSchemaToAvroConverterTest.testConverter()}} with
> {{arrayField = ["arr1", "arr2", "arr3", null]}} in
> {{gobblin-core/src/test/resources/converter/jsonToAvroRecord.json}} fails
> with the following error:
> {code:java}
> Caused by: java.lang.RuntimeException: Field: arrayField is not nullable and contains a null value
>     at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:278)
>     at org.apache.gobblin.converter.avro.JsonElementConversionWithAvroSchemaFactory$ArrayConverter.convertField(JsonElementConversionWithAvroSchemaFactory.java:93)
>     at org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
>     at org.apache.gobblin.converter.avro.JsonRecordAvroSchemaToAvroConverter.convertNestedRecord(JsonRecordAvroSchemaToAvroConverter.java:125)
> ... 51 more
> {code}
> The root cause of this issue is:
> * {{arrayField}} has the following Avro schema
> {code:java}
> {
>   "name": "arrayField",
>   "type": {
>     "type": "array",
>     "items": "string"
>   }
> }
> {code}
> * This Avro schema gets translated to the following {{JsonSchema}} for the
> outer array type
> {code:java}
> {"columnName":"root","dataType":{"type":"ARRAY"}} {code}
> and the following {{JsonSchema}} for the inner string type
> {code:java}
> {"columnName":"arrayField","dataType":{"type":"string"},"isNullable":false}
> {code}
> * The string schema has {{isNullable}} set to {{false}} because
> {{JsonElementConversionWithAvroSchemaFactory.ArrayConverter}} propagates
> the non-nullability of the outer type ({{array}}) to the inner type
> ({{string}}). As a result, the code rejects {{null}} elements in arrays of
> strings, even though they are perfectly legal in both JSON and Avro.
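
The propagation described in the root cause above can be sketched in isolation. The following is a minimal, self-contained illustration and not Gobblin's actual code: {{NullableArraySketch}}, {{StringElementConverter}} and {{convertArray}} are hypothetical names invented for this example, and only the error message text is taken from the stack trace in this report. It shows how reusing the outer field's nullability flag for the elements rejects a null element, and how deciding element nullability separately would accept it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NullableArraySketch {

    /** Hypothetical stand-in for a per-element string converter with a nullability flag. */
    static final class StringElementConverter {
        private final String fieldName;
        private final boolean nullable;

        StringElementConverter(String fieldName, boolean nullable) {
            this.fieldName = fieldName;
            this.nullable = nullable;
        }

        String convert(String value) {
            if (value == null) {
                if (nullable) {
                    return null;
                }
                // Same message shape as the error quoted in this report.
                throw new RuntimeException(
                    "Field: " + fieldName + " is not nullable and contains a null value");
            }
            return value;
        }
    }

    /**
     * Hypothetical array conversion: every element goes through a converter
     * whose nullability is given by elementNullable.
     */
    static List<String> convertArray(String fieldName, List<String> values, boolean elementNullable) {
        StringElementConverter converter = new StringElementConverter(fieldName, elementNullable);
        List<String> converted = new ArrayList<>();
        for (String value : values) {
            converted.add(converter.convert(value));
        }
        return converted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("arr1", "arr2", "arr3", null);
        boolean arrayFieldNullable = false; // the outer array field itself is not nullable

        // Reported behavior: the outer field's flag is reused for the elements.
        try {
            convertArray("arrayField", input, arrayFieldNullable);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }

        // Alternative: element nullability is decided independently of the outer field.
        System.out.println(convertArray("arrayField", input, true));
    }
}
```

In this sketch the first call throws because the element converter inherits {{false}} from the outer field, while the second call returns the list unchanged, including the null element. How element nullability should actually be derived in Gobblin (for example from the items schema) is left open here.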