[ 
https://issues.apache.org/jira/browse/GOBBLIN-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ahmed Abdul Hamid updated GOBBLIN-933:
--------------------------------------
    Summary: Handle null strings when converting JSON arrays to Avro  (was: 
Handle null strings when converting JSON arrays and maps to Avro)

> Handle null strings when converting JSON arrays to Avro
> -------------------------------------------------------
>
>                 Key: GOBBLIN-933
>                 URL: https://issues.apache.org/jira/browse/GOBBLIN-933
>             Project: Apache Gobblin
>          Issue Type: Bug
>            Reporter: Ahmed Abdul Hamid
>            Priority: Major
>
> Converting a JSON array of strings to Avro (using 
> {{JsonElementConversionFactory}} or 
> {{JsonElementConversionWithAvroSchemaFactory}}) fails when the array 
> contains a {{null}} element.
> For instance, running 
> {{JsonRecordAvroSchemaToAvroConverterTest.testConverter()}} with 
> {{arrayField = ["arr1", "arr2", "arr3", null]}} in 
> {{gobblin-core/src/test/resources/converter/jsonToAvroRecord.json}} fails 
> with the following error:
> {code:java}
> Caused by: java.lang.RuntimeException: Field: arrayField is not nullable and 
> contains a null value
>       at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:278)
>       at 
> org.apache.gobblin.converter.avro.JsonElementConversionWithAvroSchemaFactory$ArrayConverter.convertField(JsonElementConversionWithAvroSchemaFactory.java:93)
>       at 
> org.apache.gobblin.converter.avro.JsonElementConversionFactory$JsonElementConverter.convert(JsonElementConversionFactory.java:280)
>       at 
> org.apache.gobblin.converter.avro.JsonRecordAvroSchemaToAvroConverter.convertNestedRecord(JsonRecordAvroSchemaToAvroConverter.java:125)
>       ... 51 more
>  {code}
> The root cause of this issue is:
>  * {{arrayField}} has the following Avro schema
> {code:java}
> {
>   "name": "arrayField",
>   "type": {
>     "type": "array",
>     "items": "string"
>   }
> } {code}
>  * This Avro schema gets translated to the following {{JsonSchema}} for the 
> outer array type
> {code:java}
> {"columnName":"root","dataType":{"type":"ARRAY"}} {code}
> and the following {{JsonSchema}} for the inner string type
> {code:java}
> {"columnName":"arrayField","dataType":{"type":"string"},"isNullable":false} 
> {code}
>  * The string schema has {{isNullable}} set to {{false}} because 
> {{JsonElementConversionWithAvroSchemaFactory.ArrayConverter}} propagates 
> the non-nullability of the outer type ({{array}}) to the inner type 
> ({{string}}). As a result, the code rejects {{null}} in arrays of strings 
> even though it is perfectly legal in both JSON and Avro.
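
The propagation described in the last bullet can be illustrated with a small, self-contained sketch. It is not Gobblin's code: the class {{ArrayNullabilitySketch}}, the nested {{StringElementConverter}}, and the method {{convertArray}} are hypothetical names invented for this illustration, and the {{elementNullable}} flag stands in for the {{isNullable}} value in the inner {{JsonSchema}}. Only the error message text is taken from the stack trace above. The sketch assumes the element converter throws whenever it sees a null and its flag is {{false}}, which matches the reported behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of the behaviour described in the report.
// None of these names are Gobblin classes or methods.
class ArrayNullabilitySketch {

    /** Stand-in for a per-element converter that carries a nullability flag. */
    static final class StringElementConverter {
        private final String fieldName;
        private final boolean nullable;

        StringElementConverter(String fieldName, boolean nullable) {
            this.fieldName = fieldName;
            this.nullable = nullable;
        }

        String convert(String value) {
            if (value == null && !nullable) {
                // Same message text as in the reported stack trace.
                throw new RuntimeException(
                    "Field: " + fieldName + " is not nullable and contains a null value");
            }
            return value;
        }
    }

    /** Converts every element using a converter built with the given flag. */
    static List<String> convertArray(String fieldName, List<String> elements,
                                     boolean elementNullable) {
        StringElementConverter converter =
            new StringElementConverter(fieldName, elementNullable);
        List<String> converted = new ArrayList<>();
        for (String element : elements) {
            converted.add(converter.convert(element));
        }
        return converted;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("arr1", "arr2", "arr3", null);
        boolean outerArrayNullable = false; // the array field itself is required

        // Reported behaviour: the element flag is copied from the outer array.
        try {
            convertArray("arrayField", input, outerArrayNullable);
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }

        // Assumed alternative: the element flag is decided independently.
        System.out.println("accepted: " + convertArray("arrayField", input, true));
    }
}
```

The first call mirrors the failure in the report, because the element converter inherits {{false}} from the outer array. The second call passes only because the flag is hard-coded to {{true}}; how element nullability should really be derived (for example, from the array's item schema) is left open here.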



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
