Ryan Weisman created ARROW-17835:
------------------------------------

             Summary: pyarrow.json.read_json ignores nullable=True on fields 
with non-nullable subfields in explicit_schema parse_options
                 Key: ARROW-17835
                 URL: https://issues.apache.org/jira/browse/ARROW-17835
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 7.0.0
            Reporter: Ryan Weisman


*Summary:*

When a nullable struct field has non-nullable subfields, the JSON parser appears to ignore the parent's nullable=True flag and rejects a null or missing parent value.

This behavior may be related to ARROW-16603, but that issue covers the opposite 
case - failing to include the "not null" constraints in the schema of the 
output table.

*Reproducible example:*
import io
import pyarrow as pa
import pyarrow.json  # the json submodule must be imported explicitly

interior_struct = pa.struct([
    pa.field(name='some_information', type=pa.int64(), nullable=False)
])

sample_schema = pa.schema([
    pa.field(name='my_struct', type=interior_struct, nullable=True)
])

# encode an empty JSON object -
# the issue persists regardless of the input JSON,
# so long as there is no field named "my_struct" in the input
sample_json_bytes = io.BytesIO('{}'.encode())

table = pa.json.read_json(
    input_file=sample_json_bytes,
    parse_options=pa.json.ParseOptions(explicit_schema=sample_schema)
)

print(table)
*Expected output:*

Table containing one column, with name "my_struct" and type "interior_struct", 
whose value is null.
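
The expected semantics can be sketched in plain Python, without pyarrow. This is only an illustration of the behavior described above, not Arrow's implementation: the Field class and the conforms function are hypothetical names introduced for this sketch, and type checking of leaf values is left out. The point it shows is that a null parent should be judged by its own nullable flag, and that non-nullable subfields should only matter once the parent is actually present.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Field:
    """Hypothetical stand-in for a schema field; not a pyarrow class."""
    name: str
    nullable: bool = True
    children: List["Field"] = field(default_factory=list)


def conforms(value: Any, spec: Field) -> bool:
    """Return True if `value` satisfies `spec` under the expected semantics."""
    if value is None:
        # A null value is judged only by this field's own flag;
        # there are no child values to validate.
        return spec.nullable
    if not spec.children:
        return True  # leaf field: type checks are omitted in this sketch
    if not isinstance(value, dict):
        return False
    # A present struct must satisfy every child; a missing key counts as null.
    return all(conforms(value.get(child.name), child) for child in spec.children)


my_struct = Field(
    "my_struct",
    nullable=True,
    children=[Field("some_information", nullable=False)],
)

row = {}  # parsed form of the JSON input '{}'

# The parent is missing, so it is null, and the parent is nullable.
print(conforms(row.get("my_struct"), my_struct))  # True

# A present but empty struct violates the non-nullable subfield.
print(conforms({}, my_struct))  # False
```

Under this reading, only the second case should produce a "required field was null" error.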

*Actual output:*
{code:java}
ArrowInvalid: JSON parse error: a required field was null{code}



