Ryan Weisman created ARROW-17835:
------------------------------------

             Summary: pyarrow.json.read_json ignores nullable=True on fields with non-nullable subfields in explicit_schema parse_options
                 Key: ARROW-17835
                 URL: https://issues.apache.org/jira/browse/ARROW-17835
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 7.0.0
            Reporter: Ryan Weisman

*Summary:*

When a field in explicit_schema is declared nullable but its struct type contains a non-nullable subfield, the parser seems to ignore the "nullable" flag on the parent field. This behavior may be related to ARROW-16603, but that issue covers the opposite case: failing to include the "not null" constraints in the schema of the output table.

*Reproducible example:*

import io

import pyarrow as pa
import pyarrow.json  # makes the pa.json submodule available

interior_struct = pa.struct([
    pa.field(name='some_information', type=pa.int64(), nullable=False)
])

sample_schema = pa.schema([
    pa.field(name='my_struct', type=interior_struct, nullable=True)
])

# Encode an empty JSON object.
# The issue persists regardless of the input JSON,
# so long as there is no field named "my_struct" in the input.
sample_json_bytes = io.BytesIO('{}'.encode())

table = pa.json.read_json(
    input_file=sample_json_bytes,
    parse_options=pa.json.ParseOptions(explicit_schema=sample_schema)
)

print(table)

*Expected output:*

Table containing one column, with name "my_struct" and type "interior_struct", whose value is null.

*Actual output:*

{code:java}
ArrowInvalid: JSON parse error: a required field was null{code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
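
For reference, the nullability rule that the expected output relies on can be sketched in plain Python, independent of pyarrow. This is only an illustrative model, not Arrow's implementation: FieldSpec, check_value and check_record are hypothetical names introduced here, and the error text merely echoes the message reported above. The assumption is that a "not null" constraint on a subfield applies only when the parent struct is itself present.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldSpec:
    """Hypothetical stand-in for a schema field (not a pyarrow class)."""
    name: str
    nullable: bool = True
    children: tuple = ()  # non-empty only for struct-like fields


def check_value(spec, value):
    """Return a list of error strings for `value` under `spec`."""
    if value is None:
        # A null at this level is governed by this field's own flag only.
        # Child constraints are not consulted: a null struct has no children.
        if spec.nullable:
            return []
        return [f"{spec.name}: required field was null"]
    errors = []
    for child in spec.children:
        # The parent is present, so each child constraint now applies.
        errors += check_value(child, value.get(child.name))
    return errors


# Same shape as the schema in the report: nullable parent, required child.
my_struct = FieldSpec(
    "my_struct",
    nullable=True,
    children=(FieldSpec("some_information", nullable=False),),
)


def check_record(record):
    """Validate one decoded JSON object against the my_struct field."""
    return check_value(my_struct, record.get("my_struct"))


print(check_record({}))                                      # parent absent
print(check_record({"my_struct": {}}))                       # child missing
print(check_record({"my_struct": {"some_information": 1}}))  # fully populated
```

Under this model, only the second record yields an error; the empty object from the reproducible example is accepted because the nullable parent is simply null.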