timsaucer opened a new issue, #715: URL: https://github.com/apache/datafusion-python/issues/715
**Describe the bug**

When a column is a struct of structs and you index into the innermost level, you get an unexpected result if the outer struct is null. In the dataframe below there is an `outer_1` struct; when it is null and we access one of its inner members, we would expect to get null as well. Instead, the last row returns `0`. I exported this dataframe to parquet and tested on the Rust side, where the problem does not occur, so I think the cause is somewhere in this repo.

**To Reproduce**

```
import pyarrow as pa
from datafusion import SessionContext, col

ctx = SessionContext()
batch = pa.RecordBatch.from_arrays(
    [pa.array([
        {"outer_1": {"inner_1": 1, "inner_2": 2}},
        {"outer_1": {"inner_1": 1, "inner_2": None}},
        {"outer_1": None},  # outer struct is null in this row
    ])],
    names=["a"],
)
df = ctx.create_dataframe([[batch]])
df.write_parquet("/dbfs/tmp/tsaucer/struct_of_struct.parquet")
df.select(col("a")).show()
df.select(col("a")["outer_1"]).show()
df.select(col("a")["outer_1"]["inner_2"]).show()
```

Produces:

```
DataFrame()
+-------------------------------------+
| a                                   |
+-------------------------------------+
| {outer_1: {inner_1: 1, inner_2: 2}} |
| {outer_1: {inner_1: 1, inner_2: }}  |
| {outer_1: }                         |
+-------------------------------------+
DataFrame()
+----------------------------------------------+
| cc251bd408f114ca2a4354b6976d91339.a[outer_1] |
+----------------------------------------------+
| {inner_1: 1, inner_2: 2}                     |
| {inner_1: 1, inner_2: }                      |
|                                              |
+----------------------------------------------+
DataFrame()
+-------------------------------------------------------+
| cc251bd408f114ca2a4354b6976d91339.a[outer_1][inner_2] |
+-------------------------------------------------------+
| 2                                                     |
|                                                       |
| 0                                                     |
+-------------------------------------------------------+
```

**Expected behavior**

Accessing a subfield of a null entry should also return null.
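To make the expected semantics concrete, here is a minimal sketch in plain Python. It is not DataFusion code: `get_nested` is a hypothetical helper written only for illustration, and it simply models the rule that a null at any level of the path yields a null result.

```python
from typing import Any, Optional


def get_nested(row: Optional[dict], *keys: str) -> Any:
    """Hypothetical helper: walk `keys` into `row`, returning None as soon
    as any level along the path is None (null propagation)."""
    current: Any = row
    for key in keys:
        if current is None:
            return None
        current = current[key]
    return current


# Same three rows as in the reproduction above.
rows = [
    {"outer_1": {"inner_1": 1, "inner_2": 2}},
    {"outer_1": {"inner_1": 1, "inner_2": None}},
    {"outer_1": None},
]

expected = [get_nested(row, "outer_1", "inner_2") for row in rows]
print(expected)  # [2, None, None]
```

Under this rule the third row should come back as null, not `0` as in the output above.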