timsaucer opened a new issue, #715:
URL: https://github.com/apache/datafusion-python/issues/715
**Describe the bug**
When a column is a struct of structs and you index into the innermost
level, a null at the first level of the struct produces an unexpected result.
In the dataframe below, `outer_1` is a struct. When it is null and we access
one of its inner members, we would expect to get a null as well, but the last
query returns `0` for that row.
I exported this dataframe to parquet and tested it on the Rust side, where
the problem does not occur, so I think the cause is somewhere in this repo.
**To Reproduce**
```
import pyarrow as pa
from datafusion import SessionContext, col

ctx = SessionContext()

# One column "a" holding a struct of struct; the third row has a null outer_1.
batch = pa.RecordBatch.from_arrays(
    [pa.array([
        {"outer_1": {"inner_1": 1, "inner_2": 2}},
        {"outer_1": {"inner_1": 1, "inner_2": None}},
        {"outer_1": None},
    ])],
    names=["a"],
)
df = ctx.create_dataframe([[batch]])
df.write_parquet("/dbfs/tmp/tsaucer/struct_of_struct.parquet")

# Index one level deeper with each query.
df.select(col("a")).show()
df.select(col("a")["outer_1"]).show()
df.select(col("a")["outer_1"]["inner_2"]).show()
```
Produces:
```
DataFrame()
+-------------------------------------+
| a                                   |
+-------------------------------------+
| {outer_1: {inner_1: 1, inner_2: 2}} |
| {outer_1: {inner_1: 1, inner_2: }}  |
| {outer_1: }                         |
+-------------------------------------+
DataFrame()
+----------------------------------------------+
| cc251bd408f114ca2a4354b6976d91339.a[outer_1] |
+----------------------------------------------+
| {inner_1: 1, inner_2: 2}                     |
| {inner_1: 1, inner_2: }                      |
|                                              |
+----------------------------------------------+
DataFrame()
+-------------------------------------------------------+
| cc251bd408f114ca2a4354b6976d91339.a[outer_1][inner_2] |
+-------------------------------------------------------+
| 2                                                     |
|                                                       |
| 0                                                     |
+-------------------------------------------------------+
```
**Expected behavior**
Accessing a subfield of a null entry should also return null.
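To make the expected semantics concrete, here is a minimal pure-Python sketch of null propagation through nested fields. The `get_nested` helper is hypothetical, written only for illustration; it is not part of datafusion-python or pyarrow and says nothing about where the bug lies.

```python
from typing import Any, Optional

# Rows mirror column "a" from the reproduction above.
rows = [
    {"outer_1": {"inner_1": 1, "inner_2": 2}},
    {"outer_1": {"inner_1": 1, "inner_2": None}},
    {"outer_1": None},
]


def get_nested(value: Optional[dict], *keys: str) -> Any:
    """Hypothetical helper: walk struct fields, yielding None once any level is null."""
    for key in keys:
        if value is None:
            # A null parent makes every field beneath it null as well.
            return None
        value = value[key]
    return value


expected = [get_nested(row, "outer_1", "inner_2") for row in rows]
print(expected)  # [2, None, None]
```

The third entry is `None` here, whereas the last query above shows `0` for that row.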