liamphmurphy opened a new issue, #15338:
URL: https://github.com/apache/datafusion/issues/15338
### Describe the bug
This bug originated for me when I encountered schema evolution on Delta
tables using the `delta-rs` library. Whenever a schema evolution occurs on a
table that contains a field with a list of structs, DataFusion returns
this error:
```
This feature is not implemented: Unsupported CAST from Struct([Field { name:
"properties", data_type: Struct([Field { name: "someNewField", data_type: Utf8,
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field {
name: "fields", data_type: List(Field { name: "item", data_type: Struct([Field
{ name: "messageId", data_type: Utf8, nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }), nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }]), nullable: true, dict_id: 0,
dict_is_ordered: false, metadata: {} }]) to Struct([Field { name: "properties",
data_type: Struct([Field { name: "fields", data_type: List(Field { name:
"element", data_type: Struct([Field { name: "messageId", data_type: Utf8,
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable:
true, dict_id: 0, dict_is_ordered: false, metadata: {} }), nullable: true,
dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "someNewField", data_type: Utf8,
nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable:
true, dict_id: 0, dict_is_ordered: false, metadata: {} }])
```
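The two types in this message are hard to compare by eye. The sketch below models them as plain Python data and reports where they diverge. The representation is a hypothetical one made up for illustration, not an Arrow or DataFusion API: a struct is a list of `(name, type)` pairs and a list is `("List", element_field_name, element_type)`. The path prefix `event` is an assumption based on the reproduction script.

```python
# Hypothetical, simplified model of the two types in the error above
# (not an Arrow or DataFusion API):
#   struct -> list of (field_name, field_type) pairs
#   list   -> ("List", element_field_name, element_type)
UTF8 = "Utf8"

# "from" type in the error message
source = [("properties", [
    ("someNewField", UTF8),
    ("fields", ("List", "item", [("messageId", UTF8)])),
])]

# "to" type in the error message
target = [("properties", [
    ("fields", ("List", "element", [("messageId", UTF8)])),
    ("someNewField", UTF8),
])]


def diff(src, dst, path="event"):
    """List differences between two modeled types, walking from src into dst."""
    out = []
    if isinstance(src, list) and isinstance(dst, list):  # struct vs struct
        src_names = [name for name, _ in src]
        dst_names = [name for name, _ in dst]
        if src_names != dst_names:
            out.append(f"{path}: field order {src_names} vs {dst_names}")
        dst_map = dict(dst)
        for name, field_type in src:
            if name in dst_map:
                out.extend(diff(field_type, dst_map[name], f"{path}.{name}"))
            else:
                out.append(f"{path}.{name}: missing in target")
    elif isinstance(src, tuple) and isinstance(dst, tuple):  # list vs list
        _, src_elem, src_type = src
        _, dst_elem, dst_type = dst
        if src_elem != dst_elem:
            out.append(f"{path}: list element name {src_elem!r} vs {dst_elem!r}")
        out.extend(diff(src_type, dst_type, f"{path}[]"))
    elif src != dst:  # primitive mismatch
        out.append(f"{path}: {src} vs {dst}")
    return out


for line in diff(source, target):
    print(line)
```

Under this model the only differences are the order of the `properties` fields and the list's inner field name (`item` vs `element`); every leaf type is identical.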
### To Reproduce
The following Python script, using delta-rs (which currently depends on
DataFusion 46), reproduces the error:
```python
import pyarrow as pa
from deltalake import write_deltalake

# Define the path for the Delta table
delta_table_path = "./datafusion-repro-test-table"

# Define the data for the first write
data_first_write = [
    {
        "uid": "ws_2",
        "event": {
            "properties": {
                "fields": [
                    {
                        "messageId": "veniam sed et elit adipisicing"
                    }
                ],
            },
        }
    }
]

schema = pa.schema([
    pa.field("uid", pa.string()),
    pa.field("event", pa.struct([
        pa.field("properties", pa.struct([
            pa.field("fields", pa.list_(pa.struct([
                pa.field("messageId", pa.string()),
            ]))),
        ])),
    ])),
])
print(schema)

first_write = pa.Table.from_pylist(data_first_write, schema=schema)

# Write data to Delta table for the first write
write_deltalake(delta_table_path, first_write, mode="append", engine="rust",
                schema_mode="merge")

#### NOW FOR THE SECOND WRITE THAT BREAKS ####
data_second_write = [
    {
        "uid": "ws_2",
        "event": {
            "properties": {
                "someNewField": "test-value",  # New field
                "fields": [
                    {
                        "messageId": "veniam sed et elit adipisicing"
                    }
                ],
            },
        }
    }
]

second_schema = pa.schema([
    pa.field("uid", pa.string()),
    pa.field("event", pa.struct([
        pa.field("properties", pa.struct([
            pa.field("someNewField", pa.string()),  # New field
            pa.field("fields", pa.list_(pa.struct([
                pa.field("messageId", pa.string()),
            ]))),
        ])),
    ])),
])

second_write = pa.Table.from_pylist(data_second_write, schema=second_schema)

# Write data to Delta table for the second write
write_deltalake(delta_table_path, second_write, mode="append",
                engine="rust", schema_mode="merge")
```
### Expected behavior
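DataFusion should support casting between struct types that contain a list
of structs, as happens when a nested struct gains a new field during schema
evolution. In the error above, the source and target types differ only in
the order of the `properties` fields and in the name of the list's inner
field (`item` vs `element`).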
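A minimal sketch of the expected behaviour, assuming three rules: struct fields are matched by name rather than by position, fields missing from the source are filled with nulls, and the list's inner field name (`item` vs `element`) is ignored. It uses a hypothetical plain-Python type model (a struct type is a list of `(name, type)` pairs, a list type is `("List", element_field_name, element_type)`), with struct values stored as tuples in field order. This is an illustration only, not DataFusion's or arrow-rs's actual cast implementation; `cast_by_name` is a made-up name.

```python
# Hypothetical type model (not an Arrow/DataFusion API):
#   struct    -> list of (field_name, field_type) pairs
#   list      -> ("List", element_field_name, element_type)
#   primitive -> a string such as "Utf8"
# Struct *values* are tuples in the struct's field order; list values are lists.
UTF8 = "Utf8"
MESSAGE = [("messageId", UTF8)]

# `event` type of the first write in the reproduction script (no someNewField).
source_v1 = [("properties", [("fields", ("List", "item", MESSAGE))])]
# "from" type in the error message (second write).
source_v2 = [("properties", [("someNewField", UTF8),
                             ("fields", ("List", "item", MESSAGE))])]
# "to" type in the error message.
target = [("properties", [("fields", ("List", "element", MESSAGE)),
                          ("someNewField", UTF8)])]


def cast_by_name(value, src_type, dst_type):
    """Cast a nested value, matching struct fields by name instead of position."""
    if value is None:
        return None
    if isinstance(dst_type, list):  # struct
        src_index = {name: i for i, (name, _) in enumerate(src_type)}
        out = []
        for name, field_type in dst_type:
            if name in src_index:
                i = src_index[name]
                out.append(cast_by_name(value[i], src_type[i][1], field_type))
            else:
                out.append(None)  # field absent from the source: fill with null
        return tuple(out)
    if isinstance(dst_type, tuple):  # list: the element field name is ignored
        return [cast_by_name(v, src_type[2], dst_type[2]) for v in value]
    return value  # primitive: assumes identical source and target types


# `event` values from the first and second writes, as positional tuples.
row_v1 = (([("veniam sed et elit adipisicing",)],),)
row_v2 = (("test-value", [("veniam sed et elit adipisicing",)]),)
print(cast_by_name(row_v1, source_v1, target))
print(cast_by_name(row_v2, source_v2, target))
```

In this sketch, matching by name is what lets older rows receive a null `someNewField`; a purely positional cast would instead pair `someNewField` (Utf8) with `fields` (List) and fail.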
### Additional context
Originating bug report in delta-rs:
https://github.com/delta-io/delta-rs/issues/3339