SteveLauC opened a new issue, #7241:
URL: https://github.com/apache/arrow-datafusion/issues/7241
### Describe the bug
The query `SELECT * EXCEPT (field) FROM 'test.parquet' WHERE field =
'field';` fails with an internal error when executed against an empty parquet
file. The error message given by `datafusion-cli` is shown below:
> The parquet file is empty in data, not in schema: it has a field `field`.
```
push_down_projection
caused by
Internal error: Optimizer rule 'push_down_projection' failed, due to
generate a different schema,
original schema: DFSchema { fields: [], metadata: {} },
new schema: DFSchema { fields: [DFField { qualifier: Some(Bare { table:
"test.parquet" }), field: Field { name: "field", data_type: Utf8, nullable:
true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} }.
```
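To make the two schemas in the message easier to read, here is a minimal std-only sketch. It is a toy model, not DataFusion's code: plain strings stand in for `DFSchema`/`DFField`, and the helpers `select_star_except` and `schemas_match` are hypothetical names introduced only for illustration. It shows that `SELECT * EXCEPT (field)` over a table whose only column is `field` should yield an empty schema (the "original schema" above), while the "new schema" reported in the error still contains `field`, so the two differ.

```rust
// Toy model only: plain strings stand in for DataFusion's DFSchema/DFField.

/// Columns produced by `SELECT * EXCEPT (excluded...)` over `table_columns`.
fn select_star_except(table_columns: &[&str], excluded: &[&str]) -> Vec<String> {
    table_columns
        .iter()
        .filter(|col| !excluded.contains(*col))
        .map(|col| col.to_string())
        .collect()
}

/// Hypothetical stand-in for the kind of check the error message reports:
/// a rewritten plan must expose the same output columns as the original.
fn schemas_match(original: &[String], rewritten: &[String]) -> bool {
    original == rewritten
}

fn main() {
    // The parquet file from this report has exactly one column, `field`.
    let table = ["field"];

    // Excluding the only column leaves no output columns.
    let original = select_star_except(&table, &["field"]);

    // The "new schema" from the error message still lists `field`.
    let rewritten = vec!["field".to_string()];

    println!("original schema: {:?}", original);
    println!("new schema:      {:?}", rewritten);
    println!("schemas match:   {}", schemas_match(&original, &rewritten));
}
```

The sketch only restates what the error message reports; it makes no claim about where inside `push_down_projection` the extra column is introduced.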
### To Reproduce
1. Use the following Rust code to create an empty parquet file: it contains
no data, but its schema is NOT empty.
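```rust
// Imports assume the `arrow` and `parquet` crates are dependencies.
use std::fs::OpenOptions;
use std::sync::Arc;

use arrow::datatypes::{DataType, Field, Schema};
use parquet::arrow::ArrowWriter;

fn main() {
    // Create (or open) the output file for writing.
    let file = OpenOptions::new()
        .write(true)
        .create(true)
        .open("test.parquet")
        .unwrap();
    // The schema has a single nullable Utf8 column named `field`.
    let schema = Arc::new(Schema::new(vec![Field::new("field", DataType::Utf8, true)]));
    let writer = ArrowWriter::try_new(file, schema, None).unwrap();
    // Close without writing any record batch: the file has a schema but no rows.
    writer.close().unwrap();
}
```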
2. Run the following query in `datafusion-cli`:
> A Rust program using the `datafusion` library gives the same result.
```shell
$ ls -l test.parquet
.rw-r--r--@ 263 steve 9 Aug 13:40 test.parquet
$ datafusion-cli
❯ SELECT * EXCEPT (field) FROM 'test.parquet' WHERE field = 'field';
push_down_projection
caused by
Internal error: Optimizer rule 'push_down_projection' failed, due to
generate a different schema, original schema: DFSchema { fields: [], metadata:
{} }, new schema: DFSchema { fields: [DFField { qualifier: Some(Bare { table:
"test.parquet" }), field: Field { name: "field", data_type: Utf8, nullable:
true, dict_id: 0, dict_is_ordered: false, metadata: {} } }], metadata: {} }.
This was likely caused by a bug in DataFusion's code and we would welcome that
you file an bug report in our issue tracker
```
### Expected behavior
The query can be successfully executed.
### Additional context
* parquet library version: 43
* datafusion-cli version: 28.0.0