waruto210 opened a new issue, #12378:
URL: https://github.com/apache/datafusion/issues/12378
### Is your feature request related to a problem or challenge?
When I try to create an external table with the code below, using either SQL
or the Rust API, a "Corrupt footer" error occurs. Eventually, I discovered the
cause: the `partitioned` directory also contained files of other formats, and
DataFusion was reading them as data files. I find this behavior confusing,
because the SQL specifies the Parquet format and the Rust code explicitly
creates a new `ParquetFormat`.
```sql
CREATE EXTERNAL TABLE hits
STORED AS PARQUET
LOCATION 'partitioned/';
```
```rust
let mut opts = TableParquetOptions::default();
opts.set("pushdown_filters", "true").unwrap();
let format = ParquetFormat::new().with_options(opts);
let options = ListingOptions::new(Arc::new(format))
    .with_table_partition_cols(vec![("A".to_owned(), DataType::Int32)]);
ctx.register_listing_table(
    "hits",
    "partitioned/",
    options,
    None,
    None,
)
.await
```
### Describe the solution you'd like
Currently, `format.get_ext` is not called when `ListingOptions` is created
from a format. I believe `format.get_ext` should be used as the default file
extension filter, which would also align better with the definition of the
`FileFormat` trait. If you agree, I can submit a PR to make that change.
```rust
impl ListingOptions {
    /// Creates an options instance with the given format
    /// Default values:
    /// - no file extension filter
    /// - no input partition to discover
    /// - one target partition
    /// - stat collection
    pub fn new(format: Arc<dyn FileFormat>) -> Self {
        Self {
            file_extension: String::new(),
            format,
            table_partition_cols: vec![],
            collect_stat: true,
            target_partitions: 1,
            file_sort_order: vec![],
        }
    }
}
```
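
To illustrate the idea, here is a minimal, self-contained sketch of what defaulting the extension filter to the format's extension could look like. It does not use DataFusion itself: the `FileFormat` trait, `ParquetFormat`, and `ListingOptions` below are simplified, hypothetical stand-ins that model only the extension filter. The `filter_files` helper and the `".parquet"` return value are assumptions made for this example, not the library's actual API or behavior.

```rust
/// Hypothetical stand-in for the part of the `FileFormat` trait discussed
/// above; only the extension accessor is modeled.
trait FileFormat {
    /// Default file extension for this format.
    fn get_ext(&self) -> String;
}

/// Hypothetical stand-in for the Parquet format.
struct ParquetFormat;

impl FileFormat for ParquetFormat {
    fn get_ext(&self) -> String {
        // Assumed value for this sketch; the real implementation may differ.
        ".parquet".to_string()
    }
}

/// Simplified model of listing options: just the extension filter.
struct ListingOptions {
    file_extension: String,
}

impl ListingOptions {
    /// Proposed behavior: default the filter to the format's extension
    /// instead of an empty string.
    fn new(format: &dyn FileFormat) -> Self {
        Self {
            file_extension: format.get_ext(),
        }
    }

    /// Keeps only the paths that end with the configured extension.
    /// An empty extension matches every path, which models the current
    /// "no file extension filter" default.
    fn filter_files<'a>(&self, paths: &[&'a str]) -> Vec<&'a str> {
        paths
            .iter()
            .copied()
            .filter(|p| p.ends_with(self.file_extension.as_str()))
            .collect()
    }
}
```

With this default, a directory listing such as `["a.parquet", "_SUCCESS", "notes.csv"]` would be narrowed to the Parquet file only, while an empty `file_extension` would still let every file through, which is the situation that leads to the "Corrupt footer" error described above.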
### Describe alternatives you've considered
_No response_
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]