waruto210 opened a new issue, #12378:
URL: https://github.com/apache/datafusion/issues/12378

   ### Is your feature request related to a problem or challenge?
   
   When I create an external table with either the SQL or the Rust API code 
below, a "Corrupt footer" error occurs. Eventually, I discovered that this was 
because the `partitioned` directory contained files of other formats, and 
DataFusion was reading them as data files. I think this behavior is confusing, 
because the SQL specifies the Parquet format and the Rust code explicitly 
creates a `ParquetFormat`. 
   ```sql
   CREATE EXTERNAL TABLE hits
   STORED AS PARQUET
   LOCATION 'partitioned/';
   ```
   ```rust
     let mut opts = TableParquetOptions::default();
     opts.set("pushdown_filters", "true").unwrap();
     let format = ParquetFormat::new().with_options(opts);
     let options = ListingOptions::new(Arc::new(format))
         .with_table_partition_cols(vec![("A".to_owned(), DataType::Int32)]);
     ctx.register_listing_table(
         "hits",
         "partitioned/",
         options,
         None,
         None,
     )
     .await
   ```
   
   ### Describe the solution you'd like
   
   Currently, `ListingOptions::new` does not call `format.get_ext` when it 
builds the options from a format. I believe `format.get_ext` should be used as 
the default file extension filter, which would also better align with the 
definition of the `FileFormat` trait. If you agree, I can submit a PR to make 
that change. The current constructor is:
   
   ```rust
   impl ListingOptions {
     /// Creates an options instance with the given format
     /// Default values:
     /// - no file extension filter
     /// - no input partition to discover
     /// - one target partition
     /// - stat collection
     pub fn new(format: Arc<dyn FileFormat>) -> Self {
         Self {
             file_extension: String::new(),
             format,
             table_partition_cols: vec![],
             collect_stat: true,
             target_partitions: 1,
             file_sort_order: vec![],
         }
     }
   }
   ```
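
To make the proposal concrete, here is a minimal, standard-library-only sketch of the suggested default. The `FileFormat`, `ParquetFormat`, and `ListingOptions` types below are simplified stand-ins written for illustration, not DataFusion's real definitions, and `filter_paths` is a hypothetical helper. The exact string returned by `get_ext` (with or without a leading dot) is also an assumption.

```rust
use std::sync::Arc;

/// Simplified stand-in for the `FileFormat` trait (only the relevant method).
trait FileFormat {
    /// Default file extension for this format (assumed here to be ".parquet").
    fn get_ext(&self) -> String;
}

/// Simplified stand-in for `ParquetFormat`.
struct ParquetFormat;

impl FileFormat for ParquetFormat {
    fn get_ext(&self) -> String {
        ".parquet".to_owned()
    }
}

/// Simplified stand-in for `ListingOptions` with only two fields.
struct ListingOptions {
    file_extension: String,
    #[allow(dead_code)]
    format: Arc<dyn FileFormat>,
}

impl ListingOptions {
    /// Proposed behavior: the extension filter defaults to the format's extension
    /// instead of an empty string.
    fn new(format: Arc<dyn FileFormat>) -> Self {
        Self {
            file_extension: format.get_ext(),
            format,
        }
    }

    /// Override the filter; an empty string matches every file (the current default).
    fn with_file_extension(mut self, ext: impl Into<String>) -> Self {
        self.file_extension = ext.into();
        self
    }

    /// Hypothetical helper: keep only paths that end with the configured extension.
    fn filter_paths<'a>(&self, paths: &[&'a str]) -> Vec<&'a str> {
        paths
            .iter()
            .copied()
            .filter(|p| p.ends_with(self.file_extension.as_str()))
            .collect()
    }
}
```

With this default, a directory listing that contains `data.parquet` next to unrelated files would yield only the Parquet file, while callers who rely on the present behavior could still opt out with `with_file_extension("")`.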
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_

