jsimpson-gro opened a new issue, #7954:
URL: https://github.com/apache/arrow-datafusion/issues/7954

   ### Describe the bug
   
   When reading a Parquet file whose extension doesn't match the default `file_extension` in `ParquetReadOptions`, an empty `DataFrame` is produced and no error is reported. This is easy to hit as a beginner (reading a Parquet file with the default read options was one of the first things I tried), and the lack of feedback makes it hard to debug.
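   
   A simplified model of what appears to happen, assuming the listing step keeps only files whose path ends with the configured extension and that the default extension is `.parquet`. The helper `matching_files` is hypothetical and is not DataFusion's actual code; it only illustrates why a file named `output.parquet.snappy` ends up with zero matches instead of an error.
   
   ```rust
   // Hypothetical model of extension-based file filtering.
   // Assumption: files are kept only if their path ends with `file_extension`.
   fn matching_files<'a>(paths: &[&'a str], file_extension: &str) -> Vec<&'a str> {
       paths
           .iter()
           .copied()
           .filter(|p| p.ends_with(file_extension))
           .collect()
   }
   
   fn main() {
       let paths = ["output.parquet.snappy"];
   
       // Assumed default extension ".parquet": nothing matches, so the
       // table has no files (and no rows), yet nothing signals an error.
       println!("{:?}", matching_files(&paths, ".parquet")); // prints []
   
       // Extension from the commented-out line in the repro: the file is kept.
       println!("{:?}", matching_files(&paths, "parquet.snappy")); // prints ["output.parquet.snappy"]
   }
   ```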
   
   ### To Reproduce
   
   ```rust
   //! ```cargo
   //! [dependencies]
   //! anyhow = "1.0"
   //! datafusion = "32.0"
   //! tokio = { version = "1.33", features = ["macros", "rt-multi-thread"] }
   //! ```
   use std::sync::Arc;
   
   use datafusion::arrow::array::{Float32Array, Int32Array};
   use datafusion::arrow::datatypes::{DataType, Field, Schema};
   use datafusion::arrow::record_batch::RecordBatch;
   use datafusion::dataframe::DataFrameWriteOptions;
   use datafusion::parquet::basic::Compression;
   use datafusion::parquet::file::properties::WriterProperties;
   use datafusion::prelude::*;
   
   #[tokio::main]
   async fn main() -> anyhow::Result<()> {
       let ctx = SessionContext::new();
   
       // Make up a new dataframe.
       let write_df = ctx.read_batch(RecordBatch::try_new(
           Arc::new(Schema::new(vec![
               Field::new("purchase_id", DataType::Int32, false),
               Field::new("price", DataType::Float32, false),
               Field::new("quantity", DataType::Int32, false),
           ])),
           vec![
               Arc::new(Int32Array::from(vec![1, 2, 3, 4, 5])),
               Arc::new(Float32Array::from(vec![1.12, 3.40, 2.33, 9.10, 6.66])),
               Arc::new(Int32Array::from(vec![1, 3, 2, 4, 3])),
           ],
       )?)?;
   
       write_df
           .write_parquet(
               "output.parquet.snappy",
               DataFrameWriteOptions::new().with_single_file_output(true),
               Some(
                   WriterProperties::builder()
                       .set_compression(Compression::SNAPPY)
                       .build(),
               ),
           )
           .await?;
   
       let read_df = ctx
           .read_parquet(
               "output.parquet.snappy",
               ParquetReadOptions {
                   // If this line is uncommented, the read will be successful.
                   // file_extension: "parquet.snappy",
                   ..Default::default()
               },
           )
           .await?;
   
       read_df.show().await?;
   
       Ok(())
   }
   ```
   
   Gives output:
   
   ```
   $ cargo run
      Compiling bug_repro_open_no_error v0.1.0 (/home/jacob/src/gro-sandbox/user/jsimpson/app/bug_repro_open_no_error)
       Finished dev [unoptimized + debuginfo] target(s) in 7.56s
        Running `target/debug/bug_repro_open_no_error`
   ++
   ++
   ```
   
   ### Expected behavior
   
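   I would like `read_parquet` to return an error when the read fails (for example, because no file matches the configured extension), rather than silently producing an empty `DataFrame`.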
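   
   A rough sketch of the requested behaviour, modelled on plain path strings rather than DataFusion's API. The function `list_matching` and its error message are hypothetical; the point is only that an empty match set becomes an `Err` instead of an empty result.
   
   ```rust
   // Hypothetical: not DataFusion's API. Filters paths by extension and
   // reports an error when nothing matches, instead of returning nothing.
   fn list_matching<'a>(paths: &[&'a str], file_extension: &str) -> Result<Vec<&'a str>, String> {
       let matched: Vec<&str> = paths
           .iter()
           .copied()
           .filter(|p| p.ends_with(file_extension))
           .collect();
       if matched.is_empty() {
           // Surface the mismatch so the caller sees why no data was read.
           Err(format!(
               "no files matched extension {file_extension:?} among {} candidate(s)",
               paths.len()
           ))
       } else {
           Ok(matched)
       }
   }
   
   fn main() {
       let paths = ["output.parquet.snappy"];
       match list_matching(&paths, ".parquet") {
           Ok(files) => println!("reading {files:?}"),
           Err(e) => eprintln!("error: {e}"),
       }
   }
   ```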
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.