jsimpson-gro opened a new issue, #7954:
URL: https://github.com/apache/arrow-datafusion/issues/7954
### Describe the bug
When reading a Parquet file whose extension doesn't match the `file_extension`
configured in `ParquetReadOptions` (`.parquet` by default), DataFusion silently
produces an empty `DataFrame`. I believe beginners can hit this easily (reading
a Parquet file with the default read options was one of the first things I
tried), and the lack of any error or warning makes it hard to debug.
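A minimal std-only sketch of the suffix check that appears to cause this. The assumption is that file selection keeps only paths ending with the configured extension. `filter_by_extension` is a hypothetical helper for illustration, not DataFusion code. It shows why `output.parquet.snappy` is skipped under the default `.parquet` extension but selected with `parquet.snappy`:

```rust
/// Hypothetical model of extension-based file selection (NOT DataFusion's
/// implementation): keep only the paths whose name ends with `ext`.
fn filter_by_extension<'a>(paths: &[&'a str], ext: &str) -> Vec<&'a str> {
    paths.iter().copied().filter(|p| p.ends_with(ext)).collect()
}

fn main() {
    let files = ["output.parquet.snappy"];
    // Default extension: nothing matches, so the selection is silently empty.
    println!("{:?}", filter_by_extension(&files, ".parquet"));
    // Matching the actual suffix selects the file.
    println!("{:?}", filter_by_extension(&files, "parquet.snappy"));
}
```

Under this model, an empty selection is not treated as an error. The result is a table with no files and therefore no rows.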
### To Reproduce
```rust
//! ```cargo
//! [dependencies]
//! anyhow = "1.0"
//! datafusion = "32.0"
//! tokio = { version = "1.33", features = ["macros", "rt-multi-thread"] }
//! ```
use std::sync::Arc;

use datafusion::arrow::array::{Float32Array, Int32Array};
use datafusion::arrow::datatypes::{DataType, Field, Schema};
use datafusion::arrow::record_batch::RecordBatch;
use datafusion::dataframe::DataFrameWriteOptions;
use datafusion::parquet::basic::Compression;
use datafusion::parquet::file::properties::WriterProperties;
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let ctx = SessionContext::new();

    // Make up a new dataframe.
    let write_df = ctx.read_batch(RecordBatch::try_new(
        Arc::new(Schema::new(vec![
            Field::new("purchase_id", DataType::Int32, false),
            Field::new("price", DataType::Float32, false),
            Field::new("quantity", DataType::Int32, false),
        ])),
        vec![
            Arc::new(Int32Array::from(vec![1, 2, 3, 4, 5])),
            Arc::new(Float32Array::from(vec![1.12, 3.40, 2.33, 9.10, 6.66])),
            Arc::new(Int32Array::from(vec![1, 3, 2, 4, 3])),
        ],
    )?)?;

    write_df
        .write_parquet(
            "output.parquet.snappy",
            DataFrameWriteOptions::new().with_single_file_output(true),
            Some(
                WriterProperties::builder()
                    .set_compression(Compression::SNAPPY)
                    .build(),
            ),
        )
        .await?;

    let read_df = ctx
        .read_parquet(
            "output.parquet.snappy",
            ParquetReadOptions {
                // If this line is uncommented, the read will be successful.
                // file_extension: "parquet.snappy",
                ..Default::default()
            },
        )
        .await?;
    read_df.show().await?;
    Ok(())
}
```
Running it gives this output (an empty table):
```
$ cargo run
   Compiling bug_repro_open_no_error v0.1.0 (/home/jacob/src/gro-sandbox/user/jsimpson/app/bug_repro_open_no_error)
    Finished dev [unoptimized + debuginfo] target(s) in 7.56s
     Running `target/debug/bug_repro_open_no_error`
++
++
```
### Expected behavior
I would like to get an error when the read fails (here, because the file does
not match the configured extension) rather than an empty `DataFrame`.
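A minimal std-only sketch of the requested behavior: select files by extension, but return an error instead of an empty selection. `require_matches` and its error message are hypothetical and not a DataFusion API. Naming the extension in the message is a design choice aimed at making the likely cause obvious:

```rust
/// Hypothetical helper (not a DataFusion API): select files by extension,
/// failing loudly instead of returning an empty selection.
fn require_matches<'a>(paths: &[&'a str], ext: &str) -> Result<Vec<&'a str>, String> {
    let matched: Vec<&'a str> = paths.iter().copied().filter(|p| p.ends_with(ext)).collect();
    if matched.is_empty() {
        // Mention the extension, since a mismatch is the most likely cause.
        return Err(format!(
            "no files matched extension {ext:?} among {} candidate path(s)",
            paths.len()
        ));
    }
    Ok(matched)
}

fn main() {
    match require_matches(&["output.parquet.snappy"], ".parquet") {
        Ok(files) => println!("reading {files:?}"),
        Err(e) => eprintln!("error: {e}"),
    }
}
```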
### Additional context
_No response_