Igosuki opened a new issue #1923:
URL: https://github.com/apache/arrow-datafusion/issues/1923
**Describe the bug**
One can register a table using the `file://` scheme, which in turn lets the
listing table list files and discover partitions.
Unfortunately, LocalStore returns a FileMetaStream in which the SizedFile path
has the scheme prefix stripped. This would be fine, except that
`datafusion::datasource::listing::helpers::parse_partitions_for_path` calls
`strip_prefix` on the file path with the original path used to register the
table, which still contains the scheme, so the prefix never matches.
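
To make the mismatch concrete, here is a minimal sketch in plain Rust. The `partition_values` function is a hypothetical, simplified stand-in for the prefix handling described above, not the actual DataFusion code, and `data.parquet` is an assumed file name.

```rust
/// Hypothetical, simplified stand-in for the prefix handling described above.
/// Returns the partition values found in `file_path` relative to `table_path`,
/// or `None` when `file_path` does not start with `table_path`.
fn partition_values<'a>(
    table_path: &str,
    file_path: &'a str,
    partition_cols: &[&str],
) -> Option<Vec<&'a str>> {
    // This is the step that fails when `table_path` still carries `file://`
    // but `file_path` has already had the scheme removed.
    let relative = file_path.strip_prefix(table_path)?;

    // Walk the leading `name=value` directory segments.
    let mut segments = relative.split('/').filter(|s| !s.is_empty());
    let mut values = Vec::new();
    for col in partition_cols {
        let segment = segments.next()?;
        let (name, value) = segment.split_once('=')?;
        if name != *col {
            return None;
        }
        values.push(value);
    }
    Some(values)
}
```

With the table path `file:///tmp/listing_table` and the file path `/tmp/listing_table/part1=value1/data.parquet`, this sketch returns `None`; with the table path `/tmp/listing_table` it returns the single value `value1`.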
There are two ways to fix this: either strip the scheme off the path in the
registered table as well (it would probably be best to let the ObjectStore
implementation do that), or enhance FileMeta to use a URI instead of just a
path.
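
The first option could look roughly like the sketch below. `strip_file_scheme` is a hypothetical helper, not an existing DataFusion or ObjectStore function, and it only handles the plain `file://` prefix (no host component, no percent-decoding).

```rust
/// Hypothetical helper: drops a leading `file://` scheme so that the
/// registered table path has the same shape as the paths that the local
/// store reports for individual files. Paths without the scheme are
/// returned unchanged.
fn strip_file_scheme(uri: &str) -> &str {
    uri.strip_prefix("file://").unwrap_or(uri)
}
```

Applying it to the registered path before any prefix comparison would let `file:///tmp/listing_table` and `/tmp/listing_table` be treated the same way.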
**To Reproduce**
Steps to reproduce the behavior:
`/tmp/listing_table/part1=value1/` and
`/tmp/listing_table/part1=value2/`
should each contain one parquet file.
```
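use std::sync::Arc;

use datafusion::datasource::file_format::parquet::ParquetFormat;
use datafusion::datasource::listing::ListingOptions;
use datafusion::prelude::*;

// The statements below are meant to run inside an async fn that returns a Result.
let mut ctx = ExecutionContext::new();
let listing_options = ListingOptions {
    file_extension: "parquet".to_string(),
    format: Arc::new(ParquetFormat::default()),
    // Partition column encoded in the directory names (part1=...).
    table_partition_cols: vec!["part1".to_string()],
    collect_stat: true,
    target_partitions: 8,
};
// Register the table with a path that includes the `file://` scheme.
ctx.register_listing_table(
    "my_table",
    "file:///tmp/listing_table",
    listing_options,
    None,
)
.await?;
let df = ctx.sql("select count(*) from my_table").await?;
let rb = df.collect().await?;
eprintln!("rb = {:?}", rb);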
```
**Expected behavior**
The query above should count the rows in the files correctly. With the current
behavior, it returns 0.
**Additional context**
I'm trying to be consistent in my project, so I use schemes for both
local and remote files. Finding this bug required a lot of debugging.