petern48 commented on code in PR #18577:
URL: https://github.com/apache/datafusion/pull/18577#discussion_r2508442522
##########
datafusion/datasource-parquet/src/reader.rs:
##########
@@ -97,6 +97,7 @@ impl DefaultParquetFileReaderFactory {
 pub struct ParquetFileReader {
     pub file_metrics: ParquetFileMetrics,
     pub inner: ParquetObjectReader,
+    pub partitioned_file: PartitionedFile,
Review Comment:
I wasn't sure: would this be considered a breaking change?
`ParquetFileMetrics` is in this `ParquetFileReader` struct, but
`ParquetFileReader` has no way to access the total file size, since the
`file_size` field in `ParquetObjectReader` is private (and lives in
`arrow-rs`). Adding the field is the change that made the most sense to me,
since `CachedParquetFileReader` already has `partitioned_file` as a field.
https://github.com/apache/datafusion/blob/f162fd325565e14be8e4cace17d8a3a8b2764cc8/datafusion/datasource-parquet/src/reader.rs#L225-L229
Alternatively, maybe I could add a `file_size()` getter to
[ParquetObjectReader](https://github.com/apache/arrow-rs/blob/43c7637c634a96998cc135490ea0a8b73972feb8/parquet/src/arrow/async_reader/store.rs#L55)
in `arrow-rs`?
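
To make the two options concrete, here is a minimal sketch using simplified stand-in types, not the real DataFusion or `arrow-rs` definitions. The `file_size()` getter, the `size` field on `PartitionedFile`, and the `build_reader` helper are assumptions for illustration only. As far as I understand Rust's SemVer conventions, adding a `pub` field to a struct whose fields are all public (and which is not `#[non_exhaustive]`) counts as a breaking change, because any downstream struct literal stops compiling until it lists the new field.

```rust
// Simplified stand-ins for illustration; the real types carry far more state.
#[derive(Debug, Clone, Default)]
pub struct ParquetFileMetrics;

#[derive(Debug, Clone)]
pub struct ParquetObjectReader {
    // Private, mirroring the situation described in the comment.
    file_size: Option<u64>,
}

impl ParquetObjectReader {
    pub fn new(file_size: Option<u64>) -> Self {
        Self { file_size }
    }

    /// Hypothetical getter (the alternative floated above).
    /// Not claimed to exist in arrow-rs today.
    pub fn file_size(&self) -> Option<u64> {
        self.file_size
    }
}

#[derive(Debug, Clone)]
pub struct PartitionedFile {
    // Assumed field for this sketch; stands in for the file's object metadata.
    pub size: u64,
}

pub struct ParquetFileReader {
    pub file_metrics: ParquetFileMetrics,
    pub inner: ParquetObjectReader,
    // Option 1: the new public field from this PR.
    pub partitioned_file: PartitionedFile,
}

/// Hypothetical helper showing the kind of downstream struct literal
/// that must be updated once `partitioned_file` is added.
pub fn build_reader(size: u64) -> ParquetFileReader {
    ParquetFileReader {
        file_metrics: ParquetFileMetrics,
        inner: ParquetObjectReader::new(Some(size)),
        partitioned_file: PartitionedFile { size },
    }
}
```

With option 1 the size is read as `reader.partitioned_file.size`; with option 2 it would be `reader.inner.file_size()`, which leaves the struct layout untouched but requires an `arrow-rs` release first.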