petern48 commented on code in PR #18577:
URL: https://github.com/apache/datafusion/pull/18577#discussion_r2508442522


##########
datafusion/datasource-parquet/src/reader.rs:
##########
@@ -97,6 +97,7 @@ impl DefaultParquetFileReaderFactory {
 pub struct ParquetFileReader {
     pub file_metrics: ParquetFileMetrics,
     pub inner: ParquetObjectReader,
+    pub partitioned_file: PartitionedFile,

Review Comment:
   I wasn't sure: would this technically be considered a breaking change? This 
struct is only ever constructed manually throughout the crate (it has no constructor).
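
To make the concern concrete, here is a minimal sketch using simplified stand-in types (not the real DataFusion definitions; the `size` field on `PartitionedFile` and the `build_reader` helper are hypothetical). Because every field is `pub` and there is no constructor, any code outside the crate that builds the struct with a literal would need to supply the new field:

```rust
// Simplified stand-ins for the real types; only the struct shape matters here.
#[derive(Debug, Default, Clone)]
pub struct ParquetFileMetrics;

#[derive(Debug, Default, Clone)]
pub struct ParquetObjectReader;

#[derive(Debug, Default, Clone)]
pub struct PartitionedFile {
    // Hypothetical simplification: the real type carries more metadata.
    pub size: u64,
}

pub struct ParquetFileReader {
    pub file_metrics: ParquetFileMetrics,
    pub inner: ParquetObjectReader,
    // Field added by this PR.
    pub partitioned_file: PartitionedFile,
}

// A construction site written as a struct literal. A literal that lists only
// `file_metrics` and `inner` compiled before the new field existed; with the
// field added, omitting it is a compile error (E0063, missing struct field).
pub fn build_reader(file_size: u64) -> ParquetFileReader {
    ParquetFileReader {
        file_metrics: ParquetFileMetrics,
        inner: ParquetObjectReader,
        partitioned_file: PartitionedFile { size: file_size },
    }
}
```

If any downstream code builds `ParquetFileReader` this way, it would stop compiling until it sets `partitioned_file`, which is why I'm asking.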
   
   `ParquetFileMetrics` is in this `ParquetFileReader` struct, but 
`ParquetFileReader` didn't have a way to access the total file size, since the 
`file_size` field in `ParquetObjectReader` is private (and lives in 
`arrow-rs`). Adding the field is the change that made the most sense to me, since 
`CachedParquetFileReader` already has a `partitioned_file` field.
   
   
https://github.com/apache/datafusion/blob/f162fd325565e14be8e4cace17d8a3a8b2764cc8/datafusion/datasource-parquet/src/reader.rs#L225-L229
   
   Alternatively, maybe I could add a `file_size()` getter to 
[ParquetObjectReader](https://github.com/apache/arrow-rs/blob/43c7637c634a96998cc135490ea0a8b73972feb8/parquet/src/arrow/async_reader/store.rs#L55)
 in `arrow-rs`?
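
A rough sketch of what that getter could look like, again on a stand-in type rather than the real `arrow-rs` struct. I'm assuming the size is stored as an optional `u64`; the `new` and `with_file_size` helpers below exist only to make the stand-in usable, and the `file_size()` getter is the proposed (currently hypothetical) addition:

```rust
// Stand-in for `ParquetObjectReader`; the real struct in arrow-rs has more fields.
pub struct ParquetObjectReader {
    // Private, so callers outside the defining crate cannot read it directly.
    // Assumption: the size is optional because it may not be known up front.
    file_size: Option<u64>,
}

impl ParquetObjectReader {
    // Stand-in constructor: starts with no known file size.
    pub fn new() -> Self {
        Self { file_size: None }
    }

    // Stand-in builder method that records a known file size.
    pub fn with_file_size(mut self, file_size: u64) -> Self {
        self.file_size = Some(file_size);
        self
    }

    /// Proposed getter (hypothetical): exposes the size without making the
    /// field itself public.
    pub fn file_size(&self) -> Option<u64> {
        self.file_size
    }
}
```

With something like this, `ParquetFileReader` could call `self.inner.file_size()` and would not need the extra `partitioned_file` field, at the cost of waiting on an `arrow-rs` release.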



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

