rdettai commented on a change in pull request #1141:
URL: https://github.com/apache/arrow-datafusion/pull/1141#discussion_r735312508
##########
File path: datafusion/src/datasource/file_format/mod.rs
##########
@@ -36,25 +36,25 @@ use async_trait::async_trait;
use super::object_store::{ObjectReader, ObjectReaderStream, ObjectStore};
use super::PartitionedFile;
-/// The configurations to be passed when creating a physical plan for
-/// a given file format.
+/// The base configurations to provide when creating a physical plan for
+/// any given file format.
pub struct PhysicalPlanConfig {
/// Store from which the `files` should be fetched
pub object_store: Arc<dyn ObjectStore>,
/// Schema before projection
- pub schema: SchemaRef,
+ pub file_schema: SchemaRef,
/// List of files to be processed, grouped into partitions
- pub files: Vec<Vec<PartitionedFile>>,
+ pub file_groups: Vec<Vec<PartitionedFile>>,
/// Estimated overall statistics of the plan, taking `filters` into account
pub statistics: Statistics,
/// Columns on which to project the data
pub projection: Option<Vec<usize>>,
/// The maximum number of records per arrow column
pub batch_size: usize,
- /// The filters that were pushed down to this execution plan
- pub filters: Vec<Expr>,
/// The minimum number of records required from this source plan
pub limit: Option<usize>,
+ /// The partitioning column names
+ pub table_partition_dims: Vec<String>,
Review comment:
Haha, I have to admit that I am hugely hesitant about the naming 😅.
I am wondering whether it isn't the comment that should be changed instead.
These partitions are originally encoded in the file path; we then parse them
and, if necessary, project them into a column. So they end up as columns, but
they are not columns per se.
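To illustrate what I mean by "encoded in the file path": a minimal sketch (not DataFusion's actual implementation; the function name and path layout are hypothetical) of parsing Hive-style `key=value` path segments into partition values that could later be projected as columns:

```rust
/// Sketch: extract Hive-style partition values from a file path.
/// Segments shaped like `key=value` become (name, value) pairs;
/// everything else (e.g. the file name itself) is skipped.
fn parse_partition_values(path: &str) -> Vec<(String, String)> {
    path.split('/')
        .filter_map(|segment| {
            let mut parts = segment.splitn(2, '=');
            match (parts.next(), parts.next()) {
                (Some(key), Some(value)) if !key.is_empty() => {
                    Some((key.to_string(), value.to_string()))
                }
                _ => None,
            }
        })
        .collect()
}

fn main() {
    let values = parse_partition_values("year=2021/month=10/data.parquet");
    assert_eq!(
        values,
        vec![
            ("year".to_string(), "2021".to_string()),
            ("month".to_string(), "10".to_string()),
        ]
    );
    println!("{:?}", values);
}
```

So `table_partition_dims` would name the dimensions (`year`, `month`) whose values only exist in the path until they are projected.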
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]