houqp commented on a change in pull request #1141:
URL: https://github.com/apache/arrow-datafusion/pull/1141#discussion_r735323298



##########
File path: datafusion/src/datasource/file_format/mod.rs
##########
@@ -36,25 +36,25 @@ use async_trait::async_trait;
 use super::object_store::{ObjectReader, ObjectReaderStream, ObjectStore};
 use super::PartitionedFile;
 
-/// The configurations to be passed when creating a physical plan for
-/// a given file format.
+/// The base configurations to provide when creating a physical plan for
+/// any given file format.
 pub struct PhysicalPlanConfig {
     /// Store from which the `files` should be fetched
     pub object_store: Arc<dyn ObjectStore>,
     /// Schema before projection
-    pub schema: SchemaRef,
+    pub file_schema: SchemaRef,
     /// List of files to be processed, grouped into partitions
-    pub files: Vec<Vec<PartitionedFile>>,
+    pub file_groups: Vec<Vec<PartitionedFile>>,
     /// Estimated overall statistics of the plan, taking `filters` into account
     pub statistics: Statistics,
     /// Columns on which to project the data
     pub projection: Option<Vec<usize>>,
     /// The maximum number of records per arrow column
     pub batch_size: usize,
-    /// The filters that were pushed down to this execution plan
-    pub filters: Vec<Expr>,
     /// The minimum number of records required from this source plan
     pub limit: Option<usize>,
+    /// The partitioning column names
+    pub table_partition_dims: Vec<String>,

Review comment:
       Conceptually, they are handled as "virtual columns" during compute, right?
For example, when a user writes a SQL query that filters on a partition,
they apply the filter to it just like any other regular column. I am
suggesting "partition column" here because it's the term used in Hive and
Spark, so readers would be more familiar with it. Are there systems that use
"partition dimensions" as the naming convention?
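
To illustrate what I mean by "virtual columns", here is a minimal sketch. It assumes a Hive-style `key=value` directory layout, and the names `parse_partition_values` and `prune_files` are hypothetical, not part of DataFusion or this PR. The values come from the file path rather than the file contents, yet a filter on them looks like a filter on any other column:

```rust
// Hypothetical sketch, NOT DataFusion's API: partition columns as "virtual
// columns" whose values are read from a Hive-style path, not from the file.

/// Extracts the values of `partition_cols` from a path such as
/// `year=2021/month=10/data.parquet`, in the order the columns are given.
/// Returns `None` if any requested column is missing from the path.
fn parse_partition_values(path: &str, partition_cols: &[&str]) -> Option<Vec<String>> {
    partition_cols
        .iter()
        .map(|col| {
            path.split('/')
                // Keep only `key=value` segments; the file name is skipped.
                .filter_map(|segment| segment.split_once('='))
                .find(|(key, _)| *key == *col)
                .map(|(_, value)| value.to_string())
        })
        .collect()
}

/// Keeps the files whose partition column `col` equals `expected`,
/// roughly what `WHERE col = 'expected'` would do for a partition column.
fn prune_files<'a>(
    files: &[&'a str],
    partition_cols: &[&str],
    col: &str,
    expected: &str,
) -> Vec<&'a str> {
    // Unknown column: nothing can match.
    let idx = match partition_cols.iter().position(|c| *c == col) {
        Some(idx) => idx,
        None => return Vec::new(),
    };
    files
        .iter()
        .copied()
        .filter(|path| {
            parse_partition_values(path, partition_cols)
                .map_or(false, |values| values[idx] == expected)
        })
        .collect()
}
```

With this sketch, calling `prune_files` with the partition columns `["year", "month"]` and the filter `year = "2021"` keeps only the files under a `year=2021/` directory, without opening any of them.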



