rdettai commented on a change in pull request #1141:
URL: https://github.com/apache/arrow-datafusion/pull/1141#discussion_r735312508
##########
File path: datafusion/src/datasource/file_format/mod.rs
##########
@@ -36,25 +36,25 @@ use async_trait::async_trait;
use super::object_store::{ObjectReader, ObjectReaderStream, ObjectStore};
use super::PartitionedFile;
-/// The configurations to be passed when creating a physical plan for
-/// a given file format.
+/// The base configurations to provide when creating a physical plan for
+/// any given file format.
pub struct PhysicalPlanConfig {
/// Store from which the `files` should be fetched
pub object_store: Arc<dyn ObjectStore>,
/// Schema before projection
- pub schema: SchemaRef,
+ pub file_schema: SchemaRef,
/// List of files to be processed, grouped into partitions
- pub files: Vec<Vec<PartitionedFile>>,
+ pub file_groups: Vec<Vec<PartitionedFile>>,
/// Estimated overall statistics of the plan, taking `filters` into account
pub statistics: Statistics,
/// Columns on which to project the data
pub projection: Option<Vec<usize>>,
/// The maximum number of records per arrow column
pub batch_size: usize,
- /// The filters that were pushed down to this execution plan
- pub filters: Vec<Expr>,
/// The minimum number of records required from this source plan
pub limit: Option<usize>,
+ /// The partitioning column names
+ pub table_partition_dims: Vec<String>,
Review comment:
Haha, I have to admit that I am hugely hesitant about the naming 😅.
I am wondering whether it isn't the comment that should be changed instead.
These partitions are originally encoded in the file path; we then parse them
and, if necessary, project them into a column. So they end up as columns, but
they are not columns per se.
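To illustrate what I mean by "encoded in the file path": a minimal sketch (not DataFusion's actual implementation; the function name and path layout are hypothetical) of parsing Hive-style `key=value` path segments into partition values that could later be projected as columns:

```rust
/// Sketch: extract Hive-style partition values from a file path.
/// Segments shaped like `key=value` become (name, value) pairs;
/// everything else (e.g. the file name itself) is skipped.
fn parse_partition_values(path: &str) -> Vec<(String, String)> {
    path.split('/')
        .filter_map(|segment| {
            let mut parts = segment.splitn(2, '=');
            match (parts.next(), parts.next()) {
                (Some(key), Some(value)) if !key.is_empty() => {
                    Some((key.to_string(), value.to_string()))
                }
                _ => None,
            }
        })
        .collect()
}

fn main() {
    let values = parse_partition_values("year=2021/month=10/data.parquet");
    assert_eq!(
        values,
        vec![
            ("year".to_string(), "2021".to_string()),
            ("month".to_string(), "10".to_string()),
        ]
    );
    println!("{:?}", values);
}
```

So `table_partition_dims` would name the dimensions (`year`, `month`) whose values only exist in the path until they are projected.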
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]