yjshen commented on a change in pull request #811:
URL: https://github.com/apache/arrow-datafusion/pull/811#discussion_r688304741
##########
File path: datafusion/src/datasource/mod.rs
##########
@@ -36,3 +47,231 @@ pub(crate) enum Source<R = Box<dyn std::io::Read + Send + Sync + 'static>> {
/// Read data from a reader
Reader(std::sync::Mutex<Option<R>>),
}
+
+#[derive(Debug, Clone)]
+/// A single file that should be read, along with its schema, statistics
Review comment:
Yes, that's the intention here.
- `PartitionedFile` -> A single file (for the moment) or part of a file
(later: a subset of its row groups or rows). We may even extend this to include
the partition values and partition schema (see below) to support partitioned tables:
`/path/to/table/root/p_date=20210813/p_hour=1200/xxxxx.parquet`
- `FilePartition` -> The basic unit of parallel processing. Each task is
responsible for processing one `FilePartition`, which is composed of several
`PartitionedFile`s.
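
To make the relationship between the two types concrete, here is a rough sketch. It is not the code in this PR: the field names (`path`, `partition_values`, `index`, `files`) and the helpers `parse_partition_values` and `group_files` are hypothetical, and the round-robin grouping is just one simple assumption about how files could be assigned to tasks.

```rust
/// Hypothetical sketch of a single file to read. The fields are assumptions
/// for illustration, not the definitions used in the PR.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionedFile {
    /// Full path of the file.
    pub path: String,
    /// Partition columns parsed from `key=value` directory segments,
    /// e.g. [("p_date", "20210813"), ("p_hour", "1200")].
    pub partition_values: Vec<(String, String)>,
}

/// Hypothetical sketch of the unit of parallel processing:
/// one task handles one `FilePartition`.
#[derive(Debug, Clone, PartialEq)]
pub struct FilePartition {
    pub index: usize,
    pub files: Vec<PartitionedFile>,
}

/// Collect every `key=value` path segment as a (key, value) pair.
pub fn parse_partition_values(path: &str) -> Vec<(String, String)> {
    path.split('/')
        .filter_map(|segment| segment.split_once('='))
        .map(|(key, value)| (key.to_string(), value.to_string()))
        .collect()
}

/// Distribute files round-robin over at most `target_partitions` partitions.
/// Returns an empty list when there are no files.
pub fn group_files(paths: &[&str], target_partitions: usize) -> Vec<FilePartition> {
    if paths.is_empty() {
        return Vec::new();
    }
    // Never create more partitions than files, and always at least one.
    let n = target_partitions.clamp(1, paths.len());
    let mut partitions: Vec<FilePartition> = (0..n)
        .map(|index| FilePartition { index, files: Vec::new() })
        .collect();
    for (i, path) in paths.iter().enumerate() {
        partitions[i % n].files.push(PartitionedFile {
            path: path.to_string(),
            partition_values: parse_partition_values(path),
        });
    }
    partitions
}
```

With three files and `target_partitions = 2`, this yields two `FilePartition`s holding two files and one file respectively, each file carrying the partition values taken from its directory names.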