adriangb commented on code in PR #18386:
URL: https://github.com/apache/datafusion/pull/18386#discussion_r2485030124
##########
datafusion/substrait/tests/cases/roundtrip_physical_plan.rs:
##########
@@ -35,24 +35,22 @@ use substrait::proto::extensions;
#[tokio::test]
async fn parquet_exec() -> Result<()> {
- let source = Arc::new(ParquetSource::default());
-
- let scan_config = FileScanConfigBuilder::new(
- ObjectStoreUrl::local_filesystem(),
- Arc::new(Schema::empty()),
- source,
- )
- .with_file_groups(vec![
- FileGroup::new(vec![PartitionedFile::new(
- "file://foo/part-0.parquet".to_string(),
- 123,
- )]),
- FileGroup::new(vec![PartitionedFile::new(
- "file://foo/part-1.parquet".to_string(),
- 123,
- )]),
- ])
- .build();
+ let schema = Arc::new(Schema::empty());
+ let source = Arc::new(ParquetSource::new(schema.clone()));
+
+ let scan_config =
+ FileScanConfigBuilder::new(ObjectStoreUrl::local_filesystem(), schema, source)
Review Comment:
I got it to work! It's more code churn, but I do think it's better: there is
now a single source of truth for the schema.
This does mean more and more "stuff" is being moved from `FileScanConfig`
into `FileSource`, but I think that's in line with the big-picture goal.
FWIW, what I envision `FileScanConfig` doing in the end is:
- Statistics. It has the list of files and the partitioning, so it can compute
statistics, `EquivalenceProperties`, etc. To do that it will have to get the
filter, projection, and unprojected schema from the `FileSource`.
- Some shared configuration.
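
To make the split above concrete, here is a rough, self-contained sketch of the shape I have in mind. It is not the real DataFusion API: `Schema`, `FileSource`, `ParquetLikeSource`, `ScanConfig`, and every method name below are hypothetical stand-ins, and the "statistics" are reduced to a byte total. The point is only that the source owns the schema, filter, and projection, while the config owns the file list and asks the source for everything else.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow schema: just a list of column names.
#[derive(Debug, PartialEq)]
struct Schema {
    fields: Vec<String>,
}

/// Hypothetical file source trait: the single place the schema lives,
/// together with the pushed-down filter and projection.
trait FileSource {
    fn unprojected_schema(&self) -> Arc<Schema>;
    fn projection(&self) -> Option<&[usize]>;
    fn filter(&self) -> Option<&str>;
}

/// Hypothetical Parquet-like source (assumption, not `ParquetSource`).
struct ParquetLikeSource {
    schema: Arc<Schema>,
    projection: Option<Vec<usize>>,
    filter: Option<String>,
}

impl FileSource for ParquetLikeSource {
    fn unprojected_schema(&self) -> Arc<Schema> {
        Arc::clone(&self.schema)
    }
    fn projection(&self) -> Option<&[usize]> {
        self.projection.as_deref()
    }
    fn filter(&self) -> Option<&str> {
        self.filter.as_deref()
    }
}

/// Hypothetical scan config: it stores no schema of its own, only the
/// file groups (path, size in bytes) and a handle to the source.
struct ScanConfig {
    file_groups: Vec<Vec<(String, u64)>>,
    source: Arc<dyn FileSource>,
}

impl ScanConfig {
    /// Output column names, derived from the source's schema and projection.
    fn projected_fields(&self) -> Vec<String> {
        let schema = self.source.unprojected_schema();
        match self.source.projection() {
            Some(indices) => indices.iter().map(|&i| schema.fields[i].clone()).collect(),
            None => schema.fields.clone(),
        }
    }

    /// A toy "statistic": total size in bytes across all file groups.
    fn total_bytes(&self) -> u64 {
        self.file_groups.iter().flatten().map(|(_, size)| *size).sum()
    }
}

/// Builds a config over the two files used in the test above.
fn example_config() -> ScanConfig {
    let schema = Arc::new(Schema {
        fields: vec!["a".to_string(), "b".to_string(), "c".to_string()],
    });
    let source = Arc::new(ParquetLikeSource {
        schema,
        projection: Some(vec![0, 2]),
        filter: Some("a > 1".to_string()),
    });
    ScanConfig {
        file_groups: vec![
            vec![("file://foo/part-0.parquet".to_string(), 123)],
            vec![("file://foo/part-1.parquet".to_string(), 123)],
        ],
        source,
    }
}
```

Because `ScanConfig` never stores a schema, there is nothing that can drift out of sync with the source; the cost is that every schema-dependent computation goes through the `FileSource` handle.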
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]