alamb commented on code in PR #10842:
URL: https://github.com/apache/datafusion/pull/10842#discussion_r1633401185


##########
datafusion/substrait/tests/testdata/query_1.json:
##########
@@ -0,0 +1,810 @@
+{

Review Comment:
   Can you please:
   1. Move this file into a directory that makes it clearer where it came from, perhaps `datafusion/substrait/tests/testdata/tpch_substrait_plans/query_1.json`.
   2. Add a `README.md` file in `datafusion/substrait/tests/testdata/tpch_substrait_plans` that explains the files came from https://github.com/substrait-io/consumer-testing/tree/main/substrait_consumer/tests/integration/queries/tpch_substrait_plans



##########
datafusion/substrait/tests/cases/tpch.rs:
##########
@@ -0,0 +1,63 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! tests contains in 
<https://github.com/substrait-io/consumer-testing/tree/main/substrait_consumer/tests/integration/queries/tpch_substrait_plans>

Review Comment:
   This is very cool, thank you. I think the context of this PR may be lost after merge, so some more documentation might help.
   
   Something like
   
   ```suggestion
   //! TPCH `substrait_consumer` tests
   //!
   //! This module tests that Substrait plans, encoded as JSON protobuf, can be
   //! correctly read as DataFusion plans.
   //!
   //! The input data comes from <https://github.com/substrait-io/consumer-testing/tree/main/substrait_consumer/tests/integration/queries/tpch_substrait_plans>
   ```
   



##########
datafusion/substrait/tests/cases/mod.rs:
##########
@@ -19,3 +19,4 @@ mod logical_plans;
 mod roundtrip_logical_plan;
 mod roundtrip_physical_plan;
 mod serialize;
+mod tpch;

Review Comment:
   What do you think about renaming this module to `consumer_integration` to 
make it clearer that this is an integration test of existing substrait plans?



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -569,7 +571,80 @@ pub async fn from_substrait_rel(
 
                 Ok(LogicalPlan::Values(Values { schema, values }))
             }
-            _ => not_impl_err!("Only NamedTable and VirtualTable reads are 
supported"),
+            Some(ReadType::LocalFiles(lf)) => {
+                fn extract_filename(name: &str) -> Option<String> {
+                    let corrected_url =
+                        if name.starts_with("file://") && 
!name.starts_with("file:///") {
+                            name.replacen("file://", "file:///", 1)

Review Comment:
   This makes every `file://` URL absolute by inserting a third slash. Is that intended?
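
   To illustrate the concern, here is a minimal std-only sketch of the prefix rewrite. The helper names `correct_url` and `path_part` are made up for this example, and `path_part` strips the scheme by hand as a stand-in for the `Url::parse(..).path()` call in the diff, so treat it as an approximation rather than the PR's behavior:

   ```rust
   /// Same prefix rewrite as in the diff: a `file://` URI that does not
   /// already start with `file:///` gets a third slash inserted.
   fn correct_url(name: &str) -> String {
       if name.starts_with("file://") && !name.starts_with("file:///") {
           name.replacen("file://", "file:///", 1)
       } else {
           name.to_string()
       }
   }

   /// Hypothetical stand-in for `Url::path()`: whatever follows `file://`
   /// after the rewrite, or `None` if the input is not a `file://` URI.
   fn path_part(name: &str) -> Option<String> {
       correct_url(name)
           .strip_prefix("file://")
           .map(|rest| rest.to_string())
   }
   ```

   With this sketch, `file://data/lineitem.csv` is rewritten to `file:///data/lineitem.csv`, so the first segment `data` ends up as part of a rooted path (`/data/lineitem.csv`), while inputs that already use `file:///` pass through unchanged.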



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -569,7 +571,80 @@ pub async fn from_substrait_rel(
 
                 Ok(LogicalPlan::Values(Values { schema, values }))
             }
-            _ => not_impl_err!("Only NamedTable and VirtualTable reads are 
supported"),
+            Some(ReadType::LocalFiles(lf)) => {
+                fn extract_filename(name: &str) -> Option<String> {
+                    let corrected_url =
+                        if name.starts_with("file://") && 
!name.starts_with("file:///") {
+                            name.replacen("file://", "file:///", 1)
+                        } else {
+                            name.to_string()
+                        };
+
+                    Url::parse(&corrected_url).ok().and_then(|url| {
+                        let path = url.path();
+                        std::path::Path::new(path)
+                            .file_name()
+                            .map(|filename| 
filename.to_string_lossy().to_string())
+                    })
+                }
+
+                // we could use the file name to check the original table 
provider
+                // TODO: currently does not support multiple local files
+                let filename: Option<String> =
+                    lf.items.first().and_then(|x| match x.path_type.as_ref() {
+                        Some(UriFile(name)) => extract_filename(name),
+                        _ => None,
+                    });
+
+                if lf.items.len() > 1 || filename.is_none() {
+                    return not_impl_err!(
+                        "Only NamedTable and VirtualTable reads are supported"
+                    );
+                }
+                let name = filename.unwrap();
+                // directly use unwrap here since we could determine it is a 
valid one
+                let table_reference = TableReference::Bare { table: 
name.into() };
+                let t = ctx.table(table_reference).await?;
+                let t = t.into_optimized_plan()?;
+                match &read.projection {
+                    Some(MaskExpression { select, .. }) => match 
&select.as_ref() {
+                        Some(projection) => {
+                            let column_indices: Vec<usize> = projection
+                                .struct_items
+                                .iter()
+                                .map(|item| item.field as usize)
+                                .collect();
+                            match &t {

Review Comment:
   I think if you matched on `t` by value (rather than on `&t`), you could avoid the `scan.clone()` later on.
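
   A rough sketch of the difference, using hypothetical stand-in types (`Plan` and `Scan` below are not the DataFusion `LogicalPlan` / `TableScan` types, just simplified placeholders):

   ```rust
   // Hypothetical, simplified stand-ins for the real plan types.
   #[derive(Debug, Clone, PartialEq)]
   struct Scan {
       table: String,
       projection: Option<Vec<usize>>,
   }

   #[derive(Debug, Clone, PartialEq)]
   enum Plan {
       Scan(Scan),
       Empty,
   }

   /// Matching on a reference only borrows the scan, so it has to be
   /// cloned before the projection can be changed.
   fn project_by_ref(t: &Plan, indices: Vec<usize>) -> Plan {
       match t {
           Plan::Scan(scan) => {
               let mut scan = scan.clone();
               scan.projection = Some(indices);
               Plan::Scan(scan)
           }
           other => other.clone(),
       }
   }

   /// Matching on the owned value moves the scan out of the enum, so it
   /// can be mutated in place and no clone is needed.
   fn project_by_value(t: Plan, indices: Vec<usize>) -> Plan {
       match t {
           Plan::Scan(mut scan) => {
               scan.projection = Some(indices);
               Plan::Scan(scan)
           }
           other => other,
       }
   }
   ```

   Both functions produce the same result; the by-value version just takes ownership of `t`, which seems fine here since `t` is not used again after the match.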



##########
datafusion/substrait/tests/testdata/tpch/lineitem.csv:
##########
@@ -0,0 +1,2 @@
+l_orderkey,l_partkey,l_suppkey,l_linenumber,l_quantity,l_extendedprice,l_discount,l_tax,l_returnflag,l_linestatus,l_shipdate,l_commitdate,l_receiptdate,l_shipinstruct,l_shipmode,l_comment
+1,1,1,1,17,21168.23,0.04,0.02,'N','O','1996-03-13','1996-02-12','1996-03-22','DELIVER
 IN PERSON','TRUCK','egular courts above the'

Review Comment:
   I think a single line row is fine 👍 



##########
datafusion/substrait/src/logical_plan/consumer.rs:
##########
@@ -569,7 +571,80 @@ pub async fn from_substrait_rel(
 
                 Ok(LogicalPlan::Values(Values { schema, values }))
             }
-            _ => not_impl_err!("Only NamedTable and VirtualTable reads are 
supported"),
+            Some(ReadType::LocalFiles(lf)) => {
+                fn extract_filename(name: &str) -> Option<String> {
+                    let corrected_url =
+                        if name.starts_with("file://") && 
!name.starts_with("file:///") {
+                            name.replacen("file://", "file:///", 1)
+                        } else {
+                            name.to_string()
+                        };
+
+                    Url::parse(&corrected_url).ok().and_then(|url| {
+                        let path = url.path();
+                        std::path::Path::new(path)
+                            .file_name()
+                            .map(|filename| 
filename.to_string_lossy().to_string())
+                    })
+                }
+
+                // we could use the file name to check the original table 
provider
+                // TODO: currently does not support multiple local files

Review Comment:
   Should we file a ticket for this feature?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

