vbarua commented on code in PR #16857:
URL: https://github.com/apache/datafusion/pull/16857#discussion_r2223862112


##########
datafusion/substrait/tests/testdata/test_plans/select_count_from_select_1_virtual_table_expressions.substrait.json:
##########
@@ -0,0 +1,94 @@
+{

Review Comment:
   minor: I would suggest we update the name to something like  
`virtual_table_with_expressions.substrait.json` which more accurately reflects 
what this plan is intended to verify.



##########
datafusion/substrait/src/logical_plan/consumer/rel/read_rel.rs:
##########
@@ -114,14 +114,37 @@ pub async fn from_read_rel(
             .await
         }
         Some(ReadType::VirtualTable(vt)) => {
-            if vt.values.is_empty() {
+            if vt.values.is_empty() && vt.expressions.is_empty() {
                 return Ok(LogicalPlan::EmptyRelation(EmptyRelation {
                     produce_one_row: false,
                     schema: DFSchemaRef::new(substrait_schema),
                 }));
             }
 
-            let values = vt
+            let values = if vt.values.is_empty() {
+                let mut exprs = vec![];
+                for row in &vt.expressions {
+                    let mut name_idx = 0;
+                    let mut row_exprs = vec![];
+                    for expression in &row.fields {
+                        name_idx += 1;
+                        let expr = consumer
+                            .consume_expression(expression, &substrait_schema)
+                            .await?;
+                        row_exprs.push(expr);
+                    }
+                    if name_idx != named_struct.names.len() {
+                        return substrait_err!(
+                                "Names list must match exactly to nested 
schema, but found {} uses for {} names",
+                                name_idx,
+                                named_struct.names.len()
+                            );
+                    }
+                    exprs.push(row_exprs);
+                }
+                exprs

Review Comment:
   This does introduce a bit of duplication with the block below, but given 
that `consume_expression` is async and `from_substrait_literal` is not it would 
be a bit tricky to unify the processing. In the long-term Substrait drop 
support for the `values` field which is deprecated so we'll be able to remove 
the lower block entirely.



##########
datafusion/substrait/tests/testdata/test_plans/select_count_from_select_1_virtual_table_expressions.substrait.json:
##########
@@ -0,0 +1,94 @@
+{
+    "extensionUris": [
+      {
+        "uri": 
"https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml";
+      }
+    ],
+    "extensions": [
+      {
+        "extensionFunction": {
+          "functionAnchor": 185,
+          "name": "count:any"
+        }
+      }
+    ],
+    "relations": [
+      {
+        "root": {
+          "input": {
+            "aggregate": {
+              "common": {
+                "direct": {
+                }
+              },
+              "input": {
+                "read": {
+                  "common": {
+                    "direct": {
+                    }
+                  },
+                  "baseSchema": {
+                    "names": [
+                      "dummy"
+                    ],
+                    "struct": {
+                      "types": [
+                        {
+                          "i64": {
+                            "nullability": "NULLABILITY_REQUIRED"
+                          }
+                        }
+                      ],
+                      "nullability": "NULLABILITY_REQUIRED"
+                    }
+                  },
+                  "virtualTable": {
+                    "expressions": [
+                      {
+                        "fields": [
+                            {
+                                "literal": {
+                                    "i64": "0",
+                                    "nullable": false
+                                }
+                            }
+                        ]
+                      }

Review Comment:
   To make this a slighly more robust test, I would suggest:
   * Using a struct with more than 1 field to verify the name processing is 
applied correctly.
   * Having more than 1 row to check that multiple rows can be processed.



##########
datafusion/substrait/src/logical_plan/consumer/rel/read_rel.rs:
##########
@@ -114,14 +114,37 @@ pub async fn from_read_rel(
             .await
         }
         Some(ReadType::VirtualTable(vt)) => {
-            if vt.values.is_empty() {
+            if vt.values.is_empty() && vt.expressions.is_empty() {
                 return Ok(LogicalPlan::EmptyRelation(EmptyRelation {
                     produce_one_row: false,
                     schema: DFSchemaRef::new(substrait_schema),
                 }));
             }
 
-            let values = vt
+            let values = if vt.values.is_empty() {

Review Comment:
   minor: I would suggest checking `!vt.expressions.is_empty()` to make it 
clearer that `expressions` are preferred over values. Effectively, it's the 
difference between "We are processing expressions because there are no values" 
as opposed to "We are processing expressions because they are present".



##########
datafusion/substrait/tests/testdata/test_plans/select_count_from_select_1_virtual_table_expressions.substrait.json:
##########
@@ -0,0 +1,94 @@
+{
+    "extensionUris": [
+      {
+        "uri": 
"https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml";
+      }
+    ],
+    "extensions": [
+      {
+        "extensionFunction": {
+          "functionAnchor": 185,
+          "name": "count:any"
+        }
+      }
+    ],
+    "relations": [
+      {
+        "root": {
+          "input": {
+            "aggregate": {
+              "common": {
+                "direct": {
+                }
+              },
+              "input": {
+                "read": {

Review Comment:
   minor: I would suggest of getting rid of everything other than the `read` in 
this plan, to make it clearer that this is a test of virtual table 
functionality. The aggregation in this plan is effectively noise.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to