vbarua commented on code in PR #16857: URL: https://github.com/apache/datafusion/pull/16857#discussion_r2223862112
########## datafusion/substrait/tests/testdata/test_plans/select_count_from_select_1_virtual_table_expressions.substrait.json: ########## @@ -0,0 +1,94 @@ +{ Review Comment: minor: I would suggest we update the name to something like `virtual_table_with_expressions.substrait.json` which more accurately reflects what this plan is intended to verify. ########## datafusion/substrait/src/logical_plan/consumer/rel/read_rel.rs: ########## @@ -114,14 +114,37 @@ pub async fn from_read_rel( .await } Some(ReadType::VirtualTable(vt)) => { - if vt.values.is_empty() { + if vt.values.is_empty() && vt.expressions.is_empty() { return Ok(LogicalPlan::EmptyRelation(EmptyRelation { produce_one_row: false, schema: DFSchemaRef::new(substrait_schema), })); } - let values = vt + let values = if vt.values.is_empty() { + let mut exprs = vec![]; + for row in &vt.expressions { + let mut name_idx = 0; + let mut row_exprs = vec![]; + for expression in &row.fields { + name_idx += 1; + let expr = consumer + .consume_expression(expression, &substrait_schema) + .await?; + row_exprs.push(expr); + } + if name_idx != named_struct.names.len() { + return substrait_err!( + "Names list must match exactly to nested schema, but found {} uses for {} names", + name_idx, + named_struct.names.len() + ); + } + exprs.push(row_exprs); + } + exprs Review Comment: This does introduce a bit of duplication with the block below, but given that `consume_expression` is async and `from_substrait_literal` is not it would be a bit tricky to unify the processing. In the long-term Substrait drop support for the `values` field which is deprecated so we'll be able to remove the lower block entirely. ########## datafusion/substrait/tests/testdata/test_plans/select_count_from_select_1_virtual_table_expressions.substrait.json: ########## @@ -0,0 +1,94 @@ +{ + "extensionUris": [ + { + "uri": "https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml" + } + ], + "extensions": [ + { + "extensionFunction": { + "functionAnchor": 185, + "name": "count:any" + } + } + ], + "relations": [ + { + "root": { + "input": { + "aggregate": { + "common": { + "direct": { + } + }, + "input": { + "read": { + "common": { + "direct": { + } + }, + "baseSchema": { + "names": [ + "dummy" + ], + "struct": { + "types": [ + { + "i64": { + "nullability": "NULLABILITY_REQUIRED" + } + } + ], + "nullability": "NULLABILITY_REQUIRED" + } + }, + "virtualTable": { + "expressions": [ + { + "fields": [ + { + "literal": { + "i64": "0", + "nullable": false + } + } + ] + } Review Comment: To make this a slighly more robust test, I would suggest: * Using a struct with more than 1 field to verify the name processing is applied correctly. * Having more than 1 row to check that multiple rows can be processed. ########## datafusion/substrait/src/logical_plan/consumer/rel/read_rel.rs: ########## @@ -114,14 +114,37 @@ pub async fn from_read_rel( .await } Some(ReadType::VirtualTable(vt)) => { - if vt.values.is_empty() { + if vt.values.is_empty() && vt.expressions.is_empty() { return Ok(LogicalPlan::EmptyRelation(EmptyRelation { produce_one_row: false, schema: DFSchemaRef::new(substrait_schema), })); } - let values = vt + let values = if vt.values.is_empty() { Review Comment: minor: I would suggest checking `!vt.expressions.is_empty()` to make it clearer that `expressions` are preferred over values. Effectively, it's the difference between "We are processing expressions because there are no values" as opposed to "We are processing expressions because they are present". ########## datafusion/substrait/tests/testdata/test_plans/select_count_from_select_1_virtual_table_expressions.substrait.json: ########## @@ -0,0 +1,94 @@ +{ + "extensionUris": [ + { + "uri": "https://github.com/substrait-io/substrait/blob/main/extensions/functions_aggregate_generic.yaml" + } + ], + "extensions": [ + { + "extensionFunction": { + "functionAnchor": 185, + "name": "count:any" + } + } + ], + "relations": [ + { + "root": { + "input": { + "aggregate": { + "common": { + "direct": { + } + }, + "input": { + "read": { Review Comment: minor: I would suggest of getting rid of everything other than the `read` in this plan, to make it clearer that this is a test of virtual table functionality. The aggregation in this plan is effectively noise. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org