vbarua commented on code in PR #16950: URL: https://github.com/apache/datafusion/pull/16950#discussion_r2244029778
########## datafusion/substrait/src/logical_plan/consumer/expr/scalar_function.rs: ########## @@ -113,6 +122,15 @@ pub fn name_to_op(name: &str) -> Option<Operator> { } } +pub fn substrait_to_df_name(name: &str) -> Option<&str> { + match name { + "sign" => Some("signum"), + "is_nan" => Some("isnan"), + "logb" => Some("log"), Review Comment: This one might also be subtle actually. DataFusion [log](https://datafusion.apache.org/user-guide/sql/scalar_functions.html#log) is > log(base, numeric_expression) Whereas Substrait [logb](https://github.com/substrait-io/substrait/blob/main/extensions/functions_logarithmic.yaml) is > logb(x, base) => log_{base} (x) Effectively, the arguments are in reverse orders 😓 This mapping could potentially be handled easily as part of https://github.com/apache/datafusion/pull/16984 ########## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ########## @@ -426,6 +426,34 @@ async fn simple_scalar_function_substr() -> Result<()> { roundtrip("SELECT SUBSTR(f, 1, 3) FROM data").await } +#[tokio::test] +// Test that DataFusion ISNAN function gets correctly mapped to Substrait "is_nan" in Substrait producer, and checks roundtrip comparison +// Follows the same structure as existing roundtrip tests, but more explicitly tests for name mappings +async fn scalar_function_with_diff_substrait_df_names() -> Result<()> { Review Comment: minor: I would suggest updating this to be something like ```rust // Verify that DataFusion functions mapped to Substrait functions with different names (i.e. "isnan" vs "is_nan") are generated correct Substrait plans and can roundtrip back to Datafusion. ``` To make it clearer that ISNAN is a specific case of a broader property we are testing. ########## datafusion/substrait/src/logical_plan/consumer/expr/scalar_function.rs: ########## @@ -113,6 +122,15 @@ pub fn name_to_op(name: &str) -> Option<Operator> { } } +pub fn substrait_to_df_name(name: &str) -> Option<&str> { + match name { + "sign" => Some("signum"), Review Comment: Was just checking the signature for this, and I realized that this may not actually be the correct mapping. Specifically, the Datafusion [signum](https://datafusion.apache.org/user-guide/sql/scalar_functions.html#signum) function > Returns the sign of a number. Negative numbers return -1. Zero and positive numbers return 1. Whereas in Substrait ``` Integer values return signedness with the same type as the input. Possible return values are [-1, 0, 1] Floating point values return signedness with the same type as the input. Possible return values are [-1.0, -0.0, 0.0, 1.0, NaN] ``` Effectively, the Substrait function returns 0 if the input is 0. Checking with `datafusion-cli`, it does seem like the return value types line up. It may make sense to hold back on this specific mapping for now. ########## datafusion/substrait/tests/cases/roundtrip_logical_plan.rs: ########## @@ -426,6 +426,34 @@ async fn simple_scalar_function_substr() -> Result<()> { roundtrip("SELECT SUBSTR(f, 1, 3) FROM data").await } +#[tokio::test] +// Test that DataFusion ISNAN function gets correctly mapped to Substrait "is_nan" in Substrait producer, and checks roundtrip comparison +// Follows the same structure as existing roundtrip tests, but more explicitly tests for name mappings +async fn scalar_function_with_diff_substrait_df_names() -> Result<()> { Review Comment: minor: I would suggest updating this to be something like ```rust // Verify that DataFusion functions mapped to Substrait functions with different names (i.e. "isnan" vs "is_nan") are generated correct Substrait plans and can roundtrip back to Datafusion. ``` To make it clearer that ISNAN is a specific case of a broader property we are testing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org