vbarua commented on code in PR #16950:
URL: https://github.com/apache/datafusion/pull/16950#discussion_r2244029778


##########
datafusion/substrait/src/logical_plan/consumer/expr/scalar_function.rs:
##########
@@ -113,6 +122,15 @@ pub fn name_to_op(name: &str) -> Option<Operator> {
     }
 }
 
+pub fn substrait_to_df_name(name: &str) -> Option<&str> {
+    match name {
+        "sign" => Some("signum"),
+        "is_nan" => Some("isnan"),
+        "logb" => Some("log"),

Review Comment:
   This one might also be subtle actually. DataFusion 
[log](https://datafusion.apache.org/user-guide/sql/scalar_functions.html#log) 
is 
   > log(base, numeric_expression)
   
   Whereas Substrait 
[logb](https://github.com/substrait-io/substrait/blob/main/extensions/functions_logarithmic.yaml)
 is
   >  logb(x, base) => log_{base} (x)
   
   Effectively, the arguments are in reverse orders 😓
   
   This mapping could potentially be handled easily as part of 
https://github.com/apache/datafusion/pull/16984



##########
datafusion/substrait/tests/cases/roundtrip_logical_plan.rs:
##########
@@ -426,6 +426,34 @@ async fn simple_scalar_function_substr() -> Result<()> {
     roundtrip("SELECT SUBSTR(f, 1, 3) FROM data").await
 }
 
+#[tokio::test]
+// Test that DataFusion ISNAN function gets correctly mapped to Substrait 
"is_nan" in Substrait producer, and checks roundtrip comparison
+// Follows the same structure as existing roundtrip tests, but more explicitly 
tests for name mappings
+async fn scalar_function_with_diff_substrait_df_names() -> Result<()> {

Review Comment:
   minor: I would suggest updating this to be something like
   ```rust
   // Verify that DataFusion functions mapped to Substrait functions with 
different names (i.e. "isnan" vs "is_nan") are generated correct Substrait 
plans and can roundtrip back to Datafusion.
   ```
   
   To make it clearer that ISNAN is a specific case of a broader property we 
are testing.



##########
datafusion/substrait/src/logical_plan/consumer/expr/scalar_function.rs:
##########
@@ -113,6 +122,15 @@ pub fn name_to_op(name: &str) -> Option<Operator> {
     }
 }
 
+pub fn substrait_to_df_name(name: &str) -> Option<&str> {
+    match name {
+        "sign" => Some("signum"),

Review Comment:
   Was just checking the signature for this, and I realized that this may not 
actually be the correct mapping.
   
   Specifically, the Datafusion 
[signum](https://datafusion.apache.org/user-guide/sql/scalar_functions.html#signum)
 function
   > Returns the sign of a number. Negative numbers return -1. Zero and 
positive numbers return 1.
   
   Whereas in Substrait
   ```
         Integer values return signedness with the same type as the input.
         Possible return values are [-1, 0, 1]
   
         Floating point values return signedness with the same type as the 
input.
         Possible return values are [-1.0, -0.0, 0.0, 1.0, NaN]
   ```
   
   Effectively, the Substrait function returns 0 if the input is 0. Checking 
with `datafusion-cli`, it does seem like the return value types line up.
   
   It may make sense to hold back on this specific mapping for now.
   



##########
datafusion/substrait/tests/cases/roundtrip_logical_plan.rs:
##########
@@ -426,6 +426,34 @@ async fn simple_scalar_function_substr() -> Result<()> {
     roundtrip("SELECT SUBSTR(f, 1, 3) FROM data").await
 }
 
+#[tokio::test]
+// Test that DataFusion ISNAN function gets correctly mapped to Substrait 
"is_nan" in Substrait producer, and checks roundtrip comparison
+// Follows the same structure as existing roundtrip tests, but more explicitly 
tests for name mappings
+async fn scalar_function_with_diff_substrait_df_names() -> Result<()> {

Review Comment:
   minor: I would suggest updating this to be something like
   ```rust
   // Verify that DataFusion functions mapped to Substrait functions with 
different names (i.e. "isnan" vs "is_nan") are generated correct Substrait 
plans and can roundtrip back to Datafusion.
   ```
   
   To make it clearer that ISNAN is a specific case of a broader property we 
are testing.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to