MohamedAbdeen21 commented on code in PR #11585:
URL: https://github.com/apache/datafusion/pull/11585#discussion_r1686376343


##########
datafusion/physical-expr/src/expressions/binary.rs:
##########
@@ -289,6 +289,14 @@ impl PhysicalExpr for BinaryExpr {
             return apply_cmp_for_nested(self.op, &lhs, &rhs);
         }
 
+        if left_data_type.is_floating() {
+            lhs = normalize_floating_zeros(lhs, &left_data_type)?;
+        };
+
+        if right_data_type.is_floating() {
+            rhs = normalize_floating_zeros(rhs, &right_data_type)?;
+        }

Review Comment:
   The overhead is still very high either way, but I added a check for whether 
the array contains a -0.0 value before calling the normalization:
   
   ```rs
   pub fn contains_negative_zeros(array: &ColumnarValue, dt: &DataType) -> Result<bool> {
       match dt {
           DataType::Float64 => {
               contains_negative_zeros_impl(array, ScalarValue::Float64(Some(-0.0)))
           }
           DataType::Float32 => {
               contains_negative_zeros_impl(array, ScalarValue::Float32(Some(-0.0)))
           }
           DataType::Float16 => todo!(),
           _ => Ok(false),
       }
   }
   
   fn contains_negative_zeros_impl(
       array: &ColumnarValue,
       zero: ScalarValue,
   ) -> Result<bool> {
       let col = match array {
           ColumnarValue::Array(array) => eq(&array.as_ref(), &zero.to_scalar()?)?,
           ColumnarValue::Scalar(value) => eq(&value.to_scalar()?, &zero.to_scalar()?)?,
       };
       Ok(col.true_count() > 0)
   }
   ```
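   
   For illustration only, here is the same check-then-normalize idea on plain 
   `f64` slices, with no Arrow or DataFusion types involved. The helpers 
   `contains_negative_zero` and `normalize_zeros` are hypothetical names for 
   this sketch, not the code in the PR. It compares bit patterns because 
   `-0.0 == 0.0` is true under the ordinary `==` operator.
   
   ```rust
   use std::borrow::Cow;
   
   /// Hypothetical helper: true if any element is negative zero.
   /// `==` cannot tell -0.0 from 0.0, so compare the raw bit patterns.
   fn contains_negative_zero(values: &[f64]) -> bool {
       let neg_zero_bits = (-0.0_f64).to_bits();
       values.iter().any(|v| v.to_bits() == neg_zero_bits)
   }
   
   /// Hypothetical helper: rewrite every -0.0 to +0.0, but only allocate
   /// a new buffer when the cheap scan above finds at least one -0.0.
   fn normalize_zeros(values: &[f64]) -> Cow<'_, [f64]> {
       if !contains_negative_zero(values) {
           // Fast path: nothing to fix, hand back the input untouched.
           return Cow::Borrowed(values);
       }
       // Slow path: both zeros satisfy `v == 0.0`, so map them to +0.0.
       // NaN fails the comparison and is passed through unchanged.
       Cow::Owned(
           values
               .iter()
               .map(|&v| if v == 0.0 { 0.0 } else { v })
               .collect(),
       )
   }
   ```
   
   The fast path costs one scan over the data, which is the trade-off the 
   benchmark below measures: inputs without any -0.0 pay only for the scan.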
   
   
   When neither the left nor the right side contains a negative zero (again for 1M rows):
   
   ```
   evaluate with normalization
                           time:   [2.7361 ms 2.7585 ms 2.7850 ms]
                           change: [-87.716% -87.603% -87.458%] (p = 0.00 < 0.05)
                           Performance has improved.
   Found 10 outliers among 100 measurements (10.00%)
     4 (4.00%) high mild
     6 (6.00%) high severe
   
   evaluate without normalization
                           time:   [910.36 µs 914.61 µs 919.46 µs]
                           change: [-1.4568% -0.5052% +0.5182%] (p = 0.30 > 0.05)
                           No change in performance detected.
   Found 9 outliers among 100 measurements (9.00%)
     5 (5.00%) high mild
     4 (4.00%) high severe
   ```
   
   If either column contains even a single negative zero, the time goes up by 
about 10x per column (20x if both columns contain at least one negative zero).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

