Jefffrey commented on PR #22697: URL: https://github.com/apache/datafusion/pull/22697#issuecomment-4815849512
I feel the i128 conversion could be avoided with some generic trickery 🤔 For reference, I updated the [benchmarks](https://github.com/apache/datafusion/blob/main/datafusion/functions/benches/round.rs) to have an i64 bench with scale 2 and with scale -2, and ran on this branch: ``` round size=1024/round_f64_array time: [813.88 ns 819.69 ns 823.61 ns] change: [−2.9895% −2.2717% −1.7347%] (p = 0.00 < 0.05) Performance has improved. Found 2 outliers among 10 measurements (20.00%) 1 (10.00%) low severe 1 (10.00%) high mild round size=1024/round_i64_array time: [104.98 ns 105.65 ns 106.55 ns] change: [−2.5935% −1.3028% −0.0545%] (p = 0.07 > 0.05) No change in performance detected. Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) high severe round size=1024/round_i64_array_neg_scale time: [4.4552 µs 4.4583 µs 4.4623 µs] Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) high severe round size=4096/round_f64_array time: [2.7175 µs 2.7288 µs 2.7357 µs] Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) low severe round size=4096/round_i64_array time: [105.02 ns 105.41 ns 105.90 ns] Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) high mild round size=4096/round_i64_array_neg_scale time: [17.140 µs 17.144 µs 17.149 µs] round size=8192/round_f64_array time: [5.6102 µs 5.6347 µs 5.6583 µs] round size=8192/round_i64_array time: [104.78 ns 105.08 ns 105.37 ns] round size=8192/round_i64_array_neg_scale time: [34.582 µs 34.604 µs 34.627 µs] ``` - round_i64_array = scale 2, expected pass through fast since no work needed - round_i64_array_neg_scale = scale -2, needs to go through i128 path And to test, I modified the code to avoid the i128 conversion and just operate directly on i64: ``` round size=1024/round_f64_array time: [810.04 ns 814.87 ns 817.61 ns] change: [−1.4862% −0.5891% +0.2634%] (p = 0.21 > 0.05) No change in performance detected. Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) low severe round size=1024/round_i64_array time: [105.65 ns 105.78 ns 105.92 ns] change: [−0.7381% +0.1219% +0.7794%] (p = 0.79 > 0.05) No change in performance detected. round size=1024/round_i64_array_neg_scale time: [2.3741 µs 2.3780 µs 2.3824 µs] change: [−46.758% −46.661% −46.557%] (p = 0.00 < 0.05) Performance has improved. round size=4096/round_f64_array time: [2.6846 µs 2.6904 µs 2.6962 µs] change: [−1.7566% −1.4088% −0.9277%] (p = 0.00 < 0.05) Change within noise threshold. round size=4096/round_i64_array time: [106.03 ns 106.31 ns 106.67 ns] change: [+0.3251% +0.8590% +1.4019%] (p = 0.01 < 0.05) Change within noise threshold. Found 1 outliers among 10 measurements (10.00%) 1 (10.00%) high mild round size=4096/round_i64_array_neg_scale time: [8.9223 µs 8.9313 µs 8.9400 µs] change: [−47.965% −47.905% −47.852%] (p = 0.00 < 0.05) Performance has improved. round size=8192/round_f64_array time: [5.5491 µs 5.5718 µs 5.5949 µs] change: [−1.6684% −1.1164% −0.5111%] (p = 0.00 < 0.05) Change within noise threshold. round size=8192/round_i64_array time: [109.97 ns 110.49 ns 111.02 ns] change: [+4.5883% +5.1438% +5.7191%] (p = 0.00 < 0.05) Performance has regressed. round size=8192/round_i64_array_neg_scale time: [17.872 µs 17.946 µs 18.034 µs] change: [−48.360% −48.138% −47.920%] (p = 0.00 < 0.05) Performance has improved. ``` So the i128 conversion does seem to have a noticeable performance impact. We can always try to optimize in a followup if that is preference, just want to highlight this -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
