findepi commented on PR #7789: URL: https://github.com/apache/arrow-rs/pull/7789#issuecomment-3031229279
The starting point of this PR, aka the merge base of this PR, is 01c5efc6180af8509bd8ff847350deb89441495b.
On this commit, I ran

```
time caffeinate -sd cargo bench --bench row_format -- --save-baseline pr-merge-base
```

and then (still on the same commit)

```
time caffeinate -sd cargo bench --bench row_format -- --baseline pr-merge-base
# and again
time caffeinate -sd cargo bench --bench row_format -- --baseline pr-merge-base
```

Some benchmarks reported a 19% improvement, others a 19% or even 46% degradation.
Running on my laptop can hardly be called a well-isolated environment, but I did my best in this regard.
All in all, I can't get reproducible results from the benchmark, so it would be impossible for me to try to improve it.

<details>
<summary>third run transcript</summary>

```
triplet:~/repos/arrow-rs (remotes/origin/findepi/fix-rowconverter-when-fixedsizelist-is-not-the-last-b962b2~5)$ time caffeinate -sd cargo bench --bench row_format -- --baseline pr-merge-base warning: unnecessary transmute --> arrow-array/src/arithmetic.rs:421:14 | 421 | unsafe { std::mem::transmute(-1_i32) }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: replace this with: `f32::from_bits(i32::cast_unsigned(-1_i32))` | = note: `#[warn(unnecessary_transmutes)]` on by default warning: unnecessary transmute --> arrow-array/src/arithmetic.rs:422:14 | 422 | unsafe { std::mem::transmute(i32::MAX) } | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: replace this with: `f32::from_bits(i32::cast_unsigned(i32::MAX))` warning: unnecessary transmute --> arrow-array/src/arithmetic.rs:428:14 | 428 | unsafe { std::mem::transmute(-1_i64) }, | ^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: replace this with: `f64::from_bits(i64::cast_unsigned(-1_i64))` warning: unnecessary transmute --> arrow-array/src/arithmetic.rs:429:14 | 429 | unsafe { std::mem::transmute(i64::MAX) } | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: replace this with: `f64::from_bits(i64::cast_unsigned(i64::MAX))` warning: `arrow-array` (lib) generated 4 warnings (run `cargo 
fix --lib -p arrow-array` to apply 4 suggestions) Finished `bench` profile [optimized] target(s) in 0.08s Running benches/row_format.rs (target/release/deps/row_format-1f0816d1a35cae07) convert_columns 4096 u64(0) time: [3.5123 µs 3.5154 µs 3.5185 µs] change: [+1.4054% +1.5969% +1.7836%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 1 (1.00%) low severe 4 (4.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe convert_columns_prepared 4096 u64(0) time: [3.4225 µs 3.4282 µs 3.4340 µs] change: [+0.6549% +0.8770% +1.0880%] (p = 0.00 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) low mild 5 (5.00%) high mild 1 (1.00%) high severe convert_rows 4096 u64(0) time: [16.449 µs 16.473 µs 16.499 µs] change: [+0.2037% +0.8191% +1.3354%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 1 (1.00%) low severe 2 (2.00%) low mild 2 (2.00%) high mild convert_columns 4096 u64(0.3) time: [5.1276 µs 5.2818 µs 5.5157 µs] change: [+4.1717% +5.5424% +7.5067%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 4 (4.00%) high mild 6 (6.00%) high severe convert_columns_prepared 4096 u64(0.3) time: [4.9268 µs 4.9414 µs 4.9585 µs] change: [−4.2134% −2.0315% −0.2573%] (p = 0.04 < 0.05) Change within noise threshold. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild convert_rows 4096 u64(0.3) time: [16.906 µs 16.999 µs 17.080 µs] change: [−3.2495% −1.3424% +0.4547%] (p = 0.17 > 0.05) No change in performance detected. convert_columns 4096 i64(0) time: [3.5386 µs 3.5465 µs 3.5552 µs] change: [−0.2023% +0.1053% +0.3938%] (p = 0.50 > 0.05) No change in performance detected. 
Found 16 outliers among 100 measurements (16.00%) 10 (10.00%) low mild 4 (4.00%) high mild 2 (2.00%) high severe convert_columns_prepared 4096 i64(0) time: [3.4603 µs 3.5877 µs 3.8169 µs] change: [−0.1419% +6.0071% +15.908%] (p = 0.12 > 0.05) No change in performance detected. Found 6 outliers among 100 measurements (6.00%) 6 (6.00%) high severe convert_rows 4096 i64(0) time: [17.309 µs 17.335 µs 17.366 µs] change: [+3.1937% +3.4695% +3.7630%] (p = 0.00 < 0.05) Performance has regressed. Found 2 outliers among 100 measurements (2.00%) 1 (1.00%) high mild 1 (1.00%) high severe convert_columns 4096 i64(0.3) time: [5.1595 µs 5.8456 µs 6.6599 µs] change: [+2.3459% +11.076% +21.746%] (p = 0.01 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 3 (3.00%) high mild 8 (8.00%) high severe convert_columns_prepared 4096 i64(0.3) time: [4.9328 µs 5.0086 µs 5.1153 µs] change: [−3.7176% +3.8380% +13.460%] (p = 0.42 > 0.05) No change in performance detected. Found 12 outliers among 100 measurements (12.00%) 2 (2.00%) high mild 10 (10.00%) high severe convert_rows 4096 i64(0.3) time: [17.244 µs 17.294 µs 17.347 µs] change: [+5.2345% +5.7414% +6.2264%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) low mild convert_columns 4096 bool(0, 0.5) time: [3.4031 µs 3.4053 µs 3.4079 µs] change: [−0.6059% −0.4483% −0.2916%] (p = 0.00 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 5 (5.00%) high mild 5 (5.00%) high severe convert_columns_prepared 4096 bool(0, 0.5) time: [3.3275 µs 3.3294 µs 3.3316 µs] change: [−0.3049% +0.0912% +0.4613%] (p = 0.65 > 0.05) No change in performance detected. Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe convert_rows 4096 bool(0, 0.5) time: [11.098 µs 11.147 µs 11.200 µs] change: [+0.3523% +0.7462% +1.1393%] (p = 0.00 < 0.05) Change within noise threshold. 
convert_columns 4096 bool(0.3, 0.5) time: [4.6304 µs 4.6407 µs 4.6506 µs] change: [−0.7284% −0.4623% −0.2011%] (p = 0.00 < 0.05) Change within noise threshold. Found 5 outliers among 100 measurements (5.00%) 5 (5.00%) high mild convert_columns_prepared 4096 bool(0.3, 0.5) time: [4.6846 µs 4.6989 µs 4.7124 µs] change: [+1.9213% +2.3077% +2.6936%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) low mild 1 (1.00%) high mild convert_rows 4096 bool(0.3, 0.5) time: [11.330 µs 11.339 µs 11.348 µs] change: [−1.9081% −1.5355% −1.1433%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe convert_columns 4096 string(10, 0) time: [27.098 µs 27.141 µs 27.207 µs] change: [+0.0201% +0.2204% +0.4700%] (p = 0.04 < 0.05) Change within noise threshold. Found 10 outliers among 100 measurements (10.00%) 3 (3.00%) high mild 7 (7.00%) high severe convert_columns_prepared 4096 string(10, 0) time: [27.013 µs 27.391 µs 27.927 µs] change: [−6.2087% −5.0801% −3.7997%] (p = 0.00 < 0.05) Performance has improved. Found 16 outliers among 100 measurements (16.00%) 4 (4.00%) high mild 12 (12.00%) high severe convert_rows 4096 string(10, 0) time: [37.290 µs 37.357 µs 37.427 µs] change: [+11.253% +12.443% +13.644%] (p = 0.00 < 0.05) Performance has regressed. Found 5 outliers among 100 measurements (5.00%) 2 (2.00%) high mild 3 (3.00%) high severe convert_columns 4096 string(30, 0) time: [40.363 µs 41.860 µs 43.640 µs] change: [+1.8373% +4.4305% +7.1715%] (p = 0.00 < 0.05) Performance has regressed. Found 25 outliers among 100 measurements (25.00%) 24 (24.00%) low severe 1 (1.00%) high mild convert_columns_prepared 4096 string(30, 0) time: [43.832 µs 45.362 µs 46.927 µs] change: [+27.475% +30.397% +32.822%] (p = 0.00 < 0.05) Performance has regressed. 
Found 9 outliers among 100 measurements (9.00%) 8 (8.00%) low severe 1 (1.00%) low mild convert_rows 4096 string(30, 0) time: [38.713 µs 38.763 µs 38.815 µs] change: [−7.3768% −6.6136% −5.8665%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe convert_columns 4096 string(100, 0) time: [56.201 µs 56.973 µs 57.557 µs] change: [−7.8593% −4.1885% −0.7609%] (p = 0.02 < 0.05) Change within noise threshold. convert_columns_prepared 4096 string(100, 0) time: [47.935 µs 50.074 µs 52.423 µs] change: [+7.3365% +11.361% +15.358%] (p = 0.00 < 0.05) Performance has regressed. Found 25 outliers among 100 measurements (25.00%) 20 (20.00%) low severe 3 (3.00%) high mild 2 (2.00%) high severe convert_rows 4096 string(100, 0) time: [54.363 µs 54.443 µs 54.543 µs] change: [+0.1156% +0.3566% +0.6077%] (p = 0.01 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe convert_columns 4096 string(100, 0.5) time: [39.420 µs 40.017 µs 40.816 µs] change: [−9.1114% −6.2932% −3.4101%] (p = 0.00 < 0.05) Performance has improved. convert_columns_prepared 4096 string(100, 0.5) time: [39.373 µs 40.050 µs 40.902 µs] change: [−15.851% −13.830% −11.438%] (p = 0.00 < 0.05) Performance has improved. Found 18 outliers among 100 measurements (18.00%) 18 (18.00%) high severe convert_rows 4096 string(100, 0.5) time: [40.732 µs 40.826 µs 40.937 µs] change: [−1.8779% −1.0972% −0.4512%] (p = 0.00 < 0.05) Change within noise threshold. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe convert_columns 4096 string view(10, 0) time: [27.212 µs 27.747 µs 28.417 µs] change: [−13.825% −11.655% −9.3139%] (p = 0.00 < 0.05) Performance has improved. 
Found 16 outliers among 100 measurements (16.00%) 4 (4.00%) high mild 12 (12.00%) high severe convert_columns_prepared 4096 string view(10, 0) time: [27.493 µs 28.293 µs 29.363 µs] change: [−2.4027% +0.9219% +4.4338%] (p = 0.57 > 0.05) No change in performance detected. Found 22 outliers among 100 measurements (22.00%) 1 (1.00%) high mild 21 (21.00%) high severe convert_rows 4096 string view(10, 0) time: [45.577 µs 45.876 µs 46.192 µs] change: [+7.8124% +8.5016% +9.2252%] (p = 0.00 < 0.05) Performance has regressed. convert_columns 4096 string view(30, 0) time: [40.399 µs 41.990 µs 43.798 µs] change: [+0.8825% +3.7564% +6.2660%] (p = 0.01 < 0.05) Change within noise threshold. convert_columns_prepared 4096 string view(30, 0) time: [48.053 µs 48.346 µs 48.567 µs] change: [+2.0347% +4.6213% +7.5026%] (p = 0.00 < 0.05) Performance has regressed. convert_rows 4096 string view(30, 0) time: [44.417 µs 45.413 µs 46.555 µs] change: [+1.3415% +2.4868% +3.6178%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 2 (2.00%) high mild 8 (8.00%) high severe convert_columns 4096 string view(100, 0) time: [48.070 µs 49.871 µs 51.849 µs] change: [+2.0461% +5.8026% +9.7667%] (p = 0.00 < 0.05) Performance has regressed. convert_columns_prepared 4096 string view(100, 0) time: [43.266 µs 45.054 µs 47.194 µs] change: [−15.602% −12.074% −8.5293%] (p = 0.00 < 0.05) Performance has improved. convert_rows 4096 string view(100, 0) time: [65.065 µs 66.118 µs 67.117 µs] change: [+7.9296% +9.8750% +11.869%] (p = 0.00 < 0.05) Performance has regressed. convert_columns 4096 string view(100, 0.5) time: [40.333 µs 40.913 µs 41.663 µs] change: [−15.221% −13.291% −11.207%] (p = 0.00 < 0.05) Performance has improved. 
Found 9 outliers among 100 measurements (9.00%) 1 (1.00%) high mild 8 (8.00%) high severe convert_columns_prepared 4096 string view(100, 0.5) time: [40.000 µs 40.072 µs 40.172 µs] change: [+0.0381% +0.2770% +0.5375%] (p = 0.02 < 0.05) Change within noise threshold. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe convert_rows 4096 string view(100, 0.5) time: [59.937 µs 60.181 µs 60.443 µs] change: [+12.169% +14.407% +16.482%] (p = 0.00 < 0.05) Performance has regressed. Found 34 outliers among 100 measurements (34.00%) 20 (20.00%) low severe 14 (14.00%) high mild convert_columns 4096 string_dictionary(10, 0) time: [70.191 µs 70.511 µs 70.837 µs] change: [+4.4056% +4.9419% +5.5098%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild convert_columns_prepared 4096 string_dictionary(10, 0) time: [59.794 µs 61.125 µs 62.758 µs] change: [−0.9714% +0.8232% +2.9839%] (p = 0.42 > 0.05) No change in performance detected. convert_rows 4096 string_dictionary(10, 0) time: [33.586 µs 34.371 µs 35.079 µs] change: [−3.6991% −2.0560% −0.3505%] (p = 0.01 < 0.05) Change within noise threshold. Found 19 outliers among 100 measurements (19.00%) 2 (2.00%) high mild 17 (17.00%) high severe convert_columns 4096 string_dictionary(30, 0) time: [79.126 µs 81.227 µs 83.271 µs] change: [+4.7787% +7.1591% +9.7089%] (p = 0.00 < 0.05) Performance has regressed. convert_columns_prepared 4096 string_dictionary(30, 0) time: [87.614 µs 87.936 µs 88.251 µs] change: [+6.7616% +7.5396% +8.3319%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) high mild 2 (2.00%) high severe convert_rows 4096 string_dictionary(30, 0) time: [38.874 µs 38.896 µs 38.921 µs] change: [−3.4712% −3.0786% −2.7536%] (p = 0.00 < 0.05) Performance has improved. 
Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe convert_columns 4096 string_dictionary(100, 0) time: [99.538 µs 99.984 µs 100.35 µs] change: [+12.146% +14.140% +16.268%] (p = 0.00 < 0.05) Performance has regressed. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild convert_columns_prepared 4096 string_dictionary(100, 0) time: [97.243 µs 97.988 µs 98.640 µs] change: [+12.597% +15.219% +17.982%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 4 (4.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe convert_rows 4096 string_dictionary(100, 0) time: [55.050 µs 55.344 µs 55.697 µs] change: [+1.7452% +2.3044% +2.9405%] (p = 0.00 < 0.05) Performance has regressed. Found 18 outliers among 100 measurements (18.00%) 2 (2.00%) high mild 16 (16.00%) high severe convert_columns 4096 string_dictionary(100, 0.5) time: [64.703 µs 65.626 µs 66.552 µs] change: [+11.904% +14.819% +17.723%] (p = 0.00 < 0.05) Performance has regressed. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high mild convert_columns_prepared 4096 string_dictionary(100, 0.5) time: [61.177 µs 62.821 µs 64.214 µs] change: [+2.4758% +6.1819% +9.5135%] (p = 0.00 < 0.05) Performance has regressed. convert_rows 4096 string_dictionary(100, 0.5) time: [43.655 µs 45.117 µs 46.617 µs] change: [+3.2423% +5.1998% +7.6313%] (p = 0.00 < 0.05) Performance has regressed. Found 21 outliers among 100 measurements (21.00%) 3 (3.00%) high mild 18 (18.00%) high severe convert_columns 4096 string_dictionary_low_cardinality(10, 0) time: [31.373 µs 32.080 µs 32.963 µs] change: [−20.765% −18.626% −16.628%] (p = 0.00 < 0.05) Performance has improved. 
Found 17 outliers among 100 measurements (17.00%) 3 (3.00%) high mild 14 (14.00%) high severe convert_columns_prepared 4096 string_dictionary_low_cardinality(10, 0) time: [31.252 µs 32.681 µs 34.202 µs] change: [−12.084% −8.9366% −5.6724%] (p = 0.00 < 0.05) Performance has improved. Found 20 outliers among 100 measurements (20.00%) 4 (4.00%) high mild 16 (16.00%) high severe convert_rows 4096 string_dictionary_low_cardinality(10, 0) time: [37.271 µs 38.069 µs 38.832 µs] change: [+14.762% +16.484% +18.172%] (p = 0.00 < 0.05) Performance has regressed. Found 6 outliers among 100 measurements (6.00%) 6 (6.00%) low severe convert_columns 4096 string_dictionary_low_cardinality(30, 0) time: [50.043 µs 50.221 µs 50.392 µs] change: [+41.863% +42.547% +43.195%] (p = 0.00 < 0.05) Performance has regressed. Found 4 outliers among 100 measurements (4.00%) 3 (3.00%) high mild 1 (1.00%) high severe convert_columns_prepared 4096 string_dictionary_low_cardinality(30, 0) time: [44.743 µs 46.611 µs 48.447 µs] change: [+26.599% +30.237% +33.547%] (p = 0.00 < 0.05) Performance has regressed. Found 10 outliers among 100 measurements (10.00%) 7 (7.00%) low severe 1 (1.00%) low mild 1 (1.00%) high mild 1 (1.00%) high severe convert_rows 4096 string_dictionary_low_cardinality(30, 0) time: [38.518 µs 38.612 µs 38.717 µs] change: [−3.5141% −3.2849% −3.0590%] (p = 0.00 < 0.05) Performance has improved. convert_columns 4096 string_dictionary_low_cardinality(100, 0) time: [47.136 µs 49.295 µs 51.341 µs] change: [+5.1642% +7.3359% +9.1326%] (p = 0.00 < 0.05) Performance has regressed. Found 18 outliers among 100 measurements (18.00%) 6 (6.00%) low severe 1 (1.00%) low mild 6 (6.00%) high mild 5 (5.00%) high severe convert_columns_prepared 4096 string_dictionary_low_cardinality(100, 0) time: [51.445 µs 51.643 µs 51.861 µs] change: [+40.851% +45.293% +49.658%] (p = 0.00 < 0.05) Performance has regressed. 
Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild convert_rows 4096 string_dictionary_low_cardinality(100, 0) time: [55.181 µs 55.373 µs 55.613 µs] change: [−3.9148% −2.1703% −0.3429%] (p = 0.02 < 0.05) Change within noise threshold. convert_columns 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0) time: [124.56 µs 127.75 µs 130.72 µs] change: [−9.6229% −7.9306% −6.1546%] (p = 0.00 < 0.05) Performance has improved. convert_columns_prepared 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0) time: [136.35 µs 136.62 µs 136.92 µs] change: [+10.690% +12.396% +14.137%] (p = 0.00 < 0.05) Performance has regressed. Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low mild 5 (5.00%) high mild 4 (4.00%) high severe convert_rows 4096 string(20, 0.5), string(30, 0), string(100, 0), i64(0) time: [141.88 µs 143.37 µs 145.06 µs] change: [+1.6849% +2.4824% +3.2835%] (p = 0.00 < 0.05) Performance has regressed. Found 24 outliers among 100 measurements (24.00%) 2 (2.00%) high mild 22 (22.00%) high severe convert_columns 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(10... time: [226.54 µs 230.51 µs 234.66 µs] change: [+2.7518% +4.2774% +5.8637%] (p = 0.00 < 0.05) Performance has regressed. Found 19 outliers among 100 measurements (19.00%) 15 (15.00%) high mild 4 (4.00%) high severe convert_columns_prepared 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dict... time: [231.70 µs 236.63 µs 241.00 µs] change: [+2.7453% +4.7096% +6.8555%] (p = 0.00 < 0.05) Performance has regressed. convert_rows 4096 4096 string_dictionary(20, 0.5), string_dictionary(30, 0), string_dictionary(100, ... time: [129.71 µs 130.72 µs 131.88 µs] change: [−5.4505% −4.7840% −4.0530%] (p = 0.00 < 0.05) Performance has improved. 
Found 8 outliers among 100 measurements (8.00%) 3 (3.00%) high mild 5 (5.00%) high severe iterate rows time: [1.3233 µs 1.3248 µs 1.3270 µs] change: [−0.1231% +0.0818% +0.2850%] (p = 0.45 > 0.05) No change in performance detected. Found 9 outliers among 100 measurements (9.00%) 5 (5.00%) high mild 4 (4.00%) high severe real 13m19.673s user 12m44.097s sys 1m8.692s
```
</details>
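
One way to put a number on the run-to-run noise described above is to repeat the same-commit comparison several times and count how often criterion reports a change. The following is only a sketch under assumptions: the function names `summarize_verdicts` and `run_aa` and the `aa-run-N.log` file names are hypothetical, and it relies on the `Performance has regressed` / `Performance has improved` verdict lines appearing in the output as they do in the transcript. The baseline name `pr-merge-base` is the one saved earlier with `--save-baseline`.

```shell
#!/usr/bin/env bash
# Sketch: repeat an A/A comparison (same commit vs. its own saved baseline)
# and count criterion's verdict lines per run. All names here are illustrative.

# Print how many benchmarks were flagged in each direction in one log file.
summarize_verdicts() {
    local log="$1"
    local regressed improved
    # grep -c prints 0 and exits non-zero when nothing matches; keep the 0.
    regressed=$(grep -c 'Performance has regressed' "$log" || true)
    improved=$(grep -c 'Performance has improved' "$log" || true)
    echo "$log: regressed=$regressed improved=$improved"
}

# Re-run the benchmark against the saved baseline N times (default 3),
# keeping one log per run so the verdicts can be compared afterwards.
run_aa() {
    local runs="${1:-3}"
    local i
    for i in $(seq 1 "$runs"); do
        cargo bench --bench row_format -- --baseline pr-merge-base 2>&1 | tee "aa-run-$i.log"
        summarize_verdicts "aa-run-$i.log"
    done
}
```

After sourcing the script, `run_aa 3` would perform three comparisons. Since the code under test is identical in every run, any non-zero `regressed` or `improved` count is a rough indication of environmental noise rather than of a real change.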