dongjoon-hyun commented on code in PR #46266: URL: https://github.com/apache/spark/pull/46266#discussion_r1585538679
########## sql/core/benchmarks/DataSourceReadBenchmark-jdk21-results.txt: ########## @@ -2,430 +2,430 @@ SQL Single Numeric Column Scan ================================================================================================ -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor SQL Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -SQL CSV 7930 7984 77 2.0 504.2 1.0X -SQL Json 8135 8250 163 1.9 517.2 1.0X -SQL Parquet Vectorized: DataPageV1 76 87 9 205.7 4.9 103.7X -SQL Parquet Vectorized: DataPageV2 55 65 8 285.3 3.5 143.8X -SQL Parquet MR: DataPageV1 1785 1787 3 8.8 113.5 4.4X -SQL Parquet MR: DataPageV2 1643 1680 52 9.6 104.5 4.8X -SQL ORC Vectorized 114 124 10 138.2 7.2 69.7X -SQL ORC MR 1494 1496 3 10.5 95.0 5.3X - -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +SQL CSV 9759 9826 94 1.6 620.5 1.0X +SQL Json 8157 8194 53 1.9 518.6 1.2X +SQL Parquet Vectorized: DataPageV1 86 99 11 183.5 5.4 113.9X +SQL Parquet Vectorized: DataPageV2 112 120 6 140.8 7.1 87.4X +SQL Parquet MR: DataPageV1 1775 1776 1 8.9 112.9 5.5X +SQL Parquet MR: DataPageV2 1745 1749 5 9.0 110.9 5.6X +SQL ORC Vectorized 119 133 8 132.4 7.6 82.1X +SQL ORC MR 1464 1464 0 10.7 93.1 6.7X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor Parquet Reader Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative --------------------------------------------------------------------------------------------------------------------------- -ParquetReader Vectorized: DataPageV1 35 36 1 449.0 2.2 1.0X -ParquetReader Vectorized: DataPageV2 25 26 1 638.4 1.6 1.4X -ParquetReader Vectorized -> Row: DataPageV1 29 30 1 548.0 1.8 1.2X 
-ParquetReader Vectorized -> Row: DataPageV2 18 20 2 851.6 1.2 1.9X +ParquetReader Vectorized: DataPageV1 94 96 3 167.7 6.0 1.0X +ParquetReader Vectorized: DataPageV2 113 115 2 139.1 7.2 0.8X +ParquetReader Vectorized -> Row: DataPageV1 75 75 1 210.9 4.7 1.3X +ParquetReader Vectorized -> Row: DataPageV2 95 96 1 166.2 6.0 1.0X -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor SQL Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -SQL CSV 9218 9237 26 1.7 586.1 1.0X -SQL Json 8885 8900 21 1.8 564.9 1.0X -SQL Parquet Vectorized: DataPageV1 74 86 9 212.6 4.7 124.6X -SQL Parquet Vectorized: DataPageV2 74 88 12 211.4 4.7 123.9X -SQL Parquet MR: DataPageV1 1832 1837 8 8.6 116.5 5.0X -SQL Parquet MR: DataPageV2 1761 1763 3 8.9 112.0 5.2X -SQL ORC Vectorized 104 114 11 150.9 6.6 88.5X -SQL ORC MR 1523 1560 52 10.3 96.8 6.1X - -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +SQL CSV 9826 9827 2 1.6 624.7 1.0X +SQL Json 9154 9168 20 1.7 582.0 1.1X +SQL Parquet Vectorized: DataPageV1 98 107 8 161.1 6.2 100.7X +SQL Parquet Vectorized: DataPageV2 95 107 11 164.7 6.1 102.9X +SQL Parquet MR: DataPageV1 1876 1883 9 8.4 119.3 5.2X +SQL Parquet MR: DataPageV2 1841 1849 11 8.5 117.1 5.3X +SQL ORC Vectorized 109 120 9 144.5 6.9 90.3X +SQL ORC MR 1600 1601 2 9.8 101.7 6.1X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor Parquet Reader Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative --------------------------------------------------------------------------------------------------------------------------- -ParquetReader Vectorized: DataPageV1 125 138 14 125.8 7.9 1.0X -ParquetReader Vectorized: 
DataPageV2 125 137 11 126.2 7.9 1.0X -ParquetReader Vectorized -> Row: DataPageV1 44 47 5 355.9 2.8 2.8X -ParquetReader Vectorized -> Row: DataPageV2 44 47 5 357.8 2.8 2.8X +ParquetReader Vectorized: DataPageV1 76 78 2 207.9 4.8 1.0X +ParquetReader Vectorized: DataPageV2 76 78 2 208.0 4.8 1.0X +ParquetReader Vectorized -> Row: DataPageV1 45 46 2 351.2 2.8 1.7X +ParquetReader Vectorized -> Row: DataPageV2 44 45 1 353.5 2.8 1.7X -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor SQL Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -SQL CSV 9794 9896 144 1.6 622.7 1.0X -SQL Json 9146 9163 24 1.7 581.5 1.1X -SQL Parquet Vectorized: DataPageV1 109 117 7 144.1 6.9 89.7X -SQL Parquet Vectorized: DataPageV2 126 136 5 124.8 8.0 77.7X -SQL Parquet MR: DataPageV1 2090 2102 16 7.5 132.9 4.7X -SQL Parquet MR: DataPageV2 1898 1907 14 8.3 120.6 5.2X -SQL ORC Vectorized 138 149 14 114.1 8.8 71.0X -SQL ORC MR 1574 1605 43 10.0 100.1 6.2X - -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +SQL CSV 9858 9859 1 1.6 626.8 1.0X +SQL Json 9321 9334 18 1.7 592.6 1.1X +SQL Parquet Vectorized: DataPageV1 115 130 17 137.0 7.3 85.9X +SQL Parquet Vectorized: DataPageV2 135 149 17 116.9 8.6 73.2X +SQL Parquet MR: DataPageV1 2192 2199 10 7.2 139.4 4.5X +SQL Parquet MR: DataPageV2 2003 2026 32 7.9 127.4 4.9X +SQL ORC Vectorized 143 153 17 109.9 9.1 68.9X +SQL ORC MR 1944 1951 11 8.1 123.6 5.1X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor Parquet Reader Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative 
--------------------------------------------------------------------------------------------------------------------------- -ParquetReader Vectorized: DataPageV1 140 161 67 112.2 8.9 1.0X -ParquetReader Vectorized: DataPageV2 163 166 3 96.4 10.4 0.9X -ParquetReader Vectorized -> Row: DataPageV1 139 140 2 113.1 8.8 1.0X -ParquetReader Vectorized -> Row: DataPageV2 166 182 10 94.8 10.6 0.8X +ParquetReader Vectorized: DataPageV1 140 147 8 112.7 8.9 1.0X +ParquetReader Vectorized: DataPageV2 173 177 3 91.0 11.0 0.8X +ParquetReader Vectorized -> Row: DataPageV1 134 141 8 117.2 8.5 1.0X +ParquetReader Vectorized -> Row: DataPageV2 165 176 12 95.2 10.5 0.8X -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor SQL Single INT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -SQL CSV 11232 11256 33 1.4 714.1 1.0X -SQL Json 9725 9740 22 1.6 618.3 1.2X -SQL Parquet Vectorized: DataPageV1 84 97 15 187.8 5.3 134.1X -SQL Parquet Vectorized: DataPageV2 162 181 13 96.8 10.3 69.1X -SQL Parquet MR: DataPageV1 1882 1900 26 8.4 119.6 6.0X -SQL Parquet MR: DataPageV2 1898 1899 2 8.3 120.7 5.9X -SQL ORC Vectorized 148 157 13 106.1 9.4 75.7X -SQL ORC MR 1667 1674 10 9.4 106.0 6.7X - -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +SQL CSV 11219 11235 22 1.4 713.3 1.0X +SQL Json 9660 9667 9 1.6 614.2 1.2X +SQL Parquet Vectorized: DataPageV1 122 126 4 129.1 7.7 92.1X +SQL Parquet Vectorized: DataPageV2 178 195 17 88.5 11.3 63.1X +SQL Parquet MR: DataPageV1 2007 2031 33 7.8 127.6 5.6X +SQL Parquet MR: DataPageV2 2060 2084 34 7.6 131.0 5.4X +SQL ORC Vectorized 175 184 13 89.8 11.1 64.0X +SQL ORC MR 1804 1844 56 8.7 114.7 6.2X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core 
Processor Parquet Reader Single INT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative --------------------------------------------------------------------------------------------------------------------------- -ParquetReader Vectorized: DataPageV1 130 140 11 121.1 8.3 1.0X -ParquetReader Vectorized: DataPageV2 213 230 10 74.0 13.5 0.6X -ParquetReader Vectorized -> Row: DataPageV1 128 132 6 122.9 8.1 1.0X -ParquetReader Vectorized -> Row: DataPageV2 222 226 5 70.7 14.1 0.6X +ParquetReader Vectorized: DataPageV1 150 157 6 104.7 9.6 1.0X +ParquetReader Vectorized: DataPageV2 212 226 9 74.3 13.5 0.7X +ParquetReader Vectorized -> Row: DataPageV1 164 170 6 95.8 10.4 0.9X +ParquetReader Vectorized -> Row: DataPageV2 242 246 4 64.9 15.4 0.6X -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor SQL Single BIGINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -SQL CSV 14617 14690 103 1.1 929.3 1.0X -SQL Json 10772 10780 11 1.5 684.9 1.4X -SQL Parquet Vectorized: DataPageV1 118 132 13 133.4 7.5 124.0X -SQL Parquet Vectorized: DataPageV2 268 300 20 58.7 17.0 54.5X -SQL Parquet MR: DataPageV1 2289 2314 36 6.9 145.5 6.4X -SQL Parquet MR: DataPageV2 1993 1995 3 7.9 126.7 7.3X -SQL ORC Vectorized 215 224 12 73.1 13.7 68.0X -SQL ORC MR 1840 1851 17 8.6 117.0 7.9X - -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +SQL CSV 11095 11134 54 1.4 705.4 1.0X +SQL Json 9688 9701 18 1.6 616.0 1.1X +SQL Parquet Vectorized: DataPageV1 293 297 4 53.7 18.6 37.9X +SQL Parquet Vectorized: DataPageV2 225 253 23 69.9 14.3 49.3X +SQL Parquet MR: DataPageV1 2423 2437 20 6.5 154.0 4.6X +SQL Parquet MR: DataPageV2 2041 2055 19 7.7 129.8 5.4X +SQL ORC Vectorized 165 192 24 95.3 10.5 67.2X +SQL 
ORC MR 1742 1753 15 9.0 110.8 6.4X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor Parquet Reader Single BIGINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative --------------------------------------------------------------------------------------------------------------------------- -ParquetReader Vectorized: DataPageV1 167 179 12 94.0 10.6 1.0X -ParquetReader Vectorized: DataPageV2 324 331 4 48.5 20.6 0.5X -ParquetReader Vectorized -> Row: DataPageV1 181 185 5 87.1 11.5 0.9X -ParquetReader Vectorized -> Row: DataPageV2 322 331 6 48.8 20.5 0.5X +ParquetReader Vectorized: DataPageV1 308 317 8 51.0 19.6 1.0X +ParquetReader Vectorized: DataPageV2 276 283 5 56.9 17.6 1.1X +ParquetReader Vectorized -> Row: DataPageV1 317 321 4 49.6 20.2 1.0X +ParquetReader Vectorized -> Row: DataPageV2 271 278 7 58.1 17.2 1.1X -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor SQL Single FLOAT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -SQL CSV 11070 11076 9 1.4 703.8 1.0X -SQL Json 11574 11602 39 1.4 735.9 1.0X -SQL Parquet Vectorized: DataPageV1 86 97 15 182.7 5.5 128.6X -SQL Parquet Vectorized: DataPageV2 94 103 5 166.9 6.0 117.4X -SQL Parquet MR: DataPageV1 2065 2130 93 7.6 131.3 5.4X -SQL Parquet MR: DataPageV2 2157 2169 17 7.3 137.1 5.1X -SQL ORC Vectorized 266 288 20 59.0 16.9 41.5X -SQL ORC MR 1740 1780 57 9.0 110.6 6.4X - -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +SQL CSV 11177 11185 13 1.4 710.6 1.0X +SQL Json 11229 11252 32 1.4 713.9 1.0X +SQL Parquet Vectorized: DataPageV1 83 97 15 189.6 5.3 134.7X +SQL Parquet Vectorized: DataPageV2 82 96 13 191.1 5.2 135.8X +SQL Parquet MR: DataPageV1 2029 2055 36 7.8 
129.0 5.5X +SQL Parquet MR: DataPageV2 1986 2014 39 7.9 126.3 5.6X +SQL ORC Vectorized 229 241 17 68.7 14.6 48.8X +SQL ORC MR 1751 1763 18 9.0 111.3 6.4X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor Parquet Reader Single FLOAT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative --------------------------------------------------------------------------------------------------------------------------- -ParquetReader Vectorized: DataPageV1 144 144 1 109.5 9.1 1.0X -ParquetReader Vectorized: DataPageV2 140 142 1 112.1 8.9 1.0X -ParquetReader Vectorized -> Row: DataPageV1 149 156 6 105.6 9.5 1.0X -ParquetReader Vectorized -> Row: DataPageV2 148 153 5 106.2 9.4 1.0X +ParquetReader Vectorized: DataPageV1 134 141 7 117.5 8.5 1.0X +ParquetReader Vectorized: DataPageV2 150 159 8 105.0 9.5 0.9X +ParquetReader Vectorized -> Row: DataPageV1 143 150 7 109.9 9.1 0.9X +ParquetReader Vectorized -> Row: DataPageV2 143 152 15 109.9 9.1 0.9X -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor SQL Single DOUBLE Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------ -SQL CSV 14612 14718 150 1.1 929.0 1.0X -SQL Json 14802 14812 14 1.1 941.1 1.0X -SQL Parquet Vectorized: DataPageV1 126 144 15 124.3 8.0 115.5X -SQL Parquet Vectorized: DataPageV2 161 167 5 97.4 10.3 90.5X -SQL Parquet MR: DataPageV1 2239 2249 14 7.0 142.4 6.5X -SQL Parquet MR: DataPageV2 2125 2169 63 7.4 135.1 6.9X -SQL ORC Vectorized 352 366 11 44.6 22.4 41.5X -SQL ORC MR 1823 1824 1 8.6 115.9 8.0X - -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +SQL CSV 11485 11545 86 1.4 730.2 1.0X +SQL Json 11591 11597 8 1.4 737.0 1.0X +SQL Parquet Vectorized: DataPageV1 269 288 18 
58.5 17.1 42.7X +SQL Parquet Vectorized: DataPageV2 273 287 16 57.6 17.4 42.0X +SQL Parquet MR: DataPageV1 2456 2477 29 6.4 156.2 4.7X +SQL Parquet MR: DataPageV2 2373 2386 17 6.6 150.9 4.8X +SQL ORC Vectorized 585 588 2 26.9 37.2 19.6X +SQL ORC MR 2132 2137 7 7.4 135.5 5.4X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor Parquet Reader Single DOUBLE Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative --------------------------------------------------------------------------------------------------------------------------- -ParquetReader Vectorized: DataPageV1 202 205 2 77.7 12.9 1.0X -ParquetReader Vectorized: DataPageV2 200 205 5 78.5 12.7 1.0X -ParquetReader Vectorized -> Row: DataPageV1 182 187 5 86.2 11.6 1.1X -ParquetReader Vectorized -> Row: DataPageV2 182 186 4 86.3 11.6 1.1X +ParquetReader Vectorized: DataPageV1 314 328 14 50.1 19.9 1.0X +ParquetReader Vectorized: DataPageV2 326 332 8 48.2 20.7 1.0X +ParquetReader Vectorized -> Row: DataPageV1 322 325 5 48.9 20.5 1.0X +ParquetReader Vectorized -> Row: DataPageV2 322 328 7 48.8 20.5 1.0X ================================================================================================ SQL Single Numeric Column Scan in Struct ================================================================================================ -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor SQL Single TINYINT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------------------------- -SQL ORC MR 1989 2016 38 7.9 126.4 1.0X -SQL ORC Vectorized (Nested Column Disabled) 1965 1966 2 8.0 124.9 1.0X -SQL ORC Vectorized (Nested Column Enabled) 195 207 15 80.6 12.4 10.2X -SQL Parquet MR: 
DataPageV1 2261 2267 9 7.0 143.7 0.9X -SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 2698 2708 14 5.8 171.5 0.7X -SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 100 105 4 157.5 6.3 19.9X -SQL Parquet MR: DataPageV2 2108 2109 1 7.5 134.0 0.9X -SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 2617 2636 27 6.0 166.4 0.8X -SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 90 98 9 175.2 5.7 22.2X - -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +SQL ORC MR 2186 2228 60 7.2 139.0 1.0X +SQL ORC Vectorized (Nested Column Disabled) 2220 2236 22 7.1 141.1 1.0X +SQL ORC Vectorized (Nested Column Enabled) 147 156 14 107.0 9.3 14.9X +SQL Parquet MR: DataPageV1 2403 2434 44 6.5 152.8 0.9X +SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 3030 3053 33 5.2 192.6 0.7X +SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 105 135 23 149.9 6.7 20.8X +SQL Parquet MR: DataPageV2 2343 2350 10 6.7 149.0 0.9X +SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 2929 2964 50 5.4 186.2 0.7X +SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 104 117 15 151.1 6.6 21.0X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor SQL Single SMALLINT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------------------------- -SQL ORC MR 2099 2122 32 7.5 133.5 1.0X -SQL ORC Vectorized (Nested Column Disabled) 2154 2157 4 7.3 137.0 1.0X -SQL ORC Vectorized (Nested Column Enabled) 275 287 14 57.3 17.5 7.6X -SQL Parquet MR: DataPageV1 2310 2320 15 6.8 146.9 0.9X -SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 2891 2907 23 5.4 183.8 0.7X -SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 99 122 23 158.5 6.3 21.2X -SQL Parquet MR: DataPageV2 2250 2254 7 7.0 143.0 0.9X -SQL Parquet Vectorized: 
DataPageV2 (Nested Column Disabled) 2848 2874 37 5.5 181.0 0.7X -SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 124 137 10 127.2 7.9 17.0X - -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +SQL ORC MR 2180 2203 32 7.2 138.6 1.0X +SQL ORC Vectorized (Nested Column Disabled) 2156 2188 45 7.3 137.1 1.0X +SQL ORC Vectorized (Nested Column Enabled) 261 273 12 60.3 16.6 8.4X +SQL Parquet MR: DataPageV1 2345 2347 4 6.7 149.1 0.9X +SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 2652 2672 28 5.9 168.6 0.8X +SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 120 124 3 131.4 7.6 18.2X +SQL Parquet MR: DataPageV2 2326 2357 43 6.8 147.9 0.9X +SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 2635 2664 42 6.0 167.5 0.8X +SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 142 159 14 110.6 9.0 15.3X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor SQL Single INT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------------------------- -SQL ORC MR 2147 2182 49 7.3 136.5 1.0X -SQL ORC Vectorized (Nested Column Disabled) 2138 2160 30 7.4 136.0 1.0X -SQL ORC Vectorized (Nested Column Enabled) 307 315 10 51.2 19.5 7.0X -SQL Parquet MR: DataPageV1 2349 2351 3 6.7 149.3 0.9X -SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 2783 2823 56 5.7 177.0 0.8X -SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 92 111 18 170.3 5.9 23.3X -SQL Parquet MR: DataPageV2 2394 2416 31 6.6 152.2 0.9X -SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 2774 2776 3 5.7 176.4 0.8X -SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 206 227 18 76.3 13.1 10.4X - -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +SQL ORC MR 2364 2366 4 6.7 150.3 1.0X +SQL ORC Vectorized (Nested Column 
Disabled) 2347 2358 14 6.7 149.2 1.0X +SQL ORC Vectorized (Nested Column Enabled) 281 293 10 55.9 17.9 8.4X +SQL Parquet MR: DataPageV1 2409 2426 24 6.5 153.1 1.0X +SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 2909 2918 14 5.4 184.9 0.8X +SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 123 128 4 127.4 7.8 19.2X +SQL Parquet MR: DataPageV2 2387 2398 16 6.6 151.8 1.0X +SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 2865 2869 6 5.5 182.1 0.8X +SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 276 282 4 57.0 17.5 8.6X + +OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure AMD EPYC 7763 64-Core Processor SQL Single BIGINT Column Scan in Struct: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative ------------------------------------------------------------------------------------------------------------------------------------------- -SQL ORC MR 2253 2258 7 7.0 143.3 1.0X -SQL ORC Vectorized (Nested Column Disabled) 2311 2324 18 6.8 146.9 1.0X -SQL ORC Vectorized (Nested Column Enabled) 356 377 29 44.2 22.6 6.3X -SQL Parquet MR: DataPageV1 2600 2609 13 6.0 165.3 0.9X -SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 3090 3097 9 5.1 196.5 0.7X -SQL Parquet Vectorized: DataPageV1 (Nested Column Enabled) 128 144 16 122.6 8.2 17.6X -SQL Parquet MR: DataPageV2 2303 2325 31 6.8 146.4 1.0X -SQL Parquet Vectorized: DataPageV2 (Nested Column Disabled) 2816 2821 7 5.6 179.0 0.8X -SQL Parquet Vectorized: DataPageV2 (Nested Column Enabled) 332 339 7 47.3 21.1 6.8X - -OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure +SQL ORC MR 2207 2213 9 7.1 140.3 1.0X +SQL ORC Vectorized (Nested Column Disabled) 2245 2267 32 7.0 142.7 1.0X +SQL ORC Vectorized (Nested Column Enabled) 285 305 38 55.1 18.1 7.7X +SQL Parquet MR: DataPageV1 2807 2812 8 5.6 178.4 0.8X +SQL Parquet Vectorized: DataPageV1 (Nested Column Disabled) 3405 3407 3 4.6 216.5 0.6X +SQL Parquet Vectorized: DataPageV1 (Nested 
Column Enabled) 291 321 20 54.0 18.5 7.6X

Review Comment:
This also shows a regression on `DataPageV1`: judging by the change in the `DataPageV1` vs `DataPageV2` ratio, `DataPageV1` has become slower than `DataPageV2`.

-- 
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
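The ratio change mentioned in the review comment can be checked by hand from the quoted hunk. The sketch below does that arithmetic for one table where both the old and new rows are visible: the `SQL Parquet Vectorized` rows of "SQL Single BIGINT Column Scan", using the Best Time(ms) column. The helper name `v1_to_v2_ratio` is hypothetical and not part of Spark. These are single benchmark runs, and the JDK and kernel versions both changed between them, so the result is only an indication.

```python
# Best Time (ms) for "SQL Parquet Vectorized" in the
# "SQL Single BIGINT Column Scan" table of the quoted diff.
before = {"DataPageV1": 118, "DataPageV2": 268}  # '-' rows (JDK 21.0.2+13)
after = {"DataPageV1": 293, "DataPageV2": 225}   # '+' rows (JDK 21.0.3+9)


def v1_to_v2_ratio(best_ms):
    """Return DataPageV1 time divided by DataPageV2 time.

    A value above 1.0 means DataPageV1 took longer than DataPageV2.
    (Hypothetical helper, for illustration only.)
    """
    return best_ms["DataPageV1"] / best_ms["DataPageV2"]


ratio_before = v1_to_v2_ratio(before)
ratio_after = v1_to_v2_ratio(after)

print(f"V1/V2 before: {ratio_before:.2f}")
print(f"V1/V2 after:  {ratio_after:.2f}")
print("V1 slower than V2 after the update:", ratio_after > 1.0)
```

For these rows the ratio moves from roughly 0.44 to roughly 1.30, so `DataPageV1` goes from clearly faster than `DataPageV2` to slower than it, which is the pattern the comment describes.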