dongjoon-hyun commented on code in PR #46266:
URL: https://github.com/apache/spark/pull/46266#discussion_r1585536429
##########
sql/core/benchmarks/DataSourceReadBenchmark-jdk21-results.txt:
##########
@@ -2,430 +2,430 @@
 SQL Single Numeric Column Scan
 ================================================================================================

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV 7930 7984 77 2.0 504.2 1.0X
-SQL Json 8135 8250 163 1.9 517.2 1.0X
-SQL Parquet Vectorized: DataPageV1 76 87 9 205.7 4.9 103.7X
-SQL Parquet Vectorized: DataPageV2 55 65 8 285.3 3.5 143.8X
-SQL Parquet MR: DataPageV1 1785 1787 3 8.8 113.5 4.4X
-SQL Parquet MR: DataPageV2 1643 1680 52 9.6 104.5 4.8X
-SQL ORC Vectorized 114 124 10 138.2 7.2 69.7X
-SQL ORC MR 1494 1496 3 10.5 95.0 5.3X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV 9759 9826 94 1.6 620.5 1.0X
+SQL Json 8157 8194 53 1.9 518.6 1.2X
+SQL Parquet Vectorized: DataPageV1 86 99 11 183.5 5.4 113.9X
+SQL Parquet Vectorized: DataPageV2 112 120 6 140.8 7.1 87.4X
+SQL Parquet MR: DataPageV1 1775 1776 1 8.9 112.9 5.5X
+SQL Parquet MR: DataPageV2 1745 1749 5 9.0 110.9 5.6X
+SQL ORC Vectorized 119 133 8 132.4 7.6 82.1X
+SQL ORC MR 1464 1464 0 10.7 93.1 6.7X
+
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 Parquet Reader Single BOOLEAN Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1 35 36 1 449.0 2.2 1.0X
-ParquetReader Vectorized: DataPageV2 25 26 1 638.4 1.6 1.4X
-ParquetReader Vectorized -> Row: DataPageV1 29 30 1 548.0 1.8 1.2X
-ParquetReader Vectorized -> Row: DataPageV2 18 20 2 851.6 1.2 1.9X
+ParquetReader Vectorized: DataPageV1 94 96 3 167.7 6.0 1.0X
+ParquetReader Vectorized: DataPageV2 113 115 2 139.1 7.2 0.8X
+ParquetReader Vectorized -> Row: DataPageV1 75 75 1 210.9 4.7 1.3X
+ParquetReader Vectorized -> Row: DataPageV2 95 96 1 166.2 6.0 1.0X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV 9218 9237 26 1.7 586.1 1.0X
-SQL Json 8885 8900 21 1.8 564.9 1.0X
-SQL Parquet Vectorized: DataPageV1 74 86 9 212.6 4.7 124.6X
-SQL Parquet Vectorized: DataPageV2 74 88 12 211.4 4.7 123.9X
-SQL Parquet MR: DataPageV1 1832 1837 8 8.6 116.5 5.0X
-SQL Parquet MR: DataPageV2 1761 1763 3 8.9 112.0 5.2X
-SQL ORC Vectorized 104 114 11 150.9 6.6 88.5X
-SQL ORC MR 1523 1560 52 10.3 96.8 6.1X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV 9826 9827 2 1.6 624.7 1.0X
+SQL Json 9154 9168 20 1.7 582.0 1.1X
+SQL Parquet Vectorized: DataPageV1 98 107 8 161.1 6.2 100.7X
+SQL Parquet Vectorized: DataPageV2 95 107 11 164.7 6.1 102.9X
+SQL Parquet MR: DataPageV1 1876 1883 9 8.4 119.3 5.2X
+SQL Parquet MR: DataPageV2 1841 1849 11 8.5 117.1 5.3X
+SQL ORC Vectorized 109 120 9 144.5 6.9 90.3X
+SQL ORC MR 1600 1601 2 9.8 101.7 6.1X
+
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 Parquet Reader Single TINYINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1 125 138 14 125.8 7.9 1.0X
-ParquetReader Vectorized: DataPageV2 125 137 11 126.2 7.9 1.0X
-ParquetReader Vectorized -> Row: DataPageV1 44 47 5 355.9 2.8 2.8X
-ParquetReader Vectorized -> Row: DataPageV2 44 47 5 357.8 2.8 2.8X
+ParquetReader Vectorized: DataPageV1 76 78 2 207.9 4.8 1.0X
+ParquetReader Vectorized: DataPageV2 76 78 2 208.0 4.8 1.0X
+ParquetReader Vectorized -> Row: DataPageV1 45 46 2 351.2 2.8 1.7X
+ParquetReader Vectorized -> Row: DataPageV2 44 45 1 353.5 2.8 1.7X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV 9794 9896 144 1.6 622.7 1.0X
-SQL Json 9146 9163 24 1.7 581.5 1.1X
-SQL Parquet Vectorized: DataPageV1 109 117 7 144.1 6.9 89.7X
-SQL Parquet Vectorized: DataPageV2 126 136 5 124.8 8.0 77.7X
-SQL Parquet MR: DataPageV1 2090 2102 16 7.5 132.9 4.7X
-SQL Parquet MR: DataPageV2 1898 1907 14 8.3 120.6 5.2X
-SQL ORC Vectorized 138 149 14 114.1 8.8 71.0X
-SQL ORC MR 1574 1605 43 10.0 100.1 6.2X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV 9858 9859 1 1.6 626.8 1.0X
+SQL Json 9321 9334 18 1.7 592.6 1.1X
+SQL Parquet Vectorized: DataPageV1 115 130 17 137.0 7.3 85.9X
+SQL Parquet Vectorized: DataPageV2 135 149 17 116.9 8.6 73.2X
+SQL Parquet MR: DataPageV1 2192 2199 10 7.2 139.4 4.5X
+SQL Parquet MR: DataPageV2 2003 2026 32 7.9 127.4 4.9X
+SQL ORC Vectorized 143 153 17 109.9 9.1 68.9X
+SQL ORC MR 1944 1951 11 8.1 123.6 5.1X
+
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 Parquet Reader Single SMALLINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1 140 161 67 112.2 8.9 1.0X
-ParquetReader Vectorized: DataPageV2 163 166 3 96.4 10.4 0.9X
-ParquetReader Vectorized -> Row: DataPageV1 139 140 2 113.1 8.8 1.0X
-ParquetReader Vectorized -> Row: DataPageV2 166 182 10 94.8 10.6 0.8X
+ParquetReader Vectorized: DataPageV1 140 147 8 112.7 8.9 1.0X
+ParquetReader Vectorized: DataPageV2 173 177 3 91.0 11.0 0.8X
+ParquetReader Vectorized -> Row: DataPageV1 134 141 8 117.2 8.5 1.0X
+ParquetReader Vectorized -> Row: DataPageV2 165 176 12 95.2 10.5 0.8X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single INT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV 11232 11256 33 1.4 714.1 1.0X
-SQL Json 9725 9740 22 1.6 618.3 1.2X
-SQL Parquet Vectorized: DataPageV1 84 97 15 187.8 5.3 134.1X
-SQL Parquet Vectorized: DataPageV2 162 181 13 96.8 10.3 69.1X
-SQL Parquet MR: DataPageV1 1882 1900 26 8.4 119.6 6.0X
-SQL Parquet MR: DataPageV2 1898 1899 2 8.3 120.7 5.9X
-SQL ORC Vectorized 148 157 13 106.1 9.4 75.7X
-SQL ORC MR 1667 1674 10 9.4 106.0 6.7X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV 11219 11235 22 1.4 713.3 1.0X
+SQL Json 9660 9667 9 1.6 614.2 1.2X
+SQL Parquet Vectorized: DataPageV1 122 126 4 129.1 7.7 92.1X
+SQL Parquet Vectorized: DataPageV2 178 195 17 88.5 11.3 63.1X
+SQL Parquet MR: DataPageV1 2007 2031 33 7.8 127.6 5.6X
+SQL Parquet MR: DataPageV2 2060 2084 34 7.6 131.0 5.4X
+SQL ORC Vectorized 175 184 13 89.8 11.1 64.0X
+SQL ORC MR 1804 1844 56 8.7 114.7 6.2X
+
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 Parquet Reader Single INT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1 130 140 11 121.1 8.3 1.0X
-ParquetReader Vectorized: DataPageV2 213 230 10 74.0 13.5 0.6X
-ParquetReader Vectorized -> Row: DataPageV1 128 132 6 122.9 8.1 1.0X
-ParquetReader Vectorized -> Row: DataPageV2 222 226 5 70.7 14.1 0.6X
+ParquetReader Vectorized: DataPageV1 150 157 6 104.7 9.6 1.0X
+ParquetReader Vectorized: DataPageV2 212 226 9 74.3 13.5 0.7X
+ParquetReader Vectorized -> Row: DataPageV1 164 170 6 95.8 10.4 0.9X
+ParquetReader Vectorized -> Row: DataPageV2 242 246 4 64.9 15.4 0.6X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single BIGINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV 14617 14690 103 1.1 929.3 1.0X
-SQL Json 10772 10780 11 1.5 684.9 1.4X
-SQL Parquet Vectorized: DataPageV1 118 132 13 133.4 7.5 124.0X
-SQL Parquet Vectorized: DataPageV2 268 300 20 58.7 17.0 54.5X
-SQL Parquet MR: DataPageV1 2289 2314 36 6.9 145.5 6.4X
-SQL Parquet MR: DataPageV2 1993 1995 3 7.9 126.7 7.3X
-SQL ORC Vectorized 215 224 12 73.1 13.7 68.0X
-SQL ORC MR 1840 1851 17 8.6 117.0 7.9X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV 11095 11134 54 1.4 705.4 1.0X
+SQL Json 9688 9701 18 1.6 616.0 1.1X
+SQL Parquet Vectorized: DataPageV1 293 297 4 53.7 18.6 37.9X
+SQL Parquet Vectorized: DataPageV2 225 253 23 69.9 14.3 49.3X
+SQL Parquet MR: DataPageV1 2423 2437 20 6.5 154.0 4.6X
+SQL Parquet MR: DataPageV2 2041 2055 19 7.7 129.8 5.4X
+SQL ORC Vectorized 165 192 24 95.3 10.5 67.2X
+SQL ORC MR 1742 1753 15 9.0 110.8 6.4X
+
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 Parquet Reader Single BIGINT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1 167 179 12 94.0 10.6 1.0X
-ParquetReader Vectorized: DataPageV2 324 331 4 48.5 20.6 0.5X
-ParquetReader Vectorized -> Row: DataPageV1 181 185 5 87.1 11.5 0.9X
-ParquetReader Vectorized -> Row: DataPageV2 322 331 6 48.8 20.5 0.5X
+ParquetReader Vectorized: DataPageV1 308 317 8 51.0 19.6 1.0X
+ParquetReader Vectorized: DataPageV2 276 283 5 56.9 17.6 1.1X
+ParquetReader Vectorized -> Row: DataPageV1 317 321 4 49.6 20.2 1.0X
+ParquetReader Vectorized -> Row: DataPageV2 271 278 7 58.1 17.2 1.1X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single FLOAT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV 11070 11076 9 1.4 703.8 1.0X
-SQL Json 11574 11602 39 1.4 735.9 1.0X
-SQL Parquet Vectorized: DataPageV1 86 97 15 182.7 5.5 128.6X
-SQL Parquet Vectorized: DataPageV2 94 103 5 166.9 6.0 117.4X
-SQL Parquet MR: DataPageV1 2065 2130 93 7.6 131.3 5.4X
-SQL Parquet MR: DataPageV2 2157 2169 17 7.3 137.1 5.1X
-SQL ORC Vectorized 266 288 20 59.0 16.9 41.5X
-SQL ORC MR 1740 1780 57 9.0 110.6 6.4X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV 11177 11185 13 1.4 710.6 1.0X
+SQL Json 11229 11252 32 1.4 713.9 1.0X
+SQL Parquet Vectorized: DataPageV1 83 97 15 189.6 5.3 134.7X
+SQL Parquet Vectorized: DataPageV2 82 96 13 191.1 5.2 135.8X
+SQL Parquet MR: DataPageV1 2029 2055 36 7.8 129.0 5.5X
+SQL Parquet MR: DataPageV2 1986 2014 39 7.9 126.3 5.6X
+SQL ORC Vectorized 229 241 17 68.7 14.6 48.8X
+SQL ORC MR 1751 1763 18 9.0 111.3 6.4X
+
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 Parquet Reader Single FLOAT Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1 144 144 1 109.5 9.1 1.0X
-ParquetReader Vectorized: DataPageV2 140 142 1 112.1 8.9 1.0X
-ParquetReader Vectorized -> Row: DataPageV1 149 156 6 105.6 9.5 1.0X
-ParquetReader Vectorized -> Row: DataPageV2 148 153 5 106.2 9.4 1.0X
+ParquetReader Vectorized: DataPageV1 134 141 7 117.5 8.5 1.0X
+ParquetReader Vectorized: DataPageV2 150 159 8 105.0 9.5 0.9X
+ParquetReader Vectorized -> Row: DataPageV1 143 150 7 109.9 9.1 0.9X
+ParquetReader Vectorized -> Row: DataPageV2 143 152 15 109.9 9.1 0.9X

-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single DOUBLE Column Scan: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV 14612 14718 150 1.1 929.0 1.0X
-SQL Json 14802 14812 14 1.1 941.1 1.0X
-SQL Parquet Vectorized: DataPageV1 126 144 15 124.3 8.0 115.5X
-SQL Parquet Vectorized: DataPageV2 161 167 5 97.4 10.3 90.5X
-SQL Parquet MR: DataPageV1 2239 2249 14 7.0 142.4 6.5X
-SQL Parquet MR: DataPageV2 2125 2169 63 7.4 135.1 6.9X
-SQL ORC Vectorized 352 366 11 44.6 22.4 41.5X
-SQL ORC MR 1823 1824 1 8.6 115.9 8.0X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV 11485 11545 86 1.4 730.2 1.0X
+SQL Json 11591 11597 8 1.4 737.0 1.0X
+SQL Parquet Vectorized: DataPageV1 269 288 18 58.5 17.1 42.7X

Review Comment:
This also has a slightly different ratio: `DataPageV1` vs `DataPageV2`.
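The reviewer's observation can be made concrete by recomputing the DataPageV2-to-DataPageV1 best-time ratio for the old (21.0.2+13) and new (21.0.3+9) runs. The sketch below is a hypothetical helper script, not part of Spark. It hard-codes Best Time(ms) values copied from the BIGINT blocks of the diff, because the new run's DOUBLE `DataPageV2` row is outside the quoted hunk. It assumes the Relative column is the block's first-row best time divided by each row's best time, which agrees with the printed values up to rounding. The names `relative`, `v2_slowdown`, and `report` are illustrative only.

```python
# Hypothetical helper (not part of Spark) for comparing DataPageV1 vs DataPageV2
# across the two runs in DataSourceReadBenchmark-jdk21-results.txt.
# "old" = OpenJDK 21.0.2+13 run, "new" = OpenJDK 21.0.3+9 run.

# Best Time(ms) values copied from the BIGINT blocks of the diff.
BEST_MS = {
    "SQL Single BIGINT Column Scan (Parquet Vectorized)": {
        "old": {"DataPageV1": 118, "DataPageV2": 268},
        "new": {"DataPageV1": 293, "DataPageV2": 225},
    },
    "Parquet Reader Single BIGINT Column Scan (Vectorized)": {
        "old": {"DataPageV1": 167, "DataPageV2": 324},
        "new": {"DataPageV1": 308, "DataPageV2": 276},
    },
}


def relative(baseline_ms: float, case_ms: float) -> float:
    """Speedup of one row over the block's first row (assumed Relative column).

    The results file prints rounded times, so recomputed values can differ
    from the printed Relative column in the last digit.
    """
    return baseline_ms / case_ms


def v2_slowdown(run: dict) -> float:
    """DataPageV2 best time divided by DataPageV1 best time (>1.0: V2 is slower)."""
    return run["DataPageV2"] / run["DataPageV1"]


def report() -> dict:
    """Print and return {block: (old_ratio, new_ratio)} for every block in BEST_MS."""
    ratios = {}
    for block, runs in BEST_MS.items():
        old, new = v2_slowdown(runs["old"]), v2_slowdown(runs["new"])
        ratios[block] = (old, new)
        print(f"{block}: V2/V1 old={old:.2f} new={new:.2f}")
    return ratios


if __name__ == "__main__":
    report()
```

For the BIGINT SQL scan this prints a V2/V1 ratio of 2.27 for the old run and 0.77 for the new one, so DataPageV2 moves from slower to faster than DataPageV1. The Parquet Reader block shows the same flip (1.94 to 0.90).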