dongjoon-hyun commented on code in PR #46266:
URL: https://github.com/apache/spark/pull/46266#discussion_r1585535466


##########
sql/core/benchmarks/DataSourceReadBenchmark-jdk21-results.txt:
##########
@@ -2,430 +2,430 @@
 SQL Single Numeric Column Scan
 ================================================================================================
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single BOOLEAN Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV                                            7930           7984          77          2.0         504.2       1.0X
-SQL Json                                           8135           8250         163          1.9         517.2       1.0X
-SQL Parquet Vectorized: DataPageV1                   76             87           9        205.7           4.9     103.7X
-SQL Parquet Vectorized: DataPageV2                   55             65           8        285.3           3.5     143.8X
-SQL Parquet MR: DataPageV1                         1785           1787           3          8.8         113.5       4.4X
-SQL Parquet MR: DataPageV2                         1643           1680          52          9.6         104.5       4.8X
-SQL ORC Vectorized                                  114            124          10        138.2           7.2      69.7X
-SQL ORC MR                                         1494           1496           3         10.5          95.0       5.3X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV                                            9759           9826          94          1.6         620.5       1.0X
+SQL Json                                           8157           8194          53          1.9         518.6       1.2X
+SQL Parquet Vectorized: DataPageV1                   86             99          11        183.5           5.4     113.9X
+SQL Parquet Vectorized: DataPageV2                  112            120           6        140.8           7.1      87.4X
+SQL Parquet MR: DataPageV1                         1775           1776           1          8.9         112.9       5.5X
+SQL Parquet MR: DataPageV2                         1745           1749           5          9.0         110.9       5.6X
+SQL ORC Vectorized                                  119            133           8        132.4           7.6      82.1X
+SQL ORC MR                                         1464           1464           0         10.7          93.1       6.7X
+
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 Parquet Reader Single BOOLEAN Column Scan:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1                    35             36           1        449.0           2.2       1.0X
-ParquetReader Vectorized: DataPageV2                    25             26           1        638.4           1.6       1.4X
-ParquetReader Vectorized -> Row: DataPageV1             29             30           1        548.0           1.8       1.2X
-ParquetReader Vectorized -> Row: DataPageV2             18             20           2        851.6           1.2       1.9X
+ParquetReader Vectorized: DataPageV1                    94             96           3        167.7           6.0       1.0X
+ParquetReader Vectorized: DataPageV2                   113            115           2        139.1           7.2       0.8X
+ParquetReader Vectorized -> Row: DataPageV1             75             75           1        210.9           4.7       1.3X
+ParquetReader Vectorized -> Row: DataPageV2             95             96           1        166.2           6.0       1.0X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single TINYINT Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV                                            9218           9237          26          1.7         586.1       1.0X
-SQL Json                                           8885           8900          21          1.8         564.9       1.0X
-SQL Parquet Vectorized: DataPageV1                   74             86           9        212.6           4.7     124.6X
-SQL Parquet Vectorized: DataPageV2                   74             88          12        211.4           4.7     123.9X
-SQL Parquet MR: DataPageV1                         1832           1837           8          8.6         116.5       5.0X
-SQL Parquet MR: DataPageV2                         1761           1763           3          8.9         112.0       5.2X
-SQL ORC Vectorized                                  104            114          11        150.9           6.6      88.5X
-SQL ORC MR                                         1523           1560          52         10.3          96.8       6.1X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV                                            9826           9827           2          1.6         624.7       1.0X
+SQL Json                                           9154           9168          20          1.7         582.0       1.1X
+SQL Parquet Vectorized: DataPageV1                   98            107           8        161.1           6.2     100.7X
+SQL Parquet Vectorized: DataPageV2                   95            107          11        164.7           6.1     102.9X
+SQL Parquet MR: DataPageV1                         1876           1883           9          8.4         119.3       5.2X
+SQL Parquet MR: DataPageV2                         1841           1849          11          8.5         117.1       5.3X
+SQL ORC Vectorized                                  109            120           9        144.5           6.9      90.3X
+SQL ORC MR                                         1600           1601           2          9.8         101.7       6.1X
+
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 Parquet Reader Single TINYINT Column Scan:   Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1                   125            138          14        125.8           7.9       1.0X
-ParquetReader Vectorized: DataPageV2                   125            137          11        126.2           7.9       1.0X
-ParquetReader Vectorized -> Row: DataPageV1             44             47           5        355.9           2.8       2.8X
-ParquetReader Vectorized -> Row: DataPageV2             44             47           5        357.8           2.8       2.8X
+ParquetReader Vectorized: DataPageV1                    76             78           2        207.9           4.8       1.0X
+ParquetReader Vectorized: DataPageV2                    76             78           2        208.0           4.8       1.0X
+ParquetReader Vectorized -> Row: DataPageV1             45             46           2        351.2           2.8       1.7X
+ParquetReader Vectorized -> Row: DataPageV2             44             45           1        353.5           2.8       1.7X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single SMALLINT Column Scan:          Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV                                            9794           9896         144          1.6         622.7       1.0X
-SQL Json                                           9146           9163          24          1.7         581.5       1.1X
-SQL Parquet Vectorized: DataPageV1                  109            117           7        144.1           6.9      89.7X
-SQL Parquet Vectorized: DataPageV2                  126            136           5        124.8           8.0      77.7X
-SQL Parquet MR: DataPageV1                         2090           2102          16          7.5         132.9       4.7X
-SQL Parquet MR: DataPageV2                         1898           1907          14          8.3         120.6       5.2X
-SQL ORC Vectorized                                  138            149          14        114.1           8.8      71.0X
-SQL ORC MR                                         1574           1605          43         10.0         100.1       6.2X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV                                            9858           9859           1          1.6         626.8       1.0X
+SQL Json                                           9321           9334          18          1.7         592.6       1.1X
+SQL Parquet Vectorized: DataPageV1                  115            130          17        137.0           7.3      85.9X
+SQL Parquet Vectorized: DataPageV2                  135            149          17        116.9           8.6      73.2X
+SQL Parquet MR: DataPageV1                         2192           2199          10          7.2         139.4       4.5X
+SQL Parquet MR: DataPageV2                         2003           2026          32          7.9         127.4       4.9X
+SQL ORC Vectorized                                  143            153          17        109.9           9.1      68.9X
+SQL ORC MR                                         1944           1951          11          8.1         123.6       5.1X
+
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 Parquet Reader Single SMALLINT Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1                   140            161          67        112.2           8.9       1.0X
-ParquetReader Vectorized: DataPageV2                   163            166           3         96.4          10.4       0.9X
-ParquetReader Vectorized -> Row: DataPageV1            139            140           2        113.1           8.8       1.0X
-ParquetReader Vectorized -> Row: DataPageV2            166            182          10         94.8          10.6       0.8X
+ParquetReader Vectorized: DataPageV1                   140            147           8        112.7           8.9       1.0X
+ParquetReader Vectorized: DataPageV2                   173            177           3         91.0          11.0       0.8X
+ParquetReader Vectorized -> Row: DataPageV1            134            141           8        117.2           8.5       1.0X
+ParquetReader Vectorized -> Row: DataPageV2            165            176          12         95.2          10.5       0.8X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single INT Column Scan:               Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV                                           11232          11256          33          1.4         714.1       1.0X
-SQL Json                                           9725           9740          22          1.6         618.3       1.2X
-SQL Parquet Vectorized: DataPageV1                   84             97          15        187.8           5.3     134.1X
-SQL Parquet Vectorized: DataPageV2                  162            181          13         96.8          10.3      69.1X
-SQL Parquet MR: DataPageV1                         1882           1900          26          8.4         119.6       6.0X
-SQL Parquet MR: DataPageV2                         1898           1899           2          8.3         120.7       5.9X
-SQL ORC Vectorized                                  148            157          13        106.1           9.4      75.7X
-SQL ORC MR                                         1667           1674          10          9.4         106.0       6.7X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV                                           11219          11235          22          1.4         713.3       1.0X
+SQL Json                                           9660           9667           9          1.6         614.2       1.2X
+SQL Parquet Vectorized: DataPageV1                  122            126           4        129.1           7.7      92.1X
+SQL Parquet Vectorized: DataPageV2                  178            195          17         88.5          11.3      63.1X
+SQL Parquet MR: DataPageV1                         2007           2031          33          7.8         127.6       5.6X
+SQL Parquet MR: DataPageV2                         2060           2084          34          7.6         131.0       5.4X
+SQL ORC Vectorized                                  175            184          13         89.8          11.1      64.0X
+SQL ORC MR                                         1804           1844          56          8.7         114.7       6.2X
+
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 Parquet Reader Single INT Column Scan:       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ---------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized: DataPageV1                   130            140          11        121.1           8.3       1.0X
-ParquetReader Vectorized: DataPageV2                   213            230          10         74.0          13.5       0.6X
-ParquetReader Vectorized -> Row: DataPageV1            128            132           6        122.9           8.1       1.0X
-ParquetReader Vectorized -> Row: DataPageV2            222            226           5         70.7          14.1       0.6X
+ParquetReader Vectorized: DataPageV1                   150            157           6        104.7           9.6       1.0X
+ParquetReader Vectorized: DataPageV2                   212            226           9         74.3          13.5       0.7X
+ParquetReader Vectorized -> Row: DataPageV1            164            170           6         95.8          10.4       0.9X
+ParquetReader Vectorized -> Row: DataPageV2            242            246           4         64.9          15.4       0.6X
 
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+OpenJDK 64-Bit Server VM 21.0.3+9-LTS on Linux 6.5.0-1018-azure
 AMD EPYC 7763 64-Core Processor
 SQL Single BIGINT Column Scan:            Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV                                           14617          14690         103          1.1         929.3       1.0X
-SQL Json                                          10772          10780          11          1.5         684.9       1.4X
-SQL Parquet Vectorized: DataPageV1                  118            132          13        133.4           7.5     124.0X
-SQL Parquet Vectorized: DataPageV2                  268            300          20         58.7          17.0      54.5X
-SQL Parquet MR: DataPageV1                         2289           2314          36          6.9         145.5       6.4X
-SQL Parquet MR: DataPageV2                         1993           1995           3          7.9         126.7       7.3X
-SQL ORC Vectorized                                  215            224          12         73.1          13.7      68.0X
-SQL ORC MR                                         1840           1851          17          8.6         117.0       7.9X
-
-OpenJDK 64-Bit Server VM 21.0.2+13-LTS on Linux 6.5.0-1016-azure
+SQL CSV                                           11095          11134          54          1.4         705.4       1.0X
+SQL Json                                           9688           9701          18          1.6         616.0       1.1X
+SQL Parquet Vectorized: DataPageV1                  293            297           4         53.7          18.6      37.9X

Review Comment:
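   `DataPageV1` seems to show a regression in the `BIGINT` column scan: the best time of `SQL Parquet Vectorized: DataPageV1` went from 118 ms to 293 ms (relative speedup 124.0X to 37.9X). It's slower than `DataPageV2` of Parquet.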
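
To put a number on the suspected regression, here is a minimal Python sketch that parses result rows in the layout quoted above and compares best times. The helper names (`parse_row`, `slowdown`) are hypothetical and not part of Spark; the row values are copied from the diff. The new `DataPageV2` row for `BIGINT` is not visible in the quoted hunk, so the `DataPageV2` comparison here uses only the old (21.0.2) value.

```python
from typing import NamedTuple


class Row(NamedTuple):
    name: str
    best_ms: int
    avg_ms: int
    stdev_ms: int
    rate_m_per_s: float
    per_row_ns: float
    relative: float


def parse_row(line: str) -> Row:
    """Parse one benchmark row (hypothetical helper).

    Assumes the last six whitespace-separated fields are numeric and
    everything before them is the case name, which may contain spaces.
    """
    name, best, avg, stdev, rate, per_row, rel = line.rsplit(None, 6)
    return Row(
        name.strip(),
        int(best),
        int(avg),
        int(stdev),
        float(rate),
        float(per_row),
        float(rel.rstrip("X")),  # "124.0X" -> 124.0
    )


def slowdown(old: Row, new: Row) -> float:
    """Ratio of new to old best time; values above 1.0 mean slower."""
    return new.best_ms / old.best_ms


# Rows copied from the BIGINT hunk of the diff (diff markers removed).
old_v1 = parse_row("SQL Parquet Vectorized: DataPageV1  118  132  13  133.4   7.5  124.0X")
new_v1 = parse_row("SQL Parquet Vectorized: DataPageV1  293  297   4   53.7  18.6   37.9X")
old_v2 = parse_row("SQL Parquet Vectorized: DataPageV2  268  300  20   58.7  17.0   54.5X")

print(f"DataPageV1 slowdown: {slowdown(old_v1, new_v1):.2f}x")
print("new V1 slower than old V2:", new_v1.best_ms > old_v2.best_ms)
```

A single run on shared CI hardware is noisy, so a ratio like this is only a prompt to re-run the benchmark, not proof of a regression.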



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

