sadikovi commented on a change in pull request #34611: URL: https://github.com/apache/spark/pull/34611#discussion_r751668591
##########
File path: sql/core/benchmarks/DataSourceReadBenchmark-results.txt
##########
@@ -1,252 +1,275 @@
+================================================================================================
+SQL Single Boolean Column Scan
+================================================================================================
+
+OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1020-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+SQL Single BOOLEAN Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+------------------------------------------------------------------------------------------------------------------------
+SQL CSV                                           13472          13878         574          1.2         856.5       1.0X
+SQL Json                                          10036          10477         623          1.6         638.0       1.3X
+SQL Parquet Vectorized                              144            167          12        109.2           9.2      93.5X
+SQL Parquet MR                                     2224           2230           7          7.1         141.4       6.1X
+SQL ORC Vectorized                                  191            203           6         82.3          12.2      70.5X
+SQL ORC MR                                         1865           1870           7          8.4         118.6       7.2X
+
+OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1020-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
+Parquet Reader Single BOOLEAN Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
+-------------------------------------------------------------------------------------------------------------------------
+ParquetReader Vectorized                             119            125           8        131.9           7.6       1.0X
+ParquetReader Vectorized -> Row                       60             63           2        260.2           3.8       2.0X
+
+
 ================================================================================================
 SQL Single Numeric Column Scan
 ================================================================================================
 
-OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1020-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 SQL Single TINYINT Column Scan:           Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-SQL CSV                                           15943          15956          18          1.0        1013.6       1.0X
-SQL Json                                           9109           9158          70          1.7         579.1       1.8X
-SQL Parquet Vectorized                              168            191          16         93.8          10.7      95.1X
-SQL Parquet MR                                     1938           1950          17          8.1         123.2       8.2X
-SQL ORC Vectorized                                  191            199           6         82.2          12.2      83.3X
-SQL ORC MR                                         1523           1537          20         10.3          96.8      10.5X
-
-OpenJDK 64-Bit Server VM 1.8.0_282-b08 on Linux 5.4.0-1043-azure
-Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+SQL CSV                                           16820          16859          54          0.9        1069.4       1.0X
+SQL Json                                          11583          11586           4          1.4         736.4       1.5X
+SQL Parquet Vectorized                              164            177          11         96.0          10.4     102.7X
+SQL Parquet MR                                     2839           2857          25          5.5         180.5       5.9X
+SQL ORC Vectorized                                  150            161           7        104.8           9.5     112.1X
+SQL ORC MR                                         1915           1923          12          8.2         121.7       8.8X
+
+OpenJDK 64-Bit Server VM 1.8.0_312-b07 on Linux 5.11.0-1020-azure
+Intel(R) Xeon(R) CPU E5-2673 v3 @ 2.40GHz
 Parquet Reader Single TINYINT Column Scan:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 -------------------------------------------------------------------------------------------------------------------------
-ParquetReader Vectorized                             203            206           3         77.5          12.9       1.0X
-ParquetReader Vectorized -> Row                       97            100           2        161.6           6.2       2.1X
+ParquetReader Vectorized                             211            218           5         74.6          13.4       1.0X
+ParquetReader Vectorized -> Row                      286            293           7         55.1          18.2       0.7X

Review comment:
Yes, you are right, it is likely unrelated, but I think we should still look into it. It may just be noise in the benchmark results, but the drop is quite significant.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
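For reference, the size of the drop can be checked from the numbers in the diff itself. The sketch below is only an illustration, not part of the Spark benchmark tooling: it takes the Best Time(ms) values of the two TINYINT "Parquet Reader" rows before and after the change, and divides by the in-run `ParquetReader Vectorized` baseline so that the hardware difference between the two runs (Platinum 8171M vs. E5-2673 v3) mostly cancels out. The variable and function names are made up for this example.

```python
# Best Time(ms) copied from the TINYINT "Parquet Reader" table in the diff.
# The dictionary keys are illustrative names, not Spark identifiers.
old = {"vectorized": 203, "vectorized_to_row": 97}   # removed (-) rows
new = {"vectorized": 211, "vectorized_to_row": 286}  # added (+) rows


def relative(run):
    """Speed of 'Vectorized -> Row' relative to the baseline of the same run."""
    return run["vectorized"] / run["vectorized_to_row"]


old_rel = relative(old)
new_rel = relative(new)

# How much slower 'Vectorized -> Row' became once each run is normalized
# by its own baseline.
slowdown = old_rel / new_rel

print(f"old relative: {old_rel:.1f}X")
print(f"new relative: {new_rel:.1f}X")
print(f"normalized slowdown: {slowdown:.1f}x")
```

The two relative values reproduce the `Relative` column of the table (2.1X and 0.7X). The reported standard deviations for these rows are 2 ms and 7 ms, which is small next to the 97 ms to 286 ms change, so run-to-run variance within a single benchmark run does not obviously explain the gap; whether it comes from the environment or from a code change cannot be decided from these numbers alone.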
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org