[ https://issues.apache.org/jira/browse/SPARK-25539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-25539:
------------------------------------

    Assignee: Apache Spark

> Update lz4-java to get speed improvement
> ----------------------------------------
>
>                 Key: SPARK-25539
>                 URL: https://issues.apache.org/jira/browse/SPARK-25539
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 2.5.0
>            Reporter: Yuming Wang
>            Assignee: Apache Spark
>            Priority: Major
>
> *General speed improvements*
> ||Version||v1.8.1||v1.8.2||Improvement||
> |Decompression speed|2490 MB/s|2770 MB/s|+11%|
> Compression speeds also receive a welcomed boost, though improvement is not
> evenly distributed, with higher levels benefiting quite a lot more.
> ||Version||v1.8.1||v1.8.2||Improvement||
> |lz4 -1|504 MB/s|516 MB/s|+2%|
> |lz4 -9|23.2 MB/s|25.6 MB/s|+10%|
> |lz4 -12|3.5 MB/s|9.5 MB/s|+170%|
> More details:
> [https://github.com/lz4/lz4/releases/tag/v1.8.2]
>
> Below are my benchmark results:
> lz4-java 1.5.0 run {{FilterPushdownBenchmark}}:
> {noformat}
> [success] Total time: 11592 s, completed Sep 26, 2018 2:12:12 PM
> {noformat}
> lz4-java 1.4.0 run {{FilterPushdownBenchmark}}:
> {noformat}
> [success] Total time: 11809 s, completed Sep 26, 2018 2:15:49 PM
> {noformat}
> Some benchmark results:
> {noformat}
> lz4-java 1.5.0 Select 90% decimal(9, 2) rows (value < 14155776):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                                    10350 / 10387          1.5         658.0       1.0X
> Parquet Vectorized (Pushdown)                                         10381 / 10490          1.5         660.0       1.0X
> Native ORC Vectorized                                                  9997 / 10040          1.6         635.6       1.0X
> Native ORC Vectorized (Pushdown)                                      10042 / 10095          1.6         638.4       1.0X
>
> lz4-java 1.4.0 Select 90% decimal(9, 2) rows (value < 14155776):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                                    10272 / 10367          1.5         653.1       1.0X
> Parquet Vectorized (Pushdown)                                         10279 / 10357          1.5         653.5       1.0X
> Native ORC Vectorized                                                 10367 / 10422          1.5         659.1       1.0X
> Native ORC Vectorized (Pushdown)                                      10521 / 10912          1.5         668.9       1.0X
>
> lz4-java 1.5.0 Select 90% decimal(18, 2) rows (value < 14155776): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                                      9730 / 9747          1.6         618.6       1.0X
> Parquet Vectorized (Pushdown)                                           9391 / 9405          1.7         597.1       1.0X
> Native ORC Vectorized                                                   9359 / 9401          1.7         595.1       1.0X
> Native ORC Vectorized (Pushdown)                                        9413 / 9460          1.7         598.5       1.0X
>
> lz4-java 1.4.0 Select 90% decimal(18, 2) rows (value < 14155776): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                                      9737 / 9778          1.6         619.1       1.0X
> Parquet Vectorized (Pushdown)                                           9409 / 9436          1.7         598.2       1.0X
> Native ORC Vectorized                                                   9625 / 9657          1.6         611.9       1.0X
> Native ORC Vectorized (Pushdown)                                      10283 / 10309          1.5         653.8       0.9X
>
> lz4-java 1.5.0 Select 10% decimal(38, 2) rows (value < 1572864):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                                      6035 / 6282          2.6         383.7       1.0X
> Parquet Vectorized (Pushdown)                                           1463 / 1476         10.8          93.0       4.1X
> Native ORC Vectorized                                                   4112 / 4209          3.8         261.4       1.5X
> Native ORC Vectorized (Pushdown)                                        1309 / 1377         12.0          83.2       4.6X
>
> lz4-java 1.4.0 Select 10% decimal(38, 2) rows (value < 1572864):  Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                                      6911 / 7753          2.3         439.4       1.0X
> Parquet Vectorized (Pushdown)                                           1551 / 1634         10.1          98.6       4.5X
> Native ORC Vectorized                                                   4875 / 5788          3.2         310.0       1.4X
> Native ORC Vectorized (Pushdown)                                        1456 / 1652         10.8          92.6       4.7X
> {noformat}
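
To put the quoted timings in proportion, the following is a minimal Python sketch, not part of the original issue, that computes how much less time the lz4-java 1.5.0 runs took relative to 1.4.0. The helper name `percent_faster` is hypothetical; the numbers are copied from the totals and the "Best" column of the 10% decimal(38, 2) case above. Each figure comes from a single reported run per version, so small differences may be within run-to-run noise.

```python
def percent_faster(old_time: float, new_time: float) -> float:
    """Return the time saved by the new run as a percentage of the old run."""
    if old_time <= 0:
        raise ValueError("old_time must be positive")
    return (old_time - new_time) / old_time * 100.0


# Total FilterPushdownBenchmark wall-clock times quoted in the issue (seconds).
total_1_4_0 = 11809
total_1_5_0 = 11592
overall = percent_faster(total_1_4_0, total_1_5_0)
print(f"Total run time: {overall:.1f}% less with lz4-java 1.5.0")

# "Best" times in ms for the 10% decimal(38, 2) case: (lz4-java 1.4.0, lz4-java 1.5.0).
best_ms = {
    "Parquet Vectorized": (6911, 6035),
    "Parquet Vectorized (Pushdown)": (1551, 1463),
    "Native ORC Vectorized": (4875, 4112),
    "Native ORC Vectorized (Pushdown)": (1456, 1309),
}

per_case = {name: percent_faster(old, new) for name, (old, new) in best_ms.items()}
for name, pct in per_case.items():
    print(f"{name}: {pct:.1f}% less time")
```

A positive value means the 1.5.0 run was faster; a negative value would mean it was slower.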