GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/22551
[SPARK-25539][BUILD] Upgrade lz4-java to 1.5.0 get speed improvement

## What changes were proposed in this pull request?

This PR upgrades `lz4-java` to 1.5.0 to get a speed improvement.

**General speed improvements**

LZ4 decompression speed has always been a strong point. In LZ4 v1.8.2, this gets even better: decompression speed improves by about 10%, thanks in large part to a suggestion from @svpv.

For example, on a Mac OS X laptop with an Intel Core i7-5557U CPU @ 3.10GHz, running `lz4 -b silesia.tar` compiled with the default compiler, LLVM v9.1.0:

Version | v1.8.1 | v1.8.2 | Improvement
-- | -- | -- | --
Decompression speed | 2490 MB/s | 2770 MB/s | +11%

Compression speeds also receive a welcome boost, though the improvement is not evenly distributed: higher levels benefit much more.

Version | v1.8.1 | v1.8.2 | Improvement
-- | -- | -- | --
lz4 -1 | 504 MB/s | 516 MB/s | +2%
lz4 -9 | 23.2 MB/s | 25.6 MB/s | +10%
lz4 -12 | 3.5 MB/s | 9.5 MB/s | +170%

More details: https://github.com/lz4/lz4/releases/tag/v1.8.2

**Below are my benchmark results**:

lz4-java 1.5.0 running FilterPushdownBenchmark:
```
[success] Total time: 11592 s, completed Sep 26, 2018 2:12:12 PM
```
lz4-java 1.4.0 running FilterPushdownBenchmark:
```
[success] Total time: 11809 s, completed Sep 26, 2018 2:15:49 PM
```
Some benchmark results:
```
lz4-java 1.5.0
Select 10% decimal(38, 2) rows (value < 1572864): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            6035 / 6282          2.6         383.7       1.0X
Parquet Vectorized (Pushdown)                 1463 / 1476         10.8          93.0       4.1X
Native ORC Vectorized                         4112 / 4209          3.8         261.4       1.5X
Native ORC Vectorized (Pushdown)              1309 / 1377         12.0          83.2       4.6X

lz4-java 1.4.0
Select 10% decimal(38, 2) rows (value < 1572864): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                            6911 / 7753          2.3         439.4       1.0X
Parquet Vectorized (Pushdown)                 1551 / 1634         10.1          98.6       4.5X
Native ORC Vectorized                         4875 / 5788          3.2         310.0       1.4X
Native ORC Vectorized (Pushdown)              1456 / 1652         10.8          92.6       4.7X
```

## How was this patch tested?

Manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25539

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22551.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #22551

----
commit 331298b81403894ba3d9a95b71efcba6f718063c
Author: Yuming Wang <yumwang@...>
Date:   2018-09-26T06:27:37Z

    Upgrade lz4-java to 1.5.0

----
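The percentages quoted in the LZ4 tables and the FilterPushdownBenchmark totals can be rechecked with simple arithmetic. The sketch below is not part of the patch; the dictionary and the helper names (`improvement_pct`, `time_saved_pct`, `relative`) are hypothetical and only restate the numbers quoted above. Improvement is `(new - old) / old`, time saved is `(old - new) / old`, and the benchmark's `Relative` column is the baseline's best time divided by each case's best time.

```python
# Hypothetical helpers for rechecking the figures quoted in this PR.
# They are not part of Spark or lz4-java.

# (v1.8.1, v1.8.2) throughput in MB/s, copied from the LZ4 release-note tables.
LZ4_SPEEDS_MB_S = {
    "decompression": (2490, 2770),
    "lz4 -1": (504, 516),
    "lz4 -9": (23.2, 25.6),
    "lz4 -12": (3.5, 9.5),
}


def improvement_pct(old: float, new: float) -> float:
    """Relative throughput gain of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


def time_saved_pct(old_s: float, new_s: float) -> float:
    """Relative reduction in wall-clock time, in percent."""
    return (old_s - new_s) / old_s * 100.0


def relative(baseline_ms: float, case_ms: float) -> float:
    """Speedup of a case versus the baseline row (the 'Relative' column)."""
    return baseline_ms / case_ms


if __name__ == "__main__":
    for name, (old, new) in LZ4_SPEEDS_MB_S.items():
        print(f"{name}: +{improvement_pct(old, new):.0f}%")
    # FilterPushdownBenchmark totals: lz4-java 1.4.0 (11809 s) vs 1.5.0 (11592 s).
    print(f"FilterPushdownBenchmark: {time_saved_pct(11809, 11592):.1f}% less wall time")
    # 'Relative' for lz4-java 1.5.0, using Parquet Vectorized (6035 ms) as baseline.
    print(f"Parquet Vectorized (Pushdown): {relative(6035, 1463):.1f}X")
```

Running it prints +11%, +2%, +10% and +171% for the LZ4 rows (the release notes round the last one to +170%), about 1.8% less total wall time for FilterPushdownBenchmark, and 4.1X for the Parquet pushdown case, matching the table.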