GitHub user wangyum opened a pull request:
https://github.com/apache/spark/pull/22551
[SPARK-25539][BUILD] Upgrade lz4-java to 1.5.0 get speed improvement
## What changes were proposed in this pull request?
This PR upgrades `lz4-java` to 1.5.0 to get a speed improvement.
**General speed improvements**
LZ4 decompression speed has always been a strong point. In v1.8.2, this
gets even better, as it improves decompression speed by about 10%, thanks in
large part to a suggestion from @svpv.
For example, on a Mac OS-X laptop with an Intel Core i7-5557U CPU @ 3.10GHz,
running lz4 -bsilesia.tar compiled with default compiler llvm v9.1.0:
Version | v1.8.1 | v1.8.2 | Improvement
-- | -- | -- | --
Decompression speed | 2490 MB/s | 2770 MB/s | +11%
Compression speeds also receive a welcome boost, though the improvement is not
evenly distributed, with higher levels benefiting quite a lot more.
Version | v1.8.1 | v1.8.2 | Improvement
-- | -- | -- | --
lz4 -1 | 504 MB/s | 516 MB/s | +2%
lz4 -9 | 23.2 MB/s | 25.6 MB/s | +10%
lz4 -12 | 3.5 MB/s | 9.5 MB/s | +170%
More details:
https://github.com/lz4/lz4/releases/tag/v1.8.2
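As a quick sanity check on the tables above, the Improvement column can be recomputed from the two speed columns. This is only a minimal sketch: the helper name `improvement_pct` is hypothetical, and the figures are copied from the quoted release notes rather than measured again.

```python
def improvement_pct(old, new):
    """Relative change from old to new, expressed as a percentage."""
    return (new - old) / old * 100.0

# (v1.8.1 MB/s, v1.8.2 MB/s), copied from the tables above
speeds = {
    "decompression": (2490, 2770),
    "lz4 -1": (504, 516),
    "lz4 -9": (23.2, 25.6),
    "lz4 -12": (3.5, 9.5),
}

for name, (old, new) in speeds.items():
    print(f"{name}: {improvement_pct(old, new):+.1f}%")
```

The recomputed values round to the figures in the tables; `lz4 -12` comes out at about +171%, which the release notes give as +170%.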
**Below is my benchmark result**:
lz4-java 1.5.0 running FilterPushdownBenchmark:
```
[success] Total time: 11592 s, completed Sep 26, 2018 2:12:12 PM
```
lz4-java 1.4.0 running FilterPushdownBenchmark:
```
[success] Total time: 11809 s, completed Sep 26, 2018 2:15:49 PM
```
Some benchmark results:
```
lz4-java 1.5.0:
Select 10% decimal(38, 2) rows (value < 1572864): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                                     6035 / 6282          2.6         383.7       1.0X
Parquet Vectorized (Pushdown)                          1463 / 1476         10.8          93.0       4.1X
Native ORC Vectorized                                  4112 / 4209          3.8         261.4       1.5X
Native ORC Vectorized (Pushdown)                       1309 / 1377         12.0          83.2       4.6X

lz4-java 1.4.0:
Select 10% decimal(38, 2) rows (value < 1572864): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
------------------------------------------------------------------------------------------------
Parquet Vectorized                                     6911 / 7753          2.3         439.4       1.0X
Parquet Vectorized (Pushdown)                          1551 / 1634         10.1          98.6       4.5X
Native ORC Vectorized                                  4875 / 5788          3.2         310.0       1.4X
Native ORC Vectorized (Pushdown)                       1456 / 1652         10.8          92.6       4.7X
```
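To put the two runs side by side, the sketch below recomputes how much faster each case ran with lz4-java 1.5.0, using the Best times from the tables above and the total sbt times reported earlier. The names `best_ms`, `total_s`, and `speedup` are hypothetical, and the numbers come from the single manual run per version shown here, so the ratios are indicative only.

```python
# Best time in ms per case: (lz4-java 1.4.0, lz4-java 1.5.0), copied from the tables above
best_ms = {
    "Parquet Vectorized": (6911, 6035),
    "Parquet Vectorized (Pushdown)": (1551, 1463),
    "Native ORC Vectorized": (4875, 4112),
    "Native ORC Vectorized (Pushdown)": (1456, 1309),
}

def speedup(old, new):
    """How many times faster the new run is than the old one."""
    return old / new

for case, (old, new) in best_ms.items():
    print(f"{case}: {speedup(old, new):.2f}x")

# Total sbt time in seconds: (lz4-java 1.4.0, lz4-java 1.5.0)
total_s = (11809, 11592)
reduction_pct = (total_s[0] - total_s[1]) / total_s[0] * 100.0
print(f"Total time reduction: {reduction_pct:.1f}%")
```

On these numbers the per-case speedups range from about 1.06x to 1.19x, while the full benchmark run is about 1.8% shorter.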
## How was this patch tested?
manual tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wangyum/spark SPARK-25539
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22551.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22551
----
commit 331298b81403894ba3d9a95b71efcba6f718063c
Author: Yuming Wang <yumwang@...>
Date: 2018-09-26T06:27:37Z
Upgrade lz4-java to 1.5.0
----