GitHub user wangyum opened a pull request:

    https://github.com/apache/spark/pull/22551

    [SPARK-25539][BUILD] Upgrade lz4-java to 1.5.0 get speed improvement

    ## What changes were proposed in this pull request?
    
    This PR upgrades `lz4-java` to 1.5.0 to get a speed improvement.
    
    **General speed improvements**
    
    LZ4 decompression speed has always been a strong point. In v1.8.2, this
    gets even better, as it improves decompression speed by about 10%, thanks in
    large part to a suggestion from @svpv.
    
    For example, on a Mac OS-X laptop with an Intel Core i7-5557U CPU @ 3.10GHz,
    running lz4 -bsilesia.tar compiled with default compiler llvm v9.1.0:
    
    Version | v1.8.1 | v1.8.2 | Improvement
    -- | -- | -- | --
    Decompression speed | 2490 MB/s | 2770 MB/s | +11%
    
    
    Compression speeds also receive a welcome boost, though the improvement is
    not evenly distributed, with higher levels benefiting quite a lot more.
    
    Version | v1.8.1 | v1.8.2 | Improvement
    -- | -- | -- | --
    lz4 -1 | 504 MB/s | 516 MB/s | +2%
    lz4 -9 | 23.2 MB/s | 25.6 MB/s | +10%
    lz4 -12 | 3.5 MB/s | 9.5 MB/s | +170%
    
    More details:
    https://github.com/lz4/lz4/releases/tag/v1.8.2
    
    **Below is my benchmark result**:
    lz4-java 1.5.0 run FilterPushdownBenchmark:
    ```
    [success] Total time: 11592 s, completed Sep 26, 2018 2:12:12 PM
    ```
    lz4-java 1.4.0 run FilterPushdownBenchmark:
    ```
    [success] Total time: 11809 s, completed Sep 26, 2018 2:15:49 PM
    ```
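To put the two total times above in perspective, here is a minimal sketch of the arithmetic. The helper name `percent_faster` is hypothetical and not part of Spark or lz4-java; the inputs are the two `Total time` values reported above, each taken from a single run, so the result is only a rough indication.

```python
def percent_faster(old_seconds: float, new_seconds: float) -> float:
    """Return how much shorter the new run is, as a percentage of the old run."""
    return (old_seconds - new_seconds) / old_seconds * 100.0


# Total FilterPushdownBenchmark wall-clock times reported in this PR (seconds).
total_lz4_java_140 = 11809
total_lz4_java_150 = 11592

saved_seconds = total_lz4_java_140 - total_lz4_java_150
improvement = percent_faster(total_lz4_java_140, total_lz4_java_150)

print(f"saved {saved_seconds} s ({improvement:.1f}% shorter total run)")
```

The whole-suite difference is small because the total covers far more than LZ4 work; the per-case numbers are more informative.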
    Some benchmark results:
    ```
    lz4-java 1.5.0 Select 10% decimal(38, 2) rows (value < 1572864): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            6035 / 6282          2.6         383.7       1.0X
    Parquet Vectorized (Pushdown)                 1463 / 1476         10.8          93.0       4.1X
    Native ORC Vectorized                         4112 / 4209          3.8         261.4       1.5X
    Native ORC Vectorized (Pushdown)              1309 / 1377         12.0          83.2       4.6X


    lz4-java 1.4.0 Select 10% decimal(38, 2) rows (value < 1572864): Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
    ------------------------------------------------------------------------------------------------
    Parquet Vectorized                            6911 / 7753          2.3         439.4       1.0X
    Parquet Vectorized (Pushdown)                 1551 / 1634         10.1          98.6       4.5X
    Native ORC Vectorized                         4875 / 5788          3.2         310.0       1.4X
    Native ORC Vectorized (Pushdown)              1456 / 1652         10.8          92.6       4.7X
    ```
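The `Relative` column above compares cases within one run, not across lz4-java versions. As a rough cross-version comparison, the sketch below divides the 1.4.0 best time by the 1.5.0 best time for each case. The dictionary layout and the name `speedups` are illustrative assumptions; the numbers are copied from the `Best` column of the two tables above and come from a single run each.

```python
# Best times in ms for "Select 10% decimal(38, 2) rows", copied from the tables above.
best_ms_140 = {
    "Parquet Vectorized": 6911,
    "Parquet Vectorized (Pushdown)": 1551,
    "Native ORC Vectorized": 4875,
    "Native ORC Vectorized (Pushdown)": 1456,
}
best_ms_150 = {
    "Parquet Vectorized": 6035,
    "Parquet Vectorized (Pushdown)": 1463,
    "Native ORC Vectorized": 4112,
    "Native ORC Vectorized (Pushdown)": 1309,
}

# Speedup > 1.0 means lz4-java 1.5.0 finished the case faster than 1.4.0.
speedups = {case: best_ms_140[case] / best_ms_150[case] for case in best_ms_140}

for case, ratio in speedups.items():
    print(f"{case:<34} {ratio:.2f}x")
```

Using the `Best` column rather than `Avg` reduces the influence of outlier iterations, but the comparison is still indicative only.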
    
    
    ## How was this patch tested?
    
    manual tests


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-25539

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22551.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22551
    
----
commit 331298b81403894ba3d9a95b71efcba6f718063c
Author: Yuming Wang <yumwang@...>
Date:   2018-09-26T06:27:37Z

    Upgrade lz4-java to 1.5.0

----

