[ https://issues.apache.org/jira/browse/SPARK-25539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sean Owen updated SPARK-25539:
------------------------------
    Priority: Minor  (was: Major)

> Update lz4-java to get speed improvement
> ----------------------------------------
>
>                 Key: SPARK-25539
>                 URL: https://issues.apache.org/jira/browse/SPARK-25539
>             Project: Spark
>          Issue Type: Improvement
>          Components: Build
>    Affects Versions: 3.0.0
>            Reporter: Yuming Wang
>            Priority: Minor
>             Fix For: 3.0.0
>
>         Attachments: FilterPushdownBenchmark-lz4-java-140-results.txt,
>                      FilterPushdownBenchmark-lz4-java-150-results.txt
>
>
> *General speed improvements*
> ||Version||v1.8.1||v1.8.2||Improvement||
> |Decompression speed|2490 MB/s|2770 MB/s|+11%|
> Compression speeds also receive a welcome boost, though the improvement is not
> evenly distributed: higher compression levels benefit considerably more.
> ||Version||v1.8.1||v1.8.2||Improvement||
> |lz4 -1|504 MB/s|516 MB/s|+2%|
> |lz4 -9|23.2 MB/s|25.6 MB/s|+10%|
> |lz4 -12|3.5 MB/s|9.5 MB/s|+170%|
> More details:
> [https://github.com/lz4/lz4/releases/tag/v1.8.2]
>
> Below are my benchmark results (set {{spark.sql.parquet.compression.codec}} to
> {{lz4}}, disable the ORC benchmark, then run FilterPushdownBenchmark):
> lz4-java 1.5.0 run of {{FilterPushdownBenchmark}}:
> {noformat}
> [success] Total time: 5585 s, completed Sep 26, 2018 5:22:16 PM
> {noformat}
> lz4-java 1.4.0 run of {{FilterPushdownBenchmark}}:
> {noformat}
> [success] Total time: 5591 s, completed Sep 26, 2018 5:22:24 PM
> {noformat}
> Sample benchmark results:
> {noformat}
> lz4-java 1.5.0 Select 1 row with 500 filters:   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                    1953 / 1980          0.0  1952502908.0       1.0X
> Parquet Vectorized (Pushdown)                         2541 / 2585          0.0  2541019869.0       0.8X
>
> lz4-java 1.4.0 Select 1 row with 500 filters:   Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
> ------------------------------------------------------------------------------------------------
> Parquet Vectorized                                    1979 / 2103          0.0  1979328144.0       1.0X
> Parquet Vectorized (Pushdown)                         2596 / 2909          0.0  2596222118.0       0.8X
> {noformat}
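As a quick cross-check, the percentages quoted above can be recomputed from the raw figures in this issue. This is a minimal sketch that only does arithmetic on numbers copied from the issue; the dictionary names and helper functions are illustrative and are not part of Spark, lz4, or lz4-java. It assumes the benchmark's "Relative" column is the first case's best time divided by each case's best time.

```python
"""Recompute the percentages quoted in SPARK-25539 from its raw figures.

Only the numbers come from the issue; the names and helpers below are
hypothetical, for illustration only.
"""

# lz4 (C library) speeds in MB/s: (v1.8.1, v1.8.2)
LZ4_SPEEDS_MB_S = {
    "Decompression": (2490.0, 2770.0),
    "lz4 -1": (504.0, 516.0),
    "lz4 -9": (23.2, 25.6),
    "lz4 -12": (3.5, 9.5),
}

# FilterPushdownBenchmark timings: (lz4-java 1.4.0, lz4-java 1.5.0)
SPARK_TIMES = {
    "Total run time (s)": (5591.0, 5585.0),
    "Parquet Vectorized, best (ms)": (1979.0, 1953.0),
    "Parquet Vectorized (Pushdown), best (ms)": (2596.0, 2541.0),
}


def pct_change(old: float, new: float) -> float:
    """Percentage change in throughput; positive means `new` is faster."""
    return (new - old) / old * 100.0


def time_saved_pct(old: float, new: float) -> float:
    """Percentage reduction in elapsed time; positive means `new` is faster."""
    return (old - new) / old * 100.0


def relative(baseline_ms: float, other_ms: float) -> float:
    """Speed relative to the baseline case (assumed: baseline best / other best)."""
    return baseline_ms / other_ms


if __name__ == "__main__":
    for name, (old, new) in LZ4_SPEEDS_MB_S.items():
        print(f"{name}: {pct_change(old, new):+.0f}%")
    for name, (old, new) in SPARK_TIMES.items():
        print(f"{name}: {time_saved_pct(old, new):.1f}% less time with 1.5.0")
    # Pushdown case relative to the non-pushdown case, lz4-java 1.5.0 run
    print(f"Relative (Pushdown): {relative(1953.0, 2541.0):.1f}X")
```

On these figures, decompression comes out at about +11% and lz4 -12 at about +171% (the lz4 release notes round this to +170%). The lz4-java 1.5.0 run takes about 0.1% less total time and roughly 1.3% to 2.1% less best-case time on the two Parquet queries, which is modest next to the codec-level gains reported for lz4 itself.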