dbtsai commented on issue #26085: [SPARK-29434][Core] Improve the MapStatuses Serialization Performance
URL: https://github.com/apache/spark/pull/26085#issuecomment-544750761
 
 
   @tgravescs The following results are from runs on my desktop. LZ4 serializes about 5x faster than ZSTD but produces about 1.6x larger data. I'm wondering whether we should accept larger serialized data in exchange for faster serialization.
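A quick arithmetic check of those two ratios, using the best serialization times and compressed sizes from the ZSTD and LZ4 results below. Each table's Relative column only compares Deserialization against Serialization for the same codec, so cross-codec ratios have to be computed from the Best Time(ms) and size figures directly. The variable names here are illustrative only:

```python
# Best serialization times (ms) and compressed MapStatus sizes (MB),
# copied from the ZSTD and LZ4 results in this comment.
ZSTD_BEST_MS, ZSTD_SIZE_MB = 3340, 123
LZ4_BEST_MS, LZ4_SIZE_MB = 677, 194

# How many times faster LZ4 serializes than ZSTD.
speedup = ZSTD_BEST_MS / LZ4_BEST_MS
# How many times larger LZ4's compressed output is than ZSTD's.
size_ratio = LZ4_SIZE_MB / ZSTD_SIZE_MB

print(f"LZ4 serialization speedup over ZSTD: {speedup:.2f}x")  # 4.93x
print(f"LZ4 size relative to ZSTD: {size_ratio:.2f}x")         # 1.58x
```

Both ratios round to the roughly 5x and 1.6x figures quoted above.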
   
   1. ZSTD
   
   ```
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_161-b12 on Mac OS X 10.14.2
   Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
   200000 MapOutputs, 1000 blocks w/o broadcast:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Serialization                                      3340           3355          21          0.1       16700.1       1.0X
   Deserialization                                     650            660          14          0.3        3248.6       5.1X
   
   Compressed Serialized MapStatus sizes: 123 MB
   Compressed Serialized Broadcast MapStatus sizes: 0 bytes
   ```
   
   2. LZ4
   
   ```
   Running benchmark: 200000 MapOutputs, 1000 blocks w/o broadcast
     Running case: Serialization
     Stopped after 3 iterations, 2109 ms
     Running case: Deserialization
     Stopped after 5 iterations, 2424 ms
   
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_161-b12 on Mac OS X 10.14.2
   Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
   200000 MapOutputs, 1000 blocks w/o broadcast:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Serialization                                       677            703          32          0.3        3383.6       1.0X
   Deserialization                                     466            485          27          0.4        2331.1       1.5X
   
   Compressed Serialized MapStatus sizes: 194 MB
   Compressed Serialized Broadcast MapStatus sizes: 0 bytes
   ```
   
   3. LZF
   
   ```
   Java HotSpot(TM) 64-Bit Server VM 1.8.0_161-b12 on Mac OS X 10.14.2
   Intel(R) Xeon(R) CPU E5-1650 v2 @ 3.50GHz
   200000 MapOutputs, 1000 blocks w/o broadcast:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
   ------------------------------------------------------------------------------------------------------------------------
   Serialization                                      2199           2202           4          0.1       10994.6       1.0X
   Deserialization                                     690            720          46          0.3        3450.6       3.2X
   
   Compressed Serialized MapStatus sizes: 182 MB
   Compressed Serialized Broadcast MapStatus sizes: 0 bytes
   ```
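One way to frame the trade-off is a toy cost model. This model is an assumption of this sketch and was not measured in the PR. The statuses are serialized once, then the compressed bytes are sent `copies` times (for example, once per requesting executor) over a link of a given bandwidth, and deserialization time is ignored. Setting two codecs' totals equal, ser_fast + copies·size_fast/bw = ser_small + copies·size_small/bw, gives a break-even bandwidth above which the faster-serializing codec wins. The inputs are the best serialization times and compressed sizes from the three runs above. `RESULTS`, `total_cost_s`, `break_even_bandwidth`, and `best_codec` are hypothetical names:

```python
# Toy cost model (an assumption for illustration, not from the PR):
# serialize once, then send the compressed bytes `copies` times over a link
# of `bandwidth_mb_s`. Deserialization time is ignored.

# codec -> (best serialization time in ms, compressed size in MB), from the runs above
RESULTS = {
    "zstd": (3340, 123),
    "lz4": (677, 194),
    "lzf": (2199, 182),
}


def total_cost_s(codec, bandwidth_mb_s, copies=1):
    """Serialization time plus transfer time, in seconds."""
    ser_ms, size_mb = RESULTS[codec]
    return ser_ms / 1000.0 + copies * size_mb / bandwidth_mb_s


def break_even_bandwidth(fast, small, copies=1):
    """Bandwidth (MB/s) above which `fast` beats `small` under this model.

    Assumes `fast` serializes faster but produces larger output than `small`.
    Solves ser_fast + copies*size_fast/bw == ser_small + copies*size_small/bw.
    """
    ser_fast, size_fast = RESULTS[fast]
    ser_small, size_small = RESULTS[small]
    return copies * (size_fast - size_small) / ((ser_small - ser_fast) / 1000.0)


def best_codec(bandwidth_mb_s, copies=1):
    """Codec with the lowest modelled total cost."""
    return min(RESULTS, key=lambda c: total_cost_s(c, bandwidth_mb_s, copies))


for fast, small in [("lz4", "zstd"), ("lz4", "lzf"), ("lzf", "zstd")]:
    bw = break_even_bandwidth(fast, small)
    print(f"{fast} beats {small} above {bw:.1f} MB/s per copy")
print(best_codec(100), best_codec(10), best_codec(100, copies=4))
```

With these numbers, LZ4 overtakes ZSTD only above roughly 26.7 MB/s for a single copy, and LZF overtakes ZSTD above about 51.7 MB/s. Because the break-even grows linearly with `copies`, ZSTD already wins at 100 MB/s once four or more copies are sent. A real decision would also need to weigh deserialization time and whether the serialized statuses are broadcast.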

----------------------------------------------------------------
This is an automated message from the Apache Git Service.