Cache'ing performance

Maciej Bryński Sat, 27 Aug 2016 13:40:05 -0700

Hi,
I ran a quick benchmark of the cache function today.

*RDD*
sc.parallelize(0 until Int.MaxValue).cache().count()


*Datasets*
spark.range(Int.MaxValue).cache().count()

For me, the Datasets version was about 2 times slower.

Results (3 nodes, each with 20 cores and 48 GB RAM):
*RDD - 6 s*
*Datasets - 13.5 s*
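
A minimal sketch of how wall-clock timings like these could be taken. The `timed` helper is hypothetical (it is not part of Spark), and this is not necessarily how the numbers above were measured. It times a single run, so the result includes JVM warm-up and job scheduling overhead.

```scala
// Hypothetical helper (not a Spark API): evaluates `block` once and
// returns its result together with the elapsed wall-clock time in seconds.
def timed[T](block: => T): (T, Double) = {
  val start = System.nanoTime()
  val result = block
  val seconds = (System.nanoTime() - start) / 1e9
  (result, seconds)
}

// Assumed usage in spark-shell, where `sc` and `spark` are predefined:
//   val (_, rddSecs) = timed { sc.parallelize(0 until Int.MaxValue).cache().count() }
//   val (_, dsSecs)  = timed { spark.range(Int.MaxValue).cache().count() }
//   println(f"RDD: $rddSecs%.1f s, Datasets: $dsSecs%.1f s")
```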

Is this expected behavior for Datasets and Encoders?

Regards,
-- 
Maciek Bryński
