Hi,
I ran a quick benchmark of the cache() function today.

*RDD*
sc.parallelize(0 until Int.MaxValue).cache().count()

*Datasets*
spark.range(Int.MaxValue).cache().count()

For me, the Dataset version was more than 2 times slower.

Results (3 nodes, each with 20 cores and 48 GB RAM):
*RDD - 6 s*
*Datasets - 13.5 s*
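
For anyone who wants to reproduce this, here is a minimal sketch of one way to take such timings. The `time` helper is a hypothetical name introduced here for illustration; it is not part of Spark and not necessarily how the numbers above were measured. It only records wall-clock time for a single run, so JVM warm-up and scheduling noise are included.

```scala
// Hypothetical helper (not a Spark API): evaluates the block once and
// returns its result together with the elapsed wall-clock time in seconds.
def time[A](block: => A): (A, Double) = {
  val start = System.nanoTime()
  val result = block // the by-name argument is evaluated here, exactly once
  val seconds = (System.nanoTime() - start) / 1e9
  (result, seconds)
}
```

In spark-shell, where sc and spark are predefined, it could be used like this:
val (rddCount, rddSecs) = time { sc.parallelize(0 until Int.MaxValue).cache().count() }
val (dsCount, dsSecs) = time { spark.range(Int.MaxValue).cache().count() }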

Is that the expected behavior for Datasets and Encoders?

Regards,
-- 
Maciek Bryński
