Hi,

I ran a quick benchmark of cache() today.

*RDD*
sc.parallelize(0 until Int.MaxValue).cache().count()

*Datasets*
spark.range(Int.MaxValue).cache().count()

For me, the Datasets version was about 2 times slower.

Results (3 nodes, 20 cores and 48GB RAM each):
*RDD - 6 s*
*Datasets - 13.5 s*

Is that expected behavior for Datasets and Encoders?

Regards,
-- 
Maciek Bryński