So far, the canonical way to materialize an RDD just to make sure it's
cached is to call count(). That's fine but incurs the overhead of
actually counting the elements.

However, rdd.foreachPartition(p => None), for example, also seems to
cause the RDD to be materialized, and it does no per-element work. Is
that a better way to do it, or is there a reason it's insufficient
that I'm not thinking of?
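
For concreteness, here is a minimal sketch of the two approaches side
by side. The local master, app name, object name and toy data are
placeholders of mine, not anything from a real job. It uses
sc.getRDDStorageInfo (a developer API) to report how many partitions
of each RDD ended up cached, which is one way to check whether the
no-op variant populates the cache on a given Spark version.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical harness: names and data are illustrative only.
object MaterializeCacheSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("materialize-cache-sketch")
      .setMaster("local[2]") // assumption: local run, for experimenting
    val sc = new SparkContext(conf)
    try {
      // Two independent cached RDDs, so each approach is checked on its own.
      val viaCount = sc.parallelize(1 to 1000, 4).map(_ * 2).setName("viaCount").cache()
      val viaNoOp = sc.parallelize(1 to 1000, 4).map(_ * 2).setName("viaNoOp").cache()

      // Approach 1: count() runs a job over every partition and
      // iterates over each element to count it.
      viaCount.count()

      // Approach 2: also runs a job over every partition, but the
      // function ignores the partition iterator entirely.
      viaNoOp.foreachPartition(_ => ())

      // Report cached vs. total partitions for each persisted RDD.
      sc.getRDDStorageInfo.foreach { info =>
        println(s"${info.name}: ${info.numCachedPartitions} of ${info.numPartitions} partitions cached")
      }
    } finally {
      sc.stop()
    }
  }
}
```

I wrote the no-op as `_ => ()` rather than `p => None` only because
foreachPartition expects a function returning Unit; both ignore the
iterator.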
