So far, the canonical way to materialize an RDD just to make sure it's cached is to call count(). That's fine but incurs the overhead of actually counting the elements.
However, rdd.foreachPartition(p => None), for example, also seems to cause the RDD to be materialized, and the function itself is a no-op. Is that a better way to do it, or is there a reason it's insufficient that I'm not thinking of?
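For concreteness, here is a minimal sketch of what I mean, assuming a local SparkSession. The object name, the materialize helper and the sample data are made up for illustration. As far as I understand it, Spark stores a persisted partition when the task asks for its iterator, so the no-op function should not need to consume any rows, but I have not confirmed that for every storage level.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object MaterializeCache {
  // Hypothetical helper: runs one job over every partition and ignores the rows.
  // `_ => ()` returns Unit, which is the idiomatic form of `p => None` here.
  def materialize[T](rdd: RDD[T]): RDD[T] = {
    rdd.foreachPartition(_ => ())
    rdd
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local two-core master, just for the experiment.
    val spark = SparkSession.builder()
      .appName("materialize-cache")
      .master("local[2]")
      .getOrCreate()
    val sc = spark.sparkContext
    try {
      // Mark the RDD as cached; nothing is computed until an action runs.
      val rdd = sc.parallelize(1 to 100000, numSlices = 8).map(_ * 2).cache()

      // Candidate replacement for rdd.count().
      materialize(rdd)

      // getRDDStorageInfo is a developer API and its numbers can lag slightly
      // behind the job, so treat the output as a rough check only.
      sc.getRDDStorageInfo.foreach { info =>
        println(s"RDD ${info.id}: ${info.numCachedPartitions} of ${info.numPartitions} partitions cached")
      }
    } finally {
      spark.stop()
    }
  }
}
```

Swapping materialize(rdd) for rdd.count() in the same program and comparing the job durations in the Spark UI would show whether the counting overhead is actually measurable.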