GitHub user srowen commented on the issue: https://github.com/apache/spark/pull/23016

I'm not sure about that, because the returned PrefixSpanModel holds an RDD that depends on that RDD. We could instead cache and materialize the final RDD; that could make more sense. In other places we have done this only when the input is cached, to follow the caller's lead, but there isn't a consistent standard for it. I'd be OK with improving this to persist only the final RDD and then unpersist the intermediate one. That at least makes more sense. You can cache at the same storage level as the input (which might be NONE).
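
To make the suggestion concrete, here is a rough sketch of the pattern, not the actual PrefixSpan change. The helper name `persistFinalLikeInput` and its parameters are hypothetical; it assumes `result` is a freshly derived RDD with no storage level set yet, and it only uses the standard `RDD.getStorageLevel`, `persist`, `count`, and `unpersist` calls.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Hypothetical helper: cache the final RDD at the input's storage level,
  // materialize it, then release the intermediate RDD it was derived from.
  def persistFinalLikeInput[I, M, R](
      input: RDD[I],
      intermediate: RDD[M],
      result: RDD[R]): RDD[R] = {
    // Follow the caller's lead: reuse whatever level the input was cached at.
    val level = input.getStorageLevel
    if (level != StorageLevel.NONE) {
      result.persist(level)
      // Run an action so the cached blocks exist before the parent is dropped.
      result.count()
    }
    // Once the result is materialized, the intermediate cache is not needed.
    intermediate.unpersist()
    result
  }
}
```

If the input is not cached (level NONE), the result is left uncached too, so later actions on it would recompute the lineage, including the intermediate RDD.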