Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/23016#discussion_r234395721
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/fpm/PrefixSpan.scala ---
    @@ -174,6 +174,10 @@ class PrefixSpan private (
         val freqSequences = results.map { case (seq: Array[Int], count: Long) =>
           new FreqSequence(toPublicRepr(seq), count)
         }
    +    // Cache the final RDD to the same storage level as input
    +    freqSequences.persist(data.getStorageLevel)
    --- End diff --
    
    The problem here is that freqSequences won't actually be persisted until something materializes it, and by that point the RDD it depends on, dataInternalRepr, has already been unpersisted.
    
    I'd say that _if_ the input's storage level isn't NONE, then persist freqSequences at the same level and .count() it to materialize it. Then unpersist dataInternalRepr in all cases.
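    A minimal sketch of the ordering suggested above, written against the public RDD API (`getStorageLevel`, `persist`, `count`, `unpersist`). The object and method names are hypothetical and are not part of Spark or of this PR; `intermediate` stands in for dataInternalRepr and `derive` for the `results.map { ... }` step.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

// Hypothetical helper for illustration only; not part of Spark or this PR.
object PersistThenMaterialize {

  /**
   * @param input        the user-supplied RDD whose storage level is mirrored
   * @param intermediate cached RDD the result is derived from (role of dataInternalRepr)
   * @param derive       builds the final RDD from `intermediate` (role of results.map { ... })
   */
  def finish[A, B, C](input: RDD[A], intermediate: RDD[B])(derive: RDD[B] => RDD[C]): RDD[C] = {
    val result = derive(intermediate)
    val level = input.getStorageLevel
    if (level != StorageLevel.NONE) {
      // persist() only marks the RDD for caching; nothing is stored until an action runs.
      result.persist(level)
      // Run an action now, while `intermediate` is still cached, so the result's
      // partitions are computed from the cached data and stored at `level`.
      result.count()
    }
    // Release the intermediate in all cases. If `result` was cached above it no longer
    // needs it; otherwise Spark recomputes `result` from lineage whenever it is used.
    intermediate.unpersist()
    result
  }
}
```

    Skipping the persist and count() when the input is uncached avoids an extra pass over the data for callers who never asked for caching; unpersisting unconditionally keeps dataInternalRepr from lingering in the block manager either way.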

