[ https://issues.apache.org/jira/browse/SPARK-3623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Guoqiang Li updated SPARK-3623:
-------------------------------

    Description:
Consider the following code:
{code}
for (i <- 0 until totalIter) {
  val previousCorpus = corpus
  logInfo("Start Gibbs sampling (Iteration %d/%d)".format(i, totalIter))
  // Compute the per-term topic distributions for the current corpus.
  val corpusTopicDist = collectTermTopicDist(corpus, globalTopicCounter, sumTerms,
    numTerms, numTopics, alpha, beta).persist(storageLevel)
  // Sample new topic assignments from those distributions.
  val corpusSampleTopics = sampleTopics(corpusTopicDist, globalTopicCounter, sumTerms,
    numTerms, numTopics, alpha, beta).persist(storageLevel)
  // Derive the next corpus graph from this iteration's samples.
  corpus = updateCounter(corpusSampleTopics, numTopics).persist(storageLevel)
  globalTopicCounter = collectGlobalCounter(corpus, numTopics)
  assert(bsum(globalTopicCounter) == sumTerms)
  // Unpersist the intermediates; the lineage still grows every iteration.
  previousCorpus.unpersistVertices()
  corpusTopicDist.unpersistVertices()
  corpusSampleTopics.unpersistVertices()
}
{code}
Without a checkpoint operation, the following problems appear:
1. The lineage of the corpus RDD grows too deep.
2. The shuffle files grow too large.
3. Any server crash forces the algorithm to recompute from the beginning.

  was:
Consider the following code:
{code}
for (i <- 0 until totalIter) {
  val previousCorpus = corpus
  logInfo("Start Gibbs sampling (Iteration %d/%d)".format(i, totalIter))
  val corpusTopicDist = collectTermTopicDist(corpus, globalTopicCounter, sumTerms,
    numTerms, numTopics, alpha, beta).persist(storageLevel)
  val corpusSampleTopics = sampleTopics(corpusTopicDist, globalTopicCounter, sumTerms,
    numTerms, numTopics, alpha, beta).persist(storageLevel)
  corpus = updateCounter(corpusSampleTopics, numTopics).persist(storageLevel)
  globalTopicCounter = collectGlobalCounter(corpus, numTopics)
  assert(bsum(globalTopicCounter) == sumTerms)
  previousCorpus.unpersistVertices()
  corpusTopicDist.unpersistVertices()
  corpusSampleTopics.unpersistVertices()
}
{code}
Without a checkpoint operation, the following problems appear:
1. The lineage of the corpus RDD grows too deep.
2. The shuffle files grow too large.


> GraphX does not support the checkpoint operation
> ------------------------------------------------
>
>                 Key: SPARK-3623
>                 URL: https://issues.apache.org/jira/browse/SPARK-3623
>             Project: Spark
>          Issue Type: Improvement
>          Components: GraphX
>    Affects Versions: 1.0.2, 1.1.0
>            Reporter: Guoqiang Li
>            Priority: Critical
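For reference, until GraphX exposes a checkpoint operation of its own, the lineage can be truncated by checkpointing the graph's underlying RDDs directly: vertices and edges are RDDs, and RDD.checkpoint() is already part of the core API. The sketch below is a minimal illustration of that workaround, not code from this issue; checkpointInterval and the checkpoint directory are hypothetical choices, and the count() calls exist only to force materialization so the checkpoint files are actually written.
{code}
// Minimal sketch of a workaround (not GraphX API): assumes the loop from the
// description, a SparkContext `sc`, and a reachable checkpoint directory.
sc.setCheckpointDir("/tmp/graphx-checkpoint") // hypothetical path

val checkpointInterval = 10 // hypothetical tuning knob

for (i <- 0 until totalIter) {
  // ... one Gibbs sampling iteration as above, producing a new `corpus` ...
  if ((i + 1) % checkpointInterval == 0) {
    // Graph itself has no checkpoint(), but its vertices and edges are RDDs,
    // and RDD.checkpoint() cuts the lineage once the RDD is materialized.
    corpus.vertices.checkpoint()
    corpus.edges.checkpoint()
    corpus.vertices.count() // actions force the checkpoint files to be written
    corpus.edges.count()
  }
}
{code}
Once a checkpoint is written, the dependency chain restarts from the checkpoint files instead of replaying the whole lineage, which is what bounds all three problems listed above, at the cost of a periodic write to stable storage.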