[GitHub] spark pull request #19565: [SPARK-22111][MLLIB] OnlineLDAOptimizer should fi...

hhbyyh Wed, 25 Oct 2017 17:24:51 -0700

Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19565#discussion_r147021004
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
    @@ -446,14 +445,14 @@ final class OnlineLDAOptimizer extends LDAOptimizer with Logging {
       override private[clustering] def next(): OnlineLDAOptimizer = {
         val batch = docs.sample(withReplacement = sampleWithReplacement, miniBatchFraction,
           randomGenerator.nextLong())
    -    if (batch.isEmpty()) return this
         submitMiniBatch(batch)
       }
     
       /**
        * Submit a subset (like 1%, decide by the miniBatchFraction) of the corpus to the Online LDA
        * model, and it will update the topic distribution adaptively for the terms appearing in the
        * subset.
    +   * The methods assumes no empty documents are submitted.
    --- End diff --
    
    Maybe add a require() so the "no empty documents" assumption is checked rather than only documented.
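
    A minimal sketch of what such a check could look like, written against plain
    Scala collections so it runs without Spark. The names RequireSketch, Doc and
    isEmptyDoc are hypothetical, and this submitMiniBatch is only a stand-in for
    the real method, which takes an RDD. On an RDD an equivalent check would
    trigger an extra Spark job, so whether it is worth the cost is an open question.

```scala
object RequireSketch {
  // Hypothetical stand-in for a document: (document id, term counts).
  // The real optimizer works on an RDD of (Long, Vector); this is an assumption
  // made only to keep the example self-contained.
  type Doc = (Long, Array[Double])

  // A document counts as empty when every term count is zero.
  def isEmptyDoc(doc: Doc): Boolean = doc._2.forall(_ == 0.0)

  // Hypothetical analogue of submitMiniBatch: fail fast on empty documents
  // instead of relying on the doc comment alone. Returns the batch size.
  def submitMiniBatch(batch: Seq[Doc]): Int = {
    require(batch.forall(d => !isEmptyDoc(d)),
      "submitMiniBatch assumes no empty documents are submitted")
    batch.size
  }
}
```

    require throws IllegalArgumentException when the predicate is false, so a
    caller that passes an empty document fails immediately with the message above.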


