[GitHub] spark pull request #21183: [SPARK-22210][ML] Add seed for LDA variationalTop...

jkbradley Mon, 30 Apr 2018 10:17:32 -0700

Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21183#discussion_r185049467
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala ---
    @@ -605,14 +609,16 @@ private[clustering] object OnlineLDAOptimizer {
           expElogbeta: BDM[Double],
           alpha: breeze.linalg.Vector[Double],
           gammaShape: Double,
    -      k: Int): (BDV[Double], BDM[Double], List[Int]) = {
    +      k: Int,
    +      seed: Long): (BDV[Double], BDM[Double], List[Int]) = {
         val (ids: List[Int], cts: Array[Double]) = termCounts match {
           case v: DenseVector => ((0 until v.size).toList, v.values)
           case v: SparseVector => (v.indices.toList, v.values)
         }
         // Initialize the variational distribution q(theta|gamma) for the mini-batch
    +    val randBasis = new RandBasis(new org.apache.commons.math3.random.MersenneTwister(seed))
         val gammad: BDV[Double] =
    -      new Gamma(gammaShape, 1.0 / gammaShape).samplesVector(k)                   // K
    +      new Gamma(gammaShape, 1.0 / gammaShape)(randBasis).samplesVector(k) // K
    --- End diff --
    
    nit: The original spacing before the `// K` comment was intentional, so that it lines up with the comments on the lines below.



