[GitHub] spark pull request #15593: [SPARK-18060][ML] Avoid unnecessary computation f...

MLnick Fri, 04 Nov 2016 00:14:30 -0700

Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15593#discussion_r86497058
  
    --- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
    @@ -1486,57 +1489,75 @@ private class LogisticAggregator(
         var marginOfLabel = 0.0
         var maxMargin = Double.NegativeInfinity
     
    -    val margins = Array.tabulate(numClasses) { i =>
    -      var margin = 0.0
    -      features.foreachActive { (index, value) =>
    -        if (localFeaturesStd(index) != 0.0 && value != 0.0) {
    -          margin += localCoefficients(i * numFeaturesPlusIntercept + 
index) *
    -            value / localFeaturesStd(index)
    -        }
    +    val margins = new Array[Double](numClasses)
    --- End diff --
    
    I think a slightly more detailed comment would be good for the aggregator, 
something like the following (please feel free to make it more clear):
    
    ```
    In order to avoid unnecessary computation during calculation of the 
gradient updates,
    we lay out the coefficients in column major order during training. This 
allows us to 
    perform feature standardization once, while still retaining sequential 
memory access 
    for speed. We convert back to row-major order when we create the model, 
    since this form is optimal for the matrix operations used for prediction.
    ```
    
    Here we can amend slightly to say:
    ```
    Additionally, since the coefficients were laid out in column major order 
during training to
    avoid extra computation, we convert them back to row major before passing 
them to the model.
    ```




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #15593: [SPARK-18060][ML] Avoid unnecessary computation f...

Reply via email to