Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/15593#discussion_r86497058 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala --- @@ -1486,57 +1489,75 @@ private class LogisticAggregator( var marginOfLabel = 0.0 var maxMargin = Double.NegativeInfinity - val margins = Array.tabulate(numClasses) { i => - var margin = 0.0 - features.foreachActive { (index, value) => - if (localFeaturesStd(index) != 0.0 && value != 0.0) { - margin += localCoefficients(i * numFeaturesPlusIntercept + index) * - value / localFeaturesStd(index) - } + val margins = new Array[Double](numClasses) --- End diff -- I think a slightly more detailed comment would be good for the aggregator, something like the following (please feel free to make it more clear): ``` In order to avoid unnecessary computation during calculation of the gradient updates, we lay out the coefficients in column major order during training. This allows us to perform feature standardization once, while still retaining sequential memory access for speed. We convert back to row-major order when we create the model, since this form is optimal for the matrix operations used for prediction. ``` Here we can amend slightly to say: ``` Additionally, since the coefficients were laid out in column major order during training to avoid extra computation, we convert them back to row major before passing them to the model. ```
--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org