[GitHub] spark issue #18305: [SPARK-20988][ML] Logistic regression uses aggregator hi...

MLnick Mon, 10 Jul 2017 22:16:37 -0700

Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18305
  
    I get it in source code. I'm indifferent in tests. If it's easy sure.
    
    How does it help uncover bugs though?
    On Tue, 11 Jul 2017 at 04:42, Yanbo Liang <notificati...@github.com> wrote:
    
    > *@yanboliang* commented on this pull request.
    > ------------------------------
    >
    > In
    > 
mllib/src/test/scala/org/apache/spark/ml/optim/aggregator/LogisticAggregatorSuite.scala
    > <https://github.com/apache/spark/pull/18305#discussion_r126582816>:
    >
    > > +    )
    > +  }
    > +
    > +
    > +  /** Get summary statistics for some data and create a new 
LogisticAggregator. */
    > +  private def getNewAggregator(
    > +      instances: Array[Instance],
    > +      coefficients: Vector,
    > +      fitIntercept: Boolean,
    > +      isMultinomial: Boolean): LogisticAggregator = {
    > +    val (featuresSummarizer, ySummarizer) =
    > +      
DifferentiableLossAggregatorSuite.getClassificationSummarizers(instances)
    > +    val numClasses = ySummarizer.histogram.length
    > +    val featuresStd = featuresSummarizer.variance.toArray.map(math.sqrt)
    > +    val bcFeaturesStd = spark.sparkContext.broadcast(featuresStd)
    > +    val bcCoefficients = spark.sparkContext.broadcast(coefficients)
    >
    > I think we always try to destroy broadcast variable explicitly both in
    > source code and test cases, like here
    > <https://github.com/apache/spark/pull/18152>. Of course, these broadcast
    > variables can be destroyed after spark session is torn down.
    > The reason of why we do this in source code is users application may be
    > long-time running, so it will accumulate lots of these variables, waste
    > lots of resource and slower your application.
    > The reason of why we do this in test case is we should keep same code
    > route as in source code. Since we have encountered similar bugs which was
    > not covered by test cases.
    > But in this case, I think it's safe to not destroy these variables. I just
    > suggested to follow MLlib's convention. Thanks.
    >
    > â
    > You are receiving this because you were mentioned.
    > Reply to this email directly, view it on GitHub
    > <https://github.com/apache/spark/pull/18305#discussion_r126582816>, or 
mute
    > the thread
    > 
<https://github.com/notifications/unsubscribe-auth/AA_SB6vnRqfFLxdk23k9Q51rWhMu0KgCks5sMuEdgaJpZM4N6TDe>
    > .
    >




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #18305: [SPARK-20988][ML] Logistic regression uses aggregator hi...

Reply via email to