GitHub user XXXShao commented on a diff in the pull request:

    https://github.com/apache/spark/pull/16722#discussion_r137897201
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
    @@ -1002,9 +1018,9 @@ private[spark] object RandomForest extends Logging {
           val numSplits = metadata.numSplits(featureIndex)
     
           // get count for each distinct value
    -      val (valueCountMap, numSamples) = featureSamples.foldLeft((Map.empty[Double, Int], 0)) {
    -        case ((m, cnt), x) =>
    -          (m + ((x, m.getOrElse(x, 0) + 1)), cnt + 1)
    +      val (valueCountMap, numSamples) = featureSamples.foldLeft((Map.empty[Double, Double], 0.0)) {
    --- End diff --
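For context, the hunk above replaces integer occurrence counts with Double-valued counts, which is what lets instance weights flow into the per-value counts. The Scala sketch below contrasts the two folds. It is not the PR's code: the hunk does not show how weights reach `featureSamples`, so the `(value, weight)` pair input and the `ValueCounts` object name are hypothetical.

```scala
// Minimal sketch, not the PR's implementation: per-value counts for one
// continuous feature, unweighted vs. weighted.
object ValueCounts {

  // Unweighted (mirrors the removed line): each occurrence of a value adds 1,
  // and the second component counts samples.
  def unweighted(samples: Seq[Double]): (Map[Double, Int], Int) =
    samples.foldLeft((Map.empty[Double, Int], 0)) {
      case ((m, cnt), x) =>
        (m + ((x, m.getOrElse(x, 0) + 1)), cnt + 1)
    }

  // Weighted: ASSUMES each sample arrives as a hypothetical (value, weight) pair.
  // Each occurrence adds its instance weight, and the second component becomes
  // the total weight rather than a sample count.
  def weighted(samples: Seq[(Double, Double)]): (Map[Double, Double], Double) =
    samples.foldLeft((Map.empty[Double, Double], 0.0)) {
      case ((m, total), (x, w)) =>
        (m + ((x, m.getOrElse(x, 0.0) + w)), total + w)
    }
}
```

With all weights equal to 1.0 the two folds agree; otherwise a heavily weighted value dominates its entry in the map.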
    
    Hi, thanks for your contribution! I have a question about taking weight
information into account in findSplitsForContinuousFeature here. It looks like
continuous features will be influenced much more by instance weights, because
the weights are considered twice: (1) when choosing split candidates and (2)
when calculating impurity. In the limited number of papers I have read, weights
are normally applied only in the impurity calculation. Could you point me to the
reference you followed here? And please correct me if I have misunderstood your
code. :) Thanks!
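To make the "considered twice" concern concrete, here is a simplified Scala sketch of how weighted counts can move split candidates before impurity is ever computed. It assumes an equal-mass rule (a threshold roughly every `total / (numSplits + 1)` units of cumulative count), which is a simplification and not Spark's exact `findSplitsForContinuousFeature` logic; `QuantileSplits` and `thresholds` are hypothetical names.

```scala
// Simplified sketch, NOT Spark's algorithm: pick up to numSplits thresholds so
// that each bin holds roughly equal cumulative (possibly weighted) count.
object QuantileSplits {

  // valueCounts: distinct feature value -> count (Int counts as Double, or weights).
  def thresholds(valueCounts: Map[Double, Double], numSplits: Int): Seq[Double] = {
    val sorted = valueCounts.toSeq.sortBy(_._1)
    if (sorted.isEmpty || numSplits <= 0) Seq.empty
    else {
      val values = sorted.map(_._1)
      // Cumulative mass up to and including each distinct value.
      val cumulative = sorted.map(_._2).scanLeft(0.0)(_ + _).tail
      val stride = cumulative.last / (numSplits + 1)
      (1 to numSplits).flatMap { k =>
        val target = k * stride
        // First value whose cumulative mass reaches this target.
        values.zip(cumulative).collectFirst { case (v, c) if c >= target => v }
      }.distinct.filter(_ < values.last) // a split at the max value separates nothing
    }
  }
}
```

With four values of equal mass and one split, the threshold lands at 2.0; giving value 1.0 a weight of 3.0 pulls it down to 1.0. Under this assumed rule the weights have already reshaped the candidate splits, and they are then used again when impurity is computed for each candidate.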


