This is the default value (Double.MinValue) used for an invalid gain:

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/model/InformationGainStats.scala#L67
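For anyone dumping trees, here is a minimal sketch of treating that default as "no real split". It does not use Spark at all: `SplitStats` is a hypothetical stand-in for the node stats, modeling only the two fields printed in the quoted dump. The sentinel shape (gain = Double.MinValue, impurity = -1.0) is assumed from that dump and the linked default, not taken from Spark's API.

```scala
// Hypothetical, Spark-free stand-in for the stats attached to a tree node;
// only the two fields printed in the quoted dump are modeled.
final case class SplitStats(gain: Double, impurity: Double)

object SplitStats {
  // Assumed sentinel shape: gain = Double.MinValue, impurity = -1.0,
  // matching the values in the quoted leaf-node dump.
  val Invalid: SplitStats = SplitStats(Double.MinValue, -1.0)

  // True when the stats carry the "invalid gain" default rather than a real split.
  def isInvalid(s: SplitStats): Boolean = s.gain == Double.MinValue

  // Drop sentinel stats so callers only ever see real splits.
  def valid(stats: Option[SplitStats]): Option[SplitStats] =
    stats.filterNot(isInvalid)
}

object SentinelDemo {
  def main(args: Array[String]): Unit = {
    val real = Some(SplitStats(gain = 0.12, impurity = 0.32))
    println(SplitStats.valid(real))                     // Some(SplitStats(0.12,0.32))
    println(SplitStats.valid(Some(SplitStats.Invalid))) // None
  }
}
```

With this filter, the `case None` branch of a pattern match also covers the sentinel, so leaf nodes print the same way whether their stats are missing or invalid.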

Please ignore it. Maybe we should update `toString` to use scientific notation.
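To illustrate the notation point: Scala's `Double.MinValue` is the most negative finite double (about -1.8e308), not Java's `Double.MIN_VALUE`. Formatting it with `%f` prints every integer digit, which produces the long string in the dump. The helpers below are a hypothetical sketch of a `toString`-style formatter, not Spark code. They pin `Locale.US` so the decimal separator does not depend on the JVM's default locale.

```scala
import java.util.Locale

object GainFormatting {
  // Fixed-point output, as in the quoted dump: every integer digit of the value is printed.
  def fixed(x: Double): String = "%f".formatLocal(Locale.US, x)

  // Scientific notation keeps extreme values such as Double.MinValue short.
  def scientific(x: Double): String = "%e".formatLocal(Locale.US, x)

  def main(args: Array[String]): Unit = {
    val invalidGain = Double.MinValue     // most negative finite Double
    println(fixed(invalidGain).length)    // over 300 characters
    println(scientific(invalidGain))      // -1.797693e+308
  }
}
```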

-Xiangrui


On Mon, Mar 16, 2015 at 5:19 PM, cjwang <c...@cjwang.us> wrote:
> I dumped the trees in the random forest model, and occasionally saw a leaf
> node with strange stats:
>
> - pred=1.000000 prob=0.800000 imp=-1.000000
> gain=-179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000
>
>
> Here impurity = -1 and gain is a huge negative number. Normally, Node.stats
> is None at a leaf node, but here the stats were printed because the Some(s)
> case matched:
>
>             node.stats match {
>                 case Some(s) => println(" imp=%f gain=%f".format(s.impurity, s.gain))
>                 case None => println()
>             }
>
>
> Is it a bug?
>
> This doesn't seem to happen in models built with DecisionTree, but my data
> sets are limited.
>
>
>
> --
> View this message in context: 
> http://apache-spark-user-list.1001560.n3.nabble.com/Garbage-stats-in-Random-Forest-leaf-node-tp22087.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
