This is the default value (Double.MinValue) for invalid gain: https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/model/InformationGainStats.scala#L67
Please ignore it. Maybe we should update `toString` to use scientific
notation.

-Xiangrui

On Mon, Mar 16, 2015 at 5:19 PM, cjwang <c...@cjwang.us> wrote:
> I dumped the trees in the random forest model, and occasionally saw a leaf
> node with strange stats:
>
> - pred=1.000000 prob=0.800000 imp=-1.000000
> gain=-179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000
>
> Here impurity = -1 and gain = a giant negative number. Normally, I would
> get a None from Node.stats at a leaf node. Here it printed because Some(s)
> matches:
>
>     node.stats match {
>       case Some(s) => println(" imp=%f gain=%f" format(s.impurity, s.gain))
>       case None => println()
>     }
>
> Is it a bug?
>
> This doesn't seem to happen in the model from DecisionTree, but my data sets
> are limited.
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Garbage-stats-in-Random-Forest-leaf-node-tp22087.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
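
For anyone dumping trees in the meantime, here is a minimal sketch of how the
placeholder could be skipped when printing and how scientific notation keeps
the output short. It does not depend on Spark: `Stats`, `LeafStats`,
`isInvalid` and `describe` are hypothetical names for illustration, and
`Stats` models only the two fields printed in the snippet above. The check
assumes, based on the link and the dump above, that an invalid gain is stored
as Double.MinValue.

```scala
object LeafStats {
  // Hypothetical stand-in for MLlib's InformationGainStats; only the two
  // fields printed in the original snippet are modeled here.
  case class Stats(gain: Double, impurity: Double)

  // Assumption: an invalid split carries gain == Double.MinValue, the
  // default value mentioned in the reply above.
  def isInvalid(s: Stats): Boolean = s.gain == Double.MinValue

  // Formats the stats of a node, returning an empty string for nodes with
  // no stats or with the invalid-gain placeholder. "%e" prints the gain in
  // scientific notation instead of hundreds of digits.
  def describe(stats: Option[Stats]): String = stats match {
    case Some(s) if !isInvalid(s) =>
      " imp=%f gain=%e".format(s.impurity, s.gain)
    case _ => ""
  }
}
```

With this, a call such as `println(LeafStats.describe(someStats))` prints an
empty line for the leaf shown above, the same as for a node whose stats are
None.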