I dumped the trees in the random forest model, and occasionally saw a leaf
node with strange stats:

- pred=1.000000 prob=0.800000 imp=-1.000000
gain=-179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000


Here the impurity is -1 and the gain is a giant negative number (its digits match Scala's Double.MinValue, i.e. -1.7976931348623157E308). Normally I get None from Node.stats at a leaf node, but here the Some(s) case matched, so the stats were printed:

            // Print impurity and gain when the node carries split stats;
            // otherwise just end the line.
            node.stats match {
                case Some(s) => println(" imp=%f gain=%f".format(s.impurity, s.gain))
                case None    => println()
            }
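While the question stays open, one way to keep such leaves out of a tree dump is to treat stats carrying these particular values as absent. The sketch below assumes (this is not confirmed by MLlib documentation) that impurity = -1 together with gain = Double.MinValue means "no valid split" rather than real statistics. `SplitStats`, `looksLikeSentinel`, `usableStats` and `describe` are hypothetical stand-ins, with `SplitStats` mirroring only the two fields printed above, so the example runs without Spark.

```scala
// Hypothetical stand-in for the stats object behind Node.stats:
// only the two fields printed in the dump are modelled.
final case class SplitStats(gain: Double, impurity: Double)

object LeafStatsCheck {
  // Assumption: impurity = -1.0 together with gain = Double.MinValue
  // marks "no valid split" rather than real statistics.
  def looksLikeSentinel(s: SplitStats): Boolean =
    s.impurity == -1.0 && s.gain == Double.MinValue

  // Hypothetical helper: drop stats that look like the sentinel, so such
  // a leaf is treated the same way as a leaf whose stats are None.
  def usableStats(stats: Option[SplitStats]): Option[SplitStats] =
    stats.filterNot(looksLikeSentinel)

  // Same formatting as the dump loop; empty string when nothing usable.
  def describe(stats: Option[SplitStats]): String =
    usableStats(stats) match {
      case Some(s) => " imp=%f gain=%f".format(s.impurity, s.gain)
      case None    => ""
    }

  def main(args: Array[String]): Unit = {
    // Scala's Double.MinValue is the most negative finite Double,
    // i.e. -Double.MaxValue, which is what %f rendered above.
    println(Double.MinValue == -Double.MaxValue) // prints true

    val leaf  = Some(SplitStats(gain = Double.MinValue, impurity = -1.0))
    val split = Some(SplitStats(gain = 0.12, impurity = 0.48))

    println("[" + describe(leaf) + "]")  // prints []
    println("[" + describe(split) + "]") // real-looking stats are kept
  }
}
```

The check is deliberately narrow: it only hides stats matching both values at once, so genuine split statistics are still printed unchanged.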


Is it a bug?

This doesn't seem to happen with models trained by DecisionTree, but I have only tried a limited number of data sets.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Garbage-stats-in-Random-Forest-leaf-node-tp22087.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
