I dumped the trees in the random forest model, and occasionally saw a leaf node with strange stats:
- pred=1.000000 prob=0.800000 imp=-1.000000 gain=-179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000.000000

Here impurity = -1 and gain is a giant negative number. Normally, I get None from Node.stats at a leaf node. Here the stats were printed because the Some(s) case matched:

    node.stats match {
      case Some(s) => println(" imp=%f gain=%f" format(s.impurity, s.gain))
      case None => println()
    }

Is this a bug? It doesn't seem to happen with the model from DecisionTree, but my data sets are limited.
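For reference, a minimal sketch of how the dump could skip such values, assuming they are placeholders and not real measurements. In Scala, Double.MinValue is the most negative finite double (about -1.8e308), which looks like the gain printed above. The names Stats, isPlaceholder, and describe are hypothetical stand-ins for illustration only; they are not part of MLlib.

```scala
object StatsCheck {
  // Hypothetical stand-in for the two fields read from Node.stats in the post;
  // this is not the MLlib class itself.
  case class Stats(impurity: Double, gain: Double)

  // Assumption: a gain of Double.MinValue or a negative impurity marks
  // placeholder stats, since a real impurity measure is never negative.
  def isPlaceholder(s: Stats): Boolean =
    s.gain == Double.MinValue || s.impurity < 0.0

  // Format the stats suffix for one node, or return an empty string when
  // there are no stats or only placeholder stats.
  def describe(stats: Option[Stats]): String = stats match {
    case Some(s) if !isPlaceholder(s) =>
      " imp=%f gain=%f".format(s.impurity, s.gain)
    case _ => ""
  }

  def main(args: Array[String]): Unit = {
    // Values taken from the leaf node shown in the post: nothing is printed for them.
    println("[" + describe(Some(Stats(-1.0, Double.MinValue))) + "]")
    // Made-up values for an ordinary internal node, for comparison.
    println("[" + describe(Some(Stats(0.32, 0.05))) + "]")
    println("[" + describe(None) + "]")
  }
}
```

With the real model, the same guard could be applied to s.impurity and s.gain inside the Some(s) case, or the stats could simply be skipped whenever the node is a leaf.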