GitHub user smurching opened a pull request: https://github.com/apache/spark/pull/13881

    [SPARK-3723] [MLlib] Adding instrumentation to random forests

    ## What changes were proposed in this pull request?

    In RandomForest.run(), added instrumentation for the number of node groups, along with the min, max, and average number of nodes per group. Also fixed a typo in BaggedPoint.scala documentation.

    ## How was this patch tested?

    Tested by running RandomForestClassifierSuite, checking the test output manually to make sure instrumentation information was present and reasonable.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/smurching/spark random-forest-instrumentation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13881.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13881

----
commit 8f45533b9a5f7c3c1f46d0d15a9f1815fa6227d5
Author: Siddharth Murching <smurch...@databricks.com>
Date:   2016-06-23T23:40:26Z

    Fix typo in BaggedPoint.scala, add simple instrumentation to Random Forests

commit bd7d24d4f5a79eca6ff9629706c254beba74bc45
Author: Siddharth Murching <smurch...@databricks.com>
Date:   2016-06-24T00:40:02Z

    Reorder instrumentation logging statements to look nicer

----