GitHub user smurching opened a pull request:

    https://github.com/apache/spark/pull/13881

    [SPARK-3723] [MLlib] Adding instrumentation to random forests

    ## What changes were proposed in this pull request?
    
    Added instrumentation to RandomForest.run() that logs the number of node
    groups, along with the minimum, maximum, and average number of nodes per group.
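
    The per-group statistics described above can be sketched as follows. This is
    a minimal illustration, not the code from this patch. It assumes the size of
    each node group has already been collected into a `Seq[Int]`. The names
    `NodeGroupStats` and `NodeGroupSummary.summarize` are hypothetical.

    ```scala
    // Hypothetical types, not part of Spark or of this PR: summarize how many
    // nodes were placed in each node group during one training run.
    final case class NodeGroupStats(numGroups: Int, min: Int, max: Int, avg: Double)

    object NodeGroupSummary {
      // groupSizes(i) is the (assumed) number of nodes in the i-th node group
      def summarize(groupSizes: Seq[Int]): NodeGroupStats = {
        require(groupSizes.nonEmpty, "expected at least one node group")
        NodeGroupStats(
          numGroups = groupSizes.length,
          min = groupSizes.min,
          max = groupSizes.max,
          // convert to Double before dividing to avoid integer division
          avg = groupSizes.sum.toDouble / groupSizes.length)
      }
    }
    ```

    For example, `NodeGroupSummary.summarize(Seq(4, 2, 6))` yields 3 groups, with
    a minimum of 2, a maximum of 6, and an average of 4.0 nodes per group.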
    
    Also fixed a typo in the BaggedPoint.scala documentation.
    
    ## How was this patch tested?
    
    Tested by running RandomForestClassifierSuite and manually checking the test
    output to confirm that the instrumentation information was present and reasonable.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/smurching/spark random-forest-instrumentation

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13881.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13881
    
----
commit 8f45533b9a5f7c3c1f46d0d15a9f1815fa6227d5
Author: Siddharth Murching <smurch...@databricks.com>
Date:   2016-06-23T23:40:26Z

    Fix typo in BaggedPoint.scala, add simple instrumentation to Random Forests

commit bd7d24d4f5a79eca6ff9629706c254beba74bc45
Author: Siddharth Murching <smurch...@databricks.com>
Date:   2016-06-24T00:40:02Z

    Reorder instrumentation logging statements to look nicer

----