GitHub user mhmoudr opened a pull request:

    https://github.com/apache/spark/pull/13588

    SPARK-15858: Fix calculating error by tree stack over flow problem an…

    ## What changes were proposed in this pull request?
    
    Improve the evaluateEachIteration function in MLlib, which currently fails 
when calculating the error by tree for a model that has more than 500 trees.
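
For readers unfamiliar with the function, the following is a minimal sketch of what "error by tree" means, written in plain Python rather than Spark. It is not the code in this patch. The names `error_by_iteration`, `trees`, `weights` and `data` are hypothetical, squared error is assumed as the loss, and a "tree" is modelled as any function from a feature vector to a number. The sketch keeps one running prediction per row and updates it in a flat loop, so neither the depth of the computation nor the memory held grows with the number of trees.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in: a "tree" is any function from features to a prediction.
Tree = Callable[[Sequence[float]], float]
Row = Tuple[Sequence[float], float]  # (features, label)


def error_by_iteration(
    trees: Sequence[Tree],
    weights: Sequence[float],
    data: Sequence[Row],
) -> List[float]:
    """Return the mean squared error of the ensemble after each iteration.

    Entry k is the error of the model built from the first k + 1 trees.
    """
    if not data:
        raise ValueError("data must contain at least one row")

    # One running prediction per row, updated in place for every tree.
    running = [0.0] * len(data)
    errors: List[float] = []

    for tree, weight in zip(trees, weights):
        total = 0.0
        for i, (features, label) in enumerate(data):
            running[i] += weight * tree(features)
            total += (running[i] - label) ** 2
        errors.append(total / len(data))

    return errors
```

With two constant trees that each predict 1.0, weights 1.0 and 0.5, and labels 2.0 and 1.0, the function returns [0.5, 0.25]. Because the loop is iterative, the same call works unchanged for several thousand trees.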
    
    ## How was this patch tested?
    
    The patch was tested on a production data set (2K rows x 2K features) by 
training a gradient boosted model without validation, with maxIteration set to 
1000, and then computing the error by tree. With the patch the calculation 
completed within 30 seconds, whereas previously it ran for hours and then failed.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mhmoudr/spark SPARK-15858.1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13588.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13588
    
----
commit 4726937bacd6ee43dd12b27e1746bc708e99c6da
Author: Mahmoud Rawas <mhmo...@gmail.com>
Date:   2016-06-10T01:27:21Z

    SPARK-15858: Fix stack overflow and excessive memory allocation when 
calculating error by tree for a model that has 2000+ trees.

----

