[GitHub] spark pull request #13588: SPARK-15858: Fix calculating error by tree stack ...

2016-06-12 Thread mhmoudr
Github user mhmoudr closed the pull request at:

https://github.com/apache/spark/pull/13588


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13588: SPARK-15858: Fix calculating error by tree stack ...

2016-06-09 Thread mhmoudr
GitHub user mhmoudr opened a pull request:

https://github.com/apache/spark/pull/13588

SPARK-15858: Fix calculating error by tree stack over flow problem an…

## What changes were proposed in this pull request?

Improve the evaluateEachIteration function in MLlib, which fails when trying 
to calculate the error by tree for a model that has more than 500 trees.
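
For readers unfamiliar with what "error by tree" means here, the following is a minimal sketch in plain Python, with no Spark dependency. It is not the code from this patch. It only illustrates one common way to compute the ensemble error after each boosting iteration with a flat loop and a running prediction per row, so that the first k trees are not re-evaluated for every k. The function name `evaluate_each_iteration`, the `Tree` type, and the toy data are all hypothetical, and mean squared error is assumed as the loss.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]
# Hypothetical stand-in for a fitted regression tree: maps a row to a prediction.
Tree = Callable[[Row], float]


def evaluate_each_iteration(
    trees: List[Tree],
    weights: List[float],
    rows: List[Row],
    labels: List[float],
) -> List[float]:
    """Return the mean squared error of the ensemble after each tree is added."""
    running = [0.0] * len(rows)  # cumulative weighted prediction per row
    errors = []
    for tree, weight in zip(trees, weights):
        # Add this tree's contribution to the running prediction instead of
        # recomputing the whole prefix of the ensemble from scratch.
        for i, row in enumerate(rows):
            running[i] += weight * tree(row)
        mse = sum((p - y) ** 2 for p, y in zip(running, labels)) / len(rows)
        errors.append(mse)
    return errors


# Toy usage (assumed data): 1000 constant "trees", each weighted by 0.001.
rows = [[0.0], [1.0]]
labels = [1.0, 1.0]
trees = [lambda row: 1.0] * 1000
weights = [0.001] * 1000
errors = evaluate_each_iteration(trees, weights, rows, labels)
```

In this sketch the cost is linear in the number of trees and the loop depth is constant, which is the general property one would want from a per-iteration evaluation over a large ensemble.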

## How was this patch tested?

The patch was tested on a production data set (2K rows x 2K features) by training a 
gradient boosted model without validation with the maxIteration setting at 1000, then 
trying to produce the error by tree. With the new patch the calculation completed 
within 30 seconds, while previously it would take hours and then fail.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mhmoudr/spark SPARK-15858.1.6

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/13588.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13588


commit 4726937bacd6ee43dd12b27e1746bc708e99c6da
Author: Mahmoud Rawas 
Date:   2016-06-10T01:27:21Z

SPARK-15858: Fix calculating error by tree stack over flow problem and over 
memory allocation issue for a model that have 2000+ trees.



