[jira] [Commented] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-05-10 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16005844#comment-16005844
 ] 

Hyukjin Kwon commented on SPARK-20228:
--

gentle ping [~Ansgar Schulze]

> Random Forest instable results depending on spark.executor.memory
> -
>
> Key: SPARK-20228
> URL: https://issues.apache.org/jira/browse/SPARK-20228
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.1.0
>Reporter: Ansgar Schulze
>
> If I deploy a random forest model with, for example,
> spark.executor.memory=20480M
> I get a different result than if I deploy the model with
> spark.executor.memory=6000M
> I expected the same results, just different runtimes.
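
Illustrative only: a minimal PySpark sketch of the kind of run described above. The data path, feature columns, and hyperparameters are placeholders, not the reporter's actual setup.

{code}
# Minimal sketch of the reported workflow (placeholder data and settings).
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import RandomForestClassifier

spark = SparkSession.builder.appName("rf-memory-repro").getOrCreate()

# Placeholder input; the reporter's actual data set is not attached to the issue.
df = spark.read.parquet("/path/to/data")
train, test = df.randomSplit([0.8, 0.2], seed=42)

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
rf = RandomForestClassifier(labelCol="label", featuresCol="features", numTrees=100)

model = rf.fit(assembler.transform(train))
predictions = model.transform(assembler.transform(test))
{code}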



[jira] [Commented] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-04-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957257#comment-15957257
 ] 

Sean Owen commented on SPARK-20228:
---

I don't think I would expect exactly the same results even with identical 
settings. These are stochastic implementations. More resources can change 
partitioning, etc. 
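
For what it's worth, the PySpark estimator exposes a seed parameter that removes one source of run-to-run variance; a minimal sketch (column names are assumptions):

{code}
from pyspark.ml.classification import RandomForestClassifier

# Fixing the RNG seed removes one source of variance between runs;
# differences driven by partitioning or resources can remain.
rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                            numTrees=100, seed=12345)
{code}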




[jira] [Commented] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-04-05 Thread Ansgar Schulze (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957411#comment-15957411
 ] 

Ansgar Schulze commented on SPARK-20228:


Hi Sean, thanks for your comment!
It's not a problem that the results differ with the second configuration, but 
they are consistently much worse (about 20% more wrong predictions on the test 
data set). 
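
One way such a gap might be measured; a hedged sketch, assuming a predictions DataFrame with label and prediction columns (not the reporter's actual evaluation code):

{code}
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Quantify the error rate on the held-out test predictions.
evaluator = MulticlassClassificationEvaluator(
    labelCol="label", predictionCol="prediction", metricName="accuracy")
test_error = 1.0 - evaluator.evaluate(predictions)
print("test error = %g" % test_error)
{code}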




[jira] [Commented] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-04-05 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15957530#comment-15957530
 ] 

Sean Owen commented on SPARK-20228:
---

This alone wouldn't matter directly, but it could certainly matter indirectly. 
More resources might mean you can run faster and build better trees in the same 
time. More memory might mean you run fewer executors, and the way the trees are 
built might actually benefit from that. There is a maxMemoryInMB parameter you 
might be increasing, which allows more splits to be evaluated. This isn't enough 
info and isn't evidence of a problem.
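
A sketch of raising the parameter mentioned above; the value 1024 is illustrative (the PySpark default is 256), and the column names are assumptions:

{code}
from pyspark.ml.classification import RandomForestClassifier

# maxMemoryInMB caps the memory used for collecting split statistics;
# a larger value lets more nodes be split per iteration (default: 256).
rf = RandomForestClassifier(labelCol="label", featuresCol="features",
                            numTrees=100, maxMemoryInMB=1024)
{code}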




[jira] [Commented] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-04-06 Thread Ansgar Schulze (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958719#comment-15958719
 ] 

Ansgar Schulze commented on SPARK-20228:


Shouldn't there be a WARN message if there is a memory issue? There is one if 
I set the maxMemoryInMB parameter to a low value, but not in this situation.




[jira] [Commented] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-04-06 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958755#comment-15958755
 ] 

Sean Owen commented on SPARK-20228:
---

I don't think there's a problem here. I'm saying that if you vary these 
parameters, you might legitimately get better or worse results. I'm also asking 
whether you are varying these other things. Also, is this consistent or just 
one run?




[jira] [Commented] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-04-06 Thread Ansgar Schulze (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959168#comment-15959168
 ] 

Ansgar Schulze commented on SPARK-20228:


I varied only spark.executor.memory; everything else was constant. I 
performed 5 runs with each configuration.
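
For concreteness, a sketch of how only this one setting would differ between the two runs (app name is a placeholder; in practice the setting is often passed via spark-submit instead):

{code}
from pyspark.sql import SparkSession

# Run 1 vs. run 2 differ only in this one setting.
spark = (SparkSession.builder
         .appName("rf-memory-test")                  # placeholder app name
         .config("spark.executor.memory", "20480M")  # "6000M" in the second run
         .getOrCreate())
{code}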




[jira] [Commented] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-04-07 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15960471#comment-15960471
 ] 

Sean Owen commented on SPARK-20228:
---

Without more detail I'm not sure what to make of it. Just giving more memory 
shouldn't change anything directly, but it could affect things like caching and 
therefore locality, and could affect whether your jobs are failing for lack of 
memory. I have never encountered this when working with decision forests and 
varying memory settings. It's hard to investigate with no reproduction. Can you 
suggest what the issue is?




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org