[jira] [Comment Edited] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-11-19 Thread Saikat Kanjilal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15680080#comment-15680080
 ] 

Saikat Kanjilal edited comment on SPARK-9487 at 11/19/16 11:59 PM:
---

OK, I guess I spoke too soon :). On to the next set of challenges; the Jenkins build 
report is here:
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68897/


I ran each of these tests individually, as well as together as a suite, locally, 
and they all passed. Any ideas on how to address these failures?


was (Author: kanjilal):
OK, I guess I spoke too soon :). On to the next set of challenges; the Jenkins build 
report is here:
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68897/


I ran each of these tests individually, as well as together as a suite, and they 
all passed. Any ideas on how to address these failures?

> Use the same num. worker threads in Scala/Python unit tests
> ---
>
> Key: SPARK-9487
> URL: https://issues.apache.org/jira/browse/SPARK-9487
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Spark Core, SQL, Tests
>Affects Versions: 1.5.0
>Reporter: Xiangrui Meng
>  Labels: starter
> Attachments: ContextCleanerSuiteResults, HeartbeatReceiverSuiteResults
>
>
> In Python we use `local[4]` for unit tests, while in Scala/Java we use 
> `local[2]` and `local` for some unit tests in SQL, MLlib, and other 
> components. If the operation depends on partition IDs, e.g., a random number 
> generator, this will lead to different results in Python and Scala/Java. It 
> would be nice to use the same number in all unit tests.
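
To make the motivation above concrete, here is a minimal sketch (not from the ticket; the object and method names are made up for illustration) of a partition-ID-seeded computation whose output depends on the number of local worker threads, because `local[N]` defaults to N partitions:

{code:scala}
import scala.util.Random
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical illustration: the same job returns different values under
// local[2] and local[4], because records land in different partitions and
// each partition's random generator is seeded with its partition ID.
object PartitionSeedDemo {
  def firstDraws(master: String): Seq[Double] = {
    val sc = new SparkContext(new SparkConf().setMaster(master).setAppName("demo"))
    try {
      sc.parallelize(1 to 8)                    // defaults to N partitions under local[N]
        .mapPartitionsWithIndex { (pid, iter) =>
          val rng = new Random(pid)             // seed depends on the partition ID
          iter.map(_ => rng.nextDouble())
        }
        .collect()
        .toSeq
    } finally {
      sc.stop()
    }
  }
}

// firstDraws("local[2]") and firstDraws("local[4]") produce different sequences,
// which is exactly the Python vs. Scala/Java discrepancy described above.
{code}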






[jira] [Comment Edited] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-11-11 Thread Saikat Kanjilal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15658479#comment-15658479
 ] 

Saikat Kanjilal edited comment on SPARK-9487 at 11/11/16 11:20 PM:
---

OK, so I've spent the last hour or so digging deeper into the failures. I used 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68531/ as a 
point of reference; listed below is what I found.



LogisticRegressionSuite  (passed successfully on both my master and feature branches)
OneVsRestSuite           (passed successfully on both my master and feature branches)
DataFrameStatSuite       (passed successfully on both my master and feature branches)
DataFrameSuite           (passed successfully on both my master and feature branches)
SQLQueryTestSuite        (passed successfully on both my master and feature branches)
ForeachSinkSuite         (passed successfully on both my master and feature branches)
JavaAPISuite             (failed on both my master and feature branches)


The master branch does not have any code changes from me; the feature branch of 
course does.

I am running individual tests from the root directory, per the documentation, 
with commands like: ./build/mvn test -P... 
-DwildcardSuites=none -Dtest=org.apache.spark.streaming.JavaAPISuite


Therefore my conclusion so far, based on the above Jenkins report, is that my 
changes have not introduced any new failures that were not already there. 
[~srowen], please let me know if my methodology is off anywhere.




was (Author: kanjilal):
OK, so I've spent the last hour or so digging deeper into the failures. I used 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/68531/ as a 
point of reference; listed below is what I found.


java/scala test           my master branch    my feature branch
LogisticRegressionSuite   success             success
OneVsRestSuite            success             success
DataFrameStatSuite        success             success
DataFrameSuite            success             success
SQLQueryTestSuite         success             success
ForeachSinkSuite          success             success
JavaAPISuite              failure             failure


The master branch does not have any code changes from me; the feature branch of 
course does.

I am running individual tests from the root directory, per the documentation, 
with commands like: ./build/mvn test -P... 
-DwildcardSuites=none -Dtest=org.apache.spark.streaming.JavaAPISuite


Therefore my conclusion so far, based on the above Jenkins report, is that my 
changes have not introduced any new failures that were not already there. 
[~srowen], please let me know if my methodology is off anywhere.




[jira] [Comment Edited] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-11-09 Thread Saikat Kanjilal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15651806#comment-15651806
 ] 

Saikat Kanjilal edited comment on SPARK-9487 at 11/9/16 7:22 PM:
-

OK, for some odd reason my local branch had the changes but they weren't committed. 
The PR is here:
https://github.com/skanjila/spark/commit/ec0b2a81dc8362e84e70457873560d997a7cb244

I added the change to local[4] to both streaming and repl. Based on what I'm 
seeing locally, all Java/Scala changes should be accounted for and the unit 
tests pass; the only exception is the code inside the Spark examples in 
PageViewStream.scala. Should I change this? It seems like it doesn't belong as 
part of the unit tests.
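
For reference, a minimal sketch of the kind of one-line change involved. The suite below is hypothetical (the real streaming and repl suites construct their contexts differently); only the master string changes:

{code:scala}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical test fixture, for illustration only.
class ExampleStreamingSuite {
  // before: .setMaster("local[2]")
  val conf = new SparkConf()
    .setMaster("local[4]")                 // match the local[4] used by the Python tests
    .setAppName("ExampleStreamingSuite")
  val ssc = new StreamingContext(conf, Seconds(1))
}
{code}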


My next TODOs:
1) Change the example code in PageViewStream if it makes sense
2) Start the code changes to fix the Python unit tests

Let me know your thoughts or concerns.


was (Author: kanjilal):
OK, for some odd reason my local branch had the changes but they weren't committed. 
The PR is here:
https://github.com/skanjila/spark/commit/ec0b2a81dc8362e84e70457873560d997a7cb244

I added the change to local[4] to both streaming and repl. Based on what I'm 
seeing locally, all Java/Scala changes should be accounted for except for the 
Spark examples, with the code inside PageViewStream.scala. Should I change this? 
It seems like it doesn't belong as part of the unit tests.


My next TODOs:
1) Change the example code in PageViewStream if it makes sense
2) Start the code changes to fix the Python unit tests

Let me know your thoughts or concerns.







[jira] [Comment Edited] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-30 Thread Saikat Kanjilal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620613#comment-15620613
 ] 

Saikat Kanjilal edited comment on SPARK-9487 at 10/30/16 9:46 PM:
--

[~srowen] Yes, I read through that link and adjusted the PR title. However, 
please do let me know if I can proceed with adding more to this PR, including 
Python and other parts of the codebase.


was (Author: kanjilal):
[~srowen] Yes, I read through that link and adjusted the PR title. I will 
Jenkins-test this next; however, please do let me know if I can proceed with 
adding more to this PR, including Python and other parts of the codebase.







[jira] [Comment Edited] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-30 Thread Saikat Kanjilal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15620613#comment-15620613
 ] 

Saikat Kanjilal edited comment on SPARK-9487 at 10/30/16 9:40 PM:
--

[~srowen] Yes, I read through that link and adjusted the PR title. I will 
Jenkins-test this next; however, please do let me know if I can proceed with 
adding more to this PR, including Python and other parts of the codebase.


was (Author: kanjilal):
[~srowen] Yes, I read through that and adjusted the PR title. I will Jenkins-test 
this next; however, please do let me know if I can proceed with adding more to 
this PR, including Python and other parts of the codebase.







[jira] [Comment Edited] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-18 Thread Saikat Kanjilal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587627#comment-15587627
 ] 

Saikat Kanjilal edited comment on SPARK-9487 at 10/19/16 4:24 AM:
--

[~holdenk] I'm finally getting time to look at this, so I am starting small: I 
made the change inside ContextCleanerSuite and HeartbeatReceiverSuite from 
local[2] to local[4]. Per the documentation here 
(http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
I ran mvn -Phadoop2 -Dsuites=org.apache.spark.HeartbeatReceiverSuite test, and it 
looks like everything worked.

I then ran mvn -Phadoop2 -Dsuites=org.apache.spark.ContextCleanerSuite test, and 
it looks like everything worked as well.

See the attachments and let me know if this is the right process for running 
single unit tests. I'll start making changes to the other suites. How would you 
like to see the output: should I just attach it here, or do a pull request from 
the new branch that I created?
Thanks

PS: Another question. Running single unit tests like this takes forever; are 
there flags I can set to speed up the builds? Even on my 15-inch MacBook Pro 
with an SSD, the builds shouldn't take this long :(.


Let me know the next steps to get this into a PR.






was (Author: kanjilal):
[~holdenk] I'm finally getting time to look at this, so I am starting small: I 
made the change inside ContextCleanerSuite and HeartbeatReceiverSuite from 
local[2] to local[4]. Per the documentation here 
(http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
I ran mvn -Phadoop2 -Dsuites=org.apache.spark.HeartbeatReceiverSuite test, and it 
looks like everything worked.

I then ran mvn -Phadoop2 -Dsuites=org.apache.spark.ContextCleanerSuite test, and 
it looks like everything worked as well.

See the attachments and let me know if this is the right process to run single 
unit tests; if not, I'll start making changes to the other suites. How would you 
like to see the output: should I just have attachments, or just do a pull request 
from the new branch that I created?
Thanks

PS: Another question: running single unit tests like this takes forever; are 
there flags I can set to speed up the builds? Even on my 15-inch MacBook Pro 
with an SSD the builds shouldn't take this long :(.


Let me know the next steps to get this into a PR.











[jira] [Comment Edited] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2016-10-18 Thread Saikat Kanjilal (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587627#comment-15587627
 ] 

Saikat Kanjilal edited comment on SPARK-9487 at 10/19/16 4:24 AM:
--

[~holdenk] I'm finally getting time to look at this, so I am starting small: I 
made the change inside ContextCleanerSuite and HeartbeatReceiverSuite from 
local[2] to local[4]. Per the documentation here 
(http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
I ran mvn -Phadoop2 -Dsuites=org.apache.spark.HeartbeatReceiverSuite test, and it 
looks like everything worked.

I then ran mvn -Phadoop2 -Dsuites=org.apache.spark.ContextCleanerSuite test, and 
it looks like everything worked as well.

See the attachments and let me know if this is the right process to run single 
unit tests; if not, I'll start making changes to the other suites. How would you 
like to see the output: should I just have attachments, or just do a pull request 
from the new branch that I created?
Thanks

PS: Another question: running single unit tests like this takes forever; are 
there flags I can set to speed up the builds? Even on my 15-inch MacBook Pro 
with an SSD the builds shouldn't take this long :(.


Let me know the next steps to get this into a PR.






was (Author: kanjilal):
[~holdenk] I'm finally getting time to look at this, so I am starting small: I 
made the change inside ContextCleanerSuite and HeartbeatReceiverSuite from 
local[2] to local[4]. Per the documentation here 
(http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
I ran mvn -Phadoop2 -Dsuites=org.apache.spark.HeartbeatReceiverSuite test, and it 
looks like everything worked.

I then ran mvn -Phadoop2 -Dsuites=org.apache.spark.ContextCleanerSuite test, and 
it looks like everything worked as well.

See the attachments and let me know if this is not the right process to run 
single unit tests; if not, I'll start making changes to the other suites. How 
would you like to see the output: should I just have attachments, or just do a 
pull request from the new branch that I created?
Thanks

PS: Another question: running single unit tests like this takes forever; are 
there flags I can set to speed up the builds? Even on my 15-inch MacBook Pro 
with an SSD the builds shouldn't take this long :(.


Let me know the next steps to get this into a PR.











[jira] [Comment Edited] (SPARK-9487) Use the same num. worker threads in Scala/Python unit tests

2015-10-06 Thread Evan Chen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-9487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14941961#comment-14941961
 ] 

Evan Chen edited comment on SPARK-9487 at 10/6/15 11:24 PM:


Hey [~mengxr],

What would be the preferred number of worker threads? Should we set all of them 
to local[2] to stay consistent with the Scala/Java side?

Thanks


was (Author: evanchen92):
Hey Xiangrui,

What would be the preferred number of worker threads? Should we set all of them 
to local[2] to stay consistent with the Scala/Java side?

Thanks



