[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43181035
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15023/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43177125
  
 Merged build triggered. 




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43181033
  
Merged build finished. 




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43259205
  
Merged build finished. 




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43177758
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15019/




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43259206
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15030/




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43177756
  
Merged build finished. 




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43252004
  
Merged build started. 




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43177136
  
Merged build started. 




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread andrewor14
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/787

[WIP] [SPARK-1808] Route bin/pyspark through Spark submit

*Problem.* For `bin/pyspark`, the only way to specify Spark configuration
properties is currently through `SPARK_JAVA_OPTS` in `conf/spark-env.sh`, which
is supposedly deprecated. It should instead pick up the configurations
explicitly specified in `conf/spark-defaults.conf`.

*Solution.* Have `bin/pyspark` invoke `bin/spark-submit`, like all of its 
counterparts in Scala land (i.e. `bin/spark-shell`, `bin/run-example`). This 
has the additional benefit of making the Spark scripts consistent with each 
other.

---

*Details.* `bin/pyspark` inherently handles two cases: (1) running python
applications and (2) running the python shell. For (1), Spark submit already
offers a code path to run python applications. When `bin/pyspark` is given a
python file, we can simply pass the file directly to spark-submit. Here, the
JVM launches the python application as a sub-process. This is the simple case:

- `bin/pyspark` passes the python file to Spark submit
- Spark submit passes the python file to `PythonAppRunner`
- `PythonAppRunner` sets up the Py4j GatewayServer on the Java side
- `PythonAppRunner` runs the python file as a sub-process
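
As a rough illustration of the dispatch between the two cases, here is a minimal Python sketch. It is not the actual `bin/pyspark` script: the function name `build_command` is hypothetical, and detecting an application by a `.py` suffix on the first argument is an assumption made only for this example.

```python
def build_command(args, spark_submit="bin/spark-submit", python="python"):
    """Return the command line a pyspark-style launcher might run.

    Hypothetical helper: the ".py" suffix check is an assumption for
    illustration, not the rule used by the real script.
    """
    args = list(args)
    if args and args[0].endswith(".py"):
        # Case (1): hand the application file straight to Spark submit,
        # which runs it as a sub-process of the JVM.
        return [spark_submit] + args
    # Case (2): start the Python-side REPL wrapper, which is then
    # responsible for launching Spark submit as its own sub-process.
    return [python, "python/pyspark/repl.py"] + args
```

With this sketch, `build_command(["app.py", "10"])` yields a Spark submit invocation, while `build_command([])` yields the REPL wrapper invocation.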

Case (2) is more involved. We cannot simply run the shell as another
application and reuse the existing code path in Spark submit as in (1),
because keyboard signals would not be propagated to the python interpreter
properly, and dealing with each signal individually is cumbersome and likely
not comprehensive. Thus, this PR takes the approach of making Python the parent
process instead. This allows all keyboard signals to be propagated to the
python REPL first, rather than to the JVM first:

- `bin/pyspark` calls `python/pyspark/repl.py`
- `repl.py` calls Spark submit as a sub-process
- Spark submit calls `PythonShellRunner`
- `PythonShellRunner` sets up the Py4j GatewayServer on the Java side
- `repl.py` learns the Py4j gateway server port from `PythonShellRunner` 
through sockets
- `repl.py` creates a SparkContext using this gateway server
- `repl.py` starts a REPL with this SparkContext
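
The port hand-off in the steps above can be sketched as follows. This is only an illustration under stated assumptions: the PR does not say which side opens the socket or how the port is encoded, so here the Python parent listens on an ephemeral port and the child connects back and sends a 4-byte big-endian integer. The child is a small Python stand-in for the JVM process, and `learn_gateway_port` and `CHILD_SOURCE` are hypothetical names.

```python
import socket
import struct
import subprocess
import sys

# Stand-in for the JVM child (assumption): connects back to the parent and
# reports the port its gateway server would be listening on.
CHILD_SOURCE = """
import socket, struct, sys
callback_port, gateway_port = int(sys.argv[1]), int(sys.argv[2])
with socket.create_connection(("127.0.0.1", callback_port)) as conn:
    conn.sendall(struct.pack("!i", gateway_port))
"""


def learn_gateway_port(fake_gateway_port, timeout=10.0):
    """Launch the child and return the gateway port it reports back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        # Port 0 asks the OS for any free port; the child is told which one.
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(timeout)
        callback_port = listener.getsockname()[1]

        child = subprocess.Popen(
            [sys.executable, "-c", CHILD_SOURCE,
             str(callback_port), str(fake_gateway_port)]
        )
        try:
            conn, _ = listener.accept()
            with conn:
                conn.settimeout(timeout)
                data = b""
                # Read exactly four bytes; recv may return fewer at a time.
                while len(data) < 4:
                    chunk = conn.recv(4 - len(data))
                    if not chunk:
                        raise RuntimeError("child closed the connection early")
                    data += chunk
        finally:
            child.wait()
    return struct.unpack("!i", data)[0]
```

Because Python is the parent here, a Ctrl-C in the terminal reaches the Python process first, which is the property the shell case relies on. The returned port is what the parent would then use to connect to the gateway and create the SparkContext.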


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/andrewor14/spark pyspark-submit

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/787.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #787


commit e4b543a91a0c42921aea6710d266013848425a9f
Author: Andrew Or andrewo...@gmail.com
Date:   2014-05-15T06:31:31Z

Route bin/pyspark through Spark submit

The bin/pyspark script takes two pathways, depending on the application.

If the application is a python file, the script passes the python file
directly to Spark submit, which launches the python application as a
sub-process within the JVM.

If the application is the pyspark shell, the script invokes a special
python script that invokes Spark submit as a sub-process. The main
benefit here is that Python is now the parent process (rather than
Scala), such that all keyboard signals are propagated to the python
interpreter properly.

This divergence of code paths means Spark submit needs to launch
two different kinds of python runners (in Scala). Currently, Spark
submit invokes the PythonRunner, which creates python subprocesses
to run python applications. However, this is not applicable to the
shell, because the parent process is already the python process that
runs the REPL. This is why PythonRunner is split into PythonAppRunner
(for launching applications) and PythonShellRunner (for launching
the pyspark shell).

The new bin/pyspark has been tested locally to run both the REPL and
python applications successfully through Spark submit. A big TODO at
this point is to make sure the IPython case is not affected.

commit e195289f717263276637fa7505deb898db33801c
Author: Andrew Or andrewo...@gmail.com
Date:   2014-05-15T06:50:31Z

Merge branch 'master' of github.com:apache/spark into pyspark-submit






[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread andrewor14
Github user andrewor14 closed the pull request at:

https://github.com/apache/spark/pull/787




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43251989
  
 Merged build triggered. 




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43251502
  
Merged build finished. 




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43180290
  
Merged build started. 




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43251370
  
 Merged build triggered. 




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43180281
  
 Merged build triggered. 




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43264897
  
Making big changes; re-opening in a bit.




[GitHub] spark pull request: [WIP] [SPARK-1808] Route bin/pyspark through S...

2014-05-16 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/787#issuecomment-43179937
  
Git exception. Jenkins, test this please.

