[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2016-03-24 Thread jurriaan
Github user jurriaan commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-200756142
  
Yeah, this is quite a useful feature :+1: 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2016-02-19 Thread buckhx
Github user buckhx commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-186259497
  
This is a pretty big pain when using pyspark (adding modules to workers) 
and should definitely be included. I'll open a new pull request to push 
this through.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-12-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4897





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-12-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-168112604
  
I'm going to close this pull request. If this is still relevant and you are 
interested in pushing it forward, please open a new pull request. Thanks!






[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-12-02 Thread DoHe
Github user DoHe commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-161336531
  
For the test you could also add a local egg file to the repository and 
install that instead of relying on the external PyPI server.
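
A minimal sketch of what an offline install in the test could look like. It 
assumes the checked-in artifact is an sdist or wheel rather than an egg (pip 
does not install eggs), and the function name and paths are hypothetical, not 
part of the patch. `--no-index` and `--find-links` are standard pip options.

```python
import sys


def offline_install_command(requirements_path, local_dir):
    """Build a pip command that resolves packages only from `local_dir`.

    Both arguments are illustrative; the patch does not define them.
    """
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # never contact an index such as PyPI
        "--find-links", local_dir,  # look for archives in this directory
        "-r", requirements_path,
    ]
```

The test could then pass the result to `subprocess.check_call`, so nothing 
needs network access on Jenkins.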





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-10-21 Thread buckhx
Github user buckhx commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-150020186
  
@davies I removed the requirements file from the context constructor. I 
looked into adding `--py-requirements` to spark-submit, but it looked more 
involved than I expected. I think it makes sense to add that as a separate 
commit/PR. 

Also, the test installs a package with pip, but the Spark Jenkins build 
errors out because it doesn't allow access to the global PyPI server. Any 
thoughts?





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-10-21 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-150056080
  
cc @JoshRosen @sknapp who are more familiar with Jenkins





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-10-02 Thread buckhx
Github user buckhx commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-145139329
  
Looking to add support for namespace packages, which I have tested locally 
and confirmed to work. I was also thinking of exposing an add_package function 
that would add local packages to the Spark context.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-09-25 Thread buckhx
Github user buckhx commented on a diff in the pull request:

https://github.com/apache/spark/pull/4897#discussion_r40440718
  
--- Diff: python/pyspark/context.py ---
@@ -711,6 +721,30 @@ def addPyFile(self, path):
             # for tests in local mode
             sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
 
+    def addRequirementsFile(self, path):
+        """
+        Add a pip requirements file to distribute dependencies for all tasks
+        on thie SparkContext in the future. An ImportError will be thrown if
+        a module in the file can't be downloaded.
+        See https://pip.pypa.io/en/latest/user_guide.html#requirements-files
+        Raises ImportError if the requirement can't be found
+        """
+        import pip
+        tar_dir = tempfile.mkdtemp()
+        try:
+            for req in pip.req.parse_requirements(path, session=uuid.uuid1()):
+                if not req.check_if_exists():
+                    pip.main(['install', req.req.__str__()])
+                mod = __import__(req.name)
+                mod_path = mod.__path__[0]
--- End diff --

Indeed mod.__path__[0] does not work for namespace packages. Working on 
supporting them by bundling all paths in mod.__path__ into a single tar.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-09-25 Thread buckhx
Github user buckhx commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-143251274
  
The tests fail because one of them attempts to pip install a package and 
doesn't have permission to do so. Is there a way to enable that? Or should I 
just leave the pip install piece untested?





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-08-21 Thread daradib
Github user daradib commented on the pull request:

https://github.com/apache/spark/pull/4743#issuecomment-133317435
  
Stumbled across this pull request. For future reference, it was superseded 
by #4897 in the same JIRA ticket. Looking forward to the `requirements.txt` 
integration.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121707276
  
[Test build #1076 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1076/consoleFull) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9).





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-15 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121710939
  
[Test build #1076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1076/console) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121083527
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121083609
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121083572
  
[Test build #6 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/6/consoleFull) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9).





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121083540
  
Merged build started.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121083607
  
[Test build #6 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/6/console) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121083316
  
@andrewor14 the test failure in the `dev/run-tests` doctest is because we 
did a shallow clone of the Spark repository into the fresh Jenkins workspace. I 
think that this is an ephemeral problem that will resolve itself after 
retesting. We could also disable / remove that test for now or skip it in 
Jenkins.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121083413
  
I've disabled shallow cloning in SlowSparkPullRequestBuilder, so let's try 
this again...





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121083420
  
jenkins slow test please





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121073664
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121078002
  
@buckheroux Do you think that we should also support this in 
`bin/spark-submit`? For example, we could accept a `.txt` file (via 
`--py-files`) as a list of dependencies. 
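
One way to picture this, as a sketch only: `split_py_files` is a hypothetical 
helper, and treating every `.txt` entry as a requirements file is an 
assumption, not behavior that `spark-submit` has.

```python
import os


def split_py_files(py_files):
    """Separate requirements files from other entries in a --py-files value.

    `py_files` is the comma-separated string passed on the command line.
    Returns (requirements_files, other_files), preserving order.
    """
    requirements, others = [], []
    for entry in py_files.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate stray commas
        ext = os.path.splitext(entry)[1].lower()
        if ext == ".txt":
            requirements.append(entry)
        else:
            others.append(entry)
    return requirements, others
```

Each requirements file could then go through `addRequirementsFile`, while the 
remaining entries keep using `addPyFile`.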





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread buckheroux
Github user buckheroux commented on a diff in the pull request:

https://github.com/apache/spark/pull/4897#discussion_r34519652
  
--- Diff: python/pyspark/context.py ---
@@ -711,6 +721,30 @@ def addPyFile(self, path):
             # for tests in local mode
             sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
 
+    def addRequirementsFile(self, path):
+        """
+        Add a pip requirements file to distribute dependencies for all tasks
+        on thie SparkContext in the future. An ImportError will be thrown if
+        a module in the file can't be downloaded.
+        See https://pip.pypa.io/en/latest/user_guide.html#requirements-files
+        Raises ImportError if the requirement can't be found
+        """
+        import pip
+        tar_dir = tempfile.mkdtemp()
+        try:
+            for req in pip.req.parse_requirements(path, session=uuid.uuid1()):
+                if not req.check_if_exists():
+                    pip.main(['install', req.req.__str__()])
+                mod = __import__(req.name)
+                mod_path = mod.__path__[0]
--- End diff --

Worried about this working across different types of modules (namespace 
packages etc)





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread buckheroux
Github user buckheroux commented on a diff in the pull request:

https://github.com/apache/spark/pull/4897#discussion_r34519663
  
--- Diff: python/pyspark/context.py ---
@@ -711,6 +721,30 @@ def addPyFile(self, path):
             # for tests in local mode
             sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
 
+    def addRequirementsFile(self, path):
+        """
+        Add a pip requirements file to distribute dependencies for all tasks
+        on thie SparkContext in the future. An ImportError will be thrown if
+        a module in the file can't be downloaded.
+        See https://pip.pypa.io/en/latest/user_guide.html#requirements-files
+        Raises ImportError if the requirement can't be found
+        """
+        import pip
+        tar_dir = tempfile.mkdtemp()
+        try:
+            for req in pip.req.parse_requirements(path, session=uuid.uuid1()):
+                if not req.check_if_exists():
+                    pip.main(['install', req.req.__str__()])
+                mod = __import__(req.name)
+                mod_path = mod.__path__[0]
+                tar_path = os.path.join(tar_dir, req.name+'.tar.gz')
--- End diff --

Hmm good thought. I'll look into this.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121074983
  
Jenkins slow test please





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121075088
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121075116
  
Merged build started.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121075215
  
[Test build #5 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/5/consoleFull) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9).





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121075227
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121075221
  
[Test build #5 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/5/console) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9).
 * This patch **fails some tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread buckheroux
Github user buckheroux commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121075908
  
Just saw the test failure, looking into it now





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread andrewor14
Github user andrewor14 commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121077652
  
@buckheroux I believe the test failure is not related. @JoshRosen is 
looking into a hot fix.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread buckheroux
Github user buckheroux commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-121080610
  
@andrewor14 Sounds good.

@davies It's probably a good idea to support a CLI arg for the requirements 
file. I'll work on this as well.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/4897#discussion_r34518168
  
--- Diff: python/pyspark/context.py ---
@@ -67,8 +70,9 @@ class SparkContext(object):
     PACKAGE_EXTENSIONS = ('.zip', '.egg', '.jar')
 
     def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
-                 environment=None, batchSize=0, serializer=PickleSerializer(), conf=None,
-                 gateway=None, jsc=None, profiler_cls=BasicProfiler):
+                 environment=None, batchSize=0, serializer=PickleSerializer(),
+                 conf=None, gateway=None, jsc=None, profiler_cls=BasicProfiler,
+                 requirementsFile=None):
--- End diff --

We already have too many parameters here. Can we just have 
`addRequirementsFile`?





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/4897#discussion_r34518258
  
--- Diff: python/pyspark/context.py ---
@@ -711,6 +721,30 @@ def addPyFile(self, path):
             # for tests in local mode
             sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
 
+    def addRequirementsFile(self, path):
+        """
+        Add a pip requirements file to distribute dependencies for all tasks
+        on thie SparkContext in the future. An ImportError will be thrown if
+        a module in the file can't be downloaded.
+        See https://pip.pypa.io/en/latest/user_guide.html#requirements-files
+        Raises ImportError if the requirement can't be found
+        """
+        import pip
+        tar_dir = tempfile.mkdtemp()
+        try:
+            for req in pip.req.parse_requirements(path, session=uuid.uuid1()):
+                if not req.check_if_exists():
+                    pip.main(['install', req.req.__str__()])
+                mod = __import__(req.name)
+                mod_path = mod.__path__[0]
+                tar_path = os.path.join(tar_dir, req.name+'.tar.gz')
--- End diff --

Is it possible to re-use the packaged file?





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-07-13 Thread buckheroux
Github user buckheroux commented on a diff in the pull request:

https://github.com/apache/spark/pull/4897#discussion_r34519855
  
--- Diff: python/pyspark/context.py ---
@@ -67,8 +70,9 @@ class SparkContext(object):
     PACKAGE_EXTENSIONS = ('.zip', '.egg', '.jar')
 
     def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
-                 environment=None, batchSize=0, serializer=PickleSerializer(), conf=None,
-                 gateway=None, jsc=None, profiler_cls=BasicProfiler):
+                 environment=None, batchSize=0, serializer=PickleSerializer(),
+                 conf=None, gateway=None, jsc=None, profiler_cls=BasicProfiler,
+                 requirementsFile=None):
--- End diff --

I was trying to match the pyFiles interface as closely as possible, 
although I do see where you are coming from. I'll re-evaluate having the 
requirementsFile kwarg after I add a requirements file arg to the CLI.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-05-12 Thread buckheroux
Github user buckheroux commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-101366749
  
@robcowie I wouldn't think so. Do you have a good example of a pypi package 
that I could try to install that has a namespace? I can try and support it if 
possible.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-04-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-96769377
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-04-20 Thread buckheroux
Github user buckheroux commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-94474401
  
Any traction on this?





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-04-20 Thread robcowie
Github user robcowie commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-94510834
  
This could be very useful. One question: does this behave with namespace 
packages?
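
To make the namespace-package question concrete, here is a minimal sketch, assuming Python 3.3+ implicit namespace packages (PEP 420). The package name `nsdemo_pkg` and its submodules are made up for the demonstration. It shows that such a package can span several directories, so code that archives only `mod.__path__[0]` would miss part of it.

```python
import importlib
import os
import sys
import tempfile

# Build two separate directories that each contribute to one namespace
# package (PEP 420 style, no __init__.py). All names here are made up.
root_a = tempfile.mkdtemp()
root_b = tempfile.mkdtemp()
for root, leaf in ((root_a, "part_a"), (root_b, "part_b")):
    pkg_dir = os.path.join(root, "nsdemo_pkg")
    os.makedirs(pkg_dir)
    with open(os.path.join(pkg_dir, leaf + ".py"), "w") as fh:
        fh.write("VALUE = %r\n" % leaf)

sys.path[:0] = [root_a, root_b]
importlib.invalidate_caches()

ns = importlib.import_module("nsdemo_pkg")
locations = list(ns.__path__)

# Both portions are importable, but they live in different directories,
# so archiving only locations[0] would leave out nsdemo_pkg.part_b.
part_b = importlib.import_module("nsdemo_pkg.part_b")
```

Older setuptools-style namespace packages behave differently in detail, so this is only an indication of where the approach could fall short.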





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-04-03 Thread buckheroux
Github user buckheroux commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-89413340
  
Changed importlib to __import__ and moved most of the imports up. I left the 
pip import in the method itself so that pip is only required on the system 
when that method is used. Also added a test that imports a library not found 
on most systems: https://github.com/buckheroux/QuadKey.
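
The deferred-import idea can be sketched as follows. `Context` and `add_requirements` are hypothetical stand-ins, not the real `SparkContext` API, and the module name is a parameter only so the behaviour can be demonstrated without pip installed.

```python
class Context(object):
    """Hypothetical stand-in for a context class; not the real SparkContext."""

    def add_requirements(self, module_name="pip"):
        # Deferred import: the optional dependency is only looked up when
        # this method actually runs, not when the class is defined or built.
        # module_name is a parameter purely so the sketch can be exercised.
        return __import__(module_name)


ctx = Context()  # constructing the object does not touch the optional module

try:
    ctx.add_requirements("module_that_is_not_installed_xyz")
    failed = False
except ImportError:
    # The missing dependency only surfaces once the method is called.
    failed = True
```

Users who never call the method never pay for, or trip over, the missing dependency.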





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-03-10 Thread buckheroux
Github user buckheroux commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-78065029
  
Great, will get to that this week and add tests.

Do you think bundling importlib with pyspark is reasonable? Or is finding 
another way to track down the local package the way to go?





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-03-10 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-78120122
  
I think we can go with `__import__`; it's similar to importlib.
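
A short sketch of how the two calls compare, using only standard-library modules. One difference worth knowing: for a dotted name, `__import__` returns the top-level package, while `importlib.import_module` returns the submodule itself.

```python
import importlib
import sys

# For a top-level module name the two calls return the same module object.
mod_a = __import__("json")
mod_b = importlib.import_module("json")

# For a dotted name they differ: __import__ returns the top-level package.
top = __import__("os.path")
sub = importlib.import_module("os.path")

# The submodule is still reachable through sys.modules after __import__.
sub_via_sys = sys.modules["os.path"]
```

For plain top-level requirement names the two are interchangeable, which is the case the pull request relies on.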





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-03-06 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-77659741
  
Jenkins, Ok to test.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-03-06 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-77659717
  
Good idea, this is a useful feature to have. Could you also add an argument 
for spark-submit (for example, --pip or --py-requirements)? Also, could you 
add a test for it?





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-03-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/4897#discussion_r25988691
  
--- Diff: python/pyspark/context.py ---
@@ -65,8 +65,9 @@ class SparkContext(object):
     _python_includes = None  # zip and egg files that need to be added to PYTHONPATH
 
     def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
-                 environment=None, batchSize=0, serializer=PickleSerializer(), conf=None,
-                 gateway=None, jsc=None, profiler_cls=BasicProfiler):
+                 requirementsFile=None, environment=None, batchSize=0,
--- End diff --

Inserting a parameter in the middle will break compatibility; we should put 
it at the end.
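
To illustrate the compatibility concern, here is a minimal sketch using simplified stand-in functions (not the real `SparkContext` signature, which has more parameters). A caller that passes `environment` positionally is silently rebound when a new parameter is inserted before it, but is unaffected when the new parameter is appended.

```python
# Hypothetical, simplified stand-ins for the constructor signature;
# the real SparkContext takes more parameters than shown here.

def init_v1(master=None, appName=None, sparkHome=None, pyFiles=None,
            environment=None):
    return {"pyFiles": pyFiles, "environment": environment}


def init_inserted(master=None, appName=None, sparkHome=None, pyFiles=None,
                  requirementsFile=None, environment=None):
    # New parameter placed in the middle: positional callers shift.
    return {"pyFiles": pyFiles, "requirementsFile": requirementsFile,
            "environment": environment}


def init_appended(master=None, appName=None, sparkHome=None, pyFiles=None,
                  environment=None, requirementsFile=None):
    # New parameter placed at the end: existing positional calls still bind.
    return {"pyFiles": pyFiles, "requirementsFile": requirementsFile,
            "environment": environment}


# An existing caller that passes everything positionally.
args = ("local", "demo", None, ["deps.zip"], {"KEY": "value"})

old = init_v1(*args)
broken = init_inserted(*args)   # environment dict lands in requirementsFile
safe = init_appended(*args)     # environment still binds as before
```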





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-03-06 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/4897#discussion_r25988754
  
--- Diff: python/pyspark/context.py ---
@@ -710,6 +717,33 @@ def addPyFile(self, path):
             # for tests in local mode
             sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
 
+    def addRequirementsFile(self, path):
+        """
+        Add a pip requirements file to distribute dependencies for all tasks
+        on thie SparkContext in the future. An ImportError will be thrown if
+        a module in the file can't be downloaded.
+        See https://pip.pypa.io/en/latest/user_guide.html#requirements-files
+        """
+        import importlib
+        import pip
+        import tarfile
+        import tempfile
+        import uuid
+        tar_dir = tempfile.mkdtemp()
+        try:
+            for req in pip.req.parse_requirements(path, session=uuid.uuid1()):
+                if not req.check_if_exists():
+                    pip.main(['install', req.req.__str__()])
+                mod = importlib.import_module(req.name) # Can throw ImportError
--- End diff --

`importlib` is not available in Python 2.6.





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-03-04 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4897#issuecomment-77285477
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-03-04 Thread buckheroux
GitHub user buckheroux opened a pull request:

https://github.com/apache/spark/pull/4897

[SPARK-5929] Pyspark: Register a pip requirements file with spark_context

Installs all packages in the requirements file locally via pip, then ships 
them to the workers via addPyFile.
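
The packaging step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the pull request: `archive_package` is a hypothetical helper, it builds a `.zip` (one of the extensions listed in `PACKAGE_EXTENSIONS`) rather than the `.tar.gz` the patch creates, and it uses the standard-library `json` package in place of a pip-installed dependency.

```python
import os
import shutil
import tempfile
import zipfile


def archive_package(module_name, out_dir):
    """Zip an importable package so it could be shipped to workers.

    Hypothetical helper for illustration only.
    """
    mod = __import__(module_name)
    # Assumes a regular package with a single directory in __path__.
    pkg_dir = mod.__path__[0]
    base = os.path.join(out_dir, module_name)
    # root_dir is the parent so the archive keeps the package directory name.
    return shutil.make_archive(base, "zip",
                               root_dir=os.path.dirname(pkg_dir),
                               base_dir=os.path.basename(pkg_dir))


out_dir = tempfile.mkdtemp()
try:
    zip_path = archive_package("json", out_dir)
    # With a live SparkContext `sc`, this is the point where
    # sc.addPyFile(zip_path) would hand the archive to the workers.
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
finally:
    shutil.rmtree(out_dir)
```

Packages with compiled extensions or several `__path__` entries would need more care than this sketch takes.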

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/buckheroux/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4897.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4897


commit 0ed060df2ec5a1a0427df6c160bd51c7014b29da
Author: buck heroux bher...@palantir.com
Date:   2015-02-18T17:50:27Z

added requirements file to pyspark

commit 6b8bcde60378b58998f5c14d81d72de81f44d718
Author: buck heroux bher...@palantir.com
Date:   2015-02-18T23:30:28Z

tarfile has no contextmanager in python2.

commit 2773483ea6cc244cb7de02c7dc184391a94d29e6
Author: buck heroux bher...@palantir.com
Date:   2015-02-19T01:52:06Z

reqs fix

commit 0371ad9b13f96dcc534d897789ccd32f907d5ed9
Author: buck heroux bher...@palantir.com
Date:   2015-02-19T02:06:48Z

temp tar file

commit f2a46e5d6e309a5ba29259cc1f77e594d932b0f5
Author: buck heroux bher...@palantir.com
Date:   2015-03-05T00:51:46Z

bubbled up try finally

commit fca4be61c6542b807a0d5370f761ef031fc7eb86
Author: buck heroux bher...@palantir.com
Date:   2015-03-05T00:53:17Z

forgot to remove
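
Regarding the "tarfile has no contextmanager" commit above: on interpreters where a `TarFile` object cannot be used directly in a `with` statement (Python 2.6, as far as I know), `contextlib.closing` gives the same guarantee as a try/finally. A minimal sketch with made-up file names:

```python
import contextlib
import os
import shutil
import tarfile
import tempfile

work_dir = tempfile.mkdtemp()
try:
    src = os.path.join(work_dir, "module.py")
    with open(src, "w") as fh:
        fh.write("X = 1\n")

    tar_path = os.path.join(work_dir, "module.tar.gz")
    # contextlib.closing calls tar.close() on exit, even if add() raises.
    with contextlib.closing(tarfile.open(tar_path, "w:gz")) as tar:
        tar.add(src, arcname="module.py")

    with contextlib.closing(tarfile.open(tar_path, "r:gz")) as tar:
        members = tar.getnames()
finally:
    # Remove the temporary directory whether or not archiving succeeded.
    shutil.rmtree(work_dir)
```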







[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-02-26 Thread buckheroux
Github user buckheroux closed the pull request at:

https://github.com/apache/spark/pull/4743





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-02-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4743#issuecomment-75775029
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...

2015-02-24 Thread buckheroux
GitHub user buckheroux opened a pull request:

https://github.com/apache/spark/pull/4743

[SPARK-5929] Pyspark: Register a pip requirements file with spark_context

https://issues.apache.org/jira/browse/SPARK-5929

Register a pip requirements file with the spark_context, which will ship all 
defined dependencies to the workers. It functions similarly to addPyFile.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/buckheroux/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4743.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4743


commit 0ed060df2ec5a1a0427df6c160bd51c7014b29da
Author: buck heroux bher...@palantir.com
Date:   2015-02-18T17:50:27Z

added requirements file to pyspark

commit 6b8bcde60378b58998f5c14d81d72de81f44d718
Author: buck heroux bher...@palantir.com
Date:   2015-02-18T23:30:28Z

tarfile has no contextmanager in python2.

commit 2773483ea6cc244cb7de02c7dc184391a94d29e6
Author: buck heroux bher...@palantir.com
Date:   2015-02-19T01:52:06Z

reqs fix

commit 0371ad9b13f96dcc534d897789ccd32f907d5ed9
Author: buck heroux bher...@palantir.com
Date:   2015-02-19T02:06:48Z

temp tar file



