[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user jurriaan commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-200756142 Yeah, this is quite a useful feature :+1: --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user buckhx commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-186259497 This is a pretty big pain when using PySpark (adding modules to workers) and should definitely be included. I'll open a new pull request to push this through.
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4897
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-168112604 I'm going to close this pull request. If this is still relevant and you are interested in pushing it forward, please open a new pull request. Thanks!
Github user DoHe commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-161336531 For the test, you could also add a local egg file to the repository and install it instead of relying on the external PyPI server.
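A minimal sketch of the local-archive idea, assuming the test only needs the dependency to be importable: it writes a tiny package into a zip archive in a temporary directory and imports it through `sys.path`, the same mechanism `addPyFile` relies on in local mode (an egg is also a zip archive, so it is importable the same way). The helper names and the package name `localdep` are hypothetical. This bypasses pip entirely; if the test has to exercise `pip install` itself, pip's `--no-index --find-links <dir>` options are the usual way to install from a local directory instead of PyPI.

```python
import importlib
import os
import sys
import tempfile
import zipfile


def make_local_archive(dest_dir, pkg_name, source):
    """Write <dest_dir>/<pkg_name>.zip containing <pkg_name>/__init__.py (hypothetical helper)."""
    archive_path = os.path.join(dest_dir, pkg_name + ".zip")
    with zipfile.ZipFile(archive_path, "w") as zf:
        zf.writestr(pkg_name + "/__init__.py", source)
    return archive_path


def import_from_archive(archive_path, pkg_name):
    """Add the archive to sys.path (zipimport reads it) and import the package."""
    if archive_path not in sys.path:
        sys.path.insert(1, archive_path)  # same position addPyFile uses in local mode
    importlib.invalidate_caches()
    return importlib.import_module(pkg_name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = make_local_archive(tmp, "localdep", "VALUE = 42\n")
        print(import_from_archive(path, "localdep").VALUE)  # prints 42
```

Because nothing is fetched over the network, a test built this way should behave the same on a Jenkins worker without PyPI access.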
Github user buckhx commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-150020186 @davies I removed the requirements file parameter from the context constructor. I looked into adding --py-requirements to spark-submit, but it was more involved than I expected, so I think it makes sense to add that as a separate commit/PR. Also, one of my tests imports a package installed via pip, but the Spark Jenkins test errors out because Jenkins doesn't allow access to the global PyPI server. Any thoughts?
Github user davies commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-150056080 cc @JoshRosen @sknapp who are more familiar with Jenkins
Github user buckhx commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-145139329 Looking to add support for namespace packages, which I have tested locally and confirmed to work. I was also thinking of exposing an add_package function that would add local packages to the SparkContext.
Github user buckhx commented on a diff in the pull request: https://github.com/apache/spark/pull/4897#discussion_r40440718
--- Diff: python/pyspark/context.py ---
@@ -711,6 +721,30 @@ def addPyFile(self, path):
             # for tests in local mode
             sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))

+    def addRequirementsFile(self, path):
+        """
+        Add a pip requirements file to distribute dependencies for all tasks
+        on this SparkContext in the future. An ImportError will be thrown if
+        a module in the file can't be downloaded.
+        See https://pip.pypa.io/en/latest/user_guide.html#requirements-files
+        Raises ImportError if the requirement can't be found
+        """
+        import pip
+        tar_dir = tempfile.mkdtemp()
+        try:
+            for req in pip.req.parse_requirements(path, session=uuid.uuid1()):
+                if not req.check_if_exists():
+                    pip.main(['install', req.req.__str__()])
+                mod = __import__(req.name)
+                mod_path = mod.__path__[0]
--- End diff --
Indeed, mod.__path__[0] does not work for namespace packages. I'm working on supporting them by bundling all paths in mod.__path__ into a single tar.
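A minimal sketch of bundling every entry of `mod.__path__` into one tarball, assuming the package has already been imported. The helper names `tar_package` and `_skip_bytecode` are hypothetical; this is an illustration of the approach described in the comment, not the code that was pushed to the PR.

```python
import os
import tarfile


def _skip_bytecode(info):
    """tarfile filter: drop __pycache__ directories and .pyc files."""
    base = os.path.basename(info.name)
    if base == "__pycache__" or base.endswith(".pyc"):
        return None
    return info


def tar_package(module, tar_path):
    """Bundle every directory in module.__path__ into a single .tar.gz.

    A regular package has one __path__ entry; a PEP 420 namespace package can
    have several portions, which are all stored under the same top-level name
    (the top-level directory entry appears once per portion; extraction merges them).
    """
    if not hasattr(module, "__path__"):
        raise ValueError("%s is a plain module, not a package" % module.__name__)
    arcname = module.__name__.replace(".", "/")
    with tarfile.open(tar_path, "w:gz") as tar:
        for portion in module.__path__:
            tar.add(portion, arcname=arcname, filter=_skip_bytecode)
    return tar_path
```

For a regular package the loop runs once, so it behaves like the original `mod.__path__[0]` version; only namespace packages produce more than one portion.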
Github user buckhx commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-143251274 The tests fail because one of them attempts to pip install a package and doesn't have permission to do so. Is there a way to enable that, or should I just leave the pip install step untested?
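One way to keep the network-dependent step out of CI while still being able to run it locally is to gate it behind an opt-in environment variable with `unittest.skipUnless`. A minimal sketch; the variable `PYSPARK_TEST_PIP_NETWORK` and the test class are hypothetical, not part of Spark's test suite. Installing with pip's `--target` option into a scratch directory sidesteps the site-packages permission problem, although the test still needs to reach PyPI.

```python
import os
import subprocess
import sys
import tempfile
import unittest

# Hypothetical opt-in switch; CI jobs without PyPI access simply leave it unset.
PIP_NETWORK_TESTS = os.environ.get("PYSPARK_TEST_PIP_NETWORK") == "1"


class PipInstallTests(unittest.TestCase):

    @unittest.skipUnless(PIP_NETWORK_TESTS,
                         "set PYSPARK_TEST_PIP_NETWORK=1 to run tests that need PyPI")
    def test_install_into_scratch_dir(self):
        # --target installs into a throwaway directory, so no write permission on
        # site-packages is needed; network access to PyPI still is.
        target = tempfile.mkdtemp()
        subprocess.check_call([sys.executable, "-m", "pip", "install",
                               "--target", target, "six"])
        # six ships as a single top-level module, six.py.
        self.assertTrue(os.path.exists(os.path.join(target, "six.py")))
```

Run it with `python -m unittest`; on Jenkins the test is reported as skipped instead of failing.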
Github user daradib commented on the pull request: https://github.com/apache/spark/pull/4743#issuecomment-133317435 Stumbled across this pull request. For future reference, it was superseded by #4897 in the same JIRA ticket. Looking forward to the `requirements.txt` integration.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121707276 [Test build #1076 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1076/consoleFull) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9).
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121710939 [Test build #1076 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1076/console) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121083527 Merged build triggered.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121083609 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121083572 [Test build #6 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/6/consoleFull) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121083540 Merged build started.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121083607 [Test build #6 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/6/console) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121083316 @andrewor14 the test failure in the `dev/run-tests` doctest is because we did a shallow clone of the Spark repository into the fresh Jenkins workspace. I think this is an ephemeral problem that will resolve itself after retesting. We could also disable or remove that test for now, or skip it in Jenkins.
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121083413 I've disabled shallow cloning in SlowSparkPullRequestBuilder, so let's try this again...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121083420 jenkins slow test please
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121073664 Can one of the admins verify this patch?
Github user davies commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121078002 @buckheroux Do you think we should also support this in `bin/spark-submit`? For example, we should accept `.txt` files (via --py-files) as dependencies.
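A sketch of how `--py-files` entries could be dispatched under that proposal, assuming `.txt` entries are treated as pip requirements files (to be handled by something like the proposed `addRequirementsFile`) and every other entry keeps the existing `addPyFile` path. The function `split_py_files` and the constant `REQUIREMENTS_EXTENSIONS` are hypothetical; this is not existing `spark-submit` behavior.

```python
# Hypothetical: extensions that mark a --py-files entry as a pip requirements file.
REQUIREMENTS_EXTENSIONS = (".txt",)


def split_py_files(py_files):
    """Split a comma-separated --py-files value into (requirements_files, other_files).

    Empty entries (e.g. from a trailing comma) are ignored; the extension check is
    case-insensitive.
    """
    requirements, others = [], []
    for entry in py_files.split(","):
        path = entry.strip()
        if not path:
            continue
        if path.lower().endswith(REQUIREMENTS_EXTENSIONS):
            requirements.append(path)
        else:
            others.append(path)
    return requirements, others


if __name__ == "__main__":
    print(split_py_files("deps.zip, requirements.txt,lib.egg,"))
    # prints (['requirements.txt'], ['deps.zip', 'lib.egg'])
```

Keying on the file extension keeps the CLI surface unchanged; a dedicated flag (as discussed later in the thread) would avoid treating unrelated `.txt` files as requirements.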
Github user buckheroux commented on a diff in the pull request: https://github.com/apache/spark/pull/4897#discussion_r34519652
--- Diff: python/pyspark/context.py ---
@@ -711,6 +721,30 @@ def addPyFile(self, path):
             # for tests in local mode
             sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))

+    def addRequirementsFile(self, path):
+        """
+        Add a pip requirements file to distribute dependencies for all tasks
+        on this SparkContext in the future. An ImportError will be thrown if
+        a module in the file can't be downloaded.
+        See https://pip.pypa.io/en/latest/user_guide.html#requirements-files
+        Raises ImportError if the requirement can't be found
+        """
+        import pip
+        tar_dir = tempfile.mkdtemp()
+        try:
+            for req in pip.req.parse_requirements(path, session=uuid.uuid1()):
+                if not req.check_if_exists():
+                    pip.main(['install', req.req.__str__()])
+                mod = __import__(req.name)
+                mod_path = mod.__path__[0]
--- End diff --
I'm worried about whether this works across different types of modules (namespace packages, etc.).
Github user buckheroux commented on a diff in the pull request: https://github.com/apache/spark/pull/4897#discussion_r34519663
--- Diff: python/pyspark/context.py ---
@@ -711,6 +721,30 @@ def addPyFile(self, path):
             # for tests in local mode
             sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))

+    def addRequirementsFile(self, path):
+        """
+        Add a pip requirements file to distribute dependencies for all tasks
+        on this SparkContext in the future. An ImportError will be thrown if
+        a module in the file can't be downloaded.
+        See https://pip.pypa.io/en/latest/user_guide.html#requirements-files
+        Raises ImportError if the requirement can't be found
+        """
+        import pip
+        tar_dir = tempfile.mkdtemp()
+        try:
+            for req in pip.req.parse_requirements(path, session=uuid.uuid1()):
+                if not req.check_if_exists():
+                    pip.main(['install', req.req.__str__()])
+                mod = __import__(req.name)
+                mod_path = mod.__path__[0]
+                tar_path = os.path.join(tar_dir, req.name+'.tar.gz')
--- End diff --
Hmm, good thought. I'll look into this.
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121074983 Jenkins slow test please
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121075088 Merged build triggered.
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121075116 Merged build started.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121075215 [Test build #5 has started](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/5/consoleFull) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9).
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121075227 Merged build finished. Test FAILed.
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121075221 [Test build #5 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SlowSparkPullRequestBuilder/5/console) for PR 4897 at commit [`76ff637`](https://github.com/apache/spark/commit/76ff63733b9d293c43218aa743a74bcce36a20c9). * This patch **fails some tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user buckheroux commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121075908 Just saw the test failure, looking into it now
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121077652 @buckheroux I believe the test failure is not related. @JoshRosen is looking into a hot fix.
Github user buckheroux commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-121080610 @andrewor14 Sounds good. @davies It's probably a good idea to support a CLI arg for the requirements file. I'll work on this as well.
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4897#discussion_r34518168
--- Diff: python/pyspark/context.py ---
@@ -67,8 +70,9 @@ class SparkContext(object):
     PACKAGE_EXTENSIONS = ('.zip', '.egg', '.jar')

     def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
-                 environment=None, batchSize=0, serializer=PickleSerializer(), conf=None,
-                 gateway=None, jsc=None, profiler_cls=BasicProfiler):
+                 environment=None, batchSize=0, serializer=PickleSerializer(),
+                 conf=None, gateway=None, jsc=None, profiler_cls=BasicProfiler,
+                 requirementsFile=None):
--- End diff --
We already have too many parameters here. Can we just have `addRequirementsFile`?
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4897#discussion_r34518258
--- Diff: python/pyspark/context.py ---
@@ -711,6 +721,30 @@ def addPyFile(self, path):
             # for tests in local mode
             sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))

+    def addRequirementsFile(self, path):
+        """
+        Add a pip requirements file to distribute dependencies for all tasks
+        on this SparkContext in the future. An ImportError will be thrown if
+        a module in the file can't be downloaded.
+        See https://pip.pypa.io/en/latest/user_guide.html#requirements-files
+        Raises ImportError if the requirement can't be found
+        """
+        import pip
+        tar_dir = tempfile.mkdtemp()
+        try:
+            for req in pip.req.parse_requirements(path, session=uuid.uuid1()):
+                if not req.check_if_exists():
+                    pip.main(['install', req.req.__str__()])
+                mod = __import__(req.name)
+                mod_path = mod.__path__[0]
+                tar_path = os.path.join(tar_dir, req.name+'.tar.gz')
--- End diff --
Is it possible to reuse the packaged file?
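One possible way to reuse a packaged archive across calls is to key a cache directory on the requirement's name and installed version (on Python 3.8+, `importlib.metadata.version(name)` is one source for the version) and to build the tarball only when no matching file exists. A minimal sketch; `cached_archive`, the `<name>-<version>.tar.gz` layout, and the `build` callback are hypothetical.

```python
import os


def cached_archive(cache_dir, name, version, build):
    """Return the path of <name>-<version>.tar.gz in cache_dir, building it only if missing.

    `build` is a hypothetical callable that writes the archive to the path it is given.
    """
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, "%s-%s.tar.gz" % (name, version))
    if not os.path.exists(path):
        tmp_path = path + ".tmp"
        build(tmp_path)
        # Rename into place so a half-written archive is never picked up as cached.
        os.replace(tmp_path, path)
    return path
```

Including the version in the key means upgrading a dependency produces a fresh archive instead of silently shipping the stale one.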
Github user buckheroux commented on a diff in the pull request: https://github.com/apache/spark/pull/4897#discussion_r34519855
--- Diff: python/pyspark/context.py ---
@@ -67,8 +70,9 @@ class SparkContext(object):
     PACKAGE_EXTENSIONS = ('.zip', '.egg', '.jar')

     def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
-                 environment=None, batchSize=0, serializer=PickleSerializer(), conf=None,
-                 gateway=None, jsc=None, profiler_cls=BasicProfiler):
+                 environment=None, batchSize=0, serializer=PickleSerializer(),
+                 conf=None, gateway=None, jsc=None, profiler_cls=BasicProfiler,
+                 requirementsFile=None):
--- End diff --
I was trying to match the pyFiles interface as closely as possible, although I do see where you are coming from. I'll re-evaluate having the requirementsFile kwarg after adding a requirements file arg to the CLI.
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user buckheroux commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-101366749 @robcowie I wouldn't think so. Do you have a good example of a PyPI package that uses a namespace package and that I could try to install? I'll try to support it if possible.
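On the namespace-package question: the diff locates an installed package with `mod.__path__[0]`, which assumes each requirement is a regular package living in exactly one directory. A namespace package can be spread across several directories, and a single-file module (such as six.py) has no `__path__` at all. A minimal sketch, assuming Python 3.3+ for PEP 420 namespace packages and using a hypothetical helper `module_locations`, that collects every location instead of only the first:

```python
def module_locations(mod):
    """Return the filesystem locations an imported module is loaded from.

    Hypothetical helper for illustration:
      * regular package -> one directory in __path__
      * namespace package (PEP 420 or pkgutil-style) -> possibly several directories
      * single-file module -> no __path__, so fall back to __file__
    """
    path = getattr(mod, "__path__", None)
    if path is None:
        return [mod.__file__]
    return list(path)
```

Each returned location would then need to be packaged and shipped, rather than only the first entry.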
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-96769377 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user buckheroux commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-94474401 Any traction on this?
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user robcowie commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-94510834 This could be very useful. One question: does this work with namespace packages?
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user buckheroux commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-89413340 Changed importlib to __import__ and moved most of the imports up to the module level. I left the pip import inside the method itself so that the system does not need pip unless that method is used. Also added a test that imports a library not found on most systems: https://github.com/buckheroux/QuadKey.
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user buckheroux commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-78065029 Great, will get to that this week and add tests. Do you think bundling importlib with pyspark is reasonable? Or should we find another way to locate the locally installed package?
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-78120122 I think we can go with `__import__`; it's similar to importlib.
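One caveat when swapping importlib for `__import__`: for a dotted name, `__import__("a.b")` imports `a.b` but returns the top-level package `a`. A minimal sketch of the usual workaround, which also runs on Python 2.6 (where importlib is unavailable); `import_by_name` is a hypothetical helper, not part of PySpark:

```python
import sys


def import_by_name(name):
    """Import a module by (possibly dotted) absolute name and return it.

    __import__("a.b") returns the top-level package "a", so after importing,
    the fully qualified module is looked up in sys.modules instead.
    For absolute names this matches importlib.import_module(name).
    """
    __import__(name)
    return sys.modules[name]
```

For undotted names `__import__(name)` already returns the right module. Separately, a name from a requirements file is a distribution name, which need not match the importable module name: the PyYAML distribution, for example, is imported as `yaml`.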
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-77659741 Jenkins, Ok to test.
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-77659717 Good idea, this is a useful feature to have. Could you also add an argument for spark-submit (for example, --pip or --py-requirements)? Also, could you add a test for it?
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4897#discussion_r25988691

--- Diff: python/pyspark/context.py ---
@@ -65,8 +65,9 @@ class SparkContext(object):
     _python_includes = None  # zip and egg files that need to be added to PYTHONPATH
 
     def __init__(self, master=None, appName=None, sparkHome=None, pyFiles=None,
-                 environment=None, batchSize=0, serializer=PickleSerializer(), conf=None,
-                 gateway=None, jsc=None, profiler_cls=BasicProfiler):
+                 requirementsFile=None, environment=None, batchSize=0,
--- End diff --

Inserting a parameter in the middle will break compatibility; we should put it at the end.
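To make the compatibility concern concrete: any caller that passes SparkContext arguments positionally would have its values silently shifted by one slot. A minimal sketch using hypothetical stand-in functions (not PySpark code) that keep only the relevant parameters:

```python
# Hypothetical stand-ins for SparkContext.__init__; only parameter order matters.
def init_before(master=None, appName=None, sparkHome=None, pyFiles=None,
                environment=None, batchSize=0):
    return {"environment": environment, "batchSize": batchSize}


def init_inserted_middle(master=None, appName=None, sparkHome=None, pyFiles=None,
                         requirementsFile=None, environment=None, batchSize=0):
    return {"requirementsFile": requirementsFile,
            "environment": environment, "batchSize": batchSize}


def init_appended(master=None, appName=None, sparkHome=None, pyFiles=None,
                  environment=None, batchSize=0, requirementsFile=None):
    return {"requirementsFile": requirementsFile,
            "environment": environment, "batchSize": batchSize}


# An existing caller that passes everything positionally.
args = ("local", "app", None, None, {"FOO": "1"}, 10)
```

With `init_inserted_middle(*args)`, the environment dict lands in `requirementsFile` and `10` lands in `environment`, with no error raised. Appending the new keyword at the end, as a later revision of this PR does, leaves existing positional calls unchanged.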
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4897#discussion_r25988754

--- Diff: python/pyspark/context.py ---
@@ -710,6 +717,33 @@ def addPyFile(self, path):
             # for tests in local mode
             sys.path.insert(1, os.path.join(SparkFiles.getRootDirectory(), filename))
 
+    def addRequirementsFile(self, path):
+        """
+        Add a pip requirements file to distribute dependencies for all tasks
+        on thie SparkContext in the future. An ImportError will be thrown if
+        a module in the file can't be downloaded.
+        See https://pip.pypa.io/en/latest/user_guide.html#requirements-files
+        """
+        import importlib
+        import pip
+        import tarfile
+        import tempfile
+        import uuid
+        tar_dir = tempfile.mkdtemp()
+        try:
+            for req in pip.req.parse_requirements(path, session=uuid.uuid1()):
+                if not req.check_if_exists():
+                    pip.main(['install', req.req.__str__()])
+                mod = importlib.import_module(req.name)  # Can throw ImportError
--- End diff --

importlib is not available in Python 2.6.
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4897#issuecomment-77285477 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
GitHub user buckheroux opened a pull request: https://github.com/apache/spark/pull/4897

[SPARK-5929] Pyspark: Register a pip requirements file with spark_context

Installs all packages in the requirements file locally via pip and then ships them to the workers via addPyFile.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/buckheroux/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4897.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4897

commit 0ed060df2ec5a1a0427df6c160bd51c7014b29da
Author: buck heroux bher...@palantir.com
Date: 2015-02-18T17:50:27Z

    added requirements file to pyspark

commit 6b8bcde60378b58998f5c14d81d72de81f44d718
Author: buck heroux bher...@palantir.com
Date: 2015-02-18T23:30:28Z

    tarfile has no contextmanager in python2.

commit 2773483ea6cc244cb7de02c7dc184391a94d29e6
Author: buck heroux bher...@palantir.com
Date: 2015-02-19T01:52:06Z

    reqs fix

commit 0371ad9b13f96dcc534d897789ccd32f907d5ed9
Author: buck heroux bher...@palantir.com
Date: 2015-02-19T02:06:48Z

    temp tar file

commit f2a46e5d6e309a5ba29259cc1f77e594d932b0f5
Author: buck heroux bher...@palantir.com
Date: 2015-03-05T00:51:46Z

    bubbled up try finally

commit fca4be61c6542b807a0d5370f761ef031fc7eb86
Author: buck heroux bher...@palantir.com
Date: 2015-03-05T00:53:17Z

    forgot to remove
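The diffs in this PR read the requirements file through `pip.req.parse_requirements` and install through `pip.main`, which are pip internals rather than a stable public API (pip 10 later moved its internals under `pip._internal`). For illustration only, a deliberately simplified reader that pulls project names out of requirements-file lines without importing pip; `requirement_names` is hypothetical, and it skips or does not support much of the real format (URL/VCS requirements, editable installs, line continuations):

```python
import re

# Project names start with a letter or digit and may contain ".", "_" and "-".
_NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def requirement_names(lines):
    """Return project names from requirements-file lines (simplified sketch)."""
    names = []
    for raw in lines:
        # Simplification: treat any "#" as the start of a comment.
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # blank lines and options such as "-r other.txt" or "--index-url ..."
        if "://" in line:
            continue  # URL/VCS requirements are out of scope for this sketch
        match = _NAME_RE.match(line)
        if match:
            names.append(match.group(0))  # stops at "==", ">=", "[extras]", ";" etc.
    return names
```

Because file objects iterate over lines, `with open(path) as f: requirement_names(f)` works directly on a requirements file.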
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user buckheroux closed the pull request at: https://github.com/apache/spark/pull/4743
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4743#issuecomment-75775029 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-5929] Pyspark: Register a pip requireme...
GitHub user buckheroux opened a pull request: https://github.com/apache/spark/pull/4743

[SPARK-5929] Pyspark: Register a pip requirements file with spark_context

https://issues.apache.org/jira/browse/SPARK-5929

Register a pip requirements file with the spark_context, which will ship all defined dependencies to the workers. Functions similarly to addPyFile.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/buckheroux/spark master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4743.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #4743

commit 0ed060df2ec5a1a0427df6c160bd51c7014b29da
Author: buck heroux bher...@palantir.com
Date: 2015-02-18T17:50:27Z

    added requirements file to pyspark

commit 6b8bcde60378b58998f5c14d81d72de81f44d718
Author: buck heroux bher...@palantir.com
Date: 2015-02-18T23:30:28Z

    tarfile has no contextmanager in python2.

commit 2773483ea6cc244cb7de02c7dc184391a94d29e6
Author: buck heroux bher...@palantir.com
Date: 2015-02-19T01:52:06Z

    reqs fix

commit 0371ad9b13f96dcc534d897789ccd32f907d5ed9
Author: buck heroux bher...@palantir.com
Date: 2015-02-19T02:06:48Z

    temp tar file