[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user belevtsoff commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-129903372 Not sure if here's the right place to post this, but the documentation on the official website appears to be outdated. For example, for spark 1.4.0 and 1.4.1 [this](http://spark.apache.org/docs/1.4.1/programming-guide.html#linking-with-spark) paragraph (python tab) seems particularly misleading. Also, the last line of [this](http://spark.apache.org/docs/1.4.1/#downloading) paragraph doesn't mention python 3 support. Maybe there are other places. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-129943743 This is also a duplicate of https://issues.apache.org/jira/browse/SPARK-9705, which I'm going to merge into the new issue.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user davies commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-129938050 @belevtsoff Thanks for reporting this. I created https://issues.apache.org/jira/browse/SPARK-9822 to track it.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user delallea commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-120427873 I created a Jira here (I'm running into the same issue): https://issues.apache.org/jira/browse/SPARK-8976 It's my first time creating a Jira for Apache products, so someone more familiar with the process should probably review it.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user rilut commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-119837999 Sorry, I'm in a remote location for months. Maybe you or anyone else could help us by creating a new issue if it is still unresolved.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user latkin commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-119745079 Is there a link to JIRA yet? I'm hitting the invalid mode issue with 3.4.3, as well. [Searching](https://issues.apache.org/jira/browse/SPARK-7909?jql=project%20%3D%20SPARK%20AND%20text%20~%203.4.3) 3.4.3 yields only [this guy](https://issues.apache.org/jira/browse/SPARK-7909), which is related but not quite the same. FWIW 3.4.3 is now the default Python version that Visual Studio 2015 suggests you install.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user rilut commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-113410652 I'm on Python 3.4.3 (Anaconda 2.2.0 64-bit, Windows 10 x64) and experiencing a problem.

    sc.parallelize([1, 2]).count()
      File "C:\Anaconda3\lib\socket.py", line 205, in makefile
        raise ValueError("invalid mode %r (only r, w, b allowed)" % (mode,))
    ValueError: invalid mode 'a+' (only r, w, b allowed)
    15/06/19 14:10:54 WARN PythonRDD: Incomplete task interrupted: Attempting to kill Python Worker

I think it's because of https://github.com/apache/spark/blob/master/python/pyspark/worker.py#L149 . I'm not sure whether `a+` mode exists in Python 3.

    import socket
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect(("127.0.0.1", 4040))
    sock_file = sock.makefile("a+", 65536)  ## failed
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Anaconda3\lib\socket.py", line 205, in makefile
        raise ValueError("invalid mode %r (only r, w, b allowed)" % (mode,))
    ValueError: invalid mode 'a+' (only r, w, b allowed)
    sock_file = sock.makefile("r", 65536)  ## r is okay
    sock_file = sock.makefile("x", 65536)  ## x obviously doesn't exist, I'm making this up
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Anaconda3\lib\socket.py", line 205, in makefile
        raise ValueError("invalid mode %r (only r, w, b allowed)" % (mode,))
    ValueError: invalid mode 'x' (only r, w, b allowed)
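The mode check reported in this comment can be reproduced without a Spark installation. The following is a minimal sketch, assuming Python 3 and a platform where `socket.socketpair()` is available; the helper name `try_makefile_mode` is hypothetical and not part of PySpark. It only probes which mode strings `socket.makefile` accepts. Whether a combined read/write binary mode such as `"rwb"` is the right replacement in `worker.py` is an assumption here, not something this thread confirms.

```python
import socket


def try_makefile_mode(mode, bufsize=65536):
    """Return True if socket.makefile accepts `mode`, False if it raises ValueError.

    Hypothetical helper for illustration; it uses a local socket pair,
    so no server needs to be listening.
    """
    a, b = socket.socketpair()
    try:
        f = a.makefile(mode, bufsize)
        f.close()
        return True
    except ValueError:
        # Python 3 rejects any mode character outside r, w, b.
        return False
    finally:
        a.close()
        b.close()


# Probe the modes from the report plus a few binary candidates.
results = {mode: try_makefile_mode(mode) for mode in ("a+", "x", "rb", "wb", "rwb")}
print(results)
```

Running the probe under the interpreter that the Spark workers use shows whether a given mode string would get past the check at `socket.py` line 205 in the traceback.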
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user twneale commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-113541699 Looks like this is originating from the socket module (https://github.com/python/cpython/blob/master/Lib/socket.py). The `.makefile` method only allows r, w, and b modes, probably in both 2.x and 3.x. I wonder if that particular socket is usually a UNIX socket, but falls back to TCP on Windows. Rilut, does your Spark installation work with Python 2.7?
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-113586048 @rilut Can we move this discussion to JIRA so that it's easier to track? File a ticket at https://issues.apache.org/jira/browse/SPARK and link it here.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user rilut commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-113586217 @twneale @JoshRosen ok, I'll repost it to JIRA
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user evertlammerts commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-98980872 My bad! I forgot to roll out the new module to all nodes after recompiling. After doing that I'm back in business.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user evertlammerts commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-98541048 Since this PR was merged, I'm having some trouble getting pyspark to work:

```
sc.parallelize([1, 2]).count()
...
AttributeError: 'module' object has no attribute '_builtin_type'
...
```

I'm on Python 2.7.9 (Anaconda 2.2.0) on RHEL 6 with Spark master @ f4af92550cb90e47a12d4625fa615dd2b1587d42. I see some tests are skipped, maybe `count` among them? `distinct` seems to have the same problem, as far as I can see now. Does anybody else see this?
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-98541978 @evertlammerts I don't see that issue locally; I'm using Python 2.7.9 on OSX. Can you file an issue on the [Spark JIRA](http://issues.apache.org/jira/browse/SPARK) and mark it as a 1.4.0 blocker until we've debugged?
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-97563467 [Test build #28 has started](https://hadrian.millennium.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/28/consoleFull) for PR 5173 at commit [`59bb492`](https://github.com/apache/spark/commit/59bb49260f62fda2af0d48e35447d1a7dcd0a479). * This patch **does not merge cleanly**.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-97600364 **[Test build #28 timed out](https://hadrian.millennium.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/28/consoleFull)** for PR 5173 at commit [`59bb492`](https://github.com/apache/spark/commit/59bb49260f62fda2af0d48e35447d1a7dcd0a479) after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user shaneknapp commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-97603182 ignore these comments -- this is me testing on our staging instance
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user ogrisel commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-94064445 Thank you very much, porting PySpark to Python 3 is much appreciated.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93657492 [Test build #30401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30401/consoleFull) for PR 5173 at commit [`cafd5ec`](https://github.com/apache/spark/commit/cafd5ec1403f47681950233361815f468435d05f).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93660706 [Test build #30404 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30404/consoleFull) for PR 5173 at commit [`b716610`](https://github.com/apache/spark/commit/b716610900f54b4f88c5953ab1ad1a27caa3386d).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93659307 [Test build #30401 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30401/consoleFull) for PR 5173 at commit [`cafd5ec`](https://github.com/apache/spark/commit/cafd5ec1403f47681950233361815f468435d05f). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93659321 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30401/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93673473 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30404/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93673463 [Test build #30404 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30404/consoleFull) for PR 5173 at commit [`b716610`](https://github.com/apache/spark/commit/b716610900f54b4f88c5953ab1ad1a27caa3386d). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93779040 [Test build #30422 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30422/consoleFull) for PR 5173 at commit [`99e334f`](https://github.com/apache/spark/commit/99e334f987c561421b10fe2a6942144d1585ecb1).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93802559 [Test build #30429 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30429/consoleFull) for PR 5173 at commit [`6c52a98`](https://github.com/apache/spark/commit/6c52a98dee887e21e115ba1194b0e617ff9f27a8).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93795958 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30422/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93795907 [Test build #30422 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30422/consoleFull) for PR 5173 at commit [`99e334f`](https://github.com/apache/spark/commit/99e334f987c561421b10fe2a6942144d1585ecb1). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93819774 [Test build #30429 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30429/consoleFull) for PR 5173 at commit [`6c52a98`](https://github.com/apache/spark/commit/6c52a98dee887e21e115ba1194b0e617ff9f27a8). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93827547 [Test build #30433 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30433/consoleFull) for PR 5173 at commit [`d7d6323`](https://github.com/apache/spark/commit/d7d63237036bec439700085ed31b6225199b38c1).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93846635 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30433/ Test PASSed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93846620 [Test build #30433 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30433/consoleFull) for PR 5173 at commit [`d7d6323`](https://github.com/apache/spark/commit/d7d63237036bec439700085ed31b6225199b38c1). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93816252 MLlib changes look good to me.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93819786 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30429/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user shaneknapp commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93862558 woot!
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93861091 I've merged this into `master` (1.4.0). Thanks @davies, @twneale, @mengxr, and everyone else who helped to test this patch!
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93863171 Oh, and a **huge** thanks to @shaneknapp for helping us configure Jenkins for Python 3 and PyPy, which was no easy task!
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5173
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user davies commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93806902 @JoshRosen Once it passes the tests, I think it's ready to go.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user shaananc commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93394590 Good idea. I was on YARN 2.4, testing it with just two nodes. I just tried running it locally rather than on the cluster and it worked fine. If you want more details or the docker images I'm using let me know.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user rgbkrk commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28436264 --- Diff: python/pyspark/cloudpickle.py --- @@ -40,164 +40,126 @@ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - +from __future__ import print_function import operator import os +import io import pickle import struct import sys import types from functools import partial import itertools -from copy_reg import _extension_registry, _inverted_registry, _extension_cache -import new import dis import traceback -import platform - -PyImp = platform.python_implementation() - -import logging -cloudLog = logging.getLogger(Cloud.Transport) --- End diff -- Would you be amenable to moving cloudpickle to a separate repository? We'd like to be able to rely on it in IPython parallel as well as other projects. In the past couple days, folks at the PyCon sprints have been adding tests for the [current codebase](https://github.com/cloudpipe/cloudpickle).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28444539 --- Diff: python/pyspark/streaming/dstream.py --- @@ -579,9 +580,9 @@ def reduceFunc(t, a, b): g = b.groupByKey(numPartitions).mapValues(lambda vs: (list(vs), None)) else: g = a.cogroup(b.partitionBy(numPartitions), numPartitions) -g = g.mapValues(lambda (va, vb): (list(vb), list(va)[0] if len(va) else None)) --- End diff -- done
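For readers following the diff above: the removed line uses a tuple parameter in a lambda (`lambda (va, vb): ...`), which Python 3 no longer accepts (PEP 3113). The replacement line is not shown in this message, so the following is only a minimal sketch of one portable rewrite; the name `reorder` and the sample pairs are hypothetical and not taken from the PR.

```python
# Hypothetical helper, not the PR's actual code: a Python 2/3 compatible
# equivalent of `lambda (va, vb): (list(vb), list(va)[0] if len(va) else None)`.
def reorder(pair):
    # Python 3 removed tuple parameters, so unpack the pair explicitly.
    va, vb = pair
    return (list(vb), list(va)[0] if len(va) else None)


# Sample cogrouped values: (values from a, values from b).
print(reorder(([1, 2], [3])))
print(reorder(([], [3, 4])))
```

An indexing lambda such as `lambda v: (list(v[1]), ...)` would also work; the explicit unpacking is used here only because it keeps the original variable names readable.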
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28444580 --- Diff: python/pyspark/sql/functions.py --- @@ -116,7 +114,7 @@ def __init__(self, func, returnType): def _create_judf(self): f = self.func # put it in closure `func` -func = lambda _, it: imap(lambda x: f(*x), it) +func = lambda _, it: map(lambda x: f(*x), it) --- End diff -- Thanks!
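The `imap` to `map` change in the diff above reflects that `itertools.imap` does not exist in Python 3, where the built-in `map` is already lazy. How the PR keeps `map` lazy on Python 2 is not visible in this message, so the shim below is an assumption for illustration; `wrap` and `add` are hypothetical names.

```python
# A minimal compatibility sketch (assumed, not copied from the PR).
try:
    from itertools import imap  # Python 2: lazy version of map
except ImportError:
    imap = map  # Python 3: built-in map already returns a lazy iterator


def wrap(f):
    # Same shape as the lambda in the diff: ignore the first argument and
    # apply f to each argument tuple from the iterator.
    return lambda _, it: imap(lambda x: f(*x), it)


add = wrap(lambda a, b: a + b)
print(list(add(None, iter([(1, 2), (3, 4)]))))
```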
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28445096 --- Diff: python/pyspark/cloudpickle.py --- @@ -40,164 +40,126 @@ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - +from __future__ import print_function import operator import os +import io import pickle import struct import sys import types from functools import partial import itertools -from copy_reg import _extension_registry, _inverted_registry, _extension_cache -import new import dis import traceback -import platform - -PyImp = platform.python_implementation() - -import logging -cloudLog = logging.getLogger(Cloud.Transport) --- End diff -- Right now, vendoring is easier for us because it minimizes dependencies. We'd like to contribute these changes back upstream later. @rgbkrk Have you tried Dill? https://github.com/uqfoundation/dill
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user shaananc commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93519593 @davies - Thanks so much and good catch!
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93528080 [Test build #30365 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30365/consoleFull) for PR 5173 at commit [`8c8b957`](https://github.com/apache/spark/commit/8c8b957fdeb721dc772584ba6135810163ef488c).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93506874 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user davies commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93510822 @shaananc It seems that you are using Python 3 in the driver but Python 2 on YARN. PySpark cannot work with different minor versions in the driver and the workers, so you could specify the Python version with `PYSPARK_PYTHON=python2 bin/spark-submit xxx`, or change the default version to python3 on YARN.
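To make the constraint in the comment above concrete, here is a small sketch of a major.minor comparison between two interpreters. It is an illustration only: the function names are hypothetical, and this is not presented as PySpark's own check.

```python
import sys


def version_tag(version_info=sys.version_info):
    # Reduce a version tuple such as (3, 4, 3) to its "major.minor" tag.
    return "%d.%d" % (version_info[0], version_info[1])


def same_minor_version(driver_info, worker_info):
    # Hypothetical check: driver and worker are considered compatible only
    # when their major.minor tags are identical.
    return version_tag(driver_info) == version_tag(worker_info)


print(same_minor_version((3, 4, 3), (3, 4, 0)))
print(same_minor_version((3, 4, 3), (2, 7, 9)))
```

Under this rule a Python 3 driver talking to Python 2 workers, as in the YARN report above, is rejected, while patch-level differences are tolerated.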
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93516544 [Test build #30363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30363/consoleFull) for PR 5173 at commit [`5c57c95`](https://github.com/apache/spark/commit/5c57c95a0e8b8ca11f88a60d6e48ef0e4caa3a16).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93524365 [Test build #30364 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30364/consoleFull) for PR 5173 at commit [`8f8e710`](https://github.com/apache/spark/commit/8f8e7100937f2fd5ce5252c11e42cb2230d21581).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93507501 [Test build #30362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30362/consoleFull) for PR 5173 at commit [`4006829`](https://github.com/apache/spark/commit/400682982bbb6277d4b2c6dca2c3b88d491e5b21).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user rgbkrk commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28447868 --- Diff: python/pyspark/cloudpickle.py --- @@ -40,164 +40,126 @@ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - +from __future__ import print_function import operator import os +import io import pickle import struct import sys import types from functools import partial import itertools -from copy_reg import _extension_registry, _inverted_registry, _extension_cache -import new import dis import traceback -import platform - -PyImp = platform.python_implementation() - -import logging -cloudLog = logging.getLogger(Cloud.Transport) --- End diff -- Yes, others have noted that dill didn't have the same opinionated base for pickling functions (especially functions within main). /cc @ogrisel
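As background for the point about pickling functions, the standard-library `pickle` serializes functions by reference (module plus name), so a lambda cannot be pickled with it; that gap is what function picklers such as cloudpickle or dill address. The sketch below uses only the standard library, and `can_pickle` and `square` are hypothetical names chosen for illustration.

```python
import pickle


def can_pickle(obj):
    # Report whether the standard pickle module can serialize obj.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so pickling it by reference fails.
square = lambda x: x * x

print(can_pickle(len))     # a built-in, found by name in its module
print(can_pickle(square))
```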
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28459908 --- Diff: python/pyspark/cloudpickle.py --- @@ -40,164 +40,126 @@ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - +from __future__ import print_function import operator import os +import io import pickle import struct import sys import types from functools import partial import itertools -from copy_reg import _extension_registry, _inverted_registry, _extension_cache -import new import dis import traceback -import platform - -PyImp = platform.python_implementation() - -import logging -cloudLog = logging.getLogger(Cloud.Transport) --- End diff -- I'd be interested in chatting more about cloudpickle, but we should probably move this discussion to a mailing list since it's hard to link to GitHub line comments. Mind sending an email to the [dev list](https://spark.apache.org/community.html) and CC'ing me? My email is `joshro...@databricks.com`.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user ogrisel commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28461471 --- Diff: python/pyspark/cloudpickle.py --- @@ -40,164 +40,126 @@ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - +from __future__ import print_function import operator import os +import io import pickle import struct import sys import types from functools import partial import itertools -from copy_reg import _extension_registry, _inverted_registry, _extension_cache -import new import dis import traceback -import platform - -PyImp = platform.python_implementation() - -import logging -cloudLog = logging.getLogger(Cloud.Transport) --- End diff -- Maybe you can join us on gitter.im: https://gitter.im/cloudpipe/cloudpickle @rgbkrk @sdegryze and I can also join the spark-dev mailing list if you really want us to.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user rgbkrk commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28451434 --- Diff: python/pyspark/cloudpickle.py --- @@ -40,164 +40,126 @@ NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - +from __future__ import print_function import operator import os +import io import pickle import struct import sys import types from functools import partial import itertools -from copy_reg import _extension_registry, _inverted_registry, _extension_cache -import new import dis import traceback -import platform - -PyImp = platform.python_implementation() - -import logging -cloudLog = logging.getLogger(Cloud.Transport) --- End diff -- It's no longer vendoring when changes are happening in your own code base of cloudpickle. This ends up being even worse for projects hoping to use it when pyspark itself isn't pip installable either. What's the best path forward for us to help maintain cloudpickle in a way that is friendly to you vendoring it?
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93536322 [Test build #30362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30362/consoleFull) for PR 5173 at commit [`4006829`](https://github.com/apache/spark/commit/400682982bbb6277d4b2c6dca2c3b88d491e5b21). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93536338 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30362/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93539962 [Test build #678 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/678/consoleFull) for PR 5173 at commit [`8c8b957`](https://github.com/apache/spark/commit/8c8b957fdeb721dc772584ba6135810163ef488c).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93542611 [Test build #30364 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30364/consoleFull) for PR 5173 at commit [`8f8e710`](https://github.com/apache/spark/commit/8f8e7100937f2fd5ce5252c11e42cb2230d21581). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93542647 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30364/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93549729 [Test build #30363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30363/consoleFull) for PR 5173 at commit [`5c57c95`](https://github.com/apache/spark/commit/5c57c95a0e8b8ca11f88a60d6e48ef0e4caa3a16). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93549748 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30363/ Test PASSed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93551704 [Test build #30365 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30365/consoleFull) for PR 5173 at commit [`8c8b957`](https://github.com/apache/spark/commit/8c8b957fdeb721dc772584ba6135810163ef488c). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93551719 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30365/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93593451 **[Test build #30373 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30373/consoleFull)** for PR 5173 at commit [`179fc8d`](https://github.com/apache/spark/commit/179fc8d7b426cbd2e00640ac3b46b26475cfa73a) after a configured wait of `120m`.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93593461 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30373/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93575149 [Test build #680 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/680/consoleFull) for PR 5173 at commit [`179fc8d`](https://github.com/apache/spark/commit/179fc8d7b426cbd2e00640ac3b46b26475cfa73a).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93595119 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30376/ Test PASSed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93595106 [Test build #30376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30376/consoleFull) for PR 5173 at commit [`bf225d7`](https://github.com/apache/spark/commit/bf225d7ddd43cf297eeabbfe6e888b0655b6b1a5). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. * This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93575338 [Test build #30376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30376/consoleFull) for PR 5173 at commit [`bf225d7`](https://github.com/apache/spark/commit/bf225d7ddd43cf297eeabbfe6e888b0655b6b1a5).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93575307
[Test build #678 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/678/consoleFull) for PR 5173 at commit [`8c8b957`](https://github.com/apache/spark/commit/8c8b957fdeb721dc772584ba6135810163ef488c).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28483754
--- Diff: python/pyspark/rdd.py ---
@@ -123,6 +129,13 @@ def _load_from_socket(port, serializer):
         sock.close()
+def ignore_unicode_prefix(f):
--- End diff --
Please add a docstring for this function. It is not clear from the name what it does.
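The thread does not show the body of `ignore_unicode_prefix`, so the following is only a guess at what a decorator with that name might do, written with the kind of docstring the review asks for. It assumes the goal is to let doctests written with `u'...'` literals pass on Python 3, where `repr` of a string has no `u` prefix. The names `strip_u_prefix_in_doc` and `first_word` are hypothetical and are not taken from the pull request.

```python
import re
import sys


def strip_u_prefix_in_doc(f):
    """
    Hypothetical sketch: on Python 3, remove the u/U prefix from
    single-quoted string literals in f's docstring, so that doctest
    output written for Python 2 still matches.
    """
    if sys.version_info[0] >= 3 and f.__doc__:
        # Match a u or U that starts a quoted literal, i.e. one preceded
        # by a non-word character or the start of the text.
        literal_re = re.compile(r"(\W|^)[uU](['])")
        f.__doc__ = literal_re.sub(r"\1\2", f.__doc__)
    return f


@strip_u_prefix_in_doc
def first_word(s):
    """
    >>> first_word(u'hello world')
    u'hello'
    """
    return s.split()[0]


print(first_word.__doc__)
```

On Python 2 the decorator leaves the docstring untouched, so one doctest can serve both interpreters.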
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28483721
--- Diff: python/pyspark/mllib/util.py ---
@@ -40,7 +40,7 @@ def _parse_libsvm_line(line, multiclass=None):
     nnz = len(items) - 1
     indices = np.zeros(nnz, dtype=np.int32)
     values = np.zeros(nnz)
-    for i in xrange(nnz):
+    for i in range(nnz):
--- End diff --
Can we use the following instead? Using `range` would hurt performance in Python 2 here.
~~~
if sys.version >= '3':
    xrange = range
~~~
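For context on the suggestion above: in Python 2, `range` builds a full list while `xrange` is lazy, and Python 3 keeps only the lazy form under the name `range`. A minimal sketch of the aliasing idea follows. It checks `sys.version_info` rather than comparing the `sys.version` string, which is a substitution on my part, and `fill_indices` is a hypothetical stand-in for the loop in `_parse_libsvm_line`.

```python
import sys

# Python 3 has no xrange; its range is already lazy, so alias it.
if sys.version_info[0] >= 3:
    xrange = range


def fill_indices(nnz):
    """Hypothetical stand-in for the index loop in _parse_libsvm_line."""
    indices = [0] * nnz
    for i in xrange(nnz):  # lazy iteration on both Python 2 and Python 3
        indices[i] = i
    return indices


print(fill_indices(3))  # [0, 1, 2]
```

With the alias in place, the loop body can keep using `xrange` unchanged on either interpreter.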
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28483636
--- Diff: python/pyspark/mllib/recommendation.py ---
@@ -61,7 +61,7 @@ class MatrixFactorizationModel(JavaModelWrapper, JavaSaveable, JavaLoader):
     >>> model = ALS.train(ratings, 4, seed=10)
     >>> model.userFeatures().collect()
-    [(1, array('d', [...])), (2, array('d', [...]))]
+    [(1, DenseVector([...])), (2, DenseVector([...]))]
--- End diff --
We shouldn't change the return type. I'm not sure `DenseVector` is a safe replacement for `array`.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28483597
--- Diff: examples/src/main/python/mllib/gradient_boosted_trees.py ---
@@ -49,8 +50,8 @@ def testRegression(trainingData, testData):
     # Evaluate model on test instances and compute test error
     predictions = model.predict(testData.map(lambda x: x.features))
     labelsAndPredictions = testData.map(lambda lp: lp.label).zip(predictions)
-    testMSE = labelsAndPredictions.map(lambda (v, p): (v - p) * (v - p)).sum() \
-        / float(testData.count())
+    testMSE = labelsAndPredictions.map(lambda v_p1: (v_p1[0] - v_p1[1]) * (v_p1[0] - v_p1[1]))\
--- End diff --
Why use `v_p1` instead of `vp` or `v_p`?
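The diff above exists because Python 3 removed tuple unpacking in function and lambda parameters, so `lambda (v, p): ...` is a syntax error there. As a sketch of an alternative to indexing into an awkwardly named parameter, a small named function can unpack the pair in its body. This runs on plain Python lists rather than an RDD, and `squared_error` and the sample `pairs` are hypothetical.

```python
def squared_error(v_p):
    """Squared difference between a label and a prediction, given as a pair."""
    v, p = v_p  # unpack inside the body; works on Python 2 and Python 3
    return (v - p) * (v - p)


# Hypothetical (label, prediction) pairs standing in for labelsAndPredictions.
pairs = [(1.0, 0.5), (2.0, 2.0), (3.0, 4.0)]
mse = sum(map(squared_error, pairs)) / float(len(pairs))
print(mse)
```

The same function could be passed to an RDD's `map` in place of the lambda.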
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28483601
--- Diff: python/pyspark/mllib/feature.py ---
@@ -216,7 +221,10 @@ def __init__(self, numFeatures=1 << 20):
     def indexOf(self, term):
         """ Returns the index of the input term. """
-        return hash(term) % self.numFeatures
+        # hash of string is not portable in Python 3
+        if isinstance(term, unicode):
+            term = term.encode('utf-8')
+        return (binascii.crc32(term) & 0x7FFF) % self.numFeatures
--- End diff --
Is there any performance overhead with the new approach?
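The concern in the diff is that Python 3 randomizes `hash()` for strings per process by default, so the same term could map to different indices on different workers. A minimal sketch of a process-independent index using `binascii.crc32` follows. The function name `index_of` is hypothetical, and the `0xFFFFFFFF` mask is my own choice for getting the same unsigned value on Python 2 and Python 3; it is not the mask shown in the diff.

```python
import binascii


def index_of(term, num_features=1 << 20):
    """Map a term to a bucket in [0, num_features) without using hash()."""
    if not isinstance(term, bytes):
        # Text is encoded first so that crc32 always sees bytes.
        term = term.encode('utf-8')
    # crc32 can be negative on Python 2; the mask makes it unsigned everywhere.
    return (binascii.crc32(term) & 0xFFFFFFFF) % num_features


print(index_of(u"spark") == index_of(b"spark"))  # True
```

Whether this is slower than the built-in `hash` is the open question the reviewer raises; the sketch does not measure it.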
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28483605
--- Diff: python/pyspark/rdd.py ---
@@ -368,12 +381,14 @@ def randomSplit(self, weights, seed=None):
         :param seed: random seed
         :return: split RDDs in a list
-        >>> rdd = sc.parallelize(range(5), 1)
+        >>> rdd = sc.parallelize(range(500), 1)
         >>> rdd1, rdd2 = rdd.randomSplit([2, 3], 17)
-        >>> rdd1.collect()
-        [1, 3]
-        >>> rdd2.collect()
-        [0, 2, 4]
+        >>> len(rdd1.collect() + rdd2.collect())
+        500
+        >>> 180 < rdd1.count() < 220
--- End diff --
This could be relaxed to 150-250, and then `250 < rdd2.count() < 350`.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/5173#discussion_r28483604
--- Diff: python/pyspark/rdd.py ---
@@ -353,8 +365,8 @@ def sample(self, withReplacement, fraction, seed=None):
         :param seed: seed for the random number generator
         >>> rdd = sc.parallelize(range(100), 4)
-        >>> rdd.sample(False, 0.1, 81).count()
-        10
+        >>> 9 <= rdd.sample(False, 0.1, 81).count() <= 11
--- End diff --
We can further relax the bounds to match the theory, e.g., 6-14.
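One way to read "match the theory" is through the binomial distribution. Assuming each element is kept independently with probability `p` (an assumption about the sampler, not something stated in this thread), the sample size is Binomial(n, p) with mean `n * p` and standard deviation `sqrt(n * p * (1 - p))`. The sketch below computes bounds from that; `binomial_bounds` is a hypothetical helper.

```python
import math


def binomial_bounds(n, p, k):
    """Return (low, high): the mean of Binomial(n, p) minus/plus k standard deviations."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return mean - k * sd, mean + k * sd


# sample(False, 0.1) over 100 elements: mean 10, standard deviation 3.
print(binomial_bounds(100, 0.1, 1))
# randomSplit([2, 3]) over 500 elements: the first split has p = 2/5.
print(binomial_bounds(500, 0.4, 3))
```

Under that assumption, the suggested 6-14 range is roughly 1.3 standard deviations around 10, and 150-250 is roughly 4.6 standard deviations around 200.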
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93569850 [Test build #30373 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30373/consoleFull) for PR 5173 at commit [`179fc8d`](https://github.com/apache/spark/commit/179fc8d7b426cbd2e00640ac3b46b26475cfa73a).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-92972052 [Test build #667 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/667/consoleFull) for PR 5173 at commit [`71535e9`](https://github.com/apache/spark/commit/71535e9450419adec289685abdd306d2e264e710).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93123888 [Test build #30287 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30287/consoleFull) for PR 5173 at commit [`4006829`](https://github.com/apache/spark/commit/400682982bbb6277d4b2c6dca2c3b88d491e5b21).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93147729
[Test build #30287 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30287/consoleFull) for PR 5173 at commit [`4006829`](https://github.com/apache/spark/commit/400682982bbb6277d4b2c6dca2c3b88d491e5b21).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch **adds the following new dependencies:**
  * `snappy-java-1.1.1.7.jar`
* This patch **removes the following dependencies:**
  * `snappy-java-1.1.1.6.jar`
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93147750 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30287/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user davies commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93119996 @mengxr Could you take a look at the MLlib changes?
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user shaananc commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93175644
I'm getting the following error:
`File "/spark/python/pyspark/serializers.py", line 419, in loads return pickle.loads(obj) TypeError: ('code() takes at most 14 arguments (15 given)', <type 'code'>, (2, 0, 2, 2, 19, '\x88\x00\x00|\x01\x00\x83\x01\x00S', (None,), (), (u's', u'iterator'), u'/spark/python/pyspark/rdd.py', u'func', 294, '\x00\x01', (u'f',), ()))`
It happens whenever I try to run `data = (1, 2)`, `distData = sc.parallelize(data)`, `distData.reduce(lambda a, b: a + b)` after having pulled this into master. Any clue what the issue could be?
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93117173 [Test build #675 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/675/consoleFull) for PR 5173 at commit [`2fc0066`](https://github.com/apache/spark/commit/2fc0066bc402a5b1579c9355687ea8f46de9e99c).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93131436
[Test build #675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/675/consoleFull) for PR 5173 at commit [`2fc0066`](https://github.com/apache/spark/commit/2fc0066bc402a5b1579c9355687ea8f46de9e99c).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93125045 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30285/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93125649 [Test build #676 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/676/consoleFull) for PR 5173 at commit [`4006829`](https://github.com/apache/spark/commit/400682982bbb6277d4b2c6dca2c3b88d491e5b21).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93156062
[Test build #676 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/676/consoleFull) for PR 5173 at commit [`4006829`](https://github.com/apache/spark/commit/400682982bbb6277d4b2c6dca2c3b88d491e5b21).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user davies commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93184039
@shaananc It works fine here:
```
Using Python version 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014 00:54:21)
SparkContext available as sc, SQLContext available as sqlContext.
>>> data = (1, 2)
>>> sc.parallelize(data).reduce(lambda a, b: a + b)
3
```
What is your environment?
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93075612 [Test build #674 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/674/consoleFull) for PR 5173 at commit [`2fc0066`](https://github.com/apache/spark/commit/2fc0066bc402a5b1579c9355687ea8f46de9e99c).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93059764 [Test build #30278 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30278/consoleFull) for PR 5173 at commit [`2fc0066`](https://github.com/apache/spark/commit/2fc0066bc402a5b1579c9355687ea8f46de9e99c).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93062692 [Test build #669 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/669/consoleFull) for PR 5173 at commit [`2fc0066`](https://github.com/apache/spark/commit/2fc0066bc402a5b1579c9355687ea8f46de9e99c).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93106732
[Test build #30278 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30278/consoleFull) for PR 5173 at commit [`2fc0066`](https://github.com/apache/spark/commit/2fc0066bc402a5b1579c9355687ea8f46de9e99c).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch **adds the following new dependencies:**
  * `snappy-java-1.1.1.7.jar`
* This patch **removes the following dependencies:**
  * `snappy-java-1.1.1.6.jar`
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93106745 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30278/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-93113314
[Test build #674 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/674/consoleFull) for PR 5173 at commit [`2fc0066`](https://github.com/apache/spark/commit/2fc0066bc402a5b1579c9355687ea8f46de9e99c).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch **adds the following new dependencies:**
  * `commons-math3-3.4.1.jar`
  * `snappy-java-1.1.1.7.jar`
* This patch **removes the following dependencies:**
  * `commons-math3-3.1.1.jar`
  * `snappy-java-1.1.1.6.jar`
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-92466565 [Test build #30186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30186/consoleFull) for PR 5173 at commit [`2ddfba0`](https://github.com/apache/spark/commit/2ddfba04c2b874a632a7b0c49d28ec570109767d).
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-92486933
[Test build #30186 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30186/consoleFull) for PR 5173 at commit [`2ddfba0`](https://github.com/apache/spark/commit/2ddfba04c2b874a632a7b0c49d28ec570109767d).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-92486953 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/30186/ Test FAILed.
[GitHub] spark pull request: [SPARK-4897] [PySpark] Python 3 support
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5173#issuecomment-92517118
[Test build #30193 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/30193/consoleFull) for PR 5173 at commit [`5a55ab4`](https://github.com/apache/spark/commit/5a55ab4cf65dc13e76a32325d99dda81d7e65874).
* This patch **fails PySpark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
* This patch does not change any dependencies.