[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/18052 IMHO it is, but this feature is hardly essential. Arguably we wouldn't need the Scala API in the first place if the built-in `Future` supported cancellation. It is possible I am overthinking the latter point, but I don't see much point in adding an API which doesn't integrate with existing language features. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
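For context on the cancellation point: Spark's Scala `FutureAction` adds a `cancel()` method on top of `scala.concurrent.Future`, which has none. Python's built-in `concurrent.futures.Future` does have `cancel()`, but it only succeeds for work that has not started yet. The sketch below uses only the standard library (no Spark); `blocking_job` is a hypothetical stand-in for a blocking action such as `rdd.count()`. It shows that a running future cannot be cancelled while a queued one can, so a Python wrapper would still need a separate mechanism to stop a Spark job that is already running.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo_cancel_semantics():
    started = threading.Event()
    release = threading.Event()

    def blocking_job():
        # Hypothetical stand-in for a blocking action such as rdd.count().
        started.set()
        release.wait()
        return "done"

    with ThreadPoolExecutor(max_workers=1) as pool:
        running = pool.submit(blocking_job)
        queued = pool.submit(blocking_job)    # waits behind the first task
        started.wait()                        # make sure the first task is running
        running_cancelled = running.cancel()  # False: already running
        queued_cancelled = queued.cancel()    # True: never started
        release.set()                         # let the running task finish
        result = running.result()
    return running_cancelled, queued_cancelled, result
```

Calling `demo_cancel_semantics()` returns `(False, True, "done")`.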
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18052 Hm.. for the former, isn't that problem already present in other APIs if Py4J itself is problematic? For the latter, if this is not that simple, I would personally rather avoid adding this API for now.
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/18052 Personally, I would rather not include this at all than use the JVM implementation with callbacks:

- The Py4J gateway is already pretty slow and can be unstable under high load. Putting more pressure on it doesn't seem like a good approach.
- To "wrap" the JVM side we would have to re-implement a full-featured future API, at least partially compatible with `asyncio.Future` or `concurrent.futures.Future`. That is a much higher maintenance burden, especially since both APIs are actively developed.
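One way to stay compatible with both future APIs without re-implementing either is to return the standard `concurrent.futures.Future` from a Python thread pool and let `asyncio` code adapt it with `asyncio.wrap_future`. A minimal sketch, assuming the blocking action runs in a Python thread pool rather than through JVM callbacks; `blocking_action`, `count_in_background` and `run_demo` are hypothetical names, with `blocking_action` standing in for an action such as `rdd.count()`:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def blocking_action(data):
    # Hypothetical stand-in for a blocking PySpark action such as rdd.count().
    return len(data)


async def count_in_background(pool, data):
    # The pool hands back a plain concurrent.futures.Future ...
    cf_future = pool.submit(blocking_action, data)
    # ... which asyncio.wrap_future turns into an awaitable asyncio.Future.
    return await asyncio.wrap_future(cf_future)


def run_demo():
    with ThreadPoolExecutor(max_workers=2) as pool:
        return asyncio.run(count_in_background(pool, [1, 2, 3]))
```

Here `run_demo()` returns `3`. Code that prefers plain threads can call `.result()` on the same `concurrent.futures.Future` instead of awaiting it.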
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18052 Yea, then I would go +1 if this can simply be done by exposing the existing APIs. I need this for my production case, at least. This is not something that should be closed for lack of interest.
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18052 Sure, I think the Python-specific implementation is probably something we don't want to pick up as a maintenance burden, but exposing the current Java API seems reasonable, especially if we ever plan on adding async actions to Datasets.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18052 I subscribed to this PR because I am actually a user who, oddly enough, implemented something similar on the user side. One thing I can tell you is that it was buggy and hard to debug and test, in particular when the workload was intensive. Some jobs were aborted and I ended up writing odd workarounds to prevent this (it may have been my fault, but I just wanted to share my anecdote). If this needs a separate Python-specific implementation, I would rather not support this functionality for now because of the maintenance cost.
Github user davies commented on the issue: https://github.com/apache/spark/pull/18052 Personally, I think less is more: don't add everything to every piece of software, otherwise every program eventually ends up able to write email. The RDD API is more or less frozen; we don't add APIs to it unless they are necessary.
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18052 I think the users who want async actions are the same kind of users who would be writing multi-threaded Spark applications. That being said, @davies, is there a reason you don't want native support for this in Spark? We already have built-in support on the Scala side.
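In multi-threaded driver code, running jobs can be stopped with PySpark's `SparkContext.setJobGroup` and `SparkContext.cancelJobGroup`, which tag jobs with a group id and cancel them by that id. A minimal sketch, assuming the job group is set in the same thread that triggers the action; `submit_with_job_group` and `cancel_async` are hypothetical helpers, and `sc` is only duck-typed so the sketch runs without Spark. Caveat: in older PySpark versions Python threads are not pinned to JVM threads, so a job group set from a Python thread may not reliably reach the job it triggers.

```python
import uuid


def submit_with_job_group(pool, sc, action, *args):
    """Run `action(*args)` in `pool` under a fresh Spark job group.

    Returns the concurrent.futures.Future and the job group id.
    """
    group_id = "async-" + uuid.uuid4().hex

    def run():
        # Job groups are thread-local properties, so set the group in the
        # worker thread that will actually trigger the Spark job.
        sc.setJobGroup(group_id, "async action", interruptOnCancel=True)
        return action(*args)

    return pool.submit(run), group_id


def cancel_async(sc, future, group_id):
    """Best-effort cancellation; returns False if the work already finished."""
    if future.done():
        return False
    if future.cancel():
        # Still queued in the pool: no Spark job was started.
        return True
    # Already running: ask Spark to cancel the jobs in this group.
    sc.cancelJobGroup(group_id)
    return True
```

Queued work is cancelled on the Python side alone, while work that is already running needs Spark's job-group cancellation, matching the split between `Future.cancel()` and running jobs discussed earlier in the thread.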
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/18052 @davies It is. Monkey patching the context, `RDD` and some classes not covered by the Scala `AsyncRDDActions` [takes around 100 LOCs](https://github.com/zero323/pyspark-asyncactions) (excluding tests, comments, and package boilerplate). Without the implicit Spark requirements (thread safety), one could also use `asyncio` and skip the thread pool altogether.
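The monkey-patching approach can be sketched in a few lines: each blocking action gets an `*Async` twin that submits it to a shared thread pool and returns a `concurrent.futures.Future`. This is an illustration, not the linked package's code. `patch_async_actions` and `FakeRDD` are hypothetical, with `FakeRDD` standing in for `pyspark.RDD` so the sketch runs without Spark. The method names mirror the Scala `AsyncRDDActions` methods (`countAsync`, `collectAsync`, `takeAsync`, `foreachAsync`).

```python
from concurrent.futures import ThreadPoolExecutor

# One shared pool for all asynchronous actions (size chosen arbitrarily).
_EXECUTOR = ThreadPoolExecutor(max_workers=4)


def _make_async(action_name):
    def async_action(self, *args, **kwargs):
        # Look up the blocking action on the instance and run it in the pool.
        return _EXECUTOR.submit(getattr(self, action_name), *args, **kwargs)

    async_action.__name__ = action_name + "Async"
    async_action.__doc__ = (
        "Asynchronous version of {}; returns a concurrent.futures.Future."
        .format(action_name)
    )
    return async_action


def patch_async_actions(cls, actions=("count", "collect", "take", "foreach")):
    """Attach <action>Async methods to `cls` (e.g. pyspark.RDD)."""
    for name in actions:
        setattr(cls, name + "Async", _make_async(name))
    return cls


class FakeRDD:
    """Hypothetical stand-in for pyspark.RDD with a few blocking actions."""

    def __init__(self, data):
        self._data = list(data)

    def count(self):
        return len(self._data)

    def collect(self):
        return list(self._data)

    def take(self, n):
        return self._data[:n]

    def foreach(self, f):
        for x in self._data:
            f(x)


patch_async_actions(FakeRDD)
```

With real PySpark the call would be `patch_async_actions(pyspark.RDD)`. That relies on Spark accepting jobs submitted concurrently from several driver threads, which is the thread-safety requirement mentioned above.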
Github user davies commented on the issue: https://github.com/apache/spark/pull/18052 It seems that these would also be easy to implement outside of PySpark, by users themselves or by third-party libraries, right? If that's the case, I'd rather not add them to PySpark.
Github user holdenk commented on the issue: https://github.com/apache/spark/pull/18052 Let's see what @davies has to say.
Github user zero323 commented on the issue: https://github.com/apache/spark/pull/18052 __Note__: [Waiting for some feedback](https://twitter.com/holdenkarau/status/866672579318337537).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77180/
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77180 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77180/testReport)** for PR 18052 at commit [`ff55e2d`](https://github.com/apache/spark/commit/ff55e2d9788dc3f212d923e36a6984f6c97f51d4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77180 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77180/testReport)** for PR 18052 at commit [`ff55e2d`](https://github.com/apache/spark/commit/ff55e2d9788dc3f212d923e36a6984f6c97f51d4).
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77161 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77161/testReport)** for PR 18052 at commit [`06116d1`](https://github.com/apache/spark/commit/06116d1b29c9038df4a3c231bde471c337bf3c53). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77161/
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Merged build finished. Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77161 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77161/testReport)** for PR 18052 at commit [`06116d1`](https://github.com/apache/spark/commit/06116d1b29c9038df4a3c231bde471c337bf3c53).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Merged build finished. Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77157 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77157/testReport)** for PR 18052 at commit [`72bd097`](https://github.com/apache/spark/commit/72bd097896aca042944d8e20282617e4864d9dd0). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18052 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77157/
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18052 **[Test build #77157 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77157/testReport)** for PR 18052 at commit [`72bd097`](https://github.com/apache/spark/commit/72bd097896aca042944d8e20282617e4864d9dd0).