[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/18052
  
IMHO it is, but this feature is hardly essential. Arguably we wouldn't need 
the Scala API in the first place if the built-in `Future` supported cancellation.

It is possible I am overthinking the latter, but I don't see much point 
in adding an API which doesn't integrate with existing language features. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18052
  
Hm.. for the former, isn't that a problem that already exists for other APIs 
if Py4J itself is problematic? For the latter, if this is not that simple, I 
would personally rather avoid adding this API for now.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/18052
  
Personally I would rather not include this at all than use the JVM 
implementation with callbacks:

- The Py4J gateway is already pretty slow, and can be unstable under high load. 
Putting more pressure on it doesn't seem like a good approach.
- To "wrap" the JVM side we would have to re-implement a full-featured future 
API, at least partially compatible with `asyncio.Future` or 
`concurrent.futures.Future`. That is a much higher maintenance burden, especially 
when both APIs are actively developed.
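
To make the second point concrete, here is a minimal sketch of what such a wrapper has to do. `CallbackHandle` is a hypothetical stand-in for a JVM-side future that reports completion through a callback; it is not a Py4J or Spark class, and `as_python_future` is an illustrative name, not an existing API. Only the `concurrent.futures.Future` calls are real.

```python
import threading
from concurrent.futures import Future


class CallbackHandle:
    """Hypothetical callback-style handle (NOT a Py4J or Spark class)."""

    def __init__(self, compute):
        self._compute = compute
        self._callbacks = []

    def on_complete(self, callback):
        # callback(result, error) is invoked once; error is None on success
        self._callbacks.append(callback)

    def start(self):
        def run():
            try:
                result, error = self._compute(), None
            except Exception as exc:
                result, error = None, exc
            for callback in self._callbacks:
                callback(result, error)

        thread = threading.Thread(target=run)
        thread.start()
        return thread


def as_python_future(handle):
    """Bridge the callback-style handle into a concurrent.futures.Future."""
    future = Future()

    def complete(result, error):
        # Returns False if the caller cancelled the future before completion
        if not future.set_running_or_notify_cancel():
            return
        if error is not None:
            future.set_exception(error)
        else:
            future.set_result(result)

    handle.on_complete(complete)
    return future


handle = CallbackHandle(lambda: sum(range(10)))
future = as_python_future(handle)
handle.start().join()
```

Even this small bridge ignores what a real wrapper would need: propagating `cancel()` back to the underlying job, progress reporting, and a parallel adapter for `asyncio.Future`.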





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18052
  
Yea, I would then go +1 if this can be done simply by exposing the existing 
APIs. I need this at least for my production case, so this should not be 
closed for lack of interest.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18052
  
Sure, I think that the Python specific implementation is probably something 
we don't want to pick up as a maintenance burden - but exposing the current 
Java API seems reasonable -- especially if we are ever planning on adding async 
actions on Datasets.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/18052
  
I subscribed to this PR because I am actually a user who implemented something 
similar on the user side, in a somewhat hacky way. One thing I can say is that 
it was buggy and hard to debug / test, in particular when the workload was 
intensive. Some jobs were aborted and I ended up writing weird code to prevent 
this (it was maybe my fault, but I just wanted to share my anecdote).

If this needs a separate Python-specific implementation, I would rather not 
support this functionality for now, for maintenance reasons.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread davies
Github user davies commented on the issue:

https://github.com/apache/spark/pull/18052
  
Personally, I think less is more: don't add everything to every piece of 
software, otherwise every piece of software will eventually be able to write email. 

The RDD API is kind of frozen; we don't add more APIs to it unless it's 
necessary. 





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18052
  
I think the kind of users wanting to use async actions are also the same 
kind of users who would be writing multi-threaded Spark applications.

That being said @davies is there a reason you don't want support for this 
in Spark natively? We already have built in support on the Scala side.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/18052
  
@davies It is. Monkey patching the context, `RDD` and some classes not covered 
by Scala `AsyncRDDActions` [takes around 100 
LOCs](https://github.com/zero323/pyspark-asyncactions) (excluding tests, 
comments, and package boilerplate). Without implicit Spark requirements (thread 
safety) one could also use `asyncio`, and skip the thread pool altogether.
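
As a rough illustration of the thread-pool approach (not the code from the linked package), a minimal sketch could look like the following. `FakeRDD` is a hypothetical stand-in so the example runs without Spark, and `patch_async_actions` is an illustrative helper name; the method names only mirror Scala's `collectAsync` / `countAsync`.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeRDD:
    """Hypothetical stand-in for an RDD so the sketch runs without Spark."""

    def __init__(self, data):
        self._data = list(data)

    def collect(self):
        return list(self._data)

    def count(self):
        return len(self._data)


def _make_async(name, executor):
    def async_action(self, *args, **kwargs):
        # Look up the blocking action by name and run it in the pool
        return executor.submit(getattr(self, name), *args, **kwargs)

    async_action.__name__ = name + "Async"
    return async_action


def patch_async_actions(cls, executor, actions=("collect", "count")):
    """Attach `<action>Async` methods returning concurrent.futures.Future."""
    for name in actions:
        setattr(cls, name + "Async", _make_async(name, executor))


executor = ThreadPoolExecutor(max_workers=4)
patch_async_actions(FakeRDD, executor)

rdd = FakeRDD(range(5))
future = rdd.collectAsync()  # a concurrent.futures.Future
collected = future.result()
```

Note that `Future.cancel()` only succeeds while the call is still queued, so a sketch like this does not by itself cancel a running Spark job.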





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread davies
Github user davies commented on the issue:

https://github.com/apache/spark/pull/18052
  
It seems that these are also easy to implement outside of PySpark, by users 
themselves or in third-party libraries, right? If that's the case, I'd rather 
not add it to PySpark.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-20 Thread holdenk
Github user holdenk commented on the issue:

https://github.com/apache/spark/pull/18052
  
Let's see what @davies has to say.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-06-03 Thread zero323
Github user zero323 commented on the issue:

https://github.com/apache/spark/pull/18052
  
__Note__: [Waiting for some 
feedback](https://twitter.com/holdenkarau/status/866672579318337537).





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18052
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77180/
Test PASSed.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-22 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18052
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18052
  
**[Test build #77180 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77180/testReport)**
 for PR 18052 at commit 
[`ff55e2d`](https://github.com/apache/spark/commit/ff55e2d9788dc3f212d923e36a6984f6c97f51d4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-22 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18052
  
**[Test build #77180 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77180/testReport)**
 for PR 18052 at commit 
[`ff55e2d`](https://github.com/apache/spark/commit/ff55e2d9788dc3f212d923e36a6984f6c97f51d4).





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18052
  
**[Test build #77161 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77161/testReport)**
 for PR 18052 at commit 
[`06116d1`](https://github.com/apache/spark/commit/06116d1b29c9038df4a3c231bde471c337bf3c53).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18052
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77161/
Test FAILed.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18052
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18052
  
**[Test build #77161 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77161/testReport)**
 for PR 18052 at commit 
[`06116d1`](https://github.com/apache/spark/commit/06116d1b29c9038df4a3c231bde471c337bf3c53).





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18052
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18052
  
**[Test build #77157 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77157/testReport)**
 for PR 18052 at commit 
[`72bd097`](https://github.com/apache/spark/commit/72bd097896aca042944d8e20282617e4864d9dd0).
 * This patch **fails Python style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18052
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77157/
Test FAILed.





[GitHub] spark issue #18052: [SPARK-20347][PYSPARK][WIP] Provide AsyncRDDActions in P...

2017-05-21 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18052
  
**[Test build #77157 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77157/testReport)**
 for PR 18052 at commit 
[`72bd097`](https://github.com/apache/spark/commit/72bd097896aca042944d8e20282617e4864d9dd0).

