[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 >If we can remove uses of askWithRetry as we find these issues, we can, at some point, finally get rid of the API altogether. How do you think about providing a *"blocking"* `ask` in `RpcEndpointRef`? Just wait the future but no retry. Thus no need to wait the future outside of `RpcEndpontRef`. If you agree, I'll make another PR for changing. Currently I already found some places using a blocking `ask`, that's why I think it's needed to provid one in `RpcEndpointRef`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @vanzin Thanks a lot for your comment. It's very helpful. I'll change it to `ask`. I think it make sense to keep receiver idempotent when handling `AskPermissionToCommitOutput`, even though we use `ask` to replace `askWithRetry`. So I didn't remove the code and unit test. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16503 You can make `ask` blocking by waiting for its future (e.g. with `ThreadUtils.awaitResult`). My point of not using `askWithRetry` is that it's basically an unneeded API, and a leftover from the akka days that doesn't make sense anymore. It's prone to cause deadlocks (exactly because it's blocking), it imposes restrictions on the caller (e.g. idempotency) and other things that people generally don't pay that much attention to when using it. If we can remove uses of `askWithRetry` as we find these issues, we can, at some point, finally get rid of the API altogether. > RPC layer doesn't drop message but message can be timeout. Yes it can timeout. You can retry it (basically doing what `askWithRetry` does) but it should be such an edge case that failing the task should be ok. If you think about how the RPC layer works when you use `askWithRetry`, this is what happens: - first RPC is sent - remote end is blocked on something, RPC is waiting in the queue - sender re-sends the RPC - lather, rinse, repeat - at some point, receive goes through the RPC queue and start responding to the RPCs - it responds to the *first* RPC above first, sender ignores the answer since RPC was timed out - lather, rinse, repeat - finally the last RPC is responded to and the sender sees the reply So it's a really expensive way of just doing `ask` with a longer timeout. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @zsxing, @vanzin Maybe using `ask` in method `canCommit` is not suitable(i think). Because `ask` returns a Future, but it should be a blocking process to get result of `AskPermissionToCommitOutput` in `canCommit`. RPC layer doesn't drop message but message can be timeout. If timeout happens, we can retry or fail the task and let spark reschedule it, both of them are ok. But for some heavy tasks, retry is light weight (i think). So my suggestion is to keep the `askWithRetry` here and change the receiver to be idempotent. What do you think ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16503 > In which case the executor will die (see CoarseGrainedExecutorBackend::onDisconnected). Yeah. Didn't recall that. Then I agree that using `ask` is better. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16503 > It doesn't drop but the connection may be broken In which case the executor will die (see `CoarseGrainedExecutorBackend::onDisconnected`). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16503 > That was the case with akka (I think, not really sure), but the netty RPC layer doesn't drop messages. The new one is "exactly once". It doesn't drop but the connection may be broken. `askWithRetry` will reconnect if the connection is broken. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16503 > The RPC layer only guarantees at-most-once That was the case with akka (I think, not really sure), but the netty RPC layer doesn't drop messages. The new one is "exactly once". --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/16503 Good catch. Looks good to me. @vanzin The RPC layer only guarantees at-most-once. Retry may be still helpful in some case, but the receiver should be idempotent. Either the current change or changing to use `ask` (Spark will retry the task) is good to me. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16503 I think this is another case where using `askWithRetry` makes no sense given the guarantees of the RPC layer. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @zsxwing @kayousterhout @andrewor14 Could you please help take a look at this ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/16503 @mccheah @JoshRosen @ash211 Could you please take look at this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org