[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-10 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16503
  
>If we can remove uses of askWithRetry as we find these issues, we can, at 
some point, finally get rid of the API altogether.

How do you think about providing a *"blocking"* `ask` in `RpcEndpointRef`? 
Just wait the future but no retry. Thus no need to wait the future outside of 
`RpcEndpontRef`. If you agree, I'll make another PR for changing. Currently I 
already found some places using a blocking `ask`, that's why I think it's 
needed to provid one in `RpcEndpointRef`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-10 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16503
  
@vanzin 
Thanks a lot for your comment. It's very helpful.
I'll change it to `ask`.
I think it make sense to keep receiver idempotent when handling 
`AskPermissionToCommitOutput`, even though we use `ask` to replace 
`askWithRetry`. So I didn't remove the code and unit test.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-10 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/16503
  
You can make `ask` blocking by waiting for its future (e.g. with 
`ThreadUtils.awaitResult`).

My point of not using `askWithRetry` is that it's basically an unneeded 
API, and a leftover from the akka days that doesn't make sense anymore. It's 
prone to cause deadlocks (exactly because it's blocking), it imposes 
restrictions on the caller (e.g. idempotency) and other things that people 
generally don't pay that much attention to when using it.

If we can remove uses of `askWithRetry` as we find these issues, we can, at 
some point, finally get rid of the API altogether.

> RPC layer doesn't drop message but message can be timeout. 

Yes it can timeout. You can retry it (basically doing what `askWithRetry` 
does) but it should be such an edge case that failing the task should be ok.

If you think about how the RPC layer works when you use `askWithRetry`, 
this is what happens:

- first RPC is sent
- remote end is blocked on something, RPC is waiting in the queue
- sender re-sends the RPC
- lather, rinse, repeat
- at some point, receive goes through the RPC queue and start responding to 
the RPCs
- it responds to the *first* RPC above first, sender ignores the answer 
since RPC was timed out
- lather, rinse, repeat
- finally the last RPC is responded to and the sender sees the reply

So it's a really expensive way of just doing `ask` with a longer timeout.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-09 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16503
  
@zsxing, @vanzin
Maybe using `ask` in method `canCommit` is not suitable(i think). Because 
`ask` returns a Future, but it should be a blocking process to get result of 
`AskPermissionToCommitOutput` in `canCommit`. RPC layer doesn't drop message 
but message can be timeout. If timeout happens, we can retry or fail the task 
and let spark reschedule it, both of them are ok. But for some heavy tasks, 
retry is light weight (i think). So my suggestion is to keep the `askWithRetry` 
here and change the receiver to be idempotent. What do you think ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-09 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16503
  
> In which case the executor will die (see 
CoarseGrainedExecutorBackend::onDisconnected).

Yeah. Didn't recall that. Then I agree that using `ask` is better.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-09 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/16503
  
> It doesn't drop but the connection may be broken

In which case the executor will die (see 
`CoarseGrainedExecutorBackend::onDisconnected`).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-09 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16503
  
> That was the case with akka (I think, not really sure), but the netty RPC 
layer doesn't drop messages. The new one is "exactly once".

It doesn't drop but the connection may be broken. `askWithRetry` will 
reconnect if the connection is broken.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-09 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/16503
  
> The RPC layer only guarantees at-most-once

That was the case with akka (I think, not really sure), but the netty RPC 
layer doesn't drop messages. The new one is "exactly once".


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-09 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/16503
  
Good catch. Looks good to me.

@vanzin The RPC layer only guarantees at-most-once. Retry may be still 
helpful in some case, but the receiver should be idempotent. Either the current 
change or changing to use `ask` (Spark will retry the task) is good to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-09 Thread vanzin
Github user vanzin commented on the issue:

https://github.com/apache/spark/pull/16503
  
I think this is another case where using `askWithRetry` makes no sense 
given the guarantees of the RPC layer.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-09 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16503
  
@zsxwing @kayousterhout @andrewor14 Could you please help take a look at 
this ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16503: [SPARK-18113] canCommit should return same when called b...

2017-01-08 Thread jinxing64
Github user jinxing64 commented on the issue:

https://github.com/apache/spark/pull/16503
  
@mccheah @JoshRosen @ash211 Could you please take look at this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org