Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19145
Can one of the admins verify this patch?
Github user klion26 commented on the issue:
https://github.com/apache/spark/pull/19145
Sorry for the late response. IIUC, in MR this case is handled as follows (see the sketch after this list):
1. The AM receives the container-failed message.
2. The AM checks whether any attempt of the same task is still RUNNING.
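A rough Scala sketch of the check described above; the type and method names are hypothetical illustrations, not the actual MapReduce ApplicationMaster code:
```scala
// Hypothetical sketch of the MR-style handling described above; names are
// illustrative, not the real MapReduce AM implementation.
case class TaskAttempt(taskId: String, attemptId: String, state: String)

class AppMasterSketch(attempts: Seq[TaskAttempt]) {
  // Step 1: the AM is told that a container running an attempt has failed.
  def onContainerFailed(failed: TaskAttempt): Unit = {
    // Step 2: only reschedule if no other attempt of the same task is still RUNNING.
    val otherAttemptRunning = attempts.exists { a =>
      a.taskId == failed.taskId && a.attemptId != failed.attemptId && a.state == "RUNNING"
    }
    if (!otherAttemptRunning) {
      rescheduleTask(failed.taskId)
    }
  }

  private def rescheduleTask(taskId: String): Unit = {
    // request a new container and launch a new attempt (omitted)
  }
}
```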
Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/19145
Have you guys reached a consensus on whether this PR is needed or not?
Github user klion26 commented on the issue:
https://github.com/apache/spark/pull/19145
@jerryshao thank you for your comment, I will try to find out how MR/TEZ handle
this.
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19145
@klion26, this is not a problem specific to Spark Streaming or Structured
Streaming; any Spark application will run into it. This is basically a YARN
problem and looks hard to
Github user klion26 commented on the issue:
https://github.com/apache/spark/pull/19145
@squito I agree with you that this should be handled by YARN.
In my opinion, this is some form of defensive programming. Spark Streaming
and Structured Streaming will both request more
Github user squito commented on the issue:
https://github.com/apache/spark/pull/19145
I'm not sure I totally follow the sequence of events, but I get the feeling
this should be handled in YARN, not Spark.
Also, I agree with Jerry; it seems like your `completedContainerIdSet`
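For readers following the thread, this is roughly what a dedup set like the `completedContainerIdSet` discussed here could look like; it is an illustrative sketch, not the code from the PR, and assumes YARN's `ContainerStatus`/`ContainerId` records:
```scala
import org.apache.hadoop.yarn.api.records.{ContainerId, ContainerStatus}
import scala.collection.mutable

// Illustrative sketch: remember which completed containers have already been
// handled so a repeated completion report is not processed twice.
class CompletedContainerTracker {
  private val completedContainerIdSet = mutable.HashSet[ContainerId]()

  /** Returns true only the first time this container's completion is seen. */
  def markCompleted(status: ContainerStatus): Boolean = synchronized {
    completedContainerIdSet.add(status.getContainerId)
  }
}
```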
Github user klion26 commented on the issue:
https://github.com/apache/spark/pull/19145
My colleague created an
[issue](https://issues.apache.org/jira/browse/YARN-7214); I have rewritten the
description here.
A Spark Streaming application (app1) is running on YARN, and one of app1's
containers (c1) runs
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19145
And based on your fix:
1. It looks like you don't have a retention mechanism, which could introduce a
memory leak (see the sketch after this list).
2. I don't see your logic to avoid requesting new containers, is
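One way the retention concern in point 1 could be addressed is to bound how many completed-container ids are remembered. This is a sketch under that assumption, not the change actually made in the PR; the class and parameter names are hypothetical:
```scala
import java.util.{LinkedHashMap => JLinkedHashMap, Map => JMap}
import org.apache.hadoop.yarn.api.records.ContainerId

// Sketch: cap the number of remembered completed-container ids so the
// dedup state cannot grow without bound (the eldest entries are evicted).
class BoundedCompletedContainers(maxEntries: Int = 10000) {
  private val ids =
    new JLinkedHashMap[ContainerId, java.lang.Boolean](16, 0.75f, false) {
      override protected def removeEldestEntry(
          eldest: JMap.Entry[ContainerId, java.lang.Boolean]): Boolean =
        size() > maxEntries
    }

  /** Returns true the first time an id is recorded, false for a duplicate. */
  def record(id: ContainerId): Boolean = synchronized {
    if (ids.containsKey(id)) false
    else { ids.put(id, java.lang.Boolean.TRUE); true }
  }
}
```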
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19145
> But if we restart the RM, then the lost containers on the NM will be
reported to the RM as lost again because of recovery
Since you already enabled RM and NM recovery, IIUC the failure of
Github user klion26 commented on the issue:
https://github.com/apache/spark/pull/19145
We enabled both RM and NM recovery.
If we assume there are 2 containers running on this NM, then after 10 minutes
the RM detects the failure of the NM and relaunches the 2 lost containers on
other NMs. This is
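For reference, "RM and NM recovery" here usually refers to the standard YARN recovery switches. A minimal sketch of setting them programmatically, assuming Hadoop's `YarnConfiguration` is on the classpath; in practice these live in yarn-site.xml rather than application code:
```scala
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Sketch only: the yarn-site.xml keys normally meant by "RM and NM recovery".
val conf = new YarnConfiguration()
conf.setBoolean("yarn.resourcemanager.recovery.enabled", true)
conf.setBoolean("yarn.nodemanager.recovery.enabled", true)
// The "after 10 minutes" behaviour mentioned above comes from the NM liveness
// timeout, yarn.nm.liveness-monitor.expiry-interval-ms (600000 ms by default).
```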
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19145
Did you enable RM or NM recovery? Can you please clarify?
Normally, if we assume there are 2 containers running on this NM, then after
10 minutes the RM will detect the failure of the NM and
Github user klion26 commented on the issue:
https://github.com/apache/spark/pull/19145
Hi @jerryshao, thank you for your reply.
# Problem
Long-running jobs on **YARN with HA** can end up with more executors than they
actually requested.
# How to
Github user jerryshao commented on the issue:
https://github.com/apache/spark/pull/19145
Hi @klion26, sorry for the late response. Let's understand the problem first:
would you please describe your problem in detail and how to reproduce the
issue?
Github user klion26 commented on the issue:
https://github.com/apache/spark/pull/19145
@jerryshao Could you help me review this patch?
Github user klion26 commented on the issue:
https://github.com/apache/spark/pull/19145
Will the same completed-container message be reported more than twice? If
these messages will not be reported more than twice, then I could use
`completedContainerIdSet.remove(containerId)`
instead
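Assuming a completion really is reported at most twice, the idea above could look roughly like this; an illustrative sketch with hypothetical names, not the PR's code:
```scala
import org.apache.hadoop.yarn.api.records.ContainerId
import scala.collection.mutable

// Sketch: remember an id on the first report and drop it when the duplicate
// arrives, so the set cleans itself up instead of growing forever. This only
// works if a completion is reported at most twice.
class TwoReportCompletionTracker {
  private val completedContainerIdSet = mutable.HashSet[ContainerId]()

  /** Returns true if this report should be processed, false if it is the duplicate. */
  def shouldProcess(containerId: ContainerId): Boolean = synchronized {
    if (completedContainerIdSet.remove(containerId)) {
      false // second report of the same container: ignore it and forget the id
    } else {
      completedContainerIdSet.add(containerId)
      true // first report: process it
    }
  }
}
```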
Github user klion26 commented on the issue:
https://github.com/apache/spark/pull/19145
@HyukjinKwon @vanzin @srowen @foxish @djvulee @squito Could you please
help review this PR?
Github user klion26 commented on the issue:
https://github.com/apache/spark/pull/19145
@HyukjinKwon I am sorry for that, I have changed the title.