GitHub user mridulm commented on the pull request:

    https://github.com/apache/spark/pull/3541#issuecomment-66424149
  
    On Thu, Dec 4, 2014 at 2:57 AM, Davies Liu <notificati...@github.com> wrote:
    
    > @davies <https://github.com/davies> I am not sure I completely understood
    > your comment.
    > Sorry about that; maybe I did not explain it clearly.
    >
    > As detailed above, there are multiple reasons why a task can fail - and
    > quite a lot of them are non-fatal from a 'reschedule the task on the
    > same host' point of view: in particular, races in Spark between
    > reporting an executor going down, shutdown hooks running, and tasks
    > being scheduled due to locality preference.
    > So we need a per-executor blacklist - note that it is only temporary -
    > to either allow the executor to recover (in case task failures are due
    > to transient reasons), or allow the task to get scheduled elsewhere in
    > the meantime (if schedule locality constraints can be satisfied).
    >
    > Agreed that the executor based blacklist worked for you, and I think the
    > host based blacklist will also work for you (there is a small regression
    > around locality).
    >
    
    It is not a small regression - if you have 4-8 executors on a host (as is
    common here), this change will blacklist all of them instead of
    blacklisting a single executor.
    This is a fairly severe regression, which is why I said I am -1 on
    modifying existing behavior unless the new functionality allows the
    existing feature to continue to work as currently expected.
    The thing to understand is that the executor blacklist is not subsumed by
    the host blacklist except in a very crude model; see the sketch below.
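    To make the granularity difference concrete, here is a minimal sketch
    (hypothetical standalone types, not Spark's actual scheduler API). Keying
    the blacklist by host removes every executor on that host from
    consideration, even the healthy ones:

        case class ExecutorInfo(execId: String, host: String)

        class BlacklistSketch {
          private val badExecutors = scala.collection.mutable.Set[String]()
          private val badHosts     = scala.collection.mutable.Set[String]()

          // Executor-level: only the one misbehaving executor is skipped.
          def usableExecutorLevel(e: ExecutorInfo): Boolean =
            !badExecutors.contains(e.execId)

          // Host-level: all 4-8 executors on the host are skipped at once,
          // including the healthy ones - the regression described above.
          def usableHostLevel(e: ExecutorInfo): Boolean =
            !badHosts.contains(e.host)
        }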
    
    
    
    >  A different set of criteria would apply when we want to do a host level
    > blacklist - when we have determined that the node is unusable, and so
    > tasks fail on all executors on the node: due to the NODE_LOCAL locality
    > level, we would keep trying other executors on the same node even as the
    > executor blacklist kicks in; so if the node is temporarily unusable, the
    > executor blacklist might not help.
    >
    > So we need a host based blacklist.
    >
    
    Yes, the reasons why we need a host blacklist are valid and separate from
    why we need an executor blacklist.
    They might overlap in some degenerate cases (since host level issues
    obviously impact executors too): the executor blacklist is more fine
    grained, while host level issues are coarser in comparison.
    While the executor blacklist might alleviate the lack of a host blacklist
    to some extent (as it does currently), it is suboptimal at doing so: so
    the need for a host blacklist is justified - the two levels compose, as
    sketched below.
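    A minimal sketch (hypothetical names, not actual scheduler code) of how
    the two levels would compose rather than replace each other:

        // An offer is usable only if neither the executor nor its host is
        // currently blacklisted for this taskset.
        def offerUsable(execId: String, host: String,
                        badExecutors: Set[String],
                        badHosts: Set[String]): Boolean =
          !badExecutors.contains(execId) && !badHosts.contains(host)

    A host entry would cover node-wide failures, while an executor entry
    covers a single misbehaving executor without penalizing its neighbors.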
    
    
    
    
    >  The timeout based temporary executor blacklist we currently have is
    > still a stop gap solution which solved the immediate problems observed
    > at that time: without it, Spark was becoming unusable in large enough
    > multi-tenant clusters.
    >
    > Agreed.
    >
    > If we want to take it to the host level and do a principled solution -
    > then we need a lot of other pieces to be put into place (since currently
    > we only take task scheduling into account, which is insufficient).
    > Off the top of my head: remove the node from RDD replication,
    > de-allocate executors already on the node, move existing RDD blocks away
    > from the executors on the node, blacklist the node from further
    > allocation requests (YARN, Mesos), and so on. I am sure @kayousterhout
    > <https://github.com/kayousterhout> might have other thoughts on this.
    >
    > Agreed. Figuring out the failure domain is a hard thing in a distributed
    > environment; I doubt anyone can contribute a principled solution in the
    > near term for retrying failed tasks in the best position (such as
    > rescheduling on the same executor, a different executor on the same
    > host, a different host, or a different rack).
    >
    > I think the host based blacklist is the simplest solution and works well
    > in most failure cases.
    >
    > Unfortunately, I do not have the bandwidth to engage on this; so I am
    > hoping the right thing gets done. Whatever it is, I am -1 on removing
    > the executor level blacklist - that is something we heavily depend on to
    > get our jobs to work. A better solution that does not regress this
    > functionality is most welcome!
    >
    > I really appreciate your comments here; they help us get to a better
    > solution. Could you give detailed cases where the host based blacklist
    > would break your job? Maybe there are some cases I did not figure out in
    > your situation; please correct me.
    >
    
    
    The primary reason for the executor blacklist, as @kayousterhout
    <https://github.com/kayousterhout> also referred to, was initially quite
    simple:
    a task gets submitted to the same executor repeatedly due to a locality
    constraint - but keeps failing on that executor since the executor might
    be in an inconsistent state (like in the middle of a shutdown, etc.).
    This very quickly caused applications to fail.
    There are other cases where a similar pattern is exhibited. Usually they
    are transient in nature - which is why the executor blacklist was
    modelled as a temporary, time-bound blacklist - and that too for only a
    specific taskset (other tasksets can run on the executor, and it did not
    impact other tasks already running on the executor).
    This was a conscious decision, not just to simplify the implementation,
    but also because that was the immediate space of problems we were
    targeting. A rough sketch of that mechanism follows.
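    For clarity, here is a minimal standalone sketch of that temporary,
    taskset-scoped blacklist (hypothetical code, not the actual
    TaskSetManager internals): a failure records when a task failed on an
    executor, and that executor is skipped for that task only until a
    timeout expires.

        import scala.collection.mutable

        class TaskSetBlacklist(timeoutMs: Long) {
          // task index -> (executor id -> time of last failure there)
          private val failures =
            mutable.Map[Int, mutable.Map[String, Long]]()

          def recordFailure(taskIndex: Int, execId: String,
                            nowMs: Long): Unit =
            failures.getOrElseUpdate(taskIndex, mutable.Map())
              .update(execId, nowMs)

          // Blacklisted only for this (task, executor) pair, and only until
          // the timeout expires - other tasksets are unaffected.
          def isBlacklisted(taskIndex: Int, execId: String,
                            nowMs: Long): Boolean =
            failures.get(taskIndex)
              .flatMap(_.get(execId))
              .exists(failedAt => nowMs - failedAt < timeoutMs)
        }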
    
    A host blacklist might have to consider other aspects - I don't know the
    scope of the problem being approached.
    1) We could make a simplifying assumption that host issues are transient
    in nature - like with executors - and proceed with that assumption. This
    might be valid in case failed executors are cleaning up their state and
    the host can recover and be healthy soon (like disk space being reclaimed
    on executor shutdown, etc.).
    2) We might want to model more serious issues - like what MR seems to do.
    I have not looked at their implementation, but IIRC once blacklisted, a
    host is not considered for a specific AM again.
    @tgravescs <https://github.com/tgravescs> might be able to comment more.
    3) Since Spark manages more state - do we want to do anything about that,
    like replicated blocks, etc.?
    
    
    
    


