[ https://issues.apache.org/jira/browse/SPARK-21084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16049669#comment-16049669 ]

Frederick Reiss commented on SPARK-21084:
-----------------------------------------

[~sowen] thanks for having a look at this JIRA and giving feedback!

I must confess that, when our product groups first brought these issues to my 
attention, my initial response was similar to yours. Each of the issues 
described above can be fixed *in isolation* by reconfiguring Spark and/or the 
resource manager.  The problem is that every such fix makes the other issues 
worse.  We spent a number of weeks playing best-practices whack-a-mole before 
resigning ourselves to making some targeted improvements to Spark itself.

I'll update the description of this JIRA in a moment with a high-level 
description of the Spark changes we're currently looking into.

In the meantime, here's a quick summary of what we ran into while attempting to 
devise a workable configuration of dynamic allocation for notebook users:

Issue #1 (starvation): The obvious fix here is preemption. But there is 
currently no way to preempt an executor gently. The only option is to shut down 
the executor and drop its data, which leads to issues #2 and #4.  Worse, 
Spark's scheduling and cache management are opaque to the resource manager, so 
the RM makes arbitrary choices of which executor to shoot.
Another approach is to configure 
{{spark.dynamicAllocation.cachedExecutorIdleTimeout}} so that notebook sessions 
voluntarily give up executors, even when those executors have cached data. But 
this leads to issues #2 and #4.
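
For concreteness, here's roughly what that idle-timeout workaround looks like in {{spark-defaults.conf}} (the property names are the standard dynamic allocation settings; the values are purely illustrative, not recommendations):
{code}
# Dynamic allocation with aggressive executor release (illustrative values)
spark.dynamicAllocation.enabled                    true
spark.shuffle.service.enabled                      true
# Release plain idle executors after a minute...
spark.dynamicAllocation.executorIdleTimeout        60s
# ...and release executors holding cached data after five minutes, which
# trades issue #1 for issues #2 and #4 (the cached partitions go away with
# the executor).
spark.dynamicAllocation.cachedExecutorIdleTimeout  300s
{code}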

Issue #2 (request latency): This issue has two root causes: 
a) It takes a noticeable amount of time to start and ramp up new executors.
b) Spark defers the cost of issue #4 (losing cached data) until a job attempts 
to consume the missing data.
For root cause (a), the obvious solution is to reserve a permanent minimum pool 
of executors for each notebook user by setting the 
{{spark.dynamicAllocation.minExecutors}} parameter to a sufficiently high 
value. But tying down containers in this way leaves fewer resources for other 
users, exacerbating issues #1, #3, and #4. The reserved executors are likely to 
be idle most of the time, because notebook users alternate between running 
Spark jobs, running local computation in the notebook kernel, and looking at 
results in the web browser. 
See issue #4 below for what happens when you try to address root cause (b) with 
config changes.
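
To make the cost of that {{minExecutors}} reservation for root cause (a) concrete, here is a sketch of the per-notebook configuration (again, the value is illustrative only):
{code}
# Warm executor pool per notebook session (illustrative value)
spark.dynamicAllocation.enabled         true
# Keep a minimum pool so short jobs don't pay container startup/ramp-up time...
spark.dynamicAllocation.minExecutors    8
# ...but with hundreds of notebook users, eight mostly-idle executors apiece
# pins thousands of containers, feeding issues #1, #3, and #4.
{code}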

Issue #3 (unfair allocation of CPU): The obvious fix here is to set 
{{spark.dynamicAllocation.cachedExecutorIdleTimeout}} so that notebook sessions 
voluntarily give up executors, even when those executors have cached data. But 
this leads to issues #2 and #4. One can also reduce the value of 
{{spark.dynamicAllocation.maxExecutors}}, but that puts a cap on the degree of 
parallelism that a given user can access, leading to more of issue #2.
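
A sketch of that cap, with an illustrative value:
{code}
# Limit how many containers one notebook session can hold (illustrative value)
spark.dynamicAllocation.maxExecutors    20
# This bounds the unfairness of issue #3, but it also bounds the parallelism
# any single job can reach, so interactive jobs run longer (issue #2).
{code}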

Issue #4 (loss of cached data): The obvious fix here is to set 
{{spark.dynamicAllocation.cachedExecutorIdleTimeout}} to infinity. But then any 
notebook user who has called RDD.cache() at some point in the past will tie 
down a large pool of containers indefinitely, leading to issues #1, #2, and #3. 
If you attempt to limit the size of this large pool by reducing 
{{spark.dynamicAllocation.maxExecutors}}, you limit the peak performance that 
the notebook user can get out of Spark, leading to issue #2.
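
For completeness, a sketch of the "never drop cached executors" setting (in current Spark releases this is effectively the default behavior; the huge value below just makes it explicit):
{code}
# Keep executors with cached blocks forever (illustrative; effectively the
# default behavior)
spark.dynamicAllocation.cachedExecutorIdleTimeout  2147483647s
# Any session that has ever called RDD.cache() now holds those executors
# until the application exits, recreating issues #1, #2, and #3.
{code}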


> Improvements to dynamic allocation for notebook use cases
> ---------------------------------------------------------
>
>                 Key: SPARK-21084
>                 URL: https://issues.apache.org/jira/browse/SPARK-21084
>             Project: Spark
>          Issue Type: Umbrella
>          Components: Block Manager, Scheduler, Spark Core, YARN
>    Affects Versions: 2.2.0, 2.3.0
>            Reporter: Frederick Reiss
>
> One important application of Spark is to support many notebook users with a 
> single YARN or Spark Standalone cluster.  We at IBM have seen this 
> requirement across multiple deployments of Spark: on-premises and private 
> cloud deployments at our clients, as well as on the IBM cloud.  The scenario 
> goes something like this: "Every morning at 9am, 500 analysts log into their 
> computers and start running Spark notebooks intermittently for the next 8 
> hours." I'm sure that many other members of the community are interested in 
> making similar scenarios work.
>     
> Dynamic allocation is supposed to support these kinds of use cases by 
> shifting cluster resources towards users who are currently executing scalable 
> code.  In our own testing, we have encountered a number of issues with using 
> the current implementation of dynamic allocation for this purpose:
> *Issue #1: Starvation.* A Spark job acquires all available containers, 
> preventing other jobs or applications from starting.
> *Issue #2: Request latency.* Jobs that would normally finish in less than 30 
> seconds take 2-4x longer than normal with dynamic allocation.
> *Issue #3: Unfair resource allocation due to cached data.* Applications that 
> have cached RDD partitions hold onto executors indefinitely, denying those 
> resources to other applications.
> *Issue #4: Loss of cached data leads to thrashing.*  Applications repeatedly 
> lose partitions of cached RDDs because the underlying executors are removed; 
> the applications then need to rerun expensive computations.
>     
> This umbrella JIRA covers efforts to address these issues by making 
> enhancements to Spark.


