[ https://issues.apache.org/jira/browse/SPARK-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420984#comment-15420984 ]
Rajesh Balamohan commented on SPARK-17036:
--
When a large number of jobs are run concurrently via the Spark thrift
server, it starts consuming a large amount of CPU fairly quickly. Since
{{spark.hadoop.cloneConf=false}} by default, {{HadoopRDD.getJobConf}}
caches a job conf for every RDD that is created. This creates heavy GC
pressure and ends up causing the high CPU usage. It does not cause OOM,
because the cache is internally a soft-reference cache.
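For illustration, the pattern at play looks roughly like the following. This is a minimal Scala sketch of a per-RDD soft-reference conf cache, not the actual {{HadoopRDD}} code; the cache key format and map type here are assumptions:

{code:scala}
import java.lang.ref.SoftReference
import java.util.concurrent.ConcurrentHashMap

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapred.JobConf

object ConfCacheSketch {
  // One entry per RDD: the map only ever grows as new RDD ids appear.
  private val cache = new ConcurrentHashMap[String, SoftReference[JobConf]]()

  def getJobConf(rddId: Int, baseConf: Configuration): JobConf = {
    val key = s"rdd_${rddId}_job_conf" // hypothetical key format
    val ref = cache.get(key)
    val cached = if (ref != null) ref.get() else null
    if (cached != null) {
      cached
    } else {
      // Cache miss (or the soft reference was collected): allocate a
      // fresh JobConf and cache it. Under many concurrent jobs this
      // allocation/collection churn is what drives the GC pressure.
      val fresh = new JobConf(baseConf)
      cache.put(key, new SoftReference(fresh))
      fresh
    }
  }
}
{code}

Soft references let the JVM reclaim entries before it would otherwise throw OOM, but every new RDD still allocates a fresh {{JobConf}}, so under many concurrent jobs the cache churns and keeps the collector busy.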
Creating this JIRA to explore whether this caching can be made optional,
creating a new conf object instead.
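For comparison, the existing {{spark.hadoop.cloneConf=true}} setting makes {{HadoopRDD}} clone a fresh conf per RDD instead of consulting the cache (e.g. via {{--conf spark.hadoop.cloneConf=true}} when starting the thrift server). A minimal sketch, assuming a standard {{SparkSession}} setup; the app name and input path are hypothetical:

{code:scala}
import org.apache.spark.sql.SparkSession

// Ask HadoopRDD to clone a fresh Configuration per RDD instead of
// consulting the soft-reference cache.
val spark = SparkSession.builder()
  .appName("conf-clone-sketch") // hypothetical app name
  .config("spark.hadoop.cloneConf", "true")
  .getOrCreate()

// Hadoop-backed RDDs created under this session now take the clone
// path in HadoopRDD.getJobConf rather than the cached path.
val lines = spark.sparkContext.textFile("hdfs:///tmp/example.txt") // hypothetical path
{code}

Cloning has its own per-RDD cost, which is why the proposal here is to make the caching optional rather than to remove it.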
> Hadoop config caching could lead to memory pressure and high CPU usage in thrift server
> ---
>
> Key: SPARK-17036
> URL: https://issues.apache.org/jira/browse/SPARK-17036
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Rajesh Balamohan
> Priority: Minor
>
> Creating this as a follow-up JIRA to SPARK-12920. Profiler output for the
> caching is attached to SPARK-12920.