[ https://issues.apache.org/jira/browse/SPARK-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420984#comment-15420984 ]

Rajesh Balamohan commented on SPARK-17036:
------------------------------------------

When a large number of jobs are run concurrently via the Spark Thrift Server, 
it starts consuming a large amount of CPU fairly soon. Since 
{{spark.hadoop.cloneConf=false}} by default, {{HadoopRDD.getJobConf}} caches 
the job conf for every RDD that is created. This creates heavy GC pressure, 
which ends up causing the high CPU usage. It would not cause an OOM, since 
the cache internally holds soft references. Creating this JIRA to explore 
whether this caching can be made optional, creating a new conf object instead.
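To illustrate the mechanism described above: a soft-reference cache never causes an OOM (the JVM clears soft references before throwing OutOfMemoryError), but a steady stream of new entries still churns the heap and the GC. The sketch below is a simplified, hypothetical stand-in for that kind of cache, not Spark's actual {{HadoopRDD}} code:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Minimal sketch of a soft-reference cache (assumption: simplified
// illustration of the caching pattern discussed above, not Spark's code).
class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<>();

    synchronized V getOrCompute(K key, Supplier<V> compute) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {              // miss, or the entry was GC-cleared
            value = compute.get();
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }
}

public class SoftCacheDemo {
    public static void main(String[] args) {
        SoftCache<String, String> cache = new SoftCache<>();
        String first = cache.getOrCompute("jobConf", () -> "conf-1");
        String second = cache.getOrCompute("jobConf", () -> "conf-2");
        // While the entry is still softly reachable, the cached value is reused.
        System.out.println(first.equals(second));
    }
}
```

Under memory pressure the JVM clears these references, so the values are recomputed and re-inserted repeatedly, which matches the GC-churn-without-OOM behavior reported here.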

> Hadoop config caching could lead to memory pressure and high CPU usage in 
> thrift server
> ---------------------------------------------------------------------------------------
>
>                 Key: SPARK-17036
>                 URL: https://issues.apache.org/jira/browse/SPARK-17036
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>            Reporter: Rajesh Balamohan
>            Priority: Minor
>
> Creating this as a follow-up JIRA to SPARK-12920. Profiler output on the 
> caching is attached in SPARK-12920.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
