[jira] [Commented] (SPARK-17036) Hadoop config caching could lead to memory pressure and high CPU usage in thrift server

2016-08-15 Thread Rajesh Balamohan (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15420984#comment-15420984
 ] 

Rajesh Balamohan commented on SPARK-17036:
--

When a large number of jobs run concurrently through the Spark thrift 
server, it starts consuming a large amount of CPU fairly quickly. Because 
{{spark.hadoop.cloneConf=false}} by default, {{HadoopRDD.getJobConf}} caches 
a job conf for every RDD that is created. This creates heavy GC pressure, 
which in turn causes the high CPU usage. It should not cause an OOM, because 
the cache holds its entries through soft references internally.
I am creating this JIRA to explore whether this caching can be made optional, 
so that a new conf object is created instead.
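
To make the trade-off concrete, here is a minimal, self-contained sketch of the two strategies. It is a simplified model and not Spark's actual code: {{FakeJobConf}}, {{ConfCacheSketch}} and the integer cache key are hypothetical names made up for illustration, and only the {{cloneConf}} flag loosely mirrors the role of {{spark.hadoop.cloneConf}}.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a Hadoop JobConf: a private copy of key/value settings.
final class FakeJobConf {
    private final Map<String, String> props;

    FakeJobConf(Map<String, String> props) {
        this.props = new HashMap<>(props);
    }

    // Building a fresh conf copies every property.
    FakeJobConf copy() {
        return new FakeJobConf(props);
    }

    String get(String key) {
        return props.get(key);
    }
}

// Simplified model of the two strategies; an assumption, not Spark's implementation.
final class ConfCacheSketch {
    // One softly referenced conf per RDD id (hypothetical key choice).
    private final Map<Integer, SoftReference<FakeJobConf>> cache = new ConcurrentHashMap<>();
    private final FakeJobConf base;
    private final boolean cloneConf;

    ConfCacheSketch(FakeJobConf base, boolean cloneConf) {
        this.base = base;
        this.cloneConf = cloneConf;
    }

    FakeJobConf getJobConf(int rddId) {
        if (cloneConf) {
            // Clone mode: hand out a fresh copy and retain nothing.
            return base.copy();
        }
        // Cache mode: reuse the conf for this RDD id if the GC has not cleared it.
        SoftReference<FakeJobConf> ref = cache.get(rddId);
        FakeJobConf conf = (ref == null) ? null : ref.get();
        if (conf == null) {
            conf = base.copy();
            cache.put(rddId, new SoftReference<>(conf));
        }
        return conf;
    }

    int cacheSize() {
        return cache.size();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("example.key", "example.value");
        FakeJobConf base = new FakeJobConf(props);

        ConfCacheSketch cached = new ConfCacheSketch(base, false);
        ConfCacheSketch cloned = new ConfCacheSketch(base, true);
        for (int rddId = 0; rddId < 1000; rddId++) {
            cached.getJobConf(rddId);
            cloned.getJobConf(rddId);
        }
        System.out.println("cache mode entries: " + cached.cacheSize());
        System.out.println("clone mode entries: " + cloned.cacheSize());
    }
}
```

In cache mode the map gains one entry per RDD id, and each conf stays reachable until the collector decides to clear its soft reference, which is the kind of retained state the comment above describes. In clone mode nothing is retained, at the cost of one copy per call.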

> Hadoop config caching could lead to memory pressure and high CPU usage in 
> thrift server
> ---
>
> Key: SPARK-17036
> URL: https://issues.apache.org/jira/browse/SPARK-17036
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Reporter: Rajesh Balamohan
> Priority: Minor
>
> Creating this as a follow up jira to SPARK-12920.  Profiler output on the 
> caching is attached in SPARK-12920.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17036) Hadoop config caching could lead to memory pressure and high CPU usage in thrift server

2016-08-12 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15418828#comment-15418828
 ] 

Sean Owen commented on SPARK-17036:
---

[~rajesh.balamohan] please summarize the issue here.



