[ 
https://issues.apache.org/jira/browse/SPARK-14091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-14091:
-------------------------------
    Summary: Improve performance of SparkContext.getCallSite()  (was: Consider 
improving performance of SparkContext.getCallSite())

> Improve performance of SparkContext.getCallSite()
> -------------------------------------------------
>
>                 Key: SPARK-14091
>                 URL: https://issues.apache.org/jira/browse/SPARK-14091
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>
> Currently SparkContext.getCallSite() makes a call to Utils.getCallSite().
> {noformat}
>   private[spark] def getCallSite(): CallSite = {
>     val callSite = Utils.getCallSite()
>     CallSite(
>       Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
>       Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
>     )
>   }
> {noformat}
> However, in some places Utils.withDummyCallSite(sc) is invoked to avoid the
> expensive thread dumps within getCallSite(). But Utils.getCallSite() is
> evaluated eagerly, before the local properties are consulted, so the thread
> dumps are computed anyway. This has an impact when lots of RDDs are created
> (e.g., close to 3-7 seconds are spent when 1000+ RDDs are present, which is
> significant when the entire query runtime is on the order of 10-20 seconds).
> Creating this JIRA to consider evaluating Utils.getCallSite() only when needed.
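
One way to evaluate the call site only when needed is to defer the stack walk behind a lazy val, so it runs only if a local property is missing. The following is a minimal, self-contained sketch of that idea and not the actual Spark change: LazyCallSiteSketch, computeCallSite, the stackWalks counter, the localProps map, and the two key strings are hypothetical stand-ins for SparkContext's local properties and Utils.getCallSite().

```scala
// Sketch only: every name here is a hypothetical stand-in, not Spark's real code.
object LazyCallSiteSketch {
  final case class CallSite(shortForm: String, longForm: String)

  // Assumed property keys, standing in for CallSite.SHORT_FORM / CallSite.LONG_FORM.
  val ShortFormKey = "callSite.short"
  val LongFormKey = "callSite.long"

  // Counts how many times the expensive stack walk actually runs.
  var stackWalks = 0

  // Stand-in for Utils.getCallSite(): captures the current thread's stack trace.
  def computeCallSite(): CallSite = {
    stackWalks += 1
    val frames = Thread.currentThread().getStackTrace
    val top = frames.headOption.map(_.toString).getOrElse("<unknown>")
    CallSite(top, frames.mkString("\n"))
  }

  // localProps plays the role of the SparkContext local properties.
  def getCallSite(localProps: Map[String, String]): CallSite = {
    // Deferred: the stack walk happens at most once, and only on first access.
    lazy val callSite = computeCallSite()
    // getOrElse takes its default by name, so callSite is touched only
    // when the corresponding property is absent.
    CallSite(
      localProps.getOrElse(ShortFormKey, callSite.shortForm),
      localProps.getOrElse(LongFormKey, callSite.longForm)
    )
  }
}
```

With both properties set (as a dummy call site would do), getCallSite returns without walking the stack; with either one missing, the walk runs exactly once.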



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
