[ https://issues.apache.org/jira/browse/SPARK-14091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Josh Rosen updated SPARK-14091:
-------------------------------
    Assignee: Rajesh Balamohan

> Consider improving performance of SparkContext.getCallSite()
> ------------------------------------------------------------
>
>                 Key: SPARK-14091
>                 URL: https://issues.apache.org/jira/browse/SPARK-14091
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>            Reporter: Rajesh Balamohan
>            Assignee: Rajesh Balamohan
>
> Currently SparkContext.getCallSite() makes a call to Utils.getCallSite().
> {noformat}
>   private[spark] def getCallSite(): CallSite = {
>     val callSite = Utils.getCallSite()
>     CallSite(
>       Option(getLocalProperty(CallSite.SHORT_FORM)).getOrElse(callSite.shortForm),
>       Option(getLocalProperty(CallSite.LONG_FORM)).getOrElse(callSite.longForm)
>     )
>   }
> {noformat}
> However, in some places Utils.withDummyCallSite(sc) is invoked to avoid
> expensive thread dumps within getCallSite(). Because Utils.getCallSite() is
> evaluated before the local properties are checked, the thread dump is
> computed anyway. This has a noticeable cost when lots of RDDs are created
> (e.g. close to 3-7 seconds is spent when 1000+ RDDs are present, which is
> significant when the entire query runtime is on the order of 10-20 seconds).
> Creating this JIRA to consider evaluating getCallSite only when needed.
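
One way the "evaluate only when needed" idea from the issue could look is sketched below. This is a minimal, self-contained illustration and not Spark's actual code or the eventual patch: LazyCallSiteSketch, expensiveCallSite, eagerCallSite, lazyCallSite and the stackWalks counter are hypothetical names, a plain Map stands in for the SparkContext local properties, and the key strings are assumed for illustration. The sketch relies on getOrElse taking its default by name, so a lazy val is forced only when a property is missing.

```scala
// Sketch only: a simplified stand-in for the call-site lookup, not Spark's code.
object LazyCallSiteSketch {
  final case class CallSite(shortForm: String, longForm: String)

  // Counts how often the expensive stack walk runs (demonstration only).
  var stackWalks: Int = 0

  // Hypothetical stand-in for Utils.getCallSite(): captures the current stack trace.
  def expensiveCallSite(): CallSite = {
    stackWalks += 1
    val frames = Thread.currentThread().getStackTrace
    val top = frames.headOption.map(_.toString).getOrElse("<unknown>")
    CallSite(top, frames.mkString("\n"))
  }

  // Eager variant, mirroring the snippet in the issue: the stack walk always
  // runs, even when both properties are already set.
  def eagerCallSite(props: Map[String, String]): CallSite = {
    val callSite = expensiveCallSite()
    CallSite(
      props.getOrElse("callSite.short", callSite.shortForm),
      props.getOrElse("callSite.long", callSite.longForm))
  }

  // Lazy variant: getOrElse takes its default by name, so the lazy val is
  // forced (at most once) only if a property is missing.
  def lazyCallSite(props: Map[String, String]): CallSite = {
    lazy val callSite = expensiveCallSite()
    CallSite(
      props.getOrElse("callSite.short", callSite.shortForm),
      props.getOrElse("callSite.long", callSite.longForm))
  }
}
```

With both properties preset (the situation a dummy call site creates), lazyCallSite never captures the stack, while eagerCallSite captures it on every call.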