Github user sahilTakiar commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19413#discussion_r142499531
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
    @@ -157,20 +157,23 @@ class HadoopRDD[K, V](
           if (conf.isInstanceOf[JobConf]) {
             logDebug("Re-using user-broadcasted JobConf")
             conf.asInstanceOf[JobConf]
    -      } else if (HadoopRDD.containsCachedMetadata(jobConfCacheKey)) {
    -        logDebug("Re-using cached JobConf")
    -        HadoopRDD.getCachedMetadata(jobConfCacheKey).asInstanceOf[JobConf]
           } else {
    -        // Create a JobConf that will be cached and used across this RDD's getJobConf() calls in the
    -        // local process. The local cache is accessed through HadoopRDD.putCachedMetadata().
    -        // The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects.
    -        // Synchronize to prevent ConcurrentModificationException (SPARK-1097, HADOOP-10456).
    -        HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
    -          logDebug("Creating new JobConf and caching it for later re-use")
    -          val newJobConf = new JobConf(conf)
    -          initLocalJobConfFuncOpt.foreach(f => f(newJobConf))
    -          HadoopRDD.putCachedMetadata(jobConfCacheKey, newJobConf)
    -          newJobConf
    +        val newJobConf = HadoopRDD.getCachedMetadata(jobConfCacheKey)
    --- End diff ---
    
    Thanks for the tip. Updated the patch.
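
    For reference, a minimal standalone sketch of the caching pattern being discussed, assuming a plain ConcurrentHashMap plus a construction lock rather than the actual HadoopRDD internals (`localCache`, `constructionLock`, and `buildConf` are illustrative names, not Spark APIs): look the value up once, and only synchronize when it actually has to be created.

        import java.util.concurrent.ConcurrentHashMap

        // Illustrative sketch only: mirrors the diff's pattern of a single cache
        // lookup followed by locked construction on a miss.
        object JobConfCacheSketch {
          private val localCache = new ConcurrentHashMap[String, AnyRef]()
          private val constructionLock = new Object

          def getOrCreate(key: String)(buildConf: () => AnyRef): AnyRef = {
            // Single get() instead of a contains()/get() pair, so the hot path
            // does one map lookup and takes no lock.
            val cached = localCache.get(key)
            if (cached != null) {
              cached
            } else {
              // Only synchronize when a new value has to be built (cf. SPARK-1097).
              constructionLock.synchronized {
                val fresh = buildConf()
                localCache.put(key, fresh)
                fresh
              }
            }
          }
        }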

