Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19413#discussion_r142418306 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -157,20 +157,23 @@ class HadoopRDD[K, V]( if (conf.isInstanceOf[JobConf]) { logDebug("Re-using user-broadcasted JobConf") conf.asInstanceOf[JobConf] - } else if (HadoopRDD.containsCachedMetadata(jobConfCacheKey)) { - logDebug("Re-using cached JobConf") - HadoopRDD.getCachedMetadata(jobConfCacheKey).asInstanceOf[JobConf] } else { - // Create a JobConf that will be cached and used across this RDD's getJobConf() calls in the - // local process. The local cache is accessed through HadoopRDD.putCachedMetadata(). - // The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects. - // Synchronize to prevent ConcurrentModificationException (SPARK-1097, HADOOP-10456). - HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized { - logDebug("Creating new JobConf and caching it for later re-use") - val newJobConf = new JobConf(conf) - initLocalJobConfFuncOpt.foreach(f => f(newJobConf)) - HadoopRDD.putCachedMetadata(jobConfCacheKey, newJobConf) - newJobConf + val newJobConf = HadoopRDD.getCachedMetadata(jobConfCacheKey) --- End diff -- You are using two val with the same name here, normally we don't recommend that this way. How about follow vanzin's suggestion in the JIRA issue, like this: ``` Option(HadoopRDD.getCachedMetadata(jobConfCacheKey)).getOrElse( HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized { ...... }) ```
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org