[GitHub] spark pull request #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfigurati...

jiangxb1987 Tue, 03 Oct 2017 07:31:55 -0700

Github user jiangxb1987 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19413#discussion_r142418306
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
    @@ -157,20 +157,23 @@ class HadoopRDD[K, V](
           if (conf.isInstanceOf[JobConf]) {
             logDebug("Re-using user-broadcasted JobConf")
             conf.asInstanceOf[JobConf]
    -      } else if (HadoopRDD.containsCachedMetadata(jobConfCacheKey)) {
    -        logDebug("Re-using cached JobConf")
    -        HadoopRDD.getCachedMetadata(jobConfCacheKey).asInstanceOf[JobConf]
           } else {
    -        // Create a JobConf that will be cached and used across this RDD's 
getJobConf() calls in the
    -        // local process. The local cache is accessed through 
HadoopRDD.putCachedMetadata().
    -        // The caching helps minimize GC, since a JobConf can contain 
~10KB of temporary objects.
    -        // Synchronize to prevent ConcurrentModificationException 
(SPARK-1097, HADOOP-10456).
    -        HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
    -          logDebug("Creating new JobConf and caching it for later re-use")
    -          val newJobConf = new JobConf(conf)
    -          initLocalJobConfFuncOpt.foreach(f => f(newJobConf))
    -          HadoopRDD.putCachedMetadata(jobConfCacheKey, newJobConf)
    -          newJobConf
    +        val newJobConf = HadoopRDD.getCachedMetadata(jobConfCacheKey)
    --- End diff --
    
    You are using two val with the same name here, normally we don't recommend 
that this way. How about follow vanzin's suggestion in the JIRA issue, like 
this:
    ```
    Option(HadoopRDD.getCachedMetadata(jobConfCacheKey)).getOrElse(
         HadoopRDD.CONFIGURATION_INSTANTIATION_LOCK.synchronized {
              ......
    })
    ```



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #19413: [SPARK-20466][CORE] HadoopRDD#addLocalConfigurati...

Reply via email to