[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
GitHub user colorant opened a pull request: https://github.com/apache/spark/pull/1273 [SPARK-1097] Workaround Hadoop conf ConcurrentModification issue Workaround Hadoop conf ConcurrentModification issue You can merge this pull request into a Git repository by running: $ git pull https://github.com/colorant/spark hadoopRDD Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1273.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1273 commit 37c13b30a80793e05dd2300f9accbc29db17a336 Author: Raymond Liu Date: 2014-07-01T08:33:33Z Workaround Hadoop conf ConcurrentModification issue --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-47631084 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-47631092 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user colorant commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-47631094 as described in #1000 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-47635045 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16281/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-47635044 Merged build finished. All automated tests passed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1273#discussion_r14533232 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -141,7 +141,7 @@ class HadoopRDD[K, V]( // local process. The local cache is accessed through HadoopRDD.putCachedMetadata(). // The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects. // synchronize to prevent ConcurrentModificationException (Spark-1097, Hadoop-10456) - broadcastedConf.synchronized { + broadcastedConf.value.value.synchronized { --- End diff -- Can we just use "conf" here and on the next line, instead of `broadcastedConf.value.value`? This is actually the first guy that assumes conf is not null, though, maybe add an assert for that. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user colorant commented on a diff in the pull request: https://github.com/apache/spark/pull/1273#discussion_r14544250 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -141,7 +141,7 @@ class HadoopRDD[K, V]( // local process. The local cache is accessed through HadoopRDD.putCachedMetadata(). // The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects. // synchronize to prevent ConcurrentModificationException (Spark-1097, Hadoop-10456) - broadcastedConf.synchronized { + broadcastedConf.value.value.synchronized { --- End diff -- @aarondav , Yes, I also thought about this before. The reason I keep use broadcastedConf.value.value here is because: though broadcast variable is suggested to be read only and not changed, But I wonder maybe in case someone miss use it and change the value, read the latest value might be helpful. And it read the latest code in the next line in the original code, so I keep this style. But think again, if the value did changed in some place without hold any synchronize lock, this might still not be able to solve the problem. I will update the pull request. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user colorant commented on a diff in the pull request: https://github.com/apache/spark/pull/1273#discussion_r14544536 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -141,7 +141,7 @@ class HadoopRDD[K, V]( // local process. The local cache is accessed through HadoopRDD.putCachedMetadata(). // The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects. // synchronize to prevent ConcurrentModificationException (Spark-1097, Hadoop-10456) - broadcastedConf.synchronized { + broadcastedConf.value.value.synchronized { --- End diff -- regard conf , it is wrapped in SerializableWritable which enforce it to be a Writable, I think it won't be a null , or exception already been thrown somewhere before here. What do you think? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-48000716 Merged build triggered. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-48000723 Merged build started. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-48002947 All automated tests passed. Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16330/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-48002946 Merged build finished. All automated tests passed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1273#discussion_r14545496 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -141,7 +141,7 @@ class HadoopRDD[K, V]( // local process. The local cache is accessed through HadoopRDD.putCachedMetadata(). // The caching helps minimize GC, since a JobConf can contain ~10KB of temporary objects. // synchronize to prevent ConcurrentModificationException (Spark-1097, Hadoop-10456) - broadcastedConf.synchronized { + broadcastedConf.value.value.synchronized { --- End diff -- We can keep without the check, the NPE should be apparent on the synchronized block. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1273#issuecomment-48003381 LGTM, merging into master and branch-1.0. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1273 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---