[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-01 Thread colorant
GitHub user colorant opened a pull request:

https://github.com/apache/spark/pull/1273

[SPARK-1097] Workaround Hadoop conf ConcurrentModification issue

Workaround Hadoop conf ConcurrentModification issue



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/colorant/spark hadoopRDD

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/1273.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1273


commit 37c13b30a80793e05dd2300f9accbc29db17a336
Author: Raymond Liu 
Date:   2014-07-01T08:33:33Z

Workaround Hadoop conf ConcurrentModification issue




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1273#issuecomment-47631084
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1273#issuecomment-47631092
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-01 Thread colorant
Github user colorant commented on the pull request:

https://github.com/apache/spark/pull/1273#issuecomment-47631094
  
as described in #1000 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1273#issuecomment-47635045
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16281/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1273#issuecomment-47635044
  
Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-03 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/1273#discussion_r14533232
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -141,7 +141,7 @@ class HadoopRDD[K, V](
   // local process. The local cache is accessed through 
HadoopRDD.putCachedMetadata().
   // The caching helps minimize GC, since a JobConf can contain ~10KB 
of temporary objects.
   // synchronize to prevent ConcurrentModificationException 
(Spark-1097, Hadoop-10456)
-  broadcastedConf.synchronized {
+  broadcastedConf.value.value.synchronized {
--- End diff --

Can we just use "conf" here and on the next line, instead of 
`broadcastedConf.value.value`? This is actually the first guy that assumes conf 
is not null, though, maybe add an assert for that.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-03 Thread colorant
Github user colorant commented on a diff in the pull request:

https://github.com/apache/spark/pull/1273#discussion_r14544250
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -141,7 +141,7 @@ class HadoopRDD[K, V](
   // local process. The local cache is accessed through 
HadoopRDD.putCachedMetadata().
   // The caching helps minimize GC, since a JobConf can contain ~10KB 
of temporary objects.
   // synchronize to prevent ConcurrentModificationException 
(Spark-1097, Hadoop-10456)
-  broadcastedConf.synchronized {
+  broadcastedConf.value.value.synchronized {
--- End diff --

@aarondav , Yes, I also thought about this before. The reason I keep use 
broadcastedConf.value.value here is because: though broadcast variable is 
suggested to be read only and not changed, But I wonder maybe in case someone 
miss use it and change the value, read the latest value might be helpful. And 
it read the latest code in the next line in the original code, so I keep this 
style. But think again, if the value did changed in some place without hold any 
synchronize lock, this might still not be able to solve the  problem. I will 
update the pull request.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-03 Thread colorant
Github user colorant commented on a diff in the pull request:

https://github.com/apache/spark/pull/1273#discussion_r14544536
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -141,7 +141,7 @@ class HadoopRDD[K, V](
   // local process. The local cache is accessed through 
HadoopRDD.putCachedMetadata().
   // The caching helps minimize GC, since a JobConf can contain ~10KB 
of temporary objects.
   // synchronize to prevent ConcurrentModificationException 
(Spark-1097, Hadoop-10456)
-  broadcastedConf.synchronized {
+  broadcastedConf.value.value.synchronized {
--- End diff --

regard conf , it is wrapped in SerializableWritable which enforce it to be 
a Writable, I think it won't be a null , or exception already been thrown 
somewhere before here. What do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1273#issuecomment-48000716
  
 Merged build triggered. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1273#issuecomment-48000723
  
Merged build started. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1273#issuecomment-48002947
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16330/


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/1273#issuecomment-48002946
  
Merged build finished. All automated tests passed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-03 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/1273#discussion_r14545496
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -141,7 +141,7 @@ class HadoopRDD[K, V](
   // local process. The local cache is accessed through 
HadoopRDD.putCachedMetadata().
   // The caching helps minimize GC, since a JobConf can contain ~10KB 
of temporary objects.
   // synchronize to prevent ConcurrentModificationException 
(Spark-1097, Hadoop-10456)
-  broadcastedConf.synchronized {
+  broadcastedConf.value.value.synchronized {
--- End diff --

We can keep without the check, the NPE should be apparent on the 
synchronized block.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-03 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/1273#issuecomment-48003381
  
LGTM, merging into master and branch-1.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1097] Workaround Hadoop conf Concurrent...

2014-07-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/1273


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---