[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2871


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2871#issuecomment-60792272
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22362/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2871#issuecomment-60792256
  
  [Test build #22362 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22362/consoleFull)
 for   PR 2871 at commit 
[`1456d65`](https://github.com/apache/spark/commit/1456d6552f0becaefd97f20d9804bda54efdbc43).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2871#issuecomment-60778288
  
  [Test build #22362 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22362/consoleFull)
 for   PR 2871 at commit 
[`1456d65`](https://github.com/apache/spark/commit/1456d6552f0becaefd97f20d9804bda54efdbc43).
 * This patch merges cleanly.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-28 Thread shivaram
Github user shivaram commented on the pull request:

https://github.com/apache/spark/pull/2871#issuecomment-60778158
  
Thanks @rxin and @JoshRosen for taking a look. I will merge after Jenkins 
passes


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-28 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/2871#discussion_r19480010
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -157,14 +161,12 @@ private[spark] class TorrentBroadcast[T: 
ClassTag](obj: T, id: Long)
 out.defaultWriteObject()
   }
 
-  /** Used by the JVM when deserializing this object. */
-  private def readObject(in: ObjectInputStream): Unit = 
Utils.tryOrIOException {
-in.defaultReadObject()
+  private def readBroadcastBlock(): T = Utils.tryOrIOException {
--- End diff --

Hmm -- thanks for the clarification. I've kept the `tryOrIOException` for 
now as `getValue` hasn't thrown exceptions so far and we are introducing new 
behavior. We can revisit this at some point if it makes stack traces harder to 
read.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-28 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/2871#discussion_r19479577
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: 
ClassTag](obj: T, id: Long)
   val time = (System.nanoTime() - start) / 1e9
   logInfo("Reading broadcast variable " + id + " took " + time + " 
s")
--- End diff --

I used `Utils.getUsedTimeMs` -- Is that good ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-28 Thread shivaram
Github user shivaram commented on a diff in the pull request:

https://github.com/apache/spark/pull/2871#discussion_r19479618
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: 
ClassTag](obj: T, id: Long)
   val time = (System.nanoTime() - start) / 1e9
   logInfo("Reading broadcast variable " + id + " took " + time + " 
s")
 
-  _value =
-TorrentBroadcast.unBlockifyObject[T](blocks, 
SparkEnv.get.serializer, compressionCodec)
+  val obj = TorrentBroadcast.unBlockifyObject[T](
+blocks, SparkEnv.get.serializer, compressionCodec)
   // Store the merged copy in BlockManager so other tasks on this 
executor don't
   // need to re-fetch it.
   SparkEnv.get.blockManager.putSingle(
-broadcastId, _value, StorageLevel.MEMORY_AND_DISK, tellMaster 
= false)
+broadcastId, obj, StorageLevel.MEMORY_AND_DISK, tellMaster = 
false)
+  obj
   }
 }
   }
+
+  /** Used by the JVM when deserializing this object. */
+  private def readObject(in: ObjectInputStream) {
--- End diff --

Removed this


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-27 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/2871#issuecomment-60716720
  
LGTM, too.  Very clever testing strategy!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-27 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2871#discussion_r19455404
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -157,14 +161,12 @@ private[spark] class TorrentBroadcast[T: 
ClassTag](obj: T, id: Long)
 out.defaultWriteObject()
   }
 
-  /** Used by the JVM when deserializing this object. */
-  private def readObject(in: ObjectInputStream): Unit = 
Utils.tryOrIOException {
-in.defaultReadObject()
+  private def readBroadcastBlock(): T = Utils.tryOrIOException {
--- End diff --

I don't think that `Utils.tryOrIOException` is _strictly_ necessary here, 
since this method will no longer be called from `readObject()`, so we won't 
have to worry about any exceptions that it throws being swallowed inside of a 
Java deserializer.  I added the `tryOrIOException` blocks as part of a general 
campaign to combat error-reporting issues in our custom Externalizable / 
Serializable classes.  It's fine to use this here, though, if you want to 
restrict the range of possible exceptions that can be thrown when calling 
`getValue()`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2871#discussion_r19455293
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: 
ClassTag](obj: T, id: Long)
   val time = (System.nanoTime() - start) / 1e9
   logInfo("Reading broadcast variable " + id + " took " + time + " 
s")
--- End diff --

actually if u need to change josh's other comment, can u fix this one as 
well? :) thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-27 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2871#discussion_r19455263
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: 
ClassTag](obj: T, id: Long)
   val time = (System.nanoTime() - start) / 1e9
   logInfo("Reading broadcast variable " + id + " took " + time + " 
s")
 
-  _value =
-TorrentBroadcast.unBlockifyObject[T](blocks, 
SparkEnv.get.serializer, compressionCodec)
+  val obj = TorrentBroadcast.unBlockifyObject[T](
+blocks, SparkEnv.get.serializer, compressionCodec)
   // Store the merged copy in BlockManager so other tasks on this 
executor don't
   // need to re-fetch it.
   SparkEnv.get.blockManager.putSingle(
-broadcastId, _value, StorageLevel.MEMORY_AND_DISK, tellMaster 
= false)
+broadcastId, obj, StorageLevel.MEMORY_AND_DISK, tellMaster = 
false)
+  obj
   }
 }
   }
+
+  /** Used by the JVM when deserializing this object. */
+  private def readObject(in: ObjectInputStream) {
--- End diff --

Do we still need this method if it's just going to call the default 
readObject?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-27 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/2871#discussion_r19455214
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: 
ClassTag](obj: T, id: Long)
   val time = (System.nanoTime() - start) / 1e9
   logInfo("Reading broadcast variable " + id + " took " + time + " 
s")
--- End diff --

+1.  I've seen several cases where this type of human-friendly formatting 
has resulted in log messages that report things like "spilling 0 MB to disk" or 
"task took 0 seconds", which isn't helpful while debugging.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-27 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/2871#issuecomment-60715459
  
LGTM


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2871#discussion_r19455067
  
--- Diff: 
core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala ---
@@ -21,11 +21,28 @@ import scala.util.Random
 
 import org.scalatest.{Assertions, FunSuite}
 
-import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, 
SparkException}
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, 
SparkException, SparkEnv}
 import org.apache.spark.io.SnappyCompressionCodec
+import org.apache.spark.rdd.RDD
 import org.apache.spark.serializer.JavaSerializer
 import org.apache.spark.storage._
 
+// Dummy class that creates a broadcast variable but doesn't use it
+class DummyBroadcastClass(rdd: RDD[Int]) extends Serializable {
+  @transient val list = List(1, 2, 3, 4)
+  val broadcast = rdd.context.broadcast(list)
+  val bid = broadcast.id
+
+  def doSomething() = {
+rdd.map { x =>
--- End diff --

this is pretty neat


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-27 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/2871#discussion_r19455054
  
--- Diff: 
core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala ---
@@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: 
ClassTag](obj: T, id: Long)
   val time = (System.nanoTime() - start) / 1e9
   logInfo("Reading broadcast variable " + id + " took " + time + " 
s")
--- End diff --

this is not your change, but we should log ms instead of s here. i will fix 
it later.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2871#issuecomment-60714605
  
  [Test build #22335 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22335/consoleFull)
 for   PR 2871 at commit 
[`d6c5ee9`](https://github.com/apache/spark/commit/d6c5ee9629e4bc894913a4d181ca8ed963472e49).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...

2014-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2871#issuecomment-60714608
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22335/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org