[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2871 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2871#issuecomment-60792272 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22362/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2871#issuecomment-60792256 [Test build #22362 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22362/consoleFull) for PR 2871 at commit [`1456d65`](https://github.com/apache/spark/commit/1456d6552f0becaefd97f20d9804bda54efdbc43). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2871#issuecomment-60778288 [Test build #22362 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22362/consoleFull) for PR 2871 at commit [`1456d65`](https://github.com/apache/spark/commit/1456d6552f0becaefd97f20d9804bda54efdbc43). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/2871#issuecomment-60778158 Thanks @rxin and @JoshRosen for taking a look. I will merge after Jenkins passes --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/2871#discussion_r19480010 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -157,14 +161,12 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long) out.defaultWriteObject() } - /** Used by the JVM when deserializing this object. */ - private def readObject(in: ObjectInputStream): Unit = Utils.tryOrIOException { -in.defaultReadObject() + private def readBroadcastBlock(): T = Utils.tryOrIOException { --- End diff -- Hmm -- thanks for the clarification. I've kept the `tryOrIOException` for now as `getValue` hasn't thrown exceptions so far and we are introducing new behavior. We can revisit this at some point if it makes stack traces harder to read. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/2871#discussion_r19479577 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long) val time = (System.nanoTime() - start) / 1e9 logInfo("Reading broadcast variable " + id + " took " + time + " s") --- End diff -- I used `Utils.getUsedTimeMs` -- Is that good ? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user shivaram commented on a diff in the pull request: https://github.com/apache/spark/pull/2871#discussion_r19479618 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long) val time = (System.nanoTime() - start) / 1e9 logInfo("Reading broadcast variable " + id + " took " + time + " s") - _value = -TorrentBroadcast.unBlockifyObject[T](blocks, SparkEnv.get.serializer, compressionCodec) + val obj = TorrentBroadcast.unBlockifyObject[T]( +blocks, SparkEnv.get.serializer, compressionCodec) // Store the merged copy in BlockManager so other tasks on this executor don't // need to re-fetch it. SparkEnv.get.blockManager.putSingle( -broadcastId, _value, StorageLevel.MEMORY_AND_DISK, tellMaster = false) +broadcastId, obj, StorageLevel.MEMORY_AND_DISK, tellMaster = false) + obj } } } + + /** Used by the JVM when deserializing this object. */ + private def readObject(in: ObjectInputStream) { --- End diff -- Removed this --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/2871#issuecomment-60716720 LGTM, too. Very clever testing strategy! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2871#discussion_r19455404 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -157,14 +161,12 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long) out.defaultWriteObject() } - /** Used by the JVM when deserializing this object. */ - private def readObject(in: ObjectInputStream): Unit = Utils.tryOrIOException { -in.defaultReadObject() + private def readBroadcastBlock(): T = Utils.tryOrIOException { --- End diff -- I don't think that `Utils.tryOrIOException` is _strictly_ necessary here, since this method will no longer be called from `readObject()`, so we won't have to worry about any exceptions that it throws being swallowed inside of a Java deserializer. I added the `tryOrIOException` blocks as part of a general campaign to combat error-reporting issues in our custom Externalizable / Serializable classes. It's fine to use this here, though, if you want to restrict the range of possible exceptions that can be thrown when calling `getValue()`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2871#discussion_r19455293 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long) val time = (System.nanoTime() - start) / 1e9 logInfo("Reading broadcast variable " + id + " took " + time + " s") --- End diff -- actually if u need to change josh's other comment, can u fix this one as well? :) thanks. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2871#discussion_r19455263 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long) val time = (System.nanoTime() - start) / 1e9 logInfo("Reading broadcast variable " + id + " took " + time + " s") - _value = -TorrentBroadcast.unBlockifyObject[T](blocks, SparkEnv.get.serializer, compressionCodec) + val obj = TorrentBroadcast.unBlockifyObject[T]( +blocks, SparkEnv.get.serializer, compressionCodec) // Store the merged copy in BlockManager so other tasks on this executor don't // need to re-fetch it. SparkEnv.get.blockManager.putSingle( -broadcastId, _value, StorageLevel.MEMORY_AND_DISK, tellMaster = false) +broadcastId, obj, StorageLevel.MEMORY_AND_DISK, tellMaster = false) + obj } } } + + /** Used by the JVM when deserializing this object. */ + private def readObject(in: ObjectInputStream) { --- End diff -- Do we still need this method if it's just going to call the default readObject? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2871#discussion_r19455214 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long) val time = (System.nanoTime() - start) / 1e9 logInfo("Reading broadcast variable " + id + " took " + time + " s") --- End diff -- +1. I've seen several cases where this type of human-friendly formatting has resulted in log messages that report things like "spilling 0 MB to disk" or "task took 0 seconds", which isn't helpful while debugging. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2871#issuecomment-60715459 LGTM --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2871#discussion_r19455067 --- Diff: core/src/test/scala/org/apache/spark/broadcast/BroadcastSuite.scala --- @@ -21,11 +21,28 @@ import scala.util.Random import org.scalatest.{Assertions, FunSuite} -import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkException} +import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkException, SparkEnv} import org.apache.spark.io.SnappyCompressionCodec +import org.apache.spark.rdd.RDD import org.apache.spark.serializer.JavaSerializer import org.apache.spark.storage._ +// Dummy class that creates a broadcast variable but doesn't use it +class DummyBroadcastClass(rdd: RDD[Int]) extends Serializable { + @transient val list = List(1, 2, 3, 4) + val broadcast = rdd.context.broadcast(list) + val bid = broadcast.id + + def doSomething() = { +rdd.map { x => --- End diff -- this is pretty neat --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/2871#discussion_r19455054 --- Diff: core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala --- @@ -173,15 +175,21 @@ private[spark] class TorrentBroadcast[T: ClassTag](obj: T, id: Long) val time = (System.nanoTime() - start) / 1e9 logInfo("Reading broadcast variable " + id + " took " + time + " s") --- End diff -- this is not your change, but we should log ms instead of s here. i will fix it later. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2871#issuecomment-60714605 [Test build #22335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22335/consoleFull) for PR 2871 at commit [`d6c5ee9`](https://github.com/apache/spark/commit/d6c5ee9629e4bc894913a4d181ca8ed963472e49). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-4031] Make torrent broadcast read block...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2871#issuecomment-60714608 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22335/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org