[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/5178 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-136894558 Let's close this patch and reopen it later if necessary. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-121073615 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-113335190 @viper-kun @maropu any updates? Should we take over? I would recommend that we close this patch since it's mostly gone stale. We can always reopen an updated version later. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-94250246 @maropu I will update it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r28650737 --- Diff: core/src/main/scala/org/apache/spark/shuffle/FileShuffleBlockManager.scala --- @@ -180,7 +180,8 @@ class FileShuffleBlockManager(conf: SparkConf) Some(segment.nioByteBuffer()) } - override def getBlockData(blockId: ShuffleBlockId): ManagedBuffer = { + override def getBlockData(blockId: ShuffleBlockId, + blockManagerId: BlockManagerId = blockManager.blockManagerId): ManagedBuffer = { --- End diff -- It not support HashShuffle with consolidateShuffleFiles. For not confuseï¼it only support SortShuffleManager. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-93917004 @viper-kun What's the status of this patch? If you don't make further updates, I'd like to brush up this patch. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27781922 --- Diff: core/src/main/scala/org/apache/spark/shuffle/FileShuffleBlockManager.scala --- @@ -180,7 +180,8 @@ class FileShuffleBlockManager(conf: SparkConf) Some(segment.nioByteBuffer()) } - override def getBlockData(blockId: ShuffleBlockId): ManagedBuffer = { + override def getBlockData(blockId: ShuffleBlockId, + blockManagerId: BlockManagerId = blockManager.blockManagerId): ManagedBuffer = { --- End diff -- Why this patch only supports SortShuffleManager? ISTM the same logic could be straightforwardly implemented in HashShuffleManager. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27781900 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -181,68 +181,102 @@ final class ShuffleBlockFetcherIterator( // at most maxBytesInFlight in order to limit the amount of data in flight. val remoteRequests = new ArrayBuffer[FetchRequest] +val shuffleMgrName = blockManager.conf.get("spark.shuffle.manager", "sort") +val externalShuffleServiceEnabled = + blockManager.conf.getBoolean("spark.shuffle.service.enabled", false) + // Tracks total number of blocks (including zero sized blocks) var totalBlocks = 0 -for ((address, blockInfos) <- blocksByAddress) { - totalBlocks += blockInfos.size - if (address.executorId == blockManager.blockManagerId.executorId) { -// Filter out zero-sized blocks -localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1) -numBlocksToFetch += localBlocks.size - } else { -val iterator = blockInfos.iterator -var curRequestSize = 0L -var curBlocks = new ArrayBuffer[(BlockId, Long)] -while (iterator.hasNext) { - val (blockId, size) = iterator.next() - // Skip empty blocks - if (size > 0) { -curBlocks += ((blockId, size)) -remoteBlocks += blockId -numBlocksToFetch += 1 -curRequestSize += size - } else if (size < 0) { -throw new BlockException(blockId, "Negative block size " + size) - } - if (curRequestSize >= targetRequestSize) { -// Add this FetchRequest -remoteRequests += new FetchRequest(address, curBlocks) -curBlocks = new ArrayBuffer[(BlockId, Long)] -logDebug(s"Creating fetch request of $curRequestSize at $address") -curRequestSize = 0 - } +if (shuffleMgrName.toLowerCase == "hash" || externalShuffleServiceEnabled) { + for ((address, blockInfos) <- blocksByAddress) { --- End diff -- We should check if SortShuffleManager is used because this patch only supports it. This logic fails when new shuffle managers will be implemented. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-89735823 Understood. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-89716003 @maropu , yeah i think it is a common case for yarn mode. We often specify more executors than nodemanager, that means there are more than one executor on one machine. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-89318869 One question; are there many cases for executors to share a single host in the Yarn mode? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27719488 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -223,6 +226,15 @@ class BlockManagerMasterActor(val isLocal: Boolean, conf: SparkConf, listenerBus } } + // Return local dirs of other blockmanager on the same machine as blockManagerId + private def getLocalDirsPath( +blockManagerId: BlockManagerId): Map[BlockManagerId, Array[String]] = { +blockManagerInfo + .filter { case(id, _) => (id != blockManagerId && id.host == blockManagerId.host)} --- End diff -- Nit: Unnecessary parentheses. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-88064150 @andrewor14 Pls review it. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87611637 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29391/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87577323 [Test build #29391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29391/consoleFull) for PR 5178 at commit [`6c5c1d4`](https://github.com/apache/spark/commit/6c5c1d4d143e4806edd6cf747b84c56f992f14a9). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87572136 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29390/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87571847 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87566631 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29385/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87552816 [Test build #29381 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29381/consoleFull) for PR 5178 at commit [`c429b66`](https://github.com/apache/spark/commit/c429b6617c042accf9028c61ba604bb8a6fe2635). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class GetLocalDirsPath(blockManagerId: BlockManagerId) extends ToBlockManagerMaster` * This patch does not change any dependencies. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87552831 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29381/ Test PASSed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87547529 [Test build #29385 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29385/consoleFull) for PR 5178 at commit [`7ae6e46`](https://github.com/apache/spark/commit/7ae6e46dbdf5b9f7853c1ff2f03ea6c39af259ab). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87535646 [Test build #29381 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29381/consoleFull) for PR 5178 at commit [`c429b66`](https://github.com/apache/spark/commit/c429b6617c042accf9028c61ba604bb8a6fe2635). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87378645 [Test build #29363 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29363/consoleFull) for PR 5178 at commit [`6ee8df2`](https://github.com/apache/spark/commit/6ee8df2ccd10b05c031ed10ef3efe20a0b016a1e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class GetLocalDirsPath(blockManagerId: BlockManagerId) extends ToBlockManagerMaster` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87378679 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29363/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87370834 [Test build #29363 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29363/consoleFull) for PR 5178 at commit [`6ee8df2`](https://github.com/apache/spark/commit/6ee8df2ccd10b05c031ed10ef3efe20a0b016a1e). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87370436 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29361/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87370430 [Test build #29361 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29361/consoleFull) for PR 5178 at commit [`383a21f`](https://github.com/apache/spark/commit/383a21ff062a3f96513e72705596527a2e571cb0). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class GetLocalDirsPath(blockManagerId: BlockManagerId) extends ToBlockManagerMaster` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87369980 [Test build #29361 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29361/consoleFull) for PR 5178 at commit [`383a21f`](https://github.com/apache/spark/commit/383a21ff062a3f96513e72705596527a2e571cb0). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87212626 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29346/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87212610 [Test build #29346 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29346/consoleFull) for PR 5178 at commit [`9ddf84c`](https://github.com/apache/spark/commit/9ddf84cb983182abb208a49256c3037868229295). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class GetLocalDirsPath(blockManagerId: BlockManagerId) extends ToBlockManagerMaster` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87201940 [Test build #29346 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29346/consoleFull) for PR 5178 at commit [`9ddf84c`](https://github.com/apache/spark/commit/9ddf84cb983182abb208a49256c3037868229295). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user scwf commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-87201266 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86906360 **[Test build #29296 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29296/consoleFull)** for PR 5178 at commit [`bb94736`](https://github.com/apache/spark/commit/bb9473641c575e5022b95ee89372fe8981aa0814) after a configured wait of `120m`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86906368 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29296/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86879709 [Test build #29296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29296/consoleFull) for PR 5178 at commit [`bb94736`](https://github.com/apache/spark/commit/bb9473641c575e5022b95ee89372fe8981aa0814). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86878577 Jenkins, test this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86849484 Hi @andrewor14. pls retest it, test build time out. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86828278 **[Test build #29274 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29274/consoleFull)** for PR 5178 at commit [`bb94736`](https://github.com/apache/spark/commit/bb9473641c575e5022b95ee89372fe8981aa0814) after a configured wait of `120m`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86828291 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29274/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86805460 [Test build #29274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29274/consoleFull) for PR 5178 at commit [`bb94736`](https://github.com/apache/spark/commit/bb9473641c575e5022b95ee89372fe8981aa0814). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86433945 **[Test build #29218 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29218/consoleFull)** for PR 5178 at commit [`6162aef`](https://github.com/apache/spark/commit/6162aef5a5a2be52cf1b706b6cfc0c1aec204552) after a configured wait of `120m`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86433963 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29218/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86430835 **[Test build #29217 timed out](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29217/consoleFull)** for PR 5178 at commit [`0eaf1af`](https://github.com/apache/spark/commit/0eaf1afdc0c082fde53f3edcbc45ee75425ea98b) after a configured wait of `120m`. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86430853 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29217/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86388109 [Test build #29218 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29218/consoleFull) for PR 5178 at commit [`6162aef`](https://github.com/apache/spark/commit/6162aef5a5a2be52cf1b706b6cfc0c1aec204552). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user viper-kun commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27193887 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -181,68 +181,100 @@ final class ShuffleBlockFetcherIterator( // at most maxBytesInFlight in order to limit the amount of data in flight. val remoteRequests = new ArrayBuffer[FetchRequest] +val shuffleMgrName = SparkEnv.get.conf.get("spark.shuffle.manager", "sort") + // Tracks total number of blocks (including zero sized blocks) var totalBlocks = 0 -for ((address, blockInfos) <- blocksByAddress) { - totalBlocks += blockInfos.size - if (address.executorId == blockManager.blockManagerId.executorId) { -// Filter out zero-sized blocks -localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1) -numBlocksToFetch += localBlocks.size - } else { -val iterator = blockInfos.iterator -var curRequestSize = 0L -var curBlocks = new ArrayBuffer[(BlockId, Long)] -while (iterator.hasNext) { - val (blockId, size) = iterator.next() - // Skip empty blocks - if (size > 0) { -curBlocks += ((blockId, size)) -remoteBlocks += blockId -numBlocksToFetch += 1 -curRequestSize += size - } else if (size < 0) { -throw new BlockException(blockId, "Negative block size " + size) - } - if (curRequestSize >= targetRequestSize) { -// Add this FetchRequest -remoteRequests += new FetchRequest(address, curBlocks) -curBlocks = new ArrayBuffer[(BlockId, Long)] -logDebug(s"Creating fetch request of $curRequestSize at $address") -curRequestSize = 0 - } +if(shuffleMgrName == "hash") { --- End diff -- what are you meansï¼how to specify the full pathï¼ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86383226 [Test build #29217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29217/consoleFull) for PR 5178 at commit [`0eaf1af`](https://github.com/apache/spark/commit/0eaf1afdc0c082fde53f3edcbc45ee75425ea98b). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86201608 Hi @viper-kun it seems that there are quite a few style violations. For more detail please see https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide or https://github.com/databricks/scala-style-guide. Once you fix those I will do a closer review. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27161351 --- Diff: core/src/main/scala/org/apache/spark/storage/ShuffleBlockFetcherIterator.scala --- @@ -181,68 +181,100 @@ final class ShuffleBlockFetcherIterator( // at most maxBytesInFlight in order to limit the amount of data in flight. val remoteRequests = new ArrayBuffer[FetchRequest] +val shuffleMgrName = SparkEnv.get.conf.get("spark.shuffle.manager", "sort") + // Tracks total number of blocks (including zero sized blocks) var totalBlocks = 0 -for ((address, blockInfos) <- blocksByAddress) { - totalBlocks += blockInfos.size - if (address.executorId == blockManager.blockManagerId.executorId) { -// Filter out zero-sized blocks -localBlocks ++= blockInfos.filter(_._2 != 0).map(_._1) -numBlocksToFetch += localBlocks.size - } else { -val iterator = blockInfos.iterator -var curRequestSize = 0L -var curBlocks = new ArrayBuffer[(BlockId, Long)] -while (iterator.hasNext) { - val (blockId, size) = iterator.next() - // Skip empty blocks - if (size > 0) { -curBlocks += ((blockId, size)) -remoteBlocks += blockId -numBlocksToFetch += 1 -curRequestSize += size - } else if (size < 0) { -throw new BlockException(blockId, "Negative block size " + size) - } - if (curRequestSize >= targetRequestSize) { -// Add this FetchRequest -remoteRequests += new FetchRequest(address, curBlocks) -curBlocks = new ArrayBuffer[(BlockId, Long)] -logDebug(s"Creating fetch request of $curRequestSize at $address") -curRequestSize = 0 - } +if(shuffleMgrName == "hash") { --- End diff -- I don't think this is safe to do. The user can specify the full path to hash shuffle writer. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27159923 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -81,7 +89,38 @@ private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkCon new File(subDir, filename) } - def getFile(blockId: BlockId): File = getFile(blockId.name) + /** Looks up a file by blockManagerId */ + // This method should be kept in sync with + // org.apache.spark.network.shuffle.StandaloneShuffleBlockManager#getFile(). + def getFile(filename: String, blockManagerId: BlockManagerId): File = { --- End diff -- actually, I think it would make more sense to refactor this and the `getFile` in L65 to take the form: ``` def getLocalFile( filename: String, localDirs: Seq[String], createIfAbsent: Boolean = false): File ``` If the block manager is the one that belongs to this executor, then `createIfAbsent` will be true. Otherwise it's false. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27159189 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -81,7 +89,38 @@ private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkCon new File(subDir, filename) } - def getFile(blockId: BlockId): File = getFile(blockId.name) + /** Looks up a file by blockManagerId */ + // This method should be kept in sync with + // org.apache.spark.network.shuffle.StandaloneShuffleBlockManager#getFile(). + def getFile(filename: String, blockManagerId: BlockManagerId): File = { --- End diff -- should this be called `getLocalFile`? Also the javadoc should explicitly say that the given `blockManagerId` is expected to refer to an executor started on this particular machine (i.e. it should be local). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27159008 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -81,7 +89,38 @@ private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkCon new File(subDir, filename) } - def getFile(blockId: BlockId): File = getFile(blockId.name) + /** Looks up a file by blockManagerId */ + // This method should be kept in sync with + // org.apache.spark.network.shuffle.StandaloneShuffleBlockManager#getFile(). + def getFile(filename: String, blockManagerId: BlockManagerId): File = { +// Figure out which local directory it hashes to, and which subdirectory in that +val hash = Utils.nonNegativeHash(filename) + +var localDirsPath = localDirsByBlkMgr.get(blockManagerId) +if(!localDirsPath.isDefined) { + localDirsPath = localDirsByBlkMgr.synchronized { +val old = localDirsByBlkMgr.get(blockManagerId) +if(old.isDefined) { + old +} else { + localDirsByBlkMgr ++= blockManager.master.getLocalDirsPath(blockManager.blockManagerId) + localDirsByBlkMgr.get(blockManagerId) +} + } +} + +val dirId = hash % localDirsPath.get.length +val subDirId = (hash / localDirsPath.get.length) % subDirsPerLocalDir + +new File(localDirsPath.get(dirId) + "/" + "%02x".format(subDirId), filename) + } + + def getFile(blockId: BlockId, blockManagerId: BlockManagerId = blockManager.blockManagerId): File = { +if(blockManagerId == blockManager.blockManagerId) + getFile(blockId.name) +else + getFile(blockId.name, blockManagerId) --- End diff -- style: ``` if (...) { getFile(...) } else { getFile(...) } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27157449 --- Diff: core/src/main/scala/org/apache/spark/storage/DiskBlockManager.scala --- @@ -81,7 +89,38 @@ private[spark] class DiskBlockManager(blockManager: BlockManager, conf: SparkCon new File(subDir, filename) } - def getFile(blockId: BlockId): File = getFile(blockId.name) + /** Looks up a file by blockManagerId */ + // This method should be kept in sync with + // org.apache.spark.network.shuffle.StandaloneShuffleBlockManager#getFile(). --- End diff -- It seems that the changes here caused this to be out of sync, right? Do we want to make the same changes in `StandaloneShuffleBlockManager` too? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27157027 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -223,6 +226,14 @@ class BlockManagerMasterActor(val isLocal: Boolean, conf: SparkConf, listenerBus } } + // Return local dirs of other blockmanager on the same machine as blockManagerId + private def getLocalDirsPath(blockManagerId: BlockManagerId): Map[BlockManagerId, Array[String]] = { +blockManagerInfo.filter(info => (info._1 != blockManagerId && info._1.host == blockManagerId.host)) + .map { case(blockManagerId, info) => + (blockManagerId, info.localDirsPath) +}.toMap --- End diff -- style ``` blockManagerInfo .filter { case (id, _) => (id != blockManagerId && id.host == blockManagerId.host) } .mapValues { info => info.localDirsPath } .toMap ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86183415 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/29167/ Test FAILed. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86183403 [Test build #29167 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29167/consoleFull) for PR 5178 at commit [`5b766ca`](https://github.com/apache/spark/commit/5b766ca3f02d2e9b213e39908aec9c6b5804b623). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` case class GetLocalDirsPath(blockManagerId: BlockManagerId) extends ToBlockManagerMaster` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27156826 --- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala --- @@ -52,9 +52,9 @@ class BlockManagerMasterActor(val isLocal: Boolean, conf: SparkConf, listenerBus private val akkaTimeout = AkkaUtils.askTimeout(conf) - override def receiveWithLogging: PartialFunction[Any, Unit] = { --- End diff -- please keep the return type --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86182951 [Test build #29167 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/29167/consoleFull) for PR 5178 at commit [`5b766ca`](https://github.com/apache/spark/commit/5b766ca3f02d2e9b213e39908aec9c6b5804b623). * This patch merges cleanly. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/5178#discussion_r27156748 --- Diff: core/src/main/scala/org/apache/spark/shuffle/IndexShuffleBlockManager.scala --- @@ -52,12 +52,16 @@ class IndexShuffleBlockManager(conf: SparkConf) extends ShuffleBlockManager { ShuffleBlockId(shuffleId, mapId, 0) } - def getDataFile(shuffleId: Int, mapId: Int): File = { -blockManager.diskBlockManager.getFile(ShuffleDataBlockId(shuffleId, mapId, 0)) + def getDataFile(shuffleId: Int, + mapId: Int, + blockManagerId: BlockManagerId = blockManager.blockManagerId): File = { --- End diff -- hey @viper-kun the style here and other places should be: ``` def getDataFile( shuffleId: Int, mapId: Int, blockManagerId: BlockManagerId = ...): File = { ... } ``` --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-86182102 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/5178#issuecomment-85807625 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-6521][Core]executors in the same node r...
GitHub user viper-kun opened a pull request: https://github.com/apache/spark/pull/5178 [SPARK-6521][Core]executors in the same node read local shuffle file In the past, executor read other executor's shuffle file in the same node by net. This pr make that executors in the same node read local shuffle file In sort-based Shuffle. It will reduce net transport. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viper-kun/spark readlocal Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/5178.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #5178 commit 28a96ec3fe2e9e6adadedddc6e74b2c1ef667b00 Author: xukun 00228947 Date: 2015-03-25T02:26:33Z read local commit e6d1e0b62e248714042a1f1d463f9ab26b018516 Author: xukun 00228947 Date: 2015-03-25T02:46:08Z read local commit 5b766ca3f02d2e9b213e39908aec9c6b5804b623 Author: xukun 00228947 Date: 2015-03-25T02:49:02Z fix it --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org